OpenCores
URL https://opencores.org/ocsvn/openrisc/openrisc/trunk

Subversion Repositories openrisc

[/] [openrisc/] [tags/] [gnu-src/] [gcc-4.5.1/] [gcc-4.5.1-or32-1.0rc4/] [libstdc++-v3/] [doc/] [xml/] [manual/] [strings.xml] - Blame information for rev 519

  Strings

  String Classes

    Simple Transformations

   Here are standard, simple, and portable ways to perform common
   transformations on a string instance, such as "convert to all
   upper case."  The word "transformations" is especially apt, because
   the standard template function transform<> is used.

   This code will go through some iterations.  Here's a simple
   version:

   #include <string>
   #include <algorithm>
   #include <cctype>      // old <ctype.h>

   // Function objects give std::transform an unambiguous callable.
   struct ToLower
   {
     // The cast to unsigned char avoids undefined behavior for
     // negative char values.
     char operator() (char c) const
     { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); }
   };

   struct ToUpper
   {
     char operator() (char c) const
     { return static_cast<char>(std::toupper(static_cast<unsigned char>(c))); }
   };

   int main()
   {
     std::string  s ("Some Kind Of Initial Input Goes Here");

     // Change everything into upper case
     std::transform (s.begin(), s.end(), s.begin(), ToUpper());

     // Change everything into lower case
     std::transform (s.begin(), s.end(), s.begin(), ToLower());

     // Change everything back into upper case, but store the
     // result in a different string
     std::string  capital_s;
     capital_s.resize(s.size());
     std::transform (s.begin(), s.end(), capital_s.begin(), ToUpper());
   }

   Note that these calls all involve the global C locale through the
   use of the C functions toupper/tolower.  This is guaranteed to work,
   but only if the string contains only characters from the basic
   source character set, and there are only 96 of those.  That means
   not even all English text can be represented (certain British
   spellings, proper names, and so forth).  So, if all your input
   forevermore consists of only those 96 characters (hahahahahaha),
   then you're done.

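   If the global C locale is not what you want, one option is to ask a
   specific std::locale for its ctype facet.  The following is only a
   sketch: the helper name to_upper_copy and its default argument are
   made up for illustration, and it still handles only single-byte
   char data.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not part of any library): returns an upper-cased
// copy of 's' using the ctype<char> facet of the given locale.
std::string to_upper_copy(std::string s, const std::locale& loc = std::locale())
{
    if (!s.empty())
    {
        const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
        // The two-pointer overload converts the range [begin, end) in place.
        facet.toupper(&s[0], &s[0] + s.size());
    }
    return s;
}
```

   Passing std::locale::classic() gives the behavior of the "C" locale
   without depending on whatever the global locale happens to be.
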
   Note that the ToUpper and ToLower function objects are needed
   because toupper and tolower are overloaded names (declared in
   <cctype> and <locale>), so the template arguments for transform<>
   cannot be deduced, as explained in this message.  At minimum, you
   can write short wrappers like

   char toLower (char c)
   {
      // Convert via unsigned char so negative char values are safe.
      return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
   }

   (Thanks to James Kanze for assistance and suggestions on all of this.)

   Another common operation is trimming off excess whitespace.  Much
   like transformations, this task is trivial with the use of string's
   find family.  These examples are broken into multiple statements
   for readability:

   std::string  str (" \t blah blah blah    \n ");

   // trim leading whitespace
   std::string::size_type  notwhite = str.find_first_not_of(" \t\n");
   str.erase(0,notwhite);

   // trim trailing whitespace
   notwhite = str.find_last_not_of(" \t\n");
   str.erase(notwhite+1);

   Obviously, the calls to find could be inserted directly into the
   calls to erase, in case your compiler does not optimize named
   temporaries out of existence.
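
   The same two find calls can be wrapped into a reusable function.
   This is a minimal sketch; the name trim and the whitespace parameter
   are assumptions made for the example, not an existing library
   function.

```cpp
#include <string>

// Hypothetical helper: returns a copy of 'str' with leading and trailing
// characters from 'whitespace' removed.
std::string trim(const std::string& str, const char* whitespace = " \t\n")
{
    const std::string::size_type first = str.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::string();      // empty, or nothing but whitespace

    const std::string::size_type last = str.find_last_not_of(whitespace);
    return str.substr(first, last - first + 1);
}
```

   Returning a copy leaves the caller's string untouched, at the cost
   of one allocation; the in-place erase calls avoid that.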

    Case Sensitivity

   The well-known-and-if-it-isn't-well-known-it-ought-to-be Guru of the
   Week discussions held on Usenet covered this topic in January of
   1998.  Briefly, the challenge was to write a 'ci_string' class which
   is identical to the standard 'string' class, but is case-insensitive
   in the same way as the (common but nonstandard) C function
   stricmp().

   ci_string s( "AbCdE" );

   // case insensitive
   assert( s == "abcde" );
   assert( s == "ABCDE" );

   // still case-preserving, of course
   assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
   assert( strcmp( s.c_str(), "abcde" ) != 0 );

   The solution is surprisingly easy.  The original answer was posted
   on Usenet, and a revised version appears in Herb Sutter's book
   Exceptional C++ and on his website as GotW 29.

   See?  Told you it was easy!

   Added June 2000: The May 2000 issue of C++ Report contains a
   fascinating article (http://lafstern.org/matt/col2_new.pdf) by
   Matt Austern (yes, the Matt Austern) on why case-insensitive
   comparisons are not as easy as they seem, and why creating a class
   is the wrong way to go about it in production code.  (The GotW
   answer mentions one of the principal difficulties; his article
   mentions more.)

   Basically, this is "easy" only if you ignore some things, things
   which may be too important to your program to ignore.  (I chose to
   ignore them when originally writing this entry, and am surprised
   that nobody ever called me on it...)  The GotW question and answer
   remain useful instructional tools, however.

   Added September 2000:  James Kanze provided a link to a Unicode
   Technical Report discussing case handling, which provides some
   very good information.

    Arbitrary Character Types

   The std::basic_string is tantalizingly general, in that it is
   parameterized on the type of the characters which it holds.  In
   theory, you could whip up a Unicode character class and instantiate
   std::basic_string<my_unicode_char>, or, assuming that integers are
   wider than characters on your platform, maybe just declare
   variables of type std::basic_string<int>.

   That's the theory.  Remember, however, that basic_string has
   additional type parameters, which take default arguments based on
   the character type (called CharT here):

      template <typename CharT,
                typename Traits = char_traits<CharT>,
                typename Alloc = allocator<CharT> >
      class basic_string { .... };

   Now, allocator<CharT> will probably Do The Right Thing by default,
   unless you need to implement your own allocator for your
   characters.

   But char_traits takes more work.  The char_traits template is
   declared but not defined.  That means there is only

      template <typename CharT>
        struct char_traits
        {
            static void foo (type1 x, type2 y);
            ...
        };

   and functions such as char_traits<CharT>::foo() are not actually
   defined anywhere for the general case.  The C++ standard permits
   this, because writing such a definition to fit all possible CharT's
   cannot be done.

   The C++ standard also requires that char_traits be specialized for
   char and wchar_t, and it is these template specializations that
   permit entities like basic_string<char,char_traits<char>> to work.

   If you want to use character types other than char and wchar_t,
   such as unsigned char and int, you will need suitable
   specializations for them.  For a time, in earlier versions of GCC,
   there was a mostly-correct implementation that let programmers be
   lazy, but it broke in many situations, so it was removed.  GCC 3.4
   introduced a new implementation that mostly works and can be
   specialized even for int and other built-in types.

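   To give a feel for how much a traits class has to supply, here is a
   sketch for unsigned char.  It is written as a separate class passed
   as the second template argument rather than as a specialization of
   std::char_traits; the names uchar_traits and ustring are invented
   for this example, and the int_type/eof choices are one reasonable
   option, not the only one.

```cpp
#include <cstddef>
#include <cstdio>     // EOF
#include <cstring>    // memcmp, memchr, memmove, memcpy, memset
#include <cwchar>     // std::mbstate_t
#include <ios>        // std::streampos, std::streamoff
#include <string>

// Hypothetical traits class for unsigned char (illustration only).
struct uchar_traits
{
    typedef unsigned char   char_type;
    typedef int             int_type;
    typedef std::streampos  pos_type;
    typedef std::streamoff  off_type;
    typedef std::mbstate_t  state_type;

    static void assign(char_type& dst, const char_type& src) { dst = src; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a <  b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n)
    { return n ? std::memcmp(a, b, n) : 0; }

    static std::size_t length(const char_type* s)
    {
        std::size_t n = 0;
        while (s[n] != 0) ++n;      // count up to the terminating zero
        return n;
    }

    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& c)
    { return n ? static_cast<const char_type*>(std::memchr(s, c, n)) : 0; }

    static char_type* move(char_type* dst, const char_type* src, std::size_t n)
    { if (n) std::memmove(dst, src, n); return dst; }

    static char_type* copy(char_type* dst, const char_type* src, std::size_t n)
    { if (n) std::memcpy(dst, src, n); return dst; }

    static char_type* assign(char_type* dst, std::size_t n, char_type c)
    { if (n) std::memset(dst, c, n); return dst; }

    static int_type  to_int_type(char_type c)       { return c; }
    static char_type to_char_type(int_type i)       { return static_cast<char_type>(i); }
    static bool      eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type  eof()                          { return EOF; }
    static int_type  not_eof(int_type i)            { return i == eof() ? 0 : i; }
};

typedef std::basic_string<unsigned char, uchar_traits> ustring;
```

   This covers what basic_string itself calls.  Stream I/O and locale
   facets for the new character type are a separate matter and are not
   addressed by the sketch.
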
   If you want to use your own special character class, then you have
   a lot of work to do, especially if you wish to use i18n features
   (facets require traits information but don't have a traits
   argument).

   Another example of how to specialize char_traits was given on the
   mailing list and at a later date was put into the file
   include/ext/pod_char_traits.h.  We agree that the way it's used
   with basic_string (scroll down to main()) doesn't look nice, but
   that's because the nice-looking first attempt turned out to not be
   conforming C++, due to the rule that CharT must be a POD.  (See how
   tricky this is?)

    Tokenizing

   The Standard C (and C++) function strtok() leaves a lot to be
   desired in terms of user-friendliness.  It's unintuitive, it
   destroys the character string on which it operates, and it requires
   you to handle all the memory problems.  But it does let the client
   code decide what to use to break the string into pieces; it allows
   you to choose the "whitespace," so to speak.

   A C++ implementation lets us keep the good things and fix those
   annoyances.  The implementation here is more intuitive (you only
   call it once, not in a loop with varying arguments), it does not
   affect the original string at all, and all the memory allocation is
   handled for you.

   It's called stringtok, and it's a template function.  Sources are
   as below, in a less-portable form than it could be, to keep this
   example simple (for example, see the comments on what kind of
   string it will accept).

#include <string>

// Splits 'in' at any of the characters in 'delimiters' and appends each
// token to 'container' (anything with push_back taking a std::string).
// Accepts only a narrow std::string as input, to keep the example simple.
template <typename Container>
void
stringtok(Container &container, std::string const &in,
          const char * const delimiters = " \t\n")
{
    const std::string::size_type len = in.length();
          std::string::size_type i = 0;

    while (i < len)
    {
        // Eat leading whitespace
        i = in.find_first_not_of(delimiters, i);
        if (i == std::string::npos)
          return;   // Nothing left but white space

        // Find the end of the token
        std::string::size_type j = in.find_first_of(delimiters, i);

        // Push token
        if (j == std::string::npos)
        {
          container.push_back(in.substr(i));
          return;
        }
        else
          container.push_back(in.substr(i, j-i));

        // Set up for next loop
        i = j + 1;
    }
}

   The author uses a more general (but less readable) form of it for
   parsing command strings and the like.  If you compiled and ran this
   code using it:

   std::list<std::string>  ls;
   stringtok (ls, " this  \t is\t\n  a test  ");
   for (std::list<std::string>::const_iterator i = ls.begin();
        i != ls.end(); ++i)
   {
       std::cerr << ':' << (*i) << ":\n";
   }

   You would see this as output:

   :this:
   :is:
   :a:
   :test:

   with all the whitespace removed.  The original input string is
   still available for use, ls will clean up after itself, and
   ls.size() will return how many tokens there were.

   As always, there is a price paid here, in that stringtok is not as
   fast as strtok.  The other benefits usually outweigh that, however.

   Added February 2001:  Mark Wilden pointed out that the standard
   std::getline() function can be used with standard istringstreams to
   perform tokenizing as well.  Build an istringstream from the input
   text, and then use std::getline with varying delimiters (the
   three-argument signature) to extract tokens into a string.
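
   A minimal sketch of the getline approach follows.  The function name
   split and the choice to skip empty tokens are assumptions made for
   this example; unlike stringtok, each call to getline recognizes one
   delimiter character rather than a set.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split 'in' on a single delimiter character,
// dropping empty tokens produced by adjacent delimiters.
std::vector<std::string> split(const std::string& in, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream stream(in);
    std::string token;

    // The three-argument getline reads up to (and discards) 'delim'.
    while (std::getline(stream, token, delim))
    {
        if (!token.empty())
            tokens.push_back(token);
    }
    return tokens;
}
```

   Calling split("this,is,,a,test", ',') yields the four tokens this,
   is, a and test.  To vary the delimiter from token to token, call
   std::getline directly with a different third argument each time.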

    Shrink to Fit

   From GCC 3.4, calling s.reserve(res) on a string s with
   res < s.capacity() will reduce the string's capacity to
   std::max(s.size(), res).

   This behaviour is suggested, but not required, by the standard.
   Prior to GCC 3.4 the following alternative can be used instead:

      std::string(str.data(), str.size()).swap(str);

   This is similar to the idiom for reducing a vector's memory usage
   (see this FAQ entry), but the regular copy constructor cannot be
   used because libstdc++'s string is Copy-On-Write, so a plain copy
   would simply share the original, oversized buffer.
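
   Wrapped up as a function, the swap idiom might look like the sketch
   below.  The name shrink_capacity is made up for the example, and how
   much capacity is actually released depends on the implementation.

```cpp
#include <string>

// Hypothetical helper: rebuilds 'str' from its own characters so that
// the new buffer is sized for the current contents, then swaps it in.
void shrink_capacity(std::string& str)
{
    // Constructing from (pointer, length) forces a fresh buffer instead
    // of sharing the existing one, and preserves embedded '\0' bytes.
    std::string(str.data(), str.size()).swap(str);
}
```

   A typical use is after a large reserve() or after erasing most of a
   long string.  Since C++11 the standard also offers
   std::string::shrink_to_fit(), which is likewise a non-binding
   request.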

    CString (MFC)

   A common lament seen in various newsgroups deals with the Standard
   string class as opposed to the Microsoft Foundation Class called
   CString.  Often programmers realize that a standard portable answer
   is better than a proprietary nonportable one, but in porting their
   application from a Win32 platform, they discover that they are
   relying on special functions offered by the CString class.

   Things are not as bad as they seem.  In this message, Joe Buck
   points out a few very important things:

      The Standard string supports all the operations that CString
      does, with three exceptions.

      Two of those exceptions (whitespace trimming and case conversion)
      are trivial to implement.  In fact, we do so on this page.

      The third is CString::Format, which allows formatting in the
      style of sprintf.  This deserves some mention:

   The old libg++ library had a function called form(), which did much
   the same thing.  But for a Standard solution, you should use the
   stringstream classes.  These are the bridge between the iostream
   hierarchy and the string class, and they operate with regular
   streams seamlessly because they inherit from the iostream
   hierarchy.  A quick example:

   #include <iostream>
   #include <string>
   #include <sstream>

   std::string f (const std::string& incoming)     // incoming is "foo  N"
   {
       std::istringstream   incoming_stream(incoming);
       std::string          the_word;
       int                  the_number = 0;

       incoming_stream >> the_word        // extract "foo"
                       >> the_number;     // extract N

       std::ostringstream   output_stream;
       output_stream << "The word was " << the_word
                     << " and 3*N was " << (3*the_number);

       return output_stream.str();
   }

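   For sprintf-style control over width and precision, the stream
   manipulators in <iomanip> do the job.  A small sketch, with the
   function name format_entry invented for the example; it produces
   roughly what the format string "%s: %8.2f" would.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a label and a value, with the value
// right-aligned in a field of width 8 and two digits after the point.
std::string format_entry(const std::string& label, double value)
{
    std::ostringstream out;
    out << label << ": "
        << std::setw(8) << std::fixed << std::setprecision(2) << value;
    return out.str();
}
```

   Note that setw applies only to the next item written, while fixed
   and setprecision stay in effect for the rest of the stream's life.
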
   A serious problem with CString is a design bug in its memory
   allocation.  Specifically, quoting from that same message:

   CString suffers from a common programming error that results in
   poor performance.  Consider the following code:

   CString n_copies_of (const CString& foo, unsigned n)
   {
           CString tmp;
           for (unsigned i = 0; i < n; i++)
                   tmp += foo;
           return tmp;
   }

   This function is O(n^2), not O(n).  The reason is that each +=
   causes a reallocation and copy of the existing string.  Microsoft
   applications are full of this kind of thing (quadratic performance
   on tasks that can be done in linear time) -- on the other hand,
   we should be thankful, as it's created such a big market for high-end
   ix86 hardware. :-)

   If you replace CString with string in the above function, the
   performance is O(n).

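   Spelled out with the Standard string, the function from the quoted
   message becomes the sketch below.  The reserve call is an addition
   for this example and is optional.

```cpp
#include <string>

// The quoted function with std::string substituted for CString.
std::string n_copies_of(const std::string& foo, unsigned n)
{
    std::string tmp;
    // Optional: one allocation up front instead of repeated growth.
    tmp.reserve(foo.size() * n);
    for (unsigned i = 0; i < n; i++)
        tmp += foo;
    return tmp;
}
```

   Even without the reserve call, an implementation that grows capacity
   geometrically keeps the total copying linear in the length of the
   result.
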
   Joe Buck also pointed out some other things to keep in mind when
   comparing CString and the Standard string class:

      CString permits access to its internal representation; coders
      who exploited that may have problems moving to string.

      Microsoft ships the source to CString (in the files
      MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation bug
      and rebuild your MFC libraries.  Note: It looks like the CString
      shipped with VC++6.0 has fixed this, although it may in fact have
      been one of the VC++ SPs that did it.

      string operations like this have O(n) complexity if the
      implementors do it correctly.  The libstdc++ implementors did it
      correctly.  Other vendors might not.

      While parts of the SGI STL are used in libstdc++, their string
      class is not.  The SGI string is essentially vector<char> and
      does not do any reference counting like libstdc++'s does.  (It is
      O(n), though.)  So if you're thinking about SGI's string or rope
      classes, you're now looking at four possibilities:  CString, the
      libstdc++ string, the SGI string, and the SGI rope, and this is
      all before any allocator or traits customizations!  (More choices
      than you can shake a stick at -- want fries with that?)