OpenCores
URL https://opencores.org/ocsvn/openrisc/openrisc/trunk

Subversion Repositories openrisc

[/] [openrisc/] [trunk/] [gnu-dev/] [or1k-gcc/] [libstdc++-v3/] [doc/] [xml/] [manual/] [strings.xml] - Blame information for rev 742

Strings

   Keywords: ISO C++, library

String Classes

Simple Transformations

   Here are Standard, simple, and portable ways to perform common
   transformations on a string instance, such as "convert to all upper
   case."  The word "transformations" is especially apt, because the
   standard template function transform<> is used.

   This code will go through some iterations.  Here's a simple version:

   #include <string>
   #include <algorithm>
   #include <cctype>      // old <ctype.h>

   // Function objects wrapping the C case-conversion functions.  The
   // argument is converted to unsigned char first, because passing a
   // negative char value to std::tolower/std::toupper is undefined.
   struct ToLower
   {
     char operator() (char c) const
     { return std::tolower(static_cast<unsigned char>(c)); }
   };

   struct ToUpper
   {
     char operator() (char c) const
     { return std::toupper(static_cast<unsigned char>(c)); }
   };

   int main()
   {
     std::string  s ("Some Kind Of Initial Input Goes Here");

     // Change everything into upper case
     std::transform (s.begin(), s.end(), s.begin(), ToUpper());

     // Change everything into lower case
     std::transform (s.begin(), s.end(), s.begin(), ToLower());

     // Change everything back into upper case, but store the
     // result in a different string
     std::string  capital_s;
     capital_s.resize(s.size());
     std::transform (s.begin(), s.end(), capital_s.begin(), ToUpper());
   }

   Note that these calls all involve the global C locale through the
   use of the C functions toupper/tolower.  This is absolutely
   guaranteed to work -- but only if the string contains only
   characters from the basic source character set, and there are only
   96 of those.  Which means that not even all English text can be
   represented (certain British spellings, proper names, and so forth).
   So, if all your input forevermore consists of only those 96
   characters (hahahahahaha), then you're done.
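
   If the global C locale is not what you want, one possible
   alternative is to go through the ctype facet of an explicitly chosen
   std::locale.  The following is only a sketch: the helper name
   to_upper_in is made up for this example, and it still handles only
   single-byte characters.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not part of the original text): upper-case a
// string in place according to an explicitly supplied locale.
inline void to_upper_in(std::string& s, const std::locale& loc)
{
  if (s.empty())
    return;
  // ctype<char>::toupper(low, high) converts the range [low, high)
  // in place.
  std::use_facet<std::ctype<char> >(loc).toupper(&s[0], &s[0] + s.size());
}
```

   Calling to_upper_in(s, std::locale::classic()) should behave like the
   ToUpper loop above for plain ASCII input; other locales depend on
   what your platform provides.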

   Note that the ToUpper and ToLower function objects are needed because
   toupper and tolower are overloaded names (declared in <cctype> and
   <locale>) so the template-arguments for transform<> cannot be deduced,
   as explained in this message.  At minimum, you can write short
   wrappers like

   char toLower (char c)
   {
      // convert to unsigned char first; negative values are undefined
      return std::tolower(static_cast<unsigned char>(c));
   }

   (Thanks to James Kanze for assistance and suggestions on all of this.)
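
   To see why such a wrapper helps, here is a small sketch of it being
   handed to std::transform.  Because toLower names exactly one
   function, the template argument can be deduced.  The helper
   lower_copy is a name invented for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Plain-function wrapper: a single, non-overloaded function, so its
// name can be passed to std::transform without ambiguity.
inline char toLower(char c)
{
  return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: return a lower-cased copy of the argument.
inline std::string lower_copy(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(), toLower);
  return s;
}
```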

   Another common operation is trimming off excess whitespace.  Much
   like transformations, this task is trivial with the use of string's
   find family.  These examples are broken into multiple statements for
   readability:

   std::string  str (" \t blah blah blah    \n ");

   // trim leading whitespace
   std::string::size_type  notwhite = str.find_first_not_of(" \t\n");
   str.erase(0,notwhite);

   // trim trailing whitespace
   notwhite = str.find_last_not_of(" \t\n");
   str.erase(notwhite+1);

   Obviously, the calls to find could be inserted directly into the
   calls to erase, in case your compiler does not optimize named
   temporaries out of existence.
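
   As a sketch of that compact form, the two steps can be wrapped in a
   small function.  The name trim and the ws parameter are assumptions
   made for this example, not something the library provides.

```cpp
#include <string>

// Hypothetical helper: strip leading and trailing characters found in
// `ws` (default: space, tab, newline) from `str`, in place.
inline void trim(std::string& str, const char* ws = " \t\n")
{
  // If the string is all whitespace, find_first_not_of returns npos
  // and erase(0, npos) empties the string.
  str.erase(0, str.find_first_not_of(ws));
  // For an empty string find_last_not_of returns npos; npos + 1 wraps
  // around to 0, so erase(0) is then a harmless no-op.
  str.erase(str.find_last_not_of(ws) + 1);
}
```

   Note that the all-whitespace and empty cases fall out of the npos
   arithmetic without any special handling.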

Case Sensitivity

   The well-known-and-if-it-isn't-well-known-it-ought-to-be Guru of the
   Week discussions held on Usenet covered this topic in January of 1998.
   Briefly, the challenge was to write a 'ci_string' class which is
   identical to the standard 'string' class, but is case-insensitive in
   the same way as the (common but nonstandard) C function stricmp().

   ci_string s( "AbCdE" );

   // case insensitive
   assert( s == "abcde" );
   assert( s == "ABCDE" );

   // still case-preserving, of course
   assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
   assert( strcmp( s.c_str(), "abcde" ) != 0 );

   The solution is surprisingly easy.  The original answer was posted on
   Usenet, and a revised version appears in Herb Sutter's book
   Exceptional C++ and on his website as GotW 29.

   See?  Told you it was easy!

   Added June 2000: The May 2000 issue of C++ Report contains a
   fascinating article by Matt Austern (yes, the Matt Austern) on why
   case-insensitive comparisons are not as easy as they seem, and why
   creating a class is the wrong way to go about it in production code.
   (The GotW answer mentions one of the principal difficulties; his
   article mentions more.)

   Basically, this is "easy" only if you ignore some things, things
   which may be too important to your program to ignore.  (I chose to
   ignore them when originally writing this entry, and am surprised that
   nobody ever called me on it...)  The GotW question and answer remain
   useful instructional tools, however.

   Added September 2000:  James Kanze provided a link to a Unicode
   Technical Report discussing case handling, which provides some very
   good information.
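
   For the curious, here is a minimal sketch of the traits-based
   approach that the GotW discussion is known for.  It is written from
   memory rather than copied from the published answer, the helper up()
   is an invention of this example, and it deliberately ignores the
   locale and Unicode difficulties mentioned above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Traits whose comparisons ignore case (as judged by the C locale).
struct ci_char_traits : public std::char_traits<char>
{
  // Hypothetical helper: upper-case one character safely.
  static int up(char c)
  { return std::toupper(static_cast<unsigned char>(c)); }

  static bool eq(char a, char b) { return up(a) == up(b); }
  static bool lt(char a, char b) { return up(a) <  up(b); }

  static int compare(const char* a, const char* b, std::size_t n)
  {
    for (std::size_t i = 0; i < n; ++i)
      {
        if (up(a[i]) < up(b[i])) return -1;
        if (up(a[i]) > up(b[i])) return  1;
      }
    return 0;
  }

  static const char* find(const char* s, std::size_t n, char c)
  {
    for (std::size_t i = 0; i < n; ++i)
      if (up(s[i]) == up(c))
        return s + i;
    return 0;
  }
};

typedef std::basic_string<char, ci_char_traits> ci_string;
```

   Keep in mind that ci_string is a different type from std::string, so
   the two do not mix freely (no implicit conversion, no operator<< for
   streams), which is one reason this is not recommended for production
   code.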

Arbitrary Character Types

   The std::basic_string is tantalizingly general, in that it is
   parameterized on the type of the characters which it holds.  In
   theory, you could whip up a Unicode character class and instantiate
   std::basic_string<my_unicode_char>, or assuming that integers are
   wider than characters on your platform, maybe just declare variables
   of type std::basic_string<int>.

   That's the theory.  Remember however that basic_string has additional
   type parameters, which take default arguments based on the character
   type (called CharT here):

      template <typename CharT,
                typename Traits = char_traits<CharT>,
                typename Alloc = allocator<CharT> >
      class basic_string { .... };

   Now, allocator<CharT> will probably Do The Right Thing by default,
   unless you need to implement your own allocator for your characters.

   But char_traits takes more work.  The char_traits template is
   declared but not defined.  That means there is only

      template <typename CharT>
        struct char_traits
        {
            static void foo (type1 x, type2 y);
            ...
        };

   and functions such as char_traits<CharT>::foo() are not actually
   defined anywhere for the general case.  The C++ standard permits
   this, because writing such a definition to fit all possible CharT's
   cannot be done.

   The C++ standard also requires that char_traits be specialized for
   instantiations of char and wchar_t, and it is these template
   specializations that permit entities like
   basic_string<char, char_traits<char> > to work.

   If you want to use character types other than char and wchar_t, such
   as unsigned char and int, you will need suitable specializations for
   them.  For a time, in earlier versions of GCC, there was a
   mostly-correct implementation that let programmers be lazy, but it
   broke under many situations, so it was removed.  GCC 3.4 introduced a
   new implementation that mostly works and can be specialized even for
   int and other built-in types.

   If you want to use your own special character class, then you have a
   lot of work to do, especially if you wish to use i18n features
   (facets require traits information but don't have a traits argument).

   Another example of how to specialize char_traits was given on the
   mailing list and at a later date was put into the file
   include/ext/pod_char_traits.h.  We agree that the way it's used with
   basic_string (scroll down to main()) doesn't look nice, but that's
   because the nice-looking first attempt turned out to not be
   conforming C++, due to the rule that CharT must be a POD.  (See how
   tricky this is?)
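
   To give a feel for the amount of work involved, here is a sketch of
   a hand-written traits class for unsigned char.  It is passed as the
   explicit second template argument rather than written as a
   specialization of std::char_traits.  The names uchar_traits and
   ustring are made up for this example, this is not the code from
   pod_char_traits.h, and only the string side is covered (nothing
   here makes iostreams or facets work).

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>     // std::mbstate_t
#include <ios>        // std::streamoff, std::streampos
#include <string>

// Hypothetical traits class for unsigned char.
struct uchar_traits
{
  typedef unsigned char   char_type;
  typedef int             int_type;
  typedef std::streamoff  off_type;
  typedef std::streampos  pos_type;
  typedef std::mbstate_t  state_type;

  static void assign(char_type& d, const char_type& s) { d = s; }
  static bool eq(char_type a, char_type b) { return a == b; }
  static bool lt(char_type a, char_type b) { return a < b; }

  static int compare(const char_type* a, const char_type* b, std::size_t n)
  { return n == 0 ? 0 : std::memcmp(a, b, n); }

  static std::size_t length(const char_type* s)
  {
    std::size_t n = 0;
    while (s[n] != 0)
      ++n;
    return n;
  }

  static const char_type* find(const char_type* s, std::size_t n,
                               const char_type& c)
  {
    return n == 0 ? 0
                  : static_cast<const char_type*>(std::memchr(s, c, n));
  }

  static char_type* move(char_type* d, const char_type* s, std::size_t n)
  { if (n) std::memmove(d, s, n); return d; }

  static char_type* copy(char_type* d, const char_type* s, std::size_t n)
  { if (n) std::memcpy(d, s, n); return d; }

  static char_type* assign(char_type* d, std::size_t n, char_type c)
  { if (n) std::memset(d, c, n); return d; }

  static int_type eof() { return -1; }
  static int_type not_eof(int_type i) { return i == eof() ? 0 : i; }
  static char_type to_char_type(int_type i)
  { return static_cast<char_type>(i); }
  static int_type to_int_type(char_type c) { return c; }
  static bool eq_int_type(int_type a, int_type b) { return a == b; }
};

typedef std::basic_string<unsigned char, uchar_traits> ustring;
```

   A ustring can then be built from raw bytes, for instance
   ustring u(buffer, length), and supports the usual member functions
   such as size(), find() and operator+=.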

Tokenizing

   The Standard C (and C++) function strtok() leaves a lot to be desired
   in terms of user-friendliness.  It's unintuitive, it destroys the
   character string on which it operates, and it requires you to handle
   all the memory problems.  But it does let the client code decide what
   to use to break the string into pieces; it allows you to choose the
   "whitespace," so to speak.

   A C++ implementation lets us keep the good things and fix those
   annoyances.  The implementation here is more intuitive (you only call
   it once, not in a loop with varying argument), it does not affect the
   original string at all, and all the memory allocation is handled for
   you.

   It's called stringtok, and it's a template function.  Sources are as
   below, in a less-portable form than it could be, to keep this example
   simple (for example, see the comments on what kind of string it will
   accept).

#include <string>

// Split `in` into tokens separated by any character in `delimiters`
// and append each token to `container`.  Only std::string input is
// accepted; Container must provide push_back(std::string), as
// std::list<std::string> does.
template <typename Container>
void
stringtok(Container &container, std::string const &in,
          const char * const delimiters = " \t\n")
{
    const std::string::size_type len = in.length();
          std::string::size_type i = 0;

    while (i < len)
    {
        // Eat leading whitespace
        i = in.find_first_not_of(delimiters, i);
        if (i == std::string::npos)
          return;   // Nothing left but white space

        // Find the end of the token
        std::string::size_type j = in.find_first_of(delimiters, i);

        // Push token
        if (j == std::string::npos)
        {
          container.push_back(in.substr(i));
          return;
        }
        else
          container.push_back(in.substr(i, j-i));

        // Set up for next loop
        i = j + 1;
    }
}

   The author uses a more general (but less readable) form of it for
   parsing command strings and the like.  If you compiled and ran this
   code using it:

   // needs <list>, <iostream>, and stringtok from above
   std::list<std::string>  ls;
   stringtok (ls, " this  \t is\t\n  a test  ");
   for (std::list<std::string>::const_iterator i = ls.begin();
        i != ls.end(); ++i)
   {
       std::cerr << ':' << (*i) << ":\n";
   }

   You would see this as output:

   :this:
   :is:
   :a:
   :test:

   with all the whitespace removed.  The original string is still
   available for use, ls will clean up after itself, and ls.size() will
   return how many tokens there were.

   As always, there is a price paid here, in that stringtok is not as
   fast as strtok.  The other benefits usually outweigh that, however.

   Added February 2001:  Mark Wilden pointed out that the standard
   std::getline() function can be used with standard istringstreams to
   perform tokenizing as well.  Build an istringstream from the input
   text, and then use std::getline with varying delimiters (the
   three-argument signature) to extract tokens into a string.
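
   A sketch of that getline approach might look like the following.
   The function name split_with_getline and the choice of std::vector
   are assumptions of this example.  Unlike stringtok, it takes a
   single delimiter character per call and does not collapse runs of
   delimiters, so empty tokens can appear.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `in` at every occurrence of `delim`.
// Adjacent delimiters yield empty tokens.
inline std::vector<std::string>
split_with_getline(const std::string& in, char delim)
{
  std::vector<std::string> tokens;
  std::istringstream stream(in);
  std::string token;
  // The three-argument getline reads up to `delim` and discards it.
  while (std::getline(stream, token, delim))
    tokens.push_back(token);
  return tokens;
}
```

   To vary the delimiter from one token to the next, call std::getline
   directly on the same istringstream with a different third argument
   each time.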

Shrink to Fit

   From GCC 3.4 calling s.reserve(res) on a string s with
   res < s.capacity() will reduce the string's capacity to
   std::max(s.size(), res).

   This behaviour is suggested, but not required by the standard.  Prior
   to GCC 3.4 the following alternative can be used instead

      std::string(str.data(), str.size()).swap(str);

   This is similar to the idiom for reducing a vector's memory usage
   (see this FAQ entry) but the regular copy constructor cannot be used
   because libstdc++'s string is Copy-On-Write.

   In C++11 mode you can call s.shrink_to_fit() to achieve the same
   effect as s.reserve(s.size()).
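
   As an illustration, the swap idiom can be wrapped in a small helper.
   The name shrink is made up for this sketch, and how much memory is
   actually released is up to the library (a short string may simply
   end up in a small internal buffer).

```cpp
#include <string>

// Hypothetical helper: ask the library to release unused capacity.
// The swap idiom from the text is used so that the sketch also works
// before C++11.
inline void shrink(std::string& str)
{
  // Build a right-sized temporary from the characters, then trade
  // buffers with it; the oversized buffer dies with the temporary.
  std::string(str.data(), str.size()).swap(str);
}
```

   Comparing s.capacity() before and after the call is an easy way to
   see what your implementation does.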

CString (MFC)

   A common lament seen in various newsgroups deals with the Standard
   string class as opposed to the Microsoft Foundation Class called
   CString.  Often programmers realize that a standard portable answer
   is better than a proprietary nonportable one, but in porting their
   application from a Win32 platform, they discover that they are
   relying on special functions offered by the CString class.

   Things are not as bad as they seem.  In this message, Joe Buck points
   out a few very important things:

   - The Standard string supports all the operations that CString does,
     with three exceptions.

   - Two of those exceptions (whitespace trimming and case conversion)
     are trivial to implement.  In fact, we do so on this page.

   - The third is CString::Format, which allows formatting in the style
     of sprintf.  This deserves some mention:

   The old libg++ library had a function called form(), which did much
   the same thing.  But for a Standard solution, you should use the
   stringstream classes.  These are the bridge between the iostream
   hierarchy and the string class, and they operate with regular streams
   seamlessly because they inherit from the iostream hierarchy.  A quick
   example:

   #include <iostream>
   #include <string>
   #include <sstream>

   std::string f (const std::string& incoming)     // incoming is "foo  N"
   {
       std::istringstream   incoming_stream(incoming);
       std::string          the_word;
       int                  the_number;

       incoming_stream >> the_word        // extract "foo"
                       >> the_number;     // extract N

       std::ostringstream   output_stream;
       output_stream << "The word was " << the_word
                     << " and 3*N was " << (3*the_number);

       return output_stream.str();
   }

   A serious problem with CString is a design bug in its memory
   allocation.  Specifically, quoting from that same message:

      CString suffers from a common programming error that results in
      poor performance.  Consider the following code:

      CString n_copies_of (const CString& foo, unsigned n)
      {
              CString tmp;
              for (unsigned i = 0; i < n; i++)
                      tmp += foo;
              return tmp;
      }

      This function is O(n^2), not O(n).  The reason is that each +=
      causes a reallocation and copy of the existing string.  Microsoft
      applications are full of this kind of thing (quadratic performance
      on tasks that can be done in linear time) -- on the other hand,
      we should be thankful, as it's created such a big market for high-end
      ix86 hardware. :-)

      If you replace CString with string in the above function, the
      performance is O(n).
   
451
   Joe Buck also pointed out some other things to keep in mind when
452
      comparing CString and the Standard string class:
453
   
454
      
455
         CString permits access to its internal representation; coders
456
             who exploited that may have problems moving to string.
457
         
458
         Microsoft ships the source to CString (in the files
459
             MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation
460
             bug and rebuild your MFC libraries.
461
             Note: It looks like the CString shipped
462
             with VC++6.0 has fixed this, although it may in fact have been
463
             one of the VC++ SPs that did it.
464
         
465
         string operations like this have O(n) complexity
466
             if the implementors do it correctly.  The libstdc++
467
             implementors did it correctly.  Other vendors might not.
468
         
469
         While chapters of the SGI STL are used in libstdc++, their
470
             string class is not.  The SGI string is essentially
471
             vector<char> and does not do any reference
472
             counting like libstdc++'s does.  (It is O(n), though.)
473
             So if you're thinking about SGI's string or rope classes,
474
             you're now looking at four possibilities:  CString, the
475
             libstdc++ string, the SGI string, and the SGI rope, and this
476
             is all before any allocator or traits customizations!  (More
477
             choices than you can shake a stick at -- want fries with that?)


© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.