From: herbs@cntc.com (Herb Sutter)
Subject: Guru of the Week #29: Solution
Date: 22 Jan 1998 00:00:00 GMT
Message-ID: <6a8q26$9qa@netlab.cs.rpi.edu>
Newsgroups: comp.lang.c++.moderated

 .--------------------------------------------------------------------.
 |  Guru of the Week problems and solutions are posted regularly on   |
 |   news:comp.lang.c++.moderated. For past problems and solutions    |
 |            see the GotW archive at http://www.cntc.com.            |
 | Is there a topic you'd like to see covered? mailto:herbs@cntc.com  |
 `--------------------------------------------------------------------'
 _______________________________________________________

 GotW #29:   Strings

 Difficulty: 7 / 10
 _______________________________________________________

>Write a ci_string class which is identical to the
>standard 'string' class, but is case-insensitive in the
>same way as the C function stricmp():

The "how can I make a case-insensitive string?"
question is so common that it probably deserves its own
FAQ -- hence this issue of GotW.

Note 1:  The stricmp() case-insensitive string
comparison function is not part of the C standard, but
it is a common extension on many C compilers.

Note 2:  What "case insensitive" actually means depends
entirely on your application and language.  For
example, many languages do not have "cases" at all, and
for languages that do, you have to decide whether you
want accented characters to compare equal to unaccented
characters, and so on.  This GotW provides guidance on
how to implement case-insensitivity for standard
strings in whatever sense applies to your particular
situation.

Here's what we want to achieve:

>   ci_string s( "AbCdE" );
>
>   // case insensitive
>   assert( s == "abcde" );
>   assert( s == "ABCDE" );
>
>   // still case-preserving, of course
>   assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
>   assert( strcmp( s.c_str(), "abcde" ) != 0 );

The key here is to understand what a "string" actually
is in standard C++.  If you look in your trusty string
header, you'll see something like this:

  typedef basic_string<char> string;

So string isn't really a class... it's a typedef of a
template.  In turn, the basic_string<> template is
declared as follows, in all its glory:

  template<class charT,
           class traits = char_traits<charT>,
           class Allocator = allocator<charT> >
      class basic_string;

So "string" really means "basic_string<char,
char_traits<char>, allocator<char> >".  We don't need
to worry about the allocator part, but the key here is
the char_traits part because char_traits defines how
characters interact and compare(!).

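As a quick check of that claim, the sketch below spells out all
three template arguments and asks the compiler to confirm that the
result is the very same type as std::string.  The alias name
spelled_out_string is made up for this illustration, and the std::
qualification assumes a compiler that puts the library in namespace
std.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical alias: basic_string with every default argument written out.
typedef std::basic_string<char,
                          std::char_traits<char>,
                          std::allocator<char> > spelled_out_string;

// Compiles only if the typedef above names exactly the same type as
// std::string, i.e. if "string" really is just basic_string<char>.
static_assert(std::is_same<std::string, spelled_out_string>::value,
              "std::string should be basic_string<char> with default arguments");
```

If the static_assert compiles, nothing is special about "string"
itself; all of its behaviour comes from the template arguments.
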
basic_string supplies useful comparison functions that
let you test whether a string is equal to another,
less than another, and so on.  These string comparison
functions are built on top of character comparison
functions supplied in the char_traits template.  In
particular, the char_traits template supplies character
comparison functions named eq(), ne(), and lt() for
equality, inequality, and less-than comparisons, and
compare() and find() functions to compare and search
sequences of characters.

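To make these building blocks concrete, here is a small sketch that
calls the default char_traits<char> functions directly.  The wrapper
names (same_char, char_before, compare_prefix, find_index) are
invented for this example.  Note that ne() is not a member of
std::char_traits in the published standard, so the sketch leaves it
out.

```cpp
#include <cstddef>
#include <string>

typedef std::char_traits<char> traits;

// Single-character comparisons: exact, so 'a' and 'A' are different.
inline bool same_char(char a, char b)   { return traits::eq(a, b); }
inline bool char_before(char a, char b) { return traits::lt(a, b); }

// Compare the first n characters of two sequences.
// The result is negative, zero, or positive, much like memcmp().
inline int compare_prefix(const char* a, const char* b, std::size_t n) {
    return traits::compare(a, b, n);
}

// Index of the first c among the first n characters of s, or -1 if absent.
inline std::ptrdiff_t find_index(const char* s, std::size_t n, char c) {
    const char* hit = traits::find(s, n, c);
    return hit ? hit - s : -1;
}
```

With the default traits, same_char('a', 'A') is false and
find_index("hello", 5, 'L') reports no match; that is exactly the
behaviour a case-insensitive traits class has to change.
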
If we want these to behave differently, all we have to
do is provide a different traits class in place of
char_traits<char>!  Here's the easiest way:

  struct ci_char_traits : public char_traits<char>
                // just inherit all the other functions
                //  that we don't need to override
  {
    static bool eq( char c1, char c2 ) {
      return tolower(c1) == tolower(c2);
    }

    static bool ne( char c1, char c2 ) {
      return tolower(c1) != tolower(c2);
    }

    static bool lt( char c1, char c2 ) {
      return tolower(c1) < tolower(c2);
    }

    static int compare( const char* s1,
                        const char* s2,
                        size_t n ) {
      return strnicmp( s1, s2, n );
             // if available on your compiler,
             //  otherwise you can roll your own
    }

    static const char*
    find( const char* s, int n, char a ) {
      while( n-- > 0 && tolower(*s) != tolower(a) ) {
          ++s;
      }
      return n >= 0 ? s : 0;
    }
  };

[N.B. A bug in the original code has been fixed for the
GCC documentation; the corrected code was taken from
Herb Sutter's book, Exceptional C++.]

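For compilers without strnicmp(), the "roll your own" route might
look like the sketch below.  It is one possible variant, not the
code from the book: the names portable_ci_traits, portable_ci_string
and fold are hypothetical, ne() is omitted because std::char_traits
has no such member, and characters are cast to unsigned char before
tolower() because passing a negative char value to tolower() is
undefined.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical variant of ci_char_traits that uses only standard calls.
struct portable_ci_traits : public std::char_traits<char> {
    // Lower-case one character; the cast keeps the argument non-negative.
    static int fold(char c) {
        return std::tolower(static_cast<unsigned char>(c));
    }

    static bool eq(char c1, char c2) { return fold(c1) == fold(c2); }
    static bool lt(char c1, char c2) { return fold(c1) <  fold(c2); }

    // Hand-rolled stand-in for strnicmp(): compares exactly n characters
    // and, unlike strnicmp(), does not stop at an embedded '\0'.
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }

    // First of the first n characters of s that matches a, or null.
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], a)) return s + i;
        }
        return 0;
    }
};

typedef std::basic_string<char, portable_ci_traits> portable_ci_string;
```

Taking n as size_t in find() matches the signature in
std::char_traits, so the function hides the inherited one cleanly.
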
And finally, the key that brings it all together:

  typedef basic_string<char, ci_char_traits> ci_string;

All we've done is create a typedef named "ci_string"
which operates exactly like the standard "string",
except that it uses ci_char_traits instead of
char_traits<char> to get its character comparison
rules.  Since we've handily made the ci_char_traits
rules case-insensitive, we've made ci_string itself
case-insensitive without any further surgery -- that
is, we have a case-insensitive string without having
touched basic_string at all!

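One practical consequence is that ci_string and string are distinct,
unrelated types, so values have to be converted explicitly when they
cross between the two.  The sketch below assumes a compact,
standard-only version of the traits class; the helper names to_std,
to_ci and meets_requirements are invented for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Compact stand-in for the traits class shown earlier (standard calls only).
struct ci_char_traits : public std::char_traits<char> {
    static int lower(char c) { return std::tolower(static_cast<unsigned char>(c)); }
    static bool eq(char a, char b) { return lower(a) == lower(b); }
    static bool lt(char a, char b) { return lower(a) <  lower(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (; n != 0; ++a, ++b, --n) {
            if (lt(*a, *b)) return -1;
            if (lt(*b, *a)) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (; n != 0; ++s, --n) {
            if (eq(*s, c)) return s;
        }
        return 0;
    }
};
typedef std::basic_string<char, ci_char_traits> ci_string;

// Explicit conversions: copy the characters, which preserves their case.
inline std::string to_std(const ci_string& s) { return std::string(s.data(), s.size()); }
inline ci_string   to_ci(const std::string& s) { return ci_string(s.data(), s.size()); }

// The checks from the problem statement; true if all of them hold.
inline bool meets_requirements() {
    ci_string s("AbCdE");
    return s == "abcde" && s == "ABCDE"
        && std::strcmp(s.c_str(), "AbCdE") == 0
        && std::strcmp(s.c_str(), "abcde") != 0;
}
```

For the same reason, writing a ci_string to a standard stream such
as cout generally means going through s.c_str() or a user-supplied
operator<<, because the stock stream inserter for basic_string
expects the string's traits to match the stream's.
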
This GotW should give you a flavour of how the
basic_string template works and how flexible it is in
practice.  If you want different comparisons than the
ones stricmp() and tolower() give you, just replace the
five functions shown above with your own code that
performs character comparisons the way that's
appropriate in your particular application.

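As an illustration of swapping in different rules, the sketch below
ignores case and also treats '_' and '-' as the same character.  The
rule itself and the names loose_traits, loose_string and key are
made up for this example; the point is only that every comparison
function goes through one mapping, so changing the rules means
changing that one function.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical rule set: ignore case, and treat '_' and '-' as one character.
struct loose_traits : public std::char_traits<char> {
    // Map each character to the value that is actually compared.
    static int key(char c) {
        if (c == '_') c = '-';
        return std::tolower(static_cast<unsigned char>(c));
    }

    static bool eq(char a, char b) { return key(a) == key(b); }
    static bool lt(char a, char b) { return key(a) <  key(b); }

    // Compare the first n characters under the mapping above.
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }

    // First of the first n characters of s equivalent to c, or null.
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], c)) return s + i;
        }
        return 0;
    }
};

typedef std::basic_string<char, loose_traits> loose_string;
```

With these traits, loose_string("Max_Size") compares equal to
"max-size" but not to "maxsize", and the stored characters are still
left untouched.
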
Exercise for the reader:

Is it safe to inherit ci_char_traits from
char_traits<char> this way?  Why or why not?