1 |
742 |
jeremybenn |
|
2 |
|
|
xml:id="std.containers" xreflabel="Containers">
|
3 |
|
|
|
4 |
|
|
|
5 |
|
|
|
6 |
|
|
Containers
|
7 |
|
|
Containers
|
8 |
|
|
|
9 |
|
|
|
10 |
|
|
|
11 |
|
|
ISO C++
|
12 |
|
|
|
13 |
|
|
|
14 |
|
|
library
|
15 |
|
|
|
16 |
|
|
|
17 |
|
|
|
18 |
|
|
|
19 |
|
|
|
20 |
|
|
|
21 |
|
|
|
22 |
|
|
|
23 |
|
|
|
24 |
|
|
|
25 |
|
|
|
26 |
|
|
|
27 |
|
|
|
28 |
|
|
|
29 |
|
|
|
30 |
|
|
|
31 |
|
|
|
32 |
|
|
Yes it is, and that's okay. This is a decision that we preserved
|
33 |
|
|
when we imported SGI's STL implementation. The following is
|
34 |
|
|
quoted from their FAQ:
|
35 |
|
|
|
36 |
|
|
|
37 |
|
|
|
38 |
|
|
The size() member function, for list and slist, takes time
|
39 |
|
|
proportional to the number of elements in the list. This was a
|
40 |
|
|
deliberate tradeoff. The only way to get a constant-time
|
41 |
|
|
size() for linked lists would be to maintain an extra member
|
42 |
|
|
variable containing the list's size. This would require taking
|
43 |
|
|
extra time to update that variable (it would make splice() a
|
44 |
|
|
linear time operation, for example), and it would also make the
|
45 |
|
|
list larger. Many list algorithms don't require that extra
|
46 |
|
|
word (algorithms that do require it might do better with
|
47 |
|
|
vectors than with lists), and, when it is necessary to maintain
|
48 |
|
|
an explicit size count, it's something that users can do
|
49 |
|
|
themselves.
|
50 |
|
|
|
51 |
|
|
|
52 |
|
|
This choice is permitted by the C++ standard. The standard says
|
53 |
|
|
that size() should be constant time, and
|
54 |
|
|
should does not mean the same thing as
|
55 |
|
|
shall . This is the officially recommended ISO
|
56 |
|
|
wording for saying that an implementation is supposed to do
|
57 |
|
|
something unless there is a good reason not to.
|
58 |
|
|
|
59 |
|
|
|
60 |
|
|
One implication of linear time size(): you should never write
|
61 |
|
|
|
62 |
|
|
|
63 |
|
|
if (L.size() == 0)
|
64 |
|
|
...
|
65 |
|
|
|
66 |
|
|
|
67 |
|
|
|
68 |
|
|
Instead, you should write
|
69 |
|
|
|
70 |
|
|
|
71 |
|
|
|
72 |
|
|
if (L.empty())
|
73 |
|
|
...
|
74 |
|
|
|
75 |
|
|
|
76 |
|
|
|
77 |
|
|
|
78 |
|
|
|
79 |
|
|
|
80 |
|
|
|
81 |
|
|
|
82 |
|
|
|
83 |
|
|
|
84 |
|
|
Space Overhead Management
|
85 |
|
|
|
86 |
|
|
|
87 |
|
|
In this
|
88 |
|
|
message to the list, Daniel Kostecky announced work on an
|
89 |
|
|
alternate form of std::vector that would support
|
90 |
|
|
hints on the number of elements to be over-allocated. The design
|
91 |
|
|
was also described, along with possible implementation choices.
|
92 |
|
|
|
93 |
|
|
|
94 |
|
|
The first two alpha releases were announced here
|
95 |
|
|
and here.
|
96 |
|
|
|
97 |
|
|
|
98 |
|
|
|
99 |
|
|
|
100 |
|
|
|
101 |
|
|
|
102 |
|
|
|
103 |
|
|
|
104 |
|
|
|
105 |
|
|
|
106 |
|
|
|
107 |
|
|
|
108 |
|
|
|
109 |
|
|
Section [23.1.2], Table 69, of the C++ standard lists this
|
110 |
|
|
function for all of the associative containers (map, set, etc):
|
111 |
|
|
|
112 |
|
|
|
113 |
|
|
a.insert(p,t);
|
114 |
|
|
|
115 |
|
|
|
116 |
|
|
where 'p' is an iterator into the container 'a', and 't' is the
|
117 |
|
|
item to insert. The standard says that t is
|
118 |
|
|
inserted as close as possible to the position just prior to
|
119 |
|
|
p . (Library DR #233 addresses this topic,
|
120 |
|
|
referring to N1780.
|
121 |
|
|
Since version 4.2 GCC implements the resolution to DR 233, so
|
122 |
|
|
that insertions happen as close as possible to the hint. For
|
123 |
|
|
earlier releases the hint was only used as described below.
|
124 |
|
|
|
125 |
|
|
|
126 |
|
|
Here we'll describe how the hinting works in the libstdc++
|
127 |
|
|
implementation, and what you need to do in order to take
|
128 |
|
|
advantage of it. (Insertions can change from logarithmic
|
129 |
|
|
complexity to amortized constant time, if the hint is properly
|
130 |
|
|
used.) Also, since the current implementation is based on the
|
131 |
|
|
SGI STL one, these points may hold true for other library
|
132 |
|
|
implementations also, since the HP/SGI code is used in a lot of
|
133 |
|
|
places.
|
134 |
|
|
|
135 |
|
|
|
136 |
|
|
In the following text, the phrases greater
|
137 |
|
|
than and less than refer to the
|
138 |
|
|
results of the strict weak ordering imposed on the container by
|
139 |
|
|
its comparison object, which defaults to (basically)
|
140 |
|
|
< . Using those phrases is semantically sloppy,
|
141 |
|
|
but I didn't want to get bogged down in syntax. I assume that if
|
142 |
|
|
you are intelligent enough to use your own comparison objects,
|
143 |
|
|
you are also intelligent enough to assign greater
|
144 |
|
|
and lesser their new meanings in the next
|
145 |
|
|
paragraph. *grin*
|
146 |
|
|
|
147 |
|
|
|
148 |
|
|
If the hint parameter ('p' above) is equivalent to:
|
149 |
|
|
|
150 |
|
|
|
151 |
|
|
|
152 |
|
|
|
153 |
|
|
begin() , then the item being inserted should
|
154 |
|
|
have a key less than all the other keys in the container.
|
155 |
|
|
The item will be inserted at the beginning of the container,
|
156 |
|
|
becoming the new entry at begin() .
|
157 |
|
|
|
158 |
|
|
|
159 |
|
|
|
160 |
|
|
|
161 |
|
|
end() , then the item being inserted should have
|
162 |
|
|
a key greater than all the other keys in the container. The
|
163 |
|
|
item will be inserted at the end of the container, becoming
|
164 |
|
|
the new entry before end() .
|
165 |
|
|
|
166 |
|
|
|
167 |
|
|
|
168 |
|
|
|
169 |
|
|
neither begin() nor end() , then:
|
170 |
|
|
Let h be the entry in the container pointed to
|
171 |
|
|
by hint , that is, h = *hint . Then
|
172 |
|
|
the item being inserted should have a key less than that of
|
173 |
|
|
h , and greater than that of the item preceding
|
174 |
|
|
h . The new item will be inserted between
|
175 |
|
|
h and h 's predecessor.
|
176 |
|
|
|
177 |
|
|
|
178 |
|
|
|
179 |
|
|
|
180 |
|
|
For multimap and multiset , the
|
181 |
|
|
restrictions are slightly looser: greater than
|
182 |
|
|
should be replaced by not less than and less
|
183 |
|
|
than should be replaced by not greater
|
184 |
|
|
than. (Why not replace greater with
|
185 |
|
|
greater-than-or-equal-to? You probably could in your head, but
|
186 |
|
|
the mathematicians will tell you that it isn't the same thing.)
|
187 |
|
|
|
188 |
|
|
|
189 |
|
|
If the conditions are not met, then the hint is not used, and the
|
190 |
|
|
insertion proceeds as if you had called a.insert(t)
|
191 |
|
|
instead. (Note that GCC releases
|
192 |
|
|
prior to 3.0.2 had a bug in the case with hint ==
|
193 |
|
|
begin() for the map and set
|
194 |
|
|
classes. You should not use a hint argument in those releases.)
|
195 |
|
|
|
196 |
|
|
|
197 |
|
|
This behavior goes well with other containers'
|
198 |
|
|
insert() functions which take an iterator: if used,
|
199 |
|
|
the new item will be inserted before the iterator passed as an
|
200 |
|
|
argument, same as the other containers.
|
201 |
|
|
|
202 |
|
|
|
203 |
|
|
Note also that the hint in this
|
204 |
|
|
implementation is a one-shot. The older insertion-with-hint
|
205 |
|
|
routines check the immediately surrounding entries to ensure that
|
206 |
|
|
the new item would in fact belong there. If the hint does not
|
207 |
|
|
point to the correct place, then no further local searching is
|
208 |
|
|
done; the search begins from scratch in logarithmic time.
|
209 |
|
|
|
210 |
|
|
|
211 |
|
|
|
212 |
|
|
|
213 |
|
|
|
214 |
|
|
|
215 |
|
|
|
216 |
|
|
|
217 |
|
|
|
218 |
|
|
|
219 |
|
|
No, you cannot write code of the form
|
220 |
|
|
|
221 |
|
|
|
222 |
|
|
|
223 |
|
|
#include <bitset>
|
224 |
|
|
|
225 |
|
|
void foo (size_t n)
|
226 |
|
|
{
|
227 |
|
|
std::bitset<n> bits;
|
228 |
|
|
....
|
229 |
|
|
}
|
230 |
|
|
|
231 |
|
|
|
232 |
|
|
because n must be known at compile time. Your
|
233 |
|
|
compiler is correct; it is not a bug. That's the way templates
|
234 |
|
|
work. (Yes, it is a feature.)
|
235 |
|
|
|
236 |
|
|
|
237 |
|
|
There are a couple of ways to handle this kind of thing. Please
|
238 |
|
|
consider all of them before passing judgement. They include, in
|
239 |
|
|
no chaptericular order:
|
240 |
|
|
|
241 |
|
|
|
242 |
|
|
A very large N in bitset<N> .
|
243 |
|
|
A container<bool>.
|
244 |
|
|
Extremely weird solutions.
|
245 |
|
|
|
246 |
|
|
|
247 |
|
|
A very large N in
|
248 |
|
|
bitset<N> . It has been
|
249 |
|
|
pointed out a few times in newsgroups that N bits only takes up
|
250 |
|
|
(N/8) bytes on most systems, and division by a factor of eight is
|
251 |
|
|
pretty impressive when speaking of memory. Half a megabyte given
|
252 |
|
|
over to a bitset (recall that there is zero space overhead for
|
253 |
|
|
housekeeping info; it is known at compile time exactly how large
|
254 |
|
|
the set is) will hold over four million bits. If you're using
|
255 |
|
|
those bits as status flags (e.g.,
|
256 |
|
|
changed /unchanged flags), that's a
|
257 |
|
|
lot of state.
|
258 |
|
|
|
259 |
|
|
|
260 |
|
|
You can then keep track of the maximum bit used
|
261 |
|
|
during some testing runs on representative data, make note of how
|
262 |
|
|
many of those bits really need to be there, and then reduce N to
|
263 |
|
|
a smaller number. Leave some extra space, of course. (If you
|
264 |
|
|
plan to write code like the incorrect example above, where the
|
265 |
|
|
bitset is a local variable, then you may have to talk your
|
266 |
|
|
compiler into allowing that much stack space; there may be zero
|
267 |
|
|
space overhead, but it's all allocated inside the object.)
|
268 |
|
|
|
269 |
|
|
|
270 |
|
|
A container<bool>. The
|
271 |
|
|
Committee made provision for the space savings possible with that
|
272 |
|
|
(N/8) usage previously mentioned, so that you don't have to do
|
273 |
|
|
wasteful things like Container<char> or
|
274 |
|
|
Container<short int> . Specifically,
|
275 |
|
|
vector<bool> is required to be specialized for
|
276 |
|
|
that space savings.
|
277 |
|
|
|
278 |
|
|
|
279 |
|
|
The problem is that vector<bool> doesn't
|
280 |
|
|
behave like a normal vector anymore. There have been
|
281 |
|
|
journal articles which discuss the problems (the ones by Herb
|
282 |
|
|
Sutter in the May and July/August 1999 issues of C++ Report cover
|
283 |
|
|
it well). Future revisions of the ISO C++ Standard will change
|
284 |
|
|
the requirement for vector<bool>
|
285 |
|
|
specialization. In the meantime, deque<bool>
|
286 |
|
|
is recommended (although its behavior is sane, you probably will
|
287 |
|
|
not get the space savings, but the allocation scheme is different
|
288 |
|
|
than that of vector).
|
289 |
|
|
|
290 |
|
|
|
291 |
|
|
Extremely weird solutions. If
|
292 |
|
|
you have access to the compiler and linker at runtime, you can do
|
293 |
|
|
something insane, like figuring out just how many bits you need,
|
294 |
|
|
then writing a temporary source code file. That file contains an
|
295 |
|
|
instantiation of bitset for the required number of
|
296 |
|
|
bits, inside some wrapper functions with unchanging signatures.
|
297 |
|
|
Have your program then call the compiler on that file using
|
298 |
|
|
Position Independent Code, then open the newly-created object
|
299 |
|
|
file and load those wrapper functions. You'll have an
|
300 |
|
|
instantiation of bitset<N> for the exact
|
301 |
|
|
N that you need at the time. Don't forget to delete
|
302 |
|
|
the temporary files. (Yes, this can be, and
|
303 |
|
|
has been, done.)
|
304 |
|
|
|
305 |
|
|
|
306 |
|
|
|
307 |
|
|
This would be the approach of either a visionary genius or a
|
308 |
|
|
raving lunatic, depending on your programming and management
|
309 |
|
|
style. Probably the latter.
|
310 |
|
|
|
311 |
|
|
|
312 |
|
|
Which of the above techniques you use, if any, are up to you and
|
313 |
|
|
your intended application. Some time/space profiling is
|
314 |
|
|
indicated if it really matters (don't just guess). And, if you
|
315 |
|
|
manage to do anything along the lines of the third category, the
|
316 |
|
|
author would love to hear from you...
|
317 |
|
|
|
318 |
|
|
|
319 |
|
|
Also note that the implementation of bitset used in libstdc++ has
|
320 |
|
|
some extensions.
|
321 |
|
|
|
322 |
|
|
|
323 |
|
|
|
324 |
|
|
|
325 |
|
|
|
326 |
|
|
|
327 |
|
|
|
328 |
|
|
|
329 |
|
|
Bitmasks do not take char* nor const char* arguments in their
|
330 |
|
|
constructors. This is something of an accident, but you can read
|
331 |
|
|
about the problem: follow the library's Links from
|
332 |
|
|
the homepage, and from the C++ information defect
|
333 |
|
|
reflector link, select the library issues list. Issue
|
334 |
|
|
number 116 describes the problem.
|
335 |
|
|
|
336 |
|
|
|
337 |
|
|
For now you can simply make a temporary string object using the
|
338 |
|
|
constructor expression:
|
339 |
|
|
|
340 |
|
|
|
341 |
|
|
std::bitset<5> b ( std::string(10110 ) );
|
342 |
|
|
|
343 |
|
|
|
344 |
|
|
|
345 |
|
|
instead of
|
346 |
|
|
|
347 |
|
|
|
348 |
|
|
|
349 |
|
|
std::bitset<5> b ( 10110 ); // invalid
|
350 |
|
|
|
351 |
|
|
|
352 |
|
|
|
353 |
|
|
|
354 |
|
|
|
355 |
|
|
|
356 |
|
|
|
357 |
|
|
|
358 |
|
|
|
359 |
|
|
|
360 |
|
|
|
361 |
|
|
|
362 |
|
|
|
363 |
|
|
|
364 |
|
|
You're writing some code and can't decide whether to use builtin
|
365 |
|
|
arrays or some kind of container. There are compelling reasons
|
366 |
|
|
to use one of the container classes, but you're afraid that
|
367 |
|
|
you'll eventually run into difficulties, change everything back
|
368 |
|
|
to arrays, and then have to change all the code that uses those
|
369 |
|
|
data types to keep up with the change.
|
370 |
|
|
|
371 |
|
|
|
372 |
|
|
If your code makes use of the standard algorithms, this isn't as
|
373 |
|
|
scary as it sounds. The algorithms don't know, nor care, about
|
374 |
|
|
the kind of container on which they work, since
|
375 |
|
|
the algorithms are only given endpoints to work with. For the
|
376 |
|
|
container classes, these are iterators (usually
|
377 |
|
|
begin() and end() , but not always).
|
378 |
|
|
For builtin arrays, these are the address of the first element
|
379 |
|
|
and the past-the-end element.
|
380 |
|
|
|
381 |
|
|
|
382 |
|
|
Some very simple wrapper functions can hide all of that from the
|
383 |
|
|
rest of the code. For example, a pair of functions called
|
384 |
|
|
beginof can be written, one that takes an array,
|
385 |
|
|
another that takes a vector. The first returns a pointer to the
|
386 |
|
|
first element, and the second returns the vector's
|
387 |
|
|
begin() iterator.
|
388 |
|
|
|
389 |
|
|
|
390 |
|
|
The functions should be made template functions, and should also
|
391 |
|
|
be declared inline. As pointed out in the comments in the code
|
392 |
|
|
below, this can lead to beginof being optimized out
|
393 |
|
|
of existence, so you pay absolutely nothing in terms of increased
|
394 |
|
|
code size or execution time.
|
395 |
|
|
|
396 |
|
|
|
397 |
|
|
The result is that if all your algorithm calls look like
|
398 |
|
|
|
399 |
|
|
|
400 |
|
|
std::transform(beginof(foo), endof(foo), beginof(foo), SomeFunction);
|
401 |
|
|
|
402 |
|
|
|
403 |
|
|
then the type of foo can change from an array of ints to a vector
|
404 |
|
|
of ints to a deque of ints and back again, without ever changing
|
405 |
|
|
any client code.
|
406 |
|
|
|
407 |
|
|
|
408 |
|
|
|
409 |
|
|
// beginof
|
410 |
|
|
template<typename T>
|
411 |
|
|
inline typename vector<T>::iterator
|
412 |
|
|
beginof(vector<T> &v)
|
413 |
|
|
{ return v.begin(); }
|
414 |
|
|
|
415 |
|
|
template<typename T, unsigned int sz>
|
416 |
|
|
inline T*
|
417 |
|
|
beginof(T (&array)[sz]) { return array; }
|
418 |
|
|
|
419 |
|
|
// endof
|
420 |
|
|
template<typename T>
|
421 |
|
|
inline typename vector<T>::iterator
|
422 |
|
|
endof(vector<T> &v)
|
423 |
|
|
{ return v.end(); }
|
424 |
|
|
|
425 |
|
|
template<typename T, unsigned int sz>
|
426 |
|
|
inline T*
|
427 |
|
|
endof(T (&array)[sz]) { return array + sz; }
|
428 |
|
|
|
429 |
|
|
// lengthof
|
430 |
|
|
template<typename T>
|
431 |
|
|
inline typename vector<T>::size_type
|
432 |
|
|
lengthof(vector<T> &v)
|
433 |
|
|
{ return v.size(); }
|
434 |
|
|
|
435 |
|
|
template<typename T, unsigned int sz>
|
436 |
|
|
inline unsigned int
|
437 |
|
|
lengthof(T (&)[sz]) { return sz; }
|
438 |
|
|
|
439 |
|
|
|
440 |
|
|
|
441 |
|
|
Astute readers will notice two things at once: first, that the
|
442 |
|
|
container class is still a vector<T> instead
|
443 |
|
|
of a more general Container<T> . This would
|
444 |
|
|
mean that three functions for deque would have to be
|
445 |
|
|
added, another three for list , and so on. This is
|
446 |
|
|
due to problems with getting template resolution correct; I find
|
447 |
|
|
it easier just to give the extra three lines and avoid confusion.
|
448 |
|
|
|
449 |
|
|
|
450 |
|
|
Second, the line
|
451 |
|
|
|
452 |
|
|
|
453 |
|
|
inline unsigned int lengthof (T (&)[sz]) { return sz; }
|
454 |
|
|
|
455 |
|
|
|
456 |
|
|
looks just weird! Hint: unused parameters can be left nameless.
|
457 |
|
|
|
458 |
|
|
|
459 |
|
|
|
460 |
|
|
|
461 |
|
|
|
462 |
|
|
|