1 |
742 |
jeremybenn |
|
2 |
|
|
xml:id="std.iterators" xreflabel="Iterators">
|
3 |
|
|
|
4 |
|
|
|
5 |
|
|
|
6 |
|
|
Iterators
|
7 |
|
|
Iterators
|
8 |
|
|
|
9 |
|
|
|
10 |
|
|
|
11 |
|
|
ISO C++
|
12 |
|
|
|
13 |
|
|
|
14 |
|
|
library
|
15 |
|
|
|
16 |
|
|
|
17 |
|
|
|
18 |
|
|
|
19 |
|
|
|
20 |
|
|
|
21 |
|
|
|
22 |
|
|
|
23 |
|
|
|
24 |
|
|
|
25 |
|
|
|
26 |
|
|
|
27 |
|
|
|
28 |
|
|
The following
|
29 |
|
|
FAQ entry points out that
|
30 |
|
|
iterators are not implemented as pointers. They are a generalization
|
31 |
|
|
of pointers, but they are implemented in libstdc++ as separate
|
32 |
|
|
classes.
|
33 |
|
|
|
34 |
|
|
|
35 |
|
|
Keeping that simple fact in mind as you design your code will
|
36 |
|
|
prevent a whole lot of difficult-to-understand bugs.
|
37 |
|
|
|
38 |
|
|
|
39 |
|
|
You can think of it the other way 'round, even. Since iterators
|
40 |
|
|
are a generalization, that means
|
41 |
|
|
that pointers are
|
42 |
|
|
iterators, and that pointers can be used
|
43 |
|
|
whenever an iterator would be. All those functions in the
|
44 |
|
|
Algorithms section of the Standard will work just as well on plain
|
45 |
|
|
arrays and their pointers.
|
46 |
|
|
|
47 |
|
|
|
48 |
|
|
That doesn't mean that when you pass in a pointer, it gets
|
49 |
|
|
wrapped into some special delegating iterator-to-pointer class
|
50 |
|
|
with a layer of overhead. (If you think that's the case
|
51 |
|
|
anywhere, you don't understand templates to begin with...) Oh,
|
52 |
|
|
no; if you pass in a pointer, then the compiler will instantiate
|
53 |
|
|
that template using T* as a type, and good old high-speed
|
54 |
|
|
pointer arithmetic as its operations, so the resulting code will
|
55 |
|
|
be doing exactly the same things as it would be doing if you had
|
56 |
|
|
hand-coded it yourself (for the 273rd time).
|
57 |
|
|
|
58 |
|
|
|
59 |
|
|
How much overhead is there when using an
|
60 |
|
|
iterator class? Very little. Most of the layering classes
|
61 |
|
|
contain nothing but typedefs, and typedefs are
|
62 |
|
|
"meta-information" that simply tell the compiler some
|
63 |
|
|
nicknames; they don't create code. That information gets passed
|
64 |
|
|
down through inheritance, so while the compiler has to do work
|
65 |
|
|
looking up all the names, your runtime code does not. (This has
|
66 |
|
|
been a prime concern from the beginning.)
|
67 |
|
|
|
68 |
|
|
|
69 |
|
|
|
70 |
|
|
|
71 |
|
|
|
72 |
|
|
|
73 |
|
|
|
74 |
|
|
|
75 |
|
|
This starts off sounding complicated, but is actually very easy,
|
76 |
|
|
especially towards the end. Trust me.
|
77 |
|
|
|
78 |
|
|
Beginners usually have a little trouble understand the whole
|
79 |
|
|
'past-the-end' thing, until they remember their early algebra classes
|
80 |
|
|
(see, they told you that stuff would come in handy!) and
|
81 |
|
|
the concept of half-open ranges.
|
82 |
|
|
|
83 |
|
|
First, some history, and a reminder of some of the funkier rules in
|
84 |
|
|
C and C++ for builtin arrays. The following rules have always been
|
85 |
|
|
true for both languages:
|
86 |
|
|
|
87 |
|
|
|
88 |
|
|
|
89 |
|
|
You can point anywhere in the array, or to the first element
|
90 |
|
|
past the end of the array. A pointer that points to one
|
91 |
|
|
past the end of the array is guaranteed to be as unique as a
|
92 |
|
|
pointer to somewhere inside the array, so that you can compare
|
93 |
|
|
such pointers safely.
|
94 |
|
|
|
95 |
|
|
|
96 |
|
|
|
97 |
|
|
You can only dereference a pointer that points into an array.
|
98 |
|
|
If your array pointer points outside the array -- even to just
|
99 |
|
|
one past the end -- and you dereference it, Bad Things happen.
|
100 |
|
|
|
101 |
|
|
|
102 |
|
|
|
103 |
|
|
Strictly speaking, simply pointing anywhere else invokes
|
104 |
|
|
undefined behavior. Most programs won't puke until such a
|
105 |
|
|
pointer is actually dereferenced, but the standards leave that
|
106 |
|
|
up to the platform.
|
107 |
|
|
|
108 |
|
|
|
109 |
|
|
|
110 |
|
|
The reason this past-the-end addressing was allowed is to make it
|
111 |
|
|
easy to write a loop to go over an entire array, e.g.,
|
112 |
|
|
while (*d++ = *s++);.
|
113 |
|
|
|
114 |
|
|
So, when you think of two pointers delimiting an array, don't think
|
115 |
|
|
of them as indexing 0 through n-1. Think of them as boundary
|
116 |
|
|
markers:
|
117 |
|
|
|
118 |
|
|
|
119 |
|
|
|
120 |
|
|
beginning end
|
121 |
|
|
| |
|
122 |
|
|
| | This is bad. Always having to
|
123 |
|
|
| | remember to add or subtract one.
|
124 |
|
|
| | Off-by-one bugs very common here.
|
125 |
|
|
V V
|
126 |
|
|
array of N elements
|
127 |
|
|
|---|---|--...--|---|---|
|
128 |
|
|
| 0 | 1 | ... |N-2|N-1|
|
129 |
|
|
|---|---|--...--|---|---|
|
130 |
|
|
|
131 |
|
|
^ ^
|
132 |
|
|
| |
|
133 |
|
|
| | This is good. This is safe. This
|
134 |
|
|
| | is guaranteed to work. Just don't
|
135 |
|
|
| | dereference 'end'.
|
136 |
|
|
beginning end
|
137 |
|
|
|
138 |
|
|
|
139 |
|
|
See? Everything between the boundary markers is chapter of the array.
|
140 |
|
|
Simple.
|
141 |
|
|
|
142 |
|
|
Now think back to your junior-high school algebra course, when you
|
143 |
|
|
were learning how to draw graphs. Remember that a graph terminating
|
144 |
|
|
with a solid dot meant, "Everything up through this point,"
|
145 |
|
|
and a graph terminating with an open dot meant, "Everything up
|
146 |
|
|
to, but not including, this point," respectively called closed
|
147 |
|
|
and open ranges? Remember how closed ranges were written with
|
148 |
|
|
brackets, [a,b], and open ranges were written with parentheses,
|
149 |
|
|
(a,b)?
|
150 |
|
|
|
151 |
|
|
The boundary markers for arrays describe a half-open range,
|
152 |
|
|
starting with (and including) the first element, and ending with (but
|
153 |
|
|
not including) the last element: [beginning,end). See, I
|
154 |
|
|
told you it would be simple in the end.
|
155 |
|
|
|
156 |
|
|
Iterators, and everything working with iterators, follows this same
|
157 |
|
|
time-honored tradition. A container's begin() method returns
|
158 |
|
|
an iterator referring to the first element, and its end()
|
159 |
|
|
method returns a past-the-end iterator, which is guaranteed to be
|
160 |
|
|
unique and comparable against any other iterator pointing into the
|
161 |
|
|
middle of the container.
|
162 |
|
|
|
163 |
|
|
Container constructors, container methods, and algorithms, all take
|
164 |
|
|
pairs of iterators describing a range of values on which to operate.
|
165 |
|
|
All of these ranges are half-open ranges, so you pass the beginning
|
166 |
|
|
iterator as the starting parameter, and the one-past-the-end iterator
|
167 |
|
|
as the finishing parameter.
|
168 |
|
|
|
169 |
|
|
This generalizes very well. You can operate on sub-ranges quite
|
170 |
|
|
easily this way; functions accepting a [first,last) range
|
171 |
|
|
don't know or care whether they are the boundaries of an entire {array,
|
172 |
|
|
sequence, container, whatever}, or whether they only enclose a few
|
173 |
|
|
elements from the center. This approach also makes zero-length
|
174 |
|
|
sequences very simple to recognize: if the two endpoints compare
|
175 |
|
|
equal, then the {array, sequence, container, whatever} is empty.
|
176 |
|
|
|
177 |
|
|
Just don't dereference end() .
|
178 |
|
|
|
179 |
|
|
|
180 |
|
|
|
181 |
|
|
|
182 |
|
|
|
183 |
|
|
|
184 |
|
|
|
185 |
|
|
|