OpenCores
URL https://opencores.org/ocsvn/scarts/scarts/trunk

Subversion Repositories scarts

[/] [scarts/] [trunk/] [toolchain/] [scarts-gcc/] [gcc-4.1.1/] [libstdc++-v3/] [docs/] [html/] [ext/] [pb_assoc/] [motivation.html] - Blame information for rev 20

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 20 jlechner
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2
<html>
3
    <head>
4
        <title>Motivation</title>
5
        <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
6
        <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
7
    </head>
8
<body bgcolor = "white">
9
<h1>Motivation</h1>
10
 
11
<p>
12
        The <a href = "introduction.html">Introduction</a> Section described some challenges
13
in designing associative containers. This section describes the STL's solution and motivation for an alternative solution. It is organized as follows.
14
</p>
15
 
16
<ol>
17
        <li> <a href = "#stl">The STL's Associative-Container Design</a>
18
        briefly describes the STL's solution.
19
        </li>
20
        <li> <a href = "#policies">Choice of Policies</a> discusses possible additional policies by which to parameterize data structures.
21
        </li>
22
        <li> <a href = "#ds_genericity">Data-Structure Genericity</a> discusses possible problems with generic manipulation of containers based on different underlying data-structures.
23
        </li>
24
        <li> <a href = "#mapping_semantics">Mapping Semantics</a> discusses scalability issues with the STL's non-unique-mapping associative containers.
25
        </li>
26
        <li> <a href = "#methods">Choice of Methods</a> discusses some reservations with the choice of methods in the STL.
27
        </li>
28
</ol>
29
 
30
<h2><a name = "stl">The STL's Associative-Container Design</a></h2>
31
 
32
<p>
33
        The STL (or its extensions) currently offer associative containers based on underlying red-black trees or collision-chaining hash tables. For association, containers based on trees are parameterized by a comparison functor, and containers based on hash tables are parameterized by a hash functor and an equivalence functor.
34
</p>
35
 
36
<p>
37
        For each underlying data-structure, the STL offers four containers with different mapping semantics. A map-type uniquely maps each key to some datum, a set-type stores uniquely keys, a multimap-type non-uniquely maps each key to some datum, and a multiset-type non-uniquely stores keys.
38
</p>
39
 
40
<p>
41
        Containers contain various iterator-based methods. <i>E.g.</i>, all containers have constructors taking a pair of iterators, and transactionally construct an object containing all elements in the iterators' range. Additionally, it is possible to (non-transactionally) insert a range given by iterators, or erase such a range. Other methods are implicitly range-based, <i>e.g.</i>, it is possible to test the equivalence of two associative container objects via <tt><b>operator</b>==</tt>.
42
</p>
43
 
44
<h2><a name = "policies">Choice of Policies</a></h2>
45
 
46
<p>
47
        In order to function efficiently in various settings, associative containers require
48
a wide variety of policies.
49
</p>
50
 
51
<p>
52
        For example, a hash policy instructs how to transform a key object into some non-negative integral type; <i>e.g.</i>, a hash functor might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A hash table, though, requires transforming each key object into some non-negative integral type in some specific domain; <i>e.g.</i>, a hash table with 128 entries might transform the <tt>"hello"</tt> into position 63. The policy by which the hash value is transformed into a position within the table can dramatically affect performance.
53
</p>
54
 
55
<p>
56
        Additionally, most hash-table algorithms encounter collisions. To mitigate the cost of these collisions, it sometimes is beneficial to store the hash value along with each element
57
[<a href = "references.html#clrs2001">clrs2001</a>, <a href = "references.html#austern01htprop">austern01htprop</a>]. While this improves performance for complex keys, it hampers performance for simple keys, and is best left as a policy.
58
</p>
59
 
60
<p>
61
        Tree-based containers allow reasonable access while maintaining order between elements. In some cases, however, tree-based containers can be used for additional purposes. <i>E.g.</i>,consider Figure
62
<a href = "#interval_invariants">
63
Sets of line intervals
64
</a>-A,
65
which shows
66
an example of a tree-based set storing
67
half-open geometric line intervals. An <tt>std::set</tt> with this
68
structure can efficiently answer whether <i>[20, 101)</i> is in the
69
set, but it cannot efficiently answer whether any interval in the
70
set overlaps <i>[20, 101)</i>, nor can it efficiently enumerate all
71
intervals overlapping <i>[20, 101)</i>. A well-known augmentation to
72
balanced trees can support efficient answers to such questions
73
[<a href = "references.html#clrs2001">clrs2001</a>]. Namely,
74
an invariant should be maintained whereby
75
each node should contain also the
76
maximal endpoint of any interval within its subtree, as in Figure
77
<a href = "#interval_invariants">
78
Sets of line intervals
79
</a>-B. In order to maintain this ivariant, though, an invariant-restoring policy is
80
required.
81
</p>
82
 
83
<h6 align = "center">
84
<a name = "interval_invariants">
85
<img src = "interval_node_invariants.jpg" width = "70%" alt = "no image">
86
</a>
87
</h6>
88
<h6 align = "center">
89
Sets of line intervals.
90
</h6>
91
 
92
 
93
<h2><a name = "ds_genericity">Data-Structure Genericity</a></h2>
94
 
95
<p>
96
        Consider a generic function manipulating an associative container, <i>e.g.</i>,
97
</p>
98
 
99
<pre>
100
<b>template</b>&lt;
101
        <b>class</b> Cntnr&gt;
102
<b>int</b> some_op_sequence
103
    (Cntnr &r_cnt)
104
{
105
        ...
106
}
107
</pre>
108
 
109
<p>
110
        The underlying data structure affects what the function can do with the container object.
111
</p>
112
 
113
<p>
114
 For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then the function can
115
use <tt>std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)</tt>
116
in order to apply <tt>foobar</tt> to all elements between <tt>foo</tt>
117
and <tt>bar</tt>. If <tt>Cntnr</tt> is a hash-based container, then this call's results are undefined.
118
</p>
119
 
120
<p>
121
        Also, if <tt>Cntnr</tt> is tree-based, the type and object of the comparison functor
122
can be accessed. If <tt>Cntnr</tt> is hash based, these queries are nonsensical</p>
123
 
124
<p>
125
        These types of problems are excaberated when considering the wide variety of useful underlying data-structures.         Figure
126
<a href = "#different_underlying_data_structures">Different underlying data structures</a>
127
shows different underlying data-structures (the ones
128
currently supported in <tt>pb_assoc</tt>). A shows a collision-chaining hash-table; B shows a probing hash-table; C shows a red-black tree; D shows a splay tree; E shows a tree based on an ordered vector (the tree is implicit in the order of the elements); E shows a list-based container with update policies.
129
</p>
130
 
131
<h6 align = "center">
132
<a name = "different_underlying_data_structures">
133
<img src = "different_underlying_dss.jpg" width = "70%" alt = "no image">
134
</a>
135
</h6>
136
<h6 align = "center">
137
Different underlying data structures.
138
</h6>
139
 
140
<p>
141
        These underlying data structures display different behavior. For one, they can be queried for different policies. Furthermore:
142
</p>
143
<ol>
144
        <li>
145
                Containers based on C, D, and E store eleents in a meaningful order; the others store elements in a meaningless (and probably time-varying) order. As a futher consequence, containers based on C, D, and E can support erase operations taking an iterator and returning an iterator to the following element; the others cannot.
146
        </li>
147
        <li>
148
                Containers based on C, D, and E can be split and joined efficiently, while the others cannot. Containers based on C and D, futhermore, can guarantee that this is exception-free; containers based on E cannot guarantee this.
149
        </li>
150
        <li>
151
                Containers based on all but E can guarantee that erasing an element is exception free; containers based on E cannot guarantee this. Containers based on all but B and E can guarantee that modifying an object of their type does not invalidate iterators or references to their elements, while contianers based on B and E cannot. Containers based on C, D, and E can futhermore make a stronger guarantee, namely that modifiying an object of their type does not affect the relation of iterators.
152
        </li>
153
</ol>
154
 
155
<p>
156
        A unified tag and traits system (as used for the STL's iterators, for example) can ease  generic manipulation of associative containers based on different underlying data-structures.
157
</p>
158
 
159
<h2><a name = "mapping_semantics">Mapping Semantics</a></h2>
160
 
161
        <p>
162
        In some cases, map and set semantics are inappropriate. <i>E.g.</i>, consider
163
an application monitoring user activity. Such an application might be designed to track a user, the machine(s) to which the user is logged, application(s) the user is running on the machine, and the start time of the application. In this case, since a user might run more than a single application, there can be no unique mapping from a user to specific datum.
164
        </p>
165
 
166
<p>
167
    The STL's non-unique mapping containers (<i>e.g.</i>,
168
<tt>std::multimap</tt> and <tt>std::multiset</tt>) can be used
169
in this case. These types of containers can store store two or more equivalent, non-identical keys [<a href = "references.html#kleft00sets">kleft00sets</a>]. Figure
170
<a href = "#embedded_lists_1">Non-unique mapping containers in the STL's design</a> shows possible structures of STL tree-based and hash-based containers, multisets, respectively; in this figure, equivalent-key nodes share the same shading.
171
</p>
172
 
173
<h6 align = "center">
174
<a name = "embedded_lists_1">
175
<img src = "embedded_lists_1.jpg" width = "70%" alt = "no image">
176
</a>
177
</h6>
178
<h6 align = "center">
179
Non-unique mapping containers in the STL's design.
180
</h6>
181
 
182
<p>
183
        This design has several advantages. Foremost, it allows maps and multimaps, and sets and multisets, to share the same <tt>value_type</tt>, easing generic manipulation of containers with different mapping semantics.
184
</p>
185
 
186
 
187
<p>
188
    Conversely, this design has possible scalability drawbacks, due to an implicit "embedding" of linked lists.
189
Figure
190
<a href = "#embedded_lists_2">
191
Embedded lists in STL multimaps
192
</a>-A shows a tree with shaded nodes sharing equivalent keys;
193
Figure
194
<a href = "#embedded_lists_2">
195
Embedded lists in STL multimaps
196
</a>-A explicitly shows the linked lists implicit in Figure
197
<a href = "#embedded_lists_1">Non-unique mapping containers in the STL's design</a>. The drawbacks are the following.
198
</p>
199
 
200
<ol>
201
    <li> As mentioned before, there are several underlying data-structures, each with its set of tradeoffs.
202
The STL's design uses an associative linked-list to store all elements with equivalent primary
203
key (<i>e.g.</i>, users). Searching for a secondary key (<i>e.g.</i>,
204
a process) is inherently linear. While this works reasonably well when the number of distinct secondary
205
keys is small, it does not scale well.
206
    </li>
207
    <li> Embedding linked lists can cause the entire structure to be inefficient.
208
<i>E.g.</i>, Figure
209
<a href = "#embedded_lists_1">
210
Effect of embedded lists in STL multimaps
211
</a>-A
212
    shows a tree with several shaded nodes containing equivalent keys; note how unbalanced
213
this tree would seem when considering all shaded nodes to be a single node.
214
Figure
215
<a href = "#embedded_lists_1">
216
Effect of embedded lists in STL multimaps
217
</a>-B shows a hash table with several shaded nodes containing equivalent keys; note
218
that this can lengthen the search for other nodes as well.
219
    </li>
220
    <li> Embdedding linked lists is only possible for some data structures.
221
Some data structures, <i>e.g.</i>, probing-hash tables, linear hash tables,
222
and extendible hash tables, cannot support it.
223
    </li>
224
    <li> The embedded linked list design forgoes the abilitiy to treat
225
all elements with the same primary key as a single entity. The ability to
226
efficiently simultaneously insert (or erase) a larger number of elements with
227
the same primary key is lost; the ability to utilize segmented iterators is lost
228
[<a href = "references.html#austern98segmented">austern98segmented</a>].
229
        </li>
230
        <li> The linked-list design uses much space. For one, in the above example, the data identifying will must be duplicated for each application run by the user. Furthermore, the "links" in the linked list are supplied by the underlying data structure. In the case of tree-based containers, for example, the linked list utilizes the fact that each tree node contains pointers to its parent and its children; given that the order of equivalent keys is meaningless, the number of pointers exceeds the functionality supplied by a linked list.
231
        </li>
232
</ol>
233
 
234
<h6 align = "center">
235
<a name = "embedded_lists_2">
236
<img src = "embedded_lists_2.jpg" width = "70d" alt = "no image">
237
</a>
238
</h6>
239
<h6 align = "center">
240
Embedded lists in STL multimaps.
241
</h6>
242
 
243
 
244
<h2><a name = "methods">Choice of Methods</a></h2>
245
 
246
<p>
247
        [<a href = "references.html#meyers02both">meyers02both</a>] points out
248
that a class's methods should comprise only operations which depend on the class's internal structure; other operations are best designed as external functions. Possibly, therefore, the STL's associative containers lack some useful methods, and provide some redundant methods.
249
</p>
250
 
251
<ol>
252
        <li>
253
                Possibly missing methods:
254
        </li>
255
        <ol>
256
                <li>
257
                        It is well-known that tree-based container objects can be efficiently split or joined
258
                [<a href = "references.html#clrs2001">clrs2001</a>]. Externally splitting or joining trees is super-linear, and, furthermore, can throw exceptions. Split and join methods, consequently, seem good choices for tree-based container methods.
259
                </li>
260
                <li>
261
                        Suppose all elements which match a certain criteria need to be erased from an
262
unordered container object, <i>e.g.</i>, all elements whos keys are in a given range. Externally erasing them from the container object is super-linear, since erasing an element might reorder all iterators. Conditional erasing, therefore, seems a good choice for associative containers.
263
                </li>
264
        </ol>
265
                <li> Possibly redundant methods:</li>
266
        <ol>
267
                <li>
268
                        STL associative containers provide methods for inserting a range of elements given by a pair of iterators. At best, this can be implemented as an external function, or, even more efficiently, as a join operation (for the case of tree-based containers). Moreover, these methods seem similar to constructors taking a range given by a pair of iterators; the constructors, however, are transactional, whereas the insert methods are not; this is possibly confusing.
269
                </li>
270
                <li>
271
                        STL associative containers provide methods for erasing a range of elements given by a pair of iterators. At best, this can be implemented as an external function, or, even more efficiently, as a (small) sequence of split and join operations (for the case of tree-based containers). Moreover, the results of erasing a range is undefined for the case of containers based on unordered data-structures.
272
                </li>
273
                <li>
274
                        Associative containers are parameterized by policies allowing to test keys, but not data, for equivalence. When comparing two associative container objects, it is at least as reasonable to expect that they are equivalent if both keys and data are equivalent, as it is reasonable to expect that they are equivalent if their keys only are equivalent. Furthermore, in different settings it makes sense that two objects are equivalent if they store keys in the same order, whereas in other settings order does not matter. The operators <tt>operator==</tt> and <tt>operator!=</tt> are not descriptive enough for these considerations.
275
                </li>
276
        </ol>
277
</ol>
278
 
279
</body>
280
 
281
</html>

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.