1 |
20 |
jlechner |
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
2 |
|
|
<html>
|
3 |
|
|
<head>
|
4 |
|
|
<title>Hash-Based Containers</title>
|
5 |
|
|
<meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
|
6 |
|
|
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
|
7 |
|
|
</head>
|
8 |
|
|
|
9 |
|
|
<body bgcolor = "white">
|
10 |
|
|
|
11 |
|
|
<h1>Hash-Based Containers</h1>
|
12 |
|
|
|
13 |
|
|
<p>
|
14 |
|
|
This section describes hash-based containers. It is organized
|
15 |
|
|
as follows.
|
16 |
|
|
</p>
|
17 |
|
|
|
18 |
|
|
<ol>
|
19 |
|
|
<li>
|
20 |
|
|
<a href = "#overview">Overview</a> is an overview.
|
21 |
|
|
</li>
|
22 |
|
|
<li>
|
23 |
|
|
<a href = "#hash_policies">Hash Policies</a> discusses
|
24 |
|
|
hash policies.
|
25 |
|
|
</li>
|
26 |
|
|
<li>
|
27 |
|
|
<a href = "#resize_policies">Resize Policies</a> discusses
|
28 |
|
|
resize policies.
|
29 |
|
|
</li>
|
30 |
|
|
<li>
|
31 |
|
|
<a href = "#policy_interaction">Policy Interaction</a> discusses
|
32 |
|
|
interaction between policies.
|
33 |
|
|
</li>
|
34 |
|
|
</ol>
|
35 |
|
|
|
36 |
|
|
|
37 |
|
|
|
38 |
|
|
<h2><a name = "overview">Overview</a></h2>
|
39 |
|
|
|
40 |
|
|
|
41 |
|
|
<p>
|
42 |
|
|
Figure
|
43 |
|
|
<a href = "#hash_cd">Hash-based containers</a>
|
44 |
|
|
shows the container-hierarchy; the hash-based containers are circled.
|
45 |
|
|
<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
|
46 |
|
|
is a collision-chaining hash-based container;
|
47 |
|
|
<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
|
48 |
|
|
is a (general) probing hash-based container.
|
49 |
|
|
</p>
|
50 |
|
|
|
51 |
|
|
<h6 align = "center">
|
52 |
|
|
<a name = "hash_cd">
|
53 |
|
|
<img src = "hash_cd.jpg" width = "70%" alt = "no image">
|
54 |
|
|
</h6>
|
55 |
|
|
<h6 align = "center">
|
56 |
|
|
</a>
|
57 |
|
|
Hash-based containers.
|
58 |
|
|
</h6>
|
59 |
|
|
|
60 |
|
|
<p>
|
61 |
|
|
The collision-chaining hash-based container has the following declaration.
|
62 |
|
|
</p>
|
63 |
|
|
<pre>
|
64 |
|
|
<b>template</b><
|
65 |
|
|
<b>typename</b> Key,
|
66 |
|
|
<b>typename</b> Data,
|
67 |
|
|
<b>class</b> Hash_Fn = std::hash<Key>,
|
68 |
|
|
<b>class</b> Eq_Fn = std::equal_to<Key>,
|
69 |
|
|
<b>class</b> Comb_Hash_Fn =
|
70 |
|
|
<a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a><>
|
71 |
|
|
<b>class</b> Resize_Policy = <i>default explained below.</i>
|
72 |
|
|
<b>bool</b> Store_Hash = <b>false</b>,
|
73 |
|
|
<b>class</b> Allocator =
|
74 |
|
|
std::allocator<<b>char</b>> >
|
75 |
|
|
<b>class</b> <a href = "cc_hash_assoc_cntnr.html">cc_hash_assoc_cntnr</a>;
|
76 |
|
|
</pre>
|
77 |
|
|
|
78 |
|
|
<p>
|
79 |
|
|
The parameters have the following meaning:
|
80 |
|
|
</p>
|
81 |
|
|
<ol>
|
82 |
|
|
<li> <tt>Key</tt> is the key type.
|
83 |
|
|
</li>
|
84 |
|
|
<li> <tt>Data</tt> is the data-policy, and is explained in
|
85 |
|
|
<a href = "ms_gen.html#ds_policy">Mapping-Semantics Genericity::Data Types as a Policy</a>.
|
86 |
|
|
</li>
|
87 |
|
|
<li> <tt>Hash_Fn</tt> is a key hashing functor.</li>
|
88 |
|
|
<li> <tt>Eq_Fn</tt> is a key equivalence functor.</li>
|
89 |
|
|
<li> <tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; it
|
90 |
|
|
describes how to translate hash values into positions within the table.
|
91 |
|
|
This is described in
|
92 |
|
|
<a name = "#hash_policies">Hash Policies</a>.</li>
|
93 |
|
|
</li>
|
94 |
|
|
<li> <tt>Resize_Policy</tt> describes how a container object should
|
95 |
|
|
change its internal size. This is described in
|
96 |
|
|
<a name = #resize_policies">Resize Policies</a>.</li>
|
97 |
|
|
<li> <tt>Store_Hash</tt> indicates whether the hash value should
|
98 |
|
|
be stored with each entry. This is described in
|
99 |
|
|
<a name = "#policy_interaction">Policy Interaction</a>.</li>
|
100 |
|
|
<li> <tt>Allocator</tt> is (surprisingly) an allocator type.
|
101 |
|
|
</li>
|
102 |
|
|
</ol>
|
103 |
|
|
|
104 |
|
|
<p>
|
105 |
|
|
The probing hash-based container has the following declaration.
|
106 |
|
|
</p>
|
107 |
|
|
<pre>
|
108 |
|
|
<b>template</b><
|
109 |
|
|
<b>typename</b> Key,
|
110 |
|
|
<b>typename</b> Data,
|
111 |
|
|
<b>class</b> Hash_Fn =
|
112 |
|
|
std::hash<
|
113 |
|
|
Key>,
|
114 |
|
|
<b>class</b> Eq_Fn =
|
115 |
|
|
std::equal_to<
|
116 |
|
|
Key>,
|
117 |
|
|
<b>class</b> Comb_Probe_Fn =
|
118 |
|
|
<a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a><>
|
119 |
|
|
<b>class</b> Probe_Fn = <i>default explained below.</i>
|
120 |
|
|
<b>class</b> Resize_Policy = <i>default explained below.</i>
|
121 |
|
|
<b>bool</b> Store_Hash = <b>false</b>,
|
122 |
|
|
<b>class</b> Allocator =
|
123 |
|
|
std::allocator<<b>char</b>> >
|
124 |
|
|
<b>class</b> <a href = "gp_hash_assoc_cntnr.html">gp_hash_assoc_cntnr</a>;
|
125 |
|
|
</pre>
|
126 |
|
|
|
127 |
|
|
<p>
|
128 |
|
|
The parameters are identical to those of the collision-chaining container, except
|
129 |
|
|
for the following.
|
130 |
|
|
</p>
|
131 |
|
|
<ol>
|
132 |
|
|
<li> <tt>Comb_Probe_Fn</tt> describes how to transform a probe sequence into
|
133 |
|
|
a sequence of positions within the table.
|
134 |
|
|
</li>
|
135 |
|
|
<li> <tt>Probe_Fn</tt> describes a probe sequence policy.</li>
|
136 |
|
|
</ol>
|
137 |
|
|
|
138 |
|
|
|
139 |
|
|
<p>
|
140 |
|
|
Some of the default template values depend on the values of other parameters,
|
141 |
|
|
and are explained in
|
142 |
|
|
<a name = "#policy_interaction">Policy Interaction</a>.
|
143 |
|
|
</p>
|
144 |
|
|
|
145 |
|
|
<h2><a name = "hash_policies">Hash Policies</h2></a>
|
146 |
|
|
<p>
|
147 |
|
|
This subsection describes hash policies. It is organized as follows:
|
148 |
|
|
</p>
|
149 |
|
|
<ol>
|
150 |
|
|
<li> <a href = "#general_terms">General Terms</a> describes
|
151 |
|
|
some general terms.
|
152 |
|
|
</li>
|
153 |
|
|
<li> <a href = "#range_hashing_fns">Range-Hashing Functions</a>
|
154 |
|
|
describes range-hasing functions.</li>
|
155 |
|
|
<li> <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a>
|
156 |
|
|
describes ranged-hash functions. </li>
|
157 |
|
|
<li> <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a>
|
158 |
|
|
describes the implementation of these concepts in <tt>pb_assoc</tt>.
|
159 |
|
|
</li>
|
160 |
|
|
</ol>
|
161 |
|
|
|
162 |
|
|
|
163 |
|
|
<h3><a name="general_terms">General Terms</a></h3>
|
164 |
|
|
|
165 |
|
|
<p>
|
166 |
|
|
There
|
167 |
|
|
are actually three functions involved in transforming a key into a hash-table's
|
168 |
|
|
position (see Figure
|
169 |
|
|
<a href = "#hash_ranged_hash_range_hashing_fns">
|
170 |
|
|
Hash runctions, ranged-hash functions, and range-hashing functions
|
171 |
|
|
</a>):
|
172 |
|
|
</p>
|
173 |
|
|
<ol>
|
174 |
|
|
<li>
|
175 |
|
|
A <i>ranged-hash</i> function, which maps keys into an interval of the
|
176 |
|
|
non-negative integrals. This is the function actually required by the
|
177 |
|
|
hash-table algorithm.
|
178 |
|
|
</li>
|
179 |
|
|
<li>
|
180 |
|
|
A hash function, which maps keys into non-negative integral types. This is
|
181 |
|
|
typically specified by the writer of the key class.
|
182 |
|
|
</li>
|
183 |
|
|
<li>
|
184 |
|
|
A <i>range-hashing</i> function, which maps non-negative integral types into an
|
185 |
|
|
interval of non-negative integral types.
|
186 |
|
|
</li>
|
187 |
|
|
</ol>
|
188 |
|
|
|
189 |
|
|
<h6 align = "center">
|
190 |
|
|
<a name = "hash_ranged_hash_range_hashing_fns">
|
191 |
|
|
<img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "70%" alt = "no image">
|
192 |
|
|
</h6>
|
193 |
|
|
<h6 align = "center">
|
194 |
|
|
</a>
|
195 |
|
|
Hash runctions, ranged-hash functions, and range-hashing functions.
|
196 |
|
|
</h6>
|
197 |
|
|
|
198 |
|
|
<p>
|
199 |
|
|
Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3
|
200 |
|
|
characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly"
|
201 |
|
|
into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral
|
202 |
|
|
value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i>
|
203 |
|
|
function
|
204 |
|
|
</p>
|
205 |
|
|
<p>
|
206 |
|
|
<i>f : U × Z<sub>+</sub> → Z<sub>+</sub> </i>,
|
207 |
|
|
</p>
|
208 |
|
|
<p>
|
209 |
|
|
such that for any <i>u</i> in <i>U</i>
|
210 |
|
|
,
|
211 |
|
|
</p>
|
212 |
|
|
<p>
|
213 |
|
|
<i>0 ≤ f(u, m) ≤ m - 1 </i>,
|
214 |
|
|
</p>
|
215 |
|
|
<p>
|
216 |
|
|
and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>].
|
217 |
|
|
One common solution is to use the composition of the hash function
|
218 |
|
|
</p>
|
219 |
|
|
<p>
|
220 |
|
|
<i>h : U → Z<sub>+</sub> </i>,
|
221 |
|
|
</p>
|
222 |
|
|
<p>
|
223 |
|
|
which maps elements of <i>U</i> into the non-negative integrals, and
|
224 |
|
|
</p>
|
225 |
|
|
<p>
|
226 |
|
|
<i>g : Z<sub>+</sub> × Z<sub>+</sub> → Z<sub>+</sub>, </i>
|
227 |
|
|
</p>
|
228 |
|
|
<p>
|
229 |
|
|
which maps a non-negative hash value, and a non-negative range upper-bound into
|
230 |
|
|
a non-negative integral in the range between 0 (inclusive) and the range upper
|
231 |
|
|
bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,
|
232 |
|
|
</p>
|
233 |
|
|
<p>
|
234 |
|
|
<i>0 ≤ g(r, m) ≤ m - 1 </i>.
|
235 |
|
|
</p>
|
236 |
|
|
<p>
|
237 |
|
|
The resulting ranged-hash function, is
|
238 |
|
|
</p>
|
239 |
|
|
<p>
|
240 |
|
|
<i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a>
|
241 |
|
|
</i>(1) .
|
242 |
|
|
</p>
|
243 |
|
|
|
244 |
|
|
<p>
|
245 |
|
|
From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can
|
246 |
|
|
always be composed (however the converse is not true). The STL's hash-based
|
247 |
|
|
containers allow specifying a hash function, and use a hard-wired range-hashing function; the ranged-hash function is implicitly composed.
|
248 |
|
|
</p>
|
249 |
|
|
|
250 |
|
|
|
251 |
|
|
<p>
|
252 |
|
|
The above describes the case where a key is to be mapped into a <i>single
|
253 |
|
|
position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table.
|
254 |
|
|
In other cases, a key is to be mapped into a <i>sequence of poisitions</i>
|
255 |
|
|
within a table, <i>e.g.</i>, in a probing table.
|
256 |
|
|
</p>
|
257 |
|
|
<p>
|
258 |
|
|
Similar terms apply in this case: the table requires a <i>ranged probe</i>
|
259 |
|
|
function, mapping a key into a sequence of positions withing the table. This is
|
260 |
|
|
typically acheived by composing a <i>hash function</i> mapping the key
|
261 |
|
|
into a non-negative integral type, a <i>probe</i> function transforming the
|
262 |
|
|
hash value into a sequence of hash values, and a <i>range-hashing</i> function
|
263 |
|
|
transforming the sequence of hash values into a sequence of positions.
|
264 |
|
|
</p>
|
265 |
|
|
|
266 |
|
|
|
267 |
|
|
<h3><a name="range_hashing_fns">Range-Hashing Functions</a></h3>
|
268 |
|
|
|
269 |
|
|
<p>
|
270 |
|
|
Some common choices for range-hashing functions are the division,
|
271 |
|
|
multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>],
|
272 |
|
|
defined as
|
273 |
|
|
</p>
|
274 |
|
|
<p>
|
275 |
|
|
<i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) ,
|
276 |
|
|
</p>
|
277 |
|
|
<p>
|
278 |
|
|
<i>g(r, m) = ⌈ u/v ( a r mod v ) ⌉ </i>,
|
279 |
|
|
</p>
|
280 |
|
|
<p>
|
281 |
|
|
and
|
282 |
|
|
</p>
|
283 |
|
|
<p>
|
284 |
|
|
<i>g(r, m) = ⌈ u/v ( r<sup>2</sup> mod v ) ⌉ </i>,
|
285 |
|
|
</p>
|
286 |
|
|
<p>
|
287 |
|
|
respectively, for some positive integrals <i>u</i> and <i>v</i> (typically
|
288 |
|
|
powers of 2), and some <i>a</i>. Each of these range-hashing functions works
|
289 |
|
|
best for some different setting.
|
290 |
|
|
</p>
|
291 |
|
|
<p>
|
292 |
|
|
The division method <a href="#division_method">(2)</a> is a very common
|
293 |
|
|
choice. However, even this single method can be implemented in two very
|
294 |
|
|
different ways. It is possible to implement <a href="#division_method">(2)</a>
|
295 |
|
|
using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low
|
296 |
|
|
level <i>&</i> (bit-mask) operation (for the case where <i>m</i> is a power of
|
297 |
|
|
2), <i>i.e.</i>,
|
298 |
|
|
</p>
|
299 |
|
|
<p>
|
300 |
|
|
<i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) ,
|
301 |
|
|
</p>
|
302 |
|
|
<p>
|
303 |
|
|
and
|
304 |
|
|
</p>
|
305 |
|
|
<p>
|
306 |
|
|
<a name="eqn:division_method_bit_mask">
|
307 |
|
|
<i>g(r, m) = r & m - 1, ( m = 2<sup>k</sup>
|
308 |
|
|
</i>
|
309 |
|
|
for some<i> k) </i></a>(4) ,
|
310 |
|
|
</p>
|
311 |
|
|
<p>
|
312 |
|
|
respectively.
|
313 |
|
|
</p>
|
314 |
|
|
<p>
|
315 |
|
|
The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a>
|
316 |
|
|
has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i>
|
317 |
|
|
is affected by all the bits of <i>r</i> (minimizing the chance of collision).
|
318 |
|
|
It has the disadvantage of using the costly modulo operation. This method is
|
319 |
|
|
hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>].
|
320 |
|
|
</p>
|
321 |
|
|
|
322 |
|
|
<p>
|
323 |
|
|
The <i>&</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a>
|
324 |
|
|
has the advantage of relying on the fast bitwise and operation. It has the
|
325 |
|
|
disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>.
|
326 |
|
|
This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>].
|
327 |
|
|
</p>
|
328 |
|
|
|
329 |
|
|
|
330 |
|
|
|
331 |
|
|
|
332 |
|
|
<h3><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h3>
|
333 |
|
|
|
334 |
|
|
<p>
|
335 |
|
|
In some less frequent cases it is beneficial to allow the client to
|
336 |
|
|
directly specify a ranged-hash hash function. It is true, that the writer of
|
337 |
|
|
the ranged-hash function cannot rely on the values of <i>m</i> having specific
|
338 |
|
|
numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]),
|
339 |
|
|
since the values of <i>m</i> are determined by a resize policy with possibly
|
340 |
|
|
orthogonal considerations.
|
341 |
|
|
</p>
|
342 |
|
|
|
343 |
|
|
<p>
|
344 |
|
|
There are two cases where a ranged-hash function can be superior. The firs is when using perfect hashing
|
345 |
|
|
[<a href="references.html#knuth98sorting">knuth98sorting</a>]; the second
|
346 |
|
|
is when the values of <i>m</i> can be used to estimate the
|
347 |
|
|
"general" number of distinct values required. This is described in the following.
|
348 |
|
|
</p>
|
349 |
|
|
|
350 |
|
|
<p>
|
351 |
|
|
Let
|
352 |
|
|
</p>
|
353 |
|
|
|
354 |
|
|
<p>
|
355 |
|
|
<i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i>
|
356 |
|
|
</p>
|
357 |
|
|
|
358 |
|
|
<p>
|
359 |
|
|
be a string of <i>t</i> characters, each of which is from domain <i>S</i>.
|
360 |
|
|
Consider the following ranged-hash function:
|
361 |
|
|
</p>
|
362 |
|
|
|
363 |
|
|
<p>
|
364 |
|
|
<a name="eqn:total_string_dna_hash">
|
365 |
|
|
<i>
|
366 |
|
|
f<sub>1</sub>(s, m) =
|
367 |
|
|
∑ <sub>i =
|
368 |
|
|
0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i>
|
369 |
|
|
</a> (5) ,
|
370 |
|
|
</p>
|
371 |
|
|
|
372 |
|
|
<p>
|
373 |
|
|
where <i>a</i> is some non-negative integral value. This is the standard
|
374 |
|
|
string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>].
|
375 |
|
|
Its advantage is that it takes into account all of the characters of the
|
376 |
|
|
string.
|
377 |
|
|
</p>
|
378 |
|
|
|
379 |
|
|
<p>
|
380 |
|
|
Now assume that <i>s</i> is the string representation of a of a long DNA
|
381 |
|
|
sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the
|
382 |
|
|
entire string might be prohibitively expensive. A possible alternative might be
|
383 |
|
|
to use only the first <i>k</i> characters of the string, where
|
384 |
|
|
</p>
|
385 |
|
|
|
386 |
|
|
<p>
|
387 |
|
|
k <sup>|S|</sup> ≥ m ,
|
388 |
|
|
</p>
|
389 |
|
|
<p>
|
390 |
|
|
<i>i.e.</i>, using the hash function
|
391 |
|
|
</p>
|
392 |
|
|
<p>
|
393 |
|
|
<a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k
|
394 |
|
|
- 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6)
|
395 |
|
|
</p>
|
396 |
|
|
<p>
|
397 |
|
|
requiring scanning over only
|
398 |
|
|
</p>
|
399 |
|
|
<p>
|
400 |
|
|
<i>k = </i>log<i><sub>4</sub>( m ) </i>
|
401 |
|
|
</p>
|
402 |
|
|
<p>
|
403 |
|
|
characters.
|
404 |
|
|
</p>
|
405 |
|
|
<p>
|
406 |
|
|
Other more elaborate hash-functions might scan <i>k</i> characters starting at
|
407 |
|
|
a random position (determined at each resize), or scanning <i>k</i> random
|
408 |
|
|
positions (determined at each resize), <i>i.e.</i>, using
|
409 |
|
|
</p>
|
410 |
|
|
<p>
|
411 |
|
|
<i>f<sub>3</sub>(s, m) = ∑ <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup>
|
412 |
|
|
s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>,
|
413 |
|
|
</p>
|
414 |
|
|
<p>
|
415 |
|
|
or
|
416 |
|
|
</p>
|
417 |
|
|
<p>
|
418 |
|
|
<i>f<sub>4</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub>
|
419 |
|
|
a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>,
|
420 |
|
|
</p>
|
421 |
|
|
<p>
|
422 |
|
|
<p>
|
423 |
|
|
respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the
|
424 |
|
|
(inclusive) range <i>[0,...,t-1]</i>.
|
425 |
|
|
</p>
|
426 |
|
|
|
427 |
|
|
|
428 |
|
|
<h3><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h3>
|
429 |
|
|
|
430 |
|
|
<p>
|
431 |
|
|
<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is
|
432 |
|
|
parameterized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a hash functor
|
433 |
|
|
and a combining hash functor, respectively.
|
434 |
|
|
</p>
|
435 |
|
|
|
436 |
|
|
<p>
|
437 |
|
|
For any hash functor except <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
|
438 |
|
|
one of the
|
439 |
|
|
<a href = "concepts.html#concepts_null_policies">Concepts::Null Policy Classes</a>,
|
440 |
|
|
then <tt>Comb_Hash_Fn</tt> is considered a range-hashing functor.
|
441 |
|
|
The container will synthesize a ranged-hash functor from both. For example, Figure
|
442 |
|
|
<a href = "#hash_range_hashing_seq_diagram">
|
443 |
|
|
Insert hash sequence diagram
|
444 |
|
|
</a>
|
445 |
|
|
shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
|
446 |
|
|
the container transforms the key into a non-negative integral using the hash
|
447 |
|
|
functor (points B and C), and transforms the result into a position
|
448 |
|
|
using the combining functor (points D and E).
|
449 |
|
|
</p>
|
450 |
|
|
|
451 |
|
|
<h6 align = "center">
|
452 |
|
|
<a name = "hash_range_hashing_seq_diagram">
|
453 |
|
|
<img src = "hash_range_hashing_seq_diagram.jpg" width = "70%" alt = "no image">
|
454 |
|
|
</a>
|
455 |
|
|
</h6>
|
456 |
|
|
<h6 align = "center">
|
457 |
|
|
Insert hash sequence diagram.
|
458 |
|
|
</h6>
|
459 |
|
|
|
460 |
|
|
|
461 |
|
|
<p>
|
462 |
|
|
<tt>pb_assoc</tt> contains the following range-hashing policies:
|
463 |
|
|
</p>
|
464 |
|
|
|
465 |
|
|
<ol>
|
466 |
|
|
<li>
|
467 |
|
|
<a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
|
468 |
|
|
and
|
469 |
|
|
<a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
|
470 |
|
|
are range-hashing functions based on a bit-mask and a modulo operation, respectively.
|
471 |
|
|
</li>
|
472 |
|
|
</ol>
|
473 |
|
|
|
474 |
|
|
|
475 |
|
|
<p>
|
476 |
|
|
If <tt>Comb_Hash_Fn</tt> is instantiated by
|
477 |
|
|
<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
|
478 |
|
|
and a combining-hash functor, the container treats
|
479 |
|
|
the combining hash functor as a ranged-hash function. For example, Figure
|
480 |
|
|
<a href = "#hash_range_hashing_seq_diagram2">
|
481 |
|
|
Insert hash sequence diagram with a null combination policy
|
482 |
|
|
</a>
|
483 |
|
|
shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
|
484 |
|
|
the container transforms the key into a position
|
485 |
|
|
using the combining functor (points B and C).
|
486 |
|
|
</p>
|
487 |
|
|
|
488 |
|
|
|
489 |
|
|
<h6 align = "center">
|
490 |
|
|
<a name = "hash_range_hashing_seq_diagram2">
|
491 |
|
|
<img src = "hash_range_hashing_seq_diagram2.jpg" width = "70%" alt = "no image">
|
492 |
|
|
</a>
|
493 |
|
|
</h6>
|
494 |
|
|
<h6 align = "center">
|
495 |
|
|
Insert hash sequence diagram with a null combination policy.
|
496 |
|
|
</h6>
|
497 |
|
|
|
498 |
|
|
<p>
|
499 |
|
|
Similarly,
|
500 |
|
|
<a href = "gp_hash_assoc_cntnr.html"></a><tt>gp_hash_assoc_cntnr</tt>
|
501 |
|
|
is parameterized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and
|
502 |
|
|
<tt>Comb_Probe_Fn</tt>. As before, if <tt>Probe_Fn</tt>
|
503 |
|
|
and <tt>Comb_Probe_Fn</tt> are, respectively,
|
504 |
|
|
<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and
|
505 |
|
|
<a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>,
|
506 |
|
|
then <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, <tt>Hash_Fn</tt>
|
507 |
|
|
is a hash functor, <tt>Probe_Fn</tt> is a functor for offsets from a hash value,
|
508 |
|
|
and <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a sequence of positions
|
509 |
|
|
within the table.
|
510 |
|
|
</p>
|
511 |
|
|
|
512 |
|
|
<p>
|
513 |
|
|
<tt>pb_assoc</tt> contains the following probe policies:
|
514 |
|
|
</p>
|
515 |
|
|
|
516 |
|
|
<ol>
|
517 |
|
|
<li>
|
518 |
|
|
<a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> is a linear probe
|
519 |
|
|
function.
|
520 |
|
|
</li>
|
521 |
|
|
<li>
|
522 |
|
|
<a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> is
|
523 |
|
|
a quadratic probe function.
|
524 |
|
|
</li>
|
525 |
|
|
</ol>
|
526 |
|
|
|
527 |
|
|
|
528 |
|
|
|
529 |
|
|
|
530 |
|
|
|
531 |
|
|
|
532 |
|
|
|
533 |
|
|
|
534 |
|
|
|
535 |
|
|
<h2><a name = "resize_policies">Resize Policies</a></h2>
|
536 |
|
|
|
537 |
|
|
<p>
|
538 |
|
|
This subsection describes resize policies. It is organized as follows:
|
539 |
|
|
</p>
|
540 |
|
|
|
541 |
|
|
<ol>
|
542 |
|
|
<li> <a href = "#general">General Terms</a> describes general
|
543 |
|
|
terms.
|
544 |
|
|
</li>
|
545 |
|
|
<li> <a href = "#size_policies">Size Policies</a> describes size
|
546 |
|
|
policies.
|
547 |
|
|
</li>
|
548 |
|
|
<li> <a href = "#trigger_policies">Trigger Policies</a> describes trigger
|
549 |
|
|
policies.
|
550 |
|
|
</li> <li> <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a>
|
551 |
|
|
describes the implementation of these concepts in <tt>pb_assoc</tt>.
|
552 |
|
|
</li>
|
553 |
|
|
</ol>
|
554 |
|
|
|
555 |
|
|
|
556 |
|
|
<h3><a name = "general">General Terms</a></h3>
|
557 |
|
|
|
558 |
|
|
<p>
|
559 |
|
|
Hash-tables, as opposed to trees, do not naturally grow or shrink. It
|
560 |
|
|
is necessary to specify policies to determine how and when a hash table should change
|
561 |
|
|
its size.
|
562 |
|
|
</p>
|
563 |
|
|
|
564 |
|
|
<p>
|
565 |
|
|
In general, resize policies can be decomposed into (probably orthogonal)
|
566 |
|
|
policies:
|
567 |
|
|
</p>
|
568 |
|
|
<ol>
|
569 |
|
|
<li> A <i>size policy</i> indicating <i>how</i> a hash table should
|
570 |
|
|
grow (<i>e.g.,</i> it should multiply by powers of 2).
|
571 |
|
|
</li>
|
572 |
|
|
<li> A <i>trigger policy</i> indicating <i>when</i> a hash table should
|
573 |
|
|
grow (<i>e.g.,</i> a load factor is exceeded).
|
574 |
|
|
</li>
|
575 |
|
|
</ol>
|
576 |
|
|
|
577 |
|
|
|
578 |
|
|
|
579 |
|
|
<h3><a name = "size_policies">Size Policies</a></h3>
|
580 |
|
|
|
581 |
|
|
<p>
|
582 |
|
|
Size policies determine how a hash table
|
583 |
|
|
changes size. These policies are simple, and there are relatively
|
584 |
|
|
few sensible options. An exponential-size policy (with the initial
|
585 |
|
|
size and growth factors both powers of 2) works well with a
|
586 |
|
|
mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
|
587 |
|
|
and is the
|
588 |
|
|
hard-wired policy used by Dinkumware
|
589 |
|
|
[<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A
|
590 |
|
|
prime-list based policy works well with a modulo-prime range
|
591 |
|
|
hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
|
592 |
|
|
and is the
|
593 |
|
|
hard-wired policy used by SGI's implementation
|
594 |
|
|
[<a href = "references.html#sgi_stl">sgi_stl</a>].
|
595 |
|
|
</p>
|
596 |
|
|
|
597 |
|
|
|
598 |
|
|
|
599 |
|
|
|
600 |
|
|
<h3><a name = "trigger_policies">Trigger Policies</a></h3>
|
601 |
|
|
|
602 |
|
|
<p>
|
603 |
|
|
Trigger policies determine when a hash table changes size.
|
604 |
|
|
Following is a description of two polcies: <i>load-check</i>
|
605 |
|
|
policies, and a collision-check policies.
|
606 |
|
|
</p>
|
607 |
|
|
|
608 |
|
|
<p>
|
609 |
|
|
Load-check policies are straightforward. The user
|
610 |
|
|
specifies two factors, <i>α<sub>min</sub></i> and <i>α<sub>max</sub></i>, and
|
611 |
|
|
the hash table maintains the invariant that
|
612 |
|
|
</p>
|
613 |
|
|
<p>
|
614 |
|
|
<i>
|
615 |
|
|
<a name = "eqn:load_factor_min_max">
|
616 |
|
|
α<sub>min</sub>
|
617 |
|
|
≤
|
618 |
|
|
(number of stored elements) / (hash-table size)
|
619 |
|
|
≤
|
620 |
|
|
α<sub>max</sub>
|
621 |
|
|
</a>
|
622 |
|
|
</i>
|
623 |
|
|
(1)
|
624 |
|
|
.
|
625 |
|
|
</p>
|
626 |
|
|
|
627 |
|
|
<p>
|
628 |
|
|
Collision-check policies work in the opposite direction of
|
629 |
|
|
load-check policies. They focus on keeping the number of
|
630 |
|
|
collisions moderate and hoping
|
631 |
|
|
that the size of the table will not grow very large,
|
632 |
|
|
instead of keeping a moderate load-factor and
|
633 |
|
|
hoping that the number of collisions will be small.
|
634 |
|
|
A
|
635 |
|
|
maximal collision-check policy resizes when the shortest
|
636 |
|
|
probe-sequence grows too large.
|
637 |
|
|
</p>
|
638 |
|
|
|
639 |
|
|
|
640 |
|
|
<p>
|
641 |
|
|
Consider Figure
|
642 |
|
|
<a href = "#balls_and_bins">Balls and bins</a>.
|
643 |
|
|
Let the size of the hash table be denoted by <i>m</i>, the
|
644 |
|
|
length of a probe sequence be denoted by <i>k</i>, and some load
|
645 |
|
|
factor be denoted by α. We would like to calculate the
|
646 |
|
|
minimal length of <i>k</i>, such that if there were <i>α m</i> elements
|
647 |
|
|
in the hash table, a probe sequence of length <i>k</i> would be found
|
648 |
|
|
with probability at most <i>1/m</i>.
|
649 |
|
|
</p>
|
650 |
|
|
|
651 |
|
|
<h6 align = "center">
|
652 |
|
|
<a name = "balls_and_bins">
|
653 |
|
|
<img src = "balls_and_bins.jpg" width = "70%" alt = "no image">
|
654 |
|
|
</a>
|
655 |
|
|
</h6>
|
656 |
|
|
<h6 align = "center">
|
657 |
|
|
Balls and bins.
|
658 |
|
|
</h6>
|
659 |
|
|
|
660 |
|
|
|
661 |
|
|
<p>
|
662 |
|
|
Denote the probability that a probe sequence of length <i>k</i>
|
663 |
|
|
appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence
|
664 |
|
|
of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution.
|
665 |
|
|
Then
|
666 |
|
|
</p>
|
667 |
|
|
<p>
|
668 |
|
|
<a name = "eqn:prob_of_p1">
|
669 |
|
|
<i>p<sub>1</sub>
|
670 |
|
|
= </i>(3)
|
671 |
|
|
</a>
|
672 |
|
|
</p>
|
673 |
|
|
<p>
|
674 |
|
|
<i>
|
675 |
|
|
<b>P</b>(l<sub>1</sub> ≥ k)
|
676 |
|
|
=
|
677 |
|
|
</i>
|
678 |
|
|
</p>
|
679 |
|
|
<p>
|
680 |
|
|
<i><b>P</b>(l<sub>1</sub> ≥ α ( 1 + k / α - 1 )
|
681 |
|
|
≤ </i>(a)
|
682 |
|
|
</p>
|
683 |
|
|
<p>
|
684 |
|
|
<i>
|
685 |
|
|
e
|
686 |
|
|
^
|
687 |
|
|
(
|
688 |
|
|
-
|
689 |
|
|
(
|
690 |
|
|
α ( k / α - 1 )<sup>2</sup>
|
691 |
|
|
)
|
692 |
|
|
/2
|
693 |
|
|
)
|
694 |
|
|
</i>
|
695 |
|
|
,
|
696 |
|
|
</p>
|
697 |
|
|
<p>
|
698 |
|
|
where (a) follows from the Chernoff bound
|
699 |
|
|
[<a href = "references.html#motwani95random">motwani95random</a>].
|
700 |
|
|
To
|
701 |
|
|
calculate the probability that <i>some</i> bin contains a probe
|
702 |
|
|
sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are
|
703 |
|
|
negatively-dependent
|
704 |
|
|
[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
|
705 |
|
|
Let <i><b>I</b>(.)</i>
|
706 |
|
|
denote the indicator function. Then
|
707 |
|
|
<p>
|
708 |
|
|
<a name = "eqn:at_least_k_i_n_some_bin">
|
709 |
|
|
<i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> ≥ k )
|
710 |
|
|
= </i>(3)
|
711 |
|
|
</a>
|
712 |
|
|
</p>
|
713 |
|
|
<p>
|
714 |
|
|
<i>
|
715 |
|
|
<b>P</b>
|
716 |
|
|
(
|
717 |
|
|
∑ <sub>i = 1</sub><sup>m</sup>
|
718 |
|
|
<b>I</b>(l<sub>i</sub> ≥ k) ≥ 1
|
719 |
|
|
)
|
720 |
|
|
=
|
721 |
|
|
</i>
|
722 |
|
|
</p>
|
723 |
|
|
<p>
|
724 |
|
|
<i>
|
725 |
|
|
<b>P</b>
|
726 |
|
|
(
|
727 |
|
|
∑ <sub>i = 1</sub><sup>m</sup>
|
728 |
|
|
<b>I</b>
|
729 |
|
|
(
|
730 |
|
|
l<sub>i</sub> ≥ k
|
731 |
|
|
)
|
732 |
|
|
≥
|
733 |
|
|
m p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 )
|
734 |
|
|
)
|
735 |
|
|
≤ </i>(a)
|
736 |
|
|
</p>
|
737 |
|
|
<p>
|
738 |
|
|
<i>
|
739 |
|
|
e
|
740 |
|
|
^
|
741 |
|
|
(
|
742 |
|
|
(
|
743 |
|
|
-
|
744 |
|
|
m p<sub>1</sub>
|
745 |
|
|
(
|
746 |
|
|
1 / (m p<sub>1</sub>) - 1
|
747 |
|
|
)
|
748 |
|
|
<sup>2</sup>
|
749 |
|
|
)
|
750 |
|
|
/
|
751 |
|
|
2
|
752 |
|
|
)
|
753 |
|
|
,
|
754 |
|
|
</i>
|
755 |
|
|
</p>
|
756 |
|
|
<p>
|
757 |
|
|
where (a) follows from the fact that the Chernoff bound can be
|
758 |
|
|
applied to negatively-dependent variables
|
759 |
|
|
[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
|
760 |
|
|
Inserting <a href = "#prob_of_p1">(2)</a> into
|
761 |
|
|
<a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>,
|
762 |
|
|
we obtain
|
763 |
|
|
</p>
|
764 |
|
|
<p>
|
765 |
|
|
<i>
|
766 |
|
|
k
|
767 |
|
|
~
|
768 |
|
|
√
|
769 |
|
|
(
|
770 |
|
|
2 α </i>ln<i> 2 m </i>ln<i>(m) )
|
771 |
|
|
)
|
772 |
|
|
</i>
|
773 |
|
|
.
|
774 |
|
|
</p>
|
775 |
|
|
|
776 |
|
|
|
777 |
|
|
|
778 |
|
|
|
779 |
|
|
|
780 |
|
|
|
781 |
|
|
|
782 |
|
|
|
783 |
|
|
|
784 |
|
|
<h3><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h3>
|
785 |
|
|
|
786 |
|
|
<p>
|
787 |
|
|
The resize policies in the previous subsection are conceptually straightforward. The design
|
788 |
|
|
of hash-based containers' size-related interface is complicated by some factors.
|
789 |
|
|
</p>
|
790 |
|
|
<ol>
|
791 |
|
|
<li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no
|
792 |
|
|
distinction between the number of entries the container holds and the number of entries it is using. This,
|
793 |
|
|
of course, is not the case for hash-based containers. Moreover, even describing the
|
794 |
|
|
"actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container
|
795 |
|
|
holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries.
|
796 |
|
|
</li>
|
797 |
|
|
<li>
|
798 |
|
|
The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy
|
799 |
|
|
maintains an invariant concerning the load factor of a container object. This is sometimes too rigid:
|
800 |
|
|
<ol>
|
801 |
|
|
<li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container).
|
802 |
|
|
</li>
|
803 |
|
|
<li>
|
804 |
|
|
In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy.
|
805 |
|
|
</li>
|
806 |
|
|
</ol>
|
807 |
|
|
</li>
|
808 |
|
|
<li>
|
809 |
|
|
Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a>
|
810 |
|
|
discusses the previous concepts.)
|
811 |
|
|
</li>
|
812 |
|
|
</ol>
|
813 |
|
|
|
814 |
|
|
<p>
|
815 |
|
|
Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers.
|
816 |
|
|
</p>
|
817 |
|
|
|
818 |
|
|
|
819 |
|
|
|
820 |
|
|
<p>
|
821 |
|
|
This library attempts to address these types of problems by delegating all size-related functionality to
|
822 |
|
|
policy classes. Hash-based containers
|
823 |
|
|
are parameterized by a resize-policy class (among others), and derive publicly from
|
824 |
|
|
the resize-policy class
|
825 |
|
|
[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]
|
826 |
|
|
<i>E.g.</i>, a collision-chaining
|
827 |
|
|
hash table is defined as follows:
|
828 |
|
|
</p>
|
829 |
|
|
<pre>
|
830 |
|
|
cc_ht_map<
|
831 |
|
|
<b>class</b> Key,
|
832 |
|
|
<b>class</b> Data,
|
833 |
|
|
...
|
834 |
|
|
<b>class</b> Resize_Policy
|
835 |
|
|
...> :
|
836 |
|
|
<b>public</b> Resize_Policy
|
837 |
|
|
</pre>
|
838 |
|
|
|
839 |
|
|
<p>
|
840 |
|
|
The containers themselves lack any functionality or public interface for manipulating sizes. A container
|
841 |
|
|
object merely forwards events to its resize policy object and queries it for needed actions.
|
842 |
|
|
</p>
|
843 |
|
|
|
844 |
|
|
<p>
|
845 |
|
|
Figure
|
846 |
|
|
<a href = "#insert_resize_sequence_diagram1">
|
847 |
|
|
Insert resize sequence diagram
|
848 |
|
|
</a>
|
849 |
|
|
shows a (possible) sequence diagram of an insert operation.
|
850 |
|
|
The user inserts an element; the hash table
|
851 |
|
|
notifies its resize policy that a search has started (point A);
|
852 |
|
|
in this case, a single collision is encountered - the table
|
853 |
|
|
notifies its resize policy of this (point B); the container
|
854 |
|
|
finally notifies its resize policy that the search has ended (point C);
|
855 |
|
|
it then queries its resize policy whether a resize is needed, and if so,
|
856 |
|
|
what is the new size (points D to G); following the resize, it notifies
|
857 |
|
|
the policy that a resize has completed (point H); finally, the element
|
858 |
|
|
is inserted, and the policy notified (point I).
|
859 |
|
|
</p>
|
860 |
|
|
|
861 |
|
|
<h6 align = "center">
|
862 |
|
|
<a name = "insert_resize_sequence_diagram1">
|
863 |
|
|
<img src = "insert_resize_sequence_diagram1.jpg" width = "70%" alt = "no image">
|
864 |
|
|
</a>
|
865 |
|
|
</h6>
|
866 |
|
|
<h6 align = "center">
|
867 |
|
|
Insert resize sequence diagram.
|
868 |
|
|
</h6>
|
869 |
|
|
|
870 |
|
|
<p>
|
871 |
|
|
This addresses, to some extent, the problems mentioned above:
|
872 |
|
|
</p>
|
873 |
|
|
<ol>
|
874 |
|
|
<li>
|
875 |
|
|
Different instantiations of range-hashing policies can be met with different instantiations of
|
876 |
|
|
resize policies.
|
877 |
|
|
</li>
|
878 |
|
|
<li>
|
879 |
|
|
Questions on size-related interface are avoided, since the containers have no size-related methods. Thus
|
880 |
|
|
a container has no method for querying its actual size. It merely continuously forwards enough information to
|
881 |
|
|
its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design the appropriate method. Also, a container has no methods for setting its size. It merely queries its
|
882 |
|
|
resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and
|
883 |
|
|
supports a <tt><b>protected virtual</b></tt> function for external resize.
|
884 |
|
|
</li>
|
885 |
|
|
</ol>
|
886 |
|
|
|
887 |
|
|
<p>
|
888 |
|
|
The library contains a single class for instantiating a resize policy,
|
889 |
|
|
<tt>pb_assoc</tt> contains
|
890 |
|
|
a standard resize policy,
|
891 |
|
|
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> (the name is explained shortly).
|
892 |
|
|
In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports
|
893 |
|
|
queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility).
|
894 |
|
|
([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for
|
895 |
|
|
changing between alternative interfaces at compile time.)
|
896 |
|
|
</p>
|
897 |
|
|
|
898 |
|
|
<p>
|
899 |
|
|
As noted before,
|
900 |
|
|
size and trigger policies are usually orthogonal.
|
901 |
|
|
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
|
902 |
|
|
is parameterized by size and trigger policies. For example,
|
903 |
|
|
a collision-chaining hash table
|
904 |
|
|
is typically be defined as follows:
|
905 |
|
|
</p>
|
906 |
|
|
<pre>
|
907 |
|
|
cc_ht_map<
|
908 |
|
|
key,
|
909 |
|
|
data,
|
910 |
|
|
...
|
911 |
|
|
hash_standard_resize_policy<
|
912 |
|
|
some_trigger_policy,
|
913 |
|
|
some_size_policy,
|
914 |
|
|
...> >
|
915 |
|
|
</pre>
|
916 |
|
|
|
917 |
|
|
<p>
|
918 |
|
|
The sole function of
|
919 |
|
|
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
|
920 |
|
|
is to
|
921 |
|
|
act as a standard delegator
|
922 |
|
|
[<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these
|
923 |
|
|
policies.
|
924 |
|
|
|
925 |
|
|
<p>
|
926 |
|
|
Figures
|
927 |
|
|
<a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a>
|
928 |
|
|
and
|
929 |
|
|
<a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a>
|
930 |
|
|
show sequence diagrams illustrating the interaction between
|
931 |
|
|
the standard resize policy and its trigger and size policies, respectively.
|
932 |
|
|
</p>
|
933 |
|
|
|
934 |
|
|
<h6 align = "center">
|
935 |
|
|
<a name = "insert_resize_sequence_diagram2">
|
936 |
|
|
<img src = "insert_resize_sequence_diagram2.jpg" width = "70%" alt = "no image">
|
937 |
|
|
</a>
|
938 |
|
|
</h6>
|
939 |
|
|
<h6 align = "center">
|
940 |
|
|
Standard resize policy trigger sequence diagram.
|
941 |
|
|
</h6>
|
942 |
|
|
|
943 |
|
|
<h6 align = "center">
|
944 |
|
|
<a name = "insert_resize_sequence_diagram3">
|
945 |
|
|
<img src = "insert_resize_sequence_diagram3.jpg" width = "70%" alt = "no image">
|
946 |
|
|
</a>
|
947 |
|
|
</h6>
|
948 |
|
|
<h6 align = "center">
|
949 |
|
|
Standard resize policy size sequence diagram.
|
950 |
|
|
</h6>
|
951 |
|
|
|
952 |
|
|
<p>
|
953 |
|
|
The library (currently) supports the following instantiations of size
|
954 |
|
|
and trigger policies:
|
955 |
|
|
</p>
|
956 |
|
|
|
957 |
|
|
<ol>
|
958 |
|
|
<li>
|
959 |
|
|
<a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> implements
|
960 |
|
|
a load check trigger policy.
|
961 |
|
|
</li>
|
962 |
|
|
<li>
|
963 |
|
|
<a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
|
964 |
|
|
implements a collision check trigger policy.
|
965 |
|
|
</li>
|
966 |
|
|
<li>
|
967 |
|
|
<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a> implemens
|
968 |
|
|
an exponential-size policy (which should be used with mask range hashing).
|
969 |
|
|
</li>
|
970 |
|
|
<li>
|
971 |
|
|
<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a> implementing
|
972 |
|
|
a size policy based on a sequence of primes
|
973 |
|
|
[<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing
|
974 |
|
|
</li>
|
975 |
|
|
</ol>
|
976 |
|
|
|
977 |
|
|
<p>
|
978 |
|
|
The trigger policies also support interfaces for changing their specifics which depend on compile time constants.
|
979 |
|
|
</p>
|
980 |
|
|
|
981 |
|
|
|
982 |
|
|
<p>
|
983 |
|
|
Figure
|
984 |
|
|
<a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture
|
985 |
|
|
of the resize-related classes.
|
986 |
|
|
<tt>Container</tt> (which stands for any of the hash-based containers) is parameterized
|
987 |
|
|
by <tt>Resize_Policy</tt>, from which it subclasses publicly
|
988 |
|
|
[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>].
|
989 |
|
|
This class is currently instantiated only by
|
990 |
|
|
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.
|
991 |
|
|
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> itself
|
992 |
|
|
is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>.
|
993 |
|
|
Currently, <tt>Trigger_Policy</tt> is instantiated by
|
994 |
|
|
<a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>,
|
995 |
|
|
or
|
996 |
|
|
<a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by
|
997 |
|
|
<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>,
|
998 |
|
|
or
|
999 |
|
|
<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.
|
1000 |
|
|
</p>
|
1001 |
|
|
|
1002 |
|
|
|
1003 |
|
|
<h6 align = "center">
|
1004 |
|
|
<a name = "resize_policy_cd">
|
1005 |
|
|
<img src = "resize_policy_cd.jpg" width = "70%" alt = "no image">
|
1006 |
|
|
</a>
|
1007 |
|
|
</h6>
|
1008 |
|
|
<h6 align = "center">
|
1009 |
|
|
Resize policy class diagram.
|
1010 |
|
|
</h6>
|
1011 |
|
|
|
1012 |
|
|
|
1013 |
|
|
|
1014 |
|
|
|
1015 |
|
|
<h2><a name = "#policy_interaction">Policy Interaction</a></h2>
|
1016 |
|
|
|
1017 |
|
|
<p>
|
1018 |
|
|
Hash-tables are unfortunately susceptible to choice of policies. One
|
1019 |
|
|
of the more complicated aspects of this is that poor combinations of good policies
|
1020 |
|
|
can alter performance drastically. Following are some considerations.
|
1021 |
|
|
</p>
|
1022 |
|
|
|
1023 |
|
|
|
1024 |
|
|
|
1025 |
|
|
|
1026 |
|
|
|
1027 |
|
|
<h3>Range-Hashing Policies and Resize Policies</h3>
|
1028 |
|
|
|
1029 |
|
|
<p>
|
1030 |
|
|
</p>
|
1031 |
|
|
|
1032 |
|
|
|
1033 |
|
|
<h3>Equivalence Functors, Storing Hash Values, and Hash Functions</h3>
|
1034 |
|
|
|
1035 |
|
|
|
1036 |
|
|
<p>
|
1037 |
|
|
<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
|
1038 |
|
|
and
|
1039 |
|
|
<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
|
1040 |
|
|
are parameterized by an equivalenc functor and by a <tt>Store_Hash</tt>
|
1041 |
|
|
parameter. If the latter parameter is <tt><b>true</b></tt>, then
|
1042 |
|
|
the container stores with each entry a hash value, and uses
|
1043 |
|
|
this value in case of collisions to determine whether to apply a hash value.
|
1044 |
|
|
This can lower the cost of collision for some types, but increase the cost of collisions for other types.
|
1045 |
|
|
</p>
|
1046 |
|
|
|
1047 |
|
|
<p>
|
1048 |
|
|
If a ranged-hash function or ranged probe function is directly supplied, however,
|
1049 |
|
|
then it makes no sense to store the hash value with each entry. <tt>pb_assoc</tt>'s container will fail at compilation, by design, if this is attempted.
|
1050 |
|
|
</p>
|
1051 |
|
|
|
1052 |
|
|
|
1053 |
|
|
|
1054 |
|
|
</body>
|
1055 |
|
|
|
1056 |
|
|
</html>
|