OpenCores
URL https://opencores.org/ocsvn/scarts/scarts/trunk

Subversion Repositories scarts

[/] [scarts/] [trunk/] [toolchain/] [scarts-gcc/] [gcc-4.1.1/] [libstdc++-v3/] [docs/] [html/] [ext/] [pb_assoc/] [hash_based_containers.html] - Blame information for rev 20

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 20 jlechner
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2
<html>
3
    <head>
4
        <title>Hash-Based Containers</title>
5
        <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
6
        <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
7
    </head>
8
 
9
<body bgcolor = "white">
10
 
11
<h1>Hash-Based Containers</h1>
12
 
13
<p>
14
    This section describes hash-based containers. It is organized
15
as follows.
16
</p>
17
 
18
<ol>
19
        <li>
20
                <a href = "#overview">Overview</a> is an overview.
21
        </li>
22
        <li>
23
                <a href = "#hash_policies">Hash Policies</a> discusses
24
        hash policies.
25
        </li>
26
        <li>
27
                <a href = "#resize_policies">Resize Policies</a> discusses
28
        resize policies.
29
        </li>
30
        <li>
31
                <a href = "#policy_interaction">Policy Interaction</a> discusses
32
        interaction between policies.
33
        </li>
34
</ol>
35
 
36
 
37
 
38
<h2><a name = "overview">Overview</a></h2>
39
 
40
 
41
<p>
42
        Figure
43
<a href = "#hash_cd">Hash-based containers</a>
44
        shows the container-hierarchy; the hash-based containers are circled.
45
<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
46
is a collision-chaining hash-based container;
47
<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
48
is a (general) probing hash-based container.
49
</p>
50
 
51
<h6 align = "center">
52
<a name = "hash_cd">
53
<img src = "hash_cd.jpg" width = "70%" alt = "no image">
54
</h6>
55
<h6 align = "center">
56
</a>
57
Hash-based containers.
58
</h6>
59
 
60
<p>
61
        The collision-chaining hash-based container has the following declaration.
62
</p>
63
<pre>
64
<b>template</b>&lt;
65
        <b>typename</b> Key,
66
        <b>typename</b> Data,
67
        <b>class</b> Hash_Fn = std::hash&lt;Key&gt;,
68
        <b>class</b> Eq_Fn = std::equal_to&lt;Key&gt;,
69
        <b>class</b> Comb_Hash_Fn =
70
                <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
71
        <b>class</b> Resize_Policy = <i>default explained below.</i>
72
        <b>bool</b> Store_Hash = <b>false</b>,
73
        <b>class</b> Allocator =
74
                std::allocator&lt;<b>char</b>&gt; &gt;
75
<b>class</b> <a href = "cc_hash_assoc_cntnr.html">cc_hash_assoc_cntnr</a>;
76
</pre>
77
 
78
<p>
79
        The parameters have the following meaning:
80
</p>
81
<ol>
82
        <li> <tt>Key</tt> is the key type.
83
        </li>
84
        <li> <tt>Data</tt> is the data-policy, and is explained in
85
<a href = "ms_gen.html#ds_policy">Mapping-Semantics Genericity::Data Types as a Policy</a>.
86
        </li>
87
        <li> <tt>Hash_Fn</tt> is a key hashing functor.</li>
88
        <li> <tt>Eq_Fn</tt> is a key equivalence functor.</li>
89
        <li> <tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; it
90
describes how to translate hash values into positions within the table.
91
This is described in
92
<a name = "#hash_policies">Hash Policies</a>.</li>
93
        </li>
94
        <li> <tt>Resize_Policy</tt> describes how a container object should
95
change its internal size. This is described in
96
<a name = #resize_policies">Resize Policies</a>.</li>
97
        <li> <tt>Store_Hash</tt> indicates whether the hash value should
98
be stored with each entry. This is described in
99
<a name = "#policy_interaction">Policy Interaction</a>.</li>
100
        <li> <tt>Allocator</tt> is (surprisingly) an allocator type.
101
        </li>
102
</ol>
103
 
104
<p>
105
        The probing hash-based container has the following declaration.
106
</p>
107
<pre>
108
<b>template</b>&lt;
109
        <b>typename</b> Key,
110
        <b>typename</b> Data,
111
        <b>class</b> Hash_Fn =
112
                std::hash&lt;
113
                        Key&gt;,
114
        <b>class</b> Eq_Fn =
115
                std::equal_to&lt;
116
                        Key&gt;,
117
        <b>class</b> Comb_Probe_Fn =
118
                <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
119
        <b>class</b> Probe_Fn = <i>default explained below.</i>
120
        <b>class</b> Resize_Policy = <i>default explained below.</i>
121
        <b>bool</b> Store_Hash = <b>false</b>,
122
        <b>class</b> Allocator =
123
                std::allocator&lt;<b>char</b>&gt; &gt;
124
<b>class</b> <a href = "gp_hash_assoc_cntnr.html">gp_hash_assoc_cntnr</a>;
125
</pre>
126
 
127
<p>
128
        The parameters are identical to those of the collision-chaining container, except
129
for the following.
130
</p>
131
<ol>
132
        <li> <tt>Comb_Probe_Fn</tt> describes how to transform a probe sequence into
133
a sequence of positions within the table.
134
        </li>
135
        <li> <tt>Probe_Fn</tt> describes a probe sequence policy.</li>
136
</ol>
137
 
138
 
139
<p>
140
        Some of the default template values depend on the values of other parameters,
141
and are explained in
142
<a name = "#policy_interaction">Policy Interaction</a>.
143
</p>
144
 
145
<h2><a name = "hash_policies">Hash Policies</h2></a>
146
<p>
147
    This subsection describes hash policies. It is organized as follows:
148
</p>
149
<ol>
150
    <li> <a href = "#general_terms">General Terms</a>  describes
151
            some general terms.
152
    </li>
153
    <li> <a href = "#range_hashing_fns">Range-Hashing Functions</a>
154
        describes range-hasing functions.</li>
155
    <li> <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a>
156
        describes ranged-hash functions. </li>
157
    <li> <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a>
158
            describes the implementation of these concepts in <tt>pb_assoc</tt>.
159
    </li>
160
</ol>
161
 
162
 
163
<h3><a name="general_terms">General Terms</a></h3>
164
 
165
<p>
166
    There
167
are actually three functions involved in transforming a key into a hash-table's
168
position (see Figure
169
<a href = "#hash_ranged_hash_range_hashing_fns">
170
Hash runctions, ranged-hash functions, and range-hashing functions
171
</a>):
172
</p>
173
<ol>
174
    <li>
175
        A <i>ranged-hash</i> function, which maps keys into an interval of the
176
        non-negative integrals. This is the function actually required by the
177
        hash-table algorithm.
178
    </li>
179
    <li>
180
        A hash function, which maps keys into non-negative integral types. This is
181
        typically specified by the writer of the key class.
182
    </li>
183
    <li>
184
        A <i>range-hashing</i> function, which maps non-negative integral types into an
185
        interval of non-negative integral types.
186
    </li>
187
</ol>
188
 
189
<h6 align = "center">
190
<a name = "hash_ranged_hash_range_hashing_fns">
191
<img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "70%" alt = "no image">
192
</h6>
193
<h6 align = "center">
194
</a>
195
Hash runctions, ranged-hash functions, and range-hashing functions.
196
</h6>
197
 
198
<p>
199
    Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3
200
    characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly"
201
    into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral
202
    value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i>
203
    function
204
</p>
205
<p>
206
    <i>f : U &times; Z<sub>+</sub> &rarr; Z<sub>+</sub> </i>,
207
</p>
208
<p>
209
    such that for any <i>u</i> in <i>U</i>
210
,
211
</p>
212
<p>
213
    <i>0 &le; f(u, m) &le; m - 1 </i>,
214
</p>
215
<p>
216
    and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>].
217
    One common solution is to use the composition of the hash function
218
</p>
219
<p>
220
    <i>h : U &rarr; Z<sub>+</sub> </i>,
221
</p>
222
<p>
223
    which maps elements of <i>U</i> into the non-negative integrals, and
224
</p>
225
<p>
226
    <i>g : Z<sub>+</sub> &times; Z<sub>+</sub> &rarr; Z<sub>+</sub>, </i>
227
</p>
228
<p>
229
    which maps a non-negative hash value, and a non-negative range upper-bound into
230
    a non-negative integral in the range between 0 (inclusive) and the range upper
231
    bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,
232
</p>
233
<p>
234
    <i>0 &le; g(r, m) &le; m - 1 </i>.
235
</p>
236
<p>
237
    The resulting ranged-hash function, is
238
</p>
239
<p>
240
    <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a>
241
    </i>(1) .
242
</p>
243
 
244
<p>
245
    From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can
246
    always be composed (however the converse is not true). The STL's hash-based
247
    containers allow specifying a hash function, and use a hard-wired range-hashing function; the ranged-hash function is implicitly composed.
248
</p>
249
 
250
 
251
<p>
252
    The above describes the case where a key is to be mapped into a <i>single
253
position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table.
254
In other cases, a key is to be mapped into a <i>sequence of poisitions</i>
255
within a table, <i>e.g.</i>, in a probing table.
256
</p>
257
<p>
258
    Similar terms apply in this case: the table requires a <i>ranged probe</i>
259
function, mapping a key into a sequence of positions withing the table. This is
260
typically acheived by composing a <i>hash function</i> mapping the key
261
into a non-negative integral type, a <i>probe</i> function transforming the
262
hash value into a sequence of hash values, and a <i>range-hashing</i> function
263
transforming the sequence of hash values into a sequence of positions.
264
</p>
265
 
266
 
267
<h3><a name="range_hashing_fns">Range-Hashing Functions</a></h3>
268
 
269
<p>
270
    Some common choices for range-hashing functions are the division,
271
    multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>],
272
    defined as
273
</p>
274
<p>
275
    <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) ,
276
</p>
277
<p>
278
    <i>g(r, m) = &lceil; u/v ( a r mod v ) &rceil; </i>,
279
</p>
280
<p>
281
    and
282
</p>
283
<p>
284
    <i>g(r, m) = &lceil; u/v ( r<sup>2</sup> mod v ) &rceil; </i>,
285
</p>
286
<p>
287
respectively, for some positive integrals <i>u</i> and <i>v</i> (typically
288
powers of 2), and some <i>a</i>. Each of these range-hashing functions works
289
best for some different setting.
290
</p>
291
<p>
292
    The division method <a href="#division_method">(2)</a> is a very common
293
    choice. However, even this single method can be implemented in two very
294
    different ways. It is possible to implement <a href="#division_method">(2)</a>
295
    using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low
296
    level <i>&amp;</i> (bit-mask) operation (for the case where <i>m</i> is a power of
297
    2), <i>i.e.</i>,
298
</p>
299
<p>
300
    <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) ,
301
</p>
302
<p>
303
    and
304
</p>
305
<p>
306
    <a name="eqn:division_method_bit_mask">
307
    <i>g(r, m) = r &amp; m - 1, ( m = 2<sup>k</sup>
308
    </i>
309
        for some<i> k) </i></a>(4) ,
310
</p>
311
<p>
312
    respectively.
313
</p>
314
<p>
315
    The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a>
316
    has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i>
317
    is affected by all the bits of <i>r</i> (minimizing the chance of collision).
318
    It has the disadvantage of using the costly modulo operation. This method is
319
    hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>].
320
</p>
321
 
322
<p>
323
    The <i>&amp;</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a>
324
    has the advantage of relying on the fast bitwise and operation. It has the
325
    disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>.
326
    This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>].
327
</p>
328
 
329
 
330
 
331
 
332
<h3><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h3>
333
 
334
<p>
335
    In some less frequent cases it is beneficial to allow the client to
336
directly specify a ranged-hash hash function. It is true, that the writer of
337
the ranged-hash function cannot rely on the values of <i>m</i> having specific
338
numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]),
339
since the values of <i>m</i> are determined by a resize policy with possibly
340
orthogonal considerations.
341
</p>
342
 
343
<p>
344
        There are two cases where a ranged-hash function can be superior. The firs is when using perfect hashing
345
[<a href="references.html#knuth98sorting">knuth98sorting</a>]; the second
346
is when the values of <i>m</i> can be used to estimate the
347
"general" number of distinct values required. This is described in the following.
348
</p>
349
 
350
<p>
351
    Let
352
</p>
353
 
354
<p>
355
    <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i>
356
</p>
357
 
358
<p>
359
    be a string of <i>t</i> characters, each of which is from domain <i>S</i>.
360
Consider the following ranged-hash function:
361
</p>
362
 
363
<p>
364
    <a name="eqn:total_string_dna_hash">
365
        <i>
366
            f<sub>1</sub>(s, m) =
367
            &sum; <sub>i =
368
            0</sub><sup>t   - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i>
369
    </a> (5) ,
370
</p>
371
 
372
<p>
373
    where <i>a</i> is some non-negative integral value. This is the standard
374
string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>].
375
Its advantage is that it takes into account all of the characters of the
376
string.
377
</p>
378
 
379
<p>
380
    Now assume that <i>s</i> is the string representation of a of a long DNA
381
sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the
382
entire string might be prohibitively expensive. A possible alternative might be
383
to use only the first <i>k</i> characters of the string, where
384
</p>
385
 
386
<p>
387
    k <sup>|S|</sup> &ge; m ,
388
</p>
389
<p>
390
    <i>i.e.</i>, using the hash function
391
</p>
392
<p>
393
    <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k
394
                - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6)
395
</p>
396
<p>
397
    requiring scanning over only
398
</p>
399
<p>
400
    <i>k = </i>log<i><sub>4</sub>( m ) </i>
401
</p>
402
<p>
403
    characters.
404
</p>
405
<p>
406
    Other more elaborate hash-functions might scan <i>k</i> characters starting at
407
    a random position (determined at each resize), or scanning <i>k</i> random
408
    positions (determined at each resize), <i>i.e.</i>, using
409
</p>
410
<p>
411
    <i>f<sub>3</sub>(s, m) = &sum; <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup>
412
        s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>,
413
</p>
414
<p>
415
    or
416
</p>
417
<p>
418
    <i>f<sub>4</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub>
419
        a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>,
420
</p>
421
<p>
422
<p>
423
    respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the
424
    (inclusive) range <i>[0,...,t-1]</i>.
425
</p>
426
 
427
 
428
<h3><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h3>
429
 
430
<p>
431
<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is
432
parameterized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a hash functor
433
and a combining hash functor, respectively.
434
</p>
435
 
436
<p>
437
        For any hash functor except <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
438
one of the
439
<a href = "concepts.html#concepts_null_policies">Concepts::Null Policy Classes</a>,
440
then <tt>Comb_Hash_Fn</tt> is considered a range-hashing functor.
441
The container will synthesize a ranged-hash functor from both. For example, Figure
442
<a href = "#hash_range_hashing_seq_diagram">
443
Insert hash sequence diagram
444
</a>
445
shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
446
the container transforms the key into a non-negative integral using the hash
447
functor (points B and C), and transforms the result into a position
448
using the combining functor (points D and E).
449
</p>
450
 
451
<h6 align = "center">
452
<a name = "hash_range_hashing_seq_diagram">
453
<img src = "hash_range_hashing_seq_diagram.jpg" width = "70%" alt = "no image">
454
</a>
455
</h6>
456
<h6 align = "center">
457
Insert hash sequence diagram.
458
</h6>
459
 
460
 
461
<p>
462
    <tt>pb_assoc</tt> contains the following range-hashing policies:
463
</p>
464
 
465
<ol>
466
    <li>
467
<a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
468
and
469
<a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
470
are range-hashing functions based on a bit-mask and a modulo operation, respectively.
471
    </li>
472
</ol>
473
 
474
 
475
<p>
476
    If <tt>Comb_Hash_Fn</tt> is instantiated by
477
<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
478
and a combining-hash functor, the container treats
479
the combining hash functor as a ranged-hash function. For example, Figure
480
<a href = "#hash_range_hashing_seq_diagram2">
481
Insert hash sequence diagram with a null combination policy
482
</a>
483
shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
484
the container transforms the key into a position
485
using the combining functor (points B and C).
486
</p>
487
 
488
 
489
<h6 align = "center">
490
<a name = "hash_range_hashing_seq_diagram2">
491
<img src = "hash_range_hashing_seq_diagram2.jpg" width = "70%" alt = "no image">
492
</a>
493
</h6>
494
<h6 align = "center">
495
Insert hash sequence diagram with a null combination policy.
496
</h6>
497
 
498
<p>
499
        Similarly,
500
<a href = "gp_hash_assoc_cntnr.html"></a><tt>gp_hash_assoc_cntnr</tt>
501
is parameterized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and
502
<tt>Comb_Probe_Fn</tt>. As before, if <tt>Probe_Fn</tt>
503
and <tt>Comb_Probe_Fn</tt> are, respectively,
504
<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and
505
<a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>,
506
then <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, <tt>Hash_Fn</tt>
507
is a hash functor, <tt>Probe_Fn</tt> is a functor for offsets from a hash value,
508
and <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a sequence of positions
509
within the table.
510
</p>
511
 
512
<p>
513
        <tt>pb_assoc</tt> contains the following probe policies:
514
</p>
515
 
516
<ol>
517
    <li>
518
<a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> is a linear probe
519
function.
520
    </li>
521
        <li>
522
<a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> is
523
a quadratic probe function.
524
    </li>
525
</ol>
526
 
527
 
528
 
529
 
530
 
531
 
532
 
533
 
534
 
535
<h2><a name = "resize_policies">Resize Policies</a></h2>
536
 
537
<p>
538
    This subsection describes resize policies. It is organized as follows:
539
</p>
540
 
541
<ol>
542
    <li> <a href = "#general">General Terms</a> describes general
543
        terms.
544
    </li>
545
    <li> <a href = "#size_policies">Size Policies</a>  describes size
546
    policies.
547
    </li>
548
    <li> <a href = "#trigger_policies">Trigger Policies</a>  describes trigger
549
    policies.
550
    </li>   <li> <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a>
551
         describes the implementation of these concepts in <tt>pb_assoc</tt>.
552
    </li>
553
</ol>
554
 
555
 
556
<h3><a name = "general">General Terms</a></h3>
557
 
558
<p>
559
    Hash-tables, as opposed to trees, do not naturally grow or shrink. It
560
is necessary to specify policies to determine how and when a hash table should change
561
its size.
562
</p>
563
 
564
<p>
565
    In general, resize policies can be decomposed into (probably orthogonal)
566
policies:
567
</p>
568
<ol>
569
    <li> A <i>size policy</i> indicating <i>how</i> a hash table should
570
grow (<i>e.g.,</i> it should multiply by powers of 2).
571
    </li>
572
    <li> A <i>trigger policy</i> indicating <i>when</i> a hash table should
573
grow (<i>e.g.,</i> a load factor is exceeded).
574
    </li>
575
</ol>
576
 
577
 
578
 
579
<h3><a name = "size_policies">Size Policies</a></h3>
580
 
581
<p>
582
    Size policies determine how a hash table
583
changes size. These policies are simple, and there are relatively
584
few sensible options. An exponential-size policy (with the initial
585
size and growth factors both powers of 2) works well with a
586
mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
587
and is the
588
hard-wired policy used by Dinkumware
589
[<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A
590
prime-list based policy works well with a modulo-prime range
591
hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
592
and is the
593
hard-wired policy used by SGI's implementation
594
[<a href = "references.html#sgi_stl">sgi_stl</a>].
595
</p>
596
 
597
 
598
 
599
 
600
<h3><a name = "trigger_policies">Trigger Policies</a></h3>
601
 
602
<p>
603
    Trigger policies determine when a hash table changes size.
604
Following is a description of two polcies: <i>load-check</i>
605
policies, and a collision-check policies.
606
</p>
607
 
608
<p>
609
    Load-check policies are straightforward. The user
610
specifies two factors, <i>&alpha;<sub>min</sub></i> and <i>&alpha;<sub>max</sub></i>, and
611
the hash table maintains the invariant that
612
</p>
613
<p>
614
    <i>
615
        <a name = "eqn:load_factor_min_max">
616
        &alpha;<sub>min</sub>
617
        &le;
618
        (number of stored elements) / (hash-table size)
619
        &le;
620
        &alpha;<sub>max</sub>
621
        </a>
622
    </i>
623
    (1)
624
    .
625
</p>
626
 
627
<p>
628
    Collision-check policies work in the opposite direction of
629
load-check policies. They focus on keeping the number of
630
collisions moderate and hoping
631
that the size of the table will not grow very large,
632
instead of keeping a moderate load-factor and
633
hoping that the number of collisions will be small.
634
A
635
maximal collision-check policy resizes when the shortest
636
probe-sequence grows too large.
637
</p>
638
 
639
 
640
<p>
641
    Consider Figure
642
<a href = "#balls_and_bins">Balls and bins</a>.
643
    Let the size of the hash table be denoted by <i>m</i>, the
644
length of a probe sequence be denoted by <i>k</i>, and some load
645
factor be denoted by &alpha;. We would like to calculate the
646
minimal length of <i>k</i>, such that if there were <i>&alpha; m</i> elements
647
in the hash table, a probe sequence of length <i>k</i> would be found
648
with probability at most <i>1/m</i>.
649
</p>
650
 
651
<h6 align = "center">
652
<a name = "balls_and_bins">
653
<img src = "balls_and_bins.jpg" width = "70%" alt = "no image">
654
</a>
655
</h6>
656
<h6 align = "center">
657
Balls and bins.
658
</h6>
659
 
660
 
661
<p>
662
    Denote the probability that a probe sequence of length <i>k</i>
663
appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence
664
of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution.
665
Then
666
</p>
667
    <p>
668
    <a name = "eqn:prob_of_p1">
669
        <i>p<sub>1</sub>
670
        = </i>(3)
671
    </a>
672
    </p>
673
    <p>
674
    <i>
675
    <b>P</b>(l<sub>1</sub> &ge; k)
676
    =
677
    </i>
678
    </p>
679
    <p>
680
    <i><b>P</b>(l<sub>1</sub> &ge; &alpha; ( 1 + k / &alpha; - 1 )
681
    &le; </i>(a)
682
    </p>
683
    <p>
684
    <i>
685
    e
686
    ^
687
    (
688
        -
689
        (
690
            &alpha; ( k / &alpha; - 1 )<sup>2</sup>
691
        )
692
        /2
693
    )
694
    </i>
695
    ,
696
</p>
697
<p>
698
    where (a) follows from the Chernoff bound
699
[<a href = "references.html#motwani95random">motwani95random</a>].
700
To
701
calculate the probability that <i>some</i> bin contains a probe
702
sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are
703
negatively-dependent
704
[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
705
Let <i><b>I</b>(.)</i>
706
denote the indicator function. Then
707
    <p>
708
    <a name = "eqn:at_least_k_i_n_some_bin">
709
        <i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> &ge; k )
710
        = </i>(3)
711
    </a>
712
    </p>
713
    <p>
714
    <i>
715
   <b>P</b>
716
    (
717
        &sum; <sub>i = 1</sub><sup>m</sup>
718
        <b>I</b>(l<sub>i</sub> &ge; k) &ge; 1
719
    )
720
    =
721
    </i>
722
    </p>
723
    <p>
724
    <i>
725
    <b>P</b>
726
    (
727
        &sum; <sub>i = 1</sub><sup>m</sup>
728
        <b>I</b>
729
        (
730
            l<sub>i</sub> &ge; k
731
        )
732
        &ge;
733
        m  p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 )
734
    )
735
    &le; </i>(a)
736
    </p>
737
    <p>
738
    <i>
739
    e
740
    ^
741
    (
742
        (
743
            -
744
            m p<sub>1</sub>
745
            (
746
                1 / (m p<sub>1</sub>) - 1
747
            )
748
            <sup>2</sup>
749
        )
750
        /
751
        2
752
    )
753
    ,
754
    </i>
755
    </p>
756
<p>
757
where (a) follows from the fact that the Chernoff bound can be
758
applied to negatively-dependent variables
759
[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
760
Inserting <a href = "#prob_of_p1">(2)</a> into
761
<a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>,
762
we obtain
763
</p>
764
<p>
765
    <i>
766
    k
767
    ~
768
    &radic;
769
    (
770
        2 &alpha; </i>ln<i> 2 m </i>ln<i>(m) )
771
    )
772
    </i>
773
    .
774
</p>
775
 
776
 
777
 
778
 
779
 
780
 
781
 
782
 
783
 
784
<h3><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h3>
785
 
786
<p>
787
    The resize policies in the previous subsection are conceptually straightforward. The design
788
of hash-based containers' size-related interface is complicated by some factors.
789
</p>
790
<ol>
791
    <li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no
792
distinction between the number of entries the container holds and the number of entries it is using. This,
793
of course, is not the case for hash-based containers. Moreover, even describing the
794
"actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container
795
holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries.
796
    </li>
797
    <li>
798
    The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy
799
maintains an invariant concerning the load factor of a container object. This is sometimes too rigid:
800
    <ol>
801
        <li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container).
802
        </li>
803
        <li>
804
            In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy.
805
        </li>
806
    </ol>
807
    </li>
808
    <li>
809
    Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a>
810
discusses the previous concepts.)
811
    </li>
812
</ol>
813
 
814
<p>
815
    Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers.
816
</p>
817
 
818
 
819
 
820
<p>
821
        This library attempts to address these types of problems by delegating all size-related functionality to
822
policy classes. Hash-based containers
823
are parameterized by a resize-policy class (among others), and derive publicly from
824
the resize-policy class
825
[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]
826
 <i>E.g.</i>, a collision-chaining
827
hash table is defined as follows:
828
</p>
829
<pre>
830
cc_ht_map&lt;
831
  <b>class</b> Key,
832
  <b>class</b> Data,
833
  ...
834
  <b>class</b> Resize_Policy
835
  ...&gt; :
836
    <b>public</b> Resize_Policy
837
</pre>
838
 
839
<p>
840
    The containers themselves lack any functionality or public interface for manipulating sizes. A container
841
object merely forwards events to its resize policy object and queries it for needed actions.
842
</p>
843
 
844
<p>
845
    Figure
846
<a href = "#insert_resize_sequence_diagram1">
847
Insert resize sequence diagram
848
</a>
849
shows a (possible) sequence diagram of an insert operation.
850
The user inserts an element; the hash table
851
notifies its resize policy that a search has started (point A);
852
in this case, a single collision is encountered - the table
853
notifies its resize policy of this (point B); the container
854
finally notifies its resize policy that the search has ended (point C);
855
it then queries its resize policy whether a resize is needed, and if so,
856
what is the new size (points D to G); following the resize, it notifies
857
the policy that a resize has completed (point H); finally, the element
858
is inserted, and the policy notified (point I).
859
</p>
860
 
861
<h6 align = "center">
862
<a name = "insert_resize_sequence_diagram1">
863
<img src = "insert_resize_sequence_diagram1.jpg" width = "70%" alt = "no image">
864
</a>
865
</h6>
866
<h6 align = "center">
867
Insert resize sequence diagram.
868
</h6>
869
 
870
<p>
871
    This addresses, to some extent, the problems mentioned above:
872
</p>
873
<ol>
874
    <li>
875
        Different instantiations of range-hashing policies can be met with different instantiations of
876
    resize policies.
877
    </li>
878
    <li>
879
        Questions on size-related interface are avoided, since the containers have no size-related methods. Thus
880
    a container has no method for querying its actual size. It merely continuously forwards enough information to
881
    its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design     the appropriate method. Also, a container has no methods for setting its size. It merely queries its
882
resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and
883
supports a <tt><b>protected virtual</b></tt> function for external resize.
884
    </li>
885
</ol>
886
 
887
<p>
888
    The library contains a single class for instantiating a resize policy,
889
<tt>pb_assoc</tt> contains
890
a standard resize policy,
891
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> (the name is explained shortly).
892
In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports
893
queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility).
894
([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for
895
changing between alternative interfaces at compile time.)
896
</p>
897
 
898
<p>
899
As noted before,
900
    size and trigger policies are usually orthogonal.
901
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
902
is parameterized by size and trigger policies. For example,
903
a collision-chaining hash table
904
is typically be defined as follows:
905
</p>
906
<pre>
907
cc_ht_map&lt;
908
  key,
909
  data,
910
  ...
911
  hash_standard_resize_policy&lt;
912
    some_trigger_policy,
913
    some_size_policy,
914
    ...&gt; &gt;
915
</pre>
916
 
917
<p>
918
 The sole function of
919
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
920
 is to
921
act as a standard delegator
922
[<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these
923
policies.
924
 
925
<p>
926
    Figures
927
<a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a>
928
    and
929
<a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a>
930
    show sequence diagrams illustrating the interaction between
931
    the standard resize policy and its trigger and size policies, respectively.
932
</p>
933
 
934
<h6 align = "center">
935
<a name = "insert_resize_sequence_diagram2">
936
<img src = "insert_resize_sequence_diagram2.jpg" width = "70%" alt = "no image">
937
</a>
938
</h6>
939
<h6 align = "center">
940
Standard resize policy trigger sequence diagram.
941
</h6>
942
 
943
<h6 align = "center">
944
<a name = "insert_resize_sequence_diagram3">
945
<img src = "insert_resize_sequence_diagram3.jpg" width = "70%" alt = "no image">
946
</a>
947
</h6>
948
<h6 align = "center">
949
Standard resize policy size sequence diagram.
950
</h6>
951
 
952
<p>
953
    The library (currently) supports the following instantiations of size
954
and trigger policies:
955
</p>
956
 
957
<ol>
958
    <li>
959
        <a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> implements
960
    a load check trigger policy.
961
    </li>
962
    <li>
963
        <a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
964
    implements a collision check trigger policy.
965
    </li>
966
    <li>
967
<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a> implemens
968
an exponential-size policy (which should be used with mask range hashing).
969
    </li>
970
    <li>
971
<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a> implementing
972
a size policy based on a sequence of primes
973
[<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing
974
    </li>
975
</ol>
976
 
977
<p>
978
    The trigger policies also support interfaces for changing their specifics which depend on compile time constants.
979
</p>
980
 
981
 
982
<p>
983
    Figure
984
<a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture
985
of the resize-related classes.
986
<tt>Container</tt> (which stands for any of the hash-based containers) is parameterized
987
by <tt>Resize_Policy</tt>, from which it subclasses publicly
988
[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>].
989
This class is currently instantiated only by
990
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.
991
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> itself
992
is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>.
993
Currently, <tt>Trigger_Policy</tt> is instantiated by
994
<a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>,
995
or
996
<a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by
997
<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>,
998
or
999
<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.
1000
</p>
1001
 
1002
 
1003
<h6 align = "center">
1004
<a name = "resize_policy_cd">
1005
<img src = "resize_policy_cd.jpg" width = "70%" alt = "no image">
1006
</a>
1007
</h6>
1008
<h6 align = "center">
1009
Resize policy class diagram.
1010
</h6>
1011
 
1012
 
1013
 
1014
 
1015
<h2><a name = "#policy_interaction">Policy Interaction</a></h2>
1016
 
1017
<p>
1018
        Hash-tables are unfortunately susceptible to choice of policies. One
1019
of the more complicated aspects of this is that poor combinations of good policies
1020
can alter performance drastically. Following are some considerations.
1021
</p>
1022
 
1023
 
1024
 
1025
 
1026
 
1027
<h3>Range-Hashing Policies and Resize Policies</h3>
1028
 
1029
<p>
1030
</p>
1031
 
1032
 
1033
<h3>Equivalence Functors, Storing Hash Values, and Hash Functions</h3>
1034
 
1035
 
1036
<p>
1037
<a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
1038
and
1039
<a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
1040
are parameterized by an equivalenc functor and by a <tt>Store_Hash</tt>
1041
parameter. If the latter parameter is <tt><b>true</b></tt>, then
1042
the container stores with each entry a hash value, and uses
1043
this value in case of collisions to determine whether to apply a hash value.
1044
This can lower the cost of collision for some types, but increase the cost of collisions for other types.
1045
</p>
1046
 
1047
<p>
1048
        If a ranged-hash function or ranged probe function is directly supplied, however,
1049
then it makes no sense to store the hash value with each entry. <tt>pb_assoc</tt>'s container will fail at compilation, by design, if this is attempted.
1050
</p>
1051
 
1052
 
1053
 
1054
</body>
1055
 
1056
</html>

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.