OpenCores
URL https://opencores.org/ocsvn/scarts/scarts/trunk

Subversion Repositories scarts

[/] [scarts/] [trunk/] [toolchain/] [scarts-gcc/] [gcc-4.1.1/] [libstdc++-v3/] [docs/] [html/] [ext/] [pb_assoc/] [hash_policies.html] - Blame information for rev 20

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 20 jlechner
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2
<html>
3
<head>
4
    <title>Hash Policies</title>
5
    <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
6
    <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
7
</head>
8
<body bgcolor="white">
9
 
10
<h1>Hash Policies</h1>
11
<p>
12
    This subsection describes hash policies. It is organized as follows:
13
</p>
14
<ol>
15
    <li> The <a href = "#general_terms">General Terms</a> Section describes
16
            some general terms.
17
    </li>
18
    <li> The <a href = "#range_hashing_fns">Range-Hashing Functions</a> Section
19
        describes range-hasing functions.</li>
20
    <li> The <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a> Section
21
        describes ranged-hash functions. </li>
22
    <li> The <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a> Section
23
            describes the implementation of these concepts in <tt>pb_assoc</tt>.
24
    </li>
25
</ol>
26
 
27
 
28
<h2><a name="general_terms">General Terms</a></h2>
29
 
30
<p>
31
    There
32
are actually three functions involved in transforming a key into a hash-table's
33
position (see Figure
34
<a href = "#hash_ranged_hash_range_hashing_fns">
35
Hash runctions, ranged-hash functions, and range-hashing functions
36
</a>):
37
</p>
38
<ol>
39
    <li>
40
        A <i>ranged-hash</i> function, which maps keys into an interval of the
41
        non-negative integrals. This is the function actually required by the
42
        hash-table algorithm.
43
    </li>
44
    <li>
45
        A hash function, which maps keys into non-negative integral types. This is
46
        typically specified by the writer of the key class.
47
    </li>
48
    <li>
49
        A <i>range-hashing</i> function, which maps non-negative integral types into an
50
        interval of non-negative integral types.
51
    </li>
52
</ol>
53
 
54
<h6 align = "center">
55
<a name = "hash_ranged_hash_range_hashing_fns">
56
<img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "40%" alt = "no image">
57
</a>
58
Hash runctions, ranged-hash functions, and range-hashing functions.
59
</h6>
60
 
61
<p>
62
    Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3
63
    characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly"
64
    into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral
65
    value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i>
66
    function
67
</p>
68
<p>
69
    <i>f : U &times; Z<sub>+</sub> &rarr; Z<sub>+</sub> </i>,
70
</p>
71
<p>
72
    such that for any <i>u</i> in <i>U</i>
73
,
74
</p>
75
<p>
76
    <i>0 &le; f(u, m) &le; m - 1 </i>,
77
</p>
78
<p>
79
    and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>].
80
    One common solution is to use the composition of the hash function
81
</p>
82
<p>
83
    <i>h : U &rarr; Z<sub>+</sub> </i>,
84
</p>
85
<p>
86
    which maps elements of <i>U</i> into the non-negative integrals, and
87
</p>
88
<p>
89
    <i>g : Z<sub>+</sub> &times; Z<sub>+</sub> &rarr; Z<sub>+</sub>, </i>
90
</p>
91
<p>
92
    which maps a non-negative hash value, and a non-negative range upper-bound into
93
    a non-negative integral in the range between 0 (inclusive) and the range upper
94
    bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,
95
</p>
96
<p>
97
    <i>0 &le; g(r, m) &le; m - 1 </i>.
98
</p>
99
<p>
100
    The resulting ranged-hash function, is
101
</p>
102
<p>
103
    <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a>
104
    </i>(1) .
105
</p>
106
 
107
<p>
108
    From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can
109
    always be composed (however the converse is not true).
110
</p>
111
 
112
 
113
<p>
114
    The above describes the case where a key is to be mapped into a <i>single
115
position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table.
116
In other cases, a key is to be mapped into a <i>sequence of poisitions</i>
117
within a table, <i>e.g.</i>, in a probing table.
118
</p>
119
<p>
120
    Similar terms apply in this case: the table requires a <i>ranged probe</i>
121
function, mapping a key into a sequence of positions withing the table. This is
122
typically acheived by composing a <i>hash function</i> mapping the key
123
into a non-negative integral type, a <i>probe</i> function transforming the
124
hash value into a sequence of hash values, and a <i>range-hashing</i> function
125
transforming the sequence of hash values into a sequence of positions.
126
</p>
127
 
128
 
129
<h2><a name="range_hashing_fns">Range-Hashing Functions</a></h2>
130
 
131
<p>
132
    Some common choices for range-hashing functions are the division,
133
    multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>],
134
    defined as
135
</p>
136
<p>
137
    <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) ,
138
</p>
139
<p>
140
    <i>g(r, m) = &lceil; u/v ( a r mod v ) &rceil; </i>,
141
</p>
142
<p>
143
    and
144
</p>
145
<p>
146
    <i>g(r, m) = &lceil; u/v ( r<sup>2</sup> mod v ) &rceil; </i>,
147
</p>
148
<p>
149
respectively, for some positive integrals <i>u</i> and <i>v</i> (typically
150
powers of 2), and some <i>a</i>. Each of these range-hashing functions works
151
best for some different setting.
152
</p>
153
<p>
154
    The division method <a href="#division_method">(2)</a> is a very common
155
    choice. However, even this single method can be implemented in two very
156
    different ways. It is possible to implement <a href="#division_method">(2)</a>
157
    using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low
158
    level <i>&amp;</i> (bit-mask) operation (for the case where <i>m</i> is a power of
159
    2), <i>i.e.</i>,
160
</p>
161
<p>
162
    <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) ,
163
</p>
164
<p>
165
    and
166
</p>
167
<p>
168
    <a name="eqn:division_method_bit_mask">
169
    <i>g(r, m) = r &amp; m - 1, ( m = 2<sup>k</sup>
170
    </i>
171
        for some<i> k) </i></a>(4) ,
172
</p>
173
<p>
174
    respectively.
175
</p>
176
<p>
177
    The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a>
178
    has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i>
179
    is affected by all the bits of <i>r</i> (minimizing the chance of collision).
180
    It has the disadvantage of using the costly modulo operation. This method is
181
    hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>].
182
</p>
183
 
184
<p>
185
    The <i>&amp;</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a>
186
    has the advantage of relying on the fast bitwise and operation. It has the
187
    disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>.
188
    This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>].
189
</p>
190
 
191
 
192
 
193
 
194
<h2><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h2>
195
 
196
<p>
197
    Although rarer, there are cases where it is beneficial to allow the client to
198
directly specify a ranged-hash hash function. It is true, that the writer of
199
the ranged-hash function cannot rely on the values of <i>m</i> having specific
200
numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]),
201
since the values of <i>m</i> are determined by a resize policy with possibly
202
orthogonal considerations [<a href="references.html#austern98segmented">austern98segmented</a>].
203
The values of <i>m</i> can be used in some cases, though, to estimate the
204
"general" number of distinct values required.
205
</p>
206
 
207
<p>
208
    Let
209
</p>
210
 
211
<p>
212
    <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i>
213
</p>
214
 
215
<p>
216
    be a string of <i>t</i> characters, each of which is from domain <i>S</i>.
217
Consider the following ranged-hash function:
218
</p>
219
 
220
<p>
221
    <a name="eqn:total_string_dna_hash">
222
        <i>
223
            f<sub>1</sub>(s, m) =
224
            &sum; <sub>i =
225
            0</sub><sup>t   - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i>
226
    </a> (5) ,
227
</p>
228
 
229
<p>
230
    where <i>a</i> is some non-negative integral value. This is the standard
231
string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>].
232
Its advantage is that it takes into account all of the characters of the
233
string.
234
</p>
235
 
236
<p>
237
    Now assume that <i>s</i> is the string representation of a of a long DNA
238
sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the
239
entire string might be prohibitively expensive. A possible alternative might be
240
to use only the first <i>k</i> characters of the string, where
241
</p>
242
 
243
<p>
244
    k <sup>|S|</sup> &ge; m ,
245
</p>
246
<p>
247
    <i>i.e.</i>, using the hash function
248
</p>
249
<p>
250
    <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k
251
                - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6)
252
</p>
253
<p>
254
    requiring scanning over only
255
</p>
256
<p>
257
    <i>k = </i>log<i><sub>4</sub>( m ) </i>
258
</p>
259
<p>
260
    characters.
261
</p>
262
<p>
263
    Other more elaborate hash-functions might scan <i>k</i> characters starting at
264
    a random position (determined at each resize), or scanning <i>k</i> random
265
    positions (determined at each resize), <i>i.e.</i>, using
266
</p>
267
<p>
268
    <i>f<sub>3</sub>(s, m) = &sum; <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup>
269
        s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>,
270
</p>
271
<p>
272
    or
273
</p>
274
<p>
275
    <i>f<sub>4</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub>
276
        a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>,
277
</p>
278
<p>
279
<p>
280
    respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the
281
    (inclusive) range <i>[0,...,t-1]</i>.
282
</p>
283
 
284
 
285
<h2><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h2>
286
 
287
<p>
288
    Containers based on collision-chaining hash tables in <tt>pb_assoc</tt>
289
are parameterized by the functors <tt>Hash_Fn</tt>, and <tt>Comb_Hash_Fn</tt>.
290
</p>
291
 
292
<p>
293
    If such a container is instantiated with any hash functor and
294
range-hashing functor, the container will synthesize a ranged-hash functor
295
automatically. For example, Figure
296
<a href = "#hash_range_hashing_seq_diagram">
297
Insert hash sequence diagram
298
</a>
299
shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
300
the container transforms the key into a non-negative integral using the hash
301
functor (points B and C), and transforms the result into a position
302
using the combining functor (points D and E).
303
</p>
304
 
305
<h6 align = "center">
306
<a name = "hash_range_hashing_seq_diagram">
307
<img src = "hash_range_hashing_seq_diagram.jpg" width = "50%" alt = "no image">
308
</a>
309
</h6>
310
<h6 align = "center">
311
Insert hash sequence diagram.
312
</h6>
313
 
314
 
315
<p>
316
    If such a container is instantiated with the
317
<a href = "concepts.html#concepts_null_policies">null policy</a>
318
hash functor,
319
<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
320
and a combining-hash functor, the container treats
321
the combining hash functor as a ranged-hash function. For example, Figure
322
<a href = "#hash_range_hashing_seq_diagram2">
323
Insert hash sequence diagram with a null combination policy
324
</a>
325
shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
326
the container transforms the key into a position
327
using the combining functor (points B and C).
328
</p>
329
 
330
 
331
<h6 align = "center">
332
<a name = "hash_range_hashing_seq_diagram2">
333
<img src = "hash_range_hashing_seq_diagram2.jpg" width = "50%" alt = "no image">
334
</a>
335
</h6>
336
<h6 align = "center">
337
Insert hash sequence diagram with a null combination policy.
338
</h6>
339
 
340
<p>
341
    <tt>pb_assoc</tt> contains the following hash-related policies:
342
</p>
343
 
344
<ol>
345
    <li>
346
<a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
347
and
348
<a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
349
are range-hashing functions based on a bit-mask and a modulo operation, respectively.
350
    </li>
351
    <li>
352
<a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> and
353
<a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> are probe
354
classes based on linear and quadratic increment, respectively.
355
    </li>
356
    <li>
357
<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>
358
and
359
<a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>
360
are
361
<a href = "concepts.html#concepts_null_policies">null policy classes</a> for creating
362
ranged-hash and ranged-probe functions directly (<i>i.e.</i>, not through
363
composition).
364
    </li>
365
</ol>
366
 
367
<p>
368
    <tt>pb_assoc</tt> does not provide any hash functions (it relies on those
369
of the STL).
370
</p>
371
 
372
 
373
</body>
374
 
375
</html>

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.