1 |
20 |
jlechner |
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
2 |
|
|
<html>
|
3 |
|
|
<head>
|
4 |
|
|
<title>Hash Policies</title>
|
5 |
|
|
<meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
|
6 |
|
|
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
|
7 |
|
|
</head>
|
8 |
|
|
<body bgcolor="white">
|
9 |
|
|
|
10 |
|
|
<h1>Hash Policies</h1>
|
11 |
|
|
<p>
|
12 |
|
|
This subsection describes hash policies. It is organized as follows:
|
13 |
|
|
</p>
|
14 |
|
|
<ol>
|
15 |
|
|
<li> The <a href = "#general_terms">General Terms</a> Section describes
|
16 |
|
|
some general terms.
|
17 |
|
|
</li>
|
18 |
|
|
<li> The <a href = "#range_hashing_fns">Range-Hashing Functions</a> Section
|
19 |
|
|
describes range-hasing functions.</li>
|
20 |
|
|
<li> The <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a> Section
|
21 |
|
|
describes ranged-hash functions. </li>
|
22 |
|
|
<li> The <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a> Section
|
23 |
|
|
describes the implementation of these concepts in <tt>pb_assoc</tt>.
|
24 |
|
|
</li>
|
25 |
|
|
</ol>
|
26 |
|
|
|
27 |
|
|
|
28 |
|
|
<h2><a name="general_terms">General Terms</a></h2>
|
29 |
|
|
|
30 |
|
|
<p>
|
31 |
|
|
There
|
32 |
|
|
are actually three functions involved in transforming a key into a hash-table's
|
33 |
|
|
position (see Figure
|
34 |
|
|
<a href = "#hash_ranged_hash_range_hashing_fns">
|
35 |
|
|
Hash runctions, ranged-hash functions, and range-hashing functions
|
36 |
|
|
</a>):
|
37 |
|
|
</p>
|
38 |
|
|
<ol>
|
39 |
|
|
<li>
|
40 |
|
|
A <i>ranged-hash</i> function, which maps keys into an interval of the
|
41 |
|
|
non-negative integrals. This is the function actually required by the
|
42 |
|
|
hash-table algorithm.
|
43 |
|
|
</li>
|
44 |
|
|
<li>
|
45 |
|
|
A hash function, which maps keys into non-negative integral types. This is
|
46 |
|
|
typically specified by the writer of the key class.
|
47 |
|
|
</li>
|
48 |
|
|
<li>
|
49 |
|
|
A <i>range-hashing</i> function, which maps non-negative integral types into an
|
50 |
|
|
interval of non-negative integral types.
|
51 |
|
|
</li>
|
52 |
|
|
</ol>
|
53 |
|
|
|
54 |
|
|
<h6 align = "center">
|
55 |
|
|
<a name = "hash_ranged_hash_range_hashing_fns">
|
56 |
|
|
<img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "40%" alt = "no image">
|
57 |
|
|
</a>
|
58 |
|
|
Hash runctions, ranged-hash functions, and range-hashing functions.
|
59 |
|
|
</h6>
|
60 |
|
|
|
61 |
|
|
<p>
|
62 |
|
|
Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3
|
63 |
|
|
characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly"
|
64 |
|
|
into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral
|
65 |
|
|
value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i>
|
66 |
|
|
function
|
67 |
|
|
</p>
|
68 |
|
|
<p>
|
69 |
|
|
<i>f : U × Z<sub>+</sub> → Z<sub>+</sub> </i>,
|
70 |
|
|
</p>
|
71 |
|
|
<p>
|
72 |
|
|
such that for any <i>u</i> in <i>U</i>
|
73 |
|
|
,
|
74 |
|
|
</p>
|
75 |
|
|
<p>
|
76 |
|
|
<i>0 ≤ f(u, m) ≤ m - 1 </i>,
|
77 |
|
|
</p>
|
78 |
|
|
<p>
|
79 |
|
|
and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>].
|
80 |
|
|
One common solution is to use the composition of the hash function
|
81 |
|
|
</p>
|
82 |
|
|
<p>
|
83 |
|
|
<i>h : U → Z<sub>+</sub> </i>,
|
84 |
|
|
</p>
|
85 |
|
|
<p>
|
86 |
|
|
which maps elements of <i>U</i> into the non-negative integrals, and
|
87 |
|
|
</p>
|
88 |
|
|
<p>
|
89 |
|
|
<i>g : Z<sub>+</sub> × Z<sub>+</sub> → Z<sub>+</sub>, </i>
|
90 |
|
|
</p>
|
91 |
|
|
<p>
|
92 |
|
|
which maps a non-negative hash value, and a non-negative range upper-bound into
|
93 |
|
|
a non-negative integral in the range between 0 (inclusive) and the range upper
|
94 |
|
|
bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,
|
95 |
|
|
</p>
|
96 |
|
|
<p>
|
97 |
|
|
<i>0 ≤ g(r, m) ≤ m - 1 </i>.
|
98 |
|
|
</p>
|
99 |
|
|
<p>
|
100 |
|
|
The resulting ranged-hash function, is
|
101 |
|
|
</p>
|
102 |
|
|
<p>
|
103 |
|
|
<i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a>
|
104 |
|
|
</i>(1) .
|
105 |
|
|
</p>
|
106 |
|
|
|
107 |
|
|
<p>
|
108 |
|
|
From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can
|
109 |
|
|
always be composed (however the converse is not true).
|
110 |
|
|
</p>
|
111 |
|
|
|
112 |
|
|
|
113 |
|
|
<p>
|
114 |
|
|
The above describes the case where a key is to be mapped into a <i>single
|
115 |
|
|
position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table.
|
116 |
|
|
In other cases, a key is to be mapped into a <i>sequence of poisitions</i>
|
117 |
|
|
within a table, <i>e.g.</i>, in a probing table.
|
118 |
|
|
</p>
|
119 |
|
|
<p>
|
120 |
|
|
Similar terms apply in this case: the table requires a <i>ranged probe</i>
|
121 |
|
|
function, mapping a key into a sequence of positions withing the table. This is
|
122 |
|
|
typically acheived by composing a <i>hash function</i> mapping the key
|
123 |
|
|
into a non-negative integral type, a <i>probe</i> function transforming the
|
124 |
|
|
hash value into a sequence of hash values, and a <i>range-hashing</i> function
|
125 |
|
|
transforming the sequence of hash values into a sequence of positions.
|
126 |
|
|
</p>
|
127 |
|
|
|
128 |
|
|
|
129 |
|
|
<h2><a name="range_hashing_fns">Range-Hashing Functions</a></h2>
|
130 |
|
|
|
131 |
|
|
<p>
|
132 |
|
|
Some common choices for range-hashing functions are the division,
|
133 |
|
|
multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>],
|
134 |
|
|
defined as
|
135 |
|
|
</p>
|
136 |
|
|
<p>
|
137 |
|
|
<i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) ,
|
138 |
|
|
</p>
|
139 |
|
|
<p>
|
140 |
|
|
<i>g(r, m) = ⌈ u/v ( a r mod v ) ⌉ </i>,
|
141 |
|
|
</p>
|
142 |
|
|
<p>
|
143 |
|
|
and
|
144 |
|
|
</p>
|
145 |
|
|
<p>
|
146 |
|
|
<i>g(r, m) = ⌈ u/v ( r<sup>2</sup> mod v ) ⌉ </i>,
|
147 |
|
|
</p>
|
148 |
|
|
<p>
|
149 |
|
|
respectively, for some positive integrals <i>u</i> and <i>v</i> (typically
|
150 |
|
|
powers of 2), and some <i>a</i>. Each of these range-hashing functions works
|
151 |
|
|
best for some different setting.
|
152 |
|
|
</p>
|
153 |
|
|
<p>
|
154 |
|
|
The division method <a href="#division_method">(2)</a> is a very common
|
155 |
|
|
choice. However, even this single method can be implemented in two very
|
156 |
|
|
different ways. It is possible to implement <a href="#division_method">(2)</a>
|
157 |
|
|
using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low
|
158 |
|
|
level <i>&</i> (bit-mask) operation (for the case where <i>m</i> is a power of
|
159 |
|
|
2), <i>i.e.</i>,
|
160 |
|
|
</p>
|
161 |
|
|
<p>
|
162 |
|
|
<i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) ,
|
163 |
|
|
</p>
|
164 |
|
|
<p>
|
165 |
|
|
and
|
166 |
|
|
</p>
|
167 |
|
|
<p>
|
168 |
|
|
<a name="eqn:division_method_bit_mask">
|
169 |
|
|
<i>g(r, m) = r & m - 1, ( m = 2<sup>k</sup>
|
170 |
|
|
</i>
|
171 |
|
|
for some<i> k) </i></a>(4) ,
|
172 |
|
|
</p>
|
173 |
|
|
<p>
|
174 |
|
|
respectively.
|
175 |
|
|
</p>
|
176 |
|
|
<p>
|
177 |
|
|
The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a>
|
178 |
|
|
has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i>
|
179 |
|
|
is affected by all the bits of <i>r</i> (minimizing the chance of collision).
|
180 |
|
|
It has the disadvantage of using the costly modulo operation. This method is
|
181 |
|
|
hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>].
|
182 |
|
|
</p>
|
183 |
|
|
|
184 |
|
|
<p>
|
185 |
|
|
The <i>&</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a>
|
186 |
|
|
has the advantage of relying on the fast bitwise and operation. It has the
|
187 |
|
|
disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>.
|
188 |
|
|
This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>].
|
189 |
|
|
</p>
|
190 |
|
|
|
191 |
|
|
|
192 |
|
|
|
193 |
|
|
|
194 |
|
|
<h2><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h2>
|
195 |
|
|
|
196 |
|
|
<p>
|
197 |
|
|
Although rarer, there are cases where it is beneficial to allow the client to
|
198 |
|
|
directly specify a ranged-hash hash function. It is true, that the writer of
|
199 |
|
|
the ranged-hash function cannot rely on the values of <i>m</i> having specific
|
200 |
|
|
numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]),
|
201 |
|
|
since the values of <i>m</i> are determined by a resize policy with possibly
|
202 |
|
|
orthogonal considerations [<a href="references.html#austern98segmented">austern98segmented</a>].
|
203 |
|
|
The values of <i>m</i> can be used in some cases, though, to estimate the
|
204 |
|
|
"general" number of distinct values required.
|
205 |
|
|
</p>
|
206 |
|
|
|
207 |
|
|
<p>
|
208 |
|
|
Let
|
209 |
|
|
</p>
|
210 |
|
|
|
211 |
|
|
<p>
|
212 |
|
|
<i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i>
|
213 |
|
|
</p>
|
214 |
|
|
|
215 |
|
|
<p>
|
216 |
|
|
be a string of <i>t</i> characters, each of which is from domain <i>S</i>.
|
217 |
|
|
Consider the following ranged-hash function:
|
218 |
|
|
</p>
|
219 |
|
|
|
220 |
|
|
<p>
|
221 |
|
|
<a name="eqn:total_string_dna_hash">
|
222 |
|
|
<i>
|
223 |
|
|
f<sub>1</sub>(s, m) =
|
224 |
|
|
∑ <sub>i =
|
225 |
|
|
0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i>
|
226 |
|
|
</a> (5) ,
|
227 |
|
|
</p>
|
228 |
|
|
|
229 |
|
|
<p>
|
230 |
|
|
where <i>a</i> is some non-negative integral value. This is the standard
|
231 |
|
|
string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>].
|
232 |
|
|
Its advantage is that it takes into account all of the characters of the
|
233 |
|
|
string.
|
234 |
|
|
</p>
|
235 |
|
|
|
236 |
|
|
<p>
|
237 |
|
|
Now assume that <i>s</i> is the string representation of a of a long DNA
|
238 |
|
|
sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the
|
239 |
|
|
entire string might be prohibitively expensive. A possible alternative might be
|
240 |
|
|
to use only the first <i>k</i> characters of the string, where
|
241 |
|
|
</p>
|
242 |
|
|
|
243 |
|
|
<p>
|
244 |
|
|
k <sup>|S|</sup> ≥ m ,
|
245 |
|
|
</p>
|
246 |
|
|
<p>
|
247 |
|
|
<i>i.e.</i>, using the hash function
|
248 |
|
|
</p>
|
249 |
|
|
<p>
|
250 |
|
|
<a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k
|
251 |
|
|
- 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6)
|
252 |
|
|
</p>
|
253 |
|
|
<p>
|
254 |
|
|
requiring scanning over only
|
255 |
|
|
</p>
|
256 |
|
|
<p>
|
257 |
|
|
<i>k = </i>log<i><sub>4</sub>( m ) </i>
|
258 |
|
|
</p>
|
259 |
|
|
<p>
|
260 |
|
|
characters.
|
261 |
|
|
</p>
|
262 |
|
|
<p>
|
263 |
|
|
Other more elaborate hash-functions might scan <i>k</i> characters starting at
|
264 |
|
|
a random position (determined at each resize), or scanning <i>k</i> random
|
265 |
|
|
positions (determined at each resize), <i>i.e.</i>, using
|
266 |
|
|
</p>
|
267 |
|
|
<p>
|
268 |
|
|
<i>f<sub>3</sub>(s, m) = ∑ <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup>
|
269 |
|
|
s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>,
|
270 |
|
|
</p>
|
271 |
|
|
<p>
|
272 |
|
|
or
|
273 |
|
|
</p>
|
274 |
|
|
<p>
|
275 |
|
|
<i>f<sub>4</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub>
|
276 |
|
|
a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>,
|
277 |
|
|
</p>
|
278 |
|
|
<p>
|
279 |
|
|
<p>
|
280 |
|
|
respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the
|
281 |
|
|
(inclusive) range <i>[0,...,t-1]</i>.
|
282 |
|
|
</p>
|
283 |
|
|
|
284 |
|
|
|
285 |
|
|
<h2><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h2>
|
286 |
|
|
|
287 |
|
|
<p>
|
288 |
|
|
Containers based on collision-chaining hash tables in <tt>pb_assoc</tt>
|
289 |
|
|
are parameterized by the functors <tt>Hash_Fn</tt>, and <tt>Comb_Hash_Fn</tt>.
|
290 |
|
|
</p>
|
291 |
|
|
|
292 |
|
|
<p>
|
293 |
|
|
If such a container is instantiated with any hash functor and
|
294 |
|
|
range-hashing functor, the container will synthesize a ranged-hash functor
|
295 |
|
|
automatically. For example, Figure
|
296 |
|
|
<a href = "#hash_range_hashing_seq_diagram">
|
297 |
|
|
Insert hash sequence diagram
|
298 |
|
|
</a>
|
299 |
|
|
shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
|
300 |
|
|
the container transforms the key into a non-negative integral using the hash
|
301 |
|
|
functor (points B and C), and transforms the result into a position
|
302 |
|
|
using the combining functor (points D and E).
|
303 |
|
|
</p>
|
304 |
|
|
|
305 |
|
|
<h6 align = "center">
|
306 |
|
|
<a name = "hash_range_hashing_seq_diagram">
|
307 |
|
|
<img src = "hash_range_hashing_seq_diagram.jpg" width = "50%" alt = "no image">
|
308 |
|
|
</a>
|
309 |
|
|
</h6>
|
310 |
|
|
<h6 align = "center">
|
311 |
|
|
Insert hash sequence diagram.
|
312 |
|
|
</h6>
|
313 |
|
|
|
314 |
|
|
|
315 |
|
|
<p>
|
316 |
|
|
If such a container is instantiated with the
|
317 |
|
|
<a href = "concepts.html#concepts_null_policies">null policy</a>
|
318 |
|
|
hash functor,
|
319 |
|
|
<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
|
320 |
|
|
and a combining-hash functor, the container treats
|
321 |
|
|
the combining hash functor as a ranged-hash function. For example, Figure
|
322 |
|
|
<a href = "#hash_range_hashing_seq_diagram2">
|
323 |
|
|
Insert hash sequence diagram with a null combination policy
|
324 |
|
|
</a>
|
325 |
|
|
shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
|
326 |
|
|
the container transforms the key into a position
|
327 |
|
|
using the combining functor (points B and C).
|
328 |
|
|
</p>
|
329 |
|
|
|
330 |
|
|
|
331 |
|
|
<h6 align = "center">
|
332 |
|
|
<a name = "hash_range_hashing_seq_diagram2">
|
333 |
|
|
<img src = "hash_range_hashing_seq_diagram2.jpg" width = "50%" alt = "no image">
|
334 |
|
|
</a>
|
335 |
|
|
</h6>
|
336 |
|
|
<h6 align = "center">
|
337 |
|
|
Insert hash sequence diagram with a null combination policy.
|
338 |
|
|
</h6>
|
339 |
|
|
|
340 |
|
|
<p>
|
341 |
|
|
<tt>pb_assoc</tt> contains the following hash-related policies:
|
342 |
|
|
</p>
|
343 |
|
|
|
344 |
|
|
<ol>
|
345 |
|
|
<li>
|
346 |
|
|
<a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
|
347 |
|
|
and
|
348 |
|
|
<a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
|
349 |
|
|
are range-hashing functions based on a bit-mask and a modulo operation, respectively.
|
350 |
|
|
</li>
|
351 |
|
|
<li>
|
352 |
|
|
<a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> and
|
353 |
|
|
<a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> are probe
|
354 |
|
|
classes based on linear and quadratic increment, respectively.
|
355 |
|
|
</li>
|
356 |
|
|
<li>
|
357 |
|
|
<a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>
|
358 |
|
|
and
|
359 |
|
|
<a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>
|
360 |
|
|
are
|
361 |
|
|
<a href = "concepts.html#concepts_null_policies">null policy classes</a> for creating
|
362 |
|
|
ranged-hash and ranged-probe functions directly (<i>i.e.</i>, not through
|
363 |
|
|
composition).
|
364 |
|
|
</li>
|
365 |
|
|
</ol>
|
366 |
|
|
|
367 |
|
|
<p>
|
368 |
|
|
<tt>pb_assoc</tt> does not provide any hash functions (it relies on those
|
369 |
|
|
of the STL).
|
370 |
|
|
</p>
|
371 |
|
|
|
372 |
|
|
|
373 |
|
|
</body>
|
374 |
|
|
|
375 |
|
|
</html>
|