URL
https://opencores.org/ocsvn/scarts/scarts/trunk
Subversion Repositories scarts
[/] [scarts/] [trunk/] [toolchain/] [scarts-gcc/] [gcc-4.1.1/] [libstdc++-v3/] [docs/] [html/] [ext/] [pb_assoc/] [hash_policies.html] - Rev 20
Compare with Previous | Blame | View Log
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <html> <head> <title>Hash Policies</title> <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1"> <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5"> </head> <body bgcolor="white"> <h1>Hash Policies</h1> <p> This subsection describes hash policies. It is organized as follows: </p> <ol> <li> The <a href = "#general_terms">General Terms</a> Section describes some general terms. </li> <li> The <a href = "#range_hashing_fns">Range-Hashing Functions</a> Section describes range-hasing functions.</li> <li> The <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a> Section describes ranged-hash functions. </li> <li> The <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a> Section describes the implementation of these concepts in <tt>pb_assoc</tt>. </li> </ol> <h2><a name="general_terms">General Terms</a></h2> <p> There are actually three functions involved in transforming a key into a hash-table's position (see Figure <a href = "#hash_ranged_hash_range_hashing_fns"> Hash runctions, ranged-hash functions, and range-hashing functions </a>): </p> <ol> <li> A <i>ranged-hash</i> function, which maps keys into an interval of the non-negative integrals. This is the function actually required by the hash-table algorithm. </li> <li> A hash function, which maps keys into non-negative integral types. This is typically specified by the writer of the key class. </li> <li> A <i>range-hashing</i> function, which maps non-negative integral types into an interval of non-negative integral types. </li> </ol> <h6 align = "center"> <a name = "hash_ranged_hash_range_hashing_fns"> <img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "40%" alt = "no image"> </a> Hash runctions, ranged-hash functions, and range-hashing functions. </h6> <p> Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3 characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly" into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i> function </p> <p> <i>f : U × Z<sub>+</sub> → Z<sub>+</sub> </i>, </p> <p> such that for any <i>u</i> in <i>U</i> , </p> <p> <i>0 ≤ f(u, m) ≤ m - 1 </i>, </p> <p> and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>]. One common solution is to use the composition of the hash function </p> <p> <i>h : U → Z<sub>+</sub> </i>, </p> <p> which maps elements of <i>U</i> into the non-negative integrals, and </p> <p> <i>g : Z<sub>+</sub> × Z<sub>+</sub> → Z<sub>+</sub>, </i> </p> <p> which maps a non-negative hash value, and a non-negative range upper-bound into a non-negative integral in the range between 0 (inclusive) and the range upper bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>, </p> <p> <i>0 ≤ g(r, m) ≤ m - 1 </i>. </p> <p> The resulting ranged-hash function, is </p> <p> <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a> </i>(1) . </p> <p> From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can always be composed (however the converse is not true). </p> <p> The above describes the case where a key is to be mapped into a <i>single position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table. In other cases, a key is to be mapped into a <i>sequence of poisitions</i> within a table, <i>e.g.</i>, in a probing table. </p> <p> Similar terms apply in this case: the table requires a <i>ranged probe</i> function, mapping a key into a sequence of positions withing the table. This is typically acheived by composing a <i>hash function</i> mapping the key into a non-negative integral type, a <i>probe</i> function transforming the hash value into a sequence of hash values, and a <i>range-hashing</i> function transforming the sequence of hash values into a sequence of positions. </p> <h2><a name="range_hashing_fns">Range-Hashing Functions</a></h2> <p> Some common choices for range-hashing functions are the division, multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>], defined as </p> <p> <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) , </p> <p> <i>g(r, m) = ⌈ u/v ( a r mod v ) ⌉ </i>, </p> <p> and </p> <p> <i>g(r, m) = ⌈ u/v ( r<sup>2</sup> mod v ) ⌉ </i>, </p> <p> respectively, for some positive integrals <i>u</i> and <i>v</i> (typically powers of 2), and some <i>a</i>. Each of these range-hashing functions works best for some different setting. </p> <p> The division method <a href="#division_method">(2)</a> is a very common choice. However, even this single method can be implemented in two very different ways. It is possible to implement <a href="#division_method">(2)</a> using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low level <i>&</i> (bit-mask) operation (for the case where <i>m</i> is a power of 2), <i>i.e.</i>, </p> <p> <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) , </p> <p> and </p> <p> <a name="eqn:division_method_bit_mask"> <i>g(r, m) = r & m - 1, ( m = 2<sup>k</sup> </i> for some<i> k) </i></a>(4) , </p> <p> respectively. </p> <p> The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a> has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i> is affected by all the bits of <i>r</i> (minimizing the chance of collision). It has the disadvantage of using the costly modulo operation. This method is hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>]. </p> <p> The <i>&</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a> has the advantage of relying on the fast bitwise and operation. It has the disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>. This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]. </p> <h2><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h2> <p> Although rarer, there are cases where it is beneficial to allow the client to directly specify a ranged-hash hash function. It is true, that the writer of the ranged-hash function cannot rely on the values of <i>m</i> having specific numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]), since the values of <i>m</i> are determined by a resize policy with possibly orthogonal considerations [<a href="references.html#austern98segmented">austern98segmented</a>]. The values of <i>m</i> can be used in some cases, though, to estimate the "general" number of distinct values required. </p> <p> Let </p> <p> <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i> </p> <p> be a string of <i>t</i> characters, each of which is from domain <i>S</i>. Consider the following ranged-hash function: </p> <p> <a name="eqn:total_string_dna_hash"> <i> f<sub>1</sub>(s, m) = ∑ <sub>i = 0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i> </a> (5) , </p> <p> where <i>a</i> is some non-negative integral value. This is the standard string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>]. Its advantage is that it takes into account all of the characters of the string. </p> <p> Now assume that <i>s</i> is the string representation of a of a long DNA sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the entire string might be prohibitively expensive. A possible alternative might be to use only the first <i>k</i> characters of the string, where </p> <p> k <sup>|S|</sup> ≥ m , </p> <p> <i>i.e.</i>, using the hash function </p> <p> <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6) </p> <p> requiring scanning over only </p> <p> <i>k = </i>log<i><sub>4</sub>( m ) </i> </p> <p> characters. </p> <p> Other more elaborate hash-functions might scan <i>k</i> characters starting at a random position (determined at each resize), or scanning <i>k</i> random positions (determined at each resize), <i>i.e.</i>, using </p> <p> <i>f<sub>3</sub>(s, m) = ∑ <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>, </p> <p> or </p> <p> <i>f<sub>4</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub> a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>, </p> <p> <p> respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the (inclusive) range <i>[0,...,t-1]</i>. </p> <h2><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h2> <p> Containers based on collision-chaining hash tables in <tt>pb_assoc</tt> are parameterized by the functors <tt>Hash_Fn</tt>, and <tt>Comb_Hash_Fn</tt>. </p> <p> If such a container is instantiated with any hash functor and range-hashing functor, the container will synthesize a ranged-hash functor automatically. For example, Figure <a href = "#hash_range_hashing_seq_diagram"> Insert hash sequence diagram </a> shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A), the container transforms the key into a non-negative integral using the hash functor (points B and C), and transforms the result into a position using the combining functor (points D and E). </p> <h6 align = "center"> <a name = "hash_range_hashing_seq_diagram"> <img src = "hash_range_hashing_seq_diagram.jpg" width = "50%" alt = "no image"> </a> </h6> <h6 align = "center"> Insert hash sequence diagram. </h6> <p> If such a container is instantiated with the <a href = "concepts.html#concepts_null_policies">null policy</a> hash functor, <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>, and a combining-hash functor, the container treats the combining hash functor as a ranged-hash function. For example, Figure <a href = "#hash_range_hashing_seq_diagram2"> Insert hash sequence diagram with a null combination policy </a> shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A), the container transforms the key into a position using the combining functor (points B and C). </p> <h6 align = "center"> <a name = "hash_range_hashing_seq_diagram2"> <img src = "hash_range_hashing_seq_diagram2.jpg" width = "50%" alt = "no image"> </a> </h6> <h6 align = "center"> Insert hash sequence diagram with a null combination policy. </h6> <p> <tt>pb_assoc</tt> contains the following hash-related policies: </p> <ol> <li> <a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a> and <a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a> are range-hashing functions based on a bit-mask and a modulo operation, respectively. </li> <li> <a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> and <a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> are probe classes based on linear and quadratic increment, respectively. </li> <li> <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and <a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a> are <a href = "concepts.html#concepts_null_policies">null policy classes</a> for creating ranged-hash and ranged-probe functions directly (<i>i.e.</i>, not through composition). </li> </ol> <p> <tt>pb_assoc</tt> does not provide any hash functions (it relies on those of the STL). </p> </body> </html>