URL
https://opencores.org/ocsvn/scarts/scarts/trunk
Subversion Repositories scarts
[/] [scarts/] [trunk/] [toolchain/] [scarts-gcc/] [gcc-4.1.1/] [libstdc++-v3/] [docs/] [html/] [ext/] [pb_assoc/] [hash_based_containers.html] - Rev 20
Compare with Previous | Blame | View Log
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <html> <head> <title>Hash-Based Containers</title> <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1"> <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5"> </head> <body bgcolor = "white"> <h1>Hash-Based Containers</h1> <p> This section describes hash-based containers. It is organized as follows. </p> <ol> <li> <a href = "#overview">Overview</a> is an overview. </li> <li> <a href = "#hash_policies">Hash Policies</a> discusses hash policies. </li> <li> <a href = "#resize_policies">Resize Policies</a> discusses resize policies. </li> <li> <a href = "#policy_interaction">Policy Interaction</a> discusses interaction between policies. </li> </ol> <h2><a name = "overview">Overview</a></h2> <p> Figure <a href = "#hash_cd">Hash-based containers</a> shows the container-hierarchy; the hash-based containers are circled. <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is a collision-chaining hash-based container; <a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a> is a (general) probing hash-based container. </p> <h6 align = "center"> <a name = "hash_cd"> <img src = "hash_cd.jpg" width = "70%" alt = "no image"> </h6> <h6 align = "center"> </a> Hash-based containers. </h6> <p> The collision-chaining hash-based container has the following declaration. </p> <pre> <b>template</b>< <b>typename</b> Key, <b>typename</b> Data, <b>class</b> Hash_Fn = std::hash<Key>, <b>class</b> Eq_Fn = std::equal_to<Key>, <b>class</b> Comb_Hash_Fn = <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a><> <b>class</b> Resize_Policy = <i>default explained below.</i> <b>bool</b> Store_Hash = <b>false</b>, <b>class</b> Allocator = std::allocator<<b>char</b>> > <b>class</b> <a href = "cc_hash_assoc_cntnr.html">cc_hash_assoc_cntnr</a>; </pre> <p> The parameters have the following meaning: </p> <ol> <li> <tt>Key</tt> is the key type. </li> <li> <tt>Data</tt> is the data-policy, and is explained in <a href = "ms_gen.html#ds_policy">Mapping-Semantics Genericity::Data Types as a Policy</a>. </li> <li> <tt>Hash_Fn</tt> is a key hashing functor.</li> <li> <tt>Eq_Fn</tt> is a key equivalence functor.</li> <li> <tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; it describes how to translate hash values into positions within the table. This is described in <a name = "#hash_policies">Hash Policies</a>.</li> </li> <li> <tt>Resize_Policy</tt> describes how a container object should change its internal size. This is described in <a name = #resize_policies">Resize Policies</a>.</li> <li> <tt>Store_Hash</tt> indicates whether the hash value should be stored with each entry. This is described in <a name = "#policy_interaction">Policy Interaction</a>.</li> <li> <tt>Allocator</tt> is (surprisingly) an allocator type. </li> </ol> <p> The probing hash-based container has the following declaration. </p> <pre> <b>template</b>< <b>typename</b> Key, <b>typename</b> Data, <b>class</b> Hash_Fn = std::hash< Key>, <b>class</b> Eq_Fn = std::equal_to< Key>, <b>class</b> Comb_Probe_Fn = <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a><> <b>class</b> Probe_Fn = <i>default explained below.</i> <b>class</b> Resize_Policy = <i>default explained below.</i> <b>bool</b> Store_Hash = <b>false</b>, <b>class</b> Allocator = std::allocator<<b>char</b>> > <b>class</b> <a href = "gp_hash_assoc_cntnr.html">gp_hash_assoc_cntnr</a>; </pre> <p> The parameters are identical to those of the collision-chaining container, except for the following. </p> <ol> <li> <tt>Comb_Probe_Fn</tt> describes how to transform a probe sequence into a sequence of positions within the table. </li> <li> <tt>Probe_Fn</tt> describes a probe sequence policy.</li> </ol> <p> Some of the default template values depend on the values of other parameters, and are explained in <a name = "#policy_interaction">Policy Interaction</a>. </p> <h2><a name = "hash_policies">Hash Policies</h2></a> <p> This subsection describes hash policies. It is organized as follows: </p> <ol> <li> <a href = "#general_terms">General Terms</a> describes some general terms. </li> <li> <a href = "#range_hashing_fns">Range-Hashing Functions</a> describes range-hasing functions.</li> <li> <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a> describes ranged-hash functions. </li> <li> <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a> describes the implementation of these concepts in <tt>pb_assoc</tt>. </li> </ol> <h3><a name="general_terms">General Terms</a></h3> <p> There are actually three functions involved in transforming a key into a hash-table's position (see Figure <a href = "#hash_ranged_hash_range_hashing_fns"> Hash runctions, ranged-hash functions, and range-hashing functions </a>): </p> <ol> <li> A <i>ranged-hash</i> function, which maps keys into an interval of the non-negative integrals. This is the function actually required by the hash-table algorithm. </li> <li> A hash function, which maps keys into non-negative integral types. This is typically specified by the writer of the key class. </li> <li> A <i>range-hashing</i> function, which maps non-negative integral types into an interval of non-negative integral types. </li> </ol> <h6 align = "center"> <a name = "hash_ranged_hash_range_hashing_fns"> <img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "70%" alt = "no image"> </h6> <h6 align = "center"> </a> Hash runctions, ranged-hash functions, and range-hashing functions. </h6> <p> Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3 characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly" into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i> function </p> <p> <i>f : U × Z<sub>+</sub> → Z<sub>+</sub> </i>, </p> <p> such that for any <i>u</i> in <i>U</i> , </p> <p> <i>0 ≤ f(u, m) ≤ m - 1 </i>, </p> <p> and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>]. One common solution is to use the composition of the hash function </p> <p> <i>h : U → Z<sub>+</sub> </i>, </p> <p> which maps elements of <i>U</i> into the non-negative integrals, and </p> <p> <i>g : Z<sub>+</sub> × Z<sub>+</sub> → Z<sub>+</sub>, </i> </p> <p> which maps a non-negative hash value, and a non-negative range upper-bound into a non-negative integral in the range between 0 (inclusive) and the range upper bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>, </p> <p> <i>0 ≤ g(r, m) ≤ m - 1 </i>. </p> <p> The resulting ranged-hash function, is </p> <p> <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a> </i>(1) . </p> <p> From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can always be composed (however the converse is not true). The STL's hash-based containers allow specifying a hash function, and use a hard-wired range-hashing function; the ranged-hash function is implicitly composed. </p> <p> The above describes the case where a key is to be mapped into a <i>single position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table. In other cases, a key is to be mapped into a <i>sequence of poisitions</i> within a table, <i>e.g.</i>, in a probing table. </p> <p> Similar terms apply in this case: the table requires a <i>ranged probe</i> function, mapping a key into a sequence of positions withing the table. This is typically acheived by composing a <i>hash function</i> mapping the key into a non-negative integral type, a <i>probe</i> function transforming the hash value into a sequence of hash values, and a <i>range-hashing</i> function transforming the sequence of hash values into a sequence of positions. </p> <h3><a name="range_hashing_fns">Range-Hashing Functions</a></h3> <p> Some common choices for range-hashing functions are the division, multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>], defined as </p> <p> <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) , </p> <p> <i>g(r, m) = ⌈ u/v ( a r mod v ) ⌉ </i>, </p> <p> and </p> <p> <i>g(r, m) = ⌈ u/v ( r<sup>2</sup> mod v ) ⌉ </i>, </p> <p> respectively, for some positive integrals <i>u</i> and <i>v</i> (typically powers of 2), and some <i>a</i>. Each of these range-hashing functions works best for some different setting. </p> <p> The division method <a href="#division_method">(2)</a> is a very common choice. However, even this single method can be implemented in two very different ways. It is possible to implement <a href="#division_method">(2)</a> using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low level <i>&</i> (bit-mask) operation (for the case where <i>m</i> is a power of 2), <i>i.e.</i>, </p> <p> <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) , </p> <p> and </p> <p> <a name="eqn:division_method_bit_mask"> <i>g(r, m) = r & m - 1, ( m = 2<sup>k</sup> </i> for some<i> k) </i></a>(4) , </p> <p> respectively. </p> <p> The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a> has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i> is affected by all the bits of <i>r</i> (minimizing the chance of collision). It has the disadvantage of using the costly modulo operation. This method is hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>]. </p> <p> The <i>&</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a> has the advantage of relying on the fast bitwise and operation. It has the disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>. This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]. </p> <h3><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h3> <p> In some less frequent cases it is beneficial to allow the client to directly specify a ranged-hash hash function. It is true, that the writer of the ranged-hash function cannot rely on the values of <i>m</i> having specific numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]), since the values of <i>m</i> are determined by a resize policy with possibly orthogonal considerations. </p> <p> There are two cases where a ranged-hash function can be superior. The firs is when using perfect hashing [<a href="references.html#knuth98sorting">knuth98sorting</a>]; the second is when the values of <i>m</i> can be used to estimate the "general" number of distinct values required. This is described in the following. </p> <p> Let </p> <p> <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i> </p> <p> be a string of <i>t</i> characters, each of which is from domain <i>S</i>. Consider the following ranged-hash function: </p> <p> <a name="eqn:total_string_dna_hash"> <i> f<sub>1</sub>(s, m) = ∑ <sub>i = 0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i> </a> (5) , </p> <p> where <i>a</i> is some non-negative integral value. This is the standard string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>]. Its advantage is that it takes into account all of the characters of the string. </p> <p> Now assume that <i>s</i> is the string representation of a of a long DNA sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the entire string might be prohibitively expensive. A possible alternative might be to use only the first <i>k</i> characters of the string, where </p> <p> k <sup>|S|</sup> ≥ m , </p> <p> <i>i.e.</i>, using the hash function </p> <p> <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6) </p> <p> requiring scanning over only </p> <p> <i>k = </i>log<i><sub>4</sub>( m ) </i> </p> <p> characters. </p> <p> Other more elaborate hash-functions might scan <i>k</i> characters starting at a random position (determined at each resize), or scanning <i>k</i> random positions (determined at each resize), <i>i.e.</i>, using </p> <p> <i>f<sub>3</sub>(s, m) = ∑ <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>, </p> <p> or </p> <p> <i>f<sub>4</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub> a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>, </p> <p> <p> respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the (inclusive) range <i>[0,...,t-1]</i>. </p> <h3><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h3> <p> <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is parameterized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a hash functor and a combining hash functor, respectively. </p> <p> For any hash functor except <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>, one of the <a href = "concepts.html#concepts_null_policies">Concepts::Null Policy Classes</a>, then <tt>Comb_Hash_Fn</tt> is considered a range-hashing functor. The container will synthesize a ranged-hash functor from both. For example, Figure <a href = "#hash_range_hashing_seq_diagram"> Insert hash sequence diagram </a> shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A), the container transforms the key into a non-negative integral using the hash functor (points B and C), and transforms the result into a position using the combining functor (points D and E). </p> <h6 align = "center"> <a name = "hash_range_hashing_seq_diagram"> <img src = "hash_range_hashing_seq_diagram.jpg" width = "70%" alt = "no image"> </a> </h6> <h6 align = "center"> Insert hash sequence diagram. </h6> <p> <tt>pb_assoc</tt> contains the following range-hashing policies: </p> <ol> <li> <a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a> and <a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a> are range-hashing functions based on a bit-mask and a modulo operation, respectively. </li> </ol> <p> If <tt>Comb_Hash_Fn</tt> is instantiated by <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>, and a combining-hash functor, the container treats the combining hash functor as a ranged-hash function. For example, Figure <a href = "#hash_range_hashing_seq_diagram2"> Insert hash sequence diagram with a null combination policy </a> shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A), the container transforms the key into a position using the combining functor (points B and C). </p> <h6 align = "center"> <a name = "hash_range_hashing_seq_diagram2"> <img src = "hash_range_hashing_seq_diagram2.jpg" width = "70%" alt = "no image"> </a> </h6> <h6 align = "center"> Insert hash sequence diagram with a null combination policy. </h6> <p> Similarly, <a href = "gp_hash_assoc_cntnr.html"></a><tt>gp_hash_assoc_cntnr</tt> is parameterized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and <tt>Comb_Probe_Fn</tt>. As before, if <tt>Probe_Fn</tt> and <tt>Comb_Probe_Fn</tt> are, respectively, <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and <a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>, then <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, <tt>Hash_Fn</tt> is a hash functor, <tt>Probe_Fn</tt> is a functor for offsets from a hash value, and <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a sequence of positions within the table. </p> <p> <tt>pb_assoc</tt> contains the following probe policies: </p> <ol> <li> <a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> is a linear probe function. </li> <li> <a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> is a quadratic probe function. </li> </ol> <h2><a name = "resize_policies">Resize Policies</a></h2> <p> This subsection describes resize policies. It is organized as follows: </p> <ol> <li> <a href = "#general">General Terms</a> describes general terms. </li> <li> <a href = "#size_policies">Size Policies</a> describes size policies. </li> <li> <a href = "#trigger_policies">Trigger Policies</a> describes trigger policies. </li> <li> <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a> describes the implementation of these concepts in <tt>pb_assoc</tt>. </li> </ol> <h3><a name = "general">General Terms</a></h3> <p> Hash-tables, as opposed to trees, do not naturally grow or shrink. It is necessary to specify policies to determine how and when a hash table should change its size. </p> <p> In general, resize policies can be decomposed into (probably orthogonal) policies: </p> <ol> <li> A <i>size policy</i> indicating <i>how</i> a hash table should grow (<i>e.g.,</i> it should multiply by powers of 2). </li> <li> A <i>trigger policy</i> indicating <i>when</i> a hash table should grow (<i>e.g.,</i> a load factor is exceeded). </li> </ol> <h3><a name = "size_policies">Size Policies</a></h3> <p> Size policies determine how a hash table changes size. These policies are simple, and there are relatively few sensible options. An exponential-size policy (with the initial size and growth factors both powers of 2) works well with a mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection), and is the hard-wired policy used by Dinkumware [<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A prime-list based policy works well with a modulo-prime range hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection), and is the hard-wired policy used by SGI's implementation [<a href = "references.html#sgi_stl">sgi_stl</a>]. </p> <h3><a name = "trigger_policies">Trigger Policies</a></h3> <p> Trigger policies determine when a hash table changes size. Following is a description of two polcies: <i>load-check</i> policies, and a collision-check policies. </p> <p> Load-check policies are straightforward. The user specifies two factors, <i>α<sub>min</sub></i> and <i>α<sub>max</sub></i>, and the hash table maintains the invariant that </p> <p> <i> <a name = "eqn:load_factor_min_max"> α<sub>min</sub> ≤ (number of stored elements) / (hash-table size) ≤ α<sub>max</sub> </a> </i> (1) . </p> <p> Collision-check policies work in the opposite direction of load-check policies. They focus on keeping the number of collisions moderate and hoping that the size of the table will not grow very large, instead of keeping a moderate load-factor and hoping that the number of collisions will be small. A maximal collision-check policy resizes when the shortest probe-sequence grows too large. </p> <p> Consider Figure <a href = "#balls_and_bins">Balls and bins</a>. Let the size of the hash table be denoted by <i>m</i>, the length of a probe sequence be denoted by <i>k</i>, and some load factor be denoted by α. We would like to calculate the minimal length of <i>k</i>, such that if there were <i>α m</i> elements in the hash table, a probe sequence of length <i>k</i> would be found with probability at most <i>1/m</i>. </p> <h6 align = "center"> <a name = "balls_and_bins"> <img src = "balls_and_bins.jpg" width = "70%" alt = "no image"> </a> </h6> <h6 align = "center"> Balls and bins. </h6> <p> Denote the probability that a probe sequence of length <i>k</i> appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution. Then </p> <p> <a name = "eqn:prob_of_p1"> <i>p<sub>1</sub> = </i>(3) </a> </p> <p> <i> <b>P</b>(l<sub>1</sub> ≥ k) = </i> </p> <p> <i><b>P</b>(l<sub>1</sub> ≥ α ( 1 + k / α - 1 ) ≤ </i>(a) </p> <p> <i> e ^ ( - ( α ( k / α - 1 )<sup>2</sup> ) /2 ) </i> , </p> <p> where (a) follows from the Chernoff bound [<a href = "references.html#motwani95random">motwani95random</a>]. To calculate the probability that <i>some</i> bin contains a probe sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are negatively-dependent [<a href = "references.html#dubhashi98neg">dubhashi98neg</a>]. Let <i><b>I</b>(.)</i> denote the indicator function. Then <p> <a name = "eqn:at_least_k_i_n_some_bin"> <i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> ≥ k ) = </i>(3) </a> </p> <p> <i> <b>P</b> ( ∑ <sub>i = 1</sub><sup>m</sup> <b>I</b>(l<sub>i</sub> ≥ k) ≥ 1 ) = </i> </p> <p> <i> <b>P</b> ( ∑ <sub>i = 1</sub><sup>m</sup> <b>I</b> ( l<sub>i</sub> ≥ k ) ≥ m p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 ) ) ≤ </i>(a) </p> <p> <i> e ^ ( ( - m p<sub>1</sub> ( 1 / (m p<sub>1</sub>) - 1 ) <sup>2</sup> ) / 2 ) , </i> </p> <p> where (a) follows from the fact that the Chernoff bound can be applied to negatively-dependent variables [<a href = "references.html#dubhashi98neg">dubhashi98neg</a>]. Inserting <a href = "#prob_of_p1">(2)</a> into <a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>, we obtain </p> <p> <i> k ~ √ ( 2 α </i>ln<i> 2 m </i>ln<i>(m) ) ) </i> . </p> <h3><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h3> <p> The resize policies in the previous subsection are conceptually straightforward. The design of hash-based containers' size-related interface is complicated by some factors. </p> <ol> <li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no distinction between the number of entries the container holds and the number of entries it is using. This, of course, is not the case for hash-based containers. Moreover, even describing the "actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries. </li> <li> The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy maintains an invariant concerning the load factor of a container object. This is sometimes too rigid: <ol> <li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container). </li> <li> In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy. </li> </ol> </li> <li> Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a> discusses the previous concepts.) </li> </ol> <p> Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers. </p> <p> This library attempts to address these types of problems by delegating all size-related functionality to policy classes. Hash-based containers are parameterized by a resize-policy class (among others), and derive publicly from the resize-policy class [<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] <i>E.g.</i>, a collision-chaining hash table is defined as follows: </p> <pre> cc_ht_map< <b>class</b> Key, <b>class</b> Data, ... <b>class</b> Resize_Policy ...> : <b>public</b> Resize_Policy </pre> <p> The containers themselves lack any functionality or public interface for manipulating sizes. A container object merely forwards events to its resize policy object and queries it for needed actions. </p> <p> Figure <a href = "#insert_resize_sequence_diagram1"> Insert resize sequence diagram </a> shows a (possible) sequence diagram of an insert operation. The user inserts an element; the hash table notifies its resize policy that a search has started (point A); in this case, a single collision is encountered - the table notifies its resize policy of this (point B); the container finally notifies its resize policy that the search has ended (point C); it then queries its resize policy whether a resize is needed, and if so, what is the new size (points D to G); following the resize, it notifies the policy that a resize has completed (point H); finally, the element is inserted, and the policy notified (point I). </p> <h6 align = "center"> <a name = "insert_resize_sequence_diagram1"> <img src = "insert_resize_sequence_diagram1.jpg" width = "70%" alt = "no image"> </a> </h6> <h6 align = "center"> Insert resize sequence diagram. </h6> <p> This addresses, to some extent, the problems mentioned above: </p> <ol> <li> Different instantiations of range-hashing policies can be met with different instantiations of resize policies. </li> <li> Questions on size-related interface are avoided, since the containers have no size-related methods. Thus a container has no method for querying its actual size. It merely continuously forwards enough information to its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design the appropriate method. Also, a container has no methods for setting its size. It merely queries its resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and supports a <tt><b>protected virtual</b></tt> function for external resize. </li> </ol> <p> The library contains a single class for instantiating a resize policy, <tt>pb_assoc</tt> contains a standard resize policy, <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> (the name is explained shortly). In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility). ([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for changing between alternative interfaces at compile time.) </p> <p> As noted before, size and trigger policies are usually orthogonal. <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> is parameterized by size and trigger policies. For example, a collision-chaining hash table is typically be defined as follows: </p> <pre> cc_ht_map< key, data, ... hash_standard_resize_policy< some_trigger_policy, some_size_policy, ...> > </pre> <p> The sole function of <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> is to act as a standard delegator [<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these policies. <p> Figures <a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a> and <a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a> show sequence diagrams illustrating the interaction between the standard resize policy and its trigger and size policies, respectively. </p> <h6 align = "center"> <a name = "insert_resize_sequence_diagram2"> <img src = "insert_resize_sequence_diagram2.jpg" width = "70%" alt = "no image"> </a> </h6> <h6 align = "center"> Standard resize policy trigger sequence diagram. </h6> <h6 align = "center"> <a name = "insert_resize_sequence_diagram3"> <img src = "insert_resize_sequence_diagram3.jpg" width = "70%" alt = "no image"> </a> </h6> <h6 align = "center"> Standard resize policy size sequence diagram. </h6> <p> The library (currently) supports the following instantiations of size and trigger policies: </p> <ol> <li> <a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> implements a load check trigger policy. </li> <li> <a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a> implements a collision check trigger policy. </li> <li> <a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a> implemens an exponential-size policy (which should be used with mask range hashing). </li> <li> <a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a> implementing a size policy based on a sequence of primes [<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing </li> </ol> <p> The trigger policies also support interfaces for changing their specifics which depend on compile time constants. </p> <p> Figure <a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture of the resize-related classes. <tt>Container</tt> (which stands for any of the hash-based containers) is parameterized by <tt>Resize_Policy</tt>, from which it subclasses publicly [<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]. This class is currently instantiated only by <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>. <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> itself is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>. Currently, <tt>Trigger_Policy</tt> is instantiated by <a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>, or <a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by <a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>, or <a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>. </p> <h6 align = "center"> <a name = "resize_policy_cd"> <img src = "resize_policy_cd.jpg" width = "70%" alt = "no image"> </a> </h6> <h6 align = "center"> Resize policy class diagram. </h6> <h2><a name = "#policy_interaction">Policy Interaction</a></h2> <p> Hash-tables are unfortunately susceptible to choice of policies. One of the more complicated aspects of this is that poor combinations of good policies can alter performance drastically. Following are some considerations. </p> <h3>Range-Hashing Policies and Resize Policies</h3> <p> </p> <h3>Equivalence Functors, Storing Hash Values, and Hash Functions</h3> <p> <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> and <a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a> are parameterized by an equivalenc functor and by a <tt>Store_Hash</tt> parameter. If the latter parameter is <tt><b>true</b></tt>, then the container stores with each entry a hash value, and uses this value in case of collisions to determine whether to apply a hash value. This can lower the cost of collision for some types, but increase the cost of collisions for other types. </p> <p> If a ranged-hash function or ranged probe function is directly supplied, however, then it makes no sense to store the hash value with each entry. <tt>pb_assoc</tt>'s container will fail at compilation, by design, if this is attempted. </p> </body> </html>