URL https://opencores.org/ocsvn/scarts/scarts/trunk
Subversion Repositories scarts

[/] [scarts/] [trunk/] [toolchain/] [scarts-gcc/] [gcc-4.1.1/] [libstdc++-v3/] [docs/] [html/] [ext/] [pb_assoc/] [resize_policies.html] - Rev 20

Compare with Previous | Blame | View Log
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
    <head>
        <title>Resize Policies</title>
        <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
        <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
    </head>
<body bgcolor = "white">
<h1>Hash-Based Continers' Resize Policies</h1>
 
<p>
    This subsection describes resize policies. It is organized as follows:
</p>
 
<ol>
    <li> The <a href = "#general">General Terms</a> Subsection describes general
        terms.
    </li>
    <li> The <a href = "#size_policies">Size Policies</a> Subsection describes size
    policies.
    </li>
    <li> The <a href = "#trigger_policies">Trigger Policies</a> Subsection describes trigger
    policies.
    </li>   <li> The <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a>
        Subsection describes the implementation of these concepts in <tt>pb_assoc</tt>.
    </li>
</ol>
 
 
<h2><a name = "general">General Terms</a></h2>
 
<p>
    Hash-tables, as opposed to trees, do not naturally grow or shrink. It
is necessary to specify policies to determine how and when a hash table should change
its size.
</p>
 
<p>
    In general, resize policies can be decomposed into (probably orthogonal)
policies:
</p>
<ol>
    <li> A <i>size policy</i> indicating <i>how</i> a hash table should
grow (<i>e.g.,</i> it should multiply by powers of 2).
    </li>
    <li> A <i>trigger policy</i> indicating <i>when</i> a hash table should
grow (<i>e.g.,</i> a load factor is exceeded).
    </li>
</ol>
 
 
 
<h2><a name = "size_policies">Size Policies</a></h2>
 
<p>
    Size policies determine how a hash table
changes size. These policies are simple, and there are relatively
few sensible options. An exponential-size policy (with the initial
size and growth factors both powers of 2) works well with a
mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
and is the
hard-wired policy used by Dinkumware
[<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A
prime-list based policy works well with a modulo-prime range
hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
and is the
hard-wired policy used by SGI's implementation
[<a href = "references.html#sgi_stl">sgi_stl</a>].
</p>
 
 
 
 
<h2><a name = "trigger_policies">Trigger Policies</a></h2>
 
<p>
    Trigger policies determine when a hash table changes size.
Following is a description of two polcies: <i>load-check</i>
policies, and a collision-check policies.
</p>
 
<p>
    Load-check policies are straightforward. The user
specifies two factors, <i>&alpha;<sub>min</sub></i> and <i>&alpha;<sub>max</sub></i>, and
the hash table maintains the invariant that
</p>
<p>
    <i>
        <a name = "eqn:load_factor_min_max">
        &alpha;<sub>min</sub>
        &le;
        (number of stored elements) / (hash-table size)
        &le;
        &alpha;<sub>max</sub>
        </a>
    </i>
    (1)
    .
</p>
 
<p>
    Collision-check policies work in the opposite direction of
load-check policies. They focus on keeping the number of
collisions moderate and hoping
that the size of the table will not grow very large,
instead of keeping a moderate load-factor and
hoping that the number of collisions will be small.
A
maximal collision-check policy resizes when the shortest
probe-sequence grows too large.
</p>
 
 
<p>
    Consider Figure
<a href = "#balls_and_bins">Balls and bins</a>.
    Let the size of the hash table be denoted by <i>m</i>, the
length of a probe sequence be denoted by <i>k</i>, and some load
factor be denoted by &alpha;. We would like to calculate the
minimal length of <i>k</i>, such that if there were <i>&alpha; m</i> elements
in the hash table, a probe sequence of length <i>k</i> would be found
with probability at most <i>1/m</i>.
</p>
 
<h6 align = "center">
<a name = "balls_and_bins">
<img src = "balls_and_bins.jpg" width = "60%" alt = "no image">
</a>
Balls and bins.
</h6>
 
 
<p>
    Denote the probability that a probe sequence of length <i>k</i>
appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence
of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution.
Then
</p>
    <p>
    <a name = "eqn:prob_of_p1">
        <i>p<sub>1</sub>
        = </i>(3)
    </a>
    </p>
    <p>
    <i>
    <b>P</b>(l<sub>1</sub> &ge; k)
    =
    </i>
    </p>
    <p>
    <i><b>P</b>(l<sub>1</sub> &ge; &alpha; ( 1 + k / &alpha; - 1 )
    &le; </i>(a)
    </p>
    <p>
    <i>
    e
    ^
    (
        -
        (
            &alpha; ( k / &alpha; - 1 )<sup>2</sup>
        )
        /2
    )
    </i>
    ,
</p>
<p>
    where (a) follows from the Chernoff bound
[<a href = "references.html#motwani95random">motwani95random</a>].
To
calculate the probability that <i>some</i> bin contains a probe
sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are
negatively-dependent
[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
Let <i><b>I</b>(.)</i>
denote the indicator function. Then
    <p>
    <a name = "eqn:at_least_k_i_n_some_bin">
        <i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> &ge; k )
        = </i>(3)
    </a>
    </p>
    <p>
    <i>
   <b>P</b>
    (
        &sum; <sub>i = 1</sub><sup>m</sup>
        <b>I</b>(l<sub>i</sub> &ge; k) &ge; 1
    )
    =
    </i>
    </p>
    <p>
    <i>
    <b>P</b>
    (
        &sum; <sub>i = 1</sub><sup>m</sup>
        <b>I</b>
        (
            l<sub>i</sub> &ge; k
        )
        &ge;
        m  p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 )
    )
    &le; </i>(a)
    </p>
    <p>
    <i>
    e
    ^
    (
        (
            -
            m p<sub>1</sub>
            (
                1 / (m p<sub>1</sub>) - 1
            )
            <sup>2</sup>
        )
        /
        2
    )
    ,
    </i>
    </p>
<p>
where (a) follows from the fact that the Chernoff bound can be
applied to negatively-dependent variables
[<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
Inserting <a href = "#prob_of_p1">(2)</a> into
<a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>,
we obtain
</p>
<p>
    <i>
    k
    ~
    &radic;
    (
        2 &alpha; </i>ln<i> 2 m </i>ln<i>(m) )
    )
    </i>
    .
</p>
 
 
 
 
 
 
 
 
 
<h2><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h2>
 
<p>
    The resize policies in the previous subsection are conceptually straightforward. The design
of hash-based containers' size-related interface is complicated by some factors.
</p>
<ol>
    <li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no
distinction between the number of entries the container holds and the number of entries it is using. This,
of course, is not the case for hash-based containers. Moreover, even describing the
"actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container
holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries.
    </li>
    <li>
    The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy
maintains an invariant concerning the load factor of a container object. This is sometimes too rigid:
    <ol>
        <li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container).
        </li>
        <li>
            In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy.
        </li>
    </ol>
    </li>
    <li>
    Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a>
discusses the previous concepts.)
    </li>
</ol>
 
<p>
    Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers.
</p>
 
 
 
<p>
        This library attempts to address these types of problems by delegating all size-related functionality to
policy classes. Hash-based containers
are parameterized by a resize-policy class (among others), and derive publicly from
the resize-policy class
[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]
 <i>E.g.</i>, a collision-chaining
hash table is defined as follows:
</p>
<pre>
cc_ht_map&lt;
  <b>class</b> Key,
  <b>class</b> Data,
  ...
  <b>class</b> Resize_Policy
  ...&gt; :
    <b>public</b> Resize_Policy
</pre>
 
<p>
    The containers themselves lack any functionality or public interface for manipulating sizes. A container
object merely forwards events to its resize policy object and queries it for needed actions.
</p>
 
<p>
    Figure
<a href = "#insert_resize_sequence_diagram1">
Insert resize sequence diagram
</a>
shows a (possible) sequence diagram of an insert operation.
The user inserts an element; the hash table
notifies its resize policy that a search has started (point A);
in this case, a single collision is encountered - the table
notifies its resize policy of this (point B); the container
finally notifies its resize policy that the search has ended (point C);
it then queries its resize policy whether a resize is needed, and if so,
what is the new size (points D to G); following the resize, it notifies
the policy that a resize has completed (point H); finally, the element
is inserted, and the policy notified (point I).
</p>
 
<h6 align = "center">
<a name = "insert_resize_sequence_diagram1">
<img src = "insert_resize_sequence_diagram1.jpg" width = "50%" alt = "no image">
</a>
</h6>
<h6 align = "center">
Insert resize sequence diagram.
</h6>
 
<p>
    This addresses, to some extent, the problems mentioned above:
</p>
<ol>
    <li>
        Different instantiations of range-hashing policies can be met with different instantiations of
    resize policies.
    </li>
    <li>
        Questions on size-related interface are avoided, since the containers have no size-related methods. Thus
    a container has no method for querying its actual size. It merely continuously forwards enough information to
    its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design     the appropriate method. Also, a container has no methods for setting its size. It merely queries its
resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and
supports a <tt><b>protected virtual</b></tt> function for external resize.
    </li>
</ol>
 
<p>
    The library contains a single class for instantiating a resize policy,
<tt>pb_assoc</tt> contains
a standard resize policy,
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> (the name is explained shortly).
In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports
queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility).
([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for
changing between alternative interfaces at compile time.)
</p>
 
<p>
As noted before,
    size and trigger policies are usually orthogonal.
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
is parameterized by size and trigger policies. For example,
a collision-chaining hash table
is typically be defined as follows:
</p>
<pre>
cc_ht_map&lt;
  key,
  data,
  ...
  hash_standard_resize_policy&lt;
    some_trigger_policy,
    some_size_policy,
    ...&gt; &gt;
</pre>
 
<p>
 The sole function of
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
 is to
act as a standard delegator
[<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these
policies.
 
<p>
    Figures
<a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a>
    and
<a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a>
    show sequence diagrams illustrating the interaction between
    the standard resize policy and its trigger and size policies, respectively.
</p>
 
<h6 align = "center">
<a name = "insert_resize_sequence_diagram2">
<img src = "insert_resize_sequence_diagram2.jpg" width = "50%" alt = "no image">
</a>
</h6>
<h6 align = "center">
Standard resize policy trigger sequence diagram.
</h6>
 
<h6 align = "center">
<a name = "insert_resize_sequence_diagram3">
<img src = "insert_resize_sequence_diagram3.jpg" width = "50%" alt = "no image">
</a>
</h6>
<h6 align = "center">
Standard resize policy size sequence diagram.
</h6>
 
<p>
    The library (currently) supports the following instantiations of size
and trigger policies:
</p>
 
<ol>
    <li>
        <a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> implements
    a load check trigger policy.
    </li>
    <li>
        <a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
    implements a collision check trigger policy.
    </li>
    <li>
<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a> implemens
an exponential-size policy (which should be used with mask range hashing).
    </li>
    <li>
<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a> implementing
a size policy based on a sequence of primes
[<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing
    </li>
</ol>
 
<p>
    The trigger policies also support interfaces for changing their specifics which depend on compile time constants.
</p>
 
 
<p>
    Figure
<a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture
of the resize-related classes.
<tt>Container</tt> (which stands for any of the hash-based containers) is parameterized
by <tt>Resize_Policy</tt>, from which it subclasses publicly
[<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>].
This class is currently instantiated only by
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.
<a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> itself
is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>.
Currently, <tt>Trigger_Policy</tt> is instantiated by
<a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>,
or
<a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by
<a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>,
or
<a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.
</p>
 
 
<h6 align = "center">
<a name = "resize_policy_cd">
<img src = "resize_policy_cd.jpg" width = "40%" alt = "no image">
</a>
</h6>
<h6 align = "center">
Resize policy class diagram.
</h6>
 
 
</body>
 
</html>
Compare with Previous | Blame | View Log
Browse

Tools

Subversion Repositories scarts

[/] [scarts/] [trunk/] [toolchain/] [scarts-gcc/] [gcc-4.1.1/] [libstdc++-v3/] [docs/] [html/] [ext/] [pb_assoc/] [resize_policies.html] - Rev 20