OpenCores
URL https://opencores.org/ocsvn/scarts/scarts/trunk

Subversion Repositories scarts

[/] [scarts/] [trunk/] [toolchain/] [scarts-gcc/] [gcc-4.1.1/] [libstdc++-v3/] [docs/] [html/] [ext/] [pb_assoc/] [motivation.html] - Rev 20

Compare with Previous | Blame | View Log

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
    <head>
        <title>Motivation</title>
        <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
        <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
    </head>
<body bgcolor = "white">
<h1>Motivation</h1>
 
<p>
	The <a href = "introduction.html">Introduction</a> Section described some challenges
in designing associative containers. This section describes the STL's solution and motivation for an alternative solution. It is organized as follows.
</p>
 
<ol>
	<li> <a href = "#stl">The STL's Associative-Container Design</a>
	briefly describes the STL's solution.
	</li>
	<li> <a href = "#policies">Choice of Policies</a> discusses possible additional policies by which to parameterize data structures.
	</li>
	<li> <a href = "#ds_genericity">Data-Structure Genericity</a> discusses possible problems with generic manipulation of containers based on different underlying data-structures.
	</li>
	<li> <a href = "#mapping_semantics">Mapping Semantics</a> discusses scalability issues with the STL's non-unique-mapping associative containers.
	</li>
	<li> <a href = "#methods">Choice of Methods</a> discusses some reservations with the choice of methods in the STL.
	</li>
</ol>
 
<h2><a name = "stl">The STL's Associative-Container Design</a></h2>
 
<p>
	The STL (or its extensions) currently offer associative containers based on underlying red-black trees or collision-chaining hash tables. For association, containers based on trees are parameterized by a comparison functor, and containers based on hash tables are parameterized by a hash functor and an equivalence functor.
</p>
 
<p>
	For each underlying data-structure, the STL offers four containers with different mapping semantics. A map-type uniquely maps each key to some datum, a set-type stores uniquely keys, a multimap-type non-uniquely maps each key to some datum, and a multiset-type non-uniquely stores keys.
</p>
 
<p>
	Containers contain various iterator-based methods. <i>E.g.</i>, all containers have constructors taking a pair of iterators, and transactionally construct an object containing all elements in the iterators' range. Additionally, it is possible to (non-transactionally) insert a range given by iterators, or erase such a range. Other methods are implicitly range-based, <i>e.g.</i>, it is possible to test the equivalence of two associative container objects via <tt><b>operator</b>==</tt>.
</p>
 
<h2><a name = "policies">Choice of Policies</a></h2>
 
<p>
	In order to function efficiently in various settings, associative containers require
a wide variety of policies.
</p>
 
<p>
	For example, a hash policy instructs how to transform a key object into some non-negative integral type; <i>e.g.</i>, a hash functor might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A hash table, though, requires transforming each key object into some non-negative integral type in some specific domain; <i>e.g.</i>, a hash table with 128 entries might transform the <tt>"hello"</tt> into position 63. The policy by which the hash value is transformed into a position within the table can dramatically affect performance.
</p>
 
<p>
	Additionally, most hash-table algorithms encounter collisions. To mitigate the cost of these collisions, it sometimes is beneficial to store the hash value along with each element
[<a href = "references.html#clrs2001">clrs2001</a>, <a href = "references.html#austern01htprop">austern01htprop</a>]. While this improves performance for complex keys, it hampers performance for simple keys, and is best left as a policy.
</p>
 
<p>
	Tree-based containers allow reasonable access while maintaining order between elements. In some cases, however, tree-based containers can be used for additional purposes. <i>E.g.</i>,consider Figure
<a href = "#interval_invariants">
Sets of line intervals
</a>-A,
which shows
an example of a tree-based set storing
half-open geometric line intervals. An <tt>std::set</tt> with this
structure can efficiently answer whether <i>[20, 101)</i> is in the
set, but it cannot efficiently answer whether any interval in the
set overlaps <i>[20, 101)</i>, nor can it efficiently enumerate all
intervals overlapping <i>[20, 101)</i>. A well-known augmentation to
balanced trees can support efficient answers to such questions
[<a href = "references.html#clrs2001">clrs2001</a>]. Namely,
an invariant should be maintained whereby
each node should contain also the
maximal endpoint of any interval within its subtree, as in Figure
<a href = "#interval_invariants">
Sets of line intervals
</a>-B. In order to maintain this ivariant, though, an invariant-restoring policy is
required.
</p>
 
<h6 align = "center">
<a name = "interval_invariants">
<img src = "interval_node_invariants.jpg" width = "70%" alt = "no image">
</a>
</h6>
<h6 align = "center">
Sets of line intervals.
</h6>
 
 
<h2><a name = "ds_genericity">Data-Structure Genericity</a></h2>
 
<p>
	Consider a generic function manipulating an associative container, <i>e.g.</i>,
</p>
 
<pre>
<b>template</b>&lt;
	<b>class</b> Cntnr&gt;
<b>int</b> some_op_sequence
    (Cntnr &r_cnt)
{
	...
}
</pre>
 
<p>
	The underlying data structure affects what the function can do with the container object.
</p>
 
<p>
 For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then the function can
use <tt>std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)</tt>
in order to apply <tt>foobar</tt> to all elements between <tt>foo</tt>
and <tt>bar</tt>. If <tt>Cntnr</tt> is a hash-based container, then this call's results are undefined.
</p>
 
<p>
	Also, if <tt>Cntnr</tt> is tree-based, the type and object of the comparison functor
can be accessed. If <tt>Cntnr</tt> is hash based, these queries are nonsensical</p>
 
<p>
	These types of problems are excaberated when considering the wide variety of useful underlying data-structures. 	Figure
<a href = "#different_underlying_data_structures">Different underlying data structures</a>
shows different underlying data-structures (the ones
currently supported in <tt>pb_assoc</tt>). A shows a collision-chaining hash-table; B shows a probing hash-table; C shows a red-black tree; D shows a splay tree; E shows a tree based on an ordered vector (the tree is implicit in the order of the elements); E shows a list-based container with update policies.
</p>
 
<h6 align = "center">
<a name = "different_underlying_data_structures">
<img src = "different_underlying_dss.jpg" width = "70%" alt = "no image">
</a>
</h6>
<h6 align = "center">
Different underlying data structures.
</h6>
 
<p>
	These underlying data structures display different behavior. For one, they can be queried for different policies. Furthermore:
</p>
<ol>
	<li>
		Containers based on C, D, and E store eleents in a meaningful order; the others store elements in a meaningless (and probably time-varying) order. As a futher consequence, containers based on C, D, and E can support erase operations taking an iterator and returning an iterator to the following element; the others cannot.
	</li>
	<li>
		Containers based on C, D, and E can be split and joined efficiently, while the others cannot. Containers based on C and D, futhermore, can guarantee that this is exception-free; containers based on E cannot guarantee this.
	</li>
	<li>
		Containers based on all but E can guarantee that erasing an element is exception free; containers based on E cannot guarantee this. Containers based on all but B and E can guarantee that modifying an object of their type does not invalidate iterators or references to their elements, while contianers based on B and E cannot. Containers based on C, D, and E can futhermore make a stronger guarantee, namely that modifiying an object of their type does not affect the relation of iterators.
	</li>
</ol>
 
<p>
	A unified tag and traits system (as used for the STL's iterators, for example) can ease  generic manipulation of associative containers based on different underlying data-structures.
</p>
 
<h2><a name = "mapping_semantics">Mapping Semantics</a></h2>
 
	<p>
	In some cases, map and set semantics are inappropriate. <i>E.g.</i>, consider
an application monitoring user activity. Such an application might be designed to track a user, the machine(s) to which the user is logged, application(s) the user is running on the machine, and the start time of the application. In this case, since a user might run more than a single application, there can be no unique mapping from a user to specific datum.
	</p>
 
<p>
    The STL's non-unique mapping containers (<i>e.g.</i>,
<tt>std::multimap</tt> and <tt>std::multiset</tt>) can be used
in this case. These types of containers can store store two or more equivalent, non-identical keys [<a href = "references.html#kleft00sets">kleft00sets</a>]. Figure
<a href = "#embedded_lists_1">Non-unique mapping containers in the STL's design</a> shows possible structures of STL tree-based and hash-based containers, multisets, respectively; in this figure, equivalent-key nodes share the same shading.
</p>
 
<h6 align = "center">
<a name = "embedded_lists_1">
<img src = "embedded_lists_1.jpg" width = "70%" alt = "no image">
</a>
</h6>
<h6 align = "center">
Non-unique mapping containers in the STL's design.
</h6>
 
<p>
	This design has several advantages. Foremost, it allows maps and multimaps, and sets and multisets, to share the same <tt>value_type</tt>, easing generic manipulation of containers with different mapping semantics.
</p>
 
 
<p>
    Conversely, this design has possible scalability drawbacks, due to an implicit "embedding" of linked lists.
Figure
<a href = "#embedded_lists_2">
Embedded lists in STL multimaps
</a>-A shows a tree with shaded nodes sharing equivalent keys;
Figure
<a href = "#embedded_lists_2">
Embedded lists in STL multimaps
</a>-A explicitly shows the linked lists implicit in Figure
<a href = "#embedded_lists_1">Non-unique mapping containers in the STL's design</a>. The drawbacks are the following.
</p>
 
<ol>
    <li> As mentioned before, there are several underlying data-structures, each with its set of tradeoffs.
The STL's design uses an associative linked-list to store all elements with equivalent primary
key (<i>e.g.</i>, users). Searching for a secondary key (<i>e.g.</i>,
a process) is inherently linear. While this works reasonably well when the number of distinct secondary
keys is small, it does not scale well.
    </li>
    <li> Embedding linked lists can cause the entire structure to be inefficient.
<i>E.g.</i>, Figure
<a href = "#embedded_lists_1">
Effect of embedded lists in STL multimaps
</a>-A
    shows a tree with several shaded nodes containing equivalent keys; note how unbalanced
this tree would seem when considering all shaded nodes to be a single node.
Figure
<a href = "#embedded_lists_1">
Effect of embedded lists in STL multimaps
</a>-B shows a hash table with several shaded nodes containing equivalent keys; note
that this can lengthen the search for other nodes as well.
    </li>
    <li> Embdedding linked lists is only possible for some data structures.
Some data structures, <i>e.g.</i>, probing-hash tables, linear hash tables,
and extendible hash tables, cannot support it.
    </li>
    <li> The embedded linked list design forgoes the abilitiy to treat
all elements with the same primary key as a single entity. The ability to
efficiently simultaneously insert (or erase) a larger number of elements with
the same primary key is lost; the ability to utilize segmented iterators is lost
[<a href = "references.html#austern98segmented">austern98segmented</a>].
	</li>
	<li> The linked-list design uses much space. For one, in the above example, the data identifying will must be duplicated for each application run by the user. Furthermore, the "links" in the linked list are supplied by the underlying data structure. In the case of tree-based containers, for example, the linked list utilizes the fact that each tree node contains pointers to its parent and its children; given that the order of equivalent keys is meaningless, the number of pointers exceeds the functionality supplied by a linked list.
	</li>
</ol>
 
<h6 align = "center">
<a name = "embedded_lists_2">
<img src = "embedded_lists_2.jpg" width = "70d" alt = "no image">
</a>
</h6>
<h6 align = "center">
Embedded lists in STL multimaps.
</h6>
 
 
<h2><a name = "methods">Choice of Methods</a></h2>
 
<p>
	[<a href = "references.html#meyers02both">meyers02both</a>] points out
that a class's methods should comprise only operations which depend on the class's internal structure; other operations are best designed as external functions. Possibly, therefore, the STL's associative containers lack some useful methods, and provide some redundant methods.
</p>
 
<ol>
	<li>
		Possibly missing methods:
	</li>
	<ol>
		<li>
			It is well-known that tree-based container objects can be efficiently split or joined
		[<a href = "references.html#clrs2001">clrs2001</a>]. Externally splitting or joining trees is super-linear, and, furthermore, can throw exceptions. Split and join methods, consequently, seem good choices for tree-based container methods.
		</li>
		<li>
			Suppose all elements which match a certain criteria need to be erased from an
unordered container object, <i>e.g.</i>, all elements whos keys are in a given range. Externally erasing them from the container object is super-linear, since erasing an element might reorder all iterators. Conditional erasing, therefore, seems a good choice for associative containers.
		</li>
	</ol>
		<li> Possibly redundant methods:</li>
	<ol>
		<li>
			STL associative containers provide methods for inserting a range of elements given by a pair of iterators. At best, this can be implemented as an external function, or, even more efficiently, as a join operation (for the case of tree-based containers). Moreover, these methods seem similar to constructors taking a range given by a pair of iterators; the constructors, however, are transactional, whereas the insert methods are not; this is possibly confusing.
		</li>
		<li>
			STL associative containers provide methods for erasing a range of elements given by a pair of iterators. At best, this can be implemented as an external function, or, even more efficiently, as a (small) sequence of split and join operations (for the case of tree-based containers). Moreover, the results of erasing a range is undefined for the case of containers based on unordered data-structures.
		</li>
		<li>
			Associative containers are parameterized by policies allowing to test keys, but not data, for equivalence. When comparing two associative container objects, it is at least as reasonable to expect that they are equivalent if both keys and data are equivalent, as it is reasonable to expect that they are equivalent if their keys only are equivalent. Furthermore, in different settings it makes sense that two objects are equivalent if they store keys in the same order, whereas in other settings order does not matter. The operators <tt>operator==</tt> and <tt>operator!=</tt> are not descriptive enough for these considerations.
		</li>
	</ol>
</ol>
 
</body>
 
</html>
 

Compare with Previous | Blame | View Log

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.