<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Chapter 33. Allocators</title><meta name="generator" content="DocBook XSL Stylesheets V1.75.2" /><meta name="keywords" content=" ISO C++ , library " /><link rel="home" href="../spine.html" title="The GNU C++ Library Documentation" /><link rel="up" href="extensions.html" title="Part XII. Extensions" /><link rel="prev" href="bk01pt12ch32s07.html" title="Diagnostics" /><link rel="next" href="bitmap_allocator.html" title="bitmap_allocator" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 33. Allocators</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="bk01pt12ch32s07.html">Prev</a> </td><th width="60%" align="center">Part XII.
Extensions
</th><td width="20%" align="right"> <a accesskey="n" href="bitmap_allocator.html">Next</a></td></tr></table><hr /></div><div class="chapter" title="Chapter 33. Allocators"><div class="titlepage"><div><div><h2 class="title"><a id="manual.ext.allocator"></a>Chapter 33. Allocators</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="ext_allocators.html#manual.ext.allocator.mt">mt_allocator</a></span></dt><dd><dl><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.intro">Intro</a></span></dt><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.design_issues">Design Issues</a></span></dt><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.impl">Implementation</a></span></dt><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.example_single">Single Thread Example</a></span></dt><dt><span class="sect2"><a href="ext_allocators.html#allocator.mt.example_multi">Multiple Thread Example</a></span></dt></dl></dd><dt><span class="sect1"><a href="bitmap_allocator.html">bitmap_allocator</a></span></dt><dd><dl><dt><span class="sect2"><a href="bitmap_allocator.html#allocator.bitmap.design">Design</a></span></dt><dt><span class="sect2"><a href="bitmap_allocator.html#allocator.bitmap.impl">Implementation</a></span></dt></dl></dd></dl></div><div class="sect1" title="mt_allocator"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="manual.ext.allocator.mt"></a>mt_allocator</h2></div></div></div><p>
</p><div class="sect2" title="Intro"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.intro"></a>Intro</h3></div></div></div><p>
The mt allocator [hereinafter referred to simply as "the allocator"]
is a fixed-size (power-of-two) allocator that was originally
developed specifically to suit the needs of multi-threaded
applications [hereinafter referred to as MT applications]. Over
time the allocator has evolved and been improved in many ways; in
particular, it now also performs well in single-threaded
applications [hereinafter referred to as ST applications]. (Note:
in this document "single threaded" also covers applications that are
compiled with gcc without thread support enabled, which is
accomplished using ifdefs on __GTHREADS.) The allocator is tunable,
very flexible, and capable of high performance.
</p><p>
The aim of this document is to describe, from an application point of
view, the "inner workings" of the allocator.
</p></div><div class="sect2" title="Design Issues"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.design_issues"></a>Design Issues</h3></div></div></div><div class="sect3" title="Overview"><div class="titlepage"><div><div><h4 class="title"><a id="allocator.mt.overview"></a>Overview</h4></div></div></div><p> There are three general components to the allocator: a datum
describing the characteristics of the memory pool, a policy class
containing this pool that links instantiation types to common or
individual pools, and a class inheriting from the policy class that is
the actual allocator.
</p><p>The datum describing the pool's characteristics is
</p><pre class="programlisting">
template<bool _Thread>
  class __pool
</pre><p> This class is parameterized on thread support, and is explicitly
specialized for both multiple threads (with <code class="code">bool==true</code>)
and single threads (with <code class="code">bool==false</code>). It is possible to
use a custom pool datum instead of the default class that is provided.
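</p><p>
The thread-support split can be pictured with a small sketch (the names
below are illustrative only; the real specializations live in
<ext/mt_allocator.h> and contain the actual pool logic):
</p><pre class="programlisting">
#include <cassert>

// Primary template, parameterized on thread support.
template<bool _Thread> struct pool;

// Multi-threaded variant: would guard its freelists with a mutex.
template<> struct pool<true>  { static const bool threaded = true;  };

// Single-threaded variant: no locking needed at all.
template<> struct pool<false> { static const bool threaded = false; };

int main()
{
  assert(pool<true>::threaded && !pool<false>::threaded);
  return 0;
}
</pre><p>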
</p><p> There are two distinct policy classes, each of which can be used
with either type of underlying pool datum.
</p><pre class="programlisting">
template<bool _Thread>
  struct __common_pool_policy

template<typename _Tp, bool _Thread>
  struct __per_type_pool_policy
</pre><p> The first policy, <code class="code">__common_pool_policy</code>, implements a
common pool. This means that allocators that are instantiated with
different types, say <code class="code">char</code> and <code class="code">long</code>, will both
use the same pool. This is the default policy.
</p><p> The second policy, <code class="code">__per_type_pool_policy</code>, implements
a separate pool for each instantiating type. Thus, <code class="code">char</code>
and <code class="code">long</code> will use separate pools. This allows per-type
tuning, for instance.
</p><p> Putting this all together, the actual allocator class is
</p><pre class="programlisting">
template<typename _Tp, typename _Poolp = __default_policy>
  class __mt_alloc : public __mt_alloc_base<_Tp>, _Poolp
</pre><p> This class has the interface required of standard library allocator
classes, namely member functions <code class="code">allocate</code> and
<code class="code">deallocate</code>, plus others.
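</p><p>
Because the interface is the standard one, the allocator can be dropped
into any standard container; a minimal sketch using the default (common
pool) policy:
</p><pre class="programlisting">
#include <ext/mt_allocator.h>
#include <vector>
#include <cassert>

int main()
{
  // With the default policy, every instantiation shares a common pool.
  typedef __gnu_cxx::__mt_alloc<int> allocator_type;

  std::vector<int, allocator_type> v;
  for (int i = 0; i < 1000; ++i)
    v.push_back(i);

  assert(v.size() == 1000 && v.back() == 999);
  return 0;
}
</pre><p>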
</p></div></div><div class="sect2" title="Implementation"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.impl"></a>Implementation</h3></div></div></div><div class="sect3" title="Tunable Parameters"><div class="titlepage"><div><div><h4 class="title"><a id="allocator.mt.tune"></a>Tunable Parameters</h4></div></div></div><p>Certain allocation parameters can be modified, or tuned. There
exists a nested <code class="code">struct __pool_base::_Tune</code> that contains all
these parameters, which include settings for
</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Alignment</p></li><li class="listitem"><p>Maximum bytes before calling <code class="code">::operator new</code> directly</p></li><li class="listitem"><p>Minimum bytes</p></li><li class="listitem"><p>Size of underlying global allocations</p></li><li class="listitem"><p>Maximum number of supported threads</p></li><li class="listitem"><p>Migration of deallocations to the global free list</p></li><li class="listitem"><p>Shunt for global <code class="code">new</code> and <code class="code">delete</code></p></li></ul></div><p>Adjusting parameters for a given instance of an allocator can only
happen before any allocations take place, when the allocator itself is
initialized. For instance:
</p><pre class="programlisting">
#include <ext/mt_allocator.h>

struct pod
{
  int i;
  int j;
};

int main()
{
  typedef pod value_type;
  typedef __gnu_cxx::__mt_alloc<value_type> allocator_type;
  typedef __gnu_cxx::__pool_base::_Tune tune_type;

  // _Tune arguments, in order: alignment, max bytes, min bytes,
  // chunk size, max threads, freelist headroom (%), force new.
  tune_type t_default;
  tune_type t_opt(16, 5120, 32, 5120, 20, 10, false);
  tune_type t_single(16, 5120, 32, 5120, 1, 10, false);

  tune_type t;
  t = allocator_type::_M_get_options();
  allocator_type::_M_set_options(t_opt);
  t = allocator_type::_M_get_options();

  allocator_type a;
  allocator_type::pointer p1 = a.allocate(128);
  allocator_type::pointer p2 = a.allocate(5128);

  a.deallocate(p1, 128);
  a.deallocate(p2, 5128);

  return 0;
}
</pre></div><div class="sect3" title="Initialization"><div class="titlepage"><div><div><h4 class="title"><a id="allocator.mt.init"></a>Initialization</h4></div></div></div><p>
The static variables (pointers to freelists, tuning parameters, etc.)
are initialized as above, or are set to the global defaults.
</p><p>
The very first allocate() call will always call the
_S_initialize_once() function. In order to make sure that this
function is called exactly once, we make use of a __gthread_once call
in MT applications and check a static bool (_S_init) in ST
applications.
</p><p>
The _S_initialize() function:
- If the GLIBCXX_FORCE_NEW environment variable is set, it sets the bool
_S_force_new to true and then returns. This will cause subsequent calls to
allocate() to return memory directly from a new() call, and deallocate will
only do a delete() call.
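</p><p>
For example, to bypass the internal pools for a single run (sh syntax;
printenv merely stands in here for the real application binary):
</p><pre class="programlisting">
# The variable only needs to exist in the environment; it is examined
# once, at the first allocation.
GLIBCXX_FORCE_NEW=1 printenv GLIBCXX_FORCE_NEW
</pre><p>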
</p><p>
- If the GLIBCXX_FORCE_NEW environment variable is not set, both ST and MT
applications will:
- Calculate the number of bins needed. A bin holds blocks of a specific
power-of-two size in bytes. I.e., by default the allocator will deal with
requests of up to 128 bytes (or whatever the value of _S_max_bytes is when
_S_init() is called). This means that there will be bins of the following
sizes (in bytes): 1, 2, 4, 8, 16, 32, 64, 128.

- Create the _S_binmap array. All requests are rounded up to the next
"large enough" bin. I.e., a request for 29 bytes will cause a block from
the "32 byte bin" to be returned to the application. The purpose of
_S_binmap is to speed up the process of finding out which bin to use.
I.e., the value of _S_binmap[ 29 ] is initialized to 5 (bin 5 = 32 bytes).
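</p><p>
The construction of such a map can be sketched as follows (a simplified
stand-in for the real _S_binmap setup; the names are illustrative):
</p><pre class="programlisting">
#include <cassert>
#include <cstddef>

int main()
{
  const std::size_t max_bytes = 128;
  std::size_t binmap[129];

  // binmap[i] holds the index of the smallest power-of-two bin that
  // can satisfy a request for i bytes, so lookup is one array read.
  std::size_t bin = 0, bin_size = 1;
  for (std::size_t i = 1; i <= max_bytes; ++i)
    {
      while (bin_size < i)
        {
          bin_size <<= 1;
          ++bin;
        }
      binmap[i] = bin;
    }

  assert(binmap[29] == 5);    // 29 bytes -> bin 5 (32-byte blocks)
  assert(binmap[128] == 7);   // largest request served from a pool
  return 0;
}
</pre><p>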
</p><p>
- Create the _S_bin array. This array consists of bin_records. There will be
as many bin_records in this array as the number of bins that we calculated
earlier. I.e., if _S_max_bytes = 128 there will be 8 entries.
Each bin_record is then initialized:
- bin_record->first = An array of pointers to block_records. There will be
as many block_record pointers as the maximum number of threads
(in a ST application there is only 1 thread, in a MT application there
are _S_max_threads).
This holds the pointer to the first free block for each thread in this
bin. I.e., if we would like to know where the first free block of size 32
for thread number 3 is, we would look this up with: _S_bin[ 5 ].first[ 3 ]

The block_record pointer members created above are now initialized to
their initial values. I.e. _S_bin[ n ].first[ n ] = NULL;
</p><p>
- Additionally a MT application will:
- Create a list of free thread ids. The pointer to the first entry
is stored in _S_thread_freelist_first. The reason for this approach is
that the __gthread_self() call will not return a value that corresponds to
the maximum number of threads allowed, but rather a process id number or
something else. So what we do is create a list of thread_records.
This list is _S_max_threads long and each entry holds a size_t thread_id
which is initialized to 1, 2, 3, 4, 5 and so on up to _S_max_threads.
Each time a thread calls allocate() or deallocate() we call
_S_get_thread_id(), which looks at the value of _S_thread_key, a
thread-local storage pointer. If this is NULL we know that this is a newly
created thread, so we pop the first entry from this list and save the
pointer to this record in the _S_thread_key variable. The next time
we will get the pointer to the thread_record back, and we use the
thread_record->thread_id as identification. I.e., the first thread that
calls allocate will get the first record in this list and thus be thread
number 1, and will then find the pointer to its first free 32-byte block
in _S_bin[ 5 ].first[ 1 ]
When we create the _S_thread_key we also define a destructor
(_S_thread_key_destr), which means that when the thread dies, this
thread_record is returned to the front of this list and the thread id
can then be reused if a new thread is created.
This list is protected by a mutex (_S_thread_freelist_mutex), which is only
locked when records are removed from or added to the list.
</p><p>
- Initialize the free and used counters of each bin_record:
- bin_record->free = An array of size_t. This keeps track of the number
of blocks on a specific thread's freelist in each bin. I.e., if a thread
has 12 32-byte blocks on its freelist and allocates one of these, this
counter would be decreased to 11.

- bin_record->used = An array of size_t. This keeps track of the number
of blocks currently in use of this size by this thread. I.e., if a thread
has made 678 requests (and no deallocations...) of 32-byte blocks this
counter will read 678.

The arrays created above are now initialized with their initial values.
I.e. _S_bin[ n ].free[ n ] = 0;
</p><p>
- Initialize the mutex of each bin_record: The bin_record->mutex
is used to protect the global freelist. This concept of a global
freelist is explained in more detail in the section "A multi
threaded example", but basically this mutex is locked whenever a
block of memory is retrieved from or returned to the global freelist
for this specific bin. This only occurs when a number of blocks
are grabbed from the global list to a thread-specific list or when
a thread decides to return some blocks to the global freelist.
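</p><p>
Putting the pieces together, each bin_record can be pictured roughly like
this (a simplified sketch with illustrative names, not the exact libstdc++
layout, and with the per-bin mutex omitted):
</p><pre class="programlisting">
#include <cassert>
#include <cstddef>

struct block_record { block_record* next; std::size_t thread_id; };

struct bin_record
{
  block_record** first;       // first free block, indexed by thread id
  std::size_t*   free_count;  // blocks on each thread's freelist
  std::size_t*   used_count;  // blocks in use by each thread
};

int main()
{
  // Five slots: index 0 is the global pool, 1..4 are thread ids.
  block_record* firsts[5] = { 0, 0, 0, 0, 0 };
  std::size_t   free_c[5] = { 0, 0, 0, 0, 0 };
  std::size_t   used_c[5] = { 0, 0, 0, 0, 0 };
  bin_record bin = { firsts, free_c, used_c };

  assert(bin.first[3] == 0 && bin.used_count[3] == 0);
  return 0;
}
</pre><p>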
</p></div><div class="sect3" title="Deallocation Notes"><div class="titlepage"><div><div><h4 class="title"><a id="allocator.mt.deallocation"></a>Deallocation Notes</h4></div></div></div><p> Notes about deallocation. This allocator does not explicitly
release memory. Because of this, memory debugging programs like
valgrind or purify may notice leaks: sorry about this
inconvenience. Operating systems will reclaim allocated memory at
program termination anyway. If sidestepping this kind of noise is
desired, there are three options: use an allocator, like
<code class="code">new_allocator</code>, that releases memory while debugging, use
GLIBCXX_FORCE_NEW to bypass the allocator's internal pools, or use a
custom pool datum that releases resources on destruction.
</p><p>
On systems with the function <code class="code">__cxa_atexit</code>, the
allocator can be forced to free all memory allocated before program
termination with the member function
<code class="code">__pool_type::_M_destroy</code>. However, because this member
function relies on the precise and exactly-conforming ordering of
static destructors, including those of a static local
<code class="code">__pool</code> object, it should not be used, ever, on systems
that don't have the necessary underlying support. In addition, in
practice, forcing deallocation can be tricky, as it requires the
<code class="code">__pool</code> object to be fully constructed before the object
that uses it is fully constructed. For most (but not all) STL
containers, this works, as an instance of the allocator is constructed
as part of a container's constructor. However, this assumption is
implementation-specific, and subject to change. For an example of a
pool that frees memory, see the following
<a class="ulink" href="http://gcc.gnu.org/viewcvs/trunk/libstdc++-v3/testsuite/ext/mt_allocator/deallocate_local-6.cc?view=markup" target="_top">
example.</a>
</p></div></div><div class="sect2" title="Single Thread Example"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.example_single"></a>Single Thread Example</h3></div></div></div><p>
Let's start by describing how the data on a freelist is laid out in memory.
These are the first two blocks in the freelist for thread id 3 in bin 3 (8-byte blocks):
</p><pre class="programlisting">
+----------------+
| next* ---------|--+  (_S_bin[ 3 ].first[ 3 ] points here)
|                |  |
|                |  |
|                |  |
+----------------+  |
| thread_id = 3  |  |
|                |  |
|                |  |
|                |  |
+----------------+  |
| DATA           |  |  (A pointer to here is what is returned to
|                |  |   the application when needed)
|                |  |
|                |  |
|                |  |
|                |  |
|                |  |
|                |  |
+----------------+  |
+----------------+  |
| next*          |<-+  (If next == NULL it's the last one on the list)
|                |
|                |
|                |
+----------------+
| thread_id = 3  |
|                |
|                |
|                |
+----------------+
| DATA           |
|                |
|                |
|                |
|                |
|                |
|                |
|                |
+----------------+
</pre><p>
With this in mind, let's simplify things for a while and say that there is
only one thread (a ST application). In this case all operations are made to
what is referred to as the global pool - thread id 0 (no thread may be
assigned this id, since thread ids span from 1 to _S_max_threads in a MT
application).
</p><p>
When the application requests memory (calling allocate()) we first look at the
requested size, and if it is > _S_max_bytes we call new() directly and return.
</p><p>
If the requested size is within limits we start by finding out from which
bin we should serve this request by looking in _S_binmap.
</p><p>
A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks of
this size on the freelist (index 0). If this is not NULL - fine, we just
remove the block that _S_bin[ bin ].first[ 0 ] points to from the list,
update _S_bin[ bin ].first[ 0 ] and return a pointer to that block's data.
</p><p>
If the freelist is empty (the pointer is NULL) we must get memory from the
system and build a freelist within this memory. All requests for new memory
are made in chunks of _S_chunk_size. Knowing the size of a block_record and
the number of bytes that this bin stores, we then calculate how many blocks
we can create within this chunk, build the list, remove the first block,
update the pointer (_S_bin[ bin ].first[ 0 ]) and return a pointer to that
block's data.
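</p><p>
Carving a chunk into a freelist can be sketched like this (illustrative
names and sizes; the real values come from the _Tune parameters):
</p><pre class="programlisting">
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct block_record { block_record* next; std::size_t thread_id; };

int main()
{
  const std::size_t chunk_size = 4096;  // stands in for _S_chunk_size
  const std::size_t block_size = sizeof(block_record) + 32; // header + 32-byte bin
  const std::size_t count = chunk_size / block_size;

  char* chunk = static_cast<char*>(std::malloc(chunk_size));

  // Thread the blocks together; the last one terminates the list.
  block_record* first = 0;
  for (std::size_t i = count; i-- > 0; )
    {
      block_record* b = reinterpret_cast<block_record*>(chunk + i * block_size);
      b->next = first;
      b->thread_id = 0;       // the global pool in the ST case
      first = b;
    }

  std::size_t length = 0;
  for (block_record* p = first; p != 0; p = p->next)
    ++length;
  assert(length == count);

  std::free(chunk);
  return 0;
}
</pre><p>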
</p><p>
Deallocation is equally simple; the pointer is cast back to a block_record
pointer, we look up which bin to use based on the size, add the block to the
front of the global freelist and update the pointer as needed
(_S_bin[ bin ].first[ 0 ]).
</p><p>
The decision to add deallocated blocks to the front of the freelist was made
after a set of performance measurements showed that this is roughly 10%
faster than also maintaining a set of "last" pointers.
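</p><p>
The push-front itself is a constant-time pointer splice (an illustrative
sketch; the real code lives inside deallocate()):
</p><pre class="programlisting">
#include <cassert>
#include <cstddef>

struct block_record { block_record* next; std::size_t thread_id; };

// Add a returned block to the front of a freelist: O(1), and no
// "last" pointer has to be maintained alongside the list head.
void push_front(block_record*& first, block_record* block)
{
  block->next = first;
  first = block;
}

int main()
{
  block_record a = { 0, 0 };
  block_record b = { 0, 0 };
  block_record* first = 0;

  push_front(first, &a);
  push_front(first, &b);
  assert(first == &b && first->next == &a && a.next == 0);
  return 0;
}
</pre><p>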
</p></div><div class="sect2" title="Multiple Thread Example"><div class="titlepage"><div><div><h3 class="title"><a id="allocator.mt.example_multi"></a>Multiple Thread Example</h3></div></div></div><p>
In the ST example we never used the thread_id variable present in each block.
Let's start by explaining the purpose of this in a MT application.
</p><p>
The concept of "ownership" was introduced because many MT applications
allocate and deallocate memory in shared containers from different
threads (such as a cache shared amongst all threads). This introduces
a problem if the allocator only returns memory to the current thread's
freelist (i.e., there might be one thread doing all the allocation, and
thus obtaining ever more memory from the system, and another thread
whose freelist grows longer and longer - this would in the end
consume all available memory).
</p><p>
Each time a block is moved from the global list (where ownership is
irrelevant) to a thread's freelist (or when a new freelist is built
from a chunk directly onto a thread's freelist, or when a deallocation
occurs on a block which was not allocated by the same thread id as the
one doing the deallocation) the thread id is set to the current one.
</p><p>
What's the use? Well, when a deallocation occurs we can now look at
the thread id, find out if the block was allocated by another thread,
and decrease the used counter of that thread instead, thus keeping the
free and used counters correct. And keeping the free and used counters
correct is very important, since the relationship between these two
variables decides whether memory should be returned to the global pool
or not when a deallocation occurs.
</p><p>
When the application requests memory (calling allocate()) we first
look at the requested size, and if this is > _S_max_bytes we call new()
directly and return.
</p><p>
If the requested size is within limits we start by finding out from which
bin we should serve this request by looking in _S_binmap.
</p><p>
A call to _S_get_thread_id() returns the thread id for the calling thread
(and if no value has been set in _S_thread_key, a new id is assigned and
returned).
</p><p>
A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there are
any blocks of this size on the current thread's freelist. If this is
not NULL - fine, we just remove the block that _S_bin[ bin ].first[
thread_id ] points to from the list, update _S_bin[ bin ].first[
thread_id ], update the free and used counters and return a pointer to
that block's data.
</p><p>
If the freelist is empty (the pointer is NULL) we start by looking at
the global freelist (index 0). If there are blocks available on the
global freelist we lock this bin's mutex and move up to block_count
blocks (the number of blocks of this bin's size that will fit into a
_S_chunk_size) - or until the end of the list, whichever comes first -
to the current thread's freelist, at the same time changing the
thread_id ownership and updating the counters and pointers. When the
bin's mutex has been unlocked, we remove the block that
_S_bin[ bin ].first[ thread_id ] points to from the list, update
_S_bin[ bin ].first[ thread_id ], update the free and used counters,
and return a pointer to that block's data.
</p><p>
The reason that the number of blocks moved to the current thread's
freelist is limited to block_count is to minimize the chance that a
subsequent deallocate() call will return the excess blocks to the
global freelist (based on the _S_freelist_headroom calculation, see
below).
</p><p>
However, if there isn't any memory in the global pool we need to get
memory from the system - this is done in exactly the same way as in a
single-threaded application, with one major difference: the list built
in the newly allocated memory (of _S_chunk_size size) is added to the
current thread's freelist instead of to the global one.
</p><p>
The basic process of a deallocation call is simple: always add the
block to the front of the current thread's freelist and update the
counters and pointers (as described earlier, with the specific check of
ownership that causes the used counter of the thread that originally
allocated the block to be decreased instead of the current thread's
counter).
</p><p>
And here the free and used counters come into service. Each time a
deallocate() call is made, the length of the current thread's
freelist is compared to the amount of memory in use by this thread.
</p><p>
Let's go back to the example of an application that has one thread
that does all the allocations and one that deallocates. Both these
threads use, say, 516 32-byte blocks that were allocated during thread
creation, for example. Their used counters will both say 516 at this
point. The allocation thread now grabs 1000 32-byte blocks and puts
them in a shared container. The used counter for this thread is now
1516.
</p><p>
The deallocation thread now deallocates 500 of these blocks. For each
deallocation made, the used counter of the allocating thread is
decreased and the freelist of the deallocation thread gets longer and
longer. But the calculation made in deallocate() will limit the length
of the freelist in the deallocation thread to _S_freelist_headroom %
of its used counter. In this case, when the freelist (given that
_S_freelist_headroom is at its default value of 10%) exceeds roughly 52
entries (516/10), blocks will be returned to the global pool where the
allocating thread may pick them up and reuse them.
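</p><p>
The arithmetic of that headroom check can be sketched as follows (integer
division; the prose above rounds 516/10 up to about 52):
</p><pre class="programlisting">
#include <cassert>
#include <cstddef>

int main()
{
  const std::size_t headroom = 10;  // _S_freelist_headroom, in percent
  const std::size_t used = 516;     // deallocating thread's used counter

  // Once the freelist grows past used / headroom blocks, the surplus
  // is handed back to the bin's global freelist.
  const std::size_t limit = used / headroom;
  assert(limit == 51);

  std::size_t freelist_length = 52;
  assert(freelist_length > limit);  // surplus goes to the global pool
  return 0;
}
</pre><p>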
</p><p>
In order to reduce lock contention (since this requires the bin's
mutex to be locked), this operation is also made in chunks of blocks
(just like when chunks of blocks are moved from the global freelist to
a thread's freelist, as mentioned above). The "formula" used can probably
be improved to further reduce the risk of blocks being "bounced back
and forth" between freelists.
</p></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="bk01pt12ch32s07.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="extensions.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="bitmap_allocator.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Diagnostics </td><td width="20%" align="center"><a accesskey="h" href="../spine.html">Home</a></td><td width="40%" align="right" valign="top"> bitmap_allocator</td></tr></table></div></body></html>