mt_allocator

Intro

The mt allocator [hereinafter referred to simply as "the allocator"] is a fixed-size (power-of-two) allocator that was initially developed specifically to suit the needs of multi-threaded applications [hereinafter referred to as MT applications]. Over time the allocator has evolved and been improved in many ways; in particular, it now also does a good job in single-threaded applications [hereinafter referred to as ST applications]. (Note: in this document, single-threaded applications also include applications that are compiled with gcc without thread support enabled. This is accomplished using ifdefs on __GTHREADS.) This allocator is tunable, very flexible, and capable of high performance.

The aim of this document is to describe, from an application point of view, the "inner workings" of the allocator.


Design Issues

Overview

There are three general components to the allocator: a datum describing the characteristics of the memory pool, a policy class containing this pool that links instantiation types to common or individual pools, and a class inheriting from the policy class that is the actual allocator.

The datum describing pool characteristics is

  template<bool _Thread>
    class __pool

This class is parametrized on thread support, and is explicitly specialized for both multiple threads (with bool==true) and single threads (with bool==false). It is possible to use a custom pool datum instead of the default class that is provided.

There are two distinct policy classes, each of which can be used with either type of underlying pool datum.

  template<bool _Thread>
    struct __common_pool_policy

  template<typename _Tp, bool _Thread>
    struct __per_type_pool_policy

The first policy, __common_pool_policy, implements a common pool. This means that allocators instantiated with different types, say char and long, will both use the same pool. This is the default policy.

The second policy, __per_type_pool_policy, implements a separate pool for each instantiating type. Thus, char and long will use separate pools. This allows per-type tuning, for instance.

Putting this all together, the actual allocator class is

  template<typename _Tp, typename _Poolp = __default_policy>
    class __mt_alloc : public __mt_alloc_base<_Tp>, _Poolp

This class has the interface required for standard library allocator classes, namely member functions allocate and deallocate, plus others.


Implementation

Tunable Parameters

Certain allocation parameters can be modified, or tuned. There exists a nested struct __pool_base::_Tune that contains all these parameters, which include settings for

  - Alignment
  - Maximum bytes before calling ::operator new directly
  - Minimum bytes
  - Size of underlying global allocations
  - Maximum number of supported threads
  - Migration of deallocations to the global free list
  - Shunt for global new and delete

Adjusting parameters for a given instance of an allocator can only happen before any allocations take place, when the allocator itself is initialized. For instance:

#include <ext/mt_allocator.h>

struct pod
{
  int i;
  int j;
};

int main()
{
  typedef pod value_type;
  typedef __gnu_cxx::__mt_alloc<value_type> allocator_type;
  typedef __gnu_cxx::__pool_base::_Tune tune_type;

  tune_type t_default;
  tune_type t_opt(16, 5120, 32, 5120, 20, 10, false);
  tune_type t_single(16, 5120, 32, 5120, 1, 10, false);

  tune_type t;
  t = allocator_type::_M_get_options();
  allocator_type::_M_set_options(t_opt);
  t = allocator_type::_M_get_options();

  allocator_type a;
  allocator_type::pointer p1 = a.allocate(128);
  allocator_type::pointer p2 = a.allocate(5128);

  a.deallocate(p1, 128);
  a.deallocate(p2, 5128);

  return 0;
}


Initialization

The static variables (pointers to freelists, tuning parameters, etc.) are initialized as above, or are set to the global defaults.

The very first allocate() call will always call the _S_initialize_once() function. In order to make sure that this function is called exactly once, we make use of a __gthread_once call in MT applications and check a static bool (_S_init) in ST applications.

The _S_initialize() function:

- If the GLIBCXX_FORCE_NEW environment variable is set, it sets the bool _S_force_new to true and then returns. This will cause subsequent calls to allocate() to return memory directly from a call to new, and deallocate() will only do a delete call.

- If the GLIBCXX_FORCE_NEW environment variable is not set, both ST and MT applications will:

  - Calculate the number of bins needed. A bin is a specific power-of-two size of bytes. I.e., by default the allocator will deal with requests of up to 128 bytes (or whatever the value of _S_max_bytes is when _S_init() is called). This means that there will be bins of the following sizes (in bytes): 1, 2, 4, 8, 16, 32, 64, 128.

  - Create the _S_binmap array. All requests are rounded up to the next "large enough" bin. I.e., a request for 29 bytes will cause a block from the "32 byte bin" to be returned to the application. The purpose of _S_binmap is to speed up the process of finding out which bin to use. I.e., the value of _S_binmap[ 29 ] is initialized to 5 (bin 5 = 32 bytes).

  - Create the _S_bin array. This array consists of bin_records. There will be as many bin_records in this array as the number of bins that we calculated earlier. I.e., if _S_max_bytes = 128 there will be 8 entries. Each bin_record is then initialized:
    - bin_record->first = An array of pointers to block_records. There will be as many block_record pointers as the maximum number of threads (in a ST application there is only 1 thread, in a MT application there are _S_max_threads). This holds the pointer to the first free block for each thread in this bin. I.e., if we would like to know where the first free block of size 32 for thread number 3 is, we would look this up by: _S_bin[ 5 ].first[ 3 ]

    The block_record pointer members created above are now initialized to their initial values. I.e., _S_bin[ n ].first[ n ] = NULL;

- Additionally, a MT application will:
  - Create a list of free thread ids. The pointer to the first entry is stored in _S_thread_freelist_first. The reason for this approach is that the __gthread_self() call will not return a value that corresponds to the maximum number of threads allowed, but rather a process id number or something else. So what we do is create a list of thread_records. This list is _S_max_threads long and each entry holds a size_t thread_id which is initialized to 1, 2, 3, 4, 5 and so on up to _S_max_threads. Each time a thread calls allocate() or deallocate() we call _S_get_thread_id(), which looks at the value of _S_thread_key, a thread local storage pointer. If this is NULL we know that this is a newly created thread, and we pop the first entry from this list and save the pointer to this record in the _S_thread_key variable. The next time we will get the pointer to the thread_record back, and we use the thread_record->thread_id as identification. I.e., the first thread that calls allocate will get the first record in this list and thus be thread number 1, and will then find the pointer to its first free 32-byte block in _S_bin[ 5 ].first[ 1 ].
    When we create the _S_thread_key we also define a destructor (_S_thread_key_destr), which means that when the thread dies, this thread_record is returned to the front of this list and the thread id can then be reused if a new thread is created.
    This list is protected by a mutex (_S_thread_freelist_mutex) which is only locked when records are removed from or added to the list.

  - Initialize the free and used counters of each bin_record:
    - bin_record->free = An array of size_t. This keeps track of the number of blocks on a specific thread's freelist in each bin. I.e., if a thread has 12 32-byte blocks on its freelist and allocates one of these, this counter would be decreased to 11.

    - bin_record->used = An array of size_t. This keeps track of the number of blocks of this size currently in use by this thread. I.e., if a thread has made 678 requests (and no deallocations...) of 32-byte blocks, this counter will read 678.

    The arrays created above are now initialized with their initial values. I.e., _S_bin[ n ].free[ n ] = 0;

  - Initialize the mutex of each bin_record: The bin_record->mutex is used to protect the global freelist. This concept of a global freelist is explained in more detail in the section "Multiple Thread Example", but basically this mutex is locked whenever a block of memory is retrieved from or returned to the global freelist for this specific bin. This only occurs when a number of blocks are grabbed from the global list to a thread-specific list, or when a thread decides to return some blocks to the global freelist.


Deallocation Notes

Notes about deallocation. This allocator does not explicitly release memory. Because of this, memory debugging programs like valgrind or purify may notice leaks: sorry about this inconvenience. Operating systems will reclaim allocated memory at program termination anyway. If sidestepping this kind of noise is desired, there are three options: use an allocator, like new_allocator, that releases memory while debugging; use GLIBCXX_FORCE_NEW to bypass the allocator's internal pools; or use a custom pool datum that releases resources on destruction.
 
On systems with the function __cxa_atexit, the allocator can be forced to free all memory allocated before program termination with the member function __pool_type::_M_destroy. However, because this member function relies on the precise and exactly-conforming ordering of static destructors, including those of a static local __pool object, it should not be used, ever, on systems that don't have the necessary underlying support. In addition, in practice, forcing deallocation can be tricky, as it requires the __pool object to be fully constructed before the object that uses it is fully constructed. For most (but not all) STL containers this works, as an instance of the allocator is constructed as part of a container's constructor. However, this assumption is implementation-specific, and subject to change. For an example of a pool that frees memory, see the following example.


Single Thread Example

Let's start by describing how the data on a freelist is laid out in memory. These are the first two blocks in the freelist for thread id 3 in bin 3 (8 bytes):

+----------------+
| next* ---------|--+  (_S_bin[ 3 ].first[ 3 ] points here)
|                |  |
|                |  |
|                |  |
+----------------+  |
| thread_id = 3  |  |
|                |  |
|                |  |
|                |  |
+----------------+  |
| DATA           |  |  (A pointer to here is what is returned to
|                |  |   the application when needed)
|                |  |
|                |  |
|                |  |
|                |  |
|                |  |
|                |  |
+----------------+  |
+----------------+  |
| next*          |<-+  (If next == NULL it's the last one on the list)
|                |
|                |
|                |
+----------------+
| thread_id = 3  |
|                |
|                |
|                |
+----------------+
| DATA           |
|                |
|                |
|                |
|                |
|                |
|                |
|                |
+----------------+

With this in mind we simplify things a bit for a while and say that there is only one thread (a ST application). In this case all operations are made to what is referred to as the global pool - thread id 0. (No thread may be assigned this id, since thread ids span from 1 to _S_max_threads in a MT application.)

When the application requests memory (calling allocate()) we first look at the requested size, and if this is > _S_max_bytes we call new directly and return.

If the requested size is within limits, we start by finding out from which bin we should serve this request by looking in _S_binmap.

A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks of this size on the freelist (0). If this is not NULL - fine, just remove the block that _S_bin[ bin ].first[ 0 ] points to from the list, update _S_bin[ bin ].first[ 0 ] and return a pointer to that block's data.

If the freelist is empty (the pointer is NULL) we must get memory from the system and build a freelist within this memory. All requests for new memory are made in chunks of _S_chunk_size. Knowing the size of a block_record and the bytes that this bin stores, we then calculate how many blocks we can create within this chunk, build the list, remove the first block, update the pointer (_S_bin[ bin ].first[ 0 ]) and return a pointer to that block's data.

Deallocation is equally simple: the pointer is cast back to a block_record pointer, we look up which bin to use based on the size, add the block to the front of the global freelist, and update the pointer as needed (_S_bin[ bin ].first[ 0 ]).

The decision to add deallocated blocks to the front of the freelist was made after a set of performance measurements that showed that this is roughly 10% faster than maintaining a set of "last pointers" as well.

Multiple Thread Example

In the ST example we never used the thread_id variable present in each block. Let's start by explaining the purpose of this in a MT application.

The concept of "ownership" was introduced since many MT applications allocate and deallocate memory to shared containers from different threads (such as a cache shared amongst all threads). This introduces a problem if the allocator only returns memory to the current thread's freelist (i.e., there might be one thread doing all the allocation, and thus obtaining ever more memory from the system, and another thread that is getting a longer and longer freelist - this would in the end consume all available memory).

Each time a block is moved from the global list (where ownership is irrelevant) to a thread's freelist (or when a new freelist is built from a chunk directly onto a thread's freelist, or when a deallocation occurs on a block which was not allocated by the same thread id as the one doing the deallocation), the thread id is set to the current one.

What's the use? Well, when a deallocation occurs we can now look at the thread id, find out if the block was allocated by another thread, and decrease the used counter of that thread instead, thus keeping the free and used counters correct. And keeping the free and used counters correct is very important, since the relationship between these two variables decides whether memory should be returned to the global pool or not when a deallocation occurs.

When the application requests memory (calling allocate()) we first look at the requested size, and if this is > _S_max_bytes we call new directly and return.

If the requested size is within limits, we start by finding out from which bin we should serve this request by looking in _S_binmap.

A call to _S_get_thread_id() returns the thread id for the calling thread (and if no value has been set in _S_thread_key, a new id is assigned and returned).

A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there are any blocks of this size on the current thread's freelist. If this is not NULL - fine, just remove the block that _S_bin[ bin ].first[ thread_id ] points to from the list, update _S_bin[ bin ].first[ thread_id ], update the free and used counters, and return a pointer to that block's data.

If the freelist is empty (the pointer is NULL) we start by looking at the global freelist (0). If there are blocks available on the global freelist, we lock this bin's mutex and move up to block_count blocks (the number of blocks of this bin's size that will fit into a _S_chunk_size) - or until the end of the list, whichever comes first - to the current thread's freelist, at the same time changing the thread_id ownership and updating the counters and pointers. Once the bin's mutex has been unlocked, we remove the block that _S_bin[ bin ].first[ thread_id ] points to from the list, update _S_bin[ bin ].first[ thread_id ], update the free and used counters, and return a pointer to that block's data.

The reason that the number of blocks moved to the current thread's freelist is limited to block_count is to minimize the chance that a subsequent deallocate() call will return the excess blocks to the global freelist (based on the _S_freelist_headroom calculation, see below).

However, if there isn't any memory on the global pool, we need to get memory from the system - this is done in exactly the same way as in a single-threaded application, with one major difference: the list built in the newly allocated memory (of _S_chunk_size size) is added to the current thread's freelist instead of to the global one.

The basic process of a deallocation call is simple: always add the block to the front of the current thread's freelist and update the counters and pointers (as described earlier, with the specific check of ownership that causes the used counter of the thread that originally allocated the block to be decreased instead of the current thread's counter).

This is where the free and used counters come into service. Each time a deallocate() call is made, the length of the current thread's freelist is compared to the amount of memory in use by this thread.

Let's go back to the example of an application that has one thread that does all the allocations and one that deallocates. Say both these threads use 516 32-byte blocks that were allocated during thread creation, for example. Their used counters will both say 516 at this point. The allocation thread now grabs 1000 32-byte blocks and puts them in a shared container. The used counter for this thread is now 1516.

The deallocation thread now deallocates 500 of these blocks. For each deallocation made, the used counter of the allocating thread is decreased and the freelist of the deallocation thread gets longer and longer. But the calculation made in deallocate() will limit the length of the freelist in the deallocation thread to _S_freelist_headroom % of its used counter. In this case, when the freelist (given that _S_freelist_headroom is at its default value of 10%) exceeds about 52 blocks (516/10), blocks will be returned to the global pool where the allocating thread may pick them up and reuse them.
 
In order to reduce lock contention (since this requires this bin's mutex to be locked), this operation is also done in chunks of blocks (just like when chunks of blocks are moved from the global freelist to a thread's freelist, as mentioned above). The "formula" used can probably be improved to further reduce the risk of blocks being "bounced back and forth" between freelists.
 