OpenCores
URL https://opencores.org/ocsvn/openrisc/openrisc/trunk

Subversion Repositories openrisc

[/] [openrisc/] [trunk/] [gnu-dev/] [or1k-gcc/] [libstdc++-v3/] [doc/] [xml/] [manual/] [mt_allocator.xml] - Blame information for rev 745

Go to most recent revision | Details | Compare with Previous | View Log

Line No. Rev Author Line
1 742 jeremybenn
2
         xml:id="manual.ext.allocator.mt" xreflabel="mt allocator">
3
4
 
5
The mt_allocator
6
  
7
    
8
      ISO C++
9
    
10
    
11
      allocator
12
    
13
  
14
15
 
16
 
17
 
18
19
20
 
21
Intro
22
 
23
 
24
25
  The mt allocator [hereinafter referred to simply as "the allocator"]
26
  is a fixed size (power of two) allocator that was initially
27
  developed specifically to suit the needs of multi threaded
28
  applications [hereinafter referred to as an MT application]. Over
29
  time the allocator has evolved and been improved in many ways, in
30
  particular it now also does a good job in single threaded
31
  applications [hereinafter referred to as a ST application]. (Note:
32
  In this document, when referring to single threaded applications
33
  this also includes applications that are compiled with gcc without
34
  thread support enabled. This is accomplished using ifdef's on
35
  __GTHREADS). This allocator is tunable, very flexible, and capable
36
  of high-performance.
37
38
 
39
40
  The aim of this document is to describe - from an application point of
41
  view - the "inner workings" of the allocator.
42
43
 
44
45
 
46
 
47
Design Issues
48
 
49
 
50
Overview
51
 
52
 
53
 
54
 There are three general components to the allocator: a datum
55
describing the characteristics of the memory pool, a policy class
56
containing this pool that links instantiation types to common or
57
individual pools, and a class inheriting from the policy class that is
58
the actual allocator.
59
60
 
61
The datum describing pools characteristics is
62
63
64
  template<bool _Thread>
65
    class __pool
66
67
 This class is parametrized on thread support, and is explicitly
68
specialized for both multiple threads (with bool==true)
69
and single threads (via bool==false.) It is possible to
70
use a custom pool datum instead of the default class that is provided.
71
72
 
73
 There are two distinct policy classes, each of which can be used
74
with either type of underlying pool datum.
75
76
 
77
78
  template<bool _Thread>
79
    struct __common_pool_policy
80
 
81
  template<typename _Tp, bool _Thread>
82
    struct __per_type_pool_policy
83
84
 
85
 The first policy, __common_pool_policy, implements a
86
common pool. This means that allocators that are instantiated with
87
different types, say char and long will both
88
use the same pool. This is the default policy.
89
90
 
91
 The second policy, __per_type_pool_policy, implements
92
a separate pool for each instantiating type. Thus, char
93
and long will use separate pools. This allows per-type
94
tuning, for instance.
95
96
 
97
 Putting this all together, the actual allocator class is
98
99
100
  template<typename _Tp, typename _Poolp = __default_policy>
101
    class __mt_alloc : public __mt_alloc_base<_Tp>,  _Poolp
102
103
 This class has the interface required for standard library allocator
104
classes, namely member functions allocate and
105
deallocate, plus others.
106
107
 
108
109
110
 
111
Implementation
112
 
113
 
114
 
115
Tunable Parameters
116
 
117
 
118
Certain allocation parameters can be modified, or tuned. There
119
exists a nested struct __pool_base::_Tune that contains all
120
these parameters, which include settings for
121
122
   
123
     Alignment
124
     Maximum bytes before calling ::operator new directly
125
     Minimum bytes
126
     Size of underlying global allocations
127
     Maximum number of supported threads
128
     Migration of deallocations to the global free list
129
     Shunt for global new and delete
130
   
131
Adjusting parameters for a given instance of an allocator can only
132
happen before any allocations take place, when the allocator itself is
133
initialized. For instance:
134
135
136
#include <ext/mt_allocator.h>
137
 
138
struct pod
139
{
140
  int i;
141
  int j;
142
};
143
 
144
int main()
145
{
146
  typedef pod value_type;
147
  typedef __gnu_cxx::__mt_alloc<value_type> allocator_type;
148
  typedef __gnu_cxx::__pool_base::_Tune tune_type;
149
 
150
  tune_type t_default;
151
  tune_type t_opt(16, 5120, 32, 5120, 20, 10, false);
152
  tune_type t_single(16, 5120, 32, 5120, 1, 10, false);
153
 
154
  tune_type t;
155
  t = allocator_type::_M_get_options();
156
  allocator_type::_M_set_options(t_opt);
157
  t = allocator_type::_M_get_options();
158
 
159
  allocator_type a;
160
  allocator_type::pointer p1 = a.allocate(128);
161
  allocator_type::pointer p2 = a.allocate(5128);
162
 
163
  a.deallocate(p1, 128);
164
  a.deallocate(p2, 5128);
165
 
166
  return 0;
167
}
168
169
 
170
171
 
172
Initialization
173
 
174
 
175
176
The static variables (pointers to freelists, tuning parameters etc)
177
are initialized as above, or are set to the global defaults.
178
179
 
180
181
The very first allocate() call will always call the
182
_S_initialize_once() function.  In order to make sure that this
183
function is called exactly once we make use of a __gthread_once call
184
in MT applications and check a static bool (_S_init) in ST
185
applications.
186
187
 
188
189
The _S_initialize() function:
190
- If the GLIBCXX_FORCE_NEW environment variable is set, it sets the bool
191
  _S_force_new to true and then returns. This will cause subsequent calls to
192
  allocate() to return memory directly from a new() call, and deallocate will
193
  only do a delete() call.
194
195
 
196
197
- If the GLIBCXX_FORCE_NEW environment variable is not set, both ST and MT
198
  applications will:
199
  - Calculate the number of bins needed. A bin is a specific power of two size
200
    of bytes. I.e., by default the allocator will deal with requests of up to
201
    128 bytes (or whatever the value of _S_max_bytes is when _S_init() is
202
    called). This means that there will be bins of the following sizes
203
    (in bytes): 1, 2, 4, 8, 16, 32, 64, 128.
204
 
205
  - Create the _S_binmap array. All requests are rounded up to the next
206
    "large enough" bin. I.e., a request for 29 bytes will cause a block from
207
    the "32 byte bin" to be returned to the application. The purpose of
208
    _S_binmap is to speed up the process of finding out which bin to use.
209
    I.e., the value of _S_binmap[ 29 ] is initialized to 5 (bin 5 = 32 bytes).
210
211
212
  - Create the _S_bin array. This array consists of bin_records. There will be
213
    as many bin_records in this array as the number of bins that we calculated
214
    earlier. I.e., if _S_max_bytes = 128 there will be 8 entries.
215
    Each bin_record is then initialized:
216
    - bin_record->first = An array of pointers to block_records. There will be
217
      as many block_records pointers as there are maximum number of threads
218
      (in a ST application there is only 1 thread, in a MT application there
219
      are _S_max_threads).
220
      This holds the pointer to the first free block for each thread in this
221
      bin. I.e., if we would like to know where the first free block of size 32
222
      for thread number 3 is we would look this up by: _S_bin[ 5 ].first[ 3 ]
223
 
224
    The above created block_record pointers members are now initialized to
225
    their initial values. I.e. _S_bin[ n ].first[ n ] = NULL;
226
227
 
228
229
- Additionally a MT application will:
230
  - Create a list of free thread id's. The pointer to the first entry
231
    is stored in _S_thread_freelist_first. The reason for this approach is
232
    that the __gthread_self() call will not return a value that corresponds to
233
    the maximum number of threads allowed but rather a process id number or
234
    something else. So what we do is that we create a list of thread_records.
235
    This list is _S_max_threads long and each entry holds a size_t thread_id
236
    which is initialized to 1, 2, 3, 4, 5 and so on up to _S_max_threads.
237
    Each time a thread calls allocate() or deallocate() we call
238
    _S_get_thread_id() which looks at the value of _S_thread_key which is a
239
    thread local storage pointer. If this is NULL we know that this is a newly
240
    created thread and we pop the first entry from this list and saves the
241
    pointer to this record in the _S_thread_key variable. The next time
242
    we will get the pointer to the thread_record back and we use the
243
    thread_record->thread_id as identification. I.e., the first thread that
244
    calls allocate will get the first record in this list and thus be thread
245
    number 1 and will then find the pointer to its first free 32 byte block
246
    in _S_bin[ 5 ].first[ 1 ]
247
    When we create the _S_thread_key we also define a destructor
248
    (_S_thread_key_destr) which means that when the thread dies, this
249
    thread_record is returned to the front of this list and the thread id
250
    can then be reused if a new thread is created.
251
    This list is protected by a mutex (_S_thread_freelist_mutex) which is only
252
    locked when records are removed or added to the list.
253
254
255
  - Initialize the free and used counters of each bin_record:
256
    - bin_record->free = An array of size_t. This keeps track of the number
257
      of blocks on a specific thread's freelist in each bin. I.e., if a thread
258
      has 12 32-byte blocks on it's freelists and allocates one of these, this
259
      counter would be decreased to 11.
260
 
261
    - bin_record->used = An array of size_t. This keeps track of the number
262
      of blocks currently in use of this size by this thread. I.e., if a thread
263
      has made 678 requests (and no deallocations...) of 32-byte blocks this
264
      counter will read 678.
265
 
266
    The above created arrays are now initialized with their initial values.
267
    I.e. _S_bin[ n ].free[ n ] = 0;
268
269
270
  - Initialize the mutex of each bin_record: The bin_record->mutex
271
    is used to protect the global freelist. This concept of a global
272
    freelist is explained in more detail in the section "A multi
273
    threaded example", but basically this mutex is locked whenever a
274
    block of memory is retrieved or returned to the global freelist
275
    for this specific bin. This only occurs when a number of blocks
276
    are grabbed from the global list to a thread specific list or when
277
    a thread decides to return some blocks to the global freelist.
278
279
 
280
281
 
282
Deallocation Notes
283
 
284
 
285
 Notes about deallocation. This allocator does not explicitly
286
release memory. Because of this, memory debugging programs like
287
valgrind or purify may notice leaks: sorry about this
288
inconvenience. Operating systems will reclaim allocated memory at
289
program termination anyway. If sidestepping this kind of noise is
290
desired, there are three options: use an allocator, like
291
new_allocator that releases memory while debugging, use
292
GLIBCXX_FORCE_NEW to bypass the allocator's internal pools, or use a
293
custom pool datum that releases resources on destruction.
294
295
 
296
297
  On systems with the function __cxa_atexit, the
298
allocator can be forced to free all memory allocated before program
299
termination with the member function
300
__pool_type::_M_destroy. However, because this member
301
function relies on the precise and exactly-conforming ordering of
302
static destructors, including those of a static local
303
__pool object, it should not be used, ever, on systems
304
that don't have the necessary underlying support. In addition, in
305
practice, forcing deallocation can be tricky, as it requires the
306
__pool object to be fully-constructed before the object
307
that uses it is fully constructed. For most (but not all) STL
308
containers, this works, as an instance of the allocator is constructed
309
as part of a container's constructor. However, this assumption is
310
implementation-specific, and subject to change. For an example of a
311
pool that frees memory, see the following
312
    
313
    example.
314
315
 
316
317
 
318
319
 
320
Single Thread Example
321
 
322
 
323
324
Let's start by describing how the data on a freelist is laid out in memory.
325
This is the first two blocks in freelist for thread id 3 in bin 3 (8 bytes):
326
327
328
+----------------+
329
| next* ---------|--+  (_S_bin[ 3 ].first[ 3 ] points here)
330
|                |  |
331
|                |  |
332
|                |  |
333
+----------------+  |
334
| thread_id = 3  |  |
335
|                |  |
336
|                |  |
337
|                |  |
338
+----------------+  |
339
| DATA           |  |  (A pointer to here is what is returned to the
340
|                |  |   the application when needed)
341
|                |  |
342
|                |  |
343
|                |  |
344
|                |  |
345
|                |  |
346
|                |  |
347
+----------------+  |
348
+----------------+  |
349
| next*          |<-+  (If next == NULL it's the last one on the list)
350
|                |
351
|                |
352
|                |
353
+----------------+
354
| thread_id = 3  |
355
|                |
356
|                |
357
|                |
358
+----------------+
359
| DATA           |
360
|                |
361
|                |
362
|                |
363
|                |
364
|                |
365
|                |
366
|                |
367
+----------------+
368
369
 
370
371
With this in mind we simplify things a bit for a while and say that there is
372
only one thread (a ST application). In this case all operations are made to
373
what is referred to as the global pool - thread id 0 (No thread may be
374
assigned this id since they span from 1 to _S_max_threads in a MT application).
375
376
377
When the application requests memory (calling allocate()) we first look at the
378
requested size and if this is > _S_max_bytes we call new() directly and return.
379
380
381
If the requested size is within limits we start by finding out from which
382
bin we should serve this request by looking in _S_binmap.
383
384
385
A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks of
386
this size on the freelist (0). If this is not NULL - fine, just remove the
387
block that _S_bin[ bin ].first[ 0 ] points to from the list,
388
update _S_bin[ bin ].first[ 0 ] and return a pointer to that blocks data.
389
390
391
If the freelist is empty (the pointer is NULL) we must get memory from the
392
system and build us a freelist within this memory. All requests for new memory
393
is made in chunks of _S_chunk_size. Knowing the size of a block_record and
394
the bytes that this bin stores we then calculate how many blocks we can create
395
within this chunk, build the list, remove the first block, update the pointer
396
(_S_bin[ bin ].first[ 0 ]) and return a pointer to that blocks data.
397
398
 
399
400
Deallocation is equally simple; the pointer is casted back to a block_record
401
pointer, lookup which bin to use based on the size, add the block to the front
402
of the global freelist and update the pointer as needed
403
(_S_bin[ bin ].first[ 0 ]).
404
405
 
406
407
The decision to add deallocated blocks to the front of the freelist was made
408
after a set of performance measurements that showed that this is roughly 10%
409
faster than maintaining a set of "last pointers" as well.
410
411
 
412
413
 
414
Multiple Thread Example
415
 
416
 
417
418
In the ST example we never used the thread_id variable present in each block.
419
Let's start by explaining the purpose of this in a MT application.
420
421
 
422
423
The concept of "ownership" was introduced since many MT applications
424
allocate and deallocate memory to shared containers from different
425
threads (such as a cache shared amongst all threads). This introduces
426
a problem if the allocator only returns memory to the current threads
427
freelist (I.e., there might be one thread doing all the allocation and
428
thus obtaining ever more memory from the system and another thread
429
that is getting a longer and longer freelist - this will in the end
430
consume all available memory).
431
432
 
433
434
Each time a block is moved from the global list (where ownership is
435
irrelevant), to a threads freelist (or when a new freelist is built
436
from a chunk directly onto a threads freelist or when a deallocation
437
occurs on a block which was not allocated by the same thread id as the
438
one doing the deallocation) the thread id is set to the current one.
439
440
 
441
442
What's the use? Well, when a deallocation occurs we can now look at
443
the thread id and find out if it was allocated by another thread id
444
and decrease the used counter of that thread instead, thus keeping the
445
free and used counters correct. And keeping the free and used counters
446
corrects is very important since the relationship between these two
447
variables decides if memory should be returned to the global pool or
448
not when a deallocation occurs.
449
450
 
451
452
When the application requests memory (calling allocate()) we first
453
look at the requested size and if this is >_S_max_bytes we call new()
454
directly and return.
455
456
 
457
458
If the requested size is within limits we start by finding out from which
459
bin we should serve this request by looking in _S_binmap.
460
461
 
462
463
A call to _S_get_thread_id() returns the thread id for the calling thread
464
(and if no value has been set in _S_thread_key, a new id is assigned and
465
returned).
466
467
 
468
469
A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there are
470
any blocks of this size on the current threads freelist. If this is
471
not NULL - fine, just remove the block that _S_bin[ bin ].first[
472
thread_id ] points to from the list, update _S_bin[ bin ].first[
473
thread_id ], update the free and used counters and return a pointer to
474
that blocks data.
475
476
 
477
478
If the freelist is empty (the pointer is NULL) we start by looking at
479
the global freelist (0). If there are blocks available on the global
480
freelist we lock this bins mutex and move up to block_count (the
481
number of blocks of this bins size that will fit into a _S_chunk_size)
482
or until end of list - whatever comes first - to the current threads
483
freelist and at the same time change the thread_id ownership and
484
update the counters and pointers. When the bins mutex has been
485
unlocked, we remove the block that _S_bin[ bin ].first[ thread_id ]
486
points to from the list, update _S_bin[ bin ].first[ thread_id ],
487
update the free and used counters, and return a pointer to that blocks
488
data.
489
490
 
491
492
The reason that the number of blocks moved to the current threads
493
freelist is limited to block_count is to minimize the chance that a
494
subsequent deallocate() call will return the excess blocks to the
495
global freelist (based on the _S_freelist_headroom calculation, see
496
below).
497
498
 
499
500
However if there isn't any memory on the global pool we need to get
501
memory from the system - this is done in exactly the same way as in a
502
single threaded application with one major difference; the list built
503
in the newly allocated memory (of _S_chunk_size size) is added to the
504
current threads freelist instead of to the global.
505
506
 
507
508
The basic process of a deallocation call is simple: always add the
509
block to the front of the current threads freelist and update the
510
counters and pointers (as described earlier with the specific check of
511
ownership that causes the used counter of the thread that originally
512
allocated the block to be decreased instead of the current threads
513
counter).
514
515
 
516
517
And here comes the free and used counters to service. Each time a
518
deallocation() call is made, the length of the current threads
519
freelist is compared to the amount memory in use by this thread.
520
521
 
522
523
Let's go back to the example of an application that has one thread
524
that does all the allocations and one that deallocates. Both these
525
threads use say 516 32-byte blocks that was allocated during thread
526
creation for example.  Their used counters will both say 516 at this
527
point. The allocation thread now grabs 1000 32-byte blocks and puts
528
them in a shared container. The used counter for this thread is now
529
1516.
530
531
 
532
533
The deallocation thread now deallocates 500 of these blocks. For each
534
deallocation made the used counter of the allocating thread is
535
decreased and the freelist of the deallocation thread gets longer and
536
longer. But the calculation made in deallocate() will limit the length
537
of the freelist in the deallocation thread to _S_freelist_headroom %
538
of it's used counter.  In this case, when the freelist (given that the
539
_S_freelist_headroom is at it's default value of 10%) exceeds 52
540
(516/10) blocks will be returned to the global pool where the
541
allocating thread may pick them up and reuse them.
542
543
 
544
545
In order to reduce lock contention (since this requires this bins
546
mutex to be locked) this operation is also made in chunks of blocks
547
(just like when chunks of blocks are moved from the global freelist to
548
a threads freelist mentioned above). The "formula" used can probably
549
be improved to further reduce the risk of blocks being "bounced back
550
and forth" between freelists.
551
552
 
553
554
 
555

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.