<HTML>
<HEAD>
<TITLE> Conservative GC Algorithmic Overview </TITLE>
<AUTHOR> Hans-J. Boehm, HP Labs (Much of this was written at SGI)</author>
</HEAD>
<BODY>
<H1> <I>This is under construction, and may always be.</i> </h1>
<H1> Conservative GC Algorithmic Overview </h1>
<P>
This is a description of the algorithms and data structures used in our
conservative garbage collector. I expect the level of detail to increase
with time. For a survey of GC algorithms, see for example
<A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
excellent paper</a>. For an overview of the collector interface,
see <A HREF="gcinterface.html">here</a>.
<P>
This description is targeted primarily at someone trying to understand the
source code. It specifically refers to variable and function names.
It may also be useful for understanding the algorithms at a higher level.
<P>
The description here assumes that the collector is used in default mode.
In particular, we assume that it is used as a garbage collector, and not just
a leak detector. We initially assume that it is used in stop-the-world,
non-incremental mode, though the presence of the incremental collector
will be apparent in the design.
We assume the default finalization model, but the code affected by that
is very localized.
<H2> Introduction </h2>
The garbage collector uses a modified mark-sweep algorithm. Conceptually
it operates roughly in four phases, which are performed occasionally
as part of a memory allocation:

<OL>

<LI>
<I>Preparation</i> Each object has an associated mark bit.
Clear all mark bits, indicating that all objects
are potentially unreachable.

<LI>
<I>Mark phase</i> Marks all objects that may be reachable via chains of
pointers from variables. Often the collector has no real information
about the location of pointer variables in the heap, so it
views all static data areas, stacks and registers as potentially containing
pointers. Any bit patterns that represent addresses inside
heap objects managed by the collector are viewed as pointers.
Unless the client program has made heap object layout information
available to the collector, any heap objects found to be reachable from
variables are again scanned similarly.

<LI>
<I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
objects, and returns them to an appropriate free list for reuse. This is
not really a separate phase; even in non-incremental mode this operation
is usually performed on demand during an allocation that discovers an empty
free list. Thus the sweep phase is very unlikely to touch a page that
would not have been touched shortly thereafter anyway.

<LI>
<I>Finalization phase</i> Unreachable objects which had been registered
for finalization are enqueued for finalization outside the collector.

</ol>
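<P>
As a concrete (if toy) illustration, the following sketch runs the same
phases over a tiny fixed object graph. All names are made up, and it is
greatly simplified: the real collector interleaves the phases with
allocation, and discovers pointers by scanning memory conservatively
rather than from explicit edge lists.
<PRE>
#define NOBJS 8

static int edges[NOBJS][2] = {          /* outgoing "pointers"; -1 = none */
    {1, 2}, {3, -1}, {-1, -1}, {-1, -1},
    {5, -1}, {-1, -1}, {7, -1}, {-1, -1}
};
static unsigned char marked[NOBJS];     /* one mark bit per object */

static void mark_from(int root)
{
    int stack[NOBJS * 4], sp = 0;       /* explicit mark stack */
    stack[sp++] = root;
    while (sp > 0) {
        int j = stack[--sp], k;
        if (marked[j]) continue;
        marked[j] = 1;
        for (k = 0; k != 2; ++k) {      /* push unmarked referents */
            int t = edges[j][k];
            if (t >= 0) {
                if (!marked[t]) stack[sp++] = t;
            }
        }
    }
}

int toy_collect(const int *roots, int nroots)
{
    int i, freed = 0;
    for (i = 0; i != NOBJS; ++i) marked[i] = 0;         /* preparation */
    for (i = 0; i != nroots; ++i) mark_from(roots[i]);  /* mark phase  */
    for (i = 0; i != NOBJS; ++i)                        /* sweep phase */
        if (!marked[i]) ++freed;  /* would be returned to a free list  */
    return freed;                 /* finalization is omitted here      */
}
</pre>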
<P>
The remaining sections describe the memory allocation data structures,
and then the last three collection phases in more detail. We conclude by
outlining some of the additional features implemented in the collector.

<H2>Allocation</h2>
The collector includes its own memory allocator. The allocator obtains
memory from the system in a platform-dependent way. Under UNIX, it
uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
<P>
Most static data used by the allocator, as well as that needed by the
rest of the garbage collector, is stored inside the
<TT>_GC_arrays</tt> structure.
This allows the garbage collector to easily ignore the collector's own
data structures when it searches for root pointers. Other allocator
and collector internal data structures are allocated dynamically
with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
allow for deallocation, and is therefore used only for permanent data
structures.
<P>
The allocator allocates objects of different <I>kinds</i>.
Different kinds are handled somewhat differently by certain parts
of the garbage collector. Certain kinds are scanned for pointers,
others are not. Some may have per-object type descriptors that
determine pointer locations. Or a specific kind may correspond
to one specific object layout. Two built-in kinds are uncollectable.
One (<TT>STUBBORN</tt>) is immutable without special precautions.
In spite of that, it is very likely that most C clients of the
collector currently
use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
The <A HREF="http://gcc.gnu.org/java">gcj</a> runtime also makes
heavy use of a kind (allocated with <TT>GC_gcj_malloc</tt>) that stores
type information at a known offset in method tables.
<P>
The collector uses a two-level allocator. A large block is defined to
be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
typically on the order of the page size.
<P>
Large block sizes are rounded up to
the next multiple of <TT>HBLKSIZE</tt> and then allocated by
<TT>GC_allochblk</tt>. Recent versions of the collector
use an approximate best fit algorithm by keeping free lists for
several large block sizes.
The actual
implementation of <TT>GC_allochblk</tt>
is significantly complicated by black-listing issues
(see below).
<P>
Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>.
Each chunk is
dedicated to only one object size and kind. The allocator maintains
separate free lists for each size and kind of object.
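<P>
The resulting fast allocation path can be sketched as follows
(illustrative names and layout only; the real free lists are part of the
collector's internal state and are indexed by granules):
<PRE>
#include &lt;stddef.h>

#define MAXOBJWORDS 128
enum kind { NORMAL, PTRFREE, N_KINDS };     /* illustrative kinds */

/* One free-list head per (kind, object size in words). */
static void *obj_free_list[N_KINDS][MAXOBJWORDS + 1];

void *quick_alloc(enum kind k, size_t words) /* words assumed in range */
{
    void **flh = obj_free_list[k] + words;
    void *result = *flh;
    if (result != 0) {
        *flh = *(void **)result;    /* unlink the first object */
        return result;
    }
    /* Slow path: sweep queued pages of this size and kind, or
       split a fresh HBLKSIZE chunk, then retry. */
    return 0;
}
</pre>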
<P>
Once a large block is split for use in smaller objects, it can only
be used for objects of that size, unless the collector discovers a completely
empty chunk. Completely empty chunks are restored to the appropriate
large block free list.
<P>
In order to avoid allocating blocks for too many distinct object sizes,
the collector normally does not directly allocate objects of every possible
request size. Instead, requests are rounded up to one of a smaller number
of allocated sizes, for which free lists are maintained. The exact
allocated sizes are computed on demand, but subject to the constraint
that they increase roughly in geometric progression. Thus objects
requested early in the execution are likely to be allocated with exactly
the requested size, subject to alignment constraints.
See <TT>GC_init_size_map</tt> for details.
<P>
The actual size rounding operation during small object allocation is
implemented as a table lookup in <TT>GC_size_map</tt>.
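<P>
A much-simplified version of such a table might be initialized as follows
(hypothetical constants and names; the real <TT>GC_init_size_map</tt>
fills entries on demand):
<PRE>
#include &lt;stddef.h>

#define MAXOBJBYTES 2048                /* assumed small-object limit */

static size_t size_map[MAXOBJBYTES + 1];

/* Map each request size to an allocated size, with allocated sizes
   growing in rough geometric progression. */
void init_size_map(void)
{
    size_t granule = sizeof(void *);    /* alignment constraint */
    size_t alloc_sz = granule;
    size_t i;
    for (i = 0; i != MAXOBJBYTES + 1; ++i) {
        if (i > alloc_sz) {             /* start the next size class, */
            size_t next = alloc_sz + alloc_sz / 8 + granule;
            alloc_sz = next - next % granule;  /* rounded to a granule */
        }
        size_map[i] = alloc_sz;
    }
}
</pre>
An allocation request for <I>n</i> bytes then simply allocates an object
of <TT>size_map[n]</tt> bytes.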
<P>
Both collector initialization and computation of allocated sizes are
handled carefully so that they do not slow down the small object fast
allocation path. An attempt to allocate before the collector is initialized,
or before the appropriate <TT>GC_size_map</tt> entry is computed,
will take the same path as an allocation attempt with an empty free list.
This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
which performs the appropriate initialization checks.
<P>
In non-incremental mode, we make a decision about whether to garbage collect
whenever an allocation would otherwise have failed with the current heap size.
If the total amount of allocation since the last collection is less than
the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
expand the heap. Otherwise, we initiate a garbage collection. This ensures
that the amount of garbage collection work per allocated byte remains
constant.
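<P>
In code, the basic rule amounts to something like this (a sketch with
illustrative names, not the collector's actual logic):
<PRE>
#include &lt;stddef.h>

/* Decide between heap expansion and collection when an allocation
   cannot otherwise be satisfied at the current heap size. */
int should_collect(size_t bytes_allocd_since_gc,
                   size_t heap_size,
                   size_t divisor)      /* GC_free_space_divisor */
{
    return bytes_allocd_since_gc >= heap_size / divisor;
}
</pre>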
<P>
The above is in fact an oversimplification of the real heap expansion
and GC triggering heuristic, which adjusts slightly for root size
and certain kinds of
fragmentation. In particular:
<UL>
<LI> Programs with a large root set size and
little live heap memory will expand the heap to amortize the cost of
scanning the roots.
<LI> Versions 5.x of the collector actually collect more frequently in
non-incremental mode. The large block allocator usually refuses to split
large heap blocks once the garbage collection threshold is
reached. This often has the effect of collecting well before the
heap fills up, thus reducing fragmentation and working set size at the
expense of GC time. Versions 6.x choose an intermediate strategy depending
on how much large object allocation has taken place in the past.
(If the collector is configured to unmap unused pages, versions 6.x
use the 5.x strategy.)
<LI> In calculating the amount of allocation since the last collection we
give partial credit for objects we expect to be explicitly deallocated.
Even if all objects are explicitly managed, it is often desirable to collect
on rare occasions, since that is our only mechanism for coalescing completely
empty chunks.
</ul>
<P>
It has been suggested that this should be adjusted so that we favor
expansion if the resulting heap still fits into physical memory.
In many cases, that would no doubt help. But it is tricky to do this
in a way that remains robust if multiple applications are contending
for a single pool of physical memory.

<H2>Mark phase</h2>

At each collection, the collector marks all objects that are
possibly reachable from pointer variables. Since it cannot generally
tell where pointer variables are located, it scans the following
<I>root segments</i> for pointers:
<UL>
<LI>The registers. Depending on the architecture, this may be done using
assembly code, or by calling a <TT>setjmp</tt>-like function which saves
register contents on the stack.
<LI>The stack(s). In the case of a single-threaded application,
on most platforms this
is done by scanning the memory between (an approximation of) the current
stack pointer and <TT>GC_stackbottom</tt>. (For Itanium, the register stack
is scanned separately.) The <TT>GC_stackbottom</tt> variable is set in
a highly platform-specific way depending on the appropriate configuration
information in <TT>gcconfig.h</tt>. Note that the currently active
stack needs to be scanned carefully, since callee-save registers of
client code may appear inside collector stack frames, which may
change during the mark process. This is addressed by scanning
some sections of the stack "eagerly", effectively capturing a snapshot
at one point in time.
<LI>Static data region(s). In the simplest case, this is the region
between <TT>DATASTART</tt> and <TT>DATAEND</tt>, as defined in
<TT>gcconfig.h</tt>. However, in most cases, this will also involve
static data regions associated with dynamic libraries. These are
identified by the mostly platform-specific code in <TT>dyn_load.c</tt>.
</ul>
The marker maintains an explicit stack of memory regions that are known
to be accessible, but that have not yet been searched for contained pointers.
Each stack entry contains the starting address of the block to be scanned,
as well as a descriptor of the block. If no layout information is
available for the block, then the descriptor is simply a length.
(For other possibilities, see <TT>gc_mark.h</tt>.)
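<P>
In the simplest case, a mark stack entry can thus be thought of as the
following pair (a simplified sketch; the real entry type and its other
descriptor encodings are in <TT>gc_mark.h</tt>):
<PRE>
#include &lt;stddef.h>

struct mse {
    void  **mse_start;  /* first word of the region to be scanned */
    size_t  mse_descr;  /* here simply a length in bytes; other
                           encodings describe per-object layouts  */
};
</pre>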
<P>
At the beginning of the mark phase, all root segments
(as described above) are pushed on the
stack by <TT>GC_push_roots</tt>. (Registers and eagerly processed
stack sections are processed by pushing the referenced objects instead
of the stack section itself.) If <TT>ALL_INTERIOR_PTRS</tt> is not
defined, then stack roots require special treatment. In this case, the
normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
explicitly checks for interior pointers and pushes descriptors for target
objects.
<P>
The marker is structured to allow incremental marking.
Each call to <TT>GC_mark_some</tt> performs a small amount of
work towards marking the heap.
It maintains
explicit state in the form of <TT>GC_mark_state</tt>, which
identifies a particular sub-phase. Some other pieces of state, most
notably the mark stack, identify how much work remains to be done
in each sub-phase. The normal progression of mark states for
a stop-the-world collection is:
<OL>
<LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
be false, so the mark state is advanced to
<LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
uncollectable objects and roots, and then mark everything reachable from them.
<TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
objects are pushed, and objects reachable from them are marked.
At that point, the next call to <TT>GC_mark_some</tt> calls
<TT>GC_push_roots</tt> to push the roots. It then advances the
mark state to
<LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
empty, all reachable objects are marked. Once in this state, we work
only on emptying the mark stack. Once this is completed, the state
changes to
<LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
</ol>
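<P>
The progression can be pictured as a small state machine (a sketch only;
the real <TT>GC_mark_some</tt> advances the state after the corresponding
work actually completes, and also handles the overflow and incremental
cases):
<PRE>
enum mark_state { MS_INVALID, MS_PUSH_UNCOLLECTABLE,
                  MS_ROOTS_PUSHED, MS_NONE };

static enum mark_state mark_state = MS_NONE;

void mark_some(void)            /* one bounded quantum of mark work */
{
    switch (mark_state) {
    case MS_INVALID:            /* accessible unmarked objects may exist */
        mark_state = MS_PUSH_UNCOLLECTABLE;
        break;
    case MS_PUSH_UNCOLLECTABLE: /* advance scan_ptr over the heap; when
                                   done, push the roots */
        mark_state = MS_ROOTS_PUSHED;
        break;
    case MS_ROOTS_PUSHED:       /* pop and scan mark stack entries;
                                   when the stack empties: */
        mark_state = MS_NONE;
        break;
    case MS_NONE:               /* all reachable objects are marked */
        break;
    }
}
</pre>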

The core mark routine <TT>GC_mark_from</tt> is called
repeatedly by several of the sub-phases when the mark stack starts to fill
up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
to empty the mark stack.
The routine is designed to perform only a limited amount of marking at
each call, so that it can also be used by the incremental collector.
It is fairly carefully tuned, since it usually consumes a large majority
of the garbage collection time.
<P>
The fact that it performs only a small amount of work per call also
allows it to be used as the core routine of the parallel marker. In that
case it is normally invoked on thread-private mark stacks instead of the
global mark stack. More details can be found in
<A HREF="scale.html">scale.html</a>.
<P>
The marker correctly handles mark stack overflows. Whenever the mark stack
overflows, the mark state is reset to <TT>MS_INVALID</tt>.
Since there are already marked objects in the heap,
this eventually forces a complete
scan of the heap, searching for pointers, during which any unmarked objects
referenced by marked objects are again pushed on the mark stack. This
process is repeated until the mark phase completes without a stack overflow.
Each time the stack overflows, an attempt is made to grow the mark stack.
All pieces of the collector that push regions onto the mark stack have to be
careful to ensure forward progress, even in case of repeated mark stack
overflows. Every mark attempt results in additional marked objects.
<P>
Each mark stack entry is processed by examining all candidate pointers
in the range described by the entry. If the region has no associated
type information, then this typically requires that each 4-byte aligned
quantity (8-byte aligned with 64-bit pointers) be considered a candidate
pointer.
<P>
We determine whether a candidate pointer is actually the address of
a heap block. This is done in the following steps:
<OL>
<LI> The candidate pointer is checked against rough heap bounds.
These heap bounds are maintained such that all actual heap objects
fall between them. In order to facilitate black-listing (see below)
we also include address regions that the heap is likely to expand into.
Most non-pointers fail this initial test.
<LI> The candidate pointer is divided into two pieces; the most significant
bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
the least significant bits specify an offset within that page.
(A hardware page may actually consist of multiple such pages.
<TT>HBLKSIZE</tt> is usually the page size divided by a small power of two.)
<LI>
The page address part of the candidate pointer is looked up in a
<A HREF="tree.html">table</a>.
Each table entry contains either 0, indicating that the page is not part
of the garbage collected heap, a small integer <I>n</i>, indicating
that the page is part of a large object, starting at least <I>n</i> pages
back, or a pointer to a descriptor for the page. In the first case,
the candidate pointer is not a true pointer and can be safely ignored.
In the last two cases, we can obtain a descriptor for the page containing
the beginning of the object.
<LI>
The starting address of the referenced object is computed.
The page descriptor contains the size of the object(s)
in that page, the object kind, and the necessary mark bits for those
objects. The size information can be used to map the candidate pointer
to the object starting address. To accelerate this process, the page header
also contains a pointer to a precomputed map of page offsets to displacements
from the beginning of an object. The use of this map avoids a
potentially slow integer remainder operation in computing the object
start address.
<LI>
The mark bit for the target object is checked and set. If the object
was previously unmarked, the object is pushed on the mark stack.
The descriptor is read from the page descriptor. (This is computed
from information in <TT>GC_obj_kinds</tt> when the page is first allocated.)
</ol>
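<P>
The first three steps can be sketched as follows. A flat lookup table is
assumed here for brevity (the real collector uses the multi-level structure
described in <A HREF="tree.html">tree.html</a>), and all names are
illustrative:
<PRE>
#include &lt;stddef.h>

#define HBLKSIZE 4096     /* assumed block size; a power of two     */
#define MAX_JUMP 128      /* smaller entries are page displacements */

struct hblkhdr;                     /* per-page descriptor (opaque here) */
extern struct hblkhdr *page_map[];  /* one entry per HBLKSIZE page       */

/* Return the page descriptor for candidate pointer p, or 0 if p
   cannot be a pointer into the collected heap. */
struct hblkhdr *header_of(char *p, char *heap_lo, char *heap_hi)
{
    size_t page, e;
    if (heap_lo > p || p >= heap_hi) return 0;  /* step 1: rough bounds */
    page = (size_t)(p - heap_lo) / HBLKSIZE;    /* step 2: page number  */
    e = (size_t)page_map[page];                 /* step 3: table lookup */
    while (e != 0) {
        if (e >= MAX_JUMP) break;           /* a real descriptor pointer */
        e = (size_t)page_map[page -= e];    /* large object: skip back   */
    }
    return (struct hblkhdr *)e;             /* 0 or the page descriptor  */
}
</pre>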
<P>
At the end of the mark phase, mark bits for left-over free lists are cleared,
in case a free list was accidentally marked due to a stray pointer.

<H2>Sweep phase</h2>

At the end of the mark phase, all blocks in the heap are examined.
Unmarked large objects are immediately returned to the large object free list.
Each small object page is checked to see if all mark bits are clear.
If so, the entire page is returned to the large object free list.
Small object pages containing some reachable object are queued for later
sweeping, unless we determine that the page contains very little free
space, in which case it is not examined further.
<P>
This initial sweep pass touches only block headers, not
the blocks themselves. Thus it does not require significant paging, even
if large sections of the heap are not in physical memory.
<P>
Nonempty small object pages are swept when an allocation attempt
encounters an empty free list for that object size and kind.
Pages for the correct size and kind are repeatedly swept until at
least one empty block is found. Sweeping such a page involves
scanning the mark bit array in the page header, and building a free
list linked through the first words in the objects themselves.
This does involve touching the appropriate data page, but in most cases
it will be touched only just before it is used for allocation.
Hence any paging is essentially unavoidable.
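<P>
The core of such a sweep can be sketched as follows, assuming for
simplicity one mark byte per object and a page dedicated to a single
object size (illustrative names; the real routine works from the mark
bit array in the block header):
<PRE>
#include &lt;stddef.h>
#include &lt;string.h>

/* Thread the unmarked objects of one page onto a free list, linking
   through each object's first word.  obj_bytes is assumed to be a
   multiple of the pointer size. */
void *sweep_page(char *page, size_t obj_bytes, size_t nobjs,
                 const unsigned char *mark_byte, void *free_list)
{
    size_t i;
    for (i = 0; i != nobjs; ++i) {
        if (!mark_byte[i]) {
            void **obj = (void **)(page + i * obj_bytes);
            memset(obj, 0, obj_bytes);  /* see the clearing invariant below */
            obj[0] = free_list;         /* link through the first word */
            free_list = obj;
        }
    }
    return free_list;    /* new head of the free list for this size/kind */
}
</pre>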
<P>
Except in the case of pointer-free objects, we maintain the invariant
that any object in a small object free list is cleared (except possibly
for the link field). Thus it becomes the burden of the small object
sweep routine to clear objects. This has the advantage that we can
easily recover from accidentally marking a free list, though that could
also be handled by other means. The collector currently spends a fair
amount of time clearing objects, and this approach should probably be
revisited.
<P>
In most configurations, we use specialized sweep routines to handle common
small object sizes. Since we allocate one mark bit per word, it becomes
easier to examine the relevant mark bits if the object size divides
the word length evenly. We also suitably unroll the inner sweep loop
in each case. (It is conceivable that profile-based procedure cloning
in the compiler could make this unnecessary and counterproductive. I
know of no existing compiler to which this applies.)
<P>
The sweeping of small object pages could be avoided completely at the expense
of examining mark bits directly in the allocator. This would probably
be more expensive, since each allocation call would have to reload
a large amount of state (e.g. next object address to be swept, position
in mark bit table) before it could do its work. The current scheme
keeps the allocator simple and allows useful optimizations in the sweeper.

<H2>Finalization</h2>
Both <TT>GC_register_disappearing_link</tt> and
<TT>GC_register_finalizer</tt> add the request to a corresponding hash
table. The hash table is allocated out of collected memory, but
the reference to the finalizable object is hidden from the collector.
Currently finalization requests are processed non-incrementally at the
end of a mark cycle.
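<P>
Hiding a reference from a conservative collector is simple: store a value
that does not look like an address. The collector's
<TT>HIDE_POINTER</tt> macro uses the bitwise complement; in the same
spirit:
<PRE>
#include &lt;stddef.h>

typedef size_t hidden_ptr;

hidden_ptr hide_pointer(void *p)   { return ~(hidden_ptr)p; }
void *reveal_pointer(hidden_ptr h) { return (void *)~h; }
</pre>
Since the complement of a heap address is almost certainly not itself a
heap address, the table entry does not keep the object alive, yet the true
pointer can be recovered when the object is to be finalized.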
<P>
The collector makes an initial pass over the table of finalizable objects,
pushing the contents of unmarked objects onto the mark stack.
After pushing each object, the marker is invoked to mark all objects
reachable from it. The object itself is not explicitly marked.
This assures that objects on which a finalizer depends are neither
collected nor finalized.
<P>
If in the process of marking from an object the
object itself becomes marked, we have uncovered
a cycle involving the object. This usually results in a warning from the
collector. Such objects are not finalized, since it may be
unsafe to do so. See the more detailed
<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>.
<P>
Any objects remaining unmarked at the end of this process are added to
a queue of objects whose finalizers can be run. Depending on collector
configuration, finalizers are dequeued and run either implicitly during
allocation calls, or explicitly in response to a user request.
(Note that the former is unfortunately both the default and not generally safe.
If finalizers perform synchronization, it may result in deadlocks.
Nontrivial finalizers generally need to perform synchronization, and
thus require a different collector configuration.)
<P>
The collector provides a mechanism for replacing the procedure that is
used to mark through objects. This is used both to provide support for
Java-style unordered finalization, and to ignore certain kinds of cycles,
<I>e.g.</i> those arising from C++ implementations of virtual inheritance.

<H2>Generational Collection and Dirty Bits</h2>
We basically use the concurrent and generational GC algorithm described in
<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>,
by Boehm, Demers, and Shenker.
<P>
The most significant modification is that
the collector always starts running in the allocating thread.
There is no separate garbage collector thread. (If parallel GC is
enabled, helper threads may also be woken up.)
If an allocation attempt either requests a large object, or encounters
an empty small object free list, and notices that there is a collection
in progress, it immediately performs a small amount of marking work
as described above.
<P>
This change was made both because we wanted to easily accommodate
single-threaded environments, and because a separate GC thread requires
very careful control over the scheduler to prevent the mutator from
out-running the collector, and hence provoking unneeded heap growth.
<P>
In incremental mode, the heap is always expanded when we encounter
insufficient space for an allocation. Garbage collection is triggered
whenever we notice that more than
<TT>GC_heap_size</tt>/(2 * <TT>GC_free_space_divisor</tt>)
bytes of allocation have taken place.
After <TT>GC_full_freq</tt> minor collections a major collection
is started.
<P>
All collections initially run uninterrupted until a predetermined
amount of time (50 msecs by default) has expired. If this allows
the collection to complete entirely, we can avoid correcting
for data structure modifications during the collection. If it does
not complete, we return control to the mutator, and perform small
amounts of additional GC work during those later allocations that
cannot be satisfied from small object free lists. When marking completes,
the set of modified pages is retrieved, and we mark once again from
marked objects on those pages, this time with the mutator stopped.
<P>
We keep track of modified pages using one of several distinct mechanisms:
<OL>
<LI>
Through explicit mutator cooperation. Currently this requires
the use of <TT>GC_malloc_stubborn</tt>, and is rarely used.
<LI>
(<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and
catching write faults. This is
implemented for many Unix-like systems and for win32. It is not possible
in a few environments.
<LI>
(<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc.
(Currently only Sun's
Solaris supports this. Though this is considerably cleaner, performance
may actually be better with mprotect and signals.)
<LI>
(<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this
case the one in Xerox PCR.
<LI>
(<TT>DEFAULT_VDB</tt>) By treating all pages as dirty. This is the default if
none of the other techniques is known to be usable, and
<TT>GC_malloc_stubborn</tt> is not used. Practical only for testing, or if
the vast majority of objects use <TT>GC_malloc_stubborn</tt>.
</ol>
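<P>
The <TT>MPROTECT_VDB</tt> scheme, for example, can be sketched as follows
for a Unix-like system. This is only an outline: error handling and the
many platform differences are omitted, and <TT>page_index</tt> is a
hypothetical helper.
<PRE>
#include &lt;signal.h>
#include &lt;string.h>
#include &lt;stddef.h>
#include &lt;sys/mman.h>

#define PAGESIZE  4096
#define MAX_PAGES 65536
static unsigned char dirty[MAX_PAGES];  /* illustrative dirty-bit array  */

extern size_t page_index(void *page);   /* hypothetical address-to-index */

static void write_fault(int sig, siginfo_t *si, void *ctx)
{
    char *page = (char *)si->si_addr
                 - (size_t)si->si_addr % PAGESIZE;
    dirty[page_index(page)] = 1;        /* record the first write ... */
    mprotect(page, PAGESIZE, PROT_READ | PROT_WRITE);
    (void)sig; (void)ctx;               /* ... then let it proceed    */
}

void start_write_tracking(char *heap, size_t len)
{
    struct sigaction sa;
    memset(&amp;sa, 0, sizeof sa);
    sa.sa_sigaction = write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&amp;sa.sa_mask);
    sigaction(SIGSEGV, &amp;sa, 0);
    mprotect(heap, len, PROT_READ);  /* catch the next write to any page */
}
</pre>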

<H2>Black-listing</h2>

The collector implements <I>black-listing</i> of pages, as described
in
<A HREF="http://www.acm.org/pubs/citations/proceedings/pldi/155090/p197-boehm/">
Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available
<A HREF="papers/pldi93.ps.Z">here</a>.
<P>
During the mark phase, the collector tracks ``near misses'', i.e. attempts
to follow a ``pointer'' to just outside the garbage-collected heap, or
to a currently unallocated page inside the heap. Pages that have been
the targets of such near misses are likely to be the targets of
misidentified ``pointers'' in the future. To minimize the future
damage caused by such misidentifications, they will be allocated only to
small pointer-free objects.
<P>
The collector understands two different kinds of black-listing. A
page may be black-listed for interior pointer references
(<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near
miss from a location that requires interior pointer recognition,
<I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt>
is set. In this case, we also avoid allocating large blocks that include
this page.
<P>
If the near miss came from a source that did not require interior
pointer recognition, it is black-listed with
<TT>GC_add_to_black_list_normal</tt>.
A page black-listed in this way may appear inside a large object,
so long as it is not the first page of a large object.
<P>
The <TT>GC_allochblk</tt> routine respects black-listing when assigning
a block to a particular object kind and size. It occasionally
drops (i.e. allocates and forgets) blocks that are completely black-listed
in order to avoid excessively long large block free lists containing
only unusable blocks. This would otherwise become an issue
if there is low demand for small pointer-free objects.

<H2>Thread support</h2>
We support several different threading models. Unfortunately Pthreads,
the only reasonably well standardized thread model, supports too narrow
an interface for conservative garbage collection. There appears to be
no completely portable way to allow the collector
to coexist with various Pthreads
implementations. Hence we currently support only the more
common Pthreads implementations.
<P>
In particular, it is very difficult for the collector to stop all other
threads in the system and examine the register contents. This is currently
accomplished with very different mechanisms for some Pthreads
implementations. The Solaris implementation temporarily disables much
of the user-level threads implementation by stopping kernel-level threads
("lwp"s). The Linux/HPUX/OSF1 and Irix implementations send signals to
individual Pthreads and have them wait in the signal handler.
<P>
The Linux and Irix implementations use
only documented Pthreads calls, but rely on extensions to their semantics.
The Linux implementation <TT>linux_threads.c</tt> relies on only very
mild extensions to the pthreads semantics, and already supports a large number
of other Unix-like pthreads implementations. Our goal is to make this the
only pthread support in the collector.
<P>
(The Irix implementation is separate only for historical reasons and should
clearly be merged. The current Solaris implementation probably performs
better in the uniprocessor case, but does not support thread operations in the
collector. Hence it cannot support the parallel marker.)
<P>
All implementations must
intercept thread creation and a few other thread-specific calls to allow
enumeration of threads and location of thread stacks. This is currently
accomplished with <TT># define</tt>'s in <TT>gc.h</tt>
(really <TT>gc_pthread_redirects.h</tt>), or optionally
by using ld's function call wrapping mechanism under Linux.
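<P>
The redirection itself is just macro substitution; a client that includes
<TT>gc.h</tt> effectively compiles against something like:
<PRE>
#include &lt;pthread.h>

/* Provided by the collector: registers the new thread, so that its
   stack can later be located and scanned, and then calls the real
   pthread_create. */
int GC_pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                      void *(*start_routine)(void *), void *arg);

#define pthread_create GC_pthread_create
</pre>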
<P>
Recent versions of the collector support several facilities to enhance
the processor-scalability and thread performance of the collector.
These are discussed in more detail <A HREF="scale.html">here</a>.
<P>
Comments are appreciated. Please send mail to
<A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a> or
<A HREF="mailto:Hans.Boehm@hp.com"><TT>Hans.Boehm@hp.com</tt></a>.
<P>
This is a modified copy of a page written while the author was at SGI.
The original was <A HREF="http://reality.sgi.com/boehm/gcdescr.html">here</a>.
</body>
</html>