<HTML>
<HEAD>
<TITLE>Debugging Garbage Collector Related Problems</title>
</head>
<BODY>
<H1>Debugging Garbage Collector Related Problems</h1>
This page contains some hints on
debugging issues specific to
the Boehm-Demers-Weiser conservative garbage collector.
It applies both to debugging issues in client code that manifest themselves
as collector misbehavior, and to debugging the collector itself.
<P>
If you suspect a bug in the collector itself, it is strongly recommended
that you try the latest collector release, even if it is labelled as "alpha",
before proceeding.
<H2>Bus Errors and Segmentation Violations</h2>
<P>
If the fault occurred in GC_find_limit, or with incremental collection enabled,
this is probably normal.  The collector installs handlers to take care of
these.  You will not see these unless you are using a debugger.
Your debugger <I>should</i> allow you to continue.
It's often preferable to tell the debugger to ignore SIGBUS and SIGSEGV
("<TT>handle SIGSEGV SIGBUS nostop noprint</tt>" in gdb,
"<TT>ignore SIGSEGV SIGBUS</tt>" in most versions of dbx)
and set a breakpoint in <TT>abort</tt>.
The collector will call <TT>abort</tt> if the signal had another cause,
and there was no other handler previously installed.
28
<P>
29
We recommend debugging without incremental collection if possible.
30
(This applies directly to UNIX systems.
31
Debugging with incremental collection under win32 is worse.  See README.win32.)
32
<P>
33
If the application generates an unhandled SIGSEGV or equivalent, it may
34
often be easiest to set the environment variable GC_LOOP_ON_ABORT.  On many
35
platforms, this will cause the collector to loop in a handler when the
36
SIGSEGV is encountered (or when the collector aborts for some other reason),
37
and a debugger can then be attached to the looping
38
process.  This sidesteps common operating system problems related
39
to incomplete core files for multithreaded applications, etc.
40
<H2>Other Signals</h2>
41
On most platforms, the multithreaded version of the collector needs one or
42
two other signals for internal use by the collector in stopping threads.
43
It is normally wise to tell the debugger to ignore these.  On Linux,
44
the collector currently uses SIGPWR and SIGXCPU by default.
45
<H2>Warning Messages About Needing to Allocate Blacklisted Blocks</h2>
46
The garbage collector generates warning messages of the form
47
<PRE>
48
Needed to allocate blacklisted block at 0x...
49
</pre>
50
or
51
<PRE>
52
Repeated allocation of very large block ...
53
</pre>
54
when it needs to allocate a block at a location that it knows to be
55
referenced by a false pointer.  These false pointers can be either permanent
56
(<I>e.g.</i> a static integer variable that never changes) or temporary.
57
In the latter case, the warning is largely spurious, and the block will
58
eventually be reclaimed normally.
59
In the former case, the program will still run correctly, but the block
60
will never be reclaimed.  Unless the block is intended to be
61
permanent, the warning indicates a memory leak.
62
<OL>
63
<LI>Ignore these warnings while you are using GC_DEBUG.  Some of the routines
64
mentioned below don't have debugging equivalents.  (Alternatively, write
65
the missing routines and send them to me.)
66
<LI>Replace allocator calls that request large blocks with calls to
67
<TT>GC_malloc_ignore_off_page</tt> or
68
<TT>GC_malloc_atomic_ignore_off_page</tt>.  You may want to set a
69
breakpoint in <TT>GC_default_warn_proc</tt> to help you identify such calls.
70
Make sure that a pointer to somewhere near the beginning of the resulting block
71
is maintained in a (preferably volatile) variable as long as
72
the block is needed.
73
<LI>
74
If the large blocks are allocated with realloc, we suggest instead allocating
75
them with something like the following.  Note that the realloc size increment
76
should be fairly large (e.g. a factor of 3/2) for this to exhibit reasonable
77
performance.  But we all know we should do that anyway.
78
<PRE>
79
void * big_realloc(void *p, size_t new_size)
80
{
81
    size_t old_size = GC_size(p);
82
    void * result;
83
 
84
    if (new_size <= 10000) return(GC_realloc(p, new_size));
85
    if (new_size <= old_size) return(p);
86
    result = GC_malloc_ignore_off_page(new_size);
87
    if (result == 0) return(0);
88
    memcpy(result,p,old_size);
89
    GC_free(p);
90
    return(result);
91
}
92
</pre>
93
 
94
<LI> In the unlikely case that even relatively small object
95
(&lt;20KB) allocations are triggering these warnings, then your address
96
space contains lots of "bogus pointers", i.e. values that appear to
97
be pointers but aren't.  Usually this can be solved by using GC_malloc_atomic
98
or the routines in gc_typed.h to allocate large pointer-free regions of bitmaps, etc.  Sometimes the problem can be solved with trivial changes of encoding
99
in certain values.  It is possible, to identify the source of the bogus
100
pointers by building the collector with <TT>-DPRINT_BLACK_LIST</tt>,
101
which will cause it to print the "bogus pointers", along with their location.
102
 
103
<LI> If you get only a fixed number of these warnings, you are probably only
104
introducing a bounded leak by ignoring them.  If the data structures being
105
allocated are intended to be permanent, then it is also safe to ignore them.
106
The warnings can be turned off by calling GC_set_warn_proc with a procedure
107
that ignores these warnings (e.g. by doing absolutely nothing).
108
</ol>
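<P>
As a rough illustration of the last item, a warning procedure that swallows every warning might look like the sketch below. It assumes the <TT>GC_warn_proc</tt> signature from <TT>gc.h</tt> (a message plus one word-sized argument). The names <TT>quiet_warn_proc</tt>, <TT>install_quiet_warn_proc</tt>, <TT>suppressed_warnings</tt> and the <TT>USE_BOEHM_GC</tt> switch are invented for this example. Without that switch, a small stand-in replaces the collector so that the fragment builds on its own.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_BOEHM_GC
#  include "gc.h"
#else
/* Stand-ins (NOT the real collector) so the sketch builds without gc.h. */
typedef unsigned long GC_word;
typedef void (*GC_warn_proc)(char *msg, GC_word arg);
static GC_warn_proc installed_warn_proc;    /* hypothetical bookkeeping */
static void GC_set_warn_proc(GC_warn_proc p) { installed_warn_proc = p; }
#endif

/* Number of warnings dropped so far.  A count that keeps growing */
/* suggests the leak is not bounded after all.                    */
static unsigned long suppressed_warnings = 0;

/* Ignores the warning; the only side effect is the counter. */
static void quiet_warn_proc(char *msg, GC_word arg)
{
    (void)msg;
    (void)arg;
    suppressed_warnings++;
}

/* Call once, early in the program, to silence collector warnings. */
void install_quiet_warn_proc(void)
{
    GC_set_warn_proc(quiet_warn_proc);
}
```

Keeping a counter rather than doing strictly nothing is a design choice of this sketch: it leaves a cheap way to check later whether the number of warnings really was fixed.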
109
 
110
<H2>The Collector References a Bad Address in <TT>GC_malloc</tt></h2>
111
 
112
This typically happens while the collector is trying to remove an entry from
113
its free list, and the free list pointer is bad because the free list link
114
in the last allocated object was bad.
115
<P>
116
With &gt; 99% probability, you wrote past the end of an allocated object.
117
Try setting <TT>GC_DEBUG</tt> before including <TT>gc.h</tt> and
118
allocating with <TT>GC_MALLOC</tt>.  This will try to detect such
119
overwrite errors.
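<P>
The setup just described might look like the following sketch. The function <TT>gc_copy_string</tt> and the <TT>USE_BOEHM_GC</tt> switch are invented for this example. Without the switch, <TT>GC_MALLOC</tt> is mapped onto <TT>calloc</tt> purely so that the fragment compiles on its own, and no overwrite checking takes place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_BOEHM_GC
#  define GC_DEBUG              /* must come before gc.h is included */
#  include "gc.h"
#else
/* Stand-in so the sketch builds without the collector; no checking here. */
#  define GC_MALLOC(n) calloc(1, (n))
#endif

/* Copies a string into a collected object.  Forgetting the "+ 1" for the
 * terminating NUL is a classic way to write one byte past the end of an
 * object, which is the kind of error debugging allocation tries to catch. */
char *gc_copy_string(const char *s)
{
    size_t len = strlen(s);
    char *copy = (char *)GC_MALLOC(len + 1);

    if (copy == 0) return 0;
    memcpy(copy, s, len + 1);   /* copies the terminator as well */
    return copy;
}
```

Using the upper-case <TT>GC_MALLOC</tt> macro everywhere means the same source can be built with or without <TT>GC_DEBUG</tt>.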
120
 
121
<H2>Unexpectedly Large Heap</h2>
122
 
123
Unexpected heap growth can be due to one of the following:
124
<OL>
125
<LI> Data structures that are being unintentionally retained.  This
126
is commonly caused by data structures that are no longer being used,
127
but were not cleared, or by caches growing without bounds.
128
<LI> Pointer misidentification.  The garbage collector is interpreting
129
integers or other data as pointers and retaining the "referenced"
130
objects.  A common symptom is that GC_dump() shows much of the heap
131
as black-listed.
132
<LI> Heap fragmentation.  This should never result in unbounded growth,
133
but it may account for larger heaps.  This is most commonly caused
134
by allocation of large objects.  On some platforms it can be reduced
135
by building with -DUSE_MUNMAP, which will cause the collector to unmap
136
memory corresponding to pages that have not been recently used.
137
<LI> Per object overhead.  This is usually a relatively minor effect, but
138
it may be worth considering.  If the collector recognizes interior
139
pointers, object sizes are increased, so that one-past-the-end pointers
140
are correctly recognized.  The collector can be configured not to do this
141
(<TT>-DDONT_ADD_BYTE_AT_END</tt>).
142
<P>
143
The collector rounds up object sizes so the result fits well into the
144
chunk size (<TT>HBLKSIZE</tt>, normally 4K on 32 bit machines, 8K
145
on 64 bit machines) used by the collector.   Thus it may be worth avoiding
146
objects of size 2K + 1 (or 2K if a byte is being added at the end.)
147
</ol>
The last two cases can often be identified by looking at the output
of a call to <TT>GC_dump()</tt>.  Among other things, it will print the
list of free heap blocks, and a very brief description of all chunks in
the heap, the object sizes they correspond to, and how many live objects
were found in the chunk at the last collection.
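<P>
The remark about objects of size 2K + 1 in the list above can be made concrete with a little arithmetic. The sketch below is a deliberately simplified model, not the collector's actual sizing logic: it assumes a 4K chunk and ignores headers, alignment and the extra byte mentioned above. The names <TT>EXAMPLE_HBLKSIZE</tt>, <TT>objects_per_chunk</tt> and <TT>wasted_bytes_per_chunk</tt> are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed chunk size for the illustration (the 32-bit figure quoted above). */
#define EXAMPLE_HBLKSIZE ((size_t)4096)

/* How many objects of obj_size bytes fit into one chunk.
 * Assumes 0 < obj_size <= EXAMPLE_HBLKSIZE. */
size_t objects_per_chunk(size_t obj_size)
{
    return EXAMPLE_HBLKSIZE / obj_size;
}

/* Bytes of the chunk that cannot hold another object of this size. */
size_t wasted_bytes_per_chunk(size_t obj_size)
{
    return EXAMPLE_HBLKSIZE - objects_per_chunk(obj_size) * obj_size;
}
```

Under this model a 2048-byte object shares a chunk with one other object and wastes nothing, while a 2049-byte object has the chunk to itself and leaves 2047 bytes unused.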
<P>
Growing data structures can usually be identified by
<OL>
<LI> Building the collector with <TT>-DKEEP_BACK_PTRS</tt>,
<LI> Preferably using debugging allocation (defining <TT>GC_DEBUG</tt>
before including <TT>gc.h</tt> and allocating with <TT>GC_MALLOC</tt>),
so that objects will be identified by their allocation site,
<LI> Running the application long enough so
that most of the heap is composed of "leaked" memory, and
<LI> Then calling <TT>GC_generate_random_backtrace()</tt> from backptr.h
a few times to determine why some randomly sampled objects in the heap are
being retained.
</ol>
<P>
The same technique can often be used to identify problems with false
pointers, by noting whether the reference chains printed by
<TT>GC_generate_random_backtrace()</tt> involve any misidentified pointers.
An alternate technique is to build the collector with
<TT>-DPRINT_BLACK_LIST</tt>, which will cause it to report values that
almost, but not quite, look like heap pointers.  It is very likely that
actual false pointers will come from similar sources.
<P>
In the unlikely case that false pointers are an issue, it can usually
be resolved using one or more of the following techniques:
<OL>
<LI> Use <TT>GC_malloc_atomic</tt> for objects containing no pointers.
This is especially important for large arrays containing compressed data,
pseudo-random numbers, and the like.  It is also likely to improve GC
performance, perhaps drastically so if the application is paging.
<LI> If you allocate large objects containing only
one or two pointers at the beginning, either try the typed allocation
primitives in <TT>gc_typed.h</tt>, or separate out the pointer-free component.
<LI> Consider using <TT>GC_malloc_ignore_off_page()</tt>
to allocate large objects.  (See <TT>gc.h</tt> and above for details.
Large means &gt; 100K in most environments.)
<LI> If your heap size is larger than 100MB or so, build the collector with
<TT>-DLARGE_CONFIG</tt>.  This allows the collector to keep more precise black-list
information.
<LI> If you are using heaps close to, or larger than, a gigabyte on a 32-bit
machine, you may want to consider moving to a platform with 64-bit pointers.
This is very likely to resolve any false pointer issues.
</ol>
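<P>
The first technique in the list above might be applied as in the following sketch. The helper <TT>new_byte_buffer</tt> and the <TT>USE_BOEHM_GC</tt> switch are invented for this example. Without the switch, <TT>GC_malloc_atomic</tt> is mapped onto <TT>malloc</tt> only so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_BOEHM_GC
#  include "gc.h"
#else
/* Stand-in so the sketch builds without the collector. */
#  define GC_malloc_atomic(n) malloc(n)
#endif

/* Allocates n bytes that will never hold pointers (compressed data,
 * pseudo-random numbers and the like), so the collector need not scan them. */
unsigned char *new_byte_buffer(size_t n)
{
    unsigned char *buf = (unsigned char *)GC_malloc_atomic(n);

    if (buf == 0) return 0;
    memset(buf, 0, n);   /* atomic objects are not cleared on allocation */
    return buf;
}
```

Storing a pointer to a collected object in such a buffer would be an error, since the collector never looks inside it.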
195
<H2>Prematurely Reclaimed Objects</h2>
196
The usual symptom of this is a segmentation fault, or an obviously overwritten
197
value in a heap object.  This should, of course, be impossible.  In practice,
198
it may happen for reasons like the following:
199
<OL>
200
<LI> The collector did not intercept the creation of threads correctly in
201
a multithreaded application, <I>e.g.</i> because the client called
202
<TT>pthread_create</tt> without including <TT>gc.h</tt>, which redefines it.
203
<LI> The last pointer to an object in the garbage collected heap was stored
204
somewhere were the collector couldn't see it, <I>e.g.</i> in an
205
object allocated with system <TT>malloc</tt>, in certain types of
206
<TT>mmap</tt>ed files,
207
or in some data structure visible only to the OS.  (On some platforms,
208
thread-local storage is one of these.)
209
<LI> The last pointer to an object was somehow disguised, <I>e.g.</i> by
210
XORing it with another pointer.
211
<LI> Incorrect use of <TT>GC_malloc_atomic</tt> or typed allocation.
212
<LI> An incorrect <TT>GC_free</tt> call.
213
<LI> The client program overwrote an internal garbage collector data structure.
214
<LI> A garbage collector bug.
215
<LI> (Empirically less likely than any of the above.) A compiler optimization
216
that disguised the last pointer.
217
</ol>
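<P>
To see why a disguised pointer in the list above defeats the collector, consider the toy model below. It is only a sketch: <TT>scan_finds</tt> stands in for conservative scanning by comparing each word of a region with an object's address, which is far simpler than what the real collector does, and both function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of conservative scanning: reports whether any word in
 * words[0..n-1] holds exactly the address of target. */
int scan_finds(const uintptr_t *words, size_t n, const void *target)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (words[i] == (uintptr_t)target) return 1;
    }
    return 0;
}

/* Hides a pointer value by XORing it with a key; applying the same key
 * again recovers the original value. */
uintptr_t xor_disguise(uintptr_t value, uintptr_t key)
{
    return value ^ key;
}
```

With any nonzero key the stored word no longer equals the object's address, so a scan of this kind does not find it, even though the program can still recover the pointer and use the object later.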
218
The following relatively simple techniques should be tried first to narrow
219
down the problem:
220
<OL>
221
<LI> If you are using the incremental collector try turning it off for
222
debugging.
223
<LI> If you are using shared libraries, try linking statically.  If that works,
224
ensure that DYNAMIC_LOADING is defined on your platform.
225
<LI> Try to reproduce the problem with fully debuggable unoptimized code.
226
This will eliminate the last possibility, as well as making debugging easier.
227
<LI> Try replacing any suspect typed allocation and <TT>GC_malloc_atomic</tt>
228
calls with calls to <TT>GC_malloc</tt>.
229
<LI> Try removing any GC_free calls (<I>e.g.</i> with a suitable
230
<TT>#define</tt>).
231
<LI> Rebuild the collector with <TT>-DGC_ASSERTIONS</tt>.
232
<LI> If the following works on your platform (i.e. if gctest still works
233
if you do this), try building the collector with
234
<TT>-DREDIRECT_MALLOC=GC_malloc_uncollectable</tt>.  This will cause
235
the collector to scan memory allocated with malloc.
236
</ol>
237
If all else fails, you will have to attack this with a debugger.
238
Suggested steps:
239
<OL>
240
<LI> Call <TT>GC_dump()</tt> from the debugger around the time of the failure.  Verify
241
that the collectors idea of the root set (i.e. static data regions which
242
it should scan for pointers) looks plausible.  If not, i.e. if it doesn't
243
include some static variables, report this as
244
a collector bug.  Be sure to describe your platform precisely, since this sort
245
of problem is nearly always very platform dependent.
246
<LI> Especially if the failure is not deterministic, try to isolate it to
247
a relatively small test case.
248
<LI> Set a break point in <TT>GC_finish_collection</tt>.  This is a good
249
point to examine what has been marked, i.e. found reachable, by the
250
collector.
251
<LI> If the failure is deterministic, run the process
252
up to the last collection before the failure.
253
Note that the variable <TT>GC_gc_no</tt> counts collections and can be used
254
to set a conditional breakpoint in the right one.  It is incremented just
255
before the call to GC_finish_collection.
256
If object <TT>p</tt> was prematurely recycled, it may be helpful to
257
look at <TT>*GC_find_header(p)</tt> at the failure point.
258
The <TT>hb_last_reclaimed</tt> field will identify the collection number
259
during which its block was last swept.
260
<LI> Verify that the offending object still has its correct contents at
261
this point.
262
Then call <TT>GC_is_marked(p)</tt> from the debugger to verify that the
263
object has not been marked, and is about to be reclaimed.  Note that
264
<TT>GC_is_marked(p)</tt> expects the real address of an object (the
265
address of the debug header if there is one), and thus it may
266
be more appropriate to call <TT>GC_is_marked(GC_base(p))</tt>
267
instead.
268
<LI> Determine a path from a root, i.e. static variable, stack, or
269
register variable,
270
to the reclaimed object.  Call <TT>GC_is_marked(q)</tt> for each object
271
<TT>q</tt> along the path, trying to locate the first unmarked object, say
272
<TT>r</tt>.
273
<LI> If <TT>r</tt> is pointed to by a static root,
274
verify that the location
275
pointing to it is part of the root set printed by <TT>GC_dump()</tt>.  If it
276
is on the stack in the main (or only) thread, verify that
277
<TT>GC_stackbottom</tt> is set correctly to the base of the stack.  If it is
278
in another thread stack, check the collector's thread data structure
279
(<TT>GC_thread[]</tt> on several platforms) to make sure that stack bounds
280
are set correctly.
281
<LI> If <TT>r</tt> is pointed to by heap object <TT>s</tt>, check that the
282
collector's layout description for <TT>s</tt> is such that the pointer field
283
will be scanned.  Call <TT>*GC_find_header(s)</tt> to look at the descriptor
284
for the heap chunk.  The <TT>hb_descr</tt> field specifies the layout
285
of objects in that chunk.  See gc_mark.h for the meaning of the descriptor.
286
(If it's low order 2 bits are zero, then it is just the length of the
287
object prefix to be scanned.  This form is always used for objects allocated
288
with <TT>GC_malloc</tt> or <TT>GC_malloc_atomic</tt>.)
289
<LI> If the failure is not deterministic, you may still be able to apply some
290
of the above technique at the point of failure.  But remember that objects
291
allocated since the last collection will not have been marked, even if the
292
collector is functioning properly.  On some platforms, the collector
293
can be configured to save call chains in objects for debugging.
294
Enabling this feature will also cause it to save the call stack at the
295
point of the last GC in GC_arrays._last_stack.
296
<LI> When looking at GC internal data structures remember that a number
297
of <TT>GC_</tt><I>xxx</i> variables are really macro defined to
298
<TT>GC_arrays._</tt><I>xxx</i>, so that
299
the collector can avoid scanning them.
300
</ol>
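<P>
The descriptor check mentioned in the steps above (low-order 2 bits zero means a plain length) can be written out as the small sketch below, which may help when reading an <TT>hb_descr</tt> value printed by the debugger. The names <TT>EXAMPLE_TAG_MASK</tt>, <TT>is_length_descriptor</tt> and <TT>scanned_prefix_bytes</tt> are invented for this example. The authoritative encoding is the one in gc_mark.h, and the use of <TT>size_t</tt> for the descriptor word is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: the tag occupies the low-order 2 bits, and
 * a tag of zero means "plain length", as described in the text above. */
#define EXAMPLE_TAG_MASK ((size_t)3)

/* Nonzero if the descriptor is simply a length. */
int is_length_descriptor(size_t descr)
{
    return (descr & EXAMPLE_TAG_MASK) == 0;
}

/* Length in bytes of the object prefix to be scanned.
 * Meaningful only when is_length_descriptor(descr) is nonzero. */
size_t scanned_prefix_bytes(size_t descr)
{
    return descr;
}
```

A descriptor with nonzero tag bits uses one of the other encodings and has to be decoded as described in gc_mark.h.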
</body>
</html>
