[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]

This is how to track down a bug if you know nothing about kernel hacking.
It's a brute force approach but it works pretty well.

You need:

        . A reproducible bug - it has to happen predictably (sorry)
        . All the kernel tar files from a revision that worked to the
          revision that doesn't

You will then do:

        . Rebuild a revision that you believe works, install it, and verify
          that it really works.
        . Do a binary search over the kernels to figure out which one
          introduced the bug. For example, suppose 1.3.28 didn't have the
          bug, but you know that 1.3.69 does. Pick a kernel in the middle
          and build that, like 1.3.50. Build & test; if it works, pick the
          midpoint between .50 and .69, else the midpoint between .28
          and .50.
        . You'll narrow it down to the kernel that introduced the bug. You
          can probably do better than this but it gets tricky.

        . Narrow it down to a subdirectory

          - Copy the kernel that works into "test". Let's say that 3.62
            works, but 3.63 doesn't. So you diff -r those two kernels and
            come up with a list of directories that changed. For each of
            those directories:

                Copy the non-working directory next to the working directory
                as "dir.63".
                One directory at a time, try moving the working directory to
                "dir.62" and mv dir.63 dir. For example, try

                        mv dir dir.62
                        mv dir.63 dir
                        # delete stale .o and .a files so make rebuilds them
                        find dir -name '*.[oa]' -print | xargs rm -f

                And then rebuild and retest. Assuming that all related
                changes were contained in the subdirectory, this should
                isolate the change to a directory.

                Problems: changes in header files may have occurred; I've
                found in my case that they were self-explanatory - you may
                or may not want to give up when that happens.

        . Narrow it down to a file

          - You can apply the same technique to each file in the directory,
            hoping that the changes in that file are self-contained.

        . Narrow it down to a routine

          - You can take the old file and the new file and manually create
            a merged file that has

                #ifdef VER62
                routine()       /* old (3.62) version */
                {
                        ...
                }
                #else
                routine()       /* new (3.63) version */
                {
                        ...
                }
                #endif

            And then walk through that file one routine at a time,
            surrounding that routine's #ifdef block with

                #define VER62
                /* both routines here */
                #undef VER62

            The routine inside the #define/#undef pair is compiled from the
            3.62 code; every other routine uses the 3.63 code. Then
            recompile, retest, and move the pair from routine to routine
            until you find the one that makes the difference.

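The binary search step described above can be sketched as a small POSIX shell function. This is an illustration, not part of the original procedure: it assumes the versions are passed in order, that the first one is already known to work and the last one known to fail, and `kernel_works` is a hypothetical hook that you would replace with your own "build, install, boot and run the test case" routine, returning success when the bug does not show up.

```shell
#!/bin/sh
# Sketch only: binary search for the first kernel that shows the bug.

# HYPOTHETICAL hook: replace the body with your own build/install/boot/test
# procedure. It must return 0 if kernel version "$1" does NOT show the bug.
# As written it aborts, so it can never silently produce a wrong answer.
kernel_works() {
    echo "kernel_works: not implemented, cannot test $1" >&2
    exit 2
}

# first_bad V1 V2 ... VN
# Assumes V1 is known to work and VN is known to fail; prints the first
# version that fails, testing only the versions in between.
first_bad() {
    good=1                      # index of a version known to work
    bad=$#                      # index of a version known to fail
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        eval "v=\${$mid}"       # v = the mid-th argument
        echo "testing $v" >&2
        if kernel_works "$v"; then
            good=$mid           # bug came later: search the upper half
        else
            bad=$mid            # bug is already here: search the lower half
        fi
    done
    eval "echo \"\${$bad}\""    # the first version known to fail
}
```

For the example in the text, `first_bad $(seq 28 69 | sed 's/^/1.3./')` would build 1.3.48 first (the midpoint by position in the list, rather than 1.3.50) and needs at most six test builds to get through the 40 untested versions in between.
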
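The subdirectory and file steps can be scripted with the same tools the text already uses. This is a minimal sketch, assuming the working and non-working kernels are unpacked side by side; all function names, and the 62/63 tags passed to them, are hypothetical. It relies only on `diff -r`, `cmp -s`, `mv`, `cp` and the `find ... | xargs rm -f` line from the subdirectory step. Note that `changed_dirs` only looks at directories present in the working tree, and `changed_files` only at files present in the working directory, so anything that exists only in the new kernel still has to be checked by hand.

```shell
#!/bin/sh
# Sketch only: helpers for the "narrow it down" steps.

# changed_dirs OLD_TREE NEW_TREE
# Prints each top-level directory of OLD_TREE that differs in NEW_TREE.
changed_dirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue         # glob matched nothing
        d=${d%/}; d=${d##*/}            # "old/fs/" -> "fs"
        diff -r "$1/$d" "$2/$d" >/dev/null 2>&1 || echo "$d"
    done
}

# try_new_dir DIR OLD NEW   (e.g. try_new_dir fs 62 63, run at the top of
# the "test" tree, with fs.63 already copied next to fs)
try_new_dir() {
    mv "$1" "$1.$2"                     # keep the working copy as fs.62
    mv "$1.$3" "$1"                     # swap in the non-working copy
    find "$1" -name '*.[oa]' -print | xargs rm -f   # force a rebuild
}

# undo_new_dir DIR OLD NEW: put the working directory back.
undo_new_dir() {
    mv "$1" "$1.$3"
    mv "$1.$2" "$1"
}

# changed_files OLD_DIR NEW_DIR
# Prints each regular file in OLD_DIR whose contents differ in NEW_DIR.
changed_files() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        f=${f##*/}
        cmp -s "$1/$f" "$2/$f" || echo "$f"
    done
}

# try_new_file FILE NEW_DIR OLD   (run inside the directory under test)
try_new_file() {
    mv "$1" "$1.$3"                     # keep the working version
    cp "$2/$1" "$1"                     # copy in the non-working version
    rm -f "${1%.*}.o"                   # make sure it gets recompiled
}
```

After each swap you rebuild and retest as the text describes; `undo_new_dir` restores the working directory before you try the next one.
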
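If you have GNU diff, its `-D NAME` option can produce a first draft of the merged file for the routine step. It prints both files once, with the differing lines wrapped in `#ifdef NAME`/`#ifndef NAME` ... `#endif` conditionals, arranged so that compiling with NAME defined gives the second file and without it the first. The sketch below (the function name is hypothetical) therefore takes the 3.63 file first and the 3.62 file second, so that defining VER62 selects the old code, as in the text. diff wraps each changed group of lines rather than each whole routine, so you may still need to move the conditionals to routine boundaries by hand.

```shell
#!/bin/sh
# Sketch only: build a merged file for the routine-by-routine step.

# merge_versions NEW_FILE OLD_FILE [MACRO]
# Writes the merged file to stdout. With MACRO (default VER62) defined,
# the preprocessor sees OLD_FILE; without it, NEW_FILE.
merge_versions() {
    # diff exits 1 when the files differ, which is expected here;
    # only a status of 2 means real trouble.
    diff -D "${3:-VER62}" "$1" "$2" || [ $? -eq 1 ]
}
```

Usage, with hypothetical file names: `merge_versions dir.63/foo.c dir.62/foo.c > dir/foo.c`, then add the `#define VER62`/`#undef VER62` pair around one routine at a time as described above.
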
Finally, you take all the info that you have (kernel revisions, bug
description, the extent to which you have narrowed it down) and pass
that off to whoever you believe is the maintainer of that section.
A post to linux.dev.kernel isn't such a bad idea if you've done some
work to narrow it down.

If you get it down to a routine, you'll probably get a fix in 24 hours.

My apologies to Linus and the other kernel hackers for describing this
brute force approach; it's hardly what a kernel hacker would do. However,
it does work, and it lets non-hackers help fix bugs. And it is cool
because Linux snapshots will let you do this - something that you can't
do with vendor-supplied releases.