Table of contents
=================

Last updated: 20 December 2005

Contents
========

- Introduction
- Devices not appearing
- Finding patch that caused a bug
-- Finding using git-bisect
-- Finding it the old way
- Fixing the bug

Introduction
============

Always try the latest kernel from kernel.org and build from source. If you are
not confident in doing that please report the bug to your distribution vendor
instead of to a kernel developer.

Finding bugs is not always easy. Have a go though. If you can't find it don't
give up. Report as much as you have found to the relevant maintainer. See
MAINTAINERS for who that is for the subsystem you have worked on.

Before you submit a bug report read REPORTING-BUGS.

Devices not appearing
=====================

Often this is caused by udev. Check that first before blaming it on the
kernel.

Finding patch that caused a bug
===============================

Finding using git-bisect
------------------------

Using the provided tools with git makes finding bugs easy provided the bug is
reproducible.

Steps to do it:
- start using git for the kernel source
- read the man page for git-bisect
- have fun

Finding it the old way
----------------------

[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]

This is how to track down a bug if you know nothing about kernel hacking.
It's a brute force approach but it works pretty well.

You need:

        . A reproducible bug - it has to happen predictably (sorry)
        . All the kernel tar files from a revision that worked to the
          revision that doesn't

You will then do:

        . Rebuild a revision that you believe works, install, and verify that.
        . Do a binary search over the kernels to figure out which one
          introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but
          you know that 1.3.69 does. Pick a kernel in the middle and build
          that, like 1.3.50. Build & test; if it works, pick the mid point
          between .50 and .69, else the mid point between .28 and .50.
        . You'll narrow it down to the kernel that introduced the bug.
          You can probably do better than this but it gets tricky.

        . Narrow it down to a subdirectory

          - Copy the kernel that works into "test". Let's say that 3.62
            works, but 3.63 doesn't. So you diff -r those two kernels and
            come up with a list of directories that changed. For each of
            those directories:

                Copy the non-working directory next to the working directory
                as "dir.63".
                One directory at a time, try moving the working directory to
                "dir.62" and the non-working directory into its place:

                        mv dir dir.62
                        mv dir.63 dir
                        find dir -name '*.[oa]' -print | xargs rm -f

                And then rebuild and retest. Assuming that all related
                changes were contained in the sub directory, this should
                isolate the change to a directory.

                Problems: changes in header files may have occurred; I've
                found in my case that they were self explanatory - you may
                or may not want to give up when that happens.

        . Narrow it down to a file

          - You can apply the same technique to each file in the directory,
            hoping that the changes in that file are self contained.

        . Narrow it down to a routine

          - You can take the old file and the new file and manually create
            a merged file that has

                #ifdef VER62
                routine()
                {
                        ...
                }
                #else
                routine()
                {
                        ...
                }
                #endif

            And then walk through that file, one routine at a time and
            prefix it with

                #define VER62
                /* both routines here */
                #undef VER62

            Then recompile, retest, move the ifdefs until you find the one
            that makes the difference.

Finally, you take all the info that you have, kernel revisions, bug
description, the extent to which you have narrowed it down, and pass
that off to whomever you believe is the maintainer of that section.
A post to linux.dev.kernel isn't such a bad idea if you've done some
work to narrow it down.

If you get it down to a routine, you'll probably get a fix in 24 hours.

My apologies to Linus and the other kernel hackers for describing this
brute force approach, it's hardly what a kernel hacker would do. However,
it does work and it lets non-hackers help fix bugs.
And it is cool because Linux snapshots will let you do this - something
that you can't do with vendor supplied releases.

Fixing the bug
==============

Nobody is going to tell you how to fix bugs. Seriously. You need to work it
out. But below are some hints on how to use the tools.

To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example:

    objdump -r -S -l --disassemble net/dccp/ipv4.o

NB.: you need to be at the top level of the kernel tree for this to pick up
your C files.

If you don't have access to the code you can also debug on some crash dumps
e.g. crash dump output as shown by Dave Miller.

>    EIP is at ip_queue_xmit+0x14/0x4c0
>     ...
>    Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
>    00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
>    <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
>
>    Put the bytes into a "foo.s" file like this:
>
>           .text
>           .globl foo
>    foo:
>           .byte  .... /* bytes from Code: part of OOPS dump */
>
>    Compile it with "gcc -c -o foo.o foo.s" then look at the output of
>    "objdump --disassemble foo.o".
>
>    Output:
>
>    ip_queue_xmit:
>        push       %ebp
>        push       %edi
>        push       %esi
>        push       %ebx
>        sub        $0xbc, %esp
>        mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
>        mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
>        mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt

In addition, you can use GDB to figure out the exact file and line
number of the OOPS from the vmlinux file.
If you have CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value
from the OOPS:

 EIP:    0060:[<c021e50e>]    Not tainted VLI

And use GDB to translate that to human-readable form:

  gdb vmlinux
  (gdb) l *0xc021e50e

If you don't have CONFIG_DEBUG_INFO enabled, you use the function
offset from the OOPS:

 EIP is at vt_ioctl+0xda8/0x1482

And recompile the kernel with CONFIG_DEBUG_INFO enabled:

  make vmlinux
  gdb vmlinux
  (gdb) p vt_ioctl
  (gdb) l *(0x<address of vt_ioctl> + 0xda8)

Another very useful option of the Kernel Hacking section in menuconfig is
Debug memory allocations. This will help you see whether data has been
initialised before use, among other things. To see the values that get
assigned with this look at mm/slab.c and search for POISON_INUSE. When using
this an Oops will often show the poisoned data instead of zero which is the
default.

Once you have worked out a fix please submit it upstream. After all open
source is about sharing what you do and don't you want to be recognised for
your genius?

Please do read Documentation/SubmittingPatches though to help your code get
accepted.
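
A footnote to the crash dump example above: typing the Code: bytes into
"foo.s" by hand is error prone. The following is only a sketch of one way to
generate the file, written in Python, which this document does not otherwise
use. The function name oops_code_to_asm and its label argument are made up
for illustration, and it assumes the Code: bytes have already been joined
onto one line with any mail quoting removed.

```python
import re


def oops_code_to_asm(code_line, label="foo"):
    """Return the text of a foo.s file built from an Oops 'Code:' line.

    Hypothetical helper, not part of any kernel tool. The <..> marker
    around the faulting byte is stripped so the byte itself is kept;
    its index is reported in a trailing comment instead.
    """
    # Drop everything up to and including "Code:", if present.
    text = code_line.split("Code:", 1)[-1]

    fault_index = None
    byte_values = []
    for token in text.split():
        match = re.fullmatch(r"<?([0-9a-fA-F]{2})>?", token)
        if match is None:
            raise ValueError("unexpected token in Code: line: %r" % token)
        if token.startswith("<"):
            fault_index = len(byte_values)
        byte_values.append("0x" + match.group(1).lower())

    lines = ["\t.text", "\t.globl %s" % label, "%s:" % label]
    # Emit eight bytes per .byte directive to keep the lines short.
    for i in range(0, len(byte_values), 8):
        lines.append("\t.byte " + ", ".join(byte_values[i:i + 8]))
    if fault_index is not None:
        lines.append(
            "\t/* faulting instruction starts at byte %d */" % fault_index
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The last bytes of the dump quoted in this document.
    print(oops_code_to_asm("Code: 8b 5d 08 <8b> 83 3c 01 00 00"), end="")
```

Write the returned text to foo.s and carry on with the gcc and objdump
commands quoted above. Note that the dump starts before the faulting
function: in the example above the marked byte is at index 43 and the 55
(push %ebp) that begins ip_queue_xmit is at index 23, which is 0x14 bytes
earlier and agrees with ip_queue_xmit+0x14. Bytes ahead of that point may not
disassemble sensibly.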
