From: Linus Torvalds <torvalds@cs.helsinki.fi>

How to track down an Oops.. [originally a mail to linux-kernel]

The main trick is having 5 years of experience with those pesky oops
messages ;-)

Actually, there are things you can do that make this easier. I have two
separate approaches:

	gdb /usr/src/linux/vmlinux
	gdb> disassemble <offending_function>

That's the easy way to find the problem, at least if the bug report is
well made (like this one was - run through ksymoops to get the
information about which function the oops happened in, and the offset
within that function).

Oh, it helps if the report comes from a kernel that was compiled with
the same compiler and a similar setup.

The other thing to do is disassemble the "Code:" part of the bug
report: ksymoops will do this too with the correct tools (and a new
version of ksymoops), but if you don't have the tools you can just
write a silly program:

	char str[] = "\xXX\xXX\xXX...";
	main(){}

and compile it with gcc -g and then do "disassemble str" (where the
"XX" stuff are the values reported by the Oops - you can just
cut-and-paste and replace the spaces with "\x" - that's what I do, as
I'm too lazy to write a program to automate this all).
[A filled-in example appears right after this mail.]

Finally, if you want to see where the code comes from, you can do

	cd /usr/src/linux
	make fs/buffer.s	# or whatever file the bug happened in

and then you get a better idea of what happens than with the gdb
disassembly.

Now, the trick is just to combine all the data you have: the C sources
(and general knowledge of what they _should_ do), the assembly
listing, and the code disassembly (and additionally the register dump
you also get from the "oops" message - that can be useful to see
_what_ the corrupted pointers were, and when you have the assembler
listing you can also match the other registers to whatever C
expressions they were used for).

Essentially, you just look at what doesn't match (in this case it was
the "Code" disassembly that didn't match what the compiler generated).
Then you need to find out _why_ they don't match. Often it's simple -
you see that the code uses a NULL pointer, then you look at the code
and wonder how the NULL pointer got there, and if it's a valid thing
to happen you just add a check against it..

Now, if somebody gets the idea that this is time-consuming and
requires some small amount of concentration, you're right. Which is
why I will mostly just ignore any panic reports that don't have the
symbol table info etc. looked up: it simply gets too hard to look it
up (I have some programs to search for specific patterns in the
kernel code segment, and sometimes I have been able to look up those
kinds of panics too, but that really requires pretty good knowledge
of the kernel just to be able to pick out the right sequences etc..)

_Sometimes_ it happens that I just see the disassembled code sequence
from the panic, and I know immediately where it's coming from. That's
when I get worried that I've been doing this for too long ;-)

		Linus
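For illustration, here is the silly program from the mail above filled
in with the "Code:" bytes from the sample oops at the end of this
file. This is only a sketch (the file name is arbitrary, and the exact
gdb output will vary); substitute the bytes from your own report:

	/* code.c - decode the "Code:" bytes from an oops report.
	 * The string holds the bytes from the example oops at the end
	 * of this file, rewritten as "\x" escapes. */
	char str[] = "\xc7\x00\x05\x00\x00\x00\xeb\x08"
		     "\x90\x90\x90\x90\x90\x90\x90\x90"
		     "\x89\xec\x5d\xc3";
	int main(void) { return 0; }

Build it and disassemble the array:

	gcc -g -o code code.c
	gdb code
	(gdb) disassemble str

On x86 this decodes to roughly

	movl   $0x5,(%eax)	<- faulting instruction: a write through
	jmp    <str+16>		   %eax, matching the "Oops: 0002"
	nop    (eight times)	   (write fault) error code
	movl   %ebp,%esp	<- function epilogue
	popl   %ebp
	ret

so even without the sources it is clear that the crash is a store
through a bad pointer held in %eax.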
---------------------------------------------------------------------------
Notes on Oops tracing with klogd:

In order to help Linus and the other kernel developers there has been
substantial support incorporated into klogd for processing protection
faults. In order to have full support for address resolution at least
version 1.3-pl3 of the sysklogd package should be used.

When a protection fault occurs the klogd daemon automatically
translates important addresses in the kernel log messages to their
symbolic equivalents. This translated kernel message is then
forwarded through whatever reporting mechanism klogd is using. The
protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.

Two types of address resolution are performed by klogd. The first is
static translation and the second is dynamic translation. Static
translation uses the System.map file in much the same manner that
ksymoops does. In order to do static translation the klogd daemon
must be able to find a system map file at daemon initialization time.
See the klogd man page for information on how klogd searches for map
files.

Dynamic address translation is important when kernel loadable modules
are being used. Since memory for kernel modules is allocated from the
kernel's dynamic memory pools there are no fixed locations for either
the start of the module or for functions and symbols in the module.

The kernel supports system calls which allow a program to determine
which modules are loaded and their location in memory. Using these
system calls the klogd daemon builds a symbol table which can be used
to debug a protection fault which occurs in a loadable kernel module.
(A minimal sketch of this interface appears below, before the
example.)

At the very minimum klogd will provide the name of the module which
generated the protection fault. There may be additional symbolic
information available if the developer of the loadable module chose to
export symbol information from the module.

Since the kernel module environment can be dynamic there must be a
mechanism for notifying the klogd daemon when a change in module
environment occurs. There are command line options available which
allow klogd to signal the currently executing daemon that symbol
information should be refreshed. See the klogd manual page for more
information.

A patch is included with the sysklogd distribution which modifies the
modules-2.0.0 package to automatically signal klogd whenever a module
is loaded or unloaded. Applying this patch provides essentially
seamless support for debugging protection faults which occur with
kernel loadable modules.
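On the 2.0-era kernels this document covers, the system call in
question is get_kernel_syms(2). The following is only a minimal
sketch of that interface for illustration - it is not klogd's actual
code, and the call was dropped from later kernels:

	/* ksyms.c - list kernel and module symbols the way a 2.0-era
	 * klogd obtains them.  Assumes <linux/module.h> provides
	 * struct kernel_sym { unsigned long value; char name[60]; };
	 * old module utilities declared the syscall themselves. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <linux/module.h>

	extern int get_kernel_syms(struct kernel_sym *table);

	int main(void)
	{
		struct kernel_sym *tab;
		int i, n;

		n = get_kernel_syms(NULL);	/* NULL: return symbol count */
		if (n < 0) {
			perror("get_kernel_syms");
			return 1;
		}

		tab = malloc(n * sizeof(*tab));
		if (tab == NULL)
			return 1;
		get_kernel_syms(tab);		/* fill in the table */

		/* A name of the form "#module" marks a loaded module;
		 * the entries that follow belong to that module until
		 * the next marker.  klogd uses these markers to map an
		 * oops address inside a module back to a symbol name. */
		for (i = 0; i < n; i++)
			printf("%08lx %s\n", tab[i].value, tab[i].name);

		free(tab);
		return 0;
	}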
The following is an example of a protection fault in a loadable module
processed by klogd:

---------------------------------------------------------------------------
Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
Aug 29 09:51:01 blizard kernel: *pde = 00000000
Aug 29 09:51:01 blizard kernel: Oops: 0002
Aug 29 09:51:01 blizard kernel: CPU:    0
Aug 29 09:51:01 blizard kernel: EIP:    0010:[oops:_oops+16/3868]
Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
Aug 29 09:51:01 blizard kernel: eax: 315e97cc   ebx: 003a6f80   ecx: 001be77b   edx: 00237c0c
Aug 29 09:51:01 blizard kernel: esi: 00000000   edi: bffffdb3   ebp: 00589f90   esp: 00589f8c
Aug 29 09:51:01 blizard kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
Aug 29 09:51:01 blizard kernel:        00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
Aug 29 09:51:01 blizard kernel:        bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
---------------------------------------------------------------------------

Dr. G.W. Wettstein		Oncology Research Div. Computing Facility
Roger Maris Cancer Center	INTERNET: greg@wind.rmcc.com
820 4th St. N.
Fargo, ND 58122
Phone: 701-234-7556
