OpenCores
Minsoc hangs
by Marco.Gunia on Jan 14, 2016
Marco.Gunia
Posts: 1
Joined: Jul 17, 2014
Last seen: Jun 22, 2022
Hi,

I have managed to get Minsoc running on a Spartan6 FPGA board stacked on top of a custom PCB. Among other things, the custom board carries a ZigBee chip, which is to be controlled. The goal is to establish communication between two of these boards using the chip. Sending works well, but receiving has a problem. I have written two software versions for the receiver. The first version polls the ZigBee chip instead of using interrupts, and it works well. The second version uses an interrupt to be informed about a new message. It works fine for hours (a new message is sent and received every second), but then, at some random point (after thousands of successful receptions), it stops working. Attaching GDB shows that the system really hangs: it does not perform any operation anymore. It is stuck at one location, but not at the same one when I repeat the scenario. After debugging for days now, checking the stack, variables, memory, etc., I cannot figure out why the system stops. It must have something to do with the interrupts, because the interrupt-free version works well.

Has any of you had a similar problem before? Do you have any idea how I could trace the error back?

Best regards,
Marco
RE: Minsoc hangs
by rfajardo on Jan 30, 2016
rfajardo
Posts: 306
Joined: Jun 12, 2008
Last seen: Jan 6, 2020
Hello Marco,
these are the kind of problems that are hard to trace. You need a bit of detective work to proceed. First, make sure that the place & route result of your FPGA design is timing-clean, and also specify timing constraints for the interrupt input coming from the ZigBee chip. If you can be sure about that, then what works once on the hardware should, in general, work again, unless your software somehow changes internal CPU state while doing so, which I doubt.

I would then concentrate on analysing concurrency in your program. In the polling program, the input pin is always checked at the same step in the program flow, and all variables are in a known state. When an interrupt service routine runs, the regular program flow may have been interrupted at any point, so there could be uninitialized or half-updated variables, function pointers and so on. You have to consider whether your program could be jumping to somewhere in memory where there is no code, or the wrong code, or whether you overwrite code by mistake, for example if you use memcpy with a wrong length or wrong pointers.
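A minimal sketch in C of the usual pattern for state shared between an ISR and the main loop, assuming the ZigBee ISR only records that a frame arrived and the main loop does the actual chip access. zigbee_isr(), take_pending_frame(), irq_disable() and irq_restore() are hypothetical names; the last two stand in for whatever your OR1200 support code uses to mask interrupts (for example clearing SR[IEE]) and are stubbed here so the sketch compiles on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholders: on the target these would mask and
 * restore external interrupts (e.g. via SR[IEE] on OR1200).
 * Stubbed so the sketch compiles and runs on a host. */
static uint32_t irq_disable(void) { return 0; }
static void irq_restore(uint32_t saved) { (void)saved; }

/* Shared between the ISR and the main loop. volatile forces the
 * compiler to re-read it from memory instead of caching it. */
static volatile uint32_t frames_pending = 0;

/* Called from the (hypothetical) ZigBee interrupt handler.
 * Kept short on purpose: it only records the event.
 * Assumes interrupts do not nest, so this increment is not interrupted. */
void zigbee_isr(void)
{
    frames_pending++;
}

/* Called from the main loop. Returns true if one frame should be
 * processed now. The read-modify-write of frames_pending is done with
 * interrupts masked, so an interrupt arriving between the load and the
 * store cannot be lost or corrupt the counter. */
bool take_pending_frame(void)
{
    bool have_frame = false;
    uint32_t saved = irq_disable();
    if (frames_pending > 0) {
        frames_pending--;
        have_frame = true;
    }
    irq_restore(saved);
    return have_frame;
}
```

A counter is used instead of a boolean flag so that a second interrupt arriving before the main loop has handled the first one is not silently dropped.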

You could also have gotten unaligned store or load exceptions, which would help you analyse your problem. Since the program hangs at a different place on each run, it seems that you are executing either wrong code or data. It would be important to check whether you can step over that code when you attach the debugger, to rule out that option.
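One cheap way to test the "wrong code or data" theory is to check whether the stack ever runs into neighbouring memory, since an interrupt arriving deep in a call chain typically pushes its saved context on top of what is already there. A minimal stack-painting sketch in C, assuming you can pass it the start address and size of the stack region (for example from linker script symbols; the names here are hypothetical) and that the stack grows downwards, as it does on OpenRISC.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu   /* arbitrary, easy-to-spot pattern */

/* Fill the unused stack region with a known pattern at startup.
 * The region must not be in use while it is being painted. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_PAINT;
}

/* Count how many words at the low end of the region still hold the
 * pattern. Because the stack grows downwards, these words have never
 * been written. A result of 0 means the paint was overwritten all the
 * way down, i.e. the stack has very likely overflowed into whatever
 * lies below it. */
size_t stack_untouched_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_PAINT)
        n++;
    return n;
}
```

For example, call stack_untouched_words() periodically from the main loop and stop (or hit a breakpoint) when it falls below a small safety margin.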

If you find out where your program counter is stuck, disassemble your executable with objdump -D executable > assembly.txt, open that file and look up that memory address. It will also tell you which function you are in; then try to work out how you got there.

If none of that helps, you could write a CPU tracer in HDL that records the branches taken by the CPU; you could then find out how it got to where it is.
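Before building an HDL tracer, a cheaper software variant may be worth trying: a ring buffer of recent events that you read with GDB after the hang. This is a different technique from the hardware tracer and only sees the places you instrument. A minimal sketch in C; trace(), trace_buf, trace_pos and the event IDs are hypothetical names.

```c
#include <stdint.h>

#define TRACE_LEN 64u   /* power of two, so the index wraps with a mask */

/* One trace record: a small event ID plus an arbitrary value
 * (e.g. a status register or a frame length). */
struct trace_entry {
    uint16_t event;
    uint32_t value;
};

/* Global and volatile so they are not optimised away and can be
 * inspected from GDB after the system hangs. */
volatile struct trace_entry trace_buf[TRACE_LEN];
volatile uint32_t trace_pos = 0;

/* Append an event, overwriting the oldest one once the buffer is full.
 * If it is called from both the ISR and the main loop, the main-loop
 * calls should be made with interrupts masked. */
void trace(uint16_t event, uint32_t value)
{
    uint32_t i = trace_pos & (TRACE_LEN - 1u);
    trace_buf[i].event = event;
    trace_buf[i].value = value;
    trace_pos = trace_pos + 1u;
}
```

For example, call trace(1, 0) on ISR entry, trace(2, 0) on ISR exit and trace(3, len) when the main loop handles a frame. After the hang, print trace_pos and print trace_buf in GDB; the newest entry is at index (trace_pos - 1) % 64. An ISR entry without a matching exit as the last event would point at the handler itself.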

I hope that helps for a start and good luck!
Raul
© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.