|
Target Debug slow
by jtaarud on Jul 20, 2010 |
jtaarud
Posts: 8 Joined: Jul 8, 2008 Last seen: Mar 23, 2012 |
||
|
I have a very slow debug connection:
I run DDD on a nice Linux server, and I run or_debug_proxy under Cygwin near my target. My JTAG transfer speed is approximately 5 Mbps. When I try to disassemble a location, each line of disassembled code arrives about once every 1.2 seconds. The logic analyzer shows the JTAG line mostly idle, and all of my CPUs are mostly idle. Is there some kind of weird timeout going on between the proxy and the USB or DDD? |
|||
|
RE: Target Debug slow
by jeremybennett on Jul 20, 2010 |
jeremybennett
Posts: 815 Joined: May 29, 2008 Last seen: Jun 13, 2019 |
||
|
Hi Jeff,
I'm reposting here the email I sent to you earlier, for the benefit of other readers. The or_debug_proxy was a joint development, and others will have suggestions. It reuses my Remote Serial Protocol handler.
You don't mention the target board you are using. However, I am assuming it is via a USB to JTAG link, probably using an FTDI 2232C chip. There are known issues here. The USB interrupt frequency is 125 µs, which limits you to 4K activities per second. A GDB operation maps into several Remote Serial Protocol transfers. These in turn each map onto multiple JTAG transfers. I've certainly measured less than 1% CPU activity, which indicates the problem is waiting for the JTAG interface.
The effect is compounded, I believe, by the open source version of the USB/JTAG interface on which or_debug_proxy is built. I suspect it is designed to be functionally correct rather than super efficient. There is work to be done here.
Transfers over physical JTAG are never going to be that fast. It is bit-serial and the JTAG clock is typically 10x slower than the system clock. So on a 25MHz FPGA board, you are not going to clock in more than 200-300KB/s just on bit-rate. By the time you add in the overhead of multiple RSP packets and JTAG overheads, 20-30KB/s transfer rates look feasible. That is not untypical of simple JTAG systems. It is also why there are efforts, both in the standards bodies (IEEE 1149.7) and commercially (e.g. ARM's debug structure), to promote faster alternatives.
The fact that you are getting 1KB/s is down to the inefficiency of the current open source low level drivers. I'd love to have the time to develop a really good low-level driver for the FTDI interface. However, it isn't a top priority for me. Of course, if someone would like to fund such work, I'd be very happy to do it :-).
HTH
Jeremy
-- |
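As a rough check on the bit-rate figures in the post above, the arithmetic can be sketched in a few lines of Python. The function name and the default 10x clock ratio are illustrative assumptions taken from the rule of thumb quoted in the post, not measured values, and the result is a raw upper bound that ignores all RSP and JTAG protocol overhead.

```python
def jtag_raw_bytes_per_second(system_clock_hz, clock_ratio=10):
    """Upper bound on JTAG throughput from bit rate alone.

    Assumes one bit is shifted per JTAG clock and that the JTAG clock is
    `clock_ratio` times slower than the system clock (the rule of thumb
    quoted in the post, not a property of any specific board).
    """
    jtag_clock_hz = system_clock_hz / clock_ratio
    return jtag_clock_hz / 8  # 8 bits per byte, no protocol overhead


if __name__ == "__main__":
    raw = jtag_raw_bytes_per_second(25_000_000)
    print(f"raw ceiling: {raw / 1000:.1f} KB/s")
```

For a 25MHz system clock this gives a ceiling of roughly 300KB/s, in line with the 200-300KB/s range mentioned in the post before packet overheads are counted.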
|||
|
RE: Target Debug slow
by jtaarud on Jul 20, 2010 |
jtaarud
Posts: 8 Joined: Jul 8, 2008 Last seen: Mar 23, 2012 |
||
|
Wow! If what I'm experiencing during debug is truly the speed of things, I'm curious how anyone can put up with it. Right now I can't sell the idea of this platform to our SW group because they use IAR or ARM tools that don't experience this kind of turnaround ACK delay.
I would take a stab at it myself, but I'm drowning in work applied to speed up the core, plus learning the compiler ins and outs. Is there anyone out there who has solved this problem? |
|||
|
RE: Target Debug slow
by julius on Jul 20, 2010 |
julius
Posts: 363 Joined: Jul 1, 2008 Last seen: May 17, 2021 |
||
|
Hi Jeff,
What is it that is extremely slow for you? The disassembly of code from the target? Downloading the application to the target via DDD/GDB?
As Jeremy indicates, there is a bottleneck thanks to the USB. This is not such a big issue for bursts of reads or writes, but doing something like write, then read, then write will ensure that you get the maximum penalty from the USB. I have made or_debug_proxy request as much data from GDB as it can for each transaction, ensuring we're making full use of each USB transaction (for instance, downloading application code to the target should avoid the penalty as much as possible), but it's still an issue: each time we do a transaction, we read back a CRC/checksum value from the debug unit to ensure all the data was transferred correctly. This cannot be avoided unless you skip the checksum.
I'm not sure exactly what goes on in GDB when it's trying to disassemble code, like you mention, but I have noticed that disassembling code on the target can be slow sometimes.
What rates are you seeing GDB report when downloading code? I find it usual to see between 80 and 120 KByte/s for a large transfer (the Linux kernel, say). Also, how old is the or_debug_proxy code you're using? Is it a recent (last few months) checkout from the OpenCores repository? |
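The difference between burst transfers and alternating small transfers can be illustrated with a very simple cost model. This is only a sketch: the function names are made up for illustration, the fixed per-transaction turnaround reuses the 125 µs figure quoted earlier in the thread, and the link rate is an arbitrary placeholder rather than a measurement of any cable.

```python
def transaction_count(total_bytes, bytes_per_transaction):
    """Number of USB transactions needed (ceiling division)."""
    return -(-total_bytes // bytes_per_transaction)


def transfer_time_s(total_bytes, bytes_per_transaction,
                    turnaround_s=125e-6, link_bytes_per_s=300_000):
    """Estimated time: a fixed turnaround per transaction plus payload time.

    Both default parameters are illustrative assumptions, not measurements.
    """
    n = transaction_count(total_bytes, bytes_per_transaction)
    return n * turnaround_s + total_bytes / link_bytes_per_s


if __name__ == "__main__":
    page = 4096
    word_at_a_time = transfer_time_s(page, 4)   # one 32-bit word per transaction
    single_burst = transfer_time_s(page, page)  # whole page in one transaction
    print(f"word at a time: {word_at_a_time * 1000:.1f} ms")
    print(f"single burst:   {single_burst * 1000:.1f} ms")
```

In this model the payload time is the same in both cases; only the number of turnarounds changes, which is why word-by-word access is so much more expensive than one large transfer.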
|||
|
RE: Target Debug slow
by jeremybennett on Jul 21, 2010 |
jeremybennett
Posts: 815 Joined: May 29, 2008 Last seen: Jun 13, 2019 |
||
|
Wow! If what I'm experiencing during debug is truly the speed of things, I'm curious how anyone can put up with it. Right now I can't sell the idea of this platform to our SW group because they use IAR or ARM tools that don't experience this kind of turnaround ACK delay.
Hi Jeff,
The commercial versions of the FTDI libraries, I suspect, go much faster. There isn't any reason for them to make their free versions super fast.
Remember that for a large image, you can preload it when starting Or1ksim. You don't have to load it via Or1ksim.
When you are connected to a target, GDB disassembly will read from the target one word at a time. That is just the way GDB works. I'm not sure why it is quite so slow, although because of the JTAG overhead, reading one word takes almost as long as reading an entire page. If you are not connected to a target, it will disassemble from the image file, not the target, and that will be fast. I think you can tell GDB to do this, even when you are connected to the target.
These are some of the few places where you notice these things, and they will apply to any JTAG interface of this sort. It's just not good at transferring large blocks of data, particularly a byte or word at a time.
What speed are your colleagues running the ARM core at, and what debug interface are they using?
I would take a stab at it myself, but I'm drowning in work applied to speed up the core, plus learning the compiler ins and outs. Is there anyone out there who has solved this problem?
There are commercial alternatives out there that are much faster (e.g. XJTAG), although you'd then need to adapt the RSP server to drive them (not too hard). For the free version, we know what needs fixing: the low level FTDI 2232C interface. It's on the list of things to be done, but it is not a high priority. Although that priority could be changed given appropriate commercial incentive :-).
Jeremy |
|||
|
RE: Target Debug slow
by rfajardo on Jul 21, 2010 |
rfajardo
Posts: 306 Joined: Jun 12, 2008 Last seen: Jan 6, 2020 |
||
|
Hi Jeff,
I don't know the speeds reached with different cables for the Advanced Debug System. However, it is a second debugging system also developed for OpenRISC and available here on OpenCores (http://opencores.org/project,adv_debug_sys), so you could give it a try. It works with several cables. If configured to do so, it can work over the same FPGA JTAG interface and the same cable used for FPGA configuration. That avoids a cable change or extra pin assignment for the debugging system.
Best regards,
Raul |
|||
|
RE: Target Debug slow
by jeremybennett on Jul 21, 2010 |
jeremybennett
Posts: 815 Joined: May 29, 2008 Last seen: Jun 13, 2019 |
||
|
Hi Raul, I'd be very interested to see the comparative performances, particularly with the same cable. Jeremy |
|||
|
RE: Target Debug slow
by rfajardo on Jul 21, 2010 |
rfajardo
Posts: 306 Joined: Jun 12, 2008 Last seen: Jan 6, 2020 |
||
|
Hi Jeremy,
Yes, that would be good. As I said, I unfortunately don't know the values. I thought that the use of a different cable, supported by ADS, could help in achieving a higher speed.
Greetings,
Raul |
|||
|
RE: Target Debug slow
by jeremybennett on Jul 21, 2010 |
jeremybennett
Posts: 815 Joined: May 29, 2008 Last seen: Jun 13, 2019 |
||
|
I thought that the use of a different cable, supported by ADS, could help in achieving a higher speed.
Hi Raul,
It might do, but in our case the problem is the low level driver libraries. If a different cable comes with different (open source!) libraries, then that could make a difference. However, the JTAG interface is inherently slow for debug purposes. Even in a perfect world, with a 2.5MHz JTAG clock, you'll struggle to achieve block data transfers higher than a few 10s of KB/s.
Best wishes,
Jeremy |
|||
|
RE: Target Debug slow
by jtaarud on Jul 21, 2010 |
jtaarud
Posts: 8 Joined: Jul 8, 2008 Last seen: Mar 23, 2012 |
||
|
With Julius' clue, I discovered that the Cygwin version of or_debug_proxy makes a call to sleep(1) inside of a poll loop. This call is intended to prevent CPU hogging. Unfortunately it also degrades responsiveness. On a dual-core Windows machine only one core takes 100%.
With the workaround (commenting out the sleep), the responsiveness is great and download speed is 40-50kB/sec. Thank you for all the suggestions! |
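A fixed sleep in a poll loop trades latency for CPU time, while removing it trades CPU time for latency. One common way to avoid both costs is to block on the connection with a timeout, so the loop wakes as soon as data arrives. The Python sketch below only illustrates that idea with a local socket pair; it is not the or_debug_proxy code, and the function names, payload, and timeout values are all hypothetical.

```python
import select
import socket
import threading
import time


def wait_for_request(sock, timeout_s=1.0):
    """Block until `sock` is readable or the timeout expires.

    Returns the received bytes, or b"" on timeout. The process sleeps in
    the kernel while waiting, so it neither spins the CPU nor adds a
    fixed delay once data is available.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return b""
    return sock.recv(4096)


def demo(delay_s=0.05):
    """Send a request after `delay_s` and measure when the waiter sees it."""
    client, server = socket.socketpair()
    try:
        timer = threading.Timer(delay_s, client.sendall, args=(b"request",))
        start = time.monotonic()
        timer.start()
        data = wait_for_request(server)
        elapsed = time.monotonic() - start
        timer.join()
        return data, elapsed
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    data, elapsed = demo()
    print(f"got {data!r} after {elapsed * 1000:.0f} ms")
```

Whether the same approach fits the proxy depends on how its USB and GDB sides are polled, which this sketch does not attempt to model.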
|||
|
RE: Target Debug slow
by jeremybennett on Jul 22, 2010 |
jeremybennett
Posts: 815 Joined: May 29, 2008 Last seen: Jun 13, 2019 |
||
|
Hi Jeff,
That's good news. The only thing that now surprises me is quite how good a result you are getting! My tests had put the upper bound with a 2.5MHz JTAG clock at around 20-30kHz. What speed are you running your JTAG clock at?
Best wishes,
Jeremy |
|||
|
RE: Target Debug slow
by julius on Jul 22, 2010 |
julius
Posts: 363 Joined: Jul 1, 2008 Last seen: May 17, 2021 |
||
|
My tests had put the upper bound with a 2.5MHz JTAG clock at around 20-30kHz.
The FT2232D chip in the ORSoC debug cables is, according to the datasheet ( http://www.ftdichip.com/Documents/DataSheets/DS_FT2232D.pdf ), "capable of a maximum sustained data rate of 5.6 Mega bits/s." The driver functions for the chip allow you to specify a clock divisor, which sets the frequency of the JTAG signals being generated. The frequency is determined by the following formula:
frequency = 12MHz/((1 + dwClockDivisor) * 2)
In both the Linux and Windows USB driver functions the clock divisor is set to 0, so we're driving the JTAG lines as fast as possible, about 6MHz. This, of course, is the burst transaction speed and nothing to do with the USB transaction rate. Anyway, thought it would be interesting to note. |
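The divisor formula above is easy to tabulate. The following Python sketch simply evaluates it for a few divisor values; the function name and the snake_case parameter are illustrative, and only the formula itself comes from the post.

```python
BASE_CLOCK_HZ = 12_000_000  # the 12MHz base clock from the formula above


def jtag_clock_hz(clock_divisor):
    """JTAG clock for a given divisor: 12MHz / ((1 + divisor) * 2)."""
    if clock_divisor < 0:
        raise ValueError("clock divisor must be non-negative")
    return BASE_CLOCK_HZ / ((1 + clock_divisor) * 2)


if __name__ == "__main__":
    for divisor in (0, 1, 2, 5):
        print(f"divisor {divisor}: {jtag_clock_hz(divisor) / 1e6:.2f} MHz")
```

A divisor of 0 gives the 6MHz figure mentioned in the post, and each increment of the divisor lowers the clock from there.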
|||
|
RE: Target Debug slow
by nyawn on Jul 23, 2010 |
nyawn
Posts: 173 Joined: Dec 19, 2008 Last seen: May 31, 2023 |
||
|
Raul / Jeremy:
For the Advanced Debug System, I've only got performance numbers for the Altera USB-Blaster. For code downloads of about 1MB, GDB shows ~30k/sec (under WinXP/cygwin). I don't have any quantitative data for other tasks. The USB-Blaster driver does not use libFTDI, it goes through libUSB (which may well have its own performance issues), so it may gain a bit there. But, the USB-Blaster is designed to work fast only with multiples of 8 bits, so it loses a lot every time it has to transfer the remainder bits via bit-banging. I'm sure Altera designs its hardware to use multiples of 8 bits, but I didn't. ADS also has an FT2232 driver, but it uses libFTDI. Since I don't have a 2232-based cable, I don't have any performance numbers. |
|||
|
RE: Target Debug slow
by bruceli on Aug 26, 2010 |
bruceli
Posts: 5 Joined: Feb 3, 2009 Last seen: Mar 23, 2020 |
||
|
With Julius' clue, I discovered that the cygwin version of or_debug_proxy makes a call to sleep(1) inside of a poll loop. This call is intended to prevent CPU hogging. Unfortunately it also degrades responsiveness. On a dual-core windows machine only one core takes 100%.
With the workaround (commenting out the sleep), the responsiveness is great and download speed is 40-50kB/sec. Thank you for all the suggestions!
I ran into the same problem, and with your solution it is gone. Thanks a lot! |
|||
|
RE: Target Debug slow
by jtaarud on Aug 29, 2010 |
jtaarud
Posts: 8 Joined: Jul 8, 2008 Last seen: Mar 23, 2012 |
||
|
I should point out that removing the sleep(1) has one glaring drawback: 100% CPU use on one of the cores of your processor.
|
|||

