virtex7_pcie_dma project thread
by aborga on Jan 7, 2015 |
aborga
Posts: 23 Joined: Dec 15, 2008 Last seen: Nov 13, 2024 |
||
The idea is the following:
-- this is the official thread related to http://opencores.org/project,virtex7_pcie_dma
-- the link to this thread appears in the main project description

All issues related to the project (specific questions, comments, feedback, etc.) should be posted as a reply to this post.

Double advantage:
-- users:
* find a clear place to ask questions of public relevance related to a specific project (besides private email exchange and bug reports)
  e.g.: "hey, where can I modify the number of lanes?"
  e.g.: "anybody willing to implement a mind reader based on this code?"
* share experience with the whole community on a specific project
  e.g.: "wow, I found this code so buggy I don't know what to do with it!"
-- developers:
* watch for questions in a single place
  e.g.: "thanks Jimi, we will implement the DMA automatic-reload in the next release"
  e.g.: "thanks Eddie, we will tap registers"
* follow own project related discussions
  e.g.: "cool, somebody is planning to improve the code!"

Due to the maintainers' lack of time, other threads in the forum concerning this project will hardly be followed.

Have fun.

P.S.: to improve the quality of the forum, RSS feedback would be much appreciated! |
RE: virtex7_pcie_dma project thread
by n713z on Mar 27, 2015 |
n713z
Posts: 2 Joined: Feb 7, 2009 Last seen: Aug 28, 2015 |
||
Thank you for sharing this effort!
I tried running your core on a VC709 board. The good news is I can get DMAs to show up. The bad news is the core hangs after a few dozen DMAs, so that all the PIO reads return 0xFFFFFFFF and none of the resets have any effect. Before I start digging into this, I was curious whether you have had the same experience or whether it runs stably in your hands. |
RE: virtex7_pcie_dma project thread
by fransschreuder on Mar 31, 2015 |
fransschreuder
Posts: 11 Joined: Aug 2, 2010 Last seen: Nov 14, 2024 |
||
Hello n713z,
As we are developing the core, we also run into new surprises sometimes. One of our colleagues has had a similar problem, but we first thought this was due to timing violations in other parts of the design which are not included in the opencores design. Could you explain which mode you are using (wrap_around or single cycle)? Is it possible for you to put ILA probes on the AXI bus (m_axis_rq* outputs from dma_read_write), on dma_descriptors_s (dma_control) and on completer_state_slv (dma_control)? It would be nice if you could capture the moment at which the DMA core does its last successful write, or at least do a postmortem capture. Thanks a lot for helping us to debug the core. Regards, Frans |
RE: virtex7_pcie_dma project thread
by fransschreuder on Apr 30, 2015 |
fransschreuder
Posts: 11 Joined: Aug 2, 2010 Last seen: Nov 14, 2024 |
||
Hello n713z,
Please try today's revision (svn R12). It should be fixed by now. We do still see the system freeze here, but now you have to transfer at full speed for some hours in order to get it to lock. We are investigating further. Frans |
RE: virtex7_pcie_dma project thread
by n713z on May 1, 2015 |
n713z
Posts: 2 Joined: Feb 7, 2009 Last seen: Aug 28, 2015 |
||
Thanks for the update, and sorry I'm not better help in debugging this--I'm doing this after hours for my own education and am far from expert on these matters.
I did try out the latest version, and I get the same behavior although less often, just like you describe. Note that I'm using a trivial smoketest that exercises this problem almost instantly, like so:

    loop {
        *((volatile int *)buffer + BUFSIZ - 4) = 0x0;   // clear the receive buffer
        writel(0x80, virtual_base_address + 0x400);     // pop the DMA
        while (*((volatile int *)buffer + BUFSIZ - 4)) != 0xdeadbeef && (++waited_too_long
    }

I run this in the init function of a device driver, and when the while () test fails to complete the core seems stuck: register reads all return 0xff and soft reset through /sys/bus/pci/ doesn't help (rebooting the computer does, fwiw). With the current version this runs for several hundred cycles on average; with the previous one it would crash after a dozen or so. |
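The wait loop in the smoketest above is cut off in the post, so its exact timeout condition is not known. As a rough sketch of the same idea (poll the last word of the receive buffer until the 0xdeadbeef marker arrives, but give up after a fixed number of polls), something like the following could be used. The function name, the `max_polls` parameter and the `polls_used` output are hypothetical and not part of the original driver; in a kernel driver the pointer would refer to the DMA buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Marker value the smoketest in the post waits for. */
#define DMA_SENTINEL 0xdeadbeefu

/*
 * Hypothetical helper: poll one word until it holds DMA_SENTINEL or until
 * max_polls reads have been made without seeing it.
 * Returns true if the sentinel was seen. The number of unsuccessful polls
 * is stored in *polls_used when polls_used is not NULL.
 */
static inline bool wait_for_sentinel(const volatile uint32_t *last_word,
                                     unsigned long max_polls,
                                     unsigned long *polls_used)
{
    unsigned long polls = 0;
    bool seen = false;

    assert(last_word != NULL);

    while (polls < max_polls) {
        if (*last_word == DMA_SENTINEL) {
            seen = true;
            break;
        }
        polls++;
    }

    if (polls_used != NULL)
        *polls_used = polls;
    return seen;
}
```

When the helper returns false, the caller knows the transfer did not complete within the budget and can log the register state instead of spinning forever.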
RE: virtex7_pcie_dma project thread
by clurado1980 on Sep 8, 2015 |
clurado1980
Posts: 1 Joined: Dec 28, 2011 Last seen: Aug 4, 2016 |
||
Hi,
In order to use the PCIE-DMA core, I created a simple driver which does the following:

Initialization:
+ probing the device
+ enabling it
+ setting the device as bus master
+ requesting regions
+ mapping memory BARs with pci_iomap and obtaining a virtual address

Reading and writing registers is OK; I can also see there is a copy of the BAR start addresses at 0x300 to 0x320 @ BAR0.

To make a DMA transaction:
+ allocating a buffer with kmalloc and GFP_DMA
+ obtaining the physical address via virt_to_phys
+ setting up registers
+ polling the register which holds the descriptor status

But I didn't get a successful DMA with the steps above: the allocated memory is untouched in a read transaction. The descriptor done register shows that the descriptor still has something to do, and the descriptor current address is unchanged and remains at the start address. It looks like writing 0x1 to 0x400 (descriptor enable) isn't successful.

I am using the core with an HTG-708 with a Xilinx Virtex7 690T FPGA; lspci shows the BARs correctly and a speed of 8GT/s.
Here is the log of my driver:
[84228.067382] Reseting NPCI DMA
[84228.067383] WRITE_32-> BAR_0 [0x420]
[84228.067385] READ_32-> BAR_0 [0x420]=0x0
[84228.067388] WRITE_32-> BAR_0 [0x430]
[84228.067391] READ_32-> BAR_0 [0x430]=0x0
[84228.067400] NPCI-DMA: Allocated 8188 byte buffer at physical address 0x28000 ( virt-addrs 0xffff880000028000 ) for descriptor 0
[84228.067403] NPCI-DMA: setting up descriptor 0 from 0x28000 to 0x287ff (2047*4 bytes)
[84228.067405] WRITE_64-> BAR_0 [0x0]
[84228.067409] READ_64-> BAR_0 [0x0]=0x28000
[84228.067411] WRITE_64-> BAR_0 [0x8]
[84228.067414] READ_64-> BAR_0 [0x8]=0x287ff
[84427.096422] NPCI-DMA: initiating read on descriptor 0
[84427.096425] WRITE_32-> BAR_0 [0x10]
[84427.096429] READ_32-> BAR_0 [0x10]=0xfff
[84427.096432] READ_32-> BAR_0 [0x400]=0x0
[84427.096435] WRITE_32-> BAR_0 [0x400]
[84427.096438] READ_32-> BAR_0 [0x400]=0x0
[84428.977960] NPCI: Unregistering driver
[84438.977996] READ_64-> BAR_0 [0x200]=0x88000
[84438.978000] READ_64-> BAR_0 [0x208]=0x2 |
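One detail in the log above can be checked with plain arithmetic: the buffer is 8188 bytes (2047 words of 4 bytes) starting at 0x28000, but the end address written is 0x287ff, which is only 0x7ff past the start and is not 32-bit aligned. Whether this is related to the DMA not starting is not established in this thread, and whether the core treats the end address as inclusive or exclusive is not stated here either. The sketch below only shows the address arithmetic; all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes covered by [start, end] if end is inclusive. */
static inline uint64_t span_bytes_inclusive(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start + 1;
}

/* Hypothetical helper: first address past a buffer of n_words 32-bit words. */
static inline uint64_t end_exclusive(uint64_t start, uint64_t n_words)
{
    return start + n_words * 4u;
}

/* True if addr is a multiple of 4 bytes (32-bit aligned). */
static inline int is_word_aligned(uint64_t addr)
{
    return (addr & 0x3u) == 0;
}
```

For the values in the log, `span_bytes_inclusive(0x28000, 0x287ff)` gives 2048 bytes, while `end_exclusive(0x28000, 2047)` gives 0x29ffc, which matches the 8188 bytes that were allocated.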
RE: virtex7_pcie_dma project thread
by venkub on Jan 22, 2016 |
venkub
Posts: 1 Joined: Aug 6, 2008 Last seen: Dec 24, 2023 |
||
Hi,
Thank you for your contribution to OPENCORES. When I was going through the files, I saw that the AXIS interface's *_tuser port is not connected between the PCIe core and the DMA engine. According to the Xilinx PCIe core (Virtex, v4.1), the _tuser port carries information regarding the TLP. Could you please clarify this? Regards, Venkat |
RE: virtex7_pcie_dma project thread
by fransschreuder on Jan 24, 2016 |
fransschreuder
Posts: 11 Joined: Aug 2, 2010 Last seen: Nov 14, 2024 |
||
Hello,
The tuser signals indeed carry some information like byte enable and parity. We decided not to support byte enable, only multiples of 32 bits. Parity is also not used by our design. Regards, Frans |
RE: virtex7_pcie_dma project thread
by BasStuif on Apr 12, 2016 |
BasStuif
Posts: 4 Joined: Jan 11, 2016 Last seen: Jun 27, 2016 |
||
Dear Frans and Andrea,
Thank you for sharing your work with the community. I am working on a driver for MS Windows, but am running into a problem. I am sure I must be missing something. Unfortunately, only reading back the temperature seems to be working. I have attached results from running wupper-tools as well as my own results when trying to perform the same operations as wupper-tool. Unfortunately, in both cases DMA simply does not want to start. I hope you will be able to tell me:
1) What am I missing, which step, to kick off DMA?
2) What may be a good place to set probes and figure out why DMA does not start?
3) Vivado has a problem closing timing, and is stuck at 0.07 ns too slow. Can this be an issue? Should the 250 MHz be an issue, it seems relatively straightforward to reduce the PCIe AXIs output clock to 125 MHz, and adjust the PLL and constraints accordingly. Does this seem a viable solution?
4) Are there any PCIe timing settings that I should adjust in my PC's BIOS?
Thank you for your time. If you prefer, we can also confer in Dutch via email: bas.stuifmeel@nlncsa.nl
Looking forward to your insights,
Best regards, Bas |
RE: virtex7_pcie_dma project thread
by BasStuif on Apr 13, 2016 |
BasStuif
Posts: 4 Joined: Jan 11, 2016 Last seen: Jun 27, 2016 |
||
Dear Frans and Andrea,
Another question: should the software be built on 32-bit Linux or 64-bit Linux? I have been told that u_long is 64 bits wide on 64-bit Linux, but only 32 bits wide on 32-bit Linux. Since the wupper driver uses a struct (elegant) that overlays the wupper registers, the registers are aligned on u_long. Also, when I look in the DMA controller, the completer does the following:
register_address_s = s_axis_cq.tdata(63 downto 2)&"00";
Thus taking a 32-bit aligned address for further decoding. And indeed subsequent decoding is 32-bit aligned:
case (register_address_s(3 downto 2)) is ...
I can imagine this will work nicely when compiled on 32-bit Linux. Has it been tested on 64-bit Linux? Best regards, Bas. |
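To make the address decoding quoted above a bit more concrete, here is a small C model of the two VHDL expressions: clearing the two low address bits, and using bits 3 downto 2 to pick one of four 32-bit words inside a 16-byte aligned group. This is only a software illustration of the bit manipulation, not the firmware itself; the function names and the register-file model are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models tdata(63 downto 2) & "00": force the address to 32-bit alignment. */
static inline uint64_t align_to_dword(uint64_t byte_addr)
{
    return byte_addr & ~(uint64_t)0x3;
}

/* Models register_address_s(3 downto 2): which 32-bit word within a
 * 16-byte aligned group the address points at (0..3). */
static inline unsigned dword_select(uint64_t byte_addr)
{
    return (unsigned)((byte_addr >> 2) & 0x3u);
}

/* Hypothetical model of a 32-bit read from a register file that is
 * stored as an array of 32-bit words. */
static inline uint32_t model_read32(const uint32_t *regs, size_t n_words,
                                    uint64_t byte_addr)
{
    uint64_t idx = align_to_dword(byte_addr) >> 2;

    assert(idx < n_words);
    return regs[idx];
}
```

An address that is 64-bit aligned (for example 0x8) is also 32-bit aligned, so `dword_select(0x8)` simply returns 2; the decoding granularity itself does not depend on whether the host software is a 32-bit or a 64-bit build.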
RE: virtex7_pcie_dma project thread
by fransschreuder on Apr 13, 2016 |
fransschreuder
Posts: 11 Joined: Aug 2, 2010 Last seen: Nov 14, 2024 |
||
Hello Bas,
We are very interested to see the code for the Windows driver; however, I don't think we have any plans to ever use the core with Windows. I think it is good to at least share ideas and code for the users who do want to use it.
1). You will need to do a few things. First of all, you must make sure there is data to be transmitted into the fifo. See the example applications for that. The current firmware example has a pseudo random generator (LFSR) that can put data into the fifo, but it must be enabled. Then the start address, end address and TLP size must be set in the descriptor. The TLP size (number of 32-bit words) depends on your PC; most PCs can't handle more than 128 or 256 bytes per TLP. Finally, you have to enable the descriptor (address BAR0 + 0x400). This enable bit will automatically clear as the DMA completes.
2). To see activity, I would first look with chipscope at the output (and input) of the fifo; later on you can check whether the DMA descriptor enable flag gets set.
3). 0.07 ns is no reason for the design not to work; this will probably only give problems if the FPGA reaches extreme temperatures. It should however be possible for the wupper design to meet timing. We have a very large design, combined with wupper and also another large part running at 320 MHz, and it meets timing. (The XC7VX690T FPGA is filled with > 65% LUTs, and BRAM even more.)
4). You should check that your PC supports PCIe Gen3; you can check the maximum PCIe payload size in the BIOS (and maybe set it).
I think it is a good idea to have your code debugged and then also place a copy into our repository. For now, while the driver is unstable, you could maybe make a Github repository or something similar to show us the code. Regards, Frans |
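The order of operations in point 1 can be sketched against a simulated BAR0 as shown below. The register offsets (0x0 start address, 0x8 end address, 0x10 for the TLP size, 0x400 enable) are taken from the driver log posted earlier in this thread, not from the project documentation, so treat them as assumptions. The real register at 0x10 very likely holds more fields than the TLP size (for example the wrap_around setting), which are not modeled here. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets as seen in the driver log in this thread (assumptions). */
enum {
    REG_DESC0_START = 0x000,
    REG_DESC0_END   = 0x008,
    REG_DESC0_CTRL  = 0x010,  /* assumed to carry the TLP size in words */
    REG_DESC_ENABLE = 0x400
};

/* Plain memory standing in for the mapped BAR0 region. */
struct fake_bar0 {
    uint8_t mem[0x1000];
};

static inline void reg_write64(struct fake_bar0 *bar, uint32_t off, uint64_t v)
{
    assert(off + sizeof v <= sizeof bar->mem);
    memcpy(bar->mem + off, &v, sizeof v);
}

static inline void reg_write32(struct fake_bar0 *bar, uint32_t off, uint32_t v)
{
    assert(off + sizeof v <= sizeof bar->mem);
    memcpy(bar->mem + off, &v, sizeof v);
}

static inline uint64_t reg_read64(const struct fake_bar0 *bar, uint32_t off)
{
    uint64_t v;

    assert(off + sizeof v <= sizeof bar->mem);
    memcpy(&v, bar->mem + off, sizeof v);
    return v;
}

static inline uint32_t reg_read32(const struct fake_bar0 *bar, uint32_t off)
{
    uint32_t v;

    assert(off + sizeof v <= sizeof bar->mem);
    memcpy(&v, bar->mem + off, sizeof v);
    return v;
}

/* The TLP size is given to the core as a number of 32-bit words. */
static inline uint32_t tlp_bytes_to_words(uint32_t tlp_bytes)
{
    assert(tlp_bytes % 4 == 0);
    return tlp_bytes / 4;
}

/* Program descriptor 0 and set its enable bit (bit 0 is an assumption). */
static inline void start_single_dma(struct fake_bar0 *bar, uint64_t start,
                                    uint64_t end, uint32_t tlp_bytes)
{
    reg_write64(bar, REG_DESC0_START, start);
    reg_write64(bar, REG_DESC0_END, end);
    reg_write32(bar, REG_DESC0_CTRL, tlp_bytes_to_words(tlp_bytes));
    reg_write32(bar, REG_DESC_ENABLE, 0x1);
}
```

With a 128-byte TLP the value written to the size field is 32 words. On real hardware the enable bit would read back as 0 once the transfer completes; the simulated memory above of course never clears it.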
RE: virtex7_pcie_dma project thread
by fransschreuder on Apr 13, 2016 |
fransschreuder
Posts: 11 Joined: Aug 2, 2010 Last seen: Nov 14, 2024 |
||
Hello Bas,
We have only been using the core on 64-bit Linux; it has not been tested on a 32-bit system yet. The registers are in fact addressable as 32, 64 or even 128 bit registers, so it should be possible to use it on both 32 and 64 bit systems. I think for the Wupper driver to work on a 32-bit system, we should change all the u_long values to uint64_t (from stdint.h). Frans |
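A minimal sketch of what a register overlay with fixed-width types could look like is shown below. The struct and its field names are hypothetical and do not reproduce the actual Wupper driver header; the offsets 0x0, 0x8 and 0x10 follow the driver log posted earlier in this thread. The point is only that `uint64_t` fields keep the same offsets on 32-bit and 64-bit builds, and that compile-time checks can guard the layout.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint64_t */

/* Hypothetical overlay of one DMA descriptor, for illustration only. */
struct dma_descriptor_regs {
    volatile uint64_t start_address;  /* offset 0x00 */
    volatile uint64_t end_address;    /* offset 0x08 */
    volatile uint64_t control;        /* offset 0x10, contents assumed */
    volatile uint64_t reserved;       /* offset 0x18 */
};

/* These checks fail at compile time if the layout ever changes. */
static_assert(sizeof(struct dma_descriptor_regs) == 32,
              "descriptor overlay must be 32 bytes");
static_assert(offsetof(struct dma_descriptor_regs, end_address) == 0x08,
              "end_address must sit at offset 0x08");
static_assert(offsetof(struct dma_descriptor_regs, control) == 0x10,
              "control must sit at offset 0x10");
```

If the fields were declared as `u_long` instead, `end_address` would land at offset 4 on a 32-bit build, and the static assertions would stop the compile.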
RE: virtex7_pcie_dma project thread
by fransschreuder on Apr 13, 2016 |
fransschreuder
Posts: 11 Joined: Aug 2, 2010 Last seen: Nov 14, 2024 |
||
Hello Bas,
I see from your attachment that you are using the "wrap around" method. I would advise you to start your tests using wrap_around=0, this way you will get a single DMA write from start_address to end_address, and then the enable bit will automatically clear. The wrap_around method is much more complicated, the software / driver will have to maintain the "read_pointer" to tell the firmware until what address it can read / write. Frans
|
RE: virtex7_pcie_dma project thread
by BasStuif on Apr 20, 2016 |
BasStuif
Posts: 4 Joined: Jan 11, 2016 Last seen: Jun 27, 2016 |
||
Hello Frans and Andrea,
Thank you for your help. I have been making good progress, and DMA to the card and back again seems to be working almost perfectly. I have simplified the application on the card by connecting upfifo and downfifo, with some minimal controls, which seems to be working nicely. It seems I am getting some strange data through the upfifo. To illustrate, I have attached waveforms and some output that matches the waveforms. The test data contains two counters: one counts every block of 256 bits, the other every 32-bit word. This provides a nice view of what is going on with the DMA. It seems that:
1) whole 256-bit blocks are shuffled out of order (e.g.: 1,2,5,6,?,3,4,...)
2) sometimes some control is perceived as data as well? (the '?' above)
Can this be because of TLP settings or any other setting? Thank you for your time, Bas.
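A check like the one described above (spotting 256-bit blocks that arrive out of order) can be automated on the host side. The sketch below assumes that each block is 8 words of 32 bits and that the block counter sits in the first word of every block; the post does not state the actual position of the counters, so this layout and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_BLOCK 8  /* 256 bits / 32 bits per word */

/*
 * Returns the index of the first block whose counter is not exactly one
 * more than the counter of the previous block, or -1 if all n_blocks
 * blocks are in sequence. Assumes the block counter is word 0 of each block.
 */
static inline long first_out_of_order_block(const uint32_t *words,
                                            size_t n_blocks)
{
    assert(words != NULL || n_blocks == 0);

    for (size_t i = 1; i < n_blocks; i++) {
        uint32_t prev = words[(i - 1) * WORDS_PER_BLOCK];
        uint32_t cur = words[i * WORDS_PER_BLOCK];

        if (cur != prev + 1u)
            return (long)i;
    }
    return -1;
}
```

For a received sequence of block counters 1, 2, 5, 6 the function returns 2, the position where 3 was expected but 5 arrived.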
160420-Output and wave forms.pdf (87 kb)
|
RE: virtex7_pcie_dma project thread
by BasStuif on Apr 28, 2016 |
BasStuif
Posts: 4 Joined: Jan 11, 2016 Last seen: Jun 27, 2016 |
||
Problem solved!
Because I am using a newer version of Vivado, I had to manually reconfigure the PCIe core. Apparently not much of the original configuration was left, and I should have turned off the "AXI-ST Frame Straddle" option. WUPPER ignores the TUSER signals, therefore it cannot support frame straddling. The straddling causes frames to become misaligned and 'confuses' WUPPER. |