OpenCores
RE: virtex7_pcie_dma project thread
by saban on May 5, 2016
saban
Posts: 1
Joined: Jul 7, 2014
Last seen: May 15, 2016
Hi,

I managed to benchmark your project, but I have a question about your benchmark results.
I have an Intel 3.6 GHz CPU with 4 physical cores (8 threads).
I am trying to reproduce your benchmark results, but I cannot reach 6 GB/s. Instead I see 4.95 GB/s, and I need more than 5 GB/s.
One CPU core on my PC goes to 100%, so I think my hardware setup is lower-spec than yours.
What is your system setup? Have you benchmarked your system on a lower-spec PC, and how much did that affect your results?

I made some minor changes to your project.
The following line (in the driver setup) did not work for me. I got a memory allocation error, so I changed it:
export CMEM_PARAMS="gfpbpa_size=128 gfpbpa_quantum=4"
I changed the size to 4K.

Thanks in advance.
RE: virtex7_pcie_dma project thread
by aborga on Jun 9, 2016
aborga
Posts: 23
Joined: Dec 15, 2008
Last seen: Feb 27, 2024
Hi Saban,

Which of our tools are you using for performance measurements?

Keep in mind that for single shot DMA transfers, the data transfer size has an impact on the efficiency of the data transfer, hence the transfer rate.

Which transfer size (block size) do you use?
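
To illustrate why the block size matters for single-shot transfers, here is a minimal sketch of a simple cost model in which every transfer pays a fixed setup cost before data moves at the peak link rate. The function name, the 6 GB/s peak rate and the 5 microsecond overhead are assumptions chosen for illustration only; they are not measurements of Wupper.

```python
def effective_rate(block_bytes, link_rate_bytes_per_s, overhead_s):
    """Effective throughput when each transfer pays a fixed setup cost.

    Simple model: time = overhead + size / peak rate.
    """
    transfer_time = overhead_s + block_bytes / link_rate_bytes_per_s
    return block_bytes / transfer_time


if __name__ == "__main__":
    LINK_RATE = 6e9   # assumed peak rate in bytes/s (illustrative)
    OVERHEAD = 5e-6   # assumed fixed cost per transfer in seconds (illustrative)
    for size in (4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        rate = effective_rate(size, LINK_RATE, OVERHEAD)
        print(f"{size:>10} bytes: {rate / 1e9:.2f} GB/s")
```

In this model small blocks are dominated by the fixed cost, and the rate approaches the peak only as the block size grows, so a benchmark figure is only meaningful together with the block size it was taken at.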

Have a look at Appendix A of our example application document:

http://opencores.org/websvn,filedetails?repname=virtex7_pcie_dma&path=%2Fvirtex7_pcie_dma%2Ftrunk%2Fdocumentation%2Fexample_application%2Finternship-wupper.pdf

Cheers,
Andrea
RE: virtex7_pcie_dma project thread
by fabrizio90 on Dec 14, 2021
fabrizio90
Posts: 1
Joined: Jun 26, 2017
Last seen: Dec 14, 2021
Hello Frans and Andrea,
I hope this message finds you well. A Master's student and I are working on using the Wupper firmware to implement a tracking algorithm on the VU9P. We first ran tests on the VC709 and they went fine (they run on hardware).

We are now trying to check whether the default version of Wupper for the VU9P compiles, but the build always ends in errors that we never saw with the VC709 (incorrect vector assignments because of the vector size, parameters read incorrectly, etc.). We are using Vivado 2021.1, and we noticed that these problems occur only when we target Virtex UltraScale+ devices (we also tried the VCU128). Since the manual says that Vivado 2018.1 is required to use the PCIE4 core, we suspect that the Vivado version might be the problem.

The manual states that it is possible to implement the PCIe Gen3 x16 version on the VU9P. How do we do that? It should not give us errors related to the Vivado version (we have already used a Gen3 firmware), and it is still good enough for our tests. We searched the manual but did not find how to do it. Thanks for your attention.

Best regards,
Fabrizio
RE: virtex7_pcie_dma project thread
by moshark on Mar 14, 2022
moshark
Posts: 2
Joined: Oct 20, 2021
Last seen: Jul 18, 2023
Hello Frans and Andrea,

I know it has been a while since anyone posted, but I would appreciate your help, as I have been having some issues with Wupper. I am using a VU9 FPGA card and Vivado 2020.2. I am trying to use PCIe Gen3 with 16 lanes and a 512-bit data width. The OS is Ubuntu 20.04 with kernel 5.13.0-35. I couldn't use the example provided for the Bittware XUP-P3R card, as that uses PCIE4 with 8 lanes. I had to update the UltraScale+ Integrated Block (PCIE4) for PCI Express (1.3) core to use 16 lanes and an 8.0 GT/s link speed. I have kept all other settings the same as what you provide. The design implemented with no issues. However, when I run the "./wupper-dma-transfer -g" command I get back zeroes:
###WARNING: unknown WUPPER-card type = 800
Starting DMA write
done DMA write
Buffer 1 addresses:
0: 0
1: 0
2: 0
3: 0
4: 0
5: 0
6: 0
7: 0
8: 0
9: 0
10: 0
11: 0
12: 0
13: 0
14: 0
15: 0
16: 0
17: 0
18: 0
19: 0
20: 0
21: 0
22: 0
23: 0
24: 0
25: 0
26: 0
27: 0
28: 0
29: 0
30: 0
31: 0
32: 0
33: 0
34: 0
35: 0
36: 0
37: 0
38: 0
39: 0
40: 0
41: 0
42: 0
43: 0
44: 0
45: 0
46: 0
47: 0
48: 0
49: 0
50: 0
51: 0
52: 0
53: 0
54: 0
55: 0
56: 0
57: 0
58: 0
59: 0
60: 0
61: 0
62: 0
63: 0
64: 0
65: 0
66: 0
67: 0
68: 0
69: 0
70: 0
71: 0
72: 0
73: 0
74: 0
75: 0
76: 0
77: 0
78: 0
79: 0
80: 0
81: 0
82: 0
83: 0
84: 0
85: 0
86: 0
87: 0
88: 0
89: 0
90: 0
91: 0
92: 0
93: 0
94: 0
95: 0
96: 0
97: 0
98: 0
99: 0
Also, when I check the status of the driver using the script you provide, "./drivers_wupper_local status", I get:
cmem_rcc 8433664 0

>>>>>> Status of the cmem_rcc driver

./drivers_wupper_local: line 13: 12920 Killed more /proc/cmem_rcc
wupper 45056 0

>>>>>> Status of the wupper driver

./drivers_wupper_local: line 13: 12926 Killed more /proc/wupper

It seems the driver is being killed for some reason.

I was hoping to know whether Wupper can in fact be used with PCIe Gen3 x16, and whether there is anything you can recommend for me to try.
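
For reference when judging results on such a link, the raw bandwidth can be estimated from the lane count and the link speed. The sketch below assumes the 128b/130b line encoding used by PCIe Gen3 and ignores packet and protocol overhead, so real DMA throughput will be lower. The function name is hypothetical and not part of the Wupper tools.

```python
def pcie_raw_bandwidth(gt_per_s, lanes, enc_payload=128, enc_total=130):
    """Raw link bandwidth in bytes/s after line encoding.

    Packet headers, flow control and DMA descriptor handling are not
    included, so this is an upper bound rather than an expected result.
    """
    bits_per_s = gt_per_s * 1e9 * lanes * enc_payload / enc_total
    return bits_per_s / 8


if __name__ == "__main__":
    for lanes in (8, 16):
        rate = pcie_raw_bandwidth(8.0, lanes)
        print(f"Gen3 x{lanes}: {rate / 1e9:.2f} GB/s raw")
```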

Thank You!
Mohamed ElSharkawy
RE: virtex7_pcie_dma project thread
by fransschreuder on Mar 15, 2022
fransschreuder
Posts: 11
Joined: Aug 2, 2010
Last seen: Jan 25, 2024
Dear Mohamed ElSharkawy,
It looks like you need to disable the IOMMU in the bios of your computer.
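
The setting itself lives in the BIOS, but on Linux it can help to check whether an IOMMU still appears to be active after the change. The following is a rough sketch, assuming a Linux host with the usual /proc/cmdline and /sys/kernel/iommu_groups locations; the helper names are hypothetical and this is not part of the Wupper tools.

```python
from pathlib import Path


def iommu_cmdline_options(cmdline):
    """Return the IOMMU-related options found on a kernel command line."""
    prefixes = ("iommu=", "intel_iommu=", "amd_iommu=")
    return [tok for tok in cmdline.split() if tok.startswith(prefixes)]


def iommu_groups_present(sysfs_dir="/sys/kernel/iommu_groups"):
    """True if the IOMMU groups directory exists and has entries."""
    path = Path(sysfs_dir)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        options = iommu_cmdline_options(cmdline_file.read_text())
        print("IOMMU options on kernel command line:", options or "none")
    print("IOMMU groups present:", iommu_groups_present())
```

If groups are still listed after changing the BIOS setting, the IOMMU is probably still enabled and the setting is worth checking again.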
Regards,
Frans
RE: virtex7_pcie_dma project thread
by moshark on Mar 16, 2022
moshark
Posts: 2
Joined: Oct 20, 2021
Last seen: Jul 18, 2023
Thank you very much Frans, turning FRAME STRADDLE RQ in the PCIE core OFF and IOMMU in BIOS OFF fixed the problem. It works!

Best,
Mohamed ElSharkawy

RE: virtex7_pcie_dma project thread
by giuliocreva on Jun 1, 2022
giuliocreva
Posts: 1
Joined: May 26, 2022
Last seen: Jul 12, 2022
Hi Frans and Andrea,

Thanks for sharing this great project on OpenCores.

I have Wupper implemented and working correctly in one shot mode. The exact amount of data I require is transferred and the data is consistent with the test pattern I am generating in the FPGA.

I haven't modified the drivers or the base WupperCard class in any way. Instead I am building some test routines based on the WupperCard class.

In my application data is constantly generated in the FPGA so I am trying to move towards the endless DMA operation.
I tried to do this in two different ways:
1) Using interrupts to wait until there is a wrap around.
I get the first interrupt, but it never gets cleared, although I am updating the read_ptr using dma_advance_ptr (I copied the code from wupper-throughput to do that).
2) Polling on the evolution of fw_ptr against read_ptr (including even_addr_dma and even_addr_pc to detect a wrap around).
All I can see in this case is a wrap around happening after the first DMA transfer is completed. The DMA engine then stops, as described in section 2.2 of the manual, because fw_ptr reaches rd_ptr and there is no more space available in the buffer (corner case "D").
In both cases I made sure that the allocated buffer is big enough to allow several atomic transfers before a wrap around (buffer set to 1GB and atomic transfers of 1kB).

This is the sequence of actions I am performing in my test routine for the second approach:
1) Setup the buffer
2) Open the card
3) Issue soft reset
4) Start the DMA engine in wrap around mode with a call to dma_to_host
5) Start the test data generation
6) Start polling on fw_ptr and wrap around indicators
The sequence is very similar when I try to use interrupts. The only difference is that I enable the interrupts in the startup sequence and then I wait on the interrupt instead of polling.
I also tried swapping steps (5) and (6), but it didn't help.
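
To make the stall described above concrete, here is a toy model of a circular buffer in which each side keeps a pointer plus a wrap flag that toggles on every wrap around. It is a generic sketch of that technique, not the Wupper register interface: the class and method names are hypothetical, and the exact meaning of even_addr_dma and even_addr_pc should be taken from section 2.2 of the manual.

```python
class RingModel:
    """Toy circular buffer with one pointer and one wrap flag per side."""

    def __init__(self, size, chunk):
        assert size % chunk == 0, "size must be a multiple of chunk"
        self.size = size
        self.chunk = chunk
        self.fw_ptr = 0       # where the modelled firmware writes next
        self.rd_ptr = 0       # how far the modelled host has consumed
        self.fw_even = True   # toggles each time fw_ptr wraps
        self.rd_even = True   # toggles each time rd_ptr wraps

    def is_empty(self):
        return self.fw_ptr == self.rd_ptr and self.fw_even == self.rd_even

    def is_full(self):
        return self.fw_ptr == self.rd_ptr and self.fw_even != self.rd_even

    def fw_write(self):
        """Write one chunk; return False if stalled on a full buffer."""
        if self.is_full():
            return False
        self.fw_ptr += self.chunk
        if self.fw_ptr == self.size:
            self.fw_ptr = 0
            self.fw_even = not self.fw_even
        return True

    def host_consume(self):
        """Consume one chunk; return False if there is nothing to read."""
        if self.is_empty():
            return False
        self.rd_ptr += self.chunk
        if self.rd_ptr == self.size:
            self.rd_ptr = 0
            self.rd_even = not self.rd_even
        return True


if __name__ == "__main__":
    ring = RingModel(size=8, chunk=2)
    # Without any consumption the writer stalls after one full pass.
    print([ring.fw_write() for _ in range(5)])
    # Advancing the read side frees one chunk and writing can resume.
    ring.host_consume()
    print(ring.fw_write())
```

In this model the writer only resumes once the read side has actually moved, so a stall right after the first wrap around would be consistent with the read pointer update not reaching the engine.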

Can you spot anything obviously wrong in the setup sequence?
Do you have any more general advice on the endless DMA operation that could help?

I am more than happy to share the test utility code if it helps.

Thank you.

Kind regards,
Giulio
© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.