DESCRIPTION

This archive implements a simple and lightweight protocol for
transmission of data from a low-resource FPGA, connected to an
Ethernet MAC, to an embedded system running Linux.
The main goal was to ensure reliable transmission over an unreliable
Ethernet link without the need to buffer a significant amount of data
in the FPGA. This created a need to obtain acknowledgment of received
packets from the embedded system as early as possible, and therefore
the protocol had to be implemented in layer 3.

The Ethernet type 0xfade is used (it is unregistered, but as this
protocol should be used only in small private networks, without
routers and with switches only, this should not be a problem).

We assume that the FPGA is capable of storing one "set" of packets
(in the example design the length of this set is equal to 32).
To start the transmission, the receiver sends the "start transmission"
packet:
  TGT,SRC,0xfade,0x0001,pad to 64 bytes

After reception of the "start transmission" packet, the transmitter
(FPGA) starts to send the data packets:
  TGT,SRC,0xfade,0xa5a5,set & packet number, delay, 1024 bytes of data

After reception of a correct data packet, the receiver sends the
"acknowledge" packet:
  TGT,SRC,0xfade,0x0003,set & packet number, pad to 64 bytes

Another packet may be used to request an immediate stop of
transmission:
  TGT,SRC,0xfade,0x0005, pad to 64 bytes

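The control packets listed above can be illustrated with a minimal C
sketch. It is not taken from the sources. It assumes that the 16-bit
packet code is stored in network (big-endian) byte order, like the
Ethernet type, and that the padding consists of zeros; whether the
64 bytes include the frame check sequence is not stated here, so the
sketch simply fills a 64-byte buffer. The names fade_code, put_be16
and fade_build_ctrl are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FADE_ETHERTYPE 0xfade  /* Ethernet type given in the text */
#define FADE_CTRL_LEN  64      /* "pad to 64 bytes" (FCS handling assumed) */

/* Packet codes listed in the text. */
enum fade_code {
    FADE_START = 0x0001,  /* "start transmission" */
    FADE_ACK   = 0x0003,  /* "acknowledge" */
    FADE_STOP  = 0x0005,  /* stop request */
    FADE_DATA  = 0xa5a5   /* data packet sent by the FPGA */
};

/* Store a 16-bit value in big-endian byte order (assumed for the code). */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/*
 * Build a control frame without extra fields (start or stop):
 * TGT, SRC, 0xfade, code, zero padding. Returns the frame length.
 */
static size_t fade_build_ctrl(uint8_t *frame, size_t cap,
                              const uint8_t tgt[6], const uint8_t src[6],
                              enum fade_code code)
{
    assert(cap >= FADE_CTRL_LEN);
    memset(frame, 0, FADE_CTRL_LEN);     /* zero padding (assumption) */
    memcpy(frame, tgt, 6);               /* target MAC address */
    memcpy(frame + 6, src, 6);           /* source MAC address */
    put_be16(frame + 12, FADE_ETHERTYPE);
    put_be16(frame + 14, (uint16_t)code);
    return FADE_CTRL_LEN;
}
```

The acknowledge packet is left out on purpose, because the width and
layout of the "set & packet number" field are not given in this
description.
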
When the first packets from the current set buffered in the FPGA have
been transmitted and acknowledged, they may be replaced with packets
from the next set. The current state of transmission is stored
in desc_memory in the desc_manager entity.

When a particular packet is not acknowledged, it is transmitted once
again. In the current example design each packet has simple attributes:
1. set number
2. valid (ready to be sent)
3. sent (has been sent at least once - used for delay adaptation)
4. confirmed (reception has been confirmed, packet may be replaced
   with the same packet from the next set)

The list of packets is browsed cyclically to move the "head" and "tail"
pointers.
I have also tried another approach with a more sophisticated packet
manager based on linked lists, but it is not fully debugged and not
ready for release yet. However, the approach with cyclic browsing is
sufficient, as an additional delay between packets had to be
introduced anyway to achieve optimal transmission.

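The packet attributes and the cyclic browsing described above can be
modelled in software roughly as follows. This is only a sketch in C,
not the VHDL used in desc_manager: the field widths, the name
pkt_desc, the function advance_tail and the rule that a freed slot
immediately becomes valid again are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SET_LEN 32  /* packets per set, as in the example design */

/* Per-packet attributes named in the text; field widths are assumed. */
struct pkt_desc {
    uint16_t set;        /* set number */
    bool     valid;      /* ready to be sent */
    bool     sent;       /* has been sent at least once */
    bool     confirmed;  /* reception has been confirmed */
};

/*
 * Move the tail over confirmed slots, reusing each of them for the
 * same packet from the next set. Stops at the first unconfirmed slot
 * or after one full pass. Returns the new tail index.
 */
static unsigned advance_tail(struct pkt_desc d[SET_LEN], unsigned tail)
{
    assert(tail < SET_LEN);
    for (unsigned n = 0; n < SET_LEN && d[tail].confirmed; n++) {
        d[tail].set++;              /* slot now belongs to the next set */
        d[tail].valid = true;       /* assumes new data is available */
        d[tail].sent = false;
        d[tail].confirmed = false;
        tail = (tail + 1) % SET_LEN;
    }
    return tail;
}
```

A slot that is confirmed out of order (for example slot 3 while slot 2
is still pending) stays untouched until the tail reaches it.
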
If the data packets are sent too quickly, the acknowledge
packets from the embedded system are received too late,
and a packet is retransmitted before its acknowledge arrives.
The same may occur if the embedded system is overloaded
with packets from different slaves and drops some packets.

Therefore, paradoxically, resending packets as soon as possible
does not provide the maximal throughput, and a delay between
packets must be introduced.
Of course, if this delay is too big, the transmission also slows down.

To find the optimal delay, I have implemented a simple adaptive
algorithm based on analysis of the ratio between the number of
retransmitted packets and the number of all sent packets: Nretr/Nall.
If the data packets are sent too quickly, the ratio Nretr/Nall
increases, indicating that the delay should be higher.
If the ratio Nretr/Nall is close to 0, we may reduce the delay.
Such a simple algorithm works quite satisfactorily.

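The idea of the delay adaptation can be sketched in C as shown here.
This is not the algorithm from the FPGA sources: the thresholds
(10% and 1%), the step size, the upper limit and the function name
adapt_delay are assumptions chosen only to make the example concrete.

```c
#include <assert.h>
#include <stdint.h>

/* All thresholds and step sizes are illustrative assumptions. */
#define RETR_HIGH_PCT 10u     /* raise the delay above 10% retransmissions */
#define RETR_LOW_PCT   1u     /* lower the delay below 1% retransmissions */
#define DELAY_STEP    16u
#define DELAY_MAX  65535u

/* Return the new inter-packet delay for the measured Nretr and Nall. */
static uint32_t adapt_delay(uint32_t delay, uint32_t n_retr, uint32_t n_all)
{
    assert(delay <= DELAY_MAX && n_retr <= n_all);
    if (n_all == 0)
        return delay;                     /* nothing measured yet */

    /* Compare Nretr/Nall with the thresholds without a division. */
    uint64_t scaled = (uint64_t)n_retr * 100u;
    if (scaled > (uint64_t)n_all * RETR_HIGH_PCT)     /* too many resends */
        return delay + DELAY_STEP > DELAY_MAX ? DELAY_MAX
                                              : delay + DELAY_STEP;
    if (scaled < (uint64_t)n_all * RETR_LOW_PCT)      /* ratio near 0 */
        return delay > DELAY_STEP ? delay - DELAY_STEP : 0;
    return delay;                         /* ratio acceptable, keep delay */
}
```

Calling such a function once per measurement window (for example once
per set) and then clearing both counters keeps the ratio responsive
to changes in the load of the embedded system.
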
In the embedded system, the fpga_l3_fade.ko driver allows you
to service multiple FPGA slaves connected to different network
interfaces.
The "max_slaves" parameter lets you set the maximum number of
slaves when the module is loaded.

After that, you can open the /dev/l3_fpga0, /dev/l3_fpga1 ...
devices to connect to different slaves.
To connect one of those devices to a particular FPGA slave,
you need to use the ioctl command L3_V1_IOC_STARTMAC
(please see the attached receiver2.c application for
an example).
The data received from the FPGA are placed in a kernel
buffer (each subdevice has its own buffer), which may be mmapped
into the user space application, providing very quick access
to the data. Two other ioctl commands, L3_V1_IOC_READPTRS
and L3_V1_IOC_WRITEPTRS, allow you to read the head and tail
pointers in this buffer and to confirm reception of data.
The attached receiver2.c application uses the described
mechanisms and simply tests whether the connected FPGA slave
sends consecutive 32-bit integers.

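The kind of test performed on the mmapped buffer can be sketched in
plain C. This is not the code of receiver2.c and it does not call the
driver: the argument formats of the ioctl commands are not described
here, so the sketch works on an ordinary array. It assumes that head
and tail are indices counted in 32-bit words and that the integers
are stored in host byte order; the function name check_consecutive
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Check that the words stored between tail and head of a circular
 * buffer of n_words 32-bit words continue the sequence that starts
 * at *expected. Returns the number of words checked, or -1 at the
 * first mismatch. *expected is advanced past every matching word.
 */
static long check_consecutive(const uint32_t *buf, size_t n_words,
                              size_t tail, size_t head, uint32_t *expected)
{
    assert(n_words > 0 && tail < n_words && head < n_words);
    long count = 0;
    while (tail != head) {
        if (buf[tail] != *expected)
            return -1;
        (*expected)++;                 /* wraps modulo 2^32 */
        tail = (tail + 1) % n_words;   /* wrap at the end of the buffer */
        count++;
    }
    return count;
}
```

In a real application the new tail position would then be written
back to the driver to confirm reception and free the buffer space.
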
DISCLAIMER:
The published sources are "the first iteration". They work for me,
but I do not provide any warranty. You can use them only at your
own risk!

I hope to prepare a new, more mature version, which will be
described in an "official" publication (I'll send the reference
when it is ready).

I'll also publish further versions of the sources on my website:
http://www.ise.pw.edu.pl/~wzab/fpga_l3_fade

LICENSING:
1. My kernel driver is released under the GPL license.
2. My user space application is public domain.
3. My FPGA code is published under the BSD license.
4. I also include a very slightly modified Ethernet MAC
   http://opencores.org/project,ethernet_tri_mode
   which is published under the LGPL.
5. Due to licensing issues I can include only xco files for blocks
   generated by Xilinx tools (in the case of the sources for the
   Spartan 3E Starter Kit, instead of the binary dcm1.xaw file
   I had to include the generated dcm1.vhd file to avoid a binary
   attachment in the shar archive).
   I hope that you'll be able to rebuild my design with them.

REBUILDING of FPGA CORES
The sources are split into two sections:
FPGA_with_MAC - this is the older version with the Ethernet MAC taken from
                http://opencores.org/project,ethernet_tri_mode
FPGA_no_MAC   - this is the newer version with the Ethernet MAC removed;
                instead, two small state machines are implemented in
                ethernet_sender_X and ethernet_receiver_X (X=4 or 8),
                controlling the PHY directly.

My sources have been tested with three boards: SP601, Atlys and
Spartan-3E Starter Kit. In the FPGA subdirectory there are
three subdirectories: sp601, atlys and sk3e. In each of those
subdirectories there is a "build.sh" script, which
should recreate the .bit file needed to configure the particular
board.

If you create something based on my work, I'll be glad if you
provide information about my project (especially if you cite my
article, after it is ready and published).

EXPERIMENTAL "JUMBO FRAMES" BASED IMPLEMENTATION FOR 1Gb/s and 10Gb/s LINKS
In the directory experimental_jumbo_frames_version you can find
the experimental version of my protocol, working with the 10Gb/s link on the
KC705 board and with the 1Gb/s link on the Atlys board.
It uses longer "jumbo frames" with 8192 bytes of user data to transmit
data from the FPGA.
The high-speed operation has exposed serious disadvantages of the previous
implementation. E.g. the concept of "sets" of packets has been dropped,
and instead packets are numbered sequentially (modulo 2^32) in the data
stream.
Additionally, it is now possible to send user-defined commands (16-bit
command code, 32-bit command argument, 12-byte return value (with 8 bytes
defined by the user)) to the FPGA.
The design has been initially tested and is working, but it still
needs some improvements.
After the cleanup, this approach will also be ported to the version
working with standard frames.
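
Numbering packets modulo 2^32, as in the jumbo-frame version, means
that a plain "less than" comparison fails when the counter wraps
around. A common way to compare such numbers is sketched here in C.
It is a generic technique, not code from the experimental sources,
and the name seq_before is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if packet number a precedes b, with numbers taken modulo 2^32. */
static bool seq_before(uint32_t a, uint32_t b)
{
    /*
     * The unsigned difference wraps modulo 2^32, so a precedes b
     * when b is ahead of a by less than half of the number space.
     */
    uint32_t ahead = b - a;
    return ahead != 0 && ahead < 0x80000000u;
}
```

With this comparison, packet 0xfffffffe is correctly treated as older
than packet 1, provided that the two numbers are never more than 2^31
packets apart.
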