# FEATURES

- Implements UDP, IPv4, ARP protocols
- Zero latency between UDP and MAC layer (combinatorial transfer during user data phase)
  - See simulation diagram below
- Allows full control of UDP src & dst ports on TX
- Provides access to UDP src & dst ports on RX (user filtering)
- Couples directly to Xilinx Tri-Mode Ethernet MAC via AXI interface
- Separate building blocks to create custom stacks
- Easy to tap into the IP layer directly
- Supports TX and RX with IP layer broadcast address
- Separate clock domains for TX & RX paths
- Choice of smaller single-slot ARP or multislot ARP with up to 255 slots
- Tested for 1 Gbit Ethernet, but applicable to 100M and 10M

#### SIMULATION DIAGRAM SHOWING ZERO LATENCY ON RECEIVE

# LIMITATIONS

- Does not handle segmentation and reassembly
  - Assumes packets offered for transmission will fit in a single Ethernet frame
  - Discards received packets if they require reassembly

# OVERALL BLOCK DIAGRAM

# STRUCTURAL DECOMPOSITION

#### ARP BLOCK OPTIONS

ARP can be instantiated in one of the following options:

- arp: simple 1-slot ARP layer with timeout
- arpv2: multislot ARP layer with timeout

These can be selected in the IP\_Complete\_nomac.vhd file by commenting out the appropriate line:

```
-- for arp_layer : arp use entity work.arp;    -- single slot arbitrator
for arp_layer : arp use entity work.arpv2;     -- multislot arbitrator
```

#### ARP V2 BLOCK

Legend: RX clock domain, TX clock domain

# INTERFACE

```
entity UDP_Complete_nomac is
    Port (
        -- UDP TX signals
        udp_tx_start          : in  std_logic;                      -- indicates req to tx UDP
        udp_txi               : in  udp_tx_type;                    -- UDP tx cxns
        udp_tx_result         : out std_logic_vector (1 downto 0);  -- tx status (changes during tx)
        udp_tx_data_out_ready : out std_logic;                      -- indicates udp tx is ready to take data
        -- UDP RX signals
        udp_rx_start          : out std_logic;                      -- indicates receipt of udp header
        udp_rxo               : out udp_rx_type;
        -- IP RX signals
        ip_rx_hdr             : out ipv4_rx_header_type;
        -- system signals
        rx_clk                : in  STD_LOGIC;
        tx_clk                : in  STD_LOGIC;
        reset                 : in  STD_LOGIC;
        our_ip_address        : in  STD_LOGIC_VECTOR (31 downto 0);
        our_mac_address       : in  std_logic_vector (47 downto 0);
        control               : in  udp_control_type;
        -- status signals
        arp_pkt_count         : out STD_LOGIC_VECTOR(7 downto 0);   -- count of arp pkts received
        ip_pkt_count          : out STD_LOGIC_VECTOR(7 downto 0);   -- number of IP pkts received for us
        -- MAC Transmitter
        mac_tx_tdata          : out std_logic_vector(7 downto 0);   -- data byte to tx
        mac_tx_tvalid         : out std_logic;                      -- tdata is valid
        mac_tx_tready         : in  std_logic;                      -- mac is ready to accept data
        mac_tx_tfirst         : out std_logic;                      -- indicates first byte of frame
        mac_tx_tlast          : out std_logic;                      -- indicates last byte of frame
        -- MAC Receiver
        mac_rx_tdata          : in  std_logic_vector(7 downto 0);   -- data byte received
        mac_rx_tvalid         : in  std_logic;                      -- indicates tdata is valid
        mac_rx_tready         : out std_logic;                      -- tells mac that we are ready to take data
        mac_rx_tlast          : in  std_logic                       -- indicates last byte of the frame
    );
end UDP_Complete_nomac;
```

# THE AXI INTERFACE

This implementation makes extensive use of the AXI interface (axi.vhd):

#### MAC INTERFACE

The MAC interface is fairly simple, with separate clocks for receiver and transmitter. Each interface (RX and TX) is based on the AXI interface and has an 8-bit data bus, a valid signal, a last-byte signal, and a backchannel signal to indicate that the other end is ready to accept data.

The transmit interface has an additional signal (mac\_tx\_tfirst) which can be used by MAC blocks that need an indication of the start of frame. This signal is asserted simultaneously with the first byte to be transmitted (provided that tready is high).

On the following diagram, tx\_clk and rx\_clk are shown sourced from the MAC transmit and receive blocks, but they can come from an independent clock generator that feeds clocks to both the MAC blocks and the UDP\_IP\_stack. Data is clocked on the rising edge.
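The handshake described above can be illustrated with a small stand-alone byte source. This is a minimal sketch, not code from the stack: the entity name `axi_byte_source`, the generic `FRAME_LEN`, the `start` input and the counting payload are all hypothetical. It also assumes the conventional valid/ready rule (a byte transfers on a rising clock edge where both tvalid and tready are high) and holds tfirst with the first byte until that byte is accepted, which may differ in detail from how the stack drives mac\_tx\_tfirst.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example source (not part of the stack): sends FRAME_LEN bytes
-- (0, 1, 2, ...) each time start is pulsed while idle.
entity axi_byte_source is
    generic (
        FRAME_LEN : positive := 4                     -- bytes per frame (illustrative)
    );
    port (
        clk    : in  std_logic;
        reset  : in  std_logic;                       -- synchronous, active high
        start  : in  std_logic;                       -- pulse to send one frame
        tdata  : out std_logic_vector(7 downto 0);    -- data byte
        tvalid : out std_logic;                       -- tdata is valid
        tfirst : out std_logic;                       -- marks first byte of frame
        tlast  : out std_logic;                       -- marks last byte of frame
        tready : in  std_logic                        -- sink can accept a byte
    );
end entity axi_byte_source;

architecture rtl of axi_byte_source is
    signal sending : std_logic := '0';
    signal index   : natural range 0 to FRAME_LEN - 1 := 0;
begin
    -- Outputs depend on registered state only, never on tready.
    tvalid <= sending;
    tdata  <= std_logic_vector(to_unsigned(index mod 256, 8));
    tfirst <= '1' when sending = '1' and index = 0 else '0';
    tlast  <= '1' when sending = '1' and index = FRAME_LEN - 1 else '0';

    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                sending <= '0';
                index   <= 0;
            elsif sending = '0' then
                -- Idle: wait for a request to send a frame.
                if start = '1' then
                    sending <= '1';
                    index   <= 0;
                end if;
            elsif tready = '1' then
                -- tvalid and tready both high: the current byte is accepted.
                if index = FRAME_LEN - 1 then
                    sending <= '0';                   -- last byte done, back to idle
                    index   <= 0;
                else
                    index <= index + 1;               -- present the next byte
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

When tready is low the source simply holds tdata, tvalid, tfirst and tlast unchanged, so the sink can stall the transfer for any number of cycles without losing a byte.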
# SYNTHESIS STATS

451 occupied slices on Xilinx xc6vlx240t (1%) (687 flipflops, 1294 LUTs)

Test synthesis using Xilinx ISE 13.4

| Architecture     | Slices | FF / LUTs   | Block RAMs | % slices used |
|------------------|--------|-------------|------------|---------------|
| ARP (1 slot)     | 490    | 684 / 1283  | 0          | 1%            |
| ARPv2 (255 slot) | 674    | 1139 / 1822 | 2          | 1%            |

# MODULE DESCRIPTION: UDP\_COMPLETE\_NOMAC

Simply wires up the following blocks:

- UDP\_TX
- UDP\_RX
- IP\_Complete\_nomac

Propagates the IP RX header info to the UDP\_Complete\_nomac module interface.

# MODULE DESCRIPTION: UDP\_TX AND UDP\_RX

#### UDP\_TX

- Very simple FSM to capture data from the supplied UDP TX header and send out a UDP header.
- Asserts data ready when in the user data phase, and copies bytes from the user-supplied data.
- Assumes the user will supply the UDP checksum (the spec allows the checksum to be zero).

#### UDP\_RX

- Very simple FSM to parse the UDP header from data supplied by the IP layer, and then to pass user data from the IP layer to the interface (asserts udp\_rxo.data.data\_in\_valid).
- Discards IP pkts until it gets one with protocol = 0x11 (UDP pkt).

# MODULE DESCRIPTION: IPV4

Simply wires up the following blocks:

- IPv4
- ARP
- TX\_arbitrator

ARP reads the MAC RX data in parallel with the IPv4 RX path. ARP is looking for ARP pkts, while IPv4 is looking for IP pkts.

IPv4 interacts directly with the ARP block during TX to ensure that the transmit destination MAC address is known.

TX\_arbitrator controls access to the MAC TX layer, as both ARP and IPv4 may want to transmit at the same time.

# MODULE DESCRIPTION: IPV4\_TX

IPv4\_TX comprises two simple FSMs:

- one to control transmission of the header and user data
- one to calculate the header checksum

To use:

- Set the TX header, and assert ip\_tx\_start.
- The block begins to calculate the header checksum and transmit the header.
- Once in the user data stage, the block asserts ip\_tx\_data\_out\_ready and copies user data over to the MAC TX output.

# MODULE DESCRIPTION: IPV4\_RX

Simple FSM to parse both the Ethernet frame header and the IPv4 header.

Ignores packets that

- Are not IPv4 packets
- Require reassembly
- Are not for our IP address and are not for the broadcast address

Once all these checks are satisfied, the RX header data (ip\_rx.hdr) is valid and the module asserts ip\_rx\_start.

Received user data is available through the ip\_rx.data record.

# MODULE DESCRIPTION: ARP (SINGLE SLOT VERSION)

- Handles receipt of ARP packets
- Handles transmission of ARP requests and timeout if no response is received
- Handles request resolution (checks the ARP cache and requests resolution if not found)

There are three FSMs, one for each of the above functions.

The ARP mapper cache is only 1 deep in this implementation, which means that it is only really suitable for point-to-point communication.

- Use ARPv2 if you want an implementation with more slots

Input signals to the module indicate our IP and MAC addresses.

ARP timeout is configured by generics in the ARP, IP, and UDP modules:

```
CLOCK_FREQ  : integer := 125000000;  -- clock frequency in Hz, used to derive the 1Hz timing signal
ARP_TIMEOUT : integer := 60          -- ARP timeout in seconds
```

CLOCK\_FREQ is used to scale the rx\_clk to produce a 1Hz signal for timing.

ARP\_TIMEOUT specifies the timeout in seconds.

Note: on timeout, ARP does not retransmit the ARP request, but reports a transmit error. To send a further ARP request, send the packet again.
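The CLOCK\_FREQ scaling described above can be sketched as a pair of counters: one counts clock cycles up to one second, the other counts seconds up to the timeout. This is only an illustration of the arithmetic, not the stack's ARP code; the entity name `arp_timeout_sketch` and the ports `waiting` and `timed_out` are hypothetical. With the default generics, the timeout corresponds to 125,000,000 × 60 clock cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch (not the stack's implementation): counts whole seconds
-- while waiting is high and raises timed_out after ARP_TIMEOUT seconds.
entity arp_timeout_sketch is
    generic (
        CLOCK_FREQ  : integer := 125000000;   -- clk frequency in Hz
        ARP_TIMEOUT : integer := 60           -- timeout in seconds
    );
    port (
        clk       : in  std_logic;
        reset     : in  std_logic;            -- synchronous, active high
        waiting   : in  std_logic;            -- high while a reply is outstanding
        timed_out : out std_logic             -- high once ARP_TIMEOUT seconds have elapsed
    );
end entity arp_timeout_sketch;

architecture rtl of arp_timeout_sketch is
    signal cycle_count : integer range 0 to CLOCK_FREQ - 1 := 0;
    signal seconds     : integer range 0 to ARP_TIMEOUT := 0;
begin
    timed_out <= '1' when seconds = ARP_TIMEOUT else '0';

    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' or waiting = '0' then
                -- Not waiting for a reply: clear both counters.
                cycle_count <= 0;
                seconds     <= 0;
            elsif cycle_count = CLOCK_FREQ - 1 then
                -- CLOCK_FREQ cycles have passed, i.e. one second (the 1Hz tick).
                cycle_count <= 0;
                if seconds < ARP_TIMEOUT then
                    seconds <= seconds + 1;
                end if;
            else
                cycle_count <= cycle_count + 1;
            end if;
        end if;
    end process;
end architecture rtl;
```

In this sketch timed\_out stays high until waiting is dropped or reset is asserted, which mirrors the behaviour described in the note: nothing is retransmitted automatically, and the caller decides whether to try again.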
# MODULE DESCRIPTION: ARPV2 (MULTI SLOT VERSION)

- Handles receipt of ARP packets
- Handles transmission of ARP requests and timeout if no response is received
- Handles request resolution (checks the ARP cache and requests resolution if not found)

Decomposed into modules:

- req: handles the request/response protocol and contains a single-slot cache for fast lookup
- store: maintains a map of IP->MAC addresses, configurable in size up to 255
- tx: encodes the "I have" and "who has" ARP TX formats
- rx: decodes the ARP protocols "I have" and "who has"
- sync: performs clock sync between the RX and TX clock domains

The ARPv2 mapper cache is configurable up to 255 slots.

Input signals to the module indicate our IP and MAC addresses.

ARPv2 is configured by generics in the ARP, IP, and UDP modules:

```
CLOCK_FREQ      : integer := 125000000;
ARP_TIMEOUT     : integer := 60;
ARP_MAX_PKT_TMO : integer := 5;
MAX_ARP_ENTRIES : integer := 255
```

CLOCK\_FREQ is used to scale the rx\_clk to produce a 1Hz signal for timing.

ARP\_TIMEOUT specifies the timeout in seconds.

ARP\_MAX\_PKT\_TMO specifies the number of received "I have" ARP responses which don't satisfy our request before timeout.

Note: on timeout, ARP does not retransmit the ARP request, but reports a transmit error. To send a further ARP request, send the packet again.

MAX\_ARP\_ENTRIES specifies the number of slots in the ARP cache (max 255).

# MODULE DESCRIPTION: TX\_ARBITRATOR

FSM to arbitrate access to the MAC TX layer by

- IP TX path
- ARP TX path

One of the sources requests access and must wait until it is granted. Priority is given to the IP path, as that path is expected to have the highest request rate.

# SIMULATION

Every VHDL module has a corresponding RTL simulation test bench. Additionally, there are simulation test benches for various module integrations.

In this version, verification is not completely automatic. The test benches check some things, but much is left to manual inspection of the simulator waveforms.
# TESTBENCH - HW

The HW testbench is built around the Xilinx ML-605 prototyping card. It directly uses the card's 200MHz clocks, Eth PHY (copper) and LEDs to indicate status.

A simple VHDL driver module for the stack replies with a canned response whenever it receives a UDP pkt on a particular IP addr and port number.

The Xilinx LogiCORE IP Virtex-6 FPGA Embedded Tri-Mode Ethernet MAC v2.1 is used to couple the UDP/IP stack to the board's Ethernet PHY. This is used with the standard FIFO user buffering (which adds a one-frame delay). It should also be possible to remove this FIFO to reduce latency.

A laptop provides stimulus by way of one of two Java programs:

- UDPTest.java: writes one UDP pkt, waits for a response, then prints it
- UDPTestStream.java: writes a number of UDP pkts and prints the responses

The test network is a single twisted CAT-6 cable between the laptop and the ML-605 board.

Wireshark (on the laptop) is used to capture the traffic on the wire (sample pcap files are included).

# TEST SETUP

# TESTBENCH HW - ML605 MODULES

- UDP\_Complete: integration of UDP with a MAC layer
- IP\_Complete: integration of the IP layer only with a MAC layer
- UDP\_Integration\_Example: test example with a VHDL process to reply to received UDP packets

# TEST RESULTS

The Xilinx MAC layer used contains a FIFO, which therefore introduces a 1-frame delay. For tightly coupled low-latency requirements, this can be removed.

#### Output from UDPTest:

    Sending packet: 1=45~34=201~18=23~ on port 2000
    Got [@ABC]

#### Output from UDPTestStream:

    Sending price tick 205
    Sending price tick 204
    Sending price tick 203
    Sending price tick 202
    Got [@ABC]
    Got [@ABC]
    Got [@ABC]
    Got [@ABC]
    ...