\section{Hardware implementation}
\label{sec:hardware_implementation}
The purpose of this assignment was to find and implement the protocol that best matches a clear set of requirements. Chapter~\ref{sec:interlaken} identified this protocol and described it in detail. This section focuses on the implementation of Interlaken on an FPGA and on the hardware provided to test it. Nikhef provided the author with a Xilinx VC707 Evaluation Board~\cite{VC707} for testing. The board is depicted in Figure~\ref{Fig:VC707_Nikhef}.

\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{VC707_Nikhef.jpg}
\caption{The Virtex-7 VC707 Board provided by Nikhef.}
\label{Fig:VC707_Nikhef}
\end{figure}

%The VC707 features a Virtex-7 XC7VX485T-2FFG1761C \cite{Virtex-7} which contains about 486k logic cells and 37 kb ram. The board also features among others 1 GB DDR3, 128 MB Flash, USB 2.0 ULPI transceiver, GTX transceivers, 10/100/1000 tri-speed Ethernet PHY, HDMI codec, PCI Express lanes (Gen 1 x8 \& 2 x8) and SFP+ connector.

The VC707 Evaluation Board contains 27 accessible GTX transceivers according to the documentation. Eight are wired to the PCI Express x8 connector and sixteen are connected to the FMC connectors. Of the remaining three, one is wired to the SMA connectors, one is connected to the SFP/SFP+ connector and the last is used together with the Ethernet PHY for the SGMII connection. Only two GTX transceivers are therefore immediately accessible for communication with other boards or products: those wired to the SMA and SFP+ connectors. The GTX transceivers support line rates up to 12.5~Gbps when the QPLL is used instead of the CPLL, which comfortably covers the 10~Gbps target line rate~\cite{GTXT}. The difference between the PLL types and the clocks they generate is explained in Section~\ref{subsec:Hardware_Transceiver}.
The transceiver documentation even mentions specifically that Interlaken is supported at a line rate of 10.3125~Gbps. The QPLL frequency would then be 10.3125~GHz, since all data is serialized and transmitted over the line. This chapter contains separate sections describing the transmitter, receiver and transceiver parts. Where IP cores are used, the vendor and version are noted.
%while the reference clock should be 161,13 MHz. This is understandable since the 10,3125 GHz is a serial transmission and when this is converted to a parallel bus the same speed has to be maintained. This can be calculated dividing 10,3125 GHz by the 64-bits which indeed results in the 161,13 MHz.

\subsection{Transmitter}
The transmitter side is described and designed first. This gives more insight into the framing and encoding of the data, which makes it easier to remove the framing and decode the data at the receiving side. Figure~\ref{Fig:Hardware_TX} gives an overview of the complete transmitter side of the interface. Only useful data is stored, and the FIFO can signal to other logic that it is full. The framing components have to communicate with each other, because room must be made in the data stream for the words they insert.

\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{Hardware_TX.png}
\caption{Overview of the TX block diagram.}
\label{Fig:Hardware_TX}
\end{figure}

\subsubsection[FIFO]{FIFO \hfill OSI Layer 2}
The FIFO acts as a buffer that temporarily stores data while framing words are inserted or when the interface cannot accept more data. The user may need some time to respond and stop presenting new data at the interface input, so the FIFO holds this data in the meantime. Storing everything that appears at the input would be very inefficient, since not all of it has to be transferred and the user interface may not be providing new data at all.
To counter this problem a separate state machine has been added that responds to the SOP and EOP signals. When an SOP is detected the FIFO is allowed to store data, and when an EOP is detected storing stops. The SOP and EOP signals are stored in the FIFO as well, like the other user interface signals that have to be included in the burst control words. If bursts are to be generated according to the optional scheduling enhancement, an extra feature will be required to read the amount of data already in the FIFO. The current design uses the Xilinx FIFO Generator 13.1 IP core.

\subsubsection[Bursts]{Bursts \hfill OSI Layer 2}
The conversion from data packets to complete bursts happens in a separate component. It expects certain input signals that the application developer should provide: besides the data input itself, for example, an SOP and an EOP signal indicating the start and end of a packet. The Burst component contains a state machine that remains idle until bursts are enabled and a start of packet is detected. This triggers the state machine, and the data arriving from the FIFO is read. The data is stored in several pipelined registers so that it can be packed into burst words. As explained in Section~\ref{subsec:interlaken_bursts}, a burst control word has to be sent first. The data follows, and this continues as long as the state machine does not detect an EOP. Every data word processed increments the word counter by one, because a burst may not exceed the maximum length BurstMax. When this value is reached the state machine switches to another state for one cycle, in which a burst control word is output, and then returns to processing data. When an EOP signal is detected the state machine switches to another state.
Reading the FIFO is stopped and a control word containing the EOP and CRC-24 follows at the output. After this it is important to check the word counter. If the data transmitted is shorter than the predefined BurstShort, the control word is followed by one or more idle words until the transmitted length equals BurstShort, after which the state machine waits for an SOP signal again. If the number of words transmitted lies between BurstShort and BurstMax, no idle words are needed.\\
An implementation of the optional scheduling enhancement is recommended but has not yet been developed.

\subsubsection[Meta Frame]{Meta Framing \hfill OSI Layer 2}
This component adds the meta frame words to the data stream; as discussed in Section~\ref{subsec:interlaken_metaframe}, these are four words. It contains a state machine that counts the data words that pass. When transmission starts, the four control words appear at the output. During these four cycles data is read and pipelined, so that data follows immediately once the control words have passed. When the word count of the passed data, including the several data words, reaches MetaFrameLength, the state changes. First the pipelined data is output, so it takes several cycles before all data has left the component. After this the framing words are output again to complete the meta frame, and during this process input data already enters the pipeline registers again. The cycle then repeats, so a complete meta frame always appears at the output. One last point to consider is that reading of the FIFO must also pause for four clock cycles, otherwise there is no room for the framing words.

\subsubsection[CRC generation]{CRC generation \hfill OSI Layer 2}
Bursts and meta frames both contain a variant of the CRC; generating it is explained in this subsection.
Both use the same method, since each CRC covers a certain number of words as well as the word that will later carry the CRC itself. The CRC component needs two clock cycles before the checksum appears at its output. The data therefore enters the CRC component and is at the same time kept in its original state in parallel registers, because it has yet to be transmitted. Since the control word containing the reserved space for the CRC is also covered by the CRC, the data actually has to be held in the parallel registers for two clock cycles. This makes it fairly simple to put the generated CRC in the control word, since the word and the CRC are then available in the same clock cycle. It is then just a matter of moving the bits to the right position in the control word. Figure~\ref{Fig:Hardware_CRC} illustrates this.

\begin{figure}[H]
\centering
\includegraphics[width=0.9\textwidth]{Hardware_CRC.png}
\caption{Used method generating CRC.}
\label{Fig:Hardware_CRC}
\end{figure}

\subsubsection[Flow control]{Flow control \hfill OSI Layer 2}
Flow control is an important aspect that has to be added to the design. It will be included in the form of the out-of-band flow control discussed earlier.

\subsubsection[Scrambler]{Scrambler \hfill OSI Layer 1}
The scrambler receives a 64-bit data input from the meta framing component. The scrambled output is also 64 bits wide. In addition, the scrambler inspects certain bits of the data input, because the synchronization and scrambler state words have to be transmitted unscrambled. The block types have already been placed by the meta framing, so when the control pin is a logic '1' and the block type indicates a synchronization or scrambler state word, the accompanying bits are added. The data control output is a logic '1' whenever a control word appears at the output.
The scrambler polynomial has been defined before, and Appendix B of the Interlaken Protocol Definition already includes a piece of code showing how to continuously generate the output and the new state of the polynomial. As this code is written in Verilog, only the part generating the polynomial was used, translated to VHDL.

\begin{figure}[H]
\centering
\includegraphics[width=0.6\textwidth]{Hardware_Scrambler.png}
\caption{Overview of the Scrambler block.}
\label{Fig:Hardware_Scrambler}
\end{figure}

\subsubsection[Encoder]{Encoder \hfill OSI Layer 1}
The encoder accepts the 64-bit data signal (Data\_In) and the control word indicator (Data\_Control) from the scrambler as inputs. It also has clock, reset and enable inputs. When the encoder is enabled, the preamble header is added to every word. The two control/data indication bits depend on Data\_Control: a logic '1' indicates a control word and '10' is added, while a logic '0' indicates a data word and '01' is added. This always yields a valid indication pattern. The inversion bit is also determined here. A separate variable is reset every clock cycle and counts the disparity of the incoming data Data\_In using a for-loop. This value is compared with the running disparity of the data just transmitted, which is kept in a separate variable. If both contain a majority of ones, or both a majority of zeros, the bits of the incoming data are inverted. The transmitted data is then moved into this variable, so that the next input is compared against it. The data output follows as a 67-bit value and is ready to be processed by the SerDes. Figure~\ref{Fig:Hardware_Encoder} displays the entity of the Encoder.
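
The inversion decision of the encoder can be written compactly. The following is a sketch under the assumption, taken from the description in this subsection, that disparity is counted over the 64 payload bits only; the symbols $d$, $D$, $w$ and $\mathit{inv}$ are introduced here for illustration and are not names from the design files. For an incoming word $w$ with bits $w_{63} \ldots w_0$, the word disparity is the number of ones minus the number of zeros:
\[
d(w) = \sum_{i=0}^{63} (2 w_i - 1), \qquad -64 \le d(w) \le 64 .
\]
With $D$ the stored disparity value that the incoming word is compared against, the inversion bit and the transmitted payload $w'$ follow as
\[
\mathit{inv} = \left\{ \begin{array}{ll} 1 & \mbox{if } d(w) \cdot D > 0 \\ 0 & \mbox{otherwise} \end{array} \right. , \qquad
w' = \left\{ \begin{array}{ll} \overline{w} & \mbox{if } \mathit{inv} = 1 \\ w & \mbox{otherwise.} \end{array} \right.
\]
Inverting every bit negates the disparity, $d(\overline{w}) = -d(w)$, so an inverted word always counteracts the stored value. As an example, a word with 40 ones and 24 zeros has $d(w) = 40 - 24 = 16$; if $D = +6$, the product is positive, the word is inverted and the transmitted payload has disparity $-16$. Since every 64-bit word leaves the encoder as 67 bits, the fraction of line bits that carries these words is $64/67 \approx 0.955$.
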
\begin{figure}[H]
\centering
\includegraphics[width=0.6\textwidth]{Hardware_Encoder.png}
\caption{Overview of the Encoder block.}
\label{Fig:Hardware_Encoder}
\end{figure}

\newpage
\subsection{Receiver}
The receiving side is responsible for restoring the data to its original form. The knowledge gained while developing the transmitter gives a better understanding of how to approach the receiver. Where the transmitting side adds framing words and encodes the data, the receiving side has to decode the data and remove the framing.

\subsubsection[Deframing]{Deframing \hfill OSI Layer 2}
Deframing is simply removing the control words, which are identified by the control word indication signal. Several indicators have to be read and processed before the words are deleted, however. For example, when an SOP or EOP is detected, the user interface drives its SOP or EOP pin high.

\subsubsection[CRC checking]{CRC checking \hfill OSI Layer 2}
Error checking is done by generating the CRC again, as at the transmitter side. The generated value is compared with the CRC value received in the control words. If they match, the data arrived without errors. If they do not match, the data was corrupted and is not identical to the data at the transmitting side.

\subsubsection[Flow control]{Flow control \hfill OSI Layer 2}
The receiver has to constantly check the

\subsubsection[Descrambler]{Descrambler \hfill OSI Layer 1}
Data leaving the decoder has yet to be descrambled. After start-up the descrambler does not process any input data but looks for the unscrambled synchronization words. The control word indicator from the decoder is read. If it indicates a control word and the block type is that of a synchronization word, the data is compared with the predefined sync data. If these are identical, the state machine moves to another state and two counters start. The number of passing words is counted.
After MetaFrameLength words the synchronization word has to appear again. If it does, the sync word counter is incremented by one. When this counter reaches four, the state machine moves to the next state, indicating a lock. While in lock, all words at the input are descrambled, except the sync and scrambler state words. The synchronization words are still checked, and if one is not identical to the expected word, the sync word error counter is incremented by one. After the sync word the scrambler state word is expected to arrive. If it matches the current state of the polynomial the status remains locked; otherwise the scrambler state mismatch counter is incremented by one. When the sync word error counter reaches four or the scrambler state mismatch counter reaches three, a reset follows. The descrambler loses its lock and has to look for synchronization words again to regain it.

\begin{figure}[H]
\centering
\includegraphics[width=0.6\textwidth]{Hardware_Descrambler.png}
\caption{Overview of the descrambler block.}
\label{Fig:Hardware_Descrambler}
\end{figure}

\subsubsection[Decoder]{Decoder \hfill OSI Layer 1}
Data and control words entering the decoder are 67 bits wide. The preamble has to be removed, which returns the data to the 64-bit width it had before encoding. Unfortunately the preamble bits cannot be removed directly, since their exact position in the word is unknown. Because the words have been serialized and parallelized again, the preamble bits could for example have moved to positions 42:40 instead of the expected 66:64. The method to detect these bits and lock onto them is described in the Interlaken Protocol Definition. \\
Since the preamble always contains at least one transition, looking for transitions is the key to finding the preamble location in repeating words.
Transitions in the data input are detected using a logic XOR with inputs Data(i) and Data(i-1). Variable i loops from 66 to 0 so that all transitions are registered. If the two bits are equal, no transition occurred and the XOR outputs a logic '0'. If they differ, a transition occurred and the XOR outputs a logic '1'. The result is stored in a new 67-bit signal T1, which then contains a logic '1' at every position where the input data contains a transition. T1 is combined by a logic AND with T2, the stored result of the earlier words, or with the reset value during start-up. This shows which transitions recur in every word. Since this is done in parallel for all bits, every transition that occurs in both words causes the AND to output a logic '1'. The complete result is copied to signal T2. This process repeats until a single recurring transition is left. Positions where a transition does not always recur are stored in T2 as a logic '0', which indicates that the position does not contain the preamble. It is possible that no legal repeating sync pattern can be detected; T2 then quickly fills with zeros and a reset follows. Figure~\ref{Fig:Hardware_DecoderPt1} gives an overview of the hardware design discussed so far.

\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{Hardware_DecoderPt1.png}
\caption{Overview of the decoder input.}
\label{Fig:Hardware_DecoderPt1}
\end{figure}

While patterns repeat, another counter keeps track of how many consecutive legal sync patterns have passed. When no repeating pattern is found, the counter is reset.
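
The transition search described before Figure~\ref{Fig:Hardware_DecoderPt1} can be summarised in two expressions. This is a sketch in the notation of the text; the word index $n$, the restriction to $1 \le i \le 66$ and an all-ones reset value for T2 are assumptions made here, since the text does not state which neighbouring bit is used at $i = 0$ or what the reset value is. For the $n$-th received 67-bit word,
\[
T1^{(n)}_i = \mathit{Data}^{(n)}_i \oplus \mathit{Data}^{(n)}_{i-1}, \qquad 1 \le i \le 66,
\]
\[
T2^{(n)}_i = T1^{(n)}_i \wedge T2^{(n-1)}_i .
\]
A bit of T2 that has dropped to '0' cannot return to '1' without a reset, so the set of candidate positions can only shrink from word to word. As an example, if $T2^{(n-1)}$ has ones at positions 65, 41 and 12 and the next word has transitions at positions 65, 30 and 12, then $T2^{(n)}$ keeps ones at positions 65 and 12 only.
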
If the pattern persists, contains a single recurring transition and the counter reaches 64 consecutive legal patterns, the decoder declares a word lock. On entering the lock state, the sync header location is saved so that sync errors can be detected later. The word lock is maintained as long as fewer than sixteen sync header errors occur per 64 words. The error and word counters are reset after every 64 words. If sixteen or more sync errors occur within 64 words, the decoder loses its lock and is reset. After this the whole procedure restarts. Figure~\ref{Fig:Hardware_DecoderPt2} depicts the hardware schematic for the locked state. As explained, it is checked whether the input data still has a transition at the location that was locked on. The state machine is continuously updated with the result.

\begin{figure}[H]
\centering
\includegraphics[width=\textwidth]{Hardware_DecoderPt2.png}
\caption{Overview of the decoder in lock.}
\label{Fig:Hardware_DecoderPt2}
\end{figure}

While in lock the input data is processed. If the data was inverted this is reversed, so that the data always leaves the decoder with its original polarity. The two word indication bits are also removed, and a separate signal leaves the component to indicate that a control word appears at the output. This is also shown in Figure~\ref{Fig:Hardware_Decoder}, which shows two additional signals leaving the decoder. These signals make lock and error events visible to the logic interfacing with the Interlaken core.

\begin{figure}[H]
\centering
\includegraphics[width=0.6\textwidth]{Hardware_Decoder.png}
\caption{Overview of the decoder block.}
\label{Fig:Hardware_Decoder}
\end{figure}

Additionally there will always be a transition to lock on, since the preamble added by the encoder is always present.
Even when there is no data to transmit, the scrambler continues to output data containing varying transitions, which makes it possible to obtain or maintain the locked state.

\subsubsection{Complete receiving interface}

\subsection{Transceiver}
\label{subsec:Hardware_Transceiver}
Separate hardware is required to generate the 10~Gbps link itself.\\
The QPLL is used because the line rate lies above the operating range of the CPLL.\\
According to the transceiver manual UG476, Interlaken can run at 10.3125~Gbps. This requires a system clock of $10.3125~\mathrm{Gbit/s} / 64~\mathrm{bit} = 161.13~\mathrm{MHz}$, or $10.0~\mathrm{Gbit/s} / 64~\mathrm{bit} = 156.25~\mathrm{MHz}$ for a line rate of 10.0~Gbps.

\newpage
\subsection{Current state of progress}
This subsection describes the current progress of the hardware development of the Interlaken protocol. Two tables are presented: Table~\ref{Tab:Transmitter_Status} covers the transmitter side and Table~\ref{Tab:Receiver_Status} gives an overview of the status on the receiver side. This matches the structure in which the hardware is developed.

%Options to customize the table easily
\taburowcolors[2] 2{tableLineOne .. tableLineTwo}
\tabulinesep = ^2mm_2mm
\everyrow{\tabucline[.3mm white]{}}

\begin{table}[H]
\begin{tabu} to \textwidth {X[1.5] X[1] X[1] X[3]}
\tableHeaderStyle Function & Simulation & Hardware & Comments \\
Generating bursts & Done & - & Working in simulation as intended (simple packet mode)\\
Meta Framing & Done & - & Working in simulation\\
Scrambling & Done & - & Working in simulation; number of input pins may change\\
Encoding & Done & - & Working in simulation\\
CRC generating & Done & - & Working in simulation\\
Flow control & - & - & Not started yet; bits reserved\\
\end{tabu}
\caption{Overview of progress on the transmitter side.}
\label{Tab:Transmitter_Status}
\end{table}

\begin{table}[H]
\begin{tabu} to \textwidth {X[1.5] X[1] X[1] X[3]}
\tableHeaderStyle Function & Simulation & Hardware & Comments \\
Deframing & Done & - & Working in simulation\\
Descrambler & Done & - & Working as expected in simulation\\
Decoder & Done & - & Working in simulation\\
CRC checking & Done & - & Working in simulation\\
Flow control & - & - & Not started yet; bits read but static\\
\end{tabu}
\caption{Overview of progress on the receiver side.}
\label{Tab:Receiver_Status}
\end{table}
\newpage