OpenCores

Rev 5	Rev 8
Line 1...	Line 1...
`\section{Hardware implementation}`	`\section{Hardware implementation}`
`\label{sec:hardware_implementation}`	`\label{sec:hardware_implementation}`

`The purpose of this assignment was to search and implement the best protocol matching a clear set of requirements. In chapter~\ref{sec:interlaken} this best protocol has been found and described in details. This section will focus on the implementation of Interlaken on an FPGA and the hardware provided to test it.`	`The purpose of this assignment was to search and implement the best protocol matching a clear set of requirements. In chapter~\ref{sec:interlaken} this best protocol has been found and described in details. This section will focus on the implementation of Interlaken on an FPGA and the hardware provided to test it.`

`The author has been provided a Xilinx VC707 Evaluation Board~\cite{VC707} by Nikhef to test with.`	`The author has been provided a Xilinx VC707 Evaluation Board~\cite{VC707} by Nikhef to eventually test the implemented design on.`
`The provided board is depicted in Figure~\ref{Fig:VC707_Nikhef}.`	`The provided board is depicted in Figure~\ref{Fig:VC707_Nikhef}.`

`\begin{figure}[H]`	`\begin{figure}[H]`
`\centering`	`\centering`
`\includegraphics[width=\textwidth]{VC707_Nikhef.jpg}`	`\includegraphics[width=\textwidth]{VC707_Nikhef.jpg}`
Line 22...	Line 22...
The included GTX transceivers support transfer speeds up to 12,5 Gbps in case the QPLL is used instead of the CPLL, which is excellent since 10 Gbps is the target line rate~\cite{GTXT}. The difference between PLL types and which clocks they generate will be explained in section \ref{subsec:Hardware_Transceiver}. It is even specifically mentioned that in case of Interlaken a line rate of 10,3125 Gbps would be supported. The QPLL frequency would of course be 10,3125 GHz since all data will be serialized and transmitted over the line.	The included GTX transceivers support transfer speeds up to 12,5 Gbps in case the QPLL is used instead of the CPLL, which is excellent since 10 Gbps is the target line rate~\cite{GTXT}. The difference between PLL types and which clocks they generate will be explained in section \ref{subsec:Hardware_Transceiver}. It is even specifically mentioned that in case of Interlaken a line rate of 10,3125 Gbps would be supported. The QPLL frequency would of course be 10,3125 GHz since all data will be serialized and transmitted over the line.

`This Chapter will contain separate sections describing the transmitter, receiver and transceiver parts. In case IP cores are used this will be noted with the accompanying version and vendor.`	`This Chapter will contain separate sections describing the transmitter, receiver and transceiver parts. In case IP cores are used this will be noted with the accompanying version and vendor.`
`%while the reference clock should be 161,13 MHz. This is understandable since the 10,3125 GHz is a serial transmission and when this is converted to a parallel bus the same speed has to be maintained. This can be calculated dividing 10,3125 GHz by the 64-bits which indeed results in the 161,13 MHz.`	`%while the reference clock should be 161,13 MHz. This is understandable since the 10,3125 GHz is a serial transmission and when this is converted to a parallel bus the same speed has to be maintained. This can be calculated dividing 10,3125 GHz by the 64-bits which indeed results in the 161,13 MHz.`

`\subsection{Transmitter}`	`\subsection{Transmitter side}`
`The transmitter side will be described and designed first. This will deliver more insight in the framing and encoding of the data which will make it easier to remove the framing and decode the transmitted data at the receiving side.`	`The transmitter side will be described and designed first. This will deliver more insight in the framing and encoding of the data which will make it easier to remove the framing and decode the transmitted data at the receiving side.`

`Figure \ref{Fig:Hardware_TX} displays an overview of the complete transmitter side of the interface. Only useful data will be stored and the FIFO can also indicate it is full to other logic. Framing components have to communicate with each other because of the extra space required in between the data flowing.`	`Figure \ref{Fig:Hardware_TX} depicts the complete transmitter side of the interface. Only useful data will be stored and the FIFO can also indicate it is full to other logic. Framing components have to communicate with each other because of the extra space required in between the data flowing.`
`\begin{figure}[H]`	`\begin{figure}[H]`
`\centering`	`\centering`
`\includegraphics[width=\textwidth]{Hardware_TX.png}`	`\includegraphics[width=\textwidth]{Hardware_TX.png}`
`\caption{Overview of the TX block diagram.}`	`\caption{Overview of the TX block diagram.}`
`\label{Fig:Hardware_TX}`	`\label{Fig:Hardware_TX}`
`\end{figure}`	`\end{figure}`

`\subsubsection[FIFO]{FIFO \hfill OSI Layer 2}`	`\subsubsection[TX FIFO]{TX FIFO \hfill OSI Layer 2}`
`The FIFO will act as a buffer temporary storing data when frames are added or the interface can process no more data. It is a possibility the user needs some time to respond and stop the appearance of new data at the interface input so the FIFO would hold this data.`	`The FIFO will act as a buffer temporary storing data when frames are added or the interface can process no more data. It is also especially useful in case the interface needs some extra time before the next frame can be processed. This will often occur during the addition of burst and meta frames.`

It's very inefficient to store all data in the FIFO appearing at the input since it could be possible not all data has to be transferred or the user interface provides no new data. To counter this problem a separate state machine has been added which responds to the SOP and EOP signals. In case a SOP is detected the FIFO will be allowed to store data and in case a EOP is detect the process will be stopped. The SOP and EOP signal will also be stored in the FIFO like other user interface signals which have to be included in the burst control words.	`Another very useful feature of the FIFO is the ability to cross clock domains. Data will be written in the FIFO at a by the user determined rate while to FIFO will be read at the maximum speed the transceiver allows to always keep the line and logic alive/busy.`

`In case the bursts will be generated according to the optional scheduling enhancement, an extra feature will be required to read the amount of data already placed in the FIFO.`	`A separate input port is available to the user which enables the write ability of the FIFO. So only data the user really want to be stored will enter the FIFO. Besides this other control signals like the SOP, EOP and EOP valid will be put together with the data in one signal. The single 68-bit variable will then enter the FIFO.`

`The current software version implements the Xilinx FIFO generator 13.1 IP core.`	`In case the bursts will be generated according to the optional scheduling enhancement, an extra feature will be required to read the amount of data already placed in the FIFO.\\`

	`The most recent version implements the Xilinx FIFO generator 13.1 IP core.`

`\subsubsection[Bursts]{Bursts \hfill OSI Layer 2}`	`\subsubsection[Bursts]{Bursts \hfill OSI Layer 2}`
`The conversion from data packets to complete bursts will happen in a separate component. Certain input signals are expected which the application developer should provide. Besides the data input itself a SOP en EOP signal, indicating the start and end of the packet, can for example be expected.`	`Data leaving the FIFO will be converted to complete bursts. The control signal provided by the user which will leave the FIFO with the data, will be read and according to this the bursts will be formed. Then for example an SOP or EOP burst control word can be generated.`

	In the Burst component a state machine can be found which remains in idle state unless the burst is enabled and a start of packet is detected. This will trigger the state machine and the arriving data from the FIFO will be read. The data will be saved in several pipelined registers to make packing the data in burst words possible. As explained in Section~\ref{subsec:interlaken_bursts} a burst control word has to be added first. After this the data will follow and as long as the state machine doesn't detect an EOP this continues. Every word of data processed will also cause an increment by one in the word counter because of the maximum burst length, BurstMax, that is allowed. When this value is reached the state machine will switch to another state for one cycle and will return to processing the data again. This has been done so a burst control word can be transmitted between bursts of maximum lengths.

In the Burst component a state machine can be found which remains in idle state unless the burst is enabled and a start of packet is detected. This will trigger the state machine and the arriving data from the FIFO will be read. The data will be saved in several pipelined registers to make packing the data in burst words possible. As explained in Section~\ref{subsec:interlaken_bursts} a burst control word has to be added first. After this the data will follow and as long as the state machine doesn't detect an EOP this continues. Every word of data processed will also cause an increment by one in the word counter because of the maximum burst length, BurstMax, that is allowed. When this value is reached the state machine will switch to another state for one cycle and will return to processing the data again. This has been implemented because a burst control word has to be output.	When an EOP signal is detected the state machine will switch to another state. Reading the FIFO will be stopped and a control word which contains the EOP and CRC-24 will follow at the output. After this it is important to check the word counter value. When the data length transmitted in this situation is shorter than the predefined BurstShort, the control word will be followed by one of multiple idle words until the transmitted data length is equal to BurstShort and the state machine will again wait for an SOP signal. In case the transmission contained an amount of words in between the values of BurstShort and BurstMax, no idle words will be necessary to include.

When an EOP signal is detected the state machine will switch to another state. Reading the FIFO will be stopped and a control word which contains the EOP and CRC-24 will follow at the output. After this it is important to check the word counter value. When the data length transmitted in this situation is shorter than the predefined BurstShort, the control word will be followed by one of multiple idle words until the transmitted data length is equal to BurstShort and the state machine will again wait for an SOP signal. In case the transmission contained an amount of words in between the values of BurstShort and BurstMax, no idle words will be necessary to include.\\	There is also a situation possible where the user sends data at a slower rate than the complete interface can process. In this case the FIFO will transmit multiple empty flags during transmission. But the transceiver is still expecting data so the empty spaces are filled up with idle words. This way data is still transmitted and the link is kept alive. In case these idle words won't be used, the FIFO will still output the last value in it's memory. The interface will then just processes this as useful data which results in duplicated data at the receiving side. Plus an RX FIFO overflow since the RX FIFO will be read at the same rate the TX FIFO is filled.\\

`An implementation of the optional scheduling enhancement is recommended but not yet developed.`	`An implementation of the optional scheduling enhancement is recommended but not yet developed.`

`\subsubsection[Meta Frame]{Meta Framing \hfill OSI Layer 2}`	`\subsubsection[Meta Frame]{Meta Framing \hfill OSI Layer 2}`
This component will add the meta frames to the data transmission, as discussed in section~\ref{subsec:interlaken_metaframe} this will be four words. It contains a state machine that counts the passed data words. When the transmission starts, these four control words will appear at the output. During these four cycles data will be read and pipelined so when the control words have passed, data will immediately follow. When the word length of the passed data reaches a value of MetaFrameLength, including the several data words, the state will change.	This component will add the meta frames to the data transmission, as discussed in section~\ref{subsec:interlaken_metaframe} this will be four words. It contains a state machine that counts the passed data words. When the transmission starts, these four control words will appear at the output. During these four cycles data will be read and pipelined so when the control words have passed, data will immediately follow. When the word length of the passed data reaches a value of MetaFrameLength, including the several data words, the state will change.

`Firstly the pipelined data will be output so it takes several cycles before all data left the component. After this the framing words will be output again to complete the meta frame and during this process input data will already enter the pipeline registers again. This way the cycle repeats and a complete meta frame will always appear at the output. One last thing to consider is the FIFO will also stop being read for four clock cycles. Otherwise there won't be place for the framing words.`	`Firstly the pipelined data will be output so it takes several cycles before all data left the component. After this the framing words will be output again to complete the meta frame and during this process input data will already enter the pipeline registers again. This way the cycle repeats and a complete meta frame will always appear at the output. One last thing to consider is the FIFO will also stop being read for four clock cycles. Otherwise there won't be place for the framing words.`

`\subsubsection[CRC generation]{CRC generation \hfill OSI Layer 2}`	`\subsubsection[CRC generation]{Generating CRC \hfill OSI Layer 2}`
`Bursts and Meta Frames both contain variants of the CRC and generating this will be explained in this subsection. Both contain the same method since they will check a certain length of words and also the specific word that will contain the CRC itself later.`	`Bursts and Meta Frames both contain variants of the CRC and generating this will be explained in this subsection. Both contain the same method since they will check a certain length of words and also the specific word that will contain the CRC itself later.`

The component responsible for generating the CRC needs two clock cycles to let this check appear at the output. In this case a method is used where data enters the CRC component but is also saved in original state parallel to the register because this has yet to be transmitted. Since the control words containing the reserved space for the CRC also have to be checked, the data has actually to be held in parallel registers for two clock cycles. This method makes it fairly simple to put the generated CRC in the control word since this word and the CRC now are available on the same clock cycle. Now it's just a matter of moving the bits to the right position in the control word.	The component responsible for generating the CRC needs two clock cycles to let this check appear at the output. In this case a method is used where data enters the CRC component but is also saved in original state parallel to the register because this has yet to be transmitted. Since the control words containing the reserved space for the CRC also have to be checked, the data has actually to be held in parallel registers for two clock cycles. This method makes it fairly simple to put the generated CRC in the control word since this word and the CRC now are available on the same clock cycle. Now it's just a matter of moving the bits to the right position in the control word.
`Figure~\ref{Fig:Hardware_CRC} shows a visual representation of this.`	`Figure~\ref{Fig:Hardware_CRC} shows a visual representation of this.`
`\begin{figure}[H]`	`\begin{figure}[H]`
`\centering`	`\centering`
`\includegraphics[width=0.9\textwidth]{Hardware_CRC.png}`	`\includegraphics[width=0.6\textwidth]{Hardware_CRC.png}`
`\caption{Used method generating CRC.}`	`\caption{Used method generating CRC.}`
`\label{Fig:Hardware_CRC}`	`\label{Fig:Hardware_CRC}`
`\end{figure}`	`\end{figure}`

`\subsubsection[Flow control]{Flow control \hfill OSI Layer 2}`
`Flow control is an important aspect that has to be added in the design. This will be included in the form of the earlier discussed Out-of-Band flow control.`

`\subsubsection[Scrambler]{Scrambler \hfill OSI Layer 1}`	`\subsubsection[Scrambler]{Scrambler \hfill OSI Layer 1}`
`The scrambler will receive a 64-bit data input from the meta framing component. The scrambled output also contains 64-bits.`	The scrambler will receive a 64-bit data input from the meta framing component. Since the synchronization and scrambler state words won't have to be scrambled these have to be detected first. When the control input signal is low this means data is entering the scrambler and this will always be scrambled. In case the control input is high this first six bits will be looked at. This way the scrambler can determine whether the word is synchronization, scrambler state or other control word. The first will leave the scrambler untouched while the second will appear at the output after the current scrambler state has been added to the word. All other control words will be scrambled. The scrambled data will be outputted alongside the control word indicator and a data valid signal.
One addition to the scrambler is that it inspects certain bits in the data input. This has been done because the synchronization and scrambler state words have to be transmitted unscrambled. The block types have already been placed by the meta framing so when the control pin is a logic '1' and the block type indicated a synchronization or scrambler state word, the accompanying bits will be added. Of course the data control output will be a logic '1' in case a control word will appear at the output.
	`The scrambler polynomial has been defined before. The Interlaken Protocol Definition already includes a piece of code in Appendix B showing how to constantly generate the output and new state of the polynomial. This code was unfortunately written in Verilog so only the part generating the polynomial has been used and was translated to VHDL.`

`The scrambler polynomial has been defined before and the Interlaken Protocol Definition already included a piece of code in Appendix B showing how to constantly generate the output and new state of the polynomial. This code was unfortunately written in Verilog so only the part generating the polynomial has been used and was translated to VHDL.`
`\begin{figure}[H]`
`\centering`
`\includegraphics[width=0.6\textwidth]{Hardware_Scrambler.png}`
`\caption{Overview of the Scrambler block.}`
`\label{Fig:Hardware_Scrambler}`
`\end{figure}`
`\subsubsection[Encoder]{Encoder \hfill OSI Layer 1}`	`\subsubsection[Encoder]{Encoder \hfill OSI Layer 1}`
`The encoder will accept the 64-bit data signal (Data\_In) and control word indicator (Data\_Control) from the scrambler as inputs. Of course the encoder should also have a clock, reset and enable input.`	`The encoder will accept the scrambled data output. When enabled all data packets will be added a 3-bit preamble header. The control word input signal will indicate what the first two preamble bits accompanying the word should be. When the word is data '01' will be added and when a control word appears this will be '10'.`

	`For the determination of the inversion bit a separate variable will be reset every clock cycle. This will count the running disparity of the incoming data using a for-loop. The value will be saved and compared to the running disparity value of the data just being transmitted. This data is located in a separate variable.`

`When the encoder is enabled all data packets will be added the preamble header. The two control/data word indication bits will depend on the status of Data\_Control. In case this signal is a logic '1', this indicates a control word and '10' will be added. In case the signal is a logic '0', arrival of a data word is indicated and '01' will be added. This should always generate the right bits.`	`In case both words contain a majority of the same symbol, the bits of the incoming data will be inverted and the inversion bit will be added to the preamble as a logic high. After this the data will be moved to another variable and will this time be compared to the newly appearing input concerning running disparity.`

The status of inversion bit will also be determined here. A separate variable will be reset every clock cycle and will count the running disparity of the incoming data Data\_In using a for-loop. This value will be saved and compared to the running disparity value of the data just being transmitted. This data is located in a separate variable. In case both words contain a majority of ones or a majority of zeros, the bits of the incoming data will be inverted. After this the data will be moved to the variable and compared to the input again concerning running disparity.	`After this process a 67-bit word will leave the encoder and should be ready for transmission. The transceiver will accept this data and is responsible for the real transmission.`

	`\newpage`
	`\subsection{Receiver side}`
	`The receiving side will be responsible for restoring the data to its original form before this entered the interface. During development of the transmitter side many knowledge has been gained which makes it easier to develop the receiver side. While the transmitting side added framing words and encoded data, the receiving side has to decode this again and remove the frames. Figure \ref{Fig:Hardware_RX} depicts an overview of the receiver side.`

`The data output will follow as a 67-bit value and is ready to be processed by the SerDes. Figure \ref{Fig:Hardware_Encoder} displays the entity of the Encoder.`
`\begin{figure}[H]`	`\begin{figure}[H]`
`\centering`	`\centering`
`\includegraphics[width=0.6\textwidth]{Hardware_Encoder.png}`	`\includegraphics[width=\textwidth]{Hardware_RX.png}`
`\caption{Overview of the Encoder block.}`	`\caption{Overview of the RX block diagram.}`
`\label{Fig:Hardware_Encoder}`	`\label{Fig:Hardware_RX}`
`\end{figure}`	`\end{figure}`

`\newpage`	`\subsubsection[RX FIFO]{RX FIFO \hfill OSI Layer 2}`
`\subsection{Receiver}`	`The original data identical to the data before entering the interface will be written in this FIFO. Again it is also used to cross clock domains since the interface itself will be running at a standard frequency but the user clock may vary or configure at a different frequency.`
`The receiving side will be responsible for restoring the data back to its original form. The knowledge gained while developing the transmitter side parts creates better understanding on how to take on the receiving side. While the transmitting side added framing words and encoded data, the receiving side has to decode this again and remove the frames.`

`\subsubsection[Deframing]{Deframing \hfill OSI Layer 2}`	`Only when the component removing the burst frames output a valid signal, the data appearing at the FIFO input will be stored. Otherwise the data will be ignored since it then concerns a control word or duplicated data which is not useful. This means the valid signal is connected to the FIFO write enable pin.`
`This is simply removing the control words which can be indicated by the specific indication signal. Of course several indicators have to be read and processed before simply deleting words. For example in case an SOP or EOP is detected, the user interface will output a high SOP or EOP pin.`

`\subsubsection[CRC checking]{CRC checking \hfill OSI Layer 2}`	The user has to set the FIFO read signal high before any data will appear at the output which prevents data loss. In case the user logic is busy with other tasks and can't process the data, this will remain stored in the FIFO. However the chance on the FIFO overflowing is fairly high in such situating. This will be prevented by flow control which will be discussed in \ref{subsec:Hardware_FlowControl}. For this the FIFO programmable full signal can be used which offers a set value and threshold based on how far the FIFO is filled with data.\\
`Error checking will be done generating the CRC again like at the transmitter side. This generated value will be compared to the received CRC value in the control words. In case these match the data arrived flawless. When these don't match data corruption occurred and the data is not identical to that at the transmitting side.`

`\subsubsection[Flow control]{Flow control \hfill OSI Layer 2}`	`The most recent version implements the Xilinx FIFO generator 13.1 IP core.`
`The receiver has to constantly check the`

`\subsubsection[Descrambler]{Descrambler \hfill OSI Layer 1}`	`\subsubsection[Deframing Burst]{Deframing Burst \hfill OSI Layer 2}`
`Data leaving the decoder has yet to be descrambled. When starting the descrambler it won't process any input data but instead look for the unscrambled synchronization words. The control words indicator from the decoder will be read. In case this indicates a control word and the block type is identical to the synchronization one, the data will be compared to the predefined sync data. In case these are identical, the state machine will move into another state and two counters start.`	`The added burst control words have to be removed. However these words contain critical information which have to be read and processed before simply deleting words. For example in case an SOP or EOP is detected, the user interface will output a high SOP or EOP pin.`

`The amount of passing words will be counted. After a certain MetaFrameLength amount of words the synchronization word has to appear again. In case this happens the sync word counter will be incremented by one. In case this reaches the value of four, the state machine moves to the next state indicating a lock.`	The deframing can easily be done by inspecting the valid and control indicator which accompany each word. When the valid signal is low the data will simply be ignored. In case valid is high and control is also high, this indicates a valid burst control word. Since different burst control words each contain other viable information, it will first be determined which word this is. According to this analysis the useful information will be extracted. This data will be saved and appear at the output accompanying the next data word. For example an SOP be added to the next word. One exception is the EOP since this actually has to accompany the most recent passed data word.

`When in lock all words at the input will be descrambled, except the sync and scrambler state words of course. There will still be checked on correct synchronization words and in case this is not identical to the word expected, the sync word error counter will be incremented by one. After this the scrambler state word is expected to arrive. In case this matches the current polynomial the status should remain locked. Otherwise the scrambler state mismatch counter will increment by one.`	`In case valid is high and control low, the entering word will be considered as valid data. However this data will not yet appear at the output but is saved. This is done because an EOP can follow anytime and in case the data word is released to early the EOP will miss. When another valid data or control word enters the component, the saved word will be released. This means the possible appeared EOP or earlier SOP will appear at the output of the component together with the data.`

`When the sync word error or the scrambler state mismatch counters reaches the value of respectively four or three, a reset will follow. The descrambler loses its lock and has to look for synchronization words again to get in lock.`	`Another valid signal at the output will switch to high when the data leaves the component to indicate the FIFO that this data is useful and has to be stored.`

`\begin{figure}[H]`	`\subsubsection[Deframing Meta]{Deframing Meta \hfill OSI Layer 2}`
`\centering`	`The meta frames are useful during transmission and deframing but they have to be invalidated in the process of outputting only the original user data. The descrambler uses these frames to synchronize itself and before deframing the CRC-32 has also to be checked.`
`\includegraphics[width=0.6\textwidth]{Hardware_Descrambler.png}`	`After this the meta frames will simply be ignored by making their accompanying valid signal low at the output. This way other components will just ignore them. Of course when data enters this component with a logical low accompanying valid signal, the data will be ignored.`
`\caption{Overview of the descrambler block.}`
`\label{Fig:Hardware_Descrambler}`
`\end{figure}`

`\subsubsection[Decoder]{Decoder \hfill OSI Layer 1}`	`\subsubsection[CRC checking]{CRC checking \hfill OSI Layer 2}`
Data and control packets entering the decoder will of course be 67 bits wide. The preamble has to be removed which will return the data to its 64-bit width before encoding. Unfortunately these preamble bits cannot be directly removed since the exact position of these bits in the word is unknown. Since the packets have been serialized and parallelized again the preamble bits could have moved to for example positions 42:40 instead of the expected 66:64. The method to detect these bits and to lock on them is described in the Interlaken Protocol Definition. \\	`Error checking will be done generating the CRC again like at the transmitter side. This generated value will be compared to the received CRC value in the control words. In case these match the data arrived flawless. When these don't match data corruption occurred and the data is not identical to that at the transmitting side.`

Since the preamble always contains at least one transition, looking for transitions is the key to finding the preamble location in repeating packets. In this case transitions in the data input will be detected using a logic XOR with inputs Data(i) and Data(i-1). Variable i will loop from 66 to 0 so all transitions will be registered. In case the bits are equal in value it is clear no transition occurred and the XOR will output a logic '0'. In case a transition happens the bits won't be equal in value and the XOR will output a logic '1'. The outcome will be saved in a new 67-bit signal T1 now containing a logic '1' on all locations where the input data contains a transition.	`\subsubsection[Descrambler]{Descrambler \hfill OSI Layer 1}`
	Data leaving the decoder is still in scrambled format and not yes usable. When starting the descrambler it won't process any input data but instead look for the unscrambled synchronization words. The control words indicator from the decoder will be read. In case this indicates a control word and the block type is identical to the synchronization one, the data will be compared to the predefined sync data. In case these are identical, the state machine will move into another state and two counters start. In case the input data valid signal is a logical low, this word will be ignored so the scrambler state will also not change.

`This signal T1 will go through a logic AND comparing it to the saved value of the earlier returned transitions called T2 or the reset value during startup. This way it can easily be analyzed which transitions are returning every word. Since this can easily be done in parallel, all transition bits occurring in both words will cause the AND to output a logic '1'. The complete result will be copied to signal T2. This process will constantly repeat until a single returning transition is left.`	`The amount of passing words will be counted. After a certain MetaFrameLength amount of words the synchronization word has to appear again. In case this happens the sync word counter will be incremented by one. In case this reaches the value of four, the state machine moves to the next state indicating a lock.`

`In case transitions at certain positions are not always returning, this will be saved at the same location in T2 as a logic '0'. This indicated the location isn't containing the preamble. There is a possibility no legal repeating sync pattern can be detected and this will quickly cause T2 to be completely filled with zeros and a reset will follow. A quick overview of the till now discussed hardware design is visualized in Figure~\ref{Fig:Hardware_DecoderPt1}.`	`When in lock all words at the input will be descrambled, except the sync and scrambler state words of course. There will still be checked on correct synchronization words and in case this is not identical to the word expected, the sync word error counter will be incremented by one. After this the scrambler state word is expected to arrive. In case this matches the current polynomial the status should remain locked. Otherwise the scrambler state mismatch counter will increment by one.`

`\begin{figure}[H]`	`When the sync word error or the scrambler state mismatch counters reaches the value of respectively four or three, a reset will follow. The descrambler loses its lock and has to look for synchronization words again to get in lock.`
`\centering`
`\includegraphics[width=\textwidth]{Hardware_DecoderPt1.png}`
`\caption{Overview of the decoder input.}`
`\label{Fig:Hardware_DecoderPt1}`
`\end{figure}`

`During the repeating patterns another counter will keep up how many consecutive legal sync pattern have passed. When the no repeating patterns have been found the counter will of course be reset. In case the pattern holds on, contains a single returning transition and the counter reaches a value of 64 consecutive legal patterns, the decoder will declare a word lock. When moving into lock state the sync header location will be saved to compare later and detect sync errors.`	`\subsubsection[Decoder]{Decoder \hfill OSI Layer 1}`
	`The decoder will immediately receive data from the transceiver. Thus data entering the decoder will be 67-bit wide. This part will be responsible for removing the preamble and reducing the data width to 64-bits. The scrambled data will remain to be descrambled and the preamble will be converted to a separate control word indicator. The inversion bit in the preamble will be read and according to this data will be inverted or not, then the inversion bit will be discarded.`

`The word lock will keep on as long less than sixteen sync header errors occur per 64 words. The error and word counter will be reset after every 64 words that pass. In case more than these sixteen sync errors occur the decoder will lose its lock state and be reset. After this the whole procedure restarts.`	`One of the most important functions of the decoder is to alight the data correctly. Since the data is scrambled and cannot be read, the preamble has to be used for alignment. The decoder can lock on the bit transition in the preamble which assures the 64-bits leaving the decoder are really identical to the packet that has been transmitted.`

`Figure~\ref{Fig:Hardware_DecoderPt2} depicts the hardware schematic in case the decoder is in lock. As explained it will be checked if the input data still has a transition on the same location as the one locked on. The state machine will constantly be updated on this value.`	In combination with the transceiver from Xilinx, the decoder can use a separate pin for bit slipping. This will cause the transceiver to change the alignment of data during data processing and changes the preamble position. After every bit slip operation the current location of the preamble will be checked and when this is not located at the last three bits, the slip operation will repeat until the preamble is at the right location. This will also cause the decoder to go into lock after 64 consecutive words contain the right preamble location.

`\begin{figure}[H]`	When in lock the amount of processed words will be tracked by using a counter. A single word that is not correctly synchronized will not immediately cause the decoder to lose lock. For this an error counter is used which will keep track of the incorrectly synced words. After sixteen errors the decoder will lose lock and reset. This is determined over all words processed but every 64 words. After these 64 words the word counter and error status will be reset. Fortunately the three preamble bits only effectively use 50\% of the valid conditions so errors should not be common and will be easily detected.
`\centering`
`\includegraphics[width=\textwidth]{Hardware_DecoderPt2.png}`
`\caption{Overview of the decoder in lock.}`
`\label{Fig:Hardware_DecoderPt2}`
`\end{figure}`

While in lock the input data will be processed. In case the data has been inverted this will be reversed so that the polarity always leaves the decoder in original form. The two word indication bits will also de removed and a separate signal will leave the component to indicate a control word appeared at the output. This is also shown in Figure~\ref{Fig:Hardware_Decoder} which also shows two additional signals leaving the decoder. In case the decoder is locked or errors occur the separate signals will make these events visible to the logic interfacing with the Interlaken core.	`The implemented decoder contains three outputs. Two are for the control and valid signals while another one is of course for the data/control word itself. This makes it easy for the components after this to know which type of data this 64-bit word is.`

`\begin{figure}[H]`	`\subsection[Flow control]{Flow control \hfill OSI Layer 2}`
`\centering`	`\label{subsec:Hardware_FlowControl}`
`\includegraphics[width=0.6\textwidth]{Hardware_Decoder.png}`	It is of great importance to constantly check the RX FIFO status. It can be that the user logic reading the FIFO is busy and cannot keep up with the link speed. In case overflows normally start to occur, data will be lost permanently and this may not be allowed to happen. The solution to this problem is to let the receiver constantly check the RX FIFO status. When the FIFO start filling up more than usual and begins to have a tendency to overflow, there will be a signal generated that will cause the control words to transmit an XOFF for the specific channel.
`\caption{Overview of the decoder block.}`	`When the side responsible for transmitting the side receives this message, action will be taken by stopping to read the TX FIFO. This way the TX FIFO will fill up somewhat more and the FIFO will update the user logic on this which will stop sending data.`
`\label{Fig:Hardware_Decoder}`
`\end{figure}`

`Additionally there will always be a transition to lock on since the preamble added by the encoder will be present. Even in case there is no data to transmit the scrambler will continue outputting data containing varying transitions which makes it possible to still get in lock or maintain the locked state.`
`\subsubsection{Complete receiving interface}`

`\subsection{Transceiver}`	`\subsection{Transceiver}`
`\label{subsec:Hardware_Transceiver}`	`\label{subsec:Hardware_Transceiver}`
`Separate hardware is required to generate the 10Gbps link itself.\\`	`Separate hardware is required to set up the 10 Gbps link itself. For such link a 10 GHz clock signal is required and of course the parallel data should be serialized to ready it for serial transmission. A great thing is that the transceiver already takes care of this. The only requirements are to configure it correctly and to provide stable clocks.`
`QPLL is used because line operates above CPLL operating range.\\`
`According to transciever manual ug476 Interlaken can run at 10.3125 Gbps. This requires a system clock of $10,3125 Gbit / 64 bits = 161,13 MHz$ or $10,0 Gbit / 64 bits = 156,25 MHz$`

`\newpage`	`Because a Xilinx FPGA is used for testing, the Xilinx transceiver will also be used. This will of course have consequences for the whole core being FPGA vendor independent but in this case there is no other solution.`

`\subsection{Current state of progress}`	The GTX transceiver in the Virtex-7 contains two types of PLL's (Phase Locked Loops) which generates the clock frequency required for the transmission line. In this case the QPLL will be used since the CPLL is limited at a lower maximum frequency and is less stable. The transceiver connected to the SFP+ connection will be used to transmit the data over fiber. The accompanying GTX transceiver is located at Quad0 and is noted as GTX\_X1Y2. The clock the Interlaken Interface itself works on is expected to be $10,0 Gbit / 64 bits = 156,25 MHz$.
`This subsection will describe the current progress on the hardware development of the Interlaken Protocol. Two tables will be presented from which Table~\ref{Tab:Transmitter_Status} gives information on the transmitter side and Table~\ref{Tab:Receiver_Status} will provide an overview of the status on the receiver side. This has been done to match the structure in which the hardware will be developed.`

	`During startup the transceiver needs several microseconds to configure itself and to lock the QPLL. After this data can be applied to get the other part of the interface locked.`

`%Options to customize the table easily`	`The transceiver also requires a 40 MHz DRP clock which will be generated with the Xilinx Clocking Wizard 5.3 IP core. The most recent version implements the Xilinx Transceiver Wizard 3.6 IP core.`
`\taburowcolors[2] 2{tableLineOne .. tableLineTwo}`
`\tabulinesep = ^2mm_2mm`	`\subsection{Complete interface}`
`\everyrow{\tabucline[.3mm white]{}}`	`When all components are combined to one complete interface the overview should look like depicted in \ref{Fig:Core1990}. This includes the transmitter and receiver logic connected to the transceiver. While the interface only requires the user to input data and control signals with it, certain clocks should of course also be provided.`

`\begin{table}[H]`	`\begin{figure}[H]`
`\begin{tabu} to \textwidth {X[1.5] X[1] X[1] X[3]}`	`\centering`
`\tableHeaderStyle`	`\includegraphics[width=\textwidth]{Core1990_Overview.png}`
`Function & Simulation & Hardware & Comments \\`	`\caption{Complete Core1990 architecture.}`
`Generating bursts& Done & - & Working in simulation as intended (simple packet mode)\\`	`\label{Fig:Core1990}`
`Meta Framing & Done & - & Working in simulation\\`	`\end{figure}`
`Scrambling & Done & - & Working in simulation - Amount of input pins may change\\`
`Encoding & Done & - & Working in simulation\\`
`CRC generating & Done & - & Working in simulation\\`
`Flow control & - & - & Not started yet/bits reserved\\`
`\end{tabu}`
`\caption{Overview of progress on the transmitter side.}`
`\label{Tab:Transmitter_Status}`
`\end{table}`

`\begin{table}[H]`
`\begin{tabu} to \textwidth {X[1.5] X[1] X[1] X[3]}`
`\tableHeaderStyle`
`Function & Simulation & Hardware & Comments \\`
`Deframing & Done & - & Working in simulation\\`
`Descrambler & Done & - & Working as expected in simulation\\`
`Decoder & Done & - & Working in simulation\\`
`CRC checking& Done & - & Working in simulation\\`
`Flow control& - & - & Not started yet/bits read but static\\`
`\end{tabu}`
`\caption{Overview of progress on the receiver side.}`
`\label{Tab:Receiver_Status}`
`\end{table}`

`\newpage`	`\newpage`
`No newline at end of file`	`No newline at end of file`

Line 1...

\section{Hardware implementation}

\section{Hardware implementation}

\label{sec:hardware_implementation}

\label{sec:hardware_implementation}

The purpose of this assignment was to search and implement the best protocol matching a clear set of requirements. In chapter~\ref{sec:interlaken} this best protocol has been found and described in details. This section will focus on the implementation of Interlaken on an FPGA and the hardware provided to test it.

The purpose of this assignment was to search and implement the best protocol matching a clear set of requirements. In chapter~\ref{sec:interlaken} this best protocol has been found and described in details. This section will focus on the implementation of Interlaken on an FPGA and the hardware provided to test it.

The author has been provided a Xilinx VC707 Evaluation Board~\cite{VC707} by Nikhef to test with.

The author has been provided a Xilinx VC707 Evaluation Board~\cite{VC707} by Nikhef to eventually test the implemented design on.

The provided board is depicted in Figure~\ref{Fig:VC707_Nikhef}.

The provided board is depicted in Figure~\ref{Fig:VC707_Nikhef}.

\begin{figure}[H]

\begin{figure}[H]

        \centering

        \centering

        \includegraphics[width=\textwidth]{VC707_Nikhef.jpg}

        \includegraphics[width=\textwidth]{VC707_Nikhef.jpg}

Line 22...

The included GTX transceivers support transfer speeds up to 12,5 Gbps in case the QPLL is used instead of the CPLL, which is excellent since 10 Gbps is the target line rate~\cite{GTXT}. The difference between PLL types and which clocks they generate will be explained in section \ref{subsec:Hardware_Transceiver}. It is even specifically mentioned that in case of Interlaken a line rate of 10,3125 Gbps would be supported. The QPLL frequency would of course be 10,3125 GHz since all data will be serialized and transmitted over the line.

The included GTX transceivers support transfer speeds up to 12,5 Gbps in case the QPLL is used instead of the CPLL, which is excellent since 10 Gbps is the target line rate~\cite{GTXT}. The difference between PLL types and which clocks they generate will be explained in section \ref{subsec:Hardware_Transceiver}. It is even specifically mentioned that in case of Interlaken a line rate of 10,3125 Gbps would be supported. The QPLL frequency would of course be 10,3125 GHz since all data will be serialized and transmitted over the line.

This Chapter will contain separate sections describing the transmitter, receiver and transceiver parts. In case IP cores are used this will be noted with the accompanying version and vendor.

This Chapter will contain separate sections describing the transmitter, receiver and transceiver parts. In case IP cores are used this will be noted with the accompanying version and vendor.

%while the reference clock should be 161,13 MHz. This is understandable since the 10,3125 GHz is a serial transmission and when this is converted to a parallel bus the same speed has to be maintained. This can be calculated dividing 10,3125 GHz by the 64-bits which indeed results in the 161,13 MHz.

%while the reference clock should be 161,13 MHz. This is understandable since the 10,3125 GHz is a serial transmission and when this is converted to a parallel bus the same speed has to be maintained. This can be calculated dividing 10,3125 GHz by the 64-bits which indeed results in the 161,13 MHz.

\subsection{Transmitter}

\subsection{Transmitter side}

        The transmitter side will be described and designed first. This will deliver more insight in the framing and encoding of the data which will make it easier to remove the framing and decode the transmitted data at the receiving side.

        The transmitter side will be described and designed first. This will deliver more insight in the framing and encoding of the data which will make it easier to remove the framing and decode the transmitted data at the receiving side.

        Figure \ref{Fig:Hardware_TX} displays an overview of the complete transmitter side of the interface. Only useful data will be stored and the FIFO can also indicate it is full to other logic. Framing components have to communicate with each other because of the extra space required in between the data flowing.

        Figure \ref{Fig:Hardware_TX} depicts the complete transmitter side of the interface. Only useful data will be stored and the FIFO can also indicate it is full to other logic. Framing components have to communicate with each other because of the extra space required in between the data flowing.

        \begin{figure}[H]

        \begin{figure}[H]

                \centering

                \centering

                \includegraphics[width=\textwidth]{Hardware_TX.png}

                \includegraphics[width=\textwidth]{Hardware_TX.png}

                \caption{Overview of the TX block diagram.}

                \caption{Overview of the TX block diagram.}

                \label{Fig:Hardware_TX}

                \label{Fig:Hardware_TX}

        \end{figure}

        \end{figure}

        \subsubsection[FIFO]{FIFO \hfill OSI Layer 2}

        \subsubsection[TX FIFO]{TX FIFO \hfill OSI Layer 2}

                The FIFO will act as a buffer temporary storing data when frames are added or the interface can process no more data. It is a possibility the user needs some time to respond and stop the appearance of new data at the interface input so the FIFO would hold this data.

                The FIFO will act as a buffer temporary storing data when frames are added or the interface can process no more data. It is also especially useful in case the interface needs some extra time before the next frame can be processed. This will often occur during the addition of burst and meta frames.

                It's very inefficient to store all data in the FIFO appearing at the input since it could be possible not all data has to be transferred or the user interface provides no new data. To counter this problem a separate state machine has been added which responds to the SOP and EOP signals. In case a SOP is detected the FIFO will be allowed to store data and in case a EOP is detect the process will be stopped. The SOP and EOP signal will also be stored in the FIFO like other user interface signals which have to be included in the burst control words.

                Another very useful feature of the FIFO is the ability to cross clock domains. Data will be written in the FIFO at a by the user determined rate while to FIFO will be read at the maximum speed the transceiver allows to always keep the line and logic alive/busy.

                In case the bursts will be generated according to the optional scheduling enhancement, an extra feature will be required to read the amount of data already placed in the FIFO.

                A separate input port is available to the user which enables the write ability of the FIFO. So only data the user really want to be stored will enter the FIFO. Besides this other control signals like the SOP, EOP and EOP valid will be put together with the data in one signal. The single 68-bit variable will then enter the FIFO.

                The current software version implements the Xilinx FIFO generator 13.1 IP core.

                In case the bursts will be generated according to the optional scheduling enhancement, an extra feature will be required to read the amount of data already placed in the FIFO.\\

                The most recent version implements the Xilinx FIFO generator 13.1 IP core.

        \subsubsection[Bursts]{Bursts \hfill OSI Layer 2}

        \subsubsection[Bursts]{Bursts \hfill OSI Layer 2}

                The conversion from data packets to complete bursts will happen in a separate component. Certain input signals are expected which the application developer should provide. Besides the data input itself a SOP en EOP signal, indicating the start and end of the packet, can for example be expected.

                Data leaving the FIFO will be converted to complete bursts. The control signal provided by the user which will leave the FIFO with the data, will be read and according to this the bursts will be formed. Then for example an SOP or EOP burst control word can be generated.

                In the Burst component a state machine can be found which remains in idle state unless the burst is enabled and a start of packet is detected. This will trigger the state machine and the arriving data from the FIFO will be read. The data will be saved in several pipelined registers to make packing the data in burst words possible. As explained in Section~\ref{subsec:interlaken_bursts} a burst control word has to be added first. After this the data will follow and as long as the state machine doesn't detect an EOP this continues. Every word of data processed will also cause an increment by one in the word counter because of the maximum burst length, BurstMax, that is allowed. When this value is reached the state machine will switch to another state for one cycle and will return to processing the data again. This has been done so a burst control word can be transmitted between bursts of maximum lengths.

                In the Burst component a state machine can be found which remains in idle state unless the burst is enabled and a start of packet is detected. This will trigger the state machine and the arriving data from the FIFO will be read. The data will be saved in several pipelined registers to make packing the data in burst words possible. As explained in Section~\ref{subsec:interlaken_bursts} a burst control word has to be added first. After this the data will follow and as long as the state machine doesn't detect an EOP this continues. Every word of data processed will also cause an increment by one in the word counter because of the maximum burst length, BurstMax, that is allowed. When this value is reached the state machine will switch to another state for one cycle and will return to processing the data again. This has been implemented because a burst control word has to be output.

                When an EOP signal is detected the state machine will switch to another state. Reading the FIFO will be stopped and a control word which contains the EOP and CRC-24 will follow at the output. After this it is important to check the word counter value. When the data length transmitted in this situation is shorter than the predefined BurstShort, the control word will be followed by one of multiple idle words until the transmitted data length is equal to BurstShort and the state machine will again wait for an SOP signal. In case the transmission contained an amount of words in between the values of BurstShort and BurstMax, no idle words will be necessary to include.

                When an EOP signal is detected the state machine will switch to another state. Reading the FIFO will be stopped and a control word which contains the EOP and CRC-24 will follow at the output. After this it is important to check the word counter value. When the data length transmitted in this situation is shorter than the predefined BurstShort, the control word will be followed by one of multiple idle words until the transmitted data length is equal to BurstShort and the state machine will again wait for an SOP signal. In case the transmission contained an amount of words in between the values of BurstShort and BurstMax, no idle words will be necessary to include.\\

                There is also a situation possible where the user sends data at a slower rate than the complete interface can process. In this case the FIFO will transmit multiple empty flags during transmission. But the transceiver is still expecting data so the empty spaces are filled up with idle words. This way data is still transmitted and the link is kept alive. In case these idle words won't be used, the FIFO will still output the last value in it's memory. The interface will then just processes this as useful data which results in duplicated data at the receiving side. Plus an RX FIFO overflow since the RX FIFO will be read at the same rate the TX FIFO is filled.\\

                An implementation of the optional scheduling enhancement is recommended but not yet developed.

                An implementation of the optional scheduling enhancement is recommended but not yet developed.

        \subsubsection[Meta Frame]{Meta Framing \hfill OSI Layer 2}

        \subsubsection[Meta Frame]{Meta Framing \hfill OSI Layer 2}

                This component will add the meta frames to the data transmission, as discussed in section~\ref{subsec:interlaken_metaframe} this will be four words. It contains a state machine that counts the passed data words. When the transmission starts, these four control words will appear at the output. During these four cycles data will be read and pipelined so when the control words have passed, data will immediately follow. When the word length of the passed data reaches a value of MetaFrameLength, including the several data words, the state will change.

                This component will add the meta frames to the data transmission, as discussed in section~\ref{subsec:interlaken_metaframe} this will be four words. It contains a state machine that counts the passed data words. When the transmission starts, these four control words will appear at the output. During these four cycles data will be read and pipelined so when the control words have passed, data will immediately follow. When the word length of the passed data reaches a value of MetaFrameLength, including the several data words, the state will change.

                Firstly the pipelined data will be output so it takes several cycles before all data left the component. After this the framing words will be output again to complete the meta frame and during this process input data will already enter the pipeline registers again. This way the cycle repeats and a complete meta frame will always appear at the output. One last thing to consider is the FIFO will also stop being read for four clock cycles. Otherwise there won't be place for the framing words.

                Firstly the pipelined data will be output so it takes several cycles before all data left the component. After this the framing words will be output again to complete the meta frame and during this process input data will already enter the pipeline registers again. This way the cycle repeats and a complete meta frame will always appear at the output. One last thing to consider is the FIFO will also stop being read for four clock cycles. Otherwise there won't be place for the framing words.

        \subsubsection[CRC generation]{CRC generation \hfill OSI Layer 2}

        \subsubsection[CRC generation]{Generating CRC \hfill OSI Layer 2}

                Bursts and Meta Frames both contain variants of the CRC and generating this will be explained in this subsection. Both contain the same method since they will check a certain length of words and also the specific word that will contain the CRC itself later.

                Bursts and Meta Frames both contain variants of the CRC and generating this will be explained in this subsection. Both contain the same method since they will check a certain length of words and also the specific word that will contain the CRC itself later.

                The component responsible for generating the CRC needs two clock cycles to let this check appear at the output. In this case a method is used where data enters the CRC component but is also saved in original state parallel to the register because this has yet to be transmitted. Since the control words containing the reserved space for the CRC also have to be checked, the data has actually to be held in parallel registers for two clock cycles. This method makes it fairly simple to put the generated CRC in the control word since this word and the CRC now are available on the same clock cycle. Now it's just a matter of moving the bits to the right position in the control word.

                The component responsible for generating the CRC needs two clock cycles to let this check appear at the output. In this case a method is used where data enters the CRC component but is also saved in original state parallel to the register because this has yet to be transmitted. Since the control words containing the reserved space for the CRC also have to be checked, the data has actually to be held in parallel registers for two clock cycles. This method makes it fairly simple to put the generated CRC in the control word since this word and the CRC now are available on the same clock cycle. Now it's just a matter of moving the bits to the right position in the control word.

                Figure~\ref{Fig:Hardware_CRC} shows a visual representation of this.

                Figure~\ref{Fig:Hardware_CRC} shows a visual representation of this.

                \begin{figure}[H]

                \begin{figure}[H]

                        \centering

                        \centering

                        \includegraphics[width=0.9\textwidth]{Hardware_CRC.png}

                        \includegraphics[width=0.6\textwidth]{Hardware_CRC.png}

                        \caption{Used method generating CRC.}

                        \caption{Used method generating CRC.}

                        \label{Fig:Hardware_CRC}

                        \label{Fig:Hardware_CRC}

                \end{figure}

                \end{figure}

        \subsubsection[Flow control]{Flow control \hfill OSI Layer 2}

                Flow control is an important aspect that has to be added in the design. This will be included in the form of the earlier discussed Out-of-Band flow control.

        \subsubsection[Scrambler]{Scrambler \hfill OSI Layer 1}

        \subsubsection[Scrambler]{Scrambler \hfill OSI Layer 1}

                The scrambler will receive a 64-bit data input from the meta framing component. The scrambled output also contains 64-bits.

                The scrambler will receive a 64-bit data input from the meta framing component. Since the synchronization and scrambler state words won't have to be scrambled these have to be detected first. When the control input signal is low this means data is entering the scrambler and this will always be scrambled. In case the control input is high this first six bits will be looked at. This way the scrambler can determine whether the word is synchronization, scrambler state or other control word. The first will leave the scrambler untouched while the second will appear at the output after the current scrambler state has been added to the word. All other control words will be scrambled. The scrambled data will be outputted alongside the control word indicator and a data valid signal.

                One addition to the scrambler is that it inspects certain bits in the data input. This has been done because the synchronization and scrambler state words have to be transmitted unscrambled. The block types have already been placed by the meta framing so when the control pin is a logic '1' and the block type indicated a synchronization or scrambler state word, the accompanying bits will be added. Of course the data control output will be a logic '1' in case a control word will appear at the output.

                The scrambler polynomial has been defined before. The Interlaken Protocol Definition already includes a piece of code in Appendix B showing how to constantly generate the output and new state of the polynomial. This code was unfortunately written in Verilog so only the part generating the polynomial has been used and was translated to VHDL.

                The scrambler polynomial has been defined before and the Interlaken Protocol Definition already included a piece of code in Appendix B showing how to constantly generate the output and new state of the polynomial. This code was unfortunately written in Verilog so only the part generating the polynomial has been used and was translated to VHDL.

                \begin{figure}[H]

                        \centering

                        \includegraphics[width=0.6\textwidth]{Hardware_Scrambler.png}

                        \caption{Overview of the Scrambler block.}

                        \label{Fig:Hardware_Scrambler}

                \end{figure}

        \subsubsection[Encoder]{Encoder \hfill OSI Layer 1}

        \subsubsection[Encoder]{Encoder \hfill OSI Layer 1}

                The encoder will accept the 64-bit data signal (Data\_In) and control word indicator (Data\_Control) from the scrambler as inputs. Of course the encoder should also have a clock, reset and enable input.

                The encoder will accept the scrambled data output. When enabled all data packets will be added a 3-bit preamble header. The control word input signal will indicate what the first two preamble bits accompanying the word should be. When the word is data '01' will be added and when a control word appears this will be '10'.

                For the determination of the inversion bit a separate variable will be reset every clock cycle. This will count the running disparity of the incoming data using a for-loop. The value will be saved and compared to the running disparity value of the data just being transmitted. This data is located in a separate variable.

                When the encoder is enabled all data packets will be added the preamble header. The two control/data word indication bits will depend on the status of Data\_Control. In case this signal is a logic '1', this indicates a control word and '10' will be added. In case the signal is a logic '0', arrival of a data word is indicated and '01' will be added. This should always generate the right bits.

                In case both words contain a majority of the same symbol, the bits of the incoming data will be inverted and the inversion bit will be added to the preamble as a logic high. After this the data will be moved to another variable and will this time be compared to the newly appearing input concerning running disparity.

                The status of inversion bit will also be determined here. A separate variable will be reset every clock cycle and will count the running disparity of the incoming data Data\_In using a for-loop. This value will be saved and compared to the running disparity value of the data just being transmitted. This data is located in a separate variable. In case both words contain a majority of ones or a majority of zeros, the bits of the incoming data will be inverted. After this the data will be moved to the variable and compared to the input again concerning running disparity.

                After this process a 67-bit word will leave the encoder and should be ready for transmission. The transceiver will accept this data and is responsible for the real transmission.

\newpage

\subsection{Receiver side}

        The receiving side will be responsible for restoring the data to its original form before this entered the interface. During development of the transmitter side many knowledge has been gained which makes it easier to develop the receiver side. While the transmitting side added framing words and encoded data, the receiving side has to decode this again and remove the frames. Figure \ref{Fig:Hardware_RX} depicts an overview of the receiver side.

                The data output will follow as a 67-bit value and is ready to be processed by the SerDes. Figure \ref{Fig:Hardware_Encoder} displays the entity of the Encoder.

                \begin{figure}[H]

                \begin{figure}[H]

                        \centering

                        \centering

                        \includegraphics[width=0.6\textwidth]{Hardware_Encoder.png}

                \includegraphics[width=\textwidth]{Hardware_RX.png}

                        \caption{Overview of the Encoder block.}

                \caption{Overview of the RX block diagram.}

                        \label{Fig:Hardware_Encoder}

                \label{Fig:Hardware_RX}

                \end{figure}

                \end{figure}

\newpage

        \subsubsection[RX FIFO]{RX FIFO \hfill OSI Layer 2}

\subsection{Receiver}

                The original data identical to the data before entering the interface will be written in this FIFO. Again it is also used to cross clock domains since the interface itself will be running at a standard frequency but the user clock may vary or configure at a different frequency.

        The receiving side will be responsible for restoring the data back to its original form. The knowledge gained while developing the transmitter side parts creates better understanding on how to take on the receiving side. While the transmitting side added framing words and encoded data, the receiving side has to decode this again and remove the frames.

        \subsubsection[Deframing]{Deframing \hfill OSI Layer 2}

                Only when the component removing the burst frames output a valid signal, the data appearing at the FIFO input will be stored. Otherwise the data will be ignored since it then concerns a control word or duplicated data which is not useful. This means the valid signal is connected to the FIFO write enable pin.

                This is simply removing the control words which can be indicated by the specific indication signal. Of course several indicators have to be read and processed before simply deleting words. For example in case an SOP or EOP is detected, the user interface will output a high SOP or EOP pin.

        \subsubsection[CRC checking]{CRC checking \hfill OSI Layer 2}

                The user has to set the FIFO read signal high before any data will appear at the output which prevents data loss. In case the user logic is busy with other tasks and can't process the data, this will remain stored in the FIFO. However the chance on the FIFO overflowing is fairly high in such situating. This will be prevented by flow control which will be discussed in \ref{subsec:Hardware_FlowControl}. For this the FIFO programmable full signal can be used which offers a set value and threshold based on how far the FIFO is filled with data.\\

                Error checking will be done generating the CRC again like at the transmitter side. This generated value will be compared to the received CRC value in the control words. In case these match the data arrived flawless. When these don't match data corruption occurred and the data is not identical to that at the transmitting side.

        \subsubsection[Flow control]{Flow control \hfill OSI Layer 2}

                The most recent version implements the Xilinx FIFO generator 13.1 IP core.

                The receiver has to constantly check the

        \subsubsection[Descrambler]{Descrambler \hfill OSI Layer 1}

        \subsubsection[Deframing Burst]{Deframing Burst \hfill OSI Layer 2}

                Data leaving the decoder has yet to be descrambled. When starting the descrambler it won't process any input data but instead look for the unscrambled synchronization words. The control words indicator from the decoder will be read. In case this indicates a control word and the block type is identical to the synchronization one, the data will be compared to the predefined sync data. In case these are identical, the state machine will move into another state and two counters start.

                The added burst control words have to be removed. However these words contain critical information which have to be read and processed before simply deleting words. For example in case an SOP or EOP is detected, the user interface will output a high SOP or EOP pin.

                The amount of passing words will be counted. After a certain MetaFrameLength amount of words the synchronization word has to appear again. In case this happens the sync word counter will be incremented by one. In case this reaches the value of four, the state machine moves to the next state indicating a lock.

                The deframing can easily be done by inspecting the valid and control indicator which accompany each word. When the valid signal is low the data will simply be ignored. In case valid is high and control is also high, this indicates a valid burst control word. Since different burst control words each contain other viable information, it will first be determined which word this is. According to this analysis the useful information will be extracted. This data will be saved and appear at the output accompanying the next data word. For example an SOP be added to the next word. One exception is the EOP since this actually has to accompany the most recent passed data word.

                When in lock all words at the input will be descrambled, except the sync and scrambler state words of course. There will still be checked on correct synchronization words and in case this is not identical to the word expected, the sync word error counter will be incremented by one. After this the scrambler state word is expected to arrive. In case this matches the current polynomial the status should remain locked. Otherwise the scrambler state mismatch counter will increment by one.

                In case valid is high and control low, the entering word will be considered as valid data. However this data will not yet appear at the output but is saved. This is done because an EOP can follow anytime and in case the data word is released to early the EOP will miss. When another valid data or control word enters the component, the saved word will be released. This means the possible appeared EOP or earlier SOP will appear at the output of the component together with the data.

                When the sync word error or the scrambler state mismatch counters reaches the value of respectively four or three, a reset will follow. The descrambler loses its lock and has to look for synchronization words again to get in lock.

                Another valid signal at the output will switch to high when the data leaves the component to indicate the FIFO that this data is useful and has to be stored.

                \begin{figure}[H]

        \subsubsection[Deframing Meta]{Deframing Meta \hfill OSI Layer 2}

                        \centering

                The meta frames are useful during transmission and deframing but they have to be invalidated in the process of outputting only the original user data. The descrambler uses these frames to synchronize itself and before deframing the CRC-32 has also to be checked.

                        \includegraphics[width=0.6\textwidth]{Hardware_Descrambler.png}

                After this the meta frames will simply be ignored by making their accompanying valid signal low at the output. This way other components will just ignore them. Of course when data enters this component with a logical low accompanying valid signal, the data will be ignored.

                        \caption{Overview of the descrambler block.}

                        \label{Fig:Hardware_Descrambler}

                \end{figure}

        \subsubsection[Decoder]{Decoder \hfill OSI Layer 1}

        \subsubsection[CRC checking]{CRC checking \hfill OSI Layer 2}

                Data and control packets entering the decoder will of course be 67 bits wide. The preamble has to be removed which will return the data to its 64-bit width before encoding. Unfortunately these preamble bits cannot be directly removed since the exact position of these bits in the word is unknown. Since the packets have been serialized and parallelized again the preamble bits could have moved to for example positions 42:40 instead of the expected 66:64. The method to detect these bits and to lock on them is described in the Interlaken Protocol Definition. \\

                Error checking will be done generating the CRC again like at the transmitter side. This generated value will be compared to the received CRC value in the control words. In case these match the data arrived flawless. When these don't match data corruption occurred and the data is not identical to that at the transmitting side.

                Since the preamble always contains at least one transition, looking for transitions is the key to finding the preamble location in repeating packets. In this case transitions in the data input will be detected using a logic XOR with inputs Data(i) and Data(i-1). Variable i will loop from 66 to 0 so all transitions will be registered. In case the bits are equal in value it is clear no transition occurred and the XOR will output a logic '0'. In case a transition happens the bits won't be equal in value and the XOR will output a logic '1'. The outcome will be saved in a new 67-bit signal T1 now containing a logic '1' on all locations where the input data contains a transition.

        \subsubsection[Descrambler]{Descrambler \hfill OSI Layer 1}

                Data leaving the decoder is still in scrambled format and not yes usable. When starting the descrambler it won't process any input data but instead look for the unscrambled synchronization words. The control words indicator from the decoder will be read. In case this indicates a control word and the block type is identical to the synchronization one, the data will be compared to the predefined sync data. In case these are identical, the state machine will move into another state and two counters start. In case the input data valid signal is a logical low, this word will be ignored so the scrambler state will also not change.

                This signal T1 will go through a logic AND comparing it to the saved value of the earlier returned transitions called T2 or the reset value during startup. This way it can easily be analyzed which transitions are returning every word. Since this can easily be done in parallel, all transition bits occurring in both words will cause the AND to output a logic '1'. The complete result will be copied to signal T2. This process will constantly repeat until a single returning transition is left.

                The amount of passing words will be counted. After a certain MetaFrameLength amount of words the synchronization word has to appear again. In case this happens the sync word counter will be incremented by one. In case this reaches the value of four, the state machine moves to the next state indicating a lock.

                In case transitions at certain positions are not always returning, this will be saved at the same location in T2 as a logic '0'. This indicated the location isn't containing the preamble. There is a possibility no legal repeating sync pattern can be detected and this will quickly cause T2 to be completely filled with zeros and a reset will follow. A quick overview of the till now discussed hardware design is visualized in Figure~\ref{Fig:Hardware_DecoderPt1}.

                When in lock all words at the input will be descrambled, except the sync and scrambler state words of course. There will still be checked on correct synchronization words and in case this is not identical to the word expected, the sync word error counter will be incremented by one. After this the scrambler state word is expected to arrive. In case this matches the current polynomial the status should remain locked. Otherwise the scrambler state mismatch counter will increment by one.

                \begin{figure}[H]

                When the sync word error or the scrambler state mismatch counters reaches the value of respectively four or three, a reset will follow. The descrambler loses its lock and has to look for synchronization words again to get in lock.

                        \centering

                        \includegraphics[width=\textwidth]{Hardware_DecoderPt1.png}

                        \caption{Overview of the decoder input.}

                        \label{Fig:Hardware_DecoderPt1}

                \end{figure}

                During the repeating patterns another counter will keep up how many consecutive legal sync pattern have passed. When the no repeating patterns have been found the counter will of course be reset. In case the pattern holds on, contains a single returning transition and the counter reaches a value of 64 consecutive legal patterns, the decoder will declare a word lock. When moving into lock state the sync header location will be saved to compare later and detect sync errors.

        \subsubsection[Decoder]{Decoder \hfill OSI Layer 1}

                The decoder will immediately receive data from the transceiver. Thus data entering the decoder will be 67-bit wide. This part will be responsible for removing the preamble and reducing the data width to 64-bits. The scrambled data will remain to be descrambled and the preamble will be converted to a separate control word indicator. The inversion bit in the preamble will be read and according to this data will be inverted or not, then the inversion bit will be discarded.

                The word lock will keep on as long less than sixteen sync header errors occur per 64 words. The error and word counter will be reset after every 64 words that pass. In case more than these sixteen sync errors occur the decoder will lose its lock state and be reset. After this the whole procedure restarts.

                One of the most important functions of the decoder is to alight the data correctly. Since the data is scrambled and cannot be read, the preamble has to be used for alignment. The decoder can lock on the bit transition in the preamble which assures the 64-bits leaving the decoder are really identical to the packet that has been transmitted.

                Figure~\ref{Fig:Hardware_DecoderPt2} depicts the hardware schematic in case the decoder is in lock. As explained it will be checked if the input data still has a transition on the same location as the one locked on. The state machine will constantly be updated on this value.

                In combination with the transceiver from Xilinx, the decoder can use a separate pin for bit slipping. This will cause the transceiver to change the alignment of data during data processing and changes the preamble position. After every bit slip operation the current location of the preamble will be checked and when this is not located at the last three bits, the slip operation will repeat until the preamble is at the right location. This will also cause the decoder to go into lock after 64 consecutive words contain the right preamble location.

                \begin{figure}[H]

                When in lock the amount of processed words will be tracked by using a counter. A single word that is not correctly synchronized will not immediately cause the decoder to lose lock. For this an error counter is used which will keep track of the incorrectly synced words. After sixteen errors the decoder will lose lock and reset. This is determined over all words processed but every 64 words. After these 64 words the word counter and error status will be reset. Fortunately the three preamble bits only effectively use 50\% of the valid conditions so errors should not be common and will be easily detected.

                        \centering

                        \includegraphics[width=\textwidth]{Hardware_DecoderPt2.png}

                        \caption{Overview of the decoder in lock.}

                        \label{Fig:Hardware_DecoderPt2}

                \end{figure}

                While in lock the input data will be processed. In case the data has been inverted this will be reversed so that the polarity always leaves the decoder in original form. The two word indication bits will also de removed and a separate signal will leave the component to indicate a control word appeared at the output. This is also shown in Figure~\ref{Fig:Hardware_Decoder} which also shows two additional signals leaving the decoder. In case the decoder is locked or errors occur the separate signals will make these events visible to the logic interfacing with the Interlaken core.

                The implemented decoder contains three outputs. Two are for the control and valid signals while another one is of course for the data/control word itself. This makes it easy for the components after this to know which type of data this 64-bit word is.

                \begin{figure}[H]

\subsection[Flow control]{Flow control \hfill OSI Layer 2}

                        \centering

        \label{subsec:Hardware_FlowControl}

                        \includegraphics[width=0.6\textwidth]{Hardware_Decoder.png}

        It is of great importance to constantly check the RX FIFO status. It can be that the user logic reading the FIFO is busy and cannot keep up with the link speed. In case overflows normally start to occur, data will be lost permanently and this may not be allowed to happen. The solution to this problem is to let the receiver constantly check the RX FIFO status. When the FIFO start filling up more than usual and begins to have a tendency to overflow, there will be a signal generated that will cause the control words to transmit an XOFF for the specific channel.

                        \caption{Overview of the decoder block.}

        When the side responsible for transmitting the side receives this message, action will be taken by stopping to read the TX FIFO. This way the TX FIFO will fill up somewhat more and the FIFO will update the user logic on this which will stop sending data.

                        \label{Fig:Hardware_Decoder}

                \end{figure}

                Additionally there will always be a transition to lock on since the preamble added by the encoder will be present. Even in case there is no data to transmit the scrambler will continue outputting data containing varying transitions which makes it possible to still get in lock or maintain the locked state.

        \subsubsection{Complete receiving interface}

\subsection{Transceiver}

\subsection{Transceiver}

\label{subsec:Hardware_Transceiver}

\label{subsec:Hardware_Transceiver}

Separate hardware is required to generate the 10Gbps link itself.\\

        Separate hardware is required to set up the 10 Gbps link itself. For such link a 10 GHz clock signal is required and of course the parallel data should be serialized to ready it for serial transmission. A great thing is that the transceiver already takes care of this. The only requirements are to configure it correctly and to provide stable clocks.

QPLL is used because line operates above CPLL operating range.\\

According to transciever manual ug476 Interlaken can run at 10.3125 Gbps. This requires a system clock of $10,3125 Gbit / 64 bits = 161,13 MHz$ or $10,0 Gbit / 64 bits = 156,25 MHz$

\newpage

        Because a Xilinx FPGA is used for testing, the Xilinx transceiver will also be used. This will of course have consequences for the whole core being FPGA vendor independent but in this case there is no other solution.

\subsection{Current state of progress}

        The GTX transceiver in the Virtex-7 contains two types of PLL's (Phase Locked Loops) which generates the clock frequency required for the transmission line. In this case the QPLL will be used since the CPLL is limited at a lower maximum frequency and is less stable. The transceiver connected to the SFP+ connection will be used to transmit the data over fiber. The accompanying GTX transceiver is located at Quad0 and is noted as GTX\_X1Y2. The clock the Interlaken Interface itself works on is expected to be $10,0 Gbit / 64 bits = 156,25 MHz$.

        This subsection will describe the current progress on the hardware development of the Interlaken Protocol. Two tables will be presented from which Table~\ref{Tab:Transmitter_Status} gives information on the transmitter side and Table~\ref{Tab:Receiver_Status} will provide an overview of the status on the receiver side. This has been done to match the structure in which the hardware will be developed.

        During startup the transceiver needs several microseconds to configure itself and to lock the QPLL. After this data can be applied to get the other part of the interface locked.

        %Options to customize the table easily

        The transceiver also requires a 40 MHz DRP clock which will be generated with the Xilinx Clocking Wizard 5.3 IP core. The most recent version implements the Xilinx Transceiver Wizard 3.6 IP core.

        \taburowcolors[2] 2{tableLineOne .. tableLineTwo}

        \tabulinesep = ^2mm_2mm

\subsection{Complete interface}

        \everyrow{\tabucline[.3mm  white]{}}

        When all components are combined to one complete interface the overview should look like depicted in \ref{Fig:Core1990}. This includes the transmitter and receiver logic connected to the transceiver. While the interface only requires the user to input data and control signals with it, certain clocks should of course also be provided.

        \begin{table}[H]

        \begin{figure}[H]

                \begin{tabu} to \textwidth {X[1.5] X[1] X[1] X[3]}

                \centering

                        \tableHeaderStyle

                \includegraphics[width=\textwidth]{Core1990_Overview.png}

                        Function                & Simulation & Hardware & Comments \\

                \caption{Complete Core1990 architecture.}

                        Generating bursts& Done          & -            & Working in simulation as intended (simple packet mode)\\

                \label{Fig:Core1990}

                        Meta Framing    & Done           & -            & Working in simulation\\

        \end{figure}

                        Scrambling              & Done           & -            & Working in simulation - Amount of input pins may change\\

                        Encoding                & Done           & -            & Working in simulation\\

                        CRC     generating      & Done           & -            & Working in simulation\\

                        Flow control    & -                      & -            & Not started yet/bits reserved\\

                \end{tabu}

                \caption{Overview of progress on the transmitter side.}

                \label{Tab:Transmitter_Status}

        \end{table}

        \begin{table}[H]

                \begin{tabu} to \textwidth {X[1.5] X[1] X[1] X[3]}

                        \tableHeaderStyle

                        Function        & Simulation & Hardware & Comments \\

                        Deframing       & Done           & -            & Working in simulation\\

                        Descrambler     & Done           & -            & Working as expected in simulation\\

                        Decoder         & Done           & -            & Working in simulation\\

                        CRC checking& Done               & -            & Working in simulation\\

                        Flow control& -                  & -            & Not started yet/bits read but static\\

                \end{tabu}

                \caption{Overview of progress on the receiver side.}

                \label{Tab:Receiver_Status}

        \end{table}

\newpage

\newpage

 No newline at end of file

 No newline at end of file

Browse

Tools

Subversion Repositories core1990_interlaken

[/] [core1990_interlaken/] [trunk/] [documentation/] [protocol_survey_report/] [Sections/] [Hardware_Implementation.tex] - Diff between revs 5 and 8