Line 1... |
Line 1... |
\chapter{Architecture}
|
\chapter{Architecture}
|
|
|
\section{Block diagram}
|
\section{Block diagram}
|
The architecture of the full IP core is shown in Figure~\ref{blockdiagram}. It consists of two major parts, the actual
|
The architecture of the full IP core is shown in Figure~\ref{blockdiagram}. It consists of two major parts, the actual
|
exponentiation core (\verb|mod_sim_exp_core| entity) with a bus interface wrapped around it. In the following sections these
|
exponentiation core (\verb|mod_sim_exp_core| entity) with a bus interface wrapped around it. In the following sections these
|
different blocks are described in detail.\\
|
different blocks are described in detail. The bus interface and the exponentiation core can run at different clock
|
|
frequencies, so they are independent of each other.\\
|
\begin{figure}[H]
|
\begin{figure}[H]
|
\centering
|
\centering
|
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=10cm]{pictures/block_diagram.pdf}
|
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=10cm]{pictures/block_diagram.pdf}
|
\caption{Block diagram of the Modular Simultaneous Exponentiation IP core}
|
\caption{Block diagram of the Modular Simultaneous Exponentiation IP core}
|
\label{blockdiagram}
|
\label{blockdiagram}
|
Line 28... |
Line 29... |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=10cm]{pictures/mod_sim_exp_core.pdf}
|
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=10cm]{pictures/mod_sim_exp_core.pdf}
|
\cprotect\caption{\verb|mod_sim_exp_core| structure}
|
\cprotect\caption{\verb|mod_sim_exp_core| structure}
|
\label{msec_structure}
|
\label{msec_structure}
|
\end{figure}
|
\end{figure}
|
|
|
|
The multiplier and control unit run on the \verb|core_clk| clock, while the interface to the operand RAM and
|
|
exponent FIFO runs on the \verb|bus_clk| clock. The transition between the two clock domains is mainly
|
|
implemented by the RAM and FIFO. For the remainder, the necessary control signals are synchronised to the
|
|
\verb|bus_clk|. When using the \verb|mod_sim_exp_core|, one can thus assume that all ports are operating on the
|
|
\verb|bus_clk| clock signal.
|
|
|
\subsection{Multiplier}
|
\subsection{Multiplier}
|
The kernel of this design is a pipelined Montgomery multiplier. Montgomery multiplication~\cite{MontModMul} allows efficient implementation of a
|
The kernel of this design is a pipelined Montgomery multiplier. Montgomery multiplication~\cite{MontModMul} allows efficient implementation of a
|
modular multiplication without explicitly carrying out the classical modular reduction step. Right-shift operations ensure that the length of the (intermediate) results does not exceed $n+1$ bits. The result of a Montgomery multiplication is given by~(\ref{eq:mont}):
|
modular multiplication without explicitly carrying out the classical modular reduction step. Right-shift operations ensure that the length of the (intermediate) results does not exceed $n+1$ bits. The result of a Montgomery multiplication is given by~(\ref{eq:mont}):
|
\begin{align}\label{eq:mont}
|
\begin{align}\label{eq:mont}
|
r = x \cdot y \cdot R^{-1} \bmod m \hspace{1.5cm}\text{with } R = 2^{n}
|
r = x \cdot y \cdot R^{-1} \bmod m \hspace{1.5cm}\text{with } R = 2^{n}
|
Line 123... |
Line 130... |
To store the exponents, there is a 32-bit wide FIFO. Every 32-bit entry has to be formatted as 16 bits of $e_0$ for the
|
To store the exponents, there is a 32-bit wide FIFO. Every 32-bit entry has to be formatted as 16 bits of $e_0$ for the
|
lower part [15:0] and 16 bits of $e_1$ for the higher part [31:16]. Entries have to be pushed into the FIFO starting with the least significant word and ending with the most significant word of the exponents.
|
lower part [15:0] and 16 bits of $e_1$ for the higher part [31:16]. Entries have to be pushed into the FIFO starting with the least significant word and ending with the most significant word of the exponents.
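
As an illustration of this format, the $i$-th FIFO entry can be written as below. The index $i$ and the symbol $w_i$ are notation introduced here for the example only; they are not names used by the core.
\begin{align*}
 w_i = e_1[16i+15:16i] \cdot 2^{16} + e_0[16i+15:16i], \qquad i = 0, 1, \ldots
\end{align*}
For example, assuming two 32-bit exponents $e_0 = \mathtt{0x12345678}$ and $e_1 = \mathtt{0x9ABCDEF0}$ (arbitrary values chosen for illustration), the entries would be pushed in the order $w_0 = \mathtt{0xDEF05678}$, then $w_1 = \mathtt{0x9ABC1234}$.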
|
|
|
For the FIFO there are two styles available. The implementation style depends on the style of the operand memory and cannot be set directly. When the RAM option \verb|"xil_prim"| is chosen, the resulting FIFO will use the FIFO18E1 primitive. It is able to store 512 entries, meaning two exponents of 8192 bits each.
|
For the FIFO there are two styles available. The implementation style depends on the style of the operand memory and cannot be set directly. When the RAM option \verb|"xil_prim"| is chosen, the resulting FIFO will use the FIFO18E1 primitive. It is able to store 512 entries, meaning two exponents of 8192 bits each.
|
|
|
When the RAM options \verb|"generic"| or \verb|"asym"| are chosen, a generic FIFO will be implemented. This consists of a symmetric RAM with the control logic for a FIFO. The depth of this generic FIFO is adjustable with the parameter \verb|C_FIFO_DEPTH|.
|
When the RAM options \verb|"generic"| or \verb|"asym"| are chosen, a generic FIFO\footnote{This FIFO is a slightly
|
The number of RAM blocks for the FIFO is given by (\ref{eq:fifoblocks}), where \verb|RAMBLOCK_SIZE| is the size [bits] of the FPGA's RAM primitive.
|
modified version of the generic FIFOs project at OpenCores.org (http://opencores.org/project,generic\_fifos).} will be
|
|
implemented.
|
|
This consists of a dual-port symmetric RAM with the control logic for a FIFO. The depth of this generic FIFO is adjustable with the parameter \verb|C_FIFO_AW|. The number of RAM blocks for the FIFO is given by (\ref{eq:fifoblocks}), where
|
|
\verb|RAMBLOCK_SIZE| is the size [bits] of the FPGA's RAM primitive.
|
\begin{align}
|
\begin{align}
|
\left[\left(\mathtt{C\_FIFO\_DEPTH}+1\right) \cdot 32 \right]/ \mathtt{RAMBLOCK\_SIZE} \label{eq:fifoblocks}
|
\left[\left(\mathtt{2^{C\_FIFO\_AW}}+1\right) \cdot 32 \right]/ \mathtt{RAMBLOCK\_SIZE} \label{eq:fifoblocks}
|
\end{align}
|
\end{align}
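
As a rough worked example of (\ref{eq:fifoblocks}), take the default \verb|C_FIFO_AW| = 7 and assume, purely for illustration, a RAM primitive of 18432 bits. The actual \verb|RAMBLOCK_SIZE| depends on the target FPGA and is not specified here.
\begin{align*}
 \left[\left(2^{7}+1\right) \cdot 32 \right] / 18432 = 4128 / 18432 \approx 0.22
\end{align*}
Assuming the result is rounded up to a whole number of blocks, a single RAM block would suffice under these assumptions.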
|
|
|
\subsection{Control unit}
|
\subsection{Control unit}
|
The control unit loads in the operands and has full control over the multiplier. For single multiplications, it latches in
|
The control unit loads in the operands and has full control over the multiplier. For single multiplications, it latches in
|
the $x$ operand, then places the $y$ operand on the bus and starts the multiplier. In case of an exponentiation, the FIFO is
|
the $x$ operand, then places the $y$ operand on the bus and starts the multiplier. In case of an exponentiation, the FIFO is
|
Line 145... |
Line 155... |
\begin{tabular}{|l|c|c|p{8cm}|}
|
\begin{tabular}{|l|c|c|p{8cm}|}
|
\hline
|
\hline
|
\rowcolor{Gray}
|
\rowcolor{Gray}
|
\textbf{Port} & \textbf{Width} & \textbf{Direction} & \textbf{Description} \bigstrut\\
|
\textbf{Port} & \textbf{Width} & \textbf{Direction} & \textbf{Description} \bigstrut\\
|
\hline
|
\hline
|
\verb|clk| & 1 & in & core clock input \bigstrut\\
|
\verb|core_clk| & 1 & in & core clock input, clock signal for the multiplier and control unit \bigstrut\\
|
|
\hline
|
|
\verb|bus_clk| & 1 & in & bus clock input, clock signal for all core IO \bigstrut\\
|
\hline
|
\hline
|
	\verb|reset| & 1 & in & reset signal (active high), resets the pipeline, FIFO and control logic \bigstrut\\
|
	\verb|reset| & 1 & in & reset signal (active high), resets the pipeline, FIFO and control logic \bigstrut\\
|
\hline
|
\hline
|
\multicolumn{4}{|l|}{\textbf{\textit{operand memory interface}}} \bigstrut\\
|
\multicolumn{4}{|l|}{\textbf{\textit{operand memory interface}}} \bigstrut\\
|
\hline
|
\hline
|
Line 211... |
Line 223... |
\hline
|
\hline
|
\verb|C_NR_STAGES_LOW| & number of lower stages in the pipeline, defines the bit-width of the lower pipeline part & integer & 32 \bigstrut\\
|
\verb|C_NR_STAGES_LOW| & number of lower stages in the pipeline, defines the bit-width of the lower pipeline part & integer & 32 \bigstrut\\
|
\hline
|
\hline
|
\verb|C_SPLIT_PIPELINE| & option to split the pipeline in 2 parts & boolean & true \bigstrut\\
|
\verb|C_SPLIT_PIPELINE| & option to split the pipeline in 2 parts & boolean & true \bigstrut\\
|
\hline
|
\hline
|
\verb|C_FIFO_DEPTH| & depth of the generic FIFO, only applicable if \verb|C_MEM_STYLE| = \verb|"generic"| or \verb|"asym"| & integer & 32 \bigstrut\\
|
	\verb|C_FIFO_AW| & address width of the generic FIFO pointers, FIFO size is equal to $2^{\mathtt{C\_FIFO\_AW}}$, & integer & 7 \bigstrut\\
|
|
& only applicable if \verb|C_MEM_STYLE| = \verb|"generic"| or \verb|"asym"| & & \\
|
\hline
|
\hline
|
\verb|C_MEM_STYLE| & select the RAM memory style (3 options): & string & \verb|"generic"| \bigstrut\\
|
\verb|C_MEM_STYLE| & select the RAM memory style (3 options): & string & \verb|"generic"| \bigstrut\\
|
& \verb|"generic"| : use general 32-bit RAMs & & \\
|
& \verb|"generic"| : use general 32-bit RAMs & & \\
|
& \verb|"asym"| : use asymmetric RAMs & & \\
|
& \verb|"asym"| : use asymmetric RAMs & & \\
|
& (For more information see \ref{subsec:RAM_and_FIFO}) & & \\
|
& (For more information see \ref{subsec:RAM_and_FIFO}) & & \\
|