URL
https://opencores.org/ocsvn/mod_sim_exp/mod_sim_exp/trunk
Subversion Repositories mod_sim_exp
Compare Revisions
- This comparison shows the changes necessary to convert path
/mod_sim_exp/tags/Release_1.3/doc/src
- from Rev 78 to Rev 79
Rev 78 → Rev 79
/architecture.tex
0,0 → 1,244
\chapter{Architecture} |
|
\section{Block diagram} |
The architecture of the full IP core is shown in Figure~\ref{blockdiagram}. It consists of 2 major parts: the actual
exponentiation core (\verb|mod_sim_exp_core| entity) and a bus interface wrapped around it. The following sections
describe these blocks in detail.\\
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=10cm]{pictures/block_diagram.pdf} |
\caption{Block diagram of the Modular Simultaneous Exponentiation IP core} |
\label{blockdiagram} |
\end{figure} |
\newpage |
|
\section{Exponentiation core} |
The exponentiation core (\verb|mod_sim_exp_core| entity) is the top level of the modular simultaneous exponentiation |
core. It is made up of 4 main blocks (Figure~\ref{msec_structure}):\\
|
\begin{itemize} |
\item a pipelined Montgomery multiplier as the main processing unit |
\item RAM to store the operands and the modulus |
\item a FIFO to store the exponents |
\item a control unit which controls the multiplier for the exponentiation and multiplication operations |
\end{itemize} |
|
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=10cm]{pictures/mod_sim_exp_core.pdf} |
\cprotect\caption{\verb|mod_sim_exp_core| structure} |
\label{msec_structure} |
\end{figure} |
|
\subsection{Multiplier} |
The kernel of this design is a pipelined Montgomery multiplier. A Montgomery multiplication\cite{MontModMul} allows efficient implementation of a |
modular multiplication without explicitly carrying out the classical modular reduction step. Right-shift operations ensure that the length of the (intermediate) results does not exceed $n+1$ bits. The result of a Montgomery multiplication is given by~(\ref{eq:mont}): |
\begin{align}\label{eq:mont} |
r = x \cdot y \cdot R^{-1} \bmod m \hspace{1.5cm}\text{with } R = 2^{n} |
\end{align} |
For the structure of the multiplier, the work of \textit{Nedjah and Mourelle}\cite{NedMour} is used as a basis. They show that for large operands ($>$512 bits) the $time\times area$ product is minimal when a systolic implementation is used. This construction is composed of cells that each compute a bit of the (intermediate) result. |
|
Because a fully unrolled two-dimensional systolic implementation would require too many resources, a one-dimensional systolic array implementation is chosen. This implies that the intermediate results are fed back to the same array of cells through a register. A shift register shifts in one bit of the $x$ operand for every step in the calculation (Figure~\ref{mult_structure}). When the multiplication is completed, a final check is made to ensure the result is smaller than the modulus. If not, a final reduction with $m$ is necessary.
|
\textbf{Note:} For this implementation the modulus $m$ has to be odd to obtain a correct result. However, we can assume that for cryptographic applications this is the case.
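
The arithmetic described above can be illustrated with a small software model. The following Python sketch is only a behavioural illustration of a bit-serial Montgomery multiplication as in~(\ref{eq:mont}), not a description of the VHDL implementation. The function name \verb|montgomery_multiply| and its argument order are hypothetical, and it assumes $x, y < m < 2^{n}$ with $m$ odd.

```python
def montgomery_multiply(x: int, y: int, m: int, n: int) -> int:
    """Return x * y * 2^(-n) mod m using one bit of x per step.

    Illustrative model only; assumes m is odd and x, y < m < 2**n.
    """
    if m % 2 == 0:
        raise ValueError("modulus m must be odd")
    a = 0
    for i in range(n):
        x_i = (x >> i) & 1       # bit of x shifted in this step (LSB first)
        a += x_i * y             # conditionally add y
        q = a & 1                # q decides whether m has to be added
        a = (a + q * m) >> 1     # sum is now even, so the right shift is exact
    if a >= m:                   # final check and reduction with m
        a -= m
    return a
```

Depending on $x_i$ and $q$, each step adds $0$, $y$, $m$ or $m+y$ to the intermediate result, which is consistent with the stages taking $m$, $y$ and $m+y$ as inputs.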
|
|
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=15cm]{pictures/mult_structure.pdf} |
\caption{Multiplier structure. For clarification the $my$ adder and reduction logic are depicted separately, whereas in practice they are internal parts of the stages. (See Figure~\ref{stage_structure})} |
\label{mult_structure} |
\end{figure} |
|
\subsubsection{Stage and pipeline structure} |
The Montgomery algorithm uses a series of additions and right shifts to obtain the desired result. The main disadvantage |
is the carry propagation in the adder, and therefore a pipelined version is used. The length of the operands ($n$) and |
the number of pipeline stages can be chosen before synthesis. The user has the option to split the pipeline into 2 |
smaller parts so there are 3 operand lengths available during runtime\footnote{e.g. a total pipeline length of 1536 bit |
split into a part of 512 bit and a part of 1024 bit}. |
|
The stages and first and last cell logic design are presented in Figure~\ref{stage_structure}. Each stage takes in a |
part of the modulus $m$ and $y$ operand and for each step of the multiplication, a bit of the $x$ operand is fed to the |
pipeline (together with the generated $q$ signal), starting with the Least Significant Bit. The systolic array cells |
need the modulus $m$, the operand $y$ and the sum $m+y$ as an input. The result from the cells is latched into a |
register, and then passed back to the systolic cells for the next bit of $x$. During this pass the right shift operation |
is implemented. Each stage thus needs the least significant bit from the next stage to calculate the next step. Final |
reduction logic is also present in the stages for when the multiplication is complete. |
|
An example of the standard pipeline structure is presented in Figure~\ref{pipeline_structure}. It is constructed using |
stages with a predefined width. The first cell logic processes the first bit of the $m$ and $y$ operand and generates |
the $q$ signal. The last cell logic finishes the reduction and selects the correct result. For operation of this |
pipeline, it is clear that each stage can only compute a step every 2 clock cycles. This is because the stages rely on |
the result of the next stage. |
|
In Figure~\ref{pipeline_structure_split} an example pipeline design is drawn for a split pipeline. All multiplexers on |
this figure are controlled by the pipeline select signal (\verb|p_sel|). During runtime the user can choose which part |
of the pipeline is used, the lower or higher part or the full pipeline. |
|
\newpage |
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=25cm, angle=90]{pictures/sys_stage.pdf} |
\caption{Pipeline stage and first and last cell logic} |
\label{stage_structure} |
\end{figure} |
\newpage |
|
\newpage |
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=25cm, angle=90]{pictures/sys_pipeline_notsplit.pdf} |
\caption{Example of the pipeline structure (3 stages)} |
\label{pipeline_structure} |
\end{figure} |
\newpage |
|
\newpage |
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=22cm, angle=90]{pictures/sys_pipeline.pdf} |
\caption{Example of a split pipeline (1+2 stages)} |
\label{pipeline_structure_split} |
\end{figure} |
\newpage |
|
|
\subsection{Operand RAM and exponent FIFO} \label{subsec:RAM_and_FIFO} |
The core's RAM is designed to store 4 operands and a modulus.\footnote{This is the default configuration. The number of operands can be increased, but the control logic is only designed to work with the default configuration.} Three options are available for the implementation of the RAM; the parameter \verb|C_MEM_STYLE| selects the implementation style. All styles try to use the RAM resources available on the FPGA.
|
If the FPGA supports asymmetric RAMs, i.e. with a different read and write width, we suggest that the option \verb|"asym"| is selected. Since the (device specific) RAM blocks are inferred through code, it is imperative to select the right device (\verb|C_DEVICE|), as this inference is different between manufacturers. Currently, only Altera and Xilinx are supported. |
|
If there's no asymmetric RAM support, the option \verb|"generic"| should be selected. This option will work for most FPGAs, but the disadvantage is that it will use more resources than the \verb|"asym"| option. This is because a significant number of LUTs will be used to construct an asymmetric RAM. |
|
For both options the size of the RAM adapts dynamically to the chosen pipeline width (\verb|C_NR_BITS_TOTAL|). |
|
Finally, the option \verb|"xil_prim"| is targeted specifically to Xilinx devices. It uses blocks of RAM generated with CoreGen. These blocks are of a fixed width and this results in a fixed RAM of 4x1536 bit for the operands and 1536 bit for the modulus. This option is deprecated in favor of \verb|"asym"|. |
|
Reading and writing (from the bus side) to the operands and modulus is done one 32-bit word at a time. If a split pipeline is used, it is important that operands for the higher part of the pipeline are loaded into the RAM with the lower bits of the pipeline filled with zeros. As a rule of thumb, the number of FPGA RAM blocks that will be used is given by (\ref{eq:ramblocks}):
\begin{align} |
2 \cdot \mathtt{C\_NR\_BITS\_TOTAL} / 32\label{eq:ramblocks} |
\end{align} |
\newline |
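
As a quick check of the rule of thumb in (\ref{eq:ramblocks}), assume the default value \verb|C_NR_BITS_TOTAL| = 1536. The estimate then becomes:
\begin{align*}
2 \cdot 1536 / 32 = 96
\end{align*}
so about 96 RAM blocks would be expected for this configuration. The exact number reported by the synthesis tool may differ, since this is only an estimate.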
|
To store the exponents, there is a 32-bit wide FIFO. Every 32-bit entry has to be formatted as 16 bits of $e_0$ in the
lower part [15:0] and 16 bits of $e_1$ in the higher part [31:16]. Entries have to be pushed into the FIFO starting with the least significant word and ending with the most significant word of the exponents.
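
The entry format can be sketched in software as follows. This Python fragment is a minimal illustration, assuming both exponents are available as integers of the same bit length; the function name \verb|pack_exponents| is hypothetical and not part of the core or its drivers.

```python
from typing import List


def pack_exponents(e0: int, e1: int, nbits: int) -> List[int]:
    """Return the 32-bit FIFO entries for e0 and e1, least significant first.

    Each entry holds 16 bits of e0 in [15:0] and 16 bits of e1 in [31:16].
    nbits is the exponent length in bits (assumed to be a multiple of 16).
    """
    if nbits % 16:
        raise ValueError("nbits must be a multiple of 16")
    entries = []
    for shift in range(0, nbits, 16):
        low = (e0 >> shift) & 0xFFFF     # 16 bits of e0 -> bits [15:0]
        high = (e1 >> shift) & 0xFFFF    # 16 bits of e1 -> bits [31:16]
        entries.append((high << 16) | low)
    return entries
```

For two 8192-bit exponents this yields $8192/16 = 512$ entries.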
|
For the FIFO there are 2 styles available. The implementation style depends on the style of the operand memory and cannot be set directly. When the RAM option \verb|"xil_prim"| is chosen, the resulting FIFO uses the FIFO18E1 primitive. It is able to store 512 entries, i.e. 2 exponents of 8192 bits each.
|
When the RAM option \verb|"generic"| or \verb|"asym"| is chosen, a generic FIFO is implemented. This consists of a symmetric RAM with the control logic for a FIFO. The depth of this generic FIFO is adjustable with the parameter \verb|C_FIFO_DEPTH|.
The number of RAM blocks for the FIFO is given by (\ref{eq:fifoblocks}), where \verb|RAMBLOCK_SIZE| is the size [bits] of the FPGA's RAM primitive. |
\begin{align} |
\left[\left(\mathtt{C\_FIFO\_DEPTH}+1\right) \cdot 32 \right]/ \mathtt{RAMBLOCK\_SIZE} \label{eq:fifoblocks} |
\end{align} |
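
As an illustration of (\ref{eq:fifoblocks}), take the default \verb|C_FIFO_DEPTH| = 32 and assume a RAM primitive of 18~Kbit, i.e. \verb|RAMBLOCK_SIZE| = 18432. This block size is only an example value; the actual size depends on the target device.
\begin{align*}
\left[\left(32+1\right) \cdot 32 \right] / 18432 = 1056 / 18432 \approx 0.06
\end{align*}
Assuming the result is rounded up to a whole number of blocks, the default FIFO would fit in a single RAM block under this assumption.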
|
\subsection{Control unit} |
The control unit loads in the operands and has full control over the multiplier. For single multiplications, it latches in |
the $x$ operand, then places the $y$ operand on the bus and starts the multiplier. In case of an exponentiation, the FIFO is |
emptied while the necessary single multiplications are performed. When the computation is done, the ready signal is |
asserted to notify the system. |
|
\newpage |
\subsection{IO ports and memory map} |
The \verb|mod_sim_exp_core| IO ports\\ |
\newline |
% Table generated by Excel2LaTeX |
\begin{tabular}{|l|c|c|p{8cm}|} |
\hline |
\rowcolor{Gray} |
\textbf{Port} & \textbf{Width} & \textbf{Direction} & \textbf{Description} \bigstrut\\ |
\hline |
\verb|clk| & 1 & in & core clock input \bigstrut\\ |
\hline |
\verb|reset| & 1 & in & reset signal (active high) resets the pipeline, fifo and control logic \bigstrut\\ |
\hline |
\multicolumn{4}{|l|}{\textbf{\textit{operand memory interface}}} \bigstrut\\ |
\hline |
\verb|rw_address| & 9 & in & operand memory read/write address (structure described below) \bigstrut\\
\hline |
\verb|data_out| & 32 & out & operand data out (0 is lsb) \bigstrut\\ |
\hline |
\verb|data_in| & 32 & in & operand data in (0 is lsb) \bigstrut\\ |
\hline |
\verb|write_enable| & 1 & in & write enable signal, latches \verb|data_in| to operand RAM \bigstrut\\ |
\hline |
\verb|collision| & 1 & out & collision output, asserts on a write error \bigstrut\\ |
\hline |
\multicolumn{4}{|l|}{\textbf{\textit{exponent FIFO interface}}} \bigstrut\\ |
\hline |
\verb|fifo_din| & 32 & in & FIFO data in, bits [31:16] for $e_1$ operand and bits [15:0] for $e_0$ operand \bigstrut\\ |
\hline |
\verb|fifo_push| & 1 & in & push \verb|fifo_din| into the FIFO \bigstrut\\ |
\hline |
\verb|fifo_nopush| & 1 & out & flag to indicate if there was an error pushing the word to the FIFO \bigstrut\\ |
\hline |
\verb|fifo_full| & 1 & out & flag to indicate the FIFO is full \bigstrut\\ |
\hline |
\multicolumn{4}{|l|}{\textbf{\textit{control signals}}} \bigstrut\\ |
\hline |
\verb|x_sel_single| & 2 & in & selection for x operand source during single multiplication \bigstrut\\ |
\hline |
\verb|y_sel_single| & 2 & in & selection for y operand source during single multiplication \bigstrut\\ |
\hline |
\verb|dest_op_single| & 2 & in & selection for the result destination operand for single multiplication \bigstrut\\ |
\hline |
\verb|p_sel| & 2 & in & specifies which pipeline part to use for exponentiation / multiplication. \bigstrut[t]\\ |
& & & ``01'' : use lower pipeline part \\ |
& & & ``10'' : use higher pipeline part \\ |
& & & ``11'' : use full pipeline \bigstrut[b]\\ |
\hline |
\verb|modulus_sel| & 1 & in & selection for which modulus to use for the calculations (only available if \verb|C_MEM_STYLE| = \verb|"generic"| or \verb|"asym"|). Otherwise set to 0 \bigstrut\\ |
\hline |
\verb|exp_m| & 1 & in & core operation mode. ``0'' for single multiplications and ``1'' for exponentiations \bigstrut\\ |
\hline |
\verb|start| & 1 & in & start the calculation for current mode \bigstrut\\ |
\hline |
\verb|ready| & 1 & out & indicates the multiplication/exponentiation is done \bigstrut\\ |
\hline |
\verb|calc_time| & 1 & out & is high during a multiplication, indicator for used calculation time \bigstrut\\ |
\hline |
\end{tabular}% |
\newpage |
The \verb|mod_sim_exp_core| parameters\\ |
\begin{center} |
\begin{tabular}{|l|p{6.5cm}|c|l|} |
\hline |
\rowcolor{Gray} |
\textbf{Name} & \textbf{Description} & \textbf{VHDL Type} &\textbf{Default Value} \bigstrut\\ |
\hline |
\verb|C_NR_BITS_TOTAL| & total width of the multiplier in bits & integer & 1536\bigstrut\\ |
\hline |
\verb|C_NR_STAGES_TOTAL| & total number of stages in the pipeline & integer & 96\bigstrut\\ |
\hline |
\verb|C_NR_STAGES_LOW| & number of lower stages in the pipeline, defines the bit-width of the lower pipeline part & integer & 32 \bigstrut\\ |
\hline |
\verb|C_SPLIT_PIPELINE| & option to split the pipeline in 2 parts & boolean & true \bigstrut\\ |
\hline |
\verb|C_FIFO_DEPTH| & depth of the generic FIFO, only applicable if \verb|C_MEM_STYLE| = \verb|"generic"| or \verb|"asym"| & integer & 32 \bigstrut\\ |
\hline |
\verb|C_MEM_STYLE| & select the RAM memory style (3 options): & string & \verb|"generic"| \bigstrut\\ |
& \verb|"generic"| : use general 32-bit RAMs & & \\ |
& \verb|"asym"| : use asymmetric RAMs & & \\ |
& (For more information see \ref{subsec:RAM_and_FIFO}) & & \\ |
& \verb|"xil_prim"| : use xilinx primitives & &\\ |
& (deprecated) & & \bigstrut[b] \\ |
\hline |
\verb|C_DEVICE| & device manufacturer: & & \\ |
& \verb|"xilinx"| or \verb|"altera"| & string & \verb|"xilinx"| \bigstrut\\ |
\hline |
\end{tabular}% |
\end{center} |
|
The following figure describes the structure of the operand RAM. For every operand a space of 2048 bits is
reserved, so operand widths up to 2048 bits are supported.\\
\newline \\ |
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=15cm]{pictures/msec_memory.pdf} |
\caption{Address structure of the exponentiation core} |
\label{Address_structure} |
\end{figure} |
|
\section{Bus interface} |
The bus interface implements the registers necessary for the control unit inputs of the \verb|mod_sim_exp_core| entity.
It also maps the memory to the required bus and connects the interrupt signals. The embedded processor then has full control
over the core.
/mod_sim_exp.tex
0,0 → 1,63
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
% % |
% LaTeX, Modular Simultaneous Exponentiation core documentation % |
% % |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
\documentclass[11pt,a4paper,twoside]{report} |
|
\usepackage[a4paper,left=2.5cm, right=2cm, top=2cm, bottom=2cm]{geometry} |
\usepackage{graphicx} |
\usepackage[latin1]{inputenc} |
\usepackage{listings} % for showing source code |
\usepackage{verbatim} |
\usepackage{hyperref} |
\usepackage{url} |
\usepackage[small,bf,hang]{caption} |
\usepackage{pslatex} |
\usepackage{bigstrut} |
\usepackage{color, colortbl} |
\usepackage[bottom]{footmisc} % to place footnotes at the bottom |
|
\usepackage{algorithm} % for pseudocode |
\usepackage{algorithmicx} |
\usepackage[noend]{algpseudocode} |
\usepackage[fleqn]{amsmath} |
|
\usepackage{cprotect} % for verb in caption |
|
\usepackage{sectsty} % adjust fonts from sections and captions |
\allsectionsfont{\sffamily} |
\chapterfont{\raggedright\sffamily} |
|
\usepackage{float} % ex. \begin{figure}[H] |
|
\newcommand{\tab}{\hspace*{2em}} |
\newcommand{\version}{v1.3} |
\newcommand{\dramco}{DraMCo research group -- KAHO Sint-Lieven\\Association KU Leuven} |
\newcommand{\thetitle}{Modular Simultaneous Exponentiation\\IP Core Specification (\version)} |
|
\usepackage{mod_sim_exp_style} |
\definecolor{Gray}{gray}{0.9} |
\bibliographystyle{ieeetr} |
|
\title{\thetitle} |
\author{Jonas De Craene} |
\authorEmail{JonasDC@opencores.org} |
\hwDesigner{Geoffrey Ottoy, DraMCo research group} |
\hwCoDesigner{Jonas De Craene} |
\company{DraMCo research group, \href{mailto:info@dramco.org}{info@dramco.org}} |
|
\begin{document} |
|
\preface |
\raggedbottom |
\input{introduction} |
\input{architecture} |
\input{operation} |
\input{plb_interface} |
\input{performance} |
|
\bibliography{cited} |
|
\input{license} |
\end{document} |
/plb_interface.tex
0,0 → 1,339
\chapter{PLB interface} |
\section{Structure} |
The Processor Local Bus interface for this core is structured as in Figure~\ref{PLBstructure}. The core acts as a slave |
to the PLB bus. The PLB v4.6 Slave\cite{XilinxPLB} logic translates the interface to a lower level IP Interconnect |
Interface (IPIC). |
The core's internal components are connected to this interface. The user logic contains the exponentiation core and the
control register for the core's control inputs and outputs. An internal interrupt controller\cite{XilinxIntr} handles
the outgoing interrupt requests and a software reset module is provided to be able to reset the IP core at runtime. This
bus interface was created using the ``Create or Import Peripheral'' wizard from Xilinx Platform Studio.\\
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=7cm]{pictures/plb_interface.pdf} |
\caption{PLB IP core structure} |
\label{PLBstructure} |
\end{figure} |
|
\newpage |
\section{Parameters} |
This section describes the parameters used to configure the core. Only the relevant parameters are discussed; PLB
specific parameters are left to the user to configure. The IP core specific parameters and their respective use are
listed in the table below.
\begin{center} |
\begin{tabular}{|l|p{6.5cm}|c|l|} |
\hline |
\rowcolor{Gray} |
\textbf{Name} & \textbf{Description} & \textbf{VHDL Type} &\textbf{Default Value} \bigstrut\\ |
\hline |
\multicolumn{4}{|l|}{\textit{\textbf{Memory configuration}}} \\ |
\hline |
\verb|C_FIFO_DEPTH| & depth of the generic FIFO, only applicable if \verb|C_MEM_STYLE| = \verb|"generic"| or \verb|"asym"| & integer & 32 \bigstrut\\ |
\hline |
\verb|C_MEM_STYLE| & the memory structure to use for the RAM, choice between 3 options: & string & \verb|"generic"| \bigstrut\\ |
& \verb|"xil_prim"| : use xilinx primitives & & \\ |
& \verb|"generic"| : use general 32-bit RAMs & & \\ |
& \verb|"asym"| : use asymmetric RAMs & & \\ |
& (For more information see \ref{subsec:RAM_and_FIFO}) & & \bigstrut[b] \\ |
\hline |
\verb|C_DEVICE| & device manufacturer: & string & \verb|"xilinx"| \\ |
& \verb|"xilinx"| or \verb|"altera"| & & \bigstrut\\ |
\hline |
\verb|C_BASEADDR| & base address for the IP core's memory space & std\_logic\_vector & X"FFFFFFFF" \bigstrut\\ |
\hline |
\verb|C_HIGHADDR| & high address for the IP core's memory space & std\_logic\_vector & X"00000000" \bigstrut\\ |
\hline |
\verb|C_M_BASEADDR| & base address for the modulus memory space & std\_logic\_vector & X"FFFFFFFF" \bigstrut\\ |
\hline |
\verb|C_M_HIGHADDR| & high address for the modulus memory space & std\_logic\_vector & X"00000000" \bigstrut\\ |
\hline |
\verb|C_OP0_BASEADDR| & base address for the operand 0 memory space & std\_logic\_vector & X"FFFFFFFF" \bigstrut\\ |
\hline |
\verb|C_OP0_HIGHADDR| & high address for the operand 0 memory space & std\_logic\_vector & X"00000000" \bigstrut\\ |
\hline |
\verb|C_OP1_BASEADDR| & base address for the operand 1 memory space & std\_logic\_vector & X"FFFFFFFF" \bigstrut\\ |
\hline |
\verb|C_OP1_HIGHADDR| & high address for the operand 1 memory space & std\_logic\_vector & X"00000000" \bigstrut\\ |
\hline |
\verb|C_OP2_BASEADDR| & base address for the operand 2 memory space & std\_logic\_vector & X"FFFFFFFF" \bigstrut\\ |
\hline |
\verb|C_OP2_HIGHADDR| & high address for the operand 2 memory space & std\_logic\_vector & X"00000000" \bigstrut\\ |
\hline |
\verb|C_OP3_BASEADDR| & base address for the operand 3 memory space & std\_logic\_vector & X"FFFFFFFF" \bigstrut\\ |
\hline |
\verb|C_OP3_HIGHADDR| & high address for the operand 3 memory space & std\_logic\_vector & X"00000000" \bigstrut\\ |
\hline |
\verb|C_FIFO_BASEADDR| & base address for the FIFO memory space & std\_logic\_vector & X"FFFFFFFF" \bigstrut\\ |
\hline |
\verb|C_FIFO_HIGHADDR| & high address for the FIFO memory space & std\_logic\_vector & X"00000000" \bigstrut\\ |
\hline |
\multicolumn{4}{|l|}{\textit{\textbf{Multiplier configuration}}} \\ |
\hline |
\verb|C_NR_BITS_TOTAL| & total width of the multiplier in bits & integer & 1536\bigstrut\\ |
\hline |
\verb|C_NR_STAGES_TOTAL| & total number of stages in the pipeline & integer & 96\bigstrut\\ |
\hline |
\verb|C_NR_STAGES_LOW| & number of lower stages in the pipeline, defines the bit-width of the lower pipeline part & integer & 32 \bigstrut\\ |
\hline |
\verb|C_SPLIT_PIPELINE| & option to split the pipeline in 2 parts & boolean & true \bigstrut\\ |
\hline |
\end{tabular}% |
\end{center} |
%\newline |
|
The complete IP core's memory space can be controlled. As can be seen, the operand, modulus and FIFO memory spaces can be
chosen separately from the IP core's memory space, which holds the registers for control, software reset and interrupt
control. The core's memory space must have a minimum size of 1K byte for all registers to be accessible. For the FIFO
memory space, a minimum size of 4 bytes is needed, since the FIFO is only 32 bit wide. The memory spaces for the
operands and the modulus each need a minimum size equal to the total multiplier width.\\
|
There are 4 parameters to configure the multiplier. These values define the width of the multiplier operands and the |
number of pipeline stages. If \verb|C_SPLIT_PIPELINE| is false, only operands with a width of\\\verb|C_NR_BITS_TOTAL| are |
valid. Else if \verb|C_SPLIT_PIPELINE| is true, 3 operand widths can be supported: |
\begin{itemize} |
\item the length of the full pipeline ($C\_NR\_BITS\_TOTAL$) |
\item the length of the lower pipeline ($\frac{C\_NR\_BITS\_TOTAL}{C\_NR\_STAGES\_TOTAL} \cdot C\_NR\_STAGES\_LOW $) |
 \item the length of the higher pipeline ($\frac{C\_NR\_BITS\_TOTAL}{C\_NR\_STAGES\_TOTAL} \cdot (C\_NR\_STAGES\_TOTAL - C\_NR\_STAGES\_LOW)$)
\end{itemize} |
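
As a worked example, assume the default parameter values from the table above (\verb|C_NR_BITS_TOTAL| = 1536, \verb|C_NR_STAGES_TOTAL| = 96 and \verb|C_NR_STAGES_LOW| = 32). Each stage is then $1536/96 = 16$ bits wide, and the three operand widths follow as:
\begin{align*}
\text{full pipeline:}\quad & 1536 \text{ bit}\\
\text{lower pipeline:}\quad & \frac{1536}{96} \cdot 32 = 512 \text{ bit}\\
\text{higher pipeline:}\quad & \frac{1536}{96} \cdot (96 - 32) = 1024 \text{ bit}
\end{align*}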
|
\section{IO ports} |
\begin{tabular}{|l|c|c|l|} |
\hline |
\rowcolor{Gray} |
\textbf{Port} & \textbf{Width} & \textbf{Direction} & \textbf{Description} \\ |
\hline |
\multicolumn{4}{|l|}{\textit{\textbf{PLB bus connections}}} \\ |
\hline |
\verb|SPLB_Clk| & 1 & in & see note 1 \\ |
\hline |
\verb|SPLB_Rst| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_ABus| & 32 & in & see note 1 \\ |
\hline |
\verb|PLB_PAValid| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_masterID| & 3 & in & see note 1 \\ |
\hline |
\verb|PLB_RNW| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_BE| & 4 & in & see note 1 \\ |
\hline |
\verb|PLB_size| & 4 & in & see note 1 \\ |
\hline |
\verb|PLB_type| & 3 & in & see note 1 \\ |
\hline |
\verb|PLB_wrDBus| & 32 & in & see note 1 \\ |
\hline |
\verb|Sl_addrAck| & 1 & out & see note 1 \\ |
\hline |
\verb|Sl_SSize| & 2 & out & see note 1 \\ |
\hline |
\verb|Sl_wait| & 1 & out & see note 1 \\ |
\hline |
\verb|Sl_rearbitrate| & 1 & out & see note 1 \\ |
\hline |
\verb|Sl_wrDack| & 1 & out & see note 1 \\ |
\hline |
\verb|Sl_wrComp| & 1 & out & see note 1 \\ |
\hline |
\verb|Sl_rdBus| & 32 & out & see note 1 \\ |
\hline |
\verb|Sl_MBusy| & 8 & out & see note 1 \\ |
\hline |
\verb|Sl_MWrErr| & 8 & out & see note 1 \\ |
\hline |
\verb|Sl_MRdErr| & 8 & out & see note 1 \\ |
\hline |
\multicolumn{4}{|l|}{\textit{\textbf{unused PLB signals}}} \\ |
\hline |
\verb|PLB_UABus| & 32 & in & see note 1 \\ |
\hline |
\verb|PLB_SAValid| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_rdPrim| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_wrPrim| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_abort| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_busLock| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_MSize| & 2 & in & see note 1 \\ |
\hline |
\verb|PLB_TAttribute| & 16 & in & see note 1 \\ |
\hline |
\verb|PLB_lockerr| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_wrBurst| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_rdBurst| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_wrPendReq| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_rdPendReq| & 1 & in & see note 1 \\ |
\hline |
\verb|PLB_rdPendPri| & 2 & in & see note 1 \\ |
\hline |
\verb|PLB_wrPendPri| & 2 & in & see note 1 \\ |
\hline |
\verb|PLB_reqPri| & 2 & in & see note 1 \\ |
\hline |
\verb|Sl_wrBTerm| & 1 & out & see note 1 \\ |
\hline |
\verb|Sl_rdWdAddr| & 4 & out & see note 1 \\ |
\hline |
\verb|Sl_rdBTerm| & 1 & out & see note 1 \\ |
\hline |
\verb|Sl_MIRQ| & 8 & out & see note 1 \\ |
\hline |
\multicolumn{4}{|l|}{\textit{\textbf{Core signals}}} \\ |
\hline |
\verb|IP2INTC_Irpt| & 1 & out & core interrupt signal \\ |
\hline |
\verb|calc_time| & 1 & out & is high when core is performing a multiplication, for monitoring \\ |
\hline |
\end{tabular}% |
\newline \newline |
\textbf{Note 1:} The function and timing of this signal are defined in the IBM\textsuperscript{\textregistered} 128-Bit Processor Local Bus Architecture Specification
Version 4.6.
|
\section{Registers} |
This section specifies the IP core's internal registers as seen from the software. These registers are used to control and
configure the modular exponentiation core and to read out its state. All addresses given in this table are relative to the
IP core's base address.\\
\newline |
% Table generated by Excel2LaTeX |
\begin{tabular}{|l|c|c|c|l|} |
\hline |
\rowcolor{Gray} |
\textbf{Name} & \textbf{Width} & \textbf{Address} & \textbf{Access} & \textbf{Description} \bigstrut\\ |
\hline |
control register & 32 & 0x0000 & RW & multiplier core control signals and \bigstrut[t]\\ |
& & & & interrupt flags register\bigstrut[b]\\ |
\hline |
software reset & 32 & 0x0100 & W & soft reset for the IP core \bigstrut\\ |
\hline |
\multicolumn{5}{|l|}{\textbf{\textit{Interrupt controller registers}}} \bigstrut\\ |
\hline |
global interrupt enable register & 32 & 0x021C & RW & global interrupt enable for the IP core \bigstrut[t]\\ |
interrupt status register & 32 & 0x0220 & R & register for interrupt status flags\\ |
interrupt enable register & 32 & 0x0228 & RW & register to enable individual IP core interrupts \bigstrut[b]\\ |
\hline |
\end{tabular}% |
|
\newpage |
\subsection{Control register (offset = 0x0000)} |
This register holds the control inputs to the multiplier core and the interrupt flags.\\
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=15cm]{pictures/plb_control_reg.pdf} |
\caption{control register} |
\end{figure} |
|
|
\begin{tabular}{ll} |
bits 0-1 & P\_SEL : selects which pipeline part to be active\\ |
& $\bullet$ "01" lower pipeline part\\ |
& $\bullet$ "10" higher pipeline part\\ |
& $\bullet$ "11" full pipeline\\ |
& $\bullet$ "00" invalid selection\\ |
&\\ |
bits 2-3 & DEST\_OP : selects the operand (0-3) to store the result in for a single\\ |
& Montgomery multiplication\footnotemark\\ |
&\\ |
bits 4-5 & X\_OP : selects the x operand (0-3) for a single Montgomery multiplication\footnotemark[\value{footnote}]\\ |
&\\ |
bits 6-7 & Y\_OP : selects the y operand (0-3) for a single Montgomery multiplication\footnotemark[\value{footnote}]\\ |
&\\ |
bit 8 & START : starts the multiplication/exponentiation\\ |
&\\ |
bit 9 & EXP/M : selects the operating mode\\ |
& $\bullet$ "0" single Montgomery multiplications\\ |
& $\bullet$ "1" simultaneous exponentiations\\ |
&\\ |
bits 10-15 & unimplemented\\ |
&\\ |
bit 16 & READY : ready flag, "1" when multiplication is done\\ |
& must be cleared in software\\ |
&\\ |
bit 17 & MEM\_ERR : memory collision error flag, "1" when write error occurred\\ |
& must be cleared in software\\ |
&\\ |
bit 18 & FIFO\_FULL : FIFO full error flag, "1" when FIFO is full\\ |
& must be cleared in software\\ |
&\\ |
bit 19 & FIFO\_ERR : FIFO write/push error flag, "1" when push error occurred\\ |
& must be cleared in software\\ |
&\\ |
bits 20-31 & unimplemented\\ |
&\\ |
\end{tabular} |
\newline |
\newline |
\footnotetext{when the core is running in exponentiation mode, the parameters DEST\_OP, X\_OP and Y\_OP have no effect.} |
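
The bit fields listed above can be combined into a 32-bit control word in software. The Python sketch below is an illustration under one important assumption: bit 0 is taken to be the most significant bit of the register (the usual PLB numbering), which matches the description of the GIE bit as bit 0 ``in the high-order position''. If the register figure shows a different ordering, the shift amounts have to be adapted. The function name \verb|control_word| and its arguments are hypothetical.

```python
def control_word(p_sel: int, dest_op: int = 0, x_op: int = 0, y_op: int = 0,
                 start: bool = False, exp_mode: bool = False) -> int:
    """Assemble the control register value from its fields.

    ASSUMPTION: bit 0 is the MSB of the 32-bit register (PLB-style numbering).
    """
    def field(value: int, first_bit: int, width: int) -> int:
        if not 0 <= value < (1 << width):
            raise ValueError("field value out of range")
        # a field starting at (MSB-first) bit first_bit ends at first_bit+width-1
        return value << (32 - first_bit - width)

    if p_sel == 0:
        raise ValueError('P_SEL = "00" is an invalid selection')
    return (field(p_sel, 0, 2)             # bits 0-1: P_SEL
            | field(dest_op, 2, 2)         # bits 2-3: DEST_OP
            | field(x_op, 4, 2)            # bits 4-5: X_OP
            | field(y_op, 6, 2)            # bits 6-7: Y_OP
            | field(int(start), 8, 1)      # bit 8: START
            | field(int(exp_mode), 9, 1))  # bit 9: EXP/M
```

Under this assumption, starting an exponentiation on the full pipeline corresponds to \verb|control_word(0b11, start=True, exp_mode=True)|, which evaluates to 0xC0C00000.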
|
\newpage |
\subsection{Software reset register (offset = 0x0100)} |
This is a register with write only access, and provides the possibility to reset the IP core from software by writing |
0x0000000A to this address. The reset affects the full IP core, thus resetting the control register, interrupt controller, |
the multiplier pipeline, FIFO and control logic of the core. |
|
\subsection{Global interrupt enable register (offset = 0x021C)} |
This register contains a single defined bit in the high-order position. The GIE bit enables or disables all interrupts
from the IP core.\\
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=15cm]{pictures/plb_gie_reg.pdf} |
\caption{Global interrupt enable register} |
\end{figure} |
|
\begin{tabular}{ll} |
bit 0 & GIE : Global interrupt enable\\ |
& $\bullet$ "0" disables all core interrupts\\ |
& $\bullet$ "1" enables all core interrupts\\ |
&\\ |
bits 1-31 & unimplemented\\ |
&\\ |
\end{tabular} |
|
\subsection{Interrupt status register (offset = 0x0220)} |
Read-only register that contains the status of the core interrupts. Currently there is only one common interrupt from |
the core that is asserted when a multiplication/exponentiation is done, FIFO is full, on FIFO push error or memory write |
collision.\\ |
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=15cm]{pictures/plb_is_reg.pdf} |
\caption{Interrupt status register} |
\end{figure} |
|
\begin{tabular}{ll} |
bits 0-30 & unimplemented\\ |
&\\ |
bit 31 & CIS : Core interrupt status\\ |
& is high when interrupt is requested from core\\ |
&\\ |
\end{tabular} |
|
\subsection{Interrupt enable register (offset = 0x0228)}
This register contains the interrupt enable bits for the respective interrupt bits of the interrupt status register.\\ |
\begin{figure}[H] |
\centering |
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=15cm]{pictures/plb_ie_reg.pdf} |
\caption{Interrupt enable register} |
\end{figure} |
\begin{tabular}{ll} |
bits 0-30 & unimplemented\\ |
&\\ |
bit 31 & CIE : Core interrupt enable\\ |
& $\bullet$ "0" disable core interrupt\\ |
& $\bullet$ "1" enable core interrupt\\ |
&\\ |
\end{tabular} |
|
\section{Interfacing the core's RAM} |
Special care must be taken when writing data to the operands and modulus. The least significant bit of the data has to be at the lowest
address and the most significant bit at the highest address. A write to the RAM has to happen 1 word at a time; byte writes are not
supported due to the structure of the RAM.
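
The word ordering can be illustrated with a short Python sketch that splits an operand into 32-bit words, lowest address first. The helper name \verb|operand_words| is hypothetical. The optional \verb|low_pad_bits| argument models the zero padding of the lower pipeline bits that is required when an operand is intended for the higher part of a split pipeline; how the resulting words are actually written to the bus is left to the driver software.

```python
from typing import List


def operand_words(value: int, nbits: int, low_pad_bits: int = 0) -> List[int]:
    """Split an operand into 32-bit words, least significant word first.

    low_pad_bits zero words' worth of bits are placed below the operand,
    e.g. the width of the lower pipeline part for a higher-part operand.
    """
    if nbits % 32 or low_pad_bits % 32:
        raise ValueError("widths must be multiples of 32 bits")
    if value < 0 or value >> nbits:
        raise ValueError("value does not fit in nbits")
    shifted = value << low_pad_bits      # lower bits become zeros
    total_bits = nbits + low_pad_bits
    return [(shifted >> s) & 0xFFFFFFFF for s in range(0, total_bits, 32)]
```

The word at index $i$ of the returned list belongs at byte offset $4i$ from the start of the operand's memory space.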
|
\section{Handling interrupts} |
When the embedded processor receives an interrupt signal from this core, it is up to the controlling software to
determine the source of the interrupt by reading out the interrupt flags of the control register.
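
A possible way to decode these flags in software is sketched below in Python. It assumes that the value of the control register has already been read over the bus, and that bit 0 denotes the most significant bit of the register (PLB-style numbering, consistent with the description of the GIE bit). Both the name \verb|decode_flags| and this bit ordering are assumptions that should be checked against the register figure.

```python
from typing import Dict

# Flag positions as listed for the control register (bit 0 assumed to be the MSB)
FLAG_BITS = {"READY": 16, "MEM_ERR": 17, "FIFO_FULL": 18, "FIFO_ERR": 19}


def decode_flags(control_reg: int) -> Dict[str, bool]:
    """Return the state of each interrupt flag in a control register value."""
    return {name: bool((control_reg >> (31 - bit)) & 1)
            for name, bit in FLAG_BITS.items()}
```

Since all flags must be cleared in software, an interrupt handler would typically act on every flag that is set and then write the control register back with those flags cleared.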
/acknowl.tex
0,0 → 1,34
\chapter*{Acknowledgments} |
\addcontentsline{toc}{chapter}{Acknowledgments} |
This project is maintained by the DraMCo research group\footnote{\url{http://www.dramco.be/}} of KAHO Sint-Lieven\footnote{\url{http://www.kahosl.be/}}, part of the KU Leuven association\footnote{\url{http://associatie.kuleuven.be/}}. |
The base design for this IP core was written by Geoffrey Ottoy, member of the DraMCo research group. Further adjustments have been made by Jonas De Craene.
|
%\addtocontents{toc}{Document Revision History} |
|
\chapter*{Document Revision History} |
\addcontentsline{toc}{chapter}{Document Revision History} |
|
\section*{History} |
\begin{tabular}{|l|c|l|p{10cm}|} |
\hline |
\rowcolor{Gray} |
\textbf{Revision} & \textbf{Date} & \textbf{By} & \textbf{Description} \\ |
\hline |
0 & November 2012 & JDC & First draft of this specification\\ |
\hline |
1.0 & November 2012 & JDC & Added sections ``Acknowledgement'' and ``Performance and resource usage'' as well as different fonts for \textit{variables} and \verb|signal_names|\\ |
\hline |
1.1 & November 2012 & GO & Added this ``Document Revision History''. Made several small changes in layout and formulation.\\ |
\hline |
1.2 & March 2013 & JDC & Added information about the new possible RAM structures\\ |
\hline |
1.3 & March 2013 & GO & Revision of newly added RAM structures\\ |
\hline |
\end{tabular}% |
|
\section*{Author info} |
|
\begin{itemize} |
\item[GO:] Geoffrey Ottoy\\DraMCo research group\\\url{geoffrey.ottoy@kahosl.be} |
\item[JDC:] Jonas De Craene\\KAHO Sint-Lieven\\\url{JonasDC@opencores.org} |
\end{itemize} |
/pictures/latex_figs.vsd
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
pictures/latex_figs.vsd
Property changes :
Added: svn:mime-type
## -0,0 +1 ##
+application/octet-stream
\ No newline at end of property
Index: pictures/mod_sim_exp_core.pdf
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Index: pictures/mod_sim_exp_core.pdf
===================================================================
--- pictures/mod_sim_exp_core.pdf (nonexistent)
+++ pictures/mod_sim_exp_core.pdf (revision 79)
pictures/mod_sim_exp_core.pdf
Property changes :
Added: svn:mime-type
## -0,0 +1 ##
+application/octet-stream
\ No newline at end of property
Index: pictures/msec_memory.pdf
===================================================================
--- pictures/msec_memory.pdf (nonexistent)
+++ pictures/msec_memory.pdf (revision 79)