| 1 |
47 |
JonasDC |
\chapter{Operation}
|
| 2 |
|
|
|
| 3 |
|
|
\section{Pipeline operation}
|
| 4 |
|
|
The operation of the pipeline is shown in Figure~\ref{fig:pipeline_op}. One can see that the stages are started
|
| 5 |
|
|
every 2 clock cycles ($\tau_{c}$ is the core clock period). This is needed because the least significant bit of the next
|
| 6 |
|
|
stage result is needed. Every stage has to run $n$ (the width of the operands) times for the multiplication to be complete.
|
| 7 |
|
|
|
| 8 |
|
|
\begin{figure}[H]
|
| 9 |
|
|
\centering
|
| 10 |
|
|
\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=7cm]{pictures/pipeline_operation.pdf}
|
| 11 |
|
|
\caption{Pipeline operation: Each circle represents an active stage. The number indicates how much times that stage
|
| 12 |
|
|
has run. Dotted line contours indicate the stage is inactive.}
|
| 13 |
|
|
\label{fig:pipeline_op}
|
| 14 |
|
|
\end{figure}
|
| 15 |
|
|
|
| 16 |
|
|
For performing one Montgomery multiplication using this core, the total computation time $T_m$ for an $n$-bit operand
|
| 17 |
|
|
with a $k$-stage pipeline is given by~(\ref{eq:Tmult}).
|
| 18 |
|
|
\begin{align}\label{eq:Tmult}
|
| 19 |
|
|
T_{m} = \left[k + 2(n - 1)\right] \tau_c
|
| 20 |
|
|
\end{align}
|
| 21 |
|
|
\newpage
|
| 22 |
|
|
\section{Modular Simultaneous exponentiation operations}
|
| 23 |
|
|
Exponentiations are calculated with Algorithm~\ref{alg:mme} which uses the Montgomery multiplier as the main computation
|
| 24 |
|
|
step. It uses the principle of a square-and-multiply algorithm to calculate an exponentiation with 2 bases.
|
| 25 |
|
|
\begin{algorithm}[H] % enter the algorithm environment
|
| 26 |
|
|
\caption{Montgomery simultaneous exponentiation} % give the algorithm a caption
|
| 27 |
|
|
\label{alg:mme} % and a label for \ref{} commands later in the document
|
| 28 |
|
|
\algnewcommand\algorithmicdownto{\textbf{downto}}
|
| 29 |
|
|
\algrenewtext{For}[3]%
|
| 30 |
|
|
{\algorithmicfor\ #1 $\gets$ #2 \algorithmicdownto\ #3 \algorithmicdo}
|
| 31 |
|
|
\algnewcommand\algorithmicswitch{\textbf{switch}}
|
| 32 |
|
|
\algrenewtext{While}[2]%
|
| 33 |
|
|
{\algorithmicswitch\ #1, #2}
|
| 34 |
|
|
\algnewcommand\algorithmicinput{\textbf{Input:}}
|
| 35 |
|
|
\algnewcommand\Input{\item[\algorithmicinput]}
|
| 36 |
|
|
\algnewcommand\algorithmicoutput{\textbf{Output:}}
|
| 37 |
|
|
\algnewcommand\Output{\item[\algorithmicoutput]}
|
| 38 |
|
|
\footnotesize
|
| 39 |
|
|
\begin{algorithmic}[1] % enter the algorithmic environment
|
| 40 |
|
|
\Input $g_{0},\:g_{1},\:e_{0}=(e_{0_{t-1}} \cdots e_{0_{0}})_{2},\:e_{1}=(e_{0_{t-1}} \cdots e_{0_{0}})_{2},\:R^{2}\bmod m,\:m$
|
| 41 |
|
|
\Output $g_{0}^{e_{0}} \cdot g_{1}^{e_{1}} \bmod m$
|
| 42 |
|
|
\State $\tilde{g}_{0} := \text{Mont}(g_{0}, R^{2}),\:\tilde{g}_{1} := \text{Mont}(g_{1}, R^{2}),\:\tilde{g}_{01} := \text{Mont}(\tilde{g}_{0}, \tilde{g}_{1})$
|
| 43 |
|
|
\State $a := \text{Mont}(R^{2}, 1)$
|
| 44 |
|
|
\Comment This is the same as $a := R \bmod m$.
|
| 45 |
|
|
\For{$i$}{$(t-1)$}{0}
|
| 46 |
|
|
\State $a := \text{Mont}(a, a)$
|
| 47 |
|
|
\While{$e_{1_{i}}$}{$e_{0_{i}}$} % use as switch statement
|
| 48 |
|
|
\State $0,\:1:\;a := \text{Mont}(a, \tilde{g}_{0})$
|
| 49 |
|
|
\State $1,\:0:\;a := \text{Mont}(a, \tilde{g}_{1})$
|
| 50 |
|
|
\State $1,\:1:\;a := \text{Mont}(a, \tilde{g}_{01})$
|
| 51 |
|
|
\EndWhile
|
| 52 |
|
|
\EndFor
|
| 53 |
|
|
\State $a := \text{Mont}(a, 1)$
|
| 54 |
|
|
\State \Return{$a$}
|
| 55 |
|
|
\end{algorithmic}
|
| 56 |
|
|
\end{algorithm}
|
| 57 |
|
|
It can be seen that the algorithm requires $R^{2}\bmod m$ which is $2^{2n}\bmod m$. We assume $R^2 \bmod m$ can be
|
| 58 |
|
|
provided or pre-computed. The for loop in the algorithm is executed by the control logic of the core. Apart from this,
|
| 59 |
|
|
a few pre- and one post-calculations have to be performed.
|
| 60 |
|
|
|
| 61 |
|
|
The computation time for an exponentiation depends on the number of zero's in the exponents, from
|
| 62 |
|
|
Algorithm~\ref{alg:mme} one can see that if both exponent bits are zero at a time, no multiplication has to be
|
| 63 |
|
|
performed. Thus reducing the total time. The average computation time for a modular simultaneous exponentiation, with
|
| 64 |
|
|
$n$-bit base operands and $t$-bit exponents is given by~(\ref{eq:Tsime}).
|
| 65 |
|
|
\begin{align}\label{eq:Tsime}
|
| 66 |
|
|
T_{se} = \frac{7}{4} t \cdot T_{m} = \frac{7}{4}t \cdot [k + 2(n - 1)] \tau_c
|
| 67 |
|
|
\end{align}
|
| 68 |
|
|
|
| 69 |
|
|
For single base exponentiations, i.e. 1 exponent is equal to zero, the average exponentiation time is given by~(\ref{eq:Texp}).
|
| 70 |
|
|
\begin{align}\label{eq:Texp}
|
| 71 |
|
|
T_{e} = \frac{3}{2} t \cdot T_{m} = \frac{3}{2}t \cdot [k + 2(n - 1)] \tau_c
|
| 72 |
|
|
\end{align}
|
| 73 |
|
|
|
| 74 |
|
|
The formulas~(\ref{eq:Tsime}) and~(\ref{eq:Texp}) given here are only the theoretical average time for an exponentiation,
|
| 75 |
|
|
excluding the pre- and post-computations.
|
| 76 |
|
|
|
| 77 |
|
|
|
| 78 |
|
|
\section{Core operation steps}
|
| 79 |
|
|
The core can operate in 2 modes, multiplication or exponentiation mode. The steps required to do one of these actions
|
| 80 |
|
|
are described here.
|
| 81 |
|
|
\subsection{Single Montgomery multiplication}
|
| 82 |
|
|
The following steps are needed for a single Montgomery multiplication:
|
| 83 |
|
|
\begin{enumerate}
|
| 84 |
|
|
\item load the modulus to the RAM using the 32 bit bus
|
| 85 |
|
|
\item load the desired $x$ and $y$ operands into any 2 locations of the operand RAM using the 32 bit bus.
|
| 86 |
|
|
\item select the correct input operands for the multiplier using \verb|x_sel_single| and \verb|y_sel_single|
|
| 87 |
|
|
\item select the result destination operand using \verb|result_dest_op|
|
| 88 |
|
|
\item set \verb|exp/m| = `0' to select multiplication mode
|
| 89 |
|
|
\item set \verb|p_sel| to choose which pipeline part you will use
|
| 90 |
|
|
\item generate a start pulse for the core
|
| 91 |
|
|
\item wait until interrupt is received and read out result in selected operand
|
| 92 |
|
|
\end{enumerate}
|
| 93 |
|
|
|
| 94 |
|
|
\textbf{Note:} this computation gives a result \( r = x \cdot y \cdot R^{-1} \bmod m\). If the actual product of $x$ and $y$ is
|
| 95 |
|
|
desired, a final Montgomery multiplication of the result with $R^{2}$ is needed.
|
| 96 |
|
|
|
| 97 |
|
|
\subsection{Modular simultaneous exponentiation}
|
| 98 |
|
|
The core requires $\tilde{g}_{0}$, $\tilde{g}_{0}$, $\tilde{g}_{01}$ and $a$ to be loaded into the correct operand
|
| 99 |
|
|
spaces before starting the exponentiation. These parameters are calculated using single Montgomery multiplications as follows:
|
| 100 |
|
|
\begin{align*}
|
| 101 |
|
|
\tilde{g}_{0} &= Mont(g_{0}, R^{2}) &\,&= g_{0} \cdot R \bmod m & \hspace{3cm}\text{in operand 0}\hspace{4cm}\\
|
| 102 |
|
|
\tilde{g}_{1} &= Mont(g_{1}, R^{2}) &\,&= g_{1} \cdot R \bmod m & \hspace{3cm}\text{in operand 1}\hspace{4cm}\\
|
| 103 |
|
|
\tilde{g}_{01} &= Mont(\tilde{g}_{0}, \tilde{g}_{1}) &\,&= g_{0} \cdot g_{1} \cdot R \bmod m & \hspace{3cm}\text{in operand 2}\hspace{4cm}\\
|
| 104 |
|
|
a &= Mont(R^{2}, 1) &\,&= R \bmod m &\hspace{3cm}\text{in operand 3}\hspace{4cm}
|
| 105 |
|
|
\end{align*}
|
| 106 |
|
|
When the exponentiation is done, a final multiplication has to be started by the software to multiply $a$ with 1.
|
| 107 |
|
|
The steps needed for a full simultaneous exponentiation are:
|
| 108 |
|
|
|
| 109 |
|
|
|
| 110 |
|
|
\begin{enumerate}
|
| 111 |
|
|
\item load the modulus to the RAM using the 32 bit bus
|
| 112 |
|
|
\item load the desired $g_0$, $g_1$ operands and \(R^{2} \bmod m\) into the operand RAM using the 32 bit bus.
|
| 113 |
|
|
\item set \verb|p_sel| to choose which pipeline part you will use
|
| 114 |
|
|
\item compute $\tilde{g}_{0}$ by using a single Montgomery multiplication of $g_{0}$ with $R^{2}$ and place the result $\tilde{g}_{0}$ in operand 0.
|
| 115 |
|
|
\item compute $\tilde{g}_{1}$ by using a single Montgomery multiplication of $g_{1}$ with $R^{2}$ and place the result $\tilde{g}_{1}$ in operand 1.
|
| 116 |
|
|
\item compute $\tilde{g}_{01}$ by using a single Montgomery multiplication of $\tilde{g}_{0}$ with $\tilde{g}_{1}$ and place the result $\tilde{g}_{01}$ in operand 2.
|
| 117 |
|
|
\item compute $a$ by using a single Montgomery multiplication of $R^{2}$ with $1$ and place the result $a$ in operand 3.
|
| 118 |
|
|
\item set the core in exponentiation mode ($exp/m$='1')
|
| 119 |
|
|
\item generate a start pulse for the core
|
| 120 |
|
|
\item wait until interrupt is received
|
| 121 |
|
|
\item perform the post-computation using a single Montgomery multiplication of $a$(in operand 3) with 1 and read out result
|
| 122 |
|
|
\end{enumerate}
|