1 
47 
JonasDC 
\chapter{Operation}

2 



3 


\section{Pipeline operation}

4 


The operation of the pipeline is shown in Figure~\ref{fig:pipeline_op}. One can see that the stages are started

5 


every 2 clock cycles ($\tau_{c}$ is the core clock period). This is needed because the least significant bit of the next

6 


stage result is needed. Every stage has to run $n$ (the width of the operands) times for the multiplication to be complete.

7 



8 


\begin{figure}[H]

9 


\centering

10 


\includegraphics[trim=1.2cm 1.2cm 1.2cm 1.2cm, width=7cm]{pictures/pipeline_operation.pdf}

11 


\caption{Pipeline operation: Each circle represents an active stage. The number indicates how much times that stage

12 


has run. Dotted line contours indicate the stage is inactive.}

13 


\label{fig:pipeline_op}

14 


\end{figure}

15 



16 


For performing one Montgomery multiplication using this core, the total computation time $T_m$ for an $n$bit operand

17 


with a $k$stage pipeline is given by~(\ref{eq:Tmult}).

18 


\begin{align}\label{eq:Tmult}

19 


T_{m} = \left[k + 2(n  1)\right] \tau_c

20 


\end{align}

21 


\newpage

22 


\section{Modular Simultaneous exponentiation operations}

23 


Exponentiations are calculated with Algorithm~\ref{alg:mme} which uses the Montgomery multiplier as the main computation

24 


step. It uses the principle of a squareandmultiply algorithm to calculate an exponentiation with 2 bases.

25 


\begin{algorithm}[H] % enter the algorithm environment

26 


\caption{Montgomery simultaneous exponentiation} % give the algorithm a caption

27 


\label{alg:mme} % and a label for \ref{} commands later in the document

28 


\algnewcommand\algorithmicdownto{\textbf{downto}}

29 


\algrenewtext{For}[3]%

30 


{\algorithmicfor\ #1 $\gets$ #2 \algorithmicdownto\ #3 \algorithmicdo}

31 


\algnewcommand\algorithmicswitch{\textbf{switch}}

32 


\algrenewtext{While}[2]%

33 


{\algorithmicswitch\ #1, #2}

34 


\algnewcommand\algorithmicinput{\textbf{Input:}}

35 


\algnewcommand\Input{\item[\algorithmicinput]}

36 


\algnewcommand\algorithmicoutput{\textbf{Output:}}

37 


\algnewcommand\Output{\item[\algorithmicoutput]}

38 


\footnotesize

39 


\begin{algorithmic}[1] % enter the algorithmic environment

40 


\Input $g_{0},\:g_{1},\:e_{0}=(e_{0_{t1}} \cdots e_{0_{0}})_{2},\:e_{1}=(e_{0_{t1}} \cdots e_{0_{0}})_{2},\:R^{2}\bmod m,\:m$

41 


\Output $g_{0}^{e_{0}} \cdot g_{1}^{e_{1}} \bmod m$

42 


\State $\tilde{g}_{0} := \text{Mont}(g_{0}, R^{2}),\:\tilde{g}_{1} := \text{Mont}(g_{1}, R^{2}),\:\tilde{g}_{01} := \text{Mont}(\tilde{g}_{0}, \tilde{g}_{1})$

43 


\State $a := \text{Mont}(R^{2}, 1)$

44 


\Comment This is the same as $a := R \bmod m$.

45 


\For{$i$}{$(t1)$}{0}

46 


\State $a := \text{Mont}(a, a)$

47 


\While{$e_{1_{i}}$}{$e_{0_{i}}$} % use as switch statement

48 


\State $0,\:1:\;a := \text{Mont}(a, \tilde{g}_{0})$

49 


\State $1,\:0:\;a := \text{Mont}(a, \tilde{g}_{1})$

50 


\State $1,\:1:\;a := \text{Mont}(a, \tilde{g}_{01})$

51 


\EndWhile

52 


\EndFor

53 


\State $a := \text{Mont}(a, 1)$

54 


\State \Return{$a$}

55 


\end{algorithmic}

56 


\end{algorithm}

57 


It can be seen that the algorithm requires $R^{2}\bmod m$ which is $2^{2n}\bmod m$. We assume $R^2 \bmod m$ can be

58 


provided or precomputed. The for loop in the algorithm is executed by the control logic of the core. Apart from this,

59 


a few pre and one postcalculations have to be performed.

60 



61 


The computation time for an exponentiation depends on the number of zero's in the exponents, from

62 


Algorithm~\ref{alg:mme} one can see that if both exponent bits are zero at a time, no multiplication has to be

63 


performed. Thus reducing the total time. The average computation time for a modular simultaneous exponentiation, with

64 


$n$bit base operands and $t$bit exponents is given by~(\ref{eq:Tsime}).

65 


\begin{align}\label{eq:Tsime}

66 


T_{se} = \frac{7}{4} t \cdot T_{m} = \frac{7}{4}t \cdot [k + 2(n  1)] \tau_c

67 


\end{align}

68 



69 


For single base exponentiations, i.e. 1 exponent is equal to zero, the average exponentiation time is given by~(\ref{eq:Texp}).

70 


\begin{align}\label{eq:Texp}

71 


T_{e} = \frac{3}{2} t \cdot T_{m} = \frac{3}{2}t \cdot [k + 2(n  1)] \tau_c

72 


\end{align}

73 



74 


The formulas~(\ref{eq:Tsime}) and~(\ref{eq:Texp}) given here are only the theoretical average time for an exponentiation,

75 


excluding the pre and postcomputations.

76 



77 



78 


\section{Core operation steps}

79 


The core can operate in 2 modes, multiplication or exponentiation mode. The steps required to do one of these actions

80 


are described here.

81 


\subsection{Single Montgomery multiplication}

82 


The following steps are needed for a single Montgomery multiplication:

83 


\begin{enumerate}

84 


\item load the modulus to the RAM using the 32 bit bus

85 


\item load the desired $x$ and $y$ operands into any 2 locations of the operand RAM using the 32 bit bus.

86 


\item select the correct input operands for the multiplier using \verbx_sel_single and \verby_sel_single

87 


\item select the result destination operand using \verbresult_dest_op

88 


\item set \verbexp/m = `0' to select multiplication mode

89 


\item set \verbp_sel to choose which pipeline part you will use

90 


\item generate a start pulse for the core

91 


\item wait until interrupt is received and read out result in selected operand

92 


\end{enumerate}

93 



94 


\textbf{Note:} this computation gives a result \( r = x \cdot y \cdot R^{1} \bmod m\). If the actual product of $x$ and $y$ is

95 


desired, a final Montgomery multiplication of the result with $R^{2}$ is needed.

96 



97 


\subsection{Modular simultaneous exponentiation}

98 


The core requires $\tilde{g}_{0}$, $\tilde{g}_{0}$, $\tilde{g}_{01}$ and $a$ to be loaded into the correct operand

99 


spaces before starting the exponentiation. These parameters are calculated using single Montgomery multiplications as follows:

100 


\begin{align*}

101 


\tilde{g}_{0} &= Mont(g_{0}, R^{2}) &\,&= g_{0} \cdot R \bmod m & \hspace{3cm}\text{in operand 0}\hspace{4cm}\\

102 


\tilde{g}_{1} &= Mont(g_{1}, R^{2}) &\,&= g_{1} \cdot R \bmod m & \hspace{3cm}\text{in operand 1}\hspace{4cm}\\

103 


\tilde{g}_{01} &= Mont(\tilde{g}_{0}, \tilde{g}_{1}) &\,&= g_{0} \cdot g_{1} \cdot R \bmod m & \hspace{3cm}\text{in operand 2}\hspace{4cm}\\

104 


a &= Mont(R^{2}, 1) &\,&= R \bmod m &\hspace{3cm}\text{in operand 3}\hspace{4cm}

105 


\end{align*}

106 


When the exponentiation is done, a final multiplication has to be started by the software to multiply $a$ with 1.

107 


The steps needed for a full simultaneous exponentiation are:

108 



109 



110 


\begin{enumerate}

111 


\item load the modulus to the RAM using the 32 bit bus

112 


\item load the desired $g_0$, $g_1$ operands and \(R^{2} \bmod m\) into the operand RAM using the 32 bit bus.

113 


\item set \verbp_sel to choose which pipeline part you will use

114 


\item compute $\tilde{g}_{0}$ by using a single Montgomery multiplication of $g_{0}$ with $R^{2}$ and place the result $\tilde{g}_{0}$ in operand 0.

115 


\item compute $\tilde{g}_{1}$ by using a single Montgomery multiplication of $g_{1}$ with $R^{2}$ and place the result $\tilde{g}_{1}$ in operand 1.

116 


\item compute $\tilde{g}_{01}$ by using a single Montgomery multiplication of $\tilde{g}_{0}$ with $\tilde{g}_{1}$ and place the result $\tilde{g}_{01}$ in operand 2.

117 


\item compute $a$ by using a single Montgomery multiplication of $R^{2}$ with $1$ and place the result $a$ in operand 3.

118 


\item set the core in exponentiation mode ($exp/m$='1')

119 


\item generate a start pulse for the core

120 


\item wait until interrupt is received

121 


\item perform the postcomputation using a single Montgomery multiplication of $a$(in operand 3) with 1 and read out result

122 


\end{enumerate}
