\documentclass[a4paper]{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{graphicx}
\DeclareGraphicsExtensions{.pdf,.eps,.png,.jpg}
\usepackage{color}
\usepackage{psfig}
\usepackage{float}
\usepackage{subfigure}
\setlength\topmargin{0 in}
\setlength\oddsidemargin{0 in}
\setlength\textwidth{6.5 in}

\title{Fully Pipelined AES Core}
\author{Subhasis Das}
\date{}

\begin{document}
\maketitle
\section*{Basic Architecture}
This core conforms to the NIST FIPS-197 specification. The basic block diagram is given in Figure \ref{arch}.
\begin{figure}[h]
\centering
\includegraphics[scale=0.7]{arch}
\caption{Basic Architecture}
\label{arch}
\end{figure}

Each round key is generated in two steps. Let us define
\[
f(C_3) = \text{RotWord}(\text{Sbox}(C_3)) \;\text{xor}\; \text{RCon}
\]
Then the columns of the next round key are
\begin{equation*}
\begin{aligned}
C_0^\prime &= f(C_3) \;\text{xor}\; C_0 \\
C_1^\prime &= f(C_3) \;\text{xor}\; C_0 \;\text{xor}\; C_1 \\
C_2^\prime &= f(C_3) \;\text{xor}\; C_0 \;\text{xor}\; C_1 \;\text{xor}\; C_2 \\
C_3^\prime &= f(C_3) \;\text{xor}\; C_0 \;\text{xor}\; C_1 \;\text{xor}\; C_2 \;\text{xor}\; C_3
\end{aligned}
\end{equation*}
where $C_i$ is column $i$ of the current round key and $C_i^\prime$ is column $i$ of the next round key.
The first step, generating $f(C_3)$, is done along with the add-key step of the previous cycle, and the second step (the xor with $C_0, \dots, C_3$) is done in the combined S-Box and ShiftRows step.
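
As an illustration, the two steps can be traced by hand for the first round key derived from the example cipher key of FIPS-197, Appendix A.1, \texttt{2b7e1516 28aed2a6 abf71588 09cf4f3c}. This sketch assumes a 128-bit key with $\text{RCon} = \texttt{01000000}$ for the first round; all values are hexadecimal and are taken from the standard, not from a simulation of this core. The first step gives
\begin{equation*}
\begin{aligned}
\text{RotWord}(\text{Sbox}(\texttt{09cf4f3c})) &= \texttt{8a84eb01} \\
f(C_3) &= \texttt{8a84eb01} \;\text{xor}\; \texttt{01000000} = \texttt{8b84eb01}
\end{aligned}
\end{equation*}
and the second step gives
\begin{equation*}
\begin{aligned}
C_0^\prime &= \texttt{8b84eb01} \;\text{xor}\; \texttt{2b7e1516} = \texttt{a0fafe17} \\
C_1^\prime &= \texttt{a0fafe17} \;\text{xor}\; \texttt{28aed2a6} = \texttt{88542cb1} \\
C_2^\prime &= \texttt{88542cb1} \;\text{xor}\; \texttt{abf71588} = \texttt{23a33939} \\
C_3^\prime &= \texttt{23a33939} \;\text{xor}\; \texttt{09cf4f3c} = \texttt{2a6c7605}
\end{aligned}
\end{equation*}
Each line reuses the previous result, which is equivalent to the expanded xor expressions given earlier.
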

The inputs to the overall processor are as follows:
\begin{itemize}
\item \texttt{clk\_i}: System clock; data I/O takes place at the rising edge.
\item \texttt{rst\_i}: Asynchronous reset, active high; initializes all inputs to all stages and the final output to zero.
\item \texttt{plaintext\_i}: 16$\times$8-bit plaintext input.
\item \texttt{keyblock\_i}: 16$\times$8-bit key block input.
\end{itemize}
The output is
\begin{itemize}
\item \texttt{ciphertext\_o}: 16$\times$8-bit ciphertext output.
\end{itemize}

The timing diagram is shown in Figure \ref{clock}.
\begin{figure}[H]
\centering
\includegraphics[scale=0.7]{clock}
\caption{Timing Diagram}
\label{clock}
\end{figure}

The \texttt{trunk/rtl/vhdl} directory contains the complete source code.

The sample testbench is in \texttt{trunk/bench/vhdl}.

To compile and run the testbench, use the script \texttt{sim\_isim.sh} in the \texttt{trunk/sim/rtl\_sim/run} directory for the Xilinx ISim simulator, or \texttt{sim\_ghdl.sh} for GHDL. The testbench reads plaintext and key data from \texttt{vectors.dat} in the \texttt{trunk/sim/rtl\_sim/src} directory. The expected ciphertext data must be present in \texttt{cipher.dat} in the \texttt{trunk/sim/rtl\_sim/src} directory. The results are written to \texttt{output.log} in the \texttt{trunk/sim/rtl\_sim/log} directory. The final line is `OK' if all tests pass and `FAIL' otherwise, so the check can be automated over large test datasets.

The \texttt{trunk/syn/Xilinx/run} directory contains the \texttt{synth.sh} shell script, which synthesizes the design when run with the Xilinx ISE WebPack tools.

The speed-optimized synthesis results, with timing-driven map on a Xilinx 5VLX50T device, are shown in Table \ref{stats}.
\begin{table}[h]
\centering
\begin{tabular}{|l|l|}
\hline
$f_{\max}$ & $\approx$ 330 MHz \\
\hline
Max throughput & $\approx$ 42 Gbps \\
\hline
Slice registers & 7873 (27\%) \\
\hline
Slice LUTs & 14724 (51\%) \\
\hline
Bonded IOBs & 386 (80\%) \\
\hline
\end{tabular}
\caption{Design Statistics}
\label{stats}
\end{table}
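
The throughput figure in Table \ref{stats} is consistent with the clock rate under the assumption, suggested by the fully pipelined design but not restated in the synthesis logs cited here, that the core accepts one 128-bit block on every clock cycle:
\begin{equation*}
\begin{aligned}
\text{throughput} &\approx 128\ \text{bits/cycle} \times 330 \times 10^{6}\ \text{cycles/s} \\
&= 42.24 \times 10^{9}\ \text{bits/s} \approx 42\ \text{Gbps}
\end{aligned}
\end{equation*}
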

All synthesis, map, and place-and-route logs are available in the \texttt{trunk/syn/Xilinx/log} directory.
\end{document}