\documentclass[a4paper,BCOR7mm,12pt,pointlessnumbers,bibtotoc]{scrartcl}

\usepackage{amsmath,epsfig}
\usepackage{amssymb,amsfonts}
\usepackage{color}
\usepackage{array,booktabs}
\usepackage{graphicx}
\usepackage{caption}
\usepackage[hypcap=true,labelformat=simple]{subcaption}
\renewcommand{\thesubfigure}{(\alph{subfigure})}
\usepackage{tikz}
\usetikzlibrary{arrows,automata}
\usepackage{listings}
\usepackage{hyperref}
\usepackage{enumitem}

\newcolumntype{C}[1]{>{\centering\arraybackslash}p{#1}} % centering column type with fixed width
\newcolumntype{R}[1]{>{\raggedleft\arraybackslash}p{#1}} % right aligned column type with fixed width
\newcolumntype{L}[1]{>{\raggedright\arraybackslash}p{#1}} % left aligned column type with fixed width

\newcommand{\ceil}[1]{\left\lceil #1 \right\rceil} %\left\lceil #1 \right\rceil

\begin{document}
%\maketitle
\begin{center}
\Large Ternary Adder IP Cores\\[0.4cm]
\large Martin Kumm, Jens Willkomm \\[0.5cm]
\large \today \\[0.5cm]
\end{center}

\section{Introduction}

This IP core provides resource-efficient ternary adders, i.\,e., adders with three inputs performing $s = x + y + z$, for the Altera and Xilinx platforms.
Resource-efficient means that, on modern FPGAs, they need exactly the same resources as two-input adders but are slightly slower.
The Xilinx core (\verb|ternary_adder_xilinx.vhd|) is a low-level implementation following a US patent from Xilinx \cite{sp07}. It directly instantiates the Xilinx primitives \verb|CARRY4|, \verb|LUT6_2| and \verb|FDCE| and is suitable for all FPGAs providing 6-input LUTs. Today, these are the Virtex~5--7, Spartan~6, Kintex~7 and Artix~7 families. The Altera core (\verb|ternary_adder_altera.vhd|) is a high-level implementation using the \verb|+| operator. However, the ternary subtract operations ($x - y + z$, $x + y - z$ and $x - y - z$) are not supported by a high-level description, so they are realized by extending the word size of the ternary adder and setting the lower bits to appropriate constant values.
The Altera core can be mapped very resource-efficiently to all Altera FPGAs providing adaptive logic modules (ALMs). Today, these are the Arria~I, II, V and Stratix~II--V FPGAs.
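
The idea of realizing a subtraction through constant low-order bits can be illustrated with a small behavioural model. The following Python sketch is not taken from the VHDL sources; it assumes two's complement operands and shows one possible choice of constants. Each subtracted input is bitwise inverted, $k$ low-order bits are appended to all three operands (with $k$ the number of subtracted inputs), and the appended bits are all set to one so that their sum carries exactly $k$ into the original bit positions. The function name \verb|ternary_sub_model| and its arguments are hypothetical.

\begin{lstlisting}[language=Python]
def ternary_sub_model(x, y, z, subtract_y=False, subtract_z=False):
    """Model x +/- y +/- z using only a single three-input addition."""
    k = int(subtract_y) + int(subtract_z)  # number of negated inputs
    ones = (1 << k) - 1                    # k appended low bits, all set
    # Two's complement: -v = ~v + 1. Invert here; the +1 comes from the low bits.
    y_op = ~y if subtract_y else y
    z_op = ~z if subtract_z else z
    # Append k constant low bits to each operand (word size grows by k).
    ext = [(v << k) | ones for v in (x, y_op, z_op)]
    total = ext[0] + ext[1] + ext[2]       # the only arithmetic: a ternary add
    return total >> k                      # drop the appended bits


print(ternary_sub_model(5, 3, 2, subtract_y=True))
\end{lstlisting}

With $k = 1$ the appended bits sum to $3$, which carries $1$ into the upper bits; with $k = 2$ they sum to $9$, which carries $2$. This matches the one and two additional bits required for one and two subtracted inputs, respectively.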

\section{Interface}

The generics and ports are identical for the Altera and Xilinx implementations and are described in Table~\ref{tab:generics} and Table~\ref{tab:port}, respectively.

\begin{table}[!h]
        \renewcommand{\arraystretch}{1.1}
        \caption{Description of the generics}
        \label{tab:generics}
        \centering
        \begin{tabular}{lccL{7cm}}
          \toprule
          Generic & Type & Default & Description\\
    \cmidrule(rl){1-1} \cmidrule(rl){2-2} \cmidrule(rl){3-3} \cmidrule(rl){4-4}
    \verb|input_word_size|  & integer & 10    & Word size of the inputs $x$, $y$ and $z$. The output word size is automatically set to \verb|input_word_size+2|.\\
    \verb|subtract_y|       & boolean & false & Input $y$ is negated, realizing $s = x - y \pm z$.\\
    \verb|subtract_z|       & boolean & false & Input $z$ is negated, realizing $s = x \pm y - z$.\\
    \verb|use_output_ff|    & boolean & true  & If true, the adder uses flip-flops at the output (without extra slice or ALM resources).\\
    \bottomrule
   \end{tabular}
\end{table}

\begin{table}[!h]
        \renewcommand{\arraystretch}{1.1}
        \caption{Description of the ports}
        \label{tab:port}
        \centering
        \begin{tabular}{lcccL{5cm}}
          \toprule
          Port & Direction & Type & Word Size & Description\\
    \cmidrule(rl){1-1} \cmidrule(rl){2-2} \cmidrule(rl){3-3} \cmidrule(rl){4-4} \cmidrule(rl){5-5}
    \verb|clk_i| & in  & \verb|sl|        & 1                    & Clock input (used when \verb|use_output_ff=true|)\\
    \verb|rst_i| & in  & \verb|sl|        & 1                    & Reset input (used when \verb|use_output_ff=true|)\\
    \verb|x_i|   & in  & \verb|slv| & \verb|input_word_size|     & Input $x$\\
    \verb|y_i|   & in  & \verb|slv| & \verb|input_word_size|     & Input $y$\\
    \verb|z_i|   & in  & \verb|slv| & \verb|input_word_size|     & Input $z$\\
    \verb|sum_o| & out & \verb|slv| & \verb|input_word_size + 2| & Sum output $s$\\
    \bottomrule
   \end{tabular}
\end{table}

\section{Implementation}

Both implementations use a carry-save adder (CSA) tree with three inputs and a final ripple-carry adder as vector merging adder (VMA). One stage of full adders (FAs) realizes a 3:2 compressor, i.\,e., the three input bit vectors are compressed to two bit vectors, which are taken from the sum and carry outputs of the FAs. A second stage of FAs merges these two bit vectors into a single result.
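
The two stages can be sketched as a bit-level model. The following Python code is only an illustration and is not derived from the VHDL sources; it assumes unsigned operands that are zero-extended to the output word size (signed operands would be sign-extended instead). The names \verb|full_adder| and \verb|ternary_add_bits| are hypothetical.

\begin{lstlisting}[language=Python]
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ternary_add_bits(x, y, z, n):
    """Add three unsigned n-bit values with a 3:2 compressor and a ripple-carry VMA."""
    w = n + 2                          # output word size

    def bit(v, i):
        return (v >> i) & 1

    # Stage 1: 3:2 compressor, one FA per bit position, no carry propagation.
    s_vec, c_vec = [], [0]             # carry vector is shifted left by one bit
    for i in range(w):
        s, c = full_adder(bit(x, i), bit(y, i), bit(z, i))
        s_vec.append(s)
        c_vec.append(c)

    # Stage 2: the ripple-carry VMA merges the sum and carry vectors.
    result, carry = 0, 0
    for i in range(w):
        s, carry = full_adder(s_vec[i], c_vec[i], carry)
        result |= s << i
    return result


print(ternary_add_bits(1023, 1023, 1023, 10))
\end{lstlisting}

Only the second stage contains a carry chain, which is why both stages fit into the logic that a two-input adder of the same width would occupy.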

For Altera, the 3:2 compressor can be directly mapped to the ALM LUT, realizing the sum $s'_i = x_i \oplus y_i \oplus z_i$ and the carry $c'_i = x_i y_i + x_i z_i + y_i z_i$. The full adders of the ALM are used for the VMA.
To include both stages in a single ALM stage, each ALM has to be configured in shared arithmetic mode \cite{blsy09}, in which the output of one LUT is connected to the FA input of the next higher bit. The resulting ternary adder structure is shown in \figurename~\ref{fig:ternary_adder_stratix}.

For Xilinx, the FA for the 3:2 compressor is also realized in the FPGA LUT \cite{sp07}. In addition, one XOR gate has to be realized in the same LUT so that the fast carry chain resources form a complete ripple-carry adder for the VMA. The carry output of the first FA (realized in the LUT) must be routed to the next higher FA input using the FPGA routing fabric.
The resulting slice configuration is shown in \figurename~\ref{fig:ternary_adder_virtex_5_6_7}.
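
The role of the additional XOR gate can be made explicit with a short derivation. This is a sketch under the assumption that the carry chain provides one multiplexer and one XOR gate per bit; the symbols $p_i$ and $c_i$ are introduced here for illustration and do not appear in the sources. With $s'_i$ and $c'_i$ denoting the compressor outputs as above, the LUT would compute the propagate signal
\begin{align}
        p_i = s'_i \oplus c'_{i-1} = x_i \oplus y_i \oplus z_i \oplus c'_{i-1},
\end{align}
and the carry chain would then form the sum bit and the next carry of the VMA as
\begin{align}
        s_i &= p_i \oplus c_i, \\
        c_{i+1} &= \overline{p_i}\, c'_{i-1} + p_i\, c_i,
\end{align}
with $c'_{-1} = c_0 = 0$ for a pure addition. If $p_i = 0$, both VMA inputs $s'_i$ and $c'_{i-1}$ are equal and determine the carry; otherwise the incoming carry $c_i$ is propagated, which is the behaviour of a full adder with inputs $s'_i$, $c'_{i-1}$ and $c_i$.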

\begin{figure}[!h]
\centering
% \subfigure[]{\scalebox{1}{\includegraphics{images/ternary_adder_generic}\label{fig:ternary_adder_generic}}}
% \subfigure[]{\scalebox{1}{\includegraphics{images/ternary_adder_3_2_comp}\label{fig:ternary_adder_3_2_comp}}}
        \begin{subfigure}[c]{\columnwidth}
          \centering
    \scalebox{1.2}{\includegraphics{images/ternary_adder_altera}}
                \caption{}
                \label{fig:ternary_adder_stratix}
        \end{subfigure}
        \begin{subfigure}[c]{\columnwidth}
          \centering
    \scalebox{1.2}{\includegraphics{images/ternary_adder_xilinx}}
                \caption{}
                \label{fig:ternary_adder_virtex_5_6_7}
        \end{subfigure}
 \caption{Realization of ternary adders
 %(a) generic architecture using two ripple carry adders
 on
 (a) Altera Stratix II--V ALMs and (b) Xilinx Virtex 5--7 slices}
 \label{fig:ternary_adders}
\end{figure}

\section{Resource Consumption}

For Altera, each ALM can compute two output bits. As the output word size is two bits larger than the input word size,
\begin{align}
        N_{\text{ALM},++} = \ceil{\frac{\text{input\_word\_size}+2}{2}}
\end{align}
ALMs are needed for a pure addition ($s = x + y + z$). If one input is subtracted (setting either \verb|subtract_y| or \verb|subtract_z| to true), the word length has to be extended by one bit, leading to
\begin{align}
        N_{\text{ALM},+-} = \ceil{\frac{\text{input\_word\_size}+3}{2}}
\end{align}
Finally, if two inputs are subtracted, the word length has to be extended by a further bit, leading to
\begin{align}
        N_{\text{ALM},--} = \ceil{\frac{\text{input\_word\_size}+4}{2}}
\end{align}

For Xilinx, four output bits can be computed in each slice. Thus, the number of slices is given by
\begin{align}
        N_{\text{Slices}} = \ceil{\frac{\text{input\_word\_size}+2}{4}}
\end{align}
The slice usage is independent of the operation performed. If a slice is not fully utilized, the remaining LUTs can still be used for other logic.
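
As a worked example, evaluating the expressions above for the default setting $\text{input\_word\_size} = 10$ (i.\,e., a 12\,bit output) gives
\begin{align*}
        N_{\text{ALM},++} &= \ceil{\frac{10+2}{2}} = 6, &
        N_{\text{ALM},+-} &= \ceil{\frac{10+3}{2}} = 7, \\
        N_{\text{ALM},--} &= \ceil{\frac{10+4}{2}} = 7, &
        N_{\text{Slices}} &= \ceil{\frac{10+2}{4}} = 3.
\end{align*}
For an even input word size, the first subtraction therefore costs one additional ALM, whereas the second one fits into the same number of ALMs.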

\section{Performance}

To estimate the performance, the maximum clock frequencies ($f_\text{max}$) were obtained from synthesis experiments for an Altera Stratix~IV (EP4SGX230KF40C2) using Quartus~II~10.1 and a Xilinx Virtex~6 (XC6VLX75T-2FF484) using ISE~13.4, both after place \& route. The resulting clock frequencies for output word sizes from 16 to 64\,bit are shown in Table~\ref{tab:performance}.

\begin{table}[!h]
        \renewcommand{\arraystretch}{1.1}
        \caption{Performance of the IP cores}
        \label{tab:performance}
        \centering
        \begin{tabular}{ccc}
          \toprule
      Output word size [bit] & $f_\text{max}$ Stratix IV [MHz] & $f_\text{max}$ Virtex 6 [MHz]\\
            \cmidrule(rl){1-1} \cmidrule(rl){2-2} \cmidrule(rl){3-3}
      16 & 708 & 450\\
      32 & 565 & 379\\
      48 & 479 & 312\\
      64 & 423 & 292\\
    \bottomrule
   \end{tabular}
\end{table}

\section{Simulation \& Test}

The simulation and automated tests were performed using ModelSim. For this purpose, a testbench (\verb|tb_ternary_adder.vhd|) was created that uses a random number generator together with assert statements to verify the designs. To automate testing across the different FPGA targets, the do-file \verb|batch_sim.do| was created; it compiles the designs and applies the tests to each target as specified in the do-file \verb|sim_single_inst.do|. These tests cover different word sizes, subtractions and the output flip-flop functionality. All tests can be started from the command line using \verb|vsim -c -do 'do batch_sim.do'| (as defined in \verb|modelsim_batch_sim.sh|).

\bibliographystyle{alpha}
\bibliography{ternary_adder}

\end{document}