URL https://opencores.org/ocsvn/dblclockfft/dblclockfft/trunk

Subversion Repositories dblclockfft

[/] [dblclockfft/] [trunk/] [doc/] [src/] [spec.tex] - Blame information for rev 42

Details | Compare with Previous | View Log


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Filename:    doc/src/spec.tex
%%
%% Project:     A General Purpose Pipelined FFT Implementation
%%
%% Purpose:     This file contains the LaTeX instructions necessary to build
%%              the doc/spec.pdf file.  It's not nearly as interesting as the
%%      doc/spec.pdf file itself, so I would recommend you read that file
%%      before looking for items within here.
%%
%% Creator:     Dan Gisselquist, Ph.D.
%%              Gisselquist Technology, LLC
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Copyright (C) 2018, Gisselquist Technology, LLC
%%
%% This file is part of the general purpose pipelined FFT project.
%%
%% The pipelined FFT project is free software (firmware): you can redistribute
%% it and/or modify it under the terms of the GNU Lesser General Public License
%% as published by the Free Software Foundation, either version 3 of the
%% License, or (at your option) any later version.
%%
%% The pipelined FFT project is distributed in the hope that it will be useful,
%% but WITHOUT ANY WARRANTY; without even the implied warranty of
%% MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
%% General Public License for more details.
%%
%% You should have received a copy of the GNU Lesser General Public License
%% along with this program.  (It's in the $(ROOT)/doc directory.  Run make
%% with no target there if the PDF file isn't present.)  If not, see
%% <http://www.gnu.org/licenses/> for a copy.
%%
%% License:     LGPL, v3, as defined and found on www.gnu.org,
%%              http://www.gnu.org/licenses/lgpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%%
\documentclass{gqtekspec}
\project{Pipelined FFT}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.3}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC
 
This file is part of the general purpose pipelined FFT project.
 
The pipelined FFT project is free software (firmware): you can redistribute
it and/or modify it under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
 
The pipelined FFT project is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
General Public License for more details.
 
You should have received a copy of the GNU Lesser General Public License along
with this project.  If not, see \texttt{http://www.gnu.org/licenses/} for a
copy.
\end{license}
\begin{revisionhistory}
0.3 & 6/2/2015 & Gisselquist & General purpose pipelined FFT generator\\\hline
0.2 & 6/2/2015 & Gisselquist & Superficial formatting changes\\\hline
0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
\listoffigures
\listoftables
\begin{preface}
This FFT came about originally from my attempts to design and implement a
GPS decorrelation
algorithm inside a generic FPGA, but only on a limited budget.  As such,
it was built before I had the board that could use it.  Because I was trying
to hit a very high processing rate, the FFT core was originally built to handle
two samples at a time.  Hence the original name, the ``Double clocked FFT
core''.
 
One of the difficulties of doing all of your development using Verilator as
a simulator, is that you can only simulate components that you have the
Verilog source for.  My desire to use Verilator kept me from using any of
the vendor supplied FFTs out there.
 
This, then, was and is the genesis of this project.
 
Since this genesis, I've used this core as part of several other designs and
maintaied it.  Eventually it has morphed into the general purpose FFT core
generator that it is today.
 
\end{preface}
 
\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}
 
The General Purpose Pipelined FFT generator project contains all of the
software necessary to create an arbitrary sized FFT HDL core that will
accept up to two samples per clock cycle, and after some pipeline delay
it will output FFT results at the same rate they are input.
 
The FFT generated by this approach is very configurable.  By simple adjustment
of a command line parameter, the FFT may be made to be a forward FFT or an
inverse FFT.  The number of bits processed, kept, and maintained by this
FFT are also configurable.  Even the number of bits used for the twiddle
factors, or whether or not to bit reverse the outputs, are all configurable
parts to this FFT core.  Finally, the FFT can be configured to process two
samples per clock, one sample per clock, one sample every other clock, or even
one sample every third clock (or less).
 
These features make this general purpose pipelined FFT generator very
different and unique among the other cores available on opencores.com.
 
For those who wish to get started right away, please download the package,
change into the {\tt sw} directory and run {\tt make}.  There is no need to
run a configure script, {\tt fftgen} is completely portable C++.  (While I
do my development on Ubuntu, I am told by others that the core builds on
Microsoft systems as well.)  Then, once built, go ahead and run {\tt fftgen}
without any arguments.  This will cause {\tt fftgen} to print a usage
statement to the screen.  Review the usage statement, and run {\tt fftgen}
a second time with the arguments you need.
 
 
\chapter{Generation}
 
Creating an FFT core is as simple as running the program
{\tt fftgen}.  The program will then create a series of Verilog files, as
well as {\tt .hex} files suitable for use with a \textdollar readmemh, and
place them into an output directory, {\tt ./fft-core/} by default, that
{\tt fftgen} will create.  Creating the core you want takes a touch of
configuring.  Therefore, the following lists the arguments that can be given to
{\tt fftgen} to adjust the core that it builds:
\begin{itemize}
\item[\hbox{-f size}]
        This specifies the size of the FFT core that {\tt fftgen} will build.
        The size must be a power of two.
 
        Given an input $x\left[n\right]$, the FFT will calculate,
        \begin{eqnarray*}
        X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right]
                e^{-j2\pi \frac{k}{N}n}
        \end{eqnarray*}
        to within a scale factor.
 
\item[\hbox{-i}]
        This specifies that the FFT will be an inverse FFT.  Specifically,
        it will calculate,
        \begin{eqnarray*}
        x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n}
        \end{eqnarray*}
 
        If no {\tt -i} option is given, the core will by default generate a
        forward FFT.
 
\item[\hbox{-2}]
        Builds an FFT that can ingest and output two samples per clock.
 
        This option requires six multiplies for all but the last two butterfly
        stages.  The last two butterfly stages are accomplished using shifts
        and adds only, so they require no multiplies.
 
\item[\hbox{-k 1}]
        Builds an FFT that can ingest and output one sample per clock.
        This option is incompatible with {\tt -2}.
 
        This option requires three multiplies for all but the last two
        butterfly stages.
 
\item[\hbox{-k 2}]
        Builds an FFT that can ingest and output one sample every other clock.
        This option is incompatible with {\tt -2}, and will override {\tt -k 1}.
        You are responsible for making sure that {\tt i\_ce} will never be
        true for two clocks in a row.
 
        Unlike {\tt -k 1}, this option only requires two multiplies for all
        but the last two butterfly stages.
 
\item[\hbox{-k 3}]
        Builds an FFT that can ingest and output one sample every third clock.
        This option is incompatible with {\tt -2}, and will override any
        other {\tt -k} option.
        For this to work, you will need to guarantee that {\tt i\_ce} will
        never be true more than one time in any three clock periods.
 
        Unlike {\tt -k 1} and {\tt -k 2}, this option only requires one
        multiply for all but the last two butterfly stages.
 
\item[\hbox{-s}]
        This causes the core to skip the final bit reversal stage.  The
        outputs of the FFT will then come out in bit reversed order.
 
        This option is useful in those cases where someone wishes to
        multiply the coefficients coming out of an FFT by some product,
        and then to inverse FFT the results.  If the coefficients are also
        applied in bit--reversed order, then both the FFT and IFFT may
        skip their bit reversals.
 
        Be aware, however, doing this requires the bit reversed forward
        transform be followed by a bitreversed decimation in time approach
        to the inverse transform.  This software does not (yet) provide that
        capability.  As such, this capability is really just a placeholder for
        a future capability.
\item[\hbox{-d DIR}]
        Specifies the DIRectory to place the produced Verilog files.  By
        default, this will be in the `./fft-core/' directory, but it can
        be moved to any other directory as necessary.
\item[\hbox{-n bits}] Sets the number of input bits per sample.  Given this
        setting, each of the two samples clocked in at every clock cycle
        will have this many bits for their real portion, and again this many
        bits for their imaginary portion.  Thus, the data input to the
        FFT will be four times this many bits per clock.
\item[\hbox{-m bits}] This sets the maximum bit width of the output.
        By default, the FFT will gain bits as they accumulate within
        the FFT.  Bits are accumulated at roughly one bit for every two stages.
        However, if this value is set, bits are only accumulated up to this
        maximum width.  After this width, further accumulations are truncated.
\item[\hbox{-c bits}] Specifies the number of extra bits to be given to each
        twiddle factor.  The size of the twiddle factors is nominally the size
        of the input data.  By specifying {\tt -c <bits>}, you can extend
        this default value to avoid any loss in precision.
\item[\hbox{-x bits}]  Maintains {\tt bits} extra bits during the computation
        to deal with roundoff error.  Hence, after the first stage, there will
        be this many excess bits within the FFT pipeline.
 
        Internally accumulated roundoff error can be a difficult
        problem to solve.  By using this option, you guarantee that the FFT
        runs with an additional {\tt bits} bits, and only truncates down to
        the necessary width at the end in order to minimize rounding
        errors along the way.
 
\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT
        will consume.  By default, the FFT does not use any hardware multiplies.
        However, this can be expensive on the rest of the logic used by the
        device.  You can avoid this problem by allowing the FFT to use
        hardware multiplies using this option.  By default, the multiplies will
        be used in the latter stages, so that they will be applied where
        the bit width is the greatest.
\end{itemize}
 
\chapter{Architecture}
 
As a component of another system the structure of this system is a simple
black box such as the one shown in Fig.~\ref{fig:black-box-one}
\begin{figure}\begin{center}
\begin{pspicture}(-2.2in,0.3in)(2.2in,2in)
% \rput(0,0){\psframe(-2.2in,0.3in)(2.2in,2in)}
\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0.3in)(0.75in,2in)}
        \rput(0,1in){(I)FFT Core}
        \rput[r](-1.6in,1.8in){\tt i\_clk}
                \rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,1.5in){\tt i\_rst}
                \rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,1.2in){\tt i\_ce}
                \rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,0.6in){\tt i\_sample}
                \rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}
                \rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}
        %
        \rput[l](1.6in,1.2in){\tt o\_sync}
                \rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}
        \rput[l](1.6in,0.6in){\tt o\_result}
                \rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}
                \rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}
        }
\end{pspicture}
\caption{(I)FFT Black Box Diagram}\label{fig:black-box-one}
\end{center}\end{figure}
for the two traditional one sample per clock FFT implementation, or
Fig.~\ref{fig:black-box-dbl}
\begin{figure}\begin{center}
\begin{pspicture}(-2.1in,0)(2.1in,2in)
% \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)}
\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)}
        \rput(0,1in){(I)FFT Core}
        \rput[r](-1.6in,1.8in){\tt i\_clk}
                \rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,1.5in){\tt i\_rst}
                \rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,1.2in){\tt i\_ce}
                \rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,0.6in){\tt i\_left}
                \rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}
                \rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}
        \rput[r](-1.6in,0.3in){\tt i\_right}
                \rput(-1.5in,0.3in){\psline{->}(0,0)(0.7in,0)}
                \rput(-1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](-1.2in,0.3in){\scalebox{0.75}{$2N_i$}}
        %
        \rput[l](1.6in,1.2in){\tt o\_sync}
                \rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}
        \rput[l](1.6in,0.6in){\tt o\_left}
                \rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}
                \rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}
        \rput[l](1.6in,0.3in){\tt o\_right}
                \rput(0.8in,0.3in){\psline{->}(0,0)(0.7in,0)}
                \rput(1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}}
        }
\end{pspicture}
\caption{Two sample per clock (I)FFT Black Box Diagram}\label{fig:black-box-dbl}
\end{center}\end{figure}
for the two samples per clock FFT.
 
 
The interface
is simple: strobe the reset line, and every clock thereafter set the clock
enable line when data is valid on the input port.  Likewise
for the outputs, when the {\tt o\_sync} line goes high the first data sample
is available.  Ever after that, one data sample will be available every clock
cycle that the {\tt i\_ce} line is high.
 
Internal to the FFT, things are a touch more complex.
 
Fig.~\ref{fig:white-box-dbl}
\begin{figure}\begin{center}
\begin{pspicture}(1.3in,-0.5in)(4.7in,5in)
        % \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)}
        \rput(0,0){\psframe[linewidth=2\pslinewidth](1.3in,-0.25in)(4.7in,5in)}
        \rput(0,5in){%
                \rput[r](1.95in,0.125in){\tiny\tt i\_left}
                \rput[l](4.05in,0.125in){\tiny\tt i\_right}
                \rput(2.0in,0){\psline{->}(0,0.25in)(0,0.0in)}
                \rput(4.0in,0){\psline{->}(0,0.25in)(0,0.0in)}
        }
        \rput(2in,0){%
                \rput(0,4.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput[r](-0.05in,0.675in){\tiny Left}
                        \rput(0.0in,0){\psline{->}(0,0.75in)(0,0.5in)}
                        \rput(0,0.25in){Evens, $N$}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                \rput(0,3.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Evens, $N/2$}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput( 0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                % \rput(0,3in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        % \rput(0,0.25in){Evens, $N$}}
                \rput(0,2.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Evens, $8$}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Data}
                        \rput[r](-0.35in,0.675in){\tiny Sync}
                        \rput[l](0.35in,0.675in){\tiny Data}
                        \rput(-0.3in,0.9in){$\vdots$}
                        \rput( 0.3in,0.9in){$\vdots$}
                        \rput(-0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
                        \rput( 0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                \rput(0,1.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Qtrstage (Even)}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[lb](0.6in,-0.10in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.5in)(0.8in,-0.5in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.125in)(0.4in,-0.125in)(0.4in,-0.25in)}}
                % \rput(0,0.75in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        % \rput(0,0.25in){dblstage}}
                % \rput(0,0in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        % \rput(0,0.25in){Bit Reversal}}
        }
        \rput(4in,0){%
                \rput(0,4.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput[l](0.05in,0.675in){\tiny Right}
                        \rput(0.0in,0){\psline{->}(0,0.75in)(0,0.5in)}
                        \rput(0,0.25in){Odds, $N$}
                        \rput[l](0.35in,-0.125in){\tiny Sync}
                        \rput[r](-0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                \rput(0,3.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Odds, $N/2$}
                        \rput[l](0.35in,-0.125in){\tiny Sync}
                        \rput[r](-0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                % \rput(0,3in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        % \rput(0,0.25in){Evens, $N$}}
                \rput(0,2.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Odds, $8$}
                        \rput[l](0.35in,0.675in){\tiny Sync}
                        \rput[r](-0.35in,0.675in){\tiny Data}
                        \rput(-0.3in,0.9in){$\vdots$}
                        \rput( 0.3in,0.9in){$\vdots$}
                        \rput[l](0.35in,-0.125in){\tiny Sync}
                        \rput[r](-0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                \rput(0,1.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Qtrstage (Odd)}
                        \rput[rb](-0.6in,-0.10in){\tiny Data}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                                \rput[t](0.3in,-0.3in){\tiny NC}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.125in)(-0.4in,-0.125in)(-0.4in,-0.25in)}
                        }
        }
        \rput(3in,0.75in){\psframe(-0.5in,0)(0.5in,0.5in)%
                \rput(0,0.25in){Double Stage}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Right}
                        \rput[r](0.15in,-0.125in){\tiny Left}
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
        \rput(3in,0in){\psframe(-0.5in,0)(0.5in,0.5in)%
                \rput(0,0.25in){Bit Reversal}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Right}
                        \rput[r](0.15in,-0.125in){\tiny Left}
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
        \rput(3in,-0.25in){\rput[r](-0.35in,-0.125in){\tiny\tt o\_sync}
                        \rput[l](0.35in,-0.125in){\tiny\tt o\_right}
                        \rput[r](0.15in,-0.125in){\tiny\tt o\_left}
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
\end{pspicture}
\caption{Internal FFT Structure}\label{fig:white-box-dbl}
\end{center}\end{figure}
attempts to show some of this structure for the two-sample per clock FFT.
As you can see from the figure, the
FFT itself is composed of a series of stages.  For the two-sample per clock
FFT, these stages are split from the beginning into an even stage and an odd
stage.  Further, they are numbered
according to the size of the FFT they represent.  Therefore the first stage
is numbered $N$ and represents the first stage of an $N$ point FFT.  The
second stage is labeled $N/2$, then $N/4$, and so on down to $N=8$.  The
four sample stage and the two sample stages are different, however.  These
two stages, representing three blocks on Fig.~\ref{fig:white-box}, can be
accomplished without any multiplies.  Therefore they have been accomplished
separately.  Likewise all of the stages, save the double stage at the bottom,
operate on one data sample per clock.  Only the last stage, prior to the
bit reversal stage, takes two data samples per clock as input, and outputs two
data samples per clock.  Finally, the bit reversal stage acts as the last
piece of the structure.
 
Internal to each of the FFT stages is a butterfly and a complex multiply,
as shown in Fig.~\ref{fig:fftstage}.
\begin{figure}\begin{center}
\begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in)
        % \rput(0,0){\psframe(0in,-2in)(3in,4.25in)}
        \rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)}
        \rput[r](1.625in,4.125in){\tt i\_sample}
        \rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)%
                        \psline{->}(0,0)(-0.2in,-0.25in)%
                        \psarc{->}{0.15in}{200}{340}}
        \rput(0,2.75in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}
                        \rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}
                        \rput(0,0){\psframe(0.3in,0)(0.4in,0.25in)}
                        \rput(0,0){\psframe(0.5in,0)(0.6in,0.25in)}
                        \rput(0,0){\psframe(0.7in,0)(0.8in,0.25in)}
                        \rput(0,0){\psframe(0.9in,0)(1.0in,0.25in)}
                        \rput(0,0){\psframe(1.1in,0)(1.2in,0.25in)}
                        \rput(0,0){\psline{-}(0.7in,-0.05in)(1.1in,-0.25in)}
                        \rput(0,0){\psline{<-}(0.7in,0.3in)(1.5in,0.5in)(1.5in,0.75in)}}
        \rput(1.85in,2.75in){\psline(0,0.75in)(0,-0.25in)}
        \rput(0.6in,0.25in){\rput(0,0){\psframe[linewidth=2\pslinewidth](0,0)(2in,2.0in)}
                \rput(0.50in,2in){\psline{->}(0,0.25in)(0,0in)}
                \rput(1.25in,2in){\psline{->}(0,0.25in)(0,0in)}
                \rput(1.75in,2in){\psline{->}(0,0.25in)(0,0in)}
                \rput(0.5in,0){%
                        \rput(0in,0){\psline{->}(0,2.0in)(0,1.1in)}
                        \rput(0in,0){\psline{->}(0,1.75in)(0.65in,1.1in)}
                        \rput(-0.1in,1.1in){$+$}
                        \rput(0in,1.0in){$\bigoplus$}
                        \rput(0in,0){\psline{->}(0,0.9in)(0,0.75in)}
                        \rput(0in,0.5in){\psframe(-0.45in,-0.25in)(0.45in,0.25in)}
                        \rput(0in,0.5in){\parbox{0.8in}{Delay, and\\shift by $C-2$}}
                        \rput(0in,0){\psline{->}(0,0.25in)(0,0.0in)}}
                \rput(1.25in,0){%
                        \rput(0in,0){\psline{->}(0,2.0in)(0,1.1in)}
                        \rput(0in,0){\psline{->}(0,1.75in)(-0.65in,1.1in)}
                        \rput(0.1in,1.1in){$-$}
                        \rput(0in,1in){$\bigoplus$}
                        \rput(0in,0){\psline{->}(0,0.9in)(0,0.6in)}
                        \rput(0in,0.5in){$\bigotimes$}
                        \rput(0in,0){\psline{->}(0,0.4in)(0,0.0in)}}
                \rput(1.75in,0){%
                        \rput(0,0){\psline{->}(0,2.0in)(0,0.5in)(-0.4in,0.5in)}}
                \rput(0.50in,-0.25in){\psline{->}(0,0.25in)(0,-1.05in)}
                \rput(1.25in,-0.25in){\psline{-}(0,0.25in)(0,0in)}}
        \rput*[l](2.0in,0.5in){DIF Butterfly}
        \rput*[lb](1.95in,2.5in){Coefficient memory}
        % \rput(0,0){\psframe(1.3in,-0.25in)(4.7in,5in)}
        \rput(1.7in,-0.5in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}
                        \rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}
                        \rput(0,0){\psframe(0.3in,0)(0.4in,0.25in)}
                        \rput(0,0){\psframe(0.5in,0)(0.6in,0.25in)}
                        \rput(0,0){\psframe(0.7in,0)(0.8in,0.25in)}
                        \rput(0,0){\psframe(0.9in,0)(1.0in,0.25in)}
                        \rput(0,0){\psframe(1.1in,0)(1.2in,0.25in)}
                        \rput(0,0){\psline{<-}(0.7in,0.30in)(0.15in,0.5in)}
                        \rput(0,0){\psline{->}(0.7in,-0.05in)(-0.2in,-0.3in)(-0.2in,-0.55in)}}
        \rput(1.3in,-1.3in){\psline{->}(-0.2in,0.25in)(0,0)}
        \rput(1.3in,-1.3in){\psarcn{->}{0.15in}{150}{30}}
        \rput(1.3in,-1.3in){\psline{->}(0,0)(0,-0.5in)}
        \rput[l](1.35in,-1.675in){\tt o\_data}
\end{pspicture}
\caption{A Single FFT Stage, with Butterfly}\label{fig:fftstage}
\end{center}\end{figure}
These FFT stages are really no different than any other decimation in
frequency radix-2 FFT implementation, save only that for the two-sample per
clock FFT the coefficients are alternated between the two inputs
shown in Fig.~\ref{fig:white-box-dbl}.  That is, the even stages would get all
the even coefficients, and the odd stages get all of the odd coefficients.
For the more general purpose FFT, there's only the one pipeline, so every
sample goes through every butterfly.
Internally, each stage spends the first $N/2$ clocks storing its inputs
into memory, and then the next $N/2$ clocks pairing a stored input with
a single (fresh) external input, so that both values become inputs to the
butterfly.  Likewise, the butterfly coefficient is read from a small ROM table.
 
One trick to making the FFT stage work successfully is synchronization.  Since
the shift and add multiplies create a delay of (roughly) one clock cycle per
two bits of input, there is a significant pipeline delay from the input to the
output of the butterfly routine.  To match this delay, the FFT stage places a
synchronization pulse into the butterfly.  When this synchronization pulse
comes out of the butterfly, the values of the butterfly then match the
first sample out of the stage.  The next synchronization problem comes from
the fact that the butterflies operate on two samples at a time, whereas the
FFT stage operates on a single sample at a time.  This means that half the
time the butterfly output will be invalid.  To keep things aligned, and to
avoid the invalid data half, a counter is started by the synchronization pulse
coming out of the butterfly in order to keep track.  Using this counter and
once the butterfly produces the first sync pulse, the next $N/2$ clock cycles
will produce valid butterfly outputs.  For these clock cycles, the left or
first output is sent immediately to the next FFT stage, whereas the right
or second output is saved into memory.  Once these cycles are complete, the
butterfly outputs will be invalid for the next $N/2$ clock cycles.  During
these invalid clock cycles, the FFT stage outputs data that had been stored
in memory.  In this fashion, data is always valid coming out of each FFT
stage once the initial synchronization pulse goes high.
 
The complex multiply itself, formed internal to the butterfly routine, is
formed from three very simple shift and add multiplies, whose output is
then transformed into a single complex output, although there is a command
line option to use hardware multiplies instead.  To avoid overflow, the
complex coefficients, $z_n$, for these multiplies are given by,
\begin{eqnarray}
z_n &=& c_n + js_n,\mbox{ where} \\
c_n &=& \left\lfloor 2^{C-2}\cos\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor,\\
s_n &=& \left\lfloor 2^{C-2}\sin\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor\mbox{, and}
\end{eqnarray}
$C$ is the number of bits allocated to the coefficient.
 
For those wishing to understand this operation further and in more depth, I
would commend them to the literature on how a decimation in frequency FFT is
constructed.
 
\chapter{Operation}
 
The core is actually really easy to use:
\begin{enumerate}
        \item Provide a system clock to the core every clock cycle.
        \item Set the {\tt i\_rst} line high for at least one clock cycle
                before you intend to use the core.
        \item From the time of reset until the first sample pair is available
                on the IO ports, {\tt i\_rst} may be kept low, but the clock
                enable line {\tt i\_ce} must also be kept low.
 
        \item On the clock containing the first sample, {\tt i\_sample}, or the
                first sample pair, {\tt i\_left} and {\tt i\_right} for the
                two sample-per-clock FFT, set {\tt i\_ce} high.
 
                If you have elected an FFT that multiplexes its multiplies,
                and so can only handle one sample every two or three clocks,
                then you'll need to guarantee {\tt i\_ce} remains low for
                one or two clocks respectively before raising it again.
 
        \item Ever after, any time a valid sample is placed in {\tt i\_sample}
                and {\tt i\_ce} raised high, a sample will enter into the
                FFT for processing.  For the two sample-per-clock FFT, the
                input will be into {\tt i\_left} (for the first input)
                and {\tt i\_right} (for the next one).
 
        \item At the first valid output, the FFT core will set {\tt o\_sync}
                line high in addition to placing the output value into
                {\tt o\_result}.  For the two-sample per clock FFT, the outputs
                will be placed into {\tt o\_left}
                (the first of two), and {\tt o\_right} (the second of the two)
                respectively.
 
        \item Ever after, whenever {\tt i\_ce} is high, the FFT core will clock
                two samples in and two samples out.  On any valid first
                pair of samples coming out of the transform,
                {\tt o\_sync} will be high.  Otherwise {\tt o\_sync} will
                remain low.
\end{enumerate}
 
There are no special modes or states associated with this core.  If you wish
it to stop or pause, just turn off {\tt i\_ce}.  If you wish to flush the
core, just send zeros into the core.
 
\chapter{Registers}
 
Once built, the FFT routine has no capability for runtime configuration
or reconfiguration.  Therefore, this implementation maintains no user
configurable or readable registers.
 
This is a great advantage in many ways, simply because it greatly simplifies
the interface over other cores that are available out there.
 
\chapter{Clocks}
 
The FFT routines built by this core use one clock only.  The speed of this
clock will depend upon the speed your hardware is capable of.  If your data
rate is slower than your clock speed, just hold off on the {\tt i\_ce}
line as necessary so that every clock with the {\tt i\_ce} line high is a
valid sample.
 
\chapter{IO Ports}
 
The FFT core presents a small set of IO ports to its external interface.
These ports are listed in Table.~\ref{tbl:ioports}.
\begin{table}[htbp]
\begin{center}
\begin{portlist}
i\_clk & 1 & Input & The global clock driving the FFT. \\\hline
i\_rst & 1 & Input & An active high synchronous reset.\\\hline
i\_ce & 1 & Input & Clock Enable.  Set this high to clock data in and
                out.\\\hline
i\_sample & $2N_i$ & Input & The complex input sample.  Bits
                [$\left(2N_i-1\right)$:$N_i$] of this value are the real
                portion, whereas bits [$\left(N_i-1\right)$:0] represent the
                imaginary portion.  Both portions are in signed twos complement
                integer format.  The number of bits, $N_i$, is configurable.
                \\\hline
 
i\_left & $2N_i$ & Input & When the core is configured for two-samples per
                clock,
                this is the first of the two data inputs presented to the core
                on any given clock.  It has the same format as {\tt i\_sample}
                above.
                \\\hline
i\_right & $2N_i$ & Input & The second of two input complex input samples,
                used when the core is configured for two-samples per clock.
                The format is the same as {\tt i\_sample} above.\\\hline
o\_sync & 1 & Output & Signals the first output sample of any transform.
                It will be zero from the time of the reset until the first
                output sample.  Ever afterwards, it will be true any time
                bin zero of the FFT is on the output.\\\hline
o\_result & $2N_o$ & Output & The complex output sample.  The format is the
                same, save only that $N_o$ bits are used for each twos
                complement portion instead of $N_i$.  Hence bits
                [$\left(2N_o-1\right)$:$N_o$] of this value are the real
                portion, whereas bits [$\left(N_o-1\right)$:0] represent the
                imaginary portion.
                \\\hline
o\_left & $2N_o$ & Output & When in the two-sample per clock configuration,
                this is the first of two complex output samples.\\\hline
o\_right & $2N_o$ & Output & When in the two-sample per clcok configuration,
                this is the second of two complex output samples.  \\\hline
                \\\hline
\end{portlist}
\caption{List of IO ports}\label{tbl:ioports}
\end{center}\end{table}
% Appendices
% Index
\end{document}
 
 

Line No.	Rev	Author	Line
1	42	dgisselq	`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`
2			`%%`
3			`%% Filename: doc/src/spec.tex`
4			`%%`
5			`%% Project: A General Purpose Pipelined FFT Implementation`
6			`%%`
7			`%% Purpose: This file contains the LaTeX instructions necessary to build`
8			`%% the doc/spec.pdf file. It's not nearly as interesting as the`
9			`%% doc/spec.pdf file itself, so I would recommend you read that file`
10			`%% before looking for items within here.`
11			`%%`
12			`%% Creator: Dan Gisselquist, Ph.D.`
13			`%% Gisselquist Technology, LLC`
14			`%%`
15			`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`
16			`%%`
17			`%% Copyright (C) 2018, Gisselquist Technology, LLC`
18			`%%`
19			`%% This file is part of the general purpose pipelined FFT project.`
20			`%%`
21			`%% The pipelined FFT project is free software (firmware): you can redistribute`
22			`%% it and/or modify it under the terms of the GNU Lesser General Public License`
23			`%% as published by the Free Software Foundation, either version 3 of the`
24			`%% License, or (at your option) any later version.`
25			`%%`
26			`%% The pipelined FFT project is distributed in the hope that it will be useful,`
27			`%% but WITHOUT ANY WARRANTY; without even the implied warranty of`
28			`%% MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser`
29			`%% General Public License for more details.`
30			`%%`
31			`%% You should have received a copy of the GNU Lesser General Public License`
32			`%% along with this program. (It's in the $(ROOT)/doc directory. Run make`
33			`%% with no target there if the PDF file isn't present.) If not, see`
34			`%% <http://www.gnu.org/licenses/> for a copy.`
35			`%%`
36			`%% License: LGPL, v3, as defined and found on www.gnu.org,`
37			`%% http://www.gnu.org/licenses/lgpl.html`
38			`%%`
39			`%%`
40			`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`
41			`%%`
42			`%%`
43	10	dgisselq	`\documentclass{gqtekspec}`
44	42	dgisselq	`\project{Pipelined FFT}`
45	10	dgisselq	`\title{Specification}`
46			`\author{Dan Gisselquist, Ph.D.}`
47	27	dgisselq	`\email{dgisselq (at) opencores.org}`
48	42	dgisselq	`\revision{Rev.~0.3}`
49	10	dgisselq	`\begin{document}`
50			`\pagestyle{gqtekspecplain}`
51			`\titlepage`
52			`\begin{license}`
53			`Copyright (C) \theyear\today, Gisselquist Technology, LLC`
54
55	42	dgisselq	`This file is part of the general purpose pipelined FFT project.`
56	10	dgisselq
57	42	dgisselq	`The pipelined FFT project is free software (firmware): you can redistribute`
58			`it and/or modify it under the terms of the GNU Lesser General Public License`
59			`as published by the Free Software Foundation, either version 3 of the`
60			`License, or (at your option) any later version.`
61	10	dgisselq
62	42	dgisselq	`The pipelined FFT project is distributed in the hope that it will be useful,`
63			`but WITHOUT ANY WARRANTY; without even the implied warranty of`
64			`MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser`
65			`General Public License for more details.`
66
67			`You should have received a copy of the GNU Lesser General Public License along`
68			`with this project. If not, see \texttt{http://www.gnu.org/licenses/} for a`
69			`copy.`
70	10	dgisselq	`\end{license}`
71			`\begin{revisionhistory}`
72	42	dgisselq	`0.3 & 6/2/2015 & Gisselquist & General purpose pipelined FFT generator\\\hline`
73	26	dgisselq	`0.2 & 6/2/2015 & Gisselquist & Superficial formatting changes\\\hline`
74	11	dgisselq	`0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline`
75	10	dgisselq	`\end{revisionhistory}`
76			`% Revision History`
77			`% Table of Contents, named Contents`
78			`\tableofcontents`
79			`\listoffigures`
80			`\listoftables`
81			`\begin{preface}`
82	42	dgisselq	`This FFT came about originally from my attempts to design and implement a`
83			`GPS decorrelation`
84	10	dgisselq	`algorithm inside a generic FPGA, but only on a limited budget. As such,`
85	42	dgisselq	`it was built before I had the board that could use it. Because I was trying`
86			`to hit a very high processing rate, the FFT core was originally built to handle`
87			two samples at a time. Hence the original name, the ``Double clocked FFT
88			`core''.`
89	10	dgisselq
90	42	dgisselq	`One of the difficulties of doing all of your development using Verilator as`
91			`a simulator, is that you can only simulate components that you have the`
92			`Verilog source for. My desire to use Verilator kept me from using any of`
93			`the vendor supplied FFTs out there.`
94	10	dgisselq
95			`This, then, was and is the genesis of this project.`
96	42	dgisselq
97			`Since this genesis, I've used this core as part of several other designs and`
98			`maintaied it. Eventually it has morphed into the general purpose FFT core`
99			`generator that it is today.`
100
101	10	dgisselq	`\end{preface}`
102
103			`\chapter{Introduction}`
104			`\pagenumbering{arabic}`
105			`\setcounter{page}{1}`
106
107	42	dgisselq	`The General Purpose Pipelined FFT generator project contains all of the`
108			`software necessary to create an arbitrary sized FFT HDL core that will`
109			`accept up to two samples per clock cycle, and after some pipeline delay`
110			`it will output FFT results at the same rate they are input.`
111	10	dgisselq
112			`The FFT generated by this approach is very configurable. By simple adjustment`
113			`of a command line parameter, the FFT may be made to be a forward FFT or an`
114			`inverse FFT. The number of bits processed, kept, and maintained by this`
115			`FFT are also configurable. Even the number of bits used for the twiddle`
116			`factors, or whether or not to bit reverse the outputs, are all configurable`
117	42	dgisselq	`parts to this FFT core. Finally, the FFT can be configured to process two`
118			`samples per clock, one sample per clock, one sample every other clock, or even`
119			`one sample every third clock (or less).`
120	10	dgisselq
121	42	dgisselq	`These features make this general purpose pipelined FFT generator very`
122			`different and unique among the other cores available on opencores.com.`
123	10	dgisselq
124			`For those who wish to get started right away, please download the package,`
125			`change into the {\tt sw} directory and run {\tt make}. There is no need to`
126	42	dgisselq	`run a configure script, {\tt fftgen} is completely portable C++. (While I`
127			`do my development on Ubuntu, I am told by others that the core builds on`
128			`Microsoft systems as well.) Then, once built, go ahead and run {\tt fftgen}`
129			`without any arguments. This will cause {\tt fftgen} to print a usage`
130			`statement to the screen. Review the usage statement, and run {\tt fftgen}`
131			`a second time with the arguments you need.`
132	10	dgisselq
133
134			`\chapter{Generation}`
135
136	42	dgisselq	`Creating an FFT core is as simple as running the program`
137	10	dgisselq	`{\tt fftgen}. The program will then create a series of Verilog files, as`
138			`well as {\tt .hex} files suitable for use with a \textdollar readmemh, and`
139	42	dgisselq	`place them into an output directory, {\tt ./fft-core/} by default, that`
140			`{\tt fftgen} will create. Creating the core you want takes a touch of`
141			`configuring. Therefore, the following lists the arguments that can be given to`
142	10	dgisselq	`{\tt fftgen} to adjust the core that it builds:`
143			`\begin{itemize}`
144			`\item[\hbox{-f size}]`
145			`This specifies the size of the FFT core that {\tt fftgen} will build.`
146	42	dgisselq	`The size must be a power of two.`
147
148			`Given an input $x\left[n\right]$, the FFT will calculate,`
149	10	dgisselq	`\begin{eqnarray*}`
150			`X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right]`
151			`e^{-j2\pi \frac{k}{N}n}`
152			`\end{eqnarray*}`
153	42	dgisselq	`to within a scale factor.`
154	10	dgisselq
155	42	dgisselq	`\item[\hbox{-i}]`
156	10	dgisselq	`This specifies that the FFT will be an inverse FFT. Specifically,`
157			`it will calculate,`
158			`\begin{eqnarray*}`
159			`x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n}`
160			`\end{eqnarray*}`
161	42	dgisselq
162			`If no {\tt -i} option is given, the core will by default generate a`
163			`forward FFT.`
164
165			`\item[\hbox{-2}]`
166			`Builds an FFT that can ingest and output two samples per clock.`
167
168			`This option requires six multiplies for all but the last two butterfly`
169			`stages. The last two butterfly stages are accomplished using shifts`
170			`and adds only, so they require no multiplies.`
171
172			`\item[\hbox{-k 1}]`
173			`Builds an FFT that can ingest and output one sample per clock.`
174			`This option is incompatible with {\tt -2}.`
175
176			`This option requires three multiplies for all but the last two`
177			`butterfly stages.`
178
179			`\item[\hbox{-k 2}]`
180			`Builds an FFT that can ingest and output one sample every other clock.`
181			`This option is incompatible with {\tt -2}, and will override {\tt -k 1}.`
182			`You are responsible for making sure that {\tt i\_ce} will never be`
183			`true for two clocks in a row.`
184
185			`Unlike {\tt -k 1}, this option only requires two multiplies for all`
186			`but the last two butterfly stages.`
187
188			`\item[\hbox{-k 3}]`
189			`Builds an FFT that can ingest and output one sample every third clock.`
190			`This option is incompatible with {\tt -2}, and will override any`
191			`other {\tt -k} option.`
192			`For this to work, you will need to guarantee that {\tt i\_ce} will`
193			`never be true more than one time in any three clock periods.`
194
195			`Unlike {\tt -k 1} and {\tt -k 2}, this option only requires one`
196			`multiply for all but the last two butterfly stages.`
197
198	10	dgisselq	`\item[\hbox{-s}]`
199			`This causes the core to skip the final bit reversal stage. The`
200			`outputs of the FFT will then come out in bit reversed order.`
201
202			`This option is useful in those cases where someone wishes to`
203			`multiply the coefficients coming out of an FFT by some product,`
204			`and then to inverse FFT the results. If the coefficients are also`
205			`applied in bit--reversed order, then both the FFT and IFFT may`
206			`skip their bit reversals.`
207	22	dgisselq
208			`Be aware, however, doing this requires the bit reversed forward`
209			`transform be followed by a bitreversed decimation in time approach`
210			`to the inverse transform. This software does not (yet) provide that`
211	42	dgisselq	`capability. As such, this capability is really just a placeholder for`
212			`a future capability.`
213	10	dgisselq	`\item[\hbox{-d DIR}]`
214			`Specifies the DIRectory to place the produced Verilog files. By`
215			default, this will be in the `./fft-core/' directory, but it can
216			`be moved to any other directory as necessary.`
217			`\item[\hbox{-n bits}] Sets the number of input bits per sample. Given this`
218			`setting, each of the two samples clocked in at every clock cycle`
219			`will have this many bits for their real portion, and again this many`
220			`bits for their imaginary portion. Thus, the data input to the`
221			`FFT will be four times this many bits per clock.`
222			`\item[\hbox{-m bits}] This sets the maximum bit width of the output.`
223			`By default, the FFT will gain bits as they accumulate within`
224			`the FFT. Bits are accumulated at roughly one bit for every two stages.`
225			`However, if this value is set, bits are only accumulated up to this`
226			`maximum width. After this width, further accumulations are truncated.`
227	42	dgisselq	`\item[\hbox{-c bits}] Specifies the number of extra bits to be given to each`
228			`twiddle factor. The size of the twiddle factors is nominally the size`
229			`of the input data. By specifying {\tt -c <bits>}, you can extend`
230			`this default value to avoid any loss in precision.`
231			`\item[\hbox{-x bits}] Maintains {\tt bits} extra bits during the computation`
232			`to deal with roundoff error. Hence, after the first stage, there will`
233			`be this many excess bits within the FFT pipeline.`
234
235			`Internally accumulated roundoff error can be a difficult`
236	22	dgisselq	`problem to solve. By using this option, you guarantee that the FFT`
237			`runs with an additional {\tt bits} bits, and only truncates down to`
238			`the necessary width at the end in order to minimize rounding`
239			`errors along the way.`
240	42	dgisselq
241	22	dgisselq	`\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT`
242			`will consume. By default, the FFT does not use any hardware multiplies.`
243			`However, this can be expensive on the rest of the logic used by the`
244			`device. You can avoid this problem by allowing the FFT to use`
245			`hardware multiplies using this option. By default, the multiplies will`
246			`be used in the latter stages, so that they will be applied where`
247			`the bit width is the greatest.`
248	10	dgisselq	`\end{itemize}`
249
250			`\chapter{Architecture}`
251
252			`As a component of another system the structure of this system is a simple`
253	42	dgisselq	`black box such as the one shown in Fig.~\ref{fig:black-box-one}`
254	10	dgisselq	`\begin{figure}\begin{center}`
255	42	dgisselq	`\begin{pspicture}(-2.2in,0.3in)(2.2in,2in)`
256			`% \rput(0,0){\psframe(-2.2in,0.3in)(2.2in,2in)}`
257			`\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0.3in)(0.75in,2in)}`
258			`\rput(0,1in){(I)FFT Core}`
259			`\rput[r](-1.6in,1.8in){\tt i\_clk}`
260			`\rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}`
261			`\rput[r](-1.6in,1.5in){\tt i\_rst}`
262			`\rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}`
263			`\rput[r](-1.6in,1.2in){\tt i\_ce}`
264			`\rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}`
265			`\rput[r](-1.6in,0.6in){\tt i\_sample}`
266			`\rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}`
267			`\rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
268			`\rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}`
269			`%`
270			`\rput[l](1.6in,1.2in){\tt o\_sync}`
271			`\rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}`
272			`\rput[l](1.6in,0.6in){\tt o\_result}`
273			`\rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}`
274			`\rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
275			`\rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}`
276			`}`
277			`\end{pspicture}`
278			`\caption{(I)FFT Black Box Diagram}\label{fig:black-box-one}`
279			`\end{center}\end{figure}`
280			`for the two traditional one sample per clock FFT implementation, or`
281			`Fig.~\ref{fig:black-box-dbl}`
282			`\begin{figure}\begin{center}`
283	10	dgisselq	`\begin{pspicture}(-2.1in,0)(2.1in,2in)`
284			`% \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)}`
285			`\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)}`
286			`\rput(0,1in){(I)FFT Core}`
287			`\rput[r](-1.6in,1.8in){\tt i\_clk}`
288			`\rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}`
289			`\rput[r](-1.6in,1.5in){\tt i\_rst}`
290			`\rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}`
291			`\rput[r](-1.6in,1.2in){\tt i\_ce}`
292			`\rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}`
293			`\rput[r](-1.6in,0.6in){\tt i\_left}`
294			`\rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}`
295			`\rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
296			`\rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}`
297			`\rput[r](-1.6in,0.3in){\tt i\_right}`
298			`\rput(-1.5in,0.3in){\psline{->}(0,0)(0.7in,0)}`
299			`\rput(-1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
300			`\rput[br](-1.2in,0.3in){\scalebox{0.75}{$2N_i$}}`
301			`%`
302			`\rput[l](1.6in,1.2in){\tt o\_sync}`
303			`\rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}`
304			`\rput[l](1.6in,0.6in){\tt o\_left}`
305			`\rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}`
306			`\rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
307			`\rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}`
308			`\rput[l](1.6in,0.3in){\tt o\_right}`
309			`\rput(0.8in,0.3in){\psline{->}(0,0)(0.7in,0)}`
310			`\rput(1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
311			`\rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}}`
312			`}`
313			`\end{pspicture}`
314	42	dgisselq	`\caption{Two sample per clock (I)FFT Black Box Diagram}\label{fig:black-box-dbl}`
315	10	dgisselq	`\end{center}\end{figure}`
316	42	dgisselq	`for the two samples per clock FFT.`
317
318
319	10	dgisselq	`The interface`
320			`is simple: strobe the reset line, and every clock thereafter set the clock`
321	42	dgisselq	`enable line when data is valid on the input port. Likewise`
322	10	dgisselq	`for the outputs, when the {\tt o\_sync} line goes high the first data sample`
323			`is available. Ever after that, one data sample will be available every clock`
324			`cycle that the {\tt i\_ce} line is high.`
325
326	42	dgisselq	`Internal to the FFT, things are a touch more complex.`
327
328			`Fig.~\ref{fig:white-box-dbl}`
329	10	dgisselq	`\begin{figure}\begin{center}`
330			`\begin{pspicture}(1.3in,-0.5in)(4.7in,5in)`
331			`% \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)}`
332	11	dgisselq	`\rput(0,0){\psframe[linewidth=2\pslinewidth](1.3in,-0.25in)(4.7in,5in)}`
333	10	dgisselq	`\rput(0,5in){%`
334			`\rput[r](1.95in,0.125in){\tiny\tt i\_left}`
335			`\rput[l](4.05in,0.125in){\tiny\tt i\_right}`
336			`\rput(2.0in,0){\psline{->}(0,0.25in)(0,0.0in)}`
337			`\rput(4.0in,0){\psline{->}(0,0.25in)(0,0.0in)}`
338			`}`
339			`\rput(2in,0){%`
340			`\rput(0,4.25in){\psframe(-0.5in,0)(0.5in,0.5in)%`
341			`\rput[r](-0.05in,0.675in){\tiny Left}`
342			`\rput(0.0in,0){\psline{->}(0,0.75in)(0,0.5in)}`
343			`\rput(0,0.25in){Evens, $N$}`
344			`\rput[r](-0.35in,-0.125in){\tiny Sync}`
345			`\rput[l](0.35in,-0.125in){\tiny Data}`
346			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
347			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
348			`\rput(0,3.5in){\psframe(-0.5in,0)(0.5in,0.5in)%`
349			`\rput(0,0.25in){Evens, $N/2$}`
350			`\rput[r](-0.35in,-0.125in){\tiny Sync}`
351			`\rput[l](0.35in,-0.125in){\tiny Data}`
352			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
353			`\rput( 0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
354			`% \rput(0,3in){\psframe(-0.5in,0)(0.5in,0.5in)%`
355			`% \rput(0,0.25in){Evens, $N$}}`
356			`\rput(0,2.25in){\psframe(-0.5in,0)(0.5in,0.5in)%`
357			`\rput(0,0.25in){Evens, $8$}`
358			`\rput[r](-0.35in,-0.125in){\tiny Sync}`
359			`\rput[l](0.35in,-0.125in){\tiny Data}`
360			`\rput[r](-0.35in,0.675in){\tiny Sync}`
361			`\rput[l](0.35in,0.675in){\tiny Data}`
362			`\rput(-0.3in,0.9in){$\vdots$}`
363			`\rput( 0.3in,0.9in){$\vdots$}`
364			`\rput(-0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}`
365			`\rput( 0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}`
366			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
367			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
368			`\rput(0,1.5in){\psframe(-0.5in,0)(0.5in,0.5in)%`
369			`\rput(0,0.25in){Qtrstage (Even)}`
370			`\rput[r](-0.35in,-0.125in){\tiny Sync}`
371			`\rput[lb](0.6in,-0.10in){\tiny Data}`
372			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.5in)(0.8in,-0.5in)}`
373			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.125in)(0.4in,-0.125in)(0.4in,-0.25in)}}`
374			`% \rput(0,0.75in){\psframe(-0.5in,0)(0.5in,0.5in)%`
375			`% \rput(0,0.25in){dblstage}}`
376			`% \rput(0,0in){\psframe(-0.5in,0)(0.5in,0.5in)%`
377			`% \rput(0,0.25in){Bit Reversal}}`
378			`}`
379			`\rput(4in,0){%`
380			`\rput(0,4.25in){\psframe(-0.5in,0)(0.5in,0.5in)%`
381			`\rput[l](0.05in,0.675in){\tiny Right}`
382			`\rput(0.0in,0){\psline{->}(0,0.75in)(0,0.5in)}`
383			`\rput(0,0.25in){Odds, $N$}`
384			`\rput[l](0.35in,-0.125in){\tiny Sync}`
385			`\rput[r](-0.35in,-0.125in){\tiny Data}`
386			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
387			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
388			`\rput(0,3.5in){\psframe(-0.5in,0)(0.5in,0.5in)%`
389			`\rput(0,0.25in){Odds, $N/2$}`
390			`\rput[l](0.35in,-0.125in){\tiny Sync}`
391			`\rput[r](-0.35in,-0.125in){\tiny Data}`
392			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
393			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
394			`% \rput(0,3in){\psframe(-0.5in,0)(0.5in,0.5in)%`
395			`% \rput(0,0.25in){Evens, $N$}}`
396			`\rput(0,2.25in){\psframe(-0.5in,0)(0.5in,0.5in)%`
397			`\rput(0,0.25in){Odds, $8$}`
398			`\rput[l](0.35in,0.675in){\tiny Sync}`
399			`\rput[r](-0.35in,0.675in){\tiny Data}`
400			`\rput(-0.3in,0.9in){$\vdots$}`
401			`\rput( 0.3in,0.9in){$\vdots$}`
402			`\rput[l](0.35in,-0.125in){\tiny Sync}`
403			`\rput[r](-0.35in,-0.125in){\tiny Data}`
404			`\rput(-0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}`
405			`\rput(0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}`
406			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
407			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
408			`\rput(0,1.5in){\psframe(-0.5in,0)(0.5in,0.5in)%`
409			`\rput(0,0.25in){Qtrstage (Odd)}`
410			`\rput[rb](-0.6in,-0.10in){\tiny Data}`
411			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
412			`\rput[t](0.3in,-0.3in){\tiny NC}`
413			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.125in)(-0.4in,-0.125in)(-0.4in,-0.25in)}`
414			`}`
415			`}`
416			`\rput(3in,0.75in){\psframe(-0.5in,0)(0.5in,0.5in)%`
417			`\rput(0,0.25in){Double Stage}`
418			`\rput[r](-0.35in,-0.125in){\tiny Sync}`
419			`\rput[l](0.35in,-0.125in){\tiny Right}`
420			`\rput[r](0.15in,-0.125in){\tiny Left}`
421			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
422			`\rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}`
423			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
424			`\rput(3in,0in){\psframe(-0.5in,0)(0.5in,0.5in)%`
425			`\rput(0,0.25in){Bit Reversal}`
426			`\rput[r](-0.35in,-0.125in){\tiny Sync}`
427			`\rput[l](0.35in,-0.125in){\tiny Right}`
428			`\rput[r](0.15in,-0.125in){\tiny Left}`
429			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
430			`\rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}`
431			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
432			`\rput(3in,-0.25in){\rput[r](-0.35in,-0.125in){\tiny\tt o\_sync}`
433			`\rput[l](0.35in,-0.125in){\tiny\tt o\_right}`
434			`\rput[r](0.15in,-0.125in){\tiny\tt o\_left}`
435			`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
436			`\rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}`
437			`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
438			`\end{pspicture}`
439	42	dgisselq	`\caption{Internal FFT Structure}\label{fig:white-box-dbl}`
440	10	dgisselq	`\end{center}\end{figure}`
441	42	dgisselq	`attempts to show some of this structure for the two-sample per clock FFT.`
442			`As you can see from the figure, the`
443			`FFT itself is composed of a series of stages. For the two-sample per clock`
444			`FFT, these stages are split from the beginning into an even stage and an odd`
445			`stage. Further, they are numbered`
446	10	dgisselq	`according to the size of the FFT they represent. Therefore the first stage`
447			`is numbered $N$ and represents the first stage of an $N$ point FFT. The`
448	42	dgisselq	`second stage is labeled $N/2$, then $N/4$, and so on down to $N=8$. The`
449	10	dgisselq	`four sample stage and the two sample stages are different, however. These`
450			`two stages, representing three blocks on Fig.~\ref{fig:white-box}, can be`
451			`accomplished without any multiplies. Therefore they have been accomplished`
452			`separately. Likewise all of the stages, save the double stage at the bottom,`
453			`operate on one data sample per clock. Only the last stage, prior to the`
454			`bit reversal stage, takes two data samples per clock as input, and outputs two`
455			`data samples per clock. Finally, the bit reversal stage acts as the last`
456			`piece of the structure.`
457
458			`Internal to each of the FFT stages is a butterfly and a complex multiply,`
459			`as shown in Fig.~\ref{fig:fftstage}.`
460			`\begin{figure}\begin{center}`
461	11	dgisselq	`\begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in)`
462			`% \rput(0,0){\psframe(0in,-2in)(3in,4.25in)}`
463			`\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)}`
464	42	dgisselq	`\rput[r](1.625in,4.125in){\tt i\_sample}`
465	11	dgisselq	`\rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)%`
466			`\psline{->}(0,0)(-0.2in,-0.25in)%`
467			`\psarc{->}{0.15in}{200}{340}}`
468	10	dgisselq	`\rput(0,2.75in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}`
469			`\rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}`
470			`\rput(0,0){\psframe(0.3in,0)(0.4in,0.25in)}`
471			`\rput(0,0){\psframe(0.5in,0)(0.6in,0.25in)}`
472			`\rput(0,0){\psframe(0.7in,0)(0.8in,0.25in)}`
473			`\rput(0,0){\psframe(0.9in,0)(1.0in,0.25in)}`
474			`\rput(0,0){\psframe(1.1in,0)(1.2in,0.25in)}`
475			`\rput(0,0){\psline{-}(0.7in,-0.05in)(1.1in,-0.25in)}`
476	11	dgisselq	`\rput(0,0){\psline{<-}(0.7in,0.3in)(1.5in,0.5in)(1.5in,0.75in)}}`
477	10	dgisselq	`\rput(1.85in,2.75in){\psline(0,0.75in)(0,-0.25in)}`
478	11	dgisselq	`\rput(0.6in,0.25in){\rput(0,0){\psframe[linewidth=2\pslinewidth](0,0)(2in,2.0in)}`
479	10	dgisselq	`\rput(0.50in,2in){\psline{->}(0,0.25in)(0,0in)}`
480			`\rput(1.25in,2in){\psline{->}(0,0.25in)(0,0in)}`
481			`\rput(1.75in,2in){\psline{->}(0,0.25in)(0,0in)}`
482			`\rput(0.5in,0){%`
483			`\rput(0in,0){\psline{->}(0,2.0in)(0,1.1in)}`
484			`\rput(0in,0){\psline{->}(0,1.75in)(0.65in,1.1in)}`
485			`\rput(-0.1in,1.1in){$+$}`
486			`\rput(0in,1.0in){$\bigoplus$}`
487			`\rput(0in,0){\psline{->}(0,0.9in)(0,0.75in)}`
488			`\rput(0in,0.5in){\psframe(-0.45in,-0.25in)(0.45in,0.25in)}`
489			`\rput(0in,0.5in){\parbox{0.8in}{Delay, and\\shift by $C-2$}}`
490			`\rput(0in,0){\psline{->}(0,0.25in)(0,0.0in)}}`
491			`\rput(1.25in,0){%`
492			`\rput(0in,0){\psline{->}(0,2.0in)(0,1.1in)}`
493			`\rput(0in,0){\psline{->}(0,1.75in)(-0.65in,1.1in)}`
494			`\rput(0.1in,1.1in){$-$}`
495			`\rput(0in,1in){$\bigoplus$}`
496			`\rput(0in,0){\psline{->}(0,0.9in)(0,0.6in)}`
497			`\rput(0in,0.5in){$\bigotimes$}`
498			`\rput(0in,0){\psline{->}(0,0.4in)(0,0.0in)}}`
499			`\rput(1.75in,0){%`
500			`\rput(0,0){\psline{->}(0,2.0in)(0,0.5in)(-0.4in,0.5in)}}`

Browse

Tools

Subversion Repositories dblclockfft

[/] [dblclockfft/] [trunk/] [doc/] [src/] [spec.tex] - Blame information for rev 42