Rev 32	Rev 42
Line 1...	Line 1...
	`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`
	`%%`
	`%% Filename: doc/src/spec.tex`
	`%%`
	`%% Project: A General Purpose Pipelined FFT Implementation`
	`%%`
	`%% Purpose: This file contains the LaTeX instructions necessary to build`
	`%% the doc/spec.pdf file. It's not nearly as interesting as the`
	`%% doc/spec.pdf file itself, so I would recommend you read that file`
	`%% before looking for items within here.`
	`%%`
	`%% Creator: Dan Gisselquist, Ph.D.`
	`%% Gisselquist Technology, LLC`
	`%%`
	`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`
	`%%`
	`%% Copyright (C) 2018, Gisselquist Technology, LLC`
	`%%`
	`%% This file is part of the general purpose pipelined FFT project.`
	`%%`
	`%% The pipelined FFT project is free software (firmware): you can redistribute`
	`%% it and/or modify it under the terms of the GNU Lesser General Public License`
	`%% as published by the Free Software Foundation, either version 3 of the`
	`%% License, or (at your option) any later version.`
	`%%`
	`%% The pipelined FFT project is distributed in the hope that it will be useful,`
	`%% but WITHOUT ANY WARRANTY; without even the implied warranty of`
	`%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser`
	`%% General Public License for more details.`
	`%%`
	`%% You should have received a copy of the GNU Lesser General Public License`
	`%% along with this program. (It's in the $(ROOT)/doc directory. Run make`
	`%% with no target there if the PDF file isn't present.) If not, see`
	`%% <http://www.gnu.org/licenses/> for a copy.`
	`%%`
	`%% License: LGPL, v3, as defined and found on www.gnu.org,`
	`%% http://www.gnu.org/licenses/lgpl.html`
	`%%`
	`%%`
	`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`
	`%%`
	`%%`
`\documentclass{gqtekspec}`	`\documentclass{gqtekspec}`
`\project{Double Clocked FFT}`	`\project{Pipelined FFT}`
`\title{Specification}`	`\title{Specification}`
`\author{Dan Gisselquist, Ph.D.}`	`\author{Dan Gisselquist, Ph.D.}`
`\email{dgisselq (at) opencores.org}`	`\email{dgisselq (at) opencores.org}`
`\revision{Rev.~0.2}`	`\revision{Rev.~0.3}`
`\begin{document}`	`\begin{document}`
`\pagestyle{gqtekspecplain}`	`\pagestyle{gqtekspecplain}`
`\titlepage`	`\titlepage`
`\begin{license}`	`\begin{license}`
`Copyright (C) \theyear\today, Gisselquist Technology, LLC`	`Copyright (C) \theyear\today, Gisselquist Technology, LLC`

`This project is free software (firmware): you can redistribute it and/or`	`This file is part of the general purpose pipelined FFT project.`
`modify it under the terms of the GNU General Public License as published`
`by the Free Software Foundation, either version 3 of the License, or (at`
`your option) any later version.`

`This program is distributed in the hope that it will be useful, but WITHOUT`
`ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or`
`FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License`
`for more details.`

`You should have received a copy of the GNU General Public License along`	`The pipelined FFT project is free software (firmware): you can redistribute`
`with this program. If not, see \texttt{http://www.gnu.org/licenses/} for a copy.`	`it and/or modify it under the terms of the GNU Lesser General Public License`
	`as published by the Free Software Foundation, either version 3 of the`
	`License, or (at your option) any later version.`

	`The pipelined FFT project is distributed in the hope that it will be useful,`
	`but WITHOUT ANY WARRANTY; without even the implied warranty of`
	`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser`
	`General Public License for more details.`

	`You should have received a copy of the GNU Lesser General Public License along`
	`with this project. If not, see \texttt{http://www.gnu.org/licenses/} for a`
	`copy.`
`\end{license}`	`\end{license}`
`\begin{revisionhistory}`	`\begin{revisionhistory}`
	`0.3 & 6/2/2015 & Gisselquist & General purpose pipelined FFT generator\\\hline`
`0.2 & 6/2/2015 & Gisselquist & Superficial formatting changes\\\hline`	`0.2 & 6/2/2015 & Gisselquist & Superficial formatting changes\\\hline`
`0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline`	`0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline`
`\end{revisionhistory}`	`\end{revisionhistory}`
`% Revision History`	`% Revision History`
`% Table of Contents, named Contents`	`% Table of Contents, named Contents`
`\tableofcontents`	`\tableofcontents`
`\listoffigures`	`\listoffigures`
`\listoftables`	`\listoftables`
`\begin{preface}`	`\begin{preface}`
`This FFT comes from my attempts to design and implement a signal processing`	`This FFT came about originally from my attempts to design and implement a`
	`GPS decorrelation`
`algorithm inside a generic FPGA, but only on a limited budget. As such,`	`algorithm inside a generic FPGA, but only on a limited budget. As such,`
`I don't yet have the FPGA board I wish to place this algorithm onto, neither`	`it was built before I had the board that could use it. Because I was trying`
`do I have any expensive modeling or simulation capabilities. I'm using`	`to hit a very high processing rate, the FFT core was originally built to handle`
`Verilator for my modeling and simulation needs. This makes`	`two samples at a time. Hence the original name, the ``Double clocked FFT`
`using a vendor supplied IP core, such as an FFT, difficult if not impossible`	`core''.`
`to use.`
	`One of the difficulties of doing all of your development using Verilator as`
`My problem was made worse when I learned that the published maximum clock`	`a simulator, is that you can only simulate components that you have the`
`speed for a device wasn't necessarily the maximum clock speed that I could`	`Verilog source for. My desire to use Verilator kept me from using any of`
`achieve. My design needed to process the incoming signal at 500~MHz to be`	`the vendor supplied FFTs out there.`
`commercially viable. 500~MHz is not necessarily a clock speed`
`that can be easily achieved. 250~MHz, on the other hand, is much more within`
`the realm of possibility. Achieving a 500~MHz performance with a 250~MHz`
`clock, however, requires an FFT that accepts two samples per clock.`

`This, then, was and is the genesis of this project.`	`This, then, was and is the genesis of this project.`

	`Since this genesis, I've used this core as part of several other designs and`
	`maintained it. Eventually it has morphed into the general purpose FFT core`
	`generator that it is today.`

`\end{preface}`	`\end{preface}`

`\chapter{Introduction}`	`\chapter{Introduction}`
`\pagenumbering{arabic}`	`\pagenumbering{arabic}`
`\setcounter{page}{1}`	`\setcounter{page}{1}`

`The Double Clocked FFT project contains all of the software necessary to`	`The General Purpose Pipelined FFT generator project contains all of the`
`create the IP to generate an arbitrary sized FFT that will clock two samples`	`software necessary to create an arbitrary sized FFT HDL core that will`
`in at each clock cycle, and after some pipeline delay it will clock two`	`accept up to two samples per clock cycle, and after some pipeline delay`
`samples out at every clock cycle.`	`it will output FFT results at the same rate they are input.`

`The FFT generated by this approach is very configurable. By simple adjustment`	`The FFT generated by this approach is very configurable. By simple adjustment`
`of a command line parameter, the FFT may be made to be a forward FFT or an`	`of a command line parameter, the FFT may be made to be a forward FFT or an`
`inverse FFT. The number of bits processed, kept, and maintained by this`	`inverse FFT. The number of bits processed, kept, and maintained by this`
`FFT are also configurable. Even the number of bits used for the twiddle`	`FFT are also configurable. Even the number of bits used for the twiddle`
`factors, or whether or not to bit reverse the outputs, are all configurable`	`factors, or whether or not to bit reverse the outputs, are all configurable`
`parts to this FFT core.`	`parts to this FFT core. Finally, the FFT can be configured to process two`
	`samples per clock, one sample per clock, one sample every other clock, or even`
	`one sample every third clock (or less).`

`These features make the Double Clocked FFT very different and unique among the`	`These features make this general purpose pipelined FFT generator very`
`other cores available on opencores.org.`	`different and unique among the other cores available on opencores.org.`

`For those who wish to get started right away, please download the package,`	`For those who wish to get started right away, please download the package,`
`change into the {\tt sw} directory and run {\tt make}. There is no need to`	`change into the {\tt sw} directory and run {\tt make}. There is no need to`
`run a configure script, {\tt fftgen} is completely portable C++. Then, once`	`run a configure script, {\tt fftgen} is completely portable C++. (While I`
`built, go ahead and run {\tt fftgen} without any arguments. This will cause`	`do my development on Ubuntu, I am told by others that the core builds on`
`{\tt fftgen} to print a usage statement to the screen. Review the usage`	`Microsoft systems as well.) Then, once built, go ahead and run {\tt fftgen}`
`statement, and run {\tt fftgen} a second time with the arguments you need.`	`without any arguments. This will cause {\tt fftgen} to print a usage`
	`statement to the screen. Review the usage statement, and run {\tt fftgen}`
	`a second time with the arguments you need.`


`\chapter{Generation}`	`\chapter{Generation}`

`Creating a double clocked FFT core is as simple as running the program`	`Creating an FFT core is as simple as running the program`
`{\tt fftgen}. The program will then create a series of Verilog files, as`	`{\tt fftgen}. The program will then create a series of Verilog files, as`
`well as {\tt .hex} files suitable for use with a \textdollar readmemh, and`	`well as {\tt .hex} files suitable for use with a \textdollar readmemh, and`
`place them into an {\tt ./fft-core/} directory that {\tt fftgen} will create.`	`place them into an output directory, {\tt ./fft-core/} by default, that`
`Creating the core you want takes a touch of configuring.`	`{\tt fftgen} will create. Creating the core you want takes a touch of`
`Therefore, the following lists the arguments that can be given to`	`configuring. Therefore, the following lists the arguments that can be given to`
`{\tt fftgen} to adjust the core that it builds:`	`{\tt fftgen} to adjust the core that it builds:`
`\begin{itemize}`	`\begin{itemize}`
`\item[\hbox{-f size}]`	`\item[\hbox{-f size}]`
`This specifies the size of the FFT core that {\tt fftgen} will build.`	`This specifies the size of the FFT core that {\tt fftgen} will build.`
`The size must be a power of two. The transform is given, within a`	`The size must be a power of two.`
`scale factor, to,`
	`Given an input $x\left[n\right]$, the FFT will calculate,`
`\begin{eqnarray*}`	`\begin{eqnarray*}`
`X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right]`	`X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right]`
`e^{-j2\pi \frac{k}{N}n}`	`e^{-j2\pi \frac{k}{N}n}`
`\end{eqnarray*}`	`\end{eqnarray*}`
	`to within a scale factor.`

`\item[\hbox{-1}]`	`\item[\hbox{-i}]`
`This specifies that the FFT will be an inverse FFT. Specifically,`	`This specifies that the FFT will be an inverse FFT. Specifically,`
`it will calculate,`	`it will calculate,`
`\begin{eqnarray*}`	`\begin{eqnarray*}`
`x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n}`	`x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n}`
`\end{eqnarray*}`	`\end{eqnarray*}`
`\item[\hbox{-0}]`
`This specifies building a forward FFT. However, since this is the`	`If no {\tt -i} option is given, the core will by default generate a`
`default, this option is never necessary.`	`forward FFT.`

	`\item[\hbox{-2}]`
	`Builds an FFT that can ingest and output two samples per clock.`

	`This option requires six multiplies for all but the last two butterfly`
	`stages. The last two butterfly stages are accomplished using shifts`
	`and adds only, so they require no multiplies.`

	`\item[\hbox{-k 1}]`
	`Builds an FFT that can ingest and output one sample per clock.`
	`This option is incompatible with {\tt -2}.`

	`This option requires three multiplies for all but the last two`
	`butterfly stages.`

	`\item[\hbox{-k 2}]`
	`Builds an FFT that can ingest and output one sample every other clock.`
	`This option is incompatible with {\tt -2}, and will override {\tt -k 1}.`
	`You are responsible for making sure that {\tt i\_ce} will never be`
	`true for two clocks in a row.`

	`Unlike {\tt -k 1}, this option only requires two multiplies for all`
	`but the last two butterfly stages.`

	`\item[\hbox{-k 3}]`
	`Builds an FFT that can ingest and output one sample every third clock.`
	`This option is incompatible with {\tt -2}, and will override any`
	`other {\tt -k} option.`
	`For this to work, you will need to guarantee that {\tt i\_ce} will`
	`never be true more than one time in any three clock periods.`

	`Unlike {\tt -k 1} and {\tt -k 2}, this option only requires one`
	`multiply for all but the last two butterfly stages.`

`\item[\hbox{-s}]`	`\item[\hbox{-s}]`
`This causes the core to skip the final bit reversal stage. The`	`This causes the core to skip the final bit reversal stage. The`
`outputs of the FFT will then come out in bit reversed order.`	`outputs of the FFT will then come out in bit reversed order.`

`This option is useful in those cases where someone wishes to`	`This option is useful in those cases where someone wishes to`
Line 118...	Line 206...
`skip their bit reversals.`	`skip their bit reversals.`

`Be aware, however, doing this requires the bit reversed forward`	`Be aware, however, doing this requires the bit reversed forward`
`transform be followed by a bitreversed decimation in time approach`	`transform be followed by a bitreversed decimation in time approach`
`to the inverse transform. This software does not (yet) provide that`	`to the inverse transform. This software does not (yet) provide that`
`capability. As such, the utility just isn't there yet.`	`capability. As such, this capability is really just a placeholder for`
`\item[\hbox{-S}]`	`a future capability.`
`Include the final bit reversal stage. As this is also the default,`
`specifying the option should not be necessary.`
`\item[\hbox{-d DIR}]`	`\item[\hbox{-d DIR}]`
`Specifies the DIRectory to place the produced Verilog files. By`	`Specifies the DIRectory to place the produced Verilog files. By`
`default, this will be in the `./fft-core/' directory, but it can`	`default, this will be in the `./fft-core/' directory, but it can`
`be moved to any other directory as necessary.`	`be moved to any other directory as necessary.`
`\item[\hbox{-n bits}] Sets the number of input bits per sample. Given this`	`\item[\hbox{-n bits}] Sets the number of input bits per sample. Given this`
Line 136...	Line 222...
`\item[\hbox{-m bits}] This sets the maximum bit width of the output.`	`\item[\hbox{-m bits}] This sets the maximum bit width of the output.`
`By default, the FFT will gain bits as they accumulate within`	`By default, the FFT will gain bits as they accumulate within`
`the FFT. Bits are accumulated at roughly one bit for every two stages.`	`the FFT. Bits are accumulated at roughly one bit for every two stages.`
`However, if this value is set, bits are only accumulated up to this`	`However, if this value is set, bits are only accumulated up to this`
`maximum width. After this width, further accumulations are truncated.`	`maximum width. After this width, further accumulations are truncated.`
`\item[\hbox{-c bits}] The number of bits in each twiddle coefficient is given`	`\item[\hbox{-c bits}] Specifies the number of extra bits to be given to each`
`by the number of bits input to that stage plus this extra number of`	`twiddle factor. The size of the twiddle factors is nominally the size`
`bits per coefficient. By increasing the number of bits per coefficient`	`of the input data. By specifying {\tt -c <bits>}, you can extend`
`above that of the input samples, truncation error is kept to the`	`this default value to avoid any loss in precision.`
`original error found within the original samples.`	`\item[\hbox{-x bits}] Maintains {\tt bits} extra bits during the computation`
`\item[\hbox{-x bits}] Internally accumulated roundoff error can be a difficult`	`to deal with roundoff error. Hence, after the first stage, there will`
	`be this many excess bits within the FFT pipeline.`

	`Internally accumulated roundoff error can be a difficult`
`problem to solve. By using this option, you guarantee that the FFT`	`problem to solve. By using this option, you guarantee that the FFT`
`runs with an additional {\tt bits} bits, and only truncates down to`	`runs with an additional {\tt bits} bits, and only truncates down to`
`the necessary width at the end in order to minimize rounding`	`the necessary width at the end in order to minimize rounding`
`errors along the way.`	`errors along the way.`

`\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT`	`\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT`
`will consume. By default, the FFT does not use any hardware multiplies.`	`will consume. By default, the FFT does not use any hardware multiplies.`
`However, this can be expensive on the rest of the logic used by the`	`However, this can be expensive on the rest of the logic used by the`
`device. You can avoid this problem by allowing the FFT to use`	`device. You can avoid this problem by allowing the FFT to use`
`hardware multiplies using this option. By default, the multiplies will`	`hardware multiplies using this option. By default, the multiplies will`
Line 158...	Line 248...
`\end{itemize}`	`\end{itemize}`
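
The multiply counts quoted for the {\tt -2} and {\tt -k} options can be turned into a rough budget for a whole core. The following is only a sketch: it assumes the quoted counts (six, three, two, one) apply per butterfly stage, that an $N$ point FFT has $\log_2 N$ stages, and that the last two stages need no multiplies, as stated above. The helper name {\tt total\_multiplies} is hypothetical and is not part of {\tt fftgen}.

```python
# Multiplies per butterfly stage, as quoted in the option list above.
MULTIPLIES_PER_STAGE = {"-2": 6, "-k 1": 3, "-k 2": 2, "-k 3": 1}


def total_multiplies(fft_size, option):
    """Hypothetical helper: estimate the multiplies in an fft_size point core.

    Assumes log2(fft_size) butterfly stages, of which the last two are
    built from shifts and adds only and so contribute no multiplies.
    """
    if fft_size < 4 or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of two, at least 4")
    stages = fft_size.bit_length() - 1  # log2(fft_size)
    return MULTIPLIES_PER_STAGE[option] * (stages - 2)


if __name__ == "__main__":
    for opt in ("-2", "-k 1", "-k 2", "-k 3"):
        print(opt, total_multiplies(1024, opt))
```

Under these assumptions the trade is direct: each step down in sample rate, from {\tt -2} through {\tt -k 3}, reduces the multiply count for the same FFT size.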

`\chapter{Architecture}`	`\chapter{Architecture}`

`As a component of another system the structure of this system is a simple`	`As a component of another system the structure of this system is a simple`
`black box such as the one shown in Fig.~\ref{fig:black-box}.`	`black box such as the one shown in Fig.~\ref{fig:black-box-one}`
	`\begin{figure}\begin{center}`
	`\begin{pspicture}(-2.2in,0.3in)(2.2in,2in)`
	`% \rput(0,0){\psframe(-2.2in,0.3in)(2.2in,2in)}`
	`\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0.3in)(0.75in,2in)}`
	`\rput(0,1in){(I)FFT Core}`
	`\rput[r](-1.6in,1.8in){\tt i\_clk}`
	`\rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}`
	`\rput[r](-1.6in,1.5in){\tt i\_rst}`
	`\rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}`
	`\rput[r](-1.6in,1.2in){\tt i\_ce}`
	`\rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}`
	`\rput[r](-1.6in,0.6in){\tt i\_sample}`
	`\rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}`
	`\rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
	`\rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}`
	`%`
	`\rput[l](1.6in,1.2in){\tt o\_sync}`
	`\rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}`
	`\rput[l](1.6in,0.6in){\tt o\_result}`
	`\rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}`
	`\rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
	`\rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}`
	`}`
	`\end{pspicture}`
	`\caption{(I)FFT Black Box Diagram}\label{fig:black-box-one}`
	`\end{center}\end{figure}`
	`for the traditional one sample per clock FFT implementation, or`
	`Fig.~\ref{fig:black-box-dbl}`
`\begin{figure}\begin{center}`	`\begin{figure}\begin{center}`
`\begin{pspicture}(-2.1in,0)(2.1in,2in)`	`\begin{pspicture}(-2.1in,0)(2.1in,2in)`
`% \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)}`	`% \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)}`
`\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)}`	`\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)}`
`\rput(0,1in){(I)FFT Core}`	`\rput(0,1in){(I)FFT Core}`
Line 191...	Line 309...
`\rput(0.8in,0.3in){\psline{->}(0,0)(0.7in,0)}`	`\rput(0.8in,0.3in){\psline{->}(0,0)(0.7in,0)}`
`\rput(1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`	`\rput(1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}`
`\rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}}`	`\rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}}`
`}`	`}`
`\end{pspicture}`	`\end{pspicture}`
`\caption{(I)FFT Black Box Diagram}\label{fig:black-box}`	`\caption{Two sample per clock (I)FFT Black Box Diagram}\label{fig:black-box-dbl}`
`\end{center}\end{figure}`	`\end{center}\end{figure}`
	`for the two samples per clock FFT.`


`The interface`	`The interface`
`is simple: strobe the reset line, and every clock thereafter set the clock`	`is simple: strobe the reset line, and every clock thereafter set the clock`
`enable line when data is valid on the left and right input ports. Likewise`	`enable line when data is valid on the input port. Likewise`
`for the outputs, when the {\tt o\_sync} line goes high the first data sample`	`for the outputs, when the {\tt o\_sync} line goes high the first data sample`
`is available. Ever after that, one data sample will be available every clock`	`is available. Ever after that, one data sample will be available every clock`
`cycle that the {\tt i\_ce} line is high.`	`cycle that the {\tt i\_ce} line is high.`

`Internal to the FFT, things are a touch more complex. Fig.~\ref{fig:white-box}`	`Internal to the FFT, things are a touch more complex.`

	`Fig.~\ref{fig:white-box-dbl}`
`\begin{figure}\begin{center}`	`\begin{figure}\begin{center}`
`\begin{pspicture}(1.3in,-0.5in)(4.7in,5in)`	`\begin{pspicture}(1.3in,-0.5in)(4.7in,5in)`
`% \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)}`	`% \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)}`
`\rput(0,0){\psframe[linewidth=2\pslinewidth](1.3in,-0.25in)(4.7in,5in)}`	`\rput(0,0){\psframe[linewidth=2\pslinewidth](1.3in,-0.25in)(4.7in,5in)}`
`\rput(0,5in){%`	`\rput(0,5in){%`
Line 311...	Line 434...
`\rput[r](0.15in,-0.125in){\tiny\tt o\_left}`	`\rput[r](0.15in,-0.125in){\tiny\tt o\_left}`
`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`	`\rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}`
`\rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}`	`\rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}`
`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`	`\rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}`
`\end{pspicture}`	`\end{pspicture}`
`\caption{Internal FFT Structure}\label{fig:white-box}`	`\caption{Internal FFT Structure}\label{fig:white-box-dbl}`
`\end{center}\end{figure}`	`\end{center}\end{figure}`
`attempts to show some of this structure. As you can see from the figure, the`	`attempts to show some of this structure for the two-sample per clock FFT.`
`FFT itself is composed of a series of stages. These stages are split from the`	`As you can see from the figure, the`
`beginning into an even stage and an odd stage. Further, they are numbered`	`FFT itself is composed of a series of stages. For the two-sample per clock`
	`FFT, these stages are split from the beginning into an even stage and an odd`
	`stage. Further, they are numbered`
`according to the size of the FFT they represent. Therefore the first stage`	`according to the size of the FFT they represent. Therefore the first stage`
`is numbered $N$ and represents the first stage of an $N$ point FFT. The`	`is numbered $N$ and represents the first stage of an $N$ point FFT. The`
`second stage is labeled $N/2$, then $N/$, and so on down to $N=8$. The`	`second stage is labeled $N/2$, then $N/4$, and so on down to $N=8$. The`
`four sample stage and the two sample stages are different, however. These`	`four sample stage and the two sample stages are different, however. These`
`two stages, representing three blocks on Fig.~\ref{fig:white-box}, can be`	`two stages, representing three blocks on Fig.~\ref{fig:white-box-dbl}, can be`
`accomplished without any multiplies. Therefore they have been accomplished`	`accomplished without any multiplies. Therefore they have been accomplished`
`separately. Likewise all of the stages, save the double stage at the bottom,`	`separately. Likewise all of the stages, save the double stage at the bottom,`
`operate on one data sample per clock. Only the last stage, prior to the`	`operate on one data sample per clock. Only the last stage, prior to the`
Line 334...	Line 459...
`as shown in Fig.~\ref{fig:fftstage}.`	`as shown in Fig.~\ref{fig:fftstage}.`
`\begin{figure}\begin{center}`	`\begin{figure}\begin{center}`
`\begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in)`	`\begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in)`
`% \rput(0,0){\psframe(0in,-2in)(3in,4.25in)}`	`% \rput(0,0){\psframe(0in,-2in)(3in,4.25in)}`
`\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)}`	`\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)}`
`\rput[r](1.625in,4.125in){\tt i\_data}`	`\rput[r](1.625in,4.125in){\tt i\_sample}`
`\rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)%`	`\rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)%`
`\psline{->}(0,0)(-0.2in,-0.25in)%`	`\psline{->}(0,0)(-0.2in,-0.25in)%`
`\psarc{->}{0.15in}{200}{340}}`	`\psarc{->}{0.15in}{200}{340}}`
`\rput(0,2.75in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}`	`\rput(0,2.75in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}`
`\rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}`	`\rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}`
Line 393...	Line 518...
`\rput[l](1.35in,-1.675in){\tt o\_data}`	`\rput[l](1.35in,-1.675in){\tt o\_data}`
`\end{pspicture}`	`\end{pspicture}`
`\caption{A Single FFT Stage, with Butterfly}\label{fig:fftstage}`	`\caption{A Single FFT Stage, with Butterfly}\label{fig:fftstage}`
`\end{center}\end{figure}`	`\end{center}\end{figure}`
`These FFT stages are really no different than any other decimation in`	`These FFT stages are really no different than any other decimation in`
`frequency FFT, save only that the coefficients are alternated between the`	`frequency radix-2 FFT implementation, save only that for the two-sample per`
`two stages. That is, the even stages get all the even coefficients, and`	`clock FFT the coefficients are alternated between the two inputs`
`the odd stages get all of the odd coefficients.`	`shown in Fig.~\ref{fig:white-box-dbl}. That is, the even stages would get all`
`Internally, each stage spends the first $N/4$ clocks storing its inputs`	`the even coefficients, and the odd stages get all of the odd coefficients.`
`into memory, and then the next $N/4$ clocks pairing a stored input with`	`For the more general purpose FFT, there's only the one pipeline, so every`
`a single external input, so that both values become inputs to the butterfly.`	`sample goes through every butterfly.`
`Likewise, the butterfly coefficient is read from a small ROM table.`	`Internally, each stage spends the first $N/2$ clocks storing its inputs`
	`into memory, and then the next $N/2$ clocks pairing a stored input with`
	`a single (fresh) external input, so that both values become inputs to the`
	`butterfly. Likewise, the butterfly coefficient is read from a small ROM table.`

`One trick to making the FFT stage work successfully is synchronization. Since`	`One trick to making the FFT stage work successfully is synchronization. Since`
`the shift and add multiplies create a delay of (roughly) one clock cycle per`	`the shift and add multiplies create a delay of (roughly) one clock cycle per`
`bit of input, there is a significant pipeline delay from the input to the`	`two bits of input, there is a significant pipeline delay from the input to the`
`output of the butterfly routine. To match this delay, the FFT stage places a`	`output of the butterfly routine. To match this delay, the FFT stage places a`
`synchronization pulse into the butterfly. When this synchronization pulse`	`synchronization pulse into the butterfly. When this synchronization pulse`
`comes out of the butterfly, the values of the butterfly then match the`	`comes out of the butterfly, the values of the butterfly then match the`
`first sample out of the stage. The next synchronization problem comes from`	`first sample out of the stage. The next synchronization problem comes from`
`the fact that the butterflies operate on two samples at a time, whereas the`	`the fact that the butterflies operate on two samples at a time, whereas the`
`FFT stage operates on a single sample at a time. This means that half the`	`FFT stage operates on a single sample at a time. This means that half the`
`time the butterfly output will be invalid. To keep things aligned, and to`	`time the butterfly output will be invalid. To keep things aligned, and to`
`avoid the invalid data half, a counter is started by the synchronization pulse`	`avoid the invalid data half, a counter is started by the synchronization pulse`
`coming out of the butterfly in order to keep track. Using this counter and`	`coming out of the butterfly in order to keep track. Using this counter and`
`once the butterfly produces the first sync pulse, the next $N/4$ clock cycles`	`once the butterfly produces the first sync pulse, the next $N/2$ clock cycles`
`will produce valid butterfly outputs. For these clock cycles, the left or`	`will produce valid butterfly outputs. For these clock cycles, the left or`
`first output is sent immediately to the next FFT stage, whereas the right`	`first output is sent immediately to the next FFT stage, whereas the right`
`or second output is saved into memory. Once these cycles are complete, the`	`or second output is saved into memory. Once these cycles are complete, the`
`butterfly outputs will be invalid for the next $N/4$ clock cycles. During`	`butterfly outputs will be invalid for the next $N/2$ clock cycles. During`
`these invalid clock cycles, the FFT stage outputs data that had been stored`	`these invalid clock cycles, the FFT stage outputs data that had been stored`
`in memory. In this fashion, data is always valid coming out of each FFT`	`in memory. In this fashion, data is always valid coming out of each FFT`
`stage once the initial synchronization pulse goes high.`	`stage once the initial synchronization pulse goes high.`
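
The store-then-pair schedule described above can be checked with a small behavioral model. This is a sketch only, not the generated Verilog: it uses floating point rather than fixed width integers, ignores the pipeline delay and the {\tt i\_ce} and sync handling, and computes every twiddle with a multiply, whereas the core builds its last two stages from shifts and adds. The function names are hypothetical. It models the single-pipeline (one sample per clock) arrangement, where each $N$ point stage stores $N/2$ inputs and then pairs them with the next $N/2$.

```python
import cmath


def dif_stage(block):
    """Model one radix-2 decimation in frequency stage on one block."""
    half = len(block) // 2
    stored = block[:half]  # first N/2 clocks: inputs are written to memory
    left, right = [], []
    for i in range(half):  # next N/2 clocks: pair stored with fresh input
        a, b = stored[i], block[half + i]
        w = cmath.exp(-2j * cmath.pi * i / len(block))  # twiddle (ROM table)
        left.append(a + b)          # sent immediately to the next stage
        right.append((a - b) * w)   # saved to memory, sent N/2 clocks later
    return left + right


def pipelined_fft(samples):
    """Chain stages of size N, N/2, ..., 2. Output is in bit reversed order."""
    data = list(samples)
    size = len(data)
    while size >= 2:
        out = []
        for start in range(0, len(data), size):
            out.extend(dif_stage(data[start:start + size]))
        data = out
        size //= 2
    return data


def bit_reverse(values):
    """Undo the bit reversed ordering (the step that the -s option skips)."""
    n = len(values)
    bits = n.bit_length() - 1
    out = [0] * n
    for i, v in enumerate(values):
        out[int(format(i, "0{}b".format(bits))[::-1], 2)] = v
    return out
```

Calling {\tt bit\_reverse(pipelined\_fft(x))} on a power-of-two length list should agree, to within floating point error, with the forward transform given in the Generation chapter.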

`The complex multiply itself, formed internal to the butterfly routine, is`	`The complex multiply itself, formed internal to the butterfly routine, is`
Line 448...	Line 576...
`\item Set the {\tt i\_rst} line high for at least one clock cycle`	`\item Set the {\tt i\_rst} line high for at least one clock cycle`
`before you intend to use the core.`	`before you intend to use the core.`
`\item From the time of reset until the first sample pair is available`	`\item From the time of reset until the first sample pair is available`
`on the IO ports, {\tt i\_rst} may be kept low, but the clock`	`on the IO ports, {\tt i\_rst} may be kept low, but the clock`
`enable line {\tt i\_ce} must also be kept low.`	`enable line {\tt i\_ce} must also be kept low.`
`\item On the clock containing the first sample pair, {\tt i\_left}`
`and {\tt i\_right}, set {\tt i\_ce} high.`	`\item On the clock containing the first sample, {\tt i\_sample}, or the`
`\item Ever after, any time a valid pair of samples is available to`	`first sample pair, {\tt i\_left} and {\tt i\_right} for the`
`the input of the FFT, place the first sample of the pair`	`two sample-per-clock FFT, set {\tt i\_ce} high.`
`on the {\tt i\_left} line, the second on the {\tt i\_right}`
`line, and set {\tt i\_ce} high.`	`If you have elected an FFT that multiplexes its multiplies,`
	`and so can only handle one sample every two or three clocks,`
	`then you'll need to guarantee {\tt i\_ce} remains low for`
	`one or two clocks respectively before raising it again.`

	`\item Ever after, any time a valid sample is placed in {\tt i\_sample}`
	`and {\tt i\_ce} raised high, a sample will enter into the`
	`FFT for processing. For the two sample-per-clock FFT, the`
	`input will be into {\tt i\_left} (for the first input)`
	`and {\tt i\_right} (for the next one).`

`\item At the first valid output, the FFT core will set {\tt o\_sync}`	`\item At the first valid output, the FFT core will set {\tt o\_sync}`
`line high in addition to the output values {\tt o\_left}`	`line high in addition to placing the output value into`
`(the first of two), and {\tt o\_right} (the second of the two).`	`{\tt o\_result}. For the two-sample per clock FFT, the outputs`
	`will be placed into {\tt o\_left}`
	`(the first of two), and {\tt o\_right} (the second of the two)`
	`respectively.`

`\item Ever after, whenever {\tt i\_ce} is high, the FFT core will clock`	`\item Ever after, whenever {\tt i\_ce} is high, the FFT core will clock`
`two samples in and two samples out. On any valid first`	`two samples in and two samples out. On any valid first`
`pair of samples coming out of the transform,`	`pair of samples coming out of the transform,`
`{\tt o\_sync} will be high. Otherwise {\tt o\_sync} will`	`{\tt o\_sync} will be high. Otherwise {\tt o\_sync} will`
`remain low.`	`remain low.`
Line 496...	Line 638...
`\begin{portlist}`	`\begin{portlist}`
`i\_clk & 1 & Input & The global clock driving the FFT. \\\hline`	`i\_clk & 1 & Input & The global clock driving the FFT. \\\hline`
`i\_rst & 1 & Input & An active high synchronous reset.\\\hline`	`i\_rst & 1 & Input & An active high synchronous reset.\\\hline`
`i\_ce & 1 & Input & Clock Enable. Set this high to clock data in and`	`i\_ce & 1 & Input & Clock Enable. Set this high to clock data in and`
`out.\\\hline`	`out.\\\hline`
`i\_left & $2N_i$ & Input & The first of two input complex input samples. Bits`	`i\_sample & $2N_i$ & Input & The complex input sample. Bits`
`[$\left(2N_i-1\right)$:$N_i$] of this value are the real`	`[$\left(2N_i-1\right)$:$N_i$] of this value are the real`
`portion, whereas bits [$\left(N_i-1\right)$:0] represent the`	`portion, whereas bits [$\left(N_i-1\right)$:0] represent the`
`imaginary portion. Both portions are in signed twos complement`	`imaginary portion. Both portions are in signed twos complement`
`integer format. The number of bits, $N_i$, is configurable.`	`integer format. The number of bits, $N_i$, is configurable.`
`\\\hline`	`\\\hline`
`i\_right & $2N_i$ & Input & The second of two input complex input samples.`
`The format is the same as {\tt i\_left} above.\\\hline`	`i\_left & $2N_i$ & Input & When the core is configured for two-samples per`
`o\_left & $2N_o$ & Output & The first of two input complex output samples.`	`clock,`
`The format is the same, save only that $N_o$ bits are`	`this is the first of the two data inputs presented to the core`
`used for each twos complement portion instead of $N_i$.\\\hline`	`on any given clock. It has the same format as {\tt i\_sample}`
`o\_right & $2N_o$ & Output & The second of two input complex output samples.`	`above.`
`The format is the same as for {\tt o\_left} above.\\\hline`	`\\\hline`
`o\_sync & 1 & Output & Signals the first output sample pair of any transform,`	`i\_right & $2N_i$ & Input & The second of two complex input samples,`
`zero otherwise.`	`used when the core is configured for two-samples per clock.`
	`The format is the same as {\tt i\_sample} above.\\\hline`
	`o\_sync & 1 & Output & Signals the first output sample of any transform.`
	`It will be zero from the time of the reset until the first`
	`output sample. Ever afterwards, it will be true any time`
	`bin zero of the FFT is on the output.\\\hline`
	`o\_result & $2N_o$ & Output & The complex output sample. The format is the`
	`same, save only that $N_o$ bits are used for each twos`
	`complement portion instead of $N_i$. Hence bits`
	`[$\left(2N_o-1\right)$:$N_o$] of this value are the real`
	`portion, whereas bits [$\left(N_o-1\right)$:0] represent the`
	`imaginary portion.`
	`\\\hline`
	`o\_left & $2N_o$ & Output & When in the two-sample per clock configuration,`
	`this is the first of two complex output samples.\\\hline`
	`o\_right & $2N_o$ & Output & When in the two-sample per clock configuration,`
	`this is the second of two complex output samples. \\\hline`
`\\\hline`	`\\\hline`
`\end{portlist}`	`\end{portlist}`
`\caption{List of IO ports}\label{tbl:ioports}`	`\caption{List of IO ports}\label{tbl:ioports}`
`\end{center}\end{table}`	`\end{center}\end{table}`
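As an illustration of the sample format listed in Tbl.~\ref{tbl:ioports}, the following Python sketch packs a signed real and imaginary pair into a single $2N_i$ bit word, with the real portion in the upper $N_i$ bits and the imaginary portion in the lower $N_i$ bits, and then unpacks it again. The function names {\tt pack\_sample} and {\tt unpack\_sample} are hypothetical helpers written for this example only. They are not part of the core or of {\tt fftgen}.

```python
def pack_sample(re, im, nbits):
    """Pack two signed integers into one 2*nbits-wide word.

    The real part goes in bits [2*nbits-1 : nbits], the imaginary part in
    bits [nbits-1 : 0], both as twos complement values.
    """
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    if not (lo <= re <= hi and lo <= im <= hi):
        raise ValueError("value does not fit in nbits signed bits")
    mask = (1 << nbits) - 1
    return ((re & mask) << nbits) | (im & mask)


def unpack_sample(word, nbits):
    """Split a 2*nbits-wide word back into signed (real, imaginary)."""
    mask = (1 << nbits) - 1

    def to_signed(value):
        # Undo the twos complement wrap when the sign bit is set.
        return value - (1 << nbits) if value & (1 << (nbits - 1)) else value

    return to_signed((word >> nbits) & mask), to_signed(word & mask)
```

The same unpacking applies to the output ports if {\tt nbits} is set to $N_o$ instead of $N_i$. A test bench might use helpers like these to build input words and to decode results.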
`% Appendices`	`% Appendices`

Line 1...

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%

%% Filename:    doc/src/spec.tex

%%

%% Project:     A General Purpose Pipelined FFT Implementation

%%

%% Purpose:     This file contains the LaTeX instructions necessary to build

%%              the doc/spec.pdf file.  It's not nearly as interesting as the

%%      doc/spec.pdf file itself, so I would recommend you read that file

%%      before looking for items within here.

%%

%% Creator:     Dan Gisselquist, Ph.D.

%%              Gisselquist Technology, LLC

%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%

%% Copyright (C) 2018, Gisselquist Technology, LLC

%%

%% This file is part of the general purpose pipelined FFT project.

%%

%% The pipelined FFT project is free software (firmware): you can redistribute

%% it and/or modify it under the terms of the GNU Lesser General Public License

%% as published by the Free Software Foundation, either version 3 of the

%% License, or (at your option) any later version.

%%

%% The pipelined FFT project is distributed in the hope that it will be useful,

%% but WITHOUT ANY WARRANTY; without even the implied warranty of

%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser

%% General Public License for more details.

%%

%% You should have received a copy of the GNU Lesser General Public License

%% along with this program.  (It's in the $(ROOT)/doc directory.  Run make

%% with no target there if the PDF file isn't present.)  If not, see

%% <http://www.gnu.org/licenses/> for a copy.

%%

%% License:     LGPL, v3, as defined and found on www.gnu.org,

%%              http://www.gnu.org/licenses/lgpl.html

%%

%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%

%%

\documentclass{gqtekspec}

\documentclass{gqtekspec}

\project{Double Clocked FFT}

\project{Pipelined FFT}

\title{Specification}

\title{Specification}

\author{Dan Gisselquist, Ph.D.}

\author{Dan Gisselquist, Ph.D.}

\email{dgisselq (at) opencores.org}

\email{dgisselq (at) opencores.org}

\revision{Rev.~0.2}

\revision{Rev.~0.3}

\begin{document}

\begin{document}

\pagestyle{gqtekspecplain}

\pagestyle{gqtekspecplain}

\titlepage

\titlepage

\begin{license}

\begin{license}

Copyright (C) \theyear\today, Gisselquist Technology, LLC

Copyright (C) \theyear\today, Gisselquist Technology, LLC

This project is free software (firmware): you can redistribute it and/or

This file is part of the general purpose pipelined FFT project.

modify it under the terms of  the GNU General Public License as published

by the Free Software Foundation, either version 3 of the License, or (at

your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT

ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or

FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License

for more details.

You should have received a copy of the GNU General Public License along

The pipelined FFT project is free software (firmware): you can redistribute

with this program.  If not, see \texttt{http://www.gnu.org/licenses/} for a copy.

it and/or modify it under the terms of the GNU Lesser General Public License

as published by the Free Software Foundation, either version 3 of the

License, or (at your option) any later version.

The pipelined FFT project is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser

General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along

with this project.  If not, see \texttt{http://www.gnu.org/licenses/} for a

copy.

\end{license}

\end{license}

\begin{revisionhistory}

\begin{revisionhistory}

0.3 & 6/2/2015 & Gisselquist & General purpose pipelined FFT generator\\\hline

0.2 & 6/2/2015 & Gisselquist & Superficial formatting changes\\\hline

0.2 & 6/2/2015 & Gisselquist & Superficial formatting changes\\\hline

0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline

0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline

\end{revisionhistory}

\end{revisionhistory}

% Revision History

% Revision History

% Table of Contents, named Contents

% Table of Contents, named Contents

\tableofcontents

\tableofcontents

\listoffigures

\listoffigures

\listoftables

\listoftables

\begin{preface}

\begin{preface}

This FFT comes from my attempts to design and implement a signal processing

This FFT came about originally from my attempts to design and implement a

GPS decorrelation

algorithm inside a generic FPGA, but only on a limited budget.  As such,

algorithm inside a generic FPGA, but only on a limited budget.  As such,

I don't yet have the FPGA board I wish to place this algorithm onto, neither

it was built before I had the board that could use it.  Because I was trying

do I have any expensive modeling or simulation capabilities.  I'm using

to hit a very high processing rate, the FFT core was originally built to handle

Verilator for my modeling and simulation needs.  This makes

two samples at a time.  Hence the original name, the ``Double clocked FFT

using a vendor supplied IP core, such as an FFT, difficult if not impossible

core''.

to use.

One of the difficulties of doing all of your development using Verilator as

My problem was made worse when I learned that the published maximum clock

a simulator, is that you can only simulate components that you have the

speed for a device wasn't necessarily the maximum clock speed that I could

Verilog source for.  My desire to use Verilator kept me from using any of

achieve.  My design needed to process the incoming signal at 500~MHz to be

the vendor supplied FFTs out there.

commercially viable.  500~MHz is not necessarily a clock speed

that can be easily achieved.  250~MHz, on the other hand, is much more within

the realm of possibility.  Achieving a 500~MHz performance with a 250~MHz

clock, however, requires an FFT that accepts two samples per clock.

This, then, was and is the genesis of this project.

This, then, was and is the genesis of this project.

Since this genesis, I've used this core as part of several other designs and

maintained it.  Eventually it has morphed into the general purpose FFT core

generator that it is today.

\end{preface}

\end{preface}

\chapter{Introduction}

\chapter{Introduction}

\pagenumbering{arabic}

\pagenumbering{arabic}

\setcounter{page}{1}

\setcounter{page}{1}

The Double Clocked FFT project contains all of the software necessary to

The General Purpose Pipelined FFT generator project contains all of the

create the IP to generate an arbitrary sized FFT that will clock two samples

software necessary to create an arbitrary sized FFT HDL core that will

in at each clock cycle, and after some pipeline delay it will clock two

accept up to two samples per clock cycle, and after some pipeline delay

samples out at every clock cycle.

it will output FFT results at the same rate they are input.

The FFT generated by this approach is very configurable.  By simple adjustment

The FFT generated by this approach is very configurable.  By simple adjustment

of a command line parameter, the FFT may be made to be a forward FFT or an

of a command line parameter, the FFT may be made to be a forward FFT or an

inverse FFT.  The number of bits processed, kept, and maintained by this

inverse FFT.  The number of bits processed, kept, and maintained by this

FFT are also configurable.  Even the number of bits used for the twiddle

FFT are also configurable.  Even the number of bits used for the twiddle

factors, or whether or not to bit reverse the outputs, are all configurable

factors, or whether or not to bit reverse the outputs, are all configurable

parts to this FFT core.

parts to this FFT core.  Finally, the FFT can be configured to process two

samples per clock, one sample per clock, one sample every other clock, or even

one sample every third clock (or less).

These features make the Double Clocked FFT very different and unique among the

These features make this general purpose pipelined FFT generator very

other cores available on opencores.org.

different and unique among the other cores available on opencores.org.

For those who wish to get started right away, please download the package,

For those who wish to get started right away, please download the package,

change into the {\tt sw} directory and run {\tt make}.  There is no need to

change into the {\tt sw} directory and run {\tt make}.  There is no need to

run a configure script, {\tt fftgen} is completely portable C++.  Then, once

run a configure script, {\tt fftgen} is completely portable C++.  (While I

built, go ahead and run {\tt fftgen} without any arguments.  This will cause

do my development on Ubuntu, I am told by others that the core builds on

{\tt fftgen} to print a usage statement to the screen.  Review the usage

Microsoft systems as well.)  Then, once built, go ahead and run {\tt fftgen}

statement, and run {\tt fftgen} a second time with the arguments you need.

without any arguments.  This will cause {\tt fftgen} to print a usage

statement to the screen.  Review the usage statement, and run {\tt fftgen}

a second time with the arguments you need.

\chapter{Generation}

\chapter{Generation}

Creating a double clocked FFT core is as simple as running the program

Creating an FFT core is as simple as running the program

{\tt fftgen}.  The program will then create a series of Verilog files, as

{\tt fftgen}.  The program will then create a series of Verilog files, as

well as {\tt .hex} files suitable for use with a \textdollar readmemh, and

well as {\tt .hex} files suitable for use with a \textdollar readmemh, and

place them into an {\tt ./fft-core/} directory that {\tt fftgen} will create.

place them into an output directory, {\tt ./fft-core/} by default, that

Creating the core you want takes a touch of configuring.

{\tt fftgen} will create.  Creating the core you want takes a touch of

Therefore, the following lists the arguments that can be given to

configuring.  Therefore, the following lists the arguments that can be given to

{\tt fftgen} to adjust the core that it builds:

{\tt fftgen} to adjust the core that it builds:

\begin{itemize}

\begin{itemize}

\item[\hbox{-f size}]

\item[\hbox{-f size}]

        This specifies the size of the FFT core that {\tt fftgen} will build.

        This specifies the size of the FFT core that {\tt fftgen} will build.

        The size must be a power of two.  The transform is given, within a

        The size must be a power of two.

        scale factor, to,

        Given an input $x\left[n\right]$, the FFT will calculate,

        \begin{eqnarray*}

        \begin{eqnarray*}

        X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right]

        X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right]

                e^{-j2\pi \frac{k}{N}n}

                e^{-j2\pi \frac{k}{N}n}

        \end{eqnarray*}

        \end{eqnarray*}

        to within a scale factor.

\item[\hbox{-1}]

\item[\hbox{-i}]

        This specifies that the FFT will be an inverse FFT.  Specifically,

        This specifies that the FFT will be an inverse FFT.  Specifically,

        it will calculate,

        it will calculate,

        \begin{eqnarray*}

        \begin{eqnarray*}

        x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n}

        x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n}

        \end{eqnarray*}

        \end{eqnarray*}

\item[\hbox{-0}]

        This specifies building a forward FFT.  However, since this is the

        If no {\tt -i} option is given, the core will by default generate a

        default, this option is never necessary.

        forward FFT.

\item[\hbox{-2}]

        Builds an FFT that can ingest and output two samples per clock.

        This option requires six multiplies for all but the last two butterfly

        stages.  The last two butterfly stages are accomplished using shifts

        and adds only, so they require no multiplies.

\item[\hbox{-k 1}]

        Builds an FFT that can ingest and output one sample per clock.

        This option is incompatible with {\tt -2}.

        This option requires three multiplies for all but the last two

        butterfly stages.

\item[\hbox{-k 2}]

        Builds an FFT that can ingest and output one sample every other clock.

        This option is incompatible with {\tt -2}, and will override {\tt -k 1}.

        You are responsible for making sure that {\tt i\_ce} will never be

        true for two clocks in a row.

        Unlike {\tt -k 1}, this option only requires two multiplies for all

        but the last two butterfly stages.

\item[\hbox{-k 3}]

        Builds an FFT that can ingest and output one sample every third clock.

        This option is incompatible with {\tt -2}, and will override any

        other {\tt -k} option.

        For this to work, you will need to guarantee that {\tt i\_ce} will

        never be true more than one time in any three clock periods.

        Unlike {\tt -k 1} and {\tt -k 2}, this option only requires one

        multiply for all but the last two butterfly stages.

\item[\hbox{-s}]

\item[\hbox{-s}]

        This causes the core to skip the final bit reversal stage.  The

        This causes the core to skip the final bit reversal stage.  The

        outputs of the FFT will then come out in bit reversed order.

        outputs of the FFT will then come out in bit reversed order.

        This option is useful in those cases where someone wishes to

        This option is useful in those cases where someone wishes to

Line 118...

Line 206...

        skip their bit reversals.

        skip their bit reversals.

        Be aware, however, doing this requires the bit reversed forward

        Be aware, however, doing this requires the bit reversed forward

        transform be followed by a bitreversed decimation in time approach

        transform be followed by a bitreversed decimation in time approach

        to the inverse transform.  This software does not (yet) provide that

        to the inverse transform.  This software does not (yet) provide that

        capability.  As such, the utility just isn't there yet.

        capability.  As such, this capability is really just a placeholder for

\item[\hbox{-S}]

        a future capability.

        Include the final bit reversal stage.  As this is also the default,

        specifying the option should not be necessary.

\item[\hbox{-d DIR}]

\item[\hbox{-d DIR}]

        Specifies the DIRectory to place the produced Verilog files.  By

        Specifies the DIRectory to place the produced Verilog files.  By

        default, this will be in the `./fft-core/' directory, but it can

        default, this will be in the `./fft-core/' directory, but it can

        be moved to any other directory as necessary.

        be moved to any other directory as necessary.

\item[\hbox{-n bits}] Sets the number of input bits per sample.  Given this

\item[\hbox{-n bits}] Sets the number of input bits per sample.  Given this

Line 136...

Line 222...

\item[\hbox{-m bits}] This sets the maximum bit width of the output.

\item[\hbox{-m bits}] This sets the maximum bit width of the output.

        By default, the FFT will gain bits as they accumulate within

        By default, the FFT will gain bits as they accumulate within

        the FFT.  Bits are accumulated at roughly one bit for every two stages.

        the FFT.  Bits are accumulated at roughly one bit for every two stages.

        However, if this value is set, bits are only accumulated up to this

        However, if this value is set, bits are only accumulated up to this

        maximum width.  After this width, further accumulations are truncated.

        maximum width.  After this width, further accumulations are truncated.

\item[\hbox{-c bits}] The number of bits in each twiddle coefficient is given

\item[\hbox{-c bits}] Specifies the number of extra bits to be given to each

        by the number of bits input to that stage plus this extra number of

        twiddle factor.  The size of the twiddle factors is nominally the size

        bits per coefficient.  By increasing the number of bits per coefficient

        of the input data.  By specifying {\tt -c <bits>}, you can extend

        above that of the input samples, truncation error is kept to the

        this default value to avoid any loss in precision.

        original error found within the original samples.

\item[\hbox{-x bits}]  Maintains {\tt bits} extra bits during the computation

\item[\hbox{-x bits}] Internally accumulated roundoff error can be a difficult

        to deal with roundoff error.  Hence, after the first stage, there will

        be this many excess bits within the FFT pipeline.

        Internally accumulated roundoff error can be a difficult

        problem to solve.  By using this option, you guarantee that the FFT

        problem to solve.  By using this option, you guarantee that the FFT

        runs with an additional {\tt bits} bits, and only truncates down to

        runs with an additional {\tt bits} bits, and only truncates down to

        the necessary width at the end in order to minimize rounding

        the necessary width at the end in order to minimize rounding

        errors along the way.

        errors along the way.

\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT

\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT

        will consume.  By default, the FFT does not use any hardware multiplies.

        will consume.  By default, the FFT does not use any hardware multiplies.

        However, this can be expensive on the rest of the logic used by the

        However, this can be expensive on the rest of the logic used by the

        device.  You can avoid this problem by allowing the FFT to use

        device.  You can avoid this problem by allowing the FFT to use

        hardware multiplies using this option.  By default, the multiplies will

        hardware multiplies using this option.  By default, the multiplies will

Line 158...

Line 248...

\end{itemize}

\end{itemize}
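The phrase ``to within a scale factor'' used for the {\tt -f} and {\tt -i} options can be made concrete with a small numerical sketch. Neither sum above carries a $1/N$ term, so applying the forward sum followed by the inverse sum returns $N x\left[n\right]$ rather than $x\left[n\right]$. The Python below evaluates the two sums directly in floating point. It is only an illustration of the equations: the function names are hypothetical, and it does not model the fixed point rounding, truncation, or bit growth of the generated core.

```python
import cmath


def forward_dft(x):
    """X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N), with no scaling."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
            for k in range(n)]


def inverse_dft_unscaled(spectrum):
    """x[n] = sum over k of X[k] * exp(+j*2*pi*k*n/N), with no 1/N term."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * i / n)
                for k in range(n))
            for i in range(n)]
```

Any design that chains a forward and an inverse transform therefore needs to account for this gain of $N$, in addition to whatever scaling the chosen bit widths introduce.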

\chapter{Architecture}

\chapter{Architecture}

As a component of another system the structure of this system is a simple

As a component of another system the structure of this system is a simple

black box such as the one shown in Fig.~\ref{fig:black-box}.

black box such as the one shown in Fig.~\ref{fig:black-box-one}

\begin{figure}\begin{center}

\begin{pspicture}(-2.2in,0.3in)(2.2in,2in)

% \rput(0,0){\psframe(-2.2in,0.3in)(2.2in,2in)}

\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0.3in)(0.75in,2in)}

        \rput(0,1in){(I)FFT Core}

        \rput[r](-1.6in,1.8in){\tt i\_clk}

                \rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}

        \rput[r](-1.6in,1.5in){\tt i\_rst}

                \rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}

        \rput[r](-1.6in,1.2in){\tt i\_ce}

                \rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}

        \rput[r](-1.6in,0.6in){\tt i\_sample}

                \rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}

                \rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}

                \rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}

        \rput[l](1.6in,1.2in){\tt o\_sync}

                \rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}

        \rput[l](1.6in,0.6in){\tt o\_result}

                \rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}

                \rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}

                \rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}

\end{pspicture}

\caption{(I)FFT Black Box Diagram}\label{fig:black-box-one}

\end{center}\end{figure}

for the traditional one sample per clock FFT implementation, or

Fig.~\ref{fig:black-box-dbl}

\begin{figure}\begin{center}

\begin{figure}\begin{center}

\begin{pspicture}(-2.1in,0)(2.1in,2in)

\begin{pspicture}(-2.1in,0)(2.1in,2in)

% \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)}

% \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)}

\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)}

\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)}

        \rput(0,1in){(I)FFT Core}

        \rput(0,1in){(I)FFT Core}

Line 191...

Line 309...

                \rput(0.8in,0.3in){\psline{->}(0,0)(0.7in,0)}

                \rput(0.8in,0.3in){\psline{->}(0,0)(0.7in,0)}

                \rput(1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}

                \rput(1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}

                \rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}}

                \rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}}

\end{pspicture}

\end{pspicture}

\caption{(I)FFT Black Box Diagram}\label{fig:black-box}

\caption{Two sample per clock (I)FFT Black Box Diagram}\label{fig:black-box-dbl}

\end{center}\end{figure}

\end{center}\end{figure}

for the two samples per clock FFT.

The interface

The interface

is simple: strobe the reset line, and every clock thereafter set the clock

is simple: strobe the reset line, and every clock thereafter set the clock

enable line when data is valid on the left and right input ports.  Likewise

enable line when data is valid on the input port.  Likewise

for the outputs, when the {\tt o\_sync} line goes high the first data sample

for the outputs, when the {\tt o\_sync} line goes high the first data sample

is available.  Ever after that, one data sample will be available every clock

is available.  Ever after that, one data sample will be available every clock

cycle that the {\tt i\_ce} line is high.

cycle that the {\tt i\_ce} line is high.

Internal to the FFT, things are a touch more complex.  Fig.~\ref{fig:white-box}

Internal to the FFT, things are a touch more complex.

Fig.~\ref{fig:white-box-dbl}

\begin{figure}\begin{center}

\begin{figure}\begin{center}

\begin{pspicture}(1.3in,-0.5in)(4.7in,5in)

\begin{pspicture}(1.3in,-0.5in)(4.7in,5in)

        % \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)}

        % \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)}

        \rput(0,0){\psframe[linewidth=2\pslinewidth](1.3in,-0.25in)(4.7in,5in)}

        \rput(0,0){\psframe[linewidth=2\pslinewidth](1.3in,-0.25in)(4.7in,5in)}

        \rput(0,5in){%

        \rput(0,5in){%

Line 311...

Line 434...

                        \rput[r](0.15in,-0.125in){\tiny\tt o\_left}

                        \rput[r](0.15in,-0.125in){\tiny\tt o\_left}

                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}

                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}

                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}

                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}

                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}

                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}

\end{pspicture}

\end{pspicture}

\caption{Internal FFT Structure}\label{fig:white-box}

\caption{Internal FFT Structure}\label{fig:white-box-dbl}

\end{center}\end{figure}

\end{center}\end{figure}

attempts to show some of this structure.  As you can see from the figure, the

attempts to show some of this structure for the two-sample per clock FFT.

FFT itself is composed of a series of stages.  These stages are split from the

As you can see from the figure, the

beginning into an even stage and an odd stage.  Further, they are numbered

FFT itself is composed of a series of stages.  For the two-sample per clock

FFT, these stages are split from the beginning into an even stage and an odd

stage.  Further, they are numbered

according to the size of the FFT they represent.  Therefore the first stage

according to the size of the FFT they represent.  Therefore the first stage

is numbered $N$ and represents the first stage of an $N$ point FFT.  The

is numbered $N$ and represents the first stage of an $N$ point FFT.  The

second stage is labeled $N/2$, then $N/$, and so on down to $N=8$.  The

second stage is labeled $N/2$, then $N/4$, and so on down to $N=8$.  The

four sample stage and the two sample stages are different, however.  These

four sample stage and the two sample stages are different, however.  These

two stages, representing three blocks on Fig.~\ref{fig:white-box}, can be

two stages, representing three blocks on Fig.~\ref{fig:white-box-dbl}, can be

accomplished without any multiplies.  Therefore they have been accomplished

accomplished without any multiplies.  Therefore they have been accomplished

separately.  Likewise all of the stages, save the double stage at the bottom,

separately.  Likewise all of the stages, save the double stage at the bottom,

operate on one data sample per clock.  Only the last stage, prior to the

operate on one data sample per clock.  Only the last stage, prior to the

Line 334...

Line 459...

as shown in Fig.~\ref{fig:fftstage}.

as shown in Fig.~\ref{fig:fftstage}.

\begin{figure}\begin{center}

\begin{figure}\begin{center}

\begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in)

\begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in)

        % \rput(0,0){\psframe(0in,-2in)(3in,4.25in)}

        % \rput(0,0){\psframe(0in,-2in)(3in,4.25in)}

        \rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)}

        \rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)}

        \rput[r](1.625in,4.125in){\tt i\_data}

        \rput[r](1.625in,4.125in){\tt i\_sample}

        \rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)%

        \rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)%

                        \psline{->}(0,0)(-0.2in,-0.25in)%

                        \psline{->}(0,0)(-0.2in,-0.25in)%

                        \psarc{->}{0.15in}{200}{340}}

                        \psarc{->}{0.15in}{200}{340}}

        \rput(0,2.75in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}

        \rput(0,2.75in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}

                        \rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}

                        \rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}

Line 393...

Line 518...

        \rput[l](1.35in,-1.675in){\tt o\_data}

        \rput[l](1.35in,-1.675in){\tt o\_data}

\end{pspicture}

\end{pspicture}

\caption{A Single FFT Stage, with Butterfly}\label{fig:fftstage}

\caption{A Single FFT Stage, with Butterfly}\label{fig:fftstage}

\end{center}\end{figure}

\end{center}\end{figure}

These FFT stages are really no different than any other decimation in

These FFT stages are really no different than any other decimation in

frequency FFT, save only that the coefficients are alternated between the

frequency radix-2 FFT implementation, save only that for the two-sample per

two stages.  That is, the even stages get all the even coefficients, and

clock FFT the coefficients are alternated between the two inputs

the odd stages get all of the odd coefficients.

shown in Fig.~\ref{fig:white-box-dbl}.  That is, the even stages would get all

Internally, each stage spends the first $N/4$ clocks storing its inputs

the even coefficients, and the odd stages get all of the odd coefficients.

into memory, and then the next $N/4$ clocks pairing a stored input with

For the more general purpose FFT, there's only the one pipeline, so every

a single external input, so that both values become inputs to the butterfly.

sample goes through every butterfly.

Likewise, the butterfly coefficient is read from a small ROM table.

Internally, each stage spends the first $N/2$ clocks storing its inputs

into memory, and then the next $N/2$ clocks pairing a stored input with

a single (fresh) external input, so that both values become inputs to the

butterfly.  Likewise, the butterfly coefficient is read from a small ROM table.

One trick to making the FFT stage work successfully is synchronization.  Since

One trick to making the FFT stage work successfully is synchronization.  Since

the shift and add multiplies create a delay of (roughly) one clock cycle per

the shift and add multiplies create a delay of (roughly) one clock cycle per

bit of input, there is a significant pipeline delay from the input to the

two bits of input, there is a significant pipeline delay from the input to the

output of the butterfly routine.  To match this delay, the FFT stage places a

output of the butterfly routine.  To match this delay, the FFT stage places a

synchronization pulse into the butterfly.  When this synchronization pulse

synchronization pulse into the butterfly.  When this synchronization pulse

comes out of the butterfly, the values of the butterfly then match the

comes out of the butterfly, the values of the butterfly then match the

first sample out of the stage.  The next synchronization problem comes from

first sample out of the stage.  The next synchronization problem comes from

the fact that the butterflies operate on two samples at a time, whereas the

the fact that the butterflies operate on two samples at a time, whereas the

FFT stage operates on a single sample at a time.  This means that half the

FFT stage operates on a single sample at a time.  This means that half the

time the butterfly output will be invalid.  To keep things aligned, and to

time the butterfly output will be invalid.  To keep things aligned, and to

avoid the invalid data half, a counter is started by the synchronization pulse

avoid the invalid data half, a counter is started by the synchronization pulse

coming out of the butterfly in order to keep track.  Using this counter and

coming out of the butterfly in order to keep track.  Using this counter and

once the butterfly produces the first sync pulse, the next $N/4$ clock cycles

once the butterfly produces the first sync pulse, the next $N/2$ clock cycles

will produce valid butterfly outputs.  For these clock cycles, the left or

will produce valid butterfly outputs.  For these clock cycles, the left or

first output is sent immediately to the next FFT stage, whereas the right

first output is sent immediately to the next FFT stage, whereas the right

or second output is saved into memory.  Once these cycles are complete, the

or second output is saved into memory.  Once these cycles are complete, the

butterfly outputs will be invalid for the next $N/4$ clock cycles.  During

butterfly outputs will be invalid for the next $N/2$ clock cycles.  During

these invalid clock cycles, the FFT stage outputs data that had been stored

these invalid clock cycles, the FFT stage outputs data that had been stored

in memory.  In this fashion, data is always valid coming out of each FFT

in memory.  In this fashion, data is always valid coming out of each FFT

stage once the initial synchronization pulse goes high.

stage once the initial synchronization pulse goes high.
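The store, pair, and drain schedule described above can be sketched as a small behavioural model.  This is only an illustration under stated assumptions, not the core's implementation: the function name {\tt fft\_stage} is hypothetical, the butterfly is modelled with zero latency (so no synchronization pulse is needed), and the coefficient is assumed to be $e^{-2\pi j k/N}$ rather than a value read from a ROM table.

```python
import cmath

def fft_stage(samples, n):
    """Model one decimation-in-frequency stage of span n, one sample per clock.

    Assumptions (not taken from the core): zero butterfly latency and
    coefficients computed as exp(-2*pi*j*k/n) instead of read from ROM.
    """
    half = n // 2
    mem_in = [0j] * half    # inputs stored during the first n/2 clocks
    mem_out = [0j] * half   # "right" butterfly outputs waiting in memory
    out = []
    for clk, x in enumerate(samples):
        pos = clk % n
        k = pos % half
        if pos < half:
            # Butterfly output is invalid here: emit data stored in memory
            # (once a full block has been seen) and store the new input.
            if clk >= n:
                out.append(mem_out[k])
            mem_in[k] = x
        else:
            # Pair a stored input with the fresh input.
            a, b = mem_in[k], x
            w = cmath.exp(-2j * cmath.pi * k / n)
            out.append(a + b)            # left output goes out immediately
            mem_out[k] = (a - b) * w     # right output is saved for later
    return out
```

Chaining a span-4 stage into a span-2 stage, with one extra block of zeros to flush the stored outputs, yields a four-point transform whose bins come out in bit-reversed order in this model.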

The complex multiply itself, formed internal to the butterfly routine, is

The complex multiply itself, formed internal to the butterfly routine, is

Line 448...

Line 576...

        \item Set the {\tt i\_rst} line high for at least one clock cycle

        \item Set the {\tt i\_rst} line high for at least one clock cycle

                before you intend to use the core.

                before you intend to use the core.

        \item From the time of reset until the first sample pair is available

        \item From the time of reset until the first sample pair is available

                on the IO ports, {\tt i\_rst} may be kept low, but the clock

                on the IO ports, {\tt i\_rst} may be kept low, but the clock

                enable line {\tt i\_ce} must also be kept low.

                enable line {\tt i\_ce} must also be kept low.

        \item On the clock containing the first sample pair, {\tt i\_left}

                and {\tt i\_right}, set {\tt i\_ce} high.

        \item On the clock containing the first sample, {\tt i\_sample}, or the

        \item Ever after, any time a valid pair of samples is available to

                first sample pair, {\tt i\_left} and {\tt i\_right} for the

                the input of the FFT, place the first sample of the pair

                two-sample-per-clock FFT, set {\tt i\_ce} high.

                on the {\tt i\_left} line, the second on the {\tt i\_right}

                line, and set {\tt i\_ce} high.

                If you have elected an FFT that multiplexes its multiplies,

                and so can only handle one sample every two or three clocks,

                then you'll need to guarantee {\tt i\_ce} remains low for

                one or two clocks respectively before raising it again.

        \item Ever after, any time a valid sample is placed on {\tt i\_sample}

                and {\tt i\_ce} is raised high, that sample will enter the

                FFT for processing.  For the two-sample-per-clock FFT, the

                first sample of each pair goes on {\tt i\_left} and the

                second on {\tt i\_right}.

        \item At the first valid output, the FFT core will set {\tt o\_sync}

        \item At the first valid output, the FFT core will set {\tt o\_sync}

                line high in addition to the output values {\tt o\_left}

                line high in addition to placing the output value into

                (the first of two), and {\tt o\_right} (the second of the two).

                {\tt o\_result}.  For the two-sample per clock FFT, the outputs

                will be placed into {\tt o\_left}

                (the first of two), and {\tt o\_right} (the second of the two)

                respectively.

        \item Ever after, whenever {\tt i\_ce} is high, the FFT core will clock

        \item Ever after, whenever {\tt i\_ce} is high, the FFT core will clock

                two samples in and two samples out.  On any valid first

                two samples in and two samples out.  On any valid first

                pair of samples coming out of the transform,

                pair of samples coming out of the transform,

                {\tt o\_sync} will be high.  Otherwise {\tt o\_sync} will

                {\tt o\_sync} will be high.  Otherwise {\tt o\_sync} will

                remain low.

                remain low.

Line 496...

Line 638...

\begin{portlist}

\begin{portlist}

i\_clk & 1 & Input & The global clock driving the FFT. \\\hline

i\_clk & 1 & Input & The global clock driving the FFT. \\\hline

i\_rst & 1 & Input & An active high synchronous reset.\\\hline

i\_rst & 1 & Input & An active high synchronous reset.\\\hline

i\_ce & 1 & Input & Clock Enable.  Set this high to clock data in and

i\_ce & 1 & Input & Clock Enable.  Set this high to clock data in and

                out.\\\hline

                out.\\\hline

i\_left & $2N_i$ & Input & The first of two input complex input samples.  Bits

i\_sample & $2N_i$ & Input & The complex input sample.  Bits

                [$\left(2N_i-1\right)$:$N_i$] of this value are the real

                [$\left(2N_i-1\right)$:$N_i$] of this value are the real

                portion, whereas bits [$\left(N_i-1\right)$:0] represent the

                portion, whereas bits [$\left(N_i-1\right)$:0] represent the

                imaginary portion.  Both portions are in signed twos complement

                imaginary portion.  Both portions are in signed twos complement

                integer format.  The number of bits, $N_i$, is configurable.

                integer format.  The number of bits, $N_i$, is configurable.

                \\\hline

                \\\hline

i\_right & $2N_i$ & Input & The second of two input complex input samples.

                The format is the same as {\tt i\_left} above.\\\hline

i\_left & $2N_i$ & Input & When the core is configured for two samples per

o\_left & $2N_o$ & Output & The first of two input complex output samples.

                clock,

                The format is the same, save only that $N_o$ bits are

                this is the first of the two data inputs presented to the core

                used for each twos complement portion instead of $N_i$.\\\hline

                on any given clock.  It has the same format as {\tt i\_sample}

o\_right & $2N_o$ & Output & The second of two input complex output samples.

                above.

                The format is the same as for {\tt o\_left} above.\\\hline

                \\\hline

o\_sync & 1 & Output & Signals the first output sample pair of any transform,

i\_right & $2N_i$ & Input & The second of two complex input samples,

                zero otherwise.

                used when the core is configured for two samples per clock.

                The format is the same as {\tt i\_sample} above.\\\hline

o\_sync & 1 & Output & Signals the first output sample of any transform.

                It will be zero from the time of the reset until the first

                output sample.  Ever afterwards, it will be true any time

                bin zero of the FFT is on the output.\\\hline

o\_result & $2N_o$ & Output & The complex output sample.  The format is the

                same as {\tt i\_sample}, save only that $N_o$ bits are used for each twos

                complement portion instead of $N_i$.  Hence bits

                [$\left(2N_o-1\right)$:$N_o$] of this value are the real

                portion, whereas bits [$\left(N_o-1\right)$:0] represent the

                imaginary portion.

                \\\hline

o\_left & $2N_o$ & Output & When in the two-sample per clock configuration,

                this is the first of two complex output samples.\\\hline

o\_right & $2N_o$ & Output & When in the two-sample per clock configuration,

                this is the second of two complex output samples.

                \\\hline

                \\\hline

\end{portlist}

\end{portlist}

\caption{List of IO ports}\label{tbl:ioports}

\caption{List of IO ports}\label{tbl:ioports}

\end{center}\end{table}

\end{center}\end{table}
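As an illustration of the sample format in Tbl.~\ref{tbl:ioports}, the following sketch packs a complex sample into a $2N_i$ bit word, with the real portion in the upper $N_i$ bits and the imaginary portion in the lower $N_i$ bits, both in twos complement.  The helper names {\tt pack\_sample} and {\tt unpack\_sample} are hypothetical and are not part of the core; they might be useful in a test bench, but that use is an assumption.

```python
def pack_sample(re, im, nbits):
    """Pack signed integers re and im into one 2*nbits-wide word.

    Bits [2*nbits-1:nbits] hold the real part, bits [nbits-1:0] the
    imaginary part, both as twos complement values.
    """
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    if not (lo <= re <= hi and lo <= im <= hi):
        raise ValueError("value does not fit in %d signed bits" % nbits)
    mask = (1 << nbits) - 1
    return ((re & mask) << nbits) | (im & mask)

def unpack_sample(word, nbits):
    """Split a 2*nbits-wide word back into signed (re, im) integers."""
    mask = (1 << nbits) - 1

    def signed(v):
        # Reinterpret an nbits-wide field as a twos complement value.
        return v - (1 << nbits) if v & (1 << (nbits - 1)) else v

    return signed((word >> nbits) & mask), signed(word & mask)
```

The same helpers apply to {\tt o\_result}, {\tt o\_left}, and {\tt o\_right} by passing $N_o$ in place of $N_i$.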

% Appendices

% Appendices


[/] [dblclockfft/] [trunk/] [doc/] [src/] [spec.tex] - Diff between revs 32 and 42