OpenCores

/doc/Makefile

1,20 → 1,75

/doc/lgpl-3.0.pdf Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream

/doc/lgpl-3.0.pdf

doc/lgpl-3.0.pdf Property changes : Added: svn:mime-type ## -0,0 +1 ## +application/octet-stream \ No newline at end of property Index: doc/spec.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: doc/src/lgpl-3.0.tex =================================================================== --- doc/src/lgpl-3.0.tex (nonexistent) +++ doc/src/lgpl-3.0.tex (revision 42) @@ -0,0 +1,206 @@ +\documentclass[11pt]{article} + +% match lgpl-3.0.txt +\renewcommand{\labelenumii}{\alph{enumii})} +\renewcommand{\labelenumiii}{\arabic{enumiii})} + +\title{GNU LESSER GENERAL PUBLIC LICENSE} +\date{Version 3, 29 June 2007} + +\begin{document} +\maketitle + +\begin{center} +{\parindent 0in + +Copyright \copyright\ 2007 Free Software Foundation, Inc. \texttt{http://fsf.org/} + +\bigskip +Everyone is permitted to copy and distribute verbatim copies of this + +license document, but changing it is not allowed.} + +\end{center} + + + This version of the GNU Lesser General Public License incorporates +the terms and conditions of version 3 of the GNU General Public +License, supplemented by the additional permissions listed below. + +\begin{enumerate} +\addtocounter{enumi}{-1} % start at 0 + +\item Additional Definitions. + + As used herein, ``this License'' refers to version 3 of the GNU Lesser +General Public License, and the ``GNU GPL'' refers to version 3 of the GNU +General Public License. + + ``The Library'' refers to a covered work governed by this License, +other than an Application or a Combined Work as defined below. + + An ``Application'' is any work that makes use of an interface provided +by the Library, but which is not otherwise based on the Library. +Defining a subclass of a class defined by the Library is deemed a mode +of using an interface provided by the Library. + + A ``Combined Work'' is a work produced by combining or linking an +Application with the Library. 
The particular version of the Library +with which the Combined Work was made is also called the ``Linked +Version''. + + The ``Minimal Corresponding Source'' for a Combined Work means the +Corresponding Source for the Combined Work, excluding any source code +for portions of the Combined Work that, considered in isolation, are +based on the Application, and not on the Linked Version. + + The ``Corresponding Application Code'' for a Combined Work means the +object code and/or source code for the Application, including any data +and utility programs needed for reproducing the Combined Work from the +Application, but excluding the System Libraries of the Combined Work. + +\item Exception to Section 3 of the GNU GPL. + + You may convey a covered work under sections 3 and 4 of this License +without being bound by section 3 of the GNU GPL. + +\item Conveying Modified Versions. + + If you modify a copy of the Library, and, in your modifications, a +facility refers to a function or data to be supplied by an Application +that uses the facility (other than as an argument passed when the +facility is invoked), then you may convey a copy of the modified +version: + + \begin{enumerate} + \item under this License, provided that you make a good faith effort to + ensure that, in the event an Application does not supply the + function or data, the facility still operates, and performs + whatever part of its purpose remains meaningful, or + + \item under the GNU GPL, with none of the additional permissions of + this License applicable to that copy. + \end{enumerate} + +\item Object Code Incorporating Material from Library Header Files. + + The object code form of an Application may incorporate material from +a header file that is part of the Library. 
You may convey such object +code under terms of your choice, provided that, if the incorporated +material is not limited to numerical parameters, data structure +layouts and accessors, or small macros, inline functions and templates +(ten or fewer lines in length), you do both of the following: + + \begin{enumerate} + \item Give prominent notice with each copy of the object code that the + Library is used in it and that the Library and its use are + covered by this License. + + \item Accompany the object code with a copy of the GNU GPL and this license + document. + \end{enumerate} + +\item Combined Works. + + You may convey a Combined Work under terms of your choice that, +taken together, effectively do not restrict modification of the +portions of the Library contained in the Combined Work and reverse +engineering for debugging such modifications, if you also do each of +the following: + + \begin{enumerate} + \item Give prominent notice with each copy of the Combined Work that + the Library is used in it and that the Library and its use are + covered by this License. + + \item Accompany the Combined Work with a copy of the GNU GPL and this license + document. + + \item For a Combined Work that displays copyright notices during + execution, include the copyright notice for the Library among + these notices, as well as a reference directing the user to the + copies of the GNU GPL and this license document. + + \item Do one of the following: + + \begin{enumerate} + \addtocounter{enumiii}{-1} % start at 0 + \item Convey the Minimal Corresponding Source under the terms of this + License, and the Corresponding Application Code in a form + suitable for, and under terms that permit, the user to + recombine or relink the Application with a modified version of + the Linked Version to produce a modified Combined Work, in the + manner specified by section 6 of the GNU GPL for conveying + Corresponding Source. 
+ + \item Use a suitable shared library mechanism for linking with the + Library. A suitable mechanism is one that (a) uses at run time + a copy of the Library already present on the user's computer + system, and (b) will operate properly with a modified version + of the Library that is interface-compatible with the Linked + Version. + \end{enumerate} + + \item Provide Installation Information, but only if you would otherwise + be required to provide such information under section 6 of the + GNU GPL, and only to the extent that such information is + necessary to install and execute a modified version of the + Combined Work produced by recombining or relinking the + Application with a modified version of the Linked Version. (If + you use option 4d0, the Installation Information must accompany + the Minimal Corresponding Source and Corresponding Application + Code. If you use option 4d1, you must provide the Installation + Information in the manner specified by section 6 of the GNU GPL + for conveying Corresponding Source.) + \end{enumerate} + +\item Combined Libraries. + + You may place library facilities that are a work based on the +Library side by side in a single library together with other library +facilities that are not Applications and are not covered by this +License, and convey such a combined library under terms of your +choice, if you do both of the following: + + \begin{enumerate} + \item Accompany the combined library with a copy of the same work based + on the Library, uncombined with any other library facilities, + conveyed under the terms of this License. + + \item Give prominent notice with the combined library that part of it + is a work based on the Library, and explaining where to find the + accompanying uncombined form of the same work. + \end{enumerate} + +\item Revised Versions of the GNU Lesser General Public License. 
+ + The Free Software Foundation may publish revised and/or new versions +of the GNU Lesser General Public License from time to time. Such new +versions will be similar in spirit to the present version, but may +differ in detail to address new problems or concerns. + + Each version is given a distinguishing version number. If the +Library as you received it specifies that a certain numbered version +of the GNU Lesser General Public License ``or any later version'' +applies to it, you have the option of following the terms and +conditions either of that published version or of any later version +published by the Free Software Foundation. If the Library as you +received it does not specify a version number of the GNU Lesser +General Public License, you may choose any version of the GNU Lesser +General Public License ever published by the Free Software Foundation. + + If the Library as you received it specifies that a proxy can decide +whether future versions of the GNU Lesser General Public License shall +apply, that proxy's public statement of acceptance of any version is +permanent authorization for you to choose that version for the +Library. + +\end{enumerate} + +\end{document} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: + Index: doc/src/spec.tex =================================================================== --- doc/src/spec.tex (revision 41) +++ doc/src/spec.tex (revision 42) @@ -1,9 +1,51 @@ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% +%% Filename: doc/src/spec.tex +%% +%% Project: A General Purpose Pipelined FFT Implementation +%% +%% Purpose: This file contains the LaTeX instructions necessary to build +%% the doc/spec.pdf file. It's not nearly as interesting as the +%% doc/spec.pdf file itself, so I would recommend you read that file +%% before looking for items within here. +%% +%% Creator: Dan Gisselquist, Ph.D. 
+%% Gisselquist Technology, LLC +%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% +%% Copyright (C) 2018, Gisselquist Technology, LLC +%% +%% This file is part of the general purpose pipelined FFT project. +%% +%% The pipelined FFT project is free software (firmware): you can redistribute +%% it and/or modify it under the terms of the GNU Lesser General Public License +%% as published by the Free Software Foundation, either version 3 of the +%% License, or (at your option) any later version. +%% +%% The pipelined FFT project is distributed in the hope that it will be useful, +%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%% MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser +%% General Public License for more details. +%% +%% You should have received a copy of the GNU Lesser General Public License +%% along with this program. (It's in the $(ROOT)/doc directory. Run make +%% with no target there if the PDF file isn't present.) If not, see +%% for a copy. +%% +%% License: LGPL, v3, as defined and found on www.gnu.org, +%% http://www.gnu.org/licenses/lgpl.html +%% +%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% +%% \documentclass{gqtekspec} -\project{Double Clocked FFT} +\project{Pipelined FFT} \title{Specification} \author{Dan Gisselquist, Ph.D.} \email{dgisselq (at) opencores.org} -\revision{Rev.~0.2} +\revision{Rev.~0.3} \begin{document} \pagestyle{gqtekspecplain} \titlepage @@ -10,20 +52,24 @@ \begin{license} Copyright (C) \theyear\today, Gisselquist Technology, LLC -This project is free software (firmware): you can redistribute it and/or -modify it under the terms of the GNU General Public License as published -by the Free Software Foundation, either version 3 of the License, or (at -your option) any later version. +This file is part of the general purpose pipelined FFT project. 
-This program is distributed in the hope that it will be useful, but WITHOUT -ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or -FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License -for more details. +The pipelined FFT project is free software (firmware): you can redistribute +it and/or modify it under the terms of the GNU Lesser General Public License +as published by the Free Software Foundation, either version 3 of the +License, or (at your option) any later version. -You should have received a copy of the GNU General Public License along -with this program. If not, see \texttt{http://www.gnu.org/licenses/} for a copy. +The pipelined FFT project is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser +General Public License for more details. + +You should have received a copy of the GNU Lesser General Public License along +with this project. If not, see \texttt{http://www.gnu.org/licenses/} for a +copy. \end{license} \begin{revisionhistory} +0.3 & 6/2/2015 & Gisselquist & General purpose pipelined FFT generator\\\hline 0.2 & 6/2/2015 & Gisselquist & Superficial formatting changes\\\hline 0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline \end{revisionhistory} @@ -33,23 +79,25 @@ \listoffigures \listoftables \begin{preface} -This FFT comes from my attempts to design and implement a signal processing +This FFT came about originally from my attempts to design and implement a +GPS decorrelation algorithm inside a generic FPGA, but only on a limited budget. As such, -I don't yet have the FPGA board I wish to place this algorithm onto, neither -do I have any expensive modeling or simulation capabilities. I'm using -Verilator for my modeling and simulation needs. This makes -using a vendor supplied IP core, such as an FFT, difficult if not impossible -to use. 
+it was built before I had the board that could use it. Because I was trying +to hit a very high processing rate, the FFT core was originally built to handle +two samples at a time. Hence the original name, the ``Double clocked FFT +core''. -My problem was made worse when I learned that the published maximum clock -speed for a device wasn't necessarily the maximum clock speed that I could -achieve. My design needed to process the incoming signal at 500~MHz to be -commercially viable. 500~MHz is not necessarily a clock speed -that can be easily achieved. 250~MHz, on the other hand, is much more within -the realm of possibility. Achieving a 500~MHz performance with a 250~MHz -clock, however, requires an FFT that accepts two samples per clock. +One of the difficulties of doing all of your development using Verilator as +a simulator, is that you can only simulate components that you have the +Verilog source for. My desire to use Verilator kept me from using any of +the vendor supplied FFTs out there. This, then, was and is the genesis of this project. + +Since this genesis, I've used this core as part of several other designs and +maintained it. Eventually it has morphed into the general purpose FFT core +generator that it is today. + \end{preface} \chapter{Introduction} @@ -56,10 +104,10 @@ \pagenumbering{arabic} \setcounter{page}{1} -The Double Clocked FFT project contains all of the software necessary to -create the IP to generate an arbitrary sized FFT that will clock two samples -in at each clock cycle, and after some pipeline delay it will clock two -samples out at every clock cycle. +The General Purpose Pipelined FFT generator project contains all of the +software necessary to create an arbitrary sized FFT HDL core that will +accept up to two samples per clock cycle, and after some pipeline delay +it will output FFT results at the same rate they are input. The FFT generated by this approach is very configurable.
By simple adjustment of a command line parameter, the FFT may be made to be a forward FFT or an @@ -66,47 +114,87 @@ inverse FFT. The number of bits processed, kept, and maintained by this FFT are also configurable. Even the number of bits used for the twiddle factors, or whether or not to bit reverse the outputs, are all configurable -parts to this FFT core. +parts to this FFT core. Finally, the FFT can be configured to process two +samples per clock, one sample per clock, one sample every other clock, or even +one sample every third clock (or less). -These features make the Double Clocked FFT very different and unique among the -other cores available on opencores.com. +These features make this general purpose pipelined FFT generator very +different and unique among the other cores available on opencores.com. For those who wish to get started right away, please download the package, change into the {\tt sw} directory and run {\tt make}. There is no need to -run a configure script, {\tt fftgen} is completely portable C++. Then, once -built, go ahead and run {\tt fftgen} without any arguments. This will cause -{\tt fftgen} to print a usage statement to the screen. Review the usage -statement, and run {\tt fftgen} a second time with the arguments you need. +run a configure script, {\tt fftgen} is completely portable C++. (While I +do my development on Ubuntu, I am told by others that the core builds on +Microsoft systems as well.) Then, once built, go ahead and run {\tt fftgen} +without any arguments. This will cause {\tt fftgen} to print a usage +statement to the screen. Review the usage statement, and run {\tt fftgen} +a second time with the arguments you need. \chapter{Generation} -Creating a double clocked FFT core is as simple as running the program +Creating an FFT core is as simple as running the program {\tt fftgen}. 
The program will then create a series of Verilog files, as well as {\tt .hex} files suitable for use with a \textdollar readmemh, and -place them into an {\tt ./fft-core/} directory that {\tt fftgen} will create. -Creating the core you want takes a touch of configuring. -Therefore, the following lists the arguments that can be given to +place them into an output directory, {\tt ./fft-core/} by default, that +{\tt fftgen} will create. Creating the core you want takes a touch of +configuring. Therefore, the following lists the arguments that can be given to {\tt fftgen} to adjust the core that it builds: \begin{itemize} \item[\hbox{-f size}] This specifies the size of the FFT core that {\tt fftgen} will build. - The size must be a power of two. The transform is given, within a - scale factor, to, + The size must be a power of two. + + Given an input $x\left[n\right]$, the FFT will calculate, \begin{eqnarray*} X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right] e^{-j2\pi \frac{k}{N}n} \end{eqnarray*} + to within a scale factor. -\item[\hbox{-1}] +\item[\hbox{-i}] This specifies that the FFT will be an inverse FFT. Specifically, it will calculate, \begin{eqnarray*} x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n} \end{eqnarray*} -\item[\hbox{-0}] - This specifies building a forward FFT. However, since this is the - default, this option never necessary. + + If no {\tt -i} option is given, the core will by default generate a + forward FFT. + +\item[\hbox{-2}] + Builds an FFT that can ingest and output two samples per clock. + + This option requires six multiplies for all but the last two butterfly + stages. The last two butterfly stages are accomplished using shifts + and adds only, so they require no multiplies. + +\item[\hbox{-k 1}] + Builds an FFT that can ingest and output one sample per clock. + This option is incompatible with {\tt -2}. + + This option requires three multiplies for all but the last two + butterfly stages. 
+ +\item[\hbox{-k 2}] + Builds an FFT that can ingest and output one sample every other clock. + This option is incompatible with {\tt -2}, and will override {\tt -k 1}. + You are responsible for making sure that {\tt i\_ce} will never be + true for two clocks in a row. + + Unlike {\tt -k 1}, this option only requires two multiplies for all + but the last two butterfly stages. + +\item[\hbox{-k 3}] + Builds an FFT that can ingest and output one sample every third clock. + This option is incompatible with {\tt -2}, and will override any + other {\tt -k} option. + For this to work, you will need to guarantee that {\tt i\_ce} will + never be true more than one time in any three clock periods. + + Unlike {\tt -k 1} and {\tt -k 2}, this option only requires one + multiply for all but the last two butterfly stages. + \item[\hbox{-s}] This causes the core to skip the final bit reversal stage. The outputs of the FFT will then come out in bit reversed order. @@ -120,10 +208,8 @@ Be aware, however, doing this requires the bit reversed forward transform be followed by a bitreversed decimation in time approach to the inverse transform. This software does not (yet) provide that - capability. As such, the utility just isn't there yet. -\item[\hbox{-S}] - Include the final bit reversal stage. As this is also the default, - specifying the option should not be necessary. + capability. As such, this capability is really just a placeholder for + a future capability. \item[\hbox{-d DIR}] Specifies the DIRectory to place the produced Verilog files. By default, this will be in the `./fft-core/' directory, but it can @@ -138,16 +224,20 @@ the FFT. Bits are accumulated at roughly one bit for every two stages. However, if this value is set, bits are only accumulated up to this maximum width. After this width, further accumulations are truncated. 
-\item[\hbox{-c bits}] The number of bits in each twiddle coefficient is given - by the number of bits input to that stage plus this extra number of - bits per coefficient. By increasing the number of bits per coefficient - above that of the input samples, truncation error is kept to the - original error found within the original samples. -\item[\hbox{-x bits}] Internally accumulated roundoff error can be a difficult +\item[\hbox{-c bits}] Specifies the number of extra bits to be given to each + twiddle factor. The size of the twiddle factors is nominally the size + of the input data. By specifying {\tt -c }, you can extend + this default value to avoid any loss in precision. +\item[\hbox{-x bits}] Maintains {\tt bits} extra bits during the computation + to deal with roundoff error. Hence, after the first stage, there will + be this many excess bits within the FFT pipeline. + + Internally accumulated roundoff error can be a difficult problem to solve. By using this option, you guarantee that the FFT runs with an additional {\tt bits} bits, and only truncates down to the necessary width at the end in order to minimize rounding errors along the way. + \item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT will consume. By default, the FFT does not use any hardware multiplies. However, this can be expensive on the rest of the logic used by the @@ -160,8 +250,36 @@ \chapter{Architecture} As a component of another system the structure of this system is a simple -black box such as the one shown in Fig.~\ref{fig:black-box}. 
+black box such as the one shown in Fig.~\ref{fig:black-box-one} \begin{figure}\begin{center} +\begin{pspicture}(-2.2in,0.3in)(2.2in,2in) +% \rput(0,0){\psframe(-2.2in,0.3in)(2.2in,2in)} +\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0.3in)(0.75in,2in)} + \rput(0,1in){(I)FFT Core} + \rput[r](-1.6in,1.8in){\tt i\_clk} + \rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)} + \rput[r](-1.6in,1.5in){\tt i\_rst} + \rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)} + \rput[r](-1.6in,1.2in){\tt i\_ce} + \rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)} + \rput[r](-1.6in,0.6in){\tt i\_sample} + \rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)} + \rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)} + \rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}} + % + \rput[l](1.6in,1.2in){\tt o\_sync} + \rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)} + \rput[l](1.6in,0.6in){\tt o\_result} + \rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)} + \rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)} + \rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}} + } +\end{pspicture} +\caption{(I)FFT Black Box Diagram}\label{fig:black-box-one} +\end{center}\end{figure} +for the traditional one sample per clock FFT implementation, or +Fig.~\ref{fig:black-box-dbl} +\begin{figure}\begin{center} \begin{pspicture}(-2.1in,0)(2.1in,2in) % \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)} \rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)} @@ -193,16 +311,21 @@ \rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}} } \end{pspicture} -\caption{(I)FFT Black Box Diagram}\label{fig:black-box} +\caption{Two sample per clock (I)FFT Black Box Diagram}\label{fig:black-box-dbl} \end{center}\end{figure} +for the two samples per clock FFT. + + The interface is simple: strobe the reset line, and every clock thereafter set the clock -enable line when data is valid on the left and right input ports. Likewise +enable line when data is valid on the input port.
Likewise for the outputs, when the {\tt o\_sync} line goes high the first data sample is available. Ever after that, one data sample will be available every clock cycle that the {\tt i\_ce} line is high. -Internal to the FFT, things are a touch more complex. Fig.~\ref{fig:white-box} +Internal to the FFT, things are a touch more complex. + +Fig.~\ref{fig:white-box-dbl} \begin{figure}\begin{center} \begin{pspicture}(1.3in,-0.5in)(4.7in,5in) % \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)} @@ -313,14 +436,16 @@ \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)} \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}} \end{pspicture} -\caption{Internal FFT Structure}\label{fig:white-box} +\caption{Internal FFT Structure}\label{fig:white-box-dbl} \end{center}\end{figure} -attempts to show some of this structure. As you can see from the figure, the -FFT itself is composed of a series of stages. These stages are split from the -beginning into an even stage and an odd stage. Further, they are numbered +attempts to show some of this structure for the two-sample per clock FFT. +As you can see from the figure, the +FFT itself is composed of a series of stages. For the two-sample per clock +FFT, these stages are split from the beginning into an even stage and an odd +stage. Further, they are numbered according to the size of the FFT they represent. Therefore the first stage is numbered $N$ and represents the first stage of an $N$ point FFT. The -second stage is labeled $N/2$, then $N/$, and so on down to $N=8$. The +second stage is labeled $N/2$, then $N/4$, and so on down to $N=8$. The four sample stage and the two sample stages are different, however. These two stages, representing three blocks on Fig.~\ref{fig:white-box}, can be accomplished without any multiplies. 
Therefore they have been accomplished @@ -336,7 +461,7 @@ \begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in) % \rput(0,0){\psframe(0in,-2in)(3in,4.25in)} \rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)} - \rput[r](1.625in,4.125in){\tt i\_data} + \rput[r](1.625in,4.125in){\tt i\_sample} \rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)% \psline{->}(0,0)(-0.2in,-0.25in)% \psarc{->}{0.15in}{200}{340}} @@ -395,17 +520,20 @@ \caption{A Single FFT Stage, with Butterfly}\label{fig:fftstage} \end{center}\end{figure} These FFT stages are really no different than any other decimation in -frequency FFT, save only that the coefficients are alternated between the -two stages. That is, the even stages get all the even coefficients, and -the odd stages get all of the odd coefficients. -Internally, each stage spends the first $N/4$ clocks storing its inputs -into memory, and then the next $N/4$ clocks pairing a stored input with -a single external input, so that both values become inputs to the butterfly. -Likewise, the butterfly coefficient is read from a small ROM table. +frequency radix-2 FFT implementation, save only that for the two-sample per +clock FFT the coefficients are alternated between the two inputs +shown in Fig.~\ref{fig:white-box-dbl}. That is, the even stages would get all +the even coefficients, and the odd stages get all of the odd coefficients. +For the more general purpose FFT, there's only the one pipeline, so every +sample goes through every butterfly. +Internally, each stage spends the first $N/2$ clocks storing its inputs +into memory, and then the next $N/2$ clocks pairing a stored input with +a single (fresh) external input, so that both values become inputs to the +butterfly. Likewise, the butterfly coefficient is read from a small ROM table. One trick to making the FFT stage work successfully is synchronization. 
Since the shift and add multiplies create a delay of (roughly) one clock cycle per -bit of input, there is a significant pipeline delay from the input to the +two bits of input, there is a significant pipeline delay from the input to the output of the butterfly routine. To match this delay, the FFT stage places a synchronization pulse into the butterfly. When this synchronization pulse comes out of the butterfly, the values of the butterfly then match the @@ -415,11 +543,11 @@ time the butterfly output will be invalid. To keep things aligned, and to avoid the invalid data half, a counter is started by the synchronization pulse coming out of the butterfly in order to keep track. Using this counter and -once the butterfly produces the first sync pulse, the next $N/4$ clock cycles +once the butterfly produces the first sync pulse, the next $N/2$ clock cycles will produce valid butterfly outputs. For these clock cycles, the left or first output is sent immediately to the next FFT stage, whereas the right or second output is saved into memory. Once these cycles are complete, the -butterfly outputs will be invalid for the next $N/4$ clock cycles. During +butterfly outputs will be invalid for the next $N/2$ clock cycles. During these invalid clock cycles, the FFT stage outputs data that had been stored in memory. In this fashion, data is always valid coming out of each FFT stage once the initial synchronization pulse goes high. @@ -450,15 +578,29 @@ \item From the time of reset until the first sample pair is available on the IO ports, {\tt i\_rst} may be kept low, but the clock enable line {\tt i\_ce} must also be kept low. - \item On the clock containing the first sample pair, {\tt i\_left} - and {\tt i\_right}, set {\tt i\_ce} high. - \item Ever after, any time a valid pair of samples is available to - the input of the FFT, place the first sample of the pair - on the {\tt i\_left} line, the second on the {\tt i\_right} - line, and set {\tt i\_ce} high. 
+ + \item On the clock containing the first sample, {\tt i\_sample}, or the + first sample pair, {\tt i\_left} and {\tt i\_right} for the + two sample-per-clock FFT, set {\tt i\_ce} high. + + If you have elected an FFT that multiplexes its multiplies, + and so can only handle one sample every two or three clocks, + then you'll need to guarantee {\tt i\_ce} remains low for + one or two clocks respectively before raising it again. + + \item Ever after, any time a valid sample is placed in {\tt i\_sample} + and {\tt i\_ce} raised high, a sample will enter into the + FFT for processing. For the two sample-per-clock FFT, the + input will be into {\tt i\_left} (for the first input) + and {\tt i\_right} (for the next one). + \item At the first valid output, the FFT core will set {\tt o\_sync} - line high in addition to the output values {\tt o\_left} - (the first of two), and {\tt o\_right} (the second of the two). + line high in addition to placing the output value into + {\tt o\_result}. For the two-sample per clock FFT, the outputs + will be placed into {\tt o\_left} + (the first of two), and {\tt o\_right} (the second of the two) + respectively. + \item Ever after, whenever {\tt i\_ce} is high, the FFT core will clock two samples in and two samples out. On any valid first pair of samples coming out of the transform, @@ -498,22 +640,38 @@ i\_rst & 1 & Input & An active high synchronous reset.\\\hline i\_ce & 1 & Input & Clock Enable. Set this high to clock data in and out.\\\hline -i\_left & $2N_i$ & Input & The first of two input complex input samples. Bits +i\_sample & $2N_i$ & Input & The complex input sample. Bits [$\left(2N_i-1\right)$:$N_i$] of this value are the real portion, whereas bits [$\left(N_i-1\right)$:0] represent the imaginary portion. Both portions are in signed twos complement integer format. The number of bits, $N_i$, is configurable. \\\hline -i\_right & $2N_i$ & Input & The second of two input complex input samples. 
- The format is the same as {\tt i\_left} above.\\\hline -o\_left & $2N_o$ & Output & The first of two input complex output samples. - The format is the same, save only that $N_o$ bits are - used for each twos complement portion instead of $N_i$.\\\hline -o\_right & $2N_o$ & Output & The second of two input complex output samples. - The format is the same as for {\tt o\_left} above.\\\hline -o\_sync & 1 & Output & Signals the first output sample pair of any transform, - zero otherwise. + +i\_left & $2N_i$ & Input & When the core is configured for two-samples per + clock, + this is the first of the two data inputs presented to the core + on any given clock. It has the same format as {\tt i\_sample} + above. \\\hline +i\_right & $2N_i$ & Input & The second of two complex input samples, + used when the core is configured for two-samples per clock. + The format is the same as {\tt i\_sample} above.\\\hline +o\_sync & 1 & Output & Signals the first output sample of any transform. + It will be zero from the time of the reset until the first + output sample. Ever afterwards, it will be true any time + bin zero of the FFT is on the output.\\\hline +o\_result & $2N_o$ & Output & The complex output sample. The format is the + same, save only that $N_o$ bits are used for each twos + complement portion instead of $N_i$. Hence bits + [$\left(2N_o-1\right)$:$N_o$] of this value are the real + portion, whereas bits [$\left(N_o-1\right)$:0] represent the + imaginary portion. + \\\hline +o\_left & $2N_o$ & Output & When in the two-sample per clock configuration, + this is the first of two complex output samples.\\\hline +o\_right & $2N_o$ & Output & When in the two-sample per clock configuration, + this is the second of two complex output samples. \\\hline + \\\hline \end{portlist} \caption{List of IO ports}\label{tbl:ioports} \end{center}\end{table}
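The port list above describes each complex sample as a single word with the real part in bits [2N-1:N] and the imaginary part in bits [N-1:0], both in signed twos complement. A minimal Python sketch of that packing, which may help when preparing test vectors for a simulation, could look like the following. The helper names pack_sample and unpack_sample are hypothetical and are not part of fftgen; the width argument stands for N_i on the input side or N_o on the output side.

```python
from typing import Tuple


def pack_sample(re: int, im: int, n_bits: int) -> int:
    """Pack signed real/imaginary parts into one 2*n_bits wide word.

    Layout assumed from the port list: real part in the upper n_bits,
    imaginary part in the lower n_bits, both twos complement.
    """
    lo = -(1 << (n_bits - 1))
    hi = (1 << (n_bits - 1)) - 1
    if not (lo <= re <= hi and lo <= im <= hi):
        raise ValueError("value does not fit in a signed n_bits field")
    mask = (1 << n_bits) - 1
    # Masking converts a negative Python int to its twos complement field.
    return ((re & mask) << n_bits) | (im & mask)


def unpack_sample(word: int, n_bits: int) -> Tuple[int, int]:
    """Split a 2*n_bits wide word back into signed (real, imaginary)."""
    mask = (1 << n_bits) - 1

    def to_signed(field: int) -> int:
        # If the sign bit is set, subtract 2**n_bits to recover the value.
        if field & (1 << (n_bits - 1)):
            return field - (1 << n_bits)
        return field

    return to_signed((word >> n_bits) & mask), to_signed(word & mask)
```

For example, with an 8-bit width, pack_sample(1, -1, 8) places 0x01 in the upper byte and 0xFF in the lower byte, and unpack_sample reverses the operation.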

/doc/src

doc/src Property changes : Added: svn:ignore ## -0,0 +1 ## +spec.out

Subversion Repositories dblclockfft

Compare Revisions

Rev 41 → Rev 42