OpenCores
URL https://opencores.org/ocsvn/dblclockfft/dblclockfft/trunk

Subversion Repositories dblclockfft

[/] [dblclockfft/] [trunk/] [doc/] [src/] [spec.tex] - Blame information for rev 42

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 42 dgisselq
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2
%%
3
%% Filename:    doc/src/spec.tex
4
%%
5
%% Project:     A General Purpose Pipelined FFT Implementation
6
%%
7
%% Purpose:     This file contains the LaTeX instructions necessary to build
8
%%              the doc/spec.pdf file.  It's not nearly as interesting as the
9
%%      doc/spec.pdf file itself, so I would recommend you read that file
10
%%      before looking for items within here.
11
%%
12
%% Creator:     Dan Gisselquist, Ph.D.
13
%%              Gisselquist Technology, LLC
14
%%
15
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
16
%%
17
%% Copyright (C) 2018, Gisselquist Technology, LLC
18
%%
19
%% This file is part of the general purpose pipelined FFT project.
20
%%
21
%% The pipelined FFT project is free software (firmware): you can redistribute
22
%% it and/or modify it under the terms of the GNU Lesser General Public License
23
%% as published by the Free Software Foundation, either version 3 of the
24
%% License, or (at your option) any later version.
25
%%
26
%% The pipelined FFT project is distributed in the hope that it will be useful,
27
%% but WITHOUT ANY WARRANTY; without even the implied warranty of
28
%% MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
29
%% General Public License for more details.
30
%%
31
%% You should have received a copy of the GNU Lesser General Public License
32
%% along with this program.  (It's in the $(ROOT)/doc directory.  Run make
33
%% with no target there if the PDF file isn't present.)  If not, see
34
%% <http://www.gnu.org/licenses/> for a copy.
35
%%
36
%% License:     LGPL, v3, as defined and found on www.gnu.org,
37
%%              http://www.gnu.org/licenses/lgpl.html
38
%%
39
%%
40
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
41
%%
42
%%
43 10 dgisselq
\documentclass{gqtekspec}
44 42 dgisselq
\project{Pipelined FFT}
45 10 dgisselq
\title{Specification}
46
\author{Dan Gisselquist, Ph.D.}
47 27 dgisselq
\email{dgisselq (at) opencores.org}
48 42 dgisselq
\revision{Rev.~0.3}
49 10 dgisselq
\begin{document}
50
\pagestyle{gqtekspecplain}
51
\titlepage
52
\begin{license}
53
Copyright (C) \theyear\today, Gisselquist Technology, LLC
54
 
55 42 dgisselq
This file is part of the general purpose pipelined FFT project.
56 10 dgisselq
 
57 42 dgisselq
The pipelined FFT project is free software (firmware): you can redistribute
58
it and/or modify it under the terms of the GNU Lesser General Public License
59
as published by the Free Software Foundation, either version 3 of the
60
License, or (at your option) any later version.
61 10 dgisselq
 
62 42 dgisselq
The pipelined FFT project is distributed in the hope that it will be useful,
63
but WITHOUT ANY WARRANTY; without even the implied warranty of
64
MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
65
General Public License for more details.
66
 
67
You should have received a copy of the GNU Lesser General Public License along
68
with this project.  If not, see \texttt{http://www.gnu.org/licenses/} for a
69
copy.
70 10 dgisselq
\end{license}
71
\begin{revisionhistory}
72 42 dgisselq
0.3 & 6/2/2015 & Gisselquist & General purpose pipelined FFT generator\\\hline
73 26 dgisselq
0.2 & 6/2/2015 & Gisselquist & Superficial formatting changes\\\hline
74 11 dgisselq
0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline
75 10 dgisselq
\end{revisionhistory}
76
% Revision History
77
% Table of Contents, named Contents
78
\tableofcontents
79
\listoffigures
80
\listoftables
81
\begin{preface}
82 42 dgisselq
This FFT came about originally from my attempts to design and implement a
83
GPS decorrelation
84 10 dgisselq
algorithm inside a generic FPGA, but only on a limited budget.  As such,
85 42 dgisselq
it was built before I had the board that could use it.  Because I was trying
86
to hit a very high processing rate, the FFT core was originally built to handle
87
two samples at a time.  Hence the original name, the ``Double clocked FFT
88
core''.
89 10 dgisselq
 
90 42 dgisselq
One of the difficulties of doing all of your development using Verilator as
91
a simulator, is that you can only simulate components that you have the
92
Verilog source for.  My desire to use Verilator kept me from using any of
93
the vendor supplied FFTs out there.
94 10 dgisselq
 
95
This, then, was and is the genesis of this project.
96 42 dgisselq
 
97
Since this genesis, I've used this core as part of several other designs and
98
maintaied it.  Eventually it has morphed into the general purpose FFT core
99
generator that it is today.
100
 
101 10 dgisselq
\end{preface}
102
 
103
\chapter{Introduction}
104
\pagenumbering{arabic}
105
\setcounter{page}{1}
106
 
107 42 dgisselq
The General Purpose Pipelined FFT generator project contains all of the
108
software necessary to create an arbitrary sized FFT HDL core that will
109
accept up to two samples per clock cycle, and after some pipeline delay
110
it will output FFT results at the same rate they are input.
111 10 dgisselq
 
112
The FFT generated by this approach is very configurable.  By simple adjustment
113
of a command line parameter, the FFT may be made to be a forward FFT or an
114
inverse FFT.  The number of bits processed, kept, and maintained by this
115
FFT are also configurable.  Even the number of bits used for the twiddle
116
factors, or whether or not to bit reverse the outputs, are all configurable
117 42 dgisselq
parts to this FFT core.  Finally, the FFT can be configured to process two
118
samples per clock, one sample per clock, one sample every other clock, or even
119
one sample every third clock (or less).
120 10 dgisselq
 
121 42 dgisselq
These features make this general purpose pipelined FFT generator very
122
different and unique among the other cores available on opencores.com.
123 10 dgisselq
 
124
For those who wish to get started right away, please download the package,
125
change into the {\tt sw} directory and run {\tt make}.  There is no need to
126 42 dgisselq
run a configure script, {\tt fftgen} is completely portable C++.  (While I
127
do my development on Ubuntu, I am told by others that the core builds on
128
Microsoft systems as well.)  Then, once built, go ahead and run {\tt fftgen}
129
without any arguments.  This will cause {\tt fftgen} to print a usage
130
statement to the screen.  Review the usage statement, and run {\tt fftgen}
131
a second time with the arguments you need.
132 10 dgisselq
 
133
 
134
\chapter{Generation}
135
 
136 42 dgisselq
Creating an FFT core is as simple as running the program
137 10 dgisselq
{\tt fftgen}.  The program will then create a series of Verilog files, as
138
well as {\tt .hex} files suitable for use with a \textdollar readmemh, and
139 42 dgisselq
place them into an output directory, {\tt ./fft-core/} by default, that
140
{\tt fftgen} will create.  Creating the core you want takes a touch of
141
configuring.  Therefore, the following lists the arguments that can be given to
142 10 dgisselq
{\tt fftgen} to adjust the core that it builds:
143
\begin{itemize}
144
\item[\hbox{-f size}]
145
        This specifies the size of the FFT core that {\tt fftgen} will build.
146 42 dgisselq
        The size must be a power of two.
147
 
148
        Given an input $x\left[n\right]$, the FFT will calculate,
149 10 dgisselq
        \begin{eqnarray*}
150
        X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right]
151
                e^{-j2\pi \frac{k}{N}n}
152
        \end{eqnarray*}
153 42 dgisselq
        to within a scale factor.
154 10 dgisselq
 
155 42 dgisselq
\item[\hbox{-i}]
156 10 dgisselq
        This specifies that the FFT will be an inverse FFT.  Specifically,
157
        it will calculate,
158
        \begin{eqnarray*}
159
        x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n}
160
        \end{eqnarray*}
161 42 dgisselq
 
162
        If no {\tt -i} option is given, the core will by default generate a
163
        forward FFT.
164
 
165
\item[\hbox{-2}]
166
        Builds an FFT that can ingest and output two samples per clock.
167
 
168
        This option requires six multiplies for all but the last two butterfly
169
        stages.  The last two butterfly stages are accomplished using shifts
170
        and adds only, so they require no multiplies.
171
 
172
\item[\hbox{-k 1}]
173
        Builds an FFT that can ingest and output one sample per clock.
174
        This option is incompatible with {\tt -2}.
175
 
176
        This option requires three multiplies for all but the last two
177
        butterfly stages.
178
 
179
\item[\hbox{-k 2}]
180
        Builds an FFT that can ingest and output one sample every other clock.
181
        This option is incompatible with {\tt -2}, and will override {\tt -k 1}.
182
        You are responsible for making sure that {\tt i\_ce} will never be
183
        true for two clocks in a row.
184
 
185
        Unlike {\tt -k 1}, this option only requires two multiplies for all
186
        but the last two butterfly stages.
187
 
188
\item[\hbox{-k 3}]
189
        Builds an FFT that can ingest and output one sample every third clock.
190
        This option is incompatible with {\tt -2}, and will override any
191
        other {\tt -k} option.
192
        For this to work, you will need to guarantee that {\tt i\_ce} will
193
        never be true more than one time in any three clock periods.
194
 
195
        Unlike {\tt -k 1} and {\tt -k 2}, this option only requires one
196
        multiply for all but the last two butterfly stages.
197
 
198 10 dgisselq
\item[\hbox{-s}]
199
        This causes the core to skip the final bit reversal stage.  The
200
        outputs of the FFT will then come out in bit reversed order.
201
 
202
        This option is useful in those cases where someone wishes to
203
        multiply the coefficients coming out of an FFT by some product,
204
        and then to inverse FFT the results.  If the coefficients are also
205
        applied in bit--reversed order, then both the FFT and IFFT may
206
        skip their bit reversals.
207 22 dgisselq
 
208
        Be aware, however, doing this requires the bit reversed forward
209
        transform be followed by a bitreversed decimation in time approach
210
        to the inverse transform.  This software does not (yet) provide that
211 42 dgisselq
        capability.  As such, this capability is really just a placeholder for
212
        a future capability.
213 10 dgisselq
\item[\hbox{-d DIR}]
214
        Specifies the DIRectory to place the produced Verilog files.  By
215
        default, this will be in the `./fft-core/' directory, but it can
216
        be moved to any other directory as necessary.
217
\item[\hbox{-n bits}] Sets the number of input bits per sample.  Given this
218
        setting, each of the two samples clocked in at every clock cycle
219
        will have this many bits for their real portion, and again this many
220
        bits for their imaginary portion.  Thus, the data input to the
221
        FFT will be four times this many bits per clock.
222
\item[\hbox{-m bits}] This sets the maximum bit width of the output.
223
        By default, the FFT will gain bits as they accumulate within
224
        the FFT.  Bits are accumulated at roughly one bit for every two stages.
225
        However, if this value is set, bits are only accumulated up to this
226
        maximum width.  After this width, further accumulations are truncated.
227 42 dgisselq
\item[\hbox{-c bits}] Specifies the number of extra bits to be given to each
228
        twiddle factor.  The size of the twiddle factors is nominally the size
229
        of the input data.  By specifying {\tt -c <bits>}, you can extend
230
        this default value to avoid any loss in precision.
231
\item[\hbox{-x bits}]  Maintains {\tt bits} extra bits during the computation
232
        to deal with roundoff error.  Hence, after the first stage, there will
233
        be this many excess bits within the FFT pipeline.
234
 
235
        Internally accumulated roundoff error can be a difficult
236 22 dgisselq
        problem to solve.  By using this option, you guarantee that the FFT
237
        runs with an additional {\tt bits} bits, and only truncates down to
238
        the necessary width at the end in order to minimize rounding
239
        errors along the way.
240 42 dgisselq
 
241 22 dgisselq
\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT
242
        will consume.  By default, the FFT does not use any hardware multiplies.
243
        However, this can be expensive on the rest of the logic used by the
244
        device.  You can avoid this problem by allowing the FFT to use
245
        hardware multiplies using this option.  By default, the multiplies will
246
        be used in the latter stages, so that they will be applied where
247
        the bit width is the greatest.
248 10 dgisselq
\end{itemize}
249
 
250
\chapter{Architecture}
251
 
252
As a component of another system the structure of this system is a simple
253 42 dgisselq
black box such as the one shown in Fig.~\ref{fig:black-box-one}
254 10 dgisselq
\begin{figure}\begin{center}
255 42 dgisselq
\begin{pspicture}(-2.2in,0.3in)(2.2in,2in)
256
% \rput(0,0){\psframe(-2.2in,0.3in)(2.2in,2in)}
257
\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0.3in)(0.75in,2in)}
258
        \rput(0,1in){(I)FFT Core}
259
        \rput[r](-1.6in,1.8in){\tt i\_clk}
260
                \rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}
261
        \rput[r](-1.6in,1.5in){\tt i\_rst}
262
                \rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}
263
        \rput[r](-1.6in,1.2in){\tt i\_ce}
264
                \rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}
265
        \rput[r](-1.6in,0.6in){\tt i\_sample}
266
                \rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}
267
                \rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
268
                \rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}
269
        %
270
        \rput[l](1.6in,1.2in){\tt o\_sync}
271
                \rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}
272
        \rput[l](1.6in,0.6in){\tt o\_result}
273
                \rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}
274
                \rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
275
                \rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}
276
        }
277
\end{pspicture}
278
\caption{(I)FFT Black Box Diagram}\label{fig:black-box-one}
279
\end{center}\end{figure}
280
for the two traditional one sample per clock FFT implementation, or
281
Fig.~\ref{fig:black-box-dbl}
282
\begin{figure}\begin{center}
283 10 dgisselq
\begin{pspicture}(-2.1in,0)(2.1in,2in)
284
% \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)}
285
\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)}
286
        \rput(0,1in){(I)FFT Core}
287
        \rput[r](-1.6in,1.8in){\tt i\_clk}
288
                \rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}
289
        \rput[r](-1.6in,1.5in){\tt i\_rst}
290
                \rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}
291
        \rput[r](-1.6in,1.2in){\tt i\_ce}
292
                \rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}
293
        \rput[r](-1.6in,0.6in){\tt i\_left}
294
                \rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}
295
                \rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
296
                \rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}
297
        \rput[r](-1.6in,0.3in){\tt i\_right}
298
                \rput(-1.5in,0.3in){\psline{->}(0,0)(0.7in,0)}
299
                \rput(-1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
300
                \rput[br](-1.2in,0.3in){\scalebox{0.75}{$2N_i$}}
301
        %
302
        \rput[l](1.6in,1.2in){\tt o\_sync}
303
                \rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}
304
        \rput[l](1.6in,0.6in){\tt o\_left}
305
                \rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}
306
                \rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
307
                \rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}
308
        \rput[l](1.6in,0.3in){\tt o\_right}
309
                \rput(0.8in,0.3in){\psline{->}(0,0)(0.7in,0)}
310
                \rput(1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
311
                \rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}}
312
        }
313
\end{pspicture}
314 42 dgisselq
\caption{Two sample per clock (I)FFT Black Box Diagram}\label{fig:black-box-dbl}
315 10 dgisselq
\end{center}\end{figure}
316 42 dgisselq
for the two samples per clock FFT.
317
 
318
 
319 10 dgisselq
The interface
320
is simple: strobe the reset line, and every clock thereafter set the clock
321 42 dgisselq
enable line when data is valid on the input port.  Likewise
322 10 dgisselq
for the outputs, when the {\tt o\_sync} line goes high the first data sample
323
is available.  Ever after that, one data sample will be available every clock
324
cycle that the {\tt i\_ce} line is high.
325
 
326 42 dgisselq
Internal to the FFT, things are a touch more complex.
327
 
328
Fig.~\ref{fig:white-box-dbl}
329 10 dgisselq
\begin{figure}\begin{center}
330
\begin{pspicture}(1.3in,-0.5in)(4.7in,5in)
331
        % \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)}
332 11 dgisselq
        \rput(0,0){\psframe[linewidth=2\pslinewidth](1.3in,-0.25in)(4.7in,5in)}
333 10 dgisselq
        \rput(0,5in){%
334
                \rput[r](1.95in,0.125in){\tiny\tt i\_left}
335
                \rput[l](4.05in,0.125in){\tiny\tt i\_right}
336
                \rput(2.0in,0){\psline{->}(0,0.25in)(0,0.0in)}
337
                \rput(4.0in,0){\psline{->}(0,0.25in)(0,0.0in)}
338
        }
339
        \rput(2in,0){%
340
                \rput(0,4.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
341
                        \rput[r](-0.05in,0.675in){\tiny Left}
342
                        \rput(0.0in,0){\psline{->}(0,0.75in)(0,0.5in)}
343
                        \rput(0,0.25in){Evens, $N$}
344
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
345
                        \rput[l](0.35in,-0.125in){\tiny Data}
346
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
347
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
348
                \rput(0,3.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
349
                        \rput(0,0.25in){Evens, $N/2$}
350
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
351
                        \rput[l](0.35in,-0.125in){\tiny Data}
352
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
353
                        \rput( 0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
354
                % \rput(0,3in){\psframe(-0.5in,0)(0.5in,0.5in)%
355
                        % \rput(0,0.25in){Evens, $N$}}
356
                \rput(0,2.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
357
                        \rput(0,0.25in){Evens, $8$}
358
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
359
                        \rput[l](0.35in,-0.125in){\tiny Data}
360
                        \rput[r](-0.35in,0.675in){\tiny Sync}
361
                        \rput[l](0.35in,0.675in){\tiny Data}
362
                        \rput(-0.3in,0.9in){$\vdots$}
363
                        \rput( 0.3in,0.9in){$\vdots$}
364
                        \rput(-0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
365
                        \rput( 0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
366
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
367
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
368
                \rput(0,1.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
369
                        \rput(0,0.25in){Qtrstage (Even)}
370
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
371
                        \rput[lb](0.6in,-0.10in){\tiny Data}
372
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.5in)(0.8in,-0.5in)}
373
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.125in)(0.4in,-0.125in)(0.4in,-0.25in)}}
374
                % \rput(0,0.75in){\psframe(-0.5in,0)(0.5in,0.5in)%
375
                        % \rput(0,0.25in){dblstage}}
376
                % \rput(0,0in){\psframe(-0.5in,0)(0.5in,0.5in)%
377
                        % \rput(0,0.25in){Bit Reversal}}
378
        }
379
        \rput(4in,0){%
380
                \rput(0,4.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
381
                        \rput[l](0.05in,0.675in){\tiny Right}
382
                        \rput(0.0in,0){\psline{->}(0,0.75in)(0,0.5in)}
383
                        \rput(0,0.25in){Odds, $N$}
384
                        \rput[l](0.35in,-0.125in){\tiny Sync}
385
                        \rput[r](-0.35in,-0.125in){\tiny Data}
386
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
387
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
388
                \rput(0,3.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
389
                        \rput(0,0.25in){Odds, $N/2$}
390
                        \rput[l](0.35in,-0.125in){\tiny Sync}
391
                        \rput[r](-0.35in,-0.125in){\tiny Data}
392
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
393
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
394
                % \rput(0,3in){\psframe(-0.5in,0)(0.5in,0.5in)%
395
                        % \rput(0,0.25in){Evens, $N$}}
396
                \rput(0,2.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
397
                        \rput(0,0.25in){Odds, $8$}
398
                        \rput[l](0.35in,0.675in){\tiny Sync}
399
                        \rput[r](-0.35in,0.675in){\tiny Data}
400
                        \rput(-0.3in,0.9in){$\vdots$}
401
                        \rput( 0.3in,0.9in){$\vdots$}
402
                        \rput[l](0.35in,-0.125in){\tiny Sync}
403
                        \rput[r](-0.35in,-0.125in){\tiny Data}
404
                        \rput(-0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
405
                        \rput(0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
406
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
407
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
408
                \rput(0,1.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
409
                        \rput(0,0.25in){Qtrstage (Odd)}
410
                        \rput[rb](-0.6in,-0.10in){\tiny Data}
411
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}
412
                                \rput[t](0.3in,-0.3in){\tiny NC}
413
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.125in)(-0.4in,-0.125in)(-0.4in,-0.25in)}
414
                        }
415
        }
416
        \rput(3in,0.75in){\psframe(-0.5in,0)(0.5in,0.5in)%
417
                \rput(0,0.25in){Double Stage}
418
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
419
                        \rput[l](0.35in,-0.125in){\tiny Right}
420
                        \rput[r](0.15in,-0.125in){\tiny Left}
421
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
422
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
423
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
424
        \rput(3in,0in){\psframe(-0.5in,0)(0.5in,0.5in)%
425
                \rput(0,0.25in){Bit Reversal}
426
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
427
                        \rput[l](0.35in,-0.125in){\tiny Right}
428
                        \rput[r](0.15in,-0.125in){\tiny Left}
429
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
430
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
431
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
432
        \rput(3in,-0.25in){\rput[r](-0.35in,-0.125in){\tiny\tt o\_sync}
433
                        \rput[l](0.35in,-0.125in){\tiny\tt o\_right}
434
                        \rput[r](0.15in,-0.125in){\tiny\tt o\_left}
435
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
436
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
437
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
438
\end{pspicture}
439 42 dgisselq
\caption{Internal FFT Structure}\label{fig:white-box-dbl}
440 10 dgisselq
\end{center}\end{figure}
441 42 dgisselq
attempts to show some of this structure for the two-sample per clock FFT.
442
As you can see from the figure, the
443
FFT itself is composed of a series of stages.  For the two-sample per clock
444
FFT, these stages are split from the beginning into an even stage and an odd
445
stage.  Further, they are numbered
446 10 dgisselq
according to the size of the FFT they represent.  Therefore the first stage
447
is numbered $N$ and represents the first stage of an $N$ point FFT.  The
448 42 dgisselq
second stage is labeled $N/2$, then $N/4$, and so on down to $N=8$.  The
449 10 dgisselq
four sample stage and the two sample stages are different, however.  These
450
two stages, representing three blocks on Fig.~\ref{fig:white-box}, can be
451
accomplished without any multiplies.  Therefore they have been accomplished
452
separately.  Likewise all of the stages, save the double stage at the bottom,
453
operate on one data sample per clock.  Only the last stage, prior to the
454
bit reversal stage, takes two data samples per clock as input, and outputs two
455
data samples per clock.  Finally, the bit reversal stage acts as the last
456
piece of the structure.
457
 
458
Internal to each of the FFT stages is a butterfly and a complex multiply,
459
as shown in Fig.~\ref{fig:fftstage}.
460
\begin{figure}\begin{center}
461 11 dgisselq
\begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in)
462
        % \rput(0,0){\psframe(0in,-2in)(3in,4.25in)}
463
        \rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)}
464 42 dgisselq
        \rput[r](1.625in,4.125in){\tt i\_sample}
465 11 dgisselq
        \rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)%
466
                        \psline{->}(0,0)(-0.2in,-0.25in)%
467
                        \psarc{->}{0.15in}{200}{340}}
468 10 dgisselq
        \rput(0,2.75in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}
469
                        \rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}
470
                        \rput(0,0){\psframe(0.3in,0)(0.4in,0.25in)}
471
                        \rput(0,0){\psframe(0.5in,0)(0.6in,0.25in)}
472
                        \rput(0,0){\psframe(0.7in,0)(0.8in,0.25in)}
473
                        \rput(0,0){\psframe(0.9in,0)(1.0in,0.25in)}
474
                        \rput(0,0){\psframe(1.1in,0)(1.2in,0.25in)}
475
                        \rput(0,0){\psline{-}(0.7in,-0.05in)(1.1in,-0.25in)}
476 11 dgisselq
                        \rput(0,0){\psline{<-}(0.7in,0.3in)(1.5in,0.5in)(1.5in,0.75in)}}
477 10 dgisselq
        \rput(1.85in,2.75in){\psline(0,0.75in)(0,-0.25in)}
478 11 dgisselq
        \rput(0.6in,0.25in){\rput(0,0){\psframe[linewidth=2\pslinewidth](0,0)(2in,2.0in)}
479 10 dgisselq
                \rput(0.50in,2in){\psline{->}(0,0.25in)(0,0in)}
480
                \rput(1.25in,2in){\psline{->}(0,0.25in)(0,0in)}
481
                \rput(1.75in,2in){\psline{->}(0,0.25in)(0,0in)}
482
                \rput(0.5in,0){%
483
                        \rput(0in,0){\psline{->}(0,2.0in)(0,1.1in)}
484
                        \rput(0in,0){\psline{->}(0,1.75in)(0.65in,1.1in)}
485
                        \rput(-0.1in,1.1in){$+$}
486
                        \rput(0in,1.0in){$\bigoplus$}
487
                        \rput(0in,0){\psline{->}(0,0.9in)(0,0.75in)}
488
                        \rput(0in,0.5in){\psframe(-0.45in,-0.25in)(0.45in,0.25in)}
489
                        \rput(0in,0.5in){\parbox{0.8in}{Delay, and\\shift by $C-2$}}
490
                        \rput(0in,0){\psline{->}(0,0.25in)(0,0.0in)}}
491
                \rput(1.25in,0){%
492
                        \rput(0in,0){\psline{->}(0,2.0in)(0,1.1in)}
493
                        \rput(0in,0){\psline{->}(0,1.75in)(-0.65in,1.1in)}
494
                        \rput(0.1in,1.1in){$-$}
495
                        \rput(0in,1in){$\bigoplus$}
496
                        \rput(0in,0){\psline{->}(0,0.9in)(0,0.6in)}
497
                        \rput(0in,0.5in){$\bigotimes$}
498
                        \rput(0in,0){\psline{->}(0,0.4in)(0,0.0in)}}
499
                \rput(1.75in,0){%
500
                        \rput(0,0){\psline{->}(0,2.0in)(0,0.5in)(-0.4in,0.5in)}}
501 11 dgisselq
                \rput(0.50in,-0.25in){\psline{->}(0,0.25in)(0,-1.05in)}
502
                \rput(1.25in,-0.25in){\psline{-}(0,0.25in)(0,0in)}}
503
        \rput*[l](2.0in,0.5in){DIF Butterfly}
504
        \rput*[lb](1.95in,2.5in){Coefficient memory}
505 10 dgisselq
        % \rput(0,0){\psframe(1.3in,-0.25in)(4.7in,5in)}
506 11 dgisselq
        \rput(1.7in,-0.5in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}
507 10 dgisselq
                        \rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}
508
                        \rput(0,0){\psframe(0.3in,0)(0.4in,0.25in)}
509
                        \rput(0,0){\psframe(0.5in,0)(0.6in,0.25in)}
510
                        \rput(0,0){\psframe(0.7in,0)(0.8in,0.25in)}
511
                        \rput(0,0){\psframe(0.9in,0)(1.0in,0.25in)}
512
                        \rput(0,0){\psframe(1.1in,0)(1.2in,0.25in)}
513 11 dgisselq
                        \rput(0,0){\psline{<-}(0.7in,0.30in)(0.15in,0.5in)}
514
                        \rput(0,0){\psline{->}(0.7in,-0.05in)(-0.2in,-0.3in)(-0.2in,-0.55in)}}
515
        \rput(1.3in,-1.3in){\psline{->}(-0.2in,0.25in)(0,0)}
516
        \rput(1.3in,-1.3in){\psarcn{->}{0.15in}{150}{30}}
517
        \rput(1.3in,-1.3in){\psline{->}(0,0)(0,-0.5in)}
518
        \rput[l](1.35in,-1.675in){\tt o\_data}
519 10 dgisselq
\end{pspicture}
520 11 dgisselq
\caption{A Single FFT Stage, with Butterfly}\label{fig:fftstage}
521 10 dgisselq
\end{center}\end{figure}
522
These FFT stages are really no different than any other decimation in
523 42 dgisselq
frequency radix-2 FFT implementation, save only that for the two-sample per
524
clock FFT the coefficients are alternated between the two inputs
525
shown in Fig.~\ref{fig:white-box-dbl}.  That is, the even stages would get all
526
the even coefficients, and the odd stages get all of the odd coefficients.
527
For the more general purpose FFT, there's only the one pipeline, so every
528
sample goes through every butterfly.
529
Internally, each stage spends the first $N/2$ clocks storing its inputs
530
into memory, and then the next $N/2$ clocks pairing a stored input with
531
a single (fresh) external input, so that both values become inputs to the
532
butterfly.  Likewise, the butterfly coefficient is read from a small ROM table.
533 10 dgisselq
 
534
One trick to making the FFT stage work successfully is synchronization.  Since
535 22 dgisselq
the shift and add multiplies create a delay of (roughly) one clock cycle per
536 42 dgisselq
two bits of input, there is a significant pipeline delay from the input to the
537 22 dgisselq
output of the butterfly routine.  To match this delay, the FFT stage places a
538 10 dgisselq
synchronization pulse into the butterfly.  When this synchronization pulse
539
comes out of the butterfly, the values of the butterfly then match the
540
first sample out of the stage.  The next synchronization problem comes from
541
the fact that the butterflies operate on two samples at a time, whereas the
542
FFT stage operates on a single sample at a time.  This means that half the
543
time the butterfly output will be invalid.  To keep things aligned, and to
544
avoid the invalid data half, a counter is started by the synchronization pulse
545
coming out of the butterfly in order to keep track.  Using this counter and
546 42 dgisselq
once the butterfly produces the first sync pulse, the next $N/2$ clock cycles
547 10 dgisselq
will produce valid butterfly outputs.  For these clock cycles, the left or
548
first output is sent immediately to the next FFT stage, whereas the right
549
or second output is saved into memory.  Once these cycles are complete, the
550 42 dgisselq
butterfly outputs will be invalid for the next $N/2$ clock cycles.  During
551 10 dgisselq
these invalid clock cycles, the FFT stage outputs data that had been stored
552
in memory.  In this fashion, data is always valid coming out of each FFT
553
stage once the initial synchronization pulse goes high.
554
 
555
The complex multiply itself, formed internal to the butterfly routine, is
556
formed from three very simple shift and add multiplies, whose output is
557 22 dgisselq
then transformed into a single complex output, although there is a command
558
line option to use hardware multiplies instead.  To avoid overflow, the
559 10 dgisselq
complex coefficients, $z_n$, for these multiplies are given by,
560
\begin{eqnarray}
561
z_n &=& c_n + js_n,\mbox{ where} \\
562
c_n &=& \left\lfloor 2^{C-2}\cos\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor,\\
563
s_n &=& \left\lfloor 2^{C-2}\sin\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor\mbox{, and}
564
\end{eqnarray}
565
$C$ is the number of bits allocated to the coefficient.
566
 
567
For those wishing to understand this operation further and in more depth, I
568
would commend them to the literature on how a decimation in frequency FFT is
569
constructed.
570
 
571
\chapter{Operation}
572
 
573
The core is actually really easy to use:
574
\begin{enumerate}
575
        \item Provide a system clock to the core every clock cycle.
576
        \item Set the {\tt i\_rst} line high for at least one clock cycle
577
                before you intend to use the core.
578
        \item From the time of reset until the first sample pair is available
579
                on the IO ports, {\tt i\_rst} may be kept low, but the clock
580
                enable line {\tt i\_ce} must also be kept low.
581 42 dgisselq
 
582
        \item On the clock containing the first sample, {\tt i\_sample}, or the
583
                first sample pair, {\tt i\_left} and {\tt i\_right} for the
584
                two sample-per-clock FFT, set {\tt i\_ce} high.
585
 
586
                If you have elected an FFT that multiplexes its multiplies,
587
                and so can only handle one sample every two or three clocks,
588
                then you'll need to guarantee {\tt i\_ce} remains low for
589
                one or two clocks respectively before raising it again.
590
 
591
        \item Ever after, any time a valid sample is placed in {\tt i\_sample}
592
                and {\tt i\_ce} raised high, a sample will enter into the
593
                FFT for processing.  For the two sample-per-clock FFT, the
594
                input will be into {\tt i\_left} (for the first input)
595
                and {\tt i\_right} (for the next one).
596
 
597 10 dgisselq
        \item At the first valid output, the FFT core will set {\tt o\_sync}
598 42 dgisselq
                line high in addition to placing the output value into
599
                {\tt o\_result}.  For the two-sample per clock FFT, the outputs
600
                will be placed into {\tt o\_left}
601
                (the first of two), and {\tt o\_right} (the second of the two)
602
                respectively.
603
 
604 10 dgisselq
        \item Ever after, whenever {\tt i\_ce} is high, the FFT core will clock
605 11 dgisselq
                two samples in and two samples out.  On any valid first
606 10 dgisselq
                pair of samples coming out of the transform,
607
                {\tt o\_sync} will be high.  Otherwise {\tt o\_sync} will
608
                remain low.
609
\end{enumerate}
610
 
611
There are no special modes or states associated with this core.  If you wish
612
it to stop or pause, just turn off {\tt i\_ce}.  If you wish to flush the
613
core, just send zeros into the core.
614
 
615
\chapter{Registers}
616
 
617
Once built, the FFT routine has no capability for runtime configuration
618
or reconfiguration.  Therefore, this implementation maintains no user
619
configurable or readable registers.
620
 
621
This is a great advantage in many ways, simply because it greatly simplifies
622
the interface over other cores that are available out there.
623
 
624
\chapter{Clocks}
625
 
626
The FFT routines built by this core use one clock only.  The speed of this
627
clock will depend upon the speed your hardware is capable of.  If your data
628
rate is slower than your clock speed, just hold off on the {\tt i\_ce}
629
line as necessary so that every clock with the {\tt i\_ce} line high is a
630
valid sample.
631
 
632
\chapter{IO Ports}
633
 
634
The FFT core presents a small set of IO ports to its external interface.
635
These ports are listed in Table.~\ref{tbl:ioports}.
636
\begin{table}[htbp]
637
\begin{center}
638
\begin{portlist}
639
i\_clk & 1 & Input & The global clock driving the FFT. \\\hline
640
i\_rst & 1 & Input & An active high synchronous reset.\\\hline
641
i\_ce & 1 & Input & Clock Enable.  Set this high to clock data in and
642
                out.\\\hline
643 42 dgisselq
i\_sample & $2N_i$ & Input & The complex input sample.  Bits
644 12 dgisselq
                [$\left(2N_i-1\right)$:$N_i$] of this value are the real
645
                portion, whereas bits [$\left(N_i-1\right)$:0] represent the
646
                imaginary portion.  Both portions are in signed twos complement
647
                integer format.  The number of bits, $N_i$, is configurable.
648 10 dgisselq
                \\\hline
649 42 dgisselq
 
650
i\_left & $2N_i$ & Input & When the core is configured for two-samples per
651
                clock,
652
                this is the first of the two data inputs presented to the core
653
                on any given clock.  It has the same format as {\tt i\_sample}
654
                above.
655 10 dgisselq
                \\\hline
656 42 dgisselq
i\_right & $2N_i$ & Input & The second of two input complex input samples,
657
                used when the core is configured for two-samples per clock.
658
                The format is the same as {\tt i\_sample} above.\\\hline
659
o\_sync & 1 & Output & Signals the first output sample of any transform.
660
                It will be zero from the time of the reset until the first
661
                output sample.  Ever afterwards, it will be true any time
662
                bin zero of the FFT is on the output.\\\hline
663
o\_result & $2N_o$ & Output & The complex output sample.  The format is the
664
                same, save only that $N_o$ bits are used for each twos
665
                complement portion instead of $N_i$.  Hence bits
666
                [$\left(2N_o-1\right)$:$N_o$] of this value are the real
667
                portion, whereas bits [$\left(N_o-1\right)$:0] represent the
668
                imaginary portion.
669
                \\\hline
670
o\_left & $2N_o$ & Output & When in the two-sample per clock configuration,
671
                this is the first of two complex output samples.\\\hline
672
o\_right & $2N_o$ & Output & When in the two-sample per clcok configuration,
673
                this is the second of two complex output samples.  \\\hline
674
                \\\hline
675 10 dgisselq
\end{portlist}
676
\caption{List of IO ports}\label{tbl:ioports}
677
\end{center}\end{table}
678
% Appendices
679
% Index
680
\end{document}
681
 
682
 

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.