\documentclass{gqtekspec}
\project{Double Clocked FFT}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq\at opencores.org}
\revision{Rev.~0.1}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC

This project is free software (firmware): you can redistribute it and/or
modify it under the terms of  the GNU General Public License as published
by the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program.  If not, see \hbox{<http://www.gnu.org/licenses/>} for a copy.
\end{license}
\begin{revisionhistory}
0.1 & 3/3/2015 & Gisselquist & First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
\listoffigures
\listoftables
\begin{preface}
This FFT comes from my attempts to design and implement a signal processing
algorithm inside a generic FPGA, but only on a limited budget.  As such,
I don't yet have the FPGA board I wish to place this algorithm onto, nor
do I have any expensive modeling or simulation capabilities.  I'm using
Verilator for my modeling and simulation needs.  This makes
using a vendor-supplied IP core, such as an FFT, difficult if not impossible.

My problem was made worse when I learned that the published maximum clock
speed for a device wasn't necessarily the maximum clock speed that I could
achieve.  My design needed to process the incoming signal at 500~MHz to be
commercially viable.  500~MHz is not necessarily a clock speed
that can be easily achieved.  250~MHz, on the other hand, is much more within
the realm of possibility.  Achieving 500~MHz performance with a 250~MHz
clock, however, requires an FFT that accepts two samples per clock.

This, then, was and is the genesis of this project.
\end{preface}

\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}

The Double Clocked FFT project contains all of the software necessary to
generate the IP for an arbitrarily sized FFT that clocks two samples
in at each clock cycle and, after some pipeline delay, clocks two
samples out at every clock cycle.

The FFT generated by this approach is very configurable.  By simple adjustment
of a command line parameter, the FFT may be made to be a forward FFT or an
inverse FFT.  The number of bits processed, kept, and maintained by this
FFT is also configurable.  Even the number of bits used for the twiddle
factors, and whether or not to bit reverse the outputs, are configurable
parts of this FFT core.

These features set the Double Clocked FFT apart from the
other cores available on opencores.org.

For those who wish to get started right away, please download the package,
change into the {\tt sw} directory and run {\tt make}.  There is no need to
run a configure script, since {\tt fftgen} is completely portable C++.  Then, once
built, go ahead and run {\tt fftgen} without any arguments.  This will cause
{\tt fftgen} to print a usage statement to the screen.  Review the usage
statement, and run {\tt fftgen} a second time with the arguments you need.


\chapter{Generation}

Creating a double clocked FFT core is as simple as running the program
{\tt fftgen}.  The program will then create a series of Verilog files, as
well as {\tt .hex} files suitable for use with {\tt \$readmemh}, and
place them into an {\tt ./fft-core/} directory that {\tt fftgen} will create.
Creating the core you want takes a touch of configuring.
Therefore, the following lists the arguments that can be given to
{\tt fftgen} to adjust the core that it builds:
\begin{itemize}
\item[\hbox{-f size}]
        This specifies the size of the FFT core that {\tt fftgen} will build.
        The size must be a power of two.  The transform is given, to within a
        scale factor, by
        \begin{eqnarray*}
        X\left[k\right] &=& \sum_{n=0}^{N-1} x\left[n\right]
                e^{-j2\pi \frac{k}{N}n}
        \end{eqnarray*}

\item[\hbox{-1}]
        This specifies that the FFT will be an inverse FFT.  Specifically,
        it will calculate,
        \begin{eqnarray*}
        x\left[n\right] &=& \sum_{k=0}^{N-1} X\left[k\right] e^{j2\pi \frac{k}{N}n}
        \end{eqnarray*}
\item[\hbox{-0}]
        This specifies building a forward FFT.  However, since this is the
        default, this option is never necessary.
\item[\hbox{-s}]
        This causes the core to skip the final bit reversal stage.  The
        outputs of the FFT will then come out in bit reversed order.

        This option is useful in those cases where someone wishes to
        multiply the coefficients coming out of an FFT by some set of factors,
        and then to inverse FFT the results.  If those factors are also
        applied in bit-reversed order, then both the FFT and IFFT may
        skip their bit reversals.
\item[\hbox{-S}]
        Include the final bit reversal stage.  As this is also the default,
        specifying the option should not be necessary.
\item[\hbox{-d DIR}]
        Specifies the DIRectory in which to place the produced Verilog files.  By
        default, this will be the `./fft-core/' directory, but it can
        be changed to any other directory as necessary.
\item[\hbox{-n bits}] Sets the number of input bits per sample.  Given this
        setting, each of the two samples clocked in at every clock cycle
        will have this many bits for its real portion, and again this many
        bits for its imaginary portion.  Thus, the data input to the
        FFT will be four times this many bits per clock.
\item[\hbox{-m bits}] This sets the maximum bit width of the output.
        By default, the FFT will gain bits as they accumulate within
        the FFT.  Bits are accumulated at roughly one bit for every two stages.
        However, if this value is set, bits are only accumulated up to this
        maximum width.  Beyond this width, further accumulations are truncated.
\item[\hbox{-c bits}] The number of bits in each twiddle coefficient is given
        by the number of bits input to that stage plus this extra number of
        bits per coefficient.  By increasing the number of bits per coefficient
        above that of the input samples, truncation error is kept to the
        level of the error already present in the original samples.
\end{itemize}
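
As a hedged illustration of how these arguments combine, consider the
following invocation.  The particular values are chosen only for the sake of
example, and the usage statement printed by {\tt fftgen} itself remains the
authority on the accepted syntax:
\begin{verbatim}
fftgen -f 1024 -n 12 -m 16 -c 2 -d ./fft-core
\end{verbatim}
Going by the descriptions above, this would request a forward, 1024~point
FFT with 12~bit inputs and two extra bits per twiddle coefficient.  The
associated widths then work out to roughly
\begin{eqnarray*}
4 \times 12 &=& 48 \mbox{ input bits per clock,}\\
\log_2 1024 &=& 10 \mbox{ stages, and}\\
12 + 10/2 &=& 17 \mbox{ bits of output width if left unconstrained,}
\end{eqnarray*}
which the {\tt -m 16} argument would then limit to 16~bits.  The figure of
17~bits rests only on the ``roughly one bit for every two stages'' rule quoted
above, so it should be read as an estimate.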

\chapter{Architecture}

As a component of a larger system, this core presents itself as a simple
black box such as the one shown in Fig.~\ref{fig:black-box}.
\begin{figure}\begin{center}
\begin{pspicture}(-2.1in,0)(2.1in,2in)
% \rput(0,0){\psframe(-2.1in,0)(2.1in,2in)}
\rput(0,0){\rput(0,0){\psframe[linewidth=2\pslinewidth](-0.75in,0)(0.75in,2in)}
        \rput(0,1in){(I)FFT Core}
        \rput[r](-1.6in,1.8in){\tt i\_clk}
                \rput(-1.5in,1.8in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,1.5in){\tt i\_rst}
                \rput(-1.5in,1.5in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,1.2in){\tt i\_ce}
                \rput(-1.5in,1.2in){\psline{->}(0,0)(0.7in,0)}
        \rput[r](-1.6in,0.6in){\tt i\_left}
                \rput(-1.5in,0.6in){\psline{->}(0,0)(0.7in,0)}
                \rput(-1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](-1.2in,0.6in){\scalebox{0.75}{$2N_i$}}
        \rput[r](-1.6in,0.3in){\tt i\_right}
                \rput(-1.5in,0.3in){\psline{->}(0,0)(0.7in,0)}
                \rput(-1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](-1.2in,0.3in){\scalebox{0.75}{$2N_i$}}
        %
        \rput[l](1.6in,1.2in){\tt o\_sync}
                \rput(0.8in,1.2in){\psline{->}(0,0)(0.7in,0)}
        \rput[l](1.6in,0.6in){\tt o\_left}
                \rput(0.8in,0.6in){\psline{->}(0,0)(0.7in,0)}
                \rput(1.15in,0.6in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](1.1in,0.6in){\scalebox{0.75}{$2N_o$}}
        \rput[l](1.6in,0.3in){\tt o\_right}
                \rput(0.8in,0.3in){\psline{->}(0,0)(0.7in,0)}
                \rput(1.15in,0.3in){\psline(-0.05in,-0.05in)(0.05in,0.05in)}
                \rput[br](1.1in,0.3in){\scalebox{0.75}{$2N_o$}}
        }
\end{pspicture}
\caption{(I)FFT Black Box Diagram}\label{fig:black-box}
\end{center}\end{figure}
The interface
is simple: strobe the reset line, and on every clock thereafter set the clock
enable line whenever data is valid on the left and right input ports.  Likewise
for the outputs, when the {\tt o\_sync} line goes high the first pair of data
samples is available.  From then on, one pair of data samples will be available
every clock cycle that the {\tt i\_ce} line is high.

Internal to the FFT, things are a touch more complex.  Fig.~\ref{fig:white-box}
\begin{figure}\begin{center}
\begin{pspicture}(1.3in,-0.5in)(4.7in,5in)
        % \rput(0,0){\psframe(0,-0.5in)(\textwidth,5.25in)}
        \rput(0,0){\psframe[linewidth=2\pslinewidth](1.3in,-0.25in)(4.7in,5in)}
        \rput(0,5in){%
                \rput[r](1.95in,0.125in){\tiny\tt i\_left}
                \rput[l](4.05in,0.125in){\tiny\tt i\_right}
                \rput(2.0in,0){\psline{->}(0,0.25in)(0,0.0in)}
                \rput(4.0in,0){\psline{->}(0,0.25in)(0,0.0in)}
        }
        \rput(2in,0){%
                \rput(0,4.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput[r](-0.05in,0.675in){\tiny Left}
                        \rput(0.0in,0){\psline{->}(0,0.75in)(0,0.5in)}
                        \rput(0,0.25in){Evens, $N$}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                \rput(0,3.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Evens, $N/2$}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput( 0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                % \rput(0,3in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        % \rput(0,0.25in){Evens, $N$}}
                \rput(0,2.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Evens, $8$}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Data}
                        \rput[r](-0.35in,0.675in){\tiny Sync}
                        \rput[l](0.35in,0.675in){\tiny Data}
                        \rput(-0.3in,0.9in){$\vdots$}
                        \rput( 0.3in,0.9in){$\vdots$}
                        \rput(-0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
                        \rput( 0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                \rput(0,1.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Qtrstage (Even)}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[lb](0.6in,-0.10in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.5in)(0.8in,-0.5in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.125in)(0.4in,-0.125in)(0.4in,-0.25in)}}
                % \rput(0,0.75in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        % \rput(0,0.25in){dblstage}}
                % \rput(0,0in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        % \rput(0,0.25in){Bit Reversal}}
        }
        \rput(4in,0){%
                \rput(0,4.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput[l](0.05in,0.675in){\tiny Right}
                        \rput(0.0in,0){\psline{->}(0,0.75in)(0,0.5in)}
                        \rput(0,0.25in){Odds, $N$}
                        \rput[l](0.35in,-0.125in){\tiny Sync}
                        \rput[r](-0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                \rput(0,3.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Odds, $N/2$}
                        \rput[l](0.35in,-0.125in){\tiny Sync}
                        \rput[r](-0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                % \rput(0,3in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        % \rput(0,0.25in){Evens, $N$}}
                \rput(0,2.25in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Odds, $8$}
                        \rput[l](0.35in,0.675in){\tiny Sync}
                        \rput[r](-0.35in,0.675in){\tiny Data}
                        \rput(-0.3in,0.9in){$\vdots$}
                        \rput( 0.3in,0.9in){$\vdots$}
                        \rput[l](0.35in,-0.125in){\tiny Sync}
                        \rput[r](-0.35in,-0.125in){\tiny Data}
                        \rput(-0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0.75in){\psline{->}(0,0)(0,-0.25in)}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
                \rput(0,1.5in){\psframe(-0.5in,0)(0.5in,0.5in)%
                        \rput(0,0.25in){Qtrstage (Odd)}
                        \rput[rb](-0.6in,-0.10in){\tiny Data}
                        \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                                \rput[t](0.3in,-0.3in){\tiny NC}
                        \rput(-0.3in,0){\psline{->}(0,0)(0,-0.125in)(-0.4in,-0.125in)(-0.4in,-0.25in)}
                        }
        }
        \rput(3in,0.75in){\psframe(-0.5in,0)(0.5in,0.5in)%
                \rput(0,0.25in){Double Stage}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Right}
                        \rput[r](0.15in,-0.125in){\tiny Left}
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
        \rput(3in,0in){\psframe(-0.5in,0)(0.5in,0.5in)%
                \rput(0,0.25in){Bit Reversal}
                        \rput[r](-0.35in,-0.125in){\tiny Sync}
                        \rput[l](0.35in,-0.125in){\tiny Right}
                        \rput[r](0.15in,-0.125in){\tiny Left}
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
        \rput(3in,-0.25in){\rput[r](-0.35in,-0.125in){\tiny\tt o\_sync}
                        \rput[l](0.35in,-0.125in){\tiny\tt o\_right}
                        \rput[r](0.15in,-0.125in){\tiny\tt o\_left}
                \rput(-0.3in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.2in,0){\psline{->}(0,0)(0,-0.25in)}
                \rput(0.3in,0){\psline{->}(0,0)(0,-0.25in)}}
\end{pspicture}
\caption{Internal FFT Structure}\label{fig:white-box}
\end{center}\end{figure}
attempts to show some of this structure.  As you can see from the figure, the
FFT itself is composed of a series of stages.  These stages are split from the
beginning into an even stage and an odd stage.  Further, they are numbered
according to the size of the FFT they represent.  Therefore the first stage
is numbered $N$ and represents the first stage of an $N$ point FFT.  The
second stage is labeled $N/2$, then $N/4$, and so on down to $8$.  The
four sample stage and the two sample stage are different, however.  These
two stages, representing three blocks in Fig.~\ref{fig:white-box}, can be
accomplished without any multiplies.  Therefore they have been implemented
separately.  Likewise all of the stages, save the double stage at the bottom,
operate on one data sample per clock.  Only the last stage, prior to the
bit reversal stage, takes two data samples per clock as input, and outputs two
data samples per clock.  Finally, the bit reversal stage acts as the last
piece of the structure.
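
To make the job of the bit reversal stage concrete, recall that a radix two
decimation in frequency FFT fed with inputs in natural order generally
produces its outputs in bit reversed order.  Bit reversal maps each index to
the index whose binary digits are written in the opposite order.  For an
8~point transform, a size chosen here only because it is small enough to
tabulate, the mapping is
\begin{center}\begin{tabular}{l|cccccccc}
Index $k$ & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\\hline
Binary & 000 & 001 & 010 & 011 & 100 & 101 & 110 & 111 \\
Reversed & 000 & 100 & 010 & 110 & 001 & 101 & 011 & 111 \\
Bit reversed index & 0 & 4 & 2 & 6 & 1 & 5 & 3 & 7 \\
\end{tabular}\end{center}
This table illustrates the index mapping only.  How the reordered samples are
distributed between the left and right outputs on any given clock is not
something this example attempts to describe.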

Internal to each of the FFT stages is a butterfly and a complex multiply,
as shown in Fig.~\ref{fig:fftstage}.
\begin{figure}\begin{center}
\begin{pspicture}(-0.25in,-1.8in)(3.25in,4.25in)
        % \rput(0,0){\psframe(0in,-2in)(3in,4.25in)}
        \rput(0,0){\psframe[linewidth=2\pslinewidth](-0.25in,-1.55in)(3.25in,4.0in)}
        \rput[r](1.625in,4.125in){\tt i\_data}
        \rput(1.675in,3.75in){\psline{->}(0,0.5in)(0,0in)%
                        \psline{->}(0,0)(-0.2in,-0.25in)%
                        \psarc{->}{0.15in}{200}{340}}
        \rput(0,2.75in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}
                        \rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}
                        \rput(0,0){\psframe(0.3in,0)(0.4in,0.25in)}
                        \rput(0,0){\psframe(0.5in,0)(0.6in,0.25in)}
                        \rput(0,0){\psframe(0.7in,0)(0.8in,0.25in)}
                        \rput(0,0){\psframe(0.9in,0)(1.0in,0.25in)}
                        \rput(0,0){\psframe(1.1in,0)(1.2in,0.25in)}
                        \rput(0,0){\psline{-}(0.7in,-0.05in)(1.1in,-0.25in)}
                        \rput(0,0){\psline{<-}(0.7in,0.3in)(1.5in,0.5in)(1.5in,0.75in)}}
        \rput(1.85in,2.75in){\psline(0,0.75in)(0,-0.25in)}
        \rput(0.6in,0.25in){\rput(0,0){\psframe[linewidth=2\pslinewidth](0,0)(2in,2.0in)}
                \rput(0.50in,2in){\psline{->}(0,0.25in)(0,0in)}
                \rput(1.25in,2in){\psline{->}(0,0.25in)(0,0in)}
                \rput(1.75in,2in){\psline{->}(0,0.25in)(0,0in)}
                \rput(0.5in,0){%
                        \rput(0in,0){\psline{->}(0,2.0in)(0,1.1in)}
                        \rput(0in,0){\psline{->}(0,1.75in)(0.65in,1.1in)}
                        \rput(-0.1in,1.1in){$+$}
                        \rput(0in,1.0in){$\bigoplus$}
                        \rput(0in,0){\psline{->}(0,0.9in)(0,0.75in)}
                        \rput(0in,0.5in){\psframe(-0.45in,-0.25in)(0.45in,0.25in)}
                        \rput(0in,0.5in){\parbox{0.8in}{Delay, and\\shift by $C-2$}}
                        \rput(0in,0){\psline{->}(0,0.25in)(0,0.0in)}}
                \rput(1.25in,0){%
                        \rput(0in,0){\psline{->}(0,2.0in)(0,1.1in)}
                        \rput(0in,0){\psline{->}(0,1.75in)(-0.65in,1.1in)}
                        \rput(0.1in,1.1in){$-$}
                        \rput(0in,1in){$\bigoplus$}
                        \rput(0in,0){\psline{->}(0,0.9in)(0,0.6in)}
                        \rput(0in,0.5in){$\bigotimes$}
                        \rput(0in,0){\psline{->}(0,0.4in)(0,0.0in)}}
                \rput(1.75in,0){%
                        \rput(0,0){\psline{->}(0,2.0in)(0,0.5in)(-0.4in,0.5in)}}
                \rput(0.50in,-0.25in){\psline{->}(0,0.25in)(0,-1.05in)}
                \rput(1.25in,-0.25in){\psline{-}(0,0.25in)(0,0in)}}
        \rput*[l](2.0in,0.5in){DIF Butterfly}
        \rput*[lb](1.95in,2.5in){Coefficient memory}
        % \rput(0,0){\psframe(1.3in,-0.25in)(4.7in,5in)}
        \rput(1.7in,-0.5in){\rput(0,0){\psframe(0,0)(1.3in,0.25in)}
                        \rput(0,0){\psframe(0.1in,0)(0.2in,0.25in)}
                        \rput(0,0){\psframe(0.3in,0)(0.4in,0.25in)}
                        \rput(0,0){\psframe(0.5in,0)(0.6in,0.25in)}
                        \rput(0,0){\psframe(0.7in,0)(0.8in,0.25in)}
                        \rput(0,0){\psframe(0.9in,0)(1.0in,0.25in)}
                        \rput(0,0){\psframe(1.1in,0)(1.2in,0.25in)}
                        \rput(0,0){\psline{<-}(0.7in,0.30in)(0.15in,0.5in)}
                        \rput(0,0){\psline{->}(0.7in,-0.05in)(-0.2in,-0.3in)(-0.2in,-0.55in)}}
        \rput(1.3in,-1.3in){\psline{->}(-0.2in,0.25in)(0,0)}
        \rput(1.3in,-1.3in){\psarcn{->}{0.15in}{150}{30}}
        \rput(1.3in,-1.3in){\psline{->}(0,0)(0,-0.5in)}
        \rput[l](1.35in,-1.675in){\tt o\_data}
\end{pspicture}
\caption{A Single FFT Stage, with Butterfly}\label{fig:fftstage}
\end{center}\end{figure}
These FFT stages are really no different from those of any other decimation in
frequency FFT, save only that the coefficients are alternated between the
two stages.  That is, the even stages get all the even coefficients, and
the odd stages get all of the odd coefficients.
Internally, each stage spends the first $N/4$ clocks storing its inputs
into memory, and then the next $N/4$ clocks pairing a stored input with
a single external input, so that both values become inputs to the butterfly.
Likewise, the butterfly coefficient is read from a small ROM table.
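
The split into even and odd coefficients can be motivated from the textbook
radix two decimation in frequency butterfly.  What follows is a sketch in
standard notation, writing $W_N = e^{-j2\pi/N}$ for a forward transform, and
it makes no claim about the exact scaling used inside the core.  The first
stage of an $N$ point transform pairs $x\left[n\right]$ with
$x\left[n+N/2\right]$, for $0\leq n<N/2$:
\begin{eqnarray*}
a\left[n\right] &=& x\left[n\right] + x\left[n+\frac{N}{2}\right],\\
b\left[n\right] &=& \left(x\left[n\right] - x\left[n+\frac{N}{2}\right]\right)W_N^n.
\end{eqnarray*}
Assuming, as the labels in Fig.~\ref{fig:white-box} suggest, that the left
path carries the even samples, $n=2m$, and the right path the odd samples,
$n=2m+1$, the two inputs to any butterfly travel down the same path.  Since
each path sees only every other sample, the partners $x\left[2m\right]$ and
$x\left[2m+N/2\right]$ arrive $N/4$ clocks apart, which agrees with the $N/4$
clocks of storage described above.  The coefficients required are then
\begin{eqnarray*}
W_N^{2m} && \mbox{on the even path, and}\\
W_N^{2m+1} && \mbox{on the odd path, for } 0\leq m<\frac{N}{4}.
\end{eqnarray*}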

One trick to making the FFT stage work successfully is synchronization.  Since
the multiplies create a delay of (roughly) one clock cycle per bit of input,
there is a significant pipeline delay from the input to the output of the
butterfly routine.  To match this delay, the FFT stage places a
synchronization pulse into the butterfly.  When this synchronization pulse
comes out of the butterfly, the values coming out of the butterfly match the
first sample out of the stage.  The next synchronization problem comes from
the fact that the butterflies operate on two samples at a time, whereas the
FFT stage operates on a single sample at a time.  This means that half the
time the butterfly output will be invalid.  To keep things aligned, and to
avoid the invalid half of the data, a counter is started by the synchronization
pulse coming out of the butterfly.  Once the butterfly produces this first sync
pulse, the next $N/4$ clock cycles
will produce valid butterfly outputs.  For these clock cycles, the left or
first output is sent immediately to the next FFT stage, whereas the right
or second output is saved into memory.  Once these cycles are complete, the
butterfly outputs will be invalid for the next $N/4$ clock cycles.  During
these invalid clock cycles, the FFT stage outputs the data that had been stored
in memory.  In this fashion, data is always valid coming out of each FFT
stage once the initial synchronization pulse goes high.

The complex multiply itself, internal to the butterfly routine, is
formed from three very simple shift and add multiplies, whose outputs are
then combined into a single complex output.  To avoid overflow, the
complex coefficients, $z_n$, for these multiplies are given by,
\begin{eqnarray}
z_n &=& c_n + js_n,\mbox{ where} \\
c_n &=& \left\lfloor 2^{C-2}\cos\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor,\\
s_n &=& \left\lfloor 2^{C-2}\sin\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor\mbox{, and}
\end{eqnarray}
$C$ is the number of bits allocated to the coefficient.
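
As a worked example of the coefficient definition, take $C=8$ and $N=16$,
values chosen here only for illustration.  Then $2^{C-2}=64$, and the first
few coefficients are
\begin{eqnarray*}
c_0 &=& \left\lfloor 64\cos\left(0\right)+\frac{1}{2}\right\rfloor = 64,
        \quad s_0 = \left\lfloor 0+\frac{1}{2}\right\rfloor = 0,\\
c_1 &=& \left\lfloor 64\cos\left(\frac{\pi}{8}\right)+\frac{1}{2}\right\rfloor
        = \left\lfloor 59.63\right\rfloor = 59,\\
s_1 &=& \left\lfloor 64\sin\left(\frac{\pi}{8}\right)+\frac{1}{2}\right\rfloor
        = \left\lfloor 24.99\right\rfloor = 24,\\
c_2 &=& s_2 = \left\lfloor 64\cos\left(\frac{\pi}{4}\right)+\frac{1}{2}\right\rfloor
        = \left\lfloor 45.75\right\rfloor = 45.
\end{eqnarray*}
No coefficient exceeds $2^{C-2}=64$ in magnitude, which sits well inside the
range of $-128$ to $127$ available to an 8~bit twos complement value.

The three multiplies are consistent with a well known identity for complex
multiplication.  Whether the generated Verilog uses exactly this arrangement
is an assumption made for this sketch.  For an input $a+jb$ and a coefficient
$c_n+js_n$, form the three real products
\begin{eqnarray*}
p_1 &=& a c_n,\\
p_2 &=& b s_n,\\
p_3 &=& \left(a+b\right)\left(c_n+s_n\right),
\end{eqnarray*}
from which the complex product follows by additions alone,
\begin{eqnarray*}
\left(a+jb\right)\left(c_n+js_n\right) &=&
        \left(p_1-p_2\right) + j\left(p_3-p_1-p_2\right).
\end{eqnarray*}
In the example above, the largest sum $c_n+s_n$ is $45+45=90$, which still
fits within eight signed bits.  This may be one reason for scaling by
$2^{C-2}$ rather than by $2^{C-1}$, although that reading is an inference.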

For those wishing to understand this operation in more depth, I
would refer them to the literature on how a decimation in frequency FFT is
constructed.

\chapter{Operation}

The core is easy to use:
\begin{enumerate}
        \item Provide a system clock to the core on the {\tt i\_clk} input.
        \item Set the {\tt i\_rst} line high for at least one clock cycle
                before you intend to use the core.
        \item From the time of reset until the first sample pair is available
                on the IO ports, {\tt i\_rst} may be returned low, but the clock
                enable line {\tt i\_ce} must be kept low.
        \item On the clock containing the first sample pair, {\tt i\_left}
                and {\tt i\_right}, set {\tt i\_ce} high.
        \item From then on, any time a valid pair of samples is available to
                the input of the FFT, place the first sample of the pair
                on the {\tt i\_left} line, the second on the {\tt i\_right}
                line, and set {\tt i\_ce} high.
        \item At the first valid output, the FFT core will set the {\tt o\_sync}
                line high in addition to the output values {\tt o\_left}
                (the first of two), and {\tt o\_right} (the second of the two).
        \item From then on, whenever {\tt i\_ce} is high, the FFT core will clock
                two samples in and two samples out.  On the first
                pair of samples coming out of each transform,
                {\tt o\_sync} will be high.  Otherwise {\tt o\_sync} will
                remain low.
\end{enumerate}

There are no special modes or states associated with this core.  If you wish
it to stop or pause, just turn off {\tt i\_ce}.  If you wish to flush the
core, just send zeros into the core.

\chapter{Registers}

Once built, the FFT routine has no capability for runtime configuration
or reconfiguration.  Therefore, this implementation maintains no user
configurable or readable registers.

This is an advantage in many ways, simply because it greatly simplifies
the interface compared with other available cores.

\chapter{Clocks}

The FFT routines built by this core use one clock only.  The speed of this
clock will depend upon the speed your hardware is capable of.  If your data
rate is slower than your clock speed, just hold off on the {\tt i\_ce}
line as necessary so that every clock with the {\tt i\_ce} line high carries a
valid pair of samples.
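
As a small worked example, with numbers chosen only for illustration, suppose
samples arrive at a rate $f_s$ and the core is clocked at $f_{clk}$.  Since
each enabled clock consumes two samples, the fraction of clocks on which
{\tt i\_ce} must be high is
\begin{eqnarray*}
\frac{f_s}{2f_{clk}} &=& \frac{400\mbox{ MHz}}{2\times 250\mbox{ MHz}}
        = \frac{4}{5},
\end{eqnarray*}
for a 400~MHz sample rate and a 250~MHz clock.  That is, {\tt i\_ce} would be
high on four of every five clocks.  This ratio cannot exceed one, so the core
can keep up only so long as $f_s\leq 2f_{clk}$.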

\chapter{IO Ports}

The FFT core presents a small set of IO ports to its external interface.
These ports are listed in Table~\ref{tbl:ioports}.
\begin{table}[htbp]
\begin{center}
\begin{portlist}
i\_clk & 1 & Input & The global clock driving the FFT. \\\hline
i\_rst & 1 & Input & An active high synchronous reset.\\\hline
i\_ce & 1 & Input & Clock Enable.  Set this high to clock data in and
                out.\\\hline
i\_left & $2N_i$ & Input & The first of two complex input samples.  Bits
                [$\left(2N_i-1\right)$:$N_i$] of this value are the real
                portion, whereas bits [$\left(N_i-1\right)$:0] represent the
                imaginary portion.  Both portions are in signed twos complement
                integer format.  The number of bits, $N_i$, is configurable.
                \\\hline
i\_right & $2N_i$ & Input & The second of two complex input samples.
                The format is the same as {\tt i\_left} above.\\\hline
o\_left & $2N_o$ & Output & The first of two complex output samples.
                The format is the same, save only that $N_o$ bits are
                used for each twos complement portion instead of $N_i$.\\\hline
o\_right & $2N_o$ & Output & The second of two complex output samples.
                The format is the same as for {\tt o\_left} above.\\\hline
o\_sync & 1 & Output & High on the first output sample pair of any transform,
                zero otherwise.
                \\\hline
\end{portlist}
\caption{List of IO ports}\label{tbl:ioports}
\end{center}\end{table}
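
As an illustration of the sample format in Table~\ref{tbl:ioports}, suppose
$N_i=8$ and the sample to be sent is $-3+5j$.  Both the width and the sample
value are chosen here only as an example.  With the real portion in the upper
$N_i$ bits and the imaginary portion in the lower $N_i$ bits, each in twos
complement, the $2N_i=16$ bit word is
\begin{eqnarray*}
\left(-3 \bmod 2^{8}\right) &=& 253 = \mbox{\tt 0xFD},\\
\left(5 \bmod 2^{8}\right) &=& 5 = \mbox{\tt 0x05},\\
253\times 2^{8} + 5 &=& 64773 = \mbox{\tt 0xFD05}.
\end{eqnarray*}
The outputs would be unpacked in the same manner, using $N_o$ in place of
$N_i$.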
% Appendices
% Index
\end{document}