% [diff hunk boundary: line 113 in both revisions; preceding text not shown]
This option is useful in those cases where someone wishes to
multiply the coefficients coming out of an FFT by some other set of values,
and then to inverse FFT the results. If those values are also
applied in bit-reversed order, then both the FFT and IFFT may
skip their bit reversals.

Be aware, however, that doing this requires the bit-reversed forward
transform to be followed by a bit-reversed decimation in time approach
to the inverse transform. This software does not (yet) provide that
capability, so this option is not yet useful in practice.

\item[\hbox{-S}]
Include the final bit reversal stage. As this is also the default,
specifying the option should not be necessary.
\item[\hbox{-d DIR}]
Specifies the DIRectory in which to place the produced Verilog files. By
% [diff hunk boundary: old line 135, new line 140; intervening text not shown]
\item[\hbox{-c bits}] The number of bits in each twiddle coefficient is given
by the number of bits input to that stage plus this extra number of
bits per coefficient. By increasing the number of bits per coefficient
above that of the input samples, truncation error is kept from growing
beyond the error already present in the original samples.
\item[\hbox{-x bits}] Internally accumulated roundoff error can be a difficult
problem to solve. With this option, the FFT carries an additional
{\tt bits} bits internally, and truncates down to the necessary width
only at the end, in order to minimize rounding errors along the way.
\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT
will consume. By default, the FFT does not use any hardware multiplies.
However, doing without them can be expensive in terms of the rest of
the logic used by the device. You can avoid this problem by using this
option to allow the FFT to use hardware multiplies. By default, these
multiplies will be used in the latter stages, where the bit width is
the greatest.
\end{itemize}
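
To make the bit-reversed ordering mentioned in the options above more concrete, the following is a minimal Python sketch of the index permutation involved. It is an illustration only and not part of this core; the function names \verb|bit_reverse| and \verb|bit_reversed_order| are hypothetical.

```python
def bit_reverse(index: int, nbits: int) -> int:
    """Return index with its lowest nbits bits in reversed order."""
    result = 0
    for _ in range(nbits):
        # Shift the lowest remaining bit of index into the result.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n: int) -> list[int]:
    """List the indices 0..n-1 in bit-reversed order (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    nbits = n.bit_length() - 1
    return [bit_reverse(i, nbits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The permutation is its own inverse, so values stored in this same order line up, bin for bin, with FFT outputs that have not passed through the final bit reversal stage.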

\chapter{Architecture}

As a component of another system the structure of this system is a simple
% [diff hunk boundary: old line 384, new line 401; intervening text not shown]
into memory, and then the next $N/4$ clocks pairing a stored input with
a single external input, so that both values become inputs to the butterfly.
Likewise, the butterfly coefficient is read from a small ROM table.
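
As an illustration of what such a ROM table might hold, the following Python sketch tabulates coefficients using the rounding rule this document gives for $c_n$ and $s_n$: scale the cosine or sine by $2^{C-2}$, add one half, and take the floor. The function name \verb|twiddle_table|, the choice of one entry for every $n$ from $0$ to $N-1$, and the example values of $N$ and $C$ are assumptions made for illustration; the table actually generated for each stage may be laid out differently.

```python
import math


def twiddle_table(n_points: int, coeff_bits: int) -> list[tuple[int, int]]:
    """Return (c_n, s_n) pairs for n = 0..n_points-1, where
    c_n = floor(2^(C-2) * cos(2*pi*n/N) + 1/2) and
    s_n = floor(2^(C-2) * sin(2*pi*n/N) + 1/2)."""
    scale = 2 ** (coeff_bits - 2)
    table = []
    for n in range(n_points):
        angle = 2.0 * math.pi * n / n_points
        c_n = math.floor(scale * math.cos(angle) + 0.5)
        s_n = math.floor(scale * math.sin(angle) + 0.5)
        table.append((c_n, s_n))
    return table


if __name__ == "__main__":
    # Hypothetical example: an 8-point table with 10-bit coefficients.
    for n, (c_n, s_n) in enumerate(twiddle_table(8, 10)):
        print(n, c_n, s_n)
```

With these example values, no entry exceeds $2^{C-2} = 256$ in magnitude.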

One trick to making the FFT stage work successfully is synchronization. Since
the shift and add multiplies create a delay of (roughly) one clock cycle per
bit of input, there is a significant pipeline delay from the input to the
output of the butterfly routine. To match this delay, the FFT stage places a
synchronization pulse into the butterfly. When this synchronization pulse
comes out of the butterfly, the values of the butterfly then match the
first sample out of the stage. The next synchronization problem comes from
the fact that the butterflies operate on two samples at a time, whereas the
FFT stage operates on a single sample at a time. This means that half the
% [diff hunk boundary: old line 406, new line 423; intervening text not shown]
in memory. In this fashion, data is always valid coming out of each FFT
stage once the initial synchronization pulse goes high.

The complex multiply itself, internal to the butterfly routine, is
built from three very simple shift and add multiplies, whose outputs are
then combined into a single complex output, although there is a command
line option to use hardware multiplies instead. To avoid overflow, the
complex coefficients, $z_n$, for these multiplies are given by,
\begin{eqnarray}
z_n &=& c_n + js_n,\mbox{ where} \\
c_n &=& \left\lfloor 2^{C-2}\cos\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor,\\
s_n &=& \left\lfloor 2^{C-2}\sin\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor\mbox{, and}