% OpenCores, Subversion repository dblclockfft
% https://opencores.org/ocsvn/dblclockfft/dblclockfft/trunk
% File: trunk/doc/src/spec.tex (revision 22)

        This option is useful in those cases where someone wishes to
        multiply the coefficients coming out of an FFT by some other set of
        values, and then to inverse FFT the results.  If those values are
        also applied in bit--reversed order, then both the FFT and IFFT may
        skip their bit reversals.

        Be aware, however, that doing this requires the bit--reversed forward
        transform to be followed by a bit--reversed, decimation in time
        approach to the inverse transform.  This software does not (yet)
        provide that capability, so the utility of this option is limited
        for now.
\item[\hbox{-S}]
        Include the final bit reversal stage.  As this is also the default,
        specifying the option should not be necessary.
\item[\hbox{-d DIR}]
        Specifies the DIRectory in which to place the produced Verilog files.  By
% ...
\item[\hbox{-c bits}] The number of bits in each twiddle coefficient is given
        by the number of bits input to that stage plus this extra number of
        bits per coefficient.  By increasing the number of bits per coefficient
        above that of the input samples, the truncation error is kept at the
        level of the error already present within the original samples.
 
\item[\hbox{-x bits}] Internally accumulated roundoff error can be a difficult
        problem to solve.  By using this option, you guarantee that the FFT
        runs with an additional {\tt bits} bits internally, and only truncates
        down to the necessary width at the end, in order to minimize rounding
        errors along the way.

\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT
        will consume.  By default, the FFT does not use any hardware multiplies,
        building its multiplies from shift and add logic instead.  However,
        this can be expensive in terms of the other logic resources on the
        device.  This option avoids that cost by allowing the FFT to use
        {\tt nmpy} hardware multiplies.  By default, these multiplies will be
        placed in the latter stages, where the bit width is the greatest.
\end{itemize}
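The bit--reversal argument behind skipping the reversal stage can be checked with a small software model.  The Python sketch below is not part of this core, and its function names are hypothetical.  It assumes a radix-2 decimation in frequency FFT that skips its final bit reversal.  That output is multiplied elementwise by a second spectrum in the same bit--reversed order.  A decimation in time inverse FFT that expects bit--reversed input then transforms the product.  The result is compared against a direct circular convolution.

```python
import cmath


def bit_reverse(i, nbits):
    """Reverse the low nbits bits of the integer i."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dif_bitrev_out(x):
    """Radix-2 decimation-in-frequency FFT (len(x) must be a power of two).
    The final bit reversal is skipped: bin k lands at index bit_reverse(k)."""
    a = [complex(v) for v in x]
    n = len(a)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
        span //= 2
    return a


def ifft_dit_bitrev_in(y):
    """Radix-2 decimation-in-time inverse FFT that expects bit-reversed
    input and produces natural-order output, scaled by 1/N."""
    a = list(y)
    n = len(a)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span] * w
                a[start + k] = u + v
                a[start + k + span] = u - v
        span *= 2
    return [v / n for v in a]


def filter_without_bit_reversal(x, h):
    """Circular convolution of x and h with no bit reversal anywhere."""
    X = fft_dif_bitrev_out(x)          # spectrum of x, bit-reversed order
    H = fft_dif_bitrev_out(h)          # multiplying values, also bit-reversed
    Y = [p * q for p, q in zip(X, H)]  # elementwise product ignores ordering
    return ifft_dit_bitrev_in(Y)       # back to natural-order samples


def circular_convolve_direct(x, h):
    """Reference O(N^2) circular convolution for comparison."""
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


x = [1, 2, 3, 4, 0, 0, 0, 0]
h = [1, 1, 0, 0, 0, 0, 0, 0]
y = filter_without_bit_reversal(x, h)
ref = circular_convolve_direct(x, h)
# Should be near zero (floating-point rounding only).
print(max(abs(a - b) for a, b in zip(y, ref)))
```

An elementwise product does not care how its operands are ordered, as long as both share the same order.  For that reason, no bit reversal is needed anywhere between the two transforms.  The price is that the inverse transform must accept bit--reversed input, which is the capability this software does not yet provide.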
 
 
\chapter{Architecture}
 
 
As a component of another system, the structure of this system is a simple
% ...
into memory, and then the next $N/4$ clocks pairing a stored input with
a single external input, so that both values become inputs to the butterfly.
Likewise, the butterfly coefficient is read from a small ROM table.
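To illustrate what such a ROM might hold, the Python sketch below tabulates coefficients with the rounding formula for $c_n$ and $s_n$ given later in this chapter.  It then reports the worst-case quantization error for several coefficient widths $C$.  Following the {\tt -c} option, $C$ would be the stage's input width plus the extra coefficient bits.  Storing all $N$ entries is an assumption made for illustration, and so are the function names.  The actual ROM may hold only the subset a given stage needs.

```python
import math


def twiddle_rom(N, C):
    """Quantized coefficients following the spec's formula:
    c_n = floor(2^(C-2) cos(2 pi n/N) + 1/2), s_n likewise with sin.
    All N entries are tabulated (an assumption; a real ROM may store fewer)."""
    scale = 1 << (C - 2)
    rom = []
    for n in range(N):
        theta = 2 * math.pi * n / N
        c = math.floor(scale * math.cos(theta) + 0.5)
        s = math.floor(scale * math.sin(theta) + 0.5)
        rom.append((c, s))
    return rom


def max_twiddle_error(N, C):
    """Worst-case |z_n / 2^(C-2) - exp(j 2 pi n/N)| over the table."""
    scale = 1 << (C - 2)
    worst = 0.0
    for n, (c, s) in enumerate(twiddle_rom(N, C)):
        theta = 2 * math.pi * n / N
        err = math.hypot(c / scale - math.cos(theta), s / scale - math.sin(theta))
        worst = max(worst, err)
    return worst


# Example widths: e.g. a hypothetical 12-bit stage input with 4 extra
# coefficient bits would give C = 16.
for C in (8, 12, 16):
    print(C, max_twiddle_error(64, C))
```

Each component is rounded to the nearest integer, so every entry's error is at most $\sqrt{2}/2$ of one least significant bit.  Relative to unity, that is $2^{-(C-2)}\sqrt{2}/2$, so each extra coefficient bit halves this bound.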
 
 
One trick to making the FFT stage work successfully is synchronization.  Since
the shift and add multiplies create a delay of (roughly) one clock cycle per
bit of input, there is a significant pipeline delay from the input to the
output of the butterfly routine.  To match this delay, the FFT stage places a
synchronization pulse into the butterfly.  When this synchronization pulse
comes out of the butterfly, the butterfly's output values correspond to the
first sample out of the stage.  The next synchronization problem comes from
the fact that the butterflies operate on two samples at a time, whereas the
FFT stage operates on a single sample at a time.  This means that half the
% ...
in memory.  In this fashion, data is always valid coming out of each FFT
stage once the initial synchronization pulse goes high.
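The synchronization pulse can be modeled at the clock-cycle level.  The Python sketch below is a behavioral model with hypothetical names, not the core's Verilog.  It assumes an unsigned, pipelined shift and add multiplier that retires one multiplier bit per stage.  It carries the synchronization flag in the same pipeline registers as the data it accompanies.  Signed arithmetic and the rest of the butterfly are omitted.

```python
def shift_add_pipeline(samples, nbits):
    """Cycle-level model of a pipelined shift-and-add multiplier.

    samples: list of (a, b, sync) tuples, one per clock, where b is an
    unsigned nbits-wide multiplier (higher bits are ignored).  Stage i adds
    (a << i) when bit i of b is set, so each product needs nbits clocks.
    Returns one entry per clock: (product, sync), or None while empty.
    """
    pipe = [None] * nbits            # pipe[i]: register after stage i
    out = []
    # Feed the samples, then nbits idle clocks to flush the pipeline.
    for item in list(samples) + [None] * nbits:
        # The output is the last register, sampled before this clock edge.
        last = pipe[-1]
        out.append(None if last is None else (last[0], last[3]))
        # Advance later stages first so each reads its predecessor's old value.
        for i in range(nbits - 1, 0, -1):
            prev = pipe[i - 1]
            if prev is None:
                pipe[i] = None
            else:
                acc, a, b, sync = prev
                if (b >> i) & 1:
                    acc += a << i
                pipe[i] = (acc, a, b, sync)
        # Stage 0 accepts the new input and applies multiplier bit 0.
        if item is None:
            pipe[0] = None
        else:
            a, b, sync = item
            pipe[0] = (a if b & 1 else 0, a, b, sync)
    return out


out = shift_add_pipeline([(3, 5, True), (7, 2, False), (10, 9, False)], 4)
# The first clock on which the sync flag emerges marks the first valid product.
first_valid = next(i for i, o in enumerate(out) if o is not None and o[1])
print(first_valid, out[first_valid])
```

Here the flag first appears on clock 4, that is, after {\tt nbits} clocks, together with the product $3\cdot 5 = 15$.  Because the flag travels through the same registers as the data, no separate counter is needed to know when valid output begins.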
 
 
The complex multiply internal to the butterfly routine is built from three
very simple shift and add multiplies, whose outputs are then combined into a
single complex output.  (A command line option allows hardware multiplies to
be used instead.)  To avoid overflow, the complex coefficients, $z_n$, for
these multiplies are given by
\begin{eqnarray}
z_n &=& c_n + js_n,\mbox{ where} \\
c_n &=& \left\lfloor 2^{C-2}\cos\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor,\\
s_n &=& \left\lfloor 2^{C-2}\sin\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor\mbox{, and}
