OpenCores

Rev 12	Rev 22
Line 113...	Line 113...
`This option is useful in those cases where someone wishes to`	`This option is useful in those cases where someone wishes to`
`multiply the coefficients coming out of an FFT by some product,`	`multiply the coefficients coming out of an FFT by some product,`
`and then to inverse FFT the results. If the coefficients are also`	`and then to inverse FFT the results. If the coefficients are also`
`applied in bit--reversed order, then both the FFT and IFFT may`	`applied in bit--reversed order, then both the FFT and IFFT may`
`skip their bit reversals.`	`skip their bit reversals.`

	`Be aware, however, doing this requires the bit reversed forward`
	`transform be followed by a bitreversed decimation in time approach`
	`to the inverse transform. This software does not (yet) provide that`
	`capability. As such, the utility just isn't there yet.`
`\item[\hbox{-S}]`	`\item[\hbox{-S}]`
`Include the final bit reversal stage. As this is also the default,`	`Include the final bit reversal stage. As this is also the default,`
`specifying the option should not be necessary.`	`specifying the option should not be necessary.`
`\item[\hbox{-d DIR}]`	`\item[\hbox{-d DIR}]`
`Specifies the DIRectory to place the produced Verilog files. By`	`Specifies the DIRectory to place the produced Verilog files. By`
Line 135...	Line 140...
`\item[\hbox{-c bits}] The number of bits in each twiddle coefficient is given`	`\item[\hbox{-c bits}] The number of bits in each twiddle coefficient is given`
`by the number of bits input to that stage plus this extra number of`	`by the number of bits input to that stage plus this extra number of`
`bits per coefficient. By increasing the number of bits per coefficient`	`bits per coefficient. By increasing the number of bits per coefficient`
`above that of the input samples, truncation error is kept to the`	`above that of the input samples, truncation error is kept to the`
`original error found within the original samples.`	`original error found within the original samples.`
	`\item[\hbox{-x bits}] Internally accumulated roundoff error can be a difficult`
	`problem to solve. By using this option, you guarantee that the FFT`
	`runs with an additional {\tt bits} bits, and only truncates down to`
	`the necessary width at the end in order to minimize rounding`
	`errors along the way.`
	`\item[\hbox{-p nmpy}] This sets the number of hardware multiplies that the FFT`
	`will consume. By default, the FFT does not use any hardware multiplies.`
	`However, this can be expensive on the rest of the logic used by the`
	`device. You can avoid this problem by allowing the FFT to use`
	`hardware multiplies using this option. By default, the multiplies will`
	`be used in the latter stages, so that they will be applied where`
	`the bit width is the greatest.`
`\end{itemize}`	`\end{itemize}`

`\chapter{Architecture}`	`\chapter{Architecture}`

`As a component of another system the structure of this system is a simple`	`As a component of another system the structure of this system is a simple`
Line 384...	Line 401...
`into memory, and then the next $N/4$ clocks pairing a stored input with`	`into memory, and then the next $N/4$ clocks pairing a stored input with`
`a single external input, so that both values become inputs to the butterfly.`	`a single external input, so that both values become inputs to the butterfly.`
`Likewise, the butterfly coefficient is read from a small ROM table.`	`Likewise, the butterfly coefficient is read from a small ROM table.`

`One trick to making the FFT stage work successfully is synchronization. Since`	`One trick to making the FFT stage work successfully is synchronization. Since`
`the multiplies create a delay of (roughly) one clock cycle per bit of input,`	`the shift and add multiplies create a delay of (roughly) one clock cycle per`
`there is a significant pipeline delay from the input to the output of the`	`bit of input, there is a significant pipeline delay from the input to the`
`butterfly routine. To match this delay, the FFT stage places a`	`output of the butterfly routine. To match this delay, the FFT stage places a`
`synchronization pulse into the butterfly. When this synchronization pulse`	`synchronization pulse into the butterfly. When this synchronization pulse`
`comes out of the butterfly, the values of the butterfly then match the`	`comes out of the butterfly, the values of the butterfly then match the`
`first sample out of the stage. The next synchronization problem comes from`	`first sample out of the stage. The next synchronization problem comes from`
`the fact that the butterflies operate on two samples at a time, whereas the`	`the fact that the butterflies operate on two samples at a time, whereas the`
`FFT stage operates on a single sample at a time. This means that half the`	`FFT stage operates on a single sample at a time. This means that half the`
Line 406...	Line 423...
`in memory. In this fashion, data is always valid coming out of each FFT`	`in memory. In this fashion, data is always valid coming out of each FFT`
`stage once the initial synchronization pulse goes high.`	`stage once the initial synchronization pulse goes high.`

`The complex multiply itself, formed internal to the butterfly routine, is`	`The complex multiply itself, formed internal to the butterfly routine, is`
`formed from three very simple shift and add multiplies, whose output is`	`formed from three very simple shift and add multiplies, whose output is`
`then transformed into a single complex output. To avoid overflow, the`	`then transformed into a single complex output, although there is a command`
	`line option to use hardware multiplies instead. To avoid overflow, the`
`complex coefficients, $z_n$, for these multiplies are given by,`	`complex coefficients, $z_n$, for these multiplies are given by,`
`\begin{eqnarray}`	`\begin{eqnarray}`
`z_n &=& c_n + js_n,\mbox{ where} \\`	`z_n &=& c_n + js_n,\mbox{ where} \\`
`c_n &=& \left\lfloor 2^{C-2}\cos\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor,\\`	`c_n &=& \left\lfloor 2^{C-2}\cos\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor,\\`
`s_n &=& \left\lfloor 2^{C-2}\sin\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor\mbox{, and}`	`s_n &=& \left\lfloor 2^{C-2}\sin\left(2\pi \frac{n}{N}\right)+\frac{1}{2}\right\rfloor\mbox{, and}`
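As an illustration of the coefficient formulas in this hunk, the following is a minimal Python sketch that quantizes $c_n$ and $s_n$ with the $2^{C-2}$ scale factor and the round-half-up floor shown above. It also shows one common way to form a complex product from three real multiplies. The function names `twiddle` and `cmul3` are hypothetical, and the particular three-multiply decomposition is an assumption: the text states only that three multiplies are used, not how the terms are grouped.

```python
import math


def twiddle(n, N, C):
    """Return (c_n, s_n) quantized as floor(2^(C-2) * f(2*pi*n/N) + 1/2).

    C is the coefficient width in bits; C >= 2 is assumed here.
    """
    scale = 1 << (C - 2)
    angle = 2.0 * math.pi * n / N
    c = math.floor(scale * math.cos(angle) + 0.5)
    s = math.floor(scale * math.sin(angle) + 0.5)
    return c, s


def cmul3(ar, ai, c, s):
    """Compute (ar + j*ai) * (c + j*s) with three real multiplies.

    Assumed decomposition (not taken from the spec):
      p1 = c * (ar + ai)
      p2 = ar * (s - c)
      p3 = ai * (c + s)
      real = p1 - p3 = ar*c - ai*s
      imag = p1 + p2 = ar*s + ai*c
    """
    p1 = c * (ar + ai)
    p2 = ar * (s - c)
    p3 = ai * (c + s)
    return p1 - p3, p1 + p2


if __name__ == "__main__":
    N, C = 8, 8
    for n in range(N // 2):
        print(n, twiddle(n, N, C))
```

With $C$ bits per coefficient, the $2^{C-2}$ scale leaves headroom above the largest coefficient magnitude, which is consistent with the stated goal of avoiding overflow.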



Subversion Repositories dblclockfft

[/] [dblclockfft/] [trunk/] [doc/] [src/] [spec.tex] - Diff between revs 12 and 22