The goal of this project is to create an IP core for an FFT that runs, in a pipelined fashion, at two samples per clock. A C++ program will generate the Verilog files, allowing the FFT to be of an arbitrary length--subject only to the capability of the FPGA used to implement the FFT.
One of my goals is to create an FFT core that can be used with open-source and third-party Verilog simulation tools, such as Verilator. This would be difficult with a proprietary IP core.
For those wondering why anyone would need an FFT that runs at two samples per clock: FFTs tend to use their multiplies more efficiently than other filtering implementations, but to get that benefit you need some form of overlap-and-add filtering structure. In the usual arrangement, each FFT block carries only about half a block of new input samples, so an overlap-and-add structure immediately requires an FFT that processes samples at twice the rate of the incoming data.
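To make the factor of two concrete, here is a minimal Python sketch of overlap-and-add filtering. It is only an illustration, not part of this core: the function names `dft` and `overlap_add`, the 5-tap filter, and the 8-point transform length are all assumptions chosen for the example, and a naive O(N^2) DFT stands in for a real FFT to keep the code self-contained. The sketch counts how many samples pass through the transform for each incoming sample.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT, used here only as a stand-in for an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def overlap_add(signal, taps, fft_len):
    """Filter `signal` with `taps` by overlap-and-add.

    Returns the filtered output and the total number of samples
    fed through the forward transform.
    """
    # Each block may hold only this many new samples, so that the
    # linear convolution of block and taps fits in fft_len points.
    block = fft_len - len(taps) + 1
    h_freq = dft(list(taps) + [0.0] * (fft_len - len(taps)))
    out = [0.0] * (len(signal) + len(taps) - 1)
    fft_samples = 0
    for start in range(0, len(signal), block):
        chunk = list(signal[start:start + block])
        padded = chunk + [0.0] * (fft_len - len(chunk))  # zero padding
        x_freq = dft(padded)
        fft_samples += fft_len
        y = dft([a * b for a, b in zip(x_freq, h_freq)], inverse=True)
        # Add this block's result into the output, overlapping the tail
        # of the previous block.
        for i, v in enumerate(y):
            if start + i < len(out):
                out[start + i] += v.real
    return out, fft_samples

# Hypothetical example: 16 input samples, 5 taps, 8-point transform,
# which leaves room for 4 new samples per block.
signal = [float((i * 7) % 5 - 2) for i in range(16)]
taps = [1.0, -0.5, 0.25, 0.5, -1.0]
filtered, fft_samples = overlap_add(signal, taps, 8)
print(fft_samples / len(signal))  # transform samples per input sample: 2.0
```

With these sizes every 8-point block carries 4 new samples and 4 zeros, so the transform handles two samples for every sample that arrives. Other block and filter sizes give other ratios; the half-new, half-padding split is simply the case that matches a two-samples-per-clock FFT.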
The following statistics come from a Basys-3 development board implementation using Vivado as the development tool:
FFT Size | 32 | 64 | 128 | 256 | 512 | 1024 |
---|---|---|---|---|---|---|
Bit Width | 16 | 16 | 16 | 16 | 16 | 16 |
Twiddle Factor Bits | 17 | 17 | 17 | 17 | 17 | 17 |
Extra Internal Bits | 1 | 1 | 1 | 1 | 1 | 1 |
Stages with Optimized Multiplies | 3 | 4 | 5 | 6 | 7 | 7 |
Slice LUTs | 2411 | 3136 | 3811 | 4517 | 5325 | 8885 |
Slice Registers | 4130 | 5401 | 6536 | 7593 | 8682 | 14687 |
Memory LUTs | 352 | 470 | 524 | 622 | 808 | 1280 |
Flip Flop Pairs | 3622 | 4625 | 5469 | 6389 | 7577 | 12223 |
Block RAMs | 2 | 2 | 4 | 6 | 8 | 13 |
DSP48s | 18 | 26 | 30 | 36 | 42 | 42 |
If I can muster the time to keep working on this, I'd like to add:
* A capability to do FFTs on real samples, rather than just complex
* A capability to operate at one sample per clock, or even one sample every two clocks
Please feel free to contact me at dgisselq at opencores.org if you would like further features, or to have this core tailored to your application or device.
Dan