Double Clocked FFT Core

Name: dblclockfft
Created: Feb 20, 2015
Updated: Apr 7, 2018
SVN Updated: Mar 19, 2019
Bugs: 5 reported / 2 solved

Other project properties

Category: DSP core
Development status: Stable
Additional info: Design done, FPGA proven, Specification done
WishBone compliant: No
WishBone version: n/a
License: GPL


The goal of this project is to create an IP core for an FFT that runs, in a pipelined fashion, at two samples per clock. A C++ program will generate the Verilog files, allowing the FFT to be of an arbitrary length--subject only to the capability of the FPGA used to implement the FFT.

One of my goals is to create an FFT core that can be used with open-source and third-party Verilog simulation tools, such as Verilator. This would be difficult with a proprietary IP core.

For those who might be wondering why I would need an FFT that runs at two samples per clock: FFTs tend to use their multiplies more efficiently than other filtering implementations, but to get that benefit you need some form of overlap-and-add filtering structure. In such a structure each block of incoming samples is extended (typically to twice its length) before it is transformed, so the FFT has to process samples at twice the rate of the incoming data.
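To make the factor of two concrete, here is a minimal software sketch of overlap-and-add filtering. It is not taken from this core: the function names (`fft`, `overlap_add`, `direct_convolve`) are hypothetical, the FFT is a plain recursive radix-2 routine, and the choice of an FFT size of exactly twice the block length is an assumption made for illustration. The sketch counts how many points pass through the forward FFT per input sample.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    With inverse=True the result is the unnormalized inverse transform.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_add(signal, taps, block):
    """Filter `signal` with `taps`, using FFTs of size 2 * block (assumed).

    Returns the filtered output and the number of points fed to the
    forward FFT, so the caller can compare it with len(signal).
    """
    n = 2 * block
    # Each block's linear convolution must fit in n points.
    assert len(taps) <= block + 1
    h = fft(list(taps) + [0] * (n - len(taps)))
    out = [0.0] * (len(signal) + n)
    fft_points = 0
    for start in range(0, len(signal), block):
        chunk = list(signal[start:start + block])
        chunk += [0] * (n - len(chunk))       # zero-pad the block to n points
        spectrum = fft(chunk)
        fft_points += n
        y = fft([a * b for a, b in zip(spectrum, h)], inverse=True)
        for i, v in enumerate(y):
            out[start + i] += v.real / n      # add the overlapping tails
    return out[:len(signal) + len(taps) - 1], fft_points


def direct_convolve(signal, taps):
    """Reference time-domain convolution, used only to check the result."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, t in enumerate(taps):
            out[i + j] += s * t
    return out
```

With a signal whose length is a multiple of `block`, the returned point count is exactly twice the number of input samples, which is the software analogue of needing two samples per clock.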

Usage Statistics

The following statistics come from a Basys-3 development board implementation using Vivado as the development tool:

FFT Size                               32     64    128    256    512   1024
Bit Width                              16     16     16     16     16     16
Twiddle Factor Bits                    17     17     17     17     17     17
Extra Internal Bits                     1      1      1      1      1      1
Stages with Optimized Multiplies        3      4      5      6      7      7
Slice LUTs                           2411   3136   3811   4517   5325   8885
Slice Registers                      4130   5401   6536   7593   8682  14687
Memory LUTs                           352    470    524    622    808   1280
Flip Flop Pairs                      3622   4625   5469   6389   7577  12223
Block RAMs                              2      2      4      6      8     13

I should also note that the last two stages of any of these FFT implementations don't use multiplies, just adds and subtracts. A 512-point FFT has nine stages, so seven stages of hardware multiplies is the maximum it can have. The 1024-point FFT does one multiply stage in logic, and the result is ... expensive.
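The stage arithmetic behind that note can be sketched in a few lines. This is an illustration only: the function name `multiply_stages` and the `hw_limit` parameter are hypothetical, and the limit of seven hardware-multiply stages is simply read off the table above rather than derived from anything the source states about the device.

```python
def multiply_stages(fft_size, hw_limit=7):
    """Split an FFT's multiply stages into hardware and logic stages.

    Assumes fft_size is a power of two and that the last two stages
    need no multiplies. hw_limit is an assumed cap taken from the table.
    Returns (stages_in_hardware, stages_in_logic).
    """
    total_stages = fft_size.bit_length() - 1   # log2 of a power of two
    needs_multiply = max(total_stages - 2, 0)
    in_hardware = min(needs_multiply, hw_limit)
    return in_hardware, needs_multiply - in_hardware


sizes = [32, 64, 128, 256, 512, 1024]
print([multiply_stages(n)[0] for n in sizes])  # -> [3, 4, 5, 6, 7, 7]
print(multiply_stages(1024)[1])                # -> 1
```

The first list matches the "Stages with Optimized Multiplies" row, and the single leftover stage for the 1024-point case is the one implemented in logic.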

Future Upgrades

If I can muster the time to keep working on this, I'd like to add ...

* A capability to do FFTs on real samples, rather than just complex ones

* A capability to operate at one sample per clock, or even one sample every two clocks


Please feel free to contact me at dgisselq at if you would like further features, or to have this core tailored to your application or device.