OpenCores
URL https://opencores.org/ocsvn/dblclockfft/dblclockfft/trunk

Subversion Repositories dblclockfft

Compare Revisions

  • This comparison shows the changes necessary to convert path
    /dblclockfft
    from Rev 35 to Rev 36

Rev 35 → Rev 36

/trunk/README.md
1,19 → 1,32
# A Double-Clocked FFT Core Generator
# A Generic Pipelined FFT Core Generator
 
The Double Clocked FFT project contains all of the software necessary to
create the IP to generate an arbitrary sized FFT that will clock two samples
in at each clock cycle, and after some pipeline delay it will clock two
samples out at every clock cycle.
This generic pipelined FFT project contains all of the software necessary to
create the IP to generate an arbitrary sized FFT. The FFT has been modified
for operation in one of the following modes:
 
The FFT generated by this approach is very configurable. By simple adjustment
of a command line parameter, the FFT may be made to be a forward FFT or an
- Two samples in per clock and, after some delay, two samples out per clock.
This uses 6 multiplies per FFT stage in the butterflies. This was the purpose
of the original `dblclkfft`. (Why double clock? I don't know. Double-sample
FFT might've been a better name.)
 
- One sample in per clock, with the `i_ce` line being high for every incoming
sample--up to one sample per clock. There are also options to run with at
least one clock between samples, or even two clocks between samples (or more).
This mode uses 3, 2, or 1 multiplies per FFT stage respectively.
 
- Eventually, I want to support a real FFT mode which will accept real samples
input, and alternately produce real and imaginary samples output--or the
converse for the inverse FFT.
 
The FFT generated by this project is very configurable. By simple adjustment
of a command line parameter, the FFT created will either be a forward FFT or an
inverse FFT. The number of bits processed, kept, and maintained by this
FFT are also configurable. Even the number of bits used for the twiddle
factors, or whether or not to bit reverse the outputs, are all configurable
parts to this FFT core.
 
These features make the Double Clocked FFT very different and unique among the
other open HDL cores you may fine.
These features make this open source pipelined FFT module very different
and unique among the other open HDL cores you may find.
 
For those who wish to get started right away, please download the package,
change into the ``sw`` directory and run ``make``. There is no need to
22,27 → 35,21
``fftgen`` to print a usage statement to the screen. Review the usage
statement, and run ``fftgen`` a second time with the arguments you need.
 
Alternatively, you _could_ read the specification.
# Current State
 
## Genesis
This FFT comes from my attempts to design and implement a signal processing
algorithm inside a generic FPGA, but only on a limited budget. As such,
I don't yet have the FPGA board I wish to place this algorithm onto, neither
do I have any expensive modeling or simulation capabilities. I'm using
Verilator for my modeling and simulation needs. This makes
using a vendor supplied IP core, such as an FFT, difficult if not impossible
to use.
This particular version of the FFT core now passes all my tests. It has
yet to meet hardware to be finally verified.
 
My problem was made worse when I learned that the published maximum clock
speed for a device wasn't necessarily the maximum clock speed that I could
achieve. My design needed to process the incoming signal at 500 MHz to be
commercially viable. 500 MHz is not necessarily a clock speed
that can be easily achieved. 250 MHz, on the other hand, is much more within
the realm of possibility. Achieving a 500 MHz performance with a 250 MHz
clock, however, requires an FFT that accepts two samples per clock.
- The [FFT test bench](bench/cpp/fft_tb.cpp) doesn't yet have a threshold that
adjusts with input parameters to determine success or failure.
 
This, then, was and is the genesis of this project.
- I haven't started on the real-only version of this FFT.
 
While my previously stated goal was to continue working with this core until it
has a real-FFT capability before releasing it back into the master branch,
I'm actually so excited that I got it to this point that I'm going to move
it from dev to master earlier, and come back to get the real only version.
 
# Commercial Applications
 
Should you find the GPLv3 license insufficient for your needs, other licenses
50,3 → 57,6
 
Likewise, please contact us should you wish to fund the further development
of this core.
 
Watch this space if you are interested in a release under another license.
I'm thinking about relicensing this with a more permissive license.
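
The multiply counts quoted in the README above (6 per stage with two samples per clock; 3, 2, or 1 with one sample per clock) are consistent with each butterfly doing one complex multiply by a twiddle factor. The sketch below shows one well-known way to do a complex multiply with three real multiplies instead of four. It is an illustration only: the `CplxInt` type and the `mpy3` name are made up here, and nothing in this diff confirms that the generated butterflies use exactly this arrangement.

```cpp
#include <cstdint>

// Hypothetical helper type (not from the project): a complex value
// with integer real and imaginary parts.
struct CplxInt {
	int64_t re, im;
};

// Multiply (a.re + j*a.im) by (b.re + j*b.im) using three real
// multiplies rather than the textbook four.
CplxInt mpy3(const CplxInt a, const CplxInt b) {
	const int64_t p1 = a.re * b.re;			// ac
	const int64_t p2 = a.im * b.im;			// bd
	const int64_t p3 = (a.re + a.im) * (b.re + b.im);	// (a+b)(c+d)
	CplxInt r;
	r.re = p1 - p2;		// ac - bd
	r.im = p3 - p1 - p2;	// ad + bc
	return r;
}
```

For example, `mpy3({1, 2}, {3, 4})` yields -5 + 10j, which matches (1 + 2j)(3 + 4j). The cost of saving a multiplier is two extra additions and one extra bit of width on the summed operands.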
/trunk/bench/cpp/Makefile
6,15 → 6,14
##
## Purpose: This programs the build process for the test benches
## associated with the double clocked FFT project. These
## test benches are designed for the size and arguments of the
## FFT as given by the Makefile in the trunk/sw directory,
## although they shouldn't be too difficult to modify for
## other FFT parameters.
## test benches are designed for the size and arguments of the FFT as
## given by the Makefile in the trunk/sw directory, although they shouldn't
## be too difficult to modify for other FFT parameters.
##
## Please note that running these test benches requires access
## to the *cmem_*.hex files found in trunk/sw/fft-core. I
## usually soft link them into this directory, but such linking
## is not currently part of this makefile or the build scripts.
## Please note that running these test benches requires access to the
## *cmem_*.hex files found in trunk/rtl. I usually soft link
## them into this directory, but such linking is not currently part of
## this makefile or the build scripts.
##
## Creator: Dan Gisselquist, Ph.D.
## Gisselquist Technology, LLC
21,7 → 20,7
##
##########################################################################/
##
## Copyright (C) 2015, Gisselquist Technology, LLC
## Copyright (C) 2015,2018 Gisselquist Technology, LLC
##
## This program is free software (firmware): you can redistribute it and/or
## modify it under the terms of the GNU General Public License as published
43,10 → 42,11
##
##
##########################################################################/
all: mpy_tb dblrev_tb dblstage_tb qtrstage_tb fft_tb test
all: mpy_tb bitreverse_tb hwbfly_tb butterfly_tb fftstage_tb fft_tb
all: qtrstage_tb laststage_tb test
 
OBJDR:= ../../sw/fft-core/obj_dir
VSRCD = ../../sw/fft-core
OBJDR:= ../../rtl/obj_dir
VSRCD = ../../rtl
TBODR:= ../rtl/obj_dir
ifneq ($(VERILATOR_ROOT),)
VERILATOR:=$(VERILATOR_ROOT)/bin/verilator
60,24 → 60,25
VINC := -I$(VROOT)/include -I$(OBJDR)/ -I$(TBODR)/
# MPYLB:= $(OBJDR)/Vshiftaddmpy__ALL.a
MPYLB:= $(OBJDR)/Vlongbimpy__ALL.a
DBLRV:= $(OBJDR)/Vdblreverse__ALL.a
DBLSG:= $(OBJDR)/Vdblstage__ALL.a
BTREV:= $(OBJDR)/Vbitreverse__ALL.a
STAGE:= $(OBJDR)/Vfftstage__ALL.a
QTRSG:= $(OBJDR)/Vqtrstage__ALL.a
LSTSG:= $(OBJDR)/Vlaststage__ALL.a
BFLYL:= $(OBJDR)/Vbutterfly__ALL.a
HWBFY:= $(OBJDR)/Vhwbfly__ALL.a
FFTLB:= $(OBJDR)/Vfftmain__ALL.a
IFTLB:= $(TBODR)/Vifft_tb__ALL.a
STGLB:= $(OBJDR)/Vfftstage_o2048__ALL.a
STGLB:= $(OBJDR)/Vfftstage__ALL.a
VSRCS:= $(VROOT)/include/verilated.cpp $(VROOT)/include/verilated_vcd_c.cpp
 
mpy_tb: mpy_tb.cpp fftsize.h twoc.h $(MPYLB)
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(MPYLB) $(VSRCS) -o $@
 
dblrev_tb: dblrev_tb.cpp twoc.cpp twoc.h fftsize.h $(DBLRV)
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(DBLRV) $(VSRCS) -o $@
bitreverse_tb: bitreverse_tb.cpp twoc.cpp twoc.h fftsize.h $(BTREV)
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(BTREV) $(VSRCS) -o $@
 
dblstage_tb: dblstage_tb.cpp twoc.cpp twoc.h $(DBLSG)
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(DBLSG) $(VSRCS) -o $@
laststage_tb: laststage_tb.cpp twoc.cpp twoc.h $(LSTSG)
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(LSTSG) $(VSRCS) -o $@
 
qtrstage_tb: qtrstage_tb.cpp twoc.cpp twoc.h $(QTRSG)
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(QTRSG) $(VSRCS) -o $@
88,7 → 89,7
hwbfly_tb: hwbfly_tb.cpp twoc.cpp twoc.h $(HWBFY)
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(HWBFY) $(VSRCS) -o $@
 
fftstage_o2048_tb: fftstage_o2048_tb.cpp twoc.cpp twoc.h $(STGLB)
fftstage_tb: fftstage_tb.cpp twoc.cpp twoc.h $(STGLB)
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(STGLB) $(VSRCS) -o $@
 
fft_tb: fft_tb.cpp twoc.cpp twoc.h fftsize.h $(FFTLB)
112,22 → 113,22
ln -s $(VSRCD)/*.hex .
 
.PHONY: test
test: mpy_tb dblrev_tb dblstage_tb qtrstage_tb butterfly_tb fftstage_o2048_tb
test: mpy_tb bitreverse_tb fftstage_tb qtrstage_tb butterfly_tb fftstage_tb
test: fft_tb ifft_tb hwbfly_tb
./mpy_tb
./dblrev_tb
./dblstage_tb
./qtrstage_tb
./bitreverse_tb
./fftstage_tb
echo ./qtrstage_tb
./butterfly_tb
./hwbfly_tb
./fftstage_o2048_tb
./fftstage_tb
./fft_tb
./ifft_tb
 
.PHONY: clean
clean:
rm -f mpy_tb dblrev_tb dblstage_tb qtrstage_tb butterfly_tb
rm -f fftstage_o2048_tb fft_tb ifft_tb hwbfly_tb
rm -f mpy_tb bitreverse_tb fftstage_tb qtrstage_tb butterfly_tb
rm -f fftstage_tb fft_tb ifft_tb hwbfly_tb
rm -rf fft_tb.dbl ifft_tb.dbl
rm -rf *cmem_*.hex
 
/trunk/bench/cpp/README.md
0,0 → 1,18
Here are the bench tests for the pipelined FFT. In general, there's a
`*_tb.cpp` file corresponding to every unit within the FFT. Feel free to
try them.
 
Be aware, however, the [fft_tb](fft_tb.cpp) doesn't truly
check for success--I just haven't gotten to the point of verifying that
the FFT result is *close enough* to the right answer in spite of actually
calculating the right answer. Instead, it creates a data file that can be
read in Octave via [fft_tb.m](fft_tb.m). That will show the first test output.
The second and subsequent outputs can be read via `k=k+1;` followed by calling
[plottst](plottst.m).
 
As another note (before I clean things up more), you'll need the `*.hex` files
in the same directory as the one you call [fft_tb](fft_tb.cpp) or
[fftstage_tb](fftstage_tb.cpp) from.
 
I expect the IFFT will work: it's just an FFT with conjugate twiddle factors,
although I haven't fully tested it yet.
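
The remark about the IFFT being an FFT with conjugate twiddle factors can be checked on a small floating-point model. The sketch below is a naive O(N^2) DFT, not the pipelined core; the `dft` function and its `inverse` flag are names chosen for this illustration. The forward direction uses the angle -2*pi*k*m/N and the inverse flips the sign, which is the same as conjugating every twiddle factor.

```cpp
#include <cmath>
#include <complex>
#include <vector>

typedef std::complex<double> cplx;

// Naive DFT of x. With inverse == true the twiddle factors are the
// complex conjugates of the forward ones. No 1/N scaling is applied.
std::vector<cplx> dft(const std::vector<cplx> &x, bool inverse) {
	const size_t n = x.size();
	const double pi = std::acos(-1.0);
	const double sgn = inverse ? 1.0 : -1.0;
	std::vector<cplx> out(n);

	for (size_t k = 0; k < n; k++) {
		cplx acc(0.0, 0.0);
		for (size_t m = 0; m < n; m++) {
			// Reduce k*m modulo n to keep the angle small
			const double ang = sgn * 2.0 * pi
				* (double)((k * m) % n) / (double)n;
			acc += x[m] * cplx(std::cos(ang), std::sin(ang));
		}
		out[k] = acc;
	}
	return out;
}
```

Running `dft(dft(x, false), true)` returns the original `x` scaled by N, since neither direction divides by the length in this sketch.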
/trunk/bench/cpp/bitreverse_tb.cpp
0,0 → 1,235
////////////////////////////////////////////////////////////////////////////
//
// Filename: bitreverse_tb.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: A test-bench for the bitreversal stage of the pipelined
// FFT. This file may be run autonomously. If so, the last line
// output will either read "SUCCESS" on success, or some other failure
// message otherwise.
//
// This file depends upon verilator to both compile, run, and therefore
// test either snglbrev.v or dblreverse.v--depending on whether or not the
// FFT handles one or two inputs per clock respectively.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
///////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015,2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
///////////////////////////////////////////////////////////////////////////
#include "verilated.h"
#include "verilated_vcd_c.h"
 
#include "fftsize.h"
#include "Vbitreverse.h"
 
#define FFTBITS TST_DBLREVERSE_LGSIZE
#define FFTSIZE (1<<(FFTBITS))
#define FFTMASK (FFTSIZE-1)
#define DATALEN (1<<(FFTBITS+1))
#define DATAMSK (DATALEN-1)
#define PAGEMSK (FFTSIZE)
 
#ifdef NEW_VERILATOR
#define VVAR(A) bitreverse__DOT_ ## A
#else
#define VVAR(A) v__DOT_ ## A
#endif
 
typedef Vbitreverse TSTCLASS;
 
#define iaddr VVAR(_wraddr)
#define in_reset VVAR(_in_reset)
 
VerilatedVcdC *trace = NULL;
uint64_t m_tickcount = 0;
 
void tick(TSTCLASS *brev) {
m_tickcount++;
 
brev->i_clk = 0;
brev->eval();
if (trace) trace->dump((uint64_t)(10ul*m_tickcount-2));
brev->i_clk = 1;
brev->eval();
if (trace) trace->dump((uint64_t)(10ul*m_tickcount));
brev->i_clk = 0;
brev->eval();
if (trace) {
trace->dump((uint64_t)(10ul*m_tickcount+5));
trace->flush();
}
 
brev->i_ce = 0;
}
 
void cetick(TSTCLASS *brev) {
brev->i_ce = 1;
tick(brev);
if (rand()&1) {
brev->i_ce = 1;
tick(brev);
}
}
 
void reset(TSTCLASS *brev) {
brev->i_ce = 0;
brev->i_reset = 1;
tick(brev);
brev->i_ce = 0;
brev->i_reset = 0;
tick(brev);
}
 
unsigned long bitrev(const int nbits, const unsigned long vl) {
unsigned long r = 0;
unsigned long val = vl;
 
for(int k=0; k<nbits; k++) {
r <<= 1;
r |= (val & 1);
val >>= 1;
}
 
return r;
}
 
int main(int argc, char **argv, char **envp) {
Verilated::commandArgs(argc, argv);
Verilated::traceEverOn(true);
TSTCLASS *brev = new TSTCLASS;
int syncd = 0;
unsigned long datastore[DATALEN], dataidx=0;
const int BREV_OFFSET = 0;
 
trace = new VerilatedVcdC;
brev->trace(trace, 99);
trace->open("bitreverse_tb.vcd");
 
reset(brev);
 
printf("FFTSIZE = %08x\n", FFTSIZE);
printf("FFTMASK = %08x\n", FFTMASK);
printf("DATALEN = %08x\n", DATALEN);
printf("DATAMSK = %08x\n", DATAMSK);
 
for(int k=0; k<4*(FFTSIZE); k++) {
brev->i_ce = 1;
#ifdef DBLCLKFFT
brev->i_in_0 = 2*k;
brev->i_in_1 = 2*k+1;
datastore[(dataidx++)&(DATAMSK)] = brev->i_in_0;
datastore[(dataidx++)&(DATAMSK)] = brev->i_in_1;
#else
brev->i_in = k;
datastore[(dataidx++)&(DATAMSK)] = brev->i_in;
#endif
tick(brev);
 
printf("k=%3d: IN = %6lx, OUT = %6lx, SYNC = %d\t(%2x) %d\n",
k, brev->i_in, brev->o_out, brev->o_sync,
brev->iaddr, brev->in_reset);
 
if ((k>BREV_OFFSET)&&((BREV_OFFSET==(k&FFTMASK))?1:0) != brev->o_sync) {
fprintf(stdout, "FAIL, BAD SYNC (k = %d > %d)\n", k, BREV_OFFSET);
exit(EXIT_FAILURE);
} else if (brev->o_sync) {
syncd = 1;
}
if ((syncd)&&((brev->o_out&FFTMASK) != bitrev(FFTBITS, k-BREV_OFFSET))) {
fprintf(stdout, "FAIL: BITREV.0 of k (%2x) = %2lx, not %2lx\n",
k, brev->o_out, bitrev(FFTBITS, (k-BREV_OFFSET)));
exit(EXIT_FAILURE);
}
}
 
for(int k=0; k<4*(FFTSIZE); k++) {
brev->i_ce = 1;
#ifdef DBLCLKFFT
brev->i_in_0 = rand() & 0x0ffffff;
brev->i_in_1 = rand() & 0x0ffffff;
datastore[(dataidx++)&(DATAMSK)] = brev->i_in_0;
datastore[(dataidx++)&(DATAMSK)] = brev->i_in_1;
#else
brev->i_in = rand() & 0x0ffffff;
datastore[(dataidx++)&(DATAMSK)] = brev->i_in;
#endif
tick(brev);
 
#ifdef DBLCLKFFT
printf("k=%3d: IN = %6lx : %6lx, OUT = %6lx : %6lx, SYNC = %d\n",
k, brev->i_in_0, brev->i_in_1,
brev->o_out_0, brev->o_out_1, brev->o_sync);
#else
printf("k=%3d: IN = %6lx, OUT = %6lx, SYNC = %d\n",
k, brev->i_in, brev->o_out, brev->o_sync);
#endif
 
if (brev->o_sync)
syncd = 1;
#ifdef DBLCLKFFT
if ((syncd)&&(brev->o_out_0 != datastore[(((dataidx-2-FFTSIZE)&PAGEMSK) + bitrev(FFTBITS, (dataidx-FFTSIZE-2)&FFTMASK))])) {
fprintf(stdout, "FAIL: BITREV.0 of k (%2x) = %2lx, not %2lx (expected %lx -> %lx)\n",
k, brev->o_out_0,
datastore[(((dataidx-2-FFTSIZE)&PAGEMSK)
+ bitrev(FFTBITS, (dataidx-FFTSIZE-2)&FFTMASK))],
(dataidx-2)&DATAMSK,
(((dataidx-2)&PAGEMSK)
+ bitrev(FFTBITS, (dataidx-FFTSIZE-2)&FFTMASK)));
// exit(-1);
}
 
if ((syncd)&&(brev->o_out_1 != datastore[(((dataidx-2-FFTSIZE)&PAGEMSK) + bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK))])) {
fprintf(stdout, "FAIL: BITREV.1 of k (%2x) = %2lx, not %2lx (expected %lx)\n",
k, brev->o_out_1,
datastore[(((dataidx-2-FFTSIZE)&PAGEMSK)
+ bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK))],
(((dataidx-1)&PAGEMSK)
+ bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK)));
// exit(-1);
}
#else
if ((syncd)&&(brev->o_out != datastore[
(((dataidx-1-FFTSIZE)&PAGEMSK)
+ bitrev(FFTBITS,
(dataidx-FFTSIZE-1)&FFTMASK))])) {
fprintf(stdout, "FAIL: BITREV.0 of k (%2x) = %2lx, not %2lx (expected %lx -> %lx)\n",
k, brev->o_out,
datastore[(((dataidx-1-FFTSIZE)&PAGEMSK)
+ bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK))],
(dataidx-2)&DATAMSK,
(((dataidx-2)&PAGEMSK)
+ bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK)));
exit(EXIT_FAILURE);
}
#endif
}
 
delete brev;
 
printf("SUCCESS!\n");
exit(0);
}
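
The check in the test bench above compares each output against the stored input at a bit-reversed index. As a stand-alone sketch of that reordering, the snippet below repeats the same `bitrev()` loop and applies it to one block of samples. The `reorder` helper is hypothetical and only models the expected data order; it says nothing about how the hardware buffers its pages.

```cpp
#include <vector>

// Reverse the low nbits bits of vl (same loop as in the test bench).
unsigned long bitrev(const int nbits, const unsigned long vl) {
	unsigned long r = 0;
	unsigned long val = vl;

	for (int k = 0; k < nbits; k++) {
		r <<= 1;
		r |= (val & 1);
		val >>= 1;
	}
	return r;
}

// Hypothetical helper: output sample k is input sample bitrev(nbits, k).
// Assumes in.size() == (1ul << nbits).
std::vector<unsigned long> reorder(const int nbits,
		const std::vector<unsigned long> &in) {
	std::vector<unsigned long> out(in.size());
	for (unsigned long k = 0; k < in.size(); k++)
		out[k] = in[bitrev(nbits, k)];
	return out;
}
```

With `nbits = 3` and inputs 0 through 7, the outputs come out in the order 0, 4, 2, 6, 1, 5, 3, 7.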
/trunk/bench/cpp/butterfly_tb.cpp
4,13 → 4,13
//
// Project: A Doubletime Pipelined FFT
//
// Purpose: A test-bench for the butterfly.v subfile of the double
// clocked FFT. This file may be run autonomously. If so,
// the last line output will either read "SUCCESS" on success,
// or some other failure message otherwise.
// Purpose: A test-bench for the butterfly.v subfile of the generic
// pipelined FFT. This file may be run autonomously. If so,
// the last line output will either read "SUCCESS" on success, or some
// other failure message otherwise.
//
// This file depends upon verilator to both compile, run, and
// therefore test butterfly.v
// This file depends upon verilator to both compile, run, and therefore
// test butterfly.v
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
17,7 → 17,7
//
///////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015, Gisselquist Technology, LLC
// Copyright (C) 2015,2018 Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
42,11 → 42,18
#include <stdio.h>
#include <stdint.h>
 
#include "fftsize.h"
#include "verilated.h"
#include "verilated_vcd_c.h"
#include "Vbutterfly.h"
#include "verilated.h"
#include "twoc.h"
#include "fftsize.h"
 
#ifdef NEW_VERILATOR
#define VVAR(A) butterfly__DOT__ ## A
#else
#define VVAR(A) v__DOT_ ## A
#endif
 
#define IWIDTH TST_BUTTERFLY_IWIDTH
#define CWIDTH TST_BUTTERFLY_CWIDTH
#define OWIDTH TST_BUTTERFLY_OWIDTH
55,24 → 62,55
class BFLY_TB {
public:
Vbutterfly *m_bfly;
VerilatedVcdC *m_trace;
unsigned long m_left[64], m_right[64];
bool m_aux[64];
int m_addr, m_lastaux, m_offset;
bool m_syncd, m_waiting_for_sync_input;
uint64_t m_tickcount;
 
BFLY_TB(void) {
Verilated::traceEverOn(true);
m_trace = NULL;
m_bfly = new Vbutterfly;
m_addr = 0;
m_syncd = 0;
m_tickcount = 0;
m_waiting_for_sync_input = true;
}
 
void opentrace(const char *vcdname) {
if (!m_trace) {
m_trace = new VerilatedVcdC;
m_bfly->trace(m_trace, 99);
m_trace->open(vcdname);
}
}
 
void closetrace(void) {
if (m_trace) {
m_trace->close();
delete m_trace;
m_trace = NULL;
}
}
 
void tick(void) {
m_tickcount++;
 
m_lastaux = m_bfly->o_aux;
m_bfly->i_clk = 0;
m_bfly->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2));
m_bfly->i_clk = 1;
m_bfly->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount));
m_bfly->i_clk = 0;
m_bfly->eval();
if (m_trace) {
m_trace->dump((uint64_t)(10ul*m_tickcount+5));
m_trace->flush();
}
 
if ((!m_syncd)&&(m_bfly->o_aux))
m_offset = m_addr;
79,14 → 117,33
m_syncd = (m_syncd) || (m_bfly->o_aux);
}
 
void cetick(void) {
int ce = m_bfly->i_ce, nkce;
 
tick();
 
nkce = (rand()&1);
#ifdef FFT_CKPCE
nkce += FFT_CKPCE;
#endif
 
if ((ce)&&(nkce > 0)) {
m_bfly->i_ce = 0;
for(int kce=0; kce<nkce-1; kce++)
tick();
}
 
m_bfly->i_ce = ce;
}
 
void reset(void) {
m_bfly->i_ce = 0;
m_bfly->i_rst = 1;
m_bfly->i_reset = 1;
m_bfly->i_coef = 0l;
m_bfly->i_left = 0;
m_bfly->i_right = 0;
tick();
m_bfly->i_rst = 0;
m_bfly->i_reset = 0;
m_bfly->i_ce = 1;
//
// Let's run a RESET test here, forcing the whole butterfly
93,19 → 150,18
// to be filled with aux=1. If the reset works right,
// we'll never get an aux=1 output.
//
m_bfly->i_rst = 1;
m_bfly->i_reset = 1;
m_bfly->i_aux = 1;
for(int i=0; i<200; i++) {
m_bfly->i_ce = 1;
tick();
}
m_bfly->i_ce = 1;
for(int i=0; i<200; i++)
cetick();
 
// Now here's the RESET line, so let's see what the test does
m_bfly->i_rst = 1;
m_bfly->i_reset = 1;
m_bfly->i_ce = 1;
m_bfly->i_aux = 1;
tick();
m_bfly->i_rst = 0;
cetick();
m_bfly->i_reset = 0;
m_syncd = 0;
 
m_waiting_for_sync_input = true;
124,41 → 180,42
}
 
m_bfly->i_ce = 1;
tick();
cetick();
 
if ((m_bfly->o_aux)&&(!m_lastaux))
printf("\n");
printf("n,k=%d,%3d: COEF=%010lx, LFT=%08x, RHT=%08x, A=%d, OLFT =%09lx, ORHT=%09lx, AUX=%d\n",
printf("n,k=%d,%3d: COEF=%0*lx, LFT=%0*x, RHT=%0*x, A=%d, OLFT =%0*lx, ORHT=%0*lx, AUX=%d\n",
n,k,
m_bfly->i_coef & (~(-1l<<40)),
m_bfly->i_left,
m_bfly->i_right,
(2*CWIDTH+3)/4, ubits(m_bfly->i_coef, 2*CWIDTH),
(2*IWIDTH+3)/4, m_bfly->i_left,
(2*IWIDTH+3)/4, m_bfly->i_right,
m_bfly->i_aux,
m_bfly->o_left,
m_bfly->o_right,
(2*OWIDTH+3)/4, (long)m_bfly->o_left,
(2*OWIDTH+3)/4, (long)m_bfly->o_right,
m_bfly->o_aux);
 
if ((m_syncd)&&(m_left[(m_addr-m_offset)&(64-1)] != m_bfly->o_left)) {
printf("WRONG O_LEFT! (%lx(exp) != %lx(sut))\n",
printf("WRONG O_LEFT! (%lx(exp) != %lx(sut))\n",
m_left[(m_addr-m_offset)&(64-1)],
m_bfly->o_left);
exit(-1);
(long)m_bfly->o_left);
exit(EXIT_FAILURE);
}
 
if ((m_syncd)&&(m_right[(m_addr-m_offset)&(64-1)] != m_bfly->o_right)) {
printf("WRONG O_RIGHT (%10lx(exp) != (%10lx(sut))!\n",
m_right[(m_addr-m_offset)&(64-1)], m_bfly->o_right);
exit(-1);
printf("WRONG O_RIGHT! (%lx(exp) != %lx(sut))\n",
m_right[(m_addr-m_offset)&(64-1)],
(long)m_bfly->o_right);
exit(EXIT_FAILURE);
}
 
if ((m_syncd)&&(m_aux[(m_addr-m_offset)&(64-1)] != m_bfly->o_aux)) {
printf("FAILED AUX CHANNEL TEST (i.e. the SYNC)\n");
exit(-1);
exit(EXIT_FAILURE);
}
 
if ((m_addr > TST_BUTTERFLY_MPYDELAY+6)&&(!m_syncd)) {
printf("NO SYNC PULSE!\n");
// exit(-1);
exit(EXIT_FAILURE);
}
 
// Now, let's calculate an "expected" result ...
241,6 → 298,18
}
};
 
long gentestword(int w, int al, int ar) {
unsigned long lo, hi, r;
hi = ((unsigned long)(al&0x0c))<<(w-4);
hi += (al&3)-2ul;
 
lo = ((unsigned long)(ar&0x0c))<<(w-4);
lo += (ar&3)-2ul;
 
r = (ubits(hi, w) << w) | (ubits(lo, w));
return r;
}
 
int main(int argc, char **argv, char **envp) {
Verilated::commandArgs(argc, argv);
BFLY_TB *bfly = new BFLY_TB;
251,13 → 320,32
 
const int TESTSZ = 256;
 
bfly->opentrace("butterfly.vcd");
 
bfly->reset();
 
// #define ZEROTEST
#define ZEROTEST bfly->test(9,0,0x0000000000l,0x00000000,0x00000000, 0)
// Test whether or not the aux channel starts clear, like it's supposed to
 
bfly->test(9,0,0x4000000000l,0x000f0000,0x00000000, 1);
ZEROTEST;
ZEROTEST;
bfly->test(9,0,0x4000000000l,0x00000000,0x000f0000, 0);
ZEROTEST;
ZEROTEST;
bfly->test(9,0,0x4000000000l,0x000f0000,0x000f0000, 0);
ZEROTEST;
ZEROTEST;
bfly->test(9,1,0x4000000000l,0x000f0000,0xfff10000, 0);
ZEROTEST;
ZEROTEST;
bfly->test(9,2,0x4000000000l,0x0000000f,0x0000fff1, 0);
ZEROTEST;
ZEROTEST;
bfly->test(9,3,0x4000000000l,0x0000000f,0x0000000f, 0);
ZEROTEST;
ZEROTEST;
 
bfly->test(9,0,0x4000000000l,0x7fff0000,0x7fff0000, 1);
bfly->test(9,1,0x4000000000l,0x7fff0000,0x80010000, 0);
337,6 → 425,30
bfly->test(n,k, cof, lft, rht, aux);
}
 
int k = TESTSZ;
// Exhaustively test
#if (4*IWIDTH+2*CWIDTH <= 24)
for(int a=0; a<(1<<(2*IWIDTH)); a++)
for(int b=0; b<(1<<(2*IWIDTH)); b++)
for(int c=0; c<(1<<(2*CWIDTH)); c++)
bfly->test(0, k++, c, a, b, 0);
 
printf("Exhaust complete\n");
#else
for(int al=0; al<16; al++)
for(int ar=0; ar<16; ar++)
for(int bl=0; bl<16; bl++)
for(int br=0; br<16; br++)
for(int cl=0; cl<16; cl++)
for(int cr=0; cr<16; cr++) {
long a = gentestword(IWIDTH, al, ar);
long b = gentestword(IWIDTH, bl, br);
long c = gentestword(CWIDTH, cl, cr);
bfly->test(0, k++, c, a, b, 0);
}
printf("Partial exhaust complete\n");
#endif
 
delete bfly;
 
printf("SUCCESS!\n");
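
The `cetick()` method added in this revision spaces valid samples apart by a partly random number of idle clocks, so the design is exercised with `i_ce` low between samples. The sketch below models that idea against a toy design rather than a Verilated one. `ToyDut`, `feed`, and the `min_idle` parameter are all invented for this illustration, and the idle-clock count is a simplified variant of the loop in the test bench, not a copy of it.

```cpp
#include <cstdlib>

// Toy stand-in for a Verilated model (hypothetical): it accumulates
// i_data on every clock where i_ce is high and ignores idle clocks.
struct ToyDut {
	int i_ce = 0;
	long i_data = 0;
	long acc = 0;
	unsigned long ticks = 0;

	void tick(void) {
		ticks++;
		if (i_ce)
			acc += i_data;
	}
};

// Clock the design once with i_ce as set by the caller, then hold
// i_ce low for min_idle or min_idle+1 further clocks before
// restoring it.
void cetick(ToyDut &dut, const int min_idle) {
	const int ce = dut.i_ce;
	dut.tick();

	const int nidle = min_idle + (std::rand() & 1);
	if (ce) {
		dut.i_ce = 0;
		for (int k = 0; k < nidle; k++)
			dut.tick();
	}
	dut.i_ce = ce;
}

// Feed the samples 1..nsamples and return the accumulated sum.
long feed(ToyDut &dut, const int nsamples, const int min_idle) {
	for (int k = 1; k <= nsamples; k++) {
		dut.i_ce = 1;
		dut.i_data = k;
		cetick(dut, min_idle);
	}
	return dut.acc;
}
```

A design that honors its clock enable gives the same result whatever the gaps are: feeding ten samples produces a sum of 55 with `min_idle` set to 0 or to 2, and only the tick count changes.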
/trunk/bench/cpp/fft_tb.cpp
5,12 → 5,11
//
// Purpose: A test-bench for the main program, fftmain.v, of the double
// clocked FFT. This file may be run autonomously (when
// fully functional). If so, the last line output will either
// read "SUCCESS" on success, or some other failure message
// otherwise.
// fully functional). If so, the last line output will either read
// "SUCCESS" on success, or some other failure message otherwise.
//
// This file depends upon verilator to both compile, run, and
// therefore test fftmain.v
// This file depends upon verilator to both compile, run, and therefore
// test fftmain.v
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
17,7 → 16,7
//
///////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015, Gisselquist Technology, LLC
// Copyright (C) 2015,2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
40,10 → 39,12
//
///////////////////////////////////////////////////////////////////////////
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <fftw3.h>
 
#include "verilated.h"
#include "verilated_vcd_c.h"
#include "Vfftmain.h"
#include "twoc.h"
 
56,8 → 57,11
#define VVAR(A) v__DOT_ ## A
#endif
 
 
#ifdef DBLCLKFFT
#define revstage_iaddr VVAR(_revstage__DOT__iaddr)
#else
#define revstage_iaddr VVAR(_revstage__DOT__wraddr)
#endif
#define br_sync VVAR(_br_sync)
#define br_started VVAR(_r_br_started)
#define w_s2048 VVAR(_w_s2048)
119,9 → 123,11
double *m_fft_buf;
bool m_syncd;
unsigned long m_tickcount;
VerilatedVcdC* m_trace;
 
FFT_TB(void) {
m_fft = new Vfftmain;
Verilated::traceEverOn(true);
m_iaddr = m_oaddr = 0;
m_dumpfp = NULL;
 
131,39 → 137,75
FFTW_FORWARD, FFTW_MEASURE);
m_syncd = false;
m_ntest = 0;
}
 
m_tickcount = 0l;
~FFT_TB(void) {
closetrace();
delete m_fft;
m_fft = NULL;
}
 
virtual void opentrace(const char *vcdname) {
if (!m_trace) {
m_trace = new VerilatedVcdC;
m_fft->trace(m_trace, 99);
m_trace->open(vcdname);
}
}
 
virtual void closetrace(void) {
if (m_trace) {
m_trace->close();
delete m_trace;
m_trace = NULL;
}
}
 
void tick(void) {
if ((!m_fft->i_ce)||(m_fft->i_rst))
m_tickcount++;
if (m_fft->i_reset)
printf("TICK(%s,%s)\n",
(m_fft->i_rst)?"RST":" ",
(m_fft->i_reset)?"RST":" ",
(m_fft->i_ce)?"CE":" ");
 
m_fft->i_clk = 0;
m_fft->eval();
if (m_trace)
m_trace->dump((vluint64_t)(10*m_tickcount-2));
m_fft->i_clk = 1;
m_fft->eval();
if (m_trace)
m_trace->dump((vluint64_t)(10*m_tickcount));
m_fft->i_clk = 0;
m_fft->eval();
if (m_trace) {
m_trace->dump((vluint64_t)(10*m_tickcount+5));
m_trace->flush();
}
}
 
m_tickcount++;
void cetick(void) {
int ce = m_fft->i_ce, nkce;
tick();
 
/*
int nrpt = (rand()&0x01f) + 1;
m_fft->i_ce = 0;
for(int i=0; i<nrpt; i++) {
m_fft->i_clk = 0;
m_fft->eval();
m_fft->i_clk = 1;
m_fft->eval();
nkce = (rand()&1);
#ifdef FFT_CKPCE
nkce += FFT_CKPCE;
#endif
if ((ce)&&(nkce>0)) {
m_fft->i_ce = 0;
for(int kce=1; kce < nkce; kce++)
tick();
}
*/
 
m_fft->i_ce = ce;
}
 
void reset(void) {
m_fft->i_ce = 0;
m_fft->i_rst = 1;
m_fft->i_reset = 1;
tick();
m_fft->i_rst = 0;
m_fft->i_reset = 0;
tick();
 
m_iaddr = m_oaddr = m_logbase = 0;
254,18 → 296,19
printf("%3d : SCALE = %12.6f, WT = %18.1f, ISQ = %15.1f, ",
m_ntest, scale, wt, isq);
printf("OSQ = %18.1f, ", osq);
printf("XISQ = %18.1f\n", xisq);
printf("XISQ = %18.1f, sqrt = %9.2f\n", xisq, sqrt(xisq));
if (xisq > 1.4 * FFTLEN/2) {
printf("TEST FAIL!! Result is out of bounds from ");
printf("expected result with FFTW3.\n");
// exit(-2);
// exit(EXIT_FAILURE);
}
m_ntest++;
}
 
#ifdef DBLCLKFFT
bool test(ITYP lft, ITYP rht) {
m_fft->i_ce = 1;
m_fft->i_rst = 0;
m_fft->i_reset = 0;
m_fft->i_left = lft;
m_fft->i_right = rht;
 
272,7 → 315,7
m_log[(m_iaddr++)&(NFTLOG*FFTLEN-1)] = lft;
m_log[(m_iaddr++)&(NFTLOG*FFTLEN-1)] = rht;
 
tick();
cetick();
 
if (m_fft->o_sync) {
if (!m_syncd) {
339,7 → 382,81
 
return (m_fft->o_sync);
}
#else
bool test(ITYP data) {
m_fft->i_ce = 1;
m_fft->i_reset = 0;
m_fft->i_sample = data;
 
m_log[(m_iaddr++)&(NFTLOG*FFTLEN-1)] = data;
 
cetick();
 
if (m_fft->o_sync) {
if (!m_syncd) {
m_syncd = true;
printf("ORIGINAL SYNC AT 0x%lx, m_oaddr set to 0x%x\n", m_tickcount, m_oaddr);
m_logbase = m_iaddr;
} else printf("RESYNC AT %lx\n", m_tickcount);
m_oaddr &= (-1<<LGWIDTH);
} else m_oaddr += 1;
 
printf("%8x,%5d: %08x -> %011lx\t",
m_iaddr, m_oaddr, data, m_fft->o_result);
 
#ifndef APPLY_BITREVERSE_LOCALLY
printf(" [%3x]%s", m_fft->revstage_iaddr,
(m_fft->br_sync)?"S"
:((m_fft->br_started)?".":"x"));
#endif
 
printf(" ");
#if (FFT_SIZE>=2048)
printf("%s", (m_fft->w_s2048)?"S":"-");
#endif
#if (FFT_SIZE>1024)
printf("%s", (m_fft->w_s1024)?"S":"-");
#endif
#if (FFT_SIZE>512)
printf("%s", (m_fft->w_s512)?"S":"-");
#endif
#if (FFT_SIZE>256)
printf("%s", (m_fft->w_s256)?"S":"-");
#endif
#if (FFT_SIZE>128)
printf("%s", (m_fft->w_s128)?"S":"-");
#endif
#if (FFT_SIZE>64)
printf("%s", (m_fft->w_s64)?"S":"-");
#endif
#if (FFT_SIZE>32)
printf("%s", (m_fft->w_s32)?"S":"-");
#endif
#if (FFT_SIZE>16)
printf("%s", (m_fft->w_s16)?"S":"-");
#endif
#if (FFT_SIZE>8)
printf("%s", (m_fft->w_s8)?"S":"-");
#endif
#if (FFT_SIZE>4)
printf("%s", (m_fft->w_s4)?"S":"-");
#endif
 
printf(" %s%s\n",
(m_fft->o_sync)?"\t(SYNC!)":"",
(m_fft->o_result)?" (NZ)":"");
 
m_data[(m_oaddr )&(FFTLEN-1)] = m_fft->o_result;
 
if ((m_syncd)&&((m_oaddr&(FFTLEN-1)) == FFTLEN-1)) {
dumpwrite();
checkresults();
}
 
return (m_fft->o_sync);
}
#endif
 
bool test(double lft_r, double lft_i, double rht_r, double rht_i) {
ITYP ilft, irht, ilft_r, ilft_i, irht_r, irht_i;
 
351,7 → 468,12
ilft = (ilft_r << IWIDTH) | ilft_i;
irht = (irht_r << IWIDTH) | irht_i;
 
#ifdef DBLCLKFFT
return test(ilft, irht);
#else
test(ilft);
return test(irht);
#endif
}
 
double rdata(int addr) {
405,6 → 527,7
exit(-1);
}
 
fft->opentrace("fft.vcd");
fft->reset();
 
{
414,7 → 537,8
fft->dump(fpout);
 
// 1.
fft->test(0.0, 0.0, 32767.0, 0.0);
double maxv = ((1l<<(IWIDTH-1))-1l);
fft->test(0.0, 0.0, maxv, 0.0);
for(int k=0; k<FFTLEN/2-1; k++)
fft->test(0.0,0.0,0.0,0.0);
 
422,27 → 546,27
for(int k=0; k<FFTLEN/2; k++) {
double cl, cr, sl, sr, W;
W = - 2.0 * M_PI / FFTLEN * (1);
cl = cos(W * (2*k )) * 16383.0;
sl = sin(W * (2*k )) * 16383.0;
cr = cos(W * (2*k+1)) * 16383.0;
sr = sin(W * (2*k+1)) * 16383.0;
cl = cos(W * (2*k )) * (double)((1l<<(IWIDTH-2))-1l);
sl = sin(W * (2*k )) * (double)((1l<<(IWIDTH-2))-1l);
cr = cos(W * (2*k+1)) * (double)((1l<<(IWIDTH-2))-1l);
sr = sin(W * (2*k+1)) * (double)((1l<<(IWIDTH-2))-1l);
fft->test(cl, sl, cr, sr);
}
 
// 2.
fft->test(32767.0, 0.0, 32767.0, 0.0);
fft->test(maxv, 0.0, maxv, 0.0);
for(int k=0; k<FFTLEN/2-1; k++)
fft->test(0.0,0.0,0.0,0.0);
 
// 3.
fft->test(0.0,0.0,0.0,0.0);
fft->test(32767.0, 0.0, 0.0, 0.0);
fft->test(maxv, 0.0, 0.0, 0.0);
for(int k=0; k<FFTLEN/2-1; k++)
fft->test(0.0,0.0,0.0,0.0);
 
// 4.
for(int k=0; k<8; k++)
fft->test(32767.0, 0.0, 32767.0, 0.0);
fft->test(maxv, 0.0, maxv, 0.0);
for(int k=8; k<FFTLEN/2; k++)
fft->test(0.0,0.0,0.0,0.0);
 
449,7 → 573,7
// 5.
if (FFTLEN/2 >= 16) {
for(int k=0; k<16; k++)
fft->test(32767.0, 0.0, 32767.0, 0.0);
fft->test(maxv, 0.0, maxv, 0.0);
for(int k=16; k<FFTLEN/2; k++)
fft->test(0.0,0.0,0.0,0.0);
}
457,7 → 581,7
// 6.
if (FFTLEN/2 >= 32) {
for(int k=0; k<32; k++)
fft->test(32767.0, 0.0, 32767.0, 0.0);
fft->test(maxv, 0.0, maxv, 0.0);
for(int k=32; k<FFTLEN/2; k++)
fft->test(0.0,0.0,0.0,0.0);
}
465,7 → 589,7
// 7.
if (FFTLEN/2 >= 64) {
for(int k=0; k<64; k++)
fft->test(32767.0, 0.0, 32767.0, 0.0);
fft->test(maxv, 0.0, maxv, 0.0);
for(int k=64; k<FFTLEN/2; k++)
fft->test(0.0,0.0,0.0,0.0);
}
472,7 → 596,7
 
if (FFTLEN/2 >= 128) {
for(int k=0; k<128; k++)
fft->test(32767.0, 0.0, 32767.0, 0.0);
fft->test(maxv, 0.0, maxv, 0.0);
for(int k=128; k<FFTLEN/2; k++)
fft->test(0.0,0.0,0.0,0.0);
}
479,7 → 603,7
 
if (FFTLEN/2 >= 256) {
for(int k=0; k<256; k++)
fft->test(32767.0, 0.0, 32767.0, 0.0);
fft->test(maxv, 0.0, maxv, 0.0);
for(int k=256; k<FFTLEN/2; k++)
fft->test(0.0,0.0,0.0,0.0);
}
486,7 → 610,7
 
if (FFTLEN/2 >= 512) {
for(int k=0; k<256+128; k++)
fft->test(32767.0, 0.0, 32767.0, 0.0);
fft->test(maxv, 0.0, maxv, 0.0);
for(int k=256+128; k<FFTLEN/2; k++)
fft->test(0.0,0.0,0.0,0.0);
}
603,22 → 727,22
 
// 65.
for(int k=0; k<FFTLEN/2; k++)
fft->test(32767.0,0.0,-32767.0,0.0);
fft->test(maxv,0.0,-maxv,0.0);
// 66.
for(int k=0; k<FFTLEN/2; k++)
fft->test(0.0,-32767.0,0.0,32767.0);
fft->test(0.0,-maxv,0.0,maxv);
// 67.
for(int k=0; k<FFTLEN/2; k++)
fft->test(-32768.0,-32768.0,-32768.0,-32768.0);
fft->test(-maxv,-maxv,-maxv,-maxv);
// 68.
for(int k=0; k<FFTLEN/2; k++)
fft->test(0.0,-32767.0,0.0,32767.0);
fft->test(0.0,-maxv,0.0,maxv);
// 69.
for(int k=0; k<FFTLEN/2; k++)
fft->test(0.0,32767.0,0.0,-32767.0);
fft->test(0.0,maxv,0.0,-maxv);
// 70.
for(int k=0; k<FFTLEN/2; k++)
fft->test(-32768.0,-32768.0,-32768.0,-32768.0);
fft->test(-maxv,-maxv,-maxv,-maxv);
 
// 71. Now let's go for an impulse (SUCCESS)
fft->test(16384.0, 0.0, 0.0, 0.0);
722,8 → 846,16
 
fclose(fpout);
 
if (!fft->m_syncd) {
printf("FAIL -- NO SYNC\n");
goto test_failure;
}
 
printf("SUCCESS!!\n");
exit(0);
test_failure:
printf("TEST FAILED!!\n");
exit(EXIT_FAILURE);
}
 
 
/trunk/bench/cpp/fftstage_tb.cpp
0,0 → 1,369
////////////////////////////////////////////////////////////////////////////////
//
// Filename: fftstage_tb.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: A test-bench for a generic FFT stage which has been
// instantiated by fftgen. Without loss of (much) generality,
// we'll examine the 2048 fftstage.v. This file may be run autonomously.
// If so, the last line output will either read "SUCCESS" on success, or
// some other failure message otherwise. Likewise the exit code will
// also indicate success (exit(0)) or failure (anything else).
//
// This file depends upon verilator to both compile, run, and therefore
// test fftstage.v. Also, you'll need to place a copy of the cmem_*2048
// hex file into the directory where you run this test bench.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>
#include <assert.h>

#include "Vfftstage.h"
#include "verilated.h"
#include "verilated_vcd_c.h"
#include "twoc.h"
#include "fftsize.h"
 
 
#ifdef NEW_VERILATOR
#define VVAR(A) fftstage__DOT_ ## A
#else
#define VVAR(A) v__DOT_ ## A
#endif
 
#define cmem VVAR(_cmem)
#define iaddr VVAR(_iaddr)
 
#define FFTBITS (FFT_LGWIDTH)
#define FFTLEN (1<<FFTBITS)
#define FFTSIZE FFTLEN
#define FFTMASK (FFTLEN-1)
#define IWIDTH FFT_IWIDTH
#define CWIDTH 20
#define OWIDTH (FFT_IWIDTH+1)
#define BFLYSHIFT 0
#define LGWIDTH (FFT_LGWIDTH)
#ifdef DBLCLKFFT
#define LGSPAN (LGWIDTH-2)
#else
#define LGSPAN (LGWIDTH-1)
#endif
#define ROUND true
 
#define SPANLEN (1<<LGSPAN)
#define SPANMASK (SPANLEN-1)
#define DBLSPANLEN (1<<(LGSPAN+4))
#define DBLSPANMASK (DBLSPANLEN-1)
 
class FFTSTAGE_TB {
public:
Vfftstage *m_ftstage;
VerilatedVcdC *m_trace;
long m_oaddr, m_iaddr;
long m_vals[SPANLEN], m_out[DBLSPANLEN];
bool m_syncd;
int m_offset;
uint64_t m_tickcount;
 
FFTSTAGE_TB(void) {
Verilated::traceEverOn(true);
m_trace = NULL; // opentrace() tests this pointer, so it must start out NULL
m_ftstage = new Vfftstage;
m_syncd = false;
m_iaddr = m_oaddr = 0;
m_offset = 0;
m_tickcount = 0;
}
 
void opentrace(const char *vcdname) {
if (!m_trace) {
m_trace = new VerilatedVcdC;
m_ftstage->trace(m_trace, 99);
m_trace->open(vcdname);
}
}
 
void closetrace(void) {
if (m_trace) {
m_trace->close();
delete m_trace;
m_trace = NULL;
}
}
 
void tick(void) {
m_tickcount++;
 
m_ftstage->i_clk = 0;
m_ftstage->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2));
m_ftstage->i_clk = 1;
m_ftstage->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount));
m_ftstage->i_clk = 0;
m_ftstage->eval();
if (m_trace) {
m_trace->dump((uint64_t)(10ul*m_tickcount+5));
m_trace->flush();
}
}
 
void cetick(void) {
int ce = m_ftstage->i_ce, nkce;
 
tick();
nkce = 0; // (rand()&1);
#ifdef FFT_CKPCE
nkce += FFT_CKPCE;
#endif
if ((ce)&&(nkce > 0)) {
m_ftstage->i_ce = 0;
for(int kce = 1; kce < nkce; kce++)
tick();
}
 
m_ftstage->i_ce = ce;
}
 
void reset(void) {
m_ftstage->i_ce = 0;
m_ftstage->i_reset = 1;
tick();
 
// Let's give it several ticks with no sync
m_ftstage->i_ce = 0;
m_ftstage->i_reset = 0;
for(int i=0; i<8192; i++) {
m_ftstage->i_data = rand();
m_ftstage->i_sync = 0;
m_ftstage->i_ce = 1;
 
cetick();
 
assert(m_ftstage->o_sync == 0);
}
 
m_iaddr = 0;
m_oaddr = 0;
m_offset = 0;
m_syncd = false;
}
 
void butterfly(const long cv, const long lft, const long rht,
long &o_lft, long &o_rht) {
long cv_r, cv_i;
long lft_r, lft_i, rht_r, rht_i;
long o_lft_r, o_lft_i, o_rht_r, o_rht_i;
 
cv_r = sbits(cv>>CWIDTH, CWIDTH);
cv_i = sbits(cv, CWIDTH);
 
lft_r = sbits(lft>>IWIDTH, IWIDTH);
lft_i = sbits(lft, IWIDTH);
 
rht_r = sbits(rht>>IWIDTH, IWIDTH);
rht_i = sbits(rht, IWIDTH);
 
o_lft_r = lft_r + rht_r;
o_lft_i = lft_i + rht_i;
 
o_lft_r &= (~(-1l << OWIDTH));
o_lft_i &= (~(-1l << OWIDTH));
 
// o_lft_r >>= 1;
// o_lft_i >>= 1;
o_lft = (o_lft_r << OWIDTH) | (o_lft_i);
 
o_rht_r = (cv_r * (lft_r-rht_r)) - (cv_i * (lft_i-rht_i));
o_rht_i = (cv_r * (lft_i-rht_i)) + (cv_i * (lft_r-rht_r));
 
if (ROUND) {
if (o_rht_r & (1<<(CWIDTH-3)))
o_rht_r += (1<<(CWIDTH-3))-1;
if (o_rht_i & (1<<(CWIDTH-3)))
o_rht_i += (1<<(CWIDTH-3))-1;
}
 
o_rht_r >>= (CWIDTH-2);
o_rht_i >>= (CWIDTH-2);
 
o_rht_r &= (~(-1l << OWIDTH));
o_rht_i &= (~(-1l << OWIDTH));
o_rht = (o_rht_r << OWIDTH) | (o_rht_i);
 
/*
printf("%10lx %10lx %10lx -> %10lx %10lx\n",
cv & ((1l<<(2*CWIDTH))-1l),
lft & ((1l<<(2*IWIDTH))-1l),
rht & ((1l<<(2*IWIDTH))-1l),
o_lft & ((1l<<(2*OWIDTH))-1l),
o_rht & ((1l<<(2*OWIDTH))-1l));
*/
}
 
void test(bool i_sync, long i_data) {
long cv;
bool bc;
int raddr;
bool failed = false;
 
m_ftstage->i_reset = 0;
m_ftstage->i_ce = 1;
m_ftstage->i_sync = i_sync;
i_data &= (~(-1l<<(2*IWIDTH)));
m_ftstage->i_data = i_data;
 
cv = m_ftstage->cmem[m_iaddr & SPANMASK];
bc = m_iaddr & (1<<LGSPAN);
if (!bc)
m_vals[m_iaddr & (SPANMASK)] = i_data;
else {
int waddr = m_iaddr ^ (1<<LGSPAN);
waddr &= (DBLSPANMASK);
if (m_iaddr & (1<<(LGSPAN+1)))
waddr |= (1<<(LGSPAN));
butterfly(cv, m_vals[m_iaddr & (SPANMASK)], i_data,
m_out[(m_iaddr-SPANLEN) & (DBLSPANMASK)],
m_out[m_iaddr & (DBLSPANMASK)]);
/*
printf("BFLY: C=%16lx M=%8lx I=%10lx -> %10lx %10lx\n",
cv, m_vals[m_iaddr & (SPANMASK)], i_data,
m_out[(m_iaddr-SPANLEN)&(DBLSPANMASK)],
m_out[m_iaddr & (DBLSPANMASK)]);
*/
}
 
cetick();
 
if ((!m_syncd)&&(m_ftstage->o_sync)) {
m_syncd = true;
// m_oaddr = m_iaddr - 0x219;
// m_oaddr = m_iaddr - 0;
m_offset = m_iaddr;
m_oaddr = 0;
 
printf("SYNC!!!!\n");
}
 
raddr = (m_iaddr-m_offset) & DBLSPANMASK;
/*
if (m_oaddr & (1<<(LGSPAN+1)))
raddr |= (1<<LGSPAN);
*/
 
printf("%4ld, %4ld: %d %9lx -> %9lx %d ... %4x %15lx (%10lx)\n",
(long)m_iaddr, (long)m_oaddr,
i_sync, (long)(i_data) & (~(-1l << (2*IWIDTH))),
(long)m_ftstage->o_data,
m_ftstage->o_sync,
 
m_ftstage->iaddr&(FFTMASK>>1),
(long)(m_ftstage->cmem[m_ftstage->iaddr&(SPANMASK>>1)]) & (~(-1l<<(2*CWIDTH))),
(long)m_out[raddr]);
 
if ((m_syncd)&&(m_ftstage->o_sync != ((((m_iaddr-m_offset)&((1<<(LGSPAN+1))-1))==0)?1:0))) {
fprintf(stderr, "Bad output sync (m_iaddr = %lx, m_offset = %x)\n",
(m_iaddr-m_offset) & SPANMASK, m_offset);
failed = true;
}
 
if (m_syncd) {
if (m_out[raddr] != m_ftstage->o_data) {
printf("Bad output data, ([%lx - %x = %x] %lx(exp) != %lx(sut))\n",
m_iaddr, m_offset, raddr,
m_out[raddr], (long)m_ftstage->o_data);
failed = true;
}
} else if (m_iaddr > 4096) {
printf("NO OUTPUT SYNC!\n");
failed = true;
}
m_iaddr++;
m_oaddr++;
 
if (failed)
exit(-1);
}
};
 
 
int main(int argc, char **argv, char **envp) {
Verilated::commandArgs(argc, argv);
FFTSTAGE_TB *ftstage = new FFTSTAGE_TB;
 
printf("Expecting : IWIDTH = %d, CWIDTH = %d, OWIDTH = %d\n",
IWIDTH, CWIDTH, OWIDTH);
 
ftstage->opentrace("fftstage.vcd");
ftstage->reset();
 
// Medium real (constant) value ... just for starters
for(int k=1; k<FFTSIZE; k+=2)
ftstage->test((k==1), 0x00200000l);
// Medium imaginary (constant) value ... just for starters
for(int k=1; k<FFTSIZE; k+=2)
ftstage->test((k==1), 0x00000020l);
// Medium cosine wave, real
for(int k=1; k<FFTSIZE; k+=2) {
long vl;
vl= (long)(cos(2.0 * M_PI * 1.0 / FFTSIZE * k)*(1l<<30) + 0.5);
vl &= (-1l << 16); // Turn off the imaginary bit portion
vl &= (~(-1l << (IWIDTH*2))); // Turn off unused high order bits
ftstage->test((k==1), vl);
}
// Smallest real value
for(int k=1; k<FFTSIZE; k+=2)
ftstage->test((k==1), 0x00080000l);
// Smallest imaginary value
for(int k=1; k<FFTSIZE; k+=2)
ftstage->test((k==1), 0x00000001l);
// Largest real value
for(int k=1; k<FFTSIZE; k+=2)
ftstage->test((k==1), 0x200000000l);
// Largest negative imaginary value
for(int k=1; k<FFTSIZE; k+=2)
ftstage->test((k==1), 0x000010000l);
// Let's try an impulse
for(int k=0; k<FFTSIZE; k+=2)
ftstage->test((k==0), (k==0)?0x020000000l:0l);
// Now, let's clear out the result
for(int k=0; k<FFTSIZE; k+=2)
ftstage->test((k==0), 0x000000000l);
for(int k=0; k<FFTSIZE; k+=2)
ftstage->test((k==0), 0x000000000l);
for(int k=0; k<FFTSIZE; k+=2)
ftstage->test((k==0), 0x000000000l);
for(int k=0; k<FFTSIZE; k+=2)
ftstage->test((k==0), 0x000000000l);
 
printf("SUCCESS! (Offset = %d)\n", ftstage->m_offset);
delete ftstage;
 
exit(0);
}
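
The test benches above unpack each complex sample with `sbits()` and repack it with `ubits()`, both of which come from `twoc.h` and are not shown in this comparison. The following is a minimal sketch of how such helpers might behave, assuming a word laid out as {real, imaginary} with each half `width` bits wide. The names `low_bits`, `sign_extend`, `pack_complex`, `unpack_real`, and `unpack_imag` are hypothetical and are not the project's own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ubits(): keep only the low `bits` bits.
static inline uint64_t low_bits(int64_t val, int bits) {
    assert(bits > 0 && bits < 64);
    return static_cast<uint64_t>(val) & ((uint64_t(1) << bits) - 1);
}

// Hypothetical stand-in for sbits(): treat the low `bits` bits as a
// two's complement number and sign-extend it.
static inline int64_t sign_extend(int64_t val, int bits) {
    uint64_t u = low_bits(val, bits);
    uint64_t sign = uint64_t(1) << (bits - 1);
    // Flipping the sign bit and subtracting it back extends the sign.
    return static_cast<int64_t>(u ^ sign) - static_cast<int64_t>(sign);
}

// Pack a complex sample as {real, imag}, each `width` bits wide.
static inline uint64_t pack_complex(int64_t re, int64_t im, int width) {
    assert(width > 0 && width < 32);
    return (low_bits(re, width) << width) | low_bits(im, width);
}

// Recover the signed halves of a packed word.
static inline int64_t unpack_real(uint64_t word, int width) {
    return sign_extend(static_cast<int64_t>(word >> width), width);
}

static inline int64_t unpack_imag(uint64_t word, int width) {
    return sign_extend(static_cast<int64_t>(word), width);
}
```

With a 16-bit width, `pack_complex(-1, 2, 16)` produces a word whose upper half is all ones and whose lower half is 2, and the two `unpack_*` calls recover the original signed values.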
/trunk/bench/cpp/hwbfly_tb.cpp
1,16 → 1,16
////////////////////////////////////////////////////////////////////////////
//
// Filename: butterfly_tb.cpp
// Filename: hwbfly_tb.cpp
//
// Project: A Doubletime Pipelined FFT
//
// Purpose: A test-bench for the butterfly.v subfile of the double
// clocked FFT. This file may be run autonomously. If so,
// the last line output will either read "SUCCESS" on success,
// or some other failure message otherwise.
// Purpose: A test-bench for the hardware butterfly subfile of the generic
// pipelined FFT. This file may be run autonomously. If so,
// the last line output will either read "SUCCESS" on success, or some
// other failure message otherwise.
//
// This file depends upon verilator to both compile, run, and
// therefore test butterfly.v
// This file depends upon verilator to both compile, run, and therefore
// test hwbfly.v
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
17,7 → 17,7
//
///////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015, Gisselquist Technology, LLC
// Copyright (C) 2015,2018 Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
42,9 → 42,11
#include <stdio.h>
#include <stdint.h>
 
#include "verilated.h"
#include "verilated_vcd_c.h"
#include "Vhwbfly.h"
#include "verilated.h"
#include "twoc.h"
#include "fftsize.h"
 
#ifdef NEW_VERILATOR
#define VVAR(A) hwbfly__DOT_ ## A
52,27 → 54,65
#define VVAR(A) v__DOT_ ## A
#endif
 
#define IWIDTH TST_BUTTERFLY_IWIDTH
#define CWIDTH TST_BUTTERFLY_CWIDTH
#define OWIDTH TST_BUTTERFLY_OWIDTH
 
class BFLY_TB {
class HWBFLY_TB {
public:
Vhwbfly *m_bfly;
VerilatedVcdC *m_trace;
unsigned long m_left[64], m_right[64];
bool m_aux[64];
int m_addr, m_lastaux, m_offset;
bool m_syncd;
uint64_t m_tickcount;
 
BFLY_TB(void) {
HWBFLY_TB(void) {
Verilated::traceEverOn(true);
m_trace = NULL;
m_bfly = new Vhwbfly;
m_addr = 0;
m_syncd = 0;
m_tickcount = 0;
m_bfly->i_reset = 1;
m_bfly->i_clk = 0;
m_bfly->eval();
m_bfly->i_reset = 0;
}
 
void opentrace(const char *vcdname) {
if (!m_trace) {
m_trace = new VerilatedVcdC;
m_bfly->trace(m_trace, 99);
m_trace->open(vcdname);
}
}
 
void closetrace(void) {
if (m_trace) {
m_trace->close();
delete m_trace;
m_trace = NULL;
}
}
 
void tick(void) {
m_tickcount++;
 
m_lastaux = m_bfly->o_aux;
m_bfly->i_clk = 0;
m_bfly->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2));
m_bfly->i_clk = 1;
m_bfly->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount));
m_bfly->i_clk = 0;
m_bfly->eval();
if (m_trace) {
m_trace->dump((uint64_t)(10ul*m_tickcount+5));
m_trace->flush();
}
 
if ((!m_syncd)&&(m_bfly->o_aux))
m_offset = m_addr;
79,14 → 119,33
m_syncd = (m_syncd) || (m_bfly->o_aux);
}
 
void cetick(void) {
int ce = m_bfly->i_ce, nkce;
 
tick();
 
nkce = (rand()&1);
#ifdef FFT_CKPCE
nkce += FFT_CKPCE;
#endif
 
if ((ce)&&(nkce > 0)) {
m_bfly->i_ce = 0;
for(int kce=0; kce<nkce-1; kce++)
tick();
}
 
m_bfly->i_ce = ce;
}
 
void reset(void) {
m_bfly->i_ce = 0;
m_bfly->i_rst = 1;
m_bfly->i_reset = 1;
m_bfly->i_coef = 0l;
m_bfly->i_left = 0;
m_bfly->i_right = 0;
tick();
m_bfly->i_rst = 0;
m_bfly->i_reset = 0;
m_bfly->i_ce = 1;
//
// Let's run a RESET test here, forcing the whole butterfly
93,18 → 152,18
// to be filled with aux=1. If the reset works right,
// we'll never get an aux=1 output.
//
m_bfly->i_rst = 1;
m_bfly->i_reset = 1;
m_bfly->i_aux = 1;
m_bfly->i_ce = 1;
m_bfly->i_aux = 1;
for(int i=0; i<200; i++)
tick();
cetick();
 
// Now here's the RESET line, so let's see what the test does
m_bfly->i_rst = 1;
m_bfly->i_reset = 1;
m_bfly->i_ce = 1;
m_bfly->i_aux = 1;
tick();
m_bfly->i_rst = 0;
cetick();
m_bfly->i_reset = 0;
m_syncd = 0;
}
 
111,13 → 170,13
void test(const int n, const int k, const unsigned long cof,
const unsigned lft, const unsigned rht, const int aux) {
 
m_bfly->i_coef = cof & (~(-1l << 40));
m_bfly->i_left = lft;
m_bfly->i_right = rht;
m_bfly->i_coef = ubits(cof, 2*TST_BUTTERFLY_CWIDTH);
m_bfly->i_left = ubits(lft, 2*TST_BUTTERFLY_IWIDTH);
m_bfly->i_right = ubits(rht, 2*TST_BUTTERFLY_IWIDTH);
m_bfly->i_aux = aux & 1;
 
m_bfly->i_ce = 1;
tick();
cetick();
 
if ((m_bfly->o_aux)&&(!m_lastaux))
printf("\n");
130,30 → 189,49
m_bfly->o_left,
m_bfly->o_right,
m_bfly->o_aux);
#if (FFT_CKPCE == 1)
printf(", p1 = 0x%08lx p2 = 0x%08lx, p3 = 0x%08lx",
#define rp_one VVAR(_CKPCE_ONE__DOT__rp_one)
#define rp_two VVAR(_CKPCE_ONE__DOT__rp_two)
#define rp_three VVAR(_CKPCE_ONE__DOT__rp_three)
m_bfly->rp_one,
m_bfly->rp_two,
m_bfly->rp_three);
#elif (FFT_CKPCE == 2)
#define rp_one VVAR(_genblk1__DOT__CKPCE_TWO__DOT__rp2_one)
#define rp_two VVAR(_genblk1__DOT__CKPCE_TWO__DOT__rp_two)
#define rp_three VVAR(_genblk1__DOT__CKPCE_TWO__DOT__rp_three)
printf(", p1 = 0x%08lx p2 = 0x%08lx, p3 = 0x%08lx",
m_bfly->rp_one,
m_bfly->rp_two,
m_bfly->rp_three);
#else
printf("CKPCE = %d\n", FFT_CKPCE);
#endif
 
printf("\n");
 
if ((m_syncd)&&(m_left[(m_addr-m_offset)&(64-1)] != m_bfly->o_left)) {
fprintf(stderr, "WRONG O_LEFT! (%lx(exp) != %lx(sut)\n",
printf("WRONG O_LEFT! (%lx(exp) != %lx(sut))\n",
m_left[(m_addr-m_offset)&(64-1)],
m_bfly->o_left);
exit(-1);
exit(EXIT_FAILURE);
}
 
if ((m_syncd)&&(m_right[(m_addr-m_offset)&(64-1)] != m_bfly->o_right)) {
fprintf(stderr, "WRONG O_RIGHT! (%lx(exp) != %lx(sut))\n",
m_right[(m_addr-m_offset)&(64-1)],
m_bfly->o_right);
exit(-1);
printf("WRONG O_RIGHT! (%lx(exp) != %lx(sut))\n",
m_right[(m_addr-m_offset)&(64-1)], m_bfly->o_right);
exit(EXIT_FAILURE);
}
 
if ((m_syncd)&&(m_aux[(m_addr-m_offset)&(64-1)] != m_bfly->o_aux)) {
fprintf(stderr, "FAILED AUX CHANNEL TEST (i.e. the SYNC)\n");
exit(-1);
printf("FAILED AUX CHANNEL TEST (i.e. the SYNC)\n");
exit(EXIT_FAILURE);
}
 
if ((m_addr > 22)&&(!m_syncd)) {
fprintf(stderr, "NO SYNC PULSE!\n");
exit(-1);
printf("NO SYNC PULSE!\n");
exit(EXIT_FAILURE);
}
 
// Now, let's calculate an "expected" result ...
160,20 → 238,20
long rlft, ilft;
 
// Extract left and right values ...
rlft = sbits(m_bfly->i_left >> 16, 16);
ilft = sbits(m_bfly->i_left , 16);
rlft = sbits(m_bfly->i_left >> IWIDTH, IWIDTH);
ilft = sbits(m_bfly->i_left , IWIDTH);
 
// Now repeat for the right hand value ...
long rrht, irht;
// Extract left and right values ...
rrht = sbits(m_bfly->i_right >> 16, 16);
irht = sbits(m_bfly->i_right , 16);
rrht = sbits(m_bfly->i_right >> IWIDTH, IWIDTH);
irht = sbits(m_bfly->i_right , IWIDTH);
 
// and again for the coefficients
long rcof, icof;
// Extract left and right values ...
rcof = sbits(m_bfly->i_coef >> 20, 20);
icof = sbits(m_bfly->i_coef , 20);
rcof = sbits(m_bfly->i_coef >> CWIDTH, CWIDTH);
icof = sbits(m_bfly->i_coef , CWIDTH);
 
// Now, let's do the butterfly ourselves ...
long sumi, sumr, difi, difr;
198,9 → 276,12
p2 = difi * icof;
p3 = (difr + difi) * (rcof + icof);
 
mpyr = p1-p2 + (1<<17);
mpyi = p3-p1-p2 + (1<<17);
mpyr = p1-p2;
mpyi = p3-p1-p2;
 
mpyr = rndbits(mpyr, (IWIDTH+2)+(CWIDTH+1), OWIDTH+4);
mpyi = rndbits(mpyi, (IWIDTH+2)+(CWIDTH+1), OWIDTH+4);
 
/*
printf("RC=%lx, IC=%lx, ", rcof, icof);
printf("P1=%lx,P2=%lx,P3=%lx, ", p1,p2,p3);
211,12 → 292,15
long o_left_r, o_left_i, o_right_r, o_right_i;
unsigned long o_left, o_right;
 
o_left_r = sumr & 0x01ffff; o_left_i = sumi & 0x01ffff;
o_left = (o_left_r << 17) | (o_left_i);
o_left_r = rndbits(sumr<<(CWIDTH-2), CWIDTH+IWIDTH+3, OWIDTH+4);
o_left_r = ubits(o_left_r, OWIDTH);
o_left_i = rndbits(sumi<<(CWIDTH-2), CWIDTH+IWIDTH+3, OWIDTH+4);
o_left_i = ubits(o_left_i, OWIDTH);
o_left = (o_left_r << OWIDTH) | (o_left_i);
 
o_right_r = (mpyr>>18) & 0x01ffff;
o_right_i = (mpyi>>18) & 0x01ffff;
o_right = (o_right_r << 17) | (o_right_i);
o_right_r = ubits(mpyr, OWIDTH);
o_right_i = ubits(mpyi, OWIDTH);
o_right = (o_right_r << OWIDTH) | (o_right_i);
/*
printf("oR_r = %lx, ", o_right_r);
printf("oR_i = %lx\n", o_right_i);
232,7 → 316,7
 
int main(int argc, char **argv, char **envp) {
Verilated::commandArgs(argc, argv);
BFLY_TB *bfly = new BFLY_TB;
HWBFLY_TB *bfly = new HWBFLY_TB;
int16_t ir0, ii0, lstr, lsti;
int32_t sumr, sumi, difr, difi;
int32_t smr, smi, dfr, dfi;
240,6 → 324,8
 
const int TESTSZ = 256;
 
bfly->opentrace("hwbfly.vcd");
 
bfly->reset();
 
bfly->test(9,0,0x4000000000l,0x7fff0000,0x7fff0000, 1);
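
The expected-value model in `hwbfly_tb.cpp` forms the complex product from three real multiplies (`p1`, `p2`, `p3`) rather than four. The hunk above shows `p2`, `p3`, and the combination `mpyr = p1-p2`, `mpyi = p3-p1-p2`. The definition of `p1` falls outside the hunk, so the sketch below assumes it is the product of the two real parts, which is what the identity requires. The `Cplx` type and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical complex integer type used only for this illustration.
struct Cplx {
    int64_t re, im;
};

// Textbook complex multiply: four real multiplies.
static inline Cplx mul4(Cplx a, Cplx b) {
    return { a.re * b.re - a.im * b.im,
             a.re * b.im + a.im * b.re };
}

// Three-multiply form, following the p1/p2/p3 pattern in the test bench.
// Assumption: p1 is the product of the real parts.
static inline Cplx mul3(Cplx a, Cplx b) {
    int64_t p1 = a.re * b.re;
    int64_t p2 = a.im * b.im;
    int64_t p3 = (a.re + a.im) * (b.re + b.im);
    // p3 - p1 - p2 leaves only the two cross terms.
    return { p1 - p2, p3 - p1 - p2 };
}

// True when both forms agree on a given pair of operands.
static inline bool forms_agree(Cplx a, Cplx b) {
    Cplx x = mul4(a, b), y = mul3(a, b);
    assert(x.re == y.re && x.im == y.im);
    return x.re == y.re && x.im == y.im;
}
```

The trade is one multiply for three extra additions, and `p3` needs one more bit on each operand than `p1` or `p2` do.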
/trunk/bench/cpp/laststage_tb.cpp
0,0 → 1,332
////////////////////////////////////////////////////////////////////////////
//
// Filename: laststage_tb.cpp
//
// Project: A Doubletime Pipelined FFT
//
// Purpose: A test-bench for the laststage.v subfile of the general purpose
// pipelined FFT. This file may be run autonomously. If so,
// the last line output will either read "SUCCESS" on success, or some
// other failure message otherwise.
//
// This file depends upon verilator to both compile, run, and therefore
// test laststage.v
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
///////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015,2018 Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
///////////////////////////////////////////////////////////////////////////
#include <stdio.h>
#include <stdint.h>
 
#include "verilated.h"
#include "verilated_vcd_c.h"
#include "Vlaststage.h"
#include "twoc.h"
 
#define IWIDTH 16
#define OWIDTH (IWIDTH+1)
#define SHIFT 0
#define ROUND 1
 
#define ASIZ 32
#define AMSK (ASIZ-1)
 
class LASTSTAGE_TB {
public:
Vlaststage *m_last;
VerilatedVcdC *m_trace;
#ifdef DBLCLKFFT
unsigned long m_left[ASIZ], m_right[ASIZ];
#else
unsigned long m_data[ASIZ];
#endif
bool m_syncd;
int m_addr, m_offset;
unsigned long m_tickcount;
 
LASTSTAGE_TB(void) {
Verilated::traceEverOn(true);
m_trace = NULL; // opentrace() tests this pointer, so it must start out NULL
m_last = new Vlaststage;
m_tickcount = 0;
m_syncd = false; m_addr = 0, m_offset = 0;
}
 
void opentrace(const char *vcdname) {
if (!m_trace) {
m_trace = new VerilatedVcdC;
m_last->trace(m_trace, 99);
m_trace->open(vcdname);
}
}
 
void closetrace(void) {
if (m_trace) {
m_trace->close();
delete m_trace;
m_trace = NULL;
}
}
 
void tick(void) {
m_tickcount++;
 
m_last->i_clk = 0;
m_last->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul * m_tickcount - 2));
m_last->i_clk = 1;
m_last->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul * m_tickcount));
m_last->i_clk = 0;
m_last->eval();
if (m_trace) {
m_trace->dump((uint64_t)(10ul * m_tickcount + 5));
m_trace->flush();
}
m_last->i_reset = 0;
m_last->i_sync = 0;
}
 
void cetick(void) {
int nkce;
 
tick();
nkce = (rand()&1);
#ifdef FFT_CKPCE
nkce += FFT_CKPCE;
#endif
if ((m_last->i_ce)&&(nkce > 0)) {
m_last->i_ce = 0;
for(int kce = 1; kce < nkce; kce++)
tick();
m_last->i_ce = 1;
}
}
 
void reset(void) {
m_last->i_reset = 1;
tick();
 
m_syncd = false; m_addr = 0, m_offset = 0;
}
 
void check_results(void) {
bool failed = false;
 
if ((!m_syncd)&&(m_last->o_sync)) {
m_syncd = true;
m_offset = m_addr;
printf("SYNCD at %d\n", m_addr);
}
 
#ifdef DBLCLKFFT
int ir0, ir1, ii0, ii1, or0, oi0, or1, oi1;
 
ir0 = sbits(m_left[ (m_addr-m_offset)&AMSK]>>IWIDTH, IWIDTH);
ir1 = sbits(m_right[(m_addr-m_offset)&AMSK]>>IWIDTH, IWIDTH);
ii0 = sbits(m_left[ (m_addr-m_offset)&AMSK], IWIDTH);
ii1 = sbits(m_right[(m_addr-m_offset)&AMSK], IWIDTH);
 
 
or0 = sbits(m_last->o_left >> OWIDTH, OWIDTH);
oi0 = sbits(m_last->o_left , OWIDTH);
or1 = sbits(m_last->o_right >> OWIDTH, OWIDTH);
oi1 = sbits(m_last->o_right , OWIDTH);
 
 
// Sign extensions
printf("k=%3d: IN = %08x:%08x, OUT =%09lx:%09lx, S=%d\n",
m_addr, m_last->i_left, m_last->i_right,
m_last->o_left, m_last->o_right,
m_last->o_sync);
 
/*
printf("\tI0 = { %x : %x }, I1 = { %x : %x }, O0 = { %x : %x }, O1 = { %x : %x }\n",
ir0, ii0, ir1, ii1, or0, oi0, or1, oi1);
*/
 
if (m_syncd) {
if (or0 != (ir0 + ir1)) {
printf("FAIL 1: or0 != (ir0+ir1), or %x(exp) != %x(sut)\n", (ir0+ir1), or0);
failed=true;}
if (oi0 != (ii0 + ii1)) {printf("FAIL 2\n"); failed=true;}
if (or1 != (ir0 - ir1)) {printf("FAIL 3\n"); failed=true;}
if (oi1 != (ii0 - ii1)) {printf("FAIL 4\n"); failed=true;}
} else if (m_addr > 20) {
printf("NO SYNC!\n");
failed = true;
}
#else
int or0, oi0;
int sumr, sumi, difr, difi;
int ir0, ii0, ir1, ii1, ir2, ii2, ir3, ii3, irn, iin;
 
irn = sbits(m_data[(m_addr-m_offset+2)&AMSK]>>IWIDTH, IWIDTH);
iin = sbits(m_data[(m_addr-m_offset+2)&AMSK], IWIDTH);
ir0 = sbits(m_data[(m_addr-m_offset+1)&AMSK]>>IWIDTH, IWIDTH);
ii0 = sbits(m_data[(m_addr-m_offset+1)&AMSK], IWIDTH);
ir1 = sbits(m_data[(m_addr-m_offset )&AMSK]>>IWIDTH, IWIDTH);
ii1 = sbits(m_data[(m_addr-m_offset )&AMSK], IWIDTH);
ir2 = sbits(m_data[(m_addr-m_offset-1)&AMSK]>>IWIDTH, IWIDTH);
ii2 = sbits(m_data[(m_addr-m_offset-1)&AMSK], IWIDTH);
ir3 = sbits(m_data[(m_addr-m_offset-2)&AMSK]>>IWIDTH, IWIDTH);
ii3 = sbits(m_data[(m_addr-m_offset-2)&AMSK], IWIDTH);
 
sumr = ir1 + ir0;
sumi = ii1 + ii0;
 
difr = ir2 - ir1;
difi = ii2 - ii1;
 
or0 = sbits(m_last->o_val >> OWIDTH, OWIDTH);
oi0 = sbits(m_last->o_val , OWIDTH);
 
printf("IR0 = %08x, IR1 = %08x, IR2 = %08x, ",
ir0, ir1, ir2);
printf("II0 = %08x, II1 = %08x, II2 = %08x, ",
ii0, ii1, ii2);
// Sign extensions
printf("k=%3d: IN = %08x, %c, OUT =%09lx, S=%d\n",
m_addr, m_last->i_val,
m_last->i_sync ? 'S':' ',
m_last->o_val, m_last->o_sync);
 
 
if ((m_syncd)&&(0 == ((m_addr-m_offset)&1))) {
if (or0 != sumr) {
printf("FAIL 1: or0 != (ir0+ir1), or %x(exp) != %x(sut)\n", sumr, or0);
failed=true;
} if (oi0 != sumi) {
printf("FAIL 2\n");
failed=true;
}
} else if ((m_syncd)&&(1 == ((m_addr-m_offset)&1))) {
if (or0 != difr) {
printf("FAIL 3: or0 != (ir2-ir1), or %x(exp) != %x(sut)\n", difr, or0);
failed=true;
} if (oi0 != difi) {
printf("FAIL 4: oi0 != (ii2-ii1), or %x(exp) != %x(sut)\n", difi, oi0);
failed=true;
}
} else if (m_addr > 20) {
printf("NO SYNC!\n");
failed = true;
}
#endif
if (failed)
exit(-2);
}
 
void sync(void) {
m_last->i_sync = 1;
}
 
void test(unsigned long left, unsigned long right) {
m_last->i_ce = 1;
if (m_last->i_sync)
m_addr = 0;
#ifdef DBLCLKFFT
m_last->i_left = left;
m_last->i_right = right;
 
m_left[ m_addr&AMSK] = m_last->i_left;
m_right[m_addr&AMSK] = m_last->i_right;
m_addr++;
 
cetick();
#else
m_last->i_val = left;
m_data[ m_addr&AMSK] = m_last->i_val;
m_addr = (m_addr+1);
cetick();
 
check_results();
 
m_last->i_val = right;
m_data[m_addr&AMSK] = m_last->i_val;
m_addr = (m_addr+1)&AMSK;
cetick();
#endif
 
check_results();
}
 
void test(int ir0, int ii0, int ir1, int ii1) {
unsigned long left, right, mask = (1<<IWIDTH)-1;
 
left = ((ir0&mask) << IWIDTH) | (ii0 & mask);
right = ((ir1&mask) << IWIDTH) | (ii1 & mask);
test(left, right);
}
};
 
int main(int argc, char **argv, char **envp) {
Verilated::commandArgs(argc, argv);
LASTSTAGE_TB *tb = new LASTSTAGE_TB;
 
tb->opentrace("laststage.vcd");
tb->reset();
 
tb->sync();
 
tb->test( 1, 0,0,0);
tb->test( 0, 2,0,0);
tb->test( 0, 0,4,0);
tb->test( 0, 0,0,8);
 
tb->test( 0, 0,0,0);
 
tb->test(16,16,0,0);
tb->test(0,0,16,16);
tb->test(16,-16,0,0);
tb->test(0,0,16,-16);
tb->test(16,16,0,0);
tb->test(0,0,16,16);
 
for(int k=0; k<64; k++) {
int16_t ir0, ii0, ir1, ii1;
 
// Let's pick some random values, ...
ir0 = rand(); if (ir0&4) ir0 = -ir0;
ii0 = rand(); if (ii0&2) ii0 = -ii0;
ir1 = rand(); if (ir1&1) ir1 = -ir1;
ii1 = rand(); if (ii1&8) ii1 = -ii1;
 
tb->test(ir0, ii0, ir1, ii1);
 
}
 
delete tb;
 
printf("SUCCESS!\n");
exit(0);
}
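
In the single-sample-per-clock branch of `check_results()` above, even output positions are compared against the sum of two consecutive inputs and odd positions against their difference. A minimal software model of that expectation might look like the sketch below. It ignores pipeline delay, sync handling, and bit growth, and the `Sample` type and `last_stage_model` name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical complex sample type for this illustration.
struct Sample {
    int re, im;
};

// For every input pair (a, b), emit a+b followed by a-b.
// This is the radix-2 butterfly with a unit twiddle factor.
static inline std::vector<Sample>
last_stage_model(const std::vector<Sample> &in) {
    assert(in.size() % 2 == 0);
    std::vector<Sample> out;
    out.reserve(in.size());
    for (std::size_t k = 0; k < in.size(); k += 2) {
        out.push_back({ in[k].re + in[k + 1].re,
                        in[k].im + in[k + 1].im });
        out.push_back({ in[k].re - in[k + 1].re,
                        in[k].im - in[k + 1].im });
    }
    return out;
}
```

Because each output is a sum or difference of two `IWIDTH`-bit values, the result needs one extra bit, which matches `OWIDTH` being defined as `IWIDTH+1` in the test bench.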
 
 
 
 
 
 
/trunk/bench/cpp/mpy_tb.cpp
2,7 → 2,7
//
// Filename: mpy_tb.cpp
//
// Project: A Doubletime Pipelined FFT
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: A test-bench for the shift and add shiftaddmpy.v subfile of
// the double clocked FFT. This file may be run autonomously.
17,7 → 17,7
//
///////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015, Gisselquist Technology, LLC
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
39,7 → 39,11
//
//
///////////////////////////////////////////////////////////////////////////
#include "verilated.h"
#include "verilated_vcd_c.h"
 
#include "fftsize.h"
 
#ifdef USE_OLD_MULTIPLY
#include "Vshiftaddmpy.h"
typedef Vshiftaddmpy Vmpy;
54,17 → 58,20
#define DELAY ((AW/2)+(AW&1)+2)
#endif
 
#include "verilated.h"
#include "twoc.h"
 
class MPYTB {
public:
Vmpy *mpy;
long vals[32];
int m_addr;
Vmpy *m_mpy;
VerilatedVcdC *m_trace;
long vals[32];
int m_addr;
uint64_t m_tickcount;
 
MPYTB(void) {
mpy = new Vmpy;
Verilated::traceEverOn(true);
m_trace = NULL; // opentrace() tests this pointer, so it must start out NULL
m_mpy = new Vmpy;
m_tickcount = 0;
 
for(int i=0; i<32; i++)
vals[i] = 0;
71,24 → 78,68
m_addr = 0;
}
~MPYTB(void) {
delete mpy;
closetrace();
delete m_mpy;
}
 
void tick(void) {
mpy->i_clk = 0;
mpy->eval();
mpy->i_clk = 1;
mpy->eval();
void opentrace(const char *vcdname) {
if (!m_trace) {
m_trace = new VerilatedVcdC;
m_mpy->trace(m_trace, 99);
m_trace->open(vcdname);
}
}
 
void closetrace(void) {
if (m_trace) {
m_trace->close();
delete m_trace;
m_trace = NULL;
}
}
 
void tick(void) {
m_tickcount++;
 
m_mpy->i_clk = 0;
m_mpy->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2));
m_mpy->i_clk = 1;
m_mpy->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount));
m_mpy->i_clk = 0;
m_mpy->eval();
if (m_trace) {
m_trace->dump((uint64_t)(10ul*m_tickcount+5));
m_trace->flush();
}
}
 
void cetick(void) {
int ce = m_mpy->i_ce, nkce;
 
tick();
nkce = (rand()&1);
#ifdef FFT_CKPCE
nkce += FFT_CKPCE;
#endif
if ((ce)&&(nkce>0)) {
m_mpy->i_ce = 0;
for(int kce=1; kce<nkce; kce++)
tick();
}
 
m_mpy->i_ce = ce;
}
 
void reset(void) {
mpy->i_clk = 0;
mpy->i_ce = 1;
mpy->i_a = 0;
mpy->i_b = 0;
m_mpy->i_clk = 0;
m_mpy->i_ce = 1;
m_mpy->i_a_unsorted = 0;
m_mpy->i_b_unsorted = 0;
 
for(int k=0; k<20; k++)
tick();
cetick();
}
 
bool test(const int ia, const int ib) {
97,23 → 148,21
 
a = sbits(ia, AW);
b = sbits(ib, BW);
mpy->i_ce = 1;
mpy->i_a = ubits(a, AW);
mpy->i_b = ubits(b, BW);
m_mpy->i_ce = 1;
m_mpy->i_a_unsorted = ubits(a, AW);
m_mpy->i_b_unsorted = ubits(b, BW);
 
vals[m_addr&31] = a * b;
 
tick();
if (rand()&1) {
mpy->i_ce = 0;
tick();
}
cetick();
 
printf("k=%3d: A =%04x, B =%05x -> O = %9lx (ANS=%10lx)\n",
m_addr, (int)ubits(a,AW), (int)ubits(b,BW),
(long)mpy->o_r, ubits(vals[m_addr&31], AW+BW+4));
printf("k=%3d: A =%0*x, B =%0*x -> O = %*lx (ANS=%*lx)\n",
m_addr, (AW+3)/4, (int)ubits(a,AW),
(BW+3)/4, (int)ubits(b,BW),
(AW+BW+3)/4, (long)m_mpy->o_r,
(AW+BW+7)/4, ubits(vals[m_addr&31], AW+BW+4));
 
out = sbits(mpy->o_r, AW+BW);
out = sbits(m_mpy->o_r, AW+BW);
 
m_addr++;
 
131,6 → 180,7
Verilated::commandArgs(argc, argv);
MPYTB *tb = new MPYTB;
 
tb->opentrace("mpy.vcd");
tb->reset();
 
for(int k=0; k<15; k++) {
149,10 → 199,16
tb->test(a, b);
}
 
for(int k=0; k<2048; k++) {
int a, b, out;
 
tb->test(rand(), rand());
if (AW+BW <= 20) {
// Exhaustive test
for(int a=0; a< (1<<AW); a++)
for(int b=0; b< (1<<BW); b++)
tb->test(a, b);
printf("Exhaust complete\n");
} else {
// Pseudorandom test
for(int k=0; k<2048; k++)
tb->test(rand(), rand());
}
 
delete tb;
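
Each test bench in this revision gains a `cetick()` method that clocks the design once with `i_ce` high and then holds `i_ce` low for the remaining clocks of the sample period. The sketch below reproduces that pattern against a mock design so the clock-to-CE ratio can be counted. `MockDut` is a hypothetical stand-in for a Verilated model, and the `clocks_per_ce` parameter replaces the `FFT_CKPCE` macro and the random extra clock used in some of the originals.

```cpp
#include <cassert>

// Hypothetical stand-in for a Verilated model: counts rising clock
// edges, and how many of them arrived with i_ce asserted.
struct MockDut {
    int i_clk = 0, i_ce = 0;
    int last_clk = 0;
    int posedges = 0, ce_posedges = 0;

    void eval() {
        if (i_clk && !last_clk) {
            posedges++;
            if (i_ce)
                ce_posedges++;
        }
        last_clk = i_clk;
    }
};

// One full clock period: settle low, rising edge, falling edge.
static inline void tick(MockDut &d) {
    d.i_clk = 0; d.eval();
    d.i_clk = 1; d.eval();
    d.i_clk = 0; d.eval();
}

// One sample period: a single clock with CE as given, followed by
// (clocks_per_ce - 1) idle clocks with CE forced low.
static inline void cetick(MockDut &d, int clocks_per_ce) {
    assert(clocks_per_ce >= 0);
    int ce = d.i_ce;

    tick(d);
    if (ce && clocks_per_ce > 0) {
        d.i_ce = 0;
        for (int k = 1; k < clocks_per_ce; k++)
            tick(d);
    }
    d.i_ce = ce; // restore the caller's CE for the next sample
}
```

Driving ten samples with `clocks_per_ce` set to 3 yields thirty rising edges, ten of them with CE asserted, which is the spacing the two-multiply and one-multiply butterfly modes rely on.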
/trunk/bench/cpp/qtrstage_tb.cpp
6,11 → 6,11
//
// Purpose: A test-bench for the qtrstage.v subfile of the double
// clocked FFT. This file may be run autonomously. If so,
// the last line output will either read "SUCCESS" on success,
// or some other failure message otherwise.
// the last line output will either read "SUCCESS" on success, or some
// other failure message otherwise.
//
// This file depends upon verilator to both compile, run, and
// therefore test qtrstage.v
// This file depends upon verilator to both compile, run, and therefore
// test qtrstage.v
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
17,7 → 17,7
//
///////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015, Gisselquist Technology, LLC
// Copyright (C) 2015,2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
42,8 → 42,9
#include <stdio.h>
#include <stdint.h>
 
#include "verilated.h"
#include "verilated_vcd_c.h"
#include "Vqtrstage.h"
#include "verilated.h"
#include "twoc.h"
#include "fftsize.h"
 
72,30 → 73,78
class QTRTEST_TB {
public:
Vqtrstage *m_qstage;
unsigned long m_data[ASIZ];
VerilatedVcdC *m_trace;
unsigned long m_data[ASIZ], m_tickcount;
int m_addr, m_offset;
bool m_syncd;
 
QTRTEST_TB(void) {
Verilated::traceEverOn(true);
m_trace = NULL;
m_qstage = new Vqtrstage;
m_addr = 0; m_offset = 6; m_syncd = false;
m_addr = 0;
m_offset = 6;
m_syncd = false;
m_tickcount = 0;
}
 
void opentrace(const char *vcdname) {
if (!m_trace) {
m_trace = new VerilatedVcdC;
m_qstage->trace(m_trace, 99);
m_trace->open(vcdname);
}
}
 
void closetrace(void) {
if (m_trace) {
m_trace->close();
delete m_trace;
m_trace = NULL;
}
}
 
void tick(void) {
m_tickcount++;
 
m_qstage->i_clk = 0;
m_qstage->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2));
m_qstage->i_clk = 1;
m_qstage->eval();
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount));
m_qstage->i_clk = 0;
m_qstage->eval();
if (m_trace) {
m_trace->dump((uint64_t)(10ul*m_tickcount+5));
m_trace->flush();
}
 
m_qstage->i_sync = 0;
}
 
void cetick(void) {
int nkce;
 
tick();
nkce = (rand()&1);
#ifdef FFT_CKPCE
nkce += FFT_CKPCE;
#endif
if ((m_qstage->i_ce)&&(nkce>0)) {
m_qstage->i_ce = 0;
for(int kce = 1; kce < nkce; kce++)
tick();
m_qstage->i_ce = 1;
}
}
 
void reset(void) {
m_qstage->i_ce = 0;
m_qstage->i_rst = 1;
m_qstage->i_reset = 1;
tick();
m_qstage->i_ce = 0;
m_qstage->i_rst = 0;
m_qstage->i_reset = 0;
tick();
 
m_addr = 0; m_offset = 6; m_syncd = false;
102,18 → 151,22
}
 
void check_results(void) {
int ir0, ii0, ir1, ii1, ir2, ii2;
int sumr, sumi, difr, difi, or0, oi0;
bool fail = false;
 
if ((!m_syncd)&&(m_qstage->o_sync)) {
m_syncd = true;
assert(m_addr == m_offset);
m_offset = m_addr;
printf("VALID-SYNC!!\n");
}
 
if (!m_syncd)
return;
 
#ifdef DBLCLKFFT
int ir0, ii0, ir1, ii1, ir2, ii2;
 
ir0 = sbits(m_data[(m_addr-m_offset-1)&AMSK]>>IWIDTH, IWIDTH);
ii0 = sbits(m_data[(m_addr-m_offset-1)&AMSK], IWIDTH);
ir1 = sbits(m_data[(m_addr-m_offset )&AMSK]>>IWIDTH, IWIDTH);
140,11 → 193,49
if (oi0 != difi) {
printf("FAIL 4: oi0 != difi (%x(exp) != %x(sut))\n", difi, oi0); fail = true;}
}
#else
int locn = (m_addr-m_offset)&AMSK;
int ir1, ii1, ir3, ii3, ir5, ii5;
 
if (m_qstage->o_sync != ((((m_addr-m_offset)&127) == 0)?1:0)) {
printf("BAD O-SYNC, m_addr = %d, m_offset = %d\n", m_addr, m_offset); fail = true;
ir5 = sbits(m_data[(m_addr-m_offset-2)&AMSK]>>IWIDTH, IWIDTH);
ii5 = sbits(m_data[(m_addr-m_offset-2)&AMSK], IWIDTH);
ir3 = sbits(m_data[(m_addr-m_offset )&AMSK]>>IWIDTH, IWIDTH);
ii3 = sbits(m_data[(m_addr-m_offset )&AMSK], IWIDTH);
ir1 = sbits(m_data[(m_addr-m_offset+2)&AMSK]>>IWIDTH, IWIDTH);
ii1 = sbits(m_data[(m_addr-m_offset+2)&AMSK], IWIDTH);
 
sumr = ir3 + ir1;
sumi = ii3 + ii1;
difr = ir5 - ir3;
difi = ii5 - ii3;
 
or0 = sbits(m_qstage->o_data >> OWIDTH, OWIDTH);
oi0 = sbits(m_qstage->o_data, OWIDTH);
 
if (0==((locn)&2)) {
if (or0 != sumr) {
printf("FAIL 1: or0 != sumr (%x(exp) != %x(sut))\n", sumr, or0); fail = true;
}
if (oi0 != sumi) {
printf("FAIL 2: oi0 != sumi (%x(exp) != %x(sut))\n", sumi, oi0); fail = true;}
} else if (2==((m_addr-m_offset)&3)) {
if (or0 != difr) {
printf("FAIL 3: or0 != difr (%x(exp) != %x(sut))\n", difr, or0); fail = true;}
if (oi0 != difi) {
printf("FAIL 4: oi0 != difi (%x(exp) != %x(sut))\n", difi, oi0); fail = true;}
} else if (3==((m_addr-m_offset)&3)) {
if (or0 != difi) {
printf("FAIL 5: or0 != difi (%x(exp) != %x(sut))\n", difi, or0); fail = true;}
if (oi0 != -difr) {
printf("FAIL 6: oi0 != -difr (%x(exp) != %x(sut))\n", -difr, oi0); fail = true;}
}
 
// if (m_qstage->o_sync != ((((m_addr-m_offset)&127) == 0)?1:0)) {
// printf("BAD O-SYNC, m_addr = %d, m_offset = %d\n", m_addr, m_offset); fail = true;
// }
#endif
 
 
if (fail)
exit(-1);
}
159,6 → 250,7
m_qstage->i_ce = 1;
m_qstage->i_data = data;
// m_qstage->i_sync = (((m_addr&127)==2)?1:0);
// printf("DATA[%08x] = %08x ... ", m_addr, data);
m_data[ (m_addr++)&AMSK] = data;
tick();
 
172,7 → 264,11
m_qstage->diff_i,
m_qstage->pipeline,
m_qstage->iaddr,
#ifdef DBLCLKFFT
m_qstage->imem,
#else
m_qstage->imem[1],
#endif
m_qstage->wait_for_sync);
 
check_results();
202,17 → 298,57
int16_t ir0, ii0, ir1, ii1, ir2, ii2;
int32_t sumr, sumi, difr, difi;
 
tb->opentrace("qtrstage.vcd");
tb->reset();
 
tb->test( 16, 0);
tb->test( 16, 0);
tb->sync();
 
tb->test( 8, 0);
tb->test( 0, 0);
tb->test( 0, 0);
tb->test( 0, 0);
 
tb->test( 0, 4);
tb->test( 0, 0);
tb->test( 0, 0);
tb->test( 0, 0);
 
tb->test( 0, 0);
tb->test( 32, 0);
tb->test( 0, 0);
tb->test( 0, 0);
 
tb->test( 0, 0);
tb->test( 0, 64);
tb->test( 0, 0);
tb->test( 0, 0);
 
tb->test( 0, 0);
tb->test( 0, 0);
tb->test(128, 0);
tb->test( 0, 0);
 
tb->test( 0, 0);
tb->test( 0, 0);
tb->test( 0,256);
tb->test( 0, 0);
 
tb->test( 0, 0);
tb->test( 0, 0);
tb->test( 0, 0);
tb->test( 2, 0);
 
tb->test( 0, 0);
tb->test( 0, 0);
tb->test( 0, 0);
tb->test( 0, 1);
 
tb->test( 0, 16);
tb->test( 0, 16);
tb->test( 16, 0);
tb->test(-16, 0);
tb->test( 0, 16);
tb->test( 0,-16);
 
for(int k=0; k<1060; k++) {
tb->random_test();
/trunk/bench/formal/.gitignore
0,0 → 1,15
bitreverse
laststage
qtrstage
hwbfly_one
hwbfly_two
hwbfly_three
butterfly_one
butterfly_two
butterfly_three
butterfly_ck1
butterfly_ck2_r0
butterfly_ck2_r1
butterfly_ck3_r0
butterfly_ck3_r1
butterfly_ck3_r2
/trunk/bench/formal/README.md
0,0 → 1,18
This directory contains several SymbiYosys scripts useful for
formally verifying parts and pieces of the design. Admittedly,
the entire design has yet to be formally verified; however, many
components have been verified successfully. These include:
 
- The butterflies, both the hardware enabled butterflies and the
soft multiplies.
 
- The penultimate (4-pt) stage of the FFT
 
- The final stage (2-pt) of the FFT
 
- The bitreverse
 
My intention is not to place formal properties into the repository.
Within the [defaults.h](../../sw/defaults.h) there's a
``formal_property_flag`` used for controlling whether or not the
formal properties are included into the RTL files.
/trunk/bench/formal/abs_mpy.v
0,0 → 1,114
////////////////////////////////////////////////////////////////////////////////
//
// Filename: abs_mpy.v
//
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core
//
// Purpose: This code has been modified from the mpyop.v file so as to
// abstract the multiply that formal methods struggle so hard to
// deal with. It also simplifies the interface so that (if enabled)
// the multiply will return in 1-6 clocks, rather than the specified
// number for the given architecture.
//
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory. Run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module abs_mpy(i_a, i_b, o_result);
parameter AW = 32, BW=32;
parameter [0:0] OPT_SIGNED = 1'b1;
input wire [(AW-1):0] i_a;
input wire [(BW-1):0] i_b;
output wire [(AW+BW-1):0] o_result;
 
wire [(AW+BW-1):0] any_result;
assign any_result = $anyseq;
 
reg [AW-1:0] u_a;
reg [BW-1:0] u_b;
 
always @(*)
begin
u_a = ((i_a[AW-1])&&(OPT_SIGNED)) ? -i_a : i_a;
u_b = ((i_b[BW-1])&&(OPT_SIGNED)) ? -i_b : i_b;
end
 
reg [(AW+BW-1):0] u_result;
always @(*)
if ((OPT_SIGNED)&&(any_result[AW+BW-1]))
u_result = - { 1'b1, any_result };
else
u_result = { 1'b0, any_result };
 
always @(*)
begin
// Constrain our result among many possibilities
if ((i_a == 0)||(i_b == 0))
assume(any_result == 0);
else if (OPT_SIGNED)
assume(any_result[AW+BW-1]
== (i_a[AW-1] ^ i_b[BW-1]));
 
assume(u_result[AW+BW-1:BW] <= u_a);
assume(u_result[AW+BW-1:AW] <= u_b);
end
 
genvar k;
generate
begin
for(k=0; k<AW-1; k=k+1)
begin
always @(*)
if (u_a == (1<<k))
assume(u_result == (u_b << k));
end
 
for(k=0; k<BW; k=k+1)
begin
always @(*)
if (u_b == (1<<k))
assume(u_result== (u_a << k));
end
 
end endgenerate
 
assign o_result = any_result;
 
/*
always @(*)
if (i_a == 1)
assert(o_result == {{(AW){i_b[BW-1]}}, i_b });
 
always @(*)
if (i_b == 1)
assert(o_result == {{(BW){i_a[AW-1]}}, i_a });
*/
endmodule
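
The header of abs_mpy.v says the multiply is abstracted for the formal tools. One way to convince yourself that such an abstraction is sound is to check, in software, that the true product always satisfies the `assume()` constraints. The sketch below does that for small widths. The helper names `abs_mpy_allows` and `true_product_always_allowed` are hypothetical and not part of the repository, and the model is my reading of the constraints above rather than a verified equivalent.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of the repository): returns true when a
// candidate signed result r passes the assume() constraints that abs_mpy
// places on any_result, for signed inputs a (AW bits) and b (BW bits).
bool abs_mpy_allows(int a, int b, int r, unsigned AW, unsigned BW) {
    assert(AW >= 2 && BW >= 2 && AW + BW <= 30);
    const unsigned ua = (unsigned)std::abs(a);  // u_a in the Verilog
    const unsigned ub = (unsigned)std::abs(b);  // u_b
    const unsigned ur = (unsigned)std::abs(r);  // u_result

    if (a == 0 || b == 0) {
        if (r != 0) return false;               // zero in, zero out
    } else if ((r < 0) != ((a < 0) != (b < 0))) {
        return false;                           // sign bit must match
    }
    // Magnitude bounds from the two part-select comparisons
    if ((ur >> BW) > ua) return false;
    if ((ur >> AW) > ub) return false;
    // Exact results whenever one magnitude is a power of two
    for (unsigned k = 0; k + 1 < AW; k++)
        if (ua == (1u << k) && ur != (ub << k)) return false;
    for (unsigned k = 0; k < BW; k++)
        if (ub == (1u << k) && ur != (ua << k)) return false;
    return true;
}

// Exhaustive check: the true product is always one of the allowed values.
bool true_product_always_allowed(unsigned AW, unsigned BW) {
    const int amax = 1 << (AW - 1), bmax = 1 << (BW - 1);
    for (int a = -amax; a < amax; a++)
        for (int b = -bmax; b < bmax; b++)
            if (!abs_mpy_allows(a, b, a * b, AW, BW)) return false;
    return true;
}
```

Calling `true_product_always_allowed(4, 4)` walks every pair of 4-bit signed inputs. The constraints are deliberately loose: under this model `abs_mpy_allows(3, 5, 14, 4, 4)` also passes, which is the point of the abstraction, since the solver may pick any value that fits.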
/trunk/bench/formal/bitreverse.sby
0,0 → 1,13
[options]
mode prove
depth 12
 
[engines]
smtbmc
 
[script]
read_verilog -formal -DBITREVERSE bitreverse.v
prep -top bitreverse
 
[files]
../../rtl/bitreverse.v
/trunk/bench/formal/butterfly.sby
0,0 → 1,42
[tasks]
ck1
ck2_r0
ck2_r1
ck3_r0
ck3_r1
ck3_r2
 
[options]
mode prove
depth 30
 
[engines]
smtbmc
 
[script]
read_verilog -formal -DHWBFLY abs_mpy.v
read_verilog -formal -DHWBFLY convround.v
read_verilog -formal -DHWBFLY longbimpy.v
read_verilog -formal -DHWBFLY bimpy.v
read_verilog -formal -DHWBFLY butterfly.v
 
# While I'd love to change the width of the inputs and the coefficients,
# doing so would adjust the width of the firmware multiplies, and so defeat
# our purpose here.
# ck1: chparam -set CKPCE 1 butterfly
ck1: chparam -set CKPCE 1 -set CWIDTH 19 -set IWIDTH 15 butterfly
#
ck2_r0: chparam -set CKPCE 2 -set CWIDTH 20 -set IWIDTH 12 -set F_CHECK 1 butterfly
ck2_r1: chparam -set CKPCE 2 -set CWIDTH 16 -set IWIDTH 6 -set F_CHECK 0 butterfly
#
ck3_r0: chparam -set CKPCE 3 -set CWIDTH 16 -set IWIDTH 12 -set F_CHECK 0 butterfly
ck3_r1: chparam -set CKPCE 3 -set CWIDTH 18 -set IWIDTH 14 -set F_CHECK 1 butterfly
ck3_r2: chparam -set CKPCE 3 -set CWIDTH 20 -set IWIDTH 16 -set F_CHECK 2 butterfly
 
prep -top butterfly
 
[files]
abs_mpy.v
../../rtl/convround.v
../../rtl/bimpy.v
../../rtl/longbimpy.v
../../rtl/butterfly.v
/trunk/bench/formal/hwbfly.sby
0,0 → 1,27
[tasks]
one
two
three
 
[options]
mode prove
depth 23
 
[engines]
smtbmc
 
[script]
read_verilog -formal -DHWBFLY abs_mpy.v
read_verilog -formal -DHWBFLY convround.v
read_verilog -formal -DHWBFLY hwbfly.v
 
one: chparam -set CKPCE 1 -set IWIDTH 4 -set CWIDTH 6 hwbfly
two: chparam -set CKPCE 2 -set IWIDTH 4 -set CWIDTH 6 hwbfly
three: chparam -set CKPCE 3 -set IWIDTH 4 -set CWIDTH 6 hwbfly
 
prep -top hwbfly
 
[files]
abs_mpy.v
../../rtl/convround.v
../../rtl/hwbfly.v
/trunk/bench/formal/laststage.sby
0,0 → 1,16
[options]
mode prove
depth 20
 
[engines]
smtbmc yices
 
[script]
read_verilog -formal -DLASTSTAGE convround.v
read_verilog -formal -DLASTSTAGE laststage.v
chparam -set IWIDTH 3 -set OWIDTH 4 laststage
prep -top laststage
 
[files]
../../rtl/laststage.v
../../rtl/convround.v
/trunk/bench/formal/qtrstage.sby
0,0 → 1,16
[options]
mode prove
depth 20
 
[engines]
smtbmc boolector
 
[script]
read_verilog -formal -DQTRSTAGE convround.v
read_verilog -formal -DQTRSTAGE qtrstage.v
chparam -set IWIDTH 3 -set OWIDTH 4 qtrstage
prep -top qtrstage
 
[files]
../../rtl/qtrstage.v
../../rtl/convround.v
/trunk/rtl/README.md
0,0 → 1,28
This directory contains a demonstration FFT design.
 
Should you wish to use this core, I would recommend you run `fftgen` from the
[sw](../sw) directory to create an FFT tailored to your own needs.
 
In sum, from the top down, the modules are:
 
- [fftmain](fftmain.v) is the top level FFT file.
 
- [fftstage](fftstage.v) calculates one FFT stage
 
- [hwbfly](hwbfly.v) implements a butterfly that uses the `*` operator
for its multiply
- [butterfly](butterfly.v) implements a butterfly that uses a logic
multiply at the cost of more logic and a greater delay.
 
- [longbimpy](longbimpy.v) is the logic binary multiply.
- [bimpy](bimpy.v) multiplies a small set of bits together. It is a
component of [longbimpy](longbimpy.v)
 
- [qtrstage](qtrstage.v) is the 4-pt stage of the FFT
 
- [laststage](laststage.v) is the 2-pt stage of the FFT
 
- [bitreverse](bitreverse.v), the final step in the FFT, bit-reverses
the outgoing data.
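
For orientation, here is a minimal C++ sketch of what one butterfly computes, following the description in the header of butterfly.v: `L' = L + R` and `R' = (L - R)*C`, with the complex product formed from three real multiplies. The names `Cplx`, `Bfly`, `mul3`, and `butterfly_model` are hypothetical, and the sketch ignores the rounding, bit-width trimming, and pipeline delays that the RTL handles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only
struct Cplx { int64_t re, im; };
struct Bfly { Cplx left, right; };

// Complex product (a + jb)(c + jd) using three real multiplies:
//   P1 = ac, P2 = bd, P3 = (a + b)(c + d)
//   real = P1 - P2, imag = P3 - P2 - P1
Cplx mul3(Cplx x, Cplx c) {
    const int64_t p1 = x.re * c.re;
    const int64_t p2 = x.im * c.im;
    const int64_t p3 = (x.re + x.im) * (c.re + c.im);
    return { p1 - p2, p3 - p2 - p1 };
}

// Decimation-in-frequency butterfly: L' = L + R, R' = (L - R) * C
Bfly butterfly_model(Cplx l, Cplx r, Cplx coef) {
    const Cplx sum = { l.re + r.re, l.im + r.im };
    const Cplx dif = { l.re - r.re, l.im - r.im };
    return { sum, mul3(dif, coef) };
}

// Small self-check against the four-multiply form
bool mul3_matches_mul4(Cplx x, Cplx c) {
    const Cplx m = mul3(x, c);
    return m.re == x.re * c.re - x.im * c.im
        && m.im == x.re * c.im + x.im * c.re;
}
```

As a worked case, `mul3({3, 2}, {5, 4})` gives P1 = 15, P2 = 8, P3 = 45, so the result is 7 + 22j, the same as the four-multiply product.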
 
 
/trunk/rtl/bimpy.v
0,0 → 1,72
////////////////////////////////////////////////////////////////////////////////
//
// Filename: ../rtl/bimpy.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: A simple 2-bit multiply based upon the fact that LUT's allow
// 6-bits of input. In other words, I could build a 3-bit
// multiply from 6 LUTs (5 actually, since the first could have two
// outputs). This would allow multiplication of three bit digits, save
// only for the fact that you would need two bits of carry. The bimpy
// approach throttles back a bit and does a 2x2 bit multiply in a LUT,
// guaranteeing that it will never carry more than one bit. While this
// multiply is hardware independent (and can still run under Verilator
// therefore), it is really motivated by trying to optimize for a
// specific piece of hardware (Xilinx-7 series ...) that has at least
// 4-input LUT's with carry chains.
//
//
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module bimpy(i_clk, i_ce, i_a, i_b, o_r);
parameter BW=18, // Number of bits in i_b
LUTB=2; // Number of bits in i_a for our LUT multiply
input i_clk, i_ce;
input [(LUTB-1):0] i_a;
input [(BW-1):0] i_b;
output reg [(BW+LUTB-1):0] o_r;
 
wire [(BW+LUTB-2):0] w_r;
wire [(BW+LUTB-3):1] c;
 
assign w_r = { ((i_a[1])?i_b:{(BW){1'b0}}), 1'b0 }
^ { 1'b0, ((i_a[0])?i_b:{(BW){1'b0}}) };
assign c = { ((i_a[1])?i_b[(BW-2):0]:{(BW-1){1'b0}}) }
& ((i_a[0])?i_b[(BW-1):1]:{(BW-1){1'b0}});
 
always @(posedge i_clk)
if (i_ce)
o_r <= w_r + { c, 2'b0 };
 
endmodule
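
The XOR/AND arrangement in bimpy can be checked in software. The sketch below models only the combinational part for `LUTB == 2` (the registered output and `i_ce` are left out) and compares it with an ordinary multiply. The function names `bimpy_model` and `bimpy_matches_product` are hypothetical, chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Software model of the combinational logic in bimpy, assuming LUTB == 2.
// a is the 2-bit operand (i_a), b is the BW-bit operand (i_b).
uint32_t bimpy_model(uint32_t a, uint32_t b, unsigned BW = 18) {
    assert(BW >= 2 && BW <= 29 && a < 4 && (b >> BW) == 0);
    const uint32_t hi = (a & 2) ? b : 0;        // i_a[1] ? i_b : 0
    const uint32_t lo = (a & 1) ? b : 0;        // i_a[0] ? i_b : 0
    const uint32_t w_r = (hi << 1) ^ lo;        // sum bits, carries ignored
    const uint32_t mask = (1u << (BW - 1)) - 1; // selects bits [BW-2:0]
    const uint32_t c = (hi & mask) & (lo >> 1); // carry bits
    return w_r + (c << 2);                      // o_r <= w_r + { c, 2'b0 }
}

// Exhaustive comparison against a plain multiply for one value of BW
bool bimpy_matches_product(unsigned BW) {
    for (uint32_t a = 0; a < 4; a++)
        for (uint32_t b = 0; b < (1u << BW); b++)
            if (bimpy_model(a, b, BW) != a * b) return false;
    return true;
}
```

The reason this works is the identity `x + y == (x ^ y) + 2*(x & y)` applied to `x = 2*hi` and `y = lo`: the AND term in the model is that carry shifted down by one, so it is shifted left by two when added back.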
/trunk/rtl/bitreverse.v
0,0 → 1,325
////////////////////////////////////////////////////////////////////////////////
//
// Filename: bitreverse.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This module bitreverses a pipelined FFT input. Operation is
// expected as follows:
//
// i_clk A running clock at whatever system speed is offered.
// i_reset A synchronous reset signal, that resets all internals
// i_ce If this is one, one input is consumed and an output
// is produced.
// i_in_0, i_in_1
// Two inputs to be consumed, each of width WIDTH.
// o_out_0, o_out_1
// Two of the bitreversed outputs, also of the same
// width, WIDTH. Of course, there is a delay from the
// first input to the first output. For this purpose,
// o_sync is present.
// o_sync This will be a 1'b1 for the first value in any block.
// Following a reset, this will only become 1'b1 once
// the data has been loaded and is now valid. After that,
// all outputs will be valid.
//
// 20150602 -- This module has undergone massive rework in order to
// ensure that it uses resources efficiently. As a result,
// it now optimizes nicely into block RAMs. As an unfortunate
// side effect, it now passes its bench test (dblrev_tb) but
// fails the integration bench test (fft_tb).
//
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
 
 
//
// How do we do bit reversing at two samples per clock? Can we separate out
// our work into eight memory banks, writing two banks at once and reading
// another two banks in the same clock?
//
// mem[00xxx0] = s_0[n]
// mem[00xxx1] = s_1[n]
// o_0[n] = mem[10xxx0]
// o_1[n] = mem[11xxx0]
// ...
// mem[01xxx0] = s_0[m]
// mem[01xxx1] = s_1[m]
// o_0[m] = mem[10xxx1]
// o_1[m] = mem[11xxx1]
// ...
// mem[10xxx0] = s_0[n]
// mem[10xxx1] = s_1[n]
// o_0[n] = mem[00xxx0]
// o_1[n] = mem[01xxx0]
// ...
// mem[11xxx0] = s_0[m]
// mem[11xxx1] = s_1[m]
// o_0[m] = mem[00xxx1]
// o_1[m] = mem[01xxx1]
// ...
//
// The answer is that, yes we can but: we need to use four memory banks
// to do it properly. These four banks are defined by the two bits
// that determine the top and bottom of the correct address. Larger
// FFT's would require more memories.
//
//
module bitreverse(i_clk, i_reset, i_ce, i_in_0, i_in_1,
o_out_0, o_out_1, o_sync);
parameter LGSIZE=5, WIDTH=24;
input i_clk, i_reset, i_ce;
input [(2*WIDTH-1):0] i_in_0, i_in_1;
output wire [(2*WIDTH-1):0] o_out_0, o_out_1;
output reg o_sync;
 
reg in_reset;
reg [(LGSIZE-1):0] iaddr;
wire [(LGSIZE-3):0] braddr;
 
genvar k;
generate for(k=0; k<LGSIZE-2; k=k+1)
begin : gen_a_bit_reversed_value
assign braddr[k] = iaddr[LGSIZE-3-k];
end endgenerate
 
initial iaddr = 0;
initial in_reset = 1'b1;
initial o_sync = 1'b0;
always @(posedge i_clk)
if (i_reset)
begin
iaddr <= 0;
in_reset <= 1'b1;
o_sync <= 1'b0;
end else if (i_ce)
begin
iaddr <= iaddr + { {(LGSIZE-1){1'b0}}, 1'b1 };
if (&iaddr[(LGSIZE-2):0])
in_reset <= 1'b0;
if (in_reset)
o_sync <= 1'b0;
else
o_sync <= ~(|iaddr[(LGSIZE-2):0]);
end
 
reg [(2*WIDTH-1):0] mem_e [0:((1<<(LGSIZE))-1)];
reg [(2*WIDTH-1):0] mem_o [0:((1<<(LGSIZE))-1)];
 
always @(posedge i_clk)
if (i_ce) mem_e[iaddr] <= i_in_0;
always @(posedge i_clk)
if (i_ce) mem_o[iaddr] <= i_in_1;
 
 
reg [(2*WIDTH-1):0] evn_out_0, evn_out_1, odd_out_0, odd_out_1;
 
always @(posedge i_clk)
if (i_ce)
evn_out_0 <= mem_e[{!iaddr[LGSIZE-1],1'b0,braddr}];
always @(posedge i_clk)
if (i_ce)
evn_out_1 <= mem_e[{!iaddr[LGSIZE-1],1'b1,braddr}];
always @(posedge i_clk)
if (i_ce)
odd_out_0 <= mem_o[{!iaddr[LGSIZE-1],1'b0,braddr}];
always @(posedge i_clk)
if (i_ce)
odd_out_1 <= mem_o[{!iaddr[LGSIZE-1],1'b1,braddr}];
 
reg adrz;
always @(posedge i_clk)
if (i_ce) adrz <= iaddr[LGSIZE-2];
 
assign o_out_0 = (adrz)?odd_out_0:evn_out_0;
assign o_out_1 = (adrz)?odd_out_1:evn_out_1;
 
`ifdef FORMAL
`ifdef BITREVERSE
`define ASSUME assume
`define ASSERT assert
`else
`define ASSUME assert
`define ASSERT assume
`endif
 
reg f_past_valid;
initial f_past_valid = 1'b0;
always @(posedge i_clk)
f_past_valid <= 1'b1;
 
initial `ASSUME(i_reset);
always @(posedge i_clk)
if ((!f_past_valid)||($past(i_reset)))
begin
`ASSERT(iaddr == 0);
`ASSERT(in_reset);
`ASSERT(!o_sync);
end
`ifdef BITREVERSE
always @(posedge i_clk)
assume((i_ce)||($past(i_ce))||($past(i_ce,2)));
`endif // BITREVERSE
 
(* anyconst *) reg [LGSIZE-1:0] f_const_addr;
wire [LGSIZE-3:0] f_reversed_addr;
// reg [LGSIZE:0] f_now;
reg f_addr_loaded_0, f_addr_loaded_1;
reg [(2*WIDTH-1):0] f_data_0, f_data_1;
wire f_writing, f_reading;
 
generate for(k=0; k<LGSIZE-2; k=k+1)
assign f_reversed_addr[k] = f_const_addr[LGSIZE-3-k];
endgenerate
 
assign f_writing=(f_const_addr[LGSIZE-1]==iaddr[LGSIZE-1]);
assign f_reading=(f_const_addr[LGSIZE-1]!=iaddr[LGSIZE-1]);
initial f_addr_loaded_0 = 1'b0;
initial f_addr_loaded_1 = 1'b0;
always @(posedge i_clk)
if (i_reset)
begin
f_addr_loaded_0 <= 1'b0;
f_addr_loaded_1 <= 1'b0;
end else if (i_ce)
begin
if (iaddr == f_const_addr)
begin
f_addr_loaded_0 <= 1'b1;
f_addr_loaded_1 <= 1'b1;
end
 
if (f_reading)
begin
if ((braddr == f_const_addr[LGSIZE-3:0])
&&(iaddr[LGSIZE-2] == 1'b0))
f_addr_loaded_0 <= 1'b0;
 
if ((braddr == f_const_addr[LGSIZE-3:0])
&&(iaddr[LGSIZE-2] == 1'b1))
f_addr_loaded_1 <= 1'b0;
end
end
 
always @(posedge i_clk)
if ((i_ce)&&(iaddr == f_const_addr))
begin
f_data_0 <= i_in_0;
f_data_1 <= i_in_1;
`ASSERT(!f_addr_loaded_0);
`ASSERT(!f_addr_loaded_1);
end
 
always @(posedge i_clk)
if ((f_past_valid)&&(!$past(i_reset))
&&($past(f_addr_loaded_0))&&(!f_addr_loaded_0))
begin
assert(!$past(iaddr[LGSIZE-2]));
if (f_const_addr[LGSIZE-2])
assert(o_out_1 == f_data_0);
else
assert(o_out_0 == f_data_0);
end
 
always @(posedge i_clk)
if ((f_past_valid)&&(!$past(i_reset))
&&($past(f_addr_loaded_1))&&(!f_addr_loaded_1))
begin
assert($past(iaddr[LGSIZE-2]));
if (f_const_addr[LGSIZE-2])
assert(o_out_1 == f_data_1);
else
assert(o_out_0 == f_data_1);
end
 
always @(*)
`ASSERT(o_sync == ((iaddr[LGSIZE-2:0] == 1)&&(!in_reset)));
 
// Before writing to a section, the loaded flags should be
// zero
always @(*)
if (f_writing)
begin
`ASSERT(f_addr_loaded_0 == (iaddr[LGSIZE-2:0]
> f_const_addr[LGSIZE-2:0]));
`ASSERT(f_addr_loaded_1 == (iaddr[LGSIZE-2:0]
> f_const_addr[LGSIZE-2:0]));
end
 
// If we were writing, and now we are reading, then both
// f_addr_loaded flags must be set
always @(posedge i_clk)
if ((f_past_valid)&&(!$past(i_reset))
&&($past(f_writing))&&(f_reading))
begin
`ASSERT(f_addr_loaded_0);
`ASSERT(f_addr_loaded_1);
end
 
always @(*)
if (f_writing)
`ASSERT(f_addr_loaded_0 == f_addr_loaded_1);
 
// When reading, and the loaded flag is zero, our pointer
// must not have hit the address of interest yet
always @(*)
if ((!in_reset)&&(f_reading))
`ASSERT(f_addr_loaded_0 ==
((!iaddr[LGSIZE-2])&&(iaddr[LGSIZE-3:0]
<= f_reversed_addr[LGSIZE-3:0])));
always @(*)
if ((!in_reset)&&(f_reading))
`ASSERT(f_addr_loaded_1 ==
((!iaddr[LGSIZE-2])||(iaddr[LGSIZE-3:0]
<= f_reversed_addr[LGSIZE-3:0])));
always @(*)
if ((in_reset)&&(f_reading))
begin
`ASSERT(!f_addr_loaded_0);
`ASSERT(!f_addr_loaded_1);
end
 
always @(*)
if(iaddr[LGSIZE-1])
`ASSERT(!in_reset);
 
always @(*)
if (f_addr_loaded_0)
`ASSERT(mem_e[f_const_addr] == f_data_0);
always @(*)
if (f_addr_loaded_1)
`ASSERT(mem_o[f_const_addr] == f_data_1);
 
 
`endif // FORMAL
endmodule
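
The `braddr` wires above are `iaddr` with its low `LGSIZE-2` bits in reversed order. A minimal C++ sketch of that index reversal follows. It shows only the address permutation, not the even/odd memory banking or the two-samples-per-clock read scheduling, and the names `bitrev` and `bitrev_is_involution` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the low nbits bits of v, mirroring
//   braddr[k] = iaddr[LGSIZE-3-k]   with nbits = LGSIZE-2.
// Bits of v above nbits are ignored.
uint32_t bitrev(uint32_t v, unsigned nbits) {
    assert(nbits <= 32);
    uint32_t r = 0;
    for (unsigned k = 0; k < nbits; k++)
        if (v & (1u << (nbits - 1 - k)))
            r |= (1u << k);
    return r;
}

// Reversing twice should return the original index
bool bitrev_is_involution(unsigned nbits) {
    assert(nbits <= 20);
    for (uint32_t v = 0; v < (1u << nbits); v++)
        if (bitrev(bitrev(v, nbits), nbits) != v) return false;
    return true;
}
```

With the default `LGSIZE=5`, `nbits` is 3, so index 1 (`001`) maps to 4 (`100`) and index 6 (`110`) maps to 3 (`011`).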
/trunk/rtl/butterfly.v
0,0 → 1,803
////////////////////////////////////////////////////////////////////////////////
//
// Filename: butterfly.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This routine calculates a butterfly for a decimation
// in frequency version of an FFT. Specifically, given
// complex Left and Right values together with a coefficient, the output
// of this routine is given by:
//
// L' = L + R
// R' = (L - R)*C
//
// The rest of the junk below handles timing (mostly), to make certain
// that L' and R' reach the output at the same clock. Further, just to
// make certain that is the case, an 'aux' input exists. This aux value
// will come out of this routine synchronized to the values it came in
// with. (i.e., both L', R', and aux all have the same delay.) Hence,
// a caller of this routine may set aux on the first input with valid
// data, and then wait to see aux set on the output to know when to find
// the first output with valid data.
//
// All bits are preserved until the very last clock, where any more bits
// than OWIDTH will be quietly discarded.
//
// This design features no overflow checking.
//
// Notes:
// CORDIC:
// Much as we might like, we can't use a cordic here.
// The goal is to accomplish an FFT, as defined, and a
// CORDIC places a scale factor onto the data. Removing
// the scale factor would cost two multiplies, which
// is precisely what we are trying to avoid.
//
//
// 3-MULTIPLIES:
// It should also be possible to do this with three multiplies
// and an extra two addition cycles.
//
// We want
// R+I = (a + jb) * (c + jd)
// R+I = (ac-bd) + j(ad+bc)
// We multiply
// P1 = ac
// P2 = bd
// P3 = (a+b)(c+d)
// Then
// R+I=(P1-P2)+j(P3-P2-P1)
//
// WIDTHS:
// On multiplying an X width number by a
// Y width number, X>Y, the result should be (X+Y)
// bits, right?
// -2^(X-1) <= a <= 2^(X-1) - 1
// -2^(Y-1) <= b <= 2^(Y-1) - 1
// (2^(Y-1)-1)*(-2^(X-1)) <= ab <= 2^(X-1)2^(Y-1)
// -2^(X+Y-2)+2^(X-1) <= ab <= 2^(X+Y-2) <= 2^(X+Y-1) - 1
// -2^(X+Y-1) <= ab <= 2^(X+Y-1)-1
// YUP! But just barely. Do this and you'll really want
// to drop a bit, although you will risk overflow in so
// doing.
//
// 20150602 -- The sync logic lines have been completely redone. The
// synchronization lines no longer go through the FIFO with the
// left hand sum, but are kept out of memory. This allows the
// butterfly to use more optimal memory resources, while also
// guaranteeing that the sync lines can be properly reset upon
// any reset signal.
//
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module butterfly(i_clk, i_reset, i_ce, i_coef, i_left, i_right, i_aux,
o_left, o_right, o_aux);
// Public changeable parameters ...
parameter IWIDTH=16,CWIDTH=20,OWIDTH=17;
parameter SHIFT=0;
// The number of clocks per each i_ce. The actual number can be
// more, but the algorithm depends upon at least this many for
// extra internal processing.
parameter CKPCE=1;
//
// Local/derived parameters that are calculated from the above
// params. Apart from algorithmic changes below, these should not
// be adjusted
//
// The first step is to calculate how many clocks our multiply
// takes to come back with an answer. The time in the
// multiply depends upon the input value with the fewest number of
// bits--to keep the pipeline depth short. So, let's find the
// fewest number of bits here.
localparam MXMPYBITS =
((IWIDTH+2)>(CWIDTH+1)) ? (CWIDTH+1) : (IWIDTH + 2);
//
// Given this "fewest" number of bits, we can calculate the
// number of clocks the multiply itself will take.
localparam MPYDELAY=((MXMPYBITS+1)/2)+2;
//
// In an environment when CKPCE > 1, the multiply delay isn't
// necessarily the delay felt by this algorithm--measured in
// i_ce's. In particular, if the multiply can operate with more
// operations per clock, it can appear to finish "faster".
// Since most of the logic in this core operates on the slower
// clock, we'll need to map that speed into the number of slower
// clock ticks that it takes.
localparam LCLDELAY = (CKPCE == 1) ? MPYDELAY
: (CKPCE == 2) ? (MPYDELAY/2+2)
: (MPYDELAY/3 + 2);
localparam LGDELAY = (MPYDELAY>64) ? 7
: (MPYDELAY > 32) ? 6
: (MPYDELAY > 16) ? 5
: (MPYDELAY > 8) ? 4
: (MPYDELAY > 4) ? 3
: 2;
localparam AUXLEN=(LCLDELAY+3);
localparam MPYREMAINDER = MPYDELAY - CKPCE*(MPYDELAY/CKPCE);
 
 
input i_clk, i_reset, i_ce;
input [(2*CWIDTH-1):0] i_coef;
input [(2*IWIDTH-1):0] i_left, i_right;
input i_aux;
output wire [(2*OWIDTH-1):0] o_left, o_right;
output reg o_aux;
 
reg [(2*IWIDTH-1):0] r_left, r_right;
reg [(2*CWIDTH-1):0] r_coef, r_coef_2;
wire signed [(IWIDTH-1):0] r_left_r, r_left_i, r_right_r, r_right_i;
assign r_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];
assign r_left_i = r_left[ (IWIDTH-1):0];
assign r_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];
assign r_right_i = r_right[(IWIDTH-1):0];
 
reg signed [(IWIDTH):0] r_sum_r, r_sum_i, r_dif_r, r_dif_i;
 
reg [(LGDELAY-1):0] fifo_addr;
wire [(LGDELAY-1):0] fifo_read_addr;
assign fifo_read_addr = fifo_addr - LCLDELAY[(LGDELAY-1):0];
reg [(2*IWIDTH+1):0] fifo_left [ 0:((1<<LGDELAY)-1)];
 
// Set up the input to the multiply
always @(posedge i_clk)
if (i_ce)
begin
// One clock just latches the inputs
r_left <= i_left; // No change in # of bits
r_right <= i_right;
r_coef <= i_coef;
// Next clock adds/subtracts
r_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits
r_sum_i <= r_left_i + r_right_i;
r_dif_r <= r_left_r - r_right_r;
r_dif_i <= r_left_i - r_right_i;
// Other inputs are simply delayed on second clock
r_coef_2<= r_coef;
end
 
// Don't forget to record the even side, since it doesn't need
// to be multiplied, but yet we still need the results in sync
// with the answer when it is ready.
initial fifo_addr = 0;
always @(posedge i_clk)
if (i_reset)
fifo_addr <= 0;
else if (i_ce)
// Need to delay the sum side--nothing else happens
// to it, but it needs to stay synchronized with the
// right side.
fifo_addr <= fifo_addr + 1;
 
always @(posedge i_clk)
if (i_ce)
fifo_left[fifo_addr] <= { r_sum_r, r_sum_i };
 
wire signed [(CWIDTH-1):0] ir_coef_r, ir_coef_i;
assign ir_coef_r = r_coef_2[(2*CWIDTH-1):CWIDTH];
assign ir_coef_i = r_coef_2[(CWIDTH-1):0];
wire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] p_one, p_two, p_three;
 
 
// Multiply output is always a width of the sum of the widths of
// the two inputs. ALWAYS. This is independent of the number of
// bits in p_one, p_two, or p_three. These values needed to
// accumulate a bit (or two) each. However, this approach to a
// three multiply complex multiply cannot increase the total
// number of bits in our final output. We'll take care of
// dropping back down to the proper width, OWIDTH, in our routine
// below.
 
 
// We accomplish here "Karatsuba" multiplication. That is,
// by doing three multiplies we accomplish the work of four.
// Let's prove to ourselves that this works ... We wish to
// multiply: (a+jb) * (c+jd), where a+jb is given by
// a + jb = r_dif_r + j r_dif_i, and
// c + jd = ir_coef_r + j ir_coef_i.
// We do this by calculating the intermediate products P1, P2,
// and P3 as
// P1 = ac
// P2 = bd
// P3 = (a + b) * (c + d)
// and then complete our final answer with
// ac - bd = P1 - P2 (this checks)
// ad + bc = P3 - P2 - P1
// = (ac + bc + ad + bd) - bd - ac
// = bc + ad (this checks)
 
 
// This should really be based upon an IF, such as in
// if (IWIDTH < CWIDTH) then ...
// However, this is the only (other) way I know to do it.
generate if (CKPCE <= 1)
begin
 
wire [(CWIDTH):0] p3c_in;
wire [(IWIDTH+1):0] p3d_in;
assign p3c_in = ir_coef_i + ir_coef_r;
assign p3d_in = r_dif_r + r_dif_i;
 
// We need to pad these first two multiplies by an extra
// bit just to keep them aligned with the third,
// simpler, multiply.
longbimpy #(CWIDTH+1,IWIDTH+2) p1(i_clk, i_ce,
{ir_coef_r[CWIDTH-1],ir_coef_r},
{r_dif_r[IWIDTH],r_dif_r}, p_one);
longbimpy #(CWIDTH+1,IWIDTH+2) p2(i_clk, i_ce,
{ir_coef_i[CWIDTH-1],ir_coef_i},
{r_dif_i[IWIDTH],r_dif_i}, p_two);
longbimpy #(CWIDTH+1,IWIDTH+2) p3(i_clk, i_ce,
p3c_in, p3d_in, p_three);
 
end else if (CKPCE == 2)
begin : CKPCE_TWO
// Coefficient multiply inputs
reg [2*(CWIDTH)-1:0] mpy_pipe_c;
// Data multiply inputs
reg [2*(IWIDTH+1)-1:0] mpy_pipe_d;
wire signed [(CWIDTH-1):0] mpy_pipe_vc;
wire signed [(IWIDTH):0] mpy_pipe_vd;
//
reg signed [(CWIDTH+1)-1:0] mpy_cof_sum;
reg signed [(IWIDTH+2)-1:0] mpy_dif_sum;
 
assign mpy_pipe_vc = mpy_pipe_c[2*(CWIDTH)-1:CWIDTH];
assign mpy_pipe_vd = mpy_pipe_d[2*(IWIDTH+1)-1:IWIDTH+1];
 
reg mpy_pipe_v;
reg ce_phase;
 
reg signed [(CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;
reg signed [IWIDTH+CWIDTH+3-1:0] longmpy;
 
 
initial ce_phase = 1'b0;
always @(posedge i_clk)
if (i_reset)
ce_phase <= 1'b0;
else if (i_ce)
ce_phase <= 1'b1;
else
ce_phase <= 1'b0;
 
always @(*)
mpy_pipe_v = (i_ce)||(ce_phase);
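//
// Illustration only: assuming reset has been released and i_ce
// arrives on every other clock, the two enables line up as
//	clock      :  0  1  2  3
//	i_ce       :  1  0  1  0
//	ce_phase   :  0  1  0  1
//	mpy_pipe_v :  1  1  1  1
// so both multipliers are enabled on every clock. Were i_ce to
// arrive only on every third clock (1 0 0 1 ...), mpy_pipe_v
// would read 1 1 0 1 ..., idling the multipliers on the spare
// clock.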
 
always @(posedge i_clk)
if (ce_phase)
begin
mpy_pipe_c[2*CWIDTH-1:0] <=
{ ir_coef_r, ir_coef_i };
mpy_pipe_d[2*(IWIDTH+1)-1:0] <=
{ r_dif_r, r_dif_i };
 
mpy_cof_sum <= ir_coef_i + ir_coef_r;
mpy_dif_sum <= r_dif_r + r_dif_i;
 
end else if (i_ce)
begin
mpy_pipe_c[2*(CWIDTH)-1:0] <= {
mpy_pipe_c[(CWIDTH)-1:0], {(CWIDTH){1'b0}} };
mpy_pipe_d[2*(IWIDTH+1)-1:0] <= {
mpy_pipe_d[(IWIDTH+1)-1:0], {(IWIDTH+1){1'b0}} };
end
 
longbimpy #(CWIDTH+1,IWIDTH+2) mpy0(i_clk, mpy_pipe_v,
mpy_cof_sum, mpy_dif_sum, longmpy);
 
longbimpy #(CWIDTH+1,IWIDTH+2) mpy1(i_clk, mpy_pipe_v,
{ mpy_pipe_vc[CWIDTH-1], mpy_pipe_vc },
{ mpy_pipe_vd[IWIDTH ], mpy_pipe_vd },
mpy_pipe_out);
 
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0]
rp_one, rp_two, rp_three,
rp2_one, rp2_two, rp2_three;
 
always @(posedge i_clk)
if (((i_ce)&&(!MPYDELAY[0]))
||((ce_phase)&&(MPYDELAY[0])))
rp_one <= mpy_pipe_out;
always @(posedge i_clk)
if (((i_ce)&&(MPYDELAY[0]))
||((ce_phase)&&(!MPYDELAY[0])))
rp_two <= mpy_pipe_out;
always @(posedge i_clk)
if (i_ce)
rp_three <= longmpy;
 
// Our outputs *MUST* be set on a clock where i_ce is
// true for the following logic to work. Make that
// happen here.
always @(posedge i_clk)
if (i_ce)
rp2_one<= rp_one;
always @(posedge i_clk)
if (i_ce)
rp2_two <= rp_two;
always @(posedge i_clk)
if (i_ce)
rp2_three<= rp_three;
 
assign p_one = rp2_one;
assign p_two = (!MPYDELAY[0])? rp2_two : rp_two;
assign p_three = ( MPYDELAY[0])? rp_three : rp2_three;
 
// verilator lint_off UNUSED
wire [2*(IWIDTH+CWIDTH+3)-1:0] unused;
assign unused = { rp2_two, rp2_three };
// verilator lint_on UNUSED
 
end else if (CKPCE <= 3)
begin : CKPCE_THREE
// Coefficient multiply inputs
reg [3*(CWIDTH+1)-1:0] mpy_pipe_c;
// Data multiply inputs
reg [3*(IWIDTH+2)-1:0] mpy_pipe_d;
wire signed [(CWIDTH):0] mpy_pipe_vc;
wire signed [(IWIDTH+1):0] mpy_pipe_vd;
 
assign mpy_pipe_vc = mpy_pipe_c[3*(CWIDTH+1)-1:2*(CWIDTH+1)];
assign mpy_pipe_vd = mpy_pipe_d[3*(IWIDTH+2)-1:2*(IWIDTH+2)];
 
reg mpy_pipe_v;
reg [2:0] ce_phase;
 
reg signed [ (CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;
 
initial ce_phase = 3'b011;
always @(posedge i_clk)
if (i_reset)
ce_phase <= 3'b011;
else if (i_ce)
ce_phase <= 3'b000;
else if (ce_phase != 3'b011)
ce_phase <= ce_phase + 1'b1;
 
always @(*)
mpy_pipe_v = (i_ce)||(ce_phase < 3'b010);
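//
// Illustration only: assuming reset has been released and i_ce
// arrives on every third clock, the steady state looks like
//	clock      :  0  1  2  3  4  5
//	i_ce       :  1  0  0  1  0  0
//	ce_phase   :  2  0  1  2  0  1
//	mpy_pipe_v :  1  1  1  1  1  1
// so the single multiplier is enabled three times per sample.
// (On the first i_ce after reset, ce_phase reads 3 rather than
// 2.) With a fourth, spare clock between samples, ce_phase
// would reach 2 while i_ce is low, and mpy_pipe_v would drop
// for that one clock.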
 
always @(posedge i_clk)
if (ce_phase == 3'b000)
begin
// Second clock
mpy_pipe_c[3*(CWIDTH+1)-1:(CWIDTH+1)] <= {
ir_coef_r[CWIDTH-1], ir_coef_r,
ir_coef_i[CWIDTH-1], ir_coef_i };
mpy_pipe_c[CWIDTH:0] <= ir_coef_i + ir_coef_r;
mpy_pipe_d[3*(IWIDTH+2)-1:(IWIDTH+2)] <= {
r_dif_r[IWIDTH], r_dif_r,
r_dif_i[IWIDTH], r_dif_i };
mpy_pipe_d[(IWIDTH+2)-1:0] <= r_dif_r + r_dif_i;
 
end else if (mpy_pipe_v)
begin
mpy_pipe_c[3*(CWIDTH+1)-1:0] <= {
mpy_pipe_c[2*(CWIDTH+1)-1:0], {(CWIDTH+1){1'b0}} };
mpy_pipe_d[3*(IWIDTH+2)-1:0] <= {
mpy_pipe_d[2*(IWIDTH+2)-1:0], {(IWIDTH+2){1'b0}} };
end
 
longbimpy #(CWIDTH+1,IWIDTH+2) mpy(i_clk, mpy_pipe_v,
mpy_pipe_vc, mpy_pipe_vd, mpy_pipe_out);
 
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0]
rp_one, rp_two, rp_three,
rp2_one, rp2_two, rp2_three,
rp3_one;
 
always @(posedge i_clk)
if (MPYREMAINDER == 0)
begin
 
if (i_ce)
rp_two <= mpy_pipe_out;
else if (ce_phase == 3'b000)
rp_three <= mpy_pipe_out;
else if (ce_phase == 3'b001)
rp_one <= mpy_pipe_out;
 
end else if (MPYREMAINDER == 1)
begin
 
if (i_ce)
rp_one <= mpy_pipe_out;
else if (ce_phase == 3'b000)
rp_two <= mpy_pipe_out;
else if (ce_phase == 3'b001)
rp_three <= mpy_pipe_out;
 
end else // if (MPYREMAINDER == 2)
begin
 
if (i_ce)
rp_three <= mpy_pipe_out;
else if (ce_phase == 3'b000)
rp_one <= mpy_pipe_out;
else if (ce_phase == 3'b001)
rp_two <= mpy_pipe_out;
 
end
 
always @(posedge i_clk)
if (i_ce)
begin
rp2_one <= rp_one;
rp2_two <= rp_two;
rp2_three <= (MPYREMAINDER == 2) ? mpy_pipe_out : rp_three;
rp3_one <= (MPYREMAINDER == 0) ? rp2_one : rp_one;
end
assign p_one = rp3_one;
assign p_two = rp2_two;
assign p_three = rp2_three;
 
end endgenerate
// These values are held in memory and delayed during the
// multiply. Here, we recover them. During the multiply,
// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},
// therefore, the left_x values need to be left shifted by
// CWIDTH-2 bits (zero padded on the right) as well. The
// additional upper bits come from a sign extension.
wire signed [(IWIDTH+CWIDTH):0] fifo_i, fifo_r;
reg [(2*IWIDTH+1):0] fifo_read;
assign fifo_r = { {2{fifo_read[2*(IWIDTH+1)-1]}}, fifo_read[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1'b0}} };
assign fifo_i = { {2{fifo_read[(IWIDTH+1)-1]}}, fifo_read[((IWIDTH+1)-1):0], {(CWIDTH-2){1'b0}} };
 
 
reg signed [(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;
 
// Let's do some rounding and remove unnecessary bits.
// We have (IWIDTH+CWIDTH+3) bits here, we need to drop down to
// OWIDTH, and SHIFT by SHIFT bits in the process. The trick is
// that we don't need (IWIDTH+CWIDTH+3) bits. We've accumulated
// them, but the actual values will never fill all these bits.
// In particular, we only need:
// IWIDTH bits for the input
// +1 bit for the add/subtract
// +CWIDTH bits for the coefficient multiply
// +1 bit for the add/subtract in the complex multiply
// ------
// (IWIDTH+CWIDTH+2) bits at full precision.
//
// However, the coefficient multiply multiplied by a maximum value
// of 2^(CWIDTH-2). Thus, we only have
// IWIDTH bits for the input
// +1 bit for the add/subtract
// +CWIDTH-2 bits for the coefficient multiply
// +1 (optional) bit for the add/subtract in the cpx mpy.
// -------- ... multiply. (This last bit may be shifted out.)
// (IWIDTH+CWIDTH) valid output bits.
// Now, if the user wants to keep any extras of these (via OWIDTH),
// or if he wishes to arbitrarily shift some of these off (via
// SHIFT) we accomplish that here.
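//
// As a worked example, assume IWIDTH=16, CWIDTH=20, OWIDTH=17,
// and SHIFT=0 (the values fftmain.v gives stage_e1024; any
// other set works the same way):
//	mpy_r, mpy_i width : IWIDTH+CWIDTH+3 = 39 bits
//	full precision     : IWIDTH+CWIDTH+2 = 38 bits
//	valid output bits  : IWIDTH+CWIDTH   = 36 bits
// The convround instances below are then built as
// convround #(39,17,4). Going by convround.v, they drop the top
// SHIFT+4 = 4 bits, keep i_val[34:18] as the 17-bit result,
// and round on i_val[17] together with i_val[16:0].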
 
wire signed [(OWIDTH-1):0] rnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;
 
wire signed [(CWIDTH+IWIDTH+3-1):0] left_sr, left_si;
assign left_sr = { {(2){fifo_r[(IWIDTH+CWIDTH)]}}, fifo_r };
assign left_si = { {(2){fifo_i[(IWIDTH+CWIDTH)]}}, fifo_i };
 
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_r(i_clk, i_ce,
left_sr, rnd_left_r);
 
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_i(i_clk, i_ce,
left_si, rnd_left_i);
 
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,
mpy_r, rnd_right_r);
 
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,
mpy_i, rnd_right_i);
 
always @(posedge i_clk)
if (i_ce)
begin
// First clock, recover all values
fifo_read <= fifo_left[fifo_read_addr];
// These values are IWIDTH+CWIDTH+3 bits wide
// although they only need to be (IWIDTH+1)
// + (CWIDTH) bits wide. (We've got two
// extra bits we need to get rid of.)
mpy_r <= p_one - p_two;
mpy_i <= p_three - p_one - p_two;
end
 
reg [(AUXLEN-1):0] aux_pipeline;
initial aux_pipeline = 0;
always @(posedge i_clk)
if (i_reset)
aux_pipeline <= 0;
else if (i_ce)
aux_pipeline <= { aux_pipeline[(AUXLEN-2):0], i_aux };
 
initial o_aux = 1'b0;
always @(posedge i_clk)
if (i_reset)
o_aux <= 1'b0;
else if (i_ce)
begin
// Second clock, latch for final clock
o_aux <= aux_pipeline[AUXLEN-1];
end
 
// As a final step, we pack our outputs into two packed two's
// complement numbers per output word, so that each output word
// has (2*OWIDTH) bits in it, with the top half being the real
// portion and the bottom half being the imaginary portion.
assign o_left = { rnd_left_r, rnd_left_i };
assign o_right= { rnd_right_r,rnd_right_i};
 
`ifdef VERILATOR
`define FORMAL
`endif
`ifdef FORMAL
localparam F_LGDEPTH = (AUXLEN > 64) ? 7
: (AUXLEN > 32) ? 6
: (AUXLEN > 16) ? 5
: (AUXLEN > 8) ? 4
: (AUXLEN > 4) ? 3 : 2;
 
localparam F_DEPTH = AUXLEN;
localparam [F_LGDEPTH-1:0] F_D = F_DEPTH[F_LGDEPTH-1:0]-1;
 
reg signed [IWIDTH-1:0] f_dlyleft_r [0:F_DEPTH-1];
reg signed [IWIDTH-1:0] f_dlyleft_i [0:F_DEPTH-1];
reg signed [IWIDTH-1:0] f_dlyright_r [0:F_DEPTH-1];
reg signed [IWIDTH-1:0] f_dlyright_i [0:F_DEPTH-1];
reg signed [CWIDTH-1:0] f_dlycoeff_r [0:F_DEPTH-1];
reg signed [CWIDTH-1:0] f_dlycoeff_i [0:F_DEPTH-1];
reg signed [F_DEPTH-1:0] f_dlyaux;
 
initial f_dlyaux[0] = 0;
always @(posedge i_clk)
if (i_reset)
f_dlyaux <= 0;
else if (i_ce)
f_dlyaux <= { f_dlyaux[F_DEPTH-2:0], i_aux };
 
always @(posedge i_clk)
if (i_ce)
begin
f_dlyleft_r[0] <= i_left[ (2*IWIDTH-1):IWIDTH];
f_dlyleft_i[0] <= i_left[ ( IWIDTH-1):0];
f_dlyright_r[0] <= i_right[(2*IWIDTH-1):IWIDTH];
f_dlyright_i[0] <= i_right[( IWIDTH-1):0];
f_dlycoeff_r[0] <= i_coef[ (2*CWIDTH-1):CWIDTH];
f_dlycoeff_i[0] <= i_coef[ ( CWIDTH-1):0];
end
 
genvar k;
generate for(k=1; k<F_DEPTH; k=k+1)
begin : F_PROPAGATE_DELAY_LINES
 
 
always @(posedge i_clk)
if (i_ce)
begin
f_dlyleft_r[k] <= f_dlyleft_r[ k-1];
f_dlyleft_i[k] <= f_dlyleft_i[ k-1];
f_dlyright_r[k] <= f_dlyright_r[k-1];
f_dlyright_i[k] <= f_dlyright_i[k-1];
f_dlycoeff_r[k] <= f_dlycoeff_r[k-1];
f_dlycoeff_i[k] <= f_dlycoeff_i[k-1];
end
 
end endgenerate
 
`ifndef VERILATOR
always @(posedge i_clk)
if ((!$past(i_ce))&&(!$past(i_ce,2))&&(!$past(i_ce,3))
&&(!$past(i_ce,4)))
assume(i_ce);
 
generate if (CKPCE <= 1)
begin
 
// i_ce is allowed to be anything in this mode
 
end else if (CKPCE == 2)
begin : F_CKPCE_TWO
 
always @(posedge i_clk)
if ($past(i_ce))
assume(!i_ce);
 
end else if (CKPCE == 3)
begin : F_CKPCE_THREE
 
always @(posedge i_clk)
if (($past(i_ce))||($past(i_ce,2)))
assume(!i_ce);
 
end endgenerate
`endif
 
reg [F_LGDEPTH:0] f_startup_counter;
initial f_startup_counter = 0;
always @(posedge i_clk)
if (i_reset)
f_startup_counter <= 0;
else if ((i_ce)&&(!(&f_startup_counter)))
f_startup_counter <= f_startup_counter + 1;
 
// Declared as reg, since these are assigned within an always block
reg signed [IWIDTH:0] f_sumr, f_sumi;
always @(*)
begin
f_sumr = f_dlyleft_r[F_D] + f_dlyright_r[F_D];
f_sumi = f_dlyleft_i[F_D] + f_dlyright_i[F_D];
end
 
wire signed [IWIDTH+CWIDTH+3-1:0] f_sumrx, f_sumix;
assign f_sumrx = { {(4){f_sumr[IWIDTH]}}, f_sumr, {(CWIDTH-2){1'b0}} };
assign f_sumix = { {(4){f_sumi[IWIDTH]}}, f_sumi, {(CWIDTH-2){1'b0}} };
 
// Declared as reg, since these are assigned within an always block
reg signed [IWIDTH:0] f_difr, f_difi;
always @(*)
begin
f_difr = f_dlyleft_r[F_D] - f_dlyright_r[F_D];
f_difi = f_dlyleft_i[F_D] - f_dlyright_i[F_D];
end
 
wire signed [IWIDTH+CWIDTH+3-1:0] f_difrx, f_difix;
assign f_difrx = { {(CWIDTH+2){f_difr[IWIDTH]}}, f_difr };
assign f_difix = { {(CWIDTH+2){f_difi[IWIDTH]}}, f_difi };
 
wire signed [IWIDTH+CWIDTH+3-1:0] f_widecoeff_r, f_widecoeff_i;
assign f_widecoeff_r ={ {(IWIDTH+3){f_dlycoeff_r[F_D][CWIDTH-1]}},
f_dlycoeff_r[F_D] };
assign f_widecoeff_i ={ {(IWIDTH+3){f_dlycoeff_i[F_D][CWIDTH-1]}},
f_dlycoeff_i[F_D] };
 
always @(posedge i_clk)
if (f_startup_counter > {1'b0, F_D})
begin
assert(aux_pipeline == f_dlyaux);
assert(left_sr == f_sumrx);
assert(left_si == f_sumix);
assert(aux_pipeline[AUXLEN-1] == f_dlyaux[F_D]);
 
if ((f_difr == 0)&&(f_difi == 0))
begin
assert(mpy_r == 0);
assert(mpy_i == 0);
end else if ((f_dlycoeff_r[F_D] == 0)
&&(f_dlycoeff_i[F_D] == 0))
begin
assert(mpy_r == 0);
assert(mpy_i == 0);
end
 
if ((f_dlycoeff_r[F_D] == 1)&&(f_dlycoeff_i[F_D] == 0))
begin
assert(mpy_r == f_difrx);
assert(mpy_i == f_difix);
end
 
if ((f_dlycoeff_r[F_D] == 0)&&(f_dlycoeff_i[F_D] == 1))
begin
assert(mpy_r == -f_difix);
assert(mpy_i == f_difrx);
end
 
if ((f_difr == 1)&&(f_difi == 0))
begin
assert(mpy_r == f_widecoeff_r);
assert(mpy_i == f_widecoeff_i);
end
 
if ((f_difr == 0)&&(f_difi == 1))
begin
assert(mpy_r == -f_widecoeff_i);
assert(mpy_i == f_widecoeff_r);
end
end
 
// Let's see if we can improve our performance at all by
// moving our test one clock earlier. If nothing else, it should
// help induction finish one (or more) clocks earlier than
// otherwise
 
 
// Declared as reg, since these are assigned within an always block
reg signed [IWIDTH:0] f_predifr, f_predifi;
always @(*)
begin
f_predifr = f_dlyleft_r[F_D-1] - f_dlyright_r[F_D-1];
f_predifi = f_dlyleft_i[F_D-1] - f_dlyright_i[F_D-1];
end
 
wire signed [IWIDTH+CWIDTH+3-1:0] f_predifrx, f_predifix;
assign f_predifrx = { {(CWIDTH+2){f_predifr[IWIDTH]}}, f_predifr };
assign f_predifix = { {(CWIDTH+2){f_predifi[IWIDTH]}}, f_predifi };
 
// Declared as reg, since these are assigned within an always block
reg signed [CWIDTH:0] f_sumcoef;
reg signed [IWIDTH+1:0] f_sumdiff;
always @(*)
begin
f_sumcoef = f_dlycoeff_r[F_D-1] + f_dlycoeff_i[F_D-1];
f_sumdiff = f_predifr + f_predifi;
end
 
// Induction helpers
always @(posedge i_clk)
if (f_startup_counter >= { 1'b0, F_D })
begin
if (f_dlycoeff_r[F_D-1] == 0)
assert(p_one == 0);
if (f_dlycoeff_i[F_D-1] == 0)
assert(p_two == 0);
 
if (f_dlycoeff_r[F_D-1] == 1)
assert(p_one == f_predifrx);
if (f_dlycoeff_i[F_D-1] == 1)
assert(p_two == f_predifix);
 
if (f_predifr == 0)
assert(p_one == 0);
if (f_predifi == 0)
assert(p_two == 0);
 
// verilator lint_off WIDTH
if (f_predifr == 1)
assert(p_one == f_dlycoeff_r[F_D-1]);
if (f_predifi == 1)
assert(p_two == f_dlycoeff_i[F_D-1]);
// verilator lint_on WIDTH
 
if (f_sumcoef == 0)
assert(p_three == 0);
if (f_sumdiff == 0)
assert(p_three == 0);
// verilator lint_off WIDTH
if (f_sumcoef == 1)
assert(p_three == f_sumdiff);
if (f_sumdiff == 1)
assert(p_three == f_sumcoef);
// verilator lint_on WIDTH
`ifdef VERILATOR
assert(p_one == f_predifr * f_dlycoeff_r[F_D-1]);
assert(p_two == f_predifi * f_dlycoeff_i[F_D-1]);
assert(p_three == f_sumdiff * f_sumcoef);
`endif // VERILATOR
end
 
// F_CHECK will be set externally by the solver, so that we can
// double check that the solver is actually testing what we think
// it is testing. We'll set it here to MPYREMAINDER, which will
// essentially eliminate the check--unless overridden by the
// solver.
parameter F_CHECK = MPYREMAINDER;
initial assert(MPYREMAINDER == F_CHECK);
 
`endif // FORMAL
endmodule
/trunk/rtl/convround.v
0,0 → 1,124
////////////////////////////////////////////////////////////////////////////////
//
// Filename: convround.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: A convergent rounding routine, also known as banker's
// rounding, Dutch rounding, Gaussian rounding, unbiased
// rounding, or ... more, at least according to Wikipedia.
//
// This form of rounding works by rounding, when the direction is in
// question, towards the nearest even value.
//
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module convround(i_clk, i_ce, i_val, o_val);
parameter IWID=16, OWID=8, SHIFT=0;
input i_clk, i_ce;
input signed [(IWID-1):0] i_val;
output reg signed [(OWID-1):0] o_val;
 
// Let's deal with three cases to be as general as we can be here
//
// 1. The desired output would lose no bits at all
// 2. One bit would be dropped, so the rounding is simply
// adjusting the value to be the nearest even number in
// cases of being halfway between two. If identically
// equal to a number, we just leave it as is.
// 3. Two or more bits would be dropped. In this case, we round
// normally unless we are rounding a value of exactly
// halfway between the two. In the halfway case we round
// to the nearest even number.
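//
// A few worked values for case 3, assuming (for illustration)
// IWID=5, OWID=3, SHIFT=0, so that two bits are dropped and
// i_val counts in quarters of an output LSB:
//	i_val      value   first_lost  other_lost  o_val
//	5'b00101    1.25       0           1       3'b001 ( 1)
//	5'b00111    1.75       1           1       3'b010 ( 2)
//	5'b00010    0.50       1           0       3'b000 ( 0)
//	5'b00110    1.50       1           0       3'b010 ( 2)
//	5'b11110   -0.50       1           0       3'b000 ( 0)
//	5'b11010   -1.50       1           0       3'b110 (-2)
// Only the exact halfway rows (.50) consult the last kept bit,
// and each of them lands on an even result.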
generate
if (IWID == OWID) // In this case, the shift is irrelevant and
begin // cannot be applied. No truncation or rounding takes
// effect here.
 
always @(posedge i_clk)
if (i_ce) o_val <= i_val[(IWID-1):0];
 
end else if (IWID-SHIFT == OWID)
begin // No truncation or rounding, output drops no bits
 
always @(posedge i_clk)
if (i_ce) o_val <= i_val[(IWID-SHIFT-1):0];
 
end else if (IWID-SHIFT-1 == OWID)
begin // Output drops one bit, can only add one or ... not.
wire [(OWID-1):0] truncated_value, rounded_up;
wire last_valid_bit, first_lost_bit;
assign truncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];
assign rounded_up=truncated_value + {{(OWID-1){1'b0}}, 1'b1 };
assign last_valid_bit = truncated_value[0];
assign first_lost_bit = i_val[0];
 
always @(posedge i_clk)
if (i_ce)
begin
if (!first_lost_bit) // Round down / truncate
o_val <= truncated_value;
else if (last_valid_bit)// Round up to nearest
o_val <= rounded_up; // even value
else // else round down to the nearest
o_val <= truncated_value; // even value
end
 
end else // If there's more than one bit we are dropping
begin
wire [(OWID-1):0] truncated_value, rounded_up;
wire last_valid_bit, first_lost_bit;
assign truncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];
assign rounded_up=truncated_value + {{(OWID-1){1'b0}}, 1'b1 };
assign last_valid_bit = truncated_value[0];
assign first_lost_bit = i_val[(IWID-SHIFT-OWID-1)];
 
wire [(IWID-SHIFT-OWID-2):0] other_lost_bits;
assign other_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];
 
always @(posedge i_clk)
if (i_ce)
begin
if (!first_lost_bit) // Round down / truncate
o_val <= truncated_value;
else if (|other_lost_bits) // Round up to
o_val <= rounded_up; // closest value
else if (last_valid_bit) // Round up to
o_val <= rounded_up; // nearest even
else // else round down to nearest even
o_val <= truncated_value;
end
end
endgenerate
 
endmodule
/trunk/rtl/fftmain.v
0,0 → 1,276
////////////////////////////////////////////////////////////////////////////////
//
// Filename: fftmain.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This is the main module in the General Purpose FPGA FFT
// implementation. As such, all other modules are subordinate
// to this one. This module accomplishes a fixed size Complex FFT on
// 2048 data points.
// The FFT is fully pipelined, and accepts as inputs two complex two's
// complement samples per clock.
//
// Ports:
// i_clk The clock. All operations are synchronous with this clock.
// i_reset Synchronous reset, active high. Setting this line will
// force the reset of all of the internals to this routine.
// Further, following a reset, the o_sync line will go
// high at the same time the first output sample is valid.
// i_ce A clock enable line. If this line is set, this module
// will accept two complex values as inputs, and produce
// two (possibly empty) complex values as outputs.
// i_left The first of two complex input samples. This value is split
// into two two's complement numbers, 15 bits each, with
// the real portion in the high order bits, and the
// imaginary portion taking the bottom 15 bits.
// i_right This is the same thing as i_left, only this is the second of
// two such samples. Hence, i_left would contain input
// sample zero, i_right would contain sample one. On the
// next clock i_left would contain input sample two,
// i_right number three and so forth.
// o_left The first of two output samples, of the same format as i_left,
// only having 21 bits for each of the real and imaginary
// components, leading to 42 bits total.
// o_right The second of two output samples produced each clock. This has
// the same format as o_left.
// o_sync A one bit output indicating the first valid sample produced by
// this FFT following a reset. Ever after, this will
// indicate the first sample of an FFT frame.
//
// Arguments: This file was computer generated using the following command
// line:
//
// % ./fftgen -v -d ../rtl -f 2048 -2 -p 0 -n 15 -a ../bench/cpp/fftsize.h
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
//
//
module fftmain(i_clk, i_reset, i_ce,
i_left, i_right,
o_left, o_right, o_sync);
parameter IWIDTH=15, OWIDTH=21, LGWIDTH=11;
//
input i_clk, i_reset, i_ce;
//
input [(2*IWIDTH-1):0] i_left, i_right;
output reg [(2*OWIDTH-1):0] o_left, o_right;
output reg o_sync;
 
 
// Outputs of the FFT, ready for bit reversal.
wire [(2*OWIDTH-1):0] br_left, br_right;
 
 
wire w_s2048;
// verilator lint_off UNUSED
wire w_os2048;
// verilator lint_on UNUSED
wire [31:0] w_e2048, w_o2048;
fftstage #(IWIDTH,IWIDTH+4,16,11,9,0,
0, 1, "cmem_e4096.hex")
stage_e2048(i_clk, i_reset, i_ce,
(!i_reset), i_left, w_e2048, w_s2048);
fftstage #(IWIDTH,IWIDTH+4,16,11,9,0,
0, 1, "cmem_o4096.hex")
stage_o2048(i_clk, i_reset, i_ce,
(!i_reset), i_right, w_o2048, w_os2048);
 
 
wire w_s1024;
// verilator lint_off UNUSED
wire w_os1024;
// verilator lint_on UNUSED
wire [33:0] w_e1024, w_o1024;
fftstage #(16,20,17,11,8,0,
0, 1, "cmem_e2048.hex")
stage_e1024(i_clk, i_reset, i_ce,
w_s2048, w_e2048, w_e1024, w_s1024);
fftstage #(16,20,17,11,8,0,
0, 1, "cmem_o2048.hex")
stage_o1024(i_clk, i_reset, i_ce,
w_s2048, w_o2048, w_o1024, w_os1024);
 
wire w_s512;
// verilator lint_off UNUSED
wire w_os512;
// verilator lint_on UNUSED
wire [33:0] w_e512, w_o512;
fftstage #(17,21,17,11,7,0,
0, 1, "cmem_e1024.hex")
stage_e512(i_clk, i_reset, i_ce,
w_s1024, w_e1024, w_e512, w_s512);
fftstage #(17,21,17,11,7,0,
0, 1, "cmem_o1024.hex")
stage_o512(i_clk, i_reset, i_ce,
w_s1024, w_o1024, w_o512, w_os512);
 
wire w_s256;
// verilator lint_off UNUSED
wire w_os256;
// verilator lint_on UNUSED
wire [35:0] w_e256, w_o256;
fftstage #(17,21,18,11,6,0,
0, 1, "cmem_e512.hex")
stage_e256(i_clk, i_reset, i_ce,
w_s512, w_e512, w_e256, w_s256);
fftstage #(17,21,18,11,6,0,
0, 1, "cmem_o512.hex")
stage_o256(i_clk, i_reset, i_ce,
w_s512, w_o512, w_o256, w_os256);
 
wire w_s128;
// verilator lint_off UNUSED
wire w_os128;
// verilator lint_on UNUSED
wire [35:0] w_e128, w_o128;
fftstage #(18,22,18,11,5,0,
0, 1, "cmem_e256.hex")
stage_e128(i_clk, i_reset, i_ce,
w_s256, w_e256, w_e128, w_s128);
fftstage #(18,22,18,11,5,0,
0, 1, "cmem_o256.hex")
stage_o128(i_clk, i_reset, i_ce,
w_s256, w_o256, w_o128, w_os128);
 
wire w_s64;
// verilator lint_off UNUSED
wire w_os64;
// verilator lint_on UNUSED
wire [37:0] w_e64, w_o64;
fftstage #(18,22,19,11,4,0,
0, 1, "cmem_e128.hex")
stage_e64(i_clk, i_reset, i_ce,
w_s128, w_e128, w_e64, w_s64);
fftstage #(18,22,19,11,4,0,
0, 1, "cmem_o128.hex")
stage_o64(i_clk, i_reset, i_ce,
w_s128, w_o128, w_o64, w_os64);
 
wire w_s32;
// verilator lint_off UNUSED
wire w_os32;
// verilator lint_on UNUSED
wire [37:0] w_e32, w_o32;
fftstage #(19,23,19,11,3,0,
0, 1, "cmem_e64.hex")
stage_e32(i_clk, i_reset, i_ce,
w_s64, w_e64, w_e32, w_s32);
fftstage #(19,23,19,11,3,0,
0, 1, "cmem_o64.hex")
stage_o32(i_clk, i_reset, i_ce,
w_s64, w_o64, w_o32, w_os32);
 
wire w_s16;
// verilator lint_off UNUSED
wire w_os16;
// verilator lint_on UNUSED
wire [39:0] w_e16, w_o16;
fftstage #(19,23,20,11,2,0,
0, 1, "cmem_e32.hex")
stage_e16(i_clk, i_reset, i_ce,
w_s32, w_e32, w_e16, w_s16);
fftstage #(19,23,20,11,2,0,
0, 1, "cmem_o32.hex")
stage_o16(i_clk, i_reset, i_ce,
w_s32, w_o32, w_o16, w_os16);
 
wire w_s8;
// verilator lint_off UNUSED
wire w_os8;
// verilator lint_on UNUSED
wire [39:0] w_e8, w_o8;
fftstage #(20,24,20,11,1,0,
0, 1, "cmem_e16.hex")
stage_e8(i_clk, i_reset, i_ce,
w_s16, w_e16, w_e8, w_s8);
fftstage #(20,24,20,11,1,0,
0, 1, "cmem_o16.hex")
stage_o8(i_clk, i_reset, i_ce,
w_s16, w_o16, w_o8, w_os8);
 
wire w_s4;
// verilator lint_off UNUSED
wire w_os4;
// verilator lint_on UNUSED
wire [41:0] w_e4, w_o4;
qtrstage #(20,21,11,0,0,0) stage_e4(i_clk, i_reset, i_ce,
w_s8, w_e8, w_e4, w_s4);
qtrstage #(20,21,11,1,0,0) stage_o4(i_clk, i_reset, i_ce,
w_s8, w_o8, w_o4, w_os4);
wire w_s2;
wire [41:0] w_e2, w_o2;
laststage #(21,21,0) stage_2(i_clk, i_reset, i_ce,
w_s4, w_e4, w_o4, w_e2, w_o2, w_s2);
 
 
// Prepare for a (potential) bit-reverse stage.
assign br_left = w_e2;
assign br_right = w_o2;
 
wire br_start;
reg r_br_started;
initial r_br_started = 1'b0;
always @(posedge i_clk)
if (i_reset)
r_br_started <= 1'b0;
else if (i_ce)
r_br_started <= r_br_started || w_s2;
assign br_start = r_br_started || w_s2;
 
// Now for the bit-reversal stage.
wire br_sync;
wire [(2*OWIDTH-1):0] br_o_left, br_o_right;
bitreverse #(11,21)
revstage(i_clk, i_reset,
(i_ce & br_start), br_left, br_right,
br_o_left, br_o_right, br_sync);
 
 
// Last clock: Register our outputs, we're done.
initial o_sync = 1'b0;
always @(posedge i_clk)
if (i_reset)
o_sync <= 1'b0;
else if (i_ce)
o_sync <= br_sync;
 
always @(posedge i_clk)
if (i_ce)
begin
o_left <= br_o_left;
o_right <= br_o_right;
end
 
 
endmodule
/trunk/rtl/fftstage.v
0,0 → 1,247
////////////////////////////////////////////////////////////////////////////////
//
// Filename: fftstage.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This file is (almost) a Verilog source file. It is meant to
// be used by a FFT core compiler to generate FFTs which may be
// used as part of an FFT core. Specifically, this file encapsulates
// the options of an FFT-stage. For any 2^N length FFT, there shall be
// (N-1) of these stages.
//
//
// Operation:
// Given a stream of values, operate upon them as though they were
// value pairs, x[n] and x[n+N/2]. The stream begins when n=0, and ends
// when n=N/2-1 (i.e. there's a full set of N values). When the value
// x[0] enters, the synchronization input, i_sync, must be true as well.
//
// For this stream, produce outputs
// y[n ] = x[n] + x[n+N/2], and
// y[n+N/2] = (x[n] - x[n+N/2]) * c[n],
// where c[n] is a complex coefficient found in the
// external memory file COEFFILE.
// When y[0] is output, a synchronization bit o_sync will be true as
// well, otherwise it will be zero.
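//
// As a small numeric illustration (made-up values, ignoring the
// fixed-point scaling of c[n] and the rounding to OWIDTH bits):
//	x[n] = 3 + j2,  x[n+N/2] = 1 - j1,  c[n] = -j
//	y[n    ] = (3 + j2) + (1 - j1)        = 4 + j1
//	y[n+N/2] = ((3 + j2) - (1 - j1)) * -j = (2 + j3) * -j = 3 - j2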
//
// Most of the work to do this is done within the butterfly, whether the
// hardware accelerated butterfly (uses a DSP) or not.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module fftstage(i_clk, i_reset, i_ce, i_sync, i_data, o_data, o_sync);
parameter IWIDTH=15,CWIDTH=20,OWIDTH=16;
// Parameters specific to the core that should be changed when this
// core is built ... Note that the minimum LGSPAN (the base two log
// of the span, or the base two log of the current FFT size) is 3.
// Smaller spans (i.e. the span of 2) must use the dbl laststage module.
parameter LGWIDTH=10, LGSPAN=8, BFLYSHIFT=0;
parameter [0:0] OPT_HWMPY = 1;
// Clocks per CE. If your incoming data rate is less than 50% of your
// clock speed, you can set CKPCE to 2'b10, make sure there's at least
// one clock between cycles when i_ce is high, and then use two
// multiplies instead of three. Setting CKPCE to 2'b11, and insisting
// on at least two clocks with i_ce low between cycles with i_ce high,
// lets the hardware optimized butterfly code use one multiply
// instead of two.
parameter CKPCE = 1;
// The COEFFILE parameter contains the name of the file containing the
// FFT twiddle factors
parameter COEFFILE="cmem_o2048.hex";
 
`ifdef VERILATOR
parameter [0:0] ZERO_ON_IDLE = 1'b0;
`else
localparam [0:0] ZERO_ON_IDLE = 1'b0;
`endif // VERILATOR
 
input i_clk, i_reset, i_ce, i_sync;
input [(2*IWIDTH-1):0] i_data;
output reg [(2*OWIDTH-1):0] o_data;
output reg o_sync;
 
reg wait_for_sync;
reg [(2*IWIDTH-1):0] ib_a, ib_b;
reg [(2*CWIDTH-1):0] ib_c;
reg ib_sync;
 
reg b_started;
wire ob_sync;
wire [(2*OWIDTH-1):0] ob_a, ob_b;
 
// cmem is defined as an array of complex values,
// where the top CWIDTH bits are the real value and the bottom
// CWIDTH bits are the imaginary value.
//
// cmem[i] = { (2^(CWIDTH-2)) * cos(2*pi*i/(2^LGWIDTH)),
// (2^(CWIDTH-2)) * sin(2*pi*i/(2^LGWIDTH)) };
//
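// For example, going by the formula above with CWIDTH=20 (an
// assumed width, chosen only for illustration), entry 0 has
// cos(0)=1 and sin(0)=0, so one would expect
//	cmem[0] = { 20'h40000, 20'h00000 } = 40'h4000000000
// since 2^(CWIDTH-2) = 2^18 = 20'h40000. The generated hex
// files themselves are not reproduced here.
//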
reg [(2*CWIDTH-1):0] cmem [0:((1<<LGSPAN)-1)];
initial $readmemh(COEFFILE,cmem);
 
reg [(LGSPAN):0] iaddr;
reg [(2*IWIDTH-1):0] imem [0:((1<<LGSPAN)-1)];
 
reg [LGSPAN:0] oB;
reg [(2*OWIDTH-1):0] omem [0:((1<<LGSPAN)-1)];
 
initial wait_for_sync = 1'b1;
initial iaddr = 0;
always @(posedge i_clk)
if (i_reset)
begin
wait_for_sync <= 1'b1;
iaddr <= 0;
end else if ((i_ce)&&((!wait_for_sync)||(i_sync)))
begin
//
// First step: Record what we're not ready to use yet
//
iaddr <= iaddr + { {(LGSPAN){1'b0}}, 1'b1 };
wait_for_sync <= 1'b0;
end
always @(posedge i_clk) // Need to make certain here that we don't read
if ((i_ce)&&(!iaddr[LGSPAN])) // and write the same address on
imem[iaddr[(LGSPAN-1):0]] <= i_data; // the same clk
 
//
// Now, we have all the inputs, so let's feed the butterfly
//
initial ib_sync = 1'b0;
always @(posedge i_clk)
if (i_reset)
ib_sync <= 1'b0;
else if (i_ce)
begin
// Set the sync to true on the very first
// valid input in, and hence on the very
// first valid data out per FFT.
ib_sync <= (iaddr==(1<<(LGSPAN)));
end
 
always @(posedge i_clk)
if (i_ce)
begin
// One input from memory, ...
ib_a <= imem[iaddr[(LGSPAN-1):0]];
// One input clocked in from the top
ib_b <= i_data;
// and the coefficient or twiddle factor
ib_c <= cmem[iaddr[(LGSPAN-1):0]];
end
 
// The idle register keeps track of when an input to the butterfly
// is important and going to be used. It drives the flag used
// below: when useful values are placed into the butterfly (idle=0)
// they are passed through unchanged, and when the inputs to the
// butterfly are irrelevant and will be ignored (idle=1) they are
// set to zero. This functionality is not designed to be used in
// operation, but only within a Verilator simulation context when
// chasing a bug. In this limited environment, the non-zero answers
// will stand out in a trace, making it easier to highlight a bug.
reg idle;
generate if (ZERO_ON_IDLE)
begin
initial idle = 1;
always @(posedge i_clk)
if (i_reset)
idle <= 1'b1;
else if (i_ce)
idle <= (!iaddr[LGSPAN])&&(!wait_for_sync);
 
end else begin
 
always @(*) idle = 0;
 
end endgenerate
 
generate if (OPT_HWMPY)
begin : HWBFLY
hwbfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),
.CKPCE(CKPCE), .SHIFT(BFLYSHIFT))
bfly(i_clk, i_reset, i_ce, (idle)?0:ib_c,
(idle || (!i_ce)) ? 0:ib_a,
(idle || (!i_ce)) ? 0:ib_b,
(ib_sync)&&(i_ce),
ob_a, ob_b, ob_sync);
end else begin : FWBFLY
butterfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),
.CKPCE(CKPCE),.SHIFT(BFLYSHIFT))
bfly(i_clk, i_reset, i_ce,
(idle||(!i_ce))?0:ib_c,
(idle||(!i_ce))?0:ib_a,
(idle||(!i_ce))?0:ib_b,
(ib_sync&&i_ce),
ob_a, ob_b, ob_sync);
end endgenerate
 
//
// Next step: recover the outputs from the butterfly
//
initial oB = 0;
initial o_sync = 0;
initial b_started = 0;
always @(posedge i_clk)
if (i_reset)
begin
oB <= 0;
o_sync <= 0;
b_started <= 0;
end else if (i_ce)
begin
o_sync <= (!oB[LGSPAN])?ob_sync : 1'b0;
if (ob_sync||b_started)
oB <= oB + { {(LGSPAN){1'b0}}, 1'b1 };
if ((ob_sync)&&(!oB[LGSPAN]))
// A butterfly output is available
b_started <= 1'b1;
end
 
reg [(LGSPAN-1):0] dly_addr;
reg [(2*OWIDTH-1):0] dly_value;
always @(posedge i_clk)
if (i_ce)
begin
dly_addr <= oB[(LGSPAN-1):0];
dly_value <= ob_b;
end
always @(posedge i_clk)
if (i_ce)
omem[dly_addr] <= dly_value;
 
always @(posedge i_clk)
if (i_ce)
o_data <= (!oB[LGSPAN])?ob_a : omem[oB[(LGSPAN-1):0]];
 
endmodule
/trunk/rtl/hwbfly.v
0,0 → 1,709
////////////////////////////////////////////////////////////////////////////////
//
// Filename: hwbfly.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This routine is identical to the one found in
// 'butterfly.v', save only that it uses the Verilog
// operator '*' in the hope that the synthesizer will be able to
// implement it with hardware multiply resources.
//
// It is assumed that a hardware multiply can complete its operation in
// a single clock.
//
// Operation:
//
// Given two inputs, A (i_left) and B (i_right), and a complex
// coefficient C (i_coef), return two outputs, O1 and O2, where:
//
// O1 = A + B, and
// O2 = (A - B)*C
//
// This operation is commonly known as a Decimation in Frequency (DIF)
// Radix-2 Butterfly.
// O1 and O2 are rounded to OWIDTH bits before being returned in o_left
// and o_right. If SHIFT is one, an extra bit is dropped from these
// values during the rounding process.
//
// Further, since these outputs will take some number of clocks to
// calculate, we'll pipe a value (i_aux) through the system and return
// it with the results (o_aux), so you can synchronize to the outgoing
// output stream.
//
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module hwbfly(i_clk, i_reset, i_ce, i_coef, i_left, i_right, i_aux,
o_left, o_right, o_aux);
// Public changeable parameters ...
// - IWIDTH, number of bits in each component of the input
// - CWIDTH, number of bits in each component of the twiddle factor
// - OWIDTH, number of bits in each component of the output
parameter IWIDTH=16,CWIDTH=IWIDTH+4,OWIDTH=IWIDTH+1;
// Drop an additional bit on the output?
parameter SHIFT=0;
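// The minimum number of clocks per clock enable (i_ce): 1, 2, or 3.
// With CKPCE=2 (or 3), i_ce must stay low for at least one (or two)
// clocks after every clock on which it is high.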
parameter [1:0] CKPCE=1;
//
input i_clk, i_reset, i_ce;
input [(2*CWIDTH-1):0] i_coef;
input [(2*IWIDTH-1):0] i_left, i_right;
input i_aux;
output wire [(2*OWIDTH-1):0] o_left, o_right;
output reg o_aux;
 
 
reg [(2*IWIDTH-1):0] r_left, r_right;
reg r_aux, r_aux_2;
reg [(2*CWIDTH-1):0] r_coef;
wire signed [(IWIDTH-1):0] r_left_r, r_left_i, r_right_r, r_right_i;
assign r_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];
assign r_left_i = r_left[ (IWIDTH-1):0];
assign r_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];
assign r_right_i = r_right[(IWIDTH-1):0];
reg signed [(CWIDTH-1):0] ir_coef_r, ir_coef_i;
 
reg signed [(IWIDTH):0] r_sum_r, r_sum_i, r_dif_r, r_dif_i;
 
reg [(2*IWIDTH+2):0] leftv, leftvv;
 
// Set up the input to the multiply
initial r_aux = 1'b0;
initial r_aux_2 = 1'b0;
always @(posedge i_clk)
if (i_reset)
begin
r_aux <= 1'b0;
r_aux_2 <= 1'b0;
end else if (i_ce)
begin
// One clock just latches the inputs
r_aux <= i_aux;
// Next clock adds/subtracts
// Other inputs are simply delayed on second clock
r_aux_2 <= r_aux;
end
always @(posedge i_clk)
if (i_ce)
begin
// One clock just latches the inputs
r_left <= i_left; // No change in # of bits
r_right <= i_right;
r_coef <= i_coef;
// Next clock adds/subtracts
r_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits
r_sum_i <= r_left_i + r_right_i;
r_dif_r <= r_left_r - r_right_r;
r_dif_i <= r_left_i - r_right_i;
// Other inputs are simply delayed on second clock
ir_coef_r <= r_coef[(2*CWIDTH-1):CWIDTH];
ir_coef_i <= r_coef[(CWIDTH-1):0];
end
 
 
// See comments in the butterfly.v source file for a discussion of
// these operations and the appropriate bit widths.
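//
// As a brief sketch (inferred from the logic below rather than quoted
// from butterfly.v), write D = A - B = dr + j*di and C = cr + j*ci.
// The names dr, di, cr and ci are shorthand for this note only.
// The three products formed below are then
//
// p_one = cr * dr
// p_two = ci * di
// p_three = (cr + ci) * (dr + di)
//
// from which the complex product D*C is recovered as
//
// Re{D*C} = p_one - p_two
// Im{D*C} = p_three - p_one - p_two
//
// since p_three = cr*dr + cr*di + ci*dr + ci*di. This costs three
// multiplies rather than four.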
 
wire signed [((IWIDTH+1)+(CWIDTH)-1):0] p_one, p_two;
wire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] p_three;
 
initial leftv = 0;
initial leftvv = 0;
always @(posedge i_clk)
if (i_reset)
begin
leftv <= 0;
leftvv <= 0;
end else if (i_ce)
begin
// Second clock, pipeline = 1
leftv <= { r_aux_2, r_sum_r, r_sum_i };
 
// Third clock: no multiply takes place here, leftv is
// simply delayed to stay aligned with the products
leftvv <= leftv;
end
 
generate if (CKPCE <= 1)
begin : CKPCE_ONE
// Coefficient multiply inputs
reg signed [(CWIDTH-1):0] p1c_in, p2c_in;
// Data multiply inputs
reg signed [(IWIDTH):0] p1d_in, p2d_in;
// Product 3, coefficient input
reg signed [(CWIDTH):0] p3c_in;
// Product 3, data input
reg signed [(IWIDTH+1):0] p3d_in;
 
reg signed [((IWIDTH+1)+(CWIDTH)-1):0] rp_one, rp_two;
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three;
 
always @(posedge i_clk)
if (i_ce)
begin
// Second clock, pipeline = 1
p1c_in <= ir_coef_r;
p2c_in <= ir_coef_i;
p1d_in <= r_dif_r;
p2d_in <= r_dif_i;
p3c_in <= ir_coef_i + ir_coef_r;
p3d_in <= r_dif_r + r_dif_i;
end
 
`ifndef FORMAL
always @(posedge i_clk)
if (i_ce)
begin
// Third clock, pipeline = 3
// As desired, each of these lines infers a DSP48
rp_one <= p1c_in * p1d_in;
rp_two <= p2c_in * p2d_in;
rp_three <= p3c_in * p3d_in;
end
`else
wire signed [((IWIDTH+1)+(CWIDTH)-1):0] pre_rp_one, pre_rp_two;
wire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] pre_rp_three;
 
abs_mpy #(CWIDTH,IWIDTH+1,1'b1)
onei(p1c_in, p1d_in, pre_rp_one);
abs_mpy #(CWIDTH,IWIDTH+1,1'b1)
twoi(p2c_in, p2d_in, pre_rp_two);
abs_mpy #(CWIDTH+1,IWIDTH+2,1'b1)
threei(p3c_in, p3d_in, pre_rp_three);
 
always @(posedge i_clk)
if (i_ce)
begin
rp_one <= pre_rp_one;
rp_two <= pre_rp_two;
rp_three <= pre_rp_three;
end
`endif // FORMAL
 
assign p_one = rp_one;
assign p_two = rp_two;
assign p_three = rp_three;
 
end else if (CKPCE <= 2)
begin : CKPCE_TWO
// Coefficient multiply inputs
reg [2*(CWIDTH)-1:0] mpy_pipe_c;
// Data multiply inputs
reg [2*(IWIDTH+1)-1:0] mpy_pipe_d;
wire signed [(CWIDTH-1):0] mpy_pipe_vc;
wire signed [(IWIDTH):0] mpy_pipe_vd;
//
reg signed [(CWIDTH+1)-1:0] mpy_cof_sum;
reg signed [(IWIDTH+2)-1:0] mpy_dif_sum;
 
assign mpy_pipe_vc = mpy_pipe_c[2*(CWIDTH)-1:CWIDTH];
assign mpy_pipe_vd = mpy_pipe_d[2*(IWIDTH+1)-1:IWIDTH+1];
 
reg mpy_pipe_v;
reg ce_phase;
 
reg signed [(CWIDTH+IWIDTH+1)-1:0] mpy_pipe_out;
reg signed [IWIDTH+CWIDTH+3-1:0] longmpy;
 
 
initial ce_phase = 1'b1;
always @(posedge i_clk)
if (i_reset)
ce_phase <= 1'b1;
else if (i_ce)
ce_phase <= 1'b0;
else
ce_phase <= 1'b1;
 
always @(*)
mpy_pipe_v = (i_ce)||(!ce_phase);
 
always @(posedge i_clk)
if (!ce_phase)
begin
// Pre-clock
mpy_pipe_c[2*CWIDTH-1:0] <=
{ ir_coef_r, ir_coef_i };
mpy_pipe_d[2*(IWIDTH+1)-1:0] <=
{ r_dif_r, r_dif_i };
 
mpy_cof_sum <= ir_coef_i + ir_coef_r;
mpy_dif_sum <= r_dif_r + r_dif_i;
 
end else if (i_ce)
begin
// First clock
mpy_pipe_c[2*(CWIDTH)-1:0] <= {
mpy_pipe_c[(CWIDTH)-1:0], {(CWIDTH){1'b0}} };
mpy_pipe_d[2*(IWIDTH+1)-1:0] <= {
mpy_pipe_d[(IWIDTH+1)-1:0], {(IWIDTH+1){1'b0}} };
end
 
`ifndef FORMAL
always @(posedge i_clk)
if (i_ce) // First clock
longmpy <= mpy_cof_sum * mpy_dif_sum;
 
always @(posedge i_clk)
if (mpy_pipe_v)
mpy_pipe_out <= mpy_pipe_vc * mpy_pipe_vd;
`else
wire signed [IWIDTH+CWIDTH+3-1:0] pre_longmpy;
wire signed [(CWIDTH+IWIDTH+1)-1:0] pre_mpy_pipe_out;
 
abs_mpy #(CWIDTH+1,IWIDTH+2,1)
longmpyi(mpy_cof_sum, mpy_dif_sum, pre_longmpy);
 
always @(posedge i_clk)
if (i_ce)
longmpy <= pre_longmpy;
 
 
abs_mpy #(CWIDTH,IWIDTH+1,1)
mpy_pipe_outi(mpy_pipe_vc, mpy_pipe_vd, pre_mpy_pipe_out);
 
always @(posedge i_clk)
if (mpy_pipe_v)
mpy_pipe_out <= pre_mpy_pipe_out;
`endif
 
reg signed [((IWIDTH+1)+(CWIDTH)-1):0] rp_one,
rp2_one, rp_two;
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three;
 
always @(posedge i_clk)
if (!ce_phase) // 1.5 clock
rp_one <= mpy_pipe_out;
always @(posedge i_clk)
if (i_ce) // two clocks
rp_two <= mpy_pipe_out;
always @(posedge i_clk)
if (i_ce) // Second clock
rp_three<= longmpy;
always @(posedge i_clk)
if (i_ce)
rp2_one<= rp_one;
 
assign p_one = rp2_one;
assign p_two = rp_two;
assign p_three= rp_three;
 
end else if (CKPCE <= 2'b11)
begin : CKPCE_THREE
// Coefficient multiply inputs
reg [3*(CWIDTH+1)-1:0] mpy_pipe_c;
// Data multiply inputs
reg [3*(IWIDTH+2)-1:0] mpy_pipe_d;
wire signed [(CWIDTH):0] mpy_pipe_vc;
wire signed [(IWIDTH+1):0] mpy_pipe_vd;
 
assign mpy_pipe_vc = mpy_pipe_c[3*(CWIDTH+1)-1:2*(CWIDTH+1)];
assign mpy_pipe_vd = mpy_pipe_d[3*(IWIDTH+2)-1:2*(IWIDTH+2)];
 
reg mpy_pipe_v;
reg [2:0] ce_phase;
 
reg signed [ (CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;
 
initial ce_phase = 3'b011;
always @(posedge i_clk)
if (i_reset)
ce_phase <= 3'b011;
else if (i_ce)
ce_phase <= 3'b000;
else if (ce_phase != 3'b011)
ce_phase <= ce_phase + 1'b1;
 
always @(*)
mpy_pipe_v = (i_ce)||(ce_phase < 3'b010);
 
always @(posedge i_clk)
if (ce_phase == 3'b000)
begin
// Second clock
mpy_pipe_c[3*(CWIDTH+1)-1:(CWIDTH+1)] <= {
ir_coef_r[CWIDTH-1], ir_coef_r,
ir_coef_i[CWIDTH-1], ir_coef_i };
mpy_pipe_c[CWIDTH:0] <= ir_coef_i + ir_coef_r;
mpy_pipe_d[3*(IWIDTH+2)-1:(IWIDTH+2)] <= {
r_dif_r[IWIDTH], r_dif_r,
r_dif_i[IWIDTH], r_dif_i };
mpy_pipe_d[(IWIDTH+2)-1:0] <= r_dif_r + r_dif_i;
 
end else if (mpy_pipe_v)
begin
mpy_pipe_c[3*(CWIDTH+1)-1:0] <= {
mpy_pipe_c[2*(CWIDTH+1)-1:0], {(CWIDTH+1){1'b0}} };
mpy_pipe_d[3*(IWIDTH+2)-1:0] <= {
mpy_pipe_d[2*(IWIDTH+2)-1:0], {(IWIDTH+2){1'b0}} };
end
 
`ifndef FORMAL
always @(posedge i_clk)
if (mpy_pipe_v)
mpy_pipe_out <= mpy_pipe_vc * mpy_pipe_vd;
 
`else // FORMAL
wire signed [ (CWIDTH+IWIDTH+3)-1:0] pre_mpy_pipe_out;
 
abs_mpy #(CWIDTH+1,IWIDTH+2,1)
mpy_pipe_outi(mpy_pipe_vc, mpy_pipe_vd, pre_mpy_pipe_out);
always @(posedge i_clk)
if (mpy_pipe_v)
mpy_pipe_out <= pre_mpy_pipe_out;
`endif // FORMAL
 
reg signed [((IWIDTH+1)+(CWIDTH)-1):0] rp_one, rp_two,
rp2_one, rp2_two;
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three, rp2_three;
 
always @(posedge i_clk)
if(i_ce)
rp_one <= mpy_pipe_out[(CWIDTH+IWIDTH):0];
always @(posedge i_clk)
if(ce_phase == 3'b000)
rp_two <= mpy_pipe_out[(CWIDTH+IWIDTH):0];
always @(posedge i_clk)
if(ce_phase == 3'b001)
rp_three <= mpy_pipe_out;
always @(posedge i_clk)
if (i_ce)
begin
rp2_one<= rp_one;
rp2_two<= rp_two;
rp2_three<= rp_three;
end
assign p_one = rp2_one;
assign p_two = rp2_two;
assign p_three = rp2_three;
 
end endgenerate
wire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] w_one, w_two;
assign w_one = { {(2){p_one[((IWIDTH+1)+(CWIDTH)-1)]}}, p_one };
assign w_two = { {(2){p_two[((IWIDTH+1)+(CWIDTH)-1)]}}, p_two };
 
// These values are held in memory and delayed during the
// multiply. Here, we recover them. During the multiply,
// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},
// therefore, the left_x values need to be left shifted by
// CWIDTH-2 bits as well to keep the same scale. The additional
// (upper) bits come from a sign extension.
wire aux_s;
wire signed [(IWIDTH+CWIDTH):0] left_si, left_sr;
reg [(2*IWIDTH+2):0] left_saved;
assign left_sr = { {2{left_saved[2*(IWIDTH+1)-1]}}, left_saved[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1'b0}} };
assign left_si = { {2{left_saved[(IWIDTH+1)-1]}}, left_saved[((IWIDTH+1)-1):0], {(CWIDTH-2){1'b0}} };
assign aux_s = left_saved[2*IWIDTH+2];
 
(* use_dsp48="no" *)
reg signed [(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;
 
initial left_saved = 0;
initial o_aux = 1'b0;
always @(posedge i_clk)
if (i_reset)
begin
left_saved <= 0;
o_aux <= 1'b0;
end else if (i_ce)
begin
// First clock, recover all values
left_saved <= leftvv;
 
// Second clock, round and latch for final clock
o_aux <= aux_s;
end
always @(posedge i_clk)
if (i_ce)
begin
// These values are IWIDTH+CWIDTH+3 bits wide
// although they only need to be (IWIDTH+1)
// + (CWIDTH) bits wide. (We've got two
// extra bits we need to get rid of.)
 
// These two lines could otherwise infer DSP48's.
// To keep from using extra DSP48 resources,
// they are prevented from using DSP48's
// by the (* use_dsp48 ... *) attribute above.
mpy_r <= w_one - w_two;
mpy_i <= p_three - w_one - w_two;
end
 
// Round the results
wire signed [(OWIDTH-1):0] rnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;
 
convround #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_r(i_clk, i_ce,
left_sr, rnd_left_r);
 
convround #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_i(i_clk, i_ce,
left_si, rnd_left_i);
 
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,
mpy_r, rnd_right_r);
 
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,
mpy_i, rnd_right_i);
 
// As a final step, we pack our outputs into two packed two's
// complement numbers per output word, so that each output word
// has (2*OWIDTH) bits in it, with the top half being the real
// portion and the bottom half being the imaginary portion.
assign o_left = { rnd_left_r, rnd_left_i };
assign o_right= { rnd_right_r,rnd_right_i};
 
`ifdef VERILATOR
`define FORMAL
`endif
`ifdef FORMAL
localparam F_LGDEPTH = 3;
localparam F_DEPTH = 5;
localparam [F_LGDEPTH-1:0] F_D = F_DEPTH-1;
 
reg signed [IWIDTH-1:0] f_dlyleft_r [0:F_DEPTH-1];
reg signed [IWIDTH-1:0] f_dlyleft_i [0:F_DEPTH-1];
reg signed [IWIDTH-1:0] f_dlyright_r [0:F_DEPTH-1];
reg signed [IWIDTH-1:0] f_dlyright_i [0:F_DEPTH-1];
reg signed [CWIDTH-1:0] f_dlycoeff_r [0:F_DEPTH-1];
reg signed [CWIDTH-1:0] f_dlycoeff_i [0:F_DEPTH-1];
reg signed [F_DEPTH-1:0] f_dlyaux;
 
always @(posedge i_clk)
if (i_reset)
f_dlyaux <= 0;
else if (i_ce)
f_dlyaux <= { f_dlyaux[F_DEPTH-2:0], i_aux };
 
always @(posedge i_clk)
if (i_ce)
begin
f_dlyleft_r[0] <= i_left[ (2*IWIDTH-1):IWIDTH];
f_dlyleft_i[0] <= i_left[ ( IWIDTH-1):0];
f_dlyright_r[0] <= i_right[(2*IWIDTH-1):IWIDTH];
f_dlyright_i[0] <= i_right[( IWIDTH-1):0];
f_dlycoeff_r[0] <= i_coef[ (2*CWIDTH-1):CWIDTH];
f_dlycoeff_i[0] <= i_coef[ ( CWIDTH-1):0];
end
 
genvar k;
generate for(k=1; k<F_DEPTH; k=k+1)
 
always @(posedge i_clk)
if (i_ce)
begin
f_dlyleft_r[k] <= f_dlyleft_r[ k-1];
f_dlyleft_i[k] <= f_dlyleft_i[ k-1];
f_dlyright_r[k] <= f_dlyright_r[k-1];
f_dlyright_i[k] <= f_dlyright_i[k-1];
f_dlycoeff_r[k] <= f_dlycoeff_r[k-1];
f_dlycoeff_i[k] <= f_dlycoeff_i[k-1];
end
 
endgenerate
 
`ifdef VERILATOR
`else
always @(posedge i_clk)
if ((!$past(i_ce))&&(!$past(i_ce,2))&&(!$past(i_ce,3))
&&(!$past(i_ce,4)))
assume(i_ce);
 
generate if (CKPCE <= 1)
begin
 
// i_ce is allowed to be anything in this mode
 
end else if (CKPCE == 2)
begin : F_CKPCE_TWO
 
always @(posedge i_clk)
if ($past(i_ce))
assume(!i_ce);
 
end else if (CKPCE == 3)
begin : F_CKPCE_THREE
 
always @(posedge i_clk)
if (($past(i_ce))||($past(i_ce,2)))
assume(!i_ce);
 
end endgenerate
`endif
reg [F_LGDEPTH-1:0] f_startup_counter;
initial f_startup_counter = 0;
always @(posedge i_clk)
if (i_reset)
f_startup_counter <= 0;
else if ((i_ce)&&(!(&f_startup_counter)))
f_startup_counter <= f_startup_counter + 1;
 
reg signed [IWIDTH:0] f_sumr, f_sumi;
always @(*)
begin
f_sumr = f_dlyleft_r[F_D] + f_dlyright_r[F_D];
f_sumi = f_dlyleft_i[F_D] + f_dlyright_i[F_D];
end
 
wire signed [IWIDTH+CWIDTH:0] f_sumrx, f_sumix;
assign f_sumrx = { {(2){f_sumr[IWIDTH]}}, f_sumr, {(CWIDTH-2){1'b0}} };
assign f_sumix = { {(2){f_sumi[IWIDTH]}}, f_sumi, {(CWIDTH-2){1'b0}} };
 
reg signed [IWIDTH:0] f_difr, f_difi;
always @(*)
begin
f_difr = f_dlyleft_r[F_D] - f_dlyright_r[F_D];
f_difi = f_dlyleft_i[F_D] - f_dlyright_i[F_D];
end
 
wire signed [IWIDTH+CWIDTH+3-1:0] f_difrx, f_difix;
assign f_difrx = { {(CWIDTH+2){f_difr[IWIDTH]}}, f_difr };
assign f_difix = { {(CWIDTH+2){f_difi[IWIDTH]}}, f_difi };
 
wire signed [IWIDTH+CWIDTH+3-1:0] f_widecoeff_r, f_widecoeff_i;
assign f_widecoeff_r = {{(IWIDTH+3){f_dlycoeff_r[F_D][CWIDTH-1]}},
f_dlycoeff_r[F_D] };
assign f_widecoeff_i = {{(IWIDTH+3){f_dlycoeff_i[F_D][CWIDTH-1]}},
f_dlycoeff_i[F_D] };
 
always @(posedge i_clk)
if (f_startup_counter > F_D)
begin
assert(left_sr == f_sumrx);
assert(left_si == f_sumix);
assert(aux_s == f_dlyaux[F_D]);
 
if ((f_difr == 0)&&(f_difi == 0))
begin
assert(mpy_r == 0);
assert(mpy_i == 0);
end else if ((f_dlycoeff_r[F_D] == 0)
&&(f_dlycoeff_i[F_D] == 0))
begin
assert(mpy_r == 0);
assert(mpy_i == 0);
end
 
if ((f_dlycoeff_r[F_D] == 1)&&(f_dlycoeff_i[F_D] == 0))
begin
assert(mpy_r == f_difrx);
assert(mpy_i == f_difix);
end
 
if ((f_dlycoeff_r[F_D] == 0)&&(f_dlycoeff_i[F_D] == 1))
begin
assert(mpy_r == -f_difix);
assert(mpy_i == f_difrx);
end
 
if ((f_difr == 1)&&(f_difi == 0))
begin
assert(mpy_r == f_widecoeff_r);
assert(mpy_i == f_widecoeff_i);
end
 
if ((f_difr == 0)&&(f_difi == 1))
begin
assert(mpy_r == -f_widecoeff_i);
assert(mpy_i == f_widecoeff_r);
end
end
 
// Let's see if we can improve our performance at all by
// moving our test one clock earlier. If nothing else, it should
// help induction finish one (or more) clocks earlier than
// otherwise.
 
 
reg signed [IWIDTH:0] f_predifr, f_predifi;
always @(*)
begin
f_predifr = f_dlyleft_r[F_D-1] - f_dlyright_r[F_D-1];
f_predifi = f_dlyleft_i[F_D-1] - f_dlyright_i[F_D-1];
end
 
wire signed [IWIDTH+CWIDTH+1-1:0] f_predifrx, f_predifix;
assign f_predifrx = { {(CWIDTH){f_predifr[IWIDTH]}}, f_predifr };
assign f_predifix = { {(CWIDTH){f_predifi[IWIDTH]}}, f_predifi };
 
reg signed [CWIDTH:0] f_sumcoef;
reg signed [IWIDTH+1:0] f_sumdiff;
always @(*)
begin
f_sumcoef = f_dlycoeff_r[F_D-1] + f_dlycoeff_i[F_D-1];
f_sumdiff = f_predifr + f_predifi;
end
 
// Induction helpers
always @(posedge i_clk)
if (f_startup_counter >= F_D)
begin
if (f_dlycoeff_r[F_D-1] == 0)
assert(p_one == 0);
if (f_dlycoeff_i[F_D-1] == 0)
assert(p_two == 0);
 
if (f_dlycoeff_r[F_D-1] == 1)
assert(p_one == f_predifrx);
if (f_dlycoeff_i[F_D-1] == 1)
assert(p_two == f_predifix);
 
if (f_predifr == 0)
assert(p_one == 0);
if (f_predifi == 0)
assert(p_two == 0);
 
// verilator lint_off WIDTH
if (f_predifr == 1)
assert(p_one == f_dlycoeff_r[F_D-1]);
if (f_predifi == 1)
assert(p_two == f_dlycoeff_i[F_D-1]);
// verilator lint_on WIDTH
 
if (f_sumcoef == 0)
assert(p_three == 0);
if (f_sumdiff == 0)
assert(p_three == 0);
// verilator lint_off WIDTH
if (f_sumcoef == 1)
assert(p_three == f_sumdiff);
if (f_sumdiff == 1)
assert(p_three == f_sumcoef);
// verilator lint_on WIDTH
`ifdef VERILATOR
assert(p_one == f_predifr * f_dlycoeff_r[F_D-1]);
assert(p_two == f_predifi * f_dlycoeff_i[F_D-1]);
assert(p_three == f_sumdiff * f_sumcoef);
`endif // VERILATOR
end
 
`endif // FORMAL
endmodule
/trunk/rtl/ifftmain.v
0,0 → 1,276
////////////////////////////////////////////////////////////////////////////////
//
// Filename: ifftmain.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This is the main module in the General Purpose FPGA FFT
// implementation. As such, all other modules are subordinate
// to this one. This module accomplishes a fixed size complex inverse
// FFT on 2048 data points.
// The FFT is fully pipelined, and accepts as inputs two complex two's
// complement samples per clock.
//
// Ports:
// i_clk The clock. All operations are synchronous with this clock.
// i_reset Synchronous reset, active high. Setting this line will
// force the reset of all of the internals to this routine.
// Further, following a reset, the o_sync line will go
// high the same time the first output sample is valid.
// i_ce A clock enable line. If this line is set, this module
// will accept two complex values as inputs, and produce
// two (possibly empty) complex values as outputs.
// i_left The first of two complex input samples. This value is split
// into two two's complement numbers, 15 bits each, with
// the real portion in the high order bits, and the
// imaginary portion taking the bottom 15 bits.
// i_right This is the same thing as i_left, only this is the second of
// two such samples. Hence, i_left would contain input
// sample zero, i_right would contain sample one. On the
// next clock i_left would contain input sample two,
// i_right number three and so forth.
// o_left The first of two output samples, of the same format as i_left,
// only having 21 bits for each of the real and imaginary
// components, leading to 42 bits total.
// o_right The second of two output samples produced each clock. This has
// the same format as o_left.
// o_sync A one bit output indicating the first valid sample produced by
// this FFT following a reset. Thereafter, this will
// indicate the first sample of each FFT frame.
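//
// Usage: A minimal instantiation sketch. The signal names (clk, reset,
// ce, in_left, ...) are placeholders for this example only, and the
// widths assume the default IWIDTH=15 and OWIDTH=21:
//
// wire [29:0] in_left, in_right; // { real[14:0], imag[14:0] }
// wire [41:0] out_left, out_right; // { real[20:0], imag[20:0] }
// wire out_sync;
//
// ifftmain ifft_i(clk, reset, ce, in_left, in_right,
// out_left, out_right, out_sync);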
//
// Arguments: This file was computer generated using the following command
// line:
//
// % ./fftgen -i -d ../rtl -f 2048 -2 -p 0 -n 15 -a ../bench/cpp/ifftsize.h
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
//
//
module ifftmain(i_clk, i_reset, i_ce,
i_left, i_right,
o_left, o_right, o_sync);
parameter IWIDTH=15, OWIDTH=21, LGWIDTH=11;
//
input i_clk, i_reset, i_ce;
//
input [(2*IWIDTH-1):0] i_left, i_right;
output reg [(2*OWIDTH-1):0] o_left, o_right;
output reg o_sync;
 
 
// Outputs of the FFT, ready for bit reversal.
wire [(2*OWIDTH-1):0] br_left, br_right;
 
 
wire w_s2048;
// verilator lint_off UNUSED
wire w_os2048;
// verilator lint_on UNUSED
wire [31:0] w_e2048, w_o2048;
fftstage #(IWIDTH,IWIDTH+4,16,11,9,0,
0, 1, "icmem_e4096.hex")
stage_e2048(i_clk, i_reset, i_ce,
(!i_reset), i_left, w_e2048, w_s2048);
fftstage #(IWIDTH,IWIDTH+4,16,11,9,0,
0, 1, "icmem_o4096.hex")
stage_o2048(i_clk, i_reset, i_ce,
(!i_reset), i_right, w_o2048, w_os2048);
 
 
wire w_s1024;
// verilator lint_off UNUSED
wire w_os1024;
// verilator lint_on UNUSED
wire [33:0] w_e1024, w_o1024;
fftstage #(16,20,17,11,8,0,
0, 1, "icmem_e2048.hex")
stage_e1024(i_clk, i_reset, i_ce,
w_s2048, w_e2048, w_e1024, w_s1024);
fftstage #(16,20,17,11,8,0,
0, 1, "icmem_o2048.hex")
stage_o1024(i_clk, i_reset, i_ce,
w_s2048, w_o2048, w_o1024, w_os1024);
 
wire w_s512;
// verilator lint_off UNUSED
wire w_os512;
// verilator lint_on UNUSED
wire [33:0] w_e512, w_o512;
fftstage #(17,21,17,11,7,0,
0, 1, "icmem_e1024.hex")
stage_e512(i_clk, i_reset, i_ce,
w_s1024, w_e1024, w_e512, w_s512);
fftstage #(17,21,17,11,7,0,
0, 1, "icmem_o1024.hex")
stage_o512(i_clk, i_reset, i_ce,
w_s1024, w_o1024, w_o512, w_os512);
 
wire w_s256;
// verilator lint_off UNUSED
wire w_os256;
// verilator lint_on UNUSED
wire [35:0] w_e256, w_o256;
fftstage #(17,21,18,11,6,0,
0, 1, "icmem_e512.hex")
stage_e256(i_clk, i_reset, i_ce,
w_s512, w_e512, w_e256, w_s256);
fftstage #(17,21,18,11,6,0,
0, 1, "icmem_o512.hex")
stage_o256(i_clk, i_reset, i_ce,
w_s512, w_o512, w_o256, w_os256);
 
wire w_s128;
// verilator lint_off UNUSED
wire w_os128;
// verilator lint_on UNUSED
wire [35:0] w_e128, w_o128;
fftstage #(18,22,18,11,5,0,
0, 1, "icmem_e256.hex")
stage_e128(i_clk, i_reset, i_ce,
w_s256, w_e256, w_e128, w_s128);
fftstage #(18,22,18,11,5,0,
0, 1, "icmem_o256.hex")
stage_o128(i_clk, i_reset, i_ce,
w_s256, w_o256, w_o128, w_os128);
 
wire w_s64;
// verilator lint_off UNUSED
wire w_os64;
// verilator lint_on UNUSED
wire [37:0] w_e64, w_o64;
fftstage #(18,22,19,11,4,0,
0, 1, "icmem_e128.hex")
stage_e64(i_clk, i_reset, i_ce,
w_s128, w_e128, w_e64, w_s64);
fftstage #(18,22,19,11,4,0,
0, 1, "icmem_o128.hex")
stage_o64(i_clk, i_reset, i_ce,
w_s128, w_o128, w_o64, w_os64);
 
wire w_s32;
// verilator lint_off UNUSED
wire w_os32;
// verilator lint_on UNUSED
wire [37:0] w_e32, w_o32;
fftstage #(19,23,19,11,3,0,
0, 1, "icmem_e64.hex")
stage_e32(i_clk, i_reset, i_ce,
w_s64, w_e64, w_e32, w_s32);
fftstage #(19,23,19,11,3,0,
0, 1, "icmem_o64.hex")
stage_o32(i_clk, i_reset, i_ce,
w_s64, w_o64, w_o32, w_os32);
 
wire w_s16;
// verilator lint_off UNUSED
wire w_os16;
// verilator lint_on UNUSED
wire [39:0] w_e16, w_o16;
fftstage #(19,23,20,11,2,0,
0, 1, "icmem_e32.hex")
stage_e16(i_clk, i_reset, i_ce,
w_s32, w_e32, w_e16, w_s16);
fftstage #(19,23,20,11,2,0,
0, 1, "icmem_o32.hex")
stage_o16(i_clk, i_reset, i_ce,
w_s32, w_o32, w_o16, w_os16);
 
wire w_s8;
// verilator lint_off UNUSED
wire w_os8;
// verilator lint_on UNUSED
wire [39:0] w_e8, w_o8;
fftstage #(20,24,20,11,1,0,
0, 1, "icmem_e16.hex")
stage_e8(i_clk, i_reset, i_ce,
w_s16, w_e16, w_e8, w_s8);
fftstage #(20,24,20,11,1,0,
0, 1, "icmem_o16.hex")
stage_o8(i_clk, i_reset, i_ce,
w_s16, w_o16, w_o8, w_os8);
 
wire w_s4;
// verilator lint_off UNUSED
wire w_os4;
// verilator lint_on UNUSED
wire [41:0] w_e4, w_o4;
qtrstage #(20,21,11,0,1,0) stage_e4(i_clk, i_reset, i_ce,
w_s8, w_e8, w_e4, w_s4);
qtrstage #(20,21,11,1,1,0) stage_o4(i_clk, i_reset, i_ce,
w_s8, w_o8, w_o4, w_os4);
wire w_s2;
wire [41:0] w_e2, w_o2;
laststage #(21,21,0) stage_2(i_clk, i_reset, i_ce,
w_s4, w_e4, w_o4, w_e2, w_o2, w_s2);
 
 
// Prepare for a (potential) bit-reverse stage.
assign br_left = w_e2;
assign br_right = w_o2;
 
wire br_start;
reg r_br_started;
initial r_br_started = 1'b0;
always @(posedge i_clk)
if (i_reset)
r_br_started <= 1'b0;
else if (i_ce)
r_br_started <= r_br_started || w_s2;
assign br_start = r_br_started || w_s2;
 
// Now for the bit-reversal stage.
wire br_sync;
wire [(2*OWIDTH-1):0] br_o_left, br_o_right;
bitreverse #(11,21)
revstage(i_clk, i_reset,
(i_ce & br_start), br_left, br_right,
br_o_left, br_o_right, br_sync);
 
 
// Last clock: Register our outputs, we're done.
initial o_sync = 1'b0;
always @(posedge i_clk)
if (i_reset)
o_sync <= 1'b0;
else if (i_ce)
o_sync <= br_sync;
 
always @(posedge i_clk)
if (i_ce)
begin
o_left <= br_o_left;
o_right <= br_o_right;
end
 
 
endmodule
/trunk/rtl/laststage.v
0,0 → 1,171
////////////////////////////////////////////////////////////////////////////////
//
// Filename: laststage.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This is part of an FPGA implementation that will process
// the final stage of a decimate-in-frequency FFT, running
// through the data at two samples per clock. If you notice from the
// derivation of an FFT, the only time both even and odd samples are
// used at the same time is in this stage. Therefore, other than this
// stage and these twiddles, all of the other stages can run as two
// parallel stages (even and odd), each at one sample per clock.
//
// Operation:
// Given a stream of values, operate upon them as though they were
// value pairs, x[2n] and x[2n+1]. The stream begins when n=0, and ends
// when n=1. When the first x[0] value enters, the synchronization
// input, i_sync, must be true as well.
//
// For this stream, produce outputs
// y[2n ] = x[2n] + x[2n+1], and
// y[2n+1] = x[2n] - x[2n+1]
//
// When y[0] is output, a synchronization bit o_sync will be true as
// well, otherwise it will be zero.
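//
// For example (values chosen here purely for illustration), the pair
// x[2n] = 3 + j and x[2n+1] = 1 - 2j yields
// y[2n ] = (3 + 1) + j(1 + (-2)) = 4 - j, and
// y[2n+1] = (3 - 1) + j(1 - (-2)) = 2 + 3j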
//
//
// In this implementation, the output is valid one clock after the input
// is valid. The output also accumulates one bit above and beyond the
// number of bits in the input.
//
// i_clk A system clock
// i_reset A synchronous reset
// i_ce Circuit enable--nothing happens unless this line is high
// i_sync A synchronization signal, high once per FFT at the start
// i_left The first (even) complex sample input. The higher order
// bits contain the real portion, low order bits the
// imaginary portion, all in two's complement.
// i_right The next (odd) complex sample input, same format as
// i_left.
// o_left The first (even) complex output.
// o_right The next (odd) complex output.
// o_sync Output synchronization signal.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module laststage(i_clk, i_reset, i_ce, i_sync, i_left, i_right, o_left, o_right, o_sync);
parameter IWIDTH=16,OWIDTH=IWIDTH+1, SHIFT=0;
input i_clk, i_reset, i_ce, i_sync;
input [(2*IWIDTH-1):0] i_left, i_right;
output reg [(2*OWIDTH-1):0] o_left, o_right;
output reg o_sync;
 
wire signed [(IWIDTH-1):0] i_in_0r, i_in_0i, i_in_1r, i_in_1i;
assign i_in_0r = i_left[(2*IWIDTH-1):(IWIDTH)];
assign i_in_0i = i_left[(IWIDTH-1):0];
assign i_in_1r = i_right[(2*IWIDTH-1):(IWIDTH)];
assign i_in_1i = i_right[(IWIDTH-1):0];
wire [(OWIDTH-1):0] o_out_0r, o_out_0i,
o_out_1r, o_out_1i;
 
 
// Handle a potential rounding situation, when IWIDTH>=OWIDTH.
 
 
 
// As with any register connected to the sync pulse, these must
// have initial values and be reset on the i_reset signal.
// Other data values need only restrict their updates to i_ce
// enabled clocks, but sync's must obey resets and initial
// conditions as well.
reg rnd_sync, r_sync;
 
initial rnd_sync = 1'b0; // Sync into rounding
initial r_sync = 1'b0; // Sync coming out
always @(posedge i_clk)
if (i_reset)
begin
rnd_sync <= 1'b0;
r_sync <= 1'b0;
end else if (i_ce)
begin
rnd_sync <= i_sync;
r_sync <= rnd_sync;
end
 
// As with other variables, these are really only updated when in
// the processing pipeline, after the first i_sync. However, to
// eliminate as much unnecessary logic as possible, we toggle
// these any time the i_ce line is enabled, and don't reset
// them on i_reset.
// Don't forget that we accumulate a bit by adding two values
// together. Therefore our intermediate value must have one more
// bit than the two originals.
reg signed [(IWIDTH):0] rnd_in_0r, rnd_in_0i;
reg signed [(IWIDTH):0] rnd_in_1r, rnd_in_1i;
 
always @(posedge i_clk)
if (i_ce)
begin
//
rnd_in_0r <= i_in_0r + i_in_1r;
rnd_in_0i <= i_in_0i + i_in_1i;
//
rnd_in_1r <= i_in_0r - i_in_1r;
rnd_in_1i <= i_in_0i - i_in_1i;
//
end
 
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0r(i_clk, i_ce,
rnd_in_0r, o_out_0r);
 
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0i(i_clk, i_ce,
rnd_in_0i, o_out_0i);
 
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1r(i_clk, i_ce,
rnd_in_1r, o_out_1r);
 
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1i(i_clk, i_ce,
rnd_in_1i, o_out_1i);
 
 
// Prior versions of this routine did not include the extra
// clock and register/flip-flops that this routine requires.
// These are placed in here to correct a bug in Verilator, that
// otherwise struggles. (Hopefully this will fix the problem ...)
always @(posedge i_clk)
if (i_ce)
begin
o_left <= { o_out_0r, o_out_0i };
o_right <= { o_out_1r, o_out_1i };
end
 
initial o_sync = 1'b0; // Final sync coming out of module
always @(posedge i_clk)
if (i_reset)
o_sync <= 1'b0;
else if (i_ce)
o_sync <= r_sync;
 
endmodule
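The add/subtract step of `laststage` can be modelled in a few lines of C++. This is a minimal sketch rather than the project's own test bench: the names `Complex`, `Pair`, and `last_stage` are hypothetical, and it assumes the default `OWIDTH=IWIDTH+1` with `SHIFT=0`, under which the rounding step is assumed to pass the sums through unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the two-point butterfly in laststage.
// None of these names come from the project itself.
struct Complex {
	int32_t re, im;
};

struct Pair {
	Complex left, right;
};

// Adds and subtracts two iwidth-bit complex samples.  Each result needs
// iwidth+1 bits, matching rnd_in_0r .. rnd_in_1i in the Verilog.
inline Pair last_stage(Complex a, Complex b, int iwidth = 16) {
	// Inputs must fit in iwidth-bit two's complement
	assert(iwidth > 1 && iwidth < 31);
	const int32_t lo = -(int32_t(1) << (iwidth - 1));
	const int32_t hi = (int32_t(1) << (iwidth - 1)) - 1;
	assert(a.re >= lo && a.re <= hi && a.im >= lo && a.im <= hi);
	assert(b.re >= lo && b.re <= hi && b.im >= lo && b.im <= hi);

	Pair out;
	out.left  = { a.re + b.re, a.im + b.im };	// o_left  (sum)
	out.right = { a.re - b.re, a.im - b.im };	// o_right (difference)
	return out;
}
```

Feeding the model full-scale inputs shows why the extra bit is needed: neither the sum nor the difference of two 16-bit extremes fits back into 16 bits.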
/trunk/rtl/longbimpy.v
0,0 → 1,179
////////////////////////////////////////////////////////////////////////////////
//
// Filename: ../rtl/longbimpy.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: A portable shift and add multiply, built with the knowledge
// of the existence of a six bit LUT and carry chain. That knowledge
// allows us to multiply two bits from one value at a time against all
// of the bits of the other value. This sub multiply is called the
// bimpy.
//
// For minimal processing delay, make the first parameter the one with
// the least bits, so that AWIDTH <= BWIDTH.
//
//
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module longbimpy(i_clk, i_ce, i_a_unsorted, i_b_unsorted, o_r);
parameter IAW=8, // The width of i_a, min width is 5
IBW=12, // The width of i_b, can be anything
// The following three parameters should not be changed
// by any implementation, but are based upon hardware
// and the above values:
OW=IAW+IBW; // The output width
localparam AW = (IAW<IBW) ? IAW : IBW,
BW = (IAW<IBW) ? IBW : IAW,
IW=(AW+1)&(-2), // Internal width of A
LUTB=2, // How many bits we can multiply by at once
TLEN=(AW+(LUTB-1))/LUTB; // Nmbr of rows in our tableau
input i_clk, i_ce;
input [(IAW-1):0] i_a_unsorted;
input [(IBW-1):0] i_b_unsorted;
output reg [(AW+BW-1):0] o_r;
 
//
// Swap parameter order, so that AW <= BW -- for performance
// reasons
wire [AW-1:0] i_a;
wire [BW-1:0] i_b;
generate if (IAW <= IBW)
begin : NO_PARAM_CHANGE
assign i_a = i_a_unsorted;
assign i_b = i_b_unsorted;
end else begin : SWAP_PARAMETERS
assign i_a = i_b_unsorted;
assign i_b = i_a_unsorted;
end endgenerate
 
reg [(IW-1):0] u_a;
reg [(BW-1):0] u_b;
reg sgn;
 
reg [(IW-1-2*(LUTB)):0] r_a[0:(TLEN-3)];
reg [(BW-1):0] r_b[0:(TLEN-3)];
reg [(TLEN-1):0] r_s;
reg [(IW+BW-1):0] acc[0:(TLEN-2)];
genvar k;
 
// First step:
// Switch to unsigned arithmetic for our multiply, keeping track
// of the sign along the way. We'll then add the sign again later at
// the end.
//
// If we were forced to stay within two's complement arithmetic,
// taking the absolute value here would require an additional bit.
// However, because our results are now unsigned, we can stay
// within the number of bits given (for now).
generate if (IW > AW)
begin
always @(posedge i_clk)
if (i_ce)
u_a <= { 1'b0, (i_a[AW-1])?(-i_a):(i_a) };
end else begin
always @(posedge i_clk)
if (i_ce)
u_a <= (i_a[AW-1])?(-i_a):(i_a);
end endgenerate
 
always @(posedge i_clk)
if (i_ce)
begin
u_b <= (i_b[BW-1])?(-i_b):(i_b);
sgn <= i_a[AW-1] ^ i_b[BW-1];
end
 
wire [(BW+LUTB-1):0] pr_a, pr_b;
 
//
// Second step: First two 2xN products.
//
// Since we have no tableau of additions (yet), we can do both
// of the first two rows at the same time and add them together.
// Every later round already has a running sum to accumulate the
// next product into, so those rounds handle only one product
// each. Only this first clock can do two at a time.
bimpy #(BW) lmpy_0(i_clk,i_ce,u_a[( LUTB-1): 0], u_b, pr_a);
bimpy #(BW) lmpy_1(i_clk,i_ce,u_a[(2*LUTB-1):LUTB], u_b, pr_b);
always @(posedge i_clk)
if (i_ce) r_a[0] <= u_a[(IW-1):(2*LUTB)];
always @(posedge i_clk)
if (i_ce) r_b[0] <= u_b;
always @(posedge i_clk)
if (i_ce) r_s <= { r_s[(TLEN-2):0], sgn };
always @(posedge i_clk) // One clk after p[0],p[1] become valid
if (i_ce) acc[0] <= { {(IW-LUTB){1'b0}}, pr_a}
+{ {(IW-(2*LUTB)){1'b0}}, pr_b, {(LUTB){1'b0}} };
 
generate // Keep track of intermediate values, before multiplying them
if (TLEN > 3) for(k=0; k<TLEN-3; k=k+1)
begin : gencopies
always @(posedge i_clk)
if (i_ce)
begin
r_a[k+1] <= { {(LUTB){1'b0}},
r_a[k][(IW-1-(2*LUTB)):LUTB] };
r_b[k+1] <= r_b[k];
end
end endgenerate
 
generate // The actual multiply and accumulate stage
if (TLEN > 2) for(k=0; k<TLEN-2; k=k+1)
begin : genstages
// First, the multiply: 2-bits times BW bits
wire [(BW+LUTB-1):0] genp;
bimpy #(BW) genmpy(i_clk,i_ce,r_a[k][(LUTB-1):0],r_b[k], genp);
 
// Then the accumulate step -- on the next clock
always @(posedge i_clk)
if (i_ce)
acc[k+1] <= acc[k] + {{(IW-LUTB*(k+3)){1'b0}},
genp, {(LUTB*(k+2)){1'b0}} };
end endgenerate
 
wire [(IW+BW-1):0] w_r;
assign w_r = (r_s[TLEN-1]) ? (-acc[TLEN-2]) : acc[TLEN-2];
always @(posedge i_clk)
if (i_ce)
o_r <= w_r[(AW+BW-1):0];
 
generate if (IW > AW)
begin : VUNUSED
// verilator lint_off UNUSED
wire [(IW-AW)-1:0] unused;
assign unused = w_r[(IW+BW-1):(AW+BW)];
// verilator lint_on UNUSED
end endgenerate
 
endmodule
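The arithmetic that `longbimpy` pipelines can be checked against a small software model. The sketch below is an assumption-laden illustration, not project code: `longbimpy_model` is a hypothetical name, and the model ignores the pipeline timing (including the fact that the hardware forms the first two rows in the same clock). It only follows the same steps: take absolute values, multiply two bits of |a| at a time against |b|, accumulate the shifted rows, then restore the sign.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the longbimpy algorithm.
// a must fit in aw signed bits; b may be any 32-bit signed value.
inline int64_t longbimpy_model(int32_t a, int32_t b, int aw = 8) {
	assert(aw > 0 && aw <= 32);
	const bool sgn = (a < 0) != (b < 0);	// sgn <= i_a[AW-1] ^ i_b[BW-1]
	uint64_t ua = (a < 0) ? (uint64_t)(-(int64_t)a) : (uint64_t)a;	// u_a
	const uint64_t ub = (b < 0) ? (uint64_t)(-(int64_t)b) : (uint64_t)b; // u_b
	assert(ua <= (uint64_t(1) << (aw - 1)));	// |a| fits in aw bits

	const int LUTB = 2;			// bits of |a| consumed per row
	const int tlen = (aw + (LUTB - 1)) / LUTB;	// rows in the tableau

	uint64_t acc = 0;
	for (int row = 0; row < tlen; row++) {
		const uint64_t two_bits = ua & 3;	// next two bits of |a|
		acc += (two_bits * ub) << (LUTB * row);	// one bimpy product
		ua >>= LUTB;
	}
	return sgn ? -(int64_t)acc : (int64_t)acc;	// restore the sign
}
```

Note that the most negative input still works without an extra bit, because the absolute value is held as an unsigned number, which is the point the "First step" comment makes.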
/trunk/rtl/qtrstage.v
0,0 → 1,178
////////////////////////////////////////////////////////////////////////////////
//
// Filename: qtrstage.v
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This file encapsulates the 4 point stage of a decimation in
// frequency FFT. This particular implementation is optimized
// so that all of the multiplies are accomplished by additions and
// multiplexers only.
//
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`default_nettype none
//
module qtrstage(i_clk, i_reset, i_ce, i_sync, i_data, o_data, o_sync);
parameter IWIDTH=16, OWIDTH=IWIDTH+1;
// Parameters specific to the core that should be changed when this
// core is built ... Note that the minimum LGSPAN is 2. Smaller
// spans must use the fftdoubles stage.
parameter LGWIDTH=8, ODD=0, INVERSE=0,SHIFT=0;
input i_clk, i_reset, i_ce, i_sync;
input [(2*IWIDTH-1):0] i_data;
output reg [(2*OWIDTH-1):0] o_data;
output reg o_sync;
reg wait_for_sync;
reg [3:0] pipeline;
 
reg [(IWIDTH):0] sum_r, sum_i, diff_r, diff_i;
 
reg [(2*OWIDTH-1):0] ob_a;
wire [(2*OWIDTH-1):0] ob_b;
reg [(OWIDTH-1):0] ob_b_r, ob_b_i;
assign ob_b = { ob_b_r, ob_b_i };
 
reg [(LGWIDTH-1):0] iaddr;
reg [(2*IWIDTH-1):0] imem;
 
wire signed [(IWIDTH-1):0] imem_r, imem_i;
assign imem_r = imem[(2*IWIDTH-1):(IWIDTH)];
assign imem_i = imem[(IWIDTH-1):0];
 
wire signed [(IWIDTH-1):0] i_data_r, i_data_i;
assign i_data_r = i_data[(2*IWIDTH-1):(IWIDTH)];
assign i_data_i = i_data[(IWIDTH-1):0];
 
reg [(2*OWIDTH-1):0] omem;
 
wire signed [(OWIDTH-1):0] rnd_sum_r, rnd_sum_i, rnd_diff_r, rnd_diff_i,
n_rnd_diff_r, n_rnd_diff_i;
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_sum_r(i_clk, i_ce,
sum_r, rnd_sum_r);
 
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_sum_i(i_clk, i_ce,
sum_i, rnd_sum_i);
 
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_diff_r(i_clk, i_ce,
diff_r, rnd_diff_r);
 
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_diff_i(i_clk, i_ce,
diff_i, rnd_diff_i);
 
assign n_rnd_diff_r = - rnd_diff_r;
assign n_rnd_diff_i = - rnd_diff_i;
initial wait_for_sync = 1'b1;
initial iaddr = 0;
always @(posedge i_clk)
if (i_reset)
begin
wait_for_sync <= 1'b1;
iaddr <= 0;
end else if ((i_ce)&&((!wait_for_sync)||(i_sync)))
begin
iaddr <= iaddr + { {(LGWIDTH-1){1'b0}}, 1'b1 };
wait_for_sync <= 1'b0;
end
 
always @(posedge i_clk)
if (i_ce)
imem <= i_data;
 
 
// Note that we don't check on wait_for_sync or i_sync here.
// Why not? Because iaddr will always be zero until after the
// first i_ce, so we are safe.
initial pipeline = 4'h0;
always @(posedge i_clk)
if (i_reset)
pipeline <= 4'h0;
else if (i_ce) // is our pipeline process full? Which stages?
pipeline <= { pipeline[2:0], iaddr[0] };
 
// This is the pipeline[-1] stage, pipeline[0] will be set next.
always @(posedge i_clk)
if ((i_ce)&&(iaddr[0]))
begin
sum_r <= imem_r + i_data_r;
sum_i <= imem_i + i_data_i;
diff_r <= imem_r - i_data_r;
diff_i <= imem_i - i_data_i;
end
 
// pipeline[1] takes sum_x and diff_x and produces rnd_x
 
// Now for pipeline[2]. We can actually do this at all i_ce
// clock times, since nothing will listen unless pipeline[3]
// on the next clock. Thus, we simplify this logic and do
// it independent of pipeline[2].
always @(posedge i_clk)
if (i_ce)
begin
ob_a <= { rnd_sum_r, rnd_sum_i };
// on Even, W = e^{-j2pi 1/4 0} = 1
if (ODD == 0)
begin
ob_b_r <= rnd_diff_r;
ob_b_i <= rnd_diff_i;
end else if (INVERSE==0) begin
// on Odd, W = e^{-j2pi 1/4} = -j
ob_b_r <= rnd_diff_i;
ob_b_i <= n_rnd_diff_r;
end else begin
// on Odd, W = e^{j2pi 1/4} = j
ob_b_r <= n_rnd_diff_i;
ob_b_i <= rnd_diff_r;
end
end
 
always @(posedge i_clk)
if (i_ce)
begin // In sequence, clock = 3
if (pipeline[3])
begin
omem <= ob_b;
o_data <= ob_a;
end else
o_data <= omem;
end
 
// Don't forget in the sync check that we are running
// at two clocks per sample. Thus we need to
// produce a sync every 2^(LGWIDTH-1) clocks.
initial o_sync = 1'b0;
always @(posedge i_clk)
if (i_reset)
o_sync <= 1'b0;
else if (i_ce)
o_sync <= &(~iaddr[(LGWIDTH-2):3]) && (iaddr[2:0] == 3'b101);
endmodule
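The reason `qtrstage` needs no multipliers is that its only twiddle factors are 1, -j, and +j, each of which reduces to a swap and a negation. A minimal C++ sketch of that selection follows; `Cplx` and `qtr_twiddle` are hypothetical names used only for illustration, and the model leaves out the rounding and pipeline registers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical complex type for illustration only.
struct Cplx {
	int32_t re, im;
};

// Apply the 4-point twiddle factor to the difference term without a
// multiplier, mirroring the ob_b_r/ob_b_i assignments in qtrstage.
inline Cplx qtr_twiddle(Cplx d, bool odd, bool inverse) {
	// Guard against overflow when negating the most negative value
	assert(d.re != INT32_MIN && d.im != INT32_MIN);
	if (!odd)
		return d;			// W = 1
	if (!inverse)
		return { d.im, -d.re };		// W = -j: (r + ji)(-j) =  i - jr
	return { -d.im, d.re };			// W = +j: (r + ji)(+j) = -i + jr
}
```

For example, the difference 3+4j becomes 4-3j in an odd forward stage and -4+3j in an odd inverse stage.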
/trunk/sw/Makefile
2,17 → 2,16
##
## Filename: Makefile
##
## Project: A Doubletime Pipelined FFT
## Project: A Generic Pipelined FFT Implementation
##
## Purpose: This is the main Makefile for the FFT core generator.
## It is very simple in its construction, the most complicated
## parts being the building of the Verilator simulation--a
## step that may not be required for your project.
## parts being the building of the Verilator simulation--a step that may
## not be required for your project.
##
## To build the FFT generator, just type 'make' on a line
## by itself. For a quick tutorial in how to run the
## generator, just type './fftgen -h' to read the usage()
## statement.
## To build the FFT generator, just type 'make' on a line by itself. For
## a quick tutorial in how to run the generator, just type './fftgen -h'
## to read the usage() statement.
##
## Creator: Dan Gisselquist, Ph.D.
## Gisselquist Technology, LLC
19,7 → 18,7
##
##########################################################################/
##
## Copyright (C) 2015, Gisselquist Technology, LLC
## Copyright (C) 2015,2018, Gisselquist Technology, LLC
##
## This program is free software (firmware): you can redistribute it and/or
## modify it under the terms of the GNU General Public License as published
45,10 → 44,20
##
# This is really simple ...
all: fftgen
CORED := fft-core
OBJDR := $(CORED)/obj_dir
TESTSZ := 2048
BENCHD := ../bench/cpp
CORED := ../rtl
VOBJDR := $(CORED)/obj_dir
OBJDIR := obj-pc
BENCHD := ../bench/cpp
SOURCES := bitreverse.cpp bldstage.cpp butterfly.cpp fftgen.cpp fftlib.cpp \
legal.cpp rounding.cpp softmpy.cpp
TESTSZ := -f 2048
# CKPCE := -k 1
CKPCE := -2
MPYS := -p 0
IWID := -n 15
FFTPARAMS := -d $(CORED) $(TESTSZ) $(CKPCE) $(MPYS) $(IWID)
OBJECTS := $(addprefix $(OBJDIR)/,$(subst .cpp,.o,$(SOURCES)))
HEADERS := $(wildcard *.h)
ifneq ($(VERILATOR_ROOT),)
VERILATOR:=$(VERILATOR_ROOT)/bin/verilator
else
57,17 → 66,19
endif
export $(VERILATOR)
VROOT := $(VERILATOR_ROOT)
VFLAGS := -Wall -MMD --trace -cc
VFLAGS := -Wall -O3 -MMD --trace -cc
CFLAGS := -g -Wall
 
fftgen: fftgen.o
$(CXX) $< -o $@
$(OBJDIR)/%.o: %.cpp
$(mk-objdir)
$(CXX) -c $(CFLAGS) $< -o $@
 
%.o: %.cpp
$(CXX) -c $< -o $@
fftgen: $(OBJECTS)
$(CXX) $(CFLAGS) $^ -o $@
 
.PHONY: test
test: fft ifft butterfly dblreverse qtrstage dblstage fftstage_o2048
test: hwbfly shiftaddmpy longbimpy
test: fft ifft butterfly fftstage hwbfly shiftaddmpy longbimpy qtrstage
test: bitreverse laststage
 
#
# Although these parameters, a 2048 point FFT of 16 bits input, aren't
76,101 → 87,176
# you may need to adjust the test benches if you wish to prove that your
# changes work.
#
.PHONY: fft
fft: fftgen
./fftgen -f $(TESTSZ) -n 16 -p 6 -a $(BENCHD)/fftsize.h
.PHONY: fft forcedfft
fft: $(VOBJDR)/Vfftmain__ALL.so
$(CORED)/fftmain.v: fftgen
./fftgen -v $(FFTPARAMS) -a $(BENCHD)/fftsize.h
forcedfft: fftgen
./fftgen -v $(FFTPARAMS) -a $(BENCHD)/fftsize.h
$(VOBJDR)/Vfftmain.h: $(CORED)/fftmain.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) fftmain.v
cd $(OBJDR); make -f Vfftmain.mk
$(VOBJDR)/Vfftmain__ALL.so: $(VOBJDR)/Vfftmain.h
cd $(VOBJDR); make -f Vfftmain.mk
 
.PHONY: dblfft
dblfft: $(VOBJDR)/Vdblfftmain__ALL.so
$(CORED)/dblfftmain.v: fftgen
./fftgen -v $(FFTPARAMS) -a $(BENCHD)/fftsize.h
$(VOBJDR)/Vdblfftmain.h: $(CORED)/dblfftmain.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) dblfftmain.v
$(VOBJDR)/Vdblfftmain__ALL.so: $(VOBJDR)/Vdblfftmain.h
cd $(VOBJDR); make -f Vdblfftmain.mk
 
.PHONY: idblfft
idblfft: $(VOBJDR)/Vidblfftmain__ALL.so
$(CORED)/idblfftmain.v: fftgen
./fftgen -i $(FFTPARAMS) -a $(BENCHD)/ifftsize.h
$(VOBJDR)/Vidblfftmain.h: $(CORED)/idblfftmain.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) idblfftmain.v
$(VOBJDR)/Vidblfftmain__ALL.so: $(VOBJDR)/Vidblfftmain.h
cd $(VOBJDR); make -f Vidblfftmain.mk
 
.PHONY: ifft
ifft: fftgen
./fftgen -f $(TESTSZ) -i -n 22 -p 6 -a $(BENCHD)/ifftsize.h
ifft: $(VOBJDR)/Vifftmain__ALL.so
$(CORED)/ifftmain.v: fftgen
./fftgen -i $(FFTPARAMS) -a $(BENCHD)/ifftsize.h
$(VOBJDR)/Vifftmain.h: $(CORED)/ifftmain.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) ifftmain.v
cd $(OBJDR); make -f Vifftmain.mk
$(VOBJDR)/Vifftmain__ALL.so: $(VOBJDR)/Vifftmain.h
cd $(VOBJDR); make -f Vifftmain.mk
 
.PHONY: shiftaddmpy
shiftaddmpy: $(OBJDR)/Vshiftaddmpy__ALL.a
shiftaddmpy: $(VOBJDR)/Vshiftaddmpy__ALL.a
 
$(CORED)/shiftaddmpy.v: fft
$(OBJDR)/Vshiftaddmpy.cpp $(OBJDR)/Vshiftaddmpy.h: $(CORED)/shiftaddmpy.v
$(VOBJDR)/Vshiftaddmpy.cpp $(VOBJDR)/Vshiftaddmpy.h: $(CORED)/shiftaddmpy.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) shiftaddmpy.v
$(OBJDR)/Vshiftaddmpy__ALL.a: $(OBJDR)/Vshiftaddmpy.h
$(OBJDR)/Vshiftaddmpy__ALL.a: $(OBJDR)/Vshiftaddmpy.cpp
cd $(OBJDR)/; make -f Vshiftaddmpy.mk
$(VOBJDR)/Vshiftaddmpy__ALL.a: $(VOBJDR)/Vshiftaddmpy.h
$(VOBJDR)/Vshiftaddmpy__ALL.a: $(VOBJDR)/Vshiftaddmpy.cpp
cd $(VOBJDR)/; make -f Vshiftaddmpy.mk
 
.PHONY: longbimpy
longbimpy: $(OBJDR)/Vlongbimpy__ALL.a
longbimpy: $(VOBJDR)/Vlongbimpy__ALL.a
 
$(CORED)/longbimpy.v: fft
$(OBJDR)/Vlongbimpy.cpp $(OBJDR)/Vlongbimpy.h: $(CORED)/longbimpy.v
$(VOBJDR)/Vlongbimpy.cpp $(VOBJDR)/Vlongbimpy.h: $(CORED)/longbimpy.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) longbimpy.v
$(OBJDR)/Vlongbimpy__ALL.a: $(OBJDR)/Vlongbimpy.h
$(OBJDR)/Vlongbimpy__ALL.a: $(OBJDR)/Vlongbimpy.cpp
cd $(OBJDR)/; make -f Vlongbimpy.mk
$(VOBJDR)/Vlongbimpy__ALL.a: $(VOBJDR)/Vlongbimpy.h
$(VOBJDR)/Vlongbimpy__ALL.a: $(VOBJDR)/Vlongbimpy.cpp
cd $(VOBJDR)/; make -f Vlongbimpy.mk
 
.PHONY: butterfly
butterfly: $(OBJDR)/Vbutterfly__ALL.a
butterfly: $(VOBJDR)/Vbutterfly__ALL.a
 
$(CORED)/butterfly.v: fft
$(OBJDR)/Vbutterfly.cpp $(OBJDR)/Vbutterfly.h: $(CORED)/butterfly.v
$(VOBJDR)/Vbutterfly.cpp $(VOBJDR)/Vbutterfly.h: $(CORED)/butterfly.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) butterfly.v
$(OBJDR)/Vbutterfly__ALL.a: $(OBJDR)/Vbutterfly.h
$(OBJDR)/Vbutterfly__ALL.a: $(OBJDR)/Vbutterfly.cpp
cd $(OBJDR)/; make -f Vbutterfly.mk
$(VOBJDR)/Vbutterfly__ALL.a: $(VOBJDR)/Vbutterfly.h
$(VOBJDR)/Vbutterfly__ALL.a: $(VOBJDR)/Vbutterfly.cpp
cd $(VOBJDR)/; make -f Vbutterfly.mk
 
.PHONY: hwbfly
hwbfly: $(OBJDR)/Vhwbfly__ALL.a
hwbfly: $(VOBJDR)/Vhwbfly__ALL.a
 
$(CORED)/hwbfly.v: fft
$(OBJDR)/Vhwbfly.cpp $(OBJDR)/Vhwbfly.h: $(CORED)/hwbfly.v
$(VOBJDR)/Vhwbfly.cpp $(VOBJDR)/Vhwbfly.h: $(CORED)/hwbfly.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) hwbfly.v
$(OBJDR)/Vhwbfly__ALL.a: $(OBJDR)/Vhwbfly.h
$(OBJDR)/Vhwbfly__ALL.a: $(OBJDR)/Vhwbfly.cpp
cd $(OBJDR)/; make -f Vhwbfly.mk
$(VOBJDR)/Vhwbfly__ALL.a: $(VOBJDR)/Vhwbfly.h
$(VOBJDR)/Vhwbfly__ALL.a: $(VOBJDR)/Vhwbfly.cpp
cd $(VOBJDR)/; make -f Vhwbfly.mk
 
.PHONY: dblreverse
dblreverse: $(OBJDR)/Vdblreverse__ALL.a
.PHONY: bitreverse
bitreverse: $(VOBJDR)/Vbitreverse__ALL.a
 
$(CORED)/dblreverse.v: fft
$(OBJDR)/Vdblreverse.cpp $(OBJDR)/Vdblreverse.h: $(CORED)/dblreverse.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) dblreverse.v
$(OBJDR)/Vdblreverse__ALL.a: $(OBJDR)/Vdblreverse.h
$(OBJDR)/Vdblreverse__ALL.a: $(OBJDR)/Vdblreverse.cpp
cd $(OBJDR)/; make -f Vdblreverse.mk
$(CORED)/bitreverse.v: fft
$(VOBJDR)/Vbitreverse.cpp $(VOBJDR)/Vbitreverse.h: $(CORED)/bitreverse.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) bitreverse.v
$(VOBJDR)/Vbitreverse__ALL.a: $(VOBJDR)/Vbitreverse.h
$(VOBJDR)/Vbitreverse__ALL.a: $(VOBJDR)/Vbitreverse.cpp
cd $(VOBJDR)/; make -f Vbitreverse.mk
 
.PHONY: qtrstage
qtrstage: $(OBJDR)/Vqtrstage__ALL.a
qtrstage: $(VOBJDR)/Vqtrstage__ALL.a
 
$(CORED)/qtrstage.v: fft
$(OBJDR)/Vqtrstage.cpp $(OBJDR)/Vqtrstage.h: $(CORED)/qtrstage.v
$(VOBJDR)/Vqtrstage.cpp $(VOBJDR)/Vqtrstage.h: $(CORED)/qtrstage.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) qtrstage.v
$(OBJDR)/Vqtrstage__ALL.a: $(OBJDR)/Vqtrstage.h
$(OBJDR)/Vqtrstage__ALL.a: $(OBJDR)/Vqtrstage.cpp
cd $(OBJDR)/; make -f Vqtrstage.mk
$(VOBJDR)/Vqtrstage__ALL.a: $(VOBJDR)/Vqtrstage.h
$(VOBJDR)/Vqtrstage__ALL.a: $(VOBJDR)/Vqtrstage.cpp
cd $(VOBJDR)/; make -f Vqtrstage.mk
 
.PHONY: dblstage
dblstage: $(OBJDR)/Vdblstage__ALL.a
dblstage: $(VOBJDR)/Vdblstage__ALL.a
 
$(CORED)/dblstage.v: fft
$(OBJDR)/Vdblstage.cpp $(OBJDR)/Vdblstage.h: $(CORED)/dblstage.v
$(VOBJDR)/Vdblstage.cpp $(VOBJDR)/Vdblstage.h: $(CORED)/dblstage.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) dblstage.v
$(OBJDR)/Vdblstage__ALL.a: $(OBJDR)/Vdblstage.h
$(OBJDR)/Vdblstage__ALL.a: $(OBJDR)/Vdblstage.cpp
cd $(OBJDR)/; make -f Vdblstage.mk
$(VOBJDR)/Vdblstage__ALL.a: $(VOBJDR)/Vdblstage.h
$(VOBJDR)/Vdblstage__ALL.a: $(VOBJDR)/Vdblstage.cpp
cd $(VOBJDR)/; make -f Vdblstage.mk
 
.PHONY: fftstage_o2048
dblstage: $(OBJDR)/Vfftstage_o2048__ALL.a
.PHONY: laststage
laststage: $(VOBJDR)/Vlaststage__ALL.a
 
$(CORED)/fftstage_o2048.v: fft
$(OBJDR)/Vfftstage_o2048.cpp $(OBJDR)/Vfftstage_o2048.h: $(CORED)/fftstage_o2048.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) fftstage_o2048.v
$(OBJDR)/Vfftstage_o2048__ALL.a: $(OBJDR)/Vfftstage_o2048.h
$(OBJDR)/Vfftstage_o2048__ALL.a: $(OBJDR)/Vfftstage_o2048.cpp
cd $(OBJDR)/; make -f Vfftstage_o2048.mk
$(CORED)/laststage.v: fft
$(VOBJDR)/Vlaststage.cpp $(VOBJDR)/Vlaststage.h: $(CORED)/laststage.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) laststage.v
$(VOBJDR)/Vlaststage__ALL.a: $(VOBJDR)/Vlaststage.h
$(VOBJDR)/Vlaststage__ALL.a: $(VOBJDR)/Vlaststage.cpp
cd $(VOBJDR)/; make -f Vlaststage.mk
 
.PHONY: fftstage
fftstage: $(VOBJDR)/Vfftstage__ALL.a
 
$(CORED)/fftstage.v: fft
$(VOBJDR)/Vfftstage.cpp $(VOBJDR)/Vfftstage.h: $(CORED)/fftstage.v
cd $(CORED)/; $(VERILATOR) $(VFLAGS) fftstage.v
$(VOBJDR)/Vfftstage__ALL.a: $(VOBJDR)/Vfftstage.h
$(VOBJDR)/Vfftstage__ALL.a: $(VOBJDR)/Vfftstage.cpp
cd $(VOBJDR)/; make -f Vfftstage.mk
 
 
.PHONY: clean
clean:
rm fftgen fftgen.o
rm -rf $(CORED)
rm -rf $(CORED)/obj_dir
rm -rf $(CORED)/fftmain.v $(CORED)/fftstage.v
rm -rf $(CORED)/qtrstage.v $(CORED)/laststage.v $(CORED)/bitreverse.v
rm -rf $(CORED)/butterfly.v $(CORED)/hwbfly.v
rm -rf $(CORED)/longbimpy.v $(CORED)/bimpy.v
rm -rf $(CORED)/convround.v
 
#
# The "depends" target, to know what files things depend upon. The depends
# file itself is kept in $(OBJDIR)/depends.txt
#
define build-depends
$(mk-objdir)
@echo "Building dependency file"
@$(CXX) $(CFLAGS) $(INCS) -MM $(SOURCES) > $(OBJDIR)/xdepends.txt
@sed -e 's/^.*.o: /$(OBJDIR)\/&/' < $(OBJDIR)/xdepends.txt > $(OBJDIR)/depends.txt
@rm $(OBJDIR)/xdepends.txt
endef
 
.PHONY: depends
depends: tags
$(build-depends)
 
$(OBJDIR)/depends.txt: depends
 
#
# Make a directory to hold all of the FFT-gen (i.e. the C++) build products
# (object files)
#
define mk-objdir
@bash -c "if [ ! -e $(OBJDIR) ]; then mkdir -p $(OBJDIR); fi"
endef
 
#
# The "tags" target
#
tags: $(SOURCES) $(HEADERS)
@echo "Generating tags"
@ctags $(SOURCES) $(HEADERS)
 
-include $(OBJDIR)/depends.txt
/trunk/sw/README.md
0,0 → 1,11
This directory contains the software to generate the FFT. It compiles into a
program called `fftgen`, which you can then call to generate the FFT you are
interested in.
 
Components of this coregen include:
 
- [fftgen.cpp](fftgen.cpp) - This is the top level or 'main' FFT generation program.
- [bldstage.cpp](bldstage.cpp) - Generates the code for a single FFT stage,
called [fftstage.v](../rtl/fftstage.v) in the RTL directory.
- [softmpy.cpp](softmpy.cpp) - Generates a soft multiply.
- [bitreverse.cpp](bitreverse.cpp) - Generates a bit reverse module.
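
As a rough illustration of what the generated bit reverse module computes, the read address is the write address with its low `LGSIZE` bits reversed and its top bit inverted, so that reads come from the half of the buffer not currently being written. The sketch below models only that address mapping; `bitrev_rdaddr` is a hypothetical name and is not part of `fftgen`.

```cpp
#include <cassert>

// Hypothetical model of the rdaddr wiring emitted by bitreverse.cpp.
inline unsigned bitrev_rdaddr(unsigned wraddr, int lgsize) {
	assert(lgsize > 0 && lgsize < 31);
	unsigned rdaddr = 0;
	// rdaddr[k] = wraddr[LGSIZE-1-k]
	for (int k = 0; k < lgsize; k++)
		if (wraddr & (1u << (lgsize - 1 - k)))
			rdaddr |= (1u << k);
	// rdaddr[LGSIZE] = !wraddr[LGSIZE]
	if (!(wraddr & (1u << lgsize)))
		rdaddr |= (1u << lgsize);
	return rdaddr;
}
```

With `lgsize = 3`, a write address of 1 (binary 0001) maps to a read address of 12 (binary 1100): the low three bits 001 reverse to 100, and the bank bit flips from 0 to 1.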
/trunk/sw/bitreverse.cpp
0,0 → 1,673
////////////////////////////////////////////////////////////////////////////////
//
// Filename: bitreverse.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
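// Purpose: Generates the bit reversal modules used by the FFT: the
// single-sample bitreverse (build_snglbrev) and the two-sample
// dblreverse (build_dblreverse).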
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen
#include <stdio.h>
#include <stdlib.h>
 
#ifdef _MSC_VER // added for ms vs compatibility
 
#include <io.h>
#include <direct.h>
#define _USE_MATH_DEFINES
#else
// And for G++/Linux environment
 
#include <unistd.h> // Defines the R_OK/W_OK/etc. macros
#include <sys/stat.h>
#endif
 
#include <string.h>
#include <string>
#include <math.h>
#include <ctype.h>
#include <assert.h>
 
#include "defaults.h"
#include "legal.h"
#include "bitreverse.h"
 
void build_snglbrev(const char *fname, const bool async_reset) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
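// Derive the module name from the file name: strip the two character
// ".v" suffix, then any leading directory components.
char *modulename = strdup(fname), *pslash;
modulename[strlen(modulename)-2] = '\0';
pslash = strrchr(modulename, '/');
if (pslash != NULL) // Source and destination overlap, so use memmove
memmove(modulename, pslash+1, strlen(pslash+1)+1);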
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\t%s.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tThis module bitreverses a pipelined FFT input. It differs\n"
"// from the dblreverse module in that this is just a simple and\n"
"// straightforward bitreverse, rather than one written to handle two\n"
"// words at once.\n"
"//\n"
"//\n%s"
"//\n", modulename, prjname, creator);
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
// o_out is assigned from a clocked always block below, so it must be
// declared as a reg rather than a wire.
fprintf(fp,
"module %s(i_clk, %s, i_ce, i_in, o_out, o_sync);\n"
"\tparameter\t\t\tLGSIZE=%d, WIDTH=24;\n"
"\tinput\t\t\t\ti_clk, %s, i_ce;\n"
"\tinput\t\t[(2*WIDTH-1):0]\ti_in;\n"
"\toutput\treg\t[(2*WIDTH-1):0]\to_out;\n"
"\toutput\treg\t\t\to_sync;\n", modulename, resetw.c_str(),
TST_DBLREVERSE_LGSIZE,
resetw.c_str());
 
fprintf(fp,
" reg [(LGSIZE):0] wraddr;\n"
" wire [(LGSIZE):0] rdaddr;\n"
"\n"
" reg [(2*WIDTH-1):0] brmem [0:((1<<(LGSIZE+1))-1)];\n"
"\n"
" genvar k;\n"
" generate for(k=0; k<LGSIZE; k=k+1)\n"
" assign rdaddr[k] = wraddr[LGSIZE-1-k];\n"
" endgenerate\n"
" assign rdaddr[LGSIZE] = !wraddr[LGSIZE];\n"
"\n"
" reg in_reset;\n"
"\n"
" initial in_reset = 1'b1;\n");
 
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
" in_reset <= 1'b1;\n"
" else if ((i_ce)&&(&wraddr[(LGSIZE-1):0]))\n"
" in_reset <= 1'b0;\n"
"\n"
" initial wraddr = 0;\n");
 
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
" wraddr <= 0;\n"
" else if (i_ce)\n"
" begin\n"
" brmem[wraddr] <= i_in;\n"
" wraddr <= wraddr + 1;\n"
" end\n"
"\n"
" always @(posedge i_clk)\n"
" if (i_ce) // If (i_reset) we just output junk ... not a problem\n"
" o_out <= brmem[rdaddr]; // w/o a sync pulse\n"
"\n"
" initial o_sync = 1'b0;\n");
 
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
" o_sync <= 1'b0;\n"
" else if ((i_ce)&&(!in_reset))\n"
" o_sync <= (wraddr[(LGSIZE-1):0] == 0);\n"
"\n");
 
 
if (formal_property_flag) {
fprintf(fp,
"`ifdef\tFORMAL\n"
"`ifdef BITREVERSE\n"
"`define\tASSUME assume\n"
"`define\tASSERT assert\n");
if (async_reset)
fprintf(fp,
"\n\talways @($global_clock)\n"
"\t\tassume(i_clk != $past(i_clk));\n\n");
 
fprintf(fp,
"`else\n"
"`define\tASSUME assert\n"
"`define\tASSERT assume\n"
"`endif\n"
"\n"
"\treg f_past_valid;\n"
"\tinitial f_past_valid = 1'b0;\n"
"\talways @(posedge i_clk)\n"
"\t\tf_past_valid <= 1'b1;\n\n");
if (async_reset)
fprintf(fp,
"\tinitial `ASSUME(!i_areset_n);\n"
"\talways @($global_clock)\n"
"\tif (!$rose(i_clk))\n"
"\t\t`ASSERT(!$rose(i_areset_n));\n\n"
"\talways @($global_clock)\n"
"\tif (!$rose(i_clk))\n"
"\tbegin\n"
"\t\t`ASSUME($stable(i_ce));\n"
"\t\t`ASSUME($stable(i_in));\n"
"\t\t//\n"
"\t\tif (i_areset_n)\n"
"\t\tbegin\n"
"\t\t\t`ASSERT($stable(o_out));\n"
"\t\t\t`ASSERT($stable(o_sync));\n"
"\t\tend\n"
"\tend\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif ((!f_past_valid)||(!i_areset_n))\n"
"\tbegin\n");
else
fprintf(fp,
"\tinitial `ASSUME(i_reset);\n"
"\talways @(posedge i_clk)\n"
"\tif ((!f_past_valid)||($past(i_reset)))\n"
"\tbegin\n");
 
fprintf(fp,
"\t\t`ASSERT(wraddr == 0);\n"
"\t\t`ASSERT(in_reset);\n"
"\t\t`ASSERT(!o_sync);\n");
fprintf(fp, "\tend\n");
 
 
fprintf(fp, "`ifdef BITREVERSE\n"
"\talways @(posedge i_clk)\n"
"\t\tassume((i_ce)||($past(i_ce))||($past(i_ce,2)));\n"
"`endif // BITREVERSE\n\n");
 
fprintf(fp,
"\t\t(* anyconst *) reg [LGSIZE:0]\tf_const_addr;\n"
"\t\twire\t[LGSIZE:0]\tf_reversed_addr;\n"
"\t\treg\t f_addr_loaded;\n"
"\t\treg\t[(2*WIDTH-1):0]\tf_addr_value;\n"
"\n"
"\t\tgenerate for(k=0; k<LGSIZE; k=k+1)\n"
"\t\t\tassign\tf_reversed_addr[k] = f_const_addr[LGSIZE-1-k];\n"
"\t\tendgenerate\n"
"\t\tassign\tf_reversed_addr[LGSIZE] = f_const_addr[LGSIZE];\n"
"\n"
"\t\tinitial\tf_addr_loaded = 1'b0;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_reset)\n"
"\t\t\tf_addr_loaded <= 1'b0;\n"
"\t\telse if (i_ce)\n"
"\t\tbegin\n"
"\t\t\tif (wraddr == f_const_addr)\n"
"\t\t\t\tf_addr_loaded <= 1'b1;\n"
"\t\t\telse if (rdaddr == f_const_addr)\n"
"\t\t\t\tf_addr_loaded <= 1'b0;\n"
"\t\tend\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif ((i_ce)&&(wraddr == f_const_addr))\n"
"\t\tbegin\n"
"\t\t\tf_addr_value <= i_in;\n"
"\t\t\t`ASSERT(!f_addr_loaded);\n"
"\t\tend\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif ((f_past_valid)&&(!$past(i_reset))\n"
"\t\t\t\t&&($past(f_addr_loaded))&&(!f_addr_loaded))\n"
"\t\t\tassert(o_out == f_addr_value);\n"
"\n"
"\t\talways @(*)\n"
"\t\tif (o_sync)\n"
"\t\t\tassert(wraddr[LGSIZE-1:0] == 1);\n"
"\n"
"\t\talways @(*)\n"
"\t\tif ((wraddr[LGSIZE]==f_const_addr[LGSIZE])\n"
"\t\t\t\t&&(wraddr[LGSIZE-1:0]\n"
"\t\t\t\t\t\t<= f_const_addr[LGSIZE-1:0]))\n"
"\t\t\t`ASSERT(!f_addr_loaded);\n"
"\n"
"\t\talways @(*)\n"
"\t\tif ((rdaddr[LGSIZE]==f_const_addr[LGSIZE])&&(f_addr_loaded))\n"
"\t\t\t`ASSERT(wraddr[LGSIZE-1:0]\n"
"\t\t\t\t\t<= f_reversed_addr[LGSIZE-1:0]+1);\n"
"\n"
"\t\talways @(*)\n"
"\t\tif (f_addr_loaded)\n"
"\t\t\t`ASSERT(brmem[f_const_addr] == f_addr_value);\n"
"\n"
"\n\n");
 
fprintf(fp,
"`endif\t// FORMAL\n");
}
 
fprintf(fp,
"endmodule\n");
 
fclose(fp);
free(modulename);
}
 
void build_dblreverse(const char *fname, const bool async_reset) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
char *modulename = strdup(fname), *pslash;
modulename[strlen(modulename)-2] = '\0';
pslash = strrchr(modulename, '/');
if (pslash != NULL)
// Source and destination overlap here, so use memmove rather than strcpy
memmove(modulename, pslash+1, strlen(pslash+1)+1);
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\t%s.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tThis module bitreverses a pipelined FFT input. Operation is\n"
"// expected as follows:\n"
"//\n"
"// i_clk A running clock at whatever system speed is offered.\n",
modulename, prjname);
 
if (async_reset)
fprintf(fp,
"// i_areset_n An active low asynchronous reset signal,\n"
"// that resets all internals\n");
else
fprintf(fp,
"// i_reset A synchronous reset signal, that resets all internals\n");
 
fprintf(fp,
"// i_ce If this is one, two inputs are consumed and two outputs\n"
"// are produced.\n"
"// i_in_0, i_in_1\n"
"// Two inputs to be consumed, each of width WIDTH.\n"
"// o_out_0, o_out_1\n"
"// Two of the bitreversed outputs, also of the same\n"
"// width, WIDTH. Of course, there is a delay from the\n"
"// first input to the first output. For this purpose,\n"
"// o_sync is present.\n"
"// o_sync This will be a 1\'b1 for the first value in any block.\n"
"// Following a reset, this will only become 1\'b1 once\n"
"// the data has been loaded and is now valid. After that,\n"
"// all outputs will be valid.\n"
"//\n"
"// 20150602 -- This module has undergone massive rework in order to\n"
"// ensure that it uses resources efficiently. As a result,\n"
"// it now optimizes nicely into block RAMs. As an unfortunate\n"
"// side effect, it now passes its bench test (dblrev_tb) but\n"
"// fails the integration bench test (fft_tb).\n"
"//\n"
"//\n%s"
"//\n", creator);
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"\n\n"
"//\n"
"// How do we do bit reversing at two samples per clock? Can we separate out\n"
"// our work into eight memory banks, writing two banks at once and reading\n"
"// another two banks in the same clock?\n"
"//\n"
"// mem[00xxx0] = s_0[n]\n"
"// mem[00xxx1] = s_1[n]\n"
"// o_0[n] = mem[10xxx0]\n"
"// o_1[n] = mem[11xxx0]\n"
"// ...\n"
"// mem[01xxx0] = s_0[m]\n"
"// mem[01xxx1] = s_1[m]\n"
"// o_0[m] = mem[10xxx1]\n"
"// o_1[m] = mem[11xxx1]\n"
"// ...\n"
"// mem[10xxx0] = s_0[n]\n"
"// mem[10xxx1] = s_1[n]\n"
"// o_0[n] = mem[00xxx0]\n"
"// o_1[n] = mem[01xxx0]\n"
"// ...\n"
"// mem[11xxx0] = s_0[m]\n"
"// mem[11xxx1] = s_1[m]\n"
"// o_0[m] = mem[00xxx1]\n"
"// o_1[m] = mem[01xxx1]\n"
"// ...\n"
"//\n"
"// The answer is that, yes we can but: we need to use four memory banks\n"
"// to do it properly. These four banks are defined by the two bits\n"
"// that determine the top and bottom of the correct address. Larger\n"
"// FFT\'s would require more memories.\n"
"//\n"
"//\n");
fprintf(fp,
"module %s(i_clk, %s, i_ce, i_in_0, i_in_1,\n"
"\t\to_out_0, o_out_1, o_sync);\n"
"\tparameter\t\t\tLGSIZE=%d, WIDTH=24;\n"
"\tinput\t\t\t\ti_clk, %s, i_ce;\n"
"\tinput\t\t[(2*WIDTH-1):0]\ti_in_0, i_in_1;\n"
"\toutput\twire\t[(2*WIDTH-1):0]\to_out_0, o_out_1;\n"
"\toutput\treg\t\t\to_sync;\n", modulename,
resetw.c_str(), TST_DBLREVERSE_LGSIZE, resetw.c_str());
 
fprintf(fp,
"\n"
"\treg\t\t\tin_reset;\n"
"\treg\t[(LGSIZE-1):0]\tiaddr;\n"
"\twire\t[(LGSIZE-3):0]\tbraddr;\n"
"\n"
"\tgenvar\tk;\n"
"\tgenerate for(k=0; k<LGSIZE-2; k=k+1)\n"
"\tbegin : gen_a_bit_reversed_value\n"
"\t\tassign braddr[k] = iaddr[LGSIZE-3-k];\n"
"\tend endgenerate\n"
"\n"
"\tinitial iaddr = 0;\n"
"\tinitial in_reset = 1\'b1;\n"
"\tinitial o_sync = 1\'b0;\n");
 
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\tbegin\n"
"\t\t\tiaddr <= 0;\n"
"\t\t\tin_reset <= 1\'b1;\n"
"\t\t\to_sync <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\tiaddr <= iaddr + { {(LGSIZE-1){1\'b0}}, 1\'b1 };\n"
"\t\t\tif (&iaddr[(LGSIZE-2):0])\n"
"\t\t\t\tin_reset <= 1\'b0;\n"
"\t\t\tif (in_reset)\n"
"\t\t\t\to_sync <= 1\'b0;\n"
"\t\t\telse\n"
"\t\t\t\to_sync <= ~(|iaddr[(LGSIZE-2):0]);\n"
"\t\tend\n"
"\n"
"\treg\t[(2*WIDTH-1):0]\tmem_e [0:((1<<(LGSIZE))-1)];\n"
"\treg\t[(2*WIDTH-1):0]\tmem_o [0:((1<<(LGSIZE))-1)];\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\tmem_e[iaddr] <= i_in_0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\tmem_o[iaddr] <= i_in_1;\n"
"\n"
"\n"
"\treg [(2*WIDTH-1):0] evn_out_0, evn_out_1, odd_out_0, odd_out_1;\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n\t\t\tevn_out_0 <= mem_e[{!iaddr[LGSIZE-1],1\'b0,braddr}];\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n\t\t\tevn_out_1 <= mem_e[{!iaddr[LGSIZE-1],1\'b1,braddr}];\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n\t\t\todd_out_0 <= mem_o[{!iaddr[LGSIZE-1],1\'b0,braddr}];\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n\t\t\todd_out_1 <= mem_o[{!iaddr[LGSIZE-1],1\'b1,braddr}];\n"
"\n"
"\treg\tadrz;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce) adrz <= iaddr[LGSIZE-2];\n"
"\n"
"\tassign\to_out_0 = (adrz)?odd_out_0:evn_out_0;\n"
"\tassign\to_out_1 = (adrz)?odd_out_1:evn_out_1;\n"
"\n");
 
if (formal_property_flag) {
fprintf(fp,
"`ifdef\tFORMAL\n"
"`ifdef BITREVERSE\n"
"`define\tASSUME assume\n"
"`define\tASSERT assert\n");
if (async_reset)
fprintf(fp,
"\n\talways @($global_clock)\n"
"\t\tassume(i_clk != $past(i_clk));\n\n");
 
fprintf(fp,
"`else\n"
"`define\tASSUME assert\n"
"`define\tASSERT assume\n"
"`endif\n"
"\n"
"\treg f_past_valid;\n"
"\tinitial f_past_valid = 1'b0;\n"
"\talways @(posedge i_clk)\n"
"\t\tf_past_valid <= 1'b1;\n\n");
if (async_reset)
fprintf(fp,
"\tinitial `ASSUME(!i_areset_n);\n"
"\talways @($global_clock)\n"
"\tif (!$rose(i_clk))\n"
"\t\t`ASSERT(!$rose(i_areset_n));\n\n"
"\talways @($global_clock)\n"
"\tif (!$rose(i_clk))\n"
"\tbegin\n"
"\t\t`ASSUME($stable(i_ce));\n"
"\t\t`ASSUME($stable(i_in_0));\n"
"\t\t`ASSUME($stable(i_in_1));\n"
"\t\t//\n"
"\t\tif (i_areset_n)\n"
"\t\tbegin\n"
"\t\t\t`ASSERT($stable(o_out_0));\n"
"\t\t\t`ASSERT($stable(o_out_1));\n"
"\t\t\t`ASSERT($stable(o_sync));\n"
"\t\tend\n"
"\tend\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif ((!f_past_valid)||(!i_areset_n))\n"
"\tbegin\n");
else
fprintf(fp,
"\tinitial `ASSUME(i_reset);\n"
"\talways @(posedge i_clk)\n"
"\tif ((!f_past_valid)||($past(i_reset)))\n"
"\tbegin\n");
 
fprintf(fp,
"\t\t`ASSERT(iaddr == 0);\n"
"\t\t`ASSERT(in_reset);\n"
"\t\t`ASSERT(!o_sync);\n");
fprintf(fp, "\tend\n");
 
 
fprintf(fp, "`ifdef BITREVERSE\n"
"\talways @(posedge i_clk)\n"
"\t\tassume((i_ce)||($past(i_ce))||($past(i_ce,2)));\n"
"`endif // BITREVERSE\n\n");
 
 
fprintf(fp,
"\t\t(* anyconst *) reg [LGSIZE-1:0] f_const_addr;\n"
"\t\twire [LGSIZE-3:0] f_reversed_addr;\n"
"\t\t// reg [LGSIZE:0] f_now;\n"
"\t\treg f_addr_loaded_0, f_addr_loaded_1;\n"
"\t\treg [(2*WIDTH-1):0] f_data_0, f_data_1;\n"
"\t\twire f_writing, f_reading;\n"
"\n"
"\t\tgenerate for(k=0; k<LGSIZE-2; k=k+1)\n"
"\t\t assign f_reversed_addr[k] = f_const_addr[LGSIZE-3-k];\n"
"\t\tendgenerate\n"
"\n"
"\t\tassign f_writing=(f_const_addr[LGSIZE-1]==iaddr[LGSIZE-1]);\n"
"\t\tassign f_reading=(f_const_addr[LGSIZE-1]!=iaddr[LGSIZE-1]);\n"
"\t\tinitial f_addr_loaded_0 = 1'b0;\n"
"\t\tinitial f_addr_loaded_1 = 1'b0;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_reset)\n"
"\t\tbegin\n"
"\t f_addr_loaded_0 <= 1'b0;\n"
"\t f_addr_loaded_1 <= 1'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t if (iaddr == f_const_addr)\n"
"\t\t begin\n"
"\t\t f_addr_loaded_0 <= 1'b1;\n"
"\t\t f_addr_loaded_1 <= 1'b1;\n"
"\t\t end\n"
"\n"
"\t\t if (f_reading)\n"
"\t\t begin\n"
"\t\t if ((braddr == f_const_addr[LGSIZE-3:0])\n"
"\t\t &&(iaddr[LGSIZE-2] == 1'b0))\n"
"\t\t f_addr_loaded_0 <= 1'b0;\n"
"\n"
"\t\t if ((braddr == f_const_addr[LGSIZE-3:0])\n"
"\t\t &&(iaddr[LGSIZE-2] == 1'b1))\n"
"\t\t f_addr_loaded_1 <= 1'b0;\n"
"\t\t end\n"
"\t\tend\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif ((i_ce)&&(iaddr == f_const_addr))\n"
"\t\tbegin\n"
"\t\t f_data_0 <= i_in_0;\n"
"\t\t f_data_1 <= i_in_1;\n"
"\t\t `ASSERT(!f_addr_loaded_0);\n"
"\t\t `ASSERT(!f_addr_loaded_1);\n"
"\t\tend\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif ((f_past_valid)&&(!$past(i_reset))\n"
"\t\t &&($past(f_addr_loaded_0))&&(!f_addr_loaded_0))\n"
"\t\tbegin\n"
"\t\t assert(!$past(iaddr[LGSIZE-2]));\n"
"\t\t if (f_const_addr[LGSIZE-2])\n"
"\t\t assert(o_out_1 == f_data_0);\n"
"\t\t else\n"
"\t\t assert(o_out_0 == f_data_0);\n"
"\t\tend\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif ((f_past_valid)&&(!$past(i_reset))\n"
"\t\t &&($past(f_addr_loaded_1))&&(!f_addr_loaded_1))\n"
"\t\tbegin\n"
"\t\t assert($past(iaddr[LGSIZE-2]));\n"
"\t\t if (f_const_addr[LGSIZE-2])\n"
"\t\t assert(o_out_1 == f_data_1);\n"
"\t\t else\n"
"\t\t assert(o_out_0 == f_data_1);\n"
"\t\tend\n"
"\n"
"\t\talways @(*)\n"
"\t\t `ASSERT(o_sync == ((iaddr[LGSIZE-2:0] == 1)&&(!in_reset)));\n"
"\n"
"\t\t// Before writing to a section, the loaded flags should be\n"
"\t\t// zero\n"
"\t\talways @(*)\n"
"\t\tif (f_writing)\n"
"\t\tbegin\n"
"\t\t `ASSERT(f_addr_loaded_0 == (iaddr[LGSIZE-2:0]\n"
"\t\t > f_const_addr[LGSIZE-2:0]));\n"
"\t\t `ASSERT(f_addr_loaded_1 == (iaddr[LGSIZE-2:0]\n"
"\t\t > f_const_addr[LGSIZE-2:0]));\n"
"\t\tend\n"
"\n"
"\t\t// If we were writing, and now we are reading, then both\n"
"\t\t// f_addr_loaded flags must be set\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif ((f_past_valid)&&(!$past(i_reset))\n"
"\t\t &&($past(f_writing))&&(f_reading))\n"
"\t\tbegin\n"
"\t\t `ASSERT(f_addr_loaded_0);\n"
"\t\t `ASSERT(f_addr_loaded_1);\n"
"\t\tend\n"
"\n"
"\t\talways @(*)\n"
"\t\tif (f_writing)\n"
"\t\t `ASSERT(f_addr_loaded_0 == f_addr_loaded_1);\n"
"\n"
"\t\t// When reading, and the loaded flag is zero, our pointer\n"
"\t\t// must not have hit the address of interest yet\n"
"\t\talways @(*)\n"
"\t\tif ((!in_reset)&&(f_reading))\n"
"\t\t `ASSERT(f_addr_loaded_0 ==\n"
"\t\t ((!iaddr[LGSIZE-2])&&(iaddr[LGSIZE-3:0]\n"
"\t\t <= f_reversed_addr[LGSIZE-3:0])));\n"
"\t\talways @(*)\n"
"\t\tif ((!in_reset)&&(f_reading))\n"
"\t\t `ASSERT(f_addr_loaded_1 ==\n"
"\t\t ((!iaddr[LGSIZE-2])||(iaddr[LGSIZE-3:0]\n"
"\t\t <= f_reversed_addr[LGSIZE-3:0])));\n"
"\t\talways @(*)\n"
"\t\tif ((in_reset)&&(f_reading))\n"
"\t\tbegin\n"
"\t\t `ASSERT(!f_addr_loaded_0);\n"
"\t\t `ASSERT(!f_addr_loaded_1);\n"
"\t\tend\n"
"\n"
"\t\talways @(*)\n"
"\t\tif(iaddr[LGSIZE-1])\n"
"\t\t `ASSERT(!in_reset);\n"
"\n"
"\t\talways @(*)\n"
"\t\tif (f_addr_loaded_0)\n"
"\t\t `ASSERT(mem_e[f_const_addr] == f_data_0);\n"
"\t\talways @(*)\n"
"\t\tif (f_addr_loaded_1)\n"
"\t\t `ASSERT(mem_o[f_const_addr] == f_data_1);\n"
"\n\n");
 
 
fprintf(fp,
"`endif\t// FORMAL\n");
}
 
fprintf(fp,
"endmodule\n");
 
fclose(fp);
free(modulename);
}
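//
// Illustrative sketch (not part of the original repository): a small software
// model of the read addressing that build_dblreverse() emits above. It assumes
// the generated Verilog reads from {!iaddr[LGSIZE-1], bank, braddr}, where
// braddr is iaddr[LGSIZE-3:0] with its bits reversed. The helper names
// bit_reverse() and dblreverse_rdaddr() are hypothetical, chosen only for
// this example.
//
```cpp
#include <cassert>
#include <cstdint>

// Reverse the low `nbits` bits of `v` (hypothetical helper).
static uint32_t bit_reverse(uint32_t v, int nbits) {
	uint32_t r = 0;
	for (int k = 0; k < nbits; k++)
		if (v & (1u << k))
			r |= 1u << (nbits - 1 - k);
	return r;
}

// Model of the read address {!iaddr[LGSIZE-1], bank, braddr}.
// Assumes 2 <= lgsize < 32 and that bank is 0 or 1.
static uint32_t dblreverse_rdaddr(uint32_t iaddr, int lgsize, unsigned bank) {
	// braddr[k] = iaddr[LGSIZE-3-k], i.e. the low LGSIZE-2 bits reversed
	uint32_t low    = iaddr & ((1u << (lgsize - 2)) - 1u);
	uint32_t braddr = bit_reverse(low, lgsize - 2);
	// The top bit is inverted so reads come from the half not being written
	uint32_t top    = ((iaddr >> (lgsize - 1)) & 1u) ^ 1u;
	return (top << (lgsize - 1)) | ((bank & 1u) << (lgsize - 2)) | braddr;
}
```
// Under this model, with LGSIZE=5 and iaddr=1, the two reads would land at
// addresses 20 (bank bit 0) and 28 (bank bit 1).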
/trunk/sw/bitreverse.h
0,0 → 1,44
////////////////////////////////////////////////////////////////////////////////
//
// Filename: bitreverse.h
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose:
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#ifndef BITREVERSE_H
#define BITREVERSE_H
 
extern void build_snglbrev(const char *fname, const bool async_reset = false);
extern void build_dblreverse(const char *fname, const bool async_reset = false);
 
#endif // BITREVERSE_H
/trunk/sw/bldstage.cpp
0,0 → 1,549
////////////////////////////////////////////////////////////////////////////////
//
// Filename: bldstage.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose:
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen
#include <stdio.h>
#include <stdlib.h>
 
#ifdef _MSC_VER // added for ms vs compatibility
 
#include <io.h>
#include <direct.h>
#define _USE_MATH_DEFINES
 
#else
// And for G++/Linux environment
 
#include <unistd.h> // Defines the R_OK/W_OK/etc. macros
#endif
 
#include <string.h>
#include <string>
#include <math.h>
#include <ctype.h>
#include <assert.h>
 
#include "defaults.h"
#include "legal.h"
#include "fftlib.h"
#include "rounding.h"
#include "bldstage.h"
 
void build_dblstage(const char *fname, ROUND_T rounding,
const bool async_reset, const bool dbg) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
const char *rnd_string;
if (rounding == RND_TRUNCATE)
rnd_string = "truncate";
else if (rounding == RND_FROMZERO)
rnd_string = "roundfromzero";
else if (rounding == RND_HALFUP)
rnd_string = "roundhalfup";
else
rnd_string = "convround";
 
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\tlaststage%s.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tThis is part of an FPGA implementation that will process\n"
"// the final stage of a decimate-in-frequency FFT, running\n"
"// through the data at two samples per clock. If you notice from the\n"
"// derivation of an FFT, the only time both even and odd samples are\n"
"// used at the same time is in this stage. Therefore, other than this\n"
"// stage and these twiddles, all of the other stages can run two stages\n"
"// at a time at one sample per clock.\n"
"//\n"
"// Operation:\n"
"// Given a stream of values, operate upon them as though they were\n"
"// value pairs, x[2n] and x[2n+1]. The stream begins when n=0, and ends\n"
"// when n=1. When the first x[0] value enters, the synchronization\n"
"// input, i_sync, must be true as well.\n"
"//\n"
"// For this stream, produce outputs\n"
"// y[2n ] = x[2n] + x[2n+1], and\n"
"// y[2n+1] = x[2n] - x[2n+1]\n"
"//\n"
"// When y[0] is output, a synchronization bit o_sync will be true as\n"
"// well, otherwise it will be zero.\n"
"//\n"
"//\n"
"// In this implementation, the output is valid one clock after the input\n"
"// is valid. The output also accumulates one bit above and beyond the\n"
"// number of bits in the input.\n"
"//\n"
"// i_clk A system clock\n", (dbg)?"_dbg":"", prjname);
if (async_reset)
fprintf(fp,
"// i_areset_n An active low asynchronous reset\n");
else
fprintf(fp,
"// i_reset A synchronous reset\n");
 
fprintf(fp,
"// i_ce Circuit enable--nothing happens unless this line is high\n"
"// i_sync A synchronization signal, high once per FFT at the start\n"
"// i_left The first (even) complex sample input. The higher order\n"
"// bits contain the real portion, low order bits the\n"
"// imaginary portion, all in two\'s complement.\n"
"// i_right The next (odd) complex sample input, same format as\n"
"// i_left.\n"
"// o_left The first (even) complex output.\n"
"// o_right The next (odd) complex output.\n"
"// o_sync Output synchronization signal.\n"
"//\n%s"
"//\n", creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module\tlaststage%s(i_clk, %s, i_ce, i_sync, i_left, i_right, o_left, o_right, o_sync%s);\n"
"\tparameter\tIWIDTH=%d,OWIDTH=IWIDTH+1, SHIFT=%d;\n"
"\tinput\t\ti_clk, %s, i_ce, i_sync;\n"
"\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n"
"\toutput\treg\t[(2*OWIDTH-1):0]\to_left, o_right;\n"
"\toutput\treg\t\t\to_sync;\n"
"\n", (dbg)?"_dbg":"", resetw.c_str(), (dbg)?", o_dbg":"",
TST_DBLSTAGE_IWIDTH, TST_DBLSTAGE_SHIFT,
resetw.c_str());
 
if (dbg) { fprintf(fp, "\toutput\twire\t[33:0]\t\t\to_dbg;\n"
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_left[(2*OWIDTH-1):(2*OWIDTH-16)],\n"
"\t\t\t\t\to_left[(OWIDTH-1):(OWIDTH-16)] };\n"
"\n");
}
fprintf(fp,
"\twire\tsigned\t[(IWIDTH-1):0]\ti_in_0r, i_in_0i, i_in_1r, i_in_1i;\n"
"\tassign\ti_in_0r = i_left[(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\ti_in_0i = i_left[(IWIDTH-1):0];\n"
"\tassign\ti_in_1r = i_right[(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\ti_in_1i = i_right[(IWIDTH-1):0];\n"
"\twire\t[(OWIDTH-1):0]\t\to_out_0r, o_out_0i,\n"
"\t\t\t\t\to_out_1r, o_out_1i;\n"
"\n"
"\n"
"\t// Handle a potential rounding situation, when IWIDTH>=OWIDTH.\n"
"\n"
"\n");
fprintf(fp,
"\n"
"\t// As with any register connected to the sync pulse, these must\n"
"\t// have initial values and be reset on the %s signal.\n"
"\t// Other data values need only restrict their updates to i_ce\n"
"\t// enabled clocks, but sync\'s must obey resets and initial\n"
"\t// conditions as well.\n"
"\treg\trnd_sync, r_sync;\n"
"\n"
"\tinitial\trnd_sync = 1\'b0; // Sync into rounding\n"
"\tinitial\tr_sync = 1\'b0; // Sync coming out\n",
resetw.c_str());
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\tbegin\n"
"\t\t\trnd_sync <= 1\'b0;\n"
"\t\t\tr_sync <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\trnd_sync <= i_sync;\n"
"\t\t\tr_sync <= rnd_sync;\n"
"\t\tend\n"
"\n"
"\t// As with other variables, these are really only updated when in\n"
"\t// the processing pipeline, after the first i_sync. However, to\n"
"\t// eliminate as much unnecessary logic as possible, we toggle\n"
"\t// these any time the i_ce line is enabled, and don\'t reset\n"
"\t// them on %s.\n", resetw.c_str());
fprintf(fp,
"\t// Don't forget that we accumulate a bit by adding two values\n"
"\t// together. Therefore our intermediate value must have one more\n"
"\t// bit than the two originals.\n"
"\treg\tsigned\t[(IWIDTH):0]\trnd_in_0r, rnd_in_0i;\n"
"\treg\tsigned\t[(IWIDTH):0]\trnd_in_1r, rnd_in_1i;\n\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t//\n"
"\t\t\trnd_in_0r <= i_in_0r + i_in_1r;\n"
"\t\t\trnd_in_0i <= i_in_0i + i_in_1i;\n"
"\t\t\t//\n"
"\t\t\trnd_in_1r <= i_in_0r - i_in_1r;\n"
"\t\t\trnd_in_1i <= i_in_0i - i_in_1i;\n"
"\t\t\t//\n"
"\t\tend\n"
"\n");
fprintf(fp,
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0r(i_clk, i_ce,\n"
"\t\t\t\t\t\t\trnd_in_0r, o_out_0r);\n\n", rnd_string);
fprintf(fp,
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0i(i_clk, i_ce,\n"
"\t\t\t\t\t\t\trnd_in_0i, o_out_0i);\n\n", rnd_string);
fprintf(fp,
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1r(i_clk, i_ce,\n"
"\t\t\t\t\t\t\trnd_in_1r, o_out_1r);\n\n", rnd_string);
fprintf(fp,
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1i(i_clk, i_ce,\n"
"\t\t\t\t\t\t\trnd_in_1i, o_out_1i);\n\n", rnd_string);
 
fprintf(fp, "\n"
"\t// Prior versions of this routine did not include the extra\n"
"\t// clock and register/flip-flops that this routine requires.\n"
"\t// These are placed in here to correct a bug in Verilator, that\n"
"\t// otherwise struggles. (Hopefully this will fix the problem ...)\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\to_left <= { o_out_0r, o_out_0i };\n"
"\t\t\to_right <= { o_out_1r, o_out_1i };\n"
"\t\tend\n"
"\n"
"\tinitial\to_sync = 1\'b0; // Final sync coming out of module\n");
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\t\to_sync <= 1\'b0;\n"
"\t\telse if (i_ce)\n"
"\t\t\to_sync <= r_sync;\n"
"\n"
"endmodule\n");
fclose(fp);
}
 
void build_stage(const char *fname,
int stage, int nwide, int offset,
int nbits, int xtra, int ckpce,
const bool async_reset, const bool dbg) {
FILE *fstage = fopen(fname, "w");
int cbits = nbits + xtra;
 
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
if (((unsigned)cbits * 2u) >= sizeof(long long)*8) {
fprintf(stderr, "ERROR: CMEM Coefficient precision requested overflows long long data type.\n");
exit(-1);
}
 
if (fstage == NULL) {
fprintf(stderr, "ERROR: Could not open %s for writing!\n", fname);
perror("O/S Err was:");
fprintf(stderr, "Attempting to continue, but this file will be missing.\n");
return;
}
 
fprintf(fstage,
SLASHLINE
"//\n"
"// Filename:\tfftstage%s.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tThis file is (almost) a Verilog source file. It is meant to\n"
"// be used by a FFT core compiler to generate FFTs which may be\n"
"// used as part of an FFT core. Specifically, this file encapsulates\n"
"// the options of an FFT-stage. For any 2^N length FFT, there shall be\n"
"// (N-1) of these stages.\n"
"//\n"
"//\n"
"// Operation:\n"
"// Given a stream of values, operate upon them as though they were\n"
"// value pairs, x[n] and x[n+N/2]. The stream begins when n=0, and ends\n"
"// when n=N/2-1 (i.e. there's a full set of N values). When the value\n"
"// x[0] enters, the synchronization input, i_sync, must be true as well.\n"
"//\n"
"// For this stream, produce outputs\n"
"// y[n ] = x[n] + x[n+N/2], and\n"
"// y[n+N/2] = (x[n] - x[n+N/2]) * c[n],\n"
"// where c[n] is a complex coefficient found in the\n"
"// external memory file COEFFILE.\n"
"// When y[0] is output, a synchronization bit o_sync will be true as\n"
"// well, otherwise it will be zero.\n"
"//\n"
"// Most of the work to do this is done within the butterfly, whether the\n"
"// hardware accelerated butterfly (uses a DSP) or not.\n"
"//\n%s"
"//\n",
(dbg)?"_dbg":"", prjname, creator);
fprintf(fstage, "%s", cpyleft);
fprintf(fstage, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fstage, "module\tfftstage%s(i_clk, %s, i_ce, i_sync, i_data, o_data, o_sync%s);\n",
(dbg)?"_dbg":"", resetw.c_str(),
(dbg)?", o_dbg":"");
// These parameter values are useless at this point--they are to be
// replaced by the parameter values in the calling program. Only
// problem is, the CWIDTH needs to match exactly!
fprintf(fstage, "\tparameter\tIWIDTH=%d,CWIDTH=%d,OWIDTH=%d;\n",
nbits, 20, nbits+1); // 20, not cbits, since the tb depends upon it
fprintf(fstage,
"\t// Parameters specific to the core that should be changed when this\n"
"\t// core is built ... Note that the minimum LGSPAN (the base two log\n"
"\t// of the span, or the base two log of the current FFT size) is 3.\n"
"\t// Smaller spans (i.e. the span of 2) must use the dbl laststage module.\n"
"\tparameter\tLGWIDTH=%d, LGSPAN=%d, BFLYSHIFT=0;\n"
"\tparameter\t[0:0] OPT_HWMPY = 1;\n",
lgval(stage), (nwide <= 1) ? lgval(stage)-1 : lgval(stage)-2);
fprintf(fstage,
"\t// Clocks per CE. If your incoming data rate is less than 50%% of your\n"
"\t// clock speed, you can set CKPCE to 2\'b10, make sure there's at least\n"
"\t// one clock between cycles when i_ce is high, and then use two\n"
"\t// multiplies instead of three. If you set CKPCE to 2\'b11, and insist\n"
"\t// on at least two clocks with i_ce low between cycles with i_ce high,\n"
"\t// then the hardware optimized butterfly code will use one multiply\n"
"\t// instead of two.\n"
"\tparameter\t CKPCE = %d;\n", ckpce);
 
fprintf(fstage,
"\t// The COEFFILE parameter contains the name of the file containing the\n"
"\t// FFT twiddle factors\n");
if (nwide == 2) {
fprintf(fstage, "\tparameter\tCOEFFILE=\"cmem_%c%d.hex\";\n",
(offset)?'o':'e', stage*2);
} else
fprintf(fstage, "\tparameter\tCOEFFILE=\"cmem_%d.hex\";\n",
stage);
 
fprintf(fstage,"\n"
"`ifdef VERILATOR\n"
"\tparameter [0:0] ZERO_ON_IDLE = 1'b0;\n"
"`else\n"
"\tlocalparam [0:0] ZERO_ON_IDLE = 1'b0;\n"
"`endif // VERILATOR\n\n");
 
fprintf(fstage,
"\tinput i_clk, %s, i_ce, i_sync;\n"
"\tinput [(2*IWIDTH-1):0] i_data;\n"
"\toutput reg [(2*OWIDTH-1):0] o_data;\n"
"\toutput reg o_sync;\n"
"\n", resetw.c_str());
if (dbg) { fprintf(fstage, "\toutput\twire\t[33:0]\t\t\to_dbg;\n"
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_data[(2*OWIDTH-1):(2*OWIDTH-16)],\n"
"\t\t\t\t\to_data[(OWIDTH-1):(OWIDTH-16)] };\n"
"\n");
}
fprintf(fstage,
"\treg wait_for_sync;\n"
"\treg [(2*IWIDTH-1):0] ib_a, ib_b;\n"
"\treg [(2*CWIDTH-1):0] ib_c;\n"
"\treg ib_sync;\n"
"\n"
"\treg b_started;\n"
"\twire ob_sync;\n"
"\twire [(2*OWIDTH-1):0]\tob_a, ob_b;\n");
fprintf(fstage,
"\n"
"\t// cmem is defined as an array of complex values,\n"
"\t// where the top CWIDTH bits are the real value and the bottom\n"
"\t// CWIDTH bits are the imaginary value.\n"
"\t//\n"
"\t// cmem[i] = { (2^(CWIDTH-2)) * cos(2*pi*i/(2^LGWIDTH)),\n"
"\t// (2^(CWIDTH-2)) * sin(2*pi*i/(2^LGWIDTH)) };\n"
"\t//\n"
"\treg [(2*CWIDTH-1):0] cmem [0:((1<<LGSPAN)-1)];\n"
"\tinitial\t$readmemh(COEFFILE,cmem);\n\n");
 
// gen_coeff_file(coredir, fname, stage, cbits, nwide, offset, inv);
 
fprintf(fstage,
"\treg [(LGSPAN):0] iaddr;\n"
"\treg [(2*IWIDTH-1):0] imem [0:((1<<LGSPAN)-1)];\n"
"\n"
"\treg [LGSPAN:0] oB;\n"
"\treg [(2*OWIDTH-1):0] omem [0:((1<<LGSPAN)-1)];\n"
"\n"
"\tinitial wait_for_sync = 1\'b1;\n"
"\tinitial iaddr = 0;\n");
if (async_reset)
fprintf(fstage, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fstage, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
 
fprintf(fstage,
"\tbegin\n"
"\t\t\twait_for_sync <= 1\'b1;\n"
"\t\t\tiaddr <= 0;\n"
"\tend else if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n"
"\tbegin\n"
"\t\t//\n"
"\t\t// First step: Record what we\'re not ready to use yet\n"
"\t\t//\n"
"\t\tiaddr <= iaddr + { {(LGSPAN){1\'b0}}, 1\'b1 };\n"
"\t\twait_for_sync <= 1\'b0;\n"
"\tend\n"
"\talways @(posedge i_clk) // Need to make certain here that we don\'t read\n"
"\tif ((i_ce)&&(!iaddr[LGSPAN])) // and write the same address on\n"
"\t\timem[iaddr[(LGSPAN-1):0]] <= i_data; // the same clk\n"
"\n");
 
fprintf(fstage,
"\t//\n"
"\t// Now, we have all the inputs, so let\'s feed the butterfly\n"
"\t//\n"
"\tinitial ib_sync = 1\'b0;\n");
if (async_reset)
fprintf(fstage, "\talways @(posedge i_clk, negedge i_areset_n)\n\tif (!i_areset_n)\n");
else
fprintf(fstage, "\talways @(posedge i_clk)\n\tif (i_reset)\n");
fprintf(fstage,
"\t\tib_sync <= 1\'b0;\n"
"\telse if (i_ce)\n"
"\tbegin\n"
"\t\t// Set the sync to true on the very first\n"
"\t\t// valid input in, and hence on the very\n"
"\t\t// first valid data out per FFT.\n"
"\t\tib_sync <= (iaddr==(1<<(LGSPAN)));\n"
"\tend\n\n"
"\talways\t@(posedge i_clk)\n"
"\tif (i_ce)\n"
"\tbegin\n"
"\t\t// One input from memory, ...\n"
"\t\tib_a <= imem[iaddr[(LGSPAN-1):0]];\n"
"\t\t// One input clocked in from the top\n"
"\t\tib_b <= i_data;\n"
"\t\t// and the coefficient or twiddle factor\n"
"\t\tib_c <= cmem[iaddr[(LGSPAN-1):0]];\n"
"\tend\n\n");
 
fprintf(fstage,
"\t// The idle register is designed to keep track of when an input\n"
"\t// to the butterfly is important and going to be used. It's used\n"
"\t// in a flag following, so that when useful values are placed\n"
"\t// into the butterfly they'll be non-zero (idle=0), otherwise when\n"
"\t// the inputs to the butterfly are irrelevant and will be ignored,\n"
"\t// then (idle=1) those inputs will be set to zero. This\n"
"\t// functionality is not designed to be used in operation, but only\n"
"\t// within a Verilator simulation context when chasing a bug.\n"
"\t// In this limited environment, the non-zero answers will stand\n"
"\t// in a trace making it easier to highlight a bug.\n"
"\treg idle;\n"
"\tgenerate if (ZERO_ON_IDLE)\n"
"\tbegin\n"
"\t\tinitial idle = 1;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_reset)\n"
"\t\t\tidle <= 1\'b1;\n"
"\t\telse if (i_ce)\n"
"\t\t\tidle <= (!iaddr[LGSPAN])&&(!wait_for_sync);\n\n"
"\tend else begin\n\n"
"\t\talways @(*) idle = 0;\n\n"
"\tend endgenerate\n\n");
 
fprintf(fstage,
"\tgenerate if (OPT_HWMPY)\n"
"\tbegin : HWBFLY\n"
"\t\thwbfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),\n"
"\t\t\t\t.CKPCE(CKPCE), .SHIFT(BFLYSHIFT))\n"
"\t\t\tbfly(i_clk, %s, i_ce, (idle)?0:ib_c,\n"
"\t\t\t\t(idle || (!i_ce)) ? 0:ib_a,\n"
"\t\t\t\t(idle || (!i_ce)) ? 0:ib_b,\n"
"\t\t\t\t(ib_sync)&&(i_ce),\n"
"\t\t\t\tob_a, ob_b, ob_sync);\n"
"\tend else begin : FWBFLY\n"
"\t\tbutterfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),\n"
"\t\t\t\t.CKPCE(CKPCE),.SHIFT(BFLYSHIFT))\n"
"\t\t\tbfly(i_clk, %s, i_ce,\n"
"\t\t\t\t\t(idle||(!i_ce))?0:ib_c,\n"
"\t\t\t\t\t(idle||(!i_ce))?0:ib_a,\n"
"\t\t\t\t\t(idle||(!i_ce))?0:ib_b,\n"
"\t\t\t\t\t(ib_sync&&i_ce),\n"
"\t\t\t\t\tob_a, ob_b, ob_sync);\n"
"\tend endgenerate\n\n",
resetw.c_str(), resetw.c_str());
 
fprintf(fstage,
"\t//\n"
"\t// Next step: recover the outputs from the butterfly\n"
"\t//\n"
"\tinitial oB = 0;\n"
"\tinitial o_sync = 0;\n"
"\tinitial b_started = 0;\n");
if (async_reset)
fprintf(fstage, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fstage, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fstage,
"\tbegin\n"
"\t\toB <= 0;\n"
"\t\to_sync <= 0;\n"
"\t\tb_started <= 0;\n"
"\tend else if (i_ce)\n"
"\tbegin\n"
"\t\to_sync <= (!oB[LGSPAN])?ob_sync : 1\'b0;\n"
"\t\tif (ob_sync||b_started)\n"
"\t\t\toB <= oB + { {(LGSPAN){1\'b0}}, 1\'b1 };\n"
"\t\tif ((ob_sync)&&(!oB[LGSPAN]))\n"
"\t\t// A butterfly output is available\n"
"\t\t\tb_started <= 1\'b1;\n"
"\tend\n\n");
fprintf(fstage,
"\treg [(LGSPAN-1):0]\t\tdly_addr;\n"
"\treg [(2*OWIDTH-1):0]\tdly_value;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_ce)\n"
"\tbegin\n"
"\t\tdly_addr <= oB[(LGSPAN-1):0];\n"
"\t\tdly_value <= ob_b;\n"
"\tend\n"
"\talways @(posedge i_clk)\n"
"\tif (i_ce)\n"
"\t\tomem[dly_addr] <= dly_value;\n"
"\n");
fprintf(fstage,
"\talways @(posedge i_clk)\n"
"\tif (i_ce)\n"
"\t\to_data <= (!oB[LGSPAN])?ob_a : omem[oB[(LGSPAN-1):0]];\n"
"\n");
fprintf(fstage, "endmodule\n");
fclose(fstage);
}
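//
// Illustrative sketch (not part of the original repository): a floating-point
// model of the per-stage arithmetic described in the fftstage header comment,
// y[n] = x[n] + x[n+N/2] and y[n+N/2] = (x[n] - x[n+N/2]) * c[n]. It assumes
// ideal arithmetic, so it ignores the fixed-point rounding, bit growth, and
// pipeline delay of the generated Verilog. The name stage_model() and the
// cplx typedef are hypothetical, chosen only for this example.
//
```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cplx;

// Apply one decimation-in-frequency stage to a block of N samples.
// `x` holds N samples; `c` holds at least N/2 twiddle factors.
static std::vector<cplx> stage_model(const std::vector<cplx> &x,
		const std::vector<cplx> &c) {
	const std::size_t half = x.size() / 2;
	std::vector<cplx> y(x.size());
	for (std::size_t n = 0; n < half; n++) {
		// Sum path: no multiply required
		y[n] = x[n] + x[n + half];
		// Difference path: multiplied by the twiddle factor
		y[n + half] = (x[n] - x[n + half]) * c[n];
	}
	return y;
}
```
// A model like this might serve as a reference when checking a simulated
// stage, provided the comparison allows for the hardware's rounding.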
/trunk/sw/bldstage.h
0,0 → 1,52
////////////////////////////////////////////////////////////////////////////////
//
// Filename: bldstage.h
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose:
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#ifndef BLDSTAGE_H
#define BLDSTAGE_H
 
#include "rounding.h"
 
extern void build_dblstage(const char *fname, ROUND_T rounding,
const bool async_reset = false, const bool dbg = false);
 
extern void build_stage(const char *fname,
int stage, int nwide, int offset,
int nbits, int xtra, int ckpce,
const bool async_reset = false,
const bool dbg=false);
 
#endif // BLDSTAGE_H
/trunk/sw/butterfly.cpp
0,0 → 1,1800
////////////////////////////////////////////////////////////////////////////////
//
// Filename: butterfly.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose:	Generates the Verilog for the FFT butterflies. See
//		build_butterfly() and build_hwbfly() below.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen
#include <stdio.h>
#include <stdlib.h>
 
#ifdef _MSC_VER // added for ms vs compatibility
 
#include <io.h>
#include <direct.h>
#define _USE_MATH_DEFINES
#define R_OK 4 /* Test for read permission. */
#define W_OK 2 /* Test for write permission. */
#define X_OK 0 /* !!!!!! execute permission - unsupported in windows*/
#define F_OK 0 /* Test for existence. */
 
#if _MSC_VER <= 1700
 
int lstat(const char *filename, struct stat *buf) { return 1; };
#define S_ISDIR(A) 0
 
#else
 
#define lstat _stat
#define S_ISDIR _S_IFDIR
 
#endif
 
#define mkdir(A,B) _mkdir(A)
 
#define access _access
 
#else
// And for G++/Linux environment
 
#include <unistd.h> // Defines the R_OK/W_OK/etc. macros
#include <sys/stat.h>
#endif
 
#include <string.h>
#include <string>
#include <math.h>
#include <ctype.h>
#include <assert.h>
 
#include "defaults.h"
#include "legal.h"
#include "rounding.h"
#include "fftlib.h"
#include "bldstage.h"
#include "bitreverse.h"
#include "softmpy.h"
#include "butterfly.h"
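
// Illustrative sketch, not part of the original generator: the helper below
// mirrors, in plain C++, the localparam arithmetic that build_butterfly()
// writes into butterfly.v (MXMPYBITS, MPYDELAY, LCLDELAY, LGDELAY, AUXLEN
// and MPYREMAINDER), so the delays for a given configuration can be checked
// without running a Verilog tool. The names bfly_sketch, BflyDelays and
// bfly_delays() are hypothetical. The sketch assumes positive widths and
// ckpce >= 1, where C++ integer division truncates just as the Verilog
// constant expressions do.
namespace bfly_sketch {

struct BflyDelays {
	int mxmpybits, mpydelay, lcldelay, lgdelay, auxlen, mpyremainder;
};

inline BflyDelays bfly_delays(int iwidth, int cwidth, int ckpce) {
	BflyDelays d;
	// The narrower of the two (sign-extended) multiplier inputs
	d.mxmpybits = ((iwidth + 2) > (cwidth + 1)) ? (cwidth + 1) : (iwidth + 2);
	// Clocks taken by the multiply itself
	d.mpydelay = ((d.mxmpybits + 1) / 2) + 2;
	// The same delay, measured in i_ce's rather than in clocks
	d.lcldelay = (ckpce == 1) ? d.mpydelay
		: (ckpce == 2) ? (d.mpydelay / 2 + 2)
		: (d.mpydelay / 3 + 2);
	// Address width of the FIFO that delays the sum side
	d.lgdelay = (d.mpydelay > 64) ? 7
		: (d.mpydelay > 32) ? 6
		: (d.mpydelay > 16) ? 5
		: (d.mpydelay > 8) ? 4
		: (d.mpydelay > 4) ? 3
		: 2;
	d.auxlen = d.lcldelay + 3;
	d.mpyremainder = d.mpydelay - ckpce * (d.mpydelay / ckpce);
	return d;
}

} // namespace bfly_sketch
// Example: bfly_sketch::bfly_delays(16, 20, 2) describes a butterfly with
// IWIDTH=16, CWIDTH=20 and CKPCE=2.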
 
void build_butterfly(const char *fname, int xtracbits, ROUND_T rounding,
int ckpce, const bool async_reset) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was");
return;
}
const char *rnd_string;
if (rounding == RND_TRUNCATE)
rnd_string = "truncate";
else if (rounding == RND_FROMZERO)
rnd_string = "roundfromzero";
else if (rounding == RND_HALFUP)
rnd_string = "roundhalfup";
else
rnd_string = "convround";
 
//if (ckpce >= 3)
//ckpce = 3;
if (ckpce <= 1)
ckpce = 1;
 
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\tbutterfly.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tThis routine caculates a butterfly for a decimation\n"
"// in frequency version of an FFT. Specifically, given\n"
"// complex Left and Right values together with a coefficient, the output\n"
"// of this routine is given by:\n"
"//\n"
"// L' = L + R\n"
"// R' = (L - R)*C\n"
"//\n"
"// The rest of the junk below handles timing (mostly), to make certain\n"
"// that L' and R' reach the output at the same clock. Further, just to\n"
"// make certain that is the case, an 'aux' input exists. This aux value\n"
"// will come out of this routine synchronized to the values it came in\n"
"// with. (i.e., both L', R', and aux all have the same delay.) Hence,\n"
"// a caller of this routine may set aux on the first input with valid\n"
"// data, and then wait to see aux set on the output to know when to find\n"
"// the first output with valid data.\n"
"//\n"
"// All bits are preserved until the very last clock, where any more bits\n"
"// than OWIDTH will be quietly discarded.\n"
"//\n"
"// This design features no overflow checking.\n"
"//\n"
"// Notes:\n"
"// CORDIC:\n"
"// Much as we might like, we can't use a cordic here.\n"
"// The goal is to accomplish an FFT, as defined, and a\n"
"// CORDIC places a scale factor onto the data. Removing\n"
"// the scale factor would cost two multiplies, which\n"
"// is precisely what we are trying to avoid.\n"
"//\n"
"//\n"
"// 3-MULTIPLIES:\n"
"// It should also be possible to do this with three multiplies\n"
"// and an extra two addition cycles.\n"
"//\n"
"// We want\n"
"// R+I = (a + jb) * (c + jd)\n"
"// R+I = (ac-bd) + j(ad+bc)\n"
"// We multiply\n"
"// P1 = ac\n"
"// P2 = bd\n"
"// P3 = (a+b)(c+d)\n"
"// Then\n"
"// R+I=(P1-P2)+j(P3-P2-P1)\n"
"//\n"
"// WIDTHS:\n"
"// On multiplying an X width number by an\n"
"// Y width number, X>Y, the result should be (X+Y)\n"
"// bits, right?\n"
"// -2^(X-1) <= a <= 2^(X-1) - 1\n"
"// -2^(Y-1) <= b <= 2^(Y-1) - 1\n"
"// (2^(Y-1)-1)*(-2^(X-1)) <= ab <= 2^(X-1)2^(Y-1)\n"
"// -2^(X+Y-2)+2^(X-1) <= ab <= 2^(X+Y-2) <= 2^(X+Y-1) - 1\n"
"// -2^(X+Y-1) <= ab <= 2^(X+Y-1)-1\n"
"// YUP! But just barely. Do this and you'll really want\n"
"// to drop a bit, although you will risk overflow in so\n"
"// doing.\n"
"//\n"
"// 20150602 -- The sync logic lines have been completely redone. The\n"
"// synchronization lines no longer go through the FIFO with the\n"
"// left hand sum, but are kept out of memory. This allows the\n"
"// butterfly to use more optimal memory resources, while also\n"
"// guaranteeing that the sync lines can be properly reset upon\n"
"// any reset signal.\n"
"//\n"
"//\n%s"
"//\n", prjname, creator);
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
 
fprintf(fp,
"module\tbutterfly(i_clk, %s, i_ce, i_coef, i_left, i_right, i_aux,\n"
"\t\to_left, o_right, o_aux);\n"
"\t// Public changeable parameters ...\n", resetw.c_str());
 
fprintf(fp,
"\tparameter IWIDTH=%d,", TST_BUTTERFLY_IWIDTH);
#ifdef TST_BUTTERFLY_CWIDTH
fprintf(fp, "CWIDTH=%d,", TST_BUTTERFLY_CWIDTH);
#else
fprintf(fp, "CWIDTH=IWIDTH+%d,", xtracbits);
#endif
#ifdef TST_BUTTERFLY_OWIDTH
fprintf(fp, "OWIDTH=%d;\n", TST_BUTTERFLY_OWIDTH);
// OWIDTH = TST_BUTTERFLY_OWIDTH;
#else
fprintf(fp, "OWIDTH=IWIDTH+1;\n");
#endif
fprintf(fp, "\tparameter\tSHIFT=0;\n");
 
fprintf(fp,
"\t// The number of clocks per each i_ce. The actual number can be\n"
"\t// more, but the algorithm depends upon at least this many for\n"
"\t// extra internal processing.\n"
"\tparameter CKPCE=%d;\n", ckpce);
 
fprintf(fp,
"\t//\n"
"\t// Local/derived parameters that are calculated from the above\n"
"\t// params. Apart from algorithmic changes below, these should not\n"
"\t// be adjusted\n"
"\t//\n"
"\t// The first step is to calculate how many clocks it takes our\n"
"\t// multiply to come back with an answer within. The time in the\n"
"\t// multiply depends upon the input value with the fewest number of\n"
"\t// bits--to keep the pipeline depth short. So, let's find the\n"
"\t// fewest number of bits here.\n"
"\tlocalparam MXMPYBITS = \n"
"\t\t((IWIDTH+2)>(CWIDTH+1)) ? (CWIDTH+1) : (IWIDTH + 2);\n"
"\t//\n"
"\t// Given this \"fewest\" number of bits, we can calculate the\n"
"\t// number of clocks the multiply itself will take.\n"
"\tlocalparam MPYDELAY=((MXMPYBITS+1)/2)+2;\n"
"\t//\n"
"\t// In an environment when CKPCE > 1, the multiply delay isn\'t\n"
"\t// necessarily the delay felt by this algorithm--measured in\n"
"\t// i_ce\'s. In particular, if the multiply can operate with more\n"
"\t// operations per clock, it can appear to finish \"faster\".\n"
"\t// Since most of the logic in this core operates on the slower\n"
"\t// clock, we'll need to map that speed into the number of slower\n"
"\t// clock ticks that it takes.\n"
"\tlocalparam LCLDELAY = (CKPCE == 1) ? MPYDELAY\n"
"\t\t: (CKPCE == 2) ? (MPYDELAY/2+2)\n"
"\t\t: (MPYDELAY/3 + 2);\n"
"\tlocalparam LGDELAY = (MPYDELAY>64) ? 7\n"
"\t\t\t: (MPYDELAY > 32) ? 6\n"
"\t\t\t: (MPYDELAY > 16) ? 5\n"
"\t\t\t: (MPYDELAY > 8) ? 4\n"
"\t\t\t: (MPYDELAY > 4) ? 3\n"
"\t\t\t: 2;\n"
"\tlocalparam AUXLEN=(LCLDELAY+3);\n"
"\tlocalparam MPYREMAINDER = MPYDELAY - CKPCE*(MPYDELAY/CKPCE);\n"
"\n\n");
 
 
fprintf(fp,
"\tinput\t\ti_clk, %s, i_ce;\n"
"\tinput\t\t[(2*CWIDTH-1):0] i_coef;\n"
"\tinput\t\t[(2*IWIDTH-1):0] i_left, i_right;\n"
"\tinput\t\ti_aux;\n"
"\toutput\twire [(2*OWIDTH-1):0] o_left, o_right;\n"
"\toutput\treg\to_aux;\n\n", resetw.c_str());
fprintf(fp,
"\treg\t[(2*IWIDTH-1):0]\tr_left, r_right;\n"
"\treg\t[(2*CWIDTH-1):0]\tr_coef, r_coef_2;\n"
"\twire\tsigned\t[(IWIDTH-1):0]\tr_left_r, r_left_i, r_right_r, r_right_i;\n"
"\tassign\tr_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_left_i = r_left[ (IWIDTH-1):0];\n"
"\tassign\tr_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_right_i = r_right[(IWIDTH-1):0];\n"
"\n"
"\treg\tsigned\t[(IWIDTH):0]\tr_sum_r, r_sum_i, r_dif_r, r_dif_i;\n"
"\n"
"\treg [(LGDELAY-1):0] fifo_addr;\n"
"\twire [(LGDELAY-1):0] fifo_read_addr;\n"
"\tassign\tfifo_read_addr = fifo_addr - LCLDELAY[(LGDELAY-1):0];\n"
"\treg [(2*IWIDTH+1):0] fifo_left [ 0:((1<<LGDELAY)-1)];\n"
"\n");
fprintf(fp,
"\t// Set up the input to the multiply\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// One clock just latches the inputs\n"
"\t\t\tr_left <= i_left; // No change in # of bits\n"
"\t\t\tr_right <= i_right;\n"
"\t\t\tr_coef <= i_coef;\n"
"\t\t\t// Next clock adds/subtracts\n"
"\t\t\tr_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits\n"
"\t\t\tr_sum_i <= r_left_i + r_right_i;\n"
"\t\t\tr_dif_r <= r_left_r - r_right_r;\n"
"\t\t\tr_dif_i <= r_left_i - r_right_i;\n"
"\t\t\t// Other inputs are simply delayed on second clock\n"
"\t\t\tr_coef_2<= r_coef;\n"
"\t\tend\n"
"\n");
fprintf(fp,
"\t// Don\'t forget to record the even side, since it doesn\'t need\n"
"\t// to be multiplied, but yet we still need the results in sync\n"
"\t// with the answer when it is ready.\n"
"\tinitial fifo_addr = 0;\n");
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\t\tfifo_addr <= 0;\n"
"\t\telse if (i_ce)\n"
"\t\t\t// Need to delay the sum side--nothing else happens\n"
"\t\t\t// to it, but it needs to stay synchronized with the\n"
"\t\t\t// right side.\n"
"\t\t\tfifo_addr <= fifo_addr + 1;\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\tfifo_left[fifo_addr] <= { r_sum_r, r_sum_i };\n"
"\n"
"\twire\tsigned\t[(CWIDTH-1):0] ir_coef_r, ir_coef_i;\n"
"\tassign\tir_coef_r = r_coef_2[(2*CWIDTH-1):CWIDTH];\n"
"\tassign\tir_coef_i = r_coef_2[(CWIDTH-1):0];\n"
"\twire\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\tp_one, p_two, p_three;\n"
"\n"
"\n");
fprintf(fp,
"\t// Multiply output is always a width of the sum of the widths of\n"
"\t// the two inputs. ALWAYS. This is independent of the number of\n"
"\t// bits in p_one, p_two, or p_three. These values needed to\n"
"\t// accumulate a bit (or two) each. However, this approach to a\n"
"\t// three multiply complex multiply cannot increase the total\n"
"\t// number of bits in our final output. We\'ll take care of\n"
"\t// dropping back down to the proper width, OWIDTH, in our routine\n"
"\t// below.\n"
"\n"
"\n");
fprintf(fp,
"\t// We accomplish here \"Karatsuba\" multiplication. That is,\n"
"\t// by doing three multiplies we accomplish the work of four.\n"
"\t// Let\'s prove to ourselves that this works ... We wish to\n"
"\t// multiply: (a+jb) * (c+jd), where a+jb is given by\n"
"\t//\ta + jb = r_dif_r + j r_dif_i, and\n"
"\t//\tc + jd = ir_coef_r + j ir_coef_i.\n"
"\t// We do this by calculating the intermediate products P1, P2,\n"
"\t// and P3 as\n"
"\t//\tP1 = ac\n"
"\t//\tP2 = bd\n"
"\t//\tP3 = (a + b) * (c + d)\n"
"\t// and then complete our final answer with\n"
"\t//\tac - bd = P1 - P2 (this checks)\n"
"\t//\tad + bc = P3 - P2 - P1\n"
"\t//\t = (ac + bc + ad + bd) - bd - ac\n"
"\t//\t = bc + ad (this checks)\n"
"\n"
"\n");
fprintf(fp,
"\t// This should really be based upon an IF, such as in\n"
"\t// if (IWIDTH < CWIDTH) then ...\n"
"\t// However, this is the only (other) way I know to do it.\n"
"\tgenerate if (CKPCE <= 1)\n"
"\tbegin\n"
"\n"
"\t\twire\t[(CWIDTH):0]\tp3c_in;\n"
"\t\twire\t[(IWIDTH+1):0]\tp3d_in;\n"
"\t\tassign\tp3c_in = ir_coef_i + ir_coef_r;\n"
"\t\tassign\tp3d_in = r_dif_r + r_dif_i;\n"
"\n"
"\t\t// We need to pad these first two multiplies by an extra\n"
"\t\t// bit just to keep them aligned with the third,\n"
"\t\t// simpler, multiply.\n"
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) p1(i_clk, i_ce,\n"
"\t\t\t\t{ir_coef_r[CWIDTH-1],ir_coef_r},\n"
"\t\t\t\t{r_dif_r[IWIDTH],r_dif_r}, p_one);\n"
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) p2(i_clk, i_ce,\n"
"\t\t\t\t{ir_coef_i[CWIDTH-1],ir_coef_i},\n"
"\t\t\t\t{r_dif_i[IWIDTH],r_dif_i}, p_two);\n"
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) p3(i_clk, i_ce,\n"
"\t\t\t\tp3c_in, p3d_in, p_three);\n"
"\n");
 
///////////////////////////////////////////
///
/// Two clocks per CE, so CE, no-ce, CE, no-ce, etc
///
fprintf(fp,
"\tend else if (CKPCE == 2)\n"
"\tbegin : CKPCE_TWO\n"
"\t\t// Coefficient multiply inputs\n"
"\t\treg [2*(CWIDTH)-1:0] mpy_pipe_c;\n"
"\t\t// Data multiply inputs\n"
"\t\treg [2*(IWIDTH+1)-1:0] mpy_pipe_d;\n"
"\t\twire signed [(CWIDTH-1):0] mpy_pipe_vc;\n"
"\t\twire signed [(IWIDTH):0] mpy_pipe_vd;\n"
"\t\t//\n"
"\t\treg signed [(CWIDTH+1)-1:0] mpy_cof_sum;\n"
"\t\treg signed [(IWIDTH+2)-1:0] mpy_dif_sum;\n"
"\n"
"\t\tassign mpy_pipe_vc = mpy_pipe_c[2*(CWIDTH)-1:CWIDTH];\n"
"\t\tassign mpy_pipe_vd = mpy_pipe_d[2*(IWIDTH+1)-1:IWIDTH+1];\n"
"\n"
"\t\treg mpy_pipe_v;\n"
"\t\treg ce_phase;\n"
"\n"
"\t\treg signed [(CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;\n"
"\t\treg signed [IWIDTH+CWIDTH+3-1:0] longmpy;\n"
"\n"
"\n"
"\t\tinitial ce_phase = 1'b0;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_reset)\n"
"\t\t\tce_phase <= 1'b0;\n"
"\t\telse if (i_ce)\n"
"\t\t\tce_phase <= 1'b1;\n"
"\t\telse\n"
"\t\t\tce_phase <= 1'b0;\n"
"\n"
"\t\talways @(*)\n"
"\t\t\tmpy_pipe_v = (i_ce)||(ce_phase);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (ce_phase)\n"
"\t\tbegin\n"
"\t\t\tmpy_pipe_c[2*CWIDTH-1:0] <=\n"
"\t\t\t\t\t{ ir_coef_r, ir_coef_i };\n"
"\t\t\tmpy_pipe_d[2*(IWIDTH+1)-1:0] <=\n"
"\t\t\t\t\t{ r_dif_r, r_dif_i };\n"
"\n"
"\t\t\tmpy_cof_sum <= ir_coef_i + ir_coef_r;\n"
"\t\t\tmpy_dif_sum <= r_dif_r + r_dif_i;\n"
"\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\tmpy_pipe_c[2*(CWIDTH)-1:0] <= {\n"
"\t\t\t\tmpy_pipe_c[(CWIDTH)-1:0], {(CWIDTH){1'b0}} };\n"
"\t\t\tmpy_pipe_d[2*(IWIDTH+1)-1:0] <= {\n"
"\t\t\t\tmpy_pipe_d[(IWIDTH+1)-1:0], {(IWIDTH+1){1'b0}} };\n"
"\t\tend\n"
"\n");
fprintf(fp,
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) mpy0(i_clk, mpy_pipe_v,\n"
"\t\t\t\tmpy_cof_sum, mpy_dif_sum, longmpy);\n"
"\n");
 
fprintf(fp,
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) mpy1(i_clk, mpy_pipe_v,\n"
"\t\t\t\t{ mpy_pipe_vc[CWIDTH-1], mpy_pipe_vc },\n"
"\t\t\t\t{ mpy_pipe_vd[IWIDTH ], mpy_pipe_vd },\n"
"\t\t\t\tmpy_pipe_out);\n\n");
 
fprintf(fp,
"\t\treg\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\n"
"\t\t\t\t\trp_one, rp_two, rp_three,\n"
"\t\t\t\t\trp2_one, rp2_two, rp2_three;\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (((i_ce)&&(!MPYDELAY[0]))\n"
"\t\t\t||((ce_phase)&&(MPYDELAY[0])))\n"
"\t\t\trp_one <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (((i_ce)&&(MPYDELAY[0]))\n"
"\t\t\t||((ce_phase)&&(!MPYDELAY[0])))\n"
"\t\t\trp_two <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp_three <= longmpy;\n"
"\n"
"\t\t// Our outputs *MUST* be set on a clock where i_ce is\n"
"\t\t// true for the following logic to work. Make that\n"
"\t\t// happen here.\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp2_one<= rp_one;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp2_two <= rp_two;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp2_three<= rp_three;\n"
"\n"
"\t\tassign p_one = rp2_one;\n"
"\t\tassign p_two = (!MPYDELAY[0])? rp2_two : rp_two;\n"
"\t\tassign p_three = ( MPYDELAY[0])? rp_three : rp2_three;\n"
"\n"
"\t\t// verilator lint_off UNUSED\n"
"\t\twire\t[2*(IWIDTH+CWIDTH+3)-1:0]\tunused;\n"
"\t\tassign\tunused = { rp2_two, rp2_three };\n"
"\t\t// verilator lint_on UNUSED\n"
"\n");
 
/////////////////////////
///
/// Three clocks per CE, so CE, no-ce, no-ce*, CE
///
fprintf(fp,
"\tend else if (CKPCE <= 3)\n\tbegin : CKPCE_THREE\n");
 
fprintf(fp,
"\t\t// Coefficient multiply inputs\n"
"\t\treg\t\t[3*(CWIDTH+1)-1:0]\tmpy_pipe_c;\n"
"\t\t// Data multiply inputs\n"
"\t\treg\t\t[3*(IWIDTH+2)-1:0]\tmpy_pipe_d;\n"
"\t\twire\tsigned [(CWIDTH):0] mpy_pipe_vc;\n"
"\t\twire\tsigned [(IWIDTH+1):0] mpy_pipe_vd;\n"
"\n"
"\t\tassign\tmpy_pipe_vc = mpy_pipe_c[3*(CWIDTH+1)-1:2*(CWIDTH+1)];\n"
"\t\tassign\tmpy_pipe_vd = mpy_pipe_d[3*(IWIDTH+2)-1:2*(IWIDTH+2)];\n"
"\n"
"\t\treg\t\t\tmpy_pipe_v;\n"
"\t\treg\t\t[2:0]\tce_phase;\n"
"\n"
"\t\treg\tsigned [ (CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;\n"
"\n");
fprintf(fp,
"\t\tinitial\tce_phase = 3'b011;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_reset)\n"
"\t\t\tce_phase <= 3'b011;\n"
"\t\telse if (i_ce)\n"
"\t\t\tce_phase <= 3'b000;\n"
"\t\telse if (ce_phase != 3'b011)\n"
"\t\t\tce_phase <= ce_phase + 1'b1;\n"
"\n"
"\t\talways @(*)\n"
"\t\t\tmpy_pipe_v = (i_ce)||(ce_phase < 3'b010);\n"
"\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (ce_phase == 3\'b000)\n"
"\t\t\tbegin\n"
"\t\t\t\t// Second clock\n"
"\t\t\t\tmpy_pipe_c[3*(CWIDTH+1)-1:(CWIDTH+1)] <= {\n"
"\t\t\t\t\tir_coef_r[CWIDTH-1], ir_coef_r,\n"
"\t\t\t\t\tir_coef_i[CWIDTH-1], ir_coef_i };\n"
"\t\t\t\tmpy_pipe_c[CWIDTH:0] <= ir_coef_i + ir_coef_r;\n"
"\t\t\t\tmpy_pipe_d[3*(IWIDTH+2)-1:(IWIDTH+2)] <= {\n"
"\t\t\t\t\tr_dif_r[IWIDTH], r_dif_r,\n"
"\t\t\t\t\tr_dif_i[IWIDTH], r_dif_i };\n"
"\t\t\t\tmpy_pipe_d[(IWIDTH+2)-1:0] <= r_dif_r + r_dif_i;\n"
"\n"
"\t\t\tend else if (mpy_pipe_v)\n"
"\t\t\tbegin\n"
"\t\t\t\tmpy_pipe_c[3*(CWIDTH+1)-1:0] <= {\n"
"\t\t\t\t\tmpy_pipe_c[2*(CWIDTH+1)-1:0], {(CWIDTH+1){1\'b0}} };\n"
"\t\t\t\tmpy_pipe_d[3*(IWIDTH+2)-1:0] <= {\n"
"\t\t\t\t\tmpy_pipe_d[2*(IWIDTH+2)-1:0], {(IWIDTH+2){1\'b0}} };\n"
"\t\t\tend\n"
"\n");
fprintf(fp,
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) mpy(i_clk, mpy_pipe_v,\n"
"\t\t\t\tmpy_pipe_vc, mpy_pipe_vd, mpy_pipe_out);\n"
"\n");
 
fprintf(fp,
"\t\treg\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\n"
"\t\t\t\trp_one, rp_two, rp_three,\n"
"\t\t\t\trp2_one, rp2_two, rp2_three,\n"
"\t\t\t\trp3_one;\n"
"\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (MPYREMAINDER == 0)\n"
"\t\tbegin\n\n"
"\t\t if (i_ce)\n"
"\t\t rp_two <= mpy_pipe_out;\n"
"\t\t else if (ce_phase == 3'b000)\n"
"\t\t rp_three <= mpy_pipe_out;\n"
"\t\t else if (ce_phase == 3'b001)\n"
"\t\t rp_one <= mpy_pipe_out;\n\n"
"\t\tend else if (MPYREMAINDER == 1)\n"
"\t\tbegin\n\n"
"\t\t if (i_ce)\n"
"\t\t rp_one <= mpy_pipe_out;\n"
"\t\t else if (ce_phase == 3'b000)\n"
"\t\t rp_two <= mpy_pipe_out;\n"
"\t\t else if (ce_phase == 3'b001)\n"
"\t\t rp_three <= mpy_pipe_out;\n\n"
"\t\tend else // if (MPYREMAINDER == 2)\n"
"\t\tbegin\n\n"
"\t\t if (i_ce)\n"
"\t\t rp_three <= mpy_pipe_out;\n"
"\t\t else if (ce_phase == 3'b000)\n"
"\t\t rp_one <= mpy_pipe_out;\n"
"\t\t else if (ce_phase == 3'b001)\n"
"\t\t rp_two <= mpy_pipe_out;\n\n"
"\t\tend\n\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\trp2_one <= rp_one;\n"
"\t\t\trp2_two <= rp_two;\n"
"\t\t\trp2_three <= (MPYREMAINDER == 2) ? mpy_pipe_out : rp_three;\n"
"\t\t\trp3_one <= (MPYREMAINDER == 0) ? rp2_one : rp_one;\n"
"\t\tend\n");
fprintf(fp,
 
"\t\tassign\tp_one = rp3_one;\n"
"\t\tassign\tp_two = rp2_two;\n"
"\t\tassign\tp_three = rp2_three;\n"
"\n");
 
fprintf(fp,
"\tend endgenerate\n");
 
fprintf(fp,
"\t// These values are held in memory and delayed during the\n"
"\t// multiply. Here, we recover them. During the multiply,\n"
"\t// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},\n"
"\t// therefore, the left_x values need to be right shifted by\n"
"\t// CWIDTH-2 as well. The additional bits come from a sign\n"
"\t// extension.\n"
"\twire\tsigned\t[(IWIDTH+CWIDTH):0] fifo_i, fifo_r;\n"
"\treg\t\t[(2*IWIDTH+1):0] fifo_read;\n"
"\tassign\tfifo_r = { {2{fifo_read[2*(IWIDTH+1)-1]}}, fifo_read[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1\'b0}} };\n"
"\tassign\tfifo_i = { {2{fifo_read[(IWIDTH+1)-1]}}, fifo_read[((IWIDTH+1)-1):0], {(CWIDTH-2){1\'b0}} };\n"
"\n"
"\n"
"\treg\tsigned\t[(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;\n"
"\n");
fprintf(fp,
"\t// Let's do some rounding and remove unnecessary bits.\n"
"\t// We have (IWIDTH+CWIDTH+3) bits here, we need to drop down to\n"
"\t// OWIDTH, and SHIFT by SHIFT bits in the process. The trick is\n"
"\t// that we don\'t need (IWIDTH+CWIDTH+3) bits. We\'ve accumulated\n"
"\t// them, but the actual values will never fill all these bits.\n"
"\t// In particular, we only need:\n"
"\t//\t IWIDTH bits for the input\n"
"\t//\t +1 bit for the add/subtract\n"
"\t//\t+CWIDTH bits for the coefficient multiply\n"
"\t//\t +1 bit for the add/subtract in the complex multiply\n"
"\t//\t ------\n"
"\t//\t (IWIDTH+CWIDTH+2) bits at full precision.\n"
"\t//\n"
"\t// However, the coefficient multiply multiplied by a maximum value\n"
"\t// of 2^(CWIDTH-2). Thus, we only have\n"
"\t//\t IWIDTH bits for the input\n"
"\t//\t +1 bit for the add/subtract\n"
"\t//\t+CWIDTH-2 bits for the coefficient multiply\n"
"\t//\t +1 (optional) bit for the add/subtract in the cpx mpy.\n"
"\t//\t -------- ... multiply. (This last bit may be shifted out.)\n"
"\t//\t (IWIDTH+CWIDTH) valid output bits.\n"
"\t// Now, if the user wants to keep any extras of these (via OWIDTH),\n"
"\t// or if he wishes to arbitrarily shift some of these off (via\n"
"\t// SHIFT) we accomplish that here.\n"
"\n");
fprintf(fp,
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;\n\n");
 
fprintf(fp,
"\twire\tsigned\t[(CWIDTH+IWIDTH+3-1):0]\tleft_sr, left_si;\n"
"\tassign left_sr = { {(2){fifo_r[(IWIDTH+CWIDTH)]}}, fifo_r };\n"
"\tassign left_si = { {(2){fifo_i[(IWIDTH+CWIDTH)]}}, fifo_i };\n\n");
 
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_r(i_clk, i_ce,\n"
"\t\t\t\tleft_sr, rnd_left_r);\n\n",
rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_i(i_clk, i_ce,\n"
"\t\t\t\tleft_si, rnd_left_i);\n\n",
rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,\n"
"\t\t\t\tmpy_r, rnd_right_r);\n\n", rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,\n"
"\t\t\t\tmpy_i, rnd_right_i);\n\n", rnd_string);
fprintf(fp,
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// First clock, recover all values\n"
"\t\t\tfifo_read <= fifo_left[fifo_read_addr];\n"
"\t\t\t// These values are IWIDTH+CWIDTH+3 bits wide\n"
"\t\t\t// although they only need to be (IWIDTH+1)\n"
"\t\t\t// + (CWIDTH) bits wide. (We\'ve got two\n"
"\t\t\t// extra bits we need to get rid of.)\n"
"\t\t\tmpy_r <= p_one - p_two;\n"
"\t\t\tmpy_i <= p_three - p_one - p_two;\n"
"\t\tend\n"
"\n");
 
fprintf(fp,
"\treg\t[(AUXLEN-1):0]\taux_pipeline;\n"
"\tinitial\taux_pipeline = 0;\n");
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\t\taux_pipeline <= 0;\n"
"\t\telse if (i_ce)\n"
"\t\t\taux_pipeline <= { aux_pipeline[(AUXLEN-2):0], i_aux };\n"
"\n");
fprintf(fp,
"\tinitial o_aux = 1\'b0;\n");
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\t\to_aux <= 1\'b0;\n"
"\t\telse if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, latch for final clock\n"
"\t\t\to_aux <= aux_pipeline[AUXLEN-1];\n"
"\t\tend\n"
"\n");
 
fprintf(fp,
"\t// As a final step, we pack our outputs into two packed two\'s\n"
"\t// complement numbers per output word, so that each output word\n"
"\t// has (2*OWIDTH) bits in it, with the top half being the real\n"
"\t// portion and the bottom half being the imaginary portion.\n"
"\tassign o_left = { rnd_left_r, rnd_left_i };\n"
"\tassign o_right= { rnd_right_r,rnd_right_i};\n"
"\n");
 
if (formal_property_flag) {
fprintf(fp,
"`ifdef VERILATOR\n"
"`define FORMAL\n"
"`endif\n"
"`ifdef FORMAL\n"
"\tlocalparam F_LGDEPTH = (AUXLEN > 64) ? 7\n"
"\t\t\t: (AUXLEN > 32) ? 6\n"
"\t\t\t: (AUXLEN > 16) ? 5\n"
"\t\t\t: (AUXLEN > 8) ? 4\n"
"\t\t\t: (AUXLEN > 4) ? 3 : 2;\n\n"
"\tlocalparam F_DEPTH = AUXLEN;\n"
"\tlocalparam [F_LGDEPTH-1:0] F_D = F_DEPTH[F_LGDEPTH-1:0]-1;\n"
"\n"
"\treg signed [IWIDTH-1:0] f_dlyleft_r [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyleft_i [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyright_r [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyright_i [0:F_DEPTH-1];\n"
"\treg signed [CWIDTH-1:0] f_dlycoeff_r [0:F_DEPTH-1];\n"
"\treg signed [CWIDTH-1:0] f_dlycoeff_i [0:F_DEPTH-1];\n"
"\treg signed [F_DEPTH-1:0] f_dlyaux;\n"
"\n"
"\tinitial\tf_dlyaux[0] = 0;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t\tf_dlyaux\t<= 0;\n"
"\telse if (i_ce)\n"
"\t\tf_dlyaux\t<= { f_dlyaux[F_DEPTH-2:0], i_aux };\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif (i_ce)\n"
"\tbegin\n"
"\t f_dlyleft_r[0] <= i_left[ (2*IWIDTH-1):IWIDTH];\n"
"\t f_dlyleft_i[0] <= i_left[ ( IWIDTH-1):0];\n"
"\t f_dlyright_r[0] <= i_right[(2*IWIDTH-1):IWIDTH];\n"
"\t f_dlyright_i[0] <= i_right[( IWIDTH-1):0];\n"
"\t f_dlycoeff_r[0] <= i_coef[ (2*CWIDTH-1):CWIDTH];\n"
"\t f_dlycoeff_i[0] <= i_coef[ ( CWIDTH-1):0];\n"
"\tend\n"
"\n"
"\tgenvar k;\n"
"\tgenerate for(k=1; k<F_DEPTH; k=k+1)\n"
"\tbegin : F_PROPAGATE_DELAY_LINES\n"
"\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t f_dlyleft_r[k] <= f_dlyleft_r[ k-1];\n"
"\t\t f_dlyleft_i[k] <= f_dlyleft_i[ k-1];\n"
"\t\t f_dlyright_r[k] <= f_dlyright_r[k-1];\n"
"\t\t f_dlyright_i[k] <= f_dlyright_i[k-1];\n"
"\t\t f_dlycoeff_r[k] <= f_dlycoeff_r[k-1];\n"
"\t\t f_dlycoeff_i[k] <= f_dlycoeff_i[k-1];\n"
"\t\tend\n"
"\n"
"\tend endgenerate\n"
"\n"
"`ifndef VERILATOR\n"
"\talways @(posedge i_clk)\n"
"\tif ((!$past(i_ce))&&(!$past(i_ce,2))&&(!$past(i_ce,3))\n"
"\t &&(!$past(i_ce,4)))\n"
"\t assume(i_ce);\n"
"\n"
"\tgenerate if (CKPCE <= 1)\n"
"\tbegin\n"
"\n"
"\t // i_ce is allowed to be anything in this mode\n"
"\n"
"\tend else if (CKPCE == 2)\n"
"\tbegin : F_CKPCE_TWO\n"
"\n"
"\t always @(posedge i_clk)\n"
"\t if ($past(i_ce))\n"
"\t assume(!i_ce);\n"
"\n"
"\tend else if (CKPCE == 3)\n"
"\tbegin : F_CKPCE_THREE\n"
"\n"
"\t always @(posedge i_clk)\n"
"\t if (($past(i_ce))||($past(i_ce,2)))\n"
"\t assume(!i_ce);\n"
"\n"
"\tend endgenerate\n"
"`endif\n"
"\n"
"\treg [F_LGDEPTH:0] f_startup_counter;\n"
"\tinitial f_startup_counter = 0;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t f_startup_counter <= 0;\n"
"\telse if ((i_ce)&&(!(&f_startup_counter)))\n"
"\t f_startup_counter <= f_startup_counter + 1;\n"
"\n"
"\twire signed [IWIDTH:0] f_sumr, f_sumi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t f_sumr = f_dlyleft_r[F_D] + f_dlyright_r[F_D];\n"
"\t f_sumi = f_dlyleft_i[F_D] + f_dlyright_i[F_D];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_sumrx, f_sumix;\n"
"\tassign\tf_sumrx = { {(4){f_sumr[IWIDTH]}}, f_sumr, {(CWIDTH-2){1'b0}} };\n"
"\tassign\tf_sumix = { {(4){f_sumi[IWIDTH]}}, f_sumi, {(CWIDTH-2){1'b0}} };\n"
"\n"
"\twire signed [IWIDTH:0] f_difr, f_difi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t f_difr = f_dlyleft_r[F_D] - f_dlyright_r[F_D];\n"
"\t f_difi = f_dlyleft_i[F_D] - f_dlyright_i[F_D];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_difrx, f_difix;\n"
"\tassign\tf_difrx = { {(CWIDTH+2){f_difr[IWIDTH]}}, f_difr };\n"
"\tassign\tf_difix = { {(CWIDTH+2){f_difi[IWIDTH]}}, f_difi };\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_widecoeff_r, f_widecoeff_i;\n"
"\tassign\tf_widecoeff_r ={ {(IWIDTH+3){f_dlycoeff_r[F_D][CWIDTH-1]}},\n"
"\t\t\t\t\t\tf_dlycoeff_r[F_D] };\n"
"\tassign\tf_widecoeff_i ={ {(IWIDTH+3){f_dlycoeff_i[F_D][CWIDTH-1]}},\n"
"\t\t\t\t\t\tf_dlycoeff_i[F_D] };\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif (f_startup_counter > {1'b0, F_D})\n"
"\tbegin\n"
"\t assert(aux_pipeline == f_dlyaux);\n"
"\t assert(left_sr == f_sumrx);\n"
"\t assert(left_si == f_sumix);\n"
"\t assert(aux_pipeline[AUXLEN-1] == f_dlyaux[F_D]);\n"
"\n"
"\t if ((f_difr == 0)&&(f_difi == 0))\n"
"\t begin\n"
"\t assert(mpy_r == 0);\n"
"\t assert(mpy_i == 0);\n"
"\t end else if ((f_dlycoeff_r[F_D] == 0)\n"
"\t &&(f_dlycoeff_i[F_D] == 0))\n"
"\t begin\n"
"\t assert(mpy_r == 0);\n"
"\t assert(mpy_i == 0);\n"
"\t end\n"
"\n"
"\t if ((f_dlycoeff_r[F_D] == 1)&&(f_dlycoeff_i[F_D] == 0))\n"
"\t begin\n"
"\t assert(mpy_r == f_difrx);\n"
"\t assert(mpy_i == f_difix);\n"
"\t end\n"
"\n"
"\t if ((f_dlycoeff_r[F_D] == 0)&&(f_dlycoeff_i[F_D] == 1))\n"
"\t begin\n"
"\t assert(mpy_r == -f_difix);\n"
"\t assert(mpy_i == f_difrx);\n"
"\t end\n"
"\n"
"\t if ((f_difr == 1)&&(f_difi == 0))\n"
"\t begin\n"
"\t assert(mpy_r == f_widecoeff_r);\n"
"\t assert(mpy_i == f_widecoeff_i);\n"
"\t end\n"
"\n"
"\t if ((f_difr == 0)&&(f_difi == 1))\n"
"\t begin\n"
"\t assert(mpy_r == -f_widecoeff_i);\n"
"\t assert(mpy_i == f_widecoeff_r);\n"
"\t end\n"
"\tend\n"
"\n");
 
fprintf(fp,
"\t// Let's see if we can improve our performance at all by\n"
"\t// moving our test one clock earlier. If nothing else, it should\n"
"\t// help induction finish one (or more) clocks ealier than\n"
"\t// otherwise\n"
"\n\n"
"\twire signed [IWIDTH:0] f_predifr, f_predifi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t\tf_predifr = f_dlyleft_r[F_D-1] - f_dlyright_r[F_D-1];\n"
"\t\tf_predifi = f_dlyleft_i[F_D-1] - f_dlyright_i[F_D-1];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_predifrx, f_predifix;\n"
"\tassign f_predifrx = { {(CWIDTH+2){f_predifr[IWIDTH]}}, f_predifr };\n"
"\tassign f_predifix = { {(CWIDTH+2){f_predifi[IWIDTH]}}, f_predifi };\n"
"\n"
"\twire signed [CWIDTH:0] f_sumcoef;\n"
"\twire signed [IWIDTH+1:0] f_sumdiff;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t\tf_sumcoef = f_dlycoeff_r[F_D-1] + f_dlycoeff_i[F_D-1];\n"
"\t\tf_sumdiff = f_predifr + f_predifi;\n"
"\tend\n"
"\n"
"\t// Induction helpers\n"
"\talways @(posedge i_clk)\n"
"\tif (f_startup_counter >= { 1'b0, F_D })\n"
"\tbegin\n"
"\t\tif (f_dlycoeff_r[F_D-1] == 0)\n"
"\t\t\tassert(p_one == 0);\n"
"\t\tif (f_dlycoeff_i[F_D-1] == 0)\n"
"\t\t\tassert(p_two == 0);\n"
"\n"
"\t\tif (f_dlycoeff_r[F_D-1] == 1)\n"
"\t\t\tassert(p_one == f_predifrx);\n"
"\t\tif (f_dlycoeff_i[F_D-1] == 1)\n"
"\t\t\tassert(p_two == f_predifix);\n"
"\n"
"\t\tif (f_predifr == 0)\n"
"\t\t\tassert(p_one == 0);\n"
"\t\tif (f_predifi == 0)\n"
"\t\t\tassert(p_two == 0);\n"
"\n"
"\t\t// verilator lint_off WIDTH\n"
"\t\tif (f_predifr == 1)\n"
"\t\t\tassert(p_one == f_dlycoeff_r[F_D-1]);\n"
"\t\tif (f_predifi == 1)\n"
"\t\t\tassert(p_two == f_dlycoeff_i[F_D-1]);\n"
"\t\t// verilator lint_on WIDTH\n"
"\n"
"\t\tif (f_sumcoef == 0)\n"
"\t\t\tassert(p_three == 0);\n"
"\t\tif (f_sumdiff == 0)\n"
"\t\t\tassert(p_three == 0);\n"
"\t\t// verilator lint_off WIDTH\n"
"\t\tif (f_sumcoef == 1)\n"
"\t\t\tassert(p_three == f_sumdiff);\n"
"\t\tif (f_sumdiff == 1)\n"
"\t\t\tassert(p_three == f_sumcoef);\n"
"\t\t// verilator lint_on WIDTH\n"
"`ifdef VERILATOR\n"
"\t\tassert(p_one == f_predifr * f_dlycoeff_r[F_D-1]);\n"
"\t\tassert(p_two == f_predifi * f_dlycoeff_i[F_D-1]);\n"
"\t\tassert(p_three == f_sumdiff * f_sumcoef);\n"
"`endif // VERILATOR\n"
"\tend\n\n");
 
fprintf(fp,
"\t// F_CHECK will be set externally by the solver, so that we can\n"
"\t// double check that the solver is actually testing what we think\n"
"\t// it is testing. We'll set it here to MPYREMAINDER, which will\n"
"\t// essentially eliminate the check--unless overridden by the\n"
"\t// solver.\n"
"\tparameter F_CHECK = MPYREMAINDER;\n"
"\tinitial assert(MPYREMAINDER == F_CHECK);\n\n");
 
fprintf(fp,
"`endif // FORMAL\n");
}
 
fprintf(fp,
"endmodule\n");
fclose(fp);
}
 
void build_hwbfly(const char *fname, int xtracbits, ROUND_T rounding,
int ckpce, const bool async_reset) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was");
return;
}
 
const char *rnd_string;
if (rounding == RND_TRUNCATE)
rnd_string = "truncate";
else if (rounding == RND_FROMZERO)
rnd_string = "roundfromzero";
else if (rounding == RND_HALFUP)
rnd_string = "roundhalfup";
else
rnd_string = "convround";
 
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\thwbfly.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tThis routine is identical to the routine found\n"
"// in 'butterfly.v', save only that it uses the verilog\n"
"// operator '*' in hopes that the synthesizer would be able to optimize\n"
"// it with hardware resources.\n"
"//\n"
"// It is understood that a hardware multiply can complete its operation in\n"
"// a single clock.\n"
"//\n"
"// Operation:\n"
"//\n"
"// Given two inputs, A (i_left) and B (i_right), and a complex\n"
"// coefficient C (i_coef), return two outputs, O1 and O2, where:\n"
"//\n"
"// O1 = A + B, and\n"
"// O2 = (A - B)*C\n"
"//\n"
"// This operation is commonly known as a Decimation in Frequency (DIF)\n"
"// Radix-2 Butterfly.\n"
"// O1 and O2 are rounded to OWIDTH bits before being returned in\n"
"// o_left and o_right. If SHIFT is one, an extra bit is dropped from\n"
"// these values during the rounding process.\n"
"//\n"
"// Further, since these outputs will take some number of clocks to\n"
"// calculate, we'll pipe a value (i_aux) through the system and return\n"
"// it with the results (o_aux), so you can synchronize to the outgoing\n"
"// output stream.\n"
"//\n"
"//\n%s"
"//\n", prjname, creator);
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module hwbfly(i_clk, %s, i_ce, i_coef, i_left, i_right, i_aux,\n"
"\t\to_left, o_right, o_aux);\n"
"\t// Public changeable parameters ...\n"
"\t// - IWIDTH, number of bits in each component of the input\n"
"\t// - CWIDTH, number of bits in each component of the twiddle factor\n"
"\t// - OWIDTH, number of bits in each component of the output\n"
"\tparameter IWIDTH=16,CWIDTH=IWIDTH+%d,OWIDTH=IWIDTH+1;\n"
"\t// Drop an additional bit on the output?\n"
"\tparameter\t\tSHIFT=0;\n"
"\t// The number of clocks per clock enable, 1, 2, or 3.\n"
"\tparameter\t[1:0]\tCKPCE=%d;\n\t//\n", resetw.c_str(), xtracbits,
ckpce);
 
fprintf(fp,
"\tinput\t\ti_clk, %s, i_ce;\n"
"\tinput\t\t[(2*CWIDTH-1):0]\ti_coef;\n"
"\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n"
"\tinput\t\ti_aux;\n"
"\toutput\twire\t[(2*OWIDTH-1):0]\to_left, o_right;\n"
"\toutput\treg\to_aux;\n\n"
"\n", resetw.c_str());
 
fprintf(fp,
"\treg\t[(2*IWIDTH-1):0] r_left, r_right;\n"
"\treg\t r_aux, r_aux_2;\n"
"\treg\t[(2*CWIDTH-1):0] r_coef;\n"
"\twire signed [(IWIDTH-1):0] r_left_r, r_left_i, r_right_r, r_right_i;\n"
"\tassign\tr_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_left_i = r_left[ (IWIDTH-1):0];\n"
"\tassign\tr_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_right_i = r_right[(IWIDTH-1):0];\n"
"\treg signed [(CWIDTH-1):0] ir_coef_r, ir_coef_i;\n"
"\n"
"\treg signed [(IWIDTH):0] r_sum_r, r_sum_i, r_dif_r, r_dif_i;\n"
"\n"
"\treg [(2*IWIDTH+2):0] leftv, leftvv;\n"
"\n"
"\t// Set up the input to the multiply\n"
"\tinitial r_aux = 1\'b0;\n"
"\tinitial r_aux_2 = 1\'b0;\n");
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\tbegin\n"
"\t\t\tr_aux <= 1\'b0;\n"
"\t\t\tr_aux_2 <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// One clock just latches the inputs\n"
"\t\t\tr_aux <= i_aux;\n"
"\t\t\t// Next clock adds/subtracts\n"
"\t\t\t// Other inputs are simply delayed on second clock\n"
"\t\t\tr_aux_2 <= r_aux;\n"
"\t\tend\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// One clock just latches the inputs\n"
"\t\t\tr_left <= i_left; // No change in # of bits\n"
"\t\t\tr_right <= i_right;\n"
"\t\t\tr_coef <= i_coef;\n"
"\t\t\t// Next clock adds/subtracts\n"
"\t\t\tr_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits\n"
"\t\t\tr_sum_i <= r_left_i + r_right_i;\n"
"\t\t\tr_dif_r <= r_left_r - r_right_r;\n"
"\t\t\tr_dif_i <= r_left_i - r_right_i;\n"
"\t\t\t// Other inputs are simply delayed on second clock\n"
"\t\t\tir_coef_r <= r_coef[(2*CWIDTH-1):CWIDTH];\n"
"\t\t\tir_coef_i <= r_coef[(CWIDTH-1):0];\n"
"\t\tend\n"
"\n\n");
fprintf(fp,
"\t// See comments in the butterfly.v source file for a discussion of\n"
"\t// these operations and the appropriate bit widths.\n\n");
fprintf(fp,
"\twire\tsigned [((IWIDTH+1)+(CWIDTH)-1):0] p_one, p_two;\n"
"\twire\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] p_three;\n"
"\n"
"\tinitial leftv = 0;\n"
"\tinitial leftvv = 0;\n");
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\tbegin\n"
"\t\t\tleftv <= 0;\n"
"\t\t\tleftvv <= 0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, pipeline = 1\n"
"\t\t\tleftv <= { r_aux_2, r_sum_r, r_sum_i };\n"
"\n"
"\t\t\t// Third clock, pipeline = 3\n"
"\t\t\t// No multiply is inferred here; leftv is simply delayed\n"
"\t\t\tleftvv <= leftv;\n"
"\t\tend\n"
"\n");
 
// The code below handles 1, 2, or 3 clocks per CE, with one clock
// per CE meaning CE could be constant. Each case gets its own branch
// of the generate block: CKPCE_ONE, CKPCE_TWO, or CKPCE_THREE.
 
// fprintf(fp,
//"\tend else if (CKPCI == 2'b01)\n\tbegin\n");
 
///////////////////////////////////////////
///
/// One clock per CE, so CE, CE, CE, CE, CE is possible
///
fprintf(fp,
"\tgenerate if (CKPCE <= 1)\n\tbegin : CKPCE_ONE\n");
 
fprintf(fp,
"\t\t// Coefficient multiply inputs\n"
"\t\treg\tsigned [(CWIDTH-1):0] p1c_in, p2c_in;\n"
"\t\t// Data multiply inputs\n"
"\t\treg\tsigned [(IWIDTH):0] p1d_in, p2d_in;\n"
"\t\t// Product 3, coefficient input\n"
"\t\treg\tsigned [(CWIDTH):0] p3c_in;\n"
"\t\t// Product 3, data input\n"
"\t\treg\tsigned [(IWIDTH+1):0] p3d_in;\n"
"\n");
fprintf(fp,
"\t\treg\tsigned [((IWIDTH+1)+(CWIDTH)-1):0] rp_one, rp_two;\n"
"\t\treg\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three;\n"
"\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, pipeline = 1\n"
"\t\t\tp1c_in <= ir_coef_r;\n"
"\t\t\tp2c_in <= ir_coef_i;\n"
"\t\t\tp1d_in <= r_dif_r;\n"
"\t\t\tp2d_in <= r_dif_i;\n"
"\t\t\tp3c_in <= ir_coef_i + ir_coef_r;\n"
"\t\t\tp3d_in <= r_dif_r + r_dif_i;\n"
"\t\tend\n\n");
 
if (formal_property_flag)
fprintf(fp,
"`ifndef FORMAL\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Third clock, pipeline = 3\n"
"\t\t\t// As desired, each of these lines infers a DSP48\n"
"\t\t\trp_one <= p1c_in * p1d_in;\n"
"\t\t\trp_two <= p2c_in * p2d_in;\n"
"\t\t\trp_three <= p3c_in * p3d_in;\n"
"\t\tend\n");
 
if (formal_property_flag)
fprintf(fp,
"`else\n"
"\t\twire signed [((IWIDTH+1)+(CWIDTH)-1):0] pre_rp_one, pre_rp_two;\n"
"\t\twire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] pre_rp_three;\n"
"\n"
"\t\tabs_mpy #(CWIDTH,IWIDTH+1,1'b1)\n"
"\t\t onei(p1c_in, p1d_in, pre_rp_one);\n"
"\t\tabs_mpy #(CWIDTH,IWIDTH+1,1'b1)\n"
"\t\t twoi(p2c_in, p2d_in, pre_rp_two);\n"
"\t\tabs_mpy #(CWIDTH+1,IWIDTH+2,1'b1)\n"
"\t\t threei(p3c_in, p3d_in, pre_rp_three);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t rp_one <= pre_rp_one;\n"
"\t\t rp_two <= pre_rp_two;\n"
"\t\t rp_three <= pre_rp_three;\n"
"\t\tend\n"
"`endif // FORMAL\n");
 
fprintf(fp,"\n"
"\t\tassign\tp_one = rp_one;\n"
"\t\tassign\tp_two = rp_two;\n"
"\t\tassign\tp_three = rp_three;\n"
"\n");
 
///////////////////////////////////////////
///
/// Two clocks per CE, so CE, no-ce, CE, no-ce, etc
///
fprintf(fp,
"\tend else if (CKPCE <= 2)\n"
"\tbegin : CKPCE_TWO\n"
"\t\t// Coefficient multiply inputs\n"
"\t\treg [2*(CWIDTH)-1:0] mpy_pipe_c;\n"
"\t\t// Data multiply inputs\n"
"\t\treg [2*(IWIDTH+1)-1:0] mpy_pipe_d;\n"
"\t\twire signed [(CWIDTH-1):0] mpy_pipe_vc;\n"
"\t\twire signed [(IWIDTH):0] mpy_pipe_vd;\n"
"\t\t//\n"
"\t\treg signed [(CWIDTH+1)-1:0] mpy_cof_sum;\n"
"\t\treg signed [(IWIDTH+2)-1:0] mpy_dif_sum;\n"
"\n"
"\t\tassign mpy_pipe_vc = mpy_pipe_c[2*(CWIDTH)-1:CWIDTH];\n"
"\t\tassign mpy_pipe_vd = mpy_pipe_d[2*(IWIDTH+1)-1:IWIDTH+1];\n"
"\n"
"\t\treg mpy_pipe_v;\n"
"\t\treg ce_phase;\n"
"\n"
"\t\treg signed [(CWIDTH+IWIDTH+1)-1:0] mpy_pipe_out;\n"
"\t\treg signed [IWIDTH+CWIDTH+3-1:0] longmpy;\n"
"\n"
"\n"
"\t\tinitial ce_phase = 1'b1;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_reset)\n"
"\t\t\tce_phase <= 1'b1;\n"
"\t\telse if (i_ce)\n"
"\t\t\tce_phase <= 1'b0;\n"
"\t\telse\n"
"\t\t\tce_phase <= 1'b1;\n"
"\n"
"\t\talways @(*)\n"
"\t\t\tmpy_pipe_v = (i_ce)||(!ce_phase);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (!ce_phase)\n"
"\t\tbegin\n"
"\t\t\t// Pre-clock\n"
"\t\t\tmpy_pipe_c[2*CWIDTH-1:0] <=\n"
"\t\t\t\t\t{ ir_coef_r, ir_coef_i };\n"
"\t\t\tmpy_pipe_d[2*(IWIDTH+1)-1:0] <=\n"
"\t\t\t\t\t{ r_dif_r, r_dif_i };\n"
"\n"
"\t\t\tmpy_cof_sum <= ir_coef_i + ir_coef_r;\n"
"\t\t\tmpy_dif_sum <= r_dif_r + r_dif_i;\n"
"\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// First clock\n"
"\t\t\tmpy_pipe_c[2*(CWIDTH)-1:0] <= {\n"
"\t\t\t\tmpy_pipe_c[(CWIDTH)-1:0], {(CWIDTH){1'b0}} };\n"
"\t\t\tmpy_pipe_d[2*(IWIDTH+1)-1:0] <= {\n"
"\t\t\t\tmpy_pipe_d[(IWIDTH+1)-1:0], {(IWIDTH+1){1'b0}} };\n"
"\t\tend\n\n");
 
if (formal_property_flag)
fprintf(fp, "`ifndef FORMAL\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce) // First clock\n"
"\t\t\tlongmpy <= mpy_cof_sum * mpy_dif_sum;\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (mpy_pipe_v)\n"
"\t\t\tmpy_pipe_out <= mpy_pipe_vc * mpy_pipe_vd;\n");
 
if (formal_property_flag)
fprintf(fp, "`else\n"
"\t\twire signed [IWIDTH+CWIDTH+3-1:0] pre_longmpy;\n"
"\t\twire signed [(CWIDTH+IWIDTH+1)-1:0] pre_mpy_pipe_out;\n"
"\n"
"\t\tabs_mpy #(CWIDTH+1,IWIDTH+2,1)\n"
"\t\t longmpyi(mpy_cof_sum, mpy_dif_sum, pre_longmpy);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t longmpy <= pre_longmpy;\n"
"\n"
"\n"
"\t\tabs_mpy #(CWIDTH,IWIDTH+1,1)\n"
"\t\t mpy_pipe_outi(mpy_pipe_vc, mpy_pipe_vd, pre_mpy_pipe_out);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (mpy_pipe_v)\n"
"\t\t mpy_pipe_out <= pre_mpy_pipe_out;\n"
"`endif\n");
 
fprintf(fp,"\n"
"\t\treg\tsigned\t[((IWIDTH+1)+(CWIDTH)-1):0] rp_one,\n"
"\t\t\t\t\t\t\trp2_one, rp_two;\n"
"\t\treg\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three;\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (!ce_phase) // 1.5 clock\n"
"\t\t\trp_one <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce) // two clocks\n"
"\t\t\trp_two <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce) // Second clock\n"
"\t\t\trp_three<= longmpy;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp2_one<= rp_one;\n"
"\n"
"\t\tassign p_one = rp2_one;\n"
"\t\tassign p_two = rp_two;\n"
"\t\tassign p_three= rp_three;\n"
"\n");
 
/////////////////////////
///
/// Three clocks per CE, so CE, no-ce, no-ce*, CE
///
fprintf(fp,
"\tend else if (CKPCE <= 2'b11)\n\tbegin : CKPCE_THREE\n");
 
fprintf(fp,
"\t\t// Coefficient multiply inputs\n"
"\t\treg\t\t[3*(CWIDTH+1)-1:0]\tmpy_pipe_c;\n"
"\t\t// Data multiply inputs\n"
"\t\treg\t\t[3*(IWIDTH+2)-1:0]\tmpy_pipe_d;\n"
"\t\twire\tsigned [(CWIDTH):0] mpy_pipe_vc;\n"
"\t\twire\tsigned [(IWIDTH+1):0] mpy_pipe_vd;\n"
"\n"
"\t\tassign\tmpy_pipe_vc = mpy_pipe_c[3*(CWIDTH+1)-1:2*(CWIDTH+1)];\n"
"\t\tassign\tmpy_pipe_vd = mpy_pipe_d[3*(IWIDTH+2)-1:2*(IWIDTH+2)];\n"
"\n"
"\t\treg\t\t\tmpy_pipe_v;\n"
"\t\treg\t\t[2:0]\tce_phase;\n"
"\n"
"\t\treg\tsigned [ (CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;\n"
"\n");
fprintf(fp,
"\t\tinitial\tce_phase = 3'b011;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_reset)\n"
"\t\t\tce_phase <= 3'b011;\n"
"\t\telse if (i_ce)\n"
"\t\t\tce_phase <= 3'b000;\n"
"\t\telse if (ce_phase != 3'b011)\n"
"\t\t\tce_phase <= ce_phase + 1'b1;\n"
"\n"
"\t\talways @(*)\n"
"\t\t\tmpy_pipe_v = (i_ce)||(ce_phase < 3'b010);\n"
"\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (ce_phase == 3\'b000)\n"
"\t\t\tbegin\n"
"\t\t\t\t// Second clock\n"
"\t\t\t\tmpy_pipe_c[3*(CWIDTH+1)-1:(CWIDTH+1)] <= {\n"
"\t\t\t\t\tir_coef_r[CWIDTH-1], ir_coef_r,\n"
"\t\t\t\t\tir_coef_i[CWIDTH-1], ir_coef_i };\n"
"\t\t\t\tmpy_pipe_c[CWIDTH:0] <= ir_coef_i + ir_coef_r;\n"
"\t\t\t\tmpy_pipe_d[3*(IWIDTH+2)-1:(IWIDTH+2)] <= {\n"
"\t\t\t\t\tr_dif_r[IWIDTH], r_dif_r,\n"
"\t\t\t\t\tr_dif_i[IWIDTH], r_dif_i };\n"
"\t\t\t\tmpy_pipe_d[(IWIDTH+2)-1:0] <= r_dif_r + r_dif_i;\n"
"\n"
"\t\t\tend else if (mpy_pipe_v)\n"
"\t\t\tbegin\n"
"\t\t\t\tmpy_pipe_c[3*(CWIDTH+1)-1:0] <= {\n"
"\t\t\t\t\tmpy_pipe_c[2*(CWIDTH+1)-1:0], {(CWIDTH+1){1\'b0}} };\n"
"\t\t\t\tmpy_pipe_d[3*(IWIDTH+2)-1:0] <= {\n"
"\t\t\t\t\tmpy_pipe_d[2*(IWIDTH+2)-1:0], {(IWIDTH+2){1\'b0}} };\n"
"\t\t\tend\n\n");
 
if (formal_property_flag)
fprintf(fp, "`ifndef\tFORMAL\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (mpy_pipe_v)\n"
"\t\t\t\tmpy_pipe_out <= mpy_pipe_vc * mpy_pipe_vd;\n"
"\n");
 
if (formal_property_flag)
fprintf(fp,
"`else\t// FORMAL\n"
"\t\twire signed [ (CWIDTH+IWIDTH+3)-1:0] pre_mpy_pipe_out;\n"
"\n"
"\t\tabs_mpy #(CWIDTH+1,IWIDTH+2,1)\n"
"\t\t mpy_pipe_outi(mpy_pipe_vc, mpy_pipe_vd, pre_mpy_pipe_out);\n"
"\t\talways @(posedge i_clk)\n"
"\t\t if (mpy_pipe_v)\n"
"\t\t mpy_pipe_out <= pre_mpy_pipe_out;\n"
"`endif\t// FORMAL\n\n");
 
 
fprintf(fp,
"\t\treg\tsigned\t[((IWIDTH+1)+(CWIDTH)-1):0]\trp_one, rp_two,\n"
"\t\t\t\t\t\trp2_one, rp2_two;\n"
"\t\treg\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\trp_three, rp2_three;\n"
 
"\n");
 
fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif(i_ce)\n"
"\t\t\trp_one <= mpy_pipe_out[(CWIDTH+IWIDTH):0];\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif(ce_phase == 3'b000)\n"
"\t\t\trp_two <= mpy_pipe_out[(CWIDTH+IWIDTH):0];\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif(ce_phase == 3'b001)\n"
"\t\t\trp_three <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\trp2_one<= rp_one;\n"
"\t\t\trp2_two<= rp_two;\n"
"\t\t\trp2_three<= rp_three;\n"
"\t\tend\n");
fprintf(fp,
"\t\tassign p_one\t= rp2_one;\n"
"\t\tassign p_two\t= rp2_two;\n"
"\t\tassign\tp_three\t= rp2_three;\n"
"\n");
 
fprintf(fp,
"\tend endgenerate\n");
 
fprintf(fp,
"\twire\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] w_one, w_two;\n"
"\tassign\tw_one = { {(2){p_one[((IWIDTH+1)+(CWIDTH)-1)]}}, p_one };\n"
"\tassign\tw_two = { {(2){p_two[((IWIDTH+1)+(CWIDTH)-1)]}}, p_two };\n"
"\n");
 
fprintf(fp,
"\t// These values are held in memory and delayed during the\n"
"\t// multiply. Here, we recover them. During the multiply,\n"
"\t// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},\n"
"\t// therefore, the left_x values need to be right shifted by\n"
"\t// CWIDTH-2 as well. The additional bits come from a sign\n"
"\t// extension.\n"
"\twire\taux_s;\n"
"\twire\tsigned\t[(IWIDTH+CWIDTH):0] left_si, left_sr;\n"
"\treg\t\t[(2*IWIDTH+2):0] left_saved;\n"
"\tassign\tleft_sr = { {2{left_saved[2*(IWIDTH+1)-1]}}, left_saved[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1\'b0}} };\n"
"\tassign\tleft_si = { {2{left_saved[(IWIDTH+1)-1]}}, left_saved[((IWIDTH+1)-1):0], {(CWIDTH-2){1\'b0}} };\n"
"\tassign\taux_s = left_saved[2*IWIDTH+2];\n"
"\n"
"\t(* use_dsp48=\"no\" *)\n"
"\treg signed [(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;\n"
"\n");
 
fprintf(fp,
"\tinitial left_saved = 0;\n"
"\tinitial o_aux = 1\'b0;\n");
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t\tbegin\n"
"\t\t\tleft_saved <= 0;\n"
"\t\t\to_aux <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// First clock, recover all values\n"
"\t\t\tleft_saved <= leftvv;\n"
"\n"
"\t\t\t// Second clock, round and latch for final clock\n"
"\t\t\to_aux <= aux_s;\n"
"\t\tend\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// These values are IWIDTH+CWIDTH+3 bits wide\n"
"\t\t\t// although they only need to be (IWIDTH+1)\n"
"\t\t\t// + (CWIDTH) bits wide. (We've got two\n"
"\t\t\t// extra bits we need to get rid of.)\n"
"\n"
"\t\t\t// These two lines also infer DSP48\'s.\n"
"\t\t\t// To keep from using extra DSP48 resources,\n"
"\t\t\t// they are prevented from using DSP48\'s\n"
"\t\t\t// by the (* use_dsp48 ... *) comment above.\n"
"\t\t\tmpy_r <= w_one - w_two;\n"
"\t\t\tmpy_i <= p_three - w_one - w_two;\n"
"\t\tend\n"
"\n");
 
fprintf(fp,
"\t// Round the results\n"
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;\n\n");
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_r(i_clk, i_ce,\n"
"\t\t\t\tleft_sr, rnd_left_r);\n\n",
rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_i(i_clk, i_ce,\n"
"\t\t\t\tleft_si, rnd_left_i);\n\n",
rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,\n"
"\t\t\t\tmpy_r, rnd_right_r);\n\n", rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,\n"
"\t\t\t\tmpy_i, rnd_right_i);\n\n", rnd_string);
 
 
fprintf(fp,
"\t// As a final step, we pack our outputs into two packed two's\n"
"\t// complement numbers per output word, so that each output word\n"
"\t// has (2*OWIDTH) bits in it, with the top half being the real\n"
"\t// portion and the bottom half being the imaginary portion.\n"
"\tassign\to_left = { rnd_left_r, rnd_left_i };\n"
"\tassign\to_right= { rnd_right_r,rnd_right_i};\n"
"\n");
 
if (formal_property_flag) {
fprintf(fp,
"`ifdef VERILATOR\n"
"`define FORMAL\n"
"`endif\n"
"`ifdef FORMAL\n"
"\tlocalparam F_LGDEPTH = 3;\n"
"\tlocalparam F_DEPTH = 5;\n"
"\tlocalparam [F_LGDEPTH-1:0] F_D = F_DEPTH-1;\n"
"\n"
"\treg signed [IWIDTH-1:0] f_dlyleft_r [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyleft_i [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyright_r [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyright_i [0:F_DEPTH-1];\n"
"\treg signed [CWIDTH-1:0] f_dlycoeff_r [0:F_DEPTH-1];\n"
"\treg signed [CWIDTH-1:0] f_dlycoeff_i [0:F_DEPTH-1];\n"
"\treg signed [F_DEPTH-1:0] f_dlyaux;\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t\tf_dlyaux <= 0;\n"
"\telse if (i_ce)\n"
"\t\tf_dlyaux <= { f_dlyaux[F_DEPTH-2:0], i_aux };\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif (i_ce)\n"
"\tbegin\n"
"\t\tf_dlyleft_r[0] <= i_left[ (2*IWIDTH-1):IWIDTH];\n"
"\t\tf_dlyleft_i[0] <= i_left[ ( IWIDTH-1):0];\n"
"\t\tf_dlyright_r[0] <= i_right[(2*IWIDTH-1):IWIDTH];\n"
"\t\tf_dlyright_i[0] <= i_right[( IWIDTH-1):0];\n"
"\t\tf_dlycoeff_r[0] <= i_coef[ (2*CWIDTH-1):CWIDTH];\n"
"\t\tf_dlycoeff_i[0] <= i_coef[ ( CWIDTH-1):0];\n"
"\tend\n"
"\n"
"\tgenvar k;\n"
"\tgenerate for(k=1; k<F_DEPTH; k=k+1)\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tf_dlyleft_r[k] <= f_dlyleft_r[ k-1];\n"
"\t\t\tf_dlyleft_i[k] <= f_dlyleft_i[ k-1];\n"
"\t\t\tf_dlyright_r[k] <= f_dlyright_r[k-1];\n"
"\t\t\tf_dlyright_i[k] <= f_dlyright_i[k-1];\n"
"\t\t\tf_dlycoeff_r[k] <= f_dlycoeff_r[k-1];\n"
"\t\t\tf_dlycoeff_i[k] <= f_dlycoeff_i[k-1];\n"
"\t\tend\n"
"\n"
"\tendgenerate\n"
"\n"
"`ifdef VERILATOR"
/*
"\tgenerate if (CKPCE <= 1)\n"
"\tbegin\n"
"\n"
"\t\t// i_ce is allowed to be anything in this mode\n"
"\n"
"\tend else if (CKPCE == 2)\n"
"\tbegin : F_CKPCE_TWO\n"
"\n"
"\t\tassert property (@(posedge i_clk)\n"
"\t\t i_ce |=> !i_ce);\n"
"\n"
"\tend else if (CKPCE == 3)\n"
"\tbegin : F_CKPCE_THREE\n"
"\n"
"\t\tassert property (@(posedge i_clk)\n"
"\t\t i_ce |=> !i_ce ##1 !i_ce);\n"
"\n"
"\tend endgenerate\n"
*/
"\n"
"`else\n"
"\talways @(posedge i_clk)\n"
"\tif ((!$past(i_ce))&&(!$past(i_ce,2))&&(!$past(i_ce,3))\n"
"\t\t\t&&(!$past(i_ce,4)))\n"
"\t\tassume(i_ce);\n"
"\n"
"\tgenerate if (CKPCE <= 1)\n"
"\tbegin\n"
"\n"
"\t\t// i_ce is allowed to be anything in this mode\n"
"\n"
"\tend else if (CKPCE == 2)\n"
"\tbegin : F_CKPCE_TWO\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t if ($past(i_ce))\n"
"\t\t assume(!i_ce);\n"
"\n"
"\tend else if (CKPCE == 3)\n"
"\tbegin : F_CKPCE_THREE\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t if (($past(i_ce))||($past(i_ce,2)))\n"
"\t\t assume(!i_ce);\n"
"\n"
"\tend endgenerate\n"
"`endif"
"\n"
"\treg [F_LGDEPTH-1:0] f_startup_counter;\n"
"\tinitial f_startup_counter = 0;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t\tf_startup_counter <= 0;\n"
"\telse if ((i_ce)&&(!(&f_startup_counter)))\n"
"\t\tf_startup_counter <= f_startup_counter + 1;\n"
"\n"
"\twire signed [IWIDTH:0] f_sumr, f_sumi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t\tf_sumr = f_dlyleft_r[F_D] + f_dlyright_r[F_D];\n"
"\t\tf_sumi = f_dlyleft_i[F_D] + f_dlyright_i[F_D];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH:0] f_sumrx, f_sumix;\n"
"\tassign f_sumrx = { {(2){f_sumr[IWIDTH]}}, f_sumr, {(CWIDTH-2){1'b0}} };\n"
"\tassign f_sumix = { {(2){f_sumi[IWIDTH]}}, f_sumi, {(CWIDTH-2){1'b0}} };\n"
"\n"
"\twire signed [IWIDTH:0] f_difr, f_difi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t\tf_difr = f_dlyleft_r[F_D] - f_dlyright_r[F_D];\n"
"\t\tf_difi = f_dlyleft_i[F_D] - f_dlyright_i[F_D];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_difrx, f_difix;\n"
"\tassign f_difrx = { {(CWIDTH+2){f_difr[IWIDTH]}}, f_difr };\n"
"\tassign f_difix = { {(CWIDTH+2){f_difi[IWIDTH]}}, f_difi };\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_widecoeff_r, f_widecoeff_i;\n"
"\tassign f_widecoeff_r = {{(IWIDTH+3){f_dlycoeff_r[F_D][CWIDTH-1]}},\n"
"\t f_dlycoeff_r[F_D] };\n"
"\tassign f_widecoeff_i = {{(IWIDTH+3){f_dlycoeff_i[F_D][CWIDTH-1]}},\n"
"\t f_dlycoeff_i[F_D] };\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif (f_startup_counter > F_D)\n"
"\tbegin\n"
"\t\tassert(left_sr == f_sumrx);\n"
"\t\tassert(left_si == f_sumix);\n"
"\t\tassert(aux_s == f_dlyaux[F_D]);\n"
"\n"
"\t\tif ((f_difr == 0)&&(f_difi == 0))\n"
"\t\tbegin\n"
"\t\t assert(mpy_r == 0);\n"
"\t\t assert(mpy_i == 0);\n"
"\t\tend else if ((f_dlycoeff_r[F_D] == 0)\n"
"\t\t &&(f_dlycoeff_i[F_D] == 0))\n"
"\t\tbegin\n"
"\t assert(mpy_r == 0);\n"
"\t\t assert(mpy_i == 0);\n"
"\t\tend\n"
"\n"
"\t\tif ((f_dlycoeff_r[F_D] == 1)&&(f_dlycoeff_i[F_D] == 0))\n"
"\t\tbegin\n"
"\t\t assert(mpy_r == f_difrx);\n"
"\t\t assert(mpy_i == f_difix);\n"
"\t\tend\n"
"\n"
"\t\tif ((f_dlycoeff_r[F_D] == 0)&&(f_dlycoeff_i[F_D] == 1))\n"
"\t\tbegin\n"
"\t\t assert(mpy_r == -f_difix);\n"
"\t\t assert(mpy_i == f_difrx);\n"
"\t\tend\n"
"\n"
"\t\tif ((f_difr == 1)&&(f_difi == 0))\n"
"\t\tbegin\n"
"\t\t assert(mpy_r == f_widecoeff_r);\n"
"\t\t assert(mpy_i == f_widecoeff_i);\n"
"\t\tend\n"
"\n"
"\t\tif ((f_difr == 0)&&(f_difi == 1))\n"
"\t\tbegin\n"
"\t\t assert(mpy_r == -f_widecoeff_i);\n"
"\t\t assert(mpy_i == f_widecoeff_r);\n"
"\t\tend\n"
"\tend\n"
"\n");
 
fprintf(fp,
"\t// Let's see if we can improve our performance at all by\n"
"\t// moving our test one clock earlier. If nothing else, it should\n"
"\t// help induction finish one (or more) clocks earlier than\n"
"\t// otherwise\n"
"\n\n"
"\twire signed [IWIDTH:0] f_predifr, f_predifi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t\tf_predifr = f_dlyleft_r[F_D-1] - f_dlyright_r[F_D-1];\n"
"\t\tf_predifi = f_dlyleft_i[F_D-1] - f_dlyright_i[F_D-1];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+1-1:0] f_predifrx, f_predifix;\n"
"\tassign f_predifrx = { {(CWIDTH){f_predifr[IWIDTH]}}, f_predifr };\n"
"\tassign f_predifix = { {(CWIDTH){f_predifi[IWIDTH]}}, f_predifi };\n"
"\n"
"\twire signed [CWIDTH:0] f_sumcoef;\n"
"\twire signed [IWIDTH+1:0] f_sumdiff;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t\tf_sumcoef = f_dlycoeff_r[F_D-1] + f_dlycoeff_i[F_D-1];\n"
"\t\tf_sumdiff = f_predifr + f_predifi;\n"
"\tend\n"
"\n"
"\t// Induction helpers\n"
"\talways @(posedge i_clk)\n"
"\tif (f_startup_counter >= F_D)\n"
"\tbegin\n"
"\t\tif (f_dlycoeff_r[F_D-1] == 0)\n"
"\t\t\tassert(p_one == 0);\n"
"\t\tif (f_dlycoeff_i[F_D-1] == 0)\n"
"\t\t\tassert(p_two == 0);\n"
"\n"
"\t\tif (f_dlycoeff_r[F_D-1] == 1)\n"
"\t\t\tassert(p_one == f_predifrx);\n"
"\t\tif (f_dlycoeff_i[F_D-1] == 1)\n"
"\t\t\tassert(p_two == f_predifix);\n"
"\n"
"\t\tif (f_predifr == 0)\n"
"\t\t\tassert(p_one == 0);\n"
"\t\tif (f_predifi == 0)\n"
"\t\t\tassert(p_two == 0);\n"
"\n"
"\t\t// verilator lint_off WIDTH\n"
"\t\tif (f_predifr == 1)\n"
"\t\t\tassert(p_one == f_dlycoeff_r[F_D-1]);\n"
"\t\tif (f_predifi == 1)\n"
"\t\t\tassert(p_two == f_dlycoeff_i[F_D-1]);\n"
"\t\t// verilator lint_on WIDTH\n"
"\n"
"\t\tif (f_sumcoef == 0)\n"
"\t\t\tassert(p_three == 0);\n"
"\t\tif (f_sumdiff == 0)\n"
"\t\t\tassert(p_three == 0);\n"
"\t\t// verilator lint_off WIDTH\n"
"\t\tif (f_sumcoef == 1)\n"
"\t\t\tassert(p_three == f_sumdiff);\n"
"\t\tif (f_sumdiff == 1)\n"
"\t\t\tassert(p_three == f_sumcoef);\n"
"\t\t// verilator lint_on WIDTH\n"
"`ifdef VERILATOR\n"
"\t\tassert(p_one == f_predifr * f_dlycoeff_r[F_D-1]);\n"
"\t\tassert(p_two == f_predifi * f_dlycoeff_i[F_D-1]);\n"
"\t\tassert(p_three == f_sumdiff * f_sumcoef);\n"
"`endif // VERILATOR\n"
"\tend\n\n"
"`endif // FORMAL\n");
}
 
fprintf(fp,
"endmodule\n");
 
fclose(fp);
}
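
The generated `hwbfly.v` forms the complex product `(A - B)*C` from three real multiplies (`p_one`, `p_two`, `p_three`) rather than four, then recovers the real part as `p_one - p_two` and the imaginary part as `p_three - p_one - p_two`. The following is a minimal C++ sketch of that arithmetic only, with no rounding or pipelining. The `Cplx` type and the `mul3`, `mul4`, and `dif_butterfly` names are hypothetical helpers introduced for illustration and are not part of the generator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of fftgen.
struct Cplx { int64_t re, im; };

// Reference: the textbook four-multiply complex product c*d.
Cplx mul4(Cplx c, Cplx d) {
	return { c.re * d.re - c.im * d.im, c.re * d.im + c.im * d.re };
}

// Three-multiply form, using the same names as the generated Verilog.
Cplx mul3(Cplx c, Cplx d) {
	int64_t p_one   = c.re * d.re;                   // real * real
	int64_t p_two   = c.im * d.im;                   // imag * imag
	int64_t p_three = (c.re + c.im) * (d.re + d.im); // sum * sum
	// p_three = p_one + p_two + (cross terms), so subtracting
	// p_one and p_two leaves the imaginary part.
	return { p_one - p_two, p_three - p_one - p_two };
}

// Unrounded radix-2 DIF butterfly: O1 = A + B, O2 = (A - B)*C.
void dif_butterfly(Cplx a, Cplx b, Cplx c, Cplx &o1, Cplx &o2) {
	o1 = { a.re + b.re, a.im + b.im };
	o2 = mul3(c, { a.re - b.re, a.im - b.im });
}
```

The trade is one multiplier for a few extra adders, which is why `p_three` needs one more bit on each operand than `p_one` and `p_two`.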
/trunk/sw/butterfly.h
0,0 → 1,48
////////////////////////////////////////////////////////////////////////////////
//
// Filename: butterfly.h
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose:
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#ifndef BUTTERFLY_H
#define BUTTERFLY_H
 
extern void build_butterfly(const char *fname, int xtracbits,
ROUND_T rounding, int ckpce = 1,
const bool async_reset = false);
 
extern void build_hwbfly(const char *fname, int xtracbits, ROUND_T rounding,
int ckpce = 3, const bool async_reset= false);
 
#endif
/trunk/sw/defaults.h
0,0 → 1,81
////////////////////////////////////////////////////////////////////////////////
//
// Filename: defaults.h
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose:
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#ifndef DEFAULTS_H
#define DEFAULTS_H
 
#define DEF_NBITSIN 16
#define DEF_COREDIR "fft-core"
#define DEF_XTRACBITS 4
#define DEF_NMPY 0
#define DEF_XTRAPBITS 0
#define USE_OLD_MULTIPLY false
 
// To coordinate testing, it helps to have some defines in our header file that
// are common with the default parameters found within the various subroutines.
// We'll define those common parameters here. These values, however, have no
// effect on anything other than bench testing. They do, though, allow us to
// bench test exact copies of what is going on within the FFT when necessary
// in order to find problems.
// First, parameters for the new multiply based upon the bi-multiply structure
// (2-bits/2-tableau rows at a time).
#define TST_LONGBIMPY_AW 8
#define TST_LONGBIMPY_BW 12 // Leave undefined to match AW
 
// We also include parameters for the shift add multiply
#define TST_SHIFTADDMPY_AW 16
#define TST_SHIFTADDMPY_BW 20 // Leave undefined to match AW
 
// Now for parameters matching the butterfly
#define TST_BUTTERFLY_IWIDTH 16
#define TST_BUTTERFLY_CWIDTH 20
#define TST_BUTTERFLY_OWIDTH (TST_BUTTERFLY_IWIDTH+1)
 
// Now for parameters matching the qtrstage
#define TST_QTRSTAGE_IWIDTH 16
#define TST_QTRSTAGE_LGWIDTH 8
 
// Parameters for the dblstage
#define TST_DBLSTAGE_IWIDTH 16
#define TST_DBLSTAGE_SHIFT 0
 
// Now for parameters matching the dblreverse stage
#define TST_DBLREVERSE_LGSIZE 5
 
static const bool formal_property_flag = true;
 
#endif
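
As a rough check on the butterfly test parameters above, the product widths declared in `hwbfly.v` can be evaluated for `TST_BUTTERFLY_IWIDTH` and `TST_BUTTERFLY_CWIDTH`. The sketch below copies those two values into local constants and confirms that the worst-case signed products fit in the declared widths. The `fits_signed` and `worst_product` helpers are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from TST_BUTTERFLY_IWIDTH and TST_BUTTERFLY_CWIDTH.
constexpr int IWIDTH = 16;
constexpr int CWIDTH = 20;

// Widths as declared for p_one/p_two and p_three in hwbfly.v.
constexpr int P_ONE_WIDTH   = (IWIDTH + 1) + CWIDTH;
constexpr int P_THREE_WIDTH = (IWIDTH + 2) + (CWIDTH + 1);

// True when v fits in a two's complement field of `bits` bits.
constexpr bool fits_signed(int64_t v, int bits) {
	return v >= -(int64_t(1) << (bits - 1))
		&& v <= (int64_t(1) << (bits - 1)) - 1;
}

// Largest product of an a-bit and a b-bit signed value:
// (-2^(a-1)) * (-2^(b-1)) = 2^(a+b-2).
constexpr int64_t worst_product(int a_bits, int b_bits) {
	return (int64_t(1) << (a_bits - 1)) * (int64_t(1) << (b_bits - 1));
}

static_assert(fits_signed(worst_product(IWIDTH + 1, CWIDTH), P_ONE_WIDTH),
	"p_one/p_two must hold an (IWIDTH+1) x CWIDTH product");
static_assert(fits_signed(worst_product(IWIDTH + 2, CWIDTH + 1), P_THREE_WIDTH),
	"p_three must hold an (IWIDTH+2) x (CWIDTH+1) product");
```

With these defaults the products are 37 and 39 bits wide, and dropping a single bit from either width would no longer hold the worst case.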
/trunk/sw/fftgen.cpp
2,7 → 2,7
//
// Filename: fftgen.cpp
//
// Project: A Doubletime Pipelined FFT
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: This is the core generator for the project. Every part
// and piece of this project begins and ends in this program.
27,7 → 27,7
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2017, Gisselquist Technology, LLC
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
67,9 → 67,6
 
#if _MSC_VER <= 1700
 
long long llround(double d) {
if (d<0) return -(long long)(-d+0.5);
else return (long long)(d+0.5); }
int lstat(const char *filename, struct stat *buf) { return 1; };
#define S_ISDIR(A) 0
 
97,123 → 94,16
#include <ctype.h>
#include <assert.h>
 
#define DEF_NBITSIN 16
#define DEF_COREDIR "fft-core"
#define DEF_XTRACBITS 4
#define DEF_NMPY 0
#define DEF_XTRAPBITS 0
#define USE_OLD_MULTIPLY false
#include "defaults.h"
#include "legal.h"
#include "rounding.h"
#include "fftlib.h"
#include "bldstage.h"
#include "bitreverse.h"
#include "softmpy.h"
#include "butterfly.h"
 
// To coordinate testing, it helps to have some defines in our header file that
// are common with the default parameters found within the various subroutines.
// We'll define those common parameters here. These values, however, have no
// effect on anything other than bench testing. They do, though, allow us to
// bench test exact copies of what is going on within the FFT when necessary
// in order to find problems.
// First, parameters for the new multiply based upon the bi-multiply structure
// (2-bits/2-tableau rows at a time).
#define TST_LONGBIMPY_AW 16
#define TST_LONGBIMPY_BW 20 // Leave undefined to match AW
 
// We also include parameters for the shift add multiply
#define TST_SHIFTADDMPY_AW 16
#define TST_SHIFTADDMPY_BW 20 // Leave undefined to match AW
 
// Now for parameters matching the butterfly
#define TST_BUTTERFLY_IWIDTH 16
#define TST_BUTTERFLY_CWIDTH 20
#define TST_BUTTERFLY_OWIDTH 17
 
// Now for parameters matching the qtrstage
#define TST_QTRSTAGE_IWIDTH 16
#define TST_QTRSTAGE_LGWIDTH 8
 
// Parameters for the dblstage
#define TST_DBLSTAGE_IWIDTH 16
#define TST_DBLSTAGE_SHIFT 0
 
// Now for parameters matching the dblreverse stage
#define TST_DBLREVERSE_LGSIZE 5
 
typedef enum {
RND_TRUNCATE, RND_FROMZERO, RND_HALFUP, RND_CONVERGENT
} ROUND_T;
 
const char cpyleft[] =
"////////////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Copyright (C) 2015-2017, Gisselquist Technology, LLC\n"
"//\n"
"// This program is free software (firmware): you can redistribute it and/or\n"
"// modify it under the terms of the GNU General Public License as published\n"
"// by the Free Software Foundation, either version 3 of the License, or (at\n"
"// your option) any later version.\n"
"//\n"
"// This program is distributed in the hope that it will be useful, but WITHOUT\n"
"// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or\n"
"// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License\n"
"// for more details.\n"
"//\n"
"// You should have received a copy of the GNU General Public License along\n"
"// with this program. (It's in the $(ROOT)/doc directory, run make with no\n"
"// target there if the PDF file isn\'t present.) If not, see\n"
"// <http://www.gnu.org/licenses/> for a copy.\n"
"//\n"
"// License: GPL, v3, as defined and found on www.gnu.org,\n"
"// http://www.gnu.org/licenses/gpl.html\n"
"//\n"
"//\n"
"////////////////////////////////////////////////////////////////////////////////\n";
const char prjname[] = "A Doubletime Pipelined FFT";
const char creator[] = "// Creator: Dan Gisselquist, Ph.D.\n"
"// Gisselquist Technology, LLC\n";
 
int lgval(int vl) {
int lg;
 
for(lg=1; (1<<lg) < vl; lg++)
;
return lg;
}
 
int nextlg(int vl) {
int r;
 
for(r=1; r<vl; r<<=1)
;
return r;
}
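// A quick check of the two helpers above, as a minimal standalone sketch.
// The bodies are copied from the listing; nothing else in fftgen.cpp is
// assumed. Because the loop in lgval() starts at lg=1, it never returns 0:
// lgval(1) and lgval(2) both yield 1.

```cpp
// Smallest lg >= 1 such that (1<<lg) >= vl.
int lgval(int vl) {
	int lg;

	for(lg=1; (1<<lg) < vl; lg++)
		;
	return lg;
}

// Smallest power of two that is >= vl (1 for vl <= 1).
int nextlg(int vl) {
	int r;

	for(r=1; r<vl; r<<=1)
		;
	return r;
}
```

// For example, lgval(1024) is 10, lgval(1025) is 11, and nextlg(5) is 8.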
 
int bflydelay(int nbits, int xtra) {
int cbits = nbits + xtra;
int delay;
 
if (USE_OLD_MULTIPLY) {
if (nbits+1<cbits)
delay = nbits+4;
else
delay = cbits+3;
} else {
int na=nbits+2, nb=cbits+1;
if (nb<na) {
int tmp = nb;
nb = na; na = tmp;
} delay = ((na)/2+(na&1)+2);
}
return delay;
}
 
int lgdelay(int nbits, int xtra) {
// The butterfly code needs to compare a valid address, of this
// many bits, with an address two greater. This guarantees we
// have enough bits for that comparison. We'll also end up with
// more storage space to look for these values, but without a
// redesign that's just what we'll deal with.
return lgval(bflydelay(nbits, xtra)+3);
}
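// A worked example of the delay arithmetic above, as a minimal sketch. It
// assumes the defaults from defaults.h (DEF_NBITSIN=16, DEF_XTRACBITS=4,
// USE_OLD_MULTIPLY=false); the function bodies are otherwise copied from
// the listing. With nbits=16 and xtra=4 we get na=18, nb=21, so the delay
// is 18/2 + 0 + 2 = 11, and lgdelay() is lgval(11+3) = 4.

```cpp
static const bool USE_OLD_MULTIPLY = false;	// as in defaults.h

int lgval(int vl) {
	int lg;
	for(lg=1; (1<<lg) < vl; lg++)
		;
	return lg;
}

int bflydelay(int nbits, int xtra) {
	int cbits = nbits + xtra;
	int delay;

	if (USE_OLD_MULTIPLY) {
		if (nbits+1<cbits)
			delay = nbits+4;
		else
			delay = cbits+3;
	} else {
		// The new multiply's delay depends on the narrower operand
		int na=nbits+2, nb=cbits+1;
		if (nb<na) {
			int tmp = nb;
			nb = na; na = tmp;
		}
		delay = ((na)/2+(na&1)+2);
	}
	return delay;
}

int lgdelay(int nbits, int xtra) {
	// Enough address bits to compare against an address two greater
	return lgval(bflydelay(nbits, xtra)+3);
}
```

// When xtra is 0 the operands swap (na=17, nb=18) and the odd width adds
// one clock: 17/2 + 1 + 2 = 11.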
 
void build_truncator(const char *fname) {
printf("TRUNCATING!\n");
void build_dblquarters(const char *fname, ROUND_T rounding, const bool async_reset=false, const bool dbg=false) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
220,359 → 110,6
perror("O/S Err was:");
return;
}
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: truncate.v\n"
"// \n"
"// Project: %s\n"
"//\n"
"// Purpose: Truncation is one of several options that can be used\n"
"// internal to the various FFT stages to drop bits from one \n"
"// stage to the next. In general, it is the simplest method\n"
"// of dropping bits, since it requires only a bit selection.\n"
"//\n"
"// This form of rounding isn\'t really that great for FFT\'s,\n"
"// since it tends to produce a DC bias in the result. (Other\n"
"// less pronounced biases may also exist.)\n"
"//\n"
"// This particular version also registers the output with the\n"
"// clock, so there will be a delay of one going through this\n"
"// module. This will keep it in line with the other forms of\n"
"// rounding that can be used.\n"
"//\n"
"//\n%s"
"//\n",
prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module truncate(i_clk, i_ce, i_val, o_val);\n"
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n"
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n"
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_val <= i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\n"
"endmodule\n");
}
 
 
void build_roundhalfup(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: roundhalfup.v\n"
"// \n"
"// Project: %s\n"
"//\n"
"// Purpose: Rounding half up is the way I was always taught to round in\n"
"// school. A one half value is added to the result, and then\n"
"// the result is truncated. When used in an FFT, this produces\n"
"// less bias than the truncation method, although a bias still\n"
"// tends to remain.\n"
"//\n"
"//\n%s"
"//\n",
prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module roundhalfup(i_clk, i_ce, i_val, o_val);\n"
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n"
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n"
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n"
"\n"
"\t// Let's deal with two cases to be as general as we can be here\n"
"\t//\n"
"\t// 1. The desired output would lose no bits at all\n"
"\t// 2. One or more bits would be dropped, so the rounding is simply\n"
"\t//\t\ta matter of adding one to the bit about to be dropped,\n"
"\t//\t\tmoving all halfway and above numbers up to the next\n"
"\t//\t\tvalue.\n"
"\tgenerate\n"
"\tif (IWID-SHIFT == OWID)\n"
"\tbegin // No truncation or rounding, output drops no bits\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n"
"\n"
"\tend else // if (IWID-SHIFT-1 >= OWID)\n"
"\tbegin // Output drops one or more bits: add one, or ... not.\n"
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n"
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse\n"
"\t\t\t\t\to_val <= rounded_up; // Halfway or above: round up\n"
"\t\t\tend\n"
"\n"
"\tend\n"
"\tendgenerate\n"
"\n"
"endmodule\n");
}
 
void build_roundfromzero(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: roundfromzero.v\n"
"// \n"
"// Project: %s\n"
"//\n"
"// Purpose: Rounding away from zero is one of several options that can\n"
"// be used internal to the various FFT stages to drop bits from\n"
"// one stage to the next. Values are rounded to the nearest\n"
"// output value. A value exactly halfway between two output\n"
"// values is rounded away from zero.\n"
"//\n"
"// This particular version also registers the output with the\n"
"// clock, so there will be a delay of one going through this\n"
"// module. This will keep it in line with the other forms of\n"
"// rounding that can be used.\n"
"//\n"
"//\n%s"
"//\n",
prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module roundfromzero(i_clk, i_ce, i_val, o_val);\n"
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n"
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n"
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n"
"\n"
"\t// Let's deal with three cases to be as general as we can be here\n"
"\t//\n"
"\t//\t1. The desired output would lose no bits at all\n"
"\t//\t2. One bit would be dropped, so the rounding is simply\n"
"\t//\t\tadjusting the value to be the further from zero in\n"
"\t//\t\tcases of being halfway between two. If identically\n"
"\t//\t\tequal to a number, we just leave it as is.\n"
"\t//\t3. Two or more bits would be dropped. In this case, we round\n"
"\t//\t\tnormally unless we are rounding a value of exactly\n"
"\t//\t\thalfway between the two. In the halfway case, we\n"
"\t//\t\tround away from zero.\n"
"\tgenerate\n"
"\tif (IWID == OWID) // In this case, the shift is irrelevant and\n"
"\tbegin // cannot be applied. No truncation or rounding takes\n"
"\t// effect here.\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-1):0];\n"
"\n"
"\tend else if (IWID-SHIFT == OWID)\n"
"\tbegin // No truncation or rounding, output drops no bits\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n"
"\n"
"\tend else if (IWID-SHIFT-1 == OWID)\n"
"\tbegin // Output drops one bit, can only add one or ... not.\n"
"\t\twire\t[(OWID-1):0]\ttruncated_value, rounded_up;\n"
"\t\twire\t\t\tsign_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tfirst_lost_bit = i_val[0];\n"
"\t\tassign\tsign_bit = i_val[(IWID-1)];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse if (sign_bit)\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse\n"
"\t\t\t\t\to_val <= rounded_up;\n"
"\t\t\tend\n"
"\n"
"\tend else // If there's more than one bit we are dropping\n"
"\tbegin\n"
"\t\twire\t[(OWID-1):0]\ttruncated_value, rounded_up;\n"
"\t\twire\t\t\tsign_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n"
"\t\tassign\tsign_bit = i_val[(IWID-1)];\n"
"\n"
"\t\twire\t[(IWID-SHIFT-OWID-2):0]\tother_lost_bits;\n"
"\t\tassign\tother_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse if (|other_lost_bits) // Round up to\n"
"\t\t\t\t\to_val <= rounded_up; // closest value\n"
"\t\t\t\telse if (sign_bit)\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse\n"
"\t\t\t\t\to_val <= rounded_up;\n"
"\t\t\tend\n"
"\tend\n"
"\tendgenerate\n"
"\n"
"endmodule\n");
}
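// For bench testing, the decision tree in roundfromzero.v can be mirrored
// in software. This is a minimal sketch, not part of the generator: the
// names floor_shift(), round_from_zero() and the "drop" argument are
// hypothetical, "drop" stands for IWID-SHIFT-OWID, and the model assumes
// SHIFT=0 and ignores any wraparound at OWID bits.

```cpp
// floor(val / 2^drop), correct for negative values as well
static long floor_shift(long val, int drop) {
	long unit = 1L << drop;
	long q = val / unit;		// C++ division truncates toward zero
	if ((val % unit) < 0)
		q--;			// step down to the floor for negatives
	return q;
}

// Round to nearest, ties away from zero, dropping "drop" low bits
long round_from_zero(long val, int drop) {
	if (drop <= 0)
		return val;		// No bits dropped, nothing to round

	long unit = 1L << drop, half = unit >> 1;
	long truncated = floor_shift(val, drop);
	long lost = val - truncated * unit;	// 0 <= lost < unit

	bool first_lost_bit  = (lost & half) != 0;
	bool other_lost_bits = (lost & (half - 1)) != 0;
	bool sign_bit        = (val < 0);

	if (!first_lost_bit)		// Below halfway: truncate
		return truncated;
	if (other_lost_bits)		// Above halfway: round up
		return truncated + 1;
	if (sign_bit)			// Exactly halfway, negative
		return truncated;
	return truncated + 1;		// Exactly halfway, positive
}
```

// With drop=1 the input is in units of one half, so 5 (2.5) rounds to 3
// and -5 (-2.5) rounds to -3.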
 
void build_convround(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: convround.v\n"
"// \n"
"// Project: %s\n"
"//\n"
"// Purpose: A convergent rounding routine, also known as banker\'s\n"
"// rounding, Dutch rounding, Gaussian rounding, unbiased\n"
"// rounding, or ... more, at least according to Wikipedia.\n"
"//\n"
"// This form of rounding works by rounding, when the direction is in\n"
"// question, towards the nearest even value.\n"
"//\n"
"//\n%s"
"//\n",
prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module convround(i_clk, i_ce, i_val, o_val);\n"
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n"
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n"
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n"
"\n"
"\t// Let's deal with three cases to be as general as we can be here\n"
"\t//\n"
"\t//\t1. The desired output would lose no bits at all\n"
"\t//\t2. One bit would be dropped, so the rounding is simply\n"
"\t//\t\tadjusting the value to be the nearest even number in\n"
"\t//\t\tcases of being halfway between two. If identically\n"
"\t//\t\tequal to a number, we just leave it as is.\n"
"\t//\t3. Two or more bits would be dropped. In this case, we round\n"
"\t//\t\tnormally unless we are rounding a value of exactly\n"
"\t//\t\thalfway between the two. In the halfway case we round\n"
"\t//\t\tto the nearest even number.\n"
"\tgenerate\n"
// What if IWID < OWID? We should expand here ... somehow
"\tif (IWID == OWID) // In this case, the shift is irrelevant and\n"
"\tbegin // cannot be applied. No truncation or rounding takes\n"
"\t// effect here.\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-1):0];\n"
"\n"
// What if IWID-SHIFT < OWID? Shouldn't we also shift here as well?
"\tend else if (IWID-SHIFT == OWID)\n"
"\tbegin // No truncation or rounding, output drops no bits\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n"
"\n"
"\tend else if (IWID-SHIFT-1 == OWID)\n"
// Is there any way to limit the number of bits that are examined here, for the
// purpose of simplifying/reducing logic? I mean, if we go from 32 to 16 bits,
// must we check all 15 bits for equality to zero?
"\tbegin // Output drops one bit, can only add one or ... not.\n"
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n"
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tlast_valid_bit = truncated_value[0];\n"
"\t\tassign\tfirst_lost_bit = i_val[0];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse if (last_valid_bit)// Round up to nearest\n"
"\t\t\t\t\to_val <= rounded_up; // even value\n"
"\t\t\t\telse // else round down to the nearest\n"
"\t\t\t\t\to_val <= truncated_value; // even value\n"
"\t\t\tend\n"
"\n"
"\tend else // If there's more than one bit we are dropping\n"
"\tbegin\n"
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n"
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tlast_valid_bit = truncated_value[0];\n"
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n"
"\n"
"\t\twire\t[(IWID-SHIFT-OWID-2):0]\tother_lost_bits;\n"
"\t\tassign\tother_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse if (|other_lost_bits) // Round up to\n"
"\t\t\t\t\to_val <= rounded_up; // closest value\n"
"\t\t\t\telse if (last_valid_bit) // Round up to\n"
"\t\t\t\t\to_val <= rounded_up; // nearest even\n"
"\t\t\t\telse // else round down to nearest even\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\tend\n"
"\tend\n"
"\tendgenerate\n"
"\n"
"endmodule\n");
}
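// The same kind of software model can be written for convround.v. Again a
// minimal sketch under stated assumptions: conv_round() and "drop" are
// hypothetical names, "drop" stands for IWID-SHIFT-OWID, SHIFT is taken
// to be 0, and overflow of the OWID-bit result is not modeled.

```cpp
// Round to nearest, ties to even, dropping "drop" low bits
long conv_round(long val, int drop) {
	if (drop <= 0)
		return val;		// No bits dropped, nothing to round

	long unit = 1L << drop, half = unit >> 1;

	// truncated = floor(val / 2^drop), also for negative values
	long truncated = val / unit;
	if ((val % unit) < 0)
		truncated--;
	long lost = val - truncated * unit;	// 0 <= lost < unit

	bool first_lost_bit  = (lost & half) != 0;
	bool other_lost_bits = (lost & (half - 1)) != 0;
	bool last_valid_bit  = (truncated % 2) != 0;	// odd result?

	if (!first_lost_bit)		// Below halfway: truncate
		return truncated;
	if (other_lost_bits)		// Above halfway: round up
		return truncated + 1;
	if (last_valid_bit)		// Halfway and odd: up to even
		return truncated + 1;
	return truncated;		// Halfway and even: stay
}
```

// With drop=1, 5 (2.5) rounds to 2 while 7 (3.5) rounds to 4, and the
// negative ties -5 (-2.5) and -7 (-3.5) round to -2 and -4.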
 
void build_quarters(const char *fname, ROUND_T rounding, bool dbg=false) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
const char *rnd_string;
if (rounding == RND_TRUNCATE)
rnd_string = "truncate";
585,16 → 122,16
 
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
SLASHLINE
"//\n"
"// Filename: qtrstage%s.v\n"
"// \n"
"// Project: %s\n"
"// Filename:\tqtrstage%s.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose: This file encapsulates the 4 point stage of a decimation in\n"
"// frequency FFT. This particular implementation is optimized\n"
"// so that all of the multiplies are accomplished by additions\n"
"// and multiplexers only.\n"
"// so that all of the multiplies are accomplished by additions and\n"
"// multiplexers only.\n"
"//\n"
"//\n%s"
"//\n",
602,19 → 139,25
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
 
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
fprintf(fp,
"module\tqtrstage%s(i_clk, i_rst, i_ce, i_sync, i_data, o_data, o_sync%s);\n"
"module\tqtrstage%s(i_clk, %s, i_ce, i_sync, i_data, o_data, o_sync%s);\n"
"\tparameter IWIDTH=%d, OWIDTH=IWIDTH+1;\n"
"\t// Parameters specific to the core that should be changed when this\n"
"\t// core is built ... Note that the minimum LGSPAN is 2. Smaller \n"
"\t// core is built ... Note that the minimum LGSPAN is 2. Smaller\n"
"\t// spans must use the fftdoubles stage.\n"
"\tparameter\tLGWIDTH=%d, ODD=0, INVERSE=0,SHIFT=0;\n"
"\tinput\t i_clk, i_rst, i_ce, i_sync;\n"
"\tinput\t i_clk, %s, i_ce, i_sync;\n"
"\tinput\t [(2*IWIDTH-1):0] i_data;\n"
"\toutput\treg [(2*OWIDTH-1):0] o_data;\n"
"\toutput\treg o_sync;\n"
"\t\n", (dbg)?"_dbg":"", (dbg)?", o_dbg":"", TST_QTRSTAGE_IWIDTH,
TST_QTRSTAGE_LGWIDTH);
"\t\n", (dbg)?"_dbg":"",
resetw.c_str(),
(dbg)?", o_dbg":"", TST_QTRSTAGE_IWIDTH,
TST_QTRSTAGE_LGWIDTH, resetw.c_str());
if (dbg) { fprintf(fp, "\toutput\twire\t[33:0]\t\t\to_dbg;\n"
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_data[(2*OWIDTH-1):(2*OWIDTH-16)],\n"
"\t\t\t\t\to_data[(OWIDTH-1):(OWIDTH-16)] };\n"
675,9 → 218,16
*/
fprintf(fp,
"\tinitial wait_for_sync = 1\'b1;\n"
"\tinitial iaddr = 0;\n"
"\tinitial iaddr = 0;\n");
if (async_reset)
fprintf(fp,
"\talways @(posedge i_clk, negedge i_areset_n)\n"
"\t\tif (!i_areset_n)\n");
else
fprintf(fp,
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tif (i_reset)\n");
fprintf(fp,
"\t\tbegin\n"
"\t\t\twait_for_sync <= 1\'b1;\n"
"\t\t\tiaddr <= 0;\n"
685,7 → 235,7
"\t\tbegin\n"
"\t\t\tiaddr <= iaddr + { {(LGWIDTH-1){1\'b0}}, 1\'b1 };\n"
"\t\t\twait_for_sync <= 1\'b0;\n"
"\t\tend\n"
"\t\tend\n\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\timem <= i_data;\n"
694,9 → 244,17
"\t// Note that we don\'t check on wait_for_sync or i_sync here.\n"
"\t// Why not? Because iaddr will always be zero until after the\n"
"\t// first i_ce, so we are safe.\n"
"\tinitial pipeline = 4\'h0;\n"
"\tinitial pipeline = 4\'h0;\n");
if (async_reset)
fprintf(fp,
"\talways\t@(posedge i_clk, negedge i_areset_n)\n"
"\t\tif (!i_areset_n)\n");
else
fprintf(fp,
"\talways\t@(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tif (i_reset)\n");
 
fprintf(fp,
"\t\t\tpipeline <= 4\'h0;\n"
"\t\telse if (i_ce) // is our pipeline process full? Which stages?\n"
"\t\t\tpipeline <= { pipeline[2:0], iaddr[0] };\n\n");
752,9 → 310,17
"\t// Don\'t forget in the sync check that we are running\n"
"\t// at two clocks per sample. Thus we need to\n"
"\t// produce a sync every 2^(LGWIDTH-1) clocks.\n"
"\tinitial\to_sync = 1\'b0;\n"
"\tinitial\to_sync = 1\'b0;\n");
 
if (async_reset)
fprintf(fp,
"\talways\t@(posedge i_clk, negedge i_areset_n)\n"
"\t\tif (!i_areset_n)\n");
else
fprintf(fp,
"\talways\t@(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tif (i_reset)\n");
fprintf(fp,
"\t\t\to_sync <= 1\'b0;\n"
"\t\telse if (i_ce)\n"
"\t\t\to_sync <= &(~iaddr[(LGWIDTH-2):3]) && (iaddr[2:0] == 3'b101);\n");
761,7 → 327,7
fprintf(fp, "endmodule\n");
}
 
void build_dblstage(const char *fname, ROUND_T rounding, const bool dbg = false) {
void build_snglquarters(const char *fname, ROUND_T rounding, const bool async_reset=false, const bool dbg=false) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
768,7 → 334,6
perror("O/S Err was:");
return;
}
 
const char *rnd_string;
if (rounding == RND_TRUNCATE)
rnd_string = "truncate";
781,995 → 346,393
 
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
SLASHLINE
"//\n"
"// Filename: dblstage%s.v\n"
"// Filename:\tqtrstage%s.v\n"
"//\n"
"// Project: %s\n"
"// Project:\t%s\n"
"//\n"
"// Purpose: This is part of an FPGA implementation that will process\n"
"// the final stage of a decimate-in-frequency FFT, running\n"
"// through the data at two samples per clock. If you notice\n"
"// from the derivation of an FFT, the only time both even and\n"
"// odd samples are used at the same time is in this stage.\n"
"// Therefore, other than this stage and these twiddles, all of\n"
"// the other stages can run two stages at a time at one sample\n"
"// per clock.\n"
"// Purpose: This file encapsulates the 4 point stage of a decimation in\n"
"// frequency FFT. This particular implementation is optimized\n"
"// so that all of the multiplies are accomplished by additions and\n"
"// multiplexers only.\n"
"//\n"
"// In this implementation, the output is valid one clock after\n"
"// the input is valid. The output also accumulates one bit\n"
"// above and beyond the number of bits in the input.\n"
"// \n"
"// i_clk A system clock\n"
"// i_rst A synchronous reset\n"
"// i_ce Circuit enable--nothing happens unless this line is high\n"
"// i_sync A synchronization signal, high once per FFT at the start\n"
"// i_left The first (even) complex sample input. The higher order\n"
"// bits contain the real portion, low order bits the\n"
"// imaginary portion, all in two\'s complement.\n"
"// i_right The next (odd) complex sample input, same format as\n"
"// i_left.\n"
"// o_left The first (even) complex output.\n"
"// o_right The next (odd) complex output.\n"
"// o_sync Output synchronization signal.\n"
"// Operation:\n"
"// The operation of this stage is identical to the regular stages of\n"
"// the FFT (see them for details), with one additional and critical\n"
"// difference: this stage doesn't require any hardware multiplication.\n"
"// The multiplies within it may all be accomplished using additions and\n"
"// subtractions.\n"
"//\n"
"// Let's see how this is done. Given x[n] and x[n+2], since that's the\n"
"// stage we are working on, with i_sync true for x[0] being input,\n"
"// produce the output:\n"
"//\n"
"// y[n ] = x[n] + x[n+2]\n"
"// y[n+2] = (x[n] - x[n+2]) * e^{-j2pi n/4} (forward transform)\n"
"// = (x[n] - x[n+2]) * (-j)^n\n"
"//\n"
"// y[n].r = x[n].r + x[n+2].r (This is the easy part)\n"
"// y[n].i = x[n].i + x[n+2].i\n"
"//\n"
"// y[2].r = x[0].r - x[2].r\n"
"// y[2].i = x[0].i - x[2].i\n"
"//\n"
"// y[3].r = (x[1].i - x[3].i) (forward transform)\n"
"// y[3].i = - (x[1].r - x[3].r)\n"
"//\n"
"// y[3].r = - (x[1].i - x[3].i) (inverse transform)\n"
"// y[3].i = (x[1].r - x[3].r) (INVERSE = 1)\n"
// "//\n"
// "// When the FFT is run in the two samples per clock mode, this quarter\n"
// "// stage will operate on either x[0] and x[2] (ODD = 0), or x[1] and\n"
// "// x[3] (ODD = 1). In all other cases, it will operate on all four\n"
// "// values.\n"
"//\n%s"
"//\n", (dbg)?"_dbg":"", prjname, creator);
 
"//\n",
(dbg)?"_dbg":"", prjname, creator);
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
 
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
fprintf(fp,
"module\tdblstage%s(i_clk, i_rst, i_ce, i_sync, i_left, i_right, o_left, o_right, o_sync%s);\n"
"\tparameter\tIWIDTH=%d,OWIDTH=IWIDTH+1, SHIFT=%d;\n"
"\tinput\t\ti_clk, i_rst, i_ce, i_sync;\n"
"\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n"
"\toutput\treg\t[(2*OWIDTH-1):0]\to_left, o_right;\n"
"\toutput\treg\t\t\to_sync;\n"
"\n", (dbg)?"_dbg":"", (dbg)?", o_dbg":"",
TST_DBLSTAGE_IWIDTH, TST_DBLSTAGE_SHIFT);
 
"module\tqtrstage%s(i_clk, %s, i_ce, i_sync, i_data, o_data, o_sync%s);\n"
"\tparameter IWIDTH=%d, OWIDTH=IWIDTH+1;\n"
"\tparameter\tLGWIDTH=%d, INVERSE=0,SHIFT=0;\n"
"\tinput\t i_clk, %s, i_ce, i_sync;\n"
"\tinput\t [(2*IWIDTH-1):0] i_data;\n"
"\toutput\treg [(2*OWIDTH-1):0] o_data;\n"
"\toutput\treg o_sync;\n"
"\t\n", (dbg)?"_dbg":"", resetw.c_str(),
(dbg)?", o_dbg":"", TST_QTRSTAGE_IWIDTH,
TST_QTRSTAGE_LGWIDTH, resetw.c_str());
if (dbg) { fprintf(fp, "\toutput\twire\t[33:0]\t\t\to_dbg;\n"
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_left[(2*OWIDTH-1):(2*OWIDTH-16)],\n"
"\t\t\t\t\to_left[(OWIDTH-1):(OWIDTH-16)] };\n"
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_data[(2*OWIDTH-1):(2*OWIDTH-16)],\n"
"\t\t\t\t\to_data[(OWIDTH-1):(OWIDTH-16)] };\n"
"\n");
}
 
fprintf(fp,
"\twire\tsigned\t[(IWIDTH-1):0]\ti_in_0r, i_in_0i, i_in_1r, i_in_1i;\n"
"\tassign\ti_in_0r = i_left[(2*IWIDTH-1):(IWIDTH)]; \n"
"\tassign\ti_in_0i = i_left[(IWIDTH-1):0]; \n"
"\tassign\ti_in_1r = i_right[(2*IWIDTH-1):(IWIDTH)]; \n"
"\tassign\ti_in_1i = i_right[(IWIDTH-1):0]; \n"
"\twire\t[(OWIDTH-1):0]\t\to_out_0r, o_out_0i,\n"
"\t\t\t\t\to_out_1r, o_out_1i;\n"
"\treg\t wait_for_sync;\n"
"\treg\t[2:0] pipeline;\n"
"\n"
"\treg\tsigned [(IWIDTH):0] sum_r, sum_i, diff_r, diff_i;\n"
"\n"
"\t// Handle a potential rounding situation, when IWIDTH>=OWIDTH.\n"
"\treg\t[(2*OWIDTH-1):0]\tob_a;\n"
"\twire\t[(2*OWIDTH-1):0]\tob_b;\n"
"\treg\t[(OWIDTH-1):0]\t\tob_b_r, ob_b_i;\n"
"\tassign\tob_b = { ob_b_r, ob_b_i };\n"
"\n"
"\n");
fprintf(fp,
"\n"
"\t// As with any register connected to the sync pulse, these must\n"
"\t// have initial values and be reset on the i_rst signal.\n"
"\t// Other data values need only restrict their updates to i_ce\n"
"\t// enabled clocks, but sync\'s must obey resets and initial\n"
"\t// conditions as well.\n"
"\treg\trnd_sync, r_sync;\n"
"\treg\t[(LGWIDTH-1):0]\t\tiaddr;\n"
"\treg\t[(2*IWIDTH-1):0]\timem\t[0:1];\n"
"\n"
"\tinitial\trnd_sync = 1\'b0; // Sync into rounding\n"
"\tinitial\tr_sync = 1\'b0; // Sync coming out\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tbegin\n"
"\t\t\trnd_sync <= 1\'b0;\n"
"\t\t\tr_sync <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\trnd_sync <= i_sync;\n"
"\t\t\tr_sync <= rnd_sync;\n"
"\t\tend\n"
"\twire\tsigned\t[(IWIDTH-1):0]\timem_r, imem_i;\n"
"\tassign\timem_r = imem[1][(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\timem_i = imem[1][(IWIDTH-1):0];\n"
"\n"
"\t// As with other variables, these are really only updated when in\n"
"\t// the processing pipeline, after the first i_sync. However, to\n"
"\t// eliminate as much unnecessary logic as possible, we toggle\n"
"\t// these any time the i_ce line is enabled, and don\'t reset.\n"
"\t// them on i_rst.\n");
fprintf(fp,
"\t// Don't forget that we accumulate a bit by adding two values\n"
"\t// together. Therefore our intermediate value must have one more\n"
"\t// bit than the two originals.\n"
"\treg\tsigned\t[(IWIDTH):0]\trnd_in_0r, rnd_in_0i;\n"
"\treg\tsigned\t[(IWIDTH):0]\trnd_in_1r, rnd_in_1i;\n\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t//\n"
"\t\t\trnd_in_0r <= i_in_0r + i_in_1r;\n"
"\t\t\trnd_in_0i <= i_in_0i + i_in_1i;\n"
"\t\t\t//\n"
"\t\t\trnd_in_1r <= i_in_0r - i_in_1r;\n"
"\t\t\trnd_in_1i <= i_in_0i - i_in_1i;\n"
"\t\t\t//\n"
"\t\tend\n"
"\twire\tsigned\t[(IWIDTH-1):0]\ti_data_r, i_data_i;\n"
"\tassign\ti_data_r = i_data[(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\ti_data_i = i_data[(IWIDTH-1):0];\n"
"\n"
"\treg [(2*OWIDTH-1):0] omem [0:1];\n"
"\n");
 
fprintf(fp, "\t//\n"
"\t// Round our output values down to OWIDTH bits\n"
"\t//\n");
 
fprintf(fp,
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0r(i_clk, i_ce,\n"
"\t\t\t\t\t\t\trnd_in_0r, o_out_0r);\n\n", rnd_string);
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_sum_r, rnd_sum_i,\n"
"\t\t\trnd_diff_r, rnd_diff_i, n_rnd_diff_r, n_rnd_diff_i;\n"
"\t%s #(IWIDTH+1,OWIDTH,SHIFT)\tdo_rnd_sum_r(i_clk, i_ce,\n"
"\t\t\t\tsum_r, rnd_sum_r);\n\n", rnd_string);
fprintf(fp,
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0i(i_clk, i_ce,\n"
"\t\t\t\t\t\t\trnd_in_0i, o_out_0i);\n\n", rnd_string);
"\t%s #(IWIDTH+1,OWIDTH,SHIFT)\tdo_rnd_sum_i(i_clk, i_ce,\n"
"\t\t\t\tsum_i, rnd_sum_i);\n\n", rnd_string);
fprintf(fp,
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1r(i_clk, i_ce,\n"
"\t\t\t\t\t\t\trnd_in_1r, o_out_1r);\n\n", rnd_string);
"\t%s #(IWIDTH+1,OWIDTH,SHIFT)\tdo_rnd_diff_r(i_clk, i_ce,\n"
"\t\t\t\tdiff_r, rnd_diff_r);\n\n", rnd_string);
fprintf(fp,
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1i(i_clk, i_ce,\n"
"\t\t\t\t\t\t\trnd_in_1i, o_out_1i);\n\n", rnd_string);
"\t%s #(IWIDTH+1,OWIDTH,SHIFT)\tdo_rnd_diff_i(i_clk, i_ce,\n"
"\t\t\t\tdiff_i, rnd_diff_i);\n\n", rnd_string);
fprintf(fp, "\tassign n_rnd_diff_r = - rnd_diff_r;\n"
"\tassign n_rnd_diff_i = - rnd_diff_i;\n");
fprintf(fp,
"\tinitial wait_for_sync = 1\'b1;\n"
"\tinitial iaddr = 0;\n");
if (async_reset)
fprintf(fp,
"\talways @(posedge i_clk, negedge i_areset_n)\n"
"\t\tif (!i_areset_n)\n");
else
fprintf(fp,
"\talways @(posedge i_clk)\n"
"\t\tif (i_reset)\n");
 
fprintf(fp, "\n"
"\t// Prior versions of this routine did not include the extra\n"
"\t// clock and register/flip-flops that this routine requires.\n"
"\t// These are placed in here to correct a bug in Verilator, that\n"
"\t// otherwise struggles. (Hopefully this will fix the problem ...)\n"
fprintf(fp, "\t\tbegin\n"
"\t\t\twait_for_sync <= 1\'b1;\n"
"\t\t\tiaddr <= 0;\n"
"\t\tend else if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n"
"\t\tbegin\n"
"\t\t\tiaddr <= iaddr + 1\'b1;\n"
"\t\t\twait_for_sync <= 1\'b0;\n"
"\t\tend\n\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\to_left <= { o_out_0r, o_out_0i };\n"
"\t\t\to_right <= { o_out_1r, o_out_1i };\n"
"\t\t\timem[0] <= i_data;\n"
"\t\t\timem[1] <= imem[0];\n"
"\t\tend\n"
"\n"
"\tinitial\to_sync = 1'b0; // Final sync coming out of module\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\t\to_sync <= 1'b0;\n"
"\t\telse if (i_ce)\n"
"\t\t\to_sync <= r_sync;\n"
"\n"
"endmodule\n");
fclose(fp);
}
"\n\n");
fprintf(fp,
"\t// Note that we don\'t check on wait_for_sync or i_sync here.\n"
"\t// Why not? Because iaddr will always be zero until after the\n"
"\t// first i_ce, so we are safe.\n"
"\tinitial pipeline = 3\'h0;\n");
 
void build_multiply(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
if (async_reset)
fprintf(fp,
"\talways\t@(posedge i_clk, negedge i_areset_n)\n"
"\t\tif (!i_areset_n)\n");
else
fprintf(fp,
"\talways\t@(posedge i_clk)\n"
"\t\tif (i_reset)\n");
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: shiftaddmpy.v\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: A portable shift and add multiply.\n"
"//\n"
"// While both Xilinx and Altera will offer single clock \n"
"// multiplies, this simple approach will multiply two numbers\n"
"// on any architecture. The result maintains the full width\n"
"// of the multiply, there are no extra stuff bits, no rounding,\n"
"// no shifted bits, etc.\n"
"//\n"
"// Further, for those applications that can support it, this\n"
"// multiply is pipelined and will produce one answer per clock.\n"
"//\n"
"// For minimal processing delay, make the first parameter\n"
"// the one with the least bits, so that AWIDTH <= BWIDTH.\n"
"//\n"
"// The processing delay in this multiply is (AWIDTH+1) cycles.\n"
"// That is, if the data is present on the input at clock t=0,\n"
"// the result will be present on the output at time t=AWIDTH+1;\n"
"//\n"
"//\n%s"
"//\n", prjname, creator);
"\t\t\tpipeline <= 3\'h0;\n"
"\t\telse if (i_ce) // is our pipeline process full? Which stages?\n"
"\t\t\tpipeline <= { pipeline[1:0], iaddr[1] };\n\n");
fprintf(fp,
"\t// This is the pipeline[-1] stage, pipeline[0] will be set next.\n"
"\talways\t@(posedge i_clk)\n"
"\t\tif ((i_ce)&&(iaddr[1]))\n"
"\t\tbegin\n"
"\t\t\tsum_r <= imem_r + i_data_r;\n"
"\t\t\tsum_i <= imem_i + i_data_i;\n"
"\t\t\tdiff_r <= imem_r - i_data_r;\n"
"\t\t\tdiff_i <= imem_i - i_data_i;\n"
"\t\tend\n\n");
fprintf(fp,
"\t// pipeline[1] takes sum_x and diff_x and produces rnd_x\n\n");
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module shiftaddmpy(i_clk, i_ce, i_a, i_b, o_r);\n"
"\tparameter\tAWIDTH=%d,BWIDTH=", TST_SHIFTADDMPY_AW);
#ifdef TST_SHIFTADDMPY_BW
fprintf(fp, "%d;\n", TST_SHIFTADDMPY_BW);
#else
fprintf(fp, "AWIDTH;\n");
#endif
fprintf(fp,
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\t[(AWIDTH-1):0]\t\ti_a;\n"
"\tinput\t\t[(BWIDTH-1):0]\t\ti_b;\n"
"\toutput\treg\t[(AWIDTH+BWIDTH-1):0]\to_r;\n"
"\n"
"\treg\t[(AWIDTH-1):0]\tu_a;\n"
"\treg\t[(BWIDTH-1):0]\tu_b;\n"
"\treg\t\t\tsgn;\n"
"\n"
"\treg\t[(AWIDTH-2):0]\t\tr_a[0:(AWIDTH-1)];\n"
"\treg\t[(AWIDTH+BWIDTH-2):0]\tr_b[0:(AWIDTH-1)];\n"
"\treg\t\t\t\tr_s[0:(AWIDTH-1)];\n"
"\treg\t[(AWIDTH+BWIDTH-1):0]\tacc[0:(AWIDTH-1)];\n"
"\tgenvar k;\n"
"\n"
"\t// If we were forced to stay within two\'s complement arithmetic,\n"
"\t// taking the absolute value here would require an additional bit.\n"
"\t// However, because our results are now unsigned, we can stay\n"
"\t// within the number of bits given (for now).\n"
"\talways @(posedge i_clk)\n"
"\t// Now for pipeline[2]. We can actually do this at all i_ce\n"
"\t// clock times, since nothing will listen unless pipeline[3]\n"
"\t// on the next clock. Thus, we simplify this logic and do\n"
"\t// it independent of pipeline[2].\n"
"\talways\t@(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tu_a <= (i_a[AWIDTH-1])?(-i_a):(i_a);\n"
"\t\t\tu_b <= (i_b[BWIDTH-1])?(-i_b):(i_b);\n"
"\t\t\tsgn <= i_a[AWIDTH-1] ^ i_b[BWIDTH-1];\n"
"\t\tend\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\t\tob_a <= { rnd_sum_r, rnd_sum_i };\n"
"\t\t\t// on Even, W = e^{-j2pi 1/4 0} = 1\n"
"\t\t\tif (!iaddr[0])\n"
"\t\t\tbegin\n"
"\t\t\t\tob_b_r <= rnd_diff_r;\n"
"\t\t\t\tob_b_i <= rnd_diff_i;\n"
"\t\t\tend else if (INVERSE==0) begin\n"
"\t\t\t\t// on Odd, W = e^{-j2pi 1/4} = -j\n"
"\t\t\t\tob_b_r <= rnd_diff_i;\n"
"\t\t\t\tob_b_i <= n_rnd_diff_r;\n"
"\t\t\tend else begin\n"
"\t\t\t\t// on Odd, W = e^{j2pi 1/4} = j\n"
"\t\t\t\tob_b_r <= n_rnd_diff_i;\n"
"\t\t\t\tob_b_i <= rnd_diff_r;\n"
"\t\t\tend\n"
"\t\tend\n\n");
fprintf(fp,
"\talways\t@(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tacc[0] <= (u_a[0]) ? { {(AWIDTH){1\'b0}}, u_b }\n"
"\t\t\t\t\t: {(AWIDTH+BWIDTH){1\'b0}};\n"
"\t\t\tr_a[0] <= { u_a[(AWIDTH-1):1] };\n"
"\t\t\tr_b[0] <= { {(AWIDTH-1){1\'b0}}, u_b };\n"
"\t\t\tr_s[0] <= sgn; // The final sign, needs to be preserved\n"
"\t\tend\n"
"\n"
"\tgenerate\n"
"\tfor(k=0; k<AWIDTH-1; k=k+1)\n"
"\tbegin : genstages\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tacc[k+1] <= acc[k] + ((r_a[k][0]) ? {r_b[k],1\'b0}:0);\n"
"\t\t\tr_a[k+1] <= { 1\'b0, r_a[k][(AWIDTH-2):1] };\n"
"\t\t\tr_b[k+1] <= { r_b[k][(AWIDTH+BWIDTH-3):0], 1\'b0};\n"
"\t\t\tr_s[k+1] <= r_s[k];\n"
"\t\tend\n"
"\tend\n"
"\tendgenerate\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_r <= (r_s[AWIDTH-1]) ? (-acc[AWIDTH-1]) : acc[AWIDTH-1];\n"
"\n"
"endmodule\n");
"\t\tbegin // In sequence, clock = 3\n"
"\t\t\tomem[0] <= ob_b;\n"
"\t\t\tomem[1] <= omem[0];\n"
"\t\t\tif (pipeline[2])\n"
"\t\t\t\to_data <= ob_a;\n"
"\t\t\telse\n"
"\t\t\t\to_data <= omem[1];\n"
"\t\tend\n\n");
 
fclose(fp);
}
fprintf(fp,
"\tinitial\to_sync = 1\'b0;\n");
 
void build_bimpy(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
if (async_reset)
fprintf(fp,
"\talways\t@(posedge i_clk, negedge i_areset_n)\n"
"\t\tif (!i_areset_n)\n");
else
fprintf(fp,
"\talways\t@(posedge i_clk)\n"
"\t\tif (i_reset)\n");
fprintf(fp,
"////////////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: %s\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: A simple 2-bit multiply based upon the fact that LUT's allow\n"
"// 6-bits of input. In other words, I could build a 3-bit\n"
"// multiply from 6 LUTs (5 actually, since the first could have\n"
"// two outputs). This would allow multiplication of three bit\n"
"// digits, save only for the fact that you would need two bits\n"
"// of carry. The bimpy approach throttles back a bit and does\n"
"// a 2x2 bit multiply in a LUT, guaranteeing that it will never\n"
"// carry more than one bit. While this multiply is hardware\n"
"// independent (and can still run under Verilator therefore),\n"
"// it is really motivated by trying to optimize for a specific\n"
"// piece of hardware (Xilinx-7 series ...) that has at least\n"
"// 4-input LUT's with carry chains.\n"
"//\n"
"//\n"
"//\n%s"
"//\n", fname, prjname, creator);
"\t\t\to_sync <= 1\'b0;\n"
"\t\telse if (i_ce)\n"
"\t\t\to_sync <= (iaddr[2:0] == 3'b101);\n\n");
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module bimpy(i_clk, i_ce, i_a, i_b, o_r);\n"
"\tparameter\tBW=18, // Number of bits in i_b\n"
"\t\t\tLUTB=2; // Number of bits in i_a for our LUT multiply\n"
"\tinput\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\t[(LUTB-1):0]\ti_a;\n"
"\tinput\t\t[(BW-1):0]\ti_b;\n"
"\toutput\treg\t[(BW+LUTB-1):0] o_r;\n"
if (formal_property_flag) {
fprintf(fp,
"`ifdef FORMAL\n"
"\treg f_past_valid;\n"
"\tinitial f_past_valid = 1'b0;\n"
"\talways @(posedge i_clk)\n"
"\t f_past_valid = 1'b1;\n"
"\n"
"\twire [(BW+LUTB-2):0] w_r;\n"
"\twire [(BW+LUTB-3):1] c;\n"
"`ifdef QTRSTAGE\n"
"\talways @(posedge i_clk)\n"
"\t assume((i_ce)||($past(i_ce))||($past(i_ce,2)));\n"
"`endif\n"
"\n"
"\tassign\tw_r = { ((i_a[1])?i_b:{(BW){1'b0}}), 1'b0 }\n"
"\t\t\t\t^ { 1'b0, ((i_a[0])?i_b:{(BW){1'b0}}) };\n"
"\tassign\tc = { ((i_a[1])?i_b[(BW-2):0]:{(BW-1){1'b0}}) }\n"
"\t\t\t& ((i_a[0])?i_b[(BW-1):1]:{(BW-1){1'b0}});\n"
"\t// The below logic only works if the rounding stage does nothing\n"
"\tinitial assert(IWIDTH+1 == OWIDTH);\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_r <= w_r + { c, 2'b0 };\n"
"\treg signed [IWIDTH-1:0] f_piped_real [0:7];\n"
"\treg signed [IWIDTH-1:0] f_piped_imag [0:7];\n"
"\n"
"endmodule\n");
 
fclose(fp);
}
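
// The emitted bimpy module forms a 2-bit by BW-bit product from an XOR
// (the carry-free sum) plus a shifted AND (the carries). The sketch below
// is a rough software model of that arithmetic, useful only for checking
// the identity by hand. The name bimpy_model_sketch, its argument order,
// and the limit of bw <= 32 are assumptions made for illustration; they
// are not part of the generator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models o_r = w_r + { c, 2'b0 } for a 2-bit i_a and
// an unsigned bw-bit i_b (1 <= bw <= 32 is assumed here).
static inline uint64_t bimpy_model_sketch(unsigned a, uint64_t b, unsigned bw) {
	const uint64_t mask = (1ull << bw) - 1;
	b &= mask;
	const uint64_t lo = (a & 1u) ? b : 0;	// (i_a[0]) ? i_b : 0
	const uint64_t hi = (a & 2u) ? b : 0;	// (i_a[1]) ? i_b : 0
	// Sum without carries: { hi, 1'b0 } ^ { 1'b0, lo }
	const uint64_t w_r = (hi << 1) ^ lo;
	// Carry bits: hi[(bw-2):0] & lo[(bw-1):1]
	const uint64_t c = (hi & (mask >> 1)) & (lo >> 1);
	// Each carry belongs two bit positions above its index in c
	return w_r + (c << 2);
}
```

// For example, bimpy_model_sketch(3, 7, 18) should agree with 3 * 7.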
 
void build_longbimpy(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
"////////////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: %s\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: A portable shift and add multiply, built with the knowledge\n"
"// of the existence of a six bit LUT and carry chain. That\n"
"// knowledge allows us to multiply two bits from one value\n"
"// at a time against all of the bits of the other value. This\n"
"// sub multiply is called the bimpy.\n"
"//\n"
"// For minimal processing delay, make the first parameter\n"
"// the one with the least bits, so that AWIDTH <= BWIDTH.\n"
"//\n"
"//\n"
"//\n%s"
"//\n", fname, prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module longbimpy(i_clk, i_ce, i_a, i_b, o_r);\n"
"\tparameter AW=%d, // The width of i_a, min width is 5\n"
"\t\t\tBW=", TST_LONGBIMPY_AW);
#ifdef TST_LONGBIMPY_BW
fprintf(fp, "%d", TST_LONGBIMPY_BW);
#else
fprintf(fp, "AW");
#endif
 
fprintf(fp, ", // The width of i_b, can be anything\n"
"\t\t\t// The following three parameters should not be changed\n"
"\t\t\t// by any implementation, but are based upon hardware\n"
"\t\t\t// and the above values:\n"
"\t\t\tOW=AW+BW, // The output width\n"
"\t\t\tIW=(AW+1)&(-2), // Internal width of A\n"
"\t\t\tLUTB=2, // How many bits we can multiply by at once\n"
"\t\t\tTLEN=(AW+(LUTB-1))/LUTB; // Nmbr of rows in our tableau\n"
"\tinput\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\t[(AW-1):0]\ti_a;\n"
"\tinput\t\t[(BW-1):0]\ti_b;\n"
"\toutput\treg\t[(AW+BW-1):0]\to_r;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_ce)\n"
"\tbegin\n"
"\t f_piped_real[0] <= i_data[2*IWIDTH-1:IWIDTH];\n"
"\t f_piped_imag[0] <= i_data[ IWIDTH-1:0];\n"
"\n"
"\treg\t[(IW-1):0]\tu_a;\n"
"\treg\t[(BW-1):0]\tu_b;\n"
"\treg\t\t\tsgn;\n"
"\t f_piped_real[1] <= f_piped_real[0];\n"
"\t f_piped_imag[1] <= f_piped_imag[0];\n"
"\n"
"\treg\t[(IW-1-2*(LUTB)):0]\tr_a[0:(TLEN-3)];\n"
"\treg\t[(BW-1):0]\t\tr_b[0:(TLEN-3)];\n"
"\treg\t[(TLEN-1):0]\t\tr_s;\n"
"\treg\t[(IW+BW-1):0]\t\tacc[0:(TLEN-2)];\n"
"\tgenvar k;\n"
"\t f_piped_real[2] <= f_piped_real[1];\n"
"\t f_piped_imag[2] <= f_piped_imag[1];\n"
"\n"
"\t// First step:\n"
"\t// Switch to unsigned arithmetic for our multiply, keeping track\n"
"\t// of the along the way. We'll then add the sign again later at\n"
"\t// the end.\n"
"\t//\n"
"\t// If we were forced to stay within two's complement arithmetic,\n"
"\t// taking the absolute value here would require an additional bit.\n"
"\t// However, because our results are now unsigned, we can stay\n"
"\t// within the number of bits given (for now).\n"
"\tgenerate if (IW > AW)\n"
"\tbegin\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\tu_a <= { 1'b0, (i_a[AW-1])?(-i_a):(i_a) };\n"
"\tend else begin\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\tu_a <= (i_a[AW-1])?(-i_a):(i_a);\n"
"\tend endgenerate\n"
"\t f_piped_real[3] <= f_piped_real[2];\n"
"\t f_piped_imag[3] <= f_piped_imag[2];\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tu_b <= (i_b[BW-1])?(-i_b):(i_b);\n"
"\t\t\tsgn <= i_a[AW-1] ^ i_b[BW-1];\n"
"\t\tend\n"
"\t f_piped_real[4] <= f_piped_real[3];\n"
"\t f_piped_imag[4] <= f_piped_imag[3];\n"
"\n"
"\twire [(BW+LUTB-1):0] pr_a, pr_b;\n"
"\t f_piped_real[5] <= f_piped_real[4];\n"
"\t f_piped_imag[5] <= f_piped_imag[4];\n"
"\n"
"\t//\n"
"\t// Second step: First two 2xN products.\n"
"\t//\n"
"\t// Since we have no tableau of additions (yet), we can do both\n"
"\t// of the first two rows at the same time and add them together.\n"
"\t// For the next round, we'll then have a previous sum to accumulate\n"
"\t// with new and subsequent product, and so only do one product at\n"
"\t// a time can follow this--but the first clock can do two at a time.\n"
"\tbimpy\t#(BW) lmpy_0(i_clk,i_ce,u_a[( LUTB-1): 0], u_b, pr_a);\n"
"\tbimpy\t#(BW) lmpy_1(i_clk,i_ce,u_a[(2*LUTB-1):LUTB], u_b, pr_b);\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce) r_a[0] <= u_a[(IW-1):(2*LUTB)];\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce) r_b[0] <= u_b;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce) r_s <= { r_s[(TLEN-2):0], sgn };\n"
"\talways @(posedge i_clk) // One clk after p[0],p[1] become valid\n"
"\t\tif (i_ce) acc[0] <= { {(IW-LUTB){1'b0}}, pr_a}\n"
"\t\t\t +{ {(IW-(2*LUTB)){1'b0}}, pr_b, {(LUTB){1'b0}} };\n"
"\t f_piped_real[6] <= f_piped_real[5];\n"
"\t f_piped_imag[6] <= f_piped_imag[5];\n"
"\n"
"\tgenerate // Keep track of intermediate values, before multiplying them\n"
"\tif (TLEN > 3) for(k=0; k<TLEN-3; k=k+1)\n"
"\tbegin : gencopies\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tr_a[k+1] <= { {(LUTB){1'b0}},\n"
"\t\t\t\tr_a[k][(IW-1-(2*LUTB)):LUTB] };\n"
"\t\t\tr_b[k+1] <= r_b[k];\n"
"\t\tend\n"
"\tend endgenerate\n"
"\t f_piped_real[7] <= f_piped_real[6];\n"
"\t f_piped_imag[7] <= f_piped_imag[6];\n"
"\tend\n"
"\n"
"\tgenerate // The actual multiply and accumulate stage\n"
"\tif (TLEN > 2) for(k=0; k<TLEN-2; k=k+1)\n"
"\tbegin : genstages\n"
"\t\t// First, the multiply: 2-bits times BW bits\n"
"\t\twire\t[(BW+LUTB-1):0] genp;\n"
"\t\tbimpy #(BW) genmpy(i_clk,i_ce,r_a[k][(LUTB-1):0],r_b[k], genp);\n"
"\treg f_rsyncd;\n"
"\twire f_syncd;\n"
"\n"
"\t\t// Then the accumulate step -- on the next clock\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\tacc[k+1] <= acc[k] + {{(IW-LUTB*(k+3)){1'b0}},\n"
"\t\t\t\t\tgenp, {(LUTB*(k+2)){1'b0}} };\n"
"\tend endgenerate\n"
"\n"
"\twire [(IW+BW-1):0] w_r;\n"
"\tassign\tw_r = (r_s[TLEN-1]) ? (-acc[TLEN-2]) : acc[TLEN-2];\n"
"\tinitial f_rsyncd = 0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_r <= w_r[(AW+BW-1):0];\n"
"\tif(i_reset)\n"
"\t f_rsyncd <= 1'b0;\n"
"\telse if (!f_rsyncd)\n"
"\t f_rsyncd <= (o_sync);\n"
"\tassign f_syncd = (f_rsyncd)||(o_sync);\n"
"\n"
"\tgenerate if (IW > AW)\n"
"\tbegin : VUNUSED\n"
"\t\t// verilator lint_off UNUSED\n"
"\t\twire\t[(IW-AW)-1:0]\tunused;\n"
"\t\tassign\tunused = w_r[(IW+BW-1):(AW+BW)];\n"
"\t\t// verilator lint_on UNUSED\n"
"\tend endgenerate\n"
"\treg [1:0] f_state;\n"
"\n"
"endmodule\n");
 
fclose(fp);
}
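
// The dblreverse generator that follows builds braddr by wiring
// braddr[k] = iaddr[LGSIZE-3-k] for each of its LGSIZE-2 bits. The sketch
// below shows the same bit reversal in software, assuming a helper named
// bitreverse_sketch (a hypothetical name, not part of the generator) that
// is called with nbits = LGSIZE-2.

```cpp
#include <cassert>

// Hypothetical helper: reverse the low nbits bits of addr, so that
// result[k] = addr[nbits-1-k] for k = 0 .. nbits-1. Higher bits of addr
// are ignored. Assumes nbits is smaller than the width of unsigned.
static inline unsigned bitreverse_sketch(unsigned addr, unsigned nbits) {
	unsigned r = 0;
	for (unsigned k = 0; k < nbits; k++)
		if (addr & (1u << (nbits - 1 - k)))
			r |= (1u << k);
	return r;
}
```

// Reversing twice with the same nbits returns the original low bits,
// which makes for a quick sanity check of any address ordering.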
 
void build_dblreverse(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: dblreverse.v\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: This module bitreverses a pipelined FFT input. Operation is\n"
"// expected as follows:\n"
"//\n"
"// i_clk A running clock at whatever system speed is offered.\n"
"// i_rst A synchronous reset signal, that resets all internals\n"
"// i_ce If this is one, one input is consumed and an output\n"
"// is produced.\n"
"// i_in_0, i_in_1\n"
"// Two inputs to be consumed, each of width WIDTH.\n"
"// o_out_0, o_out_1\n"
"// Two of the bitreversed outputs, also of the same\n"
"// width, WIDTH. Of course, there is a delay from the\n"
"// first input to the first output. For this purpose,\n"
"// o_sync is present.\n"
"// o_sync This will be a 1\'b1 for the first value in any block.\n"
"// Following a reset, this will only become 1\'b1 once\n"
"// the data has been loaded and is now valid. After that,\n"
"// all outputs will be valid.\n"
"//\n"
"// 20150602 -- This module has undergone massive rework in order to\n"
"// ensure that it uses resources efficiently. As a result, \n"
"// it now optimizes nicely into block RAMs. As an unfortunately\n"
"// side effect, it now passes it\'s bench test (dblrev_tb) but\n"
"// fails the integration bench test (fft_tb).\n"
"//\n"
"//\n%s"
"//\n", prjname, creator);
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"\n\n"
"//\n"
"// How do we do bit reversing at two smples per clock? Can we separate out\n"
"// our work into eight memory banks, writing two banks at once and reading\n"
"// another two banks in the same clock?\n"
"//\n"
"// mem[00xxx0] = s_0[n]\n"
"// mem[00xxx1] = s_1[n]\n"
"// o_0[n] = mem[10xxx0]\n"
"// o_1[n] = mem[11xxx0]\n"
"// ...\n"
"// mem[01xxx0] = s_0[m]\n"
"// mem[01xxx1] = s_1[m]\n"
"// o_0[m] = mem[10xxx1]\n"
"// o_1[m] = mem[11xxx1]\n"
"// ...\n"
"// mem[10xxx0] = s_0[n]\n"
"// mem[10xxx1] = s_1[n]\n"
"// o_0[n] = mem[00xxx0]\n"
"// o_1[n] = mem[01xxx0]\n"
"// ...\n"
"// mem[11xxx0] = s_0[m]\n"
"// mem[11xxx1] = s_1[m]\n"
"// o_0[m] = mem[00xxx1]\n"
"// o_1[m] = mem[01xxx1]\n"
"// ...\n"
"//\n"
"// The answer is that, yes we can but: we need to use four memory banks\n"
"// to do it properly. These four banks are defined by the two bits\n"
"// that determine the top and bottom of the correct address. Larger\n"
"// FFT\'s would require more memories.\n"
"//\n"
"//\n");
fprintf(fp,
"module dblreverse(i_clk, i_rst, i_ce, i_in_0, i_in_1,\n"
"\t\to_out_0, o_out_1, o_sync);\n"
"\tparameter\t\t\tLGSIZE=%d, WIDTH=24;\n"
"\tinput\t\t\t\ti_clk, i_rst, i_ce;\n"
"\tinput\t\t[(2*WIDTH-1):0]\ti_in_0, i_in_1;\n"
"\toutput\twire\t[(2*WIDTH-1):0]\to_out_0, o_out_1;\n"
"\toutput\treg\t\t\to_sync;\n", TST_DBLREVERSE_LGSIZE);
 
fprintf(fp,
"\n"
"\treg\t\t\tin_reset;\n"
"\treg\t[(LGSIZE-1):0]\tiaddr;\n"
"\twire\t[(LGSIZE-3):0]\tbraddr;\n"
"\tinitial f_state = 0;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t f_state <= 0;\n"
"\telse if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n"
"\t f_state <= f_state + 1;\n"
"\n"
"\tgenvar\tk;\n"
"\tgenerate for(k=0; k<LGSIZE-2; k=k+1)\n"
"\tbegin : gen_a_bit_reversed_value\n"
"\t\tassign braddr[k] = iaddr[LGSIZE-3-k];\n"
"\tend endgenerate\n"
"\talways @(*)\n"
"\tif (f_state != 0)\n"
"\t assume(!i_sync);\n"
"\n"
"\tinitial iaddr = 0;\n"
"\tinitial in_reset = 1\'b1;\n"
"\tinitial o_sync = 1\'b0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tbegin\n"
"\t\t\tiaddr <= 0;\n"
"\t\t\tin_reset <= 1\'b1;\n"
"\t\t\to_sync <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\tiaddr <= iaddr + { {(LGSIZE-1){1\'b0}}, 1\'b1 };\n"
"\t\t\tif (&iaddr[(LGSIZE-2):0])\n"
"\t\t\t\tin_reset <= 1\'b0;\n"
"\t\t\tif (in_reset)\n"
"\t\t\t\to_sync <= 1\'b0;\n"
"\t\t\telse\n"
"\t\t\t\to_sync <= ~(|iaddr[(LGSIZE-2):0]);\n"
"\t\tend\n"
"\t assert(f_state[1:0] == iaddr[1:0]);\n"
"\n"
"\treg\t[(2*WIDTH-1):0]\tmem_e [0:((1<<(LGSIZE))-1)];\n"
"\treg\t[(2*WIDTH-1):0]\tmem_o [0:((1<<(LGSIZE))-1)];\n"
"\twire signed [2*IWIDTH-1:0] f_i_real, f_i_imag;\n"
"\tassign f_i_real = i_data[2*IWIDTH-1:IWIDTH];\n"
"\tassign f_i_imag = i_data[ IWIDTH-1:0];\n"
"\n"
"\twire signed [OWIDTH-1:0] f_o_real, f_o_imag;\n"
"\tassign f_o_real = o_data[2*OWIDTH-1:OWIDTH];\n"
"\tassign f_o_imag = o_data[ OWIDTH-1:0];\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\tmem_e[iaddr] <= i_in_0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\tmem_o[iaddr] <= i_in_1;\n"
"\tif (f_state == 2'b11)\n"
"\tbegin\n"
"\t assume(f_piped_real[0] != 3'sb100);\n"
"\t assume(f_piped_real[2] != 3'sb100);\n"
"\t assert(sum_r == f_piped_real[2] + f_piped_real[0]);\n"
"\t assert(sum_i == f_piped_imag[2] + f_piped_imag[0]);\n"
"\n"
"\t assert(diff_r == f_piped_real[2] - f_piped_real[0]);\n"
"\t assert(diff_i == f_piped_imag[2] - f_piped_imag[0]);\n"
"\tend\n"
"\n"
"\treg [(2*WIDTH-1):0] evn_out_0, evn_out_1, odd_out_0, odd_out_1;\n"
"\talways @(posedge i_clk)\n"
"\tif ((f_state == 2'b00)&&((f_syncd)||(iaddr >= 4)))\n"
"\tbegin\n"
"\t assert(rnd_sum_r == f_piped_real[3]+f_piped_real[1]);\n"
"\t assert(rnd_sum_i == f_piped_imag[3]+f_piped_imag[1]);\n"
"\t assert(rnd_diff_r == f_piped_real[3]-f_piped_real[1]);\n"
"\t assert(rnd_diff_i == f_piped_imag[3]-f_piped_imag[1]);\n"
"\tend\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n\t\t\tevn_out_0 <= mem_e[{~iaddr[LGSIZE-1],1\'b0,braddr}];\n"
"\tif ((f_state == 2'b10)&&(f_syncd))\n"
"\tbegin\n"
"\t // assert(o_sync);\n"
"\t assert(f_o_real == f_piped_real[5] + f_piped_real[3]);\n"
"\t assert(f_o_imag == f_piped_imag[5] + f_piped_imag[3]);\n"
"\tend\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n\t\t\tevn_out_1 <= mem_e[{~iaddr[LGSIZE-1],1\'b1,braddr}];\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n\t\t\todd_out_0 <= mem_o[{~iaddr[LGSIZE-1],1\'b0,braddr}];\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n\t\t\todd_out_1 <= mem_o[{~iaddr[LGSIZE-1],1\'b1,braddr}];\n"
"\tif ((f_state == 2'b11)&&(f_syncd))\n"
"\tbegin\n"
"\t assert(!o_sync);\n"
"\t assert(f_o_real == f_piped_real[5] + f_piped_real[3]);\n"
"\t assert(f_o_imag == f_piped_imag[5] + f_piped_imag[3]);\n"
"\tend\n"
"\n"
"\treg\tadrz;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce) adrz <= iaddr[LGSIZE-2];\n"
"\tif ((f_state == 2'b00)&&(f_syncd))\n"
"\tbegin\n"
"\t assert(!o_sync);\n"
"\t assert(f_o_real == f_piped_real[7] - f_piped_real[5]);\n"
"\t assert(f_o_imag == f_piped_imag[7] - f_piped_imag[5]);\n"
"\tend\n"
"\n"
"\tassign\to_out_0 = (adrz)?odd_out_0:evn_out_0;\n"
"\tassign\to_out_1 = (adrz)?odd_out_1:evn_out_1;\n"
"\talways @(*)\n"
"\tif ((iaddr[2:0] == 0)&&(!wait_for_sync))\n"
"\t assume(i_sync);\n"
"\n"
"endmodule\n");
 
fclose(fp);
}
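
// The butterfly generator that follows documents a three-multiply complex
// product built from P1 = ac, P2 = bd and P3 = (a+b)(c+d). The sketch
// below checks that identity in plain integer arithmetic. The names
// cpx_sketch and cmul3_sketch are hypothetical and exist only for this
// illustration; widths, rounding and pipelining are not modelled.

```cpp
#include <cassert>
#include <cstdint>

struct cpx_sketch { int64_t re, im; };

// Hypothetical helper: (a + jb) * (c + jd) using three real multiplies.
static inline cpx_sketch cmul3_sketch(int64_t a, int64_t b,
		int64_t c, int64_t d) {
	const int64_t p1 = a * c;
	const int64_t p2 = b * d;
	const int64_t p3 = (a + b) * (c + d);
	cpx_sketch r;
	r.re = p1 - p2;		// ac - bd
	r.im = p3 - p2 - p1;	// (ac + ad + bc + bd) - bd - ac = ad + bc
	return r;
}
```

// As a hand check, (1 + 2j) * (3 + 4j) = -5 + 10j, which is what
// cmul3_sketch(1, 2, 3, 4) should return.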
 
void build_butterfly(const char *fname, int xtracbits, ROUND_T rounding) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
const char *rnd_string;
if (rounding == RND_TRUNCATE)
rnd_string = "truncate";
else if (rounding == RND_FROMZERO)
rnd_string = "roundfromzero";
else if (rounding == RND_HALFUP)
rnd_string = "roundhalfup";
else
rnd_string = "convround";
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: butterfly.v\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: This routine caculates a butterfly for a decimation\n"
"// in frequency version of an FFT. Specifically, given\n"
"// complex Left and Right values together with a \n"
"// coefficient, the output of this routine is given\n"
"// by:\n"
"//\n"
"// L' = L + R\n"
"// R' = (L - R)*C\n"
"//\n"
"// The rest of the junk below handles timing (mostly),\n"
"// to make certain that L' and R' reach the output at\n"
"// the same clock. Further, just to make certain\n"
"// that is the case, an 'aux' input exists. This\n"
"// aux value will come out of this routine synchronized\n"
"// to the values it came in with. (i.e., both L', R',\n"
"// and aux all have the same delay.) Hence, a caller\n"
"// of this routine may set aux on the first input with\n"
"// valid data, and then wait to see aux set on the output\n"
"// to know when to find the first output with valid data.\n"
"//\n"
"// All bits are preserved until the very last clock,\n"
"// where any more bits than OWIDTH will be quietly\n"
"// discarded.\n"
"//\n"
"// This design features no overflow checking.\n"
"// \n"
"// Notes:\n"
"// CORDIC:\n"
"// Much as we would like, we can't use a cordic here.\n"
"// The goal is to accomplish an FFT, as defined, and a\n"
"// CORDIC places a scale factor onto the data. Removing\n"
"// the scale factor would cost a two multiplies, which\n"
"// is precisely what we are trying to avoid.\n"
"//\n"
"//\n"
"// 3-MULTIPLIES:\n"
"// It should also be possible to do this with three \n"
"// multiplies and an extra two addition cycles. \n"
"//\n"
"// We want\n"
"// R+I = (a + jb) * (c + jd)\n"
"// R+I = (ac-bd) + j(ad+bc)\n"
"// We multiply\n"
"// P1 = ac\n"
"// P2 = bd\n"
"// P3 = (a+b)(c+d)\n"
"// Then \n"
"// R+I=(P1-P2)+j(P3-P2-P1)\n"
"//\n"
"// WIDTHS:\n"
"// On multiplying an X width number by an\n"
"// Y width number, X>Y, the result should be (X+Y)\n"
"// bits, right?\n"
"// -2^(X-1) <= a <= 2^(X-1) - 1\n"
"// -2^(Y-1) <= b <= 2^(Y-1) - 1\n"
"// (2^(Y-1)-1)*(-2^(X-1)) <= ab <= 2^(X-1)2^(Y-1)\n"
"// -2^(X+Y-2)+2^(X-1) <= ab <= 2^(X+Y-2) <= 2^(X+Y-1) - 1\n"
"// -2^(X+Y-1) <= ab <= 2^(X+Y-1)-1\n"
"// YUP! But just barely. Do this and you'll really want\n"
"// to drop a bit, although you will risk overflow in so\n"
"// doing.\n"
"//\n"
"// 20150602 -- The sync logic lines have been completely redone. The\n"
"// synchronization lines no longer go through the FIFO with the\n"
"// left hand sum, but are kept out of memory. This allows the\n"
"// butterfly to use more optimal memory resources, while also\n"
"// guaranteeing that the sync lines can be properly reset upon\n"
"// any reset signal.\n"
"//\n"
"//\n%s"
"//\n", prjname, creator);
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
 
fprintf(fp,
"module\tbutterfly(i_clk, i_rst, i_ce, i_coef, i_left, i_right, i_aux,\n"
"\t\to_left, o_right, o_aux);\n"
"\t// Public changeable parameters ...\n"
"\tparameter IWIDTH=%d,", TST_BUTTERFLY_IWIDTH);
#ifdef TST_BUTTERFLY_CWIDTH
fprintf(fp, "CWIDTH=%d,", TST_BUTTERFLY_CWIDTH);
#else
fprintf(fp, "CWIDTH=IWIDTH+%d,", xtracbits);
#endif
#ifdef TST_BUTTERFLY_OWIDTH
fprintf(fp, "OWIDTH=%d;\n", TST_BUTTERFLY_OWIDTH);
#else
fprintf(fp, "OWIDTH=IWIDTH+1;\n");
#endif
fprintf(fp,
"\t// Parameters specific to the core that should not be changed.\n"
"\tparameter MPYDELAY=%d'd%d,\n"
"\t\t\tSHIFT=0, AUXLEN=(MPYDELAY+3);\n"
"\t// The LGDELAY should be the base two log of the MPYDELAY. If\n"
"\t// this value is fractional, then round up to the nearest\n"
"\t// integer: LGDELAY=ceil(log(MPYDELAY)/log(2));\n"
"\tparameter\tLGDELAY=%d;\n"
"\tinput\t\ti_clk, i_rst, i_ce;\n"
"\tinput\t\t[(2*CWIDTH-1):0] i_coef;\n"
"\tinput\t\t[(2*IWIDTH-1):0] i_left, i_right;\n"
"\tinput\t\ti_aux;\n"
"\toutput\twire [(2*OWIDTH-1):0] o_left, o_right;\n"
"\toutput\treg\to_aux;\n"
"\n", lgdelay(16,xtracbits), bflydelay(16, xtracbits),
lgdelay(16,xtracbits));
fprintf(fp,
"\treg\t[(2*IWIDTH-1):0]\tr_left, r_right;\n"
"\treg\t[(2*CWIDTH-1):0]\tr_coef, r_coef_2;\n"
"\twire\tsigned\t[(IWIDTH-1):0]\tr_left_r, r_left_i, r_right_r, r_right_i;\n"
"\tassign\tr_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_left_i = r_left[ (IWIDTH-1):0];\n"
"\tassign\tr_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_right_i = r_right[(IWIDTH-1):0];\n"
"\talways @(*)\n"
"\tif (wait_for_sync)\n"
"\t assert((iaddr == 0)&&(f_state == 2'b00)&&(!o_sync)&&(!f_rsyncd));\n"
"\n"
"\treg\tsigned\t[(IWIDTH):0]\tr_sum_r, r_sum_i, r_dif_r, r_dif_i;\n"
"\n"
"\treg [(LGDELAY-1):0] fifo_addr;\n"
"\twire [(LGDELAY-1):0] fifo_read_addr;\n"
"\tassign\tfifo_read_addr = fifo_addr - MPYDELAY;\n"
"\treg [(2*IWIDTH+1):0] fifo_left [ 0:((1<<LGDELAY)-1)];\n"
"\n");
fprintf(fp,
"\t// Set up the input to the multiply\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// One clock just latches the inputs\n"
"\t\t\tr_left <= i_left; // No change in # of bits\n"
"\t\t\tr_right <= i_right;\n"
"\t\t\tr_coef <= i_coef;\n"
"\t\t\t// Next clock adds/subtracts\n"
"\t\t\tr_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits\n"
"\t\t\tr_sum_i <= r_left_i + r_right_i;\n"
"\t\t\tr_dif_r <= r_left_r - r_right_r;\n"
"\t\t\tr_dif_i <= r_left_i - r_right_i;\n"
"\t\t\t// Other inputs are simply delayed on second clock\n"
"\t\t\tr_coef_2<= r_coef;\n"
"\t\tend\n"
"\n");
fprintf(fp,
"\t// Don\'t forget to record the even side, since it doesn\'t need\n"
"\t// to be multiplied, but yet we still need the results in sync\n"
"\t// with the answer when it is ready.\n"
"\tinitial fifo_addr = 0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\t\tfifo_addr <= 0;\n"
"\t\telse if (i_ce)\n"
"\t\t\t// Need to delay the sum side--nothing else happens\n"
"\t\t\t// to it, but it needs to stay synchronized with the\n"
"\t\t\t// right side.\n"
"\t\t\tfifo_addr <= fifo_addr + 1;\n"
"\tif ((f_past_valid)&&($past(i_ce))&&($past(i_sync))&&(!$past(i_reset)))\n"
"\t assert(!wait_for_sync);\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\tfifo_left[fifo_addr] <= { r_sum_r, r_sum_i };\n"
"\n"
"\twire\tsigned\t[(CWIDTH-1):0] ir_coef_r, ir_coef_i;\n"
"\tassign\tir_coef_r = r_coef_2[(2*CWIDTH-1):CWIDTH];\n"
"\tassign\tir_coef_i = r_coef_2[(CWIDTH-1):0];\n"
"\twire\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\tp_one, p_two, p_three;\n"
"\n"
"\n");
fprintf(fp,
"\t// Multiply output is always a width of the sum of the widths of\n"
"\t// the two inputs. ALWAYS. This is independent of the number of\n"
"\t// bits in p_one, p_two, or p_three. These values needed to \n"
"\t// accumulate a bit (or two) each. However, this approach to a\n"
"\t// three multiply complex multiply cannot increase the total\n"
"\t// number of bits in our final output. We\'ll take care of\n"
"\t// dropping back down to the proper width, OWIDTH, in our routine\n"
"\t// below.\n"
"\n"
"\n");
fprintf(fp,
"\t// We accomplish here \"Karatsuba\" multiplication. That is,\n"
"\t// by doing three multiplies we accomplish the work of four.\n"
"\t// Let\'s prove to ourselves that this works ... We wish to\n"
"\t// multiply: (a+jb) * (c+jd), where a+jb is given by\n"
"\t//\ta + jb = r_dif_r + j r_dif_i, and\n"
"\t//\tc + jd = ir_coef_r + j ir_coef_i.\n"
"\t// We do this by calculating the intermediate products P1, P2,\n"
"\t// and P3 as\n"
"\t//\tP1 = ac\n"
"\t//\tP2 = bd\n"
"\t//\tP3 = (a + b) * (c + d)\n"
"\t// and then complete our final answer with\n"
"\t//\tac - bd = P1 - P2 (this checks)\n"
"\t//\tad + bc = P3 - P2 - P1\n"
"\t//\t = (ac + bc + ad + bd) - bd - ac\n"
"\t//\t = bc + ad (this checks)\n"
"\n"
"\n");
fprintf(fp,
"\t// This should really be based upon an IF, such as in\n"
"\t// if (IWIDTH < CWIDTH) then ...\n"
"\t// However, this is the only (other) way I know to do it.\n"
"\tgenerate if (CWIDTH < IWIDTH+1)\n"
"\tif ((f_state == 2'b01)&&(f_syncd))\n"
"\tbegin\n"
"\t\twire\t[(CWIDTH):0]\tp3c_in;\n"
"\t\twire\t[(IWIDTH+1):0]\tp3d_in;\n"
"\t\tassign\tp3c_in = ir_coef_i + ir_coef_r;\n"
"\t\tassign\tp3d_in = r_dif_r + r_dif_i;\n"
"\n"
"\t\t// We need to pad these first two multiplies by an extra\n"
"\t\t// bit just to keep them aligned with the third,\n"
"\t\t// simpler, multiply.\n"
"\t\t%s #(CWIDTH+1,IWIDTH+2) p1(i_clk, i_ce,\n"
"\t\t\t\t{ir_coef_r[CWIDTH-1],ir_coef_r},\n"
"\t\t\t\t{r_dif_r[IWIDTH],r_dif_r}, p_one);\n"
"\t\t%s #(CWIDTH+1,IWIDTH+2) p2(i_clk, i_ce,\n"
"\t\t\t\t{ir_coef_i[CWIDTH-1],ir_coef_i},\n"
"\t\t\t\t{r_dif_i[IWIDTH],r_dif_i}, p_two);\n"
"\t\t%s #(CWIDTH+1,IWIDTH+2) p3(i_clk, i_ce,\n"
"\t\t\t\tp3c_in, p3d_in, p_three);\n"
"\tend else begin\n"
"\t\twire\t[(CWIDTH):0]\tp3c_in;\n"
"\t\twire\t[(IWIDTH+1):0]\tp3d_in;\n"
"\t\tassign\tp3c_in = ir_coef_i + ir_coef_r;\n"
"\t\tassign\tp3d_in = r_dif_r + r_dif_i;\n"
"\n"
"\t\t%s #(IWIDTH+2,CWIDTH+1) p1a(i_clk, i_ce,\n"
"\t\t\t\t{r_dif_r[IWIDTH],r_dif_r},\n"
"\t\t\t\t{ir_coef_r[CWIDTH-1],ir_coef_r}, p_one);\n"
"\t\t%s #(IWIDTH+2,CWIDTH+1) p2a(i_clk, i_ce,\n"
"\t\t\t\t{r_dif_i[IWIDTH], r_dif_i},\n"
"\t\t\t\t{ir_coef_i[CWIDTH-1],ir_coef_i}, p_two);\n"
"\t\t%s #(IWIDTH+2,CWIDTH+1) p3a(i_clk, i_ce,\n"
"\t\t\t\tp3d_in, p3c_in, p_three);\n"
"\t assert(!o_sync);\n"
"\t if (INVERSE)\n"
"\t begin\n"
"\t assert(f_o_real == -f_piped_imag[7]+f_piped_imag[5]);\n"
"\t assert(f_o_imag == f_piped_real[7]-f_piped_real[5]);\n"
"\t end else begin\n"
"\t assert(f_o_real == f_piped_imag[7]-f_piped_imag[5]);\n"
"\t assert(f_o_imag == -f_piped_real[7]+f_piped_real[5]);\n"
"\t end\n"
"\tend\n"
"\tendgenerate\n"
"\n",
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy",
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy",
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy",
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy",
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy",
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy");
fprintf(fp,
"\t// These values are held in memory and delayed during the\n"
"\t// multiply. Here, we recover them. During the multiply,\n"
"\t// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},\n"
"\t// therefore, the left_x values need to be right shifted by\n"
"\t// CWIDTH-2 as well. The additional bits come from a sign\n"
"\t// extension.\n"
"\twire\tsigned\t[(IWIDTH+CWIDTH):0] fifo_i, fifo_r;\n"
"\treg\t\t[(2*IWIDTH+1):0] fifo_read;\n"
"\tassign\tfifo_r = { {2{fifo_read[2*(IWIDTH+1)-1]}}, fifo_read[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1\'b0}} };\n"
"\tassign\tfifo_i = { {2{fifo_read[(IWIDTH+1)-1]}}, fifo_read[((IWIDTH+1)-1):0], {(CWIDTH-2){1\'b0}} };\n"
"\n"
"\n"
"\treg\tsigned\t[(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;\n"
"\n");
fprintf(fp,
"\t// Let's do some rounding and remove unnecessary bits.\n"
"\t// We have (IWIDTH+CWIDTH+3) bits here, we need to drop down to\n"
"\t// OWIDTH, and SHIFT by SHIFT bits in the process. The trick is\n"
"\t// that we don\'t need (IWIDTH+CWIDTH+3) bits. We\'ve accumulated\n"
"\t// them, but the actual values will never fill all these bits.\n"
"\t// In particular, we only need:\n"
"\t//\t IWIDTH bits for the input\n"
"\t//\t +1 bit for the add/subtract\n"
"\t//\t+CWIDTH bits for the coefficient multiply\n"
"\t//\t +1 bit for the add/subtract in the complex multiply\n"
"\t//\t ------\n"
"\t//\t (IWIDTH+CWIDTH+2) bits at full precision.\n"
"\t//\n"
"\t// However, the coefficient multiply multiplied by a maximum value\n"
"\t// of 2^(CWIDTH-2). Thus, we only have\n"
"\t//\t IWIDTH bits for the input\n"
"\t//\t +1 bit for the add/subtract\n"
"\t//\t+CWIDTH-2 bits for the coefficient multiply\n"
"\t//\t +1 (optional) bit for the add/subtract in the cpx mpy.\n"
"\t//\t -------- ... multiply. (This last bit may be shifted out.)\n"
"\t//\t (IWIDTH+CWIDTH) valid output bits. \n"
"\t// Now, if the user wants to keep any extras of these (via OWIDTH),\n"
"\t// or if he wishes to arbitrarily shift some of these off (via\n"
"\t// SHIFT) we accomplish that here.\n"
"\n");
fprintf(fp,
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;\n\n");
"`endif\n");
}
 
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_r(i_clk, i_ce,\n"
"\t\t\t\t{ {2{fifo_r[(IWIDTH+CWIDTH)]}}, fifo_r }, rnd_left_r);\n\n",
rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_i(i_clk, i_ce,\n"
"\t\t\t\t{ {2{fifo_i[(IWIDTH+CWIDTH)]}}, fifo_i }, rnd_left_i);\n\n",
rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,\n"
"\t\t\t\tmpy_r, rnd_right_r);\n\n", rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,\n"
"\t\t\t\tmpy_i, rnd_right_i);\n\n", rnd_string);
fprintf(fp,
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// First clock, recover all values\n"
"\t\t\tfifo_read <= fifo_left[fifo_read_addr];\n"
"\t\t\t// These values are IWIDTH+CWIDTH+3 bits wide\n"
"\t\t\t// although they only need to be (IWIDTH+1)\n"
"\t\t\t// + (CWIDTH) bits wide. (We\'ve got two\n"
"\t\t\t// extra bits we need to get rid of.)\n"
"\t\t\tmpy_r <= p_one - p_two;\n"
"\t\t\tmpy_i <= p_three - p_one - p_two;\n"
"\t\tend\n"
"\n");
fprintf(fp, "endmodule\n");
}
 
fprintf(fp,
"\treg\t[(AUXLEN-1):0]\taux_pipeline;\n"
"\tinitial\taux_pipeline = 0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\t\taux_pipeline <= 0;\n"
"\t\telse if (i_ce)\n"
"\t\t\taux_pipeline <= { aux_pipeline[(AUXLEN-2):0], i_aux };\n"
"\n");
fprintf(fp,
"\tinitial o_aux = 1\'b0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\t\to_aux <= 1\'b0;\n"
"\t\telse if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, latch for final clock\n"
"\t\t\to_aux <= aux_pipeline[AUXLEN-1];\n"
"\t\tend\n"
"\n");
 
fprintf(fp,
"\t// As a final step, we pack our outputs into two packed two\'s\n"
"\t// complement numbers per output word, so that each output word\n"
"\t// has (2*OWIDTH) bits in it, with the top half being the real\n"
"\t// portion and the bottom half being the imaginary portion.\n"
"\tassign o_left = { rnd_left_r, rnd_left_i };\n"
"\tassign o_right= { rnd_right_r,rnd_right_i};\n"
"\n"
"endmodule\n");
fclose(fp);
}
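
// A minimal sketch, assuming plain integers in place of the fixed-width
// registers: the generated butterfly forms one complex product from three
// real multiplies (p_one, p_two, p_three) and recovers the result as
// mpy_r = p_one - p_two and mpy_i = p_three - p_one - p_two. The names Cpx
// and mul3 are hypothetical and not part of fftgen; pipelining, bit widths,
// and rounding are ignored here.

```cpp
#include <cassert>

// Hypothetical integer complex type, for illustration only.
struct Cpx {
    long long re;
    long long im;
};

// Complex multiply d*c using three real multiplies instead of four.
Cpx mul3(Cpx d, Cpx c) {
    long long p_one   = c.re * d.re;                   // real * real
    long long p_two   = c.im * d.im;                   // imag * imag
    long long p_three = (c.re + c.im) * (d.re + d.im); // sum * sum
    Cpx out;
    out.re = p_one - p_two;            // matches mpy_r
    out.im = p_three - p_one - p_two;  // matches mpy_i
    return out;
}
```

// The third product absorbs both cross terms, which is why its operands
// each need one extra bit (IWIDTH+2 and CWIDTH+1 in the generated Verilog).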
 
void build_hwbfly(const char *fname, int xtracbits, ROUND_T rounding) {
void build_sngllast(const char *fname, const bool async_reset = false) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
1777,472 → 740,221
return;
}
 
const char *rnd_string;
if (rounding == RND_TRUNCATE)
rnd_string = "truncate";
else if (rounding == RND_FROMZERO)
rnd_string = "roundfromzero";
else if (rounding == RND_HALFUP)
rnd_string = "roundhalfup";
else
rnd_string = "convround";
std::string resetw("i_reset");
if (async_reset)
resetw = std::string("i_areset_n");
 
 
fprintf(fp,
"///////////////////////////////////////////////////////////////////////////\n"
SLASHLINE
"//\n"
"// Filename: hwbfly.v\n"
"// Filename:\tlaststage.v\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: This routine is identical to the butterfly.v routine found\n"
"// in 'butterfly.v', save only that it uses the verilog \n"
"// operator '*' in hopes that the synthesizer would be able to optimize\n"
"// it with hardware resources.\n"
"// Purpose: This is part of an FPGA implementation that will process\n"
"// the final stage of a decimate-in-frequency FFT, running\n"
"// through the data at one sample per clock.\n"
"//\n"
"// It is understood that a hardware multiply can complete its operation in\n"
"// a single clock.\n"
"//\n"
"//\n%s"
"//\n", prjname, creator);
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
 
fprintf(fp,
"module hwbfly(i_clk, i_rst, i_ce, i_coef, i_left, i_right, i_aux,\n"
"\t\to_left, o_right, o_aux);\n"
"\t// Public changeable parameters ...\n"
"\tparameter IWIDTH=16,CWIDTH=IWIDTH+%d,OWIDTH=IWIDTH+1;\n"
"\t// Parameters specific to the core that should not be changed.\n"
"\tparameter\tSHIFT=0;\n"
"\tinput\t\ti_clk, i_rst, i_ce;\n"
"\tinput\t\t[(2*CWIDTH-1):0]\ti_coef;\n"
"\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n"
"\tinput\t\ti_aux;\n"
"\toutput\twire\t[(2*OWIDTH-1):0]\to_left, o_right;\n"
"\toutput\treg\to_aux;\n"
"\n", xtracbits);
"module laststage(i_clk, %s, i_ce, i_sync, i_val, o_val, o_sync);\n"
" parameter IWIDTH=16,OWIDTH=IWIDTH+1, SHIFT=0;\n"
" input i_clk, %s, i_ce, i_sync;\n"
" input [(2*IWIDTH-1):0] i_val;\n"
" output wire [(2*OWIDTH-1):0] o_val;\n"
" output reg o_sync;\n\n",
resetw.c_str(), resetw.c_str());
 
fprintf(fp,
"\treg\t[(2*IWIDTH-1):0] r_left, r_right;\n"
"\treg\t r_aux, r_aux_2;\n"
"\treg\t[(2*CWIDTH-1):0] r_coef;\n"
"\twire signed [(IWIDTH-1):0] r_left_r, r_left_i, r_right_r, r_right_i;\n"
"\tassign\tr_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_left_i = r_left[ (IWIDTH-1):0];\n"
"\tassign\tr_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_right_i = r_right[(IWIDTH-1):0];\n"
"\treg signed [(CWIDTH-1):0] ir_coef_r, ir_coef_i;\n"
" reg signed [(IWIDTH-1):0] m_r, m_i;\n"
" wire signed [(IWIDTH-1):0] i_r, i_i;\n"
"\n"
"\treg signed [(IWIDTH):0] r_sum_r, r_sum_i, r_dif_r, r_dif_i;\n"
" assign i_r = i_val[(2*IWIDTH-1):(IWIDTH)]; \n"
" assign i_i = i_val[(IWIDTH-1):0]; \n"
"\n"
"\treg [(2*IWIDTH+2):0] leftv, leftvv;\n"
" // Don't forget that we accumulate a bit by adding two values\n"
" // together. Therefore our intermediate value must have one more\n"
" // bit than the two originals.\n"
" reg signed [(IWIDTH):0] rnd_r, rnd_i, sto_r, sto_i;\n"
" reg wait_for_sync, stage;\n"
" reg [1:0] sync_pipe;\n"
"\n"
"\t// Set up the input to the multiply\n"
"\tinitial r_aux = 1\'b0;\n"
"\tinitial r_aux_2 = 1\'b0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tbegin\n"
"\t\t\tr_aux <= 1\'b0;\n"
"\t\t\tr_aux_2 <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// One clock just latches the inputs\n"
"\t\t\tr_aux <= i_aux;\n"
"\t\t\t// Next clock adds/subtracts\n"
"\t\t\t// Other inputs are simply delayed on second clock\n"
"\t\t\tr_aux_2 <= r_aux;\n"
"\t\tend\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// One clock just latches the inputs\n"
"\t\t\tr_left <= i_left; // No change in # of bits\n"
"\t\t\tr_right <= i_right;\n"
"\t\t\tr_coef <= i_coef;\n"
"\t\t\t// Next clock adds/subtracts\n"
"\t\t\tr_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits\n"
"\t\t\tr_sum_i <= r_left_i + r_right_i;\n"
"\t\t\tr_dif_r <= r_left_r - r_right_r;\n"
"\t\t\tr_dif_i <= r_left_i - r_right_i;\n"
"\t\t\t// Other inputs are simply delayed on second clock\n"
"\t\t\tir_coef_r <= r_coef[(2*CWIDTH-1):CWIDTH];\n"
"\t\t\tir_coef_i <= r_coef[(CWIDTH-1):0];\n"
"\t\tend\n"
"\n\n");
fprintf(fp,
"\t// See comments in the butterfly.v source file for a discussion of\n"
"\t// these operations and the appropriate bit widths.\n\n");
fprintf(fp,
"\treg\tsigned [((IWIDTH+1)+(CWIDTH)-1):0] p_one, p_two;\n"
"\treg\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] p_three;\n"
"\n"
"\treg\tsigned [(CWIDTH-1):0] p1c_in, p2c_in; // Coefficient multiply inputs\n"
"\treg\tsigned [(IWIDTH):0] p1d_in, p2d_in; // Data multiply inputs\n"
"\treg\tsigned [(CWIDTH):0] p3c_in; // Product 3, coefficient input\n"
"\treg\tsigned [(IWIDTH+1):0] p3d_in; // Product 3, data input\n"
"\n"
"\tinitial leftv = 0;\n"
"\tinitial leftvv = 0;\n"
"\talways @(posedge i_clk)\n"
"\tbegin\n"
"\t\tif (i_rst)\n"
"\t\tbegin\n"
"\t\t\tleftv <= 0;\n"
"\t\t\tleftvv <= 0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, pipeline = 1\n"
"\t\t\tleftv <= { r_aux_2, r_sum_r, r_sum_i };\n"
"\n"
"\t\t\t// Third clock, pipeline = 3\n"
"\t\t\t// As desired, each of these lines infers a DSP48\n"
"\t\t\tleftvv <= leftv;\n"
"\t\tend\n"
"\tend\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, pipeline = 1\n"
"\t\t\tp1c_in <= ir_coef_r;\n"
"\t\t\tp2c_in <= ir_coef_i;\n"
"\t\t\tp1d_in <= r_dif_r;\n"
"\t\t\tp2d_in <= r_dif_i;\n"
"\t\t\tp3c_in <= ir_coef_i + ir_coef_r;\n"
"\t\t\tp3d_in <= r_dif_r + r_dif_i;\n"
"\n"
"\n"
"\t\t\t// Third clock, pipeline = 3\n"
"\t\t\t// As desired, each of these lines infers a DSP48\n"
"\t\t\tp_one <= p1c_in * p1d_in;\n"
"\t\t\tp_two <= p2c_in * p2d_in;\n"
"\t\t\tp_three <= p3c_in * p3d_in;\n"
"\t\tend\n"
"\n"
"\twire\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] w_one, w_two;\n"
"\tassign\tw_one = { {(2){p_one[((IWIDTH+1)+(CWIDTH)-1)]}}, p_one };\n"
"\tassign\tw_two = { {(2){p_two[((IWIDTH+1)+(CWIDTH)-1)]}}, p_two };\n"
"\n");
" initial wait_for_sync = 1'b1;\n"
" initial stage = 1'b0;\n");
 
if (async_reset)
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
fprintf(fp,
"\t// These values are held in memory and delayed during the\n"
"\t// multiply. Here, we recover them. During the multiply,\n"
"\t// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},\n"
"\t// therefore, the left_x values need to be right shifted by\n"
"\t// CWIDTH-2 as well. The additional bits come from a sign\n"
"\t// extension.\n"
"\twire\taux_s;\n"
"\twire\tsigned\t[(IWIDTH+CWIDTH):0] left_si, left_sr;\n"
"\treg\t\t[(2*IWIDTH+2):0] left_saved;\n"
"\tassign\tleft_sr = { {2{left_saved[2*(IWIDTH+1)-1]}}, left_saved[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1\'b0}} };\n"
"\tassign\tleft_si = { {2{left_saved[(IWIDTH+1)-1]}}, left_saved[((IWIDTH+1)-1):0], {(CWIDTH-2){1\'b0}} };\n"
"\tassign\taux_s = left_saved[2*IWIDTH+2];\n"
"\n"
"\n"
"\t(* use_dsp48=\"no\" *)\n"
"\treg signed [(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;\n");
fprintf(fp,
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;\n\n");
" begin\n"
" wait_for_sync <= 1'b1;\n"
" stage <= 1'b0;\n"
" end else if ((i_ce)&&((!wait_for_sync)||(i_sync))&&(!stage))\n"
" begin\n"
" wait_for_sync <= 1'b0;\n"
" //\n"
" stage <= 1'b1;\n"
" //\n"
" end else if (i_ce)\n"
" stage <= 1'b0;\n\n");
 
fprintf(fp, "\tinitial\tsync_pipe = 0;\n");
if (async_reset)
fprintf(fp,
"\talways @(posedge i_clk, negedge i_areset_n)\n"
"\tif (!i_areset_n)\n");
else
fprintf(fp,
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n");
 
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_r(i_clk, i_ce,\n"
"\t\t\t\tleft_sr, rnd_left_r);\n\n",
rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_i(i_clk, i_ce,\n"
"\t\t\t\tleft_si, rnd_left_i);\n\n",
rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,\n"
"\t\t\t\tmpy_r, rnd_right_r);\n\n", rnd_string);
fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,\n"
"\t\t\t\tmpy_i, rnd_right_i);\n\n", rnd_string);
"\t\tsync_pipe <= 0;\n"
"\telse if (i_ce)\n"
"\t\tsync_pipe <= { sync_pipe[0], i_sync };\n\n");
 
fprintf(fp, "\tinitial\to_sync = 1\'b0;\n");
if (async_reset)
fprintf(fp,
"\talways @(posedge i_clk, negedge i_areset_n)\n"
"\tif (!i_areset_n)\n");
else
fprintf(fp,
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n");
 
fprintf(fp,
"\tinitial left_saved = 0;\n"
"\tinitial o_aux = 1\'b0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tbegin\n"
"\t\t\tleft_saved <= 0;\n"
"\t\t\to_aux <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// First clock, recover all values\n"
"\t\t\tleft_saved <= leftvv;\n"
"\n"
"\t\t\t// Second clock, round and latch for final clock\n"
"\t\t\to_aux <= aux_s;\n"
"\t\tend\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// These values are IWIDTH+CWIDTH+3 bits wide\n"
"\t\t\t// although they only need to be (IWIDTH+1)\n"
"\t\t\t// + (CWIDTH) bits wide. (We've got two\n"
"\t\t\t// extra bits we need to get rid of.)\n"
"\n"
"\t\t\t// These two lines also infer DSP48\'s.\n"
"\t\t\t// To keep from using extra DSP48 resources,\n"
"\t\t\t// they are prevented from using DSP48\'s\n"
"\t\t\t// by the (* use_dsp48 ... *) comment above.\n"
"\t\t\tmpy_r <= w_one - w_two;\n"
"\t\t\tmpy_i <= p_three - w_one - w_two;\n"
"\t\tend\n"
"\n");
"\t\to_sync <= 1\'b0;\n"
"\telse if (i_ce)\n"
"\t\to_sync <= sync_pipe[1];\n\n");
 
fprintf(fp,
"\t// As a final step, we pack our outputs into two packed two's\n"
"\t// complement numbers per output word, so that each output word\n"
"\t// has (2*OWIDTH) bits in it, with the top half being the real\n"
"\t// portion and the bottom half being the imaginary portion.\n"
"\tassign\to_left = { rnd_left_r, rnd_left_i };\n"
"\tassign\to_right= { rnd_right_r,rnd_right_i};\n"
" always @(posedge i_clk)\n"
" if (i_ce)\n"
" begin\n"
" if (!stage)\n"
" begin\n"
" // Clock 1\n"
" m_r <= i_r;\n"
" m_i <= i_i;\n"
" // Clock 3\n"
" rnd_r <= sto_r;\n"
" rnd_i <= sto_i;\n"
" //\n"
" end else begin\n"
" // Clock 2\n"
" rnd_r <= m_r + i_r;\n"
" rnd_i <= m_i + i_i;\n"
" //\n"
" sto_r <= m_r - i_r;\n"
" sto_i <= m_i - i_i;\n"
" //\n"
" end\n"
" end\n"
"\n"
"endmodule\n");
 
}
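
// A minimal sketch, assuming unbounded integers: the generated laststage
// module consumes samples in pairs and emits their sum followed by their
// difference (rnd <= m + i on one clock, sto <= m - i released on the next).
// The names Sample, Pair, and last_stage are hypothetical and not part of
// fftgen; the sync logic and the convround rounding step are left out.

```cpp
#include <cassert>

// Hypothetical complex sample, for illustration only.
struct Sample {
    int re;
    int im;
};

// The two outputs produced for one input pair.
struct Pair {
    Sample sum;  // emitted first
    Sample dif;  // stored, then emitted on the following sample
};

// Two-point butterfly with a unit twiddle: no multiplies are needed.
Pair last_stage(Sample a, Sample b) {
    Pair out;
    out.sum.re = a.re + b.re;
    out.sum.im = a.im + b.im;
    out.dif.re = a.re - b.re;
    out.dif.im = a.im - b.im;
    return out;
}
```

// Each add or subtract can grow the result by one bit, which is why the
// intermediate registers in the generated module are IWIDTH+1 bits wide.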
 
void build_stage(const char *fname, const char *coredir, int stage, bool odd, int nbits, bool inv, int xtra, bool hwmpy=false, bool dbg=false) {
FILE *fstage = fopen(fname, "w");
int cbits = nbits + xtra;
 
if ((cbits * 2) >= sizeof(long long)*8) {
fprintf(stderr, "ERROR: CMEM Coefficient precision requested overflows long long data type.\n");
exit(-1);
}
 
if (fstage == NULL) {
fprintf(stderr, "ERROR: Could not open %s for writing!\n", fname);
perror("O/S Err was");
fprintf(stderr, "Attempting to continue, but this file will be missing.\n");
return;
}
 
fprintf(fstage,
"////////////////////////////////////////////////////////////////////////////\n"
"//\n"
"// Filename: %sfftstage_%c%d%s.v\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: This file is (almost) a Verilog source file. It is meant to\n"
"// be used by a FFT core compiler to generate FFTs which may be\n"
"// used as part of an FFT core. Specifically, this file \n"
"// encapsulates the options of an FFT-stage. For any 2^N length\n"
"// FFT, there shall be (N-1) of these stages. \n"
"//\n%s"
"//\n",
(inv)?"i":"", (odd)?'o':'e', stage*2, (dbg)?"_dbg":"", prjname, creator);
fprintf(fstage, "%s", cpyleft);
fprintf(fstage, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fstage, "module\t%sfftstage_%c%d%s(i_clk, i_rst, i_ce, i_sync, i_data, o_data, o_sync%s);\n",
(inv)?"i":"", (odd)?'o':'e', stage*2, (dbg)?"_dbg":"",
(dbg)?", o_dbg":"");
// These parameter values are useless at this point--they are to be
// replaced by the parameter values in the calling program. Only
// problem is, the CWIDTH needs to match exactly!
fprintf(fstage, "\tparameter\tIWIDTH=%d,CWIDTH=%d,OWIDTH=%d;\n",
nbits, cbits, nbits+1);
fprintf(fstage,
"\t// Parameters specific to the core that should be changed when this\n"
"\t// core is built ... Note that the minimum LGSPAN (the base two log\n"
"\t// of the span, or the base two log of the current FFT size) is 3.\n"
"\t// Smaller spans (i.e. the span of 2) must use the dblstage module.\n"
"\tparameter\tLGWIDTH=11, LGSPAN=9, LGBDLY=5, BFLYSHIFT=0;\n");
fprintf(fstage,
"\tinput i_clk, i_rst, i_ce, i_sync;\n"
"\tinput [(2*IWIDTH-1):0] i_data;\n"
"\toutput reg [(2*OWIDTH-1):0] o_data;\n"
"\toutput reg o_sync;\n"
"\n");
if (dbg) { fprintf(fstage, "\toutput\twire\t[33:0]\t\t\to_dbg;\n"
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_data[(2*OWIDTH-1):(2*OWIDTH-16)],\n"
"\t\t\t\t\to_data[(OWIDTH-1):(OWIDTH-16)] };\n"
"\n");
}
fprintf(fstage,
"\treg wait_for_sync;\n"
"\treg [(2*IWIDTH-1):0] ib_a, ib_b;\n"
"\treg [(2*CWIDTH-1):0] ib_c;\n"
"\treg ib_sync;\n"
" // Now that we have our results, let's round them and report them\n"
" wire signed [(OWIDTH-1):0] o_r, o_i;\n"
"\n"
"\treg b_started;\n"
"\twire ob_sync;\n"
"\twire [(2*OWIDTH-1):0]\tob_a, ob_b;\n");
fprintf(fstage,
" convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_r(i_clk, i_ce, rnd_r, o_r);\n"
" convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_i(i_clk, i_ce, rnd_i, o_i);\n"
"\n"
"\t// %scmem is defined as an array of complex values,\n"
"\t// where the top CWIDTH bits are the real value and the bottom\n"
"\t// CWIDTH bits are the imaginary value.\n"
"\t//\n"
"\t// %scmem[i] = { (2^(CWIDTH-2)) * cos(2*pi*i/(2^LGWIDTH)),\n"
"\t// (2^(CWIDTH-2)) * sin(2*pi*i/(2^LGWIDTH)) };\n"
"\t//\n"
"\treg [(2*CWIDTH-1):0] %scmem [0:((1<<LGSPAN)-1)];\n"
"\tinitial\t$readmemh(\"%scmem_%c%d.hex\",%scmem);\n\n",
(inv)?"i":"", (inv)?"i":"", (inv)?"i":"",
(inv)?"i":"", (odd)?'o':'e',stage<<1, (inv)?"i":"");
{
FILE *cmem;
" assign o_val = { o_r, o_i };\n"
"\n");
 
{
char *memfile, *ptr;
 
memfile = new char[strlen(fname)+128];
strcpy(memfile, fname);
if ((NULL != (ptr = strrchr(memfile, '/')))&&(ptr>memfile)) {
ptr++;
sprintf(ptr, "%scmem_%c%d.hex", (inv)?"i":"", (odd)?'o':'e', stage*2);
} else {
sprintf(memfile, "%s/%scmem_%c%d.hex",
coredir, (inv)?"i":"",
(odd)?'o':'e', stage*2);
}
// strcpy(&memfile[strlen(memfile)-2], ".hex");
cmem = fopen(memfile, "w");
if (NULL == cmem) {
fprintf(stderr, "Could not open/write \'%s\' with FFT coefficients.\n", memfile);
perror("Err from O/S");
exit(-2);
}
 
delete[] memfile;
}
// fprintf(cmem, "// CBITS = %d, inv = %s\n", cbits, (inv)?"true":"false");
for(int i=0; i<stage/2; i++) {
int k = 2*i+odd;
double W = ((inv)?1:-1)*2.0*M_PI*k/(double)(2*stage);
double c, s;
long long ic, is, vl;
 
c = cos(W); s = sin(W);
ic = (long long)llround((1ll<<(cbits-2)) * c);
is = (long long)llround((1ll<<(cbits-2)) * s);
vl = (ic & (~(-1ll << (cbits))));
vl <<= (cbits);
vl |= (is & (~(-1ll << (cbits))));
fprintf(cmem, "%0*llx\n", ((cbits*2+3)/4), vl);
/*
fprintf(cmem, "%0*llx\t\t// %f+j%f -> %llx +j%llx\n",
((cbits*2+3)/4), vl, c, s,
ic & (~(-1ll<<(((cbits+3)/4)*4))),
is & (~(-1ll<<(((cbits+3)/4)*4))));
*/
} fclose(cmem);
if (formal_property_flag) {
fprintf(fp,
"`ifdef FORMAL\n"
"\treg f_past_valid;\n"
"\tinitial f_past_valid = 1'b0;\n"
"\talways @(posedge i_clk)\n"
"\t f_past_valid <= 1'b1;\n"
"\n"
"`ifdef LASTSTAGE\n"
"\talways @(posedge i_clk)\n"
"\t assume((i_ce)||($past(i_ce))||($past(i_ce,2)));\n"
"`endif\n"
"\n"
"\tinitial assert(IWIDTH+1 == OWIDTH);\n"
"\n"
"\treg signed [IWIDTH-1:0] f_piped_real [0:3];\n"
"\treg signed [IWIDTH-1:0] f_piped_imag [0:3];\n"
"\talways @(posedge i_clk)\n"
"\tif (i_ce)\n"
"\tbegin\n"
"\t f_piped_real[0] <= i_val[2*IWIDTH-1:IWIDTH];\n"
"\t f_piped_imag[0] <= i_val[ IWIDTH-1:0];\n"
"\n"
"\t f_piped_real[1] <= f_piped_real[0];\n"
"\t f_piped_imag[1] <= f_piped_imag[0];\n"
"\n"
"\t f_piped_real[2] <= f_piped_real[1];\n"
"\t f_piped_imag[2] <= f_piped_imag[1];\n"
"\n"
"\t f_piped_real[3] <= f_piped_real[2];\n"
"\t f_piped_imag[3] <= f_piped_imag[2];\n"
"\tend\n"
"\n"
"\twire f_syncd;\n"
"\treg f_rsyncd;\n"
"\n"
"\tinitial f_rsyncd = 0;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t f_rsyncd <= 1'b0;\n"
"\telse if (!f_rsyncd)\n"
"\t f_rsyncd <= o_sync;\n"
"\tassign f_syncd = (f_rsyncd)||(o_sync);\n"
"\n"
"\treg f_state;\n"
"\tinitial f_state = 0;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t f_state <= 0;\n"
"\telse if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n"
"\t f_state <= f_state + 1;\n"
"\n"
"\talways @(*)\n"
"\tif (f_state != 0)\n"
"\t assume(!i_sync);\n"
"\n"
"\talways @(*)\n"
"\t assert(stage == f_state[0]);\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif ((f_state == 1'b1)&&(f_syncd))\n"
"\tbegin\n"
"\t assert(o_r == f_piped_real[2] + f_piped_real[1]);\n"
"\t assert(o_i == f_piped_imag[2] + f_piped_imag[1]);\n"
"\tend\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif ((f_state == 1'b0)&&(f_syncd))\n"
"\tbegin\n"
"\t assert(!o_sync);\n"
"\t assert(o_r == f_piped_real[3] - f_piped_real[2]);\n"
"\t assert(o_i == f_piped_imag[3] - f_piped_imag[2]);\n"
"\tend\n"
"\n"
"\talways @(*)\n"
"\tif (wait_for_sync)\n"
"\tbegin\n"
"\t assert(!f_rsyncd);\n"
"\t assert(!o_sync);\n"
"\t assert(f_state == 0);\n"
"\tend\n\n");
}
 
fprintf(fstage,
"\treg [(LGWIDTH-2):0] iaddr;\n"
"\treg [(2*IWIDTH-1):0] imem [0:((1<<LGSPAN)-1)];\n"
"\n"
"\treg [LGSPAN:0] oB;\n"
"\treg [(2*OWIDTH-1):0] omem [0:((1<<LGSPAN)-1)];\n"
"\n"
"\tinitial wait_for_sync = 1\'b1;\n"
"\tinitial iaddr = 0;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tbegin\n"
"\t\t\twait_for_sync <= 1\'b1;\n"
"\t\t\tiaddr <= 0;\n"
"\t\tend\n"
"\t\telse if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n"
"\t\tbegin\n"
"\t\t\t//\n"
"\t\t\t// First step: Record what we\'re not ready to use yet\n"
"\t\t\t//\n"
"\t\t\tiaddr <= iaddr + { {(LGWIDTH-2){1\'b0}}, 1\'b1 };\n"
"\t\t\twait_for_sync <= 1\'b0;\n"
"\t\tend\n"
"\talways @(posedge i_clk) // Need to make certain here that we don\'t read\n"
"\t\tif ((i_ce)&&(!iaddr[LGSPAN])) // and write the same address on\n"
"\t\t\timem[iaddr[(LGSPAN-1):0]] <= i_data; // the same clk\n"
"\n");
fprintf(fp,
"`endif // FORMAL\n"
"endmodule\n");
 
fprintf(fstage,
"\t//\n"
"\t// Now, we have all the inputs, so let\'s feed the butterfly\n"
"\t//\n"
"\tinitial ib_sync = 1\'b0;\n"
"\talways\t@(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\t\tib_sync <= 1\'b0;\n"
"\t\telse if ((i_ce)&&(iaddr[LGSPAN]))\n"
"\t\t\tbegin\n"
"\t\t\t\t// Set the sync to true on the very first\n"
"\t\t\t\t// valid input in, and hence on the very\n"
"\t\t\t\t// first valid data out per FFT.\n"
"\t\t\t\tib_sync <= (iaddr==(1<<(LGSPAN)));\n"
"\t\t\tend\n"
"\talways\t@(posedge i_clk)\n"
"\t\tif ((i_ce)&&(iaddr[LGSPAN]))\n"
"\t\t\tbegin\n"
"\t\t\t\t// One input from memory, ...\n"
"\t\t\t\tib_a <= imem[iaddr[(LGSPAN-1):0]];\n"
"\t\t\t\t// One input clocked in from the top\n"
"\t\t\t\tib_b <= i_data;\n"
"\t\t\t\t// and the coefficient or twiddle factor\n"
"\t\t\t\tib_c <= %scmem[iaddr[(LGSPAN-1):0]];\n"
"\t\t\tend\n\n", (inv)?"i":"");
 
if (hwmpy) {
fprintf(fstage,
"\thwbfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),\n"
"\t\t\t.SHIFT(BFLYSHIFT))\n"
"\t\tbfly(i_clk, i_rst, i_ce, ib_c,\n"
"\t\t\tib_a, ib_b, ib_sync, ob_a, ob_b, ob_sync);\n");
} else {
fprintf(fstage,
"\tbutterfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),\n"
"\t\t\t.MPYDELAY(%d\'d%d),.LGDELAY(LGBDLY),.SHIFT(BFLYSHIFT))\n"
"\t\tbfly(i_clk, i_rst, i_ce, ib_c,\n"
"\t\t\tib_a, ib_b, ib_sync, ob_a, ob_b, ob_sync);\n",
lgdelay(nbits, xtra), bflydelay(nbits, xtra));
}
 
fprintf(fstage,
"\t//\n"
"\t// Next step: recover the outputs from the butterfly\n"
"\t//\n"
"\tinitial oB = 0;\n"
"\tinitial o_sync = 0;\n"
"\tinitial b_started = 0;\n"
"\talways\t@(posedge i_clk)\n"
"\t\tif (i_rst)\n"
"\t\tbegin\n"
"\t\t\toB <= 0;\n"
"\t\t\to_sync <= 0;\n"
"\t\t\tb_started <= 0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\to_sync <= (!oB[LGSPAN])?ob_sync : 1\'b0;\n"
"\t\t\tif (ob_sync||b_started)\n"
"\t\t\t\toB <= oB + { {(LGSPAN){1\'b0}}, 1\'b1 };\n"
"\t\t\tif ((ob_sync)&&(!oB[LGSPAN]))\n"
"\t\t\t// A butterfly output is available\n"
"\t\t\t\tb_started <= 1\'b1;\n"
"\t\tend\n\n");
fprintf(fstage,
"\treg [(LGSPAN-1):0]\t\tdly_addr;\n"
"\treg [(2*OWIDTH-1):0]\tdly_value;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tdly_addr <= oB[(LGSPAN-1):0];\n"
"\t\t\tdly_value <= ob_b;\n"
"\t\tend\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\tomem[dly_addr] <= dly_value;\n"
"\n");
fprintf(fstage,
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_data <= (!oB[LGSPAN])?ob_a : omem[oB[(LGSPAN-1):0]];\n"
"\n");
fprintf(fstage, "endmodule\n");
fclose(fp);
}
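
// A minimal sketch, assuming 0 < cbits < 32: each line of the coefficient
// .hex file packs one twiddle factor as two cbits-wide two's complement
// fields, real part on top and imaginary part below, both scaled by
// 2^(cbits-2). The helper name pack_twiddle is hypothetical and not part of
// fftgen. The mask is written as (1ll << cbits) - 1 rather than the
// ~(-1ll << cbits) form used in the loop, to avoid left-shifting a negative
// value; the two give the same bits for this range of cbits.

```cpp
#include <cassert>
#include <cmath>

// Pack cos(angle) and sin(angle) into one 2*cbits-bit word.
unsigned long long pack_twiddle(int cbits, double angle) {
    const long long scale = 1ll << (cbits - 2);   // 2^(cbits-2)
    const long long mask  = (1ll << cbits) - 1;   // keep the low cbits bits
    long long ic = std::llround(scale * std::cos(angle));
    long long is = std::llround(scale * std::sin(angle));
    unsigned long long vl = (unsigned long long)(ic & mask);
    vl <<= cbits;                                 // real part in the top half
    vl |= (unsigned long long)(is & mask);        // imaginary part below
    return vl;
}
```

// For example, with cbits = 8 an angle of zero packs to 0x4000: the real
// field holds 64 (that is, 2^6) and the imaginary field holds zero.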
 
void usage(void) {
2249,20 → 961,25
fprintf(stderr,
"USAGE:\tfftgen [-f <size>] [-d dir] [-c cbits] [-n nbits] [-m mxbits] [-s]\n"
// "\tfftgen -i\n"
"\t-1\tBuild a normal FFT, running at one clock per complex sample, or (for\n"
"\t\ta real FFT) at one clock per two real input samples.\n"
"\t-1\tBuild a normal FFT, running at one clock per complex sample, or\n"
"\t\t(for a real FFT) at one clock per two real input samples.\n"
"\t-a <hdrname> Create a header of information describing the built-in\n"
"\t\tparameters, useful for module-level testing with Verilator\n"
"\t-c <cbits>\tCauses all internal complex coefficients to be\n"
"\t\tlonger than the corresponding data bits, to help avoid\n"
"\t\tcoefficient truncation errors. The default is %d bits longer\n"
"\t\tthan the data bits.\n"
"\t-d <dir>\tPlaces all of the generated verilog files into <dir>.\n"
"\t\tThe default is a subdirectory of the current directory named %s.\n"
"\t-f <size>\tSets the size of the FFT as the number of complex\n"
"\t-d <dir> Places all of the generated verilog files into <dir>.\n"
"\t\tThe default is a subdirectory of the current directory\n"
"\t\tnamed %s.\n"
"\t-f <size> Sets the size of the FFT as the number of complex\n"
"\t\tsamples input to the transform. (No default value, this is\n"
"\t\ta required parameter.)\n"
"\t-i\tAn inverse FFT, meaning that the coefficients are\n"
"\t\tgiven by e^{ j 2 pi k/N n }. The default is a forward FFT, with\n"
"\t\tcoefficients given by e^{ -j 2 pi k/N n }.\n"
"\t-k #\tSets # clocks per sample, used to minimize multiplies. Also\n"
"\t\tsets one sample in per i_ce clock (opt -1)\n"
"\t-m <mxbits>\tSets the maximum bit width that the FFT should ever\n"
"\t\tproduce. Internal values greater than this value will be\n"
"\t\ttruncated to this value. (The default value grows the input\n"
2270,11 → 987,9
"\t-n <nbits>\tSets the bitwidth for values coming into the (i)FFT.\n"
"\t\tThe default is %d bits input for each component of the two\n"
"\t\tcomplex values into the FFT.\n"
"\t-p <nmpy>\tSets the number of stages that will use any hardware \n"
"\t\tmultiplication facility, instead of shift-add emulation.\n"
"\t\tThree multiplies per butterfly, or six multiplies per stage will\n"
"\t\tbe accelerated in this fashion. The default is not to use any\n"
"\t\thardware multipliers.\n"
"\t-p <nmpy> Sets the number of hardware multiplies (DSPs) to use, versus\n"
"\t\tshift-add emulation. The default is not to use any hardware\n"
"\t\tmultipliers.\n"
"\t-r\tBuild a real-FFT at four input points per sample, rather than a\n"
"\t\tcomplex FFT. (Default is a Complex FFT.)\n"
"\t-s\tSkip the final bit reversal stage. This is useful in\n"
2286,8 → 1001,8
"\t\tnot yet provide.)\n"
"\t-S\tInclude the final bit reversal stage (default).\n"
"\t-x <xtrabits>\tUse this many extra bits internally, before any final\n"
"\t\trounding or truncation of the answer to the final number of bits.\n"
"\t\tThe default is to use %d extra bits internally.\n",
"\t\trounding or truncation of the answer to the final number of\n"
"\t\tbits. The default is to use %d extra bits internally.\n",
/*
"\t-0\tA forward FFT (default), meaning that the coefficients are\n"
"\t\tgiven by e^{-j 2 pi k/N n }.\n"
2302,11 → 1017,14
int main(int argc, char **argv) {
int fftsize = -1, lgsize = -1;
int nbitsin = DEF_NBITSIN, xtracbits = DEF_XTRACBITS,
nummpy=DEF_NMPY, nonmpy=2;
int nbitsout, maxbitsout = -1, xtrapbits=DEF_XTRAPBITS;
nummpy=DEF_NMPY, nmpypstage=6, mpy_stages;
int nbitsout, maxbitsout = -1, xtrapbits=DEF_XTRAPBITS, ckpce = 0;
const char *EMPTYSTR = "";
bool bitreverse = true, inverse=false,
verbose_flag = false, single_clock = false,
real_fft = false;
verbose_flag = false,
single_clock = false,
real_fft = false,
async_reset = false;
FILE *vmain;
std::string coredir = DEF_COREDIR, cmdline = "", hdrname = "";
ROUND_T rounding = RND_CONVERGENT;
2318,6 → 1036,7
if (argc <= 1)
usage();
 
// Copy the original command line before we mess with it
cmdline = argv[0];
for(int argn=1; argn<argc; argn++) {
cmdline += " ";
2324,146 → 1043,87
cmdline += argv[argn];
}
 
for(int argn=1; argn<argc; argn++) {
if ('-' == argv[argn][0]) {
for(int j=1; (argv[argn][j])&&(j<100); j++) {
switch(argv[argn][j]) {
/*
case '0':
inverse = false;
{ int c;
while((c = getopt(argc, argv, "12Aa:c:d:D:f:hik:m:n:p:rsSx:v")) != -1) {
switch(c) {
case '1': single_clock = true; break;
case '2': single_clock = false; break;
case 'A': async_reset = true; break;
case 'a': hdrname = strdup(optarg); break;
case 'c': xtracbits = atoi(optarg); break;
case 'd': coredir = std::string(optarg); break;
case 'D': dbgstage = atoi(optarg); break;
case 'f': fftsize = atoi(optarg);
{ int sln = strlen(optarg);
if (!isdigit(optarg[sln-1])){
switch(optarg[sln-1]) {
case 'k': case 'K':
fftsize <<= 10;
break;
*/
case '1':
single_clock = true;
case 'm': case 'M':
fftsize <<= 20;
break;
case 'a':
if (argn+1 >= argc) {
printf("ERR: No header filename given\n\n");
usage(); exit(-1);
}
hdrname = argv[++argn];
j+= 200;
case 'g': case 'G':
fftsize <<= 30;
break;
case 'c':
if (argn+1 >= argc) {
printf("ERR: No extra number of coefficient bits given!\n\n");
usage(); exit(-1);
}
xtracbits = atoi(argv[++argn]);
j+= 200;
break;
case 'd':
if (argn+1 >= argc) {
printf("ERR: No directory given into which to place the core!\n\n");
usage(); exit(-1);
}
coredir = argv[++argn];
j += 200;
break;
case 'D':
dbg = true;
if (argn+1 >= argc) {
printf("ERR: No debug stage number given!\n\n");
usage(); exit(-1);
}
dbgstage = atoi(argv[++argn]);
j+= 200;
break;
case 'f':
if (argn+1 >= argc) {
printf("ERR: No FFT Size given!\n\n");
usage(); exit(-1);
}
fftsize = atoi(argv[++argn]);
{ int sln = strlen(argv[argn]);
if (!isdigit(argv[argn][sln-1])){
switch(argv[argn][sln-1]) {
case 'k': case 'K':
fftsize <<= 10;
break;
case 'm': case 'M':
fftsize <<= 20;
break;
case 'g': case 'G':
fftsize <<= 30;
break;
default:
printf("ERR: Unknown FFT size, %s!\n", argv[argn]);
exit(-1);
}
}}
j += 200;
break;
case 'h':
usage();
exit(0);
break;
case 'i':
inverse = true;
break;
case 'm':
if (argn+1 >= argc) {
printf("ERR: No maximum output bit value given!\n\n");
exit(-1);
}
maxbitsout = atoi(argv[++argn]);
j += 200;
break;
case 'n':
if (argn+1 >= argc) {
printf("ERR: No input bit size given!\n\n");
exit(-1);
}
nbitsin = atoi(argv[++argn]);
j += 200;
break;
case 'p':
if (argn+1 >= argc) {
printf("ERR: No number given for number of hardware multiply stages!\n\n");
exit(-1);
}
nummpy = atoi(argv[++argn]);
j += 200;
break;
case 'r':
real_fft = true;
break;
case 'S':
bitreverse = true;
break;
case 's':
bitreverse = false;
break;
case 'x':
if (argn+1 >= argc) {
printf("ERR: No extra number of bits given!\n\n");
usage(); exit(-1);
} j+= 200;
xtrapbits = atoi(argv[++argn]);
break;
case 'v':
verbose_flag = true;
break;
default:
printf("Unknown argument, -%c\n", argv[argn][j]);
usage();
exit(-1);
}
}
} else {
printf("Unrecognized argument, %s\n", argv[argn]);
printf("ERR: Unknown FFT size, %s!\n", optarg);
exit(EXIT_FAILURE);
}
}} break;
case 'h': usage(); exit(EXIT_SUCCESS); break;
case 'i': inverse = true; break;
case 'k': ckpce = atoi(optarg);
single_clock = true;
break;
case 'm': maxbitsout = atoi(optarg); break;
case 'n': nbitsin = atoi(optarg); break;
case 'p': nummpy = atoi(optarg); break;
case 'r': real_fft = true; break;
case 'S': bitreverse = true; break;
case 's': bitreverse = false; break;
case 'x': xtrapbits = atoi(optarg); break;
case 'v': verbose_flag = true; break;
// case 'z': variable_size = true; break;
default:
printf("Unknown argument, -%c\n", c);
usage();
exit(-1);
exit(EXIT_FAILURE);
}
}}
 
if (verbose_flag) {
if (inverse)
printf("Building a %d point inverse FFT module, with %s outputs\n",
fftsize,
(real_fft)?"real ":"complex");
else
printf("Building a %d point %sforward FFT module\n",
fftsize,
(real_fft)?"real ":"");
if (!single_clock)
printf(" that accepts two inputs per clock\n");
if (async_reset)
printf(" using a negative logic ASYNC reset\n");
 
printf("The core will be placed into the %s/ directory\n", coredir.c_str());
 
if (hdrname[0])
printf("A C header file, %s, will be written capturing these\n"
"options for a Verilator testbench\n",
hdrname.c_str());
// nummpy
// xtrapbits
}
 
if (real_fft) {
printf("The real FFT option is not implemented yet, but still on\nmy to do list. Please try again later.\n");
exit(0);
} if (single_clock) {
printf("The single clock FFT option is not implemented yet, but still on\nmy to do list. Please try again later.\n");
exit(0);
} if (!bitreverse) {
exit(EXIT_FAILURE);
}
 
if (ckpce < 1)
ckpce = 1;
if (!bitreverse) {
printf("WARNING: While I can skip the bit reverse stage, the code to do\n");
printf("an inverse FFT on a bit-reversed input has not yet been\n");
printf("built.\n");
2476,7 → 1136,7
 
if ((fftsize <= 0)||(nbitsin < 1)||(nbitsin>48)) {
printf("INVALID PARAMETERS!!!!\n");
exit(-1);
exit(EXIT_FAILURE);
}
 
 
2483,7 → 1143,7
if (nextlg(fftsize) != fftsize) {
fprintf(stderr, "ERR: FFTSize (%d) *must* be a power of two\n",
fftsize);
exit(-1);
exit(EXIT_FAILURE);
} else if (fftsize < 2) {
fprintf(stderr, "ERR: Minimum FFTSize is 2, not %d\n",
fftsize);
2496,7 → 1156,7
fprintf(stderr, "Indeed, a size of %d doesn\'t make much sense to me at all.\n", fftsize);
fprintf(stderr, "Is such an operation even defined?\n");
}
exit(-1);
exit(EXIT_FAILURE);
}
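
The size checks above depend on `nextlg()`, whose definition is not part of this hunk. The following is a minimal sketch of the same validation, assuming `nextlg()` returns the smallest power of two that is at least its argument. The names `next_pow2` and `is_valid_fft_size` are hypothetical and do not appear in the generator.

```cpp
#include <cassert>

// Hypothetical stand-in for nextlg(): smallest power of two >= vl.
// Assumption: this is what the generator's own helper computes.
static int next_pow2(int vl) {
	assert(vl <= (1 << 30));	// keeps the shift below from overflowing
	int r = 1;
	while (r < vl)
		r <<= 1;
	return r;
}

// Mirrors the two tests in the diff: the size must be at least 2,
// and it must be an exact power of two.
static bool is_valid_fft_size(int fftsize) {
	if (fftsize < 2)
		return false;
	return next_pow2(fftsize) == fftsize;
}
```

Under that assumption, 1024 passes while 1000 and 1 are both rejected, which matches the two error branches in the generator.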
 
// Calculate how many output bits we'll have, and what the log
2522,14 → 1182,30
} if ((maxbitsout > 0)&&(nbitsout > maxbitsout))
nbitsout = maxbitsout;
 
if (verbose_flag) {
printf("Output samples will be %d bits wide\n", nbitsout);
printf("This %sFFT will take %d-bit samples in, and produce %d-bit samples out\n", (inverse)?"i":"", nbitsin, nbitsout);
if (maxbitsout > 0)
printf(" Internally, it will allow items to accumulate to %d bits\n", maxbitsout);
printf(" Twiddle-factors of %d bits will be used\n",
nbitsin+xtracbits);
if (!bitreverse)
printf(" The output will be left in bit-reversed order\n");
}
 
// Figure out how many multiply stages to use, and how many to skip
{
int lgv = lgval(fftsize);
if (!single_clock) {
nmpypstage = 6;
} else if (ckpce <= 1) {
nmpypstage = 3;
} else if (ckpce == 2) {
nmpypstage = 2;
} else
nmpypstage = 1;
 
nonmpy = lgv - nummpy;
if (nonmpy < 2) nonmpy = 2;
nummpy = lgv - nonmpy;
}
mpy_stages = nummpy / nmpypstage;
if (mpy_stages > lgval(fftsize)-2)
mpy_stages = lgval(fftsize)-2;
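
The multiply budgeting above can be read as a pure function of four inputs. This sketch restates it in isolation, assuming `lgval(fftsize)` is log2 of the FFT size and that the size is at least 4. The function names are hypothetical; the arithmetic follows the lines in this hunk.

```cpp
#include <cassert>

// Multiplies consumed by one FFT stage, following the if/else chain in
// the diff: 6 for the two-samples-per-clock core, otherwise 3, 2 or 1
// depending on the clocks available per sample (ckpce).
static int multiplies_per_stage(bool single_clock, int ckpce) {
	if (!single_clock)
		return 6;
	if (ckpce <= 1)
		return 3;
	if (ckpce == 2)
		return 2;
	return 1;
}

// Hypothetical wrapper: lgsize is log2 of the FFT size (assumed to be
// what lgval() returns), nummpy is the budget given with -p.
static int hardware_stages(int lgsize, int nummpy,
		bool single_clock, int ckpce) {
	assert(lgsize >= 2);
	// The last two stages never use hardware multiplies
	int nonmpy = lgsize - nummpy;
	if (nonmpy < 2)
		nonmpy = 2;
	nummpy = lgsize - nonmpy;
	int stages = nummpy / multiplies_per_stage(single_clock, ckpce);
	if (stages > lgsize - 2)
		stages = lgsize - 2;
	return stages;
}
```

For a 1024-point FFT (`lgsize` of 10) with a budget of 12, this gives 1 hardware stage in two-sample mode and 2, 4 or 8 stages in single-sample mode for `ckpce` of 1, 2 or 3.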
 
{
struct stat sbuf;
2538,13 → 1214,13
fprintf(stderr, "\'%s\' already exists, and is not a directory!\n", coredir.c_str());
fprintf(stderr, "I will stop now, lest I overwrite something you care about.\n");
fprintf(stderr, "To try again, please remove this file.\n");
exit(-1);
exit(EXIT_FAILURE);
}
} else
mkdir(coredir.c_str(), 0755);
if (access(coredir.c_str(), X_OK|W_OK) != 0) {
fprintf(stderr, "I have no access to the directory \'%s\'.\n", coredir.c_str());
exit(-1);
exit(EXIT_FAILURE);
}
}
 
2553,24 → 1229,25
if (hdr == NULL) {
fprintf(stderr, "ERROR: Cannot open %s to create header file\n", hdrname.c_str());
perror("O/S Err");
exit(-2);
exit(EXIT_FAILURE);
}
 
fprintf(hdr, "/////////////////////////////////////////////////////////////////////////////\n");
fprintf(hdr, "//\n");
fprintf(hdr, "// Filename: %s\n", hdrname.c_str());
fprintf(hdr, "//\n");
fprintf(hdr, "// Project: %s\n", prjname);
fprintf(hdr, "//\n");
fprintf(hdr, "// Purpose: This simple header file captures the internal constants\n");
fprintf(hdr, "// within the FFT that were used to build it, for the purpose\n");
fprintf(hdr, "// of making C++ integration (and test bench testing) simpler. That\n");
fprintf(hdr, "// is, should the FFT change size, this will note that size change\n");
fprintf(hdr, "// and thus any test bench or other C++ program dependent upon\n");
fprintf(hdr, "// either the size of the FFT, the number of bits in or out of\n");
fprintf(hdr, "// it, etc., can pick up the changes in the defines found within\n");
fprintf(hdr, "// this file.\n");
fprintf(hdr, "//\n");
fprintf(hdr,
SLASHLINE
"//\n"
"// Filename:\t%s\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose: This simple header file captures the internal constants\n"
"// within the FFT that were used to build it, for the purpose\n"
"// of making C++ integration (and test bench testing) simpler. That is,\n"
"// should the FFT change size, this will note that size change and thus\n"
"// any test bench or other C++ program dependent upon either the size of\n"
"// the FFT, the number of bits in or out of it, etc., can pick up the\n"
"// changes in the defines found within this file.\n"
"//\n",
hdrname.c_str(), prjname);
fprintf(hdr, "%s", creator);
fprintf(hdr, "//\n");
fprintf(hdr, "%s", cpyleft);
2588,6 → 1265,11
(inverse)?"I":"", nbitsout,
(inverse)?"I":"", lgsize,
(inverse)?"I":"", (inverse)?"I":"");
if (ckpce > 0)
fprintf(hdr, "#define\t%sFFT_CKPCE\t%d\t// Clocks per CE\n",
(inverse)?"I":"", ckpce);
else
fprintf(hdr, "// Two samples per i_ce\n");
if (!bitreverse)
fprintf(hdr, "#define\t%sFFT_SKIPS_BIT_REVERSE\n",
(inverse)?"I":"");
2595,6 → 1277,8
fprintf(hdr, "#define\tRL%sFFT\n\n", (inverse)?"I":"");
if (!single_clock)
fprintf(hdr, "#define\tDBLCLK%sFFT\n\n", (inverse)?"I":"");
else
fprintf(hdr, "// #define\tDBLCLK%sFFT // this FFT takes one input sample per clock\n\n", (inverse)?"I":"");
if (USE_OLD_MULTIPLY)
fprintf(hdr, "#define\tUSE_OLD_MULTIPLY\n\n");
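
Every option-dependent macro written to the header takes an `I` prefix for an inverse FFT, so a test bench can include both a forward and an inverse header without name clashes. The sketch below collects three of those lines into a string so the naming rule is easy to see. `option_defines` is a hypothetical helper, not a function in the generator, and it omits the `RL...FFT` and `USE_OLD_MULTIPLY` lines.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the option-dependent #define lines that
// the generator writes with fprintf(hdr, ...), using the same prefix rule.
std::string option_defines(bool inverse, int ckpce,
		bool bitreverse, bool single_clock) {
	const std::string pfx = inverse ? "I" : "";
	std::string out;

	if (ckpce > 0)
		out += "#define\t" + pfx + "FFT_CKPCE\t"
			+ std::to_string(ckpce) + "\t// Clocks per CE\n";
	else
		out += "// Two samples per i_ce\n";
	if (!bitreverse)
		out += "#define\t" + pfx + "FFT_SKIPS_BIT_REVERSE\n";
	if (!single_clock)
		out += "#define\tDBLCLK" + pfx + "FFT\n\n";
	return out;
}
```

A C++ test bench would then check `IFFT_CKPCE` or `FFT_CKPCE`, depending on which header it included.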
 
2650,53 → 1334,89
if (NULL == vmain) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname_string.c_str());
perror("Err from O/S");
exit(-1);
exit(EXIT_FAILURE);
}
 
if (verbose_flag)
printf("Opened %s\n", fname_string.c_str());
}
 
fprintf(vmain, "/////////////////////////////////////////////////////////////////////////////\n");
fprintf(vmain, "//\n");
fprintf(vmain, "// Filename: %sfftmain.v\n", (inverse)?"i":"");
fprintf(vmain, "//\n");
fprintf(vmain, "// Project: %s\n", prjname);
fprintf(vmain, "//\n");
fprintf(vmain, "// Purpose: This is the main module in the Doubletime FPGA FFT project.\n");
fprintf(vmain, "// As such, all other modules are subordinate to this one.\n");
fprintf(vmain, "// (I have been reading too much legalese this week ...)\n");
fprintf(vmain, "// This module accomplish a fixed size Complex FFT on %d data\n", fftsize);
fprintf(vmain, "// points. The FFT is fully pipelined, and accepts as inputs\n");
fprintf(vmain, "// two complex two\'s complement samples per clock.\n");
fprintf(vmain, "//\n");
fprintf(vmain, "// Parameters:\n");
fprintf(vmain, "// i_clk\tThe clock. All operations are synchronous with this clock.\n");
fprintf(vmain, "//\ti_rst\tSynchronous reset, active high. Setting this line will\n");
fprintf(vmain, "//\t\t\tforce the reset of all of the internals to this routine.\n");
fprintf(vmain, "//\t\t\tFurther, following a reset, the o_sync line will go\n");
fprintf(vmain, "//\t\t\thigh the same time the first output sample is valid.\n");
fprintf(vmain, "//\ti_ce\tA clock enable line. If this line is set, this module\n");
fprintf(vmain, "//\t\t\twill accept two complex values as inputs, and produce\n");
fprintf(vmain, "//\t\t\ttwo (possibly empty) complex values as outputs.\n");
fprintf(vmain, "//\ti_left\tThe first of two complex input samples. This value is split\n");
fprintf(vmain, "//\t\t\tinto two two\'s complement numbers, %d bits each, with\n", nbitsin);
fprintf(vmain, "//\t\t\tthe real portion in the high order bits, and the\n");
fprintf(vmain, "//\t\t\timaginary portion taking the bottom %d bits.\n", nbitsin);
fprintf(vmain, "//\ti_right\tThis is the same thing as i_left, only this is the second of\n");
fprintf(vmain, "//\t\t\ttwo such samples. Hence, i_left would contain input\n");
fprintf(vmain, "//\t\t\tsample zero, i_right would contain sample one. On the\n");
fprintf(vmain, "//\t\t\tnext clock i_left would contain input sample two,\n");
fprintf(vmain, "//\t\t\ti_right number three and so forth.\n");
fprintf(vmain, "//\to_left\tThe first of two output samples, of the same format as i_left,\n");
fprintf(vmain, "//\t\t\tonly having %d bits for each of the real and imaginary\n", nbitsout);
fprintf(vmain, "//\t\t\tcomponents, leading to %d bits total.\n", nbitsout*2);
fprintf(vmain, "//\to_right\tThe second of two output samples produced each clock. This has\n");
fprintf(vmain, "//\t\t\tthe same format as o_left.\n");
fprintf(vmain, "//\to_sync\tA one bit output indicating the first valid sample produced by\n");
fprintf(vmain, "//\t\t\tthis FFT following a reset. Ever after, this will\n");
fprintf(vmain, "//\t\t\tindicate the first sample of an FFT frame.\n");
fprintf(vmain, "//\n");
fprintf(vmain, "// Arguments:\tThis file was computer generated using the\n");
fprintf(vmain, "//\t\tfollowing command line:\n");
fprintf(vmain, "//\n");
fprintf(vmain,
SLASHLINE
"//\n"
"// Filename:\t%sfftmain.v\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: This is the main module in the General Purpose FPGA FFT\n"
"// implementation. As such, all other modules are subordinate\n"
"// to this one. This module accomplishes a fixed size Complex FFT on\n"
"// %d data points.\n",
(inverse)?"i":"",prjname, fftsize);
if (single_clock) {
fprintf(vmain,
"// The FFT is fully pipelined, and accepts as inputs one complex two\'s\n"
"// complement sample per clock.\n");
} else {
fprintf(vmain,
"// The FFT is fully pipelined, and accepts as inputs two complex two\'s\n"
"// complement samples per clock.\n");
}
 
fprintf(vmain,
"//\n"
"// Parameters:\n"
"// i_clk\tThe clock. All operations are synchronous with this clock.\n"
"// i_%sreset%s\t%s reset, active %s. Asserting this line will\n"
"// \t\tforce the reset of all of the internals to this routine.\n"
"// \t\tFurther, following a reset, the o_sync line will go\n"
"// \t\thigh the same time the first output sample is valid.\n",
(async_reset)?"a":"", (async_reset)?"_n":"",
(async_reset)?"Asynchronous":"Synchronous",
(async_reset)?"low":"high");
if (single_clock) {
fprintf(vmain,
"// i_ce\tA clock enable line. If this line is set, this module\n"
"// \t\twill accept one complex input value, and produce\n"
"// \t\tone (possibly empty) complex output value.\n"
"// i_sample\tThe complex input sample. This value is split\n"
"// \t\tinto two two\'s complement numbers, %d bits each, with\n"
"// \t\tthe real portion in the high order bits, and the\n"
"// \t\timaginary portion taking the bottom %d bits.\n"
"// o_result\tThe output result, of the same format as i_sample,\n"
"// \t\tonly having %d bits for each of the real and imaginary\n"
"// \t\tcomponents, leading to %d bits total.\n"
"// o_sync\tA one bit output indicating the first sample of the FFT frame.\n"
"// \t\tIt also indicates the first valid sample out of the FFT\n"
"// \t\ton the first frame.\n", nbitsin, nbitsin, nbitsout, nbitsout*2);
} else {
fprintf(vmain,
"// i_ce\tA clock enable line. If this line is set, this module\n"
"// \t\twill accept two complex values as inputs, and produce\n"
"// \t\ttwo (possibly empty) complex values as outputs.\n"
"// i_left\tThe first of two complex input samples. This value is split\n"
"// \t\tinto two two\'s complement numbers, %d bits each, with\n"
"// \t\tthe real portion in the high order bits, and the\n"
"// \t\timaginary portion taking the bottom %d bits.\n"
"// i_right\tThis is the same thing as i_left, only this is the second of\n"
"// \t\ttwo such samples. Hence, i_left would contain input\n"
"// \t\tsample zero, i_right would contain sample one. On the\n"
"// \t\tnext clock i_left would contain input sample two,\n"
"// \t\ti_right number three and so forth.\n"
"// o_left\tThe first of two output samples, of the same format as i_left,\n"
"// \t\tonly having %d bits for each of the real and imaginary\n"
"// \t\tcomponents, leading to %d bits total.\n"
"// o_right\tThe second of two output samples produced each clock. This has\n"
"// \t\tthe same format as o_left.\n"
"// o_sync\tA one bit output indicating the first valid sample produced by\n"
"// \t\tthis FFT following a reset. Ever after, this will\n"
"// \t\tindicate the first sample of an FFT frame.\n",
nbitsin, nbitsin, nbitsout, nbitsout*2);
}
 
fprintf(vmain,
"//\n"
"// Arguments:\tThis file was computer generated using the following command\n"
"//\t\tline:\n"
"//\n");
fprintf(vmain, "//\t\t%% %s\n", cmdline.c_str());
fprintf(vmain, "//\n");
fprintf(vmain, "%s", creator);
2705,44 → 1425,69
fprintf(vmain, "//\n//\n`default_nettype\tnone\n//\n");
 
 
std::string resetw("i_reset");
if (async_reset)
resetw = "i_areset_n";
 
fprintf(vmain, "//\n");
fprintf(vmain, "//\n");
fprintf(vmain, "module %sfftmain(i_clk, i_rst, i_ce,\n", (inverse)?"i":"");
fprintf(vmain, "\t\ti_left, i_right,\n");
fprintf(vmain, "\t\to_left, o_right, o_sync%s);\n",
fprintf(vmain, "module %sfftmain(i_clk, %s, i_ce,\n",
(inverse)?"i":"", resetw.c_str());
if (single_clock) {
fprintf(vmain, "\t\ti_sample, o_result, o_sync%s);\n",
(dbg)?", o_dbg":"");
fprintf(vmain, "\tparameter\tIWIDTH=%d, OWIDTH=%d, LGWIDTH=%d;\n", nbitsin, nbitsout, lgsize);
} else {
fprintf(vmain, "\t\ti_left, i_right,\n");
fprintf(vmain, "\t\to_left, o_right, o_sync%s);\n",
(dbg)?", o_dbg":"");
}
fprintf(vmain, "\tparameter\tIWIDTH=%d, OWIDTH=%d, LGWIDTH=%d;\n\t//\n", nbitsin, nbitsout, lgsize);
assert(lgsize > 0);
fprintf(vmain, "\tinput\t\ti_clk, i_rst, i_ce;\n");
fprintf(vmain, "\tinput\t\t\t\t\ti_clk, %s, i_ce;\n\t//\n",
resetw.c_str());
if (single_clock) {
fprintf(vmain, "\tinput\t\t[(2*IWIDTH-1):0]\ti_sample;\n");
fprintf(vmain, "\toutput\treg\t[(2*OWIDTH-1):0]\to_result;\n");
} else {
fprintf(vmain, "\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n");
fprintf(vmain, "\toutput\treg\t[(2*OWIDTH-1):0]\to_left, o_right;\n");
fprintf(vmain, "\toutput\treg\t\t\to_sync;\n");
}
fprintf(vmain, "\toutput\treg\t\t\t\to_sync;\n");
if (dbg)
fprintf(vmain, "\toutput\twire\t[33:0]\t\to_dbg;\n");
fprintf(vmain, "\n\n");
 
fprintf(vmain, "\t// Outputs of the FFT, ready for bit reversal.\n");
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_left, br_right;\n");
fprintf(vmain, "\n\n");
 
if (single_clock)
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_sample;\n");
else
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_left, br_right;\n");
int tmp_size = fftsize, lgtmp = lgsize;
if (fftsize == 2) {
if (bitreverse) {
fprintf(vmain, "\treg\tbr_start;\n");
fprintf(vmain, "\tinitial br_start = 1\'b0;\n");
fprintf(vmain, "\talways @(posedge i_clk)\n");
fprintf(vmain, "\t\tif (i_rst)\n");
if (async_reset) {
fprintf(vmain, "\talways @(posedge i_clk, negedge i_areset_n)\n");
fprintf(vmain, "\t\tif (!i_areset_n)\n");
} else {
fprintf(vmain, "\talways @(posedge i_clk)\n");
fprintf(vmain, "\t\tif (i_reset)\n");
}
fprintf(vmain, "\t\t\tbr_start <= 1\'b0;\n");
fprintf(vmain, "\t\telse if (i_ce)\n");
fprintf(vmain, "\t\t\tbr_start <= 1\'b1;\n");
}
fprintf(vmain, "\n\n");
fprintf(vmain, "\tdblstage\t#(IWIDTH)\tstage_2(i_clk, i_rst, i_ce,\n");
fprintf(vmain, "\t\t\t(!i_rst), i_left, i_right, br_left, br_right);\n");
fprintf(vmain, "\tlaststage\t#(IWIDTH)\tstage_2(i_clk, %s, i_ce,\n", resetw.c_str());
fprintf(vmain, "\t\t\t(%s%s), i_left, i_right, br_left, br_right);\n",
(async_reset)?"":"!", resetw.c_str());
fprintf(vmain, "\n\n");
} else {
int nbits = nbitsin, dropbit=0;
int obits = nbits+1+xtrapbits;
std::string cmem;
FILE *cmemfp;
 
if ((maxbitsout > 0)&&(obits > maxbitsout))
obits = maxbitsout;
2753,50 → 1498,87
 
// Last two stages are always non-multiply stages
// since the multiplies can be done by adds
mpystage = ((lgtmp-2) <= nummpy);
mpystage = ((lgtmp-2) <= mpy_stages);
 
if (mpystage)
fprintf(vmain, "\t// A hardware optimized FFT stage\n");
fprintf(vmain, "\n\n");
fprintf(vmain, "\twire\t\tw_s%d;\n", fftsize);
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os%d;\n\t// verilator lint_on UNUSED\n", fftsize);
fprintf(vmain, "\twire\t[%d:0]\tw_e%d, w_o%d;\n", 2*(obits+xtrapbits)-1, fftsize, fftsize);
fprintf(vmain, "\t%sfftstage_e%d%s\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,%d,0)\tstage_e%d(i_clk, i_rst, i_ce,\n",
(inverse)?"i":"", fftsize,
if (single_clock) {
fprintf(vmain, "\twire\t[%d:0]\tw_d%d;\n", 2*(obits+xtrapbits)-1, fftsize);
cmem = gen_coeff_fname(EMPTYSTR, fftsize, 1, 0, inverse);
cmemfp = gen_coeff_open(cmem.c_str());
gen_coeffs(cmemfp, fftsize, nbitsin+xtracbits, 1, 0, inverse);
fprintf(vmain, "\tfftstage%s\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,0,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_%d(i_clk, %s, i_ce,\n",
((dbg)&&(dbgstage == fftsize))?"_dbg":"",
xtracbits, obits+xtrapbits,
lgsize, lgtmp-2, lgdelay(nbits,xtracbits),
fftsize);
fprintf(vmain, "\t\t\t(!i_rst), i_left, w_e%d, w_s%d%s);\n", fftsize, fftsize, ((dbg)&&(dbgstage == fftsize))?", o_dbg":"");
fprintf(vmain, "\t%sfftstage_o%d\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,%d,0)\tstage_o%d(i_clk, i_rst, i_ce,\n",
(inverse)?"i":"", fftsize,
xtracbits, obits+xtrapbits,
lgsize, lgtmp-2, lgdelay(nbits,xtracbits),
fftsize);
fprintf(vmain, "\t\t\t(!i_rst), i_right, w_o%d, w_os%d);\n", fftsize, fftsize);
fprintf(vmain, "\n\n");
xtracbits, obits+xtrapbits,
lgsize, lgtmp-1,
(mpystage)?1:0,
ckpce, cmem.c_str(),
fftsize, resetw.c_str());
fprintf(vmain, "\t\t\t(%s%s), i_sample, w_d%d, w_s%d%s);\n",
(async_reset)?"":"!", resetw.c_str(),
fftsize, fftsize,
((dbg)&&(dbgstage == fftsize))
? ", o_dbg":"");
} else {
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os%d;\n\t// verilator lint_on UNUSED\n", fftsize);
fprintf(vmain, "\twire\t[%d:0]\tw_e%d, w_o%d;\n", 2*(obits+xtrapbits)-1, fftsize, fftsize);
cmem = gen_coeff_fname(EMPTYSTR, fftsize, 2, 0, inverse);
cmemfp = gen_coeff_open(cmem.c_str());
gen_coeffs(cmemfp, fftsize, nbitsin+xtracbits, 2, 0, inverse);
fprintf(vmain, "\tfftstage%s\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,0,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_e%d(i_clk, %s, i_ce,\n",
((dbg)&&(dbgstage == fftsize))?"_dbg":"",
xtracbits, obits+xtrapbits,
lgsize, lgtmp-2,
(mpystage)?1:0,
ckpce, cmem.c_str(),
fftsize, resetw.c_str());
fprintf(vmain, "\t\t\t(%s%s), i_left, w_e%d, w_s%d%s);\n",
(async_reset)?"":"!", resetw.c_str(),
fftsize, fftsize,
((dbg)&&(dbgstage == fftsize))?", o_dbg":"");
cmem = gen_coeff_fname(EMPTYSTR, fftsize, 2, 1, inverse);
cmemfp = gen_coeff_open(cmem.c_str());
gen_coeffs(cmemfp, fftsize, nbitsin+xtracbits, 2, 1, inverse);
fprintf(vmain, "\tfftstage\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,0,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_o%d(i_clk, %s, i_ce,\n",
xtracbits, obits+xtrapbits,
lgsize, lgtmp-2,
(mpystage)?1:0,
ckpce, cmem.c_str(),
fftsize, resetw.c_str());
fprintf(vmain, "\t\t\t(%s%s), i_right, w_o%d, w_os%d);\n",
(async_reset)?"":"!",resetw.c_str(),
fftsize, fftsize);
}
 
 
std::string fname;
char numstr[12];
 
fname = coredir + "/";
if (inverse) fname += "i";
fname += "fftstage_e";
sprintf(numstr, "%d", fftsize);
fname += numstr;
if ((dbg)&&(dbgstage == fftsize))
fname += "_dbg";
fname += ".v";
build_stage(fname.c_str(), coredir.c_str(), fftsize/2, 0, nbits, inverse, xtracbits, mpystage, (dbg)&&(dbgstage == fftsize)); // Even stage
if (inverse)
fname += "i";
fname += "fftstage";
if (dbg) {
std::string dbgname(fname);
dbgname += "_dbg";
dbgname += ".v";
if (single_clock)
build_stage(fname.c_str(), fftsize, 1, 0, nbits, xtracbits, ckpce, async_reset, true);
else
build_stage(fname.c_str(), fftsize/2, 2, 1, nbits, xtracbits, ckpce, async_reset, true);
}
 
fname = coredir + "/";
if (inverse) fname += "i";
fname += "fftstage_o";
sprintf(numstr, "%d", fftsize);
fname += numstr;
fname += ".v";
build_stage(fname.c_str(), coredir.c_str(), fftsize/2, 1, nbits, inverse, xtracbits, mpystage, false); // Odd stage
if (single_clock) {
build_stage(fname.c_str(), fftsize, 1, 0,
nbits, xtracbits, ckpce, async_reset,
false);
} else {
// All stages use the same Verilog, so we only
// need to build one
build_stage(fname.c_str(), fftsize/2, 2, 1,
nbits, xtracbits, ckpce, async_reset, false);
}
}
 
nbits = obits; // New number of input bits
2812,68 → 1594,79
{
bool mpystage;
 
mpystage = ((lgtmp-2) <= nummpy);
mpystage = ((lgtmp-2) <= mpy_stages);
 
if (mpystage)
fprintf(vmain, "\t// A hardware optimized FFT stage\n");
fprintf(vmain, "\twire\t\tw_s%d;\n",
tmp_size);
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os%d;\n\t// verilator lint_on UNUSED\n",
tmp_size);
fprintf(vmain,"\twire\t[%d:0]\tw_e%d, w_o%d;\n",
2*(obits+xtrapbits)-1,
tmp_size, tmp_size);
fprintf(vmain, "\t%sfftstage_e%d%s\t#(%d,%d,%d,%d,%d,%d,%d)\tstage_e%d(i_clk, i_rst, i_ce,\n",
(inverse)?"i":"", tmp_size,
((dbg)&&(dbgstage==tmp_size))?"_dbg":"",
nbits+xtrapbits,
nbits+xtracbits+xtrapbits,
obits+xtrapbits,
lgsize, lgtmp-2,
lgdelay(nbits+xtrapbits,xtracbits),
(dropbit)?0:0, tmp_size);
fprintf(vmain, "\t\t\t\t\t\tw_s%d, w_e%d, w_e%d, w_s%d%s);\n",
tmp_size<<1, tmp_size<<1,
tmp_size, tmp_size,
((dbg)&&(dbgstage == tmp_size))
?", o_dbg":"");
fprintf(vmain, "\t%sfftstage_o%d\t#(%d,%d,%d,%d,%d,%d,%d)\tstage_o%d(i_clk, i_rst, i_ce,\n",
(inverse)?"i":"", tmp_size,
nbits+xtrapbits,
nbits+xtracbits+xtrapbits,
obits+xtrapbits,
lgsize, lgtmp-2,
lgdelay(nbits+xtrapbits,xtracbits),
(dropbit)?0:0, tmp_size);
fprintf(vmain, "\t\t\t\t\t\tw_s%d, w_o%d, w_o%d, w_os%d);\n",
tmp_size<<1, tmp_size<<1,
tmp_size, tmp_size);
fprintf(vmain, "\n\n");
 
std::string fname;
char numstr[12];
 
fname = coredir + "/";
if (inverse) fname += "i";
fname += "fftstage_e";
sprintf(numstr, "%d", tmp_size);
fname += numstr;
if ((dbg)&&(dbgstage == tmp_size))
fname += "_dbg";
fname += ".v";
build_stage(fname.c_str(), coredir.c_str(), tmp_size/2, 0,
nbits+xtrapbits, inverse, xtracbits,
mpystage, ((dbg)&&(dbgstage == tmp_size))); // Even stage
 
fname = coredir + "/";
if (inverse) fname += "i";
fname += "fftstage_o";
sprintf(numstr, "%d", tmp_size);
fname += numstr;
fname += ".v";
build_stage(fname.c_str(), coredir.c_str(), tmp_size/2, 1,
nbits+xtrapbits, inverse, xtracbits,
mpystage, false); // Odd stage
if (single_clock) {
fprintf(vmain,"\twire\t[%d:0]\tw_d%d;\n",
2*(obits+xtrapbits)-1,
tmp_size);
cmem = gen_coeff_fname(EMPTYSTR, tmp_size, 1, 0, inverse);
cmemfp = gen_coeff_open(cmem.c_str());
gen_coeffs(cmemfp, tmp_size,
nbits+xtracbits+xtrapbits, 1, 0, inverse);
fprintf(vmain, "\tfftstage%s\t#(%d,%d,%d,%d,%d,%d,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_%d(i_clk, %s, i_ce,\n",
((dbg)&&(dbgstage==tmp_size))?"_dbg":"",
nbits+xtrapbits,
nbits+xtracbits+xtrapbits,
obits+xtrapbits,
lgsize, lgtmp-1,
(dropbit)?0:0, (mpystage)?1:0,
ckpce,
cmem.c_str(), tmp_size,
resetw.c_str());
fprintf(vmain, "\t\t\tw_s%d, w_d%d, w_d%d, w_s%d%s);\n",
tmp_size<<1, tmp_size<<1,
tmp_size, tmp_size,
((dbg)&&(dbgstage == tmp_size))
?", o_dbg":"");
} else {
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os%d;\n\t// verilator lint_on UNUSED\n",
tmp_size);
fprintf(vmain,"\twire\t[%d:0]\tw_e%d, w_o%d;\n",
2*(obits+xtrapbits)-1,
tmp_size, tmp_size);
cmem = gen_coeff_fname(EMPTYSTR, tmp_size, 2, 0, inverse);
cmemfp = gen_coeff_open(cmem.c_str());
gen_coeffs(cmemfp, tmp_size,
nbits+xtracbits+xtrapbits, 2, 0, inverse);
fprintf(vmain, "\tfftstage%s\t#(%d,%d,%d,%d,%d,%d,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_e%d(i_clk, %s, i_ce,\n",
((dbg)&&(dbgstage==tmp_size))?"_dbg":"",
nbits+xtrapbits,
nbits+xtracbits+xtrapbits,
obits+xtrapbits,
lgsize, lgtmp-2,
(dropbit)?0:0, (mpystage)?1:0,
ckpce,
cmem.c_str(), tmp_size,
resetw.c_str());
fprintf(vmain, "\t\t\tw_s%d, w_e%d, w_e%d, w_s%d%s);\n",
tmp_size<<1, tmp_size<<1,
tmp_size, tmp_size,
((dbg)&&(dbgstage == tmp_size))
?", o_dbg":"");
cmem = gen_coeff_fname(EMPTYSTR,
tmp_size, 2, 1, inverse);
cmemfp = gen_coeff_open(cmem.c_str());
gen_coeffs(cmemfp, tmp_size,
nbits+xtracbits+xtrapbits,
2, 1, inverse);
fprintf(vmain, "\tfftstage\t#(%d,%d,%d,%d,%d,%d,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_o%d(i_clk, %s, i_ce,\n",
nbits+xtrapbits,
nbits+xtracbits+xtrapbits,
obits+xtrapbits,
lgsize, lgtmp-2,
(dropbit)?0:0, (mpystage)?1:0,
ckpce, cmem.c_str(), tmp_size,
resetw.c_str());
fprintf(vmain, "\t\t\tw_s%d, w_o%d, w_o%d, w_os%d);\n",
tmp_size<<1, tmp_size<<1,
tmp_size, tmp_size);
}
fprintf(vmain, "\n");
}
 
 
2889,17 → 1682,31
obits = maxbitsout;
 
fprintf(vmain, "\twire\t\tw_s4;\n");
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os4;\n\t// verilator lint_on UNUSED\n");
fprintf(vmain, "\twire\t[%d:0]\tw_e4, w_o4;\n", 2*(obits+xtrapbits)-1);
fprintf(vmain, "\tqtrstage%s\t#(%d,%d,%d,0,%d,%d)\tstage_e4(i_clk, i_rst, i_ce,\n",
((dbg)&&(dbgstage==4))?"_dbg":"",
nbits+xtrapbits, obits+xtrapbits, lgsize,
(inverse)?1:0, (dropbit)?0:0);
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_e8, w_e4, w_s4%s);\n",
((dbg)&&(dbgstage==4))?", o_dbg":"");
fprintf(vmain, "\tqtrstage\t#(%d,%d,%d,1,%d,%d)\tstage_o4(i_clk, i_rst, i_ce,\n",
nbits+xtrapbits, obits+xtrapbits, lgsize, (inverse)?1:0, (dropbit)?0:0);
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_o8, w_o4, w_os4);\n");
if (single_clock) {
fprintf(vmain, "\twire\t[%d:0]\tw_d4;\n",
2*(obits+xtrapbits)-1);
fprintf(vmain, "\tqtrstage%s\t#(%d,%d,%d,%d,%d)\tstage_4(i_clk, %s, i_ce,\n",
((dbg)&&(dbgstage==4))?"_dbg":"",
nbits+xtrapbits, obits+xtrapbits, lgsize,
(inverse)?1:0, (dropbit)?0:0,
resetw.c_str());
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_d8, w_d4, w_s4%s);\n",
((dbg)&&(dbgstage==4))?", o_dbg":"");
} else {
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os4;\n\t// verilator lint_on UNUSED\n");
fprintf(vmain, "\twire\t[%d:0]\tw_e4, w_o4;\n", 2*(obits+xtrapbits)-1);
fprintf(vmain, "\tqtrstage%s\t#(%d,%d,%d,0,%d,%d)\tstage_e4(i_clk, %s, i_ce,\n",
((dbg)&&(dbgstage==4))?"_dbg":"",
nbits+xtrapbits, obits+xtrapbits, lgsize,
(inverse)?1:0, (dropbit)?0:0,
resetw.c_str());
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_e8, w_e4, w_s4%s);\n",
((dbg)&&(dbgstage==4))?", o_dbg":"");
fprintf(vmain, "\tqtrstage\t#(%d,%d,%d,1,%d,%d)\tstage_o4(i_clk, %s, i_ce,\n",
nbits+xtrapbits, obits+xtrapbits, lgsize, (inverse)?1:0, (dropbit)?0:0,
resetw.c_str());
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_o8, w_o4, w_os4);\n");
}
dropbit ^= 1;
nbits = obits;
tmp_size >>= 1; lgtmp--;
2912,26 → 1719,51
if ((maxbitsout>0)&&(obits > maxbitsout))
obits = maxbitsout;
fprintf(vmain, "\twire\t\tw_s2;\n");
fprintf(vmain, "\twire\t[%d:0]\tw_e2, w_o2;\n", 2*obits-1);
if (single_clock) {
fprintf(vmain, "\twire\t[%d:0]\tw_d2;\n",
2*obits-1);
} else {
fprintf(vmain, "\twire\t[%d:0]\tw_e2, w_o2;\n",
2*obits-1);
}
if ((nbits+xtrapbits+1 == obits)&&(!dropbit))
printf("WARNING: SCALING OFF BY A FACTOR OF TWO--should\'ve dropped a bit in the last stage.\n");
fprintf(vmain, "\tdblstage\t#(%d,%d,%d)\tstage_2(i_clk, i_rst, i_ce,\n", nbits+xtrapbits, obits,(dropbit)?0:1);
fprintf(vmain, "\t\t\t\t\tw_s4, w_e4, w_o4, w_e2, w_o2, w_s2);\n");
 
if (single_clock) {
fprintf(vmain, "\tlaststage\t#(%d,%d,%d)\tstage_2(i_clk, %s, i_ce,\n",
nbits+xtrapbits, obits,(dropbit)?0:1,
resetw.c_str());
fprintf(vmain, "\t\t\t\t\tw_s4, w_d4, w_d2, w_s2);\n");
} else {
fprintf(vmain, "\tlaststage\t#(%d,%d,%d)\tstage_2(i_clk, %s, i_ce,\n",
nbits+xtrapbits, obits,(dropbit)?0:1,
resetw.c_str());
fprintf(vmain, "\t\t\t\t\tw_s4, w_e4, w_o4, w_e2, w_o2, w_s2);\n");
}
 
fprintf(vmain, "\n\n");
nbits = obits;
}
 
fprintf(vmain, "\t// Prepare for a (potential) bit-reverse stage.\n");
fprintf(vmain, "\tassign\tbr_left = w_e2;\n");
fprintf(vmain, "\tassign\tbr_right = w_o2;\n");
if (single_clock)
fprintf(vmain, "\tassign\tbr_sample= w_d2;\n");
else {
fprintf(vmain, "\tassign\tbr_left = w_e2;\n");
fprintf(vmain, "\tassign\tbr_right = w_o2;\n");
}
fprintf(vmain, "\n");
if (bitreverse) {
fprintf(vmain, "\twire\tbr_start;\n");
fprintf(vmain, "\treg\tr_br_started;\n");
fprintf(vmain, "\tinitial\tr_br_started = 1\'b0;\n");
fprintf(vmain, "\talways @(posedge i_clk)\n");
fprintf(vmain, "\t\tif (i_rst)\n");
if (async_reset) {
fprintf(vmain, "\talways @(posedge i_clk, negedge i_areset_n)\n");
fprintf(vmain, "\t\tif (!i_areset_n)\n");
} else {
fprintf(vmain, "\talways @(posedge i_clk)\n");
fprintf(vmain, "\t\tif (i_reset)\n");
}
fprintf(vmain, "\t\t\tr_br_started <= 1\'b0;\n");
fprintf(vmain, "\t\telse if (i_ce)\n");
fprintf(vmain, "\t\t\tr_br_started <= r_br_started || w_s2;\n");
2939,14 → 1771,25
}
}
 
 
fprintf(vmain, "\n");
fprintf(vmain, "\t// Now for the bit-reversal stage.\n");
fprintf(vmain, "\twire\tbr_sync;\n");
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_o_left, br_o_right;\n");
if (bitreverse) {
fprintf(vmain, "\tdblreverse\t#(%d,%d)\trevstage(i_clk, i_rst,\n", lgsize, nbitsout);
fprintf(vmain, "\t\t\t(i_ce & br_start), br_left, br_right,\n");
fprintf(vmain, "\t\t\tbr_o_left, br_o_right, br_sync);\n");
if (single_clock) {
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_o_result;\n");
fprintf(vmain, "\tbitreverse\t#(%d,%d)\n\t\trevstage(i_clk, %s,\n", lgsize, nbitsout, resetw.c_str());
fprintf(vmain, "\t\t\t(i_ce & br_start), br_sample,\n");
fprintf(vmain, "\t\t\tbr_o_result, br_sync);\n");
} else {
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_o_left, br_o_right;\n");
fprintf(vmain, "\tbitreverse\t#(%d,%d)\n\t\trevstage(i_clk, %s,\n", lgsize, nbitsout, resetw.c_str());
fprintf(vmain, "\t\t\t(i_ce & br_start), br_left, br_right,\n");
fprintf(vmain, "\t\t\tbr_o_left, br_o_right, br_sync);\n");
}
} else if (single_clock) {
fprintf(vmain, "\tassign\tbr_o_result = br_sample;\n");
fprintf(vmain, "\tassign\tbr_sync = w_s2;\n");
} else {
fprintf(vmain, "\tassign\tbr_o_left = br_left;\n");
fprintf(vmain, "\tassign\tbr_o_right = br_right;\n");
2953,35 → 1796,51
fprintf(vmain, "\tassign\tbr_sync = w_s2;\n");
}
 
fprintf(vmain, "\n\n");
fprintf(vmain, "\t// Last clock: Register our outputs, we\'re done.\n");
fprintf(vmain, "\tinitial\to_sync = 1\'b0;\n");
fprintf(vmain, "\talways @(posedge i_clk)\n");
fprintf(vmain, "\t\tif (i_rst)\n");
fprintf(vmain, "\t\t\to_sync <= 1\'b0;\n");
fprintf(vmain, "\t\telse if (i_ce)\n");
fprintf(vmain, "\t\t\to_sync <= br_sync;\n");
fprintf(vmain, "\n");
fprintf(vmain, "\talways @(posedge i_clk)\n");
fprintf(vmain, "\t\tif (i_ce)\n");
fprintf(vmain, "\t\tbegin\n");
fprintf(vmain, "\t\t\to_left <= br_o_left;\n");
fprintf(vmain, "\t\t\to_right <= br_o_right;\n");
fprintf(vmain, "\t\tend\n");
fprintf(vmain, "\n\n");
fprintf(vmain, "endmodule\n");
fprintf(vmain,
"\n\n"
"\t// Last clock: Register our outputs, we\'re done.\n"
"\tinitial\to_sync = 1\'b0;\n");
if (async_reset)
fprintf(vmain,
"\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
else {
fprintf(vmain,
"\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
}
 
fprintf(vmain,
"\t\t\to_sync <= 1\'b0;\n"
"\t\telse if (i_ce)\n"
"\t\t\to_sync <= br_sync;\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n");
if (single_clock) {
fprintf(vmain, "\t\t\to_result <= br_o_result;\n");
} else {
fprintf(vmain,
"\t\tbegin\n"
"\t\t\to_left <= br_o_left;\n"
"\t\t\to_right <= br_o_right;\n"
"\t\tend\n");
}
 
fprintf(vmain,
"\n\n"
"endmodule\n");
fclose(vmain);
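
The main module writes the same reset preamble in several places (`br_start`, `r_br_started`, `o_sync`), each time choosing between a synchronous `i_reset` and an asynchronous, active-low `i_areset_n`. One way to keep those copies consistent is to produce the preamble in a single place. This is only a sketch: `reset_always_header` is a hypothetical helper and not something the generator defines.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the first two lines of a resettable
// always block for the selected reset style. The signal names match
// the ones the generator uses for resetw.
std::string reset_always_header(bool async_reset) {
	if (async_reset)
		return "\talways @(posedge i_clk, negedge i_areset_n)\n"
			"\t\tif (!i_areset_n)\n";
	return "\talways @(posedge i_clk)\n"
		"\t\tif (i_reset)\n";
}
```

A call such as `fprintf(vmain, "%s", reset_always_header(async_reset).c_str());` could then stand in for each hand-written pair of lines, followed by the register-specific reset assignment.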
 
 
{
std::string fname;
 
fname = coredir + "/butterfly.v";
build_butterfly(fname.c_str(), xtracbits, rounding);
build_butterfly(fname.c_str(), xtracbits, rounding,
ckpce, async_reset);
 
if (nummpy > 0) {
fname = coredir + "/hwbfly.v";
build_hwbfly(fname.c_str(), xtracbits, rounding);
}
fname = coredir + "/hwbfly.v";
build_hwbfly(fname.c_str(), xtracbits, rounding,
ckpce, async_reset);
 
{
// To make debugging easier, we build both of these
2996,20 → 1855,40
 
if ((dbg)&&(dbgstage == 4)) {
fname = coredir + "/qtrstage_dbg.v";
build_quarters(fname.c_str(), rounding, true);
if (single_clock)
build_snglquarters(fname.c_str(), rounding,
async_reset, true);
else
build_dblquarters(fname.c_str(), rounding,
async_reset, true);
}
fname = coredir + "/qtrstage.v";
build_quarters(fname.c_str(), rounding, false);
 
if ((dbg)&&(dbgstage == 2))
fname = coredir + "/dblstage_dbg.v";
if (single_clock)
build_snglquarters(fname.c_str(), rounding,
async_reset, false);
else
fname = coredir + "/dblstage.v";
build_dblstage(fname.c_str(), rounding, (dbg)&&(dbgstage==2));
build_dblquarters(fname.c_str(), rounding,
async_reset, false);
 
 
if (single_clock) {
fname = coredir + "/laststage.v";
build_sngllast(fname.c_str(), async_reset);
} else {
if ((dbg)&&(dbgstage == 2))
fname = coredir + "/laststage_dbg.v";
else
fname = coredir + "/laststage.v";
build_dblstage(fname.c_str(), rounding,
async_reset, (dbg)&&(dbgstage==2));
}
 
if (bitreverse) {
fname = coredir + "/dblreverse.v";
build_dblreverse(fname.c_str());
fname = coredir + "/bitreverse.v";
if (single_clock)
build_snglbrev(fname.c_str(), async_reset);
else
build_dblreverse(fname.c_str(), async_reset);
}
 
const char *rnd_string = "";
3029,4 → 1908,7
}
 
}
 
if (verbose_flag)
printf("All done -- success\n");
}
/trunk/sw/fftlib.cpp
0,0 → 1,197
////////////////////////////////////////////////////////////////////////////////
//
// Filename: fftlib.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose:
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen
#include <stdio.h>
#include <stdlib.h>
 
#ifdef _MSC_VER // added for ms vs compatibility
 
#include <io.h>
#include <direct.h>
#define _USE_MATH_DEFINES
 
#if _MSC_VER <= 1700
 
long long llround(double d) {
if (d<0) return -(long long)(-d+0.5);
else return (long long)(d+0.5); }
 
#endif
 
#else
// And for G++/Linux environment
 
#include <unistd.h> // Defines the R_OK/W_OK/etc. macros
#endif
 
#include <string.h>
#include <string>
#include <math.h>
// #include <ctype.h>
#include <assert.h>
 
#include "fftlib.h"
 
 
int lgval(int vl) {
int lg;
 
for(lg=1; (1<<lg) < vl; lg++)
;
return lg;
}
 
int nextlg(int vl) {
int r;
 
for(r=1; r<vl; r<<=1)
;
return r;
}
 
int bflydelay(int nbits, int xtra) {
int cbits = nbits + xtra;
int delay;
 
if (USE_OLD_MULTIPLY) {
if (nbits+1<cbits)
delay = nbits+4;
else
delay = cbits+3;
} else {
int na=nbits+2, nb=cbits+1;
if (nb<na) {
int tmp = nb;
nb = na; na = tmp;
}
delay = ((na)/2+(na&1)+2);
}
return delay;
}
 
int lgdelay(int nbits, int xtra) {
// The butterfly code needs to compare a valid address, of this
// many bits, with an address two greater. This guarantees we
// have enough bits for that comparison. We'll also end up with
// more storage space to look for these values, but without a
// redesign that's just what we'll deal with.
return lgval(bflydelay(nbits, xtra)+3);
}
 
void gen_coeffs(FILE *cmem, int stage, int cbits,
int nwide, int offset, bool inv) {
//
// For an FFT stage of 2^n elements, we need 2^(n-1) butterfly
// coefficients, sometimes called twiddle factors. Stage captures the
// width of the FFT at this point. If this is a 2x at a time FFT,
// nwide will be equal to 2, and offset will be zero or one.
//
assert(nwide > 0);
assert(offset < nwide);
assert(stage / nwide > 1);
assert(stage % nwide == 0);
printf("GEN-COEFFS(): stage =%4d, bits =%2d, nwide = %d, offset = %d, inverse = %d\n", stage, cbits, nwide, offset, inv);
int ncoeffs = stage/nwide/2;
for(int i=0; i<ncoeffs; i++) {
int k = nwide*i+offset;
double W = ((inv)?1:-1)*2.0*M_PI*k/(double)(stage);
double c, s;
long long ic, is, vl;
 
c = cos(W); s = sin(W);
ic = (long long)llround((1ll<<(cbits-2)) * c);
is = (long long)llround((1ll<<(cbits-2)) * s);
vl = (ic & (~(-1ll << (cbits))));
vl <<= (cbits);
vl |= (is & (~(-1ll << (cbits))));
fprintf(cmem, "%0*llx\n", ((cbits*2+3)/4), vl);
//
}
fclose(cmem);
}
 
std::string gen_coeff_fname(const char *coredir,
int stage, int nwide, int offset, bool inv) {
std::string result;
char *memfile;
 
assert((nwide == 1)||(nwide == 2));
 
memfile = new char[strlen(coredir)+3+10+strlen(".hex")+64];
if (nwide == 2) {
if (coredir[0] == '\0') {
sprintf(memfile, "%scmem_%c%d.hex",
(inv)?"i":"", (offset==1)?'o':'e', stage*nwide);
} else {
sprintf(memfile, "%s/%scmem_%c%d.hex",
coredir, (inv)?"i":"",
(offset==1)?'o':'e', stage*nwide);
}
} else if (coredir[0] == '\0') // if (nwide == 1)
sprintf(memfile, "%scmem_%d.hex",
(inv)?"i":"", stage);
else
sprintf(memfile, "%s/%scmem_%d.hex",
coredir, (inv)?"i":"", stage);
 
result = std::string(memfile);
delete[] memfile;
return result;
}
 
FILE *gen_coeff_open(const char *fname) {
FILE *cmem;
 
cmem = fopen(fname, "w");
if (NULL == cmem) {
fprintf(stderr, "Could not open FFT coefficient file "
"\'%s\' for writing\n", fname);
perror("Err from O/S:");
exit(EXIT_FAILURE);
}
 
return cmem;
}
 
void gen_coeff_file(const char *coredir, const char *fname,
int stage, int cbits, int nwide, int offset, bool inv) {
std::string fstr;
FILE *cmem;
 
fstr= gen_coeff_fname(coredir, stage, nwide, offset, inv);
cmem = gen_coeff_open(fstr.c_str());
gen_coeffs(cmem, stage, cbits, nwide, offset, inv);
}
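The coefficient packing done by gen_coeffs() above can be sketched in isolation. The following is only an illustration under stated assumptions: `twiddle_hex` is a hypothetical helper that is not part of fftlib, it takes the twiddle index `k` directly rather than deriving it from `nwide` and `offset`, and it masks with unsigned arithmetic instead of shifting a negative value as the original does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of fftlib): returns the hex word that
// gen_coeffs() would write for twiddle index k of a `stage`-point stage.
std::string twiddle_hex(int stage, int cbits, int k, bool inv) {
	assert(stage > 0 && cbits >= 2 && cbits <= 31);
	const double PI = std::acos(-1.0);
	// A forward FFT uses exp(-j*2*pi*k/stage); the inverse flips the sign
	double W = (inv ? 1.0 : -1.0) * 2.0 * PI * k / (double)stage;
	// Scale so that 1.0 maps to 2^(cbits-2), as gen_coeffs() does
	long long ic = std::llround((double)(1ll << (cbits-2)) * std::cos(W));
	long long is = std::llround((double)(1ll << (cbits-2)) * std::sin(W));
	// Keep the low cbits of each part (two's complement), cosine on top
	unsigned long long mask = (1ull << cbits) - 1;
	unsigned long long vl = (((unsigned long long)ic & mask) << cbits)
				| ((unsigned long long)is & mask);
	char buf[32];
	std::snprintf(buf, sizeof(buf), "%0*llx", (cbits*2+3)/4, vl);
	return std::string(buf);
}
```

With 16-bit coefficients, index 0 of a 4-point stage packs cos = 1.0 as 0x4000 in the upper half and sin = 0 in the lower half, so `twiddle_hex(4, 16, 0, false)` yields `40000000`.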
/trunk/sw/fftlib.h
0,0 → 1,55
////////////////////////////////////////////////////////////////////////////////
//
// Filename: fftlib.h
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose:
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#ifndef FFTLIB_H
#define FFTLIB_H

#include <stdio.h>	// FILE
#include <string>	// std::string
 
#define USE_OLD_MULTIPLY false
 
extern int lgval(int vl);
extern int nextlg(int vl);
extern int bflydelay(int nbits, int xtra);
extern int lgdelay(int nbits, int xtra);
extern void gen_coeffs(FILE *cmem, int stage, int cbits,
int nwide, int offset, bool inv);
extern std::string gen_coeff_fname(const char *coredir,
int stage, int nwide, int offset, bool inv);
extern FILE *gen_coeff_open(const char *fname);
extern void gen_coeff_file(const char *coredir, const char *fname,
int stage, int cbits, int nwide, int offset, bool inv);
 
#endif // FFTLIB_H
/trunk/sw/legal.cpp
0,0 → 1,70
////////////////////////////////////////////////////////////////////////////////
//
// Filename: legal.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: Contains the information and logic necessary to place a
// copyright, name, author, and purpose statement at the head of
// every file.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#include "legal.h"
 
const char cpyleft[] =
SLASHLINE
"//\n"
"// Copyright (C) 2015-2018, Gisselquist Technology, LLC\n"
"//\n"
"// This program is free software (firmware): you can redistribute it and/or\n"
"// modify it under the terms of the GNU General Public License as published\n"
"// by the Free Software Foundation, either version 3 of the License, or (at\n"
"// your option) any later version.\n"
"//\n"
"// This program is distributed in the hope that it will be useful, but WITHOUT\n"
"// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or\n"
"// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License\n"
"// for more details.\n"
"//\n"
"// You should have received a copy of the GNU General Public License along\n"
"// with this program. (It's in the $(ROOT)/doc directory, run make with no\n"
"// target there if the PDF file isn\'t present.) If not, see\n"
"// <http://www.gnu.org/licenses/> for a copy.\n"
"//\n"
"// License: GPL, v3, as defined and found on www.gnu.org,\n"
"// http://www.gnu.org/licenses/gpl.html\n"
"//\n"
"//\n"
SLASHLINE;
const char prjname[] = "A General Purpose Pipelined FFT Implementation";
const char creator[] = "// Creator: Dan Gisselquist, Ph.D.\n"
"// Gisselquist Technology, LLC\n";
 
/trunk/sw/legal.h
0,0 → 1,49
////////////////////////////////////////////////////////////////////////////////
//
// Filename: legal.h
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: Contains the information and logic necessary to place a
// copyright, name, author, and purpose statement at the head of
// every file.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#ifndef LEGAL_H
#define LEGAL_H
 
#define SLASHLINE "////////////////////////////////////////////////////////////////////////////////\n"
 
extern const char cpyleft[];
extern const char prjname[];
extern const char creator[];
 
#endif // LEGAL_H
/trunk/sw/rounding.cpp
0,0 → 1,407
////////////////////////////////////////////////////////////////////////////////
//
// Filename: rounding.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: To create one of a series of modules to handle dropping bits
// within the FFT implementation.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen
 
#include <stdio.h>
#include <stdlib.h>
 
#include <string.h>
#include <string>
#include <math.h>
#include <ctype.h>
#include <assert.h>
 
#include "legal.h"
#include "rounding.h"
 
#define SLASHLINE "////////////////////////////////////////////////////////////////////////////////\n"
 
 
void build_truncator(const char *fname) {
printf("TRUNCATING!\n");
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\ttruncate.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose: Truncation is one of several options that can be used\n"
"// internal to the various FFT stages to drop bits from one\n"
"// stage to the next. In general, it is the simplest method of dropping\n"
"// bits, since it requires only a bit selection.\n"
"//\n"
"// This form of rounding isn\'t really that great for FFT\'s, since it\n"
"// tends to produce a DC bias in the result. (Other less pronounced\n"
"// biases may also exist.)\n"
"//\n"
"// This particular version also registers the output with the clock, so\n"
"// there will be a delay of one going through this module. This will\n"
"// keep it in line with the other forms of rounding that can be used.\n"
"//\n"
"//\n%s"
"//\n",
prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module truncate(i_clk, i_ce, i_val, o_val);\n"
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n"
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n"
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_val <= i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\n"
"endmodule\n");
fclose(fp);
}
 
void build_roundhalfup(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\troundhalfup.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tRounding half up is the way I was always taught to round in\n"
"// school. A one half value is added to the result, and then\n"
"// the result is truncated. When used in an FFT, this produces less\n"
"// bias than the truncation method, although a bias still tends to\n"
"// remain.\n"
"//\n"
"//\n%s"
"//\n",
prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module roundhalfup(i_clk, i_ce, i_val, o_val);\n"
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n"
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n"
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n"
"\n"
"\t// Let's deal with two cases to be as general as we can be here\n"
"\t//\n"
"\t// 1. The desired output would lose no bits at all\n"
"\t// 2. One or more bits would be dropped, so the rounding is simply\n"
"\t//\t\ta matter of adding one to the bit about to be dropped,\n"
"\t//\t\tmoving all halfway and above numbers up to the next\n"
"\t//\t\tvalue.\n"
"\tgenerate\n"
"\tif (IWID-SHIFT == OWID)\n"
"\tbegin // No truncation or rounding, output drops no bits\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n"
"\n"
"\tend else // if (IWID-SHIFT-1 >= OWID)\n"
"\tbegin // Output drops one or more bits: add one if the first lost bit is set\n"
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n"
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse\n"
"\t\t\t\t\to_val <= rounded_up; // Round half up\n"
"\t\t\tend\n"
"\n"
"\tend\n"
"\tendgenerate\n"
"\n"
"endmodule\n");
fclose(fp);
}
 
void build_roundfromzero(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\troundfromzero.v\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: Rounding away from zero is one of several options that can\n"
"// be used internal to the various FFT stages to drop bits from\n"
"// one stage to the next. Values are rounded to the nearest output value,\n"
"// and a value exactly halfway between two outputs is rounded away from\n"
"// zero.\n"
"//\n"
"// Because halfway values are rounded symmetrically about zero, positive\n"
"// and negative inputs of the same magnitude round to the same magnitude.\n"
"//\n"
"// This particular version also registers the output with the clock, so\n"
"// there will be a delay of one going through this module. This will\n"
"// keep it in line with the other forms of rounding that can be used.\n"
"//\n"
"//\n%s"
"//\n",
prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module roundfromzero(i_clk, i_ce, i_val, o_val);\n"
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n"
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n"
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n"
"\n"
"\t// Let's deal with three cases to be as general as we can be here\n"
"\t//\n"
"\t//\t1. The desired output would lose no bits at all\n"
"\t//\t2. One bit would be dropped, so the rounding is simply\n"
"\t//\t\tadjusting the value to be the further from zero in\n"
"\t//\t\tcases of being halfway between two. If identically\n"
"\t//\t\tequal to a number, we just leave it as is.\n"
"\t//\t3. Two or more bits would be dropped. In this case, we round\n"
"\t//\t\tnormally unless we are rounding a value of exactly\n"
"\t//\t\thalfway between the two. In the halfway case, we\n"
"\t//\t\tround away from zero.\n"
"\tgenerate\n"
"\tif (IWID == OWID) // In this case, the shift is irrelevant and\n"
"\tbegin // cannot be applied. No truncation or rounding takes\n"
"\t// effect here.\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-1):0];\n"
"\n"
"\tend else if (IWID-SHIFT == OWID)\n"
"\tbegin // No truncation or rounding, output drops no bits\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n"
"\n"
"\tend else if (IWID-SHIFT-1 == OWID)\n"
"\tbegin // Output drops one bit, can only add one or ... not.\n"
"\t\twire\t[(OWID-1):0]\ttruncated_value, rounded_up;\n"
"\t\twire\t\t\tsign_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tfirst_lost_bit = i_val[0];\n"
"\t\tassign\tsign_bit = i_val[(IWID-1)];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse if (sign_bit)\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse\n"
"\t\t\t\t\to_val <= rounded_up;\n"
"\t\t\tend\n"
"\n"
"\tend else // If there's more than one bit we are dropping\n"
"\tbegin\n"
"\t\twire\t[(OWID-1):0]\ttruncated_value, rounded_up;\n"
"\t\twire\t\t\tsign_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n"
"\t\tassign\tsign_bit = i_val[(IWID-1)];\n"
"\n"
"\t\twire\t[(IWID-SHIFT-OWID-2):0]\tother_lost_bits;\n"
"\t\tassign\tother_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse if (|other_lost_bits) // Round up to\n"
"\t\t\t\t\to_val <= rounded_up; // closest value\n"
"\t\t\t\telse if (sign_bit)\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse\n"
"\t\t\t\t\to_val <= rounded_up;\n"
"\t\t\tend\n"
"\tend\n"
"\tendgenerate\n"
"\n"
"endmodule\n");
fclose(fp);
}
 
void build_convround(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename: convround.v\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: A convergent rounding routine, also known as banker\'s\n"
"// rounding, Dutch rounding, Gaussian rounding, unbiased\n"
"// rounding, or ... more, at least according to Wikipedia.\n"
"//\n"
"// This form of rounding works by rounding, when the direction is in\n"
"// question, towards the nearest even value.\n"
"//\n"
"//\n%s"
"//\n",
prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module convround(i_clk, i_ce, i_val, o_val);\n"
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n"
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n"
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n"
"\n"
"\t// Let's deal with three cases to be as general as we can be here\n"
"\t//\n"
"\t//\t1. The desired output would lose no bits at all\n"
"\t//\t2. One bit would be dropped, so the rounding is simply\n"
"\t//\t\tadjusting the value to be the nearest even number in\n"
"\t//\t\tcases of being halfway between two. If identically\n"
"\t//\t\tequal to a number, we just leave it as is.\n"
"\t//\t3. Two or more bits would be dropped. In this case, we round\n"
"\t//\t\tnormally unless we are rounding a value of exactly\n"
"\t//\t\thalfway between the two. In the halfway case we round\n"
"\t//\t\tto the nearest even number.\n"
"\tgenerate\n"
// What if IWID < OWID? We should expand here ... somehow
"\tif (IWID == OWID) // In this case, the shift is irrelevant and\n"
"\tbegin // cannot be applied. No truncation or rounding takes\n"
"\t// effect here.\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-1):0];\n"
"\n"
// What if IWID-SHIFT < OWID? Shouldn't we also shift here as well?
"\tend else if (IWID-SHIFT == OWID)\n"
"\tbegin // No truncation or rounding, output drops no bits\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n"
"\n"
"\tend else if (IWID-SHIFT-1 == OWID)\n"
// Is there any way to limit the number of bits that are examined here, for the
// purpose of simplifying/reducing logic? I mean, if we go from 32 to 16 bits,
// must we check all 15 bits for equality to zero?
"\tbegin // Output drops one bit, can only add one or ... not.\n"
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n"
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tlast_valid_bit = truncated_value[0];\n"
"\t\tassign\tfirst_lost_bit = i_val[0];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse if (last_valid_bit)// Round up to nearest\n"
"\t\t\t\t\to_val <= rounded_up; // even value\n"
"\t\t\t\telse // else round down to the nearest\n"
"\t\t\t\t\to_val <= truncated_value; // even value\n"
"\t\t\tend\n"
"\n"
"\tend else // If there's more than one bit we are dropping\n"
"\tbegin\n"
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n"
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n"
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n"
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n"
"\t\tassign\tlast_valid_bit = truncated_value[0];\n"
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n"
"\n"
"\t\twire\t[(IWID-SHIFT-OWID-2):0]\tother_lost_bits;\n"
"\t\tassign\tother_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\tbegin\n"
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\t\telse if (|other_lost_bits) // Round up to\n"
"\t\t\t\t\to_val <= rounded_up; // closest value\n"
"\t\t\t\telse if (last_valid_bit) // Round up to\n"
"\t\t\t\t\to_val <= rounded_up; // nearest even\n"
"\t\t\t\telse // else round down to nearest even\n"
"\t\t\t\t\to_val <= truncated_value;\n"
"\t\t\tend\n"
"\tend\n"
"\tendgenerate\n"
"\n"
"endmodule\n");
fclose(fp);
}
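As a cross-check on the four generated rounding modules, their arithmetic can be modeled in software. The sketch below is an assumption-laden illustration, not part of the generator: the `model_*` names are hypothetical, each function drops the low `n` bits of a signed value, and none of them reproduces the wrap-around that the Verilog exhibits when `rounded_up` overflows `OWID` bits.

```cpp
#include <cassert>

// Hypothetical software models (not part of the generator). Each drops the
// low n bits of v, mirroring one of the generated rounding modules.

// Truncation: floor division by 2^n, the same value as a bit selection.
long long model_truncate(long long v, int n) {
	assert(n >= 1 && n < 62);
	long long d = 1ll << n, t = v / d;
	if (v % d < 0) t -= 1;	// C++ division truncates toward zero
	return t;
}

// Round half up: add one whenever the first lost bit is set.
long long model_halfup(long long v, int n) {
	long long t = model_truncate(v, n);
	long long rem = v - t * (1ll << n);	// lost bits, 0 <= rem < 2^n
	return (rem >= (1ll << (n-1))) ? t + 1 : t;
}

// Round to nearest, with ties rounded away from zero.
long long model_fromzero(long long v, int n) {
	long long t = model_truncate(v, n);
	long long rem = v - t * (1ll << n), half = 1ll << (n-1);
	if (rem != half) return (rem > half) ? t + 1 : t;
	return (v < 0) ? t : t + 1;	// tie: a negative value keeps its floor
}

// Convergent rounding: round to nearest, with ties to the even neighbour.
long long model_convround(long long v, int n) {
	long long t = model_truncate(v, n);
	long long rem = v - t * (1ll << n), half = 1ll << (n-1);
	if (rem != half) return (rem > half) ? t + 1 : t;
	return (t % 2 != 0) ? t + 1 : t;	// tie: pick the even result
}
```

Dropping one bit from -3 (that is, -1.5) gives -2 for truncation, -1 for round half up, -2 for round from zero, and -2 for convergent rounding, which shows where the modes diverge.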
 
/trunk/sw/rounding.h
0,0 → 1,52
////////////////////////////////////////////////////////////////////////////////
//
// Filename: rounding.h
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: To create one of a series of modules to handle dropping bits
// within the FFT implementation.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#ifndef ROUNDING_H
#define ROUNDING_H
 
typedef enum {
RND_TRUNCATE, RND_FROMZERO, RND_HALFUP, RND_CONVERGENT
} ROUND_T;
 
 
extern void build_truncator(const char *fname);
extern void build_roundhalfup(const char *fname);
extern void build_roundfromzero(const char *fname);
extern void build_convround(const char *fname);
 
#endif
/trunk/sw/softmpy.cpp
0,0 → 1,400
////////////////////////////////////////////////////////////////////////////////
//
// Filename: softmpy.cpp
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: If the chip doesn't have any hardware multiplies, you'll need
// a soft-multiply implementation. This provides that
// implementation.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen
#include <stdio.h>
#include <stdlib.h>
 
#ifdef _MSC_VER // added for ms vs compatibility
 
#include <io.h>
#include <direct.h>
#define _USE_MATH_DEFINES
 
#endif
 
#include <string.h>
#include <string>
#include <math.h>
#include <ctype.h>
#include <assert.h>
 
#include "defaults.h"
#include "legal.h"
#include "softmpy.h"
 
void build_multiply(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was:");
return;
}
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\tshiftaddmpy.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tA portable shift and add multiply.\n"
"//\n"
"// While both Xilinx and Altera will offer single clock multiplies, this\n"
"// simple approach will multiply two numbers on any architecture. The\n"
"// result maintains the full width of the multiply, there are no extra\n"
"// stuff bits, no rounding, no shifted bits, etc.\n"
"//\n"
"// Further, for those applications that can support it, this multiply\n"
"// is pipelined and will produce one answer per clock.\n"
"//\n"
"// For minimal processing delay, make the first parameter the one with\n"
"// the least bits, so that AWIDTH <= BWIDTH.\n"
"//\n"
"// The processing delay in this multiply is (AWIDTH+1) cycles. That is,\n"
"// if the data is present on the input at clock t=0, the result will be\n"
"// present on the output at time t=AWIDTH+1;\n"
"//\n"
"//\n%s"
"//\n", prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module shiftaddmpy(i_clk, i_ce, i_a, i_b, o_r);\n"
"\tparameter\tAWIDTH=%d,BWIDTH=", TST_SHIFTADDMPY_AW);
#ifdef TST_SHIFTADDMPY_BW
fprintf(fp, "%d;\n", TST_SHIFTADDMPY_BW);
#else
fprintf(fp, "AWIDTH;\n");
#endif
fprintf(fp,
"\tinput\t\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\t[(AWIDTH-1):0]\t\ti_a;\n"
"\tinput\t\t[(BWIDTH-1):0]\t\ti_b;\n"
"\toutput\treg\t[(AWIDTH+BWIDTH-1):0]\to_r;\n"
"\n"
"\treg\t[(AWIDTH-1):0]\tu_a;\n"
"\treg\t[(BWIDTH-1):0]\tu_b;\n"
"\treg\t\t\tsgn;\n"
"\n"
"\treg\t[(AWIDTH-2):0]\t\tr_a[0:(AWIDTH-1)];\n"
"\treg\t[(AWIDTH+BWIDTH-2):0]\tr_b[0:(AWIDTH-1)];\n"
"\treg\t\t\t\tr_s[0:(AWIDTH-1)];\n"
"\treg\t[(AWIDTH+BWIDTH-1):0]\tacc[0:(AWIDTH-1)];\n"
"\tgenvar k;\n"
"\n"
"\t// If we were forced to stay within two\'s complement arithmetic,\n"
"\t// taking the absolute value here would require an additional bit.\n"
"\t// However, because our results are now unsigned, we can stay\n"
"\t// within the number of bits given (for now).\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tu_a <= (i_a[AWIDTH-1])?(-i_a):(i_a);\n"
"\t\t\tu_b <= (i_b[BWIDTH-1])?(-i_b):(i_b);\n"
"\t\t\tsgn <= i_a[AWIDTH-1] ^ i_b[BWIDTH-1];\n"
"\t\tend\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tacc[0] <= (u_a[0]) ? { {(AWIDTH){1\'b0}}, u_b }\n"
"\t\t\t\t\t: {(AWIDTH+BWIDTH){1\'b0}};\n"
"\t\t\tr_a[0] <= { u_a[(AWIDTH-1):1] };\n"
"\t\t\tr_b[0] <= { {(AWIDTH-1){1\'b0}}, u_b };\n"
"\t\t\tr_s[0] <= sgn; // The final sign, needs to be preserved\n"
"\t\tend\n"
"\n"
"\tgenerate\n"
"\tfor(k=0; k<AWIDTH-1; k=k+1)\n"
"\tbegin : genstages\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tacc[k+1] <= acc[k] + ((r_a[k][0]) ? {r_b[k],1\'b0}:0);\n"
"\t\t\tr_a[k+1] <= { 1\'b0, r_a[k][(AWIDTH-2):1] };\n"
"\t\t\tr_b[k+1] <= { r_b[k][(AWIDTH+BWIDTH-3):0], 1\'b0};\n"
"\t\t\tr_s[k+1] <= r_s[k];\n"
"\t\tend\n"
"\tend\n"
"\tendgenerate\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_r <= (r_s[AWIDTH-1]) ? (-acc[AWIDTH-1]) : acc[AWIDTH-1];\n"
"\n"
"endmodule\n");
 
fclose(fp);
}
 
void build_bimpy(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was");
return;
}
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\t%s\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tA simple 2-bit multiply based upon the fact that LUTs allow\n"
"// 6 bits of input. In other words, I could build a 3-bit\n"
"// multiply from 6 LUTs (5 actually, since the first could have two\n"
"// outputs). This would allow multiplication of three-bit digits, save\n"
"// only for the fact that you would need two bits of carry. The bimpy\n"
"// approach throttles back a bit and does a 2x2 bit multiply in a LUT,\n"
"// guaranteeing that it will never carry more than one bit. While this\n"
"// multiply is hardware independent (and can still run under Verilator\n"
"// therefore), it is really motivated by trying to optimize for a\n"
"// specific piece of hardware (Xilinx-7 series ...) that has at least\n"
"// 4-input LUTs with carry chains.\n"
"//\n"
"//\n"
"//\n%s"
"//\n", fname, prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module bimpy(i_clk, i_ce, i_a, i_b, o_r);\n"
"\tparameter\tBW=18, // Number of bits in i_b\n"
"\t\t\tLUTB=2; // Number of bits in i_a for our LUT multiply\n"
"\tinput\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\t[(LUTB-1):0]\ti_a;\n"
"\tinput\t\t[(BW-1):0]\ti_b;\n"
"\toutput\treg\t[(BW+LUTB-1):0] o_r;\n"
"\n"
"\twire [(BW+LUTB-2):0] w_r;\n"
"\twire [(BW+LUTB-3):1] c;\n"
"\n"
"\tassign\tw_r = { ((i_a[1])?i_b:{(BW){1\'b0}}), 1\'b0 }\n"
"\t\t\t\t^ { 1\'b0, ((i_a[0])?i_b:{(BW){1\'b0}}) };\n"
"\tassign\tc = { ((i_a[1])?i_b[(BW-2):0]:{(BW-1){1\'b0}}) }\n"
"\t\t\t& ((i_a[0])?i_b[(BW-1):1]:{(BW-1){1\'b0}});\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_r <= w_r + { c, 2'b0 };\n"
"\n"
"endmodule\n");
 
fclose(fp);
}
 
void build_longbimpy(const char *fname) {
FILE *fp = fopen(fname, "w");
if (NULL == fp) {
fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
perror("O/S Err was");
return;
}
 
fprintf(fp,
SLASHLINE
"//\n"
"// Filename: %s\n"
"//\n"
"// Project: %s\n"
"//\n"
"// Purpose: A portable shift and add multiply, built with the knowledge\n"
"// of the existence of a six bit LUT and carry chain. That knowledge\n"
"// allows us to multiply two bits from one value at a time against all\n"
"// of the bits of the other value. This sub multiply is called the\n"
"// bimpy.\n"
"//\n"
"// For minimal processing delay, make the first parameter the one with\n"
"// the least bits, so that AWIDTH <= BWIDTH.\n"
"//\n"
"//\n"
"//\n%s"
"//\n", fname, prjname, creator);
 
fprintf(fp, "%s", cpyleft);
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
fprintf(fp,
"module longbimpy(i_clk, i_ce, i_a_unsorted, i_b_unsorted, o_r);\n"
"\tparameter IAW=%d, // The width of i_a, min width is 5\n"
"\t\t\tIBW=", TST_LONGBIMPY_AW);
#ifdef TST_LONGBIMPY_BW
fprintf(fp, "%d", TST_LONGBIMPY_BW);
#else
fprintf(fp, "IAW");
#endif
 
fprintf(fp, ", // The width of i_b, can be anything\n"
"\t\t\t// The following three parameters should not be changed\n"
"\t\t\t// by any implementation, but are based upon hardware\n"
"\t\t\t// and the above values:\n"
"\t\t\tOW=IAW+IBW; // The output width\n");
fprintf(fp,
"\tlocalparam AW = (IAW<IBW) ? IAW : IBW,\n"
"\t\t\tBW = (IAW<IBW) ? IBW : IAW,\n"
"\t\t\tIW=(AW+1)&(-2), // Internal width of A\n"
"\t\t\tLUTB=2, // How many bits we can multiply by at once\n"
"\t\t\tTLEN=(AW+(LUTB-1))/LUTB; // Number of rows in our tableau\n"
"\tinput\t\t\t\ti_clk, i_ce;\n"
"\tinput\t\t[(IAW-1):0]\ti_a_unsorted;\n"
"\tinput\t\t[(IBW-1):0]\ti_b_unsorted;\n"
"\toutput\treg\t[(AW+BW-1):0]\to_r;\n"
"\n"
"\t//\n"
"\t// Swap parameter order, so that AW <= BW -- for performance\n"
"\t// reasons\n"
"\twire [AW-1:0] i_a;\n"
"\twire [BW-1:0] i_b;\n"
"\tgenerate if (IAW <= IBW)\n"
"\tbegin : NO_PARAM_CHANGE\n"
"\t\tassign i_a = i_a_unsorted;\n"
"\t\tassign i_b = i_b_unsorted;\n"
"\tend else begin : SWAP_PARAMETERS\n"
"\t\tassign i_a = i_b_unsorted;\n"
"\t\tassign i_b = i_a_unsorted;\n"
"\tend endgenerate\n"
"\n"
"\treg\t[(IW-1):0]\tu_a;\n"
"\treg\t[(BW-1):0]\tu_b;\n"
"\treg\t\t\tsgn;\n"
"\n"
"\treg\t[(IW-1-2*(LUTB)):0]\tr_a[0:(TLEN-3)];\n"
"\treg\t[(BW-1):0]\t\tr_b[0:(TLEN-3)];\n"
"\treg\t[(TLEN-1):0]\t\tr_s;\n"
"\treg\t[(IW+BW-1):0]\t\tacc[0:(TLEN-2)];\n"
"\tgenvar k;\n"
"\n"
"\t// First step:\n"
"\t// Switch to unsigned arithmetic for our multiply, keeping track\n"
"\t// of the sign along the way. We'll then restore the sign at\n"
"\t// the end.\n"
"\t//\n"
"\t// If we were forced to stay within two's complement arithmetic,\n"
"\t// taking the absolute value here would require an additional bit.\n"
"\t// However, because our results are now unsigned, we can stay\n"
"\t// within the number of bits given (for now).\n"
"\tgenerate if (IW > AW)\n"
"\tbegin\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\tu_a <= { 1\'b0, (i_a[AW-1])?(-i_a):(i_a) };\n"
"\tend else begin\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\tu_a <= (i_a[AW-1])?(-i_a):(i_a);\n"
"\tend endgenerate\n"
"\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tu_b <= (i_b[BW-1])?(-i_b):(i_b);\n"
"\t\t\tsgn <= i_a[AW-1] ^ i_b[BW-1];\n"
"\t\tend\n"
"\n"
"\twire [(BW+LUTB-1):0] pr_a, pr_b;\n"
"\n"
"\t//\n"
"\t// Second step: First two 2xN products.\n"
"\t//\n"
"\t// Since we have no tableau of additions (yet), we can do both\n"
"\t// of the first two rows at the same time and add them together.\n"
"\t// For the next round, we'll then have a previous sum to accumulate\n"
"\t// with each subsequent product, and so we can only do one product\n"
"\t// at a time after this--but the first clock can do two at a time.\n"
"\tbimpy\t#(BW) lmpy_0(i_clk,i_ce,u_a[( LUTB-1): 0], u_b, pr_a);\n"
"\tbimpy\t#(BW) lmpy_1(i_clk,i_ce,u_a[(2*LUTB-1):LUTB], u_b, pr_b);\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce) r_a[0] <= u_a[(IW-1):(2*LUTB)];\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce) r_b[0] <= u_b;\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce) r_s <= { r_s[(TLEN-2):0], sgn };\n"
"\talways @(posedge i_clk) // One clk after pr_a,pr_b become valid\n"
"\t\tif (i_ce) acc[0] <= { {(IW-LUTB){1\'b0}}, pr_a}\n"
"\t\t\t +{ {(IW-(2*LUTB)){1\'b0}}, pr_b, {(LUTB){1\'b0}} };\n"
"\n"
"\tgenerate // Keep track of intermediate values, before multiplying them\n"
"\tif (TLEN > 3) for(k=0; k<TLEN-3; k=k+1)\n"
"\tbegin : gencopies\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\tr_a[k+1] <= { {(LUTB){1\'b0}},\n"
"\t\t\t\tr_a[k][(IW-1-(2*LUTB)):LUTB] };\n"
"\t\t\tr_b[k+1] <= r_b[k];\n"
"\t\tend\n"
"\tend endgenerate\n"
"\n"
"\tgenerate // The actual multiply and accumulate stage\n"
"\tif (TLEN > 2) for(k=0; k<TLEN-2; k=k+1)\n"
"\tbegin : genstages\n"
"\t\t// First, the multiply: 2-bits times BW bits\n"
"\t\twire\t[(BW+LUTB-1):0] genp;\n"
"\t\tbimpy #(BW) genmpy(i_clk,i_ce,r_a[k][(LUTB-1):0],r_b[k], genp);\n"
"\n"
"\t\t// Then the accumulate step -- on the next clock\n"
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\tacc[k+1] <= acc[k] + {{(IW-LUTB*(k+3)){1\'b0}},\n"
"\t\t\t\t\tgenp, {(LUTB*(k+2)){1\'b0}} };\n"
"\tend endgenerate\n"
"\n"
"\twire [(IW+BW-1):0] w_r;\n"
"\tassign\tw_r = (r_s[TLEN-1]) ? (-acc[TLEN-2]) : acc[TLEN-2];\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\to_r <= w_r[(AW+BW-1):0];\n"
"\n"
"\tgenerate if (IW > AW)\n"
"\tbegin : VUNUSED\n"
"\t\t// verilator lint_off UNUSED\n"
"\t\twire\t[(IW-AW)-1:0]\tunused;\n"
"\t\tassign\tunused = w_r[(IW+BW-1):(AW+BW)];\n"
"\t\t// verilator lint_on UNUSED\n"
"\tend endgenerate\n"
"\n"
"endmodule\n");
 
fclose(fp);
}
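
The arithmetic behind the generated `bimpy` and `longbimpy` modules can be checked in software. The sketch below is an illustrative bit-level model, not part of the project: the names `bimpy_model`, `longbimpy_model`, and `models_match_multiply` are hypothetical, and the model ignores pipelining and `i_ce` entirely. It assumes `LUTB=2`, that `aw <= bw`, and that both widths are small enough for 64-bit arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Model of bimpy: a 2-bit by bw-bit unsigned multiply. The two selected rows
// are added as (x ^ y) + 2*(x & y), which is what w_r + {c, 2'b0} computes.
// Assumes 2 <= bw <= 32 and b < 2^bw.
uint64_t bimpy_model(unsigned a, uint64_t b, unsigned bw)
{
    const uint64_t x = (a & 2) ? (b << 1) : 0;   // row selected by i_a[1]
    const uint64_t y = (a & 1) ? b : 0;          // row selected by i_a[0]
    const uint64_t lowmask = (uint64_t(1) << (bw - 1)) - 1;

    const uint64_t w_r = x ^ y;                  // sum without carries
    const uint64_t c = ((a & 2) ? (b & lowmask) : 0)   // i_b[(BW-2):0]
                     & ((a & 1) ? (b >> 1) : 0);       // i_b[(BW-1):1]
    return w_r + (c << 2);
}

// Model of longbimpy: take absolute values, multiply two bits of u_a at a
// time against all of u_b, accumulate the shifted rows, then restore the sign.
int64_t longbimpy_model(int64_t a, int64_t b, unsigned aw, unsigned bw)
{
    const bool sgn = (a < 0) != (b < 0);
    const uint64_t u_a = (a < 0) ? uint64_t(-a) : uint64_t(a);
    const uint64_t u_b = (b < 0) ? uint64_t(-b) : uint64_t(b);

    uint64_t acc = 0;
    for (unsigned shift = 0; shift < aw; shift += 2) {
        const unsigned two_bits = unsigned((u_a >> shift) & 3);
        acc += bimpy_model(two_bits, u_b, bw) << shift;
    }
    return sgn ? -int64_t(acc) : int64_t(acc);
}

// Exhaustively compare the model against the built-in multiply for every
// signed aw-bit value of a and signed bw-bit value of b.
bool models_match_multiply(unsigned aw, unsigned bw)
{
    const int64_t amax = int64_t(1) << (aw - 1);
    const int64_t bmax = int64_t(1) << (bw - 1);
    for (int64_t a = -amax; a < amax; a++)
        for (int64_t b = -bmax; b < bmax; b++)
            if (longbimpy_model(a, b, aw, bw) != a * b)
                return false;
    return true;
}
```

Calling `models_match_multiply(5, 6)` walks every signed 5-bit by 6-bit input pair, including the most negative values, whose absolute value still fits because the intermediate results are unsigned.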
 
/trunk/sw/softmpy.h
0,0 → 1,47
////////////////////////////////////////////////////////////////////////////////
//
// Filename: softmpy.h
//
// Project: A General Purpose Pipelined FFT Implementation
//
// Purpose: If the chip doesn't have any hardware multiplies, you'll need
// a soft-multiply implementation. This provides that
// implementation.
//
// Creator: Dan Gisselquist, Ph.D.
// Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2018, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program. (It's in the $(ROOT)/doc directory, run make with no
// target there if the PDF file isn't present.) If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License: GPL, v3, as defined and found on www.gnu.org,
// http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
#ifndef SOFTMPY_H
#define SOFTMPY_H
 
extern void build_multiply(const char *fname);
extern void build_bimpy(const char *fname);
extern void build_longbimpy(const char *fname);
 
#endif // SOFTMPY_H
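
For comparison with the two-bit approach, the one-bit-per-stage multiply emitted by `build_multiply` (the `shiftaddmpy` module) can be modeled the same way. This is a minimal sketch under stated assumptions, not the project's code: `shiftaddmpy_model` is a hypothetical name, the loop stands in for the `AWIDTH` pipeline stages, and clocking and `i_ce` are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Model of shiftaddmpy: one bit of |a| is consumed per stage. Each stage adds
// the (shifted) |b| into the accumulator when the current low bit of |a| is
// set, then shifts |a| right and |b| left. The sign is restored at the end.
// Assumes a fits in awidth signed bits and the product fits in 64 bits.
int64_t shiftaddmpy_model(int64_t a, int64_t b, unsigned awidth)
{
    const bool sgn = (a < 0) != (b < 0);            // final sign, kept aside
    uint64_t r_a = (a < 0) ? uint64_t(-a) : uint64_t(a);
    uint64_t r_b = (b < 0) ? uint64_t(-b) : uint64_t(b);

    uint64_t acc = 0;
    for (unsigned k = 0; k < awidth; k++) {
        if (r_a & 1)
            acc += r_b;
        r_a >>= 1;
        r_b <<= 1;
    }
    return sgn ? -int64_t(acc) : int64_t(acc);
}
```

The loop runs `awidth` times regardless of the value of `a`, which mirrors why the generated comments recommend passing the narrower operand first: the stage count follows the width of the first operand.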
