URL: https://opencores.org/ocsvn/dblclockfft/dblclockfft/trunk
Subversion Repositories dblclockfft
Compare Revisions
- This comparison shows the changes necessary to convert path /dblclockfft from Rev 35 to Rev 36
Rev 35 → Rev 36
/trunk/README.md
1,19 → 1,32
# A Double-Clocked FFT Core Generator |
# A Generic Pipelined FFT Core Generator |
|
The Double Clocked FFT project contains all of the software necessary to |
create the IP to generate an arbitrary sized FFT that will clock two samples |
in at each clock cycle, and after some pipeline delay it will clock two |
samples out at every clock cycle. |
This generic pipelined FFT project contains all of the software necessary to |
create the IP to generate an arbitrary sized FFT. The FFT has been modified |
for operation in one of the following modes: |
|
The FFT generated by this approach is very configurable. By simple adjustment |
of a command line parameter, the FFT may be made to be a forward FFT or an |
- Two samples in per clock and, after some delay, two samples out per clock. |
This uses 6 multiplies per FFT stage in the butterflies. This was the purpose |
of the original `dblclkfft`. (Why double clock? I don't know. Double-sample |
FFT might've been a better name.) |
|
- One sample in per clock, with the `i_ce` line being high for every incoming |
sample--up to one sample per clock. There are also options to run with at |
least one clock between samples, or even two clocks between samples (or more). |
This mode uses 3, 2, or 1 multiplies per FFT stage respectively. |
|
- Eventually, I want to support a real FFT mode which will accept real-valued |
input samples and alternately produce real and imaginary output samples--or the |
converse for the inverse FFT. |
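The multiplier counts in the list above follow a simple pattern. A minimal sketch that reproduces them, assuming `ckpce` means clocks per input sample (the role the test benches' `FFT_CKPCE` macro appears to play); `mpys_per_stage` is a hypothetical helper, and the ceil(3/ckpce) rule is our own fit to the listed cases, not something stated by the author:

```cpp
// Hypothetical helper reproducing the per-stage multiplier counts listed above.
//   two_samples_per_clock: the original double-sample mode (6 multiplies per stage)
//   ckpce: clocks per input sample in single-sample mode (1 = every clock,
//          2 = one idle clock between samples, 3 = two idle clocks, ...)
// Assumption: the count is ceil(3 / ckpce), fitted to the README's list.
static int mpys_per_stage(bool two_samples_per_clock, int ckpce) {
    if (two_samples_per_clock)
        return 6;
    if (ckpce < 1)
        ckpce = 1;                   // treat invalid input as one sample per clock
    return (3 + ckpce - 1) / ckpce;  // integer ceil(3 / ckpce): 3, 2, 1, 1, ...
}
```

For example, one idle clock between samples (`ckpce` = 2) gives 2 multiplies per stage, and any slower rate bottoms out at 1.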
|
The FFT generated by this project is very configurable. By simply adjusting |
a command line parameter, the FFT created will be either a forward FFT or an |
inverse FFT. The numbers of bits processed, kept, and maintained by this |
FFT are also configurable. Even the number of bits used for the twiddle |
factors, and whether or not to bit-reverse the outputs, are configurable |
parts of this FFT core. |
|
These features make the Double Clocked FFT very different and unique among the |
other open HDL cores you may fine. |
These features make this open source pipelined FFT module very different |
and unique among the other open HDL cores you may find. |
|
For those who wish to get started right away, please download the package, |
change into the ``sw`` directory and run ``make``. There is no need to |
22,27 → 35,21
``fftgen`` to print a usage statement to the screen. Review the usage |
statement, and run ``fftgen`` a second time with the arguments you need. |
|
Alternatively, you _could_ read the specification. |
# Current State |
|
## Genesis |
This FFT comes from my attempts to design and implement a signal processing |
algorithm inside a generic FPGA, but only on a limited budget. As such, |
I don't yet have the FPGA board I wish to place this algorithm onto, neither |
do I have any expensive modeling or simulation capabilities. I'm using |
Verilator for my modeling and simulation needs. This makes |
using a vendor supplied IP core, such as an FFT, difficult if not impossible |
to use. |
This particular version of the FFT core now passes all my tests. It has |
yet to meet hardware to be finally verified. |
|
My problem was made worse when I learned that the published maximum clock |
speed for a device wasn't necessarily the maximum clock speed that I could |
achieve. My design needed to process the incoming signal at 500 MHz to be |
commercially viable. 500 MHz is not necessarily a clock speed |
that can be easily achieved. 250 MHz, on the other hand, is much more within |
the realm of possibility. Achieving a 500 MHz performance with a 250 MHz |
clock, however, requires an FFT that accepts two samples per clock. |
- The [FFT test bench](bench/cpp/fft_tb.cpp) doesn't yet have a threshold that |
adjusts with input parameters to determine success or failure. |
|
This, then, was and is the genesis of this project. |
- I haven't started on the real-only version of this FFT. |
|
While my previously stated goal was to continue working with this core until it |
has a real-FFT capability before releasing it back into the master branch, |
I'm actually so excited that I got it to this point that I'm going to move |
it from dev to master early, and come back later for the real-only version. |
|
# Commercial Applications |
|
Should you find the GPLv3 license insufficient for your needs, other licenses |
50,3 → 57,6
|
Likewise, please contact us should you wish to fund the further development |
of this core. |
|
Watch this space if you are interested in a release under another license. |
I'm thinking about relicensing this with a more permissive license. |
/trunk/bench/cpp/fftstage_o2048_tb.cpp
File deleted
/trunk/bench/cpp/dblrev_tb.cpp
File deleted
/trunk/bench/cpp/dblstage_tb.cpp
File deleted
/trunk/bench/cpp/Makefile
6,15 → 6,14
## |
## Purpose: This programs the build process for the test benches |
## associated with the double clocked FFT project. These |
## test benches are designed for the size and arguments of the |
## FFT as given by the Makefile in the trunk/sw directory, |
## although they shouldn't be too difficult to modify for |
## other FFT parameters. |
## test benches are designed for the size and arguments of the FFT as |
## given by the Makefile in the trunk/sw directory, although they shouldn't |
## be too difficult to modify for other FFT parameters. |
## |
## Please note that running these test benches requires access |
## to the *cmem_*.hex files found in trunk/sw/fft-core. I |
## usually soft link them into this directory, but such linking |
## is not currently part of this makefile or the build scripts. |
## Please note that running these test benches requires access to the |
## *cmem_*.hex files found in trunk/rtl. I usually soft link |
## them into this directory, but such linking is not currently part of |
## this makefile or the build scripts. |
## |
## Creator: Dan Gisselquist, Ph.D. |
## Gisselquist Technology, LLC |
21,7 → 20,7
## |
##########################################################################/ |
## |
## Copyright (C) 2015, Gisselquist Technology, LLC |
## Copyright (C) 2015,2018 Gisselquist Technology, LLC |
## |
## This program is free software (firmware): you can redistribute it and/or |
## modify it under the terms of the GNU General Public License as published |
43,10 → 42,11
## |
## |
##########################################################################/ |
all: mpy_tb dblrev_tb dblstage_tb qtrstage_tb fft_tb test |
all: mpy_tb bitreverse_tb hwbfly_tb butterfly_tb fftstage_tb fft_tb |
all: qtrstage_tb laststage_tb test |
|
OBJDR:= ../../sw/fft-core/obj_dir |
VSRCD = ../../sw/fft-core |
OBJDR:= ../../rtl/obj_dir |
VSRCD = ../../rtl |
TBODR:= ../rtl/obj_dir |
ifneq ($(VERILATOR_ROOT),) |
VERILATOR:=$(VERILATOR_ROOT)/bin/verilator |
60,24 → 60,25
VINC := -I$(VROOT)/include -I$(OBJDR)/ -I$(TBODR)/ |
# MPYLB:= $(OBJDR)/Vshiftaddmpy__ALL.a |
MPYLB:= $(OBJDR)/Vlongbimpy__ALL.a |
DBLRV:= $(OBJDR)/Vdblreverse__ALL.a |
DBLSG:= $(OBJDR)/Vdblstage__ALL.a |
BTREV:= $(OBJDR)/Vbitreverse__ALL.a |
STAGE:= $(OBJDR)/Vfftstage__ALL.a |
QTRSG:= $(OBJDR)/Vqtrstage__ALL.a |
LSTSG:= $(OBJDR)/Vlaststage__ALL.a |
BFLYL:= $(OBJDR)/Vbutterfly__ALL.a |
HWBFY:= $(OBJDR)/Vhwbfly__ALL.a |
FFTLB:= $(OBJDR)/Vfftmain__ALL.a |
IFTLB:= $(TBODR)/Vifft_tb__ALL.a |
STGLB:= $(OBJDR)/Vfftstage_o2048__ALL.a |
STGLB:= $(OBJDR)/Vfftstage__ALL.a |
VSRCS:= $(VROOT)/include/verilated.cpp $(VROOT)/include/verilated_vcd_c.cpp |
|
mpy_tb: mpy_tb.cpp fftsize.h twoc.h $(MPYLB) |
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(MPYLB) $(VSRCS) -o $@ |
|
dblrev_tb: dblrev_tb.cpp twoc.cpp twoc.h fftsize.h $(DBLRV) |
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(DBLRV) $(VSRCS) -o $@ |
bitreverse_tb: bitreverse_tb.cpp twoc.cpp twoc.h fftsize.h $(BTREV) |
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(BTREV) $(VSRCS) -o $@ |
|
dblstage_tb: dblstage_tb.cpp twoc.cpp twoc.h $(DBLSG) |
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(DBLSG) $(VSRCS) -o $@ |
laststage_tb: laststage_tb.cpp twoc.cpp twoc.h $(LSTSG) |
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(LSTSG) $(VSRCS) -o $@ |
|
qtrstage_tb: qtrstage_tb.cpp twoc.cpp twoc.h $(QTRSG) |
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(QTRSG) $(VSRCS) -o $@ |
88,7 → 89,7
hwbfly_tb: hwbfly_tb.cpp twoc.cpp twoc.h $(HWBFY) |
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(HWBFY) $(VSRCS) -o $@ |
|
fftstage_o2048_tb: fftstage_o2048_tb.cpp twoc.cpp twoc.h $(STGLB) |
fftstage_tb: fftstage_tb.cpp twoc.cpp twoc.h $(STGLB) |
g++ -g $(VINC) $(VDEFS) $< twoc.cpp $(STGLB) $(VSRCS) -o $@ |
|
fft_tb: fft_tb.cpp twoc.cpp twoc.h fftsize.h $(FFTLB) |
112,22 → 113,22
ln -s $(VSRCD)/*.hex . |
|
.PHONY: test |
test: mpy_tb dblrev_tb dblstage_tb qtrstage_tb butterfly_tb fftstage_o2048_tb |
test: mpy_tb bitreverse_tb fftstage_tb qtrstage_tb butterfly_tb |
test: fft_tb ifft_tb hwbfly_tb |
./mpy_tb |
./dblrev_tb |
./dblstage_tb |
./qtrstage_tb |
./bitreverse_tb |
./fftstage_tb |
echo ./qtrstage_tb |
./butterfly_tb |
./hwbfly_tb |
./fftstage_o2048_tb |
./fftstage_tb |
./fft_tb |
./ifft_tb |
|
.PHONY: clean |
clean: |
rm -f mpy_tb dblrev_tb dblstage_tb qtrstage_tb butterfly_tb |
rm -f fftstage_o2048_tb fft_tb ifft_tb hwbfly_tb |
rm -f mpy_tb bitreverse_tb fftstage_tb qtrstage_tb butterfly_tb |
rm -f laststage_tb fft_tb ifft_tb hwbfly_tb |
rm -rf fft_tb.dbl ifft_tb.dbl |
rm -rf *cmem_*.hex |
|
/trunk/bench/cpp/README.md
0,0 → 1,18
Here are the bench tests for the pipelined FFT. In general, there's a |
`*_tb.cpp` file corresponding to every unit within the FFT. Feel free to |
try them. |
|
Be aware, however, that [fft_tb](fft_tb.cpp) doesn't truly check for |
success. Although it calculates the expected answer, I just haven't gotten to |
the point of verifying that the FFT result is *close enough* to it. Instead, |
it creates a data file that can be read into Octave via [fft_tb.m](fft_tb.m), |
which shows the first test output. The second and subsequent outputs can be |
viewed by running `k=k+1;` followed by calling [plottst](plottst.m). |
|
As another note (before I clean things up more), you'll need the `*.hex` files |
in the directory from which you run [fft_tb](fft_tb.cpp) or |
[fftstage_tb](fftstage_tb.cpp). |
|
I expect the IFFT will work: it's just an FFT with conjugate twiddle factors, |
although I haven't fully tested it yet. |
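The note above describes the IFFT as just an FFT with conjugate twiddle factors. A minimal reference sketch of that relationship, using a plain O(N^2) DFT rather than the pipelined radix-2 structure the core generates (`dft` is a hypothetical helper name): running the conjugate-twiddle transform on a forward transform's output returns the original input, scaled by N because neither direction divides by N.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Direct O(N^2) DFT used as a reference model (hypothetical helper).
// Forward uses twiddles exp(-j*2*pi*k*t/N); inverse uses their complex
// conjugates exp(+j*2*pi*k*t/N). No 1/N scaling is applied in either direction.
static cvec dft(const cvec &x, bool inverse) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    cvec out(n);
    for (std::size_t k = 0; k < n; k++) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; t++) {
            // (k*t) % n keeps the angle within one period, like a twiddle-table index
            double angle = sign * 2.0 * pi * (double)((k * t) % n) / (double)n;
            acc += x[t] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}
```

With a length-4 input `x`, `dft(dft(x, false), true)` yields `4*x` to within floating-point rounding.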
/trunk/bench/cpp/bitreverse_tb.cpp
0,0 → 1,235
//////////////////////////////////////////////////////////////////////////// |
// |
// Filename: bitreverse_tb.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: A test-bench for the bitreversal stage of the pipelined |
// FFT. This file may be run autonomously. If so, the last line |
// output will either read "SUCCESS" on success, or some other failure |
// message otherwise. |
// |
// This file depends upon verilator to both compile, run, and therefore |
// test either snglbrev.v or dblreverse.v--depending on whether or not the |
// FFT handles one or two inputs per clock respectively. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015,2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
#include "verilated.h" |
#include "verilated_vcd_c.h" |
|
#include "fftsize.h" |
#include "Vbitreverse.h" |
|
#define FFTBITS TST_DBLREVERSE_LGSIZE |
#define FFTSIZE (1<<(FFTBITS)) |
#define FFTMASK (FFTSIZE-1) |
#define DATALEN (1<<(FFTBITS+1)) |
#define DATAMSK (DATALEN-1) |
#define PAGEMSK (FFTSIZE) |
|
#ifdef NEW_VERILATOR |
#define VVAR(A) bitreverse__DOT_ ## A |
#else |
#define VVAR(A) v__DOT_ ## A |
#endif |
|
typedef Vbitreverse TSTCLASS; |
|
#define iaddr VVAR(_wraddr) |
#define in_reset VVAR(_in_reset) |
|
VerilatedVcdC *trace = NULL; |
uint64_t m_tickcount = 0; |
|
void tick(TSTCLASS *brev) { |
m_tickcount++; |
|
brev->i_clk = 0; |
brev->eval(); |
if (trace) trace->dump((uint64_t)(10ul*m_tickcount-2)); |
brev->i_clk = 1; |
brev->eval(); |
if (trace) trace->dump((uint64_t)(10ul*m_tickcount)); |
brev->i_clk = 0; |
brev->eval(); |
if (trace) { |
trace->dump((uint64_t)(10ul*m_tickcount+5)); |
trace->flush(); |
} |
|
brev->i_ce = 0; |
} |
|
void cetick(TSTCLASS *brev) { |
brev->i_ce = 1; |
tick(brev); |
if (rand()&1) { |
brev->i_ce = 0; |
tick(brev); |
} |
} |
|
void reset(TSTCLASS *brev) { |
brev->i_ce = 0; |
brev->i_reset = 1; |
tick(brev); |
brev->i_ce = 0; |
brev->i_reset = 0; |
tick(brev); |
} |
|
unsigned long bitrev(const int nbits, const unsigned long vl) { |
unsigned long r = 0; |
unsigned long val = vl; |
|
for(int k=0; k<nbits; k++) { |
r <<= 1; |
r |= (val & 1); |
val >>= 1; |
} |
|
return r; |
} |
|
int main(int argc, char **argv, char **envp) { |
Verilated::commandArgs(argc, argv); |
Verilated::traceEverOn(true); |
TSTCLASS *brev = new TSTCLASS; |
int syncd = 0; |
unsigned long datastore[DATALEN], dataidx=0; |
const int BREV_OFFSET = 0; |
|
trace = new VerilatedVcdC; |
brev->trace(trace, 99); |
trace->open("bitreverse_tb.vcd"); |
|
reset(brev); |
|
printf("FFTSIZE = %08x\n", FFTSIZE); |
printf("FFTMASK = %08x\n", FFTMASK); |
printf("DATALEN = %08x\n", DATALEN); |
printf("DATAMSK = %08x\n", DATAMSK); |
|
for(int k=0; k<4*(FFTSIZE); k++) { |
brev->i_ce = 1; |
#ifdef DBLCLKFFT |
brev->i_in_0 = 2*k; |
brev->i_in_1 = 2*k+1; |
datastore[(dataidx++)&(DATAMSK)] = brev->i_in_0; |
datastore[(dataidx++)&(DATAMSK)] = brev->i_in_1; |
#else |
brev->i_in = k; |
datastore[(dataidx++)&(DATAMSK)] = brev->i_in; |
#endif |
tick(brev); |
|
printf("k=%3d: IN = %6lx, OUT = %6lx, SYNC = %d\t(%2x) %d\n", |
k, (unsigned long)brev->i_in, (unsigned long)brev->o_out, brev->o_sync, |
brev->iaddr, brev->in_reset); |
|
if ((k>BREV_OFFSET)&&((BREV_OFFSET==(k&FFTMASK))?1:0) != brev->o_sync) { |
fprintf(stdout, "FAIL, BAD SYNC (k = %d > %d)\n", k, BREV_OFFSET); |
exit(EXIT_FAILURE); |
} else if (brev->o_sync) { |
syncd = 1; |
} |
if ((syncd)&&((brev->o_out&FFTMASK) != bitrev(FFTBITS, k-BREV_OFFSET))) { |
fprintf(stdout, "FAIL: BITREV.0 of k (%2x) = %2lx, not %2lx\n", |
k, (unsigned long)brev->o_out, bitrev(FFTBITS, (k-BREV_OFFSET))); |
exit(EXIT_FAILURE); |
} |
} |
|
for(int k=0; k<4*(FFTSIZE); k++) { |
brev->i_ce = 1; |
#ifdef DBLCLKFFT |
brev->i_in_0 = rand() & 0x0ffffff; |
brev->i_in_1 = rand() & 0x0ffffff; |
datastore[(dataidx++)&(DATAMSK)] = brev->i_in_0; |
datastore[(dataidx++)&(DATAMSK)] = brev->i_in_1; |
#else |
brev->i_in = rand() & 0x0ffffff; |
datastore[(dataidx++)&(DATAMSK)] = brev->i_in; |
#endif |
tick(brev); |
|
#ifdef DBLCLKFFT |
printf("k=%3d: IN = %6lx : %6lx, OUT = %6lx : %6lx, SYNC = %d\n", |
k, (unsigned long)brev->i_in_0, (unsigned long)brev->i_in_1, |
(unsigned long)brev->o_out_0, (unsigned long)brev->o_out_1, brev->o_sync); |
#else |
printf("k=%3d: IN = %6lx, OUT = %6lx, SYNC = %d\n", |
k, (unsigned long)brev->i_in, (unsigned long)brev->o_out, brev->o_sync); |
#endif |
|
if (brev->o_sync) |
syncd = 1; |
#ifdef DBLCLKFFT |
if ((syncd)&&(brev->o_out_0 != datastore[(((dataidx-2-FFTSIZE)&PAGEMSK) + bitrev(FFTBITS, (dataidx-FFTSIZE-2)&FFTMASK))])) { |
fprintf(stdout, "FAIL: BITREV.0 of k (%2x) = %2lx, not %2lx (expected %lx -> %lx)\n", |
k, (unsigned long)brev->o_out_0, |
datastore[(((dataidx-2-FFTSIZE)&PAGEMSK) |
+ bitrev(FFTBITS, (dataidx-FFTSIZE-2)&FFTMASK))], |
(dataidx-2)&DATAMSK, |
(((dataidx-2)&PAGEMSK) |
+ bitrev(FFTBITS, (dataidx-FFTSIZE-2)&FFTMASK))); |
// exit(-1); |
} |
|
if ((syncd)&&(brev->o_out_1 != datastore[(((dataidx-2-FFTSIZE)&PAGEMSK) + bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK))])) { |
fprintf(stdout, "FAIL: BITREV.1 of k (%2x) = %2lx, not %2lx (expected %lx)\n", |
k, (unsigned long)brev->o_out_1, |
datastore[(((dataidx-2-FFTSIZE)&PAGEMSK) |
+ bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK))], |
(((dataidx-1)&PAGEMSK) |
+ bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK))); |
// exit(-1); |
} |
#else |
if ((syncd)&&(brev->o_out != datastore[ |
(((dataidx-1-FFTSIZE)&PAGEMSK) |
+ bitrev(FFTBITS, |
(dataidx-FFTSIZE-1)&FFTMASK))])) { |
fprintf(stdout, "FAIL: BITREV.0 of k (%2x) = %2lx, not %2lx (expected %lx -> %lx)\n", |
k, (unsigned long)brev->o_out, |
datastore[(((dataidx-1-FFTSIZE)&PAGEMSK) |
+ bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK))], |
(dataidx-2)&DATAMSK, |
(((dataidx-2)&PAGEMSK) |
+ bitrev(FFTBITS, (dataidx-FFTSIZE-1)&FFTMASK))); |
exit(EXIT_FAILURE); |
} |
#endif |
} |
|
delete brev; |
|
printf("SUCCESS!\n"); |
exit(0); |
} |
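The checks in `bitreverse_tb.cpp` compare each output against the previous FFTSIZE-sample page of stored inputs, indexed by the bit-reversed position within that page. A minimal reference model of one page, sketched under that reading (`bit_reverse` mirrors the test bench's `bitrev()`; `bitreverse_block` is a hypothetical name, and the one-page latency of the hardware is not modeled):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reverse the low nbits bits of v, like bitrev() in bitreverse_tb.cpp.
static unsigned long bit_reverse(int nbits, unsigned long v) {
    unsigned long r = 0;
    for (int k = 0; k < nbits; k++) {
        r = (r << 1) | (v & 1);  // shift in the current LSB of v
        v >>= 1;
    }
    return r;
}

// Reorder one block of 2^nbits samples: output k is input bit_reverse(k).
template <typename T>
static std::vector<T> bitreverse_block(const std::vector<T> &in, int nbits) {
    const std::size_t n = std::size_t(1) << nbits;
    if (in.size() != n)
        throw std::invalid_argument("block must hold exactly 2^nbits samples");
    std::vector<T> out(n);
    for (std::size_t k = 0; k < n; k++)
        out[k] = in[bit_reverse(nbits, k)];
    return out;
}
```

Because bit reversal is its own inverse, applying `bitreverse_block` twice restores the original order, which makes a convenient self-check.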
/trunk/bench/cpp/butterfly_tb.cpp
4,13 → 4,13
// |
// Project: A Doubletime Pipelined FFT |
// |
// Purpose: A test-bench for the butterfly.v subfile of the double |
// clocked FFT. This file may be run autonomously. If so, |
// the last line output will either read "SUCCESS" on success, |
// or some other failure message otherwise. |
// Purpose: A test-bench for the butterfly.v subfile of the generic |
// pipelined FFT. This file may be run autonomously. If so, |
// the last line output will either read "SUCCESS" on success, or some |
// other failure message otherwise. |
// |
// This file depends upon verilator to both compile, run, and |
// therefore test butterfly.v |
// This file depends upon verilator to both compile, run, and therefore |
// test butterfly.v |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
17,7 → 17,7
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// Copyright (C) 2015,2018 Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
42,11 → 42,18
#include <stdio.h> |
#include <stdint.h> |
|
#include "fftsize.h" |
#include "verilated.h" |
#include "verilated_vcd_c.h" |
#include "Vbutterfly.h" |
#include "verilated.h" |
#include "twoc.h" |
#include "fftsize.h" |
|
#ifdef NEW_VERILATOR |
#define VVAR(A) butterfly__DOT__ ## A |
#else |
#define VVAR(A) v__DOT_ ## A |
#endif |
|
#define IWIDTH TST_BUTTERFLY_IWIDTH |
#define CWIDTH TST_BUTTERFLY_CWIDTH |
#define OWIDTH TST_BUTTERFLY_OWIDTH |
55,24 → 62,55
class BFLY_TB { |
public: |
Vbutterfly *m_bfly; |
VerilatedVcdC *m_trace; |
unsigned long m_left[64], m_right[64]; |
bool m_aux[64]; |
int m_addr, m_lastaux, m_offset; |
bool m_syncd, m_waiting_for_sync_input; |
uint64_t m_tickcount; |
|
BFLY_TB(void) { |
Verilated::traceEverOn(true); |
m_trace = NULL; |
m_bfly = new Vbutterfly; |
m_addr = 0; |
m_syncd = 0; |
m_tickcount = 0; |
m_waiting_for_sync_input = true; |
} |
|
void opentrace(const char *vcdname) { |
if (!m_trace) { |
m_trace = new VerilatedVcdC; |
m_bfly->trace(m_trace, 99); |
m_trace->open(vcdname); |
} |
} |
|
void closetrace(void) { |
if (m_trace) { |
m_trace->close(); |
delete m_trace; |
m_trace = NULL; |
} |
} |
|
void tick(void) { |
m_tickcount++; |
|
m_lastaux = m_bfly->o_aux; |
m_bfly->i_clk = 0; |
m_bfly->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2)); |
m_bfly->i_clk = 1; |
m_bfly->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount)); |
m_bfly->i_clk = 0; |
m_bfly->eval(); |
if (m_trace) { |
m_trace->dump((uint64_t)(10ul*m_tickcount+5)); |
m_trace->flush(); |
} |
|
if ((!m_syncd)&&(m_bfly->o_aux)) |
m_offset = m_addr; |
79,14 → 117,33
m_syncd = (m_syncd) || (m_bfly->o_aux); |
} |
|
void cetick(void) { |
int ce = m_bfly->i_ce, nkce; |
|
tick(); |
|
nkce = (rand()&1); |
#ifdef FFT_CKPCE |
nkce += FFT_CKPCE; |
#endif |
|
if ((ce)&&(nkce > 0)) { |
m_bfly->i_ce = 0; |
for(int kce=0; kce<nkce-1; kce++) |
tick(); |
} |
|
m_bfly->i_ce = ce; |
} |
|
void reset(void) { |
m_bfly->i_ce = 0; |
m_bfly->i_rst = 1; |
m_bfly->i_reset = 1; |
m_bfly->i_coef = 0l; |
m_bfly->i_left = 0; |
m_bfly->i_right = 0; |
tick(); |
m_bfly->i_rst = 0; |
m_bfly->i_reset = 0; |
m_bfly->i_ce = 1; |
// |
// Let's run a RESET test here, forcing the whole butterfly |
93,19 → 150,18
// to be filled with aux=1. If the reset works right, |
// we'll never get an aux=1 output. |
// |
m_bfly->i_rst = 1; |
m_bfly->i_reset = 1; |
m_bfly->i_aux = 1; |
for(int i=0; i<200; i++) { |
m_bfly->i_ce = 1; |
tick(); |
} |
m_bfly->i_ce = 1; |
for(int i=0; i<200; i++) |
cetick(); |
|
// Now here's the RESET line, so let's see what the test does |
m_bfly->i_rst = 1; |
m_bfly->i_reset = 1; |
m_bfly->i_ce = 1; |
m_bfly->i_aux = 1; |
tick(); |
m_bfly->i_rst = 0; |
cetick(); |
m_bfly->i_reset = 0; |
m_syncd = 0; |
|
m_waiting_for_sync_input = true; |
124,41 → 180,42
} |
|
m_bfly->i_ce = 1; |
tick(); |
cetick(); |
|
if ((m_bfly->o_aux)&&(!m_lastaux)) |
printf("\n"); |
printf("n,k=%d,%3d: COEF=%010lx, LFT=%08x, RHT=%08x, A=%d, OLFT =%09lx, ORHT=%09lx, AUX=%d\n", |
printf("n,k=%d,%3d: COEF=%0*lx, LFT=%0*x, RHT=%0*x, A=%d, OLFT =%0*lx, ORHT=%0*lx, AUX=%d\n", |
n,k, |
m_bfly->i_coef & (~(-1l<<40)), |
m_bfly->i_left, |
m_bfly->i_right, |
(2*CWIDTH+3)/4, ubits(m_bfly->i_coef, 2*CWIDTH), |
(2*IWIDTH+3)/4, m_bfly->i_left, |
(2*IWIDTH+3)/4, m_bfly->i_right, |
m_bfly->i_aux, |
m_bfly->o_left, |
m_bfly->o_right, |
(2*OWIDTH+3)/4, (long)m_bfly->o_left, |
(2*OWIDTH+3)/4, (long)m_bfly->o_right, |
m_bfly->o_aux); |
|
if ((m_syncd)&&(m_left[(m_addr-m_offset)&(64-1)] != m_bfly->o_left)) { |
printf("WRONG O_LEFT! (%lx(exp) != %lx(sut))\n", |
printf("WRONG O_LEFT! (%lx(exp) != %lx(sut))\n", |
m_left[(m_addr-m_offset)&(64-1)], |
m_bfly->o_left); |
exit(-1); |
(long)m_bfly->o_left); |
exit(EXIT_FAILURE); |
} |
|
if ((m_syncd)&&(m_right[(m_addr-m_offset)&(64-1)] != m_bfly->o_right)) { |
printf("WRONG O_RIGHT (%10lx(exp) != (%10lx(sut))!\n", |
m_right[(m_addr-m_offset)&(64-1)], m_bfly->o_right); |
exit(-1); |
printf("WRONG O_RIGHT! (%lx(exp) != %lx(sut))\n", |
m_right[(m_addr-m_offset)&(64-1)], |
(long)m_bfly->o_right); |
exit(EXIT_FAILURE); |
} |
|
if ((m_syncd)&&(m_aux[(m_addr-m_offset)&(64-1)] != m_bfly->o_aux)) { |
printf("FAILED AUX CHANNEL TEST (i.e. the SYNC)\n"); |
exit(-1); |
exit(EXIT_FAILURE); |
} |
|
if ((m_addr > TST_BUTTERFLY_MPYDELAY+6)&&(!m_syncd)) { |
printf("NO SYNC PULSE!\n"); |
// exit(-1); |
exit(EXIT_FAILURE); |
} |
|
// Now, let's calculate an "expected" result ... |
241,6 → 298,18
} |
}; |
|
long gentestword(int w, int al, int ar) { |
unsigned long lo, hi, r; |
hi = ((unsigned long)(al&0x0c))<<(w-4); |
hi += (al&3)-2ul; |
|
lo = ((unsigned long)(ar&0x0c))<<(w-4); |
lo += (ar&3)-2ul; |
|
r = (ubits(hi, w) << w) | (ubits(lo, w)); |
return r; |
} |
|
int main(int argc, char **argv, char **envp) { |
Verilated::commandArgs(argc, argv); |
BFLY_TB *bfly = new BFLY_TB; |
251,13 → 320,32
|
const int TESTSZ = 256; |
|
bfly->opentrace("butterfly.vcd"); |
|
bfly->reset(); |
|
// #define ZEROTEST |
#define ZEROTEST bfly->test(9,0,0x0000000000l,0x00000000,0x00000000, 0) |
// Test whether or not the aux channel starts clear, like it's supposed to |
|
bfly->test(9,0,0x4000000000l,0x000f0000,0x00000000, 1); |
ZEROTEST; |
ZEROTEST; |
bfly->test(9,0,0x4000000000l,0x00000000,0x000f0000, 0); |
ZEROTEST; |
ZEROTEST; |
bfly->test(9,0,0x4000000000l,0x000f0000,0x000f0000, 0); |
ZEROTEST; |
ZEROTEST; |
bfly->test(9,1,0x4000000000l,0x000f0000,0xfff10000, 0); |
ZEROTEST; |
ZEROTEST; |
bfly->test(9,2,0x4000000000l,0x0000000f,0x0000fff1, 0); |
ZEROTEST; |
ZEROTEST; |
bfly->test(9,3,0x4000000000l,0x0000000f,0x0000000f, 0); |
ZEROTEST; |
ZEROTEST; |
|
bfly->test(9,0,0x4000000000l,0x7fff0000,0x7fff0000, 1); |
bfly->test(9,1,0x4000000000l,0x7fff0000,0x80010000, 0); |
337,6 → 425,30
bfly->test(n,k, cof, lft, rht, aux); |
} |
|
int k = TESTSZ; |
// Exhaustively test |
#if (4*IWIDTH+2*CWIDTH <= 24) |
for(int a=0; a<(1<<(2*IWIDTH)); a++) |
for(int b=0; b<(1<<(2*IWIDTH)); b++) |
for(int c=0; c<(1<<(2*CWIDTH)); c++) |
bfly->test(0, k++, c, a, b, 0); |
|
printf("Exhaust complete\n"); |
#else |
for(int al=0; al<16; al++) |
for(int ar=0; ar<16; ar++) |
for(int bl=0; bl<16; bl++) |
for(int br=0; br<16; br++) |
for(int cl=0; cl<16; cl++) |
for(int cr=0; cr<16; cr++) { |
long a = gentestword(IWIDTH, al, ar); |
long b = gentestword(IWIDTH, bl, br); |
long c = gentestword(CWIDTH, cl, cr); |
bfly->test(0, k++, c, a, b, 0); |
} |
printf("Partial exhaust complete\n"); |
#endif |
|
delete bfly; |
|
printf("SUCCESS!\n"); |
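The butterfly test vectors above pack each complex value into one word with the real part in the upper half and the imaginary part in the lower half, the same order `fft_tb.cpp` uses with `(ilft_r << IWIDTH) | ilft_i`; with a 16-bit IWIDTH, `0x7fff0000` would be 32767 + 0j. The sketch below decodes such words and applies a generic radix-2 decimation-in-frequency butterfly (sum on the left output, twiddled difference on the right). That butterfly form is an assumption about what `butterfly.v` computes, the sketch ignores its fixed-point scaling and rounding, and `unpack` and `dif_butterfly` are hypothetical names.

```cpp
#include <complex>
#include <cstdint>

// Split a packed 2*w-bit word into signed real (upper w bits) and imaginary
// (lower w bits) parts, sign-extending each w-bit two's-complement field.
static std::complex<double> unpack(uint64_t word, int w) {
    const uint64_t mask = (uint64_t(1) << w) - 1;
    int64_t re = (int64_t)((word >> w) & mask);
    int64_t im = (int64_t)(word & mask);
    const int64_t sign = int64_t(1) << (w - 1);
    if (re & sign) re -= int64_t(1) << w;  // negative real part
    if (im & sign) im -= int64_t(1) << w;  // negative imaginary part
    return {(double)re, (double)im};
}

struct BflyOut {
    std::complex<double> left, right;
};

// Radix-2 decimation-in-frequency butterfly, ideal arithmetic (assumed form):
//   left'  = left + right
//   right' = (left - right) * coef
static BflyOut dif_butterfly(std::complex<double> left,
                             std::complex<double> right,
                             std::complex<double> coef) {
    return {left + right, (left - right) * coef};
}
```

Under these assumptions, `0xfff10000` decodes to -15 + 0j and `0x0000fff1` to 0 - 15j, matching the small negative values some of the test vectors above appear to exercise.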
/trunk/bench/cpp/fft_tb.cpp
5,12 → 5,11
// |
// Purpose: A test-bench for the main program, fftmain.v, of the double |
// clocked FFT. This file may be run autonomously (when |
// fully functional). If so, the last line output will either |
// read "SUCCESS" on success, or some other failure message |
// otherwise. |
// fully functional). If so, the last line output will either read |
// "SUCCESS" on success, or some other failure message otherwise. |
// |
// This file depends upon verilator to both compile, run, and |
// therefore test fftmain.v |
// This file depends upon verilator to both compile, run, and therefore |
// test fftmain.v |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
17,7 → 16,7
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// Copyright (C) 2015,2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
40,10 → 39,12
// |
/////////////////////////////////////////////////////////////////////////// |
#include <stdio.h> |
#include <stdlib.h> |
#include <math.h> |
#include <fftw3.h> |
|
#include "verilated.h" |
#include "verilated_vcd_c.h" |
#include "Vfftmain.h" |
#include "twoc.h" |
|
56,8 → 57,11
#define VVAR(A) v__DOT_ ## A |
#endif |
|
|
#ifdef DBLCLKFFT |
#define revstage_iaddr VVAR(_revstage__DOT__iaddr) |
#else |
#define revstage_iaddr VVAR(_revstage__DOT__wraddr) |
#endif |
#define br_sync VVAR(_br_sync) |
#define br_started VVAR(_r_br_started) |
#define w_s2048 VVAR(_w_s2048) |
119,9 → 123,11
double *m_fft_buf; |
bool m_syncd; |
unsigned long m_tickcount; |
VerilatedVcdC* m_trace; |
|
FFT_TB(void) { |
m_fft = new Vfftmain; |
Verilated::traceEverOn(true); |
m_iaddr = m_oaddr = 0; |
m_dumpfp = NULL; |
m_trace = NULL; |
|
131,39 → 137,75
FFTW_FORWARD, FFTW_MEASURE); |
m_syncd = false; |
m_ntest = 0; |
} |
|
m_tickcount = 0l; |
~FFT_TB(void) { |
closetrace(); |
delete m_fft; |
m_fft = NULL; |
} |
|
virtual void opentrace(const char *vcdname) { |
if (!m_trace) { |
m_trace = new VerilatedVcdC; |
m_fft->trace(m_trace, 99); |
m_trace->open(vcdname); |
} |
} |
|
virtual void closetrace(void) { |
if (m_trace) { |
m_trace->close(); |
delete m_trace; |
m_trace = NULL; |
} |
} |
|
void tick(void) { |
if ((!m_fft->i_ce)||(m_fft->i_rst)) |
m_tickcount++; |
if (m_fft->i_reset) |
printf("TICK(%s,%s)\n", |
(m_fft->i_rst)?"RST":" ", |
(m_fft->i_reset)?"RST":" ", |
(m_fft->i_ce)?"CE":" "); |
|
m_fft->i_clk = 0; |
m_fft->eval(); |
if (m_trace) |
m_trace->dump((vluint64_t)(10*m_tickcount-2)); |
m_fft->i_clk = 1; |
m_fft->eval(); |
if (m_trace) |
m_trace->dump((vluint64_t)(10*m_tickcount)); |
m_fft->i_clk = 0; |
m_fft->eval(); |
if (m_trace) { |
m_trace->dump((vluint64_t)(10*m_tickcount+5)); |
m_trace->flush(); |
} |
} |
|
m_tickcount++; |
void cetick(void) { |
int ce = m_fft->i_ce, nkce; |
tick(); |
|
/* |
int nrpt = (rand()&0x01f) + 1; |
m_fft->i_ce = 0; |
for(int i=0; i<nrpt; i++) { |
m_fft->i_clk = 0; |
m_fft->eval(); |
m_fft->i_clk = 1; |
m_fft->eval(); |
nkce = (rand()&1); |
#ifdef FFT_CKPCE |
nkce += FFT_CKPCE; |
#endif |
if ((ce)&&(nkce>0)) { |
m_fft->i_ce = 0; |
for(int kce=1; kce < nkce; kce++) |
tick(); |
} |
*/ |
|
m_fft->i_ce = ce; |
} |
|
void reset(void) { |
m_fft->i_ce = 0; |
m_fft->i_rst = 1; |
m_fft->i_reset = 1; |
tick(); |
m_fft->i_rst = 0; |
m_fft->i_reset = 0; |
tick(); |
|
m_iaddr = m_oaddr = m_logbase = 0; |
254,18 → 296,19
printf("%3d : SCALE = %12.6f, WT = %18.1f, ISQ = %15.1f, ", |
m_ntest, scale, wt, isq); |
printf("OSQ = %18.1f, ", osq); |
printf("XISQ = %18.1f\n", xisq); |
printf("XISQ = %18.1f, sqrt = %9.2f\n", xisq, sqrt(xisq)); |
if (xisq > 1.4 * FFTLEN/2) { |
printf("TEST FAIL!! Result is out of bounds from "); |
printf("expected result with FFTW3.\n"); |
// exit(-2); |
// exit(EXIT_FAILURE); |
} |
m_ntest++; |
} |
|
#ifdef DBLCLKFFT |
bool test(ITYP lft, ITYP rht) { |
m_fft->i_ce = 1; |
m_fft->i_rst = 0; |
m_fft->i_reset = 0; |
m_fft->i_left = lft; |
m_fft->i_right = rht; |
|
272,7 → 315,7
m_log[(m_iaddr++)&(NFTLOG*FFTLEN-1)] = lft; |
m_log[(m_iaddr++)&(NFTLOG*FFTLEN-1)] = rht; |
|
tick(); |
cetick(); |
|
if (m_fft->o_sync) { |
if (!m_syncd) { |
339,7 → 382,81
|
return (m_fft->o_sync); |
} |
#else |
bool test(ITYP data) { |
m_fft->i_ce = 1; |
m_fft->i_reset = 0; |
m_fft->i_sample = data; |
|
m_log[(m_iaddr++)&(NFTLOG*FFTLEN-1)] = data; |
|
cetick(); |
|
if (m_fft->o_sync) { |
if (!m_syncd) { |
m_syncd = true; |
printf("ORIGINAL SYNC AT 0x%lx, m_oaddr set to 0x%x\n", m_tickcount, m_oaddr); |
m_logbase = m_iaddr; |
} else printf("RESYNC AT %lx\n", m_tickcount); |
m_oaddr &= ~((1<<LGWIDTH)-1); |
} else m_oaddr += 1; |
|
printf("%8x,%5d: %08x -> %011lx\t", |
m_iaddr, m_oaddr, data, m_fft->o_result); |
|
#ifndef APPLY_BITREVERSE_LOCALLY |
printf(" [%3x]%s", m_fft->revstage_iaddr, |
(m_fft->br_sync)?"S" |
:((m_fft->br_started)?".":"x")); |
#endif |
|
printf(" "); |
#if (FFT_SIZE>=2048) |
printf("%s", (m_fft->w_s2048)?"S":"-"); |
#endif |
#if (FFT_SIZE>1024) |
printf("%s", (m_fft->w_s1024)?"S":"-"); |
#endif |
#if (FFT_SIZE>512) |
printf("%s", (m_fft->w_s512)?"S":"-"); |
#endif |
#if (FFT_SIZE>256) |
printf("%s", (m_fft->w_s256)?"S":"-"); |
#endif |
#if (FFT_SIZE>128) |
printf("%s", (m_fft->w_s128)?"S":"-"); |
#endif |
#if (FFT_SIZE>64) |
printf("%s", (m_fft->w_s64)?"S":"-"); |
#endif |
#if (FFT_SIZE>32) |
printf("%s", (m_fft->w_s32)?"S":"-"); |
#endif |
#if (FFT_SIZE>16) |
printf("%s", (m_fft->w_s16)?"S":"-"); |
#endif |
#if (FFT_SIZE>8) |
printf("%s", (m_fft->w_s8)?"S":"-"); |
#endif |
#if (FFT_SIZE>4) |
printf("%s", (m_fft->w_s4)?"S":"-"); |
#endif |
|
printf(" %s%s\n", |
(m_fft->o_sync)?"\t(SYNC!)":"", |
(m_fft->o_result)?" (NZ)":""); |
|
m_data[(m_oaddr )&(FFTLEN-1)] = m_fft->o_result; |
|
if ((m_syncd)&&((m_oaddr&(FFTLEN-1)) == FFTLEN-1)) { |
dumpwrite(); |
checkresults(); |
} |
|
return (m_fft->o_sync); |
} |
#endif |
|
bool test(double lft_r, double lft_i, double rht_r, double rht_i) { |
ITYP ilft, irht, ilft_r, ilft_i, irht_r, irht_i; |
|
351,7 → 468,12
ilft = (ilft_r << IWIDTH) | ilft_i; |
irht = (irht_r << IWIDTH) | irht_i; |
|
#ifdef DBLCLKFFT |
return test(ilft, irht); |
#else |
test(ilft); |
return test(irht); |
#endif |
} |
|
double rdata(int addr) { |
405,6 → 527,7
exit(-1); |
} |
|
fft->opentrace("fft.vcd"); |
fft->reset(); |
|
{ |
414,7 → 537,8
fft->dump(fpout); |
|
// 1. |
fft->test(0.0, 0.0, 32767.0, 0.0); |
double maxv = ((1l<<(IWIDTH-1))-1l); |
fft->test(0.0, 0.0, maxv, 0.0); |
for(int k=0; k<FFTLEN/2-1; k++) |
fft->test(0.0,0.0,0.0,0.0); |
|
422,27 → 546,27
for(int k=0; k<FFTLEN/2; k++) { |
double cl, cr, sl, sr, W; |
W = - 2.0 * M_PI / FFTLEN * (1); |
cl = cos(W * (2*k )) * 16383.0; |
sl = sin(W * (2*k )) * 16383.0; |
cr = cos(W * (2*k+1)) * 16383.0; |
sr = sin(W * (2*k+1)) * 16383.0; |
cl = cos(W * (2*k )) * (double)((1l<<(IWIDTH-2))-1l); |
sl = sin(W * (2*k )) * (double)((1l<<(IWIDTH-2))-1l); |
cr = cos(W * (2*k+1)) * (double)((1l<<(IWIDTH-2))-1l); |
sr = sin(W * (2*k+1)) * (double)((1l<<(IWIDTH-2))-1l); |
fft->test(cl, sl, cr, sr); |
} |
|
// 2. |
fft->test(32767.0, 0.0, 32767.0, 0.0); |
fft->test(maxv, 0.0, maxv, 0.0); |
for(int k=0; k<FFTLEN/2-1; k++) |
fft->test(0.0,0.0,0.0,0.0); |
|
// 3. |
fft->test(0.0,0.0,0.0,0.0); |
fft->test(32767.0, 0.0, 0.0, 0.0); |
fft->test(maxv, 0.0, 0.0, 0.0); |
for(int k=0; k<FFTLEN/2-1; k++) |
fft->test(0.0,0.0,0.0,0.0); |
|
// 4. |
for(int k=0; k<8; k++) |
fft->test(32767.0, 0.0, 32767.0, 0.0); |
fft->test(maxv, 0.0, maxv, 0.0); |
for(int k=8; k<FFTLEN/2; k++) |
fft->test(0.0,0.0,0.0,0.0); |
|
449,7 → 573,7
// 5. |
if (FFTLEN/2 >= 16) { |
for(int k=0; k<16; k++) |
fft->test(32767.0, 0.0, 32767.0, 0.0); |
fft->test(maxv, 0.0, maxv, 0.0); |
for(int k=16; k<FFTLEN/2; k++) |
fft->test(0.0,0.0,0.0,0.0); |
} |
457,7 → 581,7
// 6. |
if (FFTLEN/2 >= 32) { |
for(int k=0; k<32; k++) |
fft->test(32767.0, 0.0, 32767.0, 0.0); |
fft->test(maxv, 0.0, maxv, 0.0); |
for(int k=32; k<FFTLEN/2; k++) |
fft->test(0.0,0.0,0.0,0.0); |
} |
465,7 → 589,7
// 7. |
if (FFTLEN/2 >= 64) { |
for(int k=0; k<64; k++) |
fft->test(32767.0, 0.0, 32767.0, 0.0); |
fft->test(maxv, 0.0, maxv, 0.0); |
for(int k=64; k<FFTLEN/2; k++) |
fft->test(0.0,0.0,0.0,0.0); |
} |
472,7 → 596,7
|
if (FFTLEN/2 >= 128) { |
for(int k=0; k<128; k++) |
fft->test(32767.0, 0.0, 32767.0, 0.0); |
fft->test(maxv, 0.0, maxv, 0.0); |
for(int k=128; k<FFTLEN/2; k++) |
fft->test(0.0,0.0,0.0,0.0); |
} |
479,7 → 603,7
|
if (FFTLEN/2 >= 256) { |
for(int k=0; k<256; k++) |
fft->test(32767.0, 0.0, 32767.0, 0.0); |
fft->test(maxv, 0.0, maxv, 0.0); |
for(int k=256; k<FFTLEN/2; k++) |
fft->test(0.0,0.0,0.0,0.0); |
} |
486,7 → 610,7
|
if (FFTLEN/2 >= 512) { |
for(int k=0; k<256+128; k++) |
fft->test(32767.0, 0.0, 32767.0, 0.0); |
fft->test(maxv, 0.0, maxv, 0.0); |
for(int k=256+128; k<FFTLEN/2; k++) |
fft->test(0.0,0.0,0.0,0.0); |
} |
603,22 → 727,22
|
// 65. |
for(int k=0; k<FFTLEN/2; k++) |
fft->test(32767.0,0.0,-32767.0,0.0); |
fft->test(maxv,0.0,-maxv,0.0); |
// 66. |
for(int k=0; k<FFTLEN/2; k++) |
fft->test(0.0,-32767.0,0.0,32767.0); |
fft->test(0.0,-maxv,0.0,maxv); |
// 67. |
for(int k=0; k<FFTLEN/2; k++) |
fft->test(-32768.0,-32768.0,-32768.0,-32768.0); |
fft->test(-maxv,-maxv,-maxv,-maxv); |
// 68. |
for(int k=0; k<FFTLEN/2; k++) |
fft->test(0.0,-32767.0,0.0,32767.0); |
fft->test(0.0,-maxv,0.0,maxv); |
// 69. |
for(int k=0; k<FFTLEN/2; k++) |
fft->test(0.0,32767.0,0.0,-32767.0); |
fft->test(0.0,maxv,0.0,-maxv); |
// 70. |
for(int k=0; k<FFTLEN/2; k++) |
fft->test(-32768.0,-32768.0,-32768.0,-32768.0); |
fft->test(-maxv,-maxv,-maxv,-maxv); |
|
// 71. Now let's go for an impulse (SUCCESS) |
fft->test(16384.0, 0.0, 0.0, 0.0); |
722,8 → 846,16
|
fclose(fpout); |
|
if (!fft->m_syncd) { |
printf("FAIL -- NO SYNC\n"); |
goto test_failure; |
} |
|
printf("SUCCESS!!\n"); |
exit(0); |
test_failure: |
printf("TEST FAILED!!\n"); |
exit(0); |
} |
|
|
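In this revision the FFT test vectors stop hard-coding 16-bit constants (32767.0, 16383.0) and derive them from IWIDTH instead, via `maxv = (1<<(IWIDTH-1))-1` and `(1<<(IWIDTH-2))-1`. The minimal sketch below shows that arithmetic together with the real/imaginary packing that `test()` performs with `(re << IWIDTH) | im`. The helper names are hypothetical, and masking each half to IWIDTH bits is an assumption about the conversion lines elided from this hunk.

```cpp
#include <cassert>
#include <cstdint>

// Largest positive iw-bit two's-complement value, 2^(iw-1) - 1.
// For iw = 16 this is 32767, the value the old test bench hard-coded.
int64_t full_scale(int iw) {
    assert(iw >= 2 && iw <= 32);
    return (int64_t(1) << (iw - 1)) - 1;
}

// 2^(iw-2) - 1: the sine-wave amplitude (16383 when iw = 16).
int64_t sine_amplitude(int iw) {
    assert(iw >= 2 && iw <= 32);
    return (int64_t(1) << (iw - 2)) - 1;
}

// Pack one complex sample into 2*iw bits: real part in the upper iw bits,
// imaginary part in the lower iw bits. Each half is truncated to iw bits of
// two's complement so a negative value cannot spill into the other half.
uint64_t pack_complex(int64_t re, int64_t im, int iw) {
    assert(iw >= 1 && iw <= 32);
    const uint64_t mask = (uint64_t(1) << iw) - 1;
    return ((uint64_t(re) & mask) << iw) | (uint64_t(im) & mask);
}
```

For example, `pack_complex(1, -1, 16)` yields `0x1ffff`: the real half holds 1 and the imaginary half holds the 16-bit pattern for -1.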
/trunk/bench/cpp/fftstage_tb.cpp
0,0 → 1,369
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: fftstage_tb.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: A test-bench for a generic FFT stage which has been |
// instantiated by fftgen. Without loss of (much) generality, |
// we'll examine the 2048 fftstage.v. This file may be run autonomously. |
// If so, the last line output will either read "SUCCESS" on success, or |
// some other failure message otherwise. Likewise the exit code will |
// also indicate success (exit(0)) or failure (anything else). |
// |
// This file depends upon verilator to both compile, run, and therefore |
// test fftstage.v. Also, you'll need to place a copy of the cmem_*2048 |
// hex file into the directory where you run this test bench. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#include "Vfftstage.h" |
#include "verilated.h" |
#include "verilated_vcd_c.h" |
#include "twoc.h" |
#include "fftsize.h" |
|
|
#ifdef NEW_VERILATOR |
#define VVAR(A) fftstage__DOT_ ## A |
#else |
#define VVAR(A) v__DOT_ ## A |
#endif |
|
#define cmem VVAR(_cmem) |
#define iaddr VVAR(_iaddr) |
|
#define FFTBITS (FFT_LGWIDTH) |
#define FFTLEN (1<<FFTBITS) |
#define FFTSIZE FFTLEN |
#define FFTMASK (FFTLEN-1) |
#define IWIDTH FFT_IWIDTH |
#define CWIDTH 20 |
#define OWIDTH (FFT_IWIDTH+1) |
#define BFLYSHIFT 0 |
#define LGWIDTH (FFT_LGWIDTH) |
#ifdef DBLCLKFFT |
#define LGSPAN (LGWIDTH-2) |
#else |
#define LGSPAN (LGWIDTH-1) |
#endif |
#define ROUND true |
|
#define SPANLEN (1<<LGSPAN) |
#define SPANMASK (SPANLEN-1) |
#define DBLSPANLEN (1<<(LGSPAN+4)) |
#define DBLSPANMASK (DBLSPANLEN-1) |
|
class FFTSTAGE_TB { |
public: |
Vfftstage *m_ftstage; |
VerilatedVcdC *m_trace; |
long m_oaddr, m_iaddr; |
long m_vals[SPANLEN], m_out[DBLSPANLEN]; |
bool m_syncd; |
int m_offset; |
uint64_t m_tickcount; |
|
FFTSTAGE_TB(void) { |
Verilated::traceEverOn(true); |
m_trace = NULL; |
m_ftstage = new Vfftstage; |
m_syncd = false; |
m_iaddr = m_oaddr = 0; |
m_offset = 0; |
m_tickcount = 0; |
} |
|
void opentrace(const char *vcdname) { |
if (!m_trace) { |
m_trace = new VerilatedVcdC; |
m_ftstage->trace(m_trace, 99); |
m_trace->open(vcdname); |
} |
} |
|
void closetrace(void) { |
if (m_trace) { |
m_trace->close(); |
delete m_trace; |
m_trace = NULL; |
} |
} |
|
void tick(void) { |
m_tickcount++; |
|
m_ftstage->i_clk = 0; |
m_ftstage->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2)); |
m_ftstage->i_clk = 1; |
m_ftstage->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount)); |
m_ftstage->i_clk = 0; |
m_ftstage->eval(); |
if (m_trace) { |
m_trace->dump((uint64_t)(10ul*m_tickcount+5)); |
m_trace->flush(); |
} |
} |
|
void cetick(void) { |
int ce = m_ftstage->i_ce, nkce; |
|
tick(); |
nkce = 0; // (rand()&1); |
#ifdef FFT_CKPCE |
nkce += FFT_CKPCE; |
#endif |
if ((ce)&&(nkce > 0)) { |
m_ftstage->i_ce = 0; |
for(int kce = 1; kce < nkce; kce++) |
tick(); |
} |
|
m_ftstage->i_ce = ce; |
} |
|
void reset(void) { |
m_ftstage->i_ce = 0; |
m_ftstage->i_reset = 1; |
tick(); |
|
// Let's give it several ticks with no sync |
m_ftstage->i_ce = 0; |
m_ftstage->i_reset = 0; |
for(int i=0; i<8192; i++) { |
m_ftstage->i_data = rand(); |
m_ftstage->i_sync = 0; |
m_ftstage->i_ce = 1; |
|
cetick(); |
|
assert(m_ftstage->o_sync == 0); |
} |
|
m_iaddr = 0; |
m_oaddr = 0; |
m_offset = 0; |
m_syncd = false; |
} |
|
void butterfly(const long cv, const long lft, const long rht, |
long &o_lft, long &o_rht) { |
long cv_r, cv_i; |
long lft_r, lft_i, rht_r, rht_i; |
long o_lft_r, o_lft_i, o_rht_r, o_rht_i; |
|
cv_r = sbits(cv>>CWIDTH, CWIDTH); |
cv_i = sbits(cv, CWIDTH); |
|
lft_r = sbits(lft>>IWIDTH, IWIDTH); |
lft_i = sbits(lft, IWIDTH); |
|
rht_r = sbits(rht>>IWIDTH, IWIDTH); |
rht_i = sbits(rht, IWIDTH); |
|
o_lft_r = lft_r + rht_r; |
o_lft_i = lft_i + rht_i; |
|
o_lft_r &= (~(-1l << OWIDTH)); |
o_lft_i &= (~(-1l << OWIDTH)); |
|
// o_lft_r >>= 1; |
// o_lft_i >>= 1; |
o_lft = (o_lft_r << OWIDTH) | (o_lft_i); |
|
o_rht_r = (cv_r * (lft_r-rht_r)) - (cv_i * (lft_i-rht_i)); |
o_rht_i = (cv_r * (lft_i-rht_i)) + (cv_i * (lft_r-rht_r)); |
|
if (ROUND) { |
if (o_rht_r & (1<<(CWIDTH-3))) |
o_rht_r += (1<<(CWIDTH-3))-1; |
if (o_rht_i & (1<<(CWIDTH-3))) |
o_rht_i += (1<<(CWIDTH-3))-1; |
} |
|
o_rht_r >>= (CWIDTH-2); |
o_rht_i >>= (CWIDTH-2); |
|
o_rht_r &= (~(-1l << OWIDTH)); |
o_rht_i &= (~(-1l << OWIDTH)); |
o_rht = (o_rht_r << OWIDTH) | (o_rht_i); |
|
/* |
printf("%10lx %10lx %10lx -> %10lx %10lx\n", |
cv & ((1l<<(2*CWIDTH))-1l), |
lft & ((1l<<(2*IWIDTH))-1l), |
rht & ((1l<<(2*IWIDTH))-1l), |
o_lft & ((1l<<(2*OWIDTH))-1l), |
o_rht & ((1l<<(2*OWIDTH))-1l)); |
*/ |
} |
|
void test(bool i_sync, long i_data) { |
long cv; |
bool bc; |
int raddr; |
bool failed = false; |
|
m_ftstage->i_reset = 0; |
m_ftstage->i_ce = 1; |
m_ftstage->i_sync = i_sync; |
i_data &= (~(-1l<<(2*IWIDTH))); |
m_ftstage->i_data = i_data; |
|
cv = m_ftstage->cmem[m_iaddr & SPANMASK]; |
bc = m_iaddr & (1<<LGSPAN); |
if (!bc) |
m_vals[m_iaddr & (SPANMASK)] = i_data; |
else { |
int waddr = m_iaddr ^ (1<<LGSPAN); |
waddr &= (DBLSPANMASK); |
if (m_iaddr & (1<<(LGSPAN+1))) |
waddr |= (1<<(LGSPAN)); |
butterfly(cv, m_vals[m_iaddr & (SPANMASK)], i_data, |
m_out[(m_iaddr-SPANLEN) & (DBLSPANMASK)], |
m_out[m_iaddr & (DBLSPANMASK)]); |
/* |
printf("BFLY: C=%16lx M=%8lx I=%10lx -> %10lx %10lx\n", |
cv, m_vals[m_iaddr & (SPANMASK)], i_data, |
m_out[(m_iaddr-SPANLEN)&(DBLSPANMASK)], |
m_out[m_iaddr & (DBLSPANMASK)]); |
*/ |
} |
|
cetick(); |
|
if ((!m_syncd)&&(m_ftstage->o_sync)) { |
m_syncd = true; |
// m_oaddr = m_iaddr - 0x219; |
// m_oaddr = m_iaddr - 0; |
m_offset = m_iaddr; |
m_oaddr = 0; |
|
printf("SYNC!!!!\n"); |
} |
|
raddr = (m_iaddr-m_offset) & DBLSPANMASK; |
/* |
if (m_oaddr & (1<<(LGSPAN+1))) |
raddr |= (1<<LGSPAN); |
*/ |
|
printf("%4ld, %4ld: %d %9lx -> %9lx %d ... %4x %15lx (%10lx)\n", |
(long)m_iaddr, (long)m_oaddr, |
i_sync, (long)(i_data) & (~(-1l << (2*IWIDTH))), |
(long)m_ftstage->o_data, |
m_ftstage->o_sync, |
|
m_ftstage->iaddr&(FFTMASK>>1), |
(long)(m_ftstage->cmem[m_ftstage->iaddr&(SPANMASK>>1)]) & (~(-1l<<(2*CWIDTH))), |
(long)m_out[raddr]); |
|
if ((m_syncd)&&(m_ftstage->o_sync != ((((m_iaddr-m_offset)&((1<<(LGSPAN+1))-1))==0)?1:0))) { |
fprintf(stderr, "Bad output sync (m_iaddr = %lx, m_offset = %x)\n", |
(m_iaddr-m_offset) & SPANMASK, m_offset); |
failed = true; |
} |
|
if (m_syncd) { |
if (m_out[raddr] != m_ftstage->o_data) { |
printf("Bad output data, ([%lx - %x = %x] %lx(exp) != %lx(sut))\n", |
m_iaddr, m_offset, raddr, |
m_out[raddr], (long)m_ftstage->o_data); |
failed = true; |
} |
} else if (m_iaddr > 4096) { |
printf("NO OUTPUT SYNC!\n"); |
failed = true; |
} |
m_iaddr++; |
m_oaddr++; |
|
if (failed) |
exit(-1); |
} |
}; |
|
|
|
int main(int argc, char **argv, char **envp) { |
Verilated::commandArgs(argc, argv); |
FFTSTAGE_TB *ftstage = new FFTSTAGE_TB; |
|
printf("Expecting : IWIDTH = %d, CWIDTH = %d, OWIDTH = %d\n", |
IWIDTH, CWIDTH, OWIDTH); |
|
ftstage->opentrace("fftstage.vcd"); |
ftstage->reset(); |
|
// Medium real (constant) value ... just for starters |
for(int k=1; k<FFTSIZE; k+=2) |
ftstage->test((k==1), 0x00200000l); |
// Medium imaginary (constant) value ... just for starters |
for(int k=1; k<FFTSIZE; k+=2) |
ftstage->test((k==1), 0x00000020l); |
// Medium sine wave, real |
for(int k=1; k<FFTSIZE; k+=2) { |
long vl; |
vl= (long)(cos(2.0 * M_PI * 1.0 / FFTSIZE * k)*(1l<<30) + 0.5); |
vl &= (-1l << 16); // Turn off the imaginary bit portion |
vl &= (~(-1l << (IWIDTH*2))); // Turn off unused high order bits |
ftstage->test((k==1), vl); |
} |
// Smallest real value |
for(int k=1; k<FFTSIZE; k+=2) |
ftstage->test((k==1), 0x00080000l); |
// Smallest imaginary value |
for(int k=1; k<FFTSIZE; k+=2) |
ftstage->test((k==1), 0x00000001l); |
// Largest real value |
for(int k=1; k<FFTSIZE; k+=2) |
ftstage->test((k==1), 0x200000000l); |
// Largest negative imaginary value |
for(int k=1; k<FFTSIZE; k+=2) |
ftstage->test((k==1), 0x000010000l); |
// Let's try an impulse |
for(int k=0; k<FFTSIZE; k+=2) |
ftstage->test((k==0), (k==0)?0x020000000l:0l); |
// Now, let's clear out the result |
for(int k=0; k<FFTSIZE; k+=2) |
ftstage->test((k==0), 0x000000000l); |
for(int k=0; k<FFTSIZE; k+=2) |
ftstage->test((k==0), 0x000000000l); |
for(int k=0; k<FFTSIZE; k+=2) |
ftstage->test((k==0), 0x000000000l); |
for(int k=0; k<FFTSIZE; k+=2) |
ftstage->test((k==0), 0x000000000l); |
|
printf("SUCCESS! (Offset = %d)\n", ftstage->m_offset); |
delete ftstage; |
|
exit(0); |
} |
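`FFTSTAGE_TB::butterfly()` above is the golden model for one stage: a radix-2 decimation-in-frequency butterfly whose left output is `lft + rht` and whose right output is `(lft - rht)` times the coefficient, scaled down by 2^(CWIDTH-2). The sketch below restates that arithmetic on unpacked integers so it can be checked in isolation. The `Cplx` type and `dif_butterfly` name are hypothetical, and the OWIDTH masking and bit packing are left out. It keeps the test bench's rounding rule: when the bit just below the cut is set, it adds half an LSB minus one before shifting, so remainders above one half round up while an exact half rounds toward negative infinity.

```cpp
#include <cassert>
#include <cstdint>

struct Cplx { int64_t re, im; };

// Radix-2 DIF butterfly on unpacked integers (hypothetical helper).
//   o_lft = lft + rht
//   o_rht = ((lft - rht) * cv) / 2^(cwidth-2), rounded as in the test bench
// With this scaling, cv = {2^(cwidth-2), 0} behaves as a twiddle of 1.
void dif_butterfly(Cplx cv, Cplx lft, Cplx rht, int cwidth, bool round,
                   Cplx &o_lft, Cplx &o_rht) {
    assert(cwidth >= 3);
    o_lft = { lft.re + rht.re, lft.im + rht.im };

    const int64_t dr = lft.re - rht.re;
    const int64_t di = lft.im - rht.im;
    int64_t pr = cv.re * dr - cv.im * di;   // real part of (dr + j*di) * cv
    int64_t pi = cv.re * di + cv.im * dr;   // imaginary part

    const int shift = cwidth - 2;
    const int64_t half = int64_t(1) << (shift - 1);
    if (round) {
        // If the bit just below the cut is set, add half an LSB minus one:
        // remainders above one half carry into the kept bits, an exact
        // half does not.
        if (pr & half) pr += half - 1;
        if (pi & half) pi += half - 1;
    }
    // Arithmetic right shift: guaranteed for negative values since C++20,
    // and what mainstream compilers do before that.
    o_rht = { pr >> shift, pi >> shift };
}
```

With CWIDTH = 20, a coefficient of `{1<<17, 0}` (one half) maps a difference of 3 to 1 and a difference of 5 to 2, while `{(1<<17)+1, 0}` maps 3 to 2.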
/trunk/bench/cpp/hwbfly_tb.cpp
1,16 → 1,16
//////////////////////////////////////////////////////////////////////////// |
// |
// Filename: butterfly_tb.cpp |
// Filename: hwbfly_tb.cpp |
// |
// Project: A Doubletime Pipelined FFT |
// |
// Purpose: A test-bench for the butterfly.v subfile of the double |
// clocked FFT. This file may be run autonomously. If so, |
// the last line output will either read "SUCCESS" on success, |
// or some other failure message otherwise. |
// Purpose: A test-bench for the hardware butterfly subfile of the generic |
// pipelined FFT. This file may be run autonomously. If so, |
// the last line output will either read "SUCCESS" on success, or some |
// other failure message otherwise. |
// |
// This file depends upon verilator to both compile, run, and |
// therefore test butterfly.v |
// This file depends upon verilator to both compile, run, and therefore |
// test hwbfly.v |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
17,7 → 17,7
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// Copyright (C) 2015,2018 Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
42,9 → 42,11
#include <stdio.h> |
#include <stdint.h> |
|
#include "verilated.h" |
#include "verilated_vcd_c.h" |
#include "Vhwbfly.h" |
#include "verilated.h" |
#include "twoc.h" |
#include "fftsize.h" |
|
#ifdef NEW_VERILATOR |
#define VVAR(A) hwbfly__DOT_ ## A |
52,27 → 54,65
#define VVAR(A) v__DOT_ ## A |
#endif |
|
#define IWIDTH TST_BUTTERFLY_IWIDTH |
#define CWIDTH TST_BUTTERFLY_CWIDTH |
#define OWIDTH TST_BUTTERFLY_OWIDTH |
|
class BFLY_TB { |
class HWBFLY_TB { |
public: |
Vhwbfly *m_bfly; |
VerilatedVcdC *m_trace; |
unsigned long m_left[64], m_right[64]; |
bool m_aux[64]; |
int m_addr, m_lastaux, m_offset; |
bool m_syncd; |
uint64_t m_tickcount; |
|
BFLY_TB(void) { |
HWBFLY_TB(void) { |
Verilated::traceEverOn(true); |
m_trace = NULL; |
m_bfly = new Vhwbfly; |
m_addr = 0; |
m_syncd = 0; |
m_tickcount = 0; |
m_bfly->i_reset = 1; |
m_bfly->i_clk = 0; |
m_bfly->eval(); |
m_bfly->i_reset = 0; |
} |
|
void opentrace(const char *vcdname) { |
if (!m_trace) { |
m_trace = new VerilatedVcdC; |
m_bfly->trace(m_trace, 99); |
m_trace->open(vcdname); |
} |
} |
|
void closetrace(void) { |
if (m_trace) { |
m_trace->close(); |
delete m_trace; |
m_trace = NULL; |
} |
} |
|
void tick(void) { |
m_tickcount++; |
|
m_lastaux = m_bfly->o_aux; |
m_bfly->i_clk = 0; |
m_bfly->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2)); |
m_bfly->i_clk = 1; |
m_bfly->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount)); |
m_bfly->i_clk = 0; |
m_bfly->eval(); |
if (m_trace) { |
m_trace->dump((uint64_t)(10ul*m_tickcount+5)); |
m_trace->flush(); |
} |
|
if ((!m_syncd)&&(m_bfly->o_aux)) |
m_offset = m_addr; |
79,14 → 119,33
m_syncd = (m_syncd) || (m_bfly->o_aux); |
} |
|
void cetick(void) { |
int ce = m_bfly->i_ce, nkce; |
|
tick(); |
|
nkce = (rand()&1); |
#ifdef FFT_CKPCE |
nkce += FFT_CKPCE; |
#endif |
|
if ((ce)&&(nkce > 0)) { |
m_bfly->i_ce = 0; |
for(int kce=0; kce<nkce-1; kce++) |
tick(); |
} |
|
m_bfly->i_ce = ce; |
} |
|
void reset(void) { |
m_bfly->i_ce = 0; |
m_bfly->i_rst = 1; |
m_bfly->i_reset = 1; |
m_bfly->i_coef = 0l; |
m_bfly->i_left = 0; |
m_bfly->i_right = 0; |
tick(); |
m_bfly->i_rst = 0; |
m_bfly->i_reset = 0; |
m_bfly->i_ce = 1; |
// |
// Let's run a RESET test here, forcing the whole butterfly |
93,18 → 152,18
// to be filled with aux=1. If the reset works right, |
// we'll never get an aux=1 output. |
// |
m_bfly->i_rst = 1; |
m_bfly->i_reset = 1; |
m_bfly->i_aux = 1; |
m_bfly->i_ce = 1; |
m_bfly->i_aux = 1; |
for(int i=0; i<200; i++) |
tick(); |
cetick(); |
|
// Now here's the RESET line, so let's see what the test does |
m_bfly->i_rst = 1; |
m_bfly->i_reset = 1; |
m_bfly->i_ce = 1; |
m_bfly->i_aux = 1; |
tick(); |
m_bfly->i_rst = 0; |
cetick(); |
m_bfly->i_reset = 0; |
m_syncd = 0; |
} |
|
111,13 → 170,13
void test(const int n, const int k, const unsigned long cof, |
const unsigned lft, const unsigned rht, const int aux) { |
|
m_bfly->i_coef = cof & (~(-1l << 40)); |
m_bfly->i_left = lft; |
m_bfly->i_right = rht; |
m_bfly->i_coef = ubits(cof, 2*TST_BUTTERFLY_CWIDTH); |
m_bfly->i_left = ubits(lft, 2*TST_BUTTERFLY_IWIDTH); |
m_bfly->i_right = ubits(rht, 2*TST_BUTTERFLY_IWIDTH); |
m_bfly->i_aux = aux & 1; |
|
m_bfly->i_ce = 1; |
tick(); |
cetick(); |
|
if ((m_bfly->o_aux)&&(!m_lastaux)) |
printf("\n"); |
130,30 → 189,49
m_bfly->o_left, |
m_bfly->o_right, |
m_bfly->o_aux); |
#if (FFT_CKPCE == 1) |
printf(", p1 = 0x%08lx p2 = 0x%08lx, p3 = 0x%08lx", |
#define rp_one VVAR(_CKPCE_ONE__DOT__rp_one) |
#define rp_two VVAR(_CKPCE_ONE__DOT__rp_two) |
#define rp_three VVAR(_CKPCE_ONE__DOT__rp_three) |
m_bfly->rp_one, |
m_bfly->rp_two, |
m_bfly->rp_three); |
#elif (FFT_CKPCE == 2) |
#define rp_one VVAR(_genblk1__DOT__CKPCE_TWO__DOT__rp2_one) |
#define rp_two VVAR(_genblk1__DOT__CKPCE_TWO__DOT__rp_two) |
#define rp_three VVAR(_genblk1__DOT__CKPCE_TWO__DOT__rp_three) |
printf(", p1 = 0x%08lx p2 = 0x%08lx, p3 = 0x%08lx", |
m_bfly->rp_one, |
m_bfly->rp_two, |
m_bfly->rp_three); |
#else |
printf("CKPCE = %d\n", FFT_CKPCE); |
#endif |
|
printf("\n"); |
|
if ((m_syncd)&&(m_left[(m_addr-m_offset)&(64-1)] != m_bfly->o_left)) { |
fprintf(stderr, "WRONG O_LEFT! (%lx(exp) != %lx(sut)\n", |
printf("WRONG O_LEFT! (%lx(exp) != %lx(sut))\n", |
m_left[(m_addr-m_offset)&(64-1)], |
m_bfly->o_left); |
exit(-1); |
exit(EXIT_FAILURE); |
} |
|
if ((m_syncd)&&(m_right[(m_addr-m_offset)&(64-1)] != m_bfly->o_right)) { |
fprintf(stderr, "WRONG O_RIGHT! (%lx(exp) != %lx(sut))\n", |
m_right[(m_addr-m_offset)&(64-1)], |
m_bfly->o_right); |
exit(-1); |
printf("WRONG O_RIGHT! (%lx(exp) != %lx(sut))\n", |
m_right[(m_addr-m_offset)&(64-1)], m_bfly->o_right); |
exit(EXIT_FAILURE); |
} |
|
if ((m_syncd)&&(m_aux[(m_addr-m_offset)&(64-1)] != m_bfly->o_aux)) { |
fprintf(stderr, "FAILED AUX CHANNEL TEST (i.e. the SYNC)\n"); |
exit(-1); |
printf("FAILED AUX CHANNEL TEST (i.e. the SYNC)\n"); |
exit(EXIT_FAILURE); |
} |
|
if ((m_addr > 22)&&(!m_syncd)) { |
fprintf(stderr, "NO SYNC PULSE!\n"); |
exit(-1); |
printf("NO SYNC PULSE!\n"); |
exit(EXIT_FAILURE); |
} |
|
// Now, let's calculate an "expected" result ... |
160,20 → 238,20
long rlft, ilft; |
|
// Extract left and right values ... |
rlft = sbits(m_bfly->i_left >> 16, 16); |
ilft = sbits(m_bfly->i_left , 16); |
rlft = sbits(m_bfly->i_left >> IWIDTH, IWIDTH); |
ilft = sbits(m_bfly->i_left , IWIDTH); |
|
// Now repeat for the right hand value ... |
long rrht, irht; |
// Extract left and right values ... |
rrht = sbits(m_bfly->i_right >> 16, 16); |
irht = sbits(m_bfly->i_right , 16); |
rrht = sbits(m_bfly->i_right >> IWIDTH, IWIDTH); |
irht = sbits(m_bfly->i_right , IWIDTH); |
|
// and again for the coefficients |
long rcof, icof; |
// Extract left and right values ... |
rcof = sbits(m_bfly->i_coef >> 20, 20); |
icof = sbits(m_bfly->i_coef , 20); |
rcof = sbits(m_bfly->i_coef >> CWIDTH, CWIDTH); |
icof = sbits(m_bfly->i_coef , CWIDTH); |
|
// Now, let's do the butterfly ourselves ... |
long sumi, sumr, difi, difr; |
198,9 → 276,12
p2 = difi * icof; |
p3 = (difr + difi) * (rcof + icof); |
|
mpyr = p1-p2 + (1<<17); |
mpyi = p3-p1-p2 + (1<<17); |
mpyr = p1-p2; |
mpyi = p3-p1-p2; |
|
mpyr = rndbits(mpyr, (IWIDTH+2)+(CWIDTH+1), OWIDTH+4); |
mpyi = rndbits(mpyi, (IWIDTH+2)+(CWIDTH+1), OWIDTH+4); |
|
/* |
printf("RC=%lx, IC=%lx, ", rcof, icof); |
printf("P1=%lx,P2=%lx,P3=%lx, ", p1,p2,p3); |
211,12 → 292,15
long o_left_r, o_left_i, o_right_r, o_right_i; |
unsigned long o_left, o_right; |
|
o_left_r = sumr & 0x01ffff; o_left_i = sumi & 0x01ffff; |
o_left = (o_left_r << 17) | (o_left_i); |
o_left_r = rndbits(sumr<<(CWIDTH-2), CWIDTH+IWIDTH+3, OWIDTH+4); |
o_left_r = ubits(o_left_r, OWIDTH); |
o_left_i = rndbits(sumi<<(CWIDTH-2), CWIDTH+IWIDTH+3, OWIDTH+4); |
o_left_i = ubits(o_left_i, OWIDTH); |
o_left = (o_left_r << OWIDTH) | (o_left_i); |
|
o_right_r = (mpyr>>18) & 0x01ffff; |
o_right_i = (mpyi>>18) & 0x01ffff; |
o_right = (o_right_r << 17) | (o_right_i); |
o_right_r = ubits(mpyr, OWIDTH); |
o_right_i = ubits(mpyi, OWIDTH); |
o_right = (o_right_r << OWIDTH) | (o_right_i); |
/* |
printf("oR_r = %lx, ", o_right_r); |
printf("oR_i = %lx\n", o_right_i); |
232,7 → 316,7
|
int main(int argc, char **argv, char **envp) { |
Verilated::commandArgs(argc, argv); |
BFLY_TB *bfly = new BFLY_TB; |
HWBFLY_TB *bfly = new HWBFLY_TB; |
int16_t ir0, ii0, lstr, lsti; |
int32_t sumr, sumi, difr, difi; |
int32_t smr, smi, dfr, dfi; |
240,6 → 324,8
|
const int TESTSZ = 256; |
|
bfly->opentrace("hwbfly.vcd"); |
|
bfly->reset(); |
|
bfly->test(9,0,0x4000000000l,0x7fff0000,0x7fff0000, 1); |
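The expected result in `HWBFLY_TB::test()` forms the twiddle product with three real multiplies rather than four: `p1 = difr*rcof`, `p2 = difi*icof` and `p3 = (difr+difi)*(rcof+icof)`, giving a real part of `p1 - p2` and an imaginary part of `p3 - p1 - p2`. This is the standard three-multiplication form of a complex product. The sketch below, with hypothetical names `mpy3` and `mpy4`, checks it against the textbook four-multiply form; the `rndbits()` rounding step is left out.

```cpp
#include <cassert>
#include <cstdint>

struct Product { int64_t re, im; };

// (difr + j*difi) * (rcof + j*icof) with three real multiplies.
// p3 = difr*rcof + difr*icof + difi*rcof + difi*icof, so
// p3 - p1 - p2 = difr*icof + difi*rcof, the imaginary part.
Product mpy3(int64_t difr, int64_t difi, int64_t rcof, int64_t icof) {
    const int64_t p1 = difr * rcof;
    const int64_t p2 = difi * icof;
    const int64_t p3 = (difr + difi) * (rcof + icof);
    return { p1 - p2, p3 - p1 - p2 };
}

// Reference four-multiply version, for comparison.
Product mpy4(int64_t ar, int64_t ai, int64_t br, int64_t bi) {
    return { ar * br - ai * bi, ar * bi + ai * br };
}
```

The saving costs two extra pre-additions, and each pre-added sum (`difr+difi`, `rcof+icof`) needs one more bit than its inputs, so the multiplier operands grow by a bit.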
/trunk/bench/cpp/laststage_tb.cpp
0,0 → 1,332
//////////////////////////////////////////////////////////////////////////// |
// |
// Filename: laststage_tb.cpp |
// |
// Project: A Doubletime Pipelined FFT |
// |
// Purpose: A test-bench for the laststage.v subfile of the general purpose |
// pipelined FFT. This file may be run autonomously. If so, |
// the last line output will either read "SUCCESS" on success, or some |
// other failure message otherwise. |
// |
// This file depends upon verilator to both compile, run, and therefore |
// test laststage.v |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015,2018 Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
#include <stdio.h> |
#include <stdint.h> |
|
#include "verilated.h" |
#include "verilated_vcd_c.h" |
#include "Vlaststage.h" |
#include "twoc.h" |
|
#define IWIDTH 16 |
#define OWIDTH (IWIDTH+1) |
#define SHIFT 0 |
#define ROUND 1 |
|
#define ASIZ 32 |
#define AMSK (ASIZ-1) |
|
class LASTSTAGE_TB { |
public: |
Vlaststage *m_last; |
VerilatedVcdC *m_trace; |
#ifdef DBLCLKFFT |
unsigned long m_left[ASIZ], m_right[ASIZ]; |
#else |
unsigned long m_data[ASIZ]; |
#endif |
bool m_syncd; |
int m_addr, m_offset; |
unsigned long m_tickcount; |
|
LASTSTAGE_TB(void) { |
Verilated::traceEverOn(true); |
m_trace = NULL; |
m_last = new Vlaststage; |
m_tickcount = 0; |
m_syncd = false; m_addr = 0, m_offset = 0; |
} |
|
void opentrace(const char *vcdname) { |
if (!m_trace) { |
m_trace = new VerilatedVcdC; |
m_last->trace(m_trace, 99); |
m_trace->open(vcdname); |
} |
} |
|
void closetrace(void) { |
if (m_trace) { |
m_trace->close(); |
delete m_trace; |
m_trace = NULL; |
} |
} |
|
void tick(void) { |
m_tickcount++; |
|
m_last->i_clk = 0; |
m_last->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul * m_tickcount - 2)); |
m_last->i_clk = 1; |
m_last->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul * m_tickcount)); |
m_last->i_clk = 0; |
m_last->eval(); |
if (m_trace) { |
m_trace->dump((uint64_t)(10ul * m_tickcount + 5)); |
m_trace->flush(); |
} |
m_last->i_reset = 0; |
m_last->i_sync = 0; |
} |
|
void cetick(void) { |
int nkce; |
|
tick(); |
nkce = (rand()&1); |
#ifdef FFT_CKPCE |
nkce += FFT_CKPCE; |
#endif |
if ((m_last->i_ce)&&(nkce > 0)) { |
m_last->i_ce = 0; |
for(int kce = 1; kce < nkce; kce++) |
tick(); |
m_last->i_ce = 1; |
} |
} |
|
void reset(void) { |
m_last->i_reset = 1; |
tick(); |
|
m_syncd = false; m_addr = 0, m_offset = 0; |
} |
|
void check_results(void) { |
bool failed = false; |
|
if ((!m_syncd)&&(m_last->o_sync)) { |
m_syncd = true; |
m_offset = m_addr; |
printf("SYNCD at %d\n", m_addr); |
} |
|
#ifdef DBLCLKFFT |
int ir0, ir1, ii0, ii1, or0, oi0, or1, oi1; |
|
ir0 = sbits(m_left[ (m_addr-m_offset)&AMSK]>>IWIDTH, IWIDTH); |
ir1 = sbits(m_right[(m_addr-m_offset)&AMSK]>>IWIDTH, IWIDTH); |
ii0 = sbits(m_left[ (m_addr-m_offset)&AMSK], IWIDTH); |
ii1 = sbits(m_right[(m_addr-m_offset)&AMSK], IWIDTH); |
|
|
or0 = sbits(m_last->o_left >> OWIDTH, OWIDTH); |
oi0 = sbits(m_last->o_left , OWIDTH); |
or1 = sbits(m_last->o_right >> OWIDTH, OWIDTH); |
oi1 = sbits(m_last->o_right , OWIDTH); |
|
|
// Sign extensions |
printf("k=%3d: IN = %08x:%08x, OUT =%09lx:%09lx, S=%d\n", |
m_addr, m_last->i_left, m_last->i_right, |
m_last->o_left, m_last->o_right, |
m_last->o_sync); |
|
/* |
printf("\tI0 = { %x : %x }, I1 = { %x : %x }, O0 = { %x : %x }, O1 = { %x : %x }\n", |
ir0, ii0, ir1, ii1, or0, oi0, or1, oi1); |
*/ |
|
if (m_syncd) { |
if (or0 != (ir0 + ir1)) { |
printf("FAIL 1: or0 != (ir0+ir1), or %x(exp) != %x(sut)\n", (ir0+ir1), or0); |
failed=true;} |
if (oi0 != (ii0 + ii1)) {printf("FAIL 2\n"); failed=true;} |
if (or1 != (ir0 - ir1)) {printf("FAIL 3\n"); failed=true;} |
if (oi1 != (ii0 - ii1)) {printf("FAIL 4\n"); failed=true;} |
} else if (m_addr > 20) { |
printf("NO SYNC!\n"); |
failed = true; |
} |
#else |
int or0, oi0; |
int sumr, sumi, difr, difi; |
int ir0, ii0, ir1, ii1, ir2, ii2, ir3, ii3, irn, iin; |
|
irn = sbits(m_data[(m_addr-m_offset+2)&AMSK]>>IWIDTH, IWIDTH); |
iin = sbits(m_data[(m_addr-m_offset+2)&AMSK], IWIDTH); |
ir0 = sbits(m_data[(m_addr-m_offset+1)&AMSK]>>IWIDTH, IWIDTH); |
ii0 = sbits(m_data[(m_addr-m_offset+1)&AMSK], IWIDTH); |
ir1 = sbits(m_data[(m_addr-m_offset )&AMSK]>>IWIDTH, IWIDTH); |
ii1 = sbits(m_data[(m_addr-m_offset )&AMSK], IWIDTH); |
ir2 = sbits(m_data[(m_addr-m_offset-1)&AMSK]>>IWIDTH, IWIDTH); |
ii2 = sbits(m_data[(m_addr-m_offset-1)&AMSK], IWIDTH); |
ir3 = sbits(m_data[(m_addr-m_offset-2)&AMSK]>>IWIDTH, IWIDTH); |
ii3 = sbits(m_data[(m_addr-m_offset-2)&AMSK], IWIDTH); |
|
sumr = ir1 + ir0; |
sumi = ii1 + ii0; |
|
difr = ir2 - ir1; |
difi = ii2 - ii1; |
|
or0 = sbits(m_last->o_val >> OWIDTH, OWIDTH); |
oi0 = sbits(m_last->o_val , OWIDTH); |
|
printf("IR0 = %08x, IR1 = %08x, IR2 = %08x, ", |
ir0, ir1, ir2); |
printf("II0 = %08x, II1 = %08x, II2 = %08x, ", |
ii0, ii1, ii2); |
// Sign extensions |
printf("k=%3d: IN = %08x, %c, OUT =%09lx, S=%d\n", |
m_addr, m_last->i_val, |
m_last->i_sync ? 'S':' ', |
m_last->o_val, m_last->o_sync); |
|
|
if ((m_syncd)&&(0 == ((m_addr-m_offset)&1))) { |
if (or0 != sumr) { |
printf("FAIL 1: or0 != (ir0+ir1), or %x(exp) != %x(sut)\n", sumr, or0); |
failed=true; |
} if (oi0 != sumi) { |
printf("FAIL 2\n"); |
failed=true; |
} |
} else if ((m_syncd)&&(1 == ((m_addr-m_offset)&1))) { |
if (or0 != difr) { |
printf("FAIL 3: or0 != (ir1-ir0), or %x(exp) != %x(sut)\n", difr, or0); |
failed=true; |
} if (oi0 != difi) { |
printf("FAIL 4: oi0 != (ii1-ii0), or %x(exp) != %x(sut)\n", difi, oi0); |
failed=true; |
} |
} else if (m_addr > 20) { |
printf("NO SYNC!\n"); |
failed = true; |
} |
#endif |
if (failed) |
exit(-2); |
} |
|
void sync(void) { |
m_last->i_sync = 1; |
} |
|
void test(unsigned long left, unsigned long right) { |
m_last->i_ce = 1; |
if (m_last->i_sync) |
m_addr = 0; |
#ifdef DBLCLKFFT |
m_last->i_left = left; |
m_last->i_right = right; |
|
m_left[ m_addr&AMSK] = m_last->i_left; |
m_right[m_addr&AMSK] = m_last->i_right; |
m_addr++; |
|
cetick(); |
#else |
m_last->i_val = left; |
m_data[ m_addr&AMSK] = m_last->i_val; |
m_addr = (m_addr+1); |
cetick(); |
|
check_results(); |
|
m_last->i_val = right; |
m_data[m_addr&AMSK] = m_last->i_val; |
m_addr = (m_addr+1)&AMSK; |
cetick(); |
#endif |
|
check_results(); |
} |
|
void test(int ir0, int ii0, int ir1, int ii1) { |
unsigned long left, right, mask = (1<<IWIDTH)-1; |
|
left = ((ir0&mask) << IWIDTH) | (ii0 & mask); |
right = ((ir1&mask) << IWIDTH) | (ii1 & mask); |
test(left, right); |
} |
}; |
|
int main(int argc, char **argv, char **envp) { |
Verilated::commandArgs(argc, argv); |
LASTSTAGE_TB *tb = new LASTSTAGE_TB; |
|
tb->opentrace("laststage.vcd"); |
tb->reset(); |
|
tb->sync(); |
|
tb->test( 1, 0,0,0); |
tb->test( 0, 2,0,0); |
tb->test( 0, 0,4,0); |
tb->test( 0, 0,0,8); |
|
tb->test( 0, 0,0,0); |
|
tb->test(16,16,0,0); |
tb->test(0,0,16,16); |
tb->test(16,-16,0,0); |
tb->test(0,0,16,-16); |
tb->test(16,16,0,0); |
tb->test(0,0,16,16); |
|
for(int k=0; k<64; k++) { |
int16_t ir0, ii0, ir1, ii1; |
|
// Let's pick some random values, ... |
ir0 = rand(); if (ir0&4) ir0 = -ir0; |
ii0 = rand(); if (ii0&2) ii0 = -ii0; |
ir1 = rand(); if (ir1&1) ir1 = -ir1; |
ii1 = rand(); if (ii1&8) ii1 = -ii1; |
|
tb->test(ir0, ii0, ir1, ii1); |
|
} |
|
delete tb; |
|
printf("SUCCESS!\n"); |
exit(0); |
} |
|
|
|
|
|
|
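`LASTSTAGE_TB::check_results()` encodes what the final stage does: a two-point butterfly with no twiddle factor. In the double-clocked build each clock yields `left+right` and `left-right` together; in the single-sample build the outputs alternate between the sum and the difference (first minus second) of each input pair. Outputs are OWIDTH = IWIDTH+1 bits wide because the sum of two IWIDTH-bit values needs one extra bit. A reference model of the single-sample case is sketched below; `Sample` and `last_stage` are hypothetical names, and the sketch assumes the stream is already aligned so that each pair starts on an even index after the sync.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Sample { int32_t re, im; };

// Final two-point stage (hypothetical reference model): for each input pair
// (x0, x1) emit x0 + x1, then x0 - x1. int32_t leaves room for the extra
// output bit when the inputs are 16-bit.
std::vector<Sample> last_stage(const std::vector<Sample> &in) {
    assert(in.size() % 2 == 0);
    std::vector<Sample> out;
    out.reserve(in.size());
    for (std::size_t k = 0; k + 1 < in.size(); k += 2) {
        const Sample &x0 = in[k];
        const Sample &x1 = in[k + 1];
        out.push_back({ x0.re + x1.re, x0.im + x1.im });  // sum first
        out.push_back({ x0.re - x1.re, x0.im - x1.im });  // then difference
    }
    return out;
}
```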
/trunk/bench/cpp/mpy_tb.cpp
2,7 → 2,7
// |
// Filename: mpy_tb.cpp |
// |
// Project: A Doubletime Pipelined FFT |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: A test-bench for the shift and add shiftaddmpy.v subfile of |
// the double clocked FFT. This file may be run autonomously. |
17,7 → 17,7
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
39,7 → 39,11
// |
// |
/////////////////////////////////////////////////////////////////////////// |
#include "verilated.h" |
#include "verilated_vcd_c.h" |
|
#include "fftsize.h" |
|
#ifdef USE_OLD_MULTIPLY |
#include "Vshiftaddmpy.h" |
typedef Vshiftaddmpy Vmpy; |
54,17 → 58,20
#define DELAY ((AW/2)+(AW&1)+2) |
#endif |
|
#include "verilated.h" |
#include "twoc.h" |
|
class MPYTB { |
public: |
Vmpy *mpy; |
long vals[32]; |
int m_addr; |
Vmpy *m_mpy; |
VerilatedVcdC *m_trace; |
long vals[32]; |
int m_addr; |
uint64_t m_tickcount; |
|
MPYTB(void) { |
mpy = new Vmpy; |
Verilated::traceEverOn(true); |
m_trace = NULL; |
m_mpy = new Vmpy; |
m_tickcount = 0; |
|
for(int i=0; i<32; i++) |
vals[i] = 0; |
71,24 → 78,68
m_addr = 0; |
} |
~MPYTB(void) { |
delete mpy; |
closetrace(); |
delete m_mpy; |
} |
|
void tick(void) { |
mpy->i_clk = 0; |
mpy->eval(); |
mpy->i_clk = 1; |
mpy->eval(); |
void opentrace(const char *vcdname) { |
if (!m_trace) { |
m_trace = new VerilatedVcdC; |
m_mpy->trace(m_trace, 99); |
m_trace->open(vcdname); |
} |
} |
|
void closetrace(void) { |
if (m_trace) { |
m_trace->close(); |
delete m_trace; |
m_trace = NULL; |
} |
} |
|
void tick(void) { |
m_tickcount++; |
|
m_mpy->i_clk = 0; |
m_mpy->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2)); |
m_mpy->i_clk = 1; |
m_mpy->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount)); |
m_mpy->i_clk = 0; |
m_mpy->eval(); |
if (m_trace) { |
m_trace->dump((uint64_t)(10ul*m_tickcount+5)); |
m_trace->flush(); |
} |
} |
|
void cetick(void) { |
int ce = m_mpy->i_ce, nkce; |
|
tick(); |
nkce = (rand()&1); |
#ifdef FFT_CKPCE |
nkce += FFT_CKPCE; |
#endif |
if ((ce)&&(nkce>0)) { |
m_mpy->i_ce = 0; |
for(int kce=1; kce<nkce; kce++) |
tick(); |
} |
|
m_mpy->i_ce = ce; |
} |
|
void reset(void) { |
mpy->i_clk = 0; |
mpy->i_ce = 1; |
mpy->i_a = 0; |
mpy->i_b = 0; |
m_mpy->i_clk = 0; |
m_mpy->i_ce = 1; |
m_mpy->i_a_unsorted = 0; |
m_mpy->i_b_unsorted = 0; |
|
for(int k=0; k<20; k++) |
tick(); |
cetick(); |
} |
|
bool test(const int ia, const int ib) { |
97,23 → 148,21
|
a = sbits(ia, AW); |
b = sbits(ib, BW); |
mpy->i_ce = 1; |
mpy->i_a = ubits(a, AW); |
mpy->i_b = ubits(b, BW); |
m_mpy->i_ce = 1; |
m_mpy->i_a_unsorted = ubits(a, AW); |
m_mpy->i_b_unsorted = ubits(b, BW); |
|
vals[m_addr&31] = a * b; |
|
tick(); |
if (rand()&1) { |
mpy->i_ce = 0; |
tick(); |
} |
cetick(); |
|
printf("k=%3d: A =%04x, B =%05x -> O = %9lx (ANS=%10lx)\n", |
m_addr, (int)ubits(a,AW), (int)ubits(b,BW), |
(long)mpy->o_r, ubits(vals[m_addr&31], AW+BW+4)); |
printf("k=%3d: A =%0*x, B =%0*x -> O = %*lx (ANS=%*lx)\n", |
m_addr, (AW+3)/4, (int)ubits(a,AW), |
(BW+3)/4, (int)ubits(b,BW), |
(AW+BW+3)/4, (long)m_mpy->o_r, |
(AW+BW+7)/4, ubits(vals[m_addr&31], AW+BW+4)); |
|
out = sbits(mpy->o_r, AW+BW); |
out = sbits(m_mpy->o_r, AW+BW); |
|
m_addr++; |
|
131,6 → 180,7
Verilated::commandArgs(argc, argv); |
MPYTB *tb = new MPYTB; |
|
tb->opentrace("mpy.vcd"); |
tb->reset(); |
|
for(int k=0; k<15; k++) { |
149,10 → 199,16
tb->test(a, b); |
} |
|
for(int k=0; k<2048; k++) { |
int a, b, out; |
|
tb->test(rand(), rand()); |
if (AW+BW <= 20) { |
// Exhaustive test |
for(int a=0; a< (1<<AW); a++) |
for(int b=0; b< (1<<BW); b++) |
tb->test(a, b); |
printf("Exhaust complete\n"); |
} else { |
// Pseudorandom test |
for(int k=0; k<2048; k++) |
tb->test(rand(), rand()); |
} |
|
delete tb; |
/trunk/bench/cpp/qtrstage_tb.cpp
6,11 → 6,11
// |
// Purpose: A test-bench for the qtrstage.v subfile of the double |
// clocked FFT. This file may be run autonomously. If so, |
// the last line output will either read "SUCCESS" on success, |
// or some other failure message otherwise. |
// the last line output will either read "SUCCESS" on success, or some |
// other failure message otherwise. |
// |
// This file depends upon verilator to both compile, run, and |
// therefore test qtrstage.v |
// This file depends upon verilator to both compile, run, and therefore |
// test qtrstage.v |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
17,7 → 17,7
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// Copyright (C) 2015,2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
42,8 → 42,9
#include <stdio.h> |
#include <stdint.h> |
|
#include "verilated.h" |
#include "verilated_vcd_c.h" |
#include "Vqtrstage.h" |
#include "verilated.h" |
#include "twoc.h" |
#include "fftsize.h" |
|
72,30 → 73,78
class QTRTEST_TB { |
public: |
Vqtrstage *m_qstage; |
unsigned long m_data[ASIZ]; |
VerilatedVcdC *m_trace; |
unsigned long m_data[ASIZ], m_tickcount; |
int m_addr, m_offset; |
bool m_syncd; |
|
QTRTEST_TB(void) { |
Verilated::traceEverOn(true); |
m_trace = NULL; |
m_qstage = new Vqtrstage; |
m_addr = 0; m_offset = 6; m_syncd = false; |
m_addr = 0; |
m_offset = 6; |
m_syncd = false; |
m_tickcount = 0; |
} |
|
void opentrace(const char *vcdname) { |
if (!m_trace) { |
m_trace = new VerilatedVcdC; |
m_qstage->trace(m_trace, 99); |
m_trace->open(vcdname); |
} |
} |
|
void closetrace(void) { |
if (m_trace) { |
m_trace->close(); |
delete m_trace; |
m_trace = NULL; |
} |
} |
|
void tick(void) { |
m_tickcount++; |
|
m_qstage->i_clk = 0; |
m_qstage->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount-2)); |
m_qstage->i_clk = 1; |
m_qstage->eval(); |
if (m_trace) m_trace->dump((uint64_t)(10ul*m_tickcount)); |
m_qstage->i_clk = 0; |
m_qstage->eval(); |
if (m_trace) { |
m_trace->dump((uint64_t)(10ul*m_tickcount+5)); |
m_trace->flush(); |
} |
|
m_qstage->i_sync = 0; |
} |
|
void cetick(void) { |
int nkce; |
|
tick(); |
nkce = (rand()&1); |
#ifdef FFT_CKPCE |
nkce += FFT_CKPCE; |
#endif |
if ((m_qstage->i_ce)&&(nkce>0)) { |
m_qstage->i_ce = 0; |
for(int kce = 1; kce < nkce; kce++) |
tick(); |
m_qstage->i_ce = 1; |
} |
} |
|
void reset(void) { |
m_qstage->i_ce = 0; |
m_qstage->i_rst = 1; |
m_qstage->i_reset = 1; |
tick(); |
m_qstage->i_ce = 0; |
m_qstage->i_rst = 0; |
m_qstage->i_reset = 0; |
tick(); |
|
m_addr = 0; m_offset = 6; m_syncd = false; |
102,18 → 151,22
} |
|
void check_results(void) { |
int ir0, ii0, ir1, ii1, ir2, ii2; |
int sumr, sumi, difr, difi, or0, oi0; |
bool fail = false; |
|
if ((!m_syncd)&&(m_qstage->o_sync)) { |
m_syncd = true; |
assert(m_addr == m_offset); |
m_offset = m_addr; |
printf("VALID-SYNC!!\n"); |
} |
|
if (!m_syncd) |
return; |
|
#ifdef DBLCLKFFT |
int ir0, ii0, ir1, ii1, ir2, ii2; |
|
ir0 = sbits(m_data[(m_addr-m_offset-1)&AMSK]>>IWIDTH, IWIDTH); |
ii0 = sbits(m_data[(m_addr-m_offset-1)&AMSK], IWIDTH); |
ir1 = sbits(m_data[(m_addr-m_offset )&AMSK]>>IWIDTH, IWIDTH); |
140,11 → 193,49
if (oi0 != difi) { |
printf("FAIL 4: oi0 != difi (%x(exp) != %x(sut))\n", difi, oi0); fail = true;} |
} |
#else |
int locn = (m_addr-m_offset)&AMSK; |
int ir1, ii1, ir3, ii3, ir5, ii5; |
|
if (m_qstage->o_sync != ((((m_addr-m_offset)&127) == 0)?1:0)) { |
printf("BAD O-SYNC, m_addr = %d, m_offset = %d\n", m_addr, m_offset); fail = true; |
ir5 = sbits(m_data[(m_addr-m_offset-2)&AMSK]>>IWIDTH, IWIDTH); |
ii5 = sbits(m_data[(m_addr-m_offset-2)&AMSK], IWIDTH); |
ir3 = sbits(m_data[(m_addr-m_offset )&AMSK]>>IWIDTH, IWIDTH); |
ii3 = sbits(m_data[(m_addr-m_offset )&AMSK], IWIDTH); |
ir1 = sbits(m_data[(m_addr-m_offset+2)&AMSK]>>IWIDTH, IWIDTH); |
ii1 = sbits(m_data[(m_addr-m_offset+2)&AMSK], IWIDTH); |
|
sumr = ir3 + ir1; |
sumi = ii3 + ii1; |
difr = ir5 - ir3; |
difi = ii5 - ii3; |
|
or0 = sbits(m_qstage->o_data >> OWIDTH, OWIDTH); |
oi0 = sbits(m_qstage->o_data, OWIDTH); |
|
if (0==((locn)&2)) { |
if (or0 != sumr) { |
printf("FAIL 1: or0 != sumr (%x(exp) != %x(sut))\n", sumr, or0); fail = true; |
} |
if (oi0 != sumi) { |
printf("FAIL 2: oi0 != sumi (%x(exp) != %x(sut))\n", sumi, oi0); fail = true;} |
} else if (2==((m_addr-m_offset)&3)) { |
if (or0 != difr) { |
printf("FAIL 3: or0 != difr (%x(exp) != %x(sut))\n", difr, or0); fail = true;} |
if (oi0 != difi) { |
printf("FAIL 4: oi0 != difi (%x(exp) != %x(sut))\n", difi, oi0); fail = true;} |
} else if (3==((m_addr-m_offset)&3)) { |
if (or0 != difi) { |
printf("FAIL 5: or0 != difi (%x(exp) != %x(sut))\n", difi, or0); fail = true;} |
if (oi0 != -difr) { |
printf("FAIL 6: oi0 != -difr (%x(exp) != %x(sut))\n", -difr, oi0); fail = true;} |
} |
|
// if (m_qstage->o_sync != ((((m_addr-m_offset)&127) == 0)?1:0)) { |
// printf("BAD O-SYNC, m_addr = %d, m_offset = %d\n", m_addr, m_offset); fail = true; |
// } |
#endif |
|
|
if (fail) |
exit(-1); |
} |
159,6 → 250,7
m_qstage->i_ce = 1; |
m_qstage->i_data = data; |
// m_qstage->i_sync = (((m_addr&127)==2)?1:0); |
// printf("DATA[%08x] = %08x ... ", m_addr, data); |
m_data[ (m_addr++)&AMSK] = data; |
tick(); |
|
172,7 → 264,11
m_qstage->diff_i, |
m_qstage->pipeline, |
m_qstage->iaddr, |
#ifdef DBLCLKFFT |
m_qstage->imem, |
#else |
m_qstage->imem[1], |
#endif |
m_qstage->wait_for_sync); |
|
check_results(); |
202,17 → 298,57
int16_t ir0, ii0, ir1, ii1, ir2, ii2; |
int32_t sumr, sumi, difr, difi; |
|
tb->opentrace("qtrstage.vcd"); |
tb->reset(); |
|
tb->test( 16, 0); |
tb->test( 16, 0); |
tb->sync(); |
|
tb->test( 8, 0); |
tb->test( 0, 0); |
tb->test( 0, 0); |
tb->test( 0, 0); |
|
tb->test( 0, 4); |
tb->test( 0, 0); |
tb->test( 0, 0); |
tb->test( 0, 0); |
|
tb->test( 0, 0); |
tb->test( 32, 0); |
tb->test( 0, 0); |
tb->test( 0, 0); |
|
tb->test( 0, 0); |
tb->test( 0, 64); |
tb->test( 0, 0); |
tb->test( 0, 0); |
|
tb->test( 0, 0); |
tb->test( 0, 0); |
tb->test(128, 0); |
tb->test( 0, 0); |
|
tb->test( 0, 0); |
tb->test( 0, 0); |
tb->test( 0,256); |
tb->test( 0, 0); |
|
tb->test( 0, 0); |
tb->test( 0, 0); |
tb->test( 0, 0); |
tb->test( 2, 0); |
|
tb->test( 0, 0); |
tb->test( 0, 0); |
tb->test( 0, 0); |
tb->test( 0, 1); |
|
tb->test( 0, 16); |
tb->test( 0, 16); |
tb->test( 16, 0); |
tb->test(-16, 0); |
tb->test( 0, 16); |
tb->test( 0,-16); |
|
for(int k=0; k<1060; k++) { |
tb->random_test(); |
/trunk/bench/formal/.gitignore
0,0 → 1,15
bitreverse |
laststage |
qtrstage |
hwbfly_one |
hwbfly_two |
hwbfly_three |
butterfly_one |
butterfly_two |
butterfly_three |
butterfly_ck1 |
butterfly_ck2_r0 |
butterfly_ck2_r1 |
butterfly_ck3_r0 |
butterfly_ck3_r1 |
butterfly_ck3_r2 |
/trunk/bench/formal/README.md
0,0 → 1,18
This directory contains several SymbiYosys scripts useful for |
formally verifying parts and pieces of the design. Admittedly, |
the entire design has yet to be formally verified; however, many |
components have been verified successfully. These include: |
|
- The butterflies, both the hardware enabled butterflies and the |
soft multiplies. |
|
- The penultimate (4-pt) stage of the FFT |
|
- The final stage (2-pt) of the FFT |
|
- The bitreverse |
|
My intention is not to place formal properties into the repository. |
Within the [defaults.h](../../sw/defaults.h) there's a |
``formal_property_flag`` used for controlling whether or not the |
formal properties are included in the RTL files. |
/trunk/bench/formal/abs_mpy.v
0,0 → 1,114
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: abs_mpy.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: This code has been modified from the mpyop.v file so as to |
// abstract the multiply that formal methods struggle so hard to |
// deal with. It also simplifies the interface so that (if enabled) |
// the multiply will return in 1-6 clocks, rather than the specified |
// number for the given architecture. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory. Run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module abs_mpy(i_a, i_b, o_result); |
parameter AW = 32, BW=32; |
parameter [0:0] OPT_SIGNED = 1'b1; |
input wire [(AW-1):0] i_a; |
input wire [(BW-1):0] i_b; |
output wire [(AW+BW-1):0] o_result; |
|
wire [(AW+BW-1):0] any_result; |
assign any_result = $anyseq; |
|
reg [AW-1:0] u_a; |
reg [BW-1:0] u_b; |
|
always @(*) |
begin |
u_a = ((i_a[AW-1])&&(OPT_SIGNED)) ? -i_a : i_a; |
u_b = ((i_b[BW-1])&&(OPT_SIGNED)) ? -i_b : i_b; |
end |
|
reg [(AW+BW-1):0] u_result; |
always @(*) |
if ((OPT_SIGNED)&&(any_result[AW+BW-1])) |
u_result = - { 1'b1, any_result }; |
else |
u_result = { 1'b0, any_result }; |
|
always @(*) |
begin |
// Constrain our result among many possibilities |
if ((i_a == 0)||(i_b == 0)) |
assume(any_result == 0); |
else if (OPT_SIGNED) |
assume(any_result[AW+BW-1] |
== (i_a[AW-1] ^ i_b[BW-1])); |
|
assume(u_result[AW+BW-1:BW] <= u_a); |
assume(u_result[AW+BW-1:AW] <= u_b); |
end |
|
genvar k; |
generate |
begin |
for(k=0; k<AW-1; k=k+1) |
begin |
always @(*) |
if (u_a == (1<<k)) |
assume(u_result == (u_b << k)); |
end |
|
for(k=0; k<BW; k=k+1) |
begin |
always @(*) |
if (u_b == (1<<k)) |
assume(u_result== (u_a << k)); |
end |
|
end endgenerate |
|
assign o_result = any_result; |
|
/* |
always @(*) |
if (i_a == 1) |
assert(o_result == {{(AW){i_b[BW-1]}}, i_b }); |
|
always @(*) |
if (i_b == 1) |
assert(o_result == {{(BW){i_a[AW-1]}}, i_a }); |
*/ |
endmodule |
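If the `assume` statements in abs_mpy.v were too strict, they could rule out genuine multiplier behavior and quietly weaken any proof that relies on them. One way to gain some confidence is to check them against an exact multiply for small widths. The C++ sketch below does this exhaustively for signed operands (`OPT_SIGNED = 1`). The function name and the widths used are illustrative assumptions, not part of the repository.

```cpp
#include <cstdint>

// Magnitude of a signed value, as u_a, u_b, and u_result compute it.
static uint64_t magnitude(int64_t v) { return (uint64_t)(v < 0 ? -v : v); }

// Hypothetical sanity check (not part of the repository): returns true if,
// for every AW-bit by BW-bit signed operand pair, the exact product meets
// each constraint abs_mpy.v places on any_result with OPT_SIGNED = 1.
bool abs_mpy_assumptions_hold(int AW, int BW) {
    const int64_t amin = -(INT64_C(1) << (AW - 1)), amax = -amin - 1;
    const int64_t bmin = -(INT64_C(1) << (BW - 1)), bmax = -bmin - 1;

    for (int64_t a = amin; a <= amax; a++)
    for (int64_t b = bmin; b <= bmax; b++) {
        const int64_t p = a * b;
        const uint64_t ua = magnitude(a), ub = magnitude(b), up = magnitude(p);

        // A zero operand forces a zero result
        if ((a == 0 || b == 0) && p != 0)
            return false;
        // Otherwise, the result's sign is the XOR of the operand signs
        if (a != 0 && b != 0 && ((p < 0) != ((a < 0) != (b < 0))))
            return false;
        // The upper bits of |p| are bounded by |a| and by |b|
        if ((up >> BW) > ua || (up >> AW) > ub)
            return false;
        // A power-of-two operand turns the multiply into a shift
        for (int k = 0; k < AW - 1; k++)
            if (ua == (UINT64_C(1) << k) && up != (ub << k))
                return false;
        for (int k = 0; k < BW; k++)
            if (ub == (UINT64_C(1) << k) && up != (ua << k))
                return false;
    }
    return true;
}
```

A call such as `abs_mpy_assumptions_hold(4, 5)` should return true. A false result would point to an assumption that excludes a real product.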
/trunk/bench/formal/bitreverse.sby
0,0 → 1,13
[options] |
mode prove |
depth 12 |
|
[engines] |
smtbmc |
|
[script] |
read_verilog -formal -DBITREVERSE bitreverse.v |
prep -top bitreverse |
|
[files] |
../../rtl/bitreverse.v |
/trunk/bench/formal/butterfly.sby
0,0 → 1,42
[tasks] |
ck1 |
ck2_r0 |
ck2_r1 |
ck3_r0 |
ck3_r1 |
|
[options] |
mode prove |
depth 30 |
|
[engines] |
smtbmc |
|
[script] |
read_verilog -formal -DHWBFLY abs_mpy.v |
read_verilog -formal -DHWBFLY convround.v |
read_verilog -formal -DHWBFLY longbimpy.v |
read_verilog -formal -DHWBFLY bimpy.v |
read_verilog -formal -DHWBFLY butterfly.v |
|
# While I'd love to change the width of the inputs and the coefficients, |
# doing so would adjust the width of the firmware multiplies, and so defeat |
# our purpose here. |
# ck1: chparam -set CKPCE 1 butterfly |
ck1: chparam -set CKPCE 1 -set CWIDTH 19 -set IWIDTH 15 butterfly |
# |
ck2_r0: chparam -set CKPCE 2 -set CWIDTH 20 -set IWIDTH 12 -set F_CHECK 1 butterfly |
ck2_r1: chparam -set CKPCE 2 -set CWIDTH 16 -set IWIDTH 6 -set F_CHECK 0 butterfly |
# |
ck3_r0: chparam -set CKPCE 3 -set CWIDTH 16 -set IWIDTH 12 -set F_CHECK 0 butterfly |
ck3_r1: chparam -set CKPCE 3 -set CWIDTH 18 -set IWIDTH 14 -set F_CHECK 1 butterfly |
ck3_r2: chparam -set CKPCE 3 -set CWIDTH 20 -set IWIDTH 16 -set F_CHECK 2 butterfly |
|
prep -top butterfly |
|
[files] |
abs_mpy.v |
../../rtl/convround.v |
../../rtl/bimpy.v |
../../rtl/longbimpy.v |
../../rtl/butterfly.v |
/trunk/bench/formal/hwbfly.sby
0,0 → 1,27
[tasks] |
one |
two |
three |
|
[options] |
mode prove |
depth 23 |
|
[engines] |
smtbmc |
|
[script] |
read_verilog -formal -DHWBFLY abs_mpy.v |
read_verilog -formal -DHWBFLY convround.v |
read_verilog -formal -DHWBFLY hwbfly.v |
|
one: chparam -set CKPCE 1 -set IWIDTH 4 -set CWIDTH 6 hwbfly |
two: chparam -set CKPCE 2 -set IWIDTH 4 -set CWIDTH 6 hwbfly |
three: chparam -set CKPCE 3 -set IWIDTH 4 -set CWIDTH 6 hwbfly |
|
prep -top hwbfly |
|
[files] |
abs_mpy.v |
../../rtl/convround.v |
../../rtl/hwbfly.v |
/trunk/bench/formal/laststage.sby
0,0 → 1,16
[options] |
mode prove |
depth 20 |
|
[engines] |
smtbmc yices |
|
[script] |
read_verilog -formal -DLASTSTAGE convround.v |
read_verilog -formal -DLASTSTAGE laststage.v |
chparam -set IWIDTH 3 -set OWIDTH 4 laststage |
prep -top laststage |
|
[files] |
../../rtl/laststage.v |
../../rtl/convround.v |
/trunk/bench/formal/qtrstage.sby
0,0 → 1,16
[options] |
mode prove |
depth 20 |
|
[engines] |
smtbmc boolector |
|
[script] |
read_verilog -formal -DQTRSTAGE convround.v |
read_verilog -formal -DQTRSTAGE qtrstage.v |
chparam -set IWIDTH 3 -set OWIDTH 4 qtrstage |
prep -top qtrstage |
|
[files] |
../../rtl/qtrstage.v |
../../rtl/convround.v |
/trunk/rtl/README.md
0,0 → 1,28
This directory contains a demonstration FFT design. |
|
Should you wish to use this core, I would recommend you run `fftgen` from the |
[sw](../sw) directory to create an FFT tailored to your own needs. |
|
In sum, from the top down, the modules are: |
|
- [fftmain](fftmain.v) is the top level FFT file. |
|
- [fftstage](fftstage.v) calculates one FFT stage |
|
- [hwbfly](hwbfly.v) implements a butterfly that uses the `*` operator |
for its multiply |
- [butterfly](butterfly.v) implements a butterfly that uses a logic |
multiply at the cost of more logic and a greater delay. |
|
- [longbimpy](longbimpy.v) is the logic binary multiply. |
- [bimpy](bimpy.v) multiplies a small set of bits together. It is a |
component of [longbimpy](longbimpy.v) |
|
- [qtrstage](qtrstage.v) is the 4-pt stage of the FFT |
|
- [laststage](laststage.v) is the 2-pt stage of the FFT |
|
- [bitreverse](bitreverse.v), the final step in the FFT, bit-reverses |
the outgoing data. |
|
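As a rough illustration of how these modules fit together, the sketch below lists the modules a sample passes through in a 2^lgsize-point FFT. It assumes one `fftstage` per radix-2 stage, from N points down to 8 points, followed by the 4-pt `qtrstage`, the 2-pt `laststage`, and finally `bitreverse`. The output of `fftgen` remains the authority on the actual structure, and the helper below is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the modules, in data-flow order, that a pipelined FFT
// of 2^lgsize points (lgsize >= 3) would pass samples through, assuming one
// fftstage per radix-2 stage from N points down to 8 points.
std::vector<std::string> fft_pipeline(int lgsize) {
    std::vector<std::string> stages;
    for (int lg = lgsize; lg > 2; lg--)
        stages.push_back("fftstage, " + std::to_string(1 << lg) + "-pt");
    stages.push_back("qtrstage, 4-pt");
    stages.push_back("laststage, 2-pt");
    stages.push_back("bitreverse");
    return stages;
}
```

Under this assumption, a 1024-point transform gives eight `fftstage` instances followed by `qtrstage`, `laststage`, and `bitreverse`.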
|
/trunk/rtl/bimpy.v
0,0 → 1,72
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: ../rtl/bimpy.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: A simple 2-bit multiply based upon the fact that LUTs allow |
// 6 bits of input. In other words, I could build a 3-bit |
// multiply from 6 LUTs (5 actually, since the first could have two |
// outputs). This would allow multiplication of three-bit digits, save |
// only for the fact that you would need two bits of carry. The bimpy |
// approach throttles back a bit and does a 2x2-bit multiply in a LUT, |
// guaranteeing that it will never carry more than one bit. While this |
// multiply is hardware independent (and can therefore still run under |
// Verilator), it is really motivated by trying to optimize for a |
// specific piece of hardware (Xilinx-7 series ...) that has at least |
// 4-input LUTs with carry chains. |
// |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module bimpy(i_clk, i_ce, i_a, i_b, o_r); |
parameter BW=18, // Number of bits in i_b |
LUTB=2; // Number of bits in i_a for our LUT multiply |
input i_clk, i_ce; |
input [(LUTB-1):0] i_a; |
input [(BW-1):0] i_b; |
output reg [(BW+LUTB-1):0] o_r; |
|
wire [(BW+LUTB-2):0] w_r; |
wire [(BW+LUTB-3):1] c; |
|
assign w_r = { ((i_a[1])?i_b:{(BW){1'b0}}), 1'b0 } |
^ { 1'b0, ((i_a[0])?i_b:{(BW){1'b0}}) }; |
assign c = { ((i_a[1])?i_b[(BW-2):0]:{(BW-1){1'b0}}) } |
& ((i_a[0])?i_b[(BW-1):1]:{(BW-1){1'b0}}); |
|
always @(posedge i_clk) |
if (i_ce) |
o_r <= w_r + { c, 2'b0 }; |
|
endmodule |
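The trick in bimpy.v is that the sum of two partial products can be split into a carry-free XOR and a separate carry term, because x + y = (x ^ y) + 2(x & y). With a 2-bit multiplier there are only two partial products, so each bit position produces at most one carry. The small C++ model below checks that this decomposition matches an ordinary product. The function name and the 64-bit container are assumptions made for illustration.

```cpp
#include <cstdint>

// Combinational model of bimpy.v with LUTB = 2. The register on o_r and the
// i_ce gating are not modeled. i_b is treated as unsigned, as in the RTL.
uint64_t bimpy_model(unsigned a2, uint64_t b) {
    // The two partial products, one per bit of the 2-bit operand
    const uint64_t x = (a2 & 2) ? (b << 1) : 0;  // i_a[1]: b shifted up one
    const uint64_t y = (a2 & 1) ? b : 0;         // i_a[0]: b as is

    // w_r: the bitwise sum of the partial products, ignoring carries
    const uint64_t w_r = x ^ y;
    // c: bits where both partial products are set; each generates a carry
    // into the next position, hence the extra shift ({c, 2'b0} in the RTL)
    const uint64_t carries = (x & y) << 1;

    // x + y == (x ^ y) + 2*(x & y)
    return w_r + carries;
}
```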
/trunk/rtl/bitreverse.v
0,0 → 1,325
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: bitreverse.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This module bitreverses a pipelined FFT input. Operation is |
// expected as follows: |
// |
// i_clk A running clock at whatever system speed is offered. |
// i_reset A synchronous reset signal, that resets all internals |
// i_ce If this is one, one pair of inputs is consumed and a |
// pair of outputs is produced. |
// i_in_0, i_in_1 |
// Two inputs to be consumed, each of width WIDTH. |
// o_out_0, o_out_1 |
// Two of the bitreversed outputs, also of the same |
// width, WIDTH. Of course, there is a delay from the |
// first input to the first output. For this purpose, |
// o_sync is present. |
// o_sync This will be a 1'b1 for the first value in any block. |
// Following a reset, this will only become 1'b1 once |
// the data has been loaded and is now valid. After that, |
// all outputs will be valid. |
// |
// 20150602 -- This module has undergone massive rework in order to |
// ensure that it uses resources efficiently. As a result, |
// it now optimizes nicely into block RAMs. As an unfortunate |
// side effect, it now passes its bench test (dblrev_tb) but |
// fails the integration bench test (fft_tb). |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
|
|
// |
// How do we do bit reversing at two samples per clock? Can we separate out |
// our work into eight memory banks, writing two banks at once and reading |
// another two banks in the same clock? |
// |
// mem[00xxx0] = s_0[n] |
// mem[00xxx1] = s_1[n] |
// o_0[n] = mem[10xxx0] |
// o_1[n] = mem[11xxx0] |
// ... |
// mem[01xxx0] = s_0[m] |
// mem[01xxx1] = s_1[m] |
// o_0[m] = mem[10xxx1] |
// o_1[m] = mem[11xxx1] |
// ... |
// mem[10xxx0] = s_0[n] |
// mem[10xxx1] = s_1[n] |
// o_0[n] = mem[00xxx0] |
// o_1[n] = mem[01xxx0] |
// ... |
// mem[11xxx0] = s_0[m] |
// mem[11xxx1] = s_1[m] |
// o_0[m] = mem[00xxx1] |
// o_1[m] = mem[01xxx1] |
// ... |
// |
// The answer is yes, we can, but we need to use four memory banks |
// to do it properly. These four banks are defined by the two bits |
// that determine the top and bottom of the correct address. Larger |
// FFTs would require more memories. |
// |
// |
module bitreverse(i_clk, i_reset, i_ce, i_in_0, i_in_1, |
o_out_0, o_out_1, o_sync); |
parameter LGSIZE=5, WIDTH=24; |
input i_clk, i_reset, i_ce; |
input [(2*WIDTH-1):0] i_in_0, i_in_1; |
output wire [(2*WIDTH-1):0] o_out_0, o_out_1; |
output reg o_sync; |
|
reg in_reset; |
reg [(LGSIZE-1):0] iaddr; |
wire [(LGSIZE-3):0] braddr; |
|
genvar k; |
generate for(k=0; k<LGSIZE-2; k=k+1) |
begin : gen_a_bit_reversed_value |
assign braddr[k] = iaddr[LGSIZE-3-k]; |
end endgenerate |
|
initial iaddr = 0; |
initial in_reset = 1'b1; |
initial o_sync = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
iaddr <= 0; |
in_reset <= 1'b1; |
o_sync <= 1'b0; |
end else if (i_ce) |
begin |
iaddr <= iaddr + { {(LGSIZE-1){1'b0}}, 1'b1 }; |
if (&iaddr[(LGSIZE-2):0]) |
in_reset <= 1'b0; |
if (in_reset) |
o_sync <= 1'b0; |
else |
o_sync <= ~(|iaddr[(LGSIZE-2):0]); |
end |
|
reg [(2*WIDTH-1):0] mem_e [0:((1<<(LGSIZE))-1)]; |
reg [(2*WIDTH-1):0] mem_o [0:((1<<(LGSIZE))-1)]; |
|
always @(posedge i_clk) |
if (i_ce) mem_e[iaddr] <= i_in_0; |
always @(posedge i_clk) |
if (i_ce) mem_o[iaddr] <= i_in_1; |
|
|
reg [(2*WIDTH-1):0] evn_out_0, evn_out_1, odd_out_0, odd_out_1; |
|
always @(posedge i_clk) |
if (i_ce) |
evn_out_0 <= mem_e[{!iaddr[LGSIZE-1],1'b0,braddr}]; |
always @(posedge i_clk) |
if (i_ce) |
evn_out_1 <= mem_e[{!iaddr[LGSIZE-1],1'b1,braddr}]; |
always @(posedge i_clk) |
if (i_ce) |
odd_out_0 <= mem_o[{!iaddr[LGSIZE-1],1'b0,braddr}]; |
always @(posedge i_clk) |
if (i_ce) |
odd_out_1 <= mem_o[{!iaddr[LGSIZE-1],1'b1,braddr}]; |
|
reg adrz; |
always @(posedge i_clk) |
if (i_ce) adrz <= iaddr[LGSIZE-2]; |
|
assign o_out_0 = (adrz)?odd_out_0:evn_out_0; |
assign o_out_1 = (adrz)?odd_out_1:evn_out_1; |
|
`ifdef FORMAL |
`ifdef BITREVERSE |
`define ASSUME assume |
`define ASSERT assert |
`else |
`define ASSUME assert |
`define ASSERT assume |
`endif |
|
reg f_past_valid; |
initial f_past_valid = 1'b0; |
always @(posedge i_clk) |
f_past_valid <= 1'b1; |
|
initial `ASSUME(i_reset); |
always @(posedge i_clk) |
if ((!f_past_valid)||($past(i_reset))) |
begin |
`ASSERT(iaddr == 0); |
`ASSERT(in_reset); |
`ASSERT(!o_sync); |
end |
`ifdef BITREVERSE |
always @(posedge i_clk) |
assume((i_ce)||($past(i_ce))||($past(i_ce,2))); |
`endif // BITREVERSE |
|
(* anyconst *) reg [LGSIZE-1:0] f_const_addr; |
wire [LGSIZE-3:0] f_reversed_addr; |
// reg [LGSIZE:0] f_now; |
reg f_addr_loaded_0, f_addr_loaded_1; |
reg [(2*WIDTH-1):0] f_data_0, f_data_1; |
wire f_writing, f_reading; |
|
generate for(k=0; k<LGSIZE-2; k=k+1) |
assign f_reversed_addr[k] = f_const_addr[LGSIZE-3-k]; |
endgenerate |
|
assign f_writing=(f_const_addr[LGSIZE-1]==iaddr[LGSIZE-1]); |
assign f_reading=(f_const_addr[LGSIZE-1]!=iaddr[LGSIZE-1]); |
initial f_addr_loaded_0 = 1'b0; |
initial f_addr_loaded_1 = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
f_addr_loaded_0 <= 1'b0; |
f_addr_loaded_1 <= 1'b0; |
end else if (i_ce) |
begin |
if (iaddr == f_const_addr) |
begin |
f_addr_loaded_0 <= 1'b1; |
f_addr_loaded_1 <= 1'b1; |
end |
|
if (f_reading) |
begin |
if ((braddr == f_const_addr[LGSIZE-3:0]) |
&&(iaddr[LGSIZE-2] == 1'b0)) |
f_addr_loaded_0 <= 1'b0; |
|
if ((braddr == f_const_addr[LGSIZE-3:0]) |
&&(iaddr[LGSIZE-2] == 1'b1)) |
f_addr_loaded_1 <= 1'b0; |
end |
end |
|
always @(posedge i_clk) |
if ((i_ce)&&(iaddr == f_const_addr)) |
begin |
f_data_0 <= i_in_0; |
f_data_1 <= i_in_1; |
`ASSERT(!f_addr_loaded_0); |
`ASSERT(!f_addr_loaded_1); |
end |
|
always @(posedge i_clk) |
if ((f_past_valid)&&(!$past(i_reset)) |
&&($past(f_addr_loaded_0))&&(!f_addr_loaded_0)) |
begin |
assert(!$past(iaddr[LGSIZE-2])); |
if (f_const_addr[LGSIZE-2]) |
assert(o_out_1 == f_data_0); |
else |
assert(o_out_0 == f_data_0); |
end |
|
always @(posedge i_clk) |
if ((f_past_valid)&&(!$past(i_reset)) |
&&($past(f_addr_loaded_1))&&(!f_addr_loaded_1)) |
begin |
assert($past(iaddr[LGSIZE-2])); |
if (f_const_addr[LGSIZE-2]) |
assert(o_out_1 == f_data_1); |
else |
assert(o_out_0 == f_data_1); |
end |
|
always @(*) |
`ASSERT(o_sync == ((iaddr[LGSIZE-2:0] == 1)&&(!in_reset))); |
|
// Before writing to a section, the loaded flags should be |
// zero |
always @(*) |
if (f_writing) |
begin |
`ASSERT(f_addr_loaded_0 == (iaddr[LGSIZE-2:0] |
> f_const_addr[LGSIZE-2:0])); |
`ASSERT(f_addr_loaded_1 == (iaddr[LGSIZE-2:0] |
> f_const_addr[LGSIZE-2:0])); |
end |
|
// If we were writing, and now we are reading, then both |
// f_addr_loaded flags must be set |
always @(posedge i_clk) |
if ((f_past_valid)&&(!$past(i_reset)) |
&&($past(f_writing))&&(f_reading)) |
begin |
`ASSERT(f_addr_loaded_0); |
`ASSERT(f_addr_loaded_1); |
end |
|
always @(*) |
if (f_writing) |
`ASSERT(f_addr_loaded_0 == f_addr_loaded_1); |
|
// When reading, and the loaded flag is zero, our pointer |
// must not have hit the address of interest yet |
always @(*) |
if ((!in_reset)&&(f_reading)) |
`ASSERT(f_addr_loaded_0 == |
((!iaddr[LGSIZE-2])&&(iaddr[LGSIZE-3:0] |
<= f_reversed_addr[LGSIZE-3:0]))); |
always @(*) |
if ((!in_reset)&&(f_reading)) |
`ASSERT(f_addr_loaded_1 == |
((!iaddr[LGSIZE-2])||(iaddr[LGSIZE-3:0] |
<= f_reversed_addr[LGSIZE-3:0]))); |
always @(*) |
if ((in_reset)&&(f_reading)) |
begin |
`ASSERT(!f_addr_loaded_0); |
`ASSERT(!f_addr_loaded_1); |
end |
|
always @(*) |
if(iaddr[LGSIZE-1]) |
`ASSERT(!in_reset); |
|
always @(*) |
if (f_addr_loaded_0) |
`ASSERT(mem_e[f_const_addr] == f_data_0); |
always @(*) |
if (f_addr_loaded_1) |
`ASSERT(mem_o[f_const_addr] == f_data_1); |
|
|
`endif // FORMAL |
endmodule |
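The permutation itself is simple. Most of this module's complexity comes from doing it at two samples per clock with double-buffered memory: the top bit of `iaddr` selects which half of each memory is being written while the other half is read. That is why the RTL reverses only `LGSIZE-2` address bits. The minimal C++ sketch below covers only the permutation, with hypothetical function names, and ignores the banking and timing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverses the low nbits bits of v. bitreverse.v builds braddr the same
// way, wire by wire: braddr[k] = iaddr[LGSIZE-3-k].
uint32_t bit_reverse(uint32_t v, int nbits) {
    uint32_t r = 0;
    for (int k = 0; k < nbits; k++)
        if (v & (1u << k))
            r |= 1u << (nbits - 1 - k);
    return r;
}

// Reorders one block of 2^lgsize samples so that out[n] = in[bit_reverse(n)].
// This models only the permutation, not the double buffering, the even/odd
// memory split, or the pipeline delay of the RTL. Assumes in.size() is
// exactly 1 << lgsize.
template <typename T>
std::vector<T> bit_reverse_order(const std::vector<T> &in, int lgsize) {
    std::vector<T> out(in.size());
    for (size_t n = 0; n < in.size(); n++)
        out[n] = in[bit_reverse((uint32_t)n, lgsize)];
    return out;
}
```

Because bit reversal is its own inverse, applying `bit_reverse_order` twice restores the original block.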
/trunk/rtl/butterfly.v
0,0 → 1,803
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: butterfly.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This routine calculates a butterfly for a decimation |
// in frequency version of an FFT. Specifically, given |
// complex Left and Right values together with a coefficient, the output |
// of this routine is given by: |
// |
// L' = L + R |
// R' = (L - R)*C |
// |
// The rest of the junk below handles timing (mostly), to make certain |
// that L' and R' reach the output at the same clock. Further, just to |
// make certain that is the case, an 'aux' input exists. This aux value |
// will come out of this routine synchronized to the values it came in |
// with. (i.e., L', R', and aux all have the same delay.) Hence, |
// a caller of this routine may set aux on the first input with valid |
// data, and then wait to see aux set on the output to know when to find |
// the first output with valid data. |
// |
// All bits are preserved until the very last clock, where any more bits |
// than OWIDTH will be quietly discarded. |
// |
// This design features no overflow checking. |
// |
// Notes: |
// CORDIC: |
// Much as we might like, we can't use a cordic here. |
// The goal is to accomplish an FFT, as defined, and a |
// CORDIC places a scale factor onto the data. Removing |
// the scale factor would cost two multiplies, which |
// is precisely what we are trying to avoid. |
// |
// |
// 3-MULTIPLIES: |
// It should also be possible to do this with three multiplies |
// and an extra two addition cycles. |
// |
// We want |
// R+I = (a + jb) * (c + jd) |
// R+I = (ac-bd) + j(ad+bc) |
// We multiply |
// P1 = ac |
// P2 = bd |
// P3 = (a+b)(c+d) |
// Then |
// R+I=(P1-P2)+j(P3-P2-P1) |
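// |
// For example, with (a + jb) = (3 + j2) and (c + jd) = (5 + j7): |
// P1 = 3*5 = 15, P2 = 2*7 = 14, P3 = (3+2)*(5+7) = 60 |
// R+I = (15-14) + j(60-14-15) = 1 + j31 |
// which agrees with the direct form (ac-bd) + j(ad+bc) |
// = (15-14) + j(21+10), at the cost of 3 multiplies instead of 4. |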
// |
// WIDTHS: |
// On multiplying an X width number by a |
// Y width number, X>Y, the result should be (X+Y) |
// bits, right? |
// -2^(X-1) <= a <= 2^(X-1) - 1 |
// -2^(Y-1) <= b <= 2^(Y-1) - 1 |
// (-2^(Y-1))*(2^(X-1)-1) <= ab <= 2^(X-1)2^(Y-1) |
// -2^(X+Y-2)+2^(Y-1) <= ab <= 2^(X+Y-2) <= 2^(X+Y-1) - 1 |
// -2^(X+Y-1) <= ab <= 2^(X+Y-1)-1 |
// YUP! But just barely. Do this and you'll really want |
// to drop a bit, although you will risk overflow in so |
// doing. |
// |
// 20150602 -- The sync logic lines have been completely redone. The |
// synchronization lines no longer go through the FIFO with the |
// left hand sum, but are kept out of memory. This allows the |
// butterfly to use more optimal memory resources, while also |
// guaranteeing that the sync lines can be properly reset upon |
// any reset signal. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module butterfly(i_clk, i_reset, i_ce, i_coef, i_left, i_right, i_aux, |
o_left, o_right, o_aux); |
// Public changeable parameters ... |
parameter IWIDTH=16,CWIDTH=20,OWIDTH=17; |
parameter SHIFT=0; |
// The minimum number of clocks per i_ce: i_ce may be high at |
// most once every CKPCE clocks. The actual number can be more, |
// but the algorithm depends upon at least this many for its |
// extra internal processing. |
parameter CKPCE=1; |
// |
// Local/derived parameters that are calculated from the above |
// params. Apart from algorithmic changes below, these should not |
// be adjusted. |
// |
// The first step is to calculate how many clocks our multiply |
// takes to return an answer. The time in the multiply depends |
// upon the input with the fewest bits--to keep the pipeline |
// depth short. So, let's find that fewest number of bits here. |
localparam MXMPYBITS = |
((IWIDTH+2)>(CWIDTH+1)) ? (CWIDTH+1) : (IWIDTH + 2); |
// |
// Given this "fewest" number of bits, we can calculate the |
// number of clocks the multiply itself will take. |
localparam MPYDELAY=((MXMPYBITS+1)/2)+2; |
// |
// When CKPCE > 1, the multiply delay isn't necessarily the |
// delay felt by this algorithm, as measured in i_ce's. In |
// particular, since the multiply can perform more than one |
// operation per i_ce, it can appear to finish "faster". |
// Since most of the logic in this core operates at the slower |
// i_ce rate, we'll need to map that speed into the number of |
// slower i_ce ticks that it takes. |
localparam LCLDELAY = (CKPCE == 1) ? MPYDELAY |
: (CKPCE == 2) ? (MPYDELAY/2+2) |
: (MPYDELAY/3 + 2); |
localparam LGDELAY = (MPYDELAY>64) ? 7 |
: (MPYDELAY > 32) ? 6 |
: (MPYDELAY > 16) ? 5 |
: (MPYDELAY > 8) ? 4 |
: (MPYDELAY > 4) ? 3 |
: 2; |
localparam AUXLEN=(LCLDELAY+3); |
localparam MPYREMAINDER = MPYDELAY - CKPCE*(MPYDELAY/CKPCE); |
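// Worked example, using the default parameters above |
// (IWIDTH=16, CWIDTH=20): MXMPYBITS = min(18,21) = 18 and |
// MPYDELAY = (19/2)+2 = 11 clocks, so LGDELAY = 4, while |
// LCLDELAY = 11, 7, or 5 (and MPYREMAINDER = 0, 1, or 2) |
// for CKPCE = 1, 2, or 3 respectively. |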
|
|
input i_clk, i_reset, i_ce; |
input [(2*CWIDTH-1):0] i_coef; |
input [(2*IWIDTH-1):0] i_left, i_right; |
input i_aux; |
output wire [(2*OWIDTH-1):0] o_left, o_right; |
output reg o_aux; |
|
reg [(2*IWIDTH-1):0] r_left, r_right; |
reg [(2*CWIDTH-1):0] r_coef, r_coef_2; |
wire signed [(IWIDTH-1):0] r_left_r, r_left_i, r_right_r, r_right_i; |
assign r_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)]; |
assign r_left_i = r_left[ (IWIDTH-1):0]; |
assign r_right_r = r_right[(2*IWIDTH-1):(IWIDTH)]; |
assign r_right_i = r_right[(IWIDTH-1):0]; |
|
reg signed [(IWIDTH):0] r_sum_r, r_sum_i, r_dif_r, r_dif_i; |
|
reg [(LGDELAY-1):0] fifo_addr; |
wire [(LGDELAY-1):0] fifo_read_addr; |
assign fifo_read_addr = fifo_addr - LCLDELAY[(LGDELAY-1):0]; |
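// For example (illustrative values only): with LGDELAY=4 and |
// LCLDELAY=11, a write address of 3 reads back from address |
// (3-11) mod 16 = 8, i.e. the entry written 11 i_ce's earlier. |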
reg [(2*IWIDTH+1):0] fifo_left [ 0:((1<<LGDELAY)-1)]; |
|
// Set up the input to the multiply |
always @(posedge i_clk) |
if (i_ce) |
begin |
// One clock just latches the inputs |
r_left <= i_left; // No change in # of bits |
r_right <= i_right; |
r_coef <= i_coef; |
// Next clock adds/subtracts |
r_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits |
r_sum_i <= r_left_i + r_right_i; |
r_dif_r <= r_left_r - r_right_r; |
r_dif_i <= r_left_i - r_right_i; |
// Other inputs are simply delayed on second clock |
r_coef_2<= r_coef; |
end |
|
// Don't forget to record the even side, since it doesn't need |
// to be multiplied, but yet we still need the results in sync |
// with the answer when it is ready. |
initial fifo_addr = 0; |
always @(posedge i_clk) |
if (i_reset) |
fifo_addr <= 0; |
else if (i_ce) |
// Need to delay the sum side--nothing else happens |
// to it, but it needs to stay synchronized with the |
// right side. |
fifo_addr <= fifo_addr + 1; |
|
always @(posedge i_clk) |
if (i_ce) |
fifo_left[fifo_addr] <= { r_sum_r, r_sum_i }; |
|
wire signed [(CWIDTH-1):0] ir_coef_r, ir_coef_i; |
assign ir_coef_r = r_coef_2[(2*CWIDTH-1):CWIDTH]; |
assign ir_coef_i = r_coef_2[(CWIDTH-1):0]; |
wire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] p_one, p_two, p_three; |
|
|
// The multiply output width is always the sum of the widths |
// of the two inputs. ALWAYS. This is independent of the number of |
// bits in p_one, p_two, or p_three. These values needed to |
// accumulate a bit (or two) each. However, this approach to a |
// three multiply complex multiply cannot increase the total |
// number of bits in our final output. We'll take care of |
// dropping back down to the proper width, OWIDTH, in our routine |
// below. |
|
|
// We accomplish here "Karatsuba" multiplication. That is, |
// by doing three multiplies we accomplish the work of four. |
// Let's prove to ourselves that this works ... We wish to |
// multiply: (a+jb) * (c+jd), where a+jb is given by |
// a + jb = r_dif_r + j r_dif_i, and |
// c + jd = ir_coef_r + j ir_coef_i. |
// We do this by calculating the intermediate products P1, P2, |
// and P3 as |
// P1 = ac |
// P2 = bd |
// P3 = (a + b) * (c + d) |
// and then complete our final answer with |
// ac - bd = P1 - P2 (this checks) |
// ad + bc = P3 - P2 - P1 |
// = (ac + bc + ad + bd) - bd - ac |
// = bc + ad (this checks) |
|
|
// Select the multiply architecture based upon CKPCE: three |
// multiplies when CKPCE <= 1, two multiplies shared over two |
// clocks when CKPCE == 2, or a single multiply shared over |
// three clocks when CKPCE == 3. |
generate if (CKPCE <= 1) |
begin |
|
wire [(CWIDTH):0] p3c_in; |
wire [(IWIDTH+1):0] p3d_in; |
assign p3c_in = ir_coef_i + ir_coef_r; |
assign p3d_in = r_dif_r + r_dif_i; |
|
// We need to pad these first two multiplies by an extra |
// bit just to keep them aligned with the third, |
// simpler, multiply. |
longbimpy #(CWIDTH+1,IWIDTH+2) p1(i_clk, i_ce, |
{ir_coef_r[CWIDTH-1],ir_coef_r}, |
{r_dif_r[IWIDTH],r_dif_r}, p_one); |
longbimpy #(CWIDTH+1,IWIDTH+2) p2(i_clk, i_ce, |
{ir_coef_i[CWIDTH-1],ir_coef_i}, |
{r_dif_i[IWIDTH],r_dif_i}, p_two); |
longbimpy #(CWIDTH+1,IWIDTH+2) p3(i_clk, i_ce, |
p3c_in, p3d_in, p_three); |
|
end else if (CKPCE == 2) |
begin : CKPCE_TWO |
// Coefficient multiply inputs |
reg [2*(CWIDTH)-1:0] mpy_pipe_c; |
// Data multiply inputs |
reg [2*(IWIDTH+1)-1:0] mpy_pipe_d; |
wire signed [(CWIDTH-1):0] mpy_pipe_vc; |
wire signed [(IWIDTH):0] mpy_pipe_vd; |
// |
reg signed [(CWIDTH+1)-1:0] mpy_cof_sum; |
reg signed [(IWIDTH+2)-1:0] mpy_dif_sum; |
|
assign mpy_pipe_vc = mpy_pipe_c[2*(CWIDTH)-1:CWIDTH]; |
assign mpy_pipe_vd = mpy_pipe_d[2*(IWIDTH+1)-1:IWIDTH+1]; |
|
reg mpy_pipe_v; |
reg ce_phase; |
|
reg signed [(CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out; |
reg signed [IWIDTH+CWIDTH+3-1:0] longmpy; |
|
|
initial ce_phase = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
ce_phase <= 1'b0; |
else if (i_ce) |
ce_phase <= 1'b1; |
else |
ce_phase <= 1'b0; |
|
always @(*) |
mpy_pipe_v = (i_ce)||(ce_phase); |
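// Illustrative timing (assuming i_ce on every other clock): |
// clock: t0 t1 t2 t3 |
// i_ce: 1 0 1 0 |
// ce_phase: 0 1 0 1 |
// mpy_pipe_v: 1 1 1 1 |
// The pipeline is loaded on the ce_phase clocks and shifted |
// on the i_ce clocks, so mpy1 sees the real terms and then |
// the imaginary terms, while mpy0 handles the P3 sums. |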
|
always @(posedge i_clk) |
if (ce_phase) |
begin |
mpy_pipe_c[2*CWIDTH-1:0] <= |
{ ir_coef_r, ir_coef_i }; |
mpy_pipe_d[2*(IWIDTH+1)-1:0] <= |
{ r_dif_r, r_dif_i }; |
|
mpy_cof_sum <= ir_coef_i + ir_coef_r; |
mpy_dif_sum <= r_dif_r + r_dif_i; |
|
end else if (i_ce) |
begin |
mpy_pipe_c[2*(CWIDTH)-1:0] <= { |
mpy_pipe_c[(CWIDTH)-1:0], {(CWIDTH){1'b0}} }; |
mpy_pipe_d[2*(IWIDTH+1)-1:0] <= { |
mpy_pipe_d[(IWIDTH+1)-1:0], {(IWIDTH+1){1'b0}} }; |
end |
|
longbimpy #(CWIDTH+1,IWIDTH+2) mpy0(i_clk, mpy_pipe_v, |
mpy_cof_sum, mpy_dif_sum, longmpy); |
|
longbimpy #(CWIDTH+1,IWIDTH+2) mpy1(i_clk, mpy_pipe_v, |
{ mpy_pipe_vc[CWIDTH-1], mpy_pipe_vc }, |
{ mpy_pipe_vd[IWIDTH ], mpy_pipe_vd }, |
mpy_pipe_out); |
|
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0] |
rp_one, rp_two, rp_three, |
rp2_one, rp2_two, rp2_three; |
|
always @(posedge i_clk) |
if (((i_ce)&&(!MPYDELAY[0])) |
||((ce_phase)&&(MPYDELAY[0]))) |
rp_one <= mpy_pipe_out; |
always @(posedge i_clk) |
if (((i_ce)&&(MPYDELAY[0])) |
||((ce_phase)&&(!MPYDELAY[0]))) |
rp_two <= mpy_pipe_out; |
always @(posedge i_clk) |
if (i_ce) |
rp_three <= longmpy; |
|
// Our outputs *MUST* be set on a clock where i_ce is |
// true for the following logic to work. Make that |
// happen here. |
always @(posedge i_clk) |
if (i_ce) |
rp2_one<= rp_one; |
always @(posedge i_clk) |
if (i_ce) |
rp2_two <= rp_two; |
always @(posedge i_clk) |
if (i_ce) |
rp2_three<= rp_three; |
|
assign p_one = rp2_one; |
assign p_two = (!MPYDELAY[0])? rp2_two : rp_two; |
assign p_three = ( MPYDELAY[0])? rp_three : rp2_three; |
|
// verilator lint_off UNUSED |
wire [2*(IWIDTH+CWIDTH+3)-1:0] unused; |
assign unused = { rp2_two, rp2_three }; |
// verilator lint_on UNUSED |
|
end else if (CKPCE <= 3) |
begin : CKPCE_THREE |
// Coefficient multiply inputs |
reg [3*(CWIDTH+1)-1:0] mpy_pipe_c; |
// Data multiply inputs |
reg [3*(IWIDTH+2)-1:0] mpy_pipe_d; |
wire signed [(CWIDTH):0] mpy_pipe_vc; |
wire signed [(IWIDTH+1):0] mpy_pipe_vd; |
|
assign mpy_pipe_vc = mpy_pipe_c[3*(CWIDTH+1)-1:2*(CWIDTH+1)]; |
assign mpy_pipe_vd = mpy_pipe_d[3*(IWIDTH+2)-1:2*(IWIDTH+2)]; |
|
reg mpy_pipe_v; |
reg [2:0] ce_phase; |
|
reg signed [ (CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out; |
|
initial ce_phase = 3'b011; |
always @(posedge i_clk) |
if (i_reset) |
ce_phase <= 3'b011; |
else if (i_ce) |
ce_phase <= 3'b000; |
else if (ce_phase != 3'b011) |
ce_phase <= ce_phase + 1'b1; |
|
always @(*) |
mpy_pipe_v = (i_ce)||(ce_phase < 3'b010); |
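// Illustrative timing, starting from reset (ce_phase=3'b011) |
// and assuming i_ce on every fourth clock: |
// clock: t0 t1 t2 t3 t4 |
// i_ce: 1 0 0 0 1 |
// ce_phase: 011 000 001 010 011 |
// mpy_pipe_v: 1 1 1 0 1 |
// The single multiply is thus enabled for three clocks per |
// i_ce, matching the three products P1, P2, and P3. |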
|
always @(posedge i_clk) |
if (ce_phase == 3'b000) |
begin |
// Second clock |
mpy_pipe_c[3*(CWIDTH+1)-1:(CWIDTH+1)] <= { |
ir_coef_r[CWIDTH-1], ir_coef_r, |
ir_coef_i[CWIDTH-1], ir_coef_i }; |
mpy_pipe_c[CWIDTH:0] <= ir_coef_i + ir_coef_r; |
mpy_pipe_d[3*(IWIDTH+2)-1:(IWIDTH+2)] <= { |
r_dif_r[IWIDTH], r_dif_r, |
r_dif_i[IWIDTH], r_dif_i }; |
mpy_pipe_d[(IWIDTH+2)-1:0] <= r_dif_r + r_dif_i; |
|
end else if (mpy_pipe_v) |
begin |
mpy_pipe_c[3*(CWIDTH+1)-1:0] <= { |
mpy_pipe_c[2*(CWIDTH+1)-1:0], {(CWIDTH+1){1'b0}} }; |
mpy_pipe_d[3*(IWIDTH+2)-1:0] <= { |
mpy_pipe_d[2*(IWIDTH+2)-1:0], {(IWIDTH+2){1'b0}} }; |
end |
|
longbimpy #(CWIDTH+1,IWIDTH+2) mpy(i_clk, mpy_pipe_v, |
mpy_pipe_vc, mpy_pipe_vd, mpy_pipe_out); |
|
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0] |
rp_one, rp_two, rp_three, |
rp2_one, rp2_two, rp2_three, |
rp3_one; |
|
always @(posedge i_clk) |
if (MPYREMAINDER == 0) |
begin |
|
if (i_ce) |
rp_two <= mpy_pipe_out; |
else if (ce_phase == 3'b000) |
rp_three <= mpy_pipe_out; |
else if (ce_phase == 3'b001) |
rp_one <= mpy_pipe_out; |
|
end else if (MPYREMAINDER == 1) |
begin |
|
if (i_ce) |
rp_one <= mpy_pipe_out; |
else if (ce_phase == 3'b000) |
rp_two <= mpy_pipe_out; |
else if (ce_phase == 3'b001) |
rp_three <= mpy_pipe_out; |
|
end else // if (MPYREMAINDER == 2) |
begin |
|
if (i_ce) |
rp_three <= mpy_pipe_out; |
else if (ce_phase == 3'b000) |
rp_one <= mpy_pipe_out; |
else if (ce_phase == 3'b001) |
rp_two <= mpy_pipe_out; |
|
end |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
rp2_one <= rp_one; |
rp2_two <= rp_two; |
rp2_three <= (MPYREMAINDER == 2) ? mpy_pipe_out : rp_three; |
rp3_one <= (MPYREMAINDER == 0) ? rp2_one : rp_one; |
end |
assign p_one = rp3_one; |
assign p_two = rp2_two; |
assign p_three = rp2_three; |
|
end endgenerate |
// These values are held in memory and delayed during the |
// multiply. Here, we recover them. During the multiply, |
// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...}, |
// therefore, the left_x values need to be shifted left by |
// CWIDTH-2 as well. The additional bits come from a sign |
// extension. |
wire signed [(IWIDTH+CWIDTH):0] fifo_i, fifo_r; |
reg [(2*IWIDTH+1):0] fifo_read; |
assign fifo_r = { {2{fifo_read[2*(IWIDTH+1)-1]}}, fifo_read[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1'b0}} }; |
assign fifo_i = { {2{fifo_read[(IWIDTH+1)-1]}}, fifo_read[((IWIDTH+1)-1):0], {(CWIDTH-2){1'b0}} }; |
|
|
reg signed [(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i; |
|
// Let's do some rounding and remove unnecessary bits. |
// We have (IWIDTH+CWIDTH+3) bits here, we need to drop down to |
// OWIDTH, and SHIFT by SHIFT bits in the process. The trick is |
// that we don't need (IWIDTH+CWIDTH+3) bits. We've accumulated |
// them, but the actual values will never fill all these bits. |
// In particular, we only need: |
// IWIDTH bits for the input |
// +1 bit for the add/subtract |
// +CWIDTH bits for the coefficient multiply |
// +1 bit for the add/subtract in the complex multiply |
// ------ |
// (IWIDTH+CWIDTH+2) bits at full precision. |
// |
// However, the coefficient multiply multiplied by a maximum value |
// of 2^(CWIDTH-2). Thus, we only have |
// IWIDTH bits for the input |
// +1 bit for the add/subtract |
// +CWIDTH-2 bits for the coefficient multiply |
// +1 (optional) bit for the add/subtract in the cpx mpy |
// -------- (This last bit may be shifted out.) |
// (IWIDTH+CWIDTH) valid output bits. |
// Now, if the user wants to keep any extras of these (via OWIDTH), |
// or wishes to arbitrarily shift some of these off (via SHIFT), |
// we accomplish that here. |
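// Worked example with the default IWIDTH=16, CWIDTH=20: the |
// products are carried in 16+20+3 = 39 bits, full precision |
// needs 16+20+2 = 38 bits, and only 16+20 = 36 bits carry |
// valid output information. |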
|
wire signed [(OWIDTH-1):0] rnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i; |
|
wire signed [(CWIDTH+IWIDTH+3-1):0] left_sr, left_si; |
assign left_sr = { {(2){fifo_r[(IWIDTH+CWIDTH)]}}, fifo_r }; |
assign left_si = { {(2){fifo_i[(IWIDTH+CWIDTH)]}}, fifo_i }; |
|
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_r(i_clk, i_ce, |
left_sr, rnd_left_r); |
|
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_i(i_clk, i_ce, |
left_si, rnd_left_i); |
|
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce, |
mpy_r, rnd_right_r); |
|
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce, |
mpy_i, rnd_right_i); |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
// First clock, recover all values |
fifo_read <= fifo_left[fifo_read_addr]; |
// These values are IWIDTH+CWIDTH+3 bits wide |
// although they only need to be (IWIDTH+1) |
// + (CWIDTH) bits wide. (We've got two |
// extra bits we need to get rid of.) |
mpy_r <= p_one - p_two; |
mpy_i <= p_three - p_one - p_two; |
end |
|
reg [(AUXLEN-1):0] aux_pipeline; |
initial aux_pipeline = 0; |
always @(posedge i_clk) |
if (i_reset) |
aux_pipeline <= 0; |
else if (i_ce) |
aux_pipeline <= { aux_pipeline[(AUXLEN-2):0], i_aux }; |
|
initial o_aux = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
o_aux <= 1'b0; |
else if (i_ce) |
begin |
// Second clock, latch for final clock |
o_aux <= aux_pipeline[AUXLEN-1]; |
end |
|
// As a final step, we pack each output as two two's complement |
// numbers in a single word of (2*OWIDTH) bits, with the top half |
// being the real portion and the bottom half being the |
// imaginary portion. |
assign o_left = { rnd_left_r, rnd_left_i }; |
assign o_right= { rnd_right_r,rnd_right_i}; |
|
`ifdef VERILATOR |
`define FORMAL |
`endif |
`ifdef FORMAL |
localparam F_LGDEPTH = (AUXLEN > 64) ? 7 |
: (AUXLEN > 32) ? 6 |
: (AUXLEN > 16) ? 5 |
: (AUXLEN > 8) ? 4 |
: (AUXLEN > 4) ? 3 : 2; |
|
localparam F_DEPTH = AUXLEN; |
localparam [F_LGDEPTH-1:0] F_D = F_DEPTH[F_LGDEPTH-1:0]-1; |
|
reg signed [IWIDTH-1:0] f_dlyleft_r [0:F_DEPTH-1]; |
reg signed [IWIDTH-1:0] f_dlyleft_i [0:F_DEPTH-1]; |
reg signed [IWIDTH-1:0] f_dlyright_r [0:F_DEPTH-1]; |
reg signed [IWIDTH-1:0] f_dlyright_i [0:F_DEPTH-1]; |
reg signed [CWIDTH-1:0] f_dlycoeff_r [0:F_DEPTH-1]; |
reg signed [CWIDTH-1:0] f_dlycoeff_i [0:F_DEPTH-1]; |
reg signed [F_DEPTH-1:0] f_dlyaux; |
|
initial f_dlyaux[0] = 0; |
always @(posedge i_clk) |
if (i_reset) |
f_dlyaux <= 0; |
else if (i_ce) |
f_dlyaux <= { f_dlyaux[F_DEPTH-2:0], i_aux }; |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
f_dlyleft_r[0] <= i_left[ (2*IWIDTH-1):IWIDTH]; |
f_dlyleft_i[0] <= i_left[ ( IWIDTH-1):0]; |
f_dlyright_r[0] <= i_right[(2*IWIDTH-1):IWIDTH]; |
f_dlyright_i[0] <= i_right[( IWIDTH-1):0]; |
f_dlycoeff_r[0] <= i_coef[ (2*CWIDTH-1):CWIDTH]; |
f_dlycoeff_i[0] <= i_coef[ ( CWIDTH-1):0]; |
end |
|
genvar k; |
generate for(k=1; k<F_DEPTH; k=k+1) |
begin : F_PROPAGATE_DELAY_LINES |
|
|
always @(posedge i_clk) |
if (i_ce) |
begin |
f_dlyleft_r[k] <= f_dlyleft_r[ k-1]; |
f_dlyleft_i[k] <= f_dlyleft_i[ k-1]; |
f_dlyright_r[k] <= f_dlyright_r[k-1]; |
f_dlyright_i[k] <= f_dlyright_i[k-1]; |
f_dlycoeff_r[k] <= f_dlycoeff_r[k-1]; |
f_dlycoeff_i[k] <= f_dlycoeff_i[k-1]; |
end |
|
end endgenerate |
|
`ifndef VERILATOR |
always @(posedge i_clk) |
if ((!$past(i_ce))&&(!$past(i_ce,2))&&(!$past(i_ce,3)) |
&&(!$past(i_ce,4))) |
assume(i_ce); |
|
generate if (CKPCE <= 1) |
begin |
|
// i_ce is allowed to be anything in this mode |
|
end else if (CKPCE == 2) |
begin : F_CKPCE_TWO |
|
always @(posedge i_clk) |
if ($past(i_ce)) |
assume(!i_ce); |
|
end else if (CKPCE == 3) |
begin : F_CKPCE_THREE |
|
always @(posedge i_clk) |
if (($past(i_ce))||($past(i_ce,2))) |
assume(!i_ce); |
|
end endgenerate |
`endif |
|
reg [F_LGDEPTH:0] f_startup_counter; |
initial f_startup_counter = 0; |
always @(posedge i_clk) |
if (i_reset) |
f_startup_counter <= 0; |
else if ((i_ce)&&(!(&f_startup_counter))) |
f_startup_counter <= f_startup_counter + 1; |
|
reg signed [IWIDTH:0] f_sumr, f_sumi; |
always @(*) |
begin |
f_sumr = f_dlyleft_r[F_D] + f_dlyright_r[F_D]; |
f_sumi = f_dlyleft_i[F_D] + f_dlyright_i[F_D]; |
end |
|
wire signed [IWIDTH+CWIDTH+3-1:0] f_sumrx, f_sumix; |
assign f_sumrx = { {(4){f_sumr[IWIDTH]}}, f_sumr, {(CWIDTH-2){1'b0}} }; |
assign f_sumix = { {(4){f_sumi[IWIDTH]}}, f_sumi, {(CWIDTH-2){1'b0}} }; |
|
reg signed [IWIDTH:0] f_difr, f_difi; |
always @(*) |
begin |
f_difr = f_dlyleft_r[F_D] - f_dlyright_r[F_D]; |
f_difi = f_dlyleft_i[F_D] - f_dlyright_i[F_D]; |
end |
|
wire signed [IWIDTH+CWIDTH+3-1:0] f_difrx, f_difix; |
assign f_difrx = { {(CWIDTH+2){f_difr[IWIDTH]}}, f_difr }; |
assign f_difix = { {(CWIDTH+2){f_difi[IWIDTH]}}, f_difi }; |
|
wire signed [IWIDTH+CWIDTH+3-1:0] f_widecoeff_r, f_widecoeff_i; |
assign f_widecoeff_r ={ {(IWIDTH+3){f_dlycoeff_r[F_D][CWIDTH-1]}}, |
f_dlycoeff_r[F_D] }; |
assign f_widecoeff_i ={ {(IWIDTH+3){f_dlycoeff_i[F_D][CWIDTH-1]}}, |
f_dlycoeff_i[F_D] }; |
|
always @(posedge i_clk) |
if (f_startup_counter > {1'b0, F_D}) |
begin |
assert(aux_pipeline == f_dlyaux); |
assert(left_sr == f_sumrx); |
assert(left_si == f_sumix); |
assert(aux_pipeline[AUXLEN-1] == f_dlyaux[F_D]); |
|
if ((f_difr == 0)&&(f_difi == 0)) |
begin |
assert(mpy_r == 0); |
assert(mpy_i == 0); |
end else if ((f_dlycoeff_r[F_D] == 0) |
&&(f_dlycoeff_i[F_D] == 0)) |
begin |
assert(mpy_r == 0); |
assert(mpy_i == 0); |
end |
|
if ((f_dlycoeff_r[F_D] == 1)&&(f_dlycoeff_i[F_D] == 0)) |
begin |
assert(mpy_r == f_difrx); |
assert(mpy_i == f_difix); |
end |
|
if ((f_dlycoeff_r[F_D] == 0)&&(f_dlycoeff_i[F_D] == 1)) |
begin |
assert(mpy_r == -f_difix); |
assert(mpy_i == f_difrx); |
end |
|
if ((f_difr == 1)&&(f_difi == 0)) |
begin |
assert(mpy_r == f_widecoeff_r); |
assert(mpy_i == f_widecoeff_i); |
end |
|
if ((f_difr == 0)&&(f_difi == 1)) |
begin |
assert(mpy_r == -f_widecoeff_i); |
assert(mpy_i == f_widecoeff_r); |
end |
end |
|
// Let's see if we can improve our performance at all by |
// moving our test one clock earlier. If nothing else, it should |
// help induction finish one (or more) clocks earlier than |
// otherwise. |
|
|
reg signed [IWIDTH:0] f_predifr, f_predifi; |
always @(*) |
begin |
f_predifr = f_dlyleft_r[F_D-1] - f_dlyright_r[F_D-1]; |
f_predifi = f_dlyleft_i[F_D-1] - f_dlyright_i[F_D-1]; |
end |
|
wire signed [IWIDTH+CWIDTH+3-1:0] f_predifrx, f_predifix; |
assign f_predifrx = { {(CWIDTH+2){f_predifr[IWIDTH]}}, f_predifr }; |
assign f_predifix = { {(CWIDTH+2){f_predifi[IWIDTH]}}, f_predifi }; |
|
reg signed [CWIDTH:0] f_sumcoef; |
reg signed [IWIDTH+1:0] f_sumdiff; |
always @(*) |
begin |
f_sumcoef = f_dlycoeff_r[F_D-1] + f_dlycoeff_i[F_D-1]; |
f_sumdiff = f_predifr + f_predifi; |
end |
|
// Induction helpers |
always @(posedge i_clk) |
if (f_startup_counter >= { 1'b0, F_D }) |
begin |
if (f_dlycoeff_r[F_D-1] == 0) |
assert(p_one == 0); |
if (f_dlycoeff_i[F_D-1] == 0) |
assert(p_two == 0); |
|
if (f_dlycoeff_r[F_D-1] == 1) |
assert(p_one == f_predifrx); |
if (f_dlycoeff_i[F_D-1] == 1) |
assert(p_two == f_predifix); |
|
if (f_predifr == 0) |
assert(p_one == 0); |
if (f_predifi == 0) |
assert(p_two == 0); |
|
// verilator lint_off WIDTH |
if (f_predifr == 1) |
assert(p_one == f_dlycoeff_r[F_D-1]); |
if (f_predifi == 1) |
assert(p_two == f_dlycoeff_i[F_D-1]); |
// verilator lint_on WIDTH |
|
if (f_sumcoef == 0) |
assert(p_three == 0); |
if (f_sumdiff == 0) |
assert(p_three == 0); |
// verilator lint_off WIDTH |
if (f_sumcoef == 1) |
assert(p_three == f_sumdiff); |
if (f_sumdiff == 1) |
assert(p_three == f_sumcoef); |
// verilator lint_on WIDTH |
`ifdef VERILATOR |
assert(p_one == f_predifr * f_dlycoeff_r[F_D-1]); |
assert(p_two == f_predifi * f_dlycoeff_i[F_D-1]); |
assert(p_three == f_sumdiff * f_sumcoef); |
`endif // VERILATOR |
end |
|
// F_CHECK will be set externally by the solver, so that we can |
// double check that the solver is actually testing what we think |
// it is testing. We'll set it here to MPYREMAINDER, which will |
// essentially eliminate the check--unless overridden by the |
// solver. |
parameter F_CHECK = MPYREMAINDER; |
initial assert(MPYREMAINDER == F_CHECK); |
|
`endif // FORMAL |
endmodule |
/trunk/rtl/convround.v
0,0 → 1,124
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: convround.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: A convergent rounding routine, also known as banker's |
// rounding, Dutch rounding, Gaussian rounding, unbiased |
// rounding, or ... more, at least according to Wikipedia. |
// |
// This form of rounding rounds to the nearest value and, when the |
// input lies exactly halfway between two values, towards the even one. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module convround(i_clk, i_ce, i_val, o_val); |
parameter IWID=16, OWID=8, SHIFT=0; |
input i_clk, i_ce; |
input signed [(IWID-1):0] i_val; |
output reg signed [(OWID-1):0] o_val; |
|
// Let's deal with three cases to be as general as we can be here |
// |
// 1. The desired output would lose no bits at all |
// 2. One bit would be dropped, so the rounding is simply |
// adjusting the value to be the nearest even number in |
// cases of being halfway between two. If identically |
// equal to a number, we just leave it as is. |
// 3. Two or more bits would be dropped. In this case, we round |
// normally unless we are rounding a value of exactly |
// halfway between the two. In the halfway case we round |
// to the nearest even number. |
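// Worked example (illustrative values only): with IWID=5, |
// OWID=3, and SHIFT=0, two bits are dropped, so the output is |
// the input divided by 4, then rounded: |
// 5'b00101 ( 5/4 = 1.25) -> 3'b001 ( 1) |
// 5'b00111 ( 7/4 = 1.75) -> 3'b010 ( 2) |
// 5'b00110 ( 6/4 = 1.5 ) -> 3'b010 ( 2, tie: round to even) |
// 5'b00010 ( 2/4 = 0.5 ) -> 3'b000 ( 0, tie: round to even) |
// 5'b11010 (-6/4 = -1.5) -> 3'b110 (-2, tie: round to even) |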
generate |
if (IWID == OWID) // In this case, the shift is irrelevant and |
begin // cannot be applied. No truncation or rounding takes |
// effect here. |
|
always @(posedge i_clk) |
if (i_ce) o_val <= i_val[(IWID-1):0]; |
|
end else if (IWID-SHIFT == OWID) |
begin // No truncation or rounding, output drops no bits |
|
always @(posedge i_clk) |
if (i_ce) o_val <= i_val[(IWID-SHIFT-1):0]; |
|
end else if (IWID-SHIFT-1 == OWID) |
begin // Output drops one bit, can only add one or ... not. |
wire [(OWID-1):0] truncated_value, rounded_up; |
wire last_valid_bit, first_lost_bit; |
assign truncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)]; |
assign rounded_up=truncated_value + {{(OWID-1){1'b0}}, 1'b1 }; |
assign last_valid_bit = truncated_value[0]; |
assign first_lost_bit = i_val[0]; |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
if (!first_lost_bit) // Round down / truncate |
o_val <= truncated_value; |
else if (last_valid_bit)// Round up to nearest |
o_val <= rounded_up; // even value |
else // else round down to the nearest |
o_val <= truncated_value; // even value |
end |
|
end else // If there's more than one bit we are dropping |
begin |
wire [(OWID-1):0] truncated_value, rounded_up; |
wire last_valid_bit, first_lost_bit; |
assign truncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)]; |
assign rounded_up=truncated_value + {{(OWID-1){1'b0}}, 1'b1 }; |
assign last_valid_bit = truncated_value[0]; |
assign first_lost_bit = i_val[(IWID-SHIFT-OWID-1)]; |
|
wire [(IWID-SHIFT-OWID-2):0] other_lost_bits; |
assign other_lost_bits = i_val[(IWID-SHIFT-OWID-2):0]; |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
if (!first_lost_bit) // Round down / truncate |
o_val <= truncated_value; |
else if (|other_lost_bits) // Round up to |
o_val <= rounded_up; // closest value |
else if (last_valid_bit) // Round up to |
o_val <= rounded_up; // nearest even |
else // else round down to nearest even |
o_val <= truncated_value; |
end |
end |
endgenerate |
|
endmodule |
/trunk/rtl/fftmain.v
0,0 → 1,276
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: fftmain.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This is the main module in the General Purpose FPGA FFT |
// implementation. As such, all other modules are subordinate |
// to this one. This module accomplishes a fixed size Complex FFT on |
// 2048 data points. |
// The FFT is fully pipelined, and accepts as inputs two complex two's |
// complement samples per clock. |
// |
// Ports: |
// i_clk The clock. All operations are synchronous with this clock. |
// i_reset Synchronous reset, active high. Setting this line will |
// force the reset of all of the internals to this routine. |
// Further, following a reset, the o_sync line will go |
// high at the same time the first output sample is valid. |
// i_ce A clock enable line. If this line is set, this module |
// will accept two complex values as inputs, and produce |
// two (possibly empty) complex values as outputs. |
// i_left The first of two complex input samples. This value is split |
// into two two's complement numbers, 15 bits each, with |
// the real portion in the high order bits, and the |
// imaginary portion taking the bottom 15 bits. |
// i_right This is the same thing as i_left, only this is the second of |
// two such samples. Hence, i_left would contain input |
// sample zero, i_right would contain sample one. On the |
// next clock i_left would contain input sample two, |
// i_right number three and so forth. |
// o_left The first of two output samples, of the same format as i_left, |
// only having 21 bits for each of the real and imaginary |
// components, leading to 42 bits total. |
// o_right The second of two output samples produced each clock. This has |
// the same format as o_left. |
// o_sync A one bit output indicating the first valid sample produced by |
// this FFT following a reset. Ever after, this will |
// indicate the first sample of an FFT frame. |
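// Packing example (illustrative values only): with IWIDTH=15, |
// a sample of 3 - j1 is presented on i_left as |
// {15'h0003, 15'h7fff} = 30'h0001_ffff. |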
// |
// Arguments: This file was computer generated using the following command |
// line: |
// |
// % ./fftgen -v -d ../rtl -f 2048 -2 -p 0 -n 15 -a ../bench/cpp/fftsize.h |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
// |
// |
module fftmain(i_clk, i_reset, i_ce, |
i_left, i_right, |
o_left, o_right, o_sync); |
parameter IWIDTH=15, OWIDTH=21, LGWIDTH=11; |
// |
input i_clk, i_reset, i_ce; |
// |
input [(2*IWIDTH-1):0] i_left, i_right; |
output reg [(2*OWIDTH-1):0] o_left, o_right; |
output reg o_sync; |
|
|
// Outputs of the FFT, ready for bit reversal. |
wire [(2*OWIDTH-1):0] br_left, br_right; |
|
|
wire w_s2048; |
// verilator lint_off UNUSED |
wire w_os2048; |
// verilator lint_on UNUSED |
wire [31:0] w_e2048, w_o2048; |
fftstage #(IWIDTH,IWIDTH+4,16,11,9,0, |
0, 1, "cmem_e4096.hex") |
stage_e2048(i_clk, i_reset, i_ce, |
(!i_reset), i_left, w_e2048, w_s2048); |
fftstage #(IWIDTH,IWIDTH+4,16,11,9,0, |
0, 1, "cmem_o4096.hex") |
stage_o2048(i_clk, i_reset, i_ce, |
(!i_reset), i_right, w_o2048, w_os2048); |
|
|
wire w_s1024; |
// verilator lint_off UNUSED |
wire w_os1024; |
// verilator lint_on UNUSED |
wire [33:0] w_e1024, w_o1024; |
fftstage #(16,20,17,11,8,0, |
0, 1, "cmem_e2048.hex") |
stage_e1024(i_clk, i_reset, i_ce, |
w_s2048, w_e2048, w_e1024, w_s1024); |
fftstage #(16,20,17,11,8,0, |
0, 1, "cmem_o2048.hex") |
stage_o1024(i_clk, i_reset, i_ce, |
w_s2048, w_o2048, w_o1024, w_os1024); |
|
wire w_s512; |
// verilator lint_off UNUSED |
wire w_os512; |
// verilator lint_on UNUSED |
wire [33:0] w_e512, w_o512; |
fftstage #(17,21,17,11,7,0, |
0, 1, "cmem_e1024.hex") |
stage_e512(i_clk, i_reset, i_ce, |
w_s1024, w_e1024, w_e512, w_s512); |
fftstage #(17,21,17,11,7,0, |
0, 1, "cmem_o1024.hex") |
stage_o512(i_clk, i_reset, i_ce, |
w_s1024, w_o1024, w_o512, w_os512); |
|
wire w_s256; |
// verilator lint_off UNUSED |
wire w_os256; |
// verilator lint_on UNUSED |
wire [35:0] w_e256, w_o256; |
fftstage #(17,21,18,11,6,0, |
0, 1, "cmem_e512.hex") |
stage_e256(i_clk, i_reset, i_ce, |
w_s512, w_e512, w_e256, w_s256); |
fftstage #(17,21,18,11,6,0, |
0, 1, "cmem_o512.hex") |
stage_o256(i_clk, i_reset, i_ce, |
w_s512, w_o512, w_o256, w_os256); |
|
wire w_s128; |
// verilator lint_off UNUSED |
wire w_os128; |
// verilator lint_on UNUSED |
wire [35:0] w_e128, w_o128; |
fftstage #(18,22,18,11,5,0, |
0, 1, "cmem_e256.hex") |
stage_e128(i_clk, i_reset, i_ce, |
w_s256, w_e256, w_e128, w_s128); |
fftstage #(18,22,18,11,5,0, |
0, 1, "cmem_o256.hex") |
stage_o128(i_clk, i_reset, i_ce, |
w_s256, w_o256, w_o128, w_os128); |
|
wire w_s64; |
// verilator lint_off UNUSED |
wire w_os64; |
// verilator lint_on UNUSED |
wire [37:0] w_e64, w_o64; |
fftstage #(18,22,19,11,4,0, |
0, 1, "cmem_e128.hex") |
stage_e64(i_clk, i_reset, i_ce, |
w_s128, w_e128, w_e64, w_s64); |
fftstage #(18,22,19,11,4,0, |
0, 1, "cmem_o128.hex") |
stage_o64(i_clk, i_reset, i_ce, |
w_s128, w_o128, w_o64, w_os64); |
|
wire w_s32; |
// verilator lint_off UNUSED |
wire w_os32; |
// verilator lint_on UNUSED |
wire [37:0] w_e32, w_o32; |
fftstage #(19,23,19,11,3,0, |
0, 1, "cmem_e64.hex") |
stage_e32(i_clk, i_reset, i_ce, |
w_s64, w_e64, w_e32, w_s32); |
fftstage #(19,23,19,11,3,0, |
0, 1, "cmem_o64.hex") |
stage_o32(i_clk, i_reset, i_ce, |
w_s64, w_o64, w_o32, w_os32); |
|
wire w_s16; |
// verilator lint_off UNUSED |
wire w_os16; |
// verilator lint_on UNUSED |
wire [39:0] w_e16, w_o16; |
fftstage #(19,23,20,11,2,0, |
0, 1, "cmem_e32.hex") |
stage_e16(i_clk, i_reset, i_ce, |
w_s32, w_e32, w_e16, w_s16); |
fftstage #(19,23,20,11,2,0, |
0, 1, "cmem_o32.hex") |
stage_o16(i_clk, i_reset, i_ce, |
w_s32, w_o32, w_o16, w_os16); |
|
wire w_s8; |
// verilator lint_off UNUSED |
wire w_os8; |
// verilator lint_on UNUSED |
wire [39:0] w_e8, w_o8; |
fftstage #(20,24,20,11,1,0, |
0, 1, "cmem_e16.hex") |
stage_e8(i_clk, i_reset, i_ce, |
w_s16, w_e16, w_e8, w_s8); |
fftstage #(20,24,20,11,1,0, |
0, 1, "cmem_o16.hex") |
stage_o8(i_clk, i_reset, i_ce, |
w_s16, w_o16, w_o8, w_os8); |
|
wire w_s4; |
// verilator lint_off UNUSED |
wire w_os4; |
// verilator lint_on UNUSED |
wire [41:0] w_e4, w_o4; |
qtrstage #(20,21,11,0,0,0) stage_e4(i_clk, i_reset, i_ce, |
w_s8, w_e8, w_e4, w_s4); |
qtrstage #(20,21,11,1,0,0) stage_o4(i_clk, i_reset, i_ce, |
w_s8, w_o8, w_o4, w_os4); |
wire w_s2; |
wire [41:0] w_e2, w_o2; |
laststage #(21,21,0) stage_2(i_clk, i_reset, i_ce, |
w_s4, w_e4, w_o4, w_e2, w_o2, w_s2); |
|
|
// Prepare for a (potential) bit-reverse stage. |
assign br_left = w_e2; |
assign br_right = w_o2; |
|
wire br_start; |
reg r_br_started; |
initial r_br_started = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
r_br_started <= 1'b0; |
else if (i_ce) |
r_br_started <= r_br_started || w_s2; |
assign br_start = r_br_started || w_s2; |
|
// Now for the bit-reversal stage. |
wire br_sync; |
wire [(2*OWIDTH-1):0] br_o_left, br_o_right; |
bitreverse #(11,21) |
revstage(i_clk, i_reset, |
(i_ce & br_start), br_left, br_right, |
br_o_left, br_o_right, br_sync); |
|
|
// Last clock: Register our outputs, we're done. |
initial o_sync = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
o_sync <= 1'b0; |
else if (i_ce) |
o_sync <= br_sync; |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
o_left <= br_o_left; |
o_right <= br_o_right; |
end |
|
|
endmodule |
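In a radix-2 decimation-in-frequency pipeline, the last butterfly stage leaves its outputs in bit-reversed index order. That is why a `bitreverse` stage follows `laststage` above. The Python sketch below shows only the index permutation for a 2^LGSIZE-point transform. It assumes the first `bitreverse` parameter (11) is the base-two log of the transform size. It does not model how the `bitreverse` module schedules two samples per clock across `br_o_left` and `br_o_right`. The helper names are hypothetical.

```python
def bit_reverse(index: int, lgsize: int) -> int:
    """Reverse the low `lgsize` bits of `index`."""
    result = 0
    for _ in range(lgsize):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_order(samples, lgsize: int):
    """Permute `samples` so that out[k] = samples[bit_reverse(k, lgsize)]."""
    n = 1 << lgsize
    if len(samples) != n:
        raise ValueError(f"expected {n} samples, got {len(samples)}")
    return [samples[bit_reverse(k, lgsize)] for k in range(n)]


if __name__ == "__main__":
    # 8-point transform (lgsize=3): natural order 0..7 becomes
    print(bit_reverse_order(list(range(8)), 3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The permutation is its own inverse, so the same routine both scrambles and unscrambles the order.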
/trunk/rtl/fftstage.v
0,0 → 1,247
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: fftstage.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This file is (almost) a Verilog source file. It is meant to |
// be used by an FFT core compiler to generate FFTs which may be |
// used as part of an FFT core. Specifically, this file encapsulates |
// the options of an FFT-stage. For any 2^N length FFT, there shall be |
// (N-1) of these stages. |
// |
// |
// Operation: |
// Given a stream of values, operate upon them as though they were |
// value pairs, x[n] and x[n+N/2]. The stream begins when n=0, and ends |
// when n=N/2-1 (i.e. there's a full set of N values). When the value |
// x[0] enters, the synchronization input, i_sync, must be true as well. |
// |
// For this stream, produce outputs |
// y[n ] = x[n] + x[n+N/2], and |
// y[n+N/2] = (x[n] - x[n+N/2]) * c[n], |
// where c[n] is a complex coefficient found in the |
// external memory file COEFFILE. |
// When y[0] is output, a synchronization bit o_sync will be true as |
// well, otherwise it will be zero. |
// |
// Most of the work to do this is done within the butterfly, whether the |
// hardware accelerated butterfly (uses a DSP) or not. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module fftstage(i_clk, i_reset, i_ce, i_sync, i_data, o_data, o_sync); |
parameter IWIDTH=15,CWIDTH=20,OWIDTH=16; |
// Parameters specific to the core that should be changed when this |
// core is built ... Note that the minimum LGSPAN (the base two log |
// of the span, or the base two log of the current FFT size) is 3. |
// Smaller spans (i.e. the span of 2) must use the dbl laststage module. |
parameter LGWIDTH=10, LGSPAN=8, BFLYSHIFT=0; |
parameter [0:0] OPT_HWMPY = 1; |
// Clocks per CE. If your incoming data rate is no more than half |
// of your clock rate, you can set CKPCE to 2'b10, make sure there's at |
// least one clock with i_ce low between cycles when i_ce is high, and |
// then use two multiplies instead of three. If you set CKPCE to 2'b11 |
// and insist on at least two clocks with i_ce low between cycles with |
// i_ce high, then the hardware optimized butterfly code will use one |
// multiply instead of two. |
parameter CKPCE = 1; |
// The COEFFILE parameter contains the name of the file containing the |
// FFT twiddle factors |
parameter COEFFILE="cmem_o2048.hex"; |
|
`ifdef VERILATOR |
parameter [0:0] ZERO_ON_IDLE = 1'b0; |
`else |
localparam [0:0] ZERO_ON_IDLE = 1'b0; |
`endif // VERILATOR |
|
input i_clk, i_reset, i_ce, i_sync; |
input [(2*IWIDTH-1):0] i_data; |
output reg [(2*OWIDTH-1):0] o_data; |
output reg o_sync; |
|
reg wait_for_sync; |
reg [(2*IWIDTH-1):0] ib_a, ib_b; |
reg [(2*CWIDTH-1):0] ib_c; |
reg ib_sync; |
|
reg b_started; |
wire ob_sync; |
wire [(2*OWIDTH-1):0] ob_a, ob_b; |
|
// cmem is defined as an array of complex values, |
// where the top CWIDTH bits are the real value and the bottom |
// CWIDTH bits are the imaginary value. |
// |
// cmem[i] = { (2^(CWIDTH-2)) * cos(2*pi*i/(2^LGWIDTH)), |
// (2^(CWIDTH-2)) * sin(2*pi*i/(2^LGWIDTH)) }; |
// |
reg [(2*CWIDTH-1):0] cmem [0:((1<<LGSPAN)-1)]; |
initial $readmemh(COEFFILE,cmem); |
|
reg [(LGSPAN):0] iaddr; |
reg [(2*IWIDTH-1):0] imem [0:((1<<LGSPAN)-1)]; |
|
reg [LGSPAN:0] oB; |
reg [(2*OWIDTH-1):0] omem [0:((1<<LGSPAN)-1)]; |
|
initial wait_for_sync = 1'b1; |
initial iaddr = 0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
wait_for_sync <= 1'b1; |
iaddr <= 0; |
end else if ((i_ce)&&((!wait_for_sync)||(i_sync))) |
begin |
// |
// First step: Record what we're not ready to use yet |
// |
iaddr <= iaddr + { {(LGSPAN){1'b0}}, 1'b1 }; |
wait_for_sync <= 1'b0; |
end |
always @(posedge i_clk) // Need to make certain here that we don't read |
if ((i_ce)&&(!iaddr[LGSPAN])) // and write the same address on |
imem[iaddr[(LGSPAN-1):0]] <= i_data; // the same clk |
|
// |
// Now, we have all the inputs, so let's feed the butterfly |
// |
initial ib_sync = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
ib_sync <= 1'b0; |
else if (i_ce) |
begin |
// Set the sync to true on the very first |
// valid input in, and hence on the very |
// first valid data out per FFT. |
ib_sync <= (iaddr==(1<<(LGSPAN))); |
end |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
// One input from memory, ... |
ib_a <= imem[iaddr[(LGSPAN-1):0]]; |
// One input clocked in from the top |
ib_b <= i_data; |
// and the coefficient or twiddle factor |
ib_c <= cmem[iaddr[(LGSPAN-1):0]]; |
end |
|
// The idle register keeps track of whether the current input |
// to the butterfly is important and going to be used. It feeds |
// the checks that follow: when useful values are placed into |
// the butterfly (idle=0), they pass through unchanged; otherwise, |
// when the inputs to the butterfly are irrelevant and will be |
// ignored (idle=1), those inputs are set to zero. This |
// functionality is not designed to be used in operation, but only |
// within a Verilator simulation context when chasing a bug. |
// In this limited environment, the non-zero answers will stand |
// out in a trace, making it easier to highlight a bug. |
reg idle; |
generate if (ZERO_ON_IDLE) |
begin |
initial idle = 1; |
always @(posedge i_clk) |
if (i_reset) |
idle <= 1'b1; |
else if (i_ce) |
idle <= (!iaddr[LGSPAN])&&(!wait_for_sync); |
|
end else begin |
|
always @(*) idle = 0; |
|
end endgenerate |
|
generate if (OPT_HWMPY) |
begin : HWBFLY |
hwbfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH), |
.CKPCE(CKPCE), .SHIFT(BFLYSHIFT)) |
bfly(i_clk, i_reset, i_ce, (idle)?0:ib_c, |
(idle || (!i_ce)) ? 0:ib_a, |
(idle || (!i_ce)) ? 0:ib_b, |
(ib_sync)&&(i_ce), |
ob_a, ob_b, ob_sync); |
end else begin : FWBFLY |
butterfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH), |
.CKPCE(CKPCE),.SHIFT(BFLYSHIFT)) |
bfly(i_clk, i_reset, i_ce, |
(idle||(!i_ce))?0:ib_c, |
(idle||(!i_ce))?0:ib_a, |
(idle||(!i_ce))?0:ib_b, |
(ib_sync&&i_ce), |
ob_a, ob_b, ob_sync); |
end endgenerate |
|
// |
// Next step: recover the outputs from the butterfly |
// |
initial oB = 0; |
initial o_sync = 0; |
initial b_started = 0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
oB <= 0; |
o_sync <= 0; |
b_started <= 0; |
end else if (i_ce) |
begin |
o_sync <= (!oB[LGSPAN])?ob_sync : 1'b0; |
if (ob_sync||b_started) |
oB <= oB + { {(LGSPAN){1'b0}}, 1'b1 }; |
if ((ob_sync)&&(!oB[LGSPAN])) |
// A butterfly output is available |
b_started <= 1'b1; |
end |
|
reg [(LGSPAN-1):0] dly_addr; |
reg [(2*OWIDTH-1):0] dly_value; |
always @(posedge i_clk) |
if (i_ce) |
begin |
dly_addr <= oB[(LGSPAN-1):0]; |
dly_value <= ob_b; |
end |
always @(posedge i_clk) |
if (i_ce) |
omem[dly_addr] <= dly_value; |
|
always @(posedge i_clk) |
if (i_ce) |
o_data <= (!oB[LGSPAN])?ob_a : omem[oB[(LGSPAN-1):0]]; |
|
endmodule |
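The operation described in the `fftstage.v` header can be modelled behaviourally. The Python sketch below is a floating-point reference, not a bit-accurate model. It ignores IWIDTH/OWIDTH rounding, the BFLYSHIFT option, pipeline latency and the two-stream (even/odd) split used in the generated top level.

It assumes forward-transform twiddles c[n] = exp(-2j*pi*n/N). The generated hex files may use a different sign, and the `cmem` comment only states a 2^(CWIDTH-2) scale.

Each `fft_stage` call consumes one block of N values. It returns the N/2 sums followed by the N/2 twiddled differences, which is the order in which `o_data` presents `ob_a` and then the delayed `omem` contents. Chaining stages of size N, N/2, ..., 2 and undoing the bit reversal should reproduce the DFT. The function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def fft_stage(block: Sequence[complex]) -> List[complex]:
    """One radix-2 decimation-in-frequency stage on a block of N values.

    For n < N/2 this returns
        y[n]       = x[n] + x[n + N/2]
        y[n + N/2] = (x[n] - x[n + N/2]) * c[n]
    with the assumed forward twiddle c[n] = exp(-2j*pi*n/N).
    The N/2 sums come first, then the N/2 twiddled differences.
    """
    n_pts = len(block)
    if n_pts < 2 or n_pts % 2:
        raise ValueError("block length must be an even number >= 2")
    half = n_pts // 2
    sums, difs = [], []
    for n in range(half):
        a, b = block[n], block[n + half]
        coef = cmath.exp(-2j * cmath.pi * n / n_pts)
        sums.append(a + b)
        difs.append((a - b) * coef)
    return sums + difs


def _bitrev(k: int, bits: int) -> int:
    """Reverse the low `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def dif_fft(x: Sequence[complex]) -> List[complex]:
    """Chain fft_stage over block sizes N, N/2, ..., 2, then undo bit reversal."""
    n_pts = len(x)
    lgsize = n_pts.bit_length() - 1
    if n_pts < 1 or n_pts != 1 << lgsize:
        raise ValueError("length must be a power of two")
    data = list(x)
    size = n_pts
    while size >= 2:
        # The stream is treated as consecutive blocks of `size` values,
        # just as each pipeline stage sees the previous stage's output.
        data = [y for start in range(0, n_pts, size)
                for y in fft_stage(data[start:start + size])]
        size //= 2
    out = [0j] * n_pts
    for k in range(n_pts):
        out[_bitrev(k, lgsize)] = data[k]
    return out


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) DFT, used only as a reference."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]
```

Comparing `dif_fft(x)` with `dft(x)` on a short vector is a quick way to check the stage ordering before moving to fixed-point comparisons against the Verilog.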
/trunk/rtl/hwbfly.v
0,0 → 1,709
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: hwbfly.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This routine is identical to the one found in |
// 'butterfly.v', save only that it uses the Verilog |
// operator '*' in the hope that the synthesizer will be able to map |
// it onto hardware multiplier resources. |
// |
// It is understood that a hardware multiply can complete its operation in |
// a single clock. |
// |
// Operation: |
// |
// Given two inputs, A (i_left) and B (i_right), and a complex |
// coefficient C (i_coeff), return two outputs, O1 and O2, where: |
// |
// O1 = A + B, and |
// O2 = (A - B)*C |
// |
// This operation is commonly known as a Decimation in Frequency (DIF) |
// Radix-2 Butterfly. |
// O1 and O2 are rounded to OWIDTH bits before being returned in |
// o_left and o_right. If SHIFT is one, an extra bit is dropped from |
// these values during the rounding process. |
// |
// Further, since these outputs will take some number of clocks to |
// calculate, we'll pipe a value (i_aux) through the system and return |
// it with the results (o_aux), so you can synchronize to the outgoing |
// output stream. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module hwbfly(i_clk, i_reset, i_ce, i_coef, i_left, i_right, i_aux, |
o_left, o_right, o_aux); |
// Public changeable parameters ... |
// - IWIDTH, number of bits in each component of the input |
// - CWIDTH, number of bits in each component of the twiddle factor |
// - OWIDTH, number of bits in each component of the output |
parameter IWIDTH=16,CWIDTH=IWIDTH+4,OWIDTH=IWIDTH+1; |
// Drop an additional bit on the output? |
parameter SHIFT=0; |
// The number of clocks per clock enable, 1, 2, or 3. |
parameter [1:0] CKPCE=1; |
// |
input i_clk, i_reset, i_ce; |
input [(2*CWIDTH-1):0] i_coef; |
input [(2*IWIDTH-1):0] i_left, i_right; |
input i_aux; |
output wire [(2*OWIDTH-1):0] o_left, o_right; |
output reg o_aux; |
|
|
reg [(2*IWIDTH-1):0] r_left, r_right; |
reg r_aux, r_aux_2; |
reg [(2*CWIDTH-1):0] r_coef; |
wire signed [(IWIDTH-1):0] r_left_r, r_left_i, r_right_r, r_right_i; |
assign r_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)]; |
assign r_left_i = r_left[ (IWIDTH-1):0]; |
assign r_right_r = r_right[(2*IWIDTH-1):(IWIDTH)]; |
assign r_right_i = r_right[(IWIDTH-1):0]; |
reg signed [(CWIDTH-1):0] ir_coef_r, ir_coef_i; |
|
reg signed [(IWIDTH):0] r_sum_r, r_sum_i, r_dif_r, r_dif_i; |
|
reg [(2*IWIDTH+2):0] leftv, leftvv; |
|
// Set up the input to the multiply |
initial r_aux = 1'b0; |
initial r_aux_2 = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
r_aux <= 1'b0; |
r_aux_2 <= 1'b0; |
end else if (i_ce) |
begin |
// One clock just latches the inputs |
r_aux <= i_aux; |
// Next clock adds/subtracts |
// Other inputs are simply delayed on second clock |
r_aux_2 <= r_aux; |
end |
always @(posedge i_clk) |
if (i_ce) |
begin |
// One clock just latches the inputs |
r_left <= i_left; // No change in # of bits |
r_right <= i_right; |
r_coef <= i_coef; |
// Next clock adds/subtracts |
r_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits |
r_sum_i <= r_left_i + r_right_i; |
r_dif_r <= r_left_r - r_right_r; |
r_dif_i <= r_left_i - r_right_i; |
// Other inputs are simply delayed on second clock |
ir_coef_r <= r_coef[(2*CWIDTH-1):CWIDTH]; |
ir_coef_i <= r_coef[(CWIDTH-1):0]; |
end |
|
|
// See comments in the butterfly.v source file for a discussion of |
// these operations and the appropriate bit widths. |
|
wire signed [((IWIDTH+1)+(CWIDTH)-1):0] p_one, p_two; |
wire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] p_three; |
|
initial leftv = 0; |
initial leftvv = 0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
leftv <= 0; |
leftvv <= 0; |
end else if (i_ce) |
begin |
// Second clock, pipeline = 1 |
leftv <= { r_aux_2, r_sum_r, r_sum_i }; |
|
// Third clock, pipeline = 3 |
// Simply delay the sums; no multiply (and no DSP48) is inferred here |
leftvv <= leftv; |
end |
|
generate if (CKPCE <= 1) |
begin : CKPCE_ONE |
// Coefficient multiply inputs |
reg signed [(CWIDTH-1):0] p1c_in, p2c_in; |
// Data multiply inputs |
reg signed [(IWIDTH):0] p1d_in, p2d_in; |
// Product 3, coefficient input |
reg signed [(CWIDTH):0] p3c_in; |
// Product 3, data input |
reg signed [(IWIDTH+1):0] p3d_in; |
|
reg signed [((IWIDTH+1)+(CWIDTH)-1):0] rp_one, rp_two; |
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three; |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
// Second clock, pipeline = 1 |
p1c_in <= ir_coef_r; |
p2c_in <= ir_coef_i; |
p1d_in <= r_dif_r; |
p2d_in <= r_dif_i; |
p3c_in <= ir_coef_i + ir_coef_r; |
p3d_in <= r_dif_r + r_dif_i; |
end |
|
`ifndef FORMAL |
always @(posedge i_clk) |
if (i_ce) |
begin |
// Third clock, pipeline = 3 |
// As desired, each of these lines infers a DSP48 |
rp_one <= p1c_in * p1d_in; |
rp_two <= p2c_in * p2d_in; |
rp_three <= p3c_in * p3d_in; |
end |
`else |
wire signed [((IWIDTH+1)+(CWIDTH)-1):0] pre_rp_one, pre_rp_two; |
wire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] pre_rp_three; |
|
abs_mpy #(CWIDTH,IWIDTH+1,1'b1) |
onei(p1c_in, p1d_in, pre_rp_one); |
abs_mpy #(CWIDTH,IWIDTH+1,1'b1) |
twoi(p2c_in, p2d_in, pre_rp_two); |
abs_mpy #(CWIDTH+1,IWIDTH+2,1'b1) |
threei(p3c_in, p3d_in, pre_rp_three); |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
rp_one <= pre_rp_one; |
rp_two <= pre_rp_two; |
rp_three <= pre_rp_three; |
end |
`endif // FORMAL |
|
assign p_one = rp_one; |
assign p_two = rp_two; |
assign p_three = rp_three; |
|
end else if (CKPCE <= 2) |
begin : CKPCE_TWO |
// Coefficient multiply inputs |
reg [2*(CWIDTH)-1:0] mpy_pipe_c; |
// Data multiply inputs |
reg [2*(IWIDTH+1)-1:0] mpy_pipe_d; |
wire signed [(CWIDTH-1):0] mpy_pipe_vc; |
wire signed [(IWIDTH):0] mpy_pipe_vd; |
// |
reg signed [(CWIDTH+1)-1:0] mpy_cof_sum; |
reg signed [(IWIDTH+2)-1:0] mpy_dif_sum; |
|
assign mpy_pipe_vc = mpy_pipe_c[2*(CWIDTH)-1:CWIDTH]; |
assign mpy_pipe_vd = mpy_pipe_d[2*(IWIDTH+1)-1:IWIDTH+1]; |
|
reg mpy_pipe_v; |
reg ce_phase; |
|
reg signed [(CWIDTH+IWIDTH+1)-1:0] mpy_pipe_out; |
reg signed [IWIDTH+CWIDTH+3-1:0] longmpy; |
|
|
initial ce_phase = 1'b1; |
always @(posedge i_clk) |
if (i_reset) |
ce_phase <= 1'b1; |
else if (i_ce) |
ce_phase <= 1'b0; |
else |
ce_phase <= 1'b1; |
|
always @(*) |
mpy_pipe_v = (i_ce)||(!ce_phase); |
|
always @(posedge i_clk) |
if (!ce_phase) |
begin |
// Pre-clock |
mpy_pipe_c[2*CWIDTH-1:0] <= |
{ ir_coef_r, ir_coef_i }; |
mpy_pipe_d[2*(IWIDTH+1)-1:0] <= |
{ r_dif_r, r_dif_i }; |
|
mpy_cof_sum <= ir_coef_i + ir_coef_r; |
mpy_dif_sum <= r_dif_r + r_dif_i; |
|
end else if (i_ce) |
begin |
// First clock |
mpy_pipe_c[2*(CWIDTH)-1:0] <= { |
mpy_pipe_c[(CWIDTH)-1:0], {(CWIDTH){1'b0}} }; |
mpy_pipe_d[2*(IWIDTH+1)-1:0] <= { |
mpy_pipe_d[(IWIDTH+1)-1:0], {(IWIDTH+1){1'b0}} }; |
end |
|
`ifndef FORMAL |
always @(posedge i_clk) |
if (i_ce) // First clock |
longmpy <= mpy_cof_sum * mpy_dif_sum; |
|
always @(posedge i_clk) |
if (mpy_pipe_v) |
mpy_pipe_out <= mpy_pipe_vc * mpy_pipe_vd; |
`else |
wire signed [IWIDTH+CWIDTH+3-1:0] pre_longmpy; |
wire signed [(CWIDTH+IWIDTH+1)-1:0] pre_mpy_pipe_out; |
|
abs_mpy #(CWIDTH+1,IWIDTH+2,1) |
longmpyi(mpy_cof_sum, mpy_dif_sum, pre_longmpy); |
|
always @(posedge i_clk) |
if (i_ce) |
longmpy <= pre_longmpy; |
|
|
abs_mpy #(CWIDTH,IWIDTH+1,1) |
mpy_pipe_outi(mpy_pipe_vc, mpy_pipe_vd, pre_mpy_pipe_out); |
|
always @(posedge i_clk) |
if (mpy_pipe_v) |
mpy_pipe_out <= pre_mpy_pipe_out; |
`endif |
|
reg signed [((IWIDTH+1)+(CWIDTH)-1):0] rp_one, |
rp2_one, rp_two; |
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three; |
|
always @(posedge i_clk) |
if (!ce_phase) // 1.5 clock |
rp_one <= mpy_pipe_out; |
always @(posedge i_clk) |
if (i_ce) // two clocks |
rp_two <= mpy_pipe_out; |
always @(posedge i_clk) |
if (i_ce) // Second clock |
rp_three<= longmpy; |
always @(posedge i_clk) |
if (i_ce) |
rp2_one<= rp_one; |
|
assign p_one = rp2_one; |
assign p_two = rp_two; |
assign p_three= rp_three; |
|
end else if (CKPCE <= 2'b11) |
begin : CKPCE_THREE |
// Coefficient multiply inputs |
reg [3*(CWIDTH+1)-1:0] mpy_pipe_c; |
// Data multiply inputs |
reg [3*(IWIDTH+2)-1:0] mpy_pipe_d; |
wire signed [(CWIDTH):0] mpy_pipe_vc; |
wire signed [(IWIDTH+1):0] mpy_pipe_vd; |
|
assign mpy_pipe_vc = mpy_pipe_c[3*(CWIDTH+1)-1:2*(CWIDTH+1)]; |
assign mpy_pipe_vd = mpy_pipe_d[3*(IWIDTH+2)-1:2*(IWIDTH+2)]; |
|
reg mpy_pipe_v; |
reg [2:0] ce_phase; |
|
reg signed [ (CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out; |
|
initial ce_phase = 3'b011; |
always @(posedge i_clk) |
if (i_reset) |
ce_phase <= 3'b011; |
else if (i_ce) |
ce_phase <= 3'b000; |
else if (ce_phase != 3'b011) |
ce_phase <= ce_phase + 1'b1; |
|
always @(*) |
mpy_pipe_v = (i_ce)||(ce_phase < 3'b010); |
|
always @(posedge i_clk) |
if (ce_phase == 3'b000) |
begin |
// Second clock |
mpy_pipe_c[3*(CWIDTH+1)-1:(CWIDTH+1)] <= { |
ir_coef_r[CWIDTH-1], ir_coef_r, |
ir_coef_i[CWIDTH-1], ir_coef_i }; |
mpy_pipe_c[CWIDTH:0] <= ir_coef_i + ir_coef_r; |
mpy_pipe_d[3*(IWIDTH+2)-1:(IWIDTH+2)] <= { |
r_dif_r[IWIDTH], r_dif_r, |
r_dif_i[IWIDTH], r_dif_i }; |
mpy_pipe_d[(IWIDTH+2)-1:0] <= r_dif_r + r_dif_i; |
|
end else if (mpy_pipe_v) |
begin |
mpy_pipe_c[3*(CWIDTH+1)-1:0] <= { |
mpy_pipe_c[2*(CWIDTH+1)-1:0], {(CWIDTH+1){1'b0}} }; |
mpy_pipe_d[3*(IWIDTH+2)-1:0] <= { |
mpy_pipe_d[2*(IWIDTH+2)-1:0], {(IWIDTH+2){1'b0}} }; |
end |
|
`ifndef FORMAL |
always @(posedge i_clk) |
if (mpy_pipe_v) |
mpy_pipe_out <= mpy_pipe_vc * mpy_pipe_vd; |
|
`else // FORMAL |
wire signed [ (CWIDTH+IWIDTH+3)-1:0] pre_mpy_pipe_out; |
|
abs_mpy #(CWIDTH+1,IWIDTH+2,1) |
mpy_pipe_outi(mpy_pipe_vc, mpy_pipe_vd, pre_mpy_pipe_out); |
always @(posedge i_clk) |
if (mpy_pipe_v) |
mpy_pipe_out <= pre_mpy_pipe_out; |
`endif // FORMAL |
|
reg signed [((IWIDTH+1)+(CWIDTH)-1):0] rp_one, rp_two, |
rp2_one, rp2_two; |
reg signed [((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three, rp2_three; |
|
always @(posedge i_clk) |
if(i_ce) |
rp_one <= mpy_pipe_out[(CWIDTH+IWIDTH):0]; |
always @(posedge i_clk) |
if(ce_phase == 3'b000) |
rp_two <= mpy_pipe_out[(CWIDTH+IWIDTH):0]; |
always @(posedge i_clk) |
if(ce_phase == 3'b001) |
rp_three <= mpy_pipe_out; |
always @(posedge i_clk) |
if (i_ce) |
begin |
rp2_one<= rp_one; |
rp2_two<= rp_two; |
rp2_three<= rp_three; |
end |
assign p_one = rp2_one; |
assign p_two = rp2_two; |
assign p_three = rp2_three; |
|
end endgenerate |
wire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] w_one, w_two; |
assign w_one = { {(2){p_one[((IWIDTH+1)+(CWIDTH)-1)]}}, p_one }; |
assign w_two = { {(2){p_two[((IWIDTH+1)+(CWIDTH)-1)]}}, p_two }; |
|
// These values are held in memory and delayed during the |
// multiply. Here, we recover them. During the multiply, |
// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...}, |
// therefore, the left_x values need to be left shifted by |
// CWIDTH-2 as well. The additional bits come from a sign |
// extension. |
wire aux_s; |
wire signed [(IWIDTH+CWIDTH):0] left_si, left_sr; |
reg [(2*IWIDTH+2):0] left_saved; |
assign left_sr = { {2{left_saved[2*(IWIDTH+1)-1]}}, left_saved[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1'b0}} }; |
assign left_si = { {2{left_saved[(IWIDTH+1)-1]}}, left_saved[((IWIDTH+1)-1):0], {(CWIDTH-2){1'b0}} }; |
assign aux_s = left_saved[2*IWIDTH+2]; |
|
(* use_dsp48="no" *) |
reg signed [(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i; |
|
initial left_saved = 0; |
initial o_aux = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
left_saved <= 0; |
o_aux <= 1'b0; |
end else if (i_ce) |
begin |
// First clock, recover all values |
left_saved <= leftvv; |
|
// Second clock, round and latch for final clock |
o_aux <= aux_s; |
end |
always @(posedge i_clk) |
if (i_ce) |
begin |
// These values are IWIDTH+CWIDTH+3 bits wide |
// although they only need to be (IWIDTH+1) |
// + (CWIDTH) bits wide. (We've got two |
// extra bits we need to get rid of.) |
|
// These two lines would otherwise infer DSP48s. |
// To keep from using extra DSP48 resources, |
// they are prevented from using DSP48s |
// by the (* use_dsp48 ... *) attribute above. |
mpy_r <= w_one - w_two; |
mpy_i <= p_three - w_one - w_two; |
end |
|
// Round the results |
wire signed [(OWIDTH-1):0] rnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i; |
|
convround #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_r(i_clk, i_ce, |
left_sr, rnd_left_r); |
|
convround #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_i(i_clk, i_ce, |
left_si, rnd_left_i); |
|
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce, |
mpy_r, rnd_right_r); |
|
convround #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce, |
mpy_i, rnd_right_i); |
|
// As a final step, we pack each output as two two's complement |
// numbers per output word, so that each output word |
// has (2*OWIDTH) bits in it, with the top half being the real |
// portion and the bottom half being the imaginary portion. |
assign o_left = { rnd_left_r, rnd_left_i }; |
assign o_right= { rnd_right_r,rnd_right_i}; |
|
`ifdef VERILATOR |
`define FORMAL |
`endif |
`ifdef FORMAL |
localparam F_LGDEPTH = 3; |
localparam F_DEPTH = 5; |
localparam [F_LGDEPTH-1:0] F_D = F_DEPTH-1; |
|
reg signed [IWIDTH-1:0] f_dlyleft_r [0:F_DEPTH-1]; |
reg signed [IWIDTH-1:0] f_dlyleft_i [0:F_DEPTH-1]; |
reg signed [IWIDTH-1:0] f_dlyright_r [0:F_DEPTH-1]; |
reg signed [IWIDTH-1:0] f_dlyright_i [0:F_DEPTH-1]; |
reg signed [CWIDTH-1:0] f_dlycoeff_r [0:F_DEPTH-1]; |
reg signed [CWIDTH-1:0] f_dlycoeff_i [0:F_DEPTH-1]; |
reg signed [F_DEPTH-1:0] f_dlyaux; |
|
always @(posedge i_clk) |
if (i_reset) |
f_dlyaux <= 0; |
else if (i_ce) |
f_dlyaux <= { f_dlyaux[F_DEPTH-2:0], i_aux }; |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
f_dlyleft_r[0] <= i_left[ (2*IWIDTH-1):IWIDTH]; |
f_dlyleft_i[0] <= i_left[ ( IWIDTH-1):0]; |
f_dlyright_r[0] <= i_right[(2*IWIDTH-1):IWIDTH]; |
f_dlyright_i[0] <= i_right[( IWIDTH-1):0]; |
f_dlycoeff_r[0] <= i_coef[ (2*CWIDTH-1):CWIDTH]; |
f_dlycoeff_i[0] <= i_coef[ ( CWIDTH-1):0]; |
end |
|
genvar k; |
generate for(k=1; k<F_DEPTH; k=k+1) |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
f_dlyleft_r[k] <= f_dlyleft_r[ k-1]; |
f_dlyleft_i[k] <= f_dlyleft_i[ k-1]; |
f_dlyright_r[k] <= f_dlyright_r[k-1]; |
f_dlyright_i[k] <= f_dlyright_i[k-1]; |
f_dlycoeff_r[k] <= f_dlycoeff_r[k-1]; |
f_dlycoeff_i[k] <= f_dlycoeff_i[k-1]; |
end |
|
endgenerate |
|
`ifdef VERILATOR |
`else |
always @(posedge i_clk) |
if ((!$past(i_ce))&&(!$past(i_ce,2))&&(!$past(i_ce,3)) |
&&(!$past(i_ce,4))) |
assume(i_ce); |
|
generate if (CKPCE <= 1) |
begin |
|
// i_ce is allowed to be anything in this mode |
|
end else if (CKPCE == 2) |
begin : F_CKPCE_TWO |
|
always @(posedge i_clk) |
if ($past(i_ce)) |
assume(!i_ce); |
|
end else if (CKPCE == 3) |
begin : F_CKPCE_THREE |
|
always @(posedge i_clk) |
if (($past(i_ce))||($past(i_ce,2))) |
assume(!i_ce); |
|
end endgenerate |
`endif |
reg [F_LGDEPTH-1:0] f_startup_counter; |
initial f_startup_counter = 0; |
always @(posedge i_clk) |
if (i_reset) |
f_startup_counter <= 0; |
else if ((i_ce)&&(!(&f_startup_counter))) |
f_startup_counter <= f_startup_counter + 1; |
|
reg signed [IWIDTH:0] f_sumr, f_sumi; |
always @(*) |
begin |
f_sumr = f_dlyleft_r[F_D] + f_dlyright_r[F_D]; |
f_sumi = f_dlyleft_i[F_D] + f_dlyright_i[F_D]; |
end |
|
wire signed [IWIDTH+CWIDTH:0] f_sumrx, f_sumix; |
assign f_sumrx = { {(2){f_sumr[IWIDTH]}}, f_sumr, {(CWIDTH-2){1'b0}} }; |
assign f_sumix = { {(2){f_sumi[IWIDTH]}}, f_sumi, {(CWIDTH-2){1'b0}} }; |
|
reg signed [IWIDTH:0] f_difr, f_difi; |
always @(*) |
begin |
f_difr = f_dlyleft_r[F_D] - f_dlyright_r[F_D]; |
f_difi = f_dlyleft_i[F_D] - f_dlyright_i[F_D]; |
end |
|
wire signed [IWIDTH+CWIDTH+3-1:0] f_difrx, f_difix; |
assign f_difrx = { {(CWIDTH+2){f_difr[IWIDTH]}}, f_difr }; |
assign f_difix = { {(CWIDTH+2){f_difi[IWIDTH]}}, f_difi }; |
|
wire signed [IWIDTH+CWIDTH+3-1:0] f_widecoeff_r, f_widecoeff_i; |
assign f_widecoeff_r = {{(IWIDTH+3){f_dlycoeff_r[F_D][CWIDTH-1]}}, |
f_dlycoeff_r[F_D] }; |
assign f_widecoeff_i = {{(IWIDTH+3){f_dlycoeff_i[F_D][CWIDTH-1]}}, |
f_dlycoeff_i[F_D] }; |
|
always @(posedge i_clk) |
if (f_startup_counter > F_D) |
begin |
assert(left_sr == f_sumrx); |
assert(left_si == f_sumix); |
assert(aux_s == f_dlyaux[F_D]); |
|
if ((f_difr == 0)&&(f_difi == 0)) |
begin |
assert(mpy_r == 0); |
assert(mpy_i == 0); |
end else if ((f_dlycoeff_r[F_D] == 0) |
&&(f_dlycoeff_i[F_D] == 0)) |
begin |
assert(mpy_r == 0); |
assert(mpy_i == 0); |
end |
|
if ((f_dlycoeff_r[F_D] == 1)&&(f_dlycoeff_i[F_D] == 0)) |
begin |
assert(mpy_r == f_difrx); |
assert(mpy_i == f_difix); |
end |
|
if ((f_dlycoeff_r[F_D] == 0)&&(f_dlycoeff_i[F_D] == 1)) |
begin |
assert(mpy_r == -f_difix); |
assert(mpy_i == f_difrx); |
end |
|
if ((f_difr == 1)&&(f_difi == 0)) |
begin |
assert(mpy_r == f_widecoeff_r); |
assert(mpy_i == f_widecoeff_i); |
end |
|
if ((f_difr == 0)&&(f_difi == 1)) |
begin |
assert(mpy_r == -f_widecoeff_i); |
assert(mpy_i == f_widecoeff_r); |
end |
end |
|
// Let's see if we can improve our performance at all by |
// moving our test one clock earlier. If nothing else, it should |
// help induction finish one (or more) clocks earlier than |
// otherwise |
|
|
reg signed [IWIDTH:0] f_predifr, f_predifi; |
always @(*) |
begin |
f_predifr = f_dlyleft_r[F_D-1] - f_dlyright_r[F_D-1]; |
f_predifi = f_dlyleft_i[F_D-1] - f_dlyright_i[F_D-1]; |
end |
|
wire signed [IWIDTH+CWIDTH+1-1:0] f_predifrx, f_predifix; |
assign f_predifrx = { {(CWIDTH){f_predifr[IWIDTH]}}, f_predifr }; |
assign f_predifix = { {(CWIDTH){f_predifi[IWIDTH]}}, f_predifi }; |
|
reg signed [CWIDTH:0] f_sumcoef; |
reg signed [IWIDTH+1:0] f_sumdiff; |
always @(*) |
begin |
f_sumcoef = f_dlycoeff_r[F_D-1] + f_dlycoeff_i[F_D-1]; |
f_sumdiff = f_predifr + f_predifi; |
end |
|
// Induction helpers |
always @(posedge i_clk) |
if (f_startup_counter >= F_D) |
begin |
if (f_dlycoeff_r[F_D-1] == 0) |
assert(p_one == 0); |
if (f_dlycoeff_i[F_D-1] == 0) |
assert(p_two == 0); |
|
if (f_dlycoeff_r[F_D-1] == 1) |
assert(p_one == f_predifrx); |
if (f_dlycoeff_i[F_D-1] == 1) |
assert(p_two == f_predifix); |
|
if (f_predifr == 0) |
assert(p_one == 0); |
if (f_predifi == 0) |
assert(p_two == 0); |
|
// verilator lint_off WIDTH |
if (f_predifr == 1) |
assert(p_one == f_dlycoeff_r[F_D-1]); |
if (f_predifi == 1) |
assert(p_two == f_dlycoeff_i[F_D-1]); |
// verilator lint_on WIDTH |
|
if (f_sumcoef == 0) |
assert(p_three == 0); |
if (f_sumdiff == 0) |
assert(p_three == 0); |
// verilator lint_off WIDTH |
if (f_sumcoef == 1) |
assert(p_three == f_sumdiff); |
if (f_sumdiff == 1) |
assert(p_three == f_sumcoef); |
// verilator lint_on WIDTH |
`ifdef VERILATOR |
assert(p_one == f_predifr * f_dlycoeff_r[F_D-1]); |
assert(p_two == f_predifi * f_dlycoeff_i[F_D-1]); |
assert(p_three == f_sumdiff * f_sumcoef); |
`endif // VERILATOR |
end |
|
`endif // FORMAL |
endmodule |
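The arithmetic in `hwbfly` can be checked against a small integer model. The Python sketch below mirrors three parts of the module:

* the packing, with the real part in the top half of each word;
* the three-multiply complex product: p_one = c_r*d_r, p_two = c_i*d_i and p_three = (c_r+c_i)*(d_r+d_i), giving mpy_r = p_one - p_two and mpy_i = p_three - p_one - p_two;
* the left shift of the sum path by CWIDTH-2.

It stops before the final rounding. The `convround` function here is a hypothetical round-half-to-even stand-in, because the `convround` module's source and the exact meaning of its parameters are not shown in this section. Register widths and pipeline latency are not modeled either.

```python
from typing import Tuple


def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as two's complement."""
    value &= (1 << width) - 1
    return value - (1 << width) if (value >> (width - 1)) & 1 else value


def unpack(word: int, width: int) -> Tuple[int, int]:
    """Split a 2*width-bit word into (real, imag), real in the top half."""
    return to_signed(word >> width, width), to_signed(word, width)


def pack(real: int, imag: int, width: int) -> int:
    """Pack (real, imag) into one 2*width-bit word, real in the top half."""
    mask = (1 << width) - 1
    return ((real & mask) << width) | (imag & mask)


def three_mult_product(d_r: int, d_i: int, c_r: int, c_i: int) -> Tuple[int, int]:
    """(d_r + j*d_i) * (c_r + j*c_i) using three real multiplies."""
    p_one = c_r * d_r
    p_two = c_i * d_i
    p_three = (c_r + c_i) * (d_r + d_i)
    return p_one - p_two, p_three - p_one - p_two


def hwbfly_model(left: int, right: int, coef: int, iwidth: int, cwidth: int):
    """Unrounded (left_sr, left_si, mpy_r, mpy_i) for one butterfly."""
    l_r, l_i = unpack(left, iwidth)
    r_r, r_i = unpack(right, iwidth)
    c_r, c_i = unpack(coef, cwidth)
    # The sum path is shifted left by CWIDTH-2 so that it shares the
    # 2^(CWIDTH-2) scale carried by the coefficient products.
    left_sr = (l_r + r_r) << (cwidth - 2)
    left_si = (l_i + r_i) << (cwidth - 2)
    mpy_r, mpy_i = three_mult_product(l_r - r_r, l_i - r_i, c_r, c_i)
    return left_sr, left_si, mpy_r, mpy_i


def convround(value: int, drop: int) -> int:
    """Drop `drop` LSBs with round-half-to-even (assumed convround behaviour)."""
    if drop <= 0:
        return value
    quotient = value >> drop  # floor division by 2**drop
    remainder = value - (quotient << drop)
    half = 1 << (drop - 1)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient
```

Take CWIDTH=10 as an example. A coefficient of 256 = 2^(CWIDTH-2) then acts as 1.0, so `hwbfly_model(pack(10, -2, 8), pack(4, 6, 8), pack(256, 0, 10), 8, 10)` returns the sum (14, 4) and the difference (6, -8), each scaled by 256.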
/trunk/rtl/ifftmain.v
0,0 → 1,276
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: ifftmain.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This is the main module in the General Purpose FPGA FFT |
// implementation. As such, all other modules are subordinate |
// to this one. This module performs a fixed size complex FFT on |
// 2048 data points. |
// The FFT is fully pipelined, and accepts as inputs two complex two's |
// complement samples per clock. |
// |
// Parameters: |
// i_clk The clock. All operations are synchronous with this clock. |
// i_reset Synchronous reset, active high. Setting this line will |
// force the reset of all of the internals to this routine. |
// Further, following a reset, the o_sync line will go |
// high the same time the first output sample is valid. |
// i_ce A clock enable line. If this line is set, this module |
// will accept two complex values as inputs, and produce |
// two (possibly empty) complex values as outputs. |
// i_left The first of two complex input samples. This value is split |
// into two two's complement numbers, 15 bits each, with |
// the real portion in the high order bits, and the |
// imaginary portion taking the bottom 15 bits. |
// i_right This is the same thing as i_left, only this is the second of |
// two such samples. Hence, i_left would contain input |
// sample zero, i_right would contain sample one. On the |
// next clock i_left would contain input sample two, |
// i_right number three and so forth. |
// o_left The first of two output samples, of the same format as i_left, |
// only having 21 bits for each of the real and imaginary |
// components, leading to 42 bits total. |
// o_right The second of two output samples produced each clock. This has |
// the same format as o_left. |
// o_sync A one bit output indicating the first valid sample produced by |
// this FFT following a reset. Ever after, this will |
// indicate the first sample of an FFT frame. |
// |
// Arguments: This file was computer generated using the following command |
// line: |
// |
// % ./fftgen -i -d ../rtl -f 2048 -2 -p 0 -n 15 -a ../bench/cpp/ifftsize.h |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
// |
// |
module ifftmain(i_clk, i_reset, i_ce, |
i_left, i_right, |
o_left, o_right, o_sync); |
parameter IWIDTH=15, OWIDTH=21, LGWIDTH=11; |
// |
input i_clk, i_reset, i_ce; |
// |
input [(2*IWIDTH-1):0] i_left, i_right; |
output reg [(2*OWIDTH-1):0] o_left, o_right; |
output reg o_sync; |
|
|
// Outputs of the FFT, ready for bit reversal. |
wire [(2*OWIDTH-1):0] br_left, br_right; |
|
|
wire w_s2048; |
// verilator lint_off UNUSED |
wire w_os2048; |
// verilator lint_on UNUSED |
wire [31:0] w_e2048, w_o2048; |
fftstage #(IWIDTH,IWIDTH+4,16,11,9,0, |
0, 1, "icmem_e4096.hex") |
stage_e2048(i_clk, i_reset, i_ce, |
(!i_reset), i_left, w_e2048, w_s2048); |
fftstage #(IWIDTH,IWIDTH+4,16,11,9,0, |
0, 1, "icmem_o4096.hex") |
stage_o2048(i_clk, i_reset, i_ce, |
(!i_reset), i_right, w_o2048, w_os2048); |
|
|
wire w_s1024; |
// verilator lint_off UNUSED |
wire w_os1024; |
// verilator lint_on UNUSED |
wire [33:0] w_e1024, w_o1024; |
fftstage #(16,20,17,11,8,0, |
0, 1, "icmem_e2048.hex") |
stage_e1024(i_clk, i_reset, i_ce, |
w_s2048, w_e2048, w_e1024, w_s1024); |
fftstage #(16,20,17,11,8,0, |
0, 1, "icmem_o2048.hex") |
stage_o1024(i_clk, i_reset, i_ce, |
w_s2048, w_o2048, w_o1024, w_os1024); |
|
wire w_s512; |
// verilator lint_off UNUSED |
wire w_os512; |
// verilator lint_on UNUSED |
wire [33:0] w_e512, w_o512; |
fftstage #(17,21,17,11,7,0, |
0, 1, "icmem_e1024.hex") |
stage_e512(i_clk, i_reset, i_ce, |
w_s1024, w_e1024, w_e512, w_s512); |
fftstage #(17,21,17,11,7,0, |
0, 1, "icmem_o1024.hex") |
stage_o512(i_clk, i_reset, i_ce, |
w_s1024, w_o1024, w_o512, w_os512); |
|
wire w_s256; |
// verilator lint_off UNUSED |
wire w_os256; |
// verilator lint_on UNUSED |
wire [35:0] w_e256, w_o256; |
fftstage #(17,21,18,11,6,0, |
0, 1, "icmem_e512.hex") |
stage_e256(i_clk, i_reset, i_ce, |
w_s512, w_e512, w_e256, w_s256); |
fftstage #(17,21,18,11,6,0, |
0, 1, "icmem_o512.hex") |
stage_o256(i_clk, i_reset, i_ce, |
w_s512, w_o512, w_o256, w_os256); |
|
wire w_s128; |
// verilator lint_off UNUSED |
wire w_os128; |
// verilator lint_on UNUSED |
wire [35:0] w_e128, w_o128; |
fftstage #(18,22,18,11,5,0, |
0, 1, "icmem_e256.hex") |
stage_e128(i_clk, i_reset, i_ce, |
w_s256, w_e256, w_e128, w_s128); |
fftstage #(18,22,18,11,5,0, |
0, 1, "icmem_o256.hex") |
stage_o128(i_clk, i_reset, i_ce, |
w_s256, w_o256, w_o128, w_os128); |
|
wire w_s64; |
// verilator lint_off UNUSED |
wire w_os64; |
// verilator lint_on UNUSED |
wire [37:0] w_e64, w_o64; |
fftstage #(18,22,19,11,4,0, |
0, 1, "icmem_e128.hex") |
stage_e64(i_clk, i_reset, i_ce, |
w_s128, w_e128, w_e64, w_s64); |
fftstage #(18,22,19,11,4,0, |
0, 1, "icmem_o128.hex") |
stage_o64(i_clk, i_reset, i_ce, |
w_s128, w_o128, w_o64, w_os64); |
|
wire w_s32; |
// verilator lint_off UNUSED |
wire w_os32; |
// verilator lint_on UNUSED |
wire [37:0] w_e32, w_o32; |
fftstage #(19,23,19,11,3,0, |
0, 1, "icmem_e64.hex") |
stage_e32(i_clk, i_reset, i_ce, |
w_s64, w_e64, w_e32, w_s32); |
fftstage #(19,23,19,11,3,0, |
0, 1, "icmem_o64.hex") |
stage_o32(i_clk, i_reset, i_ce, |
w_s64, w_o64, w_o32, w_os32); |
|
wire w_s16; |
// verilator lint_off UNUSED |
wire w_os16; |
// verilator lint_on UNUSED |
wire [39:0] w_e16, w_o16; |
fftstage #(19,23,20,11,2,0, |
0, 1, "icmem_e32.hex") |
stage_e16(i_clk, i_reset, i_ce, |
w_s32, w_e32, w_e16, w_s16); |
fftstage #(19,23,20,11,2,0, |
0, 1, "icmem_o32.hex") |
stage_o16(i_clk, i_reset, i_ce, |
w_s32, w_o32, w_o16, w_os16); |
|
wire w_s8; |
// verilator lint_off UNUSED |
wire w_os8; |
// verilator lint_on UNUSED |
wire [39:0] w_e8, w_o8; |
fftstage #(20,24,20,11,1,0, |
0, 1, "icmem_e16.hex") |
stage_e8(i_clk, i_reset, i_ce, |
w_s16, w_e16, w_e8, w_s8); |
fftstage #(20,24,20,11,1,0, |
0, 1, "icmem_o16.hex") |
stage_o8(i_clk, i_reset, i_ce, |
w_s16, w_o16, w_o8, w_os8); |
|
wire w_s4; |
// verilator lint_off UNUSED |
wire w_os4; |
// verilator lint_on UNUSED |
wire [41:0] w_e4, w_o4; |
qtrstage #(20,21,11,0,1,0) stage_e4(i_clk, i_reset, i_ce, |
w_s8, w_e8, w_e4, w_s4); |
qtrstage #(20,21,11,1,1,0) stage_o4(i_clk, i_reset, i_ce, |
w_s8, w_o8, w_o4, w_os4); |
wire w_s2; |
wire [41:0] w_e2, w_o2; |
laststage #(21,21,0) stage_2(i_clk, i_reset, i_ce, |
w_s4, w_e4, w_o4, w_e2, w_o2, w_s2); |
|
|
// Prepare for a (potential) bit-reverse stage. |
assign br_left = w_e2; |
assign br_right = w_o2; |
|
wire br_start; |
reg r_br_started; |
initial r_br_started = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
r_br_started <= 1'b0; |
else if (i_ce) |
r_br_started <= r_br_started || w_s2; |
assign br_start = r_br_started || w_s2; |
|
// Now for the bit-reversal stage. |
wire br_sync; |
wire [(2*OWIDTH-1):0] br_o_left, br_o_right; |
bitreverse #(11,21) |
revstage(i_clk, i_reset, |
(i_ce & br_start), br_left, br_right, |
br_o_left, br_o_right, br_sync); |
|
|
// Last clock: Register our outputs, we're done. |
initial o_sync = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
o_sync <= 1'b0; |
else if (i_ce) |
o_sync <= br_sync; |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
o_left <= br_o_left; |
o_right <= br_o_right; |
end |
|
|
endmodule |
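A decimation-in-frequency pipeline such as ifftmain produces its outputs in bit-reversed order. That is why its final step is the `bitreverse #(11,21)` stage.

The C++ sketch below is written in C++ to match the project's `bench/cpp` test benches. It models only the index mapping for a frame of 2^LGWIDTH points. The name `bitrev` is hypothetical, and the sketch does not model how `bitreverse.v` buffers the two-samples-per-clock stream.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the low `lgsize` bits of `index` (hypothetical helper).
// In a decimation-in-frequency pipeline, stream position p holds
// transform bin bitrev(p, lgsize); the mapping is its own inverse.
uint32_t bitrev(uint32_t index, unsigned lgsize) {
    assert(lgsize <= 32);
    uint32_t r = 0;
    for (unsigned k = 0; k < lgsize; k++) {
        r = (r << 1) | (index & 1u);  // shift the current LSB of index into r
        index >>= 1;
    }
    return r;
}
```

For LGWIDTH = 11, `bitrev(1, 11)` is 1024 and `bitrev(3, 11)` is 1536. Applying the mapping twice returns the original index, so the same function describes both the scrambling and the unscrambling.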
/trunk/rtl/laststage.v
0,0 → 1,171
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: laststage.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This is part of an FPGA implementation that will process |
// the final stage of a decimation-in-frequency FFT, running |
// through the data at two samples per clock. From the derivation of |
// the FFT, this is the only stage in which even and odd samples are |
// combined with each other. Therefore, every other stage can be split |
// into two independent halves (one for the even samples, one for the |
// odd), each running at one sample per clock. |
// |
// Operation: |
// Given a stream of values, operate upon them as though they were |
// value pairs, x[2n] and x[2n+1]. The stream begins when n=0, and ends |
// when n=1. When the first x[0] value enters, the synchronization |
// input, i_sync, must be true as well. |
// |
// For this stream, produce outputs |
// y[2n ] = x[2n] + x[2n+1], and |
// y[2n+1] = x[2n] - x[2n+1] |
// |
// When y[0] is output, a synchronization bit o_sync will be true as |
// well, otherwise it will be zero. |
// |
// |
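// In this implementation, the output is valid three clock enables after |
// the input is valid: one for the add/subtract, one for convround, and |
// one for the output register, matching the rnd_sync, r_sync, o_sync |
// chain. The sum and difference grow by one bit beyond the input width |
// before convround reduces them (if needed) to OWIDTH bits. |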
// |
// i_clk A system clock |
// i_reset A synchronous reset |
// i_ce Circuit enable--nothing happens unless this line is high |
// i_sync A synchronization signal, high once per FFT at the start |
// i_left The first (even) complex sample input. The higher order |
// bits contain the real portion, low order bits the |
// imaginary portion, all in two's complement. |
// i_right The next (odd) complex sample input, same format as |
// i_left. |
// o_left The first (even) complex output. |
// o_right The next (odd) complex output. |
// o_sync Output synchronization signal. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module laststage(i_clk, i_reset, i_ce, i_sync, i_left, i_right, o_left, o_right, o_sync); |
parameter IWIDTH=16,OWIDTH=IWIDTH+1, SHIFT=0; |
input i_clk, i_reset, i_ce, i_sync; |
input [(2*IWIDTH-1):0] i_left, i_right; |
output reg [(2*OWIDTH-1):0] o_left, o_right; |
output reg o_sync; |
|
wire signed [(IWIDTH-1):0] i_in_0r, i_in_0i, i_in_1r, i_in_1i; |
assign i_in_0r = i_left[(2*IWIDTH-1):(IWIDTH)]; |
assign i_in_0i = i_left[(IWIDTH-1):0]; |
assign i_in_1r = i_right[(2*IWIDTH-1):(IWIDTH)]; |
assign i_in_1i = i_right[(IWIDTH-1):0]; |
wire [(OWIDTH-1):0] o_out_0r, o_out_0i, |
o_out_1r, o_out_1i; |
|
|
// Handle a potential rounding situation, when IWIDTH>=OWIDTH. |
|
|
|
// As with any register connected to the sync pulse, these must |
// have initial values and be reset on the i_reset signal. |
// Other data values need only restrict their updates to i_ce |
// enabled clocks, but sync registers must obey resets and initial |
// conditions as well. |
reg rnd_sync, r_sync; |
|
initial rnd_sync = 1'b0; // Sync into rounding |
initial r_sync = 1'b0; // Sync coming out |
always @(posedge i_clk) |
if (i_reset) |
begin |
rnd_sync <= 1'b0; |
r_sync <= 1'b0; |
end else if (i_ce) |
begin |
rnd_sync <= i_sync; |
r_sync <= rnd_sync; |
end |
|
// As with other variables, these are really only updated when in |
// the processing pipeline, after the first i_sync. However, to |
// eliminate as much unnecessary logic as possible, we update |
// these any time the i_ce line is enabled, and don't reset |
// them on i_reset. |
// Don't forget that we accumulate a bit by adding two values |
// together. Therefore our intermediate value must have one more |
// bit than the two originals. |
reg signed [(IWIDTH):0] rnd_in_0r, rnd_in_0i; |
reg signed [(IWIDTH):0] rnd_in_1r, rnd_in_1i; |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
// |
rnd_in_0r <= i_in_0r + i_in_1r; |
rnd_in_0i <= i_in_0i + i_in_1i; |
// |
rnd_in_1r <= i_in_0r - i_in_1r; |
rnd_in_1i <= i_in_0i - i_in_1i; |
// |
end |
|
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0r(i_clk, i_ce, |
rnd_in_0r, o_out_0r); |
|
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0i(i_clk, i_ce, |
rnd_in_0i, o_out_0i); |
|
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1r(i_clk, i_ce, |
rnd_in_1r, o_out_1r); |
|
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1i(i_clk, i_ce, |
rnd_in_1i, o_out_1i); |
|
|
// Prior versions of this routine did not include this extra |
// clock of output registers. They were added to work around a |
// problem in Verilator, which otherwise struggles with this logic. |
// (Hopefully this fixes the problem ...) |
always @(posedge i_clk) |
if (i_ce) |
begin |
o_left <= { o_out_0r, o_out_0i }; |
o_right <= { o_out_1r, o_out_1i }; |
end |
|
initial o_sync = 1'b0; // Final sync coming out of module |
always @(posedge i_clk) |
if (i_reset) |
o_sync <= 1'b0; |
else if (i_ce) |
o_sync <= r_sync; |
|
endmodule |
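The laststage datapath can be checked against a small bit-exact C++ model. The model makes these assumptions:

- It uses the default parameters OWIDTH = IWIDTH+1 and SHIFT = 0, so convround has no bits to discard. It therefore does not cover the `laststage #(21,21,0)` instance in ifftmain, which does round.
- It ignores the three-clock pipeline and the sync signals.
- It follows the port description above for packing: real part in the high IWIDTH bits, imaginary part in the low bits.

The helpers `sext`, `pack`, and `last_butterfly` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `width` bits of v (two's complement).
int64_t sext(uint64_t v, unsigned width) {
    assert(width >= 1 && width < 64);
    const uint64_t sign = 1ull << (width - 1);
    v &= (1ull << width) - 1;                        // keep only the field
    return static_cast<int64_t>((v ^ sign) - sign);
}

// Pack {re, im} with the real part in the high `width` bits.
uint64_t pack(int64_t re, int64_t im, unsigned width) {
    assert(width >= 1 && width <= 31);
    const uint64_t mask = (1ull << width) - 1;
    return ((static_cast<uint64_t>(re) & mask) << width)
         | (static_cast<uint64_t>(im) & mask);
}

// Model of one laststage butterfly for OWIDTH = IWIDTH + 1, SHIFT = 0:
// y[2n] = x[2n] + x[2n+1], y[2n+1] = x[2n] - x[2n+1].
void last_butterfly(uint64_t i_left, uint64_t i_right, unsigned iwidth,
                    uint64_t &o_left, uint64_t &o_right) {
    const int64_t lr = sext(i_left >> iwidth, iwidth),  li = sext(i_left, iwidth);
    const int64_t rr = sext(i_right >> iwidth, iwidth), ri = sext(i_right, iwidth);
    o_left  = pack(lr + rr, li + ri, iwidth + 1);    // one bit of growth
    o_right = pack(lr - rr, li - ri, iwidth + 1);
}
```

One extra output bit always suffices. With 4-bit inputs, the extreme sums (-8) + (-8) = -16 and differences 7 - (-8) = 15 both fit in 5 bits.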
/trunk/rtl/longbimpy.v
0,0 → 1,179
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: ../rtl/longbimpy.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: A portable shift-and-add multiply, built assuming the target |
// has six-input LUTs and a carry chain. That assumption |
// allows us to multiply two bits of one value at a time against all |
// of the bits of the other value. This sub-multiply is called the |
// bimpy. |
// |
// Processing delay grows with the width of the operand that is |
// consumed two bits at a time, so that operand should be the narrower |
// one (AW <= BW). The module enforces this by swapping i_a_unsorted |
// and i_b_unsorted internally whenever IAW > IBW. |
// |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module longbimpy(i_clk, i_ce, i_a_unsorted, i_b_unsorted, o_r); |
parameter IAW=8, // The width of i_a, min width is 5 |
IBW=12, // The width of i_b, can be anything |
// The following three parameters should not be changed |
// by any implementation, but are based upon hardware |
// and the above values: |
OW=IAW+IBW; // The output width |
localparam AW = (IAW<IBW) ? IAW : IBW, |
BW = (IAW<IBW) ? IBW : IAW, |
IW=(AW+1)&(-2), // Internal width of A |
LUTB=2, // How many bits we can multiply by at once |
TLEN=(AW+(LUTB-1))/LUTB; // Number of rows in our tableau |
input i_clk, i_ce; |
input [(IAW-1):0] i_a_unsorted; |
input [(IBW-1):0] i_b_unsorted; |
output reg [(AW+BW-1):0] o_r; |
|
// |
// Swap parameter order, so that AW <= BW -- for performance |
// reasons |
wire [AW-1:0] i_a; |
wire [BW-1:0] i_b; |
generate if (IAW <= IBW) |
begin : NO_PARAM_CHANGE |
assign i_a = i_a_unsorted; |
assign i_b = i_b_unsorted; |
end else begin : SWAP_PARAMETERS |
assign i_a = i_b_unsorted; |
assign i_b = i_a_unsorted; |
end endgenerate |
|
reg [(IW-1):0] u_a; |
reg [(BW-1):0] u_b; |
reg sgn; |
|
reg [(IW-1-2*(LUTB)):0] r_a[0:(TLEN-3)]; |
reg [(BW-1):0] r_b[0:(TLEN-3)]; |
reg [(TLEN-1):0] r_s; |
reg [(IW+BW-1):0] acc[0:(TLEN-2)]; |
genvar k; |
|
// First step: |
// Switch to unsigned arithmetic for our multiply, keeping track |
// of the sign along the way. We'll then reapply the sign at |
// the end. |
// |
// If we were forced to stay within two's complement arithmetic, |
// taking the absolute value here would require an additional bit. |
// However, because our results are now unsigned, we can stay |
// within the number of bits given (for now). |
generate if (IW > AW) |
begin |
always @(posedge i_clk) |
if (i_ce) |
u_a <= { 1'b0, (i_a[AW-1])?(-i_a):(i_a) }; |
end else begin |
always @(posedge i_clk) |
if (i_ce) |
u_a <= (i_a[AW-1])?(-i_a):(i_a); |
end endgenerate |
|
always @(posedge i_clk) |
if (i_ce) |
begin |
u_b <= (i_b[BW-1])?(-i_b):(i_b); |
sgn <= i_a[AW-1] ^ i_b[BW-1]; |
end |
|
wire [(BW+LUTB-1):0] pr_a, pr_b; |
|
// |
// Second step: First two 2xN products. |
// |
// Since we have no tableau of additions (yet), we can do both |
// of the first two rows at the same time and add them together. |
// Every later round has a previous sum to accumulate, so each of |
// those rounds adds only one new product at a time. Only this |
// first round can handle two products at once. |
bimpy #(BW) lmpy_0(i_clk,i_ce,u_a[( LUTB-1): 0], u_b, pr_a); |
bimpy #(BW) lmpy_1(i_clk,i_ce,u_a[(2*LUTB-1):LUTB], u_b, pr_b); |
always @(posedge i_clk) |
if (i_ce) r_a[0] <= u_a[(IW-1):(2*LUTB)]; |
always @(posedge i_clk) |
if (i_ce) r_b[0] <= u_b; |
always @(posedge i_clk) |
if (i_ce) r_s <= { r_s[(TLEN-2):0], sgn }; |
always @(posedge i_clk) // One clk after p[0],p[1] become valid |
if (i_ce) acc[0] <= { {(IW-LUTB){1'b0}}, pr_a} |
+{ {(IW-(2*LUTB)){1'b0}}, pr_b, {(LUTB){1'b0}} }; |
|
generate // Keep track of intermediate values, before multiplying them |
if (TLEN > 3) for(k=0; k<TLEN-3; k=k+1) |
begin : gencopies |
always @(posedge i_clk) |
if (i_ce) |
begin |
r_a[k+1] <= { {(LUTB){1'b0}}, |
r_a[k][(IW-1-(2*LUTB)):LUTB] }; |
r_b[k+1] <= r_b[k]; |
end |
end endgenerate |
|
generate // The actual multiply and accumulate stage |
if (TLEN > 2) for(k=0; k<TLEN-2; k=k+1) |
begin : genstages |
// First, the multiply: 2-bits times BW bits |
wire [(BW+LUTB-1):0] genp; |
bimpy #(BW) genmpy(i_clk,i_ce,r_a[k][(LUTB-1):0],r_b[k], genp); |
|
// Then the accumulate step -- on the next clock |
always @(posedge i_clk) |
if (i_ce) |
acc[k+1] <= acc[k] + {{(IW-LUTB*(k+3)){1'b0}}, |
genp, {(LUTB*(k+2)){1'b0}} }; |
end endgenerate |
|
wire [(IW+BW-1):0] w_r; |
assign w_r = (r_s[TLEN-1]) ? (-acc[TLEN-2]) : acc[TLEN-2]; |
always @(posedge i_clk) |
if (i_ce) |
o_r <= w_r[(AW+BW-1):0]; |
|
generate if (IW > AW) |
begin : VUNUSED |
// verilator lint_off UNUSED |
wire [(IW-AW)-1:0] unused; |
assign unused = w_r[(IW+BW-1):(AW+BW)]; |
// verilator lint_on UNUSED |
end endgenerate |
|
endmodule |
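The steps longbimpy performs are:

1. Convert both operands to sign-magnitude.
2. Build a tableau of 2-bit by BW-bit partial products.
3. Accumulate the rows, shifting each row by two more bits than the last.
4. Reapply the sign.

These steps can be sketched as a C++ behavioral model. The sketch assumes each bimpy row computes an exact unsigned 2xN product. It ignores the pipelining, the operand swap, and the output register, and `longbimpy_model` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model of the longbimpy algorithm (hypothetical name).
int64_t longbimpy_model(int32_t a, int32_t b) {
    // First step: switch to sign-magnitude, remembering the output sign.
    const bool negative = (a < 0) != (b < 0);
    const uint64_t ua = static_cast<uint64_t>(a < 0 ? -static_cast<int64_t>(a) : a);
    const uint64_t ub = static_cast<uint64_t>(b < 0 ? -static_cast<int64_t>(b) : b);

    // Tableau: consume |a| two bits (LUTB = 2) at a time, adding one
    // shifted 2xN partial product per row.
    uint64_t acc = 0;
    for (unsigned shift = 0; shift < 64 && (ua >> shift) != 0; shift += 2) {
        const uint64_t two_bits = (ua >> shift) & 3u;
        acc += (two_bits * ub) << shift;
    }

    // Last step: reapply the sign.
    return negative ? -static_cast<int64_t>(acc) : static_cast<int64_t>(acc);
}
```

For example, `longbimpy_model(-7, 13)` takes two rows. The first adds 3*13, and the second adds (1*13) << 2, for 91 in total before the sign is applied, giving -91.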
/trunk/rtl/qtrstage.v
0,0 → 1,178
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: qtrstage.v |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This file encapsulates the 4 point stage of a decimation in |
// frequency FFT. This particular implementation is optimized |
// so that all of the multiplies are accomplished by additions and |
// multiplexers only. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
`default_nettype none |
// |
module qtrstage(i_clk, i_reset, i_ce, i_sync, i_data, o_data, o_sync); |
parameter IWIDTH=16, OWIDTH=IWIDTH+1; |
// Parameters specific to the core that should be changed when this |
// core is built ... Note that the minimum LGSPAN is 2. Smaller |
// spans must use the fftdoubles stage. |
parameter LGWIDTH=8, ODD=0, INVERSE=0,SHIFT=0; |
input i_clk, i_reset, i_ce, i_sync; |
input [(2*IWIDTH-1):0] i_data; |
output reg [(2*OWIDTH-1):0] o_data; |
output reg o_sync; |
|
reg wait_for_sync; |
reg [3:0] pipeline; |
|
reg [(IWIDTH):0] sum_r, sum_i, diff_r, diff_i; |
|
reg [(2*OWIDTH-1):0] ob_a; |
wire [(2*OWIDTH-1):0] ob_b; |
reg [(OWIDTH-1):0] ob_b_r, ob_b_i; |
assign ob_b = { ob_b_r, ob_b_i }; |
|
reg [(LGWIDTH-1):0] iaddr; |
reg [(2*IWIDTH-1):0] imem; |
|
wire signed [(IWIDTH-1):0] imem_r, imem_i; |
assign imem_r = imem[(2*IWIDTH-1):(IWIDTH)]; |
assign imem_i = imem[(IWIDTH-1):0]; |
|
wire signed [(IWIDTH-1):0] i_data_r, i_data_i; |
assign i_data_r = i_data[(2*IWIDTH-1):(IWIDTH)]; |
assign i_data_i = i_data[(IWIDTH-1):0]; |
|
reg [(2*OWIDTH-1):0] omem; |
|
wire signed [(OWIDTH-1):0] rnd_sum_r, rnd_sum_i, rnd_diff_r, rnd_diff_i, |
n_rnd_diff_r, n_rnd_diff_i; |
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_sum_r(i_clk, i_ce, |
sum_r, rnd_sum_r); |
|
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_sum_i(i_clk, i_ce, |
sum_i, rnd_sum_i); |
|
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_diff_r(i_clk, i_ce, |
diff_r, rnd_diff_r); |
|
convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_diff_i(i_clk, i_ce, |
diff_i, rnd_diff_i); |
|
assign n_rnd_diff_r = - rnd_diff_r; |
assign n_rnd_diff_i = - rnd_diff_i; |
initial wait_for_sync = 1'b1; |
initial iaddr = 0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
wait_for_sync <= 1'b1; |
iaddr <= 0; |
end else if ((i_ce)&&((!wait_for_sync)||(i_sync))) |
begin |
iaddr <= iaddr + { {(LGWIDTH-1){1'b0}}, 1'b1 }; |
wait_for_sync <= 1'b0; |
end |
|
always @(posedge i_clk) |
if (i_ce) |
imem <= i_data; |
|
|
// Note that we don't check on wait_for_sync or i_sync here. |
// Why not? Because iaddr stays zero until the first i_ce on |
// which i_sync is high, so we are safe. |
initial pipeline = 4'h0; |
always @(posedge i_clk) |
if (i_reset) |
pipeline <= 4'h0; |
else if (i_ce) // is our pipeline process full? Which stages? |
pipeline <= { pipeline[2:0], iaddr[0] }; |
|
// This is the pipeline[-1] stage, pipeline[0] will be set next. |
always @(posedge i_clk) |
if ((i_ce)&&(iaddr[0])) |
begin |
sum_r <= imem_r + i_data_r; |
sum_i <= imem_i + i_data_i; |
diff_r <= imem_r - i_data_r; |
diff_i <= imem_i - i_data_i; |
end |
|
// pipeline[1] takes sum_x and diff_x and produces rnd_x |
|
// Now for pipeline[2]. We can actually do this at all i_ce |
// clock times, since nothing will listen unless pipeline[3] |
// is set on the next clock. Thus, we simplify this logic and do |
// it independent of pipeline[2]. |
always @(posedge i_clk) |
if (i_ce) |
begin |
ob_a <= { rnd_sum_r, rnd_sum_i }; |
// on Even, W = e^{-j2pi 1/4 0} = 1 |
if (ODD == 0) |
begin |
ob_b_r <= rnd_diff_r; |
ob_b_i <= rnd_diff_i; |
end else if (INVERSE==0) begin |
// on Odd, W = e^{-j2pi 1/4} = -j |
ob_b_r <= rnd_diff_i; |
ob_b_i <= n_rnd_diff_r; |
end else begin |
// on Odd, W = e^{j2pi 1/4} = j |
ob_b_r <= n_rnd_diff_i; |
ob_b_i <= rnd_diff_r; |
end |
end |
|
always @(posedge i_clk) |
if (i_ce) |
begin // In sequence, clock = 3 |
if (pipeline[3]) |
begin |
omem <= ob_b; |
o_data <= ob_a; |
end else |
o_data <= omem; |
end |
|
// Don't forget in the sync check that we are running |
// at two clocks per sample. Thus we need to |
// produce a sync every 2^(LGWIDTH-1) clocks. |
initial o_sync = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
o_sync <= 1'b0; |
else if (i_ce) |
o_sync <= &(~iaddr[(LGWIDTH-2):3]) && (iaddr[2:0] == 3'b101); |
endmodule |
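In qtrstage, the twiddle factors of the four-point stage are 1 and -j, or +j for the inverse transform. The "multiply" therefore reduces to swapping and negating the real and imaginary parts, as the ODD/INVERSE branches above do.

A small C++ sketch of that arithmetic follows. It ignores convround and the pipeline delay. The names `Cpx`, `qtr_twiddle`, and `qtr_butterfly` are hypothetical.

```cpp
#include <cassert>

struct Cpx { long re, im; };  // hypothetical complex integer type

// Twiddle applied to the difference term: W = 1 on the even stage,
// W = -j (forward) or W = +j (inverse) on the odd stage.
Cpx qtr_twiddle(Cpx d, bool odd, bool inverse) {
    if (!odd)
        return d;                      // W = 1
    if (!inverse)
        return Cpx{d.im, -d.re};       // (re + j im)(-j) = im - j re
    return Cpx{-d.im, d.re};           // (re + j im)(+j) = -im + j re
}

// One butterfly on consecutive samples x0 (imem) and x1 (i_data).
void qtr_butterfly(Cpx x0, Cpx x1, bool odd, bool inverse,
                   Cpx &sum, Cpx &twiddled_diff) {
    sum = Cpx{x0.re + x1.re, x0.im + x1.im};
    twiddled_diff = qtr_twiddle(Cpx{x0.re - x1.re, x0.im - x1.im}, odd, inverse);
}
```

Applying the forward odd twiddle and then the inverse one returns the original value, since (-j)(+j) = 1.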
/trunk/sw/Makefile
2,17 → 2,16
## |
## Filename: Makefile |
## |
## Project: A Doubletime Pipelined FFT |
## Project: A Generic Pipelined FFT Implementation |
## |
## Purpose: This is the main Makefile for the FFT core generator. |
## It is very simple in its construction, the most complicated |
## parts being the building of the Verilator simulation--a |
## step that may not be required for your project. |
## parts being the building of the Verilator simulation--a step that may |
## not be required for your project. |
## |
## To build the FFT generator, just type 'make' on a line |
## by itself. For a quick tutorial in how to run the |
## generator, just type './fftgen -h' to read the usage() |
## statement. |
## To build the FFT generator, just type 'make' on a line by itself. For |
## a quick tutorial in how to run the generator, just type './fftgen -h' |
## to read the usage() statement. |
## |
## Creator: Dan Gisselquist, Ph.D. |
## Gisselquist Technology, LLC |
19,7 → 18,7
## |
##########################################################################/ |
## |
## Copyright (C) 2015, Gisselquist Technology, LLC |
## Copyright (C) 2015,2018, Gisselquist Technology, LLC |
## |
## This program is free software (firmware): you can redistribute it and/or |
## modify it under the terms of the GNU General Public License as published |
45,10 → 44,20
## |
# This is really simple ... |
all: fftgen |
CORED := fft-core |
OBJDR := $(CORED)/obj_dir |
TESTSZ := 2048 |
BENCHD := ../bench/cpp |
CORED := ../rtl |
VOBJDR := $(CORED)/obj_dir |
OBJDIR := obj-pc |
BENCHD := ../bench/cpp |
SOURCES := bitreverse.cpp bldstage.cpp butterfly.cpp fftgen.cpp fftlib.cpp \ |
legal.cpp rounding.cpp softmpy.cpp |
TESTSZ := -f 2048 |
# CKPCE := -k 1 |
CKPCE := -2 |
MPYS := -p 0 |
IWID := -n 15 |
FFTPARAMS := -d $(CORED) $(TESTSZ) $(CKPCE) $(MPYS) $(IWID) |
OBJECTS := $(addprefix $(OBJDIR)/,$(subst .cpp,.o,$(SOURCES))) |
HEADERS := $(wildcard *.h) |
ifneq ($(VERILATOR_ROOT),) |
VERILATOR:=$(VERILATOR_ROOT)/bin/verilator |
else |
57,17 → 66,19
endif |
export $(VERILATOR) |
VROOT := $(VERILATOR_ROOT) |
VFLAGS := -Wall -MMD --trace -cc |
VFLAGS := -Wall -O3 -MMD --trace -cc |
CFLAGS := -g -Wall |
|
fftgen: fftgen.o |
$(CXX) $< -o $@ |
$(OBJDIR)/%.o: %.cpp |
$(mk-objdir) |
$(CXX) -c $(CFLAGS) $< -o $@ |
|
%.o: %.cpp |
$(CXX) -c $< -o $@ |
fftgen: $(OBJECTS) |
$(CXX) $(CFLAGS) $^ -o $@ |
|
.PHONY: test |
test: fft ifft butterfly dblreverse qtrstage dblstage fftstage_o2048 |
test: hwbfly shiftaddmpy longbimpy |
test: fft ifft butterfly fftstage hwbfly shiftaddmpy longbimpy qtrstage |
test: bitreverse laststage |
|
# |
# Although these parameters, a 2048 point FFT of 16 bits input, aren't |
76,101 → 87,176
# you may need to adjust the test benches if you wish to prove that your |
# changes work. |
# |
.PHONY: fft |
fft: fftgen |
./fftgen -f $(TESTSZ) -n 16 -p 6 -a $(BENCHD)/fftsize.h |
.PHONY: fft forcedfft |
fft: $(VOBJDR)/Vfftmain__ALL.so |
$(CORED)/fftmain.v: fftgen |
./fftgen -v $(FFTPARAMS) -a $(BENCHD)/fftsize.h |
forcedfft: fftgen |
./fftgen -v $(FFTPARAMS) -a $(BENCHD)/fftsize.h |
$(VOBJDR)/Vfftmain.h: $(CORED)/fftmain.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) fftmain.v |
cd $(OBJDR); make -f Vfftmain.mk |
$(VOBJDR)/Vfftmain__ALL.so: $(VOBJDR)/Vfftmain.h |
cd $(VOBJDR); make -f Vfftmain.mk |
|
.PHONY: dblfft |
dblfft: $(VOBJDR)/Vdblfftmain__ALL.so |
$(CORED)/dblfftmain.v: fftgen |
./fftgen -v $(FFTPARAMS) -a $(BENCHD)/fftsize.h |
$(VOBJDR)/Vdblfftmain.h: $(CORED)/dblfftmain.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) dblfftmain.v |
$(VOBJDR)/Vdblfftmain__ALL.so: $(VOBJDR)/Vdblfftmain.h |
cd $(VOBJDR); make -f Vdblfftmain.mk |
|
.PHONY: idblfft |
idblfft: $(VOBJDR)/Vidblfftmain__ALL.so |
$(CORED)/idblfftmain.v: fftgen |
./fftgen -i $(FFTPARAMS) -a $(BENCHD)/ifftsize.h |
$(VOBJDR)/Vidblfftmain.h: $(CORED)/idblfftmain.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) idblfftmain.v |
$(VOBJDR)/Vidblfftmain__ALL.so: $(VOBJDR)/Vidblfftmain.h |
cd $(VOBJDR); make -f Vidblfftmain.mk |
|
.PHONY: ifft |
ifft: fftgen |
./fftgen -f $(TESTSZ) -i -n 22 -p 6 -a $(BENCHD)/ifftsize.h |
ifft: $(VOBJDR)/Vifftmain__ALL.so |
$(CORED)/ifftmain.v: fftgen |
./fftgen -i $(FFTPARAMS) -a $(BENCHD)/ifftsize.h |
$(VOBJDR)/Vifftmain.h: $(CORED)/ifftmain.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) ifftmain.v |
cd $(OBJDR); make -f Vifftmain.mk |
$(VOBJDR)/Vifftmain__ALL.so: $(VOBJDR)/Vifftmain.h |
cd $(VOBJDR); make -f Vifftmain.mk |
|
.PHONY: shiftaddmpy |
shiftaddmpy: $(OBJDR)/Vshiftaddmpy__ALL.a |
shiftaddmpy: $(VOBJDR)/Vshiftaddmpy__ALL.a |
|
$(CORED)/shiftaddmpy.v: fft |
$(OBJDR)/Vshiftaddmpy.cpp $(OBJDR)/Vshiftaddmpy.h: $(CORED)/shiftaddmpy.v |
$(VOBJDR)/Vshiftaddmpy.cpp $(VOBJDR)/Vshiftaddmpy.h: $(CORED)/shiftaddmpy.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) shiftaddmpy.v |
$(OBJDR)/Vshiftaddmpy__ALL.a: $(OBJDR)/Vshiftaddmpy.h |
$(OBJDR)/Vshiftaddmpy__ALL.a: $(OBJDR)/Vshiftaddmpy.cpp |
cd $(OBJDR)/; make -f Vshiftaddmpy.mk |
$(VOBJDR)/Vshiftaddmpy__ALL.a: $(VOBJDR)/Vshiftaddmpy.h |
$(VOBJDR)/Vshiftaddmpy__ALL.a: $(VOBJDR)/Vshiftaddmpy.cpp |
cd $(VOBJDR)/; make -f Vshiftaddmpy.mk |
|
.PHONY: longbimpy |
longbimpy: $(OBJDR)/Vlongbimpy__ALL.a |
longbimpy: $(VOBJDR)/Vlongbimpy__ALL.a |
|
$(CORED)/longbimpy.v: fft |
$(OBJDR)/Vlongbimpy.cpp $(OBJDR)/Vlongbimpy.h: $(CORED)/longbimpy.v |
$(VOBJDR)/Vlongbimpy.cpp $(VOBJDR)/Vlongbimpy.h: $(CORED)/longbimpy.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) longbimpy.v |
$(OBJDR)/Vlongbimpy__ALL.a: $(OBJDR)/Vlongbimpy.h |
$(OBJDR)/Vlongbimpy__ALL.a: $(OBJDR)/Vlongbimpy.cpp |
cd $(OBJDR)/; make -f Vlongbimpy.mk |
$(VOBJDR)/Vlongbimpy__ALL.a: $(VOBJDR)/Vlongbimpy.h |
$(VOBJDR)/Vlongbimpy__ALL.a: $(VOBJDR)/Vlongbimpy.cpp |
cd $(VOBJDR)/; make -f Vlongbimpy.mk |
|
.PHONY: butterfly |
butterfly: $(OBJDR)/Vbutterfly__ALL.a |
butterfly: $(VOBJDR)/Vbutterfly__ALL.a |
|
$(CORED)/butterfly.v: fft |
$(OBJDR)/Vbutterfly.cpp $(OBJDR)/Vbutterfly.h: $(CORED)/butterfly.v |
$(VOBJDR)/Vbutterfly.cpp $(VOBJDR)/Vbutterfly.h: $(CORED)/butterfly.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) butterfly.v |
$(OBJDR)/Vbutterfly__ALL.a: $(OBJDR)/Vbutterfly.h |
$(OBJDR)/Vbutterfly__ALL.a: $(OBJDR)/Vbutterfly.cpp |
cd $(OBJDR)/; make -f Vbutterfly.mk |
$(VOBJDR)/Vbutterfly__ALL.a: $(VOBJDR)/Vbutterfly.h |
$(VOBJDR)/Vbutterfly__ALL.a: $(VOBJDR)/Vbutterfly.cpp |
cd $(VOBJDR)/; make -f Vbutterfly.mk |
|
.PHONY: hwbfly |
hwbfly: $(OBJDR)/Vhwbfly__ALL.a |
hwbfly: $(VOBJDR)/Vhwbfly__ALL.a |
|
$(CORED)/hwbfly.v: fft |
$(OBJDR)/Vhwbfly.cpp $(OBJDR)/Vhwbfly.h: $(CORED)/hwbfly.v |
$(VOBJDR)/Vhwbfly.cpp $(VOBJDR)/Vhwbfly.h: $(CORED)/hwbfly.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) hwbfly.v |
$(OBJDR)/Vhwbfly__ALL.a: $(OBJDR)/Vhwbfly.h |
$(OBJDR)/Vhwbfly__ALL.a: $(OBJDR)/Vhwbfly.cpp |
cd $(OBJDR)/; make -f Vhwbfly.mk |
$(VOBJDR)/Vhwbfly__ALL.a: $(VOBJDR)/Vhwbfly.h |
$(VOBJDR)/Vhwbfly__ALL.a: $(VOBJDR)/Vhwbfly.cpp |
cd $(VOBJDR)/; make -f Vhwbfly.mk |
|
.PHONY: dblreverse |
dblreverse: $(OBJDR)/Vdblreverse__ALL.a |
.PHONY: bitreverse |
bitreverse: $(VOBJDR)/Vbitreverse__ALL.a |
|
$(CORED)/dblreverse.v: fft |
$(OBJDR)/Vdblreverse.cpp $(OBJDR)/Vdblreverse.h: $(CORED)/dblreverse.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) dblreverse.v |
$(OBJDR)/Vdblreverse__ALL.a: $(OBJDR)/Vdblreverse.h |
$(OBJDR)/Vdblreverse__ALL.a: $(OBJDR)/Vdblreverse.cpp |
cd $(OBJDR)/; make -f Vdblreverse.mk |
$(CORED)/bitreverse.v: fft |
$(VOBJDR)/Vbitreverse.cpp $(VOBJDR)/Vbitreverse.h: $(CORED)/bitreverse.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) bitreverse.v |
$(VOBJDR)/Vbitreverse__ALL.a: $(VOBJDR)/Vbitreverse.h |
$(VOBJDR)/Vbitreverse__ALL.a: $(VOBJDR)/Vbitreverse.cpp |
cd $(VOBJDR)/; make -f Vbitreverse.mk |
|
.PHONY: qtrstage |
qtrstage: $(OBJDR)/Vqtrstage__ALL.a |
qtrstage: $(VOBJDR)/Vqtrstage__ALL.a |
|
$(CORED)/qtrstage.v: fft |
$(OBJDR)/Vqtrstage.cpp $(OBJDR)/Vqtrstage.h: $(CORED)/qtrstage.v |
$(VOBJDR)/Vqtrstage.cpp $(VOBJDR)/Vqtrstage.h: $(CORED)/qtrstage.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) qtrstage.v |
$(OBJDR)/Vqtrstage__ALL.a: $(OBJDR)/Vqtrstage.h |
$(OBJDR)/Vqtrstage__ALL.a: $(OBJDR)/Vqtrstage.cpp |
cd $(OBJDR)/; make -f Vqtrstage.mk |
$(VOBJDR)/Vqtrstage__ALL.a: $(VOBJDR)/Vqtrstage.h |
$(VOBJDR)/Vqtrstage__ALL.a: $(VOBJDR)/Vqtrstage.cpp |
cd $(VOBJDR)/; make -f Vqtrstage.mk |
|
.PHONY: dblstage |
dblstage: $(OBJDR)/Vdblstage__ALL.a |
dblstage: $(VOBJDR)/Vdblstage__ALL.a |
|
$(CORED)/dblstage.v: fft |
$(OBJDR)/Vdblstage.cpp $(OBJDR)/Vdblstage.h: $(CORED)/dblstage.v |
$(VOBJDR)/Vdblstage.cpp $(VOBJDR)/Vdblstage.h: $(CORED)/dblstage.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) dblstage.v |
$(OBJDR)/Vdblstage__ALL.a: $(OBJDR)/Vdblstage.h |
$(OBJDR)/Vdblstage__ALL.a: $(OBJDR)/Vdblstage.cpp |
cd $(OBJDR)/; make -f Vdblstage.mk |
$(VOBJDR)/Vdblstage__ALL.a: $(VOBJDR)/Vdblstage.h |
$(VOBJDR)/Vdblstage__ALL.a: $(VOBJDR)/Vdblstage.cpp |
cd $(VOBJDR)/; make -f Vdblstage.mk |
|
.PHONY: fftstage_o2048 |
fftstage_o2048: $(OBJDR)/Vfftstage_o2048__ALL.a |
.PHONY: laststage |
laststage: $(VOBJDR)/Vlaststage__ALL.a |
|
$(CORED)/fftstage_o2048.v: fft |
$(OBJDR)/Vfftstage_o2048.cpp $(OBJDR)/Vfftstage_o2048.h: $(CORED)/fftstage_o2048.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) fftstage_o2048.v |
$(OBJDR)/Vfftstage_o2048__ALL.a: $(OBJDR)/Vfftstage_o2048.h |
$(OBJDR)/Vfftstage_o2048__ALL.a: $(OBJDR)/Vfftstage_o2048.cpp |
cd $(OBJDR)/; make -f Vfftstage_o2048.mk |
$(CORED)/laststage.v: fft |
$(VOBJDR)/Vlaststage.cpp $(VOBJDR)/Vlaststage.h: $(CORED)/laststage.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) laststage.v |
$(VOBJDR)/Vlaststage__ALL.a: $(VOBJDR)/Vlaststage.h |
$(VOBJDR)/Vlaststage__ALL.a: $(VOBJDR)/Vlaststage.cpp |
cd $(VOBJDR)/; make -f Vlaststage.mk |
|
.PHONY: fftstage |
fftstage: $(VOBJDR)/Vfftstage__ALL.a |
|
$(CORED)/fftstage.v: fft |
$(VOBJDR)/Vfftstage.cpp $(VOBJDR)/Vfftstage.h: $(CORED)/fftstage.v |
cd $(CORED)/; $(VERILATOR) $(VFLAGS) fftstage.v |
$(VOBJDR)/Vfftstage__ALL.a: $(VOBJDR)/Vfftstage.h |
$(VOBJDR)/Vfftstage__ALL.a: $(VOBJDR)/Vfftstage.cpp |
cd $(VOBJDR)/; make -f Vfftstage.mk |
|
|
.PHONY: clean |
clean: |
rm fftgen fftgen.o |
rm -rf $(CORED) |
rm -rf $(CORED)/obj_dir |
rm -rf $(CORED)/fftmain.v $(CORED)/fftstage.v |
rm -rf $(CORED)/qtrstage.v $(CORED)/laststage.v $(CORED)/bitreverse.v |
rm -rf $(CORED)/butterfly.v $(CORED)/hwbfly.v |
rm -rf $(CORED)/longbimpy.v $(CORED)/bimpy.v |
rm -rf $(CORED)/convround.v |
|
# |
# The "depends" target, which records the headers each object file depends |
# upon. The depends file itself is kept in $(OBJDIR)/depends.txt |
# |
define build-depends |
$(mk-objdir) |
@echo "Building dependency file" |
@$(CXX) $(CFLAGS) $(INCS) -MM $(SOURCES) > $(OBJDIR)/xdepends.txt |
@sed -e 's/^.*.o: /$(OBJDIR)\/&/' < $(OBJDIR)/xdepends.txt > $(OBJDIR)/depends.txt |
@rm $(OBJDIR)/xdepends.txt |
endef |
|
.PHONY: depends |
depends: tags |
$(build-depends) |
|
$(OBJDIR)/depends.txt: depends |
|
# |
# Make a directory to hold all of the FFT-gen (i.e. the C++) build products |
# (object files) |
# |
define mk-objdir |
@bash -c "if [ ! -e $(OBJDIR) ]; then mkdir -p $(OBJDIR); fi" |
endef |
|
# |
# The "tags" target |
# |
tags: $(SOURCES) $(HEADERS) |
@echo "Generating tags" |
@ctags $(SOURCES) $(HEADERS) |
|
-include $(OBJDIR)/depends.txt |
/trunk/sw/README.md
0,0 → 1,11
This directory contains the software to generate the FFT. It compiles into a |
program called `fftgen`, which you can then call to generate the FFT you are |
interested in. |
|
Components of this coregen include: |
|
- [fftgen.cpp](fftgen.cpp) - This is the top level or 'main' FFT generation program. |
- [bldstage.cpp](bldstage.cpp) - Generates the code for a single FFT stage, |
called [fftstage.v](../rtl/fftstage.v) in the RTL directory. |
- [softmpy.cpp](softmpy.cpp) - Generates a soft multiply. |
- [bitreverse.cpp](bitreverse.cpp) - Generates the bit reverse modules: a one-sample-per-clock version and a two-samples-per-clock (`dblreverse`) version. |
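To make the single-sample bit reverse module easier to follow, here is a small C++ model of the addressing that `build_snglbrev()` emits: the low `LGSIZE` bits of the read address are the write address reversed, and the top bit is inverted so that one half of the `2*N`-word memory is read while the other half is written. This is a sketch written for this README, not code taken from `fftgen`; the helper names `bitrev`, `brev_rdaddr`, and `simulate_snglbrev` are hypothetical. It assumes `i_ce` is held high, ignores reset, and returns each value at the clock it is read from memory, i.e. one clock before it appears on the registered `o_out`.

```cpp
#include <cassert>
#include <vector>

// Reverse the low `lgsize` bits of `addr` (bit k moves to bit lgsize-1-k).
unsigned bitrev(unsigned addr, unsigned lgsize) {
    unsigned r = 0;
    for (unsigned k = 0; k < lgsize; k++)
        if (addr & (1u << (lgsize - 1 - k)))
            r |= 1u << k;
    return r;
}

// Mirrors the generated assignments:
//   rdaddr[k]      = wraddr[LGSIZE-1-k]   for k < LGSIZE
//   rdaddr[LGSIZE] = !wraddr[LGSIZE]      (read the half not being written)
unsigned brev_rdaddr(unsigned wraddr, unsigned lgsize) {
    unsigned lo  = bitrev(wraddr & ((1u << lgsize) - 1), lgsize);
    unsigned top = ((wraddr >> lgsize) & 1u) ^ 1u;
    return (top << lgsize) | lo;
}

// Hypothetical clock-by-clock model with i_ce always high: feed `nframes`
// frames of the sample values 0, 1, 2, ... and return every value read once
// the first frame has been fully written (the first one carries o_sync).
std::vector<int> simulate_snglbrev(unsigned lgsize, unsigned nframes) {
    assert(lgsize > 0 && lgsize < 16);
    const unsigned n = 1u << lgsize;
    std::vector<int> mem(2 * n, 0);   // brmem: two halves of n words each
    std::vector<int> out;
    unsigned wraddr = 0;               // LGSIZE+1 bit write counter
    for (unsigned t = 0; t < nframes * n; t++) {
        // Nonblocking semantics: the read sees memory before this clock's
        // write, and it always targets the other half anyway.
        int rd = mem[brev_rdaddr(wraddr, lgsize)];
        mem[wraddr] = static_cast<int>(t);
        wraddr = (wraddr + 1) & (2 * n - 1);
        if (t >= n)
            out.push_back(rd);
    }
    return out;
}
```

For example, `simulate_snglbrev(3, 2)` (an 8-point FFT, two frames in) returns `{0, 4, 2, 6, 1, 5, 3, 7}`, the first frame in bit-reversed order. Using the top address bit as a ping-pong selector is what lets the module write one frame while reading out the previous one, at the cost of a memory twice the FFT size.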
/trunk/sw/bitreverse.cpp
0,0 → 1,673
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: bitreverse.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen |
#include <stdio.h> |
#include <stdlib.h> |
|
#ifdef _MSC_VER // added for ms vs compatibility |
|
#include <io.h> |
#include <direct.h> |
#define _USE_MATH_DEFINES |
#else |
// And for G++/Linux environment |
|
#include <unistd.h> // Defines the R_OK/W_OK/etc. macros |
#include <sys/stat.h> |
#endif |
|
#include <string.h> |
#include <string> |
#include <math.h> |
#include <ctype.h> |
#include <assert.h> |
|
#include "defaults.h" |
#include "legal.h" |
#include "bitreverse.h" |
|
void build_snglbrev(const char *fname, const bool async_reset) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
std::string resetw("i_reset"); |
if (async_reset) |
resetw = std::string("i_areset_n"); |
|
char *modulename = strdup(fname), *pslash; |
modulename[strlen(modulename)-2] = '\0'; |
pslash = strrchr(modulename, '/'); |
if (pslash != NULL) |
memmove(modulename, pslash+1, strlen(pslash+1)+1); // overlapping copy: strcpy() is undefined here |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\t%s.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose:\tThis module bitreverses a pipelined FFT input. It differs\n" |
"// from the dblreverse module in that this is just a simple and\n" |
"// straightforward bitreverse, rather than one written to handle two\n" |
"// words at once.\n" |
"//\n" |
"//\n%s" |
"//\n", modulename, prjname, creator); |
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module %s(i_clk, %s, i_ce, i_in, o_out, o_sync);\n" |
"\tparameter\t\t\tLGSIZE=%d, WIDTH=24;\n" |
"\tinput\t\t\t\ti_clk, %s, i_ce;\n" |
"\tinput\t\t[(2*WIDTH-1):0]\ti_in;\n" |
"\toutput\treg\t[(2*WIDTH-1):0]\to_out;\n" |
"\toutput\treg\t\t\to_sync;\n", modulename, resetw.c_str(), |
TST_DBLREVERSE_LGSIZE, |
resetw.c_str()); |
|
fprintf(fp, |
" reg [(LGSIZE):0] wraddr;\n" |
" wire [(LGSIZE):0] rdaddr;\n" |
"\n" |
" reg [(2*WIDTH-1):0] brmem [0:((1<<(LGSIZE+1))-1)];\n" |
"\n" |
" genvar k;\n" |
" generate for(k=0; k<LGSIZE; k=k+1)\n" |
" assign rdaddr[k] = wraddr[LGSIZE-1-k];\n" |
" endgenerate\n" |
" assign rdaddr[LGSIZE] = !wraddr[LGSIZE];\n" |
"\n" |
" reg in_reset;\n" |
"\n" |
" initial in_reset = 1'b1;\n"); |
|
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
" in_reset <= 1'b1;\n" |
" else if ((i_ce)&&(&wraddr[(LGSIZE-1):0]))\n" |
" in_reset <= 1'b0;\n" |
"\n" |
" initial wraddr = 0;\n"); |
|
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
" wraddr <= 0;\n" |
" else if (i_ce)\n" |
" begin\n" |
" brmem[wraddr] <= i_in;\n" |
" wraddr <= wraddr + 1;\n" |
" end\n" |
"\n" |
" always @(posedge i_clk)\n" |
" if (i_ce) // If (i_reset) we just output junk ... not a problem\n" |
" o_out <= brmem[rdaddr]; // w/o a sync pulse\n" |
"\n" |
" initial o_sync = 1'b0;\n"); |
|
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
" o_sync <= 1'b0;\n" |
" else if ((i_ce)&&(!in_reset))\n" |
" o_sync <= (wraddr[(LGSIZE-1):0] == 0);\n" |
"\n"); |
|
|
if (formal_property_flag) { |
fprintf(fp, |
"`ifdef\tFORMAL\n" |
"`ifdef BITREVERSE\n" |
"`define\tASSUME assume\n" |
"`define\tASSERT assert\n"); |
if (async_reset) |
fprintf(fp, |
"\n\talways @($global_clock)\n" |
"\t\tassume(i_clk != $past(i_clk));\n\n"); |
|
fprintf(fp, |
"`else\n" |
"`define\tASSUME assert\n" |
"`define\tASSERT assume\n" |
"`endif\n" |
"\n" |
"\treg f_past_valid;\n" |
"\tinitial f_past_valid = 1'b0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tf_past_valid <= 1'b1;\n\n"); |
|
if (async_reset) |
fprintf(fp, |
"\tinitial `ASSUME(!i_areset_n);\n" |
"\talways @($global_clock)\n" |
"\tif (!$rose(i_clk))\n" |
"\t\t`ASSERT(!$rose(i_areset_n));\n\n" |
"\talways @($global_clock)\n" |
"\tif (!$rose(i_clk))\n" |
"\tbegin\n" |
"\t\t`ASSUME($stable(i_ce));\n" |
"\t\t`ASSUME($stable(i_in));\n" |
"\t\t//\n" |
"\t\tif (i_areset_n)\n" |
"\t\tbegin\n" |
"\t\t\t`ASSERT($stable(o_out));\n" |
"\t\t\t`ASSERT($stable(o_sync));\n" |
"\t\tend\n" |
"\tend\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\tif ((!f_past_valid)||(!i_areset_n))\n" |
"\tbegin\n"); |
else |
fprintf(fp, |
"\tinitial `ASSUME(i_reset);\n" |
"\talways @(posedge i_clk)\n" |
"\tif ((!f_past_valid)||($past(i_reset)))\n" |
"\tbegin\n"); |
|
fprintf(fp, |
"\t\t`ASSERT(wraddr == 0);\n" |
"\t\t`ASSERT(in_reset);\n" |
"\t\t`ASSERT(!o_sync);\n"); |
fprintf(fp, "\tend\n"); |
|
|
fprintf(fp, "`ifdef BITREVERSE\n" |
"\talways @(posedge i_clk)\n" |
"\t\tassume((i_ce)||($past(i_ce))||($past(i_ce,2)));\n" |
"`endif // BITREVERSE\n\n"); |
|
fprintf(fp, |
"\t\t(* anyconst *) reg [LGSIZE:0]\tf_const_addr;\n" |
"\t\twire\t[LGSIZE:0]\tf_reversed_addr;\n" |
"\t\treg\t f_addr_loaded;\n" |
"\t\treg\t[(2*WIDTH-1):0]\tf_addr_value;\n" |
"\n" |
"\t\tgenerate for(k=0; k<LGSIZE; k=k+1)\n" |
"\t\t\tassign\tf_reversed_addr[k] = f_const_addr[LGSIZE-1-k];\n" |
"\t\tendgenerate\n" |
"\t\tassign\tf_reversed_addr[LGSIZE] = f_const_addr[LGSIZE];\n" |
"\n" |
"\t\tinitial\tf_addr_loaded = 1'b0;\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_reset)\n" |
"\t\t\tf_addr_loaded <= 1'b0;\n" |
"\t\telse if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tif (wraddr == f_const_addr)\n" |
"\t\t\t\tf_addr_loaded <= 1'b1;\n" |
"\t\t\telse if (rdaddr == f_const_addr)\n" |
"\t\t\t\tf_addr_loaded <= 1'b0;\n" |
"\t\tend\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif ((i_ce)&&(wraddr == f_const_addr))\n" |
"\t\tbegin\n" |
"\t\t\tf_addr_value <= i_in;\n" |
"\t\t\t`ASSERT(!f_addr_loaded);\n" |
"\t\tend\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif ((f_past_valid)&&(!$past(i_reset))\n" |
"\t\t\t\t&&($past(f_addr_loaded))&&(!f_addr_loaded))\n" |
"\t\t\tassert(o_out == f_addr_value);\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\tif (o_sync)\n" |
"\t\t\tassert(wraddr[LGSIZE-1:0] == 1);\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\tif ((wraddr[LGSIZE]==f_const_addr[LGSIZE])\n" |
"\t\t\t\t&&(wraddr[LGSIZE-1:0]\n" |
"\t\t\t\t\t\t<= f_const_addr[LGSIZE-1:0]))\n" |
"\t\t\t`ASSERT(!f_addr_loaded);\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\tif ((rdaddr[LGSIZE]==f_const_addr[LGSIZE])&&(f_addr_loaded))\n" |
"\t\t\t`ASSERT(wraddr[LGSIZE-1:0]\n" |
"\t\t\t\t\t<= f_reversed_addr[LGSIZE-1:0]+1);\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\tif (f_addr_loaded)\n" |
"\t\t\t`ASSERT(brmem[f_const_addr] == f_addr_value);\n" |
"\n" |
"\n\n"); |
|
fprintf(fp, |
"`endif\t// FORMAL\n"); |
} |
|
fprintf(fp, |
"endmodule\n"); |
|
fclose(fp); |
free(modulename); |
} |
|
void build_dblreverse(const char *fname, const bool async_reset) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
std::string resetw("i_reset"); |
if (async_reset) |
resetw = std::string("i_areset_n"); |
|
char *modulename = strdup(fname), *pslash; |
modulename[strlen(modulename)-2] = '\0'; |
pslash = strrchr(modulename, '/'); |
if (pslash != NULL) |
memmove(modulename, pslash+1, strlen(pslash+1)+1); // overlapping copy: strcpy() is undefined here |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\t%s.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose:\tThis module bitreverses a pipelined FFT input. Operation is\n" |
"// expected as follows:\n" |
"//\n" |
"// i_clk A running clock at whatever system speed is offered.\n", |
modulename, prjname); |
|
if (async_reset) |
fprintf(fp, |
"// i_areset_n An active low asynchronous reset signal,\n" |
"// that resets all internals\n"); |
else |
fprintf(fp, |
"// i_reset A synchronous reset signal, that resets all internals\n"); |
|
fprintf(fp, |
"// i_ce If this is one, two inputs are consumed and two outputs\n" |
"// are produced.\n" |
"// i_in_0, i_in_1\n" |
"// Two inputs to be consumed, each of width WIDTH.\n" |
"// o_out_0, o_out_1\n" |
"// Two of the bitreversed outputs, also of the same\n" |
"// width, WIDTH. Of course, there is a delay from the\n" |
"// first input to the first output. For this purpose,\n" |
"// o_sync is present.\n" |
"// o_sync This will be a 1\'b1 for the first value in any block.\n" |
"// Following a reset, this will only become 1\'b1 once\n" |
"// the data has been loaded and is now valid. After that,\n" |
"// all outputs will be valid.\n" |
"//\n" |
"// 20150602 -- This module has undergone massive rework in order to\n" |
"// ensure that it uses resources efficiently. As a result,\n" |
"// it now optimizes nicely into block RAMs. As an unfortunate\n" |
"// side effect, it now passes its bench test (dblrev_tb) but\n" |
"// fails the integration bench test (fft_tb).\n" |
"//\n" |
"//\n%s" |
"//\n", creator); |
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"\n\n" |
"//\n" |
"// How do we do bit reversing at two samples per clock? Can we separate out\n" |
"// our work into eight memory banks, writing two banks at once and reading\n" |
"// another two banks in the same clock?\n" |
"//\n" |
"// mem[00xxx0] = s_0[n]\n" |
"// mem[00xxx1] = s_1[n]\n" |
"// o_0[n] = mem[10xxx0]\n" |
"// o_1[n] = mem[11xxx0]\n" |
"// ...\n" |
"// mem[01xxx0] = s_0[m]\n" |
"// mem[01xxx1] = s_1[m]\n" |
"// o_0[m] = mem[10xxx1]\n" |
"// o_1[m] = mem[11xxx1]\n" |
"// ...\n" |
"// mem[10xxx0] = s_0[n]\n" |
"// mem[10xxx1] = s_1[n]\n" |
"// o_0[n] = mem[00xxx0]\n" |
"// o_1[n] = mem[01xxx0]\n" |
"// ...\n" |
"// mem[11xxx0] = s_0[m]\n" |
"// mem[11xxx1] = s_1[m]\n" |
"// o_0[m] = mem[00xxx1]\n" |
"// o_1[m] = mem[01xxx1]\n" |
"// ...\n" |
"//\n" |
"// The answer is that, yes we can but: we need to use four memory banks\n" |
"// to do it properly. These four banks are defined by the two bits\n" |
"// that determine the top and bottom of the correct address. Larger\n" |
"// FFT\'s would require more memories.\n" |
"//\n" |
"//\n"); |
fprintf(fp, |
"module %s(i_clk, %s, i_ce, i_in_0, i_in_1,\n" |
"\t\to_out_0, o_out_1, o_sync);\n" |
"\tparameter\t\t\tLGSIZE=%d, WIDTH=24;\n" |
"\tinput\t\t\t\ti_clk, %s, i_ce;\n" |
"\tinput\t\t[(2*WIDTH-1):0]\ti_in_0, i_in_1;\n" |
"\toutput\twire\t[(2*WIDTH-1):0]\to_out_0, o_out_1;\n" |
"\toutput\treg\t\t\to_sync;\n", modulename, |
resetw.c_str(), TST_DBLREVERSE_LGSIZE, resetw.c_str()); |
|
fprintf(fp, |
"\n" |
"\treg\t\t\tin_reset;\n" |
"\treg\t[(LGSIZE-1):0]\tiaddr;\n" |
"\twire\t[(LGSIZE-3):0]\tbraddr;\n" |
"\n" |
"\tgenvar\tk;\n" |
"\tgenerate for(k=0; k<LGSIZE-2; k=k+1)\n" |
"\tbegin : gen_a_bit_reversed_value\n" |
"\t\tassign braddr[k] = iaddr[LGSIZE-3-k];\n" |
"\tend endgenerate\n" |
"\n" |
"\tinitial iaddr = 0;\n" |
"\tinitial in_reset = 1\'b1;\n" |
"\tinitial o_sync = 1\'b0;\n"); |
|
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
"\t\tbegin\n" |
"\t\t\tiaddr <= 0;\n" |
"\t\t\tin_reset <= 1\'b1;\n" |
"\t\t\to_sync <= 1\'b0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tiaddr <= iaddr + { {(LGSIZE-1){1\'b0}}, 1\'b1 };\n" |
"\t\t\tif (&iaddr[(LGSIZE-2):0])\n" |
"\t\t\t\tin_reset <= 1\'b0;\n" |
"\t\t\tif (in_reset)\n" |
"\t\t\t\to_sync <= 1\'b0;\n" |
"\t\t\telse\n" |
"\t\t\t\to_sync <= ~(|iaddr[(LGSIZE-2):0]);\n" |
"\t\tend\n" |
"\n" |
"\treg\t[(2*WIDTH-1):0]\tmem_e [0:((1<<(LGSIZE))-1)];\n" |
"\treg\t[(2*WIDTH-1):0]\tmem_o [0:((1<<(LGSIZE))-1)];\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\tmem_e[iaddr] <= i_in_0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\tmem_o[iaddr] <= i_in_1;\n" |
"\n" |
"\n" |
"\treg [(2*WIDTH-1):0] evn_out_0, evn_out_1, odd_out_0, odd_out_1;\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n\t\t\tevn_out_0 <= mem_e[{!iaddr[LGSIZE-1],1\'b0,braddr}];\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n\t\t\tevn_out_1 <= mem_e[{!iaddr[LGSIZE-1],1\'b1,braddr}];\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n\t\t\todd_out_0 <= mem_o[{!iaddr[LGSIZE-1],1\'b0,braddr}];\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n\t\t\todd_out_1 <= mem_o[{!iaddr[LGSIZE-1],1\'b1,braddr}];\n" |
"\n" |
"\treg\tadrz;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce) adrz <= iaddr[LGSIZE-2];\n" |
"\n" |
"\tassign\to_out_0 = (adrz)?odd_out_0:evn_out_0;\n" |
"\tassign\to_out_1 = (adrz)?odd_out_1:evn_out_1;\n" |
"\n"); |
|
if (formal_property_flag) { |
fprintf(fp, |
"`ifdef\tFORMAL\n" |
"`ifdef BITREVERSE\n" |
"`define\tASSUME assume\n" |
"`define\tASSERT assert\n"); |
if (async_reset) |
fprintf(fp, |
"\n\talways @($global_clock)\n" |
"\t\tassume(i_clk != $past(i_clk));\n\n"); |
|
fprintf(fp, |
"`else\n" |
"`define\tASSUME assert\n" |
"`define\tASSERT assume\n" |
"`endif\n" |
"\n" |
"\treg f_past_valid;\n" |
"\tinitial f_past_valid = 1'b0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tf_past_valid <= 1'b1;\n\n"); |
|
if (async_reset) |
fprintf(fp, |
"\tinitial `ASSUME(!i_areset_n);\n" |
"\talways @($global_clock)\n" |
"\tif (!$rose(i_clk))\n" |
"\t\t`ASSERT(!$rose(i_areset_n));\n\n" |
"\talways @($global_clock)\n" |
"\tif (!$rose(i_clk))\n" |
"\tbegin\n" |
"\t\t`ASSUME($stable(i_ce));\n" |
"\t\t`ASSUME($stable(i_in_0));\n" |
"\t\t`ASSUME($stable(i_in_1));\n" |
"\t\t//\n" |
"\t\tif (i_areset_n)\n" |
"\t\tbegin\n" |
"\t\t\t`ASSERT($stable(o_out_0));\n" |
"\t\t\t`ASSERT($stable(o_out_1));\n" |
"\t\t\t`ASSERT($stable(o_sync));\n" |
"\t\tend\n" |
"\tend\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\tif ((!f_past_valid)||(!i_areset_n))\n" |
"\tbegin\n"); |
else |
fprintf(fp, |
"\tinitial `ASSUME(i_reset);\n" |
"\talways @(posedge i_clk)\n" |
"\tif ((!f_past_valid)||($past(i_reset)))\n" |
"\tbegin\n"); |
|
fprintf(fp, |
"\t\t`ASSERT(iaddr == 0);\n" |
"\t\t`ASSERT(in_reset);\n" |
"\t\t`ASSERT(!o_sync);\n"); |
fprintf(fp, "\tend\n"); |
|
|
fprintf(fp, "`ifdef BITREVERSE\n" |
"\talways @(posedge i_clk)\n" |
"\t\tassume((i_ce)||($past(i_ce))||($past(i_ce,2)));\n" |
"`endif // BITREVERSE\n\n"); |
|
|
fprintf(fp, |
"\t\t(* anyconst *) reg [LGSIZE-1:0] f_const_addr;\n" |
"\t\twire [LGSIZE-3:0] f_reversed_addr;\n" |
"\t\t// reg [LGSIZE:0] f_now;\n" |
"\t\treg f_addr_loaded_0, f_addr_loaded_1;\n" |
"\t\treg [(2*WIDTH-1):0] f_data_0, f_data_1;\n" |
"\t\twire f_writing, f_reading;\n" |
"\n" |
"\t\tgenerate for(k=0; k<LGSIZE-2; k=k+1)\n" |
"\t\t assign f_reversed_addr[k] = f_const_addr[LGSIZE-3-k];\n" |
"\t\tendgenerate\n" |
"\n" |
"\t\tassign f_writing=(f_const_addr[LGSIZE-1]==iaddr[LGSIZE-1]);\n" |
"\t\tassign f_reading=(f_const_addr[LGSIZE-1]!=iaddr[LGSIZE-1]);\n" |
"\t\tinitial f_addr_loaded_0 = 1'b0;\n" |
"\t\tinitial f_addr_loaded_1 = 1'b0;\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_reset)\n" |
"\t\tbegin\n" |
"\t f_addr_loaded_0 <= 1'b0;\n" |
"\t f_addr_loaded_1 <= 1'b0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t if (iaddr == f_const_addr)\n" |
"\t\t begin\n" |
"\t\t f_addr_loaded_0 <= 1'b1;\n" |
"\t\t f_addr_loaded_1 <= 1'b1;\n" |
"\t\t end\n" |
"\n" |
"\t\t if (f_reading)\n" |
"\t\t begin\n" |
"\t\t if ((braddr == f_const_addr[LGSIZE-3:0])\n" |
"\t\t &&(iaddr[LGSIZE-2] == 1'b0))\n" |
"\t\t f_addr_loaded_0 <= 1'b0;\n" |
"\n" |
"\t\t if ((braddr == f_const_addr[LGSIZE-3:0])\n" |
"\t\t &&(iaddr[LGSIZE-2] == 1'b1))\n" |
"\t\t f_addr_loaded_1 <= 1'b0;\n" |
"\t\t end\n" |
"\t\tend\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif ((i_ce)&&(iaddr == f_const_addr))\n" |
"\t\tbegin\n" |
"\t\t f_data_0 <= i_in_0;\n" |
"\t\t f_data_1 <= i_in_1;\n" |
"\t\t `ASSERT(!f_addr_loaded_0);\n" |
"\t\t `ASSERT(!f_addr_loaded_1);\n" |
"\t\tend\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif ((f_past_valid)&&(!$past(i_reset))\n" |
"\t\t &&($past(f_addr_loaded_0))&&(!f_addr_loaded_0))\n" |
"\t\tbegin\n" |
"\t\t assert(!$past(iaddr[LGSIZE-2]));\n" |
"\t\t if (f_const_addr[LGSIZE-2])\n" |
"\t\t assert(o_out_1 == f_data_0);\n" |
"\t\t else\n" |
"\t\t assert(o_out_0 == f_data_0);\n" |
"\t\tend\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif ((f_past_valid)&&(!$past(i_reset))\n" |
"\t\t &&($past(f_addr_loaded_1))&&(!f_addr_loaded_1))\n" |
"\t\tbegin\n" |
"\t\t assert($past(iaddr[LGSIZE-2]));\n" |
"\t\t if (f_const_addr[LGSIZE-2])\n" |
"\t\t assert(o_out_1 == f_data_1);\n" |
"\t\t else\n" |
"\t\t assert(o_out_0 == f_data_1);\n" |
"\t\tend\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\t `ASSERT(o_sync == ((iaddr[LGSIZE-2:0] == 1)&&(!in_reset)));\n" |
"\n" |
"\t\t// Before writing to a section, the loaded flags should be\n" |
"\t\t// zero\n" |
"\t\talways @(*)\n" |
"\t\tif (f_writing)\n" |
"\t\tbegin\n" |
"\t\t `ASSERT(f_addr_loaded_0 == (iaddr[LGSIZE-2:0]\n" |
"\t\t > f_const_addr[LGSIZE-2:0]));\n" |
"\t\t `ASSERT(f_addr_loaded_1 == (iaddr[LGSIZE-2:0]\n" |
"\t\t > f_const_addr[LGSIZE-2:0]));\n" |
"\t\tend\n" |
"\n" |
"\t\t// If we were writing, and now we are reading, then both\n" |
"\t\t// f_addr_loaded flags must be set\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif ((f_past_valid)&&(!$past(i_reset))\n" |
"\t\t &&($past(f_writing))&&(f_reading))\n" |
"\t\tbegin\n" |
"\t\t `ASSERT(f_addr_loaded_0);\n" |
"\t\t `ASSERT(f_addr_loaded_1);\n" |
"\t\tend\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\tif (f_writing)\n" |
"\t\t `ASSERT(f_addr_loaded_0 == f_addr_loaded_1);\n" |
"\n" |
"\t\t// When reading, and the loaded flag is zero, our pointer\n" |
"\t\t// must not have hit the address of interest yet\n" |
"\t\talways @(*)\n" |
"\t\tif ((!in_reset)&&(f_reading))\n" |
"\t\t `ASSERT(f_addr_loaded_0 ==\n" |
"\t\t ((!iaddr[LGSIZE-2])&&(iaddr[LGSIZE-3:0]\n" |
"\t\t <= f_reversed_addr[LGSIZE-3:0])));\n" |
"\t\talways @(*)\n" |
"\t\tif ((!in_reset)&&(f_reading))\n" |
"\t\t `ASSERT(f_addr_loaded_1 ==\n" |
"\t\t ((!iaddr[LGSIZE-2])||(iaddr[LGSIZE-3:0]\n" |
"\t\t <= f_reversed_addr[LGSIZE-3:0])));\n" |
"\t\talways @(*)\n" |
"\t\tif ((in_reset)&&(f_reading))\n" |
"\t\tbegin\n" |
"\t\t `ASSERT(!f_addr_loaded_0);\n" |
"\t\t `ASSERT(!f_addr_loaded_1);\n" |
"\t\tend\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\tif(iaddr[LGSIZE-1])\n" |
"\t\t `ASSERT(!in_reset);\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\tif (f_addr_loaded_0)\n" |
"\t\t `ASSERT(mem_e[f_const_addr] == f_data_0);\n" |
"\t\talways @(*)\n" |
"\t\tif (f_addr_loaded_1)\n" |
"\t\t `ASSERT(mem_o[f_const_addr] == f_data_1);\n" |
"\n\n"); |
|
|
fprintf(fp, |
"`endif\t// FORMAL\n"); |
} |
|
fprintf(fp, |
"endmodule\n"); |
|
fclose(fp); |
free(modulename); |
} |
/trunk/sw/bitreverse.h
0,0 → 1,44
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: bitreverse.h |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#ifndef BITREVERSE_H |
#define BITREVERSE_H |
|
extern void build_snglbrev(const char *fname, const bool async_reset = false); |
extern void build_dblreverse(const char *fname, const bool async_reset = false); |
|
#endif // BITREVERSE_H |
/trunk/sw/bldstage.cpp
0,0 → 1,549
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: bldstage.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen |
#include <stdio.h> |
#include <stdlib.h> |
|
#ifdef _MSC_VER // added for ms vs compatibility |
|
#include <io.h> |
#include <direct.h> |
#define _USE_MATH_DEFINES |
|
#else |
// And for G++/Linux environment |
|
#include <unistd.h> // Defines the R_OK/W_OK/etc. macros |
#endif |
|
#include <string.h> |
#include <string> |
#include <math.h> |
#include <ctype.h> |
#include <assert.h> |
|
#include "defaults.h" |
#include "legal.h" |
#include "fftlib.h" |
#include "rounding.h" |
#include "bldstage.h" |
|
void build_dblstage(const char *fname, ROUND_T rounding, |
const bool async_reset, const bool dbg) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
const char *rnd_string; |
if (rounding == RND_TRUNCATE) |
rnd_string = "truncate"; |
else if (rounding == RND_FROMZERO) |
rnd_string = "roundfromzero"; |
else if (rounding == RND_HALFUP) |
rnd_string = "roundhalfup"; |
else |
rnd_string = "convround"; |
|
std::string resetw("i_reset"); |
if (async_reset) |
resetw = std::string("i_areset_n"); |
|
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\tlaststage%s.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose:\tThis is part of an FPGA implementation that will process\n" |
"// the final stage of a decimate-in-frequency FFT, running\n" |
"// through the data at two samples per clock. If you notice from the\n" |
"// derivation of an FFT, the only time both even and odd samples are\n" |
"// used at the same time is in this stage. Therefore, other than this\n" |
"// stage and these twiddles, all of the other stages can run two stages\n" |
"// at a time at one sample per clock.\n" |
"//\n" |
"// Operation:\n" |
"// Given a stream of values, operate upon them as though they were\n" |
"// value pairs, x[2n] and x[2n+1]. The stream begins when n=0, and ends\n" |
"// when n=1. When the first x[0] value enters, the synchronization\n" |
"// input, i_sync, must be true as well.\n" |
"//\n" |
"// For this stream, produce outputs\n" |
"// y[2n ] = x[2n] + x[2n+1], and\n" |
"// y[2n+1] = x[2n] - x[2n+1]\n" |
"//\n" |
"// When y[0] is output, a synchronization bit o_sync will be true as\n" |
"// well, otherwise it will be zero.\n" |
"//\n" |
"//\n" |
"// In this implementation, the output is valid one clock after the input\n" |
"// is valid. The output also accumulates one bit above and beyond the\n" |
"// number of bits in the input.\n" |
"//\n" |
"// i_clk A system clock\n", (dbg)?"_dbg":"", prjname); |
if (async_reset) |
fprintf(fp, |
"// i_areset_n An active low asynchronous reset\n"); |
else |
fprintf(fp, |
"// i_reset A synchronous reset\n"); |
|
fprintf(fp, |
"// i_ce Circuit enable--nothing happens unless this line is high\n" |
"// i_sync A synchronization signal, high once per FFT at the start\n" |
"// i_left The first (even) complex sample input. The higher order\n" |
"// bits contain the real portion, low order bits the\n" |
"// imaginary portion, all in two\'s complement.\n" |
"// i_right The next (odd) complex sample input, same format as\n" |
"// i_left.\n" |
"// o_left The first (even) complex output.\n" |
"// o_right The next (odd) complex output.\n" |
"// o_sync Output synchronization signal.\n" |
"//\n%s" |
"//\n", creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module\tlaststage%s(i_clk, %s, i_ce, i_sync, i_left, i_right, o_left, o_right, o_sync%s);\n" |
"\tparameter\tIWIDTH=%d,OWIDTH=IWIDTH+1, SHIFT=%d;\n" |
"\tinput\t\ti_clk, %s, i_ce, i_sync;\n" |
"\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n" |
"\toutput\treg\t[(2*OWIDTH-1):0]\to_left, o_right;\n" |
"\toutput\treg\t\t\to_sync;\n" |
"\n", (dbg)?"_dbg":"", resetw.c_str(), (dbg)?", o_dbg":"", |
TST_DBLSTAGE_IWIDTH, TST_DBLSTAGE_SHIFT, |
resetw.c_str()); |
|
if (dbg) { fprintf(fp, "\toutput\twire\t[33:0]\t\t\to_dbg;\n" |
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_left[(2*OWIDTH-1):(2*OWIDTH-16)],\n" |
"\t\t\t\t\to_left[(OWIDTH-1):(OWIDTH-16)] };\n" |
"\n"); |
} |
fprintf(fp, |
"\twire\tsigned\t[(IWIDTH-1):0]\ti_in_0r, i_in_0i, i_in_1r, i_in_1i;\n" |
"\tassign\ti_in_0r = i_left[(2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\ti_in_0i = i_left[(IWIDTH-1):0];\n" |
"\tassign\ti_in_1r = i_right[(2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\ti_in_1i = i_right[(IWIDTH-1):0];\n" |
"\twire\t[(OWIDTH-1):0]\t\to_out_0r, o_out_0i,\n" |
"\t\t\t\t\to_out_1r, o_out_1i;\n" |
"\n" |
"\n" |
"\t// Handle a potential rounding situation, when IWIDTH>=OWIDTH.\n" |
"\n" |
"\n"); |
fprintf(fp, |
"\n" |
"\t// As with any register connected to the sync pulse, these must\n" |
"\t// have initial values and be reset on the %s signal.\n" |
"\t// Other data values need only restrict their updates to i_ce\n" |
"\t// enabled clocks, but sync\'s must obey resets and initial\n" |
"\t// conditions as well.\n" |
"\treg\trnd_sync, r_sync;\n" |
"\n" |
"\tinitial\trnd_sync = 1\'b0; // Sync into rounding\n" |
"\tinitial\tr_sync = 1\'b0; // Sync coming out\n", |
resetw.c_str()); |
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
"\t\tbegin\n" |
"\t\t\trnd_sync <= 1\'b0;\n" |
"\t\t\tr_sync <= 1\'b0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\trnd_sync <= i_sync;\n" |
"\t\t\tr_sync <= rnd_sync;\n" |
"\t\tend\n" |
"\n" |
"\t// As with other variables, these are really only updated when in\n" |
"\t// the processing pipeline, after the first i_sync. However, to\n" |
"\t// eliminate as much unnecessary logic as possible, we toggle\n" |
"\t// these any time the i_ce line is enabled, and don\'t reset\n" |
"\t// them on %s.\n", resetw.c_str()); |
fprintf(fp, |
"\t// Don't forget that we accumulate a bit by adding two values\n" |
"\t// together. Therefore our intermediate value must have one more\n" |
"\t// bit than the two originals.\n" |
"\treg\tsigned\t[(IWIDTH):0]\trnd_in_0r, rnd_in_0i;\n" |
"\treg\tsigned\t[(IWIDTH):0]\trnd_in_1r, rnd_in_1i;\n\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t//\n" |
"\t\t\trnd_in_0r <= i_in_0r + i_in_1r;\n" |
"\t\t\trnd_in_0i <= i_in_0i + i_in_1i;\n" |
"\t\t\t//\n" |
"\t\t\trnd_in_1r <= i_in_0r - i_in_1r;\n" |
"\t\t\trnd_in_1i <= i_in_0i - i_in_1i;\n" |
"\t\t\t//\n" |
"\t\tend\n" |
"\n"); |
fprintf(fp, |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0r(i_clk, i_ce,\n" |
"\t\t\t\t\t\t\trnd_in_0r, o_out_0r);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0i(i_clk, i_ce,\n" |
"\t\t\t\t\t\t\trnd_in_0i, o_out_0i);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1r(i_clk, i_ce,\n" |
"\t\t\t\t\t\t\trnd_in_1r, o_out_1r);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1i(i_clk, i_ce,\n" |
"\t\t\t\t\t\t\trnd_in_1i, o_out_1i);\n\n", rnd_string); |
|
fprintf(fp, "\n" |
"\t// Prior versions of this routine did not include the extra\n" |
"\t// clock and register/flip-flops that this routine requires.\n" |
"\t// These are placed in here to correct a bug in Verilator, that\n" |
"\t// otherwise struggles. (Hopefully this will fix the problem ...)\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\to_left <= { o_out_0r, o_out_0i };\n" |
"\t\t\to_right <= { o_out_1r, o_out_1i };\n" |
"\t\tend\n" |
"\n" |
"\tinitial\to_sync = 1\'b0; // Final sync coming out of module\n"); |
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
"\t\t\to_sync <= 1\'b0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\to_sync <= r_sync;\n" |
"\n" |
"endmodule\n"); |
fclose(fp); |
} |
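The `laststage` module written by `build_dblstage()` above adds and subtracts its two inputs into `IWIDTH+1`-bit registers, then passes each result through the rounding module named by `rnd_string` (`truncate`, `roundfromzero`, `roundhalfup`, or `convround`). The C++ below is a minimal behavioral sketch of that datapath, assuming the `convround` option means round-half-to-even when low bits are dropped. `convround_model()`, `Sample`, and `last_stage()` are hypothetical names; the sketch ignores the `SHIFT` parameter, the `i_ce` pipeline, and the sync logic, and it is not a transcription of the Verilog rounding module.

```cpp
#include <cassert>
#include <cstdint>

// Round-half-to-even when discarding the `drop` low bits of a two's
// complement value. Relies on >> being an arithmetic shift for negative
// values (guaranteed since C++20, and the behavior of mainstream compilers).
int64_t convround_model(int64_t v, int drop) {
    assert(drop >= 0 && drop < 63);
    if (drop == 0)
        return v;                                          // nothing to discard
    const int64_t half = int64_t(1) << (drop - 1);
    const int64_t frac = v & ((int64_t(1) << drop) - 1);  // discarded bits
    int64_t q = v >> drop;                                 // floor(v / 2^drop)
    if (frac > half || (frac == half && (q & 1)))
        ++q;                                               // ties go to the even neighbor
    return q;
}

struct Sample { int64_t re, im; };

// Hypothetical model of laststage: a two-point butterfly with one bit of
// growth (IWIDTH+1), then rounded down to OWIDTH bits if that is narrower.
void last_stage(Sample left, Sample right, int iwidth, int owidth,
                Sample &o_left, Sample &o_right) {
    const int grown = iwidth + 1;
    const int drop = (grown > owidth) ? grown - owidth : 0;
    o_left  = {convround_model(left.re + right.re, drop),
               convround_model(left.im + right.im, drop)};
    o_right = {convround_model(left.re - right.re, drop),
               convround_model(left.im - right.im, drop)};
}
```

When `OWIDTH` is `IWIDTH+1` (the module's default parameter), no bits are dropped and the sum and difference pass through unchanged; rounding only matters when the caller narrows `OWIDTH`.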
|
void build_stage(const char *fname, |
int stage, int nwide, int offset, |
int nbits, int xtra, int ckpce, |
const bool async_reset, const bool dbg) { |
FILE *fstage = fopen(fname, "w"); |
int cbits = nbits + xtra; |
|
std::string resetw("i_reset"); |
if (async_reset) |
resetw = std::string("i_areset_n"); |
|
if (((unsigned)cbits * 2u) >= sizeof(long long)*8) { |
fprintf(stderr, "ERROR: CMEM Coefficient precision requested overflows long long data type.\n"); |
exit(-1); |
} |
|
if (fstage == NULL) { |
fprintf(stderr, "ERROR: Could not open %s for writing!\n", fname); |
perror("O/S Err was:"); |
fprintf(stderr, "Attempting to continue, but this file will be missing.\n"); |
return; |
} |
|
fprintf(fstage, |
SLASHLINE |
"//\n" |
"// Filename:\tfftstage%s.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose:\tThis file is (almost) a Verilog source file. It is meant to\n" |
"// be used by an FFT core compiler to generate FFTs which may be\n" |
"// used as part of an FFT core. Specifically, this file encapsulates\n" |
"// the options of an FFT-stage. For any 2^N length FFT, there shall be\n" |
"// (N-1) of these stages.\n" |
"//\n" |
"//\n" |
"// Operation:\n" |
"// Given a stream of values, operate upon them as though they were\n" |
"// value pairs, x[n] and x[n+N/2]. The stream begins when n=0, and ends\n" |
"// when n=N/2-1 (i.e. there's a full set of N values). When the value\n" |
"// x[0] enters, the synchronization input, i_sync, must be true as well.\n" |
"//\n" |
"// For this stream, produce outputs\n" |
"// y[n ] = x[n] + x[n+N/2], and\n" |
"// y[n+N/2] = (x[n] - x[n+N/2]) * c[n],\n" |
"// where c[n] is a complex coefficient found in the\n" |
"// external memory file COEFFILE.\n" |
"// When y[0] is output, a synchronization bit o_sync will be true as\n" |
"// well, otherwise it will be zero.\n" |
"//\n" |
"// Most of the work to do this is done within the butterfly, whether the\n" |
"// hardware accelerated butterfly (uses a DSP) or not.\n" |
"//\n%s" |
"//\n", |
(dbg)?"_dbg":"", prjname, creator); |
fprintf(fstage, "%s", cpyleft); |
fprintf(fstage, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fstage, "module\tfftstage%s(i_clk, %s, i_ce, i_sync, i_data, o_data, o_sync%s);\n", |
(dbg)?"_dbg":"", resetw.c_str(), |
(dbg)?", o_dbg":""); |
// These parameter values are useless at this point--they are to be |
// replaced by the parameter values in the calling program. Only |
// problem is, the CWIDTH needs to match exactly! |
fprintf(fstage, "\tparameter\tIWIDTH=%d,CWIDTH=%d,OWIDTH=%d;\n", |
nbits, 20, nbits+1); // 20, not cbits, since the tb depends upon it |
fprintf(fstage, |
"\t// Parameters specific to the core that should be changed when this\n" |
"\t// core is built ... Note that the minimum LGSPAN (the base two log\n" |
"\t// of the span, or the base two log of the current FFT size) is 3.\n" |
"\t// Smaller spans (i.e. the span of 2) must use the dbl laststage module.\n" |
"\tparameter\tLGWIDTH=%d, LGSPAN=%d, BFLYSHIFT=0;\n" |
"\tparameter\t[0:0] OPT_HWMPY = 1;\n", |
lgval(stage), (nwide <= 1) ? lgval(stage)-1 : lgval(stage)-2); |
fprintf(fstage, |
"\t// Clocks per CE. If your incoming data rate is less than 50%% of your\n" |
"\t// clock speed, you can set CKPCE to 2\'b10, make sure there's at least\n" |
"\t// one clock between cycles when i_ce is high, and then use two\n" |
"\t// multiplies instead of three. If you set CKPCE to 2\'b11 and insist\n" |
"\t// on at least two clocks with i_ce low between cycles with i_ce high,\n" |
"\t// then the hardware optimized butterfly code will use one multiply\n" |
"\t// instead of two.\n" |
"\tparameter\t CKPCE = %d;\n", ckpce); |
|
fprintf(fstage, |
"\t// The COEFFILE parameter contains the name of the file containing the\n" |
"\t// FFT twiddle factors\n"); |
if (nwide == 2) { |
fprintf(fstage, "\tparameter\tCOEFFILE=\"cmem_%c%d.hex\";\n", |
(offset)?'o':'e', stage*2); |
} else |
fprintf(fstage, "\tparameter\tCOEFFILE=\"cmem_%d.hex\";\n", |
stage); |
|
fprintf(fstage,"\n" |
"`ifdef VERILATOR\n" |
"\tparameter [0:0] ZERO_ON_IDLE = 1'b0;\n" |
"`else\n" |
"\tlocalparam [0:0] ZERO_ON_IDLE = 1'b0;\n" |
"`endif // VERILATOR\n\n"); |
|
fprintf(fstage, |
"\tinput i_clk, %s, i_ce, i_sync;\n" |
"\tinput [(2*IWIDTH-1):0] i_data;\n" |
"\toutput reg [(2*OWIDTH-1):0] o_data;\n" |
"\toutput reg o_sync;\n" |
"\n", resetw.c_str()); |
if (dbg) { fprintf(fstage, "\toutput\twire\t[33:0]\t\t\to_dbg;\n" |
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_data[(2*OWIDTH-1):(2*OWIDTH-16)],\n" |
"\t\t\t\t\to_data[(OWIDTH-1):(OWIDTH-16)] };\n" |
"\n"); |
} |
fprintf(fstage, |
"\treg wait_for_sync;\n" |
"\treg [(2*IWIDTH-1):0] ib_a, ib_b;\n" |
"\treg [(2*CWIDTH-1):0] ib_c;\n" |
"\treg ib_sync;\n" |
"\n" |
"\treg b_started;\n" |
"\twire ob_sync;\n" |
"\twire [(2*OWIDTH-1):0]\tob_a, ob_b;\n"); |
fprintf(fstage, |
"\n" |
"\t// cmem is defined as an array of real and complex values,\n" |
"\t// where the top CWIDTH bits are the real value and the bottom\n" |
"\t// CWIDTH bits are the imaginary value.\n" |
"\t//\n" |
"\t// cmem[i] = { (2^(CWIDTH-2)) * cos(2*pi*i/(2^LGWIDTH)),\n" |
"\t// (2^(CWIDTH-2)) * sin(2*pi*i/(2^LGWIDTH)) };\n" |
"\t//\n" |
"\treg [(2*CWIDTH-1):0] cmem [0:((1<<LGSPAN)-1)];\n" |
"\tinitial\t$readmemh(COEFFILE,cmem);\n\n"); |
|
// gen_coeff_file(coredir, fname, stage, cbits, nwide, offset, inv); |
|
fprintf(fstage, |
"\treg [(LGSPAN):0] iaddr;\n" |
"\treg [(2*IWIDTH-1):0] imem [0:((1<<LGSPAN)-1)];\n" |
"\n" |
"\treg [LGSPAN:0] oB;\n" |
"\treg [(2*OWIDTH-1):0] omem [0:((1<<LGSPAN)-1)];\n" |
"\n" |
"\tinitial wait_for_sync = 1\'b1;\n" |
"\tinitial iaddr = 0;\n"); |
if (async_reset) |
fprintf(fstage, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fstage, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
|
fprintf(fstage, |
"\tbegin\n" |
"\t\t\twait_for_sync <= 1\'b1;\n" |
"\t\t\tiaddr <= 0;\n" |
"\tend else if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n" |
"\tbegin\n" |
"\t\t//\n" |
"\t\t// First step: Record what we\'re not ready to use yet\n" |
"\t\t//\n" |
"\t\tiaddr <= iaddr + { {(LGSPAN){1\'b0}}, 1\'b1 };\n" |
"\t\twait_for_sync <= 1\'b0;\n" |
"\tend\n" |
"\talways @(posedge i_clk) // Need to make certain here that we don\'t read\n" |
"\tif ((i_ce)&&(!iaddr[LGSPAN])) // and write the same address on\n" |
"\t\timem[iaddr[(LGSPAN-1):0]] <= i_data; // the same clk\n" |
"\n"); |
|
fprintf(fstage, |
"\t//\n" |
"\t// Now, we have all the inputs, so let\'s feed the butterfly\n" |
"\t//\n" |
"\tinitial ib_sync = 1\'b0;\n"); |
if (async_reset) |
fprintf(fstage, "\talways @(posedge i_clk, negedge i_areset_n)\n\tif (!i_areset_n)\n"); |
else |
fprintf(fstage, "\talways @(posedge i_clk)\n\tif (i_reset)\n"); |
fprintf(fstage, |
"\t\tib_sync <= 1\'b0;\n" |
"\telse if (i_ce)\n" |
"\tbegin\n" |
"\t\t// Set the sync to true on the very first\n" |
"\t\t// valid input in, and hence on the very\n" |
"\t\t// first valid data out per FFT.\n" |
"\t\tib_sync <= (iaddr==(1<<(LGSPAN)));\n" |
"\tend\n\n" |
"\talways\t@(posedge i_clk)\n" |
"\tif (i_ce)\n" |
"\tbegin\n" |
"\t\t// One input from memory, ...\n" |
"\t\tib_a <= imem[iaddr[(LGSPAN-1):0]];\n" |
"\t\t// One input clocked in from the top\n" |
"\t\tib_b <= i_data;\n" |
"\t\t// and the coefficient or twiddle factor\n" |
"\t\tib_c <= cmem[iaddr[(LGSPAN-1):0]];\n" |
"\tend\n\n"); |
|
fprintf(fstage, |
"\t// The idle register is designed to keep track of when an input\n" |
"\t// to the butterfly is important and going to be used. It's used\n" |
"\t// in a flag following, so that when useful values are placed\n" |
"\t// into the butterfly they'll be non-zero (idle=0), otherwise when\n" |
"\t// the inputs to the butterfly are irrelevant and will be ignored,\n" |
"\t// then (idle=1) those inputs will be set to zero. This\n" |
"\t// functionality is not designed to be used in operation, but only\n" |
"\t// within a Verilator simulation context when chasing a bug.\n" |
"\t// In this limited environment, the non-zero answers will stand\n" |
"\t// in a trace making it easier to highlight a bug.\n" |
"\treg idle;\n" |
"\tgenerate if (ZERO_ON_IDLE)\n" |
"\tbegin\n" |
"\t\tinitial idle = 1;\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_reset)\n" |
"\t\t\tidle <= 1\'b1;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\tidle <= (!iaddr[LGSPAN])&&(!wait_for_sync);\n\n" |
"\tend else begin\n\n" |
"\t\talways @(*) idle = 0;\n\n" |
"\tend endgenerate\n\n"); |
|
fprintf(fstage, |
"\tgenerate if (OPT_HWMPY)\n" |
"\tbegin : HWBFLY\n" |
"\t\thwbfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),\n" |
"\t\t\t\t.CKPCE(CKPCE), .SHIFT(BFLYSHIFT))\n" |
"\t\t\tbfly(i_clk, %s, i_ce, (idle)?0:ib_c,\n" |
"\t\t\t\t(idle || (!i_ce)) ? 0:ib_a,\n" |
"\t\t\t\t(idle || (!i_ce)) ? 0:ib_b,\n" |
"\t\t\t\t(ib_sync)&&(i_ce),\n" |
"\t\t\t\tob_a, ob_b, ob_sync);\n" |
"\tend else begin : FWBFLY\n" |
"\t\tbutterfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),\n" |
"\t\t\t\t.CKPCE(CKPCE),.SHIFT(BFLYSHIFT))\n" |
"\t\t\tbfly(i_clk, %s, i_ce,\n" |
"\t\t\t\t\t(idle||(!i_ce))?0:ib_c,\n" |
"\t\t\t\t\t(idle||(!i_ce))?0:ib_a,\n" |
"\t\t\t\t\t(idle||(!i_ce))?0:ib_b,\n" |
"\t\t\t\t\t(ib_sync&&i_ce),\n" |
"\t\t\t\t\tob_a, ob_b, ob_sync);\n" |
"\tend endgenerate\n\n", |
resetw.c_str(), resetw.c_str()); |
|
fprintf(fstage, |
"\t//\n" |
"\t// Next step: recover the outputs from the butterfly\n" |
"\t//\n" |
"\tinitial oB = 0;\n" |
"\tinitial o_sync = 0;\n" |
"\tinitial b_started = 0;\n"); |
if (async_reset) |
fprintf(fstage, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fstage, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fstage, |
"\tbegin\n" |
"\t\toB <= 0;\n" |
"\t\to_sync <= 0;\n" |
"\t\tb_started <= 0;\n" |
"\tend else if (i_ce)\n" |
"\tbegin\n" |
"\t\to_sync <= (!oB[LGSPAN])?ob_sync : 1\'b0;\n" |
"\t\tif (ob_sync||b_started)\n" |
"\t\t\toB <= oB + { {(LGSPAN){1\'b0}}, 1\'b1 };\n" |
"\t\tif ((ob_sync)&&(!oB[LGSPAN]))\n" |
"\t\t// A butterfly output is available\n" |
"\t\t\tb_started <= 1\'b1;\n" |
"\tend\n\n"); |
fprintf(fstage, |
"\treg [(LGSPAN-1):0]\t\tdly_addr;\n" |
"\treg [(2*OWIDTH-1):0]\tdly_value;\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_ce)\n" |
"\tbegin\n" |
"\t\tdly_addr <= oB[(LGSPAN-1):0];\n" |
"\t\tdly_value <= ob_b;\n" |
"\tend\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_ce)\n" |
"\t\tomem[dly_addr] <= dly_value;\n" |
"\n"); |
fprintf(fstage, |
"\talways @(posedge i_clk)\n" |
"\tif (i_ce)\n" |
"\t\to_data <= (!oB[LGSPAN])?ob_a : omem[oB[(LGSPAN-1):0]];\n" |
"\n"); |
fprintf(fstage, "endmodule\n"); |
} |
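The Purpose comment in the generated `fftstage` describes each stage as `y[n] = x[n] + x[n+N/2]` and `y[n+N/2] = (x[n] - x[n+N/2]) * c[n]`. The sketch below models only that arithmetic, in floating point, on a whole frame held in memory; the hardware instead streams samples, parks the first half in `imem`, and reorders outputs through `omem`. The twiddle sign is an assumption: the sketch uses `c[n] = exp(-j*2*pi*n/N)` for the forward transform and its conjugate for the inverse, whereas the sign actually written into `cmem_*.hex` is set by the coefficient generator, which this file does not show. `dif_stage()` and `dif_fft()` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;

// One decimation-in-frequency stage applied in place to x[base .. base+span):
//   y[n]        = x[n] + x[n+span/2]
//   y[n+span/2] = (x[n] - x[n+span/2]) * c[n],  c[n] = exp(s*j*2*pi*n/span)
// with s = -1 (forward) or +1 (inverse). The sign convention is assumed.
void dif_stage(std::vector<cpx> &x, std::size_t base, std::size_t span,
               bool inverse = false) {
    const double pi = std::acos(-1.0);
    const double s = inverse ? 1.0 : -1.0;
    const std::size_t half = span / 2;
    for (std::size_t n = 0; n < half; ++n) {
        const cpx a = x[base + n], b = x[base + n + half];
        const cpx c = std::polar(1.0, s * 2.0 * pi * double(n) / double(span));
        x[base + n] = a + b;
        x[base + n + half] = (a - b) * c;
    }
}

// Chain the stages (span N, N/2, ..., 2), then undo the bit-reversed
// output order. N must be a power of two; no 1/N scaling is applied.
std::vector<cpx> dif_fft(std::vector<cpx> x, bool inverse = false) {
    const std::size_t N = x.size();
    assert((N & (N - 1)) == 0);
    for (std::size_t span = N; span >= 2; span /= 2)
        for (std::size_t base = 0; base < N; base += span)
            dif_stage(x, base, span, inverse);
    unsigned lg = 0;
    while ((std::size_t(1) << lg) < N)
        ++lg;
    std::vector<cpx> out(N);
    for (std::size_t i = 0; i < N; ++i) {
        std::size_t r = 0;                     // r = bit-reverse of i over lg bits
        for (unsigned bit = 0; bit < lg; ++bit)
            if (i & (std::size_t(1) << bit))
                r |= std::size_t(1) << (lg - 1 - bit);
        out[r] = x[i];
    }
    return out;
}
```

At span 2 the only twiddle is `c[0] = 1`, so the final pass reduces to the plain sum and difference, which is the job the stage comment above assigns to the separate `laststage` module.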
/trunk/sw/bldstage.h
0,0 → 1,52
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: bldstage.h |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#ifndef BLDSTAGE_H |
#define BLDSTAGE_H |
|
#include "rounding.h" |
|
extern void build_dblstage(const char *fname, ROUND_T rounding, |
const bool async_reset = false, const bool dbg = false); |
|
extern void build_stage(const char *fname, |
int stage, int nwide, int offset, |
int nbits, int xtra, int ckpce, |
const bool async_reset = false, |
const bool dbg=false); |
|
#endif // BLDSTAGE_H |
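The butterfly generated by `butterfly.cpp`, listed next, replaces the four real multiplies of a complex product with three, using P1 = ac, P2 = bd, and P3 = (a+b)(c+d). The generated comments call this Karatsuba multiplication; the same complex-product identity is also often credited to Gauss. The C++ sketch below checks the identity exhaustively for small signed operands and illustrates the WIDTHS note: a product of an X-bit and a Y-bit two's complement number needs all X+Y bits only when both operands sit at their most negative values. `cmul3()`, `cmul4()`, and `fits_signed()` are hypothetical names, not functions from this project.

```cpp
#include <cassert>
#include <cstdint>

struct CProd { int64_t re, im; };

// Reference complex product with four multiplies:
//   (a + jb)(c + jd) = (ac - bd) + j(ad + bc)
CProd cmul4(int64_t a, int64_t b, int64_t c, int64_t d) {
    return {a * c - b * d, a * d + b * c};
}

// Three-multiply form described in the butterfly comments:
//   P1 = ac, P2 = bd, P3 = (a + b)(c + d)
//   real = P1 - P2, imag = P3 - P2 - P1
CProd cmul3(int64_t a, int64_t b, int64_t c, int64_t d) {
    const int64_t p1 = a * c;
    const int64_t p2 = b * d;
    const int64_t p3 = (a + b) * (c + d);
    return {p1 - p2, p3 - p2 - p1};
}

// True if v is representable as a `bits`-wide two's complement number.
bool fits_signed(int64_t v, int bits) {
    assert(bits >= 1 && bits <= 63);
    const int64_t lo = -(int64_t(1) << (bits - 1));
    const int64_t hi = (int64_t(1) << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}
```

In the generated Verilog the `r_dif` terms are `IWIDTH+1` bits wide, so `a+b` grows to `IWIDTH+2` bits and `c+d` to `CWIDTH+1` bits, which is why every `longbimpy` instance is declared `#(CWIDTH+1,IWIDTH+2)` and produces `(IWIDTH+2)+(CWIDTH+1)`-bit products.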
/trunk/sw/butterfly.cpp
0,0 → 1,1800
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: butterfly.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen |
#include <stdio.h> |
#include <stdlib.h> |
|
#ifdef _MSC_VER // added for ms vs compatibility |
|
#include <io.h> |
#include <direct.h> |
#define _USE_MATH_DEFINES |
#define R_OK 4 /* Test for read permission. */ |
#define W_OK 2 /* Test for write permission. */ |
#define X_OK 0 /* !!!!!! execute permission - unsupported in windows*/ |
#define F_OK 0 /* Test for existence. */ |
|
#if _MSC_VER <= 1700 |
|
int lstat(const char *filename, struct stat *buf) { return 1; }; |
#define S_ISDIR(A) 0 |
|
#else |
|
#define lstat _stat |
#define S_ISDIR(m) (((m) & _S_IFMT) == _S_IFDIR) |
|
#endif |
|
#define mkdir(A,B) _mkdir(A) |
|
#define access _access |
|
#else |
// And for G++/Linux environment |
|
#include <unistd.h> // Defines the R_OK/W_OK/etc. macros |
#include <sys/stat.h> |
#endif |
|
#include <string.h> |
#include <string> |
#include <math.h> |
#include <ctype.h> |
#include <assert.h> |
|
#include "defaults.h" |
#include "legal.h" |
#include "rounding.h" |
#include "fftlib.h" |
#include "bldstage.h" |
#include "bitreverse.h" |
#include "softmpy.h" |
#include "butterfly.h" |
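`build_butterfly()` below writes a chain of derived localparams (`MXMPYBITS`, `MPYDELAY`, `LCLDELAY`, `LGDELAY`, `AUXLEN`, `MPYREMAINDER`) into `butterfly.v`. To see what a given configuration produces without elaborating the Verilog, the same integer expressions can be re-evaluated in C++. The sketch below is a hypothetical mirror of those expressions (Verilog integer division truncates like C++ for these positive operands); `BflyDelays` and `bfly_delays()` are illustrative names, not code from the generator.

```cpp
#include <algorithm>
#include <cassert>

struct BflyDelays {
    int mxmpybits, mpydelay, lcldelay, lgdelay, auxlen, mpyremainder;
};

// Re-evaluates the localparam expressions emitted for butterfly.v.
BflyDelays bfly_delays(int iwidth, int cwidth, int ckpce) {
    assert(ckpce >= 1);                 // build_butterfly() raises smaller values to 1
    BflyDelays d{};
    d.mxmpybits = std::min(iwidth + 2, cwidth + 1);   // narrower multiply input
    d.mpydelay = (d.mxmpybits + 1) / 2 + 2;           // clocks inside the multiply
    d.lcldelay = (ckpce == 1) ? d.mpydelay            // delay measured in i_ce's
               : (ckpce == 2) ? d.mpydelay / 2 + 2
               :                d.mpydelay / 3 + 2;
    d.lgdelay = (d.mpydelay > 64) ? 7 : (d.mpydelay > 32) ? 6
              : (d.mpydelay > 16) ? 5 : (d.mpydelay > 8)  ? 4
              : (d.mpydelay > 4)  ? 3 : 2;            // log2 of the fifo_left depth
    d.auxlen = d.lcldelay + 3;
    d.mpyremainder = d.mpydelay - ckpce * (d.mpydelay / ckpce);
    return d;
}
```

For example, with a hypothetical `IWIDTH` of 16 and the `CWIDTH` of 20 that `fftstage` defaults to, `MPYDELAY` is 11 clocks; raising `CKPCE` to 2 shrinks `LCLDELAY` from 11 to 7 `i_ce` periods.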
|
void build_butterfly(const char *fname, int xtracbits, ROUND_T rounding, |
int ckpce, const bool async_reset) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
const char *rnd_string; |
if (rounding == RND_TRUNCATE) |
rnd_string = "truncate"; |
else if (rounding == RND_FROMZERO) |
rnd_string = "roundfromzero"; |
else if (rounding == RND_HALFUP) |
rnd_string = "roundhalfup"; |
else |
rnd_string = "convround"; |
|
//if (ckpce >= 3) |
//ckpce = 3; |
if (ckpce <= 1) |
ckpce = 1; |
|
std::string resetw("i_reset"); |
if (async_reset) |
resetw = std::string("i_areset_n"); |
|
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\tbutterfly.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose:\tThis routine calculates a butterfly for a decimation\n" |
"// in frequency version of an FFT. Specifically, given\n" |
"// complex Left and Right values together with a coefficient, the output\n" |
"// of this routine is given by:\n" |
"//\n" |
"// L' = L + R\n" |
"// R' = (L - R)*C\n" |
"//\n" |
"// The rest of the junk below handles timing (mostly), to make certain\n" |
"// that L' and R' reach the output at the same clock. Further, just to\n" |
"// make certain that is the case, an 'aux' input exists. This aux value\n" |
"// will come out of this routine synchronized to the values it came in\n" |
"// with. (i.e., both L', R', and aux all have the same delay.) Hence,\n" |
"// a caller of this routine may set aux on the first input with valid\n" |
"// data, and then wait to see aux set on the output to know when to find\n" |
"// the first output with valid data.\n" |
"//\n" |
"// All bits are preserved until the very last clock, where any more bits\n" |
"// than OWIDTH will be quietly discarded.\n" |
"//\n" |
"// This design features no overflow checking.\n" |
"//\n" |
"// Notes:\n" |
"// CORDIC:\n" |
"// Much as we might like, we can't use a cordic here.\n" |
"// The goal is to accomplish an FFT, as defined, and a\n" |
"// CORDIC places a scale factor onto the data. Removing\n" |
"// the scale factor would cost two multiplies, which\n" |
"// is precisely what we are trying to avoid.\n" |
"//\n" |
"//\n" |
"// 3-MULTIPLIES:\n" |
"// It should also be possible to do this with three multiplies\n" |
"// and an extra two addition cycles.\n" |
"//\n" |
"// We want\n" |
"// R+jI = (a + jb) * (c + jd)\n" |
"// R+jI = (ac-bd) + j(ad+bc)\n" |
"// We multiply\n" |
"// P1 = ac\n" |
"// P2 = bd\n" |
"// P3 = (a+b)(c+d)\n" |
"// Then\n" |
"// R+jI=(P1-P2)+j(P3-P2-P1)\n" |
"//\n" |
"// WIDTHS:\n" |
"// On multiplying an X width number by a\n" |
"// Y width number, X>Y, the result should be (X+Y)\n" |
"// bits, right?\n" |
"// -2^(X-1) <= a <= 2^(X-1) - 1\n" |
"// -2^(Y-1) <= b <= 2^(Y-1) - 1\n" |
"// (2^(Y-1)-1)*(-2^(X-1)) <= ab <= 2^(X-1)2^(Y-1)\n" |
"// -2^(X+Y-2)+2^(X-1) <= ab <= 2^(X+Y-2) <= 2^(X+Y-1) - 1\n" |
"// -2^(X+Y-1) <= ab <= 2^(X+Y-1)-1\n" |
"// YUP! But just barely. Do this and you'll really want\n" |
"// to drop a bit, although you will risk overflow in so\n" |
"// doing.\n" |
"//\n" |
"// 20150602 -- The sync logic lines have been completely redone. The\n" |
"// synchronization lines no longer go through the FIFO with the\n" |
"// left hand sum, but are kept out of memory. This allows the\n" |
"// butterfly to use more optimal memory resources, while also\n" |
"// guaranteeing that the sync lines can be properly reset upon\n" |
"// any reset signal.\n" |
"//\n" |
"//\n%s" |
"//\n", prjname, creator); |
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
|
fprintf(fp, |
"module\tbutterfly(i_clk, %s, i_ce, i_coef, i_left, i_right, i_aux,\n" |
"\t\to_left, o_right, o_aux);\n" |
"\t// Public changeable parameters ...\n", resetw.c_str()); |
|
fprintf(fp, |
"\tparameter IWIDTH=%d,", TST_BUTTERFLY_IWIDTH); |
#ifdef TST_BUTTERFLY_CWIDTH |
fprintf(fp, "CWIDTH=%d,", TST_BUTTERFLY_CWIDTH); |
#else |
fprintf(fp, "CWIDTH=IWIDTH+%d,", xtracbits); |
#endif |
#ifdef TST_BUTTERFLY_OWIDTH |
fprintf(fp, "OWIDTH=%d;\n", TST_BUTTERFLY_OWIDTH); |
// OWIDTH = TST_BUTTERFLY_OWIDTH; |
#else |
fprintf(fp, "OWIDTH=IWIDTH+1;\n"); |
#endif |
fprintf(fp, "\tparameter\tSHIFT=0;\n"); |
|
fprintf(fp, |
"\t// The number of clocks per each i_ce. The actual number can be\n" |
"\t// more, but the algorithm depends upon at least this many for\n" |
"\t// extra internal processing.\n" |
"\tparameter CKPCE=%d;\n", ckpce); |
|
fprintf(fp, |
"\t//\n" |
"\t// Local/derived parameters that are calculated from the above\n" |
"\t// params. Apart from algorithmic changes below, these should not\n" |
"\t// be adjusted\n" |
"\t//\n" |
"\t// The first step is to calculate how many clocks our multiply\n" |
"\t// takes to return an answer. The time in the\n" |
"\t// multiply depends upon the input value with the fewest number of\n" |
"\t// bits--to keep the pipeline depth short. So, let's find the\n" |
"\t// fewest number of bits here.\n" |
"\tlocalparam MXMPYBITS = \n" |
"\t\t((IWIDTH+2)>(CWIDTH+1)) ? (CWIDTH+1) : (IWIDTH + 2);\n" |
"\t//\n" |
"\t// Given this \"fewest\" number of bits, we can calculate the\n" |
"\t// number of clocks the multiply itself will take.\n" |
"\tlocalparam MPYDELAY=((MXMPYBITS+1)/2)+2;\n" |
"\t//\n" |
"\t// In an environment when CKPCE > 1, the multiply delay isn\'t\n" |
"\t// necessarily the delay felt by this algorithm--measured in\n" |
"\t// i_ce\'s. In particular, if the multiply can operate with more\n" |
"\t// operations per clock, it can appear to finish \"faster\".\n" |
"\t// Since most of the logic in this core operates on the slower\n" |
"\t// clock, we'll need to map that speed into the number of slower\n" |
"\t// clock ticks that it takes.\n" |
"\tlocalparam LCLDELAY = (CKPCE == 1) ? MPYDELAY\n" |
"\t\t: (CKPCE == 2) ? (MPYDELAY/2+2)\n" |
"\t\t: (MPYDELAY/3 + 2);\n" |
"\tlocalparam LGDELAY = (MPYDELAY>64) ? 7\n" |
"\t\t\t: (MPYDELAY > 32) ? 6\n" |
"\t\t\t: (MPYDELAY > 16) ? 5\n" |
"\t\t\t: (MPYDELAY > 8) ? 4\n" |
"\t\t\t: (MPYDELAY > 4) ? 3\n" |
"\t\t\t: 2;\n" |
"\tlocalparam AUXLEN=(LCLDELAY+3);\n" |
"\tlocalparam MPYREMAINDER = MPYDELAY - CKPCE*(MPYDELAY/CKPCE);\n" |
"\n\n"); |
|
|
fprintf(fp, |
"\tinput\t\ti_clk, %s, i_ce;\n" |
"\tinput\t\t[(2*CWIDTH-1):0] i_coef;\n" |
"\tinput\t\t[(2*IWIDTH-1):0] i_left, i_right;\n" |
"\tinput\t\ti_aux;\n" |
"\toutput\twire [(2*OWIDTH-1):0] o_left, o_right;\n" |
"\toutput\treg\to_aux;\n\n", resetw.c_str()); |
fprintf(fp, |
"\treg\t[(2*IWIDTH-1):0]\tr_left, r_right;\n" |
"\treg\t[(2*CWIDTH-1):0]\tr_coef, r_coef_2;\n" |
"\twire\tsigned\t[(IWIDTH-1):0]\tr_left_r, r_left_i, r_right_r, r_right_i;\n" |
"\tassign\tr_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\tr_left_i = r_left[ (IWIDTH-1):0];\n" |
"\tassign\tr_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\tr_right_i = r_right[(IWIDTH-1):0];\n" |
"\n" |
"\treg\tsigned\t[(IWIDTH):0]\tr_sum_r, r_sum_i, r_dif_r, r_dif_i;\n" |
"\n" |
"\treg [(LGDELAY-1):0] fifo_addr;\n" |
"\twire [(LGDELAY-1):0] fifo_read_addr;\n" |
"\tassign\tfifo_read_addr = fifo_addr - LCLDELAY[(LGDELAY-1):0];\n" |
"\treg [(2*IWIDTH+1):0] fifo_left [ 0:((1<<LGDELAY)-1)];\n" |
"\n"); |
fprintf(fp, |
"\t// Set up the input to the multiply\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// One clock just latches the inputs\n" |
"\t\t\tr_left <= i_left; // No change in # of bits\n" |
"\t\t\tr_right <= i_right;\n" |
"\t\t\tr_coef <= i_coef;\n" |
"\t\t\t// Next clock adds/subtracts\n" |
"\t\t\tr_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits\n" |
"\t\t\tr_sum_i <= r_left_i + r_right_i;\n" |
"\t\t\tr_dif_r <= r_left_r - r_right_r;\n" |
"\t\t\tr_dif_i <= r_left_i - r_right_i;\n" |
"\t\t\t// Other inputs are simply delayed on second clock\n" |
"\t\t\tr_coef_2<= r_coef;\n" |
"\t\tend\n" |
"\n"); |
fprintf(fp, |
"\t// Don\'t forget to record the even side, since it doesn\'t need\n" |
"\t// to be multiplied, but yet we still need the results in sync\n" |
"\t// with the answer when it is ready.\n" |
"\tinitial fifo_addr = 0;\n"); |
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
"\t\t\tfifo_addr <= 0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\t// Need to delay the sum side--nothing else happens\n" |
"\t\t\t// to it, but it needs to stay synchronized with the\n" |
"\t\t\t// right side.\n" |
"\t\t\tfifo_addr <= fifo_addr + 1;\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\tfifo_left[fifo_addr] <= { r_sum_r, r_sum_i };\n" |
"\n" |
"\twire\tsigned\t[(CWIDTH-1):0] ir_coef_r, ir_coef_i;\n" |
"\tassign\tir_coef_r = r_coef_2[(2*CWIDTH-1):CWIDTH];\n" |
"\tassign\tir_coef_i = r_coef_2[(CWIDTH-1):0];\n" |
"\twire\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\tp_one, p_two, p_three;\n" |
"\n" |
"\n"); |
fprintf(fp, |
"\t// Multiply output is always a width of the sum of the widths of\n" |
"\t// the two inputs. ALWAYS. This is independent of the number of\n" |
"\t// bits in p_one, p_two, or p_three. These values needed to\n" |
"\t// accumulate a bit (or two) each. However, this approach to a\n" |
"\t// three multiply complex multiply cannot increase the total\n" |
"\t// number of bits in our final output. We\'ll take care of\n" |
"\t// dropping back down to the proper width, OWIDTH, in our routine\n" |
"\t// below.\n" |
"\n" |
"\n"); |
fprintf(fp, |
"\t// We accomplish here \"Karatsuba\" multiplication. That is,\n" |
"\t// by doing three multiplies we accomplish the work of four.\n" |
"\t// Let\'s prove to ourselves that this works ... We wish to\n" |
"\t// multiply: (a+jb) * (c+jd), where a+jb is given by\n" |
"\t//\ta + jb = r_dif_r + j r_dif_i, and\n" |
"\t//\tc + jd = ir_coef_r + j ir_coef_i.\n" |
"\t// We do this by calculating the intermediate products P1, P2,\n" |
"\t// and P3 as\n" |
"\t//\tP1 = ac\n" |
"\t//\tP2 = bd\n" |
"\t//\tP3 = (a + b) * (c + d)\n" |
"\t// and then complete our final answer with\n" |
"\t//\tac - bd = P1 - P2 (this checks)\n" |
"\t//\tad + bc = P3 - P2 - P1\n" |
"\t//\t = (ac + bc + ad + bd) - bd - ac\n" |
"\t//\t = bc + ad (this checks)\n" |
"\n" |
"\n"); |
fprintf(fp, |
"\t// This should really be based upon an IF, such as in\n" |
"\t// if (IWIDTH < CWIDTH) then ...\n" |
"\t// However, this is the only (other) way I know to do it.\n" |
"\tgenerate if (CKPCE <= 1)\n" |
"\tbegin\n" |
"\n" |
"\t\twire\t[(CWIDTH):0]\tp3c_in;\n" |
"\t\twire\t[(IWIDTH+1):0]\tp3d_in;\n" |
"\t\tassign\tp3c_in = ir_coef_i + ir_coef_r;\n" |
"\t\tassign\tp3d_in = r_dif_r + r_dif_i;\n" |
"\n" |
"\t\t// We need to pad these first two multiplies by an extra\n" |
"\t\t// bit just to keep them aligned with the third,\n" |
"\t\t// simpler, multiply.\n" |
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) p1(i_clk, i_ce,\n" |
"\t\t\t\t{ir_coef_r[CWIDTH-1],ir_coef_r},\n" |
"\t\t\t\t{r_dif_r[IWIDTH],r_dif_r}, p_one);\n" |
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) p2(i_clk, i_ce,\n" |
"\t\t\t\t{ir_coef_i[CWIDTH-1],ir_coef_i},\n" |
"\t\t\t\t{r_dif_i[IWIDTH],r_dif_i}, p_two);\n" |
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) p3(i_clk, i_ce,\n" |
"\t\t\t\tp3c_in, p3d_in, p_three);\n" |
"\n"); |
|
/////////////////////////////////////////// |
/// |
/// Two clocks per CE, so CE, no-ce, CE, no-ce, etc |
/// |
fprintf(fp, |
"\tend else if (CKPCE == 2)\n" |
"\tbegin : CKPCE_TWO\n" |
"\t\t// Coefficient multiply inputs\n" |
"\t\treg [2*(CWIDTH)-1:0] mpy_pipe_c;\n" |
"\t\t// Data multiply inputs\n" |
"\t\treg [2*(IWIDTH+1)-1:0] mpy_pipe_d;\n" |
"\t\twire signed [(CWIDTH-1):0] mpy_pipe_vc;\n" |
"\t\twire signed [(IWIDTH):0] mpy_pipe_vd;\n" |
"\t\t//\n" |
"\t\treg signed [(CWIDTH+1)-1:0] mpy_cof_sum;\n" |
"\t\treg signed [(IWIDTH+2)-1:0] mpy_dif_sum;\n" |
"\n" |
"\t\tassign mpy_pipe_vc = mpy_pipe_c[2*(CWIDTH)-1:CWIDTH];\n" |
"\t\tassign mpy_pipe_vd = mpy_pipe_d[2*(IWIDTH+1)-1:IWIDTH+1];\n" |
"\n" |
"\t\treg mpy_pipe_v;\n" |
"\t\treg ce_phase;\n" |
"\n" |
"\t\treg signed [(CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;\n" |
"\t\treg signed [IWIDTH+CWIDTH+3-1:0] longmpy;\n" |
"\n" |
"\n" |
"\t\tinitial ce_phase = 1'b0;\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_reset)\n" |
"\t\t\tce_phase <= 1'b0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\tce_phase <= 1'b1;\n" |
"\t\telse\n" |
"\t\t\tce_phase <= 1'b0;\n" |
"\n" |
"\t\talways @(*)\n" |
"\t\t\tmpy_pipe_v = (i_ce)||(ce_phase);\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (ce_phase)\n" |
"\t\tbegin\n" |
"\t\t\tmpy_pipe_c[2*CWIDTH-1:0] <=\n" |
"\t\t\t\t\t{ ir_coef_r, ir_coef_i };\n" |
"\t\t\tmpy_pipe_d[2*(IWIDTH+1)-1:0] <=\n" |
"\t\t\t\t\t{ r_dif_r, r_dif_i };\n" |
"\n" |
"\t\t\tmpy_cof_sum <= ir_coef_i + ir_coef_r;\n" |
"\t\t\tmpy_dif_sum <= r_dif_r + r_dif_i;\n" |
"\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tmpy_pipe_c[2*(CWIDTH)-1:0] <= {\n" |
"\t\t\t\tmpy_pipe_c[(CWIDTH)-1:0], {(CWIDTH){1'b0}} };\n" |
"\t\t\tmpy_pipe_d[2*(IWIDTH+1)-1:0] <= {\n" |
"\t\t\t\tmpy_pipe_d[(IWIDTH+1)-1:0], {(IWIDTH+1){1'b0}} };\n" |
"\t\tend\n" |
"\n"); |
	fprintf(fp,
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) mpy0(i_clk, mpy_pipe_v,\n"
"\t\t\t\tmpy_cof_sum, mpy_dif_sum, longmpy);\n"
"\n");

	fprintf(fp,
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) mpy1(i_clk, mpy_pipe_v,\n"
"\t\t\t\t{ mpy_pipe_vc[CWIDTH-1], mpy_pipe_vc },\n"
"\t\t\t\t{ mpy_pipe_vd[IWIDTH ], mpy_pipe_vd },\n"
"\t\t\t\tmpy_pipe_out);\n\n");

	fprintf(fp,
"\t\treg\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\n"
"\t\t\t\t\trp_one, rp_two, rp_three,\n"
"\t\t\t\t\trp2_one, rp2_two, rp2_three;\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (((i_ce)&&(!MPYDELAY[0]))\n"
"\t\t\t||((ce_phase)&&(MPYDELAY[0])))\n"
"\t\t\trp_one <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (((i_ce)&&(MPYDELAY[0]))\n"
"\t\t\t||((ce_phase)&&(!MPYDELAY[0])))\n"
"\t\t\trp_two <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp_three <= longmpy;\n"
"\n"
"\t\t// Our outputs *MUST* be set on a clock where i_ce is\n"
"\t\t// true for the following logic to work. Make that\n"
"\t\t// happen here.\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp2_one<= rp_one;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp2_two <= rp_two;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp2_three<= rp_three;\n"
"\n"
"\t\tassign p_one = rp2_one;\n"
"\t\tassign p_two = (!MPYDELAY[0])? rp2_two : rp_two;\n"
"\t\tassign p_three = ( MPYDELAY[0])? rp_three : rp2_three;\n"
"\n"
"\t\t// verilator lint_off UNUSED\n"
"\t\twire\t[2*(IWIDTH+CWIDTH+3)-1:0]\tunused;\n"
"\t\tassign\tunused = { rp2_two, rp2_three };\n"
"\t\t// verilator lint_on UNUSED\n"
"\n");

	/////////////////////////
	///
	/// Three clocks per CE, so CE, no-ce, no-ce*, CE
	///
	fprintf(fp,
"\tend else if (CKPCE <= 3)\n\tbegin : CKPCE_THREE\n");

	fprintf(fp,
"\t\t// Coefficient multiply inputs\n"
"\t\treg\t\t[3*(CWIDTH+1)-1:0]\tmpy_pipe_c;\n"
"\t\t// Data multiply inputs\n"
"\t\treg\t\t[3*(IWIDTH+2)-1:0]\tmpy_pipe_d;\n"
"\t\twire\tsigned [(CWIDTH):0] mpy_pipe_vc;\n"
"\t\twire\tsigned [(IWIDTH+1):0] mpy_pipe_vd;\n"
"\n"
"\t\tassign\tmpy_pipe_vc = mpy_pipe_c[3*(CWIDTH+1)-1:2*(CWIDTH+1)];\n"
"\t\tassign\tmpy_pipe_vd = mpy_pipe_d[3*(IWIDTH+2)-1:2*(IWIDTH+2)];\n"
"\n"
"\t\treg\t\t\tmpy_pipe_v;\n"
"\t\treg\t\t[2:0]\tce_phase;\n"
"\n"
"\t\twire\tsigned [ (CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;\n"
"\n");
	fprintf(fp,
"\t\tinitial\tce_phase = 3'b011;\n");
	if (async_reset)
		fprintf(fp, "\t\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
	else
		fprintf(fp, "\t\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
	fprintf(fp,
"\t\t\tce_phase <= 3'b011;\n"
"\t\telse if (i_ce)\n"
"\t\t\tce_phase <= 3'b000;\n"
"\t\telse if (ce_phase != 3'b011)\n"
"\t\t\tce_phase <= ce_phase + 1'b1;\n"
"\n"
"\t\talways @(*)\n"
"\t\t\tmpy_pipe_v = (i_ce)||(ce_phase < 3'b010);\n"
"\n");

	fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (ce_phase == 3\'b000)\n"
"\t\t\tbegin\n"
"\t\t\t\t// Second clock\n"
"\t\t\t\tmpy_pipe_c[3*(CWIDTH+1)-1:(CWIDTH+1)] <= {\n"
"\t\t\t\t\tir_coef_r[CWIDTH-1], ir_coef_r,\n"
"\t\t\t\t\tir_coef_i[CWIDTH-1], ir_coef_i };\n"
"\t\t\t\tmpy_pipe_c[CWIDTH:0] <= ir_coef_i + ir_coef_r;\n"
"\t\t\t\tmpy_pipe_d[3*(IWIDTH+2)-1:(IWIDTH+2)] <= {\n"
"\t\t\t\t\tr_dif_r[IWIDTH], r_dif_r,\n"
"\t\t\t\t\tr_dif_i[IWIDTH], r_dif_i };\n"
"\t\t\t\tmpy_pipe_d[(IWIDTH+2)-1:0] <= r_dif_r + r_dif_i;\n"
"\n"
"\t\t\tend else if (mpy_pipe_v)\n"
"\t\t\tbegin\n"
"\t\t\t\tmpy_pipe_c[3*(CWIDTH+1)-1:0] <= {\n"
"\t\t\t\t\tmpy_pipe_c[2*(CWIDTH+1)-1:0], {(CWIDTH+1){1\'b0}} };\n"
"\t\t\t\tmpy_pipe_d[3*(IWIDTH+2)-1:0] <= {\n"
"\t\t\t\t\tmpy_pipe_d[2*(IWIDTH+2)-1:0], {(IWIDTH+2){1\'b0}} };\n"
"\t\t\tend\n"
"\n");
	fprintf(fp,
"\t\tlongbimpy #(CWIDTH+1,IWIDTH+2) mpy(i_clk, mpy_pipe_v,\n"
"\t\t\t\tmpy_pipe_vc, mpy_pipe_vd, mpy_pipe_out);\n"
"\n");

	fprintf(fp,
"\t\treg\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\n"
"\t\t\t\trp_one, rp_two, rp_three,\n"
"\t\t\t\trp2_one, rp2_two, rp2_three,\n"
"\t\t\t\trp3_one;\n"
"\n");

	fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (MPYREMAINDER == 0)\n"
"\t\tbegin\n\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\trp_two <= mpy_pipe_out;\n"
"\t\t\telse if (ce_phase == 3'b000)\n"
"\t\t\t\trp_three <= mpy_pipe_out;\n"
"\t\t\telse if (ce_phase == 3'b001)\n"
"\t\t\t\trp_one <= mpy_pipe_out;\n\n"
"\t\tend else if (MPYREMAINDER == 1)\n"
"\t\tbegin\n\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\trp_one <= mpy_pipe_out;\n"
"\t\t\telse if (ce_phase == 3'b000)\n"
"\t\t\t\trp_two <= mpy_pipe_out;\n"
"\t\t\telse if (ce_phase == 3'b001)\n"
"\t\t\t\trp_three <= mpy_pipe_out;\n\n"
"\t\tend else // if (MPYREMAINDER == 2)\n"
"\t\tbegin\n\n"
"\t\t\tif (i_ce)\n"
"\t\t\t\trp_three <= mpy_pipe_out;\n"
"\t\t\telse if (ce_phase == 3'b000)\n"
"\t\t\t\trp_one <= mpy_pipe_out;\n"
"\t\t\telse if (ce_phase == 3'b001)\n"
"\t\t\t\trp_two <= mpy_pipe_out;\n\n"
"\t\tend\n\n");

	fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\trp2_one <= rp_one;\n"
"\t\t\trp2_two <= rp_two;\n"
"\t\t\trp2_three <= (MPYREMAINDER == 2) ? mpy_pipe_out : rp_three;\n"
"\t\t\trp3_one <= (MPYREMAINDER == 0) ? rp2_one : rp_one;\n"
"\t\tend\n");
	fprintf(fp,
"\t\tassign\tp_one = rp3_one;\n"
"\t\tassign\tp_two = rp2_two;\n"
"\t\tassign\tp_three = rp2_three;\n"
"\n");

	fprintf(fp,
"\tend endgenerate\n");

	fprintf(fp,
"\t// These values are held in memory and delayed during the\n"
"\t// multiply. Here, we recover them. During the multiply,\n"
"\t// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},\n"
"\t// therefore, the left_x values need to be scaled by 2^(CWIDTH-2)\n"
"\t// as well, i.e. padded with CWIDTH-2 zero LSBs. The two additional\n"
"\t// MSBs come from a sign extension.\n"
"\twire\tsigned\t[(IWIDTH+CWIDTH):0] fifo_i, fifo_r;\n"
"\treg\t\t[(2*IWIDTH+1):0] fifo_read;\n"
"\tassign\tfifo_r = { {2{fifo_read[2*(IWIDTH+1)-1]}}, fifo_read[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1\'b0}} };\n"
"\tassign\tfifo_i = { {2{fifo_read[(IWIDTH+1)-1]}}, fifo_read[((IWIDTH+1)-1):0], {(CWIDTH-2){1\'b0}} };\n"
"\n"
"\n"
"\treg\tsigned\t[(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;\n"
"\n");
	fprintf(fp,
"\t// Let's do some rounding and remove unnecessary bits.\n"
"\t// We have (IWIDTH+CWIDTH+3) bits here; we need to drop down to\n"
"\t// OWIDTH bits, shifting by SHIFT bits in the process. The trick is\n"
"\t// that we don\'t need (IWIDTH+CWIDTH+3) bits. We\'ve accumulated\n"
"\t// them, but the actual values will never fill all these bits.\n"
"\t// In particular, we only need:\n"
"\t//\t IWIDTH bits for the input\n"
"\t//\t +1 bit for the add/subtract\n"
"\t//\t+CWIDTH bits for the coefficient multiply\n"
"\t//\t +1 bit for the add/subtract in the complex multiply\n"
"\t//\t ------\n"
"\t//\t (IWIDTH+CWIDTH+2) bits at full precision.\n"
"\t//\n"
"\t// However, the coefficients have a maximum magnitude of\n"
"\t// 2^(CWIDTH-2). Thus, we only have\n"
"\t//\t IWIDTH bits for the input\n"
"\t//\t +1 bit for the add/subtract\n"
"\t//\t+CWIDTH-2 bits for the coefficient multiply\n"
"\t//\t +1 (optional) bit for the add/subtract in the cpx mpy.\n"
"\t//\t --------\t(This last bit may be shifted out.)\n"
"\t//\t (IWIDTH+CWIDTH) valid output bits.\n"
"\t// Now, if the user wants to keep any extras of these (via OWIDTH),\n"
"\t// or wishes to arbitrarily shift some of these off (via\n"
"\t// SHIFT), we accomplish that here.\n"
"\n");
	fprintf(fp,
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;\n\n");

	fprintf(fp,
"\twire\tsigned\t[(CWIDTH+IWIDTH+3-1):0]\tleft_sr, left_si;\n"
"\tassign left_sr = { {(2){fifo_r[(IWIDTH+CWIDTH)]}}, fifo_r };\n"
"\tassign left_si = { {(2){fifo_i[(IWIDTH+CWIDTH)]}}, fifo_i };\n\n");

	fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_r(i_clk, i_ce,\n"
"\t\t\t\tleft_sr, rnd_left_r);\n\n",
		rnd_string);
	fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_i(i_clk, i_ce,\n"
"\t\t\t\tleft_si, rnd_left_i);\n\n",
		rnd_string);
	fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,\n"
"\t\t\t\tmpy_r, rnd_right_r);\n\n", rnd_string);
	fprintf(fp,
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,\n"
"\t\t\t\tmpy_i, rnd_right_i);\n\n", rnd_string);
	fprintf(fp,
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// First clock, recover all values\n"
"\t\t\tfifo_read <= fifo_left[fifo_read_addr];\n"
"\t\t\t// These values are IWIDTH+CWIDTH+3 bits wide\n"
"\t\t\t// although they only need to be (IWIDTH+1)\n"
"\t\t\t// + (CWIDTH) bits wide. (We\'ve got two\n"
"\t\t\t// extra bits we need to get rid of.)\n"
"\t\t\tmpy_r <= p_one - p_two;\n"
"\t\t\tmpy_i <= p_three - p_one - p_two;\n"
"\t\tend\n"
"\n");

	fprintf(fp,
"\treg\t[(AUXLEN-1):0]\taux_pipeline;\n"
"\tinitial\taux_pipeline = 0;\n");
	if (async_reset)
		fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
	else
		fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
	fprintf(fp,
"\t\t\taux_pipeline <= 0;\n"
"\t\telse if (i_ce)\n"
"\t\t\taux_pipeline <= { aux_pipeline[(AUXLEN-2):0], i_aux };\n"
"\n");
	fprintf(fp,
"\tinitial o_aux = 1\'b0;\n");
	if (async_reset)
		fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
	else
		fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
	fprintf(fp,
"\t\t\to_aux <= 1\'b0;\n"
"\t\telse if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, latch for final clock\n"
"\t\t\to_aux <= aux_pipeline[AUXLEN-1];\n"
"\t\tend\n"
"\n");

	fprintf(fp,
"\t// As a final step, we pack our outputs into two packed two\'s\n"
"\t// complement numbers per output word, so that each output word\n"
"\t// has (2*OWIDTH) bits in it, with the top half being the real\n"
"\t// portion and the bottom half being the imaginary portion.\n"
"\tassign o_left = { rnd_left_r, rnd_left_i };\n"
"\tassign o_right= { rnd_right_r,rnd_right_i};\n"
"\n");

	if (formal_property_flag) {
		fprintf(fp,
"`ifdef VERILATOR\n"
"`define FORMAL\n"
"`endif\n"
"`ifdef FORMAL\n"
"\tlocalparam F_LGDEPTH = (AUXLEN > 64) ? 7\n"
"\t\t\t: (AUXLEN > 32) ? 6\n"
"\t\t\t: (AUXLEN > 16) ? 5\n"
"\t\t\t: (AUXLEN > 8) ? 4\n"
"\t\t\t: (AUXLEN > 4) ? 3 : 2;\n\n"
"\tlocalparam F_DEPTH = AUXLEN;\n"
"\tlocalparam [F_LGDEPTH-1:0] F_D = F_DEPTH[F_LGDEPTH-1:0]-1;\n"
"\n"
"\treg signed [IWIDTH-1:0] f_dlyleft_r [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyleft_i [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyright_r [0:F_DEPTH-1];\n"
"\treg signed [IWIDTH-1:0] f_dlyright_i [0:F_DEPTH-1];\n"
"\treg signed [CWIDTH-1:0] f_dlycoeff_r [0:F_DEPTH-1];\n"
"\treg signed [CWIDTH-1:0] f_dlycoeff_i [0:F_DEPTH-1];\n"
"\treg signed [F_DEPTH-1:0] f_dlyaux;\n"
"\n"
"\tinitial\tf_dlyaux[0] = 0;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t\tf_dlyaux\t<= 0;\n"
"\telse if (i_ce)\n"
"\t\tf_dlyaux\t<= { f_dlyaux[F_DEPTH-2:0], i_aux };\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif (i_ce)\n"
"\tbegin\n"
"\t f_dlyleft_r[0] <= i_left[ (2*IWIDTH-1):IWIDTH];\n"
"\t f_dlyleft_i[0] <= i_left[ ( IWIDTH-1):0];\n"
"\t f_dlyright_r[0] <= i_right[(2*IWIDTH-1):IWIDTH];\n"
"\t f_dlyright_i[0] <= i_right[( IWIDTH-1):0];\n"
"\t f_dlycoeff_r[0] <= i_coef[ (2*CWIDTH-1):CWIDTH];\n"
"\t f_dlycoeff_i[0] <= i_coef[ ( CWIDTH-1):0];\n"
"\tend\n"
"\n"
"\tgenvar k;\n"
"\tgenerate for(k=1; k<F_DEPTH; k=k+1)\n"
"\tbegin : F_PROPAGATE_DELAY_LINES\n"
"\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t f_dlyleft_r[k] <= f_dlyleft_r[ k-1];\n"
"\t\t f_dlyleft_i[k] <= f_dlyleft_i[ k-1];\n"
"\t\t f_dlyright_r[k] <= f_dlyright_r[k-1];\n"
"\t\t f_dlyright_i[k] <= f_dlyright_i[k-1];\n"
"\t\t f_dlycoeff_r[k] <= f_dlycoeff_r[k-1];\n"
"\t\t f_dlycoeff_i[k] <= f_dlycoeff_i[k-1];\n"
"\t\tend\n"
"\n"
"\tend endgenerate\n"
"\n"
"`ifndef VERILATOR\n"
"\talways @(posedge i_clk)\n"
"\tif ((!$past(i_ce))&&(!$past(i_ce,2))&&(!$past(i_ce,3))\n"
"\t &&(!$past(i_ce,4)))\n"
"\t assume(i_ce);\n"
"\n"
"\tgenerate if (CKPCE <= 1)\n"
"\tbegin\n"
"\n"
"\t // i_ce is allowed to be anything in this mode\n"
"\n"
"\tend else if (CKPCE == 2)\n"
"\tbegin : F_CKPCE_TWO\n"
"\n"
"\t always @(posedge i_clk)\n"
"\t if ($past(i_ce))\n"
"\t assume(!i_ce);\n"
"\n"
"\tend else if (CKPCE == 3)\n"
"\tbegin : F_CKPCE_THREE\n"
"\n"
"\t always @(posedge i_clk)\n"
"\t if (($past(i_ce))||($past(i_ce,2)))\n"
"\t assume(!i_ce);\n"
"\n"
"\tend endgenerate\n"
"`endif\n"
"\n"
"\treg [F_LGDEPTH:0] f_startup_counter;\n"
"\tinitial f_startup_counter = 0;\n"
"\talways @(posedge i_clk)\n"
"\tif (i_reset)\n"
"\t f_startup_counter <= 0;\n"
"\telse if ((i_ce)&&(!(&f_startup_counter)))\n"
"\t f_startup_counter <= f_startup_counter + 1;\n"
"\n"
"\treg signed [IWIDTH:0] f_sumr, f_sumi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t f_sumr = f_dlyleft_r[F_D] + f_dlyright_r[F_D];\n"
"\t f_sumi = f_dlyleft_i[F_D] + f_dlyright_i[F_D];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_sumrx, f_sumix;\n"
"\tassign\tf_sumrx = { {(4){f_sumr[IWIDTH]}}, f_sumr, {(CWIDTH-2){1'b0}} };\n"
"\tassign\tf_sumix = { {(4){f_sumi[IWIDTH]}}, f_sumi, {(CWIDTH-2){1'b0}} };\n"
"\n"
"\treg signed [IWIDTH:0] f_difr, f_difi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t f_difr = f_dlyleft_r[F_D] - f_dlyright_r[F_D];\n"
"\t f_difi = f_dlyleft_i[F_D] - f_dlyright_i[F_D];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_difrx, f_difix;\n"
"\tassign\tf_difrx = { {(CWIDTH+2){f_difr[IWIDTH]}}, f_difr };\n"
"\tassign\tf_difix = { {(CWIDTH+2){f_difi[IWIDTH]}}, f_difi };\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_widecoeff_r, f_widecoeff_i;\n"
"\tassign\tf_widecoeff_r ={ {(IWIDTH+3){f_dlycoeff_r[F_D][CWIDTH-1]}},\n"
"\t\t\t\t\t\tf_dlycoeff_r[F_D] };\n"
"\tassign\tf_widecoeff_i ={ {(IWIDTH+3){f_dlycoeff_i[F_D][CWIDTH-1]}},\n"
"\t\t\t\t\t\tf_dlycoeff_i[F_D] };\n"
"\n"
"\talways @(posedge i_clk)\n"
"\tif (f_startup_counter > {1'b0, F_D})\n"
"\tbegin\n"
"\t assert(aux_pipeline == f_dlyaux);\n"
"\t assert(left_sr == f_sumrx);\n"
"\t assert(left_si == f_sumix);\n"
"\t assert(aux_pipeline[AUXLEN-1] == f_dlyaux[F_D]);\n"
"\n"
"\t if ((f_difr == 0)&&(f_difi == 0))\n"
"\t begin\n"
"\t assert(mpy_r == 0);\n"
"\t assert(mpy_i == 0);\n"
"\t end else if ((f_dlycoeff_r[F_D] == 0)\n"
"\t &&(f_dlycoeff_i[F_D] == 0))\n"
"\t begin\n"
"\t assert(mpy_r == 0);\n"
"\t assert(mpy_i == 0);\n"
"\t end\n"
"\n"
"\t if ((f_dlycoeff_r[F_D] == 1)&&(f_dlycoeff_i[F_D] == 0))\n"
"\t begin\n"
"\t assert(mpy_r == f_difrx);\n"
"\t assert(mpy_i == f_difix);\n"
"\t end\n"
"\n"
"\t if ((f_dlycoeff_r[F_D] == 0)&&(f_dlycoeff_i[F_D] == 1))\n"
"\t begin\n"
"\t assert(mpy_r == -f_difix);\n"
"\t assert(mpy_i == f_difrx);\n"
"\t end\n"
"\n"
"\t if ((f_difr == 1)&&(f_difi == 0))\n"
"\t begin\n"
"\t assert(mpy_r == f_widecoeff_r);\n"
"\t assert(mpy_i == f_widecoeff_i);\n"
"\t end\n"
"\n"
"\t if ((f_difr == 0)&&(f_difi == 1))\n"
"\t begin\n"
"\t assert(mpy_r == -f_widecoeff_i);\n"
"\t assert(mpy_i == f_widecoeff_r);\n"
"\t end\n"
"\tend\n"
"\n");

		fprintf(fp,
"\t// Let's see if we can improve our performance at all by\n"
"\t// moving our test one clock earlier. If nothing else, it should\n"
"\t// help induction finish one (or more) clocks earlier than\n"
"\t// otherwise.\n"
"\n\n"
"\treg signed [IWIDTH:0] f_predifr, f_predifi;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t\tf_predifr = f_dlyleft_r[F_D-1] - f_dlyright_r[F_D-1];\n"
"\t\tf_predifi = f_dlyleft_i[F_D-1] - f_dlyright_i[F_D-1];\n"
"\tend\n"
"\n"
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_predifrx, f_predifix;\n"
"\tassign f_predifrx = { {(CWIDTH+2){f_predifr[IWIDTH]}}, f_predifr };\n"
"\tassign f_predifix = { {(CWIDTH+2){f_predifi[IWIDTH]}}, f_predifi };\n"
"\n"
"\treg signed [CWIDTH:0] f_sumcoef;\n"
"\treg signed [IWIDTH+1:0] f_sumdiff;\n"
"\talways @(*)\n"
"\tbegin\n"
"\t\tf_sumcoef = f_dlycoeff_r[F_D-1] + f_dlycoeff_i[F_D-1];\n"
"\t\tf_sumdiff = f_predifr + f_predifi;\n"
"\tend\n"
"\n"
"\t// Induction helpers\n"
"\talways @(posedge i_clk)\n"
"\tif (f_startup_counter >= { 1'b0, F_D })\n"
"\tbegin\n"
"\t\tif (f_dlycoeff_r[F_D-1] == 0)\n"
"\t\t\tassert(p_one == 0);\n"
"\t\tif (f_dlycoeff_i[F_D-1] == 0)\n"
"\t\t\tassert(p_two == 0);\n"
"\n"
"\t\tif (f_dlycoeff_r[F_D-1] == 1)\n"
"\t\t\tassert(p_one == f_predifrx);\n"
"\t\tif (f_dlycoeff_i[F_D-1] == 1)\n"
"\t\t\tassert(p_two == f_predifix);\n"
"\n"
"\t\tif (f_predifr == 0)\n"
"\t\t\tassert(p_one == 0);\n"
"\t\tif (f_predifi == 0)\n"
"\t\t\tassert(p_two == 0);\n"
"\n"
"\t\t// verilator lint_off WIDTH\n"
"\t\tif (f_predifr == 1)\n"
"\t\t\tassert(p_one == f_dlycoeff_r[F_D-1]);\n"
"\t\tif (f_predifi == 1)\n"
"\t\t\tassert(p_two == f_dlycoeff_i[F_D-1]);\n"
"\t\t// verilator lint_on WIDTH\n"
"\n"
"\t\tif (f_sumcoef == 0)\n"
"\t\t\tassert(p_three == 0);\n"
"\t\tif (f_sumdiff == 0)\n"
"\t\t\tassert(p_three == 0);\n"
"\t\t// verilator lint_off WIDTH\n"
"\t\tif (f_sumcoef == 1)\n"
"\t\t\tassert(p_three == f_sumdiff);\n"
"\t\tif (f_sumdiff == 1)\n"
"\t\t\tassert(p_three == f_sumcoef);\n"
"\t\t// verilator lint_on WIDTH\n"
"`ifdef VERILATOR\n"
"\t\tassert(p_one == f_predifr * f_dlycoeff_r[F_D-1]);\n"
"\t\tassert(p_two == f_predifi * f_dlycoeff_i[F_D-1]);\n"
"\t\tassert(p_three == f_sumdiff * f_sumcoef);\n"
"`endif // VERILATOR\n"
"\tend\n\n");

		fprintf(fp,
"\t// F_CHECK will be set externally by the solver, so that we can\n"
"\t// double check that the solver is actually testing what we think\n"
"\t// it is testing. We'll set it here to MPYREMAINDER, which will\n"
"\t// essentially eliminate the check--unless overridden by the\n"
"\t// solver.\n"
"\tparameter F_CHECK = MPYREMAINDER;\n"
"\tinitial assert(MPYREMAINDER == F_CHECK);\n\n");

		fprintf(fp,
"`endif // FORMAL\n");
	}

	fprintf(fp,
"endmodule\n");
	fclose(fp);
}
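
/*
 * A minimal sketch (not part of the generator, and not compiled) of the
 * three-multiply complex product that both butterfly builders emit in
 * Verilog: p_one = coef_r*dif_r, p_two = coef_i*dif_i, and
 * p_three = (coef_r+coef_i)*(dif_r+dif_i), which give
 * mpy_r = p_one - p_two and mpy_i = p_three - p_one - p_two.
 * The names Cpx and three_mpy are hypothetical and exist only for this
 * illustration. Plain 64-bit integers stand in for the Verilog bit widths.
 *
```cpp
#include <cstdint>

// Hypothetical result type, used only in this sketch.
struct Cpx {
	int64_t re, im;
};

// Computes (cr + j*ci) times (dr + j*di) with three real multiplies,
// mirroring the p_one, p_two and p_three products in the generated butterfly.
static Cpx three_mpy(int64_t cr, int64_t ci, int64_t dr, int64_t di) {
	int64_t p_one   = cr * dr;                // coefficient real times data real
	int64_t p_two   = ci * di;                // coefficient imag times data imag
	int64_t p_three = (cr + ci) * (dr + di);  // each sum needs one extra bit
	return Cpx{ p_one - p_two, p_three - p_one - p_two };
}
```
 *
 * For example, three_mpy(3, 4, 5, -2) yields { 23, 14 }, the same result
 * as computing (3+4j)(5-2j) with four multiplies. The trade is one
 * multiply for three extra additions, which is why the p_three operands
 * are each one bit wider than the p_one and p_two operands.
 */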

void build_hwbfly(const char *fname, int xtracbits, ROUND_T rounding,
		int ckpce, const bool async_reset) {
	FILE *fp = fopen(fname, "w");
	if (NULL == fp) {
		fprintf(stderr, "Could not open \'%s\' for writing\n", fname);
		perror("O/S Err was:");
		return;
	}

	const char *rnd_string;
	if (rounding == RND_TRUNCATE)
		rnd_string = "truncate";
	else if (rounding == RND_FROMZERO)
		rnd_string = "roundfromzero";
	else if (rounding == RND_HALFUP)
		rnd_string = "roundhalfup";
	else
		rnd_string = "convround";

	std::string resetw("i_reset");
	if (async_reset)
		resetw = std::string("i_areset_n");

	fprintf(fp,
SLASHLINE
"//\n"
"// Filename:\thwbfly.v\n"
"//\n"
"// Project:\t%s\n"
"//\n"
"// Purpose:\tThis routine is identical to the one found in butterfly.v,\n"
"// save only that it uses the Verilog '*' operator in the hope that\n"
"// the synthesizer will be able to map the multiplies onto hardware\n"
"// multiply resources.\n"
"//\n"
"// It is understood that a hardware multiply can complete its operation in\n"
"// a single clock.\n"
"//\n"
"// Operation:\n"
"//\n"
"// Given two inputs, A (i_left) and B (i_right), and a complex\n"
"// coefficient C (i_coef), return two outputs, O1 and O2, where:\n"
"//\n"
"// O1 = A + B, and\n"
"// O2 = (A - B)*C\n"
"//\n"
"// This operation is commonly known as a Decimation in Frequency (DIF)\n"
"// Radix-2 Butterfly.\n"
"// O1 and O2 are rounded to OWIDTH bits before being returned in\n"
"// (o_left) and (o_right). If SHIFT is one, an extra bit is dropped\n"
"// from these values during the rounding process.\n"
"//\n"
"// Further, since these outputs will take some number of clocks to\n"
"// calculate, we'll pipe a value (i_aux) through the system and return\n"
"// it with the results (o_aux), so you can synchronize to the outgoing\n"
"// output stream.\n"
"//\n"
"//\n%s"
"//\n", prjname, creator);
	fprintf(fp, "%s", cpyleft);
	fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n");
	fprintf(fp,
"module hwbfly(i_clk, %s, i_ce, i_coef, i_left, i_right, i_aux,\n"
"\t\to_left, o_right, o_aux);\n"
"\t// Public changeable parameters ...\n"
"\t// - IWIDTH, number of bits in each component of the input\n"
"\t// - CWIDTH, number of bits in each component of the twiddle factor\n"
"\t// - OWIDTH, number of bits in each component of the output\n"
"\tparameter IWIDTH=16,CWIDTH=IWIDTH+%d,OWIDTH=IWIDTH+1;\n"
"\t// Drop an additional bit on the output?\n"
"\tparameter\t\tSHIFT=0;\n"
"\t// The number of clocks per clock enable, 1, 2, or 3.\n"
"\tparameter\t[1:0]\tCKPCE=%d;\n\t//\n", resetw.c_str(), xtracbits,
		ckpce);

	fprintf(fp,
"\tinput\t\ti_clk, %s, i_ce;\n"
"\tinput\t\t[(2*CWIDTH-1):0]\ti_coef;\n"
"\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n"
"\tinput\t\ti_aux;\n"
"\toutput\twire\t[(2*OWIDTH-1):0]\to_left, o_right;\n"
"\toutput\treg\to_aux;\n\n"
"\n", resetw.c_str());

	fprintf(fp,
"\treg\t[(2*IWIDTH-1):0] r_left, r_right;\n"
"\treg\t r_aux, r_aux_2;\n"
"\treg\t[(2*CWIDTH-1):0] r_coef;\n"
"\twire signed [(IWIDTH-1):0] r_left_r, r_left_i, r_right_r, r_right_i;\n"
"\tassign\tr_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_left_i = r_left[ (IWIDTH-1):0];\n"
"\tassign\tr_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];\n"
"\tassign\tr_right_i = r_right[(IWIDTH-1):0];\n"
"\treg signed [(CWIDTH-1):0] ir_coef_r, ir_coef_i;\n"
"\n"
"\treg signed [(IWIDTH):0] r_sum_r, r_sum_i, r_dif_r, r_dif_i;\n"
"\n"
"\treg [(2*IWIDTH+2):0] leftv, leftvv;\n"
"\n"
"\t// Set up the input to the multiply\n"
"\tinitial r_aux = 1\'b0;\n"
"\tinitial r_aux_2 = 1\'b0;\n");
	if (async_reset)
		fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
	else
		fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
	fprintf(fp,
"\t\tbegin\n"
"\t\t\tr_aux <= 1\'b0;\n"
"\t\t\tr_aux_2 <= 1\'b0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// One clock just latches the inputs\n"
"\t\t\tr_aux <= i_aux;\n"
"\t\t\t// Next clock adds/subtracts\n"
"\t\t\t// Other inputs are simply delayed on second clock\n"
"\t\t\tr_aux_2 <= r_aux;\n"
"\t\tend\n"
"\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// One clock just latches the inputs\n"
"\t\t\tr_left <= i_left; // No change in # of bits\n"
"\t\t\tr_right <= i_right;\n"
"\t\t\tr_coef <= i_coef;\n"
"\t\t\t// Next clock adds/subtracts\n"
"\t\t\tr_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits\n"
"\t\t\tr_sum_i <= r_left_i + r_right_i;\n"
"\t\t\tr_dif_r <= r_left_r - r_right_r;\n"
"\t\t\tr_dif_i <= r_left_i - r_right_i;\n"
"\t\t\t// Other inputs are simply delayed on second clock\n"
"\t\t\tir_coef_r <= r_coef[(2*CWIDTH-1):CWIDTH];\n"
"\t\t\tir_coef_i <= r_coef[(CWIDTH-1):0];\n"
"\t\tend\n"
"\n\n");
	fprintf(fp,
"\t// See comments in the butterfly.v source file for a discussion of\n"
"\t// these operations and the appropriate bit widths.\n\n");
	fprintf(fp,
"\twire\tsigned [((IWIDTH+1)+(CWIDTH)-1):0] p_one, p_two;\n"
"\twire\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] p_three;\n"
"\n"
"\tinitial leftv = 0;\n"
"\tinitial leftvv = 0;\n");
	if (async_reset)
		fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
	else
		fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
	fprintf(fp,
"\t\tbegin\n"
"\t\t\tleftv <= 0;\n"
"\t\t\tleftvv <= 0;\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, pipeline = 1\n"
"\t\t\tleftv <= { r_aux_2, r_sum_r, r_sum_i };\n"
"\n"
"\t\t\t// Third clock, pipeline = 3\n"
"\t\t\tleftvv <= leftv;\n"
"\t\tend\n"
"\n");

	// This builder handles 1, 2, or 3 clocks per CE, where one clock
	// per CE means that CE may be held high continuously. Each case
	// gets its own branch of the generate block below.

	// fprintf(fp,
	//"\tend else if (CKPCI == 2'b01)\n\tbegin\n");

	///////////////////////////////////////////
	///
	/// One clock per CE, so CE, CE, CE, CE, CE is possible
	///
	fprintf(fp,
"\tgenerate if (CKPCE <= 1)\n\tbegin : CKPCE_ONE\n");

	fprintf(fp,
"\t\t// Coefficient multiply inputs\n"
"\t\treg\tsigned [(CWIDTH-1):0] p1c_in, p2c_in;\n"
"\t\t// Data multiply inputs\n"
"\t\treg\tsigned [(IWIDTH):0] p1d_in, p2d_in;\n"
"\t\t// Product 3, coefficient input\n"
"\t\treg\tsigned [(CWIDTH):0] p3c_in;\n"
"\t\t// Product 3, data input\n"
"\t\treg\tsigned [(IWIDTH+1):0] p3d_in;\n"
"\n");
	fprintf(fp,
"\t\treg\tsigned [((IWIDTH+1)+(CWIDTH)-1):0] rp_one, rp_two;\n"
"\t\treg\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three;\n"
"\n");

	fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Second clock, pipeline = 1\n"
"\t\t\tp1c_in <= ir_coef_r;\n"
"\t\t\tp2c_in <= ir_coef_i;\n"
"\t\t\tp1d_in <= r_dif_r;\n"
"\t\t\tp2d_in <= r_dif_i;\n"
"\t\t\tp3c_in <= ir_coef_i + ir_coef_r;\n"
"\t\t\tp3d_in <= r_dif_r + r_dif_i;\n"
"\t\tend\n\n");

	if (formal_property_flag)
		fprintf(fp,
"`ifndef FORMAL\n");

	fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// Third clock, pipeline = 3\n"
"\t\t\t// As desired, each of these lines infers a DSP48\n"
"\t\t\trp_one <= p1c_in * p1d_in;\n"
"\t\t\trp_two <= p2c_in * p2d_in;\n"
"\t\t\trp_three <= p3c_in * p3d_in;\n"
"\t\tend\n");

	if (formal_property_flag)
		fprintf(fp,
"`else\n"
"\t\twire signed [((IWIDTH+1)+(CWIDTH)-1):0] pre_rp_one, pre_rp_two;\n"
"\t\twire signed [((IWIDTH+2)+(CWIDTH+1)-1):0] pre_rp_three;\n"
"\n"
"\t\tabs_mpy #(CWIDTH,IWIDTH+1,1'b1)\n"
"\t\t onei(p1c_in, p1d_in, pre_rp_one);\n"
"\t\tabs_mpy #(CWIDTH,IWIDTH+1,1'b1)\n"
"\t\t twoi(p2c_in, p2d_in, pre_rp_two);\n"
"\t\tabs_mpy #(CWIDTH+1,IWIDTH+2,1'b1)\n"
"\t\t threei(p3c_in, p3d_in, pre_rp_three);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\tbegin\n"
"\t\t\trp_one <= pre_rp_one;\n"
"\t\t\trp_two <= pre_rp_two;\n"
"\t\t\trp_three <= pre_rp_three;\n"
"\t\tend\n"
"`endif // FORMAL\n");

	fprintf(fp,"\n"
"\t\tassign\tp_one = rp_one;\n"
"\t\tassign\tp_two = rp_two;\n"
"\t\tassign\tp_three = rp_three;\n"
"\n");

	///////////////////////////////////////////
	///
	/// Two clocks per CE, so CE, no-ce, CE, no-ce, etc
	///
	fprintf(fp,
"\tend else if (CKPCE <= 2)\n"
"\tbegin : CKPCE_TWO\n"
"\t\t// Coefficient multiply inputs\n"
"\t\treg [2*(CWIDTH)-1:0] mpy_pipe_c;\n"
"\t\t// Data multiply inputs\n"
"\t\treg [2*(IWIDTH+1)-1:0] mpy_pipe_d;\n"
"\t\twire signed [(CWIDTH-1):0] mpy_pipe_vc;\n"
"\t\twire signed [(IWIDTH):0] mpy_pipe_vd;\n"
"\t\t//\n"
"\t\treg signed [(CWIDTH+1)-1:0] mpy_cof_sum;\n"
"\t\treg signed [(IWIDTH+2)-1:0] mpy_dif_sum;\n"
"\n"
"\t\tassign mpy_pipe_vc = mpy_pipe_c[2*(CWIDTH)-1:CWIDTH];\n"
"\t\tassign mpy_pipe_vd = mpy_pipe_d[2*(IWIDTH+1)-1:IWIDTH+1];\n"
"\n"
"\t\treg mpy_pipe_v;\n"
"\t\treg ce_phase;\n"
"\n"
"\t\treg signed [(CWIDTH+IWIDTH+1)-1:0] mpy_pipe_out;\n"
"\t\treg signed [IWIDTH+CWIDTH+3-1:0] longmpy;\n"
"\n"
"\n"
"\t\tinitial ce_phase = 1'b1;\n");
	if (async_reset)
		fprintf(fp, "\t\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
	else
		fprintf(fp, "\t\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
	fprintf(fp,
"\t\t\tce_phase <= 1'b1;\n"
"\t\telse if (i_ce)\n"
"\t\t\tce_phase <= 1'b0;\n"
"\t\telse\n"
"\t\t\tce_phase <= 1'b1;\n"
"\n"
"\t\talways @(*)\n"
"\t\t\tmpy_pipe_v = (i_ce)||(!ce_phase);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (!ce_phase)\n"
"\t\tbegin\n"
"\t\t\t// Pre-clock\n"
"\t\t\tmpy_pipe_c[2*CWIDTH-1:0] <=\n"
"\t\t\t\t\t{ ir_coef_r, ir_coef_i };\n"
"\t\t\tmpy_pipe_d[2*(IWIDTH+1)-1:0] <=\n"
"\t\t\t\t\t{ r_dif_r, r_dif_i };\n"
"\n"
"\t\t\tmpy_cof_sum <= ir_coef_i + ir_coef_r;\n"
"\t\t\tmpy_dif_sum <= r_dif_r + r_dif_i;\n"
"\n"
"\t\tend else if (i_ce)\n"
"\t\tbegin\n"
"\t\t\t// First clock\n"
"\t\t\tmpy_pipe_c[2*(CWIDTH)-1:0] <= {\n"
"\t\t\t\tmpy_pipe_c[(CWIDTH)-1:0], {(CWIDTH){1'b0}} };\n"
"\t\t\tmpy_pipe_d[2*(IWIDTH+1)-1:0] <= {\n"
"\t\t\t\tmpy_pipe_d[(IWIDTH+1)-1:0], {(IWIDTH+1){1'b0}} };\n"
"\t\tend\n\n");

	if (formal_property_flag)
		fprintf(fp, "`ifndef FORMAL\n");

	fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce) // First clock\n"
"\t\t\tlongmpy <= mpy_cof_sum * mpy_dif_sum;\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (mpy_pipe_v)\n"
"\t\t\tmpy_pipe_out <= mpy_pipe_vc * mpy_pipe_vd;\n");

	if (formal_property_flag)
		fprintf(fp, "`else\n"
"\t\twire signed [IWIDTH+CWIDTH+3-1:0] pre_longmpy;\n"
"\t\twire signed [(CWIDTH+IWIDTH+1)-1:0] pre_mpy_pipe_out;\n"
"\n"
"\t\tabs_mpy #(CWIDTH+1,IWIDTH+2,1)\n"
"\t\t longmpyi(mpy_cof_sum, mpy_dif_sum, pre_longmpy);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t longmpy <= pre_longmpy;\n"
"\n"
"\n"
"\t\tabs_mpy #(CWIDTH,IWIDTH+1,1)\n"
"\t\t mpy_pipe_outi(mpy_pipe_vc, mpy_pipe_vd, pre_mpy_pipe_out);\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (mpy_pipe_v)\n"
"\t\t mpy_pipe_out <= pre_mpy_pipe_out;\n"
"`endif\n");

	fprintf(fp,"\n"
"\t\treg\tsigned\t[((IWIDTH+1)+(CWIDTH)-1):0] rp_one,\n"
"\t\t\t\t\t\t\trp2_one, rp_two;\n"
"\t\treg\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0] rp_three;\n"
"\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (!ce_phase) // 1.5 clock\n"
"\t\t\trp_one <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce) // two clocks\n"
"\t\t\trp_two <= mpy_pipe_out;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce) // Second clock\n"
"\t\t\trp_three<= longmpy;\n"
"\t\talways @(posedge i_clk)\n"
"\t\tif (i_ce)\n"
"\t\t\trp2_one<= rp_one;\n"
"\n"
"\t\tassign p_one = rp2_one;\n"
"\t\tassign p_two = rp_two;\n"
"\t\tassign p_three= rp_three;\n"
"\n");

	/////////////////////////
	///
	/// Three clocks per CE, so CE, no-ce, no-ce*, CE
	///
	fprintf(fp,
"\tend else if (CKPCE <= 2'b11)\n\tbegin : CKPCE_THREE\n");

	fprintf(fp,
"\t\t// Coefficient multiply inputs\n"
"\t\treg\t\t[3*(CWIDTH+1)-1:0]\tmpy_pipe_c;\n"
"\t\t// Data multiply inputs\n"
"\t\treg\t\t[3*(IWIDTH+2)-1:0]\tmpy_pipe_d;\n"
"\t\twire\tsigned [(CWIDTH):0] mpy_pipe_vc;\n"
"\t\twire\tsigned [(IWIDTH+1):0] mpy_pipe_vd;\n"
"\n"
"\t\tassign\tmpy_pipe_vc = mpy_pipe_c[3*(CWIDTH+1)-1:2*(CWIDTH+1)];\n"
"\t\tassign\tmpy_pipe_vd = mpy_pipe_d[3*(IWIDTH+2)-1:2*(IWIDTH+2)];\n"
"\n"
"\t\treg\t\t\tmpy_pipe_v;\n"
"\t\treg\t\t[2:0]\tce_phase;\n"
"\n"
"\t\treg\tsigned [ (CWIDTH+IWIDTH+3)-1:0] mpy_pipe_out;\n"
"\n");
	fprintf(fp,
"\t\tinitial\tce_phase = 3'b011;\n");
	if (async_reset)
		fprintf(fp, "\t\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n");
	else
		fprintf(fp, "\t\talways @(posedge i_clk)\n\t\tif (i_reset)\n");
	fprintf(fp,
"\t\t\tce_phase <= 3'b011;\n"
"\t\telse if (i_ce)\n"
"\t\t\tce_phase <= 3'b000;\n"
"\t\telse if (ce_phase != 3'b011)\n"
"\t\t\tce_phase <= ce_phase + 1'b1;\n"
"\n"
"\t\talways @(*)\n"
"\t\t\tmpy_pipe_v = (i_ce)||(ce_phase < 3'b010);\n"
"\n");

	fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (ce_phase == 3\'b000)\n"
"\t\t\tbegin\n"
"\t\t\t\t// Second clock\n"
"\t\t\t\tmpy_pipe_c[3*(CWIDTH+1)-1:(CWIDTH+1)] <= {\n"
"\t\t\t\t\tir_coef_r[CWIDTH-1], ir_coef_r,\n"
"\t\t\t\t\tir_coef_i[CWIDTH-1], ir_coef_i };\n"
"\t\t\t\tmpy_pipe_c[CWIDTH:0] <= ir_coef_i + ir_coef_r;\n"
"\t\t\t\tmpy_pipe_d[3*(IWIDTH+2)-1:(IWIDTH+2)] <= {\n"
"\t\t\t\t\tr_dif_r[IWIDTH], r_dif_r,\n"
"\t\t\t\t\tr_dif_i[IWIDTH], r_dif_i };\n"
"\t\t\t\tmpy_pipe_d[(IWIDTH+2)-1:0] <= r_dif_r + r_dif_i;\n"
"\n"
"\t\t\tend else if (mpy_pipe_v)\n"
"\t\t\tbegin\n"
"\t\t\t\tmpy_pipe_c[3*(CWIDTH+1)-1:0] <= {\n"
"\t\t\t\t\tmpy_pipe_c[2*(CWIDTH+1)-1:0], {(CWIDTH+1){1\'b0}} };\n"
"\t\t\t\tmpy_pipe_d[3*(IWIDTH+2)-1:0] <= {\n"
"\t\t\t\t\tmpy_pipe_d[2*(IWIDTH+2)-1:0], {(IWIDTH+2){1\'b0}} };\n"
"\t\t\tend\n\n");

	if (formal_property_flag)
		fprintf(fp, "`ifndef\tFORMAL\n");

	fprintf(fp,
"\t\talways @(posedge i_clk)\n"
"\t\t\tif (mpy_pipe_v)\n"
"\t\t\t\tmpy_pipe_out <= mpy_pipe_vc * mpy_pipe_vd;\n"
"\n");

	if (formal_property_flag)
		fprintf(fp,
"`else\t// FORMAL\n"
"\t\twire signed [ (CWIDTH+IWIDTH+3)-1:0] pre_mpy_pipe_out;\n"
"\n"
"\t\tabs_mpy #(CWIDTH+1,IWIDTH+2,1)\n"
"\t\t mpy_pipe_outi(mpy_pipe_vc, mpy_pipe_vd, pre_mpy_pipe_out);\n"
"\t\talways @(posedge i_clk)\n"
"\t\t if (mpy_pipe_v)\n"
"\t\t mpy_pipe_out <= pre_mpy_pipe_out;\n"
"`endif\t// FORMAL\n\n");

fprintf(fp, |
"\t\treg\tsigned\t[((IWIDTH+1)+(CWIDTH)-1):0]\trp_one, rp_two,\n" |
"\t\t\t\t\t\trp2_one, rp2_two;\n" |
"\t\treg\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\trp_three, rp2_three;\n" |
|
"\n"); |
|
fprintf(fp, |
"\t\talways @(posedge i_clk)\n" |
"\t\tif(i_ce)\n" |
"\t\t\trp_one <= mpy_pipe_out[(CWIDTH+IWIDTH):0];\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif(ce_phase == 3'b000)\n" |
"\t\t\trp_two <= mpy_pipe_out[(CWIDTH+IWIDTH):0];\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif(ce_phase == 3'b001)\n" |
"\t\t\trp_three <= mpy_pipe_out;\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\trp2_one<= rp_one;\n" |
"\t\t\trp2_two<= rp_two;\n" |
"\t\t\trp2_three<= rp_three;\n" |
"\t\tend\n"); |
fprintf(fp, |
"\t\tassign p_one\t= rp2_one;\n" |
"\t\tassign p_two\t= rp2_two;\n" |
"\t\tassign\tp_three\t= rp2_three;\n" |
"\n"); |
|
fprintf(fp, |
"\tend endgenerate\n"); |
|
fprintf(fp, |
"\twire\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] w_one, w_two;\n" |
"\tassign\tw_one = { {(2){p_one[((IWIDTH+1)+(CWIDTH)-1)]}}, p_one };\n" |
"\tassign\tw_two = { {(2){p_two[((IWIDTH+1)+(CWIDTH)-1)]}}, p_two };\n" |
"\n"); |
|
fprintf(fp, |
"\t// These values are held in memory and delayed during the\n" |
"\t// multiply. Here, we recover them. During the multiply,\n" |
"\t// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},\n" |
"\t// therefore, the left_x values need to be left shifted by\n" |
"\t// CWIDTH-2 as well. The additional bits come from a sign\n" |
"\t// extension.\n" |
"\twire\taux_s;\n" |
"\twire\tsigned\t[(IWIDTH+CWIDTH):0] left_si, left_sr;\n" |
"\treg\t\t[(2*IWIDTH+2):0] left_saved;\n" |
"\tassign\tleft_sr = { {2{left_saved[2*(IWIDTH+1)-1]}}, left_saved[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1\'b0}} };\n" |
"\tassign\tleft_si = { {2{left_saved[(IWIDTH+1)-1]}}, left_saved[((IWIDTH+1)-1):0], {(CWIDTH-2){1\'b0}} };\n" |
"\tassign\taux_s = left_saved[2*IWIDTH+2];\n" |
"\n" |
"\t(* use_dsp48=\"no\" *)\n" |
"\treg signed [(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;\n" |
"\n"); |
|
fprintf(fp, |
"\tinitial left_saved = 0;\n" |
"\tinitial o_aux = 1\'b0;\n"); |
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
"\t\tbegin\n" |
"\t\t\tleft_saved <= 0;\n" |
"\t\t\to_aux <= 1\'b0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// First clock, recover all values\n" |
"\t\t\tleft_saved <= leftvv;\n" |
"\n" |
"\t\t\t// Second clock, round and latch for final clock\n" |
"\t\t\to_aux <= aux_s;\n" |
"\t\tend\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// These values are IWIDTH+CWIDTH+3 bits wide\n" |
"\t\t\t// although they only need to be (IWIDTH+1)\n" |
"\t\t\t// + (CWIDTH) bits wide. (We've got two\n" |
"\t\t\t// extra bits we need to get rid of.)\n" |
"\n" |
"\t\t\t// These two lines also infer DSP48\'s.\n" |
"\t\t\t// To keep from using extra DSP48 resources,\n" |
"\t\t\t// they are prevented from using DSP48\'s\n" |
"\t\t\t// by the (* use_dsp48 ... *) attribute above.\n" |
"\t\t\tmpy_r <= w_one - w_two;\n" |
"\t\t\tmpy_i <= p_three - w_one - w_two;\n" |
"\t\tend\n" |
"\n"); |
|
fprintf(fp, |
"\t// Round the results\n" |
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;\n\n"); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_r(i_clk, i_ce,\n" |
"\t\t\t\tleft_sr, rnd_left_r);\n\n", |
rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_i(i_clk, i_ce,\n" |
"\t\t\t\tleft_si, rnd_left_i);\n\n", |
rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,\n" |
"\t\t\t\tmpy_r, rnd_right_r);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,\n" |
"\t\t\t\tmpy_i, rnd_right_i);\n\n", rnd_string); |
|
|
fprintf(fp, |
"\t// As a final step, we pack our outputs into two packed two's\n" |
"\t// complement numbers per output word, so that each output word\n" |
"\t// has (2*OWIDTH) bits in it, with the top half being the real\n" |
"\t// portion and the bottom half being the imaginary portion.\n" |
"\tassign\to_left = { rnd_left_r, rnd_left_i };\n" |
"\tassign\to_right= { rnd_right_r,rnd_right_i};\n" |
"\n"); |
|
if (formal_property_flag) { |
fprintf(fp, |
"`ifdef VERILATOR\n" |
"`define FORMAL\n" |
"`endif\n" |
"`ifdef FORMAL\n" |
"\tlocalparam F_LGDEPTH = 3;\n" |
"\tlocalparam F_DEPTH = 5;\n" |
"\tlocalparam [F_LGDEPTH-1:0] F_D = F_DEPTH-1;\n" |
"\n" |
"\treg signed [IWIDTH-1:0] f_dlyleft_r [0:F_DEPTH-1];\n" |
"\treg signed [IWIDTH-1:0] f_dlyleft_i [0:F_DEPTH-1];\n" |
"\treg signed [IWIDTH-1:0] f_dlyright_r [0:F_DEPTH-1];\n" |
"\treg signed [IWIDTH-1:0] f_dlyright_i [0:F_DEPTH-1];\n" |
"\treg signed [CWIDTH-1:0] f_dlycoeff_r [0:F_DEPTH-1];\n" |
"\treg signed [CWIDTH-1:0] f_dlycoeff_i [0:F_DEPTH-1];\n" |
"\treg signed [F_DEPTH-1:0] f_dlyaux;\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_reset)\n" |
"\t\tf_dlyaux <= 0;\n" |
"\telse if (i_ce)\n" |
"\t\tf_dlyaux <= { f_dlyaux[F_DEPTH-2:0], i_aux };\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_ce)\n" |
"\tbegin\n" |
"\t\tf_dlyleft_r[0] <= i_left[ (2*IWIDTH-1):IWIDTH];\n" |
"\t\tf_dlyleft_i[0] <= i_left[ ( IWIDTH-1):0];\n" |
"\t\tf_dlyright_r[0] <= i_right[(2*IWIDTH-1):IWIDTH];\n" |
"\t\tf_dlyright_i[0] <= i_right[( IWIDTH-1):0];\n" |
"\t\tf_dlycoeff_r[0] <= i_coef[ (2*CWIDTH-1):CWIDTH];\n" |
"\t\tf_dlycoeff_i[0] <= i_coef[ ( CWIDTH-1):0];\n" |
"\tend\n" |
"\n" |
"\tgenvar k;\n" |
"\tgenerate for(k=1; k<F_DEPTH; k=k+1)\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tf_dlyleft_r[k] <= f_dlyleft_r[ k-1];\n" |
"\t\t\tf_dlyleft_i[k] <= f_dlyleft_i[ k-1];\n" |
"\t\t\tf_dlyright_r[k] <= f_dlyright_r[k-1];\n" |
"\t\t\tf_dlyright_i[k] <= f_dlyright_i[k-1];\n" |
"\t\t\tf_dlycoeff_r[k] <= f_dlycoeff_r[k-1];\n" |
"\t\t\tf_dlycoeff_i[k] <= f_dlycoeff_i[k-1];\n" |
"\t\tend\n" |
"\n" |
"\tendgenerate\n" |
"\n" |
"`ifdef VERILATOR" |
/* |
"\tgenerate if (CKPCE <= 1)\n" |
"\tbegin\n" |
"\n" |
"\t\t// i_ce is allowed to be anything in this mode\n" |
"\n" |
"\tend else if (CKPCE == 2)\n" |
"\tbegin : F_CKPCE_TWO\n" |
"\n" |
"\t\tassert property (@(posedge i_clk)\n" |
"\t\t i_ce |=> !i_ce);\n" |
"\n" |
"\tend else if (CKPCE == 3)\n" |
"\tbegin : F_CKPCE_THREE\n" |
"\n" |
"\t\tassert property (@(posedge i_clk)\n" |
"\t\t i_ce |=> !i_ce ##1 !i_ce);\n" |
"\n" |
"\tend endgenerate\n" |
*/ |
"\n" |
"`else\n" |
"\talways @(posedge i_clk)\n" |
"\tif ((!$past(i_ce))&&(!$past(i_ce,2))&&(!$past(i_ce,3))\n" |
"\t\t\t&&(!$past(i_ce,4)))\n" |
"\t\tassume(i_ce);\n" |
"\n" |
"\tgenerate if (CKPCE <= 1)\n" |
"\tbegin\n" |
"\n" |
"\t\t// i_ce is allowed to be anything in this mode\n" |
"\n" |
"\tend else if (CKPCE == 2)\n" |
"\tbegin : F_CKPCE_TWO\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t if ($past(i_ce))\n" |
"\t\t assume(!i_ce);\n" |
"\n" |
"\tend else if (CKPCE == 3)\n" |
"\tbegin : F_CKPCE_THREE\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t if (($past(i_ce))||($past(i_ce,2)))\n" |
"\t\t assume(!i_ce);\n" |
"\n" |
"\tend endgenerate\n" |
"`endif" |
"\n" |
"\treg [F_LGDEPTH-1:0] f_startup_counter;\n" |
"\tinitial f_startup_counter = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_reset)\n" |
"\t\tf_startup_counter <= 0;\n" |
"\telse if ((i_ce)&&(!(&f_startup_counter)))\n" |
"\t\tf_startup_counter <= f_startup_counter + 1;\n" |
"\n" |
"\treg signed [IWIDTH:0] f_sumr, f_sumi;\n" |
"\talways @(*)\n" |
"\tbegin\n" |
"\t\tf_sumr = f_dlyleft_r[F_D] + f_dlyright_r[F_D];\n" |
"\t\tf_sumi = f_dlyleft_i[F_D] + f_dlyright_i[F_D];\n" |
"\tend\n" |
"\n" |
"\twire signed [IWIDTH+CWIDTH:0] f_sumrx, f_sumix;\n" |
"\tassign f_sumrx = { {(2){f_sumr[IWIDTH]}}, f_sumr, {(CWIDTH-2){1'b0}} };\n" |
"\tassign f_sumix = { {(2){f_sumi[IWIDTH]}}, f_sumi, {(CWIDTH-2){1'b0}} };\n" |
"\n" |
"\treg signed [IWIDTH:0] f_difr, f_difi;\n" |
"\talways @(*)\n" |
"\tbegin\n" |
"\t\tf_difr = f_dlyleft_r[F_D] - f_dlyright_r[F_D];\n" |
"\t\tf_difi = f_dlyleft_i[F_D] - f_dlyright_i[F_D];\n" |
"\tend\n" |
"\n" |
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_difrx, f_difix;\n" |
"\tassign f_difrx = { {(CWIDTH+2){f_difr[IWIDTH]}}, f_difr };\n" |
"\tassign f_difix = { {(CWIDTH+2){f_difi[IWIDTH]}}, f_difi };\n" |
"\n" |
"\twire signed [IWIDTH+CWIDTH+3-1:0] f_widecoeff_r, f_widecoeff_i;\n" |
"\tassign f_widecoeff_r = {{(IWIDTH+3){f_dlycoeff_r[F_D][CWIDTH-1]}},\n" |
"\t f_dlycoeff_r[F_D] };\n" |
"\tassign f_widecoeff_i = {{(IWIDTH+3){f_dlycoeff_i[F_D][CWIDTH-1]}},\n" |
"\t f_dlycoeff_i[F_D] };\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\tif (f_startup_counter > F_D)\n" |
"\tbegin\n" |
"\t\tassert(left_sr == f_sumrx);\n" |
"\t\tassert(left_si == f_sumix);\n" |
"\t\tassert(aux_s == f_dlyaux[F_D]);\n" |
"\n" |
"\t\tif ((f_difr == 0)&&(f_difi == 0))\n" |
"\t\tbegin\n" |
"\t\t assert(mpy_r == 0);\n" |
"\t\t assert(mpy_i == 0);\n" |
"\t\tend else if ((f_dlycoeff_r[F_D] == 0)\n" |
"\t\t &&(f_dlycoeff_i[F_D] == 0))\n" |
"\t\tbegin\n" |
"\t\t assert(mpy_r == 0);\n" |
"\t\t assert(mpy_i == 0);\n" |
"\t\tend\n" |
"\n" |
"\t\tif ((f_dlycoeff_r[F_D] == 1)&&(f_dlycoeff_i[F_D] == 0))\n" |
"\t\tbegin\n" |
"\t\t assert(mpy_r == f_difrx);\n" |
"\t\t assert(mpy_i == f_difix);\n" |
"\t\tend\n" |
"\n" |
"\t\tif ((f_dlycoeff_r[F_D] == 0)&&(f_dlycoeff_i[F_D] == 1))\n" |
"\t\tbegin\n" |
"\t\t assert(mpy_r == -f_difix);\n" |
"\t\t assert(mpy_i == f_difrx);\n" |
"\t\tend\n" |
"\n" |
"\t\tif ((f_difr == 1)&&(f_difi == 0))\n" |
"\t\tbegin\n" |
"\t\t assert(mpy_r == f_widecoeff_r);\n" |
"\t\t assert(mpy_i == f_widecoeff_i);\n" |
"\t\tend\n" |
"\n" |
"\t\tif ((f_difr == 0)&&(f_difi == 1))\n" |
"\t\tbegin\n" |
"\t\t assert(mpy_r == -f_widecoeff_i);\n" |
"\t\t assert(mpy_i == f_widecoeff_r);\n" |
"\t\tend\n" |
"\tend\n" |
"\n"); |
|
fprintf(fp, |
"\t// Let's see if we can improve our performance at all by\n" |
"\t// moving our test one clock earlier. If nothing else, it should\n" |
"\t// help induction finish one (or more) clocks earlier than\n" |
"\t// otherwise.\n" |
"\n\n" |
"\treg signed [IWIDTH:0] f_predifr, f_predifi;\n" |
"\talways @(*)\n" |
"\tbegin\n" |
"\t\tf_predifr = f_dlyleft_r[F_D-1] - f_dlyright_r[F_D-1];\n" |
"\t\tf_predifi = f_dlyleft_i[F_D-1] - f_dlyright_i[F_D-1];\n" |
"\tend\n" |
"\n" |
"\twire signed [IWIDTH+CWIDTH+1-1:0] f_predifrx, f_predifix;\n" |
"\tassign f_predifrx = { {(CWIDTH){f_predifr[IWIDTH]}}, f_predifr };\n" |
"\tassign f_predifix = { {(CWIDTH){f_predifi[IWIDTH]}}, f_predifi };\n" |
"\n" |
"\treg signed [CWIDTH:0] f_sumcoef;\n" |
"\treg signed [IWIDTH+1:0] f_sumdiff;\n" |
"\talways @(*)\n" |
"\tbegin\n" |
"\t\tf_sumcoef = f_dlycoeff_r[F_D-1] + f_dlycoeff_i[F_D-1];\n" |
"\t\tf_sumdiff = f_predifr + f_predifi;\n" |
"\tend\n" |
"\n" |
"\t// Induction helpers\n" |
"\talways @(posedge i_clk)\n" |
"\tif (f_startup_counter >= F_D)\n" |
"\tbegin\n" |
"\t\tif (f_dlycoeff_r[F_D-1] == 0)\n" |
"\t\t\tassert(p_one == 0);\n" |
"\t\tif (f_dlycoeff_i[F_D-1] == 0)\n" |
"\t\t\tassert(p_two == 0);\n" |
"\n" |
"\t\tif (f_dlycoeff_r[F_D-1] == 1)\n" |
"\t\t\tassert(p_one == f_predifrx);\n" |
"\t\tif (f_dlycoeff_i[F_D-1] == 1)\n" |
"\t\t\tassert(p_two == f_predifix);\n" |
"\n" |
"\t\tif (f_predifr == 0)\n" |
"\t\t\tassert(p_one == 0);\n" |
"\t\tif (f_predifi == 0)\n" |
"\t\t\tassert(p_two == 0);\n" |
"\n" |
"\t\t// verilator lint_off WIDTH\n" |
"\t\tif (f_predifr == 1)\n" |
"\t\t\tassert(p_one == f_dlycoeff_r[F_D-1]);\n" |
"\t\tif (f_predifi == 1)\n" |
"\t\t\tassert(p_two == f_dlycoeff_i[F_D-1]);\n" |
"\t\t// verilator lint_on WIDTH\n" |
"\n" |
"\t\tif (f_sumcoef == 0)\n" |
"\t\t\tassert(p_three == 0);\n" |
"\t\tif (f_sumdiff == 0)\n" |
"\t\t\tassert(p_three == 0);\n" |
"\t\t// verilator lint_off WIDTH\n" |
"\t\tif (f_sumcoef == 1)\n" |
"\t\t\tassert(p_three == f_sumdiff);\n" |
"\t\tif (f_sumdiff == 1)\n" |
"\t\t\tassert(p_three == f_sumcoef);\n" |
"\t\t// verilator lint_on WIDTH\n" |
"`ifdef VERILATOR\n" |
"\t\tassert(p_one == f_predifr * f_dlycoeff_r[F_D-1]);\n" |
"\t\tassert(p_two == f_predifi * f_dlycoeff_i[F_D-1]);\n" |
"\t\tassert(p_three == f_sumdiff * f_sumcoef);\n" |
"`endif // VERILATOR\n" |
"\tend\n\n" |
"`endif // FORMAL\n"); |
} |
|
fprintf(fp, |
"endmodule\n"); |
|
fclose(fp); |
} |
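The generated logic above forms the complex product of the butterfly difference (`r_dif_r`, `r_dif_i`) and the twiddle coefficient (`ir_coef_r`, `ir_coef_i`) with only three real multiplies, fed one after another through a single multiplier as `ce_phase` steps: `p_one` is dr*cr, `p_two` is di*ci and `p_three` is (dr+di)*(cr+ci), after which `mpy_r = w_one - w_two` and `mpy_i = p_three - w_one - w_two`. The C++ sketch below checks that arithmetic on plain integers against the usual four-multiply product. It deliberately ignores the generator's bit widths, pipeline registers and rounding; `Cplx`, `three_mpy` and `four_mpy` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical complex type for this sketch only.
struct Cplx {
	long long re, im;
};

// Three-multiply product of (dr + j*di) and (cr + j*ci), mirroring the
// p_one/p_two/p_three products formed by the generated logic above.
Cplx three_mpy(Cplx d, Cplx c) {
	// In hardware one multiplier forms these in turn, one per clock,
	// as ce_phase advances after i_ce.
	long long p_one   = d.re * c.re;
	long long p_two   = d.im * c.im;
	long long p_three = (d.re + d.im) * (c.re + c.im);

	return Cplx{ p_one - p_two,		// mpy_r = w_one - w_two
		p_three - p_one - p_two };	// mpy_i = p_three - w_one - w_two
}

// Textbook four-multiply product, used as a reference.
Cplx four_mpy(Cplx d, Cplx c) {
	return Cplx{ d.re * c.re - d.im * c.im,
		d.re * c.im + d.im * c.re };
}
```

Trading the fourth multiply for a few extra additions is why the sums `ir_coef_r + ir_coef_i` and `r_dif_r + r_dif_i` appear in the multiply pipeline: each sum needs one bit more than its operands, which is why the `mpy_pipe_c` and `mpy_pipe_d` entries are `CWIDTH+1` and `IWIDTH+2` bits wide.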
/trunk/sw/butterfly.h
0,0 → 1,48
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: butterfly.h |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#ifndef BUTTERFLY_H |
#define BUTTERFLY_H |
|
extern void build_butterfly(const char *fname, int xtracbits, |
ROUND_T rounding, int ckpce = 1, |
const bool async_reset = false); |
|
extern void build_hwbfly(const char *fname, int xtracbits, ROUND_T rounding, |
int ckpce = 3, const bool async_reset= false); |
|
#endif |
/trunk/sw/defaults.h
0,0 → 1,81
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: defaults.h |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#ifndef DEFAULTS_H |
#define DEFAULTS_H |
|
#define DEF_NBITSIN 16 |
#define DEF_COREDIR "fft-core" |
#define DEF_XTRACBITS 4 |
#define DEF_NMPY 0 |
#define DEF_XTRAPBITS 0 |
#define USE_OLD_MULTIPLY false |
|
// To coordinate testing, it helps to have some defines in our header file that |
// are common with the default parameters found within the various subroutines. |
// We'll define those common parameters here. These values, however, have no |
// effect on anything other than bench testing. They do, though, allow us to |
// bench test exact copies of what is going on within the FFT when necessary |
// in order to find problems. |
// First, parameters for the new multiply based upon the bi-multiply structure |
// (2-bits/2-tableau rows at a time). |
#define TST_LONGBIMPY_AW 8 |
#define TST_LONGBIMPY_BW 12 // Leave undefined to match AW |
|
// We also include parameters for the shift add multiply |
#define TST_SHIFTADDMPY_AW 16 |
#define TST_SHIFTADDMPY_BW 20 // Leave undefined to match AW |
|
// Now for parameters matching the butterfly |
#define TST_BUTTERFLY_IWIDTH 16 |
#define TST_BUTTERFLY_CWIDTH 20 |
#define TST_BUTTERFLY_OWIDTH (TST_BUTTERFLY_IWIDTH+1) |
|
// Now for parameters matching the qtrstage |
#define TST_QTRSTAGE_IWIDTH 16 |
#define TST_QTRSTAGE_LGWIDTH 8 |
|
// Parameters for the dblstage |
#define TST_DBLSTAGE_IWIDTH 16 |
#define TST_DBLSTAGE_SHIFT 0 |
|
// Now for parameters matching the dblreverse stage |
#define TST_DBLREVERSE_LGSIZE 5 |
|
static const bool formal_property_flag = true; |
|
#endif |
/trunk/sw/fftgen.cpp
2,7 → 2,7
// |
// Filename: fftgen.cpp |
// |
// Project: A Doubletime Pipelined FFT |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: This is the core generator for the project. Every part |
// and piece of this project begins and ends in this program. |
27,7 → 27,7
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2017, Gisselquist Technology, LLC |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
67,9 → 67,6
|
#if _MSC_VER <= 1700 |
|
long long llround(double d) { |
if (d<0) return -(long long)(-d+0.5); |
else return (long long)(d+0.5); } |
int lstat(const char *filename, struct stat *buf) { return 1; }; |
#define S_ISDIR(A) 0 |
|
97,123 → 94,16
#include <ctype.h> |
#include <assert.h> |
|
#define DEF_NBITSIN 16 |
#define DEF_COREDIR "fft-core" |
#define DEF_XTRACBITS 4 |
#define DEF_NMPY 0 |
#define DEF_XTRAPBITS 0 |
#define USE_OLD_MULTIPLY false |
#include "defaults.h" |
#include "legal.h" |
#include "rounding.h" |
#include "fftlib.h" |
#include "bldstage.h" |
#include "bitreverse.h" |
#include "softmpy.h" |
#include "butterfly.h" |
|
// To coordinate testing, it helps to have some defines in our header file that |
// are common with the default parameters found within the various subroutines. |
// We'll define those common parameters here. These values, however, have no |
// effect on anything other than bench testing. They do, though, allow us to |
// bench test exact copies of what is going on within the FFT when necessary |
// in order to find problems. |
// First, parameters for the new multiply based upon the bi-multiply structure |
// (2-bits/2-tableau rows at a time). |
#define TST_LONGBIMPY_AW 16 |
#define TST_LONGBIMPY_BW 20 // Leave undefined to match AW |
|
// We also include parameters for the shift add multiply |
#define TST_SHIFTADDMPY_AW 16 |
#define TST_SHIFTADDMPY_BW 20 // Leave undefined to match AW |
|
// Now for parameters matching the butterfly |
#define TST_BUTTERFLY_IWIDTH 16 |
#define TST_BUTTERFLY_CWIDTH 20 |
#define TST_BUTTERFLY_OWIDTH 17 |
|
// Now for parameters matching the qtrstage |
#define TST_QTRSTAGE_IWIDTH 16 |
#define TST_QTRSTAGE_LGWIDTH 8 |
|
// Parameters for the dblstage |
#define TST_DBLSTAGE_IWIDTH 16 |
#define TST_DBLSTAGE_SHIFT 0 |
|
// Now for parameters matching the dblreverse stage |
#define TST_DBLREVERSE_LGSIZE 5 |
|
typedef enum { |
RND_TRUNCATE, RND_FROMZERO, RND_HALFUP, RND_CONVERGENT |
} ROUND_T; |
|
const char cpyleft[] = |
"////////////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Copyright (C) 2015-2017, Gisselquist Technology, LLC\n" |
"//\n" |
"// This program is free software (firmware): you can redistribute it and/or\n" |
"// modify it under the terms of the GNU General Public License as published\n" |
"// by the Free Software Foundation, either version 3 of the License, or (at\n" |
"// your option) any later version.\n" |
"//\n" |
"// This program is distributed in the hope that it will be useful, but WITHOUT\n" |
"// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or\n" |
"// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License\n" |
"// for more details.\n" |
"//\n" |
"// You should have received a copy of the GNU General Public License along\n" |
"// with this program. (It's in the $(ROOT)/doc directory, run make with no\n" |
"// target there if the PDF file isn\'t present.) If not, see\n" |
"// <http://www.gnu.org/licenses/> for a copy.\n" |
"//\n" |
"// License: GPL, v3, as defined and found on www.gnu.org,\n" |
"// http://www.gnu.org/licenses/gpl.html\n" |
"//\n" |
"//\n" |
"////////////////////////////////////////////////////////////////////////////////\n"; |
const char prjname[] = "A Doubletime Pipelined FFT"; |
const char creator[] = "// Creator: Dan Gisselquist, Ph.D.\n" |
"// Gisselquist Technology, LLC\n"; |
|
int lgval(int vl) { |
int lg; |
|
for(lg=1; (1<<lg) < vl; lg++) |
; |
return lg; |
} |
|
int nextlg(int vl) { |
int r; |
|
for(r=1; r<vl; r<<=1) |
; |
return r; |
} |
|
int bflydelay(int nbits, int xtra) { |
int cbits = nbits + xtra; |
int delay; |
|
if (USE_OLD_MULTIPLY) { |
if (nbits+1<cbits) |
delay = nbits+4; |
else |
delay = cbits+3; |
} else { |
int na=nbits+2, nb=cbits+1; |
if (nb<na) { |
int tmp = nb; |
nb = na; na = tmp; |
		} |
		delay = ((na)/2+(na&1)+2); |
} |
return delay; |
} |
|
int lgdelay(int nbits, int xtra) { |
// The butterfly code needs to compare a valid address, of this |
// many bits, with an address two greater. This guarantees we |
// have enough bits for that comparison. We'll also end up with |
// more storage space to look for these values, but without a |
// redesign that's just what we'll deal with. |
return lgval(bflydelay(nbits, xtra)+3); |
} |
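As a worked example of `bflydelay()` and `lgdelay()` above, take a 16-bit input with 4 extra coefficient bits (the `DEF_NBITSIN` and `DEF_XTRACBITS` values) and `USE_OLD_MULTIPLY` false. Then na = nbits+2 = 18 and nb = cbits+1 = 21, so `bflydelay()` returns 18/2 + 0 + 2 = 11, and `lgdelay()` returns `lgval(11+3)` = 4, since 2^4 = 16 is the first power of two that is at least 14. The sketch below restates only the new-multiply branch of those routines so the arithmetic can be checked in isolation; it is a copy for illustration, not a replacement for the generator's code.

```cpp
#include <cassert>

// Smallest lg >= 1 such that (1<<lg) >= vl, i.e. ceil(log2(vl)), minimum 1.
int lgval(int vl) {
	int lg;
	for (lg = 1; (1 << lg) < vl; lg++)
		;
	return lg;
}

// Butterfly delay for the newer multiply (USE_OLD_MULTIPLY false): it
// depends only on the narrower of the two values na and nb.
int bflydelay(int nbits, int xtra) {
	int cbits = nbits + xtra;
	int na = nbits + 2, nb = cbits + 1;
	if (nb < na) {	// make na the narrower of the two
		int tmp = nb;
		nb = na;
		na = tmp;
	}
	return na / 2 + (na & 1) + 2;
}

// Address bits needed to compare a valid address with one two greater.
int lgdelay(int nbits, int xtra) {
	return lgval(bflydelay(nbits, xtra) + 3);
}
```

With a negative `xtra`, for example `bflydelay(16, -4)`, nb = 13 becomes the narrower value and the delay drops to 13/2 + 1 + 2 = 9.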
|
void build_truncator(const char *fname) { |
printf("TRUNCATING!\n"); |
void build_dblquarters(const char *fname, ROUND_T rounding, const bool async_reset=false, const bool dbg=false) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
220,359 → 110,6
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: truncate.v\n" |
"// \n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: Truncation is one of several options that can be used\n" |
"// internal to the various FFT stages to drop bits from one \n" |
"// stage to the next. In general, it is the simplest method\n" |
"// of dropping bits, since it requires only a bit selection.\n" |
"//\n" |
"// This form of rounding isn\'t really that great for FFTs,\n" |
"// since it tends to produce a DC bias in the result. (Other\n" |
"// less pronounced biases may also exist.)\n" |
"//\n" |
"// This particular version also registers the output with the\n" |
"// clock, so there will be a delay of one going through this\n" |
"// module. This will keep it in line with the other forms of\n" |
"// rounding that can be used.\n" |
"//\n" |
"//\n%s" |
"//\n", |
prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module truncate(i_clk, i_ce, i_val, o_val);\n" |
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n" |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n" |
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_val <= i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\n" |
"endmodule\n"); |
} |
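The bit selection `i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)]` in `truncate` amounts to an arithmetic right shift by d = IWID-SHIFT-OWID bits, that is, a floor division by 2^d. The C++ sketch below models that on plain integers to make the DC bias mentioned in the purpose text concrete. `floor_shift`, `truncate_model` and `truncate_bias_16_to_8` are hypothetical helpers, and the model assumes the result already fits in OWID bits (the hardware would simply wrap).

```cpp
#include <cassert>

// Floor of v / 2^d for d >= 0, written out so it does not depend on
// how >> treats negative values on pre-C++20 compilers.
long long floor_shift(long long v, int d) {
	long long den = 1LL << d;
	long long q = v / den;		// C++ division truncates toward zero...
	if ((v % den != 0) && (v < 0))
		q -= 1;			// ...so step down to the floor
	return q;
}

// Value selected by i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)], assuming
// it fits in OWID bits.
long long truncate_model(long long v, int iwid, int owid, int shift) {
	return floor_shift(v, iwid - shift - owid);
}

// Mean error, in input LSBs, of truncating every 16-bit input to 8 bits.
double truncate_bias_16_to_8() {
	long long sum = 0;
	for (long long v = -32768; v <= 32767; v++)
		sum += truncate_model(v, 16, 8, 0) * 256 - v;
	return double(sum) / 65536.0;
}
```

Because every input moves toward negative infinity, the mean error in this model is -127.5 input LSBs, just under half an output LSB, and it never averages out.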
|
|
void build_roundhalfup(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: roundhalfup.v\n" |
"// \n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: Rounding half up is the way I was always taught to round in\n" |
"// school. A one half value is added to the result, and then\n" |
"// the result is truncated. When used in an FFT, this produces\n" |
"// less bias than the truncation method, although a bias still\n" |
"// tends to remain.\n" |
"//\n" |
"//\n%s" |
"//\n", |
prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module roundhalfup(i_clk, i_ce, i_val, o_val);\n" |
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n" |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n" |
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n" |
"\n" |
"\t// Let's deal with two cases to be as general as we can be here\n" |
"\t//\n" |
"\t// 1. The desired output would lose no bits at all\n" |
"\t// 2. One or more bits would be dropped, so the rounding is simply\n" |
"\t//\t\ta matter of adding one to the bit about to be dropped,\n" |
"\t//\t\tmoving all halfway and above numbers up to the next\n" |
"\t//\t\tvalue.\n" |
"\tgenerate\n" |
"\tif (IWID-SHIFT == OWID)\n" |
"\tbegin // No truncation or rounding, output drops no bits\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n" |
"\n" |
"\tend else // if (IWID-SHIFT-1 >= OWID)\n" |
"\tbegin // Output drops one or more bits: add one or ... not.\n" |
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n" |
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse\n" |
"\t\t\t\t\to_val <= rounded_up; // round half up\n" |
"\t\t\tend\n" |
"\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n" |
"endmodule\n"); |
} |
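In `roundhalfup`, adding `first_lost_bit` to the truncated value is the same as computing floor((v + 2^(d-1)) / 2^d), where d = IWID-SHIFT-OWID is the number of dropped bits. The C++ sketch below models that on plain integers. `floor_shift`, `roundhalfup_model` and `roundhalfup_bias_16_to_8` are hypothetical helpers, and the model ignores the wrap the OWID-bit output register would apply when rounding up overflows.

```cpp
#include <cassert>

// Floor of v / 2^d for d >= 0, independent of how >> treats negatives.
long long floor_shift(long long v, int d) {
	long long den = 1LL << d;
	long long q = v / den;		// truncates toward zero...
	if ((v % den != 0) && (v < 0))
		q -= 1;			// ...so step down to the floor
	return q;
}

// Model of roundhalfup.v: truncated value plus the first lost bit.
long long roundhalfup_model(long long v, int iwid, int owid, int shift) {
	int d = iwid - shift - owid;
	if (d == 0)
		return v;	// IWID-SHIFT == OWID: no bits dropped
	return floor_shift(v + (1LL << (d - 1)), d);
}

// Mean error, in input LSBs, of rounding every 16-bit input to 8 bits.
double roundhalfup_bias_16_to_8() {
	long long sum = 0;
	for (long long v = -32768; v <= 32767; v++)
		sum += roundhalfup_model(v, 16, 8, 0) * 256 - v;
	return double(sum) / 65536.0;
}
```

Ties always move toward positive infinity (-128 becomes 0, while 128 becomes 1), so this model still shows a small positive average error of +0.5 input LSBs, in line with the remark that a bias tends to remain.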
|
void build_roundfromzero(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: roundfromzero.v\n" |
"// \n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: Rounding away from zero is one of several options that\n" |
"// can be used internal to the various FFT stages to drop bits\n" |
"// from one stage to the next. Values are rounded to the\n" |
"// nearest output value, and a value lying exactly halfway\n" |
"// between two output values is rounded away from zero.\n" |
"//\n" |
"// This particular version also registers the output with the\n" |
"// clock, so there will be a delay of one going through this\n" |
"// module. This will keep it in line with the other forms of\n" |
"// rounding that can be used.\n" |
"//\n" |
"//\n%s" |
"//\n", |
prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module roundfromzero(i_clk, i_ce, i_val, o_val);\n" |
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n" |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n" |
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n" |
"\n" |
"\t// Let's deal with three cases to be as general as we can be here\n" |
"\t//\n" |
"\t//\t1. The desired output would lose no bits at all\n" |
"\t//\t2. One bit would be dropped, so the rounding is simply\n" |
"\t//\t\tmoving the value away from zero in\n" |
"\t//\t\tcases of being halfway between two. If identically\n" |
"\t//\t\tequal to a number, we just leave it as is.\n" |
"\t//\t3. Two or more bits would be dropped. In this case, we round\n" |
"\t//\t\tnormally unless we are rounding a value of exactly\n" |
"\t//\t\thalfway between the two. In the halfway case, we\n" |
"\t//\t\tround away from zero.\n" |
"\tgenerate\n" |
"\tif (IWID == OWID) // In this case, the shift is irrelevant and\n" |
"\tbegin // cannot be applied. No truncation or rounding takes\n" |
"\t// effect here.\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-1):0];\n" |
"\n" |
"\tend else if (IWID-SHIFT == OWID)\n" |
"\tbegin // No truncation or rounding, output drops no bits\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n" |
"\n" |
"\tend else if (IWID-SHIFT-1 == OWID)\n" |
"\tbegin // Output drops one bit, can only add one or ... not.\n" |
"\t\twire\t[(OWID-1):0]\ttruncated_value, rounded_up;\n" |
"\t\twire\t\t\tsign_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tfirst_lost_bit = i_val[0];\n" |
"\t\tassign\tsign_bit = i_val[(IWID-1)];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse if (sign_bit)\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse\n" |
"\t\t\t\t\to_val <= rounded_up;\n" |
"\t\t\tend\n" |
"\n" |
"\tend else // If there's more than one bit we are dropping\n" |
"\tbegin\n" |
"\t\twire\t[(OWID-1):0]\ttruncated_value, rounded_up;\n" |
"\t\twire\t\t\tsign_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n" |
"\t\tassign\tsign_bit = i_val[(IWID-1)];\n" |
"\n" |
"\t\twire\t[(IWID-SHIFT-OWID-2):0]\tother_lost_bits;\n" |
"\t\tassign\tother_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse if (|other_lost_bits) // Round up to\n" |
"\t\t\t\t\to_val <= rounded_up; // closest value\n" |
"\t\t\t\telse if (sign_bit)\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse\n" |
"\t\t\t\t\to_val <= rounded_up;\n" |
"\t\t\tend\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n" |
"endmodule\n"); |
} |
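The three cases in `roundfromzero` reduce to one rule: round to the nearest output value, and break an exact tie away from zero. For negative inputs, the truncated (floor) value is already the one farther from zero. The C++ sketch below models that rule on plain integers. `floor_shift`, `roundfromzero_model` and `roundfromzero_bias_16_to_8` are hypothetical helpers; the model assumes IWID-SHIFT >= OWID, as the module does, and ignores any wrap of the OWID-bit output register on overflow.

```cpp
#include <cassert>

// Floor of v / 2^d for d >= 0, independent of how >> treats negatives.
long long floor_shift(long long v, int d) {
	long long den = 1LL << d;
	long long q = v / den;		// truncates toward zero...
	if ((v % den != 0) && (v < 0))
		q -= 1;			// ...so step down to the floor
	return q;
}

// Model of roundfromzero.v with d = IWID-SHIFT-OWID dropped bits.
long long roundfromzero_model(long long v, int iwid, int owid, int shift) {
	int d = iwid - shift - owid;
	if ((iwid == owid) || (d == 0))
		return v;			// no bits dropped
	long long half = 1LL << (d - 1);
	long long t = floor_shift(v, d);	// truncated_value
	long long rem = v - t * (1LL << d);	// dropped bits, 0 <= rem < 2*half
	if (rem < half)
		return t;			// first_lost_bit clear: truncate
	if (rem > half)
		return t + 1;			// a lower lost bit is set: round up
	return (v < 0) ? t : t + 1;		// exact tie: away from zero
}

// Mean error, in input LSBs, of rounding every 16-bit input to 8 bits.
double roundfromzero_bias_16_to_8() {
	long long sum = 0;
	for (long long v = -32768; v <= 32767; v++)
		sum += roundfromzero_model(v, 16, 8, 0) * 256 - v;
	return double(sum) / 65536.0;
}
```

Over the full 16-bit range the positive and negative ties cancel, so this model's mean error comes out at exactly zero. For inputs that are not symmetric about zero, that cancellation is not guaranteed.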
|
void build_convround(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: convround.v\n" |
"// \n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: A convergent rounding routine, also known as banker\'s\n" |
"// rounding, Dutch rounding, Gaussian rounding, unbiased\n" |
"// rounding, or ... more, at least according to Wikipedia.\n" |
"//\n" |
"// This form of rounding works by rounding, when the direction is in\n" |
"// question, towards the nearest even value.\n" |
"//\n" |
"//\n%s" |
"//\n", |
prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module convround(i_clk, i_ce, i_val, o_val);\n" |
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n" |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n" |
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n" |
"\n" |
"\t// Let's deal with three cases to be as general as we can be here\n" |
"\t//\n" |
"\t//\t1. The desired output would lose no bits at all\n" |
"\t//\t2. One bit would be dropped, so the rounding is simply\n" |
"\t//\t\tadjusting the value to be the nearest even number in\n" |
"\t//\t\tcases of being halfway between two. If identically\n" |
"\t//\t\tequal to a number, we just leave it as is.\n" |
"\t//\t3. Two or more bits would be dropped. In this case, we round\n" |
"\t//\t\tnormally unless we are rounding a value of exactly\n" |
"\t//\t\thalfway between the two. In the halfway case we round\n" |
"\t//\t\tto the nearest even number.\n" |
"\tgenerate\n" |
// What if IWID < OWID? We should expand here ... somehow |
"\tif (IWID == OWID) // In this case, the shift is irrelevant and\n" |
"\tbegin // cannot be applied. No truncation or rounding takes\n" |
"\t// effect here.\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-1):0];\n" |
"\n" |
// What if IWID-SHIFT < OWID? Shouldn't we also shift here as well? |
"\tend else if (IWID-SHIFT == OWID)\n" |
"\tbegin // No truncation or rounding, output drops no bits\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n" |
"\n" |
"\tend else if (IWID-SHIFT-1 == OWID)\n" |
// Is there any way to limit the number of bits that are examined here, for the |
// purpose of simplifying/reducing logic? I mean, if we go from 32 to 16 bits, |
// must we check all 15 bits for equality to zero? |
"\tbegin // Output drops one bit, can only add one or ... not.\n" |
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n" |
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tlast_valid_bit = truncated_value[0];\n" |
"\t\tassign\tfirst_lost_bit = i_val[0];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse if (last_valid_bit)// Round up to nearest\n" |
"\t\t\t\t\to_val <= rounded_up; // even value\n" |
"\t\t\t\telse // else round down to the nearest\n" |
"\t\t\t\t\to_val <= truncated_value; // even value\n" |
"\t\t\tend\n" |
"\n" |
"\tend else // If there's more than one bit we are dropping\n" |
"\tbegin\n" |
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n" |
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tlast_valid_bit = truncated_value[0];\n" |
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n" |
"\n" |
"\t\twire\t[(IWID-SHIFT-OWID-2):0]\tother_lost_bits;\n" |
"\t\tassign\tother_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse if (|other_lost_bits) // Round up to\n" |
"\t\t\t\t\to_val <= rounded_up; // closest value\n" |
"\t\t\t\telse if (last_valid_bit) // Round up to\n" |
"\t\t\t\t\to_val <= rounded_up; // nearest even\n" |
"\t\t\t\telse // else round down to nearest even\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\tend\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n" |
"endmodule\n"); |
} |
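Cases 2 and 3 of `convround.v` reduce to one rule. Non-halfway values round to the closest output. Exact ties go to whichever neighbor is even, which is what `last_valid_bit` selects. The sketch below is a minimal software model of that rule, useful as a reference when checking simulation results. It assumes SHIFT=0 and no OWID overflow, and `convround_model` is a hypothetical name rather than anything the generator emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model (not part of the generator) of the logic that
// convround.v implements when one or more bits are dropped (cases 2 and 3).
// `dropped` stands in for IWID-SHIFT-OWID; SHIFT=0 and no OWID overflow are
// assumed. Ties go to the nearest even value (banker's rounding).
int64_t convround_model(int64_t val, int dropped) {
    assert(dropped >= 1 && dropped < 63);
    const uint64_t mask = (uint64_t{1} << dropped) - 1;
    const uint64_t lost = static_cast<uint64_t>(val) & mask;  // dropped LSBs
    const uint64_t half = uint64_t{1} << (dropped - 1);      // first_lost_bit alone
    const int64_t truncated =
        (val - static_cast<int64_t>(lost)) / (int64_t{1} << dropped);
    const bool last_valid_bit = (static_cast<uint64_t>(truncated) & 1) != 0;

    if (lost < half) return truncated;        // round down / truncate
    if (lost > half) return truncated + 1;    // round up to closest value
    return last_valid_bit ? truncated + 1     // tie: round up to nearest even
                          : truncated;        // tie: round down to nearest even
}
```

With one bit dropped, 5 (2.5) rounds to 2 and 7 (3.5) rounds to 4. This avoids the upward bias that round-half-up accumulates over many samples.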
|
void build_quarters(const char *fname, ROUND_T rounding, bool dbg=false) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
const char *rnd_string; |
if (rounding == RND_TRUNCATE) |
rnd_string = "truncate"; |
585,16 → 122,16
|
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
SLASHLINE |
"//\n" |
"// Filename: qtrstage%s.v\n" |
"// \n" |
"// Project: %s\n" |
"// Filename:\tqtrstage%s.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose: This file encapsulates the 4 point stage of a decimation in\n" |
"// frequency FFT. This particular implementation is optimized\n" |
"// so that all of the multiplies are accomplished by additions\n" |
"// and multiplexers only.\n" |
"// so that all of the multiplies are accomplished by additions and\n" |
"// multiplexers only.\n" |
"//\n" |
"//\n%s" |
"//\n", |
602,19 → 139,25
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
|
std::string resetw("i_reset"); |
if (async_reset) |
resetw = std::string("i_areset_n"); |
|
fprintf(fp, |
"module\tqtrstage%s(i_clk, i_rst, i_ce, i_sync, i_data, o_data, o_sync%s);\n" |
"module\tqtrstage%s(i_clk, %s, i_ce, i_sync, i_data, o_data, o_sync%s);\n" |
"\tparameter IWIDTH=%d, OWIDTH=IWIDTH+1;\n" |
"\t// Parameters specific to the core that should be changed when this\n" |
"\t// core is built ... Note that the minimum LGSPAN is 2. Smaller \n" |
"\t// core is built ... Note that the minimum LGSPAN is 2. Smaller\n" |
"\t// spans must use the fftdoubles stage.\n" |
"\tparameter\tLGWIDTH=%d, ODD=0, INVERSE=0,SHIFT=0;\n" |
"\tinput\t i_clk, i_rst, i_ce, i_sync;\n" |
"\tinput\t i_clk, %s, i_ce, i_sync;\n" |
"\tinput\t [(2*IWIDTH-1):0] i_data;\n" |
"\toutput\treg [(2*OWIDTH-1):0] o_data;\n" |
"\toutput\treg o_sync;\n" |
"\t\n", (dbg)?"_dbg":"", (dbg)?", o_dbg":"", TST_QTRSTAGE_IWIDTH, |
TST_QTRSTAGE_LGWIDTH); |
"\t\n", (dbg)?"_dbg":"", |
resetw.c_str(), |
(dbg)?", o_dbg":"", TST_QTRSTAGE_IWIDTH, |
TST_QTRSTAGE_LGWIDTH, resetw.c_str()); |
if (dbg) { fprintf(fp, "\toutput\twire\t[33:0]\t\t\to_dbg;\n" |
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_data[(2*OWIDTH-1):(2*OWIDTH-16)],\n" |
"\t\t\t\t\to_data[(OWIDTH-1):(OWIDTH-16)] };\n" |
675,9 → 218,16
*/ |
fprintf(fp, |
"\tinitial wait_for_sync = 1\'b1;\n" |
"\tinitial iaddr = 0;\n" |
"\tinitial iaddr = 0;\n"); |
if (async_reset) |
fprintf(fp, |
"\talways @(posedge i_clk, negedge i_areset_n)\n" |
"\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tif (i_reset)\n"); |
fprintf(fp, |
"\t\tbegin\n" |
"\t\t\twait_for_sync <= 1\'b1;\n" |
"\t\t\tiaddr <= 0;\n" |
685,7 → 235,7
"\t\tbegin\n" |
"\t\t\tiaddr <= iaddr + { {(LGWIDTH-1){1\'b0}}, 1\'b1 };\n" |
"\t\t\twait_for_sync <= 1\'b0;\n" |
"\t\tend\n" |
"\t\tend\n\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\timem <= i_data;\n" |
694,9 → 244,17
"\t// Note that we don\'t check on wait_for_sync or i_sync here.\n" |
"\t// Why not? Because iaddr will always be zero until after the\n" |
"\t// first i_ce, so we are safe.\n" |
"\tinitial pipeline = 4\'h0;\n" |
"\tinitial pipeline = 4\'h0;\n"); |
if (async_reset) |
fprintf(fp, |
"\talways\t@(posedge i_clk, negedge i_areset_n)\n" |
"\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, |
"\talways\t@(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tif (i_reset)\n"); |
|
fprintf(fp, |
"\t\t\tpipeline <= 4\'h0;\n" |
"\t\telse if (i_ce) // is our pipeline process full? Which stages?\n" |
"\t\t\tpipeline <= { pipeline[2:0], iaddr[0] };\n\n"); |
752,9 → 310,17
"\t// Don\'t forget in the sync check that we are running\n" |
"\t// at two clocks per sample. Thus we need to\n" |
"\t// produce a sync every 2^(LGWIDTH-1) clocks.\n" |
"\tinitial\to_sync = 1\'b0;\n" |
"\tinitial\to_sync = 1\'b0;\n"); |
|
if (async_reset) |
fprintf(fp, |
"\talways\t@(posedge i_clk, negedge i_areset_n)\n" |
"\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, |
"\talways\t@(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tif (i_reset)\n"); |
fprintf(fp, |
"\t\t\to_sync <= 1\'b0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\to_sync <= &(~iaddr[(LGWIDTH-2):3]) && (iaddr[2:0] == 3'b101);\n"); |
761,7 → 327,7
fprintf(fp, "endmodule\n"); |
} |
|
void build_dblstage(const char *fname, ROUND_T rounding, const bool dbg = false) { |
void build_snglquarters(const char *fname, ROUND_T rounding, const bool async_reset=false, const bool dbg=false) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
768,7 → 334,6
perror("O/S Err was:"); |
return; |
} |
|
const char *rnd_string; |
if (rounding == RND_TRUNCATE) |
rnd_string = "truncate"; |
781,995 → 346,393
|
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
SLASHLINE |
"//\n" |
"// Filename: dblstage%s.v\n" |
"// Filename:\tqtrstage%s.v\n" |
"//\n" |
"// Project: %s\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose: This is part of an FPGA implementation that will process\n" |
"// the final stage of a decimate-in-frequency FFT, running\n" |
"// through the data at two samples per clock. If you notice\n" |
"// from the derivation of an FFT, the only time both even and\n" |
"// odd samples are used at the same time is in this stage.\n" |
"// Therefore, other than this stage and these twiddles, all of\n" |
"// the other stages can run two stages at a time at one sample\n" |
"// per clock.\n" |
"// Purpose: This file encapsulates the 4 point stage of a decimation in\n" |
"// frequency FFT. This particular implementation is optimized\n" |
"// so that all of the multiplies are accomplished by additions and\n" |
"// multiplexers only.\n" |
"//\n" |
"// In this implementation, the output is valid one clock after\n" |
"// the input is valid. The output also accumulates one bit\n" |
"// above and beyond the number of bits in the input.\n" |
"// \n" |
"// i_clk A system clock\n" |
"// i_rst A synchronous reset\n" |
"// i_ce Circuit enable--nothing happens unless this line is high\n" |
"// i_sync A synchronization signal, high once per FFT at the start\n" |
"// i_left The first (even) complex sample input. The higher order\n" |
"// bits contain the real portion, low order bits the\n" |
"// imaginary portion, all in two\'s complement.\n" |
"// i_right The next (odd) complex sample input, same format as\n" |
"// i_left.\n" |
"// o_left The first (even) complex output.\n" |
"// o_right The next (odd) complex output.\n" |
"// o_sync Output synchronization signal.\n" |
"// Operation:\n" |
"// The operation of this stage is identical to the regular stages of\n" |
"// the FFT (see them for details), with one additional and critical\n" |
"// difference: this stage doesn't require any hardware multiplication.\n" |
"// The multiplies within it may all be accomplished using additions and\n" |
"// subtractions.\n" |
"//\n" |
"// Let's see how this is done. Given x[n] and x[n+2], because that's the\n" |
"// stage we are working on, with i_sync true for x[0] being input,\n" |
"// produce the output:\n" |
"//\n" |
"// y[n ] = x[n] + x[n+2]\n" |
"// y[n+2] = (x[n] - x[n+2]) * e^{-j2pi n/4} (forward transform)\n" |
"// = (x[n] - x[n+2]) * (-j)^n\n" |
"//\n" |
"// y[n].r = x[n].r + x[n+2].r (This is the easy part)\n" |
"// y[n].i = x[n].i + x[n+2].i\n" |
"//\n" |
"// y[2].r = x[0].r - x[2].r\n" |
"// y[2].i = x[0].i - x[2].i\n" |
"//\n" |
"// y[3].r = (x[1].i - x[3].i) (forward transform)\n" |
"// y[3].i = - (x[1].r - x[3].r)\n" |
"//\n" |
"// y[3].r = - (x[1].i - x[3].i) (inverse transform)\n" |
"// y[3].i = (x[1].r - x[3].r) (INVERSE = 1)\n" |
// "//\n" |
// "// When the FFT is run in the two samples per clock mode, this quarter\n" |
// "// stage will operate on either x[0] and x[2] (ODD = 0), or x[1] and\n" |
// "// x[3] (ODD = 1). In all other cases, it will operate on all four\n" |
// "// values.\n" |
"//\n%s" |
"//\n", (dbg)?"_dbg":"", prjname, creator); |
|
"//\n", |
(dbg)?"_dbg":"", prjname, creator); |
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
|
std::string resetw("i_reset"); |
if (async_reset) |
resetw = std::string("i_areset_n"); |
|
fprintf(fp, |
"module\tdblstage%s(i_clk, i_rst, i_ce, i_sync, i_left, i_right, o_left, o_right, o_sync%s);\n" |
"\tparameter\tIWIDTH=%d,OWIDTH=IWIDTH+1, SHIFT=%d;\n" |
"\tinput\t\ti_clk, i_rst, i_ce, i_sync;\n" |
"\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n" |
"\toutput\treg\t[(2*OWIDTH-1):0]\to_left, o_right;\n" |
"\toutput\treg\t\t\to_sync;\n" |
"\n", (dbg)?"_dbg":"", (dbg)?", o_dbg":"", |
TST_DBLSTAGE_IWIDTH, TST_DBLSTAGE_SHIFT); |
|
"module\tqtrstage%s(i_clk, %s, i_ce, i_sync, i_data, o_data, o_sync%s);\n" |
"\tparameter IWIDTH=%d, OWIDTH=IWIDTH+1;\n" |
"\tparameter\tLGWIDTH=%d, INVERSE=0,SHIFT=0;\n" |
"\tinput\t i_clk, %s, i_ce, i_sync;\n" |
"\tinput\t [(2*IWIDTH-1):0] i_data;\n" |
"\toutput\treg [(2*OWIDTH-1):0] o_data;\n" |
"\toutput\treg o_sync;\n" |
"\t\n", (dbg)?"_dbg":"", resetw.c_str(), |
(dbg)?", o_dbg":"", TST_QTRSTAGE_IWIDTH, |
TST_QTRSTAGE_LGWIDTH, resetw.c_str()); |
if (dbg) { fprintf(fp, "\toutput\twire\t[33:0]\t\t\to_dbg;\n" |
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_left[(2*OWIDTH-1):(2*OWIDTH-16)],\n" |
"\t\t\t\t\to_left[(OWIDTH-1):(OWIDTH-16)] };\n" |
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_data[(2*OWIDTH-1):(2*OWIDTH-16)],\n" |
"\t\t\t\t\to_data[(OWIDTH-1):(OWIDTH-16)] };\n" |
"\n"); |
} |
|
fprintf(fp, |
"\twire\tsigned\t[(IWIDTH-1):0]\ti_in_0r, i_in_0i, i_in_1r, i_in_1i;\n" |
"\tassign\ti_in_0r = i_left[(2*IWIDTH-1):(IWIDTH)]; \n" |
"\tassign\ti_in_0i = i_left[(IWIDTH-1):0]; \n" |
"\tassign\ti_in_1r = i_right[(2*IWIDTH-1):(IWIDTH)]; \n" |
"\tassign\ti_in_1i = i_right[(IWIDTH-1):0]; \n" |
"\twire\t[(OWIDTH-1):0]\t\to_out_0r, o_out_0i,\n" |
"\t\t\t\t\to_out_1r, o_out_1i;\n" |
"\treg\t wait_for_sync;\n" |
"\treg\t[2:0] pipeline;\n" |
"\n" |
"\treg\tsigned [(IWIDTH):0] sum_r, sum_i, diff_r, diff_i;\n" |
"\n" |
"\t// Handle a potential rounding situation, when IWIDTH>=OWIDTH.\n" |
"\treg\t[(2*OWIDTH-1):0]\tob_a;\n" |
"\twire\t[(2*OWIDTH-1):0]\tob_b;\n" |
"\treg\t[(OWIDTH-1):0]\t\tob_b_r, ob_b_i;\n" |
"\tassign\tob_b = { ob_b_r, ob_b_i };\n" |
"\n" |
"\n"); |
fprintf(fp, |
"\n" |
"\t// As with any register connected to the sync pulse, these must\n" |
"\t// have initial values and be reset on the i_rst signal.\n" |
"\t// Other data values need only restrict their updates to i_ce\n" |
"\t// enabled clocks, but sync\'s must obey resets and initial\n" |
"\t// conditions as well.\n" |
"\treg\trnd_sync, r_sync;\n" |
"\treg\t[(LGWIDTH-1):0]\t\tiaddr;\n" |
"\treg\t[(2*IWIDTH-1):0]\timem\t[0:1];\n" |
"\n" |
"\tinitial\trnd_sync = 1\'b0; // Sync into rounding\n" |
"\tinitial\tr_sync = 1\'b0; // Sync coming out\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tbegin\n" |
"\t\t\trnd_sync <= 1\'b0;\n" |
"\t\t\tr_sync <= 1\'b0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\trnd_sync <= i_sync;\n" |
"\t\t\tr_sync <= rnd_sync;\n" |
"\t\tend\n" |
"\twire\tsigned\t[(IWIDTH-1):0]\timem_r, imem_i;\n" |
"\tassign\timem_r = imem[1][(2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\timem_i = imem[1][(IWIDTH-1):0];\n" |
"\n" |
"\t// As with other variables, these are really only updated when in\n" |
"\t// the processing pipeline, after the first i_sync. However, to\n" |
"\t// eliminate as much unnecessary logic as possible, we toggle\n" |
"\t// these any time the i_ce line is enabled, and don\'t reset\n" |
"\t// them on i_rst.\n"); |
fprintf(fp, |
"\t// Don't forget that we accumulate a bit by adding two values\n" |
"\t// together. Therefore our intermediate value must have one more\n" |
"\t// bit than the two originals.\n" |
"\treg\tsigned\t[(IWIDTH):0]\trnd_in_0r, rnd_in_0i;\n" |
"\treg\tsigned\t[(IWIDTH):0]\trnd_in_1r, rnd_in_1i;\n\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t//\n" |
"\t\t\trnd_in_0r <= i_in_0r + i_in_1r;\n" |
"\t\t\trnd_in_0i <= i_in_0i + i_in_1i;\n" |
"\t\t\t//\n" |
"\t\t\trnd_in_1r <= i_in_0r - i_in_1r;\n" |
"\t\t\trnd_in_1i <= i_in_0i - i_in_1i;\n" |
"\t\t\t//\n" |
"\t\tend\n" |
"\twire\tsigned\t[(IWIDTH-1):0]\ti_data_r, i_data_i;\n" |
"\tassign\ti_data_r = i_data[(2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\ti_data_i = i_data[(IWIDTH-1):0];\n" |
"\n" |
"\treg [(2*OWIDTH-1):0] omem [0:1];\n" |
"\n"); |
|
fprintf(fp, "\t//\n" |
"\t// Round our output values down to OWIDTH bits\n" |
"\t//\n"); |
|
fprintf(fp, |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0r(i_clk, i_ce,\n" |
"\t\t\t\t\t\t\trnd_in_0r, o_out_0r);\n\n", rnd_string); |
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_sum_r, rnd_sum_i,\n" |
"\t\t\trnd_diff_r, rnd_diff_i, n_rnd_diff_r, n_rnd_diff_i;\n" |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT)\tdo_rnd_sum_r(i_clk, i_ce,\n" |
"\t\t\t\tsum_r, rnd_sum_r);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_0i(i_clk, i_ce,\n" |
"\t\t\t\t\t\t\trnd_in_0i, o_out_0i);\n\n", rnd_string); |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT)\tdo_rnd_sum_i(i_clk, i_ce,\n" |
"\t\t\t\tsum_i, rnd_sum_i);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1r(i_clk, i_ce,\n" |
"\t\t\t\t\t\t\trnd_in_1r, o_out_1r);\n\n", rnd_string); |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT)\tdo_rnd_diff_r(i_clk, i_ce,\n" |
"\t\t\t\tdiff_r, rnd_diff_r);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_1i(i_clk, i_ce,\n" |
"\t\t\t\t\t\t\trnd_in_1i, o_out_1i);\n\n", rnd_string); |
"\t%s #(IWIDTH+1,OWIDTH,SHIFT)\tdo_rnd_diff_i(i_clk, i_ce,\n" |
"\t\t\t\tdiff_i, rnd_diff_i);\n\n", rnd_string); |
fprintf(fp, "\tassign n_rnd_diff_r = - rnd_diff_r;\n" |
"\tassign n_rnd_diff_i = - rnd_diff_i;\n"); |
fprintf(fp, |
"\tinitial wait_for_sync = 1\'b1;\n" |
"\tinitial iaddr = 0;\n"); |
if (async_reset) |
fprintf(fp, |
"\talways @(posedge i_clk, negedge i_areset_n)\n" |
"\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_reset)\n"); |
|
fprintf(fp, "\n" |
"\t// Prior versions of this routine did not include the extra\n" |
"\t// clock and register/flip-flops that this routine requires.\n" |
"\t// These are placed in here to work around a bug in Verilator, which\n" |
"\t// otherwise struggles. (Hopefully this will fix the problem ...)\n" |
fprintf(fp, "\t\tbegin\n" |
"\t\t\twait_for_sync <= 1\'b1;\n" |
"\t\t\tiaddr <= 0;\n" |
"\t\tend else if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n" |
"\t\tbegin\n" |
"\t\t\tiaddr <= iaddr + 1\'b1;\n" |
"\t\t\twait_for_sync <= 1\'b0;\n" |
"\t\tend\n\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\to_left <= { o_out_0r, o_out_0i };\n" |
"\t\t\to_right <= { o_out_1r, o_out_1i };\n" |
"\t\t\timem[0] <= i_data;\n" |
"\t\t\timem[1] <= imem[0];\n" |
"\t\tend\n" |
"\n" |
"\tinitial\to_sync = 1'b0; // Final sync coming out of module\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\t\to_sync <= 1'b0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\to_sync <= r_sync;\n" |
"\n" |
"endmodule\n"); |
fclose(fp); |
} |
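The "Operation" derivation above shows that the 4-point stage needs no hardware multiplier. Each twiddle is 1 or -j (j for the inverse), and multiplying by plus or minus j just swaps the real and imaginary parts and negates one of them. The sketch below is a software model of the butterflies as written in that derivation. `Cplx` and `qtrstage_model` are hypothetical names. The model ignores the extra output bit, the rounding stage, and the sample-by-sample timing of the generated Verilog.

```cpp
#include <array>
#include <cassert>

// Hypothetical integer complex type for this sketch.
struct Cplx {
    long long r, i;
};

// Software model (a sketch, not the generator's code) of the 4-point
// decimation-in-frequency stage described above:
//   y[n]   = x[n] + x[n+2]
//   y[n+2] = (x[n] - x[n+2]) * W^n,  W = -j (forward) or +j (inverse)
// Multiplying by +/-j only swaps and negates parts, so no multiplier is
// needed. The extra output bit and the rounding stage are not modeled.
std::array<Cplx, 4> qtrstage_model(const std::array<Cplx, 4>& x, bool inverse) {
    std::array<Cplx, 4> y{};
    for (int n = 0; n < 2; ++n) {
        y[n] = {x[n].r + x[n + 2].r, x[n].i + x[n + 2].i};
        const Cplx d{x[n].r - x[n + 2].r, x[n].i - x[n + 2].i};
        if (n == 0)
            y[n + 2] = d;              // W^0 = 1
        else if (!inverse)
            y[n + 2] = {d.i, -d.r};    // d * (-j)
        else
            y[n + 2] = {-d.i, d.r};    // d * (+j)
    }
    return y;
}
```

For x = {1+2j, 3+4j, 5+6j, 7+8j}, the forward model gives y[3] = -4+4j. This matches y[3].r = x[1].i - x[3].i and y[3].i = -(x[1].r - x[3].r).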
"\n\n"); |
fprintf(fp, |
"\t// Note that we don\'t check on wait_for_sync or i_sync here.\n" |
"\t// Why not? Because iaddr will always be zero until after the\n" |
"\t// first i_ce, so we are safe.\n" |
"\tinitial pipeline = 3\'h0;\n"); |
|
void build_multiply(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
if (async_reset) |
fprintf(fp, |
"\talways\t@(posedge i_clk, negedge i_areset_n)\n" |
"\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, |
"\talways\t@(posedge i_clk)\n" |
"\t\tif (i_reset)\n"); |
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: shiftaddmpy.v\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: A portable shift and add multiply.\n" |
"//\n" |
"// While both Xilinx and Altera will offer single-clock\n" |
"// multiplies, this simple approach will multiply two numbers\n" |
"// on any architecture. The result maintains the full width\n" |
"// of the multiply, there are no extra stuff bits, no rounding,\n" |
"// no shifted bits, etc.\n" |
"//\n" |
"// Further, for those applications that can support it, this\n" |
"// multiply is pipelined and will produce one answer per clock.\n" |
"//\n" |
"// For minimal processing delay, make the first parameter\n" |
"// the one with the fewest bits, so that AWIDTH <= BWIDTH.\n" |
"//\n" |
"// The processing delay in this multiply is (AWIDTH+1) cycles.\n" |
"// That is, if the data is present on the input at clock t=0,\n" |
"// the result will be present on the output at time t=AWIDTH+1;\n" |
"//\n" |
"//\n%s" |
"//\n", prjname, creator); |
"\t\t\tpipeline <= 3\'h0;\n" |
"\t\telse if (i_ce) // is our pipeline process full? Which stages?\n" |
"\t\t\tpipeline <= { pipeline[1:0], iaddr[1] };\n\n"); |
fprintf(fp, |
"\t// This is the pipeline[-1] stage, pipeline[0] will be set next.\n" |
"\talways\t@(posedge i_clk)\n" |
"\t\tif ((i_ce)&&(iaddr[1]))\n" |
"\t\tbegin\n" |
"\t\t\tsum_r <= imem_r + i_data_r;\n" |
"\t\t\tsum_i <= imem_i + i_data_i;\n" |
"\t\t\tdiff_r <= imem_r - i_data_r;\n" |
"\t\t\tdiff_i <= imem_i - i_data_i;\n" |
"\t\tend\n\n"); |
fprintf(fp, |
"\t// pipeline[1] takes sum_x and diff_x and produces rnd_x\n\n"); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module shiftaddmpy(i_clk, i_ce, i_a, i_b, o_r);\n" |
"\tparameter\tAWIDTH=%d,BWIDTH=", TST_SHIFTADDMPY_AW); |
#ifdef TST_SHIFTADDMPY_BW |
fprintf(fp, "%d;\n", TST_SHIFTADDMPY_BW); |
#else |
fprintf(fp, "AWIDTH;\n"); |
#endif |
fprintf(fp, |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\t[(AWIDTH-1):0]\t\ti_a;\n" |
"\tinput\t\t[(BWIDTH-1):0]\t\ti_b;\n" |
"\toutput\treg\t[(AWIDTH+BWIDTH-1):0]\to_r;\n" |
"\n" |
"\treg\t[(AWIDTH-1):0]\tu_a;\n" |
"\treg\t[(BWIDTH-1):0]\tu_b;\n" |
"\treg\t\t\tsgn;\n" |
"\n" |
"\treg\t[(AWIDTH-2):0]\t\tr_a[0:(AWIDTH-1)];\n" |
"\treg\t[(AWIDTH+BWIDTH-2):0]\tr_b[0:(AWIDTH-1)];\n" |
"\treg\t\t\t\tr_s[0:(AWIDTH-1)];\n" |
"\treg\t[(AWIDTH+BWIDTH-1):0]\tacc[0:(AWIDTH-1)];\n" |
"\tgenvar k;\n" |
"\n" |
"\t// If we were forced to stay within two\'s complement arithmetic,\n" |
"\t// taking the absolute value here would require an additional bit.\n" |
"\t// However, because our results are now unsigned, we can stay\n" |
"\t// within the number of bits given (for now).\n" |
"\talways @(posedge i_clk)\n" |
"\t// Now for pipeline[2]. We can actually do this at all i_ce\n" |
"\t// clock times, since nothing will listen unless pipeline[3]\n" |
"\t// is set on the next clock. Thus, we simplify this logic and do\n" |
"\t// it independent of pipeline[2].\n" |
"\talways\t@(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tu_a <= (i_a[AWIDTH-1])?(-i_a):(i_a);\n" |
"\t\t\tu_b <= (i_b[BWIDTH-1])?(-i_b):(i_b);\n" |
"\t\t\tsgn <= i_a[AWIDTH-1] ^ i_b[BWIDTH-1];\n" |
"\t\tend\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\t\tob_a <= { rnd_sum_r, rnd_sum_i };\n" |
"\t\t\t// on Even, W = e^{-j2pi 1/4 0} = 1\n" |
"\t\t\tif (!iaddr[0])\n" |
"\t\t\tbegin\n" |
"\t\t\t\tob_b_r <= rnd_diff_r;\n" |
"\t\t\t\tob_b_i <= rnd_diff_i;\n" |
"\t\t\tend else if (INVERSE==0) begin\n" |
"\t\t\t\t// on Odd, W = e^{-j2pi 1/4} = -j\n" |
"\t\t\t\tob_b_r <= rnd_diff_i;\n" |
"\t\t\t\tob_b_i <= n_rnd_diff_r;\n" |
"\t\t\tend else begin\n" |
"\t\t\t\t// on Odd, W = e^{j2pi 1/4} = j\n" |
"\t\t\t\tob_b_r <= n_rnd_diff_i;\n" |
"\t\t\t\tob_b_i <= rnd_diff_r;\n" |
"\t\t\tend\n" |
"\t\tend\n\n"); |
fprintf(fp, |
"\talways\t@(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tacc[0] <= (u_a[0]) ? { {(AWIDTH){1\'b0}}, u_b }\n" |
"\t\t\t\t\t: {(AWIDTH+BWIDTH){1\'b0}};\n" |
"\t\t\tr_a[0] <= { u_a[(AWIDTH-1):1] };\n" |
"\t\t\tr_b[0] <= { {(AWIDTH-1){1\'b0}}, u_b };\n" |
"\t\t\tr_s[0] <= sgn; // The final sign, needs to be preserved\n" |
"\t\tend\n" |
"\n" |
"\tgenerate\n" |
"\tfor(k=0; k<AWIDTH-1; k=k+1)\n" |
"\tbegin : genstages\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tacc[k+1] <= acc[k] + ((r_a[k][0]) ? {r_b[k],1\'b0}:0);\n" |
"\t\t\tr_a[k+1] <= { 1\'b0, r_a[k][(AWIDTH-2):1] };\n" |
"\t\t\tr_b[k+1] <= { r_b[k][(AWIDTH+BWIDTH-3):0], 1\'b0};\n" |
"\t\t\tr_s[k+1] <= r_s[k];\n" |
"\t\tend\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_r <= (r_s[AWIDTH-1]) ? (-acc[AWIDTH-1]) : acc[AWIDTH-1];\n" |
"\n" |
"endmodule\n"); |
"\t\tbegin // In sequence, clock = 3\n" |
"\t\t\tomem[0] <= ob_b;\n" |
"\t\t\tomem[1] <= omem[0];\n" |
"\t\t\tif (pipeline[2])\n" |
"\t\t\t\to_data <= ob_a;\n" |
"\t\t\telse\n" |
"\t\t\t\to_data <= omem[1];\n" |
"\t\tend\n\n"); |
|
fclose(fp); |
} |
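`shiftaddmpy.v` works on magnitudes. It records the sign of the product, adds one shifted copy of |b| for every set bit of |a|, and negates the sum at the end if needed. The hardware spreads those AWIDTH additions across AWIDTH pipeline stages. The sketch below performs the same additions in a loop, so it can serve as a reference model. `shiftaddmpy_model` is a hypothetical name. The sketch assumes both operands already fit in their declared widths and that `awidth` is at most 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the shift-and-add multiply described in
// shiftaddmpy.v above: take magnitudes, record the product's sign, add one
// shifted copy of |b| for every set bit of |a|, then reapply the sign.
// The hardware spreads the AWIDTH additions over AWIDTH pipeline stages;
// this sketch does them in a loop. Assumes 1 <= awidth <= 32.
int64_t shiftaddmpy_model(int32_t a, int32_t b, int awidth) {
    assert(awidth >= 1 && awidth <= 32);
    const int64_t wa = a, wb = b;
    const uint64_t ua = static_cast<uint64_t>(wa < 0 ? -wa : wa);  // u_a
    const uint64_t ub = static_cast<uint64_t>(wb < 0 ? -wb : wb);  // u_b
    const bool sgn = (a < 0) != (b < 0);                           // sgn

    uint64_t acc = 0;
    for (int k = 0; k < awidth; ++k)
        if ((ua >> k) & 1)
            acc += ub << k;  // one partial product per bit of |a|

    return sgn ? -static_cast<int64_t>(acc) : static_cast<int64_t>(acc);
}
```

The most negative AWIDTH-bit input, such as -128 for AWIDTH=8, still works. Its magnitude, 128, is read as an unsigned value, which is why the Verilog comment notes that no extra bit is needed.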
fprintf(fp, |
"\tinitial\to_sync = 1\'b0;\n"); |
|
void build_bimpy(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
if (async_reset) |
fprintf(fp, |
"\talways\t@(posedge i_clk, negedge i_areset_n)\n" |
"\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, |
"\talways\t@(posedge i_clk)\n" |
"\t\tif (i_reset)\n"); |
fprintf(fp, |
"////////////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: %s\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: A simple 2-bit multiply based upon the fact that LUTs allow\n" |
"// 6 bits of input. In other words, I could build a 3-bit\n" |
"// multiply from 6 LUTs (5 actually, since the first could have\n" |
"// two outputs). This would allow multiplication of three bit\n" |
"// digits, save only for the fact that you would need two bits\n" |
"// of carry. The bimpy approach throttles back a bit and does\n" |
"// a 2x2 bit multiply in a LUT, guaranteeing that it will never\n" |
"// carry more than one bit. While this multiply is hardware\n" |
"// independent (and can still run under Verilator therefore),\n" |
"// it is really motivated by trying to optimize for a specific\n" |
"// piece of hardware (Xilinx-7 series ...) that has at least\n" |
"// 4-input LUTs with carry chains.\n" |
"//\n" |
"//\n" |
"//\n%s" |
"//\n", fname, prjname, creator); |
"\t\t\to_sync <= 1\'b0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\to_sync <= (iaddr[2:0] == 3'b101);\n\n"); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module bimpy(i_clk, i_ce, i_a, i_b, o_r);\n" |
"\tparameter\tBW=18, // Number of bits in i_b\n" |
"\t\t\tLUTB=2; // Number of bits in i_a for our LUT multiply\n" |
"\tinput\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\t[(LUTB-1):0]\ti_a;\n" |
"\tinput\t\t[(BW-1):0]\ti_b;\n" |
"\toutput\treg\t[(BW+LUTB-1):0] o_r;\n" |
if (formal_property_flag) { |
fprintf(fp, |
"`ifdef FORMAL\n" |
"\treg f_past_valid;\n" |
"\tinitial f_past_valid = 1'b0;\n" |
"\talways @(posedge i_clk)\n" |
"\t f_past_valid = 1'b1;\n" |
"\n" |
"\twire [(BW+LUTB-2):0] w_r;\n" |
"\twire [(BW+LUTB-3):1] c;\n" |
"`ifdef QTRSTAGE\n" |
"\talways @(posedge i_clk)\n" |
"\t assume((i_ce)||($past(i_ce))||($past(i_ce,2)));\n" |
"`endif\n" |
"\n" |
"\tassign\tw_r = { ((i_a[1])?i_b:{(BW){1'b0}}), 1'b0 }\n" |
"\t\t\t\t^ { 1'b0, ((i_a[0])?i_b:{(BW){1'b0}}) };\n" |
"\tassign\tc = { ((i_a[1])?i_b[(BW-2):0]:{(BW-1){1'b0}}) }\n" |
"\t\t\t& ((i_a[0])?i_b[(BW-1):1]:{(BW-1){1'b0}});\n" |
"\t// The below logic only works if the rounding stage does nothing\n" |
"\tinitial assert(IWIDTH+1 == OWIDTH);\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_r <= w_r + { c, 2'b0 };\n" |
"\treg signed [IWIDTH-1:0] f_piped_real [0:7];\n" |
"\treg signed [IWIDTH-1:0] f_piped_imag [0:7];\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
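The `bimpy` module forms the two partial products of a 2-bit by BW-bit multiply. It does not add them directly. Instead, it relies on the identity x + y == (x ^ y) + 2 * (x & y). The XOR term is `w_r`, and the AND term is `c`, shifted into place by the trailing zeros in `{ c, 2'b0 }`. The sketch below checks that arithmetic in software. It assumes unsigned operands, as `longbimpy` supplies through `u_a` and `u_b`. `bimpy_model` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the bimpy idea: multiply an unsigned
// 2-bit value a by an unsigned value b (as longbimpy does with u_a, u_b).
// The two partial products are combined as a carry-free XOR sum (w_r) plus
// their AND (c) shifted up one place, using x + y == (x ^ y) + 2 * (x & y).
uint64_t bimpy_model(unsigned a, uint64_t b) {
    assert(a < 4);
    const uint64_t x = (a & 1u) ? b : 0;         // a[0] * b
    const uint64_t y = (a & 2u) ? (b << 1) : 0;  // a[1] * b * 2
    const uint64_t w_r = x ^ y;                  // sum without carries
    const uint64_t c = x & y;                    // bits that produce a carry
    return w_r + (c << 1);
}
```

`longbimpy` then accumulates one such 2xN product per row of its tableau, shifting each row by LUTB=2 more bits than the previous one.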
|
void build_longbimpy(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
"////////////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: %s\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: A portable shift and add multiply, built with the knowledge\n" |
"// of the existence of a six bit LUT and carry chain. That\n" |
"// knowledge allows us to multiply two bits from one value\n" |
"// at a time against all of the bits of the other value. This\n" |
"// sub multiply is called the bimpy.\n" |
"//\n" |
"// For minimal processing delay, make the first parameter\n" |
"// the one with the fewest bits, so that AWIDTH <= BWIDTH.\n" |
"//\n" |
"//\n" |
"//\n%s" |
"//\n", fname, prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module longbimpy(i_clk, i_ce, i_a, i_b, o_r);\n" |
"\tparameter AW=%d, // The width of i_a, min width is 5\n" |
"\t\t\tBW=", TST_LONGBIMPY_AW); |
#ifdef TST_LONGBIMPY_BW |
fprintf(fp, "%d", TST_LONGBIMPY_BW); |
#else |
fprintf(fp, "AW"); |
#endif |
|
fprintf(fp, ", // The width of i_b, can be anything\n" |
"\t\t\t// The following three parameters should not be changed\n" |
"\t\t\t// by any implementation, but are based upon hardware\n" |
"\t\t\t// and the above values:\n" |
"\t\t\tOW=AW+BW, // The output width\n" |
"\t\t\tIW=(AW+1)&(-2), // Internal width of A\n" |
"\t\t\tLUTB=2, // How many bits we can multiply by at once\n" |
"\t\t\tTLEN=(AW+(LUTB-1))/LUTB; // Number of rows in our tableau\n" |
"\tinput\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\t[(AW-1):0]\ti_a;\n" |
"\tinput\t\t[(BW-1):0]\ti_b;\n" |
"\toutput\treg\t[(AW+BW-1):0]\to_r;\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_ce)\n" |
"\tbegin\n" |
"\t f_piped_real[0] <= i_data[2*IWIDTH-1:IWIDTH];\n" |
"\t f_piped_imag[0] <= i_data[ IWIDTH-1:0];\n" |
"\n" |
"\treg\t[(IW-1):0]\tu_a;\n" |
"\treg\t[(BW-1):0]\tu_b;\n" |
"\treg\t\t\tsgn;\n" |
"\t f_piped_real[1] <= f_piped_real[0];\n" |
"\t f_piped_imag[1] <= f_piped_imag[0];\n" |
"\n" |
"\treg\t[(IW-1-2*(LUTB)):0]\tr_a[0:(TLEN-3)];\n" |
"\treg\t[(BW-1):0]\t\tr_b[0:(TLEN-3)];\n" |
"\treg\t[(TLEN-1):0]\t\tr_s;\n" |
"\treg\t[(IW+BW-1):0]\t\tacc[0:(TLEN-2)];\n" |
"\tgenvar k;\n" |
"\t f_piped_real[2] <= f_piped_real[1];\n" |
"\t f_piped_imag[2] <= f_piped_imag[1];\n" |
"\n" |
"\t// First step:\n" |
"\t// Switch to unsigned arithmetic for our multiply, keeping track\n" |
"\t// of the sign along the way. We'll then apply the sign again\n" |
"\t// at the end.\n" |
"\t//\n" |
"\t// If we were forced to stay within two's complement arithmetic,\n" |
"\t// taking the absolute value here would require an additional bit.\n" |
"\t// However, because our results are now unsigned, we can stay\n" |
"\t// within the number of bits given (for now).\n" |
"\tgenerate if (IW > AW)\n" |
"\tbegin\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\t\tu_a <= { 1'b0, (i_a[AW-1])?(-i_a):(i_a) };\n" |
"\tend else begin\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\t\tu_a <= (i_a[AW-1])?(-i_a):(i_a);\n" |
"\tend endgenerate\n" |
"\t f_piped_real[3] <= f_piped_real[2];\n" |
"\t f_piped_imag[3] <= f_piped_imag[2];\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tu_b <= (i_b[BW-1])?(-i_b):(i_b);\n" |
"\t\t\tsgn <= i_a[AW-1] ^ i_b[BW-1];\n" |
"\t\tend\n" |
"\t f_piped_real[4] <= f_piped_real[3];\n" |
"\t f_piped_imag[4] <= f_piped_imag[3];\n" |
"\n" |
"\twire [(BW+LUTB-1):0] pr_a, pr_b;\n" |
"\t f_piped_real[5] <= f_piped_real[4];\n" |
"\t f_piped_imag[5] <= f_piped_imag[4];\n" |
"\n" |
"\t//\n" |
"\t// Second step: First two 2xN products.\n" |
"\t//\n" |
"\t// Since we have no tableau of additions (yet), we can do both\n" |
"\t// of the first two rows at the same time and add them together.\n" |
"\t// On each following clock we'll have a previous sum to accumulate\n" |
"\t// with each new product, so only one product at a time can follow\n" |
"\t// this; the first clock, however, can do two at a time.\n" |
"\tbimpy\t#(BW) lmpy_0(i_clk,i_ce,u_a[( LUTB-1): 0], u_b, pr_a);\n" |
"\tbimpy\t#(BW) lmpy_1(i_clk,i_ce,u_a[(2*LUTB-1):LUTB], u_b, pr_b);\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce) r_a[0] <= u_a[(IW-1):(2*LUTB)];\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce) r_b[0] <= u_b;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce) r_s <= { r_s[(TLEN-2):0], sgn };\n" |
"\talways @(posedge i_clk) // One clk after p[0],p[1] become valid\n" |
"\t\tif (i_ce) acc[0] <= { {(IW-LUTB){1'b0}}, pr_a}\n" |
"\t\t\t +{ {(IW-(2*LUTB)){1'b0}}, pr_b, {(LUTB){1'b0}} };\n" |
"\t f_piped_real[6] <= f_piped_real[5];\n" |
"\t f_piped_imag[6] <= f_piped_imag[5];\n" |
"\n" |
"\tgenerate // Keep track of intermediate values, before multiplying them\n" |
"\tif (TLEN > 3) for(k=0; k<TLEN-3; k=k+1)\n" |
"\tbegin : gencopies\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tr_a[k+1] <= { {(LUTB){1'b0}},\n" |
"\t\t\t\tr_a[k][(IW-1-(2*LUTB)):LUTB] };\n" |
"\t\t\tr_b[k+1] <= r_b[k];\n" |
"\t\tend\n" |
"\tend endgenerate\n" |
"\t f_piped_real[7] <= f_piped_real[6];\n" |
"\t f_piped_imag[7] <= f_piped_imag[6];\n" |
"\tend\n" |
"\n" |
"\tgenerate // The actual multiply and accumulate stage\n" |
"\tif (TLEN > 2) for(k=0; k<TLEN-2; k=k+1)\n" |
"\tbegin : genstages\n" |
"\t\t// First, the multiply: 2-bits times BW bits\n" |
"\t\twire\t[(BW+LUTB-1):0] genp;\n" |
"\t\tbimpy #(BW) genmpy(i_clk,i_ce,r_a[k][(LUTB-1):0],r_b[k], genp);\n" |
"\treg f_rsyncd;\n" |
"\twire f_syncd;\n" |
"\n" |
"\t\t// Then the accumulate step -- on the next clock\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\t\tacc[k+1] <= acc[k] + {{(IW-LUTB*(k+3)){1'b0}},\n" |
"\t\t\t\t\tgenp, {(LUTB*(k+2)){1'b0}} };\n" |
"\tend endgenerate\n" |
"\n" |
"\twire [(IW+BW-1):0] w_r;\n" |
"\tassign\tw_r = (r_s[TLEN-1]) ? (-acc[TLEN-2]) : acc[TLEN-2];\n" |
"\tinitial f_rsyncd = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_r <= w_r[(AW+BW-1):0];\n" |
"\tif(i_reset)\n" |
"\t f_rsyncd <= 1'b0;\n" |
"\telse if (!f_rsyncd)\n" |
"\t f_rsyncd <= (o_sync);\n" |
"\tassign f_syncd = (f_rsyncd)||(o_sync);\n" |
"\n" |
"\tgenerate if (IW > AW)\n" |
"\tbegin : VUNUSED\n" |
"\t\t// verilator lint_off UNUSED\n" |
"\t\twire\t[(IW-AW)-1:0]\tunused;\n" |
"\t\tassign\tunused = w_r[(IW+BW-1):(AW+BW)];\n" |
"\t\t// verilator lint_on UNUSED\n" |
"\tend endgenerate\n" |
"\treg [1:0] f_state;\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
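The sign-magnitude multiplier emitted above strips the signs, multiplies the magnitudes `LUTB` bits of one operand at a time, accumulates the shifted partial products, and restores the sign at the end. The C++ sketch below models only that arithmetic, not the pipeline. It assumes `LUTB = 2` and 32-bit signed operands, and `slice_multiply` is a hypothetical name rather than anything in the generator.

```cpp
#include <cassert>
#include <cstdint>

constexpr int LUTB = 2;  // bits of |a| consumed per partial product (assumed)
static_assert(32 % LUTB == 0, "LUTB must divide the 32-bit magnitude width");

// Software model of a sign-magnitude, LUTB-bits-at-a-time multiply.
int64_t slice_multiply(int32_t a, int32_t b) {
    // Record the sign of the result, then work with magnitudes only
    const bool negative = (a < 0) != (b < 0);
    const int64_t sa = a, sb = b;
    const uint64_t ua = sa < 0 ? uint64_t(-sa) : uint64_t(sa);
    const uint64_t ub = sb < 0 ? uint64_t(-sb) : uint64_t(sb);

    uint64_t acc = 0;
    for (int shift = 0; shift < 32; shift += LUTB) {
        // Next LUTB-bit slice of |a|
        const uint64_t chunk = (ua >> shift) & ((uint64_t(1) << LUTB) - 1);
        // One LUTB x 32-bit partial product, aligned to its bit position
        acc += (chunk * ub) << shift;
    }
    assert(acc == ua * ub);  // the slices together cover every bit of |a|

    // Reapply the sign once, after the final accumulation
    const int64_t mag = static_cast<int64_t>(acc);
    return negative ? -mag : mag;
}
```

For example, `slice_multiply(-7, 6)` returns -42.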
|
void build_dblreverse(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: dblreverse.v\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: This module bitreverses a pipelined FFT input. Operation is\n" |
"// expected as follows:\n" |
"//\n" |
"// i_clk A running clock at whatever system speed is offered.\n" |
"// i_rst A synchronous reset signal, that resets all internals\n" |
"// i_ce If this is one, one input is consumed and an output\n" |
"// is produced.\n" |
"// i_in_0, i_in_1\n" |
"// Two inputs to be consumed, each of width WIDTH.\n" |
"// o_out_0, o_out_1\n" |
"// Two of the bitreversed outputs, also of the same\n" |
"// width, WIDTH. Of course, there is a delay from the\n" |
"// first input to the first output. For this purpose,\n" |
"// o_sync is present.\n" |
"// o_sync This will be a 1\'b1 for the first value in any block.\n" |
"// Following a reset, this will only become 1\'b1 once\n" |
"// the data has been loaded and is now valid. After that,\n" |
"// all outputs will be valid.\n" |
"//\n" |
"// 20150602 -- This module has undergone massive rework in order to\n" |
"// ensure that it uses resources efficiently. As a result,\n" |
"// it now optimizes nicely into block RAMs. As an unfortunate\n" |
"// side effect, it now passes its bench test (dblrev_tb) but\n" |
"// fails the integration bench test (fft_tb).\n" |
"//\n" |
"//\n%s" |
"//\n", prjname, creator); |
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"\n\n" |
"//\n" |
"// How do we do bit reversing at two samples per clock? Can we separate out\n" |
"// our work into eight memory banks, writing two banks at once and reading\n" |
"// another two banks in the same clock?\n" |
"//\n" |
"// mem[00xxx0] = s_0[n]\n" |
"// mem[00xxx1] = s_1[n]\n" |
"// o_0[n] = mem[10xxx0]\n" |
"// o_1[n] = mem[11xxx0]\n" |
"// ...\n" |
"// mem[01xxx0] = s_0[m]\n" |
"// mem[01xxx1] = s_1[m]\n" |
"// o_0[m] = mem[10xxx1]\n" |
"// o_1[m] = mem[11xxx1]\n" |
"// ...\n" |
"// mem[10xxx0] = s_0[n]\n" |
"// mem[10xxx1] = s_1[n]\n" |
"// o_0[n] = mem[00xxx0]\n" |
"// o_1[n] = mem[01xxx0]\n" |
"// ...\n" |
"// mem[11xxx0] = s_0[m]\n" |
"// mem[11xxx1] = s_1[m]\n" |
"// o_0[m] = mem[00xxx1]\n" |
"// o_1[m] = mem[01xxx1]\n" |
"// ...\n" |
"//\n" |
"// The answer is yes, but we need to use four memory banks\n" |
"// to do it properly. These four banks are defined by the two bits\n" |
"// that determine the top and bottom of the correct address. Larger\n" |
"// FFTs would require more memories.\n" |
"//\n" |
"//\n"); |
fprintf(fp, |
"module dblreverse(i_clk, i_rst, i_ce, i_in_0, i_in_1,\n" |
"\t\to_out_0, o_out_1, o_sync);\n" |
"\tparameter\t\t\tLGSIZE=%d, WIDTH=24;\n" |
"\tinput\t\t\t\ti_clk, i_rst, i_ce;\n" |
"\tinput\t\t[(2*WIDTH-1):0]\ti_in_0, i_in_1;\n" |
"\toutput\twire\t[(2*WIDTH-1):0]\to_out_0, o_out_1;\n" |
"\toutput\treg\t\t\to_sync;\n", TST_DBLREVERSE_LGSIZE); |
|
fprintf(fp, |
"\n" |
"\treg\t\t\tin_reset;\n" |
"\treg\t[(LGSIZE-1):0]\tiaddr;\n" |
"\twire\t[(LGSIZE-3):0]\tbraddr;\n" |
"\tinitial f_state = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_reset)\n" |
"\t f_state <= 0;\n" |
"\telse if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n" |
"\t f_state <= f_state + 1;\n" |
"\n" |
"\tgenvar\tk;\n" |
"\tgenerate for(k=0; k<LGSIZE-2; k=k+1)\n" |
"\tbegin : gen_a_bit_reversed_value\n" |
"\t\tassign braddr[k] = iaddr[LGSIZE-3-k];\n" |
"\tend endgenerate\n" |
"\talways @(*)\n" |
"\tif (f_state != 0)\n" |
"\t assume(!i_sync);\n" |
"\n" |
"\tinitial iaddr = 0;\n" |
"\tinitial in_reset = 1\'b1;\n" |
"\tinitial o_sync = 1\'b0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tbegin\n" |
"\t\t\tiaddr <= 0;\n" |
"\t\t\tin_reset <= 1\'b1;\n" |
"\t\t\to_sync <= 1\'b0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tiaddr <= iaddr + { {(LGSIZE-1){1\'b0}}, 1\'b1 };\n" |
"\t\t\tif (&iaddr[(LGSIZE-2):0])\n" |
"\t\t\t\tin_reset <= 1\'b0;\n" |
"\t\t\tif (in_reset)\n" |
"\t\t\t\to_sync <= 1\'b0;\n" |
"\t\t\telse\n" |
"\t\t\t\to_sync <= ~(|iaddr[(LGSIZE-2):0]);\n" |
"\t\tend\n" |
"\t assert(f_state[1:0] == iaddr[1:0]);\n" |
"\n" |
"\treg\t[(2*WIDTH-1):0]\tmem_e [0:((1<<(LGSIZE))-1)];\n" |
"\treg\t[(2*WIDTH-1):0]\tmem_o [0:((1<<(LGSIZE))-1)];\n" |
"\twire signed [2*IWIDTH-1:0] f_i_real, f_i_imag;\n" |
"\tassign f_i_real = i_data[2*IWIDTH-1:IWIDTH];\n" |
"\tassign f_i_imag = i_data[ IWIDTH-1:0];\n" |
"\n" |
"\twire signed [OWIDTH-1:0] f_o_real, f_o_imag;\n" |
"\tassign f_o_real = o_data[2*OWIDTH-1:OWIDTH];\n" |
"\tassign f_o_imag = o_data[ OWIDTH-1:0];\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\tmem_e[iaddr] <= i_in_0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\tmem_o[iaddr] <= i_in_1;\n" |
"\tif (f_state == 2'b11)\n" |
"\tbegin\n" |
"\t assume(f_piped_real[0] != 3'sb100);\n" |
"\t assume(f_piped_real[2] != 3'sb100);\n" |
"\t assert(sum_r == f_piped_real[2] + f_piped_real[0]);\n" |
"\t assert(sum_i == f_piped_imag[2] + f_piped_imag[0]);\n" |
"\n" |
"\t assert(diff_r == f_piped_real[2] - f_piped_real[0]);\n" |
"\t assert(diff_i == f_piped_imag[2] - f_piped_imag[0]);\n" |
"\tend\n" |
"\n" |
"\treg [(2*WIDTH-1):0] evn_out_0, evn_out_1, odd_out_0, odd_out_1;\n" |
"\talways @(posedge i_clk)\n" |
"\tif ((f_state == 2'b00)&&((f_syncd)||(iaddr >= 4)))\n" |
"\tbegin\n" |
"\t assert(rnd_sum_r == f_piped_real[3]+f_piped_real[1]);\n" |
"\t assert(rnd_sum_i == f_piped_imag[3]+f_piped_imag[1]);\n" |
"\t assert(rnd_diff_r == f_piped_real[3]-f_piped_real[1]);\n" |
"\t assert(rnd_diff_i == f_piped_imag[3]-f_piped_imag[1]);\n" |
"\tend\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n\t\t\tevn_out_0 <= mem_e[{~iaddr[LGSIZE-1],1\'b0,braddr}];\n" |
"\tif ((f_state == 2'b10)&&(f_syncd))\n" |
"\tbegin\n" |
"\t // assert(o_sync);\n" |
"\t assert(f_o_real == f_piped_real[5] + f_piped_real[3]);\n" |
"\t assert(f_o_imag == f_piped_imag[5] + f_piped_imag[3]);\n" |
"\tend\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n\t\t\tevn_out_1 <= mem_e[{~iaddr[LGSIZE-1],1\'b1,braddr}];\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n\t\t\todd_out_0 <= mem_o[{~iaddr[LGSIZE-1],1\'b0,braddr}];\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n\t\t\todd_out_1 <= mem_o[{~iaddr[LGSIZE-1],1\'b1,braddr}];\n" |
"\tif ((f_state == 2'b11)&&(f_syncd))\n" |
"\tbegin\n" |
"\t assert(!o_sync);\n" |
"\t assert(f_o_real == f_piped_real[5] + f_piped_real[3]);\n" |
"\t assert(f_o_imag == f_piped_imag[5] + f_piped_imag[3]);\n" |
"\tend\n" |
"\n" |
"\treg\tadrz;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce) adrz <= iaddr[LGSIZE-2];\n" |
"\tif ((f_state == 2'b00)&&(f_syncd))\n" |
"\tbegin\n" |
"\t assert(!o_sync);\n" |
"\t assert(f_o_real == f_piped_real[7] - f_piped_real[5]);\n" |
"\t assert(f_o_imag == f_piped_imag[7] - f_piped_imag[5]);\n" |
"\tend\n" |
"\n" |
"\tassign\to_out_0 = (adrz)?odd_out_0:evn_out_0;\n" |
"\tassign\to_out_1 = (adrz)?odd_out_1:evn_out_1;\n" |
"\talways @(*)\n" |
"\tif ((iaddr[2:0] == 0)&&(!wait_for_sync))\n" |
"\t assume(i_sync);\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
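The banked addressing in `dblreverse` ultimately implements the standard bit-reversal permutation. As a software reference, assuming an N = 2^lgsize block held in a `std::vector`, the sketch below reverses the low bits of each index the same way the `gen_a_bit_reversed_value` loop builds `braddr` from `iaddr`. It does not model the four-bank, two-samples-per-clock memory scheme, and `bit_reverse` and `bit_reverse_order` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverse the low `bits` bits of x. The generate loop above does the
// same thing with braddr[k] = iaddr[LGSIZE-3-k] (i.e. bits = LGSIZE-2).
uint32_t bit_reverse(uint32_t x, int bits) {
    uint32_t r = 0;
    for (int k = 0; k < bits; k++)
        if (x & (1u << (bits - 1 - k)))
            r |= 1u << k;
    return r;
}

// Reorder an N = 2^lgsize block into bit-reversed order (reference model only).
template <typename T>
std::vector<T> bit_reverse_order(const std::vector<T>& in, int lgsize) {
    assert(in.size() == (std::size_t(1) << lgsize));
    std::vector<T> out(in.size());
    for (uint32_t i = 0; i < in.size(); i++)
        out[bit_reverse(i, lgsize)] = in[i];
    return out;
}
```

For an 8-point block, `bit_reverse_order` maps `0..7` to `0, 4, 2, 6, 1, 5, 3, 7`.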
|
void build_butterfly(const char *fname, int xtracbits, ROUND_T rounding) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
const char *rnd_string; |
if (rounding == RND_TRUNCATE) |
rnd_string = "truncate"; |
else if (rounding == RND_FROMZERO) |
rnd_string = "roundfromzero"; |
else if (rounding == RND_HALFUP) |
rnd_string = "roundhalfup"; |
else |
rnd_string = "convround"; |
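The generator picks one of four rounding modules by name. The sketch below is a software model of what those names conventionally mean when the low `shift` bits of a two's complement value are discarded. The mapping from module name to behavior is an assumption based on the names alone, and `drop_bits` and `Round` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed meanings of truncate, roundfromzero, roundhalfup and convround
enum class Round { Truncate, FromZero, HalfUp, Convergent };

// Remove the low `shift` bits of v, rounding according to `mode`.
// Assumes 1 <= shift <= 61 and |v| well inside the int64_t range.
int64_t drop_bits(int64_t v, int shift, Round mode) {
    assert(shift >= 1 && shift <= 61);
    const int64_t one = int64_t(1) << shift;
    const int64_t half = one / 2;
    int64_t q = v / one;                // C++ division truncates toward zero,
    if (v % one != 0 && v < 0) q -= 1;  // so adjust to floor(v / 2^shift)
    const int64_t rem = v - q * one;    // the discarded bits, 0 <= rem < one

    switch (mode) {
    case Round::Truncate:               // drop the bits (floor)
        return q;
    case Round::HalfUp:                 // ties go toward +infinity
        return rem >= half ? q + 1 : q;
    case Round::FromZero:               // ties go away from zero
        if (rem != half) return rem > half ? q + 1 : q;
        return v > 0 ? q + 1 : q;
    case Round::Convergent:             // ties go to the even neighbor
        if (rem != half) return rem > half ? q + 1 : q;
        return (q % 2 != 0) ? q + 1 : q;
    }
    return q;  // not reached
}
```

With `shift = 2`, the tie 10/4 = 2.5 becomes 2 (truncate), 3 (half up), 3 (from zero) or 2 (convergent). Only convergent rounding sends ties to the even neighbor, which avoids a systematic bias across many roundings.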
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: butterfly.v\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: This routine calculates a butterfly for a decimation\n" |
"// in frequency version of an FFT. Specifically, given\n" |
"// complex Left and Right values together with a \n" |
"// coefficient, the output of this routine is given\n" |
"// by:\n" |
"//\n" |
"// L' = L + R\n" |
"// R' = (L - R)*C\n" |
"//\n" |
"// The rest of the junk below handles timing (mostly),\n" |
"// to make certain that L' and R' reach the output at\n" |
"// the same clock. Further, just to make certain\n" |
"// that is the case, an 'aux' input exists. This\n" |
"// aux value will come out of this routine synchronized\n" |
"// to the values it came in with. (i.e., both L', R',\n" |
"// and aux all have the same delay.) Hence, a caller\n" |
"// of this routine may set aux on the first input with\n" |
"// valid data, and then wait to see aux set on the output\n" |
"// to know when to find the first output with valid data.\n" |
"//\n" |
"// All bits are preserved until the very last clock,\n" |
"// where any more bits than OWIDTH will be quietly\n" |
"// discarded.\n" |
"//\n" |
"// This design features no overflow checking.\n" |
"// \n" |
"// Notes:\n" |
"// CORDIC:\n" |
"// Much as we would like, we can't use a CORDIC here.\n" |
"// The goal is to accomplish an FFT, as defined, and a\n" |
"// CORDIC places a scale factor onto the data. Removing\n" |
"// the scale factor would cost two multiplies, which\n" |
"// is precisely what we are trying to avoid.\n" |
"//\n" |
"//\n" |
"// 3-MULTIPLIES:\n" |
"// It should also be possible to do this with three \n" |
"// multiplies and an extra two addition cycles. \n" |
"//\n" |
"// We want\n" |
"// R+I = (a + jb) * (c + jd)\n" |
"// R+I = (ac-bd) + j(ad+bc)\n" |
"// We multiply\n" |
"// P1 = ac\n" |
"// P2 = bd\n" |
"// P3 = (a+b)(c+d)\n" |
"// Then \n" |
"// R+I=(P1-P2)+j(P3-P2-P1)\n" |
"//\n" |
"// WIDTHS:\n" |
"// On multiplying an X width number by an\n" |
"// Y width number, X>Y, the result should be (X+Y)\n" |
"// bits, right?\n" |
"// -2^(X-1) <= a <= 2^(X-1) - 1\n" |
"// -2^(Y-1) <= b <= 2^(Y-1) - 1\n" |
"// (2^(X-1)-1)*(-2^(Y-1)) <= ab <= (-2^(X-1))*(-2^(Y-1))\n" |
"// -2^(X+Y-2)+2^(Y-1) <= ab <= 2^(X+Y-2) <= 2^(X+Y-1) - 1\n" |
"// -2^(X+Y-1) <= ab <= 2^(X+Y-1)-1\n" |
"// YUP! But just barely. Do this and you'll really want\n" |
"// to drop a bit, although you will risk overflow in so\n" |
"// doing.\n" |
"//\n" |
"// 20150602 -- The sync logic lines have been completely redone. The\n" |
"// synchronization lines no longer go through the FIFO with the\n" |
"// left hand sum, but are kept out of memory. This allows the\n" |
"// butterfly to use more optimal memory resources, while also\n" |
"// guaranteeing that the sync lines can be properly reset upon\n" |
"// any reset signal.\n" |
"//\n" |
"//\n%s" |
"//\n", prjname, creator); |
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
|
fprintf(fp, |
"module\tbutterfly(i_clk, i_rst, i_ce, i_coef, i_left, i_right, i_aux,\n" |
"\t\to_left, o_right, o_aux);\n" |
"\t// Public changeable parameters ...\n" |
"\tparameter IWIDTH=%d,", TST_BUTTERFLY_IWIDTH); |
#ifdef TST_BUTTERFLY_CWIDTH |
fprintf(fp, "CWIDTH=%d,", TST_BUTTERFLY_CWIDTH); |
#else |
fprintf(fp, "CWIDTH=IWIDTH+%d,", xtracbits); |
#endif |
#ifdef TST_BUTTERFLY_OWIDTH |
fprintf(fp, "OWIDTH=%d;\n", TST_BUTTERFLY_OWIDTH); |
#else |
fprintf(fp, "OWIDTH=IWIDTH+1;\n"); |
#endif |
fprintf(fp, |
"\t// Parameters specific to the core that should not be changed.\n" |
"\tparameter MPYDELAY=%d'd%d,\n" |
"\t\t\tSHIFT=0, AUXLEN=(MPYDELAY+3);\n" |
"\t// The LGDELAY should be the base two log of the MPYDELAY. If\n" |
"\t// this value is fractional, then round up to the nearest\n" |
"\t// integer: LGDELAY=ceil(log(MPYDELAY)/log(2));\n" |
"\tparameter\tLGDELAY=%d;\n" |
"\tinput\t\ti_clk, i_rst, i_ce;\n" |
"\tinput\t\t[(2*CWIDTH-1):0] i_coef;\n" |
"\tinput\t\t[(2*IWIDTH-1):0] i_left, i_right;\n" |
"\tinput\t\ti_aux;\n" |
"\toutput\twire [(2*OWIDTH-1):0] o_left, o_right;\n" |
"\toutput\treg\to_aux;\n" |
"\n", lgdelay(16,xtracbits), bflydelay(16, xtracbits), |
lgdelay(16,xtracbits)); |
fprintf(fp, |
"\treg\t[(2*IWIDTH-1):0]\tr_left, r_right;\n" |
"\treg\t[(2*CWIDTH-1):0]\tr_coef, r_coef_2;\n" |
"\twire\tsigned\t[(IWIDTH-1):0]\tr_left_r, r_left_i, r_right_r, r_right_i;\n" |
"\tassign\tr_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\tr_left_i = r_left[ (IWIDTH-1):0];\n" |
"\tassign\tr_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\tr_right_i = r_right[(IWIDTH-1):0];\n" |
"\talways @(*)\n" |
"\tif (wait_for_sync)\n" |
"\t assert((iaddr == 0)&&(f_state == 2'b00)&&(!o_sync)&&(!f_rsyncd));\n" |
"\n" |
"\treg\tsigned\t[(IWIDTH):0]\tr_sum_r, r_sum_i, r_dif_r, r_dif_i;\n" |
"\n" |
"\treg [(LGDELAY-1):0] fifo_addr;\n" |
"\twire [(LGDELAY-1):0] fifo_read_addr;\n" |
"\tassign\tfifo_read_addr = fifo_addr - MPYDELAY;\n" |
"\treg [(2*IWIDTH+1):0] fifo_left [ 0:((1<<LGDELAY)-1)];\n" |
"\n"); |
fprintf(fp, |
"\t// Set up the input to the multiply\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// One clock just latches the inputs\n" |
"\t\t\tr_left <= i_left; // No change in # of bits\n" |
"\t\t\tr_right <= i_right;\n" |
"\t\t\tr_coef <= i_coef;\n" |
"\t\t\t// Next clock adds/subtracts\n" |
"\t\t\tr_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits\n" |
"\t\t\tr_sum_i <= r_left_i + r_right_i;\n" |
"\t\t\tr_dif_r <= r_left_r - r_right_r;\n" |
"\t\t\tr_dif_i <= r_left_i - r_right_i;\n" |
"\t\t\t// Other inputs are simply delayed on second clock\n" |
"\t\t\tr_coef_2<= r_coef;\n" |
"\t\tend\n" |
"\n"); |
fprintf(fp, |
"\t// Don\'t forget to record the even side, since it doesn\'t need\n" |
"\t// to be multiplied, but yet we still need the results in sync\n" |
"\t// with the answer when it is ready.\n" |
"\tinitial fifo_addr = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\t\tfifo_addr <= 0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\t// Need to delay the sum side--nothing else happens\n" |
"\t\t\t// to it, but it needs to stay synchronized with the\n" |
"\t\t\t// right side.\n" |
"\t\t\tfifo_addr <= fifo_addr + 1;\n" |
"\tif ((f_past_valid)&&($past(i_ce))&&($past(i_sync))&&(!$past(i_reset)))\n" |
"\t assert(!wait_for_sync);\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\tfifo_left[fifo_addr] <= { r_sum_r, r_sum_i };\n" |
"\n" |
"\twire\tsigned\t[(CWIDTH-1):0] ir_coef_r, ir_coef_i;\n" |
"\tassign\tir_coef_r = r_coef_2[(2*CWIDTH-1):CWIDTH];\n" |
"\tassign\tir_coef_i = r_coef_2[(CWIDTH-1):0];\n" |
"\twire\tsigned\t[((IWIDTH+2)+(CWIDTH+1)-1):0]\tp_one, p_two, p_three;\n" |
"\n" |
"\n"); |
fprintf(fp, |
"\t// The multiply output width is always the sum of the widths of\n" |
"\t// the two inputs. ALWAYS. This is independent of the number of\n" |
"\t// bits in p_one, p_two, or p_three. These values needed to \n" |
"\t// accumulate a bit (or two) each. However, this approach to a\n" |
"\t// three multiply complex multiply cannot increase the total\n" |
"\t// number of bits in our final output. We\'ll take care of\n" |
"\t// dropping back down to the proper width, OWIDTH, in our routine\n" |
"\t// below.\n" |
"\n" |
"\n"); |
fprintf(fp, |
"\t// We accomplish here \"Karatsuba\" multiplication. That is,\n" |
"\t// by doing three multiplies we accomplish the work of four.\n" |
"\t// Let\'s prove to ourselves that this works ... We wish to\n" |
"\t// multiply: (a+jb) * (c+jd), where a+jb is given by\n" |
"\t//\ta + jb = r_dif_r + j r_dif_i, and\n" |
"\t//\tc + jd = ir_coef_r + j ir_coef_i.\n" |
"\t// We do this by calculating the intermediate products P1, P2,\n" |
"\t// and P3 as\n" |
"\t//\tP1 = ac\n" |
"\t//\tP2 = bd\n" |
"\t//\tP3 = (a + b) * (c + d)\n" |
"\t// and then complete our final answer with\n" |
"\t//\tac - bd = P1 - P2 (this checks)\n" |
"\t//\tad + bc = P3 - P2 - P1\n" |
"\t//\t = (ac + bc + ad + bd) - bd - ac\n" |
"\t//\t = bc + ad (this checks)\n" |
"\n" |
"\n"); |
fprintf(fp, |
"\t// This should really be based upon an IF, such as in\n" |
"\t// if (IWIDTH < CWIDTH) then ...\n" |
"\t// However, this is the only (other) way I know to do it.\n" |
"\tgenerate if (CWIDTH < IWIDTH+1)\n" |
"\tif ((f_state == 2'b01)&&(f_syncd))\n" |
"\tbegin\n" |
"\t\twire\t[(CWIDTH):0]\tp3c_in;\n" |
"\t\twire\t[(IWIDTH+1):0]\tp3d_in;\n" |
"\t\tassign\tp3c_in = ir_coef_i + ir_coef_r;\n" |
"\t\tassign\tp3d_in = r_dif_r + r_dif_i;\n" |
"\n" |
"\t\t// We need to pad these first two multiplies by an extra\n" |
"\t\t// bit just to keep them aligned with the third,\n" |
"\t\t// simpler, multiply.\n" |
"\t\t%s #(CWIDTH+1,IWIDTH+2) p1(i_clk, i_ce,\n" |
"\t\t\t\t{ir_coef_r[CWIDTH-1],ir_coef_r},\n" |
"\t\t\t\t{r_dif_r[IWIDTH],r_dif_r}, p_one);\n" |
"\t\t%s #(CWIDTH+1,IWIDTH+2) p2(i_clk, i_ce,\n" |
"\t\t\t\t{ir_coef_i[CWIDTH-1],ir_coef_i},\n" |
"\t\t\t\t{r_dif_i[IWIDTH],r_dif_i}, p_two);\n" |
"\t\t%s #(CWIDTH+1,IWIDTH+2) p3(i_clk, i_ce,\n" |
"\t\t\t\tp3c_in, p3d_in, p_three);\n" |
"\tend else begin\n" |
"\t\twire\t[(CWIDTH):0]\tp3c_in;\n" |
"\t\twire\t[(IWIDTH+1):0]\tp3d_in;\n" |
"\t\tassign\tp3c_in = ir_coef_i + ir_coef_r;\n" |
"\t\tassign\tp3d_in = r_dif_r + r_dif_i;\n" |
"\n" |
"\t\t%s #(IWIDTH+2,CWIDTH+1) p1a(i_clk, i_ce,\n" |
"\t\t\t\t{r_dif_r[IWIDTH],r_dif_r},\n" |
"\t\t\t\t{ir_coef_r[CWIDTH-1],ir_coef_r}, p_one);\n" |
"\t\t%s #(IWIDTH+2,CWIDTH+1) p2a(i_clk, i_ce,\n" |
"\t\t\t\t{r_dif_i[IWIDTH], r_dif_i},\n" |
"\t\t\t\t{ir_coef_i[CWIDTH-1],ir_coef_i}, p_two);\n" |
"\t\t%s #(IWIDTH+2,CWIDTH+1) p3a(i_clk, i_ce,\n" |
"\t\t\t\tp3d_in, p3c_in, p_three);\n" |
"\t assert(!o_sync);\n" |
"\t if (INVERSE)\n" |
"\t begin\n" |
"\t assert(f_o_real == -f_piped_imag[7]+f_piped_imag[5]);\n" |
"\t assert(f_o_imag == f_piped_real[7]-f_piped_real[5]);\n" |
"\t end else begin\n" |
"\t assert(f_o_real == f_piped_imag[7]-f_piped_imag[5]);\n" |
"\t assert(f_o_imag == -f_piped_real[7]+f_piped_real[5]);\n" |
"\t end\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n", |
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy", |
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy", |
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy", |
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy", |
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy", |
(USE_OLD_MULTIPLY)?"shiftaddmpy":"longbimpy"); |
fprintf(fp, |
"\t// These values are held in memory and delayed during the\n" |
"\t// multiply. Here, we recover them. During the multiply,\n" |
"\t// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},\n" |
"\t// therefore, the left_x values need to be left shifted by\n" |
"\t// CWIDTH-2 as well. The additional bits come from a sign\n" |
"\t// extension.\n" |
"\twire\tsigned\t[(IWIDTH+CWIDTH):0] fifo_i, fifo_r;\n" |
"\treg\t\t[(2*IWIDTH+1):0] fifo_read;\n" |
"\tassign\tfifo_r = { {2{fifo_read[2*(IWIDTH+1)-1]}}, fifo_read[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1\'b0}} };\n" |
"\tassign\tfifo_i = { {2{fifo_read[(IWIDTH+1)-1]}}, fifo_read[((IWIDTH+1)-1):0], {(CWIDTH-2){1\'b0}} };\n" |
"\n" |
"\n" |
"\treg\tsigned\t[(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;\n" |
"\n"); |
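The FIFO path above pads the delayed left-hand sum with `CWIDTH-2` zero bits. One way to see why: assume twiddles are quantized with round-to-nearest at a scale where 1.0 maps to 2^(CWIDTH-2), a value that still fits in a `CWIDTH`-bit signed word. The right-hand product then already carries that factor, so the left-hand sum must be multiplied by it too before both halves share the same rounding stage. The quantizer and all function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scale applied to every quantized twiddle: 1.0 -> 2^(CWIDTH-2)
int64_t twiddle_scale(int cwidth) {
    assert(cwidth >= 2 && cwidth < 62);
    return int64_t(1) << (cwidth - 2);
}

// Quantized real part of exp(-j*2*pi*k/n) at that scale
// (round-to-nearest is an assumption, not the generator's quantizer)
int64_t twiddle_real(int k, int n, int cwidth) {
    const double pi = std::acos(-1.0);
    return std::llround(std::cos(2.0 * pi * k / n) * double(twiddle_scale(cwidth)));
}

// The left sum never meets a twiddle, so it is multiplied by the same
// factor; in the Verilog this is the {(CWIDTH-2){1'b0}} zero padding.
int64_t align_left(int64_t left_sum, int cwidth) {
    return left_sum * twiddle_scale(cwidth);
}
```

With `CWIDTH = 12`, a twiddle of exactly 1 quantizes to 1024, so a difference multiplied by it lands on the same scale that `align_left` gives the sum.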
fprintf(fp, |
"\t// Let's do some rounding and remove unnecessary bits.\n" |
"\t// We have (IWIDTH+CWIDTH+3) bits here and need to drop down to\n" |
"\t// OWIDTH, shifting by SHIFT bits in the process. The trick is\n" |
"\t// that we don\'t need (IWIDTH+CWIDTH+3) bits. We\'ve accumulated\n" |
"\t// them, but the actual values will never fill all these bits.\n" |
"\t// In particular, we only need:\n" |
"\t//\t IWIDTH bits for the input\n" |
"\t//\t +1 bit for the add/subtract\n" |
"\t//\t+CWIDTH bits for the coefficient multiply\n" |
"\t//\t +1 bit for the add/subtract in the complex multiply\n" |
"\t//\t ------\n" |
"\t//\t (IWIDTH+CWIDTH+2) bits at full precision.\n" |
"\t//\n" |
"\t// However, the coefficient multiply multiplied by a maximum value\n" |
"\t// of 2^(CWIDTH-2). Thus, we only have\n" |
"\t//\t IWIDTH bits for the input\n" |
"\t//\t +1 bit for the add/subtract\n" |
"\t//\t+CWIDTH-2 bits for the coefficient multiply\n" |
"\t//\t +1 (optional) bit for the add/subtract in the cpx mpy.\n" |
"\t//\t -------- (This last bit may be shifted out.)\n" |
"\t//\t (IWIDTH+CWIDTH) valid output bits. \n" |
"\t// Now, if the user wants to keep any extras of these (via OWIDTH),\n" |
"\t// or if he wishes to arbitrarily shift some of these off (via\n" |
"\t// SHIFT) we accomplish that here.\n" |
"\n"); |
fprintf(fp, |
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;\n\n"); |
"`endif\n"); |
} |
|
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_r(i_clk, i_ce,\n" |
"\t\t\t\t{ {2{fifo_r[(IWIDTH+CWIDTH)]}}, fifo_r }, rnd_left_r);\n\n", |
rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_left_i(i_clk, i_ce,\n" |
"\t\t\t\t{ {2{fifo_i[(IWIDTH+CWIDTH)]}}, fifo_i }, rnd_left_i);\n\n", |
rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,\n" |
"\t\t\t\tmpy_r, rnd_right_r);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,\n" |
"\t\t\t\tmpy_i, rnd_right_i);\n\n", rnd_string); |
fprintf(fp, |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// First clock, recover all values\n" |
"\t\t\tfifo_read <= fifo_left[fifo_read_addr];\n" |
"\t\t\t// These values are IWIDTH+CWIDTH+3 bits wide\n" |
"\t\t\t// although they only need to be (IWIDTH+1)\n" |
"\t\t\t// + (CWIDTH) bits wide. (We\'ve got two\n" |
"\t\t\t// extra bits we need to get rid of.)\n" |
"\t\t\tmpy_r <= p_one - p_two;\n" |
"\t\t\tmpy_i <= p_three - p_one - p_two;\n" |
"\t\tend\n" |
"\n"); |
fprintf(fp, "endmodule\n"); |
} |
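The three-multiply identity in the comments can be checked exhaustively on small integers. The C++ sketch below compares the textbook four-multiply complex product with the P1/P2/P3 form. It uses exact 64-bit integers and ignores the widths and pipelining of the hardware, and the `Cpx`, `cmul4` and `cmul3` names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

struct Cpx { int64_t re, im; };

// Four-multiply reference: (a + jb)(c + jd) = (ac - bd) + j(ad + bc)
Cpx cmul4(Cpx x, Cpx y) {
    return { x.re * y.re - x.im * y.im, x.re * y.im + x.im * y.re };
}

// Three-multiply form: P1 = ac, P2 = bd, P3 = (a + b)(c + d)
Cpx cmul3(Cpx x, Cpx y) {
    const int64_t p1 = x.re * y.re;
    const int64_t p2 = x.im * y.im;
    const int64_t p3 = (x.re + x.im) * (y.re + y.im);
    // Real part: ac - bd.  Imaginary part: (ac+ad+bc+bd) - ac - bd = ad + bc
    return { p1 - p2, p3 - p1 - p2 };
}
```

The saving costs extra adds, and the sums a+b and c+d each need one more bit, which matches the generated `p3` multiply taking `CWIDTH+1` and `IWIDTH+2` bit inputs.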
|
fprintf(fp, |
"\treg\t[(AUXLEN-1):0]\taux_pipeline;\n" |
"\tinitial\taux_pipeline = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\t\taux_pipeline <= 0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\taux_pipeline <= { aux_pipeline[(AUXLEN-2):0], i_aux };\n" |
"\n"); |
fprintf(fp, |
"\tinitial o_aux = 1\'b0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\t\to_aux <= 1\'b0;\n" |
"\t\telse if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// Second clock, latch for final clock\n" |
"\t\t\to_aux <= aux_pipeline[AUXLEN-1];\n" |
"\t\tend\n" |
"\n"); |
|
fprintf(fp, |
"\t// As a final step, we pack our outputs into two packed two\'s\n" |
"\t// complement numbers per output word, so that each output word\n" |
"\t// has (2*OWIDTH) bits in it, with the top half being the real\n" |
"\t// portion and the bottom half being the imaginary portion.\n" |
"\tassign o_left = { rnd_left_r, rnd_left_i };\n" |
"\tassign o_right= { rnd_right_r,rnd_right_i};\n" |
"\n" |
"endmodule\n"); |
fclose(fp); |
} |
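The WIDTHS note argues that an X-bit by Y-bit two's complement product needs X+Y bits, but only just. A brute-force C++ check over small widths makes that concrete. `product_range` and `fits_signed` are hypothetical helpers, and the exhaustive loop is only practical for small X and Y.

```cpp
#include <cassert>
#include <cstdint>

struct Range { int64_t lo, hi; };

// Exhaustively find the range of a*b for X-bit a and Y-bit b
// (the loop visits 2^(X+Y) pairs, so keep X and Y small)
Range product_range(int x, int y) {
    const int64_t amin = -(int64_t(1) << (x - 1)), amax = (int64_t(1) << (x - 1)) - 1;
    const int64_t bmin = -(int64_t(1) << (y - 1)), bmax = (int64_t(1) << (y - 1)) - 1;
    Range r{0, 0};  // 0 is always reachable (a = 0)
    for (int64_t a = amin; a <= amax; a++)
        for (int64_t b = bmin; b <= bmax; b++) {
            if (a * b < r.lo) r.lo = a * b;
            if (a * b > r.hi) r.hi = a * b;
        }
    return r;
}

// True if v fits in an n-bit two's complement word
bool fits_signed(int64_t v, int n) {
    return v >= -(int64_t(1) << (n - 1)) && v <= (int64_t(1) << (n - 1)) - 1;
}
```

For X = 5 and Y = 3 the range is -60 to 64. Only the corner case (-16)*(-4) = 64 needs the eighth bit, so dropping to X+Y-1 bits overflows for that single input pair. The minimum, -60 = -2^(X+Y-2)+2^(Y-1), comes from the most negative Y-bit value times the largest X-bit value.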
|
void build_hwbfly(const char *fname, int xtracbits, ROUND_T rounding) { |
void build_sngllast(const char *fname, const bool async_reset = false) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
1777,472 → 740,221
return; |
} |
|
const char *rnd_string; |
if (rounding == RND_TRUNCATE) |
rnd_string = "truncate"; |
else if (rounding == RND_FROMZERO) |
rnd_string = "roundfromzero"; |
else if (rounding == RND_HALFUP) |
rnd_string = "roundhalfup"; |
else |
rnd_string = "convround"; |
std::string resetw("i_reset"); |
if (async_reset) |
resetw = std::string("i_areset_n"); |
|
|
fprintf(fp, |
"///////////////////////////////////////////////////////////////////////////\n" |
SLASHLINE |
"//\n" |
"// Filename: hwbfly.v\n" |
"// Filename:\tlaststage.v\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: This routine is identical to the one found in\n" |
"// 'butterfly.v', save only that it uses the Verilog '*'\n" |
"// operator in the hope that the synthesizer will be able to map\n" |
"// it onto hardware multipliers.\n" |
"// Purpose: This is part of an FPGA implementation that will process\n" |
"// the final stage of a decimate-in-frequency FFT, running\n" |
"// through the data at one sample per clock.\n" |
"//\n" |
"// It is understood that a hardware multiply can complete its operation in\n" |
"// a single clock.\n" |
"//\n" |
"//\n%s" |
"//\n", prjname, creator); |
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
|
fprintf(fp, |
"module hwbfly(i_clk, i_rst, i_ce, i_coef, i_left, i_right, i_aux,\n" |
"\t\to_left, o_right, o_aux);\n" |
"\t// Public changeable parameters ...\n" |
"\tparameter IWIDTH=16,CWIDTH=IWIDTH+%d,OWIDTH=IWIDTH+1;\n" |
"\t// Parameters specific to the core that should not be changed.\n" |
"\tparameter\tSHIFT=0;\n" |
"\tinput\t\ti_clk, i_rst, i_ce;\n" |
"\tinput\t\t[(2*CWIDTH-1):0]\ti_coef;\n" |
"\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n" |
"\tinput\t\ti_aux;\n" |
"\toutput\twire\t[(2*OWIDTH-1):0]\to_left, o_right;\n" |
"\toutput\treg\to_aux;\n" |
"\n", xtracbits); |
"module laststage(i_clk, %s, i_ce, i_sync, i_val, o_val, o_sync);\n" |
" parameter IWIDTH=16,OWIDTH=IWIDTH+1, SHIFT=0;\n" |
" input i_clk, %s, i_ce, i_sync;\n" |
" input [(2*IWIDTH-1):0] i_val;\n" |
" output wire [(2*OWIDTH-1):0] o_val;\n" |
" output reg o_sync;\n\n", |
resetw.c_str(), resetw.c_str()); |
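In a decimation-in-frequency FFT the final stage is a two-point butterfly whose twiddle factor is 1, so it reduces to a sum and a difference of adjacent samples. Assuming that pairing (samples 2n and 2n+1), the C++ sketch below models what such a stage computes over a whole block. It ignores the one-sample-per-clock timing, `i_sync` handling and rounding to `OWIDTH`, and its names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Cpx { int64_t re, im; };

// Final DIF stage model: twiddle-free two-point butterflies on
// consecutive sample pairs, emitted in the same order they arrive.
std::vector<Cpx> last_stage(const std::vector<Cpx>& in) {
    assert(in.size() % 2 == 0);
    std::vector<Cpx> out;
    out.reserve(in.size());
    for (std::size_t n = 0; n < in.size(); n += 2) {
        const Cpx a = in[n], b = in[n + 1];
        // Sum and difference each need one more bit than the inputs
        out.push_back({a.re + b.re, a.im + b.im});
        out.push_back({a.re - b.re, a.im - b.im});
    }
    return out;
}
```

Each output grows by one bit relative to its inputs, in line with the comment that the intermediate values need one more bit than the originals.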
|
fprintf(fp, |
"\treg\t[(2*IWIDTH-1):0] r_left, r_right;\n" |
"\treg\t r_aux, r_aux_2;\n" |
"\treg\t[(2*CWIDTH-1):0] r_coef;\n" |
"\twire signed [(IWIDTH-1):0] r_left_r, r_left_i, r_right_r, r_right_i;\n" |
"\tassign\tr_left_r = r_left[ (2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\tr_left_i = r_left[ (IWIDTH-1):0];\n" |
"\tassign\tr_right_r = r_right[(2*IWIDTH-1):(IWIDTH)];\n" |
"\tassign\tr_right_i = r_right[(IWIDTH-1):0];\n" |
"\treg signed [(CWIDTH-1):0] ir_coef_r, ir_coef_i;\n" |
" reg signed [(IWIDTH-1):0] m_r, m_i;\n" |
" wire signed [(IWIDTH-1):0] i_r, i_i;\n" |
"\n" |
"\treg signed [(IWIDTH):0] r_sum_r, r_sum_i, r_dif_r, r_dif_i;\n" |
" assign i_r = i_val[(2*IWIDTH-1):(IWIDTH)]; \n" |
" assign i_i = i_val[(IWIDTH-1):0]; \n" |
"\n" |
"\treg [(2*IWIDTH+2):0] leftv, leftvv;\n" |
" // Don't forget that we accumulate a bit by adding two values\n" |
" // together. Therefore our intermediate value must have one more\n" |
" // bit than the two originals.\n" |
" reg signed [(IWIDTH):0] rnd_r, rnd_i, sto_r, sto_i;\n" |
" reg wait_for_sync, stage;\n" |
" reg [1:0] sync_pipe;\n" |
"\n" |
"\t// Set up the input to the multiply\n" |
"\tinitial r_aux = 1\'b0;\n" |
"\tinitial r_aux_2 = 1\'b0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tbegin\n" |
"\t\t\tr_aux <= 1\'b0;\n" |
"\t\t\tr_aux_2 <= 1\'b0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// One clock just latches the inputs\n" |
"\t\t\tr_aux <= i_aux;\n" |
"\t\t\t// Next clock adds/subtracts\n" |
"\t\t\t// Other inputs are simply delayed on second clock\n" |
"\t\t\tr_aux_2 <= r_aux;\n" |
"\t\tend\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// One clock just latches the inputs\n" |
"\t\t\tr_left <= i_left; // No change in # of bits\n" |
"\t\t\tr_right <= i_right;\n" |
"\t\t\tr_coef <= i_coef;\n" |
"\t\t\t// Next clock adds/subtracts\n" |
"\t\t\tr_sum_r <= r_left_r + r_right_r; // Now IWIDTH+1 bits\n" |
"\t\t\tr_sum_i <= r_left_i + r_right_i;\n" |
"\t\t\tr_dif_r <= r_left_r - r_right_r;\n" |
"\t\t\tr_dif_i <= r_left_i - r_right_i;\n" |
"\t\t\t// Other inputs are simply delayed on second clock\n" |
"\t\t\tir_coef_r <= r_coef[(2*CWIDTH-1):CWIDTH];\n" |
"\t\t\tir_coef_i <= r_coef[(CWIDTH-1):0];\n" |
"\t\tend\n" |
"\n\n"); |
fprintf(fp, |
"\t// See comments in the butterfly.v source file for a discussion of\n" |
"\t// these operations and the appropriate bit widths.\n\n"); |
fprintf(fp, |
"\treg\tsigned [((IWIDTH+1)+(CWIDTH)-1):0] p_one, p_two;\n" |
"\treg\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] p_three;\n" |
"\n" |
"\treg\tsigned [(CWIDTH-1):0] p1c_in, p2c_in; // Coefficient multiply inputs\n" |
"\treg\tsigned [(IWIDTH):0] p1d_in, p2d_in; // Data multiply inputs\n" |
"\treg\tsigned [(CWIDTH):0] p3c_in; // Product 3, coefficient input\n" |
"\treg\tsigned [(IWIDTH+1):0] p3d_in; // Product 3, data input\n" |
"\n" |
"\tinitial leftv = 0;\n" |
"\tinitial leftvv = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\tbegin\n" |
"\t\tif (i_rst)\n" |
"\t\tbegin\n" |
"\t\t\tleftv <= 0;\n" |
"\t\t\tleftvv <= 0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// Second clock, pipeline = 1\n" |
"\t\t\tleftv <= { r_aux_2, r_sum_r, r_sum_i };\n" |
"\n" |
"\t\t\t// Third clock, pipeline = 3\n" |
"\t\t\tleftvv <= leftv;\n" |
"\t\tend\n" |
"\tend\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// Second clock, pipeline = 1\n" |
"\t\t\tp1c_in <= ir_coef_r;\n" |
"\t\t\tp2c_in <= ir_coef_i;\n" |
"\t\t\tp1d_in <= r_dif_r;\n" |
"\t\t\tp2d_in <= r_dif_i;\n" |
"\t\t\tp3c_in <= ir_coef_i + ir_coef_r;\n" |
"\t\t\tp3d_in <= r_dif_r + r_dif_i;\n" |
"\n" |
"\n" |
"\t\t\t// Third clock, pipeline = 3\n" |
"\t\t\t// As desired, each of these lines infers a DSP48\n" |
"\t\t\tp_one <= p1c_in * p1d_in;\n" |
"\t\t\tp_two <= p2c_in * p2d_in;\n" |
"\t\t\tp_three <= p3c_in * p3d_in;\n" |
"\t\tend\n" |
"\n" |
"\twire\tsigned [((IWIDTH+2)+(CWIDTH+1)-1):0] w_one, w_two;\n" |
"\tassign\tw_one = { {(2){p_one[((IWIDTH+1)+(CWIDTH)-1)]}}, p_one };\n" |
"\tassign\tw_two = { {(2){p_two[((IWIDTH+1)+(CWIDTH)-1)]}}, p_two };\n" |
"\n"); |
" initial wait_for_sync = 1'b1;\n" |
" initial stage = 1'b0;\n"); |
|
if (async_reset) |
fprintf(fp, "\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else |
fprintf(fp, "\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
fprintf(fp, |
"\t// These values are held in memory and delayed during the\n" |
"\t// multiply. Here, we recover them. During the multiply,\n" |
"\t// values were multiplied by 2^(CWIDTH-2)*exp{-j*2*pi*...},\n" |
"\t// therefore, the left_x values need to be left shifted by\n" |
"\t// CWIDTH-2 as well. The additional bits come from a sign\n" |
"\t// extension.\n" |
"\twire\taux_s;\n" |
"\twire\tsigned\t[(IWIDTH+CWIDTH):0] left_si, left_sr;\n" |
"\treg\t\t[(2*IWIDTH+2):0] left_saved;\n" |
"\tassign\tleft_sr = { {2{left_saved[2*(IWIDTH+1)-1]}}, left_saved[(2*(IWIDTH+1)-1):(IWIDTH+1)], {(CWIDTH-2){1\'b0}} };\n" |
"\tassign\tleft_si = { {2{left_saved[(IWIDTH+1)-1]}}, left_saved[((IWIDTH+1)-1):0], {(CWIDTH-2){1\'b0}} };\n" |
"\tassign\taux_s = left_saved[2*IWIDTH+2];\n" |
"\n" |
"\n" |
"\t(* use_dsp48=\"no\" *)\n" |
"\treg signed [(CWIDTH+IWIDTH+3-1):0] mpy_r, mpy_i;\n"); |
fprintf(fp, |
"\twire\tsigned\t[(OWIDTH-1):0]\trnd_left_r, rnd_left_i, rnd_right_r, rnd_right_i;\n\n"); |
" begin\n" |
" wait_for_sync <= 1'b1;\n" |
" stage <= 1'b0;\n" |
" end else if ((i_ce)&&((!wait_for_sync)||(i_sync))&&(!stage))\n" |
" begin\n" |
" wait_for_sync <= 1'b0;\n" |
" //\n" |
" stage <= 1'b1;\n" |
" //\n" |
" end else if (i_ce)\n" |
" stage <= 1'b0;\n\n"); |
|
fprintf(fp, "\tinitial\tsync_pipe = 0;\n"); |
if (async_reset) |
fprintf(fp, |
"\talways @(posedge i_clk, negedge i_areset_n)\n" |
"\tif (!i_areset_n)\n"); |
else |
fprintf(fp, |
"\talways @(posedge i_clk)\n" |
"\tif (i_reset)\n"); |
|
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_r(i_clk, i_ce,\n" |
"\t\t\t\tleft_sr, rnd_left_r);\n\n", |
rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+1,OWIDTH,SHIFT+2) do_rnd_left_i(i_clk, i_ce,\n" |
"\t\t\t\tleft_si, rnd_left_i);\n\n", |
rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_r(i_clk, i_ce,\n" |
"\t\t\t\tmpy_r, rnd_right_r);\n\n", rnd_string); |
fprintf(fp, |
"\t%s #(CWIDTH+IWIDTH+3,OWIDTH,SHIFT+4) do_rnd_right_i(i_clk, i_ce,\n" |
"\t\t\t\tmpy_i, rnd_right_i);\n\n", rnd_string); |
"\t\tsync_pipe <= 0;\n" |
"\telse if (i_ce)\n" |
"\t\tsync_pipe <= { sync_pipe[0], i_sync };\n\n"); |
|
fprintf(fp, "\tinitial\to_sync = 1\'b0;\n"); |
if (async_reset) |
fprintf(fp, |
"\talways @(posedge i_clk, negedge i_areset_n)\n" |
"\tif (!i_areset_n)\n"); |
else |
fprintf(fp, |
"\talways @(posedge i_clk)\n" |
"\tif (i_reset)\n"); |
|
fprintf(fp, |
"\tinitial left_saved = 0;\n" |
"\tinitial o_aux = 1\'b0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tbegin\n" |
"\t\t\tleft_saved <= 0;\n" |
"\t\t\to_aux <= 1\'b0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// First clock, recover all values\n" |
"\t\t\tleft_saved <= leftvv;\n" |
"\n" |
"\t\t\t// Second clock, round and latch for final clock\n" |
"\t\t\to_aux <= aux_s;\n" |
"\t\tend\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\t// These values are IWIDTH+CWIDTH+3 bits wide\n" |
"\t\t\t// although they only need to be (IWIDTH+1)\n" |
"\t\t\t// + (CWIDTH) bits wide. (We've got two\n" |
"\t\t\t// extra bits we need to get rid of.)\n" |
"\n" |
"\t\t\t// These two lines also infer DSP48\'s.\n" |
"\t\t\t// To keep from using extra DSP48 resources,\n" |
"\t\t\t// they are prevented from using DSP48\'s\n" |
"\t\t\t// by the (* use_dsp48 ... *) comment above.\n" |
"\t\t\tmpy_r <= w_one - w_two;\n" |
"\t\t\tmpy_i <= p_three - w_one - w_two;\n" |
"\t\tend\n" |
"\n"); |
"\t\to_sync <= 1\'b0;\n" |
"\telse if (i_ce)\n" |
"\t\to_sync <= sync_pipe[1];\n\n"); |
|
fprintf(fp, |
"\t// As a final step, we pack our outputs into two packed two's\n" |
"\t// complement numbers per output word, so that each output word\n" |
"\t// has (2*OWIDTH) bits in it, with the top half being the real\n" |
"\t// portion and the bottom half being the imaginary portion.\n" |
"\tassign\to_left = { rnd_left_r, rnd_left_i };\n" |
"\tassign\to_right= { rnd_right_r,rnd_right_i};\n" |
" always @(posedge i_clk)\n" |
" if (i_ce)\n" |
" begin\n" |
" if (!stage)\n" |
" begin\n" |
" // Clock 1\n" |
" m_r <= i_r;\n" |
" m_i <= i_i;\n" |
" // Clock 3\n" |
" rnd_r <= sto_r;\n" |
" rnd_i <= sto_i;\n" |
" //\n" |
" end else begin\n" |
" // Clock 2\n" |
" rnd_r <= m_r + i_r;\n" |
" rnd_i <= m_i + i_i;\n" |
" //\n" |
" sto_r <= m_r - i_r;\n" |
" sto_i <= m_i - i_i;\n" |
" //\n" |
" end\n" |
" end\n" |
"\n" |
"endmodule\n"); |
|
} |
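The hardware butterfly emitted above forms each complex product with three real multiplies (`p_one`, `p_two`, `p_three`) instead of four, then recovers the real part as `p_one - p_two` and the imaginary part as `p_three - p_one - p_two`. It also pads the un-multiplied sum with CWIDTH-2 zero bits (`left_sr`, `left_si`) so that both outputs carry the same 2^(CWIDTH-2) twiddle scale before rounding. The sketch below is a minimal software model of that arithmetic, assuming plain 64-bit integers and no rounding; `Cplx`, `mpy3`, `mpy4` and `bfly_model` are hypothetical names, not part of fftgen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical complex integer type, used only in this sketch.
struct Cplx {
    int64_t r, i;
};

// Three-multiply complex product, mirroring the generated Verilog:
//   p_one   = cr * dr
//   p_two   = ci * di
//   p_three = (ci + cr) * (dr + di)
//   real    = p_one - p_two
//   imag    = p_three - p_one - p_two
inline Cplx mpy3(Cplx c, Cplx d) {
    const int64_t p_one   = c.r * d.r;
    const int64_t p_two   = c.i * d.i;
    const int64_t p_three = (c.i + c.r) * (d.r + d.i);
    return { p_one - p_two, p_three - p_one - p_two };
}

// Reference four-multiply product, used to check mpy3.
inline Cplx mpy4(Cplx c, Cplx d) {
    return { c.r * d.r - c.i * d.i, c.r * d.i + c.i * d.r };
}

// Radix-2 butterfly before rounding, assuming the twiddle w was already
// scaled by 2^(cwidth-2). The sum output is scaled by the same factor
// (multiplication avoids shifting negative values), just as the zero
// padding in left_sr/left_si does in hardware.
inline void bfly_model(Cplx a, Cplx b, Cplx w, int cwidth,
                       Cplx &left, Cplx &right) {
    assert(cwidth >= 2);
    const int64_t scale = int64_t{1} << (cwidth - 2);
    left  = { (a.r + b.r) * scale, (a.i + b.i) * scale };
    right = mpy3({ a.r - b.r, a.i - b.i }, w);
}
```

With `cwidth = 4` a twiddle of `{4, 0}` represents 1.0, so the difference output comes back scaled by 4, matching the scale of the sum output.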
|
void build_stage(const char *fname, const char *coredir, int stage, bool odd, int nbits, bool inv, int xtra, bool hwmpy=false, bool dbg=false) { |
FILE *fstage = fopen(fname, "w"); |
int cbits = nbits + xtra; |
|
if ((cbits * 2) >= sizeof(long long)*8) { |
fprintf(stderr, "ERROR: CMEM Coefficient precision requested overflows long long data type.\n"); |
exit(-1); |
} |
|
if (fstage == NULL) { |
fprintf(stderr, "ERROR: Could not open %s for writing!\n", fname); |
perror("O/S Err was:"); |
fprintf(stderr, "Attempting to continue, but this file will be missing.\n"); |
return; |
} |
|
fprintf(fstage, |
"////////////////////////////////////////////////////////////////////////////\n" |
"//\n" |
"// Filename: %sfftstage_%c%d%s.v\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: This file is (almost) a Verilog source file. It is meant to\n" |
"// be used by a FFT core compiler to generate FFTs which may be\n" |
"// used as part of an FFT core. Specifically, this file \n" |
"// encapsulates the options of an FFT-stage. For any 2^N length\n" |
"// FFT, there shall be (N-1) of these stages. \n" |
"//\n%s" |
"//\n", |
(inv)?"i":"", (odd)?'o':'e', stage*2, (dbg)?"_dbg":"", prjname, creator); |
fprintf(fstage, "%s", cpyleft); |
fprintf(fstage, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fstage, "module\t%sfftstage_%c%d%s(i_clk, i_rst, i_ce, i_sync, i_data, o_data, o_sync%s);\n", |
(inv)?"i":"", (odd)?'o':'e', stage*2, (dbg)?"_dbg":"", |
(dbg)?", o_dbg":""); |
// These parameter values are useless at this point--they are to be |
// replaced by the parameter values in the calling program. Only |
// problem is, the CWIDTH needs to match exactly! |
fprintf(fstage, "\tparameter\tIWIDTH=%d,CWIDTH=%d,OWIDTH=%d;\n", |
nbits, cbits, nbits+1); |
fprintf(fstage, |
"\t// Parameters specific to the core that should be changed when this\n" |
"\t// core is built ... Note that the minimum LGSPAN (the base two log\n" |
"\t// of the span, or the base two log of the current FFT size) is 3.\n" |
"\t// Smaller spans (i.e. the span of 2) must use the dblstage module.\n" |
"\tparameter\tLGWIDTH=11, LGSPAN=9, LGBDLY=5, BFLYSHIFT=0;\n"); |
fprintf(fstage, |
"\tinput i_clk, i_rst, i_ce, i_sync;\n" |
"\tinput [(2*IWIDTH-1):0] i_data;\n" |
"\toutput reg [(2*OWIDTH-1):0] o_data;\n" |
"\toutput reg o_sync;\n" |
"\n"); |
if (dbg) { fprintf(fstage, "\toutput\twire\t[33:0]\t\t\to_dbg;\n" |
"\tassign\to_dbg = { ((o_sync)&&(i_ce)), i_ce, o_data[(2*OWIDTH-1):(2*OWIDTH-16)],\n" |
"\t\t\t\t\to_data[(OWIDTH-1):(OWIDTH-16)] };\n" |
"\n"); |
} |
fprintf(fstage, |
"\treg wait_for_sync;\n" |
"\treg [(2*IWIDTH-1):0] ib_a, ib_b;\n" |
"\treg [(2*CWIDTH-1):0] ib_c;\n" |
"\treg ib_sync;\n" |
" // Now that we have our results, let's round them and report them\n" |
" wire signed [(OWIDTH-1):0] o_r, o_i;\n" |
"\n" |
"\treg b_started;\n" |
"\twire ob_sync;\n" |
"\twire [(2*OWIDTH-1):0]\tob_a, ob_b;\n"); |
fprintf(fstage, |
" convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_r(i_clk, i_ce, rnd_r, o_r);\n" |
" convround #(IWIDTH+1,OWIDTH,SHIFT) do_rnd_i(i_clk, i_ce, rnd_i, o_i);\n" |
"\n" |
"\t// %scmem is defined as an array of real and complex values,\n" |
"\t// where the top CWIDTH bits are the real value and the bottom\n" |
"\t// CWIDTH bits are the imaginary value.\n" |
"\t//\n" |
"\t// %scmem[i] = { (2^(CWIDTH-2)) * cos(2*pi*i/(2^LGWIDTH)),\n" |
"\t// (2^(CWIDTH-2)) * sin(2*pi*i/(2^LGWIDTH)) };\n" |
"\t//\n" |
"\treg [(2*CWIDTH-1):0] %scmem [0:((1<<LGSPAN)-1)];\n" |
"\tinitial\t$readmemh(\"%scmem_%c%d.hex\",%scmem);\n\n", |
(inv)?"i":"", (inv)?"i":"", (inv)?"i":"", |
(inv)?"i":"", (odd)?'o':'e',stage<<1, (inv)?"i":""); |
{ |
FILE *cmem; |
" assign o_val = { o_r, o_i };\n" |
"\n"); |
|
{ |
char *memfile, *ptr; |
|
memfile = new char[strlen(fname)+128]; |
strcpy(memfile, fname); |
if ((NULL != (ptr = strrchr(memfile, '/')))&&(ptr>memfile)) { |
ptr++; |
sprintf(ptr, "%scmem_%c%d.hex", (inv)?"i":"", (odd)?'o':'e', stage*2); |
} else { |
sprintf(memfile, "%s/%scmem_%c%d.hex", |
coredir, (inv)?"i":"", |
(odd)?'o':'e', stage*2); |
} |
// strcpy(&memfile[strlen(memfile)-2], ".hex"); |
cmem = fopen(memfile, "w"); |
if (NULL == cmem) { |
fprintf(stderr, "Could not open/write \'%s\' with FFT coefficients.\n", memfile); |
perror("Err from O/S:"); |
exit(-2); |
} |
|
delete[] memfile; |
} |
// fprintf(cmem, "// CBITS = %d, inv = %s\n", cbits, (inv)?"true":"false"); |
for(int i=0; i<stage/2; i++) { |
int k = 2*i+odd; |
double W = ((inv)?1:-1)*2.0*M_PI*k/(double)(2*stage); |
double c, s; |
long long ic, is, vl; |
|
c = cos(W); s = sin(W); |
ic = (long long)llround((1ll<<(cbits-2)) * c); |
is = (long long)llround((1ll<<(cbits-2)) * s); |
vl = (ic & (~(-1ll << (cbits)))); |
vl <<= (cbits); |
vl |= (is & (~(-1ll << (cbits)))); |
fprintf(cmem, "%0*llx\n", ((cbits*2+3)/4), vl); |
/* |
fprintf(cmem, "%0*llx\t\t// %f+j%f -> %llx +j%llx\n", |
((cbits*2+3)/4), vl, c, s, |
ic & (~(-1ll<<(((cbits+3)/4)*4))), |
is & (~(-1ll<<(((cbits+3)/4)*4)))); |
*/ |
} fclose(cmem); |
if (formal_property_flag) { |
fprintf(fp, |
"`ifdef FORMAL\n" |
"\treg f_past_valid;\n" |
"\tinitial f_past_valid = 1'b0;\n" |
"\talways @(posedge i_clk)\n" |
"\t f_past_valid <= 1'b1;\n" |
"\n" |
"`ifdef LASTSTAGE\n" |
"\talways @(posedge i_clk)\n" |
"\t assume((i_ce)||($past(i_ce))||($past(i_ce,2)));\n" |
"`endif\n" |
"\n" |
"\tinitial assert(IWIDTH+1 == OWIDTH);\n" |
"\n" |
"\treg signed [IWIDTH-1:0] f_piped_real [0:3];\n" |
"\treg signed [IWIDTH-1:0] f_piped_imag [0:3];\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_ce)\n" |
"\tbegin\n" |
"\t f_piped_real[0] <= i_val[2*IWIDTH-1:IWIDTH];\n" |
"\t f_piped_imag[0] <= i_val[ IWIDTH-1:0];\n" |
"\n" |
"\t f_piped_real[1] <= f_piped_real[0];\n" |
"\t f_piped_imag[1] <= f_piped_imag[0];\n" |
"\n" |
"\t f_piped_real[2] <= f_piped_real[1];\n" |
"\t f_piped_imag[2] <= f_piped_imag[1];\n" |
"\n" |
"\t f_piped_real[3] <= f_piped_real[2];\n" |
"\t f_piped_imag[3] <= f_piped_imag[2];\n" |
"\tend\n" |
"\n" |
"\twire f_syncd;\n" |
"\treg f_rsyncd;\n" |
"\n" |
"\tinitial f_rsyncd = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_reset)\n" |
"\t f_rsyncd <= 1'b0;\n" |
"\telse if (!f_rsyncd)\n" |
"\t f_rsyncd <= o_sync;\n" |
"\tassign f_syncd = (f_rsyncd)||(o_sync);\n" |
"\n" |
"\treg f_state;\n" |
"\tinitial f_state = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\tif (i_reset)\n" |
"\t f_state <= 0;\n" |
"\telse if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n" |
"\t f_state <= f_state + 1;\n" |
"\n" |
"\talways @(*)\n" |
"\tif (f_state != 0)\n" |
"\t assume(!i_sync);\n" |
"\n" |
"\talways @(*)\n" |
"\t assert(stage == f_state[0]);\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\tif ((f_state == 1'b1)&&(f_syncd))\n" |
"\tbegin\n" |
"\t assert(o_r == f_piped_real[2] + f_piped_real[1]);\n" |
"\t assert(o_i == f_piped_imag[2] + f_piped_imag[1]);\n" |
"\tend\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\tif ((f_state == 1'b0)&&(f_syncd))\n" |
"\tbegin\n" |
"\t assert(!o_sync);\n" |
"\t assert(o_r == f_piped_real[3] - f_piped_real[2]);\n" |
"\t assert(o_i == f_piped_imag[3] - f_piped_imag[2]);\n" |
"\tend\n" |
"\n" |
"\talways @(*)\n" |
"\tif (wait_for_sync)\n" |
"\tbegin\n" |
"\t assert(!f_rsyncd);\n" |
"\t assert(!o_sync);\n" |
"\t assert(f_state == 0);\n" |
"\tend\n\n"); |
} |
|
fprintf(fstage, |
"\treg [(LGWIDTH-2):0] iaddr;\n" |
"\treg [(2*IWIDTH-1):0] imem [0:((1<<LGSPAN)-1)];\n" |
"\n" |
"\treg [LGSPAN:0] oB;\n" |
"\treg [(2*OWIDTH-1):0] omem [0:((1<<LGSPAN)-1)];\n" |
"\n" |
"\tinitial wait_for_sync = 1\'b1;\n" |
"\tinitial iaddr = 0;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tbegin\n" |
"\t\t\twait_for_sync <= 1\'b1;\n" |
"\t\t\tiaddr <= 0;\n" |
"\t\tend\n" |
"\t\telse if ((i_ce)&&((!wait_for_sync)||(i_sync)))\n" |
"\t\tbegin\n" |
"\t\t\t//\n" |
"\t\t\t// First step: Record what we\'re not ready to use yet\n" |
"\t\t\t//\n" |
"\t\t\tiaddr <= iaddr + { {(LGWIDTH-2){1\'b0}}, 1\'b1 };\n" |
"\t\t\twait_for_sync <= 1\'b0;\n" |
"\t\tend\n" |
"\talways @(posedge i_clk) // Need to make certain here that we don\'t read\n" |
"\t\tif ((i_ce)&&(!iaddr[LGSPAN])) // and write the same address on\n" |
"\t\t\timem[iaddr[(LGSPAN-1):0]] <= i_data; // the same clk\n" |
"\n"); |
fprintf(fp, |
"`endif // FORMAL\n" |
"endmodule\n"); |
|
fprintf(fstage, |
"\t//\n" |
"\t// Now, we have all the inputs, so let\'s feed the butterfly\n" |
"\t//\n" |
"\tinitial ib_sync = 1\'b0;\n" |
"\talways\t@(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\t\tib_sync <= 1\'b0;\n" |
"\t\telse if ((i_ce)&&(iaddr[LGSPAN]))\n" |
"\t\t\tbegin\n" |
"\t\t\t\t// Set the sync to true on the very first\n" |
"\t\t\t\t// valid input in, and hence on the very\n" |
"\t\t\t\t// first valid data out per FFT.\n" |
"\t\t\t\tib_sync <= (iaddr==(1<<(LGSPAN)));\n" |
"\t\t\tend\n" |
"\talways\t@(posedge i_clk)\n" |
"\t\tif ((i_ce)&&(iaddr[LGSPAN]))\n" |
"\t\t\tbegin\n" |
"\t\t\t\t// One input from memory, ...\n" |
"\t\t\t\tib_a <= imem[iaddr[(LGSPAN-1):0]];\n" |
"\t\t\t\t// One input clocked in from the top\n" |
"\t\t\t\tib_b <= i_data;\n" |
"\t\t\t\t// and the coefficient or twiddle factor\n" |
"\t\t\t\tib_c <= %scmem[iaddr[(LGSPAN-1):0]];\n" |
"\t\t\tend\n\n", (inv)?"i":""); |
|
if (hwmpy) { |
fprintf(fstage, |
"\thwbfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),\n" |
"\t\t\t.SHIFT(BFLYSHIFT))\n" |
"\t\tbfly(i_clk, i_rst, i_ce, ib_c,\n" |
"\t\t\tib_a, ib_b, ib_sync, ob_a, ob_b, ob_sync);\n"); |
} else { |
fprintf(fstage, |
"\tbutterfly #(.IWIDTH(IWIDTH),.CWIDTH(CWIDTH),.OWIDTH(OWIDTH),\n" |
"\t\t\t.MPYDELAY(%d\'d%d),.LGDELAY(LGBDLY),.SHIFT(BFLYSHIFT))\n" |
"\t\tbfly(i_clk, i_rst, i_ce, ib_c,\n" |
"\t\t\tib_a, ib_b, ib_sync, ob_a, ob_b, ob_sync);\n", |
lgdelay(nbits, xtra), bflydelay(nbits, xtra)); |
} |
|
fprintf(fstage, |
"\t//\n" |
"\t// Next step: recover the outputs from the butterfly\n" |
"\t//\n" |
"\tinitial oB = 0;\n" |
"\tinitial o_sync = 0;\n" |
"\tinitial b_started = 0;\n" |
"\talways\t@(posedge i_clk)\n" |
"\t\tif (i_rst)\n" |
"\t\tbegin\n" |
"\t\t\toB <= 0;\n" |
"\t\t\to_sync <= 0;\n" |
"\t\t\tb_started <= 0;\n" |
"\t\tend else if (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\to_sync <= (!oB[LGSPAN])?ob_sync : 1\'b0;\n" |
"\t\t\tif (ob_sync||b_started)\n" |
"\t\t\t\toB <= oB + { {(LGSPAN){1\'b0}}, 1\'b1 };\n" |
"\t\t\tif ((ob_sync)&&(!oB[LGSPAN]))\n" |
"\t\t\t// A butterfly output is available\n" |
"\t\t\t\tb_started <= 1\'b1;\n" |
"\t\tend\n\n"); |
fprintf(fstage, |
"\treg [(LGSPAN-1):0]\t\tdly_addr;\n" |
"\treg [(2*OWIDTH-1):0]\tdly_value;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tdly_addr <= oB[(LGSPAN-1):0];\n" |
"\t\t\tdly_value <= ob_b;\n" |
"\t\tend\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\tomem[dly_addr] <= dly_value;\n" |
"\n"); |
fprintf(fstage, |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_data <= (!oB[LGSPAN])?ob_a : omem[oB[(LGSPAN-1):0]];\n" |
"\n"); |
fprintf(fstage, "endmodule\n"); |
fclose(fp); |
} |
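Each `cmem` entry written by `build_stage` packs one twiddle factor into a single 2*CWIDTH-bit word: the real part in the top CWIDTH bits and the imaginary part in the bottom CWIDTH bits, each rounded from 2^(CWIDTH-2) times cos/sin and masked to CWIDTH bits, then printed as zero-padded hex for `$readmemh`. The sketch below reproduces that packing for a single angle, assuming CWIDTH is at most 31 so the word fits in 64 bits (the same limit the generator enforces). `pack_twiddle` is a hypothetical helper; it masks with an unsigned `(1 << cbits) - 1` rather than `~(-1ll << cbits)`, because left-shifting a negative value is undefined before C++20.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>

// Pack exp(jW), scaled by 2^(cbits-2), into one hex word: the top cbits
// bits hold the real part, the bottom cbits bits the imaginary part, each
// as a cbits-wide two's complement value.
inline std::string pack_twiddle(double W, int cbits) {
    assert(cbits >= 2 && cbits <= 31);
    const double scale = static_cast<double>(1ll << (cbits - 2));
    const long long ic = std::llround(scale * std::cos(W));
    const long long is = std::llround(scale * std::sin(W));
    const uint64_t mask = (uint64_t{1} << cbits) - 1;
    const uint64_t vl = ((static_cast<uint64_t>(ic) & mask) << cbits)
                      | (static_cast<uint64_t>(is) & mask);
    // One hex digit per four bits, rounded up, zero padded.
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%0*llx", (cbits * 2 + 3) / 4,
                  static_cast<unsigned long long>(vl));
    return buf;
}
```

For example, with `cbits = 4` the angle 0 packs to `40` (real part 4, i.e. 1.0, imaginary part 0), and -pi/2 packs to `0c` (real part 0, imaginary part -4 in 4-bit two's complement).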
|
void usage(void) { |
2249,20 → 961,25
fprintf(stderr, |
"USAGE:\tfftgen [-f <size>] [-d dir] [-c cbits] [-n nbits] [-m mxbits] [-s]\n" |
// "\tfftgen -i\n" |
"\t-1\tBuild a normal FFT, running at one clock per complex sample, or (for\n" |
"\t\ta real FFT) at one clock per two real input samples.\n" |
"\t-1\tBuild a normal FFT, running at one clock per complex sample, or\n" |
"\t\t(for a real FFT) at one clock per two real input samples.\n" |
"\t-a <hdrname> Create a header of information describing the built-in\n" |
"\t\tparameters, useful for module-level testing with Verilator\n" |
"\t-c <cbits>\tCauses all internal complex coefficients to be\n" |
"\t\tlonger than the corresponding data bits, to help avoid\n" |
"\t\tcoefficient truncation errors. The default is %d bits longer\n" |
"\t\tthan the data bits.\n" |
"\t-d <dir>\tPlaces all of the generated verilog files into <dir>.\n" |
"\t\tThe default is a subdirectory of the current directory named %s.\n" |
"\t-f <size>\tSets the size of the FFT as the number of complex\n" |
"\t-d <dir> Places all of the generated verilog files into <dir>.\n" |
"\t\tThe default is a subdirectory of the current directory\n" |
"\t\tnamed %s.\n" |
"\t-f <size> Sets the size of the FFT as the number of complex\n" |
"\t\tsamples input to the transform. (No default value, this is\n" |
"\t\ta required parameter.)\n" |
"\t-i\tAn inverse FFT, meaning that the coefficients are\n" |
"\t\tgiven by e^{ j 2 pi k/N n }. The default is a forward FFT, with\n" |
"\t\tcoefficients given by e^{ -j 2 pi k/N n }.\n" |
"\t-k #\tSets # clocks per sample, used to minimize multiplies. Also\n" |
"\t\tsets one sample in per i_ce clock (opt -1)\n" |
"\t-m <mxbits>\tSets the maximum bit width that the FFT should ever\n" |
"\t\tproduce. Internal values greater than this value will be\n" |
"\t\ttruncated to this value. (The default value grows the input\n" |
2270,11 → 987,9
"\t-n <nbits>\tSets the bitwidth for values coming into the (i)FFT.\n" |
"\t\tThe default is %d bits input for each component of the two\n" |
"\t\tcomplex values into the FFT.\n" |
"\t-p <nmpy>\tSets the number of stages that will use any hardware \n" |
"\t\tmultiplication facility, instead of shift-add emulation.\n" |
"\t\tThree multiplies per butterfly, or six multiplies per stage will\n" |
"\t\tbe accelerated in this fashion. The default is not to use any\n" |
"\t\thardware multipliers.\n" |
"\t-p <nmpy> Sets the number of hardware multiplies (DSPs) to use, versus\n" |
"\t\tshift-add emulation. The default is not to use any hardware\n" |
"\t\tmultipliers.\n" |
"\t-r\tBuild a real-FFT at four input points per sample, rather than a\n" |
"\t\tcomplex FFT. (Default is a Complex FFT.)\n" |
"\t-s\tSkip the final bit reversal stage. This is useful in\n" |
2286,8 → 1001,8
"\t\tnot yet provide.)\n" |
"\t-S\tInclude the final bit reversal stage (default).\n" |
"\t-x <xtrabits>\tUse this many extra bits internally, before any final\n" |
"\t\trounding or truncation of the answer to the final number of bits.\n" |
"\t\tThe default is to use %d extra bits internally.\n", |
"\t\trounding or truncation of the answer to the final number of\n" |
"\t\tbits. The default is to use %d extra bits internally.\n", |
/* |
"\t-0\tA forward FFT (default), meaning that the coefficients are\n" |
"\t\tgiven by e^{-j 2 pi k/N n }.\n" |
2302,11 → 1017,14
int main(int argc, char **argv) { |
int fftsize = -1, lgsize = -1; |
int nbitsin = DEF_NBITSIN, xtracbits = DEF_XTRACBITS, |
nummpy=DEF_NMPY, nonmpy=2; |
int nbitsout, maxbitsout = -1, xtrapbits=DEF_XTRAPBITS; |
nummpy=DEF_NMPY, nmpypstage=6, mpy_stages; |
int nbitsout, maxbitsout = -1, xtrapbits=DEF_XTRAPBITS, ckpce = 0; |
const char *EMPTYSTR = ""; |
bool bitreverse = true, inverse=false, |
verbose_flag = false, single_clock = false, |
real_fft = false; |
verbose_flag = false, |
single_clock = false, |
real_fft = false, |
async_reset = false; |
FILE *vmain; |
std::string coredir = DEF_COREDIR, cmdline = "", hdrname = ""; |
ROUND_T rounding = RND_CONVERGENT; |
2318,6 → 1036,7
if (argc <= 1) |
usage(); |
|
// Copy the original command line before we mess with it |
cmdline = argv[0]; |
for(int argn=1; argn<argc; argn++) { |
cmdline += " "; |
2324,146 → 1043,87
cmdline += argv[argn]; |
} |
|
for(int argn=1; argn<argc; argn++) { |
if ('-' == argv[argn][0]) { |
for(int j=1; (argv[argn][j])&&(j<100); j++) { |
switch(argv[argn][j]) { |
/* |
case '0': |
inverse = false; |
{ int c; |
while((c = getopt(argc, argv, "12Aa:c:d:D:f:hik:m:n:p:rsSx:v")) != -1) { |
switch(c) { |
case '1': single_clock = true; break; |
case '2': single_clock = false; break; |
case 'A': async_reset = true; break; |
case 'a': hdrname = strdup(optarg); break; |
case 'c': xtracbits = atoi(optarg); break; |
case 'd': coredir = std::string(optarg); break; |
case 'D': dbgstage = atoi(optarg); break; |
case 'f': fftsize = atoi(optarg); |
{ int sln = strlen(optarg); |
if (!isdigit(optarg[sln-1])){ |
switch(optarg[sln-1]) { |
case 'k': case 'K': |
fftsize <<= 10; |
break; |
*/ |
case '1': |
single_clock = true; |
case 'm': case 'M': |
fftsize <<= 20; |
break; |
case 'a': |
if (argn+1 >= argc) { |
printf("ERR: No header filename given\n\n"); |
usage(); exit(-1); |
} |
hdrname = argv[++argn]; |
j+= 200; |
case 'g': case 'G': |
fftsize <<= 30; |
break; |
case 'c': |
if (argn+1 >= argc) { |
printf("ERR: No extra number of coefficient bits given!\n\n"); |
usage(); exit(-1); |
} |
xtracbits = atoi(argv[++argn]); |
j+= 200; |
break; |
case 'd': |
if (argn+1 >= argc) { |
printf("ERR: No directory given into which to place the core!\n\n"); |
usage(); exit(-1); |
} |
coredir = argv[++argn]; |
j += 200; |
break; |
case 'D': |
dbg = true; |
if (argn+1 >= argc) { |
printf("ERR: No debug stage number given!\n\n"); |
usage(); exit(-1); |
} |
dbgstage = atoi(argv[++argn]); |
j+= 200; |
break; |
case 'f': |
if (argn+1 >= argc) { |
printf("ERR: No FFT Size given!\n\n"); |
usage(); exit(-1); |
} |
fftsize = atoi(argv[++argn]); |
{ int sln = strlen(argv[argn]); |
if (!isdigit(argv[argn][sln-1])){ |
switch(argv[argn][sln-1]) { |
case 'k': case 'K': |
fftsize <<= 10; |
break; |
case 'm': case 'M': |
fftsize <<= 20; |
break; |
case 'g': case 'G': |
fftsize <<= 30; |
break; |
default: |
printf("ERR: Unknown FFT size, %s!\n", argv[argn]); |
exit(-1); |
} |
}} |
j += 200; |
break; |
case 'h': |
usage(); |
exit(0); |
break; |
case 'i': |
inverse = true; |
break; |
case 'm': |
if (argn+1 >= argc) { |
printf("ERR: No maximum output bit value given!\n\n"); |
exit(-1); |
} |
maxbitsout = atoi(argv[++argn]); |
j += 200; |
break; |
case 'n': |
if (argn+1 >= argc) { |
printf("ERR: No input bit size given!\n\n"); |
exit(-1); |
} |
nbitsin = atoi(argv[++argn]); |
j += 200; |
break; |
case 'p': |
if (argn+1 >= argc) { |
printf("ERR: No number given for number of hardware multiply stages!\n\n"); |
exit(-1); |
} |
nummpy = atoi(argv[++argn]); |
j += 200; |
break; |
case 'r': |
real_fft = true; |
break; |
case 'S': |
bitreverse = true; |
break; |
case 's': |
bitreverse = false; |
break; |
case 'x': |
if (argn+1 >= argc) { |
printf("ERR: No extra number of bits given!\n\n"); |
usage(); exit(-1); |
} j+= 200; |
xtrapbits = atoi(argv[++argn]); |
break; |
case 'v': |
verbose_flag = true; |
break; |
default: |
printf("Unknown argument, -%c\n", argv[argn][j]); |
usage(); |
exit(-1); |
} |
} |
} else { |
printf("Unrecognized argument, %s\n", argv[argn]); |
printf("ERR: Unknown FFT size, %s!\n", optarg); |
exit(EXIT_FAILURE); |
} |
}} break; |
case 'h': usage(); exit(EXIT_SUCCESS); break; |
case 'i': inverse = true; break; |
case 'k': ckpce = atoi(optarg); |
single_clock = true; |
break; |
case 'm': maxbitsout = atoi(optarg); break; |
case 'n': nbitsin = atoi(optarg); break; |
case 'p': nummpy = atoi(optarg); break; |
case 'r': real_fft = true; break; |
case 'S': bitreverse = true; break; |
case 's': bitreverse = false; break; |
case 'x': xtrapbits = atoi(optarg); break; |
case 'v': verbose_flag = true; break; |
// case 'z': variable_size = true; break; |
default: |
printf("Unknown argument, -%c\n", c); |
usage(); |
exit(-1); |
exit(EXIT_FAILURE); |
} |
}} |
|
if (verbose_flag) { |
if (inverse) |
printf("Building a %d point inverse FFT module, with %s outputs\n", |
fftsize, |
(real_fft)?"real ":"complex"); |
else |
printf("Building a %d point %sforward FFT module\n", |
fftsize, |
(real_fft)?"real ":""); |
if (!single_clock) |
printf(" that accepts two inputs per clock\n"); |
if (async_reset) |
printf(" using a negative logic ASYNC reset\n"); |
|
printf("The core will be placed into the %s/ directory\n", coredir.c_str()); |
|
if (hdrname[0]) |
printf("A C header file, %s, will be written capturing these\n" |
"options for a Verilator testbench\n", |
hdrname.c_str()); |
// nummpy |
// xtrapbits |
} |
|
if (real_fft) { |
printf("The real FFT option is not implemented yet, but still on\nmy to do list. Please try again later.\n"); |
exit(0); |
} if (single_clock) { |
printf("The single clock FFT option is not implemented yet, but still on\nmy to do list. Please try again later.\n"); |
exit(0); |
} if (!bitreverse) { |
exit(EXIT_FAILURE); |
} |
|
if (ckpce < 1) |
ckpce = 1; |
if (!bitreverse) { |
printf("WARNING: While I can skip the bit reverse stage, the code to do\n"); |
printf("an inverse FFT on a bit--reversed input has not yet been\n"); |
printf("built.\n"); |
2476,7 → 1136,7
|
if ((fftsize <= 0)||(nbitsin < 1)||(nbitsin>48)) { |
printf("INVALID PARAMETERS!!!!\n"); |
exit(-1); |
exit(EXIT_FAILURE); |
} |
|
|
2483,7 → 1143,7
if (nextlg(fftsize) != fftsize) { |
fprintf(stderr, "ERR: FFTSize (%d) *must* be a power of two\n", |
fftsize); |
exit(-1); |
exit(EXIT_FAILURE); |
} else if (fftsize < 2) { |
fprintf(stderr, "ERR: Minimum FFTSize is 2, not %d\n", |
fftsize); |
2496,7 → 1156,7
fprintf(stderr, "Indeed, a size of %d doesn\'t make much sense to me at all.\n", fftsize); |
fprintf(stderr, "Is such an operation even defined?\n"); |
} |
exit(-1); |
exit(EXIT_FAILURE); |
} |
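The `-f` option takes a decimal count with an optional `k`, `m` or `g` suffix (a shift by 10, 20 or 30 bits), and the generator then rejects any size that is not a power of two of at least 2. The sketch below gathers those rules into one function; `parse_fft_size` is hypothetical, and unlike the code above it uses `long long` with `strtoll` and an explicit overflow check, and a single-bit test in place of `nextlg`.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <cstdlib>
#include <cstring>

// Hypothetical helper modelling the -f handling: a decimal count with an
// optional k/m/g suffix (powers of 1024), followed by the checks that the
// size is a power of two and at least 2. Returns -1 on any error.
inline long long parse_fft_size(const char *arg) {
    assert(arg != nullptr);
    const std::size_t len = std::strlen(arg);
    if (len == 0)
        return -1;
    int shift = 0;
    switch (arg[len - 1]) {
    case 'k': case 'K': shift = 10; break;
    case 'm': case 'M': shift = 20; break;
    case 'g': case 'G': shift = 30; break;
    default:
        if (!std::isdigit(static_cast<unsigned char>(arg[len - 1])))
            return -1;  // unknown size suffix
        break;
    }
    // strtoll stops at the suffix and saturates instead of overflowing.
    long long n = std::strtoll(arg, nullptr, 10);
    if (n <= 0 || n > (LLONG_MAX >> shift))
        return -1;
    n <<= shift;
    // A power of two has exactly one bit set.
    if (n < 2 || (n & (n - 1)) != 0)
        return -1;
    return n;
}
```

So `-f 1k` and `-f 1024` both describe a 1024-point FFT, while `-f 1000` is rejected.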
|
// Calculate how many output bits we'll have, and what the log |
2522,14 → 1182,30
} if ((maxbitsout > 0)&&(nbitsout > maxbitsout)) |
nbitsout = maxbitsout; |
|
if (verbose_flag) { |
printf("Output samples will be %d bits wide\n", nbitsout); |
printf("This %sFFT will take %d-bit samples in, and produce %d-bit samples out\n", (inverse)?"i":"", nbitsin, nbitsout); |
if (maxbitsout > 0) |
printf(" Internally, it will allow items to accumulate to %d bits\n", maxbitsout); |
printf(" Twiddle-factors of %d bits will be used\n", |
nbitsin+xtracbits); |
if (!bitreverse) |
printf(" The output will be left in bit-reversed order\n"); |
} |
|
// Figure out how many multiply stages to use, and how many to skip |
{ |
int lgv = lgval(fftsize); |
if (!single_clock) { |
nmpypstage = 6; |
} else if (ckpce <= 1) { |
nmpypstage = 3; |
} else if (ckpce == 2) { |
nmpypstage = 2; |
} else |
nmpypstage = 1; |
|
nonmpy = lgv - nummpy; |
if (nonmpy < 2) nonmpy = 2; |
nummpy = lgv - nonmpy; |
} |
mpy_stages = nummpy / nmpypstage; |
if (mpy_stages > lgval(fftsize)-2) |
mpy_stages = lgval(fftsize)-2; |
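The new code converts the `-p` budget of hardware multiplies into a number of hardware-multiply stages: 6 multiplies per stage when two samples arrive per clock, and 3, 2 or 1 when `-k` selects one, two, or three or more clocks per sample, with the result capped at log2(N) - 2 stages. The sketch below models only that new `nmpypstage`/`mpy_stages` path, not the older `nonmpy` lines interleaved in this comparison; `mpys_per_stage` and `hw_mpy_stages` are hypothetical names.

```cpp
#include <cassert>

// Multiplies needed per FFT stage for a given input mode, following the
// nmpypstage selection above.
inline int mpys_per_stage(bool single_clock, int ckpce) {
    if (!single_clock) return 6;  // two samples in per clock
    if (ckpce <= 1)    return 3;  // one sample per clock
    if (ckpce == 2)    return 2;  // two clocks per sample
    return 1;                     // three or more clocks per sample
}

// Number of stages that can use hardware multiplies for a budget of
// nummpy multiplies in an FFT of size 2^lgsize, capped as in mpy_stages.
inline int hw_mpy_stages(int nummpy, bool single_clock, int ckpce,
                         int lgsize) {
    assert(lgsize >= 2 && nummpy >= 0);
    int stages = nummpy / mpys_per_stage(single_clock, ckpce);
    if (stages > lgsize - 2)
        stages = lgsize - 2;
    return stages;
}
```

For a 1024-point FFT and a budget of 12 multiplies, this gives 2 hardware stages in two-samples-per-clock mode, 4 with `-k 1`, 6 with `-k 2`, and the cap of 8 with `-k 3`.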
|
{ |
struct stat sbuf; |
2538,13 → 1214,13
fprintf(stderr, "\'%s\' already exists, and is not a directory!\n", coredir.c_str()); |
fprintf(stderr, "I will stop now, lest I overwrite something you care about.\n"); |
fprintf(stderr, "To try again, please remove this file.\n"); |
exit(-1); |
exit(EXIT_FAILURE); |
} |
} else |
mkdir(coredir.c_str(), 0755); |
if (access(coredir.c_str(), X_OK|W_OK) != 0) { |
fprintf(stderr, "I have no access to the directory \'%s\'.\n", coredir.c_str()); |
exit(-1); |
exit(EXIT_FAILURE); |
} |
} |
|
2553,24 → 1229,25
if (hdr == NULL) { |
fprintf(stderr, "ERROR: Cannot open %s to create header file\n", hdrname.c_str()); |
perror("O/S Err:"); |
exit(-2); |
exit(EXIT_FAILURE); |
} |
|
fprintf(hdr, "/////////////////////////////////////////////////////////////////////////////\n"); |
fprintf(hdr, "//\n"); |
fprintf(hdr, "// Filename: %s\n", hdrname.c_str()); |
fprintf(hdr, "//\n"); |
fprintf(hdr, "// Project: %s\n", prjname); |
fprintf(hdr, "//\n"); |
fprintf(hdr, "// Purpose: This simple header file captures the internal constants\n"); |
fprintf(hdr, "// within the FFT that were used to build it, for the purpose\n"); |
fprintf(hdr, "// of making C++ integration (and test bench testing) simpler. That\n"); |
fprintf(hdr, "// is, should the FFT change size, this will note that size change\n"); |
fprintf(hdr, "// and thus any test bench or other C++ program dependent upon\n"); |
fprintf(hdr, "// either the size of the FFT, the number of bits in or out of\n"); |
fprintf(hdr, "// it, etc., can pick up the changes in the defines found within\n"); |
fprintf(hdr, "// this file.\n"); |
fprintf(hdr, "//\n"); |
fprintf(hdr, |
SLASHLINE |
"//\n" |
"// Filename:\t%s\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose: This simple header file captures the internal constants\n" |
"// within the FFT that were used to build it, for the purpose\n" |
"// of making C++ integration (and test bench testing) simpler. That is,\n" |
"// should the FFT change size, this will note that size change and thus\n" |
"// any test bench or other C++ program dependent upon either the size of\n" |
"// the FFT, the number of bits in or out of it, etc., can pick up the\n" |
"// changes in the defines found within this file.\n" |
"//\n", |
hdrname.c_str(), prjname); |
fprintf(hdr, "%s", creator); |
fprintf(hdr, "//\n"); |
fprintf(hdr, "%s", cpyleft); |
2588,6 → 1265,11
(inverse)?"I":"", nbitsout, |
(inverse)?"I":"", lgsize, |
(inverse)?"I":"", (inverse)?"I":""); |
if (ckpce > 0) |
fprintf(hdr, "#define\t%sFFT_CKPCE\t%d\t// Clocks per CE\n", |
(inverse)?"I":"", ckpce); |
else |
fprintf(hdr, "// Two samples per i_ce\n"); |
if (!bitreverse) |
fprintf(hdr, "#define\t%sFFT_SKIPS_BIT_REVERSE\n", |
(inverse)?"I":""); |
2595,6 → 1277,8
fprintf(hdr, "#define\tRL%sFFT\n\n", (inverse)?"I":""); |
if (!single_clock) |
fprintf(hdr, "#define\tDBLCLK%sFFT\n\n", (inverse)?"I":""); |
else |
fprintf(hdr, "// #define\tDBLCLK%sFFT // this FFT takes one input sample per clock\n\n", (inverse)?"I":""); |
if (USE_OLD_MULTIPLY) |
fprintf(hdr, "#define\tUSE_OLD_MULTIPLY\n\n"); |
|
2650,53 → 1334,89
if (NULL == vmain) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname_string.c_str()); |
perror("Err from O/S:"); |
exit(-1); |
exit(EXIT_FAILURE); |
} |
|
if (verbose_flag) |
printf("Opened %s\n", fname_string.c_str()); |
} |
|
fprintf(vmain, "/////////////////////////////////////////////////////////////////////////////\n"); |
fprintf(vmain, "//\n"); |
fprintf(vmain, "// Filename: %sfftmain.v\n", (inverse)?"i":""); |
fprintf(vmain, "//\n"); |
fprintf(vmain, "// Project: %s\n", prjname); |
fprintf(vmain, "//\n"); |
fprintf(vmain, "// Purpose: This is the main module in the Doubletime FPGA FFT project.\n"); |
fprintf(vmain, "// As such, all other modules are subordinate to this one.\n"); |
fprintf(vmain, "// (I have been reading too much legalese this week ...)\n"); |
fprintf(vmain, "// This module accomplishes a fixed size Complex FFT on %d data\n", fftsize); |
fprintf(vmain, "// points. The FFT is fully pipelined, and accepts as inputs\n"); |
fprintf(vmain, "// two complex two\'s complement samples per clock.\n"); |
fprintf(vmain, "//\n"); |
fprintf(vmain, "// Parameters:\n"); |
fprintf(vmain, "// i_clk\tThe clock. All operations are synchronous with this clock.\n"); |
fprintf(vmain, "//\ti_rst\tSynchronous reset, active high. Setting this line will\n"); |
fprintf(vmain, "//\t\t\tforce the reset of all of the internals to this routine.\n"); |
fprintf(vmain, "//\t\t\tFurther, following a reset, the o_sync line will go\n"); |
fprintf(vmain, "//\t\t\thigh the same time the first output sample is valid.\n"); |
fprintf(vmain, "//\ti_ce\tA clock enable line. If this line is set, this module\n"); |
fprintf(vmain, "//\t\t\twill accept two complex values as inputs, and produce\n"); |
fprintf(vmain, "//\t\t\ttwo (possibly empty) complex values as outputs.\n"); |
fprintf(vmain, "//\ti_left\tThe first of two complex input samples. This value is split\n"); |
fprintf(vmain, "//\t\t\tinto two two\'s complement numbers, %d bits each, with\n", nbitsin); |
fprintf(vmain, "//\t\t\tthe real portion in the high order bits, and the\n"); |
fprintf(vmain, "//\t\t\timaginary portion taking the bottom %d bits.\n", nbitsin); |
fprintf(vmain, "//\ti_right\tThis is the same thing as i_left, only this is the second of\n"); |
fprintf(vmain, "//\t\t\ttwo such samples. Hence, i_left would contain input\n"); |
fprintf(vmain, "//\t\t\tsample zero, i_right would contain sample one. On the\n"); |
fprintf(vmain, "//\t\t\tnext clock i_left would contain input sample two,\n"); |
fprintf(vmain, "//\t\t\ti_right number three and so forth.\n"); |
fprintf(vmain, "//\to_left\tThe first of two output samples, of the same format as i_left,\n"); |
fprintf(vmain, "//\t\t\tonly having %d bits for each of the real and imaginary\n", nbitsout); |
fprintf(vmain, "//\t\t\tcomponents, leading to %d bits total.\n", nbitsout*2); |
fprintf(vmain, "//\to_right\tThe second of two output samples produced each clock. This has\n"); |
fprintf(vmain, "//\t\t\tthe same format as o_left.\n"); |
fprintf(vmain, "//\to_sync\tA one bit output indicating the first valid sample produced by\n"); |
fprintf(vmain, "//\t\t\tthis FFT following a reset. Ever after, this will\n"); |
fprintf(vmain, "//\t\t\tindicate the first sample of an FFT frame.\n"); |
fprintf(vmain, "//\n"); |
fprintf(vmain, "// Arguments:\tThis file was computer generated using the\n"); |
fprintf(vmain, "//\t\tfollowing command line:\n"); |
fprintf(vmain, "//\n"); |
fprintf(vmain, |
SLASHLINE |
"//\n" |
"// Filename:\t%sfftmain.v\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: This is the main module in the General Purpose FPGA FFT\n" |
"// implementation. As such, all other modules are subordinate\n" |
"// to this one. This module accomplishes a fixed-size complex FFT on\n" |
"// %d data points.\n", |
(inverse)?"i":"",prjname, fftsize); |
if (single_clock) { |
fprintf(vmain, |
"// The FFT is fully pipelined, and accepts as inputs one complex two\'s\n" |
"// complement sample per clock.\n"); |
} else { |
fprintf(vmain, |
"// The FFT is fully pipelined, and accepts as inputs two complex two\'s\n" |
"// complement samples per clock.\n"); |
} |
|
fprintf(vmain, |
"//\n" |
"// Parameters:\n" |
"// i_clk\tThe clock. All operations are synchronous with this clock.\n" |
"// i_%sreset%s\tSynchronous reset, active high. Setting this line will\n" |
"// \t\tforce the reset of all of the internals to this routine.\n" |
"// \t\tFurther, following a reset, the o_sync line will go\n" |
"// \t\thigh the same time the first output sample is valid.\n", |
(async_reset)?"a":"", (async_reset)?"_n":""); |
if (single_clock) { |
fprintf(vmain, |
"// i_ce\tA clock enable line. If this line is set, this module\n" |
"// \t\twill accept one complex input value, and produce\n" |
"// \t\tone (possibly empty) complex output value.\n" |
"// i_sample\tThe complex input sample. This value is split\n" |
"// \t\tinto two two\'s complement numbers, %d bits each, with\n" |
"// \t\tthe real portion in the high order bits, and the\n" |
"// \t\timaginary portion taking the bottom %d bits.\n" |
"// o_result\tThe output result, of the same format as i_sample,\n" |
"// \t\tonly having %d bits for each of the real and imaginary\n" |
"// \t\tcomponents, leading to %d bits total.\n" |
"// o_sync\tA one bit output indicating the first sample of the FFT frame.\n" |
"// \t\tIt also indicates the first valid sample out of the FFT\n" |
"// \t\ton the first frame.\n", nbitsin, nbitsin, nbitsout, nbitsout*2); |
} else { |
fprintf(vmain, |
"// i_ce\tA clock enable line. If this line is set, this module\n" |
"// \t\twill accept two complex values as inputs, and produce\n" |
"// \t\ttwo (possibly empty) complex values as outputs.\n" |
"// i_left\tThe first of two complex input samples. This value is split\n" |
"// \t\tinto two two\'s complement numbers, %d bits each, with\n" |
"// \t\tthe real portion in the high order bits, and the\n" |
"// \t\timaginary portion taking the bottom %d bits.\n" |
"// i_right\tThis is the same thing as i_left, only this is the second of\n" |
"// \t\ttwo such samples. Hence, i_left would contain input\n" |
"// \t\tsample zero, i_right would contain sample one. On the\n" |
"// \t\tnext clock i_left would contain input sample two,\n" |
"// \t\ti_right number three and so forth.\n" |
"// o_left\tThe first of two output samples, of the same format as i_left,\n" |
"// \t\tonly having %d bits for each of the real and imaginary\n" |
"// \t\tcomponents, leading to %d bits total.\n" |
"// o_right\tThe second of two output samples produced each clock. This has\n" |
"// \t\tthe same format as o_left.\n" |
"// o_sync\tA one bit output indicating the first valid sample produced by\n" |
"// \t\tthis FFT following a reset. Ever after, this will\n" |
"// \t\tindicate the first sample of an FFT frame.\n", |
nbitsin, nbitsin, nbitsout, nbitsout*2); |
} |
|
fprintf(vmain, |
"//\n" |
"// Arguments:\tThis file was computer generated using the following command\n" |
"//\t\tline:\n" |
"//\n"); |
fprintf(vmain, "//\t\t%% %s\n", cmdline.c_str()); |
fprintf(vmain, "//\n"); |
fprintf(vmain, "%s", creator); |
2705,44 → 1425,69
fprintf(vmain, "//\n//\n`default_nettype\tnone\n//\n"); |
|
|
std::string resetw("i_reset"); |
if (async_reset) |
resetw = "i_areset_n"; |
|
fprintf(vmain, "//\n"); |
fprintf(vmain, "//\n"); |
fprintf(vmain, "module %sfftmain(i_clk, i_rst, i_ce,\n", (inverse)?"i":""); |
fprintf(vmain, "\t\ti_left, i_right,\n"); |
fprintf(vmain, "\t\to_left, o_right, o_sync%s);\n", |
fprintf(vmain, "module %sfftmain(i_clk, %s, i_ce,\n", |
(inverse)?"i":"", resetw.c_str()); |
if (single_clock) { |
fprintf(vmain, "\t\ti_sample, o_result, o_sync%s);\n", |
(dbg)?", o_dbg":""); |
fprintf(vmain, "\tparameter\tIWIDTH=%d, OWIDTH=%d, LGWIDTH=%d;\n", nbitsin, nbitsout, lgsize); |
} else { |
fprintf(vmain, "\t\ti_left, i_right,\n"); |
fprintf(vmain, "\t\to_left, o_right, o_sync%s);\n", |
(dbg)?", o_dbg":""); |
} |
fprintf(vmain, "\tparameter\tIWIDTH=%d, OWIDTH=%d, LGWIDTH=%d;\n\t//\n", nbitsin, nbitsout, lgsize); |
assert(lgsize > 0); |
fprintf(vmain, "\tinput\t\ti_clk, i_rst, i_ce;\n"); |
fprintf(vmain, "\tinput\t\t\t\t\ti_clk, %s, i_ce;\n\t//\n", |
resetw.c_str()); |
if (single_clock) { |
fprintf(vmain, "\tinput\t\t[(2*IWIDTH-1):0]\ti_sample;\n"); |
fprintf(vmain, "\toutput\treg\t[(2*OWIDTH-1):0]\to_result;\n"); |
} else { |
fprintf(vmain, "\tinput\t\t[(2*IWIDTH-1):0]\ti_left, i_right;\n"); |
fprintf(vmain, "\toutput\treg\t[(2*OWIDTH-1):0]\to_left, o_right;\n"); |
fprintf(vmain, "\toutput\treg\t\t\to_sync;\n"); |
} |
fprintf(vmain, "\toutput\treg\t\t\t\to_sync;\n"); |
if (dbg) |
fprintf(vmain, "\toutput\twire\t[33:0]\t\to_dbg;\n"); |
fprintf(vmain, "\n\n"); |
|
fprintf(vmain, "\t// Outputs of the FFT, ready for bit reversal.\n"); |
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_left, br_right;\n"); |
fprintf(vmain, "\n\n"); |
|
if (single_clock) |
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_sample;\n"); |
else |
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_left, br_right;\n"); |
int tmp_size = fftsize, lgtmp = lgsize; |
if (fftsize == 2) { |
if (bitreverse) { |
fprintf(vmain, "\treg\tbr_start;\n"); |
fprintf(vmain, "\tinitial br_start = 1\'b0;\n"); |
fprintf(vmain, "\talways @(posedge i_clk)\n"); |
fprintf(vmain, "\t\tif (i_rst)\n"); |
if (async_reset) { |
fprintf(vmain, "\talways @(posedge i_clk, negedge i_areset_n)\n"); |
fprintf(vmain, "\t\tif (!i_areset_n)\n"); |
} else { |
fprintf(vmain, "\talways @(posedge i_clk)\n"); |
fprintf(vmain, "\t\tif (i_reset)\n"); |
} |
fprintf(vmain, "\t\t\tbr_start <= 1\'b0;\n"); |
fprintf(vmain, "\t\telse if (i_ce)\n"); |
fprintf(vmain, "\t\t\tbr_start <= 1\'b1;\n"); |
} |
fprintf(vmain, "\n\n"); |
fprintf(vmain, "\tdblstage\t#(IWIDTH)\tstage_2(i_clk, i_rst, i_ce,\n"); |
fprintf(vmain, "\t\t\t(!i_rst), i_left, i_right, br_left, br_right);\n"); |
fprintf(vmain, "\tlaststage\t#(IWIDTH)\tstage_2(i_clk, %s, i_ce,\n", resetw.c_str()); |
fprintf(vmain, "\t\t\t(%s%s), i_left, i_right, br_left, br_right);\n", |
(async_reset)?"":"!", resetw.c_str()); |
fprintf(vmain, "\n\n"); |
} else { |
int nbits = nbitsin, dropbit=0; |
int obits = nbits+1+xtrapbits; |
std::string cmem; |
FILE *cmemfp; |
|
if ((maxbitsout > 0)&&(obits > maxbitsout)) |
obits = maxbitsout; |
2753,50 → 1498,87
|
// Last two stages are always non-multiply stages |
// since the multiplies can be done by adds |
mpystage = ((lgtmp-2) <= nummpy); |
mpystage = ((lgtmp-2) <= mpy_stages); |
|
if (mpystage) |
fprintf(vmain, "\t// A hardware optimized FFT stage\n"); |
fprintf(vmain, "\n\n"); |
fprintf(vmain, "\twire\t\tw_s%d;\n", fftsize); |
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os%d;\n\t// verilator lint_on UNUSED\n", fftsize); |
fprintf(vmain, "\twire\t[%d:0]\tw_e%d, w_o%d;\n", 2*(obits+xtrapbits)-1, fftsize, fftsize); |
fprintf(vmain, "\t%sfftstage_e%d%s\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,%d,0)\tstage_e%d(i_clk, i_rst, i_ce,\n", |
(inverse)?"i":"", fftsize, |
if (single_clock) { |
fprintf(vmain, "\twire\t[%d:0]\tw_d%d;\n", 2*(obits+xtrapbits)-1, fftsize); |
cmem = gen_coeff_fname(EMPTYSTR, fftsize, 1, 0, inverse); |
cmemfp = gen_coeff_open(cmem.c_str()); |
gen_coeffs(cmemfp, fftsize, nbitsin+xtracbits, 1, 0, inverse); |
fprintf(vmain, "\tfftstage%s\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,0,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_%d(i_clk, %s, i_ce,\n", |
((dbg)&&(dbgstage == fftsize))?"_dbg":"", |
xtracbits, obits+xtrapbits, |
lgsize, lgtmp-2, lgdelay(nbits,xtracbits), |
fftsize); |
fprintf(vmain, "\t\t\t(!i_rst), i_left, w_e%d, w_s%d%s);\n", fftsize, fftsize, ((dbg)&&(dbgstage == fftsize))?", o_dbg":""); |
fprintf(vmain, "\t%sfftstage_o%d\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,%d,0)\tstage_o%d(i_clk, i_rst, i_ce,\n", |
(inverse)?"i":"", fftsize, |
xtracbits, obits+xtrapbits, |
lgsize, lgtmp-2, lgdelay(nbits,xtracbits), |
fftsize); |
fprintf(vmain, "\t\t\t(!i_rst), i_right, w_o%d, w_os%d);\n", fftsize, fftsize); |
fprintf(vmain, "\n\n"); |
xtracbits, obits+xtrapbits, |
lgsize, lgtmp-1, |
(mpystage)?1:0, |
ckpce, cmem.c_str(), |
fftsize, resetw.c_str()); |
fprintf(vmain, "\t\t\t(%s%s), i_sample, w_d%d, w_s%d%s);\n", |
(async_reset)?"":"!", resetw.c_str(), |
fftsize, fftsize, |
((dbg)&&(dbgstage == fftsize)) |
? ", o_dbg":""); |
} else { |
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os%d;\n\t// verilator lint_on UNUSED\n", fftsize); |
fprintf(vmain, "\twire\t[%d:0]\tw_e%d, w_o%d;\n", 2*(obits+xtrapbits)-1, fftsize, fftsize); |
cmem = gen_coeff_fname(EMPTYSTR, fftsize, 2, 0, inverse); |
cmemfp = gen_coeff_open(cmem.c_str()); |
gen_coeffs(cmemfp, fftsize, nbitsin+xtracbits, 2, 0, inverse); |
fprintf(vmain, "\tfftstage%s\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,0,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_e%d(i_clk, %s, i_ce,\n", |
((dbg)&&(dbgstage == fftsize))?"_dbg":"", |
xtracbits, obits+xtrapbits, |
lgsize, lgtmp-2, |
(mpystage)?1:0, |
ckpce, cmem.c_str(), |
fftsize, resetw.c_str()); |
fprintf(vmain, "\t\t\t(%s%s), i_left, w_e%d, w_s%d%s);\n", |
(async_reset)?"":"!", resetw.c_str(), |
fftsize, fftsize, |
((dbg)&&(dbgstage == fftsize))?", o_dbg":""); |
cmem = gen_coeff_fname(EMPTYSTR, fftsize, 2, 1, inverse); |
cmemfp = gen_coeff_open(cmem.c_str()); |
gen_coeffs(cmemfp, fftsize, nbitsin+xtracbits, 2, 1, inverse); |
fprintf(vmain, "\tfftstage\t#(IWIDTH,IWIDTH+%d,%d,%d,%d,0,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_o%d(i_clk, %s, i_ce,\n", |
xtracbits, obits+xtrapbits, |
lgsize, lgtmp-2, |
(mpystage)?1:0, |
ckpce, cmem.c_str(), |
fftsize, resetw.c_str()); |
fprintf(vmain, "\t\t\t(%s%s), i_right, w_o%d, w_os%d);\n", |
(async_reset)?"":"!",resetw.c_str(), |
fftsize, fftsize); |
} |
|
|
std::string fname; |
char numstr[12]; |
|
fname = coredir + "/"; |
if (inverse) fname += "i"; |
fname += "fftstage_e"; |
sprintf(numstr, "%d", fftsize); |
fname += numstr; |
if ((dbg)&&(dbgstage == fftsize)) |
fname += "_dbg"; |
fname += ".v"; |
build_stage(fname.c_str(), coredir.c_str(), fftsize/2, 0, nbits, inverse, xtracbits, mpystage, (dbg)&&(dbgstage == fftsize)); // Even stage |
if (inverse) |
fname += "i"; |
fname += "fftstage"; |
if (dbg) { |
std::string dbgname(fname); |
dbgname += "_dbg"; |
dbgname += ".v"; |
if (single_clock) |
build_stage(dbgname.c_str(), fftsize, 1, 0, nbits, xtracbits, ckpce, async_reset, true); |
else |
build_stage(dbgname.c_str(), fftsize/2, 2, 1, nbits, xtracbits, ckpce, async_reset, true); |
} |
|
fname = coredir + "/"; |
if (inverse) fname += "i"; |
fname += "fftstage_o"; |
sprintf(numstr, "%d", fftsize); |
fname += numstr; |
fname += ".v"; |
build_stage(fname.c_str(), coredir.c_str(), fftsize/2, 1, nbits, inverse, xtracbits, mpystage, false); // Odd stage |
if (single_clock) { |
build_stage(fname.c_str(), fftsize, 1, 0, |
nbits, xtracbits, ckpce, async_reset, |
false); |
} else { |
// All stages use the same Verilog, so we only |
// need to build one |
build_stage(fname.c_str(), fftsize/2, 2, 1, |
nbits, xtracbits, ckpce, async_reset, false); |
} |
} |
|
nbits = obits; // New number of input bits |
2812,68 → 1594,79
{ |
bool mpystage; |
|
mpystage = ((lgtmp-2) <= nummpy); |
mpystage = ((lgtmp-2) <= mpy_stages); |
|
if (mpystage) |
fprintf(vmain, "\t// A hardware optimized FFT stage\n"); |
fprintf(vmain, "\twire\t\tw_s%d;\n", |
tmp_size); |
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os%d;\n\t// verilator lint_on UNUSED\n", |
tmp_size); |
fprintf(vmain,"\twire\t[%d:0]\tw_e%d, w_o%d;\n", |
2*(obits+xtrapbits)-1, |
tmp_size, tmp_size); |
fprintf(vmain, "\t%sfftstage_e%d%s\t#(%d,%d,%d,%d,%d,%d,%d)\tstage_e%d(i_clk, i_rst, i_ce,\n", |
(inverse)?"i":"", tmp_size, |
((dbg)&&(dbgstage==tmp_size))?"_dbg":"", |
nbits+xtrapbits, |
nbits+xtracbits+xtrapbits, |
obits+xtrapbits, |
lgsize, lgtmp-2, |
lgdelay(nbits+xtrapbits,xtracbits), |
(dropbit)?0:0, tmp_size); |
fprintf(vmain, "\t\t\t\t\t\tw_s%d, w_e%d, w_e%d, w_s%d%s);\n", |
tmp_size<<1, tmp_size<<1, |
tmp_size, tmp_size, |
((dbg)&&(dbgstage == tmp_size)) |
?", o_dbg":""); |
fprintf(vmain, "\t%sfftstage_o%d\t#(%d,%d,%d,%d,%d,%d,%d)\tstage_o%d(i_clk, i_rst, i_ce,\n", |
(inverse)?"i":"", tmp_size, |
nbits+xtrapbits, |
nbits+xtracbits+xtrapbits, |
obits+xtrapbits, |
lgsize, lgtmp-2, |
lgdelay(nbits+xtrapbits,xtracbits), |
(dropbit)?0:0, tmp_size); |
fprintf(vmain, "\t\t\t\t\t\tw_s%d, w_o%d, w_o%d, w_os%d);\n", |
tmp_size<<1, tmp_size<<1, |
tmp_size, tmp_size); |
fprintf(vmain, "\n\n"); |
|
std::string fname; |
char numstr[12]; |
|
fname = coredir + "/"; |
if (inverse) fname += "i"; |
fname += "fftstage_e"; |
sprintf(numstr, "%d", tmp_size); |
fname += numstr; |
if ((dbg)&&(dbgstage == tmp_size)) |
fname += "_dbg"; |
fname += ".v"; |
build_stage(fname.c_str(), coredir.c_str(), tmp_size/2, 0, |
nbits+xtrapbits, inverse, xtracbits, |
mpystage, ((dbg)&&(dbgstage == tmp_size))); // Even stage |
|
fname = coredir + "/"; |
if (inverse) fname += "i"; |
fname += "fftstage_o"; |
sprintf(numstr, "%d", tmp_size); |
fname += numstr; |
fname += ".v"; |
build_stage(fname.c_str(), coredir.c_str(), tmp_size/2, 1, |
nbits+xtrapbits, inverse, xtracbits, |
mpystage, false); // Odd stage |
if (single_clock) { |
fprintf(vmain,"\twire\t[%d:0]\tw_d%d;\n", |
2*(obits+xtrapbits)-1, |
tmp_size); |
cmem = gen_coeff_fname(EMPTYSTR, tmp_size, 1, 0, inverse); |
cmemfp = gen_coeff_open(cmem.c_str()); |
gen_coeffs(cmemfp, tmp_size, |
nbits+xtracbits+xtrapbits, 1, 0, inverse); |
fprintf(vmain, "\tfftstage%s\t#(%d,%d,%d,%d,%d,%d,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_%d(i_clk, %s, i_ce,\n", |
((dbg)&&(dbgstage==tmp_size))?"_dbg":"", |
nbits+xtrapbits, |
nbits+xtracbits+xtrapbits, |
obits+xtrapbits, |
lgsize, lgtmp-1, |
(dropbit)?0:0, (mpystage)?1:0, |
ckpce, |
cmem.c_str(), tmp_size, |
resetw.c_str()); |
fprintf(vmain, "\t\t\tw_s%d, w_d%d, w_d%d, w_s%d%s);\n", |
tmp_size<<1, tmp_size<<1, |
tmp_size, tmp_size, |
((dbg)&&(dbgstage == tmp_size)) |
?", o_dbg":""); |
} else { |
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os%d;\n\t// verilator lint_on UNUSED\n", |
tmp_size); |
fprintf(vmain,"\twire\t[%d:0]\tw_e%d, w_o%d;\n", |
2*(obits+xtrapbits)-1, |
tmp_size, tmp_size); |
cmem = gen_coeff_fname(EMPTYSTR, tmp_size, 2, 0, inverse); |
cmemfp = gen_coeff_open(cmem.c_str()); |
gen_coeffs(cmemfp, tmp_size, |
nbits+xtracbits+xtrapbits, 2, 0, inverse); |
fprintf(vmain, "\tfftstage%s\t#(%d,%d,%d,%d,%d,%d,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_e%d(i_clk, %s, i_ce,\n", |
((dbg)&&(dbgstage==tmp_size))?"_dbg":"", |
nbits+xtrapbits, |
nbits+xtracbits+xtrapbits, |
obits+xtrapbits, |
lgsize, lgtmp-2, |
(dropbit)?0:0, (mpystage)?1:0, |
ckpce, |
cmem.c_str(), tmp_size, |
resetw.c_str()); |
fprintf(vmain, "\t\t\tw_s%d, w_e%d, w_e%d, w_s%d%s);\n", |
tmp_size<<1, tmp_size<<1, |
tmp_size, tmp_size, |
((dbg)&&(dbgstage == tmp_size)) |
?", o_dbg":""); |
cmem = gen_coeff_fname(EMPTYSTR, |
tmp_size, 2, 1, inverse); |
cmemfp = gen_coeff_open(cmem.c_str()); |
gen_coeffs(cmemfp, tmp_size, |
nbits+xtracbits+xtrapbits, |
2, 1, inverse); |
fprintf(vmain, "\tfftstage\t#(%d,%d,%d,%d,%d,%d,\n\t\t\t%d, %d, \"%s\")\n\t\tstage_o%d(i_clk, %s, i_ce,\n", |
nbits+xtrapbits, |
nbits+xtracbits+xtrapbits, |
obits+xtrapbits, |
lgsize, lgtmp-2, |
(dropbit)?0:0, (mpystage)?1:0, |
ckpce, cmem.c_str(), tmp_size, |
resetw.c_str()); |
fprintf(vmain, "\t\t\tw_s%d, w_o%d, w_o%d, w_os%d);\n", |
tmp_size<<1, tmp_size<<1, |
tmp_size, tmp_size); |
} |
fprintf(vmain, "\n"); |
} |
|
|
2889,17 → 1682,31
obits = maxbitsout; |
|
fprintf(vmain, "\twire\t\tw_s4;\n"); |
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os4;\n\t// verilator lint_on UNUSED\n"); |
fprintf(vmain, "\twire\t[%d:0]\tw_e4, w_o4;\n", 2*(obits+xtrapbits)-1); |
fprintf(vmain, "\tqtrstage%s\t#(%d,%d,%d,0,%d,%d)\tstage_e4(i_clk, i_rst, i_ce,\n", |
((dbg)&&(dbgstage==4))?"_dbg":"", |
nbits+xtrapbits, obits+xtrapbits, lgsize, |
(inverse)?1:0, (dropbit)?0:0); |
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_e8, w_e4, w_s4%s);\n", |
((dbg)&&(dbgstage==4))?", o_dbg":""); |
fprintf(vmain, "\tqtrstage\t#(%d,%d,%d,1,%d,%d)\tstage_o4(i_clk, i_rst, i_ce,\n", |
nbits+xtrapbits, obits+xtrapbits, lgsize, (inverse)?1:0, (dropbit)?0:0); |
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_o8, w_o4, w_os4);\n"); |
if (single_clock) { |
fprintf(vmain, "\twire\t[%d:0]\tw_d4;\n", |
2*(obits+xtrapbits)-1); |
fprintf(vmain, "\tqtrstage%s\t#(%d,%d,%d,%d,%d)\tstage_4(i_clk, %s, i_ce,\n", |
((dbg)&&(dbgstage==4))?"_dbg":"", |
nbits+xtrapbits, obits+xtrapbits, lgsize, |
(inverse)?1:0, (dropbit)?0:0, |
resetw.c_str()); |
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_d8, w_d4, w_s4%s);\n", |
((dbg)&&(dbgstage==4))?", o_dbg":""); |
} else { |
fprintf(vmain, "\t// verilator lint_off UNUSED\n\twire\t\tw_os4;\n\t// verilator lint_on UNUSED\n"); |
fprintf(vmain, "\twire\t[%d:0]\tw_e4, w_o4;\n", 2*(obits+xtrapbits)-1); |
fprintf(vmain, "\tqtrstage%s\t#(%d,%d,%d,0,%d,%d)\tstage_e4(i_clk, %s, i_ce,\n", |
((dbg)&&(dbgstage==4))?"_dbg":"", |
nbits+xtrapbits, obits+xtrapbits, lgsize, |
(inverse)?1:0, (dropbit)?0:0, |
resetw.c_str()); |
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_e8, w_e4, w_s4%s);\n", |
((dbg)&&(dbgstage==4))?", o_dbg":""); |
fprintf(vmain, "\tqtrstage\t#(%d,%d,%d,1,%d,%d)\tstage_o4(i_clk, %s, i_ce,\n", |
nbits+xtrapbits, obits+xtrapbits, lgsize, (inverse)?1:0, (dropbit)?0:0, |
resetw.c_str()); |
fprintf(vmain, "\t\t\t\t\t\tw_s8, w_o8, w_o4, w_os4);\n"); |
} |
dropbit ^= 1; |
nbits = obits; |
tmp_size >>= 1; lgtmp--; |
2912,26 → 1719,51
if ((maxbitsout>0)&&(obits > maxbitsout)) |
obits = maxbitsout; |
fprintf(vmain, "\twire\t\tw_s2;\n"); |
fprintf(vmain, "\twire\t[%d:0]\tw_e2, w_o2;\n", 2*obits-1); |
if (single_clock) { |
fprintf(vmain, "\twire\t[%d:0]\tw_d2;\n", |
2*obits-1); |
} else { |
fprintf(vmain, "\twire\t[%d:0]\tw_e2, w_o2;\n", |
2*obits-1); |
} |
if ((nbits+xtrapbits+1 == obits)&&(!dropbit)) |
printf("WARNING: SCALING OFF BY A FACTOR OF TWO--should\'ve dropped a bit in the last stage.\n"); |
fprintf(vmain, "\tdblstage\t#(%d,%d,%d)\tstage_2(i_clk, i_rst, i_ce,\n", nbits+xtrapbits, obits,(dropbit)?0:1); |
fprintf(vmain, "\t\t\t\t\tw_s4, w_e4, w_o4, w_e2, w_o2, w_s2);\n"); |
|
if (single_clock) { |
fprintf(vmain, "\tlaststage\t#(%d,%d,%d)\tstage_2(i_clk, %s, i_ce,\n", |
nbits+xtrapbits, obits,(dropbit)?0:1, |
resetw.c_str()); |
fprintf(vmain, "\t\t\t\t\tw_s4, w_d4, w_d2, w_s2);\n"); |
} else { |
fprintf(vmain, "\tlaststage\t#(%d,%d,%d)\tstage_2(i_clk, %s, i_ce,\n", |
nbits+xtrapbits, obits,(dropbit)?0:1, |
resetw.c_str()); |
fprintf(vmain, "\t\t\t\t\tw_s4, w_e4, w_o4, w_e2, w_o2, w_s2);\n"); |
} |
|
fprintf(vmain, "\n\n"); |
nbits = obits; |
} |
|
fprintf(vmain, "\t// Prepare for a (potential) bit-reverse stage.\n"); |
fprintf(vmain, "\tassign\tbr_left = w_e2;\n"); |
fprintf(vmain, "\tassign\tbr_right = w_o2;\n"); |
if (single_clock) |
fprintf(vmain, "\tassign\tbr_sample= w_d2;\n"); |
else { |
fprintf(vmain, "\tassign\tbr_left = w_e2;\n"); |
fprintf(vmain, "\tassign\tbr_right = w_o2;\n"); |
} |
fprintf(vmain, "\n"); |
if (bitreverse) { |
fprintf(vmain, "\twire\tbr_start;\n"); |
fprintf(vmain, "\treg\tr_br_started;\n"); |
fprintf(vmain, "\tinitial\tr_br_started = 1\'b0;\n"); |
fprintf(vmain, "\talways @(posedge i_clk)\n"); |
fprintf(vmain, "\t\tif (i_rst)\n"); |
if (async_reset) { |
fprintf(vmain, "\talways @(posedge i_clk, negedge i_areset_n)\n"); |
fprintf(vmain, "\t\tif (!i_areset_n)\n"); |
} else { |
fprintf(vmain, "\talways @(posedge i_clk)\n"); |
fprintf(vmain, "\t\tif (i_reset)\n"); |
} |
fprintf(vmain, "\t\t\tr_br_started <= 1\'b0;\n"); |
fprintf(vmain, "\t\telse if (i_ce)\n"); |
fprintf(vmain, "\t\t\tr_br_started <= r_br_started || w_s2;\n"); |
2939,14 → 1771,25
} |
} |
|
|
fprintf(vmain, "\n"); |
fprintf(vmain, "\t// Now for the bit-reversal stage.\n"); |
fprintf(vmain, "\twire\tbr_sync;\n"); |
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_o_left, br_o_right;\n"); |
if (bitreverse) { |
fprintf(vmain, "\tdblreverse\t#(%d,%d)\trevstage(i_clk, i_rst,\n", lgsize, nbitsout); |
fprintf(vmain, "\t\t\t(i_ce & br_start), br_left, br_right,\n"); |
fprintf(vmain, "\t\t\tbr_o_left, br_o_right, br_sync);\n"); |
if (single_clock) { |
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_o_result;\n"); |
fprintf(vmain, "\tbitreverse\t#(%d,%d)\n\t\trevstage(i_clk, %s,\n", lgsize, nbitsout, resetw.c_str()); |
fprintf(vmain, "\t\t\t(i_ce & br_start), br_sample,\n"); |
fprintf(vmain, "\t\t\tbr_o_result, br_sync);\n"); |
} else { |
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_o_left, br_o_right;\n"); |
fprintf(vmain, "\tbitreverse\t#(%d,%d)\n\t\trevstage(i_clk, %s,\n", lgsize, nbitsout, resetw.c_str()); |
fprintf(vmain, "\t\t\t(i_ce & br_start), br_left, br_right,\n"); |
fprintf(vmain, "\t\t\tbr_o_left, br_o_right, br_sync);\n"); |
} |
} else if (single_clock) { |
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_o_result;\n"); |
fprintf(vmain, "\tassign\tbr_o_result = br_sample;\n"); |
fprintf(vmain, "\tassign\tbr_sync = w_s2;\n"); |
} else { |
fprintf(vmain, "\twire\t[(2*OWIDTH-1):0]\tbr_o_left, br_o_right;\n"); |
fprintf(vmain, "\tassign\tbr_o_left = br_left;\n"); |
fprintf(vmain, "\tassign\tbr_o_right = br_right;\n"); |
2953,35 → 1796,51
fprintf(vmain, "\tassign\tbr_sync = w_s2;\n"); |
} |
|
fprintf(vmain, "\n\n"); |
fprintf(vmain, "\t// Last clock: Register our outputs, we\'re done.\n"); |
fprintf(vmain, "\tinitial\to_sync = 1\'b0;\n"); |
fprintf(vmain, "\talways @(posedge i_clk)\n"); |
fprintf(vmain, "\t\tif (i_rst)\n"); |
fprintf(vmain, "\t\t\to_sync <= 1\'b0;\n"); |
fprintf(vmain, "\t\telse if (i_ce)\n"); |
fprintf(vmain, "\t\t\to_sync <= br_sync;\n"); |
fprintf(vmain, "\n"); |
fprintf(vmain, "\talways @(posedge i_clk)\n"); |
fprintf(vmain, "\t\tif (i_ce)\n"); |
fprintf(vmain, "\t\tbegin\n"); |
fprintf(vmain, "\t\t\to_left <= br_o_left;\n"); |
fprintf(vmain, "\t\t\to_right <= br_o_right;\n"); |
fprintf(vmain, "\t\tend\n"); |
fprintf(vmain, "\n\n"); |
fprintf(vmain, "endmodule\n"); |
fprintf(vmain, |
"\n\n" |
"\t// Last clock: Register our outputs, we\'re done.\n" |
"\tinitial\to_sync = 1\'b0;\n"); |
if (async_reset) |
fprintf(vmain, |
"\talways @(posedge i_clk, negedge i_areset_n)\n\t\tif (!i_areset_n)\n"); |
else { |
fprintf(vmain, |
"\talways @(posedge i_clk)\n\t\tif (i_reset)\n"); |
} |
|
fprintf(vmain, |
"\t\t\to_sync <= 1\'b0;\n" |
"\t\telse if (i_ce)\n" |
"\t\t\to_sync <= br_sync;\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n"); |
if (single_clock) { |
fprintf(vmain, "\t\t\to_result <= br_o_result;\n"); |
} else { |
fprintf(vmain, |
"\t\tbegin\n" |
"\t\t\to_left <= br_o_left;\n" |
"\t\t\to_right <= br_o_right;\n" |
"\t\tend\n"); |
} |
|
fprintf(vmain, |
"\n\n" |
"endmodule\n"); |
fclose(vmain); |
|
|
{ |
std::string fname; |
|
fname = coredir + "/butterfly.v"; |
build_butterfly(fname.c_str(), xtracbits, rounding); |
build_butterfly(fname.c_str(), xtracbits, rounding, |
ckpce, async_reset); |
|
if (nummpy > 0) { |
fname = coredir + "/hwbfly.v"; |
build_hwbfly(fname.c_str(), xtracbits, rounding); |
} |
fname = coredir + "/hwbfly.v"; |
build_hwbfly(fname.c_str(), xtracbits, rounding, |
ckpce, async_reset); |
|
{ |
// To make debugging easier, we build both of these |
2996,20 → 1855,40
|
if ((dbg)&&(dbgstage == 4)) { |
fname = coredir + "/qtrstage_dbg.v"; |
build_quarters(fname.c_str(), rounding, true); |
if (single_clock) |
build_snglquarters(fname.c_str(), rounding, |
async_reset, true); |
else |
build_dblquarters(fname.c_str(), rounding, |
async_reset, true); |
} |
fname = coredir + "/qtrstage.v"; |
build_quarters(fname.c_str(), rounding, false); |
|
if ((dbg)&&(dbgstage == 2)) |
fname = coredir + "/dblstage_dbg.v"; |
if (single_clock) |
build_snglquarters(fname.c_str(), rounding, |
async_reset, false); |
else |
fname = coredir + "/dblstage.v"; |
build_dblstage(fname.c_str(), rounding, (dbg)&&(dbgstage==2)); |
build_dblquarters(fname.c_str(), rounding, |
async_reset, false); |
|
|
if (single_clock) { |
fname = coredir + "/laststage.v"; |
build_sngllast(fname.c_str(), async_reset); |
} else { |
if ((dbg)&&(dbgstage == 2)) |
fname = coredir + "/laststage_dbg.v"; |
else |
fname = coredir + "/laststage.v"; |
build_dblstage(fname.c_str(), rounding, |
async_reset, (dbg)&&(dbgstage==2)); |
} |
|
if (bitreverse) { |
fname = coredir + "/dblreverse.v"; |
build_dblreverse(fname.c_str()); |
fname = coredir + "/bitreverse.v"; |
if (single_clock) |
build_snglbrev(fname.c_str(), async_reset); |
else |
build_dblreverse(fname.c_str(), async_reset); |
} |
|
const char *rnd_string = ""; |
3029,4 → 1908,7
} |
|
} |
|
if (verbose_flag) |
printf("All done -- success\n"); |
} |
/trunk/sw/fftlib.cpp
0,0 → 1,197
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: fftlib.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen |
#include <stdio.h> |
#include <stdlib.h> |
|
#ifdef _MSC_VER // added for ms vs compatibility |
|
#include <io.h> |
#include <direct.h> |
#define _USE_MATH_DEFINES |
|
#if _MSC_VER <= 1700 |
|
long long llround(double d) { |
if (d<0) return -(long long)(-d+0.5); |
else return (long long)(d+0.5); } |
|
#endif |
|
#else |
// And for G++/Linux environment |
|
#include <unistd.h> // Defines the R_OK/W_OK/etc. macros |
#endif |
|
#include <string.h> |
#include <string> |
#include <math.h> |
// #include <ctype.h> |
#include <assert.h> |
|
#include "fftlib.h" |
|
|
int lgval(int vl) { |
int lg; |
|
for(lg=1; (1<<lg) < vl; lg++) |
; |
return lg; |
} |
|
int nextlg(int vl) { |
int r; |
|
for(r=1; r<vl; r<<=1) |
; |
return r; |
} |
|
int bflydelay(int nbits, int xtra) { |
int cbits = nbits + xtra; |
int delay; |
|
if (USE_OLD_MULTIPLY) { |
if (nbits+1<cbits) |
delay = nbits+4; |
else |
delay = cbits+3; |
} else { |
int na=nbits+2, nb=cbits+1; |
if (nb<na) { |
int tmp = nb; |
nb = na; na = tmp; |
} |
delay = ((na)/2+(na&1)+2); |
} |
return delay; |
} |
|
int lgdelay(int nbits, int xtra) { |
// The butterfly code needs to compare a valid address, of this |
// many bits, with an address two greater. This guarantees we |
// have enough bits for that comparison. We'll also end up with |
// more storage space to look for these values, but without a |
// redesign that's just what we'll deal with. |
return lgval(bflydelay(nbits, xtra)+3); |
} |
|
void gen_coeffs(FILE *cmem, int stage, int cbits, |
int nwide, int offset, bool inv) { |
// |
// For an FFT stage of 2^n elements, we need 2^(n-1) butterfly |
// coefficients, sometimes called twiddle factors. Stage captures the |
// width of the FFT at this point. If this is a two-samples-per-clock |
// FFT, nwide will be equal to 2, and offset will be zero or one. |
// |
assert(nwide > 0); |
assert(offset < nwide); |
assert(stage / nwide > 1); |
assert(stage % nwide == 0); |
printf("GEN-COEFFS(): stage =%4d, bits =%2d, nwide = %d, offset = %d, inverse = %d\n", stage, cbits, nwide, offset, inv); |
int ncoeffs = stage/nwide/2; |
for(int i=0; i<ncoeffs; i++) { |
int k = nwide*i+offset; |
double W = ((inv)?1:-1)*2.0*M_PI*k/(double)(stage); |
double c, s; |
long long ic, is, vl; |
|
c = cos(W); s = sin(W); |
ic = (long long)llround((1ll<<(cbits-2)) * c); |
is = (long long)llround((1ll<<(cbits-2)) * s); |
vl = (ic & (~(-1ll << (cbits)))); |
vl <<= (cbits); |
vl |= (is & (~(-1ll << (cbits)))); |
fprintf(cmem, "%0*llx\n", ((cbits*2+3)/4), vl); |
// |
} |
fclose(cmem); |
} |
|
std::string gen_coeff_fname(const char *coredir, |
int stage, int nwide, int offset, bool inv) { |
std::string result; |
char *memfile; |
|
assert((nwide == 1)||(nwide == 2)); |
|
memfile = new char[strlen(coredir)+3+10+strlen(".hex")+64]; |
if (nwide == 2) { |
if (coredir[0] == '\0') { |
sprintf(memfile, "%scmem_%c%d.hex", |
(inv)?"i":"", (offset==1)?'o':'e', stage*nwide); |
} else { |
sprintf(memfile, "%s/%scmem_%c%d.hex", |
coredir, (inv)?"i":"", |
(offset==1)?'o':'e', stage*nwide); |
} |
} else if (coredir[0] == '\0') // if (nwide == 1) |
sprintf(memfile, "%scmem_%d.hex", |
(inv)?"i":"", stage); |
else |
sprintf(memfile, "%s/%scmem_%d.hex", |
coredir, (inv)?"i":"", stage); |
|
result = std::string(memfile); |
delete[] memfile; |
return result; |
} |
|
FILE *gen_coeff_open(const char *fname) { |
FILE *cmem; |
|
cmem = fopen(fname, "w"); |
if (NULL == cmem) { |
fprintf(stderr, "Could not open FFT coefficient file " |
"\'%s\' for writing\n", fname); |
perror("Err from O/S:"); |
exit(EXIT_FAILURE); |
} |
|
return cmem; |
} |
|
void gen_coeff_file(const char *coredir, const char *fname, |
int stage, int cbits, int nwide, int offset, bool inv) { |
std::string fstr; |
FILE *cmem; |
|
fstr= gen_coeff_fname(coredir, stage, nwide, offset, inv); |
cmem = gen_coeff_open(fstr.c_str()); |
gen_coeffs(cmem, stage, cbits, nwide, offset, inv); |
} |
/trunk/sw/fftlib.h
0,0 → 1,55
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: fftlib.h |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTIBILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#ifndef FFTLIB_H |
#define FFTLIB_H |
|
#define USE_OLD_MULTIPLY false |
|
extern int lgval(int vl); |
extern int nextlg(int vl); |
extern int bflydelay(int nbits, int xtra); |
extern int lgdelay(int nbits, int xtra); |
extern void gen_coeffs(FILE *cmem, int stage, int cbits, |
int nwide, int offset, bool inv); |
extern std::string gen_coeff_fname(const char *coredir, |
int stage, int nwide, int offset, bool inv); |
extern FILE *gen_coeff_open(const char *fname); |
extern void gen_coeff_file(const char *coredir, const char *fname, |
int stage, int cbits, int nwide, int offset, bool inv); |
|
#endif // FFTLIB_H |
/trunk/sw/legal.cpp
0,0 → 1,70
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: legal.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: Contains the information and logic necessary to place a |
// copyright, name, author, and purpose statement at the head of |
// every file. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#include "legal.h" |
|
const char cpyleft[] = |
SLASHLINE |
"//\n" |
"// Copyright (C) 2015-2018, Gisselquist Technology, LLC\n" |
"//\n" |
"// This program is free software (firmware): you can redistribute it and/or\n" |
"// modify it under the terms of the GNU General Public License as published\n" |
"// by the Free Software Foundation, either version 3 of the License, or (at\n" |
"// your option) any later version.\n" |
"//\n" |
"// This program is distributed in the hope that it will be useful, but WITHOUT\n" |
"// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or\n" |
"// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License\n" |
"// for more details.\n" |
"//\n" |
"// You should have received a copy of the GNU General Public License along\n" |
"// with this program. (It's in the $(ROOT)/doc directory, run make with no\n" |
"// target there if the PDF file isn\'t present.) If not, see\n" |
"// <http://www.gnu.org/licenses/> for a copy.\n" |
"//\n" |
"// License: GPL, v3, as defined and found on www.gnu.org,\n" |
"// http://www.gnu.org/licenses/gpl.html\n" |
"//\n" |
"//\n" |
SLASHLINE; |
const char prjname[] = "A General Purpose Pipelined FFT Implementation"; |
const char creator[] = "// Creator: Dan Gisselquist, Ph.D.\n" |
"// Gisselquist Technology, LLC\n"; |
|
/trunk/sw/legal.h
0,0 → 1,49
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: legal.h |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: Contains the information and logic necessary to place a |
// copyright, name, author, and purpose statement at the head of |
// every file. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#ifndef LEGAL_H |
#define LEGAL_H |
|
#define SLASHLINE "////////////////////////////////////////////////////////////////////////////////\n" |
|
extern const char cpyleft[]; |
extern const char prjname[]; |
extern const char creator[]; |
|
#endif // LEGAL_H |
/trunk/sw/rounding.cpp
0,0 → 1,407
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: rounding.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: To create one of a series of modules to handle dropping bits |
// within the FFT implementation. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen |
|
#include <stdio.h> |
#include <stdlib.h> |
|
#include <string.h> |
#include <string> |
#include <math.h> |
#include <ctype.h> |
#include <assert.h> |
|
#include "legal.h" |
#include "rounding.h" |
|
|
void build_truncator(const char *fname) { |
printf("TRUNCATING!\n"); |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\ttruncate.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose: Truncation is one of several options that can be used\n" |
"// internal to the various FFT stages to drop bits from one\n" |
"// stage to the next. In general, it is the simplest method of dropping\n" |
"// bits, since it requires only a bit selection.\n" |
"//\n" |
"// This form of rounding isn\'t really that great for FFT\'s, since it\n" |
"// tends to produce a DC bias in the result. (Other less pronounced\n" |
"// biases may also exist.)\n" |
"//\n" |
"// This particular version also registers the output with the clock, so\n" |
"// there will be a one clock delay through this module. This will\n" |
"// keep it in line with the other forms of rounding that can be used.\n" |
"//\n" |
"//\n%s" |
"//\n", |
prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module truncate(i_clk, i_ce, i_val, o_val);\n" |
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n" |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n" |
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_val <= i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
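
#include <cassert>	// assert(), for checking this model against simulation

// Software reference model of the generated truncate.v.  This is a sketch
// for checking the Verilog against, not part of the original generator; the
// name ref_truncate() is hypothetical.  It keeps bits
// [IWID-1-SHIFT : IWID-SHIFT-OWID] of the IWID-bit two's complement input
// and reinterprets them as a signed OWID-bit value, as the Verilog bit
// select does.  For results that fit in OWID bits, the net effect is
// floor(i_val / 2^(IWID-SHIFT-OWID)), which is where the DC bias described
// in the header above comes from.
// Assumes 0 < OWID, IWID-SHIFT-OWID >= 0, and IWID <= 63.
long long ref_truncate(long long i_val, int IWID, int OWID, int SHIFT) {
	const int drop = IWID - SHIFT - OWID;	// number of LSBs discarded
	const unsigned long long mask = (1ull << OWID) - 1;
	const unsigned long long sgn  = 1ull << (OWID - 1);
	// Shift the unsigned two's complement image, so the shift is well defined
	unsigned long long r = ((unsigned long long)i_val >> drop) & mask;
	// Sign-extend the OWID-bit field back to a native integer
	return (long long)((r ^ sgn) - sgn);
}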
|
void build_roundhalfup(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\troundhalfup.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose:\tRounding half up is the way I was always taught to round in\n" |
"// school. A one half value is added to the result, and then\n" |
"// the result is truncated. When used in an FFT, this produces less\n" |
"// bias than the truncation method, although a bias still tends to\n" |
"// remain.\n" |
"//\n" |
"//\n%s" |
"//\n", |
prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module roundhalfup(i_clk, i_ce, i_val, o_val);\n" |
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n" |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n" |
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n" |
"\n" |
"\t// Let's deal with two cases to be as general as we can be here\n" |
"\t//\n" |
"\t// 1. The desired output would lose no bits at all\n" |
"\t// 2. One or more bits would be dropped, so the rounding is simply\n" |
"\t//\t\ta matter of adding one to the bit about to be dropped,\n" |
"\t//\t\tmoving all halfway and above numbers up to the next\n" |
"\t//\t\tvalue.\n" |
"\tgenerate\n" |
"\tif (IWID-SHIFT == OWID)\n" |
"\tbegin // No truncation or rounding, output drops no bits\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n" |
"\n" |
"\tend else // if (IWID-SHIFT-1 >= OWID)\n" |
"\tbegin // Output drops one or more bits: add one, or ... not.\n" |
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n" |
"\t\twire\t\t\tfirst_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse\n" |
"\t\t\t\t\to_val <= rounded_up; // round half up\n" |
"\t\t\tend\n" |
"\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
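
#include <cassert>	// assert(), for checking this model against simulation

// Software reference model of the generated roundhalfup.v (a sketch, not
// part of the original generator; ref_roundhalfup() is a hypothetical name).
// When bits are dropped, the result is the truncated value plus the first
// dropped bit, i.e. floor(x + 1/2) measured in output LSBs: 2.5 -> 3, but
// -2.5 -> -2.  As in the Verilog, the "+1" wraps around within OWID bits if
// the truncated value is already the most positive representable number.
// Assumes IWID-SHIFT >= OWID > 0 and IWID <= 63.
long long ref_roundhalfup(long long i_val, int IWID, int OWID, int SHIFT) {
	const int drop = IWID - SHIFT - OWID;	// number of LSBs discarded
	const unsigned long long u = (unsigned long long)i_val;
	const unsigned long long mask = (1ull << OWID) - 1;
	const unsigned long long sgn  = 1ull << (OWID - 1);
	unsigned long long r = u >> drop;		// truncated_value
	if (drop > 0 && ((u >> (drop - 1)) & 1))	// first_lost_bit
		r += 1;				// rounded_up
	r &= mask;				// keep OWID bits, wrapping like the Verilog
	return (long long)((r ^ sgn) - sgn);	// sign-extend
}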
|
void build_roundfromzero(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\troundfromzero.v\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: Rounding away from zero is one of several options that can\n" |
"// be used internal to the various FFT stages to drop bits from one\n" |
"// stage to the next. Values are rounded to the nearest representable\n" |
"// output value. Values lying exactly halfway between two outputs are\n" |
"// rounded away from zero, so 2.5 becomes 3 and -2.5 becomes -3.\n" |
"//\n" |
"// This particular version also registers the output with the clock, so\n" |
"// there will be a one clock delay through this module. This will\n" |
"// keep it in line with the other forms of rounding that can be used.\n" |
"//\n" |
"//\n%s" |
"//\n", |
prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module roundfromzero(i_clk, i_ce, i_val, o_val);\n" |
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n" |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n" |
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n" |
"\n" |
"\t// Let's deal with three cases to be as general as we can be here\n" |
"\t//\n" |
"\t//\t1. The desired output would lose no bits at all\n" |
"\t//\t2. One bit would be dropped, so the rounding is simply\n" |
"\t//\t\tadjusting the value to be the one farther from zero in\n" |
"\t//\t\tcases of being halfway between two. If identically\n" |
"\t//\t\tequal to a number, we just leave it as is.\n" |
"\t//\t3. Two or more bits would be dropped. In this case, we round\n" |
"\t//\t\tnormally unless we are rounding a value of exactly\n" |
"\t//\t\thalfway between the two. In the halfway case, we\n" |
"\t//\t\tround away from zero.\n" |
"\tgenerate\n" |
"\tif (IWID == OWID) // In this case, the shift is irrelevant and\n" |
"\tbegin // cannot be applied. No truncation or rounding takes\n" |
"\t// effect here.\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-1):0];\n" |
"\n" |
"\tend else if (IWID-SHIFT == OWID)\n" |
"\tbegin // No truncation or rounding, output drops no bits\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n" |
"\n" |
"\tend else if (IWID-SHIFT-1 == OWID)\n" |
"\tbegin // Output drops one bit, can only add one or ... not.\n" |
"\t\twire\t[(OWID-1):0]\ttruncated_value, rounded_up;\n" |
"\t\twire\t\t\tsign_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tfirst_lost_bit = i_val[0];\n" |
"\t\tassign\tsign_bit = i_val[(IWID-1)];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse if (sign_bit)\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse\n" |
"\t\t\t\t\to_val <= rounded_up;\n" |
"\t\t\tend\n" |
"\n" |
"\tend else // If there's more than one bit we are dropping\n" |
"\tbegin\n" |
"\t\twire\t[(OWID-1):0]\ttruncated_value, rounded_up;\n" |
"\t\twire\t\t\tsign_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n" |
"\t\tassign\tsign_bit = i_val[(IWID-1)];\n" |
"\n" |
"\t\twire\t[(IWID-SHIFT-OWID-2):0]\tother_lost_bits;\n" |
"\t\tassign\tother_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse if (|other_lost_bits) // Round up to\n" |
"\t\t\t\t\to_val <= rounded_up; // closest value\n" |
"\t\t\t\telse if (sign_bit)\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse\n" |
"\t\t\t\t\to_val <= rounded_up;\n" |
"\t\t\tend\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
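
#include <cassert>	// assert(), for checking this model against simulation

// Software reference model of the generated roundfromzero.v (a sketch, not
// part of the original generator; ref_roundfromzero() is a hypothetical
// name).  Values round to the nearest output value, and exact halves round
// away from zero: 2.5 -> 3 and -2.5 -> -3.  For a negative half, the
// truncated (floored) value is already the one farther from zero, so only
// positive halves receive the "+1".
// Assumes IWID-SHIFT >= OWID > 0 and IWID <= 63.
long long ref_roundfromzero(long long i_val, int IWID, int OWID, int SHIFT) {
	const int drop = IWID - SHIFT - OWID;	// number of LSBs discarded
	const unsigned long long u = (unsigned long long)i_val;
	const unsigned long long mask = (1ull << OWID) - 1;
	const unsigned long long sgn  = 1ull << (OWID - 1);
	unsigned long long r = u >> drop;		// truncated_value
	if (drop > 0) {
		const bool first_lost = (u >> (drop - 1)) & 1;
		const bool other_lost = (u & ((1ull << (drop - 1)) - 1)) != 0;
		const bool sign_bit = i_val < 0;
		// Above half: round up.  Exactly half: round up only if positive.
		if (first_lost && (other_lost || !sign_bit))
			r += 1;
	}
	r &= mask;				// keep OWID bits, wrapping like the Verilog
	return (long long)((r ^ sgn) - sgn);	// sign-extend
}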
|
void build_convround(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename: convround.v\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: A convergent rounding routine, also known as banker\'s\n" |
"// rounding, Dutch rounding, Gaussian rounding, unbiased\n" |
"// rounding, or ... more, at least according to Wikipedia.\n" |
"//\n" |
"// This form of rounding works by rounding, when the direction is in\n" |
"// question, towards the nearest even value.\n" |
"//\n" |
"//\n%s" |
"//\n", |
prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module convround(i_clk, i_ce, i_val, o_val);\n" |
"\tparameter\tIWID=16, OWID=8, SHIFT=0;\n" |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\tsigned\t[(IWID-1):0]\ti_val;\n" |
"\toutput\treg\tsigned\t[(OWID-1):0]\to_val;\n" |
"\n" |
"\t// Let's deal with three cases to be as general as we can be here\n" |
"\t//\n" |
"\t//\t1. The desired output would lose no bits at all\n" |
"\t//\t2. One bit would be dropped, so the rounding is simply\n" |
"\t//\t\tadjusting the value to be the nearest even number in\n" |
"\t//\t\tcases of being halfway between two. If identically\n" |
"\t//\t\tequal to a number, we just leave it as is.\n" |
"\t//\t3. Two or more bits would be dropped. In this case, we round\n" |
"\t//\t\tnormally unless we are rounding a value of exactly\n" |
"\t//\t\thalfway between the two. In the halfway case we round\n" |
"\t//\t\tto the nearest even number.\n" |
"\tgenerate\n" |
// What if IWID < OWID? We should expand here ... somehow |
"\tif (IWID == OWID) // In this case, the shift is irrelevant and\n" |
"\tbegin // cannot be applied. No truncation or rounding takes\n" |
"\t// effect here.\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-1):0];\n" |
"\n" |
// What if IWID-SHIFT < OWID? Shouldn't we also shift here as well? |
"\tend else if (IWID-SHIFT == OWID)\n" |
"\tbegin // No truncation or rounding, output drops no bits\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\to_val <= i_val[(IWID-SHIFT-1):0];\n" |
"\n" |
"\tend else if (IWID-SHIFT-1 == OWID)\n" |
// Is there any way to limit the number of bits that are examined here, for the |
// purpose of simplifying/reducing logic? I mean, if we go from 32 to 16 bits, |
// must we check all 15 bits for equality to zero? |
"\tbegin // Output drops one bit, can only add one or ... not.\n" |
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n" |
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tlast_valid_bit = truncated_value[0];\n" |
"\t\tassign\tfirst_lost_bit = i_val[0];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse if (last_valid_bit)// Round up to nearest\n" |
"\t\t\t\t\to_val <= rounded_up; // even value\n" |
"\t\t\t\telse // else round down to the nearest\n" |
"\t\t\t\t\to_val <= truncated_value; // even value\n" |
"\t\t\tend\n" |
"\n" |
"\tend else // If there's more than one bit we are dropping\n" |
"\tbegin\n" |
"\t\twire\t[(OWID-1):0] truncated_value, rounded_up;\n" |
"\t\twire\t\t\tlast_valid_bit, first_lost_bit;\n" |
"\t\tassign\ttruncated_value=i_val[(IWID-1-SHIFT):(IWID-SHIFT-OWID)];\n" |
"\t\tassign\trounded_up=truncated_value + {{(OWID-1){1\'b0}}, 1\'b1 };\n" |
"\t\tassign\tlast_valid_bit = truncated_value[0];\n" |
"\t\tassign\tfirst_lost_bit = i_val[(IWID-SHIFT-OWID-1)];\n" |
"\n" |
"\t\twire\t[(IWID-SHIFT-OWID-2):0]\tother_lost_bits;\n" |
"\t\tassign\tother_lost_bits = i_val[(IWID-SHIFT-OWID-2):0];\n" |
"\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\tbegin\n" |
"\t\t\t\tif (!first_lost_bit) // Round down / truncate\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\t\telse if (|other_lost_bits) // Round up to\n" |
"\t\t\t\t\to_val <= rounded_up; // closest value\n" |
"\t\t\t\telse if (last_valid_bit) // Round up to\n" |
"\t\t\t\t\to_val <= rounded_up; // nearest even\n" |
"\t\t\t\telse // else round down to nearest even\n" |
"\t\t\t\t\to_val <= truncated_value;\n" |
"\t\t\tend\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
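
#include <cassert>	// assert(), for checking this model against simulation

// Software reference model of the generated convround.v (a sketch, not
// part of the original generator; ref_convround() is a hypothetical name).
// Values round to the nearest output value; exact halves go to whichever
// neighbour is even, so 2.5 -> 2, 1.5 -> 2, -2.5 -> -2 and -1.5 -> -2.
// The decision uses the same three signals as the Verilog: first_lost_bit,
// the OR of the other lost bits, and last_valid_bit (the LSB kept).
// Assumes IWID-SHIFT >= OWID > 0 and IWID <= 63.
long long ref_convround(long long i_val, int IWID, int OWID, int SHIFT) {
	const int drop = IWID - SHIFT - OWID;	// number of LSBs discarded
	const unsigned long long u = (unsigned long long)i_val;
	const unsigned long long mask = (1ull << OWID) - 1;
	const unsigned long long sgn  = 1ull << (OWID - 1);
	unsigned long long r = u >> drop;		// truncated_value
	if (drop > 0) {
		const bool first_lost = (u >> (drop - 1)) & 1;
		const bool other_lost = (u & ((1ull << (drop - 1)) - 1)) != 0;
		const bool last_valid = r & 1;	// LSB of truncated_value
		// Above half: round up.  Exactly half: round up only if odd.
		if (first_lost && (other_lost || last_valid))
			r += 1;
	}
	r &= mask;				// keep OWID bits, wrapping like the Verilog
	return (long long)((r ^ sgn) - sgn);	// sign-extend
}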
|
/trunk/sw/rounding.h
0,0 → 1,52
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: rounding.h |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: To create one of a series of modules to handle dropping bits |
// within the FFT implementation. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#ifndef ROUNDING_H |
#define ROUNDING_H |
|
typedef enum { |
RND_TRUNCATE, RND_FROMZERO, RND_HALFUP, RND_CONVERGENT |
} ROUND_T; |
|
|
extern void build_truncator(const char *fname); |
extern void build_roundhalfup(const char *fname); |
extern void build_roundfromzero(const char *fname); |
extern void build_convround(const char *fname); |
|
#endif |
/trunk/sw/softmpy.cpp
0,0 → 1,400
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: softmpy.cpp |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: If the chip doesn't have any hardware multiplies, you'll need |
// a soft-multiply implementation. This provides that |
// implementation. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#define _CRT_SECURE_NO_WARNINGS // ms vs 2012 doesn't like fopen |
#include <stdio.h> |
#include <stdlib.h> |
|
#ifdef _MSC_VER // added for ms vs compatibility |
|
#include <io.h> |
#include <direct.h> |
#define _USE_MATH_DEFINES |
|
#endif |
|
#include <string.h> |
#include <string> |
#include <math.h> |
#include <ctype.h> |
#include <assert.h> |
|
#include "defaults.h" |
#include "legal.h" |
#include "softmpy.h" |
|
void build_multiply(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\tshiftaddmpy.v\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose:\tA portable shift and add multiply.\n" |
"//\n" |
"// While both Xilinx and Altera will offer single clock multiplies, this\n" |
"// simple approach will multiply two numbers on any architecture. The\n" |
"// result maintains the full width of the multiply: there are no extra\n" |
"// stuff bits, no rounding, no shifted bits, etc.\n" |
"//\n" |
"// Further, for those applications that can support it, this multiply\n" |
"// is pipelined and will produce one answer per clock.\n" |
"//\n" |
"// For minimal processing delay, make the first parameter the one with\n" |
"// the fewest bits, so that AWIDTH <= BWIDTH.\n" |
"//\n" |
"// The processing delay in this multiply is (AWIDTH+1) cycles. That is,\n" |
"// if the data is present on the input at clock t=0, the result will be\n" |
"// present on the output at time t=AWIDTH+1.\n" |
"//\n" |
"//\n%s" |
"//\n", prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module shiftaddmpy(i_clk, i_ce, i_a, i_b, o_r);\n" |
"\tparameter\tAWIDTH=%d,BWIDTH=", TST_SHIFTADDMPY_AW); |
#ifdef TST_SHIFTADDMPY_BW |
fprintf(fp, "%d;\n", TST_SHIFTADDMPY_BW); |
#else |
fprintf(fp, "AWIDTH;\n"); |
#endif |
fprintf(fp, |
"\tinput\t\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\t[(AWIDTH-1):0]\t\ti_a;\n" |
"\tinput\t\t[(BWIDTH-1):0]\t\ti_b;\n" |
"\toutput\treg\t[(AWIDTH+BWIDTH-1):0]\to_r;\n" |
"\n" |
"\treg\t[(AWIDTH-1):0]\tu_a;\n" |
"\treg\t[(BWIDTH-1):0]\tu_b;\n" |
"\treg\t\t\tsgn;\n" |
"\n" |
"\treg\t[(AWIDTH-2):0]\t\tr_a[0:(AWIDTH-1)];\n" |
"\treg\t[(AWIDTH+BWIDTH-2):0]\tr_b[0:(AWIDTH-1)];\n" |
"\treg\t\t\t\tr_s[0:(AWIDTH-1)];\n" |
"\treg\t[(AWIDTH+BWIDTH-1):0]\tacc[0:(AWIDTH-1)];\n" |
"\tgenvar k;\n" |
"\n" |
"\t// If we were forced to stay within two\'s complement arithmetic,\n" |
"\t// taking the absolute value here would require an additional bit.\n" |
"\t// However, because our results are now unsigned, we can stay\n" |
"\t// within the number of bits given (for now).\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tu_a <= (i_a[AWIDTH-1])?(-i_a):(i_a);\n" |
"\t\t\tu_b <= (i_b[BWIDTH-1])?(-i_b):(i_b);\n" |
"\t\t\tsgn <= i_a[AWIDTH-1] ^ i_b[BWIDTH-1];\n" |
"\t\tend\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tacc[0] <= (u_a[0]) ? { {(AWIDTH){1\'b0}}, u_b }\n" |
"\t\t\t\t\t: {(AWIDTH+BWIDTH){1\'b0}};\n" |
"\t\t\tr_a[0] <= { u_a[(AWIDTH-1):1] };\n" |
"\t\t\tr_b[0] <= { {(AWIDTH-1){1\'b0}}, u_b };\n" |
"\t\t\tr_s[0] <= sgn; // The final sign, needs to be preserved\n" |
"\t\tend\n" |
"\n" |
"\tgenerate\n" |
"\tfor(k=0; k<AWIDTH-1; k=k+1)\n" |
"\tbegin : genstages\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tacc[k+1] <= acc[k] + ((r_a[k][0]) ? {r_b[k],1\'b0}:0);\n" |
"\t\t\tr_a[k+1] <= { 1\'b0, r_a[k][(AWIDTH-2):1] };\n" |
"\t\t\tr_b[k+1] <= { r_b[k][(AWIDTH+BWIDTH-3):0], 1\'b0};\n" |
"\t\t\tr_s[k+1] <= r_s[k];\n" |
"\t\tend\n" |
"\tend\n" |
"\tendgenerate\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_r <= (r_s[AWIDTH-1]) ? (-acc[AWIDTH-1]) : acc[AWIDTH-1];\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
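
#include <cassert>	// assert(), for checking this model against simulation

// Software sketch of the arithmetic shiftaddmpy.v performs (not part of the
// original generator; ref_shiftaddmpy() is a hypothetical name).  It models
// the sign-magnitude shift-and-add algorithm only, not the one stage per
// bit pipeline.  Assumes a and b already fit in AWIDTH and BWIDTH bits of
// two's complement, with AWIDTH+BWIDTH <= 62.
long long ref_shiftaddmpy(long long a, long long b, int AWIDTH) {
	// u_a, u_b, sgn: magnitudes plus the sign of the product.  The hardware
	// can hold the magnitude of the most negative value in the same width
	// only because it treats the magnitude as unsigned.
	const unsigned long long u_a = (a < 0) ? (unsigned long long)(-a)
						: (unsigned long long)a;
	const unsigned long long u_b = (b < 0) ? (unsigned long long)(-b)
						: (unsigned long long)b;
	const bool sgn = (a < 0) != (b < 0);
	// acc: add u_b << k for every set bit k of u_a, one bit per stage
	unsigned long long acc = 0;
	for (int k = 0; k < AWIDTH; k++)
		if ((u_a >> k) & 1)
			acc += u_b << k;
	// o_r: reapply the sign at the end
	return sgn ? -(long long)acc : (long long)acc;
}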
|
void build_bimpy(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename:\t%s\n" |
"//\n" |
"// Project:\t%s\n" |
"//\n" |
"// Purpose:\tA simple 2-bit multiply based upon the fact that LUT's allow\n" |
"// 6-bits of input. In other words, I could build a 3-bit\n" |
"// multiply from 6 LUTs (5 actually, since the first could have two\n" |
"// outputs). This would allow multiplication of three bit digits, save\n" |
"// only for the fact that you would need two bits of carry. The bimpy\n" |
"// approach throttles back a bit and does a 2x2 bit multiply in a LUT,\n" |
"// guaranteeing that it will never carry more than one bit. While this\n" |
"// multiply is hardware independent (and can still run under Verilator\n" |
"// therefore), it is really motivated by trying to optimize for a\n" |
"// specific piece of hardware (Xilinx-7 series ...) that has at least\n" |
"// 4-input LUT's with carry chains.\n" |
"//\n" |
"//\n" |
"//\n%s" |
"//\n", fname, prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module bimpy(i_clk, i_ce, i_a, i_b, o_r);\n" |
"\tparameter\tBW=18, // Number of bits in i_b\n" |
"\t\t\tLUTB=2; // Number of bits in i_a for our LUT multiply\n" |
"\tinput\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\t[(LUTB-1):0]\ti_a;\n" |
"\tinput\t\t[(BW-1):0]\ti_b;\n" |
"\toutput\treg\t[(BW+LUTB-1):0] o_r;\n" |
"\n" |
"\twire [(BW+LUTB-2):0] w_r;\n" |
"\twire [(BW+LUTB-3):1] c;\n" |
"\n" |
"\tassign\tw_r = { ((i_a[1])?i_b:{(BW){1\'b0}}), 1\'b0 }\n" |
"\t\t\t\t^ { 1\'b0, ((i_a[0])?i_b:{(BW){1\'b0}}) };\n" |
"\tassign\tc = { ((i_a[1])?i_b[(BW-2):0]:{(BW-1){1\'b0}}) }\n" |
"\t\t\t& ((i_a[0])?i_b[(BW-1):1]:{(BW-1){1\'b0}});\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_r <= w_r + { c, 2'b0 };\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
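
#include <cassert>	// assert(), for checking this model against simulation

// Software sketch of the bimpy.v arithmetic (not part of the original
// generator; ref_bimpy() is a hypothetical name).  The two partial products
// of a 2-bit by N-bit unsigned multiply are x = i_a[1]*i_b*2 and
// y = i_a[0]*i_b.  bimpy.v splits x + y into a carry-free part (x ^ y) and
// the carries (x & y), relying on x + y == (x ^ y) + 2*(x & y).
// Assumes b < 2^61.
unsigned long long ref_bimpy(unsigned a2, unsigned long long b) {
	const unsigned long long x = (a2 & 2) ? (b << 1) : 0;	// { i_a[1]?i_b:0, 1'b0 }
	const unsigned long long y = (a2 & 1) ? b : 0;		// { 1'b0, i_a[0]?i_b:0 }
	const unsigned long long w_r = x ^ y;			// w_r
	// Bit 0 of x is always zero, so x & y only has bits [BW-1:1]: this is c
	const unsigned long long c = (x & y) >> 1;
	return w_r + (c << 2);					// o_r <= w_r + { c, 2'b0 }
}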
|
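#include <cassert>	// assert(), for checking this model against simulation

// Software sketch of the tableau longbimpy.v builds (not part of the
// original generator; ref_longbimpy() is a hypothetical name, and it models
// the arithmetic only, not the pipeline timing).  The magnitude of one
// operand is consumed LUTB=2 bits at a time.  Each 2-bit digit times the
// other magnitude is one "bimpy" row, which is shifted into place and
// accumulated, and the sign is reapplied at the end.  longbimpy.v also
// swaps the inputs so the narrower one sets the number of rows (TLEN).
// That changes only the row count, not the product, so it is omitted here.
// Assumes both inputs fit in 31 bits.
long long ref_longbimpy(long long a, long long b) {
	const bool sgn = (a < 0) != (b < 0);	// sign of the product
	unsigned long long u_a = (a < 0) ? (unsigned long long)(-a)
					: (unsigned long long)a;
	const unsigned long long u_b = (b < 0) ? (unsigned long long)(-b)
					: (unsigned long long)b;
	unsigned long long acc = 0;
	// One tableau row per 2-bit digit of u_a
	for (int shift = 0; u_a != 0; shift += 2, u_a >>= 2)
		acc += ((u_a & 3) * u_b) << shift;
	return sgn ? -(long long)acc : (long long)acc;
}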
void build_longbimpy(const char *fname) { |
FILE *fp = fopen(fname, "w"); |
if (NULL == fp) { |
fprintf(stderr, "Could not open \'%s\' for writing\n", fname); |
perror("O/S Err was:"); |
return; |
} |
|
fprintf(fp, |
SLASHLINE |
"//\n" |
"// Filename: %s\n" |
"//\n" |
"// Project: %s\n" |
"//\n" |
"// Purpose: A portable shift and add multiply, built with the knowledge\n" |
"// of the existence of a six bit LUT and carry chain. That knowledge\n" |
"// allows us to multiply two bits from one value at a time against all\n" |
"// of the bits of the other value. This sub multiply is called the\n" |
"// bimpy.\n" |
"//\n" |
"// The inputs may be given in either order: the module swaps them\n" |
"// internally, if necessary, so that AW <= BW, minimizing the number\n" |
"// of rows in the tableau.\n" |
"//\n" |
"//\n" |
"//\n%s" |
"//\n", fname, prjname, creator); |
|
fprintf(fp, "%s", cpyleft); |
fprintf(fp, "//\n//\n`default_nettype\tnone\n//\n"); |
fprintf(fp, |
"module longbimpy(i_clk, i_ce, i_a_unsorted, i_b_unsorted, o_r);\n" |
"\tparameter IAW=%d, // The width of i_a, min width is 5\n" |
"\t\t\tIBW=", TST_LONGBIMPY_AW); |
#ifdef TST_LONGBIMPY_BW |
fprintf(fp, "%d", TST_LONGBIMPY_BW); |
#else |
fprintf(fp, "IAW"); |
#endif |
|
fprintf(fp, ", // The width of i_b, can be anything\n" |
"\t\t\t// The following three parameters should not be changed\n" |
"\t\t\t// by any implementation, but are based upon hardware\n" |
"\t\t\t// and the above values:\n" |
"\t\t\tOW=IAW+IBW; // The output width\n"); |
fprintf(fp, |
"\tlocalparam AW = (IAW<IBW) ? IAW : IBW,\n" |
"\t\t\tBW = (IAW<IBW) ? IBW : IAW,\n" |
"\t\t\tIW=(AW+1)&(-2), // Internal width of A\n" |
"\t\t\tLUTB=2, // How many bits we can multiply by at once\n" |
"\t\t\tTLEN=(AW+(LUTB-1))/LUTB; // Number of rows in our tableau\n" |
"\tinput\t\t\t\ti_clk, i_ce;\n" |
"\tinput\t\t[(IAW-1):0]\ti_a_unsorted;\n" |
"\tinput\t\t[(IBW-1):0]\ti_b_unsorted;\n" |
"\toutput\treg\t[(AW+BW-1):0]\to_r;\n" |
"\n" |
"\t//\n" |
"\t// Swap parameter order, so that AW <= BW -- for performance\n" |
"\t// reasons\n" |
"\twire [AW-1:0] i_a;\n" |
"\twire [BW-1:0] i_b;\n" |
"\tgenerate if (IAW <= IBW)\n" |
"\tbegin : NO_PARAM_CHANGE\n" |
"\t\tassign i_a = i_a_unsorted;\n" |
"\t\tassign i_b = i_b_unsorted;\n" |
"\tend else begin : SWAP_PARAMETERS\n" |
"\t\tassign i_a = i_b_unsorted;\n" |
"\t\tassign i_b = i_a_unsorted;\n" |
"\tend endgenerate\n" |
"\n" |
"\treg\t[(IW-1):0]\tu_a;\n" |
"\treg\t[(BW-1):0]\tu_b;\n" |
"\treg\t\t\tsgn;\n" |
"\n" |
"\treg\t[(IW-1-2*(LUTB)):0]\tr_a[0:(TLEN-3)];\n" |
"\treg\t[(BW-1):0]\t\tr_b[0:(TLEN-3)];\n" |
"\treg\t[(TLEN-1):0]\t\tr_s;\n" |
"\treg\t[(IW+BW-1):0]\t\tacc[0:(TLEN-2)];\n" |
"\tgenvar k;\n" |
"\n" |
"\t// First step:\n" |
"\t// Switch to unsigned arithmetic for our multiply, keeping track\n" |
"\t// of the sign along the way. We'll then reapply the sign at\n" |
"\t// the end.\n" |
"\t//\n" |
"\t// If we were forced to stay within two's complement arithmetic,\n" |
"\t// taking the absolute value here would require an additional bit.\n" |
"\t// However, because our results are now unsigned, we can stay\n" |
"\t// within the number of bits given (for now).\n" |
"\tgenerate if (IW > AW)\n" |
"\tbegin\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\t\tu_a <= { 1\'b0, (i_a[AW-1])?(-i_a):(i_a) };\n" |
"\tend else begin\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\t\tu_a <= (i_a[AW-1])?(-i_a):(i_a);\n" |
"\tend endgenerate\n" |
"\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tu_b <= (i_b[BW-1])?(-i_b):(i_b);\n" |
"\t\t\tsgn <= i_a[AW-1] ^ i_b[BW-1];\n" |
"\t\tend\n" |
"\n" |
"\twire [(BW+LUTB-1):0] pr_a, pr_b;\n" |
"\n" |
"\t//\n" |
"\t// Second step: First two 2xN products.\n" |
"\t//\n" |
"\t// Since we have no tableau of additions (yet), we can do both\n" |
"\t// of the first two rows at the same time and add them together.\n" |
"\t// For each later round, we'll have a previous sum to accumulate\n" |
"\t// with each new product, so only one product at a time can\n" |
"\t// follow this--but the first clock can do two at a time.\n" |
"\tbimpy\t#(BW) lmpy_0(i_clk,i_ce,u_a[( LUTB-1): 0], u_b, pr_a);\n" |
"\tbimpy\t#(BW) lmpy_1(i_clk,i_ce,u_a[(2*LUTB-1):LUTB], u_b, pr_b);\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce) r_a[0] <= u_a[(IW-1):(2*LUTB)];\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce) r_b[0] <= u_b;\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce) r_s <= { r_s[(TLEN-2):0], sgn };\n" |
"\talways @(posedge i_clk) // One clk after p[0],p[1] become valid\n" |
"\t\tif (i_ce) acc[0] <= { {(IW-LUTB){1\'b0}}, pr_a}\n" |
"\t\t\t +{ {(IW-(2*LUTB)){1\'b0}}, pr_b, {(LUTB){1\'b0}} };\n" |
"\n" |
"\tgenerate // Keep track of intermediate values, before multiplying them\n" |
"\tif (TLEN > 3) for(k=0; k<TLEN-3; k=k+1)\n" |
"\tbegin : gencopies\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\tbegin\n" |
"\t\t\tr_a[k+1] <= { {(LUTB){1\'b0}},\n" |
"\t\t\t\tr_a[k][(IW-1-(2*LUTB)):LUTB] };\n" |
"\t\t\tr_b[k+1] <= r_b[k];\n" |
"\t\tend\n" |
"\tend endgenerate\n" |
"\n" |
"\tgenerate // The actual multiply and accumulate stage\n" |
"\tif (TLEN > 2) for(k=0; k<TLEN-2; k=k+1)\n" |
"\tbegin : genstages\n" |
"\t\t// First, the multiply: 2 bits times BW bits\n" |
"\t\twire\t[(BW+LUTB-1):0] genp;\n" |
"\t\tbimpy #(BW) genmpy(i_clk,i_ce,r_a[k][(LUTB-1):0],r_b[k], genp);\n" |
"\n" |
"\t\t// Then the accumulate step -- on the next clock\n" |
"\t\talways @(posedge i_clk)\n" |
"\t\t\tif (i_ce)\n" |
"\t\t\t\tacc[k+1] <= acc[k] + {{(IW-LUTB*(k+3)){1\'b0}},\n" |
"\t\t\t\t\tgenp, {(LUTB*(k+2)){1\'b0}} };\n" |
"\tend endgenerate\n" |
"\n" |
"\twire [(IW+BW-1):0] w_r;\n" |
"\tassign\tw_r = (r_s[TLEN-1]) ? (-acc[TLEN-2]) : acc[TLEN-2];\n" |
"\talways @(posedge i_clk)\n" |
"\t\tif (i_ce)\n" |
"\t\t\to_r <= w_r[(AW+BW-1):0];\n" |
"\n" |
"\tgenerate if (IW > AW)\n" |
"\tbegin : VUNUSED\n" |
"\t\t// verilator lint_off UNUSED\n" |
"\t\twire\t[(IW-AW)-1:0]\tunused;\n" |
"\t\tassign\tunused = w_r[(IW+BW-1):(AW+BW)];\n" |
"\t\t// verilator lint_on UNUSED\n" |
"\tend endgenerate\n" |
"\n" |
"endmodule\n"); |
|
fclose(fp); |
} |
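The generated `longbimpy` module converts both operands to sign-magnitude form, forms 2-bit by BW-bit partial products with `bimpy`, shifts each into place and accumulates them down a pipeline, then reapplies the sign. The C sketch below models that arithmetic behaviorally, not cycle by cycle. The function name `longbimpy_model` is hypothetical and not part of the repository, and the sketch assumes both widths are at most 31 bits so the full product fits in an `int64_t`.

```c
#include <assert.h>
#include <stdint.h>

#define LUTB 2 /* bits of the narrower operand consumed per row, as in longbimpy */

/* Hypothetical behavioral model of the generated longbimpy.
 * a and b are sign-extended integers that fit in aw and bw bits
 * (two's complement). Returns the full signed product. */
static int64_t longbimpy_model(int64_t a, int aw, int64_t b, int bw)
{
    assert(aw > 0 && aw <= 31 && bw > 0 && bw <= 31);

    /* Swap so that the narrower operand drives the rows (AW <= BW) */
    if (aw > bw) {
        int64_t t = a; a = b; b = t;
        int tw = aw; aw = bw; bw = tw;
    }

    int tlen = (aw + (LUTB - 1)) / LUTB;      /* rows in the tableau */
    int sgn = (a < 0) != (b < 0);             /* sign of the result */
    uint64_t ua = (uint64_t)(a < 0 ? -a : a); /* magnitudes */
    uint64_t ub = (uint64_t)(b < 0 ? -b : b);

    /* Sanity check: each magnitude fits in its declared width */
    assert((ua >> aw) == 0 && (ub >> bw) == 0);

    /* Each row: one LUTB-bit digit of ua times all of ub (one bimpy),
     * shifted left by LUTB bits per row and added to the accumulator */
    uint64_t acc = 0;
    for (int k = 0; k < tlen; k++) {
        uint64_t digit = (ua >> (LUTB * k)) & ((1u << LUTB) - 1);
        acc += (digit * ub) << (LUTB * k);
    }

    /* Reapply the sign, as the final pipeline stage does */
    return sgn ? -(int64_t)acc : (int64_t)acc;
}
```

For example, `longbimpy_model(-7, 4, 13, 5)` takes two rows: 3*13 = 39 from the low digit of 7, plus (1*13)<<2 = 52 from the high digit, giving 91 and then -91 once the sign is restored. Comparing such a model against `a*b` over every input pair for small widths is a cheap way to check the tableau arithmetic; it says nothing about pipeline timing.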
|
/trunk/sw/softmpy.h
0,0 → 1,47
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: softmpy.h |
// |
// Project: A General Purpose Pipelined FFT Implementation |
// |
// Purpose: If the chip doesn't have any hardware multiplies, you'll need |
// a soft-multiply implementation. This provides that |
// implementation. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015-2018, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// |
#ifndef SOFTMPY_H |
#define SOFTMPY_H |
|
extern void build_multiply(const char *fname); |
extern void build_bimpy(const char *fname); |
extern void build_longbimpy(const char *fname); |
|
#endif // SOFTMPY_H |
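`softmpy.h` declares `build_longbimpy()`, which writes the `longbimpy` module generated above. That module sizes itself from `IAW` and `IBW` through a handful of localparams. A small C sketch of the same arithmetic makes those widths easy to check before synthesis; the struct and function names here are hypothetical, not part of the project, and the sketch assumes `LUTB=2` as in the generated code.

```c
#include <assert.h>

/* Hypothetical mirror of the localparams in the generated longbimpy.v */
struct longbimpy_sizes {
    int aw, bw;  /* narrower / wider input widths after the swap */
    int iw;      /* AW rounded up to an even width */
    int tlen;    /* rows in the tableau */
    int accw;    /* accumulator width, IW+BW */
    int ow;      /* output width, IAW+IBW */
    int bimpys;  /* bimpy instances: lmpy_0, lmpy_1, plus TLEN-2 genstages */
};

static struct longbimpy_sizes longbimpy_params(int iaw, int ibw)
{
    const int lutb = 2; /* LUTB in the generated Verilog */
    struct longbimpy_sizes s;

    assert(iaw > 0 && ibw > 0);
    s.aw = (iaw < ibw) ? iaw : ibw;
    s.bw = (iaw < ibw) ? ibw : iaw;
    s.iw = (s.aw + 1) & ~1;               /* same as (AW+1)&(-2) */
    s.tlen = (s.aw + (lutb - 1)) / lutb;  /* same as (AW+(LUTB-1))/LUTB */
    s.accw = s.iw + s.bw;
    s.ow = iaw + ibw;
    s.bimpys = (s.tlen > 2) ? s.tlen : 2; /* genstages only exist if TLEN > 2 */
    return s;
}
```

For `IAW=12, IBW=7`, this gives AW=7, BW=12, IW=8, TLEN=4, a 20-bit accumulator and a 19-bit output. Because IW == 2*TLEN, the accumulator is one bit wider than the output whenever AW is odd; that extra top bit is what the `VUNUSED` block marks as unused for Verilator.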