biRISC-V - 32-bit dual issue RISC-V CPU


  • 32-bit RISC-V ISA CPU core.
  • Superscalar (dual-issue) in-order 6 or 7 stage pipeline.
  • Supports RISC-V’s integer (I), multiplication and division (M), and CSR instructions (Zicsr) extensions (RV32IMZicsr).
  • Branch prediction (bimodal/gshare) with configurable depth branch target buffer (BTB) and return address stack (RAS).
  • 64-bit instruction fetch, 32-bit data access.
  • 2 x integer ALU (arithmetic, shifters and branch units).
  • 1 x load store unit, 1 x out-of-pipeline divider.
  • Issue and complete up to 2 independent instructions per cycle.
  • Supports user, supervisor and machine mode privilege levels.
  • Basic MMU support - capable of booting Linux with atomics (RV-A) SW emulation.
  • Implements base ISA spec v2.1 and privileged ISA spec v1.11.
  • Verified with Google's RISCV-DV random instruction sequences, using cosimulation against a C++ ISA model.
  • Support for instruction / data cache, AXI bus interfaces or tightly coupled memories.
  • Configurable number of pipeline stages, result forwarding options, and branch prediction resources.
  • Synthesizable Verilog 2001, Verilator and FPGA friendly.
  • CoreMark: 4.1 CoreMark/MHz
  • Dhrystone: 1.9 DMIPS/MHz ('legal compile options' / 337 instructions per iteration)
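
As a rough sanity check on the Dhrystone figure above, the two quoted numbers (1.9 DMIPS/MHz and 337 instructions per iteration) can be turned into an implied instructions-per-cycle value. The sketch below assumes the usual Dhrystone convention that 1 DMIPS corresponds to 1757 iterations per second; that constant is not stated in this README, and the function names are made up for illustration.

```python
# Back-of-envelope check of the Dhrystone figure quoted in the feature list.
# Assumption: the conventional definition where 1 DMIPS equals 1757
# Dhrystone iterations per second (the VAX 11/780 reference score).

VAX_DHRYSTONES_PER_SEC = 1757   # conventional reference, not from this README
DMIPS_PER_MHZ = 1.9             # from the feature list
INSTRS_PER_ITERATION = 337      # from the feature list


def cycles_per_iteration(dmips_per_mhz: float) -> float:
    """Clock cycles spent per Dhrystone iteration at a given DMIPS/MHz."""
    iterations_per_sec_per_mhz = dmips_per_mhz * VAX_DHRYSTONES_PER_SEC
    return 1_000_000 / iterations_per_sec_per_mhz


def implied_ipc(dmips_per_mhz: float, instrs_per_iteration: int) -> float:
    """Average instructions retired per cycle implied by the two figures."""
    return instrs_per_iteration / cycles_per_iteration(dmips_per_mhz)


cycles = cycles_per_iteration(DMIPS_PER_MHZ)
ipc = implied_ipc(DMIPS_PER_MHZ, INSTRS_PER_ITERATION)
print(f"{cycles:.1f} cycles/iteration, IPC ~ {ipc:.2f}")
```

Under that assumption the figures work out to roughly 300 cycles per iteration and an average IPC a little above 1.1, against a dual-issue ceiling of 2.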

Similar Cores

  • SiFive E76
    • RV32IMAFC
    • Dual issue in-order 8 stage pipeline
    • 4 ALU units (2 early, 2 late)
    • ✖ Commercial closed source core / $$
  • WD SweRV RISC-V Core EH1
    • RV32IMC
    • Dual issue in-order 9 stage pipeline
    • 4 ALU units (2 early, 2 late)
    • ✖ System Verilog + auto signal hookup
    • ✖ No data cache option
    • ✖ Not able to boot Linux

Project Aims

  • Boot Linux all the way to a functional userspace environment. ✔
  • Achieve competitive performance for this class of in-order machine (i.e. aim for 80% of WD SweRV CoreMark score). ✔
  • Reasonable PPA / FPGA resource friendly. ✔
  • Fit easily onto cheap hobbyist FPGAs (e.g. Xilinx Artix 7) without using all LUT resources and synthesize > 50MHz. ✔
  • Support various cache and TCM options. ✔
  • Be constructed using readable, maintainable and documented IEEE 1364-2001 Verilog. ✔
  • Simulate in open-source tools such as Verilator and Icarus Verilog. ✔
  • In later releases, add support for atomic extensions.

Prior Work

Based on my previous work;

To clone this project and its dependencies;

git clone --recursive

Running Helloworld

To run a simple test image on the core RTL using Icarus Verilog;

# Install Icarus Verilog (Debian / Ubuntu / Linux Mint)
sudo apt-get install iverilog

# [or] Install Icarus Verilog (Redhat / Centos)
#sudo yum install iverilog

# Run a simple test image (test.elf)
cd tb/tb_core_icarus

The expected output is;

Starting bench
VCD info: dumpfile waveform.vcd opened for output.

1. Initialised data
2. Multiply
3. Divide
4. Shift left
5. Shift right
6. Shift right arithmetic
7. Signed comparision
8. Word access
9. Byte access
10. Comparision
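
If the bench output is captured to a file, a small script can confirm that every numbered step was reached. This is only a sketch: the step labels are copied from the expected output above, while the helper name and the idea of saving the log to a file are assumptions, not part of the project.

```python
import re
from typing import List

# Step labels copied verbatim from the expected output above
# (including the bench's own spelling of "comparision").
EXPECTED_STEPS = [
    "Initialised data",
    "Multiply",
    "Divide",
    "Shift left",
    "Shift right",
    "Shift right arithmetic",
    "Signed comparision",
    "Word access",
    "Byte access",
    "Comparision",
]


def missing_steps(log_text: str) -> List[str]:
    """Return the expected 'N. label' lines that do not appear in the log."""
    seen = {}
    for line in log_text.splitlines():
        match = re.fullmatch(r"\s*(\d+)\.\s+(.*\S)\s*", line)
        if match:
            seen[int(match.group(1))] = match.group(2)
    return [
        f"{number}. {label}"
        for number, label in enumerate(EXPECTED_STEPS, start=1)
        if seen.get(number) != label
    ]
```

An empty list means all ten steps were printed; anything else names the steps that are missing or differ from the expected text.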


Configuration

Param Name                 Valid Range           Description
SUPPORT_SUPER              1/0                   Enable supervisor / user privilege levels.
SUPPORT_MMU                1/0                   Enable basic memory management unit.
SUPPORT_MULDIV             1/0                   Enable HW multiply / divide (RV-M).
SUPPORT_DUAL_ISSUE         1/0                   Support superscalar operation.
SUPPORT_LOAD_BYPASS        1/0                   Support load result bypass paths.
SUPPORT_MUL_BYPASS         1/0                   Support multiply result bypass paths.
SUPPORT_REGFILE_XILINX     1/0                   Support Xilinx optimised register file.
SUPPORT_BRANCH_PREDICTION  1/0                   Enable branch prediction structures.
NUM_BTB_ENTRIES            2 -                   Number of branch target buffer entries.
NUM_BHT_ENTRIES            2 -                   Number of branch history table entries.
BHT_ENABLE                 1/0                   Enable branch history table based prediction.
GSHARE_ENABLE              1/0                   Enable GSHARE branch prediction algorithm.
RAS_ENABLE                 1/0                   Enable return address stack prediction.
NUM_RAS_ENTRIES            2 -                   Number of return stack addresses supported.
EXTRA_DECODE_STAGE         1/0                   Extra decode pipe stage for improved timing.
MEM_CACHE_ADDR_MIN         32'h0 - 32'hffffffff  Lowest cacheable memory address.
MEM_CACHE_ADDR_MAX         32'h0 - 32'hffffffff  Highest cacheable memory address.
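
To illustrate what the GSHARE_ENABLE and NUM_BHT_ENTRIES parameters above select between, here is a small behavioural model of the textbook gshare scheme (branch history table of 2-bit counters, indexed by PC XOR global history) next to plain bimodal indexing. It is a generic sketch, not a description of the RTL: the class name, the power-of-two table size, the reset value of the counters and the history length are all assumptions.

```python
class BranchHistoryTable:
    """Textbook 2-bit-counter predictor with optional gshare indexing."""

    def __init__(self, num_bht_entries: int = 512, gshare_enable: bool = True):
        # Assumption: table size is a power of two so a bit mask can index it.
        assert num_bht_entries >= 2
        assert num_bht_entries & (num_bht_entries - 1) == 0
        self.mask = num_bht_entries - 1
        self.gshare_enable = gshare_enable
        # Assumption: counters start at 1 (weakly not-taken).
        self.counters = [1] * num_bht_entries
        self.history = 0  # global history of recent branch outcomes

    def _index(self, pc: int) -> int:
        word_addr = pc >> 2  # drop byte offset of 32-bit instructions
        if self.gshare_enable:
            word_addr ^= self.history  # gshare: mix in global history
        return word_addr & self.mask

    def predict(self, pc: int) -> bool:
        """Predict taken when the counter is in one of the two upper states."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Train the counter used for this branch, then shift the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor: BranchHistoryTable, pc: int, outcomes) -> float:
    """Fraction of outcomes predicted correctly, training as it goes."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A branch that alternates taken / not-taken defeats a single 2-bit counter
# but is learned once the history is folded into the index.
pattern = [i % 2 == 0 for i in range(200)]
print("bimodal:", accuracy(BranchHistoryTable(8, gshare_enable=False), 0x100, pattern))
print("gshare: ", accuracy(BranchHistoryTable(8, gshare_enable=True), 0x100, pattern))
```

In this model the alternating branch is mispredicted every time with bimodal indexing and predicted almost perfectly with gshare after a short warm-up, which is the usual motivation for making the history-based scheme selectable.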