# biRISC-V - 32-bit dual issue RISC-V CPU

Github: [http://github.com/ultraembedded/biriscv](http://github.com/ultraembedded/biriscv)

![biRISC-V](docs/biRISC-V.png)

## Features
* 32-bit RISC-V ISA CPU core.
* Superscalar (dual-issue) in-order 6 or 7 stage pipeline.
* Supports the RISC-V integer (I), multiplication and division (M), and CSR instruction (Zicsr) extensions (RV32IMZicsr).
* Branch prediction (bimodal/gshare) with configurable depth branch target buffer (BTB) and return address stack (RAS).
* 64-bit instruction fetch, 32-bit data access.
* 2 x integer ALU (arithmetic, shifters and branch units).
* 1 x load store unit, 1 x out-of-pipeline divider.
* Issue and complete up to 2 independent instructions per cycle.
* Supports user, supervisor and machine mode privilege levels.
* Basic MMU support - capable of booting Linux with atomics (RV-A) SW emulation.
* Implements base ISA spec [v2.1](docs/riscv_isa_spec.pdf) and privileged ISA spec [v1.11](docs/riscv_privileged_spec.pdf).
* Verified using [Google's RISCV-DV](https://github.com/google/riscv-dv) random instruction sequences, with cosimulation against a [C++ ISA model](https://github.com/ultraembedded/exactstep).
* Support for instruction / data cache, AXI bus interfaces or tightly coupled memories.
* Configurable number of pipeline stages, result forwarding options, and branch prediction resources.
* Synthesizable Verilog 2001, Verilator and FPGA friendly.
* CoreMark: **4.1 CoreMark/MHz**
* Dhrystone: **1.9 DMIPS/MHz** ('legal compile options' / 337 instructions per iteration)

*A sequence showing execution of 2 instructions per cycle:*
![Dual-Issue](docs/dual_issue.png)

## Documentation
* [Configuration](http://github.com/ultraembedded/biriscv/docs/configuration.md)
* [Booting Linux](http://github.com/ultraembedded/biriscv/docs/linux.md)
* [Integration](http://github.com/ultraembedded/biriscv/docs/integration.md)
* [Custom Features](http://github.com/ultraembedded/biriscv/docs/custom.md)

## Similar Cores
* [SiFive E76](https://www.sifive.com/cores/e76)
  * RV32IMAFC
  * Dual issue in-order 8 stage pipeline
  * 4 ALU units (2 early, 2 late)
  * :heavy_multiplication_x: *Commercial closed source core/$$*
* [WD SweRV RISC-V Core EH1](https://github.com/chipsalliance/Cores-SweRV)
  * RV32IMC
  * Dual issue in-order 9 stage pipeline
  * 4 ALU units (2 early, 2 late)
  * :heavy_multiplication_x: *System Verilog + auto signal hookup*
  * :heavy_multiplication_x: *No data cache option*
  * :heavy_multiplication_x: *Not able to boot Linux*

## Project Aims
* Boot Linux all the way to a functional userspace environment. :heavy_check_mark:
* Achieve competitive performance for this class of in-order machine (i.e. aim for 80% of WD SweRV CoreMark score). :heavy_check_mark:
* Reasonable PPA / FPGA resource friendly. :heavy_check_mark:
* Fit easily onto cheap hobbyist FPGAs (e.g. Xilinx Artix 7) without using all LUT resources and synthesize > 50MHz. :heavy_check_mark:
* Support various cache and TCM options. :heavy_check_mark:
* Be constructed using readable, maintainable and documented IEEE 1364-2001 Verilog. :heavy_check_mark:
* Simulate in open-source tools such as Verilator and Icarus Verilog. :heavy_check_mark:
* *In later releases, add support for atomic extensions.*

*Booting the stock Linux 5.0.0-rc8 kernel built for RV32IMA to userspace on a Digilent Arty Artix 7 with biRISC-V (with atomic instructions emulated in the bootloader):*
![Linux-Boot](docs/linux-boot.png)

## Prior Work
Based on my previous work:
* Github: [http://github.com/ultraembedded/riscv](http://github.com/ultraembedded/riscv)

## Getting Started

#### Cloning

To clone this project and its dependencies:

```
git clone --recursive https://github.com/ultraembedded/biriscv.git
```

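If the repository was already cloned without `--recursive`, the dependencies can usually be fetched afterwards with `git submodule update`. The sketch below wraps that command in a small helper; the function name `update_submodules` and the default directory name `biriscv` are assumptions for illustration, not part of the project.

```shell
# Hypothetical helper: fetch submodules for an existing clone that was
# made without --recursive. Not part of the biRISC-V repository.
update_submodules() {
    repo_dir="${1:-biriscv}"   # assumed clone directory name
    if [ ! -d "$repo_dir/.git" ]; then
        echo "error: $repo_dir is not a git checkout" >&2
        return 1
    fi
    # Initialise and update all submodules, including nested ones.
    git -C "$repo_dir" submodule update --init --recursive
}
```

For example, `update_submodules biriscv` run from the directory containing the clone.
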
#### Running Helloworld

To run a simple test image on the core RTL using Icarus Verilog:

```
# Install Icarus Verilog (Debian / Ubuntu / Linux Mint)
sudo apt-get install iverilog

# [or] Install Icarus Verilog (Red Hat / CentOS)
#sudo yum install iverilog

# Run a simple test image (test.elf)
cd tb/tb_core_icarus
make
```

The expected output is:
```
Starting bench
VCD info: dumpfile waveform.vcd opened for output.

Test:
1. Initialised data
2. Multiply
3. Divide
4. Shift left
5. Shift right
6. Shift right arithmetic
7. Signed comparision
8. Word access
9. Byte access
10. Comparision
```

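To compare a run against the expected output without reading it by eye, the log can be scanned for each numbered step. This is a minimal sketch: the helper name `check_bench_log` is hypothetical, and it assumes the simulation output was saved to a file and uses the same `N. name` lines (including the spelling) shown above.

```shell
# Hypothetical helper: check that a saved simulation log contains every
# numbered test step from the expected output. Prints each missing step
# and returns non-zero if any step is absent.
check_bench_log() {
    log="$1"
    missing=0
    for step in "1. Initialised data" "2. Multiply" "3. Divide" \
                "4. Shift left" "5. Shift right" "6. Shift right arithmetic" \
                "7. Signed comparision" "8. Word access" "9. Byte access" \
                "10. Comparision"; do
        # -F: match the step text literally, -q: no output on match
        if ! grep -qF "$step" "$log"; then
            echo "missing: $step"
            missing=1
        fi
    done
    return "$missing"
}
```

One possible use, assuming `make` writes the bench output to stdout: run `make | tee run.log` in `tb/tb_core_icarus`, then `check_bench_log run.log`.
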
#### Configuration

| Param Name                | Valid Range          | Description                                   |
| ------------------------- |:--------------------:| ----------------------------------------------|
| SUPPORT_SUPER             | 1/0                  | Enable supervisor / user privilege levels.    |
| SUPPORT_MMU               | 1/0                  | Enable basic memory management unit.          |
| SUPPORT_MULDIV            | 1/0                  | Enable HW multiply / divide (RV-M).           |
| SUPPORT_DUAL_ISSUE        | 1/0                  | Support superscalar operation.                |
| SUPPORT_LOAD_BYPASS       | 1/0                  | Support load result bypass paths.             |
| SUPPORT_MUL_BYPASS        | 1/0                  | Support multiply result bypass paths.         |
| SUPPORT_REGFILE_XILINX    | 1/0                  | Support Xilinx optimised register file.       |
| SUPPORT_BRANCH_PREDICTION | 1/0                  | Enable branch prediction structures.          |
| NUM_BTB_ENTRIES           | 2 -                  | Number of branch target buffer entries.       |
| NUM_BTB_ENTRIES_W         | 1 -                  | Set to log2(NUM_BTB_ENTRIES).                 |
| NUM_BHT_ENTRIES           | 2 -                  | Number of branch history table entries.       |
| NUM_BHT_ENTRIES_W         | 1 -                  | Set to log2(NUM_BHT_ENTRIES).                 |
| BHT_ENABLE                | 1/0                  | Enable branch history table based prediction. |
| GSHARE_ENABLE             | 1/0                  | Enable GSHARE branch prediction algorithm.    |
| RAS_ENABLE                | 1/0                  | Enable return address stack prediction.       |
| NUM_RAS_ENTRIES           | 2 -                  | Number of return stack addresses supported.   |
| NUM_RAS_ENTRIES_W         | 1 -                  | Set to log2(NUM_RAS_ENTRIES).                 |
| EXTRA_DECODE_STAGE        | 1/0                  | Extra decode pipe stage for improved timing.  |
| MEM_CACHE_ADDR_MIN        | 32'h0 - 32'hffffffff | Lowest cacheable memory address.              |
| MEM_CACHE_ADDR_MAX        | 32'h0 - 32'hffffffff | Highest cacheable memory address.             |
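
Each `*_ENTRIES_W` parameter must be kept in step with its `*_ENTRIES` counterpart. The sketch below computes the width for a given entry count; the helper name `log2_width` is hypothetical, the entry counts are example values only, and it assumes entry counts are powers of two (implied by the log2 relationship but not stated in the table).

```shell
# Hypothetical helper: print the *_W value (log2 of the entry count).
# Assumes the entry count must be a power of two and at least 2.
log2_width() {
    n="$1"
    w=0
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    if [ "$n" -lt 2 ] || [ $(( n & (n - 1) )) -ne 0 ]; then
        echo "error: $n is not a power of two >= 2" >&2
        return 1
    fi
    # Count how many times n can be halved before reaching 1.
    while [ "$n" -gt 1 ]; do
        n=$(( n >> 1 ))
        w=$(( w + 1 ))
    done
    echo "$w"
}

log2_width 32    # prints 5, e.g. NUM_BTB_ENTRIES=32 -> NUM_BTB_ENTRIES_W=5
log2_width 512   # prints 9, e.g. NUM_BHT_ENTRIES=512 -> NUM_BHT_ENTRIES_W=9
```
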