Line 1... |
Line 1... |
# [The NEORV32 Processor](https://github.com/stnolting/neorv32) (RISC-V-compliant)
|
# [The NEORV32 Processor](https://github.com/stnolting/neorv32) (RISC-V)
|
|
|
[![Build Status](https://travis-ci.com/stnolting/neorv32.svg?branch=master)](https://travis-ci.com/stnolting/neorv32)
|
[![Build Status](https://travis-ci.com/stnolting/neorv32.svg?branch=master)](https://travis-ci.com/stnolting/neorv32)
|
[![license](https://img.shields.io/github/license/stnolting/neorv32)](https://github.com/stnolting/neorv32/blob/master/LICENSE)
|
[![license](https://img.shields.io/github/license/stnolting/neorv32)](https://github.com/stnolting/neorv32/blob/master/LICENSE)
|
[![release](https://img.shields.io/github/v/release/stnolting/neorv32)](https://github.com/stnolting/neorv32/releases)
|
[![release](https://img.shields.io/github/v/release/stnolting/neorv32)](https://github.com/stnolting/neorv32/releases)
|
|
|
Line 43... |
Line 43... |
[compile the GCC toolchains](https://github.com/riscv/riscv-gnu-toolchain) by yourself, you can also
|
[compile the GCC toolchains](https://github.com/riscv/riscv-gnu-toolchain) by yourself, you can also
|
download [pre-compiled toolchains](https://github.com/stnolting/riscv_gcc_prebuilt) for Linux.
|
download [pre-compiled toolchains](https://github.com/stnolting/riscv_gcc_prebuilt) for Linux.
|
|
|
For more information take a look a the [![NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/PDF_32.png) NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).
|
For more information take a look a the [![NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/PDF_32.png) NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).
|
|
|
|
This project is hosted on [GitHub](https://github.com/stnolting/neorv32) and [opencores.org](https://opencores.org/projects/neorv32).
|
|
A not-so-complete project log can be found on [hackaday.io](https://hackaday.io/project/174167-the-neorv32-risc-v-processor).
|
|
|
|
|
### Key Features
|
### Key Features
|
|
|
- RISC-V-compliant `rv32i` CPU with optional `C`, `E`, `M`, `U`, `Zicsr`, `Zifencei` and PMP (physical memory protection) extensions
|
- RISC-V-compliant `rv32i` CPU with optional `C`, `E`, `M`, `U`, `Zicsr`, `Zifencei` and PMP (physical memory protection) extensions
|
- GCC-based toolchain ([pre-compiled rv32i and rv32e toolchains available](https://github.com/stnolting/riscv_gcc_prebuilt))
|
- GCC-based toolchain ([pre-compiled rv32i and rv32e toolchains available](https://github.com/stnolting/riscv_gcc_prebuilt))
|
- Application compilation based on [GNU makefiles](https://github.com/stnolting/neorv32/blob/master/sw/example/blink_led/makefile)
|
- Application compilation based on [GNU makefiles](https://github.com/stnolting/neorv32/blob/master/sw/example/blink_led/makefile)
|
Line 102... |
Line 106... |
- Add AXI(-Lite) bridges
|
- Add AXI(-Lite) bridges
|
- Synthesis results for more platforms
|
- Synthesis results for more platforms
|
- Port Dhrystone benchmark
|
- Port Dhrystone benchmark
|
- Implement atomic operations (`A` extension) and floating-point operations (`F` extension)
|
- Implement atomic operations (`A` extension) and floating-point operations (`F` extension)
|
- Maybe port an RTOS (like [Zephyr](https://github.com/zephyrproject-rtos/zephyr), [freeRTOS](https://www.freertos.org) or [RIOT](https://www.riot-os.org))
|
- Maybe port an RTOS (like [Zephyr](https://github.com/zephyrproject-rtos/zephyr), [freeRTOS](https://www.freertos.org) or [RIOT](https://www.riot-os.org))
|
- Make a 64-bit branch someday
|
|
|
|
|
|
|
|
## Features
|
## Features
|
|
|
Line 255... |
Line 258... |
| Intel | Cyclone IV `EP4CE22F17C6N` | Terasic DE0-Nano | Quartus Prime Lite 19.1 | balanced | `rv32imcu` + `Zicsr` + `Zifencei` | 3800 (17%) | 1706 (8%) | 0 (0%) | 231424 (38%) | - | - | 100 MHz |
|
| Intel | Cyclone IV `EP4CE22F17C6N` | Terasic DE0-Nano | Quartus Prime Lite 19.1 | balanced | `rv32imcu` + `Zicsr` + `Zifencei` | 3800 (17%) | 1706 (8%) | 0 (0%) | 231424 (38%) | - | - | 100 MHz |
|
| Lattice | iCE40 UltraPlus `iCE40UP5K-SG48I` | Upduino v2.0 | Radiant 2.1 (LSE) | timing | `rv32icu` + `Zicsr` + `Zifencei` | 4950 (93%) | 1641 (31%) | 0 (0%) | - | 12 (40%) | 4 (100%) | *c* 22.875 MHz |
|
| Lattice | iCE40 UltraPlus `iCE40UP5K-SG48I` | Upduino v2.0 | Radiant 2.1 (LSE) | timing | `rv32icu` + `Zicsr` + `Zifencei` | 4950 (93%) | 1641 (31%) | 0 (0%) | - | 12 (40%) | 4 (100%) | *c* 22.875 MHz |
|
| Xilinx | Artix-7 `XC7A35TICSG324-1L` | Arty A7-35T | Vivado 2019.2 | default | `rv32imcu` + `Zicsr` + `Zifencei` | 2445 (12%) | 1893 (4%) | 0 (0%) | - | 8 (16%) | - | *c* 100 MHz |
|
| Xilinx | Artix-7 `XC7A35TICSG324-1L` | Arty A7-35T | Vivado 2019.2 | default | `rv32imcu` + `Zicsr` + `Zifencei` | 2445 (12%) | 1893 (4%) | 0 (0%) | - | 8 (16%) | - | *c* 100 MHz |
|
|
|
**Notes**
|
**Notes**
|
* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DEMEM (each 64kb).
|
* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kb).
|
The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).
|
The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).
|
* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.
|
* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.
|
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These device can also be used by the default NEORV32
|
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These device can also be used by the default NEORV32
|
bootloader to store and automatically boot an application program after reset (both tested successfully).
|
bootloader to store and automatically boot an application program after reset (both tested successfully).
|
|
|
Line 269... |
Line 272... |
|
|
The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the
|
The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the
|
[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark
|
[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark
|
tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.
|
tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.
|
|
|
Results generated for hardware version: `1.3.6.5`
|
Results generated for hardware version: `1.3.7.0`
|
|
|
~~~
|
~~~
|
**Configuration**
|
**Configuration**
|
Hardware: 32kB IMEM, 16kB DMEM, 100MHz clock
|
Hardware: 32kB IMEM, 16kB DMEM, 100MHz clock
|
CoreMark: 2000 iterations, MEM_METHOD is MEM_STACK
|
CoreMark: 2000 iterations, MEM_METHOD is MEM_STACK
|
Line 281... |
Line 284... |
Peripherals: UART for printing the results
|
Peripherals: UART for printing the results
|
~~~
|
~~~
|
|
|
| CPU | Executable Size | Optimization | CoreMark Score | CoreMarks/MHz |
|
| CPU | Executable Size | Optimization | CoreMark Score | CoreMarks/MHz |
|
|:---------------------|:---------------:|:------------:|:--------------:|:-------------:|
|
|:---------------------|:---------------:|:------------:|:--------------:|:-------------:|
|
| `rv32i` | 26 764 bytes | `-O3` | 28.98 | 0.2898 |
|
| `rv32i` | 26 748 bytes | `-O3` | 28.98 | 0.2898 |
|
| `rv32im` | 25 612 bytes | `-O3` | 58.82 | 0.5882 |
|
| `rv32im` | 25 580 bytes | `-O3` | 60.60 | 0.6060 |
|
| `rv32imc` | 19 652 bytes | `-O3` | 60.61 | 0.6061 |
|
| `rv32imc` | 19 636 bytes | `-O3` | 62.50 | 0.6250 |
|
| `rv32imc` + FAST_MUL | 19 652 bytes | `-O3` | 71.43 | 0.7143 |
|
| `rv32imc` + FAST_MUL | 19 636 bytes | `-O3` | 74.07 | 0.7407 |
|
|
|
The _FAST_MUL_ configuration uses DSPs for the multiplier of the `M` extensions (enabled via the `FAST_MUL_EN` generic).
|
The _FAST_MUL_ configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic).
|
|
|
### Instruction Cycles
|
### Instruction Cycles
|
|
|
The NEORV32 CPU is based on a two-stages pipelined architecutre. Each stage uses a multi-cycle processing scheme. Hence,
|
The NEORV32 CPU is based on a two-stages pipelined architecutre. Each stage uses a multi-cycle processing scheme. Hence,
|
each instruction requires several clock cycles to execute (2 cycles for ALU operations, ..., 40 cycles for divisions).
|
each instruction requires several clock cycles to execute (2 cycles for ALU operations, ..., 40 cycles for divisions).
|
Line 303... |
Line 306... |
The following table shows the performance results for successfully running 2000 CoreMark
|
The following table shows the performance results for successfully running 2000 CoreMark
|
iterations, which reflects a pretty good "real-life" work load. The average CPI is computed by
|
iterations, which reflects a pretty good "real-life" work load. The average CPI is computed by
|
dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)
|
dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)
|
by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O3`.
|
by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O3`.
|
|
|
Results generated for hardware version: `1.3.6.5`
|
Results generated for hardware version: `1.3.7.0`
|
|
|
| CPU | Required Clock Cycles | Executed Instructions | Average CPI |
|
| CPU | Required Clock Cycles | Executed Instructions | Average CPI |
|
|:---------------------|----------------------:|----------------------:|:-----------:|
|
|:---------------------|----------------------:|----------------------:|:-----------:|
|
| `rv32i` | 6 984 305 325 | 1 468 927 290 | 4.75 |
|
| `rv32i` | 6 955 817 507 | 1 468 927 290 | 4.73 |
|
| `rv32im` | 3 415 761 325 | 601 565 734 | 5.67 |
|
| `rv32im` | 3 376 961 507 | 601 565 750 | 5.61 |
|
| `rv32imc` | 3 398 881 094 | 601 565 832 | 5.65 |
|
| `rv32imc` | 3 274 832 513 | 601 565 964 | 5.44 |
|
| `rv32imc` + FAST_MUL | 2 835 121 094 | 601 565 846 | 4.71 |
|
| `rv32imc` + FAST_MUL | 2 711 072 513 | 601 566 024 | 4.51 |
|
|
|
The _FAST_MUL_ configuration uses DSPs for the multiplier of the `M` extensions (enabled via the `FAST_MUL_EN` generic).
|
The _FAST_MUL_ configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic).
|
|
|
|
|
## Top Entities
|
## Top Entities
|
|
|
The top entity of the **processor** is [**neorv32_top.vhd**](https://github.com/stnolting/neorv32/blob/master/rtl/core/neorv32_top.vhd) (from the `rtl/core` folder).
|
The top entity of the **processor** is [**neorv32_top.vhd**](https://github.com/stnolting/neorv32/blob/master/rtl/core/neorv32_top.vhd) (from the `rtl/core` folder).
|