OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [README.md] - Diff between revs 19 and 20

Go to most recent revision | Show entire file | Details | Blame | View Log

Rev 19 Rev 20
Line 1... Line 1...
# [The NEORV32 Processor](https://github.com/stnolting/neorv32) (RISC-V-compliant)
# [The NEORV32 Processor](https://github.com/stnolting/neorv32) (RISC-V)
 
 
[![Build Status](https://travis-ci.com/stnolting/neorv32.svg?branch=master)](https://travis-ci.com/stnolting/neorv32)
[![Build Status](https://travis-ci.com/stnolting/neorv32.svg?branch=master)](https://travis-ci.com/stnolting/neorv32)
[![license](https://img.shields.io/github/license/stnolting/neorv32)](https://github.com/stnolting/neorv32/blob/master/LICENSE)
[![license](https://img.shields.io/github/license/stnolting/neorv32)](https://github.com/stnolting/neorv32/blob/master/LICENSE)
[![release](https://img.shields.io/github/v/release/stnolting/neorv32)](https://github.com/stnolting/neorv32/releases)
[![release](https://img.shields.io/github/v/release/stnolting/neorv32)](https://github.com/stnolting/neorv32/releases)
 
 
Line 43... Line 43...
[compile the GCC toolchains](https://github.com/riscv/riscv-gnu-toolchain) by yourself, you can also
[compile the GCC toolchains](https://github.com/riscv/riscv-gnu-toolchain) by yourself, you can also
download [pre-compiled toolchains](https://github.com/stnolting/riscv_gcc_prebuilt) for Linux.
download [pre-compiled toolchains](https://github.com/stnolting/riscv_gcc_prebuilt) for Linux.
 
 
For more information take a look a the [![NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/PDF_32.png) NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).
For more information take a look a the [![NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/PDF_32.png) NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).
 
 
 
This project is hosted on [GitHub](https://github.com/stnolting/neorv32) and [opencores.org](https://opencores.org/projects/neorv32).
 
A not-so-complete project log can be found on [hackaday.io](https://hackaday.io/project/174167-the-neorv32-risc-v-processor).
 
 
 
 
###  Key Features
###  Key Features
 
 
- RISC-V-compliant `rv32i` CPU with optional `C`, `E`, `M`, `U`, `Zicsr`, `Zifencei` and PMP (physical memory protection) extensions
- RISC-V-compliant `rv32i` CPU with optional `C`, `E`, `M`, `U`, `Zicsr`, `Zifencei` and PMP (physical memory protection) extensions
- GCC-based toolchain ([pre-compiled rv32i and rv32e toolchains available](https://github.com/stnolting/riscv_gcc_prebuilt))
- GCC-based toolchain ([pre-compiled rv32i and rv32e toolchains available](https://github.com/stnolting/riscv_gcc_prebuilt))
- Application compilation based on [GNU makefiles](https://github.com/stnolting/neorv32/blob/master/sw/example/blink_led/makefile)
- Application compilation based on [GNU makefiles](https://github.com/stnolting/neorv32/blob/master/sw/example/blink_led/makefile)
Line 102... Line 106...
- Add AXI(-Lite) bridges
- Add AXI(-Lite) bridges
- Synthesis results for more platforms
- Synthesis results for more platforms
- Port Dhrystone benchmark
- Port Dhrystone benchmark
- Implement atomic operations (`A` extension) and floating-point operations (`F` extension)
- Implement atomic operations (`A` extension) and floating-point operations (`F` extension)
- Maybe port an RTOS (like [Zephyr](https://github.com/zephyrproject-rtos/zephyr), [freeRTOS](https://www.freertos.org) or [RIOT](https://www.riot-os.org))
- Maybe port an RTOS (like [Zephyr](https://github.com/zephyrproject-rtos/zephyr), [freeRTOS](https://www.freertos.org) or [RIOT](https://www.riot-os.org))
- Make a 64-bit branch someday
 
 
 
 
 
 
 
## Features
## Features
 
 
Line 255... Line 258...
| Intel   | Cyclone IV `EP4CE22F17C6N`        | Terasic DE0-Nano | Quartus Prime Lite 19.1 | balanced       | `rv32imcu` + `Zicsr` + `Zifencei` | 3800 (17%) | 1706  (8%) | 0 (0%) | 231424 (38%) |          - |        - |        100 MHz |
| Intel   | Cyclone IV `EP4CE22F17C6N`        | Terasic DE0-Nano | Quartus Prime Lite 19.1 | balanced       | `rv32imcu` + `Zicsr` + `Zifencei` | 3800 (17%) | 1706  (8%) | 0 (0%) | 231424 (38%) |          - |        - |        100 MHz |
| Lattice | iCE40 UltraPlus `iCE40UP5K-SG48I` | Upduino v2.0     | Radiant 2.1 (LSE)       | timing         | `rv32icu`  + `Zicsr` + `Zifencei` | 4950 (93%) | 1641 (31%) | 0 (0%) |            - |   12 (40%) | 4 (100%) | *c* 22.875 MHz |
| Lattice | iCE40 UltraPlus `iCE40UP5K-SG48I` | Upduino v2.0     | Radiant 2.1 (LSE)       | timing         | `rv32icu`  + `Zicsr` + `Zifencei` | 4950 (93%) | 1641 (31%) | 0 (0%) |            - |   12 (40%) | 4 (100%) | *c* 22.875 MHz |
| Xilinx  | Artix-7 `XC7A35TICSG324-1L`       | Arty A7-35T      | Vivado 2019.2           | default        | `rv32imcu` + `Zicsr` + `Zifencei` | 2445 (12%) | 1893  (4%) | 0 (0%) |            - |    8 (16%) |        - |    *c* 100 MHz |
| Xilinx  | Artix-7 `XC7A35TICSG324-1L`       | Arty A7-35T      | Vivado 2019.2           | default        | `rv32imcu` + `Zicsr` + `Zifencei` | 2445 (12%) | 1893  (4%) | 0 (0%) |            - |    8 (16%) |        - |    *c* 100 MHz |
 
 
**Notes**
**Notes**
* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DEMEM (each 64kb).
* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kb).
The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).
The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).
* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.
* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These device can also be used by the default NEORV32
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These device can also be used by the default NEORV32
bootloader to store and automatically boot an application program after reset (both tested successfully).
bootloader to store and automatically boot an application program after reset (both tested successfully).
 
 
Line 269... Line 272...
 
 
The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the
The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the
[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark
[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark
tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.
tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.
 
 
Results generated for hardware version: `1.3.6.5`
Results generated for hardware version: `1.3.7.0`
 
 
~~~
~~~
**Configuration**
**Configuration**
Hardware:    32kB IMEM, 16kB DMEM, 100MHz clock
Hardware:    32kB IMEM, 16kB DMEM, 100MHz clock
CoreMark:    2000 iterations, MEM_METHOD is MEM_STACK
CoreMark:    2000 iterations, MEM_METHOD is MEM_STACK
Line 281... Line 284...
Peripherals: UART for printing the results
Peripherals: UART for printing the results
~~~
~~~
 
 
| CPU                  | Executable Size | Optimization | CoreMark Score | CoreMarks/MHz |
| CPU                  | Executable Size | Optimization | CoreMark Score | CoreMarks/MHz |
|:---------------------|:---------------:|:------------:|:--------------:|:-------------:|
|:---------------------|:---------------:|:------------:|:--------------:|:-------------:|
| `rv32i`              |    26 764 bytes |        `-O3` |          28.98 |        0.2898 |
| `rv32i`              |    26 748 bytes |        `-O3` |          28.98 |        0.2898 |
| `rv32im`             |    25 612 bytes |        `-O3` |          58.82 |        0.5882 |
| `rv32im`             |    25 580 bytes |        `-O3` |          60.60 |        0.6060 |
| `rv32imc`            |    19 652 bytes |        `-O3` |          60.61 |        0.6061 |
| `rv32imc`            |    19 636 bytes |        `-O3` |          62.50 |        0.6250 |
| `rv32imc` + FAST_MUL |    19 652 bytes |        `-O3` |          71.43 |        0.7143 |
| `rv32imc` + FAST_MUL |    19 636 bytes |        `-O3` |          74.07 |        0.7407 |
 
 
The _FAST_MUL_ configuration uses DSPs for the multiplier of the `M` extensions (enabled via the `FAST_MUL_EN` generic).
The _FAST_MUL_ configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic).
 
 
### Instruction Cycles
### Instruction Cycles
 
 
The NEORV32 CPU is based on a two-stages pipelined architecutre. Each stage uses a multi-cycle processing scheme. Hence,
The NEORV32 CPU is based on a two-stages pipelined architecutre. Each stage uses a multi-cycle processing scheme. Hence,
each instruction requires several clock cycles to execute (2 cycles for ALU operations, ..., 40 cycles for divisions).
each instruction requires several clock cycles to execute (2 cycles for ALU operations, ..., 40 cycles for divisions).
Line 303... Line 306...
The following table shows the performance results for successfully running 2000 CoreMark
The following table shows the performance results for successfully running 2000 CoreMark
iterations, which reflects a pretty good "real-life" work load. The average CPI is computed by
iterations, which reflects a pretty good "real-life" work load. The average CPI is computed by
dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)
dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)
by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O3`.
by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O3`.
 
 
Results generated for hardware version: `1.3.6.5`
Results generated for hardware version: `1.3.7.0`
 
 
| CPU                  | Required Clock Cycles | Executed Instructions | Average CPI |
| CPU                  | Required Clock Cycles | Executed Instructions | Average CPI |
|:---------------------|----------------------:|----------------------:|:-----------:|
|:---------------------|----------------------:|----------------------:|:-----------:|
| `rv32i`              |         6 984 305 325 |         1 468 927 290 |        4.75 |
| `rv32i`              |         6 955 817 507 |         1 468 927 290 |        4.73 |
| `rv32im`             |         3 415 761 325 |           601 565 734 |        5.67 |
| `rv32im`             |         3 376 961 507 |           601 565 750 |        5.61 |
| `rv32imc`            |         3 398 881 094 |           601 565 832 |        5.65 |
| `rv32imc`            |         3 274 832 513 |           601 565 964 |        5.44 |
| `rv32imc` + FAST_MUL |         2 835 121 094 |           601 565 846 |        4.71 |
| `rv32imc` + FAST_MUL |         2 711 072 513 |           601 566 024 |        4.51 |
 
 
The _FAST_MUL_ configuration uses DSPs for the multiplier of the `M` extensions (enabled via the `FAST_MUL_EN` generic).
The _FAST_MUL_ configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic).
 
 
 
 
## Top Entities
## Top Entities
 
 
The top entity of the **processor** is [**neorv32_top.vhd**](https://github.com/stnolting/neorv32/blob/master/rtl/core/neorv32_top.vhd) (from the `rtl/core` folder).
The top entity of the **processor** is [**neorv32_top.vhd**](https://github.com/stnolting/neorv32/blob/master/rtl/core/neorv32_top.vhd) (from the `rtl/core` folder).

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.