Line 20... |
Line 20... |
|
|
## Introduction
|
## Introduction
|
|
|
The NEORV32 is a customizable full-scale microcontroller-like processor system based on a [RISC-V-compliant](https://github.com/stnolting/neorv32_riscv_compliance)
|
The NEORV32 is a customizable full-scale microcontroller-like processor system based on a [RISC-V-compliant](https://github.com/stnolting/neorv32_riscv_compliance)
|
`rv32i` CPU with optional `E`, `C`, `M`, `Zicsr` and `Zifencei` extensions. The CPU was built from scratch and is compliant to the **Unprivileged
|
`rv32i` CPU with optional `E`, `C`, `M`, `Zicsr` and `Zifencei` extensions. The CPU was built from scratch and is compliant to the **Unprivileged
|
ISA Specification Version 2.1** and a subset of the **Privileged Architecture Specification Version 1.12-draft**.
|
ISA Specification Version 2.2** and a subset of the **Privileged Architecture Specification Version 1.12-draft**.
|
|
|
The NEORV32 is intended as an auxiliary processor within larger SoC designs or as a stand-alone custom microcontroller.
|
The NEORV32 is intended as an auxiliary processor within larger SoC designs or as a stand-alone custom microcontroller.
|
Its top entity can be directly synthesized for any FPGA without modifications and provides a full-scale RISC-V based microcontroller.
|
Its top entity can be directly synthesized for any FPGA without modifications and provides a full-scale RISC-V based microcontroller.
|
|
|
The processor provides common peripherals and interfaces like input and output ports, serial interfaces for UART, I²C and SPI,
|
The processor provides common peripherals and interfaces like input and output ports, serial interfaces for UART, I²C and SPI,
|
Line 59... |
Line 59... |
The processor passes the official `rv32i`, `rv32im`, `rv32imc`, `rv32Zicsr` and `rv32Zifencei` [RISC-V compliance tests](https://github.com/riscv/riscv-compliance).
|
The processor passes the official `rv32i`, `rv32im`, `rv32imc`, `rv32Zicsr` and `rv32Zifencei` [RISC-V compliance tests](https://github.com/riscv/riscv-compliance).
|
|
|
| Project component | CI status | Note |
|
| Project component | CI status | Note |
|
|:--------------------------------------------------------------------------------|:----------|:---------|
|
|:--------------------------------------------------------------------------------|:----------|:---------|
|
| [NEORV32 processor](https://github.com/stnolting/neorv32) | [![Test](https://img.shields.io/travis/stnolting/neorv32/master.svg?label=test)](https://travis-ci.com/stnolting/neorv32) | [![sw doc](https://img.shields.io/badge/SW%20documentation-gh--pages-blue)](https://stnolting.github.io/neorv32/files.html) |
|
| [NEORV32 processor](https://github.com/stnolting/neorv32) | [![Test](https://img.shields.io/travis/stnolting/neorv32/master.svg?label=test)](https://travis-ci.com/stnolting/neorv32) | [![sw doc](https://img.shields.io/badge/SW%20documentation-gh--pages-blue)](https://stnolting.github.io/neorv32/files.html) |
|
| [Pre-build toolchain](https://github.com/stnolting/riscv_gcc_prebuilt) | [![Test](https://img.shields.io/travis/stnolting/riscv_gcc_prebuilt/master.svg?label=test)](https://travis-ci.com/stnolting/riscv_gcc_prebuilt) | |
|
| [Pre-built toolchain](https://github.com/stnolting/riscv_gcc_prebuilt) | [![Test](https://img.shields.io/travis/stnolting/riscv_gcc_prebuilt/master.svg?label=test)](https://travis-ci.com/stnolting/riscv_gcc_prebuilt) | |
|
| [RISC-V compliance test](https://github.com/stnolting/neorv32_riscv_compliance) | [![Test](https://img.shields.io/travis/stnolting/neorv32_riscv_compliance/master.svg?label=compliance)](https://travis-ci.com/stnolting/neorv32_riscv_compliance) | |
|
| [RISC-V compliance test](https://github.com/stnolting/neorv32_riscv_compliance) | [![Test](https://img.shields.io/travis/stnolting/neorv32_riscv_compliance/master.svg?label=compliance)](https://travis-ci.com/stnolting/neorv32_riscv_compliance) | |
|
|
|
|
|
### Limitations to be fixed
|
### Non RISC-V-Compliant Issues
|
|
|
* No exception is triggered in `E`-mode when using registers above `x15` yet
|
* No exception is triggered in `E` mode when using registers above `x15` (*needs fixing*)
|
* `misa` CSR is read-only; no dynamic enabling/disabling of implemented CPU extensions during runtime
|
* `misa` CSR is read-only - no dynamic enabling/disabling of implemented CPU extensions during runtime
|
|
* Machine software interrupt `msi` is implemented, but there is no mechanism available to trigger it
|
|
* The `[m]cycleh` and `[m]instreth` CSR counters are only 20-bit wide (in contrast to original 32-bit)
|
|
|
|
|
### To-Do / Wish List
|
### To-Do / Wish List
|
|
|
|
- Option to use DSPs for multiplications in `M` extensions (would be so much faster)
|
- Synthesis results for more platforms
|
- Synthesis results for more platforms
|
- Port Dhrystone benchmark
|
- Port Dhrystone benchmark
|
- Implement atomic operations (`A` extension)
|
- Implement atomic operations (`A` extension)
|
- Implement co-processor for single-precision floating-point operations (`F` extension)
|
- Implement co-processor for single-precision floating-point operations (`F` extension)
|
- Implement user mode (`U` extension)
|
- Implement user mode (`U` extension)
|
- Make a 64-bit branch
|
|
- Maybe port an RTOS (like [freeRTOS](https://www.freertos.org/) or [RIOT](https://www.riot-os.org/))
|
- Maybe port an RTOS (like [freeRTOS](https://www.freertos.org/) or [RIOT](https://www.riot-os.org/))
|
|
- Make a 64-bit branch
|
|
|
|
|
|
|
## Features
|
## Features
|
|
|
Line 96... |
Line 99... |
- Detailed [datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf) (pdf)
|
- Detailed [datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf) (pdf)
|
- Completely described in behavioral, platform-independent VHDL – no primitives, macros, etc.
|
- Completely described in behavioral, platform-independent VHDL – no primitives, macros, etc.
|
- Fully synchronous design, no latches, no gated clocks
|
- Fully synchronous design, no latches, no gated clocks
|
- Small hardware footprint and high operating frequency
|
- Small hardware footprint and high operating frequency
|
- Highly customizable processor configuration
|
- Highly customizable processor configuration
|
- Optional processor-internal data and instruction memories (DMEM/IMEM)
|
- _Optional_ processor-internal data and instruction memories (DMEM/IMEM)
|
- _Optional_ internal bootloader with UART console and automatic SPI flash boot option
|
- _Optional_ internal bootloader with UART console and automatic SPI flash boot option
|
- _Optional_ machine system timer (MTIME), RISC-V-compliant
|
- _Optional_ machine system timer (MTIME), RISC-V-compliant
|
- _Optional_ universal asynchronous receiver and transmitter (UART)
|
- _Optional_ universal asynchronous receiver and transmitter (UART)
|
- _Optional_ 8/16/24/32-bit serial peripheral interface controller (SPI) with 8 dedicated chip select lines
|
- _Optional_ 8/16/24/32-bit serial peripheral interface controller (SPI) with 8 dedicated chip select lines
|
- _Optional_ two wire serial interface controller (TWI), compatible to the I²C standard
|
- _Optional_ two wire serial interface controller (TWI), compatible to the I²C standard
|
Line 109... |
Line 112... |
- _Optional_ watchdog timer (WDT)
|
- _Optional_ watchdog timer (WDT)
|
- _Optional_ PWM controller with 4 channels and 8-bit duty cycle resolution (PWM)
|
- _Optional_ PWM controller with 4 channels and 8-bit duty cycle resolution (PWM)
|
- _Optional_ GARO-based true random number generator (TRNG)
|
- _Optional_ GARO-based true random number generator (TRNG)
|
- _Optional_ core-local interrupt controller with 8 channels (CLIC)
|
- _Optional_ core-local interrupt controller with 8 channels (CLIC)
|
- _Optional_ dummy device (DEVNULL) (can be used for *fast* simulation console output)
|
- _Optional_ dummy device (DEVNULL) (can be used for *fast* simulation console output)
|
|
- System configuration information memory to check hardware configuration by software (SYSINFO)
|
|
|
### CPU Features
|
### CPU Features
|
|
|
![neorv32 Overview](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/neorv32_cpu.png)
|
![neorv32 Overview](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/neorv32_cpu.png)
|
|
|
The CPU is [compliant](https://github.com/stnolting/neorv32_riscv_compliance) to the
|
The CPU is [compliant](https://github.com/stnolting/neorv32_riscv_compliance) to the
|
[official RISC-V specifications](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/riscv-spec.pdf) including a subset of the
|
[official RISC-V specifications (2.2)](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/riscv-spec.pdf) including a subset of the
|
[RISC-V privileged architecture specifications](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/riscv-spec.pdf).
|
[RISC-V privileged architecture specifications (1.12-draft)](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/riscv-spec.pdf).
|
|
|
More information regarding the CPU including a detailed list of the instruction set and the available CSRs can be found in
|
More information regarding the CPU including a detailed list of the instruction set and the available CSRs can be found in
|
the [![NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/PDF_32.png) NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).
|
the [![NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/PDF_32.png) NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).
|
|
|
|
|
**General**:
|
**General**:
|
* No hardware support of unaligned accesses - they will trigger an exception
|
* Modified Harvard architecture (separate CPU interfaces for data and instructions; single processor-bus via bus switch)
|
* Two-stage in-order pipeline (FETCH, EXECUTE); each stage uses a multi-cycle processing scheme
|
* Two-stage in-order pipeline (FETCH, EXECUTE); each stage uses a multi-cycle processing scheme
|
|
* No hardware support of unaligned accesses - they will trigger an exception
|
|
|
|
|
**RV32I base instruction set** (`I` extension):
|
**RV32I base instruction set** (`I` extension):
|
* ALU instructions: `LUI` `AUIPC` `ADDI` `SLTI` `SLTIU` `XORI` `ORI` `ANDI` `SLLI` `SRLI` `SRAI` `ADD` `SUB` `SLL` `SLT` `SLTU` `XOR` `SRL` `SRA` `OR` `AND`
|
* ALU instructions: `LUI` `AUIPC` `ADDI` `SLTI` `SLTIU` `XORI` `ORI` `ANDI` `SLLI` `SRLI` `SRAI` `ADD` `SUB` `SLL` `SLT` `SLTU` `XOR` `SRL` `SRA` `OR` `AND`
|
* Jump and branch instructions: `JAL` `JALR` `BEQ` `BNE` `BLT` `BGE` `BLTU` `BGEU`
|
* Jump and branch instructions: `JAL` `JALR` `BEQ` `BNE` `BLT` `BGE` `BLTU` `BGEU`
|
Line 141... |
Line 146... |
* Memory instructions: `C.LW` `C.SW` `C.LWSP` `C.SWSP`
|
* Memory instructions: `C.LW` `C.SW` `C.LWSP` `C.SWSP`
|
* Misc instructions: `C.EBREAK` (only with `Zicsr` extension)
|
* Misc instructions: `C.EBREAK` (only with `Zicsr` extension)
|
|
|
**Embedded CPU version** (`E` extension):
|
**Embedded CPU version** (`E` extension):
|
* Reduced register file (only the 16 lowest registers)
|
* Reduced register file (only the 16 lowest registers)
|
* No performance counter CSRs
|
|
|
|
**Integer multiplication and division hardware** (`M` extension):
|
**Integer multiplication and division hardware** (`M` extension):
|
* Multiplication instructions: `MUL` `MULH` `MULHSU` `MULHU`
|
* Multiplication instructions: `MUL` `MULH` `MULHSU` `MULHU`
|
* Division instructions: `DIV` `DIVU` `REM` `REMU`
|
* Division instructions: `DIV` `DIVU` `REM` `REMU`
|
|
|
**Privileged architecture / CSR access** (`Zicsr` extension):
|
**Privileged architecture / CSR access** (`Zicsr` extension):
|
* Privilege levels: `M-mode` (Machine mode)
|
* Privilege levels: `M-mode` (Machine mode)
|
* CSR access instructions: `CSRRW` `CSRRS` `CSRRC` `CSRRWI` `CSRRSI` `CSRRCI`
|
* CSR access instructions: `CSRRW` `CSRRS` `CSRRC` `CSRRWI` `CSRRSI` `CSRRCI`
|
* System instructions: `MRET` `WFI`
|
* System instructions: `MRET` `WFI`
|
* Counter CSRs: `cycle` `cycleh` `time` `timeh` `instret` `instreth` `mcycle` `mcycleh` `minstret` `minstreth`
|
* Counter CSRs: `[m]cycle[h]` `[m]instret[h]` `time[h]`
|
* Machine CSRs: `mstatus` `misa`(read-only!) `mie` `mtvec` `mscratch` `mepc` `mcause` `mtval` `mip` `mimpid` `mhartid`
|
* Machine CSRs: `mstatus` `misa`(read-only!) `mie` `mtvec` `mscratch` `mepc` `mcause` `mtval` `mip` `mvendorid` `marchid` `mimpid` `mhartid`
|
* Custom CSRs: `mfeatures` `mclock` `mispacebase` `mdspacebase` `mispacesize` `mdspacesize`
|
|
* Supported exceptions and interrupts:
|
* Supported exceptions and interrupts:
|
* Misaligned instruction address
|
* Misaligned instruction address
|
* Instruction access fault
|
* Instruction access fault
|
* Illegal instruction
|
* Illegal instruction
|
* Breakpoint (via `ebreak` instruction)
|
* Breakpoint (via `ebreak` instruction)
|
* Load address misaligned
|
* Load address misaligned
|
* Load access fault
|
* Load access fault
|
* Store address misaligned
|
* Store address misaligned
|
* Store access fault
|
* Store access fault
|
* Environment call from M-mode (via `ecall` instruction)
|
* Environment call from M-mode (via `ecall` instruction)
|
* Machine software interrupt `msi`
|
|
* Machine timer interrupt `mti` (via MTIME unit)
|
* Machine timer interrupt `mti` (via MTIME unit)
|
* Machine external interrupt `mei` (via CLIC unit)
|
* Machine external interrupt `mei` (via CLIC unit)
|
|
|
**Privileged architecture / FENCE.I** (`Zifencei` extension):
|
**Privileged architecture / FENCE.I** (`Zifencei` extension):
|
* System instructions: `FENCE.I`
|
* System instructions: `FENCE.I`
|
Line 177... |
Line 179... |
## FPGA Implementation Results
|
## FPGA Implementation Results
|
|
|
This chapter shows exemplary implementation results of the NEORV32 processor for an **Intel Cyclone IV EP4CE22F17C6N FPGA** on
|
This chapter shows exemplary implementation results of the NEORV32 processor for an **Intel Cyclone IV EP4CE22F17C6N FPGA** on
|
a DE0-nano board. The design was synthesized using **Intel Quartus Prime Lite 19.1** ("balanced implementation"). The timing
|
a DE0-nano board. The design was synthesized using **Intel Quartus Prime Lite 19.1** ("balanced implementation"). The timing
|
information is derived from the Timing Analyzer / Slow 1200mV 0C Model. If not otherwise specified, the default configuration
|
information is derived from the Timing Analyzer / Slow 1200mV 0C Model. If not otherwise specified, the default configuration
|
of the processor's generics is assumed. No constraints were used.
|
of the processor's generics is assumed. No constraints were used at all.
|
|
|
### CPU
|
### CPU
|
|
|
Results generated for hardware version: `1.0.0.0`
|
Results generated for hardware version: `1.2.0.0`
|
|
|
| CPU Configuration | LEs | FFs | Memory bits | DSPs | f_max |
|
| CPU Configuration | LEs | FFs | Memory bits | DSPs | f_max |
|
|:--------------------|:----------:|:--------:|:-----------:|:------:|:-------:|
|
|:---------------------------------|:----------:|:--------:|:-----------:|:----:|:-------:|
|
| `rv32i` | 1027 | 474 | 2048 | 0 (0%) | 111 MHz |
|
| `rv32i` | 1065 | 477 | 2048 | 0 | 112 MHz |
|
| `rv32i` + `Zicsr` | 1721 | 868 | 2048 | 0 (0%) | 104 MHz |
|
| `rv32i` + `Zicsr` + `Zifencei` | 1914 | 837 | 2048 | 0 | 100 MHz |
|
| `rv32im` + `Zicsr` | 2298 | 1115 | 2048 | 0 (0%) | 103 MHz |
|
| `rv32im` + `Zicsr` + `Zifencei` | 2542 | 1085 | 2048 | 0 | 100 MHz |
|
| `rv32imc` + `Zicsr` | 2557 | 1138 | 2048 | 0 (0%) | 103 MHz |
|
| `rv32imc` + `Zicsr` + `Zifencei` | 2806 | 1102 | 2048 | 0 | 100 MHz |
|
| `rv32emc` + `Zicsr` | 2342 | 1005 | 1024 | 0 (0%) | 100 MHz |
|
| `rv32emc` + `Zicsr` + `Zifencei` | 2783 | 1102 | 1024 | 0 | 100 MHz |
|
|
|
### Processor-Internal Peripherals and Memories
|
### Processor-Internal Peripherals and Memories
|
|
|
Results generated for hardware version: `1.0.5.0`
|
Results generated for hardware version: `1.2.0.0`
|
|
|
| Module | Description | LEs | FFs | Memory bits | DSPs |
|
| Module | Description | LEs | FFs | Memory bits | DSPs |
|
|:---------|:------------------------------------------------|:---:|:---:|:-----------:|:----:|
|
|:---------|:------------------------------------------------|:---:|:---:|:-----------:|:----:|
|
| BOOT ROM | Bootloader ROM (4kB) | 3 | 1 | 32 768 | 0 |
|
| BOOT ROM | Bootloader ROM (4kB) | 3 | 1 | 32 768 | 0 |
|
| DEVNULL | Dummy device | 3 | 1 | 0 | 0 |
|
| DEVNULL | Dummy device | 3 | 1 | 0 | 0 |
|
Line 205... |
Line 207... |
| GPIO | General purpose input/output ports | 38 | 33 | 0 | 0 |
|
| GPIO | General purpose input/output ports | 38 | 33 | 0 | 0 |
|
| IMEM | Processor-internal instruction memory (16kB) | 7 | 2 | 131 072 | 0 |
|
| IMEM | Processor-internal instruction memory (16kB) | 7 | 2 | 131 072 | 0 |
|
| MTIME | Machine system timer | 269 | 166 | 0 | 0 |
|
| MTIME | Machine system timer | 269 | 166 | 0 | 0 |
|
| PWM | Pulse-width modulation controller | 76 | 69 | 0 | 0 |
|
| PWM | Pulse-width modulation controller | 76 | 69 | 0 | 0 |
|
| SPI | Serial peripheral interface | 206 | 125 | 0 | 0 |
|
| SPI | Serial peripheral interface | 206 | 125 | 0 | 0 |
|
|
| SYSINFO | System configuration information memory | 7 | 7 | 0 | 0 |
|
| TRNG | True random number generator | 104 | 93 | 0 | 0 |
|
| TRNG | True random number generator | 104 | 93 | 0 | 0 |
|
| TWI | Two-wire interface | 78 | 44 | 0 | 0 |
|
| TWI | Two-wire interface | 78 | 44 | 0 | 0 |
|
| UART | Universal asynchronous receiver/transmitter | 151 | 108 | 0 | 0 |
|
| UART | Universal asynchronous receiver/transmitter | 151 | 108 | 0 | 0 |
|
| WDT | Watchdog timer | 57 | 45 | 0 | 0 |
|
| WDT | Watchdog timer | 57 | 45 | 0 | 0 |
|
|
|
|
|
### Exemplary FPGA Setups
|
### Exemplary FPGA Setups
|
|
|
Exemplary implementation results for different FPGA platforms. The processor setup uses *all provided peripherals*,
|
Exemplary implementation results for different FPGA platforms. The processor setup uses *all provided peripherals*,
|
all CPU extensions (`rv32imc` + `Zicsr` + `Zifencei`, no `E` extension), no external memory interface and only internal
|
no external memory interface and only internal instruction and data memories. IMEM uses 16kB and DMEM uses 8kB memory space. The setup's top entity connects most of the
|
instruction and data memories. IMEM uses 16kB and DMEM uses 8kB memory space. The setup top entity connects most of the
|
|
processor's [top entity](https://github.com/stnolting/neorv32/blob/master/rtl/core/neorv32_top.vhd) signals
|
processor's [top entity](https://github.com/stnolting/neorv32/blob/master/rtl/core/neorv32_top.vhd) signals
|
to FPGA pins - except for the Wishbone bus and the external interrupt signals.
|
to FPGA pins - except for the Wishbone bus and the interrupt signals.
|
|
|
Results generated for hardware version: `1.0.1.1`
|
Results generated for hardware version: `1.2.0.0`
|
|
|
| Vendor | FPGA | Board | Toolchain | Impl. strategy | LUT / LE | FF / REG | DSP | Memory Bits | BRAM / EBR | SPRAM | Frequency |
|
| Vendor | FPGA | Board | Toolchain | Impl. strategy |CPU | LUT / LE | FF / REG | DSP | Memory Bits | BRAM / EBR | SPRAM | Frequency |
|
|:--------|:----------------------------------|:-----------------|:------------------------|:---------------|:-----------|:-----------|:-------|:-------------|:-----------|:---------|------------:|
|
|:--------|:----------------------------------|:-----------------|:------------------------|:---------------|:---------------------------------|:-----------|:-----------|:-------|:-------------|:-----------|:---------|------------:|
|
| Intel | Cyclone IV `EP4CE22F17C6N` | Terasic DE0-Nano | Quartus Prime Lite 19.1 | balanced | 3841 (17%) | 1866 (8%) | 0 (0%) | 231424 (38%) | - | - | 103 MHz |
|
| Intel | Cyclone IV `EP4CE22F17C6N` | Terasic DE0-Nano | Quartus Prime Lite 19.1 | balanced | `rv32imc` + `Zicsr` + `Zifencei` | 4066 (18%) | 1877 (8%) | 0 (0%) | 231424 (38%) | - | - | 100 MHz |
|
| Lattice | iCE40 UltraPlus `iCE40UP5K-SG48I` | Upduino v2.0 | Radiant 2.1 (LSE) | default | 5014 (95%) | 1952 (37%) | 0 (0%) | - | 12 (40%) | 4 (100%) | c 20.25 MHz |
|
| Lattice | iCE40 UltraPlus `iCE40UP5K-SG48I` | Upduino v2.0 | Radiant 2.1 (LSE) | timing | `rv32ic` + `Zicsr` + `Zifencei` | 5017 (95%) | 1717 (32%) | 0 (0%) | - | 12 (40%) | 4 (100%) | c 20.25 MHz |
|
| Xilinx | Artix-7 `XC7A35TICSG324-1L` | Arty A7-35T | Vivado 2019.2 | default | 2312 (11%) | 1924 (5%) | 0 (0%) | - | 8 (16%) | - | c 100 MHz |
|
| Xilinx | Artix-7 `XC7A35TICSG324-1L` | Arty A7-35T | Vivado 2019.2 | default | `rv32imc` + `Zicsr` + `Zifencei` | 2494 (12%) | 1930 (5%) | 0 (0%) | - | 8 (16%) | - | c 100 MHz |
|
|
|
**Notes**
|
**Notes**
|
* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kB).
|
* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kB).
|
The FPGA-specific memory components can be found in the [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up) folder.
|
The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).
|
* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are `f_max` results from the place and route timing reports.
|
* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.
|
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32
|
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32
|
bootloader to store and automatically boot an application program after reset (both tested successfully).
|
bootloader to store and automatically boot an application program after reset (both tested successfully).
|
|
|
## Performance
|
## Performance
|
|
|
Line 242... |
Line 244... |
|
|
The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the
|
The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the
|
[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark
|
[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark
|
tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.
|
tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.
|
|
|
Results generated for hardware version: `1.0.0.0`
|
Results generated for hardware version: `1.2.0.0`
|
|
|
~~~
|
~~~
|
**Configuration**
|
**Configuration**
|
Hardware: 32kB IMEM, 16kb DMEM, 100MHz clock
|
Hardware: 32kB IMEM, 16kB DMEM, 100MHz clock
|
CoreMark: 2000 iterations, MEM_METHOD is MEM_STACK
|
CoreMark: 2000 iterations, MEM_METHOD is MEM_STACK
|
CPU extensions: `rv32i` or `rv32im` or `rv32imc`
|
Compiler: RISCV32-GCC 9.2.0
|
Used peripherals: UART for printing the results
|
Peripherals: UART for printing the results
|
~~~
|
~~~
|
|
|
| __Configuration__ | __Optimization__ | __Executable Size__ | __CoreMark Score__ | __CoreMarks/MHz__ |
|
| CPU | Optimization | CoreMark Score | CoreMarks/MHz |
|
|:------------------|:----------------:|:-------------------:|:------------------:|:-----------------:|
|
|:---------------------------------|:------------:|:--------------:|:-------------:|
|
| `rv32i` | `-Os` | 18 044 bytes | 21.98 | 0.21 |
|
| `rv32i` + `Zicsr` + `Zifencei` | `-O2` | 25.97 | 0.2597 |
|
| `rv32i` | `-O2` | 20 388 bytes | 25 | 0.25 |
|
| `rv32im` + `Zicsr` + `Zifencei` | `-O2` | 55.55 | 0.5555 |
|
| `rv32im` | `-Os` | 16 980 bytes | 40 | 0.40 |
|
| `rv32imc` + `Zicsr` + `Zifencei` | `-O2` | 54.05 | 0.5405 |
|
| `rv32im` | `-O2` | 19 436 bytes | 51.28 | 0.51 |
|
|
| `rv32imc` | `-Os` | 13 076 bytes | 39.22 | 0.39 |
|
|
| `rv32imc` | `-O2` | 15 208 bytes | 50 | 0.50 |
|
|
|
|
|
|
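As a rough cross-check of the CoreMarks/MHz column, the CoreMark score can be normalized by the clock frequency given in the configuration block (100MHz). The sketch below is only an illustration: the helper name `coremarks_per_mhz` is hypothetical and not part of the NEORV32 project, and the scores are copied from the `-O2` rows listed for hardware version `1.2.0.0`.

```python
# Illustrative helper (hypothetical name, not part of the NEORV32 sources):
# normalize a CoreMark score to the clock frequency in MHz.
def coremarks_per_mhz(score: float, clock_hz: float) -> float:
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return score / (clock_hz / 1e6)


CLOCK_HZ = 100_000_000  # 100MHz clock, taken from the configuration block

# CoreMark scores copied from the table rows for hardware version 1.2.0.0.
scores = {
    "rv32i + Zicsr + Zifencei": 25.97,
    "rv32im + Zicsr + Zifencei": 55.55,
    "rv32imc + Zicsr + Zifencei": 54.05,
}

for cpu, score in scores.items():
    print(f"{cpu}: {coremarks_per_mhz(score, CLOCK_HZ):.4f} CoreMarks/MHz")
```

Because the clock is exactly 100MHz, the normalized value is simply the score divided by 100, which is why both columns carry the same digits.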
### Instruction Cycles
|
### Instruction Cycles
|
|
|
The NEORV32 CPU is based on a two-stage pipelined architecture. Each stage uses a multi-cycle processing scheme. Hence,
|
The NEORV32 CPU is based on a two-stage pipelined architecture. Each stage uses a multi-cycle processing scheme. Hence,
|
Line 274... |
Line 273... |
Please note that the CPU-internal shifter (e.g. for the `SLL` instruction) as well as the multiplier and divider of the
|
Please note that the CPU-internal shifter (e.g. for the `SLL` instruction) as well as the multiplier and divider of the
|
`M` extension use a bit-serial approach and require several cycles for completion.
|
`M` extension use a bit-serial approach and require several cycles for completion.
|
|
|
The following table shows the performance results for successfully running 2000 CoreMark
|
The following table shows the performance results for successfully running 2000 CoreMark
|
iterations, which reflects a pretty good "real-life" work load. The average CPI is computed by
|
iterations, which reflects a pretty good "real-life" work load. The average CPI is computed by
|
dividing the total number of required clock cycles (all of CoreMark
|
dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)
|
– not only the timed core) by the number of executed instructions (`instret[h]` CSRs). The executables
|
by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O2`.
|
were generated using optimization `-O2`.
|
|
|
Results generated for hardware version: `1.2.0.0`
|
| CPU / Toolchain Config. | Required Clock Cycles | Executed Instructions | Average CPI |
|
|
|:------------------------|----------------------:|----------------------:|:-----------:|
|
| CPU | Required Clock Cycles | Executed Instructions | Average CPI |
|
| `rv32i` | 19 355 607 369 | 2 995 064 579 | 6.5 |
|
|:---------------------------------|----------------------:|----------------------:|:-----------:|
|
| `rv32im` | 5 809 384 583 | 867 377 291 | 6.7 |
|
| `rv32i` + `Zicsr` + `Zifencei` | 7 754 927 850 | 1 492 843 669 | 5.2 |
|
| `rv32imc` | 5 560 220 723 | 825 898 407 | 6.7 |
|
| `rv32im` + `Zicsr` + `Zifencei` | 3 684 015 850 | 626 274 115 | 5.9 |
|
|
| `rv32imc` + `Zicsr` + `Zifencei` | 3 788 220 853 | 626 274 115 | 6.0 |
|
|
|
|
|
|
|
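The average CPI described above is a plain ratio of the two counter values. The following sketch recomputes it from the cycle and instruction counts listed for hardware version `1.2.0.0`. It is an illustration only: the function name `average_cpi` is hypothetical, and the numbers are typed in from the table rather than read from the `cycle[h]` and `instret[h]` CSRs of a running system.

```python
# Illustrative helper (hypothetical name): average cycles per instruction
# as the ratio of required clock cycles to executed instructions.
def average_cpi(clock_cycles: int, instructions: int) -> float:
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return clock_cycles / instructions


# (required clock cycles, executed instructions), copied from the table.
results = {
    "rv32i + Zicsr + Zifencei": (7_754_927_850, 1_492_843_669),
    "rv32im + Zicsr + Zifencei": (3_684_015_850, 626_274_115),
    "rv32imc + Zicsr + Zifencei": (3_788_220_853, 626_274_115),
}

for cpu, (cycles, instret) in results.items():
    # One decimal place, matching the precision used in the table.
    print(f"{cpu}: average CPI = {average_cpi(cycles, instret):.1f}")
```

Rounded to one decimal place, the ratios come out as 5.2, 5.9 and 6.0, which agrees with the Average CPI column.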
## Top Entity
|
## Top Entity
|
|
|
The top entity of the processor is [**neorv32_top.vhd**](https://github.com/stnolting/neorv32/blob/master/rtl/core/neorv32_top.vhd) (from the `rtl/core` folder).
|
The top entity of the processor is [**neorv32_top.vhd**](https://github.com/stnolting/neorv32/blob/master/rtl/core/neorv32_top.vhd) (from the `rtl/core` folder).
|
Just instantiate this file in your project and you are ready to go! All signals of this top entity are of type *std_ulogic* or *std_ulogic_vector*, respectively
|
Just instantiate this file in your project and you are ready to go! All signals of this top entity are of type *std_ulogic* or *std_ulogic_vector*, respectively
|
(except for the TWI signals, which are of type *std_logic*).
|
(except for the TWI signals, which are of type *std_logic*).
|
|
|
Use the generics to configure the processor according to your needs. Each generics is initialized with the default configuration.
|
Use the generics to configure the processor according to your needs. Each generic is initialized with the default configuration.
|
Detailed information regarding the signals and configuration generics can be found in the [NEORV32 documentation](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).
|
Detailed information regarding the signals and configuration generics can be found in the [NEORV32 documentation](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).
|
|
|
```vhdl
|
```vhdl
|
entity neorv32_top is
|
entity neorv32_top is
|
generic (
|
generic (
|
-- General --
|
-- General --
|
CLOCK_FREQUENCY : natural := 0; -- clock frequency of clk_i in Hz
|
CLOCK_FREQUENCY : natural := 0; -- clock frequency of clk_i in Hz
|
HART_ID : std_ulogic_vector(31 downto 0) := x"00000000"; -- custom hardware thread ID
|
|
BOOTLOADER_USE : boolean := true; -- implement processor-internal bootloader?
|
BOOTLOADER_USE : boolean := true; -- implement processor-internal bootloader?
|
CSR_COUNTERS_USE : boolean := true; -- implement RISC-V perf. counters ([m]instret[h], [m]cycle[h], time[h])?
|
CSR_COUNTERS_USE : boolean := true; -- implement RISC-V perf. counters ([m]instret[h], [m]cycle[h], time[h])?
|
|
USER_CODE : std_ulogic_vector(31 downto 0) := x"00000000"; -- custom user code
|
-- RISC-V CPU Extensions --
|
-- RISC-V CPU Extensions --
|
CPU_EXTENSION_RISCV_C : boolean := true; -- implement compressed extension?
|
CPU_EXTENSION_RISCV_C : boolean := true; -- implement compressed extension?
|
CPU_EXTENSION_RISCV_E : boolean := false; -- implement embedded RF extension?
|
CPU_EXTENSION_RISCV_E : boolean := false; -- implement embedded RF extension?
|
CPU_EXTENSION_RISCV_M : boolean := true; -- implement mul/div extension?
|
CPU_EXTENSION_RISCV_M : boolean := true; -- implement mul/div extension?
|
CPU_EXTENSION_RISCV_Zicsr : boolean := true; -- implement CSR system?
|
CPU_EXTENSION_RISCV_Zicsr : boolean := true; -- implement CSR system?
|
Line 350... |
Line 350... |
wb_sel_o : out std_ulogic_vector(03 downto 0); -- byte enable
|
wb_sel_o : out std_ulogic_vector(03 downto 0); -- byte enable
|
wb_stb_o : out std_ulogic; -- strobe
|
wb_stb_o : out std_ulogic; -- strobe
|
wb_cyc_o : out std_ulogic; -- valid cycle
|
wb_cyc_o : out std_ulogic; -- valid cycle
|
wb_ack_i : in std_ulogic := '0'; -- transfer acknowledge
|
wb_ack_i : in std_ulogic := '0'; -- transfer acknowledge
|
wb_err_i : in std_ulogic := '0'; -- transfer error
|
wb_err_i : in std_ulogic := '0'; -- transfer error
|
|
-- Advanced memory control signals (available if MEM_EXT_USE = true) --
|
|
fence_o : out std_ulogic; -- indicates an executed FENCE operation
|
|
fencei_o : out std_ulogic; -- indicates an executed FENCEI operation
|
-- GPIO (available if IO_GPIO_USE = true) --
|
-- GPIO (available if IO_GPIO_USE = true) --
|
gpio_o : out std_ulogic_vector(15 downto 0); -- parallel output
|
gpio_o : out std_ulogic_vector(15 downto 0); -- parallel output
|
gpio_i : in std_ulogic_vector(15 downto 0) := (others => '0'); -- parallel input
|
gpio_i : in std_ulogic_vector(15 downto 0) := (others => '0'); -- parallel input
|
-- UART (available if IO_UART_USE = true) --
|
-- UART (available if IO_UART_USE = true) --
|
uart_txd_o : out std_ulogic; -- UART send data
|
uart_txd_o : out std_ulogic; -- UART send data
|
Line 417... |
Line 420... |
[https://github.com/stnolting/riscv_gcc_prebuilt](https://github.com/stnolting/riscv_gcc_prebuilt)
|
[https://github.com/stnolting/riscv_gcc_prebuilt](https://github.com/stnolting/riscv_gcc_prebuilt)
|
|
|
|
|
### Download the NEORV32 and Create a Hardware Project
|
### Download the NEORV32 and Create a Hardware Project
|
|
|
Now its time to get the most recent version the NEORV32 Processor project from GitHub. Clone the NEORV32 repository using
|
Get the sources of the NEORV32 Processor project. You can either download a [release](https://github.com/stnolting/neorv32/releases)
|
`git` from the command line (suggested for easy project updates via `git pull`):
|
or get the most recent version of this project as [`*.zip` file](https://github.com/stnolting/neorv32/archive/master.zip) or using `git clone` (suggested for easy project updates via `git pull`):
|
|
|
$ git clone https://github.com/stnolting/neorv32.git
|
$ git clone https://github.com/stnolting/neorv32.git
|
|
|
Create a new project with your FPGA design tool of choice. Add all the `*.vhd` files from the [`rtl/core`](https://github.com/stnolting/neorv32/blob/master/rtl)
|
Create a new project with your FPGA design tool of choice and add all the `*.vhd` files from the [`rtl/core`](https://github.com/stnolting/neorv32/blob/master/rtl)
|
folder to this project and add them to a **new library** called `neorv32`.
|
folder to this project. Make sure to add them to a **new library** called `neorv32`.
|
|
|
You can either instantiate the [processor's top entity](https://github.com/stnolting/neorv32#top-entity) in your own project or you
|
You can either instantiate the [processor's top entity](https://github.com/stnolting/neorv32#top-entity) in your own project or you
|
can use a simple [test setup](https://github.com/stnolting/neorv32/blob/master/rtl/top_templates/neorv32_test_setup.vhd) (from the project's
|
can use a simple [test setup](https://github.com/stnolting/neorv32/blob/master/rtl/top_templates/neorv32_test_setup.vhd) (from the project's
|
[`rtl/top_templates`](https://github.com/stnolting/neorv32/blob/master/rtl/top_templates) folder) as top entity.
|
[`rtl/top_templates`](https://github.com/stnolting/neorv32/blob/master/rtl/top_templates) folder) as top entity.
|
This test setup instantiates the processor, implements most of the peripherals and the basic ISA. Only the UART, clock, reset and some GPIO output signals are
|
This test setup instantiates the processor and implements most of the peripherals and some ISA extensions. Only the UART, clock, reset and some GPIO output signals are
|
propagated (basically, it's an FPGA "hello world" example):
|
propagated (basically, it's an FPGA "hello world" example):
|
|
|
```vhdl
|
```vhdl
|
entity neorv32_test_setup is
|
entity neorv32_test_setup is
|
port (
|
port (
|
Line 518... |
Line 521... |
|
|
|
|
|
|
## Legal
|
## Legal
|
|
|
This is project is released under the BSD 3-Clause license. No copyright infringement intended.
|
This project is released under the BSD 3-Clause license. No copyright infringement intended.
|
Other implied or used projects might have different licensing - see their documentation to get more information.
|
Other implied or used projects might have different licensing - see their documentation to get more information.
|
|
|
#### Citation
|
#### Citation
|
|
|
If you are using the NEORV32 Processor in some kind of publication, please cite it as follows:
|
If you are using the NEORV32 Processor in some kind of publication, please cite it as follows:
|
Line 593... |
Line 596... |
|
|
![Open Source Hardware Logo https://www.oshwa.org](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/oshw_logo.png)
|
![Open Source Hardware Logo https://www.oshwa.org](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/figures/oshw_logo.png)
|
|
|
This project is not affiliated with or endorsed by the Open Source Initiative (https://www.oshwa.org / https://opensource.org).
|
This project is not affiliated with or endorsed by the Open Source Initiative (https://www.oshwa.org / https://opensource.org).
|
|
|
\
|
.
|
|
|
Made with :coffee: in Hannover, Germany.
|
Made with :coffee: in Hannover, Germany.
|