Line 113... |
Line 113... |
* small hardware footprint and high operating frequency for easy integration
|
* small hardware footprint and high operating frequency for easy integration
|
* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU
|
* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU
|
** RISC-V compatibility: passes the official architecture tests
|
** RISC-V compatibility: passes the official architecture tests
|
** base architecture + privileged architecture (optional) + ISA extensions (optional)
|
** base architecture + privileged architecture (optional) + ISA extensions (optional)
|
** rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
|
** rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
|
|
** aims to support <<_full_virtualization>> capabilities (CPU _and_ SoC) to increase execution safety
|
** official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
|
** official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
|
* **NEORV32 Processor (SoC)**: highly-configurable full-scale microcontroller-like processor system
|
* **NEORV32 Processor (SoC)**: highly-configurable full-scale microcontroller-like processor system
|
** based on the NEORV32 CPU
|
** based on the NEORV32 CPU
|
** optional serial interfaces (UARTs, TWI, SPI)
|
** optional serial interfaces (UARTs, TWI, SPI)
|
** optional timers and counters (WDT, MTIME)
|
** optional timers and counters (WDT, MTIME)
|
Line 142... |
Line 143... |
:sectnums:
|
:sectnums:
|
=== Project Folder Structure
|
=== Project Folder Structure
|
|
|
...................................
|
...................................
|
neorv32 - Project home folder
|
neorv32 - Project home folder
|
├.ci - Scripts for continuous integration
|
│
|
├setups - Example setups for various FPGA boards and toolchains
|
|
│└...
|
|
├CHANGELOG.md - Project change log
|
|
├docs - Project documentation
|
├docs - Project documentation
|
|
│├datasheet - .adoc sources for NEORV32 data sheet
|
│├doxygen_build - Software framework documentation (generated by doxygen)
|
│├doxygen_build - Software framework documentation (generated by doxygen)
|
│├src_adoc - AsciiDoc sources for this document
|
│├figures - Figures and logos
|
|
│├icons - Misc. symbols
|
│├references - Data sheets and RISC-V specs.
|
│├references - Data sheets and RISC-V specs.
|
│└figures - Figures and logos
|
│└src_adoc - AsciiDoc sources for this document
|
├riscv-arch-test - Port files for the official RISC-V architecture tests
|
│
|
├rtl - VHDL sources
|
├rtl - VHDL sources
|
│├core - Sources of the CPU & SoC
|
│├core - Core sources of the CPU & SoC
|
│└templates - Alternate/additional top entities/wrappers
|
│└templates - Alternate/additional top entities & wrappers
|
│ ├processor - Processor wrappers
|
│ ├processor - Processor SoC wrappers
|
│ └system - System wrappers for advanced connectivity
|
│ └system - System wrappers for advanced connectivity
|
├sim - Simulation files
|
│
|
│└rtl_modules - Processor modules for simulation-only
|
├setups - Example setups for various FPGAs, boards and toolchains
|
|
│└...
|
|
│
|
|
├sim - Simulation files (see User Guide)
|
|
│
|
└sw - Software framework
|
└sw - Software framework
|
├bootloader - Sources and scripts for the NEORV32 internal bootloader
|
├bootloader - Sources and scripts for the NEORV32 internal bootloader
|
├common - Linker script and crt0.S start-up code
|
├common - Linker script and crt0.S start-up code
|
├example - Various example programs
|
├example - Various example programs
|
│└...
|
│└...
|
|
├isa-test
|
|
│├riscv-arch-test - RISC-V spec. compatibility test framework (submodule)
|
|
│└port-neorv32 - Port files for the official RISC-V architecture tests
|
├ocd_firmware - source code for on-chip debugger's "park loop"
|
├ocd_firmware - source code for on-chip debugger's "park loop"
|
├openocd - OpenOCD on-chip debugger configuration files
|
├openocd - OpenOCD on-chip debugger configuration files
|
├image_gen - Helper program to generate NEORV32 executables
|
├image_gen - Helper program to generate NEORV32 executables
|
└lib - Processor core library
|
└lib - Processor core library
|
├include - Header files (*.h)
|
├include - Header files (*.h)
|
└source - Source files (*.c)
|
└source - Source files (*.c)
|
...................................
|
...................................
|
|
|
[NOTE]
|
|
There are further files and folders starting with a dot which – for example – contain
|
|
data/configurations only relevant for git or for the continuous integration framework (`.ci`).
|
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
Line 209... |
Line 213... |
│
|
│
|
├neorv32_boot_rom.vhd - Bootloader ROM
|
├neorv32_boot_rom.vhd - Bootloader ROM
|
│└neorv32_bootloader_image.vhd - Bootloader boot ROM memory image
|
│└neorv32_bootloader_image.vhd - Bootloader boot ROM memory image
|
├neorv32_busswitch.vhd - Processor bus switch for CPU buses (I&D)
|
├neorv32_busswitch.vhd - Processor bus switch for CPU buses (I&D)
|
├neorv32_bus_keeper.vhd - Processor-internal bus monitor
|
├neorv32_bus_keeper.vhd - Processor-internal bus monitor
|
├neorv32_icache.vhd - Processor-internal instruction cache
|
|
├neorv32_cfs.vhd - Custom functions subsystem
|
├neorv32_cfs.vhd - Custom functions subsystem
|
├neorv32_debug_dm.vhd - on-chip debugger: debug module
|
├neorv32_debug_dm.vhd - on-chip debugger: debug module
|
├neorv32_debug_dtm.vhd - on-chip debugger: debug transfer module
|
├neorv32_debug_dtm.vhd - on-chip debugger: debug transfer module
|
├neorv32_dmem.vhd - Processor-internal data memory
|
├neorv32_dmem.vhd - Processor-internal data memory
|
├neorv32_gpio.vhd - General purpose input/output port unit
|
├neorv32_gpio.vhd - General purpose input/output port unit
|
|
├neorv32_icache.vhd - Processor-internal instruction cache
|
├neorv32_imem.vhd - Processor-internal instruction memory
|
├neorv32_imem.vhd - Processor-internal instruction memory
|
│└neorv32_application_image.vhd - IMEM application initialization image
|
│└neorv32_application_image.vhd - IMEM application initialization image
|
├neorv32_mtime.vhd - Machine system timer
|
├neorv32_mtime.vhd - Machine system timer
|
├neorv32_neoled.vhd - NeoPixel (TM) compatible smart LED interface
|
├neorv32_neoled.vhd - NeoPixel (TM) compatible smart LED interface
|
├neorv32_pwm.vhd - Pulse-width modulation controller
|
├neorv32_pwm.vhd - Pulse-width modulation controller
|
Line 236... |
Line 240... |
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
=== FPGA Implementation Results
|
=== FPGA Implementation Results
|
|
|
This chapter shows exemplary implementation results of the NEORV32 CPU and Processor. Please note that
|
This chapter shows _exemplary_ implementation results of the NEORV32 CPU and NEORV32 Processor.
|
the provided results are just a relative measure as logic functions of different modules might be merged
|
|
between entity boundaries, so the actual utilization results might vary a bit.
|
|
|
|
:sectnums:
|
:sectnums:
|
==== CPU
|
==== CPU
|
|
|
[cols="<2,<8"]
|
[cols="<2,<8"]
|
[grid="topbot"]
|
[grid="topbot"]
|
|=======================
|
|=======================
|
| Hardware version: | `1.5.5.5`
|
| Hardware version: | `1.5.7.10`
|
| Top entity: | `rtl/core/neorv32_cpu.vhd`
|
| Top entity: | `rtl/core/neorv32_cpu.vhd`
|
|=======================
|
|=======================
|
|
|
[cols="<5,>1,>1,>1,>1,>1"]
|
[cols="<5,>1,>1,>1,>1,>1"]
|
[options="header",grid="rows"]
|
[options="header",grid="rows"]
|
|=======================
|
|=======================
|
| CPU | LEs | FFs | MEM bits | DSPs | _f~max~_
|
| CPU | LEs | FFs | MEM bits | DSPs | _f~max~_
|
| `rv32i` | 980 | 409 | 1024 | 0 | 125 MHz
|
| `rv32i` | 806 | 359 | 1024 | 0 | 125 MHz
|
| `rv32i_Zicsr` | 1835 | 856 | 1024 | 0 | 125 MHz
|
| `rv32i_Zicsr` | 1729 | 813 | 1024 | 0 | 124 MHz
|
| `rv32im_Zicsr` | 2443 | 1134 | 1024 | 0 | 125 MHz
|
| `rv32im_Zicsr` | 2269 | 1055 | 1024 | 0 | 124 MHz
|
| `rv32imc_Zicsr` | 2669 | 1149 | 1024 | 0 | 125 MHz
|
| `rv32imc_Zicsr` | 2501 | 1070 | 1024 | 0 | 124 MHz
|
| `rv32imac_Zicsr` | 2685 | 1156 | 1024 | 0 | 125 MHz
|
| `rv32imac_Zicsr` | 2511 | 1074 | 1024 | 0 | 124 MHz
|
| `rv32imac_Zicsr` + `debug_mode` | 3058 | 1225 | 1024 | 0 | 125 MHz
|
| `rv32imacu_Zicsr` | 2521 | 1079 | 1024 | 0 | 124 MHz
|
| `rv32imac_Zicsr` + `u` | 2698 | 1162 | 1024 | 0 | 125 MHz
|
| `rv32imacu_Zicsr_Zifencei` | 2522 | 1079 | 1024 | 0 | 122 MHz
|
| `rv32imac_Zicsr_Zifencei` + `u` | 2715 | 1162 | 1024 | 0 | 125 MHz
|
| `rv32imacu_Zicsr_Zifencei_Zfinx` | 3807 | 1731 | 1024 | 7 | 116 MHz
|
| `rv32imac_Zicsr_Zifencei_Zfinx` + `u` | 4004 | 1812 | 1024 | 7 | 118 MHz
|
| `rv32imacu_Zicsr_Zifencei_Zfinx_DebugMode` | 3974 | 1815 | 1024 | 7 | 116 MHz
|
|=======================
|
|=======================
|
|
|
|
[NOTE]
|
|
No HPM counters and no PMP regions were implemented for generating these results.
|
|
|
|
[TIP]
|
|
The CPU provides further options to reduce the area footprint (for example by constraining the CPU-internal
|
|
counter sizes) or to increase performance (for example by using a barrel-shifter; at the cost of extra hardware).
|
|
See section <<_processor_top_entity_generics>> for more information.
|
|
|
|
|
:sectnums:
|
:sectnums:
|
==== Processor Modules
|
==== Processor Modules
|
|
|
[cols="<2,<8"]
|
[cols="<2,<8"]
|
[grid="topbot"]
|
[grid="topbot"]
|
|=======================
|
|=======================
|
| Hardware version: | `1.5.7.8`
|
| Hardware version: | `1.5.7.15`
|
| Top entity: | `rtl/core/neorv32_top.vhd`
|
| Top entity: | `rtl/core/neorv32_top.vhd`
|
|=======================
|
|=======================
|
|
|
.Hardware utilization by the processor modules (mandatory core modules in **bold**)
|
.Hardware utilization by the processor modules (mandatory core modules in **bold**)
|
[cols="<2,<8,>1,>1,>2,>1"]
|
[cols="<2,<8,>1,>1,>2,>1"]
|
Line 292... |
Line 302... |
| DTM | On-chip debugger - debug transfer module (JTAG) | 254 | 218 | 0 | 0
|
| DTM | On-chip debugger - debug transfer module (JTAG) | 254 | 218 | 0 | 0
|
| GPIO | General purpose input/output ports | 134 | 161 | 0 | 0
|
| GPIO | General purpose input/output ports | 134 | 161 | 0 | 0
|
| iCACHE | Instruction cache (1x4 blocks, 256 bytes per block) | 221 | 156 | 8192 | 0
|
| iCACHE | Instruction cache (1x4 blocks, 256 bytes per block) | 221 | 156 | 8192 | 0
|
| IMEM | Processor-internal instruction memory (16kB) | 13 | 2 | 131072 | 0
|
| IMEM | Processor-internal instruction memory (16kB) | 13 | 2 | 131072 | 0
|
| MTIME | Machine system timer | 319 | 167 | 0 | 0
|
| MTIME | Machine system timer | 319 | 167 | 0 | 0
|
| NEOLED | Smart LED Interface (NeoPixel/WS2812) [4xFIFO] | 342 | 307 | 0 | 0
|
| NEOLED | Smart LED Interface (NeoPixel/WS2812) [FIFO_depth=1] | 226 | 182 | 0 | 0
|
| SLINK | Stream link interface (4 links, FIFO_depth=1) | 345 | 313 | 0 | 0
|
| SLINK | Stream link interface (2xRX, 2xTX, FIFO_depth=1) | 208 | 181 | 0 | 0
|
| PWM | Pulse-width modulation controller (4 channels) | 71 | 69 | 0 | 0
|
| PWM | Pulse-width modulation controller (4 channels) | 71 | 69 | 0 | 0
|
| SPI | Serial peripheral interface | 148 | 127 | 0 | 0
|
| SPI | Serial peripheral interface | 148 | 127 | 0 | 0
|
| **SYSINFO** | System configuration information memory | 14 | 11 | 0 | 0
|
| **SYSINFO** | System configuration information memory | 14 | 11 | 0 | 0
|
| TRNG | True random number generator | 89 | 76 | 0 | 0
|
| TRNG | True random number generator | 89 | 76 | 0 | 0
|
| TWI | Two-wire interface | 77 | 43 | 0 | 0
|
| TWI | Two-wire interface | 77 | 43 | 0 | 0
|
Line 310... |
Line 320... |
|
|
|
|
:sectnums:
|
:sectnums:
|
==== Exemplary Setups
|
==== Exemplary Setups
|
|
|
[TIP]
|
|
Check out the `setups` folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/setups),
|
Check out the `setups` folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/setups),
|
which provides several demo setups for various FPGA boards and toolchains.
|
which provides several demo setups for various FPGA boards and toolchains.
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
=== CPU Performance
|
=== CPU Performance
|
|
|
:sectnums:
|
The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark].
|
==== CoreMark Benchmark
|
This benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
|
|
system. The corresponding sources can be found in the `sw/example/coremark` folder.
|
.Configuration
|
|
[cols="<2,<8"]
|
|
[grid="topbot"]
|
|
|=======================
|
|
| Hardware: | 32kB IMEM, 16kB DMEM, no caches, 100MHz clock
|
|
| CoreMark: | 2000 iterations, MEM_METHOD is MEM_STACK
|
|
| Compiler: | RISCV32-GCC 10.1.0
|
|
| Peripherals: | UART for printing the results
|
|
| Compiler flags: | default, see makefile
|
|
|=======================
|
|
|
|
The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark]. This
|
|
benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
|
|
system. The corresponding source code and the SW project can be found in the `sw/example/coremark` folder.
|
|
|
|
The resulting CoreMark score is defined as CoreMark iterations per second.
|
The resulting CoreMark score is defined as CoreMark iterations per second.
|
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
|
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
|
defined as CoreMark score divided by the CPU's clock frequency in MHz.
|
defined as CoreMark score divided by the CPU's clock frequency in MHz.
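
The two definitions above can be written out as a short calculation. The following Python sketch is illustrative only and is not part of the NEORV32 software framework: the function names and the cycle count in the usage example are hypothetical, and it assumes the elapsed clock cycles were read from the `[m]cycle[h]` CSRs.

```python
def coremark_score(iterations: int, cycles: int, clock_hz: float) -> float:
    """CoreMark score: completed iterations per second of execution time."""
    seconds = cycles / clock_hz  # execution time derived from the cycle counter
    return iterations / seconds


def coremarks_per_mhz(score: float, clock_hz: float) -> float:
    """Relative CoreMark score: score divided by the clock frequency in MHz."""
    return score / (clock_hz / 1e6)


if __name__ == "__main__":
    # Hypothetical run: 2000 iterations in 5e9 cycles at a 100 MHz clock.
    score = coremark_score(iterations=2000, cycles=5_000_000_000, clock_hz=100e6)
    print(f"CoreMark score: {score:.2f}")
    print(f"CoreMarks/MHz:  {coremarks_per_mhz(score, 100e6):.4f}")
```

Because the relative score is normalized to the clock frequency, it allows comparing configurations that were measured at different clock speeds.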
|
|
|
|
.Configuration
|
[cols="<2,<8"]
|
[cols="<2,<8"]
|
[grid="topbot"]
|
[grid="topbot"]
|
|=======================
|
|=======================
|
| Hardware version: | `1.4.9.8`
|
| HW version: | `1.5.7.10`
|
|
| Hardware: | 32kB int. IMEM, 16kB int. DMEM, no caches, 100MHz clock
|
|
| CoreMark: | 2000 iterations, MEM_METHOD is MEM_STACK
|
|
| Compiler: | RISCV32-GCC 10.2.0
|
|
| Compiler flags: | default, see makefile
|
|=======================
|
|=======================
|
|
|
.CoreMark results
|
.CoreMark results
|
[cols="<4,>1,>1,>1"]
|
[cols="<4,^1,^1,^1"]
|
[options="header",grid="rows"]
|
[options="header",grid="rows"]
|
|=======================
|
|=======================
|
| CPU (incl. `Zicsr`) | Executable size | CoreMark Score | CoreMarks/MHz
|
| CPU | CoreMark Score | CoreMarks/MHz | Average CPI
|
| `rv32i` | 28756 bytes | 36.36 | **0.3636**
|
| _small_ (`rv32i_Zicsr`) | 33.89 | **0.3389** | **4.04**
|
| `rv32im` | 27516 bytes | 68.97 | **0.6897**
|
| _medium_ (`rv32imc_Zicsr`) | 62.50 | **0.6250** | **5.34**
|
| `rv32imc` | 22008 bytes | 68.97 | **0.6897**
|
| _performance_ (`rv32imc_Zicsr` + perf. options) | 95.23 | **0.9523** | **3.54**
|
| `rv32imc` + _FAST_MUL_EN_ | 22008 bytes | 86.96 | **0.8696**
|
|
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ | 22008 bytes | 90.91 | **0.9091**
|
|
|=======================
|
|=======================
|
|
|
[NOTE]
|
[NOTE]
|
All executables were generated using maximum optimization `-O3`.
|
The "_performance_" CPU configuration uses the <<_fast_mul_en>> and <<_fast_shift_en>> options.
|
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the _M_ extension (enabled via the
|
|
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
|
|
operations (enabled via the _FAST_SHIFT_EN_ generic).
|
|
|
|
|
|
|
|
:sectnums:
|
|
==== Instruction Timing
|
|
|
|
|
[NOTE]
|
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
|
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
|
several consecutive micro operations. Hence, each instruction requires several clock cycles to execute.
|
several consecutive micro operations.
|
|
|
|
[NOTE]
|
The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on
|
The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on
|
the available CPU extensions. The following table shows the performance results for successfully (!) running
|
the available CPU extensions. The average CPI is computed by dividing the total number of required clock cycles
|
2000 CoreMark iterations.
|
(only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions
|
|
(`[m]instret[h]` CSRs).
|
The average CPI is computed by dividing the total number of required clock cycles (only the timed core to
|
|
avoid distortion due to IO wait cycles) by the number of executed instructions (`[m]instret[h]` CSRs). The
|
|
executables were generated using optimization `-O3`.
|
|
|
|
[cols="<2,<8"]
|
|
[grid="topbot"]
|
|
|=======================
|
|
| Hardware version: | `1.4.9.8`
|
|
|=======================
|
|
|
|
.CoreMark instruction timing
|
|
[cols="<4,>2,>2,>2"]
|
|
[options="header",grid="rows"]
|
|
|=======================
|
|
| CPU (incl. `Zicsr`) | Required clock cycles | Executed instruction | Average CPI
|
|
| `rv32i` | 5595750503 | 1466028607 | **3.82**
|
|
| `rv32im` | 2966086503 | 598651143 | **4.95**
|
|
| `rv32imc` | 2981786734 | 611814918 | **4.87**
|
|
| `rv32imc` + _FAST_MUL_EN_ | 2399234734 | 611814918 | **3.92**
|
|
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ | 2265135174 | 611814948 | **3.70**
|
|
|=======================
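
The average CPI values in the "CoreMark instruction timing" table follow directly from the two counter readings in each row. A minimal Python sketch of that division is given here for illustration; the function name `average_cpi` and the row labels are hypothetical, while the cycle and instruction counts are copied from the table.

```python
def average_cpi(cycles: int, instructions: int) -> float:
    """Average cycles per instruction: total clock cycles / retired instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions


# (configuration, required clock cycles, executed instructions) from the table
ROWS = [
    ("rv32i", 5_595_750_503, 1_466_028_607),
    ("rv32im", 2_966_086_503, 598_651_143),
    ("rv32imc", 2_981_786_734, 611_814_918),
    ("rv32imc + FAST_MUL_EN", 2_399_234_734, 611_814_918),
    ("rv32imc + FAST_MUL_EN + FAST_SHIFT_EN", 2_265_135_174, 611_814_948),
]

if __name__ == "__main__":
    for name, cycles, instructions in ROWS:
        print(f"{name:40s} CPI = {average_cpi(cycles, instructions):.2f}")
```

Rounded to two decimal places, the quotients reproduce the CPI column of the table.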
|
|
|
|
[TIP]
|
|
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the M extension (enabled via the
|
|
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
|
|
operations (enabled via the _FAST_SHIFT_EN_ generic).
|
|
|
|
[TIP]
|
[TIP]
|
More information regarding the execution time of each implemented instruction can be found in
|
More information regarding the execution time of each implemented instruction can be found in
|
chapter <<_instruction_timing>>.
|
chapter <<_instruction_timing>>.
|
|
|