Line 17... |
Line 17... |
The software framework of the processor comes with application makefiles, software libraries for all CPU
|
The software framework of the processor comes with application makefiles, software libraries for all CPU
|
and processor features, a bootloader, a runtime environment and several example programs - including a port
|
and processor features, a bootloader, a runtime environment and several example programs - including a port
|
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as
|
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as
|
default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]).
|
default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]).
|
|
|
[TIP]
|
|
Check out the processor's **https://stnolting.github.io/neorv32/ug[online User Guide]**
|
Check out the processor's **https://stnolting.github.io/neorv32/ug[online User Guide]**
|
that provides hands-on tutorials to get you started.
|
that provides hands-on tutorials to get you started.
|
The project's change log is available in https://github.com/stnolting/neorv32/blob/main/CHANGELOG.md[CHANGELOG.md]
|
|
in the root directory of the NEORV32 repository. Please also check out the <<_legal>> section.
|
|
|
|
|
|
**Structure**
|
**Structure**
|
|
|
[start=2]
|
[start=2]
|
. <<_neorv32_processor_soc>>
|
. <<_neorv32_processor_soc>>
|
. <<_neorv32_central_processing_unit_cpu>>
|
. <<_neorv32_central_processing_unit_cpu>>
|
. <<_software_framework>>
|
. <<_software_framework>>
|
. <<_on_chip_debugger_ocd>>
|
. <<_on_chip_debugger_ocd>>
|
|
. <<_legal>>
|
|
|
|
|
|
**Annotations**
|
|
|
|
[WARNING]
|
|
Warning
|
|
|
|
[IMPORTANT]
|
|
Important
|
|
|
|
[NOTE]
|
|
Note
|
|
|
|
[TIP]
|
|
Tip
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
|
|
Line 211... |
Line 223... |
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
=== FPGA Implementation Results
|
=== FPGA Implementation Results
|
|
|
This chapter shows _exemplary_ implementation results of the NEORV32 CPU and NEORV32 Processor.
|
This section shows _exemplary_ FPGA implementation results for the NEORV32 CPU and NEORV32 Processor modules.
|
|
Note that certain configuration options might also have an impact on other configuration options. Furthermore,
|
|
this report cannot cover all possible option combinations. Hence, the presented implementation results are
|
|
just _exemplary_. Unless stated otherwise, all implementations use the default generic configuration.
|
|
|
:sectnums:
|
:sectnums:
|
==== CPU
|
==== CPU
|
|
|
[cols="<2,<8"]
|
[cols="<2,<8"]
|
[grid="topbot"]
|
[grid="topbot"]
|
|=======================
|
|=======================
|
|
| HW version: | `1.6.8.3`
|
| Top entity: | `rtl/core/neorv32_cpu.vhd`
|
| Top entity: | `rtl/core/neorv32_cpu.vhd`
|
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
|
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
|
| Toolchain: | Quartus Prime 20.1.0
|
| Toolchain: | Quartus Prime Lite 21.1
|
|
| Constraints: | **no timing constraints**, "_balanced optimization_", f~max~ from "_Slow 1200mV 0C Model_"
|
|=======================
|
|=======================
|
|
|
[cols="<5,>1,>1,>1,>1,>1"]
|
[cols="<6,>1,>1,>1,>1,>1"]
|
[options="header",grid="rows"]
|
[options="header",grid="rows"]
|
|=======================
|
|=======================
|
| CPU | LEs | FFs | MEM bits | DSPs | _f~max~_
|
| CPU ISA Configuration | LEs | FFs | MEM bits | DSPs | _f~max~_
|
| `rv32i` | 806 | 359 | 1024 | 0 | 125 MHz
|
| `rv32e` | 900 | 388 | 512 | 0 | 121 MHz
|
| `rv32i_Zicsr_Zicntr` | 1729 | 813 | 1024 | 0 | 124 MHz
|
| `rv32i` | 904 | 388 | 1024 | 0 | 121 MHz
|
| `rv32im_Zicsr_Zicntr` | 2269 | 1055 | 1024 | 0 | 124 MHz
|
| `rv32i_Zicsr` | 1425 | 673 | 1024 | 0 | 118 MHz
|
| `rv32imc_Zicsr_Zicntr` | 2501 | 1070 | 1024 | 0 | 124 MHz
|
| `rv32i_Zicsr_Zicntr` | 1778 | 803 | 1024 | 0 | 118 MHz
|
| `rv32imac_Zicsr_Zicntr` | 2511 | 1074 | 1024 | 0 | 124 MHz
|
| `rv32im_Zicsr_Zicntr` | 2244 | 978 | 1024 | 0 | 118 MHz
|
| `rv32imacu_Zicsr_Zicntr` | 2521 | 1079 | 1024 | 0 | 124 MHz
|
| `rv32ima_Zicsr_Zicntr` | 2267 | 982 | 1024 | 0 | 118 MHz
|
| `rv32imacu_Zicsr_Zicntr_Zifencei` | 2522 | 1079 | 1024 | 0 | 122 MHz
|
| `rv32imac_Zicsr_Zicntr` | 2453 | 994 | 1024 | 0 | 118 MHz
|
| `rv32imacu_Zicsr_Zicntr_Zifencei_Zfinx` | 3807 | 1731 | 1024 | 7 | 116 MHz
|
| `rv32imacb_Zicsr_Zicntr` | 3270 | 1249 | 1024 | 0 | 118 MHz
|
| `rv32imacu_Zicsr_Zicntr_Zifencei_Zfinx_DebugMode` | 3974 | 1815 | 1024 | 7 | 116 MHz
|
| `rv32imacbu_Zicsr_Zicntr` | 3286 | 1254 | 1024 | 0 | 118 MHz
|
|
| `rv32imacbu_Zicsr_Zicntr_Zifencei` | 3278 | 1254 | 1024 | 0 | 118 MHz
|
|
| `rv32imacbu_Zicsr_Zicntr_Zifencei_Zfinx` | 4536 | 1906 | 1024 | 7 | 115 MHz
|
|
| `rv32imacbu_Zicsr_Zicntr_Zifencei_Zfinx_DebugMode` | 5989 | 2416 | 1024 | 7 | 110 MHz
|
|=======================
|
|=======================
|
|
|
|
.**RISC-V Compliance**
|
|
[NOTE]
|
|
The `Zicsr` ISA extension implements the privileged machine architecture
|
|
(see <<_zicsr_control_and_status_register_access_privileged_architecture>>). The `Zicntr` ISA
|
|
extension implements the basic counters and timers (see <<_zicntr_cpu_base_counters>>). Both
|
|
extensions are _mandatory_ in order to comply with the RISC-V architecture specifications.
|
|
|
|
[NOTE]
|
|
The table above does not show _all_ CPU ISA extensions. More sophisticated and application-specific
|
|
options like PMP and HPM are not included in this overview.
|
|
|
|
.Goal-Driven Optimization
|
[TIP]
|
[TIP]
|
The CPU provides further options to reduce the area footprint (for example by constraining the CPU-internal
|
The CPU provides further options to reduce the area footprint (for example by constraining the CPU-internal
|
counter sizes) or to increase performance (for example by using a barrel-shifter, at the cost of extra hardware).
|
counter sizes) or to increase performance (for example by using a barrel-shifter, at the cost of extra hardware).
|
See section <<_processor_top_entity_generics>> for more information. Also, take a look at the User Guide section
|
See section <<_processor_top_entity_generics>> for more information. Also, take a look at the User Guide section
|
https://stnolting.github.io/neorv32/ug/#_application_specific_processor_configuration[Application-Specific Processor Configuration].
|
https://stnolting.github.io/neorv32/ug/#_application_specific_processor_configuration[Application-Specific Processor Configuration].
|
|
|
|
|
:sectnums:
|
:sectnums:
|
==== Processor Modules
|
==== Processor - Modules
|
|
|
[cols="<2,<8"]
|
[cols="<2,<8"]
|
[grid="topbot"]
|
[grid="topbot"]
|
|=======================
|
|=======================
|
|
| HW version: | `1.6.8.3`
|
| Top entity: | `rtl/core/neorv32_top.vhd`
|
| Top entity: | `rtl/core/neorv32_top.vhd`
|
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
|
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
|
| Toolchain: | Quartus Prime 20.1.0
|
| Toolchain: | Quartus Prime Lite 21.1
|
|
| Constraints: | **no timing constraints**, "_balanced optimization_"
|
|=======================
|
|=======================
|
|
|
.Hardware utilization by the processor modules (mandatory core modules in **bold**)
|
.Hardware utilization by processor module (mandatory modules highlighted in **bold**)
|
[cols="<2,<8,>1,>1,>2,>1"]
|
[cols="<2,<8,>1,>1,>2,>1"]
|
[options="header",grid="rows"]
|
[options="header",grid="rows"]
|
|=======================
|
|=======================
|
| Module | Description | LEs | FFs | MEM bits | DSPs
|
| Module | Description | LEs | FFs | MEM bits | DSPs
|
| Boot ROM | Bootloader ROM (4kB) | 2 | 1 | 32768 | 0
|
| Boot ROM | Bootloader ROM (4kB) | 3 | 2 | 32768 | 0
|
| **BUSKEEPER** | Processor-internal bus monitor | 9 | 6 | 0 | 0
|
| **BUSKEEPER** | Processor-internal bus monitor | 28 | 15 | 0 | 0
|
| **BUSSWITCH** | Bus multiplexer for CPU instr. and data interface | 63 | 8 | 0 | 0
|
| **BUSSWITCH** | Bus multiplexer for CPU instr. and data interface | 69 | 8 | 0 | 0
|
| CFS | Custom functions subsystemfootnote:[Resource utilization depends on actually implemented custom functionality.] | - | - | - | -
|
| CFS | Custom functions subsystemfootnote:[Resource utilization depends on custom design logic.] | - | - | - | -
|
| DMEM | Processor-internal data memory (8kB) | 19 | 2 | 65536 | 0
|
| DM | On-chip debugger - debug module | 473 | 240 | 0 | 0
|
| DM | On-chip debugger - debug module | 493 | 240 | 0 | 0
|
| DTM | On-chip debugger - debug transfer module (JTAG) | 259 | 221 | 0 | 0
|
| DTM | On-chip debugger - debug transfer module (JTAG) | 254 | 218 | 0 | 0
|
| DMEM | Processor-internal data memory (8kB) | 18 | 2 | 65536 | 0
|
| GPIO | General purpose input/output ports | 134 | 161 | 0 | 0
|
| GPIO | General purpose input/output ports | 102 | 98 | 0 | 0
|
| iCACHE | Instruction cache (1x4 blocks, 256 bytes per block) | 221 | 156 | 8192 | 0
|
| GPTMR | General Purpose Timer | 153 | 105 | 0 | 0
|
| IMEM | Processor-internal instruction memory (16kB) | 13 | 2 | 131072 | 0
|
| iCACHE | Instruction cache (2x4 blocks, 64 bytes per block) | 417 | 297 | 4096 | 0
|
| MTIME | Machine system timer | 319 | 167 | 0 | 0
|
| IMEM | Processor-internal instruction memory (16kB) | 12 | 2 | 131072 | 0
|
| NEOLED | Smart LED Interface (NeoPixel/WS2812) [FIFO_depth=1] | 226 | 182 | 0 | 0
|
| MTIME | Machine system timer | 345 | 166 | 0 | 0
|
| SLINK | Stream link interface (2xRX, 2xTX, FIFO_depth=1) | 208 | 181 | 0 | 0
|
| NEOLED | Smart LED Interface (NeoPixel/WS2812) (FIFO_depth=1) | 227 | 184 | 0 | 0
|
| PWM | Pulse-width modulation controller (4 channels) | 71 | 69 | 0 | 0
|
| PWM | Pulse-width modulation controller (8 channels) | 128 | qq7 | 0 | 0
|
| SPI | Serial peripheral interface | 148 | 127 | 0 | 0
|
| SLINK | Stream link interface (2xRX, 2xTX, FIFO_depth=1) | 136 | 116 | 0 | 0
|
| **SYSINFO** | System configuration information memory | 14 | 11 | 0 | 0
|
| SPI | Serial peripheral interface | 114 | 94 | 0 | 0
|
| TRNG | True random number generator | 89 | 76 | 0 | 0
|
| **SYSINFO** | System configuration information memory | 13 | 11 | 0 | 0
|
|
| TRNG | True random number generator | 89 | 79 | 0 | 0
|
| TWI | Two-wire interface | 77 | 43 | 0 | 0
|
| TWI | Two-wire interface | 77 | 43 | 0 | 0
|
| UART0/1 | Universal asynchronous receiver/transmitter 0/1 | 183 | 132 | 0 | 0
|
| UART0, UART1 | Universal asynchronous receiver/transmitter 0/1 (FIFO_depth=1) | 195 | 143 | 0 | 0
|
| WDT | Watchdog timer | 53 | 43 | 0 | 0
|
| WDT | Watchdog timer | 61 | 46 | 0 | 0
|
| WISHBONE | External memory interface | 114 | 110 | 0 | 0
|
| WISHBONE | External memory interface | 120 | 112 | 0 | 0
|
| XIRQ | External interrupt controller (32 channels) | 241 | 201 | 0 | 0
|
| XIP | Execute in place module | 318 | 244 | 0 | 0
|
| GPTMR | General Purpose Timer | 153 | 107 | 0 | 0
|
| XIRQ | External interrupt controller (32 channels) | 245 | 200 | 0 | 0
|
| XIP | Execute in place module | 305 | 243 | 0 | 0
|
|
|=======================
|
|=======================
|
|
|
|
[NOTE]
|
|
Note that not all IOs were actually connected to FPGA pins (for example some GPIO inputs and outputs)
|
|
when generating these reports.
|
|
|
|
|
|
|
:sectnums:
|
:sectnums:
|
==== Exemplary Setups
|
==== Exemplary Setups
|
|
|
Line 335... |
Line 373... |
| _small_ (`rv32i_Zicsr`) | 33.89 | **0.3389** | **4.04**
|
| _small_ (`rv32i_Zicsr`) | 33.89 | **0.3389** | **4.04**
|
| _medium_ (`rv32imc_Zicsr`) | 62.50 | **0.6250** | **5.34**
|
| _medium_ (`rv32imc_Zicsr`) | 62.50 | **0.6250** | **5.34**
|
| _performance_ (`rv32imc_Zicsr` + perf. options) | 95.23 | **0.9523** | **3.54**
|
| _performance_ (`rv32imc_Zicsr` + perf. options) | 95.23 | **0.9523** | **3.54**
|
|=======================
|
|=======================
|
|
|
[IMPORTANT]
|
[NOTE]
|
The CoreMark results were generated using a `rv32i` toolchain. This toolchain supports standard extensions
|
The CoreMark results were generated using a `rv32i` toolchain. This toolchain supports standard extensions
|
like `M` and `C` but the built-in libraries only use the base `I` ISA.
|
like `M` and `C` but the built-in libraries only use the base `I` ISA.
|
|
|
[NOTE]
|
[NOTE]
|
The "_performance_" CPU configuration uses the <<_fast_mul_en>> and <<_fast_shift_en>> options.
|
The "_performance_" CPU configuration uses the <<_fast_mul_en>> and <<_fast_shift_en>> options.
|
|
|
[NOTE]
|
|
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
|
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
|
several consecutive micro operations.
|
several consecutive micro operations.
|
|
|
[NOTE]
|
|
The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on
|
The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on
|
the available CPU extensions. The average CPI is computed by dividing the total number of required clock cycles
|
the available CPU extensions. The average CPI is computed by dividing the total number of required clock cycles
|
(only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions
|
(only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions
|
(`[m]instret[h]` CSRs).
|
(`[m]instret[h]` CSRs). More information regarding the execution time of each implemented instruction can be found in
|
|
|
[TIP]
|
|
More information regarding the execution time of each implemented instruction can be found in
|
|
chapter <<_instruction_timing>>.
|
chapter <<_instruction_timing>>.
|
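
The CPI computation described above can be sketched in C as follows. This is a minimal illustration, not code taken from the NEORV32 software framework: the function names `counter64` and `average_cpi` are hypothetical, and it is assumed that the caller has already sampled the 64-bit cycle counter (for example `[m]cycle[h]`) and the retired-instruction counter (`[m]instret[h]`) before and after the timed core.

```c
#include <stdint.h>

/* Hypothetical helper: combine the low and high 32-bit halves of a
 * 64-bit counter (e.g. [m]instret and [m]instreth) into one value. */
uint64_t counter64(uint32_t lo, uint32_t hi) {
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Hypothetical helper: average CPI over a timed region.
 * cycles_*  : cycle counter sampled at start/end of the timed core
 * instret_* : retired-instruction counter sampled at the same points
 * Returns 0.0 if no instruction retired, to avoid a division by zero. */
double average_cpi(uint64_t cycles_start, uint64_t cycles_end,
                   uint64_t instret_start, uint64_t instret_end) {
    uint64_t cycles = cycles_end - cycles_start;   /* elapsed clock cycles   */
    uint64_t instrs = instret_end - instret_start; /* executed instructions  */
    if (instrs == 0) {
        return 0.0;
    }
    return (double)cycles / (double)instrs;
}
```

Sampling the counters only around the timed core, as the text suggests, keeps IO wait cycles out of the result. Using start/end differences rather than absolute counter values means the counters do not have to be reset before the measurement.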