OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [docs/] [datasheet/] [overview.adoc] - Diff between revs 61 and 62


Rev 61 Rev 62
Line 113... Line 113...
* small hardware footprint and high operating frequency for easy integration
* small hardware footprint and high operating frequency for easy integration
* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU
* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU
** RISC-V compatibility: passes the official architecture tests
** RISC-V compatibility: passes the official architecture tests
** base architecture + privileged architecture (optional) + ISA extensions (optional)
** base architecture + privileged architecture (optional) + ISA extensions (optional)
** rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
** rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
 
** aims to support <<_full_virtualization>> capabilities (CPU _and_ SoC) to increase execution safety
** official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
** official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
* **NEORV32 Processor (SoC)**: highly-configurable full-scale microcontroller-like processor system
* **NEORV32 Processor (SoC)**: highly-configurable full-scale microcontroller-like processor system
** based on the NEORV32 CPU
** based on the NEORV32 CPU
** optional serial interfaces (UARTs, TWI, SPI)
** optional serial interfaces (UARTs, TWI, SPI)
** optional timers and counters (WDT, MTIME)
** optional timers and counters (WDT, MTIME)
Line 142... Line 143...
:sectnums:
:sectnums:
=== Project Folder Structure
=== Project Folder Structure
 
 
...................................
...................................
neorv32           - Project home folder
neorv32           - Project home folder
├.ci              - Scripts for continuous integration
├setups           - Example setups for various FPGA boards and toolchains
 
│└...
 
├CHANGELOG.md     - Project change log
 
├docs             - Project documentation
├docs             - Project documentation
 
│├datasheet        - .adoc sources for NEORV32 data sheet
│├doxygen_build   - Software framework documentation (generated by doxygen)
│├doxygen_build   - Software framework documentation (generated by doxygen)
│├src_adoc        - AsciiDoc sources for this document
│├figures          - Figures and logos
 
│├icons            - Misc. symbols
│├references      - Data sheets and RISC-V specs.
│├references      - Data sheets and RISC-V specs.
│└figures         - Figures and logos
│└src_adoc         - AsciiDoc sources for this document
├riscv-arch-test  - Port files for the official RISC-V architecture tests
├rtl              - VHDL sources
├rtl              - VHDL sources
│├core            - Sources of the CPU & SoC
│├core             - Core sources of the CPU & SoC
│└templates       - Alternate/additional top entities/wrappers
│└templates        - Alternate/additional top entities & wrappers
│ ├processor      - Processor wrappers
│ ├processor       - Processor SoC wrappers
│ └system         - System wrappers for advanced connectivity
│ └system         - System wrappers for advanced connectivity
├sim              - Simulation files
│└rtl_modules     - Processor modules for simulation-only
├setups            - Example setups for various FPGAs, boards and toolchains
 
│└...
 
 
├sim               - Simulation files (see User Guide)
 
└sw               - Software framework
└sw               - Software framework
 ├bootloader      - Sources and scripts for the NEORV32 internal bootloader
 ├bootloader      - Sources and scripts for the NEORV32 internal bootloader
 ├common          - Linker script and crt0.S start-up code
 ├common          - Linker script and crt0.S start-up code
 ├example         - Various example programs
 ├example         - Various example programs
 │└...
 │└...
 
 ├isa-test
 
 │├riscv-arch-test - RISC-V spec. compatibility test framework (submodule)
 
 │└port-neorv32    - Port files for the official RISC-V architecture tests
 ├ocd_firmware    - source code for on-chip debugger's "park loop"
 ├ocd_firmware    - source code for on-chip debugger's "park loop"
 ├openocd         - OpenOCD on-chip debugger configuration files
 ├openocd         - OpenOCD on-chip debugger configuration files
 ├image_gen       - Helper program to generate NEORV32 executables
 ├image_gen       - Helper program to generate NEORV32 executables
 └lib             - Processor core library
 └lib             - Processor core library
  ├include        - Header files (*.h)
  ├include        - Header files (*.h)
  └source         - Source files (*.c)
  └source         - Source files (*.c)
...................................
...................................
 
 
[NOTE]
 
There are further files and folders starting with a dot which – for example – contain
 
data/configurations only relevant for git or for the continuous integration framework (`.ci`).
 
 
 
 
 
 
 
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
Line 209... Line 213...
├neorv32_boot_rom.vhd            - Bootloader ROM
├neorv32_boot_rom.vhd            - Bootloader ROM
│└neorv32_bootloader_image.vhd   - Bootloader boot ROM memory image
│└neorv32_bootloader_image.vhd   - Bootloader boot ROM memory image
├neorv32_busswitch.vhd           - Processor bus switch for CPU buses (I&D)
├neorv32_busswitch.vhd           - Processor bus switch for CPU buses (I&D)
├neorv32_bus_keeper.vhd          - Processor-internal bus monitor
├neorv32_bus_keeper.vhd          - Processor-internal bus monitor
├neorv32_icache.vhd              - Processor-internal instruction cache
 
├neorv32_cfs.vhd                 - Custom functions subsystem
├neorv32_cfs.vhd                 - Custom functions subsystem
├neorv32_debug_dm.vhd            - on-chip debugger: debug module
├neorv32_debug_dm.vhd            - on-chip debugger: debug module
├neorv32_debug_dtm.vhd           - on-chip debugger: debug transfer module
├neorv32_debug_dtm.vhd           - on-chip debugger: debug transfer module
├neorv32_dmem.vhd                - Processor-internal data memory
├neorv32_dmem.vhd                - Processor-internal data memory
├neorv32_gpio.vhd                - General purpose input/output port unit
├neorv32_gpio.vhd                - General purpose input/output port unit
 
├neorv32_icache.vhd              - Processor-internal instruction cache
├neorv32_imem.vhd                - Processor-internal instruction memory
├neorv32_imem.vhd                - Processor-internal instruction memory
│└neor32_application_image.vhd   - IMEM application initialization image
│└neor32_application_image.vhd   - IMEM application initialization image
├neorv32_mtime.vhd               - Machine system timer
├neorv32_mtime.vhd               - Machine system timer
├neorv32_neoled.vhd              - NeoPixel (TM) compatible smart LED interface
├neorv32_neoled.vhd              - NeoPixel (TM) compatible smart LED interface
├neorv32_pwm.vhd                 - Pulse-width modulation controller
├neorv32_pwm.vhd                 - Pulse-width modulation controller
Line 236... Line 240...
 
 
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== FPGA Implementation Results
=== FPGA Implementation Results
 
 
This chapter shows exemplary implementation results of the NEORV32 CPU and Processor. Please note, that
This chapter shows _exemplary_ implementation results of the NEORV32 CPU and NEORV32 Processor.
the provided results are just a relative measure as logic functions of different modules might be merged
 
between entity boundaries, so the actual utilization results might vary a bit.
 
 
 
:sectnums:
:sectnums:
==== CPU
==== CPU
 
 
[cols="<2,<8"]
[cols="<2,<8"]
[grid="topbot"]
[grid="topbot"]
|=======================
|=======================
| Hardware version: | `1.5.5.5`
| Hardware version: | `1.5.7.10`
| Top entity:       | `rtl/core/neorv32_cpu.vhd`
| Top entity:       | `rtl/core/neorv32_cpu.vhd`
|=======================
|=======================
 
 
[cols="<5,>1,>1,>1,>1,>1"]
[cols="<5,>1,>1,>1,>1,>1"]
[options="header",grid="rows"]
[options="header",grid="rows"]
|=======================
|=======================
| CPU                                   | LEs  | FFs  | MEM bits | DSPs | _f~max~_
| CPU                                   | LEs  | FFs  | MEM bits | DSPs | _f~max~_
| `rv32i`                               |  980 |  409 | 1024     | 0    | 125 MHz
| `rv32i`                                    |  806 |  359 |     1024 |    0 | 125 MHz
| `rv32i_Zicsr`                         | 1835 |  856 | 1024     | 0    | 125 MHz
| `rv32i_Zicsr`                              | 1729 |  813 |     1024 |    0 | 124 MHz
| `rv32im_Zicsr`                        | 2443 | 1134 | 1024     | 0    | 125 MHz
| `rv32im_Zicsr`                             | 2269 | 1055 |     1024 |    0 | 124 MHz
| `rv32imc_Zicsr`                       | 2669 | 1149 | 1024     | 0    | 125 MHz
| `rv32imc_Zicsr`                            | 2501 | 1070 |     1024 |    0 | 124 MHz
| `rv32imac_Zicsr`                      | 2685 | 1156 | 1024     | 0    | 125 MHz
| `rv32imac_Zicsr`                           | 2511 | 1074 |     1024 |    0 | 124 MHz
| `rv32imac_Zicsr` + `debug_mode`       | 3058 | 1225 | 1024     | 0    | 125 MHz
| `rv32imacu_Zicsr`                          | 2521 | 1079 |     1024 |    0 | 124 MHz
| `rv32imac_Zicsr` + `u`                | 2698 | 1162 | 1024     | 0    | 125 MHz
| `rv32imacu_Zicsr_Zifencei`                 | 2522 | 1079 |     1024 |    0 | 122 MHz
| `rv32imac_Zicsr_Zifencei` + `u`       | 2715 | 1162 | 1024     | 0    | 125 MHz
| `rv32imacu_Zicsr_Zifencei_Zfinx`           | 3807 | 1731 |     1024 |    7 | 116 MHz
| `rv32imac_Zicsr_Zifencei_Zfinx` + `u` | 4004 | 1812 | 1024     | 7    | 118 MHz
| `rv32imacu_Zicsr_Zifencei_Zfinx_DebugMode` | 3974 | 1815 |     1024 |    7 | 116 MHz
|=======================
|=======================
 
 
 
[NOTE]
 
No HPM counters and no PMP regions were implemented for generating these results.
 
 
 
[TIP]
 
The CPU provides further options to reduce the area footprint (for example by constraining the CPU-internal
 
counter sizes) or to increase performance (for example by using a barrel-shifter; at cost of extra hardware).
 
See section <<_processor_top_entity_generics>> for more information.
 
 
 
 
:sectnums:
:sectnums:
==== Processor Modules
==== Processor Modules
 
 
[cols="<2,<8"]
[cols="<2,<8"]
[grid="topbot"]
[grid="topbot"]
|=======================
|=======================
| Hardware version: | `1.5.7.8`
| Hardware version: | `1.5.7.15`
| Top entity:       | `rtl/core/neorv32_top.vhd`
| Top entity:       | `rtl/core/neorv32_top.vhd`
|=======================
|=======================
 
 
.Hardware utilization by the processor modules (mandatory core modules in **bold**)
.Hardware utilization by the processor modules (mandatory core modules in **bold**)
[cols="<2,<8,>1,>1,>2,>1"]
[cols="<2,<8,>1,>1,>2,>1"]
Line 292... Line 302...
| DTM           | On-chip debugger - debug transfer module (JTAG)     | 254 | 218 |        0 |    0
| DTM           | On-chip debugger - debug transfer module (JTAG)     | 254 | 218 |        0 |    0
| GPIO          | General purpose input/output ports                  | 134 | 161 |        0 |    0
| GPIO          | General purpose input/output ports                  | 134 | 161 |        0 |    0
| iCACHE        | Instruction cache (1x4 blocks, 256 bytes per block) | 2 21| 156 |     8192 |    0
| iCACHE        | Instruction cache (1x4 blocks, 256 bytes per block) | 2 21| 156 |     8192 |    0
| IMEM          | Processor-internal instruction memory (16kB)        |  13 |   2 |   131072 |    0
| IMEM          | Processor-internal instruction memory (16kB)        |  13 |   2 |   131072 |    0
| MTIME         | Machine system timer                                | 319 | 167 |        0 |    0
| MTIME         | Machine system timer                                | 319 | 167 |        0 |    0
| NEOLED        | Smart LED Interface (NeoPixel/WS28128) [4xFIFO]     | 342 | 307 |        0 |    0
| NEOLED        | Smart LED Interface (NeoPixel/WS28128) [FIFO_depth=1] | 226 | 182 |        0 |    0
| SLINK         | Stream link interface (4 links, FIFO_depth=1)       | 345 | 313 |        0 |    0
| SLINK         | Stream link interface (2xRX, 2xTX, FIFO_depth=1)      | 208 | 181 |        0 |    0
| PWM           | Pulse_width modulation controller (4 channels)      |  71 |  69 |        0 |    0
| PWM           | Pulse_width modulation controller (4 channels)      |  71 |  69 |        0 |    0
| SPI           | Serial peripheral interface                         | 148 | 127 |        0 |    0
| SPI           | Serial peripheral interface                         | 148 | 127 |        0 |    0
| **SYSINFO**   | System configuration information memory             |  14 |  11 |        0 |    0
| **SYSINFO**   | System configuration information memory             |  14 |  11 |        0 |    0
| TRNG          | True random number generator                        |  89 |  76 |        0 |    0
| TRNG          | True random number generator                        |  89 |  76 |        0 |    0
| TWI           | Two-wire interface                                  |  77 |  43 |        0 |    0
| TWI           | Two-wire interface                                  |  77 |  43 |        0 |    0
Line 310... Line 320...
 
 
 
 
:sectnums:
:sectnums:
==== Exemplary Setups
==== Exemplary Setups
 
 
[TIP]
 
Check out the `setups` folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/setups),
Check out the `setups` folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/setups),
which provides several demo setups for various FPGA boards and toolchains.
which provides several demo setups for various FPGA boards and toolchains.
 
 
 
 
 
 
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== CPU Performance
=== CPU Performance
 
 
:sectnums:
The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[Core Mark CPU benchmark].
==== CoreMark Benchmark
This benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
 
system. The according sources can be found in the `sw/example/coremark` folder.
.Configuration
 
[cols="<2,<8"]
 
[grid="topbot"]
 
|=======================
 
| Hardware:       | 32kB IMEM, 16kB DMEM, no caches, 100MHz clock
 
| CoreMark:       | 2000 iterations, MEM_METHOD is MEM_STACK
 
| Compiler:       | RISCV32-GCC 10.1.0
 
| Peripherals:    | UART for printing the results
 
| Compiler flags: | default, see makefile
 
|=======================
 
 
 
The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[Core Mark CPU benchmark]. This
 
benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
 
system. The according source code and the SW project can be found in the `sw/example/coremark` folder.
 
 
 
The resulting CoreMark score is defined as CoreMark iterations per second.
The resulting CoreMark score is defined as CoreMark iterations per second.
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
defined as CoreMark score divided by the CPU's clock frequency in MHz.
defined as CoreMark score divided by the CPU's clock frequency in MHz.
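
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the helper names (`combine_counter`, `coremark_score`, `coremark_per_mhz`) are hypothetical and not part of the NEORV32 software framework, and the cycle count is a made-up round number rather than a measurement. The only values taken from this document are the 2000 iterations and the 100 MHz clock.

```python
def combine_counter(high: int, low: int) -> int:
    """Join the two 32-bit halves of a 64-bit counter (e.g. cycleh and cycle)."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def coremark_score(iterations: int, cycles: int, f_clk_hz: float) -> float:
    """CoreMark score: iterations per second of execution time."""
    seconds = cycles / f_clk_hz
    return iterations / seconds


def coremark_per_mhz(score: float, f_clk_hz: float) -> float:
    """Relative score: CoreMark score divided by the clock frequency in MHz."""
    return score / (f_clk_hz / 1e6)


F_CLK_HZ = 100e6  # 100 MHz clock, as in the configuration table

# Hypothetical counter read-out (NOT a measurement): 4,000,000,000 cycles
cycles = combine_counter(high=0, low=4_000_000_000)
score = coremark_score(iterations=2000, cycles=cycles, f_clk_hz=F_CLK_HZ)
print(f"{score:.2f} CoreMarks, {coremark_per_mhz(score, F_CLK_HZ):.4f} CoreMarks/MHz")
# prints "50.00 CoreMarks, 0.5000 CoreMarks/MHz"
```

Because the clock is 100 MHz, the relative score is simply the absolute score divided by 100, which matches the relation between the two result columns of the CoreMark tables.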
 
 
 
.Configuration
[cols="<2,<8"]
[cols="<2,<8"]
[grid="topbot"]
[grid="topbot"]
|=======================
|=======================
| Hardware version: | `1.4.9.8`
| HW version:     | `1.5.7.10`
 
| Hardware:       | 32kB int. IMEM, 16kB int. DMEM, no caches, 100MHz clock
 
| CoreMark:       | 2000 iterations, MEM_METHOD is MEM_STACK
 
| Compiler:       | RISCV32-GCC 10.2.0
 
| Compiler flags: | default, see makefile
|=======================
|=======================
 
 
.CoreMark results
.CoreMark results
[cols="<4,>1,>1,>1"]
[cols="<4,^1,^1,^1"]
[options="header",grid="rows"]
[options="header",grid="rows"]
|=======================
|=======================
| CPU (incl. `Zicsr`)                         | Executable size | CoreMark Score | CoreMarks/Mhz
| CPU                                            | CoreMark Score | CoreMarks/Mhz | Average CPI
| `rv32i`                                     |     28756 bytes |          36.36 | **0.3636**
| _small_ (`rv32i_Zicsr`)                        |          33.89 | **0.3389**    | **4.04**
| `rv32im`                                    |     27516 bytes |          68.97 | **0.6897**
| _medium_ (`rv32imc_Zicsr`)                     |          62.50 | **0.6250**    | **5.34**
| `rv32imc`                                   |     22008 bytes |          68.97 | **0.6897**
| _performance_(`rv32imc_Zicsr` + perf. options) |          95.23 | **0.9523**    | **3.54**
| `rv32imc` + _FAST_MUL_EN_                   |     22008 bytes |          86.96 | **0.8696**
 
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ |     22008 bytes |          90.91 | **0.9091**
 
|=======================
|=======================
 
 
[NOTE]
[NOTE]
All executable were generated using maximum optimization `-O3`.
The "_performance_" CPU configuration uses the <<_fast_mul_en>> and <<_fast_shift_en>> options.
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the _M_ extension (enabled via the
 
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
 
operations (enabled via the _FAST_SHIFT_EN_ generic).
 
 
 
 
 
 
 
:sectnums:
 
==== Instruction Timing
 
 
 
 
[NOTE]
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
several consecutive micro operations. Hence, each instruction requires several clock cycles to execute.
several consecutive micro operations.
 
 
 
[NOTE]
The average CPI (cycles per instruction) depends on the instruction mix of a specific applications and also on
The average CPI (cycles per instruction) depends on the instruction mix of a specific applications and also on
the available CPU extensions. The following table shows the performance results for successfully (!) running
the available CPU extensions. The average CPI is computed by dividing the total number of required clock cycles
2000 CoreMark iterations.
(only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions
 
(`[m]instret[h]` CSRs).
The average CPI is computed by dividing the total number of required clock cycles (only the timed core to
 
avoid distortion due to IO wait cycles) by the number of executed instructions (`[m]instret[h]` CSRs). The
 
executables were generated using optimization `-O3`.
 
 
 
[cols="<2,<8"]
 
[grid="topbot"]
 
|=======================
 
| Hardware version: | `1.4.9.8`
 
|=======================
 
 
 
.CoreMark instruction timing
 
[cols="<4,>2,>2,>2"]
 
[options="header",grid="rows"]
 
|=======================
 
| CPU (incl. `Zicsr`)                         | Required clock cycles | Executed instruction | Average CPI
 
| `rv32i`                                     |            5595750503 | 1466028607           | **3.82**
 
| `rv32im`                                    |            2966086503 |  598651143           | **4.95**
 
| `rv32imc`                                   |            2981786734 |  611814918           | **4.87**
 
| `rv32imc` + _FAST_MUL_EN_                   |            2399234734 |  611814918           | **3.92**
 
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ |            2265135174 |  611814948           | **3.70**
 
|=======================
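
The CPI definition can be checked against the cycle and instruction counts of the rev 61 instruction timing table. The following Python sketch only recomputes the published figures; the function name `average_cpi` and the dictionary layout are illustrative assumptions, not part of the project.

```python
# (required clock cycles, executed instructions) copied from the rev 61 table
RESULTS = {
    "rv32i": (5595750503, 1466028607),
    "rv32im": (2966086503, 598651143),
    "rv32imc": (2981786734, 611814918),
    "rv32imc + FAST_MUL_EN": (2399234734, 611814918),
    "rv32imc + FAST_MUL_EN + FAST_SHIFT_EN": (2265135174, 611814948),
}


def average_cpi(cycles: int, instructions: int) -> float:
    """Average cycles per instruction: total cycles / retired instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions


for name, (cycles, instret) in RESULTS.items():
    print(f"{name:<40} {average_cpi(cycles, instret):.2f}")
```

Rounded to two decimals, the computed values reproduce the "Average CPI" column of that table (3.82, 4.95, 4.87, 3.92, 3.70).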
 
 
 
[TIP]
 
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the M extension (enabled via the
 
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
 
operations (enabled via the _FAST_SHIFT_EN_ generic).
 
 
 
[TIP]
[TIP]
More information regarding the execution time of each implemented instruction can be found in
More information regarding the execution time of each implemented instruction can be found in
chapter <<_instruction_timing>>.
chapter <<_instruction_timing>>.
 
 
