:sectnums:
== Overview

The NEORV32footnote:[Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.] is an open-source
RISC-V compatible processor system that is intended as a *ready-to-go* auxiliary processor within larger SoC
designs or as a stand-alone custom / customizable microcontroller.

The system is highly configurable and provides optional common peripherals like embedded memories,
timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like
memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb
compatible on-chip debugger accessible via JTAG.

Special focus is placed on **execution safety** to provide defined and predictable behavior at any time.
Therefore, the CPU ensures that all memory accesses are acknowledged and that no invalid/malformed instructions
are executed. Whenever an unexpected situation occurs, the application code is informed via hardware exceptions.

The software framework of the processor comes with application makefiles, software libraries for all CPU
and processor features, a bootloader, a runtime environment and several example programs - including a port
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as the
default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]).

Check out the processor's **https://stnolting.github.io/neorv32/ug[online User Guide]**
that provides hands-on tutorials to get you started.


**Structure**

[start=2]
. <<_neorv32_processor_soc>>
. <<_neorv32_central_processing_unit_cpu>>
. <<_software_framework>>
. <<_on_chip_debugger_ocd>>
. <<_legal>>


**Annotations**

[WARNING]
Warning

[IMPORTANT]
Important

[NOTE]
Note

[TIP]
Tip


<<<
// ####################################################################################################################

include::rationale.adoc[]



// ####################################################################################################################
:sectnums:
=== Project Key Features

* open-source and documented; including user guides to get started
* completely described in behavioral, platform-independent VHDL (yet platform-optimized modules are provided)
* fully synchronous design, no latches, no gated clocks
* small hardware footprint and high operating frequency for easy integration
* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU
** RISC-V compatibility: passes the official architecture tests
** base architecture + privileged architecture (optional) + ISA extensions (optional)
** option to add custom RISC-V instructions (as custom ISA extension)
** rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
** aims to support <<_full_virtualization>> capabilities (CPU _and_ SoC) to increase execution safety
** official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
* **NEORV32 Processor (SoC)**: highly-configurable full-scale microcontroller-like processor system
** based on the NEORV32 CPU
** optional serial interfaces (UARTs, TWI, SPI)
** optional timers and counters (WDT, MTIME)
** optional general purpose IO and PWM and native NeoPixel (c) compatible smart LED interface
** optional embedded memories / caches for data, instructions and bootloader
** optional external memory interface (Wishbone / AXI4-Lite) and stream link interface (AXI4-Stream) for custom connectivity
** optional execute in place (XIP) module
** on-chip debugger compatible with OpenOCD and gdb including hardware trigger module
* **Software framework**
** GCC-based toolchain - prebuilt toolchains available; application compilation based on GNU makefiles
** internal bootloader with serial user interface
** core libraries for high-level usage of the provided functions and peripherals
** runtime environment and several example programs
** doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
** FreeRTOS port + demos available

[TIP]
For more in-depth details regarding the features provided by the hardware, see the corresponding sections:
<<_neorv32_central_processing_unit_cpu>> and <<_neorv32_processor_soc>>.

**Extensibility and Customization**

The NEORV32 processor was designed to ease customization and extensibility and provides several options for adding
application-specific custom hardware modules and accelerators. The three most common options for adding custom
on-chip modules are listed below.

* <<_processor_external_memory_interface_wishbone_axi4_lite>> for processor-external modules
* <<_custom_functions_subsystem_cfs>> for tightly-coupled processor-internal co-processors
* <<_custom_functions_unit_cfu>> for custom RISC-V instructions

[TIP]
A more detailed comparison of the extension/customization options can be found in section
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules]
of the user guide.


<<<
// ####################################################################################################################
:sectnums:
=== Project Folder Structure

...................................
neorv32                - Project home folder
├docs                  - Project documentation
│├datasheet            - AsciiDoc sources for the NEORV32 data sheet
│├figures              - Figures and logos
│├icons                - Misc. symbols
│├references           - Data sheets and RISC-V specs.
│└userguide            - AsciiDoc sources for the NEORV32 user guide
├rtl                   - VHDL sources
│├core                 - Core sources of the CPU & SoC
││└mem                 - SoC-internal memories (default architectures)
│├processor_templates  - Pre-configured SoC wrappers
│├system_integration   - System wrappers for advanced connectivity
│└test_setups          - Minimal test setup "SoCs" used in the User Guide
├sim                   - Simulation files (see User Guide)
└sw                    - Software framework
 ├bootloader           - Sources of the processor-internal bootloader
 ├common               - Linker script, crt0.S start-up code and central makefile
 ├example              - Various example programs
 │└...
 ├lib                  - Processor core library
 │├include             - Header files (*.h)
 │└source              - Source files (*.c)
 ├image_gen            - Helper program to generate NEORV32 executables
 ├ocd_firmware         - Source code for on-chip debugger's "park loop"
 ├openocd              - OpenOCD on-chip debugger configuration files
 └svd                  - Processor system view description file (CMSIS-SVD)
...................................



<<<
// ####################################################################################################################
:sectnums:
=== VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's `rtl/core` folder. The top entity
of the entire processor including all the required configuration generics is **`neorv32_top.vhd`**.

[IMPORTANT]
All core VHDL files from the list below have to be assigned to a new design library named **`neorv32`**. Additional
files, like alternative top entities, can be assigned to any library.

...................................
neorv32_top.vhd                  - NEORV32 Processor top entity
├neorv32_fifo.vhd                - General purpose FIFO component
├neorv32_package.vhd             - Processor/CPU main VHDL package file
├neorv32_cpu.vhd                 - NEORV32 CPU top entity
│├neorv32_cpu_alu.vhd            - Arithmetic/logic unit
││├neorv32_cpu_cp_bitmanip.vhd   - Bit-manipulation co-processor (B ext.)
││├neorv32_cpu_cp_cfu.vhd        - Custom functions (instruction) co-processor (Zxcfu ext.)
││├neorv32_cpu_cp_fpu.vhd        - Floating-point co-processor (Zfinx ext.)
││├neorv32_cpu_cp_muldiv.vhd     - Mul/Div co-processor (M extension)
││└neorv32_cpu_cp_shifter.vhd    - Bit-shift co-processor
│├neorv32_cpu_bus.vhd            - Bus interface + physical memory protection
│├neorv32_cpu_control.vhd        - CPU control, exception/IRQ system and CSRs
││└neorv32_cpu_decompressor.vhd  - Compressed instructions decoder
│└neorv32_cpu_regfile.vhd        - Data register file
├neorv32_boot_rom.vhd            - Bootloader ROM
│└neorv32_bootloader_image.vhd   - Bootloader boot ROM memory image
├neorv32_busswitch.vhd           - Processor bus switch for CPU buses (I&D)
├neorv32_bus_keeper.vhd          - Processor-internal bus monitor
├neorv32_cfs.vhd                 - Custom functions subsystem
├neorv32_debug_dm.vhd            - on-chip debugger: debug module
├neorv32_debug_dtm.vhd           - on-chip debugger: debug transfer module
├neorv32_dmem.entity.vhd         - Processor-internal data memory (entity-only!)
├neorv32_gpio.vhd                - General purpose input/output port unit
├neorv32_gptmr.vhd               - General purpose 32-bit timer
├neorv32_icache.vhd              - Processor-internal instruction cache
├neorv32_imem.entity.vhd         - Processor-internal instruction memory (entity-only!)
│└neorv32_application_image.vhd  - IMEM application initialization image
├neorv32_mtime.vhd               - Machine system timer
├neorv32_neoled.vhd              - NeoPixel (TM) compatible smart LED interface
├neorv32_pwm.vhd                 - Pulse-width modulation controller
├neorv32_slink.vhd               - Stream link controller
├neorv32_spi.vhd                 - Serial peripheral interface controller
├neorv32_sysinfo.vhd             - System configuration information memory
├neorv32_trng.vhd                - True random number generator
├neorv32_twi.vhd                 - Two wire serial interface controller
├neorv32_uart.vhd                - Universal async. receiver/transmitter
├neorv32_wdt.vhd                 - Watchdog timer
├neorv32_wishbone.vhd            - External (Wishbone) bus interface
├neorv32_xip.vhd                 - Execute in place module
├neorv32_xirq.vhd                - External interrupt controller
├mem/neorv32_dmem.default.vhd    - _Default_ data memory (architecture-only)
└mem/neorv32_imem.default.vhd    - _Default_ instruction memory (architecture-only)
...................................

[NOTE]
The processor-internal instruction and data memories (IMEM and DMEM) are split into two design files each:
a plain entity definition (`neorv32_*mem.entity.vhd`) and the actual architecture definition
(`mem/neorv32_*mem.default.vhd`). The `*.default.vhd` architecture definitions from `rtl/core/mem` provide a _generic_ and
_platform-independent_ memory design that should infer embedded memory blocks. You can replace/modify the architecture
source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping
and/or timing.


<<<
// ####################################################################################################################
:sectnums:
=== FPGA Implementation Results

This section shows _exemplary_ FPGA implementation results for the NEORV32 CPU and NEORV32 Processor modules.
Note that certain configuration options might also have an impact on other configuration options. Furthermore,
this report cannot cover all possible option combinations. Hence, the presented implementation results are
just _exemplary_. Unless noted otherwise, all implementations use the default generic configurations.

:sectnums:
==== CPU

[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version:  | `1.6.9.8`
| Top entity:  | `rtl/core/neorv32_cpu.vhd`
| FPGA:        | Intel Cyclone IV E `EP4CE22F17C6`
| Toolchain:   | Quartus Prime Lite 21.1
| Constraints: | **no timing constraints**, "_balanced optimization_", f~max~ from "_Slow 1200mV 0C Model_"
|=======================

[cols="<6,>1,>1,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU ISA Configuration                              | LEs  | FFs  | MEM bits | DSPs | _f~max~_
| `rv32e`                                            |  830 |  400 |      512 |    0 | 129 MHz
| `rv32i`                                            |  834 |  400 |     1024 |    0 | 129 MHz
| `rv32i_Zicsr`                                      | 1328 |  678 |     1024 |    0 | 128 MHz
| `rv32i_Zicsr_Zicntr`                               | 1614 |  808 |     1024 |    0 | 128 MHz
| `rv32im_Zicsr_Zicntr`                              | 2087 |  983 |     1024 |    0 | 128 MHz
| `rv32ima_Zicsr_Zicntr`                             | 2129 |  987 |     1024 |    0 | 128 MHz
| `rv32imac_Zicsr_Zicntr`                            | 2338 |  992 |     1024 |    0 | 128 MHz
| `rv32imacb_Zicsr_Zicntr`                           | 3175 | 1247 |     1024 |    0 | 128 MHz
| `rv32imacbu_Zicsr_Zicntr`                          | 3186 | 1254 |     1024 |    0 | 128 MHz
| `rv32imacbu_Zicsr_Zicntr_Zifencei`                 | 3187 | 1254 |     1024 |    0 | 128 MHz
| `rv32imacbu_Zicsr_Zicntr_Zifencei_Zfinx`           | 4450 | 1906 |     1024 |    7 | 123 MHz
| `rv32imacbu_Zicsr_Zicntr_Zifencei_Zfinx_DebugMode` | 4825 | 2018 |     1024 |    7 | 123 MHz
|=======================

.**RISC-V Compliance**
[NOTE]
The `Zicsr` ISA extension implements the privileged machine architecture
(see <<_zicsr_control_and_status_register_access_privileged_architecture>>). The `Zicntr` ISA
extension implements the basic counters and timers (see <<_zicntr_cpu_base_counters>>). Both
extensions are _mandatory_ in order to comply with the RISC-V architecture specifications.

[NOTE]
The table above does not show _all_ CPU ISA extensions. More sophisticated and application-specific
options like PMP and HPM are not included in this overview.

.Goal-Driven Optimization
[TIP]
The CPU provides further options to reduce the area footprint (for example by constraining the CPU-internal
counter sizes) or to increase performance (for example by using a barrel-shifter; at cost of extra hardware).
See section <<_processor_top_entity_generics>> for more information. Also, take a look at the User Guide section
https://stnolting.github.io/neorv32/ug/#_application_specific_processor_configuration[Application-Specific Processor Configuration].


:sectnums:
==== Processor - Modules

[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.6.8.3`
| Top entity: | `rtl/core/neorv32_top.vhd`
| FPGA:       | Intel Cyclone IV E `EP4CE22F17C6`
| Toolchain:  | Quartus Prime Lite 21.1
| Constraints: | **no timing constraints**, "_balanced optimization_"
|=======================

.Hardware utilization by processor module (mandatory modules highlighted in **bold**)
[cols="<2,<8,>1,>1,>2,>1"]
[options="header",grid="rows"]
|=======================
| Module        | Description                                                    | LEs | FFs | MEM bits | DSPs
| Boot ROM      | Bootloader ROM (4kB)                                           |   3 |   2 |    32768 |    0
| **BUSKEEPER** | Processor-internal bus monitor                                 |  28 |  15 |        0 |    0
| **BUSSWITCH** | Bus multiplexer for CPU instr. and data interface              |  69 |   8 |        0 |    0
| CFS           | Custom functions subsystemfootnote:[Resource utilization depends on custom design logic.] | - | - | - | -
| DM            | On-chip debugger - debug module                                | 473 | 240 |        0 |    0
| DTM           | On-chip debugger - debug transfer module (JTAG)                | 259 | 221 |        0 |    0
| DMEM          | Processor-internal data memory (8kB)                           |  18 |   2 |    65536 |    0
| GPIO          | General purpose input/output ports                             | 102 |  98 |        0 |    0
| GPTMR         | General Purpose Timer                                          | 153 | 105 |        0 |    0
| iCACHE        | Instruction cache (2x4 blocks, 64 bytes per block)             | 417 | 297 |     4096 |    0
| IMEM          | Processor-internal instruction memory (16kB)                   |  12 |   2 |   131072 |    0
| MTIME         | Machine system timer                                           | 345 | 166 |        0 |    0
| NEOLED        | Smart LED Interface (NeoPixel/WS2812) (FIFO_depth=1)           | 227 | 184 |        0 |    0
| PWM           | Pulse-width modulation controller (8 channels)                 | 128 | qq7 |        0 |    0
| SLINK         | Stream link interface (2xRX, 2xTX, FIFO_depth=1)               | 136 | 116 |        0 |    0
| SPI           | Serial peripheral interface                                    | 114 |  94 |        0 |    0
| **SYSINFO**   | System configuration information memory                        |  13 |  11 |        0 |    0
| TRNG          | True random number generator                                   |  89 |  79 |        0 |    0
| TWI           | Two-wire interface                                             |  77 |  43 |        0 |    0
| UART0, UART1  | Universal asynchronous receiver/transmitter 0/1 (FIFO_depth=1) | 195 | 143 |        0 |    0
| WDT           | Watchdog timer                                                 |  61 |  46 |        0 |    0
| WISHBONE      | External memory interface                                      | 120 | 112 |        0 |    0
| XIP           | Execute in place module                                        | 318 | 244 |        0 |    0
| XIRQ          | External interrupt controller (32 channels)                    | 245 | 200 |        0 |    0
|=======================

[NOTE]
Note that not all IOs were actually connected to FPGA pins (for example some GPIO inputs and outputs)
when generating these reports.
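
For a first-order resource budget, the per-module figures from the table above can be added up. The Python sketch
below does this for logic elements and memory bits. It is a rough estimate only: synthesis results are not strictly
additive, configuration options can influence each other, and the CPU itself is not included. The dictionary holds
a subset of the table rows, and the function name `estimate_utilization` is purely illustrative.

```python
# (LEs, MEM bits) per module, copied from the utilization table above.
MODULES = {
    "BUSKEEPER": (28, 0),
    "BUSSWITCH": (69, 0),
    "SYSINFO": (13, 0),
    "Boot ROM": (3, 32768),
    "DMEM": (18, 65536),
    "IMEM": (12, 131072),
    "GPIO": (102, 0),
    "MTIME": (345, 0),
}

# Modules highlighted in bold in the table are always present.
MANDATORY = ("BUSKEEPER", "BUSSWITCH", "SYSINFO")


def estimate_utilization(optional_modules):
    """Return (LEs, MEM bits) for the mandatory modules plus the given optional ones."""
    # dict.fromkeys() removes duplicates while keeping the order.
    selected = dict.fromkeys(list(MANDATORY) + list(optional_modules))
    les = sum(MODULES[name][0] for name in selected)
    mem_bits = sum(MODULES[name][1] for name in selected)
    return les, mem_bits


if __name__ == "__main__":
    # Hypothetical setup: internal memories, bootloader, GPIO and system timer.
    print(estimate_utilization(["IMEM", "DMEM", "Boot ROM", "GPIO", "MTIME"]))
```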


<<<
:sectnums:
==== Exemplary Setups

Check out the `neorv32-setups` repository (@GitHub: https://github.com/stnolting/neorv32-setups),
which provides several demo setups for various FPGA boards and toolchains.


<<<
// ####################################################################################################################
:sectnums:
=== CPU Performance

The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark].
This benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
system. The corresponding sources can be found in the `sw/example/coremark` folder.

.Dhrystone
[TIP]
A _simple_ port of the Dhrystone benchmark is also available in `sw/example/dhrystone`.

The resulting CoreMark score is defined as CoreMark iterations per second.
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
defined as CoreMark score divided by the CPU's clock frequency in MHz.
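
As a worked example of these two definitions, the following minimal Python sketch computes the CoreMark score and
the relative score from a cycle count. The function names are illustrative and not part of the NEORV32 software
framework. The sketch assumes that, on a 32-bit RISC-V CPU, `[m]cycleh` holds the upper 32 bits of the 64-bit
cycle counter.

```python
def combine_counter(low: int, high: int) -> int:
    """Merge the low and high 32-bit CSR words into one 64-bit value."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def coremark_score(iterations: int, cycles: int, clock_hz: float) -> float:
    """CoreMark score: iterations per second of execution time."""
    seconds = cycles / clock_hz
    return iterations / seconds


def coremark_per_mhz(score: float, clock_hz: float) -> float:
    """Relative score: CoreMark score divided by the clock frequency in MHz."""
    return score / (clock_hz / 1e6)


if __name__ == "__main__":
    # Hypothetical counter read-out: 2,000,000,000 cycles at a 100 MHz clock.
    cycles = combine_counter(low=2_000_000_000, high=0)
    score = coremark_score(iterations=2000, cycles=cycles, clock_hz=100e6)
    print(score, coremark_per_mhz(score, 100e6))
```

With the figures reported in this section, a score of 33.89 at a 100 MHz clock corresponds to
33.89 / 100 = 0.3389 CoreMarks/MHz.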

.Configuration
[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version:     | `1.5.7.10`
| Hardware:       | 32kB int. IMEM, 16kB int. DMEM, no caches, 100MHz clock
| CoreMark:       | 2000 iterations, MEM_METHOD is MEM_STACK
| Compiler:       | RISCV32-GCC 10.2.0
| Compiler flags: | default, see makefile
|=======================

.CoreMark results
[cols="<4,^1,^1,^1"]
[options="header",grid="rows"]
|=======================
| CPU                                             | CoreMark Score | CoreMarks/MHz | Average CPI
| _small_ (`rv32i_Zicsr`)                         |          33.89 | **0.3389**    | **4.04**
| _medium_ (`rv32imc_Zicsr`)                      |          62.50 | **0.6250**    | **5.34**
| _performance_ (`rv32imc_Zicsr` + perf. options) |          95.23 | **0.9523**    | **3.54**
|=======================

[NOTE]
The CoreMark results were generated using an `rv32i` toolchain. This toolchain supports standard extensions
like `M` and `C` but the built-in libraries only use the base `I` ISA.

[NOTE]
The "_performance_" CPU configuration uses the <<_fast_mul_en>> and <<_fast_shift_en>> options.

The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
several consecutive micro operations.
The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on
the available CPU extensions. The average CPI is computed by dividing the total number of required clock cycles
(only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions
(`[m]instret[h]` CSRs). More information regarding the execution time of each implemented instruction can be found in
chapter <<_instruction_timing>>.
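
The CPI computation described above can be sketched in a few lines of Python. This is an illustration only: the
`CounterSnapshot` type and the function name are hypothetical, and the snapshots stand for the 64-bit
`[m]cycle[h]` and `[m]instret[h]` values sampled immediately before and after the timed core.

```python
from dataclasses import dataclass

COUNTER_MASK = (1 << 64) - 1  # both counters are treated as 64 bits wide


@dataclass(frozen=True)
class CounterSnapshot:
    cycles: int   # combined value of the [m]cycle[h] CSRs
    instret: int  # combined value of the [m]instret[h] CSRs


def average_cpi(start: CounterSnapshot, end: CounterSnapshot) -> float:
    """Average cycles per instruction between two counter snapshots."""
    # Modular subtraction keeps the result correct if a counter wraps around.
    cycles = (end.cycles - start.cycles) & COUNTER_MASK
    instructions = (end.instret - start.instret) & COUNTER_MASK
    if instructions == 0:
        raise ValueError("no instructions retired between the snapshots")
    return cycles / instructions


if __name__ == "__main__":
    # Hypothetical snapshots taken around the timed core.
    before = CounterSnapshot(cycles=1000, instret=250)
    after = CounterSnapshot(cycles=5040, instret=1250)
    print(average_cpi(before, after))
```

Taking the snapshots directly around the timed core keeps IO wait cycles outside the measured window.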
