OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

:sectnums:
== Overview

[quote]
____
RISC-V - Instruction Sets Want To Be Free!
____

The NEORV32footnote:[Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.] is an open-source
RISC-V compatible processor system that is intended as a *ready-to-go* auxiliary processor within larger SoC
designs or as a stand-alone custom / customizable microcontroller.

The system is highly configurable and provides optional common peripherals like embedded memories,
timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like
memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb
compatible on-chip debugger accessible via JTAG.

The software framework of the processor comes with application makefiles, software libraries for all CPU
and processor features, a bootloader, a runtime environment and several example programs – including a port
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as
default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]).

[TIP]
The project's change log is available in https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md[CHANGELOG.md]
in the root directory of the NEORV32 repository. Please also check out the <<_legal>> section.



:sectnums!:
=== Structure

Chapter <<_neorv32_processor_soc>>

* top entity signals and configuration generics, address space layout, internal peripheral devices and interrupts, internal
memories and caches, internal bus architecture, external bus interface

Chapter <<_neorv32_central_processing_unit_cpu>>

* instruction set(s) and extensions, instruction timing, control and status registers, traps, exceptions and interrupts,
hardware execution safety, native bus interface

Chapter <<_on_chip_debugger_ocd>>

* on-chip debugging compatible with the "Minimal RISC-V Debug Specification Version 0.13.2".

Chapter <<_software_framework>>

* core libraries, bootloader, makefiles, runtime environment

Chapter <<_lets_get_it_started>>

* toolchain installation and setup, hardware setup, software setup, application compilation, simulating the processor,
debugging using the on-chip debugger

[TIP]
Links in this document are <<_structure,highlighted>>.



<<<
// ####################################################################################################################
:sectnums:
=== Project Key Features

* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU - passes the official RISC-V architecture tests
* official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
* optional RISC-V CPU extensions:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
* **Software framework**
** GCC-based toolchain - prebuilt toolchains available; application compilation based on GNU makefiles
** internal bootloader with serial user interface
** core libraries for high-level usage of the provided functions and peripherals
** runtime environment and several example programs
** doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
** FreeRTOS port + demos available
* **NEORV32 Processor**: highly-configurable full-scale microcontroller-like processor system / SoC based on the NEORV32 CPU with optional standard peripherals:
** serial interfaces (UARTs, TWI, SPI)
** timers and counters (WDT, MTIME, NCO)
** general purpose IO, PWM and a native NeoPixel (TM) compatible smart LED interface
** embedded memories / caches for data, instructions and bootloader
** external memory interface (Wishbone or AXI4-Lite)
* on-chip debugger compatible with OpenOCD and gdb
* fully synchronous design, no latches, no gated clocks
* completely described in behavioral, platform-independent VHDL
* small hardware footprint and high operating frequency


<<<
// ####################################################################################################################
:sectnums:
=== Project Folder Structure

...................................
neorv32            - Project home folder
├.ci              - Scripts for continuous integration
├boards           - Example setups for various FPGA boards
├CHANGELOG.md     - Project change log
├docs             - Project documentation
│├doxygen_build  - Software framework documentation (generated by doxygen)
│├src_adoc       - AsciiDoc sources for this document
│├references     - Data sheets and RISC-V specs.
│└figures        - Figures and logos
├riscv-arch-test  - Port files for the official RISC-V architecture tests
├rtl              - VHDL sources
│├core           - Sources of the CPU & SoC
│└top_templates  - Alternate/additional top entities/wrappers
├sim              - Simulation files
│├ghdl           - Simulation scripts for GHDL
│├rtl_modules    - Processor modules for simulation-only
│└vivado         - Pre-configured Xilinx ISIM waveform
└sw               - Software framework
 ├bootloader      - Sources and scripts for the NEORV32 internal bootloader
 ├common          - Linker script and crt0.S start-up code
 ├example         - Various example programs
 │└...
 ├ocd_firmware    - source code for on-chip debugger's "park loop"
 ├openocd         - OpenOCD on-chip debugger configuration files
 ├image_gen       - Helper program to generate NEORV32 executables
 └lib             - Processor core library
  ├include        - Header files (*.h)
  └source         - Source files (*.c)
...................................

[NOTE]
There are further files and folders starting with a dot which – for example – contain
data/configurations only relevant for git or for the continuous integration framework (`.ci`).


<<<
// ####################################################################################################################
:sectnums:
=== VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's `rtl/core` folder. The top entity
of the entire processor including all the required configuration generics is **`neorv32_top.vhd`**.

[IMPORTANT]
All core VHDL files from the list below have to be assigned to a new design library named **`neorv32`**. Additional
files, like alternative top entities, can be assigned to any library.

...................................
neorv32_top.vhd                      - NEORV32 Processor top entity
├neorv32_boot_rom.vhd               - Bootloader ROM
│└neorv32_bootloader_image.vhd     - Bootloader boot ROM memory image 
├neorv32_busswitch.vhd              - Processor bus switch for CPU buses (I&D)
├neorv32_bus_keeper.vhd             - Processor-internal bus monitor
├neorv32_icache.vhd                 - Processor-internal instruction cache
├neorv32_cfs.vhd                    - Custom functions subsystem
├neorv32_cpu.vhd                    - NEORV32 CPU top entity
│├neorv32_package.vhd              - Processor/CPU main VHDL package file
│├neorv32_cpu_alu.vhd              - Arithmetic/logic unit
│├neorv32_cpu_bus.vhd              - Bus interface unit + physical memory protection
│├neorv32_cpu_control.vhd          - CPU control, exception/IRQ system and CSRs
││└neorv32_cpu_decompressor.vhd   - Compressed instructions decoder
│├neorv32_cpu_cp_fpu.vhd           - Floating-point co-processor (Zfinx extension)
│├neorv32_cpu_cp_muldiv.vhd        - Mul/Div co-processor (M extension)
│└neorv32_cpu_regfile.vhd          - Data register file
├neorv32_debug_dm.vhd               - on-chip debugger: debug module
├neorv32_debug_dtm.vhd              - on-chip debugger: debug transfer module
├neorv32_dmem.vhd                   - Processor-internal data memory
├neorv32_gpio.vhd                   - General purpose input/output port unit
├neorv32_imem.vhd                   - Processor-internal instruction memory
│└neorv32_application_image.vhd    - IMEM application initialization image
├neorv32_mtime.vhd                  - Machine system timer
├neorv32_nco.vhd                    - Numerically-controlled oscillator
├neorv32_neoled.vhd                 - NeoPixel (TM) compatible smart LED interface
├neorv32_pwm.vhd                    - Pulse-width modulation controller
├neorv32_spi.vhd                    - Serial peripheral interface controller
├neorv32_sysinfo.vhd                - System configuration information memory
├neorv32_trng.vhd                   - True random number generator
├neorv32_twi.vhd                    - Two wire serial interface controller
├neorv32_uart.vhd                   - Universal async. receiver/transmitter
├neorv32_wdt.vhd                    - Watchdog timer
└neorv32_wb_interface.vhd           - External (Wishbone) bus interface
...................................
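
As a rough illustration of the library requirement above, the following Python sketch assembles GHDL command lines that import core files into a design library named `neorv32` and then build the top entity from that library. GHDL is only one possible tool here (the repository ships GHDL scripts in `sim/ghdl`). The helper names, the shortened file list and the relative paths are assumptions made for this example; they are not part of the project, and a real flow may need further tool options.

```python
from pathlib import PurePosixPath

# Shortened, illustrative subset of the core file list above (assumption:
# a real setup would pass every file from the list).
CORE_FILES = [
    "neorv32_package.vhd",
    "neorv32_cpu.vhd",
    "neorv32_top.vhd",
]


def ghdl_import_command(core_dir, files, library="neorv32"):
    """Build a 'ghdl -i' call that imports the given files into `library`."""
    paths = [str(PurePosixPath(core_dir) / name) for name in files]
    return ["ghdl", "-i", f"--work={library}", *paths]


def ghdl_make_command(top_unit, library="neorv32"):
    """Build a 'ghdl -m' call that builds `top_unit` from `library`."""
    return ["ghdl", "-m", f"--work={library}", top_unit]


if __name__ == "__main__":
    # Only print the commands; running them requires a GHDL installation.
    print(" ".join(ghdl_import_command("rtl/core", CORE_FILES)))
    print(" ".join(ghdl_make_command("neorv32_top")))
```

The point of the sketch is the `--work=neorv32` option: every core file ends up in the same named library, while additional files such as alternative top entities could be imported with a different library name.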


<<<
// ####################################################################################################################
:sectnums:
=== FPGA Implementation Results

This chapter shows exemplary implementation results of the NEORV32 CPU and Processor. Please note that
the provided results are just a relative measure, as logic functions of different modules might be merged
across entity boundaries, so the actual utilization results might vary a bit.

:sectnums:
==== CPU

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.5.5.5`
| Top entity:       | `rtl/core/neorv32_cpu.vhd`
|=======================

[cols="<5,>1,>1,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU                                   | LEs  | FFs  | MEM bits | DSPs | _f~max~_
| `rv32i`                               |  980 |  409 | 1024     | 0    | 123 MHz
| `rv32i_Zicsr`                         | 1835 |  856 | 1024     | 0    | 124 MHz
| `rv32im_Zicsr`                        | 2443 | 1134 | 1024     | 0    | 124 MHz
| `rv32imc_Zicsr`                       | 2669 | 1149 | 1024     | 0    | 125 MHz
| `rv32imac_Zicsr`                      | 2685 | 1156 | 1024     | 0    | 124 MHz
| `rv32imac_Zicsr` + `debug_mode`       | 3058 | 1225 | 1024     | 0    | 120 MHz
| `rv32imac_Zicsr` + `u`                | 2698 | 1162 | 1024     | 0    | 124 MHz
| `rv32imac_Zicsr_Zifencei` + `u`       | 2715 | 1162 | 1024     | 0    | 122 MHz
| `rv32imac_Zicsr_Zifencei_Zfinx` + `u` | 4004 | 1812 | 1024     | 7    | 121 MHz
|=======================


:sectnums:
==== Processor Modules

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.5.5.9`
| Top entity:       | `rtl/core/neorv32_top.vhd`
|=======================

.Hardware utilization by the processor modules (mandatory core modules in **bold**)
[cols="<2,<8,>1,>1,>2,>1"]
[options="header",grid="rows"]
|=======================
| Module        | Description                                         | LEs | FFs | MEM bits | DSPs
| Boot ROM      | Bootloader ROM (4kB)                                |   3 |   1 |    32768 |    0
| **BUSKEEPER** | Processor-internal bus monitor                      |  11 |   6 |        0 |    0
| **BUSSWITCH** | Bus mux for CPU instr. and data interface           |  49 |   8 |        0 |    0
| CFS           | Custom functions subsystem                          |   - |   - |        - |    -
| DMEM          | Processor-internal data memory (8kB)                |  18 |   2 |    65536 |    0
| DM            | On-chip debugger - debug module                     | 493 | 240 |        0 |    0
| DTM           | On-chip debugger - debug transfer module (JTAG)     | 254 | 218 |        0 |    0
| GPIO          | General purpose input/output ports                  |  67 |  65 |        0 |    0
| iCACHE        | Instruction cache (1x4 blocks, 256 bytes per block) | 220 | 154 |     8192 |    0
| IMEM          | Processor-internal instruction memory (16kB)        |   6 |   2 |   131072 |    0
| MTIME         | Machine system timer                                | 289 | 200 |        0 |    0
| NCO           | Numerically-controlled oscillator                   | 254 | 226 |        0 |    0
| NEOLED        | Smart LED Interface (NeoPixel/WS2812) [4xFIFO]      | 347 | 309 |        0 |    0
| PWM           | Pulse-width modulation controller (4 channels)      |  71 |  69 |        0 |    0
| SPI           | Serial peripheral interface                         | 138 | 124 |        0 |    0
| **SYSINFO**   | System configuration information memory             |  10 |  10 |        0 |    0
| TRNG          | True random number generator                        | 132 | 105 |        0 |    0
| TWI           | Two-wire interface                                  |  77 |  44 |        0 |    0
| UART0/1       | Universal asynchronous receiver/transmitter 0/1     | 176 | 132 |        0 |    0
| WDT           | Watchdog timer                                      |  60 |  45 |        0 |    0
| WISHBONE      | External memory interface                           | 129 | 104 |        0 |    0
|=======================
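
The "MEM bits" column follows directly from the memory sizes given in the descriptions. The short Python sketch below reproduces those figures, assuming that "kB" means 1024 bytes and that "1x4 blocks" denotes one set of four cache blocks; the function names are made up for this example.

```python
def kb_to_bits(size_kb):
    """Convert a memory size in kB (assumed: 1 kB = 1024 bytes) to bits."""
    return size_kb * 1024 * 8


def cache_bits(num_sets, blocks_per_set, block_bytes):
    """Data storage of a cache in bits (tag and status bits not included)."""
    return num_sets * blocks_per_set * block_bytes * 8


if __name__ == "__main__":
    print("Boot ROM (4kB):", kb_to_bits(4))
    print("DMEM (8kB):", kb_to_bits(8))
    print("IMEM (16kB):", kb_to_bits(16))
    print("iCACHE (1x4 blocks, 256 bytes):", cache_bits(1, 4, 256))
```

With these assumptions the results match the table: 32768, 65536, 131072 and 8192 bits.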


<<<
:sectnums:
==== Exemplary Setups

[TIP]
Exemplary setups for different technologies and various FPGA boards can be found in the `boards` folder
(https://github.com/stnolting/neorv32/tree/master/boards).

The following table shows exemplary NEORV32 processor implementation results for different FPGA
platforms. Most setups use the default peripheral configuration (like no CFS, no caches and no
TRNG), no external memory interface and only internal instruction and data memories (IMEM uses 16kB
and DMEM uses 8kB memory space).

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.0`
|=======================

.Hardware utilization for exemplary NEORV32 setups
[cols="<4,<5,<4,<4,<3,<3,<3,<4,<4,<3"]
[options="header",grid="rows"]
|=======================
| Vendor  | FPGA                             | Board            | Toolchain               | CPU                               | LUT        | FF         | DSP    | Memory                        | _f_
| Intel   | Cyclone IV `EP4CE22F17-C6N`      | Terasic DE0-Nano | Quartus Prime Lite 20.1 | `rv32imcu_Zicsr_Zifencei` + `PMP` | 3813 (17%) | 1890 (8%)  | 0 (0%) | Memory bits: 231424 (38%)     | 119 MHz
| Lattice | iCE40 UltraPlus `iCE40UP5KSG48I` | Upduino v3.0     | Radiant 2.1             | `rv32icu_Zicsr_Zifencei`          | 5123 (97%) | 1972 (37%) | 0 (0%) | EBR: 12 (40%) SPRAM: 4 (100%) | 24 MHz
| Xilinx  | Artix-7 `XC7A35TICSG324-1L`      | Arty A7-35T      | Vivado 2019.2           | `rv32imcu_Zicsr_Zifencei` + `PMP` | 2465 (12%) | 1912 (5%)  | 0 (0%) | BRAM: 8 (16%)                 | 100 MHz
|=======================

**Notes**

* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kB).
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32 bootloader to store and automatically boot an application program after reset (both tested successfully).
* The setups with PMP implement 2 regions with a minimal granularity of 64kB.
* No HPM counters are used.


<<<
// ####################################################################################################################
:sectnums:
=== CPU Performance

:sectnums:
==== CoreMark Benchmark

.Configuration
[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware:       | 32kB IMEM, 16kB DMEM, no caches, 100MHz clock
| CoreMark:       | 2000 iterations, MEM_METHOD is MEM_STACK
| Compiler:       | RISCV32-GCC 10.1.0
| Peripherals:    | UART for printing the results
| Compiler flags: | default, see makefile
|=======================

The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark]. This
benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
system. The corresponding source code and the SW project can be found in the `sw/example/coremark` folder.

The resulting CoreMark score is defined as CoreMark iterations per second.
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
defined as CoreMark score divided by the CPU's clock frequency in MHz.
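
The two definitions above can be written down as a small Python sketch. The function names and the cycle count in the usage example are hypothetical and only illustrate the arithmetic; they are not measured values and not part of the CoreMark port.

```python
def execution_time_s(cycles, clock_hz):
    """Run time in seconds, derived from a cycle count and the clock frequency."""
    return cycles / clock_hz


def coremark_score(iterations, seconds):
    """CoreMark score: completed iterations per second."""
    return iterations / seconds


def coremarks_per_mhz(score, clock_hz):
    """Relative score: CoreMark score divided by the clock frequency in MHz."""
    return score / (clock_hz / 1e6)


if __name__ == "__main__":
    clock_hz = 100_000_000          # 100 MHz, as in the configuration table
    cycles = 4_000_000_000          # hypothetical cycle count, for illustration only
    seconds = execution_time_s(cycles, clock_hz)
    score = coremark_score(2000, seconds)
    print(seconds, score, coremarks_per_mhz(score, clock_hz))
```

For the hypothetical numbers used here, 2000 iterations in 40 seconds give a score of 50 and a relative score of 0.5 CoreMarks/MHz.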

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.8`
|=======================

.CoreMark results
[cols="<4,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU (incl. `Zicsr`)                         | Executable size | CoreMark Score | CoreMarks/MHz
| `rv32i`                                     |     28756 bytes |          36.36 | **0.3636**
| `rv32im`                                    |     27516 bytes |          68.97 | **0.6897**
| `rv32imc`                                   |     22008 bytes |          68.97 | **0.6897**
| `rv32imc` + _FAST_MUL_EN_                   |     22008 bytes |          86.96 | **0.8696**
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ |     22008 bytes |          90.91 | **0.9091**
|=======================

[NOTE]
All executables were generated using maximum optimization `-O3`.
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the _M_ extension (enabled via the
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
operations (enabled via the _FAST_SHIFT_EN_ generic).


<<<
:sectnums:
==== Instruction Timing

The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
several consecutive micro operations. Hence, each instruction requires several clock cycles to execute.

The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on
the available CPU extensions. The following table shows the performance results for successfully (!) running
2000 CoreMark iterations.

The average CPI is computed by dividing the total number of required clock cycles (only the timed core to
avoid distortion due to IO wait cycles) by the number of executed instructions (`[m]instret[h]` CSRs). The
executables were generated using optimization -O3.

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.8`
|=======================

.CoreMark instruction timing
[cols="<4,>2,>2,>2"]
[options="header",grid="rows"]
|=======================
| CPU (incl. `Zicsr`)                         | Required clock cycles | Executed instructions | Average CPI
| `rv32i`                                     |            5595750503 | 1466028607           | **3.82**
| `rv32im`                                    |            2966086503 |  598651143           | **4.95**
| `rv32imc`                                   |            2981786734 |  611814918           | **4.87**
| `rv32imc` + _FAST_MUL_EN_                   |            2399234734 |  611814918           | **3.92**
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ |            2265135174 |  611814948           | **3.70**
|=======================
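
The "Average CPI" column can be recomputed from the other two columns of the table. The following Python sketch does this with the cycle and instruction counts copied from the rows above; the dictionary layout and the function name are choices made for this example only.

```python
# (required clock cycles, executed instructions) per configuration,
# copied from the table above.
RESULTS = {
    "rv32i": (5595750503, 1466028607),
    "rv32im": (2966086503, 598651143),
    "rv32imc": (2981786734, 611814918),
    "rv32imc + FAST_MUL_EN": (2399234734, 611814918),
    "rv32imc + FAST_MUL_EN + FAST_SHIFT_EN": (2265135174, 611814948),
}


def average_cpi(cycles, instructions):
    """Average cycles per instruction: total cycles / executed instructions."""
    return cycles / instructions


if __name__ == "__main__":
    for name, (cycles, instructions) in RESULTS.items():
        print(f"{name}: {average_cpi(cycles, instructions):.2f}")
```

Rounded to two decimal places, the computed values agree with the table (3.82, 4.95, 4.87, 3.92 and 3.70).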

[TIP]
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the M extension (enabled via the
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
operations (enabled via the _FAST_SHIFT_EN_ generic).

[TIP]
More information regarding the execution time of each implemented instruction can be found in
chapter <<_instruction_timing>>.
