OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [docs/] [datasheet/] [overview.adoc] - Rev 61

Go to most recent revision | Compare with Previous | Blame | View Log

:sectnums:
== Overview

The NEORV32footnote:[Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.] is an open-source
RISC-V compatible processor system that is intended as *ready-to-go* auxiliary processor within a larger SoC
designs or as stand-alone custom / customizable microcontroller.

The system is highly configurable and provides optional common peripherals like embedded memories,
timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like
memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb
compatible on-chip debugger accessible via JTAG.

The software framework of the processor comes with application makefiles, software libraries for all CPU
and processor features, a bootloader, a runtime environment and several example programs – including a port
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as
default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]).

[TIP]
Check out the processor's **https://stnolting.github.io/neorv32/ug[online User Guide]**
that provides hands-on tutorial to get you started.

[TIP]
The project's change log is available in https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md[CHANGELOG.md]
in the root directory of the NEORV32 repository. Please also check out the <<_legal>> section.


**Structure**

* <<_neorv32_processor_soc>>
* <<_neorv32_central_processing_unit_cpu>>
* <<_on_chip_debugger_ocd>>
* <<_software_framework>>

[TIP]
Links in this document are <<_overview,highlighted>>.



<<<
// ####################################################################################################################
:sectnums:
=== Rationale

**Why did you make this?**

I am fascinated by processor and CPU architecture design: it is the magic frontier where software meets hardware.
This project has started as something like a _journey_ into this magic realm to understand how things actually work
down on this very low level.

But there is more! When I started to dive into the emerging RISC-V ecosystem I felt overwhelmed by the complexity.
As a beginner it is hard to get an overview - especially when you want to setup a minimal platform to tinker with:
Which core to use? How to get the right toolchain? What features do I need? How does the booting work? How do I
create an actual executable? How to get that into the hardware? How to customize things? **_Where to start???_**

So this project aims to provides a _simple to understand_ and _easy to use_ yet _powerful_ and _flexible_ platform
that targets FPGA and RISC-V beginners as well as advanced users. Join me and us on this journey! 🙃


**Why a _soft_-core processor?**

As a matter of fact soft-core processors _cannot_ compete with discrete or FPGA hard-macro processors in terms
of performance, energy and size. But they do fill a niche in FPGA design space. For example, soft-core processors
allow to implement the _control flow part_ of certain applications (like communication protocol handling) using
software like plain C. This provides high flexibility as software can be easily changed, re-compiled and
re-uploaded again.

Furthermore, the concept of flexibility applies to all aspects of a soft-core processor. The user can add
_exactly_ the features that are required by the application: additional memories, custom interfaces, specialized
IP and even user-defined instructions.


**Why RISC-V?**

[quote, RISC-V International, https://riscv.org/about/]
____
RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration.
____

I love the idea of open-source. **Knowledge can help best if it is freely available.**
While open-source has already become quite popular in _software_, hardware projects still need to catch up.
Admittedly, there has been quite a development, but mainly in terms of _platforms_ and _applications_ (so
schematics, PCBs, etc.). Although processors and CPUs are the heart of almost every digital system, having a true
open-source silicon is still a rarity. RISC-V aims to change that. Even it is _just one approach_, it helps paving
the road for future development.

Furthermore, I welcome the community aspect of RISC-V. The ISA and everything beyond is developed with direct
contact to the community: this includes businesses and professionals but also hobbyist, amateurs and people
that are just curious. Everyone can join discussions and contribute to RISC-V in their very own way.

Finally, I really like the RISC-V ISA itself. It aims to be a clean, orthogonal and "intuitive" ISA that
resembles with the basic concepts of _RISC_: simple yet effective.


**Yet another RISC-V core? What makes it special?**

The NEORV32 is not based on another RISC-V core. It was build entirely from ground up (just following the official
ISA specs) having a different design goal in mind. The project does not intend to replace certain RISC-V cores or
just beat existing ones like https://github.com/SpinalHDL/VexRiscv[VexRISC] in terms of performance or
https://github.com/olofk/serv[SERV] in terms of size.

The project aims to provide _another option_ in the RISC-V / soft-core design space with a different performance
vs. size trade-off and a different focus: _embrace_ concepts like documentation, platform-independence / portability,
RISC-V compatibility, _customization_ and _ease of use_. See the <<_project_key_features>> below.


// ####################################################################################################################
:sectnums:
=== Project Key Features

* open-source and documented; including user guides to get started
* completely described in behavioral, platform-independent VHDL (yet platform-optimized modules are provided)
* fully synchronous design, no latches, no gated clocks
* small hardware footprint and high operating frequency for easy integration
* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU
** RISC-V compatibility: passes the official architecture tests
** base architecture + privileged architecture (optional) + ISA extensions (optional)
** rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
** official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
* **NEORV32 Processor (SoC)**: highly-configurable full-scale microcontroller-like processor system
** based on the NEORV32 CPU
** optional serial interfaces (UARTs, TWI, SPI)
** optional timers and counters (WDT, MTIME)
** optional general purpose IO and PWM and native NeoPixel (c) compatible smart LED interface
** optional embedded memories / caches for data, instructions and bootloader
** optional external memory interface (Wishbone / AXI4-Lite) and stream link interface (AXI4-Stream) for custom connectivity
** on-chip debugger compatible with OpenOCD and gdb
* **Software framework**
** GCC-based toolchain - prebuilt toolchains available; application compilation based on GNU makefiles
** internal bootloader with serial user interface
** core libraries for high-level usage of the provided functions and peripherals
** runtime environment and several example programs
** doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
** FreeRTOS port + demos available

[TIP]
For more in-depth details regarding the feature provided by he hardware see the according sections:
<<_neorv32_central_processing_unit_cpu>> and <<_neorv32_processor_soc>>.


<<<
// ####################################################################################################################
:sectnums:
=== Project Folder Structure

...................................
neorv32           - Project home folder
├.ci              - Scripts for continuous integration
├setups           - Example setups for various FPGA boards and toolchains
│└...
├CHANGELOG.md     - Project change log
├docs             - Project documentation
│├doxygen_build   - Software framework documentation (generated by doxygen)
│├src_adoc        - AsciiDoc sources for this document
│├references      - Data sheets and RISC-V specs.
│└figures         - Figures and logos
├riscv-arch-test  - Port files for the official RISC-V architecture tests
├rtl              - VHDL sources
│├core            - Sources of the CPU & SoC
│└templates       - Alternate/additional top entities/wrappers
│ ├processor      - Processor wrappers
│ └system         - System wrappers for advanced connectivity
├sim              - Simulation files
│└rtl_modules     - Processor modules for simulation-only
â””sw               - Software framework
 ├bootloader      - Sources and scripts for the NEORV32 internal bootloader
 ├common          - Linker script and crt0.S start-up code
 ├example         - Various example programs
 │└...
 ├ocd_firmware    - source code for on-chip debugger's "park loop"
 ├openocd         - OpenOCD on-chip debugger configuration files
 ├image_gen       - Helper program to generate NEORV32 executables
 â””lib             - Processor core library
  ├include        - Header files (*.h)
  â””source         - Source files (*.c)
...................................

[NOTE]
There are further files and folders starting with a dot which – for example – contain
data/configurations only relevant for git or for the continuous integration framework (`.ci`).


<<<
// ####################################################################################################################
:sectnums:
=== VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's `rtl/core folder`. The top entity
of the entire processor including all the required configuration generics is **`neorv32_top.vhd`**.

[IMPORTANT]
All core VHDL files from the list below have to be assigned to a new design library named **`neorv32`**. Additional
files, like alternative top entities, can be assigned to any library.

...................................
neorv32_top.vhd                  - NEORV32 Processor top entity
│
├neorv32_fifo.vhd                - General purpose FIFO component
├neorv32_package.vhd             - Processor/CPU main VHDL package file
│
├neorv32_cpu.vhd                 - NEORV32 CPU top entity
│├neorv32_cpu_alu.vhd            - Arithmetic/logic unit
││├neorv32_cpu_cp_fpu.vhd        - Floating-point co-processor (Zfinx ext.)
││├neorv32_cpu_cp_muldiv.vhd     - Mul/Div co-processor (M extension)
││└neorv32_cpu_cp_shifter.vhd    - Bit-shift co-processor
│├neorv32_cpu_bus.vhd            - Bus interface + physical memory protection
│├neorv32_cpu_control.vhd        - CPU control, exception/IRQ system and CSRs
││└neorv32_cpu_decompressor.vhd  - Compressed instructions decoder
│└neorv32_cpu_regfile.vhd        - Data register file
│
├neorv32_boot_rom.vhd            - Bootloader ROM
│└neorv32_bootloader_image.vhd   - Bootloader boot ROM memory image
├neorv32_busswitch.vhd           - Processor bus switch for CPU buses (I&D)
├neorv32_bus_keeper.vhd          - Processor-internal bus monitor
├neorv32_icache.vhd              - Processor-internal instruction cache
├neorv32_cfs.vhd                 - Custom functions subsystem
├neorv32_debug_dm.vhd            - on-chip debugger: debug module
├neorv32_debug_dtm.vhd           - on-chip debugger: debug transfer module
├neorv32_dmem.vhd                - Processor-internal data memory
├neorv32_gpio.vhd                - General purpose input/output port unit
├neorv32_imem.vhd                - Processor-internal instruction memory
│└neor32_application_image.vhd   - IMEM application initialization image
├neorv32_mtime.vhd               - Machine system timer
├neorv32_neoled.vhd              - NeoPixel (TM) compatible smart LED interface
├neorv32_pwm.vhd                 - Pulse-width modulation controller
├neorv32_spi.vhd                 - Serial peripheral interface controller
├neorv32_sysinfo.vhd             - System configuration information memory
├neorv32_trng.vhd                - True random number generator
├neorv32_twi.vhd                 - Two wire serial interface controller
├neorv32_uart.vhd                - Universal async. receiver/transmitter
├neorv32_wdt.vhd                 - Watchdog timer
├neorv32_wishbone.vhd            - External (Wishbone) bus interface
â””neorv32_xirq.vhd                - External interrupt controller
...................................


<<<
// ####################################################################################################################
:sectnums:
=== FPGA Implementation Results

This chapter shows exemplary implementation results of the NEORV32 CPU and Processor. Please note, that
the provided results are just a relative measure as logic functions of different modules might be merged
between entity boundaries, so the actual utilization results might vary a bit.

:sectnums:
==== CPU

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.5.5.5`
| Top entity:       | `rtl/core/neorv32_cpu.vhd`
|=======================

[cols="<5,>1,>1,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU                                   | LEs  | FFs  | MEM bits | DSPs | _f~max~_
| `rv32i`                               |  980 |  409 | 1024     | 0    | 125 MHz
| `rv32i_Zicsr`                         | 1835 |  856 | 1024     | 0    | 125 MHz
| `rv32im_Zicsr`                        | 2443 | 1134 | 1024     | 0    | 125 MHz
| `rv32imc_Zicsr`                       | 2669 | 1149 | 1024     | 0    | 125 MHz
| `rv32imac_Zicsr`                      | 2685 | 1156 | 1024     | 0    | 125 MHz
| `rv32imac_Zicsr` + `debug_mode`       | 3058 | 1225 | 1024     | 0    | 125 MHz
| `rv32imac_Zicsr` + `u`                | 2698 | 1162 | 1024     | 0    | 125 MHz
| `rv32imac_Zicsr_Zifencei` + `u`       | 2715 | 1162 | 1024     | 0    | 125 MHz
| `rv32imac_Zicsr_Zifencei_Zfinx` + `u` | 4004 | 1812 | 1024     | 7    | 118 MHz
|=======================


:sectnums:
==== Processor Modules

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.5.7.8`
| Top entity:       | `rtl/core/neorv32_top.vhd`
|=======================

.Hardware utilization by the processor modules (mandatory core modules in **bold**)
[cols="<2,<8,>1,>1,>2,>1"]
[options="header",grid="rows"]
|=======================
| Module        | Description                                         | LEs | FFs | MEM bits | DSPs
| Boot ROM      | Bootloader ROM (4kB)                                |   2 |   1 |    32768 |    0
| **BUSKEEPER** | Processor-internal bus monitor                      |   9 |   6 |        0 |    0
| **BUSSWITCH** | Bus mux for CPU instr. and data interface           |  63 |   8 |        0 |    0
| CFS           | Custom functions subsystemfootnote:[Resource utilization depends on actually implemented custom functionality.] | - | - | - | -
| DMEM          | Processor-internal data memory (8kB)                |  19 |   2 |    65536 |    0
| DM            | On-chip debugger - debug module                     | 493 | 240 |        0 |    0
| DTM           | On-chip debugger - debug transfer module (JTAG)     | 254 | 218 |        0 |    0
| GPIO          | General purpose input/output ports                  | 134 | 161 |        0 |    0
| iCACHE        | Instruction cache (1x4 blocks, 256 bytes per block) | 2 21| 156 |     8192 |    0
| IMEM          | Processor-internal instruction memory (16kB)        |  13 |   2 |   131072 |    0
| MTIME         | Machine system timer                                | 319 | 167 |        0 |    0
| NEOLED        | Smart LED Interface (NeoPixel/WS28128) [4xFIFO]     | 342 | 307 |        0 |    0
| SLINK         | Stream link interface (4 links, FIFO_depth=1)       | 345 | 313 |        0 |    0
| PWM           | Pulse_width modulation controller (4 channels)      |  71 |  69 |        0 |    0
| SPI           | Serial peripheral interface                         | 148 | 127 |        0 |    0
| **SYSINFO**   | System configuration information memory             |  14 |  11 |        0 |    0
| TRNG          | True random number generator                        |  89 |  76 |        0 |    0
| TWI           | Two-wire interface                                  |  77 |  43 |        0 |    0
| UART0/1       | Universal asynchronous receiver/transmitter 0/1     | 183 | 132 |        0 |    0
| WDT           | Watchdog timer                                      |  53 |  43 |        0 |    0
| WISHBONE      | External memory interface                           | 114 | 110 |        0 |    0
| XIRQ          | External interrupt controller (32 channels)         | 241 | 201 |        0 |    0
|=======================


<<<
:sectnums:
==== Exemplary Setups

[TIP]
Check out the `setups` folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/setups),
which provides several demo setups for various FPGA boards and toolchains.


<<<
// ####################################################################################################################
:sectnums:
=== CPU Performance

:sectnums:
==== CoreMark Benchmark

.Configuration
[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware:       | 32kB IMEM, 16kB DMEM, no caches, 100MHz clock
| CoreMark:       | 2000 iterations, MEM_METHOD is MEM_STACK
| Compiler:       | RISCV32-GCC 10.1.0
| Peripherals:    | UART for printing the results
| Compiler flags: | default, see makefile
|=======================

The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[Core Mark CPU benchmark]. This
benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
system. The according source code and the SW project can be found in the `sw/example/coremark` folder.

The resulting CoreMark score is defined as CoreMark iterations per second.
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
defined as CoreMark score divided by the CPU's clock frequency in MHz.

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.8`
|=======================

.CoreMark results
[cols="<4,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU (incl. `Zicsr`)                         | Executable size | CoreMark Score | CoreMarks/Mhz
| `rv32i`                                     |     28756 bytes |          36.36 | **0.3636**
| `rv32im`                                    |     27516 bytes |          68.97 | **0.6897**
| `rv32imc`                                   |     22008 bytes |          68.97 | **0.6897**
| `rv32imc` + _FAST_MUL_EN_                   |     22008 bytes |          86.96 | **0.8696**
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ |     22008 bytes |          90.91 | **0.9091**
|=======================

[NOTE]
All executable were generated using maximum optimization `-O3`.
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the _M_ extension (enabled via the
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
operations (enabled via the _FAST_SHIFT_EN_ generic).


<<<
:sectnums:
==== Instruction Timing

The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
several consecutive micro operations. Hence, each instruction requires several clock cycles to execute.

The average CPI (cycles per instruction) depends on the instruction mix of a specific applications and also on
the available CPU extensions. The following table shows the performance results for successfully (!) running
2000 CoreMark iterations.

The average CPI is computed by dividing the total number of required clock cycles (only the timed core to
avoid distortion due to IO wait cycles) by the number of executed instructions (`[m]instret[h]` CSRs). The
executables were generated using optimization `-O3`.

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.8`
|=======================

.CoreMark instruction timing
[cols="<4,>2,>2,>2"]
[options="header",grid="rows"]
|=======================
| CPU (incl. `Zicsr`)                         | Required clock cycles | Executed instruction | Average CPI
| `rv32i`                                     |            5595750503 | 1466028607           | **3.82**
| `rv32im`                                    |            2966086503 |  598651143           | **4.95**
| `rv32imc`                                   |            2981786734 |  611814918           | **4.87**
| `rv32imc` + _FAST_MUL_EN_                   |            2399234734 |  611814918           | **3.92**
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ |            2265135174 |  611814948           | **3.70**
|=======================

[TIP]
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the M extension (enabled via the
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
operations (enabled via the _FAST_SHIFT_EN_ generic).

[TIP]
More information regarding the execution time of each implemented instruction can be found in
chapter <<_instruction_timing>>.

Go to most recent revision | Compare with Previous | Blame | View Log

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.