Line 3... |
Line 3... |
|
|
image::riscv_logo.png[width=350,align=center]
|
image::riscv_logo.png[width=350,align=center]
|
|
|
**Key Features**
|
**Key Features**
|
|
|
* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
|
* 32-bit multi-cycle in-order `rv32` RISC-V CPU
|
* Optional RISC-V extensions:
|
* Optional RISC-V extensions:
|
** `A` - atomic memory access operations
|
** `A` - atomic memory access operations
|
|
** `B` - bit-manipulation instructions
|
** `C` - 16-bit compressed instructions
|
** `C` - 16-bit compressed instructions
|
** `I` - integer base ISA (always enabled)
|
** `I` - integer base ISA (always enabled)
|
** `E` - embedded CPU version (reduced register file size)
|
** `E` - embedded CPU version (reduced register file size)
|
** `M` - integer multiplication and division hardware
|
** `M` - integer multiplication and division hardware
|
** `U` - less-privileged _user_ mode
|
** `U` - less-privileged _user_ mode
|
** `Zbb` - basic bit-manipulation operations
|
|
** `Zfinx` - single-precision floating-point unit
|
** `Zfinx` - single-precision floating-point unit
|
** `Zicsr` - control and status register access (privileged architecture)
|
** `Zicsr` - control and status register access (privileged architecture)
|
|
** `Zicntr` - CPU base counters
|
|
** `Zihpm` - hardware performance monitors
|
** `Zifencei` - instruction stream synchronization
|
** `Zifencei` - instruction stream synchronization
|
** `Zmmul` - integer multiplication hardware
|
** `Zmmul` - integer multiplication hardware
|
** `PMP` - physical memory protection
|
** `PMP` - physical memory protection
|
** `HPM` - hardware performance monitors
|
** `Debug` - debug mode
|
** `DB` - debug mode
|
|
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
|
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
|
* Official RISC-V open-source architecture ID
|
* Official RISC-V open-source architecture ID
|
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
|
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
|
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
|
* Supports _all_ of the machine-level traps from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
|
|
** This is a key aspect of the _execution safety_ provided by <<_full_virtualization>>
|
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
|
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
|
* Optional hardware performance monitors (HPM) for application benchmarking
|
* Optional hardware performance monitors (HPM) for application benchmarking
|
* Separated interfaces for instruction fetch and data access (merged into single bus via a bus switch for
|
* Separate interfaces for instruction fetch and data access (merged into a single processor bus)
|
the NEORV32 processor)
|
|
* little-endian byte order
|
* little-endian byte order
|
* Configurable hardware reset
|
* Configurable hardware reset
|
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
|
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
|
|
|
[NOTE]
|
[NOTE]
|
Line 51... |
Line 52... |
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
|
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
|
specifications. The following figure shows the simplified architecture of the CPU.
|
specifications. The following figure shows the simplified architecture of the CPU.
|
|
|
image::neorv32_cpu.png[align=center]
|
image::neorv32_cpu.png[align=center]
|
|
|
The CPU uses a pipelined architecture with basically two main stages. The first stage (IF - instruction fetch)
|
The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
|
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
|
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
|
stored to a FIFO - the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
|
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
|
instruction words for the next pipeline stage. Compressed instructions - if enabled - are also decompressed
|
front-end can already fetch new instructions while the back-end is still processing previously-fetched instructions.
|
in this stage. The second stage (EX - execution) is responsible for actually executing the fetched instructions
|
|
via the execute engine.
|
The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
|
|
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
|
These two pipeline stages are based on a multi-cycle processing engine. So the processing of each stage for a
|
data is stored to a FIFO queue - the instruction prefetch buffer.
|
certain operations can take several cycles. Since the IF and EX stages are decoupled via the instruction
|
|
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
|
The back-end is responsible for the actual execution of the instruction. It includes an "issue engine",
|
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores,
|
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
|
multi-cycle operations like divisions or when the instruction fetch engine has to reload the prefetch buffers
|
instruction or decompressed 16-bit instructions) for execution.
|
due to a taken branch.
|
|
|
Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
|
|
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
|
|
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
|
|
when the CPU front-end has to reload the prefetch buffer due to a taken branch.
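
The effect of such stalls on the average CPI can be estimated with simple arithmetic. The following C sketch is only an illustration: the function name `average_cpi` and the instruction trace in the usage note are hypothetical, and the per-instruction cycle counts are inputs rather than measured values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average CPI = total execution cycles / number of executed instructions.
 * cycles[i] holds the cycle count of the i-th executed instruction. */
double average_cpi(const uint32_t *cycles, size_t count)
{
    assert(cycles != NULL && count > 0); /* an empty trace has no CPI */

    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += cycles[i];
    }
    return (double)total / (double)count;
}
```

For a hypothetical trace of eight instructions that take 2 cycles each and two loads that take 5 cycles each, the result is (8 × 2 + 2 × 5) / 10 = 2.6.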
|
|
|
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
|
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
|
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
|
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
|
every single instruction in a series of consecutive micro-operations. The combination of these two classical
|
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
|
design paradigms allows a higher instruction throughput than a pure multi-cycle approach (due to
|
these two classical design paradigms allows a higher instruction throughput than a pure multi-cycle
|
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).
|
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
|
|
multi-cycle concept).
|
The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
|
|
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
|
As a Von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access.
|
devices are mapped to a single 32-bit address space making the architecture a modified Von-Neumann
|
These two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
|
Architecture.
|
have higher priority). Hence, ALL memory locations including peripheral devices are mapped to a single unified 32-bit
|
|
address space.
|
|
|
|
|
|
// ####################################################################################################################
|
|
:sectnums:
|
|
=== Full Virtualization
|
|
|
|
Just like the RISC-V ISA the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU _and_ SoC level to
|
|
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.
|
|
footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
|
|
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
|
|
malformed instruction word or accessing an unallocated memory address). For any kind of trap the core is always in a
|
|
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
|
|
might have to be reverted). This allows predictable execution behavior at any time, improving overall _execution safety_.
|
|
|
|
**Execution Safety - NEORV32 Virtualization Features**
|
|
|
|
* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
|
|
(i.e. there is no speculative execution / no out-of-order states).
|
|
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
|
|
accessed address does not respond or encounters an internal error during access.
|
|
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
|
|
window. Otherwise a bus access exception is raised.
|
|
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
|
|
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions do raise an
|
|
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
|
|
memory operations).
|
|
* To be continued...
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
=== RISC-V Compatibility
|
=== RISC-V Compatibility
|
Line 234... |
Line 267... |
.Atomic memory operations
|
.Atomic memory operations
|
[IMPORTANT]
|
[IMPORTANT]
|
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
|
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
|
However, these instructions are sufficient to emulate all further atomic memory operations.
|
However, these instructions are sufficient to emulate all further atomic memory operations.
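
As an illustration of such an emulation, an atomic add (`amoadd.w`) can be built from a load-reserved/store-conditional retry loop. The C code below is a host-side model, not NEORV32 code: `reservation_t`, `lr_w()`, `sc_w()` and `amoadd_w_emulated()` are hypothetical names that only mimic the instruction semantics (`sc.w` returns zero on success and a non-zero value on failure).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the reservation state of a single hart. */
typedef struct {
    const uint32_t *addr; /* reserved address */
    bool valid;           /* true while the reservation is held */
} reservation_t;

/* Model of lr.w: load a word and register a reservation on its address. */
uint32_t lr_w(reservation_t *r, const uint32_t *addr)
{
    r->addr = addr;
    r->valid = true;
    return *addr;
}

/* Model of sc.w: store only if the reservation is still valid.
 * Returns 0 on success and 1 on failure; the reservation is always released. */
uint32_t sc_w(reservation_t *r, uint32_t *addr, uint32_t value)
{
    bool ok = r->valid && (r->addr == addr);
    r->valid = false;
    if (ok) {
        *addr = value;
    }
    return ok ? 0u : 1u;
}

/* amoadd.w emulation: retry until the store-conditional succeeds. */
uint32_t amoadd_w_emulated(reservation_t *r, uint32_t *addr, uint32_t increment)
{
    assert(r != NULL && addr != NULL);

    uint32_t old;
    do {
        old = lr_w(r, addr);
    } while (sc_w(r, addr, old + increment) != 0u);
    return old; /* like an AMO, return the original memory value */
}
```

On real hardware the two model functions would be replaced by the `lr.w` and `sc.w` instructions themselves; the retry loop stays the same.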
|
|
|
|
.Bit-manipulation operations
|
|
[IMPORTANT]
|
|
The NEORV32 `B` extension currently implements only the _basic bit-manipulation instructions_ (`Zbb`) subset
|
|
and the _address generation instructions_ (`Zba`) subset.
|
|
|
.Instruction Misalignment
|
.Instruction Misalignment
|
[NOTE]
|
[NOTE]
|
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
|
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
|
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
|
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
|
is raised if bit 1 of the access address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
|
is raised if bit 1 of the access address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
|
there will be no misaligned instruction exceptions _at all_.
|
there will be no misaligned instruction exceptions _at all_.
|
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
|
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
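
The rule above can be expressed as a small C predicate. This is a sketch of the described behavior only; the function names `effective_pc` and `fetch_is_misaligned` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the program counter is hardwired to zero. */
uint32_t effective_pc(uint32_t target)
{
    uint32_t pc = target & ~UINT32_C(1);
    assert((pc & 1u) == 0u); /* postcondition: always 16-bit aligned */
    return pc;
}

/* True if fetching from 'target' raises a misaligned instruction exception.
 * Without the C extension, instructions must sit on 32-bit boundaries, so
 * bit 1 of the address decides. With C enabled the exception never occurs. */
bool fetch_is_misaligned(uint32_t target, bool c_extension_enabled)
{
    if (c_extension_enabled) {
        return false;
    }
    return (effective_pc(target) & UINT32_C(2)) != 0u;
}
```

For example, a jump to address `0x102` is reported as misaligned only when the `C` extension is disabled, while `0x101` is silently treated as `0x100` in both configurations.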
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
=== CPU Top Entity - Signals
|
=== CPU Top Entity - Signals
|
|
|
Line 383... |
Line 420... |
[NOTE]
|
[NOTE]
|
The atomic instructions have special requirements for memory system / bus interconnect. More
|
The atomic instructions have special requirements for memory system / bus interconnect. More
|
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
|
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
|
|
|
|
|
|
==== **`B`** - Bit-Manipulation Operations
|
|
|
|
The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
|
|
`CPU_EXTENSION_RISCV_B` configuration generic is _true_.
|
|
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
|
|
|
|
[IMPORTANT]
|
|
The NEORV32 `B` extension currently implements only the _basic bit-manipulation instructions_ (`Zbb`) subset
|
|
and the _address generation instructions_ (`Zba`) subset.
|
|
|
|
The `Zbb` sub-extension adds the following instructions:
|
|
|
|
* `andn`, `orn`, `xnor`
|
|
* `clz`, `ctz`, `cpop`
|
|
* `max`, `maxu`, `min`, `minu`
|
|
* `sext.b`, `sext.h`, `zext.h`
|
|
* `rol`, `ror`, `rori`
|
|
* `orc.b`, `rev8`
|
|
|
|
The `Zba` sub-extension adds the following instructions:
|
|
|
|
* `sh1add`, `sh2add`, `sh3add`
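
For reference, the behavior of some of these instructions can be written down in plain C. This is a software sketch of the semantics defined by the RISC-V bit-manipulation specification, not the NEORV32 intrinsic library; the `ref_*` function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Zbb: logic operations with negated second operand */
uint32_t ref_andn(uint32_t rs1, uint32_t rs2) { return rs1 & ~rs2; }
uint32_t ref_orn(uint32_t rs1, uint32_t rs2)  { return rs1 | ~rs2; }
uint32_t ref_xnor(uint32_t rs1, uint32_t rs2) { return ~(rs1 ^ rs2); }

/* Zbb: count leading zero bits (32 for an all-zero operand) */
uint32_t ref_clz(uint32_t rs1)
{
    uint32_t n = 0;
    while (n < 32u && (rs1 & (UINT32_C(0x80000000) >> n)) == 0u) {
        n++;
    }
    return n;
}

/* Zbb: count trailing zero bits (32 for an all-zero operand) */
uint32_t ref_ctz(uint32_t rs1)
{
    uint32_t n = 0;
    while (n < 32u && (rs1 & (UINT32_C(1) << n)) == 0u) {
        n++;
    }
    return n;
}

/* Zbb: population count (number of set bits) */
uint32_t ref_cpop(uint32_t rs1)
{
    uint32_t n = 0;
    for (; rs1 != 0u; rs1 >>= 1) {
        n += rs1 & 1u;
    }
    return n;
}

/* Zbb: rotate left/right; only the lowest 5 bits of rs2 are used */
uint32_t ref_rol(uint32_t rs1, uint32_t rs2)
{
    uint32_t sa = rs2 & 31u;
    return (sa == 0u) ? rs1 : ((rs1 << sa) | (rs1 >> (32u - sa)));
}

uint32_t ref_ror(uint32_t rs1, uint32_t rs2)
{
    uint32_t sa = rs2 & 31u;
    return (sa == 0u) ? rs1 : ((rs1 >> sa) | (rs1 << (32u - sa)));
}

/* Zbb: reverse the byte order of the word */
uint32_t ref_rev8(uint32_t rs1)
{
    return (rs1 >> 24) | ((rs1 >> 8) & 0x0000FF00u) |
           ((rs1 << 8) & 0x00FF0000u) | (rs1 << 24);
}

/* Zba: shNadd computes rs2 + (rs1 << N) for N = 1, 2, 3 */
uint32_t ref_shadd(unsigned n, uint32_t rs1, uint32_t rs2)
{
    assert(n >= 1u && n <= 3u);
    return rs2 + (rs1 << n);
}
```

A typical use of `sh2add` is indexing an array of 32-bit words: `ref_shadd(2, index, base)` yields `base + 4 * index`.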
|
|
|
|
[TIP]
|
|
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
|
|
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
|
|
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
|
|
shift-related `B` instructions.
|
|
|
|
[WARNING]
|
|
The `B` extension is frozen but not officially ratified yet. There is no
|
|
software support for this extension in the upstream GCC RISC-V port yet. However, an
|
|
intrinsic library is provided to utilize the provided `B` extension features from C-language
|
|
code (see `sw/example/bitmanip_test`).
|
|
|
|
|
==== **`C`** - Compressed Instructions
|
==== **`C`** - Compressed Instructions
|
|
|
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.
|
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.
|
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
|
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
|
In this case the following instructions are available:
|
In this case the following instructions are available:
|
Line 532... |
Line 605... |
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
|
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
|
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
|
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
|
code (see `sw/example/floating_point_test`).
|
code (see `sw/example/floating_point_test`).
|
|
|
|
|
==== **`Zbb`** Basic Bit-Manipulation Operations
|
|
|
|
The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extensions `B`.
|
|
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
|
|
|
|
The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
|
|
generic is _true_. In this case the following instructions are available:
|
|
|
|
* `andn`, `orn`, `xnor`
|
|
* `clz`, `ctz`, `cpop`
|
|
* `max`, `maxu`, `min`, `minu`
|
|
* `sext.b`, `sext.h`, `zext.h`
|
|
* `rol`, `ror`, `rori`
|
|
* `orc.b`, `rev8`
|
|
|
|
[TIP]
|
|
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
|
|
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
|
|
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
|
|
shift-related `Zbb` instructions.
|
|
|
|
[WARNING]
|
|
The `Zbb` extension is frozen but not officially ratified yet. There is no
|
|
software support for this extension in the upstream GCC RISC-V port yet. However, an
|
|
intrinsic library is provided to utilize the provided `Zbb` extension from C-language
|
|
code (see `sw/example/bitmanip_test`).
|
|
|
|
|
|
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
|
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
|
|
|
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
|
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
|
is implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
|
is implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
|
In this case the following instructions are available:
|
In this case the following instructions are available:
|
Line 584... |
Line 629... |
[NOTE]
|
[NOTE]
|
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
|
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
|
`TW` (timeout wait) is hardwired to zero.
|
`TW` (timeout wait) is hardwired to zero.
|
|
|
|
|
|
|
|
|
|
==== **`Zicntr`** CPU Base Counters
|
|
|
|
The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
|
|
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
|
|
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
|
|
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.
|
|
|
|
[NOTE]
|
|
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.
|
|
|
|
If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
|
|
|
|
|
|
|
|
==== **`Zihpm`** Hardware Performance Monitors
|
|
|
|
In addition to the base cycle, instructions-retired and time counters, the NEORV32 CPU provides
|
|
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
|
|
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
|
|
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
|
|
CSR defines the architectural events that lead to an increment of the associated HPM counter.
|
|
|
|
The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.
|
|
|
|
Depending on the configuration the following additional CSRs are available:
|
|
|
|
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
|
|
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
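
Because each counter is split across two 32-bit CSRs, software has to combine the two halves itself. The sketch below shows the common high/low/high read sequence that guards against a carry from the low word into the high word between the two reads, plus truncation to an `HPM_CNT_WIDTH`-bit value. It is a host-side model: `sim_counter`, `sim_read_lo()`, `sim_read_hi()`, `read_counter64()` and `hpm_truncate()` are hypothetical names, and the simulated counter advances by one on every CSR read purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*csr_read_fn)(void);

/* Simulated free-running counter standing in for mhpmcounter*[h]. */
uint64_t sim_counter = 0;

/* Each simulated CSR read returns its half and then advances the counter. */
uint32_t sim_read_lo(void) { return (uint32_t)(sim_counter++ & 0xFFFFFFFFu); }
uint32_t sim_read_hi(void) { return (uint32_t)(sim_counter++ >> 32); }

/* Read a 64-bit counter from two 32-bit halves. The high word is read twice;
 * if it changed in between, the low word may have wrapped, so retry. */
uint64_t read_counter64(csr_read_fn read_lo, csr_read_fn read_hi)
{
    assert(read_lo != NULL && read_hi != NULL);

    uint32_t hi, lo, hi_check;
    do {
        hi = read_hi();
        lo = read_lo();
        hi_check = read_hi();
    } while (hi != hi_check);
    return ((uint64_t)hi << 32) | lo;
}

/* Keep only the lowest 'width' bits, as for an N-bit wide counter. */
uint64_t hpm_truncate(uint64_t value, unsigned width)
{
    assert(width <= 64u);
    if (width == 64u) {
        return value;
    }
    return value & ((UINT64_C(1) << width) - 1u);
}
```

On the target, the two function pointers would wrap CSR read accesses to a `mhpmcounter*` / `mhpmcounter*h` pair in machine mode.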
|
|
|
|
[IMPORTANT]
|
|
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
|
|
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
|
|
exception.
|
|
|
|
[TIP]
|
|
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
|
|
|
|
[TIP]
|
|
For a list of all HPM-related CSRs and all provided event configurations
|
|
see section <<_hardware_performance_monitors_hpm>>.
|
|
|
|
|
==== **`Zifencei`** Instruction Stream Synchronization
|
==== **`Zifencei`** Instruction Stream Synchronization
|
|
|
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
|
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
|
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
|
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
|
|
|
* `fence.i`
|
* `fence.i`
|
|
|
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
|
The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
|
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
|
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
|
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
|
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
|
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
|
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
|
|
|
|
|
Line 657... |
Line 746... |
[NOTE]
|
[NOTE]
|
For more information regarding RISC-V physical memory protection see the official _The RISC-V
|
For more information regarding RISC-V physical memory protection see the official _The RISC-V
|
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
|
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
|
|
|
|
|
==== **`HPM`** Hardware Performance Monitors
|
|
|
|
In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters, the NEORV32 CPU provides
|
|
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
|
|
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
|
|
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
|
|
CSR defines the architectural events that lead to an increment of the associated HPM counter.
|
|
|
|
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
|
|
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
|
|
the instructions-retired counter increments with each executed instruction. The actual hardware performance
|
|
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
|
|
available HPM is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will remove
|
|
all HPM logic from the design.
|
|
|
|
If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
|
|
according `mcountinhibit` CSR bits are hardwired to zero.
|
|
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
|
|
The according CSRs are read-only and will always return 0.
|
|
|
|
Depending on the configuration the following additional CSRs are available:
|
|
|
|
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
|
|
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
|
|
|
|
[IMPORTANT]
|
|
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
|
|
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
|
|
exception.
|
|
|
|
[TIP]
|
|
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
|
|
|
|
[TIP]
|
|
For a list of all HPM-related CSRs and all provided event configurations
|
|
see section <<_hardware_performance_monitors_hpm>>.
|
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
=== Instruction Timing
|
=== Instruction Timing
|
Line 725... |
Line 777... |
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
|
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
|
| Memory access | `C` | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 4 + ML
|
| Memory access | `C` | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 4 + ML
|
| Memory access | `A` | `lr.w` `sc.w` | 4 + ML
|
| Memory access | `A` | `lr.w` `sc.w` | 4 + ML
|
| Multiplication | `M` | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
|
| Multiplication | `M` | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
|
| Division | `M` | `div` `divu` `rem` `remu` | 22+32+4
|
| Division | `M` | `div` `divu` `rem` `remu` | 22+32+4
|
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
|
|
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
|
|
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
|
|
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
|
|
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
|
|
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
|
|
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
|
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
|
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
|
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
|
| System | `I/E` | `fence` | 3
|
| System | `I/E` | `fence` | 3
|
| System | `C`+`Zicsr` | `c.break` | 4
|
| System | `C`+`Zicsr` | `c.break` | 4
|
| System | `Zicsr` | `mret` `wfi` | 5
|
| System | `Zicsr` | `mret` `wfi` | 5
|
| System | `Zifencei` | `fence.i` | 5
|
| System | `Zifencei` | `fence.i` | 3 + ML
|
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
|
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
|
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
|
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
|
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
|
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
|
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
|
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
|
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
|
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
|
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
|
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
|
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
|
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
|
| Basic bit-manip - logic | `Zbb` | `andn` `orn` `xnor` | 3
|
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
|
| Basic bit-manip - shift | `Zbb` | `clz` `ctz` `cpop` `rol` `ror` `rori` | 4+SA, FAST_SHIFT: 4
|
| Bit-manipulation - arithmetic/logic | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
|
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
|
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
|
| Basic bit-manip - misc | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
|
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
|
|
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
|
|
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
|
|
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
|
|=======================
|
|=======================
|
|
|
[NOTE]
|
[NOTE]
|
The presented values of the *floating-point execution cycles* are average values - obtained from
|
The presented values of the *floating-point execution cycles* are average values - obtained from
|
4096 instruction executions using pseudo-random input values. The execution time for emulating the
|
4096 instruction executions using pseudo-random input values. The execution time for emulating the
|
instructions (using pure-software libraries) is ~17..140 times higher.
|
instructions (using pure-software libraries) is ~17..140 times higher.
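
The cycle counts in the table above can be turned into a rough execution-time estimate. The helper functions below simply encode a few table rows; their names are hypothetical, `ml` stands for the memory latency term `ML` and `sa` for the shift amount `SA` (using the `3 + SA` row for rotates).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory access (loads/stores): 4 + ML */
uint32_t cycles_mem(uint32_t ml) { return 4u + ml; }

/* Multiplication: 2+31+3, or 5 if FAST_MUL_EN is enabled */
uint32_t cycles_mul(bool fast_mul_en)
{
    return fast_mul_en ? 5u : (2u + 31u + 3u);
}

/* Division and remainder: 22+32+4 */
uint32_t cycles_div(void) { return 22u + 32u + 4u; }

/* Rotates (rol, ror, rori): 3 + SA */
uint32_t cycles_rotate(uint32_t sa)
{
    assert(sa < 32u); /* rv32 shift amounts are 0..31 */
    return 3u + sa;
}

/* Total cycles of a sequence made of n_mem memory accesses,
 * n_mul multiplications and n_div divisions. */
uint32_t cycles_sequence(uint32_t n_mem, uint32_t ml,
                         uint32_t n_mul, bool fast_mul_en,
                         uint32_t n_div)
{
    return n_mem * cycles_mem(ml) +
           n_mul * cycles_mul(fast_mul_en) +
           n_div * cycles_div();
}
```

With an assumed memory latency of one cycle, two loads, one multiplication (without `FAST_MUL_EN`) and one division add up to 2 × 5 + 36 + 58 = 104 cycles.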
|
|
|
|
|
|
|
// ####################################################################################################################
|
|
include::cpu_csr.adoc[]
|
|
|
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
include::cpu_csr.adoc[]
|
==== Full Virtualization
|
|
|
|
Just like the RISC-V ISA the NEORV32 aims to support _maximum virtualization_ capabilities
|
|
on CPU _and_ SoC level. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU
|
|
extension is enabled (implementing the full set of the privileged architecture).]
|
|
Thus, the CPU provides defined hardware fall-backs for any expected and unexpected situation (e.g. executing an
|
|
malformed instruction word or accessing a not-allocated address). For any kind of trap the core is always in a
|
|
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
|
|
have to be undone). This allows predictable execution behavior - and thus, defined operations to resolve the cause
|
|
of the trap - at any time improving overall _execution safety_.
|
|
|
|
**NEORV32-Specific Virtualization Features**
|
|
|
|
* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
|
|
(i.e. there is no speculative execution / no out-of-order states).
|
|
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
|
|
accessed address does not respond or encounters an internal error during access.
|
|
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional security feature,
|
|
the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions _do raise an illegal instruction trap_ and
|
|
_do not commit any operation_ (like writing registers or triggering memory operations).
|
|
* To be continued...
|
|
|
|
|
|
|
|
// ####################################################################################################################
:sectnums:

[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (i.e. the latency may exceed one cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (see section <<_internal_bus_monitor_buskeeper>>) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).

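
The response time window can be illustrated with a small behavioral model. This is only a sketch in C and not derived from the RTL: the function `bus_keeper_model` and its parameters are hypothetical, and it assumes that an acknowledge arriving in the last cycle of the window still counts as "in time". The exact boundary behavior is defined by the VHDL sources.

```c
#include <stdint.h>

/* Default of max_proc_int_response_time_c as stated in the text above. */
#define MAX_PROC_INT_RESPONSE_TIME 15u

typedef enum { BUS_ACK, BUS_TIMEOUT } bus_result_t;

/*
 * Hypothetical model of a bus monitor.
 * ack_cycle:  cycle after transfer initiation in which the accessed device
 *             acknowledges (1 = first cycle); 0 means "never responds",
 *             e.g. an access to an address space hole.
 * max_cycles: length of the response time window in cycles.
 * ASSUMPTION: an acknowledge in cycle max_cycles is still accepted.
 */
bus_result_t bus_keeper_model(uint32_t ack_cycle, uint32_t max_cycles) {
    for (uint32_t cycle = 1u; cycle <= max_cycles; cycle++) {
        if (cycle == ack_cycle) {
            return BUS_ACK;      /* transfer completed within the window */
        }
    }
    return BUS_TIMEOUT;          /* would raise a bus fault exception */
}
```

In this model a device with a latency of three cycles completes normally, while a device that never acknowledges produces `BUS_TIMEOUT` once the window has elapsed.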
**Exemplary Bus Accesses**

**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a write-back
of the processing result to some kind of memory. The initial state of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".

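
The pipeline example can be sketched as a small software model. It is written in C for illustration only (the actual design is VHDL), and all names (`pipeline_t`, `pipeline_reset`, `pipeline_clock`, `NUM_STAGES`) are hypothetical. The reset touches only the status flags, and the model shows that stale content in the data registers never reaches the write-back.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_STAGES 3

typedef struct {
    uint32_t data[NUM_STAGES];   /* uncritical: deliberately NOT reset   */
    bool     valid[NUM_STAGES];  /* critical: reset to "no valid data"   */
} pipeline_t;

/* Reset only the status flags; the data registers keep their old content. */
void pipeline_reset(pipeline_t *p) {
    for (int i = 0; i < NUM_STAGES; i++) {
        p->valid[i] = false;
    }
}

/*
 * Advance the pipeline by one clock cycle.
 * Returns true and stores the result in *out if the last stage held valid
 * data (this corresponds to the write-back); otherwise *out is untouched.
 */
bool pipeline_clock(pipeline_t *p, bool in_valid, uint32_t in_data, uint32_t *out) {
    bool write_back = p->valid[NUM_STAGES - 1];
    if (write_back) {
        *out = p->data[NUM_STAGES - 1];
    }
    /* Shift data and status towards the end of the pipeline. */
    for (int i = NUM_STAGES - 1; i > 0; i--) {
        p->data[i]  = p->data[i - 1];
        p->valid[i] = p->valid[i - 1];
    }
    p->data[0]  = in_data;
    p->valid[0] = in_valid;
    return write_back;
}
```

If the data registers are filled with an arbitrary pattern before `pipeline_reset` is called, the first write-back still delivers only the value that was fed in with `in_valid` set, since the stale data travels through the stages with its status flag cleared.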