Line 19... |
Line 19... |
** `Zifencei` - instruction stream synchronization
|
** `Zifencei` - instruction stream synchronization
|
** `Zmmul` - integer multiplication hardware
|
** `Zmmul` - integer multiplication hardware
|
** `PMP` - physical memory protection
|
** `PMP` - physical memory protection
|
** `HPM` - hardware performance monitors
|
** `HPM` - hardware performance monitors
|
** `DB` - debug mode
|
** `DB` - debug mode
|
* Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+)
|
* Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
|
* Official RISC-V open-source architecture ID
|
* Official RISC-V open-source architecture ID
|
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt
|
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
|
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
|
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
|
* Optional physical memory configuration (PMP), compatible to the RISC-V specifications
|
* Optional physical memory configuration (PMP), compatible to the RISC-V specifications
|
* Optional hardware performance monitors (HPM) for application benchmarking
|
* Optional hardware performance monitors (HPM) for application benchmarking
|
* Separated interfaces for instruction fetch and data access (merged into single bus via a bus switch for
|
* Separated interfaces for instruction fetch and data access (merged into single bus via a bus switch for
|
the NEORV32 processor)
|
the NEORV32 processor)
|
* little-endian byte order
|
* little-endian byte order
|
* Configurable hardware reset
|
* Configurable hardware reset
|
* No hardware support of unaligned data/instruction accesses – they will trigger an exception.
|
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
|
|
|
[NOTE]
|
[NOTE]
|
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
|
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
|
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
|
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
|
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
|
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
|
Line 51... |
Line 51... |
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
|
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
|
specifications. The following figure shows the simplified architecture of the CPU.
|
specifications. The following figure shows the simplified architecture of the CPU.
|
|
|
image::neorv32_cpu.png[align=center]
|
image::neorv32_cpu.png[align=center]
|
|
|
The CPU uses a pipelined architecture with basically two main stages. The first stage (IF – instruction fetch)
|
The CPU uses a pipelined architecture with basically two main stages. The first stage (IF - instruction fetch)
|
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
|
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
|
stored to a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
|
stored to a FIFO - the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
|
instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed
|
instruction words for the next pipeline stage. Compressed instructions - if enabled - are also decompressed
|
in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions
|
in this stage. The second stage (EX - execution) is responsible for actually executing the fetched instructions
|
via the execute engine.
|
via the execute engine.
|
|
|
These two pipeline stages are based on a multi-cycle processing engine. So the processing of each stage for
|
These two pipeline stages are based on a multi-cycle processing engine. So the processing of each stage for
|
certain operations can take several cycles. Since the IF and EX stages are decoupled via the instruction
|
certain operations can take several cycles. Since the IF and EX stages are decoupled via the instruction
|
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
|
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
|
Line 221... |
Line 221... |
|
|
.Hardwired R/W CSRs
|
.Hardwired R/W CSRs
|
[IMPORTANT]
|
[IMPORTANT]
|
The `misa`, `mip` and `mtval` CSRs in the NEORV32 are _read-only_.
|
The `misa`, `mip` and `mtval` CSRs in the NEORV32 are _read-only_.
|
Any write access to them (in machine mode) is ignored and will _not_ cause any exceptions or side-effects.
|
Any write access to them (in machine mode) is ignored and will _not_ cause any exceptions or side-effects.
|
|
Pending interrupts can only be cleared by acknowledging the interrupt-causing device. However, pending interrupts
|
|
can still be ignored by clearing the according `mie` register bits.
|
|
|
.Physical memory protection
|
.Physical memory protection
|
[IMPORTANT]
|
[IMPORTANT]
|
The physical memory protection (see section <<_machine_physical_memory_protection>>)
|
The physical memory protection (see section <<_machine_physical_memory_protection>>)
|
only supports the modes _OFF_ and _NAPOT_ yet and a minimal granularity of 8 bytes per region.
|
only supports the modes _OFF_ and _NAPOT_ yet and a minimal granularity of 8 bytes per region.
|
Line 335... |
Line 337... |
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
=== Instruction Sets and Extensions
|
=== Instruction Sets and Extensions
|
|
|
The NEORV32 is an RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA
|
The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
|
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
|
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
|
see _The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
|
see _The RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
|
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.
|
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.
|
|
|
[TIP]
|
[TIP]
|
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
|
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
|
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
|
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
|
or by executing an instruction and checking for an _illegal instruction exception_.
|
or by executing an instruction and checking for an _illegal instruction exception_.
|
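As a minimal sketch of the `misa`-based discovery mentioned in the tip above: the RISC-V privileged specification maps the extension letters `A` to `Z` to bits 0 to 25 of `misa`. The helper name `misa_has_ext` is hypothetical, and the function works on a value that has already been read from the CSR; the CSR read itself is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the NEORV32 software framework):
 * returns true if the extension letter 'A'..'Z' is flagged in a misa value.
 * Bit 0 corresponds to 'A', bit 25 to 'Z'. */
static bool misa_has_ext(uint32_t misa, char ext) {
    if (ext < 'A' || ext > 'Z') {
        return false; /* only upper-case single-letter extensions are encoded */
    }
    return ((misa >> (ext - 'A')) & 1u) != 0u;
}
```

Sub-extensions such as `Zicsr` or `Zifencei` have no `misa` bit, so they would have to be discovered via the SYSINFO register or by probing for an illegal instruction exception.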
|
|
[NOTE]
|
[NOTE]
|
Executing an instruction from an extension that is not implemented or not enabled (for example via the according
|
Executing an instruction from an extension that is not supported yet or that is currently not enabled
|
top entity generic) will raise an _illegal instruction_ exception.
|
(via the according top entity generic) will raise an _illegal instruction_ exception.
|
|
|
|
|
==== **`A`** - Atomic Memory Access
|
==== **`A`** - Atomic Memory Access
|
|
|
Atomic memory access instructions (for implementing semaphores and mutexes) are available when the
|
Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
|
`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions
|
The RISC-V specification defines a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
|
are available:
|
ISA extension is enabled if the `CPU_EXTENSION_RISCV_A` configuration generic is _true_.
|
|
In this case the following additional instructions are available:
|
|
|
* `lr.w`: load-reserved
|
* `lr.w`: load-reserved
|
* `sc.w`: store-conditional
|
* `sc.w`: store-conditional
|
|
|
[NOTE]
|
[NOTE]
|
Even though only `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
|
Even though only `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
|
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
|
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
|
instruction’s ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
|
instruction's ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
|
implemented) AMO (atomic memory operation) will trigger an illegal instruction exception.
|
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.
|
|
|
|
The *load-reserved* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
|
|
_data memory access lock_. Executing a *store-conditional* behaves as "normal" store-word instruction (`sw`) that will
|
|
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
|
|
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
|
|
After the execution of the `sc` instruction, the lock is automatically removed.
|
|
|
|
The lock is broken if at least one of the following conditions occurs:
|
|
. executing any data memory access instruction other than `lr.w`
|
|
. raising _any_ trap (for example an interrupt or a memory access exception)
|
|
|
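The lock behavior described above can be sketched as a small software model. This is an illustration only and not the CPU's actual implementation: the type `lrsc_model_t` and all function names are hypothetical, and the failure code `1` is an assumption (the text only states "non-zero").

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CPU-internal data memory access lock. */
typedef struct {
    bool locked;
} lrsc_model_t;

/* lr.w: acts like a normal load-word and additionally sets the lock. */
static uint32_t model_lr_w(lrsc_model_t *m, const uint32_t *addr) {
    m->locked = true;
    return *addr;
}

/* sc.w: writes only if the lock is still intact.
 * Returns 0 on success and non-zero (here: 1) if the lock was broken.
 * The lock is removed afterwards in both cases. */
static uint32_t model_sc_w(lrsc_model_t *m, uint32_t *addr, uint32_t value) {
    bool intact = m->locked;
    if (intact) {
        *addr = value;
    }
    m->locked = false;
    return intact ? 0u : 1u;
}

/* Any other data memory access or any trap breaks the lock. */
static void model_break_lock(lrsc_model_t *m) {
    m->locked = false;
}
```

In such a model, an atomic increment would loop over `model_lr_w` and `model_sc_w` until the store-conditional returns zero, which mirrors how further atomic operations can be emulated with `lr.w` and `sc.w`.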
[NOTE]
|
[NOTE]
|
The atomic instructions have special requirements for memory system / bus interconnect. More
|
The atomic instructions have special requirements for memory system / bus interconnect. More
|
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
|
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
|
|
|
|
|
==== **`C`** - Compressed Instructions
|
==== **`C`** - Compressed Instructions
|
|
|
Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is
|
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
|
_true_. In this case the following instructions are available:
|
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
|
|
In this case the following instructions are available:
|
|
|
* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
|
* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
|
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`
|
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`
|
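For orientation, the RISC-V encoding rule that separates 16-bit from 32-bit instructions can be sketched as follows: a half-word whose two lowest bits are not `11` is a compressed instruction. The helper name `is_compressed` is hypothetical; how the NEORV32 issue engine implements this check internally is not described here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: classify the first (lowest) half-word of an instruction.
 * 32-bit RISC-V instructions have bits [1:0] = 0b11; all other values of
 * bits [1:0] indicate a 16-bit compressed encoding. */
static bool is_compressed(uint16_t low_halfword) {
    return (low_halfword & 0x3u) != 0x3u;
}
```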
|
|
[NOTE]
|
[NOTE]
|
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ address require
|
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
|
an additional instruction fetch to load the required second half-word of that instruction. The performance can be increased
|
an additional instruction fetch to load the according second half-word of that instruction. The performance can be increased
|
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
|
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
|
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
|
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
|
|
|
|
|
==== **`E`** - Embedded CPU
|
==== **`E`** - Embedded CPU
|
|
|
The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware
|
The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
|
requirements. This extension is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond
|
decrease physical hardware requirements (for example block RAM). This extension is enabled when the `CPU_EXTENSION_RISCV_E`
|
`x15` will raise an _illegal instruction exception_.
|
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
|
|
This extension does not add any additional instructions or features.
|
|
|
[IMPORTANT]
|
[IMPORTANT]
|
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
|
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
|
|
|
|
|
==== **`I`** - Base Integer ISA
|
==== **`I`** - Base Integer ISA
|
|
|
The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
|
The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
|
regardless of the setting of the remaining extensions. The base instruction set includes the following
|
regardless of the setting of the remaining extensions. The base instruction set includes the following
|
instructions:
|
instructions:
|
|
|
* immediates: `lui`, `auipc`
|
* immediate: `lui`, `auipc`
|
* jumps: `jal`, `jalr`
|
* jumps: `jal`, `jalr`
|
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
|
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
|
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
|
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
|
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
|
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
|
* environment: `ecall`, `ebreak`, `fence`
|
* environment: `ecall`, `ebreak`, `fence`
|
Line 421... |
Line 437... |
executed. Any flags within the `fence` instruction word are ignored by the hardware.
|
executed. Any flags within the `fence` instruction word are ignored by the hardware.
|
|
|
|
|
==== **`M`** - Integer Multiplication and Division
|
==== **`M`** - Integer Multiplication and Division
|
|
|
Hardware-accelerated integer multiplication and division instructions are available when the
|
Hardware-accelerated integer multiplication and division operations are available when the
|
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
|
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
|
available:
|
available:
|
|
|
* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
|
* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
|
* division: `div`, `divu`, `rem`, `remu`
|
* division: `div`, `divu`, `rem`, `remu`
|
Line 438... |
Line 454... |
|
|
|
|
==== **`Zmmul`** - Integer Multiplication
|
==== **`Zmmul`** - Integer Multiplication
|
|
|
This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
|
This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
|
of the `M` extension and is intended for small scale applications, that require hardware-based
|
of the `M` extension and is intended for size-constrained setups that require hardware-based
|
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
|
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
|
This extension requires only ~50% of the hardware utilization of the `M` extension.
|
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.
|
|
|
* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
|
* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
|
|
|
If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
|
If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
|
will raise an _illegal instruction exception_.
|
will raise an _illegal instruction exception_.
|
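To illustrate what "computed entirely in software" means for divisions under `Zmmul`, here is a minimal shift-and-subtract sketch of an unsigned 32-bit division. It is not the routine the toolchain actually links (GCC normally calls its own library helpers such as `__udivsi3`); the names `udiv32_t` and `soft_udiv32` are hypothetical. The divide-by-zero result follows the RISC-V `divu`/`remu` convention (quotient all ones, remainder equal to the dividend).

```c
#include <stdint.h>

/* Hypothetical result type: quotient and remainder of an unsigned division. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv32_t;

/* Restoring (shift-and-subtract) division, one quotient bit per iteration. */
static udiv32_t soft_udiv32(uint32_t n, uint32_t d) {
    udiv32_t r = { 0u, 0u };
    if (d == 0u) {
        /* mirror the RISC-V divu/remu convention for division by zero */
        r.quot = UINT32_MAX;
        r.rem = n;
        return r;
    }
    for (int i = 31; i >= 0; i--) {
        uint32_t carry = r.rem >> 31;            /* bit shifted out of the remainder */
        r.rem = (r.rem << 1) | ((n >> i) & 1u);  /* bring down the next dividend bit */
        if (carry != 0u || r.rem >= d) {
            r.rem -= d;                          /* wraps correctly when carry is set */
            r.quot |= (1u << i);
        }
    }
    return r;
}
```

The loop needs 32 iterations regardless of the operands, which gives a rough idea of the cost compared to the hardware multiplier that `Zmmul` keeps.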
Line 452... |
Line 468... |
Note that `M` and `Zmmul` extensions _cannot_ be enabled at the same time.
|
Note that `M` and `Zmmul` extensions _cannot_ be enabled at the same time.
|
|
|
[TIP]
|
[TIP]
|
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
|
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
|
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
|
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
|
(example `$ make MARCH=-march=rv32im USER_FLAGS+=-mno-div clean_all exe`).
|
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
|
|
|
|
|
==== **`U`** - Less-Privileged User Mode
|
==== **`U`** - Less-Privileged User Mode
|
|
|
Adds the less-privileged _user mode_ if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For
|
In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second less-privileged
|
instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like
|
operation mode. It is implemented if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_.
|
peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode.
|
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
|
|
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
|
|
Any kind of privilege rights violation will raise an exception to allow full virtualization.
|
|
|
|
|
==== **`X`** - NEORV32-Specific (Custom) Extensions
|
==== **`X`** - NEORV32-Specific (Custom) Extensions
|
|
|
The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.
|
The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.
|
Line 475... |
Line 493... |
* All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see <<_full_virtualization>>).
|
* All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see <<_full_virtualization>>).
|
|
|
|
|
==== **`Zfinx`** Single-Precision Floating-Point Operations
|
==== **`Zfinx`** Single-Precision Floating-Point Operations
|
|
|
[WARNING]
|
The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
|
The NEORV32 `Zfinx` extension is specification-compliant and operational but still _experimental_.
|
The `Zfinx` extension also uses the integer register file `x` to store and operate on floating-point data
|
|
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
|
The `Zfinx` floating-point extension is an alternative to the `F` floating-point extension that also uses the
|
less hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
|
integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since no dedicated floating-point `f`
|
register file-related load/store or move instructions.
|
register file exists, the `Zfinx` extension requires less hardware resources and features faster context changes.
|
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
|
This also implies that there are NO dedicated `f` register file related load/store or move instructions. The
|
|
official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
|
|
|
|
|
[TIP]
|
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible to the _IEEE-754_ specifications.
|
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible to the _IEEE-754_ specifications.
|
|
|
The `Zfinx` extension only supports single-precision (`.s` suffix) yet (so it is a direct alternative to the `F`
|
The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
|
extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
|
to the `F` extension. The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
|
generic is _true_. In this case the following instructions and CSRs are available:
|
generic is _true_. In this case the following instructions and CSRs are available:
|
|
|
* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
|
* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
|
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
|
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
|
* computational: `fadd.s`, `fsub.s`, `fmul.s`
|
* computational: `fadd.s`, `fsub.s`, `fmul.s`
|
Line 503... |
Line 520... |
[WARNING]
|
[WARNING]
|
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
|
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
|
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
|
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
|
|
|
[WARNING]
|
[WARNING]
|
Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU.
|
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
|
Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the
|
Subnormal numbers (exponent = 0) are _flushed to zero_ setting them to +/- 0 before entering the
|
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
|
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
|
result is also flushed to zero during normalization.
|
result is also flushed to zero during normalization.
|
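The flush-to-zero rule above can be expressed on the raw IEEE-754 single-precision bit pattern. The following is a sketch of the described behavior, not the FPU's hardware logic; the function name `flush_subnormal` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the input flush: any value with an all-zero exponent
 * field (bits 30:23) is replaced by a zero that keeps the original sign.
 * Normal numbers, infinities and NaNs pass through unchanged. */
static uint32_t flush_subnormal(uint32_t bits) {
    const uint32_t exponent_mask = 0x7F800000u;
    const uint32_t sign_mask = 0x80000000u;
    if ((bits & exponent_mask) == 0u) {
        return bits & sign_mask; /* +0 or -0 */
    }
    return bits;
}
```

For example, the smallest positive subnormal `0x00000001` becomes `0x00000000`, while `0x3F800000` (1.0) is left untouched.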
|
|
[WARNING]
|
[WARNING]
|
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
|
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
|
Line 517... |
Line 534... |
code (see `sw/example/floating_point_test`).
|
code (see `sw/example/floating_point_test`).
|
|
|
|
|
==== **`Zbb`** Basic Bit-Manipulation Operations
|
==== **`Zbb`** Basic Bit-Manipulation Operations
|
|
|
[WARNING]
|
|
The NEORV32 `Zbb` extension is specification-compliant and operational but still _experimental_.
|
|
|
|
The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extensions `B`.
|
The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extensions `B`.
|
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
|
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
|
|
|
The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
|
The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
|
generic is _true_. In this case the following instructions are available:
|
generic is _true_. In this case the following instructions are available:
|
Line 539... |
Line 553... |
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
|
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
|
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
|
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
|
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
|
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
|
shift-related `Zbb` instructions.
|
shift-related `Zbb` instructions.
|
|
|
[IMPORTANT]
|
[WARNING]
|
The `Zbb` extension is frozen but not officially ratified yet. There is no
|
The `Zbb` extension is frozen but not officially ratified yet. There is no
|
software support for this extension in the upstream GCC RISC-V port yet. However, an
|
software support for this extension in the upstream GCC RISC-V port yet. However, an
|
intrinsic library is provided to utilize the provided `Zbb` extension from C-language
|
intrinsic library is provided to utilize the provided `Zbb` extension from C-language
|
code (see `sw/example/bitmanip_test`).
|
code (see `sw/example/bitmanip_test`).
|
|
|
|
|
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
|
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
|
|
|
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) is implemented when the
|
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
|
`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are
|
is implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
|
available:
|
In this case the following instructions are available:
|
|
|
* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
|
* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
|
* environment: `mret`, `wfi`
|
* environment: `mret`, `wfi`
|
|
|
[WARNING]
|
[WARNING]
|
If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception
|
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
|
support at all. In order to provide the full spectrum of functions and to allow a secure execution
|
In order to provide the full set of functions and to allow a secure execution
|
environment, the `Zicsr` extension should always be enabled.
|
environment the `Zicsr` extension should always be enabled.
|
|
|
[NOTE]
|
[NOTE]
|
The "wait for interrupt instruction" `wfi` works like a sleep command. When executed, the CPU is
|
The "wait for interrupt instruction" `wfi` works like a sleep command. When executed, the CPU is
|
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
|
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
|
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
|
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
|
|
|
[IMPORTANT]
|
[NOTE]
|
The `wfi` instruction will raise an illegal instruction exception when executed outside of machine-mode
|
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
|
and <<_mstatus>> bit `TW` (timeout wait) is set.
|
`TW` (timeout wait) is hardwired to zero.
|
|
|
|
|
==== **`Zifencei`** Instruction Stream Synchronization
|
==== **`Zifencei`** Instruction Stream Synchronization
|
|
|
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
|
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
|
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
|
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
|
|
|
* `fence.i`
|
* `fence.i`
|
|
|
[NOTE]
|
|
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
|
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
|
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
|
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
|
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
|
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
|
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
|
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
|
|
|
|
|
==== **`PMP`** Physical Memory Protection
|
==== **`PMP`** Physical Memory Protection
|
|
|
The NEORV32 physical memory protection (PMP) is compatible to the PMP specified by the RISC-V specs.
|
The NEORV32 physical memory protection (PMP) is compatible to the RISC-V PMP specifications. It can be used
|
The CPU PMP only supports _NAPOT_ mode yet and a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured
|
to constrain memory read/write/execute rights for each available privilege level.
|
via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the
|
|
`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available:
|
The NEORV32 PMP only supports _NAPOT_ mode yet and a minimal region size (granularity) of 8 bytes. Larger
|
|
minimal sizes can be configured via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements.
|
|
The physical memory protection system is implemented when the `PMP_NUM_REGIONS` configuration generic is >0.
|
|
In this case the following additional CSRs are available:
|
|
|
* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
|
* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
|
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
|
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
|
|
|
|
[TIP]
|
See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.
|
See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.
|
|
|
**Configuration**
|
|
|
|
The actual number of regions and the minimal region granularity are defined via the top entity
|
The actual number of regions and the minimal region granularity are defined via the top entity
|
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
|
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
|
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
|
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
|
number of available `pmpcfg*` and `pmpaddr*` CSRs.
|
number of available `pmpcfg*` and `pmpaddr*` CSRs.
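As a minimal sketch (not taken from the NEORV32 sources), the values for a NAPOT region could be computed as shown below. The helper names `pmp_napot_addr` and `pmp_napot_cfg` and the `PMP_*` constants are hypothetical; only the bit layout follows the RISC-V privileged specification. The sketch assumes a naturally aligned power-of-two region of at least 8 bytes, which matches the minimal granularity described above.

```c
#include <assert.h>
#include <stdint.h>

/* pmpcfg bit layout as defined by the RISC-V privileged specification */
enum {
    PMP_R       = 1 << 0, /* read permission */
    PMP_W       = 1 << 1, /* write permission */
    PMP_X       = 1 << 2, /* execute permission */
    PMP_A_NAPOT = 3 << 3, /* address matching mode: NAPOT */
    PMP_L       = 1 << 7  /* locked bit */
};

/* Hypothetical helper: encode a NAPOT region into a pmpaddr value.
 * 'size' has to be a power of two >= 8 and 'base' has to be aligned to 'size'. */
uint32_t pmp_napot_addr(uint32_t base, uint32_t size)
{
    assert(size >= 8u && (size & (size - 1u)) == 0u);
    assert((base & (size - 1u)) == 0u);
    /* pmpaddr holds the address shifted right by 2; the number of
     * trailing one-bits encodes the region size */
    return (base >> 2) | ((size >> 3) - 1u);
}

/* Hypothetical helper: build one 8-bit pmpcfg entry for a NAPOT region. */
uint8_t pmp_napot_cfg(uint8_t rwx, int locked)
{
    return (uint8_t)((rwx & 7u) | PMP_A_NAPOT | (locked ? PMP_L : 0));
}
```

For example, a 4096-byte region at `0x80000000` would be encoded as `pmp_napot_addr(0x80000000u, 4096u)`. Writing the results to the actual `pmpaddr*` and `pmpcfg*` CSRs is left out here.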
|
|
|
Line 618... |
Line 633... |
constant pmp_num_regions_critical_c : natural := 8;
|
constant pmp_num_regions_critical_c : natural := 8;
|
----
|
----
|
|
|
**Operation**
|
**Operation**
|
|
|
Any memory access address (from the CPU's instruction fetch or data access interface) is tested if it is accessing any
|
Any CPU memory access address (from the instruction fetch or data access interface) is tested if it is accessing _any_
|
of the specified (configured via `pmpaddr*` and enabled via `pmpcfg*`) PMP regions. If an
|
of the specified PMP regions (configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
|
address accesses one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked:
|
address matches one of these regions, the configured access rights (attributes in `pmpcfg*`) are enforced:
|
|
|
* a write access (store) will fail if no write attribute is set
|
* a write access (store) will fail if no write attribute is set
|
* a read access (load) will fail if no read attribute is set
|
* a read access (load) will fail if no read attribute is set
|
* an instruction fetch access will fail if no execute attribute is set
|
* an instruction fetch access will fail if no execute attribute is set
|
|
|
If an access to a protected region does not have the according access rights (attributes) it will raise the according
|
If an access to a protected region does not have the according access rights it will raise the according
|
_instruction/load/store access fault exception_.
|
instruction/load/store _access fault_ exception.
|
|
|
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
|
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
|
memory protection also for machine-level programs you need to activate the _locked bit_ in the according
|
memory protection also for machine-level programs you need to set the _locked bit_ in the according
|
`pmpcfg*` configuration.
|
`pmpcfg*` configuration CSR.
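The rules above can be illustrated with a small software model. This is only a sketch for a single region under stated assumptions, not a description of the actual hardware: the names `pmp_napot_match`, `pmp_access_permitted` and the `PMP_CFG_*` / `PMP_ACC_*` constants are hypothetical, the NAPOT decoding follows the RISC-V privileged specification, and the behavior for addresses that match no region is intentionally not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* pmpcfg permission and lock bits (RISC-V privileged specification) */
enum { PMP_CFG_R = 1 << 0, PMP_CFG_W = 1 << 1, PMP_CFG_X = 1 << 2, PMP_CFG_L = 1 << 7 };

typedef enum { PMP_ACC_READ, PMP_ACC_WRITE, PMP_ACC_EXEC } pmp_access_t;

/* Does byte address 'addr' fall into the NAPOT region encoded in 'pmpaddr'? */
bool pmp_napot_match(uint32_t pmpaddr, uint32_t addr)
{
    /* ones for all trailing one-bits of pmpaddr plus the zero bit above them */
    uint32_t mask = pmpaddr ^ (pmpaddr + 1u);
    return ((addr >> 2) & ~mask) == (pmpaddr & ~mask);
}

/* Is an access that already matched a region with configuration 'cfg' allowed? */
bool pmp_access_permitted(uint8_t cfg, pmp_access_t acc, bool machine_mode)
{
    /* machine-mode accesses are only checked if the region is locked */
    if (machine_mode && !(cfg & PMP_CFG_L)) {
        return true;
    }
    switch (acc) {
    case PMP_ACC_READ:  return (cfg & PMP_CFG_R) != 0; /* load needs R */
    case PMP_ACC_WRITE: return (cfg & PMP_CFG_W) != 0; /* store needs W */
    case PMP_ACC_EXEC:  return (cfg & PMP_CFG_X) != 0; /* fetch needs X */
    default:
        assert(!"unknown access type");
        return false;
    }
}
```

In this model a denied access corresponds to the instruction/load/store access fault exception raised by the CPU.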
|
|
|
[IMPORTANT]
|
[IMPORTANT]
|
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
|
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
|
internal (iterative) computations before the configuration becomes valid.
|
internal (iterative) computations before the configuration becomes valid.
|
|
|
[NOTE]
|
[NOTE]
|
For more information regarding RISC-V physical memory protection see the official _The RISC-V
|
For more information regarding RISC-V physical memory protection see the official _The RISC-V
|
Instruction Set Manual – Volume II: Privileged Architecture_ specifications.
|
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
|
|
|
|
|
==== **`HPM`** Hardware Performance Monitors
|
==== **`HPM`** Hardware Performance Monitors
|
|
|
In addition to the mandatory cycles (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
|
In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
|
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
|
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
|
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
|
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
|
`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
|
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
|
CSR defines the architectural events that lead to an increment of the associated HPM counter.
|
CSR defines the architectural events that lead to an increment of the associated HPM counter.
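Because each counter is split into a low-word and a high-word CSR, software that needs the full value has to cope with a carry between the two reads. The following sketch shows the common read-high/read-low/read-high pattern on a simulated counter. All names (`sim_hpm_t`, `sim_read_lo`, `sim_read_hi`, `hpm_read64`) are hypothetical stand-ins; on real hardware the two simulated functions would be replaced by CSR read instructions for the respective `mhpmcounter*` / `mhpmcounter*h` pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one HPM counter CSR pair */
typedef struct {
    uint64_t value; /* current counter state */
} sim_hpm_t;

/* Simulated read of the low word; every read advances the counter by one */
uint32_t sim_read_lo(sim_hpm_t *c)
{
    uint32_t v = (uint32_t)c->value;
    c->value++;
    return v;
}

/* Simulated read of the high word; every read advances the counter by one */
uint32_t sim_read_hi(sim_hpm_t *c)
{
    uint32_t v = (uint32_t)(c->value >> 32);
    c->value++;
    return v;
}

/* Read a consistent 64-bit value: retry if the high word changed in between */
uint64_t hpm_read64(sim_hpm_t *c)
{
    uint32_t hi, lo, hi2;
    assert(c != 0);
    do {
        hi  = sim_read_hi(c);
        lo  = sim_read_lo(c);
        hi2 = sim_read_hi(c);
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}
```

The retry loop only repeats when the low word overflows between the two high-word reads, so in practice it runs once almost every time.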
|
|
|
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
|
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
|
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
|
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
|
the instructions-retired counter increments with each executed instruction. The actual hardware performance
|
the instructions-retired counter increments with each executed instruction. The actual hardware performance
|
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
|
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
|
available HPM is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will exclude
|
available HPM is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will remove
|
all HPM logic from the design.
|
all HPM logic from the design.
|
|
|
Depending on the configuration, the following additional CSRs are available:
|
If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
|
|
according `mcountinhibit` CSR bits are hardwired to zero.
|
|
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
|
|
The according CSRs are read-only and will always return 0.
|
|
|
|
Depending on the configuration the following additional CSRs are available:
|
|
|
* counters: `mhpmcounter*[h]` (3..31, depending on configuration)
|
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
|
* event configuration: `mhpmevent*` (3..31, depending on configuration)
|
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
|
|
|
[IMPORTANT]
|
[IMPORTANT]
|
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
|
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
|
are always zero and read-only.
|
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
|
|
exception.
|
|
|
|
[TIP]
|
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
|
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
|
|
|
If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
|
[TIP]
|
according `mcountinhibit` CSR bits are hardwired to zero.
|
For a list of all HPM-related CSRs and all provided event configurations
|
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
|
see section <<_hardware_performance_monitors_hpm>>.
|
The according CSRs are read-only and will always return 0.
|
|
|
|
[NOTE]
|
|
For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>.
|
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
Line 733... |
Line 751... |
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
|
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
|
| Basic bit-manip - misc | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
|
| Basic bit-manip - misc | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
|
|=======================
|
|=======================
|
|
|
[NOTE]
|
[NOTE]
|
The presented values of the *floating-point execution cycles* are average values – obtained from
|
The presented values of the *floating-point execution cycles* are average values - obtained from
|
4096 instruction executions using pseudo-random input values. The execution time for emulating the
|
4096 instruction executions using pseudo-random input values. The execution time for emulating the
|
instructions (using pure-software libraries) is ~17..140 times higher.
|
instructions (using pure-software libraries) is ~17..140 times higher.
|
|
|
|
|
|
|
Line 764... |
Line 782... |
|
|
* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
|
* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
|
(i.e. there is no speculative execution / no out-of-order states).
|
(i.e. there is no speculative execution / no out-of-order states).
|
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
|
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
|
accessed address does not respond or encounters an internal error during access.
|
accessed address does not respond or encounters an internal error during access.
|
* The CPU raises an illegal instruction trap for _all_ unimplemented/malformed/illegal instructions.
|
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional security feature,
|
|
the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions _do raise an illegal instruction trap_ and
|
|
_do not commit any operation_ (like writing registers or triggering memory operations).
|
* To be continued...
|
* To be continued...
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
Line 789... |
Line 809... |
The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
|
The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
|
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
|
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
|
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
|
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
|
the second highest priority will get serviced and so on until no further interrupts are pending.
|
the second highest priority will get serviced and so on until no further interrupts are pending.
|
|
|
.RISC-V interrupts
|
.Interrupt Signal Requirements
|
[IMPORTANT]
|
[IMPORTANT]
|
All RISC-V defined machine level interrupt request signals are high-active. A request has to stay at high-level until
|
All interrupt request signals (including FIRQs) are **high-active**. A request has to stay at high-level (=asserted)
|
it is acknowledged by the CPU (for example by writing to a specific memory-mapped register).
|
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).
|
|
|
.Instruction Atomicity
|
.Instruction Atomicity
|
[NOTE]
|
[NOTE]
|
All instructions execute as atomic operations – interrupts can only trigger between two instructions.
|
All instructions execute as atomic operations - interrupts can only trigger between two instructions.
|
So if there is a permanent interrupt request, exactly one instruction from the interrupt program will be executed before
|
So if there is a permanent interrupt request, exactly one instruction from the interrupt program will be executed before
|
a new interrupt handler can start.
|
a new interrupt handler can start.
|
|
|
|
|
:sectnums:
|
:sectnums:
|
Line 815... |
Line 835... |
:sectnums:
|
:sectnums:
|
==== Custom Fast Interrupt Request Lines
|
==== Custom Fast Interrupt Request Lines
|
|
|
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
|
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
|
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
|
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
|
provide custom trap codes in `mcause`. These FIRQs are reserved for processor-internal usage only.
|
provide custom trap codes in `mcause`. These FIRQs are reserved for NEORV32 processor-internal usage only.
|
|
|
[NOTE]
|
|
The fast interrupt request lines trigger on a **rising-edge**.
|
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums!:
|
:sectnums!:
|
Line 892... |
Line 910... |
===== Address Space
|
===== Address Space
|
|
|
The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
|
The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
|
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
|
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
|
system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU
|
system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU
|
does not support unaligned memory accesses _in hardware_ – however, a software-based handling can be
|
does not support unaligned memory accesses _in hardware_ - however, a software-based handling can be
|
implemented as any unaligned memory access will trigger an according exception.
|
implemented as any unaligned memory access will trigger an according exception.
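A software-based handling could, for instance, replace the faulting word access by four byte accesses. The following sketch shows only that core step for a 32-bit load; the function name `load_u32_unaligned` is hypothetical, and the surrounding trap handler (decoding the faulting instruction, writing the result back to the destination register) is not shown. It assumes the little-endian byte order used by RISC-V.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 32-bit little-endian word from four single-byte reads,
 * so the access works for any alignment of 'p'. */
uint32_t load_u32_unaligned(const uint8_t *p)
{
    assert(p != 0);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Byte accesses are always aligned, so this routine itself cannot trigger another misaligned-access exception.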
|
|
|
:sectnums:
|
:sectnums:
|
===== Interface Signals
|
===== Interface Signals
|
|
|