:sectnums:
== NEORV32 Central Processing Unit (CPU)
 
image::neorv32_cpu_block.png[width=600,align=center]
 
**Key Features**
 
* 32-bit multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zicntr` - CPU base counters
** `Zihpm` - hardware performance monitors
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `Zxcfu` - custom instructions extension
** `PMP` - physical memory protection
** `Debug` - debug mode (part of the on-chip debugger) including hardware trigger module
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
* Supports _all_ of the machine-level traps from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
** This contributes to _execution safety_ (see <<_full_virtualization>>)
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate interfaces for instruction fetch and data access (merged into a single processor bus)
* Little-endian byte order
* Configurable hardware reset
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
 
[NOTE]
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course, you can also use the CPU in its true stand-alone mode.
 
[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.
 
<<<
// ####################################################################################################################
:sectnums:
=== Architecture
 
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.
 
image::neorv32_cpu.png[align=center]
 
The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
front-end can already fetch new instructions while the back-end is still processing previously fetched instructions.
 
The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
data is stored to a FIFO queue - the instruction prefetch buffer.
 
The back-end is responsible for the actual execution of the instruction. It includes an "issue engine",
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
instruction or decompressed 16-bit instructions) for execution.
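
The interplay of fetch chunks and instruction assembly can be illustrated with a small behavioral sketch. It is not derived from the RTL: the function names are hypothetical, and it only relies on the RISC-V encoding rule that an instruction whose two lowest bits are `0b11` is 32 bits wide, while any other value marks a 16-bit compressed instruction. Decompression itself is left out.

```python
def split_fetch_words(words):
    """Split 32-bit fetch chunks into 16-bit parcels in address order."""
    parcels = []
    for word in words:
        parcels.append(word & 0xFFFF)          # lower half-word sits at the lower address
        parcels.append((word >> 16) & 0xFFFF)  # upper half-word follows
    return parcels


def assemble_instructions(parcels):
    """Group 16-bit parcels into (width_in_bits, raw_instruction) tuples."""
    instructions = []
    i = 0
    while i < len(parcels):
        low = parcels[i] & 0xFFFF
        if (low & 0b11) == 0b11:
            # 32-bit instruction: the upper half-word is the next parcel
            if i + 1 >= len(parcels):
                break  # upper half not fetched yet, wait for more data
            high = parcels[i + 1] & 0xFFFF
            instructions.append((32, (high << 16) | low))
            i += 2
        else:
            # 16-bit compressed instruction
            instructions.append((16, low))
            i += 1
    return instructions
```

For example, the two fetch chunks `0x00130001` and `0x00010000` contain a `c.nop` (`0x0001`), an unaligned 32-bit `nop` (`0x00000013`) that straddles both chunks, and a second `c.nop`.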
 
Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
when the CPU front-end has to reload the prefetch buffer due to a taken branch.
 
Basically, the NEORV32 CPU sits somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction (_including_ fetch) in a series of consecutive micro-operations. Combining
these two classical design paradigms yields a higher instruction throughput than a pure multi-cycle
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
multi-cycle concept).
 
The CPU provides independent interfaces for instruction fetch and data access. Nevertheless, it is a
von Neumann machine: these two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
have higher priority). Hence, _all_ memory locations including peripheral devices are mapped to a single unified 32-bit
address space.
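
The fixed-priority selection of the bus switch can be sketched as follows. This is a simplified single-cycle model with a hypothetical function name; it does not describe how the real switch handles transfers that are already in flight.

```python
def bus_switch(data_request, instruction_request):
    """Pick the request that is forwarded to the shared bus.

    Each argument is either None (interface idle) or the requested address.
    Data accesses have higher priority than instruction fetches.
    """
    if data_request is not None:
        return ("data", data_request)
    if instruction_request is not None:
        return ("instruction", instruction_request)
    return None  # bus stays idle
```

If both interfaces request an access at the same time, the instruction fetch simply has to wait until the data access has been served.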
 
 
// ####################################################################################################################
:sectnums:
=== Full Virtualization
 
Just like the RISC-V ISA, the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU and SoC level to
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.
footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
malformed instruction or accessing a non-allocated memory address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
might have to be reverted). This allows a defined and predictable execution behavior at any time, improving overall execution safety.
 
**Execution Safety - NEORV32 Virtualization Features**
 
* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
accessed address does not respond or encounters an internal device error during access.
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
window. Otherwise a bus access exception is raised.
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions raise an
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
memory operations).
* To be continued...
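
The acknowledged bus access with its fixed response window can be modeled roughly like this. The names `bus_access` and `BusAccessFault` as well as the default window of 15 cycles are assumptions made for this sketch only; the actual window length is defined by the hardware configuration.

```python
class BusAccessFault(Exception):
    """Stands in for a RISC-V bus access exception."""


def bus_access(device_latency, timeout_cycles=15, device_error=False):
    """Return the number of cycles until the access was acknowledged.

    device_latency: cycles until the device responds, or None if no device
                    is mapped to the accessed address.
    timeout_cycles: hypothetical length of the response window.
    device_error:   the device terminates the transfer with an error.
    """
    for cycle in range(1, timeout_cycles + 1):
        if device_latency is not None and cycle >= device_latency:
            if device_error:
                raise BusAccessFault("device signalled an error")
            return cycle  # acknowledged in time
    raise BusAccessFault("no response within the time window")
```

In every failing case the access ends in a defined trap instead of stalling the CPU forever.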
 
 
// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility
 
The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.
 
[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.
 
.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................
 
.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
Check fence-01          ... OK
--------------------------------
OK: 39/39 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................
 
.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................
 
.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................
 
.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................
 
 
<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations
 
This list shows the currently identified issues regarding full RISC-V-compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.
 
.Read-Only "Read-Write" CSRs
[IMPORTANT]
The <<_misa>> and <<_mtval>> CSRs in the NEORV32 are _read-only_.
Any machine-mode write access to them is ignored and will _not_ cause any exceptions or side-effects to maintain
RISC-V compatibility.
 
.Physical Memory Protection
[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection_csrs>>)
currently supports only the modes _OFF_ and _NAPOT_, with a minimum granularity of 8 bytes per region.
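
As a reminder of how _NAPOT_ (naturally aligned power-of-two) regions are encoded according to the RISC-V privileged specification, the following sketch converts between a region and its `pmpaddr` value. The helper names are hypothetical, and the sketch makes no statement about how many address bits the NEORV32 actually implements.

```python
def napot_encode(base, size):
    """Return the pmpaddr value for a NAPOT region (size in bytes)."""
    if size < 8 or size & (size - 1):
        raise ValueError("size must be a power of two of at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to the region size")
    # pmpaddr holds address bits [33:2]; the number of trailing ones encodes the size
    return ((base >> 2) | ((size >> 3) - 1)) & 0xFFFFFFFF


def napot_decode(pmpaddr):
    """Return (base, size) of the NAPOT region described by pmpaddr."""
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    base = (pmpaddr & ~((1 << (trailing_ones + 1)) - 1)) << 2
    return base, size
```

The smallest NAPOT region (no trailing ones) spans 8 bytes, which matches the minimum granularity stated above.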
 
.Atomic Memory Operations
[IMPORTANT]
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.
 
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals
 
The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.
 
.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir.   | Function
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
| `debug_o`        |     1 | out | CPU is in debug mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out | destination address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out | destination address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input (from MTIME)
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics
 
Most of the CPU configuration generics are a subset of the actual Processor configuration generics
(see section <<_processor_top_entity_generics>>) and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.
 
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | -
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======
 
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | -
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======
 
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | -
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======
 
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions
 
The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see the _RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.
 
.Discovering ISA Extensions
[TIP]
The CPU can discover available ISA extensions via the <<_misa>> & <<_mxisa>> CSRs
or by executing an instruction and checking for an _illegal instruction exception_
(-> <<_full_virtualization>>). +
 +
Executing an instruction from an extension that is not supported yet or that is currently not enabled
(via the according top entity generic) will raise an illegal instruction exception.
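
Decoding the standard `misa` CSR is straightforward: according to the RISC-V privileged specification, bits 0 to 25 correspond to the extension letters `A` to `Z` and the two topmost bits of the 32-bit register hold the base ISA width (`MXL`, where 1 means 32-bit). The following sketch uses a hypothetical helper name; the layout of the NEORV32-specific `mxisa` CSR is not covered here.

```python
def decode_misa(misa):
    """Return (mxl, letters) for a 32-bit misa value."""
    letters = "".join(
        chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1
    )
    mxl = (misa >> 30) & 0b11  # 1 = 32-bit base ISA
    return mxl, letters
```

A value with the bits for `C`, `I`, `M`, `U` and `X` set therefore decodes to the string `CIMUX`.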
 
 
==== **`A`** - Atomic Memory Access
 
Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
The RISC-V specs. define a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
ISA extension is enabled if the <<_cpu_extension_riscv_a>> configuration generic is _true_.
In this case the following additional instructions are available:
 
* `lr.w`: load-reserved
* `sc.w`: store-conditional
 
[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.
 
The *load-reserved* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
_data memory access lock_. A *store-conditional* behaves as a "normal" store-word instruction (`sw`) that will
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
After the execution of the `sc.w` instruction, the lock is automatically removed.
 
The lock is broken if at least one of the following conditions occurs:

. executing any data memory access instruction other than `lr.w`
. raising _any_ trap (for example an interrupt or a memory access exception)
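
The lock behavior described above, and the way a missing AMO can be emulated on top of it, can be sketched with a small behavioral model. The class and function names are hypothetical, the model uses a single CPU-internal lock without any address tracking, and the `interrupt_first_try` parameter only exists to demonstrate a broken lock.

```python
class Core:
    """Minimal model of the data memory access lock."""

    def __init__(self):
        self.mem = {}
        self.lock = False

    def lr_w(self, addr):
        self.lock = True                 # load-reserved sets the lock
        return self.mem.get(addr, 0)

    def sc_w(self, addr, value):
        success = self.lock
        if success:                      # write only if the lock is still intact
            self.mem[addr] = value & 0xFFFFFFFF
        self.lock = False                # the lock is always removed afterwards
        return 0 if success else 1

    def lw(self, addr):
        self.lock = False                # any other data access breaks the lock
        return self.mem.get(addr, 0)

    def sw(self, addr, value):
        self.lock = False
        self.mem[addr] = value & 0xFFFFFFFF

    def trap(self):
        self.lock = False                # any trap breaks the lock


def atomic_add(core, addr, increment, interrupt_first_try=False):
    """Emulate an atomic fetch-and-add with an lr.w/sc.w retry loop."""
    while True:
        old = core.lr_w(addr)
        if interrupt_first_try:
            core.trap()                  # simulate an interrupt between lr.w and sc.w
            interrupt_first_try = False
        if core.sc_w(addr, old + increment) == 0:
            return old                   # store succeeded, return the previous value
```

If the lock is broken between the two instructions, `sc.w` reports a failure and the loop simply retries the whole sequence.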
 
[NOTE]
The atomic instructions have special requirements for memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
 
 
==== **`B`** - Bit-Manipulation Operations
 
The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
<<_cpu_extension_riscv_b>> configuration generic is _true_.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
A copy of the spec is also available in `docs/references`.
 
The NEORV32 `B` ISA extension includes the following sub-extensions (according to the RISC-V
bit-manipulation spec. v0.93) and their corresponding instructions:
 
* **`Zba` - Address-generation instructions**
** `sh1add` `sh2add` `sh3add`
* **`Zbb` - Basic bit-manipulation instructions**
** `andn` `orn` `xnor`
** `clz` `ctz` `cpop`
** `max` `maxu` `min` `minu`
** `sext.b` `sext.h` `zext.h`
** `rol` `ror` `rori`
** `orc.b` `rev8`
* **`Zbc` - Carry-less multiplication instructions**
** `clmul` `clmulh` `clmulr`
* **`Zbs` - Single-bit instructions**
** `bclr` `bclri`
** `bext` `bexti`
** `binv` `binvi`
** `bset` `bseti`
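
For readers who are not familiar with the mnemonics, the following reference functions sketch the 32-bit semantics of a few of the instructions listed above, following the RISC-V bit-manipulation specification. They are plain software models meant for checking results, not a description of the hardware implementation.

```python
MASK32 = 0xFFFFFFFF


def clz(x):
    """Count leading zero bits (32 for x = 0)."""
    return 32 - (x & MASK32).bit_length()


def ctz(x):
    """Count trailing zero bits (32 for x = 0)."""
    x &= MASK32
    return 32 if x == 0 else (x & -x).bit_length() - 1


def cpop(x):
    """Count set bits."""
    return bin(x & MASK32).count("1")


def andn(rs1, rs2):
    """AND with inverted second operand."""
    return rs1 & ~rs2 & MASK32


def rol(x, shamt):
    """Rotate left by the lower 5 bits of shamt."""
    x &= MASK32
    s = shamt & 31
    return ((x << s) | (x >> (32 - s))) & MASK32


def sh1add(rs1, rs2):
    """Shift rs1 left by one and add rs2."""
    return ((rs1 << 1) + rs2) & MASK32


def rev8(x):
    """Reverse the byte order."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def clmul(rs1, rs2):
    """Carry-less multiplication, lower 32 bits of the product."""
    result = 0
    for i in range(32):
        if (rs2 >> i) & 1:
            result ^= rs1 << i
    return result & MASK32


def bext(rs1, rs2):
    """Extract the single bit of rs1 selected by the lower 5 bits of rs2."""
    return (rs1 >> (rs2 & 31)) & 1
```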
 
[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
shift-related `B` instructions.
 
[WARNING]
The `B` extension is frozen and officially ratified. However, there is no
software support for this extension in the upstream GCC RISC-V port yet. An
intrinsic library is provided to utilize the provided `B` extension features from C-language
code (see `sw/example/bitmanip_test`) to circumvent this.
 
 
==== **`C`** - Compressed Instructions
 
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
The `C` extension is available when the <<_cpu_extension_riscv_c>> configuration generic is _true_.
In this case the following instructions are available:
 
* `c.addi4spn` `c.lw` `c.sw` `c.nop` `c.addi` `c.jal` `c.li` `c.addi16sp` `c.lui` `c.srli` `c.srai` `c.andi` `c.sub`
`c.xor` `c.or` `c.and` `c.j` `c.beqz` `c.bnez` `c.slli` `c.lwsp` `c.jr` `c.mv` `c.ebreak` `c.jalr` `c.add` `c.swsp`
 
[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the according second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
 
 
==== **`E`** - Embedded CPU
 
The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
decrease physical hardware requirements (for example block RAM). This extension is enabled when the <<_cpu_extension_riscv_e>>
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
This extension does not add any additional instructions or features.
 
[NOTE]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
 
 
==== **`I`** - Base Integer ISA
 
The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:
 
* immediate: `lui` `auipc`
* jumps: `jal` `jalr`
* branches: `beq` `bne` `blt` `bge` `bltu` `bgeu`
* memory: `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw`
* alu: `addi` `slti` `sltiu` `xori` `ori` `andi` `slli` `srli` `srai` `add` `sub` `sll` `slt` `sltu` `xor` `srl` `sra` `or` `and`
* environment: `ecall` `ebreak` `fence`
 
[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter if the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
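
The difference between the two shifter options can be illustrated with a behavioral sketch. It assumes that the bit-serial unit shifts by exactly one bit position per cycle and it ignores the fixed overhead mentioned above; both function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def serial_sll(value, shamt):
    """Shift left one bit position per iteration, as a bit-serial unit would.

    Returns (result, cycles); the cycle count grows with the shift amount.
    """
    value &= MASK32
    cycles = 0
    for _ in range(shamt & 31):  # RV32 only uses the lower 5 bits of the shift amount
        value = (value << 1) & MASK32
        cycles += 1
    return value, cycles


def barrel_sll(value, shamt):
    """Compute the same result in a single step, like a barrel shifter."""
    return ((value & MASK32) << (shamt & 31)) & MASK32
```

Both functions always return the same result; only the time needed to compute it differs.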
 
[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.
 
 
==== **`M`** - Integer Multiplication and Division
 
Hardware-accelerated integer multiplication and division operations are available when the
<<_cpu_extension_riscv_m>> configuration generic is _true_. In this case the following instructions are
available:
 
* multiplication: `mul` `mulh` `mulhsu` `mulhu`
* division: `div` `divu` `rem` `remu`
 
[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete - regardless of the input operands.
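
Why a bit-serial multiplier needs the same number of cycles for all operands can be seen in a simple shift-and-add sketch. It assumes one iteration per multiplier bit and models unsigned operands only; the function name and the returned iteration counter are purely illustrative.

```python
MASK32 = 0xFFFFFFFF


def serial_mul(a, b):
    """Unsigned 32x32-bit shift-and-add multiplication.

    Returns (low_word, high_word, iterations). The low word corresponds to
    the result of `mul`, the high word to the result of `mulhu`.
    """
    a &= MASK32
    b &= MASK32
    product = 0
    iterations = 0
    for i in range(32):          # always 32 iterations, one per multiplier bit
        if (b >> i) & 1:
            product += a << i    # add the shifted multiplicand
        iterations += 1
    return product & MASK32, product >> 32, iterations
```

Even a multiplication by zero runs through all 32 iterations, which makes the execution time independent of the input data.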
 
 
==== **`Zmmul`** - Integer Multiplication
 
This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.
It is implemented if the <<_cpu_extension_riscv_zmmul>> configuration generic is _true_.
 
* multiplication: `mul` `mulh` `mulhsu` `mulhu`
 
If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.
 
Note that `M` and `Zmmul` extensions _cannot_ be enabled at the same time.
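
A software division routine is normally provided by the toolchain's runtime library, so nothing has to be written by hand. Purely for illustration, the following sketch shows a restoring division that reproduces the results the RISC-V specification defines for `div[u]` and `rem[u]`, including division by zero and signed overflow. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit value as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def divu_remu(n, d):
    """Unsigned restoring division, returns (quotient, remainder)."""
    n &= MASK32
    d &= MASK32
    if d == 0:
        return MASK32, n             # RISC-V: all-ones quotient, dividend as remainder
    q = r = 0
    for i in range(31, -1, -1):
        r = (r << 1) | ((n >> i) & 1)
        if r >= d:
            r -= d
            q |= 1 << i
    return q, r


def div_rem(n, d):
    """Signed division rounding towards zero, returns 32-bit patterns."""
    sn, sd = to_signed(n), to_signed(d)
    if sd == 0:
        return MASK32, n & MASK32    # quotient -1, remainder = dividend
    q, r = divu_remu(abs(sn), abs(sd))
    if (sn < 0) != (sd < 0):
        q = -q                       # quotient is negative if the signs differ
    if sn < 0:
        r = -r                       # remainder takes the sign of the dividend
    return q & MASK32, r & MASK32
```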
 
[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
 
 
==== **`U`** - Less-Privileged User Mode
 
In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second less-privileged
operation mode. It is implemented if the <<_cpu_extension_riscv_u>> configuration generic is _true_.
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
Any kind of privilege rights violation will raise an exception to allow <<_full_virtualization>>.
 
 
==== **`X`** - NEORV32-Specific (Custom) Extensions
 
The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the <<_misa>> CSR.
 
The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the <<_mie>>
and <<_mip>> CSRs. This extension is mapped to CSR bits that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for <<_mcause>> are implemented.
* All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see <<_full_virtualization>>).
* There are <<_neorv32_specific_csrs>>.
 
 
==== **`Zfinx`** Single-Precision Floating-Point Operations
 
The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
fewer hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
register file-related load/store or move instructions.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
 
[NOTE]
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.
 
The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
to the `F` extension. The `Zfinx` extension is implemented when the <<_cpu_extension_riscv_zfinx>> configuration
generic is _true_. In this case the following instructions and CSRs are available:
 
* conversion: `fcvt.s.w` `fcvt.s.wu` `fcvt.w.s` `fcvt.wu.s`
* comparison: `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s`
* computational: `fadd.s` `fsub.s` `fmul.s`
* sign-injection: `fsgnj.s` `fsgnjn.s` `fsgnjx.s`
* number classification: `fclass.s`
 
* additional CSRs: <<_fcsr>>, <<_frm>>, <<_fflags>>
 
[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
 
[WARNING]
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_ setting them to +/- 0 before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
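
The flush-to-zero rule can be expressed directly on the raw IEEE-754 single-precision bit pattern: if the 8-bit exponent field is zero, only the sign bit is kept. The function name below is hypothetical and the sketch only covers the input side of the FPU.

```python
def flush_subnormal(bits):
    """Replace a subnormal single-precision bit pattern with +/- 0."""
    bits &= 0xFFFFFFFF
    exponent = (bits >> 23) & 0xFF
    if exponent == 0:
        return bits & 0x80000000  # keep the sign bit only
    return bits                   # normal numbers, infinities and NaNs pass unchanged
```

For example, the smallest positive subnormal number `0x00000001` becomes `+0`, while `1.0` (`0x3F800000`) is passed on unmodified.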
 
[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).
 
 
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
 
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
are implemented when the <<_cpu_extension_riscv_zicsr>> configuration generic is _true_.
 
[IMPORTANT]
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
In order to provide the full set of privileged functions that are required to run more complex tasks like
operating systems and to allow a secure execution environment the `Zicsr` extension should always be enabled.
 
If the `Zicsr` extension is enabled, the following instructions are available:
 
* CSR access: `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci`
* environment: `mret` `wfi`
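
The read-modify-write behavior of the CSR access instructions, as defined by the RISC-V specification, can be summarized in a short sketch. The function name is hypothetical, and privilege checks, read-only CSRs and the `rd=x0` / `rs1=x0` special cases are intentionally left out. The immediate variants (`csrrwi`, `csrrsi`, `csrrci`) behave the same way with a zero-extended 5-bit immediate as operand.

```python
MASK32 = 0xFFFFFFFF


def csr_op(csr_value, op, operand):
    """Return (value written to rd, new CSR value) for a CSR access."""
    old = csr_value & MASK32
    operand &= MASK32
    if op == "csrrw":
        new = operand                  # write: replace the whole CSR
    elif op == "csrrs":
        new = old | operand            # set: operand bits are set
    elif op == "csrrc":
        new = old & ~operand & MASK32  # clear: operand bits are cleared
    else:
        raise ValueError("unknown CSR operation: " + op)
    return old, new
```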
 
[NOTE]
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.
However, access privileges are still enforced so these instruction variants _do_ cause side-effects
(the RISC-V spec. states that these combinations "_shall_ not cause any side-effects").
 
[NOTE]
The "wait for interrupt instruction" `wfi` acts like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
be enabled via the <<_mie>> CSR and the global interrupt enable flag in <<_mstatus>> has to be set.
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
`TW` (timeout wait) is _hardwired_ to zero.
 
 
 
==== **`Zicntr`** CPU Base Counters
 
The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.
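
Since each 64-bit counter is split into a low-word and a high-word 32-bit CSR, software has to combine two
reads and guard against a carry between them. The following sketch shows the common high-low-high read
sequence. The function-pointer hooks and the `sim_*` helpers are hypothetical stand-ins for the actual CSR
read instructions so that the example can run on a host; they are not part of the NEORV32 software framework.

```c
#include <stdint.h>

/* Hypothetical CSR read hook, e.g. wrapping "csrr rd, cycle" or "csrr rd, cycleh". */
typedef uint32_t (*csr_read_fn)(void);

/* Read a 64-bit counter from its two 32-bit halves.
 * The read is repeated if the high word changed in between (low word overflowed). */
uint64_t counter_read64(csr_read_fn read_lo, csr_read_fn read_hi) {
  uint32_t hi, lo, hi2;
  do {
    hi  = read_hi();
    lo  = read_lo();
    hi2 = read_hi();
  } while (hi != hi2);
  return ((uint64_t)hi << 32) | lo;
}

/* Host-side simulation only: a counter that advances by one on every read,
 * starting right before the low word overflows. */
uint64_t sim_counter = 0x00000000FFFFFFFEULL;
uint32_t sim_read_lo(void) { return (uint32_t)(sim_counter++); }
uint32_t sim_read_hi(void) { return (uint32_t)((sim_counter++) >> 32); }
```

Without the second high-word read, the simulated sequence would combine a stale high word with an
already overflowed low word.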
 
[NOTE]
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.
 
If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
 
 
 
==== **`Zihpm`** Hardware Performance Monitors
 
In addition to the base cycle, instruction-retired and time counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split into a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
<<_hpm_cnt_width>> generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.

The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.

Depending on the configuration the following additional CSRs are available:
 
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
 
[IMPORTANT]
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according <<_mcounteren>> CSR bits
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
exception.
 
[TIP]
Auto-increment of the HPMs can be individually deactivated via the <<_mcountinhibit>> CSR.
 
[TIP]
For a list of all HPM-related CSRs and all provided event configurations
see section <<_hardware_performance_monitors_hpm>>.
 
 
==== **`Zifencei`** Instruction Stream Synchronization
 
The `Zifencei` CPU extension is implemented if the <<_cpu_extension_riscv_zifencei>> configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
 
* `fence.i`
 
The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
 
 
==== **`Zxcfu`** Custom Instructions Extension (CFU)
 
The `Zxcfu` extension is a NEORV32-specific _custom RISC-V_ ISA extension (`Z` = sub-extension, `x` = platform-specific
custom extension, `cfu` = name of the custom extension). When enabled via the <<_cpu_extension_riscv_zxcfu>> configuration
generic, this ISA extension adds the <<_custom_functions_unit_cfu>> to the CPU core. The CFU is a module that
allows adding **custom RISC-V instructions** to the processor core.

The CFU is implemented as an ALU co-processor and is integrated right into the CPU's pipeline, providing minimal data
transfer latency as it has direct access to the core's register file. Up to 1024 custom instructions can be
implemented within the CFU. These instructions are mapped to an OPCODE space that has been explicitly reserved by
the RISC-V spec for custom extensions.
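
The figure of 1024 instructions corresponds to a 10-bit operation selector. As a sketch, _assuming_ that
CFU instructions use the RISC-V R-type format with the _custom-0_ opcode (`0001011`) and that the `funct7` and
`funct3` fields together select the custom operation (an assumption of this example; see
<<_custom_functions_unit_cfu>> for the actual encoding), an instruction word could be assembled as follows.
`cfu_encode()` is a hypothetical helper and not part of the NEORV32 software framework.

```c
#include <stdint.h>

/* RISC-V "custom-0" major opcode (0001011), reserved for custom extensions. */
#define RISCV_OPCODE_CUSTOM0 0x0Bu

/* Assemble an R-type instruction word (assumed CFU format for this sketch):
 * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
uint32_t cfu_encode(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                    uint32_t funct3, uint32_t rd) {
  return ((funct7 & 0x7Fu) << 25) | ((rs2 & 0x1Fu) << 20) |
         ((rs1 & 0x1Fu) << 15) | ((funct3 & 0x07u) << 12) |
         ((rd & 0x1Fu) << 7) | RISCV_OPCODE_CUSTOM0;
}
```

With 7 + 3 selector bits this yields 2^10^ = 1024 distinct instruction encodings.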
 
Software can utilize the custom instructions by using _intrinsic functions_, which are inline assembly functions that
behave like "regular" C functions.
 
[TIP]
For more information regarding the CFU see section <<_custom_functions_unit_cfu>>.
 
[TIP]
The CFU / `Zxcfu` ISA extension is intended for application-specific _instructions_.
If you would like to add more complex accelerators or interfaces that can also operate independently of
the CPU, take a look at the memory-mapped <<_custom_functions_subsystem_cfs>>.
 
 
==== **`PMP`** Physical Memory Protection
 
The NEORV32 physical memory protection (PMP) is compatible to the RISC-V PMP specifications. It can be used
to constrain memory read/write/execute rights for each available privilege level.
 
The NEORV32 PMP currently supports only the _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger
minimal sizes can be configured via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements.
The physical memory protection system is implemented when the `PMP_NUM_REGIONS` configuration generic is >0.
In this case the following additional CSRs are available:
 
* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
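
In _NAPOT_ (naturally aligned power-of-two) mode the region base and size are both encoded into a single
`pmpaddr*` value. The following sketch computes this value according to the encoding of the RISC-V privileged
specification for a 32-bit address space. `pmp_napot_encode()` is a hypothetical helper introduced for this
example only; it does not take a larger configured `PMP_MIN_GRANULARITY` into account.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode a NAPOT region for a pmpaddr CSR.
 * base: region start address (has to be aligned to size)
 * size: region size in bytes (power of two, at least 8)
 * Returns false if the base/size combination cannot be encoded. */
bool pmp_napot_encode(uint32_t base, uint32_t size, uint32_t *pmpaddr) {
  if (size < 8u || (size & (size - 1u)) != 0u) return false; /* not a power of two >= 8 */
  if ((base & (size - 1u)) != 0u) return false;               /* base not naturally aligned */
  /* pmpaddr holds address bits [33:2]; the trailing one-bits encode the region size */
  *pmpaddr = (base | (size / 2u - 1u)) >> 2;
  return true;
}
```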
 
[TIP]
See section <<_machine_physical_memory_protection_csrs>> for more information regarding the PMP CSRs.
 
The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
number of available `pmpcfg*` and `pmpaddr*` CSRs.
 
When implementing more PMP regions than a _certain critical limit_ *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data accesses by one cycle.
 
The critical limit can be adapted for custom use by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:
 
[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----
 
**Operation**
 
Any CPU memory access address (from the instruction fetch or data access interface) is checked against _all_
of the specified PMP regions (configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
address matches one of these regions, the configured access rights (attributes in `pmpcfg*`) are enforced:
 
* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set
 
If an access to a protected region does not have the according access rights it will raise the according
instruction/load/store _access fault_ exception.
 
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs you need to set the _locked bit_ in the according
`pmpcfg*` configuration CSR.
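
The check described above can be sketched as a host-side C model for a _single_ NAPOT region. The `pmpcfg`
bit positions follow the RISC-V privileged specification; `pmp_check()` and `pmp_result_t` are hypothetical
names for this example. The model ignores the priority between several regions and the behavior of accesses
that match no region at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* pmpcfg bits as defined by the RISC-V privileged specification */
#define PMP_CFG_R       0x01u /* read permission */
#define PMP_CFG_W       0x02u /* write permission */
#define PMP_CFG_X       0x04u /* execute permission */
#define PMP_CFG_A_MASK  0x18u /* address-matching mode field */
#define PMP_CFG_A_NAPOT 0x18u /* mode = NAPOT */
#define PMP_CFG_L       0x80u /* locked: rights are also enforced in machine-mode */

typedef enum { PMP_NO_MATCH, PMP_ALLOW, PMP_FAULT } pmp_result_t;

/* Check one access against one region.
 * access is exactly one of PMP_CFG_R / PMP_CFG_W / PMP_CFG_X. */
pmp_result_t pmp_check(uint8_t cfg, uint32_t pmpaddr, uint32_t addr,
                       uint8_t access, bool machine_mode) {
  if ((cfg & PMP_CFG_A_MASK) != PMP_CFG_A_NAPOT) return PMP_NO_MATCH; /* region off */

  /* decode NAPOT: n trailing one-bits encode a region of 2^(n+3) bytes */
  unsigned ones = 0;
  while (ones < 32 && ((pmpaddr >> ones) & 1u)) ones++;
  uint64_t mask = ~((1ULL << (ones + 3)) - 1);
  if ((((uint64_t)pmpaddr << 2) & mask) != ((uint64_t)addr & mask)) return PMP_NO_MATCH;

  /* machine-mode accesses are only checked if the region is locked */
  if (machine_mode && !(cfg & PMP_CFG_L)) return PMP_ALLOW;
  return (cfg & access) ? PMP_ALLOW : PMP_FAULT;
}
```

A `PMP_FAULT` result corresponds to the instruction/load/store access fault exception of the according access type.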
 
[IMPORTANT]
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.
 
[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
 
 
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Timing
 
The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.
 
Average CPI (cycles per instruction) values for "real applications", like executing the CoreMark benchmark on different CPU
configurations, are presented in <<_cpu_performance>>.
 
.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+32+2; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 4
| Division       | `M`  | `div` `divu` `rem` `remu`     | 2+32+2
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 3 + ML
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - arithmetic/logic | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit  | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
| CFU: custom instructions | `Zxcfu` | - | min. 4
|=======================
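
The data-dependent entries of the table can be evaluated with a few lines of C. The sketch below covers the
default (iterative) shifter of the base ISA and the default multiplier/divider, _assuming_ that `SA/4` in the
table denotes an integer division. The function names are hypothetical and only serve this example.

```c
#include <stdint.h>

/* Cycles of a base-ISA shift using the default shifter: 3 + SA/4 + SA%4 */
unsigned shift_cycles(unsigned sa) {
  sa &= 31u; /* shift amount is 0..31 */
  return 3u + sa / 4u + sa % 4u;
}

/* Cycles of a multiplication/division using the default serial unit: 2 + 32 + 2 */
unsigned muldiv_cycles(void) {
  return 2u + 32u + 2u;
}
```

For example, a shift by 13 positions takes 3 + 3 + 1 = 7 cycles under these assumptions.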
 
[NOTE]
The presented values of the *floating-point execution cycles* are average values - obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
 
 
<<<
// ####################################################################################################################
include::cpu_csr.adoc[]
 
 
<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts
 
In this document the following nomenclature regarding traps is used:
 
* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)
 
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in <<_mtvec>>
CSR. The cause of the according interrupt or exception can be determined via the content of <<_mcause>>
CSR. The address that reflects the current program counter when a trap was taken is stored to <<_mepc>> CSR.
Additional information regarding the cause of the trap can be retrieved from <<_mtval>> CSR and the processor's
<<_internal_bus_monitor_buskeeper>> (for memory access exceptions).
 
The traps are prioritized. If several _synchronous exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _asynchronous exceptions_ (interrupts) trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
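
The selection of the next interrupt to be serviced can be sketched as a simple priority encoder. The order
used here (FIRQ channels 0 to 15 first, then machine external, software and timer interrupt) is taken from
the NEORV32 trap listing in this document. The sketch _assumes_ that the <<_mie>> / <<_mip>> bit position of each
interrupt equals its `mcause` interrupt code and that interrupts are globally enabled; `irq_select()` is a
hypothetical name.

```c
#include <stdint.h>

/* Return the mcause interrupt code of the pending AND enabled interrupt
 * with the highest priority, or -1 if there is none. */
int irq_select(uint32_t mip, uint32_t mie) {
  uint32_t active = mip & mie;
  for (int ch = 0; ch < 16; ch++) { /* fast interrupt channels 0..15 */
    if (active & (1u << (16 + ch))) return 16 + ch;
  }
  if (active & (1u << 11)) return 11; /* machine external interrupt */
  if (active & (1u << 3))  return 3;  /* machine software interrupt */
  if (active & (1u << 7))  return 7;  /* machine timer interrupt */
  return -1;
}
```

All requests that are not selected simply stay pending and are picked up by a later evaluation.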
 
.Interrupt Signal Requirements - Standard RISC-V Interrupts
[IMPORTANT]
All standard RISC-V interrupt request signals are **high-active**. A request has to stay at high-level (= asserted)
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).
 
.Interrupt Signal Requirements - Fast Interrupt Requests
[IMPORTANT]
The NEORV32-specific FIRQ request lines are triggered by a one-shot high-level (i.e. rising edge). Each request is buffered in the CPU control
unit until the channel is either disabled (by clearing the according <<_mie>> CSR bit) or the request is explicitly cleared (by setting
the according <<_mip>> CSR bit).
 
.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations - interrupts can only trigger _between_ two instructions.
So even if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.
 
 
:sectnums:
==== Memory Access Exceptions
 
If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus/memory read-operation at all. Vice versa, exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus/memory write-operation at all.
 
 
:sectnums:
==== Custom Fast Interrupt Request Lines
 
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the <<_mie>> and <<_mip>> CSRs and also
provide custom trap codes in <<_mcause>>. These FIRQs are reserved for NEORV32 processor-internal usage only.
 
 
 
<<<
// ####################################################################################################################
:sectnums:
==== NEORV32 Trap Listing
 
The following table shows all traps that are currently supported by the NEORV32 CPU. It also shows the prioritization
and the CSR side-effects. A more detailed description of the actual trap triggering events is provided in a further table.
 
[NOTE]
_Asynchronous exceptions_ (= interrupts) set the MSB of <<_mcause>> while _synchronous exceptions_ (= "software exceptions")
clear the MSB.
 
**Table Annotations**
 
The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the according trap that is written to <<_mcause>> CSR. The "[RISC-V]" columns show the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (the runtime environment _RTE_) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to
<<_mepc>> and <<_mtval>> CSRs when a trap is triggered:
 
* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself
 
.NEORV32 Trap Listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (`ebreak`) | _PC_ | _PC_
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
|=======================
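
A trap handler typically starts by decoding <<_mcause>>. The following sketch splits the value into the
interrupt flag (MSB) and the cause code and derives the FIRQ channel from the codes listed in the table above.
The helper names are hypothetical and not taken from the NEORV32 runtime environment.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSB set: asynchronous exception (interrupt); MSB clear: synchronous exception */
bool mcause_is_interrupt(uint32_t mcause) {
  return (mcause >> 31) != 0u;
}

/* Cause code without the interrupt flag */
uint32_t mcause_code(uint32_t mcause) {
  return mcause & 0x7FFFFFFFu;
}

/* Fast interrupt channel (0..15) or -1 if the trap is not a FIRQ */
int mcause_firq_channel(uint32_t mcause) {
  uint32_t code = mcause_code(mcause);
  if (mcause_is_interrupt(mcause) && code >= 16u && code <= 31u) {
    return (int)code - 16;
  }
  return -1;
}
```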
 
 
The following table provides a summarized description of the actual events for triggering a specific trap.
 
.NEORV32 Trap Description
[cols="<3,<7"]
[options="header",grid="rows"]
|=======================
| Trap ID [C] | Triggered when ...
| _TRAP_CODE_I_MISALIGNED_ | fetching a 32-bit instruction word that is not 32-bit-aligned (_see note below!_)
| _TRAP_CODE_I_ACCESS_     | bus timeout or bus error during instruction word fetch
| _TRAP_CODE_I_ILLEGAL_    | trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation
| _TRAP_CODE_MENV_CALL_    | executing `ecall` instruction in machine-mode
| _TRAP_CODE_UENV_CALL_    | executing `ecall` instruction in user-mode
| _TRAP_CODE_BREAKPOINT_   | executing `ebreak` instruction (or triggered by on-chip debugger)
| _TRAP_CODE_S_MISALIGNED_ | storing data to an address that is not naturally aligned to the data size (byte, half, word) being stored
| _TRAP_CODE_L_MISALIGNED_ | loading data from an address that is not naturally aligned to the data size  (byte, half, word) being loaded
| _TRAP_CODE_S_ACCESS_     | bus timeout or bus error during store data operation
| _TRAP_CODE_L_ACCESS_     | bus timeout or bus error during load data operation
| _TRAP_CODE_FIRQ_0_ ... _TRAP_CODE_FIRQ_15_| caused by interrupt-condition of processor-internal modules, see <<_neorv32_specific_fast_interrupt_requests>>
| _TRAP_CODE_MEI_          | user-defined processor-external source (via dedicated top-entity signal)
| _TRAP_CODE_MSI_          | user-defined processor-external source (via dedicated top-entity signal)
| _TRAP_CODE_MTI_          | processor-internal machine timer overflow OR user-defined processor-external source (via dedicated top-entity signal)
|=======================
 
.Misaligned Instruction Address Exception
[NOTE]
For 32-bit-only instructions (= no `C` extension) the misaligned instruction exception
is raised if bit 1 of the fetch address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
there will never be a misaligned instruction exception _at all_.
In both cases bit 0 of the program counter (and all related CSRs) is hardwired to zero.
 
 
<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface
 
The NEORV32 CPU implements a 32-bit machine with separated instruction and data interfaces making the CPU a
**Harvard Architecture**: the _instruction fetch interface_ (`i_bus_*`) is used for fetching instructions and the
_data access interface_ (`d_bus_*`) is used to access data via load and store operations.
Each of these interfaces can access an address space of up to 2^32^ bytes (4GB).
The following table shows the signals of the data and instruction interfaces as seen from the CPU (`*_o` signals are driven
by the CPU / outputs, `*_i` signals are read by the CPU / inputs). Both interfaces use the same protocol.
 
.CPU bus interfaces
[cols="<2,^1,^1,<6"]
[options="header",grid="rows"]
|=======================
| Signal             | Width | Direction | Description
| `i/d_bus_addr_o`   | 32    | out       | access address
| `i/d_bus_rdata_i`  | 32    | in        | data input for read operations
| `i/d_bus_wdata_o`  | 32    | out       | data output for write operations
| `i/d_bus_ben_o`    | 4     | out       | byte enable signal for write operations
| `i/d_bus_we_o`     | 1     | out       | bus write access (always zero for instruction fetches)
| `i/d_bus_re_o`     | 1     | out       | bus read access
| `i/d_bus_lock_o`   | 1     | out       | exclusive access request
| `i/d_bus_ack_i`    | 1     | in        | accessed peripheral indicates a successful completion of the bus transaction
| `i/d_bus_err_i`    | 1     | in        | accessed peripheral indicates an error during the bus transaction
| `i/d_bus_fence_o`  | 1     | out       | this signal is set for one cycle when the CPU executes an instruction/data fence operation
| `i/d_bus_priv_o`   | 2     | out       | current CPU privilege level
|=======================
 
.Pipelined Transfers
[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly" (pending) at once. However, this is no real drawback. The
minimal possible latency for a single access is two cycles, which equals the CPU's minimal execution latency
for a single instruction.
 
.Unaligned Memory Accesses
[NOTE]
Please note that the NEORV32 CPU does not support the handling of unaligned memory accesses _in hardware_. Any
unaligned memory access will raise an exception that can be used to handle such accesses in _software_.
 
 
:sectnums:
===== Protocol
 
An actual bus request is triggered either by the `*_bus_re_o` signal (for reading data) or by the `*_bus_we_o` signal
(for writing data). In case of a request, one of these signals is high for exactly one cycle. The transaction is
completed when the accessed peripheral/memory either sets the `*_bus_ack_i` signal (-> successful completion) or the
`*_bus_err_i` signal (-> failed completion). These bus response signals are also active for only one cycle.
An error indicated by the `*_bus_err_i` signal will raise the according "instruction bus access fault" or
"load/store bus access fault" exception.
 
**Minimal Response Latency**
 
The transfer can be completed directly in the same cycle as it was initiated (via the `*_bus_re_o` or `*_bus_we_o`
signal) if the peripheral sets `*_bus_ack_i` or `*_bus_err_i` high for one cycle. However, in order to shorten the
critical path such "asynchronous" completion should be avoided. The default NEORV32 processor-internal modules provide
exactly **one cycle delay** between initiation and completion of transfers.
 
**Maximal Response Latency**
 
Processor-internal peripherals or memories do not have to respond within one cycle after a bus request has been initiated.
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window
is defined by the global `max_proc_int_response_time_c` constant (default = 15 cycles; processor's VHDL package file `rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ (neither `*_bus_ack_i` nor `*_bus_err_i` set) processor-internal bus
transfer will time out and raise a **bus fault exception**. The <<_internal_bus_monitor_buskeeper>> keeps track of all _internal_ bus
transactions to enforce this time window.
 
If any bus operation times out (for example when accessing "address space holes") the BUSKEEPER will issue a bus
error to the CPU that will raise the according instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However,
the external memory bus interface also provides an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
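
The three possible outcomes of a processor-internal bus transaction (acknowledge, error, timeout) can be
modeled on a host as shown below. This is only a behavioral sketch: the `periph_t` peripheral model and the
`bus_transfer()` function are hypothetical, and the exact cycle at which the time window expires (here: no
response up to and including cycle 15 after the request) is an assumption of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the default value of max_proc_int_response_time_c */
#define MAX_PROC_INT_RESPONSE_TIME 15u

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_status_t;

/* Hypothetical peripheral: responds `latency` cycles after the request,
 * either with ACK or (if respond_with_err is set) with ERR. */
typedef struct {
  unsigned latency;
  bool     respond_with_err;
} periph_t;

/* Issue one request in cycle 0 and wait for the response.
 * *cycles returns the number of cycles spent waiting. */
bus_status_t bus_transfer(const periph_t *p, unsigned *cycles) {
  for (unsigned cycle = 0; cycle <= MAX_PROC_INT_RESPONSE_TIME; cycle++) {
    if (cycle == p->latency) { /* *_bus_ack_i or *_bus_err_i is high for one cycle */
      *cycles = cycle;
      return p->respond_with_err ? BUS_ERROR : BUS_OK;
    }
  }
  *cycles = MAX_PROC_INT_RESPONSE_TIME; /* no response within the time window */
  return BUS_TIMEOUT;
}
```

Both `BUS_ERROR` and `BUS_TIMEOUT` end in the same bus access exception from the CPU's point of view.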
 
**Exemplary Bus Accesses**
 
.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================
 
**Write Access**
 
For a write access, the access address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.
 
**Read Access**
 
For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
 
**Access Boundaries**
 
The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
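
For data accesses, the access size and the two least-significant address bits determine which byte lanes of
the 32-bit bus are used. The sketch below derives a 4-bit byte-enable mask and rejects accesses that are not
naturally aligned. It _assumes_ a little-endian lane mapping in which bit _i_ of the mask corresponds to the
byte at address offset _i_; `bus_byte_enable()` is a hypothetical helper.

```c
#include <stdint.h>

/* Return the 4-bit byte-enable mask for an access of `size` bytes (1, 2 or 4)
 * at address `addr`, or 0 if the size is invalid or the access is misaligned. */
uint8_t bus_byte_enable(uint32_t addr, unsigned size) {
  if (size != 1u && size != 2u && size != 4u) return 0u;
  if ((addr & (size - 1u)) != 0u) return 0u;  /* not naturally aligned */
  uint32_t lanes = (1u << size) - 1u;         /* 0x1, 0x3 or 0xF */
  return (uint8_t)(lanes << (addr & 3u));     /* move to the addressed lanes */
}
```

A return value of zero marks the cases in which the CPU would raise a load/store address misaligned exception.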
 
**Exclusive (Atomic) Access**
 
The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the according access address and
the source of the access itself (for example via the CPU ID in a multi-core system).
 
When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
zero and will allow the according store operation to the memory system. If the lock is broken, the
instruction will write-back non-zero and will not generate an actual memory store operation.
 
The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:
 
* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
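
The life cycle of the CPU-internal exclusive access lock can be summarized in a small state model. This is a
behavioral sketch of the rules listed above only; it does not model the reservation handling of the memory
system. All names are hypothetical, and the failure code `1` is an arbitrary choice for "non-zero".

```c
#include <stdbool.h>
#include <stdint.h>

/* CPU-internal exclusive access lock */
typedef struct { bool lock; } excl_t;

void excl_lr(excl_t *e)           { e->lock = true;  } /* lr.w sets the lock */
void excl_other_access(excl_t *e) { e->lock = false; } /* any load/store other than lr.w */
void excl_trap(excl_t *e)         { e->lock = false; } /* any sync. or async. trap */
void excl_bus_error(excl_t *e)    { e->lock = false; } /* memory system signals bus_err_i */

/* sc.w: returns 0 and performs the store if the lock is still intact,
 * returns non-zero and leaves memory untouched otherwise. */
uint32_t excl_sc(excl_t *e, uint32_t *mem, uint32_t value) {
  bool ok = e->lock;
  e->lock = false; /* sc.w itself is a memory access other than lr.w */
  if (ok) {
    *mem = value;
    return 0u;
  }
  return 1u;
}
```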
 
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
 
**Memory Barriers**
 
Whenever the CPU executes a _fence_ instruction, the according interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (for example a cache flush and refill).
 
 
 
<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset
 
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.
 
**Rationale**
 
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the according data register is valid. At the end of the pipeline the status register might trigger a write-back
of the processing result to some kind of memory. The initial status of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status register). This makes the pipeline data registers from
this example "uncritical registers".
 
**NEORV32 CPU Reset**
 
In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers will get initialized by the CPU's internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).
 
During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example, the machine interrupt-enable CSR <<_mie>>
does not provide a dedicated reset. The value of this register after reset is uncritical: interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) _does_ provide a dedicated
hardware reset that sets this bit low (globally disabling interrupts).
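
The gating effect can be illustrated with a small VHDL sketch. All entity, port and signal names are hypothetical, and
the logic is deliberately simplified (machine mode only, a single interrupt source, one-bit CSR writes); it is not the
actual NEORV32 implementation.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- hypothetical interrupt gating logic (illustration only, not NEORV32 code) --
entity irq_gate is
  port (
    clk_i         : in  std_ulogic;
    rstn_i        : in  std_ulogic; -- low-active, asynchronous reset
    we_mstatus_i  : in  std_ulogic; -- software writes the global enable flag
    we_mie_i      : in  std_ulogic; -- software writes the per-source enable bit
    wdata_i       : in  std_ulogic; -- write data bit
    irq_pending_i : in  std_ulogic; -- request from the interrupt source
    irq_fire_o    : out std_ulogic  -- interrupt is actually taken
  );
end entity irq_gate;

architecture rtl of irq_gate is
  signal mstatus_mie : std_ulogic; -- global enable flag: has a reset
  signal mie_bit     : std_ulogic; -- per-source enable bit: no reset
begin

  -- global interrupt enable: defined LOW after reset --
  global_enable: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      mstatus_mie <= '0';
    elsif rising_edge(clk_i) then
      if (we_mstatus_i = '1') then
        mstatus_mie <= wdata_i;
      end if;
    end if;
  end process global_enable;

  -- per-source enable: undefined until software writes it --
  source_enable: process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (we_mie_i = '1') then
        mie_bit <= wdata_i;
      end if;
    end if;
  end process source_enable;

  -- the reset global flag masks the undefined per-source bit --
  irq_fire_o <= mstatus_mie and mie_bit and irq_pending_i;

end architecture rtl;
----

In simulation `mie_bit` stays `'U'` until it is written, but `'0' and 'U'` evaluates to `'0'` for `std_ulogic`, so
`irq_fire_o` remains low after reset as long as the global flag has not been set.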
 
**Reset Configuration**
 
Most CPU-internal registers do provide an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of all uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) for all CPU registers can
be enabled by setting a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`) to `true`:
 
[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers --
-- FALSE=reset value is irrelevant (might simplify HW), default; TRUE=defined LOW reset value
constant dedicated_reset_c : boolean := false;
----
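
A hedged sketch of how such a constant could be consumed by an uncritical register is shown below. The names
`uncritical_reg`, `rst_val_f` and `rst_val_c` are assumptions made for this illustration, and `dedicated_reset_c` is
redeclared locally only to keep the example self-contained; the actual package may organize this differently.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- hypothetical uncritical register (illustration only, not NEORV32 code) --
entity uncritical_reg is
  port (
    clk_i  : in  std_ulogic;
    rstn_i : in  std_ulogic; -- low-active, asynchronous reset
    d_i    : in  std_ulogic_vector(7 downto 0);
    q_o    : out std_ulogic_vector(7 downto 0)
  );
end entity uncritical_reg;

architecture rtl of uncritical_reg is

  -- redeclared here for the sketch; the real constant lives in the package file --
  constant dedicated_reset_c : boolean := false;

  -- hypothetical helper: select the reset value for uncritical registers --
  function rst_val_f(defined : boolean) return std_ulogic is
  begin
    if defined then
      return '0'; -- defined LOW reset value
    else
      return '-'; -- don't care: reset value is irrelevant
    end if;
  end function rst_val_f;

  constant rst_val_c : std_ulogic := rst_val_f(dedicated_reset_c);

begin

  reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      q_o <= (others => rst_val_c); -- '-' or '0', depending on the constant
    elsif rising_edge(clk_i) then
      q_o <= d_i;
    end if;
  end process reg;

end architecture rtl;
----

With the constant left at `false` the reset branch assigns `'-'`; setting it to `true` turns the same process into a
register with a defined reset-to-low.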
 
 
<<<
// ####################################################################################################################
 
include::cpu_cfu.adoc[]

