:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::neorv32_cpu_block.png[width=600,align=center]

**Section Structure**

* <<_architecture>>, <<_full_virtualization>> and <<_risc_v_compatibility>>
* <<_cpu_top_entity_signals>> and <<_cpu_top_entity_generics>>
* <<_instruction_sets_and_extensions>>, <<_custom_functions_unit_cfu>> and <<_instruction_timing>>
* <<_control_and_status_registers_csrs>>
* <<_traps_exceptions_and_interrupts>>
* <<_bus_interface>>

**Key Features**

* 32-bit little-endian, multi-cycle, in-order `rv32` RISC-V CPU
* Compatible with the RISC-V **Privileged Architecture - Machine ISA Version 1.12** specifications
* Available <<_instruction_sets_and_extensions>>:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zicntr` - CPU base counters
** `Zihpm` - hardware performance monitors
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `Zxcfu` - custom instructions extension
** `PMP` - physical memory protection
** `Debug` - <<_cpu_debug_mode>> (part of the on-chip debugger) including hardware <<_trigger_module>>
* <<_risc_v_compatibility>>: Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Supports _all_ of the machine-level <<_traps_exceptions_and_interrupts>> from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
** This is a key aspect of _execution safety_ provided by <<_full_virtualization>>
** Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 custom _fast_ interrupts
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate <<_bus_interface>>s for instruction fetch and data access

[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course, you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
front-end can already fetch new instructions while the back-end is still processing previously-fetched instructions.

The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
data is stored to a FIFO queue - the instruction prefetch buffer.

The back-end is responsible for the actual execution of the instructions. It includes an "issue engine",
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
instructions or decompressed 16-bit instructions) for execution.
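
As a rough illustration of the decision the issue engine has to make, the following sketch splits a stream of 16-bit parcels into instructions using the standard RISC-V length encoding (the two lowest bits are `11` for a 32-bit instruction, any other value marks a 16-bit compressed instruction). The function names and the flat parcel array are assumptions made for this example; they do not mirror the actual VHDL implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard RISC-V length encoding: bits [1:0] == 0b11 marks a 32-bit
 * instruction, any other value marks a 16-bit compressed instruction. */
static int is_compressed(uint16_t parcel) {
    return (parcel & 0x3u) != 0x3u;
}

/* Hypothetical model: walk an array of 16-bit parcels (as they might come
 * out of a prefetch buffer) and count the complete instructions in it. */
static size_t count_instructions(const uint16_t *parcels, size_t n) {
    assert(parcels != NULL || n == 0);
    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        if (is_compressed(parcels[i])) {
            i += 1;             /* one parcel: 16-bit instruction */
        } else {
            if (i + 1 >= n) {   /* upper half has not been fetched yet */
                break;
            }
            i += 2;             /* two parcels: 32-bit instruction */
        }
        count++;
    }
    return count;
}
```

Note that a 32-bit instruction may start at any parcel index in this model, which corresponds to the "mixture" case where 32-bit instructions are not aligned to 32-bit boundaries.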

Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
when the CPU front-end has to reload the prefetch buffer due to a taken branch.
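
To get a feeling for what the CPI figure means in practice, the following sketch converts an instruction count and an average CPI into execution time. The function name and the example numbers used below (one million instructions, 100 MHz clock) are assumptions chosen for illustration, not measured NEORV32 results.

```c
#include <assert.h>
#include <stdint.h>

/* Execution time in microseconds for `instructions` instructions at an
 * average of `cpi` cycles per instruction and a clock of `f_clk_hz` Hz. */
static double exec_time_us(uint64_t instructions, double cpi, double f_clk_hz) {
    assert(f_clk_hz > 0.0);
    double cycles = (double)instructions * cpi;
    return cycles * 1e6 / f_clk_hz;
}
```

With the optimal CPI of 2, one million instructions at an assumed 100 MHz clock take about 20 ms (`exec_time_us(1000000, 2.0, 100e6)`). Any additional stall cycles raise the average CPI and scale the result linearly.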

Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
these two classical design paradigms allows a higher instruction throughput than a pure multi-cycle
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
multi-cycle concept).

As a von Neumann machine, the CPU provides independent interfaces for instruction fetch and data access.
These two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
have higher priority). Hence, _all_ memory locations including peripheral devices are mapped to a single unified 32-bit
address space.

// ####################################################################################################################
:sectnums:
=== Full Virtualization

Just like the RISC-V ISA, the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU and SoC level to
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.
footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
malformed instruction or accessing a non-allocated memory address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
might have to be reverted). This allows a defined and predictable execution behavior at any time, improving overall execution safety.

**Execution Safety - NEORV32 Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
accessed address does not respond or encounters an internal device error during access.
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
window. Otherwise a bus access exception is raised.
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions raise an
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
memory operations).
* To be continued...
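
The acknowledge/timeout behavior from the list above can be modeled roughly as follows. This is a minimal host-side sketch, not the actual bus logic: the type names, the per-cycle device callback and the `timeout_cycles` parameter are assumptions for illustration, and the real length of the response window is not stated here.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BUS_OK, BUS_ACCESS_FAULT } bus_result_t;

/* Response of the accessed device in one cycle (hypothetical model of an
 * acknowledge line and an error line). */
typedef struct {
    int ack; /* transfer acknowledged */
    int err; /* transfer terminated with an error */
} bus_response_t;

/* Device model: returns the device's response in the given cycle. */
typedef bus_response_t (*device_fn)(unsigned cycle);

/* Wait for the device to respond. An error response or no response within
 * the window both end in an access fault. */
static bus_result_t bus_access(device_fn device, unsigned timeout_cycles) {
    assert(device != NULL);
    for (unsigned cycle = 0; cycle < timeout_cycles; cycle++) {
        bus_response_t r = device(cycle);
        if (r.err) {
            return BUS_ACCESS_FAULT;
        }
        if (r.ack) {
            return BUS_OK;
        }
    }
    return BUS_ACCESS_FAULT; /* no response inside the window */
}

/* Example devices used to exercise the model. */
static bus_response_t fast_device(unsigned cycle) {
    bus_response_t r = { cycle == 1, 0 };  /* acknowledges in cycle 1 */
    return r;
}

static bus_response_t dead_device(unsigned cycle) {
    (void)cycle;
    bus_response_t r = { 0, 0 };           /* never responds */
    return r;
}

static bus_response_t faulty_device(unsigned cycle) {
    bus_response_t r = { 0, cycle == 2 };  /* signals an error in cycle 2 */
    return r;
}
```

In this model every access ends in exactly one of two defined outcomes, which is the property the list describes: the requester never waits indefinitely for a device.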

// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub), also referred to as the
_RISC-V Architecture Test Framework_. This framework is used to check
RISC-V implementations for compatibility with the official RISC-V ISA specifications.

The port files for the NEORV32 processor were originally located in the repository's `sw/isa-test` folder.
The NEORV32 port of this test framework has been moved to a separate repository:
https://github.com/stnolting/neorv32-verif

[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.

.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................

.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... IGNORED <1>
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
Check fence-01          ... OK
--------------------------------
OK: 39/39 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................
<1> Test is skipped due to a GHDL simulation issue.

.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................

.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................

.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................

<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently identified issues regarding full RISC-V-compatibility.

.Read-Only "Read-Write" CSRs
[IMPORTANT]
The <<_misa>> and <<_mtval>> CSRs in the NEORV32 are _read-only_.
Any machine-mode write access to them is ignored and will _not_ cause any exceptions or
side-effects to maintain RISC-V compatibility.

.Physical Memory Protection
[IMPORTANT]
The RISC-V-compatible NEORV32 <<_machine_physical_memory_protection_csrs>> only implement the **TOR**
(top of region) mode and only up to 16 PMP regions. Furthermore, the <<_pmpcfg>>'s _lock bits_ only lock
the corresponding PMP entry and not the entries below. All region rules are checked in parallel **without**
prioritization, so for identical memory regions the most restrictive PMP rule will be enforced.
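
The "most restrictive rule" behavior can be pictured with a small software model. The sketch below is an assumption-laden simplification: it reads "most restrictive" as the intersection of the permissions of all matching regions, and it stores an explicit lower bound per region, whereas in the real TOR scheme the lower bound is taken from the preceding `pmpaddr` register. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMP_R 0x1u /* read permission    */
#define PMP_W 0x2u /* write permission   */
#define PMP_X 0x4u /* execute permission */

/* Simplified region descriptor (hypothetical, not the CSR layout). */
typedef struct {
    uint32_t base;  /* inclusive lower bound (byte address) */
    uint32_t top;   /* exclusive upper bound (byte address) */
    unsigned perm;  /* combination of PMP_R, PMP_W, PMP_X   */
    int enabled;    /* non-zero if the region is active     */
} pmp_region_t;

/* Returns the permissions that apply to `addr`: the intersection of all
 * enabled regions containing it, or -1 if no region matches. */
static int pmp_effective_perm(const pmp_region_t *r, size_t n, uint32_t addr) {
    assert(r != NULL || n == 0);
    int matched = 0;
    unsigned perm = PMP_R | PMP_W | PMP_X;
    for (size_t i = 0; i < n; i++) {
        if (r[i].enabled && addr >= r[i].base && addr < r[i].top) {
            perm &= r[i].perm; /* every matching rule must allow the access */
            matched = 1;
        }
    }
    return matched ? (int)perm : -1;
}
```

In this model, two identical regions configured as read/write and read-only result in read-only access, regardless of their order in the table. How an access without any matching region is handled depends on the privilege level and is left out of the sketch.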

.Atomic Memory Operations
[IMPORTANT]
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.
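
The emulation idea is a retry loop: read the old value, compute the new one and attempt a conditional store, repeating until the store succeeds. The following host-runnable sketch uses the C11 `atomic_compare_exchange_weak` function as a stand-in for the `lr.w`/`sc.w` pair; the function name `emulated_amoadd_w` is an assumption for this example and not part of the NEORV32 software framework.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Emulated atomic fetch-and-add. The compare-exchange loop stands in for
 * the lr.w / sc.w pair: load the old value (lr.w), compute the new value
 * and retry if the conditional store (sc.w) fails. Returns the old value. */
static uint32_t emulated_amoadd_w(_Atomic uint32_t *addr, uint32_t value) {
    assert(addr != NULL);
    uint32_t old = atomic_load(addr);
    while (!atomic_compare_exchange_weak(addr, &old, old + value)) {
        /* `old` now holds the refreshed memory value; try again */
    }
    return old;
}
```

The same loop structure works for the other read-modify-write operations (swap, and, or, min, max): only the expression that computes the new value changes.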

.No HW-Support of Misaligned Memory Accesses
[WARNING]
The CPU does not resolve unaligned memory accesses in hardware. This is not a
RISC-V-compatibility issue but an important thing to know. Any kind of unaligned memory access
will raise an exception to allow a software-based emulation.
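
One common way to emulate an unaligned access in software is to split it into byte accesses, which are always aligned. The following is a minimal sketch of that idea for a 32-bit little-endian load; the helper name is hypothetical and the code is not taken from the NEORV32 software framework.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian word from an arbitrarily aligned address
 * using only byte accesses. */
static uint32_t load_word_unaligned(const uint8_t *p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A trap handler could use such a helper to complete the faulting load, or application code can call it directly for data that is known to be unaligned and avoid the exception altogether.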

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.

.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir. | Description
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
| `debug_o`        |     1 | out | CPU is in debug mode when set
4+^| **Instruction <<_bus_interface>>**
| `i_bus_addr_o`   |    32 | out | access address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed `fence.i` instruction
| `i_bus_priv_o`   |     1 | out | current _effective_ CPU privilege level (`0` user, `1` machine or debug)
4+^| **Data <<_bus_interface>>**
| `d_bus_addr_o`   |    32 | out | access address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed `fence` instruction
| `d_bus_priv_o`   |     1 | out | current _effective_ CPU privilege level (`0` user, `1` machine or debug)
4+^| **System Time (for <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input from <<_machine_system_timer_mtime>>
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================

<<<

// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual processor configuration generics
(see section <<_processor_top_entity_generics>>) and are not listed here. However, the CPU provides some
_specific_ generics that are used to configure the CPU for the NEORV32 processor setup. These generics are
assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | _no default value_
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In the NEORV32 processor setup, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | _no default value_
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | _no default value_
3+| Implement the RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======

<<<

// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see _The RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual -
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

.Discovering ISA Extensions
[TIP]
Software can discover the available ISA extensions via the <<_misa>> & <<_mxisa>> CSRs
or by executing an instruction and checking for an _illegal instruction exception_
(-> <<_full_virtualization>>). +
Executing an instruction from an extension that is not supported yet or that is currently not enabled
(via the corresponding top entity generic) will raise an illegal instruction exception.
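The `misa`-based discovery can be sketched in C. This is a minimal host-side illustration, not NEORV32 library code: the helper names `misa_has_extension` and `misa_from_letters` are hypothetical, and the only fact relied upon is the RISC-V privileged specification's bit assignment (bit 0 = `A`, bit 1 = `B`, ... bit 25 = `Z`). On the real CPU the `misa` value would be obtained with a CSR read instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the single-letter extension `ext` ('A'..'Z') is flagged
 * in a misa value (bit 0 = 'A', bit 1 = 'B', ..., bit 25 = 'Z'). */
static bool misa_has_extension(uint32_t misa, char ext)
{
    assert(ext >= 'A' && ext <= 'Z');
    return ((misa >> (ext - 'A')) & 1u) != 0;
}

/* Hypothetical helper for host-side experiments: builds a misa-style
 * bit mask from a string of extension letters such as "IMCX". */
static uint32_t misa_from_letters(const char *letters)
{
    uint32_t misa = 0;
    for (; *letters != '\0'; letters++) {
        assert(*letters >= 'A' && *letters <= 'Z');
        misa |= 1u << (*letters - 'A');
    }
    return misa;
}
```

Checking a single bit per extension keeps the test cheap; extensions without a single-letter flag (for example `Zicsr`) cannot be detected this way and need the <<_mxisa>> CSR or the illegal-instruction approach instead.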

==== **`A`** - Atomic Memory Access

Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
The RISC-V specification defines a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
ISA extension is enabled if the <<_cpu_extension_riscv_a>> configuration generic is _true_.
In this case the following additional instructions are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
(read-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.

The *load-reserved* instruction behaves like a "normal" load-word instruction (`lw`) but also sets a CPU-internal
_data memory access lock_. A *store-conditional* behaves like a "normal" store-word instruction (`sw`) that
only conducts an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
returns the lock state (zero if the lock is still intact, non-zero if the lock has been broken).
After the execution of the `sc.w` instruction, the lock is automatically removed.

The lock is broken if at least one of the following conditions occurs:

. executing any data memory access instruction other than `lr.w`
. raising _any_ trap (for example an interrupt or a memory access exception)
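The lock behavior described above can be captured in a small host-side C model. This is only a sketch under stated assumptions: the type `cpu_model_t`, the `model_*` function names and the toy memory size are hypothetical and do not correspond to the VHDL sources. The last function illustrates how a read-modify-write operation (here an atomic add) can be emulated with a load-reserved / store-conditional retry loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 16u /* size of the toy memory (arbitrary) */

typedef struct {
    uint32_t mem[MEM_WORDS];
    bool     lock; /* models the CPU-internal data memory access lock */
} cpu_model_t;

/* lr.w: normal word load that also sets the lock. */
static uint32_t model_lr_w(cpu_model_t *cpu, size_t idx)
{
    assert(idx < MEM_WORDS);
    cpu->lock = true;
    return cpu->mem[idx];
}

/* sc.w: writes only if the lock is intact; returns 0 on success and a
 * non-zero value on failure. The lock is always removed afterwards. */
static uint32_t model_sc_w(cpu_model_t *cpu, size_t idx, uint32_t value)
{
    assert(idx < MEM_WORDS);
    bool ok = cpu->lock;
    if (ok) {
        cpu->mem[idx] = value;
    }
    cpu->lock = false;
    return ok ? 0u : 1u;
}

/* Any other data memory access (here: a plain store) breaks the lock. */
static void model_sw(cpu_model_t *cpu, size_t idx, uint32_t value)
{
    assert(idx < MEM_WORDS);
    cpu->mem[idx] = value;
    cpu->lock = false;
}

/* Any trap (interrupt or exception) breaks the lock. */
static void model_trap(cpu_model_t *cpu)
{
    cpu->lock = false;
}

/* Emulated atomic add built from lr.w/sc.w only: retry until the
 * store-conditional succeeds. Returns the value before the addition. */
static uint32_t model_amoadd(cpu_model_t *cpu, size_t idx, uint32_t inc)
{
    uint32_t old;
    do {
        old = model_lr_w(cpu, idx);
    } while (model_sc_w(cpu, idx, old + inc) != 0u);
    return old;
}
```

In this single-threaded model the retry loop always succeeds on the first pass; on real hardware a trap between the two instructions would break the lock and force another iteration.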

[NOTE]
The atomic instructions place special requirements on the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>.

==== **`B`** - Bit-Manipulation Operations

The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
<<_cpu_extension_riscv_b>> configuration generic is _true_.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
A copy of the spec is also available in `docs/references`.

The NEORV32 `B` ISA extension includes the following sub-extensions (according to the RISC-V
bit-manipulation spec. v0.93) and their corresponding instructions:

* **`Zba` - Address-generation instructions**
** `sh1add` `sh2add` `sh3add`
* **`Zbb` - Basic bit-manipulation instructions**
** `andn` `orn` `xnor`
** `clz` `ctz` `cpop`
** `max` `maxu` `min` `minu`
** `sext.b` `sext.h` `zext.h`
** `rol` `ror` `rori`
** `orc.b` `rev8`
* **`Zbc` - Carry-less multiplication instructions**
** `clmul` `clmulh` `clmulr`
* **`Zbs` - Single-bit instructions**
** `bclr` `bclri`
** `bext` `bexti`
** `binv` `binvi`
** `bset` `bseti`

[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement fully parallel logic (like barrel shifters) for all
shift-related `B` instructions.

[WARNING]
The `B` extension is frozen and officially ratified. However, there is no
software support for this extension in the upstream GCC RISC-V port yet. To work around this, an
intrinsic library is provided that makes the `B` extension features available from C-language
code (see `sw/example/bitmanip_test`).

==== **`C`** - Compressed Instructions

The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
The `C` extension is available when the <<_cpu_extension_riscv_c>> configuration generic is _true_.
In this case the following instructions are available:

* `c.addi4spn` `c.lw` `c.sw` `c.nop` `c.addi` `c.jal` `c.li` `c.addi16sp` `c.lui` `c.srli` `c.srai` `c.andi` `c.sub`
`c.xor` `c.or` `c.and` `c.j` `c.beqz` `c.bnez` `c.slli` `c.lwsp` `c.jr` `c.mv` `c.ebreak` `c.jalr` `c.add` `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the second half-word of that instruction. This performance penalty can be avoided
by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).

==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
decrease physical hardware requirements (for example block RAM). This extension is enabled when the <<_cpu_extension_riscv_e>>
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
This extension does not add any additional instructions or features.

[NOTE]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.

==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediate: `lui` `auipc`
* jumps: `jal` `jalr`
* branches: `beq` `bne` `blt` `bge` `bltu` `bgeu`
* memory: `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw`
* alu: `addi` `slti` `sltiu` `xori` `ori` `andi` `slli` `srli` `srai` `add` `sub` `sll` `slt` `sltu` `xor` `srl` `sra` `or` `and`
* environment: `ecall` `ebreak` `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter if the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
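The difference between the two shifter styles can be illustrated with a short C model. This is a sketch only: the function names are hypothetical, and the `steps` value counts iterations of the model, which mirrors the dependence on the shift amount but is not a measured hardware cycle count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-serial logical left shift: one bit position per loop iteration,
 * so the number of steps equals the shift amount. */
static uint32_t sll_serial(uint32_t value, uint32_t shamt, uint32_t *steps)
{
    assert(shamt < 32u && steps != NULL);
    *steps = 0;
    for (uint32_t i = 0; i < shamt; i++) {
        value <<= 1;
        (*steps)++;
    }
    return value;
}

/* Barrel-shifter style: the whole shift is done in a single step,
 * independent of the shift amount. */
static uint32_t sll_barrel(uint32_t value, uint32_t shamt, uint32_t *steps)
{
    assert(shamt < 32u && steps != NULL);
    *steps = 1;
    return value << shamt;
}
```

Both functions return identical results; only the step count differs, which is the trade-off between hardware size and execution time described in the note.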

[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top entity's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.

==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division operations are available when the
<<_cpu_extension_riscv_m>> configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul` `mulh` `mulhsu` `mulhu`
* division: `div` `divu` `rem` `remu`

[NOTE]
By default, multiplication and division operations are executed using a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the <<_fast_mul_en>>
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete, regardless of the input operands.

[NOTE]
Regardless of the setting of the <<_fast_mul_en>> generic, the execution time of
multiplication and division instructions is _independent_ of the input operands.
Hence, there is **no early completion** of multiply by one/zero and divide by zero operations.
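The "fixed number of steps, no early completion" property of a bit-serial multiplier can be sketched in C as a classic shift-and-add loop. This is an illustrative model under stated assumptions, not a description of the actual VHDL: the function names are hypothetical and `steps` counts loop iterations of the model rather than hardware cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add multiplication of two unsigned 32-bit operands.
 * The loop always runs 32 iterations, even if an operand is 0 or 1. */
static uint64_t mul_serial(uint32_t a, uint32_t b, unsigned *steps)
{
    uint64_t product = 0;
    unsigned n = 0;
    for (unsigned i = 0; i < 32; i++, n++) {
        if ((b >> i) & 1u) {
            product += (uint64_t)a << i; /* add the shifted partial product */
        }
    }
    assert(n == 32u); /* invariant: no early completion */
    if (steps != NULL) {
        *steps = n;
    }
    return product;
}

/* mul returns the low word, mulhu the high word of the unsigned product. */
static uint32_t model_mul(uint32_t a, uint32_t b)
{
    return (uint32_t)mul_serial(a, b, NULL);
}

static uint32_t model_mulhu(uint32_t a, uint32_t b)
{
    return (uint32_t)(mul_serial(a, b, NULL) >> 32);
}
```

A data-independent step count makes the timing of these instructions predictable, at the price of not speeding up trivial operand values.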

==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension. It implements only the multiplication operations
of the `M` extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which are computed entirely in software.
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.
It is implemented if the <<_cpu_extension_riscv_zmmul>> configuration generic is _true_.

* multiplication: `mul` `mulh` `mulhsu` `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.

Note that the `M` and `Zmmul` extensions _cannot_ be enabled at the same time.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
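When divisions are computed in software, the compiler's runtime library supplies the division routine. The following C function is a minimal sketch of such a routine using restoring (shift-subtract) division; the name `udivmod32` is hypothetical and this is not the code the toolchain actually links in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Restoring (shift-subtract) unsigned division: processes the dividend
 * bit by bit from MSB to LSB using only shifts, compares and subtracts. */
static void udivmod32(uint32_t dividend, uint32_t divisor,
                      uint32_t *quotient, uint32_t *remainder)
{
    assert(quotient != NULL && remainder != NULL);
    uint64_t rem = 0; /* 64-bit so the shifted partial remainder cannot overflow */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            q |= 1u << i;
        }
    }
    *quotient = q;
    *remainder = (uint32_t)rem;
}
```

With a zero divisor this particular algorithm produces an all-ones quotient and a remainder equal to the dividend, which coincides with the results the RISC-V specification defines for `divu` and `remu` by zero.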

==== **`U`** - Less-Privileged User Mode

In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second, less-privileged
operation mode. It is implemented if the <<_cpu_extension_riscv_u>> configuration generic is _true_.
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
Any kind of privilege rights violation will raise an exception to allow <<_full_virtualization>>.

Additional CSRs:

* <<_mcounteren>> - machine counter enable to constrain user-mode access to timer/counter CSRs

==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the <<_misa>> CSR.

The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupt requests_ (`FIRQ`), which are controlled via custom bits in the <<_mie>>
and <<_mip>> CSRs. These interrupts are mapped to CSR bits that are available for custom use according to the
RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).
* There are <<_neorv32_specific_csrs>>.

==== **`Zfinx`** Single-Precision Floating-Point Operations

The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
fewer hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
register file-related load/store or move instructions.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

[NOTE]
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
to the `F` extension. The `Zfinx` extension is implemented when the <<_cpu_extension_riscv_zfinx>> configuration
generic is _true_. In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w` `fcvt.s.wu` `fcvt.w.s` `fcvt.wu.s`
* comparison: `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s`
* computational: `fadd.s` `fsub.s` `fmul.s`
* sign-injection: `fsgnj.s` `fsgnjn.s` `fsgnjx.s`
* number classification: `fclass.s`
* compressed instructions: `c.flw` `c.flwsp` `c.fsw` `c.fswsp`

Additional CSRs:

* <<_fcsr>> - FPU control register
* <<_frm>> - rounding mode control
* <<_fflags>> - FPU status flags

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_, i.e. they are set to +/- 0 before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
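The flush-to-zero rule can be expressed on the raw single-precision bit pattern. The following C sketch is a host-side illustration only: the function names are hypothetical, and the `float` conversion helpers assume that the host uses 32-bit IEEE-754 `float` values. Working on `uint32_t` bit patterns also mirrors the `Zfinx` idea of keeping floating-point data in integer registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flush-to-zero: if the exponent field (bits 30..23) is zero, keep only
 * the sign bit, which turns subnormals into +0 or -0. */
static uint32_t flush_subnormal(uint32_t bits)
{
    if ((bits & 0x7F800000u) == 0) {
        return bits & 0x80000000u;
    }
    return bits;
}

/* Reinterpret a float as its raw bit pattern (assumes 32-bit IEEE-754). */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a raw bit pattern as a float (assumes 32-bit IEEE-754). */
static float bits_to_float(uint32_t u)
{
    float f;
    assert(sizeof f == sizeof u);
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Normal numbers, infinities and NaNs have a non-zero exponent field and pass through unchanged; an input that is already zero is also left as it is.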

[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided that makes the `Zfinx` floating-point extension available from C-language
code (see `sw/example/floating_point_test`).

==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
are implemented when the <<_cpu_extension_riscv_zicsr>> configuration generic is _true_.

[IMPORTANT]

[IMPORTANT]

If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!

If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!

In order to provide the full set of privileged functions that are required to run more complex tasks like

In order to provide the full set of privileged functions that are required to run more complex tasks like

operating system and to allow a secure execution environment the `Zicsr` extension should be always enabled.

operating system and to allow a secure execution environment the `Zicsr` extension should be always enabled.

In this case the following instructions are available:

In this case the following instructions are available:

* CSR access: `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci`

* CSR access: `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci`

* environment: `mret` `wfi`

* environment: `mret` `wfi`

[NOTE]

[NOTE]

If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.

If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.

However, access privileges are still enforced so these instruction variants _do_ cause side-effects

However, access privileges are still enforced so these instruction variants _do_ cause side-effects

(the RISC-V spec. state that these combinations "_shall_ not cause any side-effects").

(the RISC-V spec. state that these combinations "_shall_ not cause any side-effects").
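
The read-modify-write behavior of the three CSR access flavors can be sketched as a host-side C model. The type and function names are hypothetical. The sketch follows the generic RISC-V definition of `csrrw`, `csrrs` and `csrrc` and does not model privilege checks or the `rd=x0` special case. For the immediate variants the operand would be the zero-extended 5-bit immediate instead of a register value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation selector for the model below. */
typedef enum { CSR_OP_RW, CSR_OP_RS, CSR_OP_RC } csr_op_t;

/* Applies one CSR instruction to *csr and returns the value that
 * would be written to rd (the old CSR content). */
uint32_t csr_execute(csr_op_t op, uint32_t *csr, uint32_t operand)
{
    assert(csr != NULL);
    const uint32_t old = *csr;

    switch (op) {
        case CSR_OP_RW: *csr = operand;        break; /* atomic swap        */
        case CSR_OP_RS: *csr = old | operand;  break; /* set operand bits   */
        case CSR_OP_RC: *csr = old & ~operand; break; /* clear operand bits */
    }
    return old;
}
```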

**`wfi` Instruction**

The "wait for interrupt" instruction `wfi` acts like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, at least one interrupt source has to
be enabled via the <<_mie>> CSR and the global interrupt enable flag in <<_mstatus>> has to be set.

If the <<_mstatus>> `TW` bit is cleared the `wfi` instruction is also allowed to execute when in user-mode.
This is always the case if user-mode is not implemented. If the `TW` bit is set the execution of `wfi` in
user-mode will raise an illegal instruction exception.
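
The two `wfi` conditions described above can be written down as a pair of predicates. This is a minimal host-side sketch with hypothetical function names. The bit positions of `MIE` (bit 3) and `TW` (bit 21) in `mstatus` are the ones defined by the RISC-V privileged specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_MIE (UINT32_C(1) << 3)  /* global machine interrupt enable */
#define MSTATUS_TW  (UINT32_C(1) << 21) /* timeout wait */

/* True if a sleeping CPU wakes up: the global enable flag is set and
 * at least one pending interrupt source is also enabled in mie. */
bool wfi_wakes_up(uint32_t mstatus, uint32_t mie, uint32_t mip)
{
    return ((mstatus & MSTATUS_MIE) != 0u) && ((mie & mip) != 0u);
}

/* True if executing wfi raises an illegal instruction exception. */
bool wfi_is_illegal(uint32_t mstatus, bool user_mode)
{
    return user_mode && ((mstatus & MSTATUS_TW) != 0u);
}
```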

==== **`Zicntr`** CPU Base Counters

The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.

Additional CSRs:

* <<_cycleh>>, <<_mcycleh>> - cycle counter
* <<_instreth>>, <<_minstreth>> - instructions-retired counter
* <<_timeh>> - system _wall-clock_ time

[NOTE]
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.

If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
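
Because each 64-bit counter is split into a low-word and a high-word CSR, software on a 32-bit core usually reads the high word twice to detect a carry between the two reads. The following C sketch shows this commonly used pattern on a host machine. All names are hypothetical: `csr_read_fn` stands in for a single CSR read instruction on the target, and `sim_counter` with its two accessors is only a stand-in for the hardware counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor type: returns the current value of one 32-bit CSR. */
typedef uint32_t (*csr_read_fn)(void);

/* Reads a consistent 64-bit value from a low/high CSR pair. */
uint64_t counter_read64(csr_read_fn read_low, csr_read_fn read_high)
{
    assert(read_low != NULL && read_high != NULL);
    uint32_t high, low, high_again;

    do {
        high       = read_high();
        low        = read_low();
        high_again = read_high();
    } while (high != high_again); /* low word overflowed in between: retry */

    return ((uint64_t)high << 32) | low;
}

/* Host-side stand-in for a free-running counter (illustration only).
 * It advances by one on every low-word read. */
uint64_t sim_counter;

uint32_t sim_read_low(void)
{
    uint32_t value = (uint32_t)sim_counter;
    sim_counter += 1;
    return value;
}

uint32_t sim_read_high(void)
{
    return (uint32_t)(sim_counter >> 32);
}
```

With `sim_counter` preset to `0x00000001FFFFFFFF`, the first pass observes a changed high word and the loop repeats once, so the returned value is never a mix of an old high word and a new low word.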

==== **`Zihpm`** Hardware Performance Monitors

In addition to the base cycle, instructions-retired and time counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
<<_hpm_cnt_width>> generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.

The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.
The actual number of implemented HPM counters is defined by the <<_hpm_num_cnts>> generic.

Additional CSRs:

* <<_mhpmevent>> 3..31 (depending on <<_hpm_num_cnts>>) - event configuration CSRs
* <<_mhpmcounterh>> 3..31 (depending on <<_hpm_num_cnts>>) - counter CSRs

[IMPORTANT]
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according <<_mcounteren>> CSR bits
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
exception.

[TIP]
Auto-increment of the HPMs can be deactivated individually via the <<_mcountinhibit>> CSR.
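
The interaction of counter width and `mcountinhibit` can be sketched as a single increment step. This is a host-side illustration with a hypothetical function name. It assumes that a counter narrower than 64 bits wraps at 2^N and that its unimplemented upper bits read as zero, which this section does not state explicitly. The use of `mcountinhibit` bit _i_ for HPM counter _i_ follows the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdint.h>

/* One increment step of HPM counter `index` (3..31).
 * width: number of implemented counter bits (0..64).
 * event: non-zero if the configured event occurred in this step. */
uint64_t hpm_step(uint64_t counter, unsigned index, unsigned width,
                  uint32_t mcountinhibit, int event)
{
    assert(index >= 3u && index <= 31u);
    assert(width <= 64u);

    if (width == 0u) {
        return 0u; /* no counter bits implemented */
    }
    const uint64_t mask =
        (width == 64u) ? UINT64_MAX : ((UINT64_C(1) << width) - 1u);
    const int inhibited = (int)((mcountinhibit >> index) & 1u);

    if (event && !inhibited) {
        counter += 1u;
    }
    return counter & mask; /* assumed: wrap-around at 2^width */
}
```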

==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the <<_cpu_extension_riscv_zifencei>> configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.

==== **`Zxcfu`** Custom Instructions Extension (CFU)

The `Zxcfu` extension is a NEORV32-specific _custom RISC-V_ ISA extension (`Z` = sub-extension, `x` = platform-specific
custom extension, `cfu` = name of the custom extension). When enabled via the <<_cpu_extension_riscv_zxcfu>> configuration
generic, this ISA extension adds the <<_custom_functions_unit_cfu>> to the CPU core. The CFU is a module that
allows adding **custom RISC-V instructions** to the processor core.

The CFU is implemented as an ALU co-processor and is integrated right into the CPU's pipeline, providing minimal data
transfer latency as it has direct access to the core's register file. Up to 1024 custom instructions can be
implemented within the CFU. These instructions are mapped to an OPCODE space that has been explicitly reserved by
the RISC-V spec for custom extensions.

Software can utilize the custom instructions by using _intrinsic functions_, which are inline assembly functions that
behave like "regular" C functions.
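
One way to arrive at 1024 instruction slots is to select the custom operation through the `funct7` (7-bit) and `funct3` (3-bit) fields of an R-type instruction word, which gives 2^10 combinations. The following C sketch encodes such a word using the RISC-V _custom-0_ major opcode (`0b0001011`). This mapping is an assumption made for illustration and the function name is hypothetical. The CFU section remains the authoritative description of the actual encoding.

```c
#include <assert.h>
#include <stdint.h>

#define OPCODE_CUSTOM0 0x0Bu /* RISC-V "custom-0" major opcode (0b0001011) */

/* Assembles a 32-bit R-type instruction word in the custom-0 opcode space. */
uint32_t cfu_encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                          uint32_t funct3, uint32_t rd)
{
    assert(funct7 < 128u && funct3 < 8u);        /* 7-bit and 3-bit selectors */
    assert(rs1 < 32u && rs2 < 32u && rd < 32u);  /* 5-bit register indices    */

    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | OPCODE_CUSTOM0;
}
```

An intrinsic function would typically emit such a word from inline assembly, so the compiler needs no knowledge of the custom instruction itself.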

[TIP]
For more information regarding the CFU see section <<_custom_functions_unit_cfu>>.

[TIP]
The CFU / `Zxcfu` ISA extension is intended for application-specific _instructions_.
If you would like to add more complex accelerators or interfaces that can also operate independently of
the CPU, take a look at the memory-mapped <<_custom_functions_subsystem_cfs>>.

==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) provides an elementary memory protection mechanism that can be used
to constrain read, write and execute rights of arbitrary memory regions. The PMP is compatible
to the _RISC-V Privileged Architecture Specifications_. For detailed information see the according spec.'s sections.

[IMPORTANT]
The NEORV32 PMP only supports **TOR** (top of region) mode, which basically is a "base-and-bound" concept, and only
up to 16 PMP regions.

The physical memory protection logic is implemented if the <<_pmp_num_regions>> configuration generic is greater
than zero. This generic also defines the total number of available configurable protection
regions. The minimal granularity of a protected region is defined by the <<_pmp_min_granularity>> generic. Larger
granularity will reduce hardware complexity but will also make the protection coarser as the minimal region size increases.
The default value is 4 bytes, which allows a minimal region size of 4 bytes.

If implemented the PMP provides the following additional CSRs:

* <<_pmpcfg>> 0..3 (depending on configuration) - PMP configuration registers, 4 entries per CSR
* <<_pmpaddr>> 0..15 (depending on configuration) - PMP address registers

**Operation Summary**

Any CPU access address (from the instruction fetch or data access interface) is tested if it matches _any_
of the specified PMP regions. If there is a match, the configured access rights are enforced:

* a write access (store) will fail if no **write** attribute is set
* a read access (load) will fail if no **read** attribute is set
* an instruction fetch access will fail if no **execute** attribute is set

If an access to a protected region does not have the according access rights it will raise the according
instruction/load/store _bus access fault_ exception.

By default, all PMP checks are enforced for user-mode only. However, PMP rules can also be enforced for
machine-mode when the according PMP region has the "LOCK" bit set. This will also prevent any write access
to the according region's PMP CSRs until the CPU is reset.

.Rule Prioritization
[IMPORTANT]
All rules are checked in parallel **without** prioritization so for identical memory regions the most restrictive
PMP rule will be enforced.
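
The checks summarized above can be sketched as a host-side C model. This is an illustration under stated assumptions, not the hardware implementation: the names are hypothetical, the `pmpcfg` bit layout (R/W/X in bits 0..2, `A` field in bits 4:3 with TOR = 1, `L` in bit 7) and the word-granular `pmpaddr` format are taken from the RISC-V privileged specification, and the result for an access that matches no enforced region is reported separately because this section does not define it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PMP_R      0x01u /* read permission    */
#define PMP_W      0x02u /* write permission   */
#define PMP_X      0x04u /* execute permission */
#define PMP_A_MASK 0x18u /* address matching mode field */
#define PMP_A_TOR  0x08u /* top-of-region mode */
#define PMP_L      0x80u /* lock bit: rule also applies to machine-mode */

typedef enum { PMP_NO_MATCH, PMP_GRANTED, PMP_DENIED } pmp_result_t;

/* cfg[i] and addr_regs[i] describe region i; addr_regs hold address bits [33:2].
 * access is exactly one of PMP_R, PMP_W or PMP_X. */
pmp_result_t pmp_check(const uint8_t *cfg, const uint32_t *addr_regs,
                       size_t num_regions, uint32_t address,
                       uint8_t access, bool machine_mode)
{
    assert(cfg != NULL && addr_regs != NULL);
    assert(access == PMP_R || access == PMP_W || access == PMP_X);

    const uint32_t word = address >> 2;
    pmp_result_t result = PMP_NO_MATCH;

    for (size_t i = 0; i < num_regions; i++) {
        if ((cfg[i] & PMP_A_MASK) != PMP_A_TOR) {
            continue; /* region is disabled */
        }
        const uint32_t base = (i == 0) ? 0u : addr_regs[i - 1];
        if (word < base || word >= addr_regs[i]) {
            continue; /* address outside [base, top) */
        }
        if (machine_mode && !(cfg[i] & PMP_L)) {
            continue; /* unlocked rules are not enforced for machine-mode */
        }
        if (!(cfg[i] & access)) {
            return PMP_DENIED; /* the most restrictive matching rule wins */
        }
        result = PMP_GRANTED;
    }
    return result;
}
```

Returning `PMP_DENIED` as soon as any matching rule lacks the requested attribute mirrors the "no prioritization" behavior: the order of the regions has no influence on the result.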

.PMP Example Program
[TIP]
A simple PMP example program can be found in `sw/example/demo_pmp`.

**Impact on Critical Path**

When implementing more PMP regions than a "_certain critical limit_" an **additional register stage** is automatically
inserted into the CPU's memory interfaces to keep the impact on the critical path as small as possible.
Unfortunately, this will also increase the latency of instruction fetches and data accesses by one cycle.

The _critical limit_ can be modified by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`, default value = 8):

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----

[TIP]
Using a coarser PMP granularity (a larger minimal region size) via the <<_pmp_min_granularity>> top entity generic
will also reduce hardware utilization and impact on the critical path.

<<<
// ####################################################################################################################
include::cpu_cfu.adoc[]

<<<
// ####################################################################################################################
:sectnums:

=== Instruction Timing

The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.

Average CPI (cycles per instruction) values for "real applications", such as executing the CoreMark benchmark, for different CPU
configurations are presented in <<_cpu_performance>>.

.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU            | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU            | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU            | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU            | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches       | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches       | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls  | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls  | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access  | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access  | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access  | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`   | `mul` `mulh` `mulhsu` `mulhu` | 2+32+2; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 4
| Division       | `M`   | `div` `divu` `rem` `remu`     | 2+32+2
| CSR access     | `Zicsr`     | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 3
| System         | `I/E`       | `fence` | 3
| System         | `Zicsr`     | `ecall` `ebreak` | 3
| System         | `Zicsr`+`C` | `c.ebreak` | 3
| System         | `Zicsr`     | `mret` `wfi` | 6
| System         | `Zifencei`  | `fence.i` | 3 + ML
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare    | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc       | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Bit-manipulation - arithmetic/logic    | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - arithmetic/logic    | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - shifts              | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts              | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts              | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit          | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add         | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
| Custom instructions (CFU) | `Zxcfu` | - | min. 4
| | | |
| _Illegal instructions_    | `Zicsr` | - | 2
|=======================

[NOTE]
The presented values of the *floating-point execution cycles* are average values - obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
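
The data-dependent entries of the timing table can be turned into small helper functions, for example for cycle estimates in a host-side simulator. The following C sketch covers only the base-ISA shift and the multiplication rows. The function names are hypothetical and the `TINY_SHIFT` variant is left out because the table gives only a range for it.

```c
#include <assert.h>

/* Cycles of a base-ISA shift (slli, srl, ...): 3 + SA/4 + SA%4 for the
 * default shifter, or 4 cycles with the barrel shifter (FAST_SHIFT_EN). */
unsigned shift_cycles(unsigned shift_amount, int fast_shift_en)
{
    assert(shift_amount < 32u); /* rv32 shift amounts are 0..31 */
    if (fast_shift_en) {
        return 4u;
    }
    return 3u + shift_amount / 4u + shift_amount % 4u;
}

/* Cycles of mul/mulh/mulhsu/mulhu: 2+32+2, or 4 with FAST_MUL_EN. */
unsigned mul_cycles(int fast_mul_en)
{
    return fast_mul_en ? 4u : (2u + 32u + 2u);
}
```

For the default shifter the worst case is a shift by 31 positions: 3 + 7 + 3 = 13 cycles.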

<<<
// ####################################################################################################################
include::cpu_csr.adoc[]

<<<
// ####################################################################################################################
:sectnums:

==== Traps, Exceptions and Interrupts

In this document the following nomenclature regarding traps is used:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the <<_mtvec>>
CSR. The cause of the according interrupt or exception can be determined via the content of the <<_mcause>>
CSR. The address that reflects the current program counter when a trap was taken is stored to the <<_mepc>> CSR.
Additional information regarding the cause of the trap can be retrieved from the <<_mtval>> CSR and the processor's
<<_internal_bus_monitor_buskeeper>> (for memory access exceptions).

The traps are prioritized. If several _synchronous exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _asynchronous exceptions_ (interrupts) trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
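
The interrupt arbitration described above can be sketched as a lookup over a priority list. This is a host-side illustration with a hypothetical function name. The priority order is passed in by the caller because the actual NEORV32 order is defined by the trap listing table and is not assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the mip/mie bit index of the interrupt to service next, or -1 if
 * no enabled interrupt is pending. `priority` lists bit indices from highest
 * to lowest priority. Interrupts that are not selected simply stay pending. */
int next_interrupt(uint32_t mip, uint32_t mie,
                   const uint8_t *priority, size_t count)
{
    assert(priority != NULL);
    const uint32_t active = mip & mie;

    for (size_t i = 0; i < count; i++) {
        assert(priority[i] < 32u);
        if (active & (UINT32_C(1) << priority[i])) {
            return (int)priority[i];
        }
    }
    return -1;
}
```

Calling the function again after the selected pending bit has been cleared yields the interrupt with the next lower priority, which corresponds to the "serviced one after another" behavior. As an example, the order `{11, 3, 7}` would rank the standard RISC-V machine external, software and timer interrupts in that sequence.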

.Interrupt Signal Requirements - Standard RISC-V Interrupts
[IMPORTANT]
All standard RISC-V interrupt request signals are **high-active**. A request has to stay at high-level (=asserted)
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).

.Interrupt Signal Requirements - Fast Interrupt Requests
[IMPORTANT]
The NEORV32-specific FIRQ request lines are triggered by a one-shot high-level (i.e. rising edge). Each request is buffered in the CPU control
unit until the channel is either disabled (by clearing the according <<_mie>> CSR bit) or the request is explicitly cleared (by writing
zero to the according <<_mip>> CSR bit).
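
The FIRQ buffering rules can be sketched as a small state machine. This host-side C model uses hypothetical names and a plain 16-bit channel mask instead of the real `mie`/`mip` bit positions. It also assumes that a request arriving on a disabled channel is not buffered, which goes slightly beyond what is stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t pending;   /* buffered requests, one bit per FIRQ channel */
    uint16_t enable;    /* per-channel enable bits */
    uint16_t last_line; /* request line levels seen in the previous step */
} firq_model_t;

/* One clock step: latch rising edges on the request lines and drop
 * buffered requests of channels that are disabled. */
void firq_clock(firq_model_t *m, uint16_t lines)
{
    assert(m != NULL);
    const uint16_t rising = (uint16_t)(lines & ~m->last_line);

    m->pending   = (uint16_t)((m->pending | rising) & m->enable);
    m->last_line = lines;
}

/* Software clears a buffered request (models writing zero to the pending bit). */
void firq_clear(firq_model_t *m, unsigned channel)
{
    assert(m != NULL && channel < 16u);
    m->pending = (uint16_t)(m->pending & ~(1u << channel));
}
```

A request stays buffered after its line returns to low level, which is the main difference to the level-sensitive standard RISC-V interrupt lines.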

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations - interrupts can only trigger _between_ two instructions.
So even if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.

:sectnums:
===== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus/memory read operation at all. Likewise, exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus/memory write operation at all.

:sectnums:
===== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the <<_mie>> and <<_mip>> CSRs and also
provide custom trap codes in <<_mcause>>. These FIRQs are reserved for NEORV32 processor-internal usage only.

:sectnums:
===== NEORV32 Trap Listing

The following table shows all traps that are currently supported by the NEORV32 CPU. It also shows the prioritization
and the CSR side-effects. A more detailed description of the actual trap triggering events is provided in a further table.

[NOTE]
_Asynchronous exceptions_ (= interrupts) set the MSB of <<_mcause>> while _synchronous exceptions_ (= "software exceptions")
clear the MSB.

**Table Annotations**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the corresponding trap that is written to the <<_mcause>> CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture spec. The "ID [C]" names are defined by the NEORV32 core library (the runtime environment _RTE_) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the values written to the <<_mepc>> and <<_mtval>> CSRs when a trap is triggered:

* **IPC** - address of interrupted instruction (instruction has not been executed yet)
* **PC** - address of instruction that caused the trap
* **ADR** - bad memory access address that caused the trap
* **INST** - the faulting instruction word itself
* **0** - zero

.NEORV32 Trap Listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause`     | [RISC-V] | ID [C]                   | Cause                                  | `mepc`  | `mtval`
7+^| **Synchronous Exceptions**
| 1     | `0x00000000` | 0.0      | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned         | **PC**  | **ADR**
| 2     | `0x00000001` | 0.1      | _TRAP_CODE_I_ACCESS_     | instruction access bus fault           | **PC**  | **ADR**
| 3     | `0x00000002` | 0.2      | _TRAP_CODE_I_ILLEGAL_    | illegal instruction                    | **PC**  | **INST**
| 4     | `0x0000000B` | 0.11     | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall`) | **PC**  | **0**
| 5     | `0x00000008` | 0.8      | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall`) | **PC**  | **0**
| 6     | `0x00000003` | 0.3      | _TRAP_CODE_BREAKPOINT_   | software breakpoint (`ebreak`)         | **PC**  | **0**
| 7     | `0x00000006` | 0.6      | _TRAP_CODE_S_MISALIGNED_ | store address misaligned               | **PC**  | **ADR**
| 8     | `0x00000004` | 0.4      | _TRAP_CODE_L_MISALIGNED_ | load address misaligned                | **PC**  | **ADR**
| 9     | `0x00000007` | 0.7      | _TRAP_CODE_S_ACCESS_     | store access bus fault                 | **PC**  | **ADR**
| 10    | `0x00000005` | 0.5      | _TRAP_CODE_L_ACCESS_     | load access bus fault                  | **PC**  | **ADR**
7+^| **Asynchronous Exceptions (Interrupts)**
| 11    | `0x80000010` | 1.16     | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0       | **IPC** | **0**
| 12    | `0x80000011` | 1.17     | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1       | **IPC** | **0**
| 13    | `0x80000012` | 1.18     | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2       | **IPC** | **0**
| 14    | `0x80000013` | 1.19     | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3       | **IPC** | **0**
| 15    | `0x80000014` | 1.20     | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4       | **IPC** | **0**
| 16    | `0x80000015` | 1.21     | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5       | **IPC** | **0**
| 17    | `0x80000016` | 1.22     | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6       | **IPC** | **0**
| 18    | `0x80000017` | 1.23     | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7       | **IPC** | **0**
| 19    | `0x80000018` | 1.24     | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8       | **IPC** | **0**
| 20    | `0x80000019` | 1.25     | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9       | **IPC** | **0**
| 21    | `0x8000001a` | 1.26     | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10      | **IPC** | **0**
| 22    | `0x8000001b` | 1.27     | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11      | **IPC** | **0**
| 23    | `0x8000001c` | 1.28     | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12      | **IPC** | **0**
| 24    | `0x8000001d` | 1.29     | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13      | **IPC** | **0**
| 25    | `0x8000001e` | 1.30     | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14      | **IPC** | **0**
| 26    | `0x8000001f` | 1.31     | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15      | **IPC** | **0**
| 27    | `0x8000000B` | 1.11     | _TRAP_CODE_MEI_          | machine external interrupt (MEI)       | **IPC** | **0**
| 28    | `0x80000003` | 1.3      | _TRAP_CODE_MSI_          | machine software interrupt (MSI)       | **IPC** | **0**
| 29    | `0x80000007` | 1.7      | _TRAP_CODE_MTI_          | machine timer interrupt (MTI)          | **IPC** | **0**
|=======================
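
As a minimal sketch of how the `mcause` values from the trap listing decompose, the following helpers split a raw value into the interrupt flag (MSB) and the cause code. The function names are hypothetical and not part of the NEORV32 software library; only the bit layout and the FIRQ code range 16 to 31 are taken from the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the trap is an interrupt (asynchronous exception): MSB of mcause is set. */
static bool trap_is_interrupt(uint32_t mcause)
{
    return (mcause >> 31) != 0u;
}

/* Cause code without the interrupt flag (the number after the dot in the "[RISC-V]" column). */
static uint32_t trap_cause_code(uint32_t mcause)
{
    return mcause & 0x7FFFFFFFu;
}

/* FIRQ channel number (0..15) for a fast interrupt request, or -1 for any other trap. */
static int trap_firq_channel(uint32_t mcause)
{
    uint32_t code = trap_cause_code(mcause);
    if (trap_is_interrupt(mcause) && code >= 16u && code <= 31u) {
        return (int)(code - 16u);
    }
    return -1;
}
```

For example, `0x8000001a` decodes to an interrupt with cause code 26, which is FIRQ channel 10.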

The following table provides a summarized description of the actual events for triggering a specific trap.

.NEORV32 Trap Description
[cols="<3,<7"]
[options="header",grid="rows"]
|=======================
| Trap ID [C] | Triggered when ...
| _TRAP_CODE_I_MISALIGNED_ | fetching a 32-bit instruction word that is not 32-bit-aligned (_see note below!_)
| _TRAP_CODE_I_ACCESS_     | bus timeout or bus error during instruction word fetch
| _TRAP_CODE_I_ILLEGAL_    | trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation
| _TRAP_CODE_MENV_CALL_    | executing `ecall` instruction in machine-mode
| _TRAP_CODE_UENV_CALL_    | executing `ecall` instruction in user-mode
| _TRAP_CODE_BREAKPOINT_   | executing `ebreak` instruction
| _TRAP_CODE_S_MISALIGNED_ | storing data to an address that is not naturally aligned to the data size (byte, half, word) being stored
| _TRAP_CODE_L_MISALIGNED_ | loading data from an address that is not naturally aligned to the data size (byte, half, word) being loaded
| _TRAP_CODE_S_ACCESS_     | bus timeout or bus error during store data operation
| _TRAP_CODE_L_ACCESS_     | bus timeout or bus error during load data operation
| _TRAP_CODE_FIRQ_0_ ... _TRAP_CODE_FIRQ_15_ | caused by interrupt-condition of processor-internal modules, see <<_neorv32_specific_fast_interrupt_requests>>
| _TRAP_CODE_MEI_          | user-defined processor-external source (via dedicated top-entity signal)
| _TRAP_CODE_MSI_          | user-defined processor-external source (via dedicated top-entity signal)
| _TRAP_CODE_MTI_          | processor-internal machine timer overflow OR user-defined processor-external source (via dedicated top-entity signal)
|=======================
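
The "naturally aligned" condition used for the load/store misalignment traps can be expressed as a one-line check. This is a sketch only; the function name is hypothetical and the size is assumed to be 1, 2 or 4 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is naturally aligned for an access of size_bytes (1, 2 or 4). */
static bool is_naturally_aligned(uint32_t addr, uint32_t size_bytes)
{
    /* For a power-of-two size, the low bits of the address have to be zero. */
    return (addr & (size_bytes - 1u)) == 0u;
}
```

Byte accesses are always aligned, half-word accesses require bit 0 of the address to be zero, and word accesses require bits 1 and 0 to be zero.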

.Misaligned Instruction Address Exception
[NOTE]
For 32-bit-only instructions (= no `C` extension) the misaligned instruction exception
is raised if bit 1 of the fetch address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
there will never be a misaligned instruction exception _at all_.
In both cases bit 0 of the program counter (and all related CSRs) is hardwired to zero.
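
The rule from the note above can be summarized in a few lines of C. This is an illustrative sketch with a hypothetical function name; `c_ext_enabled` stands for the `C` extension being implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if fetching from fetch_addr would raise a misaligned instruction exception. */
static bool instr_addr_misaligned(uint32_t fetch_addr, bool c_ext_enabled)
{
    if (c_ext_enabled) {
        return false; /* 16-bit alignment is sufficient; bit 0 is hardwired to zero */
    }
    return (fetch_addr & 0x2u) != 0u; /* bit 1 set: not on a 32-bit boundary */
}
```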

<<<

// ####################################################################################################################
:sectnums:
==== Bus Interface

The NEORV32 CPU implements a 32-bit machine with separated instruction and data interfaces making the CPU a
**Harvard Architecture**: the _instruction fetch interface_ (`i_bus_*`) is used for fetching instructions and the
_data access interface_ (`d_bus_*`) is used to access data via load and store operations.
Each of these interfaces can access an address space of up to 2^32^ bytes (4 GB).

The following table shows the signals of the data and instruction interfaces as seen from the CPU (`*_o` signals are driven
by the CPU / outputs, `*_i` signals are read by the CPU / inputs). Both interfaces use the same protocol.

.CPU bus interfaces
[cols="<2,^1,^1,<6"]
[options="header",grid="rows"]
|=======================
| Signal             | Width | Direction | Description
| `i/d_bus_addr_o`   | 32    | out       | access address
| `i/d_bus_rdata_i`  | 32    | in        | data input for read operations
| `i/d_bus_wdata_o`  | 32    | out       | data output for write operations
| `i/d_bus_ben_o`    | 4     | out       | byte enable signal for write operations
| `i/d_bus_we_o`     | 1     | out       | bus write access (always zero for instruction fetches)
| `i/d_bus_re_o`     | 1     | out       | bus read access
| `i/d_bus_lock_o`   | 1     | out       | exclusive access request
| `i/d_bus_ack_i`    | 1     | in        | accessed peripheral indicates a successful completion of the bus transaction
| `i/d_bus_err_i`    | 1     | in        | accessed peripheral indicates an error during the bus transaction
| `i/d_bus_fence_o`  | 1     | out       | this signal is set for one cycle when the CPU executes an instruction/data fence operation
| `i/d_bus_priv_o`   | 2     | out       | current CPU privilege level
|=======================

.Pipelined Transfers
[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly" (pending) at once. However, this is no real drawback. The
minimal possible latency for a single access is two cycles, which equals the CPU's minimal execution latency
for a single instruction.

.Unaligned Memory Accesses
[NOTE]
Please note that the NEORV32 CPU does not support the handling of unaligned memory accesses _in hardware_. Any
unaligned memory access will raise an exception that can be used to handle such accesses in _software_.

:sectnums:
===== Protocol

An actual bus request is triggered either by the `*_bus_re_o` signal (for reading data) or by the `*_bus_we_o` signal
(for writing data). In case of a request, one of these signals is high for exactly one cycle. The transaction is
completed when the accessed peripheral/memory either sets the `*_bus_ack_i` signal (-> successful completion) or the
`*_bus_err_i` signal (-> failed completion). These bus response signals are also active for only one cycle.
An error indicated by the `*_bus_err_i` signal will raise the corresponding "instruction bus access fault" or
"load/store bus access fault" exception.

**Minimal Response Latency**

The transfer can be completed directly in the same cycle as it was initiated (via the `*_bus_re_o` or `*_bus_we_o`
signal) if the peripheral sets `*_bus_ack_i` or `*_bus_err_i` high for one cycle. However, in order to shorten the
critical path such "asynchronous" completion should be avoided. The default NEORV32 processor-internal modules provide
exactly **one cycle delay** between initiation and completion of transfers.

**Maximal Response Latency**

Processor-internal peripherals or memories do not have to respond within one cycle after a bus request has been initiated.
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window
is defined by the global `max_proc_int_response_time_c` constant (default = 15 cycles; processor's VHDL package file `rtl/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer (neither the `*_bus_ack_i`
nor the `*_bus_err_i` signal from the **processor-internal bus** has been set) will time out and raise a **bus fault exception**.
The <<_internal_bus_monitor_buskeeper>> keeps track of all _internal_ bus transactions to enforce this time window.

If any bus operation times out (for example when accessing "address space holes") the BUSKEEPER will issue a bus
error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However,
the external memory bus interface also provides an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
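
The response time window can be illustrated with a simple cycle-counting model. This is a sketch under assumptions, not the BUSKEEPER implementation: the names are hypothetical, the constant mirrors the default value of `max_proc_int_response_time_c`, and whether a response in exactly the last cycle of the window is still accepted is an assumption of this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the default of max_proc_int_response_time_c (assumed window length in cycles). */
#define MAX_PROC_INT_RESPONSE_TIME 15u

typedef enum { BUS_ACK, BUS_ERR, BUS_TIMEOUT } bus_result_t;

/* Models one bus transaction.
 * latency_cycles: cycles between request and device response, negative if the device never responds.
 * device_error:   true if the device responds with an error instead of an acknowledge. */
static bus_result_t bus_wait(int latency_cycles, bool device_error)
{
    for (unsigned cycle = 0u; cycle <= MAX_PROC_INT_RESPONSE_TIME; cycle++) {
        if (latency_cycles >= 0 && (unsigned)latency_cycles == cycle) {
            return device_error ? BUS_ERR : BUS_ACK;
        }
    }
    return BUS_TIMEOUT; /* no response within the window: the CPU sees a bus fault */
}
```

An access to an unmapped address ("address space hole") corresponds to `bus_wait(-1, false)` in this model and ends in `BUS_TIMEOUT`.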

.Interface Response
[NOTE]
Please note that any CPU access via the data or instruction interface has to be terminated by asserting either the
CPU's `*_bus_ack_i` or `*_bus_err_i` signal. Otherwise the CPU will be stalled permanently. The BUSKEEPER ensures that
any kind of access is always properly terminated.

**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================

**Write Access**

For a write access, the access address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.
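
As an illustration of how the 4-bit byte enable signal could relate to the access address and size, the following sketch derives a mask for an aligned store. The function name is hypothetical, and the mapping of byte lane _i_ to bit _i_ of the mask is an assumption based on the little-endian organization, not a statement about the actual hardware.

```c
#include <stdint.h>

/* Returns an assumed 4-bit byte enable mask for an aligned store of size_bytes (1, 2 or 4).
 * Returns 0 for an unsupported size. */
static uint8_t bus_byte_enable(uint32_t addr, uint32_t size_bytes)
{
    switch (size_bytes) {
        case 1u: return (uint8_t)(0x1u << (addr & 0x3u)); /* one of four byte lanes */
        case 2u: return (uint8_t)(0x3u << (addr & 0x2u)); /* lower or upper half-word */
        case 4u: return 0xFu;                             /* all four byte lanes */
        default: return 0u;
    }
}
```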

**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to provide the read data in the same cycle in which
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).

**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
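
Since instruction fetches always use word boundaries, a compressed instruction has to be selected from the fetched 32-bit word. The following sketch shows one way to model this; the function names are hypothetical and the half-word selection via bit 1 of the program counter assumes the little-endian layout.

```c
#include <stdint.h>

/* Word-aligned address that is actually presented on the instruction interface. */
static uint32_t fetch_word_addr(uint32_t pc)
{
    return pc & ~0x3u;
}

/* Selects the 16-bit parcel addressed by pc from the fetched 32-bit word. */
static uint16_t select_halfword(uint32_t word, uint32_t pc)
{
    /* Little-endian: the half-word at the higher address sits in the upper 16 bits. */
    return (uint16_t)((pc & 0x2u) ? (word >> 16) : (word & 0xFFFFu));
}
```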

**Exclusive (Atomic) Access**

**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reservate and store-conditional

The CPU can access memory in an exclusive manner by generating a load-reservate and store-conditional

combination. Normally, these combinations should target the same memory address.

combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reservate instruction_ (`lr.w`). This instruction

The CPU starts an exclusive access to memory via the _load-reservate instruction_ (`lr.w`). This instruction

will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o`. It is the task of

will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o`. It is the task of

the memory system to manage this exclusive access reservation by storing the according access address and

the memory system to manage this exclusive access reservation by storing the according access address and

the source of the access itself (for example via the CPU ID in a multi-core system).

the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is

evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back

evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back

zero and will allow the according store operation to the memory system. If the lock is broken, the

zero and will allow the according store operation to the memory system. If the lock is broken, the

instruction will write-back non-zero and will not generate an actual memory store operation.

instruction will write-back non-zero and will not generate an actual memory store operation.

The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any memory-access operation other than `lr.w`
* when any trap (synchronous or asynchronous) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
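
The set and break conditions listed above can be sketched as a single lock register. This is a simplified
illustration under stated assumptions, not the actual NEORV32 code: apart from `bus_err_i`, all names are
hypothetical; the low-active asynchronous reset and the priority of "break" over "set" are assumptions as well.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: CPU-internal exclusive access lock
entity excl_lock_sketch is
  port (
    clk_i     : in  std_ulogic; -- clock
    rstn_i    : in  std_ulogic; -- reset, low-active, asynchronous (assumption)
    lr_i      : in  std_ulogic; -- assumed strobe: lr.w is being executed
    mem_acc_i : in  std_ulogic; -- assumed strobe: any memory access other than lr.w (including sc.w)
    trap_i    : in  std_ulogic; -- assumed strobe: a trap (sync. or async.) is entered
    bus_err_i : in  std_ulogic; -- bus error reported by the memory system
    lock_o    : out std_ulogic  -- lock state; would drive d_bus_lock_o
  );
end entity excl_lock_sketch;

architecture rtl of excl_lock_sketch is
  signal lock : std_ulogic;
begin

  lock_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      lock <= '0'; -- no reservation after reset
    elsif rising_edge(clk_i) then
      if (mem_acc_i = '1') or (trap_i = '1') or (bus_err_i = '1') then
        lock <= '0'; -- any of the listed situations breaks the lock
      elsif (lr_i = '1') then
        lock <= '1'; -- lr.w sets the lock
      end if;
    end if;
  end process lock_reg;

  lock_o <= lock;

end architecture rtl;
----

Giving the break conditions priority means that an `lr.w` that itself causes a bus error does not leave a
reservation behind.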

[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.

**Memory Barriers**

Whenever the CPU executes a _fence_ instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (for example a cache flush and refill).
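
Since the fence signals are only high for one cycle, a memory system that needs several cycles to react has to
latch the request. The following sketch shows one possible way to do this on the memory-system side. It is an
illustration only: the entity, the `flush_done_i` / `flush_req_o` handshake and the low-active asynchronous reset
are assumptions and not part of the NEORV32 interface.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: latching the one-cycle fence pulse inside a cache controller
entity fence_flush_sketch is
  port (
    clk_i         : in  std_ulogic; -- clock
    rstn_i        : in  std_ulogic; -- reset, low-active, asynchronous (assumption)
    i_bus_fence_i : in  std_ulogic; -- connected to the CPU's i_bus_fence_o (high for one cycle)
    flush_done_i  : in  std_ulogic; -- assumed strobe: cache flush/refill has completed
    flush_req_o   : out std_ulogic  -- assumed level signal: flush request is pending
  );
end entity fence_flush_sketch;

architecture rtl of fence_flush_sketch is
  signal flush_pending : std_ulogic;
begin

  flush_latch: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      flush_pending <= '0';
    elsif rising_edge(clk_i) then
      if (i_bus_fence_i = '1') then
        flush_pending <= '1'; -- capture the pulse; set has priority so no fence is lost
      elsif (flush_done_i = '1') then
        flush_pending <= '0'; -- request has been served
      end if;
    end if;
  end process flush_latch;

  flush_req_o <= flush_pending;

end architecture rtl;
----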

<<<

// ####################################################################################################################

:sectnums:

==== CPU Hardware Reset

In order to reduce routing constraints (and thereby the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.

**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a
write-back of the processing result to some kind of memory. The initial state of the data registers after power-up
is irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data
in the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do
not control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
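
A single stage of such a pipeline could be sketched as follows. This is a generic illustration and not taken from
the NEORV32 sources: the entity, the generic `N`, all port names and the low-active asynchronous reset are
assumptions.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: one pipeline stage with a critical and an uncritical register
entity pipe_stage_sketch is
  generic (
    N : positive := 32 -- width of the data register
  );
  port (
    clk_i   : in  std_ulogic; -- clock
    rstn_i  : in  std_ulogic; -- reset, low-active, asynchronous (assumption)
    valid_i : in  std_ulogic; -- status from the previous stage
    data_i  : in  std_ulogic_vector(N-1 downto 0); -- data from the previous stage
    valid_o : out std_ulogic; -- status register of this stage
    data_o  : out std_ulogic_vector(N-1 downto 0)  -- data register of this stage
  );
end entity pipe_stage_sketch;

architecture rtl of pipe_stage_sketch is
begin

  -- status register: controls the operation, so it needs a defined reset value
  status_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      valid_o <= '0'; -- no valid data in the pipeline after reset
    elsif rising_edge(clk_i) then
      valid_o <= valid_i;
    end if;
  end process status_reg;

  -- data register: uncritical, no reset at all
  data_reg: process(clk_i)
  begin
    if rising_edge(clk_i) then
      data_o <= data_i;
    end if;
  end process data_reg;

end architecture rtl;
----

Whatever `data_o` holds after power-up is never used, because `valid_o` is low until valid data has been clocked in.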

**NEORV32 CPU Reset**

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even control
and status registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers get initialized by the CPU's internal state machines, which are in turn initialized by the main
control engine that actually features a defined reset. Most of the CPU's core CSRs (like
interrupt control) are initialized by software (more specifically, by the `crt0.S` start-up code).
During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example, the machine interrupt-enable CSR <<_mie>>
does not provide a dedicated reset. The value of this register after reset is uncritical as interrupts cannot fire:
the global interrupt enable flag in the status register (`mstatus(mie)`) _does_ provide a dedicated
hardware reset that sets this bit low (globally disabling interrupts).
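
The `mie` example can be illustrated with a strongly simplified sketch of the interrupt gating. This is not the
actual NEORV32 CSR logic: the entity, the write-enable strobes and the reduction function are hypothetical, trap
entry and exit handling is omitted, and only the position of the `MIE` flag (bit 3 of `mstatus`) follows the RISC-V
privileged specification.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: why the reset value of the mie CSR is uncritical
entity irq_gate_sketch is
  port (
    clk_i        : in  std_ulogic; -- clock
    rstn_i       : in  std_ulogic; -- reset, low-active, asynchronous (assumption)
    mstatus_we_i : in  std_ulogic; -- assumed strobe: software writes mstatus
    mie_we_i     : in  std_ulogic; -- assumed strobe: software writes mie
    wdata_i      : in  std_ulogic_vector(31 downto 0); -- CSR write data
    mip_i        : in  std_ulogic_vector(31 downto 0); -- pending interrupts
    irq_fire_o   : out std_ulogic  -- '1' = an interrupt is allowed to fire
  );
end entity irq_gate_sketch;

architecture rtl of irq_gate_sketch is
  signal mstatus_mie : std_ulogic; -- global interrupt enable flag
  signal mie         : std_ulogic_vector(31 downto 0); -- per-source interrupt enables

  -- OR all bits of a vector (helper for this sketch only)
  function or_reduce_f(v : std_ulogic_vector) return std_ulogic is
    variable r : std_ulogic := '0';
  begin
    for i in v'range loop
      r := r or v(i);
    end loop;
    return r;
  end function or_reduce_f;
begin

  csr_regs: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      mstatus_mie <= '0';             -- critical: interrupts are globally disabled after reset
      mie         <= (others => '-'); -- uncritical: don't care, set up later by software
    elsif rising_edge(clk_i) then
      if (mstatus_we_i = '1') then
        mstatus_mie <= wdata_i(3); -- MIE is bit 3 of mstatus
      end if;
      if (mie_we_i = '1') then
        mie <= wdata_i;
      end if;
    end if;
  end process csr_regs;

  -- as long as mstatus_mie is low, the content of mie has no effect
  irq_fire_o <= mstatus_mie and or_reduce_f(mie and mip_i);

end architecture rtl;
----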

**Reset Configuration**

Most CPU-internal registers do provide an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of all uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all CPU registers can
be enabled by setting a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):

[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers --
-- FALSE=reset value is irrelevant (might simplify HW), default; TRUE=defined LOW reset value
constant dedicated_reset_c : boolean := false;
----
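
How such a constant might be used to select the reset value of an uncritical register is sketched below. This is
an assumption-based illustration and not a copy of the NEORV32 sources: the entity, the helper function
`sel_rst_val_f`, the constant `rst_val_c` and the generic `DEDICATED_RESET` (standing in for `dedicated_reset_c` to
keep the example self-contained) are hypothetical.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: configurable reset value for an uncritical register
entity reset_cfg_sketch is
  generic (
    DEDICATED_RESET : boolean := false -- stands in for dedicated_reset_c
  );
  port (
    clk_i  : in  std_ulogic; -- clock
    rstn_i : in  std_ulogic; -- reset, low-active, asynchronous (assumption)
    d_i    : in  std_ulogic_vector(31 downto 0); -- register input
    q_o    : out std_ulogic_vector(31 downto 0)  -- register output
  );
end entity reset_cfg_sketch;

architecture rtl of reset_cfg_sketch is

  -- select the reset value: '0' = defined low level, '-' = don't care
  function sel_rst_val_f(dedicated : boolean) return std_ulogic is
  begin
    if dedicated then
      return '0';
    else
      return '-';
    end if;
  end function sel_rst_val_f;

  constant rst_val_c : std_ulogic := sel_rst_val_f(DEDICATED_RESET);

begin

  uncritical_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      q_o <= (others => rst_val_c); -- with '-' synthesis is free to omit the reset
    elsif rising_edge(clk_i) then
      q_o <= d_i;
    end if;
  end process uncritical_reg;

end architecture rtl;
----

With the default setting the reset branch assigns only don't-care values, so a synthesis tool may implement the
register without any reset logic.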
