OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [docs/] [datasheet/] [cpu.adoc] - Diff between revs 64 and 65

Go to most recent revision | Only display areas with differences | Details | Blame | View Log

Rev 64 Rev 65
:sectnums:
:sectnums:
== NEORV32 Central Processing Unit (CPU)
== NEORV32 Central Processing Unit (CPU)
image::riscv_logo.png[width=350,align=center]
image::riscv_logo.png[width=350,align=center]
**Key Features**
**Key Features**
* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `A` - atomic memory access operations
** `C` - 16-bit compressed instructions
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `U` - less-privileged _user_ mode
** `Zbb` - basic bit-manipulation operations
** `Zbb` - basic bit-manipulation operations
** `Zfinx` - single-precision floating-point unit
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
** `HPM` - hardware performance monitors
** `DB` - debug mode
** `DB` - debug mode
* Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+)
* Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Optional physical memory configuration (PMP), compatible to the RISC-V specifications
* Optional physical memory configuration (PMP), compatible to the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Optional hardware performance monitors (HPM) for application benchmarking
* Separated interfaces for instruction fetch and data access (merged into single bus via a bus switch for
* Separated interfaces for instruction fetch and data access (merged into single bus via a bus switch for
the NEORV32 processor)
the NEORV32 processor)
* little-endian byte order
* little-endian byte order
* Configurable hardware reset
* Configurable hardware reset
* No hardware support of unaligned data/instruction accesses – they will trigger an exception.
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
 
 
[NOTE]
[NOTE]
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also allows to further use the default bootloader and software framework. From this base you
setup also allows to further use the default bootloader and software framework. From this base you
can start building your own SoC. Of course you can also use the CPU in it’s true stand-alone mode.
can start building your own SoC. Of course you can also use the CPU in it’s true stand-alone mode.
[NOTE]
[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== Architecture
=== Architecture
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.
specifications. The following figure shows the simplified architecture of the CPU.
image::neorv32_cpu.png[align=center]
image::neorv32_cpu.png[align=center]
 
 
The CPU uses a pipelined architecture with basically two main stages. The first stage (IF – instruction fetch)
The CPU uses a pipelined architecture with basically two main stages. The first stage (IF - instruction fetch)
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
stored to a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
stored to a FIFO - the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed
instruction words for the next pipeline stage. Compressed instructions - if enabled - are also decompressed
in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions
in this stage. The second stage (EX - execution) is responsible for actually executing the fetched instructions
via the execute engine.
via the execute engine.
These two pipeline stages are based on a multi-cycle processing engine. So the processing of each stage for a
These two pipeline stages are based on a multi-cycle processing engine. So the processing of each stage for a
certain operations can take several cycles. Since the IF and EX stages are decoupled via the instruction
certain operations can take several cycles. Since the IF and EX stages are decoupled via the instruction
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instructions) is 2, but it can be significantly higher: For instance when executing loads/stores
(cycles per instructions) is 2, but it can be significantly higher: For instance when executing loads/stores
multi-cycle operations like divisions or when the instruction fetch engine has to reload the prefetch buffers
multi-cycle operations like divisions or when the instruction fetch engine has to reload the prefetch buffers
due to a taken branch.
due to a taken branch.
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
every single instruction in a series of consecutive micro-operations. The combination of these two classical
every single instruction in a series of consecutive micro-operations. The combination of these two classical
design paradigms allows an increased instruction execution in contrast to a pure multi-cycle approach (due to
design paradigms allows an increased instruction execution in contrast to a pure multi-cycle approach (due to
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).
The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
devices are mapped to a single 32-bit address space making the architecture a modified Von-Neumann
devices are mapped to a single 32-bit address space making the architecture a modified Von-Neumann
Architecture.
Architecture.
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== RISC-V Compatibility
=== RISC-V Compatibility
The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.
NEORV32 processor are located in the repository's `sw/isa-test` folder.
[NOTE]
[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information how to run the tests on the NEORV32.
for information how to run the tests on the NEORV32.
.**RISC-V `rv32_m/C` Tests**
.**RISC-V `rv32_m/C` Tests**
...................................
...................................
Check cadd-01           ... OK
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
Check cxor-01           ... OK
--------------------------------
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................
...................................
.**RISC-V `rv32_m/I` Tests**
.**RISC-V `rv32_m/I` Tests**
...................................
...................................
Check add-01            ... OK
Check add-01            ... OK
Check addi-01           ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xor-01            ... OK
Check xori-01           ... OK
Check xori-01           ... OK
--------------------------------
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................
...................................
.**RISC-V `rv32_m/M` Tests**
.**RISC-V `rv32_m/M` Tests**
...................................
...................................
Check div-01            ... OK
Check div-01            ... OK
Check divu-01           ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check rem-01            ... OK
Check remu-01           ... OK
Check remu-01           ... OK
--------------------------------
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................
...................................
.**RISC-V `rv32_m/privilege` Tests**
.**RISC-V `rv32_m/privilege` Tests**
...................................
...................................
Check ebreak            ... OK
Check ebreak            ... OK
Check ecall             ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................
...................................
.**RISC-V `rv32_m/Zifencei` Tests**
.**RISC-V `rv32_m/Zifencei` Tests**
...................................
...................................
Check Fencei            ... OK
Check Fencei            ... OK
--------------------------------
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................
...................................
<<<
<<<
:sectnums:
:sectnums:
==== RISC-V Incompatibility Issues and Limitations
==== RISC-V Incompatibility Issues and Limitations
This list shows the currently identified issues regarding full RISC-V-compatibility. More specific information
This list shows the currently identified issues regarding full RISC-V-compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.
can be found in section <<_instruction_sets_and_extensions>>.
.Hardwired R/W CSRs
.Hardwired R/W CSRs
[IMPORTANT]
[IMPORTANT]
The `misa`, `mip` and `mtval` CSRs in the NEORV32 are _read-only_.
The `misa`, `mip` and `mtval` CSRs in the NEORV32 are _read-only_.
Any write access to it (in machine mode) to them are ignored and will _not_ cause any exceptions or side-effects.
Any write access to it (in machine mode) to them are ignored and will _not_ cause any exceptions or side-effects.
 
Pending interrupt can only be cleared by acknowledging the interrupt-causing device. However, pending interrupts
 
can still be ignored by clearing the according `mie` register bits.
 
 
.Physical memory protection
.Physical memory protection
[IMPORTANT]
[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
The physical memory protection (see section <<_machine_physical_memory_protection>>)
only supports the modes _OFF_ and _NAPOT_ yet and a minimal granularity of 8 bytes per region.
only supports the modes _OFF_ and _NAPOT_ yet and a minimal granularity of 8 bytes per region.
.Atomic memory operations
.Atomic memory operations
[IMPORTANT]
[IMPORTANT]
The `A` CPU extension only implements the `lr.w` and `sc.w` instructions yet.
The `A` CPU extension only implements the `lr.w` and `sc.w` instructions yet.
However, these instructions are sufficient to emulate all further atomic memory operations.
However, these instructions are sufficient to emulate all further atomic memory operations.
.Instruction Misalignment
.Instruction Misalignment
[NOTE]
[NOTE]
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
is raised if bit 1 of the access address is set (i.e. not on 32-bit boundary). If the `C` extension is implemented
is raised if bit 1 of the access address is set (i.e. not on 32-bit boundary). If the `C` extension is implemented
there will be no misaligned instruction exceptions _at all_.
there will be no misaligned instruction exceptions _at all_.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== CPU Top Entity - Signals
=== CPU Top Entity - Signals
The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.
direction seen from the CPU.
.NEORV32 CPU top entity signals
.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
[options="header", grid="rows"]
|=======================
|=======================
| Signal           | Width | Dir.   | Function
| Signal           | Width | Dir.   | Function
4+^| **Global Signals**
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out | destination address
| `i_bus_addr_o`   |    32 | out | destination address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out | destination address
| `d_bus_addr_o`   |    32 | out | destination address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input (from MTIME)
| `time_i`         |    64 | in  | system time input (from MTIME)
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
| `firq_i`         |    16 | in  | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================
|=======================
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== CPU Top Entity - Generics
=== CPU Top Entity - Generics
Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>).
Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>).
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user defined configuration.
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user defined configuration.
The _specific_ generics are listed below.
The _specific_ generics are listed below.
[cols="4,4,2"]
[cols="4,4,2"]
[frame="all",grid="none"]
[frame="all",grid="none"]
|======
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======
|======
[cols="4,4,2"]
[cols="4,4,2"]
[frame="all",grid="none"]
[frame="all",grid="none"]
|======
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======
|======
[cols="4,4,2"]
[cols="4,4,2"]
[frame="all",grid="none"]
[frame="all",grid="none"]
|======
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======
|======
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== Instruction Sets and Extensions
=== Instruction Sets and Extensions
 
 
The NEORV32 is an RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA
The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see the The _RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
see the the _RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
Volume II: Privileged Architecture_, which are available in the projects `docs/references` folder.
Volume II: Privileged Architecture_, which are available in the projects `docs/references` folder.
[TIP]
[TIP]
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
or by executing an instruction and checking for an _illegal instruction exception_.
or by executing an instruction and checking for an _illegal instruction exception_.
[NOTE]
[NOTE]
Executing an instruction from an extension that is not implemented or not enabled (for example via the according
Executing an instruction from an extension that is not supported yet or that is currently not enabled
top entity generic) will raise an _illegal instruction_ exception.
(via the according top entity generic) will raise an _illegal instruction_ exception.
 
 
==== **`A`** - Atomic Memory Access
==== **`A`** - Atomic Memory Access
 
 
Atomic memory access instructions (for implementing semaphores and mutexes) are available when the
Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions
The RICS-C specs. defines a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
are available:
ISA extension is enabled if the `CPU_EXTENSION_RISCV_A` configuration generic is _true_.
 
In this case the following additional instructions are available:
 
 
* `lr.w`: load-reservate
* `lr.w`: load-reservate
* `sc.w`: store-conditional
* `sc.w`: store-conditional
[NOTE]
[NOTE]
Even though only `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
Even though only `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
(load-modify-write instruction) can be emulated using these two instruction. Furthermore, the
(load-modify-write instruction) can be emulated using these two instruction. Furthermore, the
instruction’s ordering flags (`aq` and `lr`) are ignored by the CPU hardware. Using any other (not yet
instruction's ordering flags (`aq` and `lr`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will trigger an illegal instruction exception.
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.
 
 
 
The *load-reservate* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
 
_data memory access lock_. Executing a *store-conditional* behaves as "normal" store-word instruction (`sw`) that will
 
only conduct an actual memory write operations if the lock is still intact. Additionally, the store-conditional instruction
 
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
 
After the execution of the `sc` instruction, the lock is automatically removed.
 
 
 
The lock is broken if at least one of the following conditions occur:
 
. executing any data memory access instruction other than `lr.w`
 
. raising _any_ t (for example an interrupt or a memory access exception)
 
 
[NOTE]
[NOTE]
The atomic instructions have special requirements for memory system / bus interconnect. More
The atomic instructions have special requirements for memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
==== **`C`** - Compressed Instructions
==== **`C`** - Compressed Instructions
 
 
Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.
_true_. In this case the following instructions are available:
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
 
In this case the following instructions are available:
 
 
* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai` `c.andi`, `c.sub`,
* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai` `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`
[NOTE]
[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ address require
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the required second half-word of that instruction. The performance can be increased
an additional instruction fetch to load the according second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
==== **`E`** - Embedded CPU
==== **`E`** - Embedded CPU
 
 
The embedded CPU extensions reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware
The embedded CPU extensions reduces the size of the general purpose register file from 32 entries to 16 entries to
requirements. This extensions is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond
decrease physical hardware requirements (for example block RAM). This extensions is enabled when the `CPU_EXTENSION_RISCV_E`
`x15` will raise and _illegal instruction exception_.
configuration generic is _true_. Accesses to registers beyond `x15` will raise and _illegal instruction exception_.
 
This extension does not add any additional instructions or features.
 
 
[IMPORTANT]
[IMPORTANT]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
==== **`I`** - Base Integer ISA
==== **`I`** - Base Integer ISA
 
 
The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining exceptions. The base instruction set includes the following
regardless of the setting of the remaining exceptions. The base instruction set includes the following
instructions:
instructions:
 
 
* immediates: `lui`, `auipc`
* immediate: `lui`, `auipc`
* jumps: `jal`, `jalr`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`
* environment: `ecall`, `ebreak`, `fence`
[NOTE]
[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial serial approach. Hence, shift operations
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallels by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
completely in parallels by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
[NOTE]
[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top’s `d_bus_fence_o` signal high for one cycle to inform the memory system a `fence` instruction has been
top’s `d_bus_fence_o` signal high for one cycle to inform the memory system a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignore by the hardware.
executed. Any flags within the `fence` instruction word are ignore by the hardware.
==== **`M`** - Integer Multiplication and Division
==== **`M`** - Integer Multiplication and Division
 
 
Hardware-accelerated integer multiplication and division instructions are available when the
Hardware-accelerated integer multiplication and division operations are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:
available:
* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`
* division: `div`, `divu`, `rem`, `remu`
[NOTE]
[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_ allowing faster execution. Multiplications and divisions
generic is _true_ allowing faster execution. Multiplications and divisions
always require a fixed amount of cycles to complete - regardless of the input operands.
always require a fixed amount of cycles to complete - regardless of the input operands.
==== **`Zmmul`** - Integer Multiplication
==== **`Zmmul`** - Integer Multiplication
This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extensions and is intended for small scale applications, that require hardware-based
of the `M` extensions and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the `M` extension.
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.
 
 
* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.
will raise an _illegal instruction exception_.
Note that `M` and `Zmmul` extensions _cannot_ be enabled at the same time.
Note that `M` and `Zmmul` extensions _cannot_ be enabled at the same time.
[TIP]
[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extensions, it can be "emulated"
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extensions, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=-march=rv32im USER_FLAGS+=-mno-div clean_all exe`).
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
 
 
==== **`U`** - Less-Privileged User Mode
==== **`U`** - Less-Privileged User Mode
 
 
Adds the less-privileged _user mode_ if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For
In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extensions adds a second less-privileged
instance, use-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like
operation mode. It is implemented if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_.
peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode.
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
 
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
 
Any kind of privilege rights violation will raise an exception to allow full virtualization.
 
 
==== **`X`** - NEORV32-Specific (Custom) Extensions
==== **`X`** - NEORV32-Specific (Custom) Extensions
The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.
The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.
The most important points of the NEORV32-specific extensions are:
The most important points of the NEORV32-specific extensions are:
* The CPU provides 16 _fast interrupt_ interrupts (`FIRQ)`, which are controlled via custom bits in the `mie`
* The CPU provides 16 _fast interrupt_ interrupts (`FIRQ)`, which are controlled via custom bits in the `mie`
and `mip` CSR. This extension is mapped to _reserved_ CSR bits, that are available for custom use (according to the
and `mip` CSR. This extension is mapped to _reserved_ CSR bits, that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for `mcause` are implemented.
RISC-V specs). Also, custom trap codes for `mcause` are implemented.
* All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see <<_full_virtualization>>).
* All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see <<_full_virtualization>>).
==== **`Zfinx`** Single-Precision Floating-Point Operations
==== **`Zfinx`** Single-Precision Floating-Point Operations
 
 
[WARNING]
The `Zfinx` floating-point extension is an _alternative_ of the standard `F` floating-point ISA extension.
The NEORV32 `Zfinx` extension is specification-compliant and operational but still _experimental_.
The `Zfinx` extensions also uses the integer register file `x` to store and operate on floating-point data
 
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
The `Zfinx` floating-point extension is an alternative of the `F` floating-point instruction that also uses the
less hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since not dedicated floating-point `f`
register file-related load/store or move instructions.
register file exists, the `Zfinx` extension requires less hardware resources and features faster context changes.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
This also implies that there are NO dedicated `f` register file related load/store or move instructions. The
 
official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
 
 
 
 
[TIP]
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible to the _IEEE-754_ specifications.
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible to the _IEEE-754_ specifications.
 
 
The `Zfinx` extensions only supports single-precision (`.s` suffix) yet (so it is a direct alternative to the `F`
The `Zfinx` extensions only supports single-precision (`.s` instruction suffix), so it is a direct alternative
extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
to the `F` extension. The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_. In this case the following instructions and CSRs are available:
generic is _true_. In this case the following instructions and CSRs are available:
* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
* computational: `fadd.s`, `fsub.s`, `fmul.s`
* computational: `fadd.s`, `fsub.s`, `fmul.s`
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
* number classification: `fclass.s`
* number classification: `fclass.s`
* additional CSRs: `fcsr`, `frm`, `fflags`
* additional CSRs: `fcsr`, `frm`, `fflags`
[WARNING]
[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
[WARNING]
[WARNING]
Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the
Subnormal numbers (exponent = 0) are _flushed to zero_ setting them to +/- 0 before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
result is also flushed to zero during normalization.
[WARNING]
[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).
code (see `sw/example/floating_point_test`).
==== **`Zbb`** Basic Bit-Manipulation Operations
==== **`Zbb`** Basic Bit-Manipulation Operations
 
 
[WARNING]
 
The NEORV32 `Zbb` extension is specification-compliant and operational but still _experimental_.
 
 
 
The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extensions `B`.
The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extensions `B`.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
generic is _true_. In this case the following instructions are available:
generic is _true_. In this case the following instructions are available:
* `andn`, `orn`, `xnor`
* `andn`, `orn`, `xnor`
* `clz`, `ctz`, `cpop`
* `clz`, `ctz`, `cpop`
* `max`, `maxu`, `min`, `minu`
* `max`, `maxu`, `min`, `minu`
* `sext.b`, `sext.h`, `zext.h`
* `sext.b`, `sext.h`, `zext.h`
* `rol`, `ror`, `rori`
* `rol`, `ror`, `rori`
* `orc.b`, `rev8`
* `orc.b`, `rev8`
[TIP]
[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
shift-related `Zbb` instructions.
shift-related `Zbb` instructions.
 
 
[IMPORTANT]
[WARNING]
The `Zbb` extension is frozen but not officially ratified yet. There is no
The `Zbb` extension is frozen but not officially ratified yet. There is no
software support for this extension in the upstream GCC RISC-V port yet. However, an
software support for this extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zbb` extension from C-language
intrinsic library is provided to utilize the provided `Zbb` extension from C-language
code (see `sw/example/bitmanip_test`).
code (see `sw/example/bitmanip_test`).
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
 
 
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) is implemented when the
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are
is implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
available:
In this case the following instructions are available:
 
 
* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* environment: `mret`, `wfi`
* environment: `mret`, `wfi`
[WARNING]
[WARNING]
If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
support at all. In order to provide the full spectrum of functions and to allow a secure executions
In order to provide the full set of functions and to allow a secure execution
environment, the `Zicsr` extension should always be enabled.
environment the `Zicsr` extension should always be enabled.
 
 
[NOTE]
[NOTE]
The "wait for interrupt instruction" `wfi` works like a sleep command. When executed, the CPU is
The "wait for interrupt instruction" `wfi` works like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
 
 
[IMPORTANT]
[NOTE]
The `wfi` instruction will raise an illegal instruction exception when executed outside of machine-mode
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
and <<_mstatus>> bit `TW` (timeout wait) is set.
`TW` (timeout wait) is hardwired to zero.
 
 
==== **`Zifencei`** Instruction Stream Synchronization
==== **`Zifencei`** Instruction Stream Synchronization
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
* `fence.i`
* `fence.i`
 
 
[NOTE]
 
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache to perform a flush/reload.
high for one cycle to inform the memory system (like the i-cache to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignore by the hardware.
Any additional flags within the `fence.i` instruction word are ignore by the hardware.
==== **`PMP`** Physical Memory Protection
==== **`PMP`** Physical Memory Protection
 
 
The NEORV32 physical memory protection (PMP) is compatible to the PMP specified by the RISC-V specs.
The NEORV32 physical memory protection (PMP) is compatible to the RISC-V PMP specifications. It can be used
The CPU PMP only supports _NAPOT_ mode yet and a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured
to constrain memory read/write/execute rights for each available privilege level.
via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the
 
`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available:
The NEORV32 PMP only supports _NAPOT_ mode yet and a minimal region size (granularity) of 8 bytes. Larger
 
minimal sizes can be configured via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements.
 
The physical memory protection system is implemented when the `PMP_NUM_REGIONS` configuration generic is >0.
 
In this case the following additional CSRs are available:
 
 
* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
 
 
 
[TIP]
See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.
See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.
 
 
**Configuration**
 
 
 
The actual number of regions and the minimal region granularity are defined via the top entity
The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
number of available `pmpcfg*` and `pmpaddr*` CSRs.
number of available `pmpcfg*` and `pmpaddr*` CSRs.
When implementing more PMP regions that a _certain critical limit_ *an additional register stage
When implementing more PMP regions that a _certain critical limit_ *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data access by +1 cycle.
increase the latency of instruction fetches and data access by +1 cycle.
The critical limit can be adapted for custom use by a constant from the main VHDL package file
The critical limit can be adapted for custom use by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:
(`rtl/core/neorv32_package.vhd`). The default value is 8:
[source,vhdl]
[source,vhdl]
----
----
-- "critical" number of PMP regions --
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
constant pmp_num_regions_critical_c : natural := 8;
----
----
**Operation**
**Operation**
 
 
Any memory access address (from the CPU's instruction fetch or data access interface) is tested if it is accessing any
Any CPU memory access address (from the instruction fetch or data access interface) is tested if it is accessing _any_
of the specified (configured via `pmpaddr*` and enabled via `pmpcfg*`) PMP regions. If an
of the specified  PMP regions(configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
address accesses one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked:
address matches one of these regions, the configured access rights (attributes in `pmpcfg*`) are enforced:
 
 
* a write access (store) will fail if no write attribute is set
* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set
* an instruction fetch access will fail if no execute attribute is set
 
 
If an access to a protected region does not have the according access rights (attributes) it will raise the according
If an access to a protected region does not have the according access rights it will raise the according
_instruction/load/store access fault exception_.
instruction/load/store _access fault_ exception.
 
 
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs you need to active the _locked bit_ in the according
memory protection also for machine-level programs you need to set the _locked bit_ in the according
`pmpcfg*` configuration.
`pmpcfg*` configuration CSR.
 
 
[IMPORTANT]
[IMPORTANT]
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.
internal (iterative) computations before the configuration becomes valid.
[NOTE]
[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual – Volume II: Privileged Architecture_ specifications.
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
 
 
==== **`HPM`** Hardware Performance Monitors
==== **`HPM`** Hardware Performance Monitors
 
 
In additions to the mandatory cycles (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
In additions to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.
CSR defines the architectural events that lead to an increment of the associated HPM counter.
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
the instructions-retired counter increments with each executed instructions. The actual hardware performance
the instructions-retired counter increments with each executed instructions. The actual hardware performance
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
available HPM is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will exclude
available HPM is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will remove
all HPM logic from the design.
all HPM logic from the design.
 
 
Depending on the configuration, the following additional CSR are available:
If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
 
according `mcountinhibit` CSR bits are hardwired to zero.
 
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
 
The according CSRs are read-only and will always return 0.
 
 
 
Depending on the configuration the following additional CSR are available:
 
 
* counters: `mhpmcounter*[h]` (3..31, depending on configuration)
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
* event configuration: `mhpmevent*` (3..31, depending on configuration)
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
 
 
[IMPORTANT]
[IMPORTANT]
The HPM counter CSR can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
The HPM counter CSR can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
are always zero and read-only.
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
 
exception.
 
 
 
[TIP]
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
 
 
If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
[TIP]
according `mcountinhibit` CSR bits are hardwired to zero.
For a list of all HPM-related CSRs and all provided event configurations
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
see section <<_hardware_performance_monitors_hpm>>.
The according CSRs are read-only and will always return 0.
 
 
 
[NOTE]
 
For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>.
 
 
 
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== Instruction Timing
=== Instruction Timing
The instruction timing listed in the table below shows the required clock cycles for executing a certain
The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.
pipeline.
Average CPI (cycles per instructions) values for "real applications" like for executing the CoreMark benchmark for different CPU
Average CPI (cycles per instructions) values for "real applications" like for executing the CoreMark benchmark for different CPU
configurations are presented in <<_cpu_performance>>.
configurations are presented in <<_cpu_performance>>.
.Clock cycles per instruction
.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
[options="header", grid="rows"]
|=======================
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.break` | 4
| System | `C`+`Zicsr` | `c.break` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 5
| System | `Zifencei` | `fence.i` | 5
| Floating-point - artihmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - artihmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - artihmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - artihmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - artihmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - artihmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Basic bit-manip - logic | `Zbb` | `andn` `orn` `xnor` | 3
| Basic bit-manip - logic | `Zbb` | `andn` `orn` `xnor` | 3
| Basic bit-manip - shift | `Zbb` | `clz` `ctz` `cpop` `rol` `ror` `rori` | 4+SA, FAST_SHIFT: 4
| Basic bit-manip - shift | `Zbb` | `clz` `ctz` `cpop` `rol` `ror` `rori` | 4+SA, FAST_SHIFT: 4
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
| Basic bit-manip - misc  | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
| Basic bit-manip - misc  | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
|=======================
|=======================
[NOTE]
[NOTE]
The presented values of the *floating-point execution cycles* are average values – obtained from
The presented values of the *floating-point execution cycles* are average values - obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
instructions (using pure-software libraries) is ~17..140 times higher.
// ####################################################################################################################
// ####################################################################################################################
include::cpu_csr.adoc[]
include::cpu_csr.adoc[]
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
==== Full Virtualization
==== Full Virtualization
Just like the RISC-V ISA the NEORV32 aims to support _ maximum virtualization_ capabilities
Just like the RISC-V ISA the NEORV32 aims to support _ maximum virtualization_ capabilities
on CPU _and_ SoC level. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU
on CPU _and_ SoC level. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU
extension is enabled (implementing the full set of the privileged architecture).]
extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs for any expected and unexpected situation (e.g. executing an
Thus, the CPU provides defined hardware fall-backs for any expected and unexpected situation (e.g. executing an
malformed instruction word or accessing a not-allocated address). For any kind of trap the core is always in a
malformed instruction word or accessing a not-allocated address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
have to be made undone). This allows predictable execution behavior - and thus, defined operations to resolve the cause
have to be made undone). This allows predictable execution behavior - and thus, defined operations to resolve the cause
of the trap - at any time improving overall _execution safety_.
of the trap - at any time improving overall _execution safety_.
**NEORV32-Specific Virtualization Features**
**NEORV32-Specific Virtualization Features**
* Due to the acknowledged memory accesses the CPU is _always_ sync with the memory system
* Due to the acknowledged memory accesses the CPU is _always_ sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
accessed address does not respond or encounters an internal error during access.
accessed address does not respond or encounters an internal error during access.
* The CPU raises an illegal instruction trap for _all_ unimplemented/malformed/illegal instructions.
* The RISC-V specs. state that executing an malformed instruction results in unpredictable behavior. As an additional security feature,
 
the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions _do raise an illegal instruction trap_ and
 
_do not commit any operation_ (like writing registers or triggering memory operations).
* To be continued...
* To be continued...
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
==== Traps, Exceptions and Interrupts
==== Traps, Exceptions and Interrupts
In this document the following nomenclature regarding traps is used:
In this document the following nomenclature regarding traps is used:
* _interrupts_ = asynchronous exceptions
* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in `mtvec`
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in `mtvec`
CSR. The cause of the according interrupt or exception can be determined via the content of `mcause`
CSR. The cause of the according interrupt or exception can be determined via the content of `mcause`
CSR. The address that reflects the current program counter when a trap was taken is stored to `mepc` CSR.
CSR. The address that reflects the current program counter when a trap was taken is stored to `mepc` CSR.
Additional information regarding the cause of the trap can be retrieved from `mtval` CSR.
Additional information regarding the cause of the trap can be retrieved from `mtval` CSR.
The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupt are pending.
the second highest priority will get serviced and so on until no further interrupt are pending.
 
 
.RISC-V interrupts
.Interrupt Signal Requirements
[IMPORTANT]
[IMPORTANT]
All RISC-V defined machine level interrupts request signals are high-active. A request has to stay at high-level until
All interrupts request signals (including FIRQs) are **high-active**. A request has to stay at high-level (=asserted)
it is acknowledged by the CPU (for example by writing to a specific memory-mapped register).
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).
 
 
.Instruction Atomicity
.Instruction Atomicity
[NOTE]
[NOTE]
All instructions execute as atomic operations – interrupts can only trigger between two instructions.
All instructions execute as atomic operations - interrupts can only trigger between two instructions.
So if there is a permanent interrupt request, exactly one instruction from the interrupt program will be executed before
So if there is a permanent interrupt request, exactly one instruction from the interrupt program will be executed before
a new interrupt handler can start.
a new interrupt handler can start.
:sectnums:
:sectnums:
==== Memory Access Exceptions**
==== Memory Access Exceptions**
If a load operation causes any exception, the instruction's destination register is
If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus write-operation at all.
memory protection fault do not trigger a bus write-operation at all.
:sectnums:
:sectnums:
==== Custom Fast Interrupt Request Lines
==== Custom Fast Interrupt Request Lines
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
provide custom trap codes in `mcause`. These FIRQs are reserved for processor-internal usage only.
provide custom trap codes in `mcause`. These FIRQs are reserved for NEORV32 processor-internal usage only.
 
 
[NOTE]
 
The fast interrupt request lines trigger on a **rising-edge**.
 
 
 
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums!:
:sectnums!:
===== NEORV32 Trap Listing
===== NEORV32 Trap Listing
.NEORV32 trap listing
.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
[options="header",grid="rows"]
|=======================
|=======================
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (EBREAK) | _PC_ | _PC_
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (EBREAK) | _PC_ | _PC_
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
|=======================
|=======================
**Notes**
**Notes**
The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the according trap that is written to `mcause` CSR. The "[RISC-V]" columns show the interrupt/exception code value from the
cause ID of the according trap that is written to `mcause` CSR. The "[RISC-V]" columns show the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to
`mepc` and `mtval` CSRs when a trap is triggered:
`mepc` and `mtval` CSRs when a trap is triggered:
* _I-PC_ - address of interrupted instruction (instruction has not been execute/completed yet)
* _I-PC_ - address of interrupted instruction (instruction has not been execute/completed yet)
* _B-ADR_- bad memory access address that cause the trap
* _B-ADR_- bad memory access address that cause the trap
* _PC_ - address of instruction that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _0_ - zero
* _Inst_ - the faulting instruction itself
* _Inst_ - the faulting instruction itself
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
==== Bus Interface
==== Bus Interface
The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for
The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.
:sectnums:
:sectnums:
===== Address Space
===== Address Space
The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
Architecture. Each of this interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
Architecture. Each of this interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte. Please note, that the NEORV32 CPU
system is based on 32-bit words with a minimal granularity of 1 byte. Please note, that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_ – however, a software-based handling can be
does not support unaligned memory accesses _in hardware_ - however, a software-based handling can be
implemented as any unaligned memory access will trigger an according exception.
implemented as any unaligned memory access will trigger an according exception.
:sectnums:
:sectnums:
===== Interface Signals
===== Interface Signals
The following table shows the signals of the data and instruction interfaces seen from the CPU
The following table shows the signals of the data and instruction interfaces seen from the CPU
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).
.CPU bus interface
.CPU bus interface
[cols="<2,^1,<7"]
[cols="<2,^1,<7"]
[options="header",grid="rows"]
[options="header",grid="rows"]
|=======================
|=======================
| Signal | Size | Function
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================
|=======================
[NOTE]
[NOTE]
Currently, there a no pipelined or overlapping operations implemented within the same bus interface.
Currently, there a no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly".
So only a single transfer request can be "on the fly".
:sectnums:
:sectnums:
===== Protocol
===== Protocol
A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral either sets the `bus_ack_i` signal (-> successful completion) or the
completed when the accessed peripheral either sets the `bus_ack_i` signal (-> successful completion) or the
`bus_err_i` signal is set (-> failed completion). All these control signals are only active (= high) for one
`bus_err_i` signal is set (-> failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the according instruction bus
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the according instruction bus
access fault or load/store bus access fault exception.
access fault or load/store bus access fault exception.
[NOTE]
[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal module provide exactly **one cycle delay** between initiation and completion of transfers.
completion should be avoided. The default processor-internal module provide exactly **one cycle delay** between initiation and completion of transfers.
.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
[IMPORTANT]
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/neorv32_package.vhd`).
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will timeout and raise a **bus fault exception**.
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will timeout and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operations times out
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operations times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the according instruction fetch or data access bus exception.
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the according instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
**Exemplary Bus Accesses**
**Exemplary Bus Accesses**
.Example bus accesses: see read/write access description below
.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[cols="^2,^2"]
[grid="none"]
[grid="none"]
|=======================
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
| Read access | Write access
|=======================
|=======================
**Write Access**
**Write Access**
For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when bus_we_o goes high. These three signals are kept stable until the
enable signals (`bus_ben_o`) are set when bus_we_o goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.
cycles after issuing.
**Read Access**
**Read Access**
For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral hast to apply the read data right in the same cycle as
directly in the next cycle after issuing. The peripheral hast to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
signal).
**Access Boundaries**
**Access Boundaries**
The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-
bit) and word (= 32-bit) boundaries.
bit) and word (= 32-bit) boundaries.
**Exclusive (Atomic) Access**
**Exclusive (Atomic) Access**
The CPU can access memory in an exclusive manner by generating a load-reservate and store-conditional
The CPU can access memory in an exclusive manner by generating a load-reservate and store-conditional
combination. Normally, these combinations should target the same memory address.
combination. Normally, these combinations should target the same memory address.
The CPU starts an exclusive access to memory via the _load-reservate instruction_ (`lr.w`). This instruction
The CPU starts an exclusive access to memory via the _load-reservate instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o`. It is the task of
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o`. It is the task of
the memory system to manage this exclusive access reservation by storing the according access address and
the memory system to manage this exclusive access reservation by storing the according access address and
the source of the access itself (for example via the CPU ID in a multi-core system).
the source of the access itself (for example via the CPU ID in a multi-core system).
When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
zero and will allow the according store operation to the memory system. If the lock is broken, the
zero and will allow the according store operation to the memory system. If the lock is broken, the
instruction will write-back non-zero and will not generate an actual memory store operation.
instruction will write-back non-zero and will not generate an actual memory store operation.
The CPU-internal exclusive access lock is broken if at least one of the situations appear.
The CPU-internal exclusive access lock is broken if at least one of the situations appear.
* when executing any other memory-access operation than `lr.w`
* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
* when the memory system signals a bus error (via the `bus_err_i` signal)
[TIP]
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
**Memory Barriers**
**Memory Barriers**
Whenever the CPU executes a fence instruction, the according interface signal is set high for one cycle
Whenever the CPU executes a fence instruction, the according interface signal is set high for one cycle
(`d_bus_fence_o` for a _fence_ instruction; `i_bus_fence_o` for a _fencei_ instruction). It is the task of the
(`d_bus_fence_o` for a _fence_ instruction; `i_bus_fence_o` for a _fencei_ instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).
memory system to perform the necessary operations (like a cache flush and refill).
<<<
<<<
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
==== CPU Hardware Reset
==== CPU Hardware Reset
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most register of the whole NEORV32 Processor do not use **a
registers of the NEORV32 CPU as well as most register of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.
after power-up is not relevant for a defined CPU boot process.
**Rational**
**Rational**
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the according data register is valid. At the end of the pipeline the status register might trigger a writeback
data in the according data register is valid. At the end of the pipeline the status register might trigger a writeback
of the processing result to some kind of memory. The initial status of the data registers after power-up is
of the processing result to some kind of memory. The initial status of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline’s data register. Therefore, the pipeline data register do no require a dedicated reset as they do not
the pipeline’s data register. Therefore, the pipeline data register do no require a dedicated reset as they do not
control the actual operation (in contrast to the status register). This makes the pipeline data registers from
control the actual operation (in contrast to the status register). This makes the pipeline data registers from
this example "uncritical registers".
this example "uncritical registers".
**NEORV32 CPU Reset**
**NEORV32 CPU Reset**
In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline register will get initialized by the CPU’s internal state machines, which are initialized from the main
pipeline register will get initialized by the CPU’s internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).
During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (`mie`)
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire
does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire
because the global interrupt enabled flag in the status register (`mstatsus(mie)`) provides a dedicated
because the global interrupt enabled flag in the status register (`mstatsus(mie)`) provides a dedicated
hardware reset setting it to low (globally disabling interrupts).
hardware reset setting it to low (globally disabling interrupts).
**Reset Configuration**
**Reset Configuration**
Most CPU-internal register do feature an asynchronous reset in the VHDL code, but the "don't care" value
Most CPU-internal register do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of the uncritical register, effectively generating a flip-flop without a
(VHDL `'-'`) is used for initialization of the uncritical register, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):
[source,vhdl]
[source,vhdl]
----
----
-- "critical" number of PMP regions --
-- "critical" number of PMP regions --
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value
for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW),
for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW),
default; TRUE=defined LOW reset value)
default; TRUE=defined LOW reset value)
----
----
 
 

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.