OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [docs/] [datasheet/] [cpu.adoc] - Diff between revs 65 and 66


Rev 65 Rev 66
Line 3... Line 3...
 
 
image::riscv_logo.png[width=350,align=center]
image::riscv_logo.png[width=350,align=center]
 
 
**Key Features**
**Key Features**
 
 
* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* 32-bit multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `A` - atomic memory access operations
 
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `U` - less-privileged _user_ mode
** `Zbb` - basic bit-manipulation operations
 
** `Zfinx` - single-precision floating-point unit
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zicsr` - control and status register access (privileged architecture)
 
** `Zicntr` - CPU base counters
 
** `Zihpm` - hardware performance monitors
** `Zifencei` - instruction stream synchronization
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
** `Debug` - debug mode
** `DB` - debug mode
 
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Supports _all_ of the machine-level traps from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
 
** This is a key aspect of _execution safety_, see <<_full_virtualization>>
* Optional physical memory configuration (PMP), compatible with the RISC-V specifications
* Optional physical memory configuration (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Optional hardware performance monitors (HPM) for application benchmarking
* Separated interfaces for instruction fetch and data access (merged into single bus via a bus switch for
* Separated interfaces for instruction fetch and data access (merged into a single processor bus)
the NEORV32 processor)
 
* little-endian byte order
* little-endian byte order
* Configurable hardware reset
* Configurable hardware reset
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
 
 
[NOTE]
[NOTE]
Line 51... Line 52...
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.
specifications. The following figure shows the simplified architecture of the CPU.
 
 
image::neorv32_cpu.png[align=center]
image::neorv32_cpu.png[align=center]
 
 
The CPU uses a pipelined architecture with basically two main stages. The first stage (IF - instruction fetch)
The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
stored to a FIFO - the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
instruction words for the next pipeline stage. Compressed instructions - if enabled - are also decompressed
front-end can already fetch new instructions while the back-end is still processing previously-fetched instructions.
in this stage. The second stage (EX - execution) is responsible for actually executing the fetched instructions
 
via the execute engine.
The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
 
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
These two pipeline stages are based on a multi-cycle processing engine. So the processing of each stage for a
data is stored to a FIFO queue - the instruction prefetch buffer.
certain operations can take several cycles. Since the IF and EX stages are decoupled via the instruction
 
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
The back-end is responsible for the actual execution of the instruction. It includes an "issue engine",
(cycles per instruction) is 2, but it can be significantly higher: For instance when executing loads/stores
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
multi-cycle operations like divisions or when the instruction fetch engine has to reload the prefetch buffers
instruction or decompressed 16-bit instructions) for execution.
due to a taken branch.
 
 
Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
 
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
 
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
 
when the CPU front-end has to reload the prefetch buffer due to a taken branch.
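
The assembly of instruction words from the prefetch buffer can be illustrated with a small host-side model. This is only a sketch, not the actual hardware implementation: it relies on the standard RISC-V length encoding (the two lowest bits are `11` for a 32-bit instruction), and the function names `is_compressed` and `count_instructions` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* RISC-V encodes the instruction length in the two lowest bits:
 * 0b11 marks a 32-bit instruction, any other value a 16-bit compressed one. */
static int is_compressed(uint16_t low_half) {
    return (low_half & 0x3u) != 0x3u;
}

/* Hypothetical model of the issue step: walk a stream of 16-bit parcels
 * (as they could be taken from the prefetch buffer) and count how many
 * complete instructions it contains. A 32-bit instruction may start at
 * any 16-bit boundary, which is the "mixture" case described above. */
static size_t count_instructions(const uint16_t *parcels, size_t n) {
    size_t i = 0, count = 0;
    while (i < n) {
        if (is_compressed(parcels[i])) {
            i += 1;                 /* one parcel: compressed instruction */
        } else {
            if (i + 1 >= n) break;  /* upper half has not been fetched yet */
            i += 2;                 /* two parcels: plain 32-bit instruction */
        }
        count++;
    }
    return count;
}
```

In this model a 32-bit instruction whose upper half is still missing is simply not counted, which loosely corresponds to the back-end waiting for the front-end to deliver more data.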
 
 
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
every single instruction in a series of consecutive micro-operations. The combination of these two classical
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
design paradigms allows an increased instruction throughput in contrast to a pure multi-cycle approach (due to
these two classical design paradigms allows an increased instruction throughput in contrast to a pure multi-cycle
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
 
multi-cycle concept).
The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
 
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
As a Von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access.
devices are mapped to a single 32-bit address space making the architecture a modified Von-Neumann
These two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
Architecture.
have higher priority). Hence, ALL memory locations including peripheral devices are mapped to a single unified 32-bit
 
address space.
 
 
 
 
 
// ####################################################################################################################
 
:sectnums:
 
=== Full Virtualization
 
 
 
Just like the RISC-V ISA the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU _and_ SoC level to
 
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.
 
footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
 
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
 
malformed instruction word or accessing a not-allocated memory address). For any kind of trap the core is always in a
 
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
 
might have to be reverted). This allows predictable execution behavior at any time, improving overall _execution safety_.
 
 
 
**Execution Safety - NEORV32 Virtualization Features**
 
 
 
* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
 
(i.e. there is no speculative execution / no out-of-order states).
 
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
 
accessed address does not respond or encounters an internal error during access.
 
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
 
window. Otherwise a bus access exception is raised.
 
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
 
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions do raise an
 
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
 
memory operations).
 
* To be continued...
 
 
 
 
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== RISC-V Compatibility
=== RISC-V Compatibility
Line 234... Line 267...
.Atomic memory operations
.Atomic memory operations
[IMPORTANT]
[IMPORTANT]
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.
However, these instructions are sufficient to emulate all further atomic memory operations.
 
 
 
.Bit-manipulation operations
 
[IMPORTANT]
 
The NEORV32 `B` extension currently implements only the _basic bit-manipulation instructions_ (`Zbb`) subset

and the _address generation instructions_ (`Zba`) subset.
 
 
.Instruction Misalignment
.Instruction Misalignment
[NOTE]
[NOTE]
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
is raised if bit 1 of the access address is set (i.e. not on 32-bit boundary). If the `C` extension is implemented
is raised if bit 1 of the access address is set (i.e. not on 32-bit boundary). If the `C` extension is implemented
there will be no misaligned instruction exceptions _at all_.
there will be no misaligned instruction exceptions _at all_.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
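
The alignment rule from the note above can be written down as a small predicate. This is a minimal sketch for illustration; `pc_effective`, `fetch_misaligned` and the `has_c` flag are hypothetical names, not part of the NEORV32 sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the program counter is hardwired to zero. */
static uint32_t pc_effective(uint32_t target) {
    return target & ~UINT32_C(1);
}

/* has_c: assumed flag, true if the C extension is implemented.
 * Returns true if fetching from 'target' raises a misaligned
 * instruction exception according to the rule described above. */
static bool fetch_misaligned(uint32_t target, bool has_c) {
    if (has_c) {
        return false; /* 16-bit alignment is always satisfied */
    }
    return (pc_effective(target) & 2u) != 0; /* bit 1 set: not on a 32-bit boundary */
}
```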
 
 
 
 
 
 
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== CPU Top Entity - Signals
=== CPU Top Entity - Signals
 
 
Line 383... Line 420...
[NOTE]
[NOTE]
The atomic instructions have special requirements for memory system / bus interconnect. More
The atomic instructions have special requirements for memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
 
 
 
 
 
==== **`B`** - Bit-Manipulation Operations
 
 
 
The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
 
`CPU_EXTENSION_RISCV_B` configuration generic is _true_.
 
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
 
 
 
[IMPORTANT]
 
The NEORV32 `B` extension currently implements only the _basic bit-manipulation instructions_ (`Zbb`) subset

and the _address generation instructions_ (`Zba`) subset.
 
 
 
The `Zbb` sub-extension adds the following instructions:
 
 
 
* `andn`, `orn`, `xnor`
 
* `clz`, `ctz`, `cpop`
 
* `max`, `maxu`, `min`, `minu`
 
* `sext.b`, `sext.h`, `zext.h`
 
* `rol`, `ror`, `rori`
 
* `orc.b`, `rev8`
 
 
 
The `Zba` sub-extension adds the following instructions:
 
 
 
* `sh1add`, `sh2add`, `sh3add`
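
For readers unfamiliar with these instructions, the following plain-C reference model sketches the results of a few of them as defined by the RISC-V bit-manipulation specification (RV32). It is not the intrinsic library mentioned in this section and does not reflect the hardware implementation; all function names are hypothetical.

```c
#include <stdint.h>

/* Logic with negated second operand */
static uint32_t ref_andn(uint32_t a, uint32_t b) { return a & ~b; }
static uint32_t ref_orn(uint32_t a, uint32_t b)  { return a | ~b; }
static uint32_t ref_xnor(uint32_t a, uint32_t b) { return ~(a ^ b); }

/* Count leading zeros, iterating from the MSB; returns 32 for x == 0 */
static uint32_t ref_clz(uint32_t x) {
    uint32_t n = 0;
    while (n < 32 && !(x & (UINT32_C(1) << (31 - n)))) n++;
    return n;
}

/* Count trailing zeros, iterating from the LSB; returns 32 for x == 0 */
static uint32_t ref_ctz(uint32_t x) {
    uint32_t n = 0;
    while (n < 32 && !(x & (UINT32_C(1) << n))) n++;
    return n;
}

/* Population count: number of set bits */
static uint32_t ref_cpop(uint32_t x) {
    uint32_t n = 0;
    for (; x != 0; x &= x - 1) n++; /* clear the lowest set bit each round */
    return n;
}

/* Rotates; only the lowest 5 bits of the shift amount are used */
static uint32_t ref_rol(uint32_t x, uint32_t sa) {
    sa &= 31;
    return (x << sa) | (x >> ((32 - sa) & 31));
}
static uint32_t ref_ror(uint32_t x, uint32_t sa) {
    sa &= 31;
    return (x >> sa) | (x << ((32 - sa) & 31));
}

/* Sign-extend the lowest byte */
static uint32_t ref_sext_b(uint32_t x) {
    return (x & 0x80u) ? (x | 0xFFFFFF00u) : (x & 0xFFu);
}

/* orc.b: every byte becomes 0xFF if any of its bits is set, else 0x00 */
static uint32_t ref_orc_b(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        if ((x >> (8 * i)) & 0xFFu) r |= UINT32_C(0xFF) << (8 * i);
    }
    return r;
}

/* rev8: reverse the byte order */
static uint32_t ref_rev8(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0xFF00u) | ((x << 8) & 0xFF0000u) | (x << 24);
}

/* Zba: shNadd rd, rs1, rs2 computes rs2 + (rs1 << N) */
static uint32_t ref_sh1add(uint32_t rs1, uint32_t rs2) { return rs2 + (rs1 << 1); }
static uint32_t ref_sh2add(uint32_t rs1, uint32_t rs2) { return rs2 + (rs1 << 2); }
static uint32_t ref_sh3add(uint32_t rs1, uint32_t rs2) { return rs2 + (rs1 << 3); }
```

Such a model can serve as a software fall-back or as a golden reference when checking results obtained from the hardware.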
 
 
 
[TIP]
 
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
 
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
 
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
 
shift-related `B` instructions.
 
 
 
[WARNING]
 
The `B` extension is frozen but not officially ratified yet. There is no
 
software support for this extension in the upstream GCC RISC-V port yet. However, an
 
intrinsic library is provided to utilize the provided `B` extension features from C-language
 
code (see `sw/example/bitmanip_test`).
 
 
 
 
==== **`C`** - Compressed Instructions
==== **`C`** - Compressed Instructions
 
 
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
In this case the following instructions are available:
In this case the following instructions are available:
Line 532... Line 605...
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).
code (see `sw/example/floating_point_test`).
 
 
 
 
==== **`Zbb`** Basic Bit-Manipulation Operations
 
 
 
The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extensions `B`.
 
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
 
 
 
The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
 
generic is _true_. In this case the following instructions are available:
 
 
 
* `andn`, `orn`, `xnor`
 
* `clz`, `ctz`, `cpop`
 
* `max`, `maxu`, `min`, `minu`
 
* `sext.b`, `sext.h`, `zext.h`
 
* `rol`, `ror`, `rori`
 
* `orc.b`, `rev8`
 
 
 
[TIP]
 
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
 
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
 
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
 
shift-related `Zbb` instructions.
 
 
 
[WARNING]
 
The `Zbb` extension is frozen but not officially ratified yet. There is no
 
software support for this extension in the upstream GCC RISC-V port yet. However, an
 
intrinsic library is provided to utilize the provided `Zbb` extension from C-language
 
code (see `sw/example/bitmanip_test`).
 
 
 
 
 
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
 
 
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
is implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
is implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
In this case the following instructions are available:
In this case the following instructions are available:
Line 584... Line 629...
[NOTE]
[NOTE]
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
`TW` (timeout wait) is hardwired to zero.
`TW` (timeout wait) is hardwired to zero.
 
 
 
 
 
 
 
 
 
==== **`Zicntr`** CPU Base Counters
 
 
 
The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instructions-retired (`[m]instret[h]`) and time (`time[h]`)

counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
 
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
 
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.
 
 
 
[NOTE]
 
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.
 
 
 
If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
 
 
 
 
 
 
 
==== **`Zihpm`** Hardware Performance Monitors
 
 
 
In addition to the base cycle, instructions-retired and time counters, the NEORV32 CPU provides
 
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
 
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
 
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
 
CSR defines the architectural events that lead to an increment of the associated HPM counter.
 
 
 
The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.
 
 
 
Depending on the configuration the following additional CSRs are available:
 
 
 
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
 
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
 
 
 
[IMPORTANT]
 
The HPM counter CSRs can only be accessed in machine-mode. Hence, the corresponding `mcounteren` CSR bits
 
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
 
exception.
 
 
 
[TIP]
 
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
 
 
 
[TIP]
 
For a list of all HPM-related CSRs and all provided event configurations
 
see section <<_hardware_performance_monitors_hpm>>.
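
Because each counter is split into a low-word and a high-word CSR, software on a 32-bit core typically reads it with the common RV32 "high-low-high" pattern to avoid a torn value when the low word overflows between the two reads. The sketch below models this on a host machine. The `csr_read_fn` callback type and the simulated counter are assumptions made for illustration; on real hardware the callbacks would wrap CSR read instructions for the respective `mhpmcounter*` / `mhpmcounter*h` registers. `mask_to_width` merely shows how an N-bit counter (N as in `HPM_CNT_WIDTH`) could be interpreted.

```c
#include <stdint.h>

/* Hypothetical CSR reader; on real hardware this would wrap a CSR read. */
typedef uint32_t (*csr_read_fn)(void);

/* Read a 64-bit counter from two 32-bit halves without tearing:
 * re-read the high word and retry if it changed in the meantime. */
static uint64_t read_counter64(csr_read_fn read_hi, csr_read_fn read_lo) {
    uint32_t hi, lo, hi2;
    do {
        hi  = read_hi();
        lo  = read_lo();
        hi2 = read_hi();
    } while (hi != hi2); /* low word overflowed between the reads */
    return ((uint64_t)hi << 32) | lo;
}

/* Keep only the lowest 'width' bits (0..64) of a counter value. */
static uint64_t mask_to_width(uint64_t value, unsigned width) {
    if (width >= 64) return value;
    return value & ((UINT64_C(1) << width) - 1); /* width 0 yields 0 */
}

/* Simulated free-running counter for host-side experiments:
 * every read access advances the counter by one tick. */
static uint64_t sim_counter;
static uint32_t sim_read_hi(void) { return (uint32_t)(sim_counter++ >> 32); }
static uint32_t sim_read_lo(void) { return (uint32_t)(sim_counter++); }
```

With `sim_counter` preset just below a low-word overflow, a single high/low read pair would combine a stale high word with a fresh low word. The retry loop detects the changed high word and reads again.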
 
 
 
 
==== **`Zifencei`** Instruction Stream Synchronization
==== **`Zifencei`** Instruction Stream Synchronization
 
 
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
 
 
* `fence.i`
* `fence.i`
 
 
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
 
 
 
 
Line 657... Line 746...
[NOTE]
[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
 
 
 
 
==== **`HPM`** Hardware Performance Monitors
 
 
 
In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
 
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
 
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
 
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
 
CSR defines the architectural events that lead to an increment of the associated HPM counter.
 
 
 
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
 
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
 
the instructions-retired counter increments with each executed instruction. The actual hardware performance
 
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
 
available HPM is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will remove
 
all HPM logic from the design.
 
 
 
If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
 
according `mcountinhibit` CSR bits are hardwired to zero.
 
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
 
The according CSRs are read-only and will always return 0.
 
 
 
Depending on the configuration the following additional CSRs are available:
 
 
 
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
 
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
 
 
 
[IMPORTANT]
 
The HPM counter CSRs can only be accessed in machine-mode. Hence, the corresponding `mcounteren` CSR bits
 
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
 
exception.
 
 
 
[TIP]
 
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
 
 
 
[TIP]
 
For a list of all HPM-related CSRs and all provided event configurations
 
see section <<_hardware_performance_monitors_hpm>>.
 
 
 
 
 
 
 
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
=== Instruction Timing
=== Instruction Timing
Line 725... Line 777...
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
 
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
 
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
 
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
 
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
 
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
 
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 5
| System | `Zifencei` | `fence.i` | 3 + ML
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Basic bit-manip - logic | `Zbb` | `andn` `orn` `xnor` | 3
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Basic bit-manip - shift | `Zbb` | `clz` `ctz` `cpop` `rol` `ror` `rori` | 4+SA, FAST_SHIFT: 4
| Bit-manipulation - arithmetic/logic | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Basic bit-manip - misc  | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
 
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
 
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
 
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
|=======================
|=======================
 
 
[NOTE]
[NOTE]
The presented values of the *floating-point execution cycles* are average values - obtained from
The presented values of the *floating-point execution cycles* are average values - obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
instructions (using pure-software libraries) is ~17..140 times higher.
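
The cycle formulas from the table can be evaluated with a few helper functions, for example when estimating the run time of a short instruction sequence. This is only a worked-arithmetic sketch based on the table rows above (Rev 66 side); the names are hypothetical, `ml` stands for the memory latency ML in cycles and `sa` for the shift amount SA.

```c
/* Fixed cycle counts taken from the table rows above */
enum {
    CYCLES_MUL_ITERATIVE = 2 + 31 + 3, /* mul/mulh/mulhsu/mulhu, default */
    CYCLES_MUL_FAST      = 5,          /* with FAST_MUL_EN */
    CYCLES_CPOP          = 3 + 32      /* cpop */
};

/* Loads/stores: 4 + ML, where ML is the memory latency in cycles */
static unsigned cycles_mem_access(unsigned ml) {
    return 4 + ml;
}

/* rol/ror/rori: 3 + SA; assumes SA is the 5-bit shift amount */
static unsigned cycles_rotate(unsigned sa) {
    return 3 + (sa & 31u);
}
```

For instance, a load with a memory latency of 2 cycles followed by an iterative multiplication would take 6 + 36 = 42 cycles under these formulas.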
 
 
 
 
 
 
// ####################################################################################################################
 
include::cpu_csr.adoc[]
 
 
 
 
 
 
 
 
 
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
include::cpu_csr.adoc[]
==== Full Virtualization
 
 
 
Just like the RISC-V ISA the NEORV32 aims to support _maximum virtualization_ capabilities
 
on CPU _and_ SoC level. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU
 
extension is enabled (implementing the full set of the privileged architecture).]
 
Thus, the CPU provides defined hardware fall-backs for any expected and unexpected situation (e.g. executing an
 
malformed instruction word or accessing a not-allocated address). For any kind of trap the core is always in a
 
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
 
have to be made undone). This allows predictable execution behavior - and thus, defined operations to resolve the cause
 
of the trap - at any time improving overall _execution safety_.
 
 
 
**NEORV32-Specific Virtualization Features**
 
 
 
* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
 
(i.e. there is no speculative execution / no out-of-order states).
 
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
 
accessed address does not respond or encounters an internal error during access.
 
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional security feature,
 
the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions _do raise an illegal instruction trap_ and
 
_do not commit any operation_ (like writing registers or triggering memory operations).
 
* To be continued...
 
 
 
 
 
 
 
// ####################################################################################################################
// ####################################################################################################################
:sectnums:
:sectnums:
Line 962... Line 984...
[IMPORTANT]
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/neorv32_package.vhd`).
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will timeout and raise a **bus fault exception**.
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will timeout and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out
The _BUSKEEPER_ hardware module (see section <<_internal_bus_monitor_buskeeper>>) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
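
The response time window described above can be sketched as a simple cycle counter. This is a behavioral Python model under stated assumptions, not the BUSKEEPER logic itself: the function `bus_access` and its parameters are hypothetical, and whether an acknowledge arriving in exactly the last cycle of the window still counts is an assumption of this model. Only the default of 15 cycles is taken from `max_proc_int_response_time_c`.

```python
# Mirrors the documented default of max_proc_int_response_time_c.
MAX_PROC_INT_RESPONSE_TIME = 15


def bus_access(ack_latency, timeout=MAX_PROC_INT_RESPONSE_TIME):
    """Model one processor-internal bus transfer.

    ack_latency: cycle (counted from 1) in which the accessed device
                 acknowledges, or None if nothing responds at all
                 (e.g. an "address space hole").
    Returns ("ack", cycle) or ("bus_fault", timeout).
    """
    for cycle in range(1, timeout + 1):
        if ack_latency is not None and cycle == ack_latency:
            return ("ack", cycle)
    # No acknowledge inside the window: the transfer times out.
    return ("bus_fault", timeout)
```

In this model a device answering after 3 cycles completes normally, while an access that is never acknowledged ends in a bus fault once the window has elapsed.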
 
 
**Exemplary Bus Accesses**
**Exemplary Bus Accesses**
Line 1047... Line 1069...
 
 
**Rationale**
**Rationale**
 
 
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a writeback
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a write-back
of the processing result to some kind of memory. The initial status of the data registers after power-up is
of the processing result to some kind of memory. The initial status of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status register). This makes the pipeline data registers from
control the actual operation (in contrast to the status register). This makes the pipeline data registers from
this example "uncritical registers".
this example "uncritical registers".
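
The pipeline example can be made concrete with a short behavioral model. This is a sketch in Python, not code from the NEORV32 sources: the `Pipeline` class is hypothetical, and the random start-up values only stand in for the undefined content of data registers without a reset. The status (valid) registers are the only ones that get a defined reset value.

```python
import random


class Pipeline:
    """Hypothetical N-stage pipeline with data and 1-bit status registers."""

    def __init__(self, stages, seed=None):
        rng = random.Random(seed)
        # Uncritical registers: no reset, so model power-up content as garbage.
        self.data = [rng.getrandbits(32) for _ in range(stages)]
        # Critical registers: reset to "no valid data in this stage".
        self.valid = [False] * stages

    def clock(self, in_data=0, in_valid=False):
        """Advance one cycle; return the write-back value or None."""
        # Every stage takes over data and status of its predecessor.
        self.data = [in_data] + self.data[:-1]
        self.valid = [in_valid] + self.valid[:-1]
        # Only the status bit decides whether a write-back happens.
        return self.data[-1] if self.valid[-1] else None
```

With three stages, the garbage from power-up never reaches the write-back because its status bits are cleared, whereas a value fed in with `in_valid=True` appears at the output on the third clock after it entered.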
