:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]

**Key Features**

* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zbb` - basic bit-manipulation operations
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
** `DB` - debug mode
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separated interfaces for instruction fetch and data access (merged into a single bus via a bus switch for
the NEORV32 processor)
* Little-endian byte order
* Configurable hardware reset
* No hardware support of unaligned data/instruction accesses – they will trigger an exception.
If the `C` extension is enabled, instructions can also be 16-bit aligned and a misaligned instruction address exception is no longer possible.

[NOTE]
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU uses a pipelined architecture with basically two main stages. The first stage (IF – instruction fetch)
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
stored to a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed
in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions
via the execute engine.

These two pipeline stages are based on a multi-cycle processing engine. The processing of a stage can
therefore take several cycles for certain operations.
Since the IF and EX stages are decoupled via the instruction
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher, for instance when executing loads/stores,
multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffer
due to a taken branch.

Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction in a series of consecutive micro-operations. The combination of these two classical
design paradigms allows a higher instruction throughput than a pure multi-cycle approach (due to
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).

The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
devices are mapped to a single 32-bit address space, making the architecture a modified Von-Neumann
architecture.

// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.

[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.

.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01 ... OK
Check caddi-01 ... OK
Check caddi16sp-01 ... OK
Check caddi4spn-01 ... OK
Check cand-01 ... OK
Check candi-01 ... OK
Check cbeqz-01 ... OK
Check cbnez-01 ... OK
Check cebreak-01 ... OK
Check cj-01 ... OK
Check cjal-01 ... OK
Check cjalr-01 ... OK
Check cjr-01 ... OK
Check cli-01 ... OK
Check clui-01 ... OK
Check clw-01 ... OK
Check clwsp-01 ... OK
Check cmv-01 ... OK
Check cnop-01 ... OK
Check cor-01 ... OK
Check cslli-01 ... OK
Check csrai-01 ... OK
Check csrli-01 ... OK
Check csub-01 ... OK
Check csw-01 ... OK
Check cswsp-01 ... OK
Check cxor-01 ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................

.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01 ... OK
Check addi-01 ... OK
Check and-01 ... OK
Check andi-01 ... OK
Check auipc-01 ... OK
Check beq-01 ... OK
Check bge-01 ... OK
Check bgeu-01 ... OK
Check blt-01 ... OK
Check bltu-01 ... OK
Check bne-01 ... OK
Check fence-01 ... OK
Check jal-01 ... OK
Check jalr-01 ... OK
Check lb-align-01 ... OK
Check lbu-align-01 ... OK
Check lh-align-01 ... OK
Check lhu-align-01 ... OK
Check lui-01 ... OK
Check lw-align-01 ... OK
Check or-01 ... OK
Check ori-01 ... OK
Check sb-align-01 ... OK
Check sh-align-01 ... OK
Check sll-01 ... OK
Check slli-01 ... OK
Check slt-01 ... OK
Check slti-01 ... OK
Check sltiu-01 ... OK
Check sltu-01 ... OK
Check sra-01 ... OK
Check srai-01 ... OK
Check srl-01 ... OK
Check srli-01 ... OK
Check sub-01 ... OK
Check sw-align-01 ... OK
Check xor-01 ... OK
Check xori-01 ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................

.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01 ... OK
Check divu-01 ... OK
Check mul-01 ... OK
Check mulh-01 ... OK
Check mulhsu-01 ... OK
Check mulhu-01 ... OK
Check rem-01 ... OK
Check remu-01 ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................

.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak ... OK
Check ecall ... OK
Check misalign-beq-01 ... OK
Check misalign-bge-01 ... OK
Check misalign-bgeu-01 ... OK
Check misalign-blt-01 ... OK
Check misalign-bltu-01 ... OK
Check misalign-bne-01 ... OK
Check misalign-jal-01 ... OK
Check misalign-lh-01 ... OK
Check misalign-lhu-01 ... OK
Check misalign-lw-01 ... OK
Check misalign-sh-01 ... OK
Check misalign-sw-01 ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................

.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................

<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently known issues regarding full RISC-V compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.

[IMPORTANT]
The `misa` CSR is read-only. It shows the synthesized CPU extensions. Hence, all implemented
CPU extensions are always active and cannot be enabled/disabled dynamically during runtime. Any
write access to it (in machine mode) is ignored and will not cause any exception or side effects.

[IMPORTANT]
The `mip` CSR is read-only.
Pending IRQs can be cleared using the `mie` CSR.

[IMPORTANT]
The `mtval` CSR is read-only.

[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
currently only supports the modes _OFF_ and _NAPOT_ and a minimal granularity of 8 bytes per region.

[IMPORTANT]
The `A` CPU extension (atomic memory access) currently only implements the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further AMO operations.

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.

.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal | Width | Dir. | Function
4+^| **Global Signals**
| `clk_i` | 1 | in | global clock line, all registers triggering on rising edge
| `rstn_i` | 1 | in | global reset, low-active
| `sleep_o` | 1 | out | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o` | 32 | out | destination address
| `i_bus_rdata_i` | 32 | in | read data
| `i_bus_wdata_o` | 32 | out | write data (always zero)
| `i_bus_ben_o` | 4 | out | byte enable
| `i_bus_we_o` | 1 | out | write transaction (always zero)
| `i_bus_re_o` | 1 | out | read transaction
| `i_bus_lock_o` | 1 | out | exclusive access request (always zero)
| `i_bus_ack_i` | 1 | in | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i` | 1 | in | bus transfer terminate from accessed peripheral
| `i_bus_fence_o` | 1 | out | indicates an executed _fence.i_ instruction
| `i_bus_priv_o` | 2 | out | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o` | 32 | out | destination address
| `d_bus_rdata_i` | 32 | in | read data
| `d_bus_wdata_o` | 32 | out | write data
| `d_bus_ben_o` | 4 | out | byte enable
| `d_bus_we_o` | 1 | out | write transaction
| `d_bus_re_o` | 1 | out | read transaction
| `d_bus_lock_o` | 1 | out | exclusive access request
| `d_bus_ack_i` | 1 | in | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i` | 1 | in | bus transfer terminate from accessed peripheral
| `d_bus_fence_o` | 1 | out | indicates an executed _fence_ instruction
| `d_bus_priv_o` | 2 | out | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i` | 64 | in | system time input (from MTIME)
4+^| **Non-Maskable Interrupt (<<_traps_exceptions_and_interrupts>>)**
| `nm_irq_i` | 1 | in | non-maskable interrupt
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i` | 1 | in | RISC-V machine software interrupt
| `mext_irq_i` | 1 | in | RISC-V machine external interrupt
| `mtime_irq_i` | 1 | in | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i` | 16 | in | fast interrupt request signals
| `firq_ack_o` | 16 | out | fast interrupt acknowledge signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i` | 1 | in | request CPU to halt and enter debug mode
|=======================

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics
(see section <<_processor_top_entity_generics>>) and are not listed here. However, the CPU provides some
_specific_ generics that are used to configure the CPU for the NEORV32 processor setup.
These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======

<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The NEORV32 is a RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA
(instruction set architecture) extensions.
For more information regarding the RISC-V ISA extensions please
see _The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual –
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

[TIP]
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
_SYSINFO_CPU_ <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
or by executing an instruction and checking for an _illegal instruction exception_.

[NOTE]
Executing an instruction from an extension that is not implemented or not enabled (for example via the according
top entity generic) will raise an _illegal instruction_ exception.

==== **`A`** - Atomic Memory Access

Atomic memory access instructions (for implementing semaphores and mutexes) are available when the
`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions
are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented so far, all further atomic operations
(read-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will trigger an illegal instruction exception.

[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.

==== **`C`** - Compressed Instructions

Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is
_true_.
In this case the following instructions are available:

* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ address require
an additional instruction fetch to load the required second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).

==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware
requirements. This extension is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond
`x15` will raise an _illegal instruction exception_.

[IMPORTANT]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.

==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediates: `lui`, `auipc`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach.
Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.

[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.

==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division instructions are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete, regardless of the input operands.

==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension.
It implements the multiplication-only operations
of the `M` extension and is intended for small-scale applications that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the `M` extension.

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.

Note that the `M` and `Zmmul` extensions _cannot_ be enabled at the same time.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=-march=rv32im USER_FLAGS+=-mno-div clean_all exe`).

==== **`U`** - Less-Privileged User Mode

Adds the less-privileged _user mode_ if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For
instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like
peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode.

==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.

The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the `mie`
and `mip` CSRs. This extension is mapped to bits that are available for custom use (according to the
RISC-V specs).
Also, custom trap codes for `mcause` are implemented.
* The CPU provides a single _non-maskable_ interrupt (`NMI`) that also provides a custom trap code for `mcause`.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).

==== **`Zfinx`** Single-Precision Floating-Point Operations

[WARNING]
The NEORV32 `Zfinx` extension is specification-compliant and operational but still _experimental_.

The `Zfinx` floating-point extension is an alternative to the `F` floating-point extension that uses the
integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since no dedicated floating-point `f`
register file exists, the `Zfinx` extension requires fewer hardware resources and features faster context changes.
This also implies that there are NO dedicated `f` register file related load/store or move instructions. The
official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension currently only supports single-precision (`.s` suffix) operations (so it is a direct alternative to the `F`
extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_.
In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
* computational: `fadd.s`, `fsub.s`, `fmul.s`
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
* number classification: `fclass.s`
* additional CSRs: `fcsr`, `frm`, `fflags`

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.

[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).

==== **`Zbb`** Basic Bit-Manipulation Operations

[WARNING]
The NEORV32 `Zbb` extension is specification-compliant and operational but still _experimental_.

The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extensions `B`.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip

The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
generic is _true_. In this case the following instructions are available:

* `andn`, `orn`, `xnor`
* `clz`, `ctz`, `cpop`
* `max`, `maxu`, `min`, `minu`
* `sext.b`, `sext.h`, `zext.h`
* `rol`, `ror`, `rori`
* `orc.b`, `rev8`

[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`.
To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
shift-related `Zbb` instructions.

[IMPORTANT]
The `Zbb` extension is frozen but not officially ratified yet. There is no
software support for this extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zbb` extension from C-language
code (see `sw/example/bitmanip_test`).

==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) are implemented when the
`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are
available:

* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* environment: `mret`, `wfi`

[WARNING]
If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception
support at all. In order to provide the full spectrum of functions and to allow a secure execution
environment, the `Zicsr` extension should always be enabled.

[NOTE]
The "wait for interrupt" instruction `wfi` works like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.

[IMPORTANT]
The `wfi` instruction will raise an illegal instruction exception when it is executed outside of machine mode
while the <<_mstatus>> bit `TW` (timeout wait) is set.

==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_.
It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

[NOTE]
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
This allows a clean re-fetch of modified data from memory. Also, the top's `i_bus_fence_o` signal is set
high for one cycle to inform the memory system. Any additional flags within the `fence.i` instruction word
are ignored by the hardware.

[NOTE]
If the `Zifencei` extension is disabled (_CPU_EXTENSION_RISCV_Zifencei_ generic = false), a `fence.i`
instruction is executed as a `nop` (it will **not trap**) and none of the functions
described above will be executed.

==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible with the PMP specified by the RISC-V specs.
The CPU PMP currently only supports _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured
via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the
`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available:

* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers

See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.

**Configuration**

The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes.
`PMP_NUM_REGIONS` defines the total number of implemented regions and thus the
number of available `pmpcfg*` and `pmpaddr*` CSRs.

When implementing more PMP regions than a _certain critical limit_ *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data accesses by one cycle.

The critical limit can be adapted for custom use by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----

**Operation**

Any memory access address (from the CPU's instruction fetch or data access interface) is tested to determine whether it falls into any
of the specified (configured via `pmpaddr*` and enabled via `pmpcfg*`) PMP regions. If an
address accesses one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked:

* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set

If an access to a protected region does not have the according access rights (attributes) it will raise the according
_instruction/load/store access fault exception_.

By default, all PMP checks are enforced for user-level programs only.
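
The region match and the access-right rules described above can be modelled in a few lines of C. This is only a minimal sketch for illustration, not the CPU's hardware implementation: the names `pmp_entry_t`, `pmp_napot_range` and `pmp_check` are hypothetical, only a single region in _NAPOT_ mode is modelled, and the `R`/`W`/`X`/`L` bit positions as well as the NAPOT address encoding are assumed from the RISC-V privileged architecture specification. What happens to an access that matches no region is deliberately left to the caller.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* pmpcfg bit positions (assumed from the RISC-V privileged specification) */
#define PMP_R     (1u << 0) /* read permission */
#define PMP_W     (1u << 1) /* write permission */
#define PMP_X     (1u << 2) /* execute permission */
#define PMP_A_MSK (3u << 3) /* address-matching mode field */
#define PMP_NAPOT (3u << 3) /* mode = NAPOT */
#define PMP_L     (1u << 7) /* lock bit: region is also enforced in machine mode */

typedef enum { PMP_NO_MATCH, PMP_ALLOW, PMP_DENY } pmp_result_t;

/* Hypothetical software view of one PMP region (one pmpcfg byte + one pmpaddr). */
typedef struct {
    uint8_t  cfg;
    uint32_t addr; /* holds byte address bits [33:2] */
} pmp_entry_t;

/* Decode a NAPOT pmpaddr value: t trailing ones select a 2^(t+3)-byte region. */
static void pmp_napot_range(uint32_t pmpaddr, uint64_t *base, uint64_t *size)
{
    unsigned t = 0;
    while (t < 32 && ((pmpaddr >> t) & 1u)) {
        t++;
    }
    *size = 1ull << (t + 3);
    *base = ((uint64_t)pmpaddr << 2) & ~(*size - 1);
}

/* Check one access against one region; perm is PMP_R, PMP_W or PMP_X. */
static pmp_result_t pmp_check(pmp_entry_t e, uint32_t byte_addr, uint8_t perm, bool machine_mode)
{
    uint64_t base, size;
    if ((e.cfg & PMP_A_MSK) != PMP_NAPOT) {
        return PMP_NO_MATCH; /* region is OFF (other modes are not modelled) */
    }
    pmp_napot_range(e.addr, &base, &size);
    if (byte_addr < base || byte_addr >= base + size) {
        return PMP_NO_MATCH;
    }
    if (machine_mode && !(e.cfg & PMP_L)) {
        return PMP_ALLOW; /* unlocked regions are enforced for user mode only */
    }
    return (e.cfg & perm) ? PMP_ALLOW : PMP_DENY;
}

int main(void)
{
    /* 64-byte read-only region at 0x80000000: (base >> 2) | (size / 8 - 1) */
    pmp_entry_t ro = { .cfg = PMP_NAPOT | PMP_R, .addr = (0x80000000u >> 2) | 0x7u };

    assert(pmp_check(ro, 0x80000010u, PMP_R, false) == PMP_ALLOW);    /* user load */
    assert(pmp_check(ro, 0x80000010u, PMP_W, false) == PMP_DENY);     /* user store */
    assert(pmp_check(ro, 0x80000010u, PMP_W, true)  == PMP_ALLOW);    /* machine store, unlocked */
    assert(pmp_check(ro, 0x80000040u, PMP_R, false) == PMP_NO_MATCH); /* just outside the region */

    ro.cfg |= PMP_L; /* locking the region extends the check to machine mode */
    assert(pmp_check(ro, 0x80000010u, PMP_W, true) == PMP_DENY);

    puts("PMP sketch checks passed");
    return 0;
}
----

A denied access in this model corresponds to the _instruction/load/store access fault exception_ raised by the hardware. The three-valued result keeps the "no region matched" case separate, so the sketch does not make any claim about how the CPU treats such accesses.
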
If you wish to enforce the physical
memory protection also for machine-level programs you need to activate the _locked bit_ in the according
`pmpcfg*` configuration.

[IMPORTANT]
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.

[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual – Volume II: Privileged Architecture_ specifications.

==== **`HPM`** Hardware Performance Monitors

In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.

The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
the instructions-retired counter increments with each executed instruction. The actual hardware performance
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
available HPMs is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will exclude
all HPM logic from the design.

Depending on the configuration, the following additional CSRs are available:

* counters: `mhpmcounter*[h]` (3..31, depending on configuration)
* event configuration: `mhpmevent*` (3..31, depending on configuration)

[IMPORTANT]
The HPM counter CSRs can only be accessed in machine mode.
Hence, the according `mcounteren` CSR bitsare always zero and read-only.Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and theaccording `mcountinhibit` CSR bits are hardwired to zero.However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).The according CSRs are read-only and will always return 0.[NOTE]For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>.<<<// ####################################################################################################################:sectnums:=== Instruction TimingThe instruction timing listed in the table below shows the required clock cycles for executing a certaininstruction. These instruction cycles assume a bus access without additional wait states and a filledpipeline.Average CPI (cycles per instructions) values for "real applications" like for executing the CoreMark benchmark for different CPUconfigurations are presented in <<_cpu_performance>>..Clock cycles per instruction[cols="<2,^1,^4,<3"][options="header", grid="rows"]|=======================| Class | ISA | Instruction(s) | Execution cycles| ALU | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2| ALU | `C` | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2| ALU | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32| ALU | `C` | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:| Branches | `I/E` | `beq` 
`bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches | `C` | `c.beqz` `c.bnez` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr` | 4 + ML
| Jumps / Calls | `C` | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C` | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 4 + ML
| Memory access | `A` | `lr.w` `sc.w` | 4 + ML
| Multiplication | `M` | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division | `M` | `div` `divu` `rem` `remu` | 22+32+4
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 5
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Basic bit-manip - logic | `Zbb` | `andn` `orn` `xnor` | 3
| Basic
bit-manip - shift | `Zbb` | `clz` `ctz` `cpop` `rol` `ror` `rori` | 4+SA, FAST_SHIFT: 4
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
| Basic bit-manip - misc | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
|=======================

[NOTE]
The presented values of the *floating-point execution cycles* are average values – obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.

// ####################################################################################################################
include::cpu_csr.adoc[]

<<<
// ####################################################################################################################
:sectnums:
==== Full Virtualization

Just like the RISC-V ISA the NEORV32 aims to support _maximum virtualization_ capabilities
on CPU _and_ SoC level. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs for any expected and unexpected situation (e.g. executing a
malformed instruction word or accessing a not-allocated address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
have to be undone). This allows predictable execution behavior - and thus, defined operations to resolve the cause
of the trap - at any time improving overall _execution safety_.

**NEORV32-Specific Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e.
there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
accessed address does not respond or encounters an internal error during access.
* The CPU raises an illegal instruction trap for _all_ unimplemented/malformed/illegal instructions.
* To be continued...

<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following nomenclature regarding traps is used:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec`
CSR. The cause of the interrupt or exception can be determined via the content of the `mcause`
CSR. The address that reflects the current program counter when a trap was taken is stored to the `mepc` CSR.
Additional information regarding the cause of the trap can be retrieved from the `mtval` CSR.

The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
is serviced first while the remaining ones are queued. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.

.Trigger Type
[IMPORTANT]
All CPU interrupt request signals are high-level triggered.
So an interrupt request will be generated if the
corresponding signal is _high_ for exactly one cycle (being high for several cycles might cause multiple
triggering of the same interrupt).

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations – interrupts can only trigger between two instructions.

:sectnums:
==== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus write-operation at all.

:sectnums:
==== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
provide custom trap codes in `mcause`. These FIRQs are reserved for processor-internal usage only.

:sectnums:
==== Non-Maskable Interrupt

The NEORV32 CPU features a single non-maskable interrupt source via the `nm_irq_i` CPU (/Processor) top
entity signal. This interrupt can be used to signal _critical_ system conditions that need immediate handling.
The non-maskable interrupt _cannot_ be masked/disabled at all (not even in interrupt service routines).
Hence, it does _not_ provide configuration/status flags in the `mie` and `mip` CSRs. The RISC-V-compatible
`mcause` value `0x80000000` is used to indicate the non-maskable interrupt.

<<<
// ####################################################################################################################
:sectnums!:
===== NEORV32 Trap Listing

.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio.
| `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1 | `0x80000000` | 1.0 | _TRAP_CODE_NMI_ | non-maskable interrupt | _I-PC_ | _0_
| 2 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_ | machine external interrupt | _I-PC_ | _0_
| 3 | `0x80000003` | 1.3 | _TRAP_CODE_MSI_ | machine software interrupt | _I-PC_ | _0_
| 4 | `0x80000007` | 1.7 | _TRAP_CODE_MTI_ | machine timer interrupt | _I-PC_ | _0_
| 5 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_ | fast interrupt request channel 0 | _I-PC_ | _0_
| 6 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_ | fast interrupt request channel 1 | _I-PC_ | _0_
| 7 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_ | fast interrupt request channel 2 | _I-PC_ | _0_
| 8 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_ | fast interrupt request channel 3 | _I-PC_ | _0_
| 9 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_ | fast interrupt request channel 4 | _I-PC_ | _0_
| 10 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_ | fast interrupt request channel 5 | _I-PC_ | _0_
| 11 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_ | fast interrupt request channel 6 | _I-PC_ | _0_
| 12 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_ | fast interrupt request channel 7 | _I-PC_ | _0_
| 13 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_ | fast interrupt request channel 8 | _I-PC_ | _0_
| 14 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_ | fast interrupt request channel 9 | _I-PC_ | _0_
| 15 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_ | fast interrupt request channel 10 | _I-PC_ | _0_
| 16 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_ | fast interrupt request channel 11 | _I-PC_ | _0_
| 17 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_ | fast interrupt request channel 12 | _I-PC_ | _0_
| 18 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_ | fast interrupt request channel 13 | _I-PC_ | _0_
| 19 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_ | fast interrupt request channel 14 | _I-PC_ | _0_
| 20 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_ | fast interrupt request channel 15 | _I-PC_ | _0_
| 21 | `0x00000001` | 0.1 |
_TRAP_CODE_I_ACCESS_ | instruction access fault | _B-ADR_ | _PC_
| 22 | `0x00000002` | 0.2 | _TRAP_CODE_I_ILLEGAL_ | illegal instruction | _PC_ | _Inst_
| 23 | `0x00000000` | 0.0 | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 24 | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_ | environment call from M-mode (ECALL in machine-mode) | _PC_ | _PC_
| 25 | `0x00000008` | 0.8 | _TRAP_CODE_UENV_CALL_ | environment call from U-mode (ECALL in user-mode) | _PC_ | _PC_
| 26 | `0x00000003` | 0.3 | _TRAP_CODE_BREAKPOINT_ | breakpoint (EBREAK) | _PC_ | _PC_
| 27 | `0x00000006` | 0.6 | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 28 | `0x00000004` | 0.4 | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 29 | `0x00000007` | 0.7 | _TRAP_CODE_S_ACCESS_ | store access fault | _B-ADR_ | _B-ADR_
| 30 | `0x00000005` | 0.5 | _TRAP_CODE_L_ACCESS_ | load access fault | _B-ADR_ | _B-ADR_
|=======================

**Notes**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the corresponding trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
be used in plain C code.
The "`mepc`" and "`mtval`" columns show the value written to the
`mepc` and `mtval` CSRs when a trap is triggered:

* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself

<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface

The CPU provides two independent bus interfaces: one for fetching instructions (`i_bus_*`) and one for
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.

:sectnums:
===== Address Space

The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte.
Please note that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_ – however, a software-based handling can be
implemented as any unaligned memory access will trigger a corresponding exception.

:sectnums:
===== Interface Signals

The following table shows the signals of the data and instruction interfaces seen from the CPU
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).

.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o` | 32 | access address
| `bus_rdata_i` | 32 | data input for read operations
| `bus_wdata_o` | 32 | data output for write operations
| `bus_ben_o` | 4 | byte enable signal for write operations
| `bus_we_o` | 1 | bus write access
| `bus_re_o` | 1 | bus read access
| `bus_lock_o` | 1 | exclusive access request
| `bus_ack_i` | 1 | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i` | 1 | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o` | 1 | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o` | 2 | current CPU privilege level
|=======================

[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly".

:sectnums:
===== Protocol

A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral sets either the `bus_ack_i` signal (-> successful completion) or the
`bus_err_i` signal (-> failed completion). All these control signals are only active (= high) for one
single cycle.
An error indicated via the `bus_err_i` signal during a transfer will trigger the corresponding instruction bus
access fault or load/store bus access fault exception.

[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.

.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**.
However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).

**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access
| Write access
|=======================

**Write Access**

For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.

**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).

**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit) and word (= 32-bit) boundaries.

**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`).
This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write-back non-zero and will not generate an actual memory store operation.

The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)

[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.

**Memory Barriers**

Whenever the CPU executes a fence instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).

<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset

In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a dedicated hardware reset**.
"Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.

**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a write-back
of the processing result to some kind of memory. The initial status of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".

**NEORV32 CPU Reset**

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers will get initialized by the CPU’s internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).

During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset.
The value after reset of this register is uncritical as interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated
hardware reset setting it to low (globally disabling interrupts).

**Reset Configuration**

Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of the uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):

[source,vhdl]
----
-- "critical" number of PMP regions --
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value
-- for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW),
-- default; TRUE=defined LOW reset value)
----
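
The argument that an undefined reset value of `mie` is harmless can be checked with a small host-side model.
The following is only a sketch in plain C and not part of the NEORV32 sources: the helper names
(`irq_can_fire`, `mie_reset_value_is_uncritical`, `self_check`) are hypothetical, the position of the global
enable flag (bit 3 of `mstatus`) is taken from the RISC-V privileged specification, and the non-maskable
interrupt is deliberately left out because it bypasses both enable levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 3 of mstatus is the global machine-mode interrupt enable (MIE). */
#define MSTATUS_MIE_BIT 3u

/* Hypothetical helper: a maskable interrupt can only be taken if it is
 * pending (mip), individually enabled (mie) and globally enabled. */
static bool irq_can_fire(uint32_t mstatus, uint32_t mie, uint32_t mip)
{
    bool global_enable = ((mstatus >> MSTATUS_MIE_BIT) & 1u) != 0u;
    return global_enable && ((mie & mip) != 0u);
}

/* Returns true if none of the tested (arbitrary) mie patterns lets an
 * interrupt through while the global enable flag is at its reset value. */
static bool mie_reset_value_is_uncritical(void)
{
    const uint32_t mstatus_after_reset = 0u;          /* mstatus(mie) = 0 */
    const uint32_t all_pending         = 0xFFFFFFFFu; /* worst case       */
    const uint32_t patterns[] = {
        0x00000000u, 0xFFFFFFFFu, 0xA5A5A5A5u, 0x00000888u
    };

    for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
        if (irq_can_fire(mstatus_after_reset, patterns[i], all_pending)) {
            return false;
        }
    }
    return true;
}

/* Small self-check that can be called from any test harness. */
static void self_check(void)
{
    /* Global enable cleared: nothing fires, whatever mie contains. */
    assert(mie_reset_value_is_uncritical());
    /* Global enable set and bit 7 both enabled and pending: fires. */
    assert(irq_can_fire(1u << MSTATUS_MIE_BIT, 0x00000080u, 0x00000080u));
}
```

In this model the content of `mie` only starts to matter once software sets the global enable flag, which
matches the order described above: the start-up code initializes the interrupt CSRs first and enables
interrupts globally afterwards.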
