OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [docs/] [datasheet/] [cpu.adoc] - Blame information for rev 65

:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]

**Key Features**

* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zbb` - basic bit-manipulation operations
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
** `DB` - debug mode
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate interfaces for instruction fetch and data access (merged into a single bus via a bus switch for
the NEORV32 processor)
* Little-endian byte order
* Configurable hardware reset
* No hardware support for unaligned data/instruction accesses - they will trigger an exception.

[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also allows you to keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU uses a pipelined architecture with two main stages. The first stage (IF - instruction fetch)
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
stored in a FIFO - the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
instruction words for the next pipeline stage. Compressed instructions - if enabled - are also decompressed
in this stage. The second stage (EX - execution) is responsible for actually executing the fetched instructions
via the execute engine.

These two pipeline stages are based on a multi-cycle processing engine, so the processing of each stage for a
certain operation can take several cycles. Since the IF and EX stages are decoupled via the instruction
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores,
multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffer
due to a taken branch.
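
The effect of multi-cycle operations on the average CPI can be illustrated with a small back-of-the-envelope model. The following Python sketch is for illustration only: the baseline of 2 cycles per instruction is taken from the text above, while the extra-cycle numbers and the instruction mix are hypothetical placeholders and not measured NEORV32 figures.

```python
# Back-of-the-envelope CPI model (illustrative only).
# BASE_CPI is the optimal value stated above; every other number is a
# hypothetical placeholder and NOT a measured NEORV32 figure.
BASE_CPI = 2

# Assumed additional cycles per instruction class (hypothetical).
EXTRA_CYCLES = {
    "alu": 0,
    "load_store": 2,
    "taken_branch": 4,
    "div": 34,
}


def average_cpi(mix):
    """Return the average CPI for an instruction mix.

    `mix` maps an instruction class to its fraction of all executed
    instructions; the fractions must sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(frac * (BASE_CPI + EXTRA_CYCLES[cls]) for cls, frac in mix.items())


if __name__ == "__main__":
    mix = {"alu": 0.6, "load_store": 0.25, "taken_branch": 0.1, "div": 0.05}
    print(f"average CPI: {average_cpi(mix):.2f}")
```

Replacing the placeholder numbers with cycle counts measured on an actual configuration turns this into a rough estimate for a given workload.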

In essence, the NEORV32 CPU sits between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction in a series of consecutive micro-operations. The combination of these two classical
design paradigms allows a higher instruction throughput than a pure multi-cycle approach (due to
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).

The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
devices are mapped to a single 32-bit address space, making the architecture a modified von Neumann
architecture.


// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the `rv32_m/I`, `rv32_m/M`, `rv32_m/C`, `rv32_m/privilege`, and
`rv32_m/Zifencei` tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.

[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.

.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................

.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................

.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................

.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................

.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................


<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently identified issues regarding full RISC-V compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.

.Hardwired R/W CSRs
[IMPORTANT]
The `misa`, `mip` and `mtval` CSRs in the NEORV32 are _read-only_.
Any write access to them (in machine mode) is ignored and will _not_ cause any exceptions or side effects.
Pending interrupts can only be cleared by acknowledging the interrupt-causing device. However, pending interrupts
can still be ignored by clearing the corresponding `mie` register bits.

.Physical memory protection
[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
currently only supports the modes _OFF_ and _NAPOT_ and a minimum granularity of 8 bytes per region.
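
For reference, a _NAPOT_ region is described by a single address register value. The following Python sketch shows the address encoding as defined by the RISC-V privileged specification, restricted to the 8-byte minimum granularity stated above. The helper name `napot_pmpaddr` is hypothetical and not part of the NEORV32 software framework.

```python
# NAPOT (naturally aligned power-of-two) address encoding as defined by the
# RISC-V privileged specification. The helper name is hypothetical.
MIN_REGION_BYTES = 8  # minimum granularity stated in this section


def napot_pmpaddr(base, size):
    """Return the 32-bit pmpaddr value for a NAPOT region.

    `base` is the byte address of the region, `size` its length in bytes.
    `size` must be a power of two >= 8 and `base` must be aligned to `size`.
    """
    if size < MIN_REGION_BYTES or size & (size - 1):
        raise ValueError("size must be a power of two of at least 8 bytes")
    if base % size:
        raise ValueError("base must be naturally aligned to size")
    # pmpaddr holds address bits [33:2]; the number of trailing one bits
    # encodes the region size (none for 8 bytes, one for 16 bytes, ...).
    return ((base >> 2) | ((size >> 3) - 1)) & 0xFFFFFFFF


if __name__ == "__main__":
    # 64 KiB region starting at 0x80000000 (example values only)
    print(hex(napot_pmpaddr(0x80000000, 64 * 1024)))
```

In the matching configuration entry, the `A` field selects the mode: `0` for _OFF_ and `3` for _NAPOT_ (RISC-V privileged specification).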

.Atomic memory operations
[IMPORTANT]
The `A` CPU extension currently only implements the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.

.Instruction Misalignment
[NOTE]
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
architecture specifications: for 32-bit-only instructions (no `C` extension) the misaligned instruction exception
is raised if bit 1 of the access address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
there will be no misaligned instruction exceptions _at all_.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
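
The alignment rule from the note above can be written down as a small predicate. This is a minimal Python sketch of the described behavior, not the actual hardware logic; the function names are hypothetical.

```python
def fetch_address(target):
    """Bit 0 of the program counter is hardwired to zero."""
    return target & 0xFFFFFFFE


def instruction_misaligned(target, c_extension):
    """Return True if a jump/branch to `target` raises a misaligned
    instruction exception under the rule described above."""
    if c_extension:
        return False  # with `C`, 16-bit alignment is always legal
    # without `C`: bit 1 set means "not on a 32-bit boundary"
    return bool(fetch_address(target) & 0b10)


if __name__ == "__main__":
    for addr in (0x100, 0x101, 0x102):
        print(hex(addr), instruction_misaligned(addr, c_extension=False))
```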


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.

.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir. | Function
4+^| **Global Signals**
| `clk_i`          |     1 | in   | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in   | global reset, low-active
| `sleep_o`        |     1 | out  | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out  | destination address
| `i_bus_rdata_i`  |    32 | in   | read data
| `i_bus_wdata_o`  |    32 | out  | write data (always zero)
| `i_bus_ben_o`    |     4 | out  | byte enable
| `i_bus_we_o`     |     1 | out  | write transaction (always zero)
| `i_bus_re_o`     |     1 | out  | read transaction
| `i_bus_lock_o`   |     1 | out  | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in   | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in   | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out  | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out  | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out  | destination address
| `d_bus_rdata_i`  |    32 | in   | read data
| `d_bus_wdata_o`  |    32 | out  | write data
| `d_bus_ben_o`    |     4 | out  | byte enable
| `d_bus_we_o`     |     1 | out  | write transaction
| `d_bus_re_o`     |     1 | out  | read transaction
| `d_bus_lock_o`   |     1 | out  | exclusive access request
| `d_bus_ack_i`    |     1 | in   | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in   | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out  | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out  | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in   | system time input (from MTIME)
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in   | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in   | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in   | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in   | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in   | request CPU to halt and enter debug mode
|=======================

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======


<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see _The RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual -
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

[TIP]
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
or by executing an instruction and checking for an _illegal instruction exception_.
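
As an illustration of the first discovery option, the single-letter extension bits of a `misa` value can be decoded as sketched below. The bit-to-letter mapping follows the RISC-V privileged specification; the function names and the example value are hypothetical and do not describe a specific NEORV32 configuration.

```python
def misa_extensions(misa):
    """Return the single-letter extensions flagged in a `misa` value.

    Bit 0 corresponds to 'A', bit 1 to 'B', ... bit 25 to 'Z'
    (RISC-V privileged specification).
    """
    return "".join(chr(ord("A") + bit) for bit in range(26) if misa & (1 << bit))


def has_extension(misa, letter):
    """Check a single extension letter, e.g. has_extension(misa, 'C')."""
    return bool(misa & (1 << (ord(letter.upper()) - ord("A"))))


if __name__ == "__main__":
    # Example value only: MXL = 1 (32-bit) with C, I, M, U and X set.
    example = (1 << 30) | (1 << 2) | (1 << 8) | (1 << 12) | (1 << 20) | (1 << 23)
    print(misa_extensions(example))
```

Multi-letter extensions such as `Zicsr` have no letter bit in `misa` and cannot be detected this way.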

[NOTE]
Executing an instruction from an extension that is not supported yet or that is currently not enabled
(via the corresponding top entity generic) will raise an _illegal instruction_ exception.


==== **`A`** - Atomic Memory Access

Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
The RISC-V specification defines a dedicated _atomic_ extension that provides instructions for atomic memory accesses. The `A`
ISA extension is enabled if the `CPU_EXTENSION_RISCV_A` configuration generic is _true_.
In this case the following additional instructions are available:

* `lr.w`: load-reservate
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.

The *load-reservate* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
_data memory access lock_. Executing a *store-conditional* behaves as a "normal" store-word instruction (`sw`) that will
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
After the execution of the `sc.w` instruction, the lock is automatically removed.

The lock is broken if at least one of the following conditions occurs:
. executing any data memory access instruction other than `lr.w`
. raising _any_ trap (for example an interrupt or a memory access exception)
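
The lock rules above, and the way a further atomic operation can be emulated on top of them, can be sketched with a small behavioral model. This is an illustrative Python toy model under the assumptions stated in its comments, not a description of the actual hardware; the class and function names are hypothetical.

```python
class LrScModel:
    """Toy model of the lock behaviour described above (not cycle-accurate)."""

    def __init__(self):
        self.mem = {}      # word-addressed data memory, unset words read as 0
        self.lock = False  # CPU-internal data memory access lock

    def lw(self, addr):
        self.lock = False  # any data access other than lr.w breaks the lock
        return self.mem.get(addr, 0)

    def sw(self, addr, value):
        self.lock = False
        self.mem[addr] = value & 0xFFFFFFFF

    def trap(self):
        self.lock = False  # any trap breaks the lock

    def lr_w(self, addr):
        value = self.mem.get(addr, 0)
        self.lock = True   # load the word and set the lock
        return value

    def sc_w(self, addr, value):
        """Return 0 on success (lock intact), non-zero otherwise."""
        ok = self.lock
        if ok:
            self.mem[addr] = value & 0xFFFFFFFF
        self.lock = False  # the lock is always removed after sc.w
        return 0 if ok else 1


def amoadd_w(cpu, addr, increment):
    """Emulate an atomic add with an lr.w/sc.w retry loop; returns the old value."""
    while True:
        old = cpu.lr_w(addr)
        if cpu.sc_w(addr, old + increment) == 0:
            return old


if __name__ == "__main__":
    cpu = LrScModel()
    cpu.sw(0x100, 5)
    print(amoadd_w(cpu, 0x100, 3), cpu.mem[0x100])
```

On real hardware the retry loop only repeats when a trap or another data access hits between `lr.w` and `sc.w`; in this single-threaded model it always succeeds on the first pass.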

[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.


==== **`C`** - Compressed Instructions

The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
In this case the following instructions are available:

* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
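
The condition that triggers the additional fetch can be expressed as a short predicate. The following Python sketch assumes that the fetch engine reads aligned 32-bit words (an assumption made for illustration) and uses the standard RISC-V rule that an instruction whose two lowest bits are not `0b11` is a 16-bit instruction. The function names are hypothetical.

```python
def is_compressed(first_halfword):
    """A 16-bit instruction has its two lowest bits != 0b11 (RISC-V base encoding)."""
    return (first_halfword & 0b11) != 0b11


def needs_extra_fetch(target, first_halfword):
    """True if the branch target is a 32-bit instruction that straddles two
    aligned 32-bit words, i.e. it is both unaligned and uncompressed."""
    unaligned = (target & 0b11) == 0b10
    return unaligned and not is_compressed(first_halfword)


if __name__ == "__main__":
    # 0x0013 is the lower half-word of `addi x0, x0, 0` (a 32-bit instruction)
    print(needs_extra_fetch(0x102, 0x0013), needs_extra_fetch(0x100, 0x0013))
```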


==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general-purpose register file from 32 entries to 16 entries to
decrease physical hardware requirements (for example block RAM). This extension is enabled when the `CPU_EXTENSION_RISCV_E`
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
This extension does not add any additional instructions or features.

[IMPORTANT]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.


==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediate: `lui`, `auipc`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.

[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.


==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division operations are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete - regardless of the input operands.


==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.

Note that the `M` and `Zmmul` extensions _cannot_ be enabled at the same time.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using an `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
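
To give an idea of what "computed entirely in software" means for divisions, the following Python sketch shows a classic shift-and-subtract (restoring) division on 32-bit unsigned operands. It is an illustration of the general technique only and is not the routine the toolchain's runtime library actually uses; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def udivmod32(dividend, divisor):
    """Unsigned 32-bit restoring (shift-and-subtract) division.

    Returns (quotient, remainder). Division by zero follows the results the
    RISC-V `divu`/`remu` instructions define: an all-ones quotient and the
    dividend as remainder.
    """
    dividend &= MASK32
    divisor &= MASK32
    if divisor == 0:
        return MASK32, dividend
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # shift the next dividend bit into the partial remainder
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


if __name__ == "__main__":
    print(udivmod32(100, 7))
```

The loop always runs 32 iterations, which is the software counterpart of a bit-serial divider.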


==== **`U`** - Less-Privileged User Mode

In addition to the basic (and highest-privileged) machine mode, the _user-mode_ ISA extension adds a second, less-privileged
operation mode. It is implemented if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_.
Code executed in user mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
Any kind of privilege rights violation will raise an exception to allow full virtualization.


==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.

The most important points of the NEORV32-specific extensions are:
* The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the `mie`
and `mip` CSRs. This extension is mapped to _reserved_ CSR bits that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for `mcause` are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).


==== **`Zfinx`** Single-Precision Floating-Point Operations

The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
fewer hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
register file-related load/store or move instructions.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

[TIP]
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
to the `F` extension. The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_. In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
* computational: `fadd.s`, `fsub.s`, `fmul.s`
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
* number classification: `fclass.s`

* additional CSRs: `fcsr`, `frm`, `fflags`

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_, setting them to +/- 0 before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
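
The flush-to-zero rule can be stated on the raw single-precision bit pattern. The following Python sketch models the behavior described above (exponent field equal to zero yields a signed zero); it is an illustration, not the FPU's implementation, and the function names are hypothetical.

```python
import struct


def flush_subnormal(bits):
    """Flush a single-precision bit pattern to +/-0 if its exponent field is 0."""
    bits &= 0xFFFFFFFF
    exponent = (bits >> 23) & 0xFF
    if exponent == 0:
        return bits & 0x80000000  # keep only the sign bit
    return bits


def float_to_bits(value):
    """Pack a Python float into its IEEE-754 single-precision bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


if __name__ == "__main__":
    # 1e-40 is below the smallest normal single-precision value
    print(hex(flush_subnormal(float_to_bits(1e-40))))
    print(hex(flush_subnormal(float_to_bits(1.0))))
```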

[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).


==== **`Zbb`** Basic Bit-Manipulation Operations

The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extension `B`.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip

The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
generic is _true_. In this case the following instructions are available:

* `andn`, `orn`, `xnor`
* `clz`, `ctz`, `cpop`
* `max`, `maxu`, `min`, `minu`
* `sext.b`, `sext.h`, `zext.h`
* `rol`, `ror`, `rori`
* `orc.b`, `rev8`
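
For readers unfamiliar with the less obvious mnemonics, the following Python sketch gives bit-level reference models for a few of the listed instructions on 32-bit operands. The semantics follow the RISC-V bit-manipulation specification; the code is an illustration and unrelated to the intrinsic library shipped with the project.

```python
MASK32 = 0xFFFFFFFF


def clz(x):
    """Count leading zero bits; 32 for x == 0."""
    return 32 - (x & MASK32).bit_length()


def ctz(x):
    """Count trailing zero bits; 32 for x == 0."""
    x &= MASK32
    return 32 if x == 0 else (x & -x).bit_length() - 1


def cpop(x):
    """Count set bits."""
    return bin(x & MASK32).count("1")


def ror(x, shamt):
    """Rotate right by the lower 5 bits of shamt."""
    x &= MASK32
    shamt &= 31
    return ((x >> shamt) | (x << (32 - shamt))) & MASK32


def orc_b(x):
    """Set every byte that contains at least one set bit to 0xFF."""
    return sum(0xFF << s for s in range(0, 32, 8) if (x >> s) & 0xFF)


def rev8(x):
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


if __name__ == "__main__":
    print(clz(1), ctz(8), cpop(0xF0F0), hex(rev8(0x11223344)))
```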

[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement fully parallel logic (like barrel shifters) for all
shift-related `Zbb` instructions.

[WARNING]
The `Zbb` extension is frozen but not officially ratified yet. There is no
software support for this extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the `Zbb` extension from C-language
code (see `sw/example/bitmanip_test`).


==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
are implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
In this case the following instructions are available:

* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* environment: `mret`, `wfi`
573
 
574
[WARNING]
575 65 zero_gravi
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
576
In order to provide the full set of functions and to allow a secure execution
577
environment the `Zicsr` extension should always be enabled.
578 60 zero_gravi
 
579
[NOTE]
580
The "wait for interrupt instruction" `wfi` works like a sleep command. When executed, the CPU is
581
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
582
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
583
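
The wake-up condition described in the note can be written as a small C predicate. This is a behavioral sketch only: the function name `wfi_wakes_up` is hypothetical, the pending bits are assumed to come from the `mip` CSR, and the position of the global interrupt enable flag (bit 3 of `mstatus`) is taken from the RISC-V privileged specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* global machine interrupt enable flag (mstatus.MIE, bit 3) */
#define MSTATUS_MIE (UINT32_C(1) << 3)

/* Returns true if a CPU sleeping in wfi would resume execution:
 * at least one pending source must be enabled in mie AND the
 * global interrupt enable flag in mstatus must be set. */
bool wfi_wakes_up(uint32_t mstatus, uint32_t mie, uint32_t mip) {
    bool source_enabled_and_pending = (mie & mip) != 0;
    bool globally_enabled = (mstatus & MSTATUS_MIE) != 0;
    return source_enabled_and_pending && globally_enabled;
}
```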
 
[NOTE]
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
`TW` (timeout wait) is hardwired to zero.


==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.


==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible with the RISC-V PMP specifications. It can be used
to constrain memory read/write/execute rights for each available privilege level.

The NEORV32 PMP currently supports only _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger
minimal sizes can be configured via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements.
The physical memory protection system is implemented when the `PMP_NUM_REGIONS` configuration generic is >0.
In this case the following additional CSRs are available:

* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
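
As an illustration of _NAPOT_ (naturally aligned power-of-two) addressing, the following C sketch encodes and decodes a `pmpaddr*` value according to the RISC-V privileged specification: the register holds the region base address shifted right by two bits, and the number of trailing one bits selects the region size (no trailing ones = 8 bytes). The helper names are hypothetical and the code does not model `PMP_MIN_GRANULARITY`.

```c
#include <stdint.h>

/* Encode a NAPOT region into a pmpaddr value.
 * Assumes `size` is a power of two >= 8 and `base` is aligned to `size`. */
uint32_t pmp_napot_encode(uint32_t base, uint64_t size) {
    return (uint32_t)((base >> 2) | ((size >> 3) - 1));
}

/* Decode a pmpaddr value back into region base address and size in bytes. */
void pmp_napot_decode(uint32_t pmpaddr, uint64_t *base, uint64_t *size) {
    unsigned ones = 0;
    /* count the trailing one bits that encode the region size */
    while (ones < 32 && ((pmpaddr >> ones) & 1u)) ones++;
    uint64_t low_mask = (UINT64_C(1) << ones) - 1;
    *size = UINT64_C(8) << ones;                   /* 8 bytes * 2^ones */
    *base = ((uint64_t)pmpaddr & ~low_mask) << 2;  /* drop size bits, restore byte address */
}
```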
 
[TIP]
See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.

The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
number of available `pmpcfg*` and `pmpaddr*` CSRs.

When implementing more PMP regions than a _certain critical limit_ *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data accesses by +1 cycle.

The critical limit can be adapted for custom use by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----

**Operation**

Any CPU memory access address (from the instruction fetch or data access interface) is tested if it is accessing _any_
of the specified PMP regions (configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
address matches one of these regions, the configured access rights (attributes in `pmpcfg*`) are enforced:

* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set

If an access to a protected region does not have the required access rights it will raise the corresponding
instruction/load/store _access fault_ exception.

By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs you need to set the _locked bit_ in the corresponding
`pmpcfg*` configuration CSR.
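
The access rules above can be summarized in a short C model. This is a sketch of the decision for a region that has already matched, not the hardware implementation; the bit positions of the `R`, `W`, `X` and `L` attributes follow the RISC-V privileged specification, while the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* attribute bits of one 8-bit pmpcfg entry (RISC-V privileged spec) */
#define PMPCFG_R (1u << 0) /* read permission */
#define PMPCFG_W (1u << 1) /* write permission */
#define PMPCFG_X (1u << 2) /* execute permission */
#define PMPCFG_L (1u << 7) /* locked: rules also apply to machine mode */

typedef enum { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH } access_t;

/* Returns true if the access is allowed by the matching region's config.
 * A false result corresponds to a load/store/instruction access fault. */
bool pmp_access_allowed(uint8_t cfg, access_t type, bool machine_mode) {
    if (machine_mode && !(cfg & PMPCFG_L)) {
        return true; /* unlocked regions do not restrict machine-level code */
    }
    switch (type) {
        case ACCESS_LOAD:  return (cfg & PMPCFG_R) != 0;
        case ACCESS_STORE: return (cfg & PMPCFG_W) != 0;
        case ACCESS_FETCH: return (cfg & PMPCFG_X) != 0;
    }
    return false;
}
```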
 
[IMPORTANT]
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.

[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.


==== **`HPM`** Hardware Performance Monitors

In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.
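
How software might assemble one counter value from the two 32-bit CSR words can be sketched as follows. The function name `hpm_counter_value` is hypothetical, and masking the result to `HPM_CNT_WIDTH` bits is an assumption made for illustration; whether the hardware already returns zero for unimplemented upper bits is not stated here.

```c
#include <stdint.h>

/* Combine the high and low CSR words of one HPM counter and
 * keep only the `width` implemented bits (width = 0..64). */
uint64_t hpm_counter_value(uint32_t hi, uint32_t lo, unsigned width) {
    uint64_t raw = ((uint64_t)hi << 32) | lo;
    if (width >= 64) {
        return raw; /* full 64-bit counter, nothing to mask */
    }
    return raw & ((UINT64_C(1) << width) - 1);
}
```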
 
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
the instructions-retired counter increments with each executed instruction. The actual hardware performance
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
available HPMs is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will remove
all HPM logic from the design.

If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
corresponding `mcountinhibit` CSR bits are hardwired to zero.
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
These CSRs are read-only and will always return 0.

Depending on the configuration the following additional CSRs are available:

* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)

[IMPORTANT]
The HPM counter CSRs can only be accessed in machine-mode. Hence, the corresponding `mcounteren` CSR bits
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
exception.

[TIP]
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.

[TIP]
For a list of all HPM-related CSRs and all provided event configurations
see section <<_hardware_performance_monitors_hpm>>.
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Timing

The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.

Average CPI (cycles per instruction) values for "real applications", like executing the CoreMark benchmark, for different CPU
configurations are presented in <<_cpu_performance>>.

.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 5
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Basic bit-manip - logic | `Zbb` | `andn` `orn` `xnor` | 3
| Basic bit-manip - shift | `Zbb` | `clz` `ctz` `cpop` `rol` `ror` `rori` | 4+SA, FAST_SHIFT: 4
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
| Basic bit-manip - misc  | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
|=======================
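
Some of the cycle formulas from the table can be evaluated with a few lines of C. The sketch assumes that `SA/4` denotes an integer division and that `ML` is given in cycles; the function names are hypothetical and only the default (non-`FAST_SHIFT`, non-`TINY_SHIFT`) configuration is covered.

```c
/* Base-ISA shift (slli, srl, ...): 3 + SA/4 + SA%4 cycles,
 * with SA/4 read as integer division (an assumption). */
unsigned shift_cycles_default(unsigned sa) {
    return 3 + sa / 4 + sa % 4;
}

/* Compressed shift (c.slli, c.srli, c.srai): 3 + SA cycles */
unsigned shift_cycles_compressed(unsigned sa) {
    return 3 + sa;
}

/* Conditional branch: 5 + ML cycles if taken, 3 cycles otherwise */
unsigned branch_cycles(int taken, unsigned memory_latency) {
    return taken ? 5 + memory_latency : 3;
}
```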
 
[NOTE]
The presented values of the *floating-point execution cycles* are average values - obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.



// ####################################################################################################################
include::cpu_csr.adoc[]



<<<
// ####################################################################################################################
:sectnums:
==== Full Virtualization

Just like the RISC-V ISA the NEORV32 aims to support _maximum virtualization_ capabilities
on CPU _and_ SoC level. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU
extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs for any expected and unexpected situation (e.g. executing a
malformed instruction word or accessing a not-allocated address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
have to be undone). This allows predictable execution behavior - and thus, defined operations to resolve the cause
of the trap - at any time, improving overall _execution safety_.

**NEORV32-Specific Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
accessed address does not respond or encounters an internal error during access.
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional security feature,
the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions _do raise an illegal instruction trap_ and
_do not commit any operation_ (like writing registers or triggering memory operations).
* To be continued...
 
<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following nomenclature regarding traps is used:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec`
CSR. The cause of the interrupt or exception can be determined via the content of the `mcause`
CSR. The address that reflects the current program counter when a trap was taken is stored to the `mepc` CSR.
Additional information regarding the cause of the trap can be retrieved from the `mtval` CSR.

The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
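
The interrupt arbitration can be modeled in C as a simple priority scan. The priority order (FIRQ 0 to 15 first, then machine external, software and timer interrupt) is taken from the trap listing of this document; the assumption that each interrupt's bit position in the pending/enable mask equals its `mcause` code, as well as the function name, are illustrative only.

```c
#include <stdint.h>

/* Returns the mcause value of the highest-priority interrupt contained in
 * `pending_enabled` (pending AND enabled sources), or 0 if there is none.
 * Interrupt causes always have bit 31 set, so 0 is unambiguous here. */
uint32_t next_interrupt_cause(uint32_t pending_enabled) {
    /* fast interrupt request channels 0..15 (codes 16..31) come first */
    for (unsigned ch = 0; ch < 16; ch++) {
        if (pending_enabled & (UINT32_C(1) << (16 + ch))) {
            return UINT32_C(0x80000000) | (16 + ch);
        }
    }
    /* standard RISC-V machine interrupts: external, software, timer */
    static const unsigned std_order[3] = { 11, 3, 7 };
    for (unsigned i = 0; i < 3; i++) {
        if (pending_enabled & (UINT32_C(1) << std_order[i])) {
            return UINT32_C(0x80000000) | std_order[i];
        }
    }
    return 0;
}
```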
 
.Interrupt Signal Requirements
[IMPORTANT]
All interrupt request signals (including FIRQs) are **high-active**. A request has to stay at high-level (=asserted)
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations - interrupts can only trigger between two instructions.
So if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
a new interrupt handler can start.


:sectnums:
==== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus write-operation at all.


:sectnums:
==== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
provide custom trap codes in `mcause`. These FIRQs are reserved for NEORV32 processor-internal usage only.
 
<<<
// ####################################################################################################################
:sectnums!:
===== NEORV32 Trap Listing

.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (EBREAK) | _PC_ | _PC_
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
|=======================

**Notes**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the corresponding trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt bit and the
exception code (in the form `interrupt.code`) from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to
the `mepc` and `mtval` CSRs when a trap is triggered:

* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself
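
A trap handler typically starts by splitting `mcause` into the interrupt flag (bit 31) and the trap code (lower bits), which correspond to the two numbers of the "[RISC-V]" column. The helpers below are a minimal sketch with hypothetical names; they do not use the `TRAP_CODE_*` definitions from `neorv32.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* bit 31 of mcause distinguishes interrupts (1) from exceptions (0) */
bool mcause_is_interrupt(uint32_t mcause) {
    return (mcause >> 31) != 0;
}

/* the remaining bits hold the interrupt/exception code */
uint32_t mcause_code(uint32_t mcause) {
    return mcause & UINT32_C(0x7FFFFFFF);
}

/* Returns the FIRQ channel (0..15) for a fast interrupt request,
 * or -1 if mcause denotes any other trap. */
int mcause_firq_channel(uint32_t mcause) {
    uint32_t code = mcause_code(mcause);
    if (mcause_is_interrupt(mcause) && code >= 16 && code <= 31) {
        return (int)(code - 16);
    }
    return -1;
}
```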
 
<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface

The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.

:sectnums:
===== Address Space

The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_ - however, a software-based handling can be
implemented as any unaligned memory access will trigger a corresponding exception.

:sectnums:
===== Interface Signals

The following table shows the signals of the data and instruction interfaces seen from the CPU
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).

.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================

[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly".
 
:sectnums:
===== Protocol

A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral either sets the `bus_ack_i` signal (-> successful completion) or the
`bus_err_i` signal (-> failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the corresponding instruction bus
access fault or load/store bus access fault exception.

[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.

.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
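
The possible outcomes of a single bus transaction (acknowledge, error or timeout) can be illustrated with a small cycle-based C model. This is a behavioral sketch, not a description of the bus keeper logic: the names are hypothetical, cycles are counted from 1 (the cycle after the request), and treating a response in the last cycle of the window as still valid is an assumption.

```c
typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* ack_cycle / err_cycle: cycle after the request in which the peripheral
 * raises bus_ack_i / bus_err_i (0 = signal is never raised).
 * max_response_time: response time window in cycles, for example 15. */
bus_result_t bus_transfer(unsigned ack_cycle, unsigned err_cycle,
                          unsigned max_response_time) {
    for (unsigned cycle = 1; cycle <= max_response_time; cycle++) {
        if (err_cycle == cycle) return BUS_ERROR; /* failed completion */
        if (ack_cycle == cycle) return BUS_OK;    /* successful completion */
    }
    return BUS_TIMEOUT; /* no response within the window: bus fault */
}
```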
 
**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================

**Write Access**

For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.

**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).

**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
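
The alignment requirement for data accesses, and one plausible way to derive the byte enable lanes for a write, can be sketched in C. The alignment check follows from the access boundaries described above; the lane mapping in `byte_enable` (bit _i_ selects the byte at address offset _i_) is an assumption for illustration and is not taken from the hardware description. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* size_bytes is 1 (byte), 2 (half-word) or 4 (word).
 * An unaligned access would raise a misaligned exception instead
 * of starting a bus transaction. */
bool access_is_aligned(uint32_t addr, unsigned size_bytes) {
    return (addr & (size_bytes - 1)) == 0;
}

/* Assumed mapping of an aligned write to the 4-bit byte enable signal:
 * bit i enables the byte at offset i within the 32-bit word. */
uint8_t byte_enable(uint32_t addr, unsigned size_bytes) {
    uint8_t lanes = (uint8_t)((1u << size_bytes) - 1); /* 1 -> 0x1, 2 -> 0x3, 4 -> 0xF */
    return (uint8_t)((lanes << (addr & 3u)) & 0xFu);
}
```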
 
**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write-back non-zero and will not generate an actual memory store operation.

The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
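
The life cycle of the CPU-internal exclusive access lock can be captured in a small C state model. It is a sketch under stated assumptions: the names are hypothetical, the non-zero failure value is arbitrarily chosen as 1, and `sc.w` is modeled as clearing the lock because it is itself a memory access other than `lr.w`.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool lock; } excl_state_t;

/* lr.w: set the exclusive access lock */
void excl_lr(excl_state_t *s) { s->lock = true; }

/* any other memory access, any trap or a bus error breaks the lock */
void excl_break(excl_state_t *s) { s->lock = false; }

/* sc.w: returns the value written back to the destination register
 * (0 = success, non-zero = failure) and reports via *do_store whether
 * a memory store operation is actually generated. */
uint32_t excl_sc(excl_state_t *s, bool *do_store) {
    bool ok = s->lock;
    s->lock = false; /* the reservation is consumed either way */
    *do_store = ok;
    return ok ? 0u : 1u;
}
```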
 
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.

**Memory Barriers**

Whenever the CPU executes a fence instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).
 
1036
 
1037
 
1038
<<<
1039
// ####################################################################################################################
1040
:sectnums:
1041
==== CPU Hardware Reset
1042
 
1043
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
1044
registers of the NEORV32 CPU as well as most register of the whole NEORV32 Processor do not use **a
1045
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
1046
after power-up is not relevant for a defined CPU boot process.
1047
 
1048
**Rational**
1049
 
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline, the status register might trigger a
write-back of the processing result to some kind of memory. The initial state of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
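
The pipeline example can be sketched in VHDL as follows. This is a simplified illustration under stated assumptions, not code taken from the NEORV32 sources: the entity name `pipe_stage`, the generic `N` and all port names are hypothetical, and a low-active asynchronous reset is assumed.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single pipeline stage (illustration only).
entity pipe_stage is
  generic (
    N : natural := 32 -- width of the data register
  );
  port (
    clk_i   : in  std_ulogic; -- clock, rising edge
    rstn_i  : in  std_ulogic; -- asynchronous reset, low-active (assumed)
    valid_i : in  std_ulogic; -- status from the previous stage
    data_i  : in  std_ulogic_vector(N-1 downto 0);
    valid_o : out std_ulogic; -- status to the next stage
    data_o  : out std_ulogic_vector(N-1 downto 0)
  );
end entity pipe_stage;

architecture rtl of pipe_stage is
begin

  -- status register: "critical", so it gets a defined reset value
  status_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      valid_o <= '0'; -- no valid data in this stage after reset
    elsif rising_edge(clk_i) then
      valid_o <= valid_i;
    end if;
  end process status_reg;

  -- data register: "uncritical", so it has no reset at all
  data_reg: process(clk_i)
  begin
    if rising_edge(clk_i) then
      data_o <= data_i;
    end if;
  end process data_reg;

end architecture rtl;
----

After reset, `data_o` holds an arbitrary value, but this cannot affect the result because `valid_o` is low until real data has been clocked into the stage.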
 
**NEORV32 CPU Reset**
 
In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even control
and status registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers get initialized by the CPU's internal state machines, which are in turn initialized by the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by software (to be more specific, by the `crt0.S` start-up code).
 
During the very early boot process (where `crt0.S` is running), the lack of dedicated hardware resets for
certain CSRs cannot cause undefined behavior. For example, the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The reset value of this register is uncritical because interrupts cannot fire:
the global interrupt enable flag in the status register (`mstatus(mie)`) does provide a dedicated
hardware reset that sets it low (globally disabling interrupts).
 
**Reset Configuration**
 
Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used as the reset value of the uncritical registers, effectively generating flip-flops without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):
 
[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers --
-- FALSE = reset value is irrelevant (might simplify HW), default
-- TRUE  = defined LOW reset value
constant dedicated_reset_c : boolean := false;
----
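
How such a constant might steer the reset value of an uncritical register is sketched below. This is an illustration only and not the actual NEORV32 implementation: the entity `reset_cfg_demo`, the generic `DEDICATED_RESET` (a stand-in for `dedicated_reset_c`), the helper function `reset_value_f` and the constant `rst_val_c` are all hypothetical names.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity (illustration only).
entity reset_cfg_demo is
  generic (
    DEDICATED_RESET : boolean := false -- stand-in for dedicated_reset_c
  );
  port (
    clk_i  : in  std_ulogic; -- clock, rising edge
    rstn_i : in  std_ulogic; -- asynchronous reset, low-active (assumed)
    d_i    : in  std_ulogic_vector(31 downto 0);
    q_o    : out std_ulogic_vector(31 downto 0)
  );
end entity reset_cfg_demo;

architecture rtl of reset_cfg_demo is

  -- select the reset value: defined low or "don't care"
  function reset_value_f(dedicated : boolean) return std_ulogic is
  begin
    if dedicated then
      return '0'; -- defined LOW reset value
    else
      return '-'; -- don't care: synthesis may drop the reset
    end if;
  end function reset_value_f;

  constant rst_val_c : std_ulogic := reset_value_f(DEDICATED_RESET);

begin

  -- uncritical register: the reset branch exists in the code, but with
  -- '-' as reset value the tools are free to build a reset-less flip-flop
  reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      q_o <= (others => rst_val_c);
    elsif rising_edge(clk_i) then
      q_o <= d_i;
    end if;
  end process reg;

end architecture rtl;
----

With `DEDICATED_RESET` set to `true`, the same code yields a register that is cleared to zero on reset, which matches the more deterministic behavior described above for gate-level and timing simulations.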
