OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [docs/] [datasheet/] [cpu.adoc] - Blame information for rev 71

Line No. Rev Author Line
1 60 zero_gravi
:sectnums:
2
== NEORV32 Central Processing Unit (CPU)
3
 
4
image::riscv_logo.png[width=350,align=center]
5
 
6
**Key Features**
7
 
8 66 zero_gravi
* 32-bit multi-cycle in-order `rv32` RISC-V CPU
9 61 zero_gravi
* Optional RISC-V extensions:
10
** `A` - atomic memory access operations
11 66 zero_gravi
** `B` - bit-manipulation instructions
12 61 zero_gravi
** `C` - 16-bit compressed instructions
13
** `I` - integer base ISA (always enabled)
14
** `E` - embedded CPU version (reduced register file size)
15
** `M` - integer multiplication and division hardware
16
** `U` - less-privileged _user_ mode
17
** `Zfinx` - single-precision floating-point unit
18
** `Zicsr` - control and status register access (privileged architecture)
19 66 zero_gravi
** `Zicntr` - CPU base counters
20
** `Zihpm` - hardware performance monitors
21 61 zero_gravi
** `Zifencei` - instruction stream synchronization
22
** `Zmmul` - integer multiplication hardware
23
** `PMP` - physical memory protection
24 66 zero_gravi
** `Debug` - debug mode
25 65 zero_gravi
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
26 60 zero_gravi
* Official RISC-V open-source architecture ID
27 65 zero_gravi
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
28 66 zero_gravi
* Supports _all_ of the machine-level traps from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
29
** This is a key aspect of _execution safety_ (see <<_full_virtualization>>)
30 60 zero_gravi
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
31
* Optional hardware performance monitors (HPM) for application benchmarking
32 66 zero_gravi
* Separate interfaces for instruction fetch and data access (merged into a single processor bus)
33 60 zero_gravi
* Little-endian byte order
34
* Configurable hardware reset
35 65 zero_gravi
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
36 60 zero_gravi
 
37
[NOTE]
38
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
39
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
40
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
41
setup also lets you keep using the default bootloader and software framework. From this base you
42 70 zero_gravi
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.
43 60 zero_gravi
 
44
[NOTE]
45
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.
46
 
47
<<<
48
// ####################################################################################################################
49
:sectnums:
50
=== Architecture
51
 
52
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
53
specifications. The following figure shows the simplified architecture of the CPU.
54
 
55
image::neorv32_cpu.png[align=center]
56
 
57 66 zero_gravi
The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
58
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
59
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
60
front-end can already fetch new instructions while the back-end is still processing previously-fetched instructions.
61 60 zero_gravi
 
62 66 zero_gravi
The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
63
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
64
data is stored to a FIFO queue - the instruction prefetch buffer.
65 60 zero_gravi
 
66 66 zero_gravi
The back-end is responsible for the actual execution of the instruction. It includes an "issue engine",
67
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
68
instruction or decompressed 16-bit instructions) for execution.
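The way an issue engine can carve instruction words out of the fetched 32-bit chunks may be sketched in Python as follows. This is a behavioural illustration only, not the NEORV32 RTL: the function names are hypothetical, and the sketch stops at splitting the stream (the real back-end additionally decompresses 16-bit instructions into 32-bit words). It relies on the RISC-V rule that an instruction whose two lowest bits are not `11` is a 16-bit compressed instruction.

```python
def fetch_words_to_halfwords(words):
    """Split 32-bit fetch words into little-endian 16-bit half-words."""
    halfwords = []
    for word in words:
        halfwords.append(word & 0xFFFF)          # lower half-word comes first
        halfwords.append((word >> 16) & 0xFFFF)  # then the upper half-word
    return halfwords


def split_instructions(halfwords):
    """Group half-words into ("c16", value) and ("i32", value) instructions."""
    instructions = []
    i = 0
    while i < len(halfwords):
        low = halfwords[i]
        if low & 0b11 != 0b11:
            # Lowest two bits != 11: 16-bit compressed instruction.
            instructions.append(("c16", low))
            i += 1
        elif i + 1 < len(halfwords):
            # 32-bit instruction, possibly straddling two fetch words.
            instructions.append(("i32", (halfwords[i + 1] << 16) | low))
            i += 2
        else:
            break  # upper half-word has not been fetched yet
    return instructions


# c.nop, then an unaligned 32-bit nop (addi x0, x0, 0), then c.nop again
stream = fetch_words_to_halfwords([0x00130001, 0x00010000])
print(split_instructions(stream))
```

In this example the 32-bit `nop` starts at a half-word boundary, so its two halves arrive in different fetch words, which is the "mixture" case described above.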
69
 
70
Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
71
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
72
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
73
when the CPU front-end has to reload the prefetch buffer due to a taken branch.
74
 
75 60 zero_gravi
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
76
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
77 66 zero_gravi
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
78
these two classical design paradigms allows a higher instruction throughput than a pure multi-cycle
79
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
80
multi-cycle concept).
81 60 zero_gravi
 
82 66 zero_gravi
As a Von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access.
83
These two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
84
have higher priority). Hence, ALL memory locations including peripheral devices are mapped to a single unified 32-bit
85
address space.
86 60 zero_gravi
 
87
 
88
// ####################################################################################################################
89
:sectnums:
90 66 zero_gravi
=== Full Virtualization
91
 
92
Just like the RISC-V ISA the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU _and_ SoC level to
93
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.
94
footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
95
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
96
malformed instruction word or accessing an unallocated memory address). For any kind of trap the core is always in a
97
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
98
might have to be reverted). This allows predictable execution behavior at any time, improving overall _execution safety_.
99
 
100
**Execution Safety - NEORV32 Virtualization Features**
101
 
102
* Because all memory accesses are acknowledged, the CPU is _always_ in sync with the memory system
103
(i.e. there is no speculative execution / no out-of-order states).
104
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
105
accessed address does not respond or encounters an internal error during access.
106
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
107
window. Otherwise a bus access exception is raised.
108
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
109
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions do raise an
110
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
111
memory operations).
112
* To be continued...
113
 
114
 
115
// ####################################################################################################################
116
:sectnums:
117 60 zero_gravi
=== RISC-V Compatibility
118
 
119
The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
120
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
121 62 zero_gravi
NEORV32 processor are located in the repository's `sw/isa-test` folder.
122
 
123
[NOTE]
124
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
125 60 zero_gravi
for information on how to run the tests on the NEORV32.
126
 
127
.**RISC-V `rv32_m/C` Tests**
128
...................................
129
Check cadd-01           ... OK
130
Check caddi-01          ... OK
131
Check caddi16sp-01      ... OK
132
Check caddi4spn-01      ... OK
133
Check cand-01           ... OK
134
Check candi-01          ... OK
135
Check cbeqz-01          ... OK
136
Check cbnez-01          ... OK
137
Check cebreak-01        ... OK
138
Check cj-01             ... OK
139
Check cjal-01           ... OK
140
Check cjalr-01          ... OK
141
Check cjr-01            ... OK
142
Check cli-01            ... OK
143
Check clui-01           ... OK
144
Check clw-01            ... OK
145
Check clwsp-01          ... OK
146
Check cmv-01            ... OK
147
Check cnop-01           ... OK
148
Check cor-01            ... OK
149
Check cslli-01          ... OK
150
Check csrai-01          ... OK
151
Check csrli-01          ... OK
152
Check csub-01           ... OK
153
Check csw-01            ... OK
154
Check cswsp-01          ... OK
155
Check cxor-01           ... OK
156
--------------------------------
157
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
158
...................................
159
 
160
.**RISC-V `rv32_m/I` Tests**
161
...................................
162
Check add-01            ... OK
163
Check addi-01           ... OK
164
Check and-01            ... OK
165
Check andi-01           ... OK
166
Check auipc-01          ... OK
167
Check beq-01            ... OK
168
Check bge-01            ... OK
169
Check bgeu-01           ... OK
170
Check blt-01            ... OK
171
Check bltu-01           ... OK
172
Check bne-01            ... OK
173
Check fence-01          ... OK
174
Check jal-01            ... OK
175
Check jalr-01           ... OK
176
Check lb-align-01       ... OK
177
Check lbu-align-01      ... OK
178
Check lh-align-01       ... OK
179
Check lhu-align-01      ... OK
180
Check lui-01            ... OK
181
Check lw-align-01       ... OK
182
Check or-01             ... OK
183
Check ori-01            ... OK
184
Check sb-align-01       ... OK
185
Check sh-align-01       ... OK
186
Check sll-01            ... OK
187
Check slli-01           ... OK
188
Check slt-01            ... OK
189
Check slti-01           ... OK
190
Check sltiu-01          ... OK
191
Check sltu-01           ... OK
192
Check sra-01            ... OK
193
Check srai-01           ... OK
194
Check srl-01            ... OK
195
Check srli-01           ... OK
196
Check sub-01            ... OK
197
Check sw-align-01       ... OK
198
Check xor-01            ... OK
199
Check xori-01           ... OK
200 70 zero_gravi
Check fence-01          ... OK
201 60 zero_gravi
--------------------------------
202 70 zero_gravi
OK: 39/39 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
203 60 zero_gravi
...................................
204
 
205
.**RISC-V `rv32_m/M` Tests**
206
...................................
207
Check div-01            ... OK
208
Check divu-01           ... OK
209
Check mul-01            ... OK
210
Check mulh-01           ... OK
211
Check mulhsu-01         ... OK
212
Check mulhu-01          ... OK
213
Check rem-01            ... OK
214
Check remu-01           ... OK
215
--------------------------------
216
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
217
...................................
218
 
219
.**RISC-V `rv32_m/privilege` Tests**
220
...................................
221
Check ebreak            ... OK
222
Check ecall             ... OK
223
Check misalign-beq-01   ... OK
224
Check misalign-bge-01   ... OK
225
Check misalign-bgeu-01  ... OK
226
Check misalign-blt-01   ... OK
227
Check misalign-bltu-01  ... OK
228
Check misalign-bne-01   ... OK
229
Check misalign-jal-01   ... OK
230
Check misalign-lh-01    ... OK
231
Check misalign-lhu-01   ... OK
232
Check misalign-lw-01    ... OK
233
Check misalign-sh-01    ... OK
234
Check misalign-sw-01    ... OK
235
Check misalign1-jalr-01 ... OK
236
Check misalign2-jalr-01 ... OK
237
--------------------------------
238
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
239
...................................
240
 
241
.**RISC-V `rv32_m/Zifencei` Tests**
242
...................................
243
Check Fencei            ... OK
244
--------------------------------
245
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
246
...................................
247
 
248
 
249
<<<
250
:sectnums:
251
==== RISC-V Incompatibility Issues and Limitations
252
 
253 64 zero_gravi
This list shows the currently identified issues regarding full RISC-V compatibility. More specific information
254 60 zero_gravi
can be found in section <<_instruction_sets_and_extensions>>.
255
 
256 69 zero_gravi
.Read-Only "Read-Write" CSRs
257 60 zero_gravi
[IMPORTANT]
258 69 zero_gravi
The `misa` and `mtval` CSRs in the NEORV32 are _read-only_.
259
Any machine-mode write access to them is ignored and will _not_ cause any exceptions or side-effects to maintain
260
RISC-V compatibility.
261 60 zero_gravi
 
262 69 zero_gravi
.Physical Memory Protection
263 60 zero_gravi
[IMPORTANT]
264 70 zero_gravi
The physical memory protection (see section <<_machine_physical_memory_protection_csrs>>)
265 60 zero_gravi
currently supports only the modes _OFF_ and _NAPOT_, with a minimum granularity of 8 bytes per region.
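As an illustration of the _NAPOT_ (naturally aligned power-of-two) mode, the following Python sketch converts between a region's base/size and the value written to a `pmpaddr` CSR, following the encoding in the RISC-V privileged specification (address bits shifted right by two, region size encoded by the number of trailing one-bits). The helper names are hypothetical and the sketch does not model any NEORV32-specific behavior. Note that 8 bytes is also the smallest region NAPOT can express.

```python
def encode_napot(base, size):
    """Return the pmpaddr value for a NAPOT region (size is a power of two >= 8)."""
    if size < 8 or size & (size - 1):
        raise ValueError("size must be a power of two and at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to the region size")
    # Address bits [..:2] plus (log2(size) - 3) trailing one-bits.
    return (base >> 2) | ((size >> 3) - 1)


def decode_napot(pmpaddr):
    """Return (base, size) in bytes for a NAPOT-encoded RV32 pmpaddr value."""
    pmpaddr &= 0xFFFFFFFF
    trailing_ones = 0
    while trailing_ones < 32 and (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    # Clear the trailing ones and the zero bit that terminates them.
    base = (pmpaddr & ~((1 << (trailing_ones + 1)) - 1)) << 2
    return base, size


value = encode_napot(0x80000000, 64 * 1024)   # 64 KiB region
print(hex(value), [hex(v) for v in decode_napot(value)])
```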
266
 
267 69 zero_gravi
.Atomic Memory Operations
268 60 zero_gravi
[IMPORTANT]
269 64 zero_gravi
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
270
However, these instructions are sufficient to emulate all further atomic memory operations.
271 60 zero_gravi
 
272 66 zero_gravi
 
273 60 zero_gravi
<<<
274
// ####################################################################################################################
275
:sectnums:
276
=== CPU Top Entity - Signals
277
 
278
The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
279
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
280
direction seen from the CPU.
281
 
282
.NEORV32 CPU top entity signals
283
[cols="<2,^1,^1,<6"]
284
[options="header", grid="rows"]
285
|=======================
286
| Signal           | Width | Dir.   | Function
287
4+^| **Global Signals**
288
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
289
| `rstn_i`         |     1 | in  | global reset, low-active
290
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
291 69 zero_gravi
| `debug_o`        |     1 | out | CPU is in debug mode when set
292 60 zero_gravi
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
293
| `i_bus_addr_o`   |    32 | out | destination address
294
| `i_bus_rdata_i`  |    32 | in  | read data
295
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
296
| `i_bus_ben_o`    |     4 | out | byte enable
297
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
298
| `i_bus_re_o`     |     1 | out | read transaction
299
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
300
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
301
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
302
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
303
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
304
4+^| **Data Bus Interface (<<_bus_interface>>)**
305
| `d_bus_addr_o`   |    32 | out | destination address
306
| `d_bus_rdata_i`  |    32 | in  | read data
307
| `d_bus_wdata_o`  |    32 | out | write data
308
| `d_bus_ben_o`    |     4 | out | byte enable
309
| `d_bus_we_o`     |     1 | out | write transaction
310
| `d_bus_re_o`     |     1 | out | read transaction
311
| `d_bus_lock_o`   |     1 | out | exclusive access request
312
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
313
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
314
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
315
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
316
4+^| **System Time (see <<_timeh>> CSR)**
317
| `time_i`         |    64 | in  | system time input (from MTIME)
318
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
319
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
320
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
321
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
322
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
323
| `firq_i`         |    16 | in  | fast interrupt request signals
324
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
325
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
326
|=======================
327
 
328
<<<
329
// ####################################################################################################################
330
:sectnums:
331
=== CPU Top Entity - Generics
332
 
333
Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
334
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
335
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user defined configuration.
336
The _specific_ generics are listed below.
337
 
338
[cols="4,4,2"]
339
[frame="all",grid="none"]
340
|======
341
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
342
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
343
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
344 61 zero_gravi
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
345 60 zero_gravi
|======
346
 
347
[cols="4,4,2"]
348
[frame="all",grid="none"]
349
|======
350
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
351
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
352
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
353
|======
354
 
355
[cols="4,4,2"]
356
[frame="all",grid="none"]
357
|======
358
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
359
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
360
|======
361
 
362
 
363
<<<
364
// ####################################################################################################################
365
:sectnums:
366
=== Instruction Sets and Extensions
367
 
368 65 zero_gravi
The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
369 60 zero_gravi
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
370 65 zero_gravi
see the _RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
371 60 zero_gravi
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.
372
 
373
[TIP]
374 63 zero_gravi
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
375 64 zero_gravi
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
376 63 zero_gravi
or by executing an instruction and checking for an _illegal instruction exception_.
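Decoding a value read from `misa` can be sketched as follows. The bit layout used here is the one from the RISC-V privileged specification (bit 0 = `A` ... bit 25 = `Z`, `MXL` in the two uppermost bits on RV32); the function name and the example value are made up for illustration and are not taken from a real NEORV32 configuration. Sub-extensions such as `Zicsr` or `Zfinx` have no `misa` bit, so they have to be discovered through one of the other mechanisms mentioned above.

```python
def decode_misa(misa):
    """Return (xlen, [extension letters]) for an RV32 misa value."""
    mxl = (misa >> 30) & 0b11                       # MXL field, bits 31:30
    xlen = {1: 32, 2: 64, 3: 128}.get(mxl)          # None if MXL reads as 0
    letters = [chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1]
    return xlen, letters


# Hypothetical value: MXL = 1 (32-bit) with C, I, M, U and X set
print(decode_misa(0x40901104))
```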
377 60 zero_gravi
 
378 63 zero_gravi
[NOTE]
379 65 zero_gravi
Executing an instruction from an extension that is not supported yet or that is currently not enabled
380
(via the according top entity generic) will raise an _illegal instruction_ exception.
381 60 zero_gravi
 
382 63 zero_gravi
 
383 60 zero_gravi
==== **`A`** - Atomic Memory Access
384
 
385 65 zero_gravi
Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
386
The RISC-V specs. define a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
387
ISA extension is enabled if the `CPU_EXTENSION_RISCV_A` configuration generic is _true_.
388
In this case the following additional instructions are available:
389 60 zero_gravi
 
390
* `lr.w`: load-reserved
391
* `sc.w`: store-conditional
392
 
393
[NOTE]
394
Even though only `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
395
(read-modify-write instructions) can be emulated using these two instructions. Furthermore, the
396 65 zero_gravi
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
397
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.
398 60 zero_gravi
 
399 65 zero_gravi
The *load-reserved* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
400
_data memory access lock_. A *store-conditional* behaves as a "normal" store-word instruction (`sw`) that will
401
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
402
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
403
After the execution of the `sc.w` instruction, the lock is automatically removed.
404
 
405
The lock is broken if at least one of the following conditions occurs:
406
. executing any data memory access instruction other than `lr.w`
407
. raising _any_ trap (for example an interrupt or a memory access exception)
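The lock rules above, and the way a read-modify-write operation can be emulated with them, can be sketched as a small Python model. This is a behavioural illustration under the assumptions stated in this section only; the class, its methods and the `interfere` hook are hypothetical and do not correspond to any NEORV32 software API.

```python
class LrScModel:
    """Toy model of the CPU-internal data memory access lock."""

    def __init__(self):
        self.mem = {}       # word-addressed toy memory
        self.lock = False   # the "data memory access lock"

    def lr_w(self, addr):
        self.lock = True                    # load-reserved sets the lock
        return self.mem.get(addr, 0)

    def sc_w(self, addr, value):
        ok = self.lock
        if ok:                              # write only if the lock is intact
            self.mem[addr] = value & 0xFFFFFFFF
        self.lock = False                   # lock is always removed afterwards
        return 0 if ok else 1               # zero signals success

    def lw(self, addr):
        self.lock = False                   # any other data access breaks the lock
        return self.mem.get(addr, 0)

    def sw(self, addr, value):
        self.lock = False
        self.mem[addr] = value & 0xFFFFFFFF

    def trap(self):
        self.lock = False                   # any trap breaks the lock


def atomic_add(cpu, addr, increment, interfere=None):
    """Emulate an atomic add with an lr.w/sc.w retry loop."""
    attempts = 0
    while True:
        attempts += 1
        old = cpu.lr_w(addr)
        if interfere and attempts == 1:
            interfere(cpu)                  # e.g. an interrupt between lr.w and sc.w
        if cpu.sc_w(addr, old + increment) == 0:
            return old, attempts


cpu = LrScModel()
cpu.sw(0x100, 5)
print(atomic_add(cpu, 0x100, 3))                                  # undisturbed
print(atomic_add(cpu, 0x100, 1, interfere=lambda c: c.trap()))    # one retry
```

The second call shows why the loop is needed: the simulated trap breaks the lock, `sc.w` reports failure without writing, and the sequence is simply repeated.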
408
 
409 60 zero_gravi
[NOTE]
410
The atomic instructions have special requirements for memory system / bus interconnect. More
411
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
412
 
413
 
414 66 zero_gravi
==== **`B`** - Bit-Manipulation Operations
415
 
416
The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
417
`CPU_EXTENSION_RISCV_B` configuration generic is _true_.
418
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
419 71 zero_gravi
A copy of the spec is also available in `docs/references`.
420 66 zero_gravi
 
421 71 zero_gravi
The NEORV32 `B` ISA extension includes the following sub-extensions (according to the RISC-V
422
bit-manipulation spec. v0.93) and their corresponding instructions:
423 66 zero_gravi
 
424 71 zero_gravi
* **`Zba` - Address-generation instructions**
425
** `sh1add` `sh2add` `sh3add`
426
* **`Zbb` - Basic bit-manipulation instructions**
427
** `andn` `orn` `xnor`
428
** `clz` `ctz` `cpop`
429
** `max` `maxu` `min` `minu`
430
** `sext.b` `sext.h` `zext.h`
431
** `rol` `ror` `rori`
432
** `orc.b` `rev8`
433
* **`Zbc` - Carry-less multiplication instructions**
434
** `clmul` `clmulh` `clmulr`
435
* **`Zbs` - Single-bit instructions**
436
** `bclr` `bclri`
437
** `bext` `bexti`
438
** `binv` `binvi`
439
** `bset` `bseti`
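For orientation, the 32-bit semantics of a few of the listed instructions can be written down as plain Python functions. These follow the instruction definitions in the RISC-V bit-manipulation specification; they are reference models for illustration (for example when cross-checking results), not the NEORV32 intrinsic library, and the function names merely mirror the mnemonics.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def sh2add(rs1, rs2):            # Zba: shift left by 2 and add
    return ((rs1 << 2) + rs2) & MASK


def andn(rs1, rs2):              # Zbb: AND with inverted second operand
    return rs1 & ~rs2 & MASK


def clz(rs1):                    # Zbb: count leading zeros
    return 32 - (rs1 & MASK).bit_length()


def ctz(rs1):                    # Zbb: count trailing zeros
    rs1 &= MASK
    return 32 if rs1 == 0 else (rs1 & -rs1).bit_length() - 1


def cpop(rs1):                   # Zbb: population count
    return bin(rs1 & MASK).count("1")


def rol(rs1, rs2):               # Zbb: rotate left by rs2[4:0]
    rs1 &= MASK
    shamt = rs2 & 31
    return ((rs1 << shamt) | (rs1 >> (32 - shamt))) & MASK


def bext(rs1, rs2):              # Zbs: extract single bit
    return (rs1 >> (rs2 & 31)) & 1


def binv(rs1, rs2):              # Zbs: invert single bit
    return (rs1 ^ (1 << (rs2 & 31))) & MASK


print(clz(1), ctz(0x80000000), cpop(0xF0F0), hex(rol(0x80000001, 1)))
```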
440 66 zero_gravi
 
441
[TIP]
442
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
443
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
444
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
445
shift-related `B` instructions.
446
 
447
[WARNING]
448 71 zero_gravi
The `B` extension is frozen and officially ratified. However, there is no
449
software support for this extension in the upstream GCC RISC-V port yet. An
450 66 zero_gravi
intrinsic library is provided to utilize the provided `B` extension features from C-language
451 71 zero_gravi
code (see `sw/example/bitmanip_test`) to circumvent this.
452 66 zero_gravi
 
453
 
454 60 zero_gravi
==== **`C`** - Compressed Instructions
455
 
456 65 zero_gravi
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
457
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
458
In this case the following instructions are available:
459 60 zero_gravi
 
460 70 zero_gravi
* `c.addi4spn` `c.lw` `c.sw` `c.nop` `c.addi` `c.jal` `c.li` `c.addi16sp` `c.lui` `c.srli` `c.srai` `c.andi` `c.sub`
461
`c.xor` `c.or` `c.and` `c.j` `c.beqz` `c.bnez` `c.slli` `c.lwsp` `c.jr` `c.mv` `c.ebreak` `c.jalr` `c.add` `c.swsp`
462 60 zero_gravi
 
463
[NOTE]
464 65 zero_gravi
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
465
an additional instruction fetch to load the according second half-word of that instruction. The performance can be increased
466 60 zero_gravi
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
467
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
468
 
469
 
470
==== **`E`** - Embedded CPU
471
 
472 65 zero_gravi
The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
473
decrease physical hardware requirements (for example block RAM). This extension is enabled when the `CPU_EXTENSION_RISCV_E`
474
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
475
This extension does not add any additional instructions or features.
476 60 zero_gravi
 
477 70 zero_gravi
[NOTE]
478 63 zero_gravi
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
479 60 zero_gravi
 
480
 
481
==== **`I`** - Base Integer ISA
482 65 zero_gravi
 
483 60 zero_gravi
The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
484
regardless of the setting of the remaining extensions. The base instruction set includes the following
485
instructions:
486
 
487 70 zero_gravi
* immediate: `lui` `auipc`
488
* jumps: `jal` `jalr`
489
* branches: `beq` `bne` `blt` `bge` `bltu` `bgeu`
490
* memory: `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw`
491
* alu: `addi` `slti` `sltiu` `xori` `ori` `andi` `slli` `srli` `srai` `add` `sub` `sll` `slt` `sltu` `xor` `srl` `sra` `or` `and`
492
* environment: `ecall` `ebreak` `fence`
493 60 zero_gravi
 
494
[NOTE]
495 70 zero_gravi
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
496 61 zero_gravi
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
497 70 zero_gravi
completely in parallel by a fast (but large) barrel shifter if the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
498 62 zero_gravi
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
499 60 zero_gravi
 
500
[NOTE]
501
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
502 70 zero_gravi
top's `d_bus_fence_o` signal high for one cycle to inform the memory system a `fence` instruction has been
503 60 zero_gravi
executed. Any flags within the `fence` instruction word are ignored by the hardware.
504
 
505
 
506
==== **`M`** - Integer Multiplication and Division
507
 
508 65 zero_gravi
Hardware-accelerated integer multiplication and division operations are available when the
509 60 zero_gravi
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
510
available:
511
 
512 70 zero_gravi
* multiplication: `mul` `mulh` `mulhsu` `mulhu`
513
* division: `div` `divu` `rem` `remu`
514 60 zero_gravi
 
515
[NOTE]
516
By default, multiplication and division operations are executed in a bit-serial approach.
517
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
518
generic is _true_ allowing faster execution. Multiplications and divisions
519
always require a fixed number of cycles to complete - regardless of the input operands.
520
 
521
 
522 61 zero_gravi
==== **`Zmmul`** - Integer Multiplication
523
 
524
This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
525 65 zero_gravi
of the `M` extension and is intended for size-constrained setups that require hardware-based
526 61 zero_gravi
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
527 65 zero_gravi
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.
528 61 zero_gravi
 
529 70 zero_gravi
* multiplication: `mul` `mulh` `mulhsu` `mulhu`
530 61 zero_gravi
 
531 63 zero_gravi
If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
532
will raise an _illegal instruction exception_.
533 61 zero_gravi
 
534 63 zero_gravi
Note that `M` and `Zmmul` extensions _cannot_ be enabled at the same time.
535 61 zero_gravi
 
536
[TIP]
537
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
538
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
539 65 zero_gravi
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
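To give an idea of what "computed entirely in software" means for divisions under `Zmmul`, here is a minimal shift-and-subtract (restoring) division for unsigned 32-bit operands. It is a generic textbook sketch with a hypothetical function name, not the routine the toolchain's runtime library actually links in, and it rejects a zero divisor instead of reproducing the RISC-V result for division by zero.

```python
def udiv32(dividend, divisor):
    """Return (quotient, remainder) for unsigned 32-bit operands."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero in this sketch")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Shift in the next dividend bit, most significant bit first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor      # subtraction succeeds: quotient bit is 1
            quotient |= 1 << bit
    return quotient, remainder


print(udiv32(100, 7))
```

One loop iteration per result bit is the reason a software division takes noticeably longer than a hardware multiplication, which is the trade-off `Zmmul` makes in exchange for the smaller footprint.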
540 61 zero_gravi
 
541
 
542 60 zero_gravi
==== **`U`** - Less-Privileged User Mode
543
 
544 65 zero_gravi
In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second less-privileged
545
operation mode. It is implemented if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_.
546
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
547
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
548
Any kind of privilege rights violation will raise an exception to allow full virtualization.
549 60 zero_gravi
 
550
 
551
==== **`X`** - NEORV32-Specific (Custom) Extensions
552
 
553
The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.
554
 
555 63 zero_gravi
The most important points of the NEORV32-specific extensions are:
556
* The CPU provides 16 _fast interrupt_ requests (`FIRQ`), which are controlled via custom bits in the `mie`
557 69 zero_gravi
and `mip` CSRs. This extension is mapped to CSR bits that are available for custom use (according to the
558 60 zero_gravi
RISC-V specs). Also, custom trap codes for `mcause` are implemented.
559 63 zero_gravi
* All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see <<_full_virtualization>>).


==== **`Zfinx`** Single-Precision Floating-Point Operations

The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
fewer hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
register file-related load/store or move instructions.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

[NOTE]
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
to the `F` extension. The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_. In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w` `fcvt.s.wu` `fcvt.w.s` `fcvt.wu.s`
* comparison: `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s`
* computational: `fadd.s` `fsub.s` `fmul.s`
* sign-injection: `fsgnj.s` `fsgnjn.s` `fsgnjx.s`
* number classification: `fclass.s`

* additional CSRs: `fcsr` `frm` `fflags`

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_, setting them to +/- 0 before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
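
The flush-to-zero behavior described above can be modeled on the raw IEEE-754 single-precision bit pattern.
The following C sketch is only an illustration of that rule; the function name is hypothetical and the code is not
taken from the NEORV32 sources.

[source,c]
----
#include <stdint.h>

/* Hypothetical model: replace any pattern with exponent = 0 by a zero of the same sign. */
uint32_t flush_subnormal_to_zero(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu; /* 8-bit biased exponent field */
    if (exponent == 0u) {
        return bits & 0x80000000u;            /* keep only the sign bit: +0 or -0 */
    }
    return bits;                              /* normal numbers, infinity and NaN pass through */
}
----

A positive subnormal such as `0x00000001` becomes `0x00000000` (+0), a negative one such as `0x80000001` becomes
`0x80000000` (-0).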

[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).


==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
are implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.

[IMPORTANT]
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
In order to provide the full set of privileged functions that are required to run more complex tasks like
an operating system and to allow a secure execution environment, the `Zicsr` extension should always be enabled.

If the `Zicsr` extension is enabled, the following instructions are available:

* CSR access: `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci`
* environment: `mret` `wfi`
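
The three CSR access flavors differ only in how the operand is merged into the CSR. The following C sketch models
this read-modify-write behavior as defined by the RISC-V `Zicsr` specification. The type and function names are
hypothetical, and side effects of real CSRs (read-only bits, access privileges) are deliberately ignored.

[source,c]
----
#include <stdint.h>

/* Hypothetical software model of the CSR access operations. */
typedef enum { CSR_RW, CSR_RS, CSR_RC } csr_op_t;

/* Applies the operation to *csr and returns the old CSR value (the value written to rd). */
uint32_t csr_access(uint32_t *csr, csr_op_t op, uint32_t operand) {
    uint32_t old = *csr;
    switch (op) {
        case CSR_RW: *csr = operand;        break; /* csrrw[i]: replace the CSR content */
        case CSR_RS: *csr = old | operand;  break; /* csrrs[i]: set the bits given in operand */
        case CSR_RC: *csr = old & ~operand; break; /* csrrc[i]: clear the bits given in operand */
    }
    return old;
}
----

For the immediate variants (`csrrwi`, `csrrsi`, `csrrci`) the operand is the zero-extended 5-bit immediate instead of a register value.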

[NOTE]
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.
However, access privileges are still enforced so these instruction variants _do_ cause side-effects
(the RISC-V spec. states that these combinations "_shall_ not cause any side-effects").

[NOTE]
The "wait for interrupt" instruction `wfi` acts like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
`TW` (timeout wait) is _hardwired_ to zero.
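
The wake-up condition described above can be written as a small predicate. This is a sketch under the assumption
that the global interrupt enable flag is the standard RISC-V `mstatus.MIE` bit (bit 3); the function name is
hypothetical.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_MIE_BIT 3u /* global machine interrupt enable (RISC-V privileged spec) */

/* Returns true if a CPU sleeping in wfi would wake up for the given CSR values. */
bool wfi_wakes_up(uint32_t mstatus, uint32_t mie, uint32_t mip) {
    bool global_enable   = ((mstatus >> MSTATUS_MIE_BIT) & 1u) != 0u;
    bool enabled_pending = (mie & mip) != 0u; /* at least one pending source is enabled */
    return global_enable && enabled_pending;
}
----

A pending interrupt whose `mie` bit is cleared, or any pending interrupt while the global enable flag is cleared, does not end the sleep state in this model.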



==== **`Zicntr`** CPU Base Counters

The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.

[NOTE]
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.

If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.



==== **`Zihpm`** Hardware Performance Monitors

In addition to the base cycle, instructions-retired and time counters, the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.
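
Software that reads an HPM has to combine the two 32-bit CSR words into one value. The following C sketch shows
this step. It assumes that counter bits above `HPM_CNT_WIDTH` are not implemented and read as zero; the function
name is hypothetical and the two words are passed in as plain arguments instead of being read from the CSRs.

[source,c]
----
#include <stdint.h>

/* Combine the high/low CSR words and keep only the implemented counter bits.
 * width mirrors the HPM_CNT_WIDTH generic (0..64). */
uint64_t hpm_counter_value(uint32_t high_word, uint32_t low_word, unsigned width) {
    uint64_t raw = ((uint64_t)high_word << 32) | low_word;
    if (width == 0u) {
        return 0u;                                /* no counter bits implemented */
    }
    if (width >= 64u) {
        return raw;                               /* full 64-bit counter */
    }
    return raw & ((UINT64_C(1) << width) - 1u);   /* mask to the lowest width bits */
}
----

With a 40-bit configuration, for instance, only the lowest 8 bits of the high word contribute to the result.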

The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.

Depending on the configuration the following additional CSRs are available:

* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)

[IMPORTANT]
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
exception.

[TIP]
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.

[TIP]
For a list of all HPM-related CSRs and all provided event configurations
see section <<_hardware_performance_monitors_hpm>>.


==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.


==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible with the RISC-V PMP specifications. It can be used
to constrain memory read/write/execute rights for each available privilege level.

The NEORV32 PMP currently supports only _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger
minimal sizes can be configured via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements.
The physical memory protection system is implemented when the `PMP_NUM_REGIONS` configuration generic is >0.
In this case the following additional CSRs are available:

* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers

[TIP]
See section <<_machine_physical_memory_protection_csrs>> for more information regarding the PMP CSRs.

The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
number of available `pmpcfg*` and `pmpaddr*` CSRs.
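
As an illustration of the _NAPOT_ mode, the following C sketch computes the `pmpaddr*` value and the 8-bit
`pmpcfg*` entry for a naturally aligned power-of-two region, using the encoding from the RISC-V privileged
architecture specification. Both function names are hypothetical helpers and not part of the NEORV32 software
framework. The region size has to be a power of two of at least 8 bytes and the base address has to be aligned to
that size.

[source,c]
----
#include <stdint.h>

/* pmpaddr value for a NAPOT region: address bits [33:2] with (size/8 - 1) trailing one-bits. */
uint32_t pmp_napot_addr(uint32_t base, uint32_t size) {
    return (base >> 2) | ((size >> 3) - 1u);
}

/* 8-bit pmpcfg entry: L (bit 7), A = NAPOT = 3 (bits 4:3), X (bit 2), W (bit 1), R (bit 0). */
uint8_t pmp_napot_cfg(int r, int w, int x, int lock) {
    return (uint8_t)((lock ? 0x80u : 0u) | (3u << 3) |
                     (x ? 0x04u : 0u) | (w ? 0x02u : 0u) | (r ? 0x01u : 0u));
}
----

A 64kB region starting at `0x80000000` results in a `pmpaddr*` value of `0x20001FFF`; a read/execute-only, unlocked
configuration results in a `pmpcfg*` entry of `0x1D`.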

When implementing more PMP regions than a _certain critical limit_, *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data access by +1 cycle.

The critical limit can be adapted for custom use by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----

**Operation**

Every CPU memory access address (from the instruction fetch or data access interface) is checked for a match with _any_
of the specified PMP regions (configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
address matches one of these regions, the configured access rights (attributes in `pmpcfg*`) are enforced:

* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set

If an access to a protected region does not have the according access rights it will raise the according
instruction/load/store _access fault_ exception.

By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs you need to set the _locked bit_ in the according
`pmpcfg*` configuration CSR.
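
The access rules above can be summarized in a few lines of C. The sketch models only the permission decision for an
access that already matches a region. The bit positions follow the RISC-V `pmpcfg` layout; all names are
hypothetical and the code is not derived from the NEORV32 hardware description.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

#define PMP_R (1u << 0) /* read permission */
#define PMP_W (1u << 1) /* write permission */
#define PMP_X (1u << 2) /* execute permission */
#define PMP_L (1u << 7) /* locked: rules also apply to machine-mode */

typedef enum { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH } access_t;

/* Returns true if an access matching a region with configuration cfg is allowed. */
bool pmp_access_allowed(uint8_t cfg, bool machine_mode, access_t access) {
    if (machine_mode && (cfg & PMP_L) == 0u) {
        return true; /* region not locked: machine-mode accesses are not restricted */
    }
    switch (access) {
        case ACCESS_LOAD:  return (cfg & PMP_R) != 0u;
        case ACCESS_STORE: return (cfg & PMP_W) != 0u;
        case ACCESS_FETCH: return (cfg & PMP_X) != 0u;
    }
    return false;
}
----

A denied access corresponds to the instruction/load/store access fault exception described above.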

[IMPORTANT]
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.

[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.



<<<
// ####################################################################################################################
:sectnums:
=== Instruction Timing

The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.

Average CPI (cycles per instruction) values for "real applications", like executing the CoreMark benchmark, for different CPU
configurations are presented in <<_cpu_performance>>.

.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+32+2; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 4
| Division       | `M`  | `div` `divu` `rem` `remu`     | 2+32+2
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 3 + ML
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit  | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
|=======================
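
The cycle count of the default shifter depends on the shift amount. The following C sketch simply evaluates the
expression `3 + SA/4 + SA%4` from the table (integer division and remainder). The function name is hypothetical,
and the `FAST_SHIFT` and `TINY_SHIFT` configurations are not covered.

[source,c]
----
/* Execution cycles of slli/srli/srai/sll/srl/sra with the default shifter. */
unsigned shift_cycles(unsigned shift_amount) {
    return 3u + shift_amount / 4u + shift_amount % 4u;
}
----

According to this expression, a shift by 0 takes 3 cycles, a shift by 4 takes 4 cycles and a shift by 31 takes 13 cycles.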

[NOTE]
The presented values of the *floating-point execution cycles* are average values - obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.


<<<
// ####################################################################################################################
include::cpu_csr.adoc[]


<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following nomenclature regarding traps is used:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec`
CSR. The cause of the according interrupt or exception can be determined via the content of the `mcause`
CSR. The address that reflects the current program counter when a trap was taken is stored to the `mepc` CSR.
Additional information regarding the cause of the trap can be retrieved from the `mtval` CSR and the processor's
<<_internal_bus_monitor_buskeeper>> (for memory access exceptions).

The traps are prioritized. If several _synchronous exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _asynchronous exceptions_ (interrupts) trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.

.Interrupt Signal Requirements - Standard RISC-V Interrupts
[IMPORTANT]
All standard RISC-V interrupt request signals are **high-active**. A request has to stay at high-level (=asserted)
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).

.Interrupt Signal Requirements - Fast Interrupt Requests
[IMPORTANT]
The NEORV32-specific FIRQ request lines are triggered by a one-shot high-level (i.e. rising edge). Each request is buffered in the CPU control
unit until the channel is either disabled (by clearing the according `mie` CSR bit) or the request is explicitly cleared (by setting
the according `mip` CSR bit).

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations - interrupts can only trigger _between_ two instructions.
So even if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.


:sectnums:
==== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus/memory read-operation at all. Likewise, exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus/memory write-operation at all.


:sectnums:
==== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
provide custom trap codes in `mcause`. These FIRQs are reserved for NEORV32 processor-internal usage only.



<<<
// ####################################################################################################################
:sectnums:
==== NEORV32 Trap Listing

The following table shows all traps that are currently supported by the NEORV32 CPU. It also shows the prioritization
and the CSR side-effects. A more detailed description of the actual trap triggering events is provided in a further table.

[NOTE]
_Asynchronous exceptions_ (= interrupts) set the MSB of `mcause` while _synchronous exceptions_ (= "software exceptions")
clear the MSB.
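
A trap handler typically starts by splitting `mcause` into these two parts. The following C sketch shows the
decoding on a plain 32-bit value; the function names are hypothetical and the value is passed in as an argument
instead of being read from the CSR.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* MSB set: asynchronous exception (interrupt); MSB cleared: synchronous exception. */
bool mcause_is_interrupt(uint32_t mcause) {
    return (mcause >> 31) != 0u;
}

/* Remaining bits: interrupt/exception code. */
uint32_t mcause_code(uint32_t mcause) {
    return mcause & 0x7FFFFFFFu;
}
----

For example, `0x8000000B` decodes to an interrupt with code 11 (machine external interrupt), while `0x00000002`
decodes to an exception with code 2 (illegal instruction).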

**Table Annotations**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the according trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (the runtime environment _RTE_) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to the
`mepc` and `mtval` CSRs when a trap is triggered:

* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself

.NEORV32 Trap Listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (`ebreak`) | _PC_ | _PC_
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
|=======================
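
The interrupt prioritization from the table (rows 11 to 29) can be expressed as a simple selection function. The
following C sketch assumes that the bit position of each interrupt in `mip`/`mie` equals its cause code (bits
16..31 for the FIRQs, bits 11, 3 and 7 for the standard RISC-V machine interrupts). The function name is
hypothetical.

[source,c]
----
#include <stdint.h>

/* pending: bitwise AND of the mip and mie values.
 * Returns the mcause value of the interrupt that is serviced first, or 0 if none is pending. */
uint32_t next_interrupt_cause(uint32_t pending) {
    for (unsigned ch = 0u; ch < 16u; ch++) {         /* FIRQ 0..15: priorities 11..26 */
        if (pending & (1u << (16u + ch))) {
            return 0x80000010u + ch;
        }
    }
    if (pending & (1u << 11)) return 0x8000000Bu;    /* machine external interrupt (27) */
    if (pending & (1u << 3))  return 0x80000003u;    /* machine software interrupt (28) */
    if (pending & (1u << 7))  return 0x80000007u;    /* machine timer interrupt (29) */
    return 0u;
}
----

If, for instance, the machine timer interrupt and the machine external interrupt are pending at the same time, the
function selects the machine external interrupt (`0x8000000B`).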


The following table provides a summarized description of the actual events for triggering a specific trap.

.NEORV32 Trap Description
[cols="<3,<7"]
[options="header",grid="rows"]
|=======================
| Trap ID [C] | Triggered when ...
| _TRAP_CODE_I_MISALIGNED_ | fetching a 32-bit instruction word that is not 32-bit-aligned (_see note below!_)
| _TRAP_CODE_I_ACCESS_     | bus timeout or bus error during instruction word fetch
| _TRAP_CODE_I_ILLEGAL_    | trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation
| _TRAP_CODE_MENV_CALL_    | executing `ecall` instruction in machine-mode
| _TRAP_CODE_UENV_CALL_    | executing `ecall` instruction in user-mode
| _TRAP_CODE_BREAKPOINT_   | executing `ebreak` instruction (or triggered by on-chip debugger)
| _TRAP_CODE_S_MISALIGNED_ | storing data to an address that is not naturally aligned to the data size (byte, half, word) being stored
| _TRAP_CODE_L_MISALIGNED_ | loading data from an address that is not naturally aligned to the data size (byte, half, word) being loaded
| _TRAP_CODE_S_ACCESS_     | bus timeout or bus error during store data operation
| _TRAP_CODE_L_ACCESS_     | bus timeout or bus error during load data operation
| _TRAP_CODE_FIRQ_0_ ... _TRAP_CODE_FIRQ_15_| caused by interrupt-condition of processor-internal modules, see <<_neorv32_specific_fast_interrupt_requests>>
| _TRAP_CODE_MEI_          | user-defined processor-external source (via dedicated top-entity signal)
| _TRAP_CODE_MSI_          | user-defined processor-external source (via dedicated top-entity signal)
| _TRAP_CODE_MTI_          | processor-internal machine timer overflow OR user-defined processor-external source (via dedicated top-entity signal)
|=======================

.Instruction Address Misaligned Exception
[NOTE]
For 32-bit-only instructions (= no `C` extension) the misaligned instruction exception
is raised if bit 1 of the fetch address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
there will never be a misaligned instruction exception _at all_.
In both cases bit 0 of the program counter (and all related registers) is hardwired to zero.


<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface

The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.

:sectnums:
===== Address Space

The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_ - however, a software-based handling can be
implemented as any unaligned memory access will trigger an according exception.
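
Whether a data access is unaligned follows directly from the access address and the access size. The following C
sketch shows the natural-alignment check that a software-based handler could use to classify an access; the
function name is hypothetical.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* size_bytes: 1 (byte), 2 (half-word) or 4 (word).
 * An access is naturally aligned if the address is a multiple of the access size. */
bool is_misaligned(uint32_t address, uint32_t size_bytes) {
    return (address & (size_bytes - 1u)) != 0u;
}
----

A word access to `0x1002` is misaligned, while a half-word access to the same address is aligned. Byte accesses can
never be misaligned.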

:sectnums:
===== Interface Signals

The following table shows the signals of the data and instruction interfaces seen from the CPU
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).

.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================

[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly" (pending) at once.

:sectnums:
===== Protocol

A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral either sets the `bus_ack_i` signal (-> successful completion) or the
`bus_err_i` signal (-> failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the according instruction bus
access fault or load/store bus access fault exception.

[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.

.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (see section <<_internal_bus_monitor_buskeeper>>) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the according instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
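
The three possible outcomes of a processor-internal transfer (acknowledge, error, timeout) can be illustrated with
a small cycle-counting model. This C sketch is a simplification and not derived from the hardware description:
all names are hypothetical, and the exact cycle at which the real bus keeper reports the timeout is an assumption
(here: a response in cycle `max_response_time` is still accepted).

[source,c]
----
typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* ack_cycle / err_cycle: cycle (counted from the request, starting at 0) in which the
 * peripheral sets bus_ack_i / bus_err_i, or a negative value if it never does. */
bus_result_t bus_transfer(int ack_cycle, int err_cycle, int max_response_time) {
    for (int cycle = 0; cycle <= max_response_time; cycle++) {
        if (cycle == err_cycle) return BUS_ERROR;   /* peripheral signals a failed transfer */
        if (cycle == ack_cycle) return BUS_OK;      /* peripheral acknowledges the transfer */
    }
    return BUS_TIMEOUT;                             /* no response within the time window */
}
----

With the default window of 15 cycles, a module that acknowledges after one cycle completes normally, while an
access to an unmapped address (no response at all) ends with a timeout.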
1034
 
1035
**Exemplary Bus Accesses**
1036
 
1037
.Example bus accesses: see read/write access description below
1038
[cols="^2,^2"]
1039
[grid="none"]
1040
|=======================
1041
a| image::cpu_interface_read_long.png[read,300,150]
1042
a| image::cpu_interface_write_long.png[write,300,150]
1043
| Read access | Write access
1044
|=======================
1045
 
1046
**Write Access**
1047
 
1048
For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.
1053
 
1054
**Read Access**
1055
 
1056
For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
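
The device side of the read and write accesses described above might look like the following sketch of a single 32-bit register. It assumes that the device is the only one selected (no address decoding), and it answers with exactly one cycle of delay. The entity and its device-side port names (mirroring the CPU-side signals) are hypothetical and only serve this illustration.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- hypothetical memory-mapped register; all names are illustrative only --
entity simple_reg_sketch is
  port (
    clk_i       : in  std_ulogic; -- clock, rising edge
    rstn_i      : in  std_ulogic; -- asynchronous reset, low-active
    bus_wdata_i : in  std_ulogic_vector(31 downto 0); -- write data
    bus_ben_i   : in  std_ulogic_vector(3 downto 0);  -- byte enable
    bus_we_i    : in  std_ulogic; -- write request, high for one cycle
    bus_re_i    : in  std_ulogic; -- read request, high for one cycle
    bus_rdata_o : out std_ulogic_vector(31 downto 0); -- read data
    bus_ack_o   : out std_ulogic  -- transfer acknowledge
  );
end entity;

architecture rtl of simple_reg_sketch is
  signal reg : std_ulogic_vector(31 downto 0);
begin
  process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      reg         <= (others => '0');
      bus_rdata_o <= (others => '0');
      bus_ack_o   <= '0';
    elsif rising_edge(clk_i) then
      -- acknowledge exactly one cycle after the request
      bus_ack_o <= bus_we_i or bus_re_i;
      -- write access: update only the enabled bytes
      if (bus_we_i = '1') then
        for i in 0 to 3 loop
          if (bus_ben_i(i) = '1') then
            reg(8*i+7 downto 8*i) <= bus_wdata_i(8*i+7 downto 8*i);
          end if;
        end loop;
      end if;
      -- read access: data is valid in the same cycle as the acknowledge
      bus_rdata_o <= (others => '0');
      if (bus_re_i = '1') then
        bus_rdata_o <= reg;
      end if;
    end if;
  end process;
end architecture;
----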
1061
 
1062
**Access Boundaries**
1063
 
1064
The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
1067
 
1068
**Exclusive (Atomic) Access**
1069
 
1070
The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).
1077
 
1078
When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write back non-zero and will not generate an actual memory store operation.
1082
 
1083
The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:
1084
 
1085
* when executing any other memory-access operation than `lr.w`
1086
* when any trap (sync. or async.) is triggered (for example to force a context switch)
1087
* when the memory system signals a bus error (via the `bus_err_i` signal)
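
The set and break conditions of the exclusive access lock listed above can be summarized as a single flag. This is a minimal sketch under stated assumptions, not the CPU's actual control logic: the entity, the one-cycle strobe inputs `lr_i`, `mem_other_i` and `trap_i`, and the priority of the break conditions over `lr_i` are all choices made for this example.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- hypothetical lock flag; all names are illustrative only --
entity excl_lock_sketch is
  port (
    clk_i       : in  std_ulogic; -- clock, rising edge
    rstn_i      : in  std_ulogic; -- asynchronous reset, low-active
    lr_i        : in  std_ulogic; -- an lr.w instruction is executed
    mem_other_i : in  std_ulogic; -- any memory access other than lr.w is executed
    trap_i      : in  std_ulogic; -- any trap (sync. or async.) is triggered
    bus_err_i   : in  std_ulogic; -- memory system signals a bus error
    lock_o      : out std_ulogic  -- exclusive access lock
  );
end entity;

architecture rtl of excl_lock_sketch is
  signal lock : std_ulogic;
begin
  process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      lock <= '0';
    elsif rising_edge(clk_i) then
      if (mem_other_i = '1') or (trap_i = '1') or (bus_err_i = '1') then
        lock <= '0'; -- any break condition invalidates the reservation
      elsif (lr_i = '1') then
        lock <= '1'; -- lr.w sets the reservation
      end if;
    end if;
  end process;

  lock_o <= lock;
end architecture;
----

In this sketch a bus error takes priority over `lr_i`, so an `lr.w` that itself ends in a bus error leaves the lock cleared.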
1088
 
1089
[TIP]
1090
For more information regarding the SoC-level behavior and requirements of atomic operations see
1091
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
1092
 
1093
**Memory Barriers**
1094
 
1095
Whenever the CPU executes a fence instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).
1098
 
1099
 
1100
 
1101
<<<
1102
// ####################################################################################################################
1103
:sectnums:
1104
==== CPU Hardware Reset
1105
 
1106
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.
1110
 
1111 70 zero_gravi
**Rationale**
1112 60 zero_gravi
 
1113
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a write-back
of the processing result to some kind of memory. The initial state of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
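
One stage of such a pipeline could be sketched as follows. The entity, its generic `N` and all port names are hypothetical and chosen only for this illustration: the status register uses a dedicated reset, while the data register is a plain flip-flop without any reset.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- hypothetical pipeline stage; all names are illustrative only --
entity pipe_stage_sketch is
  generic (
    N : positive := 32 -- data width
  );
  port (
    clk_i   : in  std_ulogic; -- clock, rising edge
    rstn_i  : in  std_ulogic; -- asynchronous reset, low-active
    valid_i : in  std_ulogic; -- input data is valid
    data_i  : in  std_ulogic_vector(N-1 downto 0);
    valid_o : out std_ulogic; -- output data is valid
    data_o  : out std_ulogic_vector(N-1 downto 0)
  );
end entity;

architecture rtl of pipe_stage_sketch is
begin
  -- status register: critical, requires a defined reset state --
  status_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      valid_o <= '0'; -- no valid data in the pipeline after reset
    elsif rising_edge(clk_i) then
      valid_o <= valid_i;
    end if;
  end process;

  -- data register: uncritical, no reset at all --
  data_reg: process(clk_i)
  begin
    if rising_edge(clk_i) then
      data_o <= data_i; -- content is only meaningful while valid_o is set
    end if;
  end process;
end architecture;
----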
1121
 
1122
**NEORV32 CPU Reset**
1123
 
1124
In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers will get initialized by the CPU's internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).
1129
 
1130
During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) _does_ provide a dedicated
hardware reset setting this bit to low (globally disabling interrupts).
1135 60 zero_gravi
 
1136
**Reset Configuration**
1137
 
1138 70 zero_gravi
Most CPU-internal registers do provide an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of all uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all CPU registers can
be enabled by setting a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):
1143 60 zero_gravi
 
1144
[source,vhdl]
----
-- reset configuration of uncritical registers --
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW), default; TRUE=defined LOW reset value)
----
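
How such a switch might select the reset value of an uncritical register is sketched below. This is an illustration only and not taken from the processor's sources: the `DEDICATED_RESET` generic stands in for `dedicated_reset_c`, and the entity, the helper function `rst_val_f` and the constant `rst_val_c` are hypothetical names.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- hypothetical uncritical register; all names are illustrative only --
entity reset_value_sketch is
  generic (
    DEDICATED_RESET : boolean := false -- stand-in for dedicated_reset_c
  );
  port (
    clk_i  : in  std_ulogic; -- clock, rising edge
    rstn_i : in  std_ulogic; -- asynchronous reset, low-active
    d_i    : in  std_ulogic_vector(31 downto 0);
    q_o    : out std_ulogic_vector(31 downto 0)
  );
end entity;

architecture rtl of reset_value_sketch is
  -- select the reset value: defined low level or "don't care" --
  function rst_val_f(dedicated : boolean) return std_ulogic is
  begin
    if dedicated then
      return '0'; -- defined LOW reset value
    else
      return '-'; -- reset value is irrelevant
    end if;
  end function;

  constant rst_val_c : std_ulogic := rst_val_f(DEDICATED_RESET);
begin
  process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      q_o <= (others => rst_val_c);
    elsif rising_edge(clk_i) then
      q_o <= d_i;
    end if;
  end process;
end architecture;
----

With the default setting the reset branch assigns only "don't care" values, which leaves the synthesis tool free to implement the register without a reset.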
