:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]

**Key Features**

* 32-bit multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zicntr` - CPU base counters
** `Zihpm` - hardware performance monitors
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `Debug` - debug mode
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
* Supports _all_ of the machine-level traps from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
** This is a key aspect of the _execution safety_ provided by <<_full_virtualization>>
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate interfaces for instruction fetch and data access (merged into a single processor bus)
* Little-endian byte order
* Configurable hardware reset
* No hardware support for unaligned data/instruction accesses - they will trigger an exception.
 
[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.
 
<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
front-end can already fetch new instructions while the back-end is still processing previously fetched instructions.

The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
data is stored in a FIFO queue - the instruction prefetch buffer.

The back-end is responsible for the actual execution of the instructions. It includes an "issue engine",
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
instructions or decompressed 16-bit instructions) for execution.
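
The way the issue engine consumes instruction data can be sketched in C. This is only an illustrative host-side model and not the actual VHDL implementation: the names `is_compressed` and `issue_next` are hypothetical, and the sketch relies on the standard RISC-V length encoding (a 16-bit parcel starts a compressed instruction unless its two lowest bits are `11`). The decompression of 16-bit instructions into their 32-bit equivalents is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A 16-bit parcel starts a compressed instruction unless its two
 * lowest bits are 0b11 (standard RISC-V length encoding). */
static bool is_compressed(uint16_t parcel)
{
    return (parcel & 0x3u) != 0x3u;
}

/* Hypothetical issue-engine model: consume one or two 16-bit parcels
 * starting at *pos and return the raw instruction bits. A 16-bit
 * instruction is returned in the lower half-word. */
static uint32_t issue_next(const uint16_t *parcels, size_t *pos, bool *compressed)
{
    uint16_t low = parcels[(*pos)++];
    *compressed = is_compressed(low);
    if (*compressed) {
        return low;
    }
    uint16_t high = parcels[(*pos)++]; /* second half of a 32-bit instruction */
    return ((uint32_t)high << 16) | low;
}
```

Feeding the parcel sequence `0x4501`, `0x0513`, `0x0015` into this model yields one 16-bit instruction (`0x4501`) followed by the unaligned 32-bit instruction word `0x00150513`, which is assembled from two consecutive parcels.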
 
Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
when the CPU front-end has to reload the prefetch buffer due to a taken branch.

Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
these two classical design paradigms allows a higher instruction throughput than a pure multi-cycle
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
multi-cycle concept).

The CPU provides independent interfaces for instruction fetch and data access. As the CPU is a Von Neumann machine,
these two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
have higher priority). Hence, ALL memory locations including peripheral devices are mapped to a single unified 32-bit
address space.
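
The fixed priority of the bus switch can be illustrated with a small C model. This is a sketch under simplifying assumptions and not the actual hardware: the type `bus_grant_t` and the function `bus_switch_arbitrate` are hypothetical names, and the model only decides between two requests that arrive in the same cycle (transfers that are already in progress are not modeled).

```c
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_INSTR, GRANT_DATA } bus_grant_t;

/* Hypothetical arbiter model: a data access always wins over an
 * instruction fetch that is requested at the same time. */
static bus_grant_t bus_switch_arbitrate(bool instr_req, bool data_req)
{
    if (data_req) {
        return GRANT_DATA;
    }
    if (instr_req) {
        return GRANT_INSTR;
    }
    return GRANT_NONE;
}
```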
 
// ####################################################################################################################
:sectnums:
=== Full Virtualization

Just like the RISC-V ISA, the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU _and_ SoC level to
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
malformed instruction word or accessing a non-allocated memory address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
might have to be reverted). This allows predictable execution behavior at any time, improving overall _execution safety_.

**Execution Safety - NEORV32 Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
accessed address does not respond or encounters an internal error during access.
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
window. Otherwise a bus access exception is raised.
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions raise an
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
memory operations).
* To be continued...
 
// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the `rv32_m/I`, `rv32_m/M`, `rv32_m/C`, `rv32_m/privilege`, and
`rv32_m/Zifencei` tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.

[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.
 
.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................
 
.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................
 
.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................
 
.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................
 
.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................
 
<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently identified issues regarding full RISC-V compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.

.Hardwired R/W CSRs
[IMPORTANT]
The `misa`, `mip` and `mtval` CSRs in the NEORV32 are _read-only_.
Any write access to them (in machine mode) is ignored and will _not_ cause any exceptions or side-effects.
Pending interrupts can only be cleared by acknowledging the interrupt-causing device. However, pending interrupts
can still be ignored by clearing the corresponding `mie` register bits.
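
The effect of masking a pending interrupt via `mie` can be sketched as a simple bit operation. This is an illustrative host-side model only: the function `irqs_enabled_and_pending` and the `IRQ_*` macros are hypothetical names, the bit positions are the standard RISC-V machine-level positions (software = 3, timer = 7, external = 11), and the global interrupt enable in `mstatus` is not modeled.

```c
#include <stdint.h>

/* Standard RISC-V machine-level interrupt bit positions in mip/mie. */
#define IRQ_MSI (UINT32_C(1) << 3)   /* machine software interrupt */
#define IRQ_MTI (UINT32_C(1) << 7)   /* machine timer interrupt    */
#define IRQ_MEI (UINT32_C(1) << 11)  /* machine external interrupt */

/* Hypothetical model: an interrupt can only be taken if it is pending
 * (mip, read-only) AND enabled (mie). Clearing a mie bit hides the
 * request from the CPU, but the mip bit stays set until the source
 * device is acknowledged. */
static uint32_t irqs_enabled_and_pending(uint32_t mip, uint32_t mie)
{
    return mip & mie;
}
```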
 
.Physical memory protection
[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
currently only supports the modes _OFF_ and _NAPOT_ and a minimal granularity of 8 bytes per region.
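
For reference, the _NAPOT_ (naturally aligned power-of-two) address encoding from the RISC-V privileged specification can be sketched as follows. This is a host-side illustration and not part of the NEORV32 software framework: the function names `pmp_napot_encode` and `pmp_napot_size` are hypothetical, and the decoder is limited to region sizes of up to 2 GiB to keep the arithmetic within 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the pmpaddr value for a NAPOT region. The region size has to
 * be a power of two of at least 8 bytes and the base address has to be
 * aligned to the region size. Returns false for invalid regions. */
static bool pmp_napot_encode(uint32_t base, uint32_t size, uint32_t *pmpaddr)
{
    bool pow2 = (size != 0) && ((size & (size - 1)) == 0);
    if (!pow2 || (size < 8) || ((base & (size - 1)) != 0)) {
        return false;
    }
    /* pmpaddr holds address bits [33:2]; the trailing ones encode the size */
    *pmpaddr = (base | (size / 2 - 1)) >> 2;
    return true;
}

/* Recover the region size in bytes from a NAPOT pmpaddr value:
 * n trailing one bits encode a region of 2^(n+3) bytes. */
static uint32_t pmp_napot_size(uint32_t pmpaddr)
{
    unsigned ones = 0;
    while ((pmpaddr & 1u) && (ones < 28)) {
        pmpaddr >>= 1;
        ones++;
    }
    return UINT32_C(8) << ones;
}
```

With this encoding, the smallest supported region (8 bytes) corresponds to a `pmpaddr` value whose least significant bit is zero.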
 
.Atomic memory operations
[IMPORTANT]
The `A` CPU extension currently only implements the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.

.Bit-manipulation operations
[IMPORTANT]
The NEORV32 `B` extension currently only implements the _basic bit-manipulation instructions_ (`Zbb`) subset
and the _address generation instructions_ (`Zba`) subset.
 
.Instruction Misalignment
[NOTE]
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
is raised if bit 1 of the access address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
there will be no misaligned instruction exceptions _at all_.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
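
The misalignment rule can be summarized in a few lines of C. This is only a sketch of the behavior described in the note above; the function names `instr_addr_misaligned` and `effective_pc` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the program counter is hardwired to zero. */
static uint32_t effective_pc(uint32_t target)
{
    return target & ~UINT32_C(1);
}

/* A misaligned instruction address exception is only possible without
 * the C extension, and only if bit 1 of the target address is set. */
static bool instr_addr_misaligned(uint32_t target, bool c_extension)
{
    return !c_extension && ((target & UINT32_C(2)) != 0);
}
```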
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction as seen from the CPU.
 
.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir. | Function
4+^| **Global Signals**
| `clk_i`          |     1 | in   | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in   | global reset, low-active
| `sleep_o`        |     1 | out  | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out  | destination address
| `i_bus_rdata_i`  |    32 | in   | read data
| `i_bus_wdata_o`  |    32 | out  | write data (always zero)
| `i_bus_ben_o`    |     4 | out  | byte enable
| `i_bus_we_o`     |     1 | out  | write transaction (always zero)
| `i_bus_re_o`     |     1 | out  | read transaction
| `i_bus_lock_o`   |     1 | out  | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in   | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in   | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out  | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out  | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out  | destination address
| `d_bus_rdata_i`  |    32 | in   | read data
| `d_bus_wdata_o`  |    32 | out  | write data
| `d_bus_ben_o`    |     4 | out  | byte enable
| `d_bus_we_o`     |     1 | out  | write transaction
| `d_bus_re_o`     |     1 | out  | read transaction
| `d_bus_lock_o`   |     1 | out  | exclusive access request
| `d_bus_ack_i`    |     1 | in   | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in   | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out  | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out  | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in   | system time input (from MTIME)
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in   | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in   | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in   | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in   | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in   | request CPU to halt and enter debug mode
|=======================
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.
 
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see the _RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

[TIP]
Software can discover available ISA extensions via the <<_misa>> CSR and the
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
or by executing an instruction and checking for an _illegal instruction exception_.

[NOTE]
Executing an instruction from an extension that is not supported yet or that is currently not enabled
(via the corresponding top entity generic) will raise an _illegal instruction_ exception.
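
Decoding the extension bits of a `misa` value can be sketched as shown below. The sketch assumes the standard RISC-V layout (bit 0 = `A`, bit 1 = `B`, ... bit 25 = `Z`) and operates on a plain integer so that it runs on any host; on the actual CPU the value would first have to be read from the CSR. The function name `misa_has_extension` is hypothetical and not part of the NEORV32 software framework.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check if the single-letter extension 'letter' ('A'..'Z') is flagged
 * in the given misa value (standard layout: bit index = letter - 'A'). */
static bool misa_has_extension(uint32_t misa, char letter)
{
    if ((letter < 'A') || (letter > 'Z')) {
        return false;
    }
    return ((misa >> (letter - 'A')) & 1u) != 0;
}
```

Multi-letter extensions like `Zfinx` or `Zmmul` have no `misa` bit, which is why the SYSINFO register or the "execute and check for an exception" approach is needed for them.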
 
==== **`A`** - Atomic Memory Access

Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
The RISC-V specs. define a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
ISA extension is enabled if the `CPU_EXTENSION_RISCV_A` configuration generic is _true_.
In this case the following additional instructions are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.

The *load-reserved* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
_data memory access lock_. Executing a *store-conditional* behaves as a "normal" store-word instruction (`sw`) that will
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
After the execution of the `sc.w` instruction, the lock is automatically removed.

The lock is broken if at least one of the following conditions occurs:

. executing any data memory access instruction other than `lr.w`
. raising _any_ trap (for example an interrupt or a memory access exception)
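
The lock behavior described above, and the way a read-modify-write operation can be emulated with it, can be sketched as a host-side C model. All names (`cpu_model_t`, `model_lr_w`, `model_sc_w`, `model_break_lock`, `model_amoadd_w`) are hypothetical and only serve this illustration; real code would use the `lr.w`/`sc.w` instructions in a retry loop instead of these functions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool lock; /* models the CPU-internal data memory access lock */
} cpu_model_t;

/* lr.w: normal load that additionally sets the lock. */
static uint32_t model_lr_w(cpu_model_t *cpu, const uint32_t *addr)
{
    cpu->lock = true;
    return *addr;
}

/* sc.w: writes only if the lock is still intact. Returns zero on
 * success and non-zero on failure; the lock is removed in any case. */
static uint32_t model_sc_w(cpu_model_t *cpu, uint32_t *addr, uint32_t value)
{
    bool intact = cpu->lock;
    cpu->lock = false;
    if (intact) {
        *addr = value;
        return 0;
    }
    return 1;
}

/* Models a lock-breaking event (another data access or any trap). */
static void model_break_lock(cpu_model_t *cpu)
{
    cpu->lock = false;
}

/* Emulated atomic add: retry until the store-conditional succeeds.
 * Returns the value that was in memory before the addition. */
static uint32_t model_amoadd_w(cpu_model_t *cpu, uint32_t *addr, uint32_t inc)
{
    uint32_t old;
    do {
        old = model_lr_w(cpu, addr);
    } while (model_sc_w(cpu, addr, old + inc) != 0);
    return old;
}
```

If the lock is broken between the load and the store (for example by an interrupt), the store is suppressed and the loop simply starts over, so the memory location is never updated based on a stale value.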
 
[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
 
==== **`B`** - Bit-Manipulation Operations

The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
`CPU_EXTENSION_RISCV_B` configuration generic is _true_.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip

[IMPORTANT]
The NEORV32 `B` extension currently only implements the _basic bit-manipulation instructions_ (`Zbb`) subset
and the _address generation instructions_ (`Zba`) subset.

The `Zbb` sub-extension adds the following instructions:

* `andn`, `orn`, `xnor`
* `clz`, `ctz`, `cpop`
* `max`, `maxu`, `min`, `minu`
* `sext.b`, `sext.h`, `zext.h`
* `rol`, `ror`, `rori`
* `orc.b`, `rev8`

The `Zba` sub-extension adds the following instructions:

* `sh1add`, `sh2add`, `sh3add`
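
As an illustration of what some of these instructions compute, the following plain-C reference functions model a small selection of them for 32-bit operands. They are not the intrinsic library shipped with the project; the `ref_*` names are hypothetical and the functions are only meant as a readable description of the instruction semantics.

```c
#include <stdint.h>

/* Zbb andn: AND with inverted second operand. */
static uint32_t ref_andn(uint32_t rs1, uint32_t rs2)
{
    return rs1 & ~rs2;
}

/* Zbb clz: count leading zero bits (32 for a zero input). */
static uint32_t ref_clz(uint32_t rs1)
{
    uint32_t count = 0;
    for (uint32_t mask = UINT32_C(1) << 31; (mask != 0) && ((rs1 & mask) == 0); mask >>= 1) {
        count++;
    }
    return count;
}

/* Zbb cpop: count the number of set bits. */
static uint32_t ref_cpop(uint32_t rs1)
{
    uint32_t count = 0;
    for (; rs1 != 0; rs1 >>= 1) {
        count += rs1 & 1u;
    }
    return count;
}

/* Zbb rol: rotate left by the lowest five bits of rs2. */
static uint32_t ref_rol(uint32_t rs1, uint32_t rs2)
{
    uint32_t shamt = rs2 & 31u;
    return (rs1 << shamt) | (rs1 >> ((32u - shamt) & 31u));
}

/* Zbb rev8: reverse the byte order of the word. */
static uint32_t ref_rev8(uint32_t rs1)
{
    return (rs1 >> 24) | ((rs1 >> 8) & 0x0000FF00u) |
           ((rs1 << 8) & 0x00FF0000u) | (rs1 << 24);
}

/* Zba sh1add/sh2add/sh3add (n = 1, 2, 3): rs2 + (rs1 << n). */
static uint32_t ref_shadd(unsigned n, uint32_t rs1, uint32_t rs2)
{
    return rs2 + (rs1 << n);
}
```

The `shNadd` instructions are typically used for array indexing: with `rs1` holding an element index and `rs2` holding a base address, `sh2add` directly yields the address of a 32-bit array element.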
 
[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement fully parallel logic (like barrel shifters) for all
shift-related `B` instructions.

[WARNING]
The `B` extension is frozen but not officially ratified yet. There is no
software support for this extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `B` extension features from C-language
code (see `sw/example/bitmanip_test`).
 
==== **`C`** - Compressed Instructions

The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
In this case the following instructions are available:

* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
 
==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
decrease physical hardware requirements (for example block RAM). This extension is enabled when the `CPU_EXTENSION_RISCV_E`
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
This extension does not add any additional instructions or features.

[IMPORTANT]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
 
==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediate: `lui`, `auipc`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
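
Why the latency of the bit-serial shifter depends on the shift amount can be illustrated with a small C model. This is a sketch under the assumption that the serial shifter moves the operand by one bit position per cycle; the type `shift_result_t` and the functions `serial_sll` and `barrel_sll` are hypothetical names, and the additional overhead cycles mentioned above are not modeled.

```c
#include <stdint.h>

typedef struct {
    uint32_t value;  /* shift result */
    unsigned cycles; /* modeled number of shifter cycles */
} shift_result_t;

/* Bit-serial model: one bit position per cycle (assumption). */
static shift_result_t serial_sll(uint32_t value, uint32_t shamt)
{
    shift_result_t r = { value, 0 };
    for (uint32_t n = shamt & 31u; n > 0; n--) {
        r.value <<= 1;
        r.cycles++;
    }
    return r;
}

/* Barrel shifter model: constant 2 cycles for any shift amount. */
static shift_result_t barrel_sll(uint32_t value, uint32_t shamt)
{
    shift_result_t r = { value << (shamt & 31u), 2 };
    return r;
}
```

Both models produce the same result; only the cycle count differs, which reflects the area versus speed trade-off selected by `FAST_SHIFT_EN`.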
 
[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top entity's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.
 
==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division operations are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete - regardless of the input operands.
 
==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.

Note that the `M` and `Zmmul` extensions _cannot_ be enabled at the same time.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
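
To give an idea of what "computed entirely in software" means for divisions, the following sketch shows a classic shift-and-subtract (restoring) unsigned division in C. It is not the routine that a toolchain's runtime library actually uses; the function name `soft_divu` is hypothetical. The handling of a zero divisor follows the RISC-V `divu`/`remu` convention (quotient = all ones, remainder = dividend).

```c
#include <stdint.h>

/* Unsigned 32-bit restoring division: processes one dividend bit per
 * iteration, from the most significant to the least significant bit. */
static uint32_t soft_divu(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    if (divisor == 0) {
        *remainder = dividend; /* RISC-V convention for division by zero */
        return UINT32_MAX;
    }
    uint32_t quotient = 0;
    uint64_t rem = 0; /* wide enough for the shifted partial remainder */
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= UINT32_C(1) << i;
        }
    }
    *remainder = (uint32_t)rem;
    return quotient;
}
```

The loop always runs for 32 iterations, which also illustrates why a bit-serial divider needs a fixed number of steps regardless of the operands.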
 
==== **`U`** - Less-Privileged User Mode

In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second less-privileged
operation mode. It is implemented if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_.
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
Any kind of privilege rights violation will raise an exception to allow full virtualization.
 
==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.

The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupt_ request lines (`FIRQ`), which are controlled via custom bits in the `mie`
and `mip` CSRs. This extension is mapped to _reserved_ CSR bits that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for `mcause` are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).
 
568
 
569 63 zero_gravi
==== **`Zfinx`** Single-Precision Floating-Point Operations
570 60 zero_gravi
 
571 65 zero_gravi
The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
573
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
574
less hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
575
register file-related load/store or move instructions.
576
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
577 60 zero_gravi
 
578 65 zero_gravi
[TIP]
579 60 zero_gravi
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible to the _IEEE-754_ specifications.
580
 
581 65 zero_gravi
The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
582
to the `F` extension. The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
583 60 zero_gravi
generic is _true_. In this case the following instructions and CSRs are available:
584
 
585
* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
586
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
587
* computational: `fadd.s`, `fsub.s`, `fmul.s`
588
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
589
* number classification: `fclass.s`
590
 
591
* additional CSRs: `fcsr`, `frm`, `fflags`
592
 
593
[WARNING]
594
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
595
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
596
 
597
[WARNING]
598 65 zero_gravi
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
599
Subnormal numbers (exponent = 0) are _flushed to zero_ (set to +/- 0) before entering the
600 60 zero_gravi
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
601
result is also flushed to zero during normalization.
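
The flush-to-zero behavior can be sketched on the raw IEEE-754 single-precision bit pattern. This is a minimal host-side Python model of the rule stated above (exponent field = 0 becomes signed zero), not a description of the FPU's internal implementation; the function name is hypothetical.

```python
def flush_subnormal(bits):
    """Return the 32-bit float pattern after flush-to-zero.

    Any value with an all-zero exponent field (subnormals and zeros)
    is replaced by a zero that keeps the original sign bit.
    """
    exponent = (bits >> 23) & 0xFF
    if exponent == 0:
        return bits & 0x80000000  # keep the sign, clear the mantissa
    return bits
```

For example, the smallest positive subnormal `0x00000001` becomes `+0`, while `1.0` (`0x3F800000`) passes through unchanged.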
602
 
603
[WARNING]
604
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
605
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
606
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
607
code (see `sw/example/floating_point_test`).
608
 
609 63 zero_gravi
 
610 60 zero_gravi
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
611
 
612 65 zero_gravi
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
are implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
614
In this case the following instructions are available:
615 60 zero_gravi
 
616
* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
617
* environment: `mret`, `wfi`
618
 
619
[WARNING]
620 65 zero_gravi
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
621
In order to provide the full set of functions and to allow a secure execution
622
environment the `Zicsr` extension should always be enabled.
623 60 zero_gravi
 
624
[NOTE]
625
The "wait for interrupt instruction" `wfi` works like a sleep command. When executed, the CPU is
626
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
627
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
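
The wake-up condition described in this note can be written as a small predicate. This is a host-side Python sketch; it assumes the global machine interrupt enable flag is `mstatus` bit 3 (as defined by the RISC-V privileged specification) and that pending sources are read from `mip`. The function name is hypothetical.

```python
MSTATUS_MIE = 1 << 3  # global machine interrupt enable flag in mstatus

def wfi_wakes_up(mstatus, mie, mip):
    """True if a sleeping CPU would resume, following the note above:
    a pending source must be enabled in mie and the global flag must be set."""
    globally_enabled = bool(mstatus & MSTATUS_MIE)
    enabled_and_pending = (mie & mip) != 0
    return globally_enabled and enabled_and_pending
```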
628
 
629 65 zero_gravi
[NOTE]
630
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
631
`TW` (timeout wait) is hardwired to zero.
632 60 zero_gravi
 
633 62 zero_gravi
 
634 66 zero_gravi
 
635
 
636
==== **`Zicntr`** CPU Base Counters
637
 
638
The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.
642
 
643
[NOTE]
644
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.
645
 
646
If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
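
Because each 64-bit counter is split into a low-word and a high-word (`*h`) CSR, software usually reads the high word twice to detect a carry between the two reads. The following host-side Python sketch models that read sequence. `FakeCounter` is a hypothetical stand-in for the hardware counter and is not part of the NEORV32 software framework.

```python
def read_counter64(read_low, read_high):
    """Read a 64-bit counter via two 32-bit accessors without tearing."""
    while True:
        high = read_high()
        low = read_low()
        # If the high word changed, the low word overflowed in between: retry.
        if read_high() == high:
            return (high << 32) | low

class FakeCounter:
    """Hypothetical counter model that advances on every CSR read."""
    def __init__(self, value, step=1):
        self.value = value
        self.step = step

    def _tick(self):
        self.value = (self.value + self.step) & 0xFFFFFFFFFFFFFFFF

    def low(self):
        self._tick()
        return self.value & 0xFFFFFFFF

    def high(self):
        self._tick()
        return self.value >> 32
```

Usage: `read_counter64(counter.low, counter.high)` with `counter = FakeCounter(0x1FFFFFFFE)` retries once because the low word wraps during the first attempt.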
647
 
648
 
649
 
650
==== **`Zihpm`** Hardware Performance Monitors
651
 
652
In addition to the base cycle, instructions-retired and time counters, the NEORV32 CPU provides
653
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
654
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
655
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
656
CSR defines the architectural events that lead to an increment of the associated HPM counter.
657
 
658
The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.
659
 
660
Depending on the configuration the following additional CSRs are available:
661
 
662
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
663
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
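
The relation between the configured counter width and the two 32-bit counter CSRs can be sketched as follows. This host-side Python model assumes that counter bits above `HPM_CNT_WIDTH` read as zero; that detail is an assumption of this sketch and the function name is hypothetical.

```python
def hpm_split(count, width):
    """Split an HPM counter value into (low_word, high_word).

    width models the HPM_CNT_WIDTH generic (0..64). Bits above the
    configured width are assumed to read as zero.
    """
    if not 0 <= width <= 64:
        raise ValueError("width must be in 0..64")
    value = count & ((1 << width) - 1)
    return value & 0xFFFFFFFF, value >> 32
```

With a 40-bit counter, only the lowest 8 bits of the high-word CSR carry information; with a width of 32 or less the high word is always zero in this model.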
664
 
665
[IMPORTANT]
666
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
667
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
668
exception.
669
 
670
[TIP]
671
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
672
 
673
[TIP]
674
For a list of all HPM-related CSRs and all provided event configurations
675
see section <<_hardware_performance_monitors_hpm>>.
676
 
677
 
678 60 zero_gravi
==== **`Zifencei`** Instruction Stream Synchronization
679
 
680
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
681
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
682
 
683
* `fence.i`
684
 
685 66 zero_gravi
The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
686 64 zero_gravi
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
689 60 zero_gravi
 
690
 
691
==== **`PMP`** Physical Memory Protection
692
 
693 65 zero_gravi
The NEORV32 physical memory protection (PMP) is compatible to the RISC-V PMP specifications. It can be used
694
to constrain memory read/write/execute rights for each available privilege level.
695 60 zero_gravi
 
696 65 zero_gravi
The NEORV32 PMP currently supports only _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger
697
minimal sizes can be configured via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements.
698
The physical memory protection system is implemented when the `PMP_NUM_REGIONS` configuration generic is >0.
699
In this case the following additional CSRs are available:
700
 
701 60 zero_gravi
* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
702
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
703
 
704 65 zero_gravi
[TIP]
705 60 zero_gravi
See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.
706
 
707
The actual number of regions and the minimal region granularity are defined via the top entity
708
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
709
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
710
number of available `pmpcfg*` and `pmpaddr*` CSRs.
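
For orientation, the NAPOT (naturally aligned power-of-two) address encoding defined by the RISC-V privileged specification can be sketched in a few lines. This host-side Python model assumes a 32-bit `pmpaddr*` register holding address bits [33:2] and ignores `PMP_MIN_GRANULARITY` values larger than 8 bytes; the helper names are hypothetical.

```python
def napot_encode(base, size):
    """Return the pmpaddr value for a NAPOT region (size = power of two >= 8)."""
    if size < 8 or size & (size - 1):
        raise ValueError("size must be a power of two >= 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to the region size")
    # Address bits [33:2], with the low bits filled by a 0 followed by ones.
    return ((base >> 2) | ((size >> 3) - 1)) & 0xFFFFFFFF

def napot_decode(pmpaddr):
    """Return (base, size) encoded in a NAPOT pmpaddr value."""
    trailing_ones = 0
    while trailing_ones < 32 and (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    base = (pmpaddr & ~((1 << (trailing_ones + 1)) - 1)) << 2
    return base, size
```

For example, a 4 KiB region at `0x80000000` is encoded as `0x200001FF`.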
711
 
712
When implementing more PMP regions than a _certain critical limit_ *an additional register stage
713
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
714
increase the latency of instruction fetches and data access by +1 cycle.
715
 
716
The critical limit can be adapted for custom use by a constant from the main VHDL package file
717
(`rtl/core/neorv32_package.vhd`). The default value is 8:
718
 
719
[source,vhdl]
720
----
721
-- "critical" number of PMP regions --
722
constant pmp_num_regions_critical_c : natural := 8;
723
----
724
 
725
**Operation**
726
 
727 65 zero_gravi
Any CPU memory access address (from the instruction fetch or data access interface) is tested to check whether it accesses _any_
of the specified PMP regions (configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
address matches one of these regions, the configured access rights (attributes in `pmpcfg*`) are enforced:
730 60 zero_gravi
 
731
* a write access (store) will fail if no write attribute is set
732
* a read access (load) will fail if no read attribute is set
733
* an instruction fetch access will fail if no execute attribute is set
734
 
735 65 zero_gravi
If an access to a protected region does not have the according access rights it will raise the according
736
instruction/load/store _access fault_ exception.
737 60 zero_gravi
 
738
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
739 65 zero_gravi
memory protection also for machine-level programs you need to set the _locked bit_ in the according
740
`pmpcfg*` configuration CSR.
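
The access rules above can be summarized for a single matching region. This host-side Python sketch uses the `pmpcfg` bit positions from the RISC-V privileged specification (R = bit 0, W = bit 1, X = bit 2, L = bit 7). It only covers the case where the address already matched a region; the function name is hypothetical.

```python
PMP_R = 1 << 0  # read permission
PMP_W = 1 << 1  # write permission
PMP_X = 1 << 2  # execute permission
PMP_L = 1 << 7  # locked: rules also apply to machine-mode

def pmp_access_allowed(cfg, access, machine_mode):
    """Check one matching region. access is 'load', 'store' or 'fetch'."""
    if machine_mode and not (cfg & PMP_L):
        # By default the checks are enforced for user-level programs only.
        return True
    required = {"load": PMP_R, "store": PMP_W, "fetch": PMP_X}[access]
    return bool(cfg & required)
```

A `False` result corresponds to the instruction/load/store access fault exception described above.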
741 60 zero_gravi
 
742
[IMPORTANT]
743
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
744
internal (iterative) computations before the configuration becomes valid.
745
 
746
[NOTE]
747
For more information regarding RISC-V physical memory protection see the official _The RISC-V
748 65 zero_gravi
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
749 60 zero_gravi
 
750
 
751
 
752
<<<
753
// ####################################################################################################################
754
:sectnums:
755
=== Instruction Timing
756
 
757
The instruction timing listed in the table below shows the required clock cycles for executing a certain
758
instruction. These instruction cycles assume a bus access without additional wait states and a filled
759
pipeline.
760
 
761
Average CPI (cycles per instruction) values for "real applications" like for executing the CoreMark benchmark for different CPU
762
configurations are presented in <<_cpu_performance>>.
763
 
764
.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 3 + ML
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - arithmetic/logic | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
|=======================
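
Two of the data-dependent entries from the table can be evaluated with a short helper. This host-side Python sketch assumes that `SA/4` in the table denotes integer division and that `ML` is given in cycles; the function names are hypothetical.

```python
def shift_cycles(shift_amount, fast_shift=False):
    """Cycles for the I/E shift instructions (slli, srli, srai, sll, srl, sra)."""
    if not 0 <= shift_amount <= 31:
        raise ValueError("shift amount must be in 0..31")
    if fast_shift:
        return 4  # barrel shifter (FAST_SHIFT_EN)
    return 3 + shift_amount // 4 + shift_amount % 4

def branch_cycles(taken, memory_latency):
    """Cycles for a conditional branch; memory_latency is ML from the table."""
    return 5 + memory_latency if taken else 3
```

For example, a shift by 31 takes 3 + 7 + 3 = 13 cycles with the default shifter, compared to 4 cycles with the barrel shifter.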
803
 
804
[NOTE]
805 65 zero_gravi
The presented values of the *floating-point execution cycles* are average values - obtained from
806 60 zero_gravi
4096 instruction executions using pseudo-random input values. The execution time for emulating the
807
instructions (using pure-software libraries) is ~17..140 times higher.
808
 
809
 
810 66 zero_gravi
<<<
811 60 zero_gravi
// ####################################################################################################################
812
include::cpu_csr.adoc[]
813
 
814
 
815
<<<
816
// ####################################################################################################################
817
:sectnums:
818
==== Traps, Exceptions and Interrupts
819
 
820 61 zero_gravi
In this document the following nomenclature regarding traps is used:
821 60 zero_gravi
 
822 64 zero_gravi
* _interrupts_ = asynchronous exceptions
823 60 zero_gravi
* _exceptions_ = synchronous exceptions
824
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)
825
 
826 61 zero_gravi
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in `mtvec`
827
CSR. The cause of the according interrupt or exception can be determined via the content of `mcause`
828
CSR. The address that reflects the current program counter when a trap was taken is stored to `mepc` CSR.
829
Additional information regarding the cause of the trap can be retrieved from `mtval` CSR.
830 60 zero_gravi
 
831 61 zero_gravi
The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
832
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
833 64 zero_gravi
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
834 61 zero_gravi
the second highest priority will get serviced and so on until no further interrupts are pending.
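
The servicing order of simultaneously pending interrupts can be modeled as follows. This host-side Python sketch takes the priority order from the NEORV32 trap listing in this document (FIRQ 0..15, then MEI, MSI, MTI) and assumes that each source occupies the `mip`/`mie` bit with the same index as its `mcause` code. It also assumes that every handler acknowledges its own source. All names are hypothetical.

```python
# mcause interrupt codes in descending priority:
# FIRQ0..FIRQ15 (16..31), machine external (11), software (3), timer (7).
PRIORITY_ORDER = list(range(16, 32)) + [11, 3, 7]

def next_interrupt(pending, enabled):
    """Return the code of the highest-priority enabled pending interrupt."""
    active = pending & enabled
    for code in PRIORITY_ORDER:
        if active & (1 << code):
            return code
    return None

def service_order(pending, enabled):
    """Return the order in which the pending interrupts get serviced."""
    order = []
    code = next_interrupt(pending, enabled)
    while code is not None:
        order.append(code)
        pending &= ~(1 << code)  # handler acknowledges the request
        code = next_interrupt(pending, enabled)
    return order
```

With timer, external, software and FIRQ 1 requests pending at once, the model yields the order FIRQ 1, external, software, timer.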
835 60 zero_gravi
 
836 65 zero_gravi
.Interrupt Signal Requirements
837 61 zero_gravi
[IMPORTANT]
838 65 zero_gravi
All interrupt request signals (including FIRQs) are **high-active**. A request has to stay at high-level (=asserted)
839
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).
840 60 zero_gravi
 
841 61 zero_gravi
.Instruction Atomicity
842
[NOTE]
843 65 zero_gravi
All instructions execute as atomic operations - interrupts can only trigger between two instructions.
844 64 zero_gravi
So if there is a permanent interrupt request, exactly one instruction from the interrupt program will be executed before
845
a new interrupt handler can start.
846 60 zero_gravi
 
847
 
848 61 zero_gravi
:sectnums:
849
==== Memory Access Exceptions
850 60 zero_gravi
 
851 61 zero_gravi
If a load operation causes any exception, the instruction's destination register is
852
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
853
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
854 64 zero_gravi
memory protection fault do not trigger a bus write-operation at all.
855 60 zero_gravi
 
856
 
857 61 zero_gravi
:sectnums:
858
==== Custom Fast Interrupt Request Lines
859 60 zero_gravi
 
860 61 zero_gravi
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
861 60 zero_gravi
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
862 65 zero_gravi
provide custom trap codes in `mcause`. These FIRQs are reserved for NEORV32 processor-internal usage only.
863 60 zero_gravi
 
864
 
865
 
866
<<<
867
// ####################################################################################################################
868
:sectnums!:
869
===== NEORV32 Trap Listing
870
 
871
.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (EBREAK) | _PC_ | _PC_
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
|=======================
906
 
907
**Notes**
908
 
909
The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
910
cause ID of the according trap that is written to `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
911
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
912
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to
913
`mepc` and `mtval` CSRs when a trap is triggered:
914
 
915
* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
917
* _PC_ - address of instruction that caused the trap
918
* _0_ - zero
919
* _Inst_ - the faulting instruction itself
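
The relation between the "`mcause`" and "[RISC-V]" columns can be reproduced with a small decoder: the most significant bit selects interrupt (1) or exception (0) and the remaining bits hold the code. This is a host-side Python sketch; the helper names are hypothetical and are not part of `neorv32.h`.

```python
def decode_mcause(mcause):
    """Split a 32-bit mcause value into (is_interrupt, code)."""
    is_interrupt = (mcause >> 31) & 1
    code = mcause & 0x7FFFFFFF
    return is_interrupt, code

def riscv_notation(mcause):
    """Format mcause like the "[RISC-V]" column, for example '1.16'."""
    is_interrupt, code = decode_mcause(mcause)
    return f"{is_interrupt}.{code}"

def firq_channel(mcause):
    """Return the FIRQ channel (0..15) or None if mcause is not a FIRQ."""
    is_interrupt, code = decode_mcause(mcause)
    if is_interrupt == 1 and 16 <= code <= 31:
        return code - 16
    return None
```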
920
 
921
 
922
 
923
<<<
924
// ####################################################################################################################
925
:sectnums:
926
==== Bus Interface
927
 
928
The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for
929
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.
930
 
931
:sectnums:
932
===== Address Space
933
 
934
The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
935
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
936
system is based on 32-bit words with a minimal granularity of 1 byte. Please note, that the NEORV32 CPU
937 65 zero_gravi
does not support unaligned memory accesses _in hardware_ - however, a software-based handling can be
938 60 zero_gravi
implemented as any unaligned memory access will trigger an according exception.
939
 
940
:sectnums:
941
===== Interface Signals
942
 
943
The following table shows the signals of the data and instruction interfaces seen from the CPU
944
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).
945
 
946
.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================
963
 
964
[NOTE]
965
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
966
So only a single transfer request can be "on the fly".
967
 
968
:sectnums:
969
===== Protocol
970
 
971
A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
972
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
973
completed when the accessed peripheral either sets the `bus_ack_i` signal (-> successful completion) or the
974
`bus_err_i` signal is set (-> failed completion). All these control signals are only active (= high) for one
975
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the according instruction bus
976
access fault or load/store bus access fault exception.
977
 
978
[NOTE]
979
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
980
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
981
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.
982
 
983
.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
984
[IMPORTANT]
985
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
986
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
987
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
988
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will timeout and raise a **bus fault exception**.
989 66 zero_gravi
The _BUSKEEPER_ hardware module (see section <<_internal_bus_monitor_buskeeper>>) keeps track of all _internal_ bus transactions. If any bus operation times out
990 60 zero_gravi
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the according instruction fetch or data access bus exception.
991
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
992
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
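
The three possible outcomes of a processor-internal bus transaction (acknowledge, error, timeout) can be sketched with a cycle-based model. This host-side Python sketch counts cycles starting with 1 for the cycle after the request. The exact cycle in which the bus keeper fires is an assumption of this sketch, and the function and parameter names are hypothetical.

```python
MAX_PROC_INT_RESPONSE_TIME = 15  # default of max_proc_int_response_time_c

def bus_transaction(ack_cycle=None, err_cycle=None,
                    timeout=MAX_PROC_INT_RESPONSE_TIME):
    """Return (result, cycle) for one transfer.

    ack_cycle / err_cycle: cycle in which the peripheral raises
    bus_ack_i / bus_err_i (None = never).
    """
    for cycle in range(1, timeout + 1):
        if err_cycle == cycle:
            return ("error", cycle)   # raises a bus access fault exception
        if ack_cycle == cycle:
            return ("ack", cycle)     # successful completion
    # No response within the time window: the bus keeper issues a bus error.
    return ("timeout", timeout)
```

An access to an "address space hole" corresponds to `bus_transaction()` without any response, which ends in a timeout after the configured number of cycles.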
993
 
994
**Exemplary Bus Accesses**
995
 
996
.Example bus accesses: see read/write access description below
997
[cols="^2,^2"]
998
[grid="none"]
999
|=======================
1000
a| image::cpu_interface_read_long.png[read,300,150]
1001
a| image::cpu_interface_write_long.png[write,300,150]
1002
| Read access | Write access
1003
|=======================
1004
 
1005
**Write Access**
1006
 
1007
For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
1008
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
1009
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
1010
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
1011
cycles after issuing.
1012
 
1013
**Read Access**
1014
 
1015
For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
1016
stable until the transaction is completed. In the example the accessed peripheral cannot answer
1017
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
1018
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
1019
signal).
1020
 
1021
**Access Boundaries**
1022
 
1023
The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
1024
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
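
The boundary rules can be expressed as two small helpers. This host-side Python sketch assumes natural alignment (the address is a multiple of the access size); the function names are hypothetical.

```python
def is_aligned(address, size_bytes):
    """True if a data access of 1, 2 or 4 bytes is naturally aligned."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("access size must be 1, 2 or 4 bytes")
    return address % size_bytes == 0

def fetch_word_address(pc):
    """Word-aligned address used by the instruction interface for a given PC."""
    return pc & ~0x3 & 0xFFFFFFFF
```

A compressed instruction located at `0x1002` is therefore fetched via the word at `0x1000`, while a word load from `0x1002` is unaligned and triggers the according exception.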
1026
 
1027
**Exclusive (Atomic) Access**
1028
 
1029
The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
1030
combination. Normally, these combinations should target the same memory address.
1031
 
1032
The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
1033
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o`. It is the task of
1034
the memory system to manage this exclusive access reservation by storing the according access address and
1035
the source of the access itself (for example via the CPU ID in a multi-core system).
1036
 
1037
When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
1038
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
1039
zero and will allow the according store operation to the memory system. If the lock is broken, the
1040
instruction will write-back non-zero and will not generate an actual memory store operation.
1041
 
1042
The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:
1043
 
1044
* when executing any other memory-access operation than `lr.w`
1045
* when any trap (sync. or async.) is triggered (for example to force a context switch)
1046
* when the memory system signals a bus error (via the `bus_err_i` signal)
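
The behavior of the CPU-internal exclusive access lock can be summarized as a tiny state machine. This host-side Python sketch follows the rules listed above. It treats `sc.w` itself as a memory access other than `lr.w`, so the lock is cleared after it is evaluated. The class and method names are hypothetical.

```python
class ExclusiveLock:
    """Model of the CPU-internal exclusive access lock."""

    def __init__(self):
        self.locked = False

    def lr(self):
        """lr.w sets the lock."""
        self.locked = True

    def other_memory_access(self):
        """Any memory access other than lr.w breaks the lock."""
        self.locked = False

    def trap(self):
        """Any trap (synchronous or asynchronous) breaks the lock."""
        self.locked = False

    def bus_error(self):
        """A bus error signaled by the memory system breaks the lock."""
        self.locked = False

    def sc(self):
        """sc.w returns 0 if the store is performed, non-zero otherwise."""
        success = self.locked
        self.locked = False
        return 0 if success else 1
```

An `lr.w` directly followed by `sc.w` succeeds in this model, whereas a trap between the two instructions makes the `sc.w` fail.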
1047
 
1048
[TIP]
1049
For more information regarding the SoC-level behavior and requirements of atomic operations see
1050
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
1051
 
1052
**Memory Barriers**
1053
 
1054
Whenever the CPU executes a fence instruction, the according interface signal is set high for one cycle
1055
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
1056
memory system to perform the necessary operations (like a cache flush and refill).
1057
 
1058
 
1059
 
1060
<<<
1061
// ####################################################################################################################
1062
:sectnums:
1063
==== CPU Hardware Reset
1064
 
1065
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
1066
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
1067
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
1068
after power-up is not relevant for a defined CPU boot process.
1069
 
**Rationale**
 
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a
write-back of the processing result to some kind of memory. The initial state of the data registers after power-up
is irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data
in the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do
not control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
 
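A single stage of such a pipelined engine could look like the following sketch. It is a generic illustration and
not taken from the NEORV32 sources; the entity, generic and port names are assumptions. Only the status register
is part of the reset network.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical pipeline stage: 1-bit status register with reset,
-- N-bit data register without reset ("uncritical register").
entity pipe_stage_sketch is
  generic (
    DATA_WIDTH : natural := 32 -- N, width of the data register
  );
  port (
    clk_i   : in  std_ulogic; -- clock, rising edge
    rstn_i  : in  std_ulogic; -- asynchronous reset, low-active
    valid_i : in  std_ulogic; -- data from the previous stage is valid
    data_i  : in  std_ulogic_vector(DATA_WIDTH-1 downto 0);
    valid_o : out std_ulogic; -- status register output
    data_o  : out std_ulogic_vector(DATA_WIDTH-1 downto 0)
  );
end entity pipe_stage_sketch;

architecture rtl of pipe_stage_sketch is
begin

  -- status register: controls the operation, so it needs a defined reset value
  status_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      valid_o <= '0'; -- no valid data in the pipeline after reset
    elsif rising_edge(clk_i) then
      valid_o <= valid_i;
    end if;
  end process status_reg;

  -- data register: its content is ignored while valid_o is low, so no reset is required
  data_reg: process(clk_i)
  begin
    if rising_edge(clk_i) then
      data_o <= data_i;
    end if;
  end process data_reg;

end architecture rtl;
----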
**NEORV32 CPU Reset**
 
Within the NEORV32 CPU, there are several pipeline registers, state machine registers and even control
and status registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers are initialized by the CPU’s internal state machines, which in turn are initialized by the main
control engine that does feature a defined reset. Most of the CPU's core CSRs (like interrupt control) are
initialized by software (more specifically, by the `crt0.S` start-up code).
 
During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example, the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The value of this register after reset is uncritical: interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated
hardware reset that sets it to low (globally disabling interrupts).
 
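The `mie` example can be reduced to the following sketch. It is a simplified model and not the actual NEORV32 CSR
logic: the entity and all port names are assumptions, and only the bit position of the global enable flag (bit 3 of
`mstatus`) is taken from the RISC-V privileged specification.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model: mstatus(mie) has a hardware reset, the mie CSR has none.
entity irq_gate_sketch is
  port (
    clk_i        : in  std_ulogic; -- clock, rising edge
    rstn_i       : in  std_ulogic; -- asynchronous reset, low-active
    mstatus_we_i : in  std_ulogic; -- software writes mstatus
    mie_we_i     : in  std_ulogic; -- software writes mie
    wdata_i      : in  std_ulogic_vector(31 downto 0); -- CSR write data
    mip_i        : in  std_ulogic_vector(31 downto 0); -- pending interrupts
    irq_fire_o   : out std_ulogic  -- an enabled interrupt is pending
  );
end entity irq_gate_sketch;

architecture rtl of irq_gate_sketch is
  constant zero_c    : std_ulogic_vector(31 downto 0) := (others => '0');
  signal mstatus_mie : std_ulogic; -- global interrupt enable flag
  signal mie         : std_ulogic_vector(31 downto 0); -- individual interrupt enables
begin

  -- critical: reset to low, so interrupts are globally disabled after reset
  global_enable: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      mstatus_mie <= '0';
    elsif rising_edge(clk_i) then
      if (mstatus_we_i = '1') then
        mstatus_mie <= wdata_i(3); -- MIE is bit 3 of mstatus
      end if;
    end if;
  end process global_enable;

  -- uncritical: no reset, content is undefined until software writes it
  individual_enable: process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (mie_we_i = '1') then
        mie <= wdata_i;
      end if;
    end if;
  end process individual_enable;

  -- the undefined mie value is masked as long as mstatus_mie is low
  irq_fire_o <= '1' when (mstatus_mie = '1') and ((mie and mip_i) /= zero_c) else '0';

end architecture rtl;
----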
**Reset Configuration**
 
Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used as the reset value of the uncritical registers, which effectively generates flip-flops without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):
 
[source,vhdl]
----
-- reset configuration of "uncritical" registers --
-- FALSE = reset value is irrelevant (might simplify HW), default
-- TRUE  = defined LOW reset value
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value for UNCRITICAL registers
----
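The following sketch illustrates how such a configuration switch could select between the "don't care" value and
a defined reset-to-low value. It is not copied from the NEORV32 sources: the entity, the helper function
`rst_val_f` and the generic `DEDICATED_RESET` (which stands in for `dedicated_reset_c` to keep the example
self-contained) are assumptions.

[source,vhdl]
----
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical uncritical register with a configurable reset value.
entity uncrit_reg_sketch is
  generic (
    DEDICATED_RESET : boolean := false -- stand-in for dedicated_reset_c
  );
  port (
    clk_i  : in  std_ulogic; -- clock, rising edge
    rstn_i : in  std_ulogic; -- asynchronous reset, low-active
    d_i    : in  std_ulogic_vector(31 downto 0);
    q_o    : out std_ulogic_vector(31 downto 0)
  );
end entity uncrit_reg_sketch;

architecture rtl of uncrit_reg_sketch is

  -- hypothetical helper: select the reset value of uncritical registers
  function rst_val_f(dedicated : boolean) return std_ulogic is
  begin
    if dedicated then
      return '0'; -- defined LOW reset value
    else
      return '-'; -- don't care: synthesis may drop the reset
    end if;
  end function rst_val_f;

  constant rst_val_c : std_ulogic := rst_val_f(DEDICATED_RESET);

begin

  reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      q_o <= (others => rst_val_c);
    elsif rising_edge(clk_i) then
      q_o <= d_i;
    end if;
  end process reg;

end architecture rtl;
----

With `DEDICATED_RESET => true` every bit of `q_o` is reset to `'0'`, which gives the deterministic reset state
that gate-level or timing simulations might need.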
