:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]

**Key Features**

* 32-bit multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zicntr` - CPU base counters
** `Zihpm` - hardware performance monitors
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `Debug` - debug mode
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
* Supports _all_ of the machine-level traps from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
** This is a special aspect of _execution safety_ provided by <<_full_virtualization>>
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separated interfaces for instruction fetch and data access (merged into a single processor bus)
* Little-endian byte order
* Configurable hardware reset
* No hardware support for unaligned data/instruction accesses - they will trigger an exception.

[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
front-end can already fetch new instructions while the back-end is still processing previously fetched instructions.

The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
data is stored in a FIFO queue - the instruction prefetch buffer.

The back-end is responsible for the actual execution of the instructions. It includes an "issue engine",
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
instructions or decompressed 16-bit instructions) for execution.
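
The first decision of such an issue engine is how wide the next instruction is. In the standard RISC-V encoding an instruction whose two lowest bits are `11` is 32 bits wide, and any other value marks a 16-bit compressed instruction. The following C model is a minimal sketch of that step and not the actual VHDL implementation: the names `insn_is_compressed` and `issue_next` as well as the half-word buffer layout are assumptions made for illustration, and the decompression of 16-bit instructions is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard RISC-V rule: bits [1:0] == 0b11 marks a 32-bit instruction,
 * any other value marks a 16-bit compressed instruction. */
static int insn_is_compressed(uint16_t low_half)
{
    return (low_half & 0x3u) != 0x3u;
}

/* Hypothetical issue step: read the instruction that starts at half-word
 * index *pos of buf (n half-words, lower half-word first) and advance *pos.
 * Returns the instruction width in bits (16 or 32) and stores the raw
 * instruction bits in *word. Returns 0 and changes nothing if the buffer
 * does not hold the complete instruction yet. */
static int issue_next(const uint16_t *buf, size_t n, size_t *pos, uint32_t *word)
{
    if (*pos >= n) return 0;
    uint16_t lo = buf[*pos];
    if (insn_is_compressed(lo)) {
        *word = lo;
        *pos += 1;
        return 16;
    }
    if (*pos + 1 >= n) return 0;  /* upper half-word not fetched yet */
    *word = ((uint32_t)buf[*pos + 1] << 16) | lo;
    *pos += 2;
    return 32;
}
```

Working on half-words mirrors the case described above where a 32-bit instruction is not aligned to a 32-bit boundary: its two halves then arrive in different 32-bit fetch chunks.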

Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
when the CPU front-end has to reload the prefetch buffer due to a taken branch.

Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
these two classical design paradigms allows a higher instruction throughput than a pure multi-cycle
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
multi-cycle concept).

As a von Neumann machine, the CPU provides independent interfaces for instruction fetch and data access.
These two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
have higher priority). Hence, ALL memory locations including peripheral devices are mapped to a single unified 32-bit
address space.


// ####################################################################################################################
:sectnums:
=== Full Virtualization

Just like the RISC-V ISA, the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU _and_ SoC level to
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.
footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
malformed instruction word or accessing a non-allocated memory address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
might have to be reverted). This allows predictable execution behavior at any time, improving overall _execution safety_.

**Execution Safety - NEORV32 Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
accessed address does not respond or encounters an internal error during access.
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
window. Otherwise a bus access exception is raised.
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions raise an
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
memory operations).
* To be continued...
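
To make the trap-based fall-backs more tangible, the following C sketch maps the standard synchronous exception codes of the RISC-V privileged specification (as reported in `mcause`) to short descriptions, the way a simple machine-mode handler might log them. The function name `trap_cause_name` is hypothetical and not part of the NEORV32 software framework; interrupts and NEORV32-specific custom trap codes are deliberately not decoded.

```c
#include <stdint.h>

/* Hypothetical helper: describe a synchronous exception by its mcause value.
 * The codes are the standard ones from the RISC-V privileged specification.
 * A set most significant bit marks an interrupt, which is not decoded here. */
static const char *trap_cause_name(uint32_t mcause)
{
    if (mcause & 0x80000000u) return "interrupt (not decoded here)";
    switch (mcause) {
        case 0:  return "instruction address misaligned";
        case 1:  return "instruction access fault";
        case 2:  return "illegal instruction";
        case 3:  return "breakpoint";
        case 4:  return "load address misaligned";
        case 5:  return "load access fault";
        case 6:  return "store address misaligned";
        case 7:  return "store access fault";
        case 8:  return "environment call from U-mode";
        case 11: return "environment call from M-mode";
        default: return "unknown or custom cause";
    }
}
```

Every situation named in the list above ends up in one of these cases: a malformed instruction is reported as cause 2, and an address that does not respond in time is reported as one of the access faults.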


// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the `rv32_m/I`, `rv32_m/M`, `rv32_m/C`, `rv32_m/privilege`, and
`rv32_m/Zifencei` tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.

[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.

.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................

.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................

.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................

.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................

.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................


<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently identified issues regarding full RISC-V compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.

.Hardwired R/W CSRs
[IMPORTANT]
The `misa`, `mip` and `mtval` CSRs in the NEORV32 are _read-only_.
Any write access to them (in machine mode) is ignored and will _not_ cause any exceptions or side effects.
Pending interrupts can only be cleared by acknowledging the interrupt-causing device. However, pending interrupts
can still be ignored by clearing the corresponding `mie` register bits.
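
Because `mip` cannot be written, masking an interrupt source happens in `mie`. The sketch below computes a new `mie` value with selected sources disabled. The bit positions of the three standard machine interrupts (MSIE = 3, MTIE = 7, MEIE = 11) are taken from the RISC-V privileged specification. Placing the 16 fast interrupt enables at bits 16 to 31 is an assumption about the NEORV32 layout that should be checked against the CSR description, and the helper names are hypothetical. On the target, the result would be written back to `mie` with a CSR write instruction.

```c
#include <stdint.h>

/* Standard machine-level interrupt enable bits in mie. */
#define MIE_MSIE (1u << 3)   /* machine software interrupt */
#define MIE_MTIE (1u << 7)   /* machine timer interrupt    */
#define MIE_MEIE (1u << 11)  /* machine external interrupt */

/* ASSUMPTION: fast interrupt channel n (0..15) is enabled by mie bit 16+n. */
static uint32_t mie_firq_bit(unsigned channel)
{
    return 1u << (16u + (channel & 15u));
}

/* Return the mie value with every source in 'sources' disabled;
 * all other enable bits are left untouched. */
static uint32_t mie_disable(uint32_t mie, uint32_t sources)
{
    return mie & ~sources;
}
```

A disabled source may still show up as pending in `mip`; it simply no longer causes a trap until its enable bit is set again.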

.Physical memory protection
[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
currently supports only the modes _OFF_ and _NAPOT_, with a minimal granularity of 8 bytes per region.
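
For reference, a NAPOT (naturally aligned power-of-two) region is encoded in a `pmpaddr` CSR as defined by the RISC-V privileged specification: the register holds the address shifted right by two bits, and the number of trailing one bits selects the region size. The following sketch computes that value for a 32-bit address. `pmp_napot_encode` is a hypothetical helper, and its 8-byte lower bound mirrors the granularity stated above.

```c
#include <stdint.h>

/* Hypothetical helper: compute the pmpaddr value for a NAPOT region.
 * 'size' must be a power of two of at least 8 bytes and 'base' must be
 * aligned to 'size'. Sets *ok to 1 on success; on invalid arguments
 * sets *ok to 0 and returns 0. */
static uint32_t pmp_napot_encode(uint32_t base, uint32_t size, int *ok)
{
    int valid = (size >= 8u) && ((size & (size - 1u)) == 0u) &&
                ((base & (size - 1u)) == 0u);
    *ok = valid;
    if (!valid) return 0;
    /* Fill the address bits below the top bit of the region offset with
     * ones, then drop address bits 1:0 (pmpaddr holds address >> 2). */
    return (base | ((size >> 1) - 1u)) >> 2;
}
```

An 8-byte region therefore yields a value whose lowest bit is zero, a 16-byte region a value ending in `01`, a 32-byte region a value ending in `011`, and so on.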

.Atomic memory operations
[IMPORTANT]
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.

.Bit-manipulation operations
[IMPORTANT]
The NEORV32 `B` extension currently implements only the _basic bit-manipulation instructions_ (`Zbb`) subset
and the _address generation instructions_ (`Zba`) subset.

.Instruction Misalignment
[NOTE]
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
is raised if bit 1 of the access address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
there will be no misaligned instruction exceptions _at all_.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
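
The rule can be written down as a small predicate. This is an illustrative C model of the behavior described in the note, with a hypothetical function name, and not code taken from the project.

```c
#include <stdint.h>

/* Model of the rule above. Bit 0 of the target is treated as hardwired to
 * zero, so only bit 1 can make a target misaligned, and only if the
 * C extension is absent. Returns 1 if a misaligned instruction exception
 * would be raised, 0 otherwise. */
static int instr_target_misaligned(uint32_t target, int has_c_ext)
{
    uint32_t pc = target & ~1u;  /* bit 0 always reads as zero */
    if (has_c_ext) return 0;     /* 16-bit alignment is always met */
    return (pc & 2u) != 0u;
}
```

For example, a jump to `0x102` is misaligned without the `C` extension but legal with it, while a target of `0x101` is treated like `0x100` in both configurations.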

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.

.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir.   | Function
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out | destination address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out | destination address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input (from MTIME)
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics
(see section <<_processor_top_entity_generics>>) and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This generic defines the address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This generic defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======


<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see the _RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

[TIP]
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
or by executing an instruction and checking for an _illegal instruction exception_.
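
As a sketch of the `misa`-based discovery: the RISC-V privileged specification assigns extension letter `A` to bit 0, `B` to bit 1 and so on up to `Z` at bit 25. The helper below tests one letter in a `misa` value that has already been read; the function name is hypothetical, and reading the CSR itself (a `csrr` instruction on the target) is outside the scope of this example.

```c
#include <stdint.h>

/* In misa, extension letter 'A'..'Z' is reported in bit 0..25
 * (RISC-V privileged specification). Returns 1 if the bit for
 * 'letter' is set, 0 otherwise. Lower-case letters are accepted. */
static int misa_has_extension(uint32_t misa, char letter)
{
    if (letter >= 'a' && letter <= 'z') letter = (char)(letter - 'a' + 'A');
    if (letter < 'A' || letter > 'Z') return 0;
    return (int)((misa >> (letter - 'A')) & 1u);
}
```

Only single-letter extensions have a bit in `misa`. Multi-letter extensions such as `Zicsr` or `Zfinx` cannot be detected this way, which is where the other two methods mentioned in the tip come in.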

[NOTE]
Executing an instruction from an extension that is not supported yet or that is currently not enabled
(via the corresponding top entity generic) will raise an _illegal instruction_ exception.


==== **`A`** - Atomic Memory Access

Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
The RISC-V specs define a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
ISA extension is enabled if the `CPU_EXTENSION_RISCV_A` configuration generic is _true_.
In this case the following additional instructions are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
(read-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.

The *load-reserved* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
_data memory access lock_. Executing a *store-conditional* behaves as a "normal" store-word instruction (`sw`) that will
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
After the execution of the `sc.w` instruction, the lock is automatically removed.

The lock is broken if at least one of the following conditions occurs:

. executing any data memory access instruction other than `lr.w`
. raising _any_ trap (for example an interrupt or a memory access exception)
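
The lock semantics described above, and the way a further atomic operation can be emulated on top of them, can be captured in a small behavioral model. The C code below is a sketch for reasoning about the protocol on a host machine. All names are illustrative, it is not the VHDL implementation, and real target code would use the `lr.w` and `sc.w` instructions instead of these functions.

```c
#include <stdint.h>

/* Minimal behavioral model of the data memory access lock. */
typedef struct {
    int lock;  /* 1 while the lock is intact */
} lrsc_model_t;

/* lr.w: ordinary word load that also sets the lock. */
static uint32_t model_lr_w(lrsc_model_t *m, const uint32_t *addr)
{
    m->lock = 1;
    return *addr;
}

/* Any other data memory access or any trap breaks the lock. */
static void model_break_lock(lrsc_model_t *m)
{
    m->lock = 0;
}

/* sc.w: writes only if the lock is intact, always removes the lock,
 * returns 0 on success and non-zero on failure. */
static uint32_t model_sc_w(lrsc_model_t *m, uint32_t *addr, uint32_t value)
{
    uint32_t status = m->lock ? 0u : 1u;
    if (m->lock) *addr = value;
    m->lock = 0;
    return status;
}

/* Emulated atomic add: retry until the store-conditional succeeds.
 * 'disturb' (may be NULL) lets a test inject one lock-breaking event. */
static uint32_t model_amoadd_w(lrsc_model_t *m, uint32_t *addr, uint32_t inc,
                               int *disturb)
{
    uint32_t old;
    do {
        old = model_lr_w(m, addr);
        if (disturb && *disturb) { model_break_lock(m); *disturb = 0; }
    } while (model_sc_w(m, addr, old + inc) != 0u);
    return old;
}
```

The retry loop is the essential part: if an interrupt or another memory access breaks the lock between the load and the store, the store is suppressed and the whole read-modify-write sequence is repeated.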

[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.


==== **`B`** - Bit-Manipulation Operations

The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
`CPU_EXTENSION_RISCV_B` configuration generic is _true_.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip

[IMPORTANT]
The NEORV32 `B` extension currently implements only the _basic bit-manipulation instructions_ (`Zbb`) subset
and the _address generation instructions_ (`Zba`) subset.

The `Zbb` sub-extension adds the following instructions:

* `andn`, `orn`, `xnor`
* `clz`, `ctz`, `cpop`
* `max`, `maxu`, `min`, `minu`
* `sext.b`, `sext.h`, `zext.h`
* `rol`, `ror`, `rori`
* `orc.b`, `rev8`

The `Zba` sub-extension adds the following instructions:

* `sh1add`, `sh2add`, `sh3add`
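
For readers unfamiliar with these mnemonics, the following plain C functions sketch what a representative selection of them computes on 32-bit operands. They are software reference models written for this explanation, with `ref_` names chosen here, and they are unrelated to the intrinsic library shipped with the project.

```c
#include <stdint.h>

/* clz: count leading zero bits; yields 32 for an all-zero operand. */
static uint32_t ref_clz(uint32_t x)
{
    uint32_t n = 0;
    while (n < 32u && !(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

/* cpop: count set bits. */
static uint32_t ref_cpop(uint32_t x)
{
    uint32_t n = 0;
    for (; x != 0u; x &= x - 1u) n++;
    return n;
}

/* rol: rotate left by the lower five bits of the shift amount. */
static uint32_t ref_rol(uint32_t x, uint32_t shamt)
{
    shamt &= 31u;
    return shamt ? (x << shamt) | (x >> (32u - shamt)) : x;
}

/* andn: rs1 AND (NOT rs2). */
static uint32_t ref_andn(uint32_t rs1, uint32_t rs2) { return rs1 & ~rs2; }

/* rev8: reverse the byte order of the word. */
static uint32_t ref_rev8(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* orc.b: each byte becomes 0xFF if any of its bits is set, else 0x00. */
static uint32_t ref_orc_b(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++)
        if ((x >> (8 * i)) & 0xFFu) r |= 0xFFu << (8 * i);
    return r;
}

/* sh2add (Zba): (rs1 << 2) + rs2, e.g. the address of a word array element. */
static uint32_t ref_sh2add(uint32_t rs1, uint32_t rs2) { return (rs1 << 2) + rs2; }
```

Each of these takes several base-ISA instructions or a loop when compiled for plain `rv32i`, which is the motivation for providing them as single instructions.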

[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement fully parallel logic (like barrel shifters) for all
shift-related `B` instructions.

[WARNING]
The `B` extension is frozen but not officially ratified yet. There is no
software support for this extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to use the `B` extension features from C-language
code (see `sw/example/bitmanip_test`).


==== **`C`** - Compressed Instructions

The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is _true_.
In this case the following instructions are available:

* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the corresponding second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).


==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
decrease physical hardware requirements (for example block RAM). This extension is enabled when the `CPU_EXTENSION_RISCV_E`
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
This extension does not add any additional instructions or features.

[IMPORTANT]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.


==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediate: `lui`, `auipc`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
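
The dependence on the shift amount is easy to see in a software model of a bit-serial shifter. The sketch below is an assumption about the general technique, not a description of the actual hardware: it moves the operand by one position per iteration and reports the iteration count. The function name is hypothetical, and the fixed per-instruction overhead mentioned above is not modeled, so the count is not an exact cycle number.

```c
#include <stdint.h>

/* Hypothetical model of a bit-serial logical left shift: the operand moves
 * by one position per iteration, so the iteration count equals the shift
 * amount (lower five bits, 0..31). */
static uint32_t serial_sll(uint32_t value, uint32_t shamt, unsigned *iterations)
{
    unsigned n = 0;
    for (shamt &= 31u; shamt != 0u; shamt--) {
        value <<= 1;
        n++;
    }
    *iterations = n;
    return value;
}
```

A barrel shifter replaces this loop with a fixed network of multiplexers, which is why its latency no longer depends on the shift amount.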

[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top entity's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.


==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division operations are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete - regardless of the input operands.
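
One common way to obtain an operand-independent latency is the shift-and-add scheme, which processes exactly one multiplier bit per step. The C model below illustrates that general technique for the unsigned case. Whether the NEORV32 multiplier is built exactly like this is an assumption, and all function names are chosen here for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a bit-serial unsigned multiplier (shift-and-add).
 * The loop always runs 32 times, so the iteration count does not depend
 * on the operand values. Returns the full 64-bit product. */
static uint64_t serial_mulu(uint32_t a, uint32_t b, unsigned *iterations)
{
    uint64_t product = 0;
    uint64_t addend = a;
    unsigned n = 0;
    for (int i = 0; i < 32; i++) {
        if ((b >> i) & 1u) product += addend;
        addend <<= 1;
        n++;
    }
    *iterations = n;
    return product;
}

/* mul returns the low word of the product, mulhu the high word. */
static uint32_t ref_mul(uint32_t a, uint32_t b)
{
    unsigned n;
    return (uint32_t)serial_mulu(a, b, &n);
}

static uint32_t ref_mulhu(uint32_t a, uint32_t b)
{
    unsigned n;
    return (uint32_t)(serial_mulu(a, b, &n) >> 32);
}
```

Unlike the serial shifter of the base ISA, there is no early exit here: multiplying by zero takes as many iterations as multiplying two large operands.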


==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.
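
"Computed entirely in software" means that the compiler replaces each division with a call to a runtime routine (with GCC this is typically a libgcc helper such as `__udivsi3`). The following C function is a sketch of what such a routine does, using the classic restoring shift-subtract algorithm. It is written for illustration and is not the libgcc source; the name `soft_divu` is hypothetical.

```c
#include <stdint.h>

/* Restoring shift-subtract division, one quotient bit per iteration.
 * Division by zero follows the RISC-V convention for divu/remu:
 * quotient = all ones, remainder = dividend. */
static uint32_t soft_divu(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    if (divisor == 0u) {
        *remainder = dividend;
        return 0xFFFFFFFFu;
    }
    uint32_t quotient = 0;
    uint64_t rem = 0;  /* 64-bit so the shifted remainder cannot overflow */
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= 1u << i;
        }
    }
    *remainder = (uint32_t)rem;
    return quotient;
}
```

A routine like this costs a few hundred instructions per division, which is the price paid for the hardware savings quoted above.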

Note that the `M` and `Zmmul` extensions _cannot_ be enabled at the same time.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).


==== **`U`** - Less-Privileged User Mode

In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second less-privileged
operation mode. It is implemented if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_.
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
Any kind of privilege rights violation will raise an exception to allow full virtualization.


==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.

The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the `mie`
and `mip` CSRs. This extension is mapped to _reserved_ CSR bits that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for `mcause` are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).


569 63 zero_gravi
==== **`Zfinx`** Single-Precision Floating-Point Operations
570 60 zero_gravi
 
The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
fewer hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
register file-related load/store or move instructions.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
577 60 zero_gravi
 
578 65 zero_gravi
[TIP]
579 60 zero_gravi
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible to the _IEEE-754_ specifications.
580
 
The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
to the `F` extension. The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_. In this case the following instructions and CSRs are available:
584
 
585
* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
586
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
587
* computational: `fadd.s`, `fsub.s`, `fmul.s`
588
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
589
* number classification: `fclass.s`
590
 
591
* additional CSRs: `fcsr`, `frm`, `fflags`
592
 
593
[WARNING]
594
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
595
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
596
 
597
[WARNING]
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal operands (exponent = 0) are _flushed to zero_, i.e. set to +/- 0, before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
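
The flush-to-zero rule for subnormal operands can be modeled on the raw IEEE-754 bit pattern. The following is only an illustrative host-side sketch; the function name `fpu_flush_subnormal` is hypothetical and not part of the NEORV32 software library.

```c
#include <stdint.h>

/* Hypothetical model of "flush to zero" on a raw single-precision bit pattern.
 * An exponent field (bits 30:23) of zero means the value is zero or subnormal;
 * such values are replaced by a zero that keeps the original sign bit. */
uint32_t fpu_flush_subnormal(uint32_t raw)
{
    const uint32_t exponent = (raw >> 23) & 0xFFu;
    if (exponent == 0u) {
        return raw & 0x80000000u; /* keep the sign only: +0 or -0 */
    }
    return raw; /* normal numbers, infinities and NaNs pass through unchanged */
}
```

Working on the bit pattern instead of a `float` avoids any dependency on how the host itself treats subnormal values.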
602
 
603
[WARNING]
604
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
605
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
606
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
607
code (see `sw/example/floating_point_test`).
608
 
609 63 zero_gravi
 
610 60 zero_gravi
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
611
 
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
are implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.
614 68 zero_gravi
 
615
[IMPORTANT]
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
In order to provide the full set of privileged functions that are required to run more complex tasks like
an operating system and to allow a secure execution environment, the `Zicsr` extension should always be enabled.
619
 
620 65 zero_gravi
In this case the following instructions are available:
621 60 zero_gravi
 
622
* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
623
* environment: `mret`, `wfi`
624
 
625 68 zero_gravi
[NOTE]
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.
However, access privileges are still enforced, so these instruction variants _do_ cause side-effects
(the RISC-V specs. state that these combinations "_shall_ not cause any side-effects").
629 60 zero_gravi
 
630
[NOTE]
631 68 zero_gravi
The "wait for interrupt instruction" `wfi` acts like a sleep command. When executed, the CPU is
632 60 zero_gravi
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
633
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
634 65 zero_gravi
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
635 68 zero_gravi
`TW` (timeout wait) is _hardwired_ to zero.
636 60 zero_gravi
 
637 62 zero_gravi
 
638 66 zero_gravi
 
639
==== **`Zicntr`** CPU Base Counters
640
 
The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.
645
 
646
[NOTE]
647
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.
648
 
649
If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
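
Because each counter is split into a low-word and a high-word CSR (for example `cycle` and `cycleh`), software on a 32-bit CPU usually reads the high word twice to detect a low-word overflow between the reads. The sketch below models this common technique on a host machine. The callback type `csr_read_fn` and the simulated counter are assumptions made for illustration; on real hardware the callbacks would wrap CSR read instructions.

```c
#include <stdint.h>

/* Hypothetical CSR read callback; on the target this would wrap a CSR read instruction. */
typedef uint32_t (*csr_read_fn)(void);

/* Read a 64-bit counter that is split into a high-word and a low-word CSR.
 * The high word is read twice; if it changed, the low word overflowed
 * in between and the whole read is repeated. */
uint64_t counter_read64(csr_read_fn read_hi, csr_read_fn read_lo)
{
    uint32_t hi, lo, hi_check;
    do {
        hi       = read_hi();
        lo       = read_lo();
        hi_check = read_hi();
    } while (hi != hi_check);
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter for host-side experiments: it starts right before a
 * low-word overflow and advances by one on every low-word read. */
uint64_t sim_counter = 0x00000001FFFFFFFFull;
uint32_t sim_read_hi(void) { return (uint32_t)(sim_counter >> 32); }
uint32_t sim_read_lo(void)
{
    const uint32_t lo = (uint32_t)sim_counter;
    sim_counter += 1u;
    return lo;
}
```

With the simulated counter, the first attempt observes a changed high word and is discarded, so the function returns the consistent value from the second attempt.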
650
 
651
 
652
 
653
==== **`Zihpm`** Hardware Performance Monitors
654
 
In addition to the base cycle, instructions-retired and time counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.

The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.

Depending on the configuration the following additional CSRs are available:
664
 
665
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
666
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
667
 
668
[IMPORTANT]
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according `mcounteren` CSR bits
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
exception.
672
 
673
[TIP]
674
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
675
 
676
[TIP]
677
For a list of all HPM-related CSRs and all provided event configurations
678
see section <<_hardware_performance_monitors_hpm>>.
679
 
680
 
681 60 zero_gravi
==== **`Zifencei`** Instruction Stream Synchronization
682
 
683
The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
684
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
685
 
686
* `fence.i`
687
 
The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
692 60 zero_gravi
 
693
 
694
==== **`PMP`** Physical Memory Protection
695
 
696 65 zero_gravi
The NEORV32 physical memory protection (PMP) is compatible to the RISC-V PMP specifications. It can be used
697
to constrain memory read/write/execute rights for each available privilege level.
698 60 zero_gravi
 
The NEORV32 PMP currently supports only the _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger
minimal sizes can be configured via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements.
The physical memory protection system is implemented when the `PMP_NUM_REGIONS` configuration generic is >0.
In this case the following additional CSRs are available:
703
 
704 60 zero_gravi
* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
705
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
706
 
707 65 zero_gravi
[TIP]
708 60 zero_gravi
See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.
709
 
710
The actual number of regions and the minimal region granularity are defined via the top entity
711
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
712
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
713
number of available `pmpcfg*` and `pmpaddr*` CSRs.
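
For reference, the standard RISC-V NAPOT encoding packs the base address and size of a naturally aligned power-of-two region into a single `pmpaddr*` value. The sketch below shows that encoding for 32-bit addresses. The helper name `pmp_napot_encode` is hypothetical, and the code does not account for a configured `PMP_MIN_GRANULARITY` larger than 8 bytes.

```c
#include <stdint.h>
#include <stdbool.h>

/* Compute the pmpaddr value for a NAPOT region (standard RISC-V encoding).
 * size must be a power of two of at least 8 bytes and base must be aligned to size.
 * Returns false if the arguments do not describe a valid NAPOT region. */
bool pmp_napot_encode(uint32_t base, uint32_t size, uint32_t *pmpaddr)
{
    if (size < 8u || (size & (size - 1u)) != 0u) {
        return false; /* too small or not a power of two */
    }
    if ((base & (size - 1u)) != 0u) {
        return false; /* base is not naturally aligned */
    }
    /* address bits [33:2] go into pmpaddr; the trailing ones encode the size */
    *pmpaddr = (base >> 2) | ((size >> 3) - 1u);
    return true;
}
```

For example, a 64kB region starting at `0x80000000` yields `0x20001FFF`, while an 8-byte region at the same base yields `0x20000000`.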
714
 
When implementing more PMP regions than a _certain critical limit_ *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data accesses by +1 cycle.
718
 
719
The critical limit can be adapted for custom use by a constant from the main VHDL package file
720
(`rtl/core/neorv32_package.vhd`). The default value is 8:
721
 
[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----
727
 
728
**Operation**
729
 
Any CPU memory access address (from the instruction fetch or data access interface) is checked against _all_
of the specified PMP regions (configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
address matches one of these regions, the configured access rights (attributes in `pmpcfg*`) are enforced:
733 60 zero_gravi
 
734
* a write access (store) will fail if no write attribute is set
735
* a read access (load) will fail if no read attribute is set
736
* an instruction fetch access will fail if no execute attribute is set
737
 
738 65 zero_gravi
If an access to a protected region does not have the according access rights it will raise the according
739
instruction/load/store _access fault_ exception.
740 60 zero_gravi
 
741
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
742 65 zero_gravi
memory protection also for machine-level programs you need to set the _locked bit_ in the according
743
`pmpcfg*` configuration CSR.
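
The permission rules above can be summarized in a small software model of a single region. This is a sketch for illustration only: `pmp_region_t` and `pmp_region_permits` are hypothetical names, the attribute bit positions follow the standard RISC-V `pmpcfg` layout, and the handling of addresses that match no region at all is deliberately left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* pmpcfg attribute bits (standard RISC-V layout) */
#define PMPCFG_R 0x01u /* read permitted */
#define PMPCFG_W 0x02u /* write permitted */
#define PMPCFG_X 0x04u /* execute permitted */
#define PMPCFG_L 0x80u /* locked: rights are also enforced in machine-mode */

typedef enum { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH } access_t;

/* Hypothetical software model of one PMP region (not a NEORV32 API). */
typedef struct {
    uint32_t base; /* region start address, aligned to size */
    uint32_t size; /* region size in bytes, power of two */
    uint8_t  cfg;  /* attribute bits as listed above */
} pmp_region_t;

/* Returns true if the access passes the check of this region. */
bool pmp_region_permits(const pmp_region_t *r, uint32_t addr, access_t acc, bool machine_mode)
{
    if ((addr & ~(r->size - 1u)) != r->base) {
        return true; /* address is outside of this region */
    }
    if (machine_mode && (r->cfg & PMPCFG_L) == 0u) {
        return true; /* machine-mode is only checked for locked regions */
    }
    switch (acc) {
        case ACCESS_LOAD:  return (r->cfg & PMPCFG_R) != 0u;
        case ACCESS_STORE: return (r->cfg & PMPCFG_W) != 0u;
        default:           return (r->cfg & PMPCFG_X) != 0u;
    }
}
```

A `false` result corresponds to the instruction/load/store _access fault_ exception described above.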
744 60 zero_gravi
 
745
[IMPORTANT]
746
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
747
internal (iterative) computations before the configuration becomes valid.
748
 
749
[NOTE]
750
For more information regarding RISC-V physical memory protection see the official _The RISC-V
751 65 zero_gravi
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
752 60 zero_gravi
 
753
 
754
 
755
<<<
756
// ####################################################################################################################
757
:sectnums:
758
=== Instruction Timing
759
 
760
The instruction timing listed in the table below shows the required clock cycles for executing a certain
761
instruction. These instruction cycles assume a bus access without additional wait states and a filled
762
pipeline.
763
 
Average CPI (cycles per instruction) values for "real applications", like executing the CoreMark benchmark with different CPU
configurations, are presented in <<_cpu_performance>>.
766
 
.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 3 + ML
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - arithmetic/logic | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
|=======================
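
As a worked example for the base-ISA shift row of the table, the following sketch evaluates the `3 + SA/4 + SA%4` expression for a given shift amount. It assumes that `/` denotes integer division and that `FAST_SHIFT_EN` yields a constant 4 cycles as listed; the function name `shift_cycles` is hypothetical and the `TINY_SHIFT_EN` variant is not modeled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Execution cycles of a base-ISA shift instruction according to the timing table.
 * Default shifter: 3 + SA/4 + SA%4 (integer division assumed).
 * With FAST_SHIFT_EN: constant 4 cycles. */
uint32_t shift_cycles(uint32_t shamt, bool fast_shift_en)
{
    const uint32_t sa = shamt & 31u; /* RV32 shifts use only the lowest 5 bits */
    if (fast_shift_en) {
        return 4u;
    }
    return 3u + (sa / 4u) + (sa % 4u);
}
```

Under these assumptions a shift by 31 positions takes 3 + 7 + 3 = 13 cycles with the default shifter.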
806
 
807
[NOTE]
808 65 zero_gravi
The presented values of the *floating-point execution cycles* are average values - obtained from
809 60 zero_gravi
4096 instruction executions using pseudo-random input values. The execution time for emulating the
810
instructions (using pure-software libraries) is ~17..140 times higher.
811
 
812
 
813 66 zero_gravi
<<<
814 60 zero_gravi
// ####################################################################################################################
815
include::cpu_csr.adoc[]
816
 
817
 
818
<<<
819
// ####################################################################################################################
820
:sectnums:
821
==== Traps, Exceptions and Interrupts
822
 
823 61 zero_gravi
In this document the following nomenclature regarding traps is used:
824 60 zero_gravi
 
825 64 zero_gravi
* _interrupts_ = asynchronous exceptions
826 60 zero_gravi
* _exceptions_ = synchronous exceptions
827
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)
828
 
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec`
CSR. The cause of the according interrupt or exception can be determined via the content of the `mcause`
CSR. The address that reflects the current program counter when a trap was taken is stored to the `mepc` CSR.
Additional information regarding the cause of the trap can be retrieved from the `mtval` CSR.

The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
838 60 zero_gravi
 
839 65 zero_gravi
.Interrupt Signal Requirements
840 61 zero_gravi
[IMPORTANT]
All interrupt request signals (including FIRQs) are **high-active**. A request has to stay at high-level (=asserted)
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).
843 60 zero_gravi
 
844 61 zero_gravi
.Instruction Atomicity
845
[NOTE]
846 65 zero_gravi
All instructions execute as atomic operations - interrupts can only trigger between two instructions.
847 64 zero_gravi
So if there is a permanent interrupt request, exactly one instruction from the interrupt program will be executed before
848
a new interrupt handler can start.
849 60 zero_gravi
 
850
 
851 61 zero_gravi
:sectnums:
852
==== Memory Access Exceptions
853 60 zero_gravi
 
854 61 zero_gravi
If a load operation causes any exception, the instruction's destination register is
855
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
856
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
857 64 zero_gravi
memory protection fault do not trigger a bus write-operation at all.
858 60 zero_gravi
 
859
 
860 61 zero_gravi
:sectnums:
861
==== Custom Fast Interrupt Request Lines
862 60 zero_gravi
 
863 61 zero_gravi
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
864 60 zero_gravi
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
865 65 zero_gravi
provide custom trap codes in `mcause`. These FIRQs are reserved for NEORV32 processor-internal usage only.
866 60 zero_gravi
 
867
 
868
 
869
<<<
870
// ####################################################################################################################
871
:sectnums!:
872
===== NEORV32 Trap Listing
873
 
.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (EBREAK) | _PC_ | _PC_
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
|=======================
909
 
910
**Notes**
911
 
The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the according trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to
the `mepc` and `mtval` CSRs when a trap is triggered:

* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself
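
The `mcause` values in the trap listing follow a simple pattern: bit 31 distinguishes interrupts from exceptions, the remaining bits hold the code, and the FIRQ channels occupy interrupt codes 16..31. A trap handler could split a value as sketched below. The type `trap_info_t` and the function `mcause_decode` are hypothetical and not taken from `neorv32.h`.

```c
#include <stdint.h>
#include <stdbool.h>

/* Decoded view of an mcause value (illustrative type). */
typedef struct {
    bool     is_interrupt; /* mcause bit 31 */
    uint32_t code;         /* exception/interrupt code (remaining bits) */
    int      firq_channel; /* 0..15 for fast interrupt requests, -1 otherwise */
} trap_info_t;

trap_info_t mcause_decode(uint32_t mcause)
{
    trap_info_t t;
    t.is_interrupt = (mcause >> 31) != 0u;
    t.code         = mcause & 0x7FFFFFFFu;
    t.firq_channel = -1;
    if (t.is_interrupt && t.code >= 16u && t.code <= 31u) {
        t.firq_channel = (int)(t.code - 16u); /* 0x80000010 is FIRQ channel 0 */
    }
    return t;
}
```

For instance, `0x8000001a` decodes to an interrupt with code 26, which is FIRQ channel 10.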
923
 
924
 
925
 
926
<<<
927
// ####################################################################################################################
928
:sectnums:
929
==== Bus Interface
930
 
931
The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for
932
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.
933
 
934
:sectnums:
935
===== Address Space
936
 
The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_ - however, a software-based handling can be
implemented as any unaligned memory access will trigger an according exception.
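
To illustrate when a data access counts as unaligned, the sketch below checks an address against the access size and returns the matching `mcause` code from the trap listing (`0x00000004` for loads, `0x00000006` for stores). The function name and the `TRAP_NONE` marker are assumptions made for this example.

```c
#include <stdint.h>
#include <stdbool.h>

#define TRAP_LOAD_MISALIGNED  0x00000004u /* mcause: load address misaligned */
#define TRAP_STORE_MISALIGNED 0x00000006u /* mcause: store address misaligned */
#define TRAP_NONE             0xFFFFFFFFu /* illustrative marker: access is aligned */

/* size_bytes is 1 (byte), 2 (half-word) or 4 (word). */
uint32_t data_access_alignment_trap(uint32_t addr, uint32_t size_bytes, bool is_store)
{
    if ((addr & (size_bytes - 1u)) == 0u) {
        return TRAP_NONE; /* naturally aligned: no exception */
    }
    return is_store ? TRAP_STORE_MISALIGNED : TRAP_LOAD_MISALIGNED;
}
```

A software handler for unaligned accesses would start from exactly this situation and emulate the access with several aligned byte or half-word operations.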
942
 
943
:sectnums:
944
===== Interface Signals
945
 
946
The following table shows the signals of the data and instruction interfaces seen from the CPU
947
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).
948
 
.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================
966
 
967
[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "in flight" at a time.
970
 
971
:sectnums:
972
===== Protocol
973
 
A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral sets either the `bus_ack_i` signal (-> successful completion) or the
`bus_err_i` signal (-> failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the according instruction bus
access fault or load/store bus access fault exception.
980
 
981
[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.
985
 
986
.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
987
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (see section <<_internal_bus_monitor_buskeeper>>) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the according instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
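
The interplay of acknowledge, error and timeout can be modeled cycle by cycle. The following host-side sketch is not derived from the RTL: `peripheral_t` and `bus_transfer` are hypothetical, and it assumes that a response arriving in cycle 15 after the request still counts as being inside the response time window.

```c
#include <stdint.h>

#define MAX_PROC_INT_RESPONSE_TIME 15u /* default of max_proc_int_response_time_c */

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* Hypothetical peripheral behavior: number of cycles after the request at which
 * bus_ack_i or bus_err_i is pulsed for one cycle; 0 means "never". */
typedef struct {
    uint32_t ack_after;
    uint32_t err_after;
} peripheral_t;

/* Model of a single bus transaction as seen by the CPU. */
bus_result_t bus_transfer(const peripheral_t *p)
{
    for (uint32_t cycle = 1u; cycle <= MAX_PROC_INT_RESPONSE_TIME; cycle++) {
        if (p->err_after == cycle) {
            return BUS_ERROR; /* peripheral signals a failed completion */
        }
        if (p->ack_after == cycle) {
            return BUS_OK; /* peripheral signals a successful completion */
        }
    }
    return BUS_TIMEOUT; /* no response in time: the bus keeper issues a bus error */
}
```

Both `BUS_ERROR` and `BUS_TIMEOUT` end in the same kind of bus access fault exception from the CPU's point of view; the model only keeps them apart to show where the error originates.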
996
 
997
**Exemplary Bus Accesses**
998
 
999
.Example bus accesses: see read/write access description below
1000
[cols="^2,^2"]
1001
[grid="none"]
1002
|=======================
1003
a| image::cpu_interface_read_long.png[read,300,150]
1004
a| image::cpu_interface_write_long.png[write,300,150]
1005
| Read access | Write access
1006
|=======================
1007
 
1008
**Write Access**
1009
 
For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.
1015
 
1016
**Read Access**
1017
 
For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to provide the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
1023
 
1024
**Access Boundaries**
1025
 
The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
1029
 
1030
**Exclusive (Atomic) Access**
1031
 
The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the according access address and
the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
zero and will allow the according store operation to the memory system. If the lock is broken, the
instruction will write-back non-zero and will not generate an actual memory store operation.
1044
 
1045
The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:
1046
 
1047
* when executing any other memory-access operation than `lr.w`
1048
* when any trap (sync. or async.) is triggered (for example to force a context switch)
1049
* when the memory system signals a bus error (via the `bus_err_i` signal)
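
The lock handling described above behaves like a one-bit state machine. The sketch below is a host-side model only, with hypothetical names (`excl_state_t`, `excl_on_*`); it additionally assumes that `sc.w` itself clears the lock, since it is a memory access other than `lr.w`.

```c
#include <stdbool.h>

/* Model of the CPU-internal exclusive access lock (illustrative, not RTL-derived). */
typedef struct {
    bool lock;
} excl_state_t;

void excl_on_lr(excl_state_t *s)               { s->lock = true;  } /* lr.w sets the lock */
void excl_on_other_mem_access(excl_state_t *s) { s->lock = false; } /* any non-lr.w access */
void excl_on_trap(excl_state_t *s)             { s->lock = false; } /* sync. or async. trap */
void excl_on_bus_error(excl_state_t *s)        { s->lock = false; } /* bus_err_i asserted */

/* sc.w: returns the value written back to the destination register
 * (0 = success, non-zero = failure); *do_store tells whether an actual
 * memory store operation is generated. */
unsigned excl_on_sc(excl_state_t *s, bool *do_store)
{
    const bool ok = s->lock;
    s->lock   = false; /* assumption: the reservation is consumed by sc.w */
    *do_store = ok;
    return ok ? 0u : 1u;
}
```

In this model a trap between `lr.w` and `sc.w` (for example a context switch) makes the store-conditional fail, so software has to retry the whole sequence.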
1050
 
1051
[TIP]
1052
For more information regarding the SoC-level behavior and requirements of atomic operations see
1053
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
1054
 
1055
**Memory Barriers**
1056
 
1057
Whenever the CPU executes a fence instruction, the according interface signal is set high for one cycle
1058
(`d_bus_fence_o` for a _fence_ instruction; `i_bus_fence_o` for a _fencei_ instruction). It is the task of the
1059
memory system to perform the necessary operations (like a cache flush and refill).
1060
 
1061
 
1062
 
1063
<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset
 
In order to reduce routing constraints (and thereby the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.
 
**Rationale**
 
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline, the status register might trigger a write-back
of the processing result to some kind of memory. The initial state of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
 
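The pipeline example can be made concrete with a short C model. It is only a sketch: `pipe_model_t`, `pipe_reset`, `pipe_clock` and the three-stage depth are hypothetical choices for illustration and do not correspond to actual NEORV32 code. The reset clears only the status (valid) flags and leaves whatever junk is in the data registers, yet no junk is ever written back, because write-back is gated by the valid flag of the last stage.

```c
#include <stdbool.h>
#include <stdint.h>

#define STAGES 3  /* illustrative pipeline depth */

/* Hypothetical pipeline model; names are illustrative only. */
typedef struct {
    uint32_t data[STAGES];   /* "uncritical" data registers: never reset */
    bool     valid[STAGES];  /* status registers: reset to a defined value */
} pipe_model_t;

/* Reset touches the status registers only; data registers keep their contents. */
void pipe_reset(pipe_model_t *p) {
    for (int i = 0; i < STAGES; i++) {
        p->valid[i] = false;
    }
}

/* One clock cycle: returns true and writes *out if the last stage held valid
 * data (the write-back), then shifts every stage forward by one. */
bool pipe_clock(pipe_model_t *p, bool in_valid, uint32_t in_data, uint32_t *out) {
    bool write_back = p->valid[STAGES - 1];
    if (write_back) {
        *out = p->data[STAGES - 1];
    }
    for (int i = STAGES - 1; i > 0; i--) {
        p->data[i]  = p->data[i - 1];
        p->valid[i] = p->valid[i - 1];
    }
    p->data[0]  = in_data;
    p->valid[0] = in_valid;
    return write_back;
}
```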
**NEORV32 CPU Reset**
 
In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even control
and status registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers are initialized by the CPU’s internal state machines, which in turn are initialized by the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by software (more specifically, by the `crt0.S` start-up code).
 
During the very early boot process (where `crt0.S` is running) there is no chance of undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example, the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The value of this register after reset is uncritical: interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated
hardware reset that sets it low (globally disabling interrupts).
 
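The `mie` argument boils down to a simple gating condition, sketched below in C. The struct `irq_model_t` and both functions are hypothetical and only model the relationship described above: an interrupt can fire only if the global enable flag is set and at least one pending source is individually enabled. After `irq_hw_reset`, `irq_fires` returns false for any leftover `mie` value, since only the global flag has a defined reset state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt gating model; names are illustrative only. */
typedef struct {
    bool     mstatus_mie;  /* global interrupt enable: has a dedicated reset */
    uint32_t mie;          /* per-source enables: no dedicated reset */
    uint32_t mip;          /* pending interrupt sources */
} irq_model_t;

/* Hardware reset: only the global enable flag gets a defined value. */
void irq_hw_reset(irq_model_t *c) {
    c->mstatus_mie = false;  /* mie is deliberately left untouched */
}

/* An interrupt fires only if it is pending, individually enabled,
 * and interrupts are globally enabled. */
bool irq_fires(const irq_model_t *c) {
    return c->mstatus_mie && ((c->mie & c->mip) != 0);
}
```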
**Reset Configuration**
 
Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of the uncritical registers, effectively generating flip-flops without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):
 
[source,vhdl]
----
-- reset behavior of "uncritical" registers --
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value
-- for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW),
-- default; TRUE=defined LOW reset value)
----
