:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]

**Key Features**

* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zbb` - basic bit-manipulation operations
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
** `DB` - debug mode
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications; passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate interfaces for instruction fetch and data access (merged into a single bus via a bus switch for
the NEORV32 processor)
* Little-endian byte order
* Configurable hardware reset
* No hardware support for unaligned data/instruction accesses; they trigger an exception. If the `C` extension is enabled, instructions
may also be 16-bit aligned, so a misaligned instruction address exception can no longer occur
 
[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.
 
<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU uses a pipelined architecture with two main stages. The first stage (IF, instruction fetch)
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
stored in a FIFO, the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
instruction words for the next pipeline stage. Compressed instructions, if enabled, are also decompressed
in this stage. The second stage (EX, execution) is responsible for actually executing the fetched instructions
via the execute engine.

These two pipeline stages are based on a multi-cycle processing engine, so the processing of each stage for
certain operations can take several cycles. Since the IF and EX stages are decoupled via the instruction
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher, for instance when executing loads/stores or
multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffer
due to a taken branch.
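
The CPI figure above can be estimated for a piece of code by dividing the elapsed clock cycles by the number of retired instructions. The following sketch assumes that both counts have already been sampled before and after the code under test (for example from the standard RISC-V `cycle` and `instret` counters). The function name and the fixed-point scaling are illustrative choices and not part of the NEORV32 software framework.

```c
#include <stdint.h>

/* Average CPI scaled by 1000 (2500 means 2.5 cycles per instruction).
 * Integer math is used so that no floating-point support is required.
 * Returns 0 if no instruction retired in the measured interval. */
uint32_t average_cpi_x1000(uint64_t cycles, uint64_t instructions) {
    if (instructions == 0) {
        return 0;
    }
    return (uint32_t)((cycles * 1000u) / instructions);
}
```

A result of 2000 corresponds to the optimal case described above; loads/stores, divisions and taken branches push the value higher.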
 
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction in a series of consecutive micro-operations. The combination of these two classical
design paradigms allows a higher instruction throughput than a pure multi-cycle approach (due to
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).

The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
devices are mapped to a single 32-bit address space, making the architecture a modified von Neumann
architecture.
 
// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the `rv32_m/I`, `rv32_m/M`, `rv32_m/C`, `rv32_m/privilege`, and
`rv32_m/Zifencei` tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.

[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.
 
.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................
 
.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................
 
.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................
 
.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................
 
.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................
 
<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently known issues regarding full RISC-V compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.

[IMPORTANT]
The `misa` CSR is read-only. It shows the synthesized CPU extensions. Hence, all implemented
CPU extensions are always active and cannot be enabled/disabled dynamically during runtime. Any
write access to it (in machine mode) is ignored and will not cause any exception or side effects.

[IMPORTANT]
The `mip` CSR is read-only. Pending IRQs can be cleared using the `mie` CSR.

[IMPORTANT]
The `mtval` CSR is read-only.

[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
currently supports only the modes _OFF_ and _NAPOT_ and a minimal granularity of 8 bytes per region.

[IMPORTANT]
The `A` CPU extension (atomic memory access) currently implements only the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further AMO operations.
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction as seen from the CPU.
 
.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir.   | Function
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out | destination address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out | destination address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input (from MTIME)
4+^| **Non-Maskable Interrupt (<<_traps_exceptions_and_interrupts>>)**
| `nm_irq_i`       |     1 | in  | non-maskable interrupt
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
| `firq_ack_o`     |    16 | out | fast interrupt acknowledge signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.
 
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The NEORV32 is a RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see _The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

[TIP]
Software running on the CPU can discover the available ISA extensions via the <<_misa>> CSR and the
_SYSINFO_CPU_ <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
or by executing an instruction and checking for an _illegal instruction exception_.
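
As an illustration of the `misa`-based discovery, the sketch below tests a single extension letter in a `misa` value that has already been read from the CSR. It relies only on the bit layout from the RISC-V privileged specification (bit 0 = `A` up to bit 25 = `Z`). The function name is hypothetical and not part of the NEORV32 software framework. Multi-letter extensions such as `Zfinx` have no `misa` bit and cannot be detected this way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the single-letter extension `ext` ('A'..'Z') is flagged
 * in a misa value. Bit 0 is 'A', bit 1 is 'B', and so on up to bit 25
 * for 'Z'. Any other character yields false. */
bool misa_has_extension(uint32_t misa, char ext) {
    if (ext < 'A' || ext > 'Z') {
        return false;
    }
    return ((misa >> (ext - 'A')) & 1u) != 0;
}
```

For example, a `misa` value of `0x40001104` (an example value, not taken from a real configuration) flags the `C`, `I` and `M` extensions on a 32-bit CPU.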
 
[NOTE]
Executing an instruction from an extension that is not implemented or not enabled (for example via the corresponding
top entity generic) will raise an _illegal instruction_ exception.
 
==== **`A`** - Atomic Memory Access

Atomic memory access instructions (for implementing semaphores and mutexes) are available when the
`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions
are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented so far, all further atomic operations
(read-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will trigger an illegal instruction exception.
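
The emulation of a read-modify-write operation follows a simple retry pattern: load-reserve the word, compute the new value, and repeat if the store-conditional fails. The sketch below shows this pattern for an `amoadd.w`-like operation. `lr_w` and `sc_w` are hypothetical plain-C stand-ins that model a single reservation so that the example runs on any host; on the target they would be the actual `lr.w` and `sc.w` instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for lr.w / sc.w (single hart, one reservation). */
static const volatile uint32_t *reservation = NULL;

static uint32_t lr_w(const volatile uint32_t *addr) {
    reservation = addr;              /* register the reservation */
    return *addr;
}

/* Returns 0 on success and non-zero on failure, like sc.w. */
static uint32_t sc_w(volatile uint32_t *addr, uint32_t value) {
    int ok = (reservation == addr);
    reservation = NULL;              /* any sc.w invalidates the reservation */
    if (ok) {
        *addr = value;
        return 0;
    }
    return 1;
}

/* Emulated atomic add: returns the old memory value, like amoadd.w. */
uint32_t emulated_amoadd_w(volatile uint32_t *addr, uint32_t operand) {
    uint32_t old;
    do {
        old = lr_w(addr);
    } while (sc_w(addr, old + operand) != 0);
    return old;
}
```

Other operations (swap, and, or, min, max) would differ only in how the new value is computed from `old` and `operand`.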
 
[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
 
==== **`C`** - Compressed Instructions

Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is
_true_. In this case the following instructions are available:

* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ (not 32-bit aligned), _uncompressed_ instruction require
an additional instruction fetch to load the second half-word of that instruction. This performance penalty can be avoided
by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
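
The condition described in the note above can be written as a small predicate. This is only a model of the rule as stated in the text, assuming 32-bit wide instruction fetches; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if fetching the instruction at branch target `target` needs a second
 * fetch: the target is only 16-bit aligned and the instruction located there
 * is a full 32-bit (uncompressed) one, so it straddles two 32-bit words. */
bool target_needs_extra_fetch(uint32_t target, bool is_compressed) {
    bool word_aligned = (target & 0x3u) == 0;
    return !word_aligned && !is_compressed;
}
```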
 
==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware
requirements. This extension is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond
`x15` will raise an _illegal instruction exception_.

[IMPORTANT]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
 
==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediates: `lui`, `auipc`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
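
To illustrate why the latency of the bit-serial shifter depends on the shift amount, the following behavioral model shifts by one bit position per loop iteration and reports the iteration count. It is a sketch of the general technique only and is not derived from the NEORV32 RTL; the function name is hypothetical and the overhead cycles are not modeled.

```c
#include <stdint.h>

/* Behavioral model of a bit-serial logical left shift: one bit position per
 * iteration. `*iterations` receives the loop count, which stands in for the
 * data-dependent part of the latency. */
uint32_t serial_sll(uint32_t value, uint32_t shamt, uint32_t *iterations) {
    uint32_t count = 0;
    shamt &= 0x1Fu;               /* RV32 shifts use only the low 5 bits */
    while (count < shamt) {
        value <<= 1;
        count++;
    }
    *iterations = count;
    return value;
}
```

A barrel shifter produces the same result in a fixed number of cycles, which is the trade-off selected by `FAST_SHIFT_EN`.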
 
[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.
 
==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division instructions are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete, regardless of the input operands.
 
==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extension and is intended for small-scale applications that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the `M` extension.

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.

Note that the `M` and `Zmmul` extensions _cannot_ be enabled at the same time.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=-march=rv32im USER_FLAGS+=-mno-div clean_all exe`).
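
With `Zmmul`, divisions are computed by software routines that the toolchain normally supplies. The sketch below shows what such a routine can look like, using the classic restoring (shift-and-subtract) algorithm. It is not the toolchain's actual implementation; the function name is hypothetical, and the division-by-zero result is chosen here to mirror the RISC-V `divu`/`remu` semantics.

```c
#include <stdint.h>

/* Unsigned 32-bit restoring division, one quotient bit per iteration.
 * Returns the quotient and stores the remainder in *remainder. */
uint32_t soft_divu(uint32_t dividend, uint32_t divisor, uint32_t *remainder) {
    uint32_t quotient = 0;
    uint32_t rem = 0;

    if (divisor == 0) {
        /* same results as RISC-V divu/remu for a division by zero */
        *remainder = dividend;
        return UINT32_MAX;
    }
    for (int bit = 31; bit >= 0; bit--) {
        /* bring down the next dividend bit, MSB first */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= UINT32_C(1) << bit;
        }
    }
    *remainder = rem;
    return quotient;
}
```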
 
==== **`U`** - Less-Privileged User Mode

Adds the less-privileged _user mode_ if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For
instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like
peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode.
 
==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.

The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the `mie`
and `mip` CSRs. This extension is mapped to bits that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for `mcause` are implemented.
* The CPU provides a single _non-maskable_ interrupt (`NMI`) that also provides a custom trap code for `mcause`.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).
 
==== **`Zfinx`** Single-Precision Floating-Point Operations

[WARNING]
The NEORV32 `Zfinx` extension is specification-compliant and operational but still _experimental_.

The `Zfinx` floating-point extension is an alternative to the `F` floating-point extension that uses the
integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since no dedicated floating-point `f`
register file exists, the `Zfinx` extension requires fewer hardware resources and features faster context changes.
This also implies that there are NO dedicated `f` register file related load/store or move instructions. The
official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension currently supports only single-precision (`.s` suffix) operations (so it is a direct alternative to the `F`
extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_. In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
* computational: `fadd.s`, `fsub.s`, `fmul.s`
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
* number classification: `fclass.s`

* additional CSRs: `fcsr`, `frm`, `fflags`

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
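
The flush-to-zero behavior can be modeled on the raw IEEE-754 single-precision bit pattern, which is also how `Zfinx` operands appear in the integer registers. The sketch below is a software model of the rule stated above and is not derived from the FPU's RTL; the function name is hypothetical.

```c
#include <stdint.h>

/* Flush-to-zero on a raw single-precision bit pattern: a zero exponent
 * field (bits 30:23) means zero or subnormal, so only the sign bit is kept.
 * All other values pass through unchanged. */
uint32_t flush_subnormal_to_zero(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu;
    if (exponent == 0) {
        return bits & 0x80000000u;
    }
    return bits;
}
```

Software that compares results against a host FPU has to account for this difference whenever operands or results fall into the subnormal range.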
 
[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to use the `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).
 
==== **`Zbb`** Basic Bit-Manipulation Operations

[WARNING]
The NEORV32 `Zbb` extension is specification-compliant and operational but still _experimental_.

The `Zbb` extension implements the _basic_ subset of the RISC-V bit-manipulation extension `B`.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip

The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
generic is _true_. In this case the following instructions are available:

* `andn`, `orn`, `xnor`
* `clz`, `ctz`, `cpop`
* `max`, `maxu`, `min`, `minu`
* `sext.b`, `sext.h`, `zext.h`
* `rol`, `ror`, `rori`
* `orc.b`, `rev8`

[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement fully parallel logic (like barrel shifters) for all
shift-related `Zbb` instructions.
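
For reference, the two instructions named in the tip above can be modeled in portable C as follows. These are plain software models of the 32-bit instruction semantics (for example for checking results on a host) and are not the provided intrinsic library; the function names are hypothetical.

```c
#include <stdint.h>

/* clz: count leading zero bits, scanning from the MSB; clz(0) is 32. */
uint32_t ref_clz(uint32_t x) {
    uint32_t n = 0;
    while (n < 32 && ((x << n) & 0x80000000u) == 0) {
        n++;
    }
    return n;
}

/* rol: rotate left; only the low 5 bits of the amount are used. */
uint32_t ref_rol(uint32_t x, uint32_t amount) {
    amount &= 31u;
    if (amount == 0) {
        return x;                 /* avoids an undefined shift by 32 */
    }
    return (x << amount) | (x >> (32u - amount));
}
```

The loop in `ref_clz` mirrors an iterative implementation: its run time depends on the operand, whereas fully parallel logic would not.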
 
[IMPORTANT]
The `Zbb` extension is frozen but not officially ratified yet. There is no
software support for this extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to use the `Zbb` extension from C-language
code (see `sw/example/bitmanip_test`).
 
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) are implemented when the
`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are
available:

* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* environment: `mret`, `wfi`

[WARNING]
If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception
support at all. In order to provide the full spectrum of functions and to allow a secure execution
environment, the `Zicsr` extension should always be enabled.

[NOTE]
The "wait for interrupt" instruction `wfi` works like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the corresponding interrupt source has to
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.

[IMPORTANT]
The `wfi` instruction will raise an illegal instruction exception when executed outside of machine mode
while the <<_mstatus>> bit `TW` (timeout wait) is set.
 
==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

[NOTE]
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
This allows a clean re-fetch of modified data from memory. Also, the top's `i_bus_fence_o` signal is set
high for one cycle to inform the memory system. Any additional flags within the `fence.i` instruction word
are ignored by the hardware.

[NOTE]
If the `Zifencei` extension is disabled (_CPU_EXTENSION_RISCV_Zifencei_ generic = false), a
`fence.i` instruction will be executed as a `nop` (and will **not trap**) and none of the functions
described above will be performed.
 
==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible with the PMP specified by the RISC-V specs.
The CPU PMP currently supports only _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured
via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the
`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available:

* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers

See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.
 
**Configuration**

The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus the
number of available `pmpcfg*` and `pmpaddr*` CSRs.
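
To illustrate the _NAPOT_ mode, the following C sketch computes the value of a `pmpaddr*` CSR for a naturally
aligned power-of-two region, following the encoding of the RISC-V privileged specification. The helper names
`pmp_napot_valid` and `pmp_napot_addr` are hypothetical and not part of the NEORV32 software library. The sketch
assumes the default minimal granularity of 8 bytes.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a NAPOT region needs a power-of-two size of at
 * least 8 bytes (assumed default granularity) and a base address that
 * is aligned to that size. */
bool pmp_napot_valid(uint32_t base, uint32_t size)
{
    bool pow2 = (size & (size - 1u)) == 0u;
    return (size >= 8u) && pow2 && ((base & (size - 1u)) == 0u);
}

/* Hypothetical helper: pmpaddr holds address bits [33:2]; the number of
 * trailing one-bits selects the region size (none = 8 bytes).
 * Only meaningful if pmp_napot_valid(base, size) is true. */
uint32_t pmp_napot_addr(uint32_t base, uint32_t size)
{
    return (base >> 2) | ((size >> 3) - 1u);
}
----

For example, a 4096-byte region starting at `0x00001000` would be encoded as `0x000005FF`. A larger
`PMP_MIN_GRANULARITY` would additionally constrain the smallest usable `size`.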
 
When implementing more PMP regions than a _certain critical limit_, *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data accesses by +1 cycle.

The critical limit can be adapted for custom use via a constant in the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----
 
**Operation**

Any memory access address (from the CPU's instruction fetch or data access interface) is tested to determine whether it falls into any
of the specified (configured via `pmpaddr*` and enabled via `pmpcfg*`) PMP regions. If an
address accesses one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked:

* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set

If an access to a protected region does not have the required access rights (attributes), it will raise the corresponding
_instruction/load/store access fault exception_.

By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs you need to activate the _locked bit_ in the corresponding
`pmpcfg*` configuration.
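
The access rules above can be summarized in a small C model. This is only a sketch of the described behavior, not
the hardware implementation: the function `pmp_access_allowed` and the `access_t` type are hypothetical names, while the bit positions of
the R/W/X/L attributes are those of the `pmpcfg` format in the RISC-V privileged specification. The model assumes
that the address has already been matched to a region.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* pmpcfg attribute bits (RISC-V privileged specification) */
#define PMP_CFG_R 0x01u /* read permission */
#define PMP_CFG_W 0x02u /* write permission */
#define PMP_CFG_X 0x04u /* execute permission */
#define PMP_CFG_L 0x80u /* locked: rules also apply to machine-mode */

typedef enum { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH } access_t;

/* Hypothetical model: returns true if an access that hits a region with
 * the given pmpcfg byte is allowed, false if it raises an access fault. */
bool pmp_access_allowed(uint8_t cfg, access_t type, bool machine_mode)
{
    /* machine-mode accesses are only checked if the region is locked */
    if (machine_mode && !(cfg & PMP_CFG_L)) {
        return true;
    }
    switch (type) {
    case ACCESS_LOAD:  return (cfg & PMP_CFG_R) != 0u;
    case ACCESS_STORE: return (cfg & PMP_CFG_W) != 0u;
    case ACCESS_FETCH: return (cfg & PMP_CFG_X) != 0u;
    }
    return false; /* unknown access type: treat as fault */
}
----

With this model, a user-mode store to a read-only region fails, whereas the same store from machine-mode only fails
if the region's locked bit is set.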
 
[IMPORTANT]
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.

[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual – Volume II: Privileged Architecture_ specifications.


==== **`HPM`** Hardware Performance Monitors

In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split into a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.
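
The split of an N-bit counter into two 32-bit CSR words can be sketched in C as follows. The helper
`hpm_counter_value` is a hypothetical name and not part of the NEORV32 software library. The sketch assumes that
counter bits above `HPM_CNT_WIDTH` carry no information and masks them out.

[source,c]
----
#include <stdint.h>

/* Hypothetical helper: merge the high-word and low-word CSR values of one
 * HPM counter and keep only the 'width' implemented bits (0..64).
 * Assumption: bits above 'width' are not meaningful and are cleared. */
uint64_t hpm_counter_value(uint32_t high, uint32_t low, unsigned width)
{
    uint64_t raw = ((uint64_t)high << 32) | (uint64_t)low;
    if (width == 0u) {
        return 0u;   /* no counter bits implemented */
    }
    if (width >= 64u) {
        return raw;  /* full 64-bit counter */
    }
    return raw & ((UINT64_C(1) << width) - 1u);
}
----

The two input words would be obtained by reading the `mhpmcounter*h` and `mhpmcounter*` CSRs of the same counter
in machine-mode.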
 
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
the instructions-retired counter increments with each executed instruction. The actual hardware performance
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
available HPMs is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will exclude
all HPM logic from the design.

Depending on the configuration, the following additional CSRs are available:

* counters: `mhpmcounter*[h]` (3..31, depending on configuration)
* event configuration: `mhpmevent*` (3..31, depending on configuration)

[IMPORTANT]
The HPM counter CSRs can only be accessed in machine-mode. Hence, the corresponding `mcounteren` CSR bits
are always zero and read-only.

Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.

If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
corresponding `mcountinhibit` CSR bits are hardwired to zero.
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
These CSRs are read-only and will always return 0.

[NOTE]
For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>.
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Timing

The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.

Average CPI (cycles per instruction) values for "real applications", such as executing the CoreMark benchmark with different CPU
configurations, are presented in <<_cpu_performance>>.
 
.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 5
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Basic bit-manip - logic | `Zbb` | `andn` `orn` `xnor` | 3
| Basic bit-manip - shift | `Zbb` | `clz` `ctz` `cpop` `rol` `ror` `rori` | 4+SA, FAST_SHIFT: 4
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
| Basic bit-manip - misc  | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
|=======================
 
[NOTE]
The presented values of the *floating-point execution cycles* are average values – obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
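
The cycle formulas from the table can be evaluated with a few lines of C. The following sketch only restates the
table entries for the default (iterative) shifter and for conditional branches; the function names
`shift_cycles_default` and `branch_cycles` are hypothetical and the `FAST_SHIFT`/`TINY_SHIFT` variants are not modeled.

[source,c]
----
#include <stdbool.h>

/* Base-ISA shift on the default shifter: 3 + SA/4 + SA%4 cycles,
 * where SA is the shift amount (RV32 uses only the lowest 5 bits). */
unsigned shift_cycles_default(unsigned sa)
{
    sa &= 31u;
    return 3u + (sa / 4u) + (sa % 4u);
}

/* Conditional branch: 5 + ML cycles if taken (ML = memory latency),
 * 3 cycles if not taken. */
unsigned branch_cycles(bool taken, unsigned memory_latency)
{
    return taken ? (5u + memory_latency) : 3u;
}
----

According to this formula, a shift by 31 positions takes 3 + 7 + 3 = 13 cycles, while a shift by 4 positions takes
only 4 cycles.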
 
// ####################################################################################################################
include::cpu_csr.adoc[]



<<<
// ####################################################################################################################
:sectnums:
==== Full Virtualization

Just like the RISC-V ISA, the NEORV32 aims to support _maximum virtualization_ capabilities
on CPU _and_ SoC level. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU
extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs for any expected and unexpected situation (e.g. executing a
malformed instruction word or accessing an unallocated address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
have to be undone). This allows predictable execution behavior - and thus, defined operations to resolve the cause
of the trap - at any time, improving overall _execution safety_.

**NEORV32-Specific Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
accessed address does not respond or encounters an internal error during access.
* The CPU raises an illegal instruction trap for _all_ unimplemented/malformed/illegal instructions.
* To be continued...
 
<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following nomenclature regarding traps is used:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec`
CSR. The cause of the corresponding interrupt or exception can be determined via the content of the `mcause`
CSR. The address that reflects the current program counter when a trap was taken is stored to the `mepc` CSR.
Additional information regarding the cause of the trap can be retrieved from the `mtval` CSR.

The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
is serviced first while the remaining ones are queued. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
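
The interrupt prioritization can be sketched as a simple C model. The priority order is taken from the "Prio."
column of the NEORV32 trap listing; everything else is an assumption made for illustration: the function name
`next_interrupt_mcause` is hypothetical, and the `pending` argument is assumed to hold one bit per interrupt code
(bit _n_ set = interrupt with `mcause` code _n_ is pending and enabled).

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Interrupt codes from highest to lowest priority (below the NMI) */
static const uint8_t irq_prio_order[] = {
    11, 3, 7,                        /* MEI, MSI, MTI */
    16, 17, 18, 19, 20, 21, 22, 23,  /* FIRQ 0..7 */
    24, 25, 26, 27, 28, 29, 30, 31   /* FIRQ 8..15 */
};

/* Hypothetical model: returns the mcause value of the interrupt that is
 * serviced next, or 0 if no interrupt is pending at all. */
uint32_t next_interrupt_mcause(uint32_t pending, bool nmi_pending)
{
    if (nmi_pending) {
        return 0x80000000u;          /* the NMI has the highest priority */
    }
    for (unsigned i = 0; i < sizeof(irq_prio_order); i++) {
        uint32_t code = irq_prio_order[i];
        if (pending & (UINT32_C(1) << code)) {
            return 0x80000000u | code;
        }
    }
    return 0u;
}
----

Calling the function again after clearing the serviced bit in `pending` yields the interrupt with the next lower
priority, which mirrors the queueing behavior described above.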
 
.Trigger Type
[IMPORTANT]
All CPU interrupt request signals are high-level triggered. An interrupt request will be generated if the
corresponding signal is _high_ for exactly one cycle (being high for several cycles might cause multiple
triggering of the same interrupt).

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations – interrupts can only trigger between two instructions.
 
:sectnums:
==== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus write-operation at all.
 
:sectnums:
==== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
provide custom trap codes in `mcause`. These FIRQs are reserved for processor-internal usage only.


:sectnums:
==== Non-Maskable Interrupt

The NEORV32 CPU features a single non-maskable interrupt source via the `nm_irq_i` CPU (/Processor) top
entity signal. This interrupt can be used to signal _critical_ system conditions that need immediate handling.
The non-maskable interrupt _cannot_ be masked/disabled at all (not even in interrupt service routines).
Hence, it does _not_ provide configuration/status flags in the `mie` and `mip` CSRs. The RISC-V-compatible
`mcause` value `0x80000000` is used to indicate the non-maskable interrupt.
 
<<<
// ####################################################################################################################
:sectnums!:
===== NEORV32 Trap Listing

.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause`     | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1     | `0x80000000` | 1.0      | _TRAP_CODE_NMI_ | non-maskable interrupt | _I-PC_ | _0_
| 2     | `0x8000000B` | 1.11     | _TRAP_CODE_MEI_ | machine external interrupt | _I-PC_ | _0_
| 3     | `0x80000003` | 1.3      | _TRAP_CODE_MSI_ | machine software interrupt | _I-PC_ | _0_
| 4     | `0x80000007` | 1.7      | _TRAP_CODE_MTI_ | machine timer interrupt | _I-PC_ | _0_
| 5     | `0x80000010` | 1.16     | _TRAP_CODE_FIRQ_0_ | fast interrupt request channel 0 | _I-PC_ | _0_
| 6     | `0x80000011` | 1.17     | _TRAP_CODE_FIRQ_1_ | fast interrupt request channel 1 | _I-PC_ | _0_
| 7     | `0x80000012` | 1.18     | _TRAP_CODE_FIRQ_2_ | fast interrupt request channel 2 | _I-PC_ | _0_
| 8     | `0x80000013` | 1.19     | _TRAP_CODE_FIRQ_3_ | fast interrupt request channel 3 | _I-PC_ | _0_
| 9     | `0x80000014` | 1.20     | _TRAP_CODE_FIRQ_4_ | fast interrupt request channel 4 | _I-PC_ | _0_
| 10    | `0x80000015` | 1.21     | _TRAP_CODE_FIRQ_5_ | fast interrupt request channel 5 | _I-PC_ | _0_
| 11    | `0x80000016` | 1.22     | _TRAP_CODE_FIRQ_6_ | fast interrupt request channel 6 | _I-PC_ | _0_
| 12    | `0x80000017` | 1.23     | _TRAP_CODE_FIRQ_7_ | fast interrupt request channel 7 | _I-PC_ | _0_
| 13    | `0x80000018` | 1.24     | _TRAP_CODE_FIRQ_8_ | fast interrupt request channel 8 | _I-PC_ | _0_
| 14    | `0x80000019` | 1.25     | _TRAP_CODE_FIRQ_9_ | fast interrupt request channel 9 | _I-PC_ | _0_
| 15    | `0x8000001a` | 1.26     | _TRAP_CODE_FIRQ_10_ | fast interrupt request channel 10 | _I-PC_ | _0_
| 16    | `0x8000001b` | 1.27     | _TRAP_CODE_FIRQ_11_ | fast interrupt request channel 11 | _I-PC_ | _0_
| 17    | `0x8000001c` | 1.28     | _TRAP_CODE_FIRQ_12_ | fast interrupt request channel 12 | _I-PC_ | _0_
| 18    | `0x8000001d` | 1.29     | _TRAP_CODE_FIRQ_13_ | fast interrupt request channel 13 | _I-PC_ | _0_
| 19    | `0x8000001e` | 1.30     | _TRAP_CODE_FIRQ_14_ | fast interrupt request channel 14 | _I-PC_ | _0_
| 20    | `0x8000001f` | 1.31     | _TRAP_CODE_FIRQ_15_ | fast interrupt request channel 15 | _I-PC_ | _0_
| 21    | `0x00000001` | 0.1      | _TRAP_CODE_I_ACCESS_ | instruction access fault | _B-ADR_ | _PC_
| 22    | `0x00000002` | 0.2      | _TRAP_CODE_I_ILLEGAL_ | illegal instruction | _PC_ | _Inst_
| 23    | `0x00000000` | 0.0      | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 24    | `0x0000000B` | 0.11     | _TRAP_CODE_MENV_CALL_ | environment call from M-mode (ECALL in machine-mode) | _PC_ | _PC_
| 25    | `0x00000008` | 0.8      | _TRAP_CODE_UENV_CALL_ | environment call from U-mode (ECALL in user-mode) | _PC_ | _PC_
| 26    | `0x00000003` | 0.3      | _TRAP_CODE_BREAKPOINT_ | breakpoint (EBREAK) | _PC_ | _PC_
| 27    | `0x00000006` | 0.6      | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 28    | `0x00000004` | 0.4      | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 29    | `0x00000007` | 0.7      | _TRAP_CODE_S_ACCESS_ | store access fault | _B-ADR_ | _B-ADR_
| 30    | `0x00000005` | 0.5      | _TRAP_CODE_L_ACCESS_ | load access fault | _B-ADR_ | _B-ADR_
|=======================
 
**Notes**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the corresponding trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the values written to the
`mepc` and `mtval` CSRs when a trap is triggered:

* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself
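
The relation between the "`mcause`" and "[RISC-V]" columns can be expressed in C: the most significant bit
of `mcause` is the interrupt flag and the remaining bits hold the code. The following sketch is for illustration
only; the helper names are hypothetical and are not the identifiers provided by `neorv32.h`.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of mcause distinguishes interrupts (1) from exceptions (0) */
bool mcause_is_interrupt(uint32_t mcause)
{
    return (mcause >> 31) != 0u;
}

/* The lower 31 bits hold the interrupt/exception code */
uint32_t mcause_code(uint32_t mcause)
{
    return mcause & 0x7FFFFFFFu;
}

/* Hypothetical helper: returns the fast interrupt channel (0..15) for the
 * FIRQ trap codes 0x80000010..0x8000001f, or -1 for any other trap. */
int mcause_firq_channel(uint32_t mcause)
{
    uint32_t code = mcause_code(mcause);
    if (mcause_is_interrupt(mcause) && code >= 16u && code <= 31u) {
        return (int)(code - 16u);
    }
    return -1;
}
----

For example, `0x8000000B` decodes to interrupt flag 1 and code 11 (listed as "1.11", machine external interrupt),
while `0x00000007` decodes to exception code 7 (store access fault).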
 
<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface

The CPU provides two independent bus interfaces: one for fetching instructions (`i_bus_*`) and one for
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.

:sectnums:
===== Address Space

The CPU is a 32-bit architecture with separate instruction and data interfaces making it a Harvard
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_ – however, a software-based handling can be
implemented as any unaligned memory access will trigger a corresponding exception.
 
:sectnums:
===== Interface Signals

The following table shows the signals of the data and instruction interfaces as seen from the CPU
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).

.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================

[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly".
 
:sectnums:
===== Protocol

A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral sets either the `bus_ack_i` signal (-> successful completion) or the
`bus_err_i` signal (-> failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the corresponding instruction bus
access fault or load/store bus access fault exception.

[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.

.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
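
The interplay of acknowledge, error and timeout can be illustrated with a small cycle-based C model. This is a
simplified sketch and not a description of the actual _BUSKEEPER_ logic: the names `bus_transfer` and `bus_result_t`
are hypothetical, and the model assumes that a response arriving exactly in the last cycle of the response time
window is still accepted.

[source,c]
----
/* default value of max_proc_int_response_time_c */
#define MAX_PROC_INT_RESPONSE_TIME 15u

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* Hypothetical model of one processor-internal bus transfer.
 * ack_cycle / err_cycle: number of cycles after the request in which the
 * peripheral sets bus_ack_i / bus_err_i (1 = next cycle, 0 = never). */
bus_result_t bus_transfer(unsigned ack_cycle, unsigned err_cycle)
{
    for (unsigned cycle = 1u; cycle <= MAX_PROC_INT_RESPONSE_TIME; cycle++) {
        if (err_cycle == cycle) {
            return BUS_ERROR;   /* raises a bus access fault exception */
        }
        if (ack_cycle == cycle) {
            return BUS_OK;      /* transfer completed successfully */
        }
    }
    return BUS_TIMEOUT;         /* bus keeper issues a bus error */
}
----

In this model an access to an "address space hole" corresponds to `bus_transfer(0, 0)`, which ends in a timeout,
whereas a default processor-internal module corresponds to `bus_transfer(1, 0)`.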
 
**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================

**Write Access**

For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.

**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
 
**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
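
The access boundaries can be illustrated with a C sketch that checks the natural alignment of a data access and
derives a 4-bit byte enable pattern. The mapping of byte lanes to `bus_ben_o` bits is an assumption made for this
example (bit 0 = byte at the lowest address of the word); the function names are hypothetical.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* true if an access of 'size' bytes (1, 2 or 4) is naturally aligned */
bool access_is_aligned(uint32_t addr, unsigned size)
{
    return (size == 1u || size == 2u || size == 4u) && (addr % size) == 0u;
}

/* Hypothetical helper: byte enable pattern for an aligned access,
 * assuming bit 0 selects the byte at the lowest address of the word.
 * Returns 0 for a misaligned access (which would raise an exception). */
uint8_t byte_enable(uint32_t addr, unsigned size)
{
    if (!access_is_aligned(addr, size)) {
        return 0u;
    }
    uint8_t lanes = (uint8_t)((1u << size) - 1u); /* 0x1, 0x3 or 0xF */
    return (uint8_t)((lanes << (addr & 3u)) & 0xFu);
}
----

Under these assumptions a half-word store to address `0x102` enables the two upper byte lanes (`0xC`), while a
half-word access to `0x101` is misaligned and would trigger the corresponding exception instead of a bus access.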
 
**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write back non-zero and will not generate an actual memory store operation.

The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
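
The behavior of the CPU-internal exclusive access lock can be summarized as a tiny state model in C. This is a
sketch of the rules listed above, not of the hardware: the type `excl_state_t` and the function names are
hypothetical, and the failure code `1` is an assumption (the text only states "non-zero").

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CPU-internal exclusive access lock */
typedef struct {
    bool lock;
} excl_state_t;

/* lr.w sets the lock */
void excl_lr(excl_state_t *s)
{
    s->lock = true;
}

/* any other memory access, any trap or a bus error breaks the lock */
void excl_break(excl_state_t *s)
{
    s->lock = false;
}

/* sc.w evaluates the lock: returns the value written back to the
 * destination register (0 = store performed, 1 = store suppressed).
 * As sc.w is itself a memory access other than lr.w, the lock is
 * cleared afterwards. */
uint32_t excl_sc(excl_state_t *s)
{
    uint32_t result = s->lock ? 0u : 1u;
    s->lock = false;
    return result;
}
----

A typical retry loop in software would repeat the `lr.w`/`sc.w` pair until the store-conditional writes back zero.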
 
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.

**Memory Barriers**

Whenever the CPU executes a fence instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).
 
<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset

In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.

**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a writeback
of the processing result to some kind of memory. The initial status of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
 
**NEORV32 CPU Reset**

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even control
and status registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers will get initialized by the CPU's internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by software (to be more specific, this is done by the `crt0.S` start-up code).

During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The value of this register after reset is uncritical as interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated
hardware reset setting it to low (globally disabling interrupts).
 
**Reset Configuration**

Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of the uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):

[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers --
-- (FALSE = reset value is irrelevant (might simplify HW), default; TRUE = defined LOW reset value)
constant dedicated_reset_c : boolean := false;
----
