OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [docs/] [datasheet/] [cpu.adoc] - Blame information for rev 61

:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]

**Key Features**

* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
** `DB` - debug mode
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate interfaces for instruction fetch and data access (merged into a single bus via a bus switch for
the NEORV32 processor)
* Little-endian byte order
* Configurable hardware reset
* No hardware support for unaligned data/instruction accesses – they will trigger an exception. If the `C` extension is enabled, instructions
can also be 16-bit aligned and a misaligned instruction address exception is no longer possible
 
[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.
 
<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.
 
image::neorv32_cpu.png[align=center]

The CPU uses a pipelined architecture with two main stages. The first stage (IF – instruction fetch)
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
stored to a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed
in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions
via the execute engine.

These two pipeline stages are based on a multi-cycle processing engine, so the processing of certain
operations in each stage can take several cycles. Since the IF and EX stages are decoupled via the instruction
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher, for instance when executing loads/stores,
multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffer
due to a taken branch.

The NEORV32 CPU sits somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction in a series of consecutive micro-operations. The combination of these two classical
design paradigms allows a higher instruction throughput than a pure multi-cycle approach (due to
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).

The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
devices are mapped to a single 32-bit address space, making the architecture a modified von Neumann
architecture.
 
// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `riscv-arch-test` folder. See section <<_risc_v_architecture_test_framework>>
for information on how to run the tests on the NEORV32.
 
.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................

.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................

.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................

.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................

.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................
 
<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently known issues regarding full RISC-V compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.
 
[IMPORTANT]
The `misa` CSR is read-only. It shows the synthesized CPU extensions. Hence, all implemented
CPU extensions are always active and cannot be enabled/disabled dynamically during runtime. Any
write access to it (in machine mode) is ignored and will not cause any exception or side effects.

[IMPORTANT]
The `mip` CSR is read-only. Pending IRQs can be cleared using the `mie` CSR.

[IMPORTANT]
The `mtval` CSR is read-only.

[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
currently supports only the modes _OFF_ and _NAPOT_ and a minimal granularity of 8 bytes per region.

[IMPORTANT]
The `A` CPU extension (atomic memory access) currently implements only the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further AMO operations.
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.
 
.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir.   | Function
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out | destination address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out | destination address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input (from MTIME)
4+^| **Non-Maskable Interrupt (<<_traps_exceptions_and_interrupts>>)**
| `nm_irq_i`       |     1 | in  | non-maskable interrupt
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
| `firq_ack_o`     |    16 | out | fast interrupt acknowledge signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.
 
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The NEORV32 is a RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see _The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual –
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

[TIP]
The CPU can discover available ISA extensions via the <<_misa>> and <<_mzext>> CSRs or by executing an instruction
and checking for an _illegal instruction exception_.
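
As an illustration of the `misa`-based discovery, the following host-side C sketch decodes the extension bits of a 32-bit `misa` value into letters. It assumes the standard RISC-V layout (bit 0 = `A` ... bit 25 = `Z`). The function names are hypothetical and not part of the NEORV32 software framework, and on the target the value would first have to be read from the CSR.

[source,c]
----
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the extension field of a 32-bit misa value into a string.
 * Standard RISC-V layout: bit 0 = 'A', bit 1 = 'B', ..., bit 25 = 'Z'.
 * 'out' must provide room for at least 27 characters.
 * Returns the number of extension letters written. */
static size_t misa_to_string(uint32_t misa, char *out)
{
    assert(out != NULL);
    size_t n = 0;
    for (unsigned bit = 0; bit < 26; bit++) {
        if (misa & (UINT32_C(1) << bit)) {
            out[n++] = (char)('A' + bit);
        }
    }
    out[n] = '\0';
    return n;
}

/* Check for a single extension letter ('A'..'Z'); returns 1 if set. */
static int misa_has(uint32_t misa, char ext)
{
    assert(ext >= 'A' && ext <= 'Z');
    return (int)((misa >> (ext - 'A')) & 1u);
}
----

For the made-up sample value `0x40801105`, `misa_to_string` yields `ACIMX`, and `misa_has(value, 'U')` returns 0 because bit 20 is clear.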
 
==== **`A`** - Atomic Memory Access

Atomic memory access instructions (for implementing semaphores and mutexes) are available when the
`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions
are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented so far, all further atomic operations
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will trigger an illegal instruction exception.

[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
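
The emulation of further AMOs via `lr.w`/`sc.w` follows the usual retry-loop pattern. The sketch below is a host-side C model of that pattern for an `amoadd.w`-style operation. The `reservation_t` type and all function names are hypothetical. The model covers a single hart with one reservation and does not describe the NEORV32 bus behavior. On the target, `lr_w` and `sc_w` would be the actual instructions.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified single-hart reservation model (illustration only). */
typedef struct {
    const volatile uint32_t *addr; /* address of the current reservation */
    bool valid;                    /* true while the reservation holds */
} reservation_t;

/* Model of lr.w: load a word and register a reservation for its address. */
static uint32_t lr_w(reservation_t *res, const volatile uint32_t *addr)
{
    assert(res != NULL && addr != NULL);
    res->addr = addr;
    res->valid = true;
    return *addr;
}

/* Model of sc.w: store only if the reservation is still valid.
 * Returns 0 on success and a non-zero value on failure, like the
 * RISC-V instruction. The reservation is always cleared. */
static uint32_t sc_w(reservation_t *res, volatile uint32_t *addr, uint32_t value)
{
    assert(res != NULL && addr != NULL);
    const bool ok = res->valid && (res->addr == addr);
    res->valid = false;
    if (ok) {
        *addr = value;
    }
    return ok ? 0u : 1u;
}

/* Emulated atomic add: retry until the store-conditional succeeds.
 * Returns the value that was in memory before the addition. */
static uint32_t emulated_amoadd_w(reservation_t *res, volatile uint32_t *addr, uint32_t inc)
{
    uint32_t old;
    do {
        old = lr_w(res, addr);
    } while (sc_w(res, addr, old + inc) != 0u);
    return old;
}
----

Other read-modify-write operations (swap, and, or, min/max) can be built the same way by changing only the value passed to the store-conditional.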
 
==== **`C`** - Compressed Instructions

Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is
_true_. In this case the following instructions are available:

* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the required second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
 
==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware
requirements. This extension is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond
`x15` will raise an _illegal instruction exception_.

Due to the reduced register file an alternate ABI (**`ilp32e`**) is required for the toolchain.
 
==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediates: `lui`, `auipc`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.

[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.
 
==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division instructions are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete, regardless of the input operands.
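
To illustrate why a bit-serial multiplier needs an operand-independent number of steps, here is a small host-side C model of an unsigned shift-and-add multiplication. It is a generic textbook sketch with a hypothetical function name. It is not derived from the NEORV32 RTL and is not cycle-accurate.

[source,c]
----
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned 32x32 -> 64-bit shift-and-add multiplication.
 * One multiplier bit is processed per iteration, so the loop always
 * runs 32 times, no matter which operands are passed in.
 * The iteration count is reported via 'iterations'. */
static uint64_t serial_mulu(uint32_t a, uint32_t b, unsigned *iterations)
{
    assert(iterations != NULL);
    uint64_t product = 0;
    uint64_t addend = a; /* multiplicand, shifted left once per step */
    *iterations = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (b & 1u) {
            product += addend; /* add the partial product for this bit */
        }
        addend <<= 1;
        b >>= 1;
        (*iterations)++;
    }
    return product;
}
----

The lower 32 bits of the result correspond to what `mul` returns and the upper 32 bits to what `mulhu` returns. For both `6 * 7` and `0xFFFFFFFF * 0xFFFFFFFF` the model reports 32 iterations.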
 
==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extension and is intended for small-scale applications that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the `M` extension.

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA (`div`, `divu`, `rem`, `remu`)
will raise an illegal instruction exception.

Note that the `M` and `Zmmul` extensions _cannot_ be enabled in parallel.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=-march=rv32im USER_FLAGS+=-mno-div clean_all exe`).
 
==== **`U`** - Less-Privileged User Mode

Adds the less-privileged _user mode_ when the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For
instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like
peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode.
 
==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.

[NOTE]
The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the `mie`
and `mip` CSRs. This extension is mapped to bits that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for `mcause` are implemented.

[NOTE]
The CPU provides a single _non-maskable_ interrupt (`NMI`) that also provides a custom trap code for `mcause`.

[NOTE]
A custom CSR `mzext` is available that can be used to check for implemented `Z*` CPU extensions
(for example `Zifencei`). This CSR is mapped to the official "custom CSR address region".

[NOTE]
All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception
(see <<_execution_safety>>).
 
==== **`Zfinx`** Single-Precision Floating-Point Operations

The `Zfinx` floating-point extension is an alternative to the `F` floating-point extension that uses the
integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since no dedicated floating-point `f`
register file exists, the `Zfinx` extension requires fewer hardware resources and features faster context changes.
This also implies that there are NO dedicated `f` register file related load/store or move instructions. The
official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension currently supports only single-precision (`.s` suffix) operations (so it is a direct alternative to the `F`
extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_. In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
* computational: `fadd.s`, `fsub.s`, `fmul.s`
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
* number classification: `fclass.s`

* additional CSRs: `fcsr`, `frm`, `fflags`

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
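
The flush-to-zero rule for subnormal inputs can be modeled on a host machine as follows. This is a minimal C sketch with hypothetical function names. It assumes the IEEE-754 binary32 layout (sign bit 31, exponent bits 30:23, mantissa bits 22:0) and only mirrors the rule stated above, not the FPU's internal implementation.

[source,c]
----
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flush a binary32 bit pattern to signed zero if its exponent field is 0.
 * This covers subnormal numbers and leaves +0/-0 unchanged. */
static uint32_t flush_subnormal_bits(uint32_t bits)
{
    const uint32_t exponent = (bits >> 23) & 0xFFu;
    if (exponent == 0u) {
        return bits & UINT32_C(0x80000000); /* keep only the sign bit */
    }
    return bits;
}

/* Same operation on a float value (bit copy via memcpy). */
static float flush_subnormal(float x)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);
    bits = flush_subnormal_bits(bits);
    memcpy(&x, &bits, sizeof x);
    return x;
}
----

For example, the smallest positive subnormal pattern `0x00000001` becomes `0x00000000` (+0), the negative subnormal `0x807FFFFF` becomes `0x80000000` (-0), and normal numbers such as `1.0f` pass through unchanged.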
 
[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).
 
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) are implemented when the
`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are
available:

* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* environment: `mret`, `wfi`

[WARNING]
If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception
support at all. In order to provide the full spectrum of functions and to allow a secure execution
environment, the `Zicsr` extension should always be enabled.

[NOTE]
The "wait for interrupt" instruction `wfi` works like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the corresponding interrupt source has to
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
 
==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

[NOTE]
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
This allows a clean re-fetch of modified data from memory. Also, the top's `i_bus_fence_o` signal is set
high for one cycle to inform the memory system. Any additional flags within the `fence.i` instruction word
are ignored by the hardware.

[NOTE]
If the `Zifencei` extension is disabled (_CPU_EXTENSION_RISCV_Zifencei_ generic = false), a `fence.i`
instruction will be executed as a `nop` (and will **not trap**) and none of the functions
described above will be executed.
 
==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible with the PMP specified by the RISC-V specs.
The CPU PMP currently supports only the _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured
via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the
`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available:

* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers

See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.

**Configuration**

The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus the
number of available `pmpcfg*` and `pmpaddr*` CSRs.
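
Since only _NAPOT_ (naturally aligned power-of-two) regions are supported, software has to encode base address and size into a single `pmpaddr*` value. The following host-side C sketch shows the standard RISC-V encoding for RV32. The helper names are hypothetical and the code does not take a coarser `PMP_MIN_GRANULARITY` setting into account.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Encode a NAPOT region as a pmpaddr value (RV32).
 * 'size' must be a power of two >= 8 bytes and 'base' must be aligned
 * to 'size'. The number of trailing one-bits in the result encodes the
 * region size: 0 ones = 8 bytes, 1 one = 16 bytes, and so on. */
static uint32_t pmp_napot_encode(uint32_t base, uint32_t size)
{
    assert(size >= 8u);
    assert((size & (size - 1u)) == 0u); /* power of two */
    assert((base & (size - 1u)) == 0u); /* naturally aligned */
    return (base | (size / 2u - 1u)) >> 2;
}

/* Recover the region size in bytes from a NAPOT pmpaddr value. */
static uint64_t pmp_napot_size(uint32_t pmpaddr)
{
    unsigned ones = 0;
    while (ones < 32u && ((pmpaddr >> ones) & 1u)) {
        ones++;
    }
    return UINT64_C(1) << (ones + 3u);
}
----

As an example, an 8-byte region at `0x00001000` encodes to `0x00000400`, and a 64 KiB region at `0x80000000` encodes to `0x20001FFF`.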
 
When implementing more PMP regions than a _certain critical limit_, *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data accesses by +1 cycle.

The critical limit can be adapted for custom use by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----
 
**Operation**

Any memory access address (from the CPU's instruction fetch or data access interface) is checked against all
specified (configured via `pmpaddr*` and enabled via `pmpcfg*`) PMP regions. If an
address accesses one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked:

* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set

If an access to a protected region does not have the required access rights (attributes), it will raise the corresponding
_instruction/load/store access fault exception_.

By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs, you need to activate the _locked bit_ in the corresponding
`pmpcfg*` configuration.
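
The access rules above can be sketched as a small behavioral model in C. This is only an illustration, not the hardware implementation: the bit positions of the R/W/X/L attributes follow the RISC-V privileged specification, while the names `access_t` and `pmp_access_allowed` are hypothetical and the region address match is assumed to have already succeeded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute bits of one 8-bit pmpcfg entry (RISC-V privileged specification). */
#define PMP_CFG_R 0x01u /* read permission */
#define PMP_CFG_W 0x02u /* write permission */
#define PMP_CFG_X 0x04u /* execute permission */
#define PMP_CFG_L 0x80u /* locked: rule is also enforced in machine mode */

/* Hypothetical access classification used only by this sketch. */
typedef enum { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH } access_t;

/* Returns true if an access hitting a region with configuration `cfg` is
 * allowed; `machine_mode` is true for machine-level code. */
bool pmp_access_allowed(uint8_t cfg, access_t type, bool machine_mode)
{
    /* Machine-level accesses are only checked if the entry is locked. */
    if (machine_mode && !(cfg & PMP_CFG_L)) {
        return true;
    }
    switch (type) {
    case ACCESS_LOAD:  return (cfg & PMP_CFG_R) != 0;
    case ACCESS_STORE: return (cfg & PMP_CFG_W) != 0;
    case ACCESS_FETCH: return (cfg & PMP_CFG_X) != 0;
    }
    return false; /* unknown access type: deny */
}
```

A denied access in this model corresponds to the instruction/load/store access fault exception described above.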
 
[IMPORTANT]
After updating the address configuration registers `pmpaddr*`, the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.

[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual – Volume II: Privileged Architecture_ specification.
 
==== **`HPM`** Hardware Performance Monitors

In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters, the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split into a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.
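
As an illustration of the split counter layout, the following C sketch combines the high and low CSR words into a single value and keeps only the N implemented bits. The function name `hpm_counter_value` is hypothetical, and the explicit masking is an assumption made for this sketch; it does not describe what the hardware returns for unimplemented bits.

```c
#include <stdint.h>

/* Combine the high-word and low-word CSR values of one HPM counter and keep
 * only the lowest `width` bits (width corresponds to HPM_CNT_WIDTH, 0..64). */
uint64_t hpm_counter_value(uint32_t high, uint32_t low, unsigned width)
{
    uint64_t raw = ((uint64_t)high << 32) | low;

    if (width == 0) {
        return 0;   /* no counter bits implemented */
    }
    if (width >= 64) {
        return raw; /* full 64-bit counter */
    }
    return raw & ((UINT64_C(1) << width) - 1);
}
```

On real hardware the two words are read with separate CSR accesses, so software usually re-reads the high word to detect a carry between the two reads.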
 
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
mandatory performance monitors on every RISC-V platform and have a fixed increment event. For example,
the instructions-retired counter increments with each executed instruction. The actual hardware performance
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
available HPMs is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning zero will exclude
all HPM logic from the design.
 
Depending on the configuration, the following additional CSRs are available:

* counters: `[m]hpmcounter*[h]` (3..31, depending on configuration)
* event configuration: `mhpmevent*` (3..31, depending on configuration)

User-level access to the counter registers `hpmcounter*[h]` can be individually restricted via the `mcounteren` CSR.
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
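
Both `mcounteren` and `mcountinhibit` use one bit per counter, where bit _n_ belongs to HPM counter _n_ (RISC-V privileged specification). A minimal sketch of building a mask for all implemented HPM counters might look like this; the helper name `hpm_counter_mask` is hypothetical, and it assumes the implemented counters are numbered contiguously starting at 3.

```c
#include <stdint.h>

/* Bit mask covering HPM counters 3 .. 3+num_cnts-1, where num_cnts
 * corresponds to the HPM_NUM_CNTS generic (0..29). */
uint32_t hpm_counter_mask(unsigned num_cnts)
{
    if (num_cnts > 29) {
        num_cnts = 29; /* only HPM 3..31 exist */
    }
    if (num_cnts == 0) {
        return 0;
    }
    /* Build num_cnts ones and move them to bit position 3. */
    uint64_t ones = (UINT64_C(1) << num_cnts) - 1;
    return (uint32_t)(ones << 3);
}
```

Writing this mask to `mcountinhibit` would stop all implemented HPM counters, and clearing the same bits in `mcounteren` would hide them from user-level code.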
 
If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPMs are not implemented.
However, accessing their associated CSRs will not raise an illegal instruction exception. These CSRs are
read-only and will always return 0.

[NOTE]
For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>.
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Timing

The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.

Average CPI (cycles per instruction) values for "real applications", such as the CoreMark benchmark, are presented
for different CPU configurations in <<_cpu_performance>>.
 
.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 5
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
|=======================
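
To make the formulas in the table easier to apply, here is a small C sketch that evaluates a few of them. It only restates the table entries for the default (iterative) shifter, branches and memory accesses or jumps; the function names are hypothetical, and the memory latency `ml` has to be supplied by the caller.

```c
#include <stdbool.h>

/* Base-ISA shift with the default iterative shifter: 3 + SA/4 + SA%4,
 * where SA is the shift amount (0..31) and "/" is integer division. */
unsigned shift_cycles(unsigned sa)
{
    sa &= 31u;
    return 3u + sa / 4u + sa % 4u;
}

/* Conditional branch: 5 + ML if taken, 3 if not taken. */
unsigned branch_cycles(bool taken, unsigned ml)
{
    return taken ? 5u + ml : 3u;
}

/* Loads, stores, jumps and calls: 4 + ML. */
unsigned mem_or_jump_cycles(unsigned ml)
{
    return 4u + ml;
}
```

For example, a shift by 31 takes 3 + 7 + 3 = 13 cycles in this model, while a shift by 4 takes only 4 cycles.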
 
[NOTE]
The presented values of the *floating-point execution cycles* are average values obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
 
// ####################################################################################################################
include::cpu_csr.adoc[]
 
<<<
// ####################################################################################################################
:sectnums:
==== Execution Safety

The hardware of the NEORV32 CPU was designed for maximum *execution safety*. If the `Zicsr` CPU
extension is enabled, the core supports **all** traps specified by the official RISC-V specifications (obviously,
not the ones that are related to yet unimplemented extensions/features). Thus, the CPU provides well-defined
hardware fall-backs for (nearly) everything that can go wrong. Even if any kind of trap is triggered, the core
is always in a defined and fully synchronized state throughout the whole architecture (i.e. there is no need to
undo out-of-order operations), which allows predictable execution behavior at any time.
 
**Core Safety Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system (no speculative execution / out-of-order states).
* The CPU supports all bus exceptions including bus access exceptions that are triggered if an
accessed address does not respond or encounters an internal error during access (which is a rare
feature among open-source RISC-V cores).
* The CPU raises an illegal instruction trap for **all** unimplemented/malformed/illegal instructions (to support _full_ virtualization).
* If user-level code tries to read from machine-level-only CSRs (like `mstatus`) an illegal instruction
exception is raised. The result of this operation is always zero (though machine-level
code handling this exception can modify the target register of the illegal access-causing
instruction to allow full virtualization). Illegal write accesses to machine CSRs will not write any data at all.
* Illegal user-level memory accesses to protected addresses or address regions (via physical memory
protection) will not be conducted at all (no actual write and no actual read; prevents triggering of
memory-mapped devices). Illegal load operations will not return any data (the instruction's
destination register will not be written at all).
 
<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following nomenclature regarding traps is used:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec`
CSR. The cause of the corresponding interrupt or exception can be determined via the content of the `mcause`
CSR. The address that reflects the current program counter when a trap was taken is stored in the `mepc` CSR.
Additional information regarding the cause of the trap can be retrieved from the `mtval` CSR.

The traps are prioritized. If several _exceptions_ occur at once, only the one with the highest priority is triggered
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with the highest priority
is serviced first while the remaining ones are queued. After completing the interrupt handler, the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
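
A trap handler typically starts by splitting the `mcause` value into its two parts. The sketch below relies only on the RISC-V privileged specification, where the most significant bit of `mcause` distinguishes interrupts from exceptions and the remaining bits hold the cause code. The function names are hypothetical and not part of the NEORV32 software library.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the trap was an interrupt (asynchronous), false for an exception. */
bool mcause_is_interrupt(uint32_t mcause)
{
    return (mcause >> 31) != 0;
}

/* Interrupt/exception code without the interrupt flag. */
uint32_t mcause_code(uint32_t mcause)
{
    return mcause & UINT32_C(0x7FFFFFFF);
}
```

For instance, `0x8000000B` decodes to an interrupt with code 11, whereas `0x00000002` decodes to an exception with code 2.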
 
.Trigger Type
[IMPORTANT]
All CPU interrupt request signals are high-level triggered. So an interrupt request will be generated if the
corresponding signal is _high_ for exactly one cycle (being high for several cycles might cause multiple
triggering of the same interrupt).

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations; interrupts can only trigger between two instructions.
 
:sectnums:
==== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus write-operation at all.
 
:sectnums:
==== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
provide custom trap codes in `mcause`. These FIRQs are reserved for processor-internal usage only.
 
:sectnums:
==== Non-Maskable Interrupt

The NEORV32 CPU features a single non-maskable interrupt source via the `nm_irq_i` CPU (/Processor) top
entity signal. This interrupt can be used to signal _critical_ system conditions that need immediate handling.
The non-maskable interrupt _cannot_ be masked/disabled at all (not even within interrupt service routines).
Hence, it does _not_ provide configuration/status flags in the `mie` and `mip` CSRs. The RISC-V-compatible
`mcause` value `0x80000000` is used to indicate the non-maskable interrupt.
 
<<<
// ####################################################################################################################
:sectnums!:
===== NEORV32 Trap Listing

.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause`     | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1     | `0x80000000` | 1.0      | _TRAP_CODE_NMI_ | non-maskable interrupt | _I-PC_ | _0_
| 2     | `0x8000000B` | 1.11     | _TRAP_CODE_MEI_ | machine external interrupt | _I-PC_ | _0_
| 3     | `0x80000003` | 1.3      | _TRAP_CODE_MSI_ | machine software interrupt | _I-PC_ | _0_
| 4     | `0x80000007` | 1.7      | _TRAP_CODE_MTI_ | machine timer interrupt | _I-PC_ | _0_
| 5     | `0x80000010` | 1.16     | _TRAP_CODE_FIRQ_0_ | fast interrupt request channel 0 | _I-PC_ | _0_
| 6     | `0x80000011` | 1.17     | _TRAP_CODE_FIRQ_1_ | fast interrupt request channel 1 | _I-PC_ | _0_
| 7     | `0x80000012` | 1.18     | _TRAP_CODE_FIRQ_2_ | fast interrupt request channel 2 | _I-PC_ | _0_
| 8     | `0x80000013` | 1.19     | _TRAP_CODE_FIRQ_3_ | fast interrupt request channel 3 | _I-PC_ | _0_
| 9     | `0x80000014` | 1.20     | _TRAP_CODE_FIRQ_4_ | fast interrupt request channel 4 | _I-PC_ | _0_
| 10    | `0x80000015` | 1.21     | _TRAP_CODE_FIRQ_5_ | fast interrupt request channel 5 | _I-PC_ | _0_
| 11    | `0x80000016` | 1.22     | _TRAP_CODE_FIRQ_6_ | fast interrupt request channel 6 | _I-PC_ | _0_
| 12    | `0x80000017` | 1.23     | _TRAP_CODE_FIRQ_7_ | fast interrupt request channel 7 | _I-PC_ | _0_
| 13    | `0x80000018` | 1.24     | _TRAP_CODE_FIRQ_8_ | fast interrupt request channel 8 | _I-PC_ | _0_
| 14    | `0x80000019` | 1.25     | _TRAP_CODE_FIRQ_9_ | fast interrupt request channel 9 | _I-PC_ | _0_
| 15    | `0x8000001a` | 1.26     | _TRAP_CODE_FIRQ_10_ | fast interrupt request channel 10 | _I-PC_ | _0_
| 16    | `0x8000001b` | 1.27     | _TRAP_CODE_FIRQ_11_ | fast interrupt request channel 11 | _I-PC_ | _0_
| 17    | `0x8000001c` | 1.28     | _TRAP_CODE_FIRQ_12_ | fast interrupt request channel 12 | _I-PC_ | _0_
| 18    | `0x8000001d` | 1.29     | _TRAP_CODE_FIRQ_13_ | fast interrupt request channel 13 | _I-PC_ | _0_
| 19    | `0x8000001e` | 1.30     | _TRAP_CODE_FIRQ_14_ | fast interrupt request channel 14 | _I-PC_ | _0_
| 20    | `0x8000001f` | 1.31     | _TRAP_CODE_FIRQ_15_ | fast interrupt request channel 15 | _I-PC_ | _0_
| 21    | `0x00000001` | 0.1      | _TRAP_CODE_I_ACCESS_ | instruction access fault | _B-ADR_ | _PC_
| 22    | `0x00000002` | 0.2      | _TRAP_CODE_I_ILLEGAL_ | illegal instruction | _PC_ | _Inst_
| 23    | `0x00000000` | 0.0      | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 24    | `0x0000000B` | 0.11     | _TRAP_CODE_MENV_CALL_ | environment call from M-mode (ECALL in machine-mode) | _PC_ | _PC_
| 25    | `0x00000008` | 0.8      | _TRAP_CODE_UENV_CALL_ | environment call from U-mode (ECALL in user-mode) | _PC_ | _PC_
| 26    | `0x00000003` | 0.3      | _TRAP_CODE_BREAKPOINT_ | breakpoint (EBREAK) | _PC_ | _PC_
| 27    | `0x00000006` | 0.6      | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 28    | `0x00000004` | 0.4      | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 29    | `0x00000007` | 0.7      | _TRAP_CODE_S_ACCESS_ | store access fault | _B-ADR_ | _B-ADR_
| 30    | `0x00000005` | 0.5      | _TRAP_CODE_L_ACCESS_ | load access fault | _B-ADR_ | _B-ADR_
|=======================
 
**Notes**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the corresponding trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the values written to the
`mepc` and `mtval` CSRs when a trap is triggered:

* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself
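
The interrupt part of the priority scheme can be modeled in a few lines of C. The priority order and `mcause` values are taken from the table above; everything else is an assumption of this sketch: the function name is hypothetical, and the `pending` argument is an invented encoding in which bit _n_ is set when the interrupt with cause code _n_ is pending (bit 0 stands for the non-maskable interrupt).

```c
#include <stddef.h>
#include <stdint.h>

/* Interrupt cause codes in descending priority, as listed in the table:
 * NMI (0), MEI (11), MSI (3), MTI (7), then FIRQ 0..15 (codes 16..31). */
static const uint8_t irq_priority_order[] = {
    0, 11, 3, 7,
    16, 17, 18, 19, 20, 21, 22, 23,
    24, 25, 26, 27, 28, 29, 30, 31
};

/* Returns the mcause value of the pending interrupt with the highest
 * priority, or 0 if no interrupt is pending. */
uint32_t next_interrupt_mcause(uint32_t pending)
{
    size_t n = sizeof irq_priority_order / sizeof irq_priority_order[0];

    for (size_t i = 0; i < n; i++) {
        uint32_t code = irq_priority_order[i];
        if (pending & (UINT32_C(1) << code)) {
            return UINT32_C(0x80000000) | code;
        }
    }
    return 0;
}
```

Calling the function again after clearing the serviced bit yields the interrupt with the next highest priority, which mirrors the queuing behavior described earlier.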
 
<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface

The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.
 
:sectnums:
===== Address Space

The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4 GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_. However, a software-based handling can be
implemented as any unaligned memory access will trigger a corresponding exception.
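
As a minimal sketch, the alignment condition can be written in C as follows. It assumes the usual RISC-V notion of natural alignment (half-word accesses on 2-byte boundaries, word accesses on 4-byte boundaries); the function name `is_naturally_aligned` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an access of `size_bytes` (1, 2 or 4) to `addr` is
 * naturally aligned, i.e. the address is a multiple of the access size. */
bool is_naturally_aligned(uint32_t addr, unsigned size_bytes)
{
    if (size_bytes != 1 && size_bytes != 2 && size_bytes != 4) {
        return false; /* unsupported access size */
    }
    return (addr & (size_bytes - 1u)) == 0;
}
```

An access for which this check fails is the kind of access that raises a misaligned-address exception, which a software handler could then emulate with several smaller accesses.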
 
:sectnums:
===== Interface Signals

The following table shows the signals of the data and instruction interfaces seen from the CPU
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).

.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================
 
[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly".
 
:sectnums:
===== Protocol

A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral sets either the `bus_ack_i` signal (-> successful completion) or the
`bus_err_i` signal (-> failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the corresponding instruction bus
access fault or load/store bus access fault exception.
 
[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.
 
.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
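
The response time window can be illustrated with a cycle-counting model in C. This is a sketch under stated assumptions, not a description of the _BUSKEEPER_ logic: the names are hypothetical, cycles are counted from 1 for the first cycle after the request, and a response in cycle 15 is assumed to still be inside the window.

```c
/* Default response time window in cycles (max_proc_int_response_time_c). */
#define MAX_PROC_INT_RESPONSE_TIME 15u

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* Model of one bus transfer. `ack_cycle` and `err_cycle` give the cycle in
 * which the peripheral raises bus_ack_i or bus_err_i; 0 means "never". */
bus_result_t bus_transfer(unsigned ack_cycle, unsigned err_cycle)
{
    for (unsigned cycle = 1; cycle <= MAX_PROC_INT_RESPONSE_TIME; cycle++) {
        if (err_cycle == cycle) {
            return BUS_ERROR;   /* peripheral signals a failed transfer */
        }
        if (ack_cycle == cycle) {
            return BUS_OK;      /* peripheral acknowledges the transfer */
        }
    }
    return BUS_TIMEOUT;         /* no response: bus keeper raises an error */
}
```

Both `BUS_ERROR` and `BUS_TIMEOUT` would end in a bus access fault exception from the CPU's point of view; they are kept separate here only to show where the error originates.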
 
**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================
 
**Write Access**

For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.
 
**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
 
**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
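
One plausible way to derive the 4-bit `bus_ben_o` value for a store is sketched below in C. The mapping is an assumption made for illustration (little-endian byte lanes, with lane 0 holding the byte at the word-aligned address); the text above does not spell it out, and the function name `byte_enable` is hypothetical.

```c
#include <stdint.h>

/* Returns the byte-enable mask for a store of `size_bytes` (1, 2 or 4) to
 * `addr`, or 0 if the access size is unsupported or the access is not
 * naturally aligned. Bit n of the result enables byte lane n. */
uint8_t byte_enable(uint32_t addr, unsigned size_bytes)
{
    unsigned lane = addr & 3u; /* byte offset within the 32-bit word */

    if (size_bytes != 1 && size_bytes != 2 && size_bytes != 4) {
        return 0;
    }
    if (lane % size_bytes != 0) {
        return 0; /* misaligned: no bus access is generated */
    }
    /* size_bytes ones, shifted to the addressed lane. */
    return (uint8_t)(((1u << size_bytes) - 1u) << lane);
}
```

Under these assumptions a word store enables all four lanes, while a byte store to an address ending in `0x3` enables only the uppermost lane.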
 
**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write back non-zero and will not generate an actual memory store operation.

The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
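
The lock handling described above can be captured in a small state model. The C sketch below uses hypothetical names throughout and models only the CPU-internal lock, not the reservation bookkeeping of the memory system. Returning 1 for a failed `sc.w` is an illustrative choice; the text only requires a non-zero value.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPU-internal exclusive access lock (drives d_bus_lock_o in the text). */
typedef struct { bool lock; } excl_state_t;

/* lr.w sets the lock. */
void exec_lr(excl_state_t *s) { s->lock = true; }

/* Any trap (synchronous or asynchronous) breaks the lock. */
void take_trap(excl_state_t *s) { s->lock = false; }

/* A bus error signaled by the memory system breaks the lock. */
void signal_bus_error(excl_state_t *s) { s->lock = false; }

/* Any memory access other than lr.w breaks the lock. */
void exec_other_mem_access(excl_state_t *s) { s->lock = false; }

/* sc.w: returns the write-back value (0 = store performed, non-zero = store
 * suppressed). As a memory access other than lr.w it also clears the lock. */
uint32_t exec_sc(excl_state_t *s)
{
    uint32_t result = s->lock ? 0u : 1u;
    s->lock = false;
    return result;
}

/* Example sequence: lr.w, optionally a trap in between, then sc.w. */
uint32_t lr_sc_sequence(bool trap_between)
{
    excl_state_t s = { false };
    exec_lr(&s);
    if (trap_between) {
        take_trap(&s);
    }
    return exec_sc(&s);
}
```

In this model an undisturbed `lr.w`/`sc.w` pair succeeds, while a trap between the two instructions makes the `sc.w` fail, so software has to retry the sequence.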
 
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
 
**Memory Barriers**

Whenever the CPU executes a fence instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).
 
<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset

In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.
 
**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a writeback
of the processing result to some kind of memory. The initial status of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
 
**NEORV32 CPU Reset**

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers will get initialized by the CPU’s internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by software (to be more specific, this is done by the `crt0.S` start-up code).

During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example, the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The value of this register after reset is uncritical as interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated
hardware reset setting it to low (globally disabling interrupts).
 
**Reset Configuration**

Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of the uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):

[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers --
-- FALSE = reset value is irrelevant (might simplify HW), default
-- TRUE  = defined LOW reset value
constant dedicated_reset_c : boolean := false;
----
