OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]

**Key Features**

* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions:
** `A` - atomic memory access operations
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zbb` - basic bit-manipulation operations
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
** `DB` - debug mode
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate interfaces for instruction fetch and data access (merged into a single bus via a bus switch for
the NEORV32 processor)
* Little-endian byte order
* Configurable hardware reset
* No hardware support for unaligned data/instruction accesses – they will trigger an exception.

[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course, you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU uses a pipelined architecture with two main stages. The first stage (IF – instruction fetch)
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
stored in a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed
in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions
via the execute engine.

These two pipeline stages are based on a multi-cycle processing engine, so the processing of each stage for a
certain operation can take several cycles. Since the IF and EX stages are decoupled via the instruction
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher, for instance when executing loads/stores or
multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffer
due to a taken branch.

The NEORV32 CPU sits somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction in a series of consecutive micro-operations. Combining these two classical
design paradigms yields a higher instruction throughput than a pure multi-cycle approach (due to
the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).

The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
devices are mapped to a single 32-bit address space, making the architecture a modified von Neumann
architecture.


// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the `rv32_m/I`, `rv32_m/M`, `rv32_m/C`, `rv32_m/privilege`, and
`rv32_m/Zifencei` tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.

[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.

.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................

.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................

.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................

.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................

.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................


<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently identified issues regarding full RISC-V compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.

.Hardwired R/W CSRs
[IMPORTANT]
The `misa`, `mip` and `mtval` CSRs in the NEORV32 are _read-only_.
Any write access to them (in machine mode) is ignored and will _not_ cause any exceptions or side-effects.

.Physical memory protection
[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
currently only supports the modes _OFF_ and _NAPOT_ and a minimal granularity of 8 bytes per region.

.Atomic memory operations
[IMPORTANT]
The `A` CPU extension currently only implements the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.

.Instruction Misalignment
[NOTE]
This is not a real RISC-V incompatibility, but something that might not be clear when studying the RISC-V privileged
architecture specifications: for 32-bit only instructions (no `C` extension) the misaligned instruction exception
is raised if bit 1 of the access address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
there will be no misaligned instruction exceptions _at all_.
In both cases bit 0 of the program counter and all related registers is hardwired to zero.
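
The misalignment rule from the note above can be written down as a small host-side C sketch. The function names are hypothetical and are not part of the NEORV32 software framework; the code only restates the rule and does not model the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the program counter is hardwired to zero. */
uint32_t pc_effective(uint32_t target)
{
    return target & ~UINT32_C(1);
}

/* Returns true if fetching from 'target' would raise a misaligned
 * instruction exception under the rule described in the note.
 * has_c_ext: true if the C extension is implemented. */
bool fetch_is_misaligned(uint32_t target, bool has_c_ext)
{
    if (has_c_ext) {
        return false; /* 16-bit alignment always holds because bit 0 is zero */
    }
    /* without C: bit 1 set means "not on a 32-bit boundary" */
    return (pc_effective(target) & UINT32_C(2)) != 0;
}
```

For example, a target of `0x102` counts as misaligned only when the `C` extension is not implemented.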


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.

.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir.   | Function
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out | destination address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out | destination address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input (from MTIME)
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======


<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The NEORV32 is a RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see _The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

[TIP]
The CPU can discover available ISA extensions via the <<_misa>> CSR and the
`CPU` <<_system_configuration_information_memory_sysinfo, SYSINFO>> register
or by executing an instruction and checking for an _illegal instruction exception_.
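
As an illustration of the `misa`-based discovery, the following host-side C sketch decodes the single-letter extension flags from a `misa` value that has already been read. It relies on the RISC-V convention that bit 0 corresponds to `A` and bit 25 to `Z`; the function names are hypothetical. Multi-letter extensions such as `Zicsr` or `Zbb` have no `misa` bit and are not covered by this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the single-letter extension 'letter' ('A'..'Z') is flagged in misa. */
bool misa_has_extension(uint32_t misa, char letter)
{
    if (letter < 'A' || letter > 'Z') {
        return false;
    }
    return ((misa >> (letter - 'A')) & 1u) != 0;
}

/* Writes the flagged extension letters into 'out' (at least 27 bytes)
 * as a NUL-terminated string and returns the number of letters. */
int misa_extension_string(uint32_t misa, char out[27])
{
    int n = 0;
    for (char c = 'A'; c <= 'Z'; c++) {
        if (misa_has_extension(misa, c)) {
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return n;
}
```

On the target, the `misa` value would come from a CSR read; here it is simply passed in as an argument so the decoding can be tried on any host.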

[NOTE]
Executing an instruction from an extension that is not implemented or not enabled (for example via the according
top entity generic) will raise an _illegal instruction_ exception.


==== **`A`** - Atomic Memory Access

Atomic memory access instructions (for implementing semaphores and mutexes) are available when the
`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions
are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented so far, all further atomic operations
(read-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instruction’s ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will trigger an illegal instruction exception.
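
The emulation idea can be sketched with a toy host-side model in C. The reservation structure and all function names are assumptions made for illustration only; on the real CPU the two model functions would be the `lr.w` and `sc.w` instructions themselves. The sketch shows how an `amoadd.w`-style operation reduces to a retry loop around a load-reserved/store-conditional pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy single-hart model of a reservation; not the NEORV32 hardware. */
typedef struct {
    bool      valid; /* a reservation is currently held */
    uint32_t *addr;  /* reserved word */
} reservation_t;

/* Model of lr.w: load the word and register a reservation. */
uint32_t model_lr_w(reservation_t *r, uint32_t *addr)
{
    r->valid = true;
    r->addr  = addr;
    return *addr;
}

/* Model of sc.w: store only if the reservation still holds.
 * Returns 0 on success and a non-zero value on failure, like sc.w. */
uint32_t model_sc_w(reservation_t *r, uint32_t *addr, uint32_t value)
{
    bool ok = r->valid && (r->addr == addr);
    r->valid = false; /* any sc.w consumes the reservation */
    if (ok) {
        *addr = value;
        return 0;
    }
    return 1;
}

/* amoadd.w emulation: retry the lr/sc pair until the store succeeds,
 * then return the value that was in memory before the addition. */
uint32_t emulated_amoadd_w(reservation_t *r, uint32_t *addr, uint32_t inc)
{
    uint32_t old;
    do {
        old = model_lr_w(r, addr);
    } while (model_sc_w(r, addr, old + inc) != 0);
    return old;
}
```

Other read-modify-write operations (swap, and, or, min/max) follow the same pattern with a different computation between the two calls.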

[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.


==== **`C`** - Compressed Instructions

Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is
_true_. In this case the following instructions are available:

* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the required second half-word of that instruction. This performance penalty can be avoided
by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
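
The condition described in the note can be expressed as a short host-side C sketch. It uses the RISC-V encoding rule that a 32-bit instruction has both of the two lowest bits of its first half-word set; the function names are hypothetical and the check does not model the fetch engine itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* A RISC-V instruction uses the 32-bit (uncompressed) encoding when the two
 * lowest bits of its first half-word are both set. */
bool is_uncompressed(uint16_t first_halfword)
{
    return (first_halfword & 0x3u) == 0x3u;
}

/* True if a branch target holds an uncompressed instruction that is only
 * 16-bit aligned, so its second half-word lies in the next 32-bit word. */
bool target_straddles_word(uint32_t target_addr, uint16_t first_halfword)
{
    bool unaligned = (target_addr & 0x2u) != 0;
    return unaligned && is_uncompressed(first_halfword);
}
```

With the alignment flags listed above, branch targets sit on 32-bit boundaries, so this condition does not occur for them.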


==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware
requirements. This extension is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond
`x15` will raise an _illegal instruction exception_.

[IMPORTANT]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.


==== **`I`** - Base Integer ISA
The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediates: `lui`, `auipc`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
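
To illustrate why a bit-serial shifter has a data-dependent run time, here is a minimal host-side C model. The type and function names are hypothetical, and the iteration count is a property of this model only; it is not a cycle-accurate description of the NEORV32 shift unit.

```c
#include <stdint.h>

typedef struct {
    uint32_t result;
    unsigned iterations; /* one single-bit shift step per iteration */
} shift_trace_t;

/* Logical left shift done one bit per step.
 * Only the low 5 bits of shamt are used, as for RV32 shifts. */
shift_trace_t serial_sll(uint32_t value, uint32_t shamt)
{
    shift_trace_t t = { value, 0 };
    for (uint32_t remaining = shamt & 31u; remaining != 0; remaining--) {
        t.result <<= 1;
        t.iterations++;
    }
    return t;
}
```

A barrel shifter, in contrast, produces the result for any shift amount in a fixed number of steps, which is the trade-off selected by the `FAST_SHIFT_EN` generic.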

[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top’s `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.


==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division instructions are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete - regardless of the input operands.
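
A bit-serial multiplier with an operand-independent run time can be sketched in C as a shift-and-add loop that always executes 32 iterations. This is a host-side illustration with a hypothetical function name, not a description of the actual multiplier core.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64-bit shift-and-add multiplication.
 * The loop always runs 32 times, independent of the operand values.
 * If 'iterations' is not a null pointer it receives the loop count. */
uint64_t serial_mulu(uint32_t a, uint32_t b, unsigned *iterations)
{
    uint64_t product = 0;
    unsigned count = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        if ((b >> bit) & 1u) {
            product += (uint64_t)a << bit; /* add the shifted multiplicand */
        }
        count++;
    }
    if (iterations) {
        *iterations = count;
    }
    return product;
}
```

The low 32 bits of the returned product correspond to the result of `mul`, the high 32 bits to the result of `mulhu`.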


==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extension and is intended for small-scale applications that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only ~50% of the hardware utilization of the `M` extension.

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.

Note that the `M` and `Zmmul` extensions _cannot_ be enabled at the same time.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=-march=rv32im USER_FLAGS+=-mno-div clean_all exe`).
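
When divisions are computed in software, the compiler's runtime library supplies the routine. The following C sketch shows one possible way such a routine can work (restoring division, one quotient bit per iteration). It is an illustration with hypothetical names, not the code the toolchain actually links in. Division by zero follows the convention of the RISC-V `divu`/`remu` instructions.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} udivrem_t;

/* Restoring division, one quotient bit per iteration.
 * Division by zero follows the RISC-V divu/remu convention:
 * the quotient is all ones and the remainder is the dividend. */
udivrem_t soft_udivrem(uint32_t dividend, uint32_t divisor)
{
    udivrem_t r = { 0, 0 };
    if (divisor == 0) {
        r.quotient  = UINT32_MAX;
        r.remainder = dividend;
        return r;
    }
    uint64_t rem = 0;
    for (int bit = 31; bit >= 0; bit--) {
        rem = (rem << 1) | ((dividend >> bit) & 1u); /* bring down next bit */
        if (rem >= divisor) {
            rem -= divisor;
            r.quotient |= UINT32_C(1) << bit;
        }
    }
    r.remainder = (uint32_t)rem;
    return r;
}
```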


==== **`U`** - Less-Privileged User Mode

Adds the less-privileged _user mode_ if the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For
instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like
peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode.


==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.

The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the `mie`
and `mip` CSRs. This extension is mapped to _reserved_ CSR bits that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for `mcause` are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).


==== **`Zfinx`** Single-Precision Floating-Point Operations

[WARNING]
The NEORV32 `Zfinx` extension is specification-compliant and operational but still _experimental_.

The `Zfinx` floating-point extension is an alternative to the `F` floating-point extension that uses the
integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since no dedicated floating-point `f`
register file exists, the `Zfinx` extension requires fewer hardware resources and features faster context changes.
This also implies that there are NO dedicated `f` register file related load/store or move instructions. The
official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension currently only supports single-precision (`.s` suffix) operations (so it is a direct alternative to the `F`
extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_. In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
* computational: `fadd.s`, `fsub.s`, `fmul.s`
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
* number classification: `fclass.s`

* additional CSRs: `fcsr`, `frm`, `fflags`

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
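
The flush-to-zero behavior can be illustrated on raw single-precision bit patterns. The following host-side C sketch (hypothetical function name) reduces every value with an all-zero exponent field to a zero that keeps the original sign; it describes the effect only, not the FPU implementation.

```c
#include <stdint.h>

/* Flush a subnormal single-precision bit pattern to a signed zero.
 * Every pattern with an all-zero exponent field is reduced to its sign bit,
 * so +0 and -0 pass through unchanged. */
uint32_t flush_subnormal_to_zero(uint32_t f32_bits)
{
    const uint32_t exponent_mask = UINT32_C(0x7F800000);
    const uint32_t sign_mask     = UINT32_C(0x80000000);
    if ((f32_bits & exponent_mask) == 0) {
        return f32_bits & sign_mask;
    }
    return f32_bits;
}
```

Software that has to produce bit-identical results on a host and on the NEORV32 could apply such a function to the host-side operands and results, assuming no other differences are involved.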

[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).


==== **`Zbb`** Basic Bit-Manipulation Operations

[WARNING]
The NEORV32 `Zbb` extension is specification-compliant and operational but still _experimental_.

The `Zbb` extension implements the _basic_ sub-set of the RISC-V bit-manipulation extensions `B`.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip

The `Zbb` extension is implemented when the `CPU_EXTENSION_RISCV_Zbb` configuration
generic is _true_. In this case the following instructions are available:

* `andn`, `orn`, `xnor`
* `clz`, `ctz`, `cpop`
* `max`, `maxu`, `min`, `minu`
* `sext.b`, `sext.h`, `zext.h`
* `rol`, `ror`, `rori`
* `orc.b`, `rev8`
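
For readers less familiar with these operations, the following C functions are plain software reference models of four of the listed instructions for a 32-bit register width. The function names are hypothetical and unrelated to the intrinsic library shipped with the project; the models follow the RISC-V bit-manipulation definitions and are handy for checking results on a host.

```c
#include <stdint.h>

/* clz: count leading zero bits; returns 32 for an all-zero input. */
uint32_t zbb_clz(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t mask = UINT32_C(0x80000000); mask != 0 && (x & mask) == 0; mask >>= 1) {
        n++;
    }
    return n;
}

/* rol: rotate left by the low 5 bits of the shift amount. */
uint32_t zbb_rol(uint32_t x, uint32_t shamt)
{
    shamt &= 31u;
    return (shamt == 0) ? x : ((x << shamt) | (x >> (32u - shamt)));
}

/* orc.b: each byte becomes 0xFF if any of its bits is set, else 0x00. */
uint32_t zbb_orc_b(uint32_t x)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        if ((x >> (8u * i)) & 0xFFu) {
            r |= UINT32_C(0xFF) << (8u * i);
        }
    }
    return r;
}

/* rev8: reverse the byte order of the word. */
uint32_t zbb_rev8(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
}
```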

[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
shift-related `Zbb` instructions.

[IMPORTANT]
The `Zbb` extension is frozen but not officially ratified yet. There is no
software support for this extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the provided `Zbb` extension from C-language
code (see `sw/example/bitmanip_test`).


==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) are implemented when the
`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are
available:

* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* environment: `mret`, `wfi`

[WARNING]
If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception
support at all. In order to provide the full spectrum of functions and to allow a secure execution
environment, the `Zicsr` extension should always be enabled.

[NOTE]
The "wait for interrupt" instruction `wfi` works like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.

[IMPORTANT]
The `wfi` instruction will raise an illegal instruction exception when executed outside of machine-mode
and <<_mstatus>> bit `TW` (timeout wait) is set.
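
The two `wfi` conditions above can be summarized as predicates over CSR values. The C sketch below is a host-side illustration with hypothetical function names; the bit positions of `MIE` (bit 3) and `TW` (bit 21) in `mstatus` are taken from the RISC-V privileged architecture specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_MIE (UINT32_C(1) << 3)  /* global machine interrupt enable */
#define MSTATUS_TW  (UINT32_C(1) << 21) /* timeout wait */

/* True if a sleeping CPU would wake up: a pending interrupt source must be
 * enabled in mie and the global enable flag in mstatus must be set. */
bool wfi_wakes_up(uint32_t mip, uint32_t mie, uint32_t mstatus)
{
    return ((mip & mie) != 0) && ((mstatus & MSTATUS_MIE) != 0);
}

/* True if executing wfi raises an illegal instruction exception:
 * not in machine mode while mstatus.TW is set. */
bool wfi_is_illegal(bool in_machine_mode, uint32_t mstatus)
{
    return !in_machine_mode && ((mstatus & MSTATUS_TW) != 0);
}
```

Checking `wfi_wakes_up` before going to sleep is a simple way to reason about whether a given interrupt setup can ever resume execution.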


==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

[NOTE]
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fence_o` signal is set
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.


==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible with the PMP specified by the RISC-V specs.
The CPU PMP currently only supports _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured
via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the
`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available:

* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers

See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.
600
 
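Since only _NAPOT_ (naturally aligned power-of-two) regions are supported, it can help to see how such a region maps to a `pmpaddr*` value. The following host-side C sketch uses the encoding from the RISC-V privileged specification: the register holds address bits 33:2, and `k` trailing one-bits select a region of 2^(k+3)^ bytes, so the smallest region is 8 bytes. The function names are hypothetical and not part of the NEORV32 software library.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Encode a NAPOT region (base address, size in bytes) into a pmpaddr value. */
static uint32_t pmp_napot_encode(uint32_t base, uint32_t size) {
    assert(size >= 8 && (size & (size - 1)) == 0); /* power of two, at least 8 bytes */
    assert((base & (size - 1)) == 0);              /* base aligned to the region size */
    return (base >> 2) | ((size >> 3) - 1);
}

/* Recover the region size in bytes from a NAPOT-encoded pmpaddr value. */
static uint64_t pmp_napot_size(uint32_t pmpaddr) {
    uint64_t size = 8;
    while (pmpaddr & 1u) { /* each trailing one-bit doubles the region size */
        size <<= 1;
        pmpaddr >>= 1;
    }
    return size;
}
----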
**Configuration**

The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
number of available `pmpcfg*` and `pmpaddr*` CSRs.

When implementing more PMP regions than a _certain critical limit_ *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data accesses by +1 cycle.

The critical limit can be adapted for custom use by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----
 
**Operation**

Any memory access address (from the CPU's instruction fetch or data access interface) is tested if it is accessing any
of the specified (configured via `pmpaddr*` and enabled via `pmpcfg*`) PMP regions. If an
address accesses one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked:

* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set

If an access to a protected region does not have the required access rights (attributes) it will raise the corresponding
_instruction/load/store access fault exception_.

By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs you need to activate the _locked bit_ in the corresponding
`pmpcfg*` configuration.
 
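The permission rules above can be expressed as a short decision function. The following is an illustrative host-side C sketch, not the hardware implementation: it assumes the address has already matched an enabled region, uses the `pmpcfg` bit positions from the RISC-V privileged specification (R = bit 0, W = bit 1, X = bit 2, L = bit 7), and all names are hypothetical.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

#define PMPCFG_R (1u << 0) /* read permission */
#define PMPCFG_W (1u << 1) /* write permission */
#define PMPCFG_X (1u << 2) /* execute permission */
#define PMPCFG_L (1u << 7) /* locked: rules are also enforced in machine-mode */

typedef enum { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH } access_t;

/* Returns true if an access hitting the region described by cfg is permitted;
 * false corresponds to a load/store/instruction access fault exception. */
static bool pmp_access_allowed(uint8_t cfg, bool machine_mode, access_t type) {
    if (machine_mode && !(cfg & PMPCFG_L)) {
        return true; /* checks apply to user-level only unless the entry is locked */
    }
    switch (type) {
        case ACCESS_LOAD:  return (cfg & PMPCFG_R) != 0;
        case ACCESS_STORE: return (cfg & PMPCFG_W) != 0;
        case ACCESS_FETCH: return (cfg & PMPCFG_X) != 0;
    }
    return false;
}
----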
[IMPORTANT]
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.

[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual – Volume II: Privileged Architecture_ specifications.
 
==== **`HPM`** Hardware Performance Monitors

In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split into a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.
 
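Because each counter is split into a low-word and a high-word CSR, software has to combine both words to obtain the full count. The following host-side C sketch shows that step for an N-bit counter. It is an illustration under stated assumptions: the function name is hypothetical, and it assumes that counter bits above `HPM_CNT_WIDTH` read as zero, which is modeled here by masking.

[source,c]
----
#include <stdint.h>

/* Combine the low and high CSR words of an HPM counter into one value,
 * keeping only the lowest `width` bits (width = HPM_CNT_WIDTH, 0..64). */
static uint64_t hpm_counter_value(uint32_t low, uint32_t high, unsigned width) {
    uint64_t raw = ((uint64_t)high << 32) | low;
    if (width >= 64) {
        return raw; /* full 64-bit counter, nothing to mask */
    }
    return raw & ((UINT64_C(1) << width) - 1); /* width = 0 yields 0 */
}
----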
The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
the instructions-retired counter increments with each executed instruction. The actual hardware performance
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
available HPMs is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will exclude
all HPM logic from the design.

Depending on the configuration, the following additional CSRs are available:

* counters: `mhpmcounter*[h]` (3..31, depending on configuration)
* event configuration: `mhpmevent*` (3..31, depending on configuration)

[IMPORTANT]
The HPM counter CSRs can only be accessed in machine-mode. Hence, the corresponding `mcounteren` CSR bits
are always zero and read-only.

Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.

If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPM CSRs are not implemented and the
corresponding `mcountinhibit` CSR bits are hardwired to zero.
However, accessing their associated CSRs will not raise an illegal instruction exception (if in machine mode).
These CSRs are read-only and will always return 0.

[NOTE]
For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>.
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Timing

The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.

Average CPI (cycles per instruction) values for "real applications", such as the CoreMark benchmark, executed on different CPU
configurations are presented in <<_cpu_performance>>.
 
.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.break` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 5
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Basic bit-manip - logic | `Zbb` | `andn` `orn` `xnor` | 3
| Basic bit-manip - shift | `Zbb` | `clz` `ctz` `cpop` `rol` `ror` `rori` | 4+SA, FAST_SHIFT: 4
| Basic bit-manip - arith | `Zbb` | `max` `maxu` `min` `minu` | 3
| Basic bit-manip - misc  | `Zbb` | `sext.b` `sext.h` `zext.h` `orc.b` `rev8` | 3
|=======================
 
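The cycle formulas in the table can be evaluated for concrete operands. The following C sketch does this for the default (iterative) base-ISA shifter and for the entries that depend on the memory latency `ML`. It is only a reading aid: the function names are hypothetical, and `SA/4` is interpreted as integer division, which is an assumption about the table notation.

[source,c]
----
/* Base-ISA shift in the default configuration: 3 + SA/4 + SA%4 cycles,
 * where sa is the shift amount (0..31). */
static unsigned shift_cycles(unsigned sa) {
    return 3u + sa / 4u + sa % 4u;
}

/* Taken branch: 5 + ML cycles, where ml is the memory latency in cycles. */
static unsigned taken_branch_cycles(unsigned ml) {
    return 5u + ml;
}

/* Jumps, calls and memory accesses: 4 + ML cycles. */
static unsigned jump_or_mem_cycles(unsigned ml) {
    return 4u + ml;
}
----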
[NOTE]
The presented values of the *floating-point execution cycles* are average values – obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
 
// ####################################################################################################################
include::cpu_csr.adoc[]
 
<<<
// ####################################################################################################################
:sectnums:
==== Full Virtualization

Just like the RISC-V ISA the NEORV32 aims to support _maximum virtualization_ capabilities
on CPU _and_ SoC level. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU
extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs for any expected and unexpected situation (e.g. executing a
malformed instruction word or accessing a not-allocated address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
have to be undone). This allows predictable execution behavior - and thus, defined operations to resolve the cause
of the trap - at any time, improving overall _execution safety_.
 
**NEORV32-Specific Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V bus exceptions including access exceptions that are triggered if an
accessed address does not respond or encounters an internal error during access.
* The CPU raises an illegal instruction trap for _all_ unimplemented/malformed/illegal instructions.
* To be continued...
 
<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following nomenclature regarding traps is used:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec`
CSR. The cause of the corresponding interrupt or exception can be determined via the content of the `mcause`
CSR. The address that reflects the current program counter when a trap was taken is stored to the `mepc` CSR.
Additional information regarding the cause of the trap can be retrieved from the `mtval` CSR.

The traps are prioritized. If several _exceptions_ occur at once only the one with highest priority is triggered
while all remaining exceptions are ignored. If several _interrupts_ trigger at once, the one with highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.
 
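The interrupt prioritization can be sketched as a selection function over a set of pending interrupts. The following host-side C model uses the priority order and `mcause` values from the NEORV32 trap listing (fast interrupt channels 0..15 first, then machine external, software and timer interrupt). It is an illustration only: the function name is hypothetical, and it assumes that bit `i` of the pending mask corresponds to interrupt cause code `i`.

[source,c]
----
#include <stdint.h>

/* Returns the mcause value of the highest-priority pending interrupt,
 * or 0 if no interrupt is pending. */
static uint32_t next_interrupt(uint32_t pending) {
    for (unsigned ch = 16; ch <= 31; ch++) {      /* FIRQ 0..15: highest priority */
        if (pending & (UINT32_C(1) << ch)) {
            return 0x80000000u | ch;
        }
    }
    if (pending & (1u << 11)) return 0x8000000Bu; /* machine external interrupt */
    if (pending & (1u << 3))  return 0x80000003u; /* machine software interrupt */
    if (pending & (1u << 7))  return 0x80000007u; /* machine timer interrupt */
    return 0;
}
----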
.RISC-V interrupts
[IMPORTANT]
All RISC-V defined machine-level interrupt request signals are high-active. A request has to stay at high-level until
it is acknowledged by the CPU (for example by writing to a specific memory-mapped register).

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations – interrupts can only trigger between two instructions.
So if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
a new interrupt handler can start.
 
:sectnums:
==== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus write-operation at all.
 
:sectnums:
==== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
provide custom trap codes in `mcause`. These FIRQs are reserved for processor-internal usage only.

[NOTE]
The fast interrupt request lines trigger on a **rising-edge**.
 
<<<
// ####################################################################################################################
:sectnums!:
===== NEORV32 Trap Listing

.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (EBREAK) | _PC_ | _PC_
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
|=======================
 
**Notes**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the corresponding trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to the
`mepc` and `mtval` CSRs when a trap is triggered:

* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself
 
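The relation between the "`mcause`" and "[RISC-V]" columns can be reproduced in a few lines: the most significant bit distinguishes interrupts (1) from exceptions (0) and the remaining bits hold the cause code. The following host-side C sketch splits an `mcause` value accordingly. The type and function names are hypothetical and are not the definitions from `neorv32.h`.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     is_interrupt; /* mcause bit 31: 1 = interrupt, 0 = exception */
    uint32_t code;         /* mcause bits 30:0: cause code */
} trap_cause_t;

/* Split an mcause value into the two parts shown in the "[RISC-V]" column. */
static trap_cause_t decode_mcause(uint32_t mcause) {
    trap_cause_t t;
    t.is_interrupt = (mcause >> 31) != 0;
    t.code = mcause & 0x7FFFFFFFu;
    return t;
}

/* FIRQ channel number for fast interrupt causes (codes 16..31), -1 otherwise. */
static int firq_channel(uint32_t mcause) {
    trap_cause_t t = decode_mcause(mcause);
    return (t.is_interrupt && t.code >= 16 && t.code <= 31) ? (int)(t.code - 16) : -1;
}
----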
 
<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface

The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.
 
:sectnums:
===== Address Space

The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard
Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_ – however, a software-based handling can be
implemented as any unaligned memory access will trigger a corresponding exception.
 
:sectnums:
===== Interface Signals

The following table shows the signals of the data and instruction interfaces seen from the CPU
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).

.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================
 
[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
So only a single transfer request can be "on the fly".
 
:sectnums:
===== Protocol

A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral sets either the `bus_ack_i` signal (-> successful completion) or the
`bus_err_i` signal (-> failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the corresponding instruction bus
access fault or load/store bus access fault exception.

[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.
 
.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
 
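The three possible outcomes of a bus transaction (acknowledge, error, timeout) can be illustrated with a cycle-based host model. The following C sketch is not derived from the bus keeper's VHDL code: the names are hypothetical, cycles are counted from 1 starting with the first cycle after the request, and the exact cycle in which the timeout fires is an assumption.

[source,c]
----
typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* ack_cycle / err_cycle: cycle in which the peripheral sets bus_ack_i or
 * bus_err_i for one cycle (0 = never). max_response_time models the
 * max_proc_int_response_time_c constant (default: 15 cycles). */
static bus_result_t bus_transfer(unsigned ack_cycle, unsigned err_cycle,
                                 unsigned max_response_time) {
    for (unsigned cycle = 1; cycle <= max_response_time; cycle++) {
        if (err_cycle == cycle) return BUS_ERROR; /* access fault exception */
        if (ack_cycle == cycle) return BUS_OK;    /* successful completion */
    }
    return BUS_TIMEOUT; /* no response in time: bus keeper issues a bus error */
}
----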
**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================
 
**Write Access**

For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.

**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
 
**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
 
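The access boundaries determine whether a data access is naturally aligned and which byte lanes of the 32-bit word it touches. The following host-side C sketch illustrates both. The function names are hypothetical, and the mapping of byte-enable bit `i` to byte `i` of the word (little-endian byte lanes) is an assumption that is not stated in this section.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes (1, 2 or 4) at `addr` is naturally
 * aligned; unaligned accesses raise an exception instead of a bus access. */
static bool is_aligned(uint32_t addr, unsigned size) {
    return (addr & (size - 1u)) == 0;
}

/* 4-bit byte-enable mask for an aligned access within its 32-bit word. */
static unsigned byte_enable(uint32_t addr, unsigned size) {
    unsigned lanes = (1u << size) - 1u; /* 1 -> 0b0001, 2 -> 0b0011, 4 -> 0b1111 */
    return (lanes << (addr & 3u)) & 0xFu;
}
----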
**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write-back non-zero and will not generate an actual memory store operation.

The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
 
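The behavior of the CPU-internal exclusive access lock can be captured in a tiny state model. The following host-side C sketch is illustrative only: all names are hypothetical, the memory system's own reservation handling is not modeled, and clearing the lock on `sc.w` follows from the first condition in the list above (`sc.w` is a memory access other than `lr.w`).

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool lock; } cpu_t; /* CPU-internal exclusive access lock */

static void exec_lr_w(cpu_t *cpu)         { cpu->lock = true;  } /* start exclusive access */
static void exec_other_access(cpu_t *cpu) { cpu->lock = false; } /* any load/store except lr.w */
static void take_trap(cpu_t *cpu)         { cpu->lock = false; } /* sync. or async. trap */
static void bus_error(cpu_t *cpu)         { cpu->lock = false; } /* bus_err_i signaled */

/* sc.w: returns 0 and performs the store if the lock is intact,
 * returns non-zero without storing if the lock is broken. */
static uint32_t exec_sc_w(cpu_t *cpu, uint32_t *mem, uint32_t value) {
    bool ok = cpu->lock;
    cpu->lock = false; /* sc.w itself ends the exclusive access */
    if (ok) {
        *mem = value;
        return 0;
    }
    return 1;
}
----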
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.

**Memory Barriers**

Whenever the CPU executes a fence instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).
 
<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset

In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.
 
**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a writeback
of the processing result to some kind of memory. The initial status of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
 
**NEORV32 CPU Reset**

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers will get initialized by the CPU's internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).

During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated
hardware reset setting it to low (globally disabling interrupts).
 
**Reset Configuration**

Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of the uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):

[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers
-- (FALSE = reset value is irrelevant (might simplify HW), default; TRUE = defined LOW reset value)
constant dedicated_reset_c : boolean := false;
----
