:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::riscv_logo.png[width=350,align=center]

**Key Features**

* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU
* Optional RISC-V extensions: `rv32[i/e][m][a][c][u]` + `[Zfinx][Zicsr][Zifencei]` + `[debug_mode]` (for on-chip debugging)
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt
* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate interfaces for instruction fetch and data access (merged into a single bus via a bus switch for
the NEORV32 processor)
* Little-endian byte order
* Configurable hardware reset
* No hardware support for unaligned data/instruction accesses – they will trigger an exception. If the C extension is enabled, instructions
can also be 16-bit aligned and a misaligned instruction address exception can no longer occur

[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also lets you keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course, you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU uses a pipelined architecture with two main stages. The first stage (IF – instruction fetch)
is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is
stored in a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit
instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed
in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions
via the execute engine.

Both pipeline stages are based on a multi-cycle processing engine, so processing a
certain operation in either stage can take several cycles. Since the IF and EX stages are decoupled via the instruction
prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher, for instance when executing loads/stores,
multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffer
due to a taken branch.

In essence, the NEORV32 CPU sits somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction as a series of consecutive micro-operations. Combining these two classical
design paradigms yields a higher instruction throughput than a pure multi-cycle approach (due to
the pipelining) at a reduced hardware footprint (due to the multi-cycle processing).

The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are
merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral
devices are mapped to a single 32-bit address space, making the architecture a modified von Neumann
architecture.


// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the `rv32_m/I`, `rv32_m/M`, `rv32_m/C`, `rv32_m/privilege`, and
`rv32_m/Zifencei` tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `riscv-arch-test` folder. See section <<_risc_v_architecture_test_framework>>
for information on how to run the tests on the NEORV32.

.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................

.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................

.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................

.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................

.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................


<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently known issues regarding full RISC-V compatibility. More specific information
can be found in section <<_instruction_sets_and_extensions>>.

[IMPORTANT]
The `misa` CSR is read-only. It shows the synthesized CPU extensions. Hence, all implemented
CPU extensions are always active and cannot be enabled/disabled dynamically at runtime. Any
write access to it (in machine mode) is ignored and will not cause any exception or side effects.

[IMPORTANT]
The `mip` CSR is read-only. Pending IRQs can be cleared using the `mie` CSR.

[IMPORTANT]
The `mtval` CSR is read-only.

[IMPORTANT]
The physical memory protection (see section <<_machine_physical_memory_protection>>)
currently supports only the modes _OFF_ and _NAPOT_ and a minimal granularity of 8 bytes per region.

[IMPORTANT]
The `A` CPU extension (atomic memory access) currently implements only the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further AMO operations.


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.

.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir.   | Function
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
| `i_bus_addr_o`   |    32 | out | destination address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **Data Bus Interface (<<_bus_interface>>)**
| `d_bus_addr_o`   |    32 | out | destination address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
4+^| **System Time (see <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input (from MTIME)
4+^| **Non-Maskable Interrupt (<<_traps_exceptions_and_interrupts>>)**
| `nm_irq_i`       |     1 | in  | non-maskable interrupt
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
| `firq_ack_o`     |    16 | out | fast interrupt acknowledge signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======


<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The NEORV32 is a RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see _The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual –
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

[TIP]
The CPU can discover available ISA extensions via the <<_misa>> and <<_mzext>> CSRs or by executing an instruction
and checking for an _illegal instruction exception_.
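
As an illustration of the first discovery method, the following C sketch decodes the single-letter extension bits of a raw `misa` value. It relies only on the RISC-V convention that bit 0 stands for `A`, bit 1 for `B`, and so on up to bit 25 for `Z`. The helper name `misa_has_extension` is hypothetical and not part of the NEORV32 software framework, and the `misa` value is assumed to have been read elsewhere (for example with a `csrr` instruction).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the NEORV32 software framework):
 * returns true if the single-letter extension 'ext' ('A'..'Z') is
 * flagged in the given raw misa value. */
bool misa_has_extension(uint32_t misa, char ext)
{
    if (ext >= 'a' && ext <= 'z') {
        ext = (char)(ext - 'a' + 'A'); /* accept lower-case letters too */
    }
    if (ext < 'A' || ext > 'Z') {
        return false; /* not a single-letter extension */
    }
    return ((misa >> (ext - 'A')) & 1u) != 0u;
}
```

`Z*` extensions such as `Zifencei` have no bit in `misa`, so a check like this cannot replace the `mzext` CSR.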


==== **`A`** - Atomic Memory Access

Atomic memory access instructions (for implementing semaphores and mutexes) are available when the
`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions
are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented so far, all further atomic operations
(read-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instruction’s ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will trigger an illegal instruction exception.
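
The emulation mentioned in the note usually takes the form of a retry loop: load with a reservation, compute the new value, and repeat until the conditional store succeeds. The following C sketch shows that loop for an emulated `amoadd.w`. It is a host-side model, not NEORV32 code: `reservation_t`, `lr_w`, `sc_w` and `amoadd_w_emulated` are hypothetical names, and the two small functions merely stand in for the real `lr.w` and `sc.w` instructions, keeping the RISC-V convention that `sc.w` returns zero on success.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model of a single reservation, for illustration only.
 * On real hardware lr_w() and sc_w() would be the lr.w and sc.w
 * instructions themselves. */
typedef struct {
    bool            valid; /* is a reservation currently held? */
    const uint32_t *addr;  /* reserved address */
} reservation_t;

static uint32_t lr_w(reservation_t *rsv, const uint32_t *addr)
{
    rsv->valid = true;
    rsv->addr  = addr;
    return *addr;
}

/* Returns 0 on success and non-zero on failure, like sc.w. */
static uint32_t sc_w(reservation_t *rsv, uint32_t *addr, uint32_t value)
{
    bool ok = rsv->valid && (rsv->addr == addr);
    rsv->valid = false; /* every sc.w invalidates the reservation */
    if (ok) {
        *addr = value;
    }
    return ok ? 0u : 1u;
}

/* Emulated amoadd.w: add 'increment' and return the previous value. */
uint32_t amoadd_w_emulated(reservation_t *rsv, uint32_t *addr, uint32_t increment)
{
    uint32_t old;
    do {
        old = lr_w(rsv, addr);
    } while (sc_w(rsv, addr, old + increment) != 0u); /* retry until stored */
    return old;
}
```

Other read-modify-write operations (swap, and, or, min, max) follow the same pattern with a different computation between the load and the store.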

[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.


==== **`C`** - Compressed Instructions

Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is
_true_. In this case the following instructions are available:

* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`,
`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the second half-word of that instruction. This performance penalty can be avoided
by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
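
The condition described in the note can be written as a small predicate. This is a sketch only, with a hypothetical function name; it restates the rule from the text (a 32-bit instruction that starts on a 16-bit boundary spans two aligned 32-bit words) and does not model the actual fetch engine.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the branch target holds a 32-bit (uncompressed) instruction
 * that is only 16-bit aligned, so it spans two aligned 32-bit words
 * and its second half-word needs a further fetch. */
bool branch_target_needs_extra_fetch(uint32_t target, bool is_compressed)
{
    bool word_aligned = (target & 0x3u) == 0u;
    return !is_compressed && !word_aligned;
}
```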


==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware
requirements. This extension is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond
`x15` will raise an _illegal instruction exception_.

Due to the reduced register file an alternate ABI (**`ilp32e`**) is required for the toolchain.


==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediates: `lui`, `auipc`
* jumps: `jal`, `jalr`
* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw`
* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and`
* environment: `ecall`, `ebreak`, `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a hybrid parallel/serial approach. Shift
operations are split into coarse shifts (multiples of 4) and a final fine shift (0 to 3). The total execution
time depends on the shift amount. Alternatively, the shift operations can be processed completely in parallel by a fast
(but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles regardless of the shift amount. Shift operations can also be executed in a purely serial manner when
the `TINY_SHIFT_EN` generic is _true_. In that case, shift operations take up to 32 cycles depending on the shift amount.
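
To make the coarse/fine split more concrete, the following C sketch decomposes a logical left shift into steps of 4 followed by steps of 1. It is a behavioral model under stated assumptions, not a description of the RTL: the names `shift_trace_t` and `sll_coarse_fine` are hypothetical, and the step counts it reports are not cycle counts.

```c
#include <stdint.h>

typedef struct {
    uint32_t result;       /* shifted value */
    unsigned coarse_steps; /* number of shift-by-4 steps */
    unsigned fine_steps;   /* number of shift-by-1 steps (0..3) */
} shift_trace_t;

/* Logical left shift, decomposed into coarse and fine steps. */
shift_trace_t sll_coarse_fine(uint32_t value, unsigned shamt)
{
    shift_trace_t t = { value, 0u, 0u };
    shamt &= 31u; /* rv32 uses only the lower 5 bits of the shift amount */
    while (shamt >= 4u) { /* coarse phase: multiples of 4 */
        t.result <<= 4;
        shamt -= 4u;
        t.coarse_steps++;
    }
    while (shamt > 0u) {  /* fine phase: remaining 0 to 3 positions */
        t.result <<= 1;
        shamt -= 1u;
        t.fine_steps++;
    }
    return t;
}
```

A shift by 13, for example, decomposes into three coarse steps and one fine step, whereas a purely serial shifter would need 13 single-bit steps.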

[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top’s `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.


==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division instructions are available when the
`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu`
* division: `div`, `divu`, `rem`, `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete, regardless of the input operands.


==== **`U`** - Less-Privileged User Mode

Adds the less-privileged _user mode_ when the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For
instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like
peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode.


==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.

[NOTE]
The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the `mie`
and `mip` CSRs. This extension is mapped to bits that are available for custom use (according to the
RISC-V specs). Also, custom trap codes for `mcause` are implemented.

[NOTE]
The CPU provides a single _non-maskable_ interrupt (`NMI`) that also provides a custom trap code for `mcause`.

[NOTE]
A custom CSR `mzext` is available that can be used to check for implemented `Z*` CPU extensions
(for example `Zifencei`). This CSR is mapped to the official "custom CSR address region".

[NOTE]
All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception
(see <<_execution_safety>>).


==== **`Zfinx`** Single-Precision Floating-Point Operations

The `Zfinx` floating-point extension is an alternative to the `F` floating-point extension that uses the
integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since no dedicated floating-point `f`
register file exists, the `Zfinx` extension requires fewer hardware resources and features faster context changes.
This also implies that there are NO dedicated `f` register file related load/store or move instructions. The
official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension currently supports single-precision (`.s` suffix) only (so it is a direct alternative to the `F`
extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration
generic is _true_. In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s`
* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s`
* computational: `fadd.s`, `fsub.s`, `fmul.s`
* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s`
* number classification: `fclass.s`

* additional CSRs: `fcsr`, `frm`, `fflags`

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers (also called "de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.
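
The flush-to-zero behavior can be illustrated on the raw IEEE-754 single-precision bit pattern (1 sign bit, 8 exponent bits, 23 mantissa bits). The following C sketch is a software model with a hypothetical function name; it shows the effect on an operand or result and says nothing about how the FPU implements it internally.

```c
#include <stdint.h>

/* Flush a subnormal single-precision value to a zero of the same sign.
 * Operates on the raw 32-bit IEEE-754 encoding. */
uint32_t flush_subnormal_to_zero(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;
    if (exponent == 0u && mantissa != 0u) {
        return bits & 0x80000000u; /* keep only the sign: +0.0 or -0.0 */
    }
    return bits; /* zeros, normal numbers, infinities and NaNs pass through */
}
```

Code that relies on gradual underflow will therefore see different results than on an FPU with full subnormal support.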

[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to utilize the `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).


==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) are implemented when the
`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are
available:

* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* environment: `mret`, `wfi`

[WARNING]
If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception
support at all. In order to provide the full spectrum of functions and to allow a secure execution
environment, the `Zicsr` extension should always be enabled.

[NOTE]
The "wait for interrupt" instruction `wfi` works like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the corresponding interrupt source has to
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.
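
The wake-up condition stated in the note can be summarized as a predicate over the CSR values. The sketch below is a simplified model, not the CPU's actual logic: the function name is hypothetical, `pending` is assumed to use the same bit layout as `mie`, and the only architectural fact it relies on is that the global machine interrupt enable flag is bit 3 of `mstatus`.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_MIE (1u << 3) /* global machine interrupt enable flag */

/* True if a sleeping CPU would wake up, following the rule in the text:
 * at least one pending source must be enabled in mie, and the global
 * interrupt enable flag in mstatus must be set. */
bool wfi_wakes_up(uint32_t mstatus, uint32_t mie, uint32_t pending)
{
    bool global_enable  = (mstatus & MSTATUS_MIE) != 0u;
    bool enabled_source = (mie & pending) != 0u;
    return global_enable && enabled_source;
}
```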


==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

[NOTE]
The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer.
This allows a clean re-fetch of modified data from memory. Also, the top's `i_bus_fence_o` signal is set
high for one cycle to inform the memory system. Any additional flags within the `fence.i` instruction word
are ignored by the hardware.

[NOTE]
If the `Zifencei` extension is disabled (`CPU_EXTENSION_RISCV_Zifencei` generic = _false_), a `fence.i`
instruction is executed as a `nop` (it will **not trap**) and none of the functions
described above are performed.


==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible with the PMP specified by the RISC-V specs.
The CPU PMP currently supports only _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured
via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the
`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available:

* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
* `pmpaddr*` (0..63, depending on configuration): PMP address registers

See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs.

**Configuration**

The actual number of regions and the minimal region granularity are defined via the top entity
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
number of available `pmpcfg*` and `pmpaddr*` CSRs.
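
Since only _NAPOT_ (naturally aligned power-of-two) regions are supported, software has to encode the region base and size into a single `pmpaddr*` value. The following C sketch shows the encoding defined by the RISC-V privileged specification for a 32-bit address space: the address is stored shifted right by two bits, and the number of trailing one bits selects the region size (none for 8 bytes, one for 16 bytes, and so on). The function name is hypothetical, and the sketch ignores any larger `PMP_MIN_GRANULARITY` setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the pmpaddr value for a NAPOT region.
 * 'size' must be a power of two and at least 8 bytes, and 'base' must be
 * aligned to 'size'. Returns false if these conditions are not met. */
bool pmp_napot_encode(uint32_t base, uint32_t size, uint32_t *pmpaddr)
{
    bool power_of_two = (size != 0u) && ((size & (size - 1u)) == 0u);
    if (!power_of_two || size < 8u || (base & (size - 1u)) != 0u) {
        return false;
    }
    /* address bits [31:2], with the low bits filled with ones to mark the size */
    *pmpaddr = (base >> 2) | ((size >> 3) - 1u);
    return true;
}
```

For a 4 KiB region at `0x80000000` this yields `0x200001FF`, that is, the shifted base address with nine trailing one bits.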

When implementing more PMP regions than a _certain critical limit_, *an additional register stage
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
increase the latency of instruction fetches and data accesses by one cycle.

The critical limit can be adapted for custom use by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`). The default value is 8:

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----

**Operation**

Every memory access address (from the CPU's instruction fetch or data access interface) is checked against
the specified PMP regions (configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
address falls into one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked:

* a write access (store) will fail if no write attribute is set
* a read access (load) will fail if no read attribute is set
* an instruction fetch access will fail if no execute attribute is set

If an access to a protected region does not have the required access rights (attributes), it will raise the corresponding
_instruction/load/store access fault exception_.

By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
memory protection also for machine-level programs you need to activate the _locked bit_ in the corresponding
`pmpcfg*` configuration.
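
The permission rules above can be condensed into a short C sketch. It assumes that the accessed address has already been matched to a region and only evaluates that region's configuration byte. The bit positions (`R` = bit 0, `W` = bit 1, `X` = bit 2, `L` = bit 7) follow the RISC-V privileged specification; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMPCFG_R (1u << 0) /* read permission */
#define PMPCFG_W (1u << 1) /* write permission */
#define PMPCFG_X (1u << 2) /* execute permission */
#define PMPCFG_L (1u << 7) /* locked: also enforce in machine mode */

typedef enum { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH } access_t;

/* Decide whether an access that hits a region with configuration 'cfg'
 * is allowed. A denied access corresponds to an access fault exception. */
bool pmp_access_allowed(uint8_t cfg, access_t access, bool machine_mode)
{
    uint8_t required;
    if (machine_mode && (cfg & PMPCFG_L) == 0u) {
        return true; /* unlocked regions are not enforced in machine mode */
    }
    switch (access) {
        case ACCESS_LOAD:  required = PMPCFG_R; break;
        case ACCESS_STORE: required = PMPCFG_W; break;
        default:           required = PMPCFG_X; break;
    }
    return (cfg & required) != 0u;
}
```

With a read-only region, for example, a user-mode store is rejected while a machine-mode store still succeeds unless the locked bit is set.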

[IMPORTANT]
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
internal (iterative) computations before the configuration becomes valid.

[NOTE]
For more information regarding RISC-V physical memory protection see the official _The RISC-V
Instruction Set Manual – Volume II: Privileged Architecture_ specifications.


==== **`HPM`** Hardware Performance Monitors

In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split into a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.

The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are
mandatory performance monitors on every RISC-V platform and have fixed increment events. For example,
the instructions-retired counter increments with each executed instruction. The actual hardware performance
monitors are optional and can be configured to increment on arbitrary hardware events. The number of
available HPMs is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will exclude
all HPM logic from the design.
597
 
Depending on the configuration, the following additional CSRs are available:

* counters: `[m]hpmcounter*[h]` (3..31, depending on configuration)
* event configuration: `mhpmevent*` (3..31, depending on configuration)
 
User-level access to the counter registers `hpmcounter*[h]` can be individually restricted via the `mcounteren` CSR.
Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR.
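
The per-counter control bits can be computed as shown in the following C sketch. It relies on the RISC-V privileged specification, where bit _n_ of `mcounteren` and `mcountinhibit` belongs to HPM counter _n_ (3..31). The helper names are hypothetical, and the sketch assumes that the implemented counters are allocated contiguously starting at HPM 3.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Bit mask selecting HPM counter 'index' (3..31) in mcounteren/mcountinhibit. */
uint32_t hpm_bit(unsigned index) {
  assert(index >= 3 && index <= 31);
  return UINT32_C(1) << index;
}

/* Mask covering the first 'num_cnts' HPM counters (HPM 3 upwards);
 * 'num_cnts' models the HPM_NUM_CNTS generic (0..29). */
uint32_t hpm_mask(unsigned num_cnts) {
  assert(num_cnts <= 29);
  uint32_t mask = 0;
  for (unsigned i = 0; i < num_cnts; i++) {
    mask |= hpm_bit(3 + i);
  }
  return mask;
}
----

Setting the bits of `hpm_mask(n)` in `mcountinhibit` would stop all _n_ implemented HPM counters, while clearing them in `mcounteren` would restrict user-level access to them.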
 
If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPMs are not implemented.
However, accessing their associated CSRs will not raise an illegal instruction exception. These CSRs are
read-only and will always return 0.
 
[NOTE]
For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>.
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Timing

The instruction timing listed in the table below shows the clock cycles required for executing a certain
instruction. These cycle counts assume a bus access without additional wait states and a filled
pipeline.

Average CPI (cycles per instruction) values for "real applications", such as the CoreMark benchmark,
are presented for different CPU configurations in <<_cpu_performance>>.
 
.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5
| Division       | `M`  | `div` `divu` `rem` `remu`     | 22+32+4
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
| System | `I/E` | `fence` | 3
| System | `C`+`Zicsr` | `c.ebreak` | 4
| System | `Zicsr` | `mret` `wfi` | 5
| System | `Zifencei` | `fence.i` | 5
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
|=======================
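
Some of the cycle expressions in the table can be evaluated with a few lines of C. The following is a sketch only: the function names are hypothetical, and `SA/4` is read as an integer division, which is an assumption about the table's notation.

[source,c]
----
#include <assert.h>

/* Cycles of a shift instruction for the default (iterative) shifter,
 * following the table entry "3 + SA/4 + SA%4" (SA = shift amount). */
unsigned shift_cycles(unsigned shift_amount) {
  assert(shift_amount <= 31); /* RV32 shift amounts are 5 bits wide */
  return 3 + shift_amount / 4 + shift_amount % 4;
}

/* Cycles of a conditional branch; 'memory_latency' is ML from the table. */
unsigned branch_cycles(int taken, unsigned memory_latency) {
  return taken ? 5 + memory_latency : 3;
}

/* Average cycles per instruction (CPI) for a measured run. */
double average_cpi(unsigned long cycles, unsigned long instructions) {
  assert(instructions > 0);
  return (double)cycles / (double)instructions;
}
----

Under these assumptions a shift by 31 bit positions takes 3 + 7 + 3 = 13 cycles, and a taken branch with a memory latency of one cycle takes 6 cycles.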
 
[NOTE]
The presented values of the *floating-point execution cycles* are average values, obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
 
// ####################################################################################################################
include::cpu_csr.adoc[]
 
<<<
// ####################################################################################################################
:sectnums:
==== Execution Safety

The hardware of the NEORV32 CPU was designed for maximum *execution safety*. If the `Zicsr` CPU
extension is enabled, the core supports **all** traps specified by the official RISC-V specifications (except
those related to extensions/features that are not implemented yet). Thus, the CPU provides well-defined
hardware fall-backs for (nearly) everything that can go wrong. Even if a trap is triggered, the core
is always in a defined and fully synchronized state throughout the whole architecture (i.e. there is no need to
undo out-of-order operations), which allows predictable execution behavior at any time.
 
**Core Safety Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system (no speculative execution / out-of-order states).
* The CPU supports all bus exceptions including bus access exceptions that are triggered if an
accessed address does not respond or encounters an internal error during access (a feature that is
rare among open-source RISC-V cores).
* The CPU raises an illegal instruction trap for **all** unimplemented/malformed/illegal instructions (to support _full_ virtualization).
* If user-level code tries to read from machine-level-only CSRs (like `mstatus`) an illegal instruction
exception is raised. The result of this operation is always zero (though machine-level
code handling this exception can modify the target register of the illegal access-causing
instruction to allow full virtualization). Illegal write accesses to machine CSRs will not write any data at all.
* Illegal user-level memory accesses to protected addresses or address regions (via physical memory
protection) will not be conducted at all (no actual write and no actual read; prevents triggering of
memory-mapped devices). Illegal load operations will not return any data (the instruction's
destination register will not be written at all).
 
<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

This document uses a (maybe) special nomenclature regarding traps:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)
 
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec`
CSR. The cause of the corresponding interrupt or exception can be determined via the content of the `mcause`
CSR. The program counter value at the time the trap was taken is stored in `mepc`.
Additional information regarding the cause of the trap can be retrieved from `mtval`.
 
The traps are prioritized. If several exceptions occur at once, only the one with the highest priority is triggered. If
several interrupts trigger at once, the one with the highest priority is triggered while the remaining ones are
queued. After the interrupt handler has completed, the interrupt with the second highest priority is issued, and
so on.
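
The interrupt prioritization can be sketched as a host-side C model that uses the priority order of the NEORV32 trap listing. The function name is hypothetical, and the layout of the `pending` argument (one bit per interrupt code, as in the RISC-V `mip` CSR, with the fast interrupt channels in bits 16..31) is an assumption of this example.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCAUSE_INTERRUPT UINT32_C(0x80000000) /* MSB set: trap is an interrupt */

/* Pending bit of fast interrupt channel 0..15 (assumed at bit 16 + channel). */
uint32_t firq_pending_bit(unsigned channel) {
  assert(channel < 16);
  return UINT32_C(1) << (16 + channel);
}

/* Returns the mcause value of the pending interrupt with the highest
 * priority, or UINT32_MAX if nothing is pending. 'pending' uses bit 3 for
 * MSI, bit 7 for MTI, bit 11 for MEI and bits 16..31 for the fast interrupt
 * channels; 'nmi' models the non-maskable interrupt. */
uint32_t next_interrupt(uint32_t pending, bool nmi) {
  /* interrupt codes in descending priority: MEI, MSI, MTI, FIRQ 0..15 */
  static const uint8_t order[] = {11, 3, 7, 16, 17, 18, 19, 20, 21, 22, 23,
                                  24, 25, 26, 27, 28, 29, 30, 31};
  if (nmi) {
    return MCAUSE_INTERRUPT; /* code 0: non-maskable interrupt */
  }
  for (unsigned i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
    if (pending & (UINT32_C(1) << order[i])) {
      return MCAUSE_INTERRUPT | order[i];
    }
  }
  return UINT32_MAX; /* no interrupt pending */
}
----

If, for example, the machine timer interrupt and fast interrupt channel 0 are pending at the same time, the model selects the timer interrupt (`0x80000007`) first.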
 
**Memory Access Exceptions**

If a load operation causes any exception, the destination register is not written at all. Exceptions caused by a
load address misalignment or a load physical memory protection fault do not trigger a bus read operation at all.
Exceptions caused by a store address misalignment or a store physical memory protection fault do not trigger
a bus write operation at all.
 
**Instruction Atomicity**

All instructions execute as atomic operations: interrupts can only trigger between two instructions.
 
**Custom Fast Interrupt Request Lines**

As a custom extension, the NEORV32 CPU features 16 fast interrupt request lines via the `firq_i` CPU (/Processor) top
entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also
provide custom trap codes in `mcause`.
 
**Non-Maskable Interrupt**

The NEORV32 CPU features a single non-maskable interrupt source via the `nm_irq_i` CPU (/Processor) top
entity signal that can be used to signal critical system conditions. This interrupt source _cannot_ be disabled at all (not even within interrupt service routines).
Hence, it does _not_ provide configuration/status flags in the `mie` and `mip` CSRs. The RISC-V-compatible
`mcause` value `0x80000000` is used to indicate the non-maskable interrupt.
 
[IMPORTANT]
An interrupt request is triggered when the corresponding CPU/Processor interrupt request signal is _high_ for exactly one cycle.
Keeping the signal high for several cycles might trigger the interrupt multiple times.
 
<<<
// ####################################################################################################################
:sectnums!:
===== NEORV32 Trap Listing

.NEORV32 trap listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause`     | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
| 1     | `0x80000000` | 1.0      | _TRAP_CODE_NMI_ | non-maskable interrupt | _I-PC_ | _0_
| 2     | `0x8000000B` | 1.11     | _TRAP_CODE_MEI_ | machine external interrupt | _I-PC_ | _0_
| 3     | `0x80000003` | 1.3      | _TRAP_CODE_MSI_ | machine software interrupt | _I-PC_ | _0_
| 4     | `0x80000007` | 1.7      | _TRAP_CODE_MTI_ | machine timer interrupt | _I-PC_ | _0_
| 5     | `0x80000010` | 1.16     | _TRAP_CODE_FIRQ_0_ | fast interrupt request channel 0 | _I-PC_ | _0_
| 6     | `0x80000011` | 1.17     | _TRAP_CODE_FIRQ_1_ | fast interrupt request channel 1 | _I-PC_ | _0_
| 7     | `0x80000012` | 1.18     | _TRAP_CODE_FIRQ_2_ | fast interrupt request channel 2 | _I-PC_ | _0_
| 8     | `0x80000013` | 1.19     | _TRAP_CODE_FIRQ_3_ | fast interrupt request channel 3 | _I-PC_ | _0_
| 9     | `0x80000014` | 1.20     | _TRAP_CODE_FIRQ_4_ | fast interrupt request channel 4 | _I-PC_ | _0_
| 10    | `0x80000015` | 1.21     | _TRAP_CODE_FIRQ_5_ | fast interrupt request channel 5 | _I-PC_ | _0_
| 11    | `0x80000016` | 1.22     | _TRAP_CODE_FIRQ_6_ | fast interrupt request channel 6 | _I-PC_ | _0_
| 12    | `0x80000017` | 1.23     | _TRAP_CODE_FIRQ_7_ | fast interrupt request channel 7 | _I-PC_ | _0_
| 13    | `0x80000018` | 1.24     | _TRAP_CODE_FIRQ_8_ | fast interrupt request channel 8 | _I-PC_ | _0_
| 14    | `0x80000019` | 1.25     | _TRAP_CODE_FIRQ_9_ | fast interrupt request channel 9 | _I-PC_ | _0_
| 15    | `0x8000001a` | 1.26     | _TRAP_CODE_FIRQ_10_ | fast interrupt request channel 10 | _I-PC_ | _0_
| 16    | `0x8000001b` | 1.27     | _TRAP_CODE_FIRQ_11_ | fast interrupt request channel 11 | _I-PC_ | _0_
| 17    | `0x8000001c` | 1.28     | _TRAP_CODE_FIRQ_12_ | fast interrupt request channel 12 | _I-PC_ | _0_
| 18    | `0x8000001d` | 1.29     | _TRAP_CODE_FIRQ_13_ | fast interrupt request channel 13 | _I-PC_ | _0_
| 19    | `0x8000001e` | 1.30     | _TRAP_CODE_FIRQ_14_ | fast interrupt request channel 14 | _I-PC_ | _0_
| 20    | `0x8000001f` | 1.31     | _TRAP_CODE_FIRQ_15_ | fast interrupt request channel 15 | _I-PC_ | _0_
| 21    | `0x00000001` | 0.1      | _TRAP_CODE_I_ACCESS_ | instruction access fault | _B-ADR_ | _PC_
| 22    | `0x00000002` | 0.2      | _TRAP_CODE_I_ILLEGAL_ | illegal instruction | _PC_ | _Inst_
| 23    | `0x00000000` | 0.0      | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
| 24    | `0x0000000B` | 0.11     | _TRAP_CODE_MENV_CALL_ | environment call from M-mode (ECALL in machine-mode) | _PC_ | _PC_
| 25    | `0x00000008` | 0.8      | _TRAP_CODE_UENV_CALL_ | environment call from U-mode (ECALL in user-mode) | _PC_ | _PC_
| 26    | `0x00000003` | 0.3      | _TRAP_CODE_BREAKPOINT_ | breakpoint (EBREAK) | _PC_ | _PC_
| 27    | `0x00000006` | 0.6      | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
| 28    | `0x00000004` | 0.4      | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
| 29    | `0x00000007` | 0.7      | _TRAP_CODE_S_ACCESS_ | store access fault | _B-ADR_ | _B-ADR_
| 30    | `0x00000005` | 0.5      | _TRAP_CODE_L_ACCESS_ | load access fault | _B-ADR_ | _B-ADR_
|=======================
 
**Notes**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the corresponding trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the values written to the
`mepc` and `mtval` CSRs when a trap is triggered:

* _I-PC_ - address of the interrupted instruction (the instruction has not been executed/completed yet)
* _B-ADR_ - bad memory access address that caused the trap
* _PC_ - address of the instruction that caused the trap
* _0_ - zero
* _Inst_ - the faulting instruction itself
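
The structure of the `mcause` values in the table (most significant bit = interrupt flag, remaining bits = interrupt/exception code) can be decoded as in the following C sketch. The helper names are hypothetical and are not taken from `neorv32.h`; they only illustrate the encoding.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the trap described by 'mcause' is an interrupt (asynchronous). */
bool mcause_is_interrupt(uint32_t mcause) {
  return (mcause >> 31) != 0;
}

/* Interrupt/exception code: the number after the dot in the "[RISC-V]" column. */
uint32_t mcause_code(uint32_t mcause) {
  return mcause & UINT32_C(0x7FFFFFFF);
}

/* mcause value of fast interrupt request channel 0..15. */
uint32_t firq_mcause(unsigned channel) {
  assert(channel < 16);
  return UINT32_C(0x80000010) + channel;
}

/* Fast interrupt channel (0..15) of a FIRQ trap, or -1 for any other trap. */
int mcause_firq_channel(uint32_t mcause) {
  uint32_t code = mcause_code(mcause);
  if (mcause_is_interrupt(mcause) && code >= 16 && code <= 31) {
    return (int)(code - 16);
  }
  return -1;
}
----

A trap handler could use such helpers to dispatch first on the interrupt flag and then on the code.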
 
<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface

The CPU provides two independent bus interfaces: one for fetching instructions (`i_bus_*`) and one for
accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol.

:sectnums:
===== Address Space

The CPU is a 32-bit architecture with separate instruction and data interfaces, making it a Harvard
architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4 GB). The memory
system is based on 32-bit words with a minimal granularity of 1 byte. Note that the NEORV32 CPU
does not support unaligned memory accesses _in hardware_. However, software-based handling can be
implemented, as any unaligned memory access triggers a corresponding exception.
 
:sectnums:
===== Interface Signals

The following table shows the signals of the data and instruction interfaces seen from the CPU
(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs).

.CPU bus interface
[cols="<2,^1,<7"]
[options="header",grid="rows"]
|=======================
| Signal | Size | Function
| `bus_addr_o`   | 32 | access address
| `bus_rdata_i`  | 32 | data input for read operations
| `bus_wdata_o`  | 32 | data output for write operations
| `bus_ben_o`    | 4  | byte enable signal for write operations
| `bus_we_o`     | 1  | bus write access
| `bus_re_o`     | 1  | bus read access
| `bus_lock_o`   | 1  | exclusive access request
| `bus_ack_i`    | 1  | accessed peripheral indicates a successful completion of the bus transaction
| `bus_err_i`    | 1  | accessed peripheral indicates an error during the bus transaction
| `bus_fence_o`  | 1  | this signal is set for one cycle when the CPU executes a data/instruction fence operation
| `bus_priv_o`   | 2  | current CPU privilege level
|=======================
 
[NOTE]
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
Hence, only a single transfer request can be in flight at a time.
 
:sectnums:
===== Protocol

A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for
writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is
completed when the accessed peripheral either sets the `bus_ack_i` signal (-> successful completion) or sets the
`bus_err_i` signal (-> failed completion). All these control signals are only active (= high) for one
single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the corresponding instruction bus
access fault or load/store bus access fault exception.
 
[NOTE]
The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o`
signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous"
completion should be avoided. The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers.
 
.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
[IMPORTANT]
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle).
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined
by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**.
The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out
(for example when accessing "address space holes"), this unit will issue a bus error to the CPU, which will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides
an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
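
The interplay of request, acknowledge, error and timeout can be illustrated with a cycle-based C model. This is a behavioral sketch, not a description of the actual hardware: all names are hypothetical, and treating a response in exactly the last cycle of the window as still valid is an assumption of this example.

[source,c]
----
#include <assert.h>
#include <stdbool.h>

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* Models one bus transaction. 'response_cycle' is the number of cycles after
 * the request (bus_re_o / bus_we_o pulse) at which the peripheral answers, or
 * a negative value if it never answers. 'error' selects bus_err_i instead of
 * bus_ack_i. 'max_response_time' models max_proc_int_response_time_c. */
bus_result_t bus_transaction(int response_cycle, bool error, int max_response_time) {
  assert(max_response_time > 0);
  for (int cycle = 0; cycle <= max_response_time; cycle++) {
    if (cycle == response_cycle) {
      /* the peripheral pulses bus_ack_i or bus_err_i in this cycle */
      return error ? BUS_ERROR : BUS_OK;
    }
  }
  /* no response within the window: the bus keeper signals a bus error */
  return BUS_TIMEOUT;
}
----

In this model both `BUS_ERROR` and `BUS_TIMEOUT` correspond to a bus access fault exception on the CPU side.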
 
**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================
 
**Write Access**

For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.
 
**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to provide the read data in the same cycle in which
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
 
**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
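
The access boundaries of the data interface can be expressed as an alignment check plus a byte enable computation. The following C sketch uses hypothetical function names. The mapping of byte enable bit _i_ to the byte at address offset _i_ within the 32-bit word is an assumption based on the little-endian byte order, not a statement from this document.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a data access of 'size' bytes (1, 2 or 4) at 'addr' is naturally
 * aligned; a misaligned access raises an exception instead of a bus access. */
bool access_is_aligned(uint32_t addr, unsigned size) {
  assert(size == 1 || size == 2 || size == 4);
  return (addr % size) == 0;
}

/* 4-bit byte enable for an aligned write within a 32-bit word. Assumption:
 * bit i enables byte lane i, with lane 0 holding the lowest address. */
uint8_t byte_enable(uint32_t addr, unsigned size) {
  assert(access_is_aligned(addr, size));
  uint8_t lanes = (uint8_t)((1u << size) - 1u); /* 0x1, 0x3 or 0xF */
  return (uint8_t)(lanes << (addr & 3u));
}
----

Under this assumption a half-word store to an address ending in `0x2` would enable the two upper byte lanes (`0xC`).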
 
**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.
 
The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).
 
When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still intact, the instruction will write back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write back non-zero and will not generate an actual memory store operation.
 
The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
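
The lock behavior described above can be captured in a minimal C model. All names are hypothetical, and the model only covers the CPU-internal lock; the reservation handling of the memory system is not modeled.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* hypothetical model of the CPU-internal exclusive access lock */
typedef struct { bool lock; } excl_state_t;

/* lr.w: set the lock (drives d_bus_lock_o in hardware) */
void model_lr(excl_state_t *s) {
  assert(s);
  s->lock = true;
}

/* any load/store other than lr.w, any trap and any bus error break the lock */
void model_break_lock(excl_state_t *s) {
  assert(s);
  s->lock = false;
}

/* sc.w: returns the value written back to the destination register
 * (0 = success, non-zero = failure). '*mem' is only written on success. */
uint32_t model_sc(excl_state_t *s, uint32_t *mem, uint32_t value) {
  assert(s && mem);
  bool ok = s->lock;
  s->lock = false; /* sc.w is itself a memory access other than lr.w */
  if (ok) {
    *mem = value;
    return 0;
  }
  return 1;
}
----

A typical atomic update would call `model_lr`, compute the new value and retry the whole sequence until `model_sc` returns zero.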
 
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
 
**Memory Barriers**

Whenever the CPU executes a fence instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (like a cache flush and refill).
 
<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset

In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.
 
**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a write-back
of the processing result to some kind of memory. The initial state of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
 
**NEORV32 CPU Reset**

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers are initialized by the CPU's internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by software (to be more specific, by the `crt0.S` start-up code).
 
During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example, the machine interrupt-enable CSR (`mie`)
does not provide a dedicated reset. The value of this register after reset is uncritical as interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated
hardware reset setting it to low (globally disabling interrupts).
 
**Reset Configuration**

Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of the uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):
 
[source,vhdl]
----
-- reset configuration for "uncritical" registers --
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value
-- for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW),
-- default; TRUE=defined LOW reset value)
----
