OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [docs/] [datasheet/] [cpu.adoc] - Blame information for rev 72


Line No. Rev Author Line
1 60 zero_gravi
:sectnums:
2
== NEORV32 Central Processing Unit (CPU)
3
 
4 72 zero_gravi
image::neorv32_cpu_block.png[width=600,align=center]
5 60 zero_gravi
 
6
**Key Features**
7
 
8 66 zero_gravi
* 32-bit multi-cycle in-order `rv32` RISC-V CPU
9 61 zero_gravi
* Optional RISC-V extensions:
10
** `A` - atomic memory access operations
11 66 zero_gravi
** `B` - bit-manipulation instructions
12 61 zero_gravi
** `C` - 16-bit compressed instructions
13
** `I` - integer base ISA (always enabled)
14
** `E` - embedded CPU version (reduced register file size)
15
** `M` - integer multiplication and division hardware
16
** `U` - less-privileged _user_ mode
17
** `Zfinx` - single-precision floating-point unit
18
** `Zicsr` - control and status register access (privileged architecture)
19 66 zero_gravi
** `Zicntr` - CPU base counters
20
** `Zihpm` - hardware performance monitors
21 61 zero_gravi
** `Zifencei` - instruction stream synchronization
22
** `Zmmul` - integer multiplication hardware
23 72 zero_gravi
** `Zxcfu` - custom instructions extension
24 61 zero_gravi
** `PMP` - physical memory protection
25 72 zero_gravi
** `Debug` - debug mode (part of the on-chip debugger) including hardware trigger module
26 65 zero_gravi
* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
27 60 zero_gravi
* Official RISC-V open-source architecture ID
28 65 zero_gravi
* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts
29 66 zero_gravi
* Supports _all_ of the machine-level traps from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
30
** This is a key aspect of the _execution safety_ provided by <<_full_virtualization>>
31 60 zero_gravi
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
32
* Optional hardware performance monitors (HPM) for application benchmarking
33 66 zero_gravi
* Separate interfaces for instruction fetch and data access (merged into a single processor bus)
34 60 zero_gravi
* Little-endian byte order
35
* Configurable hardware reset
36 65 zero_gravi
* No hardware support of unaligned data/instruction accesses - they will trigger an exception.
37 60 zero_gravi
 
38
[NOTE]
39
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
40
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
41
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
42
setup also lets you keep using the default bootloader and software framework. From this base you
43 70 zero_gravi
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.
44 60 zero_gravi
 
45
[NOTE]
46
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.
47
 
48
<<<
49
// ####################################################################################################################
50
:sectnums:
51
=== Architecture
52
 
53
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
54
specifications. The following figure shows the simplified architecture of the CPU.
55
 
56
image::neorv32_cpu.png[align=center]
57
 
58 66 zero_gravi
The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
59
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
60
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
61
front-end can already fetch new instructions while the back-end is still processing previously-fetched instructions.
62 60 zero_gravi
 
63 66 zero_gravi
The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
64
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
65
data is stored to a FIFO queue - the instruction prefetch buffer.
66 60 zero_gravi
 
67 66 zero_gravi
The back-end is responsible for the actual execution of the instruction. It includes an "issue engine",
68
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
69
instruction or decompressed 16-bit instructions) for execution.
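
The job of the issue engine can be illustrated with a small behavioural model. This Python sketch is not derived from the RTL. It relies only on the RISC-V encoding rule that an instruction whose two lowest bits are not `11` is a 16-bit compressed instruction. The name `split_parcels` is invented for this illustration, and decompression itself is not modelled.

```python
def split_parcels(parcels):
    """Group 16-bit parcels (in fetch order) into instruction words.

    Returns a list of (width_in_bits, raw_word) tuples.
    """
    instructions = []
    i = 0
    while i < len(parcels):
        low = parcels[i] & 0xFFFF
        if low & 0b11 != 0b11:
            # Compressed instruction: a single 16-bit parcel.
            instructions.append((16, low))
            i += 1
        else:
            # Uncompressed instruction: the next parcel is the upper half.
            if i + 1 >= len(parcels):
                break  # upper half has not been fetched yet
            high = parcels[i + 1] & 0xFFFF
            instructions.append((32, (high << 16) | low))
            i += 2
    return instructions


# c.nop (0x0001) followed by an unaligned 32-bit `addi x0, x0, 0` (0x00000013)
stream = [0x0001, 0x0013, 0x0000]
decoded = split_parcels(stream)
```

In this example the 32-bit instruction starts on a 16-bit boundary, so its two halves arrive in different positions of the parcel stream and have to be re-assembled before execution.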
70
 
71
Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
72
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
73
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
74
when the CPU front-end has to reload the prefetch buffer due to a taken branch.
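
As a rough worked example of what the CPI means for run time, the following Python sketch computes execution time from instruction count, CPI and clock frequency. Only the optimal CPI of 2 comes from this section. The 100 MHz clock and the per-class cycle counts in `mix` are hypothetical placeholders, not measured NEORV32 figures.

```python
def runtime_seconds(instructions, cpi, clock_hz):
    """Execution time = instruction count * cycles per instruction / clock."""
    return instructions * cpi / clock_hz


def total_cycles(mix):
    """Sum the cycles of a list of (instruction_count, cycles_each) pairs."""
    return sum(count * cycles for count, cycles in mix)


# Best case: one million instructions at CPI = 2 on an assumed 100 MHz clock.
best_case = runtime_seconds(1_000_000, 2, 100_000_000)

# Hypothetical mix of 1000 instructions: simple ALU ops, memory accesses
# and a few long multi-cycle operations.
mix = [(800, 2), (150, 4), (50, 36)]
cycles = total_cycles(mix)
effective_cpi = cycles / sum(count for count, _ in mix)
```

With these placeholder numbers a small share of long-running instructions is enough to double the effective CPI compared to the optimum.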
75
 
76 60 zero_gravi
Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
77
requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes
78 66 zero_gravi
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
79
these two classical design paradigms allows a higher instruction throughput than a pure multi-cycle
80
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
81
multi-cycle concept).
82 60 zero_gravi
 
83 66 zero_gravi
As a Von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access.
84
These two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses
85
have higher priority). Hence, ALL memory locations including peripheral devices are mapped to a single unified 32-bit
86
address space.
87 60 zero_gravi
 
88
 
89
// ####################################################################################################################
90
:sectnums:
91 66 zero_gravi
=== Full Virtualization
92
 
93 72 zero_gravi
Just like the RISC-V ISA the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU and SoC level to
94 66 zero_gravi
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.
95
footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
96 72 zero_gravi
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
97
malformed instruction or accessing a non-allocated memory address). For any kind of trap the core is always in a
98 66 zero_gravi
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
99 72 zero_gravi
might have to be reverted). This allows a defined and predictable execution behavior at any time improving overall execution safety.
100 66 zero_gravi
 
101
**Execution Safety - NEORV32 Virtualization Features**
102
 
103
* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
104
(i.e. there is no speculative execution / no out-of-order states).
105
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
106 72 zero_gravi
accessed address does not respond or encounters an internal device error during access.
107 66 zero_gravi
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
108
window. Otherwise a bus access exception is raised.
109
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
110
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions do raise an
111
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
112
memory operations).
113
* To be continued...
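
The response-time rule from the list above can be modelled in a few lines. This is a behavioural sketch only: `BusAccessFault`, `bus_access` and the default window of 15 cycles are hypothetical names and values chosen for illustration, since the actual window length is not stated in this section.

```python
class BusAccessFault(Exception):
    """Stands in for a bus access exception (trap)."""


def bus_access(response_latency, timeout_cycles=15):
    """Return the latency of an acknowledged access.

    response_latency: cycles until the device acknowledges, or None if
    nothing responds at the accessed address.
    timeout_cycles: placeholder for the fixed response window.
    """
    if response_latency is None or response_latency > timeout_cycles:
        raise BusAccessFault("no acknowledge within the time window")
    return response_latency


def try_access(response_latency):
    """Report whether an access completes or ends in a trap."""
    try:
        bus_access(response_latency)
        return "ok"
    except BusAccessFault:
        return "trap"
```

Because an unanswered access always ends in a trap, software never waits indefinitely on an address that is not backed by a device.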
114
 
115
 
116
// ####################################################################################################################
117
:sectnums:
118 60 zero_gravi
=== RISC-V Compatibility
119
 
120
The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
121
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
122 62 zero_gravi
NEORV32 processor are located in the repository's `sw/isa-test` folder.
123
 
124
[NOTE]
125
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
126 60 zero_gravi
for information on how to run the tests on the NEORV32.
127
 
128
.**RISC-V `rv32_m/C` Tests**
129
...................................
130
Check cadd-01           ... OK
131
Check caddi-01          ... OK
132
Check caddi16sp-01      ... OK
133
Check caddi4spn-01      ... OK
134
Check cand-01           ... OK
135
Check candi-01          ... OK
136
Check cbeqz-01          ... OK
137
Check cbnez-01          ... OK
138
Check cebreak-01        ... OK
139
Check cj-01             ... OK
140
Check cjal-01           ... OK
141
Check cjalr-01          ... OK
142
Check cjr-01            ... OK
143
Check cli-01            ... OK
144
Check clui-01           ... OK
145
Check clw-01            ... OK
146
Check clwsp-01          ... OK
147
Check cmv-01            ... OK
148
Check cnop-01           ... OK
149
Check cor-01            ... OK
150
Check cslli-01          ... OK
151
Check csrai-01          ... OK
152
Check csrli-01          ... OK
153
Check csub-01           ... OK
154
Check csw-01            ... OK
155
Check cswsp-01          ... OK
156
Check cxor-01           ... OK
157
--------------------------------
158
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
159
...................................
160
 
161
.**RISC-V `rv32_m/I` Tests**
162
...................................
163
Check add-01            ... OK
164
Check addi-01           ... OK
165
Check and-01            ... OK
166
Check andi-01           ... OK
167
Check auipc-01          ... OK
168
Check beq-01            ... OK
169
Check bge-01            ... OK
170
Check bgeu-01           ... OK
171
Check blt-01            ... OK
172
Check bltu-01           ... OK
173
Check bne-01            ... OK
174
Check fence-01          ... OK
175
Check jal-01            ... OK
176
Check jalr-01           ... OK
177
Check lb-align-01       ... OK
178
Check lbu-align-01      ... OK
179
Check lh-align-01       ... OK
180
Check lhu-align-01      ... OK
181
Check lui-01            ... OK
182
Check lw-align-01       ... OK
183
Check or-01             ... OK
184
Check ori-01            ... OK
185
Check sb-align-01       ... OK
186
Check sh-align-01       ... OK
187
Check sll-01            ... OK
188
Check slli-01           ... OK
189
Check slt-01            ... OK
190
Check slti-01           ... OK
191
Check sltiu-01          ... OK
192
Check sltu-01           ... OK
193
Check sra-01            ... OK
194
Check srai-01           ... OK
195
Check srl-01            ... OK
196
Check srli-01           ... OK
197
Check sub-01            ... OK
198
Check sw-align-01       ... OK
199
Check xor-01            ... OK
200
Check xori-01           ... OK
201 70 zero_gravi
Check fence-01          ... OK
202 60 zero_gravi
--------------------------------
203 70 zero_gravi
OK: 39/39 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
204 60 zero_gravi
...................................
205
 
206
.**RISC-V `rv32_m/M` Tests**
207
...................................
208
Check div-01            ... OK
209
Check divu-01           ... OK
210
Check mul-01            ... OK
211
Check mulh-01           ... OK
212
Check mulhsu-01         ... OK
213
Check mulhu-01          ... OK
214
Check rem-01            ... OK
215
Check remu-01           ... OK
216
--------------------------------
217
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
218
...................................
219
 
220
.**RISC-V `rv32_m/privilege` Tests**
221
...................................
222
Check ebreak            ... OK
223
Check ecall             ... OK
224
Check misalign-beq-01   ... OK
225
Check misalign-bge-01   ... OK
226
Check misalign-bgeu-01  ... OK
227
Check misalign-blt-01   ... OK
228
Check misalign-bltu-01  ... OK
229
Check misalign-bne-01   ... OK
230
Check misalign-jal-01   ... OK
231
Check misalign-lh-01    ... OK
232
Check misalign-lhu-01   ... OK
233
Check misalign-lw-01    ... OK
234
Check misalign-sh-01    ... OK
235
Check misalign-sw-01    ... OK
236
Check misalign1-jalr-01 ... OK
237
Check misalign2-jalr-01 ... OK
238
--------------------------------
239
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
240
...................................
241
 
242
.**RISC-V `rv32_m/Zifencei` Tests**
243
...................................
244
Check Fencei            ... OK
245
--------------------------------
246
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
247
...................................
248
 
249
 
250
<<<
251
:sectnums:
252
==== RISC-V Incompatibility Issues and Limitations
253
 
254 64 zero_gravi
This list shows the currently identified issues regarding full RISC-V-compatibility. More specific information
255 60 zero_gravi
can be found in section <<_instruction_sets_and_extensions>>.
256
 
257 69 zero_gravi
.Read-Only "Read-Write" CSRs
258 60 zero_gravi
[IMPORTANT]
259 72 zero_gravi
The <<_misa>> and <<_mtval>> CSRs in the NEORV32 are _read-only_.
260 69 zero_gravi
Any machine-mode write access to them is ignored and will _not_ cause any exceptions or side-effects to maintain
261
RISC-V compatibility.
262 60 zero_gravi
 
263 69 zero_gravi
.Physical Memory Protection
264 60 zero_gravi
[IMPORTANT]
265 70 zero_gravi
The physical memory protection (see section <<_machine_physical_memory_protection_csrs>>)
266 60 zero_gravi
currently supports only the modes _OFF_ and _NAPOT_ and a minimal granularity of 8 bytes per region.
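
For reference, the _NAPOT_ (naturally aligned power-of-two) address encoding defined by the RISC-V privileged specification can be sketched as follows. The helper names `napot_pmpaddr` and `napot_decode` are invented for this illustration and are not part of the NEORV32 software framework. The 8-byte minimum matches the granularity stated above.

```python
def napot_pmpaddr(base, size):
    """Encode a NAPOT region (byte address, byte size) as a pmpaddr value.

    The matching pmpcfg entry needs its A field (bits 4:3) set to 3 (NAPOT).
    """
    if size < 8 or size & (size - 1):
        raise ValueError("size must be a power of two and at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to the region size")
    # pmpaddr holds address bits [33:2]; the trailing ones encode the size.
    return (base >> 2) | ((size >> 3) - 1)


def napot_decode(pmpaddr):
    """Recover (base, size) from a NAPOT-encoded pmpaddr value."""
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 8 << trailing_ones
    base = (pmpaddr & ~((1 << (trailing_ones + 1)) - 1)) << 2
    return base, size


# A 64 KiB region at an example base address of 0x80000000.
encoded = napot_pmpaddr(0x8000_0000, 64 * 1024)
```

Each additional trailing one bit doubles the region size, so a value with no trailing ones describes the minimal 8-byte region.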
267
 
268 69 zero_gravi
.Atomic Memory Operations
269 60 zero_gravi
[IMPORTANT]
270 64 zero_gravi
The `A` CPU extension currently implements only the `lr.w` and `sc.w` instructions.
271
However, these instructions are sufficient to emulate all further atomic memory operations.
272 60 zero_gravi
 
273 66 zero_gravi
 
274 60 zero_gravi
<<<
275
// ####################################################################################################################
276
:sectnums:
277
=== CPU Top Entity - Signals
278
 
279
The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
280
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
281
direction seen from the CPU.
282
 
283
.NEORV32 CPU top entity signals
284
[cols="<2,^1,^1,<6"]
285
[options="header", grid="rows"]
286
|=======================
287
| Signal           | Width | Dir.   | Function
288
4+^| **Global Signals**
289
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
290
| `rstn_i`         |     1 | in  | global reset, low-active
291
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
292 69 zero_gravi
| `debug_o`        |     1 | out | CPU is in debug mode when set
293 60 zero_gravi
4+^| **Instruction Bus Interface (<<_bus_interface>>)**
294
| `i_bus_addr_o`   |    32 | out | destination address
295
| `i_bus_rdata_i`  |    32 | in  | read data
296
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
297
| `i_bus_ben_o`    |     4 | out | byte enable
298
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
299
| `i_bus_re_o`     |     1 | out | read transaction
300
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
301
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
302
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
303
| `i_bus_fence_o`  |     1 | out | indicates an executed _fence.i_ instruction
304
| `i_bus_priv_o`   |     2 | out | current CPU privilege level
305
4+^| **Data Bus Interface (<<_bus_interface>>)**
306
| `d_bus_addr_o`   |    32 | out | destination address
307
| `d_bus_rdata_i`  |    32 | in  | read data
308
| `d_bus_wdata_o`  |    32 | out | write data
309
| `d_bus_ben_o`    |     4 | out | byte enable
310
| `d_bus_we_o`     |     1 | out | write transaction
311
| `d_bus_re_o`     |     1 | out | read transaction
312
| `d_bus_lock_o`   |     1 | out | exclusive access request
313
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
314
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
315
| `d_bus_fence_o`  |     1 | out | indicates an executed _fence_ instruction
316
| `d_bus_priv_o`   |     2 | out | current CPU privilege level
317
4+^| **System Time (see <<_timeh>> CSR)**
318
| `time_i`         |    64 | in  | system time input (from MTIME)
319
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
320
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
321
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
322
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
323
4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
324
| `firq_i`         |    16 | in  | fast interrupt request signals
325
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
326
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
327
|=======================
328
 
329
<<<
330
// ####################################################################################################################
331
:sectnums:
332
=== CPU Top Entity - Generics
333
 
334
Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
335
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
336
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
337
The _specific_ generics are listed below.
338
 
339
[cols="4,4,2"]
340
[frame="all",grid="none"]
341
|======
342 72 zero_gravi
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | -
343 60 zero_gravi
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
344
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
345 61 zero_gravi
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
346 60 zero_gravi
|======
347
 
348
[cols="4,4,2"]
349
[frame="all",grid="none"]
350
|======
351 72 zero_gravi
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | -
352 60 zero_gravi
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
353
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
354
|======
355
 
356
[cols="4,4,2"]
357
[frame="all",grid="none"]
358
|======
359 72 zero_gravi
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | -
360 60 zero_gravi
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
361
|======
362
 
363
 
364
<<<
365
// ####################################################################################################################
366
:sectnums:
367
=== Instruction Sets and Extensions
368
 
369 65 zero_gravi
The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
370 60 zero_gravi
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
371 65 zero_gravi
see the _RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
372 60 zero_gravi
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.
373
 
374 72 zero_gravi
.Discovering ISA Extensions
375 60 zero_gravi
[TIP]
376 72 zero_gravi
The CPU can discover available ISA extensions via the <<_misa>> & <<_mxisa>> CSRs
377
or by executing an instruction and checking for an _illegal instruction exception_
378
(-> <<_full_virtualization>>). +
379
 +
380 65 zero_gravi
Executing an instruction from an extension that is not supported yet or that is currently not enabled
381 72 zero_gravi
(via the according top entity generic) will raise an illegal instruction exception.
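
Decoding the `misa` CSR follows the standard RISC-V layout: bit 0 stands for extension `A`, bit 1 for `B`, and so on up to bit 25 for `Z`, while the two topmost bits hold the base width. The Python sketch below decodes a made-up example value. It does not read a real CSR, and sub-extensions such as `Zicsr` are not visible in `misa` at all.

```python
def misa_extensions(misa):
    """Return the single-letter extensions flagged in a 32-bit misa value."""
    return "".join(chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1)


def misa_mxl(misa):
    """Return the MXL field (bits 31:30 on RV32); 1 means XLEN = 32."""
    return (misa >> 30) & 0b11


# Hypothetical value: MXL = 1 with the C, I, M and X bits set.
example = (1 << 30) | (1 << 2) | (1 << 8) | (1 << 12) | (1 << 23)
letters = misa_extensions(example)
```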
382 60 zero_gravi
 
383 63 zero_gravi
 
384 60 zero_gravi
==== **`A`** - Atomic Memory Access
385
 
386 65 zero_gravi
Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
387
The RISC-V specs define a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
388 72 zero_gravi
ISA extension is enabled if the <<_cpu_extension_riscv_a>> configuration generic is _true_.
389 65 zero_gravi
In this case the following additional instructions are available:
390 60 zero_gravi
 
391
* `lr.w`: load-reservate
392
* `sc.w`: store-conditional
393
 
394
[NOTE]
395
Even though only `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
396
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
397 65 zero_gravi
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
398
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.
399 60 zero_gravi
 
400 65 zero_gravi
The *load-reservate* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
401
_data memory access lock_. Executing a *store-conditional* behaves as a "normal" store-word instruction (`sw`) that will
402
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
403
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
404
After the execution of the `sc` instruction, the lock is automatically removed.
405
 
406
The lock is broken if at least one of the following conditions occurs:
407
. executing any data memory access instruction other than `lr.w`
408
. raising _any_ trap (for example an interrupt or a memory access exception)
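
The lock behaviour described above, and the way further atomic operations can be emulated with it, can be summarised in a small behavioural model. This is a Python sketch of the documented rules, not of the hardware. The class `LockModel` and the helper `amoadd_w` are hypothetical names used only here.

```python
class LockModel:
    """Behavioural model of the lr.w / sc.w data memory access lock."""

    def __init__(self):
        self.mem = {}
        self.lock = False

    def lr_w(self, addr):
        self.lock = True               # load the word and set the lock
        return self.mem.get(addr, 0)

    def sc_w(self, addr, value):
        intact = self.lock
        if intact:
            self.mem[addr] = value & 0xFFFFFFFF
        self.lock = False              # the lock is removed after every sc.w
        return 0 if intact else 1      # zero reports a successful store

    def lw(self, addr):
        self.lock = False              # any other data access breaks the lock
        return self.mem.get(addr, 0)

    def sw(self, addr, value):
        self.lock = False
        self.mem[addr] = value & 0xFFFFFFFF

    def trap(self):
        self.lock = False              # any trap breaks the lock


def amoadd_w(cpu, addr, operand):
    """Emulate an atomic add: retry until the store-conditional succeeds."""
    while True:
        old = cpu.lr_w(addr)
        if cpu.sc_w(addr, old + operand) == 0:
            return old
```

The retry loop in `amoadd_w` is the usual pattern for building read-modify-write operations from a load-reserve/store-conditional pair: if anything breaks the lock between the two instructions, the store is skipped and the sequence starts over.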
409
 
410 60 zero_gravi
[NOTE]
411
The atomic instructions have special requirements for memory system / bus interconnect. More
412
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
413
 
414
 
415 66 zero_gravi
==== **`B`** - Bit-Manipulation Operations
416
 
417
The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
418 72 zero_gravi
<<_cpu_extension_riscv_b>> configuration generic is _true_.
419 66 zero_gravi
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
420 71 zero_gravi
A copy of the spec is also available in `docs/references`.
421 66 zero_gravi
 
422 71 zero_gravi
The NEORV32 `B` ISA extension includes the following sub-extensions (according to the RISC-V
423
bit-manipulation spec. v0.93) and their corresponding instructions:
424 66 zero_gravi
 
425 71 zero_gravi
* **`Zba` - Address-generation instructions**
426
** `sh1add` `sh2add` `sh3add`
427
* **`Zbb` - Basic bit-manipulation instructions**
428
** `andn` `orn` `xnor`
429
** `clz` `ctz` `cpop`
430
** `max` `maxu` `min` `minu`
431
** `sext.b` `sext.h` `zext.h`
432
** `rol` `ror` `rori`
433
** `orc.b` `rev8`
434
* **`Zbc` - Carry-less multiplication instructions**
435
** `clmul` `clmulh` `clmulr`
436
* **`Zbs` - Single-bit instructions**
437
** `bclr` `bclri`
438
** `bext` `bexti`
439
** `binv` `binvi`
440
** `bset` `bseti`
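
To give a feel for what these instructions compute, the following Python reference functions model a handful of them for 32-bit operands, following the semantics in the RISC-V bit-manipulation specification. They are a sketch for checking expected results on a host machine and are unrelated to the intrinsic library shipped with the project.

```python
MASK32 = 0xFFFFFFFF


def andn(rs1, rs2):
    """Zbb: AND with inverted second operand."""
    return rs1 & ~rs2 & MASK32


def clz(rs1):
    """Zbb: count leading zero bits (32 for an all-zero input)."""
    return 32 - (rs1 & MASK32).bit_length()


def cpop(rs1):
    """Zbb: count set bits."""
    return bin(rs1 & MASK32).count("1")


def rol(rs1, rs2):
    """Zbb: rotate left by the lowest five bits of rs2."""
    sh = rs2 & 31
    rs1 &= MASK32
    return ((rs1 << sh) | (rs1 >> (32 - sh))) & MASK32


def rev8(rs1):
    """Zbb: reverse the byte order."""
    return int.from_bytes((rs1 & MASK32).to_bytes(4, "little"), "big")


def sh1add(rs1, rs2):
    """Zba: rs2 + (rs1 << 1)."""
    return (rs2 + (rs1 << 1)) & MASK32


def clmul(rs1, rs2):
    """Zbc: lower 32 bits of the carry-less product."""
    result = 0
    for i in range(32):
        if (rs2 >> i) & 1:
            result ^= rs1 << i
    return result & MASK32
```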
441 66 zero_gravi
 
442
[TIP]
443
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
444
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
445
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
446
shift-related `B` instructions.
447
 
448
[WARNING]
449 71 zero_gravi
The `B` extension is frozen and officially ratified. However, there is no
450
software support for this extension in the upstream GCC RISC-V port yet. An
451 66 zero_gravi
intrinsic library is provided to utilize the provided `B` extension features from C-language
452 71 zero_gravi
code (see `sw/example/bitmanip_test`) to circumvent this.
453 66 zero_gravi
 
454
 
455 60 zero_gravi
==== **`C`** - Compressed Instructions
456
 
457 65 zero_gravi
The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.
458 72 zero_gravi
The `C` extension is available when the <<_cpu_extension_riscv_c>> configuration generic is _true_.
459 65 zero_gravi
In this case the following instructions are available:
460 60 zero_gravi
 
461 70 zero_gravi
* `c.addi4spn` `c.lw` `c.sw` `c.nop` `c.addi` `c.jal` `c.li` `c.addi16sp` `c.lui` `c.srli` `c.srai` `c.andi` `c.sub`
462
`c.xor` `c.or` `c.and` `c.j` `c.beqz` `c.bnez` `c.slli` `c.lwsp` `c.jr` `c.mv` `c.ebreak` `c.jalr` `c.add` `c.swsp`
463 60 zero_gravi
 
464
[NOTE]
465 65 zero_gravi
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
466
an additional instruction fetch to load the according second half-word of that instruction. The performance can be increased
467 60 zero_gravi
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
468
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
469
 
470
 
471
==== **`E`** - Embedded CPU
472
 
473 65 zero_gravi
The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
474 72 zero_gravi
decrease physical hardware requirements (for example block RAM). This extension is enabled when the <<_cpu_extension_riscv_e>>
475 65 zero_gravi
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
476
This extension does not add any additional instructions or features.
477 60 zero_gravi
 
478 70 zero_gravi
[NOTE]
479 63 zero_gravi
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
480 60 zero_gravi
 
481
 
482
==== **`I`** - Base Integer ISA
483 65 zero_gravi
 
484 60 zero_gravi
The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
485
regardless of the setting of the remaining extensions. The base instruction set includes the following
486
instructions:
487
 
488 70 zero_gravi
* immediate: `lui` `auipc`
489
* jumps: `jal` `jalr`
490
* branches: `beq` `bne` `blt` `bge` `bltu` `bgeu`
491
* memory: `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw`
492
* alu: `addi` `slti` `sltiu` `xori` `ori` `andi` `slli` `srli` `srai` `add` `sub` `sll` `slt` `sltu` `xor` `srl` `sra` `or` `and`
493
* environment: `ecall` `ebreak` `fence`
494 60 zero_gravi
 
495
[NOTE]
496 70 zero_gravi
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
497 61 zero_gravi
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
498 70 zero_gravi
completely in parallel by a fast (but large) barrel shifter if the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
499 62 zero_gravi
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
500 60 zero_gravi
 
501
[NOTE]
502
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
503 70 zero_gravi
top's `d_bus_fence_o` signal high for one cycle to inform the memory system a `fence` instruction has been
504 60 zero_gravi
executed. Any flags within the `fence` instruction word are ignored by the hardware.
505
 
506
 
507
==== **`M`** - Integer Multiplication and Division
508
 
509 65 zero_gravi
Hardware-accelerated integer multiplication and division operations are available when the
510 72 zero_gravi
<<_cpu_extension_riscv_m>> configuration generic is _true_. In this case the following instructions are
511 60 zero_gravi
available:
512
 
513 70 zero_gravi
* multiplication: `mul` `mulh` `mulhsu` `mulhu`
514
* division: `div` `divu` `rem` `remu`
515 60 zero_gravi
 
516
[NOTE]
517
By default, multiplication and division operations are executed in a bit-serial approach.
518
Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN`
519
generic is _true_ allowing faster execution. Multiplications and divisions
520
always require a fixed amount of cycles to complete - regardless of the input operands.
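
The bit-serial approach with a fixed cycle count can be pictured as a classic shift-and-add loop that always runs through all operand bits. The Python sketch below is a model under the assumption of one step per operand bit (32 steps). It is not taken from the RTL, and the real cycle counts are not stated in this section.

```python
MASK32 = 0xFFFFFFFF


def serial_mul(a, b):
    """Unsigned 32x32 -> 64-bit shift-and-add product.

    Returns (product, steps). The loop never exits early, so the step
    count does not depend on the operand values.
    """
    a &= MASK32
    b &= MASK32
    product = 0
    steps = 0
    for bit in range(32):
        if (b >> bit) & 1:
            product += a << bit
        steps += 1
    return product, steps


def mul(a, b):
    """Lower 32 bits of the product (result of `mul`)."""
    return serial_mul(a, b)[0] & MASK32


def mulhu(a, b):
    """Upper 32 bits of the unsigned product (result of `mulhu`)."""
    return serial_mul(a, b)[0] >> 32
```

A data-independent step count is what makes the execution time of these operations predictable regardless of the input operands.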
521
 
522
 
523 61 zero_gravi
==== **`Zmmul`** - Integer Multiplication
524
 
525
This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
526 65 zero_gravi
of the `M` extension and is intended for size-constrained setups that require hardware-based
527 61 zero_gravi
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
528 65 zero_gravi
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.
529 72 zero_gravi
It is implemented if the <<_cpu_extension_riscv_zmmul>> configuration generic is _true_.
530 61 zero_gravi
 
531 70 zero_gravi
* multiplication: `mul` `mulh` `mulhsu` `mulhu`
532 61 zero_gravi
 
533 63 zero_gravi
If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
534
will raise an _illegal instruction exception_.
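
When divisions are computed in software, the compiler's runtime support library typically supplies a routine along the lines of the following restoring division. The Python sketch only illustrates the algorithm. The function name `soft_divu` is invented, and the divide-by-zero results follow the RISC-V convention for `divu`/`remu` (all-ones quotient, dividend as remainder).

```python
MASK32 = 0xFFFFFFFF


def soft_divu(dividend, divisor):
    """Unsigned 32-bit restoring division; returns (quotient, remainder)."""
    dividend &= MASK32
    divisor &= MASK32
    if divisor == 0:
        # RISC-V defined results for division by zero (no trap is raised).
        return MASK32, dividend
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Shift in the next dividend bit, then try to subtract the divisor.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder
```

The loop needs only shifts, compares and subtractions, all of which are part of the base `rv32i` instruction set.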
535 61 zero_gravi
 
536 63 zero_gravi
Note that `M` and `Zmmul` extensions _cannot_ be enabled at the same time.
537 61 zero_gravi
 
538
[TIP]
539
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
540
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
541 65 zero_gravi
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
542 61 zero_gravi
 
543
 
544 60 zero_gravi
==== **`U`** - Less-Privileged User Mode
545
 
546 65 zero_gravi
In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second less-privileged
547 72 zero_gravi
operation mode. It is implemented if the <<_cpu_extension_riscv_u>> configuration generic is _true_.
548 65 zero_gravi
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
549
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
550 72 zero_gravi
Any kind of privilege rights violation will raise an exception to allow <<_full_virtualization>>.
551 60 zero_gravi
 
552
 
553
==== **`X`** - NEORV32-Specific (Custom) Extensions
554
 
555 72 zero_gravi
The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the <<_misa>> CSR.
556 60 zero_gravi
 
557 63 zero_gravi
The most important points of the NEORV32-specific extensions are:
558
* The CPU provides 16 _fast interrupt_ request lines (`FIRQ`), which are controlled via custom bits in the `mie`
559 72 zero_gravi
and <<_mip>> CSRs. This extension is mapped to CSR bits that are available for custom use (according to the
560
RISC-V specs). Also, custom trap codes for <<_mcause>> are implemented.
561 63 zero_gravi
* All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see <<_full_virtualization>>).
562 72 zero_gravi
* There are <<_neorv32_specific_csrs>>.
563 60 zero_gravi
 
564
 
565 63 zero_gravi
==== **`Zfinx`** Single-Precision Floating-Point Operations
566 60 zero_gravi
 
567 65 zero_gravi
The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
568
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
569
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
570
less hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
571
register file-related load/store or move instructions.
572
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
573 60 zero_gravi
 
574 70 zero_gravi
[NOTE]
575 60 zero_gravi
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.
576
 
577 65 zero_gravi
The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
578 72 zero_gravi
to the `F` extension. The `Zfinx` extension is implemented when the <<_cpu_extension_riscv_zfinx>> configuration
579 60 zero_gravi
generic is _true_. In this case the following instructions and CSRs are available:
580
 
581 70 zero_gravi
* conversion: `fcvt.s.w` `fcvt.s.wu` `fcvt.w.s` `fcvt.wu.s`
582
* comparison: `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s`
583
* computational: `fadd.s` `fsub.s` `fmul.s`
584
* sign-injection: `fsgnj.s` `fsgnjn.s` `fsgnjx.s`
585 60 zero_gravi
* number classification: `fclass.s`
586
 
587 72 zero_gravi
* additional CSRs: <<_fcsr>>, <<_frm>>, <<_fflags>>
588 60 zero_gravi
 
589
[WARNING]
590
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
591
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
592
 
593
[WARNING]
594 65 zero_gravi
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
595
Subnormal numbers (exponent = 0) are _flushed to zero_ setting them to +/- 0 before entering the
596 60 zero_gravi
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
597
result is also flushed to zero during normalization.
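
The flush-to-zero behavior described above can be sketched on the raw IEEE-754 single-precision bit pattern. This is only an illustrative software model under the assumption that "exponent = 0" refers to the 8-bit biased exponent field; the function name `flush_subnormal` is hypothetical and does not exist in the NEORV32 sources.

[source,c]
----
#include <stdint.h>

/* Model of flush-to-zero on a single-precision bit pattern:
 * any value with a zero exponent field becomes +0 or -0. */
static uint32_t flush_subnormal(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu; /* biased exponent field */
    if (exponent == 0u) {
        return bits & 0x80000000u; /* keep only the sign bit */
    }
    return bits; /* normal numbers, infinities and NaNs pass through */
}
----

For example, the smallest positive subnormal `0x00000001` maps to `0x00000000`, while `0x3F800000` (1.0) is left unchanged.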
598
 
599
[WARNING]
600
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
601
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
602
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
603
code (see `sw/example/floating_point_test`).
604
 
605 63 zero_gravi
 
606 60 zero_gravi
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
607
 
608 65 zero_gravi
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
609 72 zero_gravi
are implemented when the <<_cpu_extension_riscv_zicsr>> configuration generic is _true_.
610 68 zero_gravi
 
611
[IMPORTANT]
612
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
613
In order to provide the full set of privileged functions that are required to run more complex tasks like
614 70 zero_gravi
operating systems and to allow a secure execution environment, the `Zicsr` extension should always be enabled.
615 68 zero_gravi
 
616 65 zero_gravi
In this case the following instructions are available:
617 60 zero_gravi
 
618 70 zero_gravi
* CSR access: `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci`
619
* environment: `mret` `wfi`
620 60 zero_gravi
 
621 68 zero_gravi
[NOTE]
622
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.
623
However, access privileges are still enforced so these instruction variants _do_ cause side-effects
624
(the RISC-V spec. states that these combinations "_shall_ not cause any side-effects").
625 60 zero_gravi
 
626
[NOTE]
627 68 zero_gravi
The "wait for interrupt instruction" `wfi` acts like a sleep command. When executed, the CPU is
628 60 zero_gravi
halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to
629 72 zero_gravi
be enabled via the <<_mie>> CSR and the global interrupt enable flag in <<_mstatus>> has to be set.
630 65 zero_gravi
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
631 68 zero_gravi
`TW` (timeout wait) is _hardwired_ to zero.
632 60 zero_gravi
 
633 62 zero_gravi
 
634 66 zero_gravi
 
635
==== **`Zicntr`** CPU Base Counters
636
 
637
The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
638
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
639
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
640
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.
641
 
642
[NOTE]
643
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.
644
 
645
If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
646
 
647
 
648
 
649
==== **`Zihpm`** Hardware Performance Monitors
650
 
651
In addition to the base cycle, instruction-retired and time counters the NEORV32 CPU provides
652
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
653
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
654 72 zero_gravi
<<_hpm_cnt_width>> generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
655 66 zero_gravi
CSR defines the architectural events that lead to an increment of the associated HPM counter.
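
As a sketch of how software might assemble a counter value from the two 32-bit CSR words, the helper below combines a high word and a low word and masks the result to the configured width N. The name `hpm_combine` is hypothetical, and the sketch assumes that counter bits above the configured width carry no information; reading the actual CSRs requires target-specific code that is not shown here.

[source,c]
----
#include <stdint.h>

/* Combine the high and low 32-bit words of an HPM counter and
 * keep only the lowest `width` bits (width = 0..64). */
static uint64_t hpm_combine(uint32_t high, uint32_t low, unsigned width) {
    uint64_t value = ((uint64_t)high << 32) | low;
    if (width >= 64u) {
        return value; /* full-width counter, nothing to mask */
    }
    return value & ((UINT64_C(1) << width) - 1u);
}
----

With a width of 40, for example, only the lowest 8 bits of the high word contribute to the result.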
656
 
657
The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.
658
 
659
Depending on the configuration the following additional CSRs are available:
660
 
661
* counters: `mhpmcounter*[h]` (3..31, depending on `HPM_NUM_CNTS`)
662
* event configuration: `mhpmevent*` (3..31, depending on `HPM_NUM_CNTS`)
663
 
664
[IMPORTANT]
665 72 zero_gravi
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according <<_mcounteren>> CSR bits
666 66 zero_gravi
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
667
exception.
668
 
669
[TIP]
670 72 zero_gravi
Auto-increment of the HPMs can be individually deactivated via the <<_mcountinhibit>> CSR.
671 66 zero_gravi
 
672
[TIP]
673
For a list of all HPM-related CSRs and all provided event configurations
674
see section <<_hardware_performance_monitors_hpm>>.
675
 
676
 
677 60 zero_gravi
==== **`Zifencei`** Instruction Stream Synchronization
678
 
679 72 zero_gravi
The `Zifencei` CPU extension is implemented if the <<_cpu_extension_riscv_zifencei>> configuration
680 60 zero_gravi
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
681
 
682
* `fence.i`
683
 
684 66 zero_gravi
The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
685 64 zero_gravi
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
686
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
687
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
688 60 zero_gravi
 
689
 
690 72 zero_gravi
==== **`Zxcfu`** Custom Instructions Extension (CFU)
691
 
692
The `Zxcfu` extension is a NEORV32-specific _custom RISC-V_ ISA extension (`Z` = sub-extension, `x` = platform-specific
693
custom extension, `cfu` = name of the custom extension). When enabled via the <<_cpu_extension_riscv_zxcfu>> configuration
694
generic, this ISA extension adds the <<_custom_functions_unit_cfu>> to the CPU core. The CFU is a module that
695
allows adding **custom RISC-V instructions** to the processor core.
696
 
697
The CFU is implemented as an ALU co-processor and is integrated right into the CPU's pipeline, providing minimal data
698
transfer latency as it has direct access to the core's register file. Up to 1024 custom instructions can be
699
implemented within the CFU. These instructions are mapped to an OPCODE space that has been explicitly reserved by
700
the RISC-V spec for custom extensions.
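
To illustrate where a count of 1024 instructions can come from, the sketch below builds an R-type instruction word under two assumptions that are not stated in this section: that the CFU uses the RISC-V _custom-0_ major opcode (`0b0001011`) and that the instruction is selected by the `funct7` and `funct3` fields (7 + 3 = 10 bits, i.e. 2^10^ = 1024 combinations). The helper name `cfu_encode` is hypothetical; see the CFU section for the authoritative encoding.

[source,c]
----
#include <stdint.h>

#define RISCV_OPCODE_CUSTOM0 0x0Bu /* 0b0001011, reserved for custom use */
#define CFU_NUM_INSTR (1u << (7 + 3)) /* funct7 + funct3 select bits */

/* Assemble an R-type instruction word in the custom-0 opcode space. */
static uint32_t cfu_encode(uint32_t funct7, uint32_t funct3,
                           uint32_t rd, uint32_t rs1, uint32_t rs2) {
    return ((funct7 & 0x7Fu) << 25) | /* bits 31:25 */
           ((rs2 & 0x1Fu) << 20) |    /* bits 24:20 */
           ((rs1 & 0x1Fu) << 15) |    /* bits 19:15 */
           ((funct3 & 0x07u) << 12) | /* bits 14:12 */
           ((rd & 0x1Fu) << 7) |      /* bits 11:7  */
           RISCV_OPCODE_CUSTOM0;      /* bits 6:0   */
}
----

An intrinsic function would typically emit such a word via inline assembly so that the compiler can allocate the source and destination registers.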
701
 
702
Software can utilize the custom instructions by using _intrinsic functions_, which are inline assembly functions that
703
behave like "regular" C functions.
704
 
705
[TIP]
706
For more information regarding the CFU see section <<_custom_functions_unit_cfu>>.
707
 
708
[TIP]
709
The CFU / `Zxcfu` ISA extension is intended for application-specific _instructions_.
710
If you like to add more complex accelerators or interfaces that can also operate independently of
711
the CPU take a look at the memory-mapped <<_custom_functions_subsystem_cfs>>.
712
 
713
 
714 60 zero_gravi
==== **`PMP`** Physical Memory Protection
715
 
716 65 zero_gravi
The NEORV32 physical memory protection (PMP) is compatible with the RISC-V PMP specifications. It can be used
717
to constrain memory read/write/execute rights for each available privilege level.
718 60 zero_gravi
 
719 65 zero_gravi
The NEORV32 PMP currently supports only _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger
720
minimal sizes can be configured via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements.
721
The physical memory protection system is implemented when the `PMP_NUM_REGIONS` configuration generic is >0.
722
In this case the following additional CSRs are available:
723
 
724 60 zero_gravi
* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers
725
* `pmpaddr*` (0..63, depending on configuration): PMP address registers
726
 
727 65 zero_gravi
[TIP]
728 70 zero_gravi
See section <<_machine_physical_memory_protection_csrs>> for more information regarding the PMP CSRs.
729 60 zero_gravi
 
730
The actual number of regions and the minimal region granularity are defined via the top entity
731
`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available
732
granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the
733
number of available `pmpcfg*` and `pmpaddr*` CSRs.
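
The following sketch shows how a `pmpaddr*` value for a NAPOT region can be computed according to the encoding defined in the RISC-V privileged specification: the address is stored shifted right by two bits, and the number of trailing one-bits encodes the region size. The helper name `pmp_napot_addr` is hypothetical, and the check against a minimum of 8 bytes reflects the default granularity only, not a larger `PMP_MIN_GRANULARITY` setting.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Compute the pmpaddr value for a NAPOT region.
 * Returns false if size is not a power of two >= 8 bytes
 * or if base is not aligned to size. */
static bool pmp_napot_addr(uint32_t base, uint32_t size, uint32_t *pmpaddr) {
    bool power_of_two = (size != 0u) && ((size & (size - 1u)) == 0u);
    if (!power_of_two || (size < 8u) || ((base & (size - 1u)) != 0u)) {
        return false;
    }
    /* address bits [33:2] plus (log2(size) - 3) trailing ones */
    *pmpaddr = (base >> 2) | ((size >> 3) - 1u);
    return true;
}
----

For a 4 KiB region at `0x80000000` this yields `0x200001FF`, i.e. nine trailing one-bits for a region size of 2^12^ bytes.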
734
 
735
When implementing more PMP regions than a _certain critical limit_ *an additional register stage
736
is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also
737
increase the latency of instruction fetches and data access by +1 cycle.
738
 
739
The critical limit can be adapted for custom use by a constant from the main VHDL package file
740
(`rtl/core/neorv32_package.vhd`). The default value is 8:
741
 
742
[source,vhdl]
743
----
744
-- "critical" number of PMP regions --
745
constant pmp_num_regions_critical_c : natural := 8;
746
----
747
 
748
**Operation**
749
 
750 65 zero_gravi
Any CPU memory access address (from the instruction fetch or data access interface) is tested if it is accessing _any_
751
of the specified PMP regions (configured via `pmpaddr*` and enabled via `pmpcfg*`). If an
752
address matches one of these regions, the configured access rights (attributes in `pmpcfg*`) are enforced:
753 60 zero_gravi
 
754
* a write access (store) will fail if no write attribute is set
755
* a read access (load) will fail if no read attribute is set
756
* an instruction fetch access will fail if no execute attribute is set
757
 
758 65 zero_gravi
If an access to a protected region does not have the according access rights it will raise the according
759
instruction/load/store _access fault_ exception.
760 60 zero_gravi
 
761
By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical
762 65 zero_gravi
memory protection also for machine-level programs you need to set the _locked bit_ in the according
763
`pmpcfg*` configuration CSR.
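
The rules above can be summarized as a small decision function for an access that already matches a PMP region. This is a behavioral sketch only, not the hardware implementation; the bit positions (R = 0, W = 1, X = 2, L = 7) follow the RISC-V `pmpcfg` entry layout, and the names `pmp_access_allowed` and `access_t` are hypothetical.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

#define PMPCFG_R (1u << 0) /* read permission */
#define PMPCFG_W (1u << 1) /* write permission */
#define PMPCFG_X (1u << 2) /* execute permission */
#define PMPCFG_L (1u << 7) /* locked: also enforce in machine-mode */

typedef enum { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH } access_t;

/* Decide whether an access that hits a region with configuration
 * `cfg` is permitted; a denied access raises an access fault. */
static bool pmp_access_allowed(uint8_t cfg, access_t type, bool machine_mode) {
    if (machine_mode && ((cfg & PMPCFG_L) == 0u)) {
        return true; /* unlocked regions do not constrain machine-mode */
    }
    switch (type) {
        case ACCESS_LOAD:  return (cfg & PMPCFG_R) != 0u;
        case ACCESS_STORE: return (cfg & PMPCFG_W) != 0u;
        case ACCESS_FETCH: return (cfg & PMPCFG_X) != 0u;
    }
    return false;
}
----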
764 60 zero_gravi
 
765
[IMPORTANT]
766
After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for
767
internal (iterative) computations before the configuration becomes valid.
768
 
769
[NOTE]
770
For more information regarding RISC-V physical memory protection see the official _The RISC-V
771 65 zero_gravi
Instruction Set Manual - Volume II: Privileged Architecture_ specifications.
772 60 zero_gravi
 
773
 
774
 
775
<<<
776
// ####################################################################################################################
777
:sectnums:
778
=== Instruction Timing
779
 
780
The instruction timing listed in the table below shows the required clock cycles for executing a certain
781
instruction. These instruction cycles assume a bus access without additional wait states and a filled
782
pipeline.
783
 
784
Average CPI (cycles per instruction) values for "real applications" like for executing the CoreMark benchmark for different CPU
785
configurations are presented in <<_cpu_performance>>.
786
 
787
.Clock cycles per instruction
788
[cols="<2,^1,^4,<3"]
789
[options="header", grid="rows"]
790
|=======================
791
| Class | ISA | Instruction(s) | Execution cycles
792
| ALU           | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
793
| ALU           | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
794
| ALU           | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
795 61 zero_gravi
| ALU           | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
796 60 zero_gravi
| Branches      | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
797
| Branches      | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
798
| Jumps / Calls | `I/E` | `jal` `jalr`                  | 4 + ML
799
| Jumps / Calls | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
800
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
801
| Memory access | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
802
| Memory access | `A`   | `lr.w` `sc.w`                             | 4 + ML
803 69 zero_gravi
| Multiplication | `M`  | `mul` `mulh` `mulhsu` `mulhu` | 2+32+2; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 4
804
| Division       | `M`  | `div` `divu` `rem` `remu`     | 2+32+2
805 60 zero_gravi
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4
806
| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4
807
| System | `I/E` | `fence` | 3
808
| System | `C`+`Zicsr` | `c.ebreak` | 4
809
| System | `Zicsr` | `mret` `wfi` | 5
810 66 zero_gravi
| System | `Zifencei` | `fence.i` | 3 + ML
811 60 zero_gravi
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
812
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
813
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
814
| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
815
| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
816
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
817
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
818 66 zero_gravi
| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
819
| Bit-manipulation - arithmetic/logic | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
820
| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
821
| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32
822
| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
823 71 zero_gravi
| Bit-manipulation - single-bit  | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
824 66 zero_gravi
| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
825 71 zero_gravi
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
826 72 zero_gravi
| CFU: custom instructions | `Zxcfu` | - | min. 4
827 60 zero_gravi
|=======================
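
The formulas from the table can be evaluated with a few helper functions, for example to estimate the cost of a code sequence. This is a minimal sketch with hypothetical function names; it assumes that `SA/4` in the shift formula is an integer division and covers only the default (iterative) shifter and multiplier/divider, not the `FAST_SHIFT`, `TINY_SHIFT` or `FAST_MUL` options.

[source,c]
----
#include <stdbool.h>

/* Default shifter: 3 + SA/4 + SA%4 cycles for a shift amount of 0..31. */
static unsigned shift_cycles(unsigned shift_amount) {
    return 3u + (shift_amount / 4u) + (shift_amount % 4u);
}

/* Branches: 5 + memory latency if taken, 3 if not taken. */
static unsigned branch_cycles(bool taken, unsigned memory_latency) {
    return taken ? (5u + memory_latency) : 3u;
}

/* Iterative multiplication and division: 2 + 32 + 2 cycles. */
static unsigned muldiv_cycles(void) {
    return 2u + 32u + 2u;
}
----

A shift by 31 bits, for instance, evaluates to 3 + 7 + 3 = 13 cycles, whereas a shift by 4 bits needs only 4 cycles.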
828
 
829
[NOTE]
830 65 zero_gravi
The presented values of the *floating-point execution cycles* are average values - obtained from
831 60 zero_gravi
4096 instruction executions using pseudo-random input values. The execution time for emulating the
832
instructions (using pure-software libraries) is ~17..140 times higher.
833
 
834
 
835 66 zero_gravi
<<<
836 60 zero_gravi
// ####################################################################################################################
837
include::cpu_csr.adoc[]
838
 
839
 
840
<<<
841
// ####################################################################################################################
842
:sectnums:
843
==== Traps, Exceptions and Interrupts
844
 
845 61 zero_gravi
In this document the following nomenclature regarding traps is used:
846 60 zero_gravi
 
847 64 zero_gravi
* _interrupts_ = asynchronous exceptions
848 60 zero_gravi
* _exceptions_ = synchronous exceptions
849
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)
850
 
851 72 zero_gravi
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in <<_mtvec>>
852
CSR. The cause of the according interrupt or exception can be determined via the content of <<_mcause>>
853
CSR. The address that reflects the current program counter when a trap was taken is stored to <<_mepc>> CSR.
854
Additional information regarding the cause of the trap can be retrieved from <<_mtval>> CSR and the processor's
855 70 zero_gravi
<<_internal_bus_monitor_buskeeper>> (for memory access exceptions)
856 60 zero_gravi
 
857 70 zero_gravi
The traps are prioritized. If several _synchronous exceptions_ occur at once only the one with highest priority is triggered
858
while all remaining exceptions are ignored. If several _asynchronous exceptions_ (interrupts) trigger at once, the one with highest priority
859 64 zero_gravi
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
860 70 zero_gravi
the second highest priority will get serviced and so on until no further interrupts are pending.
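
The interrupt prioritization can be sketched as a function that picks the next interrupt to be serviced from a set of pending and enabled requests. The order used here is taken from the trap listing of this document (FIRQ 0..15 first, then machine external, software and timer interrupt). The function name is hypothetical, and the sketch assumes that bit _n_ of the input mask corresponds to interrupt code _n_ of `mcause`.

[source,c]
----
#include <stdint.h>

/* Return the mcause value of the highest-priority interrupt in
 * `pending` (bit n = interrupt code n), or 0 if nothing is pending. */
static uint32_t next_interrupt_mcause(uint32_t pending) {
    /* fast interrupt requests: channel 0 has the highest priority */
    for (uint32_t channel = 0u; channel < 16u; channel++) {
        uint32_t code = 16u + channel;
        if ((pending & (UINT32_C(1) << code)) != 0u) {
            return UINT32_C(0x80000000) | code;
        }
    }
    if ((pending & (UINT32_C(1) << 11)) != 0u) {
        return UINT32_C(0x8000000B); /* machine external interrupt */
    }
    if ((pending & (UINT32_C(1) << 3)) != 0u) {
        return UINT32_C(0x80000003); /* machine software interrupt */
    }
    if ((pending & (UINT32_C(1) << 7)) != 0u) {
        return UINT32_C(0x80000007); /* machine timer interrupt */
    }
    return 0u;
}
----

Calling the function repeatedly while clearing the serviced bit reproduces the "highest priority first, remaining ones stay pending" behavior described above.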
861 60 zero_gravi
 
862 69 zero_gravi
.Interrupt Signal Requirements - Standard RISC-V Interrupts
863 61 zero_gravi
[IMPORTANT]
864 69 zero_gravi
All standard RISC-V interrupt request signals are **high-active**. A request has to stay at high-level (=asserted)
865 65 zero_gravi
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).
866 60 zero_gravi
 
867 69 zero_gravi
.Interrupt Signal Requirements - Fast Interrupt Requests
868
[IMPORTANT]
869 70 zero_gravi
The NEORV32-specific FIRQ request lines are triggered by a one-shot high-level (i.e. rising edge). Each request is buffered in the CPU control
870 72 zero_gravi
unit until the channel is either disabled (by clearing the according <<_mie>> CSR bit) or the request is explicitly cleared (by setting
871
the according <<_mip>> CSR bit).
872 69 zero_gravi
 
873 61 zero_gravi
.Instruction Atomicity
874
[NOTE]
875 70 zero_gravi
All instructions execute as atomic operations - interrupts can only trigger _between_ two instructions.
876
So even if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
877
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.
878 60 zero_gravi
 
879
 
880 61 zero_gravi
:sectnums:
881 70 zero_gravi
==== Memory Access Exceptions
882 60 zero_gravi
 
883 61 zero_gravi
If a load operation causes any exception, the instruction's destination register is
884
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
885 70 zero_gravi
trigger a bus/memory read-operation at all. Vice versa, exceptions caused by a store address misalignment or a store physical
886
memory protection fault do not trigger a bus/memory write-operation at all.
887 60 zero_gravi
 
888
 
889 61 zero_gravi
:sectnums:
890
==== Custom Fast Interrupt Request Lines
891 60 zero_gravi
 
892 61 zero_gravi
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
893 72 zero_gravi
entity signals. These interrupts have custom configuration and status flags in the <<_mie>> and <<_mip>> CSRs and also
894
provide custom trap codes in <<_mcause>>. These FIRQs are reserved for NEORV32 processor-internal usage only.
895 60 zero_gravi
 
896
 
897
 
898
<<<
899
// ####################################################################################################################
900 69 zero_gravi
:sectnums:
901
==== NEORV32 Trap Listing
902 60 zero_gravi
 
903 69 zero_gravi
The following table shows all traps that are currently supported by the NEORV32 CPU. It also shows the prioritization
904
and the CSR side-effects. A more detailed description of the actual trap triggering events is provided in a further table.
905
 
906
[NOTE]
907 72 zero_gravi
_Asynchronous exceptions_ (= interrupts) set the MSB of <<_mcause>> while _synchronous exceptions_ (= "software exceptions")
908 69 zero_gravi
clear the MSB.
909
 
910
**Table Annotations**
911
 
912
The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
913 72 zero_gravi
cause ID of the according trap that is written to <<_mcause>> CSR. The "[RISC-V]" columns show the interrupt/exception code value from the
914 70 zero_gravi
official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (the runtime environment _RTE_) and can
915 69 zero_gravi
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to
916 72 zero_gravi
<<_mepc>> and <<_mtval>> CSRs when a trap is triggered:
917 69 zero_gravi
 
918
* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet)
919
* _B-ADR_ - bad memory access address that caused the trap
920
* _PC_ - address of instruction that caused the trap
921
* _0_ - zero
922
* _Inst_ - the faulting instruction itself
923
 
924
.NEORV32 Trap Listing
925 60 zero_gravi
[cols="3,6,5,14,11,4,4"]
926
[options="header",grid="rows"]
927
|=======================
928 64 zero_gravi
| Prio. | `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval`
929
| 1  | `0x00000000` | 0.0  | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_
930
| 2  | `0x00000001` | 0.1  | _TRAP_CODE_I_ACCESS_     | instruction access fault | _B-ADR_ | _PC_
931
| 3  | `0x00000002` | 0.2  | _TRAP_CODE_I_ILLEGAL_    | illegal instruction | _PC_ | _Inst_
932
| 4  | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall` in machine-mode) | _PC_ | _PC_
933
| 5  | `0x00000008` | 0.8  | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall` in user-mode) | _PC_ | _PC_
934 69 zero_gravi
| 6  | `0x00000003` | 0.3  | _TRAP_CODE_BREAKPOINT_   | breakpoint (`ebreak`) | _PC_ | _PC_
935 64 zero_gravi
| 7  | `0x00000006` | 0.6  | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_
936
| 8  | `0x00000004` | 0.4  | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_
937
| 9  | `0x00000007` | 0.7  | _TRAP_CODE_S_ACCESS_     | store access fault | _B-ADR_ | _B-ADR_
938
| 10 | `0x00000005` | 0.5  | _TRAP_CODE_L_ACCESS_     | load access fault | _B-ADR_ | _B-ADR_
939
| 11 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0 | _I-PC_ | _0_
940
| 12 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1 | _I-PC_ | _0_
941
| 13 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2 | _I-PC_ | _0_
942
| 14 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3 | _I-PC_ | _0_
943
| 15 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4 | _I-PC_ | _0_
944
| 16 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5 | _I-PC_ | _0_
945
| 17 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6 | _I-PC_ | _0_
946
| 18 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7 | _I-PC_ | _0_
947
| 19 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8 | _I-PC_ | _0_
948
| 20 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9 | _I-PC_ | _0_
949
| 21 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | _I-PC_ | _0_
950
| 22 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | _I-PC_ | _0_
951
| 23 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | _I-PC_ | _0_
952
| 24 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | _I-PC_ | _0_
953
| 25 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | _I-PC_ | _0_
954
| 26 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | _I-PC_ | _0_
955
| 27 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_          | machine external interrupt | _I-PC_ | _0_
956
| 28 | `0x80000003` | 1.3  | _TRAP_CODE_MSI_          | machine software interrupt | _I-PC_ | _0_
957
| 29 | `0x80000007` | 1.7  | _TRAP_CODE_MTI_          | machine timer interrupt | _I-PC_ | _0_
958 60 zero_gravi
|=======================
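
A trap handler usually starts by decoding `mcause`. The sketch below splits a raw `mcause` value into the interrupt flag (MSB), the cause code and, where applicable, the FIRQ channel, following the values in the table above. The type and function names are hypothetical and are not the identifiers provided by the NEORV32 runtime environment.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool is_interrupt; /* true: asynchronous, false: synchronous */
    uint32_t code;     /* cause code without the MSB */
    int firq_channel;  /* 0..15 for fast interrupts, -1 otherwise */
} trap_info_t;

/* Decode a raw mcause value. */
static trap_info_t decode_mcause(uint32_t mcause) {
    trap_info_t info;
    info.is_interrupt = (mcause >> 31) != 0u;
    info.code = mcause & UINT32_C(0x7FFFFFFF);
    info.firq_channel = -1;
    if (info.is_interrupt && (info.code >= 16u) && (info.code <= 31u)) {
        info.firq_channel = (int)(info.code - 16u);
    }
    return info;
}
----

For example, `0x80000010` decodes to an interrupt with code 16 on FIRQ channel 0, and `0x00000002` decodes to the synchronous illegal instruction exception.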
959
 
960
 
961 69 zero_gravi
The following table provides a summarized description of the actual events for triggering a specific trap.
962 60 zero_gravi
 
963 69 zero_gravi
.NEORV32 Trap Description
964
[cols="<3,<7"]
965
[options="header",grid="rows"]
966
|=======================
967 70 zero_gravi
| Trap ID [C] | Triggered when ...
968 69 zero_gravi
| _TRAP_CODE_I_MISALIGNED_ | fetching a 32-bit instruction word that is not 32-bit-aligned (_see note below!_)
969
| _TRAP_CODE_I_ACCESS_     | bus timeout or bus error during instruction word fetch
970
| _TRAP_CODE_I_ILLEGAL_    | trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation
971
| _TRAP_CODE_MENV_CALL_    | executing `ecall` instruction in machine-mode
972
| _TRAP_CODE_UENV_CALL_    | executing `ecall` instruction in user-mode
973
| _TRAP_CODE_BREAKPOINT_   | executing `ebreak` instruction (or triggered by on-chip debugger)
974
| _TRAP_CODE_S_MISALIGNED_ | storing data to an address that is not naturally aligned to the data size (byte, half, word) being stored
975
| _TRAP_CODE_L_MISALIGNED_ | loading data from an address that is not naturally aligned to the data size  (byte, half, word) being loaded
976
| _TRAP_CODE_S_ACCESS_     | bus timeout or bus error during store data operation
977
| _TRAP_CODE_L_ACCESS_     | bus timeout or bus error during load data operation
978
| _TRAP_CODE_FIRQ_0_ ... _TRAP_CODE_FIRQ_15_| caused by interrupt-condition of processor-internal modules, see <<_neorv32_specific_fast_interrupt_requests>>
979
| _TRAP_CODE_MEI_          | user-defined processor-external source (via dedicated top-entity signal)
980
| _TRAP_CODE_MSI_          | user-defined processor-external source (via dedicated top-entity signal)
981
| _TRAP_CODE_MTI_          | processor-internal machine timer overflow OR user-defined processor-external source (via dedicated top-entity signal)
982
|=======================
983 60 zero_gravi
 
984 72 zero_gravi
.Misaligned Instruction Address Exception
985 69 zero_gravi
[NOTE]
986
For 32-bit-only instructions (= no `C` extension) the misaligned instruction exception
987
is raised if bit 1 of the fetch address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
988
there will never be a misaligned instruction exception _at all_.
989 72 zero_gravi
In both cases bit 0 of the program counter (and all related CSRs) is hardwired to zero.
990 60 zero_gravi
 
991
 
992
<<<
993
// ####################################################################################################################
994
:sectnums:
995
==== Bus Interface
996
 
997 72 zero_gravi
The NEORV32 CPU implements a 32-bit machine with separated instruction and data interfaces making the CPU a
998
**Harvard Architecture**: the _instruction fetch interface_ (`i_bus_*`) is used for fetching instructions and the
999
_data access interface_ (`d_bus_*`) is used to access data via load and store operations.
1000
Each of these interfaces can access an address space of up to 2^32^ bytes (4GB).
1001
The following table shows the signals of the data and instruction interfaces as seen from the CPU (`*_o` signals are driven
1002
by the CPU / outputs, `*_i` signals are read by the CPU / inputs). Both interfaces use the same protocol.
1003 60 zero_gravi
 
1004 72 zero_gravi
.CPU bus interfaces
1005
[cols="<2,^1,^1,<6"]
1006 60 zero_gravi
[options="header",grid="rows"]
1007
|=======================
1008 72 zero_gravi
| Signal             | Width | Direction | Description
1009
| `i/d_bus_addr_o`   | 32    | out       | access address
1010
| `i/d_bus_rdata_i`  | 32    | in        | data input for read operations
1011
| `i/d_bus_wdata_o`  | 32    | out       | data output for write operations
1012
| `i/d_bus_ben_o`    | 4     | out       | byte enable signal for write operations
1013
| `i/d_bus_we_o`     | 1     | out       | bus write access (always zero for instruction fetches)
1014
| `i/d_bus_re_o`     | 1     | out       | bus read access
1015
| `i/d_bus_lock_o`   | 1     | out       | exclusive access request
1016
| `i/d_bus_ack_i`    | 1     | in        | accessed peripheral indicates a successful completion of the bus transaction
1017
| `i/d_bus_err_i`    | 1     | in        | accessed peripheral indicates an error during the bus transaction
1018
| `i/d_bus_fence_o`  | 1     | out       | this signal is set for one cycle when the CPU executes an instruction/data fence operation
1019
| `i/d_bus_priv_o`   | 2     | out       | current CPU privilege level
1020 60 zero_gravi
|=======================
1021
 
1022 72 zero_gravi
.Pipelined Transfers
1023 60 zero_gravi
[NOTE]
1024
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
1025 72 zero_gravi
So only a single transfer request can be "on the fly" (pending) at once. However, this is no real drawback. The
1026
minimal possible latency for a single access is two cycles, which equals the CPU's minimal execution latency
1027
for a single instruction.
1028 60 zero_gravi
 
.Unaligned Memory Accesses
[NOTE]
The NEORV32 CPU does not support the handling of unaligned memory accesses _in hardware_. Any
unaligned memory access will raise an exception that can be used to handle such accesses in _software_.
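
The following Python sketch illustrates how such a software handler might emulate a misaligned word load using only byte accesses, which are always aligned. It is a behavioral model, not NEORV32 code: `MisalignedAccess`, `load_word` and `load_word_emulated` are hypothetical names, the memory is a plain `bytearray`, and little-endian byte order is assumed.

[source,python]
----
class MisalignedAccess(Exception):
    """Stands in for the CPU's misaligned-access exception (hypothetical name)."""


def load_word(mem: bytearray, addr: int) -> int:
    """Model of a hardware word load: only word-aligned addresses are allowed."""
    if addr % 4 != 0:
        raise MisalignedAccess(hex(addr))
    return int.from_bytes(mem[addr:addr + 4], "little")


def load_word_emulated(mem: bytearray, addr: int) -> int:
    """Try the hardware load first; on a misaligned access fall back to
    four byte loads and assemble the word in software."""
    try:
        return load_word(mem, addr)
    except MisalignedAccess:
        value = 0
        for i in range(4):
            value |= mem[addr + i] << (8 * i)  # byte i goes to bits [8i+7:8i]
        return value
----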
 
:sectnums:
===== Protocol
 
An actual bus request is triggered either by the `*_bus_re_o` signal (for reading data) or by the `*_bus_we_o` signal
(for writing data). In case of a request, one of these signals is high for exactly one cycle. The transaction is
completed when the accessed peripheral/memory sets either the `*_bus_ack_i` signal (successful completion) or the
`*_bus_err_i` signal (failed completion). These bus response signals are also active for exactly one cycle.
An error indicated by the `*_bus_err_i` signal will raise the corresponding "instruction bus access fault" or
"load/store bus access fault" exception.
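
The handshake described above can be sketched as a cycle-by-cycle trace. This is a minimal Python model for illustration only: `Cycle`, `simulate_access` and the `latency`/`fail` parameters are assumptions of this sketch and do not correspond to any NEORV32 source file.

[source,python]
----
from dataclasses import dataclass
from typing import List


@dataclass
class Cycle:
    re: int   # *_bus_re_o
    we: int   # *_bus_we_o
    ack: int  # *_bus_ack_i
    err: int  # *_bus_err_i


def simulate_access(write: bool, latency: int, fail: bool = False) -> List[Cycle]:
    """Return the per-cycle trace of a single bus access.

    latency: cycles between request and response (0 = response in the request cycle).
    fail:    True if the peripheral answers with an error instead of an acknowledge.
    """
    trace = []
    for cycle in range(latency + 1):
        request = cycle == 0          # request strobe is high in the first cycle only
        response = cycle == latency   # response strobe is high in the last cycle only
        trace.append(Cycle(
            re=int(request and not write),
            we=int(request and write),
            ack=int(response and not fail),
            err=int(response and fail),
        ))
    return trace
----

For example, `simulate_access(write=False, latency=1)` yields two cycles: the first with `re` set, the second with `ack` set.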
 
**Minimal Response Latency**

The transfer can be completed directly in the same cycle as it was initiated (via the `*_bus_re_o` or `*_bus_we_o`
signal) if the peripheral sets `*_bus_ack_i` or `*_bus_err_i` high in that same cycle. However, in order to shorten the
critical path such "asynchronous" completion should be avoided. The default NEORV32 processor-internal modules provide
exactly **one cycle delay** between initiation and completion of transfers.
 
**Maximal Response Latency**

Processor-internal peripherals or memories do not have to respond within one cycle after a bus request has been initiated.
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window
is defined by the global `max_proc_int_response_time_c` constant (default = 15 cycles; processor's VHDL package file `rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ (neither `*_bus_ack_i` nor `*_bus_err_i` set) processor-internal bus
transfer times out and raises a **bus fault exception**. The <<_internal_bus_monitor_buskeeper>> keeps track of all _internal_ bus
transactions to enforce this time window.

If any bus operation times out (for example when accessing "address space holes") the BUSKEEPER will issue a bus
error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the BUSKEEPER does not track external accesses via the external memory bus interface**. However,
the external memory bus interface also provides an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
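
The response time window can be pictured as a simple watchdog counter. The Python sketch below is a behavioral illustration under stated assumptions: `MAX_RESPONSE_TIME` mirrors the default of `max_proc_int_response_time_c`, `monitor_access` is a hypothetical helper, and the exact cycle in which the real BUSKEEPER fires (here: a response in cycle 15 is still accepted) is an assumption of this model.

[source,python]
----
from typing import Optional, Tuple

MAX_RESPONSE_TIME = 15  # mirrors the default of max_proc_int_response_time_c


def monitor_access(response_after: Optional[int]) -> Tuple[str, int]:
    """Watch one pending processor-internal access.

    response_after: cycles after the request at which the peripheral responds,
                    or None if nothing ever responds (e.g. an address space hole).
    Returns (result, cycles) with result being "ack" or "timeout_err".
    """
    for elapsed in range(1, MAX_RESPONSE_TIME + 1):
        if response_after is not None and response_after == elapsed:
            return ("ack", elapsed)
    # no response inside the window: the monitor signals a bus error itself
    return ("timeout_err", MAX_RESPONSE_TIME)
----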
 
**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================
 
**Write Access**

For a write access, the access address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.
 
**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data in the same cycle in which
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).
 
**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
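
A short Python sketch of these boundary rules follows. The helper names are hypothetical, and the mapping of byte offsets to byte-enable bits (bit _i_ selects byte lane _i_, little-endian) is an assumption of this model rather than something stated in this section.

[source,python]
----
def data_access(addr: int, size: int):
    """Return (word_address, byte_enable) for a data access of `size` bytes.

    word_address is the 32-bit word containing the access; byte_enable is a
    4-bit mask in which bit i selects byte lane i (assumed little-endian).
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    if addr % size != 0:
        raise ValueError("misaligned access")  # the real CPU raises an exception here
    offset = addr & 0x3
    mask = ((1 << size) - 1) << offset
    return addr & ~0x3, mask


def fetch_address(pc: int) -> int:
    """Instruction fetches always use the word-aligned address,
    even for a compressed instruction located in the upper half-word."""
    return pc & ~0x3
----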
 
**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still intact, the instruction will write back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write back a non-zero value and will not generate an actual memory store operation.
 
The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any other memory-access operation than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
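
The lock rules above can be summarized in a small state model. This Python sketch is illustrative only: the `ExclusiveLock` class and its method names are hypothetical, the non-zero failure code is modeled as `1`, and clearing the lock after `sc.w` follows from reading `sc.w` as a memory access other than `lr.w`.

[source,python]
----
class ExclusiveLock:
    """Behavioral model of the CPU-internal exclusive access lock."""

    def __init__(self):
        self.locked = False

    def lr_w(self):
        self.locked = True      # lr.w sets the lock (which drives d_bus_lock_o)

    def other_memory_access(self):
        self.locked = False     # any memory access other than lr.w breaks the lock

    def trap(self):
        self.locked = False     # synchronous or asynchronous trap breaks the lock

    def bus_error(self):
        self.locked = False     # bus_err_i from the memory system breaks the lock

    def sc_w(self) -> int:
        """Return the value written back: 0 = store performed, 1 = store suppressed."""
        ok = self.locked
        self.locked = False     # sc.w itself is a memory access other than lr.w
        return 0 if ok else 1
----

A successful sequence is `lr_w()` followed directly by `sc_w()`, which returns `0`; a `trap()` between the two makes `sc_w()` return `1`.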
 
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
 
**Memory Barriers**

Whenever the CPU executes a _fence_ instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (for example a cache flush and refill).
 
<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset
 
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.
 
**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a write-back
of the processing result to some kind of memory. The initial state of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
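
The pipeline example can be modeled in a few lines of Python. This is a hedged behavioral sketch, not a description of any NEORV32 module: `Stage` and `run_pipeline` are hypothetical names, data registers start with random contents (no reset), and only the status bits start at a defined value.

[source,python]
----
import random


class Stage:
    def __init__(self):
        self.data = random.getrandbits(8)  # no reset: arbitrary power-up value
        self.valid = False                 # dedicated reset: defined power-up value


def run_pipeline(inputs, depth=3):
    """Shift inputs through `depth` stages and return the written-back results."""
    stages = [Stage() for _ in range(depth)]
    written = []
    # feed the inputs, then `depth` idle cycles to drain the pipeline
    for item in list(inputs) + [None] * depth:
        last = stages[-1]
        if last.valid:                     # write-back is gated by the status bit only
            written.append(last.data)
        for i in range(depth - 1, 0, -1):  # advance the pipeline, last stage first
            stages[i].data, stages[i].valid = stages[i - 1].data, stages[i - 1].valid
        stages[0].valid = item is not None
        if item is not None:
            stages[0].data = item
    return written
----

Regardless of the random initial data, only values that entered the pipeline with their status bit set are ever written back.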
 
**NEORV32 CPU Reset**

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even control
and status registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers will get initialized by the CPU's internal state machines, which are initialized from the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).

During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR <<_mie>>
does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire
because the global interrupt enable flag in the status register (`mstatus(mie)`) _does_ provide a dedicated
hardware reset setting this bit to low (globally disabling interrupts).
 
**Reset Configuration**

Most CPU-internal registers do provide an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of all uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all CPU registers can
be enabled by setting a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`) to `true`:
 
[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers --
-- FALSE=reset value is irrelevant (might simplify HW), default; TRUE=defined LOW reset value
constant dedicated_reset_c : boolean := false;
----
 
<<<
// ####################################################################################################################

include::cpu_cfu.adoc[]