:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::neorv32_cpu_block.png[width=600,align=center]

**Section Structure**

* <<_architecture>>, <<_full_virtualization>> and <<_risc_v_compatibility>>
* <<_cpu_top_entity_signals>> and <<_cpu_top_entity_generics>>
* <<_instruction_sets_and_extensions>>, <<_custom_functions_unit_cfu>> and <<_instruction_timing>>
* <<_control_and_status_registers_csrs>>
* <<_traps_exceptions_and_interrupts>>
* <<_bus_interface>>


**Key Features**

* 32-bit little-endian, multi-cycle, in-order `rv32` RISC-V CPU
* Compatible with the RISC-V **Privileged Architecture - Machine ISA Version 1.12** specifications
* Available <<_instruction_sets_and_extensions>>:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zicntr` - CPU base counters
** `Zihpm` - hardware performance monitors
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `Zxcfu` - custom instructions extension
** `PMP` - physical memory protection
** `Debug` - <<_cpu_debug_mode>> (part of the on-chip debugger) including hardware <<_trigger_module>>
* <<_risc_v_compatibility>>: Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Supports _all_ of the machine-level <<_traps_exceptions_and_interrupts>> from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
** This is a key aspect of the _execution safety_ provided by <<_full_virtualization>>
** Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 custom _fast_ interrupts
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate <<_bus_interface>>s for instruction fetch and data access

[NOTE]
It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also allows you to keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.

[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

<<<
// ####################################################################################################################
:sectnums:
=== Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]

The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
front-end can already fetch new instructions while the back-end is still processing previously-fetched instructions.

The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
data is stored in a FIFO queue - the instruction prefetch buffer.

The back-end is responsible for the actual execution of the instruction. It includes an "issue engine",
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
instructions or decompressed 16-bit instructions) for execution.
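
The RISC-V encoding allows the instruction length to be determined from the first half-word alone: if its two lowest bits are `11` the instruction is 32 bits wide, otherwise it is a 16-bit compressed instruction. The following C sketch uses this rule to split a sequence of 32-bit fetch words (as held in the prefetch buffer) into individual instructions. It is a host-side model with hypothetical function names, not the VHDL issue engine: compressed instructions are only zero-extended here instead of being decompressed, and longer RISC-V encodings are not considered.

```c
#include <stddef.h>
#include <stdint.h>

/* RISC-V length rule: lowest two bits == 0b11 -> 32-bit instruction,
 * anything else -> 16-bit compressed instruction. */
static int is_compressed(uint16_t first_half) {
    return (first_half & 0x3u) != 0x3u;
}

/* Read half-word number 'h' from little-endian 32-bit fetch words. */
static uint16_t half_at(const uint32_t *words, size_t h) {
    return (uint16_t)(words[h / 2] >> (16 * (h % 2)));
}

/* Split 'nwords' fetch words into instructions and store them in 'out'.
 * 16-bit instructions are stored zero-extended (a real issue engine would
 * decompress them). An incomplete 32-bit instruction at the end is left
 * for the next fetch. Returns the number of instructions written. */
size_t split_fetch_words(const uint32_t *words, size_t nwords,
                         uint32_t *out, size_t max_out) {
    size_t nhalf = 2 * nwords;
    size_t h = 0;
    size_t n = 0;
    while (h < nhalf && n < max_out) {
        uint16_t lo = half_at(words, h);
        if (is_compressed(lo)) {
            out[n++] = lo;
            h += 1;
        } else {
            if (h + 1 >= nhalf)
                break;                      /* upper half not fetched yet */
            out[n++] = ((uint32_t)half_at(words, h + 1) << 16) | lo;
            h += 2;
        }
    }
    return n;
}
```

For example, the fetch words `0x00130001` and `0x00010000` contain a `c.nop`, a 32-bit `addi x0, x0, 0` (`0x00000013`) that straddles the word boundary, and another `c.nop`; the function returns 3 for them.
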

Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
when the CPU front-end has to reload the prefetch buffer due to a taken branch.

Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
these two classical design paradigms allows an increased instruction throughput in contrast to a pure multi-cycle
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
multi-cycle concept).

Although the CPU provides independent interfaces for instruction fetch and data access, the processor is a
von Neumann machine: these two bus interfaces are merged into a single processor-internal bus via a prioritizing
bus switch (data accesses have higher priority). Hence, ALL memory locations including peripheral devices are
mapped to a single unified 32-bit address space.
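
The arbitration rule of the bus switch can be illustrated with a small cycle-based C model. As an assumption that is not stated in this section, the model lets a granted transfer keep the bus until it is acknowledged, so the priority only decides between requests that meet an idle bus. Type and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the 2:1 prioritizing bus switch. */
typedef enum { GRANT_NONE, GRANT_INSTR, GRANT_DATA } bus_grant_t;

typedef struct {
    bus_grant_t owner;   /* port that currently owns the bus */
} bus_switch_t;

/* Evaluate one clock cycle. 'ack' signals that the current transfer
 * completes in this cycle. Returns the port owning the bus in this cycle. */
bus_grant_t bus_switch_step(bus_switch_t *sw, bool i_req, bool d_req, bool ack) {
    if (sw->owner == GRANT_NONE) {
        if (d_req)
            sw->owner = GRANT_DATA;    /* data access wins if both request */
        else if (i_req)
            sw->owner = GRANT_INSTR;
    }
    bus_grant_t current = sw->owner;
    if (ack)
        sw->owner = GRANT_NONE;        /* transfer done, bus becomes idle */
    return current;
}
```

Starting from an idle bus, simultaneous requests from both ports grant the data port first; the pending instruction fetch is served in the first cycle after the data transfer has been acknowledged.
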

// ####################################################################################################################
:sectnums:
=== Full Virtualization

Just like the RISC-V ISA, the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU and SoC level to
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
malformed instruction or accessing a non-allocated memory address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
might have to be reverted). This allows a defined and predictable execution behavior at any time, improving overall execution safety.

**Execution Safety - NEORV32 Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
accessed address does not respond or encounters an internal device error during access.
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
window. Otherwise a bus access exception is raised.
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions raise an
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
memory operations).
* To be continued...
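
The fixed response window can be sketched as follows. The model evaluates a single transfer against a timeout limit; the limit used in the example and all names are hypothetical, since the actual window length of the NEORV32 is not specified in this section. Both a missing response and a device error end up as a bus access exception.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* Hypothetical description of the accessed device. */
typedef struct {
    uint32_t latency;   /* cycles until the device answers (>= 1) */
    bool     responds;  /* false: nothing is mapped at this address */
    bool     error;     /* true: device terminates the access with an error */
} bus_device_t;

/* Result of one transfer, given a response window of 'timeout_cycles'. */
bus_result_t bus_access(const bus_device_t *dev, uint32_t timeout_cycles) {
    if (dev->responds && dev->latency <= timeout_cycles)
        return dev->error ? BUS_ERROR : BUS_OK;
    return BUS_TIMEOUT;   /* no response within the window */
}

/* Any result other than BUS_OK raises a bus access exception. */
bool raises_access_exception(bus_result_t r) {
    return r != BUS_OK;
}
```

With a hypothetical window of 255 cycles, a device answering after 2 cycles completes normally, whereas an unmapped address or a device answering only after 300 cycles yields `BUS_TIMEOUT`.
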


// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility

The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and
rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the
NEORV32 processor are located in the repository's `sw/isa-test` folder.

[NOTE]
See section https://stnolting.github.io/neorv32/ug/#_risc_v_architecture_test_framework[User Guide: RISC-V Architecture Test Framework]
for information on how to run the tests on the NEORV32.

.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................

.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
Check fence-01          ... OK
--------------------------------
OK: 39/39 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................

.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................

.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................

.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................


<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations

This list shows the currently identified issues regarding full RISC-V compatibility.

.Read-Only "Read-Write" CSRs
[IMPORTANT]
The <<_misa>> and <<_mtval>> CSRs in the NEORV32 are _read-only_.
To maintain RISC-V compatibility, any machine-mode write access to them is ignored and does _not_ cause any
exceptions or side-effects.

.Physical Memory Protection
[IMPORTANT]
The RISC-V-compatible NEORV32 <<_machine_physical_memory_protection_csrs>> only implement the **TOR**
(top of region) mode and only up to 16 PMP regions. Furthermore, the <<_pmpcfg>>'s _lock bits_ only lock
the corresponding PMP entry and not the entries below. All region rules are checked in parallel **without**
prioritization, so for identical memory regions the most restrictive PMP rule will be enforced.
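
The difference between the prioritized RISC-V rule and this parallel check can be shown with a small C model of TOR matching. It follows the RISC-V definition of TOR (entry _i_ matches if `addr >> 2` is at least `pmpaddr[i-1]` and below `pmpaddr[i]`, with a lower bound of zero for entry 0) and the standard `R`/`W`/`X`/`A` bit positions of a `pmpcfg` byte. As assumptions of this sketch, only user-mode accesses are modelled (lock bits and machine mode are ignored) and an access that matches no region is denied, as in the RISC-V specification; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard pmpcfg byte fields (RISC-V privileged spec). */
#define PMP_R      0x01u
#define PMP_W      0x02u
#define PMP_X      0x04u
#define PMP_A_MASK 0x18u
#define PMP_A_TOR  0x08u   /* A = 1: top of region */

/* User-mode check: 'perm' is the requested access (PMP_R, PMP_W or PMP_X),
 * pmpaddr[] holds address bits 33:2. Every matching TOR region is evaluated
 * (no priority); the access is allowed only if at least one region matches
 * and all matching regions grant the permission. */
bool pmp_user_access_ok(const uint8_t *cfg, const uint32_t *pmpaddr, size_t n,
                        uint32_t addr, uint8_t perm) {
    uint32_t a = addr >> 2;
    bool matched = false;
    bool allowed = true;
    for (size_t i = 0; i < n; i++) {
        if ((cfg[i] & PMP_A_MASK) != PMP_A_TOR)
            continue;                                  /* entry off (or not TOR) */
        uint32_t lo = (i == 0) ? 0u : pmpaddr[i - 1];  /* previous entry = bottom */
        if (a >= lo && a < pmpaddr[i]) {
            matched = true;
            if ((cfg[i] & perm) != perm)
                allowed = false;                       /* most restrictive rule wins */
        }
    }
    return matched && allowed;
}
```

With `cfg = {0x0F, 0x00, 0x09}` and `pmpaddr = {0x400, 0x000, 0x400}`, entries 0 (`RWX`) and 2 (`R` only) both cover addresses `0x0000` to `0x0FFF`. A prioritized PMP would apply entry 0 and allow a write to `0x100`; the model combines both matches and denies it, while reads remain allowed.
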

.Atomic Memory Operations
[IMPORTANT]
The `A` CPU extension currently only implements the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.

.No HW-Support of Misaligned Memory Accesses
[WARNING]
The CPU does not resolve unaligned memory accesses in hardware. This is not a
RISC-V-compatibility issue but an important thing to know. Any kind of unaligned memory access
will raise an exception to allow a software-based emulation.
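
A software emulation typically replaces the trapping access by naturally aligned byte accesses, which can never be misaligned. The C sketch below shows only this data-path part of such a trap handler, using hypothetical helper names; decoding the trapping instruction, determining the faulting address, writing the destination register and advancing the return address are omitted. It relies on the little-endian byte order of the CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'addr' is not a multiple of the access size (1, 2 or 4). */
bool is_misaligned(uint32_t addr, uint32_t size) {
    return (addr & (size - 1u)) != 0;
}

/* Emulate lw from any address with four byte loads (little-endian). */
uint32_t emulate_misaligned_lw(const volatile uint8_t *p) {
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Emulate sw to any address with four byte stores (little-endian). */
void emulate_misaligned_sw(volatile uint8_t *p, uint32_t value) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(value >> (8 * i));
}
```

For instance, `emulate_misaligned_lw(&mem[1])` reads bytes 1 to 4 of `mem` and combines them into one word, exactly as a hardware `lw` would have done for an aligned address.
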


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction seen from the CPU.

.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir. | Description
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
| `debug_o`        |     1 | out | CPU is in debug mode when set
4+^| **Instruction <<_bus_interface>>**
| `i_bus_addr_o`   |    32 | out | access address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed `fence.i` instruction
| `i_bus_priv_o`   |     1 | out | current _effective_ CPU privilege level (`0` user, `1` machine or debug)
4+^| **Data <<_bus_interface>>**
| `d_bus_addr_o`   |    32 | out | access address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed `fence` instruction
| `d_bus_priv_o`   |     1 | out | current _effective_ CPU privilege level (`0` user, `1` machine or debug)
4+^| **System Time (for <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input from <<_machine_system_timer_mtime>>
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================
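
The 4-bit `i_bus_ben_o`/`d_bus_ben_o` signals select the active byte lanes of the 32-bit bus. A common little-endian mapping from access size and the two lowest address bits to a byte-enable mask is sketched below. It assumes naturally aligned accesses (misaligned ones raise an exception anyway) and is an illustration with a hypothetical function name, not a transcription of the VHDL.

```c
#include <stdint.h>

/* Byte-enable mask for a naturally aligned access of 'size' bytes
 * (1 = byte, 2 = half-word, 4 = word); bit n enables byte lane n. */
uint8_t byte_enable(uint32_t addr, uint32_t size) {
    switch (size) {
    case 1:
        return (uint8_t)(0x1u << (addr & 3u));   /* a single lane */
    case 2:
        return (uint8_t)(0x3u << (addr & 2u));   /* lower or upper half */
    default:
        return 0xFu;                             /* all four lanes */
    }
}
```

A byte store to an address ending in `3` enables only the most significant lane (`0b1000`), while a half-word store to an address ending in `2` enables the upper two lanes (`0b1100`).
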

<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | _no default value_
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | _no default value_
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | _no default value_
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======


<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see the _RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.

.Discovering ISA Extensions
[TIP]
Software can discover the available ISA extensions via the <<_misa>> & <<_mxisa>> CSRs
or by executing an instruction and checking for an _illegal instruction exception_
(-> <<_full_virtualization>>). +
 +
Executing an instruction from an extension that is not supported yet or that is currently not enabled
(via the corresponding top entity generic) will raise an illegal instruction exception.
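
Checking the `misa` CSR from software boils down to testing single bits: in the RISC-V privileged specification, bit _n_ of `misa` stands for the extension letter `'A' + n`, and bits 31:30 (`MXL`) encode the native register width. The helpers below decode a `misa` value that has already been read (e.g. with a `csrr` instruction, not shown); the helper names and the example value are hypothetical. Multi-letter extensions such as `Zicsr` or `Zmmul` have no bit in `misa`, and the NEORV32-specific <<_mxisa>> CSR is not decoded here.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the single-letter extension 'ext' ('A'..'Z') is flagged in misa. */
bool misa_has(uint32_t misa, char ext) {
    if (ext < 'A' || ext > 'Z')
        return false;
    return ((misa >> (ext - 'A')) & 1u) != 0;
}

/* MXL field in bits 31:30: 1 = 32-bit, 2 = 64-bit, 3 = 128-bit. */
unsigned misa_mxl(uint32_t misa) {
    return (unsigned)(misa >> 30);
}

/* Write the letters of all flagged extensions to 'buf' (>= 27 bytes). */
void misa_to_string(uint32_t misa, char *buf) {
    int n = 0;
    for (int i = 0; i < 26; i++)
        if ((misa >> i) & 1u)
            buf[n++] = (char)('A' + i);
    buf[n] = '\0';
}
```

For the made-up value `0x40901104`, `misa_to_string()` produces `"CIMUX"` and `misa_mxl()` returns 1 (32-bit).
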


==== **`A`** - Atomic Memory Access

Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
The RISC-V specifications define a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
ISA extension is enabled if the <<_cpu_extension_riscv_a>> configuration generic is _true_.
In this case the following additional instructions are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented so far, all further atomic operations
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instruction's ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.

The *load-reserved* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
_data memory access lock_. A *store-conditional* behaves as a "normal" store-word instruction (`sw`) that will
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
After the execution of the `sc` instruction, the lock is automatically removed.

The lock is broken if at least one of the following conditions occurs:

. executing any data memory access instruction other than `lr.w`
. raising _any_ trap (for example an interrupt or a memory access exception)
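
The lock rules above can be captured in a small host-side C model, together with the usual way of emulating a load-modify-write operation (here `amoadd.w`) by retrying an `lr.w`/`sc.w` pair until the store-conditional succeeds. On the real CPU the two operations are the `lr.w` and `sc.w` instructions; the `model_*` functions only mimic the behavior described above, and all names are hypothetical. The model returns 1 for a broken lock, since the text only specifies a non-zero value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CPU-internal data memory access lock. */
typedef struct {
    bool lock;
} cpu_model_t;

/* lr.w: behaves like lw and additionally sets the lock. */
uint32_t model_lr_w(cpu_model_t *cpu, const uint32_t *addr) {
    cpu->lock = true;
    return *addr;
}

/* sc.w: writes only if the lock is intact; returns 0 on success and a
 * non-zero value (1 in this model) if the lock was broken. */
uint32_t model_sc_w(cpu_model_t *cpu, uint32_t *addr, uint32_t value) {
    bool intact = cpu->lock;
    if (intact)
        *addr = value;
    cpu->lock = false;   /* the lock is removed after every sc.w */
    return intact ? 0u : 1u;
}

/* Other data memory accesses and traps break the lock. */
void model_other_data_access(cpu_model_t *cpu) { cpu->lock = false; }
void model_trap(cpu_model_t *cpu) { cpu->lock = false; }

/* amoadd.w emulation: retry lr.w/sc.w until sc.w succeeds;
 * returns the previous memory value like amoadd.w would. */
uint32_t emulated_amoadd_w(cpu_model_t *cpu, uint32_t *addr, uint32_t inc) {
    uint32_t old;
    do {
        old = model_lr_w(cpu, addr);
    } while (model_sc_w(cpu, addr, old + inc) != 0);
    return old;
}
```

In this single-threaded model the retry loop succeeds on the first attempt; on hardware, a trap between `lr.w` and `sc.w` breaks the lock and the loop simply runs again.
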

[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.


==== **`B`** - Bit-Manipulation Operations

The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
<<_cpu_extension_riscv_b>> configuration generic is _true_.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
A copy of the spec is also available in `docs/references`.

The NEORV32 `B` ISA extension includes the following sub-extensions (according to the RISC-V
bit-manipulation spec. v0.93) and their corresponding instructions:

* **`Zba` - Address-generation instructions**
** `sh1add` `sh2add` `sh3add`
* **`Zbb` - Basic bit-manipulation instructions**
** `andn` `orn` `xnor`
** `clz` `ctz` `cpop`
** `max` `maxu` `min` `minu`
** `sext.b` `sext.h` `zext.h`
** `rol` `ror` `rori`
** `orc.b` `rev8`
* **`Zbc` - Carry-less multiplication instructions**
** `clmul` `clmulh` `clmulr`
* **`Zbs` - Single-bit instructions**
** `bclr` `bclri`
** `bext` `bexti`
** `binv` `binvi`
** `bset` `bseti`

[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement full-parallel logic (like barrel shifters) for all
shift-related `B` instructions.

[WARNING]
The `B` extension is frozen and officially ratified. However, there is no
software support for this extension in the upstream GCC RISC-V port yet. An
intrinsic library is provided to utilize the provided `B` extension features from C-language
code (see `sw/example/bitmanip_test`) to circumvent this.
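
To cross-check results, plain C reference models of the instructions can be handy. The sketch below implements a few of the listed instructions (`sh1add`, `andn`, `cpop`, `clz`, `ror`, `orc.b`, `rev8`, `clmul`, `bext`, `binv`) from their definitions in the bit-manipulation specification. The `rv_` function names are hypothetical and not taken from the NEORV32 intrinsic library; the loops favor readability over speed.

```c
#include <stdint.h>

/* Zba: shift rs1 left by one and add rs2 */
uint32_t rv_sh1add(uint32_t rs1, uint32_t rs2) { return (rs1 << 1) + rs2; }

/* Zbb: AND with inverted operand */
uint32_t rv_andn(uint32_t rs1, uint32_t rs2) { return rs1 & ~rs2; }

/* Zbb: population count (number of set bits) */
uint32_t rv_cpop(uint32_t x) {
    uint32_t n = 0;
    for (; x != 0; x >>= 1)
        n += x & 1u;
    return n;
}

/* Zbb: count leading zeros (32 for x == 0) */
uint32_t rv_clz(uint32_t x) {
    uint32_t n = 0;
    for (int i = 31; i >= 0 && ((x >> i) & 1u) == 0; i--)
        n++;
    return n;
}

/* Zbb: rotate right by the lower 5 bits of shamt */
uint32_t rv_ror(uint32_t x, uint32_t shamt) {
    shamt &= 31u;
    return shamt ? (x >> shamt) | (x << (32u - shamt)) : x;
}

/* Zbb: each non-zero byte becomes 0xFF, each zero byte stays 0x00 */
uint32_t rv_orc_b(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 4; i++)
        if ((x >> (8 * i)) & 0xFFu)
            r |= 0xFFu << (8 * i);
    return r;
}

/* Zbb: reverse the byte order */
uint32_t rv_rev8(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Zbc: lower 32 bits of the carry-less product */
uint32_t rv_clmul(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            r ^= a << i;
    return r;
}

/* Zbs: extract / invert a single bit (index = lower 5 bits of rs2) */
uint32_t rv_bext(uint32_t rs1, uint32_t rs2) { return (rs1 >> (rs2 & 31u)) & 1u; }
uint32_t rv_binv(uint32_t rs1, uint32_t rs2) { return rs1 ^ (1u << (rs2 & 31u)); }
```

For example, `rv_rev8(0x12345678)` yields `0x78563412`, and `rv_clmul(3, 3)` yields 5 because the carry-less product of `0b11` with itself is `0b11` XOR `0b110`.
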


==== **`C`** - Compressed Instructions

The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
The `C` extension is available when the <<_cpu_extension_riscv_c>> configuration generic is _true_.
In this case the following instructions are available:

* `c.addi4spn` `c.lw` `c.sw` `c.nop` `c.addi` `c.jal` `c.li` `c.addi16sp` `c.lui` `c.srli` `c.srai` `c.andi` `c.sub`
`c.xor` `c.or` `c.and` `c.j` `c.beqz` `c.bnez` `c.slli` `c.lwsp` `c.jr` `c.mv` `c.ebreak` `c.jalr` `c.add` `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the corresponding second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
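
The extra fetch follows directly from the 32-bit fetch granularity of the front-end: an instruction costs one fetch per aligned word it touches. The helper below (hypothetical name) computes this number for an instruction of `len` bytes at byte address `addr`; it only counts words and does not model any timing.

```c
#include <stdint.h>

/* Number of aligned 32-bit fetch words touched by an instruction of
 * 'len' bytes (2 or 4) that starts at byte address 'addr'. */
uint32_t fetch_words_needed(uint32_t addr, uint32_t len) {
    uint32_t first = addr >> 2;               /* word holding the first byte */
    uint32_t last  = (addr + len - 1u) >> 2;  /* word holding the last byte */
    return last - first + 1u;
}
```

A 32-bit instruction at `0x102` spans the words at `0x100` and `0x104` and therefore needs two fetches, while the same instruction at `0x100` needs only one. Compressed instructions never cross a word boundary and always need a single fetch.
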


==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
decrease physical hardware requirements (for example block RAM). This extension is enabled when the <<_cpu_extension_riscv_e>>
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
This extension does not add any additional instructions or features.
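
In the standard 32-bit instruction formats the register indices sit in fixed 5-bit fields (`rd` in bits 11:7, `rs1` in bits 19:15, `rs2` in bits 24:20), so detecting an access beyond `x15` only requires checking whether a used field holds a value of 16 or more. The sketch below does this for R-type instructions, where all three fields are registers; in other formats some of these bits belong to immediates and must not be checked, and compressed instructions are not covered. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract a 5-bit register index starting at bit position 'lsb'. */
static uint32_t reg_field(uint32_t insn, unsigned lsb) {
    return (insn >> lsb) & 0x1Fu;
}

/* R-type only: true if rd, rs1 or rs2 addresses x16..x31, which would
 * raise an illegal instruction exception when the E extension is enabled. */
bool rv32e_rtype_illegal(uint32_t insn) {
    return reg_field(insn, 7) >= 16u ||    /* rd  */
           reg_field(insn, 15) >= 16u ||   /* rs1 */
           reg_field(insn, 20) >= 16u;     /* rs2 */
}
```

`add x1, x2, x3` (`0x003100B3`) passes this check, while `add x16, x2, x3` (`0x00310833`) would be rejected.
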

[NOTE]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.


==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediate: `lui` `auipc`
* jumps: `jal` `jalr`
* branches: `beq` `bne` `blt` `bge` `bltu` `bgeu`
* memory: `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw`
* alu: `addi` `slti` `sltiu` `xori` `ori` `andi` `slli` `srli` `srai` `add` `sub` `sll` `slt` `sltu` `xor` `srl` `sra` `or` `and`
* environment: `ecall` `ebreak` `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter if the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
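
The difference in latency can be illustrated with a simple C model that counts one cycle per bit position for the bit-serial shifter and a constant 2 cycles for the barrel shifter. The fixed overhead mentioned above is not modelled, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t result;
    uint32_t cycles;   /* modelled execution cycles, overhead excluded */
} shift_result_t;

/* Bit-serial sll: shifts by one position per cycle. */
shift_result_t serial_sll(uint32_t value, uint32_t shamt) {
    shift_result_t r = { value, 0 };
    shamt &= 31u;                 /* RV32 shifts use the lower 5 bits */
    while (shamt > 0) {
        r.result <<= 1;
        r.cycles++;
        shamt--;
    }
    return r;
}

/* Barrel-shifter sll: constant latency, independent of the shift amount. */
shift_result_t barrel_sll(uint32_t value, uint32_t shamt) {
    shift_result_t r = { value << (shamt & 31u), 2 };
    return r;
}
```

`serial_sll(1, 5)` takes 5 modelled cycles and `serial_sll(1, 31)` takes 31, while `barrel_sll()` always reports 2.
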

[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.


==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division operations are available when the
<<_cpu_extension_riscv_m>> configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul` `mulh` `mulhsu` `mulhu`
* division: `div` `divu` `rem` `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the <<_fast_mul_en>>
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete - regardless of the input operands.

[NOTE]
Regardless of the setting of the <<_fast_mul_en>> generic,
multiplication and division instructions operate _independently_ of the input operands.
Hence, there is **no early completion** of multiply by one/zero and divide by zero operations.
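
A conventional way to build such operand-independent serial units is a shift-and-add multiplier and a restoring divider, each running a fixed 32 iterations. The C model below follows these textbook algorithms for the unsigned case; whether the NEORV32 datapath is structured exactly like this is an assumption, signed variants and the DSP-based multiplier are not modelled, and the function names are hypothetical. A convenient property of the restoring divider is that a divisor of zero yields a quotient of all ones and returns the dividend as remainder, which are the results the RISC-V specification defines for `divu`/`remu` by zero.

```c
#include <stdint.h>

/* Unsigned shift-and-add multiplier: one partial product per iteration,
 * always 32 iterations. Low word = mul, high word = mulhu. */
uint64_t serial_mulu(uint32_t a, uint32_t b, unsigned *iterations) {
    uint64_t acc = 0;
    *iterations = 0;
    for (int i = 0; i < 32; i++) {
        if ((b >> i) & 1u)
            acc += (uint64_t)a << i;
        (*iterations)++;
    }
    return acc;
}

/* Unsigned restoring divider: one quotient bit per iteration,
 * always 32 iterations (divu/remu). */
void serial_divu(uint32_t dividend, uint32_t divisor,
                 uint32_t *quot, uint32_t *rem, unsigned *iterations) {
    uint64_t r = 0;   /* partial remainder, needs 33 bits */
    uint32_t q = 0;
    *iterations = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((dividend >> i) & 1u);
        q <<= 1;
        if (r >= divisor) {   /* always true for divisor == 0 */
            r -= divisor;
            q |= 1u;
        }
        (*iterations)++;
    }
    *quot = q;
    *rem  = (uint32_t)r;
}
```

Both functions report 32 iterations for every operand pair, including a multiplication by one and a division by zero.
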
541 60 zero_gravi
 
542 73 zero_gravi
 
==== **`Zmmul`** - Integer Multiplication

This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
of the `M` extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
This extension requires only about half of the hardware resources of the "full" `M` extension.
It is implemented if the <<_cpu_extension_riscv_zmmul>> configuration generic is _true_, providing the following instructions:

* multiplication: `mul` `mulh` `mulhsu` `mulhu`

If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
will raise an _illegal instruction exception_.

Note that the `M` and `Zmmul` extensions _cannot_ be enabled at the same time.

[TIP]
If your RISC-V GCC toolchain does not (yet) support the `_zmmul` ISA extension, it can be "emulated"
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
 
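With `Zmmul` (or `-mno-div`), the compiler cannot use `div`/`divu`/`rem`/`remu`, so integer `/` and `%` are typically turned into calls to runtime-library routines. As an illustration of what such a routine does, the sketch below implements unsigned division with the classic shift-and-subtract (restoring) algorithm. `soft_udivmod` and `soft_urem` are hypothetical names; this is not the code of any particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division by restoring long division,
 * using only shifts, compares and subtractions (no divide instruction).
 * Returns the quotient and stores the remainder if `remainder` is not NULL.
 * Division by zero mirrors the RISC-V `divu`/`remu` results. */
uint32_t soft_udivmod(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
  if (divisor == 0u) {
    if (remainder != NULL) *remainder = dividend;
    return UINT32_MAX;
  }
  uint32_t quotient = 0u;
  uint64_t rem = 0u; /* 64 bit wide so that (rem << 1) cannot overflow */
  for (int bit = 31; bit >= 0; bit--) {
    rem = (rem << 1) | ((dividend >> bit) & 1u); /* bring down the next dividend bit */
    if (rem >= divisor) {
      rem -= divisor;
      quotient |= UINT32_C(1) << bit;
    }
  }
  if (remainder != NULL) *remainder = (uint32_t)rem;
  return quotient;
}

/* Convenience wrapper returning only the remainder. */
uint32_t soft_urem(uint32_t dividend, uint32_t divisor)
{
  uint32_t rem;
  (void)soft_udivmod(dividend, divisor, &rem);
  return rem;
}
```

The loop produces one quotient bit per iteration, so a 32-bit division always needs 32 iterations regardless of the operands. The hardware `M` divider performs the same amount of work in its fixed 2+32+2 cycles listed in the instruction timing table.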
 
==== **`U`** - Less-Privileged User Mode

In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second less-privileged
operation mode. It is implemented if the <<_cpu_extension_riscv_u>> configuration generic is _true_.
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
Any kind of privilege rights violation will raise an exception to allow <<_full_virtualization>>.

Additional CSRs:

* <<_mcounteren>> - machine counter enable to constrain user-mode access to timer/counter CSRs
 
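The CSR access rules for user-mode can be summarized as a small predicate. This sketch follows the RISC-V privileged spec: CSR address bits [9:8] give the lowest privilege level allowed to access a CSR, and bit n of `mcounteren` gates user-mode reads of counter n (`cycle`, `time`, `instret`, `hpmcounter3..31` and their `h` halves). The function and enum names are hypothetical. In NEORV32 the HPM-related `mcounteren` bits are hardwired to zero (see the `Zihpm` section), which the predicate reflects when those bits are passed as zero.

```c
#include <stdbool.h>
#include <stdint.h>

enum priv_mode { PRIV_USER = 0, PRIV_MACHINE = 3 };

/* Sketch: may a CSR at address `csr` be read in privilege mode `priv`?
 * Writes are not modelled (the 0xCxx counter CSRs are read-only anyway). */
bool csr_read_allowed(uint32_t csr, enum priv_mode priv, uint32_t mcounteren)
{
  uint32_t min_priv = (csr >> 8) & 0x3u; /* csr[9:8]: lowest privilege level allowed */
  if ((uint32_t)priv < min_priv) {
    return false; /* e.g. any machine-mode CSR (bits [9:8] = 0b11) from user-mode */
  }
  /* user-level counter CSRs: low halves 0xC00..0xC1F, high halves 0xC80..0xC9F */
  bool is_counter = (csr >= 0xC00u && csr <= 0xC1Fu) || (csr >= 0xC80u && csr <= 0xC9Fu);
  if (priv == PRIV_USER && is_counter) {
    return ((mcounteren >> (csr & 0x1Fu)) & 1u) != 0u; /* bit n gates counter n */
  }
  return true;
}
```

For example, a user-mode read of `instreth` (`0xC82`) is only permitted when `mcounteren` bit 2 is set, while `mstatus` (`0x300`) is never readable from user-mode.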
 
==== **`X`** - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the <<_misa>> CSR.

The most important points of the NEORV32-specific extensions are:

* The CPU provides 16 _fast interrupt request_ lines (`FIRQ`), which are controlled via custom bits in the <<_mie>>
and <<_mip>> CSRs. These bits are mapped to CSR fields that are available for custom use according to the
RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).
* There are <<_neorv32_specific_csrs>>.
 
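Since the FIRQ channels are reported through <<_mie>>/<<_mip>> bits and custom <<_mcause>> codes, software can determine which channel will be serviced next. The sketch below assumes that FIRQ channel n uses bit 16 + n of `mie`/`mip` (bits 16 and up are the range the RISC-V spec leaves for platform/custom use; check the <<_mie>> description for the actual layout) and takes the `mcause` values from the trap listing. It only looks at FIRQs and ignores the standard interrupts and the global enable flag; the names are hypothetical.

```c
#include <stdint.h>

#define FIRQ_FIRST_BIT 16u /* assumption: FIRQ n uses mie/mip bit 16 + n */

/* Sketch: return the pending-and-enabled FIRQ channel with the highest
 * priority (channel 0 is highest, as in the trap listing), or -1 if none. */
int firq_highest_pending(uint32_t mie, uint32_t mip)
{
  uint32_t active = ((mie & mip) >> FIRQ_FIRST_BIT) & 0xFFFFu;
  for (int ch = 0; ch < 16; ch++) {
    if ((active & (UINT32_C(1) << ch)) != 0u) {
      return ch;
    }
  }
  return -1;
}

/* mcause value written when FIRQ channel `ch` (0..15) is taken. */
uint32_t firq_mcause(int ch)
{
  return 0x80000010u + (uint32_t)ch;
}
```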
 
==== **`Zfinx`** Single-Precision Floating-Point Operations

The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
fewer hardware resources and allows faster context switches. This also implies that there are NO dedicated `f`
register file-related load/store or move instructions.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

[NOTE]
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications.

The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
to the `F` extension. The `Zfinx` extension is implemented when the <<_cpu_extension_riscv_zfinx>> configuration
generic is _true_. In this case the following instructions and CSRs are available:

* conversion: `fcvt.s.w` `fcvt.s.wu` `fcvt.w.s` `fcvt.wu.s`
* comparison: `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s`
* computational: `fadd.s` `fsub.s` `fmul.s`
* sign-injection: `fsgnj.s` `fsgnjn.s` `fsgnjx.s`
* number classification: `fclass.s`

* compressed instructions: `c.flw` `c.flwsp` `c.fsw` `c.fswsp`

Additional CSRs:

* <<_fcsr>> - FPU control register
* <<_frm>> - rounding mode control
* <<_fflags>> - FPU status flags

[WARNING]
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!

[WARNING]
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
Subnormal numbers (exponent = 0) are _flushed to zero_ (set to +/- 0) before entering the
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
result is also flushed to zero during normalization.

[WARNING]
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
intrinsic library is provided to use the `Zfinx` floating-point extension from C-language
code (see `sw/example/floating_point_test`).
 
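The flush-to-zero rule from the subnormal warning above can be mimicked on a host, for example when comparing NEORV32 FPU results against a software reference. This sketch assumes that the host `float` type is IEEE-754 binary32 (true for common platforms) and uses hypothetical helper names; it models only the exponent/sign handling, not the FPU datapath.

```c
#include <stdint.h>
#include <string.h>

/* Helpers to view a binary32 float as its raw IEEE-754 bit pattern. */
uint32_t f32_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float f32_from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Sketch of the flush-to-zero rule: if the 8-bit exponent field is zero
 * (zero or subnormal), clear the mantissa and keep only the sign bit,
 * which yields +0.0 or -0.0. */
float ftz(float x)
{
  uint32_t u = f32_bits(x);
  if (((u >> 23) & 0xFFu) == 0u) {
    u &= 0x80000000u;
  }
  return f32_from_bits(u);
}
```

For instance, the smallest subnormal `f32_from_bits(0x00000001u)` becomes `+0.0f`, while the smallest normal number (`0x00800000`) passes through unchanged.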
 
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
are implemented when the <<_cpu_extension_riscv_zicsr>> configuration generic is _true_.

[IMPORTANT]
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
In order to provide the full set of privileged functions that are required to run more complex tasks like
an operating system and to allow a secure execution environment, the `Zicsr` extension should always be enabled.

When `Zicsr` is enabled, the following instructions are available:

* CSR access: `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci`
* environment: `mret` `wfi`

[NOTE]
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.
However, access privileges are still enforced, so these instruction variants _do_ cause side-effects
(the RISC-V spec. states that these combinations "_shall_ not cause any side-effects").
 
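To make the operand-dependent behaviour concrete, the following sketch models which CSR accesses each `Zicsr` instruction performs according to the RISC-V spec: `rd = x0` suppresses the read of `csrrw[i]`, and `rs1 = x0` (or a zero immediate) suppresses the write of the set/clear variants. The type and function names are hypothetical, and the privilege check that NEORV32 still performs in all cases (see the note above) is not modelled.

```c
#include <stdbool.h>

typedef enum { CSRRW, CSRRS, CSRRC, CSRRWI, CSRRSI, CSRRCI } csr_op_t;

typedef struct {
  bool reads_csr;  /* old CSR value is read (and written to rd) */
  bool writes_csr; /* CSR is written */
} csr_access_t;

/* rd/rs1 are register indices; for the *i forms, rs1 holds the 5-bit immediate. */
csr_access_t csr_access(csr_op_t op, unsigned rd, unsigned rs1)
{
  csr_access_t acc;
  switch (op) {
    case CSRRW:
    case CSRRWI:
      acc.reads_csr = (rd != 0u);   /* rd = x0: no read access */
      acc.writes_csr = true;
      break;
    default:                        /* set/clear variants */
      acc.reads_csr = true;
      acc.writes_csr = (rs1 != 0u); /* rs1 = x0 or uimm = 0: no write access */
      break;
  }
  return acc;
}
```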
**`wfi` Instruction**

The "wait for interrupt instruction" `wfi` acts like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, at least one interrupt source has to
be enabled via the <<_mie>> CSR and the global interrupt enable flag in <<_mstatus>> has to be set.

If the <<_mstatus>> `TW` bit is cleared, the `wfi` instruction is also allowed to execute when in user-mode.
This is always the case if user-mode is not implemented. If the `TW` bit is set, the execution of `wfi` in
user-mode will raise an illegal instruction exception.
 
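The two `wfi` conditions described above can be written down as simple predicates over the CSR values. This sketch follows the description in this section; `MSTATUS_MIE` (bit 3) and `MSTATUS_TW` (bit 21) are the standard RISC-V `mstatus` bit positions, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_MIE (UINT32_C(1) << 3)  /* global machine-mode interrupt enable */
#define MSTATUS_TW  (UINT32_C(1) << 21) /* timeout wait: trap wfi in lower modes */

/* Sketch: does `wfi` executed in user-mode raise an illegal instruction exception? */
bool wfi_illegal_in_user(uint32_t mstatus)
{
  return (mstatus & MSTATUS_TW) != 0u;
}

/* Sketch of the wake-up condition described above: at least one interrupt
 * source is both enabled (mie) and pending (mip), and the global enable is set. */
bool wfi_wakeup(uint32_t mstatus, uint32_t mie, uint32_t mip)
{
  return ((mstatus & MSTATUS_MIE) != 0u) && ((mie & mip) != 0u);
}
```

A pending interrupt whose `mie` bit is cleared, or any interrupt while the global enable is cleared, therefore keeps the CPU asleep in this model.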
 
==== **`Zicntr`** CPU Base Counters

The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.

Additional CSRs:

* <<_cycleh>>, <<_mcycleh>> - cycle counter
* <<_instreth>>, <<_minstreth>> - instructions-retired counter
* <<_timeh>> - system _wall-clock_ time

[NOTE]
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.

If `Zicntr` is disabled, all accesses to the corresponding counter CSRs will raise an illegal instruction exception.
 
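Because the low word of a 64-bit counter can overflow into the high word between two CSR reads, software should not simply read `cycleh` and `cycle` once each. The RISC-V unprivileged spec suggests re-reading the high word and retrying if it changed. The sketch below expresses that pattern with a hypothetical function-pointer interface so it can be exercised on a host with a simulated counter; on the target, the callbacks would wrap `csrr` instructions.

```c
#include <stdint.h>

/* Reads one 32-bit half of a counter (hypothetical interface). */
typedef uint32_t (*csr_read_fn)(void *ctx);

/* Tear-free read of a 64-bit counter split into two 32-bit CSRs: if the high
 * word changed while the low word was read, the low word wrapped in between,
 * so the whole sequence is repeated. */
uint64_t read_counter64(csr_read_fn read_hi, csr_read_fn read_lo, void *ctx)
{
  uint32_t hi, lo;
  do {
    hi = read_hi(ctx);
    lo = read_lo(ctx);
  } while (hi != read_hi(ctx));
  return ((uint64_t)hi << 32) | lo;
}

/* Host-side simulation for illustration only: a counter that advances by
 * `step` on every CSR access. */
typedef struct { uint64_t value; uint64_t step; } sim_counter_t;

uint32_t sim_read_hi(void *ctx)
{
  sim_counter_t *c = ctx;
  c->value += c->step;
  return (uint32_t)(c->value >> 32);
}

uint32_t sim_read_lo(void *ctx)
{
  sim_counter_t *c = ctx;
  c->value += c->step;
  return (uint32_t)c->value;
}
```

With the simulated counter starting at `0xFFFFFFFE` and advancing by one per access, a single naive high/low read would return 0 for both words, whereas `read_counter64()` retries once and returns `0x100000003`.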
 
==== **`Zihpm`** Hardware Performance Monitors

In addition to the base cycle, instructions-retired and time counters, the NEORV32 CPU provides
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
N-bit wide counter (split into a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
<<_hpm_cnt_width>> generic (0..64-bit), and a corresponding event configuration CSR. The event configuration
CSR defines the architectural events that lead to an increment of the associated HPM counter.

The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.
The actual number of implemented HPM counters is defined by the <<_hpm_num_cnts>> generic.

Additional CSRs:

* <<_mhpmevent>> 3..31 (depending on <<_hpm_num_cnts>>) - event configuration CSRs
* <<_mhpmcounterh>> 3..31 (depending on <<_hpm_num_cnts>>) - counter CSRs

[IMPORTANT]
The HPM counter CSRs can only be accessed in machine-mode. Hence, the corresponding <<_mcounteren>> CSR bits
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
exception.

[TIP]
Auto-increment of the HPMs can be deactivated individually via the <<_mcountinhibit>> CSR.
 
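Each HPM counter is N = HPM_CNT_WIDTH bits wide and is exposed as a low-word/high-word CSR pair. The sketch below models one counting event under stated assumptions: the counter wraps around to zero once all N bits are set, and bits above N read as zero. It also honours the per-counter inhibit bit in `mcountinhibit` (bit n for counter n, as in the RISC-V privileged spec). `hpm_increment`, `hpm_lo` and `hpm_hi` are hypothetical names.

```c
#include <stdint.h>

/* Sketch: effect of one counting event on an N-bit HPM counter
 * (width = N = HPM_CNT_WIDTH, 0..64; index = counter number 3..31). */
uint64_t hpm_increment(uint64_t counter, unsigned width, uint32_t mcountinhibit, unsigned index)
{
  if (width == 0u || ((mcountinhibit >> index) & 1u) != 0u) {
    return counter; /* no counter bits implemented, or counting inhibited */
  }
  uint64_t mask = (width >= 64u) ? UINT64_MAX : ((UINT64_C(1) << width) - 1u);
  return (counter + 1u) & mask; /* wrap around within N bits */
}

/* Low-word and high-word CSR views of the counter. */
uint32_t hpm_lo(uint64_t counter) { return (uint32_t)counter; }
uint32_t hpm_hi(uint64_t counter) { return (uint32_t)(counter >> 32); }
```

With N = 40, for example, the low word overflows into the high word after 2 to the power of 32 events, and the complete counter wraps to zero after all 40 bits are set.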
 
==== **`Zifencei`** Instruction Stream Synchronization

The `Zifencei` CPU extension is implemented if the <<_cpu_extension_riscv_zifencei>> configuration
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:

* `fence.i`

The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to inform the memory system (like the i-cache) that it should perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
 
 
==== **`Zxcfu`** Custom Instructions Extension (CFU)

The `Zxcfu` ISA extension is a NEORV32-specific _custom RISC-V_ ISA extension (`Z` = sub-extension, `x` = platform-specific
custom extension, `cfu` = name of the custom extension). When enabled via the <<_cpu_extension_riscv_zxcfu>> configuration
generic, this ISA extension adds the <<_custom_functions_unit_cfu>> to the CPU core. The CFU is a module that
allows adding **custom RISC-V instructions** to the processor core.

The CFU is implemented as an ALU co-processor and is integrated right into the CPU's pipeline, providing minimal data
transfer latency as it has direct access to the core's register file. Up to 1024 custom instructions can be
implemented within the CFU. These instructions are mapped to an OPCODE space that has been explicitly reserved by
the RISC-V spec for custom extensions.

Software can utilize the custom instructions by using _intrinsic functions_, which are inline assembly functions that
behave like "regular" C functions.

[TIP]
For more information regarding the CFU see section <<_custom_functions_unit_cfu>>.

[TIP]
The CFU / `Zxcfu` ISA extension is intended for application-specific _instructions_.
If you would like to add more complex accelerators or interfaces that can also operate independently of
the CPU, take a look at the memory-mapped <<_custom_functions_subsystem_cfs>>.
 
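An intrinsic ultimately has to emit the 32-bit encoding of a custom instruction. The sketch below assumes that CFU instructions use the R-type format in the RISC-V `custom-0` opcode space (`0b0001011`), with `funct7` and `funct3` selecting the operation; 7 + 3 selection bits yield exactly the 1024 instructions mentioned above. Check <<_custom_functions_unit_cfu>> for the actual encoding. `cfu_encode` and `cfu_op0` are hypothetical names; `.insn r` is the GNU assembler's generic R-type directive.

```c
#include <stdint.h>

#define RISCV_OPCODE_CUSTOM0 0x0Bu /* assumption: CFU uses the custom-0 major opcode */

/* Hypothetical helper: build an R-type instruction word
 * (funct7 | rs2 | rs1 | funct3 | rd | opcode). */
uint32_t cfu_encode(uint32_t funct7, uint32_t funct3, uint32_t rd, uint32_t rs1, uint32_t rs2)
{
  return ((funct7 & 0x7Fu) << 25) | ((rs2 & 0x1Fu) << 20) | ((rs1 & 0x1Fu) << 15) |
         ((funct3 & 0x7u) << 12) | ((rd & 0x1Fu) << 7) | RISCV_OPCODE_CUSTOM0;
}

#if defined(__riscv)
/* Target-only sketch of an intrinsic for funct7 = 0, funct3 = 0:
 * ".insn r opcode, funct3, funct7, rd, rs1, rs2". */
static inline uint32_t cfu_op0(uint32_t a, uint32_t b)
{
  uint32_t r;
  __asm__ volatile (".insn r 0x0b, 0, 0, %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
  return r;
}
#endif
```

On a RISC-V target, `cfu_op0(a, b)` can be called like a regular C function; on other hosts only the encoder is compiled, which is handy for checking hand-written encodings.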
 
==== **`PMP`** Physical Memory Protection

The NEORV32 physical memory protection (PMP) provides an elementary memory protection mechanism that can be used
to constrain read, write and execute rights of arbitrary memory regions. The PMP is compatible
with the _RISC-V Privileged Architecture Specifications_. For detailed information see the corresponding sections of the spec.

[IMPORTANT]
The NEORV32 PMP only supports **TOR** (top of region) mode, which basically is a "base-and-bound" concept, and only
up to 16 PMP regions.

The physical memory protection logic is implemented if the <<_pmp_num_regions>> configuration generic is greater
than zero. This generic also defines the total number of available configurable protection
regions. The minimal granularity of a protected region is defined by the <<_pmp_min_granularity>> generic. A larger
granularity reduces hardware complexity but also increases the minimal region size.
The default value is 4 bytes, which allows a minimal region size of 4 bytes.

If implemented, the PMP provides the following additional CSRs:

* <<_pmpcfg>> 0..3 (depending on configuration) - PMP configuration registers, 4 entries per CSR
* <<_pmpaddr>> 0..15 (depending on configuration) - PMP address registers


**Operation Summary**

Every CPU access address (from the instruction fetch or data access interface) is tested against _all_
of the specified PMP regions. If there is a match, the configured access rights are enforced:

* a write access (store) will fail if no **write** attribute is set
* a read access (load) will fail if no **read** attribute is set
* an instruction fetch access will fail if no **execute** attribute is set

If an access to a protected region does not have the required access rights, it will raise the corresponding
instruction/load/store _bus access fault_ exception.

By default, all PMP checks are enforced for user-mode only. However, PMP rules can also be enforced for
machine-mode when the corresponding PMP region has the "LOCK" bit set. This will also prevent any write access
to the region's PMP CSRs until the CPU is reset.

.Rule Prioritization
[IMPORTANT]
All rules are checked in parallel **without** prioritization, so for identical memory regions the most restrictive
PMP rule will be enforced.

.PMP Example Program
[TIP]
A simple PMP example program can be found in `sw/example/demo_pmp`.
 
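The operation summary can be expressed as a small reference model, for example to sanity-check a PMP configuration before writing it to the CSRs. This sketch follows the text above literally: TOR regions only, all matching regions checked without priority (so the most restrictive one wins), user-mode always checked and machine-mode only for locked regions. The `pmpcfg` bit positions (R = bit 0, W = bit 1, X = bit 2, A = bits 4:3 with TOR = 1, L = bit 7) and the `pmpaddr` encoding (address bits 33:2) come from the RISC-V privileged spec. Accesses that match no region are allowed here, which is an assumption; granularity is ignored, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* pmpcfg entry fields as defined by the RISC-V privileged spec */
#define PMP_R      (1u << 0)
#define PMP_W      (1u << 1)
#define PMP_X      (1u << 2)
#define PMP_A_MASK (3u << 3)
#define PMP_A_TOR  (1u << 3)
#define PMP_L      (1u << 7)

typedef enum { ACC_READ = PMP_R, ACC_WRITE = PMP_W, ACC_EXEC = PMP_X } pmp_access_t;

/* cfg[i]/addr[i]: pmpcfg entry and pmpaddr value of region i (n regions).
 * Returns false if the access has to raise an access fault exception. */
bool pmp_allowed(const uint8_t *cfg, const uint32_t *addr, size_t n,
                 uint32_t byte_addr, pmp_access_t acc, bool machine_mode)
{
  uint32_t word = byte_addr >> 2; /* pmpaddr holds address bits [33:2] */
  for (size_t i = 0; i < n; i++) {
    if ((cfg[i] & PMP_A_MASK) != PMP_A_TOR) {
      continue; /* region disabled (only TOR is supported) */
    }
    uint32_t lower = (i == 0) ? 0u : addr[i - 1]; /* TOR: previous entry is the base */
    bool match = (word >= lower) && (word < addr[i]);
    bool enforced = !machine_mode || ((cfg[i] & PMP_L) != 0u);
    if (match && enforced && ((cfg[i] & (uint8_t)acc) == 0u)) {
      return false; /* any matching region without the permission denies access */
    }
  }
  return true; /* assumption: accesses matching no region are allowed */
}
```

With region 0 = `[0x0000, 0x1000)` set to R+X and region 1 = `[0x1000, 0x2000)` set to R+W, a user-mode store to `0x0100` fails while a store to `0x1800` succeeds; the same store to `0x0100` from machine-mode succeeds unless region 0 is locked.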
 
**Impact on Critical Path**

When implementing more PMP regions than a "_certain critical limit_", an **additional register stage** is automatically
inserted into the CPU's memory interfaces to keep the impact on the critical path as small as possible.
Unfortunately, this will also increase the latency of instruction fetches and data accesses by one cycle.
The _critical limit_ can be modified by a constant from the main VHDL package file
(`rtl/core/neorv32_package.vhd`, default value = 8):

[source,vhdl]
----
-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;
----

[TIP]
Increasing the minimal PMP region size / granularity via the <<_pmp_min_granularity>> entity generic
will also reduce hardware utilization and the impact on the critical path.
 
 
<<<
// ####################################################################################################################

include::cpu_cfu.adoc[]
 
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Timing

The instruction timing listed in the table below shows the required clock cycles for executing a certain
instruction. These instruction cycles assume a bus access without additional wait states and a filled
pipeline.

Average CPI (cycles per instruction) values for "real applications" like for executing the CoreMark benchmark for different CPU
configurations are presented in <<_cpu_performance>>.

.Clock cycles per instruction
[cols="<2,^1,^4,<3"]
[options="header", grid="rows"]
|=======================
| Class | ISA | Instruction(s) | Execution cycles
| ALU            | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
| ALU            | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU            | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU            | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SA/4 + SA%4; FAST_SHIFT: 4; TINY_SHIFT: 2..32
| Branches       | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches       | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + ML; Not taken: 3
| Jumps / Calls  | `I/E` | `jal` `jalr`                  | 4 + ML
| Jumps / Calls  | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access  | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access  | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 4 + ML
| Memory access  | `A`   | `lr.w` `sc.w`                             | 4 + ML
| Multiplication | `M`   | `mul` `mulh` `mulhsu` `mulhu` | 2+32+2; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 4
| Division       | `M`   | `div` `divu` `rem` `remu`     | 2+32+2
| CSR access     | `Zicsr`     | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 3
| System         | `I/E`       | `fence` | 3
| System         | `Zicsr`     | `ecall` `ebreak` | 3
| System         | `Zicsr`+`C` | `c.ebreak` | 3
| System         | `Zicsr`     | `mret` `wfi` | 6
| System         | `Zifencei`  | `fence.i` | 3 + ML
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
| Floating-point - compare    | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
| Floating-point - misc       | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
| Bit-manipulation - arithmetic/logic    | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
| Bit-manipulation - shifts              | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
| Bit-manipulation - shifts              | `B(Zbb)` | `cpop` | 3 + 32
| Bit-manipulation - shifts              | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
| Bit-manipulation - single-bit          | `B(Zbs)` | `bset[i]` `bclr[i]` `binv[i]` `bext[i]` | 3
| Bit-manipulation - shifted-add         | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
| Custom instructions (CFU) | `Zxcfu` | - | min. 4
| | | |
| _Illegal instructions_    | `Zicsr` | - | 2
|=======================

[NOTE]
The presented values of the *floating-point execution cycles* are average values, obtained from
4096 instruction executions using pseudo-random input values. The execution time for emulating the
instructions (using pure-software libraries) is ~17..140 times higher.
 
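The shift and multiply/divide rows of the table can be turned into a small cycle estimator, for example when budgeting a hot loop. The sketch below (hypothetical function names) encodes only the formulas from the table for a filled pipeline and zero wait states; it ignores the TINY_SHIFT variant and memory latency.

```c
#include <stdbool.h>

/* Shift instructions: default serial shifter vs. FAST_SHIFT_EN barrel shifter;
 * sa = shift amount (only the low 5 bits are used on RV32). */
unsigned shift_cycles(unsigned sa, bool fast_shift)
{
  if (fast_shift) {
    return 4u;
  }
  sa &= 31u;
  return 3u + sa / 4u + sa % 4u;
}

/* Multiplication: 2+32+2 cycles bit-serial, 4 cycles with FAST_MUL_EN (DSP). */
unsigned mul_cycles(bool fast_mul)
{
  return fast_mul ? 4u : 2u + 32u + 2u;
}

/* Division: always 2+32+2 cycles, independent of the operands. */
unsigned div_cycles(void)
{
  return 2u + 32u + 2u;
}
```

For example, a serial shift by 31 takes 3 + 7 + 3 = 13 cycles, while every `M` division takes 36 cycles.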
 
<<<
// ####################################################################################################################
include::cpu_csr.adoc[]
 
 
<<<
// ####################################################################################################################
:sectnums:
==== Traps, Exceptions and Interrupts

In this document the following nomenclature regarding traps is used:

* _interrupts_ = asynchronous exceptions
* _exceptions_ = synchronous exceptions
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the <<_mtvec>>
CSR. The cause of the corresponding interrupt or exception can be determined via the content of the <<_mcause>>
CSR. The address that reflects the current program counter when the trap was taken is stored in the <<_mepc>> CSR.
Additional information regarding the cause of the trap can be retrieved from the <<_mtval>> CSR and the processor's
<<_internal_bus_monitor_buskeeper>> (for memory access exceptions).

The traps are prioritized. If several _synchronous exceptions_ occur at once, only the one with the highest priority is triggered
while all remaining exceptions are ignored. If several _asynchronous exceptions_ (interrupts) trigger at once, the one with the highest priority
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler, the interrupt with
the second highest priority will get serviced and so on until no further interrupts are pending.

.Interrupt Signal Requirements - Standard RISC-V Interrupts
[IMPORTANT]
All standard RISC-V interrupt request signals are **high-active**. A request has to stay at high-level (=asserted)
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).

.Interrupt Signal Requirements - Fast Interrupt Requests
[IMPORTANT]
The NEORV32-specific FIRQ request lines are triggered by a one-shot high-level (i.e. a rising edge). Each request is buffered in the CPU control
unit until the channel is either disabled (by clearing the corresponding <<_mie>> CSR bit) or the request is explicitly cleared (by writing
zero to the corresponding <<_mip>> CSR bit).

.Instruction Atomicity
[NOTE]
All instructions execute as atomic operations, so interrupts can only trigger _between_ two instructions.
Even if there is a permanent interrupt request, exactly one instruction of the interrupted program will be executed before
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.
 
 
:sectnums:
===== Memory Access Exceptions

If a load operation causes any exception, the instruction's destination register is
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
trigger a bus/memory read-operation at all. Likewise, exceptions caused by a store address misalignment or a store physical
memory protection fault do not trigger a bus/memory write-operation at all.
 
 
:sectnums:
===== Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
entity signals. These interrupts have custom configuration and status flags in the <<_mie>> and <<_mip>> CSRs and also
provide custom trap codes in <<_mcause>>. These FIRQs are reserved for NEORV32 processor-internal usage only.
 
 
:sectnums:
===== NEORV32 Trap Listing

The following table shows all traps that are currently supported by the NEORV32 CPU. It also shows the prioritization
and the CSR side-effects. A more detailed description of the actual trap triggering events is provided in a further table.

[NOTE]
_Asynchronous exceptions_ (= interrupts) set the MSB of <<_mcause>> while _synchronous exceptions_ (= "software exceptions")
clear the MSB.

**Table Annotations**

The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
cause ID of the corresponding trap that is written to the <<_mcause>> CSR. The "[RISC-V]" column shows the interrupt/exception code value from the
official RISC-V privileged architecture spec. The "ID [C]" names are defined by the NEORV32 core library (the runtime environment _RTE_) and can
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to the <<_mepc>> and <<_mtval>> CSRs when a trap is triggered:

* **IPC** - address of interrupted instruction (instruction has not been executed yet)
* **PC** - address of instruction that caused the trap
* **ADR** - bad memory access address that caused the trap
* **INST** - the faulting instruction word itself
* **0** - zero

.NEORV32 Trap Listing
[cols="3,6,5,14,11,4,4"]
[options="header",grid="rows"]
|=======================
| Prio. | `mcause`     | [RISC-V] | ID [C]                   | Cause                             | `mepc`  | `mtval`
7+^| **Synchronous Exceptions**
| 1     | `0x00000000` | 0.0      | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned    | **PC**  | **ADR**
| 2     | `0x00000001` | 0.1      | _TRAP_CODE_I_ACCESS_     | instruction access bus fault      | **PC**  | **ADR**
| 3     | `0x00000002` | 0.2      | _TRAP_CODE_I_ILLEGAL_    | illegal instruction               | **PC**  | **INST**
| 4     | `0x0000000B` | 0.11     | _TRAP_CODE_MENV_CALL_    | environment call from M-mode      | **PC**  | **0**
| 5     | `0x00000008` | 0.8      | _TRAP_CODE_UENV_CALL_    | environment call from U-mode      | **PC**  | **0**
| 6     | `0x00000003` | 0.3      | _TRAP_CODE_BREAKPOINT_   | breakpoint instruction            | **PC**  | **PC**
| 7     | `0x00000006` | 0.6      | _TRAP_CODE_S_MISALIGNED_ | store address misaligned          | **PC**  | **ADR**
| 8     | `0x00000004` | 0.4      | _TRAP_CODE_L_MISALIGNED_ | load address misaligned           | **PC**  | **ADR**
| 9     | `0x00000007` | 0.7      | _TRAP_CODE_S_ACCESS_     | store access bus fault            | **PC**  | **ADR**
| 10    | `0x00000005` | 0.5      | _TRAP_CODE_L_ACCESS_     | load access bus fault             | **PC**  | **ADR**
7+^| **Asynchronous Exceptions (Interrupts)**
| 11    | `0x80000010` | 1.16     | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0  | **IPC** | **0**
| 12    | `0x80000011` | 1.17     | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1  | **IPC** | **0**
| 13    | `0x80000012` | 1.18     | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2  | **IPC** | **0**
| 14    | `0x80000013` | 1.19     | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3  | **IPC** | **0**
| 15    | `0x80000014` | 1.20     | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4  | **IPC** | **0**
| 16    | `0x80000015` | 1.21     | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5  | **IPC** | **0**
| 17    | `0x80000016` | 1.22     | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6  | **IPC** | **0**
| 18    | `0x80000017` | 1.23     | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7  | **IPC** | **0**
| 19    | `0x80000018` | 1.24     | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8  | **IPC** | **0**
| 20    | `0x80000019` | 1.25     | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9  | **IPC** | **0**
| 21    | `0x8000001A` | 1.26     | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10 | **IPC** | **0**
| 22    | `0x8000001B` | 1.27     | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11 | **IPC** | **0**
| 23    | `0x8000001C` | 1.28     | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12 | **IPC** | **0**
| 24    | `0x8000001D` | 1.29     | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13 | **IPC** | **0**
| 25    | `0x8000001E` | 1.30     | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14 | **IPC** | **0**
| 26    | `0x8000001F` | 1.31     | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15 | **IPC** | **0**
| 27    | `0x8000000B` | 1.11     | _TRAP_CODE_MEI_          | machine external interrupt (MEI)  | **IPC** | **0**
| 28    | `0x80000003` | 1.3      | _TRAP_CODE_MSI_          | machine software interrupt (MSI)  | **IPC** | **0**
| 29    | `0x80000007` | 1.7      | _TRAP_CODE_MTI_          | machine timer interrupt (MTI)     | **IPC** | **0**
|=======================
 
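A common trap handler usually splits <<_mcause>> into the interrupt flag (MSB) and the cause code before dispatching. The following minimal sketch is based on the table above and uses hypothetical helper names; in real code, the `TRAP_CODE_*` definitions provided by the NEORV32 runtime environment should be preferred.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSB set: asynchronous exception (interrupt); MSB clear: synchronous exception. */
static inline bool mcause_is_interrupt(uint32_t mcause) { return (mcause >> 31) != 0u; }

/* Exception/interrupt code (the value after the dot in the [RISC-V] column). */
static inline uint32_t mcause_code(uint32_t mcause) { return mcause & 0x7FFFFFFFu; }

/* Returns the FIRQ channel (0..15) for a fast interrupt request cause, or -1. */
int mcause_firq_channel(uint32_t mcause)
{
  uint32_t code = mcause_code(mcause);
  if (mcause_is_interrupt(mcause) && code >= 16u && code <= 31u) {
    return (int)(code - 16u);
  }
  return -1;
}
```

For example, `0x8000000B` decodes to interrupt code 11 (MEI), whereas `0x0000000B` is the synchronous environment call from M-mode.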
 
1007 69 zero_gravi
The following table provides a summarized description of the actual events for triggering a specific trap.
1008 60 zero_gravi
 
1009 69 zero_gravi
.NEORV32 Trap Description
1010
[cols="<3,<7"]
1011
[options="header",grid="rows"]
1012
|=======================
1013 70 zero_gravi
| Trap ID [C] | Triggered when ...
1014 73 zero_gravi
| _TRAP_CODE_I_MISALIGNED_ | fetching a 32-bit instruction word that is not 32-bit-aligned (_see note below!_)
1015 69 zero_gravi
| _TRAP_CODE_I_ACCESS_     | bus timeout or bus error during instruction word fetch
1016
| _TRAP_CODE_I_ILLEGAL_    | trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation
1017
| _TRAP_CODE_MENV_CALL_    | executing `ecall` instruction in machine-mode
1018
| _TRAP_CODE_UENV_CALL_    | executing `ecall` instruction in user-mode
1019 73 zero_gravi
| _TRAP_CODE_BREAKPOINT_   | executing `ebreak` instruction
1020 69 zero_gravi
| _TRAP_CODE_S_MISALIGNED_ | storing data to an address that is not naturally aligned to the data size (byte, half, word) being stored
1021
| _TRAP_CODE_L_MISALIGNED_ | loading data from an address that is not naturally aligned to the data size  (byte, half, word) being loaded
1022
| _TRAP_CODE_S_ACCESS_     | bus timeout or bus error during load data operation
1023
| _TRAP_CODE_L_ACCESS_     | bus timeout or bus error during store data operation
1024
| _TRAP_CODE_FIRQ_0_ ... _TRAP_CODE_FIRQ_15_| caused by interrupt-condition of processor-internal modules, see <<_neorv32_specific_fast_interrupt_requests>>
1025
| _TRAP_CODE_MEI_          | user-defined processor-external source (via dedicated top-entity signal)
1026
| _TRAP_CODE_MSI_          | user-defined processor-external source (via dedicated top-entity signal)
1027
| _TRAP_CODE_MTI_          | processor-internal machine timer overflow OR user-defined processor-external source (via dedicated top-entity signal)
1028
|=======================

.Misaligned Instruction Address Exception
[NOTE]
If the `C` extension is not implemented (32-bit instructions only), the misaligned instruction exception
is raised if bit 1 of the fetch address is set (i.e. the address is not on a 32-bit boundary). If the `C` extension is implemented,
a misaligned instruction exception is never raised.
In both cases bit 0 of the program counter (and all related CSRs) is hardwired to zero.
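
The alignment rule from the note above can be expressed as a small predicate. This is an illustrative C sketch only (the function name `fetch_is_misaligned` is hypothetical); in the CPU this check is performed in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if fetching from pc would raise a misaligned instruction
 * exception. c_ext: whether the C extension is implemented. */
static bool fetch_is_misaligned(uint32_t pc, bool c_ext) {
  pc &= ~UINT32_C(1);               /* bit 0 of the PC is hardwired to zero */
  if (c_ext) {
    return false;                   /* 16-bit alignment is always sufficient */
  }
  return (pc & UINT32_C(2)) != 0u;  /* bit 1 set: not on a 32-bit boundary */
}
```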

 
<<<
// ####################################################################################################################
:sectnums:
==== Bus Interface

The NEORV32 CPU implements a 32-bit machine with separate instruction and data interfaces, making the CPU a
**Harvard Architecture**: the _instruction fetch interface_ (`i_bus_*`) is used for fetching instructions and the
_data access interface_ (`d_bus_*`) is used to access data via load and store operations.
Each of these interfaces can access an address space of up to 2^32^ bytes (4GB).
The following table shows the signals of the data and instruction interfaces as seen from the CPU (`*_o` signals are driven
by the CPU / outputs, `*_i` signals are read by the CPU / inputs). Both interfaces use the same protocol.

.CPU bus interfaces
[cols="<2,^1,^1,<6"]
[options="header",grid="rows"]
|=======================
| Signal             | Width | Direction | Description
| `i/d_bus_addr_o`   | 32    | out       | access address
| `i/d_bus_rdata_i`  | 32    | in        | data input for read operations
| `i/d_bus_wdata_o`  | 32    | out       | data output for write operations
| `i/d_bus_ben_o`    | 4     | out       | byte enable signal for write operations
| `i/d_bus_we_o`     | 1     | out       | bus write access (always zero for instruction fetches)
| `i/d_bus_re_o`     | 1     | out       | bus read access
| `i/d_bus_lock_o`   | 1     | out       | exclusive access request
| `i/d_bus_ack_i`    | 1     | in        | accessed peripheral indicates a successful completion of the bus transaction
| `i/d_bus_err_i`    | 1     | in        | accessed peripheral indicates an error during the bus transaction
| `i/d_bus_fence_o`  | 1     | out       | this signal is set for one cycle when the CPU executes an instruction/data fence operation
| `i/d_bus_priv_o`   | 2     | out       | current CPU privilege level
|=======================

.Pipelined Transfers
[NOTE]
Currently, no pipelined or overlapping operations are implemented within the same bus interface.
Only a single transfer request can be "on the fly" (pending) at a time. However, this is no real drawback: the
minimal possible latency for a single access is two cycles, which equals the CPU's minimal execution latency
for a single instruction.

.Unaligned Memory Accesses
[NOTE]
Please note that the NEORV32 CPU does not support the handling of unaligned memory accesses _in hardware_. Any
unaligned memory access will raise an exception that can be used to handle such accesses in _software_.
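
As an illustration of the _software_ handling mentioned above, a misaligned-load exception handler could rebuild the requested word from four byte accesses, which are always naturally aligned. The sketch below assumes a little-endian, byte-addressed memory modelled by the hypothetical `mem` array; a real handler would additionally have to decode the faulting instruction, write the destination register and advance `mepc`, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulate a 32-bit load from a possibly misaligned address using only
 * byte reads. mem stands in for the byte-addressed memory. */
static uint32_t emulate_misaligned_lw(const uint8_t *mem, uint32_t addr) {
  uint32_t value = 0;
  for (size_t i = 0; i < 4; i++) {
    /* little-endian: byte at addr+i supplies bits [8*i+7 : 8*i] */
    value |= (uint32_t)mem[addr + i] << (8u * i);
  }
  return value;
}
```

For a memory holding the bytes `0x11, 0x22, 0x33, 0x44, 0x55, ...`, a word load from address 1 returns `0x55443322`.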

 
:sectnums:
===== Protocol

A bus request is triggered either by the `*_bus_re_o` signal (for reading data) or by the `*_bus_we_o` signal
(for writing data). In case of a request, one of these signals is high for exactly one cycle. The transaction is
completed when the accessed peripheral/memory sets either the `*_bus_ack_i` signal (-> successful completion) or the
`*_bus_err_i` signal (-> failed completion). These bus response signals are also active for only one cycle.
An error indicated by the `*_bus_err_i` signal will raise the corresponding "instruction bus access fault" or
"load/store bus access fault" exception.

 
**Minimal Response Latency**

The transfer can be completed directly in the same cycle in which it was initiated (via the `*_bus_re_o` or `*_bus_we_o`
signal) if the peripheral sets `*_bus_ack_i` or `*_bus_err_i` high for one cycle. However, such "asynchronous" completion
should be avoided in order to keep the critical path short. The default NEORV32 processor-internal modules provide
exactly **one cycle delay** between initiation and completion of transfers.
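
The protocol and the response latencies described above can be modelled from the CPU's point of view in a few lines of C. This is an illustrative model, not the RTL: cycle 0 is the cycle in which `*_bus_re_o` or `*_bus_we_o` pulses, the arrays hold the sampled `*_bus_ack_i` / `*_bus_err_i` values per cycle, and all type and function names are hypothetical. Giving `*_bus_err_i` priority when both responses appear in the same cycle is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { XFER_PENDING, XFER_OK, XFER_ERR } xfer_status_t;

typedef struct {
  xfer_status_t status;
  size_t        cycle;  /* completion cycle (n if still pending) */
} xfer_result_t;

/* ack_i[c] / err_i[c]: response signals sampled in cycle c, where cycle 0
 * is the cycle in which *_bus_re_o or *_bus_we_o is high. */
static xfer_result_t bus_wait(const bool *ack_i, const bool *err_i, size_t n) {
  for (size_t c = 0; c < n; c++) {
    if (err_i[c]) {                         /* failed completion -> access fault */
      return (xfer_result_t){XFER_ERR, c};
    }
    if (ack_i[c]) {                         /* successful completion */
      return (xfer_result_t){XFER_OK, c};
    }
  }
  return (xfer_result_t){XFER_PENDING, n};  /* no response: the CPU keeps waiting */
}
```

A response in cycle 0 corresponds to the same-cycle ("asynchronous") completion, while the default processor-internal modules answer in cycle 1.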

 
**Maximal Response Latency**

Processor-internal peripherals or memories do not have to respond within one cycle after a bus request has been initiated.
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window
is defined by the global `max_proc_int_response_time_c` constant (default = 15 cycles; processor's VHDL package file `rtl/core/neorv32_package.vhd`).
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer (neither the
`*_bus_ack_i` nor the `*_bus_err_i` signal set by the **processor-internal bus**) will time out and raise a **bus fault exception**.
The <<_internal_bus_monitor_buskeeper>> keeps track of all _internal_ bus transactions to enforce this time window.

If any bus operation times out (for example when accessing "address space holes"), the BUSKEEPER will issue a bus
error to the CPU that will raise the corresponding instruction fetch or data access bus exception.
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However,
the external memory bus interface also provides an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
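
The timeout rule can be sketched as a per-cycle counter. The model below is hypothetical and simplified: names such as `buskeeper_step` and `MAX_RESPONSE_TIME` are illustrative (the latter only mirrors the default value of `max_proc_int_response_time_c`), and the exact cycle in which the real BUSKEEPER raises the error may differ by one from this sketch.

```c
#include <stdbool.h>

/* Mirrors the default of max_proc_int_response_time_c (15 cycles). */
#define MAX_RESPONSE_TIME 15u

typedef struct {
  bool     busy;   /* an internal bus transfer is pending */
  unsigned count;  /* cycles elapsed since the request */
} buskeeper_t;

/* Evaluate one clock cycle. req: *_bus_re_o / *_bus_we_o pulse,
 * ack / err: response from the processor-internal bus.
 * Returns true if the keeper signals a bus error to the CPU in this cycle. */
static bool buskeeper_step(buskeeper_t *bk, bool req, bool ack, bool err) {
  if (req) {                 /* new transfer: start the response window */
    bk->busy  = true;
    bk->count = 0;
  }
  if (!bk->busy) {
    return false;
  }
  if (ack || err) {          /* properly terminated by the peripheral */
    bk->busy = false;
    return false;
  }
  if (bk->count >= MAX_RESPONSE_TIME) {  /* window exceeded: time out */
    bk->busy = false;
    return true;
  }
  bk->count++;
  return false;
}

/* Usage: a request that is never answered, e.g. to an address space hole.
 * Returns the cycle (relative to the request) of the injected bus error. */
static unsigned cycles_until_bus_error(void) {
  buskeeper_t bk = {false, 0};
  for (unsigned cycle = 0;; cycle++) {
    if (buskeeper_step(&bk, cycle == 0, false, false)) {
      return cycle;
    }
  }
}
```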

.Interface Response
[NOTE]
Please note that any CPU access via the data or instruction interface has to be terminated by asserting either the
CPU's `*_bus_ack_i` or `*_bus_err_i` signal. Otherwise the CPU will be stalled permanently. The BUSKEEPER ensures that
any kind of access is always properly terminated.

 
**Exemplary Bus Accesses**

.Example bus accesses: see read/write access description below
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::cpu_interface_read_long.png[read,300,150]
a| image::cpu_interface_write_long.png[write,300,150]
| Read access | Write access
|=======================

 
**Write Access**

For a write access, the access address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
transaction is completed. In the example, the accessed peripheral cannot answer directly in the next
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
cycles after issuing.

 
**Read Access**

For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
stable until the transaction is completed. In the example, the accessed peripheral cannot answer
directly in the next cycle after issuing. The peripheral has to apply the read data in the same cycle in which
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
signal).

 
**Access Boundaries**

The instruction interface will always access memory on word (= 32-bit) boundaries, even when fetching
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
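
For sub-word stores, the data interface selects the affected byte lanes of the 32-bit bus via `d_bus_ben_o`. The following C sketch shows one way to derive the byte-enable mask and place the store data in its lanes for a naturally aligned access. It assumes little-endian lane ordering (lane _i_ = bits `8*i+7 : 8*i`), the names are hypothetical, and it does not claim to reproduce how the NEORV32 RTL fills the lanes that are not enabled.

```c
#include <stdint.h>

typedef struct {
  uint8_t  ben;    /* byte enables, one bit per byte lane (d_bus_ben_o) */
  uint32_t wdata;  /* write data shifted into its lanes (d_bus_wdata_o) */
} store_lanes_t;

/* size: 1 (byte), 2 (half-word) or 4 (word); addr must be naturally aligned
 * (misaligned stores raise an exception before reaching the bus).
 * Only the lanes enabled in ben carry valid data. */
static store_lanes_t store_lanes(uint32_t addr, unsigned size, uint32_t data) {
  unsigned lane = addr & 3u;                     /* first byte lane */
  uint8_t  mask = (uint8_t)((1u << size) - 1u);  /* 0x1, 0x3 or 0xF */
  store_lanes_t r;
  r.ben   = (uint8_t)(mask << lane);
  r.wdata = data << (8u * lane);
  return r;
}
```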

 
**Exclusive (Atomic) Access**

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
combination. Normally, both instructions of such a combination should target the same memory address.

The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of
the memory system to manage this exclusive access reservation by storing the corresponding access address and
the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a _store-conditional instruction_ (`sc.w`), the _CPU-internal exclusive access lock_ is
evaluated to check if the exclusive access was successful. If the lock is still intact, the instruction will write back
zero and will allow the corresponding store operation to the memory system. If the lock is broken, the
instruction will write back a non-zero value and will not generate an actual memory store operation.

The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

* when executing any memory-access operation other than `lr.w`
* when any trap (sync. or async.) is triggered (for example to force a context switch)
* when the memory system signals a bus error (via the `bus_err_i` signal)
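
The lock behavior described above can be summarized as a tiny state model. This C sketch is illustrative only and all names are hypothetical. It additionally assumes that `sc.w` releases the lock whether or not it succeeds (as the RISC-V `A` extension specifies for the reservation) and uses 1 as the failure value; the reservation bookkeeping of the memory system is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  bool lock;  /* CPU-internal exclusive access lock (drives d_bus_lock_o) */
} excl_state_t;

/* lr.w: set the lock; the reservation itself is kept by the memory system. */
static void exec_lr_w(excl_state_t *s) {
  s->lock = true;
}

/* Any other memory access, any trap or a bus error breaks the lock. */
static void break_lock(excl_state_t *s) {
  s->lock = false;
}

/* sc.w: returns the value written back to rd. 0 = success (store performed),
 * 1 = failure (no memory write). *mem stands in for the target word.
 * Assumption: the lock is released after every sc.w. */
static uint32_t exec_sc_w(excl_state_t *s, uint32_t *mem, uint32_t value) {
  uint32_t rd = 1u;
  if (s->lock) {
    *mem = value;
    rd = 0u;
  }
  s->lock = false;
  return rd;
}

/* Usage: lr.w followed by sc.w, optionally with a trap in between. */
static uint32_t lr_sc_sequence(bool trap_in_between, uint32_t *mem) {
  excl_state_t s = {false};
  exec_lr_w(&s);
  if (trap_in_between) {
    break_lock(&s);  /* e.g. an interrupt forcing a context switch */
  }
  return exec_sc_w(&s, mem, 42u);
}
```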
 
[TIP]
For more information regarding the SoC-level behavior and requirements of atomic operations see
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.

 
**Memory Barriers**

Whenever the CPU executes a _fence_ instruction, the corresponding interface signal is set high for one cycle
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
memory system to perform the necessary operations (for example a cache flush and refill).

 
 
<<<
// ####################################################################################################################
:sectnums:
==== CPU Hardware Reset

In order to reduce routing constraints (and thus the actual hardware requirements), most uncritical
registers of the NEORV32 CPU, as well as most registers of the whole NEORV32 Processor, do not use **a
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
after power-up is not relevant for a defined CPU boot process.

 
**Rationale**

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a write-back
of the processing result to some kind of memory. The initial status of the data registers after power-up is
irrelevant as long as the status registers are all reset to a defined value that indicates that there is no valid data in
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
control the actual operation (in contrast to the status registers). This makes the pipeline data registers from
this example "uncritical registers".
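
The pipeline example can be made concrete with a small C model: the data registers start with arbitrary contents (modelled as a junk value), only the valid flags receive a defined reset value, and a write-back happens only for stages marked valid. This is a hypothetical illustration of the concept, not NEORV32 code; `STAGES` and all function names are made up for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STAGES 3  /* pipeline depth (arbitrary for this sketch) */

typedef struct {
  uint32_t data[STAGES];   /* N-bit data registers: no dedicated reset */
  bool     valid[STAGES];  /* 1-bit status registers: dedicated reset */
} pipe_t;

/* Power-up: data registers hold arbitrary values (modelled by junk),
 * only the status registers get a defined value ("no valid data"). */
static void pipe_power_up(pipe_t *p, uint32_t junk) {
  for (size_t i = 0; i < STAGES; i++) {
    p->data[i]  = junk;
    p->valid[i] = false;
  }
}

/* Advance one cycle. Returns true (and sets *wb_data) if the last stage
 * holds valid data, i.e. if a write-back to memory takes place. */
static bool pipe_step(pipe_t *p, bool in_valid, uint32_t in_data,
                      uint32_t *wb_data) {
  bool wb = p->valid[STAGES - 1];
  if (wb) {
    *wb_data = p->data[STAGES - 1];
  }
  for (size_t i = STAGES - 1; i > 0; i--) {  /* shift towards the end */
    p->data[i]  = p->data[i - 1];
    p->valid[i] = p->valid[i - 1];
  }
  p->data[0]  = in_data;
  p->valid[0] = in_valid;
  return wb;
}

/* Usage: while idle, the junk in the data registers is never written back. */
static unsigned idle_writebacks(void) {
  pipe_t p;
  uint32_t out = 0;
  unsigned n = 0;
  pipe_power_up(&p, UINT32_C(0xDEADBEEF));
  for (int c = 0; c < 10; c++) {
    n += pipe_step(&p, false, 0, &out) ? 1u : 0u;
  }
  return n;
}

/* Usage: a single valid input is written back after STAGES cycles. */
static int writeback_cycle(void) {
  pipe_t p;
  uint32_t out = 0;
  pipe_power_up(&p, UINT32_C(0xDEADBEEF));
  for (int c = 0; c < 10; c++) {
    if (pipe_step(&p, c == 0, UINT32_C(0x1234), &out)) {
      return (out == UINT32_C(0x1234)) ? c : -1;
    }
  }
  return -1;
}
```

In this model `idle_writebacks()` returns 0 even though every data register holds junk, because the reset valid flags suppress all write-backs.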

 
**NEORV32 CPU Reset**

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even control
and status registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
pipeline registers get initialized by the CPU's internal state machines, which are in turn initialized by the main
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).

During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
the lack of dedicated hardware resets of certain CSRs. For example, the machine interrupt-enable CSR <<_mie>>
does not provide a dedicated reset. The value of this register after reset is uncritical as interrupts cannot fire:
the global interrupt enable flag in the status register (`mstatus(mie)`) _does_ provide a dedicated
hardware reset, which sets this bit to low (globally disabling interrupts).
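
The argument above boils down to a logical AND: an interrupt is only taken if the global enable `mstatus(mie)` is set _and_ the interrupt is both enabled in `mie` and pending in `mip`. The C sketch below is a simplified model that considers machine mode only (the RISC-V rules differ when running in user mode); the bit position of `mstatus.MIE` (bit 3) is taken from the RISC-V privileged specification, and the function name `irq_taken` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_MIE (UINT32_C(1) << 3)  /* global machine interrupt enable */

/* Machine mode only: an interrupt is taken if it is globally enabled AND
 * at least one interrupt is both enabled (mie) and pending (mip). */
static bool irq_taken(uint32_t mstatus, uint32_t mie, uint32_t mip) {
  return ((mstatus & MSTATUS_MIE) != 0u) && ((mie & mip) != 0u);
}
```

With `mstatus(mie)` reset to zero, `irq_taken()` is false for any (undefined) content of `mie`, which is why `mie` itself needs no hardware reset.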

 
**Reset Configuration**

Most CPU-internal registers do provide an asynchronous reset in the VHDL code, but the "don't care" value
(VHDL `'-'`) is used for initialization of all uncritical registers, effectively generating a flip-flop without a
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all CPU registers can
be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):

[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers --
-- FALSE=reset value is irrelevant (might simplify HW), default; TRUE=defined LOW reset value
constant dedicated_reset_c : boolean := false;
----
