:sectnums:
== NEORV32 Central Processing Unit (CPU)

image::neorv32_cpu_block.png[width=600,align=center]

**Section Structure**

* <<_architecture>>, <<_full_virtualization>> and <<_risc_v_compatibility>>
* <<_cpu_top_entity_signals>> and <<_cpu_top_entity_generics>>
* <<_instruction_sets_and_extensions>>, <<_custom_functions_unit_cfu>> and <<_instruction_timing>>
* <<_control_and_status_registers_csrs>>
* <<_traps_exceptions_and_interrupts>>
* <<_bus_interface>>
 
**Key Features**

* 32-bit little-endian, multi-cycle, in-order `rv32` RISC-V CPU
* Compatible with the RISC-V **Privileged Architecture - Machine ISA Version 1.12** specifications
* Available <<_instruction_sets_and_extensions>>:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `I` - integer base ISA (always enabled)
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zicntr` - CPU base counters
** `Zihpm` - hardware performance monitors
** `Zifencei` - instruction stream synchronization
** `Zmmul` - integer multiplication hardware
** `Zxcfu` - custom instructions extension
** `PMP` - physical memory protection
** `Debug` - <<_cpu_debug_mode>> (part of the on-chip debugger) including hardware <<_trigger_module>>
* <<_risc_v_compatibility>>: Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications - passes the official RISC-V Architecture Tests (v2+)
* Official RISC-V open-source architecture ID
* Supports _all_ of the machine-level <<_traps_exceptions_and_interrupts>> from the RISC-V specifications (including bus access exceptions and all unimplemented/illegal/malformed instructions)
** This is a special aspect of _execution safety_ provided by <<_full_virtualization>>
** Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 custom _fast_ interrupts
* Optional physical memory protection (PMP), compatible with the RISC-V specifications
* Optional hardware performance monitors (HPM) for application benchmarking
* Separate <<_bus_interface>>s for instruction fetch and data access
 
[NOTE]
It is recommended to use the **NEORV32 Processor** as the default top instance even if you only want to use the actual
CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU
wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This
setup also allows you to keep using the default bootloader and software framework. From this base you
can start building your own SoC. Of course, you can also use the CPU in its true stand-alone mode.
 
[NOTE]
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

<<<
// ####################################################################################################################
:sectnums:
=== Architecture
 
The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture
specifications. The following figure shows the simplified architecture of the CPU.

image::neorv32_cpu.png[align=center]
 
The CPU implements a _multi-cycle_ architecture. Hence, each instruction is executed as a series of consecutive
micro-operations. In order to increase performance, the CPU's **front-end** (instruction fetch) and **back-end**
(instruction execution) are decoupled via a FIFO (the "instruction prefetch buffer"). Therefore, the
front-end can already fetch new instructions while the back-end is still processing previously-fetched instructions.

The front-end is responsible for fetching 32-bit chunks of instruction words (one aligned 32-bit instruction,
two 16-bit instructions or a mixture if 32-bit instructions are not aligned to 32-bit boundaries). The instruction
data is stored in a FIFO queue - the instruction prefetch buffer.

The back-end is responsible for the actual execution of the instructions. It includes an "issue engine",
which takes data from the instruction prefetch buffer and assembles 32-bit instruction words (plain 32-bit
instructions or decompressed 16-bit instructions) for execution.
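
The assembly step of the issue engine can be illustrated with a small behavioral model. The following Python sketch is only an illustration and not the actual hardware implementation. The function name `split_instructions` is hypothetical. It relies on the standard RISC-V encoding rule that an instruction whose two lowest bits are `11` is 32 bits wide, while any other value marks a 16-bit compressed instruction.

```python
def split_instructions(words):
    """Split aligned 32-bit fetch words into (width, instruction) tuples.

    Hypothetical behavioral model: `words` is a list of 32-bit little-endian
    fetch chunks as they would sit in the instruction prefetch buffer.
    """
    # Break every 32-bit chunk into its low and high 16-bit halves.
    halves = []
    for w in words:
        halves.append(w & 0xFFFF)
        halves.append((w >> 16) & 0xFFFF)

    result = []
    i = 0
    while i < len(halves):
        low = halves[i]
        if (low & 0b11) == 0b11:
            # 32-bit instruction: the next half-word is the upper part.
            if i + 1 >= len(halves):
                break  # upper half has not been fetched yet
            result.append((32, (halves[i + 1] << 16) | low))
            i += 2
        else:
            # 16-bit compressed instruction (would be decompressed next).
            result.append((16, low))
            i += 1
    return result
```

For example, the two fetch words `0x00130001` and `0x00010000` contain a `c.nop`, an unaligned 32-bit `addi x0, x0, 0` that straddles both words, and a second `c.nop`.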
 
Front-end and back-end operate in parallel and with overlapping operations. Hence, the optimal CPI
(cycles per instruction) is 2, but it can be significantly higher: for instance when executing loads/stores
(accessing memory-mapped devices with high latency), executing multi-cycle ALU operations (like divisions) or
when the CPU front-end has to reload the prefetch buffer due to a taken branch.

Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage
requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture, which executes
every single instruction (_including_ fetch) in a series of consecutive micro-operations. The combination of
these two classical design paradigms allows a higher instruction throughput than a pure multi-cycle
approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the
multi-cycle concept).

As a Von Neumann machine, the CPU uses a single address space even though it provides independent interfaces for
instruction fetch and data access. These two bus interfaces are merged into a single processor-internal bus via a
prioritizing bus switch (data accesses have higher priority). Hence, ALL memory locations including peripheral devices
are mapped to a single unified 32-bit address space.
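
The priority rule of the bus switch can be written down in a few lines. This is a minimal sketch under the assumption that both interfaces simply raise a request flag in the same cycle. The function name `bus_switch` and the returned strings are hypothetical and only serve as illustration.

```python
def bus_switch(data_req, instr_req):
    """Return which requester is granted the shared bus in this cycle.

    Hypothetical model: each argument is True if that CPU interface
    requests a bus access. Data accesses win if both request at once.
    """
    if data_req:
        return "data"
    if instr_req:
        return "instruction"
    return None
```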
 
// ####################################################################################################################
:sectnums:
=== Full Virtualization
 
Just like the RISC-V ISA, the NEORV32 aims to provide _maximum virtualization_ capabilities on CPU and SoC level to
allow a high standard of **execution safety**. The CPU supports **all** traps specified by the official RISC-V specifications.
footnote:[If the `Zicsr` CPU extension is enabled (implementing the full set of the privileged architecture).]
Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situation (e.g. executing a
malformed instruction or accessing a non-allocated memory address). For any kind of trap the core is always in a
defined and fully synchronized state throughout the whole architecture (i.e. there are no out-of-order operations that
might have to be reverted). This allows a defined and predictable execution behavior at any time, improving overall execution safety.
 
**Execution Safety - NEORV32 Virtualization Features**

* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system
(i.e. there is no speculative execution / no out-of-order states).
* The CPU supports _all_ RISC-V compatible bus exceptions including access exceptions, which are triggered if an
accessed address does not respond or encounters an internal device error during access.
* Accessed memory addresses (plain memory, but also memory-mapped devices) need to respond within a fixed time
window. Otherwise a bus access exception is raised.
* The RISC-V specs. state that executing a malformed instruction results in unpredictable behavior. As an additional
execution safety feature the NEORV32 CPU ensures that _all_ unimplemented/malformed/illegal instructions raise an
illegal instruction exception and do not commit any state-changing operation (like writing registers or triggering
memory operations).
* To be continued...
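
The acknowledge / error / timeout behavior of a bus access described in the list above can be sketched as follows. This is a simplified model and not the actual bus logic. The function name `bus_access`, the response strings and the default `timeout_cycles=16` are assumptions made for illustration; the real time window is defined by the processor's bus infrastructure.

```python
def bus_access(responses, timeout_cycles=16):
    """Model one bus transaction as seen by the CPU.

    `responses` yields one string per clock cycle: "ack", "err" or ""
    (no response yet). `timeout_cycles` is an assumed illustration value.
    """
    for cycle, resp in enumerate(responses, start=1):
        if cycle > timeout_cycles:
            break  # time window elapsed without any response
        if resp == "ack":
            return "ok"
        if resp == "err":
            return "bus access exception (device error)"
    return "bus access exception (timeout)"
```

In every case the access terminates in a defined state: either it completes, or it ends in an exception that software can handle.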
 
// ####################################################################################################################
:sectnums:
=== RISC-V Compatibility
 
The NEORV32 CPU passes the tests of the _RISC-V Architecture Test Framework_. This framework is used to check
RISC-V implementations for compatibility with the official RISC-V ISA specifications.
The NEORV32 port of this test framework has been moved to a separate repository:
https://github.com/stnolting/neorv32-verif
 
.**RISC-V `rv32_m/C` Tests**
...................................
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32
...................................
 
.**RISC-V `rv32_m/I` Tests**
...................................
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... IGNORED <1>
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
Check fence-01          ... OK
--------------------------------
OK: 39/39 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
...................................
<1> Test is skipped due to a GHDL simulation issue.
 
.**RISC-V `rv32_m/M` Tests**
...................................
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
...................................
 
.**RISC-V `rv32_m/privilege` Tests**
...................................
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
...................................
 
.**RISC-V `rv32_m/Zifencei` Tests**
...................................
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32
...................................
 
<<<
:sectnums:
==== RISC-V Incompatibility Issues and Limitations
 
This list shows the currently identified issues regarding full RISC-V compatibility.

.Read-Only "Read-Write" CSRs
[IMPORTANT]
The <<_misa>> and <<_mtval>> CSRs in the NEORV32 are _read-only_.
To maintain RISC-V compatibility, any machine-mode write access to them is ignored and will _not_ cause any
exceptions or side-effects.

.Physical Memory Protection
[IMPORTANT]
The RISC-V-compatible NEORV32 <<_machine_physical_memory_protection_csrs>> only implement the **TOR**
(top of region) mode and only up to 16 PMP regions. Furthermore, the <<_pmpcfg>>'s _lock bits_ only lock
the corresponding PMP entry and not the entries below. All region rules are checked in parallel **without**
prioritization, so for identical memory regions the most restrictive PMP rule will be enforced.
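
The TOR matching and the "most restrictive rule" behavior can be illustrated with a short model. This is a sketch under stated assumptions, not the hardware implementation. The function name `pmp_check` is hypothetical, the bit positions follow the RISC-V privileged specification, and combining all matching regions by AND-ing their permission bits is this sketch's interpretation of the non-prioritized parallel check. Only user-mode accesses are modeled.

```python
PMP_R, PMP_W, PMP_X = 0x01, 0x02, 0x04   # permission bits in pmpcfg
PMP_A_MASK, PMP_A_TOR = 0x18, 0x08       # address-matching mode field

def pmp_check(addr, access, pmpcfg, pmpaddr):
    """Return True if a user-mode access to byte address `addr` is allowed.

    `access` is one of PMP_R, PMP_W, PMP_X. `pmpcfg` and `pmpaddr` are
    lists of equal length; pmpaddr entries hold address bits [33:2].
    """
    word_addr = addr >> 2
    allowed = None  # None = no region has matched so far
    for i, cfg in enumerate(pmpcfg):
        if (cfg & PMP_A_MASK) != PMP_A_TOR:
            continue  # entry is off (only TOR mode is modeled)
        base = pmpaddr[i - 1] if i > 0 else 0
        if base <= word_addr < pmpaddr[i]:
            perms = cfg & (PMP_R | PMP_W | PMP_X)
            # Assumption: all matching rules are combined, so the most
            # restrictive permission set wins.
            allowed = perms if allowed is None else (allowed & perms)
    # User-mode accesses that match no region are denied.
    return allowed is not None and bool(allowed & access)
```

With two entries covering the identical range, one granting read/write/execute and the other granting read only, a write to that range is rejected in this model.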
 
.Atomic Memory Operations
[IMPORTANT]
The `A` CPU extension currently only implements the `lr.w` and `sc.w` instructions.
However, these instructions are sufficient to emulate all further atomic memory operations.

.No HW-Support of Misaligned Memory Accesses
[WARNING]
The CPU does not resolve unaligned memory accesses in hardware. This is not a
RISC-V compatibility issue but an important thing to know. Any kind of unaligned memory access
will raise an exception to allow a software-based emulation.
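
As an illustration of what such a software-based emulation has to do, the following Python sketch first detects a misaligned access and then assembles a 32-bit little-endian word from individual byte accesses. Both function names and the `memory` object (a plain `bytes` value standing in for the address space) are hypothetical; a real trap handler would be written for the target and would additionally decode the faulting instruction.

```python
def is_misaligned(addr, size):
    """True if an access of `size` bytes (1, 2 or 4) at `addr` is unaligned."""
    return (addr % size) != 0

def load_word_emulated(memory, addr):
    """Emulate a misaligned 32-bit little-endian load via four byte loads."""
    value = 0
    for i in range(4):
        # Byte accesses are always aligned, so none of them can trap.
        value |= memory[addr + i] << (8 * i)
    return value
```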
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The
type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal
direction as seen from the CPU.
 
.NEORV32 CPU top entity signals
[cols="<2,^1,^1,<6"]
[options="header", grid="rows"]
|=======================
| Signal           | Width | Dir. | Description
4+^| **Global Signals**
| `clk_i`          |     1 | in  | global clock line, all registers triggering on rising edge
| `rstn_i`         |     1 | in  | global reset, low-active
| `sleep_o`        |     1 | out | CPU is in sleep mode when set
| `debug_o`        |     1 | out | CPU is in debug mode when set
4+^| **Instruction <<_bus_interface>>**
| `i_bus_addr_o`   |    32 | out | access address
| `i_bus_rdata_i`  |    32 | in  | read data
| `i_bus_wdata_o`  |    32 | out | write data (always zero)
| `i_bus_ben_o`    |     4 | out | byte enable
| `i_bus_we_o`     |     1 | out | write transaction (always zero)
| `i_bus_re_o`     |     1 | out | read transaction
| `i_bus_lock_o`   |     1 | out | exclusive access request (always zero)
| `i_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `i_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `i_bus_fence_o`  |     1 | out | indicates an executed `fence.i` instruction
| `i_bus_priv_o`   |     1 | out | current _effective_ CPU privilege level (`0` user, `1` machine or debug)
4+^| **Data <<_bus_interface>>**
| `d_bus_addr_o`   |    32 | out | access address
| `d_bus_rdata_i`  |    32 | in  | read data
| `d_bus_wdata_o`  |    32 | out | write data
| `d_bus_ben_o`    |     4 | out | byte enable
| `d_bus_we_o`     |     1 | out | write transaction
| `d_bus_re_o`     |     1 | out | read transaction
| `d_bus_lock_o`   |     1 | out | exclusive access request
| `d_bus_ack_i`    |     1 | in  | bus transfer acknowledge from accessed peripheral
| `d_bus_err_i`    |     1 | in  | bus transfer terminate from accessed peripheral
| `d_bus_fence_o`  |     1 | out | indicates an executed `fence` instruction
| `d_bus_priv_o`   |     1 | out | current _effective_ CPU privilege level (`0` user, `1` machine or debug)
4+^| **System Time (for <<_timeh>> CSR)**
| `time_i`         |    64 | in  | system time input from <<_machine_system_timer_mtime>>
4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)**
| `msw_irq_i`      |     1 | in  | RISC-V machine software interrupt
| `mext_irq_i`     |     1 | in  | RISC-V machine external interrupt
| `mtime_irq_i`    |     1 | in  | RISC-V machine timer interrupt
4+^| **Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)**
| `firq_i`         |    16 | in  | fast interrupt request signals
4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)**
| `db_halt_req_i`  |     1 | in  | request CPU to halt and enter debug mode
|=======================
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>)
and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the
NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration.
The _specific_ generics are listed below.
 
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | _no default value_
3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this
generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction
memory (IMEM) if the bootloader is disabled (_INT_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | _no default value_
3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address
of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information.
|======

[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | _no default value_
3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information.
|======
 
<<<
// ####################################################################################################################
:sectnums:
=== Instruction Sets and Extensions

The basic NEORV32 is a RISC-V `rv32i` architecture that provides several _optional_ RISC-V CPU and ISA
(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please
see _The RISC-V Instruction Set Manual - Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual -
Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder.
 
.Discovering ISA Extensions
[TIP]
The CPU can discover available ISA extensions via the <<_misa>> & <<_mxisa>> CSRs
or by executing an instruction and checking for an _illegal instruction exception_
(-> <<_full_virtualization>>). +
 +
Executing an instruction from an extension that is not supported yet or that is currently not enabled
(via the corresponding top entity generic) will raise an illegal instruction exception.
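
Decoding the extension bits of a `misa` value is straightforward, because the RISC-V privileged specification assigns bit 0 to `A`, bit 1 to `B` and so on up to bit 25 for `Z`. The following Python sketch shows the decoding on the host side. The function name `decode_misa` is hypothetical and the example value below is constructed by hand for illustration, not read from a real device.

```python
def decode_misa(misa):
    """Return the extension letters whose bits are set in `misa`.

    Bit 0 corresponds to 'A', bit 1 to 'B', ... bit 25 to 'Z'.
    The upper bits (MXL field) are ignored here.
    """
    return [chr(ord("A") + bit) for bit in range(26) if (misa >> bit) & 1]
```

For example, `decode_misa(0x40801104)` yields the letters `C`, `I`, `M` and `X`.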
 
==== **`A`** - Atomic Memory Access

Atomic memory access instructions allow more sophisticated memory operations like implementing semaphores and mutexes.
The RISC-V specs. define a specific _atomic_ extension that provides instructions for atomic memory accesses. The `A`
ISA extension is enabled if the <<_cpu_extension_riscv_a>> configuration generic is _true_.
In this case the following additional instructions are available:

* `lr.w`: load-reserved
* `sc.w`: store-conditional

[NOTE]
Even though only the `lr.w` and `sc.w` instructions are implemented yet, all further atomic operations
(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the
instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet
implemented) AMO (atomic memory operation) will raise an illegal instruction exception.

The *load-reserved* instruction behaves as a "normal" load-word instruction (`lw`) but will also set a CPU-internal
_data memory access lock_. Executing a *store-conditional* behaves as a "normal" store-word instruction (`sw`) that will
only conduct an actual memory write operation if the lock is still intact. Additionally, the store-conditional instruction
will also return the lock state (returns zero if the lock is still intact or non-zero if the lock has been broken).
After the execution of the `sc.w` instruction, the lock is automatically removed.

The lock is broken if at least one of the following conditions occurs:

. executing any data memory access instruction other than `lr.w`
. raising _any_ trap (for example an interrupt or a memory access exception)
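
The lock rules above, and the way a further atomic operation can be emulated with `lr.w` and `sc.w`, can be summarized in a small behavioral model. This Python sketch is an illustration only: the class `LrScModel`, its method names and the dictionary-based memory are hypothetical, and the retry loop in `amoadd_w` shows the general emulation pattern rather than code from the NEORV32 software framework.

```python
class LrScModel:
    """Tiny behavioral model of the lr.w / sc.w data memory access lock."""

    def __init__(self, memory):
        self.memory = memory      # dict: word address -> 32-bit value
        self.lock = False         # CPU-internal data memory access lock

    def lr_w(self, addr):
        self.lock = True          # lr.w sets the lock
        return self.memory.get(addr, 0)

    def sc_w(self, addr, value):
        ok = self.lock
        if ok:                    # write only if the lock is still intact
            self.memory[addr] = value & 0xFFFFFFFF
        self.lock = False         # the lock is always removed after sc.w
        return 0 if ok else 1     # zero = success, non-zero = failure

    def lw(self, addr):
        self.lock = False         # any other data access breaks the lock
        return self.memory.get(addr, 0)

    def sw(self, addr, value):
        self.lock = False
        self.memory[addr] = value & 0xFFFFFFFF

    def trap(self):
        self.lock = False         # any trap breaks the lock


def amoadd_w(cpu, addr, increment):
    """Emulate an atomic add via an lr.w/sc.w retry loop; returns old value."""
    while True:
        old = cpu.lr_w(addr)
        if cpu.sc_w(addr, old + increment) == 0:
            return old
```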
 
[NOTE]
The atomic instructions have special requirements for the memory system / bus interconnect. More
information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively.
 
==== **`B`** - Bit-Manipulation Operations

The `B` ISA extension adds instructions for bit-manipulation operations. This extension is enabled if the
<<_cpu_extension_riscv_b>> configuration generic is _true_.
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-bitmanip
A copy of the spec is also available in `docs/references`.

The NEORV32 `B` ISA extension includes the following sub-extensions (according to the RISC-V
bit-manipulation spec. v0.93) and their corresponding instructions:

* **`Zba` - Address-generation instructions**
** `sh1add` `sh2add` `sh3add`
* **`Zbb` - Basic bit-manipulation instructions**
** `andn` `orn` `xnor`
** `clz` `ctz` `cpop`
** `max` `maxu` `min` `minu`
** `sext.b` `sext.h` `zext.h`
** `rol` `ror` `rori`
** `orc.b` `rev8`
* **`Zbc` - Carry-less multiplication instructions**
** `clmul` `clmulh` `clmulr`
* **`Zbs` - Single-bit instructions**
** `bclr` `bclri`
** `bext` `bexti`
** `binv` `binvi`
** `bset` `bseti`
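
To give a feeling for what some of these instructions compute, the following Python functions model a handful of them for 32-bit operands. This is a reference sketch written from the instruction definitions in the RISC-V bit-manipulation specification; it is neither the NEORV32 intrinsic library nor the hardware implementation, and the Python function names merely mirror the instruction mnemonics.

```python
MASK = 0xFFFFFFFF  # all values are treated as 32-bit registers

def sh1add(rs1, rs2):
    """Zba: rs2 + (rs1 << 1), e.g. for indexing 16-bit arrays."""
    return (rs2 + (rs1 << 1)) & MASK

def andn(rs1, rs2):
    """Zbb: rs1 AND (NOT rs2)."""
    return rs1 & ~rs2 & MASK

def clz(rs1):
    """Zbb: count leading zero bits (32 for an all-zero operand)."""
    return 32 - (rs1 & MASK).bit_length()

def cpop(rs1):
    """Zbb: count set bits."""
    return bin(rs1 & MASK).count("1")

def rol(rs1, rs2):
    """Zbb: rotate left by the lowest five bits of rs2."""
    rs1 &= MASK
    sh = rs2 & 31
    return ((rs1 << sh) | (rs1 >> ((32 - sh) & 31))) & MASK

def rev8(rs1):
    """Zbb: reverse the byte order."""
    return int.from_bytes((rs1 & MASK).to_bytes(4, "little"), "big")

def bext(rs1, rs2):
    """Zbs: extract the single bit selected by the lowest five bits of rs2."""
    return ((rs1 & MASK) >> (rs2 & 31)) & 1
```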
 
[TIP]
By default, the bit-manipulation unit uses an _iterative_ approach to compute shift-related operations
like `clz` and `rol`. To increase performance (at the cost of additional hardware resources) the
<<_fast_shift_en>> generic can be enabled to implement fully parallel logic (like barrel shifters) for all
shift-related `B` instructions.

[WARNING]
The `B` extension is frozen and officially ratified. However, there is no
software support for this extension in the upstream GCC RISC-V port yet. To circumvent this, an
intrinsic library is provided to utilize the `B` extension features from C-language
code (see `sw/example/bitmanip_test`).
 
==== **`C`** - Compressed Instructions

The _compressed_ ISA extension provides 16-bit encodings of commonly used instructions to reduce code size.
The `C` extension is available when the <<_cpu_extension_riscv_c>> configuration generic is _true_.
In this case the following instructions are available:

* `c.addi4spn` `c.lw` `c.sw` `c.nop` `c.addi` `c.jal` `c.li` `c.addi16sp` `c.lui` `c.srli` `c.srai` `c.andi` `c.sub`
`c.xor` `c.or` `c.and` `c.j` `c.beqz` `c.bnez` `c.slli` `c.lwsp` `c.jr` `c.mv` `c.ebreak` `c.jalr` `c.add` `c.swsp`

[NOTE]
When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ instruction require
an additional instruction fetch to load the second half-word of that instruction. The performance can be increased
again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`,
`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile).
 
==== **`E`** - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to
decrease physical hardware requirements (for example block RAM). This extension is enabled when the <<_cpu_extension_riscv_e>>
configuration generic is _true_. Accesses to registers beyond `x15` will raise an _illegal instruction exception_.
This extension does not add any additional instructions or features.

[NOTE]
Due to the reduced register file size an alternate toolchain ABI (**`ilp32e`**) is required.
 
==== **`I`** - Base Integer ISA

The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled
regardless of the setting of the remaining extensions. The base instruction set includes the following
instructions:

* immediate: `lui` `auipc`
* jumps: `jal` `jalr`
* branches: `beq` `bne` `blt` `bge` `bltu` `bgeu`
* memory: `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw`
* alu: `addi` `slti` `sltiu` `xori` `ori` `andi` `slli` `srli` `srai` `add` `sub` `sll` `slt` `sltu` `xor` `srl` `sra` `or` `and`
* environment: `ecall` `ebreak` `fence`

[NOTE]
In order to keep the hardware footprint low, the CPU's shift unit uses a bit-serial approach. Hence, shift operations
take up to 32 cycles (plus overhead) depending on the actual shift amount. Alternatively, the shift operations can be processed
completely in parallel by a fast (but large) barrel shifter if the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations
complete within 2 cycles (plus overhead) regardless of the actual shift amount.
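
Why the execution time of the bit-serial shifter depends on the shift amount can be seen from the following sketch. It is a simplified model and not the actual shifter logic: the function name `serial_shift_left` is hypothetical, one loop iteration stands for one shift step, and the additional overhead cycles mentioned above are not modeled.

```python
def serial_shift_left(value, shamt):
    """Shift `value` left by `shamt` bits, one bit position per iteration.

    Returns a tuple (result, iterations); the number of iterations grows
    linearly with the shift amount.
    """
    value &= 0xFFFFFFFF
    iterations = 0
    for _ in range(shamt & 31):   # RV32 uses the lowest five bits only
        value = (value << 1) & 0xFFFFFFFF
        iterations += 1
    return value, iterations
```

A barrel shifter, in contrast, produces the same result in a fixed number of steps regardless of `shamt`.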
 
[NOTE]
Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the
top's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware.
 
==== **`M`** - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division operations are available when the
<<_cpu_extension_riscv_m>> configuration generic is _true_. In this case the following instructions are
available:

* multiplication: `mul` `mulh` `mulhsu` `mulhu`
* division: `div` `divu` `rem` `remu`

[NOTE]
By default, multiplication and division operations are executed in a bit-serial approach.
Alternatively, the multiplier core can be implemented using DSP blocks if the <<_fast_mul_en>>
generic is _true_, allowing faster execution. Multiplications and divisions
always require a fixed number of cycles to complete - regardless of the input operands.

[NOTE]
Regardless of the setting of the <<_fast_mul_en>> generic,
the timing of multiplication and division instructions is _independent_ of the input operands.
Hence, there is **no early completion** of multiply by one/zero and divide by zero operations.
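
The fixed, operand-independent execution time of a bit-serial multiplier follows directly from its structure. The sketch below uses a classic shift-and-add scheme as an illustration; whether the NEORV32 multiplier uses exactly this scheme is an assumption, and the function name `serial_mul` is hypothetical.

```python
def serial_mul(a, b):
    """Unsigned 32x32-bit multiplication via shift-and-add.

    Returns a tuple (64-bit product, iterations). The loop always runs
    32 times, so there is no early completion for operands like 0 or 1.
    """
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    product = 0
    iterations = 0
    for bit in range(32):
        if (b >> bit) & 1:
            product += a << bit   # add the shifted multiplicand
        iterations += 1
    return product, iterations
```

In this model, the lower 32 bits of the product correspond to the result of `mul` and the upper 32 bits to the result of `mulhu`.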
 
540 73 zero_gravi
 
541 61 zero_gravi
==== **`Zmmul`** - Integer Multiplication
542
 
543
This is a _sub-extension_ of the `M` ISA extension. It implements the multiplication-only operations
544 65 zero_gravi
of the `M` extensions and is intended for size-constrained setups that require hardware-based
545 61 zero_gravi
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
546 65 zero_gravi
This extension requires only ~50% of the hardware utilization of the "full" `M` extension.
547 72 zero_gravi
It is implemented if the <<_cpu_extension_riscv_zmmul>> configuration generic is _true_.
548 61 zero_gravi
 
549 70 zero_gravi
* multiplication: `mul` `mulh` `mulhsu` `mulhu`
550 61 zero_gravi
 
551 63 zero_gravi
If `Zmmul` is enabled, executing any division instruction from the `M` ISA extension (`div`, `divu`, `rem`, `remu`)
552
will raise an _illegal instruction exception_.
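
With `Zmmul`, division has to be provided by software. The following is a minimal sketch of a shift-and-subtract (restoring) unsigned division, roughly what a runtime library routine does. The names `soft_divu` and `udiv32_t` are hypothetical and not part of the NEORV32 software framework; a real toolchain links its own routine.

```c
#include <stdint.h>

/* Quotient/remainder pair of an unsigned 32-bit division (hypothetical type). */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv32_t;

/* Restoring division: one quotient bit per loop iteration, 32 iterations. */
udiv32_t soft_divu(uint32_t dividend, uint32_t divisor)
{
    uint64_t rem  = 0; /* 64-bit so that the left shift cannot overflow */
    uint32_t quot = 0;

    for (int i = 31; i >= 0; i--) {
        /* shift in the next dividend bit, MSB first */
        rem = (rem << 1) | ((dividend >> i) & 1u);
        if (rem >= divisor) {
            rem  -= divisor;
            quot |= (1u << i);
        }
    }

    udiv32_t result = { quot, (uint32_t)rem };
    return result;
}
```

For a zero divisor this loop yields an all-ones quotient and the dividend as remainder, which is also what the RISC-V `divu`/`remu` instructions define for division by zero.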
553 61 zero_gravi
 
554 63 zero_gravi
Note that `M` and `Zmmul` extensions _cannot_ be enabled at the same time.
555 61 zero_gravi
 
556
[TIP]
557
If your RISC-V GCC toolchain does not (yet) support the `_Zmmul` ISA extension, it can be "emulated"
558
using a `rv32im` machine architecture and setting the `-mno-div` compiler flag
559 65 zero_gravi
(example `$ make MARCH=rv32im USER_FLAGS+=-mno-div clean_all exe`).
560 61 zero_gravi
 
561
 
562 60 zero_gravi
==== **`U`** - Less-Privileged User Mode
563
 
564 65 zero_gravi
In addition to the basic (and highest-privileged) machine-mode, the _user-mode_ ISA extension adds a second less-privileged
565 72 zero_gravi
operation mode. It is implemented if the <<_cpu_extension_riscv_u>> configuration generic is _true_.
566 65 zero_gravi
Code executed in user-mode cannot access machine-mode CSRs. Furthermore, user-mode access to the address space (like
567
peripheral/IO devices) can be constrained via the physical memory protection (_PMP_).
568 72 zero_gravi
Any kind of privilege rights violation will raise an exception to allow <<_full_virtualization>>.
569 60 zero_gravi
 
570 73 zero_gravi
Additional CSRs:
571 60 zero_gravi
 
572 73 zero_gravi
* <<_mcounteren>> - machine counter enable to constrain user-mode access to timer/counter CSRs
573
 
574
 
575 60 zero_gravi
==== **`X`** - NEORV32-Specific (Custom) Extensions
576
 
577 72 zero_gravi
The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the <<_misa>> CSR.
578 60 zero_gravi
 
579 63 zero_gravi
The most important points of the NEORV32-specific extensions are:
580 73 zero_gravi
* The CPU provides 16 _fast interrupt_ request lines (`FIRQ`), which are controlled via custom bits in the <<_mie>>
and <<_mip>> CSRs. These extensions are mapped to CSR bits that are available for custom use according to the
RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see <<_full_virtualization>>).
584 72 zero_gravi
* There are <<_neorv32_specific_csrs>>.
585 60 zero_gravi
 
586
 
587 63 zero_gravi
==== **`Zfinx`** Single-Precision Floating-Point Operations
588 60 zero_gravi
 
589 65 zero_gravi
The `Zfinx` floating-point extension is an _alternative_ to the standard `F` floating-point ISA extension.
The `Zfinx` extension uses the integer register file `x` to store and operate on floating-point data
591
instead of a dedicated floating-point register file (hence, `F-in-x`). Thus, the `Zfinx` extension requires
592
less hardware resources and features faster context changes. This also implies that there are NO dedicated `f`
593
register file-related load/store or move instructions.
594
The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx
595 60 zero_gravi
 
596 70 zero_gravi
[NOTE]
597 60 zero_gravi
The NEORV32 floating-point unit used by the `Zfinx` extension is compatible to the _IEEE-754_ specifications.
598
 
599 65 zero_gravi
The `Zfinx` extension only supports single-precision (`.s` instruction suffix), so it is a direct alternative
600 72 zero_gravi
to the `F` extension. The `Zfinx` extension is implemented when the <<_cpu_extension_riscv_zfinx>> configuration
601 60 zero_gravi
generic is _true_. In this case the following instructions and CSRs are available:
602
 
603 70 zero_gravi
* conversion: `fcvt.s.w` `fcvt.s.wu` `fcvt.w.s` `fcvt.wu.s`
604
* comparison: `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s`
605
* computational: `fadd.s` `fsub.s` `fmul.s`
606
* sign-injection: `fsgnj.s` `fsgnjn.s` `fsgnjx.s`
607 60 zero_gravi
* number classification: `fclass.s`
608
 
609 73 zero_gravi
* compressed instructions: `c.flw` `c.flwsp` `c.fsw` `c.fswsp`
610 60 zero_gravi
 
611 73 zero_gravi
Additional CSRs:
612
 
613
* <<_fcsr>> - FPU control register
614
* <<_frm>> - rounding mode control
615
* <<_fflags>> - FPU status flags
616
 
617 60 zero_gravi
[WARNING]
618
Fused multiply-add instructions `f[n]m[add/sub].s` are not supported!
619
Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet!
620
 
621
[WARNING]
622 65 zero_gravi
Subnormal numbers ("de-normalized" numbers) are not supported by the NEORV32 FPU.
623
Subnormal numbers (exponent = 0) are _flushed to zero_ setting them to +/- 0 before entering the
624 60 zero_gravi
FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the
625
result is also flushed to zero during normalization.
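
The flush-to-zero behavior for input operands can be sketched on the raw bit pattern of a single-precision value. This is only an illustrative host-side model; the function name `fpu_flush_subnormal` is hypothetical and does not exist in the NEORV32 sources.

```c
#include <stdint.h>

/* Model of the input flush-to-zero step on a raw IEEE-754 single-precision
 * bit pattern: any value with a zero exponent field becomes +0 or -0. */
uint32_t fpu_flush_subnormal(uint32_t bits)
{
    const uint32_t exp_mask  = 0x7F800000u; /* 8-bit exponent field */
    const uint32_t sign_mask = 0x80000000u; /* sign bit */

    if ((bits & exp_mask) == 0) {
        return bits & sign_mask; /* keep only the sign */
    }
    return bits; /* normal numbers, infinities and NaNs pass unchanged */
}
```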
626
 
627
[WARNING]
628
The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no
629
software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an
630
intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language
631
code (see `sw/example/floating_point_test`).
632
 
633 63 zero_gravi
 
634 60 zero_gravi
==== **`Zicsr`** Control and Status Register Access / Privileged Architecture
635
 
636 65 zero_gravi
The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
637 72 zero_gravi
is implemented when the <<_cpu_extension_riscv_zicsr>> configuration generic is _true_.
638 68 zero_gravi
 
639
[IMPORTANT]
640
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
641
In order to provide the full set of privileged functions that are required to run more complex tasks like
642 70 zero_gravi
operating systems and to allow a secure execution environment, the `Zicsr` extension should always be enabled.
643 68 zero_gravi
 
644 65 zero_gravi
In this case the following instructions are available:
645 60 zero_gravi
 
646 70 zero_gravi
* CSR access: `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci`
647
* environment: `mret` `wfi`
648 60 zero_gravi
 
649 68 zero_gravi
[NOTE]
650
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the according CSR.
651
However, access privileges are still enforced so these instruction variants _do_ cause side-effects
652
(the RISC-V spec. states that these combinations "_shall_ not cause any side-effects").
653 60 zero_gravi
 
654 73 zero_gravi
**`wfi` Instruction**
655
 
656 68 zero_gravi
The "wait for interrupt instruction" `wfi` acts like a sleep command. When executed, the CPU is
657 73 zero_gravi
halted until a valid interrupt request occurs. To wake up again, at least one interrupt source has to
658 72 zero_gravi
be enabled via the <<_mie>> CSR and the global interrupt enable flag in <<_mstatus>> has to be set.
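
The wake-up condition described above can be written as a small predicate. This is a host-side sketch only: `wfi_can_wake` is a hypothetical helper, and the only architectural fact it relies on is that the global machine interrupt enable flag `MIE` is bit 3 of `mstatus`.

```c
#include <stdint.h>
#include <stdbool.h>

#define MSTATUS_MIE (1u << 3) /* global machine-mode interrupt enable */

/* True if a sleeping CPU would wake up: interrupts are globally enabled
 * and at least one pending source (mip) is also enabled (mie). */
bool wfi_can_wake(uint32_t mstatus, uint32_t mie, uint32_t mip)
{
    bool global_enable   = (mstatus & MSTATUS_MIE) != 0;
    bool pending_enabled = (mie & mip) != 0;
    return global_enable && pending_enabled;
}
```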
659 60 zero_gravi
 
660 74 zero_gravi
[NOTE]
661
Executing the `wfi` instruction in user-mode will raise an illegal instruction exception if
662
<<_mstatus>>.`TW` is set.
663 62 zero_gravi
 
664 66 zero_gravi
 
665
==== **`Zicntr`** CPU Base Counters
666
 
667
The `Zicntr` ISA extension adds the basic cycle (`[m]cycle[h]`), instruction-retired (`[m]instret[h]`) and time (`time[h]`)
counters. This extension is stated as _mandatory_ by the RISC-V spec. However, size-constrained setups may remove support for
these counters. Section <<_machine_counter_and_timer_csrs>> shows a list of all `Zicntr`-related CSRs.
These are available if the `Zicntr` ISA extension is enabled via the <<_cpu_extension_riscv_zicntr>> generic.
671
 
672 73 zero_gravi
Additional CSRs:
673
 
674
* <<_cycleh>>, <<_mcycleh>> - cycle counter
675
* <<_instreth>>, <<_minstreth>> - instructions-retired counter
676
* <<_timeh>> - system _wall-clock_ time
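
Because each 64-bit counter is split into two 32-bit CSRs, software usually re-reads the high word to detect a carry between the two accesses. The sketch below shows this common pattern on the host. The accessor type `csr_read_fn` and the `sim_*` helpers are hypothetical stand-ins for real CSR reads and are not part of the NEORV32 library.

```c
#include <stdint.h>

/* Hypothetical accessor: returns the current value of one 32-bit CSR. */
typedef uint32_t (*csr_read_fn)(void);

/* Read a 64-bit counter from its high/low halves without tearing. */
uint64_t read_counter64(csr_read_fn read_hi, csr_read_fn read_lo)
{
    uint32_t hi, lo, hi_again;
    do {
        hi       = read_hi();
        lo       = read_lo();
        hi_again = read_hi();
    } while (hi != hi_again); /* low word overflowed in between: retry */
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter for host-side experiments; it starts right before a
 * low-word overflow and advances by one on every low-word read. */
uint64_t sim_counter = 0x00000000FFFFFFFFull;

uint32_t sim_read_hi(void) { return (uint32_t)(sim_counter >> 32); }

uint32_t sim_read_lo(void)
{
    uint32_t lo = (uint32_t)sim_counter;
    sim_counter += 1;
    return lo;
}
```

On the target, the two accessors would wrap the respective CSR read instructions, for example for `cycleh` and `cycle`.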
677
 
678 66 zero_gravi
[NOTE]
679
Disabling the `Zicntr` extension does not remove the `time[h]`-driving MTIME unit.
680
 
681
If `Zicntr` is disabled, all accesses to the according counter CSRs will raise an illegal instruction exception.
682
 
683
 
684
==== **`Zihpm`** Hardware Performance Monitors
685
 
686
In addition to the base cycle, instructions-retired and time counters the NEORV32 CPU provides
687
up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an
688
N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's
689 72 zero_gravi
<<_hpm_cnt_width>> generic (0..64-bit) and a corresponding event configuration CSR. The event configuration
690 66 zero_gravi
CSR defines the architectural events that lead to an increment of the associated HPM counter.
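
When the counter width N is smaller than 64, elapsed-event computations have to wrap at N bits. The following sketch assumes that unimplemented upper counter bits read as zero, which is an assumption and not stated in this section; `hpm_value` and `hpm_delta` are hypothetical helper names.

```c
#include <stdint.h>

/* Combine high/low CSR words and keep only the implemented `width` bits. */
uint64_t hpm_value(uint32_t hi, uint32_t lo, unsigned width)
{
    uint64_t raw = ((uint64_t)hi << 32) | lo;
    if (width >= 64) return raw;
    if (width == 0)  return 0;
    return raw & ((1ull << width) - 1);
}

/* Number of events between two samples, tolerating one counter wrap. */
uint64_t hpm_delta(uint64_t start, uint64_t end, unsigned width)
{
    uint64_t diff = end - start; /* wraps modulo 2^64 */
    if (width >= 64) return diff;
    if (width == 0)  return 0;
    return diff & ((1ull << width) - 1);
}
```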
691
 
692
The HPM counters are available if the `Zihpm` ISA extension is enabled via the <<_cpu_extension_riscv_zihpm>> generic.
693 73 zero_gravi
The actual number of implemented HPM counters is defined by the <<_hpm_num_cnts>> generic.
694 66 zero_gravi
 
695 73 zero_gravi
Additional CSRs:
696 66 zero_gravi
 
697 73 zero_gravi
* <<_mhpmevent>> 3..31 (depending on <<_hpm_num_cnts>>) - event configuration CSRs
698
* <<_mhpmcounterh>> 3..31 (depending on <<_hpm_num_cnts>>) - counter CSRs
699 66 zero_gravi
 
700
[IMPORTANT]
701 73 zero_gravi
The HPM counter CSRs can only be accessed in machine-mode. Hence, the according <<_mcounteren>> CSR bits
702 66 zero_gravi
are always zero and read-only. Any access from less-privileged modes will raise an illegal instruction
703
exception.
704
 
705
[TIP]
706 73 zero_gravi
Auto-increment of the HPMs can be deactivated individually via the <<_mcountinhibit>> CSR.
707 66 zero_gravi
 
708
 
709 60 zero_gravi
==== **`Zifencei`** Instruction Stream Synchronization
710
 
711 72 zero_gravi
The `Zifencei` CPU extension is implemented if the <<_cpu_extension_riscv_zifencei>> configuration
712 60 zero_gravi
generic is _true_. It allows manual synchronization of the instruction stream via the following instruction:
713
 
714
* `fence.i`
715
 
716 66 zero_gravi
The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
717 64 zero_gravi
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
718
high for one cycle to inform the memory system (like the i-cache) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
720 60 zero_gravi
 
721
 
722 72 zero_gravi
==== **`Zxcfu`** Custom Instructions Extension (CFU)
723
 
724
The `Zxcfu` extension is a NEORV32-specific _custom RISC-V_ ISA extension (`Z` = sub-extension, `x` = platform-specific
custom extension, `cfu` = name of the custom extension). When enabled via the <<_cpu_extension_riscv_zxcfu>> configuration
generic, this ISA extension adds the <<_custom_functions_unit_cfu>> to the CPU core. The CFU is a module that
allows adding **custom RISC-V instructions** to the processor core.
728
 
729
The CFU is implemented as an ALU co-processor and is integrated right into the CPU's pipeline providing minimal data
730
transfer latency as it has direct access to the core's register file. Up to 1024 custom instructions can be
731
implemented within the CFU. These instructions are mapped to an OPCODE space that has been explicitly reserved by
732
the RISC-V spec for custom extensions.
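
One plausible way to arrive at 1024 instructions is an R-type instruction word in which the 7-bit `funct7` and 3-bit `funct3` fields together select the operation (2^7 * 2^3 = 1024). The sketch below encodes such a word using the RISC-V _custom-0_ major opcode. That the CFU uses exactly this opcode and field split is an assumption here, and `cfu_encode` is a hypothetical helper; see the CFU section for the authoritative format.

```c
#include <stdint.h>

#define OPCODE_CUSTOM0 0x0Bu /* 0b0001011, RISC-V "custom-0" major opcode */

/* Build an R-type instruction word (assumed CFU instruction format). */
uint32_t cfu_encode(uint32_t funct7, uint32_t funct3,
                    uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    return ((funct7 & 0x7Fu) << 25) | /* bits 31:25 */
           ((rs2    & 0x1Fu) << 20) | /* bits 24:20 */
           ((rs1    & 0x1Fu) << 15) | /* bits 19:15 */
           ((funct3 & 0x07u) << 12) | /* bits 14:12 */
           ((rd     & 0x1Fu) <<  7) | /* bits 11:7  */
           OPCODE_CUSTOM0;            /* bits  6:0  */
}
```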
733
 
734
Software can utilize the custom instructions by using _intrinsic functions_, which are inline assembly functions that
735
behave like "regular" C functions.
736
 
737
[TIP]
738
For more information regarding the CFU see section <<_custom_functions_unit_cfu>>.
739
 
740
[TIP]
741
The CFU / `Zxcfu` ISA extension is intended for application-specific _instructions_.
742
If you would like to add more complex accelerators or interfaces that can also operate independently of
743
the CPU take a look at the memory-mapped <<_custom_functions_subsystem_cfs>>.
744
 
745
 
746 60 zero_gravi
==== **`PMP`** Physical Memory Protection
747
 
748 73 zero_gravi
The NEORV32 physical memory protection (PMP) provides an elementary memory protection mechanism that can be used
749
to constrain read, write and execute rights of arbitrary memory regions. The PMP is compatible
750
to the _RISC-V Privileged Architecture Specifications_. For detailed information see the according spec.'s sections.
751 60 zero_gravi
 
752 73 zero_gravi
[IMPORTANT]
753
The NEORV32 PMP only supports **TOR** (top of region) mode, which basically is a "base-and-bound" concept, and only
754
up to 16 PMP regions.
755 65 zero_gravi
 
756 73 zero_gravi
The physical memory protection logic is implemented if the <<_pmp_num_regions>> configuration generic is greater
757
than zero. This generic also defines the total number of available configurable protection
758
regions. The minimal granularity of a protected region is defined by the <<_pmp_min_granularity>> generic. Larger
759
granularity will reduce hardware complexity but will also make protection coarser as the minimal region size increases.
760
The default value is 4 bytes, which allows a minimal region size of 4 bytes.
761 60 zero_gravi
 
762 73 zero_gravi
If implemented the PMP provides the following additional CSRs:
763
 
764
* <<_pmpcfg>> 0..3 (depending on configuration) - PMP configuration registers, 4 entries per CSR
765
* <<_pmpaddr>> 0..15 (depending on configuration) - PMP address registers
766
 
767
 
768
**Operation Summary**
769
 
770
Any CPU access address (from the instruction fetch or data access interface) is tested if it matches _any_
771
of the specified PMP regions. If there is a match, the configured access rights are enforced:
772
 
773
* a write access (store) will fail if no **write** attribute is set
774
* a read access (load) will fail if no **read** attribute is set
775
* an instruction fetch access will fail if no **execute** attribute is set
776
 
777
If an access to a protected region does not have the according access rights it will raise the according
778
instruction/load/store _bus access fault_ exception.
779
 
780
By default, all PMP checks are enforced for user-mode only. However, PMP rules can also be enforced for
781
machine-mode when the according PMP region has the "LOCK" bit set. This will also prevent any write access
782
to the according region's PMP CSRs until the CPU is reset.
783
 
784
.Rule Prioritization
785
[IMPORTANT]
786
All rules are checked in parallel **without** prioritization so for identical memory regions the most restrictive
787
PMP rule will be enforced.
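
The check described in the operation summary can be modeled in a few lines. This is a behavioral sketch, not the hardware implementation. It uses the standard RISC-V `pmpcfg` bit layout (R = bit 0, W = bit 1, X = bit 2, A = bits 4:3 with TOR = 1, L = bit 7) and `pmpaddr` holding address bits 33:2. Treating an access that matches no region as allowed is an assumption of this sketch, and `pmp_allows` is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

#define PMP_R      (1u << 0)
#define PMP_W      (1u << 1)
#define PMP_X      (1u << 2)
#define PMP_A_MASK (3u << 3)
#define PMP_A_TOR  (1u << 3)
#define PMP_L      (1u << 7)

typedef enum { ACC_READ, ACC_WRITE, ACC_EXEC } pmp_access_t;

/* Returns true if the access passes all matching (and enforced) TOR regions.
 * cfg[i] is the 8-bit configuration of region i, pmpaddr[i] its top address. */
bool pmp_allows(const uint8_t *cfg, const uint32_t *pmpaddr, int num_regions,
                uint32_t addr, pmp_access_t acc, bool machine_mode)
{
    uint32_t need = (acc == ACC_READ)  ? PMP_R :
                    (acc == ACC_WRITE) ? PMP_W : PMP_X;
    uint32_t word_addr = addr >> 2; /* pmpaddr stores address bits 33:2 */
    bool allowed = true;            /* assumption: no match means no restriction */

    for (int i = 0; i < num_regions; i++) {
        if ((cfg[i] & PMP_A_MASK) != PMP_A_TOR) {
            continue; /* region disabled */
        }
        uint32_t base = (i == 0) ? 0u : pmpaddr[i - 1];
        bool match    = (word_addr >= base) && (word_addr < pmpaddr[i]);
        bool enforced = !machine_mode || ((cfg[i] & PMP_L) != 0);
        if (match && enforced && ((cfg[i] & need) == 0)) {
            allowed = false; /* no prioritization: most restrictive rule wins */
        }
    }
    return allowed;
}
```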
788
 
789
.PMP Example Program
790 65 zero_gravi
[TIP]
791 73 zero_gravi
A simple PMP example program can be found in `sw/example/demo_pmp`.
792 60 zero_gravi
 
793
 
794 73 zero_gravi
**Impact on Critical Path**
795 60 zero_gravi
 
796 73 zero_gravi
When implementing more PMP regions than a "_certain critical limit_" an **additional register stage** is automatically
inserted into the CPU's memory interfaces to keep the impact on the critical path as small as possible.
798
Unfortunately, this will also increase the latency of instruction fetches and data access by one cycle.
799
The _critical limit_ can be modified by a constant from the main VHDL package file
800
(`rtl/core/neorv32_package.vhd`, default value = 8):
801 60 zero_gravi
 
802
[source,vhdl]
803
----
804
-- "critical" number of PMP regions --
805
constant pmp_num_regions_critical_c : natural := 8;
806
----
807
 
808 73 zero_gravi
[TIP]
809
Choosing a coarser PMP granularity (a larger minimal region size) via the <<_pmp_min_granularity>> top entity generic
810
will also reduce hardware utilization and impact on critical path.
811 60 zero_gravi
 
812
 
813 73 zero_gravi
<<<
814
// ####################################################################################################################
815 60 zero_gravi
 
816 73 zero_gravi
include::cpu_cfu.adoc[]
817 60 zero_gravi
 
818
 
819
<<<
820
// ####################################################################################################################
821
:sectnums:
822
=== Instruction Timing
823
 
824
The instruction timing listed in the table below shows the required clock cycles for executing a certain
825
instruction. These instruction cycles assume a bus access without additional wait states and a filled
826
pipeline.
827
 
828
Average CPI (cycles per instruction) values for "real applications" like the CoreMark benchmark for different CPU
829
configurations are presented in <<_cpu_performance>>.
830
 
831
.Clock cycles per instruction
832
[cols="<2,^1,^4,<3"]
833
[options="header", grid="rows"]
834
|=======================
835
| Class | ISA | Instruction(s) | Execution cycles
836 73 zero_gravi
| ALU            | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2
837
| ALU            | `C`   | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
838
| ALU            | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
839
| ALU            | `C`   | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
840 74 zero_gravi
| Branches       | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + (ML-1)footnote:[Memory latency.]; Not taken: 3
841
| Branches       | `C`   | `c.beqz` `c.bnez`                     | Taken: 5 + (ML-1); Not taken: 3
842
| Jumps / Calls  | `I/E` | `jal` `jalr`                  | 5 + (ML-1)
843
| Jumps / Calls  | `C`   | `c.jal` `c.j` `c.jr` `c.jalr` | 5 + (ML-1)
844
| Memory access  | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 5 + (ML-2)
845
| Memory access  | `C`   | `c.lw` `c.sw` `c.lwsp` `c.swsp`           | 5 + (ML-2)
846
| Memory access  | `A`   | `lr.w` `sc.w`                             | 5 + (ML-2)
847
| MulDiv         | `M`   | `mul` `mulh` `mulhsu` `mulhu` | 2+32+2; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 4
848
| MulDiv         | `M`   | `div` `divu` `rem` `remu`     | 2+32+2
849
| System         | `Zicsr`     | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 3
850 73 zero_gravi
| System         | `Zicsr`     | `ecall` `ebreak` | 3
851 74 zero_gravi
| System         | `Zicsr`+`C` | `c.ebreak`       | 3
852
| System         | `Zicsr`     | `wfi`            | 3
853
| System         | `Zicsr`     | `mret` `dret`    | 5
854
| Fence          | `I/E`       | `fence`   | 4 + ML
855
| Fence          | `Zifencei`  | `fence.i` | 4 + ML
856 60 zero_gravi
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
859 73 zero_gravi
| Floating-point - compare    | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13
860
| Floating-point - misc       | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12
861 60 zero_gravi
| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47
862
| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48
863 73 zero_gravi
| Bit-manipulation - arithmetic/logic    | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3
864
| Bit-manipulation - arithmetic/logic    | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
865
| Bit-manipulation - shifts              | `B(Zbb)` | `clz` `ctz` | 3 + 0..32
866
| Bit-manipulation - shifts              | `B(Zbb)` | `cpop` | 3 + 32
867
| Bit-manipulation - shifts              | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA
868
| Bit-manipulation - single-bit          | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3
869
| Bit-manipulation - shifted-add         | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3
870 71 zero_gravi
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
871 73 zero_gravi
| Custom instructions (CFU) | `Zxcfu` | - | min. 4
872
| | | |
873 74 zero_gravi
| _Illegal instructions_    | `Zicsr` | - | min. 2
874 60 zero_gravi
|=======================
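
A few of the table entries can be turned into small helper functions, for example to estimate the cost of an inner loop. This is a sketch under two assumptions: `SA/4` in the shift formula is read as integer division, and `ML` is the memory latency in cycles as used in the table. All function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Default (iterative) shifter: 3 + SA/4 + SA%4, with SA = shift amount 0..31. */
unsigned cycles_shift(unsigned sa)
{
    sa &= 31u;
    return 3u + sa / 4u + sa % 4u;
}

/* Multiplication: 2+32+2 cycles bit-serial, 4 cycles with FAST_MUL_EN. */
unsigned cycles_mul(bool fast_mul)
{
    return fast_mul ? 4u : (2u + 32u + 2u);
}

/* Division and remainder: always 2+32+2 cycles. */
unsigned cycles_div(void)
{
    return 2u + 32u + 2u;
}

/* Branches: 5 + (ML-1) if taken, 3 if not taken. */
unsigned cycles_branch(bool taken, unsigned ml)
{
    return taken ? (4u + ml) : 3u;
}

/* Loads and stores: 5 + (ML-2). */
unsigned cycles_load_store(unsigned ml)
{
    return 3u + ml;
}
```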
875
 
876
[NOTE]
877 65 zero_gravi
The presented values of the *floating-point execution cycles* are average values - obtained from
878 60 zero_gravi
4096 instruction executions using pseudo-random input values. The execution time for emulating the
879
instructions (using pure-software libraries) is ~17..140 times higher.
880
 
881
 
882 66 zero_gravi
<<<
883 60 zero_gravi
// ####################################################################################################################
884
include::cpu_csr.adoc[]
885
 
886
 
887
<<<
888
// ####################################################################################################################
889
:sectnums:
890
==== Traps, Exceptions and Interrupts
891
 
892 61 zero_gravi
In this document the following nomenclature regarding traps is used:
893 60 zero_gravi
 
894 64 zero_gravi
* _interrupts_ = asynchronous exceptions
895 60 zero_gravi
* _exceptions_ = synchronous exceptions
896
* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions)
897
 
898 72 zero_gravi
Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in <<_mtvec>>
899
CSR. The cause of the according interrupt or exception can be determined via the content of <<_mcause>>
900
CSR. The address that reflects the current program counter when a trap was taken is stored to <<_mepc>> CSR.
901
Additional information regarding the cause of the trap can be retrieved from <<_mtval>> CSR and the processor's
902 70 zero_gravi
<<_internal_bus_monitor_buskeeper>> (for memory access exceptions).
903 60 zero_gravi
 
904 70 zero_gravi
The traps are prioritized. If several _synchronous exceptions_ occur at once only the one with highest priority is triggered
905
while all remaining exceptions are ignored. If several _asynchronous exceptions_ (interrupts) trigger at once, the one with highest priority
906 64 zero_gravi
is serviced first while the remaining ones stay _pending_. After completing the interrupt handler the interrupt with
907 70 zero_gravi
the second highest priority will get serviced and so on until no further interrupts are pending.
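
The interrupt selection can be sketched as a priority scan over the pending and enabled sources. The order used here (FIRQ 0..15, then MEI, MSI, MTI) is taken from the trap listing in this section. That each source occupies the `mip`/`mie` bit with the same index as its trap code (16..31, 11, 3, 7) is an assumption of this sketch, and `next_irq` is a hypothetical name.

```c
#include <stdint.h>

/* Returns the mip/mie bit index of the interrupt serviced next, or -1 if
 * no enabled interrupt is pending. */
int next_irq(uint32_t mip, uint32_t mie)
{
    uint32_t active = mip & mie;

    for (int bit = 16; bit <= 31; bit++) { /* FIRQ 0..15, channel 0 first */
        if (active & (1u << bit)) {
            return bit;
        }
    }
    if (active & (1u << 11)) return 11; /* machine external interrupt */
    if (active & (1u << 3))  return 3;  /* machine software interrupt */
    if (active & (1u << 7))  return 7;  /* machine timer interrupt */
    return -1;
}
```

Calling the function again after clearing the returned bit in `mip` yields the next source in line, which mirrors how the remaining requests stay pending until they are serviced.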
908 60 zero_gravi
 
909 69 zero_gravi
.Interrupt Signal Requirements - Standard RISC-V Interrupts
910 61 zero_gravi
[IMPORTANT]
911 69 zero_gravi
All standard RISC-V interrupt request signals are **high-active**. A request has to stay at high-level (=asserted)
912 65 zero_gravi
until it is explicitly acknowledged by the CPU software (for example by writing to a specific memory-mapped register).
913 60 zero_gravi
 
914 69 zero_gravi
.Interrupt Signal Requirements - Fast Interrupt Requests
915
[IMPORTANT]
916 70 zero_gravi
The NEORV32-specific FIRQ request lines are triggered by a one-shot high-level (i.e. rising edge). Each request is buffered in the CPU control
917 73 zero_gravi
unit until the channel is either disabled (by clearing the according <<_mie>> CSR bit) or the request is explicitly cleared (by writing
918
zero to the according <<_mip>> CSR bit).
919 69 zero_gravi
 
920 61 zero_gravi
.Instruction Atomicity
921
[NOTE]
922 70 zero_gravi
All instructions execute as atomic operations - interrupts can only trigger _between_ two instructions.
923
So even if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
924
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.
925 60 zero_gravi
 
926
 
927 61 zero_gravi
:sectnums:
928 73 zero_gravi
===== Memory Access Exceptions
929 60 zero_gravi
 
930 61 zero_gravi
If a load operation causes any exception, the instruction's destination register is
931
_not written_ at all. Load exceptions caused by a misalignment or a physical memory protection fault do not
932 70 zero_gravi
trigger a bus/memory read-operation at all. Vice versa, exceptions caused by a store address misalignment or a store physical
933
memory protection fault do not trigger a bus/memory write-operation at all.
934 60 zero_gravi
 
935
 
936 61 zero_gravi
:sectnums:
937 73 zero_gravi
===== Custom Fast Interrupt Request Lines
938 60 zero_gravi
 
939 61 zero_gravi
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the `firq_i` CPU top
940 72 zero_gravi
entity signals. These interrupts have custom configuration and status flags in the <<_mie>> and <<_mip>> CSRs and also
941
provide custom trap codes in <<_mcause>>. These FIRQs are reserved for NEORV32 processor-internal usage only.
942 60 zero_gravi
 
943
 
944 69 zero_gravi
:sectnums:
945 73 zero_gravi
===== NEORV32 Trap Listing
946 60 zero_gravi
 
947 69 zero_gravi
The following table shows all traps that are currently supported by the NEORV32 CPU. It also shows the prioritization
948
and the CSR side-effects. A more detailed description of the actual trap triggering events is provided in a further table.
949
 
950
[NOTE]
951 72 zero_gravi
_Asynchronous exceptions_ (= interrupts) set the MSB of <<_mcause>> while _synchronous exceptions_ (= "software exceptions")
952 69 zero_gravi
clear the MSB.
953
 
954
**Table Annotations**
955
 
956
The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the
957 72 zero_gravi
cause ID of the according trap that is written to <<_mcause>> CSR. The "[RISC-V]" columns show the interrupt/exception code value from the
958 73 zero_gravi
official RISC-V privileged architecture spec. The "ID [C]" names are defined by the NEORV32 core library (the runtime environment _RTE_) and can
959
be used in plain C code. The "`mepc`" and "`mtval`" columns show the value written to <<_mepc>> and <<_mtval>> CSRs when a trap is triggered:
960 69 zero_gravi
 
961 73 zero_gravi
* **IPC** - address of interrupted instruction (instruction has not been executed yet)
962
* **PC** - address of instruction that caused the trap
963
* **ADR** - bad memory access address that caused the trap
964
* **INST** - the faulting instruction word itself
965
* **0** - zero
966 69 zero_gravi
 
967
.NEORV32 Trap Listing
968 60 zero_gravi
[cols="3,6,5,14,11,4,4"]
969
[options="header",grid="rows"]
970
|=======================
971 74 zero_gravi
| Prio. | `mcause`     | [RISC-V] | ID [C]                   | Cause                                  | `mepc`  | `mtval`
972 73 zero_gravi
7+^| **Synchronous Exceptions**
973 74 zero_gravi
| 1     | `0x00000000` | 0.0      | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned         | **PC**  | **ADR**
974
| 2     | `0x00000001` | 0.1      | _TRAP_CODE_I_ACCESS_     | instruction access bus fault           | **PC**  | **ADR**
975
| 3     | `0x00000002` | 0.2      | _TRAP_CODE_I_ILLEGAL_    | illegal instruction                    | **PC**  | **INST**
976
| 4     | `0x0000000B` | 0.11     | _TRAP_CODE_MENV_CALL_    | environment call from M-mode (`ecall`) | **PC**  | **0**
977
| 5     | `0x00000008` | 0.8      | _TRAP_CODE_UENV_CALL_    | environment call from U-mode (`ecall`) | **PC**  | **0**
978
| 6     | `0x00000003` | 0.3      | _TRAP_CODE_BREAKPOINT_   | software breakpoint (`ebreak`)         | **PC**  | **0**
979
| 7     | `0x00000006` | 0.6      | _TRAP_CODE_S_MISALIGNED_ | store address misaligned               | **PC**  | **ADR**
980
| 8     | `0x00000004` | 0.4      | _TRAP_CODE_L_MISALIGNED_ | load address misaligned                | **PC**  | **ADR**
981
| 9     | `0x00000007` | 0.7      | _TRAP_CODE_S_ACCESS_     | store access bus fault                 | **PC**  | **ADR**
982
| 10    | `0x00000005` | 0.5      | _TRAP_CODE_L_ACCESS_     | load access bus fault                  | **PC**  | **ADR**
983 73 zero_gravi
7+^| **Asynchronous Exceptions (Interrupts)**
984 74 zero_gravi
| 11    | `0x80000010` | 1.16     | _TRAP_CODE_FIRQ_0_       | fast interrupt request channel 0       | **IPC** | **0**
985
| 12    | `0x80000011` | 1.17     | _TRAP_CODE_FIRQ_1_       | fast interrupt request channel 1       | **IPC** | **0**
986
| 13    | `0x80000012` | 1.18     | _TRAP_CODE_FIRQ_2_       | fast interrupt request channel 2       | **IPC** | **0**
987
| 14    | `0x80000013` | 1.19     | _TRAP_CODE_FIRQ_3_       | fast interrupt request channel 3       | **IPC** | **0**
988
| 15    | `0x80000014` | 1.20     | _TRAP_CODE_FIRQ_4_       | fast interrupt request channel 4       | **IPC** | **0**
989
| 16    | `0x80000015` | 1.21     | _TRAP_CODE_FIRQ_5_       | fast interrupt request channel 5       | **IPC** | **0**
990
| 17    | `0x80000016` | 1.22     | _TRAP_CODE_FIRQ_6_       | fast interrupt request channel 6       | **IPC** | **0**
991
| 18    | `0x80000017` | 1.23     | _TRAP_CODE_FIRQ_7_       | fast interrupt request channel 7       | **IPC** | **0**
992
| 19    | `0x80000018` | 1.24     | _TRAP_CODE_FIRQ_8_       | fast interrupt request channel 8       | **IPC** | **0**
993
| 20    | `0x80000019` | 1.25     | _TRAP_CODE_FIRQ_9_       | fast interrupt request channel 9       | **IPC** | **0**
994
| 21    | `0x8000001a` | 1.26     | _TRAP_CODE_FIRQ_10_      | fast interrupt request channel 10      | **IPC** | **0**
995
| 22    | `0x8000001b` | 1.27     | _TRAP_CODE_FIRQ_11_      | fast interrupt request channel 11      | **IPC** | **0**
996
| 23    | `0x8000001c` | 1.28     | _TRAP_CODE_FIRQ_12_      | fast interrupt request channel 12      | **IPC** | **0**
997
| 24    | `0x8000001d` | 1.29     | _TRAP_CODE_FIRQ_13_      | fast interrupt request channel 13      | **IPC** | **0**
998
| 25    | `0x8000001e` | 1.30     | _TRAP_CODE_FIRQ_14_      | fast interrupt request channel 14      | **IPC** | **0**
999
| 26    | `0x8000001f` | 1.31     | _TRAP_CODE_FIRQ_15_      | fast interrupt request channel 15      | **IPC** | **0**
1000
| 27    | `0x8000000B` | 1.11     | _TRAP_CODE_MEI_          | machine external interrupt (MEI)       | **IPC** | **0**
1001
| 28    | `0x80000003` | 1.3      | _TRAP_CODE_MSI_          | machine software interrupt (MSI)       | **IPC** | **0**
1002
| 29    | `0x80000007` | 1.7      | _TRAP_CODE_MTI_          | machine timer interrupt (MTI)          | **IPC** | **0**
1003 60 zero_gravi
|=======================
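
A trap handler typically starts by splitting `mcause` into the interrupt flag (MSB) and the cause code. The helpers below are a minimal host-side sketch based on the values in the table above. Their names are hypothetical; they are not the `TRAP_CODE_*` definitions of the NEORV32 runtime environment.

```c
#include <stdint.h>
#include <stdbool.h>

/* MSB set: asynchronous exception (interrupt). */
bool mcause_is_interrupt(uint32_t mcause)
{
    return (mcause >> 31) != 0;
}

/* Cause code without the interrupt flag. */
uint32_t mcause_code(uint32_t mcause)
{
    return mcause & 0x7FFFFFFFu;
}

/* FIRQ channel number (0..15) or -1 if the cause is not a FIRQ. */
int mcause_firq_channel(uint32_t mcause)
{
    uint32_t code = mcause_code(mcause);
    if (mcause_is_interrupt(mcause) && code >= 16u && code <= 31u) {
        return (int)(code - 16u);
    }
    return -1;
}
```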
1004
 
1005
 
1006 69 zero_gravi
The following table provides a summarized description of the actual events for triggering a specific trap.
1007 60 zero_gravi
 
1008 69 zero_gravi
.NEORV32 Trap Description
1009
[cols="<3,<7"]
1010
[options="header",grid="rows"]
1011
|=======================
1012 70 zero_gravi
| Trap ID [C] | Triggered when ...
1013 73 zero_gravi
| _TRAP_CODE_I_MISALIGNED_ | fetching a 32-bit instruction word that is not 32-bit-aligned (_see note below!_)
1014 69 zero_gravi
| _TRAP_CODE_I_ACCESS_     | bus timeout or bus error during instruction word fetch
1015
| _TRAP_CODE_I_ILLEGAL_    | trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation
1016
| _TRAP_CODE_MENV_CALL_    | executing `ecall` instruction in machine-mode
1017
| _TRAP_CODE_UENV_CALL_    | executing `ecall` instruction in user-mode
1018 73 zero_gravi
| _TRAP_CODE_BREAKPOINT_   | executing `ebreak` instruction
1019 69 zero_gravi
| _TRAP_CODE_S_MISALIGNED_ | storing data to an address that is not naturally aligned to the data size (byte, half, word) being stored
1020
| _TRAP_CODE_L_MISALIGNED_ | loading data from an address that is not naturally aligned to the data size  (byte, half, word) being loaded
1021
| _TRAP_CODE_S_ACCESS_     | bus timeout or bus error during store data operation
| _TRAP_CODE_L_ACCESS_     | bus timeout or bus error during load data operation
1023
| _TRAP_CODE_FIRQ_0_ ... _TRAP_CODE_FIRQ_15_| caused by interrupt-condition of processor-internal modules, see <<_neorv32_specific_fast_interrupt_requests>>
1024
| _TRAP_CODE_MEI_          | user-defined processor-external source (via dedicated top-entity signal)
1025
| _TRAP_CODE_MSI_          | user-defined processor-external source (via dedicated top-entity signal)
1026
| _TRAP_CODE_MTI_          | processor-internal machine timer overflow OR user-defined processor-external source (via dedicated top-entity signal)
1027
|=======================
1028 60 zero_gravi
 
1029 72 zero_gravi
.Misaligned Instruction Address Exception
1030 69 zero_gravi
[NOTE]
1031
For 32-bit-only instructions (= no `C` extension) the misaligned instruction exception
1032
is raised if bit 1 of the fetch address is set (i.e. not on a 32-bit boundary). If the `C` extension is implemented
1033
there will never be a misaligned instruction exception _at all_.
1034 72 zero_gravi
In both cases bit 0 of the program counter (and all related CSRs) is hardwired to zero.
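
The rule from the note above can be written as a small predicate. This is only an illustrative host-side sketch. The function name `instr_addr_misaligned` and the `c_ext` flag are assumptions made for this example and do not correspond to actual RTL signals.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the misaligned instruction address check.
 * c_ext: true if the C extension is implemented. */
bool instr_addr_misaligned(uint32_t pc, bool c_ext) {
    if (c_ext) {
        return false;          /* 16-bit alignment is always legal with C */
    }
    return (pc & 0x2u) != 0u;  /* bit 1 set: not on a 32-bit boundary */
}
----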
1035 60 zero_gravi
 
1036
 
1037
<<<
1038
// ####################################################################################################################
1039
:sectnums:
1040
==== Bus Interface
1041
 
1042 72 zero_gravi
The NEORV32 CPU implements a 32-bit machine with separate instruction and data interfaces, making the CPU a
1043
**Harvard Architecture**: the _instruction fetch interface_ (`i_bus_*`) is used for fetching instructions and the
1044
_data access interface_ (`d_bus_*`) is used to access data via load and store operations.
1045
Each of these interfaces can access an address space of up to 2^32^ bytes (4 GB).
1046
The following table shows the signals of the data and instruction interfaces as seen from the CPU (`*_o` signals are driven
1047
by the CPU / outputs, `*_i` signals are read by the CPU / inputs). Both interfaces use the same protocol.
1048 60 zero_gravi
 
1049 72 zero_gravi
.CPU bus interfaces
1050
[cols="<2,^1,^1,<6"]
1051 60 zero_gravi
[options="header",grid="rows"]
1052
|=======================
1053 72 zero_gravi
| Signal             | Width | Direction | Description
1054
| `i/d_bus_addr_o`   | 32    | out       | access address
1055
| `i/d_bus_rdata_i`  | 32    | in        | data input for read operations
1056
| `i/d_bus_wdata_o`  | 32    | out       | data output for write operations
1057
| `i/d_bus_ben_o`    | 4     | out       | byte enable signal for write operations
1058
| `i/d_bus_we_o`     | 1     | out       | bus write access (always zero for instruction fetches)
1059
| `i/d_bus_re_o`     | 1     | out       | bus read access
1060
| `i/d_bus_lock_o`   | 1     | out       | exclusive access request
1061
| `i/d_bus_ack_i`    | 1     | in        | accessed peripheral indicates a successful completion of the bus transaction
1062
| `i/d_bus_err_i`    | 1     | in        | accessed peripheral indicates an error during the bus transaction
1063
| `i/d_bus_fence_o`  | 1     | out       | this signal is set for one cycle when the CPU executes an instruction/data fence operation
1064
| `i/d_bus_priv_o`   | 2     | out       | current CPU privilege level
1065 60 zero_gravi
|=======================
1066
 
1067 72 zero_gravi
.Pipelined Transfers
1068 60 zero_gravi
[NOTE]
1069
Currently, there are no pipelined or overlapping operations implemented within the same bus interface.
1070 72 zero_gravi
So only a single transfer request can be "on the fly" (pending) at once. However, this is no real drawback. The
1071
minimal possible latency for a single access is two cycles, which equals the CPU's minimal execution latency
1072
for a single instruction.
1073 60 zero_gravi
 
1074 72 zero_gravi
.Unaligned Memory Accesses
1075
[NOTE]
1076
Please note that the NEORV32 CPU does not support the handling of unaligned memory accesses _in hardware_. Any
1077
unaligned memory access will raise an exception that can be used to handle such accesses in _software_.
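
As an illustration of software handling, an unaligned 32-bit load can be emulated with four byte accesses, which are always naturally aligned. This is a minimal sketch that assumes the CPU's little-endian byte order. The function name `load_word_unaligned` is hypothetical and not part of the NEORV32 software framework.

[source,c]
----
#include <stdint.h>

/* Hypothetical helper: assemble a little-endian 32-bit word from four
 * byte loads so that no misaligned word access is ever issued. */
uint32_t load_word_unaligned(const volatile uint8_t *addr) {
    return  (uint32_t)addr[0]
         | ((uint32_t)addr[1] << 8)
         | ((uint32_t)addr[2] << 16)
         | ((uint32_t)addr[3] << 24);
}
----

Avoiding the trap in the first place, as done here, is typically cheaper than emulating the access inside an exception handler.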
1078
 
1079
 
1080 60 zero_gravi
:sectnums:
1081
===== Protocol
1082
 
1083 72 zero_gravi
An actual bus request is triggered either by the `*_bus_re_o` signal (for reading data) or by the `*_bus_we_o` signal
1084
(for writing data). In case of a request, one of these signals is high for exactly one cycle. The transaction is
1085
completed when the accessed peripheral/memory either sets the `*_bus_ack_i` signal (-> successful completion) or the
1086
`*_bus_err_i` signal (-> failed completion). These bus response signals are also active for only one cycle.
1087
An error indicated by the `*_bus_err_i` signal will raise the corresponding "instruction bus access fault" or
1088
"load/store bus access fault" exception.
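
The handshake described above can be modeled cycle by cycle. The following host-side sketch shows a peripheral that acknowledges a request after a fixed number of cycles. The type `periph_t`, the function `periph_step` and the `latency` parameter are assumptions made for this example and are not derived from the actual RTL.

[source,c]
----
#include <stdbool.h>

/* Hypothetical behavioral model of a peripheral on the CPU bus. */
typedef struct {
    int  latency;    /* cycles between request and acknowledge (>= 1) */
    int  countdown;  /* remaining cycles of the pending access */
    bool pending;    /* true while a request is outstanding */
} periph_t;

/* Advance the model by one clock cycle.
 * re/we: one-cycle request pulses from the CPU.
 * Returns the value of the ack signal in this cycle. */
bool periph_step(periph_t *p, bool re, bool we) {
    if (p->pending) {
        if (--p->countdown == 0) {
            p->pending = false;
            return true;        /* ack is high for exactly one cycle */
        }
        return false;
    }
    if (re || we) {             /* request pulse starts a new access */
        p->pending = true;
        p->countdown = p->latency;
    }
    return false;
}
----

With `latency` set to 1, the model acknowledges in the cycle after the request and returns to idle in the following cycle.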
1089 60 zero_gravi
 
1090 73 zero_gravi
 
1091 72 zero_gravi
**Minimal Response Latency**
1092 60 zero_gravi
 
1093 72 zero_gravi
The transfer can be completed directly in the same cycle as it was initiated (via the `*_bus_re_o` or `*_bus_we_o`
1094
signal) if the peripheral sets `*_bus_ack_i` or `*_bus_err_i` high for one cycle. However, in order to shorten the
1095
critical path such "asynchronous" completion should be avoided. The default NEORV32 processor-internal modules provide
1096
exactly **one cycle delay** between initiation and completion of transfers.
1097 60 zero_gravi
 
1098 73 zero_gravi
 
1099 72 zero_gravi
**Maximal Response Latency**
1100
 
1101
Processor-internal peripherals or memories do not have to respond within one cycle after a bus request has been initiated.
1102
However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window
1103
is defined by the global `max_proc_int_response_time_c` constant (default = 15 cycles; processor's VHDL package file `rtl/core/neorv32_package.vhd`).
1104 73 zero_gravi
It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer (neither `*_bus_ack_i`
nor `*_bus_err_i` has been set by the **processor-internal bus**) will time out and raise a **bus fault exception**.
The <<_internal_bus_monitor_buskeeper>> keeps track of all _internal_ bus
1107
transactions to enforce this time window.
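
The effect of the response time window can be sketched as a simple host-side model. The macro `MAX_PROC_INT_RESPONSE_TIME` mirrors the default of `max_proc_int_response_time_c`. The function `bus_monitor`, its parameters and the exact boundary behavior (whether a response in the very last cycle still counts) are assumptions made for this example and are not taken from the BUSKEEPER implementation.

[source,c]
----
#define MAX_PROC_INT_RESPONSE_TIME 15  /* default of max_proc_int_response_time_c */

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* Hypothetical model of a monitored bus access.
 * ack_cycle / err_cycle: cycle (counted from the request) in which the
 * peripheral responds; a negative value means "never". */
bus_result_t bus_monitor(int ack_cycle, int err_cycle) {
    for (int cycle = 0; cycle < MAX_PROC_INT_RESPONSE_TIME; cycle++) {
        if (cycle == err_cycle) {
            return BUS_ERROR;    /* peripheral signaled an error */
        }
        if (cycle == ack_cycle) {
            return BUS_OK;       /* peripheral acknowledged in time */
        }
    }
    return BUS_TIMEOUT;          /* no response within the time window */
}
----

In this model both `BUS_ERROR` and `BUS_TIMEOUT` would lead to a bus fault exception on the CPU side.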
1108
 
1109
If any bus operation times out (for example when accessing "address space holes") the BUSKEEPER will issue a bus
1110
error to the CPU that will raise the according instruction fetch or data access bus exception.
1111
Note that **the bus keeper does not track external accesses via the external memory bus interface**. However,
1112
the external memory bus interface also provides an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>).
1113
 
1114 73 zero_gravi
.Interface Response
1115
[NOTE]
1116
Please note that any CPU access via the data or instruction interface has to be terminated either by asserting the
1117
CPU's `*_bus_ack_i` or `*_bus_err_i` signal. Otherwise the CPU will be stalled permanently. The BUSKEEPER ensures that
1118
any kind of access is always properly terminated.
1119
 
1120
 
1121 60 zero_gravi
**Exemplary Bus Accesses**
1122
 
1123
.Example bus accesses: see read/write access description below
1124
[cols="^2,^2"]
1125
[grid="none"]
1126
|=======================
1127
a| image::cpu_interface_read_long.png[read,300,150]
1128
a| image::cpu_interface_write_long.png[write,300,150]
1129
| Read access | Write access
1130
|=======================
1131
 
1132 73 zero_gravi
 
1133 60 zero_gravi
**Write Access**
1134
 
1135 72 zero_gravi
For a write access, the access address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte
1136 60 zero_gravi
enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the
1137
transaction is completed. In the example the accessed peripheral cannot answer directly in the next
1138
cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several
1139
cycles after issuing.
1140
 
1141 73 zero_gravi
 
1142 60 zero_gravi
**Read Access**
1143
 
1144
For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept
1145
stable until the transaction is completed. In the example the accessed peripheral cannot answer
1146
directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as
1147
the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i`
1148
signal).
1149
 
1150 73 zero_gravi
 
1151 60 zero_gravi
**Access Boundaries**
1152
 
1153
The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching
1154
compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit)
and word (= 32-bit) boundaries.
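
To illustrate how sub-word data accesses relate to the 4-bit byte enable signal, the following sketch derives a byte enable mask from the address and the access size. The mapping of mask bit _i_ to the byte at address offset _i_ is an assumption based on the CPU's little-endian byte order. The function name `byte_enable` is hypothetical.

[source,c]
----
#include <stdint.h>

/* Hypothetical helper: byte enable mask for a write of `size` bytes
 * (1, 2 or 4) to `addr`. Returns 0 if the access is not naturally
 * aligned (such an access would raise a misaligned exception). */
uint8_t byte_enable(uint32_t addr, unsigned size) {
    unsigned offset = addr & 3u;  /* byte position within the 32-bit word */
    switch (size) {
        case 1:  return (uint8_t)(0x1u << offset);
        case 2:  return (offset & 1u) ? 0u : (uint8_t)(0x3u << offset);
        case 4:  return offset ? 0u : 0xFu;
        default: return 0u;       /* unsupported access size */
    }
}
----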
1156
 
1157 73 zero_gravi
 
1158 60 zero_gravi
**Exclusive (Atomic) Access**
1159
 
1160
The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional
1161
combination. Normally, these combinations should target the same memory address.
1162
 
1163
The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction
1164
will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o`. It is the task of
1165
the memory system to manage this exclusive access reservation by storing the according access address and
1166
the source of the access itself (for example via the CPU ID in a multi-core system).
1167
 
1168
When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is
1169
evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back
1170
zero and will allow the according store operation to the memory system. If the lock is broken, the
1171
instruction will write-back non-zero and will not generate an actual memory store operation.
1172
 
1173
The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:
1174
 
1175
* when executing any other memory-access operation than `lr.w`
1176
* when any trap (sync. or async.) is triggered (for example to force a context switch)
1177
* when the memory system signals a bus error (via the `bus_err_i` signal)
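
The lock handling described above can be summarized in a small behavioral model. This is a host-side sketch only. The type `cpu_excl_t` and all function names are hypothetical, and the memory-side reservation tracking is left out.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CPU-internal exclusive access lock. */
typedef struct { bool lock; } cpu_excl_t;

void exec_lr(cpu_excl_t *c)               { c->lock = true;  } /* lr.w sets the lock */
void exec_other_mem_access(cpu_excl_t *c) { c->lock = false; } /* any other load/store */
void take_trap(cpu_excl_t *c)             { c->lock = false; } /* sync. or async. trap */
void bus_error(cpu_excl_t *c)             { c->lock = false; } /* bus_err_i asserted */

/* sc.w: returns the value written back to the destination register
 * (zero = success, non-zero = failure). */
uint32_t exec_sc(cpu_excl_t *c, uint32_t *mem, uint32_t value) {
    if (c->lock) {
        *mem = value;     /* lock still intact: perform the store */
        c->lock = false;  /* sc.w is itself a memory access other than lr.w */
        return 0u;
    }
    return 1u;            /* lock broken: no memory store is generated */
}
----

A trap between `lr.w` and `sc.w`, for example caused by a context switch, makes the store-conditional fail and leaves memory unchanged.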
1178
 
1179
[TIP]
1180
For more information regarding the SoC-level behavior and requirements of atomic operations see
1181
section <<_processor_external_memory_interface_wishbone_axi4_lite>>.
1182
 
1183 73 zero_gravi
 
1184 60 zero_gravi
**Memory Barriers**
1185
 
1186 72 zero_gravi
Whenever the CPU executes a _fence_ instruction, the according interface signal is set high for one cycle
1187
(`d_bus_fence_o` for a `fence` instruction; `i_bus_fence_o` for a `fence.i` instruction). It is the task of the
1188
memory system to perform the necessary operations (for example a cache flush and refill).
1189 60 zero_gravi
 
1190
 
1191
 
1192
<<<
1193
// ####################################################################################################################
1194
:sectnums:
1195
==== CPU Hardware Reset
1196
 
1197
In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical
1198
registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a
1199
dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers
1200
after power-up is not relevant for a defined CPU boot process.
1201
 
1202 73 zero_gravi
 
1203 70 zero_gravi
**Rationale**
1204 60 zero_gravi
 
1205
A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage
1206
of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the
1207 66 zero_gravi
data in the according data register is valid. At the end of the pipeline the status register might trigger a write-back
1208 60 zero_gravi
of the processing result to some kind of memory. The initial status of the data registers after power-up is
1209
irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in
1210 70 zero_gravi
the pipeline's data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not
1211 60 zero_gravi
control the actual operation (in contrast to the status register). This makes the pipeline data registers from
1212
this example "uncritical registers".
1213
 
1214 73 zero_gravi
 
1215 60 zero_gravi
**NEORV32 CPU Reset**
1216
 
1217
In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status
1218
and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The
1219 70 zero_gravi
pipeline registers will get initialized by the CPU's internal state machines, which are initialized from the main
1220 60 zero_gravi
control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like
1221
interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code).
1222
 
1223
During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to
1224 72 zero_gravi
the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR <<_mie>>
1225 60 zero_gravi
does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire
1226 70 zero_gravi
because the global interrupt enable flag in the status register (`mstatus(mie)`) _does_ provide a dedicated
1227
hardware reset setting this bit to low (globally disabling interrupts).
1228 60 zero_gravi
 
1229 73 zero_gravi
 
1230 60 zero_gravi
**Reset Configuration**
1231
 
1232 70 zero_gravi
Most CPU-internal registers do provide an asynchronous reset in the VHDL code, but the "don't care" value
1233
(VHDL `'-'`) is used for initialization of all uncritical registers, effectively generating a flip-flop without a
1234 60 zero_gravi
reset. However, certain applications or situations (like advanced gate-level / timing simulations) might
1235 70 zero_gravi
require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all CPU registers can
1236 72 zero_gravi
be enabled by setting a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`):
1237 60 zero_gravi
 
1238
[source,vhdl]
----
-- use dedicated hardware reset value for UNCRITICAL registers --
-- FALSE=reset value is irrelevant (might simplify HW), default; TRUE=defined LOW reset value
constant dedicated_reset_c : boolean := false;
----
