<<<
:sectnums:
=== Custom Functions Unit (CFU)
 
The Custom Functions Unit is the central part of the <<_zxcfu_custom_instructions_extension_cfu>> and represents
the actual hardware module, which is used to implement _custom RISC-V instructions_. The concept of the NEORV32
CFU has been strongly inspired by https://github.com/google/CFU-Playground[Google's CFU-Playground].
 
The CFU is intended for operations that are inefficient in terms of performance, latency, energy consumption or
program memory requirements when implemented in pure software. Some potential application fields and example
use cases include:
 
* **AI:** sub-word / vector / SIMD operations like adding all four bytes of a 32-bit data word
* **Cryptographic:** bit substitution and permutation
* **Communication:** conversions like binary to Gray code
* **Image processing:** look-up tables for color space transformations
* implementing instructions from other RISC-V ISA extensions that are not yet supported by the NEORV32
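
For orientation, two of the use cases above look roughly like this when written in pure software. This is only a
reference sketch: the function names `sum_bytes_sw` and `bin_to_gray_sw` are illustrative and are not part of the
NEORV32 software framework.

[source,c]
----
#include <stdint.h>

// Add all four bytes of a 32-bit data word (the "AI" / SIMD example).
// In software this needs several shift, mask and add instructions.
uint32_t sum_bytes_sw(uint32_t x) {
  return (x & 0xffu) + ((x >> 8) & 0xffu) + ((x >> 16) & 0xffu) + (x >> 24);
}

// Convert a binary value to Gray code (the "Communication" example).
uint32_t bin_to_gray_sw(uint32_t x) {
  return x ^ (x >> 1);
}
----

A CFU implementation could collapse each of these functions into a single custom instruction.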
 
[NOTE]
The CFU is not intended for complex and autonomous functional units that implement complete accelerators
(like block-based AES en-/decryption). Such accelerators can be implemented within the <<_custom_functions_subsystem_cfs>>.
A comparison of all chip-internal hardware extension options is provided in the user guide section
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules].
 
 
:sectnums:
==== Custom CFU Instructions - General
 
The custom instructions utilize a specific instruction space that has been explicitly reserved for user-defined
extensions by the RISC-V specifications ("_Guaranteed Non-Standard Encoding Space_"). The NEORV32 CFU uses the
_CUSTOM0_ opcode to identify custom instructions. The binary encoding of this opcode is `0001011`.
 
The custom instructions processed by the CFU use the 32-bit **R2-type** RISC-V instruction format, which consists
of six bit-fields:
 
* `funct7`: 7-bit immediate
* `rs2`: address of second source register
* `rs1`: address of first source register
* `funct3`: 3-bit immediate
* `rd`: address of destination register
* `opcode`: always `0001011` to identify custom instructions
 
.CFU instruction format (RISC-V R2-type)
image::cfu_r2type_instruction.png[align=center]
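
The six bit-fields can be packed into a 32-bit instruction word as sketched below. The sketch assumes the
standard RISC-V register-register field positions (`funct7` in bits 31:25, `rs2` in 24:20, `rs1` in 19:15,
`funct3` in 14:12, `rd` in 11:7, `opcode` in 6:0). The helper `cfu_encode` and the constant
`CFU_OPCODE_CUSTOM0` are hypothetical names used for illustration only.

[source,c]
----
#include <stdint.h>

#define CFU_OPCODE_CUSTOM0 0x0bu // binary 0001011

// Assemble a CFU instruction word from its bit-fields.
// rs2, rs1 and rd are register addresses (0..31), not register contents.
uint32_t cfu_encode(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                    uint32_t funct3, uint32_t rd) {
  return ((funct7 & 0x7fu) << 25) |
         ((rs2    & 0x1fu) << 20) |
         ((rs1    & 0x1fu) << 15) |
         ((funct3 & 0x07u) << 12) |
         ((rd     & 0x1fu) <<  7) |
         CFU_OPCODE_CUSTOM0;
}
----

For example, `cfu_encode(0x27, 11, 10, 6, 12)` (that is `funct7 = 0100111`, `rs2 = x11`, `rs1 = x10`,
`funct3 = 110`, `rd = x12`) evaluates to `0x4eb5660b`.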
 
[NOTE]
Obviously, all bit-fields including the immediates have to be static at compile time.
 
.Custom Instructions - Exceptions
[NOTE]
The CPU control logic can only check the _CUSTOM0_ opcode of a custom instruction to determine whether the
instruction word is valid. It cannot check the `funct3` and `funct7` bit-fields since they are
implementation-defined. Hence, a custom CFU instruction can never raise an illegal instruction exception
because of its `funct3` or `funct7` values. However, custom instructions will raise an illegal instruction
exception if the CFU is not enabled/implemented (i.e. the `Zxcfu` ISA extension is not enabled).
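
The rule from the note above can be written down as a small software model. This is a simplified sketch and not
the actual CPU control logic: `is_custom0` and `custom0_raises_illegal` are hypothetical names, and words with
any other opcode are simply reported as "not affected by this rule".

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

// True if the instruction word uses the CUSTOM0 opcode (0001011).
bool is_custom0(uint32_t instr) {
  return (instr & 0x7fu) == 0x0bu;
}

// True if a CUSTOM0 instruction word raises an illegal instruction exception.
// funct3 and funct7 are deliberately not inspected: only the opcode and the
// availability of the Zxcfu extension matter.
bool custom0_raises_illegal(uint32_t instr, bool zxcfu_enabled) {
  return is_custom0(instr) && !zxcfu_enabled;
}
----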
 
The CFU operates on the two source operands and returns the processing result to the destination register.
The actual operation to be performed can be selected using the `funct7` and `funct3` bit-fields.
These immediate bit-fields can also be used to pass additional data to the CFU like offsets, look-up table
addresses or shift amounts. However, the actual functionality is completely user-defined.
 
 
:sectnums:
==== Using Custom Instructions in Software
 
The custom instructions provided by the CFU are included in plain C code by using **intrinsics**. Intrinsics
behave like "normal" functions, but under the hood they are a set of macros that hide the complexity of inline assembly.
Using such intrinsics removes the need to modify the compiler, built-in libraries and the assembler when including custom
instructions.
 
The NEORV32 software framework provides eight pre-defined custom instruction macros, which are defined in
`sw/lib/include/neorv32_cpu_cfu.h`. Each intrinsic provides an implicit definition of the instruction word's
`funct3` bit-field:
 
.CFU instruction prototypes
[source,c]
----
neorv32_cfu_cmd0(funct7, rs1, rs2) // funct3 = 000
neorv32_cfu_cmd1(funct7, rs1, rs2) // funct3 = 001
neorv32_cfu_cmd2(funct7, rs1, rs2) // funct3 = 010
neorv32_cfu_cmd3(funct7, rs1, rs2) // funct3 = 011
neorv32_cfu_cmd4(funct7, rs1, rs2) // funct3 = 100
neorv32_cfu_cmd5(funct7, rs1, rs2) // funct3 = 101
neorv32_cfu_cmd6(funct7, rs1, rs2) // funct3 = 110
neorv32_cfu_cmd7(funct7, rs1, rs2) // funct3 = 111
----
 
Each intrinsic function always returns a 32-bit value (the processing result). Furthermore,
each intrinsic function requires three arguments:
 
* `funct7` - 7-bit immediate
* `rs1` - source operand 1, 32-bit
* `rs2` - source operand 2, 32-bit
 
The `funct7` bit-field is used to pass a 7-bit literal to the CFU. The `rs1` and `rs2` arguments are used to pass the
actual data to the CFU. These arguments can be populated with variables or literals. The following example
shows how to pass arguments when executing `neorv32_cfu_cmd6`: `funct7` is set to all-zero, `rs1` is given
the literal _2751_ and `rs2` is given a variable that contains the return value from `some_function()`.
 
.CFU instruction usage example
[source,c]
----
uint32_t opb = some_function();
uint32_t res = neorv32_cfu_cmd6(0b0000000, 2751, opb);
----
 
.CFU Example Program
[TIP]
There is a simple example program for the CFU, which shows how to use the _default_ CFU hardware module.
The example program is located in `sw/example/demo_cfu`.
 
 
:sectnums:
==== Custom Instructions Hardware
 
The actual functionality of the CFU's custom instructions is defined by the logic in the CFU itself.
It is the responsibility of the designer to implement this logic within the CFU hardware module
`rtl/core/neorv32_cpu_cp_cfu.vhd`.
 
The CFU hardware module receives the data from the instruction word's immediate bit-fields and also
the operand data, which is fetched from the CPU's register file.
 
.CFU instruction data passing example
[source,c]
----
uint32_t opb = 0x12345678;
uint32_t res = neorv32_cfu_cmd6(0b0100111, 0x00cafe00, opb);
----
 
In this example the CFU hardware module receives the two source operands as 32-bit signals
and the immediate values as 7-bit and 3-bit signals:
 
* `rs1_i` (32-bit) contains the data from the `rs1` register (here = `0x00cafe00`)
* `rs2_i` (32-bit) contains the data from the `rs2` register (here = `0x12345678`)
* `control.funct3` (3-bit) contains the immediate value from the `funct3` bit-field (here = `0b110`; "cmd6")
* `control.funct7` (7-bit) contains the immediate value from the `funct7` bit-field (here = `0b0100111`)
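
How the four inputs might be combined can be sketched as a behavioral model in C. The operations below are
purely hypothetical and do not describe the default CFU module: `cfu_model` is an illustrative name, `funct3`
selects the operation, and the lower five bits of `funct7` are interpreted as a shift amount.

[source,c]
----
#include <stdint.h>

// Hypothetical behavioral model of a combinatorial CFU.
uint32_t cfu_model(uint32_t funct7, uint32_t funct3, uint32_t rs1, uint32_t rs2) {
  const uint32_t shamt = funct7 & 0x1fu; // immediate used as shift amount
  switch (funct3 & 0x7u) {
    case 0:  return rs1 + rs2;            // "cmd0": plain addition
    case 1:  return rs1 ^ rs2;            // "cmd1": bit-wise XOR
    case 6:  return (rs1 << shamt) ^ rs2; // "cmd6": shift rs1, then XOR with rs2
    default: return 0;                    // unused operations return zero
  }
}
----

With the values from the example above, `cfu_model(0x27, 6, 0x00cafe00, 0x12345678)` shifts `rs1` left by
7 bits and returns `0x774b5678`.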
 
The CFU executes the corresponding instruction (selected, for example, by the `control.funct3` signal)
and provides the operation result in the 32-bit `control.result` signal. The processing can be entirely
combinatorial, so the result is available at the end of the current clock cycle. Processing can also
take several clock cycles and may also include internal states and memories. As soon as the CFU has
completed its operation, it sets the `control.done` signal high.
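
The multi-cycle case can be illustrated with a cycle-based software model. The sketch below is an assumption-laden
illustration, not a description of the real handshake: the type `cfu_state_t` and the functions `cfu_clock` and
`cfu_execute` are hypothetical, the operation (a bit-serial population count) is made up, and `done` is modeled
as a one-cycle pulse.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  bool     busy;        // operation in progress
  uint32_t cycles_left; // remaining processing cycles
  uint32_t operand;     // latched copy of rs1
  uint32_t result;      // models control.result
  bool     done;        // models control.done
} cfu_state_t;

// Advance the model by one clock cycle.
void cfu_clock(cfu_state_t *s, bool start, uint32_t rs1) {
  s->done = false;
  if (!s->busy) {
    if (start) { // latch the operand and begin processing
      s->busy = true;
      s->operand = rs1;
      s->result = 0;
      s->cycles_left = 32;
    }
    return;
  }
  s->result += s->operand & 1u; // count one bit per clock
  s->operand >>= 1;
  if (--s->cycles_left == 0) {
    s->busy = false;
    s->done = true;
  }
}

// CPU-side view: the pipeline is stalled until done is set.
// Returns the number of clock cycles spent on the instruction.
uint32_t cfu_execute(cfu_state_t *s, uint32_t rs1, uint32_t *result) {
  uint32_t cycles = 1;
  cfu_clock(s, true, rs1);
  while (!s->done) {
    cfu_clock(s, false, 0);
    cycles++;
  }
  *result = s->result;
  return cycles;
}
----

In this model every instruction occupies the CPU for 33 cycles (one to start, 32 to process), which shows
why long-running CFU operations directly add to the stall time of the pipeline.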
 
.CFU Hardware Example & More Details
[TIP]
The default CFU module already implements some example instructions that are used for illustration
by the CFU example program. See the CFU's VHDL source file (`rtl/core/neorv32_cpu_cp_cfu.vhd`), which
is highly commented to explain the available signals and the handshake with the CPU pipeline.
 
.CFU Execution Time
[NOTE]
The CFU is not required to finish processing within a bounded time.
However, the designer should keep in mind that the CPU is **stalled** until the CFU has finished processing.
This also means the CPU cannot react to pending interrupts. Nevertheless, interrupt requests will still be queued.
