<<<
:sectnums:
=== Custom Functions Unit (CFU)

The Custom Functions Unit is the central part of the <<_zxcfu_custom_instructions_extension_cfu>> and represents
the actual hardware module, which is used to implement _custom RISC-V instructions_. The concept of the NEORV32
CFU has been highly inspired by https://github.com/google/CFU-Playground[Google's CFU-Playground].

The CFU is intended for operations that are inefficient in terms of performance, latency, energy consumption or
program memory requirements when implemented in pure software. Some potential application fields and exemplary
use-cases might include:

* **AI:** sub-word / vector / SIMD operations like adding all four bytes of a 32-bit data word
* **Cryptographic:** bit substitution and permutation
* **Communication:** conversions like binary to gray-code
* **Image processing:** look-up-tables for color space transformations
* implementing instructions from other RISC-V ISA extensions that are not yet supported by the NEORV32

[NOTE]
The CFU is not intended for complex and autonomous functional units that implement complete accelerators
(like block-based AES de-/encoding). Such accelerators can be implemented within the <<_custom_functions_subsystem_cfs>>.
A comparison of all chip-internal hardware extension options is provided in the user guide section
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules].

:sectnums:
==== Custom CFU Instructions - General

The custom instructions utilize a specific instruction space that has been explicitly reserved for user-defined
extensions by the RISC-V specifications ("_Guaranteed Non-Standard Encoding Space_"). The NEORV32 CFU uses the
_CUSTOM0_ opcode to identify custom instructions.
The binary encoding of this opcode is `0001011`.

The custom instructions processed by the CFU use the 32-bit **R2-type** RISC-V instruction format, which consists
of six bit-fields:

* `funct7`: 7-bit immediate
* `rs2`: address of second source register
* `rs1`: address of first source register
* `funct3`: 3-bit immediate
* `rd`: address of destination register
* `opcode`: always `0001011` to identify custom instructions

.CFU instruction format (RISC-V R2-type)
image::cfu_r2type_instruction.png[align=center]

[NOTE]
All bit-fields, including the immediates, have to be static at compile time.

.Custom Instructions - Exceptions
[NOTE]
The CPU control logic can only check the _CUSTOM0_ opcode of the custom instructions to check if the
instruction word is valid. It cannot check the `funct3` and `funct7` bit-fields since they are
implementation-defined. Hence, a custom CFU instruction can never raise an illegal instruction exception
as long as the CFU is implemented. If the CFU is not enabled/implemented (i.e. the `Zxcfu` ISA extension
is not enabled), custom instructions will raise an illegal instruction exception.

The CFU operates on the two source operands and returns the processing result to the destination register.
The actual instruction to be performed can be defined by using the `funct7` and `funct3` bit-fields.
These immediate bit-fields can also be used to pass additional data to the CFU like offsets, look-up-table
addresses or shift-amounts. However, the actual functionality is completely user-defined.

:sectnums:
==== Using Custom Instructions in Software

The custom instructions provided by the CFU are included in plain C code by using **intrinsics**.
Intrinsics behave like "normal" functions, but under the hood they are a set of macros that hide the complexity of inline assembly.
Using such intrinsics removes the need to modify the compiler, built-in libraries and the assembler when including custom
instructions.

The NEORV32 software framework provides 8 pre-defined custom instruction macros, which are defined in
`sw/lib/include/neorv32_cpu_cfu.h`. Each intrinsic provides an implicit definition of the instruction word's
`funct3` bit-field:

.CFU instruction prototypes
[source,c]
----
neorv32_cfu_cmd0(funct7, rs1, rs2) // funct3 = 000
neorv32_cfu_cmd1(funct7, rs1, rs2) // funct3 = 001
neorv32_cfu_cmd2(funct7, rs1, rs2) // funct3 = 010
neorv32_cfu_cmd3(funct7, rs1, rs2) // funct3 = 011
neorv32_cfu_cmd4(funct7, rs1, rs2) // funct3 = 100
neorv32_cfu_cmd5(funct7, rs1, rs2) // funct3 = 101
neorv32_cfu_cmd6(funct7, rs1, rs2) // funct3 = 110
neorv32_cfu_cmd7(funct7, rs1, rs2) // funct3 = 111
----

Each intrinsic function always returns a 32-bit value (the processing result). Furthermore,
each intrinsic function requires three arguments:

* `funct7` - 7-bit immediate
* `rs1` - source operand 1, 32-bit
* `rs2` - source operand 2, 32-bit

The `funct7` bit-field is used to pass a 7-bit literal to the CFU. The `rs1` and `rs2` arguments pass the
actual data to the CFU. These arguments can be populated with variables or literals.
The following example
shows how to pass arguments when executing `neorv32_cfu_cmd6`: `funct7` is set to all-zero, `rs1` is given
the literal _2751_ and `rs2` is given a variable that contains the return value from `some_function()`.

.CFU instruction usage example
[source,c]
----
uint32_t opb = some_function();
uint32_t res = neorv32_cfu_cmd6(0b0000000, 2751, opb);
----

.CFU Example Program
[TIP]
There is a simple example program for the CFU, which shows how to use the _default_ CFU hardware module.
The example program is located in `sw/example/demo_cfu`.

:sectnums:
==== Custom Instructions Hardware

The actual functionality of the CFU's custom instructions is defined by the logic in the CFU itself.
It is the responsibility of the designer to implement this logic within the CFU hardware module
`rtl/core/neorv32_cpu_cp_cfu.vhd`.

The CFU hardware module receives the data from the instruction word's immediate bit-fields and also
the operand data, which is fetched from the CPU's register file.

.CFU instruction data passing example
[source,c]
----
uint32_t opb = 0x12345678;
uint32_t res = neorv32_cfu_cmd6(0b0100111, 0x00cafe00, opb);
----

In this example the CFU hardware module receives the two source operands as 32-bit signals
and the immediate values as 7-bit and 3-bit signals:

* `rs1_i` (32-bit) contains the data from the `rs1` register (here = `0x00cafe00`)
* `rs2_i` (32-bit) contains the data from the `rs2` register (here = `0x12345678`)
* `control.funct3` (3-bit) contains the immediate value from the `funct3` bit-field (here = `0b110`; "cmd6")
* `control.funct7` (7-bit) contains the immediate value from the `funct7` bit-field (here = `0b0100111`)

The CFU executes the corresponding instruction (selected, for example, by the `control.funct3` signal)
and provides the operation result in the 32-bit `control.result` signal. The processing can be entirely
combinatorial, so the result is available at the end of the current clock cycle.
Processing can also
take several clock cycles and may also include internal states and memories. As soon as the CFU has
completed its operation it sets the `control.done` signal high.

.CFU Hardware Example & More Details
[TIP]
The default CFU module already implements some exemplary instructions that are used for illustration
by the CFU example program. See the CFU's VHDL source file (`rtl/core/neorv32_cpu_cp_cfu.vhd`), which
is highly commented to explain the available signals and the handshake with the CPU pipeline.

.CFU Execution Time
[NOTE]
The CFU is not required to finish processing within a bounded time.
However, the designer should keep in mind that the CPU is **stalled** until the CFU has finished processing.
This also means the CPU cannot react to pending interrupts. Nevertheless, interrupt requests will still be queued.
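
To make the instruction format and the immediates received by the CFU more tangible, the following minimal sketch
packs the six bit-fields into a 32-bit instruction word and extracts them again. It assumes the standard RISC-V
R-type field positions (`funct7` in bits 31:25, `rs2` in 24:20, `rs1` in 19:15, `funct3` in 14:12, `rd` in 11:7,
`opcode` in 6:0). The helper names `cfu_encode`, `cfu_get_funct7`, `cfu_get_funct3` and `cfu_get_opcode` are
hypothetical and are not part of the NEORV32 software framework, whose intrinsics handle the encoding internally.

```c
#include <stdint.h>

/* CUSTOM0 opcode used by the NEORV32 CFU (binary 0001011). */
#define CFU_OPCODE_CUSTOM0 0x0Bu

/* Hypothetical helper (not part of the NEORV32 library): packs the six
 * bit-fields into one instruction word, assuming the standard RISC-V
 * R-type layout. Each field is masked to its width before shifting. */
static inline uint32_t cfu_encode(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                                  uint32_t funct3, uint32_t rd)
{
    return ((funct7 & 0x7Fu) << 25) |  /* 7-bit immediate            */
           ((rs2    & 0x1Fu) << 20) |  /* second source register     */
           ((rs1    & 0x1Fu) << 15) |  /* first source register      */
           ((funct3 & 0x07u) << 12) |  /* 3-bit immediate            */
           ((rd     & 0x1Fu) <<  7) |  /* destination register       */
           CFU_OPCODE_CUSTOM0;         /* identifies a custom instr. */
}

/* Hypothetical helpers: extract the fields a decoder would inspect. */
static inline uint32_t cfu_get_funct7(uint32_t instr) { return (instr >> 25) & 0x7Fu; }
static inline uint32_t cfu_get_funct3(uint32_t instr) { return (instr >> 12) & 0x07u; }
static inline uint32_t cfu_get_opcode(uint32_t instr) { return instr & 0x7Fu; }
```

For instance, `cfu_encode(0x27, 11, 10, 6, 10)` (that is `funct7` = `0b0100111`, `funct3` = `0b110`, with `x11`,
`x10` and `x10` chosen arbitrarily as `rs2`, `rs1` and `rd`) yields `0x4EB5650B`. In real code the compiler selects
the registers; only `funct7` and `funct3` are fixed by the chosen intrinsic and its first argument.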
