<<<
:sectnums:
== Adding Custom Hardware Modules

Like the RISC-V ISA itself, the NEORV32 processor is designed for easy customization and _extensibility_.
The processor provides several predefined options to add application-specific custom hardware modules and accelerators.
A <<_comparative_summary>> is given at the end of this section.


.Debugging/Testing Custom Hardware Modules
[TIP]
Custom hardware IP modules connected via the external bus interface or integrated as CFU can be debugged "in-system" using the
"bus explorer" example program (`sw/example_bus_explorer`). This program provides an interactive console (via UART0)
that allows arbitrary read and write accesses to any memory-mapped register.


=== Standard (_External_) Interfaces

The processor already provides a set of standard interfaces that are intended to connect _chip-external_ devices.
However, these interfaces can also be used chip-internally. The most suitable interfaces are
https://stnolting.github.io/neorv32/#_general_purpose_input_and_output_port_gpio[GPIO],
https://stnolting.github.io/neorv32/#_primary_universal_asynchronous_receiver_and_transmitter_uart0[UART],
https://stnolting.github.io/neorv32/#_serial_peripheral_interface_controller_spi[SPI] and
https://stnolting.github.io/neorv32/#_two_wire_serial_interface_controller_twi[TWI].

The SPI and especially the GPIO interfaces might be the most straightforward approaches since they
have minimal protocol overhead. Device-specific interrupt capabilities could be added using the
https://stnolting.github.io/neorv32/#_external_interrupt_controller_xirq[External Interrupt Controller (XIRQ)].

Despite their simplicity, these interfaces provide only very limited bandwidth and require more elaborate
software handling ("bit-banging" for the GPIO). Hence, using them for _chip-internal_ communication is not recommended.
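
To illustrate the software overhead of bit-banging, the following minimal sketch shifts one byte out over two GPIO pins.
The pin assignment (`PIN_CLK`, `PIN_DATA`) and the function name are hypothetical and not part of the NEORV32 software framework.
The output port is passed as a plain pointer so the code can also be compiled and tried on a host machine.

```c
#include <assert.h>
#include <stdint.h>

#define PIN_CLK  (1u << 0) /* hypothetical pin assignment: serial clock */
#define PIN_DATA (1u << 1) /* hypothetical pin assignment: serial data  */

/* Shift one byte out MSB-first; returns the number of port writes issued. */
static unsigned bitbang_send_byte(volatile uint32_t *gpio_out, uint8_t byte) {
  assert(gpio_out != 0);
  unsigned writes = 0;
  /* Start with both serial pins low, keep all other port bits untouched. */
  uint32_t port = *gpio_out & ~(PIN_CLK | PIN_DATA);
  for (int i = 7; i >= 0; i--) {
    if ((byte >> i) & 1u) {
      port |= PIN_DATA;
    } else {
      port &= ~PIN_DATA;
    }
    *gpio_out = port;           writes++; /* apply data bit, clock low */
    *gpio_out = port | PIN_CLK; writes++; /* rising clock edge */
    *gpio_out = port;           writes++; /* falling clock edge */
  }
  return writes;
}
```

In this sketch every transmitted byte costs 24 port writes (three per bit), which is why such interfaces offer only limited bandwidth.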


=== External Bus Interface

The https://stnolting.github.io/neorv32/#_processor_external_memory_interface_wishbone_axi4_lite[External Bus Interface]
provides the classic approach for attaching custom IP. By default, the bus interface implements the widely adopted
Wishbone interface standard. This project also includes wrappers to convert to other protocol standards like ARM's
AXI4-Lite or Intel's Avalon protocols. By using a full-featured bus protocol, complex SoC designs can be implemented
including several modules and even multi-core architectures. Many FPGA EDA tools provide graphical editors to build
and customize whole SoC architectures and even include pre-defined IP libraries.

.Example AXI SoC using Xilinx Vivado
image::neorv32_axi_soc.png[]

Custom hardware modules attached to the processor's bus interface have no limitations regarding their functionality.
User-defined interfaces (like DDR memory access) can be implemented and the hardware module can operate completely
independently of the CPU.

The bus interface uses a memory-mapped approach. All data transfers are handled by simple load/store operations since the
external bus interface is mapped into the processor's https://stnolting.github.io/neorv32/#_address_space[address space].
This allows for simple yet high-bandwidth communication. However, high bus traffic may increase access latencies.
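
The memory-mapped access scheme can be sketched in C as plain loads and stores through a `volatile` register structure.
The register layout and all names (`my_ip_regs_t`, `my_ip_write`, `my_ip_read`) are hypothetical and only serve as illustration.
On the target, the pointer would refer to the module's base address inside the external bus address range (not specified here);
on a host machine, an ordinary variable can stand in for the device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout of a bus-attached custom module. */
typedef struct {
  volatile uint32_t ctrl; /* control register, offset 0 */
  volatile uint32_t data; /* data register, offset 4 */
} my_ip_regs_t;

/* A plain store; on the target this becomes one bus write transaction. */
static void my_ip_write(my_ip_regs_t *dev, uint32_t value) {
  assert(dev != 0);
  dev->data = value;
}

/* A plain load; on the target this becomes one bus read transaction. */
static uint32_t my_ip_read(my_ip_regs_t *dev) {
  assert(dev != 0);
  return dev->data;
}
```

The `volatile` qualifier keeps the compiler from caching or removing the accesses, so every call results in an actual load or store.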


=== Stream Link Interface

The link:++https://stnolting.github.io/neorv32/#_stream_link_interface_slink++[Stream Link Interface (SLINK)] provides a
point-to-point, unidirectional and parallel data interface that can be used to transfer _streaming_ data. In
contrast to the external bus interface, the streaming interface does not provide any kind of advanced control,
so it can be seen as "constant address bursts" where data is transmitted _sequentially_ (no random accesses).
While the CPU needs to "feed" the stream link interfaces with data (and read back incoming data), the actual
processor-external processing of the data runs independently of the CPU.

The stream link interface has less protocol overhead and lower latency than the bus interface. Furthermore,
FIFOs can be configured for each direction (RX/TX) to allow more CPU-independent operation.
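
The effect of a link FIFO can be illustrated with a small behavioral model in C. This is only a sketch of the
sequential, first-in first-out semantics; it is not the SLINK hardware or its driver API, and the depth
(`LINK_FIFO_DEPTH`) as well as all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINK_FIFO_DEPTH 4u /* hypothetical FIFO depth */

typedef struct {
  uint32_t buf[LINK_FIFO_DEPTH];
  unsigned head;  /* index of the oldest word */
  unsigned count; /* number of words currently stored */
} link_fifo_t;

/* Producer side: append one word; returns false if the FIFO is full. */
static bool link_put(link_fifo_t *f, uint32_t word) {
  assert(f != 0);
  if (f->count == LINK_FIFO_DEPTH) {
    return false;
  }
  f->buf[(f->head + f->count) % LINK_FIFO_DEPTH] = word;
  f->count++;
  return true;
}

/* Consumer side: remove the oldest word; returns false if the FIFO is empty. */
static bool link_get(link_fifo_t *f, uint32_t *word) {
  assert(f != 0 && word != 0);
  if (f->count == 0) {
    return false;
  }
  *word = f->buf[f->head];
  f->head = (f->head + 1) % LINK_FIFO_DEPTH;
  f->count--;
  return true;
}
```

Words leave the model in exactly the order they were written (no random access), and the producer can queue up to
`LINK_FIFO_DEPTH` words before it has to wait for the consumer.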


=== Custom Functions Subsystem

The https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem (CFS)] is
an "empty" template for a memory-mapped, processor-internal module.

The basic idea of this subsystem is to provide a convenient, simple and flexible platform, where the user can
concentrate on implementing the actual design logic rather than taking care of the communication between the
CPU/software and the design logic. Note that the CFS does not have direct access to memory. All data (and control
instructions) have to be sent by the CPU.

Typical use cases for the CFS are medium-scale hardware accelerators that need to be tightly coupled to the CPU,
for example DSP modules like CORDIC, cryptographic accelerators or custom interfaces (like IIS).


=== Custom Functions Unit

The https://stnolting.github.io/neorv32/#_custom_functions_unit_cfu[Custom Functions Unit (CFU)] is a functional
unit that is integrated right into the CPU's pipeline. It allows implementing custom RISC-V instructions.
This extension option is intended for rather small logic that implements operations that cannot be emulated
efficiently in pure software. Since the CFU has direct access to the core's register file it can operate
with minimal data latency.
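
As a rough illustration of what a custom instruction looks like at the machine-code level, the following sketch
assembles a 32-bit instruction word in the standard RISC-V R-type format. It assumes the _custom-0_ major opcode
(`0b0001011`) from the RISC-V base opcode map; which opcode and function fields a particular CFU configuration
actually decodes is an assumption here and should be checked against the NEORV32 data sheet. The helper name
`encode_r_type` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OPCODE_CUSTOM0 0x0Bu /* 0b0001011: "custom-0" in the RISC-V base opcode map */

/* Pack the R-type fields: funct7 | rs2 | rs1 | funct3 | rd | opcode. */
static uint32_t encode_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                              uint32_t funct3, uint32_t rd, uint32_t opcode) {
  assert(funct7 < 128 && rs2 < 32 && rs1 < 32);
  assert(funct3 < 8 && rd < 32 && opcode < 128);
  return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
         (funct3 << 12) | (rd << 7) | opcode;
}
```

For example, `encode_r_type(0, 11, 10, 0, 12, OPCODE_CUSTOM0)` describes a custom operation that reads `x10` and `x11`
and writes `x12`. With the regular `OP` opcode (`0x33`) the same fields yield `0x00B50633`, the encoding of `add x12, x10, x11`,
which is a convenient sanity check for the field positions.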


=== Comparative Summary

The following table gives a comparative summary of the most important factors when choosing one of the
chip-internal extension options:

* https://stnolting.github.io/neorv32/#_custom_functions_unit_cfu[Custom Functions Unit] for CPU-internal custom RISC-V instructions
* https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem] for tightly coupled processor-internal co-processors
* https://stnolting.github.io/neorv32/#_stream_link_interface_slink[Stream Link Interface] for processor-external streaming modules
* https://stnolting.github.io/neorv32/#_processor_external_memory_interface_wishbone_axi4_lite[External Bus Interface] for processor-external memory-mapped modules

.Comparison of On-Chip Extension Options
[cols="<1,^1,^1,^1,^1"]
[options="header",grid="rows"]
|=======================
| | Custom Functions Unit | Custom Functions Subsystem | Stream Link Interface | External Bus Interface
| **SoC location** | CPU-internal | processor-internal | processor-external | processor-external
| **HW complexity/size** | small | medium | unlimited | unlimited
| **CPU-independent operation** | no | partly | partly | completely
| **CPU interface** | register-file access | memory-mapped | memory-mapped | memory-mapped
| **Low-level CPU access scheme** | custom instructions | load/store | load/store | load/store
| **Random access** | - | yes | no, only sequential | yes
| **Access latency** | minimal | low | low | medium to high
| **External IO interfaces** | no | yes, but limited | yes | yes
| **Interrupt-capable** | no | yes | yes | user-defined
|=======================