<<<
:sectnums:
== Application-Specific Processor Configuration

Due to the processor's configuration options, which are mainly defined via the top entity VHDL generics, the SoC
can be tailored to application-specific requirements. Note that this chapter does not focus on optional
_SoC features_ like IO/peripheral modules. Instead, it gives ideas on how to optimize for _overall goals_
like performance and area.

[NOTE]
Please keep in mind that optimizing the design in one direction (like performance) will also affect other
optimization goals (like area and energy).

=== Optimize for Performance

The following points show some concepts to optimize the processor for performance regardless of the costs
(i.e. increased area and energy requirements):

* Enable all performance-related RISC-V CPU extensions that implement dedicated hardware accelerators instead
of emulating operations entirely in software: `M`, `C`, `Zfinx`
* Enable mapping of complex CPU operations to dedicated hardware: `FAST_MUL_EN => true` to use DSP slices for
multiplications, `FAST_SHIFT_EN => true` to use a fast barrel shifter for shift operations.
* Implement the instruction cache: `ICACHE_EN => true`
* Use as much _internal_ memory as possible to reduce memory access latency: `MEM_INT_IMEM_EN => true` and
`MEM_INT_DMEM_EN => true`, maximize `MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE`
* Increase the CPU's instruction prefetch buffer size: `CPU_IPB_ENTRIES`
* _To be continued..._
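
The points above could be collected in the `generic map` of the processor instantiation roughly as follows. This is a sketch, not a reference configuration: it assumes the top entity is `neorv32_top`, assumes the `M`, `C` and `Zfinx` extensions are selected via generics following the `CPU_EXTENSION_RISCV_<x>` pattern used for `CPU_EXTENSION_RISCV_E` and `CPU_EXTENSION_RISCV_C` in this chapter, and uses placeholder values for the buffer and memory sizes.

```vhdl
-- Excerpt of a performance-oriented generic map for an (assumed) neorv32_top
-- instance. The port map and all remaining generics are omitted.
generic map (
  -- hardware-accelerated ISA extensions (generic names assumed)
  CPU_EXTENSION_RISCV_M     => true,
  CPU_EXTENSION_RISCV_C     => true,
  CPU_EXTENSION_RISCV_Zfinx => true,
  -- map complex operations to dedicated hardware
  FAST_MUL_EN               => true,     -- multiplier on DSP slices
  FAST_SHIFT_EN             => true,     -- fast barrel shifter
  -- larger instruction prefetch buffer (placeholder, power of two assumed)
  CPU_IPB_ENTRIES           => 4,
  -- instruction cache
  ICACHE_EN                 => true,
  -- processor-internal memories, as large as the target device allows
  MEM_INT_IMEM_EN           => true,
  MEM_INT_IMEM_SIZE         => 64*1024,  -- bytes, placeholder value
  MEM_INT_DMEM_EN           => true,
  MEM_INT_DMEM_SIZE         => 64*1024   -- bytes, placeholder value
)
```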


=== Optimize for Size

The NEORV32 is a size-optimized processor system that is intended to fit into tiny niches within large SoC
designs or to be used as a customized microcontroller in really tiny / low-power FPGAs (like Lattice iCE40).
Here are some ideas on how to make the processor even smaller while maintaining its _general purpose system_
concept and maximum RISC-V compatibility.

**SoC**

* This is obvious, but exclude all unused optional IO/peripheral modules from synthesis via the processor
configuration generics.
* If an IO module provides an option to configure the number of "channels", constrain this number to the
actually required value (e.g. the PWM module `IO_PWM_NUM_CH` or the external interrupt controller `XIRQ_NUM_CH`).
* Reduce the FIFO sizes of implemented modules (e.g. `SLINK_TX_FIFO`).
* Disable the instruction cache (`ICACHE_EN => false`) if the design only uses processor-internal IMEM
and DMEM memories.
* _To be continued..._
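
As a hedged illustration of the SoC points above, a trimmed `generic map` excerpt might look like this. The channel counts and the FIFO depth are arbitrary example values, and the module enable generics `IO_SPI_EN` and `IO_TWI_EN` are assumed names used only to show how unused modules are excluded; check them against the actual top entity generic list.

```vhdl
-- Excerpt of a size-trimmed generic map for an (assumed) neorv32_top instance.
-- The port map and all remaining generics are omitted; values are examples.
generic map (
  -- exclude unused optional modules from synthesis (generic names assumed)
  IO_SPI_EN       => false,
  IO_TWI_EN       => false,
  -- implement only as many channels as the application actually needs
  IO_PWM_NUM_CH   => 2,
  XIRQ_NUM_CH     => 4,
  -- small stream link TX FIFO (placeholder depth)
  SLINK_TX_FIFO   => 2,
  -- program executes from internal IMEM/DMEM only, so no instruction cache
  MEM_INT_IMEM_EN => true,
  MEM_INT_DMEM_EN => true,
  ICACHE_EN       => false
)
```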

**CPU**

* Use the _embedded_ RISC-V CPU architecture extension (`CPU_EXTENSION_RISCV_E`) to reduce block RAM utilization
(16 instead of 32 general-purpose registers).
* The compressed instructions extension (`CPU_EXTENSION_RISCV_C`) requires additional logic for the decoder but
also reduces program code size by approximately 30%.
* If not explicitly used/required, constrain the CPU's counter sizes: `CPU_CNT_WIDTH` for `[m]instret[h]`
(number of instructions) and `[m]cycle[h]` (number of cycles) counters. You can even remove these counters
by setting `CPU_CNT_WIDTH => 0` if they are not used at all (note that this is not RISC-V compliant).
* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`).
* Map CPU shift operations to a small and iterative shifter unit (`FAST_SHIFT_EN => false`).
* If you have unused DSP blocks available, you can map multiplication operations to those slices instead of
using LUTs to implement the multiplier (`FAST_MUL_EN => true`).
* If there is no need to execute division in hardware, use the `Zmmul` extension instead of the full-scale
`M` extension.
* Disable CPU extensions that are not explicitly used (`A`, `U`, `Zfinx`).
* _To be continued..._
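
The CPU-related points above might translate into the following `generic map` excerpt. This is a sketch under stated assumptions: the generics for `M`, `Zmmul`, `A`, `U` and `Zfinx` are assumed to follow the same `CPU_EXTENSION_RISCV_<x>` naming as `CPU_EXTENSION_RISCV_E` and `CPU_EXTENSION_RISCV_C`, and the counter width and prefetch buffer size are placeholder values.

```vhdl
-- Excerpt of a size-optimized CPU configuration for an (assumed) neorv32_top
-- instance. The port map and all remaining generics are omitted.
generic map (
  CPU_EXTENSION_RISCV_E     => true,   -- 16 instead of 32 registers
  CPU_EXTENSION_RISCV_C     => true,   -- more decoder logic, smaller code
  CPU_EXTENSION_RISCV_M     => false,  -- no hardware division required ...
  CPU_EXTENSION_RISCV_Zmmul => true,   -- ... so implement multiplication only
  CPU_EXTENSION_RISCV_A     => false,  -- unused extensions disabled
  CPU_EXTENSION_RISCV_U     => false,
  CPU_EXTENSION_RISCV_Zfinx => false,
  CPU_CNT_WIDTH             => 32,     -- narrower [m]cycle[h]/[m]instret[h]
  CPU_IPB_ENTRIES           => 2,      -- small prefetch buffer (placeholder)
  FAST_SHIFT_EN             => false,  -- small iterative shifter
  FAST_MUL_EN               => true    -- only if spare DSP blocks exist
)
```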

=== Optimize for Clock Speed

The NEORV32 Processor and CPU are designed to provide minimal logic between register stages to keep the
critical path as short as possible. When additional extensions or modules are enabled, the impact on the existing
logic is also kept at a minimum to prevent timing degradation. If there is a major impact on existing
logic (example: many physical memory protection address configuration registers), the VHDL code automatically
adds additional register stages to maintain the critical path length. Obviously, this increases operation latency.

In order to optimize for a minimal critical path (= maximum clock speed), the following points should be considered:

* Complex CPU extensions (in terms of hardware requirements) should be avoided (examples: floating-point unit, physical memory protection).
* Large carry chains (>32-bit) should be avoided (constrain the CPU and HPM counter sizes: e.g. `CPU_CNT_WIDTH => 32` and `HPM_CNT_WIDTH => 32`).
* If the target FPGA provides sufficient DSP resources, CPU multiplication operations can be mapped to DSP slices (`FAST_MUL_EN => true`),
reducing LUT usage and critical path impact while also increasing overall performance.
* Use the synchronous (registered) RX path configuration of the external memory interface (`MEM_EXT_ASYNC_RX => false`).
* _To be continued..._
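
A timing-oriented configuration following the points above could look roughly like this. It is a sketch, not a verified setup: `CPU_EXTENSION_RISCV_Zfinx` (floating-point unit), `PMP_NUM_REGIONS` (physical memory protection) and `HPM_CNT_WIDTH` (hardware performance monitor counter width) are assumed generic names, and the values only illustrate the direction of each setting.

```vhdl
-- Excerpt of a timing-oriented generic map for an (assumed) neorv32_top
-- instance. The port map and all remaining generics are omitted.
generic map (
  CPU_EXTENSION_RISCV_Zfinx => false,  -- no floating-point unit (name assumed)
  PMP_NUM_REGIONS           => 0,      -- no physical memory protection (name assumed)
  CPU_CNT_WIDTH             => 32,     -- no carry chains wider than 32 bits
  HPM_CNT_WIDTH             => 32,     -- HPM counter width (name assumed)
  FAST_MUL_EN               => true,   -- multiplier on DSP slices
  MEM_EXT_ASYNC_RX          => false   -- registered external memory RX path
)
```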

[NOTE]
The short and fixed-length critical path allows the core to be integrated into existing clock domains,
so no clock domain crossing and no sub-clock generation is required. However, for very high clock
frequencies (this is technology / platform dependent), clock domain crossing becomes crucial for chip-internal
connections.


=== Optimize for Energy

There are no _dedicated_ configuration options to optimize the processor for energy (minimal consumption;
energy/instruction ratio) yet. However, a reduced processor area (<<_optimize_for_size>>) will also reduce
static energy consumption.

To optimize your setup for low-power applications, you can make use of the CPU sleep mode (`wfi` instruction).
Put the CPU into sleep mode whenever possible. Disable all processor modules that are not actually used (exclude them
from synthesis if they will _never_ be used; disable the module via its control register if the module is not
_currently_ used). When in sleep mode, you can keep a timer module running (MTIME or the watchdog) to wake up
the CPU again. Since the wake-up is triggered by _any_ interrupt, the external interrupt controller can also
be used to wake up the CPU. This way, all timers (and all other modules) can be deactivated as well.
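
On the hardware side, the idea above amounts to excluding modules that are never used while keeping at least one interrupt source that can end the `wfi` sleep. The following excerpt is a hedged sketch: `IO_MTIME_EN`, `IO_WDT_EN`, `IO_SPI_EN` and `IO_TWI_EN` are assumed generic names, and which wake-up source to keep depends on the application.

```vhdl
-- Excerpt of a low-power generic map for an (assumed) neorv32_top instance:
-- never-used modules are excluded from synthesis, wake-up sources remain.
-- The port map and all remaining generics are omitted.
generic map (
  IO_MTIME_EN   => true,   -- machine timer interrupt can wake the CPU (name assumed)
  XIRQ_NUM_CH   => 1,      -- external interrupt line as alternative wake-up source
  IO_WDT_EN     => false,  -- watchdog not needed as wake-up source here (name assumed)
  IO_PWM_NUM_CH => 0,      -- never used: excluded from synthesis
  IO_SPI_EN     => false,  -- never used (name assumed)
  IO_TWI_EN     => false   -- never used (name assumed)
)
```

Modules that are synthesized but only _temporarily_ idle are still disabled at runtime via their control registers, as described above.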
101
 
102
.Processor-internal clock generator shutdown
103
[TIP]
104
If _no_ IO/peripheral module is currently enabled, the processor's internal clock generator circuit will be
105
shut down reducing switching activity and thus, dynamic energy consumption.
