<<<
:sectnums:
==== Processor-Internal Instruction Cache (iCACHE)

[cols="<3,<3,<4"]
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_icache.vhd |
| Software driver file(s): | none | _implicitly used_
| Top entity port: | none |
| Configuration generics: | _ICACHE_EN_ | implement processor-internal instruction cache when _true_
| | _ICACHE_NUM_BLOCKS_ | number of cache blocks (pages/lines)
| | _ICACHE_BLOCK_SIZE_ | size of a cache block in bytes
| | _ICACHE_ASSOCIATIVITY_ | associativity / number of sets
| CPU interrupts: | none |
|=======================

[NOTE]
The default `neorv32_icache.vhd` HDL source file provides a _generic_ memory design that infers embedded
memory. You might need to replace/modify the source file in order to use platform-specific features
(like advanced memory resources) or to improve technology mapping and/or timing.
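The following fragment is only a hypothetical sketch of what such a generic, tool-inferable memory
description typically looks like (the entity, port and signal names here are made up for illustration);
it is _not_ the memory implementation actually used by the cache.

[source,vhdl]
----
-- Hypothetical sketch: a plain synchronous memory that synthesis tools can
-- usually infer as embedded block RAM; names/sizes are illustrative only.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity generic_mem_sketch is
  port (
    clk_i   : in  std_logic;                      -- global clock
    we_i    : in  std_logic;                      -- write enable
    addr_i  : in  std_logic_vector(7 downto 0);   -- word address
    wdata_i : in  std_logic_vector(31 downto 0);  -- write data
    rdata_o : out std_logic_vector(31 downto 0)   -- read data (1 cycle latency)
  );
end entity;

architecture rtl of generic_mem_sketch is
  type mem_t is array (0 to 255) of std_logic_vector(31 downto 0);
  signal mem : mem_t;
begin
  mem_access: process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (we_i = '1') then
        mem(to_integer(unsigned(addr_i))) <= wdata_i; -- synchronous write
      end if;
      rdata_o <= mem(to_integer(unsigned(addr_i)));   -- synchronous read
    end if;
  end process;
end architecture;
----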

The processor features an optional cache for instructions to compensate for memories with high latency. The
cache is directly connected to the CPU's instruction fetch interface and provides fully transparent buffering
of instruction fetch accesses to the entire 4GB address space.

[IMPORTANT]
The instruction cache is intended to accelerate instruction fetches via the external memory interface.
Since all processor-internal memories provide an access latency of one cycle (by default), caching
internal memories does not bring any performance gain. However, it _might_ reduce traffic on the
processor-internal bus.

The cache is implemented if the _ICACHE_EN_ generic is _true_. The size of the cache memory is defined via
_ICACHE_BLOCK_SIZE_ (the size of a single cache block/page/line in bytes; has to be a power of two and >=
4 bytes), _ICACHE_NUM_BLOCKS_ (the total number of cache blocks; has to be a power of two and >= 1) and
the cache associativity _ICACHE_ASSOCIATIVITY_ (the number of sets; 1 = direct-mapped, 2 = 2-way set-associative;
has to be a power of two and >= 1).
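As a purely illustrative sketch, the listing below shows how these generics could be set when instantiating
the processor top entity (assumed here to be `neorv32_top`); the instance name, the example values and the
omitted generics/ports are placeholders, chosen only to satisfy the constraints above.

[source,vhdl]
----
-- Illustrative excerpt of a processor instantiation; all other generics and
-- ports are omitted. Example values: powers of two, block size >= 4 bytes.
neorv32_inst: entity work.neorv32_top
generic map (
  -- ... all other configuration generics ...
  ICACHE_EN            => true, -- implement instruction cache
  ICACHE_NUM_BLOCKS    => 8,    -- 8 cache blocks (lines)
  ICACHE_BLOCK_SIZE    => 64,   -- 64 bytes per block
  ICACHE_ASSOCIATIVITY => 2     -- 2-way set-associative
)
port map (
  -- ... clock, reset, memory and IO connections ...
);
----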

If the cache associativity (_ICACHE_ASSOCIATIVITY_) is greater than 1, the least-recently-used (LRU)
replacement policy is applied.

[TIP]
Keep the characteristics of the target FPGA's memory resources (block RAM) in mind when configuring the
cache size/layout in order to optimize resource utilization.

By executing the `fence.i` instruction (`Zifencei` CPU extension) the cache is cleared and a reload from
main memory is forced. Among other things, this allows the implementation of self-modifying code.

**Bus Access Fault Handling**

The cache always loads a complete cache block (_ICACHE_BLOCK_SIZE_ bytes), aligned to the block size,
whenever a cache miss is detected. If any of the accessed addresses within a single block is not acknowledged
successfully (i.e. a bus error is signaled or the access times out), the whole cache block is invalidated and
any access to an address within this cache block will also raise an instruction fetch bus access fault exception.