<<<
:sectnums:
==== Processor-Internal Instruction Cache (iCACHE)

[cols="<3,<3,<4"]
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_icache.vhd |
| Software driver file(s): | none | _implicitly used_
| Top entity port: | none |
| Configuration generics: | _ICACHE_EN_ | implement processor-internal instruction cache when _true_
| | _ICACHE_NUM_BLOCKS_ | number of cache blocks (pages/lines)
| | _ICACHE_BLOCK_SIZE_ | size of a cache block in bytes
| | _ICACHE_ASSOCIATIVITY_ | associativity / number of sets
| CPU interrupts: | none |
|=======================

The processor features an optional instruction cache to improve performance when using memories with high
access latencies. The cache is directly connected to the CPU's instruction fetch interface and provides
fully transparent buffering of instruction fetch accesses to the entire address space.

The cache is implemented if the _ICACHE_EN_ generic is _true_. The size of the cache memory is defined via
_ICACHE_BLOCK_SIZE_ (the size of a single cache block/page/line in bytes; has to be a power of two and >=
4 bytes), _ICACHE_NUM_BLOCKS_ (the total number of cache blocks; has to be a power of two and >= 1) and
the actual cache associativity _ICACHE_ASSOCIATIVITY_ (number of sets; 1 = direct-mapped, 2 = 2-way set-associative;
has to be a power of two and >= 1).
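
The constraints listed above can be checked mechanically before synthesis. The following C sketch is only an
illustration: the struct and function names are hypothetical and not part of the NEORV32 software framework;
only the generic names and their constraints are taken from the description above. Treating a disabled cache
as always valid is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical mirror of the iCACHE generics (names are illustrative only).
typedef struct {
  bool     en;            // ICACHE_EN
  uint32_t num_blocks;    // ICACHE_NUM_BLOCKS
  uint32_t block_size;    // ICACHE_BLOCK_SIZE (bytes)
  uint32_t associativity; // ICACHE_ASSOCIATIVITY
} icache_cfg_t;

// True if x is a power of two (x = 1 counts as 2^0).
static bool is_pow2(uint32_t x) {
  return (x != 0) && ((x & (x - 1)) == 0);
}

// Check the constraints stated for the three size generics.
static bool icache_cfg_valid(const icache_cfg_t *cfg) {
  assert(cfg != NULL);
  if (!cfg->en) {
    return true; // assumption: size generics do not matter if the cache is disabled
  }
  return is_pow2(cfg->block_size)    && (cfg->block_size >= 4) &&
         is_pow2(cfg->num_blocks)    && (cfg->num_blocks >= 1) &&
         is_pow2(cfg->associativity) && (cfg->associativity >= 1);
}
```

For example, 4 blocks of 64 bytes with an associativity of 2 pass the check, while a block size of 48 bytes
(not a power of two) or 2 bytes (below the minimum) is rejected.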

If the cache associativity (_ICACHE_ASSOCIATIVITY_) is greater than 1, the least recently used (LRU) replacement
policy is applied.

.Cache Memory HDL
[NOTE]
The default `neorv32_icache.vhd` HDL source file provides a _generic_ memory design that infers embedded
memory. You might need to replace/modify the source file in order to use platform-specific features
(like advanced memory resources) or to improve technology mapping and/or timing. Also, keep the features
of the targeted FPGA's memory resources (block RAM) in mind when configuring
the cache size/layout to optimize resource utilization.

.Caching Internal Memories
[NOTE]
The instruction cache is intended to accelerate instruction fetches from _processor-external_ memories.
Since all processor-internal memories provide an access latency of one cycle (by default), caching
internal memories does not yield a relevant performance gain. However, it will slightly reduce traffic on the
processor-internal bus.

.Manual Cache Clear/Reload
[NOTE]
Executing the `fence.i` instruction (`Zifencei` CPU extension) clears the cache and triggers a reload from
main memory. Among other things, this makes it possible to implement self-modifying code.

**Bus Access Fault Handling**

The cache always loads a complete cache block (_ICACHE_BLOCK_SIZE_ bytes), aligned to its size, every time a
cache miss is detected. If any of the accessed addresses within a single block does not successfully
acknowledge the transfer (i.e. it issues an error signal or times out), the whole cache block is invalidated and
any access to an address within this cache block will raise an instruction fetch bus error exception.
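
The block load behavior can be sketched as a small C model. Everything here is illustrative: the
`bus_read_fn` callback, the `MODEL_BLOCK_SIZE` constant, the demo bus and all function names are hypothetical
stand-ins for the bus interface and the _ICACHE_BLOCK_SIZE_ generic, not actual NEORV32 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BLOCK_SIZE 16u // stand-in for ICACHE_BLOCK_SIZE (power of two, >= 4)

// Hypothetical bus read: returns false on an error response or time-out.
typedef bool (*bus_read_fn)(uint32_t addr, uint32_t *data);

typedef struct {
  bool     valid;
  uint32_t base;                       // block-aligned start address
  uint32_t data[MODEL_BLOCK_SIZE / 4]; // one entry per 32-bit word
} block_t;

// Align an address down to the start of its cache block.
static uint32_t block_base(uint32_t addr) {
  return addr & ~(MODEL_BLOCK_SIZE - 1u);
}

// Load the complete block containing addr; a single failed word access
// invalidates the whole block.
static bool block_load(block_t *blk, uint32_t addr, bus_read_fn read) {
  assert((blk != NULL) && (read != NULL));
  blk->base  = block_base(addr);
  blk->valid = true;
  for (uint32_t i = 0; i < MODEL_BLOCK_SIZE / 4; i++) {
    if (!read(blk->base + 4u * i, &blk->data[i])) {
      blk->valid = false; // every fetch from this block now faults
      break;
    }
  }
  return blk->valid;
}

// Demo bus: only addresses below 0xF8 respond; the data is the address itself.
static bool demo_bus_read(uint32_t addr, uint32_t *data) {
  if (addr >= 0xF8u) {
    return false;
  }
  *data = addr;
  return true;
}
```

With the demo bus, a fetch from `0xF0` fails in this model even though that word itself is readable, because a
later word of the same block (`0xF8`) does not acknowledge the transfer.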