\chapter{Cache/Memory Controller Module}
\label{cache}

    The project includes a cache+memory controller module from revision 114.\\

    Both the I- and the D-Cache are implemented, but the parametrization
    generics are still mostly unused, with many values hardcoded, and SDRAM is
    not supported yet. There are also some loose ends in the implementation
    still to be resolved, explained in section~\ref{cache_problems}.\\


\section{Cache Initialization and Control}
\label{cache_init_and_control}

    The cache module comes up from reset in an indeterminate, unusable state.
    It needs to be initialized before being enabled.\\
    Initialization consists mostly of marking all D- and I-cache lines as invalid.
    The original R3000 had its own means to achieve this, but this core implements an
    alternative, simplified scheme.\\

    The standard R3000 cache control flags in the SR are not used either. Instead,
    two flags from the SR have been repurposed for cache control.\\

\subsection{Cache control flags}
\label{cache_control_flags}

    Bits 17 and 16 of the SR are NOT used for their standard R3000 purpose.
    Instead they are used as explained below:

    \begin{itemize}
    \item Bit 17: Cache enable              [reset value = 0]
    \item Bit 16: I- and D-Cache line invalidate   [reset value = 0]
    \end{itemize}

    These two flags are always used together to set the cache operating mode:

    \begin{itemize}
    \item Bits 17:16='00'\\
        Cache is disabled; no lines are being invalidated.\\
    \item Bits 17:16='01'\\
        Cache is in D- and I-cache line invalidation mode.\\
        Writing word X.X.X.N to ANY address will
        invalidate I-Cache line N (N is an 8-bit line number and each X is
        a don't-care byte). Note that the write itself is also performed, so
        choose the target address with care.\\

        Reading from any address will cause the
        corresponding D-cache line to be invalidated; the read will not
        actually be performed and the value read is undefined.\\
    \item Bits 17:16='10'\\
        Cache is enabled and working.
    \item Bits 17:16='11'\\
        Cache behavior is UNDETERMINED -- i.e. guaranteed crash.\\
    \end{itemize}

    After reset the cache memory comes up in an undetermined state, but
    it also comes up disabled. Before enabling it, you need to invalidate all
    cache lines in software (see routine cache\_init in the memtest sample).\\
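
    As a worked example of the flag encoding (plain arithmetic on the bit
    positions given above, not taken from the cache\_init source), the two
    flags correspond to these SR masks:

    \[
        \mbox{bit 17} = 2^{17} = \mathtt{00020000h}, \qquad
        \mbox{bit 16} = 2^{16} = \mathtt{00010000h}
    \]
    \[
        \mbox{bits 17:16} = \mathtt{00020000h} + \mathtt{00010000h} = \mathtt{00030000h}
    \]

    Selecting invalidation mode ('01') therefore amounts to clearing both
    bits of the field in the SR and then setting bit 16 only. Since N is an
    8-bit line number, invalidating every I-cache line would take
    $2^{8} = 256$ writes, one for each value of N from 0 to 255, assuming all
    256 line numbers are in use.\\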

\section{Cache Tags and Cache Address Mirroring}
\label{cache_tags}

    In order to save space in the I-Cache tag table, the tags are shorter than
    they should be: 14 bits instead of the 20 bits we would need to cover the
    entire 32-bit address:

\needspace{10\baselineskip}
\begin{verbatim}

                _________ <-- These address bits are NOT in the tag
               /         \
    31 ..   27| 26 .. 21  |20 ..          12|11  ..        4|3:2|
    +---------+-----------+-----------------+---------------+---+---+
    | 5       |           | 9               | 8             | 2 |   |
    +---------+-----------+-----------------+---------------+---+---+
    ^                     ^                 ^               ^- LINE_INDEX_SIZE
    5 bits                9 bits            LINE_NUMBER_SIZE

\end{verbatim}

    Since bits 26 downto 21 are not included in the tag, there will be a
    'mirror' effect in the cache. We have effectively split the memory space
    into 32 separate blocks of 1MB, which is obviously not enough but will do
    for the initial versions of the core.

    In subsequent versions of the cache, the tag size needs to be enlarged AND
    some of the top bits might be omitted when they're not needed to implement
    the default MIPS memory map (namely bit 30 which is always '0').
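
    The tag widths quoted above follow from the field sizes in the diagram.
    The following is only a restatement of those numbers as arithmetic,
    assuming that the two lowest address bits select a byte within a word.
    A full tag is what remains of the 32 address bits after removing the
    8 line number bits, the 2 word-in-line bits and the 2 byte-in-word bits:

    \[
        32 - 8 - 2 - 2 = 20 \quad \mbox{(full tag width)}
    \]
    \[
        5 + 9 = 14 \quad \mbox{(stored tag width)}, \qquad
        20 - 14 = 6 \quad \mbox{(bits 26..21, dropped)}
    \]

    With 6 address bits left out of the comparison, each stored tag matches
    $2^{6} = 64$ different addresses, which is the mirroring described above.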


\section{Memory Controller}
\label{memory_controller}

    The cache functionality and the memory controller functionality are so
    closely related that I found it convenient to bundle both in the same
    module. I have experimented with separate modules and was unable to match
    the performance of the bundled module with my synthesis tools of choice.
    So, while it would be desirable to split the cache from the memory controller
    in some later version, for the time being they are a single module.\\

    The memory controller interfaces the cache to external memory, be it off-core
    or off-chip.
    It implements the refill and writethrough state machines,
    which will necessarily be different for different kinds of memory.\\


\subsection{Memory Map Definition}
\label{memory_map_definition}

    The MIPS architecture specs define a memory map which determines which areas
    are cached and what the default address translation is \cite[p.~2-8]{r3k_ref_man}.\\
    Neither the address translation nor the cacheable/uncacheable attribute of
    the standard MIPS architecture has been implemented. In this core, program
    addresses are always identical to hardware addresses.\\

    When requested to perform a refill or a writethrough, the memory controller
    needs to know what type of memory it is dealing with. The type of
    memory for each memory area is defined in a hardcoded memory map
    implemented in function 'decode\_address\_mips1', defined in package
    'mips\_pkg.vhdl'. This function will synthesize into regular, combinational
    decode logic.\\

    For each address, the memory map logic will supply the following information:

\begin{enumerate}
    \item What kind of memory it is.
    \item How many wait states to use.
    \item Whether it is writeable or not (ignored in current version).
    \item Whether it is cacheable or not (ignored in current version).
\end{enumerate}

    In the present implementation the memory map can't be modified at run time.\\

    These are the currently supported memory types:

\begin{tabular}{ll}
\hline
Identifier & Description \\
\hline
MT\_BRAM            & Synchronous, on-chip FPGA BRAM\\
MT\_IO\_SYNC        & Synchronous, on-chip register (meant for I/O)\\
MT\_SRAM\_16B       & Asynchronous, off-chip static memory, 16-bit wide\\
MT\_SRAM\_8B        & Asynchronous, off-chip static memory, 8-bit wide\\
MT\_DDR\_16B        & Not used yet\\
MT\_UNMAPPED        & Unmapped area\\
\hline
\end{tabular}
 
\subsection{Invalid memory accesses}
\label{invalid_memory}

    Whenever the CPU attempts an invalid memory access, the 'unmapped' output
    port of the Cache module will be asserted for 1 clock cycle.

    The accesses that will raise the 'unmapped' output are these:

    \begin{enumerate}
    \item Code fetch from address decoded as MT\_IO\_SYNC.
    \item Data write to memory address decoded as other than RAM or IO.
    \item Any access to an address decoded as MT\_UNMAPPED.
    \end{enumerate}

    The 'unmapped' output is ignored by the current version of the parent MCU
    module -- it is only used to raise a permanent flag that is then connected
    to an LED for debugging purposes, hardly a useful approach in a real project.

    In subsequent versions of the MCU module, the 'unmapped' signal will trigger
    a hardware interrupt.

    Note again that the memory attributes 'writeable' and 'cacheable' are not
    used in the current version. In subsequent versions 'writeable' will be
    used and 'cacheable' will be removed.

\subsection{Uncacheable memory areas}
\label{uncacheable_memory}

    There are no predefined 'uncacheable' memory areas in the current version of
    the core; all memory addresses are cacheable unless they are defined as
    IO (see mips\_cache.vhdl).\\
    In short, if it's not defined as MT\_UNMAPPED or MT\_IO\_SYNC, it is
    cacheable.
 
\section{Cache Refill and Writethrough Chronograms}
\label{cache_state_machine}


The cache state machine deals with cache refills and data writethroughs. It is
this state machine that 'knows' about the type and size of the external
memories. When DRAM is eventually implemented, or when other widths of SRAM are
supported, this state machine will need to change.

The following subsections describe the refill and writethrough procedures.
Bear in mind that there is a single state machine that handles it all -- refills
and writethroughs can't be done simultaneously.


\subsubsection{SRAM interface read cycle timing -- 16-bit interface}
\label{sram_read_cycle_16b}

The refill procedure is identical for both D- and I-cache refills. All that
matters is the type of memory being read.



\needspace{15\baselineskip}
\begin{verbatim}
==== Chronogram 4.1: 16-bit SRAM refill -- DATA ============================
              __    __    __    __    __    __    _     __    __    __    __
clk         _/  \__/  \__/  \__/  \__/  \__/  \__/ ..._/  \__/  \__/  \__/

cache/ps    ?| (1)             | (2)             | ... | (2)             |??

refill_ctr  ?| 0                                 | ... |  3              |??

chip_addr   ?|  210h           |  211h           | ... |  217h           |--

data_rd     -XXXXX  [210h]     XXXXX  [211h]     | ... XXXXX  [217h]     |--
             |<- 2-state sequence              ->|
            _                                                             __
sram_oe_n    \____________________________________ ... __________________/
             |<- Total: 24 clock cycles to refill a 4-word cache line  ->|
============================================================================
\end{verbatim}

(NOTE: signal names left-clipped to fit page margins)

In the diagram, the data coming into bram\_data\_rd is depicted with some delay.

Signal \emph{cache/ps} is the current state of the cache state machine, and
in this chronogram it takes the following values:

\begin{enumerate}
\item data\_refill\_sram\_0
\item data\_refill\_sram\_1
\end{enumerate}

Each of the two states reads a halfword from SRAM. The two-state sequence is
repeated four times to refill the four-word cache entry.

Signal \emph{refill\_ctr} counts the word index within the line being refilled,
and runs from 0 to 3.\\
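
The cycle count in the chronogram can be checked by hand. Reading the drawing,
each of the two states lasts three clock cycles (an observation from the
diagram, not a figure stated elsewhere in this chapter), so:

\[
    4 \mbox{ words} \times 2 \mbox{ halfwords per word}
      \times 3 \mbox{ cycles per halfword} = 24 \mbox{ cycles}
\]

The same eight halfword reads account for the eight consecutive chip
addresses, 210h to 217h.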


\subsubsection{SRAM interface read cycle timing -- 8-bit interface}
\label{sram_read_cycle_8b}

The refill from an 8-bit static memory is essentially the same as depicted
above, except that we need to read 4 bytes (over the LSB lines of the static memory
data bus) instead of two 16-bit halfwords. The operation takes correspondingly
longer to perform and uses an extra address line but is otherwise identical.
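
As a rough estimate only: if each byte read took the same three cycles as a
halfword read appears to take in the 16-bit chronogram (an assumption, since no
figure is given for the 8-bit case), a line refill would need:

\[
    4 \mbox{ words} \times 4 \mbox{ bytes per word}
      \times 3 \mbox{ cycles per byte} = 48 \mbox{ cycles}
\]

The extra address line follows from the same count: 16 byte reads per line
need one more address bit than 8 halfword reads.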

TODO: 8-bit refill chronogram to be done.


\subsubsection{16-bit SRAM interface write cycle timing}
\label{sram_write_cycle}

The path of the state machine that deals with SRAM writethroughs is linear, so
a state diagram would not be very interesting. As you can see in the source
code, all the states are one clock long except for states
\emph{data\_writethrough\_sram\_0b} and \emph{data\_writethrough\_sram\_1b},
which last as many cycles as the number of wait states plus one.
This is the only writethrough parameter that is influenced by the wait state
attribute.\\

A general memory write will be 32 bits wide and thus it will take two 16-bit
memory accesses to complete. Unaligned, halfword or byte wide CPU writes might
in some cases be optimized to take only a single 16-bit memory access. This
module does no such optimization yet.
For simplicity, all writethroughs take two 16-bit access cycles, even if one
of them has both we\_n signals deasserted.\\

The following chronogram has been copied from a simulation of the 'hello'
sample. It's a 32-bit wide write to address 00000430h.
As you can see, the 'chip address' (the address fed to the SRAM chip) is the
target address divided by 2 (because each chip address selects one 16-bit
halfword, that is, two bytes). In this particular case, all four bytes of the
word are written and so both we\_n signals are asserted for both halfwords.

In this example, the SRAM is being accessed with 1 WS: WE\_N is asserted for
two cycles.
Note how several extra cycles are spent in order to guarantee compliance with
the setup and hold times of the SRAM on the we, address and data lines.
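
The chip addresses and the write strobe width seen in the chronogram below can
be derived from the figures already given (a 32-bit write to 430h with 1 wait
state); this is a hand check, not simulator output:

\[
    \mathtt{430h} / 2 = \mathtt{218h}, \qquad
    \mathtt{218h} + 1 = \mathtt{219h}
\]
\[
    \mbox{WE\_N width} = \mbox{wait states} + 1 = 1 + 1 = 2 \mbox{ cycles}
\]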

\needspace{15\baselineskip}
\begin{verbatim}
==== Chronogram 4.3: 16-bit SRAM writethrough, 32-bit wide =================
                     __    __    __    __    __    __    __    __    __    _
clk                _/  \__/  \__/  \__/  \__/  \__/  \__/  \__/  \__/  \__/

cache/ps           ?| (1) | (2) | (3)       | (4) | (5) | (6)       | (7) |?

sram_chip_addr     ?|          218h               |       219h            |?

sram_data_wr       -------|       0000h           |         044Ch         |-
                   _____________             ___________             _______
sram_byte_we_n(0)               \___________/           \___________/
                   _____________             ___________             _______
sram_byte_we_n(1)               \___________/           \___________/

                   _________________________________________________________
sram_oe_n
============================================================================
\end{verbatim}

Signal \emph{cache/ps} is the current state of the cache state machine, and
in this chronogram it takes the following values:

\begin{enumerate}
\item data\_writethrough\_sram\_0a
\item data\_writethrough\_sram\_0b
\item data\_writethrough\_sram\_0c
\item data\_writethrough\_sram\_1a
\item data\_writethrough\_sram\_1b
\item data\_writethrough\_sram\_1c
\end{enumerate}



\section{Known Problems}
\label{cache_problems}

    The cache implementation is still provisional and has a number of
    acknowledged problems:

\begin{enumerate}
\item All parameters are hardcoded -- generics are almost entirely ignored.
\item The SRAM read state machine does not guarantee the internal FPGA $T_{hold}$.
        On my current target board it works because the FPGA hold times
        (including an input mux
        in the parent module) are far smaller than the SRAM response times, but
        it would be better to insert an extra cycle after the wait states in
        the SRAM read state machine.
\item Cache logic is mixed with memory controller logic.
\end{enumerate}