\chapter{Cache/Memory Controller Module}
\label{cache}

    The project has included a cache+memory controller module since revision 114.\\

    Both the I- and the D-Cache are implemented, but the parametrization
    generics are still mostly unused, with many values hardcoded, and SDRAM is
    not supported yet. Besides, there are some loose ends in the implementation
    still to be solved, explained in section~\ref{cache_problems}.\\


\section{Cache Initialization and Control}
\label{cache_init_and_control}

    The cache module comes up from reset in an indeterminate, unusable state.
    It needs to be initialized before being enabled.\\
    Initialization mostly means marking all D- and I-cache lines as invalid.
    The old R3000 had its own means to achieve this, but this core implements an
    alternative, simplified scheme.\\

    The standard R3000 cache control flags in the SR are not used, either. Instead,
    two flags from the SR have been commandeered for cache control.\\

\subsection{Cache control flags}
\label{cache_control_flags}

    Bits 17 and 16 of the SR are NOT used for their standard R3000 purpose.
    Instead they are used as explained below:

    \begin{itemize}
    \item Bit 17: Cache enable              [reset value = 0]
    \item Bit 16: I- and D-Cache line invalidate   [reset value = 0]
    \end{itemize}

    These two flags are always used together to set the cache operating mode:

    \begin{itemize}
    \item Bits 17:16='00'\\
        Cache is enabled and working.
    \item Bits 17:16='01'\\
        Cache is in D- and I-cache line invalidation mode.\\
        Writing word X.X.X.N to ANY address will
        invalidate I-Cache line N (N is an 8-bit value and each X is an 8-bit
        don't care). Besides, the actual write will be performed too; be
        careful...\\

        Reading from any address will cause the
        corresponding D-cache line to be invalidated; the read will not be
        actually performed and the read value is undefined.\\
    \item Bits 17:16='10'\\
        Cache is disabled; no lines are being invalidated.\\
    \item Bits 17:16='11'\\
        Cache behavior is UNDETERMINED -- i.e. guaranteed crash.\\
    \end{itemize}

    After reset, the cache memory comes up in an undetermined state, but
    it comes up disabled too. Before enabling it, you need to invalidate all
    cache lines in software (see routine cache\_init in the memtest sample).\\
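
    The invalidation sequence can be sketched as a small host-side C model.
    This is NOT the cache\_init routine of the memtest sample, only an
    illustration of the mode table above. It assumes 256 lines per cache
    (from the 8-bit line number N) and that the D-cache line is selected by
    address bits 11 downto 4. All the names used (cache\_model, model\_store,
    model\_load) are hypothetical.

\begin{verbatim}
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 256u  /* assumption: 8-bit line number */

/* Values of SR bits 17:16, as listed in the mode table. */
enum cache_mode { MODE_ENABLED = 0, MODE_INVALIDATE = 1, MODE_DISABLED = 2 };

/* Hypothetical software model of the cache valid flags. */
struct cache_model {
    enum cache_mode mode;
    bool icache_valid[NUM_LINES];
    bool dcache_valid[NUM_LINES];
};

/* A store in invalidation mode clears I-cache line N, where N is the
   low byte of the data word; the target address is irrelevant. */
static void model_store(struct cache_model *c, uint32_t addr, uint32_t data)
{
    (void)addr;
    if (c->mode == MODE_INVALIDATE) {
        c->icache_valid[data & 0xFFu] = false;
    }
}

/* A load in invalidation mode clears the D-cache line that matches the
   address (assumed here to be address bits 11..4). */
static void model_load(struct cache_model *c, uint32_t addr)
{
    if (c->mode == MODE_INVALIDATE) {
        c->dcache_valid[(addr >> 4) & 0xFFu] = false;
    }
}

/* Invalidate every line of both caches, then enable the cache. */
static void cache_init(struct cache_model *c, uint32_t scratch_addr)
{
    c->mode = MODE_INVALIDATE;              /* bits 17:16 = '01' */
    for (uint32_t n = 0; n < NUM_LINES; n++) {
        model_store(c, scratch_addr, n);    /* I-cache line n */
        model_load(c, n << 4);              /* D-cache line n */
    }
    c->mode = MODE_ENABLED;                 /* bits 17:16 = '00' */
}
\end{verbatim}

    On the real core the mode changes would be writes to SR bits 17:16 and the
    loads and stores would be real memory accesses. Since the store is actually
    performed, its target would have to be a location that is safe to
    overwrite.\\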

\section{Cache Tags and Cache Address Mirroring}
\label{cache_tags}

    In order to save space in the I-Cache tag table, the tags are shorter than
    they should be -- 14 bits instead of the 20 bits we would need to cover the
    entire 32-bit address:

\needspace{10\baselineskip}
\begin{verbatim}

             ___________ <-- These address bits are NOT in the tag
            /           \
    31 ..   27| 26 .. 21  |20 ..          12|11  ..        4|3:2|
    +---------+-----------+-----------------+---------------+---+---+
    | 5       |           | 9               | 8             | 2 |   |
    +---------+-----------+-----------------+---------------+---+---+
    ^                     ^                 ^               ^- LINE_INDEX_SIZE
    5 bits                9 bits            LINE_NUMBER_SIZE

\end{verbatim}

    Since bits 26 downto 21 are not included in the tag, there will be a
    'mirror' effect in the cache. We have effectively split the memory space
    into 32 separate blocks of 2MB (address bits 20 downto 0), which is
    obviously not enough but will do for the initial tests.
    In subsequent versions of the cache, the tag size needs to be enlarged AND
    some of the top bits might be omitted when they're not needed to implement
    the default memory map (namely bit 30 which is always '0').
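
    The mirror effect can be illustrated with a short C sketch that splits an
    address into the fields of the diagram. The function and type names are
    hypothetical, and the order in which the two tag pieces are concatenated
    is an assumption; the field boundaries are those shown in the diagram.

\begin{verbatim}
#include <stdint.h>

/* Address fields as drawn in the tag diagram (hypothetical type). */
typedef struct {
    uint32_t tag;   /* 14 bits: address bits 31..27 and 20..12 */
    uint32_t line;  /*  8 bits: address bits 11..4             */
    uint32_t word;  /*  2 bits: address bits 3..2              */
} cache_fields_t;

static cache_fields_t split_address(uint32_t addr)
{
    cache_fields_t f;
    uint32_t hi = (addr >> 27) & 0x1Fu;   /* top 5 tag bits   */
    uint32_t lo = (addr >> 12) & 0x1FFu;  /* low 9 tag bits   */

    f.tag  = (hi << 9) | lo;              /* bits 26..21 are dropped */
    f.line = (addr >> 4) & 0xFFu;
    f.word = (addr >> 2) & 0x3u;
    return f;
}
\end{verbatim}

    With this split, addresses 00000430h and 00200430h differ only in bit 21
    and yield the same tag and line number, so the cache can't tell them
    apart. Address 08000430h differs in bit 27 and yields a different tag.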


\section{Memory Controller}
\label{memory_controller}

    The cache functionality and the memory controller functionality are so
    closely related that I found it convenient to bundle both in the same
    module: I have experimented with separate modules and was unable to match
    the performance with my synthesis tools of choice.
    So, while it would be desirable to split the cache from the memory controller
    in some later version, for the time being they are a single module.\\

    The memory controller interfaces the cache to external memory, be it off-core
    or off-chip.
    The memory controller implements the refill and writethrough state machines,
    which will necessarily be different for different kinds of memory.\\


\subsection{Memory Map Definition}
\label{memory_map_definition}

    The MIPS architecture specs define a memory map which determines which areas
    are cached and what the default address translation is \cite[p.~2-8]{r3k_ref_man}.\\
    Neither the memory translation nor the cacheable/uncacheable attribute of
    the standard MIPS architecture has been implemented. In this core, program
    addresses are always identical to hardware addresses.\\

    When requested to perform a refill or a writethrough, the memory controller
    needs to know what type of memory it is dealing with. The type of
    memory for each memory area is defined in a hardcoded memory map
    implemented in function 'decode\_address\_mips1', defined in package
    'mips\_pkg.vhdl'. This function will synthesize into regular, combinational
    decode logic.\\

    For each address, the memory map logic will supply the following information:

\begin{enumerate}
    \item What kind of memory it is
    \item How many wait states to use
    \item Whether it is writeable or not (ignored in current version)
    \item Whether it is cacheable or not (ignored in current version)
\end{enumerate}

    In the present implementation the memory map can't be modified at run time.\\

    These are the currently supported memory types:

\begin{tabular}{ll}
\hline
Identifier & Description \\
\hline
MT\_BRAM            & Synchronous, on-chip FPGA BRAM\\
MT\_IO\_SYNC        & Synchronous, on-chip register (meant for I/O)\\
MT\_SRAM\_16B       & Asynchronous, off-chip static memory, 16-bit wide\\
MT\_SRAM\_8B        & Asynchronous, off-chip static memory, 8-bit wide\\
MT\_DDR\_16B        & Not used yet\\
MT\_UNMAPPED        & Unmapped area\\
\hline
\end{tabular}

\subsection{Invalid memory accesses}
\label{invalid_memory}

    Whenever the CPU attempts an invalid memory access, the 'unmapped' output
    port of the Cache module will be asserted for 1 clock cycle.

    The accesses that will raise the 'unmapped' output are these:

    \begin{enumerate}
    \item Code fetch from an address decoded as MT\_IO\_SYNC.
    \item Data write to a memory address decoded as other than RAM or IO.
    \item Any access to an address decoded as MT\_UNMAPPED.
    \end{enumerate}

    The 'unmapped' output is ignored by the current version of the parent MCU
    module -- it is only used to raise a permanent flag that is then connected
    to an LED for debugging purposes, hardly a useful approach in a real project.

    In subsequent versions of the MCU module, the 'unmapped' signal will trigger
    a hardware interrupt.

    Note again that the memory attributes 'writeable' and 'cacheable' are not
    used in the current version. In subsequent versions 'writeable' will be
    used and 'cacheable' will be removed.

\subsection{Uncacheable memory areas}
\label{uncacheable_memory}

    There are no predefined 'uncacheable' memory areas in the current version of
    the core; all memory addresses are cacheable unless they are defined as
    IO (see mips\_cache.vhdl).\\
    In short, if it's not defined as MT\_UNMAPPED or MT\_IO\_SYNC, it is
    cacheable.

\section{Cache Refill and Writethrough Chronograms}
\label{cache_state_machine}


The cache state machine deals with cache refills and data writethroughs. It is
this state machine that 'knows' about the type and size of the external
memories. When DRAM is eventually implemented, or when other widths of SRAM are
supported, this state machine will need to change.

The following subsections describe the refill and writethrough procedures.
Bear in mind that there is a single state machine that handles it all -- refills
and writethroughs can't be done simultaneously.


\subsubsection{SRAM interface read cycle timing -- 16-bit interface}
\label{sram_read_cycle_16b}

The refill procedure is identical for both D- and I-cache refills. All that
matters is the type of memory being read.



\needspace{15\baselineskip}
\begin{verbatim}
==== Chronogram 4.1: 16-bit SRAM refill -- DATA ============================
              __    __    __    __    __    __    _     __    __    __    __
clk         _/  \__/  \__/  \__/  \__/  \__/  \__/ ..._/  \__/  \__/  \__/

cache/ps    ?| (1)             | (2)             | ... | (2)             |??

refill_ctr  ?| 0                                 | ... <  3              |??

chip_addr   ?|  210h           |  211h           | ... |  217h           |--

data_rd     -XXXXX  [210h]     XXXXX  [211h]     | ... XXXXX  [217h]     |--
             |<- 2-state sequence              ->|
            _                                                             __
sram_oe_n    \____________________________________ ... __________________/
             |<- Total: 24 clock cycles to refill a 4-word cache line  ->|
============================================================================
\end{verbatim}

(NOTE: signal names left-clipped to fit page margins)

In the diagram, the data coming into bram\_data\_rd is depicted with some delay.

Signal \emph{cache/ps} is the current state of the cache state machine, and
in this chronogram it takes the following values:

\begin{enumerate}
\item idle
\item data\_refill\_sram\_0
\item data\_refill\_sram\_1
\end{enumerate}

Each of the two states reads a halfword from SRAM. The two-state sequence is
repeated four times to refill the four-word cache entry.

Signal \emph{refill\_ctr} counts the word index within the line being refilled,
and runs from 0 to 3.\\
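
The total shown in the chronogram can be checked with a little arithmetic.
Assuming each refill state lasts three clock cycles, as drawn in the
chronogram, with two halfword reads per word and four words per line:

\[
    4 \times 2 \times 3 = 24 \textrm{ clock cycles per line refill}
\]

The three-cycle figure is read off the diagram and is not necessarily a fixed
property of the module; presumably it depends on the wait state setting of the
memory area being read.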


\subsubsection{SRAM interface read cycle timing -- 8-bit interface}
\label{sram_read_cycle_8b}

TODO: 8-bit refill procedure to be done.


\subsubsection{SRAM interface write cycle timing}
\label{sram_write_cycle}

The path of the state machine that deals with SRAM writethroughs is linear, so
a state diagram would not be very interesting. As you can see in the source
code, all the states are one clock cycle long except for states
\emph{data\_writethrough\_sram\_0b} and \emph{data\_writethrough\_sram\_1b},
which last as many cycles as the number of wait states plus one.
This is the only writethrough parameter that is influenced by the wait state
attribute.\\

A general memory write will be 32 bits wide and thus it will take two 16-bit
memory accesses to complete. Unaligned, halfword or byte wide CPU writes might
in some cases be optimized to take only a single 16-bit memory access. This
module does no such optimization.
For simplicity, all writethroughs take two 16-bit access cycles, even if one
of them has both we\_n signals deasserted.\\

The following chronogram has been copied from a simulation of the 'hello'
sample. It's a 32-bit wide write to address 00000430h.
As you can see, the 'chip address' (the address fed to the SRAM chip) is the
target address divided by 2 (because each chip address selects a 16-bit
halfword, i.e. two bytes). In this particular case, all four bytes of the word
are written and so both we\_n signals are asserted for both halfwords.

In this example, the SRAM is being accessed with 1 WS: WE\_N is asserted for
two cycles.
Note how a lot of cycles are lost in order to guarantee compliance with the
setup and hold times of the SRAM against the we, address and data lines.

\needspace{15\baselineskip}
\begin{verbatim}
==== Chronogram 4.3: 16-bit SRAM writethrough, 32-bit wide =================
                     __    __    __    __    __    __    __    __    __    _
clk                _/  \__/  \__/  \__/  \__/  \__/  \__/  \__/  \__/  \__/

cache/ps           ?| (1) | (2) | (3)       | (4) | (5) | (6)       | (7) |?

sram_chip_addr     ?|          218h               |       219h            |?

sram_data_wr       -------|       0000h           |         044Ch         |-
                   _____________             ___________             _______
sram_byte_we_n(0)               \___________/           \___________/
                   _____________             ___________             _______
sram_byte_we_n(1)               \___________/           \___________/

                   _________________________________________________________
sram_oe_n
============================================================================
\end{verbatim}

Signal \emph{cache/ps} is the current state of the cache state machine, and
in this chronogram it takes the following values:

\begin{enumerate}
\item idle
\item data\_writethrough\_sram\_0a
\item data\_writethrough\_sram\_0b
\item data\_writethrough\_sram\_0c
\item data\_writethrough\_sram\_1a
\item data\_writethrough\_sram\_1b
\item data\_writethrough\_sram\_1c
\end{enumerate}
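
The chip addresses and the length of the write in the chronogram can be
reproduced with a short C sketch. The function names are hypothetical. The
cycle count only covers the six writethrough states listed above, and it
assumes the state lengths stated in the text: one cycle for the 'a' and 'c'
states, wait states plus one for the 'b' states.

\begin{verbatim}
#include <stdint.h>

/* Halfword address fed to a 16-bit SRAM for a 32-bit word write.
   half = 0 selects the first 16-bit access, half = 1 the second. */
static uint32_t sram16_chip_addr(uint32_t byte_addr, unsigned half)
{
    /* Align to the 32-bit word, then convert bytes to halfwords. */
    return ((byte_addr & ~3u) >> 1) + half;
}

/* Clock cycles spent in the writethrough states for one 32-bit write. */
static unsigned writethrough_cycles(unsigned wait_states)
{
    /* xa: 1 cycle, xb: wait_states + 1 cycles, xc: 1 cycle */
    unsigned per_halfword = 1u + (wait_states + 1u) + 1u;

    return 2u * per_halfword;   /* two 16-bit accesses per write */
}
\end{verbatim}

For the write to 00000430h this gives chip addresses 218h and 219h, and with
1 WS the six states add up to 8 clock cycles, which matches the chronogram.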



\section{Known Problems}
\label{cache_problems}

\begin{enumerate}
\item All parameters hardcoded -- generics are almost ignored.
\item SRAM read state machine does not guarantee internal FPGA $T_{hold}$.
        On my current target board it works because the FPGA hold times
        (including an input mux
        in the parent module) are far smaller than the SRAM response times, but
        it would be better to insert an extra cycle after the wait states in
        the SRAM read state machine.
\end{enumerate}
 
