\chapter{Cache/Memory Controller Module}
\label{cache}

The project includes a cache+memory controller module from revision 114.\\

Both the I- and the D-Cache are implemented, but the parametrization
generics are still mostly unused, with many values hardcoded, and SDRAM is
not supported yet. Besides, there are some loose ends in the implementation
still to be solved, explained in section~\ref{cache_problems}.\\

\section{Cache Initialization and Control}
\label{cache_init_and_control}

The cache module comes up from reset in an indeterminate, unusable state.
It needs to be initialized before being enabled.\\
Initialization mostly means marking all D- and I-cache lines as invalid.
The old R3000 had its own means to achieve this, but this core implements an
alternative, simplified scheme.\\

The standard R3000 cache control flags in the SR are not used, either. Instead,
two flags from the SR have been repurposed for cache control.\\

\subsection{Cache control flags}
\label{cache_control_flags}

Bits 17 and 16 of the SR are NOT used for their standard R3000 purpose.
Instead they are used as explained below:

\begin{itemize}
\item Bit 17: Cache enable [reset value = 0]
\item Bit 16: I- and D-Cache line invalidate [reset value = 0]
\end{itemize}

You always use both these flags together to set the cache operating mode:

\begin{itemize}
\item Bits 17:16='00'\\
    Cache is enabled and working.
\item Bits 17:16='01'\\
    Cache is in D- and I-cache line invalidation mode.\\
    Writing word X.X.X.N to ANY address will
    invalidate I-Cache line N (N is an 8-bit value and X is an 8-bit
    don't care). Besides, the actual write will be performed too; be
    careful...\\

    Reading from any address will cause the
    corresponding D-cache line to be invalidated; the read will not be
    actually performed and the read value is undefined.\\
\item Bits 17:16='10'\\
    Cache is disabled; no lines are being invalidated.\\
\item Bits 17:16='11'\\
    Cache behavior is UNDETERMINED -- i.e. guaranteed crash.\\
\end{itemize}

After reset, the cache memory comes up in an undetermined state, but it also
comes up disabled. Before enabling it, you need to invalidate all
cache lines in software (see routine cache\_init in the memtest sample).\\
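
The mode encoding and the line-invalidation rule above can be sketched in
software. The Python below is a hypothetical behavioural model built from this
section's description (the function names are illustrative, not part of the
core):

```python
# Hypothetical model of the repurposed SR cache-control bits described
# above: bit 17 = cache enable, bit 16 = line invalidate.
CACHE_ENABLE = 1 << 17
CACHE_INVALIDATE = 1 << 16

def cache_mode(sr):
    """Decode SR bits 17:16 into the operating mode listed above."""
    bits = (sr & (CACHE_ENABLE | CACHE_INVALIDATE)) >> 16
    return {
        0b00: "enabled",
        0b01: "invalidate",   # reads/writes invalidate cache lines
        0b10: "disabled",
        0b11: "undefined",    # guaranteed crash per the text
    }[bits]

def icache_line_invalidated_by_write(data_word):
    # In invalidate mode, writing word X.X.X.N invalidates I-cache line N,
    # where N is the low byte of the written DATA, regardless of address.
    return data_word & 0xFF
```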

\section{Cache Tags and Cache Address Mirroring}
\label{cache_tags}

In order to save space in the I-Cache tag table, the tags are shorter than
they should be -- 14 bits instead of the 20 bits we would need to cover the
entire 32-bit address:

\needspace{10\baselineskip}
\begin{verbatim}

              _________  <-- These address bits are NOT in the tag
             /         \
     31 .. 27| 26 .. 21 |20 .. 12|11 .. 4|3:2|
    +--------+----------+--------+-------+---+
    |    5   |          |    9   |   8   | 2 |
    +--------+----------+--------+-------+---+
        ^                    ^       ^     ^--- LINE_INDEX_SIZE
      5 bits               9 bits    LINE_NUMBER_SIZE

\end{verbatim}\\

Since bits 26 downto 21 are not included in the tag, there will be a
'mirror' effect in the cache: addresses that differ only in those bits alias
each other. We have effectively split the memory space
into 32 separate blocks of 2MB, which is obviously not enough but will do
for the initial versions of the core.

In subsequent versions of the cache, the tag size needs to be enlarged AND
some of the top bits might be omitted when they're not needed to implement
the default MIPS memory map (namely bit 30, which is always '0').
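
The field split above can be modeled to demonstrate the mirror effect. This
Python sketch uses the bit positions from the diagram; the function name is
illustrative:

```python
# Model of the tag split shown above: the 14-bit tag is bits 31:27 plus
# bits 20:12; bits 26:21 are NOT tagged. Field widths taken from the
# diagram; this is an illustration, not the core's VHDL.

def split_address(addr):
    tag_hi = (addr >> 27) & 0x1F     # bits 31:27, 5 bits
    tag_lo = (addr >> 12) & 0x1FF    # bits 20:12, 9 bits
    line   = (addr >> 4) & 0xFF      # bits 11:4, LINE_NUMBER
    index  = (addr >> 2) & 0x3       # bits 3:2, LINE_INDEX
    tag = (tag_hi << 9) | tag_lo     # 14-bit tag
    return tag, line, index

# Two addresses that differ only in the untagged bits 26:21 get the same
# tag, line and index -- the 'mirror' effect: they alias in the cache.
a = 0x0000_0430
b = a | (1 << 21)                    # 2MB away
assert split_address(a) == split_address(b)
```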

\section{Memory Controller}
\label{memory_controller}

The cache functionality and the memory controller functionality are so
closely related that I found it convenient to bundle both in the same
module: I have experimented with separate modules and was unable to achieve
the same performance with my synthesis tools of choice.
So, while it would be desirable to split the cache from the memory controller
in some later version, for the time being they are a single module.\\

The memory controller interfaces the cache to external memory, be it off-core
or off-chip.
It implements the refill and writethrough state machines,
which will necessarily be different for different kinds of memory.\\

\subsection{Memory Map Definition}
\label{memory_map_definition}

The MIPS architecture specs define a memory map which determines which areas
are cached and what the default address translation is \cite[p.~2-8]{r3k_ref_man}.\\
Neither the memory translation nor the cacheable/uncacheable attribute of
the standard MIPS architecture has been implemented. In this core, program
addresses are always identical to hardware addresses.\\

When requested to perform a refill or a writethrough, the memory controller
needs to know what type of memory it is dealing with. The type of
memory for each memory area is defined in a hardcoded memory map
implemented in function 'decode\_address\_mips1', defined in package
'mips\_pkg.vhdl'. This function will synthesize into regular, combinational
decode logic.\\

For each address, the memory map logic will supply the following information:

\begin{enumerate}
\item What kind of memory it is.
\item How many wait states to use.
\item Whether it is writeable or not (ignored in current version).
\item Whether it is cacheable or not (ignored in current version).
\end{enumerate}

In the present implementation the memory map can't be modified at run time.\\
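
Functionally, 'decode\_address\_mips1' amounts to a pure combinational lookup
from address to memory attributes. The Python sketch below illustrates the
idea; the address ranges are invented for the example and do NOT reflect the
actual map in 'mips\_pkg.vhdl':

```python
# Sketch of what a hardcoded address decoder boils down to: a pure
# function from address to memory attributes. The ranges below are
# hypothetical, for illustration only.
from collections import namedtuple

MemAttr = namedtuple("MemAttr", "mem_type wait_states writeable cacheable")

def decode_address(addr):
    if addr < 0x0000_4000:                   # hypothetical on-chip BRAM
        return MemAttr("MT_BRAM", 0, True, True)
    if 0x2000_0000 <= addr < 0x2010_0000:    # hypothetical 16-bit SRAM
        return MemAttr("MT_SRAM_16B", 1, True, True)
    if 0xFFFF_0000 <= addr <= 0xFFFF_FFFF:   # hypothetical I/O registers
        return MemAttr("MT_IO_SYNC", 0, True, False)
    return MemAttr("MT_UNMAPPED", 0, False, False)
```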

These are the currently supported memory types:

\begin{tabular}{ll}
\hline
Identifier & Description \\
\hline
MT\_BRAM & Synchronous, on-chip FPGA BRAM\\
MT\_IO\_SYNC & Synchronous, on-chip register (meant for I/O)\\
MT\_SRAM\_16B & Asynchronous, off-chip static memory, 16-bit wide\\
MT\_SRAM\_8B & Asynchronous, off-chip static memory, 8-bit wide\\
MT\_DDR\_16B & Not used yet\\
MT\_UNMAPPED & Unmapped area\\
\hline
\end{tabular}

\subsection{Invalid memory accesses}
\label{invalid_memory}

Whenever the CPU attempts an invalid memory access, the 'unmapped' output
port of the Cache module will be asserted for 1 clock cycle.

The accesses that will raise the 'unmapped' output are these:

\begin{enumerate}
\item Code fetch from an address decoded as MT\_IO\_SYNC.
\item Data write to a memory address decoded as other than RAM or IO.
\item Any access to an address decoded as MT\_UNMAPPED.
\end{enumerate}
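
The three cases above can be summarized as a small decision function. The
following is a behavioural model derived from this list and the memory-type
table, not the actual VHDL:

```python
# Model of the 'unmapped' decision: given the decoded memory type and the
# kind of access, decide whether the cache module would pulse 'unmapped'.
# Memory-type names are from the table in the previous subsection.

def raises_unmapped(mem_type, access):
    # access is one of "fetch", "read", "write"
    if mem_type == "MT_UNMAPPED":
        return True                     # any access to unmapped space
    if access == "fetch" and mem_type == "MT_IO_SYNC":
        return True                     # code fetch from I/O
    ram_or_io = {"MT_BRAM", "MT_SRAM_16B", "MT_SRAM_8B", "MT_IO_SYNC"}
    if access == "write" and mem_type not in ram_or_io:
        return True                     # write to something not RAM or IO
    return False
```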

The 'unmapped' output is ignored by the current version of the parent MCU
module -- it is only used to raise a permanent flag that is then connected
to a LED for debugging purposes, hardly a useful approach in a real project.

In subsequent versions of the MCU module, the 'unmapped' signal will trigger
a hardware interrupt.

Note again that the memory attributes 'writeable' and 'cacheable' are not
used in the current version. In subsequent versions 'writeable' will be
used and 'cacheable' will be removed.

\subsection{Uncacheable memory areas}
\label{uncacheable_memory}

There are no predefined 'uncacheable' memory areas in the current version of
the core; all memory addresses are cacheable unless they are defined as
IO (see mips\_cache.vhdl).\\
In short, if it's not defined as MT\_UNMAPPED or MT\_IO\_SYNC, it is
cacheable.

\section{Cache Refill and Writethrough Chronograms}
\label{cache_state_machine}

The cache state machine deals with cache refills and data writethroughs. It is
this state machine that 'knows' about the type and size of the external
memories. When DRAM is eventually implemented, or when other widths of SRAM are
supported, this state machine will need to change.

The following subsections describe the refill and writethrough procedures.
Bear in mind that there is a single state machine that handles it all -- refills
and writethroughs can't be done simultaneously.

\subsubsection{SRAM interface read cycle timing -- 16-bit interface}
\label{sram_read_cycle_16b}

The refill procedure is identical for both D- and I-cache refills. All that
matters is the type of memory being read.

\needspace{15\baselineskip}
\begin{verbatim}
==== Chronogram 4.1: 16-bit SRAM refill -- DATA ============================
             __    __    __    __    __    __        __    __    __    __
clk        _/  \__/  \__/  \__/  \__/  \__/  \_ ..._/  \__/  \__/  \__/

cache/ps    ?|  (1)  |  (2)  |      ...      |  (2)  |??

refill_ctr  ?|       0       |      ...      |   3   |??

chip_addr   ?| 210h  | 211h  |      ...      | 217h  |--

data_rd     -XXXXX [218h] XXXXX [219h]  ...  XXXXX [217h] |--
             |<- 2-state sequence ->|
            _                                                        __
sram_oe_n    \________________________________ ... _________________/

   |<- Total: 24 clock cycles to refill a 4-word cache line ->|
============================================================================
\end{verbatim}\\

(NOTE: signal names left-clipped to fit page margins)

In the diagram, the data coming into bram\_data\_rd is depicted with some delay.

Signal \emph{cache/ps} is the current state of the cache state machine, and
in this chronogram it takes the following values:

\begin{enumerate}
\item data\_refill\_sram\_0
\item data\_refill\_sram\_1
\end{enumerate}

Each of the two states reads a halfword from SRAM. The two-state sequence is
repeated four times to refill the four-word cache line.

Signal \emph{refill\_ctr} counts the word index within the line being refilled,
and runs from 0 to 3.\\
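
The cycle count in the chronogram can be reconstructed as follows; the
three-cycles-per-halfword figure is an assumption derived from the 24-cycle
total shown above, not a parameter stated elsewhere:

```python
# Back-of-the-envelope refill timing for the 16-bit SRAM interface.
# A 4-word cache line is 8 halfword reads; assuming 3 clock cycles per
# halfword read (consistent with the chronogram's 24-cycle total).
WORDS_PER_LINE = 4
HALFWORDS_PER_WORD = 2
CYCLES_PER_READ = 3   # assumption: matches the total shown above

def refill_cycles():
    return WORDS_PER_LINE * HALFWORDS_PER_WORD * CYCLES_PER_READ
```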

\subsubsection{SRAM interface read cycle timing -- 8-bit interface}
\label{sram_read_cycle_8b}

The refill from an 8-bit static memory is essentially the same as depicted
above, except we need to read 4 bytes (over the LSB lines of the static memory
data bus) instead of 2 16-bit halfwords. The operation takes correspondingly
longer to perform and uses an extra address line but is otherwise identical.

TODO: 8-bit refill chronogram to be done.

\subsubsection{16-bit SRAM interface write cycle timing}
\label{sram_write_cycle}

The path of the state machine that deals with SRAM writethroughs is linear, so
a state diagram would not be very interesting. As you can see in the source
code, all the states are one clock long except for states
\emph{data\_writethrough\_sram\_0b} and \emph{data\_writethrough\_sram\_1b},
which will be as long as the number of wait states plus one.
This is the only writethrough parameter that is influenced by the wait state
attribute.\\

A general memory write will be 32 bits wide and thus will take two 16-bit
memory accesses to complete. Unaligned, halfword or byte wide CPU writes might
in some cases be optimized to take only a single 16-bit memory access. This
module does no such optimization yet.
For simplicity, all writethroughs take two 16-bit access cycles, even if one
of them has both we\_n signals deasserted.\\

The following chronogram has been copied from a simulation of the 'hello'
sample. It's a 32-bit wide write to address 00000430h.
As you can see, the 'chip address' (the address fed to the SRAM chip) is the
target address divided by 2 (because there are two 16-bit halfwords to the
32-bit word). In this particular case, all four bytes of the word are written
and so both we\_n signals are asserted for both halfwords.

In this example, the SRAM is being accessed with 1 WS: WE\_N is asserted for
two cycles.
Note how a number of cycles are spent in order to guarantee compliance with
the setup and hold times of the SRAM relative to the we, address and data lines.
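
The chip-address arithmetic described above can be sketched in a few lines;
the high-halfword-first ordering is read off chronogram 4.3, and the function
name is illustrative:

```python
# Sketch of the writethrough address arithmetic: the SRAM 'chip address'
# is the halfword address (byte address / 2), and a 32-bit write becomes
# two 16-bit accesses at consecutive chip addresses.

def writethrough_accesses(byte_addr, data32):
    chip = byte_addr >> 1                 # halfword address fed to the SRAM
    hi = (data32 >> 16) & 0xFFFF
    lo = data32 & 0xFFFF
    # Assumption: high halfword first, matching chronogram 4.3
    return [(chip, hi), (chip + 1, lo)]

# Write to 00000430h as in the 'hello' simulation: chip addresses 218h/219h,
# data halfwords 0000h then 044Ch.
assert writethrough_accesses(0x430, 0x0000_044C) == [(0x218, 0x0000),
                                                     (0x219, 0x044C)]
```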

\needspace{15\baselineskip}
\begin{verbatim}
==== Chronogram 4.3: 16-bit SRAM writethrough, 32-bit wide =================
                     __    __    __    __    __    __    __    __    __    _
clk                _/  \__/  \__/  \__/  \__/  \__/  \__/  \__/  \__/  \__/

cache/ps           ?| (1) | (2) | (3) | (4) | (5) | (6) | (7) |?

sram_chip_addr     ?|      218h      |      219h      |?

sram_data_wr       -------|  0000h   |     044Ch      |-
                   _______            ____             _______
sram_byte_we_n(0)         \__________/    \___________/
                   _______            ____             _______
sram_byte_we_n(1)         \__________/    \___________/

                   ________________________________________________________
sram_oe_n
============================================================================
\end{verbatim}\\

Signal \emph{cache/ps} is the current state of the cache state machine, and
in this chronogram it takes the following values:

\begin{enumerate}
\item data\_writethrough\_sram\_0a
\item data\_writethrough\_sram\_0b
\item data\_writethrough\_sram\_0c
\item data\_writethrough\_sram\_1a
\item data\_writethrough\_sram\_1b
\item data\_writethrough\_sram\_1c
\end{enumerate}

\section{Known Problems}
\label{cache_problems}

The cache implementation is still provisional and has a number of
acknowledged problems:

\begin{enumerate}
\item All parameters are hardcoded -- generics are almost ignored.
\item The SRAM read state machine does not guarantee the internal FPGA
    $T_{hold}$. On my current target board it works because the FPGA hold
    times (including an input mux in the parent module) are far smaller than
    the SRAM response times, but it would be better to insert an extra cycle
    after the wait states in the SRAM read state machine.
\item Cache logic is mixed with memory controller logic.
\end{enumerate}