Initialization mostly consists of marking all D- and I-cache lines as invalid.
The old R3000 had its own means to achieve this, but this core implements an
alternative, simplified scheme.\\

The standard R3000 cache control flags in the SR are not used, either. Instead,
two flags from the SR have been repurposed for cache control.\\

\subsection{Cache control flags}
\label{cache_control_flags}

Bits 17 and 16 of the SR are NOT used for their standard R3000 purpose.
entire 32-bit address:

\needspace{10\baselineskip}
\begin{verbatim}

           ___________ <-- These address bits are NOT in the tag
          /           \
 31 .. 27 | 26 .. 21  |    20 .. 12     |    11 .. 4    |3:2|
+---------+-----------+-----------------+---------------+---+---+
|    5    |           |        9        |       8       | 2 |   |
+---------+-----------+-----------------+---------------+---+---+
\end{verbatim}\\

Since bits 26 downto 21 are not included in the tag, there will be a
`mirror' effect in the cache: addresses that differ only in those bits map to
the same line with the same tag, so the cache cannot tell them apart. We have
effectively split the memory space into 32 separate blocks of 2MB, which is
obviously not enough but will do for the initial versions of the core.

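A few lines of Python can model the address split in the diagram above, which may help make the `mirror' effect concrete. This is only a sketch. It assumes the diagram is exact: the tag is formed from bits 31..27 and 20..12, the line index from bits 11..4, and the word select from bits 3..2. The function names \texttt{split\_address} and \texttt{same\_line\_and\_tag} are hypothetical and do not come from the core's VHDL source.

\begin{verbatim}
# Sketch of the cache address split shown in the diagram.
# Field boundaries come from the diagram; names are hypothetical.

def split_address(addr):
    """Return (tag, line_index, word) for a 32-bit byte address."""
    addr &= 0xFFFFFFFF
    tag_hi = (addr >> 27) & 0x1F      # bits 31..27 (5 bits)
    tag_lo = (addr >> 12) & 0x1FF     # bits 20..12 (9 bits)
    line_index = (addr >> 4) & 0xFF   # bits 11..4 (8 bits)
    word = (addr >> 2) & 0x3          # bits 3..2 (2 bits)
    # Bits 26..21 are simply dropped: this is what causes the mirroring.
    tag = (tag_hi << 9) | tag_lo      # 14-bit tag
    return tag, line_index, word


def same_line_and_tag(a, b):
    """True if a and b map to the same cache line with the same tag."""
    return split_address(a)[:2] == split_address(b)[:2]


print(same_line_and_tag(0x00000430, 0x00200430))  # True: only bit 21 differs
print(same_line_and_tag(0x00000430, 0x00100430))  # False: bit 20 is in the tag
\end{verbatim}

With this layout, 00000430h and 00200430h, which differ only in bit 21, produce the same tag and line index. A difference in bit 20 or in bit 27 changes the tag.
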
In subsequent versions of the cache, the tag size needs to be enlarged AND
some of the top bits might be omitted when they're not needed to implement
the default MIPS memory map (namely bit 30, which is always `0').

\section{Memory Controller}
\label{memory_controller}

decode logic.\\

For each address, the memory map logic supplies the following information:

\begin{enumerate}
\item What kind of memory it is.
\item How many wait states to use.
\item Whether it is writeable or not (ignored in the current version).
\item Whether it is cacheable or not (ignored in the current version).
\end{enumerate}

In the present implementation the memory map can't be modified at run time.\\

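Because the map is fixed, it behaves like a constant lookup table that returns the four attributes listed above for any address. The Python sketch below illustrates only that idea. Every region boundary, memory type name and wait-state count in it is hypothetical and does not describe the core's actual memory map.

\begin{verbatim}
# Hypothetical fixed memory map: a constant table searched by address.
# All regions, type names and wait-state counts are made up.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    base: int          # first byte address of the region
    size: int          # region size in bytes
    mem_type: str      # what kind of memory it is
    wait_states: int   # how many wait states to use
    writeable: bool    # ignored by the current version of the core
    cacheable: bool    # ignored by the current version of the core


# Immutable tuple of frozen records, mirroring a map fixed at build time.
MEMORY_MAP = (
    Region(0x00000000, 0x00100000, "SRAM_16", 1, True, True),
    Region(0xBFC00000, 0x00010000, "SRAM_8", 2, False, True),
    Region(0x20000000, 0x00001000, "IO", 0, True, False),
)


def decode(addr: int) -> Optional[Region]:
    """Return the attributes of the region containing addr, or None."""
    for region in MEMORY_MAP:
        if region.base <= addr < region.base + region.size:
            return region
    return None


print(decode(0x00000430).mem_type)  # SRAM_16
print(decode(0x30000000))           # None
\end{verbatim}
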
These are the currently supported memory types:
__ __ __ __ __ __ _ __ __ __ __
clk _/ \__/ \__/ \__/ \__/ \__/ \__/ ..._/ \__/ \__/ \__/

cache/ps ?| (1) | (2) | ... | (2) |??

refill_ctr ?| 0 | ... | 3 |??

chip_addr ?| 210h | 211h | ... | 217h |--

data_rd -XXXXX [218h] XXXXX [219h] | ... XXXXX [217h] |--
|<- 2-state sequence ->|

Signal \emph{cache/ps} is the current state of the cache state machine, and
in this chronogram it takes the following values:

\begin{enumerate}
\item data\_refill\_sram\_0
\item data\_refill\_sram\_1
\end{enumerate}

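The refill sequence in the chronogram above can be summarised as a small behavioural model. The sketch rests on assumptions the text does not confirm. \emph{chip\_addr} is taken to be the halfword address (the byte address divided by two), the halfword at the lower address is taken to be the most significant one (big-endian), and the helper name \texttt{refill\_16bit} is hypothetical. Under those assumptions, refilling the line at byte address 00000420h drives chip addresses 210h to 217h, which matches the \emph{chip\_addr} values in the chronogram.

\begin{verbatim}
# Behavioural sketch of a 16-bit SRAM line refill.
# Assumptions: chip_addr = byte address >> 1, big-endian halfword order,
# 16-byte lines (4 words). Names are hypothetical.

WORDS_PER_LINE = 4


def refill_16bit(line_addr, read_halfword):
    """Refill the line containing line_addr.

    read_halfword(chip_addr) models one 16-bit SRAM read.
    Returns (words, chip_addrs): the four 32-bit words and the
    chip addresses in the order they were driven.
    """
    base = (line_addr & ~0xF) >> 1          # halfword address of word 0
    words, chip_addrs = [], []
    for refill_ctr in range(WORDS_PER_LINE):
        hi_addr = base + 2 * refill_ctr     # first state of the pair
        lo_addr = hi_addr + 1               # second state of the pair
        hi = read_halfword(hi_addr) & 0xFFFF
        lo = read_halfword(lo_addr) & 0xFFFF
        chip_addrs += [hi_addr, lo_addr]
        words.append((hi << 16) | lo)
    return words, chip_addrs


# Fake SRAM in which every halfword holds its own chip address.
words, chip_addrs = refill_16bit(0x00000420, lambda a: a)
print([hex(a) for a in chip_addrs])
print([hex(w) for w in words])
\end{verbatim}

Each iteration corresponds to one value of \emph{refill\_ctr} and to one pass through the two-state sequence.
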
Each of the two states reads a halfword from SRAM. The two-state sequence is

\subsubsection{SRAM interface read cycle timing -- 8-bit interface}
\label{sram_read_cycle_8b}

The refill from an 8-bit static memory is essentially the same as depicted
above, except that we need to read 4 bytes per word (over the LSB lines of the
static memory data bus) instead of two 16-bit halfwords. The operation takes
correspondingly longer to perform and uses an extra address line, but is
otherwise identical.

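The address arithmetic behind the extra address line can be sketched as follows. The sketch assumes, without confirmation from the source, that the chip address is the byte address divided by the memory width in bytes. The function names and the 1MB memory size used in the example are hypothetical.

\begin{verbatim}
# Sketch comparing line refills over 16-bit and 8-bit static memory.
# Assumption: chip address = byte address // memory width in bytes.
# Names and the 1MB size used below are hypothetical.

LINE_BYTES = 16   # four 32-bit words per cache line


def refill_addresses(line_addr, width_bytes):
    """Chip addresses driven, in order, to refill one cache line."""
    if width_bytes not in (1, 2):
        raise ValueError("only 8-bit and 16-bit memories are modelled")
    first = (line_addr & ~(LINE_BYTES - 1)) // width_bytes
    return list(range(first, first + LINE_BYTES // width_bytes))


def address_lines(mem_bytes, width_bytes):
    """Chip address lines needed to cover mem_bytes of memory."""
    return (mem_bytes // width_bytes - 1).bit_length()


print(len(refill_addresses(0x420, 2)), len(refill_addresses(0x420, 1)))  # 8 16
print(address_lines(1 << 20, 2), address_lines(1 << 20, 1))              # 19 20
\end{verbatim}

Under these assumptions, an 8-bit memory needs twice as many accesses per line (16 instead of 8). It also needs one more address line to cover the same amount of memory.
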
TODO: 8-bit refill chronogram to be done.

\subsubsection{16-bit SRAM interface write cycle timing}
\label{sram_write_cycle}

The path of the state machine that deals with SRAM writethroughs is linear, so
a state diagram would not be very interesting. As you can see in the source
code, all the states are one clock long except for states
attribute.\\

A general memory write is 32 bits wide, so it takes two 16-bit memory
accesses to complete. Unaligned, halfword or byte-wide CPU writes could in
some cases be optimized to take only a single 16-bit memory access, but this
module does no such optimization yet.
For simplicity, all writethroughs take two 16-bit access cycles, even if one
of them has both we\_n signals deasserted.\\

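Python can sketch how a CPU write maps onto the two 16-bit accesses. The sketch assumes three conventions: a 4-bit byte-enable mask whose bit 3 selects the byte at the lowest address, big-endian byte order, and one active-low we\_n signal per byte lane of each halfword. These conventions and the name \texttt{writethrough\_we\_n} are illustrative assumptions, not the module's actual interface.

\begin{verbatim}
# Sketch: split a CPU write into the two 16-bit writethrough accesses.
# Assumptions: byte_en bit 3 = byte at the lowest address, big-endian
# order, one active-low we_n per byte lane. Names are hypothetical.

def writethrough_we_n(byte_en):
    """Return [(we_n_hi, we_n_lo) for access 0, ... for access 1]."""
    accesses = []
    for half in (0, 1):                  # always two 16-bit accesses
        hi_byte = (byte_en >> (3 - 2 * half)) & 1
        lo_byte = (byte_en >> (2 - 2 * half)) & 1
        # Active low: 0 means the byte lane is written.
        accesses.append((1 - hi_byte, 1 - lo_byte))
    return accesses


print(writethrough_we_n(0b1111))  # [(0, 0), (0, 0)]  32-bit write
print(writethrough_we_n(0b0001))  # [(1, 1), (1, 0)]  single byte write
\end{verbatim}

Like the module, the sketch always produces two accesses. For the single-byte write, the first access has both we\_n signals deasserted.
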
The following chronogram has been copied from a simulation of the `hello'
sample. It shows a 32-bit wide write to address 00000430h.
word). In this particular case all four bytes of the long word are written,
so both we\_n signals are asserted for both halfwords.

In this example the SRAM is accessed with 1 WS, so WE\_N is asserted for
two cycles.
Note how many cycles are spent meeting the SRAM setup and hold times on the
we, address and data lines.

\needspace{15\baselineskip}
\begin{verbatim}
==== Chronogram 4.3: 16-bit SRAM writethrough, 32-bit wide =================

Signal \emph{cache/ps} is the current state of the cache state machine, and
in this chronogram it takes the following values:

\begin{enumerate}
\item data\_writethrough\_sram\_0a
\item data\_writethrough\_sram\_0b
\item data\_writethrough\_sram\_0c
\item data\_writethrough\_sram\_1a
\item data\_writethrough\_sram\_1b

\section{Known Problems}
\label{cache_problems}

The cache implementation is still provisional and has a number of
acknowledged problems:

\begin{enumerate}
\item All parameters hardcoded -- generics are almost ignored.
\item SRAM read state machine does not guarantee internal FPGA $T_{hold}$.
On my current target board it works because the FPGA hold times
(including an input mux in the parent module) are far smaller than the SRAM
response times, but it would be better to insert an extra cycle after the
wait states in the SRAM read state machine.
\item Cache logic mixed with memory controller logic.
\end{enumerate}