Line 8... |
Line 8... |
\makebox[\textwidth]{\framebox[9cm]{\rule{0pt}{9cm}
|
\makebox[\textwidth]{\framebox[9cm]{\rule{0pt}{9cm}
|
\includegraphics[width=8cm]{img/cpu_symbol.png}}}
|
\includegraphics[width=8cm]{img/cpu_symbol.png}}}
|
\caption{CPU module interface\label{cpu_symbol}}
|
\caption{CPU module interface\label{cpu_symbol}}
|
\end{figure}
|
\end{figure}
|
|
|
the CPU module is not meant to be used directly. Instead, the MCU module
|
the CPU module is not meant to be used directly. Instead, the SoC module
|
described in section~\ref{mcu_module} should be used.\\
|
described in chapter~\ref{soc_module} should be used.\\
|
|
|
The following sections will describe the CPU structure and interfaces.\\
|
The following sections will describe the CPU structure and interfaces.\\
|
|
|
\section{Bus Architecture}
|
\section{Bus Architecture}
|
\label{bus_architecture}
|
\label{bus_architecture}
|
Line 44... |
Line 44... |
bus(es).\\
|
bus(es).\\
|
|
|
Note that the basic cpu module (mips\_cpu) is meant to be connected to
|
Note that the basic cpu module (mips\_cpu) is meant to be connected to
|
internal, synchronous BRAMs only (i.e. the cache BRAMs). Some of its
|
internal, synchronous BRAMs only (i.e. the cache BRAMs). Some of its
|
outputs are not registered because they needn't be. The parent module
|
outputs are not registered because they needn't be. The parent module
|
(called 'mips\_mcu') has its outputs registered to limit $t_{co}$ to
|
(called 'mips\_soc') has its outputs registered to limit $t_{co}$ to
|
acceptable values.\\
|
acceptable values.\\
|
|
|
|
|
\subsection{Code and data read bus interface}
|
\subsection{Code and data read bus interface}
|
\label{code_data_buses}
|
\label{code_data_buses}
|
Line 169... |
Line 169... |
|
|
In short, the 'mem\_wait' input will unconditionally stall all pipeline
|
In short, the 'mem\_wait' input will unconditionally stall all pipeline
|
stages as long as it is active. It is meant to be used by the cache at cache
|
stages as long as it is active. It is meant to be used by the cache at cache
|
refills.\\
|
refills.\\
|
|
|
In short, the cache/memory controller stops the cpu for all data/code
|
The cache/memory controller stops the cpu for all data/code
|
misses for as long as it takes to do a line refill. The current cache
|
misses for as long as it takes to do a line refill. The current cache
|
implementation does refills in order (i.e. not 'target address first').
|
implementation does refills in reverse order (i.e. not 'target address first').
|
|
|
Note that external memory wait states are a separate issue. They too are
|
Note that external memory wait states are a separate issue. They too are
|
handled in the cache/memory controller. See section~\ref{cache} on the memory
|
handled in the cache/memory controller. See section~\ref{cache} on the memory
|
controller.
|
controller.
|
|
|
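The remark about refill order can be made concrete with a small behavioural sketch. It is not taken from the core's sources: the function name refill\_sequence, the 4-word line and the order labels are assumptions chosen for illustration. It only shows how a fixed-order refill differs from a 'target address first' refill for the same miss.

```python
WORD_BYTES = 4  # MIPS-I word size


def refill_sequence(target_addr, line_words=4, order="ascending"):
    """Return the word addresses fetched during one line refill.

    line_words and the order names are illustrative assumptions,
    not parameters of the actual cache module.
    """
    line_bytes = line_words * WORD_BYTES
    base = target_addr - (target_addr % line_bytes)  # start of the cache line
    ascending = [base + i * WORD_BYTES for i in range(line_words)]
    if order == "ascending":
        return ascending
    if order == "reverse":
        return ascending[::-1]
    if order == "target_first":
        # Start at the word that missed, then wrap around the line.
        first = (target_addr - base) // WORD_BYTES
        return ascending[first:] + ascending[:first]
    raise ValueError("unknown refill order: " + order)


if __name__ == "__main__":
    for name in ("ascending", "reverse", "target_first"):
        print(name, [hex(a) for a in refill_sequence(0x108, order=name)])
```

With either fixed order the word that caused the miss is not fetched first, so the cpu stays stalled for the whole line refill, as described above.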
Line 269... |
Line 269... |
|
|
All read and write ports of the register bank are synchronous. The read
|
All read and write ports of the register bank are synchronous. The read
|
ports belong logically to stage 1 and the write port to stage 2.\\
|
ports belong logically to stage 1 and the write port to stage 2.\\
|
|
|
IMPORTANT: though the register bank read port is synchronous, its data can
|
IMPORTANT: though the register bank read port is synchronous, its data can
|
be used in stage 1 because it is read early (the read port is loaded at the
|
be used in stage 1 because it is read early (the read address port is loaded at the
|
same time as the instruction opcode). That is, a small part of the
|
same time as the instruction opcode). That is, a small part of the
|
instruction decoding is done on stage FETCH-1. Bearing in mind that the code
|
instruction decoding is done on stage FETCH-1, by feeding the source
|
|
register index field straight from the code bus to the register bank BRAM.
|
|
|
|
Bearing in mind that the code cache
|
ram is meant to be the exact same type of block as the register bank (or
|
ram is meant to be the exact same type of block as the register bank (or
|
faster if the register bank is implemented with distributed RAM), and we
|
faster if the register bank is implemented with distributed RAM), and we
|
will cram the whole ALU delay plus the reg bank delay in stage 1, it does
|
will cram the whole ALU delay plus the reg bank delay in stage 1, it does
|
not hurt moving a tiny part of the decoding to the previous cycle.\\
|
not hurt moving a tiny part of the decoding to the previous cycle.\\
|
|
|
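The early register read described above can be sketched as a behavioural model. This is only an illustration: the class SyncReadPort and the function fetch\_edge are hypothetical names, and the register contents are made up. The one fixed fact is the MIPS-I position of the 'rs' field (bits 25..21).

```python
RS_SHIFT, RS_MASK = 21, 0x1F  # MIPS-I 'rs' source register field, bits 25..21


class SyncReadPort:
    """Synchronous RAM read port: data is valid after the clock edge
    on which the address was presented (hypothetical model)."""

    def __init__(self, contents):
        self.mem = list(contents)
        self.data_out = 0

    def clock(self, addr):
        self.data_out = self.mem[addr]


def fetch_edge(code_bus_word, reg_bank):
    """Model the clock edge that closes the fetch cycle.

    The opcode is latched and, on the same edge, the rs index taken
    straight from the code bus is loaded into the register bank.
    """
    rs_index = (code_bus_word >> RS_SHIFT) & RS_MASK
    reg_bank.clock(rs_index)
    return code_bus_word  # value held by the instruction register


# Made-up register contents: register i holds i * 0x10.
bank = SyncReadPort([i * 0x10 for i in range(32)])
# addu $3, $5, $6 : opcode 0, rs=5, rt=6, rd=3, funct 0x21
instr = (5 << 21) | (6 << 16) | (3 << 11) | 0x21
ir = fetch_edge(instr, bank)
```

After the edge, bank.data\_out already holds the rs operand, so it is available during stage 1 together with the latched opcode.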
Line 452... |
Line 455... |
The following instruction is aborted even if it is a load or a jump, and
|
The following instruction is aborted even if it is a load or a jump, and
|
traps work as specified even from a delay slot -- in that case, the address
|
traps work as specified even from a delay slot -- in that case, the address
|
saved to EPC is not the victim instruction's but the preceding jump
|
saved to EPC is not the victim instruction's but the preceding jump
|
instruction's as explained in \cite[p.~64]{see_mips_run}.\\
|
instruction's as explained in \cite[p.~64]{see_mips_run}.\\
|
|
|
Plasma used to save in epc the address of the instruction after break or
|
Plasma CPU used to save in epc the address of the instruction after break or
|
syscall, and used a nonstandard vector address (0x03c). This core will go
|
syscall, and used a nonstandard vector address (0x03c). This core will go
|
the standard R3000 way instead.\\
|
the standard R3000 way instead.\\
|
|
|
Note that the epc register is not used by any instruction other than mfc0.
|
Note that the epc register is not used by any instruction other than mfc0.
|
RTE is implemented and works as per R3000 specs.\\
|
RTE is implemented and works as per R3000 specs.\\
|
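The address saved on a trap, as described above, can be summarized in a minimal sketch. The function names are hypothetical, and the model covers only the saved address, nothing else of the exception logic.

```python
INSTR_BYTES = 4  # every MIPS-I instruction is one 32-bit word


def r3000_epc(victim_pc, in_delay_slot):
    """Address saved on a trap, R3000 style.

    For a victim in a delay slot, the saved address is that of the
    preceding jump, which sits one instruction before the victim.
    """
    if in_delay_slot:
        return victim_pc - INSTR_BYTES
    return victim_pc


def old_plasma_epc(victim_pc):
    """Behaviour attributed above to Plasma for break/syscall:
    save the address of the instruction after the victim."""
    return victim_pc + INSTR_BYTES


if __name__ == "__main__":
    print(hex(r3000_epc(0x104, in_delay_slot=True)))
    print(hex(r3000_epc(0x104, in_delay_slot=False)))
    print(hex(old_plasma_epc(0x104)))
```

Saving the jump address for a delay-slot victim means that resuming at the saved address re-executes both the jump and its delay slot.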