OpenCores

Diff between Rev 210 and Rev 221. Lines prefixed with "-" exist only in Rev 210, lines prefixed with "+" exist only in Rev 221, and all other lines are common to both.
Line 17...	Line 17...
  Initialization means mostly marking all D- and I-cache lines as invalid.
  The old R3000 had its own means to achieve this, but this core implements an
  alternative, simplified scheme.\\

  The standard R3000 cache control flags in the SR are not used, either. Instead,
- two flags from the SR have been commandeered for cache control.\\
+ two flags from the SR have been repurposed for cache control.\\

  \subsection{Cache control flags}
  \label{cache_control_flags}

  Bits 17 and 16 of the SR are NOT used for their standard R3000 purpose.
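The initialization step described above can be pictured with a small behavioural model. This is a hypothetical sketch, not the core's own scheme (its details fall outside this excerpt). It assumes direct-mapped D- and I-caches of 256 lines each, with a valid flag and a tag per line. The class and method names are invented for illustration.

```python
class TagStore:
    """Hypothetical direct-mapped tag store: a valid flag and a tag per line."""

    def __init__(self, num_lines: int = 256) -> None:
        # Model arbitrary power-up contents: every line starts out looking valid.
        self.valid = [True] * num_lines
        self.tags = [0] * num_lines

    def invalidate_all(self) -> None:
        """Initialization: mark every line invalid, one index at a time."""
        for index in range(len(self.valid)):
            self.valid[index] = False

    def hit(self, index: int, tag: int) -> bool:
        """A lookup hits only on a valid line whose stored tag matches."""
        return self.valid[index] and self.tags[index] == tag


# Both caches are initialized the same way.
dcache, icache = TagStore(), TagStore()
for cache in (dcache, icache):
    cache.invalidate_all()
```

Starting from arbitrary valid flags makes the purpose of the step explicit. Until every line has been invalidated, stale tags could produce false hits.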
Line 65...	Line 65...
  entire 32-bit address:

  \needspace{10\baselineskip}
  \begin{verbatim}

-              ___________ <-- These address bits are NOT in the tag
+                 _________ <-- These address bits are NOT in the tag
              /           \
      31 ..   27| 26 .. 21  |20 ..          12|11  ..        4|3:2|
      +---------+-----------+-----------------+---------------+---+---+
      | 5       |           | 9               | 8             | 2 |   |
      +---------+-----------+-----------------+---------------+---+---+
Line 79...	Line 79...
  \end{verbatim}\\

  Since bits 26 downto 21 are not included in the tag, there will be a
  'mirror' effect in the cache. We have effectively split the memory space
  into 32 separate blocks of 1MB which is obviously not enough but will do
- for the initial tests.
+ for the initial versions of the core.

  In subsequent versions of the cache, the tag size needs to be enlarged AND
  some of the top bits might be omitted when they're not needed to implement
- the default memory map (namely bit 30 which is always '0').
+ the default MIPS memory map (namely bit 30 which is always '0').
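The address split in the diagram, and the mirror effect it causes, can be checked with a short model. This is a sketch under assumptions, not code taken from the core. Reading the field widths from the diagram, it assumes the following layout:

- bits 31..27 and 20..12 form a 14-bit tag,
- bits 11..4 select one of 256 lines,
- bits 3..2 select the word within a 4-word line.

The helper names `field`, `split_address` and `mirrors` are hypothetical.

```python
# Hypothetical model of the cache address split shown in the diagram.
# Assumed layout: tag = bits 31..27 and 20..12, line index = bits 11..4,
# word-in-line = bits 3..2; bits 26..21 are not stored anywhere.

def field(addr: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit address."""
    return (addr >> lo) & ((1 << (hi - lo + 1)) - 1)


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, line_index, word) for a 32-bit byte address."""
    tag = (field(addr, 31, 27) << 9) | field(addr, 20, 12)  # 5 + 9 = 14 bits
    line_index = field(addr, 11, 4)                          # 8 bits, 256 lines
    word = field(addr, 3, 2)                                 # 4 words per line
    return tag, line_index, word


def mirrors(a: int, b: int) -> bool:
    """True if two different addresses are indistinguishable to the cache."""
    return a != b and split_address(a) == split_address(b)


if __name__ == "__main__":
    a = 0x00000430
    b = a | (0x3F << 21)  # same address with bits 26..21 all set
    print(hex(b), split_address(a), mirrors(a, b))  # prints: 0x7e00430 (0, 67, 0) True
```

Under this assumption, any two addresses that differ only in bits 26..21 share a tag and a line, so the cache cannot tell them apart. For example, 00000430h and 07E00430h are indistinguishable.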


  \section{Memory Controller}
  \label{memory_controller}

Line 120...	Line 121...
  decode logic.\\

  For each address, the memory map logic will supply the following information:

  \begin{enumerate}
- \item What kind of memory it is
- \item How many wait states to use
- \item Whether it is writeable or not (ignored in current version)
- \item Whether it is cacheable or not (ignored in current version)
+ \item What kind of memory it is.
+ \item How many wait states to use.
+ \item Whether it is writeable or not (ignored in current version).
+ \item Whether it is cacheable or not (ignored in current version).
  \end{enumerate}

  In the present implementation the memory map can't be modified at run time.\\
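A fixed memory map of this kind amounts to a constant table searched by address. The sketch below is hypothetical. The region boundaries, the memory type names and the attribute values are invented for illustration and are not the core's actual map. Only the four attributes per address come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int          # first byte address of the region
    size: int          # region size in bytes
    mem_type: str      # kind of memory (hypothetical names)
    wait_states: int   # wait states to use for each access
    writeable: bool    # ignored by the current core, per the text
    cacheable: bool    # ignored by the current core, per the text


# Hypothetical constant map: a tuple of frozen records, fixed at run time.
MEMORY_MAP = (
    Region(0x00000000, 0x00010000, "bram", 0, True, True),
    Region(0x20000000, 0x00010000, "io", 2, True, False),
    Region(0x80000000, 0x00100000, "sram16", 1, True, True),
)


def decode(addr: int) -> Region:
    """Return the attributes of the region containing addr."""
    for region in MEMORY_MAP:
        if region.base <= addr < region.base + region.size:
            return region
    raise ValueError(f"address {addr:#010x} is not mapped")
```

Keeping the table in an immutable tuple of frozen records mirrors the statement that the map cannot be modified at run time.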

  These are the currently supported memory types:
Line 207...	Line 208...
                __    __    __    __    __    __    _     __    __    __    __
  clk         _/  \__/  \__/  \__/  \__/  \__/  \__/ ..._/  \__/  \__/  \__/

  cache/ps    ?| (1)             | (2)             | ... | (2)             |??

- refill_ctr  ?| 0                                 | ... <  3              |??
+ refill_ctr  ?| 0                                 | ... |  3              |??

  chip_addr   ?|  210h           |  211h           | ... |  217h           |--

  data_rd     -XXXXX  [218h]     XXXXX  [219h]     | ... XXXXX  [217h]     |--
               |<- 2-state sequence              ->|
Line 227...	Line 228...

  Signal \emph{cache/ps} is the current state of the cache state machine, and
  in this chronogram it takes the following values:

  \begin{enumerate}
- \item idle
  \item data\_refill\_sram\_0
  \item data\_refill\_sram\_1
  \end{enumerate}

  Each of the two states reads a halfword from SRAM. The two-state sequence is
Line 242...	Line 242...


  \subsubsection{SRAM interface read cycle timing -- 8-bit interface}
  \label{sram_read_cycle_8b}

- TODO: 8-bit refill procedure to be done.
+ The refill from an 8-bit static memory is essentially the same as depicted
+ above, except we need to read 4 bytes (over the LSB lines of the static memory
+ data bus) instead of 2 16-bit halfwords. The operation takes correspondingly
+ longer to perform and uses an extra address line but is otherwise identical.
+
+ TODO: 8-bit refill chronogram to be done.
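The 16-bit refill in the chronogram and the 8-bit variant described here differ in two ways: how many accesses make up each word, and the granularity of the chip address. The following behavioural sketch, not the VHDL, generates the access sequence for one line refill and reassembles the words. It makes these assumptions:

- a line holds 4 words (refill_ctr counts 0 to 3),
- byte order is big-endian, with the lowest chip address holding the most significant part of each word,
- chip addresses are counted in units of the bus width,
- `read_chip` is a hypothetical callback standing in for the SRAM.

```python
from typing import Callable

WORDS_PER_LINE = 4  # refill_ctr counts 0..3 in the 16-bit chronogram


def refill_line(byte_addr: int, bus_bytes: int,
                read_chip: Callable[[int], int]) -> list[int]:
    """Refill one cache line from an 8-bit (bus_bytes=1) or 16-bit (bus_bytes=2) SRAM.

    read_chip(chip_addr) returns one bus-width item. chip_addr counts items,
    so the 8-bit bus needs one more address line than the 16-bit bus to
    cover the same byte range.
    """
    if bus_bytes not in (1, 2):
        raise ValueError("only 8-bit and 16-bit SRAM interfaces are modelled")
    accesses_per_word = 4 // bus_bytes                  # 2 halfwords or 4 bytes
    line_base = byte_addr & ~(WORDS_PER_LINE * 4 - 1)   # align to a 16-byte line
    chip_addr = line_base // bus_bytes
    words = []
    for refill_ctr in range(WORDS_PER_LINE):            # one word per refill_ctr value
        word = 0
        for _ in range(accesses_per_word):              # one SRAM access per item
            word = (word << (8 * bus_bytes)) | read_chip(chip_addr)
            chip_addr += 1
        words.append(word)
    return words
```

For example, refilling the line at byte address 420h over a 16-bit bus reads chip addresses 210h to 217h, matching the chronogram above. Over an 8-bit bus the same line takes 16 reads, at chip addresses 420h to 42Fh.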


- \subsubsection{SRAM interface write cycle timing}
+ \subsubsection{16-bit SRAM interface write cycle timing}
  \label{sram_write_cycle}

  The path of the state machine that deals with SRAM writethroughs is linear so
  a state diagram would not be very interesting. As you can see in the source
  code, all the states are one clock long except for states
Line 259...	Line 264...
  attribute.\\

  A general memory write will be 32-bit wide and thus it will take two 16-bit
  memory accesses to complete. Unaligned, halfword or byte wide CPU writes might
  in some cases be optimized to take only a single 16-bit memory access. This
- module does no such optimization.
+ module does no such optimization yet.
  For simplicity, all writethroughs take two 16-bit access cycles, even if one
  of them has both we\_n signals deasserted.\\
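The rule above can be sketched as follows: every writethrough takes two 16-bit accesses, whatever the width of the CPU write. This is a hypothetical model rather than the core's logic. It makes these assumptions:

- the CPU supplies a 4-bit byte-enable mask whose bit 3 selects the most significant byte,
- lanes map big-endian, with the most significant halfword at the lower chip address,
- chip addresses are in 16-bit units,
- the active-low enables are called `we_n_hi` and `we_n_lo` (names invented here) for data bits 15..8 and 7..0.

```python
from typing import NamedTuple


class HalfwordAccess(NamedTuple):
    chip_addr: int   # address in 16-bit units
    data: int        # 16-bit value driven on the data bus
    we_n_hi: int     # active-low write enable for data bits 15..8
    we_n_lo: int     # active-low write enable for data bits 7..0


def writethrough(byte_addr: int, data: int, byte_en: int) -> list[HalfwordAccess]:
    """Split a 32-bit CPU write into the two 16-bit SRAM accesses.

    Both accesses are always generated, even when a halfword has no enabled
    bytes (both we_n deasserted), as in the unoptimized scheme described.
    """
    chip_addr = (byte_addr & ~3) // 2
    accesses = []
    for i, shift in enumerate((16, 0)):             # upper halfword first
        lanes = (byte_en >> (shift // 8)) & 0b11    # the two enables for this halfword
        accesses.append(HalfwordAccess(
            chip_addr=chip_addr + i,
            data=(data >> shift) & 0xFFFF,
            we_n_hi=0 if lanes & 0b10 else 1,
            we_n_lo=0 if lanes & 0b01 else 1,
        ))
    return accesses
```

With these assumptions, the 32-bit write to 00000430h from the 'hello' sample becomes two accesses at chip addresses 218h and 219h, with all four we_n signals asserted. A single-byte write still produces a second access with both we_n deasserted.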

  The following chronogram has been copied from a simulation of the 'hello'
  sample. It's a 32-bit wide write to address 00000430h.
Line 272...	Line 277...
  word). In this particular case, all the four bytes of the long word are written
  and so both the we\_n signals are asserted for both halfwords.

  In this example, the SRAM is being accessed with 1 WS: WE\_N is asserted for
  two cycles.
- Note how a lot of cycles are lost in order to guarantee compliance with the
+ Note how a lot of cycles are used in order to guarantee compliance with the
  setup and hold times of the SRAM against the we, address and data lines.

  \needspace{15\baselineskip}
  \begin{verbatim}
  ==== Chronogram 4.3: 16-bit SRAM writethrough, 32-bit wide =================
Line 300...	Line 305...

  Signal \emph{cache/ps} is the current state of the cache state machine, and
  in this chronogram it takes the following values:

  \begin{enumerate}
- \item idle
  \item data\_writethrough\_sram\_0a
  \item data\_writethrough\_sram\_0b
  \item data\_writethrough\_sram\_0c
  \item data\_writethrough\_sram\_1a
  \item data\_writethrough\_sram\_1b
Line 314...	Line 318...


  \section{Known Problems}
  \label{cache_problems}

+ The cache implementation is still provisional and has a number of
+ acknowledged problems:
+
  \begin{enumerate}
  \item All parameters hardcoded -- generics are almost ignored.
  \item SRAM read state machine does not guarantee internal FPGA $T_{hold}$.
  In my current target board it works because the FPGA hold times
  (including an input mux
  in the parent module) are far smaller than the SRAM response times, but
  it would be better to insert an extra cycle after the wait states in
  the sram read state machine.
+ \item Cache logic mixed with memory controller logic.
  \end{enumerate}


\ No newline at end of file



Subversion Repositories ion: /ion/trunk/doc/src/tex/cache.tex, diff between revs 210 and 221