\documentclass{gqtekspec}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.2}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC
% ...
You should have received a copy of the GNU General Public License along
with this program. If not, see \hbox{<http://www.gnu.org/licenses/>} for a
copy.
\end{license}
\begin{revisionhistory}
0.2 & 8/19/2015 & Gisselquist & Still Draft, more complete \\\hline
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
\listoffigures
\listoftables
\begin{preface}
Many people have asked me why I am building the Zip CPU. ARM processors are
good and effective. Xilinx makes and markets Microblaze, Altera markets Nios,
and both have better toolsets than the Zip CPU will ever have. OpenRISC is
also available, and RISC--V may be replacing it. Why build a new processor?

The easiest, most obvious answer is the simple one: Because I can.

There's more to it, though. There's a lot that I would like to do with a
processor, and I want to be able to do it in a vendor-independent fashion.
% ...

For those who like buzzwords, the Zip CPU is:
\begin{itemize}
\item A 32-bit CPU: All registers, addresses, and instructions are
32~bits wide.
\item A RISC CPU. There is no microcode for executing instructions. All
instructions are designed to complete in one clock cycle.
\item A Load/Store architecture. (Only load and store instructions
can access memory.)
\item Wishbone compliant. All peripherals are accessed just like
memory across this bus.
\item A von Neumann architecture. (The instructions and data share a
common bus.)
\item A pipelined architecture, having stages for {\bf Prefetch},
{\bf Decode}, {\bf Read-Operand}, the {\bf ALU/Memory}
unit, and {\bf Write-back}. See Fig.~\ref{fig:cpu}
for a diagram of this structure.
\begin{figure}\begin{center}
\includegraphics[width=3.5in]{../gfx/cpu.eps}
\caption{Zip CPU internal pipeline architecture}\label{fig:cpu}
\end{center}\end{figure}
\item Completely open source, licensed under the GPL.\footnote{Should you
need a copy of the Zip CPU licensed under other terms, please
contact me.}
\end{itemize}

% ...
capabilities that I was never expecting to need. These include:
\begin{itemize}
\item {\bf External Debug:} Once placed upon an FPGA, some external means is
still necessary to debug this CPU. That means that there needs to be
an external register that can control the CPU: reset it, halt it, step
it, and tell whether it is running or not. My chosen interface
includes a second register similar to this control register. This
second register allows the external controller or debugger to examine
registers internal to the CPU.

\item {\bf Internal Debug:} Being able to run a debugger from within
a user process requires an ability to step a user process from
within a debugger. It also requires a break instruction that can
% ...
allowed to execute. That way, upon a break, the debugger should
be able to jump back into the user process to step the instruction
that would've been at the breakpoint initially, and then to
replace the break after passing it.

Incidentally, this break messes with the prefetch cache and the
pipeline: if you change an instruction partway through the pipeline,
the whole pipeline needs to be flushed. Likewise, if you change
an instruction in memory, you need to make sure the cache is reloaded
with the new instruction.

\item {\bf Prefetch Cache:} My original implementation had a very
simple prefetch stage. Any time the PC changed, the prefetch stage would go
and fetch the new instruction. While this was perhaps the simplest
approach, it cost roughly five clocks for every instruction. This
was deemed unacceptable, as I wanted a CPU that could execute
% ...
purpose of the {\bf `trap'} instruction. This instruction needs to
place the CPU into supervisor mode (here equivalent to disabling
interrupts), as well as hand it a parameter, such as one identifying
which O/S function was called.

My initial approach to building a trap instruction was to create an external
peripheral which, when written to, would generate an interrupt and could
return the last value written to it. In practice, this approach didn't work
at all: the CPU executed two instructions while waiting for the
trap interrupt to take place. Since then, I've decided to reserve the rest of
the CC register for that purpose, so that a write to the CC register with the
GIE bit cleared could be used to execute a trap. This has other problems,
though, primarily in that it limits the other uses of the CC register. In
particular, the CC register is the best place to put CPU state information and
to ``announce'' special CPU features (floating point, etc.). So the trap
instruction still switches to interrupt mode, but the CC register is not
nearly as useful for telling the supervisor mode processor what trap is being
executed.

Modern timesharing systems also depend upon a {\bf Timer} interrupt
to handle task swapping. For the Zip CPU, this interrupt is handled
externally to the CPU as part of the CPU System, found in {\tt zipsystem.v}.
The timer module itself is found in {\tt ziptimer.v}.

\item {\bf Pipeline Stalls:} My original plan was not to support pipeline
stalls at all, but rather to require the compiler to properly schedule
all instructions so that stalls would never be necessary. After trying
to build such an architecture, I gave up, having learned some things.

For example, in order to facilitate interrupt handling and debug
stepping, the CPU needs to know which instructions have finished and
which have not. In other words, it needs to know where it can restart
the pipeline from. Once restarted, it must act as though it had
never stopped. This killed my idea of delayed branching: what
would be the appropriate program counter to restart at? The one the
CPU was going to branch to, or the ones in the delay slots? This
also makes the idea of compressed instruction codes difficult since,
again, where do you restart on interrupt?

So I switched to a model of discrete execution: Once an instruction
enters either the ALU or memory unit, the instruction is
guaranteed to complete. If the logic recognizes a branch or a
condition that would render the instruction entering into this stage
% ...
instruction for example), then the pipeline stalls for one cycle
until the conditional branch completes. Then, if it generates a new
PC address, the preceding stages are all wiped clean.

The discrete execution model allows such things as sleeping: if the
CPU is put to ``sleep,'' the ALU and memory stages stall and back up
everything before them. Likewise, anything that has entered the ALU
or memory stage when the CPU is put to sleep continues to completion.
To handle this logic, each pipeline stage has three control signals:
a valid signal, a stall signal, and a clock enable signal. In
general, a stage stalls if its contents are valid and the next stage
is stalled. This allows the pipeline to fill any time a later stage
stalls.

This approach also differs from other pipelining approaches. Instead
of keeping the entire pipeline filled, each stage is treated
independently. Therefore, individual stages may move forward as long
as the subsequent stage is available, regardless of whether the stage
behind it is filled.

\item {\bf Verilog Modules:} When examining how other processors worked
here on OpenCores, many of them had one separate module per pipeline
stage. While this appeared to me to be a fascinating and commendable
idea, my own implementation didn't work out quite so nicely.

% ...
With that introduction out of the way, let's move on to the instruction
set.

\chapter{CPU Architecture}\label{chap:arch}

The Zip CPU supports a set of two-operand instructions, where the second operand
(always a register) is the result. The only exception is the store instruction,
where the first operand (always a register) is the source of the data to be
stored.

\section{Simplified Bus}
The bus architecture of the Zip CPU is that of a simplified WISHBONE bus.
It has been simplified in this fashion: all operations are 32--bit operations.
Because there are no byte or half--word transfers, the bus is neither little
endian nor big endian. All words are 32~bits, all instructions are also
32~bits wide, and everything has been built around the 32--bit word.

\section{Register Set}
The Zip CPU supports two sets of sixteen 32-bit registers, a supervisor
set and a user set, as shown in Fig.~\ref{fig:regset}.
\begin{figure}\begin{center}
\includegraphics[width=3.5in]{../gfx/regset.eps}
\caption{Zip CPU Register File}\label{fig:regset}
\end{center}\end{figure}
The supervisor set is used in interrupt mode, when interrupts are disabled,
whereas the user set is used otherwise. In each register set, the Program
Counter (PC) is register 15, whereas the status register (SR) or condition
code register (CC) is register 14. By convention, the stack pointer will be
register 13 and noted as (SP), although there is nothing special about this
register other than this convention.
The CPU can access both register sets via move instructions from the
supervisor state, whereas the user state can only access the user registers.

The status register is special, and bears further mention. The lower
10 bits of the status register hold CPU state bits and condition codes.
Writes to the other bits of this register are preserved.

Of the eight condition codes, the bottom four are the current flags:
Zero (Z),
Carry (C),
Negative (N),
% ...
This functionality was added to support a userspace debugger
on a user process, working through supervisor mode, of course.

The eighth bit is a break enable bit. This controls whether a break
instruction in user mode will halt the processor for an external debugger
(break enabled), or whether the break instruction will simply send the
CPU into interrupt mode. Encountering a break in supervisor mode will
halt the CPU regardless of the break enable bit. This bit can only be set
within supervisor mode.

This functionality was added to enable an external debugger to
set and manage breakpoints.

The ninth bit is reserved for a floating point enable bit. When set, the
arithmetic for the next instruction will be sent to a floating point unit.
Such a unit may later be added as an extension to the Zip CPU. If the
CPU does not support floating point instructions, this bit will never be set.
The instruction set could also be extended in this fashion to allow other
data types, such as vector operations on two 16--bit or four 8--bit values.

The tenth bit is a trap bit. It is set whenever the user requests a soft
interrupt, and cleared by any return-to-userspace command. This allows the
supervisor, in supervisor mode, to determine whether it got to supervisor
mode from a trap, from an external interrupt, or both.

These status register bits are summarized in Tbl.~\ref{tbl:ccbits}.
\begin{table}
\begin{center}
\begin{tabular}{l|l}
Bit & Meaning \\\hline
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
% ...
3 & V, or overflow bit.\\\hline
2 & N, or negative bit.\\\hline
1 & C, or carry bit.\\\hline
0 & Z, or zero bit. \\\hline
\end{tabular}
\caption{Condition Code / Status Register Bits}\label{tbl:ccbits}
\end{center}\end{table}

\section{Conditional Instructions}
Most, although not quite all, instructions are conditionally executed. From
the four condition code flags, eight conditions are defined. These are shown
in Tbl.~\ref{tbl:conditions}.
\begin{table}
% ...
3'h0 & None & Always execute the instruction \\
3'h1 & {\tt .Z} & Only execute when `Z' is set \\
3'h2 & {\tt .NE} & Only execute when `Z' is not set \\
3'h3 & {\tt .GE} & Greater than or equal (`N' not set, `Z' irrelevant) \\
3'h4 & {\tt .GT} & Greater than (`N' not set, `Z' not set) \\
3'h5 & {\tt .LT} & Less than (`N' set) \\
3'h6 & {\tt .C} & Carry set\\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\end{center}
\end{table}
There is no condition code for less than or equal, not C, or not V. Sorry,
I ran out of space in the 3--bit field. Using these conditions will take an
extra instruction. (Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})

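
The status register bits of Tbl.~\ref{tbl:ccbits} and the conditions of
Tbl.~\ref{tbl:conditions} can be modeled in a few lines of C. What follows is
a minimal, hypothetical sketch rather than code from the Zip CPU sources. The
names {\tt CC\_Z}, {\tt cond\_true}, {\tt entered\_via\_trap}, and the rest are
invented for illustration, and bits~4 through~6 are left out. The {\tt .GE},
{\tt .GT}, and {\tt .LT} conditions are evaluated exactly as the table states
them, from the N and Z flags alone.
\begin{verbatim}
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch; all names here are hypothetical.
 * Status (CC) register bits described in the text; bits 4..6 omitted. */
#define CC_Z     (1u << 0)  /* zero */
#define CC_C     (1u << 1)  /* carry */
#define CC_N     (1u << 2)  /* negative */
#define CC_V     (1u << 3)  /* overflow */
#define CC_BREAK (1u << 7)  /* break enable, settable only in supervisor mode */
#define CC_FPEN  (1u << 8)  /* reserved: floating point enable */
#define CC_TRAP  (1u << 9)  /* soft trap, set on a trap from user mode */

/* Returns true if an instruction whose 3-bit condition field is `cond`
 * would execute, given the current CC register value `cc`. */
static bool cond_true(unsigned cond, uint32_t cc)
{
    bool z = (cc & CC_Z) != 0;
    bool c = (cc & CC_C) != 0;
    bool n = (cc & CC_N) != 0;
    bool v = (cc & CC_V) != 0;

    assert(cond < 8);           /* the condition field is only 3 bits */
    switch (cond) {
    case 0:  return true;       /* no condition: always execute */
    case 1:  return z;          /* .Z  */
    case 2:  return !z;         /* .NE */
    case 3:  return !n;         /* .GE: N clear, Z irrelevant */
    case 4:  return !n && !z;   /* .GT: N clear and Z clear */
    case 5:  return n;          /* .LT: N set */
    case 6:  return c;          /* .C  */
    default: return v;          /* .V  (cond == 7) */
    }
}

/* Lets the supervisor tell a soft trap apart from an external interrupt. */
static bool entered_via_trap(uint32_t cc)
{
    return (cc & CC_TRAP) != 0;
}
```
\end{verbatim}
There is deliberately no case for less than or equal, not C, or not V. As
noted above, each of those tests costs an extra instruction.
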
\section{Operand B}
Many instruction forms have a 21--bit source ``Operand B'' associated with them.
This Operand B is either equal to a register plus a signed immediate offset,
or a signed immediate by itself. This value is encoded as shown in
Tbl.~\ref{tbl:opb}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Bit 20 & 19 \ldots 16 & 15 \ldots 0 \\\hline
1'b0 & \multicolumn{2}{l|}{20--bit Signed Immediate value} \\\hline
1'b1 & 4--bit Register & 16--bit Signed immediate offset \\\hline
\end{tabular}
\caption{Bit allocation for Operand B}\label{tbl:opb}
\end{center}\end{table}

Immediates of 16 and 20~bits don't make sense for all instructions. For
example, what is the point of a 20--bit immediate when executing a 16--bit
multiply? Likewise, why have a 16--bit immediate offset for the operand of a
logical or arithmetic shift? In these cases, the extra bits are reserved for
future instruction possibilities.

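
The Operand B encoding of Tbl.~\ref{tbl:opb} can be decoded as in the
following C sketch. It is a hypothetical software model, not part of the Zip
CPU sources. The function names are assumptions, as is the flat {\tt regs}
array holding the sixteen registers of the current mode. The sketch also makes
no attempt to define which value register~15 (the PC) supplies when it is used
as the base register.
\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch; sign_extend() and operand_b() are hypothetical. */

/* Sign-extend the low `bits` bits of `v` to a 32-bit signed value. */
static int32_t sign_extend(uint32_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t sign = 1u << (bits - 1);   /* sign bit of the field */
    v &= (1u << bits) - 1u;             /* keep only the field bits */
    return (int32_t)((v ^ sign) - sign);
}

/* Decode a 21-bit Operand B field:
 *   bit 20 == 0: bits 19..0 are a signed 20-bit immediate
 *   bit 20 == 1: bits 19..16 name a register, bits 15..0 are a
 *                signed 16-bit offset added to that register
 * `regs` holds the sixteen registers of the current mode. */
static uint32_t operand_b(uint32_t opb, const uint32_t regs[16])
{
    opb &= 0x1FFFFFu;                            /* 21-bit field */
    if ((opb & (1u << 20)) == 0)
        return (uint32_t)sign_extend(opb, 20);   /* immediate only */

    unsigned reg = (opb >> 16) & 0xFu;           /* 4-bit register */
    int32_t  off = sign_extend(opb, 16);         /* 16-bit offset */
    return regs[reg] + (uint32_t)off;            /* wraps modulo 2^32 */
}
```
\end{verbatim}
Addresses share this encoding, as the next section describes. The same
hypothetical decoder would therefore also yield the effective address of a
load or store.
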
\section{Address Modes}
The ZIP CPU supports two addressing modes: register plus immediate, and
immediate address. Addresses are therefore encoded in the same fashion as
Operand B, shown in Tbl.~\ref{tbl:opb}.

A lot of long, hard thought was put into whether to allow pre/post increment
and decrement addressing modes. Since I found no way to implement these modes
without taking two or more clocks per instruction, they have been
removed from the realm of possibilities. This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.
Each of these operations can be emulated with a short sequence of instructions
from the existing set.

\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
the CPU needs access to non--supervisory registers while in supervisory mode.
Therefore, the MOV instruction is special and offers access to these registers,
but only when in supervisory mode. To keep the compiler simple, the extra bits
are ignored in non--supervisory mode (as though they didn't exist), rather than
being mapped to new instructions or additional capabilities. The bits
indicating which register set each register lies within are the A-Usr and
B-Usr bits. When set to a one, these refer to a user mode register. When set
to a zero, these refer to a register in the current mode, whether user or
supervisor. Further, because a load immediate instruction exists, there is no
move capability between an immediate and a register: all moves come from either
a register or a register plus an offset.

This actually leads to a bit of a problem: since the MOV instruction encodes
which register set each register is coming from or moving to, how shall a
compiler or assembler know how to compile a MOV instruction without knowing
the mode of the CPU at the time? For this reason, the compiler will assume
all MOV registers are supervisor registers, and display them as normal.
Anything with the user bit set will be treated as a user register. The CPU
will quietly ignore the supervisor bits while in user mode, and anything
marked as a user register will always be valid. (Did I just say that in the
last paragraph?)

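
The A-Usr and B-Usr rules can be pictured with the following C sketch. It is
a hypothetical model, not the Zip CPU's implementation. The structure, the
function names, and the pairing of A-Usr with the destination register and
B-Usr with the source register are all assumptions made for illustration.
Register numbers follow the text: the PC is register~15, the CC is
register~14, and the SP is register~13 by convention only.
\begin{verbatim}
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch; struct and function names are hypothetical. */
enum { REG_SP = 13, REG_CC = 14, REG_PC = 15 };  /* SP: convention only */

struct zip_regs {
    uint32_t sreg[16];   /* supervisor set */
    uint32_t ureg[16];   /* user set */
    bool     supervisor; /* true while in supervisor (interrupt) mode */
};

/* A MOV operand names the user set when its Usr bit is set, or when the
 * CPU is in user mode, where the Usr bits are ignored. */
static bool uses_user_set(bool supervisor, bool usr_bit)
{
    return !supervisor || usr_bit;
}

static uint32_t *mov_reg(struct zip_regs *r, unsigned idx, bool usr_bit)
{
    assert(idx < 16);
    return uses_user_set(r->supervisor, usr_bit) ? &r->ureg[idx]
                                                 : &r->sreg[idx];
}

/* MOV src+off -> dst, assuming A-Usr goes with the destination and
 * B-Usr with the source (an assumption, not taken from the text). */
static void zip_mov(struct zip_regs *r, unsigned dst, bool a_usr,
                    unsigned src, bool b_usr, int32_t off)
{
    *mov_reg(r, dst, a_usr) = *mov_reg(r, src, b_usr) + (uint32_t)off;
}
```
\end{verbatim}
In user mode the Usr bits change nothing in this model. That is the property
the text relies on, since it lets an assembler emit one encoding without
knowing the CPU's mode.
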
|
\section{Multiply Operations}
The Zip CPU supports two multiply operations: a
$16\times 16$--bit signed multiply (MPYS) and the same but unsigned (MPYU). In
both cases, the operand is a register plus a 16--bit immediate, subject to the
rule that the register cannot be the PC or CC register. The PC register
field has been stolen to create a multiply by immediate instruction. The
CC register field is reserved.
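
The following C sketch shows one possible reading of MPYS and MPYU. This
section does not say which bits are multiplied or which are kept, so the sketch
assumes that the low 16~bits of each operand are multiplied and that the full
32--bit product is kept. The function names are hypothetical. The sketch also
models the operand rule above: a register field of~15 (the PC) selects a
multiply by immediate, while~14 (the CC) is reserved.
\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch; mpy_operand(), mpyu(), mpys() are hypothetical. */

/* Second operand of MPYS/MPYU: a register plus a 16-bit immediate. The
 * PC field (15) means "immediate only"; the CC field (14) is reserved. */
static uint32_t mpy_operand(const uint32_t regs[16], unsigned reg, int16_t imm)
{
    assert(reg < 16 && reg != 14);    /* CC field is reserved */
    if (reg == 15)                    /* PC field: multiply by immediate */
        return (uint32_t)(int32_t)imm;
    return regs[reg] + (uint32_t)(int32_t)imm;
}

/* MPYU (assumed): unsigned 16x16 -> 32-bit product of the low halves. */
static uint32_t mpyu(uint32_t a, uint32_t b)
{
    return (uint32_t)(uint16_t)a * (uint32_t)(uint16_t)b;
}

/* MPYS (assumed): signed 16x16 -> 32-bit product of the low halves. */
static uint32_t mpys(uint32_t a, uint32_t b)
{
    int32_t p = (int32_t)(int16_t)a * (int32_t)(int16_t)b;
    return (uint32_t)p;
}
```
\end{verbatim}
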
% ...
CMP(Sub) & \multicolumn{4}{l|}{4'h0}
& \multicolumn{4}{l|}{D. Reg}
& \multicolumn{3}{l|}{Cond.}
& \multicolumn{21}{l|}{Operand B}
& Yes \\\hline
TST(And) & \multicolumn{4}{l|}{4'h1}
& \multicolumn{4}{l|}{D. Reg}
& \multicolumn{3}{l|}{Cond.}
& \multicolumn{21}{l|}{Operand B}
& Yes \\\hline
MOV & \multicolumn{4}{l|}{4'h2}
% ...
\end{tabular}
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
\end{center}\end{table}

As you can see, there's lots of room for instruction set expansion. The
NOOP and BREAK instructions are the only instructions within one particular
24--bit hole. Likewise, the subtract leaves half of its space open, since a
subtract immediate is the same as an add with a negated immediate. These
spaces are reserved for future enhancements.

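
Subtracting an immediate gives the same result as adding its negation because 32--bit two's complement arithmetic wraps modulo $2^{32}$. The small C check below illustrates this. The helper names are hypothetical. The sketch models only the 32--bit result, not the condition flags, which may differ between the two forms.

```c
#include <stdint.h>

/* Hypothetical helpers: 32-bit results only; condition flags are not
 * modeled, and carry/borrow may differ between the two forms. */
static inline uint32_t sub_imm(uint32_t ra, int32_t imm)
{
	return ra - (uint32_t)imm;		/* unsigned arithmetic wraps mod 2^32 */
}

static inline uint32_t add_neg_imm(uint32_t ra, int32_t imm)
{
	return ra + (0u - (uint32_t)imm);	/* add the two's complement negation */
}
```
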
\section{Derived Instructions}
The ZIP CPU supports many other common instructions, but not all of them
are single cycle instructions. The derived instruction tables,
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
illustrate how some of these other instructions may be implemented on
the ZIP CPU. Many of these instructions will have assembly equivalents,
such as the branch instructions, to facilitate working with the CPU.
\begin{table}\begin{center}
Mapped & Actual & Notes \\\hline
\parbox[t]{1.4in}{ADD Ra,Rx\\ADDC Rb,Ry}
& \parbox[t]{1.5in}{Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry}
& Add with carry \\\hline
BRA.Cond +/-\$Addr
& \hbox{Mov.cond \$Addr+PC,PC}
& Branch or jump on condition. Works for 15--bit
signed address offsets.\\\hline
BRA.Cond +/-\$Addr
& \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC}
& Branch/jump on condition. Works for
23--bit address offsets, but costs a register and an extra instruction,
and sets the flags. \\\hline
& \parbox[t]{1.5in}{SUB \$1,SP \\
MOV \$3+PC,R0 \\
STO R0,1(SP) \\
MOV \$Addr+PC,PC \\
ADD \$1,SP}
& Jump to Subroutine. Note the required cleanup instruction after
returning. \\\hline
JSR PC+\$Addr
& \parbox[t]{1.5in}{MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
& This is the high speed
version of a subroutine call, necessitating a register to hold the
last PC address. In its favor, this method doesn't suffer the
& \parbox[t]{3in}{This CPU is designed for 32--bit word
length instructions. Byte addressing is not supported by the CPU or
the bus, so byte accesses take more work.

Note also that in this example, \$Addr is a byte-wise address, whereas
all other addresses in this document are 32--bit word addresses.
For this reason, we needed to drop the bottom two bits. This also limits
the address space of character accesses using this method from 16~MB
down to 4~MB.}
\\\hline
\parbox[t]{1.5in}{LSL \$1,Rx\\ LSLC \$1,Ry}
& \parbox[t]{1.5in}{LSL \$1,Ry \\
& \parbox[t]{3in}{This depends upon the peripheral base address being
in R12.

Another opportunity might be to jump to the reset address from within
supervisor mode.}\\\hline
RET & \parbox[t]{1.5in}{LOD \$-1(SP),PC}
& Note that this depends upon the calling context to clean up the
stack, as outlined for the JSR instruction. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-2}
\end{center}\end{table}
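
The bus only delivers 32--bit words, so a byte (character) load has to fetch the containing word and then shift and mask out the desired byte. The character-access entry in the derived instruction tables describes this. The C sketch below models the arithmetic. It assumes big-endian byte order within a word (byte~0 in bits~31--24), which this specification does not state. The {\tt mem} array and the {\tt load\_byte} name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a byte load on a word-addressed bus.
 * byte_addr: byte-wise address; mem: array of 32-bit words.
 * Assumption: big-endian packing, byte 0 of a word in bits 31..24. */
static inline uint8_t load_byte(const uint32_t *mem, uint32_t byte_addr)
{
	uint32_t word  = mem[byte_addr >> 2];		/* drop the bottom two bits */
	unsigned shift = (3u - (byte_addr & 3u)) * 8u;	/* select the byte lane */
	return (uint8_t)((word >> shift) & 0xffu);
}
```
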
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
& While no extra registers are needed, this example
does take 3~clocks. \\\hline
TRAP \#X
& LDILO \$x,CC
& This approach uses the unused bits of the CC register as a TRAP
address. The user will need to make certain
that the SLEEP and GIE bits are not set in \$x. LDI would also work;
however, using LDILO permits the use of conditional traps (i.e.,
trap if the zero flag is set). Should you wish to trap off of a
register value, you could equivalently load \$x into the register and
then MOV it into the CC register. \\\hline
of Rx. \\\hline
WAIT
& Or \$SLEEP,CC
& Wait until an interrupt. In an interrupts-disabled context, this
becomes a HALT instruction. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-3}
\end{center}\end{table}
\iffalse
\fi
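
The ADD/ADDC entry in Tbl.~\ref{tbl:derived-1} builds a 64--bit add from 32--bit operations in three steps. First add the low words, then increment the high word if that add carried out, then add the high words. The C sketch below mirrors that three-instruction sequence. The register pairing (Rx low, Ry high) follows the table. The function name {\tt add64} is hypothetical.

```c
#include <stdint.h>

/* Sketch of the derived 64-bit add: (Ry:Rx) += (Rb:Ra).
 * Mirrors: ADD Ra,Rx ; ADD.C $1,Ry ; ADD Rb,Ry */
static inline void add64(uint32_t *rx, uint32_t *ry, uint32_t ra, uint32_t rb)
{
	uint32_t lo = *rx + ra;		/* ADD Ra,Rx (wraps mod 2^32)    */
	int carry   = lo < ra;		/* carry out of the low-word add */
	*rx = lo;
	if (carry)
		*ry += 1u;		/* ADD.C $1,Ry                   */
	*ry += rb;			/* ADD Rb,Ry                     */
}
```

The conditional increment of Ry can itself wrap. As a result, this sequence produces the correct low 64~bits of the sum but not a reliable carry out of bit~63.
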
ever changes. Stalls are also created here if the instruction isn't
in the prefetch cache.
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
immediate offset.
\item {\bf Read Operands}: Read registers and apply any immediate values to
them. There is no means of detecting or flagging arithmetic overflow
or carry when adding the immediate to the operand. This stage will
stall if any source operand is pending.
A proper optimizing compiler, therefore, will schedule an instruction
between the instruction that produces the result and the instruction
that uses it.
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
instruction, and the {\bf MemOps} stage which accomplishes memory
written to the register set.
\item Condition codes are available upon completion.
\item Issuing an instruction to the memory while the memory is busy will
stall the bus. If the bus deadlocks, only a reset will
release the CPU. (Watchdog timer, anyone?)
\item The Zip CPU currently has no means of reading and acting on any
error conditions on the bus.
\end{itemize}
\item {\bf Write-Back}: Conditionally write back the result to the register
set, applying the condition. This routine is bi-re-entrant: either the
memory or the simple instruction may request a register write.
\end{enumerate}

The Zip CPU does not support out of order execution. Therefore, if the memory
unit stalls, every other instruction stalls. Memory stores, however, can take
place concurrently with ALU operations, although memory reads (loads) cannot.

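
A small C model can illustrate the Read Operands stall described above. It scans an instruction stream for a result that is consumed by the very next instruction. This is a simplified sketch. It assumes exactly one stall cycle per back-to-back dependency and ignores memory latency. The {\tt insn} structure and its fields are hypothetical, not the actual Zip CPU encoding.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction record: one destination register
 * and up to two source registers (-1 means "unused"). */
struct insn {
	int dst;
	int src[2];
};

/* Count back-to-back read-after-write dependencies, i.e. places where an
 * instruction reads a register written by the instruction just before it.
 * Under the stated assumption, each such pair costs one stall cycle. */
static size_t count_stalls(const struct insn *prog, size_t n)
{
	size_t stalls = 0;
	for (size_t i = 1; i < n; i++) {
		int prev_dst = prog[i - 1].dst;
		if (prev_dst < 0)
			continue;	/* previous instruction writes nothing */
		if (prog[i].src[0] == prev_dst || prog[i].src[1] == prev_dst)
			stalls++;
	}
	return stalls;
}
```

Placing an independent instruction between the producer and the consumer removes the counted stall. That is the scheduling an optimizing compiler would perform.
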
\section{Pipeline Logic}
How the CPU handles some instruction combinations can be telling when
determining what happens in the pipeline. The following lists some examples:
\begin{itemize}
\item {\bf Delayed Branching}
\end{itemize}

\chapter{Peripherals}\label{chap:periph}

While the previous chapter describes a CPU in isolation, the Zip System
includes a minimum set of peripherals as well. These peripherals are shown
in Fig.~\ref{fig:zipsystem} and described here. They are designed to make
the Zip CPU more useful in an Embedded Operating System environment.
\begin{figure}\begin{center}
\includegraphics[width=3.5in]{../gfx/system.eps}
\caption{Zip System Peripherals}\label{fig:zipsystem}
\end{center}\end{figure}

\section{Interrupt Controller}

Perhaps the most important peripheral within the Zip System is the interrupt
controller. The Zip CPU itself can only handle one interrupt, and has only
one interrupt state: disabled or enabled. The interrupt controller makes
things more interesting.

The Zip System interrupt controller module supports up to 15 interrupts, all
controlled from one register. Bit~31 of the interrupt controller controls
overall whether interrupts are enabled (1'b1) or disabled (1'b0). Bits~16--30
control whether individual interrupts are enabled (1'b1) or disabled (1'b0).
Bit~15 is an indicator showing whether or not any interrupt is active, and
bits~0--14 indicate whether or not an individual interrupt is active.

The interrupt controller has been designed so that bits can be controlled
individually without having any knowledge of the rest of the controller
setting. To enable an interrupt, write to the register with the high order
global enable bit set and the respective interrupt enable bit set. No other
bits will be affected. To disable an interrupt, write to the register with
the high order global enable bit cleared and the respective interrupt enable
bit set. To clear an interrupt, write a `1' to that interrupt's status bit.
Zeros written to the register have no effect, save that a zero written to the
master enable will disable all interrupts.

As an example, suppose you wished to enable interrupt \#4. You would then
write to the register a {\tt 0x80100010} to enable interrupt \#4 and to clear
any past active state. When you later wish to disable this interrupt, you would
write a {\tt 0x00100010} to the register. As before, this both disables the
interrupt and clears the active indicator. This also has the side effect of
disabling all interrupts, so a second write of {\tt 0x80000000} may be necessary
to re-enable any other interrupts.
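
The register words in this example follow from the bit layout of the controller. Interrupt~$n$ has its enable bit at position $16+n$ and its status bit at position~$n$. The C helpers below compute these words. The macro and function names are hypothetical. The sketch only builds the values to be written, not the bus access itself.

```c
#include <stdint.h>

#define PIC_GIE 0x80000000u	/* bit 31: global interrupt enable */

/* Word that enables interrupt n (0..14) and clears its pending status. */
static inline uint32_t pic_enable_word(unsigned n)
{
	return PIC_GIE | (1u << (16 + n)) | (1u << n);
}

/* Word that disables interrupt n and clears its pending status.
 * Bit 31 is zero, so this also disables all interrupts until PIC_GIE
 * is written again. */
static inline uint32_t pic_disable_word(unsigned n)
{
	return (1u << (16 + n)) | (1u << n);
}
```

For interrupt~\#4 these return {\tt 0x80100010} and {\tt 0x00100010}, matching the example above.
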

The Zip System currently hosts two interrupt controllers, a primary and a
secondary. The primary interrupt controller has one interrupt line which may
come from an external interrupt controller, and one interrupt line from the
secondary controller. Other primary interrupts include the system timers,
the jiffies interrupt, and the manual cache interrupt. The secondary interrupt
controller maintains an interrupt state for all of the processor accounting
counters.

\section{Counter}

The Zip Counter is a very simple counter: it just counts. It cannot be
halted. When it rolls over, it issues an interrupt. Writing a value to the
counter just sets the current value, and it starts counting again from that

Writing any non-zero value to the timer starts the timer. If the high order
bit is set when writing to the timer, the timer becomes an interval timer and
reloads its last start time on any interrupt. Hence, to mark seconds, one
might set the timer to 100~million (the number of clocks per second), and
set the high bit. Ever after, the timer will interrupt the CPU once per
second (assuming a 100~MHz clock). This reload capability also limits the
maximum timer value to $2^{31}-1$, rather than $2^{32}-1$.
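
The C sketch below forms the timer word. It assumes the layout described above: bit~31 selects interval mode and the remaining 31~bits hold the count. The macro and function names are hypothetical. For the one-second example at 100~MHz, the count is $10^8 = {\tt 0x05f5e100}$, so the interval-mode word is {\tt 0x85f5e100}.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMER_INTERVAL 0x80000000u	/* high bit: reload on interrupt    */
#define TIMER_MAX      0x7fffffffu	/* 2^31 - 1, largest usable count   */

/* Build the word to write to a Zip timer.  Returns 0 to signal an
 * invalid count (zero, or too large to fit in 31 bits). */
static inline uint32_t timer_word(uint32_t count, bool interval)
{
	if (count == 0u || count > TIMER_MAX)
		return 0u;
	return interval ? (count | TIMER_INTERVAL) : count;
}
```
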

\section{Watchdog Timer}

The watchdog timer is no different from any of the other timers, save for one
critical difference: the interrupt line from the watchdog

This peripheral is motivated by the Linux use of `jiffies' whereby a process
can request to be put to sleep until a certain number of `jiffies' have
elapsed. Using this interface, the CPU can read the number of `jiffies'
from the peripheral (it only has the one location in address space), add the
sleep length to it, and write the result back to the peripheral. The zipjiffies
peripheral will record the value written to it only if it is nearer the current
counter value than the currently pending interrupt time. If no other
interrupts are waiting, and this time is in the future, it will be enabled.
(There is currently no way to disable a jiffies interrupt once set, other
than to disable the interrupt line in the interrupt controller.) The processor
may then place this sleep request into a list among other sleep requests.
Once the timer expires, it would write the next Jiffy request to the peripheral
and wake up the process whose timer had expired.
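
One way to model the ``nearer'' rule is with wrap-around-safe signed differences, similar in spirit to the Linux {\tt time\_after()} macros. The C sketch below is an assumption about the comparison, not a description of the actual zipjiffies logic. In particular, it assumes a request must lie in the future to be accepted. {\tt jiffies\_accept} and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed distance from now to t, tolerant of 32-bit counter wrap-around. */
static inline int32_t jiffies_until(uint32_t t, uint32_t now)
{
	return (int32_t)(t - now);
}

/* Decide whether a newly written alarm time should replace the pending one.
 * have_pending: whether an alarm is already armed at time `pending`. */
static inline bool jiffies_accept(uint32_t now, uint32_t req,
				  bool have_pending, uint32_t pending)
{
	if (jiffies_until(req, now) <= 0)
		return false;			/* not in the future */
	if (!have_pending)
		return true;			/* nothing else is waiting */
	return jiffies_until(req, now) < jiffies_until(pending, now);
}
```
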

Indeed, the Jiffies register is nothing more than a glorified counter with
O/S must also keep track of values written to the Jiffies register. Thus,
when an `alarm' trips, it should be removed from the list of alarms, the list
should be sorted, and the next alarm in terms of Jiffies should be written
to the register.
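
The O/S bookkeeping just described can be sketched as a small list kept sorted by expiry time. The head of the list is always the value to write to the Jiffies register. The C code below is a hypothetical illustration that uses a fixed-size array and plain unsigned ordering. A real implementation would also need to handle counter wrap-around and locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALARMS 8

/* Hypothetical alarm list, kept sorted so when[0] expires first. */
struct alarm_list {
	uint32_t when[MAX_ALARMS];
	size_t   count;
};

/* Insert an alarm time, preserving sorted order. Returns false if full. */
static bool alarm_add(struct alarm_list *l, uint32_t when)
{
	if (l->count == MAX_ALARMS)
		return false;
	size_t i = l->count;
	while (i > 0 && l->when[i - 1] > when) {
		l->when[i] = l->when[i - 1];	/* shift later alarms up */
		i--;
	}
	l->when[i] = when;
	l->count++;
	return true;
}

/* Remove the alarm that just tripped (the head).  Returns true and stores
 * the next alarm time in *next (the value to write to the Jiffies
 * register) if one remains; returns false otherwise. */
static bool alarm_pop(struct alarm_list *l, uint32_t *next)
{
	if (l->count == 0)
		return false;
	for (size_t i = 1; i < l->count; i++)
		l->when[i - 1] = l->when[i];
	l->count--;
	if (l->count == 0)
		return false;
	*next = l->when[0];
	return true;
}
```
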

\section{Manual Cache}

The manual cache is an experimental setting that may not remain with the Zip
CPU for very long. It is designed to facilitate running from FLASH or ROM
memory, although the pipe cache really makes this need obsolete. The manual
cache works by copying data from a wishbone address (range) into the cache
register, and then by making that memory available as memory to the Zip System.
It is a {\em manual cache} because the processor must first specify what
memory to copy, and then once copied the processor can only access the cache
memory by the cache memory location. There is no transparency. It is perhaps
best described as a combination DMA controller and local memory.

Worse, this cache is likely going to be removed from the ZipSystem. Having used
the ZipSystem now for some time, I have yet to find a need or use for the manual
cache. I will likely replace this peripheral with a proper DMA controller.

\chapter{Operation}\label{chap:ops}

\chapter{Registers}\label{chap:regs}

The ZipSystem registers fall into two categories: ZipSystem internal registers,
accessed via the ZipCPU and shown in Tbl.~\ref{tbl:zpregs},
\begin{table}[htbp]
\begin{center}\begin{reglist}
PIC & {\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
WDT & {\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
CCHE & {\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
CTRIC & {\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
TMRA & {\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
TMRB & {\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
TMRC & {\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
JIFF & {\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
MTASK & {\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
MMSTL & {\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline
MPSTL & {\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
MICNT & {\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline
UTASK & {\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline
UMSTL & {\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline
UPSTL & {\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
UICNT & {\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline
Cache & {\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
\end{reglist}
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
\end{center}\end{table}
and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.
\begin{table}[htbp]
\begin{center}\begin{reglist}
ZIPCTRL & 0 & 32 & R/W & Debug Control Register \\\hline
ZIPDATA & 1 & 32 & R/W & Debug Data Register \\\hline
\end{reglist}
\caption{Zip System Debug Registers}\label{tbl:dbgregs}
\end{center}\end{table}

\chapter{Wishbone Datasheet}\label{chap:wishbone}
The Zip System supports two wishbone ports, a slave debug port and a master
port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Slave, Read/Write, single words only \\\hline
Address Width & 1--bit \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
and Tbl.~\ref{tbl:wishbone-master} respectively.
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Master, Read/Write, single cycle or pipelined\\\hline
Address Width & 32--bit \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
\end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet for the CPU as Master}\label{tbl:wishbone-master}
\end{center}\end{table}
I do not recommend that you connect these together through the interconnect.
Rather, the debug port of the CPU should be accessible regardless of the state
of the master bus.

Note that neither the {\tt ERR} nor the {\tt RETRY} wires have been
implemented. As a result, the CPU is currently unable to detect a bus error
condition, and so may stall indefinitely (hang) should it access an address
not on the bus, or a peripheral that is not yet properly configured.

\chapter{Clocks}\label{chap:clocks}

This core is based upon the Basys--3 design. The Basys--3 development board
contains one external 100~MHz clock, which is sufficient to run the ZIP CPU