Line 46... |
Line 46... |
\documentclass{gqtekspec}
|
\documentclass{gqtekspec}
|
\project{Zip CPU}
|
\project{Zip CPU}
|
\title{Specification}
|
\title{Specification}
|
\author{Dan Gisselquist, Ph.D.}
|
\author{Dan Gisselquist, Ph.D.}
|
\email{dgisselq (at) opencores.org}
|
\email{dgisselq (at) opencores.org}
|
\revision{Rev.~0.4}
|
\revision{Rev.~0.5}
|
\definecolor{webred}{rgb}{0.2,0,0}
|
\definecolor{webred}{rgb}{0.2,0,0}
|
\definecolor{webgreen}{rgb}{0,0.2,0}
|
\definecolor{webgreen}{rgb}{0,0.2,0}
|
\usepackage[dvips,ps2pdf,colorlinks=true,
|
\usepackage[dvips,ps2pdf,colorlinks=true,
|
anchorcolor=black,pagecolor=webgreen,pdfpagelabels,hypertexnames,
|
anchorcolor=black,pagecolor=webgreen,pdfpagelabels,hypertexnames,
|
pdfauthor={Dan Gisselquist},
|
pdfauthor={Dan Gisselquist},
|
Line 74... |
Line 74... |
You should have received a copy of the GNU General Public License along
|
You should have received a copy of the GNU General Public License along
|
with this program. If not, see \hbox{<http://www.gnu.org/licenses/>} for a
|
with this program. If not, see \hbox{<http://www.gnu.org/licenses/>} for a
|
copy.
|
copy.
|
\end{license}
|
\end{license}
|
\begin{revisionhistory}
|
\begin{revisionhistory}
|
|
0.5 & 9/29/2015 & Gisselquist & Added pipelined memory access discussion.\\\hline
|
0.4 & 9/19/2015 & Gisselquist & Added DMA controller, improved stall information, and self--assessment info.\\\hline
|
0.4 & 9/19/2015 & Gisselquist & Added DMA controller, improved stall information, and self--assessment info.\\\hline
|
0.3 & 8/22/2015 & Gisselquist & First completed draft\\\hline
|
0.3 & 8/22/2015 & Gisselquist & First completed draft\\\hline
|
0.2 & 8/19/2015 & Gisselquist & Still Draft, more complete \\\hline
|
0.2 & 8/19/2015 & Gisselquist & Still Draft, more complete \\\hline
|
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
|
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
|
\end{revisionhistory}
|
\end{revisionhistory}
|
Line 409... |
Line 410... |
The tenth bit is a trap bit. It is set whenever the user requests a soft
|
The tenth bit is a trap bit. It is set whenever the user requests a soft
|
interrupt, and cleared on any return to userspace command. This allows the
|
interrupt, and cleared on any return to userspace command. This allows the
|
supervisor, in supervisor mode, to determine whether it got to supervisor
|
supervisor, in supervisor mode, to determine whether it got to supervisor
|
mode from a trap or from an external interrupt or both.
|
mode from a trap or from an external interrupt or both.
|
|
|
|
These status register bits are summarized in Tbl.~\ref{tbl:ccbits}.
|
|
\begin{table}
|
|
\begin{center}
|
|
\begin{tabular}{l|l}
|
|
Bit & Meaning \\\hline
|
|
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
|
|
8 & (Reserved for) Floating point enable \\\hline
|
|
7 & Halt on break, to support an external debugger \\\hline
|
|
6 & Step, single step the CPU in user mode\\\hline
|
|
5 & GIE, or Global Interrupt Enable \\\hline
|
|
4 & Sleep \\\hline
|
|
3 & V, or overflow bit.\\\hline
|
|
2 & N, or negative bit.\\\hline
|
|
1 & C, or carry bit.\\\hline
|
|
0 & Z, or zero bit. \\\hline
|
|
\end{tabular}
|
|
\caption{Condition Code / Status Register Bits}\label{tbl:ccbits}
|
|
\end{center}\end{table}
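As a minimal sketch of how a host-side tool (a debugger or simulator, say) might decode the bits listed in Tbl.~\ref{tbl:ccbits}, consider the following. The names {\tt CC\_BITS}, {\tt decode\_cc} and {\tt entered\_by\_trap} are illustrative assumptions and are not part of any Zip CPU toolchain; only the bit positions come from the table.

```python
# Bit positions are taken from the status register table above.
# The dictionary and function names are hypothetical, chosen for illustration.
CC_BITS = {
    0: "Z",              # zero
    1: "C",              # carry
    2: "N",              # negative
    3: "V",              # overflow
    4: "SLEEP",
    5: "GIE",            # global interrupt enable
    6: "STEP",           # single step in user mode
    7: "BREAK_HALT",     # halt on break, for an external debugger
    8: "FPEN_RESERVED",  # reserved for floating point enable
    9: "TRAP",           # soft trap
}


def decode_cc(value):
    """Return the names of all status bits set in value, lowest bit first."""
    return [name for bit, name in sorted(CC_BITS.items()) if value & (1 << bit)]


def entered_by_trap(cc):
    """True if the trap bit (bit 9) is set in the given CC value."""
    return bool(cc & (1 << 9))
```

A supervisor could apply the same test to the user CC register to tell a trap from an external interrupt, as described above.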
|
|
|
\section{Conditional Instructions}
|
\section{Conditional Instructions}
|
Most, although not quite all, instructions may be conditionally executed. From
|
Most, although not quite all, instructions may be conditionally executed. From
|
the four condition code flags, eight conditions are defined. These are shown
|
the four condition code flags, eight conditions are defined. These are shown
|
in Tbl.~\ref{tbl:conditions}.
|
in Tbl.~\ref{tbl:conditions}.
|
\begin{table}
|
\begin{table}
|
Line 544... |
Line 564... |
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|c|}\hline
|
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|c|}\hline
|
\rowcolor[gray]{0.85}
|
\rowcolor[gray]{0.85}
|
Op Code & \multicolumn{8}{c|}{31\ldots24} & \multicolumn{8}{c|}{23\ldots 16}
|
Op Code & \multicolumn{8}{c|}{31\ldots24} & \multicolumn{8}{c|}{23\ldots 16}
|
& \multicolumn{8}{c|}{15\ldots 8} & \multicolumn{8}{c|}{7\ldots 0}
|
& \multicolumn{8}{c|}{15\ldots 8} & \multicolumn{8}{c|}{7\ldots 0}
|
& Sets CC? \\\hline\hline
|
& Sets CC? \\\hline\hline
|
CMP(Sub) & \multicolumn{4}{l|}{4'h0}
|
{\tt CMP(Sub)} & \multicolumn{4}{l|}{4'h0}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{21}{l|}{Operand B}
|
& Yes \\\hline
|
& Yes \\\hline
|
TST(And) & \multicolumn{4}{l|}{4'h1}
|
{\tt TST(And)} & \multicolumn{4}{l|}{4'h1}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{21}{l|}{Operand B}
|
& Yes \\\hline
|
& Yes \\\hline
|
MOV & \multicolumn{4}{l|}{4'h2}
|
{\tt MOV} & \multicolumn{4}{l|}{4'h2}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& A-Usr
|
& A-Usr
|
& \multicolumn{4}{l|}{B-Reg}
|
& \multicolumn{4}{l|}{B-Reg}
|
& B-Usr
|
& B-Usr
|
& \multicolumn{15}{l|}{15'bit signed offset}
|
& \multicolumn{15}{l|}{15'bit signed offset}
|
& \\\hline
|
& \\\hline
|
LODI & \multicolumn{4}{l|}{4'h3}
|
{\tt LODI} & \multicolumn{4}{l|}{4'h3}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{24}{l|}{24'bit Signed Immediate}
|
& \multicolumn{24}{l|}{24'bit Signed Immediate}
|
& \\\hline
|
& \\\hline
|
NOOP & \multicolumn{4}{l|}{4'h4}
|
{\tt NOOP} & \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{4'he}
|
& \multicolumn{4}{l|}{4'he}
|
& \multicolumn{24}{l|}{24'h00}
|
& \multicolumn{24}{l|}{24'h00}
|
& \\\hline
|
& \\\hline
|
BREAK & \multicolumn{4}{l|}{4'h4}
|
{\tt BREAK} & \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{4'he}
|
& \multicolumn{4}{l|}{4'he}
|
& \multicolumn{24}{l|}{24'h01}
|
& \multicolumn{24}{l|}{24'h01}
|
& \\\hline
|
& \\\hline
|
{\em Reserved} & \multicolumn{4}{l|}{4'h4}
|
{\em Reserved} & \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{4'he}
|
& \multicolumn{4}{l|}{4'he}
|
& \multicolumn{24}{l|}{24'bits, but not 0 or 1.}
|
& \multicolumn{24}{l|}{24'bits, but not 0 or 1.}
|
& \\\hline
|
& \\\hline
|
LODIHI & \multicolumn{4}{l|}{4'h4}
|
{\tt LODIHI }& \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{4'hf}
|
& \multicolumn{4}{l|}{4'hf}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& 1'b1
|
& 1'b1
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{16}{l|}{16-bit Immediate}
|
& \multicolumn{16}{l|}{16-bit Immediate}
|
& \\\hline
|
& \\\hline
|
LODILO & \multicolumn{4}{l|}{4'h4}
|
{\tt LODILO} & \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{4'hf}
|
& \multicolumn{4}{l|}{4'hf}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& 1'b0
|
& 1'b0
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{16}{l|}{16-bit Immediate}
|
& \multicolumn{16}{l|}{16-bit Immediate}
|
& \\\hline
|
& \\\hline
|
16-b MPYU & \multicolumn{4}{l|}{4'h4}
|
16-b {\tt MPYU} & \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& 1'b0 & \multicolumn{4}{l|}{Reg}
|
& 1'b0 & \multicolumn{4}{l|}{Reg}
|
& \multicolumn{16}{l|}{16-bit Offset}
|
& \multicolumn{16}{l|}{16-bit Offset}
|
& Yes \\\hline
|
& Yes \\\hline
|
16-b MPYU(I) & \multicolumn{4}{l|}{4'h4}
|
16-b {\tt MPYU}(I) & \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& 1'b0 & \multicolumn{4}{l|}{4'hf}
|
& 1'b0 & \multicolumn{4}{l|}{4'hf}
|
& \multicolumn{16}{l|}{16-bit Offset}
|
& \multicolumn{16}{l|}{16-bit Offset}
|
& Yes \\\hline
|
& Yes \\\hline
|
16-b MPYS & \multicolumn{4}{l|}{4'h4}
|
16-b {\tt MPYS} & \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& 1'b1 & \multicolumn{4}{l|}{Reg}
|
& 1'b1 & \multicolumn{4}{l|}{Reg}
|
& \multicolumn{16}{l|}{16-bit Offset}
|
& \multicolumn{16}{l|}{16-bit Offset}
|
& Yes \\\hline
|
& Yes \\\hline
|
16-b MPYS(I) & \multicolumn{4}{l|}{4'h4}
|
16-b {\tt MPYS}(I) & \multicolumn{4}{l|}{4'h4}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& 1'b1 & \multicolumn{4}{l|}{4'hf}
|
& 1'b1 & \multicolumn{4}{l|}{4'hf}
|
& \multicolumn{16}{l|}{16-bit Offset}
|
& \multicolumn{16}{l|}{16-bit Offset}
|
& Yes \\\hline
|
& Yes \\\hline
|
ROL & \multicolumn{4}{l|}{4'h5}
|
{\tt ROL} & \multicolumn{4}{l|}{4'h5}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B, truncated to low order 5 bits}
|
& \multicolumn{21}{l|}{Operand B, truncated to low order 5 bits}
|
& \\\hline
|
& \\\hline
|
LOD & \multicolumn{4}{l|}{4'h6}
|
{\tt LOD} & \multicolumn{4}{l|}{4'h6}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B address}
|
& \multicolumn{21}{l|}{Operand B address}
|
& \\\hline
|
& \\\hline
|
STO & \multicolumn{4}{l|}{4'h7}
|
{\tt STO} & \multicolumn{4}{l|}{4'h7}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B address}
|
& \multicolumn{21}{l|}{Operand B address}
|
& \\\hline
|
& \\\hline
|
SUB & \multicolumn{4}{l|}{4'h8}
|
{\tt SUB} & \multicolumn{4}{l|}{4'h8}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{21}{l|}{Operand B}
|
& Yes \\\hline
|
& Yes \\\hline
|
AND & \multicolumn{4}{l|}{4'h9}
|
{\tt AND} & \multicolumn{4}{l|}{4'h9}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{21}{l|}{Operand B}
|
& Yes \\\hline
|
& Yes \\\hline
|
ADD & \multicolumn{4}{l|}{4'ha}
|
{\tt ADD} & \multicolumn{4}{l|}{4'ha}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{21}{l|}{Operand B}
|
& Yes \\\hline
|
& Yes \\\hline
|
OR & \multicolumn{4}{l|}{4'hb}
|
{\tt OR} & \multicolumn{4}{l|}{4'hb}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{21}{l|}{Operand B}
|
& Yes \\\hline
|
& Yes \\\hline
|
XOR & \multicolumn{4}{l|}{4'hc}
|
{\tt XOR} & \multicolumn{4}{l|}{4'hc}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{21}{l|}{Operand B}
|
& Yes \\\hline
|
& Yes \\\hline
|
LSL/ASL & \multicolumn{4}{l|}{4'hd}
|
{\tt LSL/ASL} & \multicolumn{4}{l|}{4'hd}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
& Yes \\\hline
|
& Yes \\\hline
|
ASR & \multicolumn{4}{l|}{4'he}
|
{\tt ASR} & \multicolumn{4}{l|}{4'he}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
& Yes \\\hline
|
& Yes \\\hline
|
LSR & \multicolumn{4}{l|}{4'hf}
|
{\tt LSR} & \multicolumn{4}{l|}{4'hf}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
& Yes \\\hline
|
& Yes \\\hline
|
\end{tabular}
|
\end{tabular}
|
Line 690... |
Line 710... |
the Zip CPU. Many of these instructions will have assembly equivalents,
|
the Zip CPU. Many of these instructions will have assembly equivalents,
|
such as the branch instructions, to facilitate working with the CPU.
|
such as the branch instructions, to facilitate working with the CPU.
|
\begin{table}\begin{center}
|
\begin{table}\begin{center}
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
Mapped & Actual & Notes \\\hline
|
Mapped & Actual & Notes \\\hline
|
ABS Rx
|
{\tt ABS Rx}
|
& \parbox[t]{1.5in}{TST -1,Rx\\NEG.LT Rx}
|
& \parbox[t]{1.5in}{\tt TST -1,Rx\\NEG.LT Rx}
|
& Absolute value, depends upon derived NEG.\\\hline
|
& Absolute value, depends upon derived NEG.\\\hline
|
\parbox[t]{1.4in}{ADD Ra,Rx\\ADDC Rb,Ry}
|
\parbox[t]{1.4in}{\tt ADD Ra,Rx\\ADDC Rb,Ry}
|
& \parbox[t]{1.5in}{Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry}
|
& \parbox[t]{1.5in}{\tt Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry}
|
& Add with carry \\\hline
|
& Add with carry \\\hline
|
BRA.Cond +/-\$Addr
|
{\tt BRA.Cond +/-\$Addr}
|
& \hbox{MOV.cond \$Addr+PC,PC}
|
& \hbox{\tt MOV.cond \$Addr+PC,PC}
|
& Branch or jump on condition. Works for 15--bit
|
& Branch or jump on condition. Works for 15--bit
|
signed address offsets.\\\hline
|
signed address offsets.\\\hline
|
BRA.Cond +/-\$Addr
|
{\tt BRA.Cond +/-\$Addr}
|
& \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC}
|
& \parbox[t]{1.5in}{\tt LDI \$Addr,Rx \\ ADD.cond Rx,PC}
|
& Branch/jump on condition. Works for
|
& Branch/jump on condition. Works for
|
23 bit address offsets, but costs a register, an extra instruction,
|
23 bit address offsets, but costs a register, an extra instruction,
|
and sets the flags. \\\hline
|
and sets the flags. \\\hline
|
BNC PC+\$Addr
|
{\tt BNC PC+\$Addr}
|
& \parbox[t]{1.5in}{Test \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
|
& \parbox[t]{1.5in}{\tt Test \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
|
& Example of a branch on an unsupported
|
& Example of a branch on an unsupported
|
condition, in this case a branch on not carry \\\hline
|
condition, in this case a branch on not carry \\\hline
|
BUSY & MOV \$-1(PC),PC & Execute an infinite loop \\\hline
|
{\tt BUSY } & {\tt MOV \$-1(PC),PC} & Execute an infinite loop \\\hline
|
CLRF.NZ Rx
|
{\tt CLRF.NZ Rx }
|
& XOR.NZ Rx,Rx
|
& {\tt XOR.NZ Rx,Rx}
|
& Clear Rx, and flags, if the Z-bit is not set \\\hline
|
& Clear Rx, and flags, if the Z-bit is not set \\\hline
|
CLR Rx
|
{\tt CLR Rx }
|
& LDI \$0,Rx
|
& {\tt LDI \$0,Rx}
|
& Clears Rx, leaves flags untouched. This instruction cannot be
|
& Clears Rx, leaves flags untouched. This instruction cannot be
|
conditional. \\\hline
|
conditional. \\\hline
|
EXCH.W Rx
|
{\tt EXCH.W Rx }
|
& ROL \$16,Rx
|
& {\tt ROL \$16,Rx}
|
& Exchanges the top and bottom 16'bit words of Rx \\\hline
|
& Exchanges the top and bottom 16'bit words of Rx \\\hline
|
HALT
|
{\tt HALT }
|
& Or \$SLEEP,CC
|
& {\tt Or \$SLEEP,CC}
|
& Executed while in interrupt mode. In user mode this is simply a
|
& This only works when issued in interrupt/supervisor mode. In user
|
wait until interrupt instruction. \\\hline
|
mode this is simply a wait until interrupt instruction. \\\hline
|
INT & LDI \$0,CC
|
{\tt INT } & {\tt LDI \$0,CC} & \\\hline
|
& Since we're using the CC register as a trap vector as well, this
|
{\tt IRET}
|
executes TRAP \#0. \\\hline
|
& {\tt OR \$GIE,CC}
|
IRET
|
& Also known as an RTU instruction (Return to Userspace) \\\hline
|
& OR \$GIE,CC
|
{\tt JMP R6+\$Addr}
|
& Also an RTU instruction (Return to Userspace) \\\hline
|
& {\tt MOV \$Addr(R6),PC}
|
JMP R6+\$Addr
|
|
& MOV \$Addr(R6),PC
|
|
& \\\hline
|
& \\\hline
|
JSR PC+\$Addr
|
{\tt JSR PC+\$Addr}
|
& \parbox[t]{1.5in}{SUB \$1,SP \\\
|
& \parbox[t]{1.5in}{\tt SUB \$1,SP \\\
|
MOV \$3+PC,R0 \\
|
MOV \$3+PC,R0 \\
|
STO R0,1(SP) \\
|
STO R0,1(SP) \\
|
MOV \$Addr+PC,PC \\
|
MOV \$Addr+PC,PC \\
|
ADD \$1,SP}
|
ADD \$1,SP}
|
& Jump to Subroutine. Note the required cleanup instruction after
|
& Jump to Subroutine. Note the required cleanup instruction after
|
returning. This could easily be turned into a three instruction
|
returning. This could easily be turned into a three instruction
|
operation, removing the preliminary stack instruction before and
|
operation, removing the preliminary stack instruction before and
|
the cleanup after, by adjusting how any stack frame was built for
|
the cleanup after, by adjusting how any stack frame was built for
|
this routine to include space at the top of the stack for the PC.
|
this routine to include space at the top of the stack for the PC.
|
|
Note also that jumping to a subroutine costs a copy register, {\tt R0}
|
|
in this case.
|
\\\hline
|
\\\hline
|
JSR PC+\$Addr
|
{\tt JSR PC+\$Addr }
|
& \parbox[t]{1.5in}{MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
|
& \parbox[t]{1.5in}{\tt MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
|
&This is the high speed
|
&This is the high speed
|
version of a subroutine call, necessitating a register to hold the
|
version of a subroutine call, necessitating a register to hold the
|
last PC address. In its favor, this method doesn't suffer the
|
last PC address. In its favor, this method doesn't suffer the
|
mandatory memory access of the other approach. \\\hline
|
mandatory memory access of the other approach. \\\hline
|
LDI.l \$val,Rx
|
{\tt LDI.l \$val,Rx }
|
& \parbox[t]{1.5in}{LDIHI (\$val$>>$16)\&0x0ffff, Rx \\
|
& \parbox[t]{1.8in}{\tt LDIHI (\$val$>>$16)\&0x0ffff, Rx \\
|
LDILO (\$val \& 0x0ffff)}
|
LDILO (\$val\&0x0ffff),Rx}
|
& Sadly, there's not enough instruction
|
& Sadly, there's not enough instruction
|
space to load a complete immediate value into any register.
|
space to load a complete immediate value into any register.
|
Therefore, fully loading any register takes two cycles.
|
Therefore, fully loading any register takes two cycles.
|
The LDIHI (load immediate high) and LDILO (load immediate low)
|
The LDIHI (load immediate high) and LDILO (load immediate low)
|
instructions have been created to facilitate this. \\\hline
|
instructions have been created to facilitate this. \\\hline
|
Line 765... |
Line 785... |
\caption{Derived Instructions}\label{tbl:derived-1}
|
\caption{Derived Instructions}\label{tbl:derived-1}
|
\end{center}\end{table}
|
\end{center}\end{table}
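The {\tt LDI.l} entry above splits a 32-bit value into two 16-bit halves. The following is a rough model of that split, assuming that {\tt LDIHI} replaces only the upper half of the register and {\tt LDILO} only the lower half. That behaviour is an assumption made for illustration, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_immediate(val):
    """Split a 32-bit value into the (high, low) 16-bit immediates."""
    val &= MASK32
    return (val >> 16) & 0xFFFF, val & 0xFFFF


def ldihi(imm16, reg):
    # Assumed behaviour: replace the upper 16 bits, keep the lower 16 bits.
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


def ldilo(imm16, reg):
    # Assumed behaviour: replace the lower 16 bits, keep the upper 16 bits.
    return (reg & 0xFFFF0000) | (imm16 & 0xFFFF)


def ldi_long(val, reg=0):
    """Model LDI.l as LDIHI followed by LDILO on the same register."""
    hi, lo = split_immediate(val)
    return ldilo(lo, ldihi(hi, reg))
```

Under these assumptions the previous contents of the register do not matter, since each half is overwritten exactly once.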
|
\begin{table}\begin{center}
|
\begin{table}\begin{center}
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
Mapped & Actual & Notes \\\hline
|
Mapped & Actual & Notes \\\hline
|
LOD.b \$addr,Rx
|
{\tt LOD.b \$addr,Rx}
|
& \parbox[t]{1.5in}{%
|
& \parbox[t]{1.5in}{\tt %
|
LDI \$addr,Ra \\
|
LDI \$addr,Ra \\
|
LDI \$addr,Rb \\
|
LDI \$addr,Rb \\
|
LSR \$2,Ra \\
|
LSR \$2,Ra \\
|
AND \$3,Rb \\
|
AND \$3,Rb \\
|
LOD (Ra),Rx \\
|
LOD (Ra),Rx \\
|
Line 786... |
Line 806... |
all other addresses in this document are 32-bit wordlength addresses.
|
all other addresses in this document are 32-bit wordlength addresses.
|
For this reason,
|
For this reason,
|
we needed to drop the bottom two bits. This also limits the address
|
we needed to drop the bottom two bits. This also limits the address
|
space of character accesses using this method from 16 MB down to 4MB.}
|
space of character accesses using this method from 16 MB down to 4MB.}
|
\\\hline
|
\\\hline
|
\parbox[t]{1.5in}{LSL \$1,Rx\\ LSLC \$1,Ry}
|
\parbox[t]{1.5in}{\tt LSL \$1,Rx\\ LSLC \$1,Ry}
|
& \parbox[t]{1.5in}{LSL \$1,Ry \\
|
& \parbox[t]{1.5in}{\tt LSL \$1,Ry \\
|
LSL \$1,Rx \\
|
LSL \$1,Rx \\
|
OR.C \$1,Ry}
|
OR.C \$1,Ry}
|
& Logical shift left with carry. Note that the
|
& Logical shift left with carry. Note that the
|
instruction order is now backwards, to keep the conditions valid.
|
instruction order is now backwards, to keep the conditions valid.
|
That is, LSL sets the carry flag, so if we did this the other way
|
That is, LSL sets the carry flag, so if we did this the other way
|
with Rx before Ry, then the condition flag wouldn't have been right
|
with Rx before Ry, then the condition flag wouldn't have been right
|
for an OR correction at the end. \\\hline
|
for an OR correction at the end. \\\hline
|
\parbox[t]{1.5in}{LSR \$1,Rx \\ LSRC \$1,Ry}
|
\parbox[t]{1.5in}{\tt LSR \$1,Rx \\ LSRC \$1,Ry}
|
& \parbox[t]{1.5in}{CLR Rz \\
|
& \parbox[t]{1.5in}{\tt CLR Rz \\
|
LSR \$1,Ry \\
|
LSR \$1,Ry \\
|
LDIHI.C \$8000h,Rz \\
|
LDIHI.C \$8000h,Rz \\
|
LSR \$1,Rx \\
|
LSR \$1,Rx \\
|
OR Rz,Rx}
|
OR Rz,Rx}
|
& Logical shift right with carry \\\hline
|
& Logical shift right with carry \\\hline
|
NEG Rx & \parbox[t]{1.5in}{XOR \$-1,Rx \\ ADD \$1,Rx} & \\\hline
|
{\tt NEG Rx} & \parbox[t]{1.5in}{\tt XOR \$-1,Rx \\ ADD \$1,Rx} & \\\hline
|
NEG.C Rx & \parbox[t]{1.5in}{MOV.C \$-1+Rx,Rx\\XOR.C \$-1,Rx} & \\\hline
|
{\tt NEG.C Rx} & \parbox[t]{1.5in}{\tt MOV.C \$-1+Rx,Rx\\XOR.C \$-1,Rx} & \\\hline
|
NOOP & NOOP & While there are many
|
{\tt NOOP} & {\tt NOOP} & While there are many
|
operations that do nothing, such as MOV Rx,Rx, or OR \$0,Rx, these
|
operations that do nothing, such as MOV Rx,Rx, or OR \$0,Rx, these
|
operations have consequences in that they might stall the bus if
|
operations have consequences in that they might stall the bus if
|
Rx isn't ready yet. For this reason, we have a dedicated NOOP
|
Rx isn't ready yet. For this reason, we have a dedicated NOOP
|
instruction. \\\hline
|
instruction. \\\hline
|
NOT Rx & XOR \$-1,Rx & \\\hline
|
{\tt NOT Rx } & {\tt XOR \$-1,Rx } & \\\hline
|
POP Rx
|
{\tt POP Rx }
|
& \parbox[t]{1.5in}{LOD \$1(SP),Rx \\ ADD \$1,SP}
|
& \parbox[t]{1.5in}{\tt LOD \$1(SP),Rx \\ ADD \$1,SP}
|
& Note
|
& Note
|
that for interrupt purposes, one can never depend upon the value at
|
that for interrupt purposes, one can never depend upon the value at
|
(SP). Hence you read from it, then increment it, lest having
|
(SP). Hence you read from it, then increment it, lest having
|
incremented it first something then comes along and writes to that
|
incremented it first something then comes along and writes to that
|
value before you can read the result. \\\hline
|
value before you can read the result. \\\hline
|
\end{tabular}
|
\end{tabular}
|
\caption{Derived Instructions, continued}\label{tbl:derived-2}
|
\caption{Derived Instructions, continued}\label{tbl:derived-2}
|
\end{center}\end{table}
|
\end{center}\end{table}
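Two of the derived sequences above can be checked with a short model. This sketch treats registers as 32-bit unsigned values and assumes that {\tt LSL \$1} leaves the bit shifted out of the top in the carry flag. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def neg(rx):
    """NEG Rx as XOR $-1,Rx followed by ADD $1,Rx (two's complement)."""
    rx = (rx ^ MASK32) & MASK32   # XOR $-1,Rx
    return (rx + 1) & MASK32      # ADD $1,Rx


def lsl1(reg):
    """LSL $1: return (result, carry); carry is the bit shifted out (assumed)."""
    return (reg << 1) & MASK32, (reg >> 31) & 1


def lsl_with_carry(rx, ry):
    """Shift the 64-bit pair Ry:Rx left by one, high word first."""
    ry, _ = lsl1(ry)        # LSL $1,Ry
    rx, carry = lsl1(rx)    # LSL $1,Rx  (sets the carry used below)
    if carry:
        ry |= 1             # OR.C $1,Ry
    return rx, ry
```

Shifting {\tt Ry} first matters: the carry produced by the shift of {\tt Rx} must still be valid when the conditional {\tt OR} executes.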
|
\begin{table}\begin{center}
|
\begin{table}\begin{center}
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
PUSH Rx
|
{\tt PUSH Rx}
|
& \parbox[t]{1.5in}{SUB \$1,SP \\
|
& \parbox[t]{1.5in}{SUB \$1,SP \\
|
STO Rx,\$1(SP)}
|
STO Rx,\$1(SP)}
|
& \\\hline
|
& Note that for pipelined operation, it helps to coalesce all the
|
PUSH Rx-Ry
|
{\tt SUB}'s into one command, and place the {\tt STO}'s right
|
& \parbox[t]{1.5in}{SUB \$n,SP \\
|
after each other.\\\hline
|
|
{\tt PUSH Rx-Ry}
|
|
& \parbox[t]{1.5in}{\tt SUB \$n,SP \\
|
STO Rx,\$n(SP) \\
|
STO Rx,\$n(SP) \\
|
\ldots \\
|
\ldots \\
|
STO Ry,\$1(SP)}
|
STO Ry,\$1(SP)}
|
& Multiple pushes at once only need the single subtract from the
|
& Multiple pushes at once only need the single subtract from the
|
stack pointer. This derived instruction is analogous to a similar one
|
stack pointer. This derived instruction is analogous to a similar one
|
on the Motorola 68k architecture, although the Zip Assembler
|
on the Motorola 68k architecture, although the Zip Assembler
|
does not support this instruction (yet).\\\hline
|
does not support this instruction (yet). This instruction
|
RESET
|
also supports pipelined memory access.\\\hline
|
& \parbox[t]{1in}{STO \$1,\$watchdog(R12)\\NOOP\\NOOP}
|
{\tt RESET}
|
& \parbox[t]{3in}{This depends upon the peripheral base address being
|
& \parbox[t]{1in}{\tt STO \$1,\$watchdog(R12)\\NOOP\\NOOP}
|
|
& This depends upon the peripheral base address being
|
in R12.
|
in R12.
|
|
|
Another opportunity might be to jump to the reset address from within
|
Another opportunity might be to jump to the reset address from within
|
supervisor mode.}\\\hline
|
supervisor mode.\\\hline
|
RET & \parbox[t]{1.5in}{LOD \$1(SP),PC}
|
{\tt RET} & \parbox[t]{1.5in}{\tt LOD \$1(SP),PC}
|
& Note that this depends upon the calling context to clean up the
|
& Note that this depends upon the calling context to clean up the
|
stack, as outlined for the JSR instruction. \\\hline
|
stack, as outlined for the JSR instruction. \\\hline
|
RET & MOV R12,PC
|
{\tt RET} & {\tt MOV R12,PC}
|
& This is the high(er) speed version, that doesn't touch the stack.
|
& This is the high(er) speed version, that doesn't touch the stack.
|
As such, it doesn't suffer a stall on memory read/write to the stack.
|
As such, it doesn't suffer a stall on memory read/write to the stack.
|
\\\hline
|
\\\hline
|
STEP Rr,Rt
|
{\tt STEP Rr,Rt}
|
& \parbox[t]{1.5in}{LSR \$1,Rr \\ XOR.C Rt,Rr}
|
& \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}
|
& Step a Galois implementation of a Linear Feedback Shift Register, Rr,
|
& Step a Galois implementation of a Linear Feedback Shift Register, Rr,
|
using taps Rt \\\hline
|
using taps Rt \\\hline
|
STO.b Rx,\$addr
|
{\tt STO.b Rx,\$addr}
|
& \parbox[t]{1.5in}{%
|
& \parbox[t]{1.5in}{\tt %
|
LDI \$addr,Ra \\
|
LDI \$addr,Ra \\
|
LDI \$addr,Rb \\
|
LDI \$addr,Rb \\
|
LSR \$2,Ra \\
|
LSR \$2,Ra \\
|
AND \$3,Rb \\
|
AND \$3,Rb \\
|
SUB \$32,Rb \\
|
SUB \$32,Rb \\
|
LOD (Ra),Ry \\
|
LOD (Ra),Ry \\
|
AND \$0ffh,Rx \\
|
AND \$0ffh,Rx \\
|
AND \$-0ffh,Ry \\
|
AND \~\$0ffh,Ry \\
|
ROL Rb,Rx \\
|
ROL Rb,Rx \\
|
OR Rx,Ry \\
|
OR Rx,Ry \\
|
STO Ry,(Ra) }
|
STO Ry,(Ra) }
|
& \parbox[t]{3in}{This CPU and its bus are {\em not} optimized
|
& \parbox[t]{3in}{This CPU and its bus are {\em not} optimized
|
for byte-wise operations.
|
for byte-wise operations.
|
Line 875... |
Line 898... |
byte-wise address, whereas in all of our other examples it is a
|
byte-wise address, whereas in all of our other examples it is a
|
32-bit word address. This also limits the address space
|
32-bit word address. This also limits the address space
|
of character accesses from 16 MB down to 4MB.
|
of character accesses from 16 MB down to 4MB.
|
Further, this instruction implies a byte ordering,
|
Further, this instruction implies a byte ordering,
|
such as big or little endian.} \\\hline
|
such as big or little endian.} \\\hline
|
SWAP Rx,Ry
|
{\tt SWAP Rx,Ry }
|
& \parbox[t]{1.5in}{
|
& \parbox[t]{1.5in}{\tt
|
XOR Ry,Rx \\
|
XOR Ry,Rx \\
|
XOR Rx,Ry \\
|
XOR Rx,Ry \\
|
XOR Ry,Rx}
|
XOR Ry,Rx}
|
& While no extra registers are needed, this example
|
& While no extra registers are needed, this example
|
does take 3-clocks. \\\hline
|
does take 3-clocks. \\\hline
|
TRAP \#X
|
{\tt TRAP \#X}
|
& \parbox[t]{1.5in}{LDI \$x,R0 \\ AND ~\$GIE,CC }
|
& \parbox[t]{1.5in}{\tt LDI \$x,R0 \\ AND \~\$GIE,CC }
|
& This works because whenever a user lowers the \$GIE flag, it sets
|
& This works because whenever a user lowers the \$GIE flag, it sets
|
a TRAP bit within the CC register. Therefore, upon entering the
|
a TRAP bit within the CC register. Therefore, upon entering the
|
supervisor state, the CPU only need check this bit to know that it
|
supervisor state, the CPU only need check this bit to know that it
|
got there via a TRAP. The trap could be made conditional by making
|
got there via a TRAP. The trap could be made conditional by making
|
the LDI and the AND conditional. In that case, the assembler would
|
the LDI and the AND conditional. In that case, the assembler would
|
Line 896... |
Line 919... |
\end{tabular}
|
\end{tabular}
|
\caption{Derived Instructions, continued}\label{tbl:derived-3}
|
\caption{Derived Instructions, continued}\label{tbl:derived-3}
|
\end{center}\end{table}
|
\end{center}\end{table}
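The {\tt STEP} and {\tt SWAP} sequences above can be modelled in a few lines. This is a sketch under the assumption that {\tt LSR \$1} leaves the bit shifted out of the bottom in the carry flag, which is what the conditional {\tt XOR.C} relies upon. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def step(rr, rt):
    """One Galois LFSR step: LSR $1,Rr then XOR.C Rt,Rr."""
    carry = rr & 1                 # bit shifted out by LSR $1,Rr (assumed carry)
    rr = (rr & MASK32) >> 1        # LSR $1,Rr
    if carry:
        rr ^= rt & MASK32          # XOR.C Rt,Rr
    return rr


def swap(rx, ry):
    """Exchange two registers with three XORs and no temporary."""
    rx ^= ry   # XOR Ry,Rx
    ry ^= rx   # XOR Rx,Ry
    rx ^= ry   # XOR Ry,Rx
    return rx, ry
```

The tap value passed as {\tt rt} is left to the caller; this model does not check that it yields a maximal length sequence.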
|
\begin{table}\begin{center}
|
\begin{table}\begin{center}
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
TST Rx
|
{\tt TST Rx}
|
& TST \$-1,Rx
|
& {\tt TST \$-1,Rx}
|
& Set the condition codes based upon Rx. Could also do a CMP \$0,Rx,
|
& Set the condition codes based upon Rx. Could also do a CMP \$0,Rx,
|
ADD \$0,Rx, SUB \$0,Rx, AND \$-1,Rx, etc. The TST and CMP
|
ADD \$0,Rx, SUB \$0,Rx, AND \$-1,Rx, etc. The TST and CMP
|
approaches won't stall future pipeline stages looking for the value
|
approaches won't stall future pipeline stages looking for the value
|
of Rx. \\\hline
|
of Rx. \\\hline
|
WAIT
|
{\tt WAIT}
|
& Or \$SLEEP,CC
|
& {\tt Or \$GIE | \$SLEEP,CC}
|
& Wait 'til interrupt. In an interrupts disabled context, this
|
& Wait until the next interrupt, then jump to supervisor/interrupt
|
becomes a HALT instruction.
|
mode.
|
\end{tabular}
|
\end{tabular}
|
\caption{Derived Instructions, continued}\label{tbl:derived-4}
|
\caption{Derived Instructions, continued}\label{tbl:derived-4}
|
\end{center}\end{table}
|
\end{center}\end{table}
|
\section{Pipeline Stages}
|
\section{Pipeline Stages}
|
As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},
|
As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},
|
Line 1071... |
Line 1094... |
In this case, the LOD instruction cannot start until the STO is finished.
|
In this case, the LOD instruction cannot start until the STO is finished.
|
With proper scheduling, it is possible to do something in the ALU while the
|
With proper scheduling, it is possible to do something in the ALU while the
|
memory unit is busy with the STO instruction, but otherwise this pipeline will
|
memory unit is busy with the STO instruction, but otherwise this pipeline will
|
stall waiting for it to complete.
|
stall waiting for it to complete.
|
|
|
Note that even though the Wishbone bus can support pipelined accesses at
|
The Zip CPU does have the capability of supporting pipelined memory access,
|
one access per clock, only the prefetch stage can take advantage of this.
|
but only under the following conditions: all accesses within the pipeline
|
Load and Store instructions are stuck at one wishbone cycle per instruction.
|
must be either all reads or all writes, all must use the same register for their
|
|
address, and there can be no stalls or other instructions between pipelined
|
|
memory access instructions. Further, the offset to memory must be increasing
|
|
by one address each instruction. These conditions work well for saving or
|
|
restoring registers on the stack.
|
|
|
\item When waiting for a conditional memory read operation to complete
|
\item When waiting for a conditional memory read operation to complete
|
\begin{enumerate}
|
\begin{enumerate}
|
\item\ {\tt LOD.Z address,RA}
|
\item\ {\tt LOD.Z address,RA}
|
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
|
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
|
Line 1233... |
Line 1260... |
restart its transfer by writing the contents of its internal buffer and then
|
restart its transfer by writing the contents of its internal buffer and then
|
re-entering its read cycle again.
|
re-entering its read cycle again.
|
|
|
When coupled with a peripheral, the DMA controller can be configured to start
|
When coupled with a peripheral, the DMA controller can be configured to start
|
a memory copy on an interrupt line going high. Further, the controller can be
|
a memory copy on an interrupt line going high. Further, the controller can be
|
configured to issue reads from (or two) the same address instead of incrementing
|
configured to issue reads from (or to) the same address instead of incrementing
|
the address at each clock. The DMA completes once the total number of items
|
the address at each clock. The DMA completes once the total number of items
|
specified (not the transfer length) have been transferred.
|
specified (not the transfer length) have been transferred.
|
|
|
In each case, once the transfer is complete and the DMA unit returns to
|
In each case, once the transfer is complete and the DMA unit returns to
|
idle, the DMA will issue an interrupt.
|
idle, the DMA will issue an interrupt.
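
As a rough illustration of how software might set up such a transfer, the
sketch below composes a value for the DMA control register from the bit
allocation given in Tbl.~\ref{tbl:dmacbits}: the key {\tt 12'hfed} in bits
27 through 16, and the two ``do not increment'' flags in bits 29 and 28. The
macro and function names are hypothetical, and the low sixteen bits are left
at zero only because this sketch does not model them.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions follow the control register table; all names here are
 * hypothetical and chosen only for this illustration. */
#define DMA_ACTIVE    (UINT32_C(1) << 31)     /* read-only: DMA active          */
#define DMA_ERROR     (UINT32_C(1) << 30)     /* read-only: Wishbone error      */
#define DMA_FIXED_SRC (UINT32_C(1) << 29)     /* 1 = do not increment source    */
#define DMA_FIXED_DST (UINT32_C(1) << 28)     /* 1 = do not increment dest.     */
#define DMA_KEY       (UINT32_C(0xfed) << 16) /* bits 27..16: the DMA key       */

/* Build a control word that carries the key plus the address-mode flags.
 * Assumption: the low 16 bits are left at zero; they are not modeled here. */
static uint32_t dma_start_word(bool fixed_src, bool fixed_dst)
{
    uint32_t word = DMA_KEY;
    if (fixed_src)
        word |= DMA_FIXED_SRC;
    if (fixed_dst)
        word |= DMA_FIXED_DST;
    return word;
}

/* Decode the two read-only status bits from a value read back. */
static bool dma_busy(uint32_t status)   { return (status & DMA_ACTIVE) != 0; }
static bool dma_failed(uint32_t status) { return (status & DMA_ERROR)  != 0; }
```

For a peripheral-to-memory copy, for example, one would hold the source
address fixed and let the destination increment, i.e.\ call
{\tt dma\_start\_word(true, false)}.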
|
Line 1400... |
Line 1427... |
onto the user stack and then copying the resulting stack address
|
onto the user stack and then copying the resulting stack address
|
into the task's task structure, as shown in Tbl.~\ref{tbl:context-out}.
|
into the task's task structure, as shown in Tbl.~\ref{tbl:context-out}.
|
\begin{table}\begin{center}
|
\begin{table}\begin{center}
|
\begin{tabular}{ll}
|
\begin{tabular}{ll}
|
{\tt swap\_out:} \\
|
{\tt swap\_out:} \\
|
& {\tt MOV -15(uSP),R1} \\
|
& {\tt MOV -15(uSP),R5} \\
|
& {\tt STO R1,stack(R12)} \\
|
& {\tt STO R5,stack(R12)} \\
|
& {\tt MOV uPC,R0} \\
|
& {\tt MOV uR0,R0} \\
|
& {\tt STO R0,15(R1)} \\
|
& {\tt MOV uR1,R1} \\
|
& {\tt MOV uCC,R0} \\
|
& {\tt MOV uR2,R2} \\
|
& {\tt STO R0,14(R1)} \\
|
& {\tt MOV uR3,R3} \\
|
|
& {\tt MOV uR4,R4} \\
|
|
& {\tt STO R0,1(R5)} {\em ; Exploit memory pipelining: }\\
|
|
& {\tt STO R1,2(R5)} {\em ; All instructions write to stack }\\
|
|
& {\tt STO R2,3(R5)} {\em ; All offsets increment by one }\\
|
|
& {\tt STO R3,4(R5)} {\em ; Longest pipeline is 5 cycles.}\\
|
|
& {\tt STO R4,5(R5)} \\
|
|
& \ldots {\em ; Need to repeat for all user registers} \\
|
|
\iffalse
|
|
& {\tt MOV uR5,R0} \\
|
|
& {\tt MOV uR6,R1} \\
|
|
& {\tt MOV uR7,R2} \\
|
|
& {\tt MOV uR8,R3} \\
|
|
& {\tt MOV uR9,R4} \\
|
|
& {\tt STO R0,6(R5) }\\
|
|
& {\tt STO R1,7(R5) }\\
|
|
& {\tt STO R2,8(R5) }\\
|
|
& {\tt STO R3,9(R5) }\\
|
|
& {\tt STO R4,10(R5)} \\
|
|
\fi
|
|
& {\tt MOV uR10,R0} \\
|
|
& {\tt MOV uR11,R1} \\
|
|
& {\tt MOV uR12,R2} \\
|
|
& {\tt MOV uCC,R3} \\
|
|
& {\tt MOV uPC,R4} \\
|
|
& {\tt STO R0,11(R5)}\\
|
|
& {\tt STO R1,12(R5)}\\
|
|
& {\tt STO R2,13(R5)}\\
|
|
& {\tt STO R3,14(R5)}\\
|
|
& {\tt STO R4,15(R5)} \\
|
& {\em ; We can skip storing the stack, uSP, since it'll be stored}\\
|
& {\em ; We can skip storing the stack, uSP, since it'll be stored}\\
|
& {\em ; elsewhere (in the task structure) }\\
|
& {\em ; elsewhere (in the task structure) }\\
|
& {\tt MOV uR13,R0} \\
|
|
& {\tt STO R0,13(R1)} \\
|
|
& \ldots {\em ; Need to repeat for all user registers} \\
|
|
& {\tt MOV uR0,R0} \\
|
|
& {\tt STO R0,1(R1)} \\
|
|
\end{tabular}
|
\end{tabular}
|
\caption{Example Storing User Task Context}\label{tbl:context-out}
|
\caption{Example Storing User Task Context}\label{tbl:context-out}
|
\end{center}\end{table}
|
\end{center}\end{table}
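
The conditions under which a sequence of loads or stores can be pipelined
(all reads or all writes, a common address register, and offsets that
increase by one with each instruction) can be expressed as a small check.
The sketch below is not part of the Zip CPU tool chain; the {\tt mem\_op}
structure and the function name are hypothetical. It assumes the array holds
back-to-back instructions, so the ``no stalls or other instructions in
between'' condition is represented only by adjacency in the array.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical description of one memory instruction. */
typedef struct {
    bool is_store;  /* true for STO, false for LOD            */
    int  base_reg;  /* register number supplying the address  */
    int  offset;    /* immediate offset added to that register */
} mem_op;

/* Return how many of the leading instructions in ops[] satisfy the
 * pipelining conditions together: same direction, same base register,
 * and each offset exactly one more than the previous one. */
static size_t pipelined_run(const mem_op *ops, size_t n)
{
    if (n == 0)
        return 0;

    size_t len = 1;
    while (len < n
           && ops[len].is_store == ops[0].is_store
           && ops[len].base_reg == ops[0].base_reg
           && ops[len].offset   == ops[len - 1].offset + 1)
        len++;
    return len;
}
```

Applied to the five stores {\tt STO R0,1(R5)} through {\tt STO R4,5(R5)},
this check accepts the whole group, whereas a store to {\tt 3(R5)} placed
directly after one to {\tt 1(R5)} would end the run after the first
instruction.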
|
For the sake of discussion, we assume the supervisor maintains a
|
For the sake of discussion, we assume the supervisor maintains a
|
pointer to the current task's structure in supervisor register
|
pointer to the current task's structure in supervisor register
|
Line 1507... |
Line 1558... |
back off of the stack to run this task. An example of this is
|
back off of the stack to run this task. An example of this is
|
shown in Tbl.~\ref{tbl:context-in},
|
shown in Tbl.~\ref{tbl:context-in},
|
\begin{table}\begin{center}
|
\begin{table}\begin{center}
|
\begin{tabular}{ll}
|
\begin{tabular}{ll}
|
{\tt swap\_in:} \\
|
{\tt swap\_in:} \\
|
& {\tt LOD stack(R12),R1} \\
|
& {\tt LOD stack(R12),R5} \\
|
& {\tt MOV 15(R1),uSP} \\
|
& {\tt MOV 15(R5),uSP} \\
|
& {\tt LOD 15(R1),R0} \\
|
& {\em ; Be sure to exploit the memory pipelining capability} \\
|
& {\tt MOV R0,uPC} \\
|
& {\tt LOD 1(R5),R0} \\
|
& {\tt LOD 14(R1),R0} \\
|
& {\tt LOD 2(R5),R1} \\
|
& {\tt MOV R0,uCC} \\
|
& {\tt LOD 3(R5),R2} \\
|
& {\tt LOD 13(R1),R0} \\
|
& {\tt LOD 4(R5),R3} \\
|
& {\tt MOV R0,uR12} \\
|
& {\tt LOD 5(R5),R4} \\
|
& \ldots {\em ; Need to repeat for all user registers} \\
|
|
& {\tt LOD 1(R1),R0} \\
|
|
& {\tt MOV R0,uR0} \\
|
& {\tt MOV R0,uR0} \\
|
|
& {\tt MOV R1,uR1} \\
|
|
& {\tt MOV R2,uR2} \\
|
|
& {\tt MOV R3,uR3} \\
|
|
& {\tt MOV R4,uR4} \\
|
|
& \ldots {\em ; Need to repeat for all user registers} \\
|
|
& {\tt LOD 11(R5),R0} \\
|
|
& {\tt LOD 12(R5),R1} \\
|
|
& {\tt LOD 13(R5),R2} \\
|
|
& {\tt LOD 14(R5),R3} \\
|
|
& {\tt LOD 15(R5),R4} \\
|
|
& {\tt MOV R0,uR10} \\
|
|
& {\tt MOV R1,uR11} \\
|
|
& {\tt MOV R2,uR12} \\
|
|
& {\tt MOV R3,uCC} \\
|
|
& {\tt MOV R4,uPC} \\
|
|
|
& {\tt BRA return\_to\_user} \\
|
& {\tt BRA return\_to\_user} \\
|
\end{tabular}
|
\end{tabular}
|
\caption{Example Restoring User Task Context}\label{tbl:context-in}
|
\caption{Example Restoring User Task Context}\label{tbl:context-in}
|
\end{center}\end{table}
|
\end{center}\end{table}
|
assuming as before that the task
|
assuming as before that the task
|
Line 1714... |
Line 1779... |
|
|
The bit allocation of the control register is shown in Tbl.~\ref{tbl:dmacbits}.
|
The bit allocation of the control register is shown in Tbl.~\ref{tbl:dmacbits}.
|
\begin{table}\begin{center}
|
\begin{table}\begin{center}
|
\begin{bitlist}
|
\begin{bitlist}
|
31 & R & DMA Active\\\hline
|
31 & R & DMA Active\\\hline
|
30 & R & Wishbone error, transaction aborted (cleared on any write)\\\hline
|
30 & R & Wishbone error, transaction aborted. This bit is cleared the next time
|
|
this register is written to.\\\hline
|
29 & R/W & Set to '1' to prevent the controller from incrementing the source address, '0' for normal memory copy. \\\hline
|
29 & R/W & Set to '1' to prevent the controller from incrementing the source address, '0' for normal memory copy. \\\hline
|
28 & R/W & Set to '0' to prevent the controller from incrementing the
|
28 & R/W & Set to '1' to prevent the controller from incrementing the
|
destination address, '0' for normal memory copy. \\\hline
|
destination address, '0' for normal memory copy. \\\hline
|
27 \ldots 16 & W & The DMA Key. Write a 12'hfed to these bits to
|
27 \ldots 16 & W & The DMA Key. Write a 12'hfed to these bits to
|
activate any DMA transfer. \\\hline
|
activate any DMA transfer. \\\hline
|
27 & R & Always reads '0', to force the deliberate writing of the key. \\\hline
|
27 & R & Always reads '0', to force the deliberate writing of the key. \\\hline
|
26 \ldots 16 & R & Indicates the number of items in the transfer buffer that
|
26 \ldots 16 & R & Indicates the number of items in the transfer buffer that
|
Line 1793... |
Line 1859... |
uSP & 29 & 32 & R/W & User Stack Pointer\\\hline
|
uSP & 29 & 32 & R/W & User Stack Pointer\\\hline
|
uCC & 30 & 32 & R/W & User Condition Code Register \\\hline
|
uCC & 30 & 32 & R/W & User Condition Code Register \\\hline
|
uPC & 31 & 32 & R/W & User Program Counter\\\hline
|
uPC & 31 & 32 & R/W & User Program Counter\\\hline
|
PIC & 32 & 32 & R/W & Primary Interrupt Controller \\\hline
|
PIC & 32 & 32 & R/W & Primary Interrupt Controller \\\hline
|
WDT & 33 & 32 & R/W & Watchdog Timer\\\hline
|
WDT & 33 & 32 & R/W & Watchdog Timer\\\hline
|
CCHE & 34 & 32 & R/W & Manual Cache Controller\\\hline
|
|
CTRIC & 35 & 32 & R/W & Secondary Interrupt Controller\\\hline
|
CTRIC & 35 & 32 & R/W & Secondary Interrupt Controller\\\hline
|
TMRA & 36 & 32 & R/W & Timer A\\\hline
|
TMRA & 36 & 32 & R/W & Timer A\\\hline
|
TMRB & 37 & 32 & R/W & Timer B\\\hline
|
TMRB & 37 & 32 & R/W & Timer B\\\hline
|
TMRC & 38 & 32 & R/W & Timer C\\\hline
|
TMRC & 38 & 32 & R/W & Timer C\\\hline
|
JIFF & 39 & 32 & R/W & Jiffies peripheral\\\hline
|
JIFF & 39 & 32 & R/W & Jiffies peripheral\\\hline
|
Line 1807... |
Line 1872... |
MICNT & 43 & 32 & R/W & Master instruction counter\\\hline
|
MICNT & 43 & 32 & R/W & Master instruction counter\\\hline
|
UTASK & 44 & 32 & R/W & User task clock counter\\\hline
|
UTASK & 44 & 32 & R/W & User task clock counter\\\hline
|
UMSTL & 45 & 32 & R/W & User memory stall counter\\\hline
|
UMSTL & 45 & 32 & R/W & User memory stall counter\\\hline
|
UPSTL & 46 & 32 & R/W & User Pre-Fetch Stall counter\\\hline
|
UPSTL & 46 & 32 & R/W & User Pre-Fetch Stall counter\\\hline
|
UICNT & 47 & 32 & R/W & User instruction counter\\\hline
|
UICNT & 47 & 32 & R/W & User instruction counter\\\hline
|
|
DMACMD & 48 & 32 & R/W & DMA command and status register\\\hline
|
|
DMALEN & 49 & 32 & R/W & DMA transfer length\\\hline
|
|
DMARD & 50 & 32 & R/W & DMA read address\\\hline
|
|
DMAWR & 51 & 32 & R/W & DMA write address\\\hline
|
\end{reglist}
|
\end{reglist}
|
\caption{Debug Register Addresses}\label{tbl:dbgaddrs}
|
\caption{Debug Register Addresses}\label{tbl:dbgaddrs}
|
\end{center}\end{table}
|
\end{center}\end{table}
|
Primarily, these ``registers'' include access to the entire CPU register
|
Primarily, these ``registers'' include access to the entire CPU register
|
set, as well as the internal peripherals. To read one of these registers
|
set, as well as the internal peripherals. To read one of these registers
|
Line 2113... |
Line 2182... |
(yet) support a compiler. The standard C library is an even longer
|
(yet) support a compiler. The standard C library is an even longer
|
shot. My dream of having binutils and gcc support has not been
|
shot. My dream of having binutils and gcc support has not been
|
realized and at this rate may not be realized. (I've been intimidated
|
realized and at this rate may not be realized. (I've been intimidated
|
by the challenge every time I've looked through those codes.)
|
by the challenge every time I've looked through those codes.)
|
|
|
|
\iffalse
|
\item While the Wishbone Bus (B4) supports a pipelined mode with single cycle
|
\item While the Wishbone Bus (B4) supports a pipelined mode with single cycle
|
execution, the Zip CPU is unable to exploit this parallelism. Instead,
|
execution, the Zip CPU is unable to exploit this parallelism. Instead,
|
apart from the DMA and the pipelined prefetch, all loads and stores
|
apart from the DMA and the pipelined prefetch, all loads and stores
|
are single wishbone bus operations requiring a minimum of 3 clocks.
|
are single wishbone bus operations requiring a minimum of 3 clocks.
|
(In practice, this has turned into 7 clocks.)
|
(In practice, this has turned into 7 clocks.)
|
|
% Addressed, 20150929
|
|
|
\iffalse
|
|
\item There is no control over whether or not an instruction sets the
|
\item There is no control over whether or not an instruction sets the
|
condition codes--certain instructions always set the condition codes,
|
condition codes--certain instructions always set the condition codes,
|
other instructions never set them. This effectively limits conditional
|
other instructions never set them. This effectively limits conditional
|
instructions to a single instruction only (with two or more
|
instructions to a single instruction only (with two or more
|
instructions as an exception), as the first instruction that sets
|
instructions as an exception), as the first instruction that sets
|
Line 2171... |
Line 2241... |
the process accounting registers are anything but light weight, why
|
the process accounting registers are anything but light weight, why
|
keep them? Why not instead make some compile flags that just turn them
|
keep them? Why not instead make some compile flags that just turn them
|
off, keeping the CPU lightweight? The same holds for the prefetch
|
off, keeping the CPU lightweight? The same holds for the prefetch
|
cache.
|
cache.
|
|
|
|
\item The `{\tt .V}' condition was never used in any code other than my test
|
|
code. Suggest changing it to a `{\tt .LE}' condition, which seems
|
|
to be more useful.
|
|
|
|
\item {\bf Consider a more traditional Instruction Cache.} The current
|
|
pipelined instruction cache just reads a window of memory into
|
|
its cache. If the CPU leaves that window, the entire cache is
|
|
invalidated. A more traditional cache, however, might allow
|
|
common subroutines to stay within the cache without invalidating the
|
|
entire cache structure.
|
|
|
\iffalse
|
\iffalse
|
\item {\bf Adjust the Zip CPU so that conditional instructions do not set
|
\item {\bf Adjust the Zip CPU so that conditional instructions do not set
|
flags}, although they may explicitly set condition codes if writing
|
flags}, although they may explicitly set condition codes if writing
|
to the CC register.
|
to the CC register.
|
|
|
This is a simple change to the core, and may show up in new releases.
|
This is a simple change to the core, and may show up in new releases.
|
% Fixed, 20150918
|
% Fixed, 20150918
|
\fi
|
|
|
|
\item The `{\tt .V}' condition was never used in any code other than my test
|
|
code. Suggest changing it to a `{\tt .LE}' condition, which seems
|
|
to be more useful.
|
|
|
|
\iffalse
|
|
\item Add in an {\bf unpredictable branch delay slot}, so that on any branch
|
\item Add in an {\bf unpredictable branch delay slot}, so that on any branch
|
the delay slot may or may not be executed before the branch.
|
the delay slot may or may not be executed before the branch.
|
Instructions that do not depend upon the branch, and that should be
|
Instructions that do not depend upon the branch, and that should be
|
executed were the branch not taken, could be placed into the delay
|
executed were the branch not taken, could be placed into the delay
|
slot. Thus, if the branch isn't taken, we wouldn't suffer the stall,
|
slot. Thus, if the branch isn't taken, we wouldn't suffer the stall,
|
Line 2224... |
Line 2299... |
for one cycle before starting again, these extra cycles add up.
|
for one cycle before starting again, these extra cycles add up.
|
It should be possible to tell the prefetch stage to give up the bus
|
It should be possible to tell the prefetch stage to give up the bus
|
as soon as the decoder knows the instruction will need the bus.
|
as soon as the decoder knows the instruction will need the bus.
|
Indeed, if done in the decode stage, this might drop the seven cycle
|
Indeed, if done in the decode stage, this might drop the seven cycle
|
access down by two cycles.
|
access down by two cycles.
|
|
|
% FIXED: 20150918
|
% FIXED: 20150918
|
\fi
|
|
|
|
\item {\bf Consider a more traditional Instruction Cache.} The current
|
|
pipelined instruction cache just reads a window of memory into
|
|
its cache. If the CPU leaves that window, the entire cache is
|
|
invalidated. A more traditional cache, however, might allow
|
|
common subroutines to stay within the cache without invalidating the
|
|
entire cache structure.
|
|
|
|
\iffalse
|
|
\item {\bf Very Long Instruction Word (VLIW).} Now, to speed up operation, I
|
\item {\bf Very Long Instruction Word (VLIW).} Now, to speed up operation, I
|
propose that the Zip CPU instruction set be modified towards a Very
|
propose that the Zip CPU instruction set be modified towards a Very
|
Long Instruction Word (VLIW) implementation. In this implementation,
|
Long Instruction Word (VLIW) implementation. In this implementation,
|
an instruction word may contain either one or two separate
|
an instruction word may contain either one or two separate
|
instructions. The first instruction would take up the high order bits,
|
instructions. The first instruction would take up the high order bits,
|