Line 41... |
Line 41... |
%% License: GPL, v3, as defined and found on www.gnu.org,
%% http://www.gnu.org/licenses/gpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%
%
%
% From TI about DSPs vs FPGAs:
% www.ti.com/general/docs/video/foldersGallery.tsp?bkg=gray
% &gpn=35145&familyid=1622&keyMatch=DSP Breaktime Episode Three
% &tisearch=Search-EN-Everything&DCMP=leadership
% &HQS=ep-pro-dsp-leadership-problog-150518-v-en
%
% FPGAs are annoyingly faster, cheaper, and not quite as power hungry
% as they used to be.
%
% Why would you choose DSPs over FPGAs?  If you care about size,
% if you care about power, or happen to have a complicated algorithm
% that just isn't simply doing the same thing over and over.
%
% For complex algorithms that change over time.  Each has its strengths;
% sometimes you can use both.
%
% "No assembly required" -- TI tools all C programming, very GUI based
% environment, very little optimization by hand ...
%
%
% The FPGA's Achilles' heel: Reconfigurability.  It is very difficult, although
% I'm sure major vendors will tell you not impossible, to reconfigure an FPGA
% based upon the need to process time-sensitive data.  If you need one of two
% algorithms, both of which will fit on the FPGA individually but not together,
% switching between them on the fly is next to impossible, whereas switching
% algorithms within a CPU is not difficult at all.  For example, imagine
% receiving a packet and needing to apply one of two data algorithms on the
% packet before sending it back out, and needing to do so fast.  If both
% algorithms don't fit in memory, where does the packet go when you need to
% swap one algorithm out for the other?  And what is the cost of that "context"
% swap?
%
%
\documentclass{gqtekspec}
\usepackage{import}
\usepackage{bytefield} % Install via apt-get install texlive-science
% \graphicspath{{../gfx}}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.9}
\definecolor{webred}{rgb}{0.5,0,0}
\definecolor{webgreen}{rgb}{0,0.4,0}
\usepackage[dvips,ps2pdf,colorlinks=true,
\usepackage[dvips,ps2pdf,colorlinks=true,
anchorcolor=black,pdfpagelabels,hypertexnames,
anchorcolor=black,pdfpagelabels,hypertexnames,
pdfauthor={Dan Gisselquist},
pdfauthor={Dan Gisselquist},
Line 82... |
Line 118... |
You should have received a copy of the GNU General Public License along
with this program.  If not, see \hbox{<http://www.gnu.org/licenses/>} for a
copy.
\end{license}
\end{license}
\begin{revisionhistory}
\begin{revisionhistory}
0.9 & 4/20/2016 & Gisselquist & Modified ISA: LDIHI replaced with MPY; MPYU and MPYS replaced with MPYUHI and MPYSHI, respectively.  LOCK instruction now
permits an intermediate ALU operation. \\\hline
0.8 & 1/28/2016 & Gisselquist & Reduced complexity early branching \\\hline
0.7 & 12/22/2015 & Gisselquist & New Instruction Set Architecture \\\hline
0.6 & 11/17/2015 & Gisselquist & Added graphics to illustrate pipeline discussion.\\\hline
0.5 & 9/29/2015 & Gisselquist & Added pipelined memory access discussion.\\\hline
0.4 & 9/19/2015 & Gisselquist & Added DMA controller, improved stall information, and self--assessment info.\\\hline
Line 650... |
Line 688... |
\end{bytefield}
\end{bytefield}
\caption{Zip Instruction Set Format}\label{fig:iset-format}
\caption{Zip Instruction Set Format}\label{fig:iset-format}
\end{center}\end{figure}
\end{center}\end{figure}
The basic format is that an operation, defined by the OpCode, is applied
if the condition, Cnd, is true; the result is placed in the destination
register, DR.  The load 23--bit signed immediate instruction (LDI) is
different in that it accepts no conditions and uses only a 4--bit opcode.

This is actually the second version of the instruction set definition,
revised in light of lessons learned.  For example, the original instruction
set had the following problems:
\begin{enumerate}
Line 665... |
Line 704... |
require extra logic to use.
\item The carveouts for instructions such as NOOP and LDIHI/LDILO required
extra logic to process.
\item The instruction set wasn't very compact.  One bus operation was required
for every instruction.
\item While the CPU supported multiplies, they were only 16x16 bit multiplies.
\end{enumerate}
This second version was designed with two criteria.  The first was that the
new instruction set needed to be compatible, at the assembly language level,
with the previous instruction set.  Thus, it had to support all of
the previous mnemonics and more.  This was achieved with the sole exception
Line 690... |
Line 730... |
to interrupt mode in between the two instructions.  Likewise, a new job given
to the assembler is to automatically pack as many instructions as possible
into the VLIW format.  Where both VLIW instructions must be placed on the
same line, they are separated by a vertical bar.

One belated change to the instruction set violates some of the above
principles.  This later change replaced the {\tt LDIHI}
instruction with a 32--bit multiply instruction, {\tt MPY}, and replaced
the two 16--bit multiply instructions {\tt MPYU} and {\tt MPYS} with
{\tt MPYUHI} and {\tt MPYSHI}, respectively.  This creates a 32--bit
multiply capability, while removing the 16--bit multiply that wasn't very
useful.  Further, the {\tt LDIHI} instruction was being used primarily by the
assembler and linker to create a 32--bit load immediate pair of instructions.
This pair, {\tt LDIHI} followed by {\tt LDILO}, has been replaced with an
equivalent pair, {\tt BREV} followed by {\tt LDILO}, at the cost of making
linking more complicated.
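To make the {\tt BREV}/{\tt LDILO} pair concrete, the C sketch below models how a 32--bit constant could be assembled with it. It assumes that {\tt BREV} writes the bit reversal of its 32--bit operand into DR, and that {\tt LDILO} replaces only the low 16 bits of DR while keeping the upper 16 bits; the function names are hypothetical and describe neither the actual assembler nor the linker.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit word: bit i moves to bit 31-i.
 * Assumption: this is what BREV writes into DR. */
static uint32_t brev32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Assumption: LDILO replaces the low 16 bits of DR, keeping the upper 16. */
static uint32_t ldilo(uint32_t dr, uint16_t imm)
{
    return (dr & 0xffff0000u) | imm;
}

/* Load an arbitrary 32-bit constant with BREV imm,DR followed by LDILO lo,DR. */
static uint32_t load_const32(uint32_t value)
{
    uint16_t hi = (uint16_t)(value >> 16);
    uint16_t lo = (uint16_t)(value & 0xffffu);

    /* Immediate for BREV: the upper half, bit-reversed within 16 bits, so
     * that reversing all 32 bits drops it back into bits 31..16. */
    uint32_t brev_imm = brev32(hi) >> 16;
    assert(brev_imm <= 0xffffu);      /* assumed to fit the Operand B immediate */

    uint32_t dr = brev32(brev_imm);   /* after BREV: upper half set, lower half zero */
    return ldilo(dr, lo);             /* after LDILO: lower half filled in */
}
```

Under these assumptions, whoever resolves the constant, the assembler or the linker when the value is an address, has to encode the upper half bit-reversed, which is presumably part of why linking became more complicated.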
\section{Instruction OpCodes}
With a 5--bit opcode field, there are 32 possible instructions, as shown in
Tbl.~\ref{tbl:iset-opcodes}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|c|} \hline \rowcolor[gray]{0.85}
Line 704... |
Line 756... |
5'h03 & OR & Bitwise Or & Y \\\cline{1-3}
5'h04 & XOR & Bitwise Exclusive Or & \\\cline{1-3}
5'h05 & LSR & Logical Shift Right & \\\cline{1-3}
5'h06 & LSL & Logical Shift Left & \\\cline{1-3}
5'h07 & ASR & Arithmetic Shift Right & \\\hline
5'h08 & MPY & Lower 32 of 64 bits from a 32x32 multiply & Y \\\hline
5'h09 & LDILO & Load Immediate Low & N\\\hline
5'h0a & MPYUHI & Upper 32 of 64 bits from an unsigned 32x32 multiply & \\\cline{1-3}
5'h0b & MPYSHI & Upper 32 of 64 bits from a signed 32x32 multiply & Y \\\cline{1-3}
5'h0c & BREV & Bit Reverse & \\\cline{1-3}
5'h0d & POPC & Population Count & \\\cline{1-3}
5'h0e & ROL & Rotate left & \\\hline
5'h0f & MOV & Move register & N \\\hline
5'h10 & CMP & Compare & Y \\\cline{1-3}
Line 727... |
Line 779... |
5'h1b & FPDIV & Floating point divide & \\\cline{1-3}
5'h1c & FPCVT & Convert integer to floating point & \\\cline{1-3}
5'h1d & FPINT & Convert to integer & \\\hline
5'h1e & & {\em Reserved for future use} &\\\hline
5'h1f & & {\em Reserved for future use} &\\\hline
5'h18 & & NOOP (A-register = PC)&\\\cline{1-3}
5'h19 & & BREAK (A-register = PC)& N\\\cline{1-3}
5'h1a & & LOCK (A-register = PC)&\\\hline
\end{tabular}
\caption{Zip CPU OpCodes}\label{tbl:iset-opcodes}
\end{center}\end{table}
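As an illustration of where the 5--bit OpCode sits within an instruction word, the C sketch below pulls the fields out of a non--VLIW instruction and recognizes the {\tt NOOP}, {\tt BREAK}, and {\tt LOCK} special cases listed at the bottom of the table. The field positions (DR in bits 30--27, OpCode in bits 26--22, Cnd in bits 21--19, and Operand B in bits 18--0) are an assumption, chosen to match the field widths given in this chapter and the encodings of Fig.~\ref{fig:iset-noop}. LDI and VLIW words are not handled, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a non-VLIW instruction word (assumed layout, see text). */
typedef struct {
    unsigned dr;      /* destination register, bits 30..27 */
    unsigned opcode;  /* 5-bit OpCode, bits 26..22 */
    unsigned cnd;     /* 3-bit condition, bits 21..19 */
    uint32_t opb;     /* raw 19-bit Operand B field, bits 18..0 */
} zip_fields;

static zip_fields zip_decode(uint32_t insn)
{
    zip_fields f;

    assert((insn >> 31) == 0u);   /* VLIW words are not handled here */
    f.dr     = (insn >> 27) & 0xfu;
    f.opcode = (insn >> 22) & 0x1fu;
    f.cnd    = (insn >> 19) & 0x7u;
    f.opb    = insn & 0x7ffffu;
    return f;
}

typedef enum { ZIP_NORMAL, ZIP_NOOP, ZIP_BREAK, ZIP_LOCK } zip_kind;

/* NOOP, BREAK and LOCK share one shape: the upper three DR bits are all
 * ones (3'h7, so DR is register 14 or 15) and the OpCode is 5'h18, 5'h19
 * or 5'h1a respectively. */
static zip_kind zip_classify(uint32_t insn)
{
    zip_fields f = zip_decode(insn);

    if ((f.dr >> 1) != 0x7u)
        return ZIP_NORMAL;
    switch (f.opcode) {
    case 0x18: return ZIP_NOOP;
    case 0x19: return ZIP_BREAK;
    case 0x1a: return ZIP_LOCK;
    default:   return ZIP_NORMAL;
    }
}
```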
%
Of these opcodes, the {\tt BREV} and {\tt POPC} are experimental, and may be
Line 751... |
Line 806... |
3'h1 & {\tt .LT} & Less than ('N' set) \\
3'h2 & {\tt .Z} & Only execute when 'Z' is set \\
3'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\
3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\
3'h5 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\
3'h6 & {\tt .C} & Carry set (also known as less-than unsigned) \\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\end{center}\end{table}
\end{center}\end{table}
There are no condition codes for less than or equal, not C, or not V: there
just wasn't enough space in 3 bits.  Conditioning on an unsupported
condition is still possible, but it will take an extra instruction and a
pipeline stall.  (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt
STO.NZ R0,(R1)})  As an alternative, it is often possible to reverse the
condition, thereby recovering those two extra clocks.  Thus instead of
\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNC label} you can issue a
\hbox{\tt CMP 1+Ry,Rx;} \hbox{\tt BC label}.

Conditionally executed instructions will not further adjust the
condition codes, with the exception of \hbox{\tt CMP} and \hbox{\tt TST}
instructions.  Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions
will adjust conditions whenever they are executed.  In this way,
Line 801... |
Line 856... |
\caption{VLIW Conditions}\label{tbl:vliw-conditions}
\caption{VLIW Conditions}\label{tbl:vliw-conditions}
\end{center}\end{table}
\end{center}\end{table}
Further, the first bit is given a special meaning.  If the first bit is set,
the conditions apply to the second half of the instruction; otherwise, they
apply only to the first half of a conditional instruction.
Of course, the other conditions are still available by mingling
non--VLIW instructions with VLIW instructions.

\section{Operand B}
Many instruction forms have a 19-bit source ``Operand B'' associated with them.
This ``Operand B'' is shown in Fig.~\ref{fig:iset-format} as part of the
standard instructions.  This Operand B is either equal to a register plus a
Line 848... |
Line 905... |
removed from the realm of possibilities.  This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.
Each of these operations can be emulated with a sequence of instructions from
the existing set.

\section{Modifying Conditions}
A quick look at the list of conditions supported by the Zip CPU and listed
in Tbl.~\ref{tbl:conditions} reveals that the Zip CPU does not have a full set
of conditions.  In particular, only one explicit unsigned condition is
supported.  Therefore, Tbl.~\ref{tbl:creating-conditions}
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Original & Modified & Name \\\hline\hline
\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1
& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BLT label}
& Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}
& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BC label}
& Less-than or equal unsigned \\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label} % if (Ry > Rx) -> Rx < Ry
& \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}
& Greater-than unsigned \\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGEU label} % if (Ry >= Rx) -> Rx <= Ry -> Rx < Ry+1
& \parbox[t]{1.5in}{\tt CMP 1+Ry,Rx\\BC label}
& Greater-than or equal unsigned \\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP A+Rx,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A
& \parbox[t]{1.5in}{\tt CMP (1-A)+Ry,Rx\\BC label}
& Greater-than or equal unsigned (with offset)\\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP A,Ry\\BGEU label} % if (Ry >= A) -> A-1 < Ry
& \parbox[t]{1.5in}{\tt LDI (A-1),Rx\\CMP Ry,Rx\\BC label}
& Greater-than or equal comparison with a constant\\[4mm]\hline
\end{tabular}
\caption{Modifying conditions}\label{tbl:creating-conditions}
\end{center}\end{table}
shows examples of how these unsupported conditions can be created
simply by adjusting the compare instruction, at no extra cost in clocks.
Of course, if the compare originally had an immediate within it, that immediate
would need to be loaded into a register in order to do some of these compares.
This case is shown as the last case above.  Note that these rewrites assume the
adjusted operand, such as {\tt 1+Rx}, does not wrap around: if Rx holds
{\tt 32'hffffffff}, for example, {\tt 1+Rx} becomes zero and the rewritten
{\tt BLEU} no longer branches.
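The unsigned rewrites above all rely on the carry flag meaning less-than unsigned. The C sketch below checks the unsigned rows of Tbl.~\ref{tbl:creating-conditions}, together with the {\tt BNC} to {\tt BC} example given with the condition codes, against the comparisons they replace. It models {\tt CMP OpB,DR} as computing DR minus OpB, with carry set exactly when DR is less than OpB as unsigned numbers. The signed row is left out, since it also depends upon how {\tt .LT} treats overflow; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "CMP OpB,DR": the CPU computes DR - OpB, and the carry flag
 * tested by BC is set exactly when DR < OpB as unsigned values. */
static bool cmp_sets_carry(uint32_t dr, uint32_t opb)
{
    return dr < opb;
}

/* The CMP/BC pairs from the table's unsigned rows. */
static bool bleu_rewritten(uint32_t rx, uint32_t ry) { return cmp_sets_carry(ry, rx + 1u); } /* CMP 1+Rx,Ry; BC */
static bool bgtu_rewritten(uint32_t rx, uint32_t ry) { return cmp_sets_carry(rx, ry); }      /* CMP Ry,Rx;   BC */
static bool bgeu_rewritten(uint32_t rx, uint32_t ry) { return cmp_sets_carry(rx, ry + 1u); } /* CMP 1+Ry,Rx; BC */

/* "CMP Rx,Ry; BNC": branch when the carry is clear. */
static bool bnc_original(uint32_t rx, uint32_t ry) { return !cmp_sets_carry(ry, rx); }

/* Check each rewrite against the branch it replaces.  The 1+Rx and 1+Ry
 * forms are only valid when that addition does not wrap past 0xffffffff. */
static void check_rewrites(uint32_t rx, uint32_t ry)
{
    assert(bgtu_rewritten(rx, ry) == (ry > rx));
    if (rx != UINT32_MAX)
        assert(bleu_rewritten(rx, ry) == (ry <= rx));
    if (ry != UINT32_MAX) {
        assert(bgeu_rewritten(rx, ry) == (ry >= rx));
        /* CMP 1+Ry,Rx; BC branches exactly when CMP Rx,Ry; BNC would. */
        assert(bgeu_rewritten(rx, ry) == bnc_original(rx, ry));
    }
}
```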
\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
the CPU needs access to non--supervisory registers while in supervisory mode.
Therefore, the MOV instruction is special and offers access to these registers
\ldots when in supervisory mode.  To keep the compiler simple, the extra bits
Line 873... |
Line 965... |
Anything with the user bit set will be treated as a user register and displayed
specially.  Since the CPU quietly ignores the supervisor bits while in user mode,
anything marked as a user register will always be specific.

\section{Multiply Operations}
The Zip CPU originally supported only 16x16 bit multiply operations.  GCC,
however, wanted 32x32 bit operations, and building these from 16x16 bit
multiplies is painful.  Therefore, the Zip CPU was modified to support
32x32 bit multiplies.

In particular, the Zip CPU supports three separate 32x32 bit multiply
instructions: {\tt MPY}, {\tt MPYUHI}, and {\tt MPYSHI}.  The first of these,
{\tt MPY}, produces the low 32 bits of the 64--bit product.  Since these low
bits are the same whether the operands are signed or unsigned, one instruction
serves both cases.  The other two produce the upper 32 bits: {\tt MPYUHI}
treats the multiply as unsigned, whereas {\tt MPYSHI} treats it as signed.
Each multiply instruction executes independently of the others, although the
compiler may use them together, for example pairing {\tt MPY} with
{\tt MPYSHI} to form a full 64--bit product.

In an effort to maintain single clock pipeline timing for the rest of the CPU,
the multiply logic has been allowed to take extra clocks.  Depending upon the
setting of {\tt OPT\_MULTIPLY} within {\tt cpudefs.v}, the multiply
instructions will either 1)~cause an ILLEGAL instruction error, 2)~take one
additional clock, or 3)~take two additional clocks.
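The following C sketch spells out what each multiply instruction returns, and how a compiler might combine two of them into a full 64--bit signed product. It models results only, not timing or the {\tt OPT\_MULTIPLY} options, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MPY: the low 32 bits of the 64-bit product. */
static uint32_t zip_mpy(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* MPYUHI: the upper 32 bits, treating both operands as unsigned. */
static uint32_t zip_mpyuhi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* MPYSHI: the upper 32 bits, treating both operands as signed. */
static uint32_t zip_mpyshi(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}

/* Why a single MPY suffices: the low 32 bits of a signed product equal the
 * low 32 bits of the unsigned product of the same bit patterns. */
static void check_low_half(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    assert(zip_mpy((uint32_t)a, (uint32_t)b) == (uint32_t)p);
}

/* A full signed 64-bit product built from MPYSHI and MPY. */
static int64_t mul_s64(int32_t a, int32_t b)
{
    uint64_t hi = zip_mpyshi((uint32_t)a, (uint32_t)b);
    uint64_t lo = zip_mpy((uint32_t)a, (uint32_t)b);
    return (int64_t)((hi << 32) | lo);
}
```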
\section{Divide Unit}
The Zip CPU also has a divide unit which can be built alongside the ALU.
This divide unit provides the Zip CPU with another two instructions that
cannot be executed in a single cycle: {\tt DIVS}, or signed divide, and
{\tt DIVU}, the unsigned divide.  These are both 32--bit divide instructions,
dividing one 32--bit number by another.  In this case, the Operand B field,
whether it be register or register plus immediate, constitutes the denominator,
whereas the numerator is given by the other register, DR.

The divide is a multi--clock instruction.  While the divide is running,
the ALU, any memory loads, and the floating point unit (if installed) will be
idle.  Once the divide completes, other units may continue.

Of course, divides can have errors: division by zero.  Division by zero causes
an exception which either switches the CPU from user mode to supervisor mode
or, if the CPU is already in supervisor mode, halts it.
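The C sketch below models only the architectural behavior described above: DR supplies the numerator, Operand B the denominator, and a zero denominator traps instead of producing a quotient. Rounding toward zero for {\tt DIVS}, and the result of dividing the most negative number by $-1$, are assumptions not stated in this specification; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     trap;       /* true on division by zero: the CPU takes an exception */
    uint32_t quotient;   /* meaningful only when trap is false */
} zip_div_result;

/* DIVU: DR / OpB, both treated as unsigned. */
static zip_div_result zip_divu(uint32_t dr, uint32_t opb)
{
    zip_div_result r = { true, 0u };

    if (opb != 0u) {
        r.trap = false;
        r.quotient = dr / opb;
        /* Sanity check: what is left over is smaller than the denominator. */
        assert(dr - r.quotient * opb < opb);
    }
    return r;
}

/* DIVS: DR / OpB, both treated as signed, rounding toward zero as C does.
 * Assumption: INT32_MIN / -1, which overflows, wraps to INT32_MIN. */
static zip_div_result zip_divs(uint32_t dr, uint32_t opb)
{
    zip_div_result r = { true, 0u };
    int32_t n = (int32_t)dr;
    int32_t d = (int32_t)opb;

    if (d == 0)
        return r;
    r.trap = false;
    if (n == INT32_MIN && d == -1)
        r.quotient = (uint32_t)n;      /* sidestep undefined behavior in C */
    else
        r.quotient = (uint32_t)(n / d);
    return r;
}
```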
\section{NOOP, BREAK, and Bus Lock Instruction}
Three instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes} are
somewhat special.  These are the {\tt NOOP}, {\tt BREAK}, and bus {\tt LOCK}
instructions.  These are encoded according to
Fig.~\ref{fig:iset-noop}, and have the following meanings:
\begin{figure}\begin{center}
\begin{bytefield}[endianness=big]{32}
\bitheader{0-31}\\
\begin{leftwordgroup}{NOOP}
\bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}
\bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{Ignored} \\
\bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}
\bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{---} \\
\bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}
\bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}
\bitbox{5}{Ignored}
\end{leftwordgroup} \\
\begin{leftwordgroup}{BREAK}
\bitbox{1}{0}\bitbox{3}{3'h7}
\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored}
\end{leftwordgroup} \\
\begin{leftwordgroup}{LOCK}
\bitbox{1}{0}\bitbox{3}{3'h7}
\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}
\end{leftwordgroup} \\
\end{bytefield}
\caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}
\end{center}\end{figure}
Line 937... |
Line 1043... |
The {\tt BREAK} instruction is useful for creating a debug instruction that
will halt the CPU without executing.  In user mode, it will either switch to
supervisor mode or halt the CPU, depending upon the setting of the break enable
bit; this lets the user choose where to do his debugging.

Finally, the {\tt LOCK} instruction was added in order to provide for
atomic operations.  The {\tt LOCK} instruction only works in pipeline mode.
It works by stalling the ALU pipeline stack until all prior stages are
filled, and then it guarantees that once a bus cycle is started, the
wishbone {\tt CYC} line will remain asserted until the LOCK is deasserted.
This allows the execution of one instruction that was waiting in the load
operands pipeline stage, and one instruction that was waiting in the
instruction decode stage.  Further, if the instruction waiting in the decode
stage was a VLIW instruction, then it may be possible to execute a third
instruction.

This was originally written to implement an atomic test and set instruction,
such as a {\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)},
where Rz is initially set.  Because the bus cycle is held across both the load
and the store, Ry receives the value at the address from before it was set to
Rz, and no other bus master can change that value in between.

Other atomic operations, such as an atomic increment, should be possible as
well by using a VLIW instruction to combine a single ALU instruction with the
store: {\tt LOCK}, {\tt LOD (Rx),Ry}, {\tt ADD 1,Ry}, {\tt STO Ry,(Rx)}.
Many of these combinations remain to be tested.
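For readers more familiar with C than with bus cycles, the {\tt LOCK} sequences above correspond to what C11 atomics call an exchange and a fetch-and-add. The sketch below uses C11's {\tt stdatomic.h} to show the same test-and-set and increment semantics, plus a spin lock built on them. It is an analogy only: it does not claim that any Zip CPU compiler lowers these calls to {\tt LOCK} sequences, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Like LOCK; LOD (Rx),Ry; STO Rz,(Rx): store rz and return the old value.
 * A lock was acquired if the old value was zero. */
static uint32_t test_and_set(_Atomic uint32_t *addr, uint32_t rz)
{
    return atomic_exchange(addr, rz);
}

/* Like LOCK; LOD (Rx),Ry; ADD 1,Ry; STO Ry,(Rx): returns the new value,
 * which is what Ry would hold afterwards. */
static uint32_t atomic_increment(_Atomic uint32_t *addr)
{
    return atomic_fetch_add(addr, 1u) + 1u;
}

static void spin_lock(_Atomic uint32_t *lock)
{
    while (test_and_set(lock, 1u) != 0u)
        ;   /* someone else holds the lock: try again */
}

static void spin_unlock(_Atomic uint32_t *lock)
{
    uint32_t was = atomic_exchange(lock, 0u);
    assert(was != 0u);   /* releasing a lock that was not held is a bug */
    (void)was;
}
```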
\section{Floating Point}
Although the Zip CPU does not (yet) have a floating point unit, the current
instruction set offers eight opcodes for floating point operations, and treats
floating point exceptions like divide by zero errors.  Once this unit is built
and integrated together with the rest of the CPU, the Zip CPU will support
32--bit floating point instructions natively.  Any 64--bit floating point
instructions will still need to be emulated in software.

Until that time, or even afterwards if the floating point unit is not installed,
floating point instructions will trigger an illegal instruction exception,
which may be trapped and then implemented in software.

\section{Derived Instructions}
The Zip CPU supports many other common instructions, but not all of them
are single cycle instructions.  The derived instruction tables,
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, \ref{tbl:derived-3}
and~\ref{tbl:derived-4},
Line 1123... |
Line 1244... |
\\\hline
{\tt STEP Rr,Rt}
& \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}
& Step a Galois implementation of a Linear Feedback Shift Register, Rr,
using taps Rt \\\hline
%
%
{\tt SEX.b Rx }
& \parbox[t]{1.5in}{\tt LSL 24,Rx \\ ASR 24,Rx}
& Sign extend a byte into a full word.\\\hline
{\tt SEX.h Rx }
& \parbox[t]{1.5in}{\tt LSL 16,Rx \\ ASR 16,Rx}
& Sign extend a half word into a full word.\\\hline
%
{\tt STO.b Rx,\$addr}
{\tt STO.b Rx,\$addr}
& \parbox[t]{1.5in}{\tt %
& \parbox[t]{1.5in}{\tt %
LDI \$addr,Ra \\
LDI \$addr,Ra \\
LDI \$addr,Rb \\
LDI \$addr,Rb \\
LSR \$2,Ra \\
LSR \$2,Ra \\