OpenCores
URL https://opencores.org/ocsvn/zipcpu/zipcpu/trunk

Subversion Repositories zipcpu

Compare Revisions

  • This comparison shows the changes necessary to convert path
    /zipcpu/trunk/doc/src
    from Rev 139 to Rev 92


/spec.tex
43,51 → 43,15
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%
% From TI about DSPs vs FPGAs:
% www.ti.com/general/docs/video/foldersGallery.tsp?bkg=gray
% &gpn=35145&familyid=1622&keyMatch=DSP Breaktime Episode Three
% &tisearch=Search-EN-Everything&DCMP=leadership
% &HQS=ep-pro-dsp-leadership-problog-150518-v-en
%
% FPGAs are annoyingly faster, cheaper, and not quite as power hungry
% as they used to be.
%
% Why would you choose DSPs over FPGAs? If you care about size,
% if you care about power, or happen to have a complicated algorithm
% that just isn't simply doing the same thing over and over
%
% For complex algorithms that change over time. Each has its strengths;
% sometimes you can use both.
%
% "No assembly required" -- TI tools all C programming, very GUI based
% environment, very little optimization by hand ...
%
%
% The FPGA's Achilles heel: Reconfigurability. It is very difficult, although
% I'm sure major vendors will tell you not impossible, to reconfigure an FPGA
% based upon the need to process time-sensitive data. If you need one of two
% algorithms, both of which will fit on the FPGA individually but not together,
% switching between them on the fly is next to impossible, whereas switching
% algorithms within a CPU is not difficult at all. For example, imagine
% receiving a packet and needing to apply one of two data algorithms on the
% packet before sending it back out, and needing to do so fast. If both
% algorithms don't fit in memory, where does the packet go when you need to
% swap one algorithm out for the other? And what is the cost of that "context"
% swap?
%
%
\documentclass{gqtekspec}
\usepackage{import}
\usepackage{bytefield} % Install via apt-get install texlive-science
\usepackage{bytefield}
% \graphicspath{{../gfx}}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.9}
\revision{Rev.~0.8}
\definecolor{webred}{rgb}{0.5,0,0}
\definecolor{webgreen}{rgb}{0,0.4,0}
\usepackage[dvips,ps2pdf,colorlinks=true,
120,8 → 84,6
copy.
\end{license}
\begin{revisionhistory}
0.9 & 4/20/2016 & Gisselquist & Modified ISA: LDIHI replaced with MPY; MPYU and MPYS replaced with MPYUHI and MPYSHI respectively. LOCK instruction now
permits an intermediate ALU operation. \\\hline
0.8 & 1/28/2016 & Gisselquist & Reduced complexity early branching \\\hline
0.7 & 12/22/2015 & Gisselquist & New Instruction Set Architecture \\\hline
0.6 & 11/17/2015 & Gisselquist & Added graphics to illustrate pipeline discussion.\\\hline
690,9 → 652,8
\end{center}\end{figure}
The basic format is that some operation, defined by the OpCode, is applied
if a condition, Cnd, is true in order to produce a result which is placed in
the destination register, or DR. The load 23--bit signed immediate instruction
(LDI) is different in that it accepts no conditions, and uses only a 4-bit
opcode.
the destination register, or DR. The Load 23--bit signed immediate instruction
is different in that it requires no conditions, and uses only a 4-bit opcode.
 
This is actually a second version of instruction set definition, given certain
lessons learned. For example, the original instruction set had the following
706,7 → 667,6
extra logic to process.
\item The instruction set wasn't very compact. One bus operation was required
for every instruction.
\item While the CPU supported multiplies, they were only 16x16 bit multiplies.
\end{enumerate}
This second version was designed with two criteria. The first was that the
new instruction set needed to be compatible, at the assembly language level,
732,18 → 692,6
possible into the VLIW format. Where necessary to place both VLIW instructions
on the same line, they will be separated by a vertical bar.
 
One belated change to the instruction set violates some of the above
principles. This change replaced the {\tt LDIHI}
instruction with a 32--bit multiply instruction, {\tt MPY}, and exchanged
the two 16--bit multiply instructions, {\tt MPYU} and {\tt MPYS}, for
{\tt MPYUHI} and {\tt MPYSHI} respectively. This creates a 32--bit
multiply capability while removing the 16--bit multiply that wasn't very
useful. Further, the {\tt LDIHI} instruction was being used primarily by the
assembler and linker to create a 32--bit load immediate pair of instructions.
That instruction pair, {\tt LDIHI} followed by {\tt LDILO}, was
replaced with an equivalent pair, {\tt BREV} followed by {\tt LDILO},
although linking has been made more complicated in the process.
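
As an illustration of how a {\tt BREV}, {\tt LDILO} pair can build a 32--bit constant, the following Python sketch models the two steps. It assumes that {\tt BREV} reverses all 32 bits of its immediate operand and that {\tt LDILO} replaces only the low 16 bits of the register. Neither detail is spelled out in this section, and the helper names ({\tt split\_constant}, {\tt load\_constant}) are invented for the example.

```python
MASK32 = 0xFFFFFFFF

def brev(value):
    """Reverse the order of all 32 bits in value (assumed BREV behaviour)."""
    result = 0
    for bit in range(32):
        if value & (1 << bit):
            result |= 1 << (31 - bit)
    return result

def ldilo(reg, imm16):
    """Replace the low 16 bits of reg with imm16, keeping the upper 16 bits."""
    return (reg & 0xFFFF0000) | (imm16 & 0xFFFF)

def split_constant(value):
    """Hypothetical helper: the (BREV immediate, LDILO immediate) pair."""
    value &= MASK32
    # Reversing the upper half moves it into the low 16 bits, so the
    # BREV immediate stays small enough for a short immediate field.
    brev_imm = brev(value & 0xFFFF0000)
    return brev_imm, value & 0xFFFF

def load_constant(value):
    """Emulate BREV imm,Rx followed by LDILO imm,Rx."""
    brev_imm, lo_imm = split_constant(value)
    reg = brev(brev_imm)       # upper half now holds value[31:16]
    return ldilo(reg, lo_imm)  # fill in value[15:0]
```

Under these assumptions, {\tt load\_constant} returns its argument unchanged for any 32--bit value, which is the property the assembler and linker would rely upon.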
 
\section{Instruction OpCodes}
With a 5--bit opcode field, there are 32 possible instructions as shown in
Tbl.~\ref{tbl:iset-opcodes}.
758,10 → 706,10
5'h05 & LSR & Logical Shift Right & \\\cline{1-3}
5'h06 & LSL & Logical Shift Left & \\\cline{1-3}
5'h07 & ASR & Arithmetic Shift Right & \\\hline
5'h08 & MPY & 32x32 bit multiply & Y \\\hline
5'h09 & LDILO & Load Immediate Low & N\\\hline
5'h0a & MPYUHI & Upper 32 of 64 bits from an unsigned 32x32 multiply & \\\cline{1-3}
5'h0b & MPYSHI & Upper 32 of 64 bits from a signed 32x32 multiply & Y \\\cline{1-3}
5'h08 & LDIHI & Load Immediate High & N \\\cline{1-3}
5'h09 & LDILO & Load Immediate Low & \\\hline
5'h0a & MPYU & Unsigned 16--bit Multiply & \\\cline{1-3}
5'h0b & MPYS & Signed 16--bit Multiply & Y \\\cline{1-3}
5'h0c & BREV & Bit Reverse & \\\cline{1-3}
5'h0d & POPC& Population Count & \\\cline{1-3}
5'h0e & ROL & Rotate left & \\\hline
781,9 → 729,6
5'h1d & FPINT & Convert to integer & \\\hline
5'h1e & & {\em Reserved for future use} &\\\hline
5'h1f & & {\em Reserved for future use} &\\\hline
5'h18 & & NOOP (A-register = PC)&\\\cline{1-3}
5'h19 & & BREAK (A-register = PC)& N\\\cline{1-3}
5'h1a & & LOCK (A-register = PC)&\\\hline
\end{tabular}
\caption{Zip CPU OpCodes}\label{tbl:iset-opcodes}
\end{center}\end{table}
808,7 → 753,7
3'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\
3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\
3'h5 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\
3'h6 & {\tt .C} & Carry set (Also known as less-than unsigned) \\
3'h6 & {\tt .C} & Carry set\\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
819,8 → 764,8
pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt
STO.NZ R0,(R1)}) As an alternative, it is often possible to reverse the
condition, thus recovering those extra two clocks. Thus instead of
\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNC label} you can issue a
\hbox{\tt CMP 1+Ry,Rx;} \hbox{\tt BC label}.
\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNV label} you can issue a
\hbox{\tt CMP Ry,Rx;} \hbox{\tt BV label}.
 
Conditionally executed instructions will not further adjust the
condition codes, with the exception of \hbox{\tt CMP} and \hbox{\tt TST}
858,8 → 803,6
Further, the first bit is given a special meaning. If the first bit is set,
the conditions apply to the second half of the instruction, otherwise the
conditions will only apply to the first half of a conditional instruction.
Of course, the other conditions are still available by mingling the
non--VLIW instructions with VLIW instructions.
 
\section{Operand B}
Many instruction forms have a 19-bit source ``Operand B'' associated with them.
907,41 → 850,6
Each of these instructions can be emulated with a set of instructions from the
existing set.
 
\section{Modifying Conditions}
A quick look at the list of conditions supported by the Zip CPU and listed
in Tbl.~\ref{tbl:conditions} reveals that the Zip CPU does not have a full set
of conditions. In particular, only one explicit unsigned condition is
supported. Therefore, Tbl.~\ref{tbl:creating-conditions}
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Original & Modified & Name \\\hline\hline
\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1
& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BLT label}
& Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}
& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BC label}
& Less-than or equal unsigned \\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label} % if (Ry > Rx) -> Rx < Ry
& \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}
& Greater-than unsigned \\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGEU label} % if (Ry >= Rx) -> Rx <= Ry -> Rx < Ry+1
& \parbox[t]{1.5in}{\tt CMP 1+Ry,Rx\\BC label}
& Greater-than or equal unsigned \\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP A+Rx,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A
& \parbox[t]{1.5in}{\tt CMP (1-A)+Ry,Rx\\BC label}
& Greater-than or equal unsigned (with offset)\\[4mm]\hline
\parbox[t]{1.5in}{\tt CMP A,Ry\\BGEU label} % if (Ry >= A) -> A <= Ry -> A-1 < Ry
& \parbox[t]{1.5in}{\tt LDI (A-1),Rx\\CMP Ry,Rx\\BC label}
& Greater-than or equal comparison with a constant\\[4mm]\hline
\end{tabular}
\caption{Modifying conditions}\label{tbl:creating-conditions}
\end{center}\end{table}
shows examples of how these unsupported conditions can be created
simply by adjusting the compare instruction, for no extra cost in clocks.
Of course, if the compare originally had an immediate within it, that immediate
would need to be loaded into a register in order to do some of these compares.
This case is shown as the last case above.
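
The rewrites in Tbl.~\ref{tbl:creating-conditions} can be checked with a small model. The Python sketch below assumes that {\tt CMP B,Rx} sets the carry flag exactly when Rx is less than B as an unsigned 32--bit number, and that a sum such as {\tt 1+Ry} is truncated to 32 bits before the compare. The function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF

def cmp_sets_carry(operand_b, reg):
    """CMP operand_b,reg: carry is set when reg < operand_b, unsigned."""
    return (reg & MASK32) < (operand_b & MASK32)

def bgtu_intended(rx, ry):
    """What CMP Rx,Ry ; BGTU label is meant to test: Ry > Rx, unsigned."""
    return (ry & MASK32) > (rx & MASK32)

def bgtu_rewritten(rx, ry):
    """CMP Ry,Rx ; BC label."""
    return cmp_sets_carry(ry, rx)

def bgeu_intended(rx, ry):
    """What CMP Rx,Ry ; BGEU label is meant to test: Ry >= Rx, unsigned."""
    return (ry & MASK32) >= (rx & MASK32)

def bgeu_rewritten(rx, ry):
    """CMP 1+Ry,Rx ; BC label, with the sum 1+Ry truncated to 32 bits."""
    return cmp_sets_carry((ry + 1) & MASK32, rx)
```

In this model the greater-than rewrite agrees with the intended test for every input. The greater-than or equal rewrite agrees whenever Ry is below {\tt 0xFFFFFFFF}; at that one value the sum {\tt 1+Ry} wraps to zero and the carry is never set, so code that may compare against the largest unsigned value would need to treat it separately if the hardware truncates the sum in this way.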
 
\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
the CPU needs access to non--supervisory registers while in supervisory mode.
967,29 → 875,14
anything marked as a user register will always be specific.
 
\section{Multiply Operations}
The Zip CPU supports two Multiply operations, a 16x16 bit signed multiply
({\tt MPYS}) and a 16x16 bit unsigned multiply ({\tt MPYU}). A 32--bit
multiply, should it be desired, needs to be created via software from this
16x16 bit multiply.
 
The ZipCPU originally only supported 16x16 multiply operations. GCC, however,
wanted 32x32-bit operations and building these from 16x16-bit multiplies
is painful. Therefore, the ZipCPU was modified to support 32x32-bit multiplies.
 
In particular, the ZipCPU supports three separate 32x32-bit multiply
instructions: {\tt MPY}, {\tt MPYUHI}, and {\tt MPYSHI}. The first of these
produces the low 32 bits of the 64--bit result of a 32x32-bit multiply. The
other two produce the upper 32 bits: {\tt MPYUHI} treats the operands as
unsigned, whereas {\tt MPYSHI} treats them as signed.
The multiply instructions execute independently of one another, although
the compiler may use them together.
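
The relationship between the three results can be sketched in Python, whose integers are unbounded, by masking to 32 bits. This is only a model of the arithmetic described above, not of the hardware, and the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret the low 32 bits of value as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def mpy(a, b):
    """Low 32 bits of the product; the same for signed and unsigned inputs."""
    return ((a & MASK32) * (b & MASK32)) & MASK32

def mpyuhi(a, b):
    """Upper 32 bits of the 64-bit unsigned product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32

def mpyshi(a, b):
    """Upper 32 bits of the 64-bit signed product."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32
```

A full 64--bit unsigned product is then {\tt (mpyuhi(a, b) << 32) | mpy(a, b)}, and the signed product uses {\tt mpyshi} for the upper half in the same way.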
 
In an effort to maintain single clock pipeline timing, all three of these
multiplies have been slowed down in logic. Thus, depending upon the setting
of {\tt OPT\_MULTIPLY} within {\tt cpudefs.v}, the multiply instructions
will either 1)~cause an ILLEGAL instruction error, 2)~take one additional clock,
or 3)~take two additional clocks.
 
 
\section{Divide Unit}
The Zip CPU also has a divide unit which can be built alongside the ALU.
This divide unit provides the Zip CPU with another two instructions that
This divide unit provides the Zip CPU with its first two instructions that
cannot be executed in a single cycle: {\tt DIVS}, or signed divide, and
{\tt DIVU}, the unsigned divide. These are both 32--bit divide instructions,
dividing one 32--bit number by another. In this case, the Operand B field,
997,8 → 890,8
whereas the numerator is given by the other register.
 
The Divide is also a multi--clock instruction. While the divide is running,
the ALU, any memory loads, and the floating point unit (if installed) will be
idle. Once the divide completes, other units may continue.
the ALU, memory unit, and floating point unit (if installed) will be idle.
Once the divide completes, other units may continue.
 
Of course, divides can have errors: division by zero. In the case of division
by zero, an exception will be caused that will send the CPU either from
1006,9 → 899,10
mode.
 
\section{NOOP, BREAK, and Bus Lock Instruction}
Three instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes} are
somewhat special. These are the {\tt NOOP}, {\tt BREAK}, and bus {\tt LOCK}
instructions. These are encoded according to
Three instructions are not listed in the opcode list in
Tbl.~\ref{tbl:iset-opcodes}, yet fit in the NOOP type instruction format of
Fig.~\ref{fig:iset-format}. These are the {\tt NOOP}, {\tt BREAK}, and
bus {\tt LOCK} instructions. These are encoded according to
Fig.~\ref{fig:iset-noop}, and have the following meanings:
\begin{figure}\begin{center}
\begin{bytefield}[endianness=big]{32}
1015,9 → 909,9
\bitheader{0-31}\\
\begin{leftwordgroup}{NOOP}
\bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}
\bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{Ignored} \\
\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored} \\
\bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}
\bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{---} \\
\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{---} \\
\bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}
\bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}
\bitbox{5}{Ignored}
1024,11 → 918,11
\end{leftwordgroup} \\
\begin{leftwordgroup}{BREAK}
\bitbox{1}{0}\bitbox{3}{3'h7}
\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored}
\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}
\end{leftwordgroup} \\
\begin{leftwordgroup}{LOCK}
\bitbox{1}{0}\bitbox{3}{3'h7}
\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}
\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{100}\bitbox{22}{Ignored}
\end{leftwordgroup} \\
\end{bytefield}
\caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}
1045,26 → 939,15
setting of the break enable bit, it will either switch to supervisor mode or
halt the CPU, depending upon where the user wishes to do his debugging.
 
Finally, the {\tt LOCK} instruction was added in order to provide for
atomic operations. The {\tt LOCK} instruction only works in pipeline mode.
It works by stalling the ALU pipeline stack until all prior stages are
filled, and then it guarantees that once a bus cycle is started, the
wishbone {\tt CYC} line will remain asserted until the LOCK is deasserted.
This allows the execution of one instruction that was waiting in the load
operands pipeline stage, and one instruction that was waiting in the
instruction decode stage. Further, if the instruction waiting in the decode
stage was a VLIW instruction, then it may be possible to execute a third
instruction.
Finally, the {\tt LOCK} instruction was added in order to make a test and
set multi--CPU operation possible. Following a LOCK instruction, the next
two instructions, if they are memory LOD/STO instructions, will execute without
dropping the wishbone {\tt CYC} line between the instructions. Thus a
{\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)}, where Rz
is initially set, can be used to set an address while guaranteeing that Ry
was the value before setting the address to Rz. This is a useful instruction
while trying to achieve concurrency among multiple CPUs.
 
This was originally written to implement an atomic test and set instruction,
such as a {\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)},
where Rz is initially set.
 
Other sequences that use a VLIW instruction to combine a single ALU instruction
with a store should be possible as well, such as an atomic increment:
{\tt LOCK}, {\tt LOD (Rx),Ry}, {\tt ADD 1,Ry}, {\tt STO Ry,(Rx)}. Many of these
combinations remain to be tested.
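
The intended effect of these {\tt LOCK} sequences can be modelled in software. In the Python sketch below, a mutex stands in for the wishbone {\tt CYC} line being held across the load and the store. The {\tt Bus} class and its method names are hypothetical, and the sketch says nothing about pipeline timing.

```python
import threading

class Bus:
    """Toy shared memory; the lock stands in for a held wishbone CYC line."""

    def __init__(self):
        self.mem = {}
        self._cyc = threading.Lock()

    def test_and_set(self, addr, rz):
        """LOCK ; LOD (Rx),Ry ; STO Rz,(Rx): return the old value, store rz."""
        with self._cyc:
            ry = self.mem.get(addr, 0)   # LOD (Rx),Ry
            self.mem[addr] = rz          # STO Rz,(Rx)
        return ry

    def atomic_increment(self, addr):
        """LOCK ; LOD (Rx),Ry ; ADD 1,Ry ; STO Ry,(Rx)."""
        with self._cyc:
            ry = (self.mem.get(addr, 0) + 1) & 0xFFFFFFFF
            self.mem[addr] = ry
        return ry
```

A caller that receives zero from {\tt test\_and\_set} knows it was the first to set the location; any other caller sees the value already stored there.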
 
\section{Floating Point}
Although the Zip CPU does not (yet) have a floating point unit, the current
instruction set offers eight opcodes for floating point operations, and treats
1073,10 → 956,6
32--bit floating point instructions natively. Any 64--bit floating point
instructions will still need to be emulated in software.
 
Until that time, or even after if the floating point unit is not installed,
floating point instructions will trigger an illegal instruction exception,
which may be trapped and then implemented in software.
 
\section{Derived Instructions}
The Zip CPU supports many other common instructions, but not all of them
are single cycle instructions. The derived instruction tables,
1246,15 → 1125,6
& \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}
& Step a Galois implementation of a Linear Feedback Shift Register, Rr,
using taps Rt \\\hline
%
%
{\tt SEX.b Rx }
& \parbox[t]{1.5in}{\tt LSL 24,Rx \\ ASR 24,Rx}
& Sign extend a byte into a full word.\\\hline
{\tt SEX.h Rx }
& \parbox[t]{1.5in}{\tt LSL 16,Rx \\ ASR 16,Rx}
& Sign extend a half word into a full word.\\\hline
%
{\tt STO.b Rx,\$addr}
& \parbox[t]{1.5in}{\tt %
LDI \$addr,Ra \\
