URL
https://opencores.org/ocsvn/zipcpu/zipcpu/trunk
Subversion Repositories zipcpu
Compare Revisions
This comparison shows the changes necessary to convert path /zipcpu/trunk/doc from Rev 107 to Rev 139.
Rev 107 → Rev 139
/spec.pdf
svn:mime-type = application/octet-stream
/src/spec.tex
43,15 → 43,51
%% |
%% |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
+%
+%
+%
+% From TI about DSPs vs FPGAs:
+% www.ti.com/general/docs/video/foldersGallery.tsp?bkg=gray
+% &gpn=35145&familyid=1622&keyMatch=DSP Breaktime Episode Three
+% &tisearch=Search-EN-Everything&DCMP=leadership
+% &HQS=ep-pro-dsp-leadership-problog-150518-v-en
+%
+% FPGAs are annoyingly faster, cheaper, and not quite as power hungry
+% as they used to be.
+%
+% Why would you choose DSPs over FPGAs? If you care about size,
+% if you care about power, or happen to have a complicated algorithm
+% that just isn't simply doing the same thing over and over.
+%
+% For complex algorithms that change over time. Each has its strengths;
+% sometimes you can use both.
+%
+% "No assembly required" -- TI tools all C programming, very GUI based
+% environment, very little optimization by hand ...
+%
+%
+% The FPGA's Achilles' heel: reconfigurability. It is very difficult, although
+% I'm sure major vendors will tell you not impossible, to reconfigure an FPGA
+% based upon the need to process time-sensitive data. If you need one of two
+% algorithms, both of which will fit on the FPGA individually but not together,
+% switching between them on the fly is next to impossible, whereas switching
+% algorithms within a CPU is not difficult at all. For example, imagine
+% receiving a packet and needing to apply one of two data algorithms to the
+% packet before sending it back out, and needing to do so fast. If both
+% algorithms don't fit in memory, where does the packet go when you need to
+% swap one algorithm out for the other? And what is the cost of that "context"
+% swap?
+%
+%
 \documentclass{gqtekspec}
 \usepackage{import}
-\usepackage{bytefield}
+\usepackage{bytefield} % Install via apt-get install texlive-science
 % \graphicspath{{../gfx}}
 \project{Zip CPU}
 \title{Specification}
 \author{Dan Gisselquist, Ph.D.}
 \email{dgisselq (at) opencores.org}
-\revision{Rev.~0.8}
+\revision{Rev.~0.9}
 \definecolor{webred}{rgb}{0.5,0,0}
 \definecolor{webgreen}{rgb}{0,0.4,0}
 \usepackage[dvips,ps2pdf,colorlinks=true,
84,6 → 120,8
 copy.
 \end{license}
 \begin{revisionhistory}
+0.9 & 4/20/2016 & Gisselquist & Modified ISA: LDIHI replaced with MPY, MPYU and MPYS replaced with MPYUHI and MPYSHI, respectively. LOCK instruction now
+permits an intermediate ALU operation. \\\hline
 0.8 & 1/28/2016 & Gisselquist & Reduced complexity early branching \\\hline
 0.7 & 12/22/2015 & Gisselquist & New Instruction Set Architecture \\\hline
 0.6 & 11/17/2015 & Gisselquist & Added graphics to illustrate pipeline discussion.\\\hline
652,8 → 690,9
 \end{center}\end{figure}
 The basic format is that some operation, defined by the OpCode, is applied
 if a condition, Cnd, is true in order to produce a result which is placed in
-the destination register, or DR. The Load 23--bit signed immediate instruction
-is different in that it requires no conditions, and uses only a 4-bit opcode.
+the destination register, or DR. The load 23--bit signed immediate instruction
+(LDI) is different in that it accepts no conditions, and uses only a 4-bit
+opcode.
 
 This is actually the second version of the instruction set definition, reflecting
 certain lessons learned. For example, the original instruction set had the following
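The LDI form carries a 23--bit signed immediate and no condition field. The sketch below shows the sign extension in C. It assumes the immediate occupies bits 22..0 of the instruction word; that field position is an assumption, not read from the figure. The helper name ldi_immediate is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the 23-bit LDI immediate to 32 bits.
 * Assumes the immediate sits in bits 22..0 of the instruction word. */
static uint32_t ldi_immediate(uint32_t insn)
{
    uint32_t imm = insn & 0x007fffffu;   /* keep bits 22..0 only */
    /* Flip the sign bit (bit 22) and subtract it back out: values with
     * bit 22 set wrap to the top of the 32-bit range (two's complement). */
    return (imm ^ 0x00400000u) - 0x00400000u;
}
```

Bits above bit 22 are ignored. A field of 0x7fffff therefore loads -1 (0xffffffff), and 0x3fffff loads the largest positive constant the instruction can hold.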
667,6 → 706,7
 extra logic to process.
 \item The instruction set wasn't very compact. One bus operation was required
 for every instruction.
+\item While the CPU supported multiplies, they were only 16x16 bit multiplies.
 \end{enumerate}
 This second version was designed with two criteria. The first was that the
 new instruction set needed to be compatible, at the assembly language level,
692,6 → 732,18
 possible into the VLIW format. Where necessary to place both VLIW instructions
 on the same line, they will be separated by a vertical bar.
 
+One belated change to the instruction set violates some of the above
+principles. This later change replaced the {\tt LDIHI}
+instruction with a 32--bit multiply instruction, {\tt MPY}, and replaced
+the two 16--bit multiply instructions {\tt MPYU} and {\tt MPYS} with
+{\tt MPYUHI} and {\tt MPYSHI}, respectively. This creates a 32--bit
+multiply capability, while removing the 16--bit multiplies that weren't very
+useful. Further, the {\tt LDIHI} instruction was being used primarily by the
+assembler and linker to create a 32--bit load immediate pair of instructions.
+This instruction pair, {\tt LDIHI} followed by {\tt LDILO}, was
+replaced with an equivalent pair, {\tt BREV} followed by {\tt LDILO},
+although linking has become more complicated in the process.
+
 \section{Instruction OpCodes}
 With a 5--bit opcode field, there are 32 possible instructions as shown in
 Tbl.~\ref{tbl:iset-opcodes}.
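The BREV/LDILO pair works because reversing the bits of a word twice returns the original word. The sketch below models the replacement sequence in C. It assumes BREV writes the bit-reversed Operand B into the destination register, and that LDILO replaces only the low 16 bits while keeping the upper 16. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the 32 bits in x (bit 0 <-> bit 31, and so on). */
static uint32_t brev32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);   /* move x's lowest bit into r */
        x >>= 1;
    }
    return r;
}

/* Assumed LDILO semantics: replace the low 16 bits, keep the upper 16. */
static uint32_t ldilo(uint32_t reg, uint16_t lo)
{
    return (reg & 0xffff0000u) | lo;
}

/* Hypothetical assembler helper: the immediate BREV must be given so that,
 * once reversed, the upper 16 bits of value land in bits 31..16 and bits
 * 15..0 are zero. The result never exceeds 16 significant bits. */
static uint32_t brev_immediate_for(uint32_t value)
{
    return brev32(value & 0xffff0000u);
}

/* Model of the two-instruction sequence: BREV #imm,Rx ; LDILO #lo,Rx */
static uint32_t load_imm32(uint32_t value)
{
    uint32_t rx = brev32(brev_immediate_for(value));  /* BREV  */
    return ldilo(rx, (uint16_t)(value & 0xffffu));    /* LDILO */
}
```

The reversed immediate fits in 16 bits, so it should fit within the 19--bit Operand B described later in the spec, assuming BREV accepts an immediate of that width. The need to bit-reverse the upper half of a relocated address is plausibly the extra linking work the text mentions.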
706,10 → 758,10
 5'h05 & LSR & Logical Shift Right & \\\cline{1-3}
 5'h06 & LSL & Logical Shift Left & \\\cline{1-3}
 5'h07 & ASR & Arithmetic Shift Right & \\\hline
-5'h08 & LDIHI & Load Immediate High & N \\\cline{1-3}
-5'h09 & LDILO & Load Immediate Low & \\\hline
-5'h0a & MPYU & Unsigned 16--bit Multiply & \\\cline{1-3}
-5'h0b & MPYS & Signed 16--bit Multiply & Y \\\cline{1-3}
+5'h08 & MPY & Lower 32 of 64 bits from a 32x32 multiply & Y \\\hline
+5'h09 & LDILO & Load Immediate Low & N\\\hline
+5'h0a & MPYUHI & Upper 32 of 64 bits from an unsigned 32x32 multiply & \\\cline{1-3}
+5'h0b & MPYSHI & Upper 32 of 64 bits from a signed 32x32 multiply & Y \\\cline{1-3}
 5'h0c & BREV & Bit Reverse & \\\cline{1-3}
 5'h0d & POPC& Population Count & \\\cline{1-3}
 5'h0e & ROL & Rotate left & \\\hline
729,6 → 781,9
 5'h1d & FPINT & Convert to integer & \\\hline
 5'h1e & & {\em Reserved for future use} &\\\hline
 5'h1f & & {\em Reserved for future use} &\\\hline
+5'h18 & & NOOP (A-register = PC)&\\\cline{1-3}
+5'h19 & & BREAK (A-register = PC)& N\\\cline{1-3}
+5'h1a & & LOCK (A-register = PC)&\\\hline
 \end{tabular}
 \caption{Zip CPU OpCodes}\label{tbl:iset-opcodes}
 \end{center}\end{table}
753,7 → 808,7
 3'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\
 3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\
 3'h5 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\
-3'h6 & {\tt .C} & Carry set\\
+3'h6 & {\tt .C} & Carry set (also known as less-than unsigned) \\
 3'h7 & {\tt .V} & Overflow set\\
 \end{tabular}
 \caption{Conditions for conditional operand execution}\label{tbl:conditions}
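The condition table can be exercised against a small flag model. This sketch assumes that CMP B,A sets the flags from A minus B, with C set on an unsigned borrow. That argument order matches the spec's own rewrite examples, where CMP Ry,Rx followed by BC branches when Rx < Ry. Only the codes 3'h3 through 3'h7 listed above are modeled. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition-code flags, named as in the condition table. */
typedef struct { bool z, c, n, v; } zip_flags;

/* Assumed CMP B,A semantics: flags reflect A - B. C is the unsigned borrow
 * (A < B) and V is the signed overflow of the subtraction. */
static zip_flags cmp_flags(uint32_t b, uint32_t a)
{
    uint32_t d = a - b;              /* unsigned wraparound is well defined */
    zip_flags f;
    f.z = (d == 0);
    f.n = (d >> 31) & 1u;
    f.c = (a < b);
    /* Overflow: operands differ in sign and the result's sign differs from a. */
    f.v = (((a ^ b) & (a ^ d)) >> 31) & 1u;
    return f;
}

/* Conditions 3'h3 .. 3'h7, exactly as the table defines them. */
static bool cond_nz(zip_flags f) { return !f.z; }
static bool cond_gt(zip_flags f) { return !f.n && !f.z; }
static bool cond_ge(zip_flags f) { return !f.n; }
static bool cond_c (zip_flags f) { return f.c; }   /* less-than unsigned */
static bool cond_v (zip_flags f) { return f.v; }
```

In this model, .GT and .GE test N (and Z) alone, as listed. They therefore follow the signed ordering only when the subtraction does not overflow; the V flag is what reports that case.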
764,8 → 819,8
 pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt
 STO.NZ R0,(R1)}) As an alternative, it is often possible to reverse the
 condition, and thus recover those two extra clocks. Thus, instead of
-\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNV label} you can issue a
-\hbox{\tt CMP Ry,Rx;} \hbox{\tt BV label}.
+\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNC label} you can issue a
+\hbox{\tt CMP 1+Ry,Rx;} \hbox{\tt BC label}.
 
 Conditionally executed instructions will not further adjust the
 condition codes, with the exception of \hbox{\tt CMP} and \hbox{\tt TST}
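The reversed-condition trick can be checked arithmetically. This sketch assumes that CMP B,A sets C exactly when A < B as unsigned values. It also assumes the immediate in "1+Ry" is added modulo 2^32. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: CMP B,A sets C when A < B (unsigned), i.e. when A - B borrows. */
static bool cmp_carry(uint32_t b, uint32_t a) { return a < b; }

/* Original:  CMP Rx,Ry ; BNC label  -> taken when C is clear (Ry >= Rx). */
static bool bnc_original(uint32_t rx, uint32_t ry)
{
    return !cmp_carry(rx, ry);
}

/* Rewrite:   CMP 1+Ry,Rx ; BC label -> taken when Rx < Ry+1.
 * Ry + 1 wraps modulo 2^32, just as the 32-bit adder would. */
static bool bc_rewritten(uint32_t rx, uint32_t ry)
{
    return cmp_carry(ry + 1u, rx);
}
```

For every Ry below 0xFFFFFFFF, Rx < Ry+1 is the same test as Ry >= Rx, so the two sequences agree. When Ry is 0xFFFFFFFF, Ry+1 wraps to zero and the rewrite never branches, although the original always would.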
803,6 → 858,8
 Further, the first bit is given a special meaning. If the first bit is set,
 the conditions apply to the second half of the instruction; otherwise the
 conditions will only apply to the first half of a conditional instruction.
+Of course, the other conditions are still available by mixing
+non--VLIW instructions with VLIW instructions.
 
 \section{Operand B}
 Many instruction forms have a 19-bit source ``Operand B'' associated with them.
850,6 → 907,41
 Each of these instructions can be emulated with a set of instructions from the
 existing set.
 
+\section{Modifying Conditions}
+A quick look at the list of conditions supported by the Zip CPU and listed
+in Tbl.~\ref{tbl:conditions} reveals that the Zip CPU does not have a full set
+of conditions. In particular, only one explicit unsigned condition is
+supported. Therefore, Tbl.~\ref{tbl:creating-conditions}
+\begin{table}\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+Original & Modified & Name \\\hline\hline
+\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1
+& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BLT label}
+& Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline
+\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}
+& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BC label}
+& Less-than or equal unsigned \\[4mm]\hline
+\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label} % if (Ry > Rx) -> Rx < Ry
+& \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}
+& Greater-than unsigned \\[4mm]\hline
+\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGEU label} % if (Ry >= Rx) -> Rx <= Ry -> Rx < Ry+1
+& \parbox[t]{1.5in}{\tt CMP 1+Ry,Rx\\BC label}
+& Greater-than or equal unsigned \\[4mm]\hline
+\parbox[t]{1.5in}{\tt CMP A+Rx,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A
+& \parbox[t]{1.5in}{\tt CMP (1-A)+Ry,Rx\\BC label}
+& Greater-than or equal unsigned (with offset)\\[4mm]\hline
+\parbox[t]{1.5in}{\tt CMP A,Ry\\BGEU label} % if (Ry >= A) -> (A-1) < Ry
+& \parbox[t]{1.5in}{\tt LDI (A-1),Rx\\CMP Ry,Rx\\BC label}
+& Greater-than or equal unsigned, against a constant\\[4mm]\hline
+\end{tabular}
+\caption{Modifying conditions}\label{tbl:creating-conditions}
+\end{center}\end{table}
+shows examples of how these unsupported conditions can be created
+simply by adjusting the compare instruction, at no extra cost in clocks.
+Of course, if the compare originally had an immediate within it, that immediate
+would need to be loaded into a register in order to do some of these compares.
+This case is shown as the last row of the table.
+
 \section{Move Operands}
 The previous set of operands would be perfect and complete, save only that
 the CPU needs access to non--supervisory registers while in supervisory mode.
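The unsigned rows of the modifying-conditions table can be checked the same way. The signed BLE row is left out because the exact definition of BLT is not given in this section. This sketch assumes CMP B,A sets C when A < B (unsigned), and that immediate offsets such as "1+Rx" wrap modulo 2^32. Each *_orig function is the branch condition of the original sequence, and each *_new function is that of the rewrite. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: CMP B,A sets C when A < B as unsigned 32-bit values. */
static bool carry_after_cmp(uint32_t b, uint32_t a) { return a < b; }

/* CMP Rx,Ry ; BLEU    vs    CMP 1+Rx,Ry ; BC */
static bool bleu_orig(uint32_t rx, uint32_t ry) { return ry <= rx; }
static bool bleu_new (uint32_t rx, uint32_t ry) { return carry_after_cmp(rx + 1u, ry); }

/* CMP Rx,Ry ; BGTU    vs    CMP Ry,Rx ; BC */
static bool bgtu_orig(uint32_t rx, uint32_t ry) { return ry > rx; }
static bool bgtu_new (uint32_t rx, uint32_t ry) { return carry_after_cmp(ry, rx); }

/* CMP Rx,Ry ; BGEU    vs    CMP 1+Ry,Rx ; BC */
static bool bgeu_orig(uint32_t rx, uint32_t ry) { return ry >= rx; }
static bool bgeu_new (uint32_t rx, uint32_t ry) { return carry_after_cmp(ry + 1u, rx); }

/* CMP A,Ry ; BGEU     vs    LDI (A-1),Rx ; CMP Ry,Rx ; BC */
static bool bgeu_const_orig(uint32_t a, uint32_t ry) { return ry >= a; }
static bool bgeu_const_new (uint32_t a, uint32_t ry)
{
    uint32_t rx = a - 1u;              /* LDI (A-1),Rx */
    return carry_after_cmp(ry, rx);    /* CMP Ry,Rx ; BC */
}
```

Every rewrite that adds or subtracts a constant inherits a wraparound edge case. BLEU fails when Rx is 0xFFFFFFFF, BGEU fails when Ry is 0xFFFFFFFF, and the constant form fails when A is zero. The BGTU rewrite only swaps operands and is exact.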
875,14 → 967,29
 anything marked as a user register will always be specific.
 
 \section{Multiply Operations}
-The Zip CPU supports two Multiply operations, a 16x16 bit signed multiply
-({\tt MPYS}) and a 16x16 bit unsigned multiply ({\tt MPYU}). A 32--bit
-multiply, should it be desired, needs to be created via software from this
-16x16 bit multiply.
 
+The ZipCPU originally only supported 16x16 multiply operations. GCC, however,
+wanted 32x32-bit operations, and building these from 16x16-bit multiplies
+is painful. Therefore, the ZipCPU was modified to support 32x32-bit multiplies.
+
+In particular, the ZipCPU supports three separate 32x32-bit multiply
+instructions: {\tt MPY}, {\tt MPYUHI}, and {\tt MPYSHI}. The first of these
+produces the low 32 bits of a 32x32-bit multiply result. The other two
+produce the upper 32 bits: {\tt MPYUHI} treats its operands as unsigned,
+whereas {\tt MPYSHI} treats them as signed.
+The three instructions execute independently of one another, although
+the compiler may well use them together.
+
+In an effort to maintain single clock pipeline timing, all three of these
+multiplies have been spread across additional clocks. Thus, depending upon the
+setting of {\tt OPT\_MULTIPLY} within {\tt cpudefs.v}, the multiply instructions
+will either 1)~cause an ILLEGAL instruction error, 2)~take one additional clock,
+or 3)~take two additional clocks.
+
+
 \section{Divide Unit}
 The Zip CPU also has a divide unit which can be built alongside the ALU.
-This divide unit provides the Zip CPU with its first two instructions that
+This divide unit provides the Zip CPU with another two instructions that
 cannot be executed in a single cycle: {\tt DIVS}, or signed divide, and
 {\tt DIVU}, the unsigned divide. These are both 32--bit divide instructions,
 dividing one 32--bit number by another. In this case, the Operand B field,
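The three multiply instructions are easiest to pin down by what they compute. The minimal C model below assumes a build in which OPT_MULTIPLY enables the multiplier, so the setting changes only the clock count. It also assumes two's-complement conversion from uint32_t to int32_t, as on mainstream compilers. The function names mirror the mnemonics but are otherwise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MPY: low 32 bits of the 64-bit product. In two's complement the low
 * half is identical for signed and unsigned operands. */
static uint32_t mpy(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* MPYUHI: upper 32 bits, treating both operands as unsigned. */
static uint32_t mpyuhi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* MPYSHI: upper 32 bits, treating both operands as signed. The 64-bit
 * product of two 32-bit signed values cannot overflow. */
static uint32_t mpyshi(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}
```

A full 64-bit unsigned product is the pair MPYUHI:MPY of the same operands. This is presumably how a compiler would combine them.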
890,8 → 997,8
 whereas the numerator is given by the other register.
 
 The Divide is also a multi--clock instruction. While the divide is running,
-the ALU, memory unit, and floating point unit (if installed) will be idle.
-Once the divide completes, other units may continue.
+the ALU, any memory loads, and the floating point unit (if installed) will be
+idle. Once the divide completes, other units may continue.
 
 Of course, divides can have errors: division by zero. In the case of division
 by zero, an exception will be caused that will send the CPU either from
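The result-level behaviour of the two divide instructions can be sketched in C. A false ok field stands in for the divide-by-zero exception. C's signed division truncates toward zero. Whether DIVS does the same, and what it returns for INT32_MIN divided by -1, is not stated in this section, so the sketch rejects that case rather than guess. The types and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool ok; uint32_t q; } divu_result;
typedef struct { bool ok; int32_t  q; } divs_result;

/* Hypothetical DIVU model: ok == false stands in for the exception. */
static divu_result divu(uint32_t num, uint32_t den)
{
    divu_result r = { false, 0u };
    if (den == 0u)
        return r;                 /* the CPU would raise an exception here */
    r.ok = true;
    r.q = num / den;
    return r;
}

/* Hypothetical DIVS model. INT32_MIN / -1 is rejected because its result
 * is not specified here (and the C expression would overflow). */
static divs_result divs(int32_t num, int32_t den)
{
    divs_result r = { false, 0 };
    if (den == 0)
        return r;                 /* divide-by-zero exception */
    if (num == INT32_MIN && den == -1)
        return r;                 /* unspecified in this section */
    r.ok = true;
    r.q = num / den;              /* C truncates toward zero */
    return r;
}
```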
899,10 → 1006,9
 mode.
 
 \section{NOOP, BREAK, and Bus Lock Instructions}
-Three instructions are not listed in the opcode list in
-Tbl.~\ref{tbl:iset-opcodes}, yet fit in the NOOP type instruction format of
-Fig.~\ref{fig:iset-format}. These are the {\tt NOOP}, {\tt Break}, and
-bus {\tt LOCK} instructions. These are encoded according to
+Three instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes} are
+somewhat special. These are the {\tt NOOP}, {\tt Break}, and bus {\tt LOCK}
+instructions. These are encoded according to
 Fig.~\ref{fig:iset-noop}, and have the following meanings:
 \begin{figure}\begin{center}
 \begin{bytefield}[endianness=big]{32}
909,9 → 1015,9
 \bitheader{0-31}\\
 \begin{leftwordgroup}{NOOP}
 \bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}
-\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored} \\
+\bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{Ignored} \\
 \bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}
-\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{---} \\
+\bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{---} \\
 \bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}
 \bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}
 \bitbox{5}{Ignored}
918,11 → 1024,11
 \end{leftwordgroup} \\
 \begin{leftwordgroup}{BREAK}
 \bitbox{1}{0}\bitbox{3}{3'h7}
-\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}
+\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored}
 \end{leftwordgroup} \\
 \begin{leftwordgroup}{LOCK}
 \bitbox{1}{0}\bitbox{3}{3'h7}
-\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{100}\bitbox{22}{Ignored}
+\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}
 \end{leftwordgroup} \\
 \end{bytefield}
 \caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}
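The non-VLIW encodings in the figure can be recognised with a few shifts and masks. This sketch reads the layout as: bit 31 clear, bits 30..28 equal to 3'h7, and bits 26..22 holding the 5--bit opcode. Those opcodes are 5'h18 for NOOP, 5'h19 for BREAK and 5'h1a for LOCK, which matches the "11" plus 3--bit fields drawn above. Bit 27, left blank in the figure, is not examined, and neither are the ignored bits 21..0. VLIW words are not decoded. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SPECIAL_NONE, SPECIAL_NOOP, SPECIAL_BREAK, SPECIAL_LOCK } special_op;

/* Hypothetical decoder for the non-VLIW NOOP/BREAK/LOCK formats. */
static special_op decode_special(uint32_t insn)
{
    if ((insn >> 31) != 0u)
        return SPECIAL_NONE;             /* VLIW word: not handled here */
    if (((insn >> 28) & 0x7u) != 0x7u)
        return SPECIAL_NONE;             /* register field is not 3'h7 */
    switch ((insn >> 22) & 0x1fu) {      /* 5-bit opcode in bits 26..22 */
    case 0x18: return SPECIAL_NOOP;
    case 0x19: return SPECIAL_BREAK;
    case 0x1a: return SPECIAL_LOCK;
    default:   return SPECIAL_NONE;
    }
}
```

Under these assumptions, the canonical words are 0x76000000 (NOOP), 0x76400000 (BREAK) and 0x76800000 (LOCK). Any value in bits 21..0 decodes the same way.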
939,15 → 1045,26
 setting of the break enable bit, it will either switch to supervisor mode or
 halt the CPU, depending upon where the user wishes to do his debugging.
 
-Finally, the {\tt LOCK} instruction was added in order to make a test and
-set multi--CPU operation possible. Following a LOCK instruction, the next
-two instructions, if they are memory LOD/STO instructions, will execute without
-dropping the wishbone {\tt CYC} line between the instructions. Thus a
-{\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)}, where Rz
-is initially set, can be used to set an address while guaranteeing that Ry
-was the value before setting the address to Rz. This is a useful instruction
-while trying to achieve concurrency among multiple CPU's.
+Finally, the {\tt LOCK} instruction was added in order to provide for
+atomic operations. The {\tt LOCK} instruction only works in pipeline mode.
+It works by stalling the ALU pipeline stage until all prior stages are
+filled, and then it guarantees that once a bus cycle is started, the
+wishbone {\tt CYC} line will remain asserted until the LOCK is deasserted.
+This allows the execution of one instruction that was waiting in the load
+operands pipeline stage, and one instruction that was waiting in the
+instruction decode stage. Further, if the instruction waiting in the decode
+stage was a VLIW instruction, then it may be possible to execute a third
+instruction.
+
+This was originally written to implement an atomic test-and-set operation,
+such as a {\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)},
+where Rz is initially set.
+
+Other atomic operations should be possible as well, using a VLIW instruction
+that combines a single ALU instruction with a store. An atomic increment,
+for example, is {\tt LOCK}, {\tt LOD (Rx),Ry}, {\tt ADD 1,Ry}, {\tt STO Ry,(Rx)}.
+Many of these combinations remain to be tested.
 
 \section{Floating Point}
 Although the Zip CPU does not (yet) have a floating point unit, the current
 instruction set offers eight opcodes for floating point operations, and treats
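The two locked sequences have well-known software counterparts. The sketch below is a host-side C11 analogue using the standard <stdatomic.h> operations. It is not ZipCPU code and does not model the wishbone CYC mechanism; it only shows the values each sequence is meant to produce. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* LOCK ; LOD (Rx),Ry ; STO Rz,(Rx)
 * Returns the old memory value (Ry) and leaves Rz in memory. */
static uint32_t test_and_set(_Atomic uint32_t *addr, uint32_t rz)
{
    return atomic_exchange(addr, rz);
}

/* LOCK ; LOD (Rx),Ry ; ADD 1,Ry ; STO Ry,(Rx)
 * Returns the incremented value, as Ry would hold after the ADD. */
static uint32_t atomic_increment(_Atomic uint32_t *addr)
{
    return atomic_fetch_add(addr, 1u) + 1u;
}
```

With a lock word initially zero, the first test_and_set(&w, 1) returns 0 and acquires the lock. Every later call returns 1 until the owner stores 0 again.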
956,6 → 1073,10
 32--bit floating point instructions natively. Any 64--bit floating point
 instructions will still need to be emulated in software.
 
+Until that time, or even afterwards if the floating point unit is not installed,
+floating point instructions will trigger an illegal instruction exception,
+which may be trapped and then implemented in software.
+
 \section{Derived Instructions}
 The Zip CPU supports many other common instructions, but not all of them
 are single cycle instructions. The derived instruction tables,
1125,6 → 1246,15
 & \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}
 & Step a Galois implementation of a Linear Feedback Shift Register, Rr,
 using taps Rt \\\hline
+%
+%
+{\tt SEX.b Rx }
+& \parbox[t]{1.5in}{\tt LSL 24,Rx \\ ASR 24,Rx}
+& Sign extend a byte into a full word.\\\hline
+{\tt SEX.h Rx }
+& \parbox[t]{1.5in}{\tt LSL 16,Rx \\ ASR 16,Rx}
+& Sign extend a half word into a full word.\\\hline
+%
 {\tt STO.b Rx,\$addr}
 & \parbox[t]{1.5in}{\tt %
 LDI \$addr,Ra \\
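The derived sign-extension and LFSR sequences translate directly into C. This sketch performs ASR on an unsigned word, because C leaves right shifts of negative signed values implementation-defined. It also assumes that LSR leaves the bit shifted out of bit 0 in the carry flag, which the XOR.C step relies on. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right: copy bit 31 into the vacated high bits.
 * Requires 0 < n < 32. */
static uint32_t asr32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    uint32_t fill = (x & 0x80000000u) ? ~(0xffffffffu >> n) : 0u;
    return (x >> n) | fill;
}

/* SEX.b Rx  ==  LSL 24,Rx ; ASR 24,Rx */
static uint32_t sex_b(uint32_t rx) { return asr32(rx << 24, 24); }

/* SEX.h Rx  ==  LSL 16,Rx ; ASR 16,Rx */
static uint32_t sex_h(uint32_t rx) { return asr32(rx << 16, 16); }

/* One Galois LFSR step:  LSR $1,Rr ; XOR.C Rt,Rr */
static uint32_t lfsr_step(uint32_t rr, uint32_t rt)
{
    uint32_t carry = rr & 1u;   /* bit that LSR shifts out into C */
    rr >>= 1;                   /* LSR $1,Rr */
    if (carry)
        rr ^= rt;               /* XOR.C Rt,Rr: only when C is set */
    return rr;
}
```

The left shift discards everything above the byte or half word, and the arithmetic right shift then copies that field's top bit back across the upper bits. For example, 0x80 becomes 0xffffff80 and 0x7f stays 0x7f.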