OpenCores

Rev 92	Rev 139
Line 41...	Line 41...
`%% License: GPL, v3, as defined and found on www.gnu.org,`	`%% License: GPL, v3, as defined and found on www.gnu.org,`
`%% http://www.gnu.org/licenses/gpl.html`	`%% http://www.gnu.org/licenses/gpl.html`
`%%`	`%%`
`%%`	`%%`
`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`	`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`
	`%`
	`%`
	`%`
	`% From TI about DSPs vs FPGAs:`
	`% www.ti.com/general/docs/video/foldersGallery.tsp?bkg=gray`
	`% &gpn=35145&familyid=1622&keyMatch=DSP Breaktime Episode Three`
	`% &tisearch=Search-EN-Everything&DCMP=leadership`
	`% &HQS=ep-pro-dsp-leadership-problog-150518-v-en`
	`%`
	`% FPGA's are annoyingly faster, cheaper, and not quite as power hungry`
	`% as they used to be.`
	`%`
	`% Why would you choose DSPs over FPGAs? If you care about size,`
	`% if you care about power, or happen to have a complicated algorithm`
	`% that just isn't simply doing the same thing over and over`
	`%`
	`% For complex algorithms that change over time. Each have their strengths`
	`% sometimes you can use both.`
	`%`
	`% "No assembly required" -- TI tools all C programming, very GUI based`
	`% environment, very little optimization by hand ...`
	`%`
	`%`
	`% The FPGA's achilles heel: Reconfigurability. It is very difficult, although`
	`% I'm sure major vendors will tell you not impossible, to reconfigure an FPGA`
	`% based upon the need to process time-sensitive data. If you need one of two`
	`% algorithms, both which will fit on the FPGA individually but not together,`
	`% switching between them on the fly is next to impossible, whereas switching`
	`% algorithm within a CPU is not difficult at all. For example, imagine`
	`% receiving a packet and needing to apply one of two data algorithms on the`
	`% packet before sending it back out, and needing to do so fast. If both`
	`% algorithms don't fit in memory, where does the packet go when you need to`
	`% swap one algorithm out for the other? And what is the cost of that "context"`
	`% swap?`
	`%`
	`%`
`\documentclass{gqtekspec}`	`\documentclass{gqtekspec}`
`\usepackage{import}`	`\usepackage{import}`
`\usepackage{bytefield}`	`\usepackage{bytefield} % Install via apt-get install texlive-science`
`% \graphicspath{{../gfx}}`	`% \graphicspath{{../gfx}}`
`\project{Zip CPU}`	`\project{Zip CPU}`
`\title{Specification}`	`\title{Specification}`
`\author{Dan Gisselquist, Ph.D.}`	`\author{Dan Gisselquist, Ph.D.}`
`\email{dgisselq (at) opencores.org}`	`\email{dgisselq (at) opencores.org}`
`\revision{Rev.~0.8}`	`\revision{Rev.~0.9}`
`\definecolor{webred}{rgb}{0.5,0,0}`	`\definecolor{webred}{rgb}{0.5,0,0}`
`\definecolor{webgreen}{rgb}{0,0.4,0}`	`\definecolor{webgreen}{rgb}{0,0.4,0}`
`\usepackage[dvips,ps2pdf,colorlinks=true,`	`\usepackage[dvips,ps2pdf,colorlinks=true,`
`anchorcolor=black,pdfpagelabels,hypertexnames,`	`anchorcolor=black,pdfpagelabels,hypertexnames,`
`pdfauthor={Dan Gisselquist},`	`pdfauthor={Dan Gisselquist},`
Line 82...	Line 118...
`You should have received a copy of the GNU General Public License along`	`You should have received a copy of the GNU General Public License along`
`with this program. If not, see \hbox{<http://www.gnu.org/licenses/>} for a`	`with this program. If not, see \hbox{<http://www.gnu.org/licenses/>} for a`
`copy.`	`copy.`
`\end{license}`	`\end{license}`
`\begin{revisionhistory}`	`\begin{revisionhistory}`
	`0.9 & 4/20/2016 & Gisselquist & Modified ISA: LDIHI replaced with MPY, MPYU and MPYS replaced with MPYUHI, and MPYSHI respectively. LOCK instruction now`
	`permits an intermediate ALU operation. \\\hline`
`0.8 & 1/28/2016 & Gisselquist & Reduced complexity early branching \\\hline`	`0.8 & 1/28/2016 & Gisselquist & Reduced complexity early branching \\\hline`
`0.7 & 12/22/2015 & Gisselquist & New Instruction Set Architecture \\\hline`	`0.7 & 12/22/2015 & Gisselquist & New Instruction Set Architecture \\\hline`
`0.6 & 11/17/2015 & Gisselquist & Added graphics to illustrate pipeline discussion.\\\hline`	`0.6 & 11/17/2015 & Gisselquist & Added graphics to illustrate pipeline discussion.\\\hline`
`0.5 & 9/29/2015 & Gisselquist & Added pipelined memory access discussion.\\\hline`	`0.5 & 9/29/2015 & Gisselquist & Added pipelined memory access discussion.\\\hline`
`0.4 & 9/19/2015 & Gisselquist & Added DMA controller, improved stall information, and self--assessment info.\\\hline`	`0.4 & 9/19/2015 & Gisselquist & Added DMA controller, improved stall information, and self--assessment info.\\\hline`
Line 650...	Line 688...
`\end{bytefield}`	`\end{bytefield}`
`\caption{Zip Instruction Set Format}\label{fig:iset-format}`	`\caption{Zip Instruction Set Format}\label{fig:iset-format}`
`\end{center}\end{figure}`	`\end{center}\end{figure}`
`The basic format is that some operation, defined by the OpCode, is applied`	`The basic format is that some operation, defined by the OpCode, is applied`
`if a condition, Cnd, is true in order to produce a result which is placed in`	`if a condition, Cnd, is true in order to produce a result which is placed in`
`the destination register, or DR. The Load 23--bit signed immediate instruction`	`the destination register, or DR. The load 23--bit signed immediate instruction`
`is different in that it requires no conditions, and uses only a 4-bit opcode.`	`(LDI) is different in that it accepts no conditions, and uses only a 4-bit`
	`opcode.`

`This is actually a second version of the instruction set definition, given certain`	`This is actually a second version of the instruction set definition, given certain`
`lessons learned. For example, the original instruction set had the following`	`lessons learned. For example, the original instruction set had the following`
`problems:`	`problems:`
`\begin{enumerate}`	`\begin{enumerate}`
Line 665...	Line 704...
`require extra logic to use.`	`require extra logic to use.`
`\item The carveouts for instructions such as NOOP and LDIHI/LDILO required`	`\item The carveouts for instructions such as NOOP and LDIHI/LDILO required`
`extra logic to process.`	`extra logic to process.`
`\item The instruction set wasn't very compact. One bus operation was required`	`\item The instruction set wasn't very compact. One bus operation was required`
`for every instruction.`	`for every instruction.`
	`\item While the CPU supported multiplies, they were only 16x16 bit multiplies.`
`\end{enumerate}`	`\end{enumerate}`
`This second version was designed with two criteria. The first was that the`	`This second version was designed with two criteria. The first was that the`
`new instruction set needed to be compatible, at the assembly language level,`	`new instruction set needed to be compatible, at the assembly language level,`
`with the previous instruction set. Thus, it must be able to support all of`	`with the previous instruction set. Thus, it must be able to support all of`
`the previous mnemonics and more. This was achieved with the sole exception`	`the previous mnemonics and more. This was achieved with the sole exception`
Line 690...	Line 730...
`to interrupt mode in between the two instructions. Likewise a new job given`	`to interrupt mode in between the two instructions. Likewise a new job given`
`to the assembler is that of automatically packing as many instructions as`	`to the assembler is that of automatically packing as many instructions as`
`possible into the VLIW format. Where necessary to place both VLIW instructions`	`possible into the VLIW format. Where necessary to place both VLIW instructions`
`on the same line, they will be separated by a vertical bar.`	`on the same line, they will be separated by a vertical bar.`

	`One belated change to the instruction set violates some of the above`
	`principles. This later change replaced the {\tt LDIHI} instruction with a`
	`32--bit multiply instruction, {\tt MPY}, and replaced the two 16--bit`
	`multiply instructions, {\tt MPYU} and {\tt MPYS}, with {\tt MPYUHI} and`
	`{\tt MPYSHI} respectively. This creates a 32--bit multiply capability,`
	`while removing the 16--bit multiply that wasn't very useful. Further, the`
	`{\tt LDIHI} instruction was being used primarily by the assembler and linker`
	`to create a 32--bit load immediate from a pair of instructions. That pair,`
	`{\tt LDIHI} followed by {\tt LDILO}, was replaced with an equivalent pair,`
	`{\tt BREV} followed by {\tt LDILO}, although linking has been made more`
	`complicated in the process.`

`\section{Instruction OpCodes}`	`\section{Instruction OpCodes}`
`With a 5--bit opcode field, there are 32--possible instructions as shown in`	`With a 5--bit opcode field, there are 32--possible instructions as shown in`
`Tbl.~\ref{tbl:iset-opcodes}.`	`Tbl.~\ref{tbl:iset-opcodes}.`
`\begin{table}\begin{center}`	`\begin{table}\begin{center}`
`\begin{tabular}{\|l\|l\|l\|c\|} \hline \rowcolor[gray]{0.85}`	`\begin{tabular}{\|l\|l\|l\|c\|} \hline \rowcolor[gray]{0.85}`
Line 704...	Line 756...
`5'h03 & OR & Bitwise Or & Y \\\cline{1-3}`	`5'h03 & OR & Bitwise Or & Y \\\cline{1-3}`
`5'h04 & XOR & Bitwise Exclusive Or & \\\cline{1-3}`	`5'h04 & XOR & Bitwise Exclusive Or & \\\cline{1-3}`
`5'h05 & LSR & Logical Shift Right & \\\cline{1-3}`	`5'h05 & LSR & Logical Shift Right & \\\cline{1-3}`
`5'h06 & LSL & Logical Shift Left & \\\cline{1-3}`	`5'h06 & LSL & Logical Shift Left & \\\cline{1-3}`
`5'h07 & ASR & Arithmetic Shift Right & \\\hline`	`5'h07 & ASR & Arithmetic Shift Right & \\\hline`
`5'h08 & LDIHI & Load Immediate High & N \\\cline{1-3}`	`5'h08 & MPY & 32x32 bit multiply & Y \\\hline`
`5'h09 & LDILO & Load Immediate Low & \\\hline`	`5'h09 & LDILO & Load Immediate Low & N\\\hline`
`5'h0a & MPYU & Unsigned 16--bit Multiply & \\\cline{1-3}`	`5'h0a & MPYUHI & Upper 32 of 64 bits from an unsigned 32x32 multiply & \\\cline{1-3}`
`5'h0b & MPYS & Signed 16--bit Multiply & Y \\\cline{1-3}`	`5'h0b & MPYSHI & Upper 32 of 64 bits from a signed 32x32 multiply & Y \\\cline{1-3}`
`5'h0c & BREV & Bit Reverse & \\\cline{1-3}`	`5'h0c & BREV & Bit Reverse & \\\cline{1-3}`
`5'h0d & POPC& Population Count & \\\cline{1-3}`	`5'h0d & POPC& Population Count & \\\cline{1-3}`
`5'h0e & ROL & Rotate left & \\\hline`	`5'h0e & ROL & Rotate left & \\\hline`
`5'h0f & MOV & Move register & N \\\hline`	`5'h0f & MOV & Move register & N \\\hline`
`5'h10 & CMP & Compare & Y \\\cline{1-3}`	`5'h10 & CMP & Compare & Y \\\cline{1-3}`
Line 727...	Line 779...
`5'h1b & FPDIV & Floating point divide & \\\cline{1-3}`	`5'h1b & FPDIV & Floating point divide & \\\cline{1-3}`
`5'h1c & FPCVT & Convert integer to floating point & \\\cline{1-3}`	`5'h1c & FPCVT & Convert integer to floating point & \\\cline{1-3}`
`5'h1d & FPINT & Convert to integer & \\\hline`	`5'h1d & FPINT & Convert to integer & \\\hline`
`5'h1e & & {\em Reserved for future use} &\\\hline`	`5'h1e & & {\em Reserved for future use} &\\\hline`
`5'h1f & & {\em Reserved for future use} &\\\hline`	`5'h1f & & {\em Reserved for future use} &\\\hline`
	`5'h18 & & NOOP (A-register = PC)&\\\cline{1-3}`
	`5'h19 & & BREAK (A-register = PC)& N\\\cline{1-3}`
	`5'h1a & & LOCK (A-register = PC)&\\\hline`
`\end{tabular}`	`\end{tabular}`
`\caption{Zip CPU OpCodes}\label{tbl:iset-opcodes}`	`\caption{Zip CPU OpCodes}\label{tbl:iset-opcodes}`
`\end{center}\end{table}`	`\end{center}\end{table}`
`%`	`%`
`Of these opcodes, the {\tt BREV} and {\tt POPC} are experimental, and may be`	`Of these opcodes, the {\tt BREV} and {\tt POPC} are experimental, and may be`
Line 751...	Line 806...
`3'h1 & {\tt .LT} & Less than ('N' set) \\`	`3'h1 & {\tt .LT} & Less than ('N' set) \\`
`3'h2 & {\tt .Z} & Only execute when 'Z' is set \\`	`3'h2 & {\tt .Z} & Only execute when 'Z' is set \\`
`3'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\`	`3'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\`
`3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\`	`3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\`
`3'h5 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\`	`3'h5 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\`
`3'h6 & {\tt .C} & Carry set\\`	`3'h6 & {\tt .C} & Carry set (Also known as less-than unsigned) \\`
`3'h7 & {\tt .V} & Overflow set\\`	`3'h7 & {\tt .V} & Overflow set\\`
`\end{tabular}`	`\end{tabular}`
`\caption{Conditions for conditional operand execution}\label{tbl:conditions}`	`\caption{Conditions for conditional operand execution}\label{tbl:conditions}`
`\end{center}\end{table}`	`\end{center}\end{table}`
`There is no condition code for less than or equal, not C or not V---there`	`There is no condition code for less than or equal, not C or not V---there`
`just wasn't enough space in 3--bits. Conditioning on a non--supported`	`just wasn't enough space in 3--bits. Conditioning on a non--supported`
`condition is still possible, but it will take an extra instruction and a`	`condition is still possible, but it will take an extra instruction and a`
`pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt`	`pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt`
`STO.NZ R0,(R1)}) As an alternative, it is often possible to reverse the`	`STO.NZ R0,(R1)}) As an alternative, it is often possible to reverse the`
`condition, thus recovering those extra two clocks. Thus instead of`	`condition, thus recovering those extra two clocks. Thus instead of`
`\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNV label} you can issue a`	`\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNC label} you can issue a`
`\hbox{\tt CMP Ry,Rx;} \hbox{\tt BV label}.`	`\hbox{\tt CMP 1+Ry,Rx;} \hbox{\tt BC label}.`
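
The condition table above can be read as a small lookup from the 3--bit condition field to a test on the N, Z, C and V flags. The following is only an illustrative sketch of that reading: the function name `condition_met` and its argument order are hypothetical, and only the codes 3'h1 through 3'h7 listed in this excerpt are covered.

```python
# Illustrative model of the condition field, following the table above.
# Only codes 3'h1..3'h7 appear in this excerpt; any other code is rejected
# rather than guessed at.

def condition_met(cnd, n, z, c, v):
    """Return True if an instruction carrying condition `cnd` would execute.

    n, z, c and v are the boolean N, Z, C and V flags.
    """
    table = {
        0x1: n,                      # .LT : 'N' set
        0x2: z,                      # .Z  : 'Z' set
        0x3: not z,                  # .NZ : 'Z' not set
        0x4: (not n) and (not z),    # .GT : 'N' not set, 'Z' not set
        0x5: not n,                  # .GE : 'N' not set, 'Z' irrelevant
        0x6: c,                      # .C  : carry set
        0x7: v,                      # .V  : overflow set
    }
    if cnd not in table:
        raise ValueError("condition code not covered by this sketch")
    return table[cnd]
```

Note that no entry tests for "not C" or "not V", which is why the text suggests reversing the comparison instead.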

`Conditionally executed instructions will not further adjust the`	`Conditionally executed instructions will not further adjust the`
`condition codes, with the exception of \hbox{\tt CMP} and \hbox{\tt TST}`	`condition codes, with the exception of \hbox{\tt CMP} and \hbox{\tt TST}`
`instructions. Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions`	`instructions. Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions`
`will adjust conditions whenever they are executed. In this way,`	`will adjust conditions whenever they are executed. In this way,`
Line 801...	Line 856...
`\caption{VLIW Conditions}\label{tbl:vliw-conditions}`	`\caption{VLIW Conditions}\label{tbl:vliw-conditions}`
`\end{center}\end{table}`	`\end{center}\end{table}`
`Further, the first bit is given a special meaning. If the first bit is set,`	`Further, the first bit is given a special meaning. If the first bit is set,`
`the conditions apply to the second half of the instruction, otherwise the`	`the conditions apply to the second half of the instruction, otherwise the`
`conditions will only apply to the first half of a conditional instruction.`	`conditions will only apply to the first half of a conditional instruction.`
	`Of course, the other conditions are still available by mingling the`
	`non--VLIW instructions with VLIW instructions.`

`\section{Operand B}`	`\section{Operand B}`
Many instruction forms have a 19-bit source ``Operand B'' associated with them.	Many instruction forms have a 19-bit source ``Operand B'' associated with them.
This ``Operand B'' is shown in Fig.~\ref{fig:iset-format} as part of the	This ``Operand B'' is shown in Fig.~\ref{fig:iset-format} as part of the
`standard instructions. This Operand B is either equal to a register plus a`	`standard instructions. This Operand B is either equal to a register plus a`
Line 848...	Line 905...
`removed from the realm of possibilities. This means that the Zip CPU has no`	`removed from the realm of possibilities. This means that the Zip CPU has no`
`native way of executing push, pop, return, or jump to subroutine operations.`	`native way of executing push, pop, return, or jump to subroutine operations.`
`Each of these instructions can be emulated with a set of instructions from the`	`Each of these instructions can be emulated with a set of instructions from the`
`existing set.`	`existing set.`

	`\section{Modifying Conditions}`
	`A quick look at the list of conditions supported by the Zip CPU and listed`
	`in Tbl.~\ref{tbl:conditions} reveals that the Zip CPU does not have a full set`
	`of conditions. In particular, only one explicit unsigned condition is`
	`supported. Therefore, Tbl.~\ref{tbl:creating-conditions}`
	`\begin{table}\begin{center}`
	`\begin{tabular}{\|l\|l\|l\|}\hline`
	`Original & Modified & Name \\\hline\hline`
	`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1`
	`& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BLT label}`
	`& Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline`
	`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}`
	`& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BC label}`
	`& Less-than or equal unsigned \\[4mm]\hline`
	`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label} % if (Ry > Rx) -> Rx < Ry`
	`& \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}`
	`& Greater-than unsigned \\[4mm]\hline`
	`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGEU label} % if (Ry >= Rx) -> Rx <= Ry -> Rx < Ry+1`
	`& \parbox[t]{1.5in}{\tt CMP 1+Ry,Rx\\BC label}`
	`& Greater-than equal unsigned \\[4mm]\hline`
	`\parbox[t]{1.5in}{\tt CMP A+Rx,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A`
	`& \parbox[t]{1.5in}{\tt CMP (1-A)+Ry,Rx\\BC label}`
	`& Greater-than equal unsigned (with offset)\\[4mm]\hline`
	`\parbox[t]{1.5in}{\tt CMP A,Ry\\BGEU label} % if (Ry >= A) -> A <= Ry -> A-1 < Ry`
	`& \parbox[t]{1.5in}{\tt LDI (A-1),Rx\\CMP Ry,Rx\\BC label}`
	`& Greater-than equal comparison with a constant\\[4mm]\hline`
	`\end{tabular}`
	`\caption{Modifying conditions}\label{tbl:creating-conditions}`
	`\end{center}\end{table}`
	`shows examples of how these unsupported conditions can be created`
	`simply by adjusting the compare instruction, for no extra cost in clocks.`
	`Of course, if the compare originally had an immediate within it, that immediate`
	`would need to be loaded into a register in order to do some of these compares.`
	`This case is shown as the last case above.`
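
The rewrites in the table above can be checked mechanically. The sketch below assumes, following the comments in the table, that `CMP B,A` followed by `BC` branches when A is less than B as an unsigned value. It works on plain non-negative integers and deliberately ignores 32--bit wraparound, so it says nothing about edge cases such as Ry = 0xFFFFFFFF, where `1+Ry` would presumably wrap to zero, or A = 0 in the constant case. The names `bc_taken` and `check_rewrites` are hypothetical.

```python
from itertools import product

def bc_taken(operand_b, reg):
    """Model `CMP operand_b,reg` followed by `BC`.

    Assumption: the branch is taken when reg < operand_b (unsigned).
    """
    return reg < operand_b

def check_rewrites(limit=8):
    """Compare each original condition with its rewritten form on small values."""
    for rx, ry, a in product(range(limit), repeat=3):
        # BLEU: Ry <= Rx   becomes   CMP 1+Rx,Ry ; BC
        assert (ry <= rx) == bc_taken(1 + rx, ry)
        # BGTU: Ry > Rx    becomes   CMP Ry,Rx ; BC
        assert (ry > rx) == bc_taken(ry, rx)
        # BGEU: Ry >= Rx   becomes   CMP 1+Ry,Rx ; BC
        assert (ry >= rx) == bc_taken(1 + ry, rx)
        # BGEU with offset: Ry >= A+Rx   becomes   CMP (1-A)+Ry,Rx ; BC
        assert (ry >= a + rx) == bc_taken((1 - a) + ry, rx)
        # BGEU with a constant: Ry >= A  becomes  LDI (A-1),Rx ; CMP Ry,Rx ; BC
        assert (ry >= a) == bc_taken(ry, a - 1)
    return True
```

Calling `check_rewrites()` walks every combination of small operand values and raises an `AssertionError` if any rewritten form disagrees with the original condition.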

`\section{Move Operands}`	`\section{Move Operands}`
`The previous set of operands would be perfect and complete, save only that`	`The previous set of operands would be perfect and complete, save only that`
`the CPU needs access to non--supervisory registers while in supervisory mode.`	`the CPU needs access to non--supervisory registers while in supervisory mode.`
`Therefore, the MOV instruction is special and offers access to these registers`	`Therefore, the MOV instruction is special and offers access to these registers`
`\ldots when in supervisory mode. To keep the compiler simple, the extra bits`	`\ldots when in supervisory mode. To keep the compiler simple, the extra bits`
Line 873...	Line 965...
`Anything with the user bit set will be treated as a user register and displayed`	`Anything with the user bit set will be treated as a user register and displayed`
`special. Since the CPU quietly ignores the supervisor bits while in user mode,`	`special. Since the CPU quietly ignores the supervisor bits while in user mode,`
`anything marked as a user register will always be specific.`	`anything marked as a user register will always be specific.`

`\section{Multiply Operations}`	`\section{Multiply Operations}`
`The Zip CPU supports two Multiply operations, a 16x16 bit signed multiply`
`({\tt MPYS}) and a 16x16 bit unsigned multiply ({\tt MPYU}). A 32--bit`	`The ZipCPU originally only supported 16x16 multiply operations. GCC, however,`
`multiply, should it be desired, needs to be created via software from this`	`wanted 32x32-bit operations and building these from 16x16-bit multiplies`
`16x16 bit multiply.`	`is painful. Therefore, the ZipCPU was modified to support 32x32-bit multiplies.`

	`In particular, the ZipCPU supports three separate 32x32-bit multiply`
	`instructions: {\tt MPY}, {\tt MPYUHI}, and {\tt MPYSHI}. The first of these`
	`produces the low 32-bits of the 64-bit result of a 32x32-bit multiply. The`
	`other two produce the upper 32-bits of that result: {\tt MPYUHI} treats the`
	`operands as unsigned, whereas {\tt MPYSHI} treats them as signed.`
	`The multiply instructions execute independently of one another, although`
	`the compiler may use them together.`

	`In an effort to maintain single clock pipeline timing, all three of these`
	`multiplies have been slowed down in logic. Thus, depending upon the setting`
	`of {\tt OPT\_MULTIPLY} within {\tt cpudefs.v}, the multiply instructions`
	`will either 1)~cause an ILLEGAL instruction error, 2)~take one additional clock,`
	`or 3)~take two additional clocks.`
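
As a rough illustration of what the three multiply instructions return, the following sketch models them on 32--bit values. It is a behavioral model only, not a description of the hardware or its timing; the helper names `mpy`, `mpyuhi`, `mpyshi` and `to_signed32` are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mpy(a, b):
    """Low 32 bits of the product; identical for signed and unsigned operands."""
    return (a * b) & MASK32

def mpyuhi(a, b):
    """Upper 32 bits of the 64-bit product, operands treated as unsigned."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32

def mpyshi(a, b):
    """Upper 32 bits of the 64-bit product, operands treated as signed."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32
```

A compiler wanting a full 64--bit product would pair `mpy` with one of the two upper-half forms, which is one way the otherwise independent instructions end up being used together.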


`\section{Divide Unit}`	`\section{Divide Unit}`
`The Zip CPU also has a divide unit which can be built alongside the ALU.`	`The Zip CPU also has a divide unit which can be built alongside the ALU.`
`This divide unit provides the Zip CPU with its first two instructions that`	`This divide unit provides the Zip CPU with another two instructions that`
`cannot be executed in a single cycle: {\tt DIVS}, or signed divide, and`	`cannot be executed in a single cycle: {\tt DIVS}, or signed divide, and`
`{\tt DIVU}, the unsigned divide. These are both 32--bit divide instructions,`	`{\tt DIVU}, the unsigned divide. These are both 32--bit divide instructions,`
`dividing one 32--bit number by another. In this case, the Operand B field,`	`dividing one 32--bit number by another. In this case, the Operand B field,`
`whether it be register or register plus immediate, constitutes the denominator,`	`whether it be register or register plus immediate, constitutes the denominator,`
`whereas the numerator is given by the other register.`	`whereas the numerator is given by the other register.`

`The Divide is also a multi--clock instruction. While the divide is running,`	`The Divide is also a multi--clock instruction. While the divide is running,`
`the ALU, memory unit, and floating point unit (if installed) will be idle.`	`the ALU, any memory loads, and the floating point unit (if installed) will be`
`Once the divide completes, other units may continue.`	`idle. Once the divide completes, other units may continue.`

`Of course, divides can have errors: division by zero. In the case of division`	`Of course, divides can have errors: division by zero. In the case of division`
`by zero, an exception will be caused that will send the CPU either from`	`by zero, an exception will be caused that will send the CPU either from`
`user mode to supervisor mode, or halt the CPU if it is already in supervisor`	`user mode to supervisor mode, or halt the CPU if it is already in supervisor`
`mode.`	`mode.`
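
The divide behavior described above can be sketched as follows. This is a behavioral model under stated assumptions: the text does not say how {\tt DIVS} rounds, so truncation toward zero is assumed here, and the exception is modeled as a Python exception rather than a mode switch or halt. The names `divu`, `divs` and `DivideByZero` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

class DivideByZero(Exception):
    """Stands in for the divide-by-zero exception described in the text."""

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def divu(numerator, denominator):
    """Unsigned 32-bit divide; denominator plays the role of Operand B."""
    n, d = numerator & MASK32, denominator & MASK32
    if d == 0:
        raise DivideByZero()
    return n // d

def divs(numerator, denominator):
    """Signed 32-bit divide, ASSUMING truncation toward zero."""
    n, d = to_signed32(numerator), to_signed32(denominator)
    if d == 0:
        raise DivideByZero()
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    # The result is returned as a raw 32-bit pattern.
    return q & MASK32
```

The overflow case of the most negative value divided by -1 is not addressed by the text; this sketch simply returns the low 32 bits of the quotient.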

`\section{NOOP, BREAK, and Bus Lock Instruction}`	`\section{NOOP, BREAK, and Bus Lock Instruction}`
`Three instructions are not listed in the opcode list in`	`Three instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes}, are`
`Tbl.~\ref{tbl:iset-opcodes}, yet fit in the NOOP type instruction format of`	`somewhat special. These are the {\tt NOOP}, {\tt Break}, and bus {\tt LOCK}`
`Fig.~\ref{fig:iset-format}. These are the {\tt NOOP}, {\tt Break}, and`	`instructions. These are encoded according to`
`bus {\tt LOCK} instructions. These are encoded according to`
`Fig.~\ref{fig:iset-noop}, and have the following meanings:`	`Fig.~\ref{fig:iset-noop}, and have the following meanings:`
`\begin{figure}\begin{center}`	`\begin{figure}\begin{center}`
`\begin{bytefield}[endianness=big]{32}`	`\begin{bytefield}[endianness=big]{32}`
`\bitheader{0-31}\\`	`\bitheader{0-31}\\`
`\begin{leftwordgroup}{NOOP}`	`\begin{leftwordgroup}{NOOP}`
`\bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}`	`\bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}`
`\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored} \\`	`\bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{Ignored} \\`
`\bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}`	`\bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}`
`\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{---} \\`	`\bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{---} \\`
`\bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}`	`\bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}`
`\bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}`	`\bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}`
`\bitbox{5}{Ignored}`	`\bitbox{5}{Ignored}`
`\end{leftwordgroup} \\`	`\end{leftwordgroup} \\`
`\begin{leftwordgroup}{BREAK}`	`\begin{leftwordgroup}{BREAK}`
`\bitbox{1}{0}\bitbox{3}{3'h7}`	`\bitbox{1}{0}\bitbox{3}{3'h7}`
`\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}`	`\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored}`
`\end{leftwordgroup} \\`	`\end{leftwordgroup} \\`
`\begin{leftwordgroup}{LOCK}`	`\begin{leftwordgroup}{LOCK}`
`\bitbox{1}{0}\bitbox{3}{3'h7}`	`\bitbox{1}{0}\bitbox{3}{3'h7}`
`\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{100}\bitbox{22}{Ignored}`	`\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}`
`\end{leftwordgroup} \\`	`\end{leftwordgroup} \\`
`\end{bytefield}`	`\end{bytefield}`
`\caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}`	`\caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}`
`\end{center}\end{figure}`	`\end{center}\end{figure}`
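
One way to read the non--VLIW rows of the figure above is as a set of bit-field tests on a 32--bit instruction word. The sketch below uses the Rev 139 encodings (000 for NOOP, 001 for BREAK, 010 for LOCK) and assumes that the leftmost box of the figure is bit 31, so that the fields fall at bits 30--28, 26--25 and 24--22. The unlabeled bit 27 is left unchecked, and the VLIW form is not handled. The function name `classify_special` is hypothetical.

```python
# Sub-opcode values taken from the Rev 139 column of the figure.
SPECIAL = {0b000: "NOOP", 0b001: "BREAK", 0b010: "LOCK"}

def classify_special(word):
    """Return 'NOOP', 'BREAK' or 'LOCK' for a matching word, else None.

    Assumed layout (MSB first): 1 bit, 3 bits = 3'h7, 1 unlabeled bit,
    2 bits = 11, 3 bits of sub-opcode, 22 ignored bits.
    """
    word &= 0xFFFFFFFF
    if (word >> 31) & 0x1:
        return None                  # VLIW form, not covered by this sketch
    if ((word >> 28) & 0x7) != 0x7:
        return None
    if ((word >> 25) & 0x3) != 0x3:
        return None
    return SPECIAL.get((word >> 22) & 0x7)
```

Because the low 22 bits are marked as ignored, any value placed there leaves the classification unchanged.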

Line 937...	Line 1043...
`The {\tt BREAK} instruction is useful for creating a debug instruction that`	`The {\tt BREAK} instruction is useful for creating a debug instruction that`
`will halt the CPU without executing. If in user mode, depending upon the`	`will halt the CPU without executing. If in user mode, depending upon the`
`setting of the break enable bit, it will either switch to supervisor mode or`	`setting of the break enable bit, it will either switch to supervisor mode or`
`halt the CPU--depending upon where the user wishes to do his debugging.`	`halt the CPU--depending upon where the user wishes to do his debugging.`

`Finally, the {\tt LOCK} instruction was added in order to make a test and`	`Finally, the {\tt LOCK} instruction was added in order to provide for`
`set multi--CPU operation possible. Following a LOCK instruction, the next`	`atomic operations. The {\tt LOCK} instruction only works in pipeline mode.`
`two instructions, if they are memory LOD/STO instructions, will execute without`	`It works by stalling the ALU pipeline stack until all prior stages are`
`dropping the wishbone {\tt CYC} line between the instructions. Thus a`	`filled, and then it guarantees that once a bus cycle is started, the`
`{\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)}, where Rz`	`wishbone {\tt CYC} line will remain asserted until the LOCK is deasserted.`
`is initially set, can be used to set an address while guaranteeing that Ry`	`This allows the execution of one instruction that was waiting in the load`
`was the value before setting the address to Rz. This is a useful instruction`	`operands pipeline stage, and one instruction that was waiting in the`
`while trying to achieve concurrency among multiple CPU's.`	`instruction decode stage. Further, if the instruction waiting in the decode`
	`stage was a VLIW instruction, then it may be possible to execute a third`
	`instruction.`

	`This was originally written to implement an atomic test and set instruction,`
	`such as a {\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)},`
	`where Rz is initially set.`

	`Other sequences that use a VLIW instruction to combine a single ALU`
	`instruction with a store should be possible as well. An example is an atomic`
	`increment: {\tt LOCK}, {\tt LOD (Rx),Ry}, {\tt ADD 1,Ry}, {\tt STO Ry,(Rx)}.`
	`Many of these combinations remain to be tested.`

`\section{Floating Point}`	`\section{Floating Point}`
`Although the Zip CPU does not (yet) have a floating point unit, the current`	`Although the Zip CPU does not (yet) have a floating point unit, the current`
`instruction set offers eight opcodes for floating point operations, and treats`	`instruction set offers eight opcodes for floating point operations, and treats`
`floating point exceptions like divide by zero errors. Once this unit is built`	`floating point exceptions like divide by zero errors. Once this unit is built`
`and integrated together with the rest of the CPU, the Zip CPU will support`	`and integrated together with the rest of the CPU, the Zip CPU will support`
`32--bit floating point instructions natively. Any 64--bit floating point`	`32--bit floating point instructions natively. Any 64--bit floating point`
`instructions will still need to be emulated in software.`	`instructions will still need to be emulated in software.`

	`Until that time, or even after if the floating point unit is not installed,`
	`floating point instructions will trigger an illegal instruction exception,`
	`which may be trapped and then implemented in software.`

`\section{Derived Instructions}`	`\section{Derived Instructions}`
`The Zip CPU supports many other common instructions, but not all of them`	`The Zip CPU supports many other common instructions, but not all of them`
`are single cycle instructions. The derived instruction tables,`	`are single cycle instructions. The derived instruction tables,`
`Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, \ref{tbl:derived-3}`	`Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, \ref{tbl:derived-3}`
`and~\ref{tbl:derived-4},`	`and~\ref{tbl:derived-4},`
Line 1123...	Line 1244...
`\\\hline`	`\\\hline`
`{\tt STEP Rr,Rt}`	`{\tt STEP Rr,Rt}`
`& \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}`	`& \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}`
`& Step a Galois implementation of a Linear Feedback Shift Register, Rr,`	`& Step a Galois implementation of a Linear Feedback Shift Register, Rr,`
`using taps Rt \\\hline`	`using taps Rt \\\hline`
	`%`
	`%`
	`{\tt SEX.b Rx }`
	`& \parbox[t]{1.5in}{\tt LSL 24,Rx \\ ASR 24,Rx}`
	`& Sign extend a byte into a full word.\\\hline`
	`{\tt SEX.h Rx }`
	`& \parbox[t]{1.5in}{\tt LSL 16,Rx \\ ASR 16,Rx}`
	`& Sign extend a half word into a full word.\\\hline`
	`%`
`{\tt STO.b Rx,\$addr}`	`{\tt STO.b Rx,\$addr}`
`& \parbox[t]{1.5in}{\tt %`	`& \parbox[t]{1.5in}{\tt %`
`LDI \$addr,Ra \\`	`LDI \$addr,Ra \\`
`LDI \$addr,Rb \\`	`LDI \$addr,Rb \\`
`LSR \$2,Ra \\`	`LSR \$2,Ra \\`


0.6 & 11/17/2015 & Gisselquist & Added graphics to illustrate pipeline discussion.\\\hline

0.5 & 9/29/2015 & Gisselquist & Added pipelined memory access discussion.\\\hline

0.5 & 9/29/2015 & Gisselquist & Added pipelined memory access discussion.\\\hline

0.4 & 9/19/2015 & Gisselquist & Added DMA controller, improved stall information, and self--assessment info.\\\hline

0.4 & 9/19/2015 & Gisselquist & Added DMA controller, improved stall information, and self--assessment info.\\\hline

Line 650...

Line 688...

\end{bytefield}

\end{bytefield}

\caption{Zip Instruction Set Format}\label{fig:iset-format}

\caption{Zip Instruction Set Format}\label{fig:iset-format}

\end{center}\end{figure}

\end{center}\end{figure}

The basic format is that some operation, defined by the OpCode, is applied

The basic format is that some operation, defined by the OpCode, is applied

if a condition, Cnd, is true in order to produce a result which is placed in

if a condition, Cnd, is true in order to produce a result which is placed in

the destination register, or DR.  The Load 23--bit signed immediate instruction

the destination register, or DR.  The load 23--bit signed immediate instruction

is different in that it requires no conditions, and uses only a 4-bit opcode.

(LDI) is different in that it accepts no conditions, and uses only a 4-bit

opcode.

This is actually a second version of the instruction set definition, given certain

This is actually a second version of the instruction set definition, given certain

lessons learned.  For example, the original instruction set had the following

lessons learned.  For example, the original instruction set had the following

problems:

problems:

\begin{enumerate}

\begin{enumerate}

Line 665...

Line 704...

        require extra logic to use.

        require extra logic to use.

\item The carveouts for instructions such as NOOP and LDIHI/LDILO required

\item The carveouts for instructions such as NOOP and LDIHI/LDILO required

        extra logic to process.

        extra logic to process.

\item The instruction set wasn't very compact.  One bus operation was required

\item The instruction set wasn't very compact.  One bus operation was required

        for every instruction.

        for every instruction.

\item While the CPU supported multiplies, they were only 16x16 bit multiplies.

\end{enumerate}

\end{enumerate}

This second version was designed with two criteria.  The first was that the

This second version was designed with two criteria.  The first was that the

new instruction set needed to be compatible, at the assembly language level,

new instruction set needed to be compatible, at the assembly language level,

with the previous instruction set.  Thus, it must be able to support all of

with the previous instruction set.  Thus, it must be able to support all of

the previous mnemonics and more.  This was achieved with the sole exception

the previous mnemonics and more.  This was achieved with the sole exception

Line 690...

Line 730...

to interrupt mode in between the two instructions.  Likewise a new job given

to interrupt mode in between the two instructions.  Likewise a new job given

to the assembler is that of automatically packing as many instructions as

to the assembler is that of automatically packing as many instructions as

possible into the VLIW format.  Where necessary to place both VLIW instructions

possible into the VLIW format.  Where necessary to place both VLIW instructions

on the same line, they will be separated by a vertical bar.

on the same line, they will be separated by a vertical bar.

One belated change to the instruction set violates some of the above

principles.  This latter instruction set change replaced the {\tt LDIHI}

instruction with a 32--bit multiply instruction {\tt MPY}, and then exchanged

the two 16--bit multiply instructions {\tt MPYU} and {\tt MPYS} for

{\tt MPYUHI} and {\tt MPYSHI} respectively.  This creates a 32--bit

multiply capability, while removing the 16--bit multiply that wasn't very

useful. Further, the {\tt LDIHI} instruction was being used primarily by the

assembler and linker to create a 32--bit load immediate pair of instructions.

This instruction pair, {\tt LDIHI} followed by {\tt LDILO}, was

replaced with an equivalent pair, {\tt BREV} followed by {\tt LDILO},

although linking has been made more complicated in the process.
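
As a rough illustration of the {\tt BREV}/{\tt LDILO} pairing described above, the Python sketch below models how a 32--bit constant could be built. It rests on assumptions not stated in this section: that {\tt BREV} bit-reverses a 32--bit operand that may be an immediate, and that {\tt LDILO} replaces only the low 16 bits of the destination. The names `brev32` and `load_imm32` are hypothetical helpers, not part of any Zip CPU tool.

```python
MASK32 = 0xFFFFFFFF

def brev32(x):
    """Reverse the bit order of a 32-bit value."""
    x &= MASK32
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result

def load_imm32(value):
    """Model a BREV/LDILO pair building a 32-bit constant (assumed semantics).

    Returns the immediate that BREV would need, and the final register value.
    """
    value &= MASK32
    # The upper half of the constant, bit-reversed so it sits in the low 16 bits.
    brev_imm = brev32(value & 0xFFFF0000)
    reg = brev32(brev_imm)                        # BREV brev_imm,Rx
    reg = (reg & 0xFFFF0000) | (value & 0xFFFF)   # LDILO low16,Rx
    return brev_imm, reg
```

Under these assumptions the immediate handed to {\tt BREV} is the bit-reversed upper half of the constant rather than the upper half itself. A linker resolving an address would then have to bit-reverse part of it, which may be the extra linking complication mentioned above.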

\section{Instruction OpCodes}

\section{Instruction OpCodes}

With a 5--bit opcode field, there are 32 possible instructions as shown in

With a 5--bit opcode field, there are 32 possible instructions as shown in

Tbl.~\ref{tbl:iset-opcodes}.

Tbl.~\ref{tbl:iset-opcodes}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabular}{|l|l|l|c|} \hline \rowcolor[gray]{0.85}

\begin{tabular}{|l|l|l|c|} \hline \rowcolor[gray]{0.85}

Line 704...

Line 756...

5'h03 & OR  & Bitwise Or & Y \\\cline{1-3}

5'h03 & OR  & Bitwise Or & Y \\\cline{1-3}

5'h04 & XOR & Bitwise Exclusive Or &   \\\cline{1-3}

5'h04 & XOR & Bitwise Exclusive Or &   \\\cline{1-3}

5'h05 & LSR & Logical Shift Right &   \\\cline{1-3}

5'h05 & LSR & Logical Shift Right &   \\\cline{1-3}

5'h06 & LSL & Logical Shift Left &   \\\cline{1-3}

5'h06 & LSL & Logical Shift Left &   \\\cline{1-3}

5'h07 & ASR & Arithmetic Shift Right &   \\\hline

5'h07 & ASR & Arithmetic Shift Right &   \\\hline

5'h08 & LDIHI & Load Immediate High & N \\\cline{1-3}

5'h08 & MPY & 32x32 bit multiply & Y \\\hline

5'h09 & LDILO & Load Immediate Low &  \\\hline

5'h09 & LDILO & Load Immediate Low & N\\\hline

5'h0a & MPYU & Unsigned 16--bit Multiply &  \\\cline{1-3}

5'h0a & MPYUHI & Upper 32 of 64 bits from an unsigned 32x32 multiply &  \\\cline{1-3}

5'h0b & MPYS & Signed 16--bit Multiply & Y \\\cline{1-3}

5'h0b & MPYSHI & Upper 32 of 64 bits from a signed 32x32 multiply & Y \\\cline{1-3}

5'h0c & BREV & Bit Reverse &  \\\cline{1-3}

5'h0c & BREV & Bit Reverse &  \\\cline{1-3}

5'h0d & POPC& Population Count &  \\\cline{1-3}

5'h0d & POPC& Population Count &  \\\cline{1-3}

5'h0e & ROL & Rotate left &   \\\hline

5'h0e & ROL & Rotate left &   \\\hline

5'h0f & MOV & Move register & N \\\hline

5'h0f & MOV & Move register & N \\\hline

5'h10 & CMP & Compare & Y \\\cline{1-3}

5'h10 & CMP & Compare & Y \\\cline{1-3}

Line 727...

Line 779...

5'h1b & FPDIV & Floating point divide &   \\\cline{1-3}

5'h1b & FPDIV & Floating point divide &   \\\cline{1-3}

5'h1c & FPCVT & Convert integer to floating point &   \\\cline{1-3}

5'h1c & FPCVT & Convert integer to floating point &   \\\cline{1-3}

5'h1d & FPINT & Convert to integer &   \\\hline

5'h1d & FPINT & Convert to integer &   \\\hline

5'h1e & & {\em Reserved for future use} &\\\hline

5'h1e & & {\em Reserved for future use} &\\\hline

5'h1f & & {\em Reserved for future use} &\\\hline

5'h1f & & {\em Reserved for future use} &\\\hline

5'h18 & & NOOP (A-register = PC)&\\\cline{1-3}

5'h19 & & BREAK (A-register = PC)& N\\\cline{1-3}

5'h1a & & LOCK (A-register = PC)&\\\hline

\end{tabular}

\end{tabular}

\caption{Zip CPU OpCodes}\label{tbl:iset-opcodes}

\caption{Zip CPU OpCodes}\label{tbl:iset-opcodes}

\end{center}\end{table}

\end{center}\end{table}

Of these opcodes, the {\tt BREV} and {\tt POPC} are experimental, and may be

Of these opcodes, the {\tt BREV} and {\tt POPC} are experimental, and may be

Line 751...

Line 806...

3'h1 & {\tt .LT} & Less than ('N' set) \\

3'h1 & {\tt .LT} & Less than ('N' set) \\

3'h2 & {\tt .Z} & Only execute when 'Z' is set \\

3'h2 & {\tt .Z} & Only execute when 'Z' is set \\

3'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\

3'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\

3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\

3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\

3'h5 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\

3'h5 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\

3'h6 & {\tt .C} & Carry set\\

3'h6 & {\tt .C} & Carry set (Also known as less-than unsigned) \\

3'h7 & {\tt .V} & Overflow set\\

3'h7 & {\tt .V} & Overflow set\\

\end{tabular}

\end{tabular}

\caption{Conditions for conditional operand execution}\label{tbl:conditions}

\caption{Conditions for conditional operand execution}\label{tbl:conditions}

\end{center}\end{table}

\end{center}\end{table}

There is no condition code for less than or equal, not C or not V---there

There is no condition code for less than or equal, not C or not V---there

just wasn't enough space in 3--bits.  Conditioning on a non--supported

just wasn't enough space in 3--bits.  Conditioning on a non--supported

condition is still possible, but it will take an extra instruction and a

condition is still possible, but it will take an extra instruction and a

pipeline stall.  (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt

pipeline stall.  (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt

STO.NZ R0,(R1)}) As an alternative, it is often possible to reverse the

STO.NZ R0,(R1)}) As an alternative, it is often possible to reverse the

condition and so recover those extra two clocks.  Thus instead of

condition and so recover those extra two clocks.  Thus instead of

\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNV label} you can issue a

\hbox{\tt CMP Rx,Ry;} \hbox{\tt BNC label} you can issue a

\hbox{\tt CMP Ry,Rx;} \hbox{\tt BV label}.

\hbox{\tt CMP 1+Ry,Rx;} \hbox{\tt BC label}.

Conditionally executed instructions will not further adjust the

Conditionally executed instructions will not further adjust the

condition codes, with the exception of \hbox{\tt CMP} and \hbox{\tt TST}

condition codes, with the exception of \hbox{\tt CMP} and \hbox{\tt TST}

instructions.   Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions

instructions.   Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions

will adjust conditions whenever they are executed.  In this way,

will adjust conditions whenever they are executed.  In this way,

Line 801...

Line 856...

\caption{VLIW Conditions}\label{tbl:vliw-conditions}

\caption{VLIW Conditions}\label{tbl:vliw-conditions}

\end{center}\end{table}

\end{center}\end{table}

Further, the first bit is given a special meaning.  If the first bit is set,

Further, the first bit is given a special meaning.  If the first bit is set,

the conditions apply to the second half of the instruction, otherwise the

the conditions apply to the second half of the instruction, otherwise the

conditions will only apply to the first half of a conditional instruction.

conditions will only apply to the first half of a conditional instruction.

Of course, the other conditions are still available by mingling the

non--VLIW instructions with VLIW instructions.

\section{Operand B}

\section{Operand B}

Many instruction forms have a 19-bit source ``Operand B'' associated with them.

Many instruction forms have a 19-bit source ``Operand B'' associated with them.

This ``Operand B'' is shown in Fig.~\ref{fig:iset-format} as part of the

This ``Operand B'' is shown in Fig.~\ref{fig:iset-format} as part of the

standard instructions.  This Operand B is either equal to a register plus a

standard instructions.  This Operand B is either equal to a register plus a

Line 848...

Line 905...

removed from the realm of possibilities.  This means that the Zip CPU has no

removed from the realm of possibilities.  This means that the Zip CPU has no

native way of executing push, pop, return, or jump to subroutine operations.

native way of executing push, pop, return, or jump to subroutine operations.

Each of these instructions can be emulated with a set of instructions from the

Each of these instructions can be emulated with a set of instructions from the

existing set.

existing set.

\section{Modifying Conditions}

A quick look at the list of conditions supported by the Zip CPU and listed

in Tbl.~\ref{tbl:conditions} reveals that the Zip CPU does not have a full set

of conditions.  In particular, only one explicit unsigned condition is

supported.  Therefore, Tbl.~\ref{tbl:creating-conditions}

\begin{table}\begin{center}

\begin{tabular}{|l|l|l|}\hline

Original & Modified & Name \\\hline\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1

        & \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BLT label}

        & Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}

        & \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BC label}

        & Less-than or equal unsigned \\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label}    % if (Ry > Rx) -> Rx < Ry

        & \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}

        & Greater-than unsigned \\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGEU label}    % if (Ry >= Rx) -> Rx <= Ry -> Rx < Ry+1

        & \parbox[t]{1.5in}{\tt CMP 1+Ry,Rx\\BC label}

        & Greater-than equal unsigned \\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP A+Rx,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A

        & \parbox[t]{1.5in}{\tt CMP (1-A)+Ry,Rx\\BC label}

        & Greater-than equal unsigned (with offset)\\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP A,Ry\\BGEU label} % if (Ry >= A) -> A-1 < Ry

        & \parbox[t]{1.5in}{\tt LDI (A-1),Rx\\CMP Ry,Rx\\BC label}

        & Greater-than equal comparison with a constant\\[4mm]\hline

\end{tabular}

\caption{Modifying conditions}\label{tbl:creating-conditions}

\end{center}\end{table}

shows examples of how these unsupported conditions can be created

simply by adjusting the compare instruction, for no extra cost in clocks.

Of course, if the compare originally had an immediate within it, that immediate

would need to be loaded into a register in order to do some of these compares.

This case is shown as the last case above.
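
The rewrites in Tbl.~\ref{tbl:creating-conditions} can be checked with a small model. The Python sketch below assumes that {\tt CMP B,A} sets the carry flag exactly when A is less than B as unsigned 32--bit values (consistent with carry being described as less-than unsigned), and that an immediate added to a register wraps modulo $2^{32}$. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def carry_after_cmp(b, a):
    """Model `CMP b,a`: carry is assumed set when a < b, unsigned."""
    return (a & MASK32) < (b & MASK32)

def bleu_original(rx, ry):
    """Intent of `CMP Rx,Ry; BLEU label`: branch when Ry <= Rx."""
    return ry <= rx

def bleu_modified(rx, ry):
    """`CMP 1+Rx,Ry; BC label`, with 1+Rx wrapping to 32 bits."""
    return carry_after_cmp((rx + 1) & MASK32, ry)

def bgtu_original(rx, ry):
    """Intent of `CMP Rx,Ry; BGTU label`: branch when Ry > Rx."""
    return ry > rx

def bgtu_modified(rx, ry):
    """`CMP Ry,Rx; BC label`."""
    return carry_after_cmp(ry, rx)

def bgeu_original(rx, ry):
    """Intent of `CMP Rx,Ry; BGEU label`: branch when Ry >= Rx."""
    return ry >= rx

def bgeu_modified(rx, ry):
    """`CMP 1+Ry,Rx; BC label`, with 1+Ry wrapping to 32 bits."""
    return carry_after_cmp((ry + 1) & MASK32, rx)
```

In this model the {\tt BGTU} rewrite agrees with its intent for all inputs. The two rewrites that add one agree except when the incremented register holds {\tt 0xFFFFFFFF}, where the sum wraps to zero. Whether the hardware behaves the same way at that boundary is not stated here and would need to be confirmed.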

\section{Move Operands}

\section{Move Operands}

The previous set of operands would be perfect and complete, save only that

The previous set of operands would be perfect and complete, save only that

the CPU needs access to non--supervisory registers while in supervisory mode.

the CPU needs access to non--supervisory registers while in supervisory mode.

Therefore, the MOV instruction is special and offers access to these registers

Therefore, the MOV instruction is special and offers access to these registers

\ldots when in supervisory mode.  To keep the compiler simple, the extra bits

\ldots when in supervisory mode.  To keep the compiler simple, the extra bits

Line 873...

Line 965...

Anything with the user bit set will be treated as a user register and displayed

Anything with the user bit set will be treated as a user register and displayed

special.  Since the CPU quietly ignores the supervisor bits while in user mode,

special.  Since the CPU quietly ignores the supervisor bits while in user mode,

anything marked as a user register will always be specific.

anything marked as a user register will always be specific.

\section{Multiply Operations}

\section{Multiply Operations}

The Zip CPU supports two Multiply operations, a 16x16 bit signed multiply

({\tt MPYS}) and a 16x16 bit unsigned multiply ({\tt MPYU}).  A 32--bit

The ZipCPU originally only supported 16x16 multiply operations.  GCC, however,

multiply, should it be desired, needs to be created via software from this

wanted 32x32-bit operations and building these from 16x16-bit multiplies

16x16 bit multiply.

is painful.  Therefore, the ZipCPU was modified to support 32x32-bit multiplies.

In particular, the ZipCPU supports three separate 32x32-bit multiply

instructions: {\tt MPY}, {\tt MPYUHI}, and {\tt MPYSHI}.  The first of these

produces the low 32-bits of a 32x32-bit multiply result.  The other two

produce the upper 32-bits of the 64-bit product: {\tt MPYUHI} treats the

operands as unsigned, whereas {\tt MPYSHI} treats them as signed.

The multiply instructions execute independently of one another, although

the compiler may use them quite dependently.
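
The relationship between the three multiply results can be sketched in Python as follows. This is an arithmetic model only, with hypothetical function names, and says nothing about timing or about how the hardware is built.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mpy(a, b):
    """Low 32 bits of the product; the same for signed and unsigned operands."""
    return ((a & MASK32) * (b & MASK32)) & MASK32

def mpyuhi(a, b):
    """Upper 32 bits of the 64-bit product of unsigned operands."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32

def mpyshi(a, b):
    """Upper 32 bits of the 64-bit product of signed operands."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32
```

A full 64--bit product is then the pair of {\tt MPY} with either {\tt MPYUHI} or {\tt MPYSHI}, depending on how the operands are to be interpreted. This is one way a compiler might use the instructions together even though each executes on its own.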

In an effort to maintain single clock pipeline timing, all three of these

multiplies have been slowed down in logic.  Thus, depending upon the setting

of {\tt OPT\_MULTIPLY} within {\tt cpudefs.v}, the multiply instructions

will either 1)~cause an ILLEGAL instruction error, 2)~take one additional clock,

or 3)~take two additional clocks.

\section{Divide Unit}

\section{Divide Unit}

The Zip CPU also has a divide unit which can be built alongside the ALU.

The Zip CPU also has a divide unit which can be built alongside the ALU.

This divide unit provides the Zip CPU with its first two instructions that

This divide unit provides the Zip CPU with another two instructions that

cannot be executed in a single cycle: {\tt DIVS}, or signed divide, and

cannot be executed in a single cycle: {\tt DIVS}, or signed divide, and

{\tt DIVU}, the unsigned divide.  These are both 32--bit divide instructions,

{\tt DIVU}, the unsigned divide.  These are both 32--bit divide instructions,

dividing one 32--bit number by another.  In this case, the Operand B field,

dividing one 32--bit number by another.  In this case, the Operand B field,

whether it be register or register plus immediate, constitutes the denominator,

whether it be register or register plus immediate, constitutes the denominator,

whereas the numerator is given by the other register.

whereas the numerator is given by the other register.

The Divide is also a multi--clock instruction.  While the divide is running,

The Divide is also a multi--clock instruction.  While the divide is running,

the ALU, memory unit, and floating point unit (if installed) will be idle.

the ALU, any memory loads, and the floating point unit (if installed) will be

Once the divide completes, other units may continue.

idle.  Once the divide completes, other units may continue.

Of course, divides can have errors: division by zero.  In the case of division

Of course, divides can have errors: division by zero.  In the case of division

by zero, an exception will be caused that will send the CPU either from

by zero, an exception will be caused that will send the CPU either from

user mode to supervisor mode, or halt the CPU if it is already in supervisor

user mode to supervisor mode, or halt the CPU if it is already in supervisor

mode.

mode.

\section{NOOP, BREAK, and Bus Lock Instruction}

\section{NOOP, BREAK, and Bus Lock Instruction}

Three instructions are not listed in the opcode list in

Three instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes}, are

Tbl.~\ref{tbl:iset-opcodes}, yet fit in the NOOP type instruction format of

somewhat special.  These are the {\tt NOOP}, {\tt Break}, and bus {\tt LOCK}

Fig.~\ref{fig:iset-format}.  These are the {\tt NOOP}, {\tt Break}, and

instructions.  These are encoded according to

bus {\tt LOCK} instructions.  These are encoded according to

Fig.~\ref{fig:iset-noop}, and have the following meanings:

Fig.~\ref{fig:iset-noop}, and have the following meanings:

\begin{figure}\begin{center}

\begin{figure}\begin{center}

\begin{bytefield}[endianness=big]{32}

\begin{bytefield}[endianness=big]{32}

\bitheader{0-31}\\

\bitheader{0-31}\\

\begin{leftwordgroup}{NOOP}

\begin{leftwordgroup}{NOOP}

\bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}

\bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}

        \bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored} \\

        \bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{Ignored} \\

\bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}

\bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}

        \bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{---} \\

        \bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{---} \\

\bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}

\bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}

        \bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}

        \bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}

        \bitbox{5}{Ignored}

        \bitbox{5}{Ignored}

                \end{leftwordgroup} \\

                \end{leftwordgroup} \\

\begin{leftwordgroup}{BREAK}

\begin{leftwordgroup}{BREAK}

\bitbox{1}{0}\bitbox{3}{3'h7}

\bitbox{1}{0}\bitbox{3}{3'h7}

                \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}

                \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored}

                \end{leftwordgroup} \\

                \end{leftwordgroup} \\

\begin{leftwordgroup}{LOCK}

\begin{leftwordgroup}{LOCK}

\bitbox{1}{0}\bitbox{3}{3'h7}

\bitbox{1}{0}\bitbox{3}{3'h7}

                \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{100}\bitbox{22}{Ignored}

                \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}

                \end{leftwordgroup} \\

                \end{leftwordgroup} \\

\end{bytefield}

\end{bytefield}

\caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}

\caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}

\end{center}\end{figure}

\end{center}\end{figure}

Line 937...

Line 1043...

The {\tt BREAK} instruction is useful for creating a debug instruction that

The {\tt BREAK} instruction is useful for creating a debug instruction that

will halt the CPU without executing.  If in user mode, depending upon the

will halt the CPU without executing.  If in user mode, depending upon the

setting of the break enable bit, it will either switch to supervisor mode or

setting of the break enable bit, it will either switch to supervisor mode or

halt the CPU--depending upon where the user wishes to do his debugging.

halt the CPU--depending upon where the user wishes to do his debugging.

Finally, the {\tt LOCK} instruction was added in order to make a test and

Finally, the {\tt LOCK} instruction was added in order to provide for

set multi--CPU operation possible.  Following a LOCK instruction, the next

atomic operations.  The {\tt LOCK} instruction only works in pipeline mode.

two instructions, if they are memory LOD/STO instructions, will execute without

It works by stalling the ALU pipeline stack until all prior stages are

dropping the wishbone {\tt CYC} line between the instructions.   Thus a

filled, and then it guarantees that once a bus cycle is started, the

{\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)}, where Rz

wishbone {\tt CYC} line will remain asserted until the LOCK is deasserted.

is initially set, can be used to set an address while guaranteeing that Ry

This allows the execution of one instruction that was waiting in the load

was the value before setting the address to Rz.   This is a useful instruction

operands pipeline stage, and one instruction that was waiting in the

while trying to achieve concurrency among multiple CPU's.

instruction decode stage.  Further, if the instruction waiting in the decode

stage was a VLIW instruction, then it may be possible to execute a third

instruction.

This was originally written to implement an atomic test and set instruction,

such as a {\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)},

where Rz is initially set.

Other instructions using a VLIW instruction combining a single ALU instruction

with a store, such as an atomic increment, or {\tt LOCK}, {\tt LOD (Rx),Ry},

{\tt ADD 1,Ry}, {\tt STO Ry,(Rx)}, should be possible as well.  Many of these

combinations remain to be tested.
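
The intended effect of the two locked sequences above can be written out as a behavioral model. The Python sketch below treats each sequence as a single indivisible step, which is what holding the wishbone {\tt CYC} line is meant to guarantee. It does not model the bus or the pipeline. The memory is a plain dictionary and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def locked_test_and_set(mem, addr, rz):
    """Model LOCK; LOD (Rx),Ry; STO Rz,(Rx) as one indivisible step.

    Returns Ry, the value held at addr before Rz was stored there.
    """
    ry = mem[addr]          # LOD (Rx),Ry
    mem[addr] = rz & MASK32  # STO Rz,(Rx)
    return ry

def locked_increment(mem, addr):
    """Model LOCK; LOD (Rx),Ry; ADD 1,Ry; STO Ry,(Rx) as one indivisible step."""
    ry = mem[addr]           # LOD (Rx),Ry
    ry = (ry + 1) & MASK32   # ADD 1,Ry
    mem[addr] = ry           # STO Ry,(Rx)
    return ry
```

With this model, a caller that receives zero from `locked_test_and_set` has taken the lock, while a nonzero return means another party already held it.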

\section{Floating Point}

\section{Floating Point}

Although the Zip CPU does not (yet) have a floating point unit, the current

Although the Zip CPU does not (yet) have a floating point unit, the current

instruction set offers eight opcodes for floating point operations, and treats

instruction set offers eight opcodes for floating point operations, and treats

floating point exceptions like divide by zero errors.  Once this unit is built

floating point exceptions like divide by zero errors.  Once this unit is built

and integrated together with the rest of the CPU, the Zip CPU will support

and integrated together with the rest of the CPU, the Zip CPU will support

32--bit floating point instructions natively.  Any 64--bit floating point

32--bit floating point instructions natively.  Any 64--bit floating point

instructions will still need to be emulated in software.

instructions will still need to be emulated in software.

Until that time, or even after if the floating point unit is not installed,

floating point instructions will trigger an illegal instruction exception,

which may be trapped and then implemented in software.

\section{Derived Instructions}

\section{Derived Instructions}

The Zip CPU supports many other common instructions, but not all of them

The Zip CPU supports many other common instructions, but not all of them

are single cycle instructions.  The derived instruction tables,

are single cycle instructions.  The derived instruction tables,

Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, \ref{tbl:derived-3}

Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, \ref{tbl:derived-3}

and~\ref{tbl:derived-4},

and~\ref{tbl:derived-4},

Line 1123...

Line 1244...

        \\\hline

        \\\hline

{\tt STEP Rr,Rt}

{\tt STEP Rr,Rt}

        & \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}

        & \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}

        & Step a Galois implementation of a Linear Feedback Shift Register, Rr,

        & Step a Galois implementation of a Linear Feedback Shift Register, Rr,

                using taps Rt \\\hline

                using taps Rt \\\hline

{\tt SEX.b Rx }

        & \parbox[t]{1.5in}{\tt LSL 24,Rx \\ ASR 24,Rx}

        & Sign extend a byte into a full word.\\\hline

{\tt SEX.h Rx }

        & \parbox[t]{1.5in}{\tt LSL 16,Rx \\ ASR 16,Rx}

        & Sign extend a half word into a full word.\\\hline

{\tt STO.b Rx,\$addr}

{\tt STO.b Rx,\$addr}

        & \parbox[t]{1.5in}{\tt %

        & \parbox[t]{1.5in}{\tt %

        LDI \$addr,Ra \\

        LDI \$addr,Ra \\

        LDI \$addr,Rb \\

        LDI \$addr,Rb \\

        LSR \$2,Ra \\

        LSR \$2,Ra \\

Subversion Repositories zipcpu

[/] [zipcpu/] [trunk/] [doc/] [src/] [spec.tex] - Diff between revs 92 and 139