URL https://opencores.org/ocsvn/zipcpu/zipcpu/trunk

# Subversion Repositories: zipcpu

## Compare Revisions

• This comparison shows the changes necessary to convert path /zipcpu/trunk/doc/src from Rev 139 to Rev 92.

## Rev 139 → Rev 92

/spec.tex
43,51 → 43,15
 %% %% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % % % From TI about DSPs vs FPGAs: % www.ti.com/general/docs/video/foldersGallery.tsp?bkg=gray % &gpn=35145&familyid=1622&keyMatch=DSP Breaktime Episode Three % &tisearch=Search-EN-Everything&DCMP=leadership % &HQS=ep-pro-dsp-leadership-problog-150518-v-en % % FPGA's are annoyingly faster, cheaper, and not quite as power hungry % as they used to be. % % Why would you choose DSPs over FPGAs? If you care about size, % if you care about power, or happen to have a complicated algorithm % that just isn't simply doing the same thing over and over % % For complex algorithms that change over time. Each have their strengths % sometimes you can use both. % % "No assembly required" -- TI tools all C programming, very GUI based % environment, very little optimization by hand ... % % % The FPGA's achilles heel: Reconfigurability. It is very difficult, although % I'm sure major vendors will tell you not impossible, to reconfigure an FPGA % based upon the need to process time-sensitive data. If you need one of two % algorithms, both which will fit on the FPGA individually but not together, % switching between them on the fly is next to impossible, whereas switching % algorithm within a CPU is not difficult at all. For example, imagine  % receiving a packet and needing to apply one of two data algorithms on the % packet before sending it back out, and needing to do so fast. If both % algorithms don't fit in memory, where does the packet go when you need to % swap one algorithm out for the other? And what is the cost of that "context" % swap? 
% % \documentclass{gqtekspec} \usepackage{import} \usepackage{bytefield} % Install via apt-get install texlive-science \usepackage{bytefield} % \graphicspath{{../gfx}} \project{Zip CPU} \title{Specification} \author{Dan Gisselquist, Ph.D.} \email{dgisselq (at) opencores.org} \revision{Rev.~0.9} \revision{Rev.~0.8} \definecolor{webred}{rgb}{0.5,0,0} \definecolor{webgreen}{rgb}{0,0.4,0} \usepackage[dvips,ps2pdf,colorlinks=true,
120,8 → 84,6
 copy. \end{license} \begin{revisionhistory} 0.9 & 4/20/2016 & Gisselquist & Modified ISA: LDIHI replaced with MPY, MPYU and MPYS replaced with MPYUHI, and MPYSHI respectively. LOCK instruction now permits an intermediate ALU operation. \\\hline 0.8 & 1/28/2016 & Gisselquist & Reduced complexity early branching \\\hline 0.7 & 12/22/2015 & Gisselquist & New Instruction Set Architecture \\\hline 0.6 & 11/17/2015 & Gisselquist & Added graphics to illustrate pipeline discussion.\\\hline
690,9 → 652,8
 \end{center}\end{figure} The basic format is that some operation, defined by the OpCode, is applied if a condition, Cnd, is true in order to produce a result which is placed in the destination register, or DR. The load 23--bit signed immediate instruction (LDI) is different in that it accepts no conditions, and uses only a 4-bit opcode. the destination register, or DR. The Load 23--bit signed immediate instruction is different in that it requires no conditions, and uses only a 4-bit opcode.   This is actually a second version of instruction set definition, given certain lessons learned. For example, the original instruction set had the following
706,7 → 667,6
  extra logic to process. \item The instruction set wasn't very compact. One bus operation was required  for every instruction. \item While the CPU supported multiplies, they were only 16x16 bit multiplies. \end{enumerate} This second version was designed with two criteria. The first was that the new instruction set needed to be compatible, at the assembly language level,
732,18 → 692,6
 possible into the VLIW format. Where it is necessary to place both VLIW instructions on the same line, they will be separated by a vertical bar.

One belated change to the instruction set violates some of the above principles. This change replaced the {\tt LDIHI} instruction with a 32--bit multiply instruction, {\tt MPY}, and replaced the two 16--bit multiply instructions, {\tt MPYU} and {\tt MPYS}, with {\tt MPYUHI} and {\tt MPYSHI} respectively. This creates a 32--bit multiply capability while removing the 16--bit multiply, which wasn't very useful. Further, the {\tt LDIHI} instruction was being used primarily by the assembler and linker to create a 32--bit load immediate from a pair of instructions. That pair, {\tt LDIHI} followed by {\tt LDILO}, was replaced with an equivalent pair, {\tt BREV} followed by {\tt LDILO}, although linking has been made more complicated in the process.

\section{Instruction OpCodes}
With a 5--bit opcode field, there are 32 possible instructions, as shown in Tbl.~\ref{tbl:iset-opcodes}.
758,10 → 706,10
 5'h05 & LSR & Logical Shift Right & \\\cline{1-3} 5'h06 & LSL & Logical Shift Left & \\\cline{1-3} 5'h07 & ASR & Arithmetic Shift Right & \\\hline 5'h08 & MPY & 32x32 bit multiply & Y \\\hline 5'h09 & LDILO & Load Immediate Low & N\\\hline 5'h0a & MPYUHI & Upper 32 of 64 bits from an unsigned 32x32 multiply & \\\cline{1-3} 5'h0b & MPYSHI & Upper 32 of 64 bits from a signed 32x32 multiply & Y \\\cline{1-3} 5'h08 & LDIHI & Load Immediate High & N \\\cline{1-3} 5'h09 & LDILO & Load Immediate Low & \\\hline 5'h0a & MPYU & Unsigned 16--bit Multiply & \\\cline{1-3} 5'h0b & MPYS & Signed 16--bit Multiply & Y \\\cline{1-3} 5'h0c & BREV & Bit Reverse & \\\cline{1-3} 5'h0d & POPC& Population Count & \\\cline{1-3} 5'h0e & ROL & Rotate left & \\\hline
781,9 → 729,6
 5'h1d & FPINT & Convert to integer & \\\hline 5'h1e & & {\em Reserved for future use} &\\\hline 5'h1f & & {\em Reserved for future use} &\\\hline 5'h18 & & NOOP (A-register = PC)&\\\cline{1-3} 5'h19 & & BREAK (A-register = PC)& N\\\cline{1-3} 5'h1a & & LOCK (A-register = PC)&\\\hline \end{tabular} \caption{Zip CPU OpCodes}\label{tbl:iset-opcodes} \end{center}\end{table}
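The opcode table on the Rev 139 side of this comparison describes {\tt MPY} as a 32x32 bit multiply, and {\tt MPYUHI} and {\tt MPYSHI} as the upper 32 of 64 bits of an unsigned and a signed 32x32 multiply. A minimal Python sketch of those three results follows. It only models the arithmetic; the function names and the helper `to_signed32` are hypothetical and are not taken from the Zip CPU sources, and treating {\tt MPY} as the low 32 bits of the product is an assumption based on the multiply section of the same revision.

```python
MASK32 = 0xFFFFFFFF  # 32-bit register width used throughout this sketch


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mpy(a, b):
    """Model of MPY: low 32 bits of the product (identical for signed and unsigned)."""
    return ((a & MASK32) * (b & MASK32)) & MASK32


def mpyuhi(a, b):
    """Model of MPYUHI: upper 32 bits of the 64-bit unsigned product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def mpyshi(a, b):
    """Model of MPYSHI: upper 32 bits of the 64-bit signed product."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32
```

Under these assumptions, `(mpyuhi(a, b) << 32) | mpy(a, b)` reassembles the full 64-bit unsigned product, which is how a compiler might combine the instructions for a wide multiply.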
808,7 → 753,7
 3'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\ 3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\ 3'h5 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\ 3'h6 & {\tt .C} & Carry set (Also known as less-than unsigned) \\ 3'h6 & {\tt .C} & Carry set\\ 3'h7 & {\tt .V} & Overflow set\\ \end{tabular} \caption{Conditions for conditional operand execution}\label{tbl:conditions}
819,8 → 764,8
 pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)}) As an alternative, it is often possible to reverse the condition, and thus recovering those extra two clocks. Thus instead of \hbox{\tt CMP Rx,Ry;} \hbox{\tt BNC label} you can issue a \hbox{\tt CMP 1+Ry,Rx;} \hbox{\tt BC label}. \hbox{\tt CMP Rx,Ry;} \hbox{\tt BNV label} you can issue a \hbox{\tt CMP Ry,Rx;} \hbox{\tt BV label}.   Conditionally executed instructions will not further adjust the  condition codes, with the exception of \hbox{\tt CMP} and \hbox{\tt TST} 858,8 → 803,6  Further, the first bit is given a special meaning. If the first bit is set, the conditions apply to the second half of the instruction, otherwise the conditions will only apply to the first half of a conditional instruction. Of course, the other conditions are still available by mingling the non--VLIW instructions with VLIW instructions.   \section{Operand B} Many instruction forms have a 19-bit source Operand B'' associated with them.  907,41 → 850,6  Each of these instructions can be emulated with a set of instructions from the existing set.   \section{Modifying Conditions} A quick look at the list of conditions supported by the Zip CPU and listed in Tbl.~\ref{tbl:conditions} reveals that the Zip CPU does not have a full set of conditions. In particular, only one explicit unsigned condition is supported. 
Therefore, Tbl.~\ref{tbl:creating-conditions} \begin{table}\begin{center} \begin{tabular}{|l|l|l|}\hline Original & Modified & Name \\\hline\hline \parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1  & \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BLT label}  & Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline \parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}  & \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BC label}  & Less-than or equal unsigned \\[4mm]\hline \parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label} % if (Ry > Rx) -> Rx < Ry  & \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}  & Greater-than unsigned \\[4mm]\hline \parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGEU label} % if (Ry >= Rx) -> Rx <= Ry -> Rx < Ry+1  & \parbox[t]{1.5in}{\tt CMP 1+Ry,Rx\\BC label}  & Greater-than equal unsigned \\[4mm]\hline \parbox[t]{1.5in}{\tt CMP A+Rx,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A  & \parbox[t]{1.5in}{\tt CMP (1-A)+Ry,Rx\\BC label}  & Greater-than equal unsigned (with offset)\\[4mm]\hline \parbox[t]{1.5in}{\tt CMP A,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A  & \parbox[t]{1.5in}{\tt LDI (A-1),Rx\\CMP Ry,Rx\\BC label}  & Greater-than equal comparison with a constant\\[4mm]\hline \end{tabular} \caption{Modifying conditions}\label{tbl:creating-conditions} \end{center}\end{table} shows examples of how these unsupported conditions can be created simply by adjusting the compare instruction, for no extra cost in clocks. Of course, if the compare originally had an immediate within it, that immediate would need to be loaded into a register in order to do some of these compares. This case is shown as the last case above.   \section{Move Operands} The previous set of operands would be perfect and complete, save only that the CPU needs access to non--supervisory registers while in supervisory mode.  967,29 → 875,14  anything marked as a user register will always be specific.   
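The unsigned rewrites listed in Tbl.~\ref{tbl:creating-conditions} can be checked with a small Python model. This is a sketch under two assumptions that the table's comments suggest but do not spell out: that {\tt CMP B,R} followed by a {\tt .C} test succeeds when R is less than B as unsigned values, and that an operand such as {\tt 1+Rx} is computed modulo $2^{32}$. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # 32-bit register width assumed by this model


def carry_after_cmp(op_b, reg):
    """Assumed model of 'CMP op_b,reg': carry is set when reg < op_b (unsigned)."""
    return (reg & MASK32) < (op_b & MASK32)


def bleu(rx, ry):
    """'CMP Rx,Ry; BLEU' rewritten as 'CMP 1+Rx,Ry; BC' (branch if Ry <= Rx)."""
    return carry_after_cmp((rx + 1) & MASK32, ry)


def bgtu(rx, ry):
    """'CMP Rx,Ry; BGTU' rewritten as 'CMP Ry,Rx; BC' (branch if Ry > Rx)."""
    return carry_after_cmp(ry, rx)


def bgeu(rx, ry):
    """'CMP Rx,Ry; BGEU' rewritten as 'CMP 1+Ry,Rx; BC' (branch if Ry >= Rx)."""
    return carry_after_cmp((ry + 1) & MASK32, rx)
```

In this model the rewrites agree with the intended comparisons except where the added 1 wraps around: with `rx = 0xFFFFFFFF`, `bleu` never branches even though every `ry` is less than or equal to `rx`, and `bgeu` has the same gap at `ry = 0xFFFFFFFF`. Whether that corner matters depends on the code being compiled.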
\section{Multiply Operations} The Zip CPU supports two Multiply operations, a 16x16 bit signed multiply ({\tt MPYS}) and a 16x16 bit unsigned multiply ({\tt MPYU}). A 32--bit multiply, should it be desired, needs to be created via software from this 16x16 bit multiply.   The ZipCPU originally only supported 16x16 multiply operations. GCC, however, wanted 32x32-bit operations and building these from 16x16-bit multiplies is painful. Therefore, the ZipCPU was modified to support 32x32-bit multiplies.   In particular, the ZipCPU supports three separate 32x32-bit multiply instructions: {\tt MPY}, {\tt MPYUHI}, and {\tt MPYSHI}. The first of these produces the low 32-bits of a 32x32-bit multiply result. The second two produce the upper 32-bits. The first, {\tt MPYUHI}, produces the upper 32-bits assuming the multiply was unsigned, whereas the second assuming it was signed. Each multiply instruction is independent of each other in execution, although the compiler may use them quite dependently.   In an effort to maintain single clock pipeline timing, all three of these multiplies have been slowed down in logic. Thus, depending upon the setting of {\tt OPT\_MULTIPLY} within {\tt cpudefs.v}, the multiply instructions will either 1)~cause an ILLEGAL instruction error, 2)~take one additional clock, or 3)~take two additional clocks.      \section{Divide Unit} The Zip CPU also has a divide unit which can be built alongside the ALU. This divide unit provides the Zip CPU with another two instructions that This divide unit provides the Zip CPU with its first two instructions that cannot be executed in a single cycle: {\tt DIVS}, or signed divide, and {\tt DIVU}, the unsigned divide. These are both 32--bit divide instructions, dividing one 32--bit number by another. In this case, the Operand B field, 997,8 → 890,8  whereas the numerator is given by the other register.   The Divide is also a multi--clock instruction. 
While the divide is running, the ALU, any memory loads, and the floating point unit (if installed) will be idle. Once the divide completes, other units may continue. the ALU, memory unit, and floating point unit (if installed) will be idle. Once the divide completes, other units may continue.   Of course, divides can have errors: division by zero. In the case of division by zero, an exception will be caused that will send the CPU either from  1006,9 → 899,10  mode.   \section{NOOP, BREAK, and Bus Lock Instruction} Three instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes}, are somewhat special. These are the {\tt NOOP}, {\tt Break}, and bus {\tt LOCK} instructions. These are encoded according to Three instructions are not listed in the opcode list in Tbl.~\ref{tbl:iset-opcodes}, yet fit in the NOOP type instruction format of Fig.~\ref{fig:iset-format}. These are the {\tt NOOP}, {\tt Break}, and bus {\tt LOCK} instructions. These are encoded according to Fig.~\ref{fig:iset-noop}, and have the following meanings: \begin{figure}\begin{center} \begin{bytefield}[endianness=big]{32} 1015,9 → 909,9  \bitheader{0-31}\\ \begin{leftwordgroup}{NOOP} \bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}  \bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{Ignored} \\  \bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored} \\ \bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}  \bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{---} \\  \bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{---} \\ \bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}  \bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}  \bitbox{5}{Ignored} 1024,11 → 918,11   \end{leftwordgroup} \\ \begin{leftwordgroup}{BREAK} \bitbox{1}{0}\bitbox{3}{3'h7}  \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Ignored}  \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}  \end{leftwordgroup} \\ \begin{leftwordgroup}{LOCK} \bitbox{1}{0}\bitbox{3}{3'h7}  \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}  
\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{100}\bitbox{22}{Ignored}  \end{leftwordgroup} \\ \end{bytefield} \caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop} 1045,26 → 939,15  setting of the break enable bit, it will either switch to supervisor mode or halt the CPU--depending upon where the user wishes to do his debugging.   Finally, the {\tt LOCK} instruction was added in order to provide for atomic operations. The {\tt LOCK} instruction only works in pipeline mode. It works by stalling the ALU pipeline stack until all prior stages are  filled, and then it guarantees that once a bus cycle is started, the  wishbone {\tt CYC} line will remain asserted until the LOCK is deasserted.  This allows the execution of one instruction that was waiting in the load operands pipeline stage, and one instruction that was waiting in the instruction decode stage. Further, if the instruction waiting in the decode stage was a VLIW instruction, then it may be possible to execute a third instruction. Finally, the {\tt LOCK} instruction was added in order to make a test and set multi--CPU operation possible. Following a LOCK instruction, the next two instructions, if they are memory LOD/STO instructions, will execute without dropping the wishbone {\tt CYC} line between the instructions. Thus a  {\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)}, where Rz is initially set, can be used to set an address while guaranteeing that Ry was the value before setting the address to Rz. This is a useful instruction while trying to achieve concurrency among multiple CPU's.   This was originally written to implement an atomic test and set instruction, such as a {\tt LOCK} followed by {\tt LOD (Rx),Ry} and a {\tt STO Rz,(Rx)}, where Rz is initially set.   Other instructions using a VLIW instruction combining a single ALU instruction with a store, such as an atomic increment, or {\tt LOCK}, {\tt LOD (Rx),Ry}, {\tt ADD 1,Ry}, {\tt STO Ry,(Rx)}, should be possible as well. 
Many of these combinations remain to be tested.

\section{Floating Point}
Although the Zip CPU does not (yet) have a floating point unit, the current instruction set offers eight opcodes for floating point operations, and treats
1073,10 → 956,6
 32--bit floating point instructions natively. Any 64--bit floating point instructions will still need to be emulated in software.

Until that time, or even afterward if the floating point unit is not installed, floating point instructions will trigger an illegal instruction exception, which may be trapped and then implemented in software.

\section{Derived Instructions}
The Zip CPU supports many other common instructions, but not all of them are single cycle instructions. The derived instruction tables,
1246,15 → 1125,6
  & \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}  & Step a Galois implementation of a Linear Feedback Shift Register, Rr,  using taps Rt \\\hline % % {\tt SEX.b Rx }  & \parbox[t]{1.5in}{\tt LSL 24,Rx \\ ASR 24,Rx}  & Sign extend a byte into a full word.\\\hline {\tt SEX.h Rx }   & \parbox[t]{1.5in}{\tt LSL 16,Rx \\ ASR 16,Rx}  & Sign extend a half word into a full word.\\\hline % {\tt STO.b Rx,\$addr}  & \parbox[t]{1.5in}{\tt %  LDI \$addr,Ra \\