OpenCores

Rev 199	Rev 202
Line 84...	Line 84...
`\usepackage{bytefield} % Install via apt-get install texlive-science`	`\usepackage{bytefield} % Install via apt-get install texlive-science`
`% \graphicspath{{../gfx}}`	`% \graphicspath{{../gfx}}`
`\project{ZipCPU}`	`\project{ZipCPU}`
`\title{Specification}`	`\title{Specification}`
`\author{Dan Gisselquist, Ph.D.}`	`\author{Dan Gisselquist, Ph.D.}`
`\email{dgisselq (at) opencores.org}`	`\email{dgisselq (at) ieee.org}`
`\revision{Rev.~1.0}`	`\revision{Rev.~1.1}`
`\definecolor{webred}{rgb}{0.5,0,0}`	`\definecolor{webred}{rgb}{0.5,0,0}`
`\definecolor{webgreen}{rgb}{0,0.4,0}`	`\definecolor{webgreen}{rgb}{0,0.4,0}`
`\hypersetup{`	`\hypersetup{`
`ps2pdf,`	`ps2pdf,`
`pdfpagelabels,`	`pdfpagelabels,`
Line 120...	Line 120...
`You should have received a copy of the GNU General Public License along`	`You should have received a copy of the GNU General Public License along`
`with this program. If not, see \hbox{$<$http://www.gnu.org/licenses/$>$} for`	`with this program. If not, see \hbox{$<$http://www.gnu.org/licenses/$>$} for`
`a copy.`	`a copy.`
`\end{license}`	`\end{license}`
`\begin{revisionhistory}`	`\begin{revisionhistory}`
	`2.0 & 1/18/2017 & Gisselquist & Switched from 32--bit to 8--bit bytes.\\\hline`
	`1.1 & 11/28/2016 & Gisselquist & Moved the ZipSystem address to {\tt 0xff000000} base.\\\hline`
`1.0 & 11/4/2016 & Gisselquist & Major rewrite,`	`1.0 & 11/4/2016 & Gisselquist & Major rewrite,`
`includes compiler info\\\hline`	`includes compiler info\\\hline`
`0.91& 7/16/2016 & Gisselquist & Described three more CC bits\\\hline`	`0.91& 7/16/2016 & Gisselquist & Described three more CC bits\\\hline`
`0.9 & 4/20/2016 & Gisselquist & Modified ISA: LDIHI replaced with MPY,`	`0.9 & 4/20/2016 & Gisselquist & Modified ISA: LDIHI replaced with MPY,`
`MPYU and MPYS replaced with MPYUHI, and MPYSHI respectively. LOCK`	`MPYU and MPYS replaced with MPYUHI, and MPYSHI respectively. LOCK`
Line 203...	Line 205...
`more.\footnote{A not--so integrated MMU is currently under development.}`	`more.\footnote{A not--so integrated MMU is currently under development.}`

`For those who like buzz words, the ZipCPU is:`	`For those who like buzz words, the ZipCPU is:`
`\begin{itemize}`	`\begin{itemize}`
`\item A 32-bit CPU: All registers are 32-bits, addresses are 32-bits,`	`\item A 32-bit CPU: All registers are 32-bits, addresses are 32-bits,`
instructions are 32-bits wide, etc. Indeed, the ``byte size''	`instructions are 32-bits wide, etc.`
`for this processor, as per the C--language definition of a`
``byte'' being the smallest addressable unit, is 32--bits.
`\item A RISC CPU. There is no microcode for executing instructions. All`	`\item A RISC CPU. There is no microcode for executing instructions. All`
`instructions are designed to be completed in one clock cycle.`	`instructions are designed to be completed in one clock cycle.`
`\item A Load/Store architecture. (Only load and store instructions`	`\item A Load/Store architecture. (Only load and store instructions`
`can access memory.)`	`can access memory.)`
`\item Wishbone compliant. All peripherals are accessed just like`	`\item Wishbone compliant. All peripherals are accessed just like`
Line 454...	Line 454...
`so it can be overridden upon instantiation.`	`so it can be overridden upon instantiation.`

`Given the performance benefits achieved by early branching, setting this flag`	`Given the performance benefits achieved by early branching, setting this flag`
`is highly recommended.`	`is highly recommended.`

`{\tt OPT\_PIPELINED\_BUS\_ACCESS} controls whether or not {\tt LOD}/{\tt STO}`	`{\tt OPT\_PIPELINED\_BUS\_ACCESS} controls whether or not memory`
`instructions can take advantage of the pipelined wishbone bus. To be`	`instructions can take advantage of the pipelined wishbone bus. To be`
`eligible, the operations to be pipelined must be adjacent, must be all`	`eligible, the operations to be pipelined must be adjacent, must be all`
`{\tt LOD}s or all {\tt STO}s, and the addresses must all use the same base`	`loads or all stores, and the addresses must all use the same base`
`address register and either have identical immediate offsets, or immediate`	`address register and either have identical immediate offsets, or immediate`
`offsets that increase by one for each instruction. Further, the`	`offsets that increase by one for each instruction. Further, the`
`{\tt LOD}/{\tt STO} string of instructions must all have the same conditional`	`string of load (or store) instructions must all have the same conditional`
`(if any). Currently, this approach and benefit is most effectively used`	`(if any). Currently, this approach and benefit is most effectively used`
`when saving registers to or restoring registers from the stack at the`	`when saving registers to or restoring registers from the stack at the`
`beginning/end of a procedure, when using assembly optimized programs, or`	`beginning/end of a procedure, when using assembly optimized programs, or`
`when doing a context swap.`	`when doing a context swap.`
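
The eligibility rules above can be sketched as a small check. This is only an illustrative model, not the CPU's logic: the `MemOp` record and `can_pipeline` function are hypothetical names, and the sketch assumes that "identical or increasing by one" is judged between each pair of neighbouring instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """Hypothetical model of one load or store instruction."""
    is_load: bool   # True for a load, False for a store
    base_reg: int   # base address register number
    offset: int     # immediate offset
    cond: int = 0   # condition field (0 = always execute)


def can_pipeline(ops):
    """Return True if the adjacent memory operations in `ops` satisfy the rules."""
    if len(ops) < 2:
        return False  # a single access has nothing to be pipelined with
    first = ops[0]
    for prev, cur in zip(ops, ops[1:]):
        if cur.is_load != first.is_load:
            return False  # must be all loads or all stores
        if cur.base_reg != first.base_reg:
            return False  # must share one base address register
        if cur.cond != first.cond:
            return False  # must share the same conditional, if any
        if cur.offset - prev.offset not in (0, 1):
            return False  # assumption: offset repeats or steps up by one
    return True
```

A run of loads from the stack pointer with offsets 0, 1, 2 passes this check, which matches the register save and restore case mentioned above; mixing a load with a store, or skipping an offset, does not.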

`I recommend setting this flag, for performance reasons, especially if your`	`I recommend setting this flag, for performance reasons, especially if your`
`wishbone bus implementation can handle pipelined bus accesses. The logic`	`wishbone bus implementation can handle pipelined bus accesses. The logic`
`impact of this setting is minimal; the performance impact can be significant.`	`impact of this setting is minimal; the performance impact can be significant.`

`{\tt OPT\_VLIW} includes within the instruction set the Very Long Instruction`	`{\tt OPT\_CIS} includes within the instruction set the Very Long Instruction`
`Word packing, which packs up to two instructions within each instruction word.`	`Word packing, which packs up to two instructions within each instruction word.`
`Non--packed instructions will still execute as normal, this just enables the`	`Non--packed instructions will still execute as normal, this just enables the`
`decoding and running of packed instructions.`	`decoding and running of packed instructions.`

`The two next options, {\tt INCLUDE\_DMA\_CONTROLLER} and`	`The two next options, {\tt INCLUDE\_DMA\_CONTROLLER} and`
Line 482...	Line 482...
`control whether the DMA controller is included in the ZipSystem, and`	`control whether the DMA controller is included in the ZipSystem, and`
`whether or not the eight accounting timers are also included. Set these to`	`whether or not the eight accounting timers are also included. Set these to`
`include the respective peripherals, comment them out not to. These only`	`include the respective peripherals, comment them out not to. These only`
`affect the ZipSystem implementation, and not any ZipBones implementations.`	`affect the ZipSystem implementation, and not any ZipBones implementations.`

`Finally, if you find yourself needing to debug the core and specifically needing`	`Finally, if you find yourself needing to debug the core and specifically`
`to get a trace from the core to find out why something specifically failed,`	`needing to get a trace from the core to find out why something specifically`
`you may find it useful to define {\tt DEBUG\_SCOPE}. This will add a 32--bit`	`failed, you may find it useful to define {\tt DEBUG\_SCOPE}. This will add a`
`debug output from the core, as the last argument to the core, to the ZipSystem,`	`32--bit debug output from the core, as the last argument to the core, to the`
`or even to ZipBones. The actual definition and composition of this debugging`	`ZipSystem, or even to ZipBones. The actual definition and composition of`
`bit--field changes from one implementation to the next, depending upon needs`	`this debugging bit--field changes from one implementation to the next,`
`and necessities, so please look at the code at the bottom of {\tt zipcpu.v}`	`depending upon needs and necessities, so please look at the code at the`
`for more details.`	`bottom of {\tt zipcpu.v} for more details.`

`That ends our discussion of CPU options, but there remain several implementation`	`That ends our discussion of CPU options, but there remain several`
`parameters that can be defined with the CPU as well. Some of these, such as`	`implementation parameters that can be defined with the CPU as well. Some of`
`{\tt IMPLEMENT\_MPY}, {\tt IMPLEMENT\_DIVIDE}, {\tt IMPLEMENT\_FPU}, and`	`these, such as {\tt IMPLEMENT\_MPY}, {\tt IMPLEMENT\_DIVIDE},`
`{\tt EARLY\_BRANCHING} have already been discussed. The remainder shall be`	`{\tt IMPLEMENT\_FPU}, and {\tt EARLY\_BRANCHING} have already been discussed.`
`discussed quickly here.`	`The remainder shall be discussed quickly here.`

`The {\tt RESET\_ADDRESS} parameter controls what address the CPU attempts to`	`The {\tt RESET\_ADDRESS} parameter controls what address the CPU attempts to`
`fetch its first instruction from upon any CPU reset. The default value is`	`fetch its first instruction from upon any CPU reset. The default value is`
`not likely to be particularly useful, so overriding the default is recommended`	`not likely to be particularly useful, so overriding the default is recommended`
`for every implementation.`	`for every implementation.`

`The {\tt ADDRESS\_WIDTH} parameter can be used to trim down the width of`	`The {\tt ADDRESS\_WIDTH} parameter can be used to trim down the width of`
`addresses used by the CPU. For example, although the Wishbone Bus definition`	`addresses used by the CPU. For example, although the Wishbone Bus definition`
`used by the CPU has 32--address lines, particular implementations may have`	`used by the CPU has 30--address lines, particular implementations may have`
`fewer. By setting this value to the actual number of wires in the address`	`fewer. By setting this value to the actual number of wires in the address`
`bus, some logic can be spared within the CPU. The default is a 32--bit wide`	`bus, some logic can be spared within the CPU. The default is also the maximum,`
`bus.`	`a 30--bit address width. Two additional bits are used internally by the CPU`
	`to create the appearance of an 8--bit bus, by using the wishbone select lines.`
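
As a rough illustration of the Rev 202 wording, a 32-bit byte address can be split into a 30-bit word address plus a select-line pattern. The function name is hypothetical, and the lane ordering (byte offset 0 on the most significant select line) is an assumption of this sketch, not something stated in the text.

```python
def split_byte_address(byte_addr):
    """Split a 32-bit byte address into (word address, 4-bit select mask).

    Assumption: byte offset 0 within a word maps to the most significant
    select line (a big-endian lane ordering).
    """
    word_addr = (byte_addr >> 2) & ((1 << 30) - 1)  # upper 30 bits go on the bus
    lane = byte_addr & 0x3                          # lower 2 bits pick the byte
    sel = 0b1000 >> lane                            # one select line per byte
    return word_addr, sel
```

Under these assumptions, the byte address `0xff000000` becomes word address `0x3fc00000` with only the top select line active.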

`The {\tt LGICACHE} parameter specifies the log base two of the instruction`	`The {\tt LGICACHE} parameter specifies the log base two of the instruction`
`cache size. If no instruction cache is used, this option has no effect.`	`cache size. If no instruction cache is used, this option has no effect.`
`Otherwise it sets the size of the instruction cache to be`	`Otherwise it sets the size of the instruction cache to be`
`$2^{\mbox{\tiny\tt LGICACHE}}$ words. The traditional prefetch cache, if used,`	`$2^{\mbox{\tiny\tt LGICACHE}}$ words. The traditional prefetch cache, if used,`
Line 527...	Line 528...

`The {\tt START\_HALTED} parameter, if set to non--zero, will cause the`	`The {\tt START\_HALTED} parameter, if set to non--zero, will cause the`
`CPU to be halted upon startup. This is useful for debugging, since it prevents`	`CPU to be halted upon startup. This is useful for debugging, since it prevents`
`the CPU from doing anything without supervision. Of course, once all pieces`	`the CPU from doing anything without supervision. Of course, once all pieces`
`of your design are in place and proven, you'll probably want to set this to`	`of your design are in place and proven, you'll probably want to set this to`
`zero.`	`zero, so that the CPU will then start up immediately upon power up.`

`The {\tt EXTERNAL\_INTERRUPTS} parameter controls the number of interrupt`	`The {\tt EXTERNAL\_INTERRUPTS} parameter controls the number of interrupt`
`wires coming into the CPU. This number must be between one and sixteen,`	`wires coming into the CPU. This number must be between one and sixteen,`
`or if the performance counters are disabled, between one and twenty four.`	`or if the performance counters are disabled, between one and twenty four.`

Line 584...	Line 585...
`In each register set, the Program Counter (PC) is register 15, whereas`	`In each register set, the Program Counter (PC) is register 15, whereas`
`the status register (SR) or condition code register (CC) is register 14. All`	`the status register (SR) or condition code register (CC) is register 14. All`
`other registers are identical in their hardware functionality.\footnote{Jumps`	`other registers are identical in their hardware functionality.\footnote{Jumps`
`to {\tt R0}, an instruction used to implement a return from a subroutine, may`	`to {\tt R0}, an instruction used to implement a return from a subroutine, may`
`be optimized in the future within the early branch logic.} By convention, the`	`be optimized in the future within the early branch logic.} By convention, the`
`stack pointer is register 13 and noted as (SP)--although there is nothing`	`stack pointer is register 13 and noted as (SP). Beyond this convention,`
`special about this register other than this convention. Also by convention, if`	`word accesses to offsets of the stack pointer are compressed when using the`
`the compiler needs a frame pointer it will be placed into register~12, and may`	`CIS instruction set. Also by convention, if the compiler needs a frame`
`be abbreviated by FP. Finally, by convention, R0 will hold a subroutine's`	`pointer it will be placed into register~12, and may be abbreviated by FP.`
`return address, sometimes called the link register (LR).`	`Finally, by convention, R0 will hold a subroutine's return address, sometimes`
	`called the link register (LR).`

`When the CPU is in supervisor mode, instructions can access both register sets`	`When the CPU is in supervisor mode, instructions can access both register sets`
`via the {\tt MOV} instruction, whereas when the CPU is in user mode, {\tt MOV}`	`via the {\tt MOV} instruction, whereas when the CPU is in user mode, {\tt MOV}`
`instructions will only offer access to user registers. We'll discuss this`	`instructions will only offer access to user registers. We'll discuss this`
`further in subsection~\ref{sec:isa-mov}.`	`further in subsection~\ref{sec:isa-mov}.`
Line 604...	Line 606...
`\begin{bitlist}`	`\begin{bitlist}`
`31\ldots 23 & R & Reserved for future uses\\\hline`	`31\ldots 23 & R & Reserved for future uses\\\hline`
`22\ldots 16 & R/W & Reserved for future uses\\\hline`	`22\ldots 16 & R/W & Reserved for future uses\\\hline`
`15 & R & Reserved for MMU exceptions\\\hline`	`15 & R & Reserved for MMU exceptions\\\hline`
`14 & W & Clear I-Cache command, always reads zero\\\hline`	`14 & W & Clear I-Cache command, always reads zero\\\hline`
`13 & R & VLIW instruction phase (1 for first half)\\\hline`	`13 & R & CIS instruction phase (1 for first half)\\\hline`
`12 & R & (Reserved for) Floating Point Exception\\\hline`	`12 & R & (Reserved for) Floating Point Exception\\\hline`
`11 & R & Division by Zero Exception\\\hline`	`11 & R & Division by Zero Exception\\\hline`
`10 & R & Bus-Error Flag\\\hline`	`10 & R & Bus-Error Flag\\\hline`
`9 & R & Trap Flag (or user interrupt). Cleared on return to userspace.\\\hline`	`9 & R & Trap Flag (or user interrupt). Cleared on return to userspace.\\\hline`
`8 & R & Illegal Instruction Flag\\\hline`	`8 & R & Illegal Instruction Flag\\\hline`
Line 737...	Line 739...

`\item The thirteenth bit will operate in a similar fashion to both the bus`	`\item The thirteenth bit will operate in a similar fashion to both the bus`
`error and division by zero flags, only it will be set upon a (yet to`	`error and division by zero flags, only it will be set upon a (yet to`
`be determined) floating point error.`	`be determined) floating point error.`

`\item In the case of VLIW instructions, if an exception occurs after the first`	`\item In the case of CIS instructions, if an exception occurs after the first`
`instruction but before the second, the fourteenth bit of the CC register`	`instruction but before the second, the fourteenth bit of the CC`
`will be set to indicate this fact.`	`register will be set to indicate this fact. This can be combined with`
	`the user PC to determine the address of the half-word where the fault occurred.`

`\item The fifteenth bit references a clear cache bit. The supervisor may`	`\item The fifteenth bit references a clear cache bit. The supervisor may`
`write a one to this bit in order to clear the CPU instruction cache.`	`write a one to this bit in order to clear the CPU instruction cache.`
`The bit always reads as a zero.`	`The bit always reads as a zero.`

Line 760...	Line 763...
`All ZipCPU instructions fit in one of the formats shown in`	`All ZipCPU instructions fit in one of the formats shown in`
`Fig.~\ref{fig:iset-format}.`	`Fig.~\ref{fig:iset-format}.`
`\begin{figure}\begin{center}`	`\begin{figure}\begin{center}`
`\begin{bytefield}[endianness=big]{32}`	`\begin{bytefield}[endianness=big]{32}`
`\bitheader{0-31}\\`	`\bitheader{0-31}\\`
`\begin{leftwordgroup}{Standard}\bitbox{1}{0}\bitbox{4}{DR}`	`\begin{leftwordgroup}{Standard}\bitbox{1}{0}\bitbox[tlr]{4}{}`
`\bitbox[lrt]{5}{OpCode}`	`\bitbox[lrt]{5}{OpCode}`
`\bitbox[lrt]{3}{Cnd}`	`\bitbox[lrt]{3}{}`
`\bitbox{1}{0}`	`\bitbox{1}{0}`
`\bitbox{18}{18-bit Signed Immediate} \\`	`\bitbox{18}{18-bit Signed Immediate} \\`
`\bitbox{1}{0}\bitbox{4}{DR}`	`\bitbox{1}{0}\bitbox[lr]{4}{DR}`
`\bitbox[lrb]{5}{}`	`\bitbox[lrb]{5}{}`
`\bitbox[lrb]{3}{}`	`\bitbox[lr]{3}{Cnd}`
`\bitbox{1}{1}`	`\bitbox{1}{1}`
`\bitbox{4}{BR}`	`\bitbox{4}{BR}`
`\bitbox{14}{14-bit Signed Immediate}\end{leftwordgroup} \\`	`\bitbox{14}{14-bit Signed Immediate}\end{leftwordgroup} \\`
`\begin{leftwordgroup}{MOV}\bitbox{1}{0}\bitbox{4}{DR}`	`\begin{leftwordgroup}{MOV}\bitbox{1}{0}\bitbox[lr]{4}{}`
`\bitbox[lrt]{5}{5'hf}`	`\bitbox[lrt]{5}{5'hf}`
`\bitbox[lrt]{3}{Cnd}`	`\bitbox[lrb]{3}{}`
`\bitbox{1}{A}`	`\bitbox{1}{A}`
`\bitbox{4}{BR}`	`\bitbox{4}{BR}`
`\bitbox{1}{B}`	`\bitbox{1}{B}`
`\bitbox{13}{13-bit Signed Immediate}\end{leftwordgroup} \\`	`\bitbox{13}{13-bit Signed Immediate}\end{leftwordgroup} \\`
`\begin{leftwordgroup}{LDI}\bitbox{1}{0}\bitbox{4}{DR}`	`\begin{leftwordgroup}{LDI}\bitbox{1}{0}\bitbox[lrb]{4}{}`
`\bitbox{4}{4'hb}`	`\bitbox{4}{4'hc}`
`\bitbox{23}{23-bit Signed Immediate}\end{leftwordgroup} \\`	`\bitbox{23}{23-bit Signed Immediate}\end{leftwordgroup} \\`
`\begin{leftwordgroup}{NOOP}\bitbox{1}{0}\bitbox{3}{3'h7}`	`\begin{leftwordgroup}{NOOP}\bitbox{1}{0}\bitbox{3}{3'h7}`
`\bitbox{1}{}`	`\bitbox{1}{}`
`\bitbox{2}{11}`	`\bitbox{2}{11}`
`\bitbox{3}{xxx}`	`\bitbox{3}{xxx}`
`\bitbox{22}{Ignored}`	`\bitbox{22}{Ignored}`
`\end{leftwordgroup} \\`	`\end{leftwordgroup} \\`
`\begin{leftwordgroup}{VLIW}\bitbox{1}{1}\bitbox[lrt]{4}{DR}`
`\bitbox[lrt]{5}{OpCode}`
`\bitbox[lrt]{3}{Cnd}`
`\bitbox{1}{0}`
`\bitbox{4}{Imm.}`
`\bitbox{14}{---} \\`
`\bitbox{1}{1}\bitbox[lr]{4}{}`
`\bitbox[lrb]{5}{}`
`\bitbox[lr]{3}{}`
`\bitbox{1}{1}`
`\bitbox{4}{BR}`
`\bitbox{14}{---} \\`
`\bitbox{1}{1}\bitbox[lrb]{4}{}`
`\bitbox{4}{4'hb}`
`\bitbox{1}{}`
`\bitbox[lrb]{3}{}`
`\bitbox{5}{5'b Imm}`
`\bitbox{14}{---} \\`
`\bitbox{1}{1}\bitbox{9}{---}`
`\bitbox[lrt]{3}{Cnd}`
`\bitbox{5}{---}`
`\bitbox[lrt]{4}{DR}`
`\bitbox[lrt]{5}{OpCode}`
`\bitbox{1}{0}`
`\bitbox{4}{Imm}`
`\\`
`\bitbox{1}{1}\bitbox{9}{---}`
`\bitbox[lr]{3}{}`
`\bitbox{5}{---}`
`\bitbox[lr]{4}{}`
`\bitbox[lrb]{5}{}`
`\bitbox{1}{1}`
`\bitbox{4}{Reg} \\`
`\bitbox{1}{1}\bitbox{9}{---}`
`\bitbox[lrb]{3}{}`
`\bitbox{5}{---}`
`\bitbox[lrb]{4}{}`
`\bitbox{4}{4'hb}`
`\bitbox{1}{}`
`\bitbox{5}{5'b Imm}`
`\end{leftwordgroup} \\`
`\end{bytefield}`	`\end{bytefield}`
`\caption{Zip Instruction Set Format}\label{fig:iset-format}`	`\caption{Zip Instruction Set Format}\label{fig:iset-format}`
`\end{center}\end{figure}`	`\end{center}\end{figure}`
`The basic format is that some operation, defined by the OpCode, is applied`	`The basic format is that some operation, defined by the OpCode, is applied`
`if a condition, Cnd, is true in order to produce a result which is placed in`	`if a condition, Cnd, is true in order to produce a result which is placed in`
`the destination register (DR). There are three basic exceptions to this`	`the destination register (DR).`
`model. The first is the {\tt MOV} instruction, which steals bits~13 and~18`
`to allow supervisor access to user registers. The second is the load 23--bit`	`There are three basic exceptions to this general instruction model. The`
	`first is the {\tt MOV} instruction, which steals bits~13 and~18`
	`to allow supervisor access to user registers. In supervisor mode, these`
	`are set to one to reference user registers, zero otherwise. They are ignored`
	`in user mode. The second exception is the load 23--bit`
`signed immediate instruction ({\tt LDI}), in that it accepts no conditions and`	`signed immediate instruction ({\tt LDI}), in that it accepts no conditions and`
`uses only a 4-bit opcode. The last exception is the {\tt NOOP} instruction`	`uses only a 4-bit opcode. The last exception is the {\tt NOOP} instruction`
`group, containing the {\tt NOOP}, {\tt BREAK}, and {\tt LOCK} opcodes. These`	`group, containing the {\tt BREAK}, {\tt LOCK}, {\tt SIM}, and {\tt NOOP}`
`instructions ignore their register and immediate settings.\footnote{A future`	`opcodes. These instructions ignore their register and immediate settings.`
`version of the CPU may repurpose the immediate bits within the {\tt NOOP}`	`Further, the immediate bits used by these opcodes are available for simulation`
`instruction to be simulator commands, while the immediate/register bits within`	`or debug facilities, but otherwise ignored by the CPU.`
`the {\tt BREAK} instruction may be used by the debugger for whatever purpose`
`it chooses to use them for--such as a breakpoint table index.}`

`The ZipCPU also supports a very long instruction word (VLIW) set of`
`instructions. These aren't truly VLIW instructions in the sense that the CPU`
`still only issues one instruction at a time, but they do pack two instructions`
`into a single instruction word. The number of bits used by the immediate field`
`is adjusted to make space for these instruction words. Other than instruction`
`format, the only basic difference between VLIW and normal instructions is that`
`the CPU will not switch to interrupt mode in between the two instructions,`
`unless an exception is generated by the first instruction. Likewise a new job`
`given to the assembler is that of automatically packing as many instructions as`
`possible into the VLIW format.`

`The disassembler will represent VLIW instructions by placing a vertical bar`
`between the two components, but still leaving them on the same line.`
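
The Standard format in Fig.~\ref{fig:iset-format} can be sketched as a field decoder. The bit positions below are read off the figure from left to right starting at bit 31, which is an assumption of this sketch (it agrees with the statement that {\tt MOV} steals bits 13 and 18). The function names are hypothetical, and the {\tt MOV}, {\tt LDI}, and {\tt NOOP} layouts are not handled.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_standard(word):
    """Decode a 32-bit Standard-format instruction word into its fields."""
    if (word >> 31) & 1:
        raise ValueError("bit 31 is set: not a Standard-format instruction")
    fields = {
        "dr": (word >> 27) & 0xF,       # destination register, bits 30..27
        "opcode": (word >> 22) & 0x1F,  # 5-bit opcode, bits 26..22
        "cnd": (word >> 19) & 0x7,      # 3-bit condition, bits 21..19
    }
    if (word >> 18) & 1:
        # Register form: 4-bit base register plus a 14-bit signed immediate
        fields["br"] = (word >> 14) & 0xF
        fields["imm"] = sign_extend(word & 0x3FFF, 14)
    else:
        # Immediate form: 18-bit signed immediate, no base register
        fields["br"] = None
        fields["imm"] = sign_extend(word & 0x3FFFF, 18)
    return fields
```

For example, a word with DR of 3, opcode `5'h02`, no condition, and all eighteen immediate bits set decodes to an immediate of -1.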

`\subsection{Instruction OpCodes}\label{sec:isa-opcodes}`	`\subsection{Instruction OpCodes}\label{sec:isa-opcodes}`
`With a 5--bit opcode field, there are 32--possible instructions as shown in`	`With a 5--bit opcode field, there are 32--possible instructions as shown in`
`Tbl.~\ref{tbl:iset-opcodes}.`	`Tbl.~\ref{tbl:iset-opcodes}.`
`\begin{table}\begin{center}`	`\begin{table}\begin{center}`
`\begin{tabular}{|l|l|l|c|} \hline \rowcolor[gray]{0.85}`	`\begin{tabular}{|l|l|l|l|c|} \hline \rowcolor[gray]{0.85}`
`OpCode & & Instruction &Sets CC \\\hline\hline`	`OpCode & & A-Reg & Instruction &Sets CC \\\hline\hline`
`5'h00 & {\tt SUB} & Subtract & \\\cline{1-3}`	`5'h00 & {\tt SUB} & \multicolumn{2}{l|}{Subtract} & \\\cline{1-4}`
`5'h01 & {\tt AND} & Bitwise And & \\\cline{1-3}`	`5'h01 & {\tt AND} & \multicolumn{2}{l|}{Bitwise And} & \\\cline{1-4}`
`5'h02 & {\tt ADD} & Add two numbers & \\\cline{1-3}`	`5'h02 & {\tt ADD} & \multicolumn{2}{l|}{Add two numbers} & \\\cline{1-4}`
`5'h03 & {\tt OR} & Bitwise Or & Y \\\cline{1-3}`	`5'h03 & {\tt OR} & \multicolumn{2}{l|}{Bitwise Or} & Y \\\cline{1-4}`
`5'h04 & {\tt XOR} & Bitwise Exclusive Or & \\\cline{1-3}`	`5'h04 & {\tt XOR} & \multicolumn{2}{l|}{Bitwise Exclusive Or} & \\\cline{1-4}`
`5'h05 & {\tt LSR} & Logical Shift Right & \\\cline{1-3}`	`5'h05 & {\tt LSR} & \multicolumn{2}{l|}{Logical Shift Right} & \\\cline{1-4}`
`5'h06 & {\tt LSL} & Logical Shift Left & \\\cline{1-3}`	`5'h06 & {\tt LSL} & \multicolumn{2}{l|}{Logical Shift Left} & \\\cline{1-4}`
`5'h07 & {\tt ASR} & Arithmetic Shift Right & \\\hline`	`5'h07 & {\tt ASR} & \multicolumn{2}{l|}{Arithmetic Shift Right} & \\\hline`
`5'h08 & {\tt MPY} & 32x32 bit multiply & Y \\\hline`
`5'h09 & {\tt LDILO} & Load Immediate Low & N\\\hline`	`5'h08 & {\tt BREV} & \multicolumn{2}{l|}{Bit Reverse B operand into result}& \\\cline{1-4}`
`5'h0a & {\tt MPYUHI} & Upper 32 of 64 bits from an unsigned 32x32 multiply & \\\cline{1-3}`	`5'h09 & {\tt LDILO} & \multicolumn{2}{l|}{Load Immediate Low} & N\\\hline`
`5'h0b & {\tt MPYSHI} & Upper 32 of 64 bits from a signed 32x32 multiply & Y \\\cline{1-3}`	`5'h0a & {\tt MPYUHI} & \multicolumn{2}{l|}{Upper 32 of 64 bits from an unsigned 32x32 multiply} & \\\cline{1-4}`
`5'h0c & {\tt BREV} & Bit Reverse B operand into result& \\\cline{1-3}`	`5'h0b & {\tt MPYSHI} & \multicolumn{2}{l|}{Upper 32 of 64 bits from a signed 32x32 multiply} & Y \\\cline{1-4}`
`5'h0d & {\tt POPC}& Population Count & \\\cline{1-3}`	`5'h0c & {\tt MPY} & \multicolumn{2}{l|}{32x32 bit multiply} & \\\hline`
`5'h0e & {\tt ROL} & Rotate Ra left by OpB bits& \\\hline`	`5'h0d & {\tt MOV} & \multicolumn{2}{l|}{Move OpB into Ra} & N \\\hline`
`5'h0f & {\tt MOV} & Move OpB into Ra & N \\\hline`	`5'h0e & {\tt DIVU} & R0-R13 & Divide, unsigned & Y \\\cline{1-4}`
`5'h10 & {\tt CMP} & Compare (Ra-OpB) to zero & Y \\\cline{1-3}`	`5'h0f & {\tt DIVS} & R0-R13 & Divide, signed & \\\hline\hline`
`5'h11 & {\tt TST} & Test (AND w/o setting result) & \\\hline`	`%`
`5'h12 & {\tt LOD} & Load Ra from memory (OpB) & N \\\cline{1-3}`	`5'h10 & {\tt CMP} & \multicolumn{2}{l|}{Compare (Ra-OpB) to zero} & Y \\\cline{1-4}`
`5'h13 & {\tt STO} & Store Ra into memory at (OpB) & \\\hline\hline`	`5'h11 & {\tt TST} & \multicolumn{2}{l|}{Test (AND w/o setting result)} & \\\hline`
`5'h14 & {\tt DIVU} & Divide, unsigned & Y \\\cline{1-3}`	`5'h12 & {\tt LW} & \multicolumn{2}{l|}{Load a 32-bit word from memory (OpB) into Ra} & \\\cline{1-4}`
`5'h15 & {\tt DIVS} & Divide, signed & \\\hline\hline`	`5'h13 & {\tt SW} & \multicolumn{2}{l|}{Store a 32-bit word from Ra into memory at (OpB)} & \\\cline{1-4}`
`5'h16/7 & {\tt LDI} & Load 23--bit signed immediate & N \\\hline\hline`	`5'h14 & {\tt LH} & \multicolumn{2}{l|}{Load 16-bits from memory (opB) into Ra, clear upper 16 bits} & N \\\cline{1-4}`
`5'h18 & {\tt FPADD} & Floating point add & \\\cline{1-3}`	`5'h15 & {\tt SH} & \multicolumn{2}{l|}{Store the lower 16-bits of Ra into memory at (OpB)} & \\\cline{1-4}`
`5'h19 & {\tt FPSUB} & Floating point subtract & \\\cline{1-3}`	`5'h16 & {\tt LB} & \multicolumn{2}{l|}{Load 8-bits from memory (OpB) into Ra, clear upper 24 bits} & \\\cline{1-4}`
`5'h1a & {\tt FPMPY} & Floating point multiply & Y \\\cline{1-3}`	`5'h17 & {\tt SB} & \multicolumn{2}{l|}{Store the lower 8-bits of Ra into memory at (OpB)} & \\\hline\hline`
`5'h1b & {\tt FPDIV} & Floating point divide & \\\cline{1-3}`	`5'h18/9 & {\tt LDI} & \multicolumn{2}{l|}{Load 23--bit signed immediate} & N \\\hline\hline`
`5'h1c & {\tt FPI2F} & Convert integer to floating point & \\\cline{1-3}`	`5'h1a & {\tt FPADD} & R0-R13 & Floating point add & \\\cline{1-4}`
`5'h1d & {\tt FPF2I} & Convert floating point to integer & \\\hline`	`5'h1b & {\tt FPSUB} & R0-R13 & Floating point subtract & \\\cline{1-4}`
`5'h1e & & {\em Reserved for future use} &\\\hline`	`5'h1c & {\tt FPMPY} & R0-R13 & Floating point multiply & Y \\\cline{1-4}`
`5'h1f & & {\em Reserved for future use} &\\\hline`	`5'h1d & {\tt FPDIV} & R0-R13 & Floating point divide & \\\cline{1-4}`
`5'h18 & & \hbox to 0.5in{\tt NOOP} (A-register = PC)&\\\cline{1-3}`	`5'h1e & {\tt FPI2F} & R0-R13 & Convert integer to floating point & \\\cline{1-4}`
`5'h19 & & \hbox to 0.5in{\tt BREAK} (A-register = PC)& N\\\cline{1-3}`	`5'h1f & {\tt FPF2I} & R0-R13 & Convert floating point to integer & \\\hline\hline`
`5'h1a & & \hbox to 0.5in{\tt LOCK} (A-register = PC)&\\\hline`	`5'h1c & {\tt BREAK} &None(15)&& \\\cline{1-4}`
	`5'h1d & {\tt LOCK} &None(15)&& N\\\cline{1-4}`
	`5'h1e & {\tt SIM} &None(15)&&\\\cline{1-4}`
	`5'h1f & {\tt NOOP} &None(15)&&\\\hline`
`\end{tabular}`	`\end{tabular}`
`\caption{ZipCPU OpCodes}\label{tbl:iset-opcodes}`	`\caption{ZipCPU OpCodes}\label{tbl:iset-opcodes}`
`\end{center}\end{table}`	`\end{center}\end{table}`
`%`	`%`
`Of these opcodes, {\tt ROL} and {\tt POPC} are experimental and may be`
`replaced in future revisions. (If you have a reason to like or wish to keep`
`these opcodes, please contact me. If you know of alternatives that might be`
`better, please let me know as well.) There is also room for six more`
`register-less instructions in the {\tt NOOP} instruction space,`
`and two floating point instruction opcodes have been reserved for future use.`

`\subsection{Conditional Instructions}\label{sec:isa-cond}`	`\subsection{Conditional Instructions}\label{sec:isa-cond}`
`Most, although not quite all, instructions may be conditionally executed.`	`Most, although not quite all, instructions may be conditionally executed.`
`The 23--bit load immediate instruction, together with the {\tt NOOP},`	`The 23--bit load immediate instruction, together with the {\tt NOOP},`
`{\tt BREAK}, and {\tt LOCK} instructions are the exceptions to this rule.`	`{\tt BREAK}, and {\tt LOCK} instructions are the exceptions to this rule.`
`All other instructions may be conditionally executed.`	`All other instructions may be conditionally executed.`

`From the four condition code flags, eight conditions are defined for standard`	`From the four condition code flags, eight conditions are defined, as shown in`
`instructions. These are shown in Tbl.~\ref{tbl:conditions}.`	`Tbl.~\ref{tbl:conditions}.`
`\begin{table}\begin{center}`	`\begin{table}\begin{center}`
`\begin{tabular}{l|l|l}`	`\begin{tabular}{l|l|l}`
`Code & Mnemonic & Condition \\\hline`	`Code & Mnemonic & Condition \\\hline`
`3'h0 & None & Always execute the instruction \\`	`3'h0 & None & Always execute the instruction \\`
`3'h1 & {\tt .LT}& Less than ('N' set) \\`	3'h1 & {\tt .Z} & Only execute when `Z' is set \\
`3'h2 & {\tt .Z} & Only execute when 'Z' is set \\`	3'h2 & {\tt .LT}& Less than (`N' set) \\
`3'h3 & {\tt .NZ}& Only execute when 'Z' is not set \\`	`3'h3 & {\tt .C} & Carry set (Also known as less-than unsigned) \\`
`3'h4 & {\tt .GT}& Greater than ('N' not set, 'Z' not set) \\`	`3'h4 & {\tt .V} & Overflow set\\`
`3'h5 & {\tt .GE}& Greater than or equal ('N' not set, 'Z' irrelevant) \\`	3'h5 & {\tt .NZ}& Only execute when `Z' is not set \\
`3'h6 & {\tt .C} & Carry set (Also known as less-than unsigned) \\`	3'h6 & {\tt .GE}& Greater than or equal (`N' not set) \\
`3'h7 & {\tt .V} & Overflow set\\`	`3'h7 & {\tt .NC}& Not carry (also known as greater-than or equal, unsigned) \\`
`\end{tabular}`	`\end{tabular}`
`\caption{Conditions for conditional operand execution}\label{tbl:conditions}`	`\caption{Conditions for conditional operand execution}\label{tbl:conditions}`
`\end{center}\end{table}`	`\end{center}\end{table}`
`There is no condition code for less than or equal, not C or not V---there`	`There are no condition codes for either less than or equal or greater than,`
`just wasn't enough space in 3--bits. Ways of handling non--supported`	`whether signed or unsigned. In a similar fashion, there is no condition`
`conditions are discussed in Sec.~\ref{sec:in-mcond}.`	`code for not V---there just wasn't enough space in 3--bits. Ways of handling`
	`non--supported conditions are discussed in Sec.~\ref{sec:in-mcond}.`
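
The condition table can be sketched as a lookup from the 3-bit code to a test on the four flags. The sketch below follows the Rev 202 numbering (the right-hand column); Rev 199 orders the codes differently and has {\tt .GT} where Rev 202 has {\tt .NC}. The function name and the boolean flag arguments are hypothetical.

```python
def condition_met(code, z=False, c=False, n=False, v=False):
    """Return True if an instruction with condition `code` should execute.

    Encoding follows the Rev 202 table; z, c, n, v are the four
    condition code flags.
    """
    table = {
        0: True,    # none: always execute
        1: z,       # .Z   zero set
        2: n,       # .LT  less than ('N' set)
        3: c,       # .C   carry set (less than, unsigned)
        4: v,       # .V   overflow set
        5: not z,   # .NZ  zero not set
        6: not n,   # .GE  greater than or equal ('N' not set)
        7: not c,   # .NC  carry not set (greater than or equal, unsigned)
    }
    return table[code & 0x7]
```

Note that nothing in this table tests for "greater than" directly, which is the gap the paragraph above describes.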

`With the exception of \hbox{\tt CMP} and \hbox{\tt TST} instructions,`	`With the exception of \hbox{\tt CMP} and \hbox{\tt TST} instructions,`
`conditionally executed instructions will not further adjust the condition codes.`	`conditionally executed instructions will not further adjust the condition`
`Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions will adjust conditions`	`codes. Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions will adjust`
`whenever they are executed. In this way, multiple conditions may be evaluated`	`conditions whenever they are executed. In this way, multiple conditions may`
`without branches, creating a sort of logical and--but only if all the conditions`	`be evaluated without branches, creating a sort of logical and--but only if all`
`are the same. For example, to do something if \hbox{\tt R0} is one and`	`the conditions are the same. For example, to do something if \hbox{\tt R0} is`
`\hbox{\tt R1} is two, one might try code such as Tbl.~\ref{tbl:dbl-condition}.`	`one and \hbox{\tt R1} is two, one might try code such as`
	`Tbl.~\ref{tbl:dbl-condition}.`
`\begin{table}\begin{center}`	`\begin{table}\begin{center}`
`\begin{tabular}{l}`	`\begin{tabular}{l}`
`{\tt CMP 1,R0} \\`	`{\tt CMP 1,R0} \\`
`{\em ; Condition codes are now set based upon R0-1} \\`	`{\em ; Condition codes are now set based upon R0-1} \\`
`{\tt CMP.Z 2,R1} \\`	`{\tt CMP.Z 2,R1} \\`
`{\em ; If R0 $\neq$ 1, conditions are unchanged, {\tt Z} is still false.} \\`	`{\em ; If R0 $\neq$ 1, conditions are unchanged, {\tt Z} is still false.} \\`
`{\em ; If R0 $=$ 1, conditions are now set based upon R1-2.} \\`	`{\em ; If R0 $=$ 1, conditions are now set based upon R1-2.} \\`
`{\em ; Now some instruction could be done based upon the conjunction} \\`	`{\em ; Now some instruction could be done based upon the conjunction} \\`
`{\em ; of both conditions.} \\`	`{\em ; of both conditions.} \\`
`{\em ; While we use the example of a {\tt STO}, it could easily be any`	`{\em ; While we use the example of a {\tt SW}, it could easily be any`
`instruction.} \\`	`instruction.} \\`
`{\tt STO.Z R0,(R2)} \\`	`{\tt SW.Z R0,(R2)} \\`
`\end{tabular}`	`\end{tabular}`
`\caption{An example of a double conditional}\label{tbl:dbl-condition}`	`\caption{An example of a double conditional}\label{tbl:dbl-condition}`
`\end{center}\end{table}`	`\end{center}\end{table}`

`The real utility of conditionally executed instructions is that, unlike`	`The real utility of conditionally executed instructions is that, unlike`
`conditional branches, conditionally executed instructions will not stall`	`conditional branches, conditionally executed instructions will not stall`
`the bus if they are not executed.`	`the bus if they are not executed.`
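The condition field and the double conditional of Tbl.~\ref{tbl:dbl-condition} can be sketched in a few lines of Python. This is an illustration only, not the CPU's logic: it uses the right-hand (newer) column of Tbl.~\ref{tbl:conditions}, and the names {\tt Flags}, {\tt condition\_met}, {\tt compare} and {\tt double\_conditional} are invented for the example. The flag derivation in {\tt compare} is the conventional subtract-with-borrow model, which is assumed here rather than taken from this specification.

```python
from typing import NamedTuple

MASK = 0xFFFFFFFF  # registers are 32 bits wide
COND_Z = 1         # code for .Z in the right-hand column of the table


class Flags(NamedTuple):
    z: bool  # zero
    c: bool  # carry (borrow on a compare)
    n: bool  # negative
    v: bool  # signed overflow


def condition_met(code: int, flags: Flags) -> bool:
    """Evaluate a 3-bit condition code against the current flags."""
    outcomes = (
        True,          # 3'h0: always
        flags.z,       # 3'h1: .Z
        flags.n,       # 3'h2: .LT
        flags.c,       # 3'h3: .C
        flags.v,       # 3'h4: .V
        not flags.z,   # 3'h5: .NZ
        not flags.n,   # 3'h6: .GE
        not flags.c,   # 3'h7: .NC
    )
    return outcomes[code & 0x7]


def to_signed(value: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    return value - (1 << 32) if value >> 31 else value


def compare(imm: int, reg: int) -> Flags:
    """Model CMP imm,reg: flags are set from reg - imm (assumed model)."""
    a, b = reg & MASK, imm & MASK
    diff = (a - b) & MASK
    exact = to_signed(a) - to_signed(b)
    overflow = not (-(1 << 31) <= exact < (1 << 31))
    return Flags(z=(diff == 0), c=(a < b), n=bool(diff >> 31), v=overflow)


def double_conditional(r0: int, r1: int) -> bool:
    """Return True when the final .Z instruction would execute."""
    flags = compare(1, r0)             # CMP 1,R0
    if condition_met(COND_Z, flags):   # CMP.Z 2,R1 only runs when Z is set
        flags = compare(2, r1)
    return condition_met(COND_Z, flags)
```

With these definitions, {\tt double\_conditional(1, 2)} is true, while any other pair of register values leaves {\tt Z} clear and the final conditional instruction is skipped.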

`In the case of VLIW instructions, only four conditions are defined as shown`
`in Tbl.~\ref{tbl:vliw-conditions}.`
`\begin{table}\begin{center}`
`\begin{tabular}{l\|l\|l}`
`Code & Mnemonic & Condition \\\hline`
`2'h0 & None & Always execute the instruction \\`
`2'h1 & {\tt .LT} & Less than ('N' set) \\`
`2'h2 & {\tt .Z} & Only execute when 'Z' is set \\`
`2'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\`
`\end{tabular}`
`\caption{VLIW Conditions}\label{tbl:vliw-conditions}`
`\end{center}\end{table}`
`Further, the first bit of the three is given a special meaning: If the first`
`bit is set, the conditions apply to the second half of the instruction,`
`otherwise the conditions will only apply to the first half of a conditional`
`instruction. Of course, the other conditions are still available by mingling`
`the non--VLIW instructions with VLIW instructions.`

`\subsection{Modifying Conditions}\label{sec:in-mcond}`	`\subsection{Modifying Conditions}\label{sec:in-mcond}`
`A quick look at the list of conditions supported by the ZipCPU and listed`	`A quick look at the list of conditions supported by the ZipCPU and listed`
`in Tbl.~\ref{tbl:conditions} reveals that the ZipCPU does not have a full set`	`in Tbl.~\ref{tbl:conditions} reveals that the ZipCPU does not have a full set`
`of conditions. In particular, only one explicit unsigned condition is`	`of conditions. In particular, only one explicit unsigned condition is`
`supported. Therefore, Tbl.~\ref{tbl:creating-conditions}`	`supported. Therefore, Tbl.~\ref{tbl:creating-conditions}`
`\begin{table}\begin{center}`	`\begin{table}\begin{center}`
`\begin{tabular}{\|l\|l\|l\|}\hline`	`\begin{tabular}{\|l\|l\|l\|}\hline`
`Original & Modified & Name \\\hline\hline`	`Original & Modified & Name \\\hline\hline`
`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1`	`\parbox[t]{1.5in}{\tt CMP Imm,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1`
`& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BLT label}`	`& \parbox[t]{1.5in}{\tt CMP 1+Imm,Ry\\BLT label}`
`& Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline`	`& Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline`
	`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1`
	`& \parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLT label\\BZ label}`
	`& Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline\hline`
	`\parbox[t]{1.5in}{\tt CMP Imm,Ry\\BGT label} % if (Ry > Rx) -> Rx < Ry`
	`& \parbox[t]{1.5in}{\tt CMP 1+Imm,Ry\\BGE label}`
	`& Greater-than (immediate) \\[4mm]\hline`
	`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGT label} % if (Ry > Rx) -> Rx < Ry`
	`& \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BLT label}`
	`& Greater-than (register) \\[4mm]\hline\hline`
	`\parbox[t]{1.5in}{\tt CMP Imm,Ry\\BLEU label}`
	`& \parbox[t]{1.5in}{\tt CMP 1+Imm,Ry\\BC label}`
	`& Less-than or equal unsigned immediate \\[4mm]\hline`
`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}`	`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}`
`& \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BC label}`	`& \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BNC label}`
`& Less-than or equal unsigned \\[4mm]\hline`	`& Less-than or equal unsigned register\\[4mm]\hline\hline`
	`\parbox[t]{1.5in}{\tt CMP Imm,Ry\\BGTU label} % if (Ry > Rx) -> Rx < Ry`
	`& \parbox[t]{1.5in}{\tt CMP 1+Imm,Ry\\BNC label}`
	`& Greater-than unsigned (immediate)\\[4mm]\hline`
`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label} % if (Ry > Rx) -> Rx < Ry`	`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label} % if (Ry > Rx) -> Rx < Ry`
`& \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}`	`& \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}`
`& Greater-than unsigned \\[4mm]\hline`	`& Greater-than unsigned \\[4mm]\hline`
`\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGEU label} % if (Ry >= Rx) -> Rx <= Ry -> Rx < Ry+1`
`& \parbox[t]{1.5in}{\tt CMP 1+Ry,Rx\\BC label}`
`& Greater-than equal unsigned \\[4mm]\hline`
`\parbox[t]{1.5in}{\tt CMP A+Rx,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A`
`& \parbox[t]{1.5in}{\tt CMP (1-A)+Ry,Rx\\BC label}`
`& Greater-than equal unsigned (with offset)\\[4mm]\hline`
`\parbox[t]{1.5in}{\tt CMP A,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A`
`& \parbox[t]{1.5in}{\tt LDI (A-1),Rx\\CMP Ry,Rx\\BC label}`
`& Greater-than equal comparison with a constant\\[4mm]\hline`
`\end{tabular}`	`\end{tabular}`
`\caption{Modifying conditions}\label{tbl:creating-conditions}`	`\caption{Modifying conditions}\label{tbl:creating-conditions}`
`\end{center}\end{table}`	`\end{center}\end{table}`
`shows examples of how these unsupported conditions can be created`	`shows examples of how these unsupported conditions can be created`
`simply by adjusting the compare instruction, for no extra cost in clocks.`	`simply by adjusting the compare instruction, for no extra cost in clocks.`
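The rewrites in the newer column of Tbl.~\ref{tbl:creating-conditions} can be checked by brute force. The Python sketch below enumerates every operand pair at a deliberately small width and confirms that each modified sequence branches exactly when the original condition holds. The helper names are hypothetical, comparisons are modeled on ideal integers (so flag overflow is ignored), and the largest immediate is excluded because {\tt 1+Imm} would no longer fit in the chosen width.

```python
from itertools import product

WIDTH = 4  # small enough to enumerate every case
SIGNED = range(-(1 << (WIDTH - 1)), 1 << (WIDTH - 1))
UNSIGNED = range(0, 1 << WIDTH)


# Each helper models "CMP a,b" followed by a branch. The compare sets its
# flags from b - a, so the branch tests b against a.
def blt(a, b):   # signed less-than
    return b < a


def bge(a, b):   # signed greater-than or equal
    return b >= a


def bc(a, b):    # carry set: unsigned less-than (call with unsigned values)
    return b < a


def bnc(a, b):   # carry clear: unsigned greater-than or equal
    return b >= a


def check_rewrites():
    """Assert every rewrite is equivalent; return the number of checks."""
    checked = 0
    for imm, ry in product(SIGNED[:-1], SIGNED):
        assert (ry <= imm) == blt(imm + 1, ry)   # BLE -> CMP 1+Imm,Ry; BLT
        assert (ry > imm) == bge(imm + 1, ry)    # BGT -> CMP 1+Imm,Ry; BGE
        checked += 2
    for rx, ry in product(SIGNED, SIGNED):
        assert (ry <= rx) == (blt(rx, ry) or ry == rx)  # BLE -> BLT; BZ
        assert (ry > rx) == blt(ry, rx)          # BGT -> CMP Ry,Rx; BLT
        checked += 2
    for imm, ry in product(UNSIGNED[:-1], UNSIGNED):
        assert (ry <= imm) == bc(imm + 1, ry)    # BLEU -> CMP 1+Imm,Ry; BC
        assert (ry > imm) == bnc(imm + 1, ry)    # BGTU -> CMP 1+Imm,Ry; BNC
        checked += 2
    for rx, ry in product(UNSIGNED, UNSIGNED):
        assert (ry <= rx) == bnc(ry, rx)         # BLEU -> CMP Ry,Rx; BNC
        assert (ry > rx) == bc(ry, rx)           # BGTU -> CMP Ry,Rx; BC
        checked += 2
    return checked
```

Calling {\tt check\_rewrites()} raises an {\tt AssertionError} if any rewrite disagrees with its original condition, and otherwise returns the number of comparisons made.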
Line 1048...	Line 984...
`In those cases where a fourteen or eighteen bit immediate doesn't make sense,`	`In those cases where a fourteen or eighteen bit immediate doesn't make sense,`
`such as for {\tt LDILO}, the extra bits associated with the immediate are`	`such as for {\tt LDILO}, the extra bits associated with the immediate are`
`simply ignored. (This rule does not apply to the shift instructions,`	`simply ignored. (This rule does not apply to the shift instructions,`
`{\tt ASR}, {\tt LSR}, and {\tt LSL}--which all use all of their immediate bits.)`	`{\tt ASR}, {\tt LSR}, and {\tt LSL}--which all use all of their immediate bits.)`

`VLIW instructions still use the same operand B as regular instructions, only`
`there was no room for any instruction plus immediate addressing. Therefore,`
`VLIW instructions have either`
`a register or a 4--bit signed immediate as their operand B. The only exception`
`is the load immediate instruction, which permits a 5--bit signed operand`
`B.\footnote{Although the space exists to extend this VLIW load immediate`
`instruction to six bits, the 5--bit limit was chosen to simplify the`
`disassembler. This may change in the future.}`

`\subsection{Address Modes}\label{sec:isa-addr}`	`\subsection{Address Modes}\label{sec:isa-addr}`
`The ZipCPU supports two addressing modes: register plus immediate, and`	`The ZipCPU supports two addressing modes: register plus immediate, and`
`immediate addressing. Addresses are encoded in the same fashion as`	`immediate addressing. Addresses are encoded in the same fashion as`
`Operand B's, discussed above.`	`Operand B's, discussed above.`

`The VLIW instruction set only offers register addressing.`

`\subsection{Move Operands}\label{sec:isa-mov}`	`\subsection{Move Operands}\label{sec:isa-mov}`
`The previous set of operands would be perfect and complete, save only that the`	`The previous set of operands would be perfect and complete, save only that the`
`CPU needs access to non--supervisory registers while in supervisory mode. The`	`CPU needs access to non--supervisory registers while in supervisory mode. The`
`MOV instruction has been modified to fit that purpose. The two bits,`	`MOV instruction has been modified to fit that purpose. The two bits,`
`shown as {\tt A} and {\tt B} in Fig.~\ref{fig:iset-format} above, are designed`	`shown as {\tt A} and {\tt B} in Fig.~\ref{fig:iset-format} above, are designed`
Line 1127...	Line 1052...
`exception, the divide by zero bit will be set in the CC register. In the`	`exception, the divide by zero bit will be set in the CC register. In the`
`case of a user mode divide by zero, this will be cleared by any return to user`	`case of a user mode divide by zero, this will be cleared by any return to user`
`mode command. The supervisor bit may be cleared either by a reboot or by the`	`mode command. The supervisor bit may be cleared either by a reboot or by the`
`external debugger.`	`external debugger.`

`\subsection{NOOP, BREAK, and Bus LOCK Instruction}`	`\section{CIS Instructions}`
`Three instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes}, are`
`somewhat special. These are the {\tt NOOP}, {\tt BREAK}, and bus {\tt LOCK}`	`The ZipCPU also supports a compressed instruction set (CIS), outlined in`
`instructions. These are encoded according to`	`Fig.~\ref{fig:iset-cis},`
	`\begin{figure}\begin{center}`
	`\begin{bytefield}[endianness=big]{16}`
	`\bitheader{0-15}\\`
	`\bitbox[lrt]{1}{}\bitbox[lrt]{4}{}`
	`\bitbox[lrt]{3}{COp}`
	`\bitbox{1}{0}`
	`\bitbox{7}{Imm.} \\`
	`\bitbox[lr]{1}{1}\bitbox[lr]{4}{DR}`
	`\bitbox[lrb]{3}{}`
	`\bitbox{1}{1}`
	`\bitbox{4}{BR}`
	`\bitbox{3}{Imm} \\`
	`\bitbox[lr]{1}{}\bitbox[lr]{4}{}`
	`\bitbox{3}{\tt LDI}`
	`\bitbox{8}{8'b Imm} \\`
	`\bitbox[lrb]{1}{}\bitbox[lrb]{4}{}`
	`\bitbox{3}{\tt MOV}`
	`\bitbox{1}{1}`
	`\bitbox{4}{BR}`

Line 203...

Line 205...

more.\footnote{A not--so integrated MMU is currently under development.}

more.\footnote{A not--so integrated MMU is currently under development.}

For those who like buzz words, the ZipCPU is:

For those who like buzz words, the ZipCPU is:

\begin{itemize}

\begin{itemize}

\item A 32-bit CPU: All registers are 32-bits, addresses are 32-bits,

\item A 32-bit CPU: All registers are 32-bits, addresses are 32-bits,

                instructions are 32-bits wide, etc.  Indeed, the ``byte size''

                instructions are 32-bits wide, etc.

                for this processor, as per the C--language definition of a

                ``byte'' being the smallest addressable unit, is 32--bits.

\item A RISC CPU.  There is no microcode for executing instructions.  All

\item A RISC CPU.  There is no microcode for executing instructions.  All

        instructions are designed to be completed in one clock cycle.

        instructions are designed to be completed in one clock cycle.

\item A Load/Store architecture.  (Only load and store instructions

\item A Load/Store architecture.  (Only load and store instructions

                can access memory.)

                can access memory.)

\item Wishbone compliant.  All peripherals are accessed just like

\item Wishbone compliant.  All peripherals are accessed just like

Line 454...

so it can be overridden upon instantiation.

so it can be overridden upon instantiation.

Given the performance benefits achieved by early branching, setting this flag

Given the performance benefits achieved by early branching, setting this flag

is highly recommended.

is highly recommended.

{\tt OPT\_PIPELINED\_BUS\_ACCESS} controls whether or not {\tt LOD}/{\tt STO}

{\tt OPT\_PIPELINED\_BUS\_ACCESS} controls whether or not memory

instructions can take advantage of the pipelined wishbone bus.  To be

instructions can take advantage of the pipelined wishbone bus.  To be

eligible, the operations to be pipelined must be adjacent, must be all

eligible, the operations to be pipelined must be adjacent, must be all

{\tt LOD}s or all {\tt STO}s, and the addresses must all use the same base

loads or all stores, and the addresses must all use the same base

address register and either have identical immediate offsets, or immediate

address register and either have identical immediate offsets, or immediate

offsets that increase by one for each instruction.  Further, the

offsets that increase by one for each instruction.  Further, the

{\tt LOD}/{\tt STO} string of instructions must all have the same conditional

string of load (or store) instructions must all have the same conditional

(if any).  Currently, this approach and benefit is most effectively used

(if any).  Currently, this approach and benefit is most effectively used

when saving registers to or restoring registers from the stack at the

when saving registers to or restoring registers from the stack at the

beginning/end of a procedure, when using assembly optimized programs, or

beginning/end of a procedure, when using assembly optimized programs, or

when doing a context swap.

when doing a context swap.

I recommend setting this flag, for performance reasons, especially if your

I recommend setting this flag, for performance reasons, especially if your

wishbone bus implementation can handle pipelined bus accesses.  The logic

wishbone bus implementation can handle pipelined bus accesses.  The logic

impact of this setting is minimal, while the performance impact can be significant.

impact of this setting is minimal, while the performance impact can be significant.

{\tt OPT\_VLIW} includes within the instruction set the Very Long Instruction

{\tt OPT\_CIS} includes within the instruction set the Very Long Instruction

Word packing, which packs up to two instructions within each instruction word.

Word packing, which packs up to two instructions within each instruction word.

Non--packed instructions will still execute as normal, this just enables the

Non--packed instructions will still execute as normal, this just enables the

decoding and running of packed instructions.

decoding and running of packed instructions.

The two next options, {\tt INCLUDE\_DMA\_CONTROLLER} and

The two next options, {\tt INCLUDE\_DMA\_CONTROLLER} and

Line 482...

control whether the DMA controller is included in the ZipSystem, and

control whether the DMA controller is included in the ZipSystem, and

whether or not the eight accounting timers are also included.  Set these to

whether or not the eight accounting timers are also included.  Set these to

include the respective peripherals, comment them out not to.  These only

include the respective peripherals, comment them out not to.  These only

affect the ZipSystem implementation, and not any ZipBones implementations.

affect the ZipSystem implementation, and not any ZipBones implementations.

Finally, if you find yourself needing to debug the core and specifically needing

Finally, if you find yourself needing to debug the core and specifically

to get a trace from the core to find out why something specifically failed,

needing to get a trace from the core to find out why something specifically

you may find it useful to define {\tt DEBUG\_SCOPE}.  This will add a 32--bit

failed, you may find it useful to define {\tt DEBUG\_SCOPE}.  This will add a

debug output from the core, as the last argument to the core, to the ZipSystem,

32--bit debug output from the core, as the last argument to the core, to the

or even to ZipBones.  The actual definition and composition of this debugging

ZipSystem, or even to ZipBones.  The actual definition and composition of

bit--field changes from one implementation to the next, depending upon needs

this debugging bit--field changes from one implementation to the next,

and necessities, so please look at the code at the bottom of {\tt zipcpu.v}

depending upon needs and necessities, so please look at the code at the

for more details.

bottom of {\tt zipcpu.v} for more details.

That ends our discussion of CPU options, but there remain several implementation

That ends our discussion of CPU options, but there remain several

parameters that can be defined with the CPU as well.  Some of these, such as

implementation parameters that can be defined with the CPU as well.  Some of

{\tt IMPLEMENT\_MPY}, {\tt IMPLEMENT\_DIVIDE}, {\tt IMPLEMENT\_FPU}, and

these, such as {\tt IMPLEMENT\_MPY}, {\tt IMPLEMENT\_DIVIDE},

{\tt EARLY\_BRANCHING} have already been discussed. The remainder shall be

{\tt IMPLEMENT\_FPU}, and {\tt EARLY\_BRANCHING} have already been discussed.

discussed quickly here.

The remainder shall be discussed quickly here.

The {\tt RESET\_ADDRESS} parameter controls what address the CPU attempts to

The {\tt RESET\_ADDRESS} parameter controls what address the CPU attempts to

fetch its first instruction from upon any CPU reset.  The default value is

fetch its first instruction from upon any CPU reset.  The default value is

not likely to be particularly useful, so overriding the default is recommended

not likely to be particularly useful, so overriding the default is recommended

for every implementation.

for every implementation.

The {\tt ADDRESS\_WIDTH} parameter can be used to trim down the width of

The {\tt ADDRESS\_WIDTH} parameter can be used to trim down the width of

addresses used by the CPU.  For example, although the Wishbone Bus definition

addresses used by the CPU.  For example, although the Wishbone Bus definition

used by the CPU has 32 address lines, particular implementations may have

used by the CPU has 30 address lines, particular implementations may have

fewer.  By setting this value to the actual number of wires in the address

fewer.  By setting this value to the actual number of wires in the address

bus, some logic can be spared within the CPU.  The default is a 32--bit wide

bus, some logic can be spared within the CPU.  The default is also the maximum,

bus.

a 30--bit address width.  Two additional bits are used internally by the CPU

to create the appearance of an 8--bit bus, by using the wishbone select lines.
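
The relationship between a byte address, the word address placed on the bus, and the select lines can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name is hypothetical, and the mapping of byte offsets to select bits (offset zero on the most significant lane when {\tt big\_endian} is true) is an assumption of the example, not something this section specifies.

```python
def split_byte_address(byte_addr, address_width=30, big_endian=True):
    """Split a byte address into (word address, 4-bit byte select).

    The upper bits form the word address, trimmed to address_width bits.
    The two low bits choose one of the four byte lanes of a 32-bit word.
    """
    word_addr = (byte_addr >> 2) & ((1 << address_width) - 1)
    lane = byte_addr & 0x3
    # Assumed lane ordering: see the hedge in the text above.
    select = (0b1000 >> lane) if big_endian else (0b0001 << lane)
    return word_addr, select
```

For example, byte address 5 falls in word 1, at byte offset 1 within that word. A narrower {\tt address\_width} simply discards the upper word-address bits.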

The {\tt LGICACHE} parameter specifies the log base two of the instruction

The {\tt LGICACHE} parameter specifies the log base two of the instruction

cache size.  If no instruction cache is used, this option has no effect.

cache size.  If no instruction cache is used, this option has no effect.

Otherwise it sets the size of the instruction cache to be

Otherwise it sets the size of the instruction cache to be

$2^{\mbox{\tiny\tt LGICACHE}}$ words.  The traditional prefetch cache, if used,

$2^{\mbox{\tiny\tt LGICACHE}}$ words.  The traditional prefetch cache, if used,

Line 527...

Line 528...

The {\tt START\_HALTED} parameter, if set to non--zero, will cause the

The {\tt START\_HALTED} parameter, if set to non--zero, will cause the

CPU to be halted upon startup.  This is useful for debugging, since it prevents

CPU to be halted upon startup.  This is useful for debugging, since it prevents

the CPU from doing anything without supervision.  Of course, once all pieces

the CPU from doing anything without supervision.  Of course, once all pieces

of your design are in place and proven, you'll probably want to set this to

of your design are in place and proven, you'll probably want to set this to

zero.

zero, so that the CPU will then start up immediately upon power up.

The {\tt EXTERNAL\_INTERRUPTS} parameter controls the number of interrupt

The {\tt EXTERNAL\_INTERRUPTS} parameter controls the number of interrupt

wires coming into the CPU.  This number must be between one and sixteen,

wires coming into the CPU.  This number must be between one and sixteen,

or if the performance counters are disabled, between one and twenty four.

or if the performance counters are disabled, between one and twenty four.

Line 584...

Line 585...

In each register set, the Program Counter (PC) is register 15, whereas

In each register set, the Program Counter (PC) is register 15, whereas

the status register (SR) or condition code register (CC) is register 14.  All

the status register (SR) or condition code register (CC) is register 14.  All

other registers are identical in their hardware functionality.\footnote{Jumps

other registers are identical in their hardware functionality.\footnote{Jumps

to {\tt R0}, an instruction used to implement a return from a subroutine, may

to {\tt R0}, an instruction used to implement a return from a subroutine, may

be optimized in the future within the early branch logic.} By convention, the

be optimized in the future within the early branch logic.} By convention, the

stack pointer is register 13 and noted as (SP)--although there is nothing

stack pointer is register 13 and noted as (SP).  Beyond this convention,

special about this register other than this convention.  Also by convention, if

word accesses to offsets of the stack pointer are compressed when using the

the compiler needs a frame pointer it will be placed into register~12, and may

CIS instruction set.  Also by convention, if the compiler needs a frame

be abbreviated by FP.  Finally, by convention, R0 will hold a subroutine's

pointer it will be placed into register~12, and may be abbreviated by FP.

return address, sometimes called the link register (LR).

Finally, by convention, R0 will hold a subroutine's return address, sometimes

called the link register (LR).

When the CPU is in supervisor mode, instructions can access both register sets

When the CPU is in supervisor mode, instructions can access both register sets

via the {\tt MOV} instruction, whereas when the CPU is in user mode, {\tt MOV}

via the {\tt MOV} instruction, whereas when the CPU is in user mode, {\tt MOV}

instructions will only offer access to user registers.  We'll discuss this

instructions will only offer access to user registers.  We'll discuss this

further in Sec.~\ref{sec:isa-mov}.

further in Sec.~\ref{sec:isa-mov}.

Line 604...

Line 606...

\begin{bitlist}

\begin{bitlist}

31\ldots 23 & R & Reserved for future uses\\\hline

31\ldots 23 & R & Reserved for future uses\\\hline

22\ldots 16 & R/W & Reserved for future uses\\\hline

22\ldots 16 & R/W & Reserved for future uses\\\hline

15 & R & Reserved for MMU exceptions\\\hline

15 & R & Reserved for MMU exceptions\\\hline

14 & W & Clear I-Cache command, always reads zero\\\hline

14 & W & Clear I-Cache command, always reads zero\\\hline

13 & R & VLIW instruction phase (1 for first half)\\\hline

13 & R & CIS instruction phase (1 for first half)\\\hline

12 & R & (Reserved for) Floating Point Exception\\\hline

12 & R & (Reserved for) Floating Point Exception\\\hline

11 & R & Division by Zero Exception\\\hline

11 & R & Division by Zero Exception\\\hline

10 & R & Bus-Error Flag\\\hline

10 & R & Bus-Error Flag\\\hline

9 & R & Trap Flag (or user interrupt).  Cleared on return to userspace.\\\hline

9 & R & Trap Flag (or user interrupt).  Cleared on return to userspace.\\\hline

8 & R & Illegal Instruction Flag\\\hline

8 & R & Illegal Instruction Flag\\\hline

Line 737...

Line 739...

\item The thirteenth bit will operate in a similar fashion to both the bus

\item The thirteenth bit will operate in a similar fashion to both the bus

        error and division by zero flags, only it will be set upon a (yet to

        error and division by zero flags, only it will be set upon a (yet to

        be determined) floating point error.

        be determined) floating point error.

\item In the case of VLIW instructions, if an exception occurs after the first

\item In the case of CIS instructions, if an exception occurs after the first

        instruction but before the second, the fourteenth bit of the CC register

        instruction but before the second, the fourteenth bit of the CC

        will be set to indicate this fact.

        register will be set to indicate this fact.  This can be combined with

        the user PC to determine the address of the half-word where the fault occurred.

\item The fifteenth bit references a clear cache bit.  The supervisor may

\item The fifteenth bit references a clear cache bit.  The supervisor may

        write a one to this bit in order to clear the CPU instruction cache.

        write a one to this bit in order to clear the CPU instruction cache.

        The bit always reads as a zero.

        The bit always reads as a zero.

Line 760...

Line 763...

All ZipCPU instructions fit in one of the formats shown in

All ZipCPU instructions fit in one of the formats shown in

Fig.~\ref{fig:iset-format}.

Fig.~\ref{fig:iset-format}.

\begin{figure}\begin{center}

\begin{figure}\begin{center}

\begin{bytefield}[endianness=big]{32}

\begin{bytefield}[endianness=big]{32}

\bitheader{0-31}\\

\bitheader{0-31}\\

\begin{leftwordgroup}{Standard}\bitbox{1}{0}\bitbox{4}{DR}

\begin{leftwordgroup}{Standard}\bitbox{1}{0}\bitbox[tlr]{4}{}

                \bitbox[lrt]{5}{OpCode}

                \bitbox[lrt]{5}{OpCode}

                \bitbox[lrt]{3}{Cnd}

                \bitbox[lrt]{3}{}

                \bitbox{1}{0}

                \bitbox{1}{0}

                \bitbox{18}{18-bit Signed Immediate} \\

                \bitbox{18}{18-bit Signed Immediate} \\

\bitbox{1}{0}\bitbox{4}{DR}

\bitbox{1}{0}\bitbox[lr]{4}{DR}

                \bitbox[lrb]{5}{}

                \bitbox[lrb]{5}{}

                \bitbox[lrb]{3}{}

                \bitbox[lr]{3}{Cnd}

                \bitbox{1}{1}

                \bitbox{1}{1}

                \bitbox{4}{BR}

                \bitbox{4}{BR}

                \bitbox{14}{14-bit Signed Immediate}\end{leftwordgroup} \\

                \bitbox{14}{14-bit Signed Immediate}\end{leftwordgroup} \\

\begin{leftwordgroup}{MOV}\bitbox{1}{0}\bitbox{4}{DR}

\begin{leftwordgroup}{MOV}\bitbox{1}{0}\bitbox[lr]{4}{}

                \bitbox[lrt]{5}{5'hf}

                \bitbox[lrt]{5}{5'hf}

                \bitbox[lrt]{3}{Cnd}

                \bitbox[lrb]{3}{}

                \bitbox{1}{A}

                \bitbox{1}{A}

                \bitbox{4}{BR}

                \bitbox{4}{BR}

                \bitbox{1}{B}

                \bitbox{1}{B}

                \bitbox{13}{13-bit Signed Immediate}\end{leftwordgroup} \\

                \bitbox{13}{13-bit Signed Immediate}\end{leftwordgroup} \\

\begin{leftwordgroup}{LDI}\bitbox{1}{0}\bitbox{4}{DR}

\begin{leftwordgroup}{LDI}\bitbox{1}{0}\bitbox[lrb]{4}{}

                \bitbox{4}{4'hb}

                \bitbox{4}{4'hc}

                \bitbox{23}{23-bit Signed Immediate}\end{leftwordgroup} \\

                \bitbox{23}{23-bit Signed Immediate}\end{leftwordgroup} \\

\begin{leftwordgroup}{NOOP}\bitbox{1}{0}\bitbox{3}{3'h7}

\begin{leftwordgroup}{NOOP}\bitbox{1}{0}\bitbox{3}{3'h7}

                \bitbox{1}{}

                \bitbox{1}{}

                \bitbox{2}{11}

                \bitbox{2}{11}

                \bitbox{3}{xxx}

                \bitbox{3}{xxx}

                \bitbox{22}{Ignored}

                \bitbox{22}{Ignored}

                \end{leftwordgroup} \\

                \end{leftwordgroup} \\

\begin{leftwordgroup}{VLIW}\bitbox{1}{1}\bitbox[lrt]{4}{DR}

                \bitbox[lrt]{5}{OpCode}

                \bitbox[lrt]{3}{Cnd}

                \bitbox{1}{0}

                \bitbox{4}{Imm.}

                \bitbox{14}{---} \\

\bitbox{1}{1}\bitbox[lr]{4}{}

                \bitbox[lrb]{5}{}

                \bitbox[lr]{3}{}

                \bitbox{1}{1}

                \bitbox{4}{BR}

                \bitbox{14}{---}        \\

\bitbox{1}{1}\bitbox[lrb]{4}{}

                \bitbox{4}{4'hb}

                \bitbox{1}{}

                \bitbox[lrb]{3}{}

                \bitbox{5}{5'b Imm}

                \bitbox{14}{---}        \\

\bitbox{1}{1}\bitbox{9}{---}

                \bitbox[lrt]{3}{Cnd}

                \bitbox{5}{---}

                \bitbox[lrt]{4}{DR}

                \bitbox[lrt]{5}{OpCode}

                \bitbox{1}{0}

                \bitbox{4}{Imm}

\\

\bitbox{1}{1}\bitbox{9}{---}

                \bitbox[lr]{3}{}

                \bitbox{5}{---}

                \bitbox[lr]{4}{}

                \bitbox[lrb]{5}{}

                \bitbox{1}{1}

                \bitbox{4}{Reg} \\

\bitbox{1}{1}\bitbox{9}{---}

                \bitbox[lrb]{3}{}

                \bitbox{5}{---}

                \bitbox[lrb]{4}{}

                \bitbox{4}{4'hb}

                \bitbox{1}{}

                \bitbox{5}{5'b Imm}

                \end{leftwordgroup} \\

\end{bytefield}

\end{bytefield}

\caption{Zip Instruction Set Format}\label{fig:iset-format}

\caption{Zip Instruction Set Format}\label{fig:iset-format}

\end{center}\end{figure}

\end{center}\end{figure}

The basic format is that some operation, defined by the OpCode, is applied

The basic format is that some operation, defined by the OpCode, is applied

if a condition, Cnd, is true in order to produce a result which is placed in

if a condition, Cnd, is true in order to produce a result which is placed in

the destination register (DR).  There are three basic exceptions to this

the destination register (DR).

model.  The first is the {\tt MOV} instruction, which steals bits~13 and~18

to allow supervisor access to user registers.  The second is the load 23--bit

There are three basic exceptions to this general instruction model.  The

first is the {\tt MOV} instruction, which steals bits~13 and~18

to allow supervisor access to user registers.  In supervisor mode, these

are set to one to reference user registers, zero otherwise.  They are ignored

in user mode.  The second exception is the load 23--bit

signed immediate instruction ({\tt LDI}), in that it accepts no conditions and

signed immediate instruction ({\tt LDI}), in that it accepts no conditions and

uses only a 4-bit opcode.  The last exception is the {\tt NOOP} instruction

uses only a 4-bit opcode.  The last exception is the {\tt NOOP} instruction

group, containing the {\tt NOOP}, {\tt BREAK}, and {\tt LOCK} opcodes.  These

group, containing the {\tt BREAK}, {\tt LOCK}, {\tt SIM}, and {\tt NOOP}

instructions ignore their register and immediate settings.\footnote{A future

opcodes.  These instructions ignore their register and immediate settings.

version of the CPU may repurpose the immediate bits within the {\tt NOOP}

Further, the immediate bits used by these opcodes are available for simulation

instruction to be simulator commands, while the immediate/register bits within

or debug facilities, but otherwise ignored by the CPU.

the {\tt BREAK} instruction may be used by the debugger for whatever purpose

it chooses to use them for--such as a breakpoint table index.}

The ZipCPU also supports a very long instruction word (VLIW) set of

instructions.  These aren't truly VLIW instructions in the sense that the CPU

still only issues one instruction at a time, but they do pack two instructions

into a single instruction word.  The number of bits used by the immediate field

is adjusted to make space for these instruction words.  Other than instruction

format, the only basic difference between VLIW and normal instructions is that

the CPU will not switch to interrupt mode in between the two instructions,

unless an exception is generated by the first instruction.  Likewise a new job

given to the assembler is that of automatically packing as many instructions as

possible into the VLIW format.

The disassembler will represent VLIW instructions by placing a vertical bar

between the two components, but still leaving them on the same line.

\subsection{Instruction OpCodes}\label{sec:isa-opcodes}

\subsection{Instruction OpCodes}\label{sec:isa-opcodes}

With a 5--bit opcode field, there are 32~possible instructions, as shown in

With a 5--bit opcode field, there are 32~possible instructions, as shown in

Tbl.~\ref{tbl:iset-opcodes}.

Tbl.~\ref{tbl:iset-opcodes}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabular}{|l|l|l|c|} \hline \rowcolor[gray]{0.85}

\begin{tabular}{|l|l|l|l|c|} \hline \rowcolor[gray]{0.85}

OpCode & & Instruction &Sets CC \\\hline\hline

OpCode & & A-Reg & Instruction &Sets CC \\\hline\hline

5'h00 & {\tt SUB} & Subtract &   \\\cline{1-3}

5'h00 & {\tt SUB} & \multicolumn{2}{l|}{Subtract} &   \\\cline{1-4}

5'h01 & {\tt AND} & Bitwise And &   \\\cline{1-3}

5'h01 & {\tt AND} & \multicolumn{2}{l|}{Bitwise And} &   \\\cline{1-4}

5'h02 & {\tt ADD} & Add two numbers &   \\\cline{1-3}

5'h02 & {\tt ADD} & \multicolumn{2}{l|}{Add two numbers} &   \\\cline{1-4}

5'h03 & {\tt OR}  & Bitwise Or & Y \\\cline{1-3}

5'h03 & {\tt OR}  & \multicolumn{2}{l|}{Bitwise Or} & Y \\\cline{1-4}

5'h04 & {\tt XOR} & Bitwise Exclusive Or &   \\\cline{1-3}

5'h04 & {\tt XOR} & \multicolumn{2}{l|}{Bitwise Exclusive Or} &   \\\cline{1-4}

5'h05 & {\tt LSR} & Logical Shift Right &   \\\cline{1-3}

5'h05 & {\tt LSR} & \multicolumn{2}{l|}{Logical Shift Right} &   \\\cline{1-4}

5'h06 & {\tt LSL} & Logical Shift Left &   \\\cline{1-3}

5'h06 & {\tt LSL} & \multicolumn{2}{l|}{Logical Shift Left} &   \\\cline{1-4}

5'h07 & {\tt ASR} & Arithmetic Shift Right &   \\\hline

5'h07 & {\tt ASR} & \multicolumn{2}{l|}{Arithmetic Shift Right} &   \\\hline

5'h08 & {\tt MPY} & 32x32 bit multiply & Y \\\hline

5'h09 & {\tt LDILO} & Load Immediate Low & N\\\hline

5'h08 & {\tt BREV} & \multicolumn{2}{l|}{Bit Reverse B operand into result}&  \\\cline{1-4}

5'h0a & {\tt MPYUHI} & Upper 32 of 64 bits from an unsigned 32x32 multiply &  \\\cline{1-3}

5'h09 & {\tt LDILO} & \multicolumn{2}{l|}{Load Immediate Low} & N\\\hline

5'h0b & {\tt MPYSHI} & Upper 32 of 64 bits from a signed 32x32 multiply & Y \\\cline{1-3}

5'h0a & {\tt MPYUHI} & \multicolumn{2}{l|}{Upper 32 of 64 bits from an unsigned 32x32 multiply} &  \\\cline{1-4}

5'h0c & {\tt BREV} & Bit Reverse B operand into result&  \\\cline{1-3}

5'h0b & {\tt MPYSHI} & \multicolumn{2}{l|}{Upper 32 of 64 bits from a signed 32x32 multiply} & Y \\\cline{1-4}

5'h0d & {\tt POPC}& Population Count &  \\\cline{1-3}

5'h0c & {\tt MPY} & \multicolumn{2}{l|}{32x32 bit multiply} & \\\hline

5'h0e & {\tt ROL} & Rotate Ra left by OpB bits&   \\\hline

5'h0d & {\tt MOV} & \multicolumn{2}{l|}{Move OpB into Ra} & N \\\hline

5'h0f & {\tt MOV} & Move OpB into Ra & N \\\hline

5'h0e & {\tt DIVU} & R0-R13 & Divide, unsigned & Y \\\cline{1-4}

5'h10 & {\tt CMP} & Compare (Ra-OpB) to zero & Y \\\cline{1-3}

5'h0f & {\tt DIVS} & R0-R13 & Divide, signed &  \\\hline\hline

5'h11 & {\tt TST} & Test (AND w/o setting result) &   \\\hline

5'h12 & {\tt LOD} & Load Ra from memory (OpB) & N \\\cline{1-3}

5'h10 & {\tt CMP} & \multicolumn{2}{l|}{Compare (Ra-OpB) to zero} & Y \\\cline{1-4}

5'h13 & {\tt STO} & Store Ra into memory at (OpB) &  \\\hline\hline

5'h11 & {\tt TST} & \multicolumn{2}{l|}{Test (AND w/o setting result)} &   \\\hline

5'h14 & {\tt DIVU} & Divide, unsigned & Y \\\cline{1-3}

5'h12 & {\tt LW} & \multicolumn{2}{l|}{Load a 32-bit word from memory (OpB) into Ra} & \\\cline{1-4}

5'h15 & {\tt DIVS} & Divide, signed &  \\\hline\hline

5'h13 & {\tt SW} & \multicolumn{2}{l|}{Store a 32-bit word from Ra into memory at (OpB)} &  \\\cline{1-4}

5'h16/7 & {\tt LDI} & Load 23--bit signed immediate & N \\\hline\hline

5'h14 & {\tt LH} & \multicolumn{2}{l|}{Load 16-bits from memory (OpB) into Ra, clear upper 16 bits} & N \\\cline{1-4}

5'h18 & {\tt FPADD} & Floating point add &  \\\cline{1-3}

5'h15 & {\tt SH} & \multicolumn{2}{l|}{Store the lower 16-bits of Ra into memory at (OpB)} &  \\\cline{1-4}

5'h19 & {\tt FPSUB} & Floating point subtract &   \\\cline{1-3}

5'h16 & {\tt LB} & \multicolumn{2}{l|}{Load 8-bits from memory (OpB) into Ra, clear upper 24 bits} & \\\cline{1-4}

5'h1a & {\tt FPMPY} & Floating point multiply & Y \\\cline{1-3}

5'h17 & {\tt SB} & \multicolumn{2}{l|}{Store the lower 8-bits of Ra into memory at (OpB)} &  \\\hline\hline

5'h1b & {\tt FPDIV} & Floating point divide &   \\\cline{1-3}

5'h18/9 & {\tt LDI} & \multicolumn{2}{l|}{Load 23--bit signed immediate} & N \\\hline\hline

5'h1c & {\tt FPI2F} & Convert integer to floating point &   \\\cline{1-3}

5'h1a & {\tt FPADD} & R0-R13 & Floating point add &  \\\cline{1-4}

5'h1d & {\tt FPF2I} & Convert floating point to integer &   \\\hline

5'h1b & {\tt FPSUB} & R0-R13 & Floating point subtract &   \\\cline{1-4}

5'h1e & & {\em Reserved for future use} &\\\hline

5'h1c & {\tt FPMPY} & R0-R13 & Floating point multiply & Y \\\cline{1-4}

5'h1f & & {\em Reserved for future use} &\\\hline

5'h1d & {\tt FPDIV} & R0-R13 & Floating point divide &   \\\cline{1-4}

5'h18 & & \hbox to 0.5in{\tt NOOP}  (A-register = PC)&\\\cline{1-3}

5'h1e & {\tt FPI2F} & R0-R13 & Convert integer to floating point &   \\\cline{1-4}

5'h19 & & \hbox to 0.5in{\tt BREAK} (A-register = PC)& N\\\cline{1-3}

5'h1f & {\tt FPF2I} & R0-R13 & Convert floating point to integer &   \\\hline\hline

5'h1a & & \hbox to 0.5in{\tt LOCK}  (A-register = PC)&\\\hline

5'h1c & {\tt BREAK} &None(15)&& \\\cline{1-4}

5'h1d & {\tt LOCK} &None(15)&& N\\\cline{1-4}

5'h1e & {\tt SIM}  &None(15)&&\\\cline{1-4}

5'h1f & {\tt NOOP} &None(15)&&\\\hline

\end{tabular}

\end{tabular}

\caption{ZipCPU OpCodes}\label{tbl:iset-opcodes}

\caption{ZipCPU OpCodes}\label{tbl:iset-opcodes}

\end{center}\end{table}

\end{center}\end{table}

Of these opcodes, {\tt ROL} and {\tt POPC} are experimental and may be

replaced in future revisions.  (If you have a reason to like or wish to keep

these opcodes, please contact me.  If you know of alternatives that might be

better, please let me know as well.)  There is also room for six more

register-less instructions in the {\tt NOOP} instruction space,

and two floating point instruction opcodes have been reserved for future use.

\subsection{Conditional Instructions}\label{sec:isa-cond}

\subsection{Conditional Instructions}\label{sec:isa-cond}

Most, although not quite all, instructions may be conditionally executed.

Most, although not quite all, instructions may be conditionally executed.

The 23--bit load immediate instruction, together with the {\tt NOOP},

The 23--bit load immediate instruction, together with the {\tt NOOP},

{\tt BREAK}, and {\tt LOCK} instructions are the exceptions to this rule.

{\tt BREAK}, and {\tt LOCK} instructions are the exceptions to this rule.

All other instructions may be conditionally executed.

All other instructions may be conditionally executed.

From the four condition code flags, eight conditions are defined for standard

From the four condition code flags, eight conditions are defined, as shown in

instructions.  These are shown in Tbl.~\ref{tbl:conditions}.

Tbl.~\ref{tbl:conditions}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabular}{l|l|l}

\begin{tabular}{l|l|l}

Code & Mnemonic & Condition \\\hline

Code & Mnemonic & Condition \\\hline

3'h0 & None & Always execute the instruction \\

3'h0 & None & Always execute the instruction \\

3'h1 & {\tt .LT}& Less than ('N' set) \\

3'h1 & {\tt .Z} & Only execute when `Z' is set \\

3'h2 & {\tt .Z} & Only execute when 'Z' is set \\

3'h2 & {\tt .LT}& Less than (`N' set) \\

3'h3 & {\tt .NZ}& Only execute when 'Z' is not set \\

3'h3 & {\tt .C} & Carry set (Also known as less-than unsigned) \\

3'h4 & {\tt .GT}& Greater than ('N' not set, 'Z' not set) \\

3'h4 & {\tt .V} & Overflow set\\

3'h5 & {\tt .GE}& Greater than or equal ('N' not set, 'Z' irrelevant) \\

3'h5 & {\tt .NZ}& Only execute when `Z' is not set \\

3'h6 & {\tt .C} & Carry set (Also known as less-than unsigned) \\

3'h6 & {\tt .GE}& Greater than or equal (`N' not set) \\

3'h7 & {\tt .V} & Overflow set\\

3'h7 & {\tt .NC}& Not carry (also known as greater-than or equal, unsigned) \\

\end{tabular}

\end{tabular}

\caption{Conditions for conditional operand execution}\label{tbl:conditions}

\caption{Conditions for conditional operand execution}\label{tbl:conditions}

\end{center}\end{table}

\end{center}\end{table}

There is no condition code for less than or equal, not C or not V---there

There are no condition codes for either less than or equal or greater than,

just wasn't enough space in 3--bits.  Ways of handling non--supported

whether signed or unsigned.  In a similar fashion, there is no condition

conditions are discussed in Sec.~\ref{sec:in-mcond}.

code for not V---there just wasn't enough space in 3--bits.  Ways of handling

non--supported conditions are discussed in Sec.~\ref{sec:in-mcond}.

With the exception of \hbox{\tt CMP} and \hbox{\tt TST} instructions,

With the exception of \hbox{\tt CMP} and \hbox{\tt TST} instructions,

conditionally executed instructions will not further adjust the condition codes.

conditionally executed instructions will not further adjust the condition

Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions will adjust conditions

codes.  Conditional \hbox{\tt CMP} or \hbox{\tt TST} instructions will adjust

whenever they are executed.  In this way, multiple conditions may be evaluated

conditions whenever they are executed.  In this way, multiple conditions may

without branches, creating a sort of logical and--but only if all the conditions

be evaluated without branches, creating a sort of logical and--but only if all

are the same.  For example, to do something if \hbox{\tt R0} is one and

the conditions are the same.  For example, to do something if \hbox{\tt R0} is

\hbox{\tt R1} is two, one might try code such as Tbl.~\ref{tbl:dbl-condition}.

one and \hbox{\tt R1} is two, one might try code such as

Tbl.~\ref{tbl:dbl-condition}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabular}{l}

\begin{tabular}{l}

        {\tt CMP 1,R0} \\

        {\tt CMP 1,R0} \\

        {\em ; Condition codes are now set based upon R0-1} \\

        {\em ; Condition codes are now set based upon R0-1} \\

        {\tt CMP.Z 2,R1} \\

        {\tt CMP.Z 2,R1} \\

        {\em ; If R0 $\neq$ 1, conditions are unchanged, {\tt Z} is still false.} \\

        {\em ; If R0 $\neq$ 1, conditions are unchanged, {\tt Z} is still false.} \\

        {\em ; If R0 $=$ 1, conditions are now set based upon R1-2.} \\

        {\em ; If R0 $=$ 1, conditions are now set based upon R1-2.} \\

        {\em ; Now some instruction could be done based upon the conjunction} \\

        {\em ; Now some instruction could be done based upon the conjunction} \\

        {\em ; of both conditions.} \\

        {\em ; of both conditions.} \\

        {\em ; While we use the example of a {\tt STO}, it could easily be any

        {\em ; While we use the example of a {\tt SW}, it could easily be any

                instruction.} \\

                instruction.} \\

        {\tt STO.Z R0,(R2)} \\

        {\tt SW.Z R0,(R2)} \\

\end{tabular}

\end{tabular}

\caption{An example of a double conditional}\label{tbl:dbl-condition}

\caption{An example of a double conditional}\label{tbl:dbl-condition}

\end{center}\end{table}

\end{center}\end{table}
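
As a minimal sketch of why the sequence in Tbl.~\ref{tbl:dbl-condition} acts
as a logical and, let $Z_1$ and $Z_2$ denote the {\tt Z} flag after the first
and second compare respectively.  (These two names are introduced here for
illustration only, with a parenthesized comparison standing for one when true
and zero when false.)  Since the second compare only executes when $Z_1$ is
set, and otherwise leaves the flags untouched,
\[
Z_1 = (\mbox{\tt R0} = 1), \qquad
Z_2 = \left\{\begin{array}{ll}
	(\mbox{\tt R1} = 2) & \mbox{if $Z_1$ is set} \\
	Z_1 & \mbox{otherwise (flags unchanged)}
	\end{array}\right.
\]
so that $Z_2 = (\mbox{\tt R0} = 1) \wedge (\mbox{\tt R1} = 2)$.  Any
instruction conditioned on {\tt .Z} after the second compare therefore executes
only when both tests succeed.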

The real utility of conditionally executed instructions is that, unlike

The real utility of conditionally executed instructions is that, unlike

conditional branches, conditionally executed instructions will not stall

conditional branches, conditionally executed instructions will not stall

the bus if they are not executed.

the bus if they are not executed.

In the case of VLIW instructions, only four conditions are defined as shown

in Tbl.~\ref{tbl:vliw-conditions}.

\begin{table}\begin{center}

\begin{tabular}{l|l|l}

Code & Mnemonic & Condition \\\hline

2'h0 & None & Always execute the instruction \\

2'h1 & {\tt .LT} & Less than ('N' set) \\

2'h2 & {\tt .Z} & Only execute when 'Z' is set \\

2'h3 & {\tt .NZ} & Only execute when 'Z' is not set \\

\end{tabular}

\caption{VLIW Conditions}\label{tbl:vliw-conditions}

\end{center}\end{table}

Further, the first bit of the three is given a special meaning: If the first

bit is set, the conditions apply to the second half of the instruction,

otherwise the conditions will only apply to the first half of a conditional

instruction.  Of course, the other conditions are still available by mingling

the non--VLIW instructions with VLIW instructions.

\subsection{Modifying Conditions}\label{sec:in-mcond}

\subsection{Modifying Conditions}\label{sec:in-mcond}

A quick look at the list of conditions supported by the ZipCPU and listed

A quick look at the list of conditions supported by the ZipCPU and listed

in Tbl.~\ref{tbl:conditions} reveals that the ZipCPU does not have a full set

in Tbl.~\ref{tbl:conditions} reveals that the ZipCPU does not have a full set

of conditions.  In particular, only one explicit unsigned condition is

of conditions.  In particular, only one explicit unsigned condition is

supported.  Therefore, Tbl.~\ref{tbl:creating-conditions}

supported.  Therefore, Tbl.~\ref{tbl:creating-conditions}

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabular}{|l|l|l|}\hline

\begin{tabular}{|l|l|l|}\hline

Original & Modified & Name \\\hline\hline

Original & Modified & Name \\\hline\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1

\parbox[t]{1.5in}{\tt CMP Imm,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1

        & \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BLT label}

        & \parbox[t]{1.5in}{\tt CMP 1+Imm,Ry\\BLT label}

        & Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline

        & Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLE label} % If Ry <= Rx -> Ry < Rx+1

        & \parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLT label\\BZ label}

        & Less-than or equal (signed, {\tt Z} or {\tt N} set)\\[4mm]\hline\hline

\parbox[t]{1.5in}{\tt CMP Imm,Ry\\BGT label}    % if (Ry > Rx) -> Rx < Ry

        & \parbox[t]{1.5in}{\tt CMP 1+Imm,Ry\\BGE label}

        & Greater-than (immediate) \\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGT label}     % if (Ry > Rx) -> Rx < Ry

        & \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BLT label}

        & Greater-than (register) \\[4mm]\hline\hline

\parbox[t]{1.5in}{\tt CMP Imm,Ry\\BLEU label}

        & \parbox[t]{1.5in}{\tt CMP 1+Imm,Ry\\BC label}

        & Less-than or equal unsigned immediate \\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BLEU label}

        & \parbox[t]{1.5in}{\tt CMP 1+Rx,Ry\\BC label}

        & \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BNC label}

        & Less-than or equal unsigned \\[4mm]\hline

        & Less-than or equal unsigned register\\[4mm]\hline\hline

\parbox[t]{1.5in}{\tt CMP Imm,Ry\\BGTU label}   % if (Ry > Rx) -> Rx < Ry

        & \parbox[t]{1.5in}{\tt CMP 1+Imm,Ry\\BNC label}

        & Greater-than unsigned (immediate)\\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label}    % if (Ry > Rx) -> Rx < Ry

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGTU label}    % if (Ry > Rx) -> Rx < Ry

        & \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}

        & \parbox[t]{1.5in}{\tt CMP Ry,Rx\\BC label}

        & Greater-than unsigned \\[4mm]\hline

        & Greater-than unsigned \\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP Rx,Ry\\BGEU label}    % if (Ry >= Rx) -> Rx <= Ry -> Rx < Ry+1

        & \parbox[t]{1.5in}{\tt CMP 1+Ry,Rx\\BC label}

        & Greater-than equal unsigned \\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP A+Rx,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A

        & \parbox[t]{1.5in}{\tt CMP (1-A)+Ry,Rx\\BC label}

        & Greater-than equal unsigned (with offset)\\[4mm]\hline

\parbox[t]{1.5in}{\tt CMP A,Ry\\BGEU label} % if (Ry >= A+Rx)-> A+Rx <= Ry -> Rx < Ry+1-A

        & \parbox[t]{1.5in}{\tt LDI (A-1),Rx\\CMP Ry,Rx\\BC label}

        & Greater-than equal comparison with a constant\\[4mm]\hline

\end{tabular}

\end{tabular}

\caption{Modifying conditions}\label{tbl:creating-conditions}

\caption{Modifying conditions}\label{tbl:creating-conditions}

\end{center}\end{table}

\end{center}\end{table}

shows examples of how these unsupported conditions can be created

shows examples of how these unsupported conditions can be created

simply by adjusting the compare instruction, for no extra cost in clocks.

simply by adjusting the compare instruction, for no extra cost in clocks.
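
The rewrites in Tbl.~\ref{tbl:creating-conditions} rest on a handful of integer
identities, sketched here.  Recall that {\tt CMP A,B} sets the flags from
$B-A$.  For the immediate forms, and assuming that $1+\mbox{Imm}$ neither
overflows nor exceeds the range of the immediate field (an assumption made
here, not a guarantee of the table),
\[
R_y \le \mbox{Imm} \Longleftrightarrow R_y < \mbox{Imm}+1,
\qquad
R_y > \mbox{Imm} \Longleftrightarrow R_y \ge \mbox{Imm}+1 .
\]
The register forms swap the operands of the compare instead,
\[
R_y > R_x \Longleftrightarrow R_x < R_y,
\qquad
R_y \le R_x \Longleftrightarrow R_x \ge R_y ,
\]
which is why {\tt CMP Rx,Ry} becomes {\tt CMP Ry,Rx} in those rows.  The same
identities hold for the unsigned comparisons, with {\tt C} and {\tt NC} taking
the place of {\tt LT} and {\tt GE}.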

Line 1048...

Line 984...

In those cases where a fourteen or eighteen bit immediate doesn't make sense,

In those cases where a fourteen or eighteen bit immediate doesn't make sense,

such as for {\tt LDILO}, the extra bits associated with the immediate are

such as for {\tt LDILO}, the extra bits associated with the immediate are

simply ignored.  (This rule does not apply to the shift instructions,

simply ignored.  (This rule does not apply to the shift instructions,

{\tt ASR}, {\tt LSR}, and {\tt LSL}--which all use all of their immediate bits.)

{\tt ASR}, {\tt LSR}, and {\tt LSL}--which all use all of their immediate bits.)

VLIW instructions still use the same operand B as regular instructions, only

there was no room for any instruction plus immediate addressing.  Therefore,

VLIW instructions have either

a register or a 4--bit signed immediate as their operand B.  The only exception

is the load immediate instruction, which permits a 5--bit signed operand

B.\footnote{Although the space exists to extend this VLIW load immediate

instruction to six bits, the 5--bit limit was chosen to simplify the

disassembler.  This may change in the future.}

\subsection{Address Modes}\label{sec:isa-addr}

\subsection{Address Modes}\label{sec:isa-addr}

The ZipCPU supports two addressing modes: register plus immediate, and

The ZipCPU supports two addressing modes: register plus immediate, and

immediate addressing.  Addresses are encoded in the same fashion as

immediate addressing.  Addresses are encoded in the same fashion as

Operand B's, discussed above.

Operand B's, discussed above.

The VLIW instruction set only offers register addressing.

\subsection{Move Operands}\label{sec:isa-mov}

\subsection{Move Operands}\label{sec:isa-mov}

The previous set of operands would be perfect and complete, save only that the

The previous set of operands would be perfect and complete, save only that the

CPU needs access to non--supervisory registers while in supervisory mode.  The

CPU needs access to non--supervisory registers while in supervisory mode.  The

MOV instruction has been modified to fit that purpose.  The two bits,

MOV instruction has been modified to fit that purpose.  The two bits,

shown as {\tt A} and {\tt B} in Fig.~\ref{fig:iset-format} above, are designed

shown as {\tt A} and {\tt B} in Fig.~\ref{fig:iset-format} above, are designed

Line 1127...

Line 1052...

exception, the divide by zero bit will be set in the CC register.  In the

exception, the divide by zero bit will be set in the CC register.  In the

case of a user mode divide by zero, this will be cleared by any return to user

case of a user mode divide by zero, this will be cleared by any return to user

mode command.  The supervisor bit may be cleared either by a reboot or by the

mode command.  The supervisor bit may be cleared either by a reboot or by the

external debugger.

external debugger.

\subsection{NOOP, BREAK, and Bus LOCK Instruction}

\section{CIS Instructions}

Three instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes}, are

somewhat special.  These are the {\tt NOOP}, {\tt BREAK}, and bus {\tt LOCK}

The ZipCPU also supports a compressed instruction set (CIS), outlined in

instructions.  These are encoded according to

Fig.~\ref{fig:iset-cis},

\begin{figure}\begin{center}

\begin{bytefield}[endianness=big]{16}

\bitheader{0-15}\\

\bitbox[lrt]{1}{}\bitbox[lrt]{4}{}

                \bitbox[lrt]{3}{COp}

                \bitbox{1}{0}

                \bitbox{7}{Imm.} \\

\bitbox[lr]{1}{1}\bitbox[lr]{4}{DR}

                \bitbox[lrb]{3}{}

                \bitbox{1}{1}

                \bitbox{4}{BR}

                \bitbox{3}{Imm} \\

\bitbox[lr]{1}{}\bitbox[lr]{4}{}

                \bitbox{3}{\tt LDI}

                \bitbox{8}{8'b Imm} \\

\bitbox[lrb]{1}{}\bitbox[lrb]{4}{}

                \bitbox{3}{\tt MOV}

                \bitbox{1}{1}

                \bitbox{4}{BR}

                \bitbox{3}{Imm} \\

\end{bytefield}

\caption{Zip Compressed Instruction Set (CIS) Format}\label{fig:iset-cis}

\end{center}\end{figure}

when enabled via {\tt OPT\_CIS}.

This compressed instruction set packs two instructions per word.  Words

must still be aligned, and jumping into the middle of a compressed instruction

is not allowed.  Further, the CIS only permits the encoding of 8~of the

32~opcodes available in the ISA, as listed in Tbl.~\ref{tbl:iset-cisops}.

\begin{table}\begin{center}

\begin{tabular}{|l|l|l|} \hline \rowcolor[gray]{0.85}

COp & & Instruction \\\hline\hline

3'h00 & {\tt SUB} & Subtract   \\\hline

3'h01 & {\tt AND} & Bitwise And   \\\hline

3'h02 & {\tt ADD} & Add two numbers   \\\hline

3'h03 & {\tt CMP}  & Compare (Ra-OpB) to zero  \\\hline

3'h04 & {\tt LW} & Load a 32-bit word from memory (OpB) into Ra   \\\hline

3'h05 & {\tt SW} & Store a 32-bit word from Ra into memory at (OpB)  \\\hline

3'h06 & {\tt LDI} & Load immediate   \\\hline

3'h07 & {\tt MOV} & Move OpB into Ra \\\hline

\end{tabular}

\caption{CIS OpCodes}\label{tbl:iset-cisops}

\end{center}\end{table}

A final feature of the compressed instruction set has to do with {\tt LW} and

{\tt SW} instructions.  An {\tt LW} or {\tt SW} instruction with bit-7 set

low references an offset from the Stack Pointer (SP).  Hence the compressed

instruction set allows loads and stores to offsets of the Stack Pointer

of -128~octets on up to~127 octets.  In practice, this gives the compressed

load and store instructions, when referencing the stack, thirty--two words

that they can reference.

This compressed instruction set is somewhat similar to other architectures that

have a thumb instruction set, with the difference that the ZipCPU can intermix

regular and thumb instructions at will.  When using the CIS, instructions are

still issued one at a time, however interrupts are disabled between

instruction halves, in order to prevent the CPU from stopping mid instruction.

Further, it is the silent job of the assembler to compress CIS instructions

in an opportunistic fashion.

The disassembler represents CIS instructions by placing a vertical bar

between the two components, while still leaving them on the same line.

The CIS instruction set does not support conditional execution.

\subsection{BREAK, Bus LOCK, SIM, and NOOP Instructions}

Four instructions within the opcode list in Tbl.~\ref{tbl:iset-opcodes} are

somewhat special.  These are the {\tt BREAK}, bus {\tt LOCK}, {\tt SIM}, and

{\tt NOOP} instructions.  These are encoded according to

Fig.~\ref{fig:iset-noop}.

Fig.~\ref{fig:iset-noop}.

\begin{figure}\begin{center}

\begin{figure}\begin{center}

\begin{bytefield}[endianness=big]{32}

\begin{bytefield}[endianness=big]{32}

\bitheader{0-31}\\

\bitheader{0-31}\\

\begin{leftwordgroup}{NOOP}

\bitbox{1}{0}\bitbox{3}{3'h7}\bitbox{1}{}

        \bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{Reserved for Simulator} \\

\bitbox{1}{1}\bitbox{3}{3'h7}\bitbox{1}{}

        \bitbox{2}{11}\bitbox{3}{000}\bitbox{22}{---} \\

\bitbox{1}{1}\bitbox{9}{---}\bitbox{3}{---}\bitbox{5}{---}

        \bitbox{3}{3'h7}\bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}

        \bitbox{5}{Rsrvd}

                \end{leftwordgroup} \\

\begin{leftwordgroup}{BREAK}

\begin{leftwordgroup}{BREAK}

\bitbox{1}{0}\bitbox{3}{3'h7}

\bitbox[lrt]{1}{}\bitbox[lrt]{3}{}

                \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{001}\bitbox{22}{Reserved for debugger}

                \bitbox{1}{}\bitbox[lrt]{3}{}\bitbox{2}{00}\bitbox{22}{Reserved for debugger}

                \end{leftwordgroup} \\

                \end{leftwordgroup} \\

\begin{leftwordgroup}{LOCK}

\begin{leftwordgroup}{LOCK}

\bitbox{1}{0}\bitbox{3}{3'h7}

\bitbox[lr]{1}{0}\bitbox[lr]{3}{3'h7}

                \bitbox{1}{}\bitbox{2}{11}\bitbox{3}{010}\bitbox{22}{Ignored}

                \bitbox{1}{}\bitbox[lr]{3}{111}\bitbox{2}{01}\bitbox{22}{Ignored}

                \end{leftwordgroup} \\

\begin{leftwordgroup}{SIM}

\bitbox[lr]{1}{}\bitbox[lr]{3}{}\bitbox{1}{}

        \bitbox[lr]{3}{}\bitbox{2}{10}\bitbox[lrt]{22}{Reserved for Simulator}

                \end{leftwordgroup} \\

\begin{leftwordgroup}{NOOP}

\bitbox[lrb]{1}{}\bitbox[lrb]{3}{}\bitbox{1}{}

        \bitbox[lrb]{3}{}\bitbox{2}{11}\bitbox[lrb]{22}{}

                \end{leftwordgroup} \\

                \end{leftwordgroup} \\

\end{bytefield}

\end{bytefield}

\caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}

\caption{NOOP/Break/LOCK Instruction Format}\label{fig:iset-noop}

\end{center}\end{figure}

\end{center}\end{figure}

The {\tt NOOP} instruction is just that: an instruction that does not perform

any operation.  While many other instructions, such as a move from a register

to itself, could also fit this role, only the NOOP instruction guarantees

that it will not stall waiting for a register to be available.   For this

reason, it gets its own place in the instruction set.  Bits 21--0 of this

instruction are reserved for commands which may be given to a simulator, such

as simulator exit, should the code be run from a simulator.  However, such

simulation codes have not yet been defined.

The {\tt BREAK} instruction is useful for creating a debug instruction that

The {\tt BREAK} instruction is useful for creating a debug instruction that

will halt the CPU without executing.  If in user mode, depending upon the

will halt the CPU without executing.  If in user mode, depending upon the

setting of the break enable bit, it will either switch to supervisor mode or

setting of the break enable bit, it will either switch to supervisor mode or

halt the CPU--depending upon where the user wishes to do his debugging.  The

halt the CPU--depending upon where the user wishes to do his debugging.  The

lower 22~bits of this instruction are likewise reserved for the debugger's

lower 22~bits of this instruction are reserved for the debugger's use.

use.

Finally, the {\tt LOCK} instruction was added in order to provide for

The {\tt LOCK} instruction provides the ZipCPU's atomic operation support,

atomic operations.  The {\tt LOCK} instruction only works when the CPU is

although it only works when the CPU is configured for pipeline

configured for pipeline mode.  It works by stalling the ALU pipeline stack

mode.\footnote{The reason for not allowing {\tt LOCK} support in

until all prior stages are filled, and then it guarantees that once a bus

non-pipelined mode is that the instruction fetch is not allowed to interrupt

cycle is started, the wishbone {\tt CYC} line will remain asserted until the

a lock cycle.  In non-pipelined mode, the instruction fetch must take place

LOCK is deasserted.  This allows the execution of three instructions, one

between every bus access, negating this utility.}  It works by stalling the

memory (ex. {\tt LOD}), one ALU (ex. {\tt ADD}), and another memory instruction

ALU pipeline stack until all prior stages are filled, and then it guarantees

(ex. {\tt STO}), to take place in an unbreakable fashion.  Example uses of this

that once a bus cycle is started, the wishbone {\tt CYC} line will remain

capability include an atomic increment, such as {\tt LOCK}, {\tt LOD (Rx),Ry},

asserted for up to three instructions.  This allows the execution of one

{\tt ADD \#,Ry}, and then {\tt STO Ry,(Rx)}, or even a two instruction pair

memory load (ex. {\tt LW}), one ALU operation (ex. {\tt ADD}), and then

such as a test and set sequence: {\tt LDI 1,Rz}, {\tt LOCK}, {\tt LOD (Rx),Ry},

another memory instruction (ex. {\tt SW}), to take place in an uninterrupted

{\tt STO Rz,(Rx)}.

fashion.  Example uses of this capability include an atomic increment, such

as {\tt LOCK}, {\tt LW (Rx),Ry}, {\tt ADD \#1,Ry}, {\tt SW Ry,(Rx)}, or even

a two instruction pair such as a test and set sequence: {\tt LDI 1,Rz},

{\tt LOCK}, {\tt LW (Rx),Ry}, {\tt SW Rz,(Rx)}.
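
Laid out one instruction per line, the atomic increment described above might
read as follows.  This is only a sketch: {\tt Rx} and {\tt Ry} are placeholder
register names, and the comments are explanatory rather than a statement of the
exact bus timing.

\begin{center}
\begin{tabular}{l}
	{\tt LOCK} \\
	{\em ; The bus cycle, once started, is held across the next instructions} \\
	{\tt LW (Rx),Ry} \\
	{\em ; Ry now holds the word read from the address in Rx} \\
	{\tt ADD \#1,Ry} \\
	{\tt SW Ry,(Rx)} \\
	{\em ; The incremented value is stored back within the same locked cycle} \\
\end{tabular}
\end{center}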

The {\tt SIM} and {\tt NOOP} instructions need a touch more explaining.

From the standpoint of the CPU, when running from Verilog within an FPGA,

the {\tt SIM} instruction is an illegal instruction--generating an illegal

instruction exception.  Likewise the {\tt NOOP} instruction is just that:

an instruction that consumes a clock, but does not perform any operation.

In both cases, the lower 22--bits are ignored.

Both {\tt SIM} and {\tt NOOP} instructions, though, contain 22--bits that can

be used by a simulator if present.  The encoding of these 22-bits is identical,

so that programs that run in a simulator may run on actual hardware as well

(using the {\tt NOOP} encoding), or they may complain that they were not intended

to run on actual hardware, such as if the {\tt SIM} encoding were used.

Particular encodings allow for exiting the simulation with a known exit

code, {\tt $x$EXIT}, dumping either one or all registers, {\tt $x$DUMP},

or simply sending a character to the simulator's standard output stream,

{\tt $x$OUT}--where $x$ is either {\tt N} for the {\tt NOOP} version of the

instruction, or {\tt S} for the {\tt SIM} version of the opcode.

The {\tt SIM} instruction is currently a new facility for the ZipCPU, and

so its functionality remains under test.

\subsection{Floating Point}

\subsection{Floating Point}

Although the ZipCPU does not (yet) have a floating point unit, the current

Although the ZipCPU does not (yet) have a floating point unit, the current

instruction set offers eight opcodes for floating point operations, and treats

instruction set offers six opcodes for floating point operations, and treats

floating point exceptions like divide by zero errors.  Once this unit is built

floating point exceptions like divide by zero errors.  Once this unit is built

and integrated together with the rest of the CPU, the ZipCPU will support

and integrated together with the rest of the CPU, the ZipCPU will support

32--bit floating point instructions natively.  Any 64--bit floating point

32--bit floating point instructions natively.  Any 64--bit floating point

instructions will either need to be emulated in software, or else they will

instructions will either need to be emulated in software, or else they will

need an external floating point peripheral.

need an external floating point peripheral.

Until the FPU is built and integrated, or even afterwards if the floating point

Until this FPU is built and integrated, or even afterwards if the floating

unit is not installed by option, floating point instructions will trigger an

point unit is not installed by option, floating point instructions will

illegal instruction exception, which may be trapped and then implemented in

trigger an illegal instruction exception, which may be trapped and then

software.

implemented in software.

\subsection{Load/Store byte}

One difference between the ZipCPU and many other architectures is that there are

no load byte {\tt LB}, store byte {\tt SB}, load halfword {\tt LH} or store

halfword {\tt SH} instructions.  This lack is by design in an attempt to keep

the 32--bit bus simple.

Because the ZipCPU's addresses refer to 32--bit values, i.e. address one

will refer to a completely different 32--bit value than address two, simulating

these load and store byte instructions is difficult.

This is just the nature of the ZipCPU, as a result of the design choices that

were made.
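To make the cost of that choice concrete, the following C sketch emulates byte accesses on top of a memory that is addressed in 32--bit words, using the same shift and mask idea. The function names are hypothetical, and the big endian byte order (byte 0 in the most significant position) is an assumption made here for illustration.

```c
#include <stdint.h>

/* Assumption: byte 0 is the most significant byte of each word
 * (big endian).  Returns the bit position of the addressed byte. */
static unsigned byte_shift(uint32_t byte_addr)
{
    return (3u - (byte_addr & 3u)) * 8u;
}

/* Load one byte from a memory that is addressed in 32-bit words. */
static uint8_t load_byte(const uint32_t *mem, uint32_t byte_addr)
{
    uint32_t word = mem[byte_addr >> 2];   /* drop the two byte-select bits */
    return (uint8_t)((word >> byte_shift(byte_addr)) & 0xffu);
}

/* Store one byte using a read-modify-write of the containing word. */
static void store_byte(uint32_t *mem, uint32_t byte_addr, uint8_t value)
{
    unsigned shift = byte_shift(byte_addr);
    uint32_t word = mem[byte_addr >> 2];
    word &= ~((uint32_t)0xffu << shift);   /* clear the target byte */
    word |= (uint32_t)value << shift;      /* merge in the new byte */
    mem[byte_addr >> 2] = word;
}
```

Note that the store requires a full word read followed by a full word write, so it is not atomic unless the sequence is otherwise protected.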

\subsection{Derived Instructions}

\subsection{Derived Instructions}

The ZipCPU supports many other common instructions by construction, although

The ZipCPU supports many other common instructions by construction, although

not all of them are single cycle instructions.  Tables~\ref{tbl:derived-1}, \ref{tbl:derived-2}, \ref{tbl:derived-3} and~\ref{tbl:derived-4} show how these

not all of them are single cycle instructions.  Tables~\ref{tbl:derived-1}, \ref{tbl:derived-2}, \ref{tbl:derived-3} and~\ref{tbl:derived-4} show how these

other instructions may be implemented on the ZipCPU.  Many of these

other instructions may be implemented on the ZipCPU.  Many of these

instructions will have assembly equivalents,

instructions will have assembly equivalents,

such as the branch instructions, to facilitate working with the CPU.

such as the branch instructions, to facilitate working with the CPU.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline

\begin{tabular}{p{1.0in}p{1.5in}p{3in}}\\\hline

Mapped & Actual  & Notes \\\hline

Mapped & Actual  & Notes \\\hline

{\tt ABS Rx}

{\tt ABS Rx}

        & \parbox[t]{1.5in}{\tt TST -1,Rx\\NEG.LT Rx}

        & \parbox[t]{1.5in}{\tt TST -1,Rx\\NEG.LT Rx}

        & Absolute value, depends upon the derived {\tt NEG} instruction

        & Absolute value, depends upon the derived {\tt NEG} instruction

        below, and so this expands into three instructions total.\\\hline

        below, and so this expands into three instructions total.\\\hline

\parbox[t]{1.4in}{\tt ADD Ra,Rx\\ADDC Rb,Ry}

\parbox[t]{1.4in}{\tt ADD Ra,Rx\\ADDC Rb,Ry}

        & \parbox[t]{1.5in}{\tt ADD Ra,Rx\\ADD.C \$1,Ry\\ADD Rb,Ry}

        & \parbox[t]{1.5in}{\tt ADD Ra,Rx\\ADD.C \$1,Ry\\ADD Rb,Ry}

        & Add with carry \\\hline

        & Add with carry \\\hline

{\tt BRA.$x$ +/-\$Addr}

\hbox{\tt BRA.$x$ +/-\$Addr}

        & \hbox{\tt ADD.$x$ \$Addr+PC,PC}

        & \hbox{\tt ADD.$x$ \$Addr+PC,PC}

        & Branch or jump on condition $x$.  Works for 18--bit

        & Branch or jump on condition $x$.  Works for 18--bit

                signed address offsets.\\\hline

                signed address offsets.\\\hline

% {\tt BRA.Cond +/-\$Addr}

% {\tt BRA.Cond +/-\$Addr}

%       & \parbox[t]{1.5in}{\tt LDI \$Addr,Rx \\ ADD.cond Rx,PC}

%       & \parbox[t]{1.5in}{\tt LDI \$Addr,Rx \\ ADD.cond Rx,PC}

Line 1259...

Line 1251...

        & Clears Rx, leaving the flags untouched.  This instruction can be

        & Clears Rx, leaving the flags untouched.  This instruction can be

                executed conditionally. The assembler will quietly  choose

                executed conditionally. The assembler will quietly  choose

                between {\tt LDI} and {\tt BREV} depending upon the existence

                between {\tt LDI} and {\tt BREV} depending upon the existence

                of the condition.\\\hline

                of the condition.\\\hline

{\tt EXCH.W Rx }

{\tt EXCH.W Rx }

        & {\tt ROL \$16,Rx}

        & \parbox[t]{1.5in}{\tt MOV Rx,Rh \\

                LSL \$16,Rh \\

                LSR \$16,Rx \\

                OR Rh,Rx }

        & Exchanges the top and bottom 16--bit words of Rx \\\hline

        & Exchanges the top and bottom 16--bit words of Rx \\\hline

{\tt HALT }

{\tt HALT }

        & {\tt OR \$SLEEP,CC}

        & {\tt OR \$SLEEP,CC}

        & This only works when issued in interrupt/supervisor mode.  In user

        & This only works when issued in interrupt/supervisor mode.  In user

        mode this is simply a wait until interrupt instruction.

        mode this is simply a wait until interrupt instruction.

Line 1272...

Line 1267...

        success instruction.\\\hline

        success instruction.\\\hline

{\tt INT } & {\tt LDI \$0,CC} & This is also known as a trap instruction\\\hline

{\tt INT } & {\tt LDI \$0,CC} & This is also known as a trap instruction\\\hline

{\tt IRET}

{\tt IRET}

        & {\tt OR \$GIE,CC}

        & {\tt OR \$GIE,CC}

        & Also known as an RTU instruction (Return to Userspace) \\\hline

        & Also known as an RTU instruction (Return to Userspace) \\\hline

{\tt JMP R6+\$Offset}

\hbox{\tt JMP R6+\$Offset}

        & {\tt MOV \$Offset(R6),PC}

        & {\tt MOV \$Offset(R6),PC}

        & Only works for 13--bit offsets.  Other offsets require adding the

        & Only works for 13--bit offsets.  Other offsets require adding the

        offset first to R6 before jumping.\\\hline

        offset first to R6 before jumping.\\\hline

{\tt LJMP \$Addr}

{\tt LJMP \$Addr}

        & \parbox[t]{1.5in}{\tt LOD (PC),PC \\ {\em Address }}

        & \parbox[t]{1.5in}{\tt LW (PC),PC \\ {\em Address }}

        & Although this only works for an unconditional jump, and it only

        & Although this only works for an unconditional jump, and it only

        works in an architecture with a unified instruction and data address

        works in an architecture with a unified instruction and data address

        space, this instruction pair makes for a nice combination that

        space, this instruction pair makes for a nice combination that

        can be adjusted by a linker at a later time.\\\hline

        can be adjusted by a linker at a later time.\\\hline

{\tt LJMP.x \$Addr}

{\tt LJMP.x \$Addr}

        & \parbox[t]{1.5in}{\tt LOD.x 2(PC),PC \\ ADD 1,PC \\ {\em Address }}

        & \parbox[t]{1.5in}{\tt LW.x 4(PC),PC \\ ADD 4,PC \\ {\em Address }}

        & Long jump, works for a conditional long jump.  \\\hline

        & Long jump that also works conditionally, although this is not necessarily the best way to do it.  \\\hline

\end{tabular}

\end{tabular}

\caption{Derived Instructions}\label{tbl:derived-1}

\caption{Derived Instructions}\label{tbl:derived-1}

\end{center}\end{table}

\end{center}\end{table}

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabular}{p{1.1in}p{1.8in}p{3in}}\\\hline

\begin{tabular}{p{1.1in}p{1.8in}p{3in}}\\\hline

Mapped & Actual  & Notes \\\hline

Mapped & Actual  & Notes \\\hline

{\tt LJSR \$Addr  }

{\tt LJSR \$Addr  }

        & \parbox[t]{1.5in}{\tt MOV \$2+PC,R0 \\ LOD (PC),PC \\ {\em Address}}

        & \parbox[t]{1.5in}{\tt MOV \$8+PC,R0 \\ LW (PC),PC \\ {\em Address}}

        & Similar to LJMP, but it handles the return address properly.

        & Similar to LJMP, but it handles the return address properly.

        \\\hline

        \\\hline

{\tt JSR PC+\$Offset  }

{\tt JSR PC+\$Offset  }

        & \parbox[t]{1.5in}{\tt MOV \$1+PC,R0 \\ ADD \$Offset,PC}

        & \parbox[t]{1.5in}{\tt MOV \$4+PC,R0 \\ ADD \$Offset,PC}

        & This is similar to the jump and link instructions from other

        & This is similar to the jump and link instructions from other

        architectures, save only that it requires a specific link

        architectures, save only that it requires a specific link

        instruction, seen here as the {\tt MOV} instruction on the

        instruction, seen here as the {\tt MOV} instruction on the

        left.\\\hline

        left.\\\hline

{\tt LDI \$val,Rx }

{\tt LDI \$val,Rx }

Line 1313...

Line 1308...

                created to facilitate this together with {\tt BREV}.

                created to facilitate this together with {\tt BREV}.

\\

\\

        This is also the appropriate means for setting a register value

        This is also the appropriate means for setting a register value

        to an arbitrary 32--bit value in a post--assembly link

        to an arbitrary 32--bit value in a post--assembly link

        operation.}\\\hline

        operation.}\\\hline

{\tt LOD.b \$addr,Rx}

        & \parbox[t]{1.5in}{\tt %

        LDI     \$addr,Ra \\

        LDI     \$addr,Rb \\

        LSR     \$2,Ra \\

        AND     \$3,Rb \\

        LOD     (Ra),Rx \\

        LSL     \$3,Rb \\

        SUB     \$32,Rb \\

        ROL     Rb,Rx \\

        AND \$0ffh,Rx}

        & \parbox[t]{3in}{This CPU is designed for 32--bit word

        length instructions.  Byte addressing is not supported by the CPU or

        the bus, so it therefore takes more work to do.

        Note also that in this example, \$addr is a byte-wise address, whereas

        all other addresses in this document are 32--bit word addresses.

        For this reason,

        we needed to drop the bottom two bits.  This also limits the address

        space of character accesses using this method from 16 MB down to 4 MB.}

                \\\hline

\parbox[t]{1.5in}{\tt LSL \$1,Rx\\ LSLC \$1,Ry}

\parbox[t]{1.5in}{\tt LSL \$1,Rx\\ LSLC \$1,Ry}

        & \parbox[t]{1.5in}{\tt LSL \$1,Ry \\

        & \parbox[t]{1.5in}{\tt LSL \$1,Ry \\

        LSL \$1,Rx \\

        LSL \$1,Rx \\

        OR.C \$1,Ry}

        OR.C \$1,Ry}

        & Logical shift left with carry.  Note that the

        & Logical shift left with carry.  Note that the

Line 1351...

Line 1325...

        BREV.C \$1,Rz \\

        BREV.C \$1,Rz \\

        LSR \$1,Rx \\

        LSR \$1,Rx \\

        OR Rz,Rx}

        OR Rz,Rx}

        & Logical shift right with carry.  Unlike the shift left, this

        & Logical shift right with carry.  Unlike the shift left, this

        approach doesn't extend well to numbers larger than two words. \\\hline

        approach doesn't extend well to numbers larger than two words. \\\hline

\end{tabular}

\caption{Derived Instructions, continued}\label{tbl:derived-2}

\end{center}\end{table}

\begin{table}\begin{center}

\begin{tabular}{p{1.2in}p{1.5in}p{3.2in}}\\\hline

{\tt NEG Rx} & \parbox[t]{1.5in}{\tt XOR \$-1,Rx \\ ADD \$1,Rx} & Negates Rx\\\hline

{\tt NEG Rx} & \parbox[t]{1.5in}{\tt XOR \$-1,Rx \\ ADD \$1,Rx} & Negates Rx\\\hline

{\tt NEG.C Rx} & \parbox[t]{1.5in}{\tt MOV.C \$-1+Rx,Rx\\XOR.C \$-1,Rx}

{\tt NEG.C Rx} & \parbox[t]{1.5in}{\tt MOV.C \$-1+Rx,Rx\\XOR.C \$-1,Rx}

        & Conditionally negates Rx\\\hline

        & Conditionally negates Rx\\\hline

{\tt NOT Rx } & {\tt XOR \$-1,Rx } & One's complement\\\hline

{\tt NOT Rx } & {\tt XOR \$-1,Rx } & One's complement\\\hline

{\tt POP Rx }

{\tt POP Rx }

        & \parbox[t]{1.5in}{\tt LOD \$(SP),Rx \\ ADD \$1,SP}

        & \parbox[t]{1.5in}{\tt LW \$(SP),Rx \\ ADD \$4,SP}

        & The compiler avoids the need for this instruction and the similar

        & The compiler avoids the need for this instruction and the similar

        {\tt PUSH} instruction when setting up the stack by coalescing all

        {\tt PUSH} instruction when setting up the stack by coalescing all

        the stack address modifications into a single instruction at the

        the stack address modifications into a single instruction at the

        beginning of any stack frame.\\\hline

        beginning of any stack frame.\\\hline

{\tt PUSH Rx}

{\tt PUSH Rx}

        & \parbox[t]{1.5in}{\hbox{\tt SUB \$1,SP}

        & \parbox[t]{1.5in}{\hbox{\tt SUB \$4,SP}

        \hbox{\tt STO Rx,\$(SP)}}

        \hbox{\tt SW Rx,\$(SP)}}

        & Note that for pipelined operation, it helps to coalesce all the

        & Note that for pipelined operation, it helps to coalesce all the

        {\tt SUB}'s into one command, and place the {\tt STO}'s right

        {\tt SUB}'s into one command, and place the {\tt SW}'s right

        after each other.  Further, to avoid a pipeline stall, the

        after each other.  Further, to avoid a pipeline stall, the

        immediate value for the first store must be zero.

        immediate value for the first store must be zero.

        \\\hline

        \\\hline

\end{tabular}

\caption{Derived Instructions, continued}\label{tbl:derived-2}

\end{center}\end{table}

\begin{table}\begin{center}

\begin{tabular}{p{1.0in}p{1.5in}p{3.2in}}\\\hline

{\tt PUSH Rx-Ry}

{\tt PUSH Rx-Ry}

        & \parbox[t]{1.5in}{\tt SUB \$$n$,SP \\

        & \parbox[t]{1.5in}{\tt SUB \$$4n$,SP \\

        STO Rx,\$(SP)

        SW Rx,\$(SP)

        \ldots \\

        \ldots \\

        STO Ry,\$$\left(n-1\right)$(SP)}

        SW Ry,\$$4\left(n-1\right)$(SP)}

        & Multiple pushes at once only need the single subtract from the

        & Multiple pushes at once only need the single subtract from the

        stack pointer.  This derived instruction is analogous to a similar one

        stack pointer.  This derived instruction is analogous to a similar one

        on the Motorola 68k architecture, although the Zip Assembler

        on the Motorola 68k architecture, although the Zip Assembler

        does not support the combined instruction.  This instruction

        does not support the combined instruction.  This instruction

        also supports pipelined memory access.\\\hline

        also supports pipelined memory access.\\\hline

{\tt RESET}

{\tt RESET}

        & \parbox[t]{1in}{\tt STO \$1,\$watchdog(R12)\\BUSY}

        & \parbox[t]{1in}{\tt LDI~0xff000000,R2\\LDI 1,R1\\\hbox{SW R1,\$watchdog(R2)}\\BUSY}

        & This depends upon the existence of a watchdog peripheral, and the

        & This depends upon the existence of a watchdog peripheral, and the

        peripheral base address being preloaded into {\tt R12}.  The BUSY

        peripheral base address being preloaded into {\tt R12}.  The BUSY

        instructions are required because the CPU will continue until the

        instructions are required because the CPU will continue until the

        {\tt STO} has completed.

        {\tt SW} has completed.

        Another opportunity might be to jump to the reset address from within

        Another opportunity might be to jump to the reset address from within

        supervisor mode.\\\hline

        supervisor mode.\\\hline

{\tt RET} & {\tt MOV R0,PC}

{\tt RET} & {\tt MOV R0,PC}

        & This depends upon the form of the {\tt JSR} given on the previous

        & This depends upon the form of the {\tt JSR} given on the previous

        page that stores the return address into R0.

        page that stores the return address into R0.

        \\\hline

        \\\hline

{\tt SEX.b Rx }

{\tt SEXB Rx }

        & \parbox[t]{1.5in}{\tt LSL 24,Rx \\ ASR 24,Rx}

        & \parbox[t]{1.5in}{\tt LSL 24,Rx \\ ASR 24,Rx}

        & Sign extend an 8--bit value into a full word.\\\hline

        & Sign extend an 8--bit value into a full word.\\\hline

{\tt SEX.h Rx }

{\tt SEXH Rx }

        & \parbox[t]{1.5in}{\tt LSL 16,Rx \\ ASR 16,Rx}

        & \parbox[t]{1.5in}{\tt LSL 16,Rx \\ ASR 16,Rx}

        & Sign extend a 16--bit value into a full word.\\\hline

        & Sign extend a 16--bit value into a full word.\\\hline

{\tt STEP Rr,Rt}

{\tt STEP Rr,Rt}

        & \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}

        & \parbox[t]{1.5in}{\tt LSR \$1,Rr \\ XOR.C Rt,Rr}

        & Step a Galois implementation of a Linear Feedback Shift Register, Rr,

        & Step a Galois implementation of a Linear Feedback Shift Register, Rr,

                using taps Rt \\\hline

                using taps Rt \\\hline

{\tt STEP}

{\tt STEP}

        & \parbox[t]{1.5in}{\tt OR \$Step|\$GIE,CC}

        & \parbox[t]{1.5in}{\tt OR \$Step|\$GIE,CC}

        & Steps a user mode process by one instruction\\\hline

        & Steps a user mode process by one instruction\\\hline

\end{tabular}

\caption{Derived Instructions, continued}\label{tbl:derived-3}

\end{center}\end{table}

\begin{table}\begin{center}

\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline

{\tt STO.b Rx,\$addr}

        & \parbox[t]{1.5in}{\tt %

        LDI \$addr,Ra \\

        LDI \$addr,Rb \\

        LSR \$2,Ra \\

        AND \$3,Rb \\

        SUB \$32,Rb \\

        LOD (Ra),Ry \\

        AND \$0ffh,Rx \\

        AND \~\$0ffh,Ry \\

        ROL Rb,Rx \\

        OR Rx,Ry \\

        STO Ry,(Ra) }

        & \parbox[t]{3in}{This CPU and its bus are {\em not} optimized

        for byte-wise operations.

        Note that in this example, \$addr is a

        byte-wise address, whereas in all of our other examples it is a

        32-bit word address. This also limits the address space

        of character accesses from 16 MB down to 4 MB.

        Further, this instruction implies a byte ordering,

        such as big or little endian.} \\\hline

{\tt SUBR Rx,Ry }

{\tt SUBR Rx,Ry }

        & \parbox[t]{1.5in}{\tt SUB 1+Rx,Ry\\ XOR -1,Ry}

        % & \parbox[t]{1.5in}{\tt SUB 1+Rx,Ry\\ XOR -1,Ry}

        & \parbox[t]{1.5in}{\tt XOR -1,Ry\\ADD 1+Rx,Ry}

        & Ry is set to Rx-Ry, rather than the normal subtract which

        & Ry is set to Rx-Ry, rather than the normal subtract which

        sets Ry to Ry-Rx. \\\hline

        sets Ry to Ry-Rx. \\\hline

\parbox[t]{1.4in}{\tt SUB Ra,Rx\\SUBC Rb,Ry}

\parbox[t]{1.4in}{\tt SUB Ra,Rx\\SUBC Rb,Ry}

        & \parbox[t]{1.5in}{\tt SUB Ra,Rx\\SUB.C \$1,Ry\\SUB Rb,Ry}

        & \parbox[t]{1.5in}{\tt SUB Ra,Rx\\SUB.C \$1,Ry\\SUB Rb,Ry}

        & Subtract with carry.  Note that the overflow flag may not be

        & Subtract with carry.  Note that the overflow flag may not be

Line 1463...

Line 1409...

        quietly turn the LDI instruction into a {\tt BREV}/{\tt LDILO} pair,

        quietly turn the LDI instruction into a {\tt BREV}/{\tt LDILO} pair,

        but the effect would be the same. \\\hline

        but the effect would be the same. \\\hline

{\tt TS Rx,Ry,(Rz)}

{\tt TS Rx,Ry,(Rz)}

        & \hbox{\tt LDI 1,Rx}

        & \hbox{\tt LDI 1,Rx}

                \hbox{\tt LOCK}

                \hbox{\tt LOCK}

                \hbox{\tt LOD (Rz),Ry}

                \hbox{\tt LW (Rz),Ry}

                \hbox{\tt STO Rx,(Rz)}

                \hbox{\tt SW Rx,(Rz)}

        & A test and set instruction.  The {\tt LOCK} instruction ensures

        & A test and set instruction.  The {\tt LOCK} instruction ensures

        that the next two instructions lock the bus between the instructions,

        that the next two instructions lock the bus between the instructions,

        so no one else can use it.  This guarantees that the operation is

        so no one else can use it.  This guarantees that the operation is

        atomic.

        atomic.

        \\\hline

        \\\hline

\end{tabular}

\caption{Derived Instructions, continued}\label{tbl:derived-3}

\end{center}\end{table}

\begin{table}\begin{center}

\begin{tabular}{p{1.0in}p{1.5in}p{3in}}\\\hline

{\tt TST Rx}

{\tt TST Rx}

        & {\tt TST \$-1,Rx}

        & {\tt TST \$-1,Rx}

        & Set the condition codes based upon Rx without changing Rx.

        & Set the condition codes based upon Rx without changing Rx.

        Equivalent to a CMP \$0,Rx.\\\hline

        Equivalent to a CMP \$0,Rx.\\\hline

{\tt WAIT}

{\tt WAIT}

Line 1485...

Line 1438...

\end{center}\end{table}

\end{center}\end{table}
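Several of the derived instructions in the tables above rest on simple arithmetic identities, which can be checked with a short C model. This is a sketch under stated assumptions: the function names are hypothetical, registers are modeled as 32--bit unsigned values, and the carry flag after {\tt LSR} is assumed to hold the bit that was shifted out.

```c
#include <stdint.h>

/* NEG Rx: XOR $-1,Rx then ADD $1,Rx (two's complement negation). */
static uint32_t neg(uint32_t rx) { return (rx ^ 0xffffffffu) + 1u; }

/* SUBR Rx,Ry: XOR -1,Ry then ADD 1+Rx,Ry, leaving Rx-Ry in Ry. */
static uint32_t subr(uint32_t rx, uint32_t ry)
{
    return (ry ^ 0xffffffffu) + 1u + rx;
}

/* Arithmetic shift right for 0 < n < 32, written out by hand so the
 * model does not depend on how a C compiler shifts negative values. */
static uint32_t asr(uint32_t v, unsigned n)
{
    uint32_t fill = (v & 0x80000000u) ? ~(0xffffffffu >> n) : 0u;
    return (v >> n) | fill;
}

/* Sign extension of a byte: LSL 24,Rx then ASR 24,Rx. */
static uint32_t sexb(uint32_t rx) { return asr(rx << 24, 24); }

/* Sign extension of a halfword: LSL 16,Rx then ASR 16,Rx. */
static uint32_t sexh(uint32_t rx) { return asr(rx << 16, 16); }

/* EXCH.W Rx: swap the upper and lower 16-bit halves of Rx. */
static uint32_t exch_w(uint32_t rx) { return (rx << 16) | (rx >> 16); }

/* STEP Rr,Rt: LSR $1,Rr then XOR.C Rt,Rr.  Assumes the carry flag
 * holds the bit shifted out of Rr by the LSR. */
static uint32_t lfsr_step(uint32_t rr, uint32_t rt)
{
    uint32_t carry = rr & 1u;
    rr >>= 1;
    return carry ? (rr ^ rt) : rr;
}
```

The {\tt subr} model shows why the two instruction form works: inverting Ry and adding one negates it, so adding Rx afterwards yields Rx-Ry.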

\subsection{Interrupt Handling}

\subsection{Interrupt Handling}

The ZipCPU does not maintain any interrupt vector tables.  If an interrupt

The ZipCPU does not maintain any interrupt vector tables.  If an interrupt

takes place, the CPU simply switches from user to supervisor (interrupt)

takes place, the CPU simply switches from user to supervisor (interrupt)

mode.  The supervisor code then continues from where it left off after

mode.  Since getting to user mode in the first place required a return to

executing a return to userspace {\tt RTU} instruction.

userspace instruction, {\tt RTU}, once the interrupt takes place the

supervisor simply starts executing code immediately after that

Since the CPU may return from userspace after either an interrupt, a

{\tt RTU} instruction.

trap, or an exception, it is up to the supervisor code that handles the

transition to determine which of the three has taken place.

Since the CPU may return from userspace after either an interrupt (hardware

generated), a trap (software generated), or an exception (a fault of some

type), it is up to the supervisor code that handles the transition to

determine which of the three has taken place.

\subsection{Pipeline Stages}

\subsection{Pipeline Stages}

As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},

As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},

the ZipCPU supports a five stage pipeline.

the ZipCPU supports a five stage pipeline.

\begin{enumerate}

\begin{enumerate}

Line 1530...

Line 1486...

\item {\bf Decode}: Decodes an instruction into its OpCode, register(s) to

\item {\bf Decode}: Decodes an instruction into its OpCode, register(s) to

        read, condition code, and immediate offset.  This stage also

        read, condition code, and immediate offset.  This stage also

        determines whether the flags will be read or set, whether registers

        determines whether the flags will be read or set, whether registers

        will be read (and hence the pipeline may need to stall), or whether the

        will be read (and hence the pipeline may need to stall), or whether the

        result will be written back.  In many ways, simplifying the CPU also

        result will be written back.  In many ways, simplifying the CPU has

        meant simplifying this pipeline stage and hence the instruction set

        meant simplifying this particular pipeline stage and hence the

        architecture.

        instruction set architecture that it implements.

        This stage is also responsible for both normal and CIS decoding.

        Hence, following this stage, little information remains regarding

        whether or not the CPU was executing a CIS instruction.

\item {\bf Read Operands}: Reads from the register file and applies any

\item {\bf Read Operands}: Reads from the register file and applies any

        immediate values to the result.  There is no means of detecting or

        immediate values to the result.  There is no means of detecting or

        flagging arithmetic overflow or carry when adding the immediate to the

        flagging arithmetic overflow or carry when adding the immediate to the

        operand.  This stage will stall if any source operand is pending

        operand.  This stage will stall if any source operand is pending

        and the immediate value is non--zero.

        and the immediate value is non--zero.

\item At this point, the processing flow splits into one of four tracks: An

\item At this point, the processing flow splits into one of four tracks: An

        {\bf ALU} track which will accomplish a simple instruction, the

        {\bf ALU} track which will accomplish a simple instruction, the

        {\bf MemOps} stage which handles {\tt LOD} (load) and {\tt STO}

        {\bf MemOps} stage which handles {\tt LW} (load) and {\tt SW}

        (store) instructions, the {\bf divide} unit, and the

        (store) instructions, the {\bf divide} unit, and the

        {\bf floating point} unit.

        {\bf floating point} unit.

        \begin{itemize}

        \begin{itemize}

        \item Loads will stall instructions in the read operands stage until the

        \item Loads will stall instructions in the read operands stage until

                entire memory is complete, lest a register be read only to be

                the entire memory operation is complete, lest a register be

                updated unseen by the Load.

                read from the register file only to be updated unseen by the

                Load.

        \item Condition codes are set upon completion of the ALU, divide,

        \item Condition codes are set upon completion of the ALU, divide,

                or FPU stage.  (Memory operations do not set conditions.)

                or FPU stage.  (Memory operations do not set conditions.)

        \item Issuing a non--pipelined memory instruction to the memory unit

        \item Issuing a non--pipelined memory instruction to the memory unit

                while the memory unit is busy will stall the entire pipeline

                while the memory unit is busy will stall the entire pipeline

                until the memory unit is idle and ready to accept another

                until the memory unit is idle and ready to accept another

Line 1610...

Line 1571...

Given that there are five stages to the pipeline, that accounts

Given that there are five stages to the pipeline, that accounts

for the four stalls.  (Were the {\tt pipefetch} cache chosen, there would

for the four stalls.  (Were the {\tt pipefetch} cache chosen, there would

be another stall internal to the {\tt pipefetch} cache.)

be another stall internal to the {\tt pipefetch} cache.)

The decode stage can handle the {\tt ADD \$X,PC}, {\tt LDI \$X,PC}, and

The decode stage can handle the {\tt ADD \$X,PC}, {\tt LDI \$X,PC}, and

{\tt LOD (PC),PC} instructions specially, however, when {\tt EARLY\_BRANCHING}

{\tt LW (PC),PC} instructions specially, however, when {\tt EARLY\_BRANCHING}

is enabled.  These instructions, when

is enabled.  These instructions, when

not conditioned on the flags, can execute with only a single stall cycle (two

not conditioned on the flags, can execute with only a single stall cycle (two

for the {\tt LOD (PC),PC} instruction),

for the {\tt LW (PC),PC} instruction),

such as is shown in Fig.~\ref{fig:branch}.

such as is shown in Fig.~\ref{fig:branch}.

\begin{figure}\begin{center}

\begin{figure}\begin{center}

\includegraphics[width=4in]{../gfx/bra.eps} %0.4in per clock

\includegraphics[width=4in]{../gfx/bra.eps} %0.4in per clock

\caption{An expedited branch costs a single stall cycle}\label{fig:branch}

\caption{An expedited branch costs a single stall cycle}\label{fig:branch}

\end{center}\end{figure}

\end{center}\end{figure}

Line 1644...

Line 1605...

This is also the reason why, when setting up a stack frame, the top of the

This is also the reason why, when setting up a stack frame, the top of the

stack frame is used first: it eliminates this stall cycle.\footnote{This only

stack frame is used first: it eliminates this stall cycle.\footnote{This only

applies if there is no local memory to allocate on the stack as well.}  Hence,

applies if there is no local memory to allocate on the stack as well.}  Hence,

to save registers at the top of a procedure, one would write:

to save registers at the top of a procedure, one would write:

\begin{enumerate}

\begin{enumerate}

\item\ {\tt SUB 2,SP}

\item\ {\tt SUB 16,SP}

\item\ {\tt STO R1,(SP)}

\item\ {\tt SW R1,(SP)}

\item\ {\tt STO R2,1(SP)}

\item\ {\tt SW R2,4(SP)}

\end{enumerate}

\end{enumerate}

Had {\tt R1} instead been stored at {\tt 1(SP)} as the top of the stack,

Had {\tt R1} instead been stored at {\tt 4(SP)} as the top of the stack,

there would've been an extra stall in setting up the stack frame.

there would've been an extra stall in setting up the stack frame.

\item When reading from the CC register after setting the flags

\item When reading from the CC register after setting the flags

Line 1677...

Line 1638...

will incur the stall, while a {\tt LDI \$BREAKEN|\$STEP,CC} will not since

will incur the stall, while a {\tt LDI \$BREAKEN|\$STEP,CC} will not since

it doesn't read the condition codes before executing.

it doesn't read the condition codes before executing.

\item When waiting for a memory read operation to complete

\item When waiting for a memory read operation to complete

\begin{enumerate}

\begin{enumerate}

\item\ {\tt LOD address,RA}

\item\ {\tt LW address,RA}

\item\ {\em (multiple stalls, bus dependent, 4 clocks best)}

\item\ {\em (multiple stalls, bus dependent, 4 clocks best)}

\item\ {\tt OPCODE I+RA,RB}

\item\ {\tt OPCODE I+RA,RB}

\end{enumerate}

\end{enumerate}

Remember, the ZipCPU does not support out of order execution.  Therefore,

Remember, the ZipCPU does not support out of order execution.  Therefore,

Line 1716...

Line 1677...

of clock cycles for the bus to be free, as shown in both Figs.~\ref{fig:memrd}

of clock cycles for the bus to be free, as shown in both Figs.~\ref{fig:memrd}

and~\ref{fig:memwr}, there will be stalls.

and~\ref{fig:memwr}, there will be stalls.

\item Memory operation followed by a memory operation

\item Memory operation followed by a memory operation

\begin{enumerate}

\begin{enumerate}

\item\ {\tt STO address,RA}

\item\ {\tt SW address,RA}

\item\ {\em (multiple stalls, bus dependent, 4 clocks best)}

\item\ {\em (multiple stalls, bus dependent, 4 clocks best)}

\item\ {\tt LOD address,RB}

\item\ {\tt LW address,RB}

\item\ {\em (multiple stalls, bus dependent, 4 clocks best)}

\item\ {\em (multiple stalls, bus dependent, 4 clocks best)}

\end{enumerate}

\end{enumerate}

In this case, the LOD instruction cannot start until the STO is finished,

In this case, the LW instruction cannot start until the SW is finished,

as illustrated by Fig.~\ref{fig:mstld}.

as illustrated by Fig.~\ref{fig:mstld}.

\begin{figure}\begin{center}

\begin{figure}\begin{center}

\includegraphics[width=5.5in]{../gfx/mstld.eps}

\includegraphics[width=5.5in]{../gfx/mstld.eps}

\caption{Pipeline handling of a store followed by a load instruction}\label{fig:mstld}

\caption{Pipeline handling of a store followed by a load instruction}\label{fig:mstld}

\end{center}\end{figure}

\end{center}\end{figure}

With proper scheduling, it is possible to do something in the ALU while the

With proper scheduling, it is possible to do something in the ALU while the

memory unit is busy with the STO instruction, but otherwise this pipeline will

memory unit is busy with the SW instruction, but otherwise this pipeline will

stall while waiting for it to complete before the load instruction can

stall while waiting for it to complete before the load instruction can

start.

start.

The ZipCPU has the capability of supporting a form of burst memory access,

The ZipCPU has the capability of supporting a form of burst memory access,

often called pipelined memory access within this document due to its use of

often called pipelined memory access within this document due to its use of

Line 1768...

Line 1729...

\subsection{Simplified Wishbone Bus}\label{ssec:bus}

\subsection{Simplified Wishbone Bus}\label{ssec:bus}

The bus architecture of the ZipCPU is that of a simplified, pipelined, WISHBONE

The bus architecture of the ZipCPU is that of a simplified, pipelined, WISHBONE

bus built according to the B4 specification.  Several changes have been made to

bus built according to the B4 specification.  Several changes have been made to

simplify this bus.  First, all unnecessary ancillary information has been

simplify this bus.  First, all unnecessary ancillary information has been

removed.  This includes the retry, tag, lock, cycle type indicator, and burst

removed.  This includes the retry, tag, lock, cycle type indicator, and burst

indicator signals.  It also includes the select lines which would enable the

indicator signals.  The bus supports big endian operation where the high order

CPU to act on less than 32--bit words.  As a result all operations on the bus

octet occupies the low order address.  Second, we insist that all

are 32--bit operations.  The bus is neither little endian nor big endian.  For

this reason, all words are 32--bits.  All instructions are also 32--bits wide.

Everything has been built around the 32--bit word.  Even the byte size (the

size of the minimum addressable unit) is 32--bits.  Second, we insist that all

accesses be pipelined, and simplify that further by insisting that pipelined

accesses be pipelined, and simplify that further by insisting that pipelined

accesses not cross peripherals---although we leave it to the user to keep that

accesses not cross peripherals---although we leave it to the user to keep that

from happening in practice.  Finally, we insist that the wishbone strobe line

from happening in practice.  Finally, we insist that the wishbone strobe line

be zero any time the cycle line is inactive.  This makes decoding simpler

be zero any time the cycle line is inactive.  This makes decoding simpler

in slave logic: a transaction is initiated whenever the strobe line is high

in slave logic: a transaction is initiated whenever the strobe line is high

Line 1790...

Line 1747...

The CPU knows nothing about which addresses reference on--chip or off-chip

The CPU knows nothing about which addresses reference on--chip or off-chip

memory, or even which reference peripherals.  Indeed, there is no indication

memory, or even which reference peripherals.  Indeed, there is no indication

within the CPU if a particular piece of memory can be cached or not, save that

within the CPU if a particular piece of memory can be cached or not, save that

the CPU assumes any and all instruction words can be cached.

the CPU assumes any and all instruction words can be cached.

The one exception to this rule revolves around addresses beginning with

The one exception to this rule revolves around addresses where the top 8-bits

{\tt 2'b11} in their high order word.  These addresses are used to access a

of their high order word are all ones.  These addresses are used to access a

variety of optional peripherals that will be discussed more in

variety of optional peripherals that will be discussed more in

Sec.~\ref{sec:zipsys}, but that are only present within the {\tt ZipSystem}.

Sec.~\ref{sec:zipsys}, but that are only present within the {\tt ZipSystem}.

When used with the bare {\tt ZipBones}, these addresses will cause a bus error.

When used with the bare {\tt ZipBones}, these addresses will cause a bus error.
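
As a minimal sketch of the address rule just described, the following C fragment tests whether a 32--bit address has its top eight bits set to all ones, placing it in the range reserved for the {\tt ZipSystem} peripherals (base {\tt 0xff000000}, per the revision history).  The function name {\tt zip\_is\_zipsys\_addr} is hypothetical and is not taken from any ZipCPU header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of any ZipCPU header: returns true when
 * the top eight bits of a 32-bit address are all ones, i.e. when the
 * address lies at or above 0xff000000. */
bool zip_is_zipsys_addr(uint32_t addr)
{
	return (addr >> 24) == 0xffu;
}
```

On a bare {\tt ZipBones} build, an access to any address for which this test is true would be expected to end in a bus error.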

The prefetch cache currently has no means of detecting an instruction that

The prefetch cache currently has no means of detecting an instruction that

was changed, save by clearing the instruction cache.  This may be necessary

was changed, save by clearing the instruction cache.  This may be necessary

when loading programs into previously used memory, or when creating

when loading programs into previously used memory, or when creating

self--modifying code.

self--modifying code.

Should the memory management unit (MMU) be integrated into the ZipCPU, the MMU

Should the memory management unit (MMU) be integrated into the ZipCPU, the MMU

will be able to be configured to instruct the ZipCPU as to which addresses may

configuration will tell the ZipCPU which addresses may be cached and which may not.

be cached and which not.

This topic is discussed further in the linker section, Sec.~\ref{sec:ld-mem}

This topic is discussed further in the linker section, Sec.~\ref{sec:ld-mem}

of the ABI chapter, Chap.~\ref{chap:abi}.

of the ABI chapter, Chap.~\ref{chap:abi}.

% \subsection{Measured Performance}\label{sec:perf}

% \subsection{Measured Performance}\label{sec:perf}

Line 1966...

Line 1922...

access\footnote{The pipeline cost of the DMA controller, including setup cost,

access\footnote{The pipeline cost of the DMA controller, including setup cost,

is a minimum of $14+2N$ clocks.} (The CPU gets priority over the bus, but once

is a minimum of $14+2N$ clocks.} (The CPU gets priority over the bus, but once

bus access is granted to the DMA peripheral, it will not be revoked mid--read

bus access is granted to the DMA peripheral, it will not be revoked mid--read

or mid--write.)

or mid--write.)

The DMA controller supports only aligned word accesses.  It does not support

byte or half-word accesses.

When copying memory from one location to another, the DMA controller will

When copying memory from one location to another, the DMA controller will

copy in units of a given transfer length--up to 1024 words at a time.  It will

copy in units of a given transfer length--up to 1024 words at a time.  It will

read that transfer length into its internal buffer, and then write to the

read that transfer length into its internal buffer, and then write to the

destination address from that buffer.

destination address from that buffer.
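
The figures quoted above can be turned into a small planning aid.  The C sketch below takes the footnote's $14+2N$ minimum at face value, assuming that $N$ counts the words moved, and computes how many read--then--write passes a copy needs for a given transfer length.  All names are hypothetical; none come from the DMA controller's sources.

```c
#include <stdint.h>

/* Largest transfer length the text mentions, in words. */
#define ZIP_DMA_MAX_XFER 1024u

/* Minimum clocks quoted in the footnote, 14 + 2N.  Treating N as the
 * number of words moved is an assumption of this sketch. */
uint32_t zip_dma_min_clocks(uint32_t nwords)
{
	return 14u + 2u * nwords;
}

/* Number of read-then-write passes needed to copy total_words when the
 * controller buffers xfer_len words per pass.  Lengths outside
 * 1..ZIP_DMA_MAX_XFER are clamped to the maximum. */
uint32_t zip_dma_passes(uint32_t total_words, uint32_t xfer_len)
{
	if (xfer_len == 0u || xfer_len > ZIP_DMA_MAX_XFER)
		xfer_len = ZIP_DMA_MAX_XFER;
	return (total_words + xfer_len - 1u) / xfer_len;
}
```

For instance, copying 4096 words with the largest transfer length would take four passes under these assumptions.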

Line 2023...

Line 1982...

% ELF Format

% ELF Format

% Stack:

% Stack:

%       R13 is the stack register.

%       R13 is the stack register.

%       The stack grows downward.

%       The stack grows downward.

%       Memory at the current stack pointer is allocated.

%       Memory at the current stack pointer is allocated.

%       Hence, a PUSH is : SUB 1,SP; STO Rx,(SP)

%       Hence, a PUSH is : SUB 1,SP; SW Rx,(SP)

% Heap:

% Heap:

%       In general, not yet implemented.

%       In general, not yet implemented.

%       A less than adequate Heap has been implemented as a pointer, from which

%       A less than adequate Heap has been implemented as a pointer, from which

%       malloc requests simply decrement it.  Free's are NOOPs, leaving

%       malloc requests simply decrement it.  Free's are NOOPs, leaving

%       allocated memory allocated forever.

%       allocated memory allocated forever.

\section{Executable File Format}\label{sec:abi-elf}

\section{Executable File Format}\label{sec:abi-elf}

ZipCPU executable files are stored in the Executable and Linkable Format

ZipCPU executable files are stored in the Executable and Linkable Format

(ELF), prior to being placed in flash, or whatever memory they will be

(ELF), prior to being placed in flash, or whatever memory they will be

executed from.  All addresses within this format are ZipCPU addresses,

executed from.

referencing 32--bit quantities, whereas all offsets internal to the ELF file

represent 8--bit quantities.  Thus, when running the {\tt zip-objdump} utility

The ZipCPU described by this specification uses the 16--bit value {\tt 16'hdad1}

on a ZipCPU ELF file, the addresses are properly set.

to identify itself against other CPUs.  This is not an officially registered

number, and may change in the future.

The ZipCPU does not (yet) have a dynamic linker/loader.  All linking is

The ZipCPU does not (yet) have a dynamic linker/loader.  All linking is

currently static, and done prior to run time.

currently static, and done prior to run time.

\section{Stack}\label{sec:abi-stack}

\section{Stack}\label{sec:abi-stack}

Although nothing in the hardware requires this, the compiler back end

Register {\tt R13} (also known as the {\tt SP} register) is the stack register.

implementation uses {\tt R13} (also known as the {\tt SP} register) as a stack

The compiler generates code that grows the stack from

register, and grows the stack from

high addresses to lower addresses.  That means that the stack will usually

high addresses to lower addresses.  That means that the stack will usually

start out set to a very large value, such as one past the last RAM address,

start out set to a very large value, such as one past the last RAM address,

and it will grow to lower and lower values--hopefully never mixing with the

and it will grow to lower and lower values--hopefully never mixing with the

heap.  Memory at the current stack position is assumed to be allocated.

heap.  Memory at the current stack position is assumed to be allocated.

When creating a stack frame for a function, the compiler will subtract

When creating a stack frame for a function, the compiler will subtract

the size of the stack frame from the stack register.  It will then store

the size of the stack frame from the stack register.  It will then store

any registers used by the function, from {\tt R5} to {\tt R12} (including

any registers used by the function, from {\tt R5} to {\tt R12} (including

the link register {\tt R0}) onto offsets given by the stack pointer plus a

the link register {\tt R0}) onto offsets given by the stack pointer plus a

constant.  If a frame pointer is used, the compiler uses {\tt R12} (or {\tt FP})

constant.  If a frame pointer is used, the compiler uses {\tt R12} (or

for this purpose.  The frame pointer is set by moving the stack pointer

{\tt FP}) for this purpose.  The frame pointer is set by moving the stack

plus an offset into {\tt FP}.  This {\tt MOV} instruction effectively limits

pointer plus an offset into {\tt FP}.  This {\tt MOV} instruction effectively

the size of any individual stack frame to $2^{12}-1$ words.

limits the size of any individual stack frame to $2^{12}-1$ octets.

Once a subroutine is complete, the frame is unwound.  If the frame pointer,

Once a subroutine is complete, the frame is unwound.  If the frame pointer,

{\tt FP} was used, then {\tt FP} is copied directly to the stack pointer,

{\tt FP} was used, then {\tt FP} is copied directly to the stack pointer,

{\tt SP}.  Registers are restored, starting with {\tt R0} all the way to

{\tt SP}.  Registers are restored, starting with {\tt R0} all the way to

{\tt R12} ({\tt FP}).  This also restores, and obliterates, the subroutine

{\tt R12} ({\tt FP}).  This also restores, and obliterates, the subroutine

frame pointer.  Once complete, a value is added to the stack pointer to return

frame pointer.  Once complete, a value is added to the stack pointer to

it to its original value, and a jump is made to the value located within

return it to its original value, and a jump is made to the value located

{\tt R0}.

within {\tt R0}.
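
The stack discipline described in this section can be modelled on a host machine.  The C sketch below is an illustration only: it assumes octet addressing and 32--bit registers, starts the stack pointer one past the last RAM address, and treats memory at the stack pointer as allocated, so that a push decrements first and then stores.  The type and function names are hypothetical, and the octet order within a stored word is the host's, not the ZipCPU's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZS_RAM_BYTES 64u	/* tiny RAM, for illustration only */

typedef struct {
	uint8_t  ram[ZS_RAM_BYTES];
	uint32_t sp;		/* models R13, the stack pointer */
} zip_stack_model;

/* The stack starts one past the last RAM address and grows downward. */
void zs_init(zip_stack_model *m)
{
	memset(m->ram, 0, sizeof m->ram);
	m->sp = ZS_RAM_BYTES;
}

/* PUSH: make room first (SUB 4,SP), then store at (SP). */
void zs_push(zip_stack_model *m, uint32_t value)
{
	assert(m->sp >= 4u);
	m->sp -= 4u;
	memcpy(&m->ram[m->sp], &value, sizeof value);
}

/* POP: load from (SP), then release the space (ADD 4,SP). */
uint32_t zs_pop(zip_stack_model *m)
{
	uint32_t value;

	assert(m->sp + 4u <= ZS_RAM_BYTES);
	memcpy(&value, &m->ram[m->sp], sizeof value);
	m->sp += 4u;
	return value;
}
```

Pushing two words and popping them again returns the values in reverse order and leaves the stack pointer where it began.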

\section{Relocations}\label{sec:abi-reloc}

\section{Relocations}\label{sec:abi-reloc}

The ZipCPU binutils back end supports two several relocations, although the

The ZipCPU binutils back end supports several types of relocations, although

two most common are the 32--bit relocations for register load and long jump.

the two most common are the 32--bit relocations for register load and long

jump.

The first of these is for loading an arbitrary 32--bit value into a register.

The first of these is for loading an arbitrary 32--bit value into a register.

Such instructions are broken into a pair of {\tt BREV} and {\tt LDILO}

Such instructions are broken into a pair of {\tt BREV} and {\tt LDILO}

instructions, and once the value of the parameter is known their immediates

instructions, and once the value of the parameter is known their immediate

are filled in.

values can be filled in.
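
To make the first relocation concrete, the C sketch below splits a 32--bit value into the two immediates such a pair might carry.  It assumes that {\tt BREV} loads the bit--reversed form of its immediate into the register, and that {\tt LDILO} then replaces only the low 16 bits.  Both behaviours are modelled here from the instruction names rather than copied from the assembler, and the function and type names are hypothetical.

```c
#include <stdint.h>

/* Reverse the order of all 32 bits in v. */
uint32_t bitrev32(uint32_t v)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++) {
		r = (r << 1) | (v & 1u);
		v >>= 1;
	}
	return r;
}

typedef struct {
	uint32_t brev_imm;	/* immediate for BREV, fits in 16 bits */
	uint32_t ldilo_imm;	/* immediate for LDILO, the low 16 bits */
} zip_ldi_pair;

/* Split a 32-bit value into the two immediates. */
zip_ldi_pair zip_split_ldi(uint32_t value)
{
	zip_ldi_pair p;

	p.brev_imm  = bitrev32(value & 0xffff0000u);
	p.ldilo_imm = value & 0x0000ffffu;
	return p;
}

/* Model the effect of executing BREV followed by LDILO. */
uint32_t zip_apply_ldi(zip_ldi_pair p)
{
	uint32_t reg = bitrev32(p.brev_imm);			/* BREV  */

	return (reg & 0xffff0000u) | (p.ldilo_imm & 0xffffu);	/* LDILO */
}
```

Under these assumptions, applying the pair produced for any value reproduces that value, which is the property the linker relies upon when it fills in the immediates.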

The second type of 32--bit relocation is for jumps to arbitrary addresses.

The second type of 32--bit relocation is for jumps to arbitrary addresses.

These jumps are supported by the \hbox{\tt LOD (PC),PC} instruction, followed

These jumps are supported by the \hbox{\tt LW (PC),PC} instruction, followed

by the 32--bit address to be filled in later by the linker.  If the jump is

by the 32--bit address to be filled in later by the linker.  If the jump is

conditional, then a conditional \hbox{\tt LOD.$x$ 1(PC),PC} instruction is

conditional, then a conditional \hbox{\tt LW.$x$ 4(PC),PC} instruction is

used, followed by a {\tt BRA 1(PC),PC} and then the 32--bit relocation value.

used, followed by an {\tt ADD 4,PC} and then the 32--bit relocation value.

If the branch distance is known and within reach, branches will be implemented

If a branch distance is known and within reach, then it will be implemented

with {\tt ADD \#,PC} instructions, possibly conditional, as necessary.

with an {\tt ADD \#,PC} instruction, possibly conditional, as necessary.

While other relocations are supported, they tend not to be used nearly as much

While other relocations are supported, they tend not to be used nearly as much

as these two.

as these two.

\section{Call format}\label{sec:abi-jsr}

\section{Call format}\label{sec:abi-jsr}

One feature of the ZipCPU is that it has no JSR instruction.  Jumps to

One unique feature of the ZipCPU is that it has no JSR instruction.  The assembler

subroutine's therefore take three assembly instructions:

attempts to minimize this problem by replacing a {\tt JSR}~{\em address}

The first is a {\tt MOV .Lcall\#\#(PC),R0}, which places the return address

instruction with a {\tt MOV \#(PC),R0} followed by a jump to the requested

into R0.  {\tt .Lcall\#\#} in this case is a label, where \#\# is a unique

address.  In this case, the offset to the PC for the {\tt MOV} instruction

number filled in by the compiler.  This instruction is followed by a

is determined by whether or not the jump can be accomplished with a local

{\tt BRA subroutine} instruction.  Finally, the third assembly ``instruction''

branch or a long jump.

of any call sequence is the label {\tt .Lcall\#\#}.

While this works well in practice, GCC's implementation prevents such things

While this works well in practice, this implementation prevents such things

as {\tt JSR}'s followed by {\tt BRA}'s from being combined together.

as {\tt JSR}'s followed by {\tt BRA}'s from being combined together.

Finally, the first five operands passed to the subroutine will be placed into

Finally, GCC will place the first five operands passed to the subroutine into

registers R1--R5.  Any additional operands are placed upon the stack.

registers R1--R5.  Any additional operands are placed upon the stack.

\section{Built-ins}\label{sec:abi-builtin}

\section{Built-ins}\label{sec:abi-builtin}

The ZipCPU ABI supports a number of built-in functions.  The compiler

The ZipCPU ABI supports a number of built-in functions.  The compiler

maps these functions directly to assembly language equivalents, essentially

maps these functions directly to assembly language equivalents, essentially

Line 2114...

Line 2073...

instructions.  These are:

instructions.  These are:

\begin{enumerate}

\begin{enumerate}

\item {\tt zip\_bitrev(int)} reverses the bits in the given integer, returning

\item {\tt zip\_bitrev(int)} reverses the bits in the given integer, returning

        the result.  This utilizes the internal {\tt BREV} instruction, and is

        the result.  This utilizes the internal {\tt BREV} instruction, and is

        designed to be used with FFT's as necessary.

        designed to be used with FFT's as necessary.

\item {\tt zip\_busy()} executes an {\tt ADD -1,PC} instruction, essentially

\item {\tt zip\_busy()} executes an {\tt ADD -4,PC} instruction, essentially

        forcing the CPU into a very tight infinite loop.

        forcing the CPU into a very tight infinite loop.

\item {\tt zip\_cc()} returns the value of the current CC register.  This may

\item {\tt zip\_cc()} returns the value of the current CC register.  This may

        be used within both user and supervisor code to determine which

        be used within both user and supervisor code to determine which

        mode the CPU is in.

        mode the CPU is in.

\item {\tt zip\_halt()} executes an \hbox{\tt OR \$SLEEP,CC} instruction to

\item {\tt zip\_halt()} executes an \hbox{\tt OR \$SLEEP,CC} instruction to

Line 2196...

Line 2155...

\begin{eqnarray*}

\begin{eqnarray*}

\mbox{blkram (wx) : ORIGIN = 0x0008000, LENGTH = 0x0008000}

\mbox{blkram (wx) : ORIGIN = 0x0008000, LENGTH = 0x0008000}

\end{eqnarray*}

\end{eqnarray*}

specifies that there is a region of memory, called blkram, that can be read and

specifies that there is a region of memory, called blkram, that can be read and

written, and that programs can execute from.  This section starts at address

written, and that programs can execute from.  This section starts at address

{\tt 0x8000} and extends for another {\tt 0x8000} words.  The other memories

{\tt 0x8000} and extends for another {\tt 0x8000} bytes.  The other memories

are defined in a similar manner, with names {\tt flash} and {\tt sdram}.

are defined in a similar manner, with names {\tt flash} and {\tt sdram}.
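
The meaning of such a memory line can be checked with a few lines of C.  The sketch below records a region by its origin and length in bytes, as in the {\tt blkram} example, and tests whether an address falls inside it.  The structure and function names are hypothetical and are not part of the linker or of any ZipCPU support library.

```c
#include <stdbool.h>
#include <stdint.h>

/* One entry of a linker MEMORY section: origin and length in bytes. */
typedef struct {
	uint32_t origin;
	uint32_t length;
} zip_mem_region;

/* The blkram line from the text: ORIGIN = 0x0008000, LENGTH = 0x0008000. */
static const zip_mem_region zip_blkram = { 0x0008000u, 0x0008000u };

/* True when addr lies within [origin, origin + length). */
bool zip_region_contains(const zip_mem_region *r, uint32_t addr)
{
	return addr >= r->origin && (addr - r->origin) < r->length;
}
```

With these values, {\tt 0x8000} through {\tt 0xffff} are inside the region, while {\tt 0x7fff} and {\tt 0x10000} are not.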

Following the memory section, three specific symbols are defined:

Following the memory section, three specific symbols are defined:

        {\tt \_flash}, defining the beginning of flash memory,

        {\tt \_flash}, defining the beginning of flash memory,

        {\tt \_blkram}, defining the beginning of on--chip block RAM,

        {\tt \_blkram}, defining the beginning of on--chip block RAM,

Line 2301...

Line 2260...

        Equivalently, this is the address of the first unused piece of

        Equivalently, this is the address of the first unused piece of

        memory, or the location from whence to start any dynamic memory

        memory, or the location from whence to start any dynamic memory

        subsystem.

        subsystem.

\end{enumerate}

\end{enumerate}

All of these symbols need to reference word aligned addresses.

\section{Loading ZipCPU Programs}

\section{Loading ZipCPU Programs}

There are two basic ways to load a ZipCPU program, depending upon whether or

There are two basic ways to load a ZipCPU program, depending upon whether or

not the ZipCPU is active within the current configuration.  If the ZipCPU

not the ZipCPU is active within the current configuration.  If the ZipCPU

is not a part of the current FPGA configuration, one need only write the

is not a part of the current FPGA configuration, one need only write the

flash and then switch configurations.  It will be the CPU's responsibility

flash and then switch configurations.  It will be the CPU's responsibility

Line 2507...

Line 2468...

{\tt void timer\_delay(int nclocks) \{} \\

{\tt void timer\_delay(int nclocks) \{} \\

\hbox to 0.25in{}\= {\em // Clear the PIC.  We want to exit from here on timer counts alone}\\

\hbox to 0.25in{}\= {\em // Clear the PIC.  We want to exit from here on timer counts alone}\\

        \> {\tt zip->pic = DISABLEALL|SYSINT\_TMA;}\\

        \> {\tt zip->pic = DISABLEALL|SYSINT\_TMA;}\\

        \> {\tt if (nclocks > 10) \{}\\

        \> {\tt if (nclocks > 10) \{}\\

        \> \hbox to 0.25in{}\= {\em // Set our timer to count down the given number of counts}\\

        \> \hbox to 0.25in{}\= {\em // Set our timer to count down the given number of counts}\\

        \> \> {\tt zip->tma = counts} \\

        \> \> {\tt zip->tma = nclocks;} \\

        \> \> {\tt zip->pic = EINT(SYSINT\_TMA);} \\

        \> \> {\tt zip->pic = EINT(SYSINT\_TMA);} \\

        \> \> {\tt zip\_wait();} \\

        \> \> {\tt zip\_wait();} \\

        \> \> {\tt zip->pic = CLEARPIC;} \\

        \> \> {\tt zip->pic = CLEARPIC;} \\

        \> {\tt \} }{\em // else anything less has likely already passed}

        \> {\tt \} }{\em // else anything less has likely already passed} \\

{\tt \}}\\

{\tt \}}\\

\end{tabbing}

\end{tabbing}

\caption{Waiting on a timer}\label{tbl:shi-timer}

\caption{Waiting on a timer}\label{tbl:shi-timer}

\end{center}\end{table}

\end{center}\end{table}

we present one means of waiting for a programmable amount of time using a

we present one means of waiting for a programmable amount of time using a

Line 2679...

Line 2640...

One common operation is that of a memory move or copy.  This section will

One common operation is that of a memory move or copy.  This section will

present several methods available to the ZipCPU for performing a memory

present several methods available to the ZipCPU for performing a memory

copy, starting with the C code shown in Tbl.~\ref{tbl:memcp-c}.

copy, starting with the C code shown in Tbl.~\ref{tbl:memcp-c}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\parbox{4in}{\begin{tabbing}

\parbox{4in}{\begin{tabbing}

{\tt void} \= {\tt memcpy(void *dest, void *src, int len) \{} \\

{\tt void} \= {\tt memcpy(char *dest, char *src, int len) \{} \\

        \> {\tt for(int i=0; i<len; i++)} \\

        \> {\tt for(int i=0; i<len; i++)} \\

        \> \hspace{0.2in} {\tt *dest++ = *src++;} \\

        \> \hspace{0.2in} {\tt *dest++ = *src++;} \\

\}

\}

\end{tabbing}}

\end{tabbing}}

\caption{Example Memory Copy code in C}\label{tbl:memcp-c}

\caption{Example Memory Copy code in C}\label{tbl:memcp-c}

Line 2696...

Line 2657...

\begin{tabbing}

\begin{tabbing}

memcpy: \\

memcpy: \\

\hbox to 0.35in{}\={\em ; R0 = return address, R1 = *dest, R2 = *src, R3 = LEN} \\

\hbox to 0.35in{}\={\em ; R0 = return address, R1 = *dest, R2 = *src, R3 = LEN} \\

\>      {\em ; The following will operate in 6 ($N=0$), or $2+12N$ clocks ($N\neq 0$).} \\

\>      {\em ; The following will operate in 6 ($N=0$), or $2+12N$ clocks ($N\neq 0$).} \\

\>      {\tt CMP 0,R3} \\ % 8 clocks per setup

\>      {\tt CMP 0,R3} \\ % 8 clocks per setup

\>      {\tt JMP.Z R0} \hbox to 0.3in{}\= {\em ; A conditional return }\\

\>      {\tt RETN.Z} \hbox to 0.3in{}\= {\em ; A conditional return }\\

\>      {\em ; No stack frame needs to be set up to use {\tt R4}, since the compiler}\\

\>      {\em ; No stack frame needs to be set up to use {\tt R4}, since the compiler}\\

\>      {\em  ; assumes {\tt R1}-{\tt R4} may be used and changed by any subroutine} \\

\>      {\em  ; assumes {\tt R1}-{\tt R4} may be used and changed by any subroutine} \\

memcpy\_loop: \\ % 12 clocks per loop

memcpy\_loop: \\ % 12 clocks per loop

\>      {\tt LOD (R2),R4} \\

\>      {\tt LB (R2),R4} \\

\>      {\em ; (4 stalls, cannot be scheduled away)} \\

\>      {\em ; (4 stalls, cannot be scheduled away)} \\

\>      {\tt STO R4,(R1)} \> {\em ; (4 schedulable stalls, has no impact now)} \\

\>      {\tt SB R4,(R1)} \> {\em ; (4 schedulable stalls, has no impact now)} \\

\>      {\em ; Update our count of the number of remaining values to copy}\\

\>      {\em ; Update our count of the number of remaining values to copy}\\

\>      {\tt SUB 1,R3}  \> {\em ; This will be zero when we have copied our last}\\

\>      {\tt SUB 1,R3}  \> {\em ; This will be zero when we have copied our last}\\

\>      {\tt JMP.Z R0}  \> {\em ; + 4 stalls, if taken}\\

\>      {\tt RETN.Z}    \> {\em ; + 4 stalls, if taken}\\

\>      {\tt ADD 1,R1}  \> {\em ; Increment the destination pointer }\\

\>      {\tt ADD 1,R1}  \> {\em ; Increment the destination pointer }\\

\>      {\tt ADD 1,R2}  \> {\em ; Increment the source pointer }\\

\>      {\tt ADD 1,R2}  \> {\em ; Increment the source pointer }\\

\>      {\tt BRA memcpy\_loop} \\

\>      {\tt BRA memcpy\_loop} \\

\>      {\em ; (1 stall on a BRA instruction)} \\

\>      {\em ; (1 stall on a BRA instruction)} \\

\end{tabbing}

\end{tabbing}

\caption{Example Memory Copy code in Zip Assembly, Unoptimized}\label{tbl:memcp-asm}

\caption{Example Memory Copy code in Zip Assembly, Unoptimized}\label{tbl:memcp-asm}

\end{center}\end{table}

\end{center}\end{table}

This example points out several things associated with the ZipCPU.  First,

This example points out several things associated with the ZipCPU.  First,

a straightforward implementation of a for loop is not the fastest loop

a straightforward implementation of a for loop is not the fastest loop

structure.  For this reason, we have placed the test to continue at the

structure.  For this reason, we have placed the test to continue at the

end.  Second, all pointers are {\tt void} pointers to arbitrary 32--bit

end.  Second, notice that we can use {\tt R4} without storing it, since the

data types.  The ZipCPU does not have explicit support for smaller or larger

C~ABI allows for subroutines to use {\tt R1}--{\tt R4} without saving them.

data types, and so this memory copy cannot be applied at an 8--bit level.

This means that we can return from this subroutine using conditional jumps to

Third, notice that we can use {\tt R4} without storing it, since the C~ABI

allows for subroutines to use {\tt R1}--{\tt R4} without saving them.  This

means that we can return from this subroutine using conditional jumps to

{\tt R0}.

{\tt R0}.

Still, there's more that could be done.  Suppose we wished to use the pipeline

Still, there's more that could be done.  Suppose we wished to use the pipeline

bus capability?  We might then write something closer to

bus capability?  We might then write something closer to

Tbl.~\ref{tbl:memcp-opt}.

Tbl.~\ref{tbl:memcp-opt}.

Line 2736...

Line 2694...

{\em ; Upon entry, R0 = return address, R1 = *dest, R2 = *src, R3 = LEN} \\

{\em ; Upon entry, R0 = return address, R1 = *dest, R2 = *src, R3 = LEN} \\

{\em ; Achieves roughly $32+17\left\lfloor\frac{N}{4}\right\rfloor$ clocks,

{\em ; Achieves roughly $32+17\left\lfloor\frac{N}{4}\right\rfloor$ clocks,

        after the initial pipeline delay}\\

        after the initial pipeline delay}\\

memcpy\_opt: \\

memcpy\_opt: \\

\hbox to 0.35in{}\=\hbox to 1.4in{\tt CMP 4,R3}\= {\em ; Check for small short lengths, len $<$ 4}\\

\hbox to 0.35in{}\=\hbox to 1.4in{\tt CMP 4,R3}\= {\em ; Check for small short lengths, len $<$ 4}\\

\>      {\tt BC memcpy\_finish} \> {\em ; Jump to the end if so}\\

\>      {\tt BC \_memcpy\_finish}       \> {\em ; Jump to the end if so}\\

\hbox to 0.35in{}\=\hbox to 1.4in{\tt SUB 3,SP}\= {\em ; Otherwise, create a stack frame, storing the registers}\\

\hbox to 0.35in{}\=\hbox to 1.4in{\tt SUB 12,SP}\= {\em ; Otherwise, create a stack frame, storing the registers}\\

\>      {\tt STO R5,(SP)}       \> {\em ; we will be using.  Note that this is a pipelined store, so}\\

\>      {\tt SW R5,(SP)}        \> {\em ; we will be using.  Note that this is a pipelined store, so}\\

\>      {\tt STO R6,1(SP)}      \> {\em ; subsequent stores only cost 1 clock.}\\

\>      {\tt SW R6,4(SP)}       \> {\em ; subsequent stores only cost 1 clock.}\\

\>      {\tt STO R7,2(SP)}\\

\>      {\tt SW R7,8(SP)}\\

\>      {\tt ADD 4,R2}          \> {\em ; Pre-Increment our pointers, for a 4-stage pipeline.  This}\\

\>      {\tt ADD 4,R2}          \> {\em ; Pre-Increment our pointers, for a 4-stage pipeline.  This}\\

\>      {\tt ADD 4,R1}          \> {\em ; also fills up the 3 of the 4 stall states following the}\\

\>      {\tt ADD 4,R1}          \> {\em ; also fills up the 3 of the 4 stall states following the}\\

\>      {\tt SUB 5,R3}          \> {\em ; stores.  Also, leave {\tt R3} as the number left minus one.}\\

\>      {\tt SUB 5,R3}          \> {\em ; stores.  Also, leave {\tt R3} as the number left minus one.}\\

\>      {\tt LOD -4(R2),R4}     \> {\em ; Load the first four values into }\\

\>      {\tt LB -4(R2),R4}      \> {\em ; Load the first four values into }\\

\>      {\tt LOD -3(R2),R5}     \> {\em ; registers, using a pipelined load.}\\

\>      {\tt LB -3(R2),R5}      \> {\em ; registers, using pipelined loads.}\\

\>      {\tt LOD -2(R2),R6}\\

\>      {\tt LB -2(R2),R6}\\

\>      {\tt LOD -1(R2),R7}\\

\>      {\tt LB -1(R2),R7}\\

{\tt mcopy\_next\_four\_chars:} \>\> {\em ; Here's the top of our copy loop}\\

{\tt \_mcopy\_next\_four\_chars:} \>\> {\em ; Here's the top of our copy loop}\\

\>      {\tt STO  R4,-4(R1)}    \> {\em ; Store four values, using a burst memory operation.}\\

\>      {\tt SB  R4,-4(R1)}     \> {\em ; Store four values, using a burst memory operation.}\\

\>      {\tt STO  R5,-3(R1)}    \> {\em ; One clock for subsequent stores.}\\

\>      {\tt SB  R5,-3(R1)}     \> {\em ; One clock for subsequent stores.}\\

\>      {\tt STO  R6,-2(R1)}    \> {\em ; None of these affect the flags that were set when}\\

\>      {\tt SB  R6,-2(R1)}     \> {\em ; None of these affect the flags that were set when}\\

\>      {\tt STO  R7,-1(R1)}    \> {\em ; we last adjusted {\tt R3}}\\

\>      {\tt SB  R7,-1(R1)}     \> {\em ; we last adjusted {\tt R3}}\\

\>      {\tt BC  preend\_memcpy} \> {\em ; +4 stall cycles, but only when taken}\\

\>      {\tt BC  \_preend\_memcpy} \> {\em ; +4 stall cycles, but only when taken}\\

\>      {\tt ADD  4,R1} \> {\em ; ALU ops don't stall during stores, so}\\

\>      {\tt ADD  4,R1} \> {\em ; ALU ops don't stall during stores, so}\\

\>      {\tt ADD  4,R2} \> {\em ; increment our pointers here.} \\

\>      {\tt ADD  4,R2} \> {\em ; increment our pointers here.} \\

\>      {\tt SUB  4,R3} \> {\em ; Calculate whether or not we have a next round}\\

\>      {\tt SUB  4,R3} \> {\em ; Calculate whether or not we have a next round}\\

\>      {\tt LOD  -4(R2),R4} \> {\em ; Preload the values for the next round}\\

\>      {\tt LB  -4(R2),R4} \>  {\em ; Preload the values for the next round}\\

\>      {\tt LOD  -3(R2),R5}\>  {\em ; Notice that these are also pipelined}\\

\>      {\tt LB  -3(R2),R5}\>   {\em ; Notice that these are also pipelined}\\

\>      {\tt LOD  -2(R2),R6}\>  {\em ; loads, as before.}\\

\>      {\tt LB  -2(R2),R6}\>   {\em ; loads, as before.}\\

\>      {\tt LOD  -1(R2),R7}\>  {\em ; The four stall cycles, though, are concurrent w/ the branch.}\\

\>      {\tt LB  -1(R2),R7}\>  {\em ; The four stall cycles, though, are concurrent w/ the branch.}\\

\>      {\tt BRA  mcopy\_next\_char} \> {\em ; Early branching avoids the full memory pipeline stall} \\

\>      {\tt BRA  \_mcopy\_next\_four\_chars}\hspace{0.25in} {\em ; Early branching avoids the full memory pipeline stall} \\

{\tt preend\_memcpy:}\\

{\tt \_preend\_memcpy:}\\

\>      {\tt ADD  1,R3} \>{\em ; R3 is now the remaining length, rather than one less than it}\\

\>      {\tt ADD  1,R3} \>{\em ; R3 is now the remaining length, rather than one less than it}\\

\>      {\tt LOD (SP),R5}  \> {\em ; Restore our saved registers, since the remainder of the routine}\\

\>      {\tt LW (SP),R5}  \> {\em ; Restore our saved registers, since the remainder of the routine}\\

\>      {\tt LOD 1(SP),R6} \> {\em ; doesn't use these registers}\\

\>      {\tt LW 4(SP),R6} \> {\em ; doesn't use these registers}\\

\>      {\tt LOD 2(SP),R7} \> {\em ;}\\

\>      {\tt LW 8(SP),R7} \> {\em ;}\\

\>      {\tt ADD 3,SP}  \>{\em ; Adjust the stack pointer back to what it was}\\

\>      {\tt ADD 12,SP} \>{\em ; Adjust the stack pointer back to what it was}\\

{\tt memcpy\_finish:}\>\>{\em ; At this point, there are $0\leq$ {\tt R3}$<4$ words left}\\

{\tt \_memcpy\_finish:}\>\>{\em ; At this point, there are $0\leq$ {\tt R3}$<4$ bytes left}\\

\>      {\tt CMP 1,R3} \> {\em ; Check if any ops are remaining }\\

\>      {\tt CMP 1,R3} \> {\em ; Check if any ops are remaining }\\

\>      {\tt JMP.LT R0} \> {\em ; Return now if nothing is left}\\

\>      {\tt RETN.LT} \> {\em ; Return now if nothing is left}\\

\>      {\tt LOD (R1),R4} \> {\em ; Load and store the first item}\\

\>      {\tt LB (R1),R4} \> {\em ; Load and store the first item}\\

\>      {\tt STO R4,(R1)} \> {\em ;}\\

\>      {\tt SB R4,(R1)} \> {\em ;}\\

\>      {\tt JMP.Z R0}  \> {\em ; Return if that was our only value}\\

\>      {\tt RETN.Z}    \> {\em ; Return if that was our only value}\\

\>      {\tt LOD 1(R1),R4}\>{\em; Load and store the second item (if necessary)} \\

\>      {\tt LB 1(R1),R4}\>{\em; Load and store the second item (if necessary)} \\

\>      {\tt STO R4,1(R1)}\\

\>      {\tt SB R4,1(R1)}\\

\>      {\tt CMP 2, R3}\\

\>      {\tt CMP 2, R3}\\

\>      {\tt JMP.LT R0}\\

\>      {\tt RETN.LT}\\

\>      {\tt LOD 2(R1),R4}\>{\em; Load and store the third item (if necessary)} \\

\>      {\tt LB 2(R1),R4}\>{\em; Load and store the third item (if necessary)} \\

\>      {\tt STO R4,2(R1)}\>{\em; {\tt LOD}, {\tt STO}, {\tt JMP R0} will cost 10 cycles}\\

\>      {\tt SB R4,2(R1)}\>{\em; {\tt LB}, {\tt SB}, {\tt RETN} will cost 10 cycles}\\

\>      {\tt JMP     R0} \> {\em ; Finally, we return}\\

\>      {\tt RETN} \> {\em ; Finally, we return}\\

\end{tabbing}}}

\end{tabbing}}}

\caption{Example Memory Copy code in Zip Assembly, Hand Optimized}\label{tbl:memcp-opt}

\caption{Example Memory Copy code in Zip Assembly, Hand Optimized}\label{tbl:memcp-opt}

\end{center}\end{table}

\end{center}\end{table}

This pipeline memory example, though, provides some neat things to discuss

This pipeline memory example, though, provides some neat things to discuss

about optimizing code using the ZipCPU.

about optimizing code using the ZipCPU.

Line 2818...

Line 2776...

without needing a new comparison.  Hence, zero to three separate values can be

without needing a new comparison.  Hence, zero to three separate values can be

copied using only two compares.

copied using only two compares.

However, this discussion wouldn't be complete without an example of how

However, this discussion wouldn't be complete without an example of how

this memory operation would be made even simpler using the direct memory

this memory operation would be made even simpler using the direct memory

access controller.  In that case, we can return to C with the code in

access controller.  In that case, we can return to the C language with the

Tbl.~\ref{tbl:memcp-dmac}.

code in Tbl.~\ref{tbl:memcp-dmac}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabbing}

\begin{tabbing}

{\tt \#define DMACOPY 0x0fed0000} {\em // Copy memory, largest chunk at a time possible} \\

{\tt \#define DMACOPY 0x0fed0000} {\em // Copy memory, largest chunk at a time possible} \\

\\

\\

{\tt void} \= {\tt memcpy\_dma(void *dest, void *src, int len) \{} \\

{\tt void} \= {\tt memcpy\_dma(void *dest, void *src, int len) \{} \\

Line 2845...

Line 2803...

        \> {\tt zip\_wait();}\\

        \> {\tt zip\_wait();}\\

{\tt \}}

{\tt \}}

\end{tabbing}

\end{tabbing}

\caption{Example Memory Copy code using the DMA}\label{tbl:memcp-dmac}

\caption{Example Memory Copy code using the DMA}\label{tbl:memcp-dmac}

\end{center}\end{table}

\end{center}\end{table}

For large memory amounts, the cost of this approach will scale at roughly

The DMA, however, will only work with an integer number of 32--bit aligned

2~clocks per word transferred.

words.  Still, for large memory amounts, the cost of this approach will scale

at roughly 2~clocks per word transferred.

Notice how much simpler this memory copy has become to write by using the DMA.

Notice how much simpler this memory copy has become to write by using the DMA.

But also consider that the system has only one direct memory access controller.

But also consider that the system has only one direct memory access controller.

What happens if one task tries to use the controller when it is already in use

What happens if one task tries to use the controller when it is already in use

by another task?  The result is that the direct memory access controller may

by another task?  The result is that the direct memory access controller may

Line 2861...

Line 2820...

Another example worth discussing is the {\tt memset()} library function.

Another example worth discussing is the {\tt memset()} library function.

A straightforward implementation of this function in C might look like

A straightforward implementation of this function in C might look like

Tbl.~\ref{tbl:memset-c}.

Tbl.~\ref{tbl:memset-c}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabbing}

\begin{tabbing}

\hbox to 0.4in{\tt void} \= {\tt *memset(void *s, int c, size\_t n) \{} \\

\hbox to 0.4in{\tt void} \= {\tt *memset(char *s, int c, size\_t n) \{} \\

        \> {\tt for(size\_t i=0; i<n; i++)} \\

        \> {\tt for(size\_t i=0; i<n; i++)} \\

        \> \hspace{0.4in} {\tt ((char *)s)[i] = c;} \\

        \> \hspace{0.4in} {\tt ((char *)s)[i] = c;} \\

        \> {\tt return s;}\\

        \> {\tt return s;}\\

{\tt \}}

{\tt \}}

\end{tabbing}

\end{tabbing}

Line 2877...

Line 2836...

\begin{tabbing}

\begin{tabbing}

{\em ; Upon entry, R0 = return address, R1 = s, R2 = c, R3 = len}\\

{\em ; Upon entry, R0 = return address, R1 = s, R2 = c, R3 = len}\\

{\em ; Cost: Roughly $4+6N$ clocks}\\

{\em ; Cost: Roughly $4+6N$ clocks}\\

{\tt memset:}\\

{\tt memset:}\\

\hbox to 0.25in{}\=\hbox to 1in{\tt TST R3}\={\em ; Return immediately if len (R3) is zero}\\

\hbox to 0.25in{}\=\hbox to 1in{\tt TST R3}\={\em ; Return immediately if len (R3) is zero}\\

\>      {\tt JMP.Z R0}\\

\>      {\tt RETN.Z}\\

\>      {\tt MOV R1,R4} \> {\em ; Keep our return value in R1, use R4 as a local}\\

\>      {\tt MOV R1,R4} \> {\em ; Keep our return value in R1, use R4 as a local}\\

{\tt memset\_loop:}\>\> {\em ; Here, we know we have at least one more to go}\\

{\tt memset\_loop:}\>\> {\em ; Here, we know we have at least one more to go}\\

\>      {\tt STO R2,(R4)} \> {\em       ; Store one value (no pipelining)} \\

\>      {\tt SB R2,(R4)} \> {\em        ; Store one value (no pipelining)} \\

\>      {\tt SUB 1,R3} \> {\em; Subtract during the store}\\

\>      {\tt SUB 1,R3} \> {\em; Subtract during the store}\\

\>      {\tt JMP.Z R0} \> {\em; Return (during store) if all done}\\

\>      {\tt RETN.Z} \> {\em; Return (during store) if all done}\\

\>      {\tt ADD 1,R4} \> {\em; Otherwise increment our pointer}\\

\>      {\tt ADD 1,R4} \> {\em; Otherwise increment our pointer}\\

\>      {\tt BRA memset\_loop} {\em ; and repeat}\\

\>      {\tt BRA memset\_loop} {\em ; and repeat}\\

\end{tabbing}

\end{tabbing}

\caption{Example Memset code, minimally optimized}\label{tbl:memset-unop}

\caption{Example Memset code, minimally optimized}\label{tbl:memset-unop}

\end{center}\end{table}

\end{center}\end{table}

Line 2908...

Line 2867...

\hbox to 0.25in{}\=\hbox to 0.6in{\tt MOV}\=\hbox to 1.0in{\tt R1,R4}\={\em ; Make a local copy of *s, so we can return R1}\\

\hbox to 0.25in{}\=\hbox to 0.6in{\tt MOV}\=\hbox to 1.0in{\tt R1,R4}\={\em ; Make a local copy of *s, so we can return R1}\\

\>      {\tt CMP}\>{\tt 4,R3}\>{\em ; Jump to non--unrolled section}\\

\>      {\tt CMP}\>{\tt 4,R3}\>{\em ; Jump to non--unrolled section}\\

\>      {\tt JMP.C}\>{\tt memset\_pipe\_tail}\\

\>      {\tt JMP.C}\>{\tt memset\_pipe\_tail}\\

\>      {\tt SUB}\>{\tt 1,R3}\> {\em ; R3 is now one less than the number to finish}\\

\>      {\tt SUB}\>{\tt 1,R3}\> {\em ; R3 is now one less than the number to finish}\\

{\tt memset\_pipe\_unrolled:}\>\>\> {\em ; Here, we know we have at least four more to go}\\

{\tt memset\_pipe\_unrolled:}\>\>\> {\em ; Here, we know we have at least four more to go}\\

\>      {\tt STO}\>{\tt R2,(R4)} \> {\em  ; Store our four values, pipelining our}\\

\>      {\tt SB}\>{\tt R2,(R4)} \> {\em  ; Store our four values, pipelining our}\\

\>      {\tt STO}\>{\tt R2,1(R4)} \> {\em ; access across the bus }\\

\>      {\tt SB}\>{\tt R2,1(R4)} \> {\em ; access across the bus }\\

\>      {\tt STO}\>{\tt R2,2(R4)} \\

\>      {\tt SB}\>{\tt R2,2(R4)} \\

\>      {\tt STO}\>{\tt R2,3(R4)} \\

\>      {\tt SB}\>{\tt R2,3(R4)} \\

\>      {\tt SUB}\>{\tt 4,R3} \> {\em; If there are zero left, this will be a -1 result}\\

\>      {\tt SUB}\>{\tt 4,R3} \> {\em; If there are zero left, this will be a -1 result}\\

\>      {\tt JMP.C}\>{\tt prememset\_pipe\_tail}\> \hbox to 0.5in{}\= {\em; So we can use our LT condition}\\

\>      {\tt BC}\>{\tt prememset\_pipe\_tail}\> \hbox to 0.5in{}\= {\em; So we can use our LT condition}\\

\>      {\tt ADD}\>{\tt 4,R4} \> {\em ; Otherwise increment our pointer}\\

\>      {\tt ADD}\>{\tt 4,R4} \> {\em ; Otherwise increment our pointer}\\

\>      {\tt BRA}\>{\tt memset\_pipe\_loop} {\em ; and repeat using an early branchable instruction}\\

\>      {\tt BRA}\>{\tt memset\_pipe\_unrolled} {\em ; and repeat using an early branchable instruction}\\

{\tt prememset\_pipe\_tail:} \\

{\tt prememset\_pipe\_tail:} \\

\>    {\tt ADD}\>{\tt 1,R3}\>{\em ; Restore R3 to the true count remaining}\\

\>    {\tt ADD}\>{\tt 1,R3}\>{\em ; Restore R3 to the true count remaining}\\

{\tt memset\_pipe\_tail:}\>\>\>{\em ; At this point, we have R3=0-3 remaining}\\

{\tt memset\_pipe\_tail:}\>\>\>{\em ; At this point, we have R3=0-3 remaining}\\

\>      {\tt CMP}\>{\tt 1,R3}   \> {\em ; If there's less than one left}\\

\>      {\tt CMP}\>{\tt 1,R3}   \> {\em ; If there's less than one left}\\

\>      {\tt JMP.C}\>{\tt R0}   \> {\em ; then return early.}\\

\>      {\tt RETN.C}\>  \> {\em ; then return early.}\\

\>      {\tt STO}\>{\tt R2,(R4)} \> {\em ; If we've got one left, store it}\\

\>      {\tt SB}\>{\tt R2,(R4)} \> {\em ; If we've got one left, store it}\\

\>      {\tt STO.GT}\>{\tt R2,1(R4)} \> {\em ; if two, do a burst store}\\

\>      {\tt SB.GT}\>{\tt R2,1(R4)} \> {\em ; if two, do a burst store}\\

\>      {\tt CMP}\>{\tt 3,R3}   \> {\em ; Check if we have another left}\\

\>      {\tt CMP}\>{\tt 3,R3}   \> {\em ; Check if we have another left}\\

\>      {\tt STO.Z}\>{\tt R2,2(R4)}     \> {\em ; and store it if so.}\\

\>      {\tt SB.Z}\>{\tt R2,2(R4)}      \> {\em ; and store it if so.}\\

\>      {\tt JMP}\>{\tt R0}     \> {\em ; Return now that we are complete.}

\>      {\tt RETN}\>            \> {\em ; Return now that we are complete.}

\end{tabbing}

\end{tabbing}

\caption{Example Memset after loop unrolling, using pipelined memory ops}\label{tbl:memset-pipe}

\caption{Example Memset after loop unrolling, using pipelined memory ops}\label{tbl:memset-pipe}

\end{center}\end{table}

\end{center}\end{table}

Note that, in this example as with the {\tt memcpy} example, our loop variable

Note that, in this example as with the {\tt memcpy} example, our loop variable

is one less than the number of operations remaining.  This is because the ZipCPU

is one less than the number of operations remaining.  This is because the

has no less than or equal comparison, but only a less than comparison.  Further,

ZipCPU has no less than or equal comparison, but only a less than comparison.

because the length is given as an unsigned quantity, we {\em only} have a

By subtracting one from the loop variable, that's all our comparison needs to

less than comparison.  By subtracting one from the loop variable, that's

be--at least, until the end of the loop.  For that, we jump to a section one

all our comparison needs to be--at least, until the end of the loop.  For

instruction earlier and return our counts value to the true remaining length.

that, we jump to a section one instruction earlier and return our counts

value to the true remaining length.

You may also notice that, despite the four possibilities in the end game, we

You may also notice that, despite the four possibilities in the end game, we

can carefully rearrange the logic to only use two compares.  The first compare

can carefully rearrange the logic to only use two compares.  The first compare

tests against less than one and returns if there are no more sets left.  Using

tests against less than one and returns if there are no more sets left.  Using

the same compare, though, we can also know if we have one or more stores left.

the same compare, though, we can also know if we have one or more stores left.

Hence, we can create a burst memory operation with one or two stores.

Hence, we can create a burst memory operation with one or two stores.
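
The control flow described above can be sketched in C.  This is only an illustration of the unrolled loop and the two--compare tail, not the library's implementation; the name {\tt memset\_unrolled} is hypothetical, and a C compiler is free to generate different compares than the hand--written assembly uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of the unrolled store loop and its tail.
 * The function name is an assumption made for this sketch. */
void *memset_unrolled(void *s, int c, size_t n) {
    unsigned char *p = s;
    unsigned char v = (unsigned char)c;

    /* Unrolled section: four stores per pass while four or more remain. */
    while (n >= 4) {
        p[0] = v;
        p[1] = v;
        p[2] = v;
        p[3] = v;
        p += 4;
        n -= 4;
    }
    assert(n < 4); /* Zero to three stores are left for the tail. */

    /* First compare (against one): return if nothing is left, and
     * otherwise learn whether a second store is needed. */
    if (n < 1)
        return s;
    p[0] = v;
    if (n > 1)
        p[1] = v;

    /* Second compare (against three): one last store if needed. */
    if (n == 3)
        p[2] = v;
    return s; /* Return the original pointer, as memset() does. */
}
```

Calling {\tt memset\_unrolled(buf, 0x5a, 7)} runs the unrolled section once and then stores three more values in the tail.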

As one final example, we might also use the DMA for this operation, as with

The three examples given so far discuss and demonstrate solutions appropriate

for memory accesses that are not necessarily aligned.  Were the accesses

aligned, the operation could be done about four times faster.  To do this,

the {\tt LB} and {\tt SB} instructions would need to be replaced by {\tt LW}

and {\tt SW} instructions.

Still, if all accesses could be aligned, then we might also use the

DMA for this operation.  Hence, the DMA makes our final example in

Tbl.~\ref{tbl:memset-dma}.

Tbl.~\ref{tbl:memset-dma}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabbing}

\begin{tabbing}

{\tt \#define DMA\_CONSTSRC 0x20000000} {\em // Don't increment the source address}

{\tt \#define DMA\_CONSTSRC 0x20000000} {\em // Don't increment the source address}

\\

\\

{\tt void *} \= {\tt memset\_dma(void *s, int c, size\_t len) \{} \\

{\tt int *} \= {\tt memset\_dma(int *s, int c, size\_t len) \{} \\

        \> {\em // As before, this assumes we have access to the DMA, and that}\\

        \> {\em // As before, this assumes we have access to the DMA, and that}\\

        \> {\em // we are running in system high mode ...}\\

        \> {\em // we are running in system high mode ...}\\

        \> {\tt zip->dma.len = len;} \= {\em // Set up the DMA }\\

        \> {\tt zip->dma.len = len;} \= {\em // Set up the DMA }\\

        \> {\tt zip->dma.rd  = \&c;}\\

        \> {\tt zip->dma.rd  = \&c;}\\

        \> {\tt zip->dma.wr  = s;}\\

        \> {\tt zip->dma.wr  = s;}\\

Line 2968...

Line 2932...

        \> {\em // interrupt within it, now we enable the DMA interrupt, and}\\

        \> {\em // interrupt within it, now we enable the DMA interrupt, and}\\

        \> {\em // only the DMA interrupt.}\\

        \> {\em // only the DMA interrupt.}\\

        \> {\tt zip->pic = EINT(SYSINT\_DMA);}\\

        \> {\tt zip->pic = EINT(SYSINT\_DMA);}\\

        \> {\em // And wait for the DMA to complete.} \\

        \> {\em // And wait for the DMA to complete.} \\

        \> {\tt zip\_wait();}\\

        \> {\tt zip\_wait();}\\

        \> {\em // Return the original destination pointer, so as to} \\

        \> {\em // match the library definition.} \\

        \> {\tt return s;}\\

{\tt \}}

{\tt \}}

\end{tabbing}

\end{tabbing}

\caption{Example Memset code, only this time with the DMA}\label{tbl:memset-dma}

\caption{Example Memset code, only this time with the DMA}\label{tbl:memset-dma}

\end{center}\end{table}

\end{center}\end{table}

This is almost identical to the {\tt memcpy} function above that used the

This is almost identical to the {\tt memcpy} function above that used the

DMA, save that the pointer for the value read is given to be the address

DMA, save that the pointer for the value read is given to be the address

of c, and that the DMA is instructed not to increment its source pointer.

of c, and that the DMA is instructed not to increment its source pointer.

The DMA will still do {\tt len} reads, so the asymptotic cost will never

The DMA will still do {\tt len} reads, so the asymptotic cost will never

be less than roughly $2N$~clocks for a transfer of $N$~words.

be less than roughly $2N$~clocks for a transfer of $N$~words.
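
As a rough worked comparison, take the two cost figures quoted in this chapter: roughly $4+6N$ clocks for the minimally optimized loop, and roughly 2~clocks per item for the DMA.  The symbols $C_{\rm loop}$ and $C_{\rm DMA}$ are introduced here for illustration only, and the DMA setup and wait overhead is assumed to be negligible, as is any difference in the size of the items each routine moves.
\[
C_{\rm loop}(N) \approx 4 + 6N, \qquad C_{\rm DMA}(N) \approx 2N
\]
\[
N = 1000: \quad C_{\rm loop} \approx 6004, \quad C_{\rm DMA} \approx 2000,
\quad C_{\rm loop}/C_{\rm DMA} \approx 3
\]
Under these assumptions, the DMA approaches a threefold advantage over the simple loop as $N$ grows.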

\section{String Operations}

Perhaps one of the immediate questions most folks will have is, how does one

handle string operations on a CPU that only handles 32--bit numbers?  Here we

offer a couple of possibilities.

The first possibility is the easy and natural choice: just define characters

to be 32--bit numbers and ignore the upper 24 bits.  This is the choice made

by the compiler.  Hence, if you compile a simple string compare function,

such as Tbl.~\ref{tbl:str-cmp},

\begin{table}\begin{center}

\begin{tabbing}

\hbox to 0.25in{\tt int} \= {\tt strcmp(const char *s1, const char *s2) \{} \\

        \> {\tt while((*s1)\&\&(*s1 == *s2))} \\

        \> \hbox to 0.25in{} {\tt s1++, s2++;} \\

        \> {\tt return *s1 - *s2;} \\

{\tt \}}

\end{tabbing}

\caption{Example string compare function}\label{tbl:str-cmp}

\end{center}\end{table}

string length function, such as Tbl.~\ref{tbl:str-len},

\begin{table}\begin{center}

\begin{tabbing}

{\tt unsigned} \= {\tt strlen(const char *s) \{} \\

        \> {\tt int ln = 0;} \\

        \> {\tt while(*s++ != 0)} \\

        \> \hbox to 0.25in{} {\tt ln++;} \\

        \> {\tt return ln;} \\

{\tt \}}

\end{tabbing}

\caption{Example string length function}\label{tbl:str-len}

\end{center}\end{table}

or string copy function, such as Tbl.~\ref{tbl:str-cpy},

\begin{table}\begin{center}

\begin{tabbing}

{\tt char *} \= {\tt strcpy(char *dest, const char *src) \{} \\

        \> {\tt char *d = dest;} {\em // Make a working copy of the dest ptr}\\

        \> {\tt do \{} \\

        \> \hbox to 0.25in{} {\tt *d++ = *src;} \\

        \> {\tt \} while(*src++);} \\

        \> {\tt return dest;} \\

{\tt \}}

\end{tabbing}

\caption{Example string copy function}\label{tbl:str-cpy}

\end{center}\end{table}

this is what you will get.

A little work with these functions, and you should be able to optimize them

in a fashion similar to that used for {\tt memcpy}.  This doesn't address the

fundamental problem, though: why waste 32~bits on 8--bit quantities?

An alternative would be to use a packed string structure.  To pack a string,

one might do something like Tbl.~\ref{tbl:pstr}.

\begin{table}\begin{center}

\begin{tabbing}

{\tt void} \= {\tt packstr(char *s) \{} \\

        \> {\tt char *d = s;} \= {\em // Pack our string in place} \\

        \> {\tt int w=0;}\>{\em // A holding word to pack things into} \\

        \> {\tt int k=0;}\>{\em // A count to know when to move to the next word} \\

        \> {\tt while(*s) \{} \\

        \> \hbox to 0.25in{}\={\tt w = (w<<8)|(*s++ \& 0x0ff);} \\

        \>\> {\em // After four of these octets, write the result out} \\

        \> \> {\tt if (((++k)\&3)==0) *d++ = w;} \\

        \> {\tt \}} \\

        \> {\em // But what happens if we never got to the fourth octet}\\

        \> {\em // in our last word?  We need to clean that up here.}\\

\\

        \> {\em // First, shift the partial value all the way up}\\

        \> {\tt if (k\&3) \{} {\em // Only if a partial word remains}\\

        \> \> {\tt w = (w<<(32-((k\&3)<<3)));} {\em // Shift up the last word}\\

        \> \> {\tt *d++ = w;} {\em // Store the remaining partial value }\\

        \> {\tt \}} \\

        \> {\em // If we want to make sure our strings end in zero, we need}\\

        \> {\em // one more step:}\\

        \> {\tt *d = 0;} {\em // Make sure string ends in a zero.}\\

{\tt \}}

\end{tabbing}

\caption{String packing function}\label{tbl:pstr}

\end{center}\end{table}

Notice that our packed string places its first byte in the high order octet

of our first word, that any excess octets in the last word are zeros,

and that there remains a zero word following our string.  With this packed

string approach, compares and copies can proceed four times faster.  As an

example, Tbl.~\ref{tbl:pstr-cmp}

\begin{table}\begin{center}

\begin{tabbing}

\hbox to 0.25in{\tt int} \= {\tt pstrcmp(const char *s1, const char *s2) \{} \\

        \> {\tt while((*s1)\&\&(*s1 == *s2))} \\

        \> \hbox to 0.25in{} {\tt s1++, s2++;} \\

        \> {\tt return *s1 - *s2;} \\

{\tt \}}

\end{tabbing}

\caption{Packed string compare function}\label{tbl:pstr-cmp}

\end{center}\end{table}

presents a string compare function for a packed string.  You'll notice that

it doesn't look all that different from a string compare for a non-packed

string.  This is on purpose.  Another example might be a string copy, which

again, wouldn't look all that different.  Getting the number of used 8--bit

octets within a string is a touch more difficult.  In that case, one might

try something like Tbl.~\ref{tbl:pstr-len}.

\begin{table}\begin{center}

\begin{tabbing}

{\tt unsigned} \= {\tt pstrlen(const char *s) \{} \\

        \> {\tt int ln = 0;} \\

        \> {\tt while(*s++ != 0)} \\

        \> \hbox to 0.25in{}\={\tt ln+=4;} \\

        \> {\tt if (ln) \{}\\

        \>\>    {\em // Touch up the length in case of an incomplete last word} \\

        \>\>    {\tt int lastval = s[-2];} {\em // s is now one past the zero word}\\

\\

        \>\>    {\tt if ((lastval \& 0x0ff)==0) ln--;}\\

        \>\>    {\tt if ((lastval \& 0x0ffff)==0) ln--;}\\

        \>\>    {\tt if ((lastval \& 0x0ffffff)==0) ln--;}\\

        \> {\tt \}} \\

        \> {\tt return ln;} \\

{\tt \}}

\end{tabbing}

\caption{Packed string subcharacter length function}\label{tbl:pstr-len}

\end{center}\end{table}
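
The packed layout described above (first character in the high order octet, unused octets zero, and a zero word after the string) can also be sketched in portable C using explicit 32--bit words.  This is a minimal sketch rather than the routine from the tables above: the names {\tt pack\_string} and {\tt packed\_strlen} are hypothetical, and it packs into a separate buffer of {\tt uint32\_t} rather than in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack src into 32-bit words, first character in the high-order octet.
 * dst must have room for (length of src)/4 + 2 words.  Returns the number
 * of words written, including the terminating zero word.
 * (Hypothetical helper, named for this sketch only.) */
size_t pack_string(const char *src, uint32_t *dst) {
    size_t nwords = 0;
    uint32_t w = 0;
    unsigned k = 0;

    assert(src != NULL && dst != NULL);
    while (*src) {
        w = (w << 8) | (uint32_t)(unsigned char)*src++;
        if ((++k & 3) == 0) { /* Four octets gathered: write them out */
            dst[nwords++] = w;
            w = 0;
        }
    }
    if (k & 3) /* Shift a partial last word all the way up */
        dst[nwords++] = w << (32 - ((k & 3) << 3));
    dst[nwords++] = 0; /* Zero word marks the end of the string */
    return nwords;
}

/* Count the used octets in a packed string. */
size_t packed_strlen(const uint32_t *s) {
    size_t ln = 0;

    assert(s != NULL);
    while (*s != 0) {
        ln += 4;
        s++;
    }
    if (ln) {
        /* s now points at the zero word; the last data word precedes it. */
        uint32_t last = s[-1];
        if ((last & 0x000000ffu) == 0) ln--;
        if ((last & 0x0000ffffu) == 0) ln--;
        if ((last & 0x00ffffffu) == 0) ln--;
    }
    return ln;
}
```

Packing the five characters of {\tt "hello"} this way fills one complete word, one partial word padded with zero octets, and the terminating zero word.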

\section{Context Switch}

\section{Context Switch}

Fundamental to any multiprocessing system is the ability to switch from one

Fundamental to any multiprocessing system is the ability to switch from one

task to the next.  In the ZipSystem, this is accomplished in one of a couple of

task to the next.  In the ZipSystem, this is accomplished in one of a couple of

Line 3156...

Line 3007...

        registers to some supervisor memory structure, such as is shown in

        registers to some supervisor memory structure, such as is shown in

        Tbl.~\ref{tbl:context-out}.

        Tbl.~\ref{tbl:context-out}.

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabbing}

\begin{tabbing}

{\tt save\_context:} \\

{\tt save\_context:} \\

\hbox to 0.25in{}\={\tt SUB 1,SP}\hbox to 0.5in{}\= {\em ; Function prologue: create a stack}\\

\hbox to 0.25in{}\={\tt SUB 4,SP}\hbox to 0.5in{}\= {\em ; Function prologue: create a stack}\\

\>        {\tt STO R5,(SP)}     \> {\em ; frame and save R5.  (R1-R4 are assumed}\\

\>        {\tt SW R5,(SP)}      \> {\em ; frame and save R5.  (R1-R4 are assumed}\\

\>        {\tt MOV uR0,R2}      \> {\em ; to be used and in need of saving.  Then}\\

\>        {\tt MOV uR0,R2}      \> {\em ; to be used and in need of saving.  Then}\\

\>        {\tt MOV uR1,R3}      \> {\em ; copy the user registers, four at a time to }\\

\>        {\tt MOV uR1,R3}      \> {\em ; copy the user registers, four at a time to }\\

\>        {\tt MOV uR2,R4}      \> {\em ; supervisor registers, where they can be}\\

\>        {\tt MOV uR2,R4}      \> {\em ; supervisor registers, where they can be}\\

\>        {\tt MOV uR3,R5}      \> {\em ; stored, while exploiting memory pipelining}\\

\>        {\tt MOV uR3,R5}      \> {\em ; stored, while exploiting memory pipelining}\\

\>        {\tt STO R2,(R1)}     \>{\em ; Exploit memory pipelining: }\\

\>        {\tt SW R2,(R1)}      \>{\em ; Exploit memory pipelining: }\\

\>        {\tt STO R3,1(R1)}    \>{\em ; All instructions write to same base memory}\\

\>        {\tt SW R3,4(R1)}     \>{\em ; All instructions write to same base memory}\\

\>        {\tt STO R4,2(R1)}    \>{\em ; All offsets increment by one }\\

\>        {\tt SW R4,8(R1)}     \>{\em ; All offsets increment by four (one word) }\\

\>        {\tt STO R5,3(R1)} \\

\>        {\tt SW R5,12(R1)} \\

\>      \ldots {\em ; Need to repeat for all user registers} \\

\>      \ldots {\em ; Need to repeat for all user registers} \\

\iffalse

&        {\tt MOV uR5,R0} \\

&        {\tt MOV uR6,R1} \\

&        {\tt MOV uR7,R2} \\

&        {\tt MOV uR8,R3} \\

&        {\tt MOV uR9,R4} \\

&        {\tt STO R0,5(R5) }\\

&        {\tt STO R1,6(R5) }\\

&        {\tt STO R2,7(R5) }\\

&        {\tt STO R3,8(R5) }\\

&        {\tt STO R4,9(R5)} \\

\fi

\>        {\tt MOV uR12,R2}     \> {\em ; Finish copying ... } \\

\>        {\tt MOV uR12,R2}     \> {\em ; Finish copying ... } \\

\>        {\tt MOV uSP,R3} \\

\>        {\tt MOV uSP,R3} \\

\>        {\tt MOV uCC,R4} \\

\>        {\tt MOV uCC,R4} \\

\>        {\tt MOV uPC,R5} \\

\>        {\tt MOV uPC,R5} \\

\>        {\tt STO R2,12(R1)}   \> {\em ; and saving the last registers.}\\

\>        {\tt SW R2,48(R1)}    \> {\em ; and saving the last registers.}\\

\>        {\tt STO R3,13(R1)}   \> {\em ; Note that even the special user registers }\\

\>        {\tt SW R3,52(R1)}    \> {\em ; Note that even the special user registers }\\

\>        {\tt STO R4,14(R1)}   \> {\em ; are saved just like any others. }\\

\>        {\tt SW R4,56(R1)}    \> {\em ; are saved just like any others. }\\

\>        {\tt STO R5,15(R1)} \\

\>        {\tt SW R5,60(R1)} \\

\>        {\tt LOD (SP),R5}     \> {\em ; Restore our one saved register}\\

\>        {\tt LW (SP),R5}      \> {\em ; Restore our one saved register}\\

\>        {\tt ADD 1,SP}                \> {\em ; our stack frame,} \\

\>        {\tt ADD 4,SP}                \> {\em ; our stack frame,} \\

\>        {\tt JMP R0}          \> {\em ; and return }\\

\>        {\tt RETN}            \> {\em ; and return }\\

\end{tabbing}

\end{tabbing}

\caption{Example Storing User Task Context}\label{tbl:context-out}

\caption{Example Storing User Task Context}\label{tbl:context-out}

\end{center}\end{table}

\end{center}\end{table}

Since this task is so fundamental, the ZipCPU compiler back end provides

Since this task is so fundamental, the ZipCPU compiler back end provides

the {\tt zip\_save\_context(int *)} function.

the {\tt zip\_save\_context(int *)} function.

Line 3236...

Line 3075...

        the user registers.  An example of this is shown in

        the user registers.  An example of this is shown in

        Tbl.~\ref{tbl:context-in},

        Tbl.~\ref{tbl:context-in},

\begin{table}\begin{center}

\begin{table}\begin{center}

\begin{tabbing}

\begin{tabbing}

{\tt restore\_context:} \\

{\tt restore\_context:} \\

\hbox to 0.25in{}\= {\tt SUB 1,SP}\hbox to 0.4in{}\={\em ; Set up a stack frame} \\

\hbox to 0.25in{}\= {\tt SUB 4,SP}\hbox to 0.4in{}\={\em ; Set up a stack frame} \\

\>      {\tt STO R5,(SP)} \> {\em ; and store a local register onto it.}\\

\>      {\tt SW R5,(SP)} \> {\em ; and store a local register onto it.}\\

\\

\\

\>      {\tt LOD (R1),R2} \> {\em ; By doing four loads at a time, we are }\\

\>      {\tt LW (R1),R2} \> {\em ; By doing four loads at a time, we are }\\

\>      {\tt LOD 1(R1),R3} \> {\em ; making sure we are using our pipelined}\\

\>      {\tt LW 4(R1),R3} \> {\em ; making sure we are using our pipelined}\\

\>      {\tt LOD 2(R1),R4} \> {\em ; memory capability. }\\

\>      {\tt LW 8(R1),R4} \> {\em ; memory capability. }\\

\>      {\tt LOD 3(R1),R5} \\

\>      {\tt LW 12(R1),R5} \\

\>      {\tt MOV R2,uR0} \> {\em ; Once the registers are loaded, copy them }\\

\>      {\tt MOV R2,uR0} \> {\em ; Once the registers are loaded, copy them }\\

\>      {\tt MOV R3,uR1} \> {\em ; into the user registers that they need to}\\

\>      {\tt MOV R3,uR1} \> {\em ; into the user registers that they need to}\\

\>      {\tt MOV R4,uR2} \> {\em ; be placed within.} \\

\>      {\tt MOV R4,uR2} \> {\em ; be placed within.} \\

\>      {\tt MOV R5,uR3} \\

\>      {\tt MOV R5,uR3} \\

        \> \ldots {\em ; Need to repeat for all user registers} \\

        \> \ldots {\em ; Need to repeat for all user registers} \\

\>      {\tt LOD 12(R1),R2} \> {\em ; Now for our last four registers ...}\\

\>      {\tt LW 48(R1),R2} \> {\em ; Now for our last four registers ...}\\

\>      {\tt LOD 13(R1),R3} \\

\>      {\tt LW 52(R1),R3} \\

\>      {\tt LOD 14(R1),R4} \\

\>      {\tt LW 56(R1),R4} \\

\>      {\tt LOD 15(R1),R5} \\

\>      {\tt LW 60(R1),R5} \\

\>      {\tt MOV R2,uR12} \> {\em ; These are the special purpose ones, restored }\\

\>      {\tt MOV R2,uR12} \> {\em ; These are the special purpose ones, restored }\\

\>      {\tt MOV R3,uSP} \> {\em ; just like any others.}\\

\>      {\tt MOV R3,uSP} \> {\em ; just like any others.}\\

\>      {\tt MOV R4,uCC} \\

\>      {\tt MOV R4,uCC} \\

\>      {\tt MOV R5,uPC} \\

\>      {\tt MOV R5,uPC} \\

\>      {\tt LOD (SP),R5} \> {\em ; Restore our saved register, } \\

\>      {\tt LW (SP),R5} \> {\em ; Restore our saved register, } \\

\>      {\tt ADD 1,SP}  \> {\em ; and the stack frame, }\\

\>      {\tt ADD 4,SP}  \> {\em ; and the stack frame, }\\

\>      {\tt JMP R0}    \> {\em ; and return to where we were called from.}\\

\>      {\tt RETN}      \> {\em ; and return to where we were called from.}\\

\end{tabbing}

\end{tabbing}

\caption{Example Restoring User Task Context}\label{tbl:context-in}

\caption{Example Restoring User Task Context}\label{tbl:context-in}

\end{center}\end{table}

\end{center}\end{table}

        Because this is such an important task, the ZipCPU GCC provides a

        Because this is such an important task, the ZipCPU GCC provides a

        built--in function, {\tt zip\_restore\_context(int *)}, which can be

        built--in function, {\tt zip\_restore\_context(int *)}, which can be

Line 3293...

Line 3132...

\section{ZipSystem Peripheral Registers}

\section{ZipSystem Peripheral Registers}

The ZipSystem currently maintains 20 register locations, as shown

The ZipSystem currently maintains 20 register locations, as shown

in Tbl.~\ref{tbl:zpregs}.

in Tbl.~\ref{tbl:zpregs}.

\begin{table}[htbp]

\begin{table}[htbp]

\begin{center}\begin{reglist}

\begin{center}\begin{reglist}

PIC   & \scalebox{0.8}{\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline

PIC   & \scalebox{0.8}{\tt 0xff000000} & 32 & R/W & Primary Interrupt Controller \\\hline

WDT & \scalebox{0.8}{\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline

WDT   & \scalebox{0.8}{\tt 0xff000004} & 32 & R/W & Watchdog Timer \\\hline

WBU&\scalebox{0.8}{\tt 0xc0000002} & 32 & R & Address of last bus timeout error\\\hline

WBU   &\scalebox{0.8}{\tt 0xff000008} & 32 & R & Address of last bus timeout error\\\hline

CTRIC & \scalebox{0.8}{\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline

CTRIC & \scalebox{0.8}{\tt 0xff00000c} & 32 & R/W & Secondary Interrupt Controller \\\hline

TMRA  & \scalebox{0.8}{\tt 0xc0000004} & 32 & R/W & Timer A\\\hline

TMRA  & \scalebox{0.8}{\tt 0xff000010} & 32 & R/W & Timer A\\\hline

TMRB  & \scalebox{0.8}{\tt 0xc0000005} & 32 & R/W & Timer B\\\hline

TMRB  & \scalebox{0.8}{\tt 0xff000014} & 32 & R/W & Timer B\\\hline

TMRC  & \scalebox{0.8}{\tt 0xc0000006} & 32 & R/W & Timer C\\\hline

TMRC  & \scalebox{0.8}{\tt 0xff000018} & 32 & R/W & Timer C\\\hline

JIFF  & \scalebox{0.8}{\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline

JIFF  & \scalebox{0.8}{\tt 0xff00001c} & 32 & R/W & Jiffies \\\hline

MTASK  & \scalebox{0.8}{\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline

MTASK & \scalebox{0.8}{\tt 0xff000020} & 32 & R/W & Master Task Clock Counter \\\hline

MMSTL  & \scalebox{0.8}{\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline

MMSTL & \scalebox{0.8}{\tt 0xff000024} & 32 & R/W & Master Stall Counter \\\hline

MPSTL  & \scalebox{0.8}{\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline

MPSTL & \scalebox{0.8}{\tt 0xff000028} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline

MICNT  & \scalebox{0.8}{\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline

MICNT & \scalebox{0.8}{\tt 0xff00002c} & 32 & R/W & Master Instruction Counter\\\hline

UTASK  & \scalebox{0.8}{\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline

UTASK & \scalebox{0.8}{\tt 0xff000030} & 32 & R/W & User Task Clock Counter \\\hline

UMSTL  & \scalebox{0.8}{\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline

UMSTL & \scalebox{0.8}{\tt 0xff000034} & 32 & R/W & User Stall Counter \\\hline

UPSTL  & \scalebox{0.8}{\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline

UPSTL & \scalebox{0.8}{\tt 0xff000038} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline

UICNT  & \scalebox{0.8}{\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline

UICNT & \scalebox{0.8}{\tt 0xff00003c} & 32 & R/W & User Instruction Counter\\\hline

DMACTRL  & \scalebox{0.8}{\tt 0xc0000010} & 32 & R/W & DMA Control Register\\\hline

DMACTRL& \scalebox{0.8}{\tt 0xff000040} & 32 & R/W & DMA Control Register\\\hline

DMALEN  & \scalebox{0.8}{\tt 0xc0000011} & 32 & R/W & DMA total transfer length\\\hline

DMALEN & \scalebox{0.8}{\tt 0xff000044} & 32 & R/W & DMA total transfer length\\\hline

DMASRC  & \scalebox{0.8}{\tt 0xc0000012} & 32 & R/W & DMA source address\\\hline

DMASRC & \scalebox{0.8}{\tt 0xff000048} & 32 & R/W & DMA source address\\\hline

DMADST  & \scalebox{0.8}{\tt 0xc0000013} & 32 & R/W & DMA destination address\\\hline

DMADST & \scalebox{0.8}{\tt 0xff00004c} & 32 & R/W & DMA destination address\\\hline

% Cache  & \scalebox{0.8}{\tt 0xc0100000} & & & Base address of the Cache memory\\\hline

\end{reglist}

\end{reglist}

\caption{ZipSystem Internal/Peripheral Registers}\label{tbl:zpregs}

\caption{ZipSystem Internal/Peripheral Registers}\label{tbl:zpregs}

\end{center}\end{table}

\end{center}\end{table}

These registers are located in the CPU's address space, although in a special

These registers are all 32-bit registers.  Writes of less than 32--bits

area of that space.  Indeed, the area is so special, that the CPU decodes

may have unexpected results.  Further, they are located in a reserved location

the address space location before placing the request onto the bus.  For

within the CPU's address space.  As a result, references to these locations

this reason, other containers for the CPU, such as the ZipBones which doesn't

by a ZipBones based system will generate a bus error.

have these registers, will still create errors when they are referenced.

Here in this section, we'll walk through the definition of each of these

Here in this section, we'll walk through the definition of each of these

registers in turn, together with any bit fields that may be associated with

registers in turn, together with any bit fields that may be associated with

them, and how to set those fields.

them, and how to set those fields.
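
The two address columns in Tbl.~\ref{tbl:zpregs} differ only in addressing
convention: the older addresses count 32--bit words from {\tt 0xc0000000},
while the newer ones count octets from {\tt 0xff000000}.  As a minimal sketch
of that relationship, and not code taken from the ZipCPU repository, the
conversion might be written in C as follows.  The macro and function names
are hypothetical; only the two base addresses and the factor of four come
from the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the base addresses are taken from the table. */
#define OLD_ZIPSYS_BASE 0xc0000000u /* word-addressed base (older revision) */
#define NEW_ZIPSYS_BASE 0xff000000u /* byte-addressed base (newer revision) */

/*
 * Convert an older, word-addressed ZipSystem register address into the
 * newer, byte-addressed one.  Each register is one 32-bit word, so the
 * register index is scaled by four octets.
 */
static uint32_t zipsys_word_to_byte_addr(uint32_t old_addr)
{
    uint32_t index = old_addr - OLD_ZIPSYS_BASE; /* register number */
    return NEW_ZIPSYS_BASE + 4u * index;
}
```

For example, {\tt zipsys\_word\_to\_byte\_addr(0xc0000013)} evaluates to
{\tt 0xff00004c}, which matches the two DMADST rows of the table.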

Line 3526...

Line 3363...

accessing the system via the wishbone bus.  The debug port itself has been

accessing the system via the wishbone bus.  The debug port itself has been

reduced to two addresses, as outlined earlier in Tbl.~\ref{tbl:dbgregs}.

reduced to two addresses, as outlined earlier in Tbl.~\ref{tbl:dbgregs}.

\begin{table}[htbp]

\begin{table}[htbp]

\begin{center}\begin{reglist}

\begin{center}\begin{reglist}

ZIPCTRL & 0 & 32 & R/W & Debug Control Register \\\hline

ZIPCTRL & 0 & 32 & R/W & Debug Control Register \\\hline

ZIPDATA & 1 & 32 & R/W & Debug Data Register \\\hline

ZIPDATA & 4 & 32 & R/W & Debug Data Register \\\hline

\end{reglist}

\end{reglist}

\caption{ZipSystem Debug Registers}\label{tbl:dbgregs}

\caption{ZipSystem Debug Registers}\label{tbl:dbgregs}

\end{center}\end{table}

\end{center}\end{table}

Access to the ZipSystem begins with the Debug Control register, shown in

Access to the ZipSystem begins with the Debug Control register, shown in

Line 3652...

Line 3489...

and Tbl.~\ref{tbl:wishbone-master} respectively.

and Tbl.~\ref{tbl:wishbone-master} respectively.

\begin{table}[htbp]

\begin{table}[htbp]

\begin{center}

\begin{center}

\begin{wishboneds}

\begin{wishboneds}

Revision level of wishbone & WB B4 spec \\\hline

Revision level of wishbone & WB B4 spec \\\hline

Type of interface & Master, Read/Write, single cycle or pipelined\\\hline

Type of interface & Master, Read/Write, pipelined\\\hline

Address Width & (ZipSystem parameter, can be up to 32~bits) \\\hline

Address Width & (ZipSystem parameter, up to 30~bits) \\\hline

Port size & 32--bit \\\hline

Port size & 32--bit \\\hline

Port granularity & 32--bit \\\hline

Port granularity & 8--bit \\\hline

Maximum Operand Size & 32--bit \\\hline

Maximum Operand Size & 32--bit \\\hline

Data transfer ordering & (Irrelevant) \\\hline

Data transfer ordering & Big--Endian \\\hline

Clock constraints & Works at 100~MHz on a Basys--3 board, and 80~MHz on a

Clock constraints & Works at 100~MHz on a Basys--3 board, and 80~MHz on a

                XuLA2--LX25\\\hline

                XuLA2--LX25\\\hline

Signal Names & \begin{tabular}{ll}

Signal Names & \begin{tabular}{ll}

                Signal Name & Wishbone Equivalent \\\hline

                Signal Name & Wishbone Equivalent \\\hline

                {\tt i\_clk} & {\tt CLK\_O} \\

                {\tt i\_clk} & {\tt CLK\_O} \\

                {\tt o\_wb\_cyc} & {\tt CYC\_O} \\

                {\tt o\_wb\_cyc} & {\tt CYC\_O} \\

                {\tt o\_wb\_stb} & {\tt (CYC\_O)\&(STB\_O)} \\

                {\tt o\_wb\_stb} & {\tt (CYC\_O)\&(STB\_O)} \\

                {\tt o\_wb\_we} & {\tt WE\_O} \\

                {\tt o\_wb\_we} & {\tt WE\_O} \\

                {\tt o\_wb\_addr} & {\tt ADR\_O} \\

                {\tt o\_wb\_addr} & {\tt ADR\_O} \\

                {\tt o\_wb\_data} & {\tt DAT\_O} \\

                {\tt o\_wb\_data} & {\tt DAT\_O} \\

                {\tt o\_wb\_sel} & {\tt SEL\_O} \\

                {\tt i\_wb\_ack} & {\tt ACK\_I} \\

                {\tt i\_wb\_ack} & {\tt ACK\_I} \\

                {\tt i\_wb\_stall} & {\tt STALL\_I} \\

                {\tt i\_wb\_stall} & {\tt STALL\_I} \\

                {\tt i\_wb\_data} & {\tt DAT\_I} \\

                {\tt i\_wb\_data} & {\tt DAT\_I} \\

                {\tt i\_wb\_err} & {\tt ERR\_I}

                {\tt i\_wb\_err} & {\tt ERR\_I}

                \end{tabular}\\\hline

                \end{tabular}\\\hline

Line 3738...

Line 3576...

\begin{table}

\begin{table}

\begin{center}\begin{portlist}

\begin{center}\begin{portlist}

{\tt o\_wb\_cyc}   &  1 & Output & Indicates an active Wishbone cycle\\\hline

{\tt o\_wb\_cyc}   &  1 & Output & Indicates an active Wishbone cycle\\\hline

{\tt o\_wb\_stb}   &  1 & Output & WB Strobe signal\\\hline

{\tt o\_wb\_stb}   &  1 & Output & WB Strobe signal\\\hline

{\tt o\_wb\_we}    &  1 & Output & Write enable\\\hline

{\tt o\_wb\_we}    &  1 & Output & Write enable\\\hline

{\tt o\_wb\_addr}  & 32 & Output & Bus address \\\hline

{\tt o\_wb\_addr}  & 30 & Output & Bus address \\\hline

{\tt o\_wb\_data}  & 32 & Output & Data on WB write\\\hline

{\tt o\_wb\_data}  & 32 & Output & Data on WB write\\\hline

{\tt o\_wb\_sel}   &  4 & Output & Select lines\\\hline

{\tt i\_wb\_ack}   &  1 & Input  & Slave has completed a R/W cycle\\\hline

{\tt i\_wb\_ack}   &  1 & Input  & Slave has completed a R/W cycle\\\hline

{\tt i\_wb\_stall} &  1 & Input  & WB bus slave not ready\\\hline

{\tt i\_wb\_stall} &  1 & Input  & WB bus slave not ready\\\hline

{\tt i\_wb\_data}  & 32 & Input  & Incoming bus data\\\hline

{\tt i\_wb\_data}  & 32 & Input  & Incoming bus data\\\hline

{\tt i\_wb\_err}   &  1 & Input  & Bus Error indication\\\hline

{\tt i\_wb\_err}   &  1 & Input  & Bus Error indication\\\hline

\end{portlist}\caption{CPU Master Wishbone I/O Ports}\label{tbl:iowb-master}\end{center}\end{table}

\end{portlist}\caption{CPU Master Wishbone I/O Ports}\label{tbl:iowb-master}\end{center}\end{table}
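
With an 8--bit port granularity, a 30--bit {\tt o\_wb\_addr}, and a 4--bit
{\tt o\_wb\_sel}, a 32--bit octet address splits into a word address and a
set of byte lanes.  The C sketch below illustrates one way this split might
work.  It assumes the usual Wishbone big--endian lane convention, in which
the octet at the lowest address travels on data bits 31--24 and is selected
by bit~3 of the select lines, and it assumes naturally aligned accesses.
The type and function names are hypothetical, and the sketch is not derived
from the ZipCPU's Verilog.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; the field names mirror the port names above. */
typedef struct {
    uint32_t word_addr; /* 30-bit value that would drive o_wb_addr */
    uint8_t  sel;       /* 4-bit value that would drive o_wb_sel   */
} wb_request;

/*
 * Split a byte address and an access size (1, 2 or 4 octets) into a word
 * address and select lines.  Assumes big-endian lanes: offset 0 within the
 * word maps to select bit 3.  An unsupported size or a misaligned access
 * returns sel == 0, which this sketch uses as an "invalid" marker.
 */
static wb_request wb_decode(uint32_t byte_addr, unsigned size)
{
    wb_request r;
    unsigned offset = byte_addr & 3u; /* octet position within the word */

    r.word_addr = byte_addr >> 2;     /* drop the two octet-select bits */
    r.sel = 0;

    if ((size != 1u && size != 2u && size != 4u) || (offset & (size - 1u)))
        return r;                     /* unsupported size or misaligned */

    /* 'size' adjacent ones, shifted so that offset 0 lands on the top lanes */
    uint8_t mask = (uint8_t)((1u << size) - 1u);
    r.sel = (uint8_t)((mask << (4u - size - offset)) & 0xfu);
    return r;
}
```

Under these assumptions, a full word access to {\tt 0xff000004} produces a
word address of {\tt 0x3fc00001} with all four select lines set, while a
single octet access at offset~1 within a word sets only select bit~2.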

Line 3816...

Line 3655...

        A new implementation using an iCE40 FPGA suggests that the ZipCPU

        A new implementation using an iCE40 FPGA suggests that the ZipCPU

        will fit within the 4k~4--way LUTs of the iCE40 HX4K FPGA, but only

        will fit within the 4k~4--way LUTs of the iCE40 HX4K FPGA, but only

        just barely.

        just barely.

\item The ZipCPU was designed to be an implementable soft core that could be

\item The ZipCPU was designed to be an implementable soft core that could be

        placed within an FPGA, controlling actions internal to the FPGA. It

        placed within an FPGA, controlling actions internal to the FPGA.  This

        fits this role rather nicely. It does not fit the role of a general

        version of the CPU in particular has been updated so that it would

        purpose CPU replacement very well: it has no octet level access,

        support a more general purpose CPU, since as of version~2.0 the ZipCPU

        no double--precision floating point capability, neither does it have

        now supports octet level access across the bus.

        vector registers and operations.  However, it was never designed to be

        such a general purpose CPU but rather a system within a chip.

        Still, it fits this role rather nicely.  Other capabilities common

        to more general purpose CPUs, such as

        double--precision floating point capability, vector registers and

        vector operations have been left out.  However, it was never designed

        to be such a general purpose CPU but rather a system within a chip.

\item The extremely simplified instruction set of the ZipCPU was a good

\item The extremely simplified instruction set of the ZipCPU was a good

        choice. Although it does not have many of the commonly used

        choice. Although it does not have many of the commonly used

        instructions, PUSH, POP, JSR, and RET among them, the simplified

        instructions, PUSH, POP, JSR, and RET among them, the simplified

        instruction set has demonstrated an amazing versatility. I will contend

        instruction set has demonstrated an amazing versatility. I will contend

        therefore and for anyone who will listen, that this instruction set

        therefore and for anyone who will listen, that this instruction set

        offers a full and complete capability for whatever a user might wish

        offers a full and complete capability for whatever a user might wish

        to do with two exceptions: bytewise character access and accelerated

        to do with two exceptions: bytewise character access and accelerated

        floating-point support.

        floating-point support.

\item This simplified instruction set is easy to decode.

\item The simplified bus transactions (32-bit words only) were also very easy

        to implement.

\item The burst load/store approach using the wishbone pipelining mode is

\item The burst load/store approach using the wishbone pipelining mode is

        novel, and can be used to greatly increase the speed of the processor.

        novel, and can be used to greatly increase the speed of the processor.

\item The novel approach to interrupts greatly facilitates the development of

\item The novel approach to interrupts greatly facilitates the development of

        interrupt handlers from within high level languages.

        interrupt handlers from within high level languages.

Line 3859...

Line 3699...

        peripheral to copy instructions from the FLASH to a temporary memory

        peripheral to copy instructions from the FLASH to a temporary memory

        location, after which they may be executed at a single instruction

        location, after which they may be executed at a single instruction

        cycle per access again.

        cycle per access again.

\item Both GCC and binutils back ends exist for the ZipCPU.

\item Both GCC and binutils back ends exist for the ZipCPU.

\item As of this version of the CPU, a newlib version of the C--library

        now exists.

\end{itemize}

\end{itemize}

\section{The Not so Good}

\section{The Not so Good}

\begin{itemize}

\begin{itemize}

\item The CPU has no octet (character) support. This is both good and bad.

        Realistically, the CPU works just fine without it. Characters can be

        supported as subsets of 32-bit words without any problem. Practically,

        though, this creates two problems.  The first is that it makes porting

        code from non-ZipCPU platforms to the ZipCPU difficult---especially

        anything that depends upon the existence of {\tt *int8\_t},

        {\tt *int16\_t}, the size difference between

        {\tt sizeof(int)=4*sizeof(char)}, or that tries to

        create unions with characters and integers and then attempts to

        reference the address of the characters within that union.

        The second problem is that peripherals that depend upon character

        support on the bus may need to be rewritten to work on a 32--bit bus.

\item The ZipCPU does not (yet) support a data cache.  One is currently under

\item The ZipCPU does not (yet) support a data cache.  One is currently under

        development.

        development.

        The ZipCPU compensates for this lack via its burst memory capability.

        The ZipCPU compensates for this lack via its burst memory capability.

        Further, performance tests using Dhrystone suggest that the ZipCPU is

        Further, performance tests using Dhrystone suggest that the ZipCPU is

Line 3910...

Line 3738...

        This isn't nearly as bad as it sounds, however, since most RISC

        This isn't nearly as bad as it sounds, however, since most RISC

        architectures have 32~registers that will need to be swapped upon any

        architectures have 32~registers that will need to be swapped upon any

        context swap.

        context swap.

\item The ZipCPU is by no means generic: it will never handle addresses

\item The ZipCPU is by no means generic: it will never handle addresses

        larger than 32-bits (4GW or 16GB) without a complete and total redesign.

        larger than 32-bits (4GB) without a complete and total redesign.

        This may limit its utility as a generic CPU in the future, although

        This may limit its utility as a generic CPU in the future, although

        as an embedded CPU within an FPGA this isn't really much of a

        as an embedded CPU within an FPGA this isn't really much of a

        restriction.

        restriction.

\item While a toolchain does exist for the ZipCPU, it isn't yet fully featured.

\item While a toolchain does exist for the ZipCPU, it isn't yet fully featured.

        The ZipCPU has no support for soft floating point arithmetic,

        The ZipCPU does not yet have any support for soft floating point

        nor does it have support for several standard library functions.

        arithmetic, nor does it have gdb support.  These may be provided

        Indeed, full C library support and gdb support are still lacking.

        in future versions.

\end{itemize}

\end{itemize}

\section{The Next Generation}

\section{The Next Generation}

This section could also be labeled as my ``To do'' list.  It outlines where

This section could also be labeled as my ``To do'' list.  It outlines where

you may expect features in the future.  Currently, there are five primary

you may expect features in the future.  Currently, there are five primary

Line 3932...

Line 3760...

        The lack of any floating point capability, either hard or soft, makes

        The lack of any floating point capability, either hard or soft, makes

        porting math software to the ZipCPU difficult.  Simply building a

        porting math software to the ZipCPU difficult.  Simply building a

        soft floating point library will solve this problem.

        soft floating point library will solve this problem.

\item A C library.

        The lack of octet support has so far prevented the porting of

        newlib to the ZipCPU platform.  In the end, it may mean that any

        C library implementation for the ZipCPU may be subtly different

        from any you are familiar with.

\item A data cache

\item A data cache

        A preliminary data cache implemented as a write through cache has

        A preliminary data cache implemented as a write through cache has

        been developed.  Adding this into the CPU should require few changes

        been developed.  Adding this into the CPU should require few changes

        internal to the CPU.  I expect future versions of the CPU will permit

        internal to the CPU.  I expect future versions of the CPU will permit

Line 3951...

Line 3772...

\item A Memory Management Unit

\item A Memory Management Unit

        The first version of such an MMU has already been written.  It is

        The first version of such an MMU has already been written.  It is

        available for examination in the ZipCPU repository.  This MMU exists

        available for examination in the ZipCPU repository.  This MMU exists

        as a peripheral of the ZipCPU.  Integrating this MMU into the ZipCPU

        as a peripheral of the ZipCPU.  Integrating this MMU into the ZipCPU

        will involve slowing down memory stores so that they can be accomplished

        will involve slowing down memory stores so that they can be

        synchronously, as well as determining how and when particular cache

        accomplished synchronously, as well as determining how and when

        lines need to be invalidated.

        particular cache lines need to be invalidated.

\item An integrated floating point unit (FPU)

\item An integrated floating point unit (FPU)

        Why a small scale CPU needs a hefty floating point unit, I'm not

        Why a small scale CPU needs a hefty floating point unit, I'm not

        certain, but many application contexts require the ability to do

        certain, but many application contexts require the ability to do

Line 3987...

Line 3808...

% -     ADD     x,PC            // Any PC relative jump (20 bits)

% -     ADD     x,PC            // Any PC relative jump (20 bits)

% -     ADD.C   x,PC            // Any PC relative conditional jump (20 bits)

% -     ADD.C   x,PC            // Any PC relative conditional jump (20 bits)

% -     LDIHI   Addr,Rx         // Load from any 32-bit address, clobbers Rx,

% -     LDIHI   Addr,Rx         // Load from any 32-bit address, clobbers Rx,

%       LOD     Addr(Rx),Rx     //    unconditional, requires second instruction

%       LW      Addr(Rx),Rx     //    unconditional, requires second instruction

% -     LOD.C   Addr(Ry),Rx     // Any 16-bit relative address load, poss. cond

% -     LW.C    Addr(Ry),Rx     // Any 16-bit relative address load, poss. cond

% -     STO.C   Rx,Addr(Ry)     // Any 16-bit rel addr, Rx and Ry must be valid

% -     SW.C    Rx,Addr(Ry)     // Any 16-bit rel addr, Rx and Ry must be valid

% -     FARJMP  #Addr:          // Arbitrary 32-bit jumps require a jump table

% -     FARJMP  #Addr:          // Arbitrary 32-bit jumps require a jump table

%       BRA     +1              // memory address.  The BRA +1 can be skipped,

%       BRA     +1              // memory address.  The BRA +1 can be skipped,

%       .WORD   Addr            // but only if the address is placed at the end

%       .WORD   Addr            // but only if the address is placed at the end

%       LOD     -2(PC),PC       // of an executable section

%       LW      -2(PC),PC       // of an executable section

 No newline at end of file

 No newline at end of file

[/] [zipcpu/] [trunk/] [doc/] [src/] [spec.tex] - Diff between revs 199 and 202