Rev 24	Rev 32
Line 101...	Line 101...
`envious of what they've accomplished. I would like to port binutils to the`	`envious of what they've accomplished. I would like to port binutils to the`
`Zip CPU, as I would like to port GCC and GDB. They are way ahead of me. The`	`Zip CPU, as I would like to port GCC and GDB. They are way ahead of me. The`
`OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has`	`OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has`
`a lot of features of modern CPUs within it that ... well, let's just say it's`	`a lot of features of modern CPUs within it that ... well, let's just say it's`
`not the little guy on the block. The Zip CPU is lighter weight, costing only`	`not the little guy on the block. The Zip CPU is lighter weight, costing only`
`about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic`	`about 2,300 LUTs with no peripherals, and 3,200 LUTs with some very basic`
`peripherals.`	`peripherals.`

`My final reason is that I'm building the Zip CPU as a learning experience. The`	`My final reason is that I'm building the Zip CPU as a learning experience. The`
`Zip CPU has allowed me to learn a lot about how CPUs work on a very micro`	`Zip CPU has allowed me to learn a lot about how CPUs work on a very micro`
`level. For the first time, I am beginning to understand many of the Computer`	`level. For the first time, I am beginning to understand many of the Computer`
Line 330...	Line 330...
`The next bit is a clock enable (0 to enable) or sleep bit (1 to put`	`The next bit is a clock enable (0 to enable) or sleep bit (1 to put`
`the CPU to sleep). Setting this bit will cause the CPU to`	`the CPU to sleep). Setting this bit will cause the CPU to`
`wait for an interrupt (if interrupts are enabled), or to`	`wait for an interrupt (if interrupts are enabled), or to`
`completely halt (if interrupts are disabled).`	`completely halt (if interrupts are disabled).`
`The sixth bit is a global interrupt enable bit (GIE). When this`	`The sixth bit is a global interrupt enable bit (GIE). When this`
`sixth bit is a '1' interrupts will be enabled, else disabled. When`	sixth bit is a `1' interrupts will be enabled, else disabled. When
`interrupts are disabled, the CPU will be in supervisor mode, otherwise`	`interrupts are disabled, the CPU will be in supervisor mode, otherwise`
`it is in user mode. Thus, to execute a context switch, one only`	`it is in user mode. Thus, to execute a context switch, one only`
`need enable or disable interrupts. (When an interrupt line goes`	`need enable or disable interrupts. (When an interrupt line goes`
`high, interrupts will automatically be disabled, as the CPU goes`	`high, interrupts will automatically be disabled, as the CPU goes`
`and deals with its context switch.)`	`and deals with its context switch.) Special logic has been added to`
	`keep the user mode from setting the sleep register and clearing the`
	`GIE register at the same time, with clearing the GIE register taking`
	`precedence.`

`The seventh bit is a step bit. This bit can be`	`The seventh bit is a step bit. This bit can be`
`set from supervisor mode only. After setting this bit, should`	`set from supervisor mode only. After setting this bit, should`
`the supervisor mode process switch to user mode, it would then`	`the supervisor mode process switch to user mode, it would then`
`accomplish one instruction in user mode before returning to supervisor`	`accomplish one instruction in user mode before returning to supervisor`
Line 357...	Line 360...
`(break enabled), or whether the break instruction will simply send the`	`(break enabled), or whether the break instruction will simply send the`
`CPU into interrupt mode. Encountering a break in supervisor mode will`	`CPU into interrupt mode. Encountering a break in supervisor mode will`
`halt the CPU independent of the break enable bit. This bit can only be set`	`halt the CPU independent of the break enable bit. This bit can only be set`
`within supervisor mode.`	`within supervisor mode.`

	`% Should break enable be a supervisor mode bit, while the break enable bit`
	`% in user mode is a break has taken place bit?`
	`%`

`This functionality was added to enable an external debugger to`	`This functionality was added to enable an external debugger to`
`set and manage breakpoints.`	`set and manage breakpoints.`

`The ninth bit is reserved for a floating point enable bit. When set, the`	`The ninth bit is reserved for a floating point enable bit. When set, the`
`arithmetic for the next instruction will be sent to a floating point unit.`	`arithmetic for the next instruction will be sent to a floating point unit.`
Line 414...	Line 421...
`\caption{Conditions for conditional operand execution}\label{tbl:conditions}`	`\caption{Conditions for conditional operand execution}\label{tbl:conditions}`
`\end{center}`	`\end{center}`
`\end{table}`	`\end{table}`
`There is no condition code for less than or equal, not C or not V. Sorry,`	`There is no condition code for less than or equal, not C or not V. Sorry,`
`I ran out of space in 3--bits. Using these conditions will take an extra`	`I ran out of space in 3--bits. Using these conditions will take an extra`
`instruction. (Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})`	`instruction and a pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})`

`\section{Operand B}`	`\section{Operand B}`
Many instruction forms have a 21-bit source ``Operand B'' associated with them.	Many instruction forms have a 21-bit source ``Operand B'' associated with them.
`This Operand B is either equal to a register plus a signed immediate offset,`	`This Operand B is either equal to a register plus a signed immediate offset,`
`or an immediate offset by itself. This value is encoded as shown in`	`or an immediate offset by itself. This value is encoded as shown in`
Line 443...	Line 450...
`immediate address. Addresses are therefore encoded in the same fashion as`	`immediate address. Addresses are therefore encoded in the same fashion as`
`Operand B's, shown above.`	`Operand B's, shown above.`

`A lot of long hard thought was put into whether to allow pre/post increment`	`A lot of long hard thought was put into whether to allow pre/post increment`
`and decrement addressing modes. Finding no way to use these operators without`	`and decrement addressing modes. Finding no way to use these operators without`
`taking two or more clocks per instruction, these addressing modes have been`	`taking two or more clocks per instruction,\footnote{The two clocks figure`
	`comes from the design of the register set, allowing only one write per clock.`
	`That write is either from the memory unit or the ALU, but never both.} these`
	`addressing modes have been`
`removed from the realm of possibilities. This means that the Zip CPU has no`	`removed from the realm of possibilities. This means that the Zip CPU has no`
`native way of executing push, pop, return, or jump to subroutine operations.`	`native way of executing push, pop, return, or jump to subroutine operations.`
`Each of these instructions can be emulated with a set of instructions from the`	`Each of these instructions can be emulated with a set of instructions from the`
`existing set.`	`existing set.`

Line 482...	Line 492...
`rule that the register cannot be the PC or CC registers. The PC register`	`rule that the register cannot be the PC or CC registers. The PC register`
`field has been stolen to create a multiply by immediate instruction. The`	`field has been stolen to create a multiply by immediate instruction. The`
`CC register field is reserved.`	`CC register field is reserved.`

`\section{Floating Point}`	`\section{Floating Point}`
`The ZIP CPU does not support floating point operations today. However, the`	`The ZIP CPU does not support floating point operations. However, the`
`instruction set reserves a capability for a floating point operation. To`	`instruction set reserves two possibilities for future floating point`
`execute such an operation, simply set the floating point bit in the CC`	`operations.`
`register and the following instruction will interpret its registers as`
`a floating point instruction. Not all instructions, however, have floating`	`The first floating point operation hole in the instruction set involves`
`point equivalents. Further, the immediate fields do not apply in floating`	`setting the floating point bit in the CC register. The next instruction`
`point mode, and must be set to zero. Not all instructions make sense as`	`will simply interpret its operands as floating point values.`
`floating point operations. Therefore, only the CMP, SUB, ADD, and MPY`	`Not all instructions, however, have floating point equivalents. Further, the`
`instructions may be issued as floating point instructions. Other instructions`	`immediate fields do not apply in floating point mode, and must be set to`
`allow the examining of the floating point bit in the CC register. In all`	`zero. Not all instructions make sense as floating point operations.`
`cases, the floating point bit is cleared one instruction after it is set.`	`Therefore, only the CMP, SUB, ADD, and MPY instructions may be issued as`
	`floating point instructions. Other instructions allow the examining of the`
	`floating point bit in the CC register. In all cases, the floating point bit`
	`is cleared one instruction after it is set.`

	`The other possibility for floating point operations involves exploiting the`
	`hole in the instruction set that the NOOP and BREAK instructions reside within.`
	`These two instructions use 24--bits of address space. A simple adjustment`
	`to this space could create instructions with 4--bit register addresses for`
	`each register, a 3--bit field for conditional execution, and a 2--bit field`
	`for which operation. In this fashion, such a floating point capability would`
	`only fill 13--bits of the 24--bit field, still leaving lots of room for`
	`expansion.`

	`In both cases, the Zip CPU would support 32--bit single precision floats`
	`only.`

`The architecture does not support a floating point not-implemented interrupt.`	`The current architecture does not support a floating point not-implemented`
`Any soft floating point emulation must be done deliberately.`	`interrupt. Any soft floating point emulation must be done deliberately.`

`\section{Native Instructions}`	`\section{Native Instructions}`
`The instruction set for the Zip CPU is summarized in`	`The instruction set for the Zip CPU is summarized in`
`Tbl.~\ref{tbl:zip-instructions}.`	`Tbl.~\ref{tbl:zip-instructions}.`
`\begin{table}\begin{center}`	`\begin{table}\begin{center}`
Line 592...	Line 617...
`STO & \multicolumn{4}{l\|}{4'h7}`	`STO & \multicolumn{4}{l\|}{4'h7}`
`& \multicolumn{4}{l\|}{D. Reg}`	`& \multicolumn{4}{l\|}{D. Reg}`
`& \multicolumn{3}{l\|}{Cond.}`	`& \multicolumn{3}{l\|}{Cond.}`
`& \multicolumn{21}{l\|}{Operand B address}`	`& \multicolumn{21}{l\|}{Operand B address}`
`& \\\hline`	`& \\\hline`
`{\em Rsrd} & \multicolumn{4}{l\|}{4'h8}`
`& \multicolumn{4}{l\|}{R. Reg}`
`& \multicolumn{3}{l\|}{Cond.}`
`& 1'b0`
`& \multicolumn{20}{l\|}{Reserved}`
`& Yes \\\hline`
`SUB & \multicolumn{4}{l\|}{4'h8}`	`SUB & \multicolumn{4}{l\|}{4'h8}`
`& \multicolumn{4}{l\|}{R. Reg}`	`& \multicolumn{4}{l\|}{R. Reg}`
`& \multicolumn{3}{l\|}{Cond.}`	`& \multicolumn{3}{l\|}{Cond.}`
`& 1'b1`	`& \multicolumn{21}{l\|}{Operand B}`
`& \multicolumn{4}{l\|}{Reg}`
`& \multicolumn{16}{l\|}{16'bit signed offset}`
`& Yes \\\hline`	`& Yes \\\hline`
`AND & \multicolumn{4}{l\|}{4'h9}`	`AND & \multicolumn{4}{l\|}{4'h9}`
`& \multicolumn{4}{l\|}{R. Reg}`	`& \multicolumn{4}{l\|}{R. Reg}`
`& \multicolumn{3}{l\|}{Cond.}`	`& \multicolumn{3}{l\|}{Cond.}`
`& \multicolumn{21}{l\|}{Operand B}`	`& \multicolumn{21}{l\|}{Operand B}`
Line 646...	Line 663...
`\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}`	`\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}`
`\end{center}\end{table}`	`\end{center}\end{table}`

`As you can see, there's lots of room for instruction set expansion. The`	`As you can see, there's lots of room for instruction set expansion. The`
`NOOP and BREAK instructions are the only instructions within one particular`	`NOOP and BREAK instructions are the only instructions within one particular`
`24--bit hole. Likewise, the subtract leaves half of its space open, since a`	`24--bit hole. This space is reserved for future enhancements. For example,`
`subtract immediate is the same as an add with a negated immediate. These`	`floating point operations, consisting of a 3-bit floating point operation,`
`spaces are reserved for future enhancements.`	`two 4-bit registers, no immediate offset, and a 3-bit condition would fit`
	`nicely into 14--bits of this address space--making it so that the floating`
	`point bit in the CC register need not be used.`
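The bit budget quoted in the Rev 32 text (a 3-bit operation, two 4-bit registers, and a 3-bit condition fitting in 14 bits of the 24-bit hole) can be checked with a short sketch. The field names and the order in which the fields are packed are assumptions made for illustration only; the specification does not define a layout for this reserved space.

```python
# Field widths come from the text; the names and packing ORDER are hypothetical.
FIELDS = (("op", 3), ("reg_a", 4), ("reg_b", 4), ("cond", 3))
HOLE_BITS = 24  # size of the NOOP/BREAK hole quoted in the text


def total_bits(fields=FIELDS):
    """Sum of all field widths."""
    return sum(width for _, width in fields)


def pack(values, fields=FIELDS):
    """Pack the named field values into one integer, first field in the top bits."""
    word = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word, fields=FIELDS):
    """Inverse of pack(): recover the field values from the packed integer."""
    values = {}
    for name, width in reversed(fields):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Under these assumptions `total_bits()` gives 14, which leaves 10 of the 24 bits unused for further expansion.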

`\section{Derived Instructions}`	`\section{Derived Instructions}`
`The ZIP CPU supports many other common instructions, but not all of them`	`The ZIP CPU supports many other common instructions, but not all of them`
`are single cycle instructions. The derived instruction tables,`	`are single cycle instructions. The derived instruction tables,`
`Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},`	`Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},`
Line 860...	Line 879...
`\caption{Derived Instructions, continued}\label{tbl:derived-3}`	`\caption{Derived Instructions, continued}\label{tbl:derived-3}`
`\end{center}\end{table}`	`\end{center}\end{table}`
`\iffalse`	`\iffalse`
`\fi`	`\fi`
`\section{Pipeline Stages}`	`\section{Pipeline Stages}`
	`As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},`
	`the Zip CPU supports a five stage pipeline.`
`\begin{enumerate}`	`\begin{enumerate}`
`\item {\bf Prefetch}: Read instruction from memory (cache if possible). This`	`\item {\bf Prefetch}: Read instruction from memory (cache if possible). This`
`stage is actually pipelined itself, and so it will stall if the PC`	`stage is actually pipelined itself, and so it will stall if the PC`
`ever changes. Stalls are also created here if the instruction isn't`	`ever changes. Stalls are also created here if the instruction isn't`
`in the prefetch cache.`	`in the prefetch cache.`
`\item {\bf Decode}: Decode instruction into op code, register(s) to read, and`	`\item {\bf Decode}: Decode instruction into op code, register(s) to read, and`
`immediate offset.`	`immediate offset. This stage also determines whether the flags will`
	`be set or whether the result will be written back.`
`\item {\bf Read Operands}: Read registers and apply any immediate values to`	`\item {\bf Read Operands}: Read registers and apply any immediate values to`
`them. There is no means of detecting or flagging arithmetic overflow`	`them. There is no means of detecting or flagging arithmetic overflow`
`or carry when adding the immediate to the operand. This stage will`	`or carry when adding the immediate to the operand. This stage will`
`stall if any source operand is pending.`	`stall if any source operand is pending.`
`A proper optimizing compiler, therefore, will schedule an instruction`
`between the instruction that produces the result and the instruction`
`that uses it.`
`\item Split into two tracks: An {\bf ALU} which will accomplish a simple`	`\item Split into two tracks: An {\bf ALU} which will accomplish a simple`
`instruction, and the {\bf MemOps} stage which accomplishes memory`	`instruction, and the {\bf MemOps} stage which accomplishes memory`
`read/write.`	`read/write.`
`\begin{itemize}`	`\begin{itemize}`
`\item Loads stall instructions that access the register until it is`	`\item Loads stall instructions that access the register until it is`
`written to the register set.`	`written to the register set.`
`\item Condition codes are available upon completion`	`\item Condition codes are available upon completion`
`\item Issuing an instruction to the memory while the memory is busy will`	`\item Issuing an instruction to the memory while the memory is busy will`
`stall the bus. If the bus deadlocks, only a reset will`	`stall the entire pipeline. If the bus deadlocks, only a reset`
`release the CPU. (Watchdog timer, anyone?)`	`will release the CPU. (Watchdog timer, anyone?)`
`\item The Zip CPU currently has no means of reading and acting on any`	`\item The Zip CPU currently has no means of reading and acting on any`
`error conditions on the bus.`	`error conditions on the bus.`
`\end{itemize}`	`\end{itemize}`
`\item {\bf Write-Back}: Conditionally write back the result to register set,`	`\item {\bf Write-Back}: Conditionally write back the result to the register`
`applying the condition. This routine is bi-re-entrant: either the`	`set, applying the condition. This routine is bi-re-entrant: either the`
`memory or the simple instruction may request a register write.`	`memory or the simple instruction may request a register write.`
`\end{enumerate}`	`\end{enumerate}`

`The Zip CPU does not support out of order execution. Therefore, if the memory`	`The Zip CPU does not support out of order execution. Therefore, if the memory`
`unit stalls, every other instruction stalls. Memory stores, however, can take`	`unit stalls, every other instruction stalls. Memory stores, however, can take`
`place concurrently with ALU operations, although memory writes cannot.`	`place concurrently with ALU operations, although memory reads cannot.`

`\section{Pipeline Logic}`	`\section{Pipeline Logic}`
`How the CPU handles some instruction combinations can be telling when`	`How the CPU handles some instruction combinations can be telling when`
`determining what happens in the pipeline. The following lists some examples:`	`determining what happens in the pipeline. The following lists some examples:`
`\begin{itemize}`	`\begin{itemize}`
Line 923...	Line 942...
`R2 get, the value of R1 before the first move or the value of R0?`	`R2 get, the value of R1 before the first move or the value of R0?`
`Placing the value of R0 into R1 requires a pipeline stall, and possibly`	`Placing the value of R0 into R1 requires a pipeline stall, and possibly`
`two, as I have the pipeline designed.`	`two, as I have the pipeline designed.`

`The ZIP CPU architecture requires that R2 must equal R0 at the end of`	`The ZIP CPU architecture requires that R2 must equal R0 at the end of`
`this operation. This may stall the pipeline 1-2 cycles.`	`this operation. Even better, such combinations do not (normally)`
	`stall the pipeline.`

`\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}`	`\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}`


`At issue is the same item as above, save that the CMP instruction`	`At issue is the same item as above, save that the CMP instruction`
Line 942...	Line 962...

`At issue is the`	`At issue is the`
`fact that the logic supporting the CC register is more complicated than`	`fact that the logic supporting the CC register is more complicated than`
`the logic supporting any other register.`	`the logic supporting any other register.`

`The ZIP CPU will stall 1--2 cycles on this instruction, until the`	`The ZIP CPU will stall for one cycle on this instruction.`
`CC register is valid.`

`\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}`	`\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}`

`At issue is whether or not the instruction following the jump will`	`At issue is whether or not the instruction following the jump will`
`take place before the jump. In other words, is the MOV to the PC`	`take place before the jump. In other words, is the MOV to the PC`
Line 991...	Line 1010...
`Because it isn't clear what would need to be canceled,`	`Because it isn't clear what would need to be canceled,`
`this instruction combination is not recommended.`	`this instruction combination is not recommended.`

`\item {\bf All issued instructions complete.}`	`\item {\bf All issued instructions complete.}`

`All stages are filled, or the entire pipeline`	`All stages are filled, or the entire pipeline stalls.`
`stalls.`

`What about debug control? What about`	`What about debug control? What about`
`register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2`	`register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2`
`should place the value of R0 into R2. How do you restart the pipeline`	`should place the value of R0 into R2. How do you restart the pipeline`
`after an interrupt? What address do you use? The last issued`	`after an interrupt? What address do you use? The last issued`
Line 1014...	Line 1032...

`Suggestion: Suppose we load extra information in the two`	`Suggestion: Suppose we load extra information in the two`
`CC register(s) for debugging intermediate pipeline stages?`	`CC register(s) for debugging intermediate pipeline stages?`

`The next problem, though, is how to deal with the read operand`	`The next problem, though, is how to deal with the read operand`
`pipeline stage needing the result from the register pipeline.a`	`pipeline stage needing the result from the register pipeline.`

`\item {\bf Memory instructions must complete}`	`\item {\bf Memory instructions must complete}`

`All instructions that enter into the memory module must`	`All instructions that enter into the memory module {\em must}`
`complete. Issued instructions from the prefetch, decode, or operand`	`complete. Issued instructions from the prefetch, decode, or operand`
`read stages may or may not complete. Jumps into code must be valid,`	`read stages may or may not complete. Jumps into code must be valid,`
`so that interrupt returns may be valid. All instructions entering the`	`so that interrupt returns may be valid. All instructions entering the`
`ALU complete.`	`ALU complete.`

Line 1037...	Line 1055...
`result is known. When the flag does go high, anything in the prefetch,`	`result is known. When the flag does go high, anything in the prefetch,`
`decode, and read-op stage will be invalidated.`	`decode, and read-op stage will be invalidated.`

`\end{itemize}`	`\end{itemize}`

	`\section{Pipeline Stalls}`
	`The processing pipeline can and will stall for a variety of reasons. Some of`
	`these are obvious, some less so. These reasons are listed below:`
	`\begin{itemize}`
	`\item When the prefetch cache is exhausted`

	`This should be obvious. If the prefetch cache doesn't have the instruction`
	`in memory, the entire pipeline must stall until enough of the prefetch cache`
	`is loaded to support the next instruction.`

	`\item While waiting for the pipeline to load following any taken branch, jump,`
	`return from interrupt or switch to interrupt context (6 clocks)`

	`If the PC suddenly changes, the pipeline is subsequently cleared and needs to`
	`be reloaded. Given that there are five stages to the pipeline, that accounts`
	`for five of the six delay clocks. The last clock is lost in the prefetch`
	`stage which needs at least one clock with a valid PC before it can produce`
	`a new output. Hence, six clocks will always be lost anytime the pipeline needs`
	`to be cleared.`

	`\item When reading from a prior register while also adding an immediate offset`
	`\begin{enumerate}`
	`\item\ {\tt OPCODE ?,RA}`
	`\item\ {\em (stall)}`
	`\item\ {\tt OPCODE I+RA,RB}`
	`\end{enumerate}`

	`Since the addition of the immediate to the register within OpB decoding gets applied`
	`during the read operand stage so that it can be nicely settled before the ALU,`
	`any instruction that will write back an operand must be separated from the`
	`opcode that will read and apply an immediate offset by one instruction. The`
	`good news is that this stall can easily be mitigated by proper scheduling.`

	`\item When writing to the CC or PC Register`
	`\begin{enumerate}`
	`\item\ {\tt OPCODE RA,PC} {\em Ex: a branch opcode}`
	`\item\ {\em (stall, even if jump not taken)}`
	`\item\ {\tt OPCODE RA,RB}`
	`\end{enumerate}`
	`Since branches take place in the writeback stage, the Zip CPU will stall the`
	`pipeline for one clock anytime there may be a possible jump. This prevents`
	`an instruction from executing a memory access after the jump but before the`
	`jump is recognized.`

	`\item When reading from the CC register after setting the flags`
	`\begin{enumerate}`
	`\item\ {\tt ALUOP RA,RB}`
	`\item\ {\em (stall)}`
	`\item\ {\tt TST sys.ccv,CC}`
	`\item\ {\tt BZ somewhere}`
	`\end{enumerate}`

	`The reason for this stall is simply performance. Many of the flags are`
	`determined via combinatorial logic after the writeback instruction is`
	`determined. Trying to then place these into the input for one of the operands`
	`created a time delay loop that would no longer execute in a single 100~MHz`
	`clock cycle. (The time delay of the multiply within the ALU wasn't helping`
	`either \ldots).`

	`\item When waiting for a memory read operation to complete`
	`\begin{enumerate}`
	`\item\ {\tt LOD address,RA}`
	`\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}`
	`\item\ {\tt OPCODE I+RA,RB}`
	`\end{enumerate}`

	`Remember, the ZIP CPU does not support out of order execution. Therefore,`
	`anytime the memory unit becomes busy both the memory unit and the ALU must`
	`stall until the memory unit is cleared. This is especially true of a load`
	`instruction, which will write its operand back to the register file. Store`
	`instructions are different, since they can be busy with no impact on later`
	`ALU write back operations. Hence, only loads stall the pipeline.`

	`This also assumes that the memory being accessed is a single cycle memory.`
	`Slower memories, such as the Quad SPI flash, will take longer--perhaps even`
	`as long as forty clocks. During this time the CPU and the external bus`
	`will be busy, and unable to do anything else.`

	`\item Memory operation followed by a memory operation`
	`\begin{enumerate}`
	`\item\ {\tt STO address,RA}`
	`\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}`
	`\item\ {\tt LOD address,RB}`
	`\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}`
	`\end{enumerate}`

	`In this case, the LOD instruction cannot start until the STO is finished.`
	`With proper scheduling, it is possible to do something in the ALU while the`
	`STO is busy, but otherwise this pipeline will stall waiting for it to complete.`

	`Note that even though the Wishbone bus can support pipelined accesses at`
	`one access per clock, only the prefetch stage can take advantage of this.`
	`Load and Store instructions are stuck at one wishbone cycle per instruction.`
	`\end{itemize}`
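The stall figures listed above can be gathered into a rough cost estimate. The following is only a sketch: the event names are hypothetical labels for the cases in the list, the per-event clock counts are the ones quoted in the text (best case for memory reads on a single-cycle memory), and the model simply adds them up without accounting for any overlap between stalls.

```python
# Clocks lost per event, taken from the list above. Event names are hypothetical.
STALL_CLOCKS = {
    "taken_branch": 6,            # pipeline cleared and reloaded
    "operand_plus_immediate": 1,  # OPCODE ?,RA followed by OPCODE I+RA,RB
    "write_pc_or_cc": 1,          # stalls even if the jump is not taken
    "read_cc_after_flags": 1,     # ALUOP followed by a read of CC
    "load_best_case": 7,          # LOD from single-cycle memory, best case
}


def stall_clocks(event_counts):
    """Total clocks lost, given a mapping of event name to occurrence count."""
    return sum(STALL_CLOCKS[name] * count for name, count in event_counts.items())
```

For instance, a loop body that takes two branches and performs one best-case load would lose `stall_clocks({"taken_branch": 2, "load_best_case": 1})` clocks, which is 19 under this model. Slower memories, such as the flash mentioned above, would need a larger load figure.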


`\chapter{Peripherals}\label{chap:periph}`	`\chapter{Peripherals}\label{chap:periph}`

`While the previous chapter describes a CPU in isolation, the Zip System`	`While the previous chapter describes a CPU in isolation, the Zip System`
Line 1120...	Line 1232...

`The watchdog timer is no different from any of the other timers, save for one`	`The watchdog timer is no different from any of the other timers, save for one`
`critical difference: the interrupt line from the watchdog`	`critical difference: the interrupt line from the watchdog`
timer is tied to the reset line of the CPU. Hence writing a `1' to the	timer is tied to the reset line of the CPU. Hence writing a `1' to the
`watchdog timer will always reset the CPU.`	`watchdog timer will always reset the CPU.`
`To stop the Watchdog timer, write a '0' to it. To start it,`	To stop the Watchdog timer, write a `0' to it. To start it,
`write any other number to it---as with the other timers.`	`write any other number to it---as with the other timers.`

`While the watchdog timer supports interval mode, it doesn't make as much sense`	`While the watchdog timer supports interval mode, it doesn't make as much sense`
`as it did with the other timers.`	`as it did with the other timers.`

Line 1153...	Line 1265...

`The purpose of this register is to support alarm times within a CPU. To`	`The purpose of this register is to support alarm times within a CPU. To`
`set an alarm for a particular process $N$ clocks in advance, read the current`	`set an alarm for a particular process $N$ clocks in advance, read the current`
`Jiffies value, add $N$, and write the sum back to the Jiffies register. The`	`Jiffies value, add $N$, and write the sum back to the Jiffies register. The`
`O/S must also keep track of values written to the Jiffies register. Thus,`	`O/S must also keep track of values written to the Jiffies register. Thus,`
when an `alarm' trips, it should be remoed from the list of alarms, the list	when an `alarm' trips, it should be removed from the list of alarms, the list
`should be sorted, and the next alarm in terms of Jiffies should be written`	`should be sorted, and the next alarm in terms of Jiffies should be written`
`to the register.`	`to the register.`
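The alarm bookkeeping described above might look like the following sketch on the O/S side. It is not taken from any Zip CPU software: `AlarmList`, `read_jiffies`, and `write_jiffies` are hypothetical names, the counter is assumed to be 32 bits wide and to wrap around, and deadlines are assumed to lie less than $2^{31}$ clocks away so that a signed difference orders them correctly.

```python
MASK32 = 0xFFFFFFFF


def ticks_until(deadline, now):
    """Signed number of clocks from now to deadline, modulo 2**32."""
    diff = (deadline - now) & MASK32
    return diff - (1 << 32) if diff >= (1 << 31) else diff


class AlarmList:
    """O/S-side record of pending alarms; the register holds only the nearest."""

    def __init__(self, read_jiffies, write_jiffies):
        self.read_jiffies = read_jiffies    # callable returning the counter
        self.write_jiffies = write_jiffies  # callable taking the next deadline
        self.alarms = []                    # list of (deadline, process_id)

    def _program_next(self):
        # Sort by time remaining, then write the nearest deadline (if any).
        now = self.read_jiffies()
        self.alarms.sort(key=lambda alarm: ticks_until(alarm[0], now))
        if self.alarms:
            self.write_jiffies(self.alarms[0][0])

    def set_alarm(self, process_id, n_clocks):
        # Read the current value, add N, remember it, and update the register.
        deadline = (self.read_jiffies() + n_clocks) & MASK32
        self.alarms.append((deadline, process_id))
        self._program_next()
        return deadline

    def on_alarm(self):
        # Called when the alarm trips: drop it and program the next one.
        if not self.alarms:
            return None
        _, process_id = self.alarms.pop(0)
        self._program_next()
        return process_id
```

In this sketch the interrupt handler would call `on_alarm()` to learn which process to wake, while `set_alarm()` is the only other place that writes the register.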

`\section{Manual Cache}`	`\section{Manual Cache}`

`The manual cache is an experimental setting that may not remain with the Zip`	`The manual cache is an experimental setting that may not remain with the Zip`
`CPU for very long. It is designed to facilitate running from FLASH or ROM`	`CPU for very long. It is designed to facilitate running from FLASH or ROM`
`memory, although the pipe cache really makes this need obsolete. The manual`	`memory, although the pipeline prefetch cache really makes this need obsolete.`
	`The manual`
`cache works by copying data from a wishbone address (range) into the cache`	`cache works by copying data from a wishbone address (range) into the cache`
`register, and then by making that memory available as memory to the Zip System.`	`register, and then by making that memory available as memory to the Zip System.`
`It is a {\em manual cache} because the processor must first specify what`	`It is a {\em manual cache} because the processor must first specify what`
`memory to copy, and then once copied the processor can only access the cache`	`memory to copy, and then once copied the processor can only access the cache`
`memory by the cache memory location. There is no transparency. It is perhaps`	`memory by the cache memory location. There is no transparency. It is perhaps`
Line 1181...	Line 1294...

`The ZipSystem registers fall into two categories, ZipSystem internal registers`	`The ZipSystem registers fall into two categories, ZipSystem internal registers`
`accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},`	`accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},`
`\begin{table}[htbp]`	`\begin{table}[htbp]`
`\begin{center}\begin{reglist}`	`\begin{center}\begin{reglist}`
`PIC & {\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline`	`PIC & \scalebox{0.8}{\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline`
`WDT & {\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline`	`WDT & \scalebox{0.8}{\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline`
`CCHE & {\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline`	`CCHE & \scalebox{0.8}{\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline`
`CTRIC & {\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline`	`CTRIC & \scalebox{0.8}{\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline`
`TMRA & {\tt 0xc0000004} & 32 & R/W & Timer A\\\hline`	`TMRA & \scalebox{0.8}{\tt 0xc0000004} & 32 & R/W & Timer A\\\hline`
`TMRB & {\tt 0xc0000005} & 32 & R/W & Timer B\\\hline`	`TMRB & \scalebox{0.8}{\tt 0xc0000005} & 32 & R/W & Timer B\\\hline`
`TMRC & {\tt 0xc0000006} & 32 & R/W & Timer C\\\hline`	`TMRC & \scalebox{0.8}{\tt 0xc0000006} & 32 & R/W & Timer C\\\hline`
`JIFF & {\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline`	`JIFF & \scalebox{0.8}{\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline`
`MTASK & {\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline`	`MTASK & \scalebox{0.8}{\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline`
`MMSTL & {\tt 0xc0000008} & 32 & R/W & Master Stall Counter \\\hline`	`MMSTL & \scalebox{0.8}{\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline`
`MPSTL & {\tt 0xc0000008} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline`	`MPSTL & \scalebox{0.8}{\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline`
`MICNT & {\tt 0xc0000008} & 32 & R/W & Master Instruction Counter\\\hline`	`MICNT & \scalebox{0.8}{\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline`
`UTASK & {\tt 0xc0000008} & 32 & R/W & User Task Clock Counter \\\hline`	`UTASK & \scalebox{0.8}{\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline`
`UMSTL & {\tt 0xc0000008} & 32 & R/W & User Stall Counter \\\hline`	`UMSTL & \scalebox{0.8}{\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline`
`UPSTL & {\tt 0xc0000008} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline`	`UPSTL & \scalebox{0.8}{\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline`
`UICNT & {\tt 0xc0000008} & 32 & R/W & User Instruction Counter\\\hline`	`UICNT & \scalebox{0.8}{\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline`
`Cache & {\tt 0xc0100000} & & & Base address of the Cache memory\\\hline`	`% Cache & \scalebox{0.8}{\tt 0xc0100000} & & & Base address of the Cache memory\\\hline`
`\end{reglist}`	`\end{reglist}`
`\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}`	`\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}`
`\end{center}\end{table}`	`\end{center}\end{table}`
`and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.`	`and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.`
`\begin{table}[htbp]`	`\begin{table}[htbp]`
Line 1212...	Line 1325...
`\caption{Zip System Debug Registers}\label{tbl:dbgregs}`	`\caption{Zip System Debug Registers}\label{tbl:dbgregs}`
`\end{center}\end{table}`	`\end{center}\end{table}`


`\chapter{Wishbone Datasheet}\label{chap:wishbone}`	`\chapter{Wishbone Datasheet}\label{chap:wishbone}`
`The Zip System supports two wishbone accesses, a slave debug port and a master`	`The Zip System supports two wishbone ports, a slave debug port and a master`
`port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}`	`port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}`
`\begin{table}[htbp]`	`\begin{table}[htbp]`
`\begin{center}`	`\begin{center}`
`\begin{wishboneds}`	`\begin{wishboneds}`
`Revision level of wishbone & WB B4 spec \\\hline`	`Revision level of wishbone & WB B4 spec \\\hline`
Line 1279...	Line 1392...
`it choose to access a value not on the bus, or a peripheral that is not`	`it choose to access a value not on the bus, or a peripheral that is not`
`yet properly configured.`	`yet properly configured.`

`\chapter{Clocks}\label{chap:clocks}`	`\chapter{Clocks}\label{chap:clocks}`

`This core is based upon the Basys--3 design. The Basys--3 development board`	`This core is based upon the Basys--3 development board sold by Digilent.`
`contains one external 100~MHz clock, which is sufficient to run the ZIP CPU`	`The Basys--3 development board contains one external 100~MHz clock, which is`
`core.`	`sufficient to run the ZIP CPU core.`
`\begin{table}[htbp]`	`\begin{table}[htbp]`
`\begin{center}`	`\begin{center}`
`\begin{clocklist}`	`\begin{clocklist}`
`i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline`	`i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline`
`\end{clocklist}`	`\end{clocklist}`

Line 101...

envious of what they've accomplished. I would like to port binutils to the

envious of what they've accomplished. I would like to port binutils to the

Zip CPU, as I would like to port GCC and GDB. They are way ahead of me. The

Zip CPU, as I would like to port GCC and GDB. They are way ahead of me. The

OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has

OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has

a lot of features of modern CPUs within it that ... well, let's just say it's

a lot of features of modern CPUs within it that ... well, let's just say it's

not the little guy on the block. The Zip CPU is lighter weight, costing only

not the little guy on the block. The Zip CPU is lighter weight, costing only

about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic

about 2,300 LUTs with no peripherals, and 3,200 LUTs with some very basic

peripherals.

peripherals.

My final reason is that I'm building the Zip CPU as a learning experience. The

My final reason is that I'm building the Zip CPU as a learning experience. The

Zip CPU has allowed me to learn a lot about how CPUs work on a very micro

Zip CPU has allowed me to learn a lot about how CPUs work on a very micro

level. For the first time, I am beginning to understand many of the Computer

level. For the first time, I am beginning to understand many of the Computer

Line 330...

The next bit is a clock enable (0 to enable) or sleep bit (1 to put

The next bit is a clock enable (0 to enable) or sleep bit (1 to put

        the CPU to sleep).  Setting this bit will cause the CPU to

        the CPU to sleep).  Setting this bit will cause the CPU to

        wait for an interrupt (if interrupts are enabled), or to

        wait for an interrupt (if interrupts are enabled), or to

        completely halt (if interrupts are disabled).

        completely halt (if interrupts are disabled).

The sixth bit is a global interrupt enable bit (GIE).  When this

The sixth bit is a global interrupt enable bit (GIE).  When this

        sixth bit is a '1' interrupts will be enabled, else disabled.  When

        sixth bit is a `1' interrupts will be enabled, else disabled.  When

        interrupts are disabled, the CPU will be in supervisor mode, otherwise

        interrupts are disabled, the CPU will be in supervisor mode, otherwise

        it is in user mode.  Thus, to execute a context switch, one only

        it is in user mode.  Thus, to execute a context switch, one only

        need enable or disable interrupts.  (When an interrupt line goes

        need enable or disable interrupts.  (When an interrupt line goes

        high, interrupts will automatically be disabled, as the CPU goes

        high, interrupts will automatically be disabled, as the CPU goes

        and deals with its context switch.)

        and deals with its context switch.)  Special logic has been added to

        keep the user mode from setting the sleep register and clearing the

        GIE register at the same time, with clearing the GIE register taking

        precedence.

The seventh bit is a step bit.  This bit can be

The seventh bit is a step bit.  This bit can be

        set from supervisor mode only.  After setting this bit, should

        set from supervisor mode only.  After setting this bit, should

        the supervisor mode process switch to user mode, it would then

        the supervisor mode process switch to user mode, it would then

        accomplish one instruction in user mode before returning to supervisor

        accomplish one instruction in user mode before returning to supervisor

Line 357...

Line 360...

(break enabled), or whether the break instruction will simply send the

(break enabled), or whether the break instruction will simply send the

CPU into interrupt mode.  Encountering a break in supervisor mode will

CPU into interrupt mode.  Encountering a break in supervisor mode will

halt the CPU independent of the break enable bit.  This bit can only be set

halt the CPU independent of the break enable bit.  This bit can only be set

within supervisor mode.

within supervisor mode.

% Should break enable be a supervisor mode bit, while the break enable bit

% in user mode is a break has taken place bit?

This functionality was added to enable an external debugger to

This functionality was added to enable an external debugger to

        set and manage breakpoints.

        set and manage breakpoints.

The ninth bit is reserved for a floating point enable bit.  When set, the

The ninth bit is reserved for a floating point enable bit.  When set, the

arithmetic for the next instruction will be sent to a floating point unit.

arithmetic for the next instruction will be sent to a floating point unit.

Line 414...

Line 421...

\caption{Conditions for conditional operand execution}\label{tbl:conditions}

\caption{Conditions for conditional operand execution}\label{tbl:conditions}

\end{center}

\end{center}

\end{table}

\end{table}

There is no condition code for less than or equal, not C or not V.  Sorry,

There is no condition code for less than or equal, not C or not V.  Sorry,

I ran out of space in 3--bits.  Using these conditions will take an extra

I ran out of space in 3--bits.  Using these conditions will take an extra

instruction.  (Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})

instruction and a pipeline stall.  (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})

\section{Operand B}

\section{Operand B}

Many instruction forms have a 21-bit source ``Operand B'' associated with them.

Many instruction forms have a 21-bit source ``Operand B'' associated with them.

This Operand B is either equal to a register plus a signed immediate offset,

This Operand B is either equal to a register plus a signed immediate offset,

or an immediate offset by itself.  This value is encoded as shown in

or an immediate offset by itself.  This value is encoded as shown in

Line 443...

Line 450...

immediate address.  Addresses are therefore encoded in the same fashion as

immediate address.  Addresses are therefore encoded in the same fashion as

Operand B's, shown above.

Operand B's, shown above.

A lot of long hard thought was put into whether to allow pre/post increment

A lot of long hard thought was put into whether to allow pre/post increment

and decrement addressing modes.  Finding no way to use these operators without

and decrement addressing modes.  Finding no way to use these operators without

taking two or more clocks per instruction, these addressing modes have been

taking two or more clocks per instruction,\footnote{The two clocks figure

comes from the design of the register set, allowing only one write per clock.

That write is either from the memory unit or the ALU, but never both.} these

addressing modes have been

removed from the realm of possibilities.  This means that the Zip CPU has no

removed from the realm of possibilities.  This means that the Zip CPU has no

native way of executing push, pop, return, or jump to subroutine operations.

native way of executing push, pop, return, or jump to subroutine operations.

Each of these instructions can be emulated with a set of instructions from the

Each of these instructions can be emulated with a set of instructions from the

existing set.

existing set.

Line 482...

Line 492...

rule that the register cannot be the PC or CC registers.  The PC register

rule that the register cannot be the PC or CC registers.  The PC register

field has been stolen to create a multiply by immediate instruction.  The

field has been stolen to create a multiply by immediate instruction.  The

CC register field is reserved.

CC register field is reserved.

\section{Floating Point}

\section{Floating Point}

The ZIP CPU does not support floating point operations today.  However, the

The ZIP CPU does not support floating point operations.  However, the

instruction set reserves a capability for a floating point operation.  To

instruction set reserves two possibilities for future floating point

execute such an operation, simply set the floating point bit in the CC

operations.

register and the following instruction will interpret its registers as

a floating point instruction.  Not all instructions, however, have floating

The first floating point operation hole in the instruction set involves

point equivalents.  Further, the immediate fields do not apply in floating

setting the floating point bit in the CC register.  The next instruction

point mode, and must be set to zero.  Not all instructions make sense as

will simply interpret its operands as floating point instructions.

floating point operations.  Therefore, only the CMP, SUB, ADD, and MPY

Not all instructions, however, have floating point equivalents.  Further, the

instructions may be issued as floating point instructions.  Other instructions

immediate fields do not apply in floating point mode, and must be set to

allow the examining of the floating point bit in the CC register.  In all

zero.  Not all instructions make sense as floating point operations.

cases, the floating point bit is cleared one instruction after it is set.

Therefore, only the CMP, SUB, ADD, and MPY instructions may be issued as

floating point instructions.  Other instructions allow the examining of the

floating point bit in the CC register.  In all cases, the floating point bit

is cleared one instruction after it is set.

The other possibility for floating point operations involves exploiting the

hole in the instruction set that the NOOP and BREAK instructions reside within.

These two instructions use 24--bits of address space.  A simple adjustment

to this space could create instructions with 4--bit register addresses for

each register, a 3--bit field for conditional execution, and a 2--bit field

for which operation.  In this fashion, such a floating point capability would

only fill 13--bits of the 24--bit field, still leaving lots of room for

expansion.
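
As a rough check of that bit budget, assume that ``each register'' above
means two register fields, one source and one destination.  This is an
assumption made for illustration; the encoding itself is not fixed here.
The fields would then add up as
\[
\underbrace{4 + 4}_{\mbox{registers}} + \underbrace{3}_{\mbox{condition}}
	+ \underbrace{2}_{\mbox{operation}} = 13 \mbox{ bits},
\]
which would leave $24 - 13 = 11$ bits of the hole unused.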

In both cases, the Zip CPU would support 32--bit single precision floats

only.

The architecture does not support a floating point not-implemented interrupt.

The current architecture does not support a floating point not-implemented

Any soft floating point emulation must be done deliberately.

interrupt.  Any soft floating point emulation must be done deliberately.

\section{Native Instructions}

\section{Native Instructions}

The instruction set for the Zip CPU is summarized in

The instruction set for the Zip CPU is summarized in

Tbl.~\ref{tbl:zip-instructions}.

Tbl.~\ref{tbl:zip-instructions}.

\begin{table}\begin{center}

\begin{table}\begin{center}

Line 592...

Line 617...

STO & \multicolumn{4}{l|}{4'h7}

STO & \multicolumn{4}{l|}{4'h7}

                & \multicolumn{4}{l|}{D. Reg}

                & \multicolumn{4}{l|}{D. Reg}

                & \multicolumn{3}{l|}{Cond.}

                & \multicolumn{3}{l|}{Cond.}

                & \multicolumn{21}{l|}{Operand B address}

                & \multicolumn{21}{l|}{Operand B address}

                & \\\hline

                & \\\hline

{\em Rsrd} & \multicolumn{4}{l|}{4'h8}

        &       \multicolumn{4}{l|}{R. Reg}

        &       \multicolumn{3}{l|}{Cond.}

        & 1'b0

        &       \multicolumn{20}{l|}{Reserved}

        & Yes \\\hline

SUB & \multicolumn{4}{l|}{4'h8}

SUB & \multicolumn{4}{l|}{4'h8}

        &       \multicolumn{4}{l|}{R. Reg}

        &       \multicolumn{4}{l|}{R. Reg}

        &       \multicolumn{3}{l|}{Cond.}

        &       \multicolumn{3}{l|}{Cond.}

        & 1'b1

        &       \multicolumn{21}{l|}{Operand B}

        &       \multicolumn{4}{l|}{Reg}

        &       \multicolumn{16}{l|}{16'bit signed offset}

        & Yes \\\hline

        & Yes \\\hline

AND & \multicolumn{4}{l|}{4'h9}

AND & \multicolumn{4}{l|}{4'h9}

        &       \multicolumn{4}{l|}{R. Reg}

        &       \multicolumn{4}{l|}{R. Reg}

        &       \multicolumn{3}{l|}{Cond.}

        &       \multicolumn{3}{l|}{Cond.}

        &       \multicolumn{21}{l|}{Operand B}

        &       \multicolumn{21}{l|}{Operand B}

Line 646...

Line 663...

\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}

\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}

\end{center}\end{table}

\end{center}\end{table}

As you can see, there's lots of room for instruction set expansion.  The

As you can see, there's lots of room for instruction set expansion.  The

NOOP and BREAK instructions are the only instructions within one particular

NOOP and BREAK instructions are the only instructions within one particular

24--bit hole.  Likewise, the subtract leaves half of its space open, since a

24--bit hole.  These spaces are reserved for future enhancements.  For example,

subtract immediate is the same as an add with a negated immediate.  This

floating point operations, consisting of a 3-bit floating point operation,

spaces are reserved for future enhancements.

two 4-bit registers, no immediate offset, and a 3-bit condition would fit

nicely into 14--bits of this address space--making it so that the floating

point bit in the CC register need not be used.

\section{Derived Instructions}

\section{Derived Instructions}

The ZIP CPU supports many other common instructions, but not all of them

The ZIP CPU supports many other common instructions, but not all of them

are single cycle instructions.  The derived instruction tables,

are single cycle instructions.  The derived instruction tables,

Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},

Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},

Line 860...

Line 879...

\caption{Derived Instructions, continued}\label{tbl:derived-3}

\caption{Derived Instructions, continued}\label{tbl:derived-3}

\end{center}\end{table}

\end{center}\end{table}

\iffalse

\iffalse

\fi

\fi

\section{Pipeline Stages}

\section{Pipeline Stages}

As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},

the Zip CPU supports a five stage pipeline.

\begin{enumerate}

\begin{enumerate}

\item {\bf Prefetch}: Read instruction from memory (cache if possible).  This

\item {\bf Prefetch}: Read instruction from memory (cache if possible).  This

        stage is actually pipelined itself, and so it will stall if the PC

        stage is actually pipelined itself, and so it will stall if the PC

        ever changes.  Stalls are also created here if the instruction isn't

        ever changes.  Stalls are also created here if the instruction isn't

        in the prefetch cache.

        in the prefetch cache.

\item {\bf Decode}: Decode instruction into op code, register(s) to read, and

\item {\bf Decode}: Decode instruction into op code, register(s) to read, and

        immediate offset.

        immediate offset.  This stage also determines whether the flags will

        be set or whether the result will be written back.

\item {\bf Read Operands}: Read registers and apply any immediate values to

\item {\bf Read Operands}: Read registers and apply any immediate values to

        them.  There is no means of detecting or flagging arithmetic overflow

        them.  There is no means of detecting or flagging arithmetic overflow

        or carry when adding the immediate to the operand.  This stage will

        or carry when adding the immediate to the operand.  This stage will

        stall if any source operand is pending.

        stall if any source operand is pending.

        A proper optimizing compiler, therefore, will schedule an instruction

        between the instruction that produces the result and the instruction

        that uses it.

\item Split into two tracks: An {\bf ALU} which will accomplish a simple

\item Split into two tracks: An {\bf ALU} which will accomplish a simple

        instruction, and the {\bf MemOps} stage which accomplishes memory

        instruction, and the {\bf MemOps} stage which accomplishes memory

        read/write.

        read/write.

        \begin{itemize}

        \begin{itemize}

        \item Loads stall instructions that access the register until it is

        \item Loads stall instructions that access the register until it is

                written to the register set.

                written to the register set.

        \item Condition codes are available upon completion

        \item Condition codes are available upon completion

        \item Issuing an instruction to the memory while the memory is busy will

        \item Issuing an instruction to the memory while the memory is busy will

                stall the bus.  If the bus deadlocks, only a reset will

                stall the entire pipeline.  If the bus deadlocks, only a reset

                release the CPU.  (Watchdog timer, anyone?)

                will release the CPU.  (Watchdog timer, anyone?)

        \item The Zip CPU currently has no means of reading and acting on any

        \item The Zip CPU currently has no means of reading and acting on any

        error conditions on the bus.

        error conditions on the bus.

        \end{itemize}

        \end{itemize}

\item {\bf Write-Back}: Conditionally write back the result to register set,

\item {\bf Write-Back}: Conditionally write back the result to the register

        applying the condition.  This routine is bi-re-entrant: either the

        set, applying the condition.  This routine is bi-re-entrant: either the

        memory or the simple instruction may request a register write.

        memory or the simple instruction may request a register write.

\end{enumerate}

\end{enumerate}

The Zip CPU does not support out of order execution.  Therefore, if the memory

The Zip CPU does not support out of order execution.  Therefore, if the memory

unit stalls, every other instruction stalls.  Memory stores, however, can take

unit stalls, every other instruction stalls.  Memory stores, however, can take

place concurrently with ALU operations, although memory writes cannot.

place concurrently with ALU operations, although memory reads cannot.

\section{Pipeline Logic}

\section{Pipeline Logic}

How the CPU handles some instruction combinations can be telling when

How the CPU handles some instruction combinations can be telling when

determining what happens in the pipeline.  The following lists some examples:

determining what happens in the pipeline.  The following lists some examples:

\begin{itemize}

\begin{itemize}

Line 923...

Line 942...

        R2 get, the value of R1 before the first move or the value of R0?

        R2 get, the value of R1 before the first move or the value of R0?

        Placing the value of R0 into R1 requires a pipeline stall, and possibly

        Placing the value of R0 into R1 requires a pipeline stall, and possibly

        two, as I have the pipeline designed.

        two, as I have the pipeline designed.

        The ZIP CPU architecture requires that R2 must equal R0 at the end of

        The ZIP CPU architecture requires that R2 must equal R0 at the end of

        this operation.  This may stall the pipeline 1-2 cycles.

        this operation.  Even better, such combinations do not (normally)

        stall the pipeline.

\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}

\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}

        At issue is the same item as above, save that the CMP instruction

        At issue is the same item as above, save that the CMP instruction

Line 942...

Line 962...

        At issue is the

        At issue is the

        fact that the logic supporting the CC register is more complicated than

        fact that the logic supporting the CC register is more complicated than

        the logic supporting any other register.

        the logic supporting any other register.

        The ZIP CPU will stall 1--2 cycles on this instruction, until the

        The ZIP CPU will stall for one cycle on this instruction.

        CC register is valid.

\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}

\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}

        At issue is whether or not the instruction following the jump will

        At issue is whether or not the instruction following the jump will

        take place before the jump.  In other words, is the MOV to the PC

        take place before the jump.  In other words, is the MOV to the PC

Line 991...

Line 1010...

        Because it isn't clear what would need to be canceled,

        Because it isn't clear what would need to be canceled,

        this instruction combination is not recommended.

        this instruction combination is not recommended.

\item {\bf All issued instructions complete.}

\item {\bf All issued instructions complete.}

        All stages are filled, or the entire pipeline

        All stages are filled, or the entire pipeline stalls.

        stalls.

        What about debug control?  What about

        What about debug control?  What about

        register writes taking an extra clock stage?  MOV R0,R1; MOV R1,R2

        register writes taking an extra clock stage?  MOV R0,R1; MOV R1,R2

        should place the value of R0 into R2.  How do you restart the pipeline

        should place the value of R0 into R2.  How do you restart the pipeline

        after an interrupt?  What address do you use?  The last issued

        after an interrupt?  What address do you use?  The last issued

Line 1014...

Line 1032...

        Suggestion: Suppose we load extra information in the two

        Suggestion: Suppose we load extra information in the two

        CC register(s) for debugging intermediate pipeline stages?

        CC register(s) for debugging intermediate pipeline stages?

        The next problem, though, is how to deal with the read operand

        The next problem, though, is how to deal with the read operand

        pipeline stage needing the result from the register pipeline.a

        pipeline stage needing the result from the register pipeline.

\item {\bf Memory instructions must complete}

\item {\bf Memory instructions must complete}

        All instructions that enter into the memory module *must*

        All instructions that enter into the memory module {\em must}

        complete.  Issued instructions from the prefetch, decode, or operand

        complete.  Issued instructions from the prefetch, decode, or operand

        read stages may or may not complete.  Jumps into code must be valid,

        read stages may or may not complete.  Jumps into code must be valid,

        so that interrupt returns may be valid.  All instructions entering the

        so that interrupt returns may be valid.  All instructions entering the

        ALU complete.

        ALU complete.

Line 1037...

Line 1055...

        result is known.  When the flag does go high, anything in the prefetch,

        result is known.  When the flag does go high, anything in the prefetch,

        decode, and read-op stage will be invalidated.

        decode, and read-op stage will be invalidated.

\end{itemize}

\end{itemize}

\section{Pipeline Stalls}

The processing pipeline can and will stall for a variety of reasons.  Some of

these are obvious, some less so.  These reasons are listed below:

\begin{itemize}

\item When the prefetch cache is exhausted

This should be obvious.  If the prefetch cache doesn't have the instruction

in memory, the entire pipeline must stall until enough of the prefetch cache

is loaded to support the next instruction.

\item While waiting for the pipeline to load following any taken branch, jump,

        return from interrupt or switch to interrupt context (6 clocks)

If the PC suddenly changes, the pipeline is subsequently cleared and needs to

be reloaded.  Given that there are five stages to the pipeline, that accounts

for five of the six delay clocks.  The last clock is lost in the prefetch

stage which needs at least one clock with a valid PC before it can produce

a new output.  Hence, six clocks will always be lost anytime the pipeline needs

to be cleared.

\item When reading from a prior register while also adding an immediate offset

\begin{enumerate}

\item\ {\tt OPCODE ?,RA}

\item\ {\em (stall)}

\item\ {\tt OPCODE I+RA,RB}

\end{enumerate}

Since the addition of the immediate to the register within OpB decoding is applied

during the read operand stage so that it can be nicely settled before the ALU,

any instruction that will write back an operand must be separated from the

opcode that will read and apply an immediate offset by one instruction.  The

good news is that this stall can easily be mitigated by proper scheduling.

\item When writing to the CC or PC Register

\begin{enumerate}

\item\ {\tt OPCODE RA,PC} {\em Ex: a branch opcode}

\item\ {\em (stall, even if jump not taken)}

\item\ {\tt OPCODE RA,RB}

\end{enumerate}

Since branches take place in the writeback stage, the Zip CPU will stall the

pipeline for one clock anytime there may be a possible jump.  This prevents

an instruction from executing a memory access after the jump but before the

jump is recognized.

\item When reading from the CC register after setting the flags

\begin{enumerate}

\item\ {\tt ALUOP RA,RB}

\item\ {\em (stall)}

\item\ {\tt TST sys.ccv,CC}

\item\ {\tt BZ somewhere}

\end{enumerate}

The reason for this stall is simply performance.  Many of the flags are

determined via combinatorial logic after the writeback instruction is

determined.  Trying to then place these into the input for one of the operands

created a time delay loop that would no longer execute in a single 100~MHz

clock cycle.  (The time delay of the multiply within the ALU wasn't helping

either \ldots).

\item When waiting for a memory read operation to complete

\begin{enumerate}

\item\ {\tt LOD address,RA}

\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}

\item\ {\tt OPCODE I+RA,RB}

\end{enumerate}

Remember, the ZIP CPU does not support out of order execution.  Therefore,

anytime the memory unit becomes busy both the memory unit and the ALU must

stall until the memory unit is cleared.  This is especially true of a load

instruction, which will write its operand back to the register file.  Store

instructions are different, since they can be busy with no impact on later

ALU write back operations.  Hence, only loads stall the pipeline.

This also assumes that the memory being accessed is a single cycle memory.

Slower memories, such as the Quad SPI flash, will take longer--perhaps even

as long as forty clocks.   During this time the CPU and the external bus

will be busy, and unable to do anything else.

\item Memory operation followed by a memory operation

\begin{enumerate}

\item\ {\tt STO address,RA}

\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}

\item\ {\tt LOD address,RB}

\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}

\end{enumerate}

In this case, the LOD instruction cannot start until the STO is finished.

With proper scheduling, it is possible to do something in the ALU while the

STO is busy, but otherwise this pipeline will stall waiting for it to complete.

Note that even though the Wishbone bus can support pipelined accesses at

one access per clock, only the prefetch stage can take advantage of this.

Load and Store instructions are stuck at one wishbone cycle per instruction.

\end{itemize}
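
As a rough illustration of the taken--branch entry in the list above, the lost
clocks break down as five pipeline stages to refill plus one clock for the
prefetch to see a valid PC.  Writing $N_{\mathrm{taken}}$ for the number of
taken branches, jumps, and context switches in a stretch of code (a symbol
introduced here only for this sketch), and ignoring cache misses and memory
stalls, the clocks lost to pipeline clears would be about
\[
C_{\mathrm{lost}} \approx (5 + 1)\, N_{\mathrm{taken}} = 6\, N_{\mathrm{taken}}.
\]
For example, a loop whose only taken branch is the one closing each iteration
would, under these assumptions, give up roughly six clocks per iteration to
that branch.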

\chapter{Peripherals}\label{chap:periph}

\chapter{Peripherals}\label{chap:periph}

While the previous chapter describes a CPU in isolation, the Zip System

While the previous chapter describes a CPU in isolation, the Zip System

Line 1120...

Line 1232...

The watchdog timer is no different from any of the other timers, save for one
critical difference: the interrupt line from the watchdog
timer is tied to the reset line of the CPU.  Hence writing a `1' to the
watchdog timer will always reset the CPU.
To stop the Watchdog timer, write a `0' to it.  To start it,
write any other number to it---as with the other timers.
While the watchdog timer supports interval mode, it doesn't make as much sense
as it did with the other timers.
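
As a rough illustration of the rules above, the following C sketch wraps the
watchdog register in two helper functions.  The names {\tt wdt\_kick} and
{\tt wdt\_stop} are hypothetical and are not part of the Zip System.  The
register is passed in as a pointer, so the same code can be tried against an
ordinary variable before it is pointed at the real WDT address.
\begin{verbatim}
#include <stdint.h>

/* Start (or reload) the watchdog so that it expires `ticks` clocks from now.
 * Hypothetical helper: it refuses 0, which would stop the timer, and 1,
 * which the text above says always resets the CPU.
 * Returns 0 on success, -1 if the value was rejected. */
int wdt_kick(volatile uint32_t *wdt, uint32_t ticks)
{
    if (ticks < 2)
        return -1;      /* leave the register untouched */
    *wdt = ticks;
    return 0;
}

/* Stop the watchdog by writing a zero to it. */
void wdt_stop(volatile uint32_t *wdt)
{
    *wdt = 0;
}
\end{verbatim}
A main loop would then call {\tt wdt\_kick} more often than once every
{\tt ticks} clocks, so that the timer never reaches the point of resetting
the CPU while the software is still making progress.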

% ... (lines omitted in this diff; rev 32 resumes at line 1265)
The purpose of this register is to support alarm times within a CPU.  To
set an alarm for a particular process $N$ clocks in advance, read the current
Jiffies value, add $N$, and write the sum back to the Jiffies register.  The
O/S must also keep track of values written to the Jiffies register.  Thus,
when an `alarm' trips, it should be removed from the list of alarms, the list
should be sorted, and the next alarm in terms of Jiffies should be written
to the register.
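
The bookkeeping described above might look something like the C sketch below.
Everything in it is an assumption made for illustration: the names, the fixed
list size, and the two {\tt sim\_} variables that stand in for reading and
writing the Jiffies register.  The signed subtraction used to order alarms
across a counter wrap is likewise my own choice, and not something this
document specifies.
\begin{verbatim}
#include <stdint.h>
#include <stddef.h>

#define MAX_ALARMS 8                 /* arbitrary size for this sketch */

/* Stand-ins for the Jiffies register (assumptions, not hardware). */
static uint32_t sim_jiffies_now;     /* value a read would return    */
static uint32_t sim_jiffies_alarm;   /* last value written back      */

static uint32_t alarms[MAX_ALARMS];  /* pending alarms, soonest first */
static size_t   n_alarms;

/* Clocks remaining until `when`, as a signed value so that the
 * comparison still works if the counter wraps. */
static int32_t jiffies_until(uint32_t when)
{
    return (int32_t)(when - sim_jiffies_now);
}

/* Set an alarm n clocks from now.  Returns 0, or -1 if the list is full. */
int alarm_set(uint32_t n)
{
    if (n_alarms == MAX_ALARMS)
        return -1;
    uint32_t when = sim_jiffies_now + n;   /* read Jiffies, add N */
    size_t i = n_alarms++;
    /* Insertion sort keeps the list ordered by time remaining. */
    while (i > 0 && jiffies_until(alarms[i - 1]) > jiffies_until(when)) {
        alarms[i] = alarms[i - 1];
        i--;
    }
    alarms[i] = when;
    sim_jiffies_alarm = alarms[0];         /* write the next alarm */
    return 0;
}

/* Call when an alarm trips: drop it and write the next one, if any. */
void alarm_tripped(void)
{
    if (n_alarms == 0)
        return;
    for (size_t i = 1; i < n_alarms; i++)
        alarms[i - 1] = alarms[i];
    n_alarms--;
    if (n_alarms > 0)
        sim_jiffies_alarm = alarms[0];
}
\end{verbatim}
Keeping the list sorted at insertion time means the handler never has to
search: the next value to write is always the first entry.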

\section{Manual Cache}

The manual cache is an experimental setting that may not remain with the Zip
CPU for very long.  It is designed to facilitate running from FLASH or ROM
memory, although the pipeline prefetch cache really makes this need obsolete.
The manual cache works by copying data from a wishbone address (range) into
the cache
register, and then by making that memory available as memory to the Zip System.
It is a {\em manual cache} because the processor must first specify what
memory to copy, and then once copied the processor can only access the cache
memory by the cache memory location.  There is no transparency.  It is perhaps
% ... (lines omitted in this diff; rev 32 resumes at line 1294)

The ZipSystem registers fall into two categories, ZipSystem internal registers
accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},
\begin{table}[htbp]
\begin{center}\begin{reglist}
PIC   & \scalebox{0.8}{\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
WDT   & \scalebox{0.8}{\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
CCHE  & \scalebox{0.8}{\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
CTRIC & \scalebox{0.8}{\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
TMRA  & \scalebox{0.8}{\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
TMRB  & \scalebox{0.8}{\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
TMRC  & \scalebox{0.8}{\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
JIFF  & \scalebox{0.8}{\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
MTASK  & \scalebox{0.8}{\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
MMSTL  & \scalebox{0.8}{\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline
MPSTL  & \scalebox{0.8}{\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
MICNT  & \scalebox{0.8}{\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline
UTASK  & \scalebox{0.8}{\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline
UMSTL  & \scalebox{0.8}{\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline
UPSTL  & \scalebox{0.8}{\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
UICNT  & \scalebox{0.8}{\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline
% Cache  & \scalebox{0.8}{\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
\end{reglist}
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
\end{center}\end{table}
and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.

\begin{table}[htbp]
% ... (lines omitted in this diff; rev 32 resumes at line 1325)
\caption{Zip System Debug Registers}\label{tbl:dbgregs}
\end{center}\end{table}
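
For software that needs these addresses, the internal register map of
Tbl.~\ref{tbl:zpregs} could be captured in a small C header along the
following lines.  This is only a sketch: the identifiers are hypothetical, and
it assumes, as the consecutive addresses in the table suggest, that each
address names one 32--bit word, so that a register's address is simply the
base plus its position in the table.
\begin{verbatim}
#include <stdint.h>

#define ZIP_PERIPH_BASE 0xc0000000u  /* address of the PIC, first entry */

/* Register positions, in the order given in the register table. */
enum zip_periph {
    ZIP_PIC = 0, ZIP_WDT,   ZIP_CCHE,  ZIP_CTRIC,
    ZIP_TMRA,    ZIP_TMRB,  ZIP_TMRC,  ZIP_JIFF,
    ZIP_MTASK,   ZIP_MMSTL, ZIP_MPSTL, ZIP_MICNT,
    ZIP_UTASK,   ZIP_UMSTL, ZIP_UPSTL, ZIP_UICNT
};

/* Bus address of an internal register: base plus word index. */
static inline uint32_t zip_periph_addr(enum zip_periph r)
{
    return ZIP_PERIPH_BASE + (uint32_t)r;
}
\end{verbatim}
Deriving every address from a single base keeps the map in one place, which
helps avoid the kind of copy and paste slip where several registers end up
listed at the same address.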

\chapter{Wishbone Datasheet}\label{chap:wishbone}

The Zip System supports two wishbone ports, a slave debug port and a master
port for the system itself.  These are shown in Tbl.~\ref{tbl:wishbone-slave}
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
% ... (lines omitted in this diff; rev 32 resumes at line 1392)

it choose to access a value not on the bus, or a peripheral that is not
yet properly configured.

\chapter{Clocks}\label{chap:clocks}

This core is based upon the Basys--3 development board sold by Digilent.
The Basys--3 development board contains one external 100~MHz clock, which is
sufficient to run the ZIP CPU core.
\begin{table}[htbp]
\begin{center}
\begin{clocklist}
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
\end{clocklist}
