URL https://opencores.org/ocsvn/zipcpu/zipcpu/trunk

# Subversion Repositories zipcpu

## Compare Revisions

• This comparison shows the changes necessary to convert path / from Rev 31 to Rev 32

## Rev 31 → Rev 32

/zipcpu/trunk/doc/spec.pdf Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream
/zipcpu/trunk/doc/src/spec.tex
103,7 → 103,7
  OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has
  a lot of features of modern CPUs within it that ... well, let's just say it's
  not the little guy on the block. The Zip CPU is lighter weight, costing only
- about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic
+ about 2,300 LUTs with no peripherals, and 3,200 LUTs with some very basic
  peripherals.

  My final reason is that I'm building the Zip CPU as a learning experience. The
332,12 → 332,15
  wait for an interrupt (if interrupts are enabled), or to
  completely halt (if interrupts are disabled). The sixth bit is a
  global interrupt enable bit (GIE). When this
- sixth bit is a '1' interrupts will be enabled, else disabled. When
+ sixth bit is a `1' interrupts will be enabled, else disabled. When
  interrupts are disabled, the CPU will be in supervisor mode, otherwise
  it is in user mode. Thus, to execute a context switch, one only
  need enable or disable interrupts. (When an interrupt line goes
  high, interrupts will automatically be disabled, as the CPU goes
- and deals with its context switch.)
+ and deals with its context switch.) Special logic has been added to
+ keep the user mode from setting the sleep register and clearing the
+ GIE register at the same time, with clearing the GIE register taking
+ precedence.

  The seventh bit is a step bit. This bit can be
  set from supervisor mode only. After setting this bit, should
359,6 → 362,10
  halt the CPU independent of the break enable bit. This bit can only be set
  within supervisor mode.

+ % Should break enable be a supervisor mode bit, while the break enable bit
+ % in user mode is a break has taken place bit?
+ %
+
  This functionality was added to enable an external debugger to
  set and manage breakpoints.

416,7 → 423,7
  \end{table}
  There is no condition code for less than or equal, not C or not V. Sorry,
  I ran out of space in 3--bits. Using these conditions will take an extra
- instruction. (Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
+ instruction and a pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})

  \section{Operand B}
  Many instruction forms have a 21-bit source ``Operand B'' associated with them.
445,7 → 452,10

  A lot of long hard thought was put into whether to allow pre/post increment
  and decrement addressing modes. Finding no way to use these operators without
- taking two or more clocks per instruction, these addressing modes have been
+ taking two or more clocks per instruction,\footnote{The two clocks figure
+ comes from the design of the register set, allowing only one write per clock.
+ That write is either from the memory unit or the ALU, but never both.} these
+ addressing modes have been
  removed from the realm of possibilities. This means that the Zip CPU has no
  native way of executing push, pop, return, or jump to subroutine operations.
  Each of these instructions can be emulated with a set of instructions from the
484,21 → 494,36
  CC register field is reserved.

  \section{Floating Point}
- The ZIP CPU does not support floating point operations today. However, the
- instruction set reserves a capability for a floating point operation. To
- execute such an operation, simply set the floating point bit in the CC
- register and the following instruction will interpret its registers as
- a floating point instruction. Not all instructions, however, have floating
- point equivalents. Further, the immediate fields do not apply in floating
- point mode, and must be set to zero. Not all instructions make sense as
- floating point operations. Therefore, only the CMP, SUB, ADD, and MPY
- instructions may be issued as floating point instructions. Other instructions
- allow the examining of the floating point bit in the CC register. In all
- cases, the floating point bit is cleared one instruction after it is set.
+ The ZIP CPU does not support floating point operations. However, the
+ instruction set reserves two possibilities for future floating point
+ operations.

- The architecture does not support a floating point not-implemented interrupt.
- Any soft floating point emulation must be done deliberately.
+ The first floating point operation hole in the instruction set involves
+ setting the floating point bit in the CC register. The next instruction
+ will simply interpret its operands as floating point instructions.
+ Not all instructions, however, have floating point equivalents. Further, the
+ immediate fields do not apply in floating point mode, and must be set to
+ zero. Not all instructions make sense as floating point operations.
+ Therefore, only the CMP, SUB, ADD, and MPY instructions may be issued as
+ floating point instructions. Other instructions allow the examining of the
+ floating point bit in the CC register. In all cases, the floating point bit
+ is cleared one instruction after it is set.
+
+ The other possibility for floating point operations involves exploiting the
+ hole in the instruction set that the NOOP and BREAK instructions reside within.
+ These two instructions use 24--bits of address space. A simple adjustment
+ to this space could create instructions with 4--bit register addresses for
+ each register, a 3--bit field for conditional execution, and a 2--bit field
+ for which operation. In this fashion, such a floating point capability would
+ only fill 13--bits of the 24--bit field, still leaving lots of room for
+ expansion.
+
+ In both cases, the Zip CPU would support 32--bit single precision floats only.
+
+ The current architecture does not support a floating point not-implemented
+ interrupt. Any soft floating point emulation must be done deliberately.

  \section{Native Instructions}
  The instruction set for the Zip CPU is summarized in
  Tbl.~\ref{tbl:zip-instructions}.
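The bit budget quoted in the added floating point text (two 4-bit register fields, a 3-bit condition, and a 2-bit operation filling 13 of 24 bits) can be checked with a short Python sketch. The spec text gives only the field widths; the field order used here, and the names `FIELDS`, `pack`, and `unpack`, are assumptions made purely for illustration.

```python
# Field widths come from the Rev 32 text; the ORDER of the fields is an
# assumption, since the spec does not say where each field would sit.
FIELDS = (("op", 2), ("cond", 3), ("ra", 4), ("rb", 4))
HOLE_BITS = 24  # size of the NOOP/BREAK hole named in the text


def total_bits(fields=FIELDS):
    """Sum of all field widths."""
    return sum(width for _, width in fields)


def pack(values, fields=FIELDS):
    """Pack a dict of field values into one integer, first field highest."""
    word = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word, fields=FIELDS):
    """Inverse of pack(): split an integer back into its named fields."""
    out = {}
    for name, width in reversed(fields):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


if __name__ == "__main__":
    used = total_bits()
    print(f"{used} bits used, {HOLE_BITS - used} bits left in the hole")
```

Widening `op` to three bits, as the later hunk on instruction set expansion suggests, gives the 14-bit figure quoted there.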
594,18 → 619,10
  & \multicolumn{3}{l|}{Cond.}
  & \multicolumn{21}{l|}{Operand B address}
  & \\\hline
- {\em Rsrd} & \multicolumn{4}{l|}{4'h8}
- & \multicolumn{4}{l|}{R. Reg}
- & \multicolumn{3}{l|}{Cond.}
- & 1'b0
- & \multicolumn{20}{l|}{Reserved}
- & Yes \\\hline
  SUB & \multicolumn{4}{l|}{4'h8}
  & \multicolumn{4}{l|}{R. Reg}
  & \multicolumn{3}{l|}{Cond.}
- & 1'b1
- & \multicolumn{4}{l|}{Reg}
- & \multicolumn{16}{l|}{16'bit signed offset}
+ & \multicolumn{21}{l|}{Operand B}
  & Yes \\\hline
  AND & \multicolumn{4}{l|}{4'h9}
  & \multicolumn{4}{l|}{R. Reg}
648,9 → 665,11

  As you can see, there's lots of room for instruction set expansion. The
  NOOP and BREAK instructions are the only instructions within one particular
- 24--bit hole. Likewise, the subtract leaves half of its space open, since a
- subtract immediate is the same as an add with a negated immediate. This
- spaces are reserved for future enhancements.
+ 24--bit hole. This spaces are reserved for future enhancements. For example,
+ floating point operations, consisting of a 3-bit floating point operation,
+ two 4-bit registers, no immediate offset, and a 3-bit condition would fit
+ nicely into 14--bits of this address space--making it so that the floating
+ point bit in the CC register need not be used.

  \section{Derived Instructions}
  The ZIP CPU supports many other common instructions, but not all of them
862,6 → 881,8
  \iffalse
  \fi
  \section{Pipeline Stages}
+ As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},
+ the Zip CPU supports a five stage pipeline.
  \begin{enumerate}
  \item {\bf Prefetch}: Read instruction from memory (cache if possible). This
  stage is actually pipelined itself, and so it will stall if the PC
868,14 → 889,12
  ever changes. Stalls are also created here if the instruction isn't
  in the prefetch cache.
  \item {\bf Decode}: Decode instruction into op code, register(s) to read, and
- immediate offset.
+ immediate offset. This stage also determines whether the flags will
+ be set or whether the result will be written back.
  \item {\bf Read Operands}: Read registers and apply any immediate values to
  them. There is no means of detecting or flagging arithmetic overflow
  or carry when adding the immediate to the operand. This stage will
  stall if any source operand is pending.
- A proper optimizing compiler, therefore, will schedule an instruction
- between the instruction that produces the result and the instruction
- that uses it.
  \item Split into two tracks: An {\bf ALU} which will accomplish a simple
  instruction, and the {\bf MemOps} stage which accomplishes memory
  read/write.
884,19 → 903,19
  written to the register set.
  \item Condition codes are available upon completion
  \item Issuing an instruction to the memory while the memory is busy will
- stall the bus. If the bus deadlocks, only a reset will
- release the CPU. (Watchdog timer, anyone?)
+ stall the entire pipeline. If the bus deadlocks, only a reset
+ will release the CPU. (Watchdog timer, anyone?)
  \item The Zip CPU currently has no means of reading and acting on any
  error conditions on the bus.
  \end{itemize}
- \item {\bf Write-Back}: Conditionally write back the result to register set,
- applying the condition. This routine is bi-re-entrant: either the
+ \item {\bf Write-Back}: Conditionally write back the result to the register
+ set, applying the condition. This routine is bi-re-entrant: either the
  memory or the simple instruction may request a register write.
  \end{enumerate}

  The Zip CPU does not support out of order execution. Therefore, if the memory
  unit stalls, every other instruction stalls. Memory stores, however, can take
- place concurrently with ALU operations, although memory writes cannot.
+ place concurrently with ALU operations, although memory reads cannot.

  \section{Pipeline Logic}
  How the CPU handles some instruction combinations can be telling when
925,7 → 944,8
  two, as I have the pipeline designed.

  The ZIP CPU architecture requires that R2 must equal R0 at the end of
- this operation. This may stall the pipeline 1-2 cycles.
+ this operation. Even better, such combinations do not (normally)
+ stall the pipeline.

  \item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}

944,8 → 964,7
  fact that the logic supporting the CC register is more complicated than
  the logic supporting any other register.

- The ZIP CPU will stall 1--2 cycles on this instruction, until the
- CC register is valid.
+ The ZIP CPU will stall for a cycle cycle on this instruction.

  \item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}

993,8 → 1012,7

  \item {\bf All issued instructions complete.}

- All stages are filled, or the entire pipeline
- stalls.
+ All stages are filled, or the entire pipeline stalls.

  What about debug control? What about
  register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2
1016,11 → 1034,11
  CC register(s) for debugging intermediate pipeline stages?

  The next problem, though, is how to deal with the read operand
- pipeline stage needing the result from the register pipeline.a
+ pipeline stage needing the result from the register pipeline.

  \item {\bf Memory instructions must complete}

- All instructions that enter into the memory module *must*
+ All instructions that enter into the memory module {\em must}
  complete. Issued instructions from the prefetch, decode, or operand
  read stages may or may not complete. Jumps into code must be valid,
  so that interrupt returns may be valid. All instructions entering the
1039,8 → 1057,102

  \end{itemize}

+ \section{Pipeline Stalls}
+ The processing pipeline can and will stall for a variety of reasons. Some of
+ these are obvious, some less so. These reasons are listed below:
+ \begin{itemize}
+ \item When the prefetch cache is exhausted
+
+ This should be obvious. If the prefetch cache doesn't have the instruction
+ in memory, the entire pipeline must stall until enough of the prefetch cache
+ is loaded to support the next instruction.
+
+ \item While waiting for the pipeline to load following any taken branch, jump,
+ return from interrupt or switch to interrupt context (6 clocks)
+
+ If the PC suddenly changes, the pipeline is subsequently cleared and needs to
+ be reloaded. Given that there are five stages to the pipeline, that accounts
+ for five of the six delay clocks. The last clock is lost in the prefetch
+ stage which needs at least one clock with a valid PC before it can produce
+ a new output. Hence, six clocks will always be lost anytime the pipeline needs
+ to be cleared.
+
+ \item When reading from a prior register while also adding an immediate offset
+ \begin{enumerate}
+ \item\ {\tt OPCODE ?,RA}
+ \item\ {\em (stall)}
+ \item\ {\tt OPCODE I+RA,RB}
+ \end{enumerate}
+
+ Since the addition of the immediate register within OpB decoding gets applied
+ during the read operand stage so that it can be nicely settled before the ALU,
+ any instruction that will write back an operand must be separated from the
+ opcode that will read and apply an immediate offset by one instruction. The
+ good news is that this stall can easily be mitigated by proper scheduling.
+
+ \item When writing to the CC or PC Register
+ \begin{enumerate}
+ \item\ {\tt OPCODE RA,PC} {\em Ex: a branch opcode}
+ \item\ {\em (stall, even if jump not taken)}
+ \item\ {\tt OPCODE RA,RB}
+ \end{enumerate}
+ Since branches take place in the writeback stage, the Zip CPU will stall the
+ pipeline for one clock anytime there may be a possible jump.
+ This prevents an instruction from executing a memory access after the jump
+ but before the jump is recognized.
+
+ \item When reading from the CC register after setting the flags
+ \begin{enumerate}
+ \item\ {\tt ALUOP RA,RB}
+ \item\ {\em (stall}
+ \item\ {\tt TST sys.ccv,CC}
+ \item\ {\tt BZ somewhere}
+ \end{enumerate}
+
+ The reason for this stall is simply performance. Many of the flags are
+ determined via combinatorial logic after the writeback instruction is
+ determined. Trying to then place these into the input for one of the operands
+ created a time delay loop that would no longer execute in a single 100~MHz
+ clock cycle. (The time delay of the multiply within the ALU wasn't helping
+ either \ldots).
+
+ \item When waiting for a memory read operation to complete
+ \begin{enumerate}
+ \item\ {\tt LOD address,RA}
+ \item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
+ \item\ {\tt OPCODE I+RA,RB}
+ \end{enumerate}
+
+ Remember, the ZIP CPU does not support out of order execution. Therefore,
+ anytime the memory unit becomes busy both the memory unit and the ALU must
+ stall until the memory unit is cleared. This is especially true of a load
+ instruction, which will write its operand back to the register file. Store
+ instructions are different, since they can be busy with no impact on later
+ ALU write back operations. Hence, only loads stall the pipeline.
+
+ This also assumes that the memory being accessed is a single cycle memory.
+ Slower memories, such as the Quad SPI flash, will take longer--perhaps even
+ as long as fourty clocks. During this time the CPU and the external bus
+ will be busy, and unable to do anything else.
+
+ \item Memory operation followed by a memory operation
+ \begin{enumerate}
+ \item\ {\tt STO address,RA}
+ \item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
+ \item\ {\tt LOD address,RB}
+ \item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
+ \end{enumerate}
+
+ In this case, the LOD instruction cannot start until the STALL is finished.
+ With proper scheduling, it is possible to do something in the ALU while the
+ STO is busy, but otherwise this pipeline will stall waiting for it to complete.
+
+ Note that even though the Wishbone bus can support pipelined accesses at
+ one access per clock, only the prefetch stage can take advantage of this.
+ Load and Store instructions are stuck at one wishbone cycle per instruction.
+ \end{itemize}
+


  \chapter{Peripherals}\label{chap:periph}

  While the previous chapter describes a CPU in isolation, the Zip System
1122,7 → 1234,7
  critical difference: the interrupt line from the watchdog timer is tied to the
  reset line of the CPU. Hence writing a `1' to the
  watchdog timer will always reset the CPU.
- To stop the Watchdog timer, write a '0' to it. To start it,
+ To stop the Watchdog timer, write a `0' to it. To start it,
  write any other number to it---as with the other timers.

  While the watchdog timer supports interval mode, it doesn't make as much sense
1155,7 → 1267,7
  set an alarm for a particular process $N$ clocks in advance, read the current
  Jiffies value, and $N$, and write it back to the Jiffies register. The
  O/S must also keep track of values written to the Jiffies register. Thus,
- when an `alarm' trips, it should be remoed from the list of alarms, the list
+ when an `alarm' trips, it should be removed from the list of alarms, the list
  should be sorted, and the next alarm in terms of Jiffies should be written
  to the register.

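The alarm bookkeeping described in this hunk can be sketched in Python as follows. This is only an illustration under stated assumptions: the context line's "and $N$" is read as "add $N$"; the `read_jiffies` and `write_jiffies` callables are hypothetical stand-ins for accesses to the Jiffies register; and the counter is treated as a wrapping 32-bit value, so "next in terms of Jiffies" is taken to mean the smallest distance ahead of the current count modulo 2^32.

```python
MASK32 = 0xFFFFFFFF  # assumes the Jiffies counter wraps at 32 bits


class AlarmList:
    """Software list of pending Jiffies alarms (illustrative sketch only)."""

    def __init__(self, read_jiffies, write_jiffies):
        # Both callables are hypothetical stand-ins for register accesses.
        self.read_jiffies = read_jiffies
        self.write_jiffies = write_jiffies
        self.deadlines = []

    def _program_next(self):
        """Sort by distance ahead of 'now' and write the nearest deadline."""
        if not self.deadlines:
            return
        now = self.read_jiffies()
        self.deadlines.sort(key=lambda d: (d - now) & MASK32)
        self.write_jiffies(self.deadlines[0])

    def set_alarm(self, n_clocks):
        """Read the current Jiffies value, add N, and record the deadline."""
        deadline = (self.read_jiffies() + n_clocks) & MASK32
        self.deadlines.append(deadline)
        self._program_next()
        return deadline

    def on_interrupt(self):
        """Drop the alarm that tripped, re-sort, and program the next one."""
        fired = self.deadlines.pop(0)
        self._program_next()
        return fired
```

The modular sort key keeps the ordering correct across a counter wrap. It does not handle a deadline that has already passed by the time the list is re-sorted, which a real scheduler would need to treat separately.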
1163,7 → 1275,8

  The manual cache is an experimental setting that may not remain with the Zip
  CPU for very long. It is designed to facilitate running from FLASH or ROM
- memory, although the pipe cache really makes this need obsolete. The manual
+ memory, although the pipeline prefetch cache really makes this need obsolete.
+ The manual
  cache works by copying data from a wishbone address (range) into the cache
  register, and then by making that memory available as memory to the Zip System.
  It is a {\em manual cache} because the processor must first specify what
1183,23 → 1296,23
  accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},
  \begin{table}[htbp]
  \begin{center}\begin{reglist}
- PIC & {\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
- WDT & {\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
- CCHE & {\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
- CTRIC & {\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
- TMRA & {\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
- TMRB & {\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
- TMRC & {\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
- JIFF & {\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
- MTASK & {\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
- MMSTL & {\tt 0xc0000008} & 32 & R/W & Master Stall Counter \\\hline
- MPSTL & {\tt 0xc0000008} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
- MICNT & {\tt 0xc0000008} & 32 & R/W & Master Instruction Counter\\\hline
- UTASK & {\tt 0xc0000008} & 32 & R/W & User Task Clock Counter \\\hline
- UMSTL & {\tt 0xc0000008} & 32 & R/W & User Stall Counter \\\hline
- UPSTL & {\tt 0xc0000008} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
- UICNT & {\tt 0xc0000008} & 32 & R/W & User Instruction Counter\\\hline
- Cache & {\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
+ PIC & \scalebox{0.8}{\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
+ WDT & \scalebox{0.8}{\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
+ CCHE & \scalebox{0.8}{\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
+ CTRIC & \scalebox{0.8}{\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
+ TMRA & \scalebox{0.8}{\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
+ TMRB & \scalebox{0.8}{\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
+ TMRC & \scalebox{0.8}{\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
+ JIFF & \scalebox{0.8}{\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
+ MTASK & \scalebox{0.8}{\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
+ MMSTL & \scalebox{0.8}{\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline
+ MPSTL & \scalebox{0.8}{\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
+ MICNT & \scalebox{0.8}{\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline
+ UTASK & \scalebox{0.8}{\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline
+ UMSTL & \scalebox{0.8}{\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline
+ UPSTL & \scalebox{0.8}{\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
+ UICNT & \scalebox{0.8}{\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline
+ % Cache & \scalebox{0.8}{\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
  \end{reglist}
  \caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
  \end{center}\end{table}
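Beyond the `\scalebox` formatting, this hunk gives each performance counter its own address; Rev 31 listed all eight at `0xc0000008`. A small Python sketch, using only the register names and addresses from the table, shows a collision check that would flag the Rev 31 listing. The names `REV31_MAP`, `REV32_MAP`, and `address_collisions` are illustrative, not part of the Zip CPU sources, and the Cache entry that Rev 32 comments out is left aside.

```python
ZIP_SYSTEM_BASE = 0xC0000000

# Register names in table order, as listed in the spec.
NAMES = [
    "PIC", "WDT", "CCHE", "CTRIC", "TMRA", "TMRB", "TMRC", "JIFF",
    "MTASK", "MMSTL", "MPSTL", "MICNT", "UTASK", "UMSTL", "UPSTL", "UICNT",
]

# Rev 32: one consecutive address per register, 0xc0000000..0xc000000f.
REV32_MAP = {name: ZIP_SYSTEM_BASE + i for i, name in enumerate(NAMES)}

# Rev 31: the eight counters were all listed at 0xc0000008.
REV31_MAP = dict(REV32_MAP)
for counter in NAMES[8:]:
    REV31_MAP[counter] = 0xC0000008


def address_collisions(regmap):
    """Return {address: [names]} for every address used more than once."""
    by_addr = {}
    for name, addr in regmap.items():
        by_addr.setdefault(addr, []).append(name)
    return {addr: names for addr, names in by_addr.items() if len(names) > 1}


if __name__ == "__main__":
    for addr, names in address_collisions(REV31_MAP).items():
        print(f"Rev 31: {', '.join(names)} share {addr:#010x}")
    print("Rev 32 collisions:", address_collisions(REV32_MAP))
```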
1214,7 → 1327,7


  \chapter{Wishbone Datasheet}\label{chap:wishbone}
- The Zip System supports two wishbone accesses, a slave debug port and a master
+ The Zip System supports two wishbone ports, a slave debug port and a master
  port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}
  \begin{table}[htbp]
  \begin{center}
1281,9 → 1394,9

  \chapter{Clocks}\label{chap:clocks}

- This core is based upon the Basys--3 design. The Basys--3 development board
- contains one external 100~MHz clock, which is sufficient to run the ZIP CPU
- core.
+ This core is based upon the Basys--3 development board sold by Digilent.
+ The Basys--3 development board contains one external 100~MHz clock, which is
+ sufficient to run the ZIP CPU core.
  \begin{table}[htbp]
  \begin{center}
  \begin{clocklist}