URL
https://opencores.org/ocsvn/zipcpu/zipcpu/trunk
Subversion Repositories zipcpu
Compare Revisions
- This comparison shows the changes necessary to convert path
/zipcpu/trunk/doc
- from Rev 24 to Rev 32
Rev 24 → Rev 32
/spec.pdf
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
/src/spec.tex
103,7 → 103,7
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has |
a lot of features of modern CPUs within it that ... well, let's just say it's |
not the little guy on the block. The Zip CPU is lighter weight, costing only |
about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic |
about 2,300 LUTs with no peripherals, and 3,200 LUTs with some very basic |
peripherals. |
|
My final reason is that I'm building the Zip CPU as a learning experience. The |
332,12 → 332,15
wait for an interrupt (if interrupts are enabled), or to |
completely halt (if interrupts are disabled). |
The sixth bit is a global interrupt enable bit (GIE). When this |
sixth bit is a '1' interrupts will be enabled, else disabled. When |
sixth bit is a `1' interrupts will be enabled, else disabled. When |
interrupts are disabled, the CPU will be in supervisor mode, otherwise |
it is in user mode. Thus, to execute a context switch, one only |
need enable or disable interrupts. (When an interrupt line goes |
high, interrupts will automatically be disabled, as the CPU goes |
and deals with its context switch.) |
and deals with its context switch.) Special logic has been added to |
keep the user mode from setting the sleep bit and clearing the |
GIE bit at the same time, with clearing the GIE bit taking |
precedence. |
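
The precedence rule can be sketched as follows. This is only an illustration of one reading of the paragraph above, not the CPU's actual logic: the bit positions (sleep as bit 4, GIE as bit 5, counting the ``sixth bit'' from one) and the function name are assumptions made for the example.

```python
# Assumed bit positions within the CC register (not confirmed by the text):
CC_SLEEP = 1 << 4   # sleep bit, assumed to be the fifth bit
CC_GIE = 1 << 5     # global interrupt enable, the "sixth bit"


def user_write_cc(requested: int) -> int:
    """Return the CC value that takes effect for a user-mode write.

    If the write would set sleep while also clearing GIE, clearing GIE
    wins and the sleep request is dropped.
    """
    if (requested & CC_SLEEP) and not (requested & CC_GIE):
        return requested & ~CC_SLEEP
    return requested
```

Under this reading, a user-mode write that asks for both ends up in supervisor mode with the CPU awake, rather than halted with interrupts disabled.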
|
The seventh bit is a step bit. This bit can be |
set from supervisor mode only. After setting this bit, should |
359,6 → 362,10
halt the CPU independent of the break enable bit. This bit can only be set |
within supervisor mode. |
|
% Should break enable be a supervisor mode bit, while the break enable bit |
% in user mode is a break has taken place bit? |
% |
|
This functionality was added to enable an external debugger to |
set and manage breakpoints. |
|
416,7 → 423,7
\end{table} |
There is no condition code for less than or equal, not C or not V. Sorry, |
I ran out of space in 3--bits. Using these conditions will take an extra |
instruction. (Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)}) |
instruction and a pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)}) |
|
\section{Operand B} |
Many instruction forms have a 21-bit source ``Operand B'' associated with them. |
445,7 → 452,10
|
A lot of long hard thought was put into whether to allow pre/post increment |
and decrement addressing modes. Finding no way to use these operators without |
taking two or more clocks per instruction, these addressing modes have been |
taking two or more clocks per instruction,\footnote{The two clocks figure |
comes from the design of the register set, allowing only one write per clock. |
That write is either from the memory unit or the ALU, but never both.} these |
addressing modes have been |
removed from the realm of possibilities. This means that the Zip CPU has no |
native way of executing push, pop, return, or jump to subroutine operations. |
Each of these instructions can be emulated with a set of instructions from the |
484,21 → 494,36
CC register field is reserved. |
|
\section{Floating Point} |
The ZIP CPU does not support floating point operations today. However, the |
instruction set reserves a capability for a floating point operation. To |
execute such an operation, simply set the floating point bit in the CC |
register and the following instruction will interpret its registers as |
a floating point instruction. Not all instructions, however, have floating |
point equivalents. Further, the immediate fields do not apply in floating |
point mode, and must be set to zero. Not all instructions make sense as |
floating point operations. Therefore, only the CMP, SUB, ADD, and MPY |
instructions may be issued as floating point instructions. Other instructions |
allow the examining of the floating point bit in the CC register. In all |
cases, the floating point bit is cleared one instruction after it is set. |
The ZIP CPU does not support floating point operations. However, the |
instruction set reserves two possibilities for future floating point |
operations. |
|
The architecture does not support a floating point not-implemented interrupt. |
Any soft floating point emulation must be done deliberately. |
The first floating point operation hole in the instruction set involves |
setting the floating point bit in the CC register. The next instruction |
will simply interpret its operands as floating point values. |
Not all instructions, however, have floating point equivalents. Further, the |
immediate fields do not apply in floating point mode, and must be set to |
zero. Not all instructions make sense as floating point operations. |
Therefore, only the CMP, SUB, ADD, and MPY instructions may be issued as |
floating point instructions. Other instructions allow the examining of the |
floating point bit in the CC register. In all cases, the floating point bit |
is cleared one instruction after it is set. |
|
The other possibility for floating point operations involves exploiting the |
hole in the instruction set that the NOOP and BREAK instructions reside within. |
These two instructions use 24--bits of address space. A simple adjustment |
to this space could create instructions with 4--bit register addresses for |
each register, a 3--bit field for conditional execution, and a 2--bit field |
for which operation. In this fashion, such a floating point capability would |
only fill 13--bits of the 24--bit field, still leaving lots of room for |
expansion. |
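
The bit budget above can be checked with a few lines of arithmetic. The sketch below assumes two register fields (which is what the 13-bit total implies); the field names are illustrative only.

```python
HOLE_BITS = 24  # width of the NOOP/BREAK hole, from the text

# Field widths of the proposed encoding. Names are hypothetical; the
# text only gives the widths.
fields = {
    "dest_reg": 4,
    "src_reg": 4,
    "condition": 3,
    "operation": 2,
}

used = sum(fields.values())
spare = HOLE_BITS - used
print(f"used {used} bits, {spare} bits spare")  # used 13 bits, 11 bits spare
```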
|
In both cases, the Zip CPU would support 32--bit single precision floats |
only. |
|
The current architecture does not support a floating point not-implemented |
interrupt. Any soft floating point emulation must be done deliberately. |
|
\section{Native Instructions} |
The instruction set for the Zip CPU is summarized in |
Tbl.~\ref{tbl:zip-instructions}. |
594,18 → 619,10
& \multicolumn{3}{l|}{Cond.} |
& \multicolumn{21}{l|}{Operand B address} |
& \\\hline |
{\em Rsrd} & \multicolumn{4}{l|}{4'h8} |
& \multicolumn{4}{l|}{R. Reg} |
& \multicolumn{3}{l|}{Cond.} |
& 1'b0 |
& \multicolumn{20}{l|}{Reserved} |
& Yes \\\hline |
SUB & \multicolumn{4}{l|}{4'h8} |
& \multicolumn{4}{l|}{R. Reg} |
& \multicolumn{3}{l|}{Cond.} |
& 1'b1 |
& \multicolumn{4}{l|}{Reg} |
& \multicolumn{16}{l|}{16'bit signed offset} |
& \multicolumn{21}{l|}{Operand B} |
& Yes \\\hline |
AND & \multicolumn{4}{l|}{4'h9} |
& \multicolumn{4}{l|}{R. Reg} |
648,9 → 665,11
|
As you can see, there's lots of room for instruction set expansion. The |
NOOP and BREAK instructions are the only instructions within one particular |
24--bit hole. Likewise, the subtract leaves half of its space open, since a |
subtract immediate is the same as an add with a negated immediate. This |
spaces are reserved for future enhancements. |
24--bit hole. This space is reserved for future enhancements. For example, |
floating point operations, consisting of a 3-bit floating point operation, |
two 4-bit registers, no immediate offset, and a 3-bit condition would fit |
nicely into 14--bits of this address space, so that the floating |
point bit in the CC register need not be used. |
|
\section{Derived Instructions} |
The ZIP CPU supports many other common instructions, but not all of them |
862,6 → 881,8
\iffalse |
\fi |
\section{Pipeline Stages} |
As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu}, |
the Zip CPU supports a five stage pipeline. |
\begin{enumerate} |
\item {\bf Prefetch}: Read instruction from memory (cache if possible). This |
stage is actually pipelined itself, and so it will stall if the PC |
868,14 → 889,12
ever changes. Stalls are also created here if the instruction isn't |
in the prefetch cache. |
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and |
immediate offset. |
immediate offset. This stage also determines whether the flags will |
be set or whether the result will be written back. |
\item {\bf Read Operands}: Read registers and apply any immediate values to |
them. There is no means of detecting or flagging arithmetic overflow |
or carry when adding the immediate to the operand. This stage will |
stall if any source operand is pending. |
A proper optimizing compiler, therefore, will schedule an instruction |
between the instruction that produces the result and the instruction |
that uses it. |
\item Split into two tracks: An {\bf ALU} which will accomplish a simple |
instruction, and the {\bf MemOps} stage which accomplishes memory |
read/write. |
884,19 → 903,19
written to the register set. |
\item Condition codes are available upon completion |
\item Issuing an instruction to the memory while the memory is busy will |
stall the bus. If the bus deadlocks, only a reset will |
release the CPU. (Watchdog timer, anyone?) |
stall the entire pipeline. If the bus deadlocks, only a reset |
will release the CPU. (Watchdog timer, anyone?) |
\item The Zip CPU currently has no means of reading and acting on any |
error conditions on the bus. |
\end{itemize} |
\item {\bf Write-Back}: Conditionally write back the result to register set, |
applying the condition. This routine is bi-re-entrant: either the |
\item {\bf Write-Back}: Conditionally write back the result to the register |
set, applying the condition. This routine is bi-re-entrant: either the |
memory or the simple instruction may request a register write. |
\end{enumerate} |
|
The Zip CPU does not support out of order execution. Therefore, if the memory |
unit stalls, every other instruction stalls. Memory stores, however, can take |
place concurrently with ALU operations, although memory writes cannot. |
place concurrently with ALU operations, although memory reads cannot. |
|
\section{Pipeline Logic} |
How the CPU handles some instruction combinations can be telling when |
925,7 → 944,8
two, as I have the pipeline designed. |
|
The ZIP CPU architecture requires that R2 must equal R0 at the end of |
this operation. This may stall the pipeline 1-2 cycles. |
this operation. Even better, such combinations do not (normally) |
stall the pipeline. |
|
\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC} |
|
944,8 → 964,7
fact that the logic supporting the CC register is more complicated than |
the logic supporting any other register. |
|
The ZIP CPU will stall 1--2 cycles on this instruction, until the |
CC register is valid. |
The ZIP CPU will stall for a cycle on this instruction. |
|
\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1} |
|
993,8 → 1012,7
|
\item {\bf All issued instructions complete.} |
|
All stages are filled, or the entire pipeline |
stalls. |
All stages are filled, or the entire pipeline stalls. |
|
What about debug control? What about |
register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2 |
1016,11 → 1034,11
CC register(s) for debugging intermediate pipeline stages? |
|
The next problem, though, is how to deal with the read operand |
pipeline stage needing the result from the register pipeline.a |
pipeline stage needing the result from the register pipeline. |
|
\item {\bf Memory instructions must complete} |
|
All instructions that enter into the memory module *must* |
All instructions that enter into the memory module {\em must} |
complete. Issued instructions from the prefetch, decode, or operand |
read stages may or may not complete. Jumps into code must be valid, |
so that interrupt returns may be valid. All instructions entering the |
1039,8 → 1057,102
|
\end{itemize} |
|
\section{Pipeline Stalls} |
The processing pipeline can and will stall for a variety of reasons. Some of |
these are obvious, some less so. These reasons are listed below: |
\begin{itemize} |
\item When the prefetch cache is exhausted |
|
This should be obvious. If the prefetch cache doesn't have the instruction |
in memory, the entire pipeline must stall until enough of the prefetch cache |
is loaded to support the next instruction. |
|
\item While waiting for the pipeline to load following any taken branch, jump, |
return from interrupt or switch to interrupt context (6 clocks) |
|
If the PC suddenly changes, the pipeline is subsequently cleared and needs to |
be reloaded. Given that there are five stages to the pipeline, that accounts |
for five of the six delay clocks. The last clock is lost in the prefetch |
stage which needs at least one clock with a valid PC before it can produce |
a new output. Hence, six clocks will always be lost anytime the pipeline needs |
to be cleared. |
|
\item When reading from a prior register while also adding an immediate offset |
\begin{enumerate} |
\item\ {\tt OPCODE ?,RA} |
\item\ {\em (stall)} |
\item\ {\tt OPCODE I+RA,RB} |
\end{enumerate} |
|
Since the immediate offset within Operand B is added during the read operand |
stage, so that the result can be nicely settled before the ALU, any instruction |
that writes back a register must be separated by one instruction from any |
opcode that reads that register and applies an immediate offset to it. The |
good news is that this stall can easily be mitigated by proper scheduling. |
|
\item When writing to the CC or PC Register |
\begin{enumerate} |
\item\ {\tt OPCODE RA,PC} {\em Ex: a branch opcode} |
\item\ {\em (stall, even if jump not taken)} |
\item\ {\tt OPCODE RA,RB} |
\end{enumerate} |
Since branches take place in the writeback stage, the Zip CPU will stall the |
pipeline for one clock anytime there may be a possible jump. This prevents |
an instruction from executing a memory access after the jump but before the |
jump is recognized. |
|
\item When reading from the CC register after setting the flags |
\begin{enumerate} |
\item\ {\tt ALUOP RA,RB} |
\item\ {\em (stall)} |
\item\ {\tt TST sys.ccv,CC} |
\item\ {\tt BZ somewhere} |
\end{enumerate} |
|
The reason for this stall is simply performance. Many of the flags are |
determined via combinatorial logic after the writeback instruction is |
determined. Trying to then place these into the input for one of the operands |
created a time delay loop that would no longer execute in a single 100~MHz |
clock cycle. (The time delay of the multiply within the ALU wasn't helping |
either \ldots). |
|
\item When waiting for a memory read operation to complete |
\begin{enumerate} |
\item\ {\tt LOD address,RA} |
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)} |
\item\ {\tt OPCODE I+RA,RB} |
\end{enumerate} |
|
Remember, the ZIP CPU does not support out of order execution. Therefore, |
anytime the memory unit becomes busy both the memory unit and the ALU must |
stall until the memory unit is cleared. This is especially true of a load |
instruction, which will write its operand back to the register file. Store |
instructions are different, since they can be busy with no impact on later |
ALU write back operations. Hence, only loads stall the pipeline. |
|
This also assumes that the memory being accessed is a single cycle memory. |
Slower memories, such as the Quad SPI flash, will take longer, perhaps even |
as long as forty clocks. During this time the CPU and the external bus |
will be busy, and unable to do anything else. |
|
\item Memory operation followed by a memory operation |
\begin{enumerate} |
\item\ {\tt STO address,RA} |
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)} |
\item\ {\tt LOD address,RB} |
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)} |
\end{enumerate} |
|
In this case, the LOD instruction cannot start until the STO is finished. |
With proper scheduling, it is possible to do something in the ALU while the |
STO is busy, but otherwise this pipeline will stall waiting for it to complete. |
|
Note that even though the Wishbone bus can support pipelined accesses at |
one access per clock, only the prefetch stage can take advantage of this. |
Load and Store instructions are stuck at one Wishbone cycle per instruction. |
\end{itemize} |
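
The stall rules listed above can be collected into a rough cost model. The sketch below is a simplification, not a cycle-accurate simulator: the `Insn` record and its fields are invented for the example, a taken branch is charged only the six-clock reload (the text does not say whether the one-clock PC/CC write stall adds to it), loads are charged their best-case seven clocks, and the store-then-load case is not modeled.

```python
from dataclasses import dataclass

BRANCH_TAKEN_CLOCKS = 6   # pipeline reload after the PC changes
LOAD_BEST_CLOCKS = 7      # best-case wait for a memory read
ONE_CLOCK = 1             # each of the single-clock stalls


@dataclass
class Insn:
    """Hypothetical instruction summary used only by this sketch."""
    dst: str = ""              # register written ("" if none)
    src: str = ""              # register read as Operand B ("" if none)
    has_imm: bool = False      # Operand B adds an immediate to src
    sets_flags: bool = False
    is_load: bool = False
    branch_taken: bool = False


def stall_clocks(program):
    """Estimate the clocks lost to stalls for a straight-line sequence."""
    total = 0
    prev = None
    for insn in program:
        # A load has already been waited for, so only ALU results matter here.
        if prev is not None and not prev.is_load:
            if insn.has_imm and insn.src and insn.src == prev.dst:
                total += ONE_CLOCK   # immediate added to a pending result
            elif prev.sets_flags and insn.src == "CC":
                total += ONE_CLOCK   # CC read right after the flags were set
        if insn.is_load:
            total += LOAD_BEST_CLOCKS
        if insn.branch_taken:
            total += BRANCH_TAKEN_CLOCKS
        elif insn.dst in ("PC", "CC"):
            total += ONE_CLOCK       # possible jump, even if not taken
        prev = insn
    return total


program = [
    Insn(dst="R1", sets_flags=True),          # OPCODE ?,R1
    Insn(dst="R2", src="R1", has_imm=True),   # OPCODE I+R1,R2
    Insn(dst="R3", is_load=True),             # LOD address,R3
    Insn(dst="PC", branch_taken=True),        # taken branch
]
print(stall_clocks(program))  # 14 = 1 + 7 + 6
```

A model like this is mostly useful for comparing two schedules of the same code, for example to see what moving an independent instruction between a producer and its consumer saves.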
|
|
\chapter{Peripherals}\label{chap:periph} |
|
While the previous chapter describes a CPU in isolation, the Zip System |
1122,7 → 1234,7
critical difference: the interrupt line from the watchdog |
timer is tied to the reset line of the CPU. Hence writing a `1' to the |
watchdog timer will always reset the CPU. |
To stop the Watchdog timer, write a '0' to it. To start it, |
To stop the Watchdog timer, write a `0' to it. To start it, |
write any other number to it---as with the other timers. |
|
While the watchdog timer supports interval mode, it doesn't make as much sense |
1155,7 → 1267,7
set an alarm for a particular process $N$ clocks in advance, read the current |
Jiffies value, add $N$, and write it back to the Jiffies register. The |
O/S must also keep track of values written to the Jiffies register. Thus, |
when an `alarm' trips, it should be remoed from the list of alarms, the list |
when an `alarm' trips, it should be removed from the list of alarms, the list |
should be sorted, and the next alarm in terms of Jiffies should be written |
to the register. |
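
The bookkeeping described here might look like the following sketch. It assumes the alarm value is the current Jiffies count plus $N$, wrapped to 32 bits; the class name and the two register-access callables are placeholders for whatever the O/S actually uses, and alarms that are already overdue when the list is re-sorted are not handled.

```python
MASK32 = 0xFFFFFFFF  # the Jiffies register is 32 bits wide


class JiffiesAlarms:
    """Hypothetical O/S-side list of alarms for the Jiffies register."""

    def __init__(self, read_jiffies, write_jiffies):
        self._read = read_jiffies      # callable returning the current count
        self._write = write_jiffies    # callable taking the next alarm value
        self._alarms = []              # (alarm value, task), earliest first

    def set_alarm(self, n, task):
        """Schedule task n clocks from now and return the alarm value."""
        now = self._read()
        when = (now + n) & MASK32
        self._alarms.append((when, task))
        self._program(now)
        return when

    def alarm_tripped(self):
        """Remove the alarm that fired and program the next one."""
        now = self._read()
        _, task = self._alarms.pop(0)
        self._program(now)
        return task

    def _program(self, now):
        # Sort by distance ahead of 'now' so counter wraparound is tolerated.
        self._alarms.sort(key=lambda alarm: (alarm[0] - now) & MASK32)
        if self._alarms:
            self._write(self._alarms[0][0])
```

Only the earliest alarm is ever written to the register; the rest wait in the list until it trips.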
|
1163,7 → 1275,8
|
The manual cache is an experimental setting that may not remain with the Zip |
CPU for very long. It is designed to facilitate running from FLASH or ROM |
memory, although the pipe cache really makes this need obsolete. The manual |
memory, although the pipeline prefetch cache really makes this need obsolete. |
The manual |
cache works by copying data from a wishbone address (range) into the cache |
register, and then by making that memory available as memory to the Zip System. |
It is a {\em manual cache} because the processor must first specify what |
1183,23 → 1296,23
accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs}, |
\begin{table}[htbp] |
\begin{center}\begin{reglist} |
PIC & {\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline |
WDT & {\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline |
CCHE & {\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline |
CTRIC & {\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline |
TMRA & {\tt 0xc0000004} & 32 & R/W & Timer A\\\hline |
TMRB & {\tt 0xc0000005} & 32 & R/W & Timer B\\\hline |
TMRC & {\tt 0xc0000006} & 32 & R/W & Timer C\\\hline |
JIFF & {\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline |
MTASK & {\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline |
MMSTL & {\tt 0xc0000008} & 32 & R/W & Master Stall Counter \\\hline |
MPSTL & {\tt 0xc0000008} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline |
MICNT & {\tt 0xc0000008} & 32 & R/W & Master Instruction Counter\\\hline |
UTASK & {\tt 0xc0000008} & 32 & R/W & User Task Clock Counter \\\hline |
UMSTL & {\tt 0xc0000008} & 32 & R/W & User Stall Counter \\\hline |
UPSTL & {\tt 0xc0000008} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline |
UICNT & {\tt 0xc0000008} & 32 & R/W & User Instruction Counter\\\hline |
Cache & {\tt 0xc0100000} & & & Base address of the Cache memory\\\hline |
PIC & \scalebox{0.8}{\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline |
WDT & \scalebox{0.8}{\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline |
CCHE & \scalebox{0.8}{\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline |
CTRIC & \scalebox{0.8}{\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline |
TMRA & \scalebox{0.8}{\tt 0xc0000004} & 32 & R/W & Timer A\\\hline |
TMRB & \scalebox{0.8}{\tt 0xc0000005} & 32 & R/W & Timer B\\\hline |
TMRC & \scalebox{0.8}{\tt 0xc0000006} & 32 & R/W & Timer C\\\hline |
JIFF & \scalebox{0.8}{\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline |
MTASK & \scalebox{0.8}{\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline |
MMSTL & \scalebox{0.8}{\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline |
MPSTL & \scalebox{0.8}{\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline |
MICNT & \scalebox{0.8}{\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline |
UTASK & \scalebox{0.8}{\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline |
UMSTL & \scalebox{0.8}{\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline |
UPSTL & \scalebox{0.8}{\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline |
UICNT & \scalebox{0.8}{\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline |
% Cache & \scalebox{0.8}{\tt 0xc0100000} & & & Base address of the Cache memory\\\hline |
\end{reglist} |
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs} |
\end{center}\end{table} |
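
For software that needs these addresses, the revised table can be generated from the base address and the register order rather than typed by hand, which avoids the kind of repeated-address slip corrected above. The constant and variable names in this sketch are made up for the example; only the register mnemonics and addresses come from the table.

```python
ZIP_PERIPH_BASE = 0xC0000000  # address of the first register (PIC)

# Register mnemonics in table order; each one sits at the next address.
REGISTER_NAMES = [
    "PIC", "WDT", "CCHE", "CTRIC", "TMRA", "TMRB", "TMRC", "JIFF",
    "MTASK", "MMSTL", "MPSTL", "MICNT", "UTASK", "UMSTL", "UPSTL", "UICNT",
]

ZIP_REGS = {name: ZIP_PERIPH_BASE + i for i, name in enumerate(REGISTER_NAMES)}

for name, addr in ZIP_REGS.items():
    print(f"{name:5s} 0x{addr:08x}")
```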
1214,7 → 1327,7
|
|
\chapter{Wishbone Datasheet}\label{chap:wishbone} |
The Zip System supports two wishbone accesses, a slave debug port and a master |
The Zip System supports two wishbone ports, a slave debug port and a master |
port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave} |
\begin{table}[htbp] |
\begin{center} |
1281,9 → 1394,9
|
\chapter{Clocks}\label{chap:clocks} |
|
This core is based upon the Basys--3 design. The Basys--3 development board |
contains one external 100~MHz clock, which is sufficient to run the ZIP CPU |
core. |
This core is based upon the Basys--3 development board sold by Digilent. |
The Basys--3 development board contains one external 100~MHz clock, which is |
sufficient to run the ZIP CPU core. |
\begin{table}[htbp] |
\begin{center} |
\begin{clocklist} |