OpenCores
URL https://opencores.org/ocsvn/zipcpu/zipcpu/trunk

Subversion Repositories zipcpu

Compare Revisions

  • This comparison shows the changes necessary to convert path
    /zipcpu/trunk
    from Rev 31 to Rev 32

Rev 31 → Rev 32

/doc/spec.pdf Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream
/doc/src/spec.tex
103,7 → 103,7
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has
a lot of features of modern CPUs within it that ... well, let's just say it's
not the little guy on the block. The Zip CPU is lighter weight, costing only
about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic
about 2,300 LUTs with no peripherals, and 3,200 LUTs with some very basic
peripherals.
 
My final reason is that I'm building the Zip CPU as a learning experience. The
332,12 → 332,15
wait for an interrupt (if interrupts are enabled), or to
completely halt (if interrupts are disabled).
The sixth bit is a global interrupt enable bit (GIE). When this
sixth bit is a '1' interrupts will be enabled, else disabled. When
sixth bit is a `1' interrupts will be enabled, else disabled. When
interrupts are disabled, the CPU will be in supervisor mode, otherwise
it is in user mode. Thus, to execute a context switch, one only
need enable or disable interrupts. (When an interrupt line goes
high, interrupts will automatically be disabled, as the CPU goes
and deals with its context switch.)
and deals with its context switch.) Special logic has been added to
keep the user mode from setting the sleep register and clearing the
GIE register at the same time, with clearing the GIE register taking
precedence.
 
The seventh bit is a step bit. This bit can be
set from supervisor mode only. After setting this bit, should
359,6 → 362,10
halt the CPU independent of the break enable bit. This bit can only be set
within supervisor mode.
 
% Should break enable be a supervisor mode bit, while the break enable bit
% in user mode is a break has taken place bit?
%
 
This functionality was added to enable an external debugger to
set and manage breakpoints.
 
416,7 → 423,7
\end{table}
There is no condition code for less than or equal, not C or not V. Sorry,
I ran out of space in 3--bits. Using these conditions will take an extra
instruction. (Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
instruction and a pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
 
\section{Operand B}
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
445,7 → 452,10
 
A lot of long hard thought was put into whether to allow pre/post increment
and decrement addressing modes. Finding no way to use these operators without
taking two or more clocks per instruction, these addressing modes have been
taking two or more clocks per instruction,\footnote{The two clocks figure
comes from the design of the register set, allowing only one write per clock.
That write is either from the memory unit or the ALU, but never both.} these
addressing modes have been
removed from the realm of possibilities. This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.
Each of these instructions can be emulated with a set of instructions from the
484,21 → 494,36
CC register field is reserved.
 
\section{Floating Point}
The ZIP CPU does not support floating point operations today. However, the
instruction set reserves a capability for a floating point operation. To
execute such an operation, simply set the floating point bit in the CC
register and the following instruction will interpret its registers as
a floating point instruction. Not all instructions, however, have floating
point equivalents. Further, the immediate fields do not apply in floating
point mode, and must be set to zero. Not all instructions make sense as
floating point operations. Therefore, only the CMP, SUB, ADD, and MPY
instructions may be issued as floating point instructions. Other instructions
allow the examining of the floating point bit in the CC register. In all
cases, the floating point bit is cleared one instruction after it is set.
The ZIP CPU does not support floating point operations. However, the
instruction set reserves two possibilities for future floating point
operations.
 
The architecture does not support a floating point not-implemented interrupt.
Any soft floating point emulation must be done deliberately.
The first floating point operation hole in the instruction set involves
setting the floating point bit in the CC register. The next instruction
will simply interpret its operands as floating point values.
Not all instructions, however, have floating point equivalents. Further, the
immediate fields do not apply in floating point mode, and must be set to
zero. Not all instructions make sense as floating point operations.
Therefore, only the CMP, SUB, ADD, and MPY instructions may be issued as
floating point instructions. Other instructions allow the examining of the
floating point bit in the CC register. In all cases, the floating point bit
is cleared one instruction after it is set.
 
The other possibility for floating point operations involves exploiting the
hole in the instruction set that the NOOP and BREAK instructions reside within.
These two instructions use 24--bits of address space. A simple adjustment
to this space could create instructions with 4--bit register addresses for
each register, a 3--bit field for conditional execution, and a 2--bit field
for which operation. In this fashion, such a floating point capability would
only fill 13--bits of the 24--bit field, still leaving lots of room for
expansion.
 
In both cases, the Zip CPU would support 32--bit single precision floats
only.
 
The current architecture does not support a floating point not-implemented
interrupt. Any soft floating point emulation must be done deliberately.
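
As a rough check of the bit budget for the second possibility, the sketch below packs two 4--bit register addresses, a 3--bit condition, and a 2--bit operation selector into one word. Only the field widths come from the text above; the field order and the names {\tt FIELD\_WIDTHS} and {\tt pack\_fp\_fields} are assumptions made purely for illustration.

```python
# Hypothetical field layout: only the widths come from the text above;
# the ordering of the fields is an assumption made for illustration.
FIELD_WIDTHS = [("reg_a", 4), ("reg_b", 4), ("cond", 3), ("op", 2)]
HOLE_BITS = 24  # size of the NOOP/BREAK hole, per the text


def pack_fp_fields(**values):
    """Pack the named fields into an integer, first field in the top bits."""
    word = 0
    for name, width in FIELD_WIDTHS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


used_bits = sum(width for _, width in FIELD_WIDTHS)   # bits consumed
spare_bits = HOLE_BITS - used_bits                    # bits left in the hole
```

Under these assumptions the four fields consume 13 of the 24 bits, which matches the count given above.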
 
\section{Native Instructions}
The instruction set for the Zip CPU is summarized in
Tbl.~\ref{tbl:zip-instructions}.
594,18 → 619,10
& \multicolumn{3}{l|}{Cond.}
& \multicolumn{21}{l|}{Operand B address}
& \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h8}
& \multicolumn{4}{l|}{R. Reg}
& \multicolumn{3}{l|}{Cond.}
& 1'b0
& \multicolumn{20}{l|}{Reserved}
& Yes \\\hline
SUB & \multicolumn{4}{l|}{4'h8}
& \multicolumn{4}{l|}{R. Reg}
& \multicolumn{3}{l|}{Cond.}
& 1'b1
& \multicolumn{4}{l|}{Reg}
& \multicolumn{16}{l|}{16'bit signed offset}
& \multicolumn{21}{l|}{Operand B}
& Yes \\\hline
AND & \multicolumn{4}{l|}{4'h9}
& \multicolumn{4}{l|}{R. Reg}
648,9 → 665,11
 
As you can see, there's lots of room for instruction set expansion. The
NOOP and BREAK instructions are the only instructions within one particular
24--bit hole. Likewise, the subtract leaves half of its space open, since a
subtract immediate is the same as an add with a negated immediate. This
spaces are reserved for future enhancements.
24--bit hole. This space is reserved for future enhancements. For example,
floating point operations, consisting of a 3-bit floating point operation,
two 4-bit registers, no immediate offset, and a 3-bit condition would fit
nicely into 14--bits of this address space, so that the floating
point bit in the CC register need not be used.
 
\section{Derived Instructions}
The ZIP CPU supports many other common instructions, but not all of them
862,6 → 881,8
\iffalse
\fi
\section{Pipeline Stages}
As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},
the Zip CPU supports a five stage pipeline.
\begin{enumerate}
\item {\bf Prefetch}: Read instruction from memory (cache if possible). This
stage is actually pipelined itself, and so it will stall if the PC
868,14 → 889,12
ever changes. Stalls are also created here if the instruction isn't
in the prefetch cache.
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
immediate offset.
immediate offset. This stage also determines whether the flags will
be set or whether the result will be written back.
\item {\bf Read Operands}: Read registers and apply any immediate values to
them. There is no means of detecting or flagging arithmetic overflow
or carry when adding the immediate to the operand. This stage will
stall if any source operand is pending.
A proper optimizing compiler, therefore, will schedule an instruction
between the instruction that produces the result and the instruction
that uses it.
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
instruction, and the {\bf MemOps} stage which accomplishes memory
read/write.
884,19 → 903,19
written to the register set.
\item Condition codes are available upon completion
\item Issuing an instruction to the memory while the memory is busy will
stall the bus. If the bus deadlocks, only a reset will
release the CPU. (Watchdog timer, anyone?)
stall the entire pipeline. If the bus deadlocks, only a reset
will release the CPU. (Watchdog timer, anyone?)
\item The Zip CPU currently has no means of reading and acting on any
error conditions on the bus.
\end{itemize}
\item {\bf Write-Back}: Conditionally write back the result to register set,
applying the condition. This routine is bi-re-entrant: either the
\item {\bf Write-Back}: Conditionally write back the result to the register
set, applying the condition. This routine is bi-re-entrant: either the
memory or the simple instruction may request a register write.
\end{enumerate}
 
The Zip CPU does not support out of order execution. Therefore, if the memory
unit stalls, every other instruction stalls. Memory stores, however, can take
place concurrently with ALU operations, although memory writes cannot.
place concurrently with ALU operations, although memory reads cannot.
 
\section{Pipeline Logic}
How the CPU handles some instruction combinations can be telling when
925,7 → 944,8
two, as I have the pipeline designed.
 
The ZIP CPU architecture requires that R2 must equal R0 at the end of
this operation. This may stall the pipeline 1-2 cycles.
this operation. Even better, such combinations do not (normally)
stall the pipeline.
 
\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}
 
944,8 → 964,7
fact that the logic supporting the CC register is more complicated than
the logic supporting any other register.
 
The ZIP CPU will stall 1--2 cycles on this instruction, until the
CC register is valid.
The ZIP CPU will stall for one cycle on this instruction.
 
\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}
 
993,8 → 1012,7
 
\item {\bf All issued instructions complete.}
 
All stages are filled, or the entire pipeline
stalls.
All stages are filled, or the entire pipeline stalls.
 
What about debug control? What about
register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2
1016,11 → 1034,11
CC register(s) for debugging intermediate pipeline stages?
 
The next problem, though, is how to deal with the read operand
pipeline stage needing the result from the register pipeline.a
pipeline stage needing the result from the register pipeline.
 
\item {\bf Memory instructions must complete}
 
All instructions that enter into the memory module *must*
All instructions that enter into the memory module {\em must}
complete. Issued instructions from the prefetch, decode, or operand
read stages may or may not complete. Jumps into code must be valid,
so that interrupt returns may be valid. All instructions entering the
1039,8 → 1057,102
 
\end{itemize}
 
\section{Pipeline Stalls}
The processing pipeline can and will stall for a variety of reasons. Some of
these are obvious, some less so. These reasons are listed below:
\begin{itemize}
\item When the prefetch cache is exhausted
 
This should be obvious. If the prefetch cache doesn't have the instruction
in memory, the entire pipeline must stall until enough of the prefetch cache
is loaded to support the next instruction.
 
\item While waiting for the pipeline to load following any taken branch, jump,
return from interrupt or switch to interrupt context (6 clocks)
 
If the PC suddenly changes, the pipeline is subsequently cleared and needs to
be reloaded. Given that there are five stages to the pipeline, that accounts
for five of the six delay clocks. The last clock is lost in the prefetch
stage which needs at least one clock with a valid PC before it can produce
a new output. Hence, six clocks will always be lost anytime the pipeline needs
to be cleared.
 
\item When reading from a prior register while also adding an immediate offset
\begin{enumerate}
\item\ {\tt OPCODE ?,RA}
\item\ {\em (stall)}
\item\ {\tt OPCODE I+RA,RB}
\end{enumerate}
 
Since the immediate offset within Operand B is added during the read operand
stage, so that the sum is settled before it reaches the ALU, any instruction
that writes back a register must be separated by one instruction from the
opcode that reads that register and applies an immediate offset to it. The
good news is that this stall can easily be mitigated by proper scheduling.
 
\item When writing to the CC or PC Register
\begin{enumerate}
\item\ {\tt OPCODE RA,PC} {\em Ex: a branch opcode}
\item\ {\em (stall, even if jump not taken)}
\item\ {\tt OPCODE RA,RB}
\end{enumerate}
Since branches take place in the writeback stage, the Zip CPU will stall the
pipeline for one clock anytime there may be a possible jump. This prevents
an instruction from executing a memory access after the jump but before the
jump is recognized.
 
\item When reading from the CC register after setting the flags
\begin{enumerate}
\item\ {\tt ALUOP RA,RB}
\item\ {\em (stall)}
\item\ {\tt TST sys.ccv,CC}
\item\ {\tt BZ somewhere}
\end{enumerate}
 
The reason for this stall is simply performance. Many of the flags are
determined via combinatorial logic after the writeback instruction is
determined. Trying to then place these into the input for one of the operands
created a time delay loop that would no longer execute in a single 100~MHz
clock cycle. (The time delay of the multiply within the ALU wasn't helping
either \ldots).
 
\item When waiting for a memory read operation to complete
\begin{enumerate}
\item\ {\tt LOD address,RA}
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
\item\ {\tt OPCODE I+RA,RB}
\end{enumerate}
 
Remember, the ZIP CPU does not support out of order execution. Therefore,
anytime the memory unit becomes busy both the memory unit and the ALU must
stall until the memory unit is cleared. This is especially true of a load
instruction, which will write its operand back to the register file. Store
instructions are different, since they can be busy with no impact on later
ALU write back operations. Hence, only loads stall the pipeline.
 
This also assumes that the memory being accessed is a single cycle memory.
Slower memories, such as the Quad SPI flash, will take longer, perhaps even
as long as forty clocks. During this time the CPU and the external bus
will be busy, and unable to do anything else.
 
\item Memory operation followed by a memory operation
\begin{enumerate}
\item\ {\tt STO address,RA}
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
\item\ {\tt LOD address,RB}
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
\end{enumerate}
 
In this case, the LOD instruction cannot start until the STO is finished.
With proper scheduling, it is possible to do something in the ALU while the
STO is busy, but otherwise this pipeline will stall waiting for it to complete.
 
Note that even though the Wishbone bus can support pipelined accesses at
one access per clock, only the prefetch stage can take advantage of this.
Load and Store instructions are stuck at one Wishbone cycle per instruction.
\end{itemize}
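
The stall costs listed above can be tallied with a toy model such as the one below. This is only a sketch: the per-hazard costs are taken from the list, but the {\tt Instr} fields, the hazard checks, and the choice to simply add the costs together are simplifying assumptions, not a description of the actual pipeline logic. In particular, it assumes that the one-clock operand stall is hidden behind a preceding load, and it ignores prefetch cache misses and back-to-back memory operations.

```python
# Toy stall counter. The stall costs come from the list above; the Instr
# fields and the hazard checks are simplifying assumptions.
from dataclasses import dataclass
from typing import Optional, Sequence

PIPELINE_CLEAR_CLOCKS = 6   # taken branch, jump, or context switch
BEST_CASE_LOAD_CLOCKS = 7   # best case quoted in the text


@dataclass
class Instr:
    writes: Optional[str] = None    # destination register, e.g. "R1" or "PC"
    reads: Optional[str] = None     # register used within Operand B
    has_immediate: bool = False     # Operand B adds an immediate offset
    is_load: bool = False
    branch_taken: bool = False


def stall_clocks(program: Sequence[Instr]) -> int:
    """Return the number of clocks lost to the stalls modelled here."""
    stalls = 0
    prev = None
    for ins in program:
        # Operand B = immediate + a register written by the prior instruction.
        # Assumption: a prior load has already stalled long enough.
        if (prev is not None and not prev.is_load
                and prev.writes is not None
                and ins.has_immediate and ins.reads == prev.writes):
            stalls += 1
        # A write to PC or CC costs one clock, even if no jump is taken.
        if ins.writes in ("PC", "CC"):
            stalls += 1
        # A taken branch additionally clears and reloads the pipeline.
        if ins.branch_taken:
            stalls += PIPELINE_CLEAR_CLOCKS
        if ins.is_load:
            stalls += BEST_CASE_LOAD_CLOCKS
        prev = ins
    return stalls
```

For example, placing one unrelated instruction between a register write and an instruction that reads that register with an immediate offset removes the one-clock stall in this model, which is the scheduling remedy suggested above.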
 
 
\chapter{Peripherals}\label{chap:periph}
 
While the previous chapter describes a CPU in isolation, the Zip System
1122,7 → 1234,7
critical difference: the interrupt line from the watchdog
timer is tied to the reset line of the CPU. Hence writing a `1' to the
watchdog timer will always reset the CPU.
To stop the Watchdog timer, write a '0' to it. To start it,
To stop the Watchdog timer, write a `0' to it. To start it,
write any other number to it---as with the other timers.
 
While the watchdog timer supports interval mode, it doesn't make as much sense
1155,7 → 1267,7
set an alarm for a particular process $N$ clocks in advance, read the current
Jiffies value, add $N$, and write it back to the Jiffies register. The
O/S must also keep track of values written to the Jiffies register. Thus,
when an `alarm' trips, it should be remoed from the list of alarms, the list
when an `alarm' trips, it should be removed from the list of alarms, the list
should be sorted, and the next alarm in terms of Jiffies should be written
to the register.
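
The bookkeeping described above might be sketched as follows. The callbacks {\tt read\_jiffies} and {\tt write\_jiffies} are hypothetical stand-ins for accesses to the Jiffies register, and the class name {\tt AlarmList} is likewise invented for this example. Sorting by distance from the current count, modulo $2^{32}$, is one assumed way of coping with counter wraparound.

```python
# Sketch of O/S-side alarm bookkeeping. read_jiffies and write_jiffies are
# hypothetical callbacks standing in for accesses to the Jiffies register.
MASK32 = 0xFFFFFFFF


class AlarmList:
    def __init__(self, read_jiffies, write_jiffies):
        self._read = read_jiffies
        self._write = write_jiffies
        self._alarms = []   # absolute Jiffies values, soonest first

    def _program_next(self, now):
        # Sort by distance from "now" so that counter wraparound is handled,
        # then hand the soonest alarm (if any) to the hardware register.
        self._alarms.sort(key=lambda t: (t - now) & MASK32)
        if self._alarms:
            self._write(self._alarms[0])

    def set_alarm(self, n_clocks):
        """Request an alarm n_clocks from the current Jiffies value."""
        now = self._read()
        self._alarms.append((now + n_clocks) & MASK32)
        self._program_next(now)

    def on_alarm(self):
        """Call when the alarm trips; returns the value that tripped."""
        if not self._alarms:
            return None
        now = self._read()
        tripped = self._alarms.pop(0)
        self._program_next(now)
        return tripped
```

An interrupt handler would call {\tt on\_alarm} when the Jiffies interrupt fires, which removes the tripped alarm and writes the next pending one to the register.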
 
1163,7 → 1275,8
 
The manual cache is an experimental setting that may not remain with the Zip
CPU for very long. It is designed to facilitate running from FLASH or ROM
memory, although the pipe cache really makes this need obsolete. The manual
memory, although the pipeline prefetch cache really makes this need obsolete.
The manual
cache works by copying data from a wishbone address (range) into the cache
register, and then by making that memory available as memory to the Zip System.
It is a {\em manual cache} because the processor must first specify what
1183,23 → 1296,23
accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},
\begin{table}[htbp]
\begin{center}\begin{reglist}
PIC & {\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
WDT & {\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
CCHE & {\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
CTRIC & {\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
TMRA & {\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
TMRB & {\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
TMRC & {\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
JIFF & {\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
MTASK & {\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
MMSTL & {\tt 0xc0000008} & 32 & R/W & Master Stall Counter \\\hline
MPSTL & {\tt 0xc0000008} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
MICNT & {\tt 0xc0000008} & 32 & R/W & Master Instruction Counter\\\hline
UTASK & {\tt 0xc0000008} & 32 & R/W & User Task Clock Counter \\\hline
UMSTL & {\tt 0xc0000008} & 32 & R/W & User Stall Counter \\\hline
UPSTL & {\tt 0xc0000008} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
UICNT & {\tt 0xc0000008} & 32 & R/W & User Instruction Counter\\\hline
Cache & {\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
PIC & \scalebox{0.8}{\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
WDT & \scalebox{0.8}{\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
CCHE & \scalebox{0.8}{\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
CTRIC & \scalebox{0.8}{\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
TMRA & \scalebox{0.8}{\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
TMRB & \scalebox{0.8}{\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
TMRC & \scalebox{0.8}{\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
JIFF & \scalebox{0.8}{\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
MTASK & \scalebox{0.8}{\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
MMSTL & \scalebox{0.8}{\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline
MPSTL & \scalebox{0.8}{\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
MICNT & \scalebox{0.8}{\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline
UTASK & \scalebox{0.8}{\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline
UMSTL & \scalebox{0.8}{\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline
UPSTL & \scalebox{0.8}{\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
UICNT & \scalebox{0.8}{\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline
% Cache & \scalebox{0.8}{\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
\end{reglist}
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
\end{center}\end{table}
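
Since each peripheral register in the Rev 32 table sits one address above the previous one, the map can be derived from the base address and the register order. The sketch below does this; the register names and addresses are copied from the table, while the Python names {\tt ZIP\_SYSTEM\_BASE}, {\tt REGISTER\_NAMES}, and {\tt REGISTER\_ADDRESS} are assumptions made for illustration.

```python
# Register names and order copied from the Rev 32 table above; the Python
# identifiers themselves are illustrative, not part of the Zip System.
ZIP_SYSTEM_BASE = 0xC0000000

REGISTER_NAMES = [
    "PIC", "WDT", "CCHE", "CTRIC",
    "TMRA", "TMRB", "TMRC", "JIFF",
    "MTASK", "MMSTL", "MPSTL", "MICNT",
    "UTASK", "UMSTL", "UPSTL", "UICNT",
]

# Each register occupies one consecutive address in the table.
REGISTER_ADDRESS = {
    name: ZIP_SYSTEM_BASE + index
    for index, name in enumerate(REGISTER_NAMES)
}

# Sanity check: no two registers may share an address.
assert len(set(REGISTER_ADDRESS.values())) == len(REGISTER_NAMES)
```

The uniqueness check is the kind of test that would flag a table in which several counters share the address {\tt 0xc0000008}.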
1214,7 → 1327,7
 
 
\chapter{Wishbone Datasheet}\label{chap:wishbone}
The Zip System supports two wishbone accesses, a slave debug port and a master
The Zip System supports two wishbone ports, a slave debug port and a master
port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}
\begin{table}[htbp]
\begin{center}
1281,9 → 1394,9
 
\chapter{Clocks}\label{chap:clocks}
 
This core is based upon the Basys--3 design. The Basys--3 development board
contains one external 100~MHz clock, which is sufficient to run the ZIP CPU
core.
This core is based upon the Basys--3 development board sold by Digilent.
The Basys--3 development board contains one external 100~MHz clock, which is
sufficient to run the ZIP CPU core.
\begin{table}[htbp]
\begin{center}
\begin{clocklist}
