Line 101... |
Line 101... |
envious of what they've accomplished. I would like to port binutils to the
|
envious of what they've accomplished. I would like to port binutils to the
|
Zip CPU, as I would like to port GCC and GDB. They are way ahead of me. The
|
Zip CPU, as I would like to port GCC and GDB. They are way ahead of me. The
|
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has
|
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has
|
a lot of features of modern CPUs within it that ... well, let's just say it's
|
a lot of features of modern CPUs within it that ... well, let's just say it's
|
not the little guy on the block. The Zip CPU is lighter weight, costing only
|
not the little guy on the block. The Zip CPU is lighter weight, costing only
|
about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic
|
about 2,300 LUTs with no peripherals, and 3,200 LUTs with some very basic
|
peripherals.
|
peripherals.
|
|
|
My final reason is that I'm building the Zip CPU as a learning experience. The
|
My final reason is that I'm building the Zip CPU as a learning experience. The
|
Zip CPU has allowed me to learn a lot about how CPUs work on a very micro
|
Zip CPU has allowed me to learn a lot about how CPUs work on a very micro
|
level. For the first time, I am beginning to understand many of the Computer
|
level. For the first time, I am beginning to understand many of the Computer
|
Line 330... |
Line 330... |
The next bit is a clock enable (0 to enable) or sleep bit (1 to put
|
The next bit is a clock enable (0 to enable) or sleep bit (1 to put
|
the CPU to sleep). Setting this bit will cause the CPU to
|
the CPU to sleep). Setting this bit will cause the CPU to
|
wait for an interrupt (if interrupts are enabled), or to
|
wait for an interrupt (if interrupts are enabled), or to
|
completely halt (if interrupts are disabled).
|
completely halt (if interrupts are disabled).
|
The sixth bit is a global interrupt enable bit (GIE). When this
|
The sixth bit is a global interrupt enable bit (GIE). When this
|
sixth bit is a '1' interrupts will be enabled, else disabled. When
|
sixth bit is a `1' interrupts will be enabled, else disabled. When
|
interrupts are disabled, the CPU will be in supervisor mode, otherwise
|
interrupts are disabled, the CPU will be in supervisor mode, otherwise
|
it is in user mode. Thus, to execute a context switch, one only
|
it is in user mode. Thus, to execute a context switch, one only
|
need enable or disable interrupts. (When an interrupt line goes
|
need enable or disable interrupts. (When an interrupt line goes
|
high, interrupts will automatically be disabled, as the CPU goes
|
high, interrupts will automatically be disabled, as the CPU goes
|
and deals with its context switch.)
|
and deals with its context switch.) Special logic has been added to
|
|
keep the user mode from setting the sleep register and clearing the
|
|
GIE register at the same time, with clearing the GIE register taking
|
|
precedence.
|
|
|
The seventh bit is a step bit. This bit can be
|
The seventh bit is a step bit. This bit can be
|
set from supervisor mode only. After setting this bit, should
|
set from supervisor mode only. After setting this bit, should
|
the supervisor mode process switch to user mode, it would then
|
the supervisor mode process switch to user mode, it would then
|
accomplish one instruction in user mode before returning to supervisor
|
accomplish one instruction in user mode before returning to supervisor
|
Line 357... |
Line 360... |
(break enabled), or whether the break instruction will simply send the
|
(break enabled), or whether the break instruction will simply send the
|
CPU into interrupt mode. Encountering a break in supervisor mode will
|
CPU into interrupt mode. Encountering a break in supervisor mode will
|
halt the CPU independent of the break enable bit. This bit can only be set
|
halt the CPU independent of the break enable bit. This bit can only be set
|
within supervisor mode.
|
within supervisor mode.
|
|
|
|
% Should break enable be a supervisor mode bit, while the break enable bit
|
|
% in user mode is a break has taken place bit?
|
|
%
|
|
|
This functionality was added to enable an external debugger to
|
This functionality was added to enable an external debugger to
|
set and manage breakpoints.
|
set and manage breakpoints.
|
|
|
The ninth bit is reserved for a floating point enable bit. When set, the
|
The ninth bit is reserved for a floating point enable bit. When set, the
|
arithmetic for the next instruction will be sent to a floating point unit.
|
arithmetic for the next instruction will be sent to a floating point unit.
|
Line 414... |
Line 421... |
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
|
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
|
\end{center}
|
\end{center}
|
\end{table}
|
\end{table}
|
There is no condition code for less than or equal, not C or not V. Sorry,
|
There is no condition code for less than or equal, not C or not V. Sorry,
|
I ran out of space in 3--bits. Using these conditions will take an extra
|
I ran out of space in 3--bits. Using these conditions will take an extra
|
instruction. (Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
|
instruction and a pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
|
|
|
\section{Operand B}
|
\section{Operand B}
|
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
|
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
|
This Operand B is either equal to a register plus a signed immediate offset,
|
This Operand B is either equal to a register plus a signed immediate offset,
|
or an immediate offset by itself. This value is encoded as shown in
|
or an immediate offset by itself. This value is encoded as shown in
|
Line 443... |
Line 450... |
immediate address. Addresses are therefore encoded in the same fashion as
|
immediate address. Addresses are therefore encoded in the same fashion as
|
Operand B's, shown above.
|
Operand B's, shown above.
|
|
|
A lot of long hard thought was put into whether to allow pre/post increment
|
A lot of long hard thought was put into whether to allow pre/post increment
|
and decrement addressing modes. Finding no way to use these operators without
|
and decrement addressing modes. Finding no way to use these operators without
|
taking two or more clocks per instruction, these addressing modes have been
|
taking two or more clocks per instruction,\footnote{The two clocks figure
|
|
comes from the design of the register set, allowing only one write per clock.
|
|
That write is either from the memory unit or the ALU, but never both.} these
|
|
addressing modes have been
|
removed from the realm of possibilities. This means that the Zip CPU has no
|
removed from the realm of possibilities. This means that the Zip CPU has no
|
native way of executing push, pop, return, or jump to subroutine operations.
|
native way of executing push, pop, return, or jump to subroutine operations.
|
Each of these instructions can be emulated with a set of instructions from the
|
Each of these instructions can be emulated with a set of instructions from the
|
existing set.
|
existing set.
|
|
|
Line 482... |
Line 492... |
rule that the register cannot be the PC or CC registers. The PC register
|
rule that the register cannot be the PC or CC registers. The PC register
|
field has been stolen to create a multiply by immediate instruction. The
|
field has been stolen to create a multiply by immediate instruction. The
|
CC register field is reserved.
|
CC register field is reserved.
|
|
|
\section{Floating Point}
|
\section{Floating Point}
|
The ZIP CPU does not support floating point operations today. However, the
|
The ZIP CPU does not support floating point operations. However, the
|
instruction set reserves a capability for a floating point operation. To
|
instruction set reserves two possibilities for future floating point
|
execute such an operation, simply set the floating point bit in the CC
|
operations.
|
register and the following instruction will interpret its registers as
|
|
a floating point instruction. Not all instructions, however, have floating
|
The first floating point operation hole in the instruction set involves
|
point equivalents. Further, the immediate fields do not apply in floating
|
setting the floating point bit in the CC register. The next instruction
|
point mode, and must be set to zero. Not all instructions make sense as
|
will simply interpret its operands as floating point values.
|
floating point operations. Therefore, only the CMP, SUB, ADD, and MPY
|
Not all instructions, however, have floating point equivalents. Further, the
|
instructions may be issued as floating point instructions. Other instructions
|
immediate fields do not apply in floating point mode, and must be set to
|
allow the examining of the floating point bit in the CC register. In all
|
zero. Not all instructions make sense as floating point operations.
|
cases, the floating point bit is cleared one instruction after it is set.
|
Therefore, only the CMP, SUB, ADD, and MPY instructions may be issued as
|
|
floating point instructions. Other instructions allow the examining of the
|
|
floating point bit in the CC register. In all cases, the floating point bit
|
|
is cleared one instruction after it is set.
|
|
|
|
The other possibility for floating point operations involves exploiting the
|
|
hole in the instruction set that the NOOP and BREAK instructions reside within.
|
|
These two instructions use 24--bits of address space. A simple adjustment
|
|
to this space could create instructions with 4--bit register addresses for
|
|
each register, a 3--bit field for conditional execution, and a 2--bit field
|
|
for which operation. In this fashion, such a floating point capability would
|
|
only fill 13--bits of the 24--bit field, still leaving lots of room for
|
|
expansion.
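
The bit budget described above can be checked with a short sketch. The field
widths come from the text; the field order, the names {\tt FIELDS}, {\tt pack}
and {\tt unpack}, and the choice of Python are assumptions made purely for
illustration, since no encoding has been fixed for this hole.

```python
# Field widths taken from the text: two 4-bit register addresses, a 3-bit
# condition, and a 2-bit operation selector.  The order is hypothetical.
FIELDS = (("op", 2), ("cond", 3), ("ra", 4), ("rb", 4))
HOLE_BITS = 24  # size of the NOOP/BREAK hole, per the text


def pack(**values):
    """Pack the named fields, most significant first, into one integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Inverse of pack(): recover the field values from the integer."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


USED_BITS = sum(width for _, width in FIELDS)  # bits the proposal consumes
SPARE_BITS = HOLE_BITS - USED_BITS             # bits left for expansion
```

Under these assumptions the four fields consume 13 of the 24 bits, which
matches the figure quoted above.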
|
|
|
|
In both cases, the Zip CPU would support 32--bit single precision floats
|
|
only.
|
|
|
The architecture does not support a floating point not-implemented interrupt.
|
The current architecture does not support a floating point not-implemented
|
Any soft floating point emulation must be done deliberately.
|
interrupt. Any soft floating point emulation must be done deliberately.
|
|
|
\section{Native Instructions}
|
\section{Native Instructions}
|
The instruction set for the Zip CPU is summarized in
|
The instruction set for the Zip CPU is summarized in
|
Tbl.~\ref{tbl:zip-instructions}.
|
Tbl.~\ref{tbl:zip-instructions}.
|
\begin{table}\begin{center}
|
\begin{table}\begin{center}
|
Line 592... |
Line 617... |
STO & \multicolumn{4}{l|}{4'h7}
|
STO & \multicolumn{4}{l|}{4'h7}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{4}{l|}{D. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B address}
|
& \multicolumn{21}{l|}{Operand B address}
|
& \\\hline
|
& \\\hline
|
{\em Rsrd} & \multicolumn{4}{l|}{4'h8}
|
|
& \multicolumn{4}{l|}{R. Reg}
|
|
& \multicolumn{3}{l|}{Cond.}
|
|
& 1'b0
|
|
& \multicolumn{20}{l|}{Reserved}
|
|
& Yes \\\hline
|
|
SUB & \multicolumn{4}{l|}{4'h8}
|
SUB & \multicolumn{4}{l|}{4'h8}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& 1'b1
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{4}{l|}{Reg}
|
|
& \multicolumn{16}{l|}{16'bit signed offset}
|
|
& Yes \\\hline
|
& Yes \\\hline
|
AND & \multicolumn{4}{l|}{4'h9}
|
AND & \multicolumn{4}{l|}{4'h9}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{4}{l|}{R. Reg}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{3}{l|}{Cond.}
|
& \multicolumn{21}{l|}{Operand B}
|
& \multicolumn{21}{l|}{Operand B}
|
Line 646... |
Line 663... |
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
|
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
|
\end{center}\end{table}
|
\end{center}\end{table}
|
|
|
As you can see, there's lots of room for instruction set expansion. The
|
As you can see, there's lots of room for instruction set expansion. The
|
NOOP and BREAK instructions are the only instructions within one particular
|
NOOP and BREAK instructions are the only instructions within one particular
|
24--bit hole. Likewise, the subtract leaves half of its space open, since a
|
24--bit hole. These spaces are reserved for future enhancements. For example,
|
subtract immediate is the same as an add with a negated immediate. This
|
floating point operations, consisting of a 3-bit floating point operation,
|
spaces are reserved for future enhancements.
|
two 4-bit registers, no immediate offset, and a 3-bit condition would fit
|
|
nicely into 14--bits of this address space--making it so that the floating
|
|
point bit in the CC register need not be used.
|
|
|
\section{Derived Instructions}
|
\section{Derived Instructions}
|
The ZIP CPU supports many other common instructions, but not all of them
|
The ZIP CPU supports many other common instructions, but not all of them
|
are single cycle instructions. The derived instruction tables,
|
are single cycle instructions. The derived instruction tables,
|
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
|
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
|
Line 860... |
Line 879... |
\caption{Derived Instructions, continued}\label{tbl:derived-3}
|
\caption{Derived Instructions, continued}\label{tbl:derived-3}
|
\end{center}\end{table}
|
\end{center}\end{table}
|
\iffalse
|
\iffalse
|
\fi
|
\fi
|
\section{Pipeline Stages}
|
\section{Pipeline Stages}
|
|
As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},
|
|
the Zip CPU supports a five stage pipeline.
|
\begin{enumerate}
|
\begin{enumerate}
|
\item {\bf Prefetch}: Read instruction from memory (cache if possible). This
|
\item {\bf Prefetch}: Read instruction from memory (cache if possible). This
|
stage is actually pipelined itself, and so it will stall if the PC
|
stage is actually pipelined itself, and so it will stall if the PC
|
ever changes. Stalls are also created here if the instruction isn't
|
ever changes. Stalls are also created here if the instruction isn't
|
in the prefetch cache.
|
in the prefetch cache.
|
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
|
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
|
immediate offset.
|
immediate offset. This stage also determines whether the flags will
|
|
be set or whether the result will be written back.
|
\item {\bf Read Operands}: Read registers and apply any immediate values to
|
\item {\bf Read Operands}: Read registers and apply any immediate values to
|
them. There is no means of detecting or flagging arithmetic overflow
|
them. There is no means of detecting or flagging arithmetic overflow
|
or carry when adding the immediate to the operand. This stage will
|
or carry when adding the immediate to the operand. This stage will
|
stall if any source operand is pending.
|
stall if any source operand is pending.
|
A proper optimizing compiler, therefore, will schedule an instruction
|
|
between the instruction that produces the result and the instruction
|
|
that uses it.
|
|
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
|
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
|
instruction, and the {\bf MemOps} stage which accomplishes memory
|
instruction, and the {\bf MemOps} stage which accomplishes memory
|
read/write.
|
read/write.
|
\begin{itemize}
|
\begin{itemize}
|
\item Loads stall instructions that access the register until it is
|
\item Loads stall instructions that access the register until it is
|
written to the register set.
|
written to the register set.
|
\item Condition codes are available upon completion
|
\item Condition codes are available upon completion
|
\item Issuing an instruction to the memory while the memory is busy will
|
\item Issuing an instruction to the memory while the memory is busy will
|
stall the bus. If the bus deadlocks, only a reset will
|
stall the entire pipeline. If the bus deadlocks, only a reset
|
release the CPU. (Watchdog timer, anyone?)
|
will release the CPU. (Watchdog timer, anyone?)
|
\item The Zip CPU currently has no means of reading and acting on any
|
\item The Zip CPU currently has no means of reading and acting on any
|
error conditions on the bus.
|
error conditions on the bus.
|
\end{itemize}
|
\end{itemize}
|
\item {\bf Write-Back}: Conditionally write back the result to register set,
|
\item {\bf Write-Back}: Conditionally write back the result to the register
|
applying the condition. This routine is bi-re-entrant: either the
|
set, applying the condition. This routine is bi-re-entrant: either the
|
memory or the simple instruction may request a register write.
|
memory or the simple instruction may request a register write.
|
\end{enumerate}
|
\end{enumerate}
|
|
|
The Zip CPU does not support out of order execution. Therefore, if the memory
|
The Zip CPU does not support out of order execution. Therefore, if the memory
|
unit stalls, every other instruction stalls. Memory stores, however, can take
|
unit stalls, every other instruction stalls. Memory stores, however, can take
|
place concurrently with ALU operations, although memory writes cannot.
|
place concurrently with ALU operations, although memory reads cannot.
|
|
|
\section{Pipeline Logic}
|
\section{Pipeline Logic}
|
How the CPU handles some instruction combinations can be telling when
|
How the CPU handles some instruction combinations can be telling when
|
determining what happens in the pipeline. The following lists some examples:
|
determining what happens in the pipeline. The following lists some examples:
|
\begin{itemize}
|
\begin{itemize}
|
Line 923... |
Line 942... |
R2 get, the value of R1 before the first move or the value of R0?
|
R2 get, the value of R1 before the first move or the value of R0?
|
Placing the value of R0 into R1 requires a pipeline stall, and possibly
|
Placing the value of R0 into R1 requires a pipeline stall, and possibly
|
two, as I have the pipeline designed.
|
two, as I have the pipeline designed.
|
|
|
The ZIP CPU architecture requires that R2 must equal R0 at the end of
|
The ZIP CPU architecture requires that R2 must equal R0 at the end of
|
this operation. This may stall the pipeline 1-2 cycles.
|
this operation. Even better, such combinations do not (normally)
|
|
stall the pipeline.
|
|
|
\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}
|
\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}
|
|
|
|
|
At issue is the same item as above, save that the CMP instruction
|
At issue is the same item as above, save that the CMP instruction
|
Line 942... |
Line 962... |
|
|
At issue is the
|
At issue is the
|
fact that the logic supporting the CC register is more complicated than
|
fact that the logic supporting the CC register is more complicated than
|
the logic supporting any other register.
|
the logic supporting any other register.
|
|
|
The ZIP CPU will stall 1--2 cycles on this instruction, until the
|
The ZIP CPU will stall for a single cycle on this instruction.
|
CC register is valid.
|
|
|
|
\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}
|
\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}
|
|
|
At issue is whether or not the instruction following the jump will
|
At issue is whether or not the instruction following the jump will
|
take place before the jump. In other words, is the MOV to the PC
|
take place before the jump. In other words, is the MOV to the PC
|
Line 991... |
Line 1010... |
Because it isn't clear what would need to be canceled,
|
Because it isn't clear what would need to be canceled,
|
this instruction combination is not recommended.
|
this instruction combination is not recommended.
|
|
|
\item {\bf All issued instructions complete.}
|
\item {\bf All issued instructions complete.}
|
|
|
All stages are filled, or the entire pipeline
|
All stages are filled, or the entire pipeline stalls.
|
stalls.
|
|
|
|
What about debug control? What about
|
What about debug control? What about
|
register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2
|
register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2
|
should place the value of R0 into R2. How do you restart the pipeline
|
should place the value of R0 into R2. How do you restart the pipeline
|
after an interrupt? What address do you use? The last issued
|
after an interrupt? What address do you use? The last issued
|
Line 1014... |
Line 1032... |
|
|
Suggestion: Suppose we load extra information in the two
|
Suggestion: Suppose we load extra information in the two
|
CC register(s) for debugging intermediate pipeline stages?
|
CC register(s) for debugging intermediate pipeline stages?
|
|
|
The next problem, though, is how to deal with the read operand
|
The next problem, though, is how to deal with the read operand
|
pipeline stage needing the result from the register pipeline.a
|
pipeline stage needing the result from the register pipeline.
|
|
|
\item {\bf Memory instructions must complete}
|
\item {\bf Memory instructions must complete}
|
|
|
All instructions that enter into the memory module *must*
|
All instructions that enter into the memory module {\em must}
|
complete. Issued instructions from the prefetch, decode, or operand
|
complete. Issued instructions from the prefetch, decode, or operand
|
read stages may or may not complete. Jumps into code must be valid,
|
read stages may or may not complete. Jumps into code must be valid,
|
so that interrupt returns may be valid. All instructions entering the
|
so that interrupt returns may be valid. All instructions entering the
|
ALU complete.
|
ALU complete.
|
|
|
Line 1037... |
Line 1055... |
result is known. When the flag does go high, anything in the prefetch,
|
result is known. When the flag does go high, anything in the prefetch,
|
decode, and read-op stage will be invalidated.
|
decode, and read-op stage will be invalidated.
|
|
|
\end{itemize}
|
\end{itemize}
|
|
|
|
\section{Pipeline Stalls}
|
|
The processing pipeline can and will stall for a variety of reasons. Some of
|
|
these are obvious, some less so. These reasons are listed below:
|
|
\begin{itemize}
|
|
\item When the prefetch cache is exhausted
|
|
|
|
This should be obvious. If the prefetch cache doesn't have the instruction
|
|
in memory, the entire pipeline must stall until enough of the prefetch cache
|
|
is loaded to support the next instruction.
|
|
|
|
\item While waiting for the pipeline to load following any taken branch, jump,
|
|
return from interrupt or switch to interrupt context (6 clocks)
|
|
|
|
If the PC suddenly changes, the pipeline is subsequently cleared and needs to
|
|
be reloaded. Given that there are five stages to the pipeline, that accounts
|
|
for five of the six delay clocks. The last clock is lost in the prefetch
|
|
stage which needs at least one clock with a valid PC before it can produce
|
|
a new output. Hence, six clocks will always be lost anytime the pipeline needs
|
|
to be cleared.
|
|
|
|
\item When reading from a prior register while also adding an immediate offset
|
|
\begin{enumerate}
|
|
\item\ {\tt OPCODE ?,RA}
|
|
\item\ {\em (stall)}
|
|
\item\ {\tt OPCODE I+RA,RB}
|
|
\end{enumerate}
|
|
|
|
Since the immediate offset within Operand B is added to the register during the
|

read operand stage, so that the sum is settled before the ALU needs it,
|

any instruction that writes back a register must be separated by one instruction
|

from any opcode that reads that register and applies an immediate offset. The
|

good news is that this stall can easily be mitigated by proper scheduling.
|
|
|
|
\item When writing to the CC or PC Register
|
|
\begin{enumerate}
|
|
\item\ {\tt OPCODE RA,PC} {\em Ex: a branch opcode}
|
|
\item\ {\em (stall, even if jump not taken)}
|
|
\item\ {\tt OPCODE RA,RB}
|
|
\end{enumerate}
|
|
Since branches take place in the writeback stage, the Zip CPU will stall the
|
|
pipeline for one clock anytime there may be a possible jump. This prevents
|
|
an instruction from executing a memory access after the jump but before the
|
|
jump is recognized.
|
|
|
|
\item When reading from the CC register after setting the flags
|
|
\begin{enumerate}
|
|
\item\ {\tt ALUOP RA,RB}
|
|
\item\ {\em (stall)}
|
|
\item\ {\tt TST sys.ccv,CC}
|
|
\item\ {\tt BZ somewhere}
|
|
\end{enumerate}
|
|
|
|
The reason for this stall is simply performance. Many of the flags are
|
|
determined via combinatorial logic after the writeback instruction is
|
|
determined. Trying to then place these into the input for one of the operands
|
|
created a time delay loop that would no longer execute in a single 100~MHz
|
|
clock cycle. (The time delay of the multiply within the ALU wasn't helping
|
|
either \ldots).
|
|
|
|
\item When waiting for a memory read operation to complete
|
|
\begin{enumerate}
|
|
\item\ {\tt LOD address,RA}
|
|
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
|
|
\item\ {\tt OPCODE I+RA,RB}
|
|
\end{enumerate}
|
|
|
|
Remember, the ZIP CPU does not support out of order execution. Therefore,
|
|
anytime the memory unit becomes busy both the memory unit and the ALU must
|
|
stall until the memory unit is cleared. This is especially true of a load
|
|
instruction, which will write its operand back to the register file. Store
|
|
instructions are different, since they can be busy with no impact on later
|
|
ALU write back operations. Hence, only loads stall the pipeline.
|
|
|
|
This also assumes that the memory being accessed is a single cycle memory.
|
|
Slower memories, such as the Quad SPI flash, will take longer--perhaps even
|
|
as long as forty clocks. During this time the CPU and the external bus
|
|
will be busy, and unable to do anything else.
|
|
|
|
\item Memory operation followed by a memory operation
|
|
\begin{enumerate}
|
|
\item\ {\tt STO address,RA}
|
|
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
|
|
\item\ {\tt LOD address,RB}
|
|
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
|
|
\end{enumerate}
|
|
|
|
In this case, the LOD instruction cannot start until the STO is finished.
|
|
With proper scheduling, it is possible to do something in the ALU while the
|
|
STO is busy, but otherwise this pipeline will stall waiting for it to complete.
|
|
|
|
Note that even though the Wishbone bus can support pipelined accesses at
|
|
one access per clock, only the prefetch stage can take advantage of this.
|
|
Load and Store instructions are stuck at one wishbone cycle per instruction.
|
|
\end{itemize}
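
As a rough illustration of the scheduling advice above, the read-after-write
stall on an Operand B register with an immediate offset can be counted with a
toy model. This is only a sketch: the {\tt Insn} record and the
{\tt operand\_stalls} function are hypothetical names, the model covers that
one rule only, and it ignores every other stall source in the list.

```python
from typing import NamedTuple, Optional, Sequence


class Insn(NamedTuple):
    dest: Optional[str]  # register written back, or None
    src: Optional[str]   # register read as Operand B, or None
    has_imm: bool        # True if an immediate offset is added to src


def operand_stalls(program: Sequence[Insn]) -> int:
    """Count one stall for each instruction that reads, with an immediate
    offset, the register written by the instruction just before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if cur.has_imm and cur.src is not None and cur.src == prev.dest:
            stalls += 1
    return stalls


# OPCODE ?,R1 immediately followed by OPCODE I+R1,R2
back_to_back = [Insn("R1", "R0", False), Insn("R2", "R1", True)]

# The same pair with an unrelated instruction scheduled in between
scheduled = [
    Insn("R1", "R0", False),
    Insn("R4", "R3", False),
    Insn("R2", "R1", True),
]
```

In this model the back-to-back pair costs one stall, while the rescheduled
sequence costs none, which is the mitigation the text has in mind.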
|
|
|
|
|
\chapter{Peripherals}\label{chap:periph}
|
\chapter{Peripherals}\label{chap:periph}
|
|
|
While the previous chapter describes a CPU in isolation, the Zip System
|
While the previous chapter describes a CPU in isolation, the Zip System
|
Line 1120... |
Line 1232... |
|
|
The watchdog timer is no different from any of the other timers, save for one
|
The watchdog timer is no different from any of the other timers, save for one
|
critical difference: the interrupt line from the watchdog
|
critical difference: the interrupt line from the watchdog
|
timer is tied to the reset line of the CPU. Hence writing a `1' to the
|
timer is tied to the reset line of the CPU. Hence writing a `1' to the
|
watchdog timer will always reset the CPU.
|
watchdog timer will always reset the CPU.
|
To stop the Watchdog timer, write a '0' to it. To start it,
|
To stop the Watchdog timer, write a `0' to it. To start it,
|
write any other number to it---as with the other timers.
|
write any other number to it---as with the other timers.
|
|
|
While the watchdog timer supports interval mode, it doesn't make as much sense
|
While the watchdog timer supports interval mode, it doesn't make as much sense
|
as it did with the other timers.
|
as it did with the other timers.
|
|
|
Line 1153... |
Line 1265... |
|
|
The purpose of this register is to support alarm times within a CPU. To
|
The purpose of this register is to support alarm times within a CPU. To
|
set an alarm for a particular process $N$ clocks in advance, read the current
|
set an alarm for a particular process $N$ clocks in advance, read the current
|
Jiffies value, add $N$, and write it back to the Jiffies register. The
|
Jiffies value, add $N$, and write it back to the Jiffies register. The
|
O/S must also keep track of values written to the Jiffies register. Thus,
|
O/S must also keep track of values written to the Jiffies register. Thus,
|
when an `alarm' trips, it should be remoed from the list of alarms, the list
|
when an `alarm' trips, it should be removed from the list of alarms, the list
|
should be sorted, and the next alarm in terms of Jiffies should be written
|
should be sorted, and the next alarm in terms of Jiffies should be written
|
to the register.
|
to the register.
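
The bookkeeping this asks of the O/S might look something like the sketch
below. It is not taken from any Zip CPU software: the {\tt AlarmList} class and
its method names are hypothetical, and it assumes that the counter wraps at
32 bits and that the nearest alarm is the one with the smallest forward
distance from the current Jiffies value.

```python
MASK = 0xFFFFFFFF  # assumption: the 32-bit Jiffies counter wraps around


class AlarmList:
    """Pending alarm deadlines, kept by the O/S alongside the register."""

    def __init__(self):
        self.deadlines = []

    def next_write(self, now):
        """Deadline nearest to `now`, i.e. the value to write next, or None."""
        if not self.deadlines:
            return None
        # Compare forward distances so that wrapped deadlines order correctly.
        return min(self.deadlines, key=lambda d: (d - now) & MASK)

    def set_alarm(self, now, n):
        """Add an alarm n clocks after `now`; return the value to write."""
        self.deadlines.append((now + n) & MASK)
        return self.next_write(now)

    def alarm_tripped(self, tripped):
        """Drop the deadline that just tripped; return the next value."""
        self.deadlines.remove(tripped)
        return self.next_write(tripped)
```

Taking the minimum forward distance stands in for keeping the list sorted,
which is enough for a handful of alarms.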
|
|
|
\section{Manual Cache}
|
\section{Manual Cache}
|
|
|
The manual cache is an experimental setting that may not remain with the Zip
|
The manual cache is an experimental setting that may not remain with the Zip
|
CPU for very long. It is designed to facilitate running from FLASH or ROM
|
CPU for very long. It is designed to facilitate running from FLASH or ROM
|
memory, although the pipe cache really makes this need obsolete. The manual
|
memory, although the pipeline prefetch cache really makes this need obsolete.
|
|
The manual
|
cache works by copying data from a wishbone address (range) into the cache
|
cache works by copying data from a wishbone address (range) into the cache
|
register, and then by making that memory available as memory to the Zip System.
|
register, and then by making that memory available as memory to the Zip System.
|
It is a {\em manual cache} because the processor must first specify what
|
It is a {\em manual cache} because the processor must first specify what
|
memory to copy, and then once copied the processor can only access the cache
|
memory to copy, and then once copied the processor can only access the cache
|
memory by the cache memory location. There is no transparency. It is perhaps
|
memory by the cache memory location. There is no transparency. It is perhaps
|
Line 1181... |
Line 1294... |
|
|
The ZipSystem registers fall into two categories, ZipSystem internal registers
|
The ZipSystem registers fall into two categories, ZipSystem internal registers
|
accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},
|
accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},
|
\begin{table}[htbp]
|
\begin{table}[htbp]
|
\begin{center}\begin{reglist}
|
\begin{center}\begin{reglist}
|
PIC & {\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
|
PIC & \scalebox{0.8}{\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
|
WDT & {\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
|
WDT & \scalebox{0.8}{\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
|
CCHE & {\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
|
CCHE & \scalebox{0.8}{\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
|
CTRIC & {\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
|
CTRIC & \scalebox{0.8}{\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
|
TMRA & {\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
|
TMRA & \scalebox{0.8}{\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
|
TMRB & {\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
|
TMRB & \scalebox{0.8}{\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
|
TMRC & {\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
|
TMRC & \scalebox{0.8}{\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
|
JIFF & {\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
|
JIFF & \scalebox{0.8}{\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
|
MTASK & {\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
|
MTASK & \scalebox{0.8}{\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
|
MMSTL & {\tt 0xc0000008} & 32 & R/W & Master Stall Counter \\\hline
|
MMSTL & \scalebox{0.8}{\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline
|
MPSTL & {\tt 0xc0000008} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
|
MPSTL & \scalebox{0.8}{\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
|
MICNT & {\tt 0xc0000008} & 32 & R/W & Master Instruction Counter\\\hline
|
MICNT & \scalebox{0.8}{\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline
|
UTASK & {\tt 0xc0000008} & 32 & R/W & User Task Clock Counter \\\hline
|
UTASK & \scalebox{0.8}{\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline
|
UMSTL & {\tt 0xc0000008} & 32 & R/W & User Stall Counter \\\hline
|
UMSTL & \scalebox{0.8}{\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline
|
UPSTL & {\tt 0xc0000008} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
|
UPSTL & \scalebox{0.8}{\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
|
UICNT & {\tt 0xc0000008} & 32 & R/W & User Instruction Counter\\\hline
|
UICNT & \scalebox{0.8}{\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline
|
Cache & {\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
|
% Cache & \scalebox{0.8}{\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
|
\end{reglist}
|
\end{reglist}
|
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
|
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
|
\end{center}\end{table}
|
\end{center}\end{table}
|
and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.
|
and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.
|
\begin{table}[htbp]
|
\begin{table}[htbp]
|
Line 1212... |
Line 1325... |
\caption{Zip System Debug Registers}\label{tbl:dbgregs}
|
\caption{Zip System Debug Registers}\label{tbl:dbgregs}
|
\end{center}\end{table}
|
\end{center}\end{table}
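
For software that needs the internal register addresses of
Tbl.~\ref{tbl:zpregs}, the map can be written compactly, since the sixteen
registers occupy consecutive word addresses starting at {\tt 0xc0000000}. The
sketch below is only an illustration; the names {\tt ZIPSYS\_BASE},
{\tt ZIPSYS\_REGS} and {\tt reg\_address} are assumptions, not part of any
Zip System header.

```python
ZIPSYS_BASE = 0xC0000000  # address of the first register (PIC) in the table

# Register names in the order they appear in the table.
ZIPSYS_REGS = (
    "PIC", "WDT", "CCHE", "CTRIC",
    "TMRA", "TMRB", "TMRC", "JIFF",
    "MTASK", "MMSTL", "MPSTL", "MICNT",
    "UTASK", "UMSTL", "UPSTL", "UICNT",
)


def reg_address(name):
    """Return the address of the named Zip System register."""
    return ZIPSYS_BASE + ZIPSYS_REGS.index(name)
```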
|
|
|
|
|
\chapter{Wishbone Datasheet}\label{chap:wishbone}
|
\chapter{Wishbone Datasheet}\label{chap:wishbone}
|
The Zip System supports two wishbone accesses, a slave debug port and a master
|
The Zip System supports two wishbone ports, a slave debug port and a master
|
port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}
|
port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}
|
\begin{table}[htbp]
|
\begin{table}[htbp]
|
\begin{center}
|
\begin{center}
|
\begin{wishboneds}
|
\begin{wishboneds}
|
Revision level of wishbone & WB B4 spec \\\hline
|
Revision level of wishbone & WB B4 spec \\\hline
|
Line 1279... |
Line 1392... |
it choose to access a value not on the bus, or a peripheral that is not
|
it choose to access a value not on the bus, or a peripheral that is not
|
yet properly configured.
|
yet properly configured.
|
|
|
\chapter{Clocks}\label{chap:clocks}
|
\chapter{Clocks}\label{chap:clocks}
|
|
|
This core is based upon the Basys--3 design. The Basys--3 development board
|
This core is based upon the Basys--3 development board sold by Digilent.
|
contains one external 100~MHz clock, which is sufficient to run the ZIP CPU
|
The Basys--3 development board contains one external 100~MHz clock, which is
|
core.
|
sufficient to run the ZIP CPU core.
|
\begin{table}[htbp]
|
\begin{table}[htbp]
|
\begin{center}
|
\begin{center}
|
\begin{clocklist}
|
\begin{clocklist}
|
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
|
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
|
\end{clocklist}
|
\end{clocklist}
|