URL: https://opencores.org/ocsvn/zipcpu/zipcpu/trunk
Subversion Repositories zipcpu
Compare Revisions
This comparison shows the changes necessary to convert path /zipcpu from Rev 32 to Rev 33.
Rev 32 → Rev 33
/trunk/doc/spec.pdf
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
/trunk/doc/src/spec.tex
5,7 → 5,7
%% Project: Zip CPU -- a small, lightweight, RISC CPU soft core
%%
%% Purpose: This LaTeX file contains all of the documentation/description
%% currently provided with this Zip CPU soft core. It supercedes
%% currently provided with this Zip CPU soft core. It supersedes
%% any information about the instruction set or CPUs found
%% elsewhere. It's not nearly as interesting, though, as the PDF
%% file it creates, so I'd recommend reading that before diving
48,7 → 48,7
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.2}
\revision{Rev.~0.3}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
70,6 → 70,7
copy.
\end{license}
\begin{revisionhistory}
0.3 & 8/22/2015 & Gisselquist & First completed draft\\\hline
0.2 & 8/19/2015 & Gisselquist & Still Draft, more complete \\\hline
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
\end{revisionhistory}
91,7 → 92,7
I would like to be able to generate Verilog code that can run equivalently
on both Xilinx and Altera chips, and that can be easily ported from one
manufacturer's chipsets to another. Even more, before purchasing a chip or a
board, I would like to know that my chip works. I would like to build a test
board, I would like to know that my soft core works. I would like to build a test
bench to test components with, and Verilator is my chosen test bench. This
forces me to use all Verilog, and it prevents me from using any proprietary
cores. For this reason, Microblaze and Nios are out of the question.
159,7 → 160,7
as simple as I originally hoped. Worse, I've had to adjust to create
capabilities that I was never expecting to need. These include:
\begin{itemize}
\item {\bf Extenal Debug:} Once placed upon an FPGA, some external means is
\item {\bf External Debug:} Once placed upon an FPGA, some external means is
still necessary to debug this CPU. That means that there needs to be
an external register that can control the CPU: reset it, halt it, step
it, and tell whether it is running or not. My chosen interface
241,10 → 242,10
enters into either the ALU or memory unit, the instruction is
guaranteed to complete. If the logic recognizes a branch or a
condition that would render the instruction entering into this stage
possibly inappropriate (i.e. a conditional branch preceeding a store
possibly inappropriate (i.e. a conditional branch preceding a store
instruction for example), then the pipeline stalls for one cycle
until the conditional branch completes. Then, if it generates a new
PC address, the stages preceeding are all wiped clean.
PC address, the stages preceding are all wiped clean.

The discrete execution model allows such things as sleeping: if the
CPU is put to ``sleep,'' the ALU and memory stages stall and back up
321,7 → 322,7
10 bits of the status register form a set of CPU state and condition codes.
Writes to other bits of this register are preserved.

Of the eight condition codes, the bottom four are the current flags:
Of the condition codes, the bottom four bits are the current flags:
Zero (Z),
Carry (C),
Negative (N),
331,6 → 332,7
the CPU to sleep). Setting this bit will cause the CPU to
wait for an interrupt (if interrupts are enabled), or to
completely halt (if interrupts are disabled).

The sixth bit is a global interrupt enable bit (GIE). When this
sixth bit is a `1' interrupts will be enabled, else disabled. When
interrupts are disabled, the CPU will be in supervisor mode, otherwise
387,7 → 389,7
\begin{center}
\begin{tabular}{l|l}
Bit & Meaning \\\hline
9 & Soft trap, set on a trap from user mode, cleared when returing to user mode\\\hline
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
8 & (Reserved for) Floating point enable \\\hline
7 & Halt on break, to support an external debugger \\\hline
6 & Step, single step the CPU in user mode\\\hline
439,8 → 441,8
\caption{Bit allocation for Operand B}\label{tbl:opb}
\end{center}\end{table}

Sixteen and twenty bit immediates don't make sense for all instructions. For
example, what is the point of a 20--bit immediate when executing a 16--bit
Sixteen and twenty bit immediate values don't make sense for all instructions.
For example, what is the point of a 20--bit immediate when executing a 16--bit
multiply? Likewise, why have a 16--bit immediate when adding to a logical
or arithmetic shift? In these cases, the extra bits are reserved for future
instruction possibilities.
647,17 → 649,17
LSL/ASL & \multicolumn{4}{l|}{4'hd}
& \multicolumn{4}{l|}{R. Reg}
& \multicolumn{3}{l|}{Cond.}
& \multicolumn{21}{l|}{Operand B, imm. trucated to 6 bits}
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
& Yes \\\hline
ASR & \multicolumn{4}{l|}{4'he}
& \multicolumn{4}{l|}{R. Reg}
& \multicolumn{3}{l|}{Cond.}
& \multicolumn{21}{l|}{Operand B, imm. trucated to 6 bits}
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
& Yes \\\hline
LSR & \multicolumn{4}{l|}{4'hf}
& \multicolumn{4}{l|}{R. Reg}
& \multicolumn{3}{l|}{Cond.}
& \multicolumn{21}{l|}{Operand B, imm. trucated to 6 bits}
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
& Yes \\\hline
\end{tabular}
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
685,7 → 687,7
& \parbox[t]{1.5in}{Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry}
& Add with carry \\\hline
BRA.Cond +/-\$Addr
& \hbox{Mov.cond \$Addr+PC,PC}
& \hbox{MOV.cond \$Addr+PC,PC}
& Branch or jump on condition. Works for 15--bit
signed address offsets.\\\hline
BRA.Cond +/-\$Addr
692,7 → 694,7
& \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC}
& Branch/jump on condition. Works for
23 bit address offsets, but costs a register, an extra instruction,
and setsthe flags. \\\hline
and sets the flags. \\\hline
BNC PC+\$Addr
& \parbox[t]{1.5in}{Test \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
& Example of a branch on an unsupported
711,7 → 713,7
HALT
& Or \$SLEEP,CC
& Executed while in interrupt mode. In user mode this is simply a
wait until interrupt instructioon. \\\hline
wait until interrupt instruction. \\\hline
INT & LDI \$0,CC
& Since we're using the CC register as a trap vector as well, this
executes TRAP \#0. \\\hline
776,7 → 778,7
OR.C \$1,Ry}
& Logical shift left with carry. Note that the
instruction order is now backwards, to keep the conditions valid.
That is, LSL sets the carry flag, so if we did this the othe way
That is, LSL sets the carry flag, so if we did this the other way
with Rx before Ry, then the condition flag wouldn't have been right
for an OR correction at the end. \\\hline
\parbox[t]{1.5in}{LSR \$1,Rx \\ LSRC \$1,Ry}
798,10 → 800,10
& Note
that for interrupt purposes, one can never depend upon the value at
(SP). Hence you read from it, then increment it, lest having
incremented it firost something then comes along and writes to that
incremented it first something then comes along and writes to that
value before you can read the result. \\\hline
PUSH Rx
& \parbox[t]{1.5in}{SUB \$1,SPa \\
& \parbox[t]{1.5in}{SUB \$1,SP \\
STO Rx,\$1(SP)}
& \\\hline
RESET
917,6 → 919,8
unit stalls, every other instruction stalls. Memory stores, however, can take
place concurrently with ALU operations, although memory reads cannot.

\iffalse

\section{Pipeline Logic}
How the CPU handles some instruction combinations can be telling when
determining what happens in the pipeline. The following lists some examples:
923,36 → 927,49
\begin{itemize}
\item {\bf Delayed Branching}

I had originally hoped to implement delayed branching. However, what
happens in debug mode?
That is, what happens when a debugger tries to single step an
instruction? While I can easily single step the computer in either
user or supervisor mode from externally, this processor does not appear
able to step the CPU in user mode from within user mode--gosh, not even
from within supervisor mode--such as if a process had a debugger
attached. As the processor exists, I would have one result stepping
the CPU from a debugger, and another stepping it externally.
I had originally hoped to implement delayed branching. My goal
was that the compiler would handle any pipeline stall conditions so
that the pipeline logic could be simpler within the CPU. I ran into
two problems with this.

This is unacceptable, and so this CPU does not support delayed
branching.
The first problem has to do with debug mode. When the debugger
single steps an instruction, that instruction goes to completion.
This means that if the instruction moves a value to the PC register,
the PC register would now contain that value, indicating that the
next instruction would be on the other side of the branch. There's
just no easy way around this: the entire CPU state must be captured
by the registers, to include the program counter. What value should
the program counter be equal to? The branch? Fine. The address
you are branching to? Fine. The address of the delay slot? Problem.

The second problem with delayed branching is the idea of suspending
processing for an interrupt. Which address should the CPU return
to upon completing the interrupt processing? The branch? Good. The
address after the branch? Also good. The address of the delay slot?
Not so good.

If you then add into this mess the idea that, if the CPU is running
from a really slow memory such as the flash, the delay slot may never
be filled before the branch is determined, then this makes even less
sense.

For all of these reasons, this CPU does not support delayed branching.

\item {\bf Register Result:} {\tt MOV R0,R1; MOV R1,R2 }

What value does
R2 get, the value of R1 before the first move or the value of R0?
Placing the value of R0 into R1 requires a pipeline stall, and possibly
two, as I have the pipeline designed.
What value does R2 get, the value of R1 before the first move or the
value of R0? The Zip CPU has been optimized so that neither of these
instructions requires a pipeline stall--unless an immediate were to
be added to R1 in the second instruction.

The ZIP CPU architecture requires that R2 must equal R0 at the end of
this operation. Even better, such combinations do not (normally)
stall the pipeline.

\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}
\item {\bf Condition Codes Result:} {\tt CMP R0,R1;} {\tt MOV.EQ \$x,PC}


At issue is the same item as above, save that the CMP instruction
updates the flags that the MOV instruction depends
upon.
updates the flags that the MOV instruction depends upon.

The Zip CPU architecture requires that condition codes must be updated
and available immediately for the next instruction without stalling the
965,15 → 982,11
the logic supporting any other register.

The ZIP CPU will stall for a cycle on this instruction.
\item {\bf Condition Codes Register Operand:} {\tt MOV R0,R1; MOV CC,R2}

\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}

At issues is whether or not the instruction following the jump will
take place before the jump. In other words, is the MOV to the PC
register handled differently from an ADD to the PC register?

In the Zip architecture, MOV'es and ADD's use the same logic
(simplifies the logic).
Unlike the previous case, this move prior to reading the {\tt CC}
register does not impact the {\tt CC} register. Therefore, this
does not stall the bus, whereas the previous one would.
\end{itemize}

As I've studied this, I find several approaches to handling pipeline
980,7 → 993,7
issues. These approaches (and their consequences) are listed below.

\begin{itemize}
\item {\bf All All issued instructions complete, Stages stall individually}
\item {\bf All issued instructions complete, stages stall individually}

What about a slow pre-fetch?
|
995,47 → 1008,32
or a full pipeline if reading from cache. Each of these approaches
would produce a different response.

For this reason, the Zip CPU works off of a different basis: All
instructions that enter either the ALU or the memory unit will
complete. Stages still stall individually.

\item {\bf Issued instructions may be canceled}

Stages stall individually

First problem:
Memory operations cannot be canceled, even reads may have side effects
The problem here is that
memory operations cannot be canceled: even reads may have side effects
on peripherals that cannot be canceled later. Further, in the case of
an interrupt, it's difficult to know what to cancel. What happens in
a \hbox{\tt MOV.C \$x,PC} followed by a \hbox{\tt MOV \$y,PC}
instruction? Which get
canceled?
instruction? Which get canceled?

Because it isn't clear what would need to be canceled,
this instruction combination is not recommended.
Because it isn't clear what would need to be canceled, the Zip CPU
will not permit this combination. A MOV to the PC register will be
followed by a stall, and possibly many stalls, so that the second
move to PC will never be executed.

\item {\bf All issued instructions complete.}

All stages are filled, or the entire pipeline stalls.
In this approach, all issued instructions complete, but the
entire pipeline stalls if one stage is not filled. Here, though,
we again struggle with the problems associated with
delayed branching. Upon attempting to restart the processor, where
do you restart it from?

What about debug control? What about
register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2
should place the value of R0 into R2. How do you restart the pipeline
after an interrupt? What address do you use? The last issued
instruction? But the branch delay slots may make that invalid!

Reading from the CPU debug port in this case yields inconsistent
results: the CPU will halt or step with instructions stuck in the
pipeline. Reading registers will give no indication of what is going
on in the pipeline, just the results of completed operations, not of
operations that have been started and not yet completed.
Perhaps we should just report the state of the CPU based upon what
instructions (PC values) have successfully completed? Thus the
debug instruction is the one that will write registers on the next
clock.

Suggestion: Suppose we load extra information in the two
CC register(s) for debugging intermediate pipeline stages?

The next problem, though, is how to deal with the read operand
pipeline stage needing the result from the register pipeline.

\item {\bf Memory instructions must complete}

All instructions that enter into the memory module {\em must}
1056,6 → 1054,7
decode, and read-op stage will be invalidated.

\end{itemize}
\fi

\section{Pipeline Stalls}
The processing pipeline can and will stall for a variety of reasons. Some of
1101,6 → 1100,8
an instruction from executing a memory access after the jump but before the
jump is recognized.

This stall cannot be mitigated through scheduling.

\item When reading from the CC register after setting the flags
\begin{enumerate}
\item\ {\tt ALUOP RA,RB}
1116,6 → 1117,13
clock cycle. (The time delay of the multiply within the ALU wasn't helping
either \ldots).

This stall may be eliminated via proper scheduling, by placing an instruction
that does not set flags in between the ALU operation and the instruction
that references the CC register. For example, {\tt MOV \$addr+PC,uPC}
followed by an {\tt RTU} ({\tt OR \$GIE,CC}) instruction will not incur
this stall, whereas an {\tt OR \$BREAKEN,CC} followed by an {\tt OR \$STEP,CC}
will incur the stall.

\item When waiting for a memory read operation to complete
\begin{enumerate}
\item\ {\tt LOD address,RA}
1126,13 → 1134,13
Remember, the ZIP CPU does not support out of order execution. Therefore,
anytime the memory unit becomes busy both the memory unit and the ALU must
stall until the memory unit is cleared. This is especially true of a load
instruction, which will write its operand back to the register file. Store
instructions are different, since they can be busy with no impact on later
ALU write back operations. Hence, only loads stall the pipeline.
instruction, which must still write its operand back to the register file.
Store instructions are different, since they can be busy with no impact on
later ALU write back operations. Hence, only loads stall the pipeline.

This also assumes that the memory being accessed is a single cycle memory.
Slower memories, such as the Quad SPI flash, will take longer--perhaps even
as long as fourty clocks. During this time the CPU and the external bus
as long as forty clocks. During this time the CPU and the external bus
will be busy, and unable to do anything else.

\item Memory operation followed by a memory operation
1290,6 → 1298,42
|
\chapter{Operation}\label{chap:ops}

The Zip CPU, and even the Zip System, is not a System on a Chip (SoC). It
needs to be connected to its operational environment in order to be used.
Specifically, some per-system adjustments need to be made:
\begin{enumerate}
\item The Zip System depends upon an external 32-bit Wishbone bus. This
must exist, and must be connected to the Zip CPU for it to work.
\item The Zip System needs to be told of its {\tt RESET\_ADDRESS}. This is
the program counter of the first instruction following a reset.
\item If you want the Zip System to start up on its own, you will need to
set the {\tt START\_HALTED} parameter to zero. Otherwise, if you
wish to manually start the CPU, that is if upon reset you want the
CPU to start in its halted, reset state, then set this parameter to
one.
\item The third parameter to set is the number of interrupts you will be
providing to the CPU from external sources. This can be anything from one
to nine, but it cannot be zero. (Wire this line to a 1'b0 if you
do not wish to support any external interrupts.)
\item Finally, you need to place the initial instructions for the CPU at
some Wishbone-accessible address, whether in RAM or (more likely) ROM.
\end{enumerate}
If you have enabled your CPU to start automatically, then upon power up the
CPU will immediately start executing your instructions.

This is, however, not how I have used the Zip CPU. I have instead used the
ZIP CPU in a more controlled environment. For me, the CPU starts in a
halted state, and waits to be told to start. Further, the RESET address is a
location in RAM. After bringing up the board I am using, and the bus that is
on it, I externally load into RAM the program I wish the Zip System to run.
Once the RAM is loaded, I release the CPU.
The CPU then runs until its halt condition, at which point its task is
complete.

Eventually, I intend to place an operating system onto the ZipSystem; I'm
just not there yet.


\chapter{Registers}\label{chap:regs}

The ZipSystem registers fall into two categories, ZipSystem internal registers
1316,7 → 1360,7
\end{reglist}
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
\end{center}\end{table}
and the two debug registers showin in Tbl.~\ref{tbl:dbgregs}.
and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.
\begin{table}[htbp]
\begin{center}\begin{reglist}
ZIPCTRL & 0 & 32 & R/W & Debug Control Register \\\hline
1325,8 → 1369,214
\caption{Zip System Debug Registers}\label{tbl:dbgregs}
\end{center}\end{table}

\section{Peripheral Registers}
The peripheral registers, listed in Tbl.~\ref{tbl:zpregs}, are mapped into the
CPU's address space. These may be accessed by the CPU at these addresses,
and when so accessed will respond as described in Chapt.~\ref{chap:periph}.
These registers will be discussed briefly again here.

\chapter{Wishbone Datasheet}\label{chap:wishbone}
The Zip CPU Interrupt controller has four different types of bits, as shown in
Tbl.~\ref{tbl:picbits}.
\begin{table}\begin{center}
\begin{bitlist}
31 & R/W & Master Interrupt Enable\\\hline
30\ldots 16 & R/W & Interrupt Enables, write '1' to change\\\hline
15 & R & Current Master Interrupt State\\\hline
15\ldots 0 & R/W & Input Interrupt states, write '1' to clear\\\hline
\end{bitlist}
\caption{Interrupt Controller Register Bits}\label{tbl:picbits}
\end{center}\end{table}
The high order bit, or bit--31, is the master interrupt enable bit. When this
bit is set, then any time an interrupt occurs the CPU will be interrupted and
will switch to supervisor mode, etc.

Bits 30~\ldots 16 are interrupt enable bits. Should an interrupt line go
high while its enable bit is set, an interrupt will be generated. To set an
interrupt enable bit, write a `1' to the master interrupt enable while
writing a `1' to the enable bit. To clear it, write a `0' to the master
interrupt enable while still writing a `1' to the enable bit.

Bits 15\ldots 0 are the current state of the interrupt vector. Interrupt lines
trip when they go high, and remain tripped until they are acknowledged. If
the interrupt goes high for longer than one pulse, it may be high when a clear
is requested. If so, the interrupt will not clear. The line must go low
again before the status bit can be cleared.
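To make the trip-and-acknowledge rules above concrete, here is a minimal C model of the controller register. It is a sketch under stated assumptions, not the Verilog implementation. The names `pic_model`, `pic_sample`, `pic_write` and `pic_irq` are hypothetical. Lines are numbered 0 to 14, so that enable bit 16+n pairs with status bit n; bit 15 is left out because the bit table also assigns it to the master interrupt state. The master enable is assumed to take the value of bit 31 on every write.

```c
#include <assert.h>
#include <stdint.h>

#define PIC_MASTER   0x80000000u /* bit 31: master interrupt enable         */
#define PIC_LINEMASK 0x7fffu     /* lines 0..14 (hypothetical numbering)    */

/* Hypothetical software model of the interrupt controller register. */
typedef struct {
    int      master;  /* master interrupt enable                        */
    uint32_t enables; /* enable bits 30..16, shifted down to 14..0      */
    uint32_t pending; /* tripped status bits 14..0                      */
} pic_model;

/* Any input line that is high trips its status bit, which then stays set. */
static void pic_sample(pic_model *p, uint32_t lines)
{
    assert(p);
    p->pending |= lines & PIC_LINEMASK;
}

/* Apply one bus write; 'lines' is the state of the inputs at that moment. */
static void pic_write(pic_model *p, uint32_t word, uint32_t lines)
{
    uint32_t sel = (word >> 16) & PIC_LINEMASK; /* enables being changed */
    uint32_t ack = word & PIC_LINEMASK;         /* status bits to clear  */

    assert(p);
    if (word & PIC_MASTER)
        p->enables |= sel;  /* master bit '1': each '1' sets an enable   */
    else
        p->enables &= ~sel; /* master bit '0': each '1' clears an enable */
    p->master = (word & PIC_MASTER) != 0;

    /* A status bit cannot be cleared while its input line is still high. */
    p->pending &= ~(ack & ~lines);
}

/* Interrupt the CPU when the master enable is set and an enabled line tripped. */
static int pic_irq(const pic_model *p)
{
    return p->master && (p->pending & p->enables) != 0;
}
```

In this model, acknowledging a line while it is still high leaves its status bit set. That is the case the paragraph above warns about.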
|
As an example, consider the following scenario where the Zip CPU supports four
interrupts, 3\ldots0.
\begin{enumerate}
\item The Supervisor will first, while in the interrupts disabled mode,
write a {\tt 32'h800f000f} to the controller. The supervisor may then
switch to the user state with interrupts enabled.
\item When an interrupt occurs, the supervisor will switch to the interrupt
state. It will then cycle through the interrupt bits to learn which
interrupt handler to call.
\item If the interrupt handler expects more interrupts, it will clear its
current interrupt when it is done handling the interrupt in question.
To do this, it will write a '1' to the low order interrupt mask,
such as writing a {\tt 32'h80000001}.
\item If the interrupt handler does not expect any more interrupts, it will
instead both disable and clear the interrupt by writing a
{\tt 32'h00010001} to the controller.
\item Once all interrupts have been handled, the supervisor will write a
{\tt 32'h80000000} to the interrupt register to re-enable interrupt
generation.
\item The supervisor should also check the user trap bit, and any soft
interrupt bits here, but this action has nothing to do with the
interrupt control register.
\item The supervisor will then leave interrupt mode, possibly adjusting
whichever task is running, by executing a return from interrupt
command.
\end{enumerate}
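Each of the words written in the scenario above can be built from its fields. The helper names below are hypothetical. The bit positions come from the interrupt controller bit table, and each helper reproduces one of the example values for interrupt 0 or for the four-interrupt mask.

```c
#include <assert.h>
#include <stdint.h>

#define PIC_MASTER 0x80000000u /* bit 31: master interrupt enable */

/* Step 1: enable the lines in 'mask' and clear any stale status bits for them. */
static uint32_t pic_enable_word(uint32_t mask)
{
    assert((mask & ~0x7fffu) == 0); /* only 15 enable bits (30..16) exist */
    return PIC_MASTER | (mask << 16) | mask;
}

/* Step 3: acknowledge line n, leaving the master enable set. */
static uint32_t pic_ack_word(unsigned n)
{
    assert(n < 15);
    return PIC_MASTER | (1u << n);
}

/* Step 4: disable line n and acknowledge it; the master bit is written as 0. */
static uint32_t pic_disable_word(unsigned n)
{
    assert(n < 15);
    return (1u << (16 + n)) | (1u << n);
}

/* Step 5: re-enable interrupt generation without touching any enable bit. */
#define PIC_REENABLE PIC_MASTER
```

For example, `pic_enable_word(0xf)` yields `0x800f000f`, `pic_ack_word(0)` yields `0x80000001`, and `pic_disable_word(0)` yields `0x00010001`.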
|
Leaving the interrupt controller, we show the timer register's bit definitions
in Tbl.~\ref{tbl:tmrbits}.
\begin{table}\begin{center}
\begin{bitlist}
31 & R/W & Auto-Reload\\\hline
30\ldots 0 & R/W & Current timer value\\\hline
\end{bitlist}
\caption{Timer Register Bits}\label{tbl:tmrbits}
\end{center}\end{table}
As you may recall, the timer just counts down to zero and then trips an
interrupt. Writing to the current timer value sets that value, and reading
from it returns that value. Writing to the current timer value while also
setting the auto--reload bit will send the timer into an auto--reload mode.
In this mode, upon setting its interrupt bit for one cycle, the timer will
also reload itself with the value that was written to it together with the
auto--reload bit. To clear and stop the timer, simply write a `32'h00' to
this register.
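A small software model can help when reasoning about auto-reload. This is only a sketch. The names `zip_timer`, `timer_write` and `timer_tick` are hypothetical. The exact cycle on which the hardware raises its interrupt is also an assumption here: the model fires on the tick where the count reaches zero.

```c
#include <assert.h>
#include <stdint.h>

#define TMR_AUTORELOAD 0x80000000u /* bit 31      */
#define TMR_VALUE_MASK 0x7fffffffu /* bits 30..0  */

/* Hypothetical model of one timer register. */
typedef struct {
    uint32_t value;       /* current count                           */
    uint32_t reload;      /* value written together with auto-reload */
    int      auto_reload; /* nonzero when in auto-reload mode        */
} zip_timer;

/* A bus write sets the count; writing 0 clears and stops the timer. */
static void timer_write(zip_timer *t, uint32_t word)
{
    assert(t);
    t->auto_reload = (word & TMR_AUTORELOAD) != 0;
    t->value       = word & TMR_VALUE_MASK;
    t->reload      = t->auto_reload ? t->value : 0;
}

/* Advance one clock. Returns 1 on the tick that raises the interrupt. */
static int timer_tick(zip_timer *t)
{
    if (t->value == 0)
        return 0;             /* stopped: nothing to count       */
    if (--t->value != 0)
        return 0;             /* still counting down             */
    if (t->auto_reload)
        t->value = t->reload; /* restart from the written value  */
    return 1;
}
```

Writing `0x80000003` makes the model interrupt every third tick. Writing `3` interrupts once and then leaves the count at zero.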
|
The Jiffies register is somewhat similar in that the register always changes.
In this case, the register counts up, whereas the timer counts down.
Reads from this register, as shown in Tbl.~\ref{tbl:jiffybits},
\begin{table}\begin{center}
\begin{bitlist}
31\ldots 0 & R & Current jiffy value\\\hline
31\ldots 0 & W & Value/time of next interrupt\\\hline
\end{bitlist}
\caption{Jiffies Register Bits}\label{tbl:jiffybits}
\end{center}\end{table}
always return the time value contained in the register. Writes greater than
the current Jiffy value, that is where the new value minus the old value is
greater than zero while ignoring truncation, will set a new Jiffy interrupt
time. At that time, the Jiffy vector will clear, and another interrupt time
may either be written to it, or it will just continue counting without
activating any more interrupts.
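The rule "new value minus old value greater than zero while ignoring truncation" is the usual wrap-around comparison on a 32-bit counter. Below is a minimal C sketch of it; `jiffy_is_future` and `jiffy_alarm_word` are hypothetical helper names. The limit of 2^31 ticks on a delay follows from reading the difference as a signed 32-bit value. The comparison stays in unsigned arithmetic to avoid implementation-defined conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when 'when' lies ahead of 'now' on a wrapping 32-bit counter,
 * i.e. when (when - now), read as a signed 32-bit value, is positive. */
static int jiffy_is_future(uint32_t now, uint32_t when)
{
    uint32_t diff = when - now;            /* wraps modulo 2^32 */
    return diff != 0 && diff < 0x80000000u;
}

/* Value to write for an interrupt 'delay' ticks after 'now'. */
static uint32_t jiffy_alarm_word(uint32_t now, uint32_t delay)
{
    assert(delay > 0 && delay < 0x80000000u); /* must read as "in the future" */
    return now + delay;
}
```

Under this rule, a target of `0x10` still counts as "in the future" when the register reads `0xfffffff0`, because the counter is about to wrap.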
|
The Zip CPU also supports several counter peripherals, mostly in the way of
process accounting. Each of these peripherals has a single register associated
with it, shown in Tbl.~\ref{tbl:ctrbits}.
\begin{table}\begin{center}
\begin{bitlist}
31\ldots 0 & R/W & Current counter value\\\hline
\end{bitlist}
\caption{Counter Register Bits}\label{tbl:ctrbits}
\end{center}\end{table}
Writes to this register set the new counter value. Reads read the current
counter value.

These counters are currently designed for performance counting.
Two sets of four registers are available for keeping track of performance.
The first is a task counter. This just counts clock ticks. The second and
third counters count memory stalls and prefetch stalls. These
allow the CPU to be evaluated as to how efficient it is. The fourth and
final counter is an instruction counter, which counts how many instructions the
CPU has issued.

It is envisioned that these counters will be used as follows: First, every time
a master counter rolls over, the supervisor (Operating System) will record
the fact. Second, whenever activating a user task, the Operating System will
set the four user counters to zero. When the user task has completed, the
Operating System will read the counters back, to determine how much of the
CPU the process had consumed.
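Here is a sketch of that accounting. It assumes the supervisor keeps its own count of master-counter rollovers. With that count, a 64-bit total can be rebuilt from the rollover count and the 32-bit register. A task's efficiency can also be expressed as a percentage of its clock ticks. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a supervisor-maintained rollover count with the 32-bit counter.
 * Assumes both were read consistently (no rollover in between). */
static uint64_t counter_extend(uint32_t rollovers, uint32_t counter)
{
    return ((uint64_t)rollovers << 32) | counter;
}

/* 'part' as a whole-number percentage of 'whole',
 * e.g. a task's stall cycles relative to its task clock count. */
static unsigned percent_of(uint32_t part, uint32_t whole)
{
    assert(whole != 0 && part <= whole);
    return (unsigned)(((uint64_t)part * 100u) / whole);
}
```

A supervisor might, for instance, call `percent_of` with the user memory stall count and the user task clock count after a task finishes. The 64-bit multiply keeps the product from overflowing for any pair of 32-bit counts.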
|
\section{Debug Port Registers}
Accessing the Zip System via the debug port isn't as straightforward as
accessing the system via the wishbone bus. The debug port itself has been
reduced to two addresses, as outlined earlier in Tbl.~\ref{tbl:dbgregs}.
Access to the Zip System begins with the Debug Control register, shown in
Tbl.~\ref{tbl:dbgctrl}.
\begin{table}\begin{center}
\begin{bitlist}
31\ldots 14 & R & Reserved\\\hline
13 & R & CPU GIE setting\\\hline
12 & R & CPU is sleeping\\\hline
11 & W & Command clear PF cache\\\hline
10 & R/W & Command HALT, Set to '1' to halt the CPU\\\hline
9 & R & Stall Status, '1' if CPU is busy\\\hline
8 & R/W & Step Command, set to '1' to step the CPU\\\hline
7 & R & Interrupt Request \\\hline
6 & R/W & Command RESET \\\hline
5\ldots 0 & R/W & Debug Register Address \\\hline
\end{bitlist}
\caption{Debug Control Register Bits}\label{tbl:dbgctrl}
\end{center}\end{table}

The first step in debugging access is to determine whether or not the CPU
is halted, and to halt it if not. To do this, first write a '1' to the
Command HALT bit. This will halt the CPU and place it into debug mode.
Once the CPU is halted, the stall status bit will drop to zero. Thus,
if bit 10 is high and bit 9 low, the debug port is open to examine the
internal state of the CPU.
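The control bits can be named directly from the debug control register table. In this hypothetical C sketch, `dbg_select` builds a control word that keeps the CPU halted while selecting an internal register. `dbg_port_open` applies the bit-10-high, bit-9-low test described above. The macro names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Debug control register bits, as listed in the bit table. */
#define DBG_CLEAR_CACHE (1u << 11) /* W:   clear the prefetch cache   */
#define DBG_HALT        (1u << 10) /* R/W: halt the CPU               */
#define DBG_STALL       (1u << 9)  /* R:   '1' while the CPU is busy  */
#define DBG_STEP        (1u << 8)  /* R/W: single-step the CPU        */
#define DBG_RESET       (1u << 6)  /* R/W: reset command              */
#define DBG_ADDR_MASK   0x3fu      /* bits 5..0: register address     */

/* Control word that keeps the CPU halted and selects internal register 'addr'. */
static uint32_t dbg_select(unsigned addr)
{
    assert(addr <= DBG_ADDR_MASK);
    return DBG_HALT | addr;
}

/* Nonzero once the CPU has halted: HALT (bit 10) high, stall status (bit 9) low. */
static int dbg_port_open(uint32_t ctrl)
{
    return (ctrl & DBG_HALT) != 0 && (ctrl & DBG_STALL) == 0;
}
```

A debugger would poll the control register until `dbg_port_open` is true. The clear-cache, halt and step bits together form the word `0xd00`.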
|
At this point, the external debugger may examine internal state information
from within the CPU. To do this, first write again to the command register
a value (with command halt still high) containing the address of an internal
register of interest in the bottom 6~bits. Internal registers that may be
accessed this way are listed in Tbl.~\ref{tbl:dbgaddrs}.
\begin{table}\begin{center}
\begin{reglist}
sR0 & 0 & 32 & R/W & Supervisor Register R0 \\\hline
sR1 & 1 & 32 & R/W & Supervisor Register R1 \\\hline
sSP & 13 & 32 & R/W & Supervisor Stack Pointer\\\hline
sCC & 14 & 32 & R/W & Supervisor Condition Code Register \\\hline
sPC & 15 & 32 & R/W & Supervisor Program Counter\\\hline
uR0 & 16 & 32 & R/W & User Register R0 \\\hline
uR1 & 17 & 32 & R/W & User Register R1 \\\hline
uSP & 29 & 32 & R/W & User Stack Pointer\\\hline
uCC & 30 & 32 & R/W & User Condition Code Register \\\hline
uPC & 31 & 32 & R/W & User Program Counter\\\hline
PIC & 32 & 32 & R/W & Primary Interrupt Controller \\\hline
WDT & 33 & 32 & R/W & Watchdog Timer\\\hline
CCHE & 34 & 32 & R/W & Manual Cache Controller\\\hline
CTRIC & 35 & 32 & R/W & Secondary Interrupt Controller\\\hline
TMRA & 36 & 32 & R/W & Timer A\\\hline
TMRB & 37 & 32 & R/W & Timer B\\\hline
TMRC & 38 & 32 & R/W & Timer C\\\hline
JIFF & 39 & 32 & R/W & Jiffies peripheral\\\hline
MTASK & 40 & 32 & R/W & Master task clock counter\\\hline
MMSTL & 41 & 32 & R/W & Master memory stall counter\\\hline
MPSTL & 42 & 32 & R/W & Master Pre-Fetch Stall counter\\\hline
MICNT & 43 & 32 & R/W & Master instruction counter\\\hline
UTASK & 44 & 32 & R/W & User task clock counter\\\hline
UMSTL & 45 & 32 & R/W & User memory stall counter\\\hline
UPSTL & 46 & 32 & R/W & User Pre-Fetch Stall counter\\\hline
UICNT & 47 & 32 & R/W & User instruction counter\\\hline
\end{reglist}
\caption{Debug Register Addresses}\label{tbl:dbgaddrs}
\end{center}\end{table}
Primarily, these ``registers'' include access to the entire CPU register
set, as well as the 16~internal peripherals. To read one of these registers
once the address is set, simply issue a read from the data port. To write
one of these registers or peripheral ports, simply write to the data port
after setting the proper address.
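The select-then-access sequence can be sketched in C. The `dbg_bus` structure stands in for whatever Wishbone access routines a debugger actually uses. It, the in-memory `fake_port`, and all function names are hypothetical; they exist only so that the example runs on its own. Address 0 is taken as the control register, as in the debug register table, and address 1 is assumed to be the data port.

```c
#include <assert.h>
#include <stdint.h>

#define DBG_CTRL 0u         /* debug port address 0: control register */
#define DBG_DATA 1u         /* debug port address 1: data port        */
#define DBG_HALT (1u << 10) /* keep the CPU halted while accessing it */

/* Hypothetical bus accessors; a real debugger would issue Wishbone cycles. */
typedef struct {
    void     (*write)(void *ctx, unsigned addr, uint32_t value);
    uint32_t (*read)(void *ctx, unsigned addr);
    void      *ctx;
} dbg_bus;

/* Select internal register 'reg' (0..47), then read it through the data port. */
static uint32_t dbg_read_reg(const dbg_bus *b, unsigned reg)
{
    assert(reg < 48);
    b->write(b->ctx, DBG_CTRL, DBG_HALT | reg);
    return b->read(b->ctx, DBG_DATA);
}

/* Select internal register 'reg' (0..47), then write it through the data port. */
static void dbg_write_reg(const dbg_bus *b, unsigned reg, uint32_t value)
{
    assert(reg < 48);
    b->write(b->ctx, DBG_CTRL, DBG_HALT | reg);
    b->write(b->ctx, DBG_DATA, value);
}

/* In-memory stand-in for the debug port, for illustration only. */
typedef struct {
    uint32_t ctrl;
    uint32_t regs[64]; /* indexed by the 6-bit address held in ctrl */
} fake_port;

static void fake_write(void *ctx, unsigned addr, uint32_t value)
{
    fake_port *p = ctx;
    if (addr == DBG_CTRL)
        p->ctrl = value;
    else
        p->regs[p->ctrl & 0x3fu] = value;
}

static uint32_t fake_read(void *ctx, unsigned addr)
{
    fake_port *p = ctx;
    return addr == DBG_CTRL ? p->ctrl : p->regs[p->ctrl & 0x3fu];
}
```

With a real port behind `dbg_bus`, reading register 31 would return the user program counter (uPC) and register 15 the supervisor program counter (sPC), per the address table.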
|
In this manner, all of the CPU's internal state may be read and adjusted.

As an example of how to use this, consider what would happen in the case
of an external breakpoint. If and when the CPU hits a breakpoint that
causes it to halt, the Command HALT bit will activate on its own, and the CPU
will raise an external interrupt line and wait for a debugger to examine
its state. After examining the state, the debugger will need to remove
the breakpoint by writing a different instruction into memory and by writing
to the command register while holding the clear cache, command halt, and
step CPU bits high (32'hd00). The debugger may then replace the breakpoint
now that the CPU has gone beyond it, and clear the cache again (32'h500).

To leave this debug mode, simply write a `32'h0' value to the command register.

\chapter{Wishbone Datasheets}\label{chap:wishbone}
The Zip System supports two wishbone ports, a slave debug port and a master
port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}
\begin{table}[htbp]
1410,7 → 1660,58
|
|
\chapter{I/O Ports}\label{chap:ioports}
The I/O ports to the Zip CPU may be grouped into three categories. The first
is that of the master wishbone used by the CPU, then the slave wishbone used
to command the CPU via a debugger, and then the rest. The first two of these
were already discussed in the wishbone chapter. They are listed here
for completeness in Tbl.~\ref{tbl:iowb-master}
\begin{table}
\begin{center}\begin{portlist}
{\tt o\_wb\_cyc} & 1 & Output & Indicates an active Wishbone cycle\\\hline
{\tt o\_wb\_stb} & 1 & Output & WB Strobe signal\\\hline
{\tt o\_wb\_we} & 1 & Output & Write enable\\\hline
{\tt o\_wb\_addr} & 32 & Output & Bus address \\\hline
{\tt o\_wb\_data} & 32 & Output & Data on WB write\\\hline
{\tt i\_wb\_ack} & 1 & Input & Slave has completed a R/W cycle\\\hline
{\tt i\_wb\_stall} & 1 & Input & WB bus slave not ready\\\hline
{\tt i\_wb\_data} & 32 & Input & Incoming bus data\\\hline
\end{portlist}\caption{CPU Master Wishbone I/O Ports}\label{tbl:iowb-master}\end{center}\end{table}
and~\ref{tbl:iowb-slave} respectively.
\begin{table}
\begin{center}\begin{portlist}
{\tt i\_wb\_cyc} & 1 & Input & Indicates an active Wishbone cycle\\\hline
{\tt i\_wb\_stb} & 1 & Input & WB Strobe signal\\\hline
{\tt i\_wb\_we} & 1 & Input & Write enable\\\hline
{\tt i\_wb\_addr} & 1 & Input & Bus address, command or data port \\\hline
{\tt i\_wb\_data} & 32 & Input & Data on WB write\\\hline
{\tt o\_wb\_ack} & 1 & Output & Slave has completed a R/W cycle\\\hline
{\tt o\_wb\_stall} & 1 & Output & WB bus slave not ready\\\hline
{\tt o\_wb\_data} & 32 & Output & Outgoing bus data, returned on WB read\\\hline
\end{portlist}\caption{CPU Debug Wishbone I/O Ports}\label{tbl:iowb-slave}\end{center}\end{table}

There are only four other lines to the CPU: the external clock, external
reset, incoming external interrupt line(s), and the outgoing debug interrupt
line. These are shown in Tbl.~\ref{tbl:ioports}.
\begin{table}
\begin{center}\begin{portlist}
{\tt i\_clk} & 1 & Input & The master CPU clock \\\hline
{\tt i\_rst} & 1 & Input & Active high reset line \\\hline
{\tt i\_ext\_int} & 1\ldots 6 & Input & Incoming external interrupts \\\hline
{\tt o\_ext\_int} & 1 & Output & CPU Halted interrupt \\\hline
\end{portlist}\caption{I/O Ports}\label{tbl:ioports}\end{center}\end{table}
The clock line was discussed briefly in Chapt.~\ref{chap:clocks}. We
typically run it at 100~MHz. The reset line is an active high reset. When
asserted, the CPU returns to its reset address in memory. Further, depending
upon how the CPU is configured and specifically on the {\tt START\_HALTED}
parameter, it may or may not then start running automatically. The
{\tt i\_ext\_int} line is for an external interrupt. This line may be as wide
as 6~external interrupts, depending upon the setting of the
{\tt EXTERNAL\_INTERRUPTS} parameter. As currently configured, the ZipSystem
only supports one such interrupt line by default. For us, this line is the
output of another interrupt controller, but that's a board specific setup
detail. Finally, the Zip System produces one external interrupt whenever
the CPU halts to wait for the debugger.

% Appendices
% Index
\end{document}