%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Filename: spec.tex
%%
%% Project: Zip CPU -- a small, lightweight, RISC CPU soft core
%%
%% Purpose: This LaTeX file contains all of the documentation/description
%% currently provided with this Zip CPU soft core. It supersedes
%% any information about the instruction set or CPUs found
%% elsewhere. It's not nearly as interesting, though, as the PDF
%% file it creates, so I'd recommend reading that before diving
%% into this file. You should be able to find the PDF file in
%% the SVN distribution together with this file and a copy of
%% the GPL-3.0 license this file is distributed under. If not,
%% just type 'make' in the doc directory and it (should) build
%% without a problem.
%%
%%
%% Creator: Dan Gisselquist
%% Gisselquist Technology, LLC
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Copyright (C) 2015, Gisselquist Technology, LLC
%%
%% This program is free software (firmware): you can redistribute it and/or
%% modify it under the terms of the GNU General Public License as published
%% by the Free Software Foundation, either version 3 of the License, or (at
%% your option) any later version.
%%
%% This program is distributed in the hope that it will be useful, but WITHOUT
%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
%% for more details.
%%
%% You should have received a copy of the GNU General Public License along
%% with this program. (It's in the $(ROOT)/doc directory, run make with no
%% target there if the PDF file isn't present.) If not, see
%% <http://www.gnu.org/licenses/> for a copy.
%%
%% License: GPL, v3, as defined and found on www.gnu.org,
%% http://www.gnu.org/licenses/gpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{gqtekspec}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.3}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC

This project is free software (firmware): you can redistribute it and/or
modify it under the terms of the GNU General Public License as published
by the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program. If not, see \hbox{<http://www.gnu.org/licenses/>} for a
copy.
\end{license}
\begin{revisionhistory}
0.3 & 8/22/2015 & Gisselquist & First completed draft\\\hline
0.2 & 8/19/2015 & Gisselquist & Still Draft, more complete \\\hline
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
\listoffigures
\listoftables
\begin{preface}
Many people have asked me why I am building the Zip CPU. ARM processors are
good and effective. Xilinx makes and markets Microblaze, Altera makes Nios,
and both have better toolsets than the Zip CPU will ever have. OpenRISC is
also available, and RISC--V may be replacing it. Why build a new processor?

The easiest, most obvious answer is the simple one: Because I can.

There's more to it, though. There's a lot that I would like to do with a
processor, and I want to be able to do it in a vendor independent fashion.
I would like to be able to generate Verilog code that can run equivalently
on both Xilinx and Altera chips, and that can be easily ported from one
manufacturer's chipsets to another. Even more, before purchasing a chip or a
board, I would like to know that my soft core works. I would like to build a
test bench to test components with, and Verilator is my chosen test bench
tool. This forces me to use all Verilog, and it prevents me from using any
proprietary cores. For this reason, Microblaze and Nios are out of the
question.

Why not OpenRISC? That's a hard question. The OpenRISC team has done some
wonderful work on an amazing processor, and I'll have to admit that I am
envious of what they've accomplished. I would like to port binutils to the
Zip CPU, just as I would like to port GCC and GDB. They are way ahead of me.
The OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It
has a lot of features of modern CPUs within it that ... well, let's just say
it's not the little guy on the block. The Zip CPU is lighter weight, costing
only about 2,300 LUTs with no peripherals, and 3,200 LUTs with some very basic
peripherals.

My final reason is that I'm building the Zip CPU as a learning experience. The
Zip CPU has allowed me to learn a lot about how CPUs work on a very micro
level. For the first time, I am beginning to understand many of the Computer
Architecture lessons from years ago.

To summarize: Because I can, because it is open source, because it is
lightweight, and as an exercise in learning.

\end{preface}

\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}


The original goal of the Zip CPU was to be a very simple CPU. You might
think of it as a poor man's alternative to the OpenRISC architecture.
For this reason, all instructions have been designed to be as simple as
possible, and each is designed to execute in one clock cycle, barring
pipeline stalls. Indeed, even the bus has been simplified
to a constant 32-bit width, with no option for more or less. This has
resulted in the choice to drop push and pop instructions, pre-increment and
post-decrement addressing modes, and more.

For those who like buzz words, the Zip CPU is:
\begin{itemize}
\item A 32-bit CPU: All registers are 32 bits wide, addresses are 32 bits
	wide, instructions are 32 bits wide, etc.
\item A RISC CPU. There is no microcode for executing instructions. All
	instructions are designed to be completed in one clock cycle.
\item A Load/Store architecture. (Only load and store instructions
	can access memory.)
\item Wishbone compliant. All peripherals are accessed just like
	memory across this bus.
\item A Von Neumann architecture. (The instructions and data share a
	common bus.)
\item A pipelined architecture, having stages for {\bf Prefetch},
	{\bf Decode}, {\bf Read-Operand}, the {\bf ALU/Memory}
	unit, and {\bf Write-back}. See Fig.~\ref{fig:cpu}
\begin{figure}\begin{center}
\includegraphics[width=3.5in]{../gfx/cpu.eps}
\caption{Zip CPU internal pipeline architecture}\label{fig:cpu}
\end{center}\end{figure}
	for a diagram of this structure.
\item Completely open source, licensed under the GPL.\footnote{Should you
	need a copy of the Zip CPU licensed under other terms, please
	contact me.}
\end{itemize}

Now, however, that I've worked on the Zip CPU for a while, it is not nearly
as simple as I originally hoped. Worse, I've had to adjust to create
capabilities that I was never expecting to need. These include:
\begin{itemize}
\item {\bf External Debug:} Once placed upon an FPGA, some external means is
	still necessary to debug this CPU. That means that there needs to be
	an external register that can control the CPU: reset it, halt it, step
	it, and tell whether it is running or not. My chosen interface
	includes a second register similar to this control register. This
	second register allows the external controller or debugger to examine
	registers internal to the CPU.

\item {\bf Internal Debug:} Being able to run a debugger from within
	a user process requires an ability to step a user process from
	within a debugger. It also requires a break instruction that can
	be substituted for any other instruction, and substituted back.
	The break is actually difficult: the break instruction cannot be
	allowed to execute. That way, upon a break, the debugger should
	be able to jump back into the user process to step the instruction
	that would've been at the break point initially, and then to
	replace the break after passing it.

	Incidentally, this break messes with the prefetch cache and the
	pipeline: if you change an instruction partially through the pipeline,
	the whole pipeline needs to be flushed. Likewise, if you change
	an instruction in memory, you need to make sure the cache is reloaded
	with the new instruction.

\item {\bf Prefetch Cache:} My original implementation had a very
	simple prefetch stage. Any time the PC changed, the prefetch would go
	and fetch the new instruction. While this was perhaps the simplest
	approach, it cost roughly five clocks for every instruction. This
	was deemed unacceptable, as I wanted a CPU that could execute
	instructions in one cycle. I therefore have a prefetch cache that
	issues pipelined Wishbone accesses to memory and then pushes
	instructions at the CPU. Sadly, this accounts for about 20\% of the
	logic in the entire CPU, or 15\% of the logic in the entire system.

\item {\bf Operating System:} In order to support an operating system,
	interrupts and so forth, the CPU needs to support supervisor and
	user modes, as well as a means of switching between them. For example,
	the user needs a means of executing a system call. This is the
	purpose of the {\bf `trap'} instruction. This instruction needs to
	place the CPU into supervisor mode (here equivalent to disabling
	interrupts), as well as handing it a parameter, such as one
	identifying which O/S function was called.

	My initial approach to building a trap instruction was to create an
	external peripheral which, when written to, would generate an
	interrupt and could return the last value written to it. In practice,
	this approach didn't work at all: the CPU executed two instructions
	while waiting for the trap interrupt to take place. Since then, I've
	decided to keep the rest of the CC register for that purpose so that a
	write to the CC register, with the GIE bit cleared, could be used to
	execute a trap. This has other problems, though, primarily in that it
	limits the other uses of the CC register. In particular, the CC
	register is the best place to put CPU state information and to
	``announce'' special CPU features (floating point, etc). So the trap
	instruction still switches to interrupt mode, but the CC register is
	not nearly as useful for telling the supervisor mode processor what
	trap is being executed.

	Modern timesharing systems also depend upon a {\bf Timer} interrupt
	to handle task swapping. For the Zip CPU, this interrupt is handled
	external to the CPU as part of the CPU System, found in
	{\tt zipsystem.v}. The timer module itself is found in
	{\tt ziptimer.v}.

\item {\bf Pipeline Stalls:} My original plan was to not support pipeline
	stalls at all, but rather to require the compiler to properly schedule
	all instructions so that stalls would never be necessary. After trying
	to build such an architecture, I gave up, having learned some things:

	For example, in order to facilitate interrupt handling and debug
	stepping, the CPU needs to know what instructions have finished, and
	which have not. In other words, it needs to know where it can restart
	the pipeline from. Once restarted, it must act as though it had
	never stopped. This killed my idea of delayed branching, since what
	would be the appropriate program counter to restart at? The one the
	CPU was going to branch to, or the ones in the delay slots? This
	also makes the idea of compressed instruction codes difficult, since,
	again, where do you restart on interrupt?

	So I switched to a model of discrete execution: Once an instruction
	enters into either the ALU or memory unit, the instruction is
	guaranteed to complete. If the logic recognizes a branch or a
	condition that would render the instruction entering into this stage
	possibly inappropriate (a conditional branch preceding a store
	instruction, for example), then the pipeline stalls for one cycle
	until the conditional branch completes. Then, if it generates a new
	PC address, the stages preceding are all wiped clean.

	The discrete execution model allows such things as sleeping: if the
	CPU is put to ``sleep,'' the ALU and memory stages stall and back up
	everything before them. Likewise, anything that has entered the ALU
	or memory stage when the CPU is placed to sleep continues to completion.
	To handle this logic, each pipeline stage has three control signals:
	a valid signal, a stall signal, and a clock enable signal. In
	general, a stage stalls if its contents are valid and the next step
	is stalled. This allows the pipeline to fill any time a later stage
	stalls.

	This approach is also different from other pipeline approaches. Instead
	of keeping the entire pipeline filled, each stage is treated
	independently. Therefore, individual stages may move forward as long
	as the subsequent stage is available, regardless of whether the stage
	behind it is filled.

\item {\bf Verilog Modules:} When examining how other processors worked
	here on OpenCores, many of them had one separate module per pipeline
	stage. While this appeared to me to be a fascinating and commendable
	idea, my own implementation didn't work out quite so nicely.

	As an example, the decode module produces a {\em lot} of
	control wires and registers. Creating a module out of this, with
	only the simplest of logic within it, seemed to be more a lesson
	in passing wires around, rather than encapsulating logic.

	Another example was the register writeback section. I would love
	this section to be a module in its own right, and many have made them
	such. However, other modules depend upon writeback results other
	than just what's placed in the register (i.e., the control wires).
	For these reasons, I didn't manage to fit this section into its
	own module.

	The result is that the majority of the CPU code can be found in
	the {\tt zipcpu.v} file.
\end{itemize}
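
The stall rule described under {\bf Pipeline Stalls} can be sketched as a toy model in Python. This is only an illustration, not the logic found in {\tt zipcpu.v}: the function name {\tt step}, the list-of-slots representation, and the reading of ``clock enable'' as ``valid and not stalled'' are all assumptions.

```python
from typing import List, Optional, Tuple


def step(stages: List[Optional[str]], incoming: Optional[str],
         final_stalled: bool) -> Tuple[List[Optional[str]], Optional[str], bool]:
    """Advance a toy pipeline by one clock.

    stages[i] is None when stage i holds nothing valid (valid signal low).
    final_stalled says whether whatever follows the last stage is stalled.
    Returns (new_stages, retired, accepted).
    """
    n = len(stages)

    # Stall signals, worked out from the last stage backwards: a stage
    # stalls if its contents are valid and the next step is stalled.
    stall = [False] * n
    next_stalled = final_stalled
    for i in reversed(range(n)):
        stall[i] = stages[i] is not None and next_stalled
        next_stalled = stall[i]

    # Assumed clock enable: a stage hands its contents on when they are
    # valid and the stage is not stalled.
    ce = [stages[i] is not None and not stall[i] for i in range(n)]

    new = list(stages)
    retired = None
    for i in reversed(range(n)):
        if ce[i]:
            if i == n - 1:
                retired = stages[i]      # leaves the pipeline
            else:
                new[i + 1] = stages[i]   # moves into the next stage
            new[i] = None

    # The first stage accepts new work whenever it is not stalled.
    accepted = incoming is not None and not stall[0]
    if accepted:
        new[0] = incoming
    return new, retired, accepted
```

Starting from {\tt ["A", None, "C"]} with the last stage stalled, one call moves {\tt A} forward into the empty slot and accepts a new entry, which is the ``pipeline fills while a later stage stalls'' behaviour described in the text. A second call with the stall still asserted changes nothing.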

With that introduction out of the way, let's move on to the instruction
set.

\chapter{CPU Architecture}\label{chap:arch}

The Zip CPU supports a set of two operand instructions, where the second operand
(always a register) is the result. The only exception is the store instruction,
where the first operand (always a register) is the source of the data to be
stored.

\section{Simplified Bus}
The bus architecture of the Zip CPU is that of a simplified WISHBONE bus.
It has been simplified in this fashion: all operations are 32--bit operations.
Because nothing smaller than a 32--bit word is ever addressed, the bus is
neither little endian nor big endian. All words are 32--bits wide. All
instructions are also 32--bits wide. Everything has been
built around the 32--bit word.

\section{Register Set}
The Zip CPU supports two sets of sixteen 32-bit registers, a supervisor
and a user set as shown in Fig.~\ref{fig:regset}.
\begin{figure}\begin{center}
\includegraphics[width=3.5in]{../gfx/regset.eps}
\caption{Zip CPU Register File}\label{fig:regset}
\end{center}\end{figure}
The supervisor set is used in interrupt mode, when interrupts are disabled,
whereas the user set is used otherwise. Of this register set, the Program
Counter (PC) is register 15, whereas the status register (SR) or condition
code register
(CC) is register 14. By convention, the stack pointer will be register 13 and
noted as (SP), although there is nothing special about this register other
than this convention.
The CPU can access both register sets via move instructions from the
supervisor state, whereas the user state can only access the user registers.

The status register is special, and bears further mention. The lower
10 bits of the status register form a set of CPU state and condition codes.
Writes to other bits of this register are preserved.

Of the condition codes, the bottom four bits are the current flags:
		Zero (Z),
		Carry (C),
		Negative (N),
		and Overflow (V).

The next bit is a clock enable (0 to enable) or sleep bit (1 to put
	the CPU to sleep). Setting this bit will cause the CPU to
	wait for an interrupt (if interrupts are enabled), or to
	completely halt (if interrupts are disabled).

The sixth bit is a global interrupt enable bit (GIE). When this
	sixth bit is a `1' interrupts will be enabled, else disabled. When
	interrupts are disabled, the CPU will be in supervisor mode, otherwise
	it is in user mode. Thus, to execute a context switch, one only
	need enable or disable interrupts. (When an interrupt line goes
	high, interrupts will automatically be disabled, as the CPU goes
	and deals with its context switch.) Special logic has been added to
	keep the user mode from setting the sleep bit and clearing the
	GIE bit at the same time, with clearing the GIE bit taking
	precedence.

The seventh bit is a step bit. This bit can be
	set from supervisor mode only. After setting this bit, should
	the supervisor mode process switch to user mode, it would then
	accomplish one instruction in user mode before returning to supervisor
	mode. Then, upon return to supervisor mode, this bit will
	be automatically cleared. This bit has no effect on the CPU while in
	supervisor mode.

	This functionality was added to allow a userspace debugger to
	single step a user process, working through supervisor mode
	of course.


The eighth bit is a break enable bit. This controls whether a break
instruction in user mode will halt the processor for an external debugger
(break enabled), or whether the break instruction will simply send the
CPU into interrupt mode. Encountering a break in supervisor mode will
halt the CPU independent of the break enable bit. This bit can only be set
within supervisor mode.

% Should break enable be a supervisor mode bit, while the break enable bit
% in user mode is a break has taken place bit?
%

	This functionality was added to enable an external debugger to
	set and manage breakpoints.

The ninth bit is reserved for a floating point enable bit. When set, the
arithmetic for the next instruction will be sent to a floating point unit.
Such a unit may later be added as an extension to the Zip CPU. If the
CPU does not support floating point instructions, this bit will never be set.
The instruction set could also be simply extended to allow other data types
in this fashion, such as two by 16--bit vector operations or four by 8--bit
vector operations.

The tenth bit is a trap bit. It is set whenever the user requests a soft
interrupt, and cleared on any return to userspace command. This allows the
supervisor, in supervisor mode, to determine whether it got to supervisor
mode from a trap or from an external interrupt or both.

These status register bits are summarized in Tbl.~\ref{tbl:ccbits}.
\begin{table}
\begin{center}
\begin{tabular}{l|l}
Bit & Meaning \\\hline
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
8 & (Reserved for) Floating point enable \\\hline
7 & Halt on break, to support an external debugger \\\hline
6 & Step, single step the CPU in user mode\\\hline
5 & GIE, or Global Interrupt Enable \\\hline
4 & Sleep \\\hline
3 & V, or overflow bit.\\\hline
2 & N, or negative bit.\\\hline
1 & C, or carry bit.\\\hline
0 & Z, or zero bit.\\\hline
\end{tabular}
\caption{Condition Code / Status Register Bits}\label{tbl:ccbits}
\end{center}\end{table}
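
To make the bit layout concrete, here is a small Python sketch that unpacks a status register value. The bit positions follow Tbl.~\ref{tbl:ccbits}, with bit 0 taken to be Z from the order in which the text lists the four flags. The names {\tt CC\_BITS}, {\tt decode\_cc} and {\tt cpu\_mode} are made up for this example and are not part of any Zip CPU tool.

```python
# Bit positions of the status (CC) register fields. Bit 0 = Z is inferred
# from the prose, which lists the bottom four flags as Z, C, N, V.
CC_BITS = {
    "Z": 0,          # zero
    "C": 1,          # carry
    "N": 2,          # negative
    "V": 3,          # overflow
    "SLEEP": 4,      # 1 puts the CPU to sleep
    "GIE": 5,        # global interrupt enable
    "STEP": 6,       # single step in user mode
    "BREAK_EN": 7,   # halt on break
    "FPEN": 8,       # reserved for floating point enable
    "TRAP": 9,       # soft trap
}


def decode_cc(cc: int) -> dict:
    """Return each named status bit of `cc` as a boolean."""
    return {name: bool((cc >> bit) & 1) for name, bit in CC_BITS.items()}


def cpu_mode(cc: int) -> str:
    """GIE set means user mode; GIE clear means supervisor mode."""
    return "user" if decode_cc(cc)["GIE"] else "supervisor"
```

For instance, a value of {\tt 0x21} has Z and GIE set, so {\tt cpu\_mode(0x21)} reports user mode, while {\tt 0x10} has only the sleep bit set and is a supervisor mode value.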

\section{Conditional Instructions}
Most, although not quite all, instructions are conditionally executed. From
the four condition code flags, eight conditions are defined. These are shown
in Tbl.~\ref{tbl:conditions}.
\begin{table}
\begin{center}
\begin{tabular}{l|l|l}
Code & Mnemonic & Condition \\\hline
3'h0 & None & Always execute the instruction \\
3'h1 & {\tt .Z} & Only execute when `Z' is set \\
3'h2 & {\tt .NE} & Only execute when `Z' is not set \\
3'h3 & {\tt .GE} & Greater than or equal (`N' not set, `Z' irrelevant) \\
3'h4 & {\tt .GT} & Greater than (`N' not set, `Z' not set) \\
3'h5 & {\tt .LT} & Less than (`N' set) \\
3'h6 & {\tt .C} & Carry set\\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\end{center}
\end{table}
There is no condition code for less than or equal, not C, or not V. Sorry,
I ran out of space in 3 bits. Testing one of these conditions will take an extra
instruction and a pipeline stall. (Ex: \hbox{\em (Stall)}; \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
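
The condition table can be read as a lookup from the 3--bit code to a predicate on the flags. The Python sketch below does exactly that. The function name {\tt condition\_met} and its argument order are assumptions for illustration, not taken from the CPU sources.

```python
# One predicate per 3-bit condition code, in the order given in the
# conditions table. Each takes the flags (z, c, n, v) as booleans.
CONDITIONS = {
    0: lambda z, c, n, v: True,               # always
    1: lambda z, c, n, v: z,                  # .Z
    2: lambda z, c, n, v: not z,              # .NE
    3: lambda z, c, n, v: not n,              # .GE
    4: lambda z, c, n, v: not n and not z,    # .GT
    5: lambda z, c, n, v: n,                  # .LT
    6: lambda z, c, n, v: c,                  # .C
    7: lambda z, c, n, v: v,                  # .V
}


def condition_met(code: int, z: bool, c: bool, n: bool, v: bool) -> bool:
    """Decide whether an instruction with condition `code` should execute."""
    if code not in CONDITIONS:
        raise ValueError("condition code must fit in 3 bits")
    return CONDITIONS[code](z, c, n, v)
```

Since all eight codes are used, a missing test such as ``less than or equal'' has to be built from an extra instruction, as the text notes.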

\section{Operand B}
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
This Operand B is either equal to a register plus a signed immediate offset,
or an immediate offset by itself. This value is encoded as shown in
Tbl.~\ref{tbl:opb}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Bit 20 & 19 \ldots 16 & 15 \ldots 0 \\\hline
1'b0 & \multicolumn{2}{l|}{20--bit Signed Immediate value} \\\hline
1'b1 & 4-bit Register & 16--bit Signed immediate offset \\\hline
\end{tabular}
\caption{Bit allocation for Operand B}\label{tbl:opb}
\end{center}\end{table}

Sixteen and twenty bit immediate values don't make sense for all instructions.
For example, what is the point of a 20--bit immediate when executing a 16--bit
multiply? Likewise, why have a 16--bit immediate offset when the operand is
the amount of a logical or arithmetic shift? In these cases, the extra bits
are reserved for future instruction possibilities.
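
As an illustration of the Operand B encoding, the following Python sketch splits a 21--bit field according to Tbl.~\ref{tbl:opb}. The helper names ({\tt sign\_extend}, {\tt decode\_operand\_b}, {\tt operand\_b\_value}) and the wrap of the sum to 32 bits are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_operand_b(field: int) -> Tuple[Optional[int], int]:
    """Split a 21-bit Operand B field into (register or None, immediate)."""
    field &= (1 << 21) - 1
    if (field >> 20) & 1:
        # Bit 20 set: 4-bit register in 19..16, 16-bit signed offset in 15..0.
        return (field >> 16) & 0xF, sign_extend(field, 16)
    # Bit 20 clear: bits 19..0 hold a 20-bit signed immediate.
    return None, sign_extend(field, 20)


def operand_b_value(field: int, regs: List[int]) -> int:
    """Evaluate Operand B against a register file, wrapped to 32 bits (assumed)."""
    reg, imm = decode_operand_b(field)
    base = regs[reg] if reg is not None else 0
    return (base + imm) & 0xFFFFFFFF
```

For example, a field with bit 20 set, register 3, and offset {\tt 0xFFFC} decodes to register 3 with an offset of $-4$.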

\section{Address Modes}
The Zip CPU supports two addressing modes: register plus immediate, and
immediate address. Addresses are therefore encoded in the same fashion as
Operand B, shown above.

A lot of long, hard thought was put into whether to allow pre/post increment
and decrement addressing modes. Finding no way to use these operators without
taking two or more clocks per instruction,\footnote{The two clocks figure
comes from the design of the register set, allowing only one write per clock.
That write is either from the memory unit or the ALU, but never both.} these
addressing modes have been
removed from the realm of possibilities. This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.
Each of these instructions can be emulated with a set of instructions from the
existing set.

\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
the CPU needs access to non--supervisory registers while in supervisory mode.
Therefore, the MOV instruction is special and offers access to these registers
\ldots when in supervisory mode. To keep the compiler simple, the extra bits
are ignored in non-supervisory mode (as though they didn't exist), rather than
being mapped to new instructions or additional capabilities. The bits
indicating which register set each register lies within are the A-Usr and
B-Usr bits. When set to a one, these refer to a user mode register. When set
to a zero, these refer to a register in the current mode, whether user or
supervisor. Further, because a load immediate instruction exists, there is no
move capability between an immediate and a register: all moves come from either
a register or a register plus an offset.

This actually leads to a bit of a problem: since the MOV instruction encodes
which register set each register is coming from or moving to, how shall a
compiler or assembler know how to compile a MOV instruction without knowing
the mode of the CPU at the time? For this reason, the compiler will assume
all MOV registers are supervisor registers, and display them as normal.
Anything with the user bit set will be treated as a user register. The CPU
will quietly ignore the supervisor bits while in user mode, and anything
marked as a user register will always be valid. (Did I just say that in the
last paragraph?)

\section{Multiply Operations}
The Zip CPU supports two multiply operations, a
16x16 bit signed multiply (MPYS) and the same but unsigned (MPYU). In both
cases, the operand is a register plus a 16-bit immediate, subject to the
rule that the register cannot be the PC or CC register. The PC register
field has been stolen to create a multiply by immediate instruction. The
CC register field is reserved.
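
The difference between the two multiplies can be sketched in Python as follows. This sketch assumes that each multiply uses only the low 16 bits of its two inputs and yields a 32--bit result. The text above says only ``16x16 bit'', so treat both points, and the function names, as assumptions rather than a description of the hardware.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mpyu(a: int, b: int) -> int:
    """Unsigned 16x16 multiply of the low halves, 32-bit result (assumed)."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF


def mpys(a: int, b: int) -> int:
    """Signed 16x16 multiply of the low halves, 32-bit result (assumed)."""
    return (sign_extend16(a) * sign_extend16(b)) & 0xFFFFFFFF
```

With an input of {\tt 0xFFFF}, the unsigned form treats the value as 65535 while the signed form treats it as $-1$, so the two products differ in their upper 16 bits.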

\section{Floating Point}
The Zip CPU does not support floating point operations. However, the
instruction set reserves two possibilities for future floating point
operations.

The first floating point operation hole in the instruction set involves
setting the floating point bit in the CC register. The next instruction
will simply interpret its operands as floating point values.
Not all instructions, however, make sense as floating point operations.
Further, the immediate fields do not apply in floating point mode, and must
be set to zero.
Therefore, only the CMP, SUB, ADD, and MPY instructions may be issued as
floating point instructions. Other instructions allow the examining of the
floating point bit in the CC register. In all cases, the floating point bit
is cleared one instruction after it is set.

The other possibility for floating point operations involves exploiting the
hole in the instruction set that the NOOP and BREAK instructions reside within.
These two instructions use 24--bits of address space. A simple adjustment
to this space could create instructions with 4--bit register addresses for
each register, a 3--bit field for conditional execution, and a 2--bit field
for which operation. In this fashion, such a floating point capability would
only fill 13--bits of the 24--bit field, still leaving lots of room for
expansion.
522 |
|
|
|
523 |
|
|
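The 13~bit figure follows from a simple tally, assuming two register operands
per instruction (the text above says only ``each register''):
\[ \underbrace{4 + 4}_{\mbox{registers}} + \underbrace{3}_{\mbox{condition}}
	+ \underbrace{2}_{\mbox{operation}} = 13,
	\qquad 24 - 13 = 11\ \mbox{bits left over.} \]
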
In both cases, the Zip CPU would support 32--bit single precision floats
only.

The current architecture does not support a floating point not-implemented
interrupt. Any soft floating point emulation must be done deliberately.

\section{Native Instructions}
The instruction set for the Zip CPU is summarized in
Tbl.~\ref{tbl:zip-instructions}.
\begin{table}\begin{center}
|
533 |
|
|
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|c|}\hline
|
534 |
|
|
Op Code & \multicolumn{8}{c|}{31\ldots24} & \multicolumn{8}{c|}{23\ldots 16}
|
535 |
|
|
& \multicolumn{8}{c|}{15\ldots 8} & \multicolumn{8}{c|}{7\ldots 0}
|
536 |
|
|
& Sets CC? \\\hline
|
537 |
|
|
CMP(Sub) & \multicolumn{4}{l|}{4'h0}
|
538 |
|
|
& \multicolumn{4}{l|}{D. Reg}
|
539 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
540 |
|
|
& \multicolumn{21}{l|}{Operand B}
|
541 |
|
|
& Yes \\\hline
|
542 |
24 |
dgisselq |
TST(And) & \multicolumn{4}{l|}{4'h1}
|
543 |
21 |
dgisselq |
& \multicolumn{4}{l|}{D. Reg}
|
544 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
545 |
|
|
& \multicolumn{21}{l|}{Operand B}
|
546 |
|
|
& Yes \\\hline
|
547 |
|
|
MOV & \multicolumn{4}{l|}{4'h2}
|
548 |
|
|
& \multicolumn{4}{l|}{D. Reg}
|
549 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
550 |
|
|
& A-Usr
|
551 |
|
|
& \multicolumn{4}{l|}{B-Reg}
|
552 |
|
|
& B-Usr
|
553 |
|
|
& \multicolumn{15}{l|}{15-bit signed offset}
|
554 |
|
|
& \\\hline
|
555 |
|
|
LODI & \multicolumn{4}{l|}{4'h3}
|
556 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
557 |
|
|
& \multicolumn{24}{l|}{24-bit Signed Immediate}
|
558 |
|
|
& \\\hline
|
559 |
|
|
NOOP & \multicolumn{4}{l|}{4'h4}
|
560 |
|
|
& \multicolumn{4}{l|}{4'he}
|
561 |
|
|
& \multicolumn{24}{l|}{24'h00}
|
562 |
|
|
& \\\hline
|
563 |
|
|
BREAK & \multicolumn{4}{l|}{4'h4}
|
564 |
|
|
& \multicolumn{4}{l|}{4'he}
|
565 |
|
|
& \multicolumn{24}{l|}{24'h01}
|
566 |
|
|
& \\\hline
|
567 |
|
|
{\em Rsrd} & \multicolumn{4}{l|}{4'h4}
|
568 |
|
|
& \multicolumn{4}{l|}{4'he}
|
569 |
|
|
& \multicolumn{24}{l|}{24 bits, but not 0 or 1}
|
570 |
|
|
& \\\hline
|
571 |
|
|
LODIHI & \multicolumn{4}{l|}{4'h4}
|
572 |
|
|
& \multicolumn{4}{l|}{4'hf}
|
573 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
574 |
|
|
& 1'b1
|
575 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
576 |
|
|
& \multicolumn{16}{l|}{16-bit Immediate}
|
577 |
|
|
& \\\hline
|
578 |
|
|
LODILO & \multicolumn{4}{l|}{4'h4}
|
579 |
|
|
& \multicolumn{4}{l|}{4'hf}
|
580 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
581 |
|
|
& 1'b0
|
582 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
583 |
|
|
& \multicolumn{16}{l|}{16-bit Immediate}
|
584 |
|
|
& \\\hline
|
585 |
|
|
16-b MPYU & \multicolumn{4}{l|}{4'h4}
|
586 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
587 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
588 |
|
|
& 1'b0 & \multicolumn{4}{l|}{Reg}
|
589 |
|
|
& \multicolumn{16}{l|}{16-bit Offset}
|
590 |
|
|
& Yes \\\hline
|
591 |
|
|
16-b MPYU(I) & \multicolumn{4}{l|}{4'h4}
|
592 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
593 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
594 |
|
|
& 1'b0 & \multicolumn{4}{l|}{4'hf}
|
595 |
|
|
& \multicolumn{16}{l|}{16-bit Offset}
|
596 |
|
|
& Yes \\\hline
|
597 |
|
|
16-b MPYS & \multicolumn{4}{l|}{4'h4}
|
598 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
599 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
600 |
|
|
& 1'b1 & \multicolumn{4}{l|}{Reg}
|
601 |
|
|
& \multicolumn{16}{l|}{16-bit Offset}
|
602 |
|
|
& Yes \\\hline
|
603 |
|
|
16-b MPYS(I) & \multicolumn{4}{l|}{4'h4}
|
604 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
605 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
606 |
|
|
& 1'b1 & \multicolumn{4}{l|}{4'hf}
|
607 |
|
|
& \multicolumn{16}{l|}{16-bit Offset}
|
608 |
|
|
& Yes \\\hline
|
609 |
|
|
ROL & \multicolumn{4}{l|}{4'h5}
|
610 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
611 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
612 |
|
|
& \multicolumn{21}{l|}{Operand B, truncated to low order 5 bits}
|
613 |
|
|
& \\\hline
|
614 |
|
|
LOD & \multicolumn{4}{l|}{4'h6}
|
615 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
616 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
617 |
|
|
& \multicolumn{21}{l|}{Operand B address}
|
618 |
|
|
& \\\hline
|
619 |
|
|
STO & \multicolumn{4}{l|}{4'h7}
|
620 |
|
|
& \multicolumn{4}{l|}{D. Reg}
|
621 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
622 |
|
|
& \multicolumn{21}{l|}{Operand B address}
|
623 |
|
|
& \\\hline
|
624 |
|
|
SUB & \multicolumn{4}{l|}{4'h8}
|
625 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
626 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
627 |
32 |
dgisselq |
& \multicolumn{21}{l|}{Operand B}
|
628 |
21 |
dgisselq |
& Yes \\\hline
|
629 |
|
|
AND & \multicolumn{4}{l|}{4'h9}
|
630 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
631 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
632 |
|
|
& \multicolumn{21}{l|}{Operand B}
|
633 |
|
|
& Yes \\\hline
|
634 |
|
|
ADD & \multicolumn{4}{l|}{4'ha}
|
635 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
636 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
637 |
|
|
& \multicolumn{21}{l|}{Operand B}
|
638 |
|
|
& Yes \\\hline
|
639 |
|
|
OR & \multicolumn{4}{l|}{4'hb}
|
640 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
641 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
642 |
|
|
& \multicolumn{21}{l|}{Operand B}
|
643 |
|
|
& Yes \\\hline
|
644 |
|
|
XOR & \multicolumn{4}{l|}{4'hc}
|
645 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
646 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
647 |
|
|
& \multicolumn{21}{l|}{Operand B}
|
648 |
|
|
& Yes \\\hline
|
649 |
|
|
LSL/ASL & \multicolumn{4}{l|}{4'hd}
|
650 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
651 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
652 |
33 |
dgisselq |
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
653 |
21 |
dgisselq |
& Yes \\\hline
|
654 |
|
|
ASR & \multicolumn{4}{l|}{4'he}
|
655 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
656 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
657 |
33 |
dgisselq |
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
658 |
21 |
dgisselq |
& Yes \\\hline
|
659 |
|
|
LSR & \multicolumn{4}{l|}{4'hf}
|
660 |
|
|
& \multicolumn{4}{l|}{R. Reg}
|
661 |
|
|
& \multicolumn{3}{l|}{Cond.}
|
662 |
33 |
dgisselq |
& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
663 |
21 |
dgisselq |
& Yes \\\hline
|
664 |
|
|
\end{tabular}
|
665 |
|
|
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
|
666 |
|
|
\end{center}\end{table}
|
667 |
|
|
|
668 |
|
|
As you can see, there's lots of room for instruction set expansion. The
NOOP and BREAK instructions are the only instructions within one particular
24--bit hole. This space is reserved for future enhancements. For example,
floating point operations, consisting of a 3-bit floating point operation,
two 4-bit registers, no immediate offset, and a 3-bit condition would fit
nicely into 14~bits of this address space, making it so that the floating
point bit in the CC register need not be used.

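To put numbers on this: the 24--bit hole holds $2^{24}$ encodings, of which
NOOP and BREAK use two. The floating point layout sketched above would need
\[ 3 + 4 + 4 + 3 = 14\ \mbox{bits}, \]
or $2^{14} = 16384$ of the $2^{24} = 16777216$ encodings available, leaving
$24 - 14 = 10$ bits unassigned.
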
\section{Derived Instructions}
The Zip CPU supports many other common instructions, but not all of them
are single cycle instructions. The derived instruction tables,
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
help to capture how these other instructions may be implemented on
the Zip CPU. Many of these instructions will have assembly equivalents,
such as the branch instructions, to facilitate working with the CPU.
\begin{table}\begin{center}
|
684 |
|
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
685 |
|
|
Mapped & Actual & Notes \\\hline
|
686 |
|
|
\parbox[t]{1.4in}{ADD Ra,Rx\\ADDC Rb,Ry}
|
687 |
|
|
& \parbox[t]{1.5in}{ADD Ra,Rx\\ADD.C \$1,Ry\\ADD Rb,Ry}
|
688 |
|
|
& Add with carry \\\hline
|
689 |
|
|
BRA.Cond +/-\$Addr
|
690 |
33 |
dgisselq |
& \hbox{MOV.cond \$Addr+PC,PC}
|
691 |
24 |
dgisselq |
& Branch or jump on condition. Works for 15--bit
|
692 |
|
|
signed address offsets.\\\hline
|
693 |
21 |
dgisselq |
BRA.Cond +/-\$Addr
|
694 |
|
|
& \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC}
|
695 |
|
|
& Branch/jump on condition. Works for
|
696 |
|
|
23 bit address offsets, but costs a register, an extra instruction,
|
697 |
33 |
dgisselq |
and sets the flags. \\\hline
|
698 |
21 |
dgisselq |
BNC PC+\$Addr
|
699 |
|
|
& \parbox[t]{1.5in}{TST \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
|
700 |
|
|
& Example of a branch on an unsupported
|
701 |
|
|
condition, in this case a branch on not carry \\\hline
|
702 |
|
|
BUSY & MOV \$-1(PC),PC & Execute an infinite loop \\\hline
|
703 |
|
|
CLRF.NZ Rx
|
704 |
|
|
& XOR.NZ Rx,Rx
|
705 |
|
|
& Clear Rx, and flags, if the Z-bit is not set \\\hline
|
706 |
|
|
CLR Rx
|
707 |
|
|
& LDI \$0,Rx
|
708 |
|
|
& Clears Rx, leaves flags untouched. This instruction cannot be
|
709 |
|
|
conditional. \\\hline
|
710 |
|
|
EXCH.W Rx
|
711 |
|
|
& ROL \$16,Rx
|
712 |
|
|
& Exchanges the top and bottom 16-bit words of Rx \\\hline
|
713 |
|
|
HALT
|
714 |
|
|
& OR \$SLEEP,CC
|
715 |
|
|
& Executed while in interrupt mode. In user mode this is simply a
|
716 |
33 |
dgisselq |
wait until interrupt instruction. \\\hline
|
717 |
21 |
dgisselq |
INT & LDI \$0,CC
|
718 |
|
|
& Since we're using the CC register as a trap vector as well, this
|
719 |
|
|
executes TRAP \#0. \\\hline
|
720 |
|
|
IRET
|
721 |
|
|
& OR \$GIE,CC
|
722 |
|
|
& Also an RTU instruction (Return to Userspace) \\\hline
|
723 |
|
|
JMP R6+\$Addr
|
724 |
|
|
& MOV \$Addr(R6),PC
|
725 |
|
|
& \\\hline
|
726 |
|
|
JSR PC+\$Addr
|
727 |
|
|
& \parbox[t]{1.5in}{SUB \$1,SP \\
	MOV \$3+PC,R0 \\
	STO R0,\$1(SP) \\
	MOV \$Addr+PC,PC \\
	ADD \$1,SP}
|
732 |
24 |
dgisselq |
& Jump to Subroutine. Note the required cleanup instruction after
|
733 |
|
|
returning. \\\hline
|
734 |
21 |
dgisselq |
JSR PC+\$Addr
|
735 |
|
|
& \parbox[t]{1.5in}{MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
|
736 |
|
|
&This is the high speed
|
737 |
|
|
version of a subroutine call, necessitating a register to hold the
|
738 |
|
|
last PC address. In its favor, this method doesn't suffer the
|
739 |
|
|
mandatory memory access of the other approach. \\\hline
|
740 |
|
|
LDI.l \$val,Rx
|
741 |
|
|
& \parbox[t]{1.5in}{LDIHI (\$val$>>$16)\&0x0ffff,Rx \\
	LDILO (\$val \& 0x0ffff),Rx}
& Sadly, there's not enough instruction
	space to load a complete 32-bit immediate value into any register
	with a single instruction.
	Therefore, fully loading any register takes two cycles.
	The LDIHI (load immediate high) and LDILO (load immediate low)
	instructions have been created to facilitate this. \\\hline
|
748 |
|
|
\end{tabular}
|
749 |
|
|
\caption{Derived Instructions}\label{tbl:derived-1}
|
750 |
|
|
\end{center}\end{table}
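
As an illustration of the LDI.l entry in Tbl.~\ref{tbl:derived-1}, take \$val
to be {\tt 0x12345678}. Assuming that LDIHI replaces only the upper 16~bits
of Rx and LDILO only the lower 16~bits (the names suggest this, but the table
does not spell it out), the two immediates are
\[\begin{array}{rcl}
(\mbox{\tt 0x12345678} \gg 16)\ \&\ \mbox{\tt 0xffff} & = & \mbox{\tt 0x1234} \\
\mbox{\tt 0x12345678}\ \&\ \mbox{\tt 0xffff} & = & \mbox{\tt 0x5678}
\end{array}\]
and Rx ends up holding
$(\mbox{\tt 0x1234} \ll 16)\,|\,\mbox{\tt 0x5678} = \mbox{\tt 0x12345678}$.
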
|
751 |
|
|
\begin{table}\begin{center}
|
752 |
|
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
|
753 |
|
|
Mapped & Actual & Notes \\\hline
|
754 |
|
|
LOD.b \$addr,Rx
|
755 |
|
|
& \parbox[t]{1.5in}{%
|
756 |
|
|
LDI \$addr,Ra \\
|
757 |
|
|
LDI \$addr,Rb \\
|
758 |
|
|
LSR \$2,Ra \\
|
759 |
|
|
AND \$3,Rb \\
|
760 |
|
|
LOD (Ra),Rx \\
|
761 |
|
|
LSL \$3,Rb \\
|
762 |
|
|
SUB \$32,Rb \\
|
763 |
|
|
ROL Rb,Rx \\
|
764 |
|
|
AND \$0ffh,Rx}
|
765 |
|
|
& \parbox[t]{3in}{This CPU is designed for 32-bit word
	length instructions. Byte addressing is not supported by the CPU or
	the bus, so it therefore takes more work to do.

	Note also that in this example, \$Addr is a byte-wise address, where
	all other addresses in this document are 32-bit wordlength addresses.
	For this reason,
	we needed to drop the bottom two bits. This also limits the address
	space of character accesses using this method from 16 MB down to 4 MB.}
|
774 |
|
|
\\\hline
|
775 |
|
|
\parbox[t]{1.5in}{LSL \$1,Rx\\ LSLC \$1,Ry}
|
776 |
|
|
& \parbox[t]{1.5in}{LSL \$1,Ry \\
|
777 |
|
|
LSL \$1,Rx \\
|
778 |
|
|
OR.C \$1,Ry}
|
779 |
|
|
& Logical shift left with carry. Note that the
|
780 |
|
|
instruction order is now backwards, to keep the conditions valid.
|
781 |
33 |
dgisselq |
That is, LSL sets the carry flag, so if we did this the other way
|
782 |
21 |
dgisselq |
with Rx before Ry, then the condition flag wouldn't have been right
|
783 |
|
|
for an OR correction at the end. \\\hline
|
784 |
|
|
\parbox[t]{1.5in}{LSR \$1,Rx \\ LSRC \$1,Ry}
|
785 |
|
|
& \parbox[t]{1.5in}{CLR Rz \\
|
786 |
|
|
LSR \$1,Ry \\
|
787 |
|
|
LDIHI.C \$8000h,Rz \\
|
788 |
|
|
LSR \$1,Rx \\
|
789 |
|
|
OR Rz,Rx}
|
790 |
|
|
& Logical shift right with carry \\\hline
|
791 |
|
|
NEG Rx & \parbox[t]{1.5in}{XOR \$-1,Rx \\ ADD \$1,Rx} & \\\hline
|
792 |
|
|
NOOP & NOOP & While there are many
|
793 |
|
|
operations that do nothing, such as MOV Rx,Rx, or OR \$0,Rx, these
|
794 |
|
|
operations have consequences in that they might stall the bus if
|
795 |
|
|
Rx isn't ready yet. For this reason, we have a dedicated NOOP
|
796 |
|
|
instruction. \\\hline
|
797 |
|
|
NOT Rx & XOR \$-1,Rx & \\\hline
|
798 |
|
|
POP Rx
|
799 |
|
|
& \parbox[t]{1.5in}{LOD \$-1(SP),Rx \\ ADD \$1,SP}
|
800 |
|
|
& Note
|
801 |
|
|
that for interrupt purposes, one can never depend upon the value at
|
802 |
|
|
(SP). Hence you read from it, then increment it, lest having
|
803 |
33 |
dgisselq |
incremented it first something then comes along and writes to that
|
804 |
21 |
dgisselq |
value before you can read the result. \\\hline
|
805 |
|
|
PUSH Rx
|
806 |
33 |
dgisselq |
& \parbox[t]{1.5in}{SUB \$1,SP \\
|
807 |
21 |
dgisselq |
STO Rx,\$1(SP)}
|
808 |
|
|
& \\\hline
|
809 |
|
|
RESET
|
810 |
|
|
& \parbox[t]{1in}{STO \$1,\$watchdog(R12)\\NOOP\\NOOP}
|
811 |
|
|
& \parbox[t]{3in}{This depends upon the peripheral base address being
|
812 |
|
|
in R12.
|
813 |
|
|
|
814 |
|
|
Another opportunity might be to jump to the reset address from within
|
815 |
|
|
supervisor mode.}\\\hline
|
816 |
24 |
dgisselq |
RET & \parbox[t]{1.5in}{LOD \$-1(SP),PC}
|
817 |
|
|
& Note that this depends upon the calling context to clean up the
|
818 |
|
|
stack, as outlined for the JSR instruction. \\\hline
|
819 |
21 |
dgisselq |
\end{tabular}
|
820 |
|
|
\caption{Derived Instructions, continued}\label{tbl:derived-2}
|
821 |
|
|
\end{center}\end{table}
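
To make the address arithmetic in the LOD.b entry of
Tbl.~\ref{tbl:derived-2} concrete, suppose the byte address \$addr is
{\tt 0x1006} (a value chosen purely for illustration). Then
\[\begin{array}{rcll}
\mbox{\tt 0x1006} \gg 2 & = & \mbox{\tt 0x0401} & \mbox{(word address, Ra)} \\
\mbox{\tt 0x1006}\ \&\ 3 & = & 2 & \mbox{(byte within the word, Rb)} \\
2 \ll 3 & = & 16 & \mbox{(that byte's offset in bits)}
\end{array}\]
Which end of the word byte~2 sits at depends upon the byte ordering chosen.
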
|
822 |
|
|
\begin{table}\begin{center}
|
823 |
|
|
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual & Notes \\\hline
|
824 |
|
|
RET & MOV R12,PC
|
825 |
|
|
& This is the high(er) speed version, that doesn't touch the stack.
|
826 |
|
|
As such, it doesn't suffer a stall on memory read/write to the stack.
|
827 |
|
|
\\\hline
|
828 |
|
|
STEP Rr,Rt
|
829 |
|
|
& \parbox[t]{1.5in}{LSR \$1,Rr \\ XOR.C Rt,Rr}
|
830 |
|
|
& Step a Galois implementation of a Linear Feedback Shift Register, Rr,
|
831 |
|
|
using taps Rt \\\hline
|
832 |
|
|
STO.b Rx,\$addr
|
833 |
|
|
& \parbox[t]{1.5in}{%
|
834 |
|
|
LDI \$addr,Ra \\
|
835 |
|
|
LDI \$addr,Rb \\
|
836 |
|
|
LSR \$2,Ra \\
|
837 |
|
|
AND \$3,Rb \\
|
838 |
|
|
SUB \$32,Rb \\
|
839 |
|
|
LOD (Ra),Ry \\
|
840 |
|
|
AND \$0ffh,Rx \\
|
841 |
|
|
AND \$-0ffh,Ry \\
|
842 |
|
|
ROL Rb,Rx \\
|
843 |
|
|
OR Rx,Ry \\
|
844 |
|
|
STO Ry,(Ra) }
|
845 |
|
|
& \parbox[t]{3in}{This CPU and its bus are {\em not} optimized
	for byte-wise operations.

	Note that in this example, \$addr is a
	byte-wise address, whereas in all of our other examples it is a
	32-bit word address. This also limits the address space
	of character accesses from 16 MB down to 4 MB.
	Further, this instruction implies a byte ordering,
	such as big or little endian.} \\\hline
|
854 |
|
|
SWAP Rx,Ry
|
855 |
|
|
& \parbox[t]{1.5in}{
|
856 |
|
|
XOR Ry,Rx \\
|
857 |
|
|
XOR Rx,Ry \\
|
858 |
|
|
XOR Ry,Rx}
|
859 |
|
|
& While no extra registers are needed, this example
	does take three clocks. \\\hline
|
861 |
|
|
TRAP \#X
|
862 |
|
|
& LDILO \$x,CC
|
863 |
|
|
& This approach uses the unused bits of the CC register as a TRAP
|
864 |
24 |
dgisselq |
address. The user will need to make certain
|
865 |
21 |
dgisselq |
that the SLEEP and GIE bits are not set in \$x. LDI would also work,
|
866 |
|
|
however using LDILO permits the use of conditional traps. (i.e.,
|
867 |
|
|
trap if the zero flag is set.) Should you wish to trap off of a
|
868 |
|
|
register value, you could equivalently load \$x into the register and
|
869 |
|
|
then MOV it into the CC register. \\\hline
|
870 |
|
|
TST Rx
|
871 |
|
|
& TST \$-1,Rx
|
872 |
|
|
& Set the condition codes based upon Rx. Could also do a CMP \$0,Rx,
	ADD \$0,Rx, SUB \$0,Rx, AND \$-1,Rx, etc. The TST and CMP
	approaches won't stall future pipeline stages looking for the value
	of Rx. \\\hline
|
876 |
|
|
WAIT
|
877 |
|
|
& OR \$SLEEP,CC
& Wait 'til interrupt. In an interrupts disabled context, this
	becomes a HALT instruction. \\\hline
|
880 |
|
|
\end{tabular}
|
881 |
|
|
\caption{Derived Instructions, continued}\label{tbl:derived-3}
|
882 |
|
|
\end{center}\end{table}
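
The SWAP entry in Tbl.~\ref{tbl:derived-3} works because XOR is its own
inverse. Writing $a$ and $b$ for the initial values of Rx and Ry (notation
introduced here for illustration), and reading each instruction as
``XOR source,destination'':
\[\begin{array}{lrcl}
\mbox{\tt XOR Ry,Rx:} & \mbox{Rx} & = & a \oplus b \\
\mbox{\tt XOR Rx,Ry:} & \mbox{Ry} & = & b \oplus (a \oplus b) = a \\
\mbox{\tt XOR Ry,Rx:} & \mbox{Rx} & = & (a \oplus b) \oplus a = b
\end{array}\]
For instance, $a = 5$ and $b = 3$ give Rx~$= 6$, then Ry~$= 5$, then Rx~$= 3$.
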
|
883 |
|
|
\iffalse
|
884 |
|
|
\fi
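
As a small worked example of the STEP entry in Tbl.~\ref{tbl:derived-3},
assume that LSR leaves the bit it shifts out in the carry flag, so that XOR.C
only executes when that bit was a one. With Rr~$= 1011_2$ and taps
Rt~$= 1100_2$ (values picked for illustration):
\[\begin{array}{lll}
\mbox{\tt LSR \$1,Rr} & \mbox{Rr} = 0101_2, & \mbox{carry} = 1 \\
\mbox{\tt XOR.C Rt,Rr} & \mbox{Rr} = 0101_2 \oplus 1100_2 = 1001_2 &
\end{array}\]
Had the low bit of Rr been zero, the XOR would have been skipped and Rr would
simply have been shifted.
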
|
885 |
|
|
\section{Pipeline Stages}
As mentioned in the introduction, and highlighted in Fig.~\ref{fig:cpu},
the Zip CPU supports a five-stage pipeline.
\begin{enumerate}
\item {\bf Prefetch}: Read instruction from memory (cache if possible). This
	stage is actually pipelined itself, and so it will stall if the PC
	ever changes. Stalls are also created here if the instruction isn't
	in the prefetch cache.
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
	immediate offset. This stage also determines whether the flags will
	be set or whether the result will be written back.
\item {\bf Read Operands}: Read registers and apply any immediate values to
	them. There is no means of detecting or flagging arithmetic overflow
	or carry when adding the immediate to the operand. This stage will
	stall if any source operand is pending.
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
	instruction, and the {\bf MemOps} stage which accomplishes memory
	read/write.
	\begin{itemize}
	\item Loads stall any instruction that accesses the loaded register
		until it has been written to the register set.
	\item Condition codes are available upon completion.
	\item Issuing an instruction to the memory while the memory is busy will
		stall the entire pipeline. If the bus deadlocks, only a reset
		will release the CPU. (Watchdog timer, anyone?)
	\item The Zip CPU currently has no means of reading and acting on any
		error conditions on the bus.
	\end{itemize}
\item {\bf Write-Back}: Conditionally write back the result to the register
	set, applying the condition. This routine is bi-re-entrant: either the
	memory or the simple instruction may request a register write.
\end{enumerate}

The Zip CPU does not support out of order execution. Therefore, if the memory
unit stalls, every other instruction stalls. Memory stores, however, can take
place concurrently with ALU operations, although memory reads cannot.

\iffalse
|
923 |
|
|
|
924 |
21 |
dgisselq |
\section{Pipeline Logic}
|
925 |
|
|
How the CPU handles some instruction combinations can be telling when
|
926 |
|
|
determining what happens in the pipeline. The following lists some examples:
|
927 |
|
|
\begin{itemize}
|
928 |
|
|
\item {\bf Delayed Branching}
|
929 |
|
|
|
930 |
33 |
dgisselq |
I had originally hoped to implement delayed branching. My goal
|
931 |
|
|
was that the compiler would handle any pipeline stall conditions so
|
932 |
|
|
that the pipeline logic could be simpler within the CPU. I ran into
|
933 |
|
|
two problems with this.
|
934 |
21 |
dgisselq |
|
935 |
33 |
dgisselq |
The first problem has to deal with debug mode. When the debugger
|
936 |
|
|
single steps an instruction, that instruction goes to completion.
|
937 |
|
|
This means that if the instruction moves a value to the PC register,
|
938 |
|
|
the PC register would now contain that value, indicating that the
|
939 |
|
|
next instruction would be on the other side of the branch. There's
|
940 |
|
|
just no easy way around this: the entire CPU state must be captured
|
941 |
|
|
by the registers, to include the program counter. What value should
|
942 |
|
|
the program counter be equal to? The branch? Fine. The address
|
943 |
|
|
you are branching to? Fine. The address of the delay slot? Problem.
|
944 |
21 |
dgisselq |
|
945 |
33 |
dgisselq |
The second problem with delayed branching is the idea of suspending
|
946 |
|
|
processing for an interrupt. Which address should the CPU return
|
947 |
|
|
to upon completing the interrupt processing? The branch? Good. The
|
948 |
|
|
address after the branch? Also good. The address of the delay slot?
|
949 |
|
|
Not so good.
|
950 |
|
|
|
951 |
|
|
If you then add into this mess the idea that, if the CPU is running
|
952 |
|
|
from a really slow memory such as the flash, the delay slot may never
|
953 |
|
|
be filled before the branch is determined, then this makes even less
|
954 |
|
|
sense.
|
955 |
|
|
|
956 |
|
|
For all of these reasons, this CPU does not support delayed branching.
|
957 |
|
|
|
958 |
21 |
dgisselq |
\item {\bf Register Result:} {\tt MOV R0,R1; MOV R1,R2 }
|
959 |
|
|
|
960 |
33 |
dgisselq |
What value does R2 get, the value of R1 before the first move or the
|
961 |
|
|
value of R0? The Zip CPU has been optimized so that neither of these
|
962 |
|
|
instructions require a pipeline stall--unless an immediate were to
|
963 |
|
|
be added to R1 in the second instruction.
|
964 |
21 |
dgisselq |
|
965 |
|
|
The ZIP CPU architecture requires that R2 must equal R0 at the end of
|
966 |
32 |
dgisselq |
this operation. Even better, such combinations do not (normally)
|
967 |
|
|
stall the pipeline.
|
968 |
21 |
dgisselq |
|
969 |
33 |
dgisselq |
\item {\bf Condition Codes Result:} {\tt CMP R0,R1;} {\tt MOV.EQ \$x,PC}
|
970 |
21 |
dgisselq |
|
971 |
|
|
At issue is the same item as above, save that the CMP instruction
|
972 |
33 |
dgisselq |
updates the flags that the MOV instruction depends upon.
|
973 |
21 |
dgisselq |
|
974 |
|
|
The Zip CPU architecture requires that condition codes must be updated
|
975 |
|
|
and available immediately for the next instruction without stalling the
|
976 |
|
|
pipeline.
|
977 |
|
|
|
978 |
|
|
\item {\bf Condition Codes Register Result:} {\tt CMP R0,R1; MOV CC,R2}
|
979 |
|
|
|
980 |
|
|
At issue is the
|
981 |
|
|
fact that the logic supporting the CC register is more complicated than
|
982 |
|
|
the logic supporting any other register.
|
983 |
|
|
|
984 |
32 |
dgisselq |
The Zip CPU will stall for a cycle on this instruction.
|
985 |
33 |
dgisselq |
\item {\bf Condition Codes Register Operand:} {\tt MOV R0,R1; MOV CC,R2}
|
986 |
21 |
dgisselq |
|
987 |
33 |
dgisselq |
Unlike the previous case, this move prior to reading the {\tt CC}
|
988 |
|
|
register does not impact the {\tt CC} register. Therefore, this
|
989 |
|
|
does not stall the bus, whereas the previous one would.
|
990 |
21 |
dgisselq |
\end{itemize}
|
991 |
|
|
|
992 |
|
|
As I've studied this, I find several approaches to handling pipeline
|
993 |
|
|
issues. These approaches (and their consequences) are listed below.
|
994 |
|
|
|
995 |
|
|
\begin{itemize}
|
996 |
33 |
dgisselq |
\item {\bf All issued instructions complete, stages stall individually}
|
997 |
21 |
dgisselq |
|
998 |
|
|
What about a slow pre-fetch?
|
999 |
|
|
|
1000 |
|
|
Nominally, this works well: any issued instruction
|
1001 |
|
|
just runs to completion. If there are four issued instructions in the
|
1002 |
|
|
pipeline, with the writeback instruction being a write-to-PC
|
1003 |
|
|
instruction, the other three instructions naturally finish.
|
1004 |
|
|
|
1005 |
|
|
This approach fails when reading instructions from the flash,
|
1006 |
|
|
since such reads require N clocks to complete. Thus
|
1007 |
|
|
there may be only one instruction in the pipeline if reading from flash,
|
1008 |
|
|
or a full pipeline if reading from cache. Each of these approaches
|
1009 |
|
|
would produce a different response.
|
1010 |
|
|
|
1011 |
33 |
dgisselq |
For this reason, the Zip CPU works off of a different basis: All
|
1012 |
|
|
instructions that enter either the ALU or the memory unit will
|
1013 |
|
|
complete. Stages still stall individually.
|
1014 |
|
|
|
1015 |
21 |
dgisselq |
\item {\bf Issued instructions may be canceled}
|
1016 |
|
|
|
1017 |
33 |
dgisselq |
The problem here is that
|
1018 |
|
|
memory operations cannot be canceled: even reads may have side effects
|
1019 |
21 |
dgisselq |
on peripherals that cannot be canceled later. Further, in the case of
|
1020 |
|
|
an interrupt, it's difficult to know what to cancel. What happens in
|
1021 |
|
|
a \hbox{\tt MOV.C \$x,PC} followed by a \hbox{\tt MOV \$y,PC}
|
1022 |
33 |
dgisselq |
instruction? Which get canceled?
|
1023 |
21 |
dgisselq |
|
1024 |
33 |
dgisselq |
Because it isn't clear what would need to be canceled, the Zip CPU
|
1025 |
|
|
will not permit this combination. A MOV to the PC register will be
|
1026 |
|
|
followed by a stall, and possibly many stalls, so that the second
|
1027 |
|
|
move to PC will never be executed.
|
1028 |
21 |
dgisselq |
|
1029 |
|
|
\item {\bf All issued instructions complete.}
|
1030 |
|
|
|
1031 |
33 |
dgisselq |
Here, all issued instructions complete, but the
entire pipeline stalls if one stage is not filled. In this approach,
|
1033 |
|
|
though, we again struggle with the problems associated with
|
1034 |
|
|
delayed branching. Upon attempting to restart the processor, where
|
1035 |
|
|
do you restart it from?
|
1036 |
21 |
dgisselq |
|
1037 |
|
|
\item {\bf Memory instructions must complete}
|
1038 |
|
|
|
1039 |
32 |
dgisselq |
All instructions that enter into the memory module {\em must}
|
1040 |
21 |
dgisselq |
complete. Issued instructions from the prefetch, decode, or operand
|
1041 |
|
|
read stages may or may not complete. Jumps into code must be valid,
|
1042 |
|
|
so that interrupt returns may be valid. All instructions entering the
|
1043 |
|
|
ALU complete.
|
1044 |
|
|
|
1045 |
|
|
This looks to be the simplest approach.
|
1046 |
|
|
While the logic may be difficult, this appears to be the only
|
1047 |
|
|
re-entrant approach.
|
1048 |
|
|
|
1049 |
|
|
A {\tt new\_pc} flag will be high anytime the PC changes in an
|
1050 |
|
|
unpredictable way (i.e., it doesn't increment). This includes jumps
|
1051 |
|
|
as well as interrupts and interrupt returns. Whenever this flag may
|
1052 |
|
|
go high, memory operations and ALU operations will stall until the
|
1053 |
|
|
result is known. When the flag does go high, anything in the prefetch,
|
1054 |
|
|
decode, and read-op stage will be invalidated.
|
1055 |
|
|
|
1056 |
|
|
\end{itemize}
|
1057 |
33 |
dgisselq |
\fi
|
1058 |
21 |
dgisselq |
|
1059 |
32 |
dgisselq |
\section{Pipeline Stalls}
The processing pipeline can and will stall for a variety of reasons. Some of
these are obvious, some less so. These reasons are listed below:
\begin{itemize}
|
1063 |
|
|
\item When the prefetch cache is exhausted
|
1064 |
21 |
dgisselq |
|
1065 |
32 |
dgisselq |
This should be obvious. If the prefetch cache doesn't hold the next
instruction, the entire pipeline must stall until enough of the prefetch cache
is loaded to support it.
|
1068 |
21 |
dgisselq |
|
1069 |
32 |
dgisselq |
\item While waiting for the pipeline to load following any taken branch, jump,
|
1070 |
|
|
return from interrupt or switch to interrupt context (6 clocks)
|
1071 |
|
|
|
1072 |
|
|
If the PC suddenly changes, the pipeline is subsequently cleared and needs to
|
1073 |
|
|
be reloaded. Given that there are five stages to the pipeline, that accounts
|
1074 |
|
|
for five of the six delay clocks. The last clock is lost in the prefetch
|
1075 |
|
|
stage which needs at least one clock with a valid PC before it can produce
|
1076 |
|
|
a new output. Hence, six clocks will always be lost anytime the pipeline needs
|
1077 |
|
|
to be cleared.
|
1078 |
|
|
|
1079 |
|
|
\item When reading from a prior register while also adding an immediate offset
|
1080 |
|
|
\begin{enumerate}
|
1081 |
|
|
\item\ {\tt OPCODE ?,RA}
|
1082 |
|
|
\item\ {\em (stall)}
|
1083 |
|
|
\item\ {\tt OPCODE I+RA,RB}
|
1084 |
|
|
\end{enumerate}
|
1085 |
|
|
|
1086 |
|
|
The immediate is added to the register during the read operand stage, within
OpB decoding, so that the sum is nicely settled before it reaches the ALU.
As a result, any instruction that writes back a register must be separated by
one instruction from an opcode that reads that register and applies an
immediate offset to it. The
good news is that this stall can easily be mitigated by proper scheduling.
|
1091 |
|
|
|
1092 |
|
|
\item When writing to the CC or PC Register
|
1093 |
|
|
\begin{enumerate}
|
1094 |
|
|
\item\ {\tt OPCODE RA,PC} {\em Ex: a branch opcode}
|
1095 |
|
|
\item\ {\em (stall, even if jump not taken)}
|
1096 |
|
|
\item\ {\tt OPCODE RA,RB}
|
1097 |
|
|
\end{enumerate}
|
1098 |
|
|
Since branches take place in the writeback stage, the Zip CPU will stall the
|
1099 |
|
|
pipeline for one clock anytime there may be a possible jump. This prevents
|
1100 |
|
|
an instruction from executing a memory access after the jump but before the
|
1101 |
|
|
jump is recognized.
|
1102 |
|
|
|
1103 |
33 |
dgisselq |
This stall cannot be mitigated through scheduling.
|
1104 |
|
|
|
1105 |
32 |
dgisselq |
\item When reading from the CC register after setting the flags
|
1106 |
|
|
\begin{enumerate}
|
1107 |
|
|
\item\ {\tt ALUOP RA,RB}
|
1108 |
|
|
\item\ {\em (stall)}
|
1109 |
|
|
\item\ {\tt TST sys.ccv,CC}
|
1110 |
|
|
\item\ {\tt BZ somewhere}
|
1111 |
|
|
\end{enumerate}
|
1112 |
|
|
|
1113 |
|
|
The reason for this stall is simply performance. Many of the flags are
|
1114 |
|
|
determined via combinatorial logic after the writeback instruction is
|
1115 |
|
|
determined. Trying to then place these into the input for one of the operands
|
1116 |
|
|
created a time delay loop that would no longer execute in a single 100~MHz
|
1117 |
|
|
clock cycle. (The time delay of the multiply within the ALU wasn't helping
|
1118 |
|
|
either \ldots).
|
1119 |
|
|
|
1120 |
33 |
dgisselq |
This stall may be eliminated via proper scheduling, by placing an instruction
|
1121 |
|
|
that does not set flags in between the ALU operation and the instruction
|
1122 |
|
|
that references the CC register. For example, {\tt MOV \$addr+PC,uPC}
|
1123 |
|
|
followed by an {\tt RTU} ({\tt OR \$GIE,CC}) instruction will not incur
|
1124 |
|
|
this stall, whereas an {\tt OR \$BREAKEN,CC} followed by an {\tt OR \$STEP,CC}
|
1125 |
|
|
will incur the stall.
|
1126 |
|
|
|
1127 |
32 |
dgisselq |
\item When waiting for a memory read operation to complete
|
1128 |
|
|
\begin{enumerate}
|
1129 |
|
|
\item\ {\tt LOD address,RA}
|
1130 |
|
|
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
|
1131 |
|
|
\item\ {\tt OPCODE I+RA,RB}
|
1132 |
|
|
\end{enumerate}
|
1133 |
|
|
|
1134 |
|
|
Remember, the ZIP CPU does not support out of order execution. Therefore,
|
1135 |
|
|
anytime the memory unit becomes busy both the memory unit and the ALU must
|
1136 |
|
|
stall until the memory unit is cleared. This is especially true of a load
|
1137 |
33 |
dgisselq |
instruction, which must still write its operand back to the register file.
|
1138 |
|
|
Store instructions are different, since they can be busy with no impact on
|
1139 |
|
|
later ALU write back operations. Hence, only loads stall the pipeline.
|
1140 |
32 |
dgisselq |
|
1141 |
|
|
This also assumes that the memory being accessed is a single cycle memory.
|
1142 |
|
|
Slower memories, such as the Quad SPI flash, will take longer--perhaps even
|
1143 |
33 |
dgisselq |
as long as forty clocks. During this time the CPU and the external bus
|
1144 |
32 |
dgisselq |
will be busy, and unable to do anything else.
|
1145 |
|
|
|
1146 |
|
|
\item Memory operation followed by a memory operation
|
1147 |
|
|
\begin{enumerate}
|
1148 |
|
|
\item\ {\tt STO address,RA}
|
1149 |
|
|
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
|
1150 |
|
|
\item\ {\tt LOD address,RB}
|
1151 |
|
|
\item\ {\em (multiple stalls, bus dependent, 7 clocks best)}
|
1152 |
|
|
\end{enumerate}
|
1153 |
|
|
|
1154 |
|
|
        In this case, the LOD instruction cannot start until the STO is finished.
|
1155 |
|
|
With proper scheduling, it is possible to do something in the ALU while the
|
1156 |
|
|
STO is busy, but otherwise this pipeline will stall waiting for it to complete.
|
1157 |
|
|
|
1158 |
|
|
Note that even though the Wishbone bus can support pipelined accesses at
|
1159 |
|
|
one access per clock, only the prefetch stage can take advantage of this.
|
1160 |
|
|
Load and Store instructions are stuck at one wishbone cycle per instruction.
|
1161 |
|
|
\end{itemize}
|
1162 |
|
|
|
1163 |
|
|
|
1164 |
21 |
dgisselq |
\chapter{Peripherals}\label{chap:periph}
|
1165 |
24 |
dgisselq |
|
1166 |
|
|
While the previous chapter describes a CPU in isolation, the Zip System
|
1167 |
|
|
includes a minimum set of peripherals as well. These peripherals are shown
|
1168 |
|
|
in Fig.~\ref{fig:zipsystem}
|
1169 |
|
|
\begin{figure}\begin{center}
|
1170 |
|
|
\includegraphics[width=3.5in]{../gfx/system.eps}
|
1171 |
|
|
\caption{Zip System Peripherals}\label{fig:zipsystem}
|
1172 |
|
|
\end{center}\end{figure}
|
1173 |
|
|
and described here. They are designed to make
|
1174 |
|
|
the Zip CPU more useful in an Embedded Operating System environment.
|
1175 |
|
|
|
1176 |
21 |
dgisselq |
\section{Interrupt Controller}
|
1177 |
24 |
dgisselq |
|
1178 |
|
|
Perhaps the most important peripheral within the Zip System is the interrupt
|
1179 |
|
|
controller. While the Zip CPU itself can only handle one interrupt, and has
|
1180 |
|
|
only the one interrupt state: disabled or enabled, the interrupt controller
|
1181 |
|
|
can make things more interesting.
|
1182 |
|
|
|
1183 |
|
|
The Zip System interrupt controller module supports up to 15 interrupts, all
|
1184 |
|
|
controlled from one register. Bit~31 of the interrupt controller controls
|
1185 |
|
|
overall whether interrupts are enabled (1'b1) or disabled (1'b0). Bits~16--30
|
1186 |
|
|
control whether individual interrupts are enabled (1'b1) or disabled (1'b0).
|
1187 |
|
|
Bit~15 is an indicator showing whether or not any interrupt is active, and
|
1188 |
|
|
bits~0--14 indicate whether or not an individual interrupt is active.
|
1189 |
|
|
|
1190 |
|
|
The interrupt controller has been designed so that bits can be controlled
|
1191 |
|
|
individually without having any knowledge of the rest of the controller
|
1192 |
|
|
setting. To enable an interrupt, write to the register with the high order
|
1193 |
|
|
global enable bit set and the respective interrupt enable bit set. No other
|
1194 |
|
|
bits will be affected. To disable an interrupt, write to the register with
|
1195 |
|
|
the high order global enable bit cleared and the respective interrupt enable
|
1196 |
|
|
bit set.  To clear an interrupt, write a `1' to that interrupt's status bit.
|
1197 |
|
|
Zeros written to the register have no effect, save that a zero written to the
|
1198 |
|
|
master enable will disable all interrupts.
|
1199 |
|
|
|
1200 |
|
|
As an example, suppose you wished to enable interrupt \#4. You would then
|
1201 |
|
|
write to the register a {\tt 0x80100010} to enable interrupt \#4 and to clear
|
1202 |
|
|
any past active state. When you later wish to disable this interrupt, you would
|
1203 |
|
|
write a {\tt 0x00100010} to the register. As before, this both disables the
|
1204 |
|
|
interrupt and clears the active indicator. This also has the side effect of
|
1205 |
|
|
disabling all interrupts, so a second write of {\tt 0x80000000} may be necessary
|
1206 |
|
|
to re-enable any other interrupts.
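
As a minimal sketch of the example above, the two words can be built from the
interrupt number.  The helper names {\tt pic\_enable\_word} and
{\tt pic\_disable\_word} are hypothetical and are not part of the Zip System
sources; only the bit positions come from the description above.

```python
# Hypothetical helpers (not part of the Zip System sources) that build the
# 32-bit words described above for the interrupt controller register.
MASTER_ENABLE = 1 << 31   # bit 31: global interrupt enable
NUM_IRQS = 15             # enables in bits 16..30, status in bits 0..14


def _check(irq):
    """Reject interrupt numbers the controller does not have."""
    if not 0 <= irq < NUM_IRQS:
        raise ValueError("interrupt number must be in 0..14")


def pic_enable_word(irq):
    """Word that enables interrupt `irq` and clears any stale active state."""
    _check(irq)
    return MASTER_ENABLE | (1 << (16 + irq)) | (1 << irq)


def pic_disable_word(irq):
    """Word that disables interrupt `irq`; bit 31 is clear, so all
    interrupts are disabled as a side effect."""
    _check(irq)
    return (1 << (16 + irq)) | (1 << irq)


if __name__ == "__main__":
    print(hex(pic_enable_word(4)))    # 0x80100010
    print(hex(pic_disable_word(4)))   # 0x100010
```

A write of {\tt MASTER\_ENABLE} alone then restores the global enable after a
disable.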
|
1207 |
|
|
|
1208 |
|
|
The Zip System currently hosts two interrupt controllers, a primary and a
|
1209 |
|
|
secondary. The primary interrupt controller has one interrupt line which may
|
1210 |
|
|
come from an external interrupt controller, and one interrupt line from the
|
1211 |
|
|
secondary controller. Other primary interrupts include the system timers,
|
1212 |
|
|
the jiffies interrupt, and the manual cache interrupt. The secondary interrupt
|
1213 |
|
|
controller maintains an interrupt state for all of the processor accounting
|
1214 |
|
|
counters.
|
1215 |
|
|
|
1216 |
21 |
dgisselq |
\section{Counter}
|
1217 |
|
|
|
1218 |
|
|
The Zip Counter is a very simple counter: it just counts. It cannot be
|
1219 |
|
|
halted. When it rolls over, it issues an interrupt. Writing a value to the
|
1220 |
|
|
counter just sets the current value, and it starts counting again from that
|
1221 |
|
|
value.
|
1222 |
|
|
|
1223 |
|
|
Eight counters are implemented in the Zip System for process accounting.
|
1224 |
|
|
This may change in the future, as nothing as yet uses these counters.
|
1225 |
|
|
|
1226 |
|
|
\section{Timer}
|
1227 |
|
|
|
1228 |
|
|
The Zip Timer is also very simple: it simply counts down to zero. When it
|
1229 |
|
|
transitions from a one to a zero it creates an interrupt.
|
1230 |
|
|
|
1231 |
|
|
Writing any non-zero value to the timer starts the timer. If the high order
|
1232 |
|
|
bit is set when writing to the timer, the timer becomes an interval timer and
|
1233 |
|
|
reloads its last start time on any interrupt. Hence, to mark seconds, one
|
1234 |
|
|
might set the timer to 100~million (the number of clocks per second), and
|
1235 |
|
|
set the high bit. Ever after, the timer will interrupt the CPU once per
|
1236 |
24 |
dgisselq |
second (assuming a 100~MHz clock). This reload capability also limits the
|
1237 |
|
|
maximum timer value to $2^{31}-1$, rather than $2^{32}-1$.
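
The once-per-second example can be worked through as a short sketch.  The
function name {\tt timer\_word} and the constant names are assumptions made
for illustration; the bit layout (high bit for interval mode, 31~bits of
count) is the one described above.

```python
# Illustrative only: composes the value one would write to a Zip Timer.
AUTO_RELOAD = 1 << 31              # high order bit selects interval mode
MAX_TIMER_VALUE = (1 << 31) - 1    # the count is limited to 31 bits


def timer_word(clocks, interval=False):
    """Return the word to write for a countdown of `clocks` clock ticks."""
    if not 0 <= clocks <= MAX_TIMER_VALUE:
        raise ValueError("timer value must fit in 31 bits")
    return clocks | (AUTO_RELOAD if interval else 0)


CLOCK_HZ = 100_000_000             # assumes the 100 MHz system clock
once_per_second = timer_word(CLOCK_HZ, interval=True)

if __name__ == "__main__":
    print(hex(once_per_second))    # 0x85f5e100
```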
|
1238 |
21 |
dgisselq |
|
1239 |
|
|
\section{Watchdog Timer}
|
1240 |
|
|
|
1241 |
|
|
The watchdog timer is no different from any of the other timers, save for one
|
1242 |
|
|
critical difference: the interrupt line from the watchdog
|
1243 |
|
|
timer is tied to the reset line of the CPU. Hence writing a `1' to the
|
1244 |
|
|
watchdog timer will always reset the CPU.
|
1245 |
32 |
dgisselq |
To stop the Watchdog timer, write a `0' to it. To start it,
|
1246 |
21 |
dgisselq |
write any other number to it---as with the other timers.
|
1247 |
|
|
|
1248 |
|
|
While the watchdog timer supports interval mode, it doesn't make as much sense
|
1249 |
|
|
as it did with the other timers.
|
1250 |
|
|
|
1251 |
|
|
\section{Jiffies}
|
1252 |
|
|
|
1253 |
|
|
This peripheral is motivated by the Linux use of `jiffies' whereby a process
|
1254 |
|
|
can request to be put to sleep until a certain number of `jiffies' have
|
1255 |
|
|
elapsed. Using this interface, the CPU can read the number of `jiffies'
|
1256 |
|
|
from the peripheral (it only has the one location in address space), add the
|
1257 |
24 |
dgisselq |
sleep length to it, and write the result back to the peripheral. The zipjiffies
|
1258 |
21 |
dgisselq |
peripheral will record the value written to it only if it is nearer the current
|
1259 |
|
|
counter value than the interrupt time currently waiting.  If no other
|
1260 |
|
|
interrupts are waiting, and this time is in the future, it will be enabled.
|
1261 |
|
|
(There is currently no way to disable a jiffies interrupt once set, other
|
1262 |
24 |
dgisselq |
than to disable the interrupt line in the interrupt controller.) The processor
|
1263 |
21 |
dgisselq |
may then place this sleep request into a list among other sleep requests.
|
1264 |
|
|
Once the timer expires, it would write the next Jiffy request to the peripheral
|
1265 |
|
|
and wake up the process whose timer had expired.
|
1266 |
|
|
|
1267 |
|
|
Indeed, the Jiffies register is nothing more than a glorified counter with
|
1268 |
|
|
an interrupt. Unlike the other counters, the Jiffies register cannot be set.
|
1269 |
|
|
Writes to the jiffies register create an interrupt time. When the Jiffies
|
1270 |
|
|
register later equals the value written to it, an interrupt will be asserted
|
1271 |
|
|
and the register then continues counting as though no interrupt had taken
|
1272 |
|
|
place.
|
1273 |
|
|
|
1274 |
|
|
The purpose of this register is to support alarm times within a CPU. To
|
1275 |
|
|
set an alarm for a particular process $N$ clocks in advance, read the current
|
1276 |
|
|
Jiffies value, add $N$, and write the sum back to the Jiffies register.  The
|
1277 |
|
|
O/S must also keep track of values written to the Jiffies register. Thus,
|
1278 |
32 |
dgisselq |
when an `alarm' trips, it should be removed from the list of alarms, the list
|
1279 |
21 |
dgisselq |
should be sorted, and the next alarm in terms of Jiffies should be written
|
1280 |
|
|
to the register.
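
The bookkeeping described above might be sketched as follows.  The class
{\tt AlarmList} and its method names are hypothetical; no operating system
for the Zip CPU exists yet, so this is only one way the alarm list could be
kept.  It assumes no alarm is ever set more than $2^{31}$ jiffies ahead, so
that wrapped 32--bit differences remain unambiguous.

```python
# A sketch of O/S-side alarm bookkeeping for the Jiffies peripheral.
# All names here are illustrative; nothing below is an existing API.
MASK32 = 0xFFFFFFFF
HALF = 0x80000000


class AlarmList:
    def __init__(self):
        self.alarms = []   # (wake_time, task) pairs, wake_time is 32 bits

    def next_write(self, now):
        """Value to write to the Jiffies register: the nearest alarm."""
        if not self.alarms:
            return None
        return min(self.alarms, key=lambda a: (a[0] - now) & MASK32)[0]

    def add(self, now, n, task):
        """Set an alarm `n` jiffies after `now` (a value just read back)."""
        self.alarms.append(((now + n) & MASK32, task))
        return self.next_write(now)

    def expire(self, now):
        """On a jiffies interrupt: remove every alarm that has tripped and
        return (tasks to wake, next value to write or None)."""
        due = [t for (w, t) in self.alarms if ((now - w) & MASK32) < HALF]
        self.alarms = [(w, t) for (w, t) in self.alarms
                       if ((now - w) & MASK32) >= HALF]
        return due, self.next_write(now)
```

Taking the minimum over the wrapped difference, rather than sorting on the
raw wake time, keeps the ordering correct when the counter rolls over.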
|
1281 |
|
|
|
1282 |
24 |
dgisselq |
\section{Manual Cache}
|
1283 |
|
|
|
1284 |
|
|
The manual cache is an experimental setting that may not remain with the Zip
|
1285 |
|
|
CPU for very long. It is designed to facilitate running from FLASH or ROM
|
1286 |
32 |
dgisselq |
memory, although the pipeline prefetch cache really makes this need obsolete.
|
1287 |
|
|
The manual
|
1288 |
24 |
dgisselq |
cache works by copying data from a wishbone address (range) into the cache
|
1289 |
|
|
register, and then by making that memory available as memory to the Zip System.
|
1290 |
|
|
It is a {\em manual cache} because the processor must first specify what
|
1291 |
|
|
memory to copy, and then once copied the processor can only access the cache
|
1292 |
|
|
memory by the cache memory location. There is no transparency. It is perhaps
|
1293 |
|
|
best described as a combination DMA controller and local memory.
|
1294 |
|
|
|
1295 |
|
|
Worse, this cache is likely going to be removed from the ZipSystem. Having used
|
1296 |
|
|
the ZipSystem now for some time, I have yet to find a need or use for the manual
|
1297 |
|
|
cache. I will likely replace this peripheral with a proper DMA controller.
|
1298 |
|
|
|
1299 |
21 |
dgisselq |
\chapter{Operation}\label{chap:ops}
|
1300 |
|
|
|
1301 |
33 |
dgisselq |
The Zip CPU, and even the Zip System, is not a System on a Chip (SoC). It
|
1302 |
|
|
needs to be connected to its operational environment in order to be used.
|
1303 |
|
|
Specifically, some per system adjustments need to be made:
|
1304 |
|
|
\begin{enumerate}
|
1305 |
|
|
\item The Zip System depends upon an external 32-bit Wishbone bus. This
|
1306 |
|
|
must exist, and must be connected to the Zip CPU for it to work.
|
1307 |
|
|
\item The Zip System needs to be told of its {\tt RESET\_ADDRESS}. This is
|
1308 |
|
|
the program counter of the first instruction following a reset.
|
1309 |
|
|
\item If you want the Zip System to start up on its own, you will need to
|
1310 |
|
|
set the {\tt START\_HALTED} parameter to zero. Otherwise, if you
|
1311 |
|
|
wish to manually start the CPU, that is if upon reset you want the
|
1312 |
|
|
        CPU to start in its halted, reset state, then set this parameter to
|
1313 |
|
|
one.
|
1314 |
|
|
\item The third parameter to set is the number of interrupts you will be
|
1315 |
|
|
providing from external to the CPU. This can be anything from one
|
1316 |
|
|
to nine, but it cannot be zero. (Wire this line to a 1'b0 if you
|
1317 |
|
|
do not wish to support any external interrupts.)
|
1318 |
|
|
\item Finally, you need to place into some wishbone accessible address, whether
|
1319 |
|
|
RAM or (more likely) ROM, the initial instructions for the CPU.
|
1320 |
|
|
\end{enumerate}
|
1321 |
|
|
If you have enabled your CPU to start automatically, then upon power up the
|
1322 |
|
|
CPU will immediately start executing your instructions.
|
1323 |
|
|
|
1324 |
|
|
This is, however, not how I have used the Zip CPU. I have instead used the
|
1325 |
|
|
ZIP CPU in a more controlled environment. For me, the CPU starts in a
|
1326 |
|
|
halted state, and waits to be told to start. Further, the RESET address is a
|
1327 |
|
|
location in RAM. After bringing up the board I am using, and further the
|
1328 |
|
|
bus that is on it, the RAM memory is then loaded externally with the program
|
1329 |
|
|
I wish the Zip System to run. Once the RAM is loaded, I release the CPU.
|
1330 |
|
|
The CPU then runs until its halt condition, at which point its task is
|
1331 |
|
|
complete.
|
1332 |
|
|
|
1333 |
|
|
Eventually, I intend to place an operating system onto the ZipSystem; I'm
|
1334 |
|
|
just not there yet.
|
1335 |
|
|
|
1336 |
|
|
|
1337 |
21 |
dgisselq |
\chapter{Registers}\label{chap:regs}
|
1338 |
|
|
|
1339 |
24 |
dgisselq |
The ZipSystem registers fall into two categories, ZipSystem internal registers
|
1340 |
|
|
accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},
|
1341 |
|
|
\begin{table}[htbp]
|
1342 |
|
|
\begin{center}\begin{reglist}
|
1343 |
32 |
dgisselq |
PIC & \scalebox{0.8}{\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
|
1344 |
|
|
WDT & \scalebox{0.8}{\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
|
1345 |
|
|
CCHE & \scalebox{0.8}{\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
|
1346 |
|
|
CTRIC & \scalebox{0.8}{\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
|
1347 |
|
|
TMRA & \scalebox{0.8}{\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
|
1348 |
|
|
TMRB & \scalebox{0.8}{\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
|
1349 |
|
|
TMRC & \scalebox{0.8}{\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
|
1350 |
|
|
JIFF & \scalebox{0.8}{\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
|
1351 |
|
|
MTASK & \scalebox{0.8}{\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
|
1352 |
|
|
MMSTL & \scalebox{0.8}{\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline
|
1353 |
|
|
MPSTL & \scalebox{0.8}{\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
|
1354 |
|
|
MICNT & \scalebox{0.8}{\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline
|
1355 |
|
|
UTASK & \scalebox{0.8}{\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline
|
1356 |
|
|
UMSTL & \scalebox{0.8}{\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline
|
1357 |
|
|
UPSTL & \scalebox{0.8}{\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
|
1358 |
|
|
UICNT & \scalebox{0.8}{\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline
|
1359 |
|
|
% Cache & \scalebox{0.8}{\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
|
1360 |
24 |
dgisselq |
\end{reglist}
|
1361 |
|
|
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
|
1362 |
|
|
\end{center}\end{table}
|
1363 |
33 |
dgisselq |
and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.
|
1364 |
24 |
dgisselq |
\begin{table}[htbp]
|
1365 |
|
|
\begin{center}\begin{reglist}
|
1366 |
|
|
ZIPCTRL & 0 & 32 & R/W & Debug Control Register \\\hline
|
1367 |
|
|
ZIPDATA & 1 & 32 & R/W & Debug Data Register \\\hline
|
1368 |
|
|
\end{reglist}
|
1369 |
|
|
\caption{Zip System Debug Registers}\label{tbl:dbgregs}
|
1370 |
|
|
\end{center}\end{table}
|
1371 |
|
|
|
1372 |
33 |
dgisselq |
\section{Peripheral Registers}
|
1373 |
|
|
The peripheral registers, listed in Tbl.~\ref{tbl:zpregs}, are shown in the
|
1374 |
|
|
CPU's address space. These may be accessed by the CPU at these addresses,
|
1375 |
|
|
and when so accessed will respond as described in Chapt.~\ref{chap:periph}.
|
1376 |
|
|
These registers will be discussed briefly again here.
|
1377 |
24 |
dgisselq |
|
1378 |
33 |
dgisselq |
The Zip CPU Interrupt controller has four different types of bits, as shown in
|
1379 |
|
|
Tbl.~\ref{tbl:picbits}.
|
1380 |
|
|
\begin{table}\begin{center}
|
1381 |
|
|
\begin{bitlist}
|
1382 |
|
|
31 & R/W & Master Interrupt Enable\\\hline
|
1383 |
|
|
30\ldots 16 & R/W & Interrupt Enables, write '1' to change\\\hline
|
1384 |
|
|
15 & R & Current Master Interrupt State\\\hline
|
1385 |
|
|
14\ldots 0 & R/W & Input Interrupt states, write '1' to clear\\\hline
|
1386 |
|
|
\end{bitlist}
|
1387 |
|
|
\caption{Interrupt Controller Register Bits}\label{tbl:picbits}
|
1388 |
|
|
\end{center}\end{table}
|
1389 |
|
|
The high order bit, or bit--31, is the master interrupt enable bit. When this
|
1390 |
|
|
bit is set, then any time an interrupt occurs the CPU will be interrupted and
|
1391 |
|
|
will switch to supervisor mode, etc.
|
1392 |
|
|
|
1393 |
|
|
Bits 30~\ldots 16 are interrupt enable bits. Should the interrupt line go
|
1394 |
|
|
high while enabled, an interrupt will be generated.  To set an interrupt enable
|
1395 |
|
|
bit, one needs to write the master interrupt enable while writing a `1' to this
|
1396 |
|
|
bit.  To clear, one need only write a `0' to the master interrupt enable,
|
1397 |
|
|
while leaving this line high.
|
1398 |
|
|
|
1399 |
|
|
Bits 14\ldots 0 are the current state of the interrupt vector.  Interrupt lines
|
1400 |
|
|
trip when they go high, and remain tripped until they are acknowledged. If
|
1401 |
|
|
the interrupt goes high for longer than one pulse, it may be high when a clear
|
1402 |
|
|
is requested. If so, the interrupt will not clear. The line must go low
|
1403 |
|
|
again before the status bit can be cleared.
|
1404 |
|
|
|
1405 |
|
|
As an example, consider the following scenario where the Zip CPU supports four
|
1406 |
|
|
interrupts, 3\ldots0.
|
1407 |
|
|
\begin{enumerate}
|
1408 |
|
|
\item The Supervisor will first, while in the interrupts disabled mode,
|
1409 |
|
|
write a {\tt 32'h800f000f} to the controller. The supervisor may then
|
1410 |
|
|
switch to the user state with interrupts enabled.
|
1411 |
|
|
\item When an interrupt occurs, the supervisor will switch to the interrupt
|
1412 |
|
|
state. It will then cycle through the interrupt bits to learn which
|
1413 |
|
|
interrupt handler to call.
|
1414 |
|
|
\item If the interrupt handler expects more interrupts, it will clear its
|
1415 |
|
|
current interrupt when it is done handling the interrupt in question.
|
1416 |
|
|
To do this, it will write a '1' to the low order interrupt mask,
|
1417 |
|
|
such as writing a {\tt 32'h80000001}.
|
1418 |
|
|
\item If the interrupt handler does not expect any more interrupts, it will
|
1419 |
|
|
instead clear the interrupt from the controller by writing a
|
1420 |
|
|
{\tt 32'h00010001} to the controller.
|
1421 |
|
|
\item Once all interrupts have been handled, the supervisor will write a
|
1422 |
|
|
{\tt 32'h80000000} to the interrupt register to re-enable interrupt
|
1423 |
|
|
generation.
|
1424 |
|
|
\item The supervisor should also check the user trap bit, and possible soft
|
1425 |
|
|
interrupt bits here, but this action has nothing to do with the
|
1426 |
|
|
interrupt control register.
|
1427 |
|
|
\item The supervisor will then leave interrupt mode, possibly adjusting
|
1428 |
|
|
whichever task is running, by executing a return from interrupt
|
1429 |
|
|
command.
|
1430 |
|
|
\end{enumerate}
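
The scenario above can be sketched as a small dispatch routine.  The function
names and the convention that a handler returns {\tt True} when it expects
further interrupts are assumptions made for illustration.  The sketch only
computes the words the supervisor would write; it does not touch hardware.

```python
# Sketch of the supervisor's pass over the interrupt controller value.
# Names and the handler return convention are illustrative assumptions.
MASTER_ENABLE = 1 << 31


def pending_irqs(pic_value, num_irqs=4):
    """Interrupt numbers whose status bit and enable bit are both set."""
    return [n for n in range(num_irqs)
            if pic_value & (1 << n) and pic_value & (1 << (16 + n))]


def ack_word(irq, expect_more):
    """Word written once a handler is done with interrupt `irq`."""
    status = 1 << irq
    if expect_more:
        return MASTER_ENABLE | status      # clear status, stay enabled
    return (1 << (16 + irq)) | status      # clear status and disable


def service(pic_value, handlers):
    """Call each pending handler; return the list of words to write."""
    writes = []
    for irq in pending_irqs(pic_value, len(handlers)):
        expect_more = handlers[irq]()
        writes.append(ack_word(irq, expect_more))
    writes.append(MASTER_ENABLE)           # re-enable interrupt generation
    return writes
```

With interrupts 0 and 1 both pending, a handler for 0 that expects more and a
handler for 1 that does not, the routine yields {\tt 32'h80000001},
{\tt 32'h00020002} and finally {\tt 32'h80000000}.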
|
1431 |
|
|
|
1432 |
|
|
Leaving the interrupt controller, we show the timer registers' bit definitions
|
1433 |
|
|
in Tbl.~\ref{tbl:tmrbits}.
|
1434 |
|
|
\begin{table}\begin{center}
|
1435 |
|
|
\begin{bitlist}
|
1436 |
|
|
31 & R/W & Auto-Reload\\\hline
|
1437 |
|
|
30\ldots 0 & R/W & Current timer value\\\hline
|
1438 |
|
|
\end{bitlist}
|
1439 |
|
|
\caption{Timer Register Bits}\label{tbl:tmrbits}
|
1440 |
|
|
\end{center}\end{table}
|
1441 |
|
|
As you may recall, the timer just counts down to zero and then trips an
|
1442 |
|
|
interrupt. Writing to the current timer value sets that value, and reading
|
1443 |
|
|
from it returns that value. Writing to the current timer value while also
|
1444 |
|
|
setting the auto--reload bit will send the timer into an auto--reload mode.
|
1445 |
|
|
In this mode, upon setting its interrupt bit for one cycle, the timer will
|
1446 |
|
|
also reset itself back to the value of the timer that was written to it when
|
1447 |
|
|
the auto--reload option was written to it. To clear and stop the timer,
|
1448 |
|
|
just simply write a `32'h00' to this register.
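
The register behaviour described above may be easier to follow as a small
behavioural model.  This is a sketch written from the prose alone, not from
the Verilog; the class name {\tt TimerModel} is hypothetical, and the exact
clock on which the interrupt appears relative to the reload is an assumption.

```python
# Behavioural sketch of one Zip Timer, written from the description only.
class TimerModel:
    AUTO_RELOAD = 1 << 31
    VALUE_MASK = 0x7FFFFFFF

    def __init__(self):
        self.value = 0         # current count, bits 30..0
        self.reload = 0        # value restored in auto-reload mode
        self.interval = False  # state of the auto-reload bit

    def write(self, word):
        """A bus write sets the count and (optionally) auto-reload mode."""
        self.interval = bool(word & self.AUTO_RELOAD)
        self.value = word & self.VALUE_MASK
        self.reload = self.value

    def read(self):
        """A bus read returns the auto-reload bit and the current count."""
        return (self.AUTO_RELOAD if self.interval else 0) | self.value

    def tick(self):
        """Advance one clock; return True on the clock the interrupt trips."""
        if self.value == 0:
            return False               # stopped
        self.value -= 1
        if self.value != 0:
            return False
        if self.interval:
            self.value = self.reload   # start the next interval
        return True
```

In this model a write of {\tt 32'h80000003} produces an interrupt on every
third clock, while a write of zero leaves the timer stopped.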
|
1449 |
|
|
|
1450 |
|
|
The Jiffies register is somewhat similar in that the register always changes.
|
1451 |
|
|
In this case, the register counts up, whereas the timer always counted down.
|
1452 |
|
|
Reads from this register, as shown in Tbl.~\ref{tbl:jiffybits},
|
1453 |
|
|
\begin{table}\begin{center}
|
1454 |
|
|
\begin{bitlist}
|
1455 |
|
|
31\ldots 0 & R & Current jiffy value\\\hline
|
1456 |
|
|
31\ldots 0 & W & Value/time of next interrupt\\\hline
|
1457 |
|
|
\end{bitlist}
|
1458 |
|
|
\caption{Jiffies Register Bits}\label{tbl:jiffybits}
|
1459 |
|
|
\end{center}\end{table}
|
1460 |
|
|
always return the time value contained in the register. Writes greater than
|
1461 |
|
|
the current Jiffy value, that is where the new value minus the old value is
|
1462 |
|
|
greater than zero while ignoring truncation, will set a new Jiffy interrupt
|
1463 |
|
|
time. At that time, the Jiffy vector will clear, and another interrupt time
|
1464 |
|
|
may either be written to it, or it will just continue counting without
|
1465 |
|
|
activating any more interrupts.
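
The comparison ``greater than zero while ignoring truncation'' is a signed
comparison of a 32--bit difference.  A minimal sketch follows; the function
name {\tt jiffies\_after} is an assumption, chosen for illustration.

```python
def jiffies_after(new, now):
    """True when `new` lies in the future of `now` on a 32-bit counter
    that wraps around, i.e. when (new - now), taken as a signed 32-bit
    number, is greater than zero."""
    diff = (new - now) & 0xFFFFFFFF
    return 0 < diff < 0x80000000
```

Because only the difference is examined, a write of {\tt 32'h00000005} made
while the counter reads {\tt 32'hfffffff0} still counts as a future time.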
|
1466 |
|
|
|
1467 |
|
|
The Zip CPU also supports several counter peripherals, mostly in the way of
|
1468 |
|
|
process accounting.  These peripherals each have a single register associated with
|
1469 |
|
|
them, shown in Tbl.~\ref{tbl:ctrbits}.
|
1470 |
|
|
\begin{table}\begin{center}
|
1471 |
|
|
\begin{bitlist}
|
1472 |
|
|
31\ldots 0 & R/W & Current counter value\\\hline
|
1473 |
|
|
\end{bitlist}
|
1474 |
|
|
\caption{Counter Register Bits}\label{tbl:ctrbits}
|
1475 |
|
|
\end{center}\end{table}
|
1476 |
|
|
Writes to this register set the new counter value. Reads read the current
|
1477 |
|
|
counter value.
|
1478 |
|
|
|
1479 |
|
|
The current design operation of these counters is that of performance counting.
|
1480 |
|
|
Two sets of four registers are available for keeping track of performance.
|
1481 |
|
|
The first is a task counter. This just counts clock ticks. The second
|
1482 |
|
|
and third are stall counters: a memory stall counter and a prefetch stall counter.  These
|
1483 |
|
|
allow the CPU to be evaluated as to how efficient it is. The fourth and
|
1484 |
|
|
final counter is an instruction counter, which counts how many instructions the
|
1485 |
|
|
CPU has issued.
|
1486 |
|
|
|
1487 |
|
|
It is envisioned that these counters will be used as follows: First, every time
|
1488 |
|
|
a master counter rolls over, the supervisor (Operating System) will record
|
1489 |
|
|
the fact. Second, whenever activating a user task, the Operating System will
|
1490 |
|
|
set the four user counters to zero. When the user task has completed, the
|
1491 |
|
|
Operating System will read the counters back, to determine how much of the
|
1492 |
|
|
CPU the process had consumed.
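
As a sketch of the accounting envisioned above, the four counter readings for
a task might be reduced to a few ratios.  The function names and the choice
of ratios are assumptions; as noted earlier, nothing yet uses these counters.

```python
# Illustrative post-processing of one set of four accounting counters.
MASK32 = 0xFFFFFFFF


def counter_delta(start, end):
    """Ticks elapsed between two reads of a 32-bit counter, allowing for
    a single rollover in between."""
    return (end - start) & MASK32


def usage_report(clocks, mem_stalls, prefetch_stalls, instructions):
    """Summarise one task's counters; returns None if no clocks elapsed."""
    if clocks == 0:
        return None
    return {
        "instructions_per_clock": instructions / clocks,
        "memory_stall_fraction": mem_stalls / clocks,
        "prefetch_stall_fraction": prefetch_stalls / clocks,
    }
```

If the user counters are zeroed when the task is activated, the raw readings
can be passed to {\tt usage\_report} directly; {\tt counter\_delta} is only
needed when they are left running.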
|
1493 |
|
|
|
1494 |
|
|
\section{Debug Port Registers}
|
1495 |
|
|
Accessing the Zip System via the debug port isn't as straightforward as
|
1496 |
|
|
accessing the system via the wishbone bus. The debug port itself has been
|
1497 |
|
|
reduced to two addresses, as outlined earlier in Tbl.~\ref{tbl:dbgregs}.
|
1498 |
|
|
Access to the Zip System begins with the Debug Control register, shown in
|
1499 |
|
|
Tbl.~\ref{tbl:dbgctrl}.
|
1500 |
|
|
\begin{table}\begin{center}
|
1501 |
|
|
\begin{bitlist}
|
1502 |
|
|
31\ldots 14 & R & Reserved\\\hline
|
1503 |
|
|
13 & R & CPU GIE setting\\\hline
|
1504 |
|
|
12 & R & CPU is sleeping\\\hline
|
1505 |
|
|
11 & W & Command clear PF cache\\\hline
|
1506 |
|
|
10 & R/W & Command HALT, Set to '1' to halt the CPU\\\hline
|
1507 |
|
|
9 & R & Stall Status, '1' if CPU is busy\\\hline
|
1508 |
|
|
8 & R/W & Step Command, set to '1' to step the CPU\\\hline
|
1509 |
|
|
7 & R & Interrupt Request \\\hline
|
1510 |
|
|
6 & R/W & Command RESET \\\hline
|
1511 |
|
|
5\ldots 0 & R/W & Debug Register Address \\\hline
|
1512 |
|
|
\end{bitlist}
|
1513 |
|
|
\caption{Debug Control Register Bits}\label{tbl:dbgctrl}
|
1514 |
|
|
\end{center}\end{table}
|
1515 |
|
|
|
1516 |
|
|
The first step in debugging access is to determine whether or not the CPU
|
1517 |
|
|
is halted, and to halt it if not. To do this, first write a '1' to the
|
1518 |
|
|
Command HALT bit. This will halt the CPU and place it into debug mode.
|
1519 |
|
|
Once the CPU is halted, the stall status bit will drop to zero. Thus,
|
1520 |
|
|
if bit 10 is high and bit 9 low, the debug port is open to examine the
|
1521 |
|
|
internal state of the CPU.
|
1522 |
|
|
|
1523 |
|
|
At this point, the external debugger may examine internal state information
|
1524 |
|
|
from within the CPU. To do this, first write again to the command register
|
1525 |
|
|
a value (with command halt still high) containing the address of an internal
|
1526 |
|
|
register of interest in the bottom 6~bits. Internal registers that may be
|
1527 |
|
|
accessed this way are listed in Tbl.~\ref{tbl:dbgaddrs}.
|
1528 |
|
|
\begin{table}\begin{center}
|
1529 |
|
|
\begin{reglist}
|
1530 |
|
|
sR0 & 0 & 32 & R/W & Supervisor Register R0 \\\hline
|
1531 |
|
|
sR1 & 1 & 32 & R/W & Supervisor Register R1 \\\hline
|
1532 |
|
|
sSP & 13 & 32 & R/W & Supervisor Stack Pointer\\\hline
|
1533 |
|
|
sCC & 14 & 32 & R/W & Supervisor Condition Code Register \\\hline
|
1534 |
|
|
sPC & 15 & 32 & R/W & Supervisor Program Counter\\\hline
|
1535 |
|
|
uR0 & 16 & 32 & R/W & User Register R0 \\\hline
|
1536 |
|
|
uR1 & 17 & 32 & R/W & User Register R1 \\\hline
|
1537 |
|
|
uSP & 29 & 32 & R/W & User Stack Pointer\\\hline
|
1538 |
|
|
uCC & 30 & 32 & R/W & User Condition Code Register \\\hline
|
1539 |
|
|
uPC & 31 & 32 & R/W & User Program Counter\\\hline
|
1540 |
|
|
PIC & 32 & 32 & R/W & Primary Interrupt Controller \\\hline
|
1541 |
|
|
WDT & 33 & 32 & R/W & Watchdog Timer\\\hline
|
1542 |
|
|
CCHE & 34 & 32 & R/W & Manual Cache Controller\\\hline
|
1543 |
|
|
CTRIC & 35 & 32 & R/W & Secondary Interrupt Controller\\\hline
|
1544 |
|
|
TMRA & 36 & 32 & R/W & Timer A\\\hline
|
1545 |
|
|
TMRB & 37 & 32 & R/W & Timer B\\\hline
|
1546 |
|
|
TMRC & 38 & 32 & R/W & Timer C\\\hline
|
1547 |
|
|
JIFF & 39 & 32 & R/W & Jiffies peripheral\\\hline
|
1548 |
|
|
MTASK & 40 & 32 & R/W & Master task clock counter\\\hline
|
1549 |
|
|
MMSTL & 41 & 32 & R/W & Master memory stall counter\\\hline
|
1550 |
|
|
MPSTL & 42 & 32 & R/W & Master Pre-Fetch Stall counter\\\hline
|
1551 |
|
|
MICNT & 43 & 32 & R/W & Master instruction counter\\\hline
|
1552 |
|
|
UTASK & 44 & 32 & R/W & User task clock counter\\\hline
|
1553 |
|
|
UMSTL & 45 & 32 & R/W & User memory stall counter\\\hline
|
1554 |
|
|
UPSTL & 46 & 32 & R/W & User Pre-Fetch Stall counter\\\hline
|
1555 |
|
|
UICNT & 47 & 32 & R/W & User instruction counter\\\hline
|
1556 |
|
|
\end{reglist}
|
1557 |
|
|
\caption{Debug Register Addresses}\label{tbl:dbgaddrs}
|
1558 |
|
|
\end{center}\end{table}
|
1559 |
|
|
Primarily, these ``registers'' include access to the entire CPU register
|
1560 |
|
|
set, as well as the 16~internal peripherals. To read one of these registers
|
1561 |
|
|
once the address is set, simply issue a read from the data port. To write
|
1562 |
|
|
one of these registers or peripheral ports, simply write to the data port
|
1563 |
|
|
after setting the proper address.
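
The halt, select and access sequence described above might look as follows
from the debugger's side.  The {\tt FakeDebugBus} class is a stand--in
invented for this sketch so that it runs on its own; a real debugger would
replace it with whatever link reaches the two debug port addresses.  The
function names are likewise hypothetical.

```python
# Sketch of debugger-side register access through the two debug addresses.
ZIPCTRL, ZIPDATA = 0, 1     # debug port addresses
CMD_HALT = 1 << 10          # Command HALT bit
STALL = 1 << 9              # Stall Status bit
ADDR_MASK = 0x3F            # bits 5..0: debug register address


class FakeDebugBus:
    """Stand-in for a real debug link: 64 registers, never stalled."""

    def __init__(self):
        self.ctrl = 0
        self.regs = [0] * 64

    def write(self, addr, value):
        if addr == ZIPCTRL:
            self.ctrl = value & (CMD_HALT | ADDR_MASK)
        else:
            self.regs[self.ctrl & ADDR_MASK] = value & 0xFFFFFFFF

    def read(self, addr):
        if addr == ZIPCTRL:
            return self.ctrl
        return self.regs[self.ctrl & ADDR_MASK]


def halt(bus, max_polls=1000):
    """Request a halt, then poll until bit 10 is high and bit 9 is low."""
    bus.write(ZIPCTRL, CMD_HALT)
    for _ in range(max_polls):
        ctrl = bus.read(ZIPCTRL)
        if (ctrl & CMD_HALT) and not (ctrl & STALL):
            return True
    return False


def read_register(bus, regnum):
    """Select a register (keeping halt high), then read the data port."""
    bus.write(ZIPCTRL, CMD_HALT | (regnum & ADDR_MASK))
    return bus.read(ZIPDATA)


def write_register(bus, regnum, value):
    """Select a register (keeping halt high), then write the data port."""
    bus.write(ZIPCTRL, CMD_HALT | (regnum & ADDR_MASK))
    bus.write(ZIPDATA, value)
```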
|
1564 |
|
|
|
1565 |
|
|
In this manner, all of the CPU's internal state may be read and adjusted.
|
1566 |
|
|
|
1567 |
|
|
As an example of how to use this, consider what would happen in the case
|
1568 |
|
|
of an external break point. If and when the CPU hits a break point that
|
1569 |
|
|
causes it to halt, the Command HALT bit will activate on its own, the CPU
|
1570 |
|
|
will then raise an external interrupt line and wait for a debugger to examine
|
1571 |
|
|
its state. After examining the state, the debugger will need to remove
|
1572 |
|
|
the breakpoint by writing a different instruction into memory and by writing
|
1573 |
|
|
to the command register while holding the clear cache, command halt, and
|
1574 |
|
|
step CPU bits high, (32'hd00). The debugger may then replace the breakpoint
|
1575 |
|
|
now that the CPU has gone beyond it, and clear the cache again (32'h500).
|
1576 |
|
|
|
1577 |
|
|
To leave this debug mode, simply write a `32'h0' value to the command register.
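
The command words used in this section follow directly from the bit
positions of the Debug Control register.  As a sketch, with a hypothetical
helper name {\tt debug\_command}:

```python
# Composes a Debug Control register word from its writable fields.
# The helper is illustrative; only the bit positions come from the table.
def debug_command(halt=False, step=False, clear_cache=False,
                  reset=False, addr=0):
    word = addr & 0x3F          # bits 5..0: debug register address
    if reset:
        word |= 1 << 6          # Command RESET
    if step:
        word |= 1 << 8          # Step Command
    if halt:
        word |= 1 << 10         # Command HALT
    if clear_cache:
        word |= 1 << 11         # Command clear PF cache
    return word


if __name__ == "__main__":
    # clear cache + halt + step
    print(hex(debug_command(halt=True, step=True, clear_cache=True)))  # 0xd00
    # leave debug mode
    print(hex(debug_command()))                                        # 0x0
```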
|
1578 |
|
|
|
1579 |
|
|
\chapter{Wishbone Datasheets}\label{chap:wishbone}
|
1580 |
32 |
dgisselq |
The Zip System supports two wishbone ports, a slave debug port and a master
|
1581 |
21 |
dgisselq |
port for the system itself. These are shown in Tbl.~\ref{tbl:wishbone-slave}
|
1582 |
|
|
\begin{table}[htbp]
|
1583 |
|
|
\begin{center}
|
1584 |
|
|
\begin{wishboneds}
|
1585 |
|
|
Revision level of wishbone & WB B4 spec \\\hline
|
1586 |
|
|
Type of interface & Slave, Read/Write, single words only \\\hline
|
1587 |
24 |
dgisselq |
Address Width & 1--bit \\\hline
|
1588 |
21 |
dgisselq |
Port size & 32--bit \\\hline
|
1589 |
|
|
Port granularity & 32--bit \\\hline
|
1590 |
|
|
Maximum Operand Size & 32--bit \\\hline
|
1591 |
|
|
Data transfer ordering & (Irrelevant) \\\hline
|
1592 |
|
|
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
|
1593 |
|
|
Signal Names & \begin{tabular}{ll}
|
1594 |
|
|
Signal Name & Wishbone Equivalent \\\hline
|
1595 |
|
|
{\tt i\_clk} & {\tt CLK\_I} \\
|
1596 |
|
|
{\tt i\_dbg\_cyc} & {\tt CYC\_I} \\
|
1597 |
|
|
{\tt i\_dbg\_stb} & {\tt STB\_I} \\
|
1598 |
|
|
{\tt i\_dbg\_we} & {\tt WE\_I} \\
|
1599 |
|
|
{\tt i\_dbg\_addr} & {\tt ADR\_I} \\
|
1600 |
|
|
{\tt i\_dbg\_data} & {\tt DAT\_I} \\
|
1601 |
|
|
{\tt o\_dbg\_ack} & {\tt ACK\_O} \\
|
1602 |
|
|
{\tt o\_dbg\_stall} & {\tt STALL\_O} \\
|
1603 |
|
|
{\tt o\_dbg\_data} & {\tt DAT\_O}
|
1604 |
|
|
\end{tabular}\\\hline
|
1605 |
|
|
\end{wishboneds}
|
1606 |
22 |
dgisselq |
\caption{Wishbone Datasheet for the Debug Interface}\label{tbl:wishbone-slave}
|
1607 |
21 |
dgisselq |
\end{center}\end{table}
|
1608 |
|
|
and Tbl.~\ref{tbl:wishbone-master} respectively.
|
1609 |
|
|
\begin{table}[htbp]
|
1610 |
|
|
\begin{center}
|
1611 |
|
|
\begin{wishboneds}
|
1612 |
|
|
Revision level of wishbone & WB B4 spec \\\hline
|
1613 |
24 |
dgisselq |
Type of interface & Master, Read/Write, single cycle or pipelined\\\hline
|
1614 |
|
|
Address Width & 32--bit \\\hline
|
1615 |
21 |
dgisselq |
Port size & 32--bit \\\hline
|
1616 |
|
|
Port granularity & 32--bit \\\hline
|
1617 |
|
|
Maximum Operand Size & 32--bit \\\hline
|
1618 |
|
|
Data transfer ordering & (Irrelevant) \\\hline
|
1619 |
|
|
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
|
1620 |
|
|
Signal Names & \begin{tabular}{ll}
|
1621 |
|
|
Signal Name & Wishbone Equivalent \\\hline
|
1622 |
|
|
{\tt i\_clk} & {\tt CLK\_O} \\
|
1623 |
|
|
{\tt o\_wb\_cyc} & {\tt CYC\_O} \\
|
1624 |
|
|
{\tt o\_wb\_stb} & {\tt STB\_O} \\
|
1625 |
|
|
{\tt o\_wb\_we} & {\tt WE\_O} \\
|
1626 |
|
|
{\tt o\_wb\_addr} & {\tt ADR\_O} \\
|
1627 |
|
|
{\tt o\_wb\_data} & {\tt DAT\_O} \\
|
1628 |
|
|
{\tt i\_wb\_ack} & {\tt ACK\_I} \\
|
1629 |
|
|
{\tt i\_wb\_stall} & {\tt STALL\_I} \\
|
1630 |
|
|
{\tt i\_wb\_data} & {\tt DAT\_I}
|
1631 |
|
|
\end{tabular}\\\hline
|
1632 |
|
|
\end{wishboneds}
|
1633 |
22 |
dgisselq |
\caption{Wishbone Datasheet for the CPU as Master}\label{tbl:wishbone-master}
|
1634 |
21 |
dgisselq |
\end{center}\end{table}
|
1635 |
|
|
I do not recommend connecting these two ports together through the
interconnect. Rather, the debug port of the CPU should remain accessible
regardless of the state of the master bus.

Note that neither the {\tt ERR} nor the {\tt RETRY} wires have been
implemented. As a result, the CPU is currently unable to detect a bus error
condition, and so it may stall indefinitely (hang) should it attempt to access
an address that nothing on the bus responds to, or a peripheral that is not
yet properly configured.
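
As a worked example of what the address widths in
Tbl.~\ref{tbl:wishbone-slave} and Tbl.~\ref{tbl:wishbone-master} imply, assume
that each address selects one full 32--bit word, as a port granularity equal
to the port size suggests. Under that assumption, the master port can reach
\[
2^{32}~\mbox{words}\times 4~\mbox{bytes/word} = 2^{34}~\mbox{bytes} = 16~\mbox{GiB}
\]
of address space, while the single address bit of the debug port selects
between only $2^{1}=2$ registers of 32~bits each.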

\chapter{Clocks}\label{chap:clocks}

This core is based upon the Basys--3 development board sold by Digilent.
The Basys--3 development board contains one external 100~MHz clock, listed in
Tbl.~\ref{tbl:clocks}, which is sufficient to run the Zip CPU core.
\begin{table}[htbp]
\begin{center}
\begin{clocklist}
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
\end{clocklist}
\caption{List of Clocks}\label{tbl:clocks}
\end{center}\end{table}
I hesitate to suggest that the core can run faster than 100~MHz, since I have
struggled with various timing violations to keep it at 100~MHz. So, for
now, I will only state that it can run at 100~MHz.
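
To put this clock rate in perspective, the arithmetic below is a generic
timing budget and not a measured property of this core. At $f=100$~MHz, the
clock period is
\[
T = \frac{1}{f} = \frac{1}{100\times 10^{6}~\mbox{Hz}} = 10~\mbox{ns},
\]
so the logic between any two registers clocked by {\tt i\_clk} has at most
10~ns, less register setup time and clock skew, in which to settle. Likewise,
assuming the idealized case of one 32--bit transfer on every clock with no
stalls, the peak rate across the master Wishbone port would be
\[
4~\mbox{bytes}\times 100\times 10^{6}~\mbox{s}^{-1} = 400\times 10^{6}~\mbox{bytes/s}.
\]
Actual throughput will be lower whenever a slave stalls or a transfer takes
more than one clock.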


\chapter{I/O Ports}\label{chap:ioports}
The I/O ports to the Zip CPU may be grouped into three categories. The first
is the master wishbone used by the CPU, the second is the slave wishbone used
to command the CPU via a debugger, and the third is everything else. The first
two of these were already discussed in the wishbone chapter. They are listed
here for completeness in Tbl.~\ref{tbl:iowb-master}
\begin{table}
\begin{center}\begin{portlist}
{\tt o\_wb\_cyc} & 1 & Output & Indicates an active Wishbone cycle\\\hline
{\tt o\_wb\_stb} & 1 & Output & WB Strobe signal\\\hline
{\tt o\_wb\_we} & 1 & Output & Write enable\\\hline
{\tt o\_wb\_addr} & 32 & Output & Bus address \\\hline
{\tt o\_wb\_data} & 32 & Output & Data on WB write\\\hline
{\tt i\_wb\_ack} & 1 & Input & Slave has completed a R/W cycle\\\hline
{\tt i\_wb\_stall} & 1 & Input & WB bus slave not ready\\\hline
{\tt i\_wb\_data} & 32 & Input & Incoming bus data\\\hline
\end{portlist}\caption{CPU Master Wishbone I/O Ports}\label{tbl:iowb-master}\end{center}\end{table}
and Tbl.~\ref{tbl:iowb-slave} respectively.
\begin{table}
\begin{center}\begin{portlist}
{\tt i\_dbg\_cyc} & 1 & Input & Indicates an active Wishbone cycle\\\hline
{\tt i\_dbg\_stb} & 1 & Input & WB Strobe signal\\\hline
{\tt i\_dbg\_we} & 1 & Input & Write enable\\\hline
{\tt i\_dbg\_addr} & 1 & Input & Bus address, command or data port \\\hline
{\tt i\_dbg\_data} & 32 & Input & Data on WB write\\\hline
{\tt o\_dbg\_ack} & 1 & Output & Slave has completed a R/W cycle\\\hline
{\tt o\_dbg\_stall} & 1 & Output & WB bus slave not ready\\\hline
{\tt o\_dbg\_data} & 32 & Output & Data returned on WB read\\\hline
\end{portlist}\caption{CPU Debug Wishbone I/O Ports}\label{tbl:iowb-slave}\end{center}\end{table}

There are only four other lines to the CPU: the external clock, external
reset, incoming external interrupt line(s), and the outgoing debug interrupt
line. These are shown in Tbl.~\ref{tbl:ioports}.
\begin{table}
\begin{center}\begin{portlist}
{\tt i\_clk} & 1 & Input & The master CPU clock \\\hline
{\tt i\_rst} & 1 & Input & Active high reset line \\\hline
{\tt i\_ext\_int} & 1\ldots 6 & Input & Incoming external interrupts \\\hline
{\tt o\_ext\_int} & 1 & Output & CPU Halted interrupt \\\hline
\end{portlist}\caption{I/O Ports}\label{tbl:ioports}\end{center}\end{table}
The clock line was discussed briefly in Chapt.~\ref{chap:clocks}. We
typically run it at 100~MHz. The reset line is an active high reset. When
asserted, the CPU resets and will restart from its reset address in
memory. Whether it then begins running automatically depends upon how the CPU
is configured, and specifically on the {\tt START\_HALTED} parameter.
The {\tt i\_ext\_int} line is for an external interrupt. This
line may be as wide as 6~external interrupts, depending upon the setting of
the {\tt EXTERNAL\_INTERRUPTS} parameter. As currently configured, the Zip
System only supports one such interrupt line by default. For us, this line is
the output of another interrupt controller, but that's a board--specific setup
detail. Finally, the Zip System produces one external interrupt,
{\tt o\_ext\_int}, whenever the CPU halts to wait for the debugger.

% Appendices
% Index
\end{document}