%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Filename:    spec.tex
%%
%% Project:     Zip CPU -- a small, lightweight, RISC CPU soft core
%%
%% Purpose:     This LaTeX file contains all of the documentation/description
%%              currently provided with this Zip CPU soft core.  It supersedes
%%              any information about the instruction set or CPUs found
%%              elsewhere.  It's not nearly as interesting, though, as the PDF
%%              file it creates, so I'd recommend reading that before diving
%%              into this file.  You should be able to find the PDF file in
%%              the SVN distribution together with this file and a copy of
%%              the GPL-3.0 license this file is distributed under.  If not,
%%              just type 'make' in the doc directory and it (should) build
%%              without a problem.
%%
%%
%% Creator:     Dan Gisselquist
%%              Gisselquist Technology, LLC
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Copyright (C) 2015, Gisselquist Technology, LLC
%%
%% This program is free software (firmware): you can redistribute it and/or
%% modify it under the terms of  the GNU General Public License as published
%% by the Free Software Foundation, either version 3 of the License, or (at
%% your option) any later version.
%%
%% This program is distributed in the hope that it will be useful, but WITHOUT
%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
%% for more details.
%%
%% You should have received a copy of the GNU General Public License along
%% with this program.  (It's in the $(ROOT)/doc directory, run make with no
%% target there if the PDF file isn't present.)  If not, see
%% <http://www.gnu.org/licenses/> for a copy.
%%
%% License:     GPL, v3, as defined and found on www.gnu.org,
%%              http://www.gnu.org/licenses/gpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{gqtekspec}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.2}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC

This project is free software (firmware): you can redistribute it and/or
modify it under the terms of  the GNU General Public License as published
by the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program.  If not, see \hbox{<http://www.gnu.org/licenses/>} for a
copy.
\end{license}
\begin{revisionhistory}
0.2 & 8/19/2015 & Gisselquist & Still Draft, more complete \\\hline
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
\listoffigures
\listoftables
\begin{preface}
Many people have asked me why I am building the Zip CPU. ARM processors are
good and effective. Xilinx makes and markets Microblaze, Altera Nios, and both
have better toolsets than the Zip CPU will ever have. OpenRISC is also
available, and RISC-V may be replacing it. Why build a new processor?

The easiest, most obvious answer is the simple one: Because I can.

There's more to it, though. There's a lot that I would like to do with a
processor, and I want to be able to do it in a vendor-independent fashion.
I would like to be able to generate Verilog code that can run equivalently
on both Xilinx and Altera chips, and that can be easily ported from one
manufacturer's chipsets to another. Even more, before purchasing a chip or a
board, I would like to know that my chip works. I would like to build a test
bench to test components with, and Verilator is my chosen test bench. This
forces me to use all Verilog, and it prevents me from using any proprietary
cores. For this reason, Microblaze and Nios are out of the question.

Why not OpenRISC? That's a hard question. The OpenRISC team has done some
wonderful work on an amazing processor, and I'll have to admit that I am
envious of what they've accomplished. I would like to port binutils to the
Zip CPU, as I would like to port GCC and GDB. They are way ahead of me. The
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has
a lot of features of modern CPUs within it that ... well, let's just say it's
not the little guy on the block. The Zip CPU is lighter weight, costing only
about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic
peripherals.

My final reason is that I'm building the Zip CPU as a learning experience. The
Zip CPU has allowed me to learn a lot about how CPUs work on a very micro
level. For the first time, I am beginning to understand many of the Computer
Architecture lessons from years ago.

To summarize: Because I can, because it is open source, because it is
lightweight, and as an exercise in learning.

\end{preface}

\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}


The original goal of the Zip CPU was to be a very simple CPU.  You might
think of it as a poor man's alternative to the OpenRISC architecture.
For this reason, all instructions have been designed to be as simple as
possible, and all are designed to execute in one clock cycle per
instruction, barring pipeline stalls.  Indeed, even the bus has been simplified
to a constant 32-bit width, with no option for more or less.  This has
resulted in the choice to drop push and pop instructions, pre-increment and
post-decrement addressing modes, and more.

For those who like buzz words, the Zip CPU is:
\begin{itemize}
\item A 32-bit CPU: All registers are 32 bits wide, addresses are 32 bits,
                instructions are 32 bits wide, etc.
\item A RISC CPU.  There is no microcode for executing instructions.  All
        instructions are designed to be completed in one clock cycle.
\item A Load/Store architecture.  (Only load and store instructions
                can access memory.)
\item Wishbone compliant.  All peripherals are accessed just like
                memory across this bus.
\item A Von Neumann architecture.  (The instructions and data share a
                common bus.)
\item A pipelined architecture, having stages for {\bf Prefetch},
                {\bf Decode}, {\bf Read-Operand}, the {\bf ALU/Memory}
                unit, and {\bf Write-back}.  See Fig.~\ref{fig:cpu}
                for a diagram of this structure.
\begin{figure}\begin{center}
\includegraphics[width=3.5in]{../gfx/cpu.eps}
\caption{Zip CPU internal pipeline architecture}\label{fig:cpu}
\end{center}\end{figure}
\item Completely open source, licensed under the GPL.\footnote{Should you
        need a copy of the Zip CPU licensed under other terms, please
        contact me.}
\end{itemize}

Now that I've worked on the Zip CPU for a while, however, it is not nearly
as simple as I originally hoped.  Worse, I've had to adjust to create
capabilities that I was never expecting to need.  These include:
\begin{itemize}
\item {\bf External Debug:} Once placed upon an FPGA, some external means is
        still necessary to debug this CPU.  That means that there needs to be
        an external register that can control the CPU: reset it, halt it, step
        it, and tell whether it is running or not.  My chosen interface
        includes a second register similar to this control register.  This
        second register allows the external controller or debugger to examine
        registers internal to the CPU.

\item {\bf Internal Debug:} Being able to run a debugger from within
        a user process requires an ability to step a user process from
        within a debugger.  It also requires a break instruction that can
        be substituted for any other instruction, and substituted back.
        The break is actually difficult: the break instruction cannot be
        allowed to execute.  That way, upon a break, the debugger should
        be able to jump back into the user process to step the instruction
        that would've been at the break point initially, and then to
        replace the break after passing it.

        Incidentally, this break messes with the prefetch cache and the
        pipeline: if you change an instruction partially through the pipeline,
        the whole pipeline needs to be cleansed.  Likewise, if you change
        an instruction in memory, you need to make sure the cache is reloaded
        with the new instruction.

\item {\bf Prefetch Cache:} My original implementation had a very
        simple prefetch stage.  Any time the PC changed, the prefetch would go
        and fetch the new instruction.  While this was perhaps the simplest
        approach, it cost roughly five clocks for every instruction.  This
        was deemed unacceptable, as I wanted a CPU that could execute
        instructions in one cycle.  I therefore have a prefetch cache that
        issues pipelined wishbone accesses to memory and then pushes
        instructions at the CPU.  Sadly, this accounts for about 20\% of the
        logic in the entire CPU, or 15\% of the logic in the entire system.

\item {\bf Operating System:} In order to support an operating system,
        interrupts and so forth, the CPU needs to support supervisor and
        user modes, as well as a means of switching between them.  For example,
        the user needs a means of executing a system call.  This is the
        purpose of the {\bf `trap'} instruction.  This instruction needs to
        place the CPU into supervisor mode (here equivalent to disabling
        interrupts), as well as hand it a parameter, such as one identifying
        which O/S function was called.

My initial approach to building a trap instruction was to create an external
peripheral which, when written to, would generate an interrupt and could
return the last value written to it.  In practice, this approach didn't work
at all: the CPU executed two instructions while waiting for the
trap interrupt to take place.  Since then, I've decided to keep the rest of
the CC register for that purpose so that a write to the CC register, with the
GIE bit cleared, could be used to execute a trap.  This has other problems,
though, primarily in the limitation of the uses of the CC register.  In
particular, the CC register is the best place to put CPU state information and
to ``announce'' special CPU features (floating point, etc).  So the trap
instruction still switches to interrupt mode, but the CC register is not
nearly as useful for telling the supervisor mode processor what trap is being
executed.

Modern timesharing systems also depend upon a {\bf Timer} interrupt
to handle task swapping.  For the Zip CPU, this interrupt is handled
external to the CPU as part of the CPU System, found in {\tt zipsystem.v}.
The timer module itself is found in {\tt ziptimer.v}.

\item {\bf Pipeline Stalls:} My original plan was to not support pipeline
        stalls at all, but rather to require the compiler to properly schedule
        all instructions so that stalls would never be necessary.  After trying
        to build such an architecture, I gave up, having learned some things:

        For example, in order to facilitate interrupt handling and debug
        stepping, the CPU needs to know which instructions have finished, and
        which have not.  In other words, it needs to know where it can restart
        the pipeline from.  Once restarted, it must act as though it had
        never stopped.  This killed my idea of delayed branching, since what
        would be the appropriate program counter to restart at?  The one the
        CPU was going to branch to, or the ones in the delay slots?  This
        also makes the idea of compressed instruction codes difficult, since,
        again, where do you restart on interrupt?

        So I switched to a model of discrete execution: Once an instruction
        enters into either the ALU or memory unit, the instruction is
        guaranteed to complete.  If the logic recognizes a branch or a
        condition that would render the instruction entering into this stage
        possibly inappropriate (e.g., a conditional branch preceding a store
        instruction), then the pipeline stalls for one cycle
        until the conditional branch completes.  Then, if it generates a new
        PC address, the stages preceding are all wiped clean.

        The discrete execution model allows such things as sleeping: if the
        CPU is put to ``sleep,'' the ALU and memory stages stall and back up
        everything before them.  Likewise, anything that has entered the ALU
        or memory stage when the CPU is placed to sleep continues to completion.
        To handle this logic, each pipeline stage has three control signals:
        a valid signal, a stall signal, and a clock enable signal.  In
        general, a stage stalls if its contents are valid and the next step
        is stalled.  This allows the pipeline to fill any time a later stage
        stalls.

        This approach is also different from other pipeline approaches.  Instead
        of keeping the entire pipeline filled, each stage is treated
        independently.  Therefore, individual stages may move forward as long
        as the subsequent stage is available, regardless of whether the stage
        behind it is filled.

\item {\bf Verilog Modules:} When examining how other processors worked
        here on OpenCores, many of them had one separate module per pipeline
        stage.  While this appeared to me to be a fascinating and commendable
        idea, my own implementation didn't work out quite so nicely.

        As an example, the decode module produces a {\em lot} of
        control wires and registers.  Creating a module out of this, with
        only the simplest of logic within it, seemed to be more a lesson
        in passing wires around, rather than encapsulating logic.

        Another example was the register writeback section.  I would love
        this section to be a module in its own right, and many have made them
        such.  However, other modules depend upon writeback results other
        than just what's placed in the register (i.e., the control wires).
        For these reasons, I didn't manage to fit this section into its
        own module.

        The result is that the majority of the CPU code can be found in
        the {\tt zipcpu.v} file.
\end{itemize}
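
The stall rule described under {\bf Pipeline Stalls} above (a stage stalls if
its contents are valid and the next step is stalled) can be sketched in a few
lines of Python.  This is only a behavioural illustration, not the Verilog
found in {\tt zipcpu.v}.  The function names, the {\tt exit\_blocked} input,
and the choice to model clock enable as ``not stalled'' are all assumptions
made for the sake of the example.

```python
def stall_signals(valid, exit_blocked):
    """Compute each stage's stall flag, working from the last stage back.

    valid[i] is True when stage i holds an instruction.  exit_blocked is a
    hypothetical input: True when the last stage cannot retire its contents.
    """
    stalls = [False] * len(valid)
    next_stalled = exit_blocked
    for i in reversed(range(len(valid))):
        # A stage stalls if its contents are valid and the next step is stalled.
        stalls[i] = valid[i] and next_stalled
        next_stalled = stalls[i]
    return stalls


def step(valid, exit_blocked, fetch_new=True):
    """Advance one clock and return the new valid flags."""
    stalls = stall_signals(valid, exit_blocked)
    new_valid = []
    for i in range(len(valid)):
        if stalls[i]:
            new_valid.append(True)          # assumed clock enable low: hold
        elif i == 0:
            new_valid.append(fetch_new)     # first stage takes a new fetch
        else:
            new_valid.append(valid[i - 1])  # take whatever sits behind
    return new_valid


# Stage order: prefetch, decode, read-operand, ALU/memory, write-back.
# The empty read-operand slot absorbs the stall, so earlier stages still move.
before = [True, True, False, True, True]
print(stall_signals(before, exit_blocked=True))
print(step(before, exit_blocked=True))
```

In this model an empty stage never stalls, which is what lets the pipeline
fill behind a stalled stage instead of freezing as a whole.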

With that introduction out of the way, let's move on to the instruction
set.

\chapter{CPU Architecture}\label{chap:arch}

The Zip CPU supports a set of two-operand instructions, where the second operand
(always a register) is the result.  The only exception is the store instruction,
where the first operand (always a register) is the source of the data to be
stored.

\section{Simplified Bus}
The bus architecture of the Zip CPU is that of a simplified WISHBONE bus.
It has been simplified in this fashion: all operations are 32-bit operations.
Because every access moves a full 32-bit word, the bus is neither little
endian nor big endian.  All words are 32 bits, and all instructions are also
32 bits wide.  Everything has been built around the 32-bit word.

\section{Register Set}
The Zip CPU supports two sets of sixteen 32-bit registers, a supervisor
and a user set as shown in Fig.~\ref{fig:regset}.
\begin{figure}\begin{center}
\includegraphics[width=3.5in]{../gfx/regset.eps}
\caption{Zip CPU Register File}\label{fig:regset}
\end{center}\end{figure}
The supervisor set is used in interrupt mode when interrupts are disabled,
whereas the user set is used otherwise.  Of this register set, the Program
Counter (PC) is register 15, whereas the status register (SR) or condition
code register
(CC) is register 14.  By convention, the stack pointer will be register 13 and
noted as (SP), although there is nothing special about this register other
than this convention.
The CPU can access both register sets via move instructions from the
supervisor state, whereas the user state can only access the user registers.

The status register is special, and bears further mention.  The lower
10 bits of the status register form a set of CPU state and condition codes.
Writes to other bits of this register are preserved.
 
Of these ten bits, the bottom four are the current flags:
                Zero (Z),
                Carry (C),
                Negative (N),
                and Overflow (V).

The next bit is a clock enable (0 to enable) or sleep bit (1 to put
        the CPU to sleep).  Setting this bit will cause the CPU to
        wait for an interrupt (if interrupts are enabled), or to
        completely halt (if interrupts are disabled).

The sixth bit is a global interrupt enable bit (GIE).  When this
        sixth bit is a '1', interrupts will be enabled, else disabled.  When
        interrupts are disabled, the CPU will be in supervisor mode, otherwise
        it is in user mode.  Thus, to execute a context switch, one only
        need enable or disable interrupts.  (When an interrupt line goes
        high, interrupts will automatically be disabled, as the CPU goes
        and deals with its context switch.)

The seventh bit is a step bit.  This bit can be
        set from supervisor mode only.  After setting this bit, should
        the supervisor mode process switch to user mode, it would then
        accomplish one instruction in user mode before returning to supervisor
        mode.  Then, upon return to supervisor mode, this bit will
        be automatically cleared.  This bit has no effect on the CPU while in
        supervisor mode.

        This functionality was added so that a userspace debugger can
        step a user process, working through supervisor mode
        of course.


The eighth bit is a break enable bit.  This controls whether a break
instruction in user mode will halt the processor for an external debugger
(break enabled), or whether the break instruction will simply send the
CPU into interrupt mode.  Encountering a break in supervisor mode will
halt the CPU independent of the break enable bit.  This bit can only be set
within supervisor mode.

This functionality was added to enable an external debugger to
        set and manage breakpoints.

The ninth bit is reserved for a floating point enable bit.  When set, the
arithmetic for the next instruction will be sent to a floating point unit.
Such a unit may later be added as an extension to the Zip CPU.  If the
CPU does not support floating point instructions, this bit will never be set.
The instruction set could also be simply extended to allow other data types
in this fashion, such as two by 16-bit vector operations or four by 8-bit
vector operations.

The tenth bit is a trap bit.  It is set whenever the user requests a soft
interrupt, and cleared on any return to userspace command.  This allows the
supervisor to determine whether it got to supervisor mode from a trap, from
an external interrupt, or both.

These status register bits are summarized in Tbl.~\ref{tbl:ccbits}.
\begin{table}
\begin{center}
\begin{tabular}{l|l}
Bit & Meaning \\\hline
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
8 & (Reserved for) Floating point enable \\\hline
7 & Halt on break, to support an external debugger \\\hline
6 & Step, single step the CPU in user mode\\\hline
5 & GIE, or Global Interrupt Enable \\\hline
4 & Sleep \\\hline
3 & V, or overflow bit.\\\hline
2 & N, or negative bit.\\\hline
1 & C, or carry bit.\\\hline
0 & Z, or zero bit.\\\hline
\end{tabular}
\caption{Condition Code / Status Register Bits}\label{tbl:ccbits}
\end{center}\end{table}
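
As a minimal sketch, the bit layout above can be unpacked in Python as shown
here.  The field names and the helper functions are illustrative only and do
not come from the Zip CPU sources.  Z is placed in bit 0 on the assumption
that the four flags occupy bits 0 through 3 in the order Z, C, N, V given in
the text.

```python
# Bit positions follow the status register description; names are illustrative.
CC_BITS = {
    "Z": 0,             # zero flag
    "C": 1,             # carry flag
    "N": 2,             # negative flag
    "V": 3,             # overflow flag
    "SLEEP": 4,         # 1 puts the CPU to sleep
    "GIE": 5,           # global interrupt enable (1 means user mode)
    "STEP": 6,          # single step the CPU in user mode
    "BREAK_ENABLE": 7,  # halt on break for an external debugger
    "FP_ENABLE": 8,     # reserved for floating point enable
    "TRAP": 9,          # soft trap requested from user mode
}


def decode_cc(value):
    """Return a dict mapping each field name to True or False."""
    return {name: bool((value >> bit) & 1) for name, bit in CC_BITS.items()}


def in_supervisor_mode(value):
    """Supervisor mode is the state in which interrupts are disabled."""
    return not decode_cc(value)["GIE"]


flags = decode_cc(0x21)  # Z and GIE set
print(flags["Z"], flags["GIE"], in_supervisor_mode(0x21))
```

Keeping the layout in one table makes it easy to adjust if a later revision
of the specification moves or adds bits.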

\section{Conditional Instructions}
Most, although not quite all, instructions are conditionally executed.  From
the four condition code flags, eight conditions are defined.  These are shown
in Tbl.~\ref{tbl:conditions}.
\begin{table}
\begin{center}
\begin{tabular}{l|l|l}
Code & Mnemonic & Condition \\\hline
3'h0 & None & Always execute the instruction \\
3'h1 & {\tt .Z} & Only execute when 'Z' is set \\
3'h2 & {\tt .NE} & Only execute when 'Z' is not set \\
3'h3 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\
3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\
3'h5 & {\tt .LT} & Less than ('N' set) \\
3'h6 & {\tt .C} & Carry set\\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\end{center}
\end{table}
There is no condition code for less than or equal, not C, or not V.  Sorry,
I ran out of space in 3 bits.  Using these conditions will take an extra
instruction.  (Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
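
The condition table can be restated as a small Python function.  This is a
sketch of the table only, under the assumption that the conditions depend on
the flags exactly as the table words them (for instance, {\tt .GE} looks at N
alone).  The function name and argument order are made up for illustration.

```python
def condition_met(code, z, c, n, v):
    """Return True when the 3-bit condition code allows execution.

    z, c, n and v are the current Zero, Carry, Negative and Overflow flags.
    """
    checks = {
        0b000: True,               # no suffix: always execute
        0b001: z,                  # .Z
        0b010: not z,              # .NE
        0b011: not n,              # .GE ('Z' irrelevant)
        0b100: (not n) and (not z),  # .GT
        0b101: n,                  # .LT
        0b110: c,                  # .C
        0b111: v,                  # .V
    }
    return bool(checks[code & 0x7])


# With N clear and Z set (an "equal" result), .GE passes but .GT does not.
print(condition_met(0b011, z=True, c=False, n=False, v=False))
print(condition_met(0b100, z=True, c=False, n=False, v=False))
```

Since all eight codes are taken, a test such as ``less than or equal'' has no
code of its own, which is why it costs an extra instruction.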

\section{Operand B}
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
This Operand B is either equal to a register plus a signed immediate offset,
or a signed immediate by itself.  This value is encoded as shown in
Tbl.~\ref{tbl:opb}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Bit 20 & 19 \ldots 16 & 15 \ldots 0 \\\hline
1'b0 & \multicolumn{2}{l|}{20-bit Signed Immediate value} \\\hline
1'b1 & 4-bit Register & 16-bit Signed immediate offset \\\hline
\end{tabular}
\caption{Bit allocation for Operand B}\label{tbl:opb}
\end{center}\end{table}

Sixteen and twenty bit immediates don't make sense for all instructions.  For
example, what is the point of a 20-bit immediate when executing a 16-bit
multiply?  Likewise, why have a 16-bit immediate when adding to a logical
or arithmetic shift?  In these cases, the extra bits are reserved for future
instruction possibilities.
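
To make the Operand B encoding concrete, here is a short Python sketch that
splits the 21-bit field into its parts.  The function names are hypothetical,
and the 32-bit wraparound in {\tt operand\_b\_value} is an assumption based on
the 32-bit register width rather than something stated in this section.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_operand_b(field):
    """Split a 21-bit Operand B into (register or None, signed immediate)."""
    field &= 0x1FFFFF
    if field & (1 << 20):
        # Bit 20 set: 4-bit register plus 16-bit signed offset.
        reg = (field >> 16) & 0xF
        return reg, sign_extend(field & 0xFFFF, 16)
    # Bit 20 clear: 20-bit signed immediate with no register.
    return None, sign_extend(field & 0xFFFFF, 20)


def operand_b_value(field, regs):
    """Evaluate Operand B against a list of sixteen register values."""
    reg, imm = decode_operand_b(field)
    base = regs[reg] if reg is not None else 0
    return (base + imm) & 0xFFFFFFFF  # assumed 32-bit wraparound


regs = [0] * 16
regs[13] = 0x100
field = (1 << 20) | (13 << 16) | 0xFFFC  # register 13 with an offset of -4
print(decode_operand_b(field), hex(operand_b_value(field, regs)))
```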

\section{Address Modes}
The Zip CPU supports two addressing modes: register plus immediate, and
immediate address.  Addresses are therefore encoded in the same fashion as
Operand B's, shown above.

A lot of long hard thought was put into whether to allow pre/post increment
and decrement addressing modes.  Since I found no way to use these operators
without taking two or more clocks per instruction, these addressing modes have
been removed from the realm of possibilities.  This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.
Each of these instructions can be emulated with a set of instructions from the
existing set.

\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
the CPU needs access to non-supervisory registers while in supervisory mode.
Therefore, the MOV instruction is special and offers access to these registers
\ldots when in supervisory mode.  To keep the compiler simple, the extra bits
are ignored in non-supervisory mode (as though they didn't exist), rather than
being mapped to new instructions or additional capabilities.  The bits
indicating which register set each register lies within are the A-Usr and
B-Usr bits.  When set to a one, these refer to a user mode register.  When set
to a zero, these refer to a register in the current mode, whether user or
supervisor.  Further, because a load immediate instruction exists, there is no
move capability between an immediate and a register: all moves come from either
a register or a register plus an offset.

This actually leads to a bit of a problem: since the MOV instruction encodes
which register set each register is coming from or moving to, how shall a
compiler or assembler know how to compile a MOV instruction without knowing
the mode of the CPU at the time?  For this reason, the compiler will assume
all MOV registers are supervisor registers, and display them as normal.
Anything with the user bit set will be treated as a user register.  The CPU
will quietly ignore the supervisor bits while in user mode, and anything
marked as a user register will always be valid.  (Did I just say that in the
last paragraph?)

\section{Multiply Operations}
The Zip CPU supports two Multiply operations, a
16x16 bit signed multiply (MPYS) and the same but unsigned (MPYU).  In both
cases, the operand is a register plus a 16-bit immediate, subject to the
rule that the register cannot be the PC or CC registers.  The PC register
field has been stolen to create a multiply by immediate instruction.  The
CC register field is reserved.
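
The difference between the two multiplies can be illustrated with a short
Python sketch.  It assumes that each multiply takes the low 16 bits of its two
inputs and produces a 32-bit result.  The text above does not say which 16
bits are used, so treat that choice, and the function names, as assumptions.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mpyu(a, b):
    """Unsigned 16x16 multiply giving a 32-bit result (assumed low halves)."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF


def mpys(a, b):
    """Signed 16x16 multiply giving a 32-bit two's complement result."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF


# 0xFFFF is 65535 when unsigned but -1 when signed, so the results differ.
print(hex(mpyu(0xFFFF, 2)), hex(mpys(0xFFFF, 2)))
```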

\section{Floating Point}
The Zip CPU does not support floating point operations today.  However, the
instruction set reserves a capability for a floating point operation.  To
execute such an operation, simply set the floating point bit in the CC
register and the following instruction will treat its registers as
floating point operands.  Not all instructions make sense as floating point
operations, so only the CMP, SUB, ADD, and MPY
instructions may be issued as floating point instructions.  Further, the
immediate fields do not apply in floating
point mode, and must be set to zero.  Other instructions
allow the examining of the floating point bit in the CC register.  In all
cases, the floating point bit is cleared one instruction after it is set.

The architecture does not support a floating point not-implemented interrupt.
Any soft floating point emulation must be done deliberately.

\section{Native Instructions}
The instruction set for the Zip CPU is summarized in
Tbl.~\ref{tbl:zip-instructions}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|c|}\hline
Op Code & \multicolumn{8}{c|}{31\ldots24} & \multicolumn{8}{c|}{23\ldots 16}
        & \multicolumn{8}{c|}{15\ldots 8} & \multicolumn{8}{c|}{7\ldots 0}
        & Sets CC? \\\hline
CMP(Sub) & \multicolumn{4}{l|}{4'h0}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B}
                & Yes \\\hline
TST(And) & \multicolumn{4}{l|}{4'h1}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
MOV & \multicolumn{4}{l|}{4'h2}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & A-Usr
                & \multicolumn{4}{l|}{B-Reg}
                & B-Usr
                & \multicolumn{15}{l|}{15-bit signed offset}
                & \\\hline
LODI & \multicolumn{4}{l|}{4'h3}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{24}{l|}{24-bit Signed Immediate}
                & \\\hline
NOOP & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{24'h00}
                & \\\hline
BREAK & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{24'h01}
                & \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{24 bits, but not 0 or 1.}
                & \\\hline
LODIHI & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{16}{l|}{16-bit Immediate}
                & \\\hline
LODILO & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{16}{l|}{16-bit Immediate}
                & \\\hline
16-b MPYU & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0 & \multicolumn{4}{l|}{Reg}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYU(I) & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0 & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYS & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1 & \multicolumn{4}{l|}{Reg}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYS(I) & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1 & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
ROL & \multicolumn{4}{l|}{4'h5}
583
                & \multicolumn{4}{l|}{R. Reg}
584
                & \multicolumn{3}{l|}{Cond.}
585
                & \multicolumn{21}{l|}{Operand B, truncated to low order 5 bits}
586
                & \\\hline
LOD & \multicolumn{4}{l|}{4'h6}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B address}
                & \\\hline
STO & \multicolumn{4}{l|}{4'h7}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B address}
                & \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h8}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        & 1'b0
        &       \multicolumn{20}{l|}{Reserved}
        & Yes \\\hline
SUB & \multicolumn{4}{l|}{4'h8}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        & 1'b1
        &       \multicolumn{4}{l|}{Reg}
        &       \multicolumn{16}{l|}{16'bit signed offset}
        & Yes \\\hline
AND & \multicolumn{4}{l|}{4'h9}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
ADD & \multicolumn{4}{l|}{4'ha}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
OR & \multicolumn{4}{l|}{4'hb}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
XOR & \multicolumn{4}{l|}{4'hc}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
LSL/ASL & \multicolumn{4}{l|}{4'hd}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
ASR & \multicolumn{4}{l|}{4'he}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
LSR & \multicolumn{4}{l|}{4'hf}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
\end{tabular}
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
\end{center}\end{table}

As you can see, there's lots of room for instruction set expansion.  The
NOOP and BREAK instructions are the only instructions within one particular
24--bit hole.  Likewise, the subtract leaves half of its space open, since a
subtract immediate is the same as an add with a negated immediate.  These
spaces are reserved for future enhancements.
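
To put rough numbers on this, the following count is only a sketch, and it
assumes the field widths shown in Tbl.~\ref{tbl:zip-instructions} are exact.
The hole shared by NOOP and BREAK has a 24--bit payload with two values taken,
while the reserved half of the subtract opcode leaves the register, the
condition, and the remaining 20~bits free:
\[
N_{\mbox{\scriptsize hole}} = 2^{24} - 2 = 16{,}777{,}214,\qquad
N_{\mbox{\scriptsize sub}} = 2^{4+3+20} = 2^{27} = 134{,}217{,}728.
\]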

\section{Derived Instructions}
The ZIP CPU supports many other common instructions, but not all of them
are single cycle instructions.  The derived instruction tables,
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
help to capture some of how these other instructions may be implemented on
the ZIP CPU.  Many of these instructions will have assembly equivalents,
such as the branch instructions, to facilitate working with the CPU.
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual  & Notes \\\hline
\parbox[t]{1.4in}{ADD Ra,Rx\\ADDC Rb,Ry}
        & \parbox[t]{1.5in}{Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry}
        & Add with carry \\\hline
BRA.Cond +/-\$Addr
        & \hbox{Mov.cond \$Addr+PC,PC}
        & Branch or jump on condition.  Works for 15--bit
                signed address offsets.\\\hline
BRA.Cond +/-\$Addr
        & \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC}
        & Branch/jump on condition.  Works for
        23--bit address offsets, but costs a register, an extra instruction,
        and sets the flags. \\\hline
BNC PC+\$Addr
        & \parbox[t]{1.5in}{Test \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
        & Example of a branch on an unsupported
                condition, in this case a branch on not carry \\\hline
BUSY & MOV \$-1(PC),PC & Execute an infinite loop \\\hline
CLRF.NZ Rx
        & XOR.NZ Rx,Rx
        & Clear Rx, and flags, if the Z-bit is not set \\\hline
CLR Rx
        & LDI \$0,Rx
        & Clears Rx, leaves flags untouched.  This instruction cannot be
                conditional. \\\hline
EXCH.W Rx
        & ROL \$16,Rx
        & Exchanges the top and bottom 16--bit words of Rx \\\hline
HALT
        & Or \$SLEEP,CC
        & Executed while in interrupt mode.  In user mode this is simply a
        wait until interrupt instruction. \\\hline
INT & LDI \$0,CC
        & Since we're using the CC register as a trap vector as well, this
        executes TRAP \#0. \\\hline
IRET
        & OR \$GIE,CC
        & Also an RTU instruction (Return to Userspace) \\\hline
JMP R6+\$Addr
        & MOV \$Addr(R6),PC
        & \\\hline
JSR PC+\$Addr
        & \parbox[t]{1.5in}{SUB \$1,SP \\
        MOV \$3+PC,R0 \\
        STO R0,1(SP) \\
        MOV \$Addr+PC,PC \\
        ADD \$1,SP}
        & Jump to Subroutine. Note the required cleanup instruction after
        returning. \\\hline
JSR PC+\$Addr
        & \parbox[t]{1.5in}{MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
        & This is the high speed
        version of a subroutine call, necessitating a register to hold the
        last PC address.  In its favor, this method doesn't suffer the
        mandatory memory access of the other approach. \\\hline
LDI.l \$val,Rx
        & \parbox[t]{1.5in}{LDIHI (\$val$>>$16)\&0x0ffff, Rx \\
                        LDILO (\$val \& 0x0ffff), Rx}
        & Sadly, there's not enough instruction
                space to load a complete immediate value into any register.
                Therefore, fully loading any register takes two cycles.
                The LDIHI (load immediate high) and LDILO (load immediate low)
                instructions have been created to facilitate this. \\\hline
\end{tabular}
\caption{Derived Instructions}\label{tbl:derived-1}
\end{center}\end{table}
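
As a worked illustration of the LDI.l row in Tbl.~\ref{tbl:derived-1}, the two
immediates are simply the upper and lower halves of the 32--bit value.  The
value {\tt 0x12345678} used here is an arbitrary example, not one taken from
the toolchain:
\[
\mbox{\tt val} = 2^{16}\cdot\mbox{\tt hi} + \mbox{\tt lo},\qquad
\mbox{\tt hi} = \left\lfloor \mbox{\tt val}/2^{16} \right\rfloor = \mbox{\tt 0x1234},\qquad
\mbox{\tt lo} = \mbox{\tt val} \bmod 2^{16} = \mbox{\tt 0x5678}.
\]
Likewise, assuming the 15--bit signed offset of the short branch form is a
two's complement value, it would reach offsets from $-2^{14}=-16384$ to
$2^{14}-1=16383$ words.
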
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual  & Notes \\\hline
LOD.b \$addr,Rx
        & \parbox[t]{1.5in}{%
        LDI     \$addr,Ra \\
        LDI     \$addr,Rb \\
        LSR     \$2,Ra \\
        AND     \$3,Rb \\
        LOD     (Ra),Rx \\
        LSL     \$3,Rb \\
        SUB     \$32,Rb \\
        ROL     Rb,Rx \\
        AND \$0ffh,Rx}
        & \parbox[t]{3in}{This CPU is designed for 32--bit word
        length instructions.  Byte addressing is not supported by the CPU or
        the bus, so it therefore takes more work to do.

        Note also that in this example, \$addr is a byte-wise address, where
        all other addresses in this document are 32-bit wordlength addresses.
        For this reason,
        we needed to drop the bottom two bits.  This also limits the address
        space of character accesses using this method from 16~MB down to 4~MB.}
                \\\hline
\parbox[t]{1.5in}{LSL \$1,Rx\\ LSLC \$1,Ry}
        & \parbox[t]{1.5in}{LSL \$1,Ry \\
        LSL \$1,Rx \\
        OR.C \$1,Ry}
        & Logical shift left with carry.  Note that the
        instruction order is now backwards, to keep the conditions valid.
        That is, LSL sets the carry flag, so if we did this the other way
        with Rx before Ry, then the condition flag wouldn't have been right
        for an OR correction at the end. \\\hline
\parbox[t]{1.5in}{LSR \$1,Rx \\ LSRC \$1,Ry}
        & \parbox[t]{1.5in}{CLR Rz \\
        LSR \$1,Ry \\
        LDIHI.C \$8000h,Rz \\
        LSR \$1,Rx \\
        OR Rz,Rx}
        & Logical shift right with carry \\\hline
NEG Rx & \parbox[t]{1.5in}{XOR \$-1,Rx \\ ADD \$1,Rx} & \\\hline
NOOP & NOOP & While there are many
        operations that do nothing, such as MOV Rx,Rx, or OR \$0,Rx, these
        operations have consequences in that they might stall the bus if
        Rx isn't ready yet.  For this reason, we have a dedicated NOOP
        instruction. \\\hline
NOT Rx & XOR \$-1,Rx & \\\hline
POP Rx
        & \parbox[t]{1.5in}{LOD \$-1(SP),Rx \\ ADD \$1,SP}
        & Note
        that for interrupt purposes, one can never depend upon the value at
        (SP).  Hence you read from it, then increment it, lest having
        incremented it first something then comes along and writes to that
        value before you can read the result. \\\hline
PUSH Rx
        & \parbox[t]{1.5in}{SUB \$1,SP \\
        STO Rx,\$1(SP)}
        & \\\hline
RESET
        & \parbox[t]{1in}{STO \$1,\$watchdog(R12)\\NOOP\\NOOP}
        & \parbox[t]{3in}{This depends upon the peripheral base address being
        in R12.

        Another opportunity might be to jump to the reset address from within
        supervisor mode.}\\\hline
RET & \parbox[t]{1.5in}{LOD \$-1(SP),PC}
        & Note that this depends upon the calling context to clean up the
        stack, as outlined for the JSR instruction.  \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-2}
\end{center}\end{table}
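
The address arithmetic behind the LOD.b row of Tbl.~\ref{tbl:derived-2} can be
written out as follows.  This is only a sketch: the byte address
$a=\mbox{\tt 0x1237}$ is a made--up example, and which byte of the word the
index $b$ selects depends on the byte ordering chosen.
\[
a_{\mbox{\scriptsize word}} = \lfloor a/4 \rfloor = \mbox{\tt 0x48d},\qquad
b = a \bmod 4 = 3,\qquad
8b = 24\mbox{ bits}.
\]
Since two of the address bits select the byte within a word, a byte address of
a given width spans one quarter as many words as a word address of the same
width.  This is the factor of four between the 16~MB and 4~MB figures quoted
in the table.
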
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual  & Notes \\\hline
RET & MOV R12,PC
        & This is the high(er) speed version, that doesn't touch the stack.
        As such, it doesn't suffer a stall on memory read/write to the stack.
        \\\hline
STEP Rr,Rt
        & \parbox[t]{1.5in}{LSR \$1,Rr \\ XOR.C Rt,Rr}
        & Step a Galois implementation of a Linear Feedback Shift Register, Rr,
                using taps Rt \\\hline
STO.b Rx,\$addr
        & \parbox[t]{1.5in}{%
        LDI \$addr,Ra \\
        LDI \$addr,Rb \\
        LSR \$2,Ra \\
        AND \$3,Rb \\
        SUB \$32,Rb \\
        LOD (Ra),Ry \\
        AND \$0ffh,Rx \\
        AND \$-0ffh,Ry \\
        ROL Rb,Rx \\
        OR Rx,Ry \\
        STO Ry,(Ra) }
        & \parbox[t]{3in}{This CPU and its bus are {\em not} optimized
        for byte-wise operations.

        Note that in this example, \$addr is a
        byte-wise address, whereas in all of our other examples it is a
        32-bit word address. This also limits the address space
        of character accesses from 16~MB down to 4~MB.
        Further, this instruction implies a byte ordering,
        such as big or little endian.} \\\hline
SWAP Rx,Ry
        & \parbox[t]{1.5in}{
        XOR Ry,Rx \\
        XOR Rx,Ry \\
        XOR Ry,Rx}
        & While no extra registers are needed, this example
        does take three clocks. \\\hline
TRAP \#X
        & LDILO \$x,CC
        & This approach uses the unused bits of the CC register as a TRAP
        address.  The user will need to make certain
        that the SLEEP and GIE bits are not set in \$x.  LDI would also work,
        however using LDILO permits the use of conditional traps.  (i.e.,
        trap if the zero flag is set.)  Should you wish to trap off of a
        register value, you could equivalently load \$x into the register and
        then MOV it into the CC register. \\\hline
TST Rx
        & TST \$-1,Rx
        & Set the condition codes based upon Rx.  Could also do a CMP \$0,Rx,
        ADD \$0,Rx, SUB \$0,Rx, AND \$-1,Rx, etc.  The TST and CMP
        approaches won't stall future pipeline stages looking for the value
        of Rx. \\\hline
WAIT
        & Or \$SLEEP,CC
        & Wait 'til interrupt.  In an interrupts disabled context, this
        becomes a HALT instruction. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-3}
\end{center}\end{table}
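
The STEP row of Tbl.~\ref{tbl:derived-3} amounts to the usual Galois update.
Writing $r$ for Rr and $t$ for Rt, and assuming that LSR leaves the bit it
shifts out in the carry flag, one step is
\[
r \leftarrow \lfloor r/2 \rfloor \oplus \left( (r \bmod 2)\cdot t \right).
\]
For instance, with a tap mask $t=\mbox{\tt 0x80200003}$ (chosen here only for
illustration) and $r=\mbox{\tt 0x00000001}$, one step gives
$r=\mbox{\tt 0x80200003}$, and a second step gives
$\mbox{\tt 0x40100001}\oplus\mbox{\tt 0x80200003}=\mbox{\tt 0xc0300002}$.
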
\iffalse
\fi
\section{Pipeline Stages}
\begin{enumerate}
\item {\bf Prefetch}: Read instruction from memory (cache if possible).  This
        stage is actually pipelined itself, and so it will stall if the PC
        ever changes.  Stalls are also created here if the instruction isn't
        in the prefetch cache.
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
        immediate offset.
\item {\bf Read Operands}: Read registers and apply any immediate values to
        them.  There is no means of detecting or flagging arithmetic overflow
        or carry when adding the immediate to the operand.  This stage will
        stall if any source operand is pending.
        A proper optimizing compiler, therefore, will schedule an instruction
        between the instruction that produces the result and the instruction
        that uses it.
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
        instruction, and the {\bf MemOps} stage which accomplishes memory
        read/write.
        \begin{itemize}
        \item Loads stall instructions that access the register until it is
                written to the register set.
        \item Condition codes are available upon completion
        \item Issuing an instruction to the memory while the memory is busy will
                stall the bus.  If the bus deadlocks, only a reset will
                release the CPU.  (Watchdog timer, anyone?)
        \item The Zip CPU currently has no means of reading and acting on any
        error conditions on the bus.
        \end{itemize}
\item {\bf Write-Back}: Conditionally write back the result to register set,
        applying the condition.  This routine is bi-re-entrant: either the
        memory or the simple instruction may request a register write.
\end{enumerate}

The Zip CPU does not support out of order execution.  Therefore, if the memory
unit stalls, every other instruction stalls.  Memory stores, however, can take
place concurrently with ALU operations, although memory loads cannot.

\section{Pipeline Logic}
How the CPU handles some instruction combinations can be telling when
determining what happens in the pipeline.  The following lists some examples:
\begin{itemize}
\item {\bf Delayed Branching}

        I had originally hoped to implement delayed branching.  However, what
        happens in debug mode?
        That is, what happens when a debugger tries to single step an
        instruction?  While I can easily single step the computer in either
        user or supervisor mode externally, this processor does not appear
        able to step the CPU in user mode from within user mode (or even
        from within supervisor mode), such as if a process had a debugger
        attached.  As the processor exists, I would have one result stepping
        the CPU from a debugger, and another stepping it externally.

        This is unacceptable, and so this CPU does not support delayed
        branching.

\item {\bf Register Result:} {\tt MOV R0,R1; MOV R1,R2 }

        What value does
        R2 get, the value of R1 before the first move or the value of R0?
        Placing the value of R0 into R1 requires a pipeline stall, and possibly
        two, as I have the pipeline designed.

        The ZIP CPU architecture requires that R2 must equal R0 at the end of
        this operation.  This may stall the pipeline 1--2 cycles.

\item {\bf Condition Codes Result:} {\tt CMP R0,R1; MOV.EQ \$x,PC}

        At issue is the same item as above, save that the CMP instruction
        updates the flags that the MOV instruction depends
        upon.

        The Zip CPU architecture requires that condition codes must be updated
        and available immediately for the next instruction without stalling the
        pipeline.

\item {\bf Condition Codes Register Result:} {\tt CMP R0,R1; MOV CC,R2}

        At issue is the
        fact that the logic supporting the CC register is more complicated than
        the logic supporting any other register.

        The ZIP CPU will stall 1--2 cycles on this instruction, until the
        CC register is valid.

\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}

        At issue is whether or not the instruction following the jump will
        take place before the jump.  In other words, is the MOV to the PC
        register handled differently from an ADD to the PC register?

        In the Zip architecture, MOVs and ADDs to the PC use the same logic,
        which keeps the logic simple.
\end{itemize}

As I've studied this, I find several approaches to handling pipeline
        issues.  These approaches (and their consequences) are listed below.

\begin{itemize}
\item {\bf All issued instructions complete, stages stall individually}

        What about a slow pre-fetch?

        Nominally, this works well: any issued instruction
        just runs to completion.  If there are four issued instructions in the
        pipeline, with the writeback instruction being a write-to-PC
        instruction, the other three instructions naturally finish.

        This approach fails when reading instructions from the flash,
        since such reads require N clocks to complete.  Thus
        there may be only one instruction in the pipeline if reading from flash,
        or a full pipeline if reading from cache.  Each of these approaches
        would produce a different response.

\item {\bf Issued instructions may be canceled}

        Stages stall individually

        First problem:
        memory operations cannot be canceled.  Even reads may have side effects
        on peripherals that cannot be canceled later.  Further, in the case of
        an interrupt, it's difficult to know what to cancel.  What happens in
        a \hbox{\tt MOV.C \$x,PC} followed by a \hbox{\tt MOV \$y,PC}
        instruction?  Which gets
        canceled?

        Because it isn't clear what would need to be canceled,
        this instruction combination is not recommended.

\item {\bf All issued instructions complete.}

        All stages are filled, or the entire pipeline
        stalls.

        What about debug control?  What about
        register writes taking an extra clock stage?  {\tt MOV R0,R1; MOV R1,R2}
        should place the value of R0 into R2.  How do you restart the pipeline
        after an interrupt?  What address do you use?  The last issued
        instruction?  But the branch delay slots may make that invalid!

        Reading from the CPU debug port in this case yields inconsistent
        results: the CPU will halt or step with instructions stuck in the
        pipeline.  Reading registers will give no indication of what is going
        on in the pipeline, just the results of completed operations, not of
        operations that have been started and not yet completed.
        Perhaps we should just report the state of the CPU based upon what
        instructions (PC values) have successfully completed?  Thus the
        debug instruction is the one that will write registers on the next
        clock.

        Suggestion: Suppose we load extra information in the two
        CC register(s) for debugging intermediate pipeline stages?

        The next problem, though, is how to deal with the read operand
        pipeline stage needing the result from the register pipeline.

\item {\bf Memory instructions must complete}

        All instructions that enter into the memory module {\em must}
        complete.  Issued instructions from the prefetch, decode, or operand
        read stages may or may not complete.  Jumps into code must be valid,
        so that interrupt returns may be valid.  All instructions entering the
        ALU complete.

        This looks to be the simplest approach.
        While the logic may be difficult, this appears to be the only
        re-entrant approach.

        A {\tt new\_pc} flag will be high anytime the PC changes in an
        unpredictable way (i.e., it doesn't increment).  This includes jumps
        as well as interrupts and interrupt returns.  Whenever this flag may
        go high, memory operations and ALU operations will stall until the
        result is known.  When the flag does go high, anything in the prefetch,
        decode, and read-op stage will be invalidated.

\end{itemize}



\chapter{Peripherals}\label{chap:periph}

While the previous chapter describes a CPU in isolation, the Zip System
includes a minimum set of peripherals as well.  These peripherals are shown
in Fig.~\ref{fig:zipsystem}
\begin{figure}\begin{center}
\includegraphics[width=3.5in]{../gfx/system.eps}
\caption{Zip System Peripherals}\label{fig:zipsystem}
\end{center}\end{figure}
and described here.  They are designed to make
the Zip CPU more useful in an Embedded Operating System environment.

\section{Interrupt Controller}

Perhaps the most important peripheral within the Zip System is the interrupt
controller.  While the Zip CPU itself can only handle one interrupt, and has
only the one interrupt state: disabled or enabled, the interrupt controller
can make things more interesting.

The Zip System interrupt controller module supports up to 15 interrupts, all
controlled from one register.  Bit~31 of the interrupt controller controls
overall whether interrupts are enabled (1'b1) or disabled (1'b0).  Bits~16--30
control whether individual interrupts are enabled (1'b1) or disabled (1'b0).
Bit~15 is an indicator showing whether or not any interrupt is active, and
bits~0--14 indicate whether or not an individual interrupt is active.

The interrupt controller has been designed so that bits can be controlled
individually without having any knowledge of the rest of the controller
setting.  To enable an interrupt, write to the register with the high order
global enable bit set and the respective interrupt enable bit set.  No other
bits will be affected.  To disable an interrupt, write to the register with
the high order global enable bit cleared and the respective interrupt enable
bit set.  To clear an interrupt, write a `1' to that interrupt's status bit.
Zeros written to the register have no effect, save that a zero written to the
master enable will disable all interrupts.

As an example, suppose you wished to enable interrupt \#4.  You would then
write to the register a {\tt 0x80100010} to enable interrupt \#4 and to clear
any past active state.  When you later wish to disable this interrupt, you would
write a {\tt 0x00100010} to the register.  As before, this both disables the
interrupt and clears the active indicator.  This also has the side effect of
disabling all interrupts, so a second write of {\tt 0x80000000} may be necessary
to re-enable any other interrupts.
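
The example values can be checked against the bit layout of the controller
register.  Taking $k$ as the interrupt number (a symbol introduced here only
for illustration), the enable write sets the global enable bit, the enable bit
for interrupt $k$, and the status bit for interrupt $k$, while the disable
write leaves the global enable bit clear:
\[
W_{\mbox{\scriptsize enable}}(k) = 2^{31} + 2^{16+k} + 2^{k},\qquad
W_{\mbox{\scriptsize disable}}(k) = 2^{16+k} + 2^{k}.
\]
For $k=4$ this gives $2^{31}+2^{20}+2^{4} = \mbox{\tt 0x80100010}$ and
$2^{20}+2^{4} = \mbox{\tt 0x00100010}$, matching the values quoted in the text.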

The Zip System currently hosts two interrupt controllers, a primary and a
secondary.  The primary interrupt controller has one interrupt line which may
come from an external interrupt controller, and one interrupt line from the
secondary controller.  Other primary interrupts include the system timers,
the jiffies interrupt, and the manual cache interrupt.  The secondary interrupt
controller maintains an interrupt state for all of the processor accounting
counters.

\section{Counter}

The Zip Counter is a very simple counter: it just counts.  It cannot be
halted.  When it rolls over, it issues an interrupt.  Writing a value to the
counter just sets the current value, and it starts counting again from that
value.

Eight counters are implemented in the Zip System for process accounting.
This may change in the future, as nothing as yet uses these counters.

\section{Timer}

The Zip Timer is also very simple: it simply counts down to zero.  When it
transitions from a one to a zero it creates an interrupt.

Writing any non-zero value to the timer starts the timer.  If the high order
bit is set when writing to the timer, the timer becomes an interval timer and
reloads its last start time on any interrupt.  Hence, to mark seconds, one
might set the timer to 100~million (the number of clocks per second), and
set the high bit.  Ever after, the timer will interrupt the CPU once per
second (assuming a 100~MHz clock).  This reload capability also limits the
maximum timer value to $2^{31}-1$, rather than $2^{32}-1$.
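
As a quick check of these numbers, assume a 100~MHz clock and a timer that
decrements once per clock.  The value written to obtain a one second interval
timer, and the longest interval available, would then be
\[
2^{31} + 10^{8} = \mbox{\tt 0x85f5e100},\qquad
\frac{2^{31}-1}{10^{8}\mbox{ Hz}} \approx 21.47\mbox{ s}.
\]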

\section{Watchdog Timer}

The watchdog timer is no different from any of the other timers, save for one
critical difference: the interrupt line from the watchdog
timer is tied to the reset line of the CPU.  Hence writing a `1' to the
watchdog timer will always reset the CPU.
To stop the watchdog timer, write a `0' to it.  To start it,
write any other number to it, as with the other timers.

While the watchdog timer supports interval mode, it doesn't make as much sense
as it did with the other timers.

\section{Jiffies}

This peripheral is motivated by the Linux use of `jiffies' whereby a process
can request to be put to sleep until a certain number of `jiffies' have
elapsed.  Using this interface, the CPU can read the number of `jiffies'
from the peripheral (it only has the one location in address space), add the
sleep length to it, and write the result back to the peripheral.  The zipjiffies
peripheral will record the value written to it only if it is nearer the current
counter value than the interrupt time it is currently waiting on.  If no other
interrupts are waiting, and this time is in the future, it will be enabled.
(There is currently no way to disable a jiffies interrupt once set, other
than to disable the interrupt line in the interrupt controller.)  The processor
may then place this sleep request into a list among other sleep requests.
Once the timer expires, the processor would write the next jiffies request to
the peripheral and wake up the process whose timer had expired.

Indeed, the Jiffies register is nothing more than a glorified counter with
an interrupt.  Unlike the other counters, the Jiffies register cannot be set.
Writes to the Jiffies register create an interrupt time.  When the Jiffies
register later equals the value written to it, an interrupt will be asserted
and the register then continues counting as though no interrupt had taken
place.

The purpose of this register is to support alarm times within a CPU.  To
set an alarm for a particular process $N$ clocks in advance, read the current
Jiffies value, add $N$, and write the sum back to the Jiffies register.  The
O/S must also keep track of values written to the Jiffies register.  Thus,
when an `alarm' trips, it should be removed from the list of alarms, the list
should be sorted, and the next alarm in terms of Jiffies should be written
to the register.
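
Because the Jiffies counter is a free--running 32--bit value, the alarm
arithmetic has to wrap.  One way to express this, offered as a sketch rather
than as a description of the zipjiffies logic, is
\[
A = (J + N) \bmod 2^{32},
\]
where $J$ is the Jiffies value just read and $A$ is the value written back.
An alarm $A_1$ would then be treated as sooner than an alarm $A_2$ when
$(A_1 - J) \bmod 2^{32} < (A_2 - J) \bmod 2^{32}$.  For example, with
$J=\mbox{\tt 0xfffffff0}$ and $N=\mbox{\tt 0x20}$, the value written is
$A=\mbox{\tt 0x00000010}$, which is still 32~counts in the future even though
it is numerically smaller than $J$.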

\section{Manual Cache}

The manual cache is an experimental setting that may not remain with the Zip
CPU for very long.  It is designed to facilitate running from FLASH or ROM
memory, although the pipe cache really makes this need obsolete.  The manual
cache works by copying data from a wishbone address (range) into the cache
register, and then by making that memory available as memory to the Zip System.
It is a {\em manual cache} because the processor must first specify what
memory to copy, and then once copied the processor can only access the cache
memory by the cache memory location.  There is no transparency.  It is perhaps
best described as a combination DMA controller and local memory.

Worse, this cache is likely going to be removed from the ZipSystem.  Having used
the ZipSystem now for some time, I have yet to find a need or use for the manual
cache.  I will likely replace this peripheral with a proper DMA controller.

1178 21 dgisselq
\chapter{Operation}\label{chap:ops}
1179
 
1180
\chapter{Registers}\label{chap:regs}
1181
 
1182 24 dgisselq
The ZipSystem registers fall into two categories, ZipSystem internal registers
1183
accessed via the ZipCPU shown in Tbl.~\ref{tbl:zpregs},
1184
\begin{table}[htbp]
\begin{center}\begin{reglist}
PIC   & {\tt 0xc0000000} & 32 & R/W & Primary Interrupt Controller \\\hline
WDT   & {\tt 0xc0000001} & 32 & R/W & Watchdog Timer \\\hline
CCHE  & {\tt 0xc0000002} & 32 & R/W & Manual Cache Controller \\\hline
CTRIC & {\tt 0xc0000003} & 32 & R/W & Secondary Interrupt Controller \\\hline
TMRA  & {\tt 0xc0000004} & 32 & R/W & Timer A\\\hline
TMRB  & {\tt 0xc0000005} & 32 & R/W & Timer B\\\hline
TMRC  & {\tt 0xc0000006} & 32 & R/W & Timer C\\\hline
JIFF  & {\tt 0xc0000007} & 32 & R/W & Jiffies \\\hline
MTASK & {\tt 0xc0000008} & 32 & R/W & Master Task Clock Counter \\\hline
MMSTL & {\tt 0xc0000009} & 32 & R/W & Master Stall Counter \\\hline
MPSTL & {\tt 0xc000000a} & 32 & R/W & Master Pre--Fetch Stall Counter \\\hline
MICNT & {\tt 0xc000000b} & 32 & R/W & Master Instruction Counter\\\hline
UTASK & {\tt 0xc000000c} & 32 & R/W & User Task Clock Counter \\\hline
UMSTL & {\tt 0xc000000d} & 32 & R/W & User Stall Counter \\\hline
UPSTL & {\tt 0xc000000e} & 32 & R/W & User Pre--Fetch Stall Counter \\\hline
UICNT & {\tt 0xc000000f} & 32 & R/W & User Instruction Counter\\\hline
Cache & {\tt 0xc0100000} & & & Base address of the Cache memory\\\hline
\end{reglist}
\caption{Zip System Internal/Peripheral Registers}\label{tbl:zpregs}
\end{center}\end{table}
and the two debug registers shown in Tbl.~\ref{tbl:dbgregs}.
\begin{table}[htbp]
\begin{center}\begin{reglist}
ZIPCTRL & 0 & 32 & R/W & Debug Control Register \\\hline
ZIPDATA & 1 & 32 & R/W & Debug Data Register \\\hline
\end{reglist}
\caption{Zip System Debug Registers}\label{tbl:dbgregs}
\end{center}\end{table}


\chapter{Wishbone Datasheet}\label{chap:wishbone}
The Zip System supports two Wishbone interfaces: a slave debug port and a master
port for the system itself.  These are shown in Tbl.~\ref{tbl:wishbone-slave}
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Slave, Read/Write, single words only \\\hline
Address Width & 1--bit \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
Signal Names & \begin{tabular}{ll}
                Signal Name & Wishbone Equivalent \\\hline
                {\tt i\_clk} & {\tt CLK\_I} \\
                {\tt i\_dbg\_cyc} & {\tt CYC\_I} \\
                {\tt i\_dbg\_stb} & {\tt STB\_I} \\
                {\tt i\_dbg\_we} & {\tt WE\_I} \\
                {\tt i\_dbg\_addr} & {\tt ADR\_I} \\
                {\tt i\_dbg\_data} & {\tt DAT\_I} \\
                {\tt o\_dbg\_ack} & {\tt ACK\_O} \\
                {\tt o\_dbg\_stall} & {\tt STALL\_O} \\
                {\tt o\_dbg\_data} & {\tt DAT\_O}
                \end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet for the Debug Interface}\label{tbl:wishbone-slave}
\end{center}\end{table}
and Tbl.~\ref{tbl:wishbone-master} respectively.
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Master, Read/Write, single cycle or pipelined\\\hline
Address Width & 32--bit \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
Signal Names & \begin{tabular}{ll}
                Signal Name & Wishbone Equivalent \\\hline
                {\tt i\_clk} & {\tt CLK\_I} \\
                {\tt o\_wb\_cyc} & {\tt CYC\_O} \\
                {\tt o\_wb\_stb} & {\tt STB\_O} \\
                {\tt o\_wb\_we} & {\tt WE\_O} \\
                {\tt o\_wb\_addr} & {\tt ADR\_O} \\
                {\tt o\_wb\_data} & {\tt DAT\_O} \\
                {\tt i\_wb\_ack} & {\tt ACK\_I} \\
                {\tt i\_wb\_stall} & {\tt STALL\_I} \\
                {\tt i\_wb\_data} & {\tt DAT\_I}
                \end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet for the CPU as Master}\label{tbl:wishbone-master}
\end{center}\end{table}
I do not recommend that you connect these two ports together through the
interconnect.  Rather, the debug port of the CPU should remain accessible
regardless of the state of the master bus.
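
As a rough worked figure for the master port, and assuming (as the 32--bit port
granularity in Tbl.~\ref{tbl:wishbone-master} suggests, though this chapter does
not say so outright) that each address selects one full 32--bit word rather
than a byte, the 32 address bits would span
\[
2^{32}\ \mbox{words} \times 4\ \mbox{bytes per word} = 2^{34}\ \mbox{bytes}
        = 16\ \mbox{GiB}.
\]
If the bus were instead byte addressed, the same 32 address bits would cover
only $2^{32}$ bytes, or 4~GiB.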
 
Note that neither the {\tt ERR} nor the {\tt RETRY} wires have been
implemented.  As a result, the CPU is currently unable to detect a bus error
condition, and so may stall indefinitely (hang) should it attempt to access an
address not on the bus, or a peripheral that is not yet properly configured.

\chapter{Clocks}\label{chap:clocks}

This core is based upon the Basys--3 design.  The Basys--3 development board
contains one external 100~MHz clock, which is sufficient to run the Zip CPU
core.
\begin{table}[htbp]
\begin{center}
\begin{clocklist}
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
\end{clocklist}
\caption{List of Clocks}\label{tbl:clocks}
\end{center}\end{table}
I hesitate to suggest that the core can run faster than 100~MHz, since I have
struggled with various timing violations to keep it at 100~MHz.  So, for
now, I will only state that it can run at 100~MHz.
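
For reference, the arithmetic behind that figure: a 100~MHz clock has a period
of
\[
T_{clk} = \frac{1}{100\ \mbox{MHz}} = 10\ \mbox{ns}.
\]
As an illustration only, assuming a free--running 32--bit counter that
increments once per clock (this chapter does not state how any particular
ZipSystem counter or timer counts), such a counter would wrap after
\[
2^{32} \times 10\ \mbox{ns} \approx 42.9\ \mbox{s}.
\]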


\chapter{I/O Ports}\label{chap:ioports}

% Appendices
% Index
\end{document}