%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Filename:    spec.tex
%%
%% Project:     Zip CPU -- a small, lightweight, RISC CPU soft core
%%
%% Purpose:     This LaTeX file contains all of the documentation/description
%%              currently provided with this Zip CPU soft core.  It supersedes
%%              any information about the instruction set or CPUs found
%%              elsewhere.  It's not nearly as interesting, though, as the PDF
%%              file it creates, so I'd recommend reading that before diving
%%              into this file.  You should be able to find the PDF file in
%%              the SVN distribution together with this LaTeX file and a copy of
%%              the GPL-3.0 license this file is distributed under.  If not,
%%              just type 'make' in the doc directory and it (should) build
%%              without a problem.
%%
%%
%% Creator:     Dan Gisselquist
%%              Gisselquist Technology, LLC
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Copyright (C) 2015, Gisselquist Technology, LLC
%%
%% This program is free software (firmware): you can redistribute it and/or
%% modify it under the terms of  the GNU General Public License as published
%% by the Free Software Foundation, either version 3 of the License, or (at
%% your option) any later version.
%%
%% This program is distributed in the hope that it will be useful, but WITHOUT
%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
%% for more details.
%%
%% You should have received a copy of the GNU General Public License along
%% with this program.  (It's in the $(ROOT)/doc directory, run make with no
%% target there if the PDF file isn't present.)  If not, see
%% <http://www.gnu.org/licenses/> for a copy.
%%
%% License:     GPL, v3, as defined and found on www.gnu.org,
%%              http://www.gnu.org/licenses/gpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{gqtekspec}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.1}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC

This project is free software (firmware): you can redistribute it and/or
modify it under the terms of  the GNU General Public License as published
by the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program.  If not, see \hbox{<http://www.gnu.org/licenses/>} for a
copy.
\end{license}
\begin{revisionhistory}
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
% \listoffigures
\listoftables
\begin{preface}
Many people have asked me why I am building the Zip CPU. ARM processors are
good and effective. Xilinx makes and markets Microblaze, Altera Nios, and both
have better toolsets than the Zip CPU will ever have. OpenRISC is also
available. Why build a new processor?

The easiest, most obvious answer is the simple one: Because I can.

There's more to it, though. There's a lot that I would like to do with a
processor, and I want to be able to do it in a vendor-independent fashion.
I would like to be able to generate Verilog code that can run equivalently
on both Xilinx and Altera chips, and that can be easily ported from one
manufacturer's chipsets to another. Even more, before purchasing a chip or a
board, I would like to know that my chip works. I would like to build a test
bench to test components with, and Verilator is my chosen test bench. This
forces me to use all Verilog, and it prevents me from using any proprietary
cores. For this reason, Microblaze and Nios are out of the question.

Why not OpenRISC? That's a hard question. The OpenRISC team has done some
wonderful work on an amazing processor, and I'll have to admit that I am
envious of what they've accomplished. I would like to port binutils to the
Zip CPU, as I would like to port GCC and GDB. They are way ahead of me. The
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs. It has
a lot of features of modern CPUs within it that ... well, let's just say it's
not the little guy on the block. The Zip CPU is lighter weight, costing only
about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic
peripherals.

My final reason is that I'm building the Zip CPU as a learning experience. The
Zip CPU has allowed me to learn a lot about how CPUs work on a very micro
level. For the first time, I am beginning to understand many of the Computer
Architecture lessons from years ago.

To summarize: Because I can, because it is open source, because it is
lightweight, and as an exercise in learning.

\end{preface}

\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}


The original goal of the Zip CPU was to be a very simple CPU.  You might
think of it as a poor man's alternative to the OpenRISC architecture.
For this reason, all instructions have been designed to be as simple as
possible, and to execute in one clock cycle each, barring pipeline stalls.
Indeed, even the bus has been simplified to a constant 32-bit width, with no
option for more or less.  This has resulted in the choice to drop push and pop
instructions, pre-increment and post-decrement addressing modes, and more.

For those who like buzz words, the Zip CPU is:
\begin{itemize}
\item A 32-bit CPU: All registers are 32 bits wide, addresses are 32 bits,
                instructions are 32 bits wide, etc.
\item A RISC CPU.  There is no microcode for executing instructions.
\item A Load/Store architecture.  (Only load and store instructions
                can access memory.)
\item Wishbone compliant.  All peripherals are accessed just like
                memory across this bus.
\item A Von-Neumann architecture.  (The instructions and data share a
                common bus.)
\item A pipelined architecture, having stages for {\bf Prefetch},
                {\bf Decode}, {\bf Read-Operand}, the {\bf ALU/Memory}
                unit, and {\bf Write-back}.
\item Completely open source, licensed under the GPL.\footnote{Should you
        need a copy of the Zip CPU licensed under other terms, please
        contact me.}
\end{itemize}

Now that I've worked on the Zip CPU for a while, however, it is not nearly
as simple as I originally hoped.  Worse, I've had to add capabilities that I
never expected to need.  These include:
\begin{itemize}
\item {\bf External Debug:} Once placed upon an FPGA, some external means is
        still necessary to debug this CPU.  That means that there needs to be
        an external register that can control the CPU: reset it, halt it, step
        it, and tell whether it is running or not.  A second register, placed
        alongside the first, allows the external controller to examine
        registers internal to the CPU.

\item {\bf Internal Debug:} Being able to run a debugger from within
        a user process requires an ability to step a user process from
        within a debugger.  It also requires a break instruction that can
        be substituted for any other instruction, and substituted back.
        The break is actually difficult: the break instruction cannot be
        allowed to execute.  That way, upon a break, the debugger is
        able to jump back into the user process, step the instruction
        that was originally at the break point, and then
        replace the break after passing it.

\item {\bf Prefetch Cache:} My original implementation had a very
        simple prefetch stage.  Any time the PC changed the prefetch would go
        and fetch the new instruction.  While this was perhaps the simplest
        approach, it cost roughly five clocks for every instruction.  This
        was deemed unacceptable, as I wanted a CPU that could execute
        instructions in one cycle.  I therefore have a prefetch cache that
        issues pipelined Wishbone accesses to memory and then pushes
        instructions at the CPU.  Sadly, this accounts for about 20\% of the
        logic in the entire CPU, or 15\% of the logic in the entire system.


\item {\bf Operating System:} In order to support an operating system,
        interrupts and so forth, the CPU needs to support supervisor and
        user modes, as well as a means of switching between them.  For example,
        the user needs a means of executing a system call.  This is the
        purpose of the {\bf `trap'} instruction.  This instruction needs to
        place the CPU into supervisor mode (here equivalent to disabling
        interrupts), and to hand it a parameter, such as one identifying
        which O/S function was called.

        My initial approach to building a trap instruction was to create
        an external peripheral which, when written to, would generate an
        interrupt and could return the last value written to it.  This failed
        timing requirements, however: the CPU executed two instructions while
        waiting for the trap interrupt to take place.  Since then, I've
        decided to keep the rest of the CC register for that purpose so that a
        write to the CC register, with the GIE bit cleared, could be used to
        execute a trap.

        Modern timesharing systems also depend upon a {\bf Timer} interrupt
        to handle task swapping.  For the Zip CPU, this interrupt is handled
        external to the CPU as part of the CPU System, found in
        {\tt zipsystem.v}.  The timer module itself is found in
        {\tt ziptimer.v}.

\item {\bf Pipeline Stalls:} My original plan was to not support pipeline
        stalls at all, but rather to require the compiler to properly schedule
        instructions so that stalls would never be necessary.  After trying
        to build such an architecture, I gave up, having learned some things:

        For example, in order to facilitate interrupt handling and debug
        stepping, the CPU needs to know which instructions have finished, and
        which have not.  In other words, it needs to know where it can restart
        the pipeline from.  Once restarted, it must act as though it had
        never stopped.  This killed my idea of delayed branching, since
        what would be the appropriate program counter to restart at?
        The one the CPU was going to branch to, or the ones in the
        delay slots?

        So I switched to a model of discrete execution: Once an instruction
        enters into either the ALU or memory unit, the instruction is
        guaranteed to complete.  If the logic recognizes a branch or a
        condition that would render the instruction entering into this stage
        possibly inappropriate (e.g., a conditional branch preceding a store
        instruction), then the pipeline stalls for one cycle
        until the conditional branch completes.  Then, if it generates a new
        PC address, the preceding stages are all wiped clean.

        The discrete execution model allows such things as sleeping: if the
        CPU is put to ``sleep'', the ALU and memory stages stall and back up
        everything before them.  Likewise, anything that has entered the ALU
        or memory stage when the CPU is put to sleep continues to completion.
        To handle this logic, each pipeline stage has three control signals:
        a valid signal, a stall signal, and a clock enable signal.  In
        general, a stage stalls if its contents are valid and the next stage
        is stalled.  This allows the pipeline to fill any time a later stage
        stalls.

\item {\bf Verilog Modules:} When examining how other processors worked
        here on OpenCores, many of them had one separate module per pipeline
        stage.  While this appeared to me to be a fascinating and commendable
        idea, my own implementation didn't work out quite so nicely.

        As an example, the decode module produces a {\em lot} of
        control wires and registers.  Creating a module out of this, with
        only the simplest of logic within it, seemed to be more a lesson
        in passing wires around, rather than encapsulating logic.

        Another example was the register writeback section.  I would love
        this section to be a module in its own right, and many have made them
        such.  However, other modules depend upon writeback results other
        than just what's placed in the register (i.e., the control wires).
        For these reasons, I didn't manage to fit this section into its
        own module.

        The result is that the majority of the CPU code can be found in
        the {\tt zipcpu.v} file.
\end{itemize}
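
The stall rule described under {\bf Pipeline Stalls} above can be sketched in
a few lines of Python.  This is only an illustration of the rule as stated in
the text, not the logic found in {\tt zipcpu.v}; the function name
{\tt compute\_stalls} and its arguments are hypothetical.
\begin{verbatim}
```python
def compute_stalls(valid, downstream_stalled):
    """Return one stall flag per pipeline stage, earliest stage first.

    valid[i] is True when stage i currently holds an instruction.
    downstream_stalled is the stall signal of the unit that follows the
    last listed stage (for example, a sleeping ALU/memory stage).
    """
    stalls = [False] * len(valid)
    next_stalled = downstream_stalled
    # Walk from the last stage back to the first: a stage stalls only if
    # it holds valid contents AND the stage after it is stalled.
    for i in reversed(range(len(valid))):
        stalls[i] = valid[i] and next_stalled
        next_stalled = stalls[i]
    return stalls


if __name__ == "__main__":
    # Prefetch and read-operand hold instructions, decode is empty,
    # and the ALU behind them is stalled.
    print(compute_stalls([True, False, True], True))
```
\end{verbatim}
With {\tt valid = [True, False, True]} and a stalled ALU, only the last of the
three stages stalls.  The empty middle stage absorbs the stall, so the first
stage may still advance, which is how the pipeline fills while a later stage
is stalled.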

With that introduction out of the way, let's move on to the instruction
set.

\chapter{CPU Architecture}\label{chap:arch}

The Zip CPU supports a set of two-operand instructions, where the first operand
(always a register) is the result.  The only exception is the store instruction,
where the first operand (always a register) is the source of the data to be
stored.

\section{Register Set}
The Zip CPU supports two sets of sixteen 32-bit registers, a supervisor
and a user set.  The supervisor set is used in interrupt mode, whereas
the user set is used otherwise.  Of this register set, the Program Counter (PC)
is register 15, whereas the status register (SR) or condition code register
(CC) is register 14.  By convention, the stack pointer will be register 13 and
denoted (SP), although the instruction set allows it to be anything.
The CPU can access both register sets via move instructions from the
supervisor state, whereas the user state can only access the user registers.

The status register is special, and bears further mention.  The lower
8 bits of the status register form a set of condition codes.  Writes to other
bits are preserved, and can be used as part of the trap architecture: they are
examined by the O/S upon any interrupt, and cleared before returning.

Of the eight condition codes, the bottom four are the current flags:
Zero (Z), Carry (C), Negative (N), and Overflow (V).

The next bit is a clock enable (0 to enable) or sleep bit (1 to put
the CPU to sleep).  Setting this bit will cause the CPU to
wait for an interrupt (if interrupts are enabled), or to
completely halt (if interrupts are disabled).

The sixth bit is a global interrupt enable bit (GIE).  When this
sixth bit is a `1' interrupts will be enabled, else disabled.  When
interrupts are disabled, the CPU will be in supervisor mode, otherwise
it is in user mode.  Thus, to execute a context switch, one only
need enable or disable interrupts.  (When an interrupt line goes
high, interrupts will automatically be disabled, as the CPU goes
and deals with its context switch.)

The seventh bit is a step bit.  This bit can be
set from supervisor mode only.  After setting this bit, should
the supervisor mode process switch to user mode, the CPU will
execute one instruction in user mode before returning to supervisor
mode.  Then, upon return to supervisor mode, this bit will
be automatically cleared.  This bit has no effect on the CPU while in
supervisor mode.

This functionality was added to let a userspace debugger
operate on a user process, working through supervisor mode
of course.


The eighth bit is a break enable bit.  This
controls whether a break instruction will halt the processor for an
external debugger (break enabled), or whether the break instruction
will simply set the STEP bit and send the CPU into interrupt mode.
This bit can only be set within supervisor mode.

This functionality was added to enable an external debugger to
set and manage breakpoints.

The ninth bit is reserved for a floating point enable bit.  When set, the
arithmetic for the next instruction will be sent to a floating point unit.
Such a unit may later be added as an extension to the Zip CPU.  If the
CPU does not support floating point instructions, this bit will never be set.

The tenth bit is a trap bit.  It is set whenever the user requests a soft
interrupt, and cleared on any return to userspace command.  This allows the
supervisor, in supervisor mode, to determine whether it got to supervisor
mode from a trap or from an external interrupt or both.

The status register bits are shown in Tbl.~\ref{tbl:ccbits}.
\begin{table}
\begin{center}
\begin{tabular}{l|l}
Bit & Meaning \\\hline
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
8 & (Reserved for) Floating point enable \\\hline
7 & Halt on break, to support an external debugger \\\hline
6 & Step, single step the CPU in user mode\\\hline
5 & GIE, or Global Interrupt Enable \\\hline
4 & Sleep \\\hline
3 & V, or overflow bit.\\\hline
2 & N, or negative bit.\\\hline
1 & C, or carry bit.\\\hline
0 & Z, or zero bit.\\\hline
\end{tabular}
\caption{Status register (CC) bit assignments}\label{tbl:ccbits}
\end{center}
\end{table}
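
As a small illustration of the bit assignments above, the following Python
sketch unpacks a CC register value into named flags.  The flag names and the
helper functions are made up for this example.  Bit 0 is taken to be Z, since
the text lists Z as the lowest of the four flags.
\begin{verbatim}
```python
# Bit positions follow the status register description; the names are
# illustrative and are not taken from the CPU sources.
CC_BITS = {
    "Z": 0, "C": 1, "N": 2, "V": 3,
    "SLEEP": 4, "GIE": 5, "STEP": 6, "BREAK_EN": 7,
    "FPEN": 8, "TRAP": 9,
}


def decode_cc(cc):
    """Return a dict mapping each flag name to True or False."""
    return {name: bool((cc >> bit) & 1) for name, bit in CC_BITS.items()}


def in_supervisor_mode(cc):
    """Per the text, the CPU is in supervisor mode when GIE is clear."""
    return not decode_cc(cc)["GIE"]


if __name__ == "__main__":
    flags = decode_cc(0x21)   # GIE and Z set
    print(sorted(name for name, is_set in flags.items() if is_set))
    print(in_supervisor_mode(0x21))
```
\end{verbatim}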
\section{Conditional Instructions}
Most, although not quite all, instructions are conditionally executed.  From
the four condition code flags, eight conditions are defined.  These are shown
in Tbl.~\ref{tbl:conditions}.
\begin{table}
\begin{center}
\begin{tabular}{l|l|l}
Code & Mnemonic & Condition \\\hline
3'h0 & None & Always execute the instruction \\
3'h1 & {\tt .Z} & Only execute when 'Z' is set \\
3'h2 & {\tt .NE} & Only execute when 'Z' is not set \\
3'h3 & {\tt .GE} & Greater than or equal ('N' not set, 'Z' irrelevant) \\
3'h4 & {\tt .GT} & Greater than ('N' not set, 'Z' not set) \\
3'h5 & {\tt .LT} & Less than ('N' set) \\
3'h6 & {\tt .C} & Carry set\\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\end{center}
\end{table}
There is no condition code for less than or equal, not C, or not V.  Testing
for one of these conditions takes an extra instruction.
(Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
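
The condition codes can be modeled as a simple lookup on the four flags.  The
Python below is a sketch under one stated assumption: {\tt .LT} is taken to
mean that the N flag is set, the complement of {\tt .GE}.  The function names
are hypothetical.
\begin{verbatim}
```python
def condition_met(code, z, c, n, v):
    """Evaluate a 3-bit condition code against the Z, C, N and V flags."""
    checks = {
        0: True,                   # no suffix: always execute
        1: z,                      # .Z
        2: not z,                  # .NE
        3: not n,                  # .GE
        4: (not n) and (not z),    # .GT
        5: n,                      # .LT (assumed: N set)
        6: c,                      # .C
        7: v,                      # .V
    }
    return bool(checks[code & 0x7])


def less_or_equal(z, c, n, v):
    """Less than or equal has no code of its own: it is .LT or .Z."""
    return condition_met(5, z, c, n, v) or condition_met(1, z, c, n, v)


if __name__ == "__main__":
    # Flags after comparing equal values: Z set, everything else clear.
    print(condition_met(4, True, False, False, False))
    print(less_or_equal(True, False, False, False))
```
\end{verbatim}
The second helper shows why the missing conditions cost an extra instruction:
each needs two of the eight available tests, or a separate test of the CC
register, before the conditional instruction can be issued.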

\section{Operand B}
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
This Operand B is either equal to a register plus a signed immediate offset,
or a signed immediate value by itself.  This value is encoded as shown in
Tbl.~\ref{tbl:opb}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Bit 20 & 19 \ldots 16 & 15 \ldots 0 \\\hline
1'b0 & \multicolumn{2}{l|}{Signed Immediate value} \\\hline
1'b1 & 4-bit Register & 16-bit Signed immediate offset \\\hline
\end{tabular}
\caption{Bit allocation for Operand B}\label{tbl:opb}
\end{center}\end{table}
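
To make the encoding concrete, here is a short Python sketch that decodes a
21-bit Operand B field following Tbl.~\ref{tbl:opb}.  The helper names
({\tt sign\_extend}, {\tt decode\_operand\_b}, {\tt operand\_b\_value}) are
illustrative only, and the 32-bit wrap-around of the final sum is an
assumption based on the 32-bit register width.
\begin{verbatim}
```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_operand_b(opb):
    """Split a 21-bit Operand B into (register index or None, immediate)."""
    opb &= 0x1FFFFF
    if opb & (1 << 20):
        # Bit 20 set: 4-bit register plus a 16-bit signed offset.
        return (opb >> 16) & 0xF, sign_extend(opb, 16)
    # Bit 20 clear: bits 19..0 hold a signed immediate.
    return None, sign_extend(opb, 20)


def operand_b_value(opb, regs):
    """Compute the 32-bit value Operand B refers to."""
    reg, imm = decode_operand_b(opb)
    base = 0 if reg is None else regs[reg]
    return (base + imm) & 0xFFFFFFFF


if __name__ == "__main__":
    regs = [0] * 16
    regs[3] = 100
    # Register 3 with an offset of -4.
    print(operand_b_value((1 << 20) | (3 << 16) | 0xFFFC, regs))
```
\end{verbatim}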
\section{Address Modes}
The Zip CPU supports two addressing modes: register plus immediate, and
immediate address.  Addresses are therefore encoded in the same fashion as
Operand B, shown above.

A lot of long hard thought was put into whether to allow pre/post increment
and decrement addressing modes.  Finding no way to use these operators without
taking two or more clocks per instruction, these addressing modes have been
removed from the realm of possibilities.  This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.

\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
the CPU needs access to non-supervisory registers while in supervisory
mode.  Therefore, the MOV instruction is special and offers access
to these registers ... when in supervisory mode.  To keep the compiler
simple, the extra bits are ignored in non-supervisory mode (as though
they didn't exist), rather than being mapped to new instructions or
additional capabilities.  The bits indicating which register set each
register lies within are the A-Usr and B-Usr bits.  When set to a one,
these refer to a user mode register.  When set to a zero, these refer
to a register in the current mode, whether user or supervisor.
Further, because a load immediate instruction exists, there is no move
capability between an immediate and a register: all moves come from either a
register or a register plus an offset.

This actually leads to a bit of a problem: since the MOV instruction
encodes which register set each register is coming from or moving to,
how shall a compiler or assembler know how to compile a MOV instruction
without knowing the mode of the CPU at the time?  For this reason,
the compiler will assume all MOV registers are supervisor registers,
and display them as normal.  Anything with the user bit set will
be treated as a user register.  The CPU will quietly ignore the
supervisor bits while in user mode, and anything marked as a user
register will always be valid.
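
The following Python sketch shows how the A-Usr and B-Usr bits might be pulled
out of a MOV instruction word.  The field positions are read off the MOV row
of Tbl.~\ref{tbl:zip-instructions}; the function and key names are invented
for this example, and the register-set selection rule is only a restatement of
the paragraph above.
\begin{verbatim}
```python
def decode_mov(word):
    """Split a 32-bit MOV instruction word into its fields."""
    if (word >> 28) & 0xF != 0x2:
        raise ValueError("not a MOV instruction")
    offset = word & 0x7FFF          # 15-bit signed offset, bits 14..0
    if offset & 0x4000:
        offset -= 0x8000
    return {
        "a_reg": (word >> 24) & 0xF,        # destination register
        "cond": (word >> 21) & 0x7,
        "a_usr": bool((word >> 20) & 1),
        "b_reg": (word >> 16) & 0xF,        # source register
        "b_usr": bool((word >> 15) & 1),
        "offset": offset,
    }


def selects_user_set(usr_bit, supervisor_mode):
    """A set bit always names a user register; a clear bit names a
    register in the current mode, whichever mode that is."""
    return usr_bit or not supervisor_mode


if __name__ == "__main__":
    # Move the user R13 minus one into the user R13's supervisor twin.
    word = (0x2 << 28) | (13 << 24) | (13 << 16) | (1 << 15) | 0x7FFF
    print(decode_mov(word))
```
\end{verbatim}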

\section{Multiply Operations}
While the Zip CPU instruction set supports multiply operations, they are not
yet fully supported by the CPU.  Two Multiply operations are supported, a
16x16 bit signed multiply (MPYS) and the same but unsigned (MPYU).  In both
cases, the operand is a register plus a 16-bit immediate, subject to the
rule that the register cannot be the PC or CC registers.  The PC register
field has been stolen to create a multiply by immediate instruction.  The
CC register field is reserved.
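
The difference between the two multiplies can be shown with a short Python
model.  This is a sketch under assumptions the text does not spell out: both
inputs are taken from the low 16 bits of their 32-bit values, and the product
is kept as a 32-bit result.  The function names simply mirror the mnemonics.
\begin{verbatim}
```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mpyu(a, b):
    """Unsigned 16x16 multiply (assumed: low halves in, 32 bits out)."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF


def mpys(a, b):
    """Signed 16x16 multiply (assumed: low halves in, 32 bits out)."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF


if __name__ == "__main__":
    # 0xFFFF is 65535 when unsigned but -1 when signed.
    print(hex(mpyu(0xFFFF, 2)))
    print(hex(mpys(0xFFFF, 2)))
```
\end{verbatim}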

\section{Floating Point}
The Zip CPU does not support floating point operations today.  However, the
instruction set reserves a capability for a floating point operation.  To
execute such an operation, simply set the floating point bit in the CC
register and the following instruction will be interpreted as
a floating point instruction.  Not all instructions make sense as floating
point operations, however: only the CMP, SUB, ADD, and MPY
instructions may be issued as floating point instructions.  Further, the
immediate fields do not apply in floating point mode, and must be set to zero.
Other instructions allow the examining of the floating point bit in the CC
register.  In all cases, the floating point bit is cleared one instruction
after it is set.

The architecture does not support a floating point not-implemented interrupt.
Any soft floating point emulation must be done deliberately.

\section{Native Instructions}
The instruction set for the Zip CPU is summarized in
Tbl.~\ref{tbl:zip-instructions}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|c|}\hline
Op Code & \multicolumn{8}{c|}{31\ldots24} & \multicolumn{8}{c|}{23\ldots 16}
        & \multicolumn{8}{c|}{15\ldots 8} & \multicolumn{8}{c|}{7\ldots 0}
        & Sets CC? \\\hline
CMP(Sub) & \multicolumn{4}{l|}{4'h0}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B}
                & Yes \\\hline
BTST(And) & \multicolumn{4}{l|}{4'h1}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
MOV & \multicolumn{4}{l|}{4'h2}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & A-Usr
                & \multicolumn{4}{l|}{B-Reg}
                & B-Usr
                & \multicolumn{15}{l|}{15-bit signed offset}
                & \\\hline
LODI & \multicolumn{4}{l|}{4'h3}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{24}{l|}{24-bit Signed Immediate}
                & \\\hline
NOOP & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{24'h00}
                & \\\hline
BREAK & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{24'h01}
                & \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{24 bits, but not 0 or 1.}
                & \\\hline
LODIHI & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{16}{l|}{16-bit Immediate}
                & \\\hline
LODILO & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{16}{l|}{16-bit Immediate}
                & \\\hline
16-b MPYU & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0 & \multicolumn{4}{l|}{Reg}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYU(I) & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0 & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYS & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1 & \multicolumn{4}{l|}{Reg}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYS(I) & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1 & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
ROL & \multicolumn{4}{l|}{4'h5}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B, truncated to low order 5 bits}
                & \\\hline
LOD & \multicolumn{4}{l|}{4'h6}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B address}
                & \\\hline
STO & \multicolumn{4}{l|}{4'h7}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B address}
                & \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h8}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        & 1'b0
        &       \multicolumn{20}{l|}{Reserved}
        & Yes \\\hline
SUB & \multicolumn{4}{l|}{4'h8}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        & 1'b1
        &       \multicolumn{4}{l|}{Reg}
        &       \multicolumn{16}{l|}{16-bit signed offset}
        & Yes \\\hline
AND & \multicolumn{4}{l|}{4'h9}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
ADD & \multicolumn{4}{l|}{4'ha}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
OR & \multicolumn{4}{l|}{4'hb}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
XOR & \multicolumn{4}{l|}{4'hc}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
LSL/ASL & \multicolumn{4}{l|}{4'hd}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
ASR & \multicolumn{4}{l|}{4'he}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
LSR & \multicolumn{4}{l|}{4'hf}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
\end{tabular}
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
\end{center}\end{table}
599
 
600
As you can see, there's lots of room for instruction set expansion.  The
601
NOOP and BREAK instructions leave 24~bits of open instruction address
602
space, minus the two instructions NOOP and BREAK.  The Subtract leaves half
603
of its space open, since a subtract immediate is the same as an add with a
604
negated immediate.
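
The claim about subtract immediates can be checked with a small Python sketch
that models 32-bit register arithmetic.  The helper names {\tt add\_imm} and
{\tt sub\_imm} are illustrative only and are not part of any Zip CPU tool.
The sketch compares register values only, and makes no claim about how the
flags are set in either case.
\begin{verbatim}
MASK = 0xFFFFFFFF  # 32-bit register width


def add_imm(reg, imm):
    """Add a (possibly negative) immediate to a 32-bit register value."""
    return (reg + imm) & MASK


def sub_imm(reg, imm):
    """Subtract an immediate from a 32-bit register value."""
    return (reg - imm) & MASK


# Subtracting 5 leaves the same register value as adding -5,
# including at the wrap-around points of the 32-bit range.
for reg in (0, 4, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    assert sub_imm(reg, 5) == add_imm(reg, -5)
\end{verbatim}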
605
 
606
\section{Derived Instructions}
607
The ZIP CPU supports many other common instructions, but not all of them
608
are single instructions.  The derived instruction tables,
609
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
610
help to capture some of how these other instructions may be implemented on
611
the ZIP CPU.  Many of these instructions will have assembly equivalents,
612
such as the branch instructions, to facilitate working with the CPU.
613
\begin{table}\begin{center}
614
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
615
Mapped & Actual  & Notes \\\hline
616
\parbox[t]{1.4in}{ADD Ra,Rx\\ADDC Rb,Ry}
617
        & \parbox[t]{1.5in}{Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry}
618
        & Add with carry \\\hline
619
BRA.Cond +/-\$Addr
620
        & Mov.cond \$Addr+PC,PC
621
        & Branch or jump on condition.  Works for 14 bit
622
                address offsets.\\\hline
623
BRA.Cond +/-\$Addr
624
        & \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC}
625
        & Branch/jump on condition.  Works for
626
        23 bit address offsets, but costs a register, an extra instruction,
627
        and sets the flags. \\\hline
628
BNC PC+\$Addr
629
        & \parbox[t]{1.5in}{Test \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
630
        & Example of a branch on an unsupported
631
                condition, in this case a branch on not carry \\\hline
632
BUSY & MOV \$-1(PC),PC & Execute an infinite loop \\\hline
633
CLRF.NZ Rx
634
        & XOR.NZ Rx,Rx
635
        & Clear Rx, and flags, if the Z-bit is not set \\\hline
636
CLR Rx
637
        & LDI \$0,Rx
638
        & Clears Rx, leaves flags untouched.  This instruction cannot be
639
                conditional. \\\hline
640
EXCH.W Rx
641
        & ROL \$16,Rx
642
        & Exchanges the top and bottom 16-bit words of Rx \\\hline
643
HALT
644
        & Or \$SLEEP,CC
645
        & Executed while in interrupt mode.  In user mode this is simply a
646
        wait until interrupt instruction. \\\hline
647
INT & LDI \$0,CC
648
        & Since we're using the CC register as a trap vector as well, this
649
        executes TRAP \#0. \\\hline
650
IRET
651
        & OR \$GIE,CC
652
        & Also an RTU instruction (Return to Userspace) \\\hline
653
JMP R6+\$Addr
654
        & MOV \$Addr(R6),PC
655
        & \\\hline
656
JSR PC+\$Addr
657
        & \parbox[t]{1.5in}{SUB \$1,SP \\
658
        MOV \$3+PC,R0 \\
659
        STO R0,1(SP) \\
660
        MOV \$Addr+PC,PC \\
661
        ADD \$1,SP}
662
        & Jump to Subroutine. \\\hline
663
JSR PC+\$Addr
664
        & \parbox[t]{1.5in}{MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
665
        &This is the high speed
666
        version of a subroutine call, necessitating a register to hold the
667
        last PC address.  In its favor, this method doesn't suffer the
668
        mandatory memory access of the other approach. \\\hline
669
LDI.l \$val,Rx
670
        & \parbox[t]{1.5in}{LDIHI (\$val$>>$16)\&0x0ffff, Rx \\
671
                        LDILO (\$val \& 0x0ffff)}
672
        & Sadly, there's not enough instruction
673
                space to load a complete immediate value into any register.
674
                Therefore, fully loading any register takes two cycles.
675
                The LDIHI (load immediate high) and LDILO (load immediate low)
676
                instructions have been created to facilitate this. \\\hline
677
\end{tabular}
678
\caption{Derived Instructions}\label{tbl:derived-1}
679
\end{center}\end{table}
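
The split used by the {\tt LDI.l} entry above can be sketched in Python as
follows.  This is a minimal illustration of the arithmetic only, assuming the
two 16-bit halves are recombined exactly as the table's expressions suggest;
the function names {\tt split\_ldi} and {\tt join\_halves} are hypothetical
and do not come from the Zip CPU assembler.
\begin{verbatim}
def split_ldi(val):
    """Return the (high, low) 16-bit immediates for LDIHI and LDILO."""
    val &= 0xFFFFFFFF
    return (val >> 16) & 0xFFFF, val & 0xFFFF


def join_halves(hi, lo):
    """Recombine the two halves into the original 32-bit value."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


hi, lo = split_ldi(0xDEADBEEF)
print(hex(hi), hex(lo))        # prints: 0xdead 0xbeef
assert join_halves(hi, lo) == 0xDEADBEEF
\end{verbatim}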
680
\begin{table}\begin{center}
681
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
682
Mapped & Actual  & Notes \\\hline
683
LOD.b \$addr,Rx
684
        & \parbox[t]{1.5in}{%
685
        LDI     \$addr,Ra \\
686
        LDI     \$addr,Rb \\
687
        LSR     \$2,Ra \\
688
        AND     \$3,Rb \\
689
        LOD     (Ra),Rx \\
690
        LSL     \$3,Rb \\
691
        SUB     \$32,Rb \\
692
        ROL     Rb,Rx \\
693
        AND \$0ffh,Rx}
694
        & \parbox[t]{3in}{This CPU is designed for 32-bit word
695
        length instructions.  Byte addressing is not supported by the CPU or
696
        the bus, so it therefore takes more work to do.
697
 
698
        Note also that in this example, \$Addr is a byte-wise address, where
699
        all other addresses are 32-bit wordlength addresses.  For this reason,
700
        we needed to drop the bottom two bits.  This also limits the address
701
        space of character accesses using this method from 16 MB down to 4MB.}
702
                \\\hline
703
\parbox[t]{1.5in}{LSL \$1,Rx\\ LSLC \$1,Ry}
704
        & \parbox[t]{1.5in}{LSL \$1,Ry \\
705
        LSL \$1,Rx \\
706
        OR.C \$1,Ry}
707
        & Logical shift left with carry.  Note that the
708
        instruction order is now backwards, to keep the conditions valid.
709
        That is, LSL sets the carry flag, so if we did this the other way
710
        with Rx before Ry, then the condition flag wouldn't have been right
711
        for an OR correction at the end. \\\hline
712
\parbox[t]{1.5in}{LSR \$1,Rx \\ LSRC \$1,Ry}
713
        & \parbox[t]{1.5in}{CLR Rz \\
714
        LSR \$1,Ry \\
715
        LDIHI.C \$8000h,Rz \\
716
        LSR \$1,Rx \\
717
        OR Rz,Rx}
718
        & Logical shift right with carry \\\hline
719
NEG Rx & \parbox[t]{1.5in}{XOR \$-1,Rx \\ ADD \$1,Rx} & \\\hline
720
NOOP & NOOP & While there are many
721
        operations that do nothing, such as MOV Rx,Rx, or OR \$0,Rx, these
722
        operations have consequences in that they might stall the bus if
723
        Rx isn't ready yet.  For this reason, we have a dedicated NOOP
724
        instruction. \\\hline
725
NOT Rx & XOR \$-1,Rx & \\\hline
726
POP Rx
727
        & \parbox[t]{1.5in}{LOD \$-1(SP),Rx \\ ADD \$1,SP}
728
        & Note
729
        that for interrupt purposes, one can never depend upon the value at
730
        (SP).  Hence you read from it, then increment it, lest having
731
        incremented it first, something then comes along and writes to that
732
        value before you can read the result. \\\hline
733
PUSH Rx
734
        & \parbox[t]{1.5in}{SUB \$1,SP \\
735
        STO Rx,\$1(SP)}
736
        & \\\hline
737
RESET
738
        & \parbox[t]{1in}{STO \$1,\$watchdog(R12)\\NOOP\\NOOP}
739
        & \parbox[t]{3in}{This depends upon the peripheral base address being
740
        in R12.
741
 
742
        Another opportunity might be to jump to the reset address from within
743
        supervisor mode.}\\\hline
744
RET & \parbox[t]{1.5in}{LOD \$-1(SP),R0 \\
745
        MOV \$-1+SP,SP \\
746
        MOV R0,PC}
747
        & An alternative might be to LOD \$-1(SP),PC, followed
748
        by depending upon the calling program to ADD \$1,SP. \\\hline
749
\end{tabular}
750
\caption{Derived Instructions, continued}\label{tbl:derived-2}
751
\end{center}\end{table}
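
The idea behind the {\tt LOD.b} entry, reading one byte through a bus that
only moves 32-bit words, can be sketched in Python as below.  This models the
intent rather than the exact instruction sequence in the table.  It assumes a
big-endian byte order (byte~0 is the most significant byte of the word), which
is a choice made here for illustration; the name {\tt load\_byte} and the list
standing in for memory are likewise hypothetical.
\begin{verbatim}
def load_byte(mem, byte_addr):
    """Read one byte from a word-addressed memory (a list of 32-bit ints)."""
    word = mem[byte_addr >> 2]      # drop the bottom two bits: word address
    index = byte_addr & 3           # which byte within that word
    shift = 8 * (3 - index)         # assumed big-endian: byte 0 is bits 31..24
    return (word >> shift) & 0xFF


mem = [0x11223344, 0xAABBCCDD]
assert load_byte(mem, 0) == 0x11
assert load_byte(mem, 3) == 0x44
assert load_byte(mem, 5) == 0xBB
\end{verbatim}
Because two bits of the byte address select the byte within a word, a byte
address of a given width reaches only a quarter as many words as a word
address of the same width, which is the address space reduction noted in the
table.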
752
\begin{table}\begin{center}
753
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual  & Notes \\\hline
754
RET & MOV R12,PC
755
        & This is the high(er) speed version, that doesn't touch the stack.
756
        As such, it doesn't suffer a stall on memory read/write to the stack.
757
        \\\hline
758
STEP Rr,Rt
759
        & \parbox[t]{1.5in}{LSR \$1,Rr \\ XOR.C Rt,Rr}
760
        & Step a Galois implementation of a Linear Feedback Shift Register, Rr,
761
                using taps Rt \\\hline
762
STO.b Rx,\$addr
763
        & \parbox[t]{1.5in}{%
764
        LDI \$addr,Ra \\
765
        LDI \$addr,Rb \\
766
        LSR \$2,Ra \\
767
        AND \$3,Rb \\
768
        SUB \$32,Rb \\
769
        LOD (Ra),Ry \\
770
        AND \$0ffh,Rx \\
771
        AND \$-0ffh,Ry \\
772
        ROL Rb,Rx \\
773
        OR Rx,Ry \\
774
        STO Ry,(Ra) }
775
        & \parbox[t]{3in}{This CPU and its bus are {\em not} optimized
776
        for byte-wise operations.
777
 
778
        Note that in this example, \$addr is a
779
        byte-wise address, whereas in all of our other examples it is a
780
        32-bit word address. This also limits the address space
781
        of character accesses from 16 MB down to 4MB.
782
        Further, this instruction implies a byte ordering,
783
        such as big or little endian.} \\\hline
784
SWAP Rx,Ry
785
        & \parbox[t]{1.5in}{
786
        XOR Ry,Rx \\
787
        XOR Rx,Ry \\
788
        XOR Ry,Rx}
789
        & While no extra registers are needed, this example
790
        does take 3-clocks. \\\hline
791
TRAP \#X
792
        & LDILO \$x,CC
793
        & This approach uses the unused bits of the CC register as a TRAP
794
        address.  If these bits are zero, no trap has occurred.  Unlike my
795
        previous approach, which was to use a trap peripheral, this approach
796
        has no delay associated with it.  To work, the supervisor will need
797
        to clear this register following any trap, and the user will need to
798
        be careful to only set this register prior to a trap condition.
799
        Likewise, when setting this value, the user will need to make certain
800
        that the SLEEP and GIE bits are not set in \$x.  LDI would also work,
801
        however using LDILO permits the use of conditional traps.  (i.e.,
802
        trap if the zero flag is set.)  Should you wish to trap off of a
803
        register value, you could equivalently load \$x into the register and
804
        then MOV it into the CC register. \\\hline
805
TST Rx
806
        & TST \$-1,Rx
807
        & Set the condition codes based upon Rx.  Could also do a CMP \$0,Rx,
808
        ADD \$0,Rx, SUB \$0,Rx, etc, AND \$-1,Rx, etc.  The TST and CMP
809
        approaches won't stall future pipeline stages looking for the value
810
        of Rx. \\\hline
811
WAIT
812
        & Or \$SLEEP,CC
813
        & Wait 'til interrupt.  In an interrupts disabled context, this
814
        becomes a HALT instruction. \\\hline
816
\end{tabular}
817
\caption{Derived Instructions, continued}\label{tbl:derived-3}
818
\end{center}\end{table}
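
The {\tt STEP} entry above can be modelled in a few lines of Python.  This is
a sketch only: it assumes that {\tt LSR} leaves the bit shifted out in the
carry flag, as the {\tt XOR.C} in the table implies, and the name
{\tt lfsr\_step} is hypothetical.  The 4-bit tap value {\tt 0xC} is an
example chosen for this sketch and does not come from the Zip CPU sources.
\begin{verbatim}
def lfsr_step(state, taps):
    """One Galois LFSR step: shift right, XOR in the taps on carry out."""
    carry = state & 1               # bit shifted out by LSR $1,Rr
    state >>= 1
    if carry:                       # XOR.C Rt,Rr only executes on carry
        state ^= taps
    return state & 0xFFFFFFFF


# With taps 0xC, a 4-bit register starting at 1 visits all 15 non-zero
# states before returning to 1.
state, period = lfsr_step(1, 0xC), 1
while state != 1:
    state = lfsr_step(state, 0xC)
    period += 1
assert period == 15
\end{verbatim}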
819
\iffalse
820
\fi
821
\section{Pipeline Stages}
822
\begin{enumerate}
823
\item {\bf Prefetch}: Read instruction from memory (cache if possible).  This
824
        stage is actually pipelined itself, and so it will stall if the PC
825
        ever changes.  Stalls are also created here if the instruction isn't
826
        in the prefetch cache.
827
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
828
        immediate offset.
829
\item {\bf Read Operands}: Read registers and apply any immediate values to
830
        them.  This stage will stall if any source operand is pending.
831
        A proper optimizing compiler, therefore, will schedule an instruction
832
        between the instruction that produces the result and the instruction
833
        that uses it.
834
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
835
        instruction, and the {\bf MemOps} stage which accomplishes memory
836
        read/write.
837
        \begin{itemize}
838
        \item Loads stall instructions that access the register until it is
839
                written to the register set.
840
        \item Condition codes are available upon completion
841
        \item Issuing an instruction to the memory while the memory is busy will
842
                stall the bus.  If the bus deadlocks, only a reset will
843
                release the CPU.  (Watchdog timer, anyone?)
844
        \end{itemize}
845
\item {\bf Write-Back}: Conditionally write back the result to register set,
846
        applying the condition.  This routine is bi-re-entrant: either the
847
        memory or the simple instruction may request a register write.
848
\end{enumerate}
849
 
850
\section{Pipeline Logic}
851
How the CPU handles some instruction combinations can be telling when
852
determining what happens in the pipeline.  The following lists some examples:
853
\begin{itemize}
854
\item {\bf Delayed Branching}
855
 
856
        I had originally hoped to implement delayed branching.  However, what
857
        happens in debug mode?
858
        That is, what happens when a debugger tries to single step an
859
        instruction?  While I can easily single step the computer in either
860
        user or supervisor mode from externally, this processor does not appear
861
        able to step the CPU in user mode from within user mode--gosh, not even
862
        from within supervisor mode--such as if a process had a debugger
863
        attached.  As the processor exists, I would have one result stepping
864
        the CPU from a debugger, and another stepping it externally.
865
 
866
        This is unacceptable, and so this CPU does not support delayed
867
        branching.
868
 
869
\item {\bf Register Result:} {\tt MOV R0,R1; MOV R1,R2 }
870
 
871
        What value does
872
        R2 get, the value of R1 before the first move or the value of R0?
873
        Placing the value of R0 into R1 requires a pipeline stall, and possibly
874
        two, as I have the pipeline designed.
875
 
876
        The ZIP CPU architecture requires that R2 must equal R0 at the end of
877
        this operation.  This may stall the pipeline 1-2 cycles.
878
 
879
\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}
880
 
881
 
882
        At issue is the same item as above, save that the CMP instruction
883
        updates the flags that the MOV instruction depends
884
        upon.
885
 
886
        The Zip CPU architecture requires that condition codes must be updated
887
        and available immediately for the next instruction without stalling the
888
        pipeline.
889
 
890
\item {\bf Condition Codes Register Result:} {\tt CMP R0,R1; MOV CC,R2}
891
 
892
        At issue is the
893
        fact that the logic supporting the CC register is more complicated than
894
        the logic supporting any other register.
895
 
896
        The ZIP CPU will stall 1--2 cycles on this instruction, until the
897
        CC register is valid.
898
 
899
\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}
900
 
901
        At issue is whether or not the instruction following the jump will
902
        take place before the jump.  In other words, is the MOV to the PC
903
        register handled differently from an ADD to the PC register?
904
 
905
        In the Zip architecture, MOVs and ADDs use the same logic
906
        (simplifies the logic).
907
\end{itemize}
908
 
909
As I've studied this, I find several approaches to handling pipeline
910
        issues.  These approaches (and their consequences) are listed below.
911
 
912
\begin{itemize}
913
\item {\bf All issued instructions complete, stages stall individually}
914
 
915
        What about a slow pre-fetch?
916
 
917
        Nominally, this works well: any issued instruction
918
        just runs to completion.  If there are four issued instructions in the
919
        pipeline, with the writeback instruction being a write-to-PC
920
        instruction, the other three instructions naturally finish.
921
 
922
        This approach fails when reading instructions from the flash,
923
        since such reads require N clocks to complete.  Thus
924
        there may be only one instruction in the pipeline if reading from flash,
925
        or a full pipeline if reading from cache.  Each of these approaches
926
        would produce a different response.
927
 
928
\item {\bf Issued instructions may be canceled}
929
 
930
        Stages stall individually
931
 
932
        First problem:
933
        Memory operations cannot be canceled, even reads may have side effects
934
        on peripherals that cannot be canceled later.  Further, in the case of
935
        an interrupt, it's difficult to know what to cancel.  What happens in
936
        a \hbox{\tt MOV.C \$x,PC} followed by a \hbox{\tt MOV \$y,PC}
937
        instruction?  Which gets
938
        canceled?
939
 
940
        Because it isn't clear what would need to be canceled,
941
        this instruction combination is not recommended.
942
 
943
\item {\bf All issued instructions complete.}
944
 
945
        All stages are filled, or the entire pipeline
946
        stalls.
947
 
948
        What about debug control?  What about
949
        register writes taking an extra clock stage?  MOV R0,R1; MOV R1,R2
950
        should place the value of R0 into R2.  How do you restart the pipeline
951
        after an interrupt?  What address do you use?  The last issued
952
        instruction?  But the branch delay slots may make that invalid!
953
 
954
        Reading from the CPU debug port in this case yields inconsistent
955
        results: the CPU will halt or step with instructions stuck in the
956
        pipeline.  Reading registers will give no indication of what is going
957
        on in the pipeline, just the results of completed operations, not of
958
        operations that have been started and not yet completed.
959
        Perhaps we should just report the state of the CPU based upon what
960
        instructions (PC values) have successfully completed?  Thus the
961
        debug instruction is the one that will write registers on the next
962
        clock.
963
 
964
        Suggestion: Suppose we load extra information in the two
965
        CC register(s) for debugging intermediate pipeline stages?
966
 
967
        The next problem, though, is how to deal with the read operand
968
        pipeline stage needing the result from the register pipeline.
969
 
970
\item {\bf Memory instructions must complete}
971
 
972
        All instructions that enter into the memory module {\em must}
973
        complete.  Issued instructions from the prefetch, decode, or operand
974
        read stages may or may not complete.  Jumps into code must be valid,
975
        so that interrupt returns may be valid.  All instructions entering the
976
        ALU complete.
977
 
978
        This looks to be the simplest approach.
979
        While the logic may be difficult, this appears to be the only
980
        re-entrant approach.
981
 
982
        A {\tt new\_pc} flag will be high anytime the PC changes in an
983
        unpredictable way (i.e., it doesn't increment).  This includes jumps
984
        as well as interrupts and interrupt returns.  Whenever this flag may
985
        go high, memory operations and ALU operations will stall until the
986
        result is known.  When the flag does go high, anything in the prefetch,
987
        decode, and read-op stage will be invalidated.
988
 
989
\end{itemize}
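
The invalidation rule described for the {\tt new\_pc} flag can be illustrated
with a toy Python model.  This is only a sketch of the rule as stated above,
not of the actual hardware: the stage names and the {\tt apply\_new\_pc}
helper are made up for this example, and the model ignores the stalls that
take place while the flag's value is still unknown.
\begin{verbatim}
# Stages whose contents are discarded when the PC changes unpredictably.
EARLY_STAGES = ("prefetch", "decode", "read_op")


def apply_new_pc(valid, new_pc):
    """Return a copy of the stage-valid map after new_pc is sampled."""
    result = dict(valid)
    if new_pc:
        for stage in EARLY_STAGES:
            result[stage] = False   # work fetched from the old path is dropped
    return result


pipeline = {"prefetch": True, "decode": True, "read_op": True,
            "alu": True, "mem": True}
after = apply_new_pc(pipeline, new_pc=True)
assert not any(after[s] for s in EARLY_STAGES)
assert after["alu"] and after["mem"]    # ALU and memory operations complete
\end{verbatim}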
990
 
991
 
992
 
993
\chapter{Peripherals}\label{chap:periph}
994
\section{Interrupt Controller}
995
\section{Counter}
996
 
997
The Zip Counter is a very simple counter: it just counts.  It cannot be
998
halted.  When it rolls over, it issues an interrupt.  Writing a value to the
999
counter just sets the current value, and it starts counting again from that
1000
value.
1001
 
1002
Eight counters are implemented in the Zip System for process accounting.
1003
This may change in the future, as nothing as yet uses these counters.
1004
 
1005
\section{Timer}
1006
 
1007
The Zip Timer is also very simple: it simply counts down to zero.  When it
1008
transitions from a one to a zero it creates an interrupt.
1009
 
1010
Writing any non-zero value to the timer starts the timer.  If the high order
1011
bit is set when writing to the timer, the timer becomes an interval timer and
1012
reloads its last start time on any interrupt.  Hence, to mark seconds, one
1013
might set the timer to 100~million (the number of clocks per second), and
1014
set the high bit.  Ever after, the timer will interrupt the CPU once per
1015
second (assuming a 100~MHz clock).
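
The behaviour just described can be captured in a toy Python model.  This is
a sketch under stated assumptions rather than a description of the hardware:
it assumes the interval flag is bit~31 and the count occupies the remaining
31~bits, that writing zero stops the timer, and that an interval timer reloads
on the same clock that raises the interrupt.  The class name
{\tt ZipTimerModel} is hypothetical.
\begin{verbatim}
INTERVAL_BIT = 1 << 31   # assumed position of the "high order bit"


class ZipTimerModel:
    """Toy software model of the countdown timer described in the text."""

    def __init__(self):
        self.count = 0
        self.reload = 0              # non-zero only in interval mode

    def write(self, value):
        """Start the timer; the high bit selects interval mode."""
        self.count = value & (INTERVAL_BIT - 1)
        self.reload = self.count if value & INTERVAL_BIT else 0

    def tick(self):
        """Advance one clock; return True if an interrupt fires."""
        if self.count == 0:
            return False             # timer is stopped
        self.count -= 1
        if self.count == 0:          # the one-to-zero transition
            self.count = self.reload # interval mode restarts the countdown
            return True
        return False


timer = ZipTimerModel()
timer.write(INTERVAL_BIT | 3)
print([timer.tick() for _ in range(6)])
# prints: [False, False, True, False, False, True]
\end{verbatim}
Marking seconds on a 100~MHz clock would then correspond to
{\tt timer.write(INTERVAL\_BIT | 100\_000\_000)} in this model.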
1016
 
1017
\section{Watchdog Timer}
1018
 
1019
The watchdog timer is no different from any of the other timers, save for one
1020
critical difference: the interrupt line from the watchdog
1021
timer is tied to the reset line of the CPU.  Hence writing a `1' to the
1022
watchdog timer will always reset the CPU.
1023
To stop the Watchdog timer, write a `0' to it.  To start it,
1024
write any other number to it---as with the other timers.
1025
 
1026
While the watchdog timer supports interval mode, it doesn't make as much sense
1027
as it did with the other timers.
1028
 
1029
\section{Jiffies}
1030
 
1031
This peripheral is motivated by the Linux use of `jiffies' whereby a process
1032
can request to be put to sleep until a certain number of `jiffies' have
1033
elapsed.  Using this interface, the CPU can read the number of `jiffies'
1034
from the peripheral (it only has the one location in address space), add the
1035
sleep length to it, and write the result back to the peripheral.  The zipjiffies
1036
peripheral will record the value written to it only if it is nearer the current
1037
counter value than the last current waiting interrupt time.  If no other
1038
interrupts are waiting, and this time is in the future, it will be enabled.
1039
(There is currently no way to disable a jiffie interrupt once set, other
1040
than to disable the register in the interrupt controller.)  The processor
1041
may then place this sleep request into a list among other sleep requests.
1042
Once the timer expires, it would write the next Jiffy request to the peripheral
1043
and wake up the process whose timer had expired.
1044
 
1045
Indeed, the Jiffies register is nothing more than a glorified counter with
1046
an interrupt.  Unlike the other counters, the Jiffies register cannot be set.
1047
Writes to the jiffies register create an interrupt time.  When the Jiffies
1048
register later equals the value written to it, an interrupt will be asserted
1049
and the register then continues counting as though no interrupt had taken
1050
place.
1051
 
1052
The purpose of this register is to support alarm times within a CPU.  To
1053
set an alarm for a particular process $N$ clocks in advance, read the current
1054
Jiffies value, add $N$, and write the sum back to the Jiffies register.  The
1055
O/S must also keep track of values written to the Jiffies register.  Thus,
1056
when an `alarm' trips, it should be removed from the list of alarms, the list
1057
should be sorted, and the next alarm in terms of Jiffies should be written
1058
to the register.
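
The O/S side bookkeeping described above might look something like the
following Python sketch.  Everything here is illustrative: the
{\tt AlarmList} class and its methods are hypothetical, a heap stands in for
the sorted list, and counter wrap-around is ignored for simplicity.  The value
returned by {\tt add} and the second value returned by {\tt expire} are what
the O/S would next write to the Jiffies register.
\begin{verbatim}
import heapq


class AlarmList:
    """Pending alarms, ordered by the Jiffies value at which they trip."""

    def __init__(self):
        self._heap = []

    def add(self, now, delay, task):
        """Queue an alarm 'delay' jiffies from 'now'; return the next alarm."""
        heapq.heappush(self._heap, (now + delay, task))
        return self._heap[0][0]

    def expire(self, now):
        """Remove tripped alarms; return (tasks to wake, next alarm or None)."""
        woken = []
        while self._heap and self._heap[0][0] <= now:
            woken.append(heapq.heappop(self._heap)[1])
        next_alarm = self._heap[0][0] if self._heap else None
        return woken, next_alarm


alarms = AlarmList()
assert alarms.add(now=100, delay=50, task="A") == 150
assert alarms.add(now=100, delay=20, task="B") == 120   # B is now nearest
assert alarms.expire(now=120) == (["B"], 150)
\end{verbatim}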
1059
 
1060
\chapter{Operation}\label{chap:ops}
1061
 
1062
\chapter{Registers}\label{chap:regs}
1063
 
1064
\chapter{Wishbone Datasheet}\label{chap:wishbone}
1065
The Zip System supports two wishbone accesses, a slave debug port and a master
1066
port for the system itself.  These are shown in Tbl.~\ref{tbl:wishbone-slave}
1067
\begin{table}[htbp]
1068
\begin{center}
1069
\begin{wishboneds}
1070
Revision level of wishbone & WB B4 spec \\\hline
1071
Type of interface & Slave, Read/Write, single words only \\\hline
1072
Port size & 32--bit \\\hline
1073
Port granularity & 32--bit \\\hline
1074
Maximum Operand Size & 32--bit \\\hline
1075
Data transfer ordering & (Irrelevant) \\\hline
1076
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
1077
Signal Names & \begin{tabular}{ll}
1078
                Signal Name & Wishbone Equivalent \\\hline
1079
                {\tt i\_clk} & {\tt CLK\_I} \\
1080
                {\tt i\_dbg\_cyc} & {\tt CYC\_I} \\
1081
                {\tt i\_dbg\_stb} & {\tt STB\_I} \\
1082
                {\tt i\_dbg\_we} & {\tt WE\_I} \\
1083
                {\tt i\_dbg\_addr} & {\tt ADR\_I} \\
1084
                {\tt i\_dbg\_data} & {\tt DAT\_I} \\
1085
                {\tt o\_dbg\_ack} & {\tt ACK\_O} \\
1086
                {\tt o\_dbg\_stall} & {\tt STALL\_O} \\
1087
                {\tt o\_dbg\_data} & {\tt DAT\_O}
1088
                \end{tabular}\\\hline
1089
\end{wishboneds}
1090
\caption{Wishbone Datasheet}\label{tbl:wishbone-slave}
1091
\end{center}\end{table}
1092
and Tbl.~\ref{tbl:wishbone-master} respectively.
1093
\begin{table}[htbp]
1094
\begin{center}
1095
\begin{wishboneds}
1096
Revision level of wishbone & WB B4 spec \\\hline
1097
Type of interface & Master, Read/Write, sometimes pipelined \\\hline
1098
Port size & 32--bit \\\hline
1099
Port granularity & 32--bit \\\hline
1100
Maximum Operand Size & 32--bit \\\hline
1101
Data transfer ordering & (Irrelevant) \\\hline
1102
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
1103
Signal Names & \begin{tabular}{ll}
1104
                Signal Name & Wishbone Equivalent \\\hline
1105
                {\tt i\_clk} & {\tt CLK\_O} \\
1106
                {\tt o\_wb\_cyc} & {\tt CYC\_O} \\
1107
                {\tt o\_wb\_stb} & {\tt STB\_O} \\
1108
                {\tt o\_wb\_we} & {\tt WE\_O} \\
1109
                {\tt o\_wb\_addr} & {\tt ADR\_O} \\
1110
                {\tt o\_wb\_data} & {\tt DAT\_O} \\
1111
                {\tt i\_wb\_ack} & {\tt ACK\_I} \\
1112
                {\tt i\_wb\_stall} & {\tt STALL\_I} \\
1113
                {\tt i\_wb\_data} & {\tt DAT\_I}
1114
                \end{tabular}\\\hline
1115
\end{wishboneds}
1116
\caption{Wishbone Datasheet}\label{tbl:wishbone-master}
1117
\end{center}\end{table}
1118
I do not recommend that you connect these together through the interconnect.
1119
 
1120
The big thing to notice is that both the real time clock and the real time
1121
date modules act as wishbone slaves, and that all accesses to the registers of
1122
either module are 32--bit reads and writes.  The address bus does not offer
1123
byte level, but rather 32--bit word level resolution.  Select lines are not
1124
implemented.  Bit ordering is the normal ordering where bit~31 is the most
1125
significant bit and so forth.
1126
 
1127
\chapter{Clocks}\label{chap:clocks}
1128
 
1129
This core is based upon the Basys--3 design.  The Basys--3 development board
1130
contains one external 100~MHz clock, which is sufficient to run the ZIP CPU
1131
core.
1132
\begin{table}[htbp]
1133
\begin{center}
1134
\begin{clocklist}
1135
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
1136
\end{clocklist}
1137
\caption{List of Clocks}\label{tbl:clocks}
1138
\end{center}\end{table}
1139
I hesitate to suggest that the core can run faster than 100~MHz, since I have
1140
struggled with various timing violations to keep it at 100~MHz.  So, for
1141
now, I will only state that it can run at 100~MHz.
1142
 
1143
 
1144
\chapter{I/O Ports}\label{chap:ioports}
1145
The I/O ports for this clock are shown in Tbls.~\ref{tbl:iowishbone}
1146
\begin{table}[htbp]
1147
\begin{center}
1148
\begin{portlist}
1149
i\_clk & 1 & Input & System clock, used for time and wishbone interfaces.\\\hline
1150
i\_wb\_cyc & 1 & Input & Wishbone bus cycle wire.\\\hline
1151
i\_wb\_stb & 1 & Input & Wishbone strobe.\\\hline
1152
i\_wb\_we & 1 & Input & Wishbone write enable.\\\hline
1153
i\_wb\_addr & 5 & Input & Wishbone address.\\\hline
1154
i\_wb\_data & 32 & Input & Wishbone bus data register for use when writing
1155
        (configuring) the core from the bus.\\\hline
1156
o\_wb\_ack & 1 & Output & Return value acknowledging a wishbone write, or
1157
                signifying valid data in the case of a wishbone read request.
1158
                \\\hline
1159
o\_wb\_stall & 1 & Output & Indicates the device is not yet ready for another
1160
                wishbone access, effectively stalling the bus.\\\hline
1161
o\_wb\_data & 32 & Output & Wishbone data bus, returning data values read
1162
                from the interface.\\\hline
1163
\end{portlist}
1164
\caption{Wishbone I/O Ports}\label{tbl:iowishbone}
1165
\end{center}\end{table}
1166
and~Tbl.~\ref{tbl:ioother}.
1167
\begin{table}[htbp]
1168
\begin{center}
1169
\begin{portlist}
1170
o\_sseg & 32 & Output & Lines to control a seven segment display, to be
1171
                sent to that display's driver.  Each eight bit byte controls
1172
                one digit in the display, with the bottom bit in the byte
1173
                controlling the decimal point.\\\hline
1174
o\_led & 16 & Output & Output LED's, consisting of a 16--bit counter counting
1175
                from zero to all ones each minute, and synchronized with each
1176
                minute so as to create an indicator of when the next minute
1177
                will take place when only the hours and minutes can be
1178
                displayed.\\\hline
1179
o\_interrupt & 1 & Output & A pulsed/strobed interrupt line.  When the
1180
                clock needs to generate an interrupt, it will set this line
1181
                high for one clock cycle.  \\\hline
1182
o\_ppd & 1 & Output & A `pulse per day' signal which can be fed into the
1183
        real--time date module.  This line will be high on the clock before
1184
        the stroke of midnight, allowing the date module to turn over to the
1185
        next day at exactly the same time the clock module turns over to the
1186
        next day.\\\hline
1187
i\_hack & 1 & Input & When this line is raised, copies are made of the
1188
        internal state registers on the next clock.  These registers can then
1189
        be used for an accurate time hack regarding the state of the clock
1190
        at the time this line was strobed.\\\hline
1191
\end{portlist}
1192
\caption{Other I/O Ports}\label{tbl:ioother}
1193
\end{center}\end{table}
1194
Tbl.~\ref{tbl:iowishbone} reiterates the wishbone I/O values just discussed in
1195
Chapt.~\ref{chap:wishbone}, and so needs no further discussion here.
1196
 
1197
 
1198
% Appendices
1199
% Index
1200
\end{document}