%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Filename:    spec.tex
%%
%% Project:     Zip CPU -- a small, lightweight, RISC CPU soft core
%%
%% Purpose:     This LaTeX file contains all of the documentation/description
%%              currently provided with this Zip CPU soft core.  It supersedes
%%              any information about the instruction set or CPUs found
%%              elsewhere.  It's not nearly as interesting, though, as the PDF
%%              file it creates, so I'd recommend reading that before diving
%%              into this file.  You should be able to find the PDF file in
%%              the SVN distribution together with this PDF file and a copy of
%%              just type 'make' in the doc directory and it (should) build
%%              without a problem.
%%
%%
%% Creator:     Dan Gisselquist
%%              Gisselquist Technology, LLC
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Copyright (C) 2015, Gisselquist Technology, LLC
%%
%% This program is free software (firmware): you can redistribute it and/or
%% modify it under the terms of  the GNU General Public License as published
%% by the Free Software Foundation, either version 3 of the License, or (at
%% your option) any later version.
%%
%% This program is distributed in the hope that it will be useful, but WITHOUT
%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
%% for more details.
%%
%% You should have received a copy of the GNU General Public License along
%% with this program.  (It's in the $(ROOT)/doc directory, run make with no
%% target there if the PDF file isn't present.)  If not, see
%% <http://www.gnu.org/licenses/> for a copy.
%%
%% License:     GPL, v3, as defined and found on www.gnu.org,
%%              http://www.gnu.org/licenses/gpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{gqtekspec}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.1}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC

This project is free software (firmware): you can redistribute it and/or
modify it under the terms of  the GNU General Public License as published
by the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program.  If not, see \hbox{<http://www.gnu.org/licenses/>} for a
copy.
\end{license}
\begin{revisionhistory}
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
% \listoffigures
\listoftables
\begin{preface}
Many people have asked me why I am building the Zip CPU.  ARM processors are
good and effective.  Xilinx makes and markets Microblaze, Altera Nios, and both
have better toolsets than the Zip CPU will ever have.  OpenRISC is also
available.  Why build a new processor?

The easiest, most obvious answer is the simple one: Because I can.

There's more to it, though.  There's a lot that I would like to do with a
processor, and I want to be able to do it in a vendor independent fashion.
I would like to be able to generate Verilog code that can run equivalently
on both Xilinx and Altera chips, and that can be easily ported from one
manufacturer's chipsets to another.  Even more, before purchasing a chip or a
board, I would like to know that my chip works.  I would like to build a test
bench to test components with, and Verilator is my chosen test bench.  This
forces me to use all Verilog, and it prevents me from using any proprietary
cores.  For this reason, Microblaze and Nios are out of the question.

Why not OpenRISC?  That's a hard question.  The OpenRISC team has done some
wonderful work on an amazing processor, and I'll have to admit that I am
envious of what they've accomplished.  I would like to port binutils to the
Zip CPU, as I would like to port GCC and GDB.  They are way ahead of me.  The
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs.  It has
a lot of features of modern CPUs within it that ... well, let's just say it's
not the little guy on the block.  The Zip CPU is lighter weight, costing only
about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic
peripherals.

My final reason is that I'm building the Zip CPU as a learning experience.  The
Zip CPU has allowed me to learn a lot about how CPUs work on a very micro
level.  For the first time, I am beginning to understand many of the Computer
Architecture lessons from years ago.

To summarize: Because I can, because it is open source, because it is light
weight, and as an exercise in learning.

\end{preface}

\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}


The original goal of the ZIP CPU was to be a very simple CPU.   You might
think of it as a poor man's alternative to the OpenRISC architecture.
For this reason, all instructions have been designed to be as simple as
possible, and are all designed to be executed in one instruction cycle per
instruction, barring pipeline stalls.  Indeed, even the bus has been simplified
to a constant 32-bit width, with no option for more or less.  This has
resulted in the choice to drop push and pop instructions, pre-increment and
post-decrement addressing modes, and more.

For those who like buzz words, the Zip CPU is:
\begin{itemize}
\item A 32-bit CPU: All registers are 32 bits, addresses are 32 bits,
                instructions are 32 bits wide, etc.
\item A RISC CPU.  There is no microcode for executing instructions.
\item A Load/Store architecture.  (Only load and store instructions
                can access memory.)
\item Wishbone compliant.  All peripherals are accessed just like
                memory across this bus.
\item A Von-Neumann architecture.  (The instructions and data share a
                common bus.)
\item A pipelined architecture, having stages for {\bf Prefetch},
                {\bf Decode}, {\bf Read-Operand}, the {\bf ALU/Memory}
                unit, and {\bf Write-back}.
\item Completely open source, licensed under the GPL.\footnote{Should you
        need a copy of the Zip CPU licensed under other terms, please
        contact me.}
\end{itemize}

Now, however, that I've worked on the Zip CPU for a while, it is not nearly
as simple as I originally hoped.  Worse, I've had to adjust to create
capabilities that I was never expecting to need.  These include:
\begin{itemize}
\item {\bf External Debug:} Once placed upon an FPGA, some external means is
        still necessary to debug this CPU.
        That means that there needs to be
        an external register that can control the CPU: reset it, halt it, step
        it, and tell whether it is running or not.  A second register sits
        alongside this one, to allow the external controller to examine
        registers internal to the CPU.

\item {\bf Internal Debug:} Being able to run a debugger from within
        a user process requires an ability to step a user process from
        within a debugger.  It also requires a break instruction that can
        be substituted for any other instruction, and substituted back.
        The break is actually difficult: the break instruction cannot be
        allowed to execute.  That way, upon a break, the debugger should
        be able to jump back into the user process to step the instruction
        that would've been at the break point initially, and then to
        replace the break after passing it.

\item {\bf Prefetch Cache:} My original implementation had a very
        simple prefetch stage.  Any time the PC changed the prefetch would go
        and fetch the new instruction.  While this was perhaps the simplest
        approach, it cost roughly five clocks for every instruction.  This
        was deemed unacceptable, as I wanted a CPU that could execute
        instructions in one cycle.  I therefore have a prefetch cache that
        issues pipelined Wishbone accesses to memory and then pushes
        instructions at the CPU.  Sadly, this accounts for about 20\% of the
        logic in the entire CPU, or 15\% of the logic in the entire system.


\item {\bf Operating System:} In order to support an operating system,
        interrupts and so forth, the CPU needs to support supervisor and
        user modes, as well as a means of switching between them.  For example,
        the user needs a means of executing a system call.  This is the
        purpose of the {\bf `trap'} instruction.
        This instruction needs to
        place the CPU into supervisor mode (here equivalent to disabling
        interrupts), as well as handing it a parameter, such as one identifying
        which O/S function was called.

        My initial approach to building a trap instruction was to create
        an external peripheral which, when written to, would generate an
        interrupt and could return the last value written to it.  This failed
        timing requirements, however: the CPU executed two instructions while
        waiting for the trap interrupt to take place.  Since then, I've
        decided to keep the rest of the CC register for that purpose so that a
        write to the CC register, with the GIE bit cleared, could be used to
        execute a trap.

        Modern timesharing systems also depend upon a {\bf Timer} interrupt
        to handle task swapping.  For the Zip CPU, this interrupt is handled
        external to the CPU as part of the CPU System, found in
        {\tt zipsystem.v}.  The timer module itself is found in
        {\tt ziptimer.v}.

\item {\bf Pipeline Stalls:} My original plan was to not support pipeline
        stalls at all, but rather to require the compiler to properly schedule
        instructions so that stalls would never be necessary.  After trying
        to build such an architecture, I gave up, having learned some things:

        For example, in order to facilitate interrupt handling and debug
        stepping, the CPU needs to know what instructions have finished, and
        which have not.  In other words, it needs to know where it can restart
        the pipeline from.  Once restarted, it must act as though it had
        never stopped.  This killed my idea of delayed branching, since
        what would be the appropriate program counter to restart at?
        The one the CPU was going to branch to, or the ones in the
        delay slots?

        So I switched to a model of discrete execution: Once an instruction
        enters into either the ALU or memory unit, the instruction is
        guaranteed to complete.
        If the logic recognizes a branch or a
        condition that would render the instruction entering into this stage
        possibly inappropriate (e.g., a conditional branch preceding a store
        instruction), then the pipeline stalls for one cycle
        until the conditional branch completes.  Then, if it generates a new
        PC address, the stages preceding are all wiped clean.

        The discrete execution model allows such things as sleeping: if the
        CPU is put to ``sleep'', the ALU and memory stages stall and back up
        everything before them.  Likewise, anything that has entered the ALU
        or memory stage when the CPU is placed to sleep continues to completion.
        To handle this logic, each pipeline stage has three control signals:
        a valid signal, a stall signal, and a clock enable signal.  In
        general, a stage stalls if its contents are valid and the next step
        is stalled.  This allows the pipeline to fill any time a later stage
        stalls.

\item {\bf Verilog Modules:} When examining how other processors worked
        here on OpenCores, many of them had one separate module per pipeline
        stage.  While this appeared to me to be a fascinating and commendable
        idea, my own implementation didn't work out quite so nicely.

        As an example, the decode module produces a {\em lot} of
        control wires and registers.  Creating a module out of this, with
        only the simplest of logic within it, seemed to be more a lesson
        in passing wires around, rather than encapsulating logic.

        Another example was the register writeback section.  I would love
        this section to be a module in its own right, and many have made them
        such.  However, other modules depend upon writeback results other
        than just what's placed in the register (i.e., the control wires).
        For these reasons, I didn't manage to fit this section into its
        own module.

        The result is that the majority of the CPU code can be found in
        the {\tt zipcpu.v} file.
\end{itemize}

With that introduction out of the way, let's move on to the instruction
set.

\chapter{CPU Architecture}\label{chap:arch}

The Zip CPU supports a set of two operand instructions, where the first operand
(always a register) is the result.  The only exception is the store instruction,
where the first operand (always a register) is the source of the data to be
stored.

\section{Register Set}
The Zip CPU supports two sets of sixteen 32-bit registers, a supervisor
and a user set.  The supervisor set is used in interrupt mode, whereas
the user set is used otherwise.  Of this register set, the Program Counter (PC)
is register 15, whereas the status register (SR) or condition code register
(CC) is register 14.  By convention, the stack pointer will be register 13 and
noted as (SP)--although the instruction set allows it to be anything.
The CPU can access both register sets via move instructions from the
supervisor state, whereas the user state can only access the user registers.

The status register is special, and bears further mention.  The lower
8 bits of the status register form a set of condition codes.  Writes to other
bits are preserved, and can be used as part of the trap architecture--examined
by the O/S upon any interrupt, cleared before returning.

Of the eight condition codes, the bottom four are the current flags:
                Zero (Z),
                Carry (C),
                Negative (N),
                and Overflow (V).

The next bit is a clock enable (0 to enable) or sleep bit (1 to put
        the CPU to sleep).  Setting this bit will cause the CPU to
        wait for an interrupt (if interrupts are enabled), or to
        completely halt (if interrupts are disabled).
The sixth bit is a global interrupt enable bit (GIE).
        When this
        sixth bit is a `1' interrupts will be enabled, else disabled.  When
        interrupts are disabled, the CPU will be in supervisor mode, otherwise
        it is in user mode.  Thus, to execute a context switch, one only
        need enable or disable interrupts.  (When an interrupt line goes
        high, interrupts will automatically be disabled, as the CPU goes
        and deals with its context switch.)

The seventh bit is a step bit.  This bit can be
        set from supervisor mode only.  After setting this bit, should
        the supervisor mode process switch to user mode, it would then
        accomplish one instruction in user mode before returning to supervisor
        mode.  Then, upon return to supervisor mode, this bit will
        be automatically cleared.  This bit has no effect on the CPU while in
        supervisor mode.

        This functionality was added to allow a userspace debugger
        to step a user process, working through supervisor mode
        of course.


The eighth bit is a break enable bit.  This
        controls whether a break instruction will halt the processor for an
        external debugger (break enabled), or whether the break instruction
        will simply set the STEP bit and send the CPU into interrupt mode.
        This bit can only be set within supervisor mode.

        This functionality was added to enable an external debugger to
        set and manage breakpoints.

The ninth bit is reserved for a floating point enable bit.  When set, the
arithmetic for the next instruction will be sent to a floating point unit.
Such a unit may later be added as an extension to the Zip CPU.  If the
CPU does not support floating point instructions, this bit will never be set.

The tenth bit is a trap bit.  It is set whenever the user requests a soft
interrupt, and cleared on any return to userspace command.
This allows the
supervisor, in supervisor mode, to determine whether it got to supervisor
mode from a trap or from an external interrupt or both.

The status register bits are shown below:
\begin{table}
\begin{center}
\begin{tabular}{l|l}
Bit & Meaning \\\hline
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
8 & (Reserved for) Floating point enable \\\hline
7 & Halt on break, to support an external debugger \\\hline
6 & Step, single step the CPU in user mode\\\hline
5 & GIE, or Global Interrupt Enable \\\hline
4 & Sleep \\\hline
3 & V, or overflow bit.\\\hline
2 & N, or negative bit.\\\hline
1 & C, or carry bit.\\\hline
0 & Z, or zero bit.\\\hline
\end{tabular}
\end{center}
\end{table}
\section{Conditional Instructions}
Most, although not quite all, instructions are conditionally executed.  From
the four condition code flags, eight conditions are defined.  These are shown
in Tbl.~\ref{tbl:conditions}.
\begin{table}
\begin{center}
\begin{tabular}{l|l|l}
Code & Mnemonic & Condition \\\hline
3'h0 & None & Always execute the instruction \\
3'h1 & {\tt .Z} & Only execute when `Z' is set \\
3'h2 & {\tt .NE} & Only execute when `Z' is not set \\
3'h3 & {\tt .GE} & Greater than or equal (`N' not set, `Z' irrelevant) \\
3'h4 & {\tt .GT} & Greater than (`N' not set, `Z' not set) \\
3'h5 & {\tt .LT} & Less than (`N' set) \\
3'h6 & {\tt .C} & Carry set\\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\end{center}
\end{table}
There is no condition code for less than or equal, not C or not V.  Using
these conditions will take an extra instruction.
(Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})

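As a rough illustration of the condition table, the following Python sketch
evaluates a 3-bit condition code against the four flag bits.  It is only a
model, not taken from the CPU sources: the function name
{\tt condition\_met} is hypothetical, the Z flag is assumed to sit in bit 0
(the bit left over once C, N and V occupy bits 1 through 3), and {\tt .LT} is
assumed to mean `N' set, the complement of {\tt .GE}.

```python
# Flag masks in the low bits of the CC register.  Bits 1..3 follow the
# status register table; bit 0 = Z is an assumption (the one remaining bit).
CC_Z, CC_C, CC_N, CC_V = 0x1, 0x2, 0x4, 0x8


def condition_met(cond, cc):
    """Return True if the 3-bit condition `cond` holds for flag word `cc`."""
    z = bool(cc & CC_Z)
    c = bool(cc & CC_C)
    n = bool(cc & CC_N)
    v = bool(cc & CC_V)
    outcomes = {
        0: True,               # no suffix: always execute
        1: z,                  # .Z
        2: not z,              # .NE
        3: not n,              # .GE
        4: (not n) and not z,  # .GT
        5: n,                  # .LT (assumed: N set, complement of .GE)
        6: c,                  # .C
        7: v,                  # .V
    }
    return outcomes[cond & 0x7]
```

A conditional instruction whose test comes back false would simply not take
effect; the conditions that have no code of their own (less than or equal,
not C, not V) need an extra instruction to set up flags first, as the
example above shows.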
\section{Operand B}
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
This Operand B is either equal to a register plus a signed immediate offset,
or an immediate offset by itself.  This value is encoded as shown in
Tbl.~\ref{tbl:opb}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Bit 20 & 19 \ldots 16 & 15 \ldots 0 \\\hline
1'b0 & \multicolumn{2}{l|}{Signed Immediate value} \\\hline
1'b1 & 4-bit Register & 16-bit Signed immediate offset \\\hline
\end{tabular}
\caption{Bit allocation for Operand B}\label{tbl:opb}
\end{center}\end{table}

The ZIP CPU supports two addressing modes: register plus immediate, and
immediate alone.  Both are encoded as
Operand B's, shown above.

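The Operand B layout can be unpacked as in the Python sketch below.  This is
a model under assumptions rather than the CPU's own decoder: the helper names
{\tt sign\_extend} and {\tt decode\_operand\_b} are hypothetical, and when
bit 20 is clear the immediate is assumed to fill all of bits 19 through 0 as
a 20-bit two's complement value.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_operand_b(field):
    """Split a 21-bit Operand B into (register, immediate).

    register is None for the immediate-only form (bit 20 clear).
    """
    field &= 0x1FFFFF
    if field & (1 << 20):
        # Register form: bits 19..16 pick the register, bits 15..0 the offset.
        return (field >> 16) & 0xF, sign_extend(field, 16)
    # Immediate-only form: assumed to be a 20-bit signed value.
    return None, sign_extend(field, 20)
```

For example, the field {\tt 0x1DFFFF} has bit 20 set, register number 13 and
an offset of $-1$, which under the SP convention would read as an offset of
$-1$ from the stack pointer.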
A lot of long hard thought was put into whether to allow pre/post increment
and decrement addressing modes.  Finding no way to use these operators without
taking two or more clocks per instruction, these addressing modes have been
removed from the realm of possibilities.  This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.

\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
the CPU needs access to non-supervisory registers while in supervisory
mode.  Therefore, the MOV instruction is special and offers access
to these registers ... when in supervisory mode.  To keep the compiler
simple, the extra bits are ignored in non-supervisory mode (as though
they didn't exist), rather than being mapped to new instructions or
additional capabilities.  The bits indicating which register set each
register lies within are the A-Usr and B-Usr bits.  When set to a one,
these refer to a user mode register.  When set to a zero, these refer
to a register in the current mode, whether user or supervisor.
Further, because
a load immediate instruction exists, there is no move capability between
an immediate and a register: all moves come from either a register or
a register plus an offset.

This actually leads to a bit of a problem: since the MOV instruction
encodes which register set each register is coming from or moving to,
how shall a compiler or assembler know how to compile a MOV instruction
without knowing the mode of the CPU at the time?  For this reason,
the compiler will assume all MOV registers are supervisor registers,
and display them as normal.  Anything with the user bit set will
be treated as a user register.  The CPU will quietly ignore the
supervisor bits while in user mode, and anything marked as a user
register will always be valid.

\section{Multiply Operations}
While the Zip CPU instruction set supports multiply operations, they are not
yet fully supported by the CPU.  Two Multiply operations are supported, a
16x16 bit signed multiply (MPYS) and the same but unsigned (MPYU).  In both
cases, the operand is a register plus a 16-bit immediate, subject to the
rule that the register cannot be the PC or CC registers.  The PC register
field has been stolen to create a multiply by immediate instruction.  The
CC register field is reserved.

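The difference between the two multiplies can be modelled in a few lines of
Python.  This sketch rests on assumptions the text does not spell out: that
each instruction uses only the low 16 bits of its two inputs and that the
product is written back as a 32-bit value.  The function names are
hypothetical stand-ins for the MPYU and MPYS operations.

```python
def to_signed16(value):
    """Interpret the low 16 bits of `value` as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mpyu(a, b):
    """Unsigned 16x16 multiply; assumed to use only the low 16 bits."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF


def mpys(a, b):
    """Signed 16x16 multiply; the result is returned as a 32-bit pattern."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF
```

With both inputs equal to {\tt 0xFFFF}, the unsigned form gives
{\tt 0xFFFE0001} while the signed form treats each input as $-1$ and gives 1.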
\section{Floating Point}
The ZIP CPU does not support floating point operations today.  However, the
instruction set reserves a capability for a floating point operation.  To
execute such an operation, simply set the floating point bit in the CC
register and the following instruction will be executed as
a floating point instruction.  Not all instructions make sense as
floating point operations, however: only the CMP, SUB, ADD, and MPY
instructions may be issued as floating point instructions.  Further, the
immediate fields do not apply in floating
point mode, and must be set to zero.  Other instructions
allow the examining of the floating point bit in the CC register.  In all
cases, the floating point bit is cleared one instruction after it is set.

The architecture does not support a floating point not-implemented interrupt.
Any soft floating point emulation must be done deliberately.

\section{Native Instructions}
The instruction set for the Zip CPU is summarized in
Tbl.~\ref{tbl:zip-instructions}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|c|}\hline
Op Code & \multicolumn{8}{c|}{31\ldots24} & \multicolumn{8}{c|}{23\ldots 16}
        & \multicolumn{8}{c|}{15\ldots 8} & \multicolumn{8}{c|}{7\ldots 0}
        & Sets CC? \\\hline
CMP(Sub) & \multicolumn{4}{l|}{4'h0}
        & \multicolumn{4}{l|}{D. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
BTST(And) & \multicolumn{4}{l|}{4'h1}
        & \multicolumn{4}{l|}{D. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
MOV & \multicolumn{4}{l|}{4'h2}
        & \multicolumn{4}{l|}{D. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & A-Usr
        & \multicolumn{4}{l|}{B-Reg}
        & B-Usr
        & \multicolumn{15}{l|}{15-bit signed offset}
        & \\\hline
LODI & \multicolumn{4}{l|}{4'h3}
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{24}{l|}{24-bit Signed Immediate}
        & \\\hline
NOOP & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{4'he}
        & \multicolumn{24}{l|}{24'h00}
        & \\\hline
BREAK & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{4'he}
        & \multicolumn{24}{l|}{24'h01}
        & \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{4'he}
        & \multicolumn{24}{l|}{24 bits, but not 0 or 1.}
        & \\\hline
LODIHI & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{4'hf}
        & \multicolumn{3}{l|}{Cond.}
        & 1'b1
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{16}{l|}{16-bit Immediate}
        & \\\hline
LODILO & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{4'hf}
        & \multicolumn{3}{l|}{Cond.}
        & 1'b0
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{16}{l|}{16-bit Immediate}
        & \\\hline
16-b MPYU & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & 1'b0 & \multicolumn{4}{l|}{Reg}
        & \multicolumn{16}{l|}{16-bit Offset}
        & Yes \\\hline
16-b MPYU(I) & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & 1'b0 & \multicolumn{4}{l|}{4'hf}
        & \multicolumn{16}{l|}{16-bit Offset}
        & Yes \\\hline
16-b MPYS & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & 1'b1 & \multicolumn{4}{l|}{Reg}
        & \multicolumn{16}{l|}{16-bit Offset}
        & Yes \\\hline
16-b MPYS(I) & \multicolumn{4}{l|}{4'h4}
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & 1'b1 & \multicolumn{4}{l|}{4'hf}
        & \multicolumn{16}{l|}{16-bit Offset}
        & Yes \\\hline
ROL & \multicolumn{4}{l|}{4'h5}
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & \multicolumn{21}{l|}{Operand B, truncated to low order 5 bits}
        & \\\hline
LOD & \multicolumn{4}{l|}{4'h6}
        & \multicolumn{4}{l|}{R. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & \multicolumn{21}{l|}{Operand B address}
        & \\\hline
STO & \multicolumn{4}{l|}{4'h7}
        & \multicolumn{4}{l|}{D. Reg}
        & \multicolumn{3}{l|}{Cond.}
        & \multicolumn{21}{l|}{Operand B address}
        & \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h8}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        & 1'b0
        &       \multicolumn{20}{l|}{Reserved}
        & Yes \\\hline
SUB & \multicolumn{4}{l|}{4'h8}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        & 1'b1
        &       \multicolumn{4}{l|}{Reg}
        &       \multicolumn{16}{l|}{16-bit signed offset}
        & Yes \\\hline
AND & \multicolumn{4}{l|}{4'h9}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
ADD & \multicolumn{4}{l|}{4'ha}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
OR & \multicolumn{4}{l|}{4'hb}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
XOR & \multicolumn{4}{l|}{4'hc}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
LSL/ASL & \multicolumn{4}{l|}{4'hd}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
ASR & \multicolumn{4}{l|}{4'he}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
LSR & \multicolumn{4}{l|}{4'hf}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
\end{tabular}
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
\end{center}\end{table}

As you can see, there's lots of room for instruction set expansion.  The
NOOP and BREAK instructions leave 24~bits of open instruction address
space, minus the two instructions NOOP and BREAK.  The Subtract leaves half
of its space open, since a subtract immediate is the same as an add with a
negated immediate.

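For the instructions that follow the common layout in
Tbl.~\ref{tbl:zip-instructions} (a 4-bit op code, a 4-bit register, a 3-bit
condition and a 21-bit Operand B), the fields can be pulled out of a 32-bit
word as in the Python sketch below.  The function name
{\tt split\_instruction} and the dictionary keys are hypothetical, and the
sketch assumes the fields are packed from bit 31 downward in the order the
table lists them.  Instructions with other layouts, such as MOV or LODI,
would need their own field splits.

```python
def split_instruction(word):
    """Split a 32-bit instruction word into the common-layout fields."""
    word &= 0xFFFFFFFF
    return {
        "opcode": (word >> 28) & 0xF,   # bits 31..28
        "reg": (word >> 24) & 0xF,      # bits 27..24
        "cond": (word >> 21) & 0x7,     # bits 23..21
        "operand_b": word & 0x1FFFFF,   # bits 20..0
    }


# Example: op code 4'h0 (CMP), register 3, condition 3'h1 (.Z),
# Operand B = immediate 5.
example = (0x0 << 28) | (3 << 24) | (1 << 21) | 5
fields = split_instruction(example)
```

Here {\tt example} works out to {\tt 0x03200005}, and splitting it returns
the four values that went into it.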
\section{Derived Instructions}
The ZIP CPU supports many other common instructions, but not all of them
are single instructions.  The derived instruction tables,
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
help to capture some of how these other instructions may be implemented on
the ZIP CPU.  Many of these instructions will have assembly equivalents,
such as the branch instructions, to facilitate working with the CPU.
\begin{table}\begin{center}
614
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
615
Mapped & Actual  & Notes \\\hline
616
617
& \parbox[t]{1.5in}{Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry} 618 & Add with carry \\\hline 619 BRA.Cond +/-\$Addr
620
& Mov.cond \$Addr+PC,PC 621 & Branch or jump on condition. Works for 14 bit 622 address offsets.\\\hline 623 BRA.Cond +/-\$Addr
624
& \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC} 625 & Branch/jump on condition. Works for 626 23 bit address offsets, but costs a register, an extra instruction, 627 and setsthe flags. \\\hline 628 BNC PC+\$Addr
629
& \parbox[t]{1.5in}{Test \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
630
& Example of a branch on an unsupported
631
condition, in this case a branch on not carry \\\hline
632
BUSY & MOV \$-1(PC),PC & Execute an infinite loop \\\hline 633 CLRF.NZ Rx 634 & XOR.NZ Rx,Rx 635 & Clear Rx, and flags, if the Z-bit is not set \\\hline 636 CLR Rx 637 & LDI \$0,Rx
638
& Clears Rx, leaves flags untouched.  This instruction cannot be
639
conditional. \\\hline
640
EXCH.W Rx
641
& ROL \$16,Rx 642 & Exchanges the top and bottom 16'bit words of Rx \\\hline 643 HALT 644 & Or \$SLEEP,CC
645
& Executed while in interrupt mode.  In user mode this is simply a
646
wait until interrupt instructioon. \\\hline
647
INT & LDI \$0,CC 648 & Since we're using the CC register as a trap vector as well, this 649 executes TRAP \#0. \\\hline 650 IRET 651 & OR \$GIE,CC
652
653
JMP R6+\$Addr 654 & MOV \$Addr(R6),PC
655
& \\\hline
656
JSR PC+\$Addr 657 & \parbox[t]{1.5in}{SUB \$1,SP \\\
658
MOV \$3+PC,R0 \\ 659 STO R0,1(SP) \\ 660 MOV \$Addr+PC,PC \\
661
ADD \$1,SP} 662 & Jump to Subroutine. \\\hline 663 JSR PC+\$Addr
664
& \parbox[t]{1.5in}{MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
665
&This is the high speed
666
version of a subroutine call, necessitating a register to hold the
667
last PC address.  In its favor, this method doesn't suffer the
668
mandatory memory access of the other approach. \\\hline
669
LDI.l \$val,Rx 670 & \parbox[t]{1.5in}{LDIHI (\$val$>>$16)\&0x0ffff, Rx \\
671
LDILO (\$val \& 0x0ffff)} 672 & Sadly, there's not enough instruction 673 space to load a complete immediate value into any register. 674 Therefore, fully loading any register takes two cycles. 675 The LDIHI (load immediate high) and LDILO (load immediate low) 676 instructions have been created to facilitate this. \\\hline 677 \end{tabular} 678 \caption{Derived Instructions}\label{tbl:derived-1} 679 \end{center}\end{table} 680 \begin{table}\begin{center} 681 \begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline 682 Mapped & Actual & Notes \\\hline 683 LOD.b \$addr,Rx
	& \parbox[t]{1.5in}{%
	LDI	\$addr,Ra \\
	LDI	\$addr,Rb \\
	LSR	\$2,Ra \\
	AND	\$3,Rb \\
	LOD	(Ra),Rx \\
	LSL	\$3,Rb \\
	SUB	\$32,Rb \\
	ROL	Rb,Rx \\
	AND \$0ffh,Rx}
	& \parbox[t]{3in}{This CPU is designed for 32--bit word
	length instructions.  Byte addressing is not supported by the CPU or
	the bus, so byte accesses take more work.

	Note also that in this example, \$Addr is a byte-wise address, where
	we needed to drop the bottom two bits.  This also limits the address
	space of character accesses using this method from 16~MB down to 4~MB.}
	\\\hline
\parbox[t]{1.5in}{LSL \$1,Rx\\ LSLC \$1,Ry}
	& \parbox[t]{1.5in}{LSL \$1,Ry \\
	LSL \$1,Rx \\
	OR.C \$1,Ry}
	& Logical shift left with carry.  Note that the
	instruction order is now backwards, to keep the conditions valid.
	That is, LSL sets the carry flag, so if we did this the other way
	with Rx before Ry, then the condition flag wouldn't have been right
	for an OR correction at the end. \\\hline
\parbox[t]{1.5in}{LSR \$1,Rx \\ LSRC \$1,Ry}
	& \parbox[t]{1.5in}{CLR Rz \\
	LSR \$1,Ry \\
	LDIHI.C \$8000h,Rz \\
	LSR \$1,Rx \\
	OR Rz,Rx}
	& Logical shift right with carry \\\hline
NEG Rx & \parbox[t]{1.5in}{XOR \$-1,Rx \\ ADD \$1,Rx} & \\\hline
NOOP & NOOP & While there are many
	operations that do nothing, such as MOV Rx,Rx, or OR \$0,Rx, these
	operations have consequences in that they might stall the bus if
	Rx isn't ready yet.  For this reason, we have a dedicated NOOP
	instruction. \\\hline
NOT Rx & XOR \$-1,Rx & \\\hline
POP Rx
	& \parbox[t]{1.5in}{LOD \$-1(SP),Rx \\ ADD \$1,SP}
	& Note
	that for interrupt purposes, one can never depend upon the value at
	(SP).  Hence you read from it, then increment it, lest having
	incremented it first something then comes along and writes to that
	value before you can read the result. \\\hline
PUSH Rx
	& \parbox[t]{1.5in}{SUB \$1,SP \\
	STO Rx,\$1(SP)}
	& \\\hline
RESET
	& \parbox[t]{1in}{STO \$1,\$watchdog(R12)\\NOOP\\NOOP}
	& \parbox[t]{3in}{This depends upon the peripheral base address being
	in R12.

	supervisor mode.}\\\hline
RET & \parbox[t]{1.5in}{LOD \$-1(SP),R0 \\
	MOV \$-1+SP,SP \\
	MOV R0,PC}
	& An alternative might be to LOD \$-1(SP),PC, followed
	by depending upon the calling program to ADD \$1,SP. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-2}
\end{center}\end{table}
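The shift-left-with-carry sequence in Tbl.~\ref{tbl:derived-2} can be checked
with a short software model.  What follows is only a minimal Python sketch, not
the CPU's implementation: the helper names {\tt lsl1} and {\tt lslc\_pair} are
hypothetical, registers are modeled as 32--bit unsigned integers, and the carry
flag is assumed to hold the bit shifted out of the top of the register.

```python
MASK32 = 0xFFFFFFFF


def lsl1(value):
    """Model LSL $1: return (shifted 32-bit value, carry-out bit)."""
    carry = (value >> 31) & 1          # assumed: carry = bit shifted out
    return (value << 1) & MASK32, carry


def lslc_pair(rx, ry):
    """Shift the 64-bit quantity {Ry:Rx} left by one bit.

    Mirrors the three-instruction sequence from the table; Rx is the
    low word and Ry the high word.
    """
    ry, _ = lsl1(ry)                   # LSL $1,Ry
    rx, carry = lsl1(rx)               # LSL $1,Rx   (sets the carry flag)
    if carry:                          # OR.C $1,Ry  (only if carry is set)
        ry |= 1
    return rx, ry
```

Shifting Ry first matters: were Rx shifted first, the following shift of Ry
would overwrite the carry before the conditional OR could use it.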
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
RET & MOV R12,PC
	& This is the high(er) speed version, that doesn't touch the stack.
	As such, it doesn't suffer a stall on memory read/write to the stack.
	\\\hline
STEP Rr,Rt
	& \parbox[t]{1.5in}{LSR \$1,Rr \\ XOR.C Rt,Rr}
	& Step a Galois implementation of a Linear Feedback Shift Register, Rr,
	using taps Rt \\\hline
STO.b Rx,\$addr
	& \parbox[t]{1.5in}{%
	LDI \$addr,Ra \\
	LDI \$addr,Rb \\
	LSR \$2,Ra \\
	AND \$3,Rb \\
	SUB \$32,Rb \\
	LOD (Ra),Ry \\
	AND \$0ffh,Rx \\
	AND \$-0ffh,Ry \\
	ROL Rb,Rx \\
	OR Rx,Ry \\
	STO Ry,(Ra) }
	& \parbox[t]{3in}{This CPU and its bus are {\em not} optimized
	for byte-wise operations.

	Note that in this example, \$addr is a
	byte-wise address, whereas in all of our other examples it is a
	word address.  This also limits the address space
	of character accesses from 16~MB down to 4~MB.
	Further, this instruction implies a byte ordering,
	such as big or little endian.} \\\hline
SWAP Rx,Ry
	& \parbox[t]{1.5in}{
	XOR Ry,Rx \\
	XOR Rx,Ry \\
	XOR Ry,Rx}
	& While no extra registers are needed, this example
	does take 3 clocks. \\\hline
TRAP \#X
	& LDILO \$x,CC
	& This approach uses the unused bits of the CC register as a TRAP
	address.  If these bits are zero, no trap has occurred.  Unlike my
	previous approach, which was to use a trap peripheral, this approach
	has no delay associated with it.  To work, the supervisor will need
	to clear this register following any trap, and the user will need to
	be careful to only set this register prior to a trap condition.
	Likewise, when setting this value, the user will need to make certain
	that the SLEEP and GIE bits are not set in \$x.  LDI would also work,
	however using LDILO permits the use of conditional traps.  (i.e.,
	trap if the zero flag is set.)  Should you wish to trap off of a
	register value, you could equivalently load \$x into the register and
	then MOV it into the CC register. \\\hline
TST Rx
	& TST \$-1,Rx
	& Set the condition codes based upon Rx.  Could also do a CMP \$0,Rx,
	ADD \$0,Rx, SUB \$0,Rx, etc, AND \$-1,Rx, etc.  The TST and CMP
	approaches won't stall future pipeline stages looking for the value
	of Rx. \\\hline
WAIT
	& OR \$SLEEP,CC
	& Wait 'til interrupt.  In an interrupts disabled context, this
	becomes a HALT instruction. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-3}
\end{center}\end{table}
\iffalse
\fi
\section{Pipeline Stages}
\begin{enumerate}
\item {\bf Prefetch}: Read instruction from memory (cache if possible).  This
	stage is actually pipelined itself, and so it will stall if the PC
	ever changes.  Stalls are also created here if the instruction isn't
	in the prefetch cache.
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
	immediate offset.
\item {\bf Read Operands}: Read registers and apply any immediate values to
	them.  This stage will stall if any source operand is pending.
	A proper optimizing compiler, therefore, will schedule an instruction
	between the instruction that produces the result and the instruction
	that uses it.
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
	instruction, and the {\bf MemOps} stage which accomplishes memory
	read/write.
	\begin{itemize}
	\item Loads stall instructions that access the register until it is
		written to the register set.
	\item Condition codes are available upon completion.
	\item Issuing an instruction to the memory while the memory is busy will
		stall the bus.  If the bus deadlocks, only a reset will
		release the CPU.  (Watchdog timer, anyone?)
	\end{itemize}
\item {\bf Write-Back}: Conditionally write back the result to register set,
	applying the condition.  This routine is bi-re-entrant: either the
	memory or the simple instruction may request a register write.
\end{enumerate}

\section{Pipeline Logic}
How the CPU handles some instruction combinations can be telling when
determining what happens in the pipeline.
The following lists some examples:
\begin{itemize}
\item {\bf Delayed Branching}

	I had originally hoped to implement delayed branching.  However, what
	happens in debug mode?
	That is, what happens when a debugger tries to single step an
	instruction?  While I can easily single step the computer in either
	user or supervisor mode from externally, this processor does not appear
	able to step the CPU in user mode from within user mode--gosh, not even
	from within supervisor mode--such as if a process had a debugger
	attached.  As the processor exists, I would have one result stepping
	the CPU from a debugger, and another stepping it externally.

	This is unacceptable, and so this CPU does not support delayed
	branching.

\item {\bf Register Result:} {\tt MOV R0,R1; MOV R1,R2 }

	What value does
	R2 get, the value of R1 before the first move or the value of R0?
	Placing the value of R0 into R1 requires a pipeline stall, and possibly
	two, as I have the pipeline designed.

	The ZIP CPU architecture requires that R2 must equal R0 at the end of
	this operation.  This may stall the pipeline 1--2 cycles.

\item {\bf Condition Codes Result:} {\tt CMP R0,R1; MOV.EQ \$x,PC}

	At issue is the same item as above, save that the CMP instruction
	updates the flags that the MOV instruction depends
	upon.

	The Zip CPU architecture requires that condition codes must be updated
	and available immediately for the next instruction without stalling the
	pipeline.

\item {\bf Condition Codes Register Result:} {\tt CMP R0,R1; MOV CC,R2}

	At issue is the
	fact that the logic supporting the CC register is more complicated than
	the logic supporting any other register.

	The ZIP CPU will stall 1--2 cycles on this instruction, until the
	CC register is valid.

\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}

	At issue is whether or not the instruction following the jump will
	take place before the jump.  In other words, is the MOV to the PC
	register handled differently from an ADD to the PC register?

	In the Zip architecture, MOV'es and ADD's use the same logic
	(simplifies the logic).
\end{itemize}

As I've studied this, I find several approaches to handling pipeline
issues.  These approaches (and their consequences) are listed below.

\begin{itemize}
\item {\bf All issued instructions complete, Stages stall individually}

	What about a slow pre-fetch?

	Nominally, this works well: any issued instruction
	just runs to completion.  If there are four issued instructions in the
	pipeline, with the writeback instruction being a write-to-PC
	instruction, the other three instructions naturally finish.

	This approach fails when reading instructions from the flash,
	since such reads require N clocks to complete.  Thus
	there may be only one instruction in the pipeline if reading from flash,
	or a full pipeline if reading from cache.  Each of these cases
	would produce a different result.

\item {\bf Issued instructions may be canceled}

	Stages stall individually

	First problem:
	Memory operations cannot be canceled, even reads may have side effects
	on peripherals that cannot be canceled later.  Further, in the case of
	an interrupt, it's difficult to know what to cancel.  What happens in
	a \hbox{\tt MOV.C \$x,PC} followed by a \hbox{\tt MOV \$y,PC}
	instruction?  Which gets
	canceled?

	Because it isn't clear what would need to be canceled,
	this instruction combination is not recommended.

\item {\bf All issued instructions complete.}

	All stages are filled, or the entire pipeline
	stalls.

	What about debug control?
	What about
	register writes taking an extra clock stage?  MOV R0,R1; MOV R1,R2
	should place the value of R0 into R2.  How do you restart the pipeline
	after an interrupt?  What address do you use?  The last issued
	instruction?  But the branch delay slots may make that invalid!

	Reading from the CPU debug port in this case yields inconsistent
	results: the CPU will halt or step with instructions stuck in the
	pipeline.  Reading registers will give no indication of what is going
	on in the pipeline, just the results of completed operations, not of
	operations that have been started and not yet completed.
	Perhaps we should just report the state of the CPU based upon what
	instructions (PC values) have successfully completed?  Thus the
	debug instruction is the one that will write registers on the next
	clock.

	Suggestion: Suppose we load extra information in the two
	CC register(s) for debugging intermediate pipeline stages?

	The next problem, though, is how to deal with the read operand
	pipeline stage needing the result from the register pipeline.

\item {\bf Memory instructions must complete}

	All instructions that enter into the memory module {\em must}
	complete.  Issued instructions from the prefetch, decode, or operand
	read stages may or may not complete.  Jumps into code must be valid,
	so that interrupt returns may be valid.  All instructions entering the
	ALU complete.

	This looks to be the simplest approach.
	While the logic may be difficult, this appears to be the only
	re-entrant approach.

	A {\tt new\_pc} flag will be high anytime the PC changes in an
	unpredictable way (i.e., it doesn't increment).  This includes jumps
	as well as interrupts and interrupt returns.  Whenever this flag may
	go high, memory operations and ALU operations will stall until the
	result is known.
	When the flag does go high, anything in the prefetch,
	decode, and read-operand stages will be invalidated.

\end{itemize}



\chapter{Peripherals}\label{chap:periph}
\section{Interrupt Controller}
\section{Counter}

The Zip Counter is a very simple counter: it just counts.  It cannot be
halted.  When it rolls over, it issues an interrupt.  Writing a value to the
counter just sets the current value, and it starts counting again from that
value.

Eight counters are implemented in the Zip System for process accounting.
This may change in the future, as nothing as yet uses these counters.

\section{Timer}

The Zip Timer is also very simple: it simply counts down to zero.  When it
transitions from a one to a zero it creates an interrupt.

Writing any non-zero value to the timer starts the timer.  If the high order
bit is set when writing to the timer, the timer becomes an interval timer and
reloads its last start time on any interrupt.  Hence, to mark seconds, one
might set the timer to 100~million (the number of clocks per second), and
set the high bit.  Ever after, the timer will interrupt the CPU once per
second (assuming a 100~MHz clock).

\section{Watchdog Timer}

The watchdog timer is no different from any of the other timers, save for one
critical difference: the interrupt line from the watchdog
timer is tied to the reset line of the CPU.  Hence writing a `1' to the
watchdog timer will always reset the CPU.
To stop the Watchdog timer, write a `0' to it.  To start it,
write any other number to it, as with the other timers.

While the watchdog timer supports interval mode, it doesn't make as much sense
as it did with the other timers.
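
The timer behavior described above can be illustrated with a small software
model.  This is only a sketch under stated assumptions, not the hardware: the
class name {\tt ZipTimer} is hypothetical, the register is taken to be 32~bits
wide with bit~31 as the interval flag and the low 31~bits as the count, and an
interval timer is assumed to reload on the same clock that it interrupts.

```python
INTERVAL_BIT = 1 << 31                 # assumed: high-order bit of a 32-bit write


class ZipTimer:
    """Behavioral model of a count-down timer with an optional interval mode."""

    def __init__(self):
        self.value = 0                 # current count; zero means stopped
        self.reload = 0                # last start value, used in interval mode
        self.interval = False

    def write(self, word):
        """Writing a non-zero count starts the timer; zero stops it."""
        self.interval = bool(word & INTERVAL_BIT)
        self.value = word & (INTERVAL_BIT - 1)
        self.reload = self.value

    def tick(self):
        """Advance one clock; return True when an interrupt is raised."""
        if self.value == 0:
            return False               # stopped: nothing to count
        self.value -= 1
        if self.value == 0:            # the one-to-zero transition
            if self.interval:
                self.value = self.reload
            return True
        return False
```

Under these assumptions, writing {\tt INTERVAL\_BIT | 100000000} to the model
yields one interrupt every 100~million calls to {\tt tick()}, matching the
once-per-second example above at 100~MHz.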

\section{Jiffies}

This peripheral is motivated by the Linux use of `jiffies' whereby a process
can request to be put to sleep until a certain number of `jiffies' have
elapsed.  Using this interface, the CPU can read the number of `jiffies'
from the peripheral (it only has the one location in address space), add the
sleep length to it, and write the result back to the peripheral.  The zipjiffies
peripheral will record the value written to it only if it is nearer the current
counter value than the currently waiting interrupt time.  If no other
interrupts are waiting, and this time is in the future, it will be enabled.
(There is currently no way to disable a jiffie interrupt once set, other
than to disable the register in the interrupt controller.)  The processor
may then place this sleep request into a list among other sleep requests.
Once the timer expires, it would write the next Jiffy request to the peripheral
and wake up the process whose timer had expired.

Indeed, the Jiffies register is nothing more than a glorified counter with
an interrupt.  Unlike the other counters, the Jiffies register cannot be set.
Writes to the jiffies register create an interrupt time.  When the Jiffies
register later equals the value written to it, an interrupt will be asserted
and the register then continues counting as though no interrupt had taken
place.

The purpose of this register is to support alarm times within a CPU.  To
set an alarm for a particular process $N$ clocks in advance, read the current
Jiffies value, add $N$, and write it back to the Jiffies register.  The
O/S must also keep track of values written to the Jiffies register.  Thus,
when an `alarm' trips, it should be removed from the list of alarms, the list
should be sorted, and the next alarm in terms of Jiffies should be written
to the register.
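
The O/S bookkeeping described above might be sketched as follows.  This is an
illustration only: the class {\tt JiffiesAlarms} and the two callables
{\tt read\_jiffies} and {\tt write\_jiffies} are hypothetical stand-ins for
whatever access the O/S has to the peripheral, and counter wrap-around is
ignored for simplicity.  A heap is used in place of re-sorting a list.

```python
import heapq


class JiffiesAlarms:
    """O/S-side list of alarms multiplexed onto the one Jiffies register."""

    def __init__(self, read_jiffies, write_jiffies):
        self.read_jiffies = read_jiffies     # hypothetical: returns current count
        self.write_jiffies = write_jiffies   # hypothetical: requests an interrupt time
        self.pending = []                    # min-heap of (wake_time, process)

    def sleep(self, process, n):
        """Request that `process` be woken n jiffies from now."""
        wake = self.read_jiffies() + n       # read the counter, add N ...
        heapq.heappush(self.pending, (wake, process))
        self.write_jiffies(wake)             # ... and write the result back

    def on_interrupt(self):
        """Handle a Jiffies interrupt; return the processes to wake."""
        now = self.read_jiffies()
        woken = []
        while self.pending and self.pending[0][0] <= now:
            woken.append(heapq.heappop(self.pending)[1])
        if self.pending:                     # re-arm with the next alarm
            self.write_jiffies(self.pending[0][0])
        return woken
```

The sketch relies on the peripheral keeping only the nearest requested time, as
described above, so {\tt sleep()} may write every new request unconditionally.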

\chapter{Operation}\label{chap:ops}

\chapter{Registers}\label{chap:regs}

\chapter{Wishbone Datasheet}\label{chap:wishbone}
The Zip System supports two wishbone accesses, a slave debug port and a master
port for the system itself.  These are shown in Tbl.~\ref{tbl:wishbone-slave}
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Slave, Read/Write, single words only \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
Signal Names & \begin{tabular}{ll}
		Signal Name & Wishbone Equivalent \\\hline
		{\tt i\_clk} & {\tt CLK\_I} \\
		{\tt i\_dbg\_cyc} & {\tt CYC\_I} \\
		{\tt i\_dbg\_stb} & {\tt STB\_I} \\
		{\tt i\_dbg\_we} & {\tt WE\_I} \\
		{\tt i\_dbg\_data} & {\tt DAT\_I} \\
		{\tt o\_dbg\_ack} & {\tt ACK\_O} \\
		{\tt o\_dbg\_stall} & {\tt STALL\_O} \\
		{\tt o\_dbg\_data} & {\tt DAT\_O}
		\end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet for the Debug Interface}\label{tbl:wishbone-slave}
\end{center}\end{table}
and Tbl.~\ref{tbl:wishbone-master} respectively.
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Master, Read/Write, sometimes pipelined \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
Signal Names & \begin{tabular}{ll}
		Signal Name & Wishbone Equivalent \\\hline
		{\tt i\_clk} & {\tt CLK\_I} \\
		{\tt o\_wb\_cyc} & {\tt CYC\_O} \\
		{\tt o\_wb\_stb} & {\tt STB\_O} \\
		{\tt o\_wb\_we} & {\tt WE\_O} \\
		{\tt o\_wb\_data} & {\tt DAT\_O} \\
		{\tt i\_wb\_ack} & {\tt ACK\_I} \\
		{\tt i\_wb\_stall} & {\tt STALL\_I} \\
		{\tt i\_wb\_data} & {\tt DAT\_I}
		\end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet for the CPU as Master}\label{tbl:wishbone-master}
\end{center}\end{table}
I do not recommend that you connect these together through the interconnect.

The big thing to notice is that both the real time clock and the real time
date modules act as wishbone slaves, and that all accesses to the registers of
either module are 32--bit reads and writes.  The address bus does not offer
byte level, but rather 32--bit word level resolution.  Select lines are not
implemented.  Bit ordering is the normal ordering where bit~31 is the most
significant bit and so forth.
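
Because the address bus resolves 32--bit words rather than bytes, software that
works with byte-wise addresses has to split each address into a word address
and a byte position within that word.  The following is a minimal sketch of
that split.  The function names are hypothetical, and the big-endian packing
(byte~0 in bits 31--24) is an assumption made for illustration; nothing in this
chapter fixes a byte order.

```python
def split_byte_address(byte_addr):
    """Return (word address, byte lane 0..3) for a byte-wise address."""
    return byte_addr >> 2, byte_addr & 3


def read_byte(memory, byte_addr):
    """Read one byte from a list of 32-bit words.

    Assumes big-endian lanes: lane 0 is bits 31..24 of the word.
    """
    word_addr, lane = split_byte_address(byte_addr)
    shift = 8 * (3 - lane)
    return (memory[word_addr] >> shift) & 0xFF
```

Dropping the bottom two address bits in this way is also why byte-wise
addressing covers only a quarter of the words that word addressing can reach.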

\chapter{Clocks}\label{chap:clocks}

This core is based upon the Basys--3 design.  The Basys--3 development board
contains one external 100~MHz clock, which is sufficient to run the ZIP CPU
core.
\begin{table}[htbp]
\begin{center}
\begin{clocklist}
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
\end{clocklist}
\caption{List of Clocks}\label{tbl:clocks}
\end{center}\end{table}
I hesitate to suggest that the core can run faster than 100~MHz, since I have
struggled with various timing violations to keep it at 100~MHz.  So, for
now, I will only state that it can run at 100~MHz.


\chapter{I/O Ports}\label{chap:ioports}
The I/O ports for this clock are shown in Tbls.~\ref{tbl:iowishbone}
\begin{table}[htbp]
\begin{center}
\begin{portlist}
i\_clk & 1 & Input & System clock, used for time and wishbone interfaces.\\\hline
i\_wb\_cyc & 1 & Input & Wishbone bus cycle wire.\\\hline
i\_wb\_stb & 1 & Input & Wishbone strobe.\\\hline
i\_wb\_we & 1 & Input & Wishbone write enable.\\\hline
i\_wb\_data & 32 & Input & Wishbone bus data register for use when writing
	(configuring) the core from the bus.\\\hline
o\_wb\_ack & 1 & Output & Return value acknowledging a wishbone write, or
	signifying valid data in the case of a wishbone read request.
	\\\hline
o\_wb\_stall & 1 & Output & Indicates the device is not yet ready for another
	wishbone access, effectively stalling the bus.\\\hline
o\_wb\_data & 32 & Output & Wishbone data bus, returning data values read
	from the interface.\\\hline
\end{portlist}
\caption{Wishbone I/O Ports}\label{tbl:iowishbone}
\end{center}\end{table}
and~Tbl.~\ref{tbl:ioother}.
\begin{table}[htbp]
\begin{center}
\begin{portlist}
o\_sseg & 32 & Output & Lines to control a seven segment display, to be
	sent to that display's driver.  Each eight bit byte controls
	one digit in the display, with the bottom bit in the byte
	controlling the decimal point.\\\hline
o\_led & 16 & Output & Output LED's, consisting of a 16--bit counter counting
	from zero to all ones each minute, and synchronized with each
	minute so as to create an indicator of when the next minute
	will take place when only the hours and minutes can be
	displayed.\\\hline
o\_interrupt & 1 & Output & A pulsed/strobed interrupt line.  When the
	clock needs to generate an interrupt, it will set this line
	high for one clock cycle.  \\\hline
o\_ppd & 1 & Output & A `pulse per day' signal which can be fed into the
	real--time date module.  This line will be high on the clock before
	the stroke of midnight, allowing the date module to turn over to the
	next day at exactly the same time the clock module turns over to the
	next day.\\\hline
i\_hack & 1 & Input & When this line is raised, copies are made of the
	internal state registers on the next clock.  These registers can then
	be used for an accurate time hack regarding the state of the clock
	at the time this line was strobed.\\\hline
\end{portlist}
\caption{Other I/O Ports}\label{tbl:ioother}
\end{center}\end{table}
Tbl.~\ref{tbl:iowishbone} reiterates the wishbone I/O values just discussed in
Chapt.~\ref{chap:wishbone}, and so needs no further discussion here.


% Appendices
% Index
\end{document}