%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Filename:    spec.tex
%%
%% Project:     Zip CPU -- a small, lightweight, RISC CPU soft core
%%
%% Purpose:     This LaTeX file contains all of the documentation/description
%%              currently provided with this Zip CPU soft core.  It supersedes
%%              any information about the instruction set or CPUs found
%%              elsewhere.  It's not nearly as interesting, though, as the PDF
%%              file it creates, so I'd recommend reading that before diving
%%              into this file.  You should be able to find the PDF file in
%%              the SVN distribution together with this file and a copy of
%%              the GPL-3.0 license this file is distributed under.  If not,
%%              just type 'make' in the doc directory and it (should) build
%%              without a problem.
%%
%%
%% Creator:     Dan Gisselquist
%%              Gisselquist Technology, LLC
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Copyright (C) 2015, Gisselquist Technology, LLC
%%
%% This program is free software (firmware): you can redistribute it and/or
%% modify it under the terms of  the GNU General Public License as published
%% by the Free Software Foundation, either version 3 of the License, or (at
%% your option) any later version.
%%
%% This program is distributed in the hope that it will be useful, but WITHOUT
%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
%% for more details.
%%
%% You should have received a copy of the GNU General Public License along
%% with this program.  (It's in the $(ROOT)/doc directory, run make with no
%% target there if the PDF file isn't present.)  If not, see
%% <http://www.gnu.org/licenses/> for a copy.
%%
%% License:     GPL, v3, as defined and found on www.gnu.org,
%%              http://www.gnu.org/licenses/gpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{gqtekspec}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.1}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC

This project is free software (firmware): you can redistribute it and/or
modify it under the terms of  the GNU General Public License as published
by the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program.  If not, see \hbox{<http://www.gnu.org/licenses/>} for a
copy.
\end{license}
\begin{revisionhistory}
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
% \listoffigures
\listoftables
\begin{preface}
Many people have asked me why I am building the Zip CPU.  ARM processors are
good and effective.  Xilinx makes and markets Microblaze, Altera Nios, and both
have better toolsets than the Zip CPU will ever have.  OpenRISC is also
available.  Why build a new processor?

The easiest, most obvious answer is the simple one: Because I can.

There's more to it, though.  There's a lot that I would like to do with a
processor, and I want to be able to do it in a vendor independent fashion.
I would like to be able to generate Verilog code that can run equivalently
on both Xilinx and Altera chips, and that can be easily ported from one
manufacturer's chipsets to another.  Even more, before purchasing a chip or a
board, I would like to know that my chip works.  I would like to build a test
bench to test components with, and Verilator is my chosen test bench.  This
forces me to use all Verilog, and it prevents me from using any proprietary
cores.  For this reason, Microblaze and Nios are out of the question.

Why not OpenRISC?  That's a hard question.  The OpenRISC team has done some
wonderful work on an amazing processor, and I'll have to admit that I am
envious of what they've accomplished.  I would like to port binutils to the
Zip CPU, as I would like to port GCC and GDB.  They are way ahead of me.  The
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs.  It has
a lot of features of modern CPUs within it that ... well, let's just say it's
not the little guy on the block.  The Zip CPU is lighter weight, costing only
about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic
peripherals.

My final reason is that I'm building the Zip CPU as a learning experience.  The
Zip CPU has allowed me to learn a lot about how CPUs work on a very micro
level.  For the first time, I am beginning to understand many of the Computer
Architecture lessons from years ago.

To summarize: Because I can, because it is open source, because it is light
weight, and as an exercise in learning.

\end{preface}

\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}


The original goal of the ZIP CPU was to be a very simple CPU.   You might
think of it as a poor man's alternative to the OpenRISC architecture.
For this reason, all instructions have been designed to be as simple as
possible, and are all designed to be executed in one instruction cycle per
instruction, barring pipeline stalls.  Indeed, even the bus has been simplified
to a constant 32-bit width, with no option for more or less.  This has
resulted in the choice to drop push and pop instructions, pre-increment and
post-decrement addressing modes, and more.

For those who like buzz words, the Zip CPU is:
\begin{itemize}
\item A 32-bit CPU: All registers are 32-bits, addresses are 32-bits,
        instructions are 32-bits wide, etc.
\item A RISC CPU.  There is no microcode for executing instructions.
\item A Load/Store architecture.  (Only load and store instructions
        can access memory.)
\item Wishbone compliant.  All peripherals are accessed just like
        memory across this bus.
\item A Von-Neumann architecture.  (The instructions and data share a
        common bus.)
\item A pipelined architecture, having stages for {\bf Prefetch},
        {\bf Decode}, {\bf Read-Operand}, the {\bf ALU/Memory}
        unit, and {\bf Write-back}
\item Completely open source, licensed under the GPL.\footnote{Should you
        need a copy of the Zip CPU licensed under other terms, please
        contact me.}
\end{itemize}

Now, however, that I've worked on the Zip CPU for a while, it is not nearly
as simple as I originally hoped.  Worse, I've had to adjust to create
capabilities that I was never expecting to need.  These include:
\begin{itemize}
\item {\bf External Debug:} Once placed upon an FPGA, some external means is
        still necessary to debug this CPU.
        That means that there needs to be
        an external register that can control the CPU: reset it, halt it, step
        it, and tell whether it is running or not.  A second register,
        placed alongside the first, allows the external controller to examine
        registers internal to the CPU.

\item {\bf Internal Debug:} Being able to run a debugger from within
        a user process requires an ability to step a user process from
        within a debugger.  It also requires a break instruction that can
        be substituted for any other instruction, and substituted back.
        The break is actually difficult: the break instruction cannot be
        allowed to execute.  That way, upon a break, the debugger should
        be able to jump back into the user process to step the instruction
        that would've been at the break point initially, and then to
        replace the break after passing it.

\item {\bf Prefetch Cache:} My original implementation had a very
        simple prefetch stage.  Any time the PC changed the prefetch would go
        and fetch the new instruction.  While this was perhaps the simplest
        approach, it cost roughly five clocks for every instruction.  This
        was deemed unacceptable, as I wanted a CPU that could execute
        instructions in one cycle.  I therefore have a prefetch cache that
        issues pipelined wishbone accesses to memory and then pushes
        instructions at the CPU.  Sadly, this accounts for about 20\% of the
        logic in the entire CPU, or 15\% of the logic in the entire system.


\item {\bf Operating System:} In order to support an operating system,
        interrupts and so forth, the CPU needs to support supervisor and
        user modes, as well as a means of switching between them.  For example,
        the user needs a means of executing a system call.  This is the
        purpose of the {\bf trap} instruction.
        This instruction needs to
        place the CPU into supervisor mode (here equivalent to disabling
        interrupts), and to hand it a parameter, such as one identifying
        which O/S function was called.

My initial approach to building a trap instruction was to create
        an external peripheral which, when written to, would generate an
        interrupt and could return the last value written to it.  This failed
        timing requirements, however: the CPU executed two instructions while
        waiting for the trap interrupt to take place.  Since then, I've
        decided to keep the rest of the CC register for that purpose so that a
        write to the CC register, with the GIE bit cleared, could be used to
        execute a trap.

Modern timesharing systems also depend upon a {\bf Timer} interrupt
        to handle task swapping.  For the Zip CPU, this interrupt is handled
        external to the CPU as part of the CPU System, found in
        {\tt zipsystem.v}.  The timer module itself is found in
        {\tt ziptimer.v}.

\item {\bf Pipeline Stalls:} My original plan was to not support pipeline
        stalls at all, but rather to require the compiler to properly schedule
        instructions so that stalls would never be necessary.  After trying
        to build such an architecture, I gave up, having learned some things:

        For example, in order to facilitate interrupt handling and debug
        stepping, the CPU needs to know what instructions have finished, and
        which have not.  In other words, it needs to know where it can restart
        the pipeline from.  Once restarted, it must act as though it had
        never stopped.  This killed my idea of delayed branching, since
        what would be the appropriate program counter to restart at?
        The one the CPU was going to branch to, or the ones in the
        delay slots?

        So I switched to a model of discrete execution: Once an instruction
        enters into either the ALU or memory unit, the instruction is
        guaranteed to complete.  If the logic recognizes a branch or a
        condition that would render the instruction entering into this stage
        possibly inappropriate (e.g., a conditional branch preceding a store
        instruction), then the pipeline stalls for one cycle
        until the conditional branch completes.  Then, if it generates a new
        PC address, the stages preceding are all wiped clean.

        The discrete execution model allows such things as sleeping: if the
        CPU is put to ``sleep'', the ALU and memory stages stall and back up
        everything before them.  Likewise, anything that has entered the ALU
        or memory stage when the CPU is placed to sleep continues to completion.
        To handle this logic, each pipeline stage has three control signals:
        a valid signal, a stall signal, and a clock enable signal.  In
        general, a stage stalls if its contents are valid and the next step
        is stalled.  This allows the pipeline to fill any time a later stage
        stalls.

\item {\bf Verilog Modules:} When examining how other processors worked
        here on OpenCores, many of them had one separate module per pipeline
        stage.  While this appeared to me to be a fascinating and commendable
        idea, my own implementation didn't work out quite so nicely.

        As an example, the decode module produces a {\em lot} of
        control wires and registers.  Creating a module out of this, with
        only the simplest of logic within it, seemed to be more a lesson
        in passing wires around, rather than encapsulating logic.

        Another example was the register writeback section.  I would love
        this section to be a module in its own right, and many have made them
        such.
        However, other modules depend upon writeback results other
        than just what's placed in the register (i.e., the control wires).
        For these reasons, I didn't manage to fit this section into its
        own module.

        The result is that the majority of the CPU code can be found in
        the {\tt zipcpu.v} file.
\end{itemize}

With that introduction out of the way, let's move on to the instruction
set.

\chapter{CPU Architecture}\label{chap:arch}

The Zip CPU supports a set of two operand instructions, where the first operand
(always a register) is the result.  The only exception is the store instruction,
where the first operand (always a register) is the source of the data to be
stored.

\section{Register Set}
The Zip CPU supports two sets of sixteen 32-bit registers, a supervisor
and a user set.  The supervisor set is used in interrupt mode, whereas
the user set is used otherwise.  Of this register set, the Program Counter (PC)
is register 15, whereas the status register (SR) or condition code register
(CC) is register 14.  By convention, the stack pointer will be register 13 and
noted as (SP)--although the instruction set allows it to be anything.
The CPU can access both register sets via move instructions from the
supervisor state, whereas the user state can only access the user registers.

The status register is special, and bears further mention.  The lower
10 bits of the status register form a set of condition codes.  Writes to other
bits are preserved, and can be used as part of the trap architecture--examined
by the O/S upon any interrupt, cleared before returning.

Of these ten condition code bits, the bottom four are the current flags:
                Zero (Z),
                Carry (C),
                Negative (N),
                and Overflow (V).

The next bit is a clock enable (0 to enable) or sleep bit (1 to put
        the CPU to sleep).
        Setting this bit will cause the CPU to
        wait for an interrupt (if interrupts are enabled), or to
        completely halt (if interrupts are disabled).

The sixth bit is a global interrupt enable bit (GIE).  When this
        sixth bit is a `1' interrupts will be enabled, else disabled.  When
        interrupts are disabled, the CPU will be in supervisor mode, otherwise
        it is in user mode.  Thus, to execute a context switch, one only
        need enable or disable interrupts.  (When an interrupt line goes
        high, interrupts will automatically be disabled, as the CPU goes
        and deals with its context switch.)

The seventh bit is a step bit.  This bit can be
        set from supervisor mode only.  After setting this bit, should
        the supervisor mode process switch to user mode, it would then
        accomplish one instruction in user mode before returning to supervisor
        mode.  Then, upon return to supervisor mode, this bit will
        be automatically cleared.  This bit has no effect on the CPU while in
        supervisor mode.

        This functionality was added to allow a userspace debugger
        to step a user process, working through supervisor mode
        of course.


The eighth bit is a break enable bit.  This
        controls whether a break instruction will halt the processor for an
        external debugger (break enabled), or whether the break instruction
        will simply set the STEP bit and send the CPU into interrupt mode.
        This bit can only be set within supervisor mode.

        This functionality was added to enable an external debugger to
        set and manage breakpoints.

The ninth bit is reserved for a floating point enable bit.  When set, the
arithmetic for the next instruction will be sent to a floating point unit.
Such a unit may later be added as an extension to the Zip CPU.  If the
CPU does not support floating point instructions, this bit will never be set.

The tenth bit is a trap bit.
It is set whenever the user requests a soft
interrupt, and cleared on any return to userspace command.  This allows the
supervisor, in supervisor mode, to determine whether it got to supervisor
mode from a trap or from an external interrupt or both.

The status register bits are shown below:
\begin{table}
\begin{center}
\begin{tabular}{l|l}
Bit & Meaning \\\hline
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
8 & (Reserved for) Floating point enable \\\hline
7 & Halt on break, to support an external debugger \\\hline
6 & Step, single step the CPU in user mode\\\hline
5 & GIE, or Global Interrupt Enable \\\hline
4 & Sleep \\\hline
3 & V, or overflow bit.\\\hline
2 & N, or negative bit.\\\hline
1 & C, or carry bit.\\\hline
0 & Z, or zero bit.\\\hline
\end{tabular}
\end{center}
\end{table}
\section{Conditional Instructions}
Most, although not quite all, instructions are conditionally executed.  From
the four condition code flags, eight conditions are defined.  These are shown
in Tbl.~\ref{tbl:conditions}.
\begin{table}
\begin{center}
\begin{tabular}{l|l|l}
Code & Mnemonic & Condition \\\hline
3'h0 & None & Always execute the instruction \\
3'h1 & {\tt .Z} & Only execute when `Z' is set \\
3'h2 & {\tt .NE} & Only execute when `Z' is not set \\
3'h3 & {\tt .GE} & Greater than or equal (`N' not set, `Z' irrelevant) \\
3'h4 & {\tt .GT} & Greater than (`N' not set, `Z' not set) \\
3'h5 & {\tt .LT} & Less than (`N' set) \\
3'h6 & {\tt .C} & Carry set\\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\end{center}
\end{table}
There is no condition code for less than or equal, not C or not V.  Using
these conditions will take an extra instruction.
(Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
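
The masks that such an extra test instruction needs follow from the bit
positions in the status register table.  As a sketch only, taking Z to be
bit~0 (which the ordering of the four flags implies, but treat it as an
assumption), the four flag masks work out to:
\begin{eqnarray*}
\mbox{Z} & = & 2^0 \;=\; 1 \\
\mbox{C} & = & 2^1 \;=\; 2 \\
\mbox{N} & = & 2^2 \;=\; 4 \\
\mbox{V} & = & 2^3 \;=\; 8
\end{eqnarray*}
For instance, testing CC against the carry mask~2 gives a zero result exactly
when C was clear, so a following {\tt .Z} instruction behaves as a
``not carry'' condition.  This presumes the test sets Z from the AND of its
two operands, as the BTST(And) entry in the instruction table suggests.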

\section{Operand B}
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
This Operand B is either equal to a register plus a signed immediate offset,
or an immediate offset by itself.  This value is encoded as shown in
Tbl.~\ref{tbl:opb}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Bit 20 & 19 \ldots 16 & 15 \ldots 0 \\\hline
1'b0 & \multicolumn{2}{l|}{20-bit Signed Immediate value} \\\hline
1'b1 & 4-bit Register & 16-bit Signed immediate offset \\\hline
\end{tabular}
\caption{Bit allocation for Operand B}\label{tbl:opb}
\end{center}\end{table}
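
As a rough worked example of the encoding in Tbl.~\ref{tbl:opb}, take a
register-plus-offset operand naming register~3 with an offset of~5, written
here as {\tt \$5(R3)} after the style of the derived instruction tables.  The
register number and offset are chosen purely for illustration:
\begin{eqnarray*}
\mbox{Operand B} & = & 1\cdot 2^{20} + 3\cdot 2^{16} + 5 \\
        & = & \mbox{\tt 21'h100000} + \mbox{\tt 21'h030000} + \mbox{\tt 21'h000005} \\
        & = & \mbox{\tt 21'h130005}
\end{eqnarray*}
An immediate-only operand clears bit~20 and uses the remaining 20~bits for the
value.  Assuming two's complement (the table says only ``signed''), an
immediate of $-1$ would then encode as {\tt 21'h0fffff}.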
\section{Address Modes}
The ZIP CPU supports two addressing modes: register plus immediate, and
immediate address.  Addresses are therefore encoded in the same fashion as
Operand B, shown above.

A lot of long hard thought was put into whether to allow pre/post increment
and decrement addressing modes.  Finding no way to use these operators without
taking two or more clocks per instruction, these addressing modes have been
removed from the realm of possibilities.  This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.

\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
        the CPU needs access to non-supervisory registers while in supervisory
        mode.  Therefore, the MOV instruction is special and offers access
        to these registers ... when in supervisory mode.  To keep the compiler
        simple, the extra bits are ignored in non-supervisory mode (as though
        they didn't exist), rather than being mapped to new instructions or
        additional capabilities.  The bits indicating which register set each
        register lies within are the A-Usr and B-Usr bits.  When set to a one,
        these refer to a user mode register.  When set to a zero, these refer
        to a register in the current mode, whether user or supervisor.
        Further, because
        a load immediate instruction exists, there is no move capability between
        an immediate and a register: all moves come from either a register or
        a register plus an offset.

This actually leads to a bit of a problem: since the MOV instruction
        encodes which register set each register is coming from or moving to,
        how shall a compiler or assembler know how to compile a MOV instruction
        without knowing the mode of the CPU at the time?  For this reason,
        the compiler will assume all MOV registers are supervisor registers,
        and display them as normal.  Anything with the user bit set will
        be treated as a user register.  The CPU will quietly ignore the
        supervisor bits while in user mode, and anything marked as a user
        register will always be valid.

\section{Multiply Operations}
While the Zip CPU instruction set supports multiply operations, they are not
yet fully supported by the CPU.  Two Multiply operations are supported, a
$16\times 16$ bit signed multiply (MPYS) and the same but unsigned (MPYU).  In both
cases, the operand is a register plus a 16-bit immediate, subject to the
rule that the register cannot be the PC or CC registers.  The PC register
field has been stolen to create a multiply by immediate instruction.  The
CC register field is reserved.
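
To make the signed/unsigned distinction concrete, here is a small worked
example.  It assumes both operands are 16-bit values and that the full 32-bit
product is written back; neither detail is spelled out above, so treat it as a
sketch rather than a statement of what the CPU does.  With both operands equal
to {\tt 16'hffff}:
\begin{eqnarray*}
\mbox{MPYU:} & 65535 \times 65535 = 4294836225 & = \mbox{\tt 32'hfffe0001} \\
\mbox{MPYS:} & (-1) \times (-1) = 1 & = \mbox{\tt 32'h00000001}
\end{eqnarray*}
The same bit pattern therefore yields very different products depending on
which of the two instructions is issued.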

\section{Floating Point}
The ZIP CPU does not support floating point operations today.  However, the
instruction set reserves a capability for a floating point operation.  To
execute such an operation, simply set the floating point bit in the CC
register and the following instruction will treat its register operands as
floating point values.  The immediate fields do not apply in floating
point mode, and must be set to zero.  Not all instructions make sense as
floating point operations.  Therefore, only the CMP, SUB, ADD, and MPY
instructions may be issued as floating point instructions.  Other instructions
allow the examining of the floating point bit in the CC register.  In all
cases, the floating point bit is cleared one instruction after it is set.

The architecture does not support a floating point not-implemented interrupt.
Any soft floating point emulation must be done deliberately.

\section{Native Instructions}
The instruction set for the Zip CPU is summarized in
Tbl.~\ref{tbl:zip-instructions}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|c|}\hline
Op Code & \multicolumn{8}{c|}{31\ldots24} & \multicolumn{8}{c|}{23\ldots 16}
        & \multicolumn{8}{c|}{15\ldots 8} & \multicolumn{8}{c|}{7\ldots 0}
        & Sets CC? \\\hline
CMP(Sub) & \multicolumn{4}{l|}{4'h0}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B}
                & Yes \\\hline
BTST(And) & \multicolumn{4}{l|}{4'h1}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B}
                & Yes \\\hline
MOV & \multicolumn{4}{l|}{4'h2}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & A-Usr
                & \multicolumn{4}{l|}{B-Reg}
                & B-Usr
                & \multicolumn{15}{l|}{15-bit signed offset}
                & \\\hline
LODI & \multicolumn{4}{l|}{4'h3}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{24}{l|}{24-bit Signed Immediate}
                & \\\hline
NOOP & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{24'h00}
                & \\\hline
BREAK & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{24'h01}
                & \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'he}
                & \multicolumn{24}{l|}{Any other 24-bit value (not 0 or 1)}
                & \\\hline
LODIHI & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{16}{l|}{16-bit Immediate}
                & \\\hline
LODILO & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{16}{l|}{16-bit Immediate}
                & \\\hline
16-b MPYU & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0 & \multicolumn{4}{l|}{Reg}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYU(I) & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b0 & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYS & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1 & \multicolumn{4}{l|}{Reg}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
16-b MPYS(I) & \multicolumn{4}{l|}{4'h4}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & 1'b1 & \multicolumn{4}{l|}{4'hf}
                & \multicolumn{16}{l|}{16-bit Offset}
                & Yes \\\hline
ROL & \multicolumn{4}{l|}{4'h5}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B, truncated to low order 5 bits}
                & \\\hline
LOD & \multicolumn{4}{l|}{4'h6}
                & \multicolumn{4}{l|}{R. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B address}
                & \\\hline
STO & \multicolumn{4}{l|}{4'h7}
                & \multicolumn{4}{l|}{D. Reg}
                & \multicolumn{3}{l|}{Cond.}
                & \multicolumn{21}{l|}{Operand B address}
                & \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h8}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        & 1'b0
        &       \multicolumn{20}{l|}{Reserved}
        & Yes \\\hline
SUB & \multicolumn{4}{l|}{4'h8}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        & 1'b1
        &       \multicolumn{4}{l|}{Reg}
        &       \multicolumn{16}{l|}{16-bit signed offset}
        & Yes \\\hline
AND & \multicolumn{4}{l|}{4'h9}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
ADD & \multicolumn{4}{l|}{4'ha}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
OR & \multicolumn{4}{l|}{4'hb}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
XOR & \multicolumn{4}{l|}{4'hc}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B}
        & Yes \\\hline
LSL/ASL & \multicolumn{4}{l|}{4'hd}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
ASR & \multicolumn{4}{l|}{4'he}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
LSR & \multicolumn{4}{l|}{4'hf}
        &       \multicolumn{4}{l|}{R. Reg}
        &       \multicolumn{3}{l|}{Cond.}
        &       \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
        & Yes \\\hline
\end{tabular}
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
\end{center}\end{table}
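
As an illustrative sketch of how the fields in
Tbl.~\ref{tbl:zip-instructions} pack into a 32-bit word, consider
{\tt ADD \$1,R2}.  The register and immediate are arbitrary choices, and the
field positions are simply read off the table: op code in bits
31\ldots 28, register in bits 27\ldots 24, condition in bits 23\ldots 21, and
Operand~B in bits 20\ldots 0.
\begin{eqnarray*}
\mbox{\tt ADD \$1,R2} & = & 10\cdot 2^{28} + 2\cdot 2^{24} + 0\cdot 2^{21} + 1 \\
        & = & \mbox{\tt 32'ha2000001} \\
\mbox{\tt ADD.Z \$1,R2} & = & \mbox{\tt 32'ha2000001} + 1\cdot 2^{21} \\
        & = & \mbox{\tt 32'ha2200001}
\end{eqnarray*}
The second form sets the condition field to {\tt 3'h1} from
Tbl.~\ref{tbl:conditions}, so the add would only take place when the Z flag
is set.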

As you can see, there's lots of room for instruction set expansion.  The
encoding shared by NOOP and BREAK has a 24-bit field of which only two
values, those for NOOP and BREAK themselves, are used.  The Subtract leaves half
of its space open, since a subtract immediate is the same as an add with a
negated immediate.

606
\section{Derived Instructions}
607
The ZIP CPU supports many other common instructions, but not all of them
608
are single instructions.  The derived instruction tables,
609
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
610
help to capture some of how these other instructions may be implemented on
611
the ZIP CPU.  Many of these instructions will have assembly equivalents,
612
such as the branch instructions, to facilitate working with the CPU.
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual  & Notes \\\hline
\parbox[t]{1.4in}{ADD Ra,Rx\\ADDC Rb,Ry}
        & \parbox[t]{1.5in}{Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry}
        & Add with carry \\\hline
BRA.Cond +/-\$Addr
        & Mov.cond \$Addr+PC,PC
        & Branch or jump on condition.  Works for 14 bit
                address offsets.\\\hline
BRA.Cond +/-\$Addr
        & \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC}
        & Branch/jump on condition.  Works for
        23 bit address offsets, but costs a register, an extra instruction,
        and sets the flags. \\\hline
BNC PC+\$Addr
        & \parbox[t]{1.5in}{Test \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
        & Example of a branch on an unsupported
                condition, in this case a branch on not carry \\\hline
BUSY & MOV \$-1(PC),PC & Execute an infinite loop \\\hline
CLRF.NZ Rx
        & XOR.NZ Rx,Rx
        & Clear Rx, and flags, if the Z-bit is not set \\\hline
CLR Rx
        & LDI \$0,Rx
        & Clears Rx, leaves flags untouched.  This instruction cannot be
                conditional. \\\hline
EXCH.W Rx
        & ROL \$16,Rx
        & Exchanges the top and bottom 16-bit words of Rx \\\hline
HALT
        & Or \$SLEEP,CC
        & Executed while in interrupt mode.  In user mode this is simply a
        wait until interrupt instruction. \\\hline
INT & LDI \$0,CC
        & Since we're using the CC register as a trap vector as well, this
        executes TRAP \#0. \\\hline
IRET
        & OR \$GIE,CC
        & Also an RTU instruction (Return to Userspace) \\\hline
JMP R6+\$Addr
        & MOV \$Addr(R6),PC
        & \\\hline
JSR PC+\$Addr
        & \parbox[t]{1.5in}{SUB \$1,SP \\
        MOV \$3+PC,R0 \\
        STO R0,1(SP) \\
        MOV \$Addr+PC,PC \\
        ADD \$1,SP}
        & Jump to Subroutine. \\\hline
JSR PC+\$Addr
        & \parbox[t]{1.5in}{MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
        & This is the high speed
        version of a subroutine call, necessitating a register to hold the
        last PC address.  In its favor, this method doesn't suffer the
        mandatory memory access of the other approach. \\\hline
LDI.l \$val,Rx
        & \parbox[t]{1.5in}{LDIHI (\$val$>>$16)\&0x0ffff, Rx \\
                        LDILO (\$val \& 0x0ffff), Rx}
        & Sadly, there's not enough instruction
        space to load a complete immediate value into any register.
        Therefore, fully loading any register takes two cycles.
        The LDIHI (load immediate high) and LDILO (load immediate low)
        instructions have been created to facilitate this. \\\hline
\end{tabular}
\caption{Derived Instructions}\label{tbl:derived-1}
\end{center}\end{table}
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual  & Notes \\\hline
LOD.b \$addr,Rx
        & \parbox[t]{1.5in}{%
        LDI     \$addr,Ra \\
        LDI     \$addr,Rb \\
        LSR     \$2,Ra \\
        AND     \$3,Rb \\
        LOD     (Ra),Rx \\
        LSL     \$3,Rb \\
        SUB     \$32,Rb \\
        ROL     Rb,Rx \\
        AND     \$0ffh,Rx}
        & \parbox[t]{3in}{This CPU is designed for 32-bit word
        length instructions.  Byte addressing is not supported by the CPU or
        the bus, so it therefore takes more work to do.

        Note also that in this example, \$Addr is a byte-wise address, where
        all other addresses are 32-bit wordlength addresses.  For this reason,
        we needed to drop the bottom two bits.  This also limits the address
        space of character accesses using this method from 16 MB down to 4MB.}
                \\\hline
\parbox[t]{1.5in}{LSL \$1,Rx\\ LSLC \$1,Ry}
        & \parbox[t]{1.5in}{LSL \$1,Ry \\
        LSL \$1,Rx \\
        OR.C \$1,Ry}
        & Logical shift left with carry.  Note that the
        instruction order is now backwards, to keep the conditions valid.
        That is, LSL sets the carry flag, so if we did this the other way
        with Rx before Ry, then the condition flag wouldn't have been right
        for an OR correction at the end. \\\hline
\parbox[t]{1.5in}{LSR \$1,Rx \\ LSRC \$1,Ry}
        & \parbox[t]{1.5in}{CLR Rz \\
        LSR \$1,Ry \\
        LDIHI.C \$8000h,Rz \\
        LSR \$1,Rx \\
        OR Rz,Rx}
        & Logical shift right with carry \\\hline
NEG Rx & \parbox[t]{1.5in}{XOR \$-1,Rx \\ ADD \$1,Rx} & \\\hline
NOOP & NOOP & While there are many
        operations that do nothing, such as MOV Rx,Rx, or OR \$0,Rx, these
        operations have consequences in that they might stall the bus if
        Rx isn't ready yet.  For this reason, we have a dedicated NOOP
        instruction. \\\hline
NOT Rx & XOR \$-1,Rx & \\\hline
POP Rx
        & \parbox[t]{1.5in}{LOD \$-1(SP),Rx \\ ADD \$1,SP}
        & Note
        that for interrupt purposes, one can never depend upon the value at
        (SP).  Hence you read from it, then increment it, lest having
        incremented it first something then comes along and writes to that
        value before you can read the result. \\\hline
PUSH Rx
        & \parbox[t]{1.5in}{SUB \$1,SP \\
        STO Rx,\$1(SP)}
        & \\\hline
RESET
        & \parbox[t]{1in}{STO \$1,\$watchdog(R12)\\NOOP\\NOOP}
        & \parbox[t]{3in}{This depends upon the peripheral base address being
        in R12.

        Another opportunity might be to jump to the reset address from within
        supervisor mode.}\\\hline
RET & \parbox[t]{1.5in}{LOD \$-1(SP),R0 \\
        MOV \$-1+SP,SP \\
        MOV R0,PC}
        & An alternative might be to LOD \$-1(SP),PC, followed
        by depending upon the calling program to ADD \$1,SP. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-2}
\end{center}\end{table}
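
The address arithmetic behind the LOD.b entry of Tbl.~\ref{tbl:derived-2}
can be sketched as follows.  This is an illustration only: the symbols $a$
(byte address), $w$ (word address) and $b$ (byte index within the word) are
names introduced here, and which byte lane a given $b$ selects depends on the
byte ordering one chooses.  The LSR~\$2 and AND~\$3 steps compute
\[
w = \left\lfloor \frac{a}{4} \right\rfloor, \qquad b = a \bmod 4 .
\]
For the hypothetical byte address $a = \mathtt{0x1236}$ this gives
$w = \mathtt{0x48D}$ and $b = 2$.  Dropping the two low bits is also why the
reachable space shrinks by a factor of four:
\[
\frac{2^{24}}{2^{2}} = 2^{22},
\]
that is, from 16M down to 4M.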
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual  & Notes \\\hline
RET & MOV R12,PC
        & This is the high(er) speed version, that doesn't touch the stack.
        As such, it doesn't suffer a stall on memory read/write to the stack.
        \\\hline
STEP Rr,Rt
        & \parbox[t]{1.5in}{LSR \$1,Rr \\ XOR.C Rt,Rr}
        & Step a Galois implementation of a Linear Feedback Shift Register, Rr,
                using taps Rt \\\hline
STO.b Rx,\$addr
        & \parbox[t]{1.5in}{%
        LDI \$addr,Ra \\
        LDI \$addr,Rb \\
        LSR \$2,Ra \\
        AND \$3,Rb \\
        SUB \$32,Rb \\
        LOD (Ra),Ry \\
        AND \$0ffh,Rx \\
        AND \$-0ffh,Ry \\
        ROL Rb,Rx \\
        OR Rx,Ry \\
        STO Ry,(Ra) }
        & \parbox[t]{3in}{This CPU and its bus are {\em not} optimized
        for byte-wise operations.

        Note that in this example, \$addr is a
        byte-wise address, whereas in all of our other examples it is a
        32-bit word address.  This also limits the address space
        of character accesses from 16 MB down to 4MB.
        Further, this instruction implies a byte ordering,
        such as big or little endian.} \\\hline
SWAP Rx,Ry
        & \parbox[t]{1.5in}{
        XOR Ry,Rx \\
        XOR Rx,Ry \\
        XOR Ry,Rx}
        & While no extra registers are needed, this example
        does take 3~clocks. \\\hline
TRAP \#X
        & LDILO \$x,CC
        & This approach uses the unused bits of the CC register as a TRAP
        address.  If these bits are zero, no trap has occurred.  Unlike my
        previous approach, which was to use a trap peripheral, this approach
        has no delay associated with it.  To work, the supervisor will need
        to clear this register following any trap, and the user will need to
        be careful to only set this register prior to a trap condition.
        Likewise, when setting this value, the user will need to make certain
        that the SLEEP and GIE bits are not set in \$x.  LDI would also work,
        however using LDILO permits the use of conditional traps.  (i.e.,
        trap if the zero flag is set.)  Should you wish to trap off of a
        register value, you could equivalently load \$x into the register and
        then MOV it into the CC register. \\\hline
TST Rx
        & TST \$-1,Rx
        & Set the condition codes based upon Rx.  Could also do a CMP \$0,Rx,
        ADD \$0,Rx, SUB \$0,Rx, AND \$-1,Rx, etc.  The TST and CMP
        approaches won't stall future pipeline stages looking for the value
        of Rx. \\\hline
WAIT
        & Or \$SLEEP,CC
        & Wait 'til interrupt.  In an interrupts disabled context, this
        becomes a HALT instruction. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-3}
\end{center}\end{table}
\iffalse
\fi
\section{Pipeline Stages}
\begin{enumerate}
\item {\bf Prefetch}: Read instruction from memory (cache if possible).  This
        stage is actually pipelined itself, and so it will stall if the PC
        ever changes.  Stalls are also created here if the instruction isn't
        in the prefetch cache.
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
        immediate offset.
\item {\bf Read Operands}: Read registers and apply any immediate values to
        them.  This stage will stall if any source operand is pending.
        A proper optimizing compiler, therefore, will schedule an instruction
        between the instruction that produces the result and the instruction
        that uses it.
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
        instruction, and the {\bf MemOps} stage which accomplishes memory
        read/write.
        \begin{itemize}
        \item Loads stall instructions that access the register until it is
                written to the register set.
        \item Condition codes are available upon completion.
        \item Issuing an instruction to the memory while the memory is busy will
                stall the bus.  If the bus deadlocks, only a reset will
                release the CPU.  (Watchdog timer, anyone?)
        \end{itemize}
\item {\bf Write-Back}: Conditionally write back the result to register set,
        applying the condition.  This routine is bi-re-entrant: either the
        memory or the simple instruction may request a register write.
\end{enumerate}

\section{Pipeline Logic}
How the CPU handles some instruction combinations can be telling when
determining what happens in the pipeline.
The following lists some examples:
\begin{itemize}
\item {\bf Delayed Branching}

        I had originally hoped to implement delayed branching.  However, what
        happens in debug mode?
        That is, what happens when a debugger tries to single step an
        instruction?  While I can easily single step the computer in either
        user or supervisor mode externally, this processor does not appear
        able to step the CPU in user mode from within user mode--gosh, not even
        from within supervisor mode--such as if a process had a debugger
        attached.  As the processor exists, I would have one result stepping
        the CPU from a debugger, and another stepping it externally.

        This is unacceptable, and so this CPU does not support delayed
        branching.

\item {\bf Register Result:} {\tt MOV R0,R1; MOV R1,R2 }

        What value does
        R2 get, the value of R1 before the first move or the value of R0?
        Placing the value of R0 into R1 requires a pipeline stall, and possibly
        two, as I have the pipeline designed.

        The ZIP CPU architecture requires that R2 must equal R0 at the end of
        this operation.  This may stall the pipeline 1--2 cycles.

\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}

        At issue is the same item as above, save that the CMP instruction
        updates the flags that the MOV instruction depends
        upon.

        The Zip CPU architecture requires that condition codes must be updated
        and available immediately for the next instruction without stalling the
        pipeline.

\item {\bf Condition Codes Register Result:} {\tt CMP R0,R1; MOV CC,R2}

        At issue is the
        fact that the logic supporting the CC register is more complicated than
        the logic supporting any other register.

        The ZIP CPU will stall 1--2 cycles on this instruction, until the
        CC register is valid.

\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}

        At issue is whether or not the instruction following the jump will
        take place before the jump.  In other words, is the MOV to the PC
        register handled differently from an ADD to the PC register?

        In the Zip architecture, MOVs and ADDs use the same logic
        (simplifies the logic).
\end{itemize}

As I've studied this, I find several approaches to handling pipeline
issues.  These approaches (and their consequences) are listed below.

\begin{itemize}
\item {\bf All issued instructions complete, Stages stall individually}

        What about a slow pre-fetch?

        Nominally, this works well: any issued instruction
        just runs to completion.  If there are four issued instructions in the
        pipeline, with the writeback instruction being a write-to-PC
        instruction, the other three instructions naturally finish.

        This approach fails when reading instructions from the flash,
        since such reads require N clocks to complete.  Thus
        there may be only one instruction in the pipeline if reading from flash,
        or a full pipeline if reading from cache.  Each of these cases
        would produce a different response.

\item {\bf Issued instructions may be canceled}

        Stages stall individually

        First problem:
        Memory operations cannot be canceled, even reads may have side effects
        on peripherals that cannot be canceled later.  Further, in the case of
        an interrupt, it's difficult to know what to cancel.  What happens in
        a \hbox{\tt MOV.C \$x,PC} followed by a \hbox{\tt MOV \$y,PC}
        instruction?  Which gets
        canceled?

        Because it isn't clear what would need to be canceled,
        this instruction combination is not recommended.

\item {\bf All issued instructions complete.}

        All stages are filled, or the entire pipeline
        stalls.

        What about debug control?  What about
        register writes taking an extra clock stage?  MOV R0,R1; MOV R1,R2
        should place the value of R0 into R2.  How do you restart the pipeline
        after an interrupt?  What address do you use?  The last issued
        instruction?  But the branch delay slots may make that invalid!

        Reading from the CPU debug port in this case yields inconsistent
        results: the CPU will halt or step with instructions stuck in the
        pipeline.  Reading registers will give no indication of what is going
        on in the pipeline, just the results of completed operations, not of
        operations that have been started and not yet completed.
        Perhaps we should just report the state of the CPU based upon what
        instructions (PC values) have successfully completed?  Thus the
        debug instruction is the one that will write registers on the next
        clock.

        Suggestion: Suppose we load extra information in the two
        CC register(s) for debugging intermediate pipeline stages?

        The next problem, though, is how to deal with the read operand
        pipeline stage needing the result from the register pipeline.

\item {\bf Memory instructions must complete}

        All instructions that enter into the memory module {\em must}
        complete.  Issued instructions from the prefetch, decode, or operand
        read stages may or may not complete.  Jumps into code must be valid,
        so that interrupt returns may be valid.  All instructions entering the
        ALU complete.

        This looks to be the simplest approach.
        While the logic may be difficult, this appears to be the only
        re-entrant approach.

        A {\tt new\_pc} flag will be high anytime the PC changes in an
        unpredictable way (i.e., it doesn't increment).  This includes jumps
        as well as interrupts and interrupt returns.  Whenever this flag may
        go high, memory operations and ALU operations will stall until the
        result is known.  When the flag does go high, anything in the prefetch,
        decode, and read-op stage will be invalidated.

\end{itemize}



\chapter{Peripherals}\label{chap:periph}
\section{Interrupt Controller}
\section{Counter}

The Zip Counter is a very simple counter: it just counts.  It cannot be
halted.  When it rolls over, it issues an interrupt.  Writing a value to the
counter just sets the current value, and it starts counting again from that
value.

Eight counters are implemented in the Zip System for process accounting.
This may change in the future, as nothing as yet uses these counters.

\section{Timer}

The Zip Timer is also very simple: it simply counts down to zero.  When it
transitions from a one to a zero it creates an interrupt.

Writing any non-zero value to the timer starts the timer.  If the high order
bit is set when writing to the timer, the timer becomes an interval timer and
reloads its last start time on any interrupt.  Hence, to mark seconds, one
might set the timer to 100~million (the number of clocks per second), and
set the high bit.  Ever after, the timer will interrupt the CPU once per
second (assuming a 100~MHz clock).

\section{Watchdog Timer}

The watchdog timer is no different from any of the other timers, save for one
critical difference: the interrupt line from the watchdog
timer is tied to the reset line of the CPU.  Hence writing a `1' to the
watchdog timer will always reset the CPU.
To stop the Watchdog timer, write a `0' to it.  To start it,
write any other number to it---as with the other timers.

While the watchdog timer supports interval mode, it doesn't make as much sense
as it did with the other timers.
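
As a worked example of the interval setting described for the Zip Timer
above (a sketch only, assuming a 32--bit timer register whose high order bit
is bit~31, which is not spelled out here), the value to write for a one
second interval at 100~MHz would be
\[
2^{31} + 100{,}000{,}000
        = \mathtt{0x80000000} + \mathtt{0x05F5E100}
        = \mathtt{0x85F5E100}.
\]
Under the same assumption the count must fit in the remaining 31~bits, so
the longest interval at 100~MHz would be just under
$2^{31}/10^{8} \approx 21.47$~seconds.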

\section{Jiffies}

This peripheral is motivated by the Linux use of `jiffies' whereby a process
can request to be put to sleep until a certain number of `jiffies' have
elapsed.  Using this interface, the CPU can read the number of `jiffies'
from the peripheral (it only has the one location in address space), add the
sleep length to it, and write the result back to the peripheral.  The zipjiffies
peripheral will record the value written to it only if it is nearer the current
counter value than the currently pending interrupt time.  If no other
interrupts are waiting, and this time is in the future, it will be enabled.
(There is currently no way to disable a jiffie interrupt once set, other
than to disable the register in the interrupt controller.)  The processor
may then place this sleep request into a list among other sleep requests.
Once the timer expires, it would write the next Jiffy request to the peripheral
and wake up the process whose timer had expired.

Indeed, the Jiffies register is nothing more than a glorified counter with
an interrupt.  Unlike the other counters, the Jiffies register cannot be set.
Writes to the jiffies register create an interrupt time.  When the Jiffies
register later equals the value written to it, an interrupt will be asserted
and the register then continues counting as though no interrupt had taken
place.

The purpose of this register is to support alarm times within a CPU.  To
set an alarm for a particular process $N$ clocks in advance, read the current
Jiffies value, add $N$, and write it back to the Jiffies register.  The
O/S must also keep track of values written to the Jiffies register.  Thus,
when an `alarm' trips, it should be removed from the list of alarms, the list
should be sorted, and the next alarm in terms of Jiffies should be written
to the register.
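
The alarm arithmetic can be sketched as follows, assuming (this is not
stated above) that the Jiffies counter is 32~bits wide and wraps modulo
$2^{32}$.  The symbols $J$ (current count) and $A$ (alarm value) are names
introduced here for illustration only:
\[
A = (J + N) \bmod 2^{32}.
\]
For example, with $J = \mathtt{0xFFFFFFF0}$ and $N = \mathtt{0x20}$, the value
written back is $A = \mathtt{0x00000010}$.  The alarm is numerically smaller
than the current count, yet still 32~clocks in the future.  When sorting
pending alarms $A_1$ and $A_2$, it may therefore be safer to compare the
wrapped differences $(A_i - J) \bmod 2^{32}$ rather than the raw values.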

\chapter{Operation}\label{chap:ops}

\chapter{Registers}\label{chap:regs}

\chapter{Wishbone Datasheet}\label{chap:wishbone}
The Zip System supports two wishbone accesses, a slave debug port and a master
port for the system itself.  These are shown in Tbl.~\ref{tbl:wishbone-slave}
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Slave, Read/Write, single words only \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
Signal Names & \begin{tabular}{ll}
                Signal Name & Wishbone Equivalent \\\hline
                {\tt i\_clk} & {\tt CLK\_I} \\
                {\tt i\_dbg\_cyc} & {\tt CYC\_I} \\
                {\tt i\_dbg\_stb} & {\tt STB\_I} \\
                {\tt i\_dbg\_we} & {\tt WE\_I} \\
                {\tt i\_dbg\_addr} & {\tt ADR\_I} \\
                {\tt i\_dbg\_data} & {\tt DAT\_I} \\
                {\tt o\_dbg\_ack} & {\tt ACK\_O} \\
                {\tt o\_dbg\_stall} & {\tt STALL\_O} \\
                {\tt o\_dbg\_data} & {\tt DAT\_O}
                \end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet}\label{tbl:wishbone-slave}
\end{center}\end{table}
and Tbl.~\ref{tbl:wishbone-master} respectively.
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Master, Read/Write, sometimes pipelined \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
Signal Names & \begin{tabular}{ll}
                Signal Name & Wishbone Equivalent \\\hline
                {\tt i\_clk} & {\tt CLK\_I} \\
                {\tt o\_wb\_cyc} & {\tt CYC\_O} \\
                {\tt o\_wb\_stb} & {\tt STB\_O} \\
                {\tt o\_wb\_we} & {\tt WE\_O} \\
                {\tt o\_wb\_addr} & {\tt ADR\_O} \\
                {\tt o\_wb\_data} & {\tt DAT\_O} \\
                {\tt i\_wb\_ack} & {\tt ACK\_I} \\
                {\tt i\_wb\_stall} & {\tt STALL\_I} \\
                {\tt i\_wb\_data} & {\tt DAT\_I}
                \end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet}\label{tbl:wishbone-master}
\end{center}\end{table}
I do not recommend that you connect these together through the interconnect.

The big thing to notice is that both the real time clock and the real time
date modules act as wishbone slaves, and that all accesses to the registers of
either module are 32--bit reads and writes.  The address bus does not offer
byte level, but rather 32--bit word level resolution.  Select lines are not
implemented.  Bit ordering is the normal ordering where bit~31 is the most
significant bit and so forth.

\chapter{Clocks}\label{chap:clocks}

This core is based upon the Basys--3 design.  The Basys--3 development board
contains one external 100~MHz clock, which is sufficient to run the ZIP CPU
core.
\begin{table}[htbp]
\begin{center}
\begin{clocklist}
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
\end{clocklist}
\caption{List of Clocks}\label{tbl:clocks}
\end{center}\end{table}
I hesitate to suggest that the core can run faster than 100~MHz, since I have
struggled with various timing violations to keep it at 100~MHz.  So, for
now, I will only state that it can run at 100~MHz.

\chapter{I/O Ports}\label{chap:ioports}
The I/O ports for this clock are shown in Tbl.~\ref{tbl:iowishbone}
\begin{table}[htbp]
\begin{center}
\begin{portlist}
i\_clk & 1 & Input & System clock, used for time and wishbone interfaces.\\\hline
i\_wb\_cyc & 1 & Input & Wishbone bus cycle wire.\\\hline
i\_wb\_stb & 1 & Input & Wishbone strobe.\\\hline
i\_wb\_we & 1 & Input & Wishbone write enable.\\\hline
i\_wb\_addr & 5 & Input & Wishbone address.\\\hline
i\_wb\_data & 32 & Input & Wishbone bus data register for use when writing
        (configuring) the core from the bus.\\\hline
o\_wb\_ack & 1 & Output & Return value acknowledging a wishbone write, or
                signifying valid data in the case of a wishbone read request.
                \\\hline
o\_wb\_stall & 1 & Output & Indicates the device is not yet ready for another
                wishbone access, effectively stalling the bus.\\\hline
o\_wb\_data & 32 & Output & Wishbone data bus, returning data values read
                from the interface.\\\hline
\end{portlist}
\caption{Wishbone I/O Ports}\label{tbl:iowishbone}
\end{center}\end{table}
and~Tbl.~\ref{tbl:ioother}.
\begin{table}[htbp]
\begin{center}
\begin{portlist}
o\_sseg & 32 & Output & Lines to control a seven segment display, to be
                sent to that display's driver.  Each eight bit byte controls
                one digit in the display, with the bottom bit in the byte
                controlling the decimal point.\\\hline
o\_led & 16 & Output & Output LED's, consisting of a 16--bit counter counting
                from zero to all ones each minute, and synchronized with each
                minute so as to create an indicator of when the next minute
                will take place when only the hours and minutes can be
                displayed.\\\hline
o\_interrupt & 1 & Output & A pulsed/strobed interrupt line.  When the
                clock needs to generate an interrupt, it will set this line
                high for one clock cycle.  \\\hline
o\_ppd & 1 & Output & A `pulse per day' signal which can be fed into the
        real--time date module.  This line will be high on the clock before
        the stroke of midnight, allowing the date module to turn over to the
        next day at exactly the same time the clock module turns over to the
        next day.\\\hline
i\_hack & 1 & Input & When this line is raised, copies are made of the
        internal state registers on the next clock.  These registers can then
        be used for an accurate time hack regarding the state of the clock
        at the time this line was strobed.\\\hline
\end{portlist}
\caption{Other I/O Ports}\label{tbl:ioother}
\end{center}\end{table}
Tbl.~\ref{tbl:iowishbone} reiterates the wishbone I/O values just discussed in
Chapt.~\ref{chap:wishbone}, and so needs no further discussion here.


% Appendices
% Index
\end{document}