%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Filename:    spec.tex
%%
%% Project:     Zip CPU -- a small, lightweight, RISC CPU soft core
%%
%% Purpose:     This LaTeX file contains all of the documentation/description
%%              currently provided with this Zip CPU soft core.  It supersedes
%%              any information about the instruction set or CPUs found
%%              elsewhere.  It's not nearly as interesting, though, as the PDF
%%              file it creates, so I'd recommend reading that before diving
%%              into this file.  You should be able to find the PDF file in
%%              the SVN distribution together with this LaTeX file and a copy of
%%              the GPL-3.0 license this file is distributed under.  If not,
%%              just type 'make' in the doc directory and it (should) build
%%              without a problem.
%%
%%
%% Creator:     Dan Gisselquist
%%              Gisselquist Technology, LLC
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%% Copyright (C) 2015, Gisselquist Technology, LLC
%%
%% This program is free software (firmware): you can redistribute it and/or
%% modify it under the terms of the GNU General Public License as published
%% by the Free Software Foundation, either version 3 of the License, or (at
%% your option) any later version.
%%
%% This program is distributed in the hope that it will be useful, but WITHOUT
%% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
%% for more details.
%%
%% You should have received a copy of the GNU General Public License along
%% with this program.  (It's in the $(ROOT)/doc directory, run make with no
%% target there if the PDF file isn't present.)  If not, see
%% <http://www.gnu.org/licenses/> for a copy.
%%
%% License:     GPL, v3, as defined and found on www.gnu.org,
%%              http://www.gnu.org/licenses/gpl.html
%%
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{gqtekspec}
\project{Zip CPU}
\title{Specification}
\author{Dan Gisselquist, Ph.D.}
\email{dgisselq (at) opencores.org}
\revision{Rev.~0.1}
\begin{document}
\pagestyle{gqtekspecplain}
\titlepage
\begin{license}
Copyright (C) \theyear\today, Gisselquist Technology, LLC

This project is free software (firmware): you can redistribute it and/or
modify it under the terms of the GNU General Public License as published
by the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program.  If not, see \hbox{<http://www.gnu.org/licenses/>} for a
copy.
\end{license}
\begin{revisionhistory}
0.1 & 8/17/2015 & Gisselquist & Incomplete First Draft \\\hline
\end{revisionhistory}
% Revision History
% Table of Contents, named Contents
\tableofcontents
% \listoffigures
\listoftables
\begin{preface}
Many people have asked me why I am building the Zip CPU.  ARM processors are
good and effective.  Xilinx makes and markets Microblaze, Altera Nios, and both
have better toolsets than the Zip CPU will ever have.  OpenRISC is also
available.  Why build a new processor?

The easiest, most obvious answer is the simple one: Because I can.

There's more to it, though.  There's a lot that I would like to do with a
processor, and I want to be able to do it in a vendor-independent fashion.
I would like to be able to generate Verilog code that can run equivalently
on both Xilinx and Altera chips, and that can be easily ported from one
manufacturer's chipsets to another.  Even more, before purchasing a chip or a
board, I would like to know that my design works.  I would like to build a test
bench to test components with, and Verilator is my chosen test bench.  This
forces me to use all Verilog, and it prevents me from using any proprietary
cores.  For this reason, Microblaze and Nios are out of the question.

Why not OpenRISC?  That's a hard question.  The OpenRISC team has done some
wonderful work on an amazing processor, and I'll have to admit that I am
envious of what they've accomplished.  I would like to port binutils to the
Zip CPU, as I would like to port GCC and GDB.  They are way ahead of me.  The
OpenRISC processor, however, is complex and hefty at about 4,500 LUTs.  It has
a lot of features of modern CPUs within it that ... well, let's just say it's
not the little guy on the block.  The Zip CPU is lighter weight, costing only
about 2,000 LUTs with no peripherals, and 3,000 LUTs with some very basic
peripherals.

My final reason is that I'm building the Zip CPU as a learning experience.  The
Zip CPU has allowed me to learn a lot about how CPUs work on a very micro
level.  For the first time, I am beginning to understand many of the Computer
Architecture lessons from years ago.

To summarize: Because I can, because it is open source, because it is
lightweight, and as an exercise in learning.

\end{preface}

\chapter{Introduction}
\pagenumbering{arabic}
\setcounter{page}{1}


The original goal of the Zip CPU was to be a very simple CPU.  You might
think of it as a poor man's alternative to the OpenRISC architecture.
For this reason, all instructions have been designed to be as simple as
possible, and all are designed to execute in one clock cycle per
instruction, barring pipeline stalls.  Indeed, even the bus has been simplified
to a constant 32-bit width, with no option for more or less.  This has
resulted in the choice to drop push and pop instructions, pre-increment and
post-decrement addressing modes, and more.

For those who like buzz words, the Zip CPU is:
\begin{itemize}
\item A 32-bit CPU: All registers are 32 bits wide, addresses are 32 bits,
	instructions are 32 bits wide, etc.
\item A RISC CPU.  There is no microcode for executing instructions.
\item A Load/Store architecture.  (Only load and store instructions
	can access memory.)
\item Wishbone compliant.  All peripherals are accessed just like
	memory across this bus.
\item A Von Neumann architecture.  (The instructions and data share a
	common bus.)
\item A pipelined architecture, having stages for {\bf Prefetch},
	{\bf Decode}, {\bf Read-Operand}, the {\bf ALU/Memory}
	unit, and {\bf Write-back}.
\item Completely open source, licensed under the GPL.\footnote{Should you
	need a copy of the Zip CPU licensed under other terms, please
	contact me.}
\end{itemize}

Now, however, that I've worked on the Zip CPU for a while, it is not nearly
as simple as I originally hoped.  Worse, I've had to adjust to create
capabilities that I was never expecting to need.  These include:
\begin{itemize}
\item {\bf External Debug:} Once placed upon an FPGA, some external means is
	still necessary to debug this CPU.  That means that there needs to be
	an external register that can control the CPU: reset it, halt it, step
	it, and tell whether it is running or not.  A second register sits
	alongside this one, to allow the external controller to examine
	registers internal to the CPU.

\item {\bf Internal Debug:} Being able to run a debugger from within
	a user process requires an ability to step a user process from
	within a debugger.  It also requires a break instruction that can
	be substituted for any other instruction, and substituted back.
	The break is actually difficult: the break instruction cannot be
	allowed to execute.  That way, upon a break, the debugger is
	able to jump back into the user process to step the instruction
	that would've been at the break point initially, and then to
	replace the break after passing it.

\item {\bf Prefetch Cache:} My original implementation had a very
	simple prefetch stage.  Any time the PC changed, the prefetch would go
	and fetch the new instruction.  While this was perhaps the simplest
	approach, it cost roughly five clocks for every instruction.  This
	was deemed unacceptable, as I wanted a CPU that could execute
	instructions in one cycle.  I therefore have a prefetch cache that
	issues pipelined Wishbone accesses to memory and then pushes
	instructions at the CPU.  Sadly, this accounts for about 20\% of the
	logic in the entire CPU, or 15\% of the logic in the entire system.


\item {\bf Operating System:} In order to support an operating system,
	interrupts and so forth, the CPU needs to support supervisor and
	user modes, as well as a means of switching between them.  For example,
	the user needs a means of executing a system call.  This is the
	purpose of the {\bf `trap'} instruction.  This instruction needs to
	place the CPU into supervisor mode (here equivalent to disabling
	interrupts), as well as hand it a parameter, such as one identifying
	which O/S function was called.

	My initial approach to building a trap instruction was to create
	an external peripheral which, when written to, would generate an
	interrupt and could return the last value written to it.  This failed
	timing requirements, however: the CPU executed two instructions while
	waiting for the trap interrupt to take place.  Since then, I've
	decided to keep the rest of the CC register for that purpose so that a
	write to the CC register, with the GIE bit cleared, could be used to
	execute a trap.

	Modern timesharing systems also depend upon a {\bf Timer} interrupt
	to handle task swapping.  For the Zip CPU, this interrupt is handled
	external to the CPU as part of the CPU System, found in
	{\tt zipsystem.v}.  The timer module itself is found in
	{\tt ziptimer.v}.

\item {\bf Pipeline Stalls:} My original plan was to not support pipeline
	stalls at all, but rather to require the compiler to properly schedule
	instructions so that stalls would never be necessary.  After trying
	to build such an architecture, I gave up, having learned some things:

	For example, in order to facilitate interrupt handling and debug
	stepping, the CPU needs to know which instructions have finished, and
	which have not.  In other words, it needs to know where it can restart
	the pipeline from.  Once restarted, it must act as though it had
	never stopped.  This killed my idea of delayed branching, since
	what would be the appropriate program counter to restart at?
	The one the CPU was going to branch to, or the ones in the
	delay slots?

	So I switched to a model of discrete execution: Once an instruction
	enters into either the ALU or memory unit, the instruction is
	guaranteed to complete.  If the logic recognizes a branch or a
	condition that would render the instruction entering into this stage
	possibly inappropriate (e.g., a conditional branch preceding a store
	instruction), then the pipeline stalls for one cycle
	until the conditional branch completes.  Then, if it generates a new
	PC address, the preceding stages are all wiped clean.

	The discrete execution model allows such things as sleeping: if the
	CPU is put to ``sleep'', the ALU and memory stages stall and back up
	everything before them.  Likewise, anything that has entered the ALU
	or memory stage when the CPU is placed to sleep continues to completion.
	To handle this logic, each pipeline stage has three control signals:
	a valid signal, a stall signal, and a clock enable signal.  In
	general, a stage stalls if its contents are valid and the next stage
	is stalled.  This allows the pipeline to fill any time a later stage
	stalls.

\item {\bf Verilog Modules:} When examining how other processors worked
	here on OpenCores, many of them had one separate module per pipeline
	stage.  While this appeared to me to be a fascinating and commendable
	idea, my own implementation didn't work out quite so nicely.

	As an example, the decode module produces a {\em lot} of
	control wires and registers.  Creating a module out of this, with
	only the simplest of logic within it, seemed to be more a lesson
	in passing wires around, rather than encapsulating logic.

	Another example was the register writeback section.  I would love
	this section to be a module in its own right, and many have made them
	such.  However, other modules depend upon writeback results other
	than just what's placed in the register (i.e., the control wires).
	For these reasons, I didn't manage to fit this section into its
	own module.

	The result is that the majority of the CPU code can be found in
	the {\tt zipcpu.v} file.
\end{itemize}
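
The stall rule described under {\bf Pipeline Stalls} can be sketched in a few
lines of Python.  This is only an illustrative model, not code taken from
{\tt zipcpu.v}: the function name, the list representation of the stages, and
the assumption that the stall of the last stage is supplied from outside (by a
sleeping CPU or a busy memory unit, say) are all hypothetical.

```python
def stall_signals(valid, final_stall):
    """Compute one stall flag per pipeline stage, earliest stage first.

    valid[i] is True when stage i currently holds an instruction.
    final_stall is the (assumed, externally supplied) stall of the last
    stage.  Every earlier stage follows the rule from the text: it stalls
    only if its contents are valid AND the next stage is stalled.
    """
    stalls = [False] * len(valid)
    stalls[-1] = final_stall
    # Walk from the back of the pipeline toward the front.
    for i in range(len(valid) - 2, -1, -1):
        stalls[i] = valid[i] and stalls[i + 1]
    return stalls


# Stages: prefetch, decode, read-operand, ALU/memory.
# The decode stage is empty, so prefetch is not stalled and may advance.
print(stall_signals([True, False, True, True], True))
```

With an empty decode stage, the prefetch stage remains free to advance even
though the later stages are stalled, which is how the pipeline fills while a
later stage is waiting.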

With that introduction out of the way, let's move on to the instruction
set.

\chapter{CPU Architecture}\label{chap:arch}

The Zip CPU supports a set of two-operand instructions, where the first operand
(always a register) is the result.  The only exception is the store instruction,
where the first operand (always a register) is the source of the data to be
stored.

\section{Register Set}
The Zip CPU supports two sets of sixteen 32-bit registers, a supervisor
and a user set.  The supervisor set is used in interrupt mode, whereas
the user set is used otherwise.  Of this register set, the Program Counter (PC)
is register 15, whereas the status register (SR) or condition code register
(CC) is register 14.  By convention, the stack pointer will be register 13 and
noted as (SP), although the instruction set allows it to be any register.
The CPU can access both register sets via move instructions from the
supervisor state, whereas the user state can only access the user registers.

The status register is special, and bears further mention.  The lower
8 bits of the status register form a set of condition codes.  Writes to other
bits are preserved, and can be used as part of the trap architecture: examined
by the O/S upon any interrupt, and cleared before returning.

Of the eight condition codes, the bottom four are the current flags:
		Zero (Z),
		Carry (C),
		Negative (N),
		and Overflow (V).

The next bit is a clock enable (0 to enable) or sleep bit (1 to put
	the CPU to sleep).  Setting this bit will cause the CPU to
	wait for an interrupt (if interrupts are enabled), or to
	completely halt (if interrupts are disabled).
The sixth bit is a global interrupt enable bit (GIE).  When this
	sixth bit is a `1', interrupts will be enabled, else disabled.  When
	interrupts are disabled, the CPU will be in supervisor mode, otherwise
	it is in user mode.  Thus, to execute a context switch, one need
	only enable or disable interrupts.  (When an interrupt line goes
	high, interrupts will automatically be disabled, as the CPU goes
	and deals with its context switch.)

The seventh bit is a step bit.  This bit can be
	set from supervisor mode only.  After setting this bit, should
	the supervisor mode process switch to user mode, it would then
	accomplish one instruction in user mode before returning to supervisor
	mode.  Then, upon return to supervisor mode, this bit will
	be automatically cleared.  This bit has no effect on the CPU while in
	supervisor mode.

	This functionality was added to enable userspace debugger
	functionality on a user process, working through supervisor mode
	of course.


The eighth bit is a break enable bit.  This
	controls whether a break instruction will halt the processor for an
	external debugger (break enabled), or whether the break instruction
	will simply set the STEP bit and send the CPU into interrupt mode.
	This bit can only be set within supervisor mode.

	This functionality was added to enable an external debugger to
	set and manage breakpoints.

The ninth bit is reserved for a floating point enable bit.  When set, the
arithmetic for the next instruction will be sent to a floating point unit.
Such a unit may later be added as an extension to the Zip CPU.  If the
CPU does not support floating point instructions, this bit will never be set.

The tenth bit is a trap bit.  It is set whenever the user requests a soft
interrupt, and cleared on any return to userspace command.  This allows the
supervisor, in supervisor mode, to determine whether it got to supervisor
mode from a trap or from an external interrupt or both.

The status register bits are shown below:
\begin{table}
\begin{center}
\begin{tabular}{l|l}
Bit & Meaning \\\hline
9 & Soft trap, set on a trap from user mode, cleared when returning to user mode\\\hline
8 & (Reserved for) Floating point enable \\\hline
7 & Halt on break, to support an external debugger \\\hline
6 & Step, single step the CPU in user mode\\\hline
5 & GIE, or Global Interrupt Enable \\\hline
4 & Sleep \\\hline
3 & V, or overflow bit.\\\hline
2 & N, or negative bit.\\\hline
1 & C, or carry bit.\\\hline
0 & Z, or zero bit.\\\hline
\end{tabular}
\end{center}
\end{table}
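
As a rough illustration of how software might pick these fields apart, here is
a small Python sketch.  It is not part of the Zip CPU sources; the constant and
function names are hypothetical, and Z is taken to be bit 0, as the ordering of
the four flags in the text implies.

```python
# Bit positions follow the status register description above.
# All names here are illustrative, not taken from the Zip CPU sources.
CC_BITS = {
    "Z": 0,         # zero flag (assumed bit 0, per the prose ordering)
    "C": 1,         # carry flag
    "N": 2,         # negative flag
    "V": 3,         # overflow flag
    "SLEEP": 4,     # 1 puts the CPU to sleep
    "GIE": 5,       # global interrupt enable; 1 means user mode
    "STEP": 6,      # single step one user-mode instruction
    "BREAK_EN": 7,  # halt on break for an external debugger
    "FP_EN": 8,     # reserved: floating point enable
    "TRAP": 9,      # soft trap requested from user mode
}


def decode_cc(value):
    """Return a dict mapping each field name to True or False."""
    return {name: bool((value >> bit) & 1) for name, bit in CC_BITS.items()}


def cpu_mode(value):
    """User mode when GIE is set, supervisor mode otherwise."""
    return "user" if (value >> CC_BITS["GIE"]) & 1 else "supervisor"


flags = decode_cc(0x21)   # Z and GIE set
print(flags["Z"], flags["GIE"], cpu_mode(0x21))
```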
\section{Conditional Instructions}
Most, although not quite all, instructions are conditionally executed.  From
the four condition code flags, eight conditions are defined.  These are shown
in Tbl.~\ref{tbl:conditions}.
\begin{table}
\begin{center}
\begin{tabular}{l|l|l}
Code & Mnemonic & Condition \\\hline
3'h0 & None & Always execute the instruction \\
3'h1 & {\tt .Z} & Only execute when `Z' is set \\
3'h2 & {\tt .NE} & Only execute when `Z' is not set \\
3'h3 & {\tt .GE} & Greater than or equal (`N' not set, `Z' irrelevant) \\
3'h4 & {\tt .GT} & Greater than (`N' not set, `Z' not set) \\
3'h5 & {\tt .LT} & Less than (`N' set) \\
3'h6 & {\tt .C} & Carry set\\
3'h7 & {\tt .V} & Overflow set\\
\end{tabular}
\caption{Conditions for conditional operand execution}\label{tbl:conditions}
\end{center}
\end{table}
There is no condition code for less than or equal, not C or not V.  Testing
for these conditions will take an extra instruction.
(Ex: \hbox{\tt TST \$4,CC;} \hbox{\tt STO.NZ R0,(R1)})
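
The condition table can be read as a small truth function over the four flags.
The Python sketch below is one hypothetical way to express it; the function
name is invented for illustration, the flags are assumed to sit in bits 0
through 3 of the CC register in the order Z, C, N, V, and {\tt .LT} is assumed
to test for `N' set, the complement of {\tt .GE}.

```python
def condition_met(code, cc):
    """Decide whether an instruction with 3-bit condition `code` executes.

    cc is the CC register value; only its low four bits are examined.
    """
    z = bool(cc & 0x1)   # zero
    c = bool(cc & 0x2)   # carry
    n = bool(cc & 0x4)   # negative
    v = bool(cc & 0x8)   # overflow
    table = {
        0: True,                  # always
        1: z,                     # .Z
        2: not z,                 # .NE
        3: not n,                 # .GE
        4: (not n) and (not z),   # .GT
        5: n,                     # .LT (assumed: N set)
        6: c,                     # .C
        7: v,                     # .V
    }
    return table[code & 0x7]


# After a compare that produced zero (Z set): .Z executes, .GT does not.
print(condition_met(1, 0b0001), condition_met(4, 0b0001))
```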

\section{Operand B}
Many instruction forms have a 21-bit source ``Operand B'' associated with them.
This Operand B is either equal to a register plus a signed immediate offset,
or an immediate offset by itself.  This value is encoded as shown in
Tbl.~\ref{tbl:opb}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|}\hline
Bit 20 & 19 \ldots 16 & 15 \ldots 0 \\\hline
1'b0 & \multicolumn{2}{l|}{20-bit Signed Immediate value} \\\hline
1'b1 & 4-bit Register & 16-bit Signed immediate offset \\\hline
\end{tabular}
\caption{Bit allocation for Operand B}\label{tbl:opb}
\end{center}\end{table}
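
To make the encoding concrete, the following Python sketch decodes a 21-bit
Operand B field into a register number and a signed immediate.  The function
names are hypothetical, and the sketch assumes that when bit 20 is clear the
immediate occupies all of bits 19 through 0, as the column span in the table
suggests.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_operand_b(field):
    """Split a 21-bit Operand B field into (register, immediate).

    register is None for the immediate-only form (bit 20 clear).
    """
    field &= 0x1FFFFF                      # keep only the 21 encoded bits
    if field & (1 << 20):
        register = (field >> 16) & 0xF     # bits 19..16
        offset = sign_extend(field, 16)    # bits 15..0
        return register, offset
    return None, sign_extend(field, 20)    # bits 19..0


# Register 13 with an offset of -4, e.g. an access just below the stack pointer.
print(decode_operand_b(0x1DFFFC))
```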
\section{Address Modes}
The Zip CPU supports two addressing modes: register plus immediate, and
immediate address.  Addresses are therefore encoded in the same fashion as
Operand B, shown above.

A lot of long hard thought was put into whether to allow pre/post increment
and decrement addressing modes.  Finding no way to use these operators without
taking two or more clocks per instruction, these addressing modes have been
removed from the realm of possibilities.  This means that the Zip CPU has no
native way of executing push, pop, return, or jump to subroutine operations.

\section{Move Operands}
The previous set of operands would be perfect and complete, save only that
the CPU needs access to non-supervisory registers while in supervisory
mode.  Therefore, the MOV instruction is special and offers access
to these registers ... when in supervisory mode.  To keep the compiler
simple, the extra bits are ignored in non-supervisory mode (as though
they didn't exist), rather than being mapped to new instructions or
additional capabilities.  The bits indicating which register set each
register lies within are the A-Usr and B-Usr bits.  When set to a one,
these refer to a user mode register.  When set to a zero, these refer
to a register in the current mode, whether user or supervisor.
Further, because
a load immediate instruction exists, there is no move capability between
an immediate and a register: all moves come from either a register or
a register plus an offset.

This actually leads to a bit of a problem: since the MOV instruction
encodes which register set each register is coming from or moving to,
how shall a compiler or assembler know how to compile a MOV instruction
without knowing the mode of the CPU at the time?  For this reason,
the compiler will assume all MOV registers are supervisor registers,
and display them as normal.  Anything with the user bit set will
be treated as a user register.  The CPU will quietly ignore the
supervisor bits while in user mode, and anything marked as a user
register will always be valid.

\section{Multiply Operations}
While the Zip CPU instruction set defines multiply operations, they are not
yet fully supported by the CPU.  Two multiply operations are defined, a
16x16 bit signed multiply (MPYS) and the same but unsigned (MPYU).  In both
cases, the operand is a register plus a 16-bit immediate, subject to the
rule that the register cannot be the PC or CC registers.  The PC register
field has been stolen to create a multiply by immediate instruction.  The
CC register field is reserved.
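
The arithmetic of the two operations can be sketched in Python as follows.
This is a hedged illustration only: the text does not say which 16 bits of each
32-bit operand take part, so the sketch assumes the low 16 bits of both
operands and a 32-bit result, and the function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    return (value & 0x7FFF) - (value & 0x8000)


def mpyu(a, b):
    """Unsigned 16x16 multiply of the (assumed) low halves, 32-bit result."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF


def mpys(a, b):
    """Signed 16x16 multiply of the (assumed) low halves, 32-bit result."""
    return (sign_extend16(a) * sign_extend16(b)) & 0xFFFFFFFF


# 0xFFFF is 65535 when unsigned but -1 when signed.
print(hex(mpyu(0xFFFF, 0xFFFF)), hex(mpys(0xFFFF, 0xFFFF)))
```

The same bit pattern therefore yields very different products under the two
instructions, which is why both a signed and an unsigned form are provided.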

\section{Floating Point}
The Zip CPU does not support floating point operations today.  However, the
instruction set reserves a capability for a floating point operation.  To
execute such an operation, simply set the floating point bit in the CC
register and the following instruction will interpret its register operands
as floating point values.  The immediate fields do not apply in floating
point mode, and must be set to zero.  Not all instructions make sense as
floating point operations.  Therefore, only the CMP, SUB, ADD, and MPY
instructions may be issued as floating point instructions.  Other instructions
allow the examining of the floating point bit in the CC register.  In all
cases, the floating point bit is cleared one instruction after it is set.

The architecture does not support a floating point not-implemented interrupt.
Any soft floating point emulation must be done deliberately.

\section{Native Instructions}
The instruction set for the Zip CPU is summarized in
Tbl.~\ref{tbl:zip-instructions}.
\begin{table}\begin{center}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|l|c|}\hline
Op Code & \multicolumn{8}{c|}{31\ldots24} & \multicolumn{8}{c|}{23\ldots 16}
	& \multicolumn{8}{c|}{15\ldots 8} & \multicolumn{8}{c|}{7\ldots 0}
	& Sets CC? \\\hline
CMP(Sub) & \multicolumn{4}{l|}{4'h0}
	& \multicolumn{4}{l|}{D. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B}
	& Yes \\\hline
BTST(And) & \multicolumn{4}{l|}{4'h1}
	& \multicolumn{4}{l|}{D. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B}
	& Yes \\\hline
MOV & \multicolumn{4}{l|}{4'h2}
	& \multicolumn{4}{l|}{D. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& A-Usr
	& \multicolumn{4}{l|}{B-Reg}
	& B-Usr
	& \multicolumn{15}{l|}{15-bit signed offset}
	& \\\hline
LODI & \multicolumn{4}{l|}{4'h3}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{24}{l|}{24-bit Signed Immediate}
	& \\\hline
NOOP & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{4'he}
	& \multicolumn{24}{l|}{24'h00}
	& \\\hline
BREAK & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{4'he}
	& \multicolumn{24}{l|}{24'h01}
	& \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{4'he}
	& \multicolumn{24}{l|}{24 bits, but not 0 or 1.}
	& \\\hline
LODIHI & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{4'hf}
	& \multicolumn{3}{l|}{Cond.}
	& 1'b1
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{16}{l|}{16-bit Immediate}
	& \\\hline
LODILO & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{4'hf}
	& \multicolumn{3}{l|}{Cond.}
	& 1'b0
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{16}{l|}{16-bit Immediate}
	& \\\hline
16-b MPYU & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& 1'b0 & \multicolumn{4}{l|}{Reg}
	& \multicolumn{16}{l|}{16-bit Offset}
	& Yes \\\hline
16-b MPYU(I) & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& 1'b0 & \multicolumn{4}{l|}{4'hf}
	& \multicolumn{16}{l|}{16-bit Offset}
	& Yes \\\hline
16-b MPYS & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& 1'b1 & \multicolumn{4}{l|}{Reg}
	& \multicolumn{16}{l|}{16-bit Offset}
	& Yes \\\hline
16-b MPYS(I) & \multicolumn{4}{l|}{4'h4}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& 1'b1 & \multicolumn{4}{l|}{4'hf}
	& \multicolumn{16}{l|}{16-bit Offset}
	& Yes \\\hline
ROL & \multicolumn{4}{l|}{4'h5}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B, truncated to low order 5 bits}
	& \\\hline
LOD & \multicolumn{4}{l|}{4'h6}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B address}
	& \\\hline
STO & \multicolumn{4}{l|}{4'h7}
	& \multicolumn{4}{l|}{D. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B address}
	& \\\hline
{\em Rsrd} & \multicolumn{4}{l|}{4'h8}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& 1'b0
	& \multicolumn{20}{l|}{Reserved}
	& Yes \\\hline
SUB & \multicolumn{4}{l|}{4'h8}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& 1'b1
	& \multicolumn{4}{l|}{Reg}
	& \multicolumn{16}{l|}{16-bit signed offset}
	& Yes \\\hline
AND & \multicolumn{4}{l|}{4'h9}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B}
	& Yes \\\hline
ADD & \multicolumn{4}{l|}{4'ha}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B}
	& Yes \\\hline
OR & \multicolumn{4}{l|}{4'hb}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B}
	& Yes \\\hline
XOR & \multicolumn{4}{l|}{4'hc}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B}
	& Yes \\\hline
LSL/ASL & \multicolumn{4}{l|}{4'hd}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
	& Yes \\\hline
ASR & \multicolumn{4}{l|}{4'he}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
	& Yes \\\hline
LSR & \multicolumn{4}{l|}{4'hf}
	& \multicolumn{4}{l|}{R. Reg}
	& \multicolumn{3}{l|}{Cond.}
	& \multicolumn{21}{l|}{Operand B, imm. truncated to 6 bits}
|
| 595 |
|
|
& Yes \\\hline
|
| 596 |
|
|
\end{tabular}
|
| 597 |
|
|
\caption{Zip CPU Instruction Set}\label{tbl:zip-instructions}
|
| 598 |
|
|
\end{center}\end{table}

As you can see, there's lots of room for instruction set expansion.  The
NOOP and BREAK instructions leave 24~bits of open instruction
space, less the two encodings that NOOP and BREAK occupy themselves.  The SUB
instruction leaves half of its space open, since a subtract immediate is the
same as an add with a negated immediate.
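
As a rough illustration of the field widths used in Tbl.~\ref{tbl:zip-instructions}, the following Python sketch splits a 32--bit instruction word into a 4--bit opcode, a 4--bit register, a 3--bit condition, and a 21--bit Operand~B.  It assumes the fields are packed from the most significant bit downward in the order of the table's columns, which matches the column widths shown but should be checked against the core itself.  The names {\tt ZipFields} and {\tt split\_instruction} are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class ZipFields(NamedTuple):
    """Generic fields of one instruction word (layout is an assumption)."""
    opcode: int     # 4 bits
    reg: int        # 4 bits
    cond: int       # 3 bits
    operand_b: int  # 21 bits


def split_instruction(word: int) -> ZipFields:
    """Split a 32-bit word, assuming MSB-first packing of the table columns."""
    word &= 0xFFFFFFFF
    return ZipFields(
        opcode=(word >> 28) & 0xF,
        reg=(word >> 24) & 0xF,
        cond=(word >> 21) & 0x7,
        operand_b=word & 0x1FFFFF,
    )


# Opcode 4'ha is ADD in the table; the other field values are arbitrary.
print(split_instruction(0xA1200005))
```

Opcodes that subdivide Operand~B further, such as SUB with its 1--bit selector, 4--bit register, and 16--bit offset, would need a second decoding step on the {\tt operand\_b} field.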

\section{Derived Instructions}
The ZIP CPU supports many other common instructions, but not all of them
are single instructions.  The derived instruction tables,
Tbls.~\ref{tbl:derived-1}, \ref{tbl:derived-2}, and~\ref{tbl:derived-3},
help to capture some of how these other instructions may be implemented on
the ZIP CPU.  Many of these instructions will have assembly equivalents,
such as the branch instructions, to facilitate working with the CPU.
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual & Notes \\\hline
\parbox[t]{1.4in}{ADD Ra,Rx\\ADDC Rb,Ry}
	& \parbox[t]{1.5in}{Add Ra,Rx\\ADD.C \$1,Ry\\Add Rb,Ry}
	& Add with carry \\\hline
BRA.Cond +/-\$Addr
	& Mov.cond \$Addr+PC,PC
	& Branch or jump on condition.  Works for 14 bit
		address offsets.\\\hline
BRA.Cond +/-\$Addr
	& \parbox[t]{1.5in}{LDI \$Addr,Rx \\ ADD.cond Rx,PC}
	& Branch/jump on condition.  Works for
	23 bit address offsets, but costs a register, an extra instruction,
	and sets the flags. \\\hline
BNC PC+\$Addr
	& \parbox[t]{1.5in}{Test \$Carry,CC \\ MOV.Z PC+\$Addr,PC}
	& Example of a branch on an unsupported
		condition, in this case a branch on not carry \\\hline
BUSY & MOV \$-1(PC),PC & Execute an infinite loop \\\hline
CLRF.NZ Rx
	& XOR.NZ Rx,Rx
	& Clear Rx, and flags, if the Z-bit is not set \\\hline
CLR Rx
	& LDI \$0,Rx
	& Clears Rx, leaves flags untouched.  This instruction cannot be
		conditional. \\\hline
EXCH.W Rx
	& ROL \$16,Rx
	& Exchanges the top and bottom 16-bit words of Rx \\\hline
HALT
	& Or \$SLEEP,CC
	& Executed while in interrupt mode.  In user mode this is simply a
	wait until interrupt instruction. \\\hline
INT & LDI \$0,CC
	& Since we're using the CC register as a trap vector as well, this
	executes TRAP \#0. \\\hline
IRET
	& OR \$GIE,CC
	& Also an RTU instruction (Return to Userspace) \\\hline
JMP R6+\$Addr
	& MOV \$Addr(R6),PC
	& \\\hline
JSR PC+\$Addr
	& \parbox[t]{1.5in}{SUB \$1,SP \\
	MOV \$3+PC,R0 \\
	STO R0,1(SP) \\
	MOV \$Addr+PC,PC \\
	ADD \$1,SP}
	& Jump to Subroutine. \\\hline
JSR PC+\$Addr
	& \parbox[t]{1.5in}{MOV \$3+PC,R12 \\ MOV \$addr+PC,PC}
	& This is the high speed
	version of a subroutine call, necessitating a register to hold the
	last PC address.  In its favor, this method doesn't suffer the
	mandatory memory access of the other approach. \\\hline
LDI.l \$val,Rx
	& \parbox[t]{1.5in}{LDIHI (\$val$>>$16)\&0x0ffff, Rx \\
		LDILO (\$val \& 0x0ffff), Rx}
	& Sadly, there's not enough instruction
	space to load a complete immediate value into any register.
	Therefore, fully loading any register takes two cycles.
	The LDIHI (load immediate high) and LDILO (load immediate low)
	instructions have been created to facilitate this. \\\hline
\end{tabular}
\caption{Derived Instructions}\label{tbl:derived-1}
\end{center}\end{table}
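The LDI.l row of Tbl.~\ref{tbl:derived-1} splits a 32--bit constant into two 16--bit halves.  A minimal Python sketch of that arithmetic follows.  It assumes that LDIHI replaces only the upper 16~bits of the register and that LDILO replaces only the lower 16~bits; the table does not spell this out, so treat that behavior, and the helper names, as assumptions.

```python
MASK32 = 0xFFFFFFFF


def split_immediate(val):
    """Return the (high, low) 16-bit halves used by LDIHI and LDILO."""
    val &= MASK32
    return (val >> 16) & 0xFFFF, val & 0xFFFF


def load_long_immediate(val, rx=0):
    """Model LDIHI followed by LDILO on a register holding rx.

    Assumption: each instruction touches only its own half of the register.
    """
    high, low = split_immediate(val)
    rx = ((high << 16) | (rx & 0xFFFF)) & MASK32  # LDIHI high,Rx
    rx = (rx & 0xFFFF0000) | low                  # LDILO low,Rx
    return rx


high, low = split_immediate(0xDEADBEEF)
print(hex(high), hex(low))
print(hex(load_long_immediate(0xDEADBEEF, rx=0x12345678)))
```

Whatever the register held beforehand, both halves are overwritten, which is why the pair behaves like a single 32--bit load.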
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual & Notes \\\hline
LOD.b \$addr,Rx
	& \parbox[t]{1.5in}{%
	LDI \$addr,Ra \\
	LDI \$addr,Rb \\
	LSR \$2,Ra \\
	AND \$3,Rb \\
	LOD (Ra),Rx \\
	LSL \$3,Rb \\
	SUB \$32,Rb \\
	ROL Rb,Rx \\
	AND \$0ffh,Rx}
	& \parbox[t]{3in}{This CPU is designed for 32-bit word
	length instructions.  Byte addressing is not supported by the CPU or
	the bus, so it therefore takes more work to do.

	Note also that in this example, \$Addr is a byte-wise address, where
	all other addresses are 32-bit wordlength addresses.  For this reason,
	we needed to drop the bottom two bits.  This also limits the address
	space of character accesses using this method from 16~MB down to 4~MB.}
	\\\hline
\parbox[t]{1.5in}{LSL \$1,Rx\\ LSLC \$1,Ry}
	& \parbox[t]{1.5in}{LSL \$1,Ry \\
	LSL \$1,Rx \\
	OR.C \$1,Ry}
	& Logical shift left with carry.  Note that the
	instruction order is now backwards, to keep the conditions valid.
	That is, LSL sets the carry flag, so if we did this the other way
	with Rx before Ry, then the condition flag wouldn't have been right
	for an OR correction at the end. \\\hline
\parbox[t]{1.5in}{LSR \$1,Rx \\ LSRC \$1,Ry}
	& \parbox[t]{1.5in}{CLR Rz \\
	LSR \$1,Ry \\
	LDIHI.C \$8000h,Rz \\
	LSR \$1,Rx \\
	OR Rz,Rx}
	& Logical shift right with carry \\\hline
NEG Rx & \parbox[t]{1.5in}{XOR \$-1,Rx \\ ADD \$1,Rx} & \\\hline
NOOP & NOOP & While there are many
	operations that do nothing, such as MOV Rx,Rx, or OR \$0,Rx, these
	operations have consequences in that they might stall the bus if
	Rx isn't ready yet.  For this reason, we have a dedicated NOOP
	instruction. \\\hline
NOT Rx & XOR \$-1,Rx & \\\hline
POP Rx
	& \parbox[t]{1.5in}{LOD \$-1(SP),Rx \\ ADD \$1,SP}
	& Note
	that for interrupt purposes, one can never depend upon the value at
	(SP).  Hence you read from it, then increment it, lest having
	incremented it first, something then comes along and writes to that
	value before you can read the result. \\\hline
PUSH Rx
	& \parbox[t]{1.5in}{SUB \$1,SP \\
	STO Rx,\$1(SP)}
	& \\\hline
RESET
	& \parbox[t]{1in}{STO \$1,\$watchdog(R12)\\NOOP\\NOOP}
	& \parbox[t]{3in}{This depends upon the peripheral base address being
	in R12.

	Another opportunity might be to jump to the reset address from within
	supervisor mode.}\\\hline
RET & \parbox[t]{1.5in}{LOD \$-1(SP),R0 \\
	MOV \$-1+SP,SP \\
	MOV R0,PC}
	& An alternative might be to LOD \$-1(SP),PC, followed
	by depending upon the calling program to ADD \$1,SP. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-2}
\end{center}\end{table}
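The LOD.b row of Tbl.~\ref{tbl:derived-2} emulates a byte load on a word--addressed bus: shift the byte address right by two to get the word address, keep the bottom two bits as a byte lane, then rotate and mask.  The Python sketch below models the same idea.  It assumes big--endian lane numbering (lane~0 is the most significant byte), and its rotate amount is chosen to suit that assumption rather than transcribed from the table.  Here {\tt memory} is simply a list of 32--bit words standing in for the bus, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, amount):
    """Rotate a 32-bit value left; only the low 5 bits of amount are used."""
    value &= MASK32
    amount &= 31
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def load_byte(memory, byte_addr):
    """Fetch one byte from word-addressed memory (big-endian lanes assumed)."""
    word = memory[byte_addr >> 2]  # drop the bottom two address bits
    lane = byte_addr & 3           # which byte within the word
    # Rotate the wanted byte into the low 8 bits, then mask it off.
    return rol32(word, 8 * (lane + 1)) & 0xFF


memory = [0x11223344, 0xAABBCCDD]
print([hex(load_byte(memory, addr)) for addr in range(8)])
```

Because two address bits select the byte lane, the byte--addressable range is one quarter of the word--addressable range, which is the reduction from 16~MB to 4~MB noted in the table.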
\begin{table}\begin{center}
\begin{tabular}{p{1.4in}p{1.5in}p{3in}}\\\hline
Mapped & Actual & Notes \\\hline
RET & MOV R12,PC
	& This is the high(er) speed version, that doesn't touch the stack.
	As such, it doesn't suffer a stall on memory read/write to the stack.
	\\\hline
STEP Rr,Rt
	& \parbox[t]{1.5in}{LSR \$1,Rr \\ XOR.C Rt,Rr}
	& Step a Galois implementation of a Linear Feedback Shift Register, Rr,
		using taps Rt \\\hline
STO.b Rx,\$addr
	& \parbox[t]{1.5in}{%
	LDI \$addr,Ra \\
	LDI \$addr,Rb \\
	LSR \$2,Ra \\
	AND \$3,Rb \\
	SUB \$32,Rb \\
	LOD (Ra),Ry \\
	AND \$0ffh,Rx \\
	AND \$-0ffh,Ry \\
	ROL Rb,Rx \\
	OR Rx,Ry \\
	STO Ry,(Ra) }
	& \parbox[t]{3in}{This CPU and its bus are {\em not} optimized
	for byte-wise operations.

	Note that in this example, \$addr is a
	byte-wise address, whereas in all of our other examples it is a
	32-bit word address.  This also limits the address space
	of character accesses from 16~MB down to 4~MB.
	Further, this instruction implies a byte ordering,
	such as big or little endian.} \\\hline
SWAP Rx,Ry
	& \parbox[t]{1.5in}{
	XOR Ry,Rx \\
	XOR Rx,Ry \\
	XOR Ry,Rx}
	& While no extra registers are needed, this example
	does take 3 clocks. \\\hline
TRAP \#X
	& LDILO \$x,CC
	& This approach uses the unused bits of the CC register as a TRAP
	address.  If these bits are zero, no trap has occurred.  Unlike my
	previous approach, which was to use a trap peripheral, this approach
	has no delay associated with it.  To work, the supervisor will need
	to clear this register following any trap, and the user will need to
	be careful to only set this register prior to a trap condition.
	Likewise, when setting this value, the user will need to make certain
	that the SLEEP and GIE bits are not set in \$x.  LDI would also work,
	however, using LDILO permits the use of conditional traps (e.g.,
	trap if the zero flag is set).  Should you wish to trap off of a
	register value, you could equivalently load \$x into the register and
	then MOV it into the CC register. \\\hline
TST Rx
	& TST \$-1,Rx
	& Set the condition codes based upon Rx.  Could also do a CMP \$0,Rx,
	ADD \$0,Rx, SUB \$0,Rx, AND \$-1,Rx, etc.  The TST and CMP
	approaches won't stall future pipeline stages looking for the value
	of Rx. \\\hline
WAIT
	& Or \$SLEEP,CC
	& Wait 'til interrupt.  In an interrupts disabled context, this
	becomes a HALT instruction. \\\hline
\end{tabular}
\caption{Derived Instructions, continued}\label{tbl:derived-3}
\end{center}\end{table}
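Two rows of Tbl.~\ref{tbl:derived-3} are easy to check in software.  The Python sketch below models STEP as a right shift followed by a conditional XOR with the taps, assuming that LSR leaves the bit it shifted out in the carry flag, and models SWAP as the three--XOR exchange.  Registers are modeled as 32--bit unsigned integers, and the function names are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def lfsr_step(state, taps):
    """One Galois LFSR step: LSR $1,Rr then XOR.C Rt,Rr.

    Assumption: the carry flag holds the bit shifted out by LSR.
    """
    carry = state & 1
    state = (state & MASK32) >> 1
    if carry:
        state ^= taps & MASK32
    return state


def xor_swap(rx, ry):
    """Exchange two registers without a temporary, as in the SWAP row."""
    rx ^= ry  # XOR Ry,Rx
    ry ^= rx  # XOR Rx,Ry
    rx ^= ry  # XOR Ry,Rx
    return rx, ry


TAPS = 0x80200003  # arbitrary tap mask chosen for the demonstration
print(hex(lfsr_step(1, TAPS)), hex(lfsr_step(2, TAPS)))
print(xor_swap(3, 5))
```

The swap needs no scratch register, at the cost of three dependent instructions in a row.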
\iffalse
\fi
\section{Pipeline Stages}
\begin{enumerate}
\item {\bf Prefetch}: Read instruction from memory (cache if possible).  This
	stage is actually pipelined itself, and so it will stall if the PC
	ever changes.  Stalls are also created here if the instruction isn't
	in the prefetch cache.
\item {\bf Decode}: Decode instruction into op code, register(s) to read, and
	immediate offset.
\item {\bf Read Operands}: Read registers and apply any immediate values to
	them.  This stage will stall if any source operand is pending.
	A proper optimizing compiler, therefore, will schedule an instruction
	between the instruction that produces the result and the instruction
	that uses it.
\item Split into two tracks: An {\bf ALU} which will accomplish a simple
	instruction, and the {\bf MemOps} stage which accomplishes memory
	read/write.
	\begin{itemize}
	\item Loads stall instructions that access the register until it is
		written to the register set.
	\item Condition codes are available upon completion
	\item Issuing an instruction to the memory while the memory is busy will
		stall the bus.  If the bus deadlocks, only a reset will
		release the CPU.  (Watchdog timer, anyone?)
	\end{itemize}
\item {\bf Write-Back}: Conditionally write back the result to register set,
	applying the condition.  This routine is bi-re-entrant: either the
	memory or the simple instruction may request a register write.
\end{enumerate}

\section{Pipeline Logic}
How the CPU handles some instruction combinations can be telling when
determining what happens in the pipeline.  The following lists some examples:
\begin{itemize}
\item {\bf Delayed Branching}

	I had originally hoped to implement delayed branching.  However, what
	happens in debug mode?
	That is, what happens when a debugger tries to single step an
	instruction?  While I can easily single step the computer in either
	user or supervisor mode externally, this processor does not appear
	able to step the CPU in user mode from within user mode (gosh, not even
	from within supervisor mode), such as if a process had a debugger
	attached.  As the processor exists, I would have one result stepping
	the CPU from a debugger, and another stepping it externally.

	This is unacceptable, and so this CPU does not support delayed
	branching.

\item {\bf Register Result:} {\tt MOV R0,R1; MOV R1,R2 }

	What value does
	R2 get, the value of R1 before the first move or the value of R0?
	Placing the value of R0 into R1 requires a pipeline stall, and possibly
	two, as I have the pipeline designed.

	The ZIP CPU architecture requires that R2 must equal R0 at the end of
	this operation.  This may stall the pipeline 1--2 cycles.

\item {\bf Condition Codes Result:} {\tt CMP R0,R1;Mov.EQ \$x,PC}

	At issue is the same item as above, save that the CMP instruction
	updates the flags that the MOV instruction depends
	upon.

	The Zip CPU architecture requires that condition codes must be updated
	and available immediately for the next instruction without stalling the
	pipeline.

\item {\bf Condition Codes Register Result:} {\tt CMP R0,R1; MOV CC,R2}

	At issue is the
	fact that the logic supporting the CC register is more complicated than
	the logic supporting any other register.

	The ZIP CPU will stall 1--2 cycles on this instruction, until the
	CC register is valid.

\item {\bf Delayed Branching: } {\tt ADD \$x,PC; MOV R0,R1}

	At issue is whether or not the instruction following the jump will
	take place before the jump.  In other words, is the MOV to the PC
	register handled differently from an ADD to the PC register?

	In the Zip architecture, MOVs and ADDs use the same logic
	(which simplifies the logic).
\end{itemize}

As I've studied this, I find several approaches to handling pipeline
issues.  These approaches (and their consequences) are listed below.

\begin{itemize}
\item {\bf All issued instructions complete, Stages stall individually}

	What about a slow pre-fetch?

	Nominally, this works well: any issued instruction
	just runs to completion.  If there are four issued instructions in the
	pipeline, with the writeback instruction being a write-to-PC
	instruction, the other three instructions naturally finish.

	This approach fails when reading instructions from the flash,
	since such reads require N clocks to complete.  Thus
	there may be only one instruction in the pipeline if reading from flash,
	or a full pipeline if reading from cache.  Each of these cases
	would produce a different result.

\item {\bf Issued instructions may be canceled}

	Stages stall individually

	First problem:
	Memory operations cannot be canceled: even reads may have side effects
	on peripherals that cannot be canceled later.  Further, in the case of
	an interrupt, it's difficult to know what to cancel.  What happens in
	a \hbox{\tt MOV.C \$x,PC} followed by a \hbox{\tt MOV \$y,PC}
	instruction?  Which gets canceled?

	Because it isn't clear what would need to be canceled,
	this instruction combination is not recommended.

\item {\bf All issued instructions complete.}

	All stages are filled, or the entire pipeline
	stalls.

	What about debug control?  What about
	register writes taking an extra clock stage?  MOV R0,R1; MOV R1,R2
	should place the value of R0 into R2.  How do you restart the pipeline
	after an interrupt?  What address do you use?  The last issued
	instruction?  But the branch delay slots may make that invalid!

	Reading from the CPU debug port in this case yields inconsistent
	results: the CPU will halt or step with instructions stuck in the
	pipeline.  Reading registers will give no indication of what is going
	on in the pipeline, just the results of completed operations, not of
	operations that have been started and not yet completed.
	Perhaps we should just report the state of the CPU based upon what
	instructions (PC values) have successfully completed?  Thus the
	debug instruction is the one that will write registers on the next
	clock.

	Suggestion: Suppose we load extra information in the two
	CC register(s) for debugging intermediate pipeline stages?

	The next problem, though, is how to deal with the read operand
	pipeline stage needing the result from the register pipeline.

\item {\bf Memory instructions must complete}

	All instructions that enter into the memory module {\em must}
	complete.  Issued instructions from the prefetch, decode, or operand
	read stages may or may not complete.  Jumps into code must be valid,
	so that interrupt returns may be valid.  All instructions entering the
	ALU complete.

	This looks to be the simplest approach.
	While the logic may be difficult, this appears to be the only
	re-entrant approach.

	A {\tt new\_pc} flag will be high anytime the PC changes in an
	unpredictable way (i.e., it doesn't increment).  This includes jumps
	as well as interrupts and interrupt returns.  Whenever this flag may
	go high, memory operations and ALU operations will stall until the
	result is known.  When the flag does go high, anything in the prefetch,
	decode, and read-op stage will be invalidated.

\end{itemize}



\chapter{Peripherals}\label{chap:periph}
\section{Interrupt Controller}
\section{Counter}

The Zip Counter is a very simple counter: it just counts.  It cannot be
halted.  When it rolls over, it issues an interrupt.  Writing a value to the
counter just sets the current value, and it starts counting again from that
value.

Eight counters are implemented in the Zip System for process accounting.
This may change in the future, as nothing as yet uses these counters.

\section{Timer}

The Zip Timer is also very simple: it simply counts down to zero.  When it
transitions from a one to a zero it creates an interrupt.

Writing any non-zero value to the timer starts the timer.  If the high order
bit is set when writing to the timer, the timer becomes an interval timer and
reloads its last start time on any interrupt.  Hence, to mark seconds, one
might set the timer to 100~million (the number of clocks per second), and
set the high bit.  Ever after, the timer will interrupt the CPU once per
second (assuming a 100~MHz clock).
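
The once--per--second example works out as follows.  This Python sketch assumes a 32--bit timer register whose top bit (bit~31) is the interval flag, as the text above describes; the constant and function names are illustrative only.

```python
CLOCK_HZ = 100_000_000   # clocks per second at 100 MHz
INTERVAL_BIT = 1 << 31   # assumed position of the high order bit


def interval_timer_value(ticks):
    """Value to write so the timer reloads itself every `ticks` clocks."""
    if not 0 < ticks < INTERVAL_BIT:
        raise ValueError("tick count must be non-zero and fit in 31 bits")
    return INTERVAL_BIT | ticks


print(hex(interval_timer_value(CLOCK_HZ)))  # 0x85f5e100
```

Since 100~million fits comfortably in 31~bits, the count and the interval flag do not collide.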

\section{Watchdog Timer}

The watchdog timer is no different from any of the other timers, save for one
critical difference: the interrupt line from the watchdog
timer is tied to the reset line of the CPU.  Hence writing a `1' to the
watchdog timer will always reset the CPU.
To stop the Watchdog timer, write a `0' to it.  To start it,
write any other number to it, as with the other timers.

While the watchdog timer supports interval mode, it doesn't make as much sense
as it did with the other timers.

\section{Jiffies}

This peripheral is motivated by the Linux use of `jiffies' whereby a process
can request to be put to sleep until a certain number of `jiffies' have
elapsed.  Using this interface, the CPU can read the number of `jiffies'
from the peripheral (it only has the one location in address space), add the
sleep length to it, and write the result back to the peripheral.  The zipjiffies
peripheral will record the value written to it only if it is nearer the current
counter value than the currently pending interrupt time.  If no other
interrupts are waiting, and this time is in the future, it will be enabled.
(There is currently no way to disable a jiffie interrupt once set, other
than to disable the register in the interrupt controller.)  The processor
may then place this sleep request into a list among other sleep requests.
Once the timer expires, it would write the next Jiffy request to the peripheral
and wake up the process whose timer had expired.

Indeed, the Jiffies register is nothing more than a glorified counter with
an interrupt.  Unlike the other counters, the Jiffies register cannot be set.
Writes to the jiffies register create an interrupt time.  When the Jiffies
register later equals the value written to it, an interrupt will be asserted
and the register then continues counting as though no interrupt had taken
place.

The purpose of this register is to support alarm times within a CPU.  To
set an alarm for a particular process $N$ clocks in advance, read the current
Jiffies value, add $N$, and write it back to the Jiffies register.  The
O/S must also keep track of values written to the Jiffies register.  Thus,
when an `alarm' trips, it should be removed from the list of alarms, the list
should be sorted, and the next alarm in terms of Jiffies should be written
to the register.
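
The bookkeeping described above might look like the following Python sketch.  It assumes a 32--bit Jiffies counter that wraps around, so alarm times are compared by their distance ahead of the current count rather than by raw value.  The function names and the plain list of pending alarms are assumptions made for illustration, not part of the Zip System.

```python
MASK32 = 0xFFFFFFFF


def alarm_time(now, n):
    """Jiffies value to write for an alarm n clocks after `now`."""
    return (now + n) & MASK32


def ticks_until(alarm, now):
    """Distance from `now` forward to `alarm`, modulo 2**32."""
    return (alarm - now) & MASK32


def next_alarm(alarms, now):
    """Pick the pending alarm that will trip soonest after `now`."""
    return min(alarms, key=lambda alarm: ticks_until(alarm, now))


now = 0xFFFFFFF0
alarms = [alarm_time(now, 0x20), alarm_time(now, 0x08)]
print([hex(a) for a in alarms])      # the first alarm has wrapped past zero
print(hex(next_alarm(alarms, now)))  # the alarm 8 clocks away comes first
```

Comparing raw values would pick the wrapped alarm first here, which is why the distance from the current count is used instead.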

\chapter{Operation}\label{chap:ops}

\chapter{Registers}\label{chap:regs}

\chapter{Wishbone Datasheet}\label{chap:wishbone}
The Zip System supports two wishbone accesses, a slave debug port and a master
port for the system itself.  These are shown in Tbl.~\ref{tbl:wishbone-slave}
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Slave, Read/Write, single words only \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
Signal Names & \begin{tabular}{ll}
		Signal Name & Wishbone Equivalent \\\hline
		{\tt i\_clk} & {\tt CLK\_I} \\
		{\tt i\_dbg\_cyc} & {\tt CYC\_I} \\
		{\tt i\_dbg\_stb} & {\tt STB\_I} \\
		{\tt i\_dbg\_we} & {\tt WE\_I} \\
		{\tt i\_dbg\_addr} & {\tt ADR\_I} \\
		{\tt i\_dbg\_data} & {\tt DAT\_I} \\
		{\tt o\_dbg\_ack} & {\tt ACK\_O} \\
		{\tt o\_dbg\_stall} & {\tt STALL\_O} \\
		{\tt o\_dbg\_data} & {\tt DAT\_O}
		\end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet, Debug Slave Port}\label{tbl:wishbone-slave}
\end{center}\end{table}
and Tbl.~\ref{tbl:wishbone-master} respectively.
\begin{table}[htbp]
\begin{center}
\begin{wishboneds}
Revision level of wishbone & WB B4 spec \\\hline
Type of interface & Master, Read/Write, sometimes pipelined \\\hline
Port size & 32--bit \\\hline
Port granularity & 32--bit \\\hline
Maximum Operand Size & 32--bit \\\hline
Data transfer ordering & (Irrelevant) \\\hline
Clock constraints & Works at 100~MHz on a Basys--3 board\\\hline
Signal Names & \begin{tabular}{ll}
		Signal Name & Wishbone Equivalent \\\hline
		{\tt i\_clk} & {\tt CLK\_I} \\
		{\tt o\_wb\_cyc} & {\tt CYC\_O} \\
		{\tt o\_wb\_stb} & {\tt STB\_O} \\
		{\tt o\_wb\_we} & {\tt WE\_O} \\
		{\tt o\_wb\_addr} & {\tt ADR\_O} \\
		{\tt o\_wb\_data} & {\tt DAT\_O} \\
		{\tt i\_wb\_ack} & {\tt ACK\_I} \\
		{\tt i\_wb\_stall} & {\tt STALL\_I} \\
		{\tt i\_wb\_data} & {\tt DAT\_I}
		\end{tabular}\\\hline
\end{wishboneds}
\caption{Wishbone Datasheet, Master Port}\label{tbl:wishbone-master}
\end{center}\end{table}
I do not recommend that you connect these together through the interconnect.

The big thing to notice is that both the real time clock and the real time
date modules act as wishbone slaves, and that all accesses to the registers of
either module are 32--bit reads and writes.  The address bus does not offer
byte level, but rather 32--bit word level resolution.  Select lines are not
implemented.  Bit ordering is the normal ordering where bit~31 is the most
significant bit and so forth.

\chapter{Clocks}\label{chap:clocks}

This core is based upon the Basys--3 design.  The Basys--3 development board
contains one external 100~MHz clock, which is sufficient to run the ZIP CPU
core.
\begin{table}[htbp]
\begin{center}
\begin{clocklist}
i\_clk & External & 100~MHz & 100~MHz & System clock.\\\hline
\end{clocklist}
\caption{List of Clocks}\label{tbl:clocks}
\end{center}\end{table}
I hesitate to suggest that the core can run faster than 100~MHz, since I have
struggled with various timing violations to keep it at 100~MHz.  So, for
now, I will only state that it can run at 100~MHz.
|
| 1142 |
|
|
|
| 1143 |
|
|
|

\chapter{I/O Ports}\label{chap:ioports}
The I/O ports for this clock are shown in Tbl.~\ref{tbl:iowishbone}
\begin{table}[htbp]
\begin{center}
\begin{portlist}
i\_clk & 1 & Input & System clock, used for time and wishbone interfaces.\\\hline
i\_wb\_cyc & 1 & Input & Wishbone bus cycle wire.\\\hline
i\_wb\_stb & 1 & Input & Wishbone strobe.\\\hline
i\_wb\_we & 1 & Input & Wishbone write enable.\\\hline
i\_wb\_addr & 5 & Input & Wishbone address.\\\hline
i\_wb\_data & 32 & Input & Wishbone bus data register for use when writing
	(configuring) the core from the bus.\\\hline
o\_wb\_ack & 1 & Output & Return value acknowledging a wishbone write, or
	signifying valid data in the case of a wishbone read request.
	\\\hline
o\_wb\_stall & 1 & Output & Indicates the device is not yet ready for another
	wishbone access, effectively stalling the bus.\\\hline
o\_wb\_data & 32 & Output & Wishbone data bus, returning data values read
	from the interface.\\\hline
\end{portlist}
\caption{Wishbone I/O Ports}\label{tbl:iowishbone}
\end{center}\end{table}
and~Tbl.~\ref{tbl:ioother}.
\begin{table}[htbp]
\begin{center}
\begin{portlist}
o\_sseg & 32 & Output & Lines to control a seven segment display, to be
	sent to that display's driver. Each eight bit byte controls
	one digit in the display, with the bottom bit in the byte
	controlling the decimal point.\\\hline
o\_led & 16 & Output & Output LEDs, consisting of a 16--bit counter counting
	from zero to all ones each minute, and synchronized with each
	minute so as to create an indicator of when the next minute
	will take place when only the hours and minutes can be
	displayed.\\\hline
o\_interrupt & 1 & Output & A pulsed/strobed interrupt line. When the
	clock needs to generate an interrupt, it will set this line
	high for one clock cycle. \\\hline
o\_ppd & 1 & Output & A `pulse per day' signal which can be fed into the
	real--time date module. This line will be high on the clock before
	the stroke of midnight, allowing the date module to turn over to the
	next day at exactly the same time the clock module turns over to the
	next day.\\\hline
i\_hack & 1 & Input & When this line is raised, copies are made of the
	internal state registers on the next clock. These registers can then
	be used for an accurate time hack regarding the state of the clock
	at the time this line was strobed.\\\hline
\end{portlist}
\caption{Other I/O Ports}\label{tbl:ioother}
\end{center}\end{table}
Tbl.~\ref{tbl:iowishbone} reiterates the wishbone I/O values just discussed in
Chapt.~\ref{chap:wishbone}, and so needs no further discussion here.
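
As a rough worked example of the {\tt o\_led} description in
Tbl.~\ref{tbl:ioother}, assume only what is stated there: a 16--bit value that
sweeps from zero to all ones once per minute. Each step of the counter then
lasts
\[
\frac{60\mbox{~s}}{2^{16}} \approx 915.5~\mu\mbox{s},
\]
or roughly 91,553 cycles of the 100~MHz system clock. The most significant LED
would therefore turn on about 30~seconds into the minute, the next one would
toggle every 15~seconds, and so on. How the core actually generates this count
is not specified here, so these figures are illustrative arithmetic rather
than a description of the implementation.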


% Appendices
% Index
\end{document}