1 |
15 |
hellwig |
% This file is part of the MMIXware package (c) Donald E Knuth 1999
|
2 |
|
|
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
|
3 |
|
|
|
4 |
|
|
\def\title{MMIX}
|
5 |
|
|
\input epsf % input macros for dvips to include METAPOST illustrations
|
6 |
|
|
|
7 |
|
|
\def\MMIX{\.{MMIX}}
|
8 |
|
|
\def\NNIX{\hbox{\mc NNIX}}
|
9 |
|
|
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
|
10 |
|
|
\def\beginword{\vcenter\bgroup\let\\=\wordrule\halign\bgroup&\hfil##\hfil\cr}
|
11 |
|
|
\def\endword{\noalign{\vskip\baselineskip}\egroup\egroup
|
12 |
|
|
\advance\belowdisplayskip-\baselineskip}
|
13 |
|
|
\def\wordrule{\vrule height 9.5pt depth 4.5pt width .4pt}
|
14 |
|
|
\newdimen\bitwd \bitwd=6.6pt
|
15 |
|
|
\def\field#1#2{\vrule depth 3pt width 0pt \hbox to#1\bitwd{\hss$#2$\hss}}
|
16 |
|
|
|
17 |
|
|
\def\XF{\\{XF}}\def\XM{\\{XM}}\def\XD{\\{XD}} % these not in \tt
|
18 |
|
|
\def\PC{\\{PC}}
|
19 |
|
|
\def\Jx{\.{J}} % conversely, I type J_ to get J in \tt
|
20 |
|
|
|
21 |
|
|
\def\s{{\rm s}}
|
22 |
|
|
\def\rX{{\rm\$X}} \def\rY{{\rm\$Y}} \def\rZ{{\rm\$Z}}
|
23 |
|
|
\def\mm{{\rm M}} \def\xx{{\rm X}} \def\yy{{\rm Y}} \def\zz{{\rm Z}}
|
24 |
|
|
%\def\ll{{\rm L}} \def\gg{{\rm G}}
|
25 |
|
|
\def\ll{L} \def\gg{G}
|
26 |
|
|
\def\?{\mkern-1mu}
|
27 |
|
|
|
28 |
|
|
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>
|
29 |
|
|
|
30 |
|
|
@* Introduction to MMIX.
|
31 |
|
|
Thirty-eight years have passed since the \.{MIX} computer was designed, and
|
32 |
|
|
computer architecture has been converging during those years
|
33 |
|
|
towards a rather different
|
34 |
|
|
style of machine. Therefore it is time to replace \.{MIX} with a new
|
35 |
|
|
computer that contains even less saturated fat than its predecessor.
|
36 |
|
|
|
37 |
|
|
Exercise 1.3.1--25 in the third edition of
|
38 |
|
|
{\sl Fundamental Algorithms\/} speaks of an extended
|
39 |
|
|
\.{MIX} called MixMaster, which is upward compatible with the old version.
|
40 |
|
|
But MixMaster itself is hopelessly obsolete; although it allows for
|
41 |
|
|
several gigabytes of memory, we can't even use it with {\mc ASCII} code to
|
42 |
|
|
get lowercase letters. And ouch, the standard subroutine calling convention
|
43 |
|
|
of \.{MIX} is irrevocably based on self-modifying code! Decimal arithmetic
|
44 |
|
|
and self-modifying code were popular in 1962, but they sure have disappeared
|
45 |
|
|
quickly as machines have gotten bigger and faster. A completely new
|
46 |
|
|
design is called for, based on the principles of RISC architecture as
|
47 |
|
|
expounded in {\sl Computer Architecture\/} by Hennessy and Patterson
|
48 |
|
|
(Morgan Kaufmann, 1996). % first ed was "Morgan Kaufman"! but now "nn" is legit
|
49 |
|
|
@^Hennessy, John LeRoy@>
|
50 |
|
|
@^Patterson, David Andrew@>
|
51 |
|
|
|
52 |
|
|
So here is \MMIX, a computer that will totally replace \.{MIX}
|
53 |
|
|
in the ``ultimate'' editions of {\sl The Art of Computer Programming},
|
54 |
|
|
Volumes 1--3, and in the first editions of the remaining volumes.
|
55 |
|
|
I~must confess that
|
56 |
|
|
I~can hardly wait to own a computer like this.
|
57 |
|
|
|
58 |
|
|
How do you pronounce \MMIX? I've been saying ``em-mix'' to myself,
|
59 |
|
|
because the first `\.M' represents a new millennium. Therefore I~use
|
60 |
|
|
the article ``an'' instead of~``a'' before the name \MMIX\
|
61 |
|
|
in English phrases like ``an \MMIX\ simulator.''
|
62 |
|
|
|
63 |
|
|
Incidentally, the {\sl Dictionary of American Regional English\/ \bf3} (1996)
|
64 |
|
|
lists ``mommix'' as a common dialect word used both as a noun and a verb;
|
65 |
|
|
to mommix something means to botch it, to bollix it. Only time will
|
66 |
|
|
tell whether I~have mommixed the definition of \MMIX.
|
67 |
|
|
|
68 |
|
|
@ The original \.{MIX} computer could be operated without an operating
|
69 |
|
|
system; you could bootstrap it with punched cards or paper tape and do
|
70 |
|
|
everything yourself. But nowadays such power is no longer in the hands of
|
71 |
|
|
ordinary users. The \MMIX\ hardware, like all other computing machines
|
72 |
|
|
made today, relies on an operating system to get jobs
|
73 |
|
|
started in their own address spaces and to provide I/O capabilities.
|
74 |
|
|
|
75 |
|
|
Whenever anybody has asked if I will be writing about operating systems,
|
76 |
|
|
my reply has always been ``Nix.'' Therefore the name of\/ \MMIX's operating
|
77 |
|
|
system, \NNIX, will come as no surprise.
|
78 |
|
|
@:NNIX}{\NNIX\ operating system@>
|
79 |
|
|
@^operating system@>
|
80 |
|
|
From time to time I will necessarily have to refer to things that \NNIX\ does
|
81 |
|
|
for its users, but I am unable to build \NNIX\ myself. Life is
|
82 |
|
|
too short. It would be wonderful if some expert in operating system design
|
83 |
|
|
became inspired to write a book that explains exactly how to construct a nice,
|
84 |
|
|
clean \NNIX\ kernel for an \MMIX\ chip.
|
85 |
|
|
|
86 |
|
|
@ I am deeply grateful to the many people who have helped me shape the behavior
|
87 |
|
|
of\/ \MMIX. In particular, John Hennessy and (especially) Dick Sites
|
88 |
|
|
have made significant contributions.
|
89 |
|
|
@^Hennessy, John LeRoy@>
|
90 |
|
|
@^Sites, Richard Lee@>
|
91 |
|
|
|
92 |
|
|
@ A programmer's introduction to \MMIX\ appears in ``Fascicle~1,'' a booklet
|
93 |
|
|
@^Fascicle 1@>
|
94 |
|
|
containing tutorial material that will ultimately appear in the fourth edition
|
95 |
|
|
of {\sl The Art of Computer Programming}.
|
96 |
|
|
The description in the following sections is rather different, because
|
97 |
|
|
we are concerned about a complete implementation, including all of the
|
98 |
|
|
features used by the operating system and invisible to normal programs.
|
99 |
|
|
Here it is important to emphasize exceptional cases that were glossed over
|
100 |
|
|
in the tutorial, and~to consider
|
101 |
|
|
nitpicky details about things that might go wrong.
|
102 |
|
|
|
103 |
|
|
@* MMIX basics.
|
104 |
|
|
\MMIX\ is a 64-bit RISC machine with at least 256 general-purpose registers
|
105 |
|
|
and a 64-bit address space.
|
106 |
|
|
Every instruction is four bytes long and has the form
|
107 |
|
|
$$\vcenter{\offinterlineskip
|
108 |
|
|
\def\\#1&{\omit&}
|
109 |
|
|
\hrule
|
110 |
|
|
\halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr
|
111 |
|
|
height 9pt depth4pt&OP&&X&&Y&&Z&\cr}
|
112 |
|
|
\hrule}\,.$$
|
113 |
|
|
The 256 possible OP codes fall into a dozen or so easily remembered
|
114 |
|
|
@^OP codes@>
|
115 |
|
|
categories; an instruction usually means, ``Set register X to the
|
116 |
|
|
result of\/ Y~OP~Z\null.'' For example,
|
117 |
|
|
$$\vcenter{\offinterlineskip
|
118 |
|
|
\def\\#1&{\omit&}
|
119 |
|
|
\hrule
|
120 |
|
|
\halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr
|
121 |
|
|
height 9pt depth4pt&32&&1&&2&&3&\cr}
|
122 |
|
|
\hrule}$$
|
123 |
|
|
sets register~1 to the sum of registers 2 and 3.
|
124 |
|
|
A few instructions combine the Y and Z bytes into
|
125 |
|
|
a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field.
|
126 |
|
|
But the three bytes X, Y, Z usually have three-pronged significance
|
127 |
|
|
independent of each other.
|
128 |
|
|
|
129 |
|
|
Instructions are usually represented in a symbolic form corresponding
|
130 |
|
|
to the \MMIX\ assembly language, in which each operation code has a mnemonic
|
131 |
|
|
name. For example, operation~32 is \.{ADD}, and the instruction above
|
132 |
|
|
might be written `\.{ADD} \.{\$1,\$2,\$3}'; a dollar sign `\.\$' symbolizes
|
133 |
|
|
a register number. In general, the instruction
|
134 |
|
|
\.{ADD}~\.{\$X,\$Y,\$Z} is the operation of setting $\rX=\rY+\rZ$.
|
135 |
|
|
An assembly language instruction with two commas has three operand
|
136 |
|
|
fields X, Y,~Z; an instruction with one comma has two operand fields
|
137 |
|
|
X,~YZ; an instruction with no comma has one operand field,~XYZ;
|
138 |
|
|
an instruction with no operands has $\xx=\yy=\zz=0$.
|
139 |
|
|
|
140 |
|
|
\def\0{\$Z\char'174Z}
|
141 |
|
|
Most instructions have two forms, one in which the Z field stands for
|
142 |
|
|
register \$Z, and one in which Z is an unsigned ``immediate'' constant.
|
143 |
|
|
@^immediate operands@>
|
144 |
|
|
Thus, for example, the command `\.{ADD} \.{\$X,\$Y,\$Z}' has a counterpart
|
145 |
|
|
`\.{ADD} \.{\$X,\$Y,Z}', which sets $\rX=\rY+\zz$. Immediate constants
|
146 |
|
|
are always nonnegative.
|
147 |
|
|
In the descriptions
|
148 |
|
|
below we will introduce such pairs of instructions
|
149 |
|
|
by writing just `\.{ADD}~\.{\$X,\$Y,\0}' instead of naming both
|
150 |
|
|
cases explicitly.
|
151 |
|
|
|
152 |
|
|
The operation code for \.{ADD}~\.{\$X,\$Y,\$Z} is 32, but the operation
|
153 |
|
|
code for \.{ADD}~\.{\$X,\$Y,Z} is~33. The \MMIX\ assembler chooses the correct
|
154 |
|
|
code by noting whether the third argument is a register number or~not.
|
155 |
|
|
|
156 |
|
|
Register numbers and constants can be given symbolic names; for example, the
|
157 |
|
|
assembly language instruction `\.x~\.{IS}~\.{\$1}' makes \.x an
|
158 |
|
|
abbreviation for register number~1. Similarly, `\.{FIVE}~\.{IS}~\.5'
|
159 |
|
|
makes \.{FIVE} an abbreviation for the constant~5.
|
160 |
|
|
After these abbreviations have been specified, the instruction
|
161 |
|
|
\.{ADD}~\.{x,x,FIVE} increases \$1 by~5, using opcode~33, while
|
162 |
|
|
the instruction \.{ADD}~\.{x,x,x} doubles \$1 using opcode~32.
|
163 |
|
|
Symbolic names that stand for register numbers
|
164 |
|
|
conventionally begin with a lowercase letter, while names that stand
|
165 |
|
|
for constants conventionally begin with an uppercase letter.
|
166 |
|
|
This convention is not actually enforced by the assembler,
|
167 |
|
|
but it tends to reduce a programmer's confusion.
|
168 |
|
|
|
169 |
|
|
@ A {\it nybble\/} is a 4-bit quantity, often used to denote a decimal
|
170 |
|
|
or hexadecimal digit.
|
171 |
|
|
A {\it byte\/} is an 8-bit quantity, often used to denote an alphanumeric
|
172 |
|
|
character in {\mc ASCII} code. The Unicode standard extends {\mc ASCII} to
|
173 |
|
|
@^Unicode@>
|
174 |
|
|
@^ASCII@>
|
175 |
|
|
essentially all the world's languages by using 16-bit-wide characters called
|
176 |
|
|
{\it wydes\/}. (Weight watchers know that two nybbles make one byte,
|
177 |
|
|
but two bytes make one wyde.)
|
178 |
|
|
In the discussion below we use the term
|
179 |
|
|
{\it tetrabyte\/} or ``tetra'' for a 4-byte quantity, and the similar term
|
180 |
|
|
@^nybble@>
|
181 |
|
|
@^byte@>
|
182 |
|
|
@^wyde@>
|
183 |
|
|
@^tetrabyte@>
|
184 |
|
|
@^octabyte@>
|
185 |
|
|
{\it octabyte\/} or ``octa'' for an 8-byte quantity. Thus, a tetra is
|
186 |
|
|
two wydes, an octa is two tetras; an octabyte has 64~bits. Each \MMIX\
|
187 |
|
|
register can be thought of as containing one octabyte, or two tetras,
|
188 |
|
|
or four wydes, or eight bytes, or sixteen nybbles.
|
189 |
|
|
|
190 |
|
|
When bytes, wydes, tetras, and octas represent numbers they are said to be
|
191 |
|
|
either {\it signed\/} or {\it unsigned}. An unsigned byte is a number between
|
192 |
|
|
0~and $2^8-1=255$ inclusive; an unsigned wyde lies, similarly, between
|
193 |
|
|
0~and $2^{16}-1=65535$; an unsigned tetra lies between
|
194 |
|
|
0~and $2^{32}-1=4{,}294{,}967{,}295$; an unsigned octa lies between
|
195 |
|
|
0~and $2^{64}-1=18{,}446{,}744{,}073{,}709{,}551{,}615$.
|
196 |
|
|
Their signed counterparts use the
|
197 |
|
|
conventions of two's complement notation, by subtracting respectively $2^8$,
|
198 |
|
|
$2^{16}$, $2^{32}$, or~$2^{64}$ times the most significant bit. Thus,
|
199 |
|
|
the unsigned bytes 128 through 255 are regarded as the numbers $-128$
|
200 |
|
|
through~$-1$ when they are evaluated as signed bytes; a signed byte therefore
|
201 |
|
|
lies between $-128$ and $+127$, inclusive. A signed wyde is a number
|
202 |
|
|
between $-32768$ and $+32767$; a signed tetra lies between
|
203 |
|
|
$-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$; a signed octa lies between
|
204 |
|
|
$-9{,}223{,}372{,}036{,}854{,}775{,}808$ and $+9{,}223{,}372{,}036{,}854{,}775{,}807$.
|
205 |
|
|
|
206 |
|
|
The virtual memory of\/ \MMIX\ is an array M of $2^{64}$ bytes. If $k$ is any
|
207 |
|
|
unsigned octabyte, M[$k$]~is a 1-byte quantity. \MMIX\ machines do not
|
208 |
|
|
actually have such vast memories, but programmers can act as if $2^{64}$ bytes
|
209 |
|
|
are indeed present, because \MMIX\ provides address translation mechanisms by
|
210 |
|
|
which an operating system can maintain this illusion.
|
211 |
|
|
|
212 |
|
|
We use the notation $\mm_{2^t}[k]$ to stand for a number consisting of
|
213 |
|
|
$2^t$~consecutive bytes starting at location~$k\land\nobreak(2^{64}-2^t)$.
|
214 |
|
|
(The notation $k\land(2^{64}-2^t)$ means that the least
|
215 |
|
|
significant $t$ bits of~$k$ are set to~0, and only the least 64~bits
|
216 |
|
|
of the resulting address are retained. Similarly, the notation
|
217 |
|
|
$k\lor(2^t-1)$ means that the least significant $t$ bits of~$k$ are set to~1.)
|
218 |
|
|
All accesses to $2^t$-byte quantities by \MMIX\ are {\it aligned}, in the sense
|
219 |
|
|
that the first byte is a multiple of~$2^t$.
|
220 |
|
|
|
221 |
|
|
Addressing is always ``big-endian.'' In other words, the
|
222 |
|
|
@^big-endian versus little-endian@>
|
223 |
|
|
@^little-endian versus big-endian@>
|
224 |
|
|
most significant (leftmost) byte of $\mm_{2^t}[k]$ is
|
225 |
|
|
$\mm_1[k\land\nobreak(2^{64}-2^t)]$
|
226 |
|
|
and the least significant (rightmost) byte is $\mm_1[k\lor(2^t-1)]$.
|
227 |
|
|
We use the notation $\s(\mm_{2^t}[k])$ when we want to regard
|
228 |
|
|
this $2^t$-byte number as a {\it signed\/} integer.
|
229 |
|
|
Formally speaking, if $l=2^t$,
|
230 |
|
|
@^signed integers@>
|
231 |
|
|
$$\s(\mm_l[k])=\bigl(\mm_1[k\land(-l)]\,\mm_1[k\land(-l)+1]\,\ldots\,
|
232 |
|
|
\mm_1[k\lor(l-1)]\bigr)_{256}
|
233 |
|
|
-2^{8l}[\mm_1[k\land(-l)]\!\ge\!128].$$
|
234 |
|
|
|
235 |
|
|
@* Loading and storing.
|
236 |
|
|
Several instructions can be used to get information from memory into
|
237 |
|
|
registers. For example, the ``load tetra unsigned'' instruction
|
238 |
|
|
\.{LDTU} \.{\$1,\$4,\$5}
|
239 |
|
|
puts the four bytes $\mm_4[\$4+\$5]$ into register~1 as an unsigned
|
240 |
|
|
integer;
|
241 |
|
|
the most significant four bytes of register~1 are set to zero.
|
242 |
|
|
The similar instruction \.{LDT} \.{\$1,\$4,\$5}, ``load tetra,'' sets
|
243 |
|
|
\$1 to the {\it signed\/} integer $\s(\mm_4[\$4+\$5])$.
|
244 |
|
|
(Instructions generally treat numbers as
|
245 |
|
|
@^signed integers@>
|
246 |
|
|
signed unless the operation code specifically calls them
|
247 |
|
|
unsigned.) In the signed case, the most significant four bytes of the
|
248 |
|
|
register will be copies of the most significant bit of the tetrabyte
|
249 |
|
|
loaded; thus they will be all~0s or all~1s, depending on whether the
|
250 |
|
|
number is $\ge0$ or $<0$.
|
251 |
|
|
|
252 |
|
|
\def\bull{\smallbreak\textindent{$\bullet$}}
|
253 |
|
|
\def\bul{\par\textindent{$\bullet$}}
|
254 |
|
|
\def\<#1 #2 {\.{#1}~\.{#2} }
|
255 |
|
|
\def\>{\hfill\break}
|
256 |
|
|
|
257 |
|
|
\bull\
|
258 |
|
|
@.LDB@>
|
259 |
|
|
Byte $\s(\mm[\rY+\rZ])$ or $\s(\mm[\rY+\zz])$ is loaded into register~X as a
|
260 |
|
|
signed number between $-128$ and $+127$, inclusive.
|
261 |
|
|
|
262 |
|
|
\bull\
|
263 |
|
|
@.LDBU@>
|
264 |
|
|
Byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$ is loaded into register~X as an
|
265 |
|
|
unsigned number between $0$ and $255$, inclusive.
|
266 |
|
|
|
267 |
|
|
\bull\
|
268 |
|
|
@.LDW@>
|
269 |
|
|
Bytes $\s(\mm_2[\rY+\rZ])$ or $\s(\mm_2[\rY+\zz])$
|
270 |
|
|
are loaded into register~X as a signed number between $-32768$ and $+32767$,
|
271 |
|
|
inclusive. As mentioned above, our notation $\mm_2[k]$ implies that
|
272 |
|
|
the least significant bit of the address $\rY+\rZ$ or $\rY+\zz$ is
|
273 |
|
|
ignored and assumed to be~0.
|
274 |
|
|
|
275 |
|
|
\bull\
|
276 |
|
|
@.LDWU@>
|
277 |
|
|
Bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$ are loaded
|
278 |
|
|
into register~X as an unsigned number between $0$ and $65535$, inclusive.
|
279 |
|
|
|
280 |
|
|
\bull\
|
281 |
|
|
@.LDT@>
|
282 |
|
|
Bytes $\s(\mm_4[\rY+\rZ])$ or $\s(\mm_4[\rY+\zz])$
|
283 |
|
|
are loaded into register~X as a signed number between $-2{,}147{,}483{,}648$ and
|
284 |
|
|
$+2{,}147{,}483{,}647$, inclusive.
|
285 |
|
|
As mentioned above, our notation $\mm_4[k]$ implies that
|
286 |
|
|
the two least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are
|
287 |
|
|
ignored and assumed to be~0.
|
288 |
|
|
|
289 |
|
|
\bull\
|
290 |
|
|
@.LDTU@>
|
291 |
|
|
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$
|
292 |
|
|
are loaded into register~X as an unsigned number between 0 and
|
293 |
|
|
4{,}294{,}967{,}296, inclusive.
|
294 |
|
|
|
295 |
|
|
\bull\
|
296 |
|
|
@.LDO@>
|
297 |
|
|
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into
|
298 |
|
|
register~X\null.
|
299 |
|
|
As mentioned above, our notation $\mm_8[k]$ implies that
|
300 |
|
|
the three least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are
|
301 |
|
|
ignored and assumed to be~0.
|
302 |
|
|
|
303 |
|
|
\bull\
|
304 |
|
|
@.LDOU@>
|
305 |
|
|
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into
|
306 |
|
|
register~X\null. There is in fact no difference between the behavior of
|
307 |
|
|
\.{LDOU} and~\.{LDO}, since
|
308 |
|
|
an octabyte can be regarded as either signed or unsigned. \.{LDOU} is included
|
309 |
|
|
in \MMIX\ just for completeness and consistency, in spite of the fact that
|
310 |
|
|
a foolish consistency is the hobgoblin of little minds.
|
311 |
|
|
@^Emerson, Ralph Waldo@>
|
312 |
|
|
(Niklaus Wirth made a strong plea for such consistency in his early critique
|
313 |
|
|
of System/360; see {\sl JACM\/ \bf15} (1967), 37--74.)
|
314 |
|
|
@^Wirth, Niklaus Emil@>
|
315 |
|
|
@^System/360@>
|
316 |
|
|
|
317 |
|
|
\bull\
|
318 |
|
|
@.LDHT@>
|
319 |
|
|
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$ are loaded into the most
|
320 |
|
|
significant half of register~X, and the least significant half is
|
321 |
|
|
cleared to zero. (One use of ``high tetra arithmetic'' is to detect
|
322 |
|
|
overflow easily when tetrabytes are added or subtracted.)
|
323 |
|
|
|
324 |
|
|
\bull\
|
325 |
|
|
The address $\rY+\rZ$ or $\rY+\zz$ is loaded into register~X. This
|
326 |
|
|
instruction is simply another name for the \.{ADDU} instruction
|
327 |
|
|
discussed below; it can
|
328 |
|
|
be used when the programmer is thinking of memory addresses
|
329 |
|
|
instead of numbers.
|
330 |
|
|
The \MMIX\ assembler converts \.{LDA} into the same OP-code as \.{ADDU}.
|
331 |
|
|
@.LDA@>
|
332 |
|
|
@.ADDU@>
|
333 |
|
|
|
334 |
|
|
@ Another family of instructions goes the other way, storing registers into
|
335 |
|
|
memory. For example, the ``store octa immediate'' command
|
336 |
|
|
\
|
337 |
|
|
into $\mm_8[\$2+17]$.
|
338 |
|
|
|
339 |
|
|
\bull\
|
340 |
|
|
@.STB@>
|
341 |
|
|
The least significant byte of register~X is stored into
|
342 |
|
|
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. An integer overflow exception occurs if
|
343 |
|
|
@.overflow@>
|
344 |
|
|
\$X is not between $-128$ and $+127$. (We will discuss overflow and other
|
345 |
|
|
kinds of exceptions later.)
|
346 |
|
|
|
347 |
|
|
\bull\\>
|
348 |
|
|
@.STBU@>
|
349 |
|
|
The least significant byte of register~X is stored into
|
350 |
|
|
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. \.{STBU} instructions are the same
|
351 |
|
|
as \.{STB} instructions, except that no test for overflow is made.
|
352 |
|
|
|
353 |
|
|
\bull\
|
354 |
|
|
@.STW@>
|
355 |
|
|
The two least significant bytes of register~X are stored into
|
356 |
|
|
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$.
|
357 |
|
|
An integer overflow exception occurs if
|
358 |
|
|
\$X is not between $-32768$ and $+32767$.
|
359 |
|
|
|
360 |
|
|
\bull\\>
|
361 |
|
|
@.STWU@>
|
362 |
|
|
The two least significant bytes of register~X are stored into
|
363 |
|
|
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$.
|
364 |
|
|
\.{STWU} instructions are the same
|
365 |
|
|
as \.{STW} instructions, except that no test for overflow is made.
|
366 |
|
|
|
367 |
|
|
\bull\
|
368 |
|
|
@.STT@>
|
369 |
|
|
The four least significant bytes of register~X are stored into
|
370 |
|
|
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
|
371 |
|
|
An integer overflow exception occurs if
|
372 |
|
|
\$X is not between $-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$.
|
373 |
|
|
|
374 |
|
|
\bull\
|
375 |
|
|
@.STTU@>
|
376 |
|
|
The four least significant bytes of register~X are stored into
|
377 |
|
|
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
|
378 |
|
|
\.{STTU} instructions are the same
|
379 |
|
|
as \.{STT} instructions, except that no test for overflow is made.
|
380 |
|
|
|
381 |
|
|
\bull\
|
382 |
|
|
@.STO@>
|
383 |
|
|
Register X is stored into bytes $\mm_8[\rY+\rZ]$ or
|
384 |
|
|
$\mm_8[\rY+\zz]$.
|
385 |
|
|
|
386 |
|
|
\bull\
|
387 |
|
|
@.STOU@>
|
388 |
|
|
Identical to \.{STO} \.{\$X,\$Y,\0}.
|
389 |
|
|
|
390 |
|
|
\bull\
|
391 |
|
|
@.STCO@>
|
392 |
|
|
An octabyte whose value is the unsigned byte X is stored into
|
393 |
|
|
$\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$.
|
394 |
|
|
|
395 |
|
|
\bull\
|
396 |
|
|
The most significant four bytes of register~X are stored into
|
397 |
|
|
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
|
398 |
|
|
@.STHT@>
|
399 |
|
|
|
400 |
|
|
@* Adding and subtracting.
|
401 |
|
|
Once numbers are in registers, we can compute with them. Let's consider
|
402 |
|
|
addition and subtraction first.
|
403 |
|
|
|
404 |
|
|
\bull\
|
405 |
|
|
@.ADD@>
|
406 |
|
|
The sum $\rY+\rZ$ or $\rY+\zz$ is placed into register~X using signed,
|
407 |
|
|
two's complement arithmetic.
|
408 |
|
|
An integer overflow exception occurs if the sum is $\ge2^{63}$ or $<-2^{63}$.
|
409 |
|
|
(We will discuss overflow and other kinds of exceptions later.)
|
410 |
|
|
@.overflow@>
|
411 |
|
|
|
412 |
|
|
\bull\
|
413 |
|
|
@.ADDU@>
|
414 |
|
|
The sum $(\rY+\rZ)\bmod2^{64}$ or $(\rY+\zz)\bmod2^{64}$
|
415 |
|
|
is placed into register~X\null.
|
416 |
|
|
These instructions are the same
|
417 |
|
|
as \.{ADD}~\.{\$X,\$Y,\0} commands
|
418 |
|
|
except that no test for overflow is made.
|
419 |
|
|
(Overflow could be detected if desired by using the command
|
420 |
|
|
\
|
421 |
|
|
@.CMPU@>
|
422 |
|
|
``compare unsigned''; see below.)
|
423 |
|
|
|
424 |
|
|
\bull\<2ADDU \$X,\$Y,\0 `times 2 and add unsigned'.\>
|
425 |
|
|
@.2ADDU@>
|
426 |
|
|
The sum $(2\rY+\rZ)\bmod2^{64}$ or $(2\rY+\zz)\bmod2^{64}$
|
427 |
|
|
is placed into register~X\null.
|
428 |
|
|
|
429 |
|
|
\bull\<4ADDU \$X,\$Y,\0 `times 4 and add unsigned'.\>
|
430 |
|
|
@.4ADDU@>
|
431 |
|
|
The sum $(4\rY+\rZ)\bmod2^{64}$ or $(4\rY+\zz)\bmod2^{64}$
|
432 |
|
|
is placed into register~X\null.
|
433 |
|
|
|
434 |
|
|
\bull\<8ADDU \$X,\$Y,\0 `times 8 and add unsigned'.\>
|
435 |
|
|
@.8ADDU@>
|
436 |
|
|
The sum $(8\rY+\rZ)\bmod2^{64}$ or $(8\rY+\zz)\bmod2^{64}$
|
437 |
|
|
is placed into register~X\null.
|
438 |
|
|
|
439 |
|
|
\bull\<16ADDU \$X,\$Y,\0 `times 16 and add unsigned'.\>
|
440 |
|
|
@.16ADDU@>
|
441 |
|
|
The sum $(16\rY+\rZ)\bmod2^{64}$ or $(16\rY+\zz)\bmod2^{64}$
|
442 |
|
|
is placed into register~X\null.
|
443 |
|
|
|
444 |
|
|
\bull\
|
445 |
|
|
@.SUB@>
|
446 |
|
|
The difference $\rY-\rZ$ or $\rY-\zz$ is placed into register~X using
|
447 |
|
|
signed, two's complement arithmetic.
|
448 |
|
|
An integer overflow exception occurs if the difference is $\ge2^{63}$ or
|
449 |
|
|
$<-2^{63}$.
|
450 |
|
|
|
451 |
|
|
\bull\
|
452 |
|
|
@.SUBU@>
|
453 |
|
|
The difference $(\rY-\rZ)\bmod2^{64}$ or $(\rY-\zz)\bmod2^{64}$
|
454 |
|
|
is placed into register~X\null.
|
455 |
|
|
These two instructions are the same
|
456 |
|
|
as \.{SUB}~\.{\$X,\$Y,\0} except that no test for overflow is made.
|
457 |
|
|
|
458 |
|
|
\bull\
|
459 |
|
|
@.NEG@>
|
460 |
|
|
The value $\yy-\rZ$ or $\yy-\zz$ is placed into register~X using
|
461 |
|
|
signed, two's complement arithmetic.
|
462 |
|
|
An integer overflow exception occurs if the result is greater
|
463 |
|
|
than~$2^{63}-\nobreak1$.
|
464 |
|
|
(Notice that in this case \MMIX\ works with the ``immediate'' constant~Y,
|
465 |
|
|
not register~Y\null. \.{NEG} commands are analogous to the immediate variants
|
466 |
|
|
of other commands, because they save us from having to put one-byte
|
467 |
|
|
constants into a register. When $\yy=0$, overflow occurs if and
|
468 |
|
|
only if $\rZ=-2^{63}$. The instruction \
|
469 |
|
|
same effect as \.{NEG}~\.{\$X,0,1}.)
|
470 |
|
|
|
471 |
|
|
\bull\
|
472 |
|
|
@.NEGU@>
|
473 |
|
|
The value $(\yy-\rZ)\bmod2^{64}$ or $(\yy-\zz)\bmod2^{64}$
|
474 |
|
|
is placed into register~X\null.
|
475 |
|
|
\.{NEGU} instructions are the same
|
476 |
|
|
as \.{NEG} instructions, except that no test for overflow is made.
|
477 |
|
|
|
478 |
|
|
@* Bit fiddling.
|
479 |
|
|
Before looking at multiplication and division, which take longer than
|
480 |
|
|
addition and subtraction, let's look at some of the other things that
|
481 |
|
|
\MMIX\ can do fast. There are eighteen instructions for bitwise
|
482 |
|
|
logical operations on unsigned numbers.
|
483 |
|
|
|
484 |
|
|
\bull\
|
485 |
|
|
@.AND@>
|
486 |
|
|
Each bit of register Y is logically anded with the corresponding bit of
|
487 |
|
|
register~Z or of the constant~Z, and the result is placed in register~X\null.
|
488 |
|
|
In other words, a bit of register~X is set to~1 if and only if the
|
489 |
|
|
corresponding bits of the operands are both~1;
|
490 |
|
|
in symbols, $\rX=\rY\land\rZ$ or $\rX=\rY\land\zz$.
|
491 |
|
|
This means in particular that \
|
492 |
|
|
most significant bytes of register~X, because 0s are prefixed to the
|
493 |
|
|
constant byte~Z\null.
|
494 |
|
|
|
495 |
|
|
\bull\
|
496 |
|
|
@.OR@>
|
497 |
|
|
Each bit of register Y is logically ored with the corresponding bit of
|
498 |
|
|
register~Z or of the constant~Z, and the result is placed in register~X\null.
|
499 |
|
|
In other words, a bit of register~X is set to~0 if and only if the
|
500 |
|
|
corresponding bits of the operands are both~0;
|
501 |
|
|
in symbols, $\rX=\rY\lor\rZ$ or $\rX=\rY\lor\zz$.
|
502 |
|
|
|
503 |
|
|
In the special case $\zz=0$, the immediate variant of
|
504 |
|
|
this command simply copies register~Y to
|
505 |
|
|
register~X\null. The \MMIX\ assembler allows us to write
|
506 |
|
|
`\.{SET}~\.{\$X,\$Y}' as a convenient abbreviation for
|
507 |
|
|
`\.{OR}~\.{\$X,\$Y,0}'.
|
508 |
|
|
@.SET@>
|
509 |
|
|
|
510 |
|
|
\bull\
|
511 |
|
|
@.XOR@>
|
512 |
|
|
Each bit of register Y is logically xored with the corresponding bit of
|
513 |
|
|
register~Z or of the constant~Z, and the result is placed in register~X\null.
|
514 |
|
|
In other words, a bit of register~X is set to~0 if and only if the
|
515 |
|
|
corresponding bits of the operands are equal;
|
516 |
|
|
in symbols, $\rX=\rY\oplus\rZ$ or $\rX=\rY\oplus\zz$.
|
517 |
|
|
|
518 |
|
|
\bull\
|
519 |
|
|
@.ANDN@>
|
520 |
|
|
Each bit of register Y is logically anded with the complement of the
|
521 |
|
|
corresponding bit of
|
522 |
|
|
register~Z or of the constant~Z, and the result is placed in register~X\null.
|
523 |
|
|
In other words, a bit of register~X is set to~1 if and only if the
|
524 |
|
|
corresponding bit of register~Y is~1 and the other corresponding bit is~0;
|
525 |
|
|
in symbols, $\rX=\rY\setminus\rZ$ or $\rX=\rY\setminus\zz$.
|
526 |
|
|
(This is the {\it logical difference\/} operation; if the operands
|
527 |
|
|
are bit strings representing sets, we are computing the elements that
|
528 |
|
|
lie in one set but not the other.)
|
529 |
|
|
|
530 |
|
|
\bull\
|
531 |
|
|
@.ORN@>
|
532 |
|
|
Each bit of register Y is logically ored with the complement of the
|
533 |
|
|
corresponding bit of
|
534 |
|
|
register~Z or of the constant~Z, and the result is placed in register~X\null.
|
535 |
|
|
In other words, a bit of register~X is set to~1 if and only if the
|
536 |
|
|
corresponding bit of register~Y is greater than or equal to the other corresponding bit;
|
537 |
|
|
in symbols, $\rX=\rY\lor\overline\rZ$
|
538 |
|
|
or $\rX=\rY\lor\overline\zz$.
|
539 |
|
|
(This is the complement of $\rZ\setminus\rY$ or $\zz\setminus\rY$.)
|
540 |
|
|
|
541 |
|
|
\bull\
|
542 |
|
|
@.NAND@>
|
543 |
|
|
Each bit of register Y is logically anded with the corresponding bit of
|
544 |
|
|
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
|
545 |
|
|
In other words, a bit of register~X is set to~0 if and only if the
|
546 |
|
|
corresponding bits of the operands are both~1;
|
547 |
|
|
in symbols, $\rX=\rY\mathbin{\overline\land}\rZ$ or
|
548 |
|
|
$\rX=\rY\mathbin{\overline\land}\zz$.
|
549 |
|
|
|
550 |
|
|
\bull\
|
551 |
|
|
@.NOR@>
|
552 |
|
|
Each bit of register Y is logically ored with the corresponding bit of
|
553 |
|
|
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
|
554 |
|
|
In other words, a bit of register~X is set to~1 if and only if the
|
555 |
|
|
corresponding bits of the operands are both~0;
|
556 |
|
|
in symbols, $\rX=\rY\mathbin{\overline\lor}\rZ$ or
|
557 |
|
|
$\rX=\rY\mathbin{\overline\lor}\zz$.
|
558 |
|
|
|
559 |
|
|
\bull\
|
560 |
|
|
@.NAND@>
|
561 |
|
|
Each bit of register Y is logically xored with the corresponding bit of
|
562 |
|
|
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
|
563 |
|
|
In other words, a bit of register~X is set to~1 if and only if the
|
564 |
|
|
corresponding bits of the operands are equal;
|
565 |
|
|
in symbols, $\rX=\rY\mathbin{\overline\oplus}\rZ$ or
|
566 |
|
|
$\rX=\rY\mathbin{\overline\oplus}\zz$.
|
567 |
|
|
|
568 |
|
|
\bull\
|
569 |
|
|
@.MUX@>
|
570 |
|
|
For each bit position~$j$, the $j$th bit of register~X is set either to
|
571 |
|
|
bit~$j$ of register~Y
|
572 |
|
|
or to bit~$j$ of the other operand \$Z~or~Z, depending on
|
573 |
|
|
whether bit~$j$ of the special {\it mask register\/}~rM is 1 or 0:
|
574 |
|
|
@^rM@>
|
575 |
|
|
if ${\rm M}_j$ then $\yy_j$ else~$\zz_j$.
|
576 |
|
|
In symbols, $\rm\rX=(\rY\land rM)\lor(\rZ\land\overline{rM})$ or
|
577 |
|
|
$\rm\rX=(\rY\land rM)\lor(\zz\land\overline{rM})$.
|
578 |
|
|
(\MMIX\ has several such special registers, associated with instructions that
|
579 |
|
|
need more than two inputs or produce more than one output.)
|
580 |
|
|
|
581 |
|
|
@ Besides the eighteen bitwise operations, \MMIX\ can also perform unsigned
|
582 |
|
|
bytewise and biggerwise operations that are somewhat more exotic.
|
583 |
|
|
|
584 |
|
|
\bull\
|
585 |
|
|
@.BDIF@>
|
586 |
|
|
For each byte position~$j$, the $j$th byte of register~X is set to byte~$j$ of
|
587 |
|
|
register~Y minus byte~$j$ of the other operand \$Z~or~Z, unless that
|
588 |
|
|
difference is negative; in the latter case, byte~$j$ of~\$X is set to zero.
|
589 |
|
|
|
590 |
|
|
\bull\
|
591 |
|
|
@.WDIF@>
|
592 |
|
|
For each wyde position~$j$, the $j$th wyde of register~X is set to wyde~$j$ of
|
593 |
|
|
register~Y minus wyde~$j$ of the other operand \$Z~or~Z, unless that
|
594 |
|
|
difference is negative; in the latter case, wyde~$j$ of~\$X is set to zero.
|
595 |
|
|
|
596 |
|
|
\bull\
|
597 |
|
|
@.TDIF@>
|
598 |
|
|
For each tetra position~$j$, the $j$th tetra of register~X is set to tetra~$j$ of
|
599 |
|
|
register~Y minus tetra~$j$ of the other operand \$Z~or~Z, unless that
|
600 |
|
|
difference is negative; in the latter case, tetra~$j$ of~\$X is set to zero.
|
601 |
|
|
|
602 |
|
|
\bull\
|
603 |
|
|
@.ODIF@>
|
604 |
|
|
Register~X is set to register~Y minus the other operand \$Z~or~Z, unless
|
605 |
|
|
\$Z~or~Z exceeds register~Y; in the latter case,
|
606 |
|
|
\$X~is set to zero. The operands are treated as unsigned integers.
|
607 |
|
|
|
608 |
|
|
\smallskip
|
609 |
|
|
The \.{BDIF} and \.{WDIF} commands are useful
|
610 |
|
|
in applications to graphics or video; \.{TDIF} and \.{ODIF} are also
|
611 |
|
|
present for reasons of consistency. For example, if \.a and \.b are
|
612 |
|
|
registers containing
|
613 |
|
|
8-byte quantities, their bytewise maxima~\.c and
|
614 |
|
|
bytewise minima~\.d are computed by
|
615 |
|
|
$$\hbox{\tt BDIF x,a,b; ADDU c,x,b; SUBU d,a,x;}$$
|
616 |
|
|
similarly, the individual ``pixel differences'' \.e, namely the absolute
|
617 |
|
|
values of the differences of corresponding bytes, are computed by
|
618 |
|
|
$$\hbox{\tt BDIF x,a,b; BDIF y,b,a; OR e,x,y.}$$
|
619 |
|
|
To add individual
|
620 |
|
|
bytes of \.a and \.b while clipping all sums to 255 if they don't fit
|
621 |
|
|
in a single byte, one can say
|
622 |
|
|
$$\hbox{\tt NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;}$$
|
623 |
|
|
in other words, complement \.a, apply \.{BDIF}, and complement the result.
|
624 |
|
|
The operations can also be used to construct efficient operations on
|
625 |
|
|
strings of bytes or wydes.
|
626 |
|
|
@^graphics@>
|
627 |
|
|
@^pixels@>
|
628 |
|
|
@^saturating arithmetic@>
|
629 |
|
|
@^nybble@>
|
630 |
|
|
|
631 |
|
|
Exercise: Implement a ``nybble difference'' instruction that operates
|
632 |
|
|
in a similar way on sixteen nybbles at a time.
|
633 |
|
|
|
634 |
|
|
Answer: {\tt\spaceskip=.5em minus .3em
|
635 |
|
|
AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m;
|
636 |
|
|
BDIF x,x,y; BDIF xx,xx,yy; OR ans,x,xx} where register \.m contains
|
637 |
|
|
the mask \Hex{0f0f0f0f0f0f0f0f}.
|
638 |
|
|
|
639 |
|
|
(The \.{ANDN} operation can be regarded as
|
640 |
|
|
a ``bit difference'' instruction that operates
|
641 |
|
|
in a similar way on 64 bits at a time.)
|
642 |
|
|
|
643 |
|
|
@ Three more pairs of bit-fiddling instructions round out the collection of exotics.
|
644 |
|
|
|
645 |
|
|
\bull\
|
646 |
|
|
@.SADD@>
|
647 |
|
|
Each bit of register Y is logically anded with the complement of the
|
648 |
|
|
corresponding bit of
|
649 |
|
|
register~Z or of the constant~Z, and the number of 1~bits in the
|
650 |
|
|
result is placed in register~X\null.
|
651 |
|
|
In other words, register~X is set to the number of bit positions
|
652 |
|
|
in which register~Y has a~1 and the other operand has a~0;
|
653 |
|
|
in symbols, $\rX=\nu(\rY\setminus\rZ)$ or $\rX=\nu(\rY\setminus\zz)$.
|
654 |
|
|
When the second operand is zero this operation is sometimes called
|
655 |
|
|
``population counting,'' because it counts the number of 1s in register~Y\null.
|
656 |
|
|
@^population counting@>
|
657 |
|
|
@^counting ones@>
|
658 |
|
|
|
659 |
|
|
\bull\
|
660 |
|
|
@.MOR@>
|
661 |
|
|
Suppose the 64 bits of register Y are indexed as
|
662 |
|
|
$$y_{00}y_{01}\ldots y_{07}y_{10}y_{11}\ldots y_{17}\ldots
|
663 |
|
|
y_{70}y_{71}\ldots y_{77};$$
|
664 |
|
|
in other words, $y_{ij}$ is the $j$th bit of the $i$th byte, if we
|
665 |
|
|
number the bits and bytes from 0 to 7 in big-endian fashion from left to right.
|
666 |
|
|
Let the bits of the other operand, \$Z or~Z, be indexed similarly:
|
667 |
|
|
$$z_{00}z_{01}\ldots z_{07}z_{10}z_{11}\ldots z_{17}\ldots
|
668 |
|
|
z_{70}z_{71}\ldots z_{77}.$$
|
669 |
|
|
The \.{MOR} operation replaces each bit $x_{ij}$ of register~X by the bit
|
670 |
|
|
$$ y_{0j}z_{i0}\lor y_{1j}z_{i1}\lor \cdots \lor y_{7j}z_{i7}.$$
|
671 |
|
|
Thus, for example, if register Z contains the constant
|
672 |
|
|
\Hex{0102040810204080},
|
673 |
|
|
\.{MOR} reverses the order of the bytes in register~Y, converting between
|
674 |
|
|
little-endian and big-endian addressing.
|
675 |
|
|
@^big-endian versus little-endian@>
|
676 |
|
|
@^little-endian versus big-endian@>
|
677 |
|
|
(The $i$th byte of~\$X depends on the bytes of~\$Y as specified by the
|
678 |
|
|
$i$th byte of~\$Z or~Z\null. If we regard
|
679 |
|
|
64-bit words as $8\times8$ Boolean matrices, with one byte per column,
|
680 |
|
|
this operation computes the
|
681 |
|
|
Boolean product $\rX=\rY\,\rZ$ or $\rX=\rY\,\zz$. Alternatively, if we
|
682 |
|
|
regard 64-bit words as $8\times8$ matrices with one byte per~{\it row},
|
683 |
|
|
\.{MOR} computes the Boolean product $\rX=\rZ\,\rY$ or $\rX=\zz\,\rY$
|
684 |
|
|
with operands in the opposite order. The immediate form
|
685 |
|
|
\
|
686 |
|
|
to zero; the other byte is set to the bitwise or of whatever bytes of
|
687 |
|
|
register~Y are specified by the immediate operand~Z\null.)
|
688 |
|
|
|
689 |
|
|
Exercise: Explain how to compute a mask \.m that is \Hex{ff} in byte
|
690 |
|
|
positions where \.a exceeds \.b, \Hex{00} in all other bytes.
|
691 |
|
|
Answer: \.{BDIF}~\.{x,a,b;} \.{MOR}~\.{m,minusone,x;}
|
692 |
|
|
here \.{minusone} is a register consisting of all 1s. (Moreover,
|
693 |
|
|
if we \.{AND} this result
|
694 |
|
|
with \Hex{8040201008040201}, then \.{MOR} with $\zz=255$, we get
|
695 |
|
|
a one-byte encoding~of~\.m.)
|
696 |
|
|
|
697 |
|
|
\bull\
|
698 |
|
|
@.MXOR@>
|
699 |
|
|
This operation is like the Boolean multiplication just discussed, but
|
700 |
|
|
exclusive-or is used to combine the bits. Thus we obtain a matrix
|
701 |
|
|
product over the field of two elements instead of a Boolean matrix product.
|
702 |
|
|
This operation can be used to construct hash functions, among many other things.
|
703 |
|
|
(The hash functions aren't bad, but they are not ``universal'' in the
|
704 |
|
|
sense of exercise 6.4--72.)
|
705 |
|
|
@^matrices of bits@>
|
706 |
|
|
@^Boolean multiplication@>
|
707 |
|
|
|
708 |
|
|
@ Sixteen ``immediate wyde'' instructions are available for the common
|
709 |
|
|
case that a 16-bit constant is needed. In this case the Y~and~Z fields
|
710 |
|
|
of the instruction are regarded as a single 16-bit unsigned number~YZ\null.
|
711 |
|
|
@^immediate operands@>
|
712 |
|
|
|
713 |
|
|
\bull\
|
714 |
|
|
@.SETH@>
|
715 |
|
|
\
|
716 |
|
|
@.SETMH@>
|
717 |
|
|
\
|
718 |
|
|
@.SETML@>
|
719 |
|
|
\
|
720 |
|
|
@.SETL@>
|
721 |
|
|
The 16-bit unsigned number YZ is shifted left
|
722 |
|
|
by either 48 or 32 or 16 or 0 bits, respectively, and placed into register~X\null.
|
723 |
|
|
Thus, for example, \.{SETML} inserts
|
724 |
|
|
a given value into the second-least-significant wyde of register~X and
|
725 |
|
|
sets the other three wydes to zero.
|
726 |
|
|
|
727 |
|
|
\bull\
|
728 |
|
|
@.INCH@>
|
729 |
|
|
\
|
730 |
|
|
@.INCMH@>
|
731 |
|
|
\
|
732 |
|
|
@.INCML@>
|
733 |
|
|
\
|
734 |
|
|
@.INCL@>
|
735 |
|
|
The 16-bit unsigned number YZ is shifted left
|
736 |
|
|
by either 48 or 32 or 16 or 0 bits, respectively, and added to register~X,
|
737 |
|
|
ignoring overflow; the result is placed back into register~X\null.
|
738 |
|
|
|
739 |
|
|
If YZ is the hexadecimal constant \Hex{8000}, the command \
|
740 |
|
|
complements the most significant bit of register~X\null. We will see
|
741 |
|
|
below that this can be used to negate a floating point number.
|
742 |
|
|
@^negation, floating point@>
|
743 |
|
|
|
744 |
|
|
\bull\
|
745 |
|
|
@.ORH@>
|
746 |
|
|
\
|
747 |
|
|
@.ORMH@>
|
748 |
|
|
\
|
749 |
|
|
@.ORML@>
|
750 |
|
|
\
|
751 |
|
|
@.ORL@>
|
752 |
|
|
The 16-bit unsigned number YZ is shifted left
|
753 |
|
|
by either 48 or 32 or 16 or 0 bits, respectively, and ored with register~X;
|
754 |
|
|
the result is placed back into register~X\null.
|
755 |
|
|
|
756 |
|
|
Notice that any desired 4-wyde constant \.{GH} \.{IJ} \.{KL} \.{MN}
|
757 |
|
|
can be inserted into a register with a sequence of four instructions
|
758 |
|
|
such as
|
759 |
|
|
$$\hbox{\tt SETH \$X,GH; INCMH \$X,IJ; INCML \$X,KL; INCL \$X,MN;}$$
|
760 |
|
|
any of these \.{INC} instructions could also be replaced by \.{OR}.
|
761 |
|
|
|
762 |
|
|
\bull\
|
763 |
|
|
@.ANDNH@>
|
764 |
|
|
\
|
765 |
|
|
@.ANDNMH@>
|
766 |
|
|
\
|
767 |
|
|
@.ANDNML@>
|
768 |
|
|
\
|
769 |
|
|
@.ANDNL@>
|
770 |
|
|
The 16-bit unsigned number YZ is shifted left
|
771 |
|
|
by either 48 or 32 or 16 or 0 bits, respectively, then
|
772 |
|
|
complemented and anded with register~X;
|
773 |
|
|
the result is placed back into register~X\null.
|
774 |
|
|
|
775 |
|
|
If YZ is the hexadecimal
|
776 |
|
|
constant \Hex{8000}, the command \
|
777 |
|
|
bit of register~X to be~0. This can be used to compute the absolute value of
|
778 |
|
|
a floating point number.
|
779 |
|
|
@^absolute value, floating point@>
|
780 |
|
|
|
781 |
|
|
@ \MMIX\ knows several ways to shift a register left or right
|
782 |
|
|
by any number of bits.
|
783 |
|
|
|
784 |
|
|
\bull\
|
785 |
|
|
@.SL@>
|
786 |
|
|
The bits of register~Y are shifted left by \$Z or Z places, and 0s
|
787 |
|
|
are shifted in from the right; the result is placed in register~X\null.
|
788 |
|
|
Register~Y is treated as a signed number, but
|
789 |
|
|
the second operand is treated as an unsigned number.
|
790 |
|
|
The effect is the same as multiplication by
|
791 |
|
|
$2^{\mkern1mu\rZ}$ or by $2^\zz$; an integer overflow exception occurs if the
|
792 |
|
|
result is $\ge2^{63}$ or $<-2^{63}$.
|
793 |
|
|
In particular, if the second operand is 64 or~more, register~X will
|
794 |
|
|
become entirely zero, and integer overflow will be signaled unless
|
795 |
|
|
register~Y was zero.
|
796 |
|
|
|
797 |
|
|
\bull\
|
798 |
|
|
@.SLU@>
|
799 |
|
|
The bits of register~Y are shifted left by \$Z or Z places, and 0s
|
800 |
|
|
are shifted in from the right; the result is placed in register~X\null.
|
801 |
|
|
Both operands are treated as unsigned numbers. The \.{SLU} instructions
|
802 |
|
|
are equivalent to \.{SL}, except that no test for overflow is made.
|
803 |
|
|
|
804 |
|
|
\bull\
|
805 |
|
|
@.SR@>
|
806 |
|
|
The bits of register~Y are shifted right by \$Z or Z places, and copies
|
807 |
|
|
of the leftmost bit (the sign bit) are shifted in from the left; the result is
|
808 |
|
|
placed in register~X\null.
|
809 |
|
|
Register~Y is treated as a signed number, but
|
810 |
|
|
the second operand is treated as an unsigned number.
|
811 |
|
|
The effect is the same as division by $2^{\mkern1mu\rZ}$ or by
|
812 |
|
|
$2^\zz$ and rounding down. In particular, if the second operand is 64 or~more,
|
813 |
|
|
register~X will become zero if \$Y was nonnegative, $-1$ if \$Y was negative.
|
814 |
|
|
|
815 |
|
|
\bull\
|
816 |
|
|
@.SRU@>
|
817 |
|
|
The bits of register~Y are shifted right by \$Z or Z places, and 0s
|
818 |
|
|
are shifted in from the left; the result is placed in register~X\null.
|
819 |
|
|
Both operands are treated as unsigned numbers.
|
820 |
|
|
The effect is the same as unsigned division of a 64-bit number
|
821 |
|
|
by $2^{\mkern1mu\rZ}$ or by~$2^\zz$;
|
822 |
|
|
if the second operand is 64 or~more, register~X will become entirely~zero.
|
823 |
|
|
|
824 |
|
|
@* Comparisons.
|
825 |
|
|
Arithmetic and logical operations are nice,
|
826 |
|
|
but computer programs also need to compare numbers
|
827 |
|
|
and to change the course of a calculation depending on what they find.
|
828 |
|
|
\MMIX\ has four comparison instructions to facilitate such decision-making.
|
829 |
|
|
|
830 |
|
|
\bull\
|
831 |
|
|
@.CMP@>
|
832 |
|
|
Register X is set to $-1$ if register Y is less than register Z or less than
|
833 |
|
|
the unsigned immediate value~Z, using the conventions of signed
|
834 |
|
|
arithmetic; it is set to 0 if register~Y is equal to register Z or equal to
|
835 |
|
|
the unsigned immediate value~Z; otherwise it is set to~1.
|
836 |
|
|
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$.
|
837 |
|
|
|
838 |
|
|
\bull\
|
839 |
|
|
@.CMPU@>
|
840 |
|
|
Register X is set to $-1$ if register Y is less than register Z or less than
|
841 |
|
|
the unsigned immediate value Z, using the conventions of unsigned
|
842 |
|
|
arithmetic; it is set to 0 if register Y is equal to register Z or equal to
|
843 |
|
|
the unsigned immediate value~Z; otherwise it is set to~1.
|
844 |
|
|
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$.
|
845 |
|
|
|
846 |
|
|
@ There also are 32 conditional instructions, which choose quickly between
|
847 |
|
|
two alternative courses of action.
|
848 |
|
|
|
849 |
|
|
\bull\
|
850 |
|
|
@.CSN@>
|
851 |
|
|
If register Y is negative (namely if its most significant bit is~1),
|
852 |
|
|
register~X is set to the contents of register~Z or to the
|
853 |
|
|
unsigned immediate value~Z. Otherwise nothing happens.
|
854 |
|
|
|
855 |
|
|
\bull\
|
856 |
|
|
@.CSZ@>
|
857 |
|
|
\bul\
|
858 |
|
|
@.CSP@>
|
859 |
|
|
\bul\
|
860 |
|
|
@.CSOD@>
|
861 |
|
|
\bul\
|
862 |
|
|
@.CSNN@>
|
863 |
|
|
\bul\
|
864 |
|
|
@.CSNZ@>
|
865 |
|
|
\bul\
|
866 |
|
|
@.CSNP@>
|
867 |
|
|
\bul\
|
868 |
|
|
@.CSEV@>
|
869 |
|
|
These instructions are entirely analogous to \.{CSN}, except
|
870 |
|
|
that register~X changes only if register~Y is respectively zero, positive,
|
871 |
|
|
odd, nonnegative, nonzero, nonpositive, or nonodd.
|
872 |
|
|
|
873 |
|
|
\bull\
|
874 |
|
|
@.ZSN@>
|
875 |
|
|
If register Y is negative (namely if its most significant bit is~1),
|
876 |
|
|
register~X is set to the contents of register~Z or to the
|
877 |
|
|
unsigned immediate value~Z. Otherwise register~X is set to zero.
|
878 |
|
|
|
879 |
|
|
\bull\
|
880 |
|
|
@.ZSZ@>
|
881 |
|
|
\bul\
|
882 |
|
|
@.ZSP@>
|
883 |
|
|
\bul\
|
884 |
|
|
@.ZSOD@>
|
885 |
|
|
\bul\
|
886 |
|
|
@.ZSNN@>
|
887 |
|
|
\bul\
|
888 |
|
|
@.ZSNZ@>
|
889 |
|
|
\bul\
|
890 |
|
|
@.ZSNP@>
|
891 |
|
|
\bul\
|
892 |
|
|
@.ZSEV@>
|
893 |
|
|
These instructions are entirely analogous to \.{ZSN}, except
|
894 |
|
|
that \$X is set to \$Z or~Z if register~Y is respectively zero, positive,
|
895 |
|
|
odd, nonnegative, nonzero, nonpositive, or even; otherwise
|
896 |
|
|
\$X is set to zero.
|
897 |
|
|
|
898 |
|
|
Notice that the two instructions \
|
899 |
|
|
the same effect. So do the two instructions \
|
900 |
|
|
So do \
|
901 |
|
|
|
902 |
|
|
@* Branches and jumps.
|
903 |
|
|
\MMIX\ ordinarily executes instructions in sequence, proceeding from
|
904 |
|
|
an instruction in tetrabyte M$_4[\lambda]$ to the instruction in
|
905 |
|
|
M$_4[\lambda+4]$. But there are several ways to interrupt
|
906 |
|
|
the normal flow of control, most of which use the Y and Z fields of
|
907 |
|
|
an instruction as a combined 16-bit YZ field.
|
908 |
|
|
For example, \
|
909 |
|
|
is typical: It means that control should skip ahead 1000 instructions
|
910 |
|
|
to the command that appears $4000$ bytes after the
|
911 |
|
|
\.{BNZ}, if register~3 is not equal to zero.
|
912 |
|
|
|
913 |
|
|
There are eight branch-forward instructions, corresponding to the
|
914 |
|
|
eight conditions in the \.{CS} and \.{ZS} commands that we discussed earlier.
|
915 |
|
|
And there are eight similar branch-backward instructions; for example,
|
916 |
|
|
\
|
917 |
|
|
instruction that appears $4000$ bytes {\it before\/}
|
918 |
|
|
this \.{BOD} command, if register~2 is odd. The numeric OP-code when branching
|
919 |
|
|
backward is one greater than the OP-code when branching
|
920 |
|
|
forward; the assembler takes care of this automatically, just as it takes
|
921 |
|
|
cares of changing \.{ADD} from 32 to 33 when necessary.
|
922 |
|
|
|
923 |
|
|
Since branches are relative to the current location, the \MMIX\ assembler
|
924 |
|
|
treats branch instructions in a special way.
|
925 |
|
|
Suppose a programmer writes `\.{BNZ} \.{\$3,Case5}',
|
926 |
|
|
where \.{Case5} is the address of an instruction in location~$l$.
|
927 |
|
|
If this instruction appears in location~$\lambda$, the assembler first
|
928 |
|
|
computes the displacement $\delta=\lfloor(l-\lambda)/4\rfloor$. Then if
|
929 |
|
|
$\delta$ is nonnegative, the quantity~$\delta$
|
930 |
|
|
is placed in the YZ field of a \.{BNZ}
|
931 |
|
|
command, and it should be less than $2^{16}$; if $\delta$ is negative,
|
932 |
|
|
the quantity $2^{16}+\delta$ is placed in the YZ field of a \.{BNZ}
|
933 |
|
|
command with OP-code increased by~1,
|
934 |
|
|
and $\delta$ should not be less than $-2^{16}$.
|
935 |
|
|
|
936 |
|
|
The symbol \.{@@} used in our examples of
|
937 |
|
|
\.{BNZ} and \.{BOD} above is interpreted by the
|
938 |
|
|
assembler as an abbreviation for ``the location of the current
|
939 |
|
|
instruction.'' In the following
|
940 |
|
|
notes we will define pairs of branch commands by writing, for example,
|
941 |
|
|
`\.{BNZ}~\.{\$X,@@+4*YZ[-262144]}'; this stands for a branch-forward
|
942 |
|
|
command that
|
943 |
|
|
branches to the current location plus four times~YZ, as well as for
|
944 |
|
|
a branch-backward command that branches to the current
|
945 |
|
|
location plus four times $(\rm YZ-65536)$.
|
946 |
|
|
|
947 |
|
|
\bull\
|
948 |
|
|
@.BN@>
|
949 |
|
|
\bul\
|
950 |
|
|
@.BZ@>
|
951 |
|
|
\bul\
|
952 |
|
|
@.BP@>
|
953 |
|
|
\bul\
|
954 |
|
|
@.BOD@>
|
955 |
|
|
\bul\
|
956 |
|
|
@.BNN@>
|
957 |
|
|
\bul\
|
958 |
|
|
@.BNZ@>
|
959 |
|
|
\bul\
|
960 |
|
|
@.BNP@>
|
961 |
|
|
\bul\
|
962 |
|
|
@.BEV@>
|
963 |
|
|
If register X is respectively negative, zero, positive, odd, nonnegative,
|
964 |
|
|
nonzero, nonpositive, or even, and if this instruction appears in memory
|
965 |
|
|
location $\lambda$, the next instruction is taken from memory location
|
966 |
|
|
$\lambda+4{\rm YZ}$ (branching forward) or $\lambda+4({\rm YZ}-2^{16})$
|
967 |
|
|
(branching backward). Thus one can go from location~$\lambda$ to any location
|
968 |
|
|
between $\lambda-262{,}144$ and $\lambda+262{,}140$, inclusive.
|
969 |
|
|
|
970 |
|
|
\smallskip
|
971 |
|
|
Sixteen additional branch instructions called {\it probable branches\/}
|
972 |
|
|
are also provided. They have exactly the same meaning as ordinary
|
973 |
|
|
branch instructions; for example, \
|
974 |
|
|
go backward 4000 bytes if register~2 is odd. But they differ in running time:
|
975 |
|
|
On some implementations of\/ \MMIX,
|
976 |
|
|
a branch instruction takes longer when the branch is taken, while a
|
977 |
|
|
probable branch takes longer when the branch is {\it not\/} taken.
|
978 |
|
|
Thus programmers should use a \.B instruction when they think branching is
|
979 |
|
|
relatively unlikely, but they should use \.{PB} when they expect
|
980 |
|
|
branching to occur more often than not. Here is a list of the
|
981 |
|
|
probable branch commands, for completeness:
|
982 |
|
|
|
983 |
|
|
\bull\
|
984 |
|
|
@.PBN@>
|
985 |
|
|
\bul\
|
986 |
|
|
@.PBZ@>
|
987 |
|
|
\bul\
|
988 |
|
|
@.PBP@>
|
989 |
|
|
\bul\
|
990 |
|
|
@.PBOD@>
|
991 |
|
|
\bul\
|
992 |
|
|
@.PBNN@>
|
993 |
|
|
\bul\
|
994 |
|
|
@.PBNZ@>
|
995 |
|
|
\bul\
|
996 |
|
|
@.PBNP@>
|
997 |
|
|
\bul\
|
998 |
|
|
@.PBEV@>
|
999 |
|
|
|
1000 |
|
|
@ Locations that are relative to the current instruction can be
|
1001 |
|
|
transformed into absolute locations with \.{GETA} commands.
|
1002 |
|
|
|
1003 |
|
|
\bull\
|
1004 |
|
|
@.GETA@>
|
1005 |
|
|
The value $\lambda+4{\rm YZ}$ or $\lambda+4({\rm YZ}-2^{16})$ is placed in
|
1006 |
|
|
register~X\null. (The assembly language conventions of branch instructions
|
1007 |
|
|
apply; for example, we can write `\.{GETA} \.{\$X,Addr}'.)
|
1008 |
|
|
|
1009 |
|
|
@ \MMIX\ also has unconditional jump instructions, which change the
|
1010 |
|
|
location of the next instruction no matter what.
|
1011 |
|
|
|
1012 |
|
|
\bull\
|
1013 |
|
|
@.JMP@>
|
1014 |
|
|
A \.{JMP} command treats bytes X, Y, and Z as an unsigned 24-bit
|
1015 |
|
|
integer XYZ. It allows a program to transfer control from location $\lambda$ to any
|
1016 |
|
|
location between $\lambda-67\?{,}108{,}864$ and $\lambda+67\?{,}108{,}860$
|
1017 |
|
|
inclusive, using relative addressing as in the \.{B} and \.{PB} commands.
|
1018 |
|
|
|
1019 |
|
|
\bull\
|
1020 |
|
|
@.GO@>
|
1021 |
|
|
\MMIX\ takes its next instruction from location $\rY+\rZ$ or $\rY+\zz$,
|
1022 |
|
|
and continues from there. Register~X is set equal to $\lambda+4$, the
|
1023 |
|
|
location of the instruction that would ordinarily have been executed next.
|
1024 |
|
|
(\.{GO} is similar to a jump, but it is not relative
|
1025 |
|
|
to the current location. Since \.{GO} has the same format as a load or store
|
1026 |
|
|
instruction, a loading routine can treat program labels with the same mechanism
|
1027 |
|
|
that is used to treat references to data.)
|
1028 |
|
|
|
1029 |
|
|
An old-fashioned type of subroutine linkage can be implemented by saying
|
1030 |
|
|
either `\.{GO}~\.{r,subloc,0}' or `\.{GETA}~\.{r,@@+8;}
|
1031 |
|
|
\.{JMP}~\.{Sub}' to~enter a subroutine,
|
1032 |
|
|
then `\.{GO}~\.{r,r,0}' to return.
|
1033 |
|
|
But subroutines are normally entered with the instructions
|
1034 |
|
|
\.{PUSHJ} or \.{PUSHGO}, described below.
|
1035 |
|
|
|
1036 |
|
|
The two least significant bits of the address
|
1037 |
|
|
in a \.{GO} command are essentially ignored. They will, however, appear in
|
1038 |
|
|
the value of~$\lambda$ returned by \.{GETA} instructions, and in the
|
1039 |
|
|
return-jump register~rJ after \.{PUSHJ} or \.{PUSHGO} instructions are
|
1040 |
|
|
performed, and in
|
1041 |
|
|
@^rJ@>
|
1042 |
|
|
the where-interrupted register at the time of an interrupt. Therefore they
|
1043 |
|
|
could be used to send some kind of signal to a subroutine or (less likely)
|
1044 |
|
|
to an interrupt handler.
|
1045 |
|
|
|
1046 |
|
|
@* Multiplication and division.
|
1047 |
|
|
Now for some instructions that make \MMIX\ work harder.
|
1048 |
|
|
|
1049 |
|
|
\bull\
|
1050 |
|
|
@.MUL@>
|
1051 |
|
|
The signed product of the number in register Y by either the
|
1052 |
|
|
number in register~Z or the unsigned byte~Z
|
1053 |
|
|
replaces the contents of register~X\null. An
|
1054 |
|
|
integer overflow exception can occur, as with \.{ADD} or \.{SUB}, if the
|
1055 |
|
|
result is less than $-2^{63}$ or greater than $2^{63}-1$. (Immediate
|
1056 |
|
|
multiplication by powers of~2 can be done more rapidly with the \.{SL}
|
1057 |
|
|
instruction.)
|
1058 |
|
|
|
1059 |
|
|
\bull\
|
1060 |
|
|
@.MULU@>
|
1061 |
|
|
The lower 64 bits of the
|
1062 |
|
|
unsigned 128-bit product of register~Y and either
|
1063 |
|
|
register~Z or~Z are placed in register~X, and the upper 64 bits are
|
1064 |
|
|
placed in the special {\it himult register\/}~rH\null. (Immediate multiplication
|
1065 |
|
|
@^rH@>
|
1066 |
|
|
by powers of~2 can be done more rapidly with the \.{SLU} instruction,
|
1067 |
|
|
if the upper half is not needed.
|
1068 |
|
|
Furthermore, an instruction like \<4ADDU \$X,\$Y,\$Y is faster than
|
1069 |
|
|
\.{MULU} \.{\$X,\$Y,5}.)
|
1070 |
|
|
|
1071 |
|
|
\bull\
|
1072 |
|
|
@.DIV@>
|
1073 |
|
|
The signed quotient of the number in register Y divided
|
1074 |
|
|
by either the number in register~Z or the unsigned byte~Z
|
1075 |
|
|
replaces the contents of register~X, and the signed remainder
|
1076 |
|
|
is placed in the special {\it remainder register\/}~rR\null.
|
1077 |
|
|
@^rR@>
|
1078 |
|
|
An integer divide check exception occurs if the divisor is zero; in that
|
1079 |
|
|
case \$X is set to zero and rR is set to~\$Y\null.
|
1080 |
|
|
@^divide check exception@>
|
1081 |
|
|
@^overflow@>
|
1082 |
|
|
An integer overflow exception occurs if the number $-2^{63}$ is divided
|
1083 |
|
|
by~$-1$; otherwise integer overflow is impossible. The quotient of
|
1084 |
|
|
$y$ divided by~$z$ is defined to be $\lfloor y/z\rfloor$, and the remainder
|
1085 |
|
|
is defined to be $y-\lfloor y/z\rfloor z$ (also written $y\bmod z$).
|
1086 |
|
|
Thus, the remainder is either
|
1087 |
|
|
zero or has the sign of the divisor. Dividing by $z=2^t$ gives
|
1088 |
|
|
exactly the same quotient as shifting right~$t$ via the \.{SR} command, and
|
1089 |
|
|
exactly the same remainder as anding with $z-1$ via the \.{AND} command.
|
1090 |
|
|
Division of a positive 63-bit number by a positive constant can be accomplished
|
1091 |
|
|
more quickly by computing the upper half of a suitable unsigned product and
|
1092 |
|
|
shifting it right appropriately.
|
1093 |
|
|
|
1094 |
|
|
\bull\
|
1095 |
|
|
@.DIVU@>
|
1096 |
|
|
The unsigned 128-bit number obtained by prefixing the special {\it dividend
|
1097 |
|
|
register}~rD to the contents of register~Y is divided either by the
|
1098 |
|
|
@^rD@>
|
1099 |
|
|
unsigned number in register~Z or by the unsigned byte~Z, and the quotient is placed
|
1100 |
|
|
in register~X\null. The remainder is placed in the remainder
|
1101 |
|
|
register~rR\null.
|
1102 |
|
|
However, if rD is greater than or equal to
|
1103 |
|
|
the divisor (and in particular if the divisor is zero),
|
1104 |
|
|
then \$X is set to~rD and rR is set to~\$Y\null.
|
1105 |
|
|
(Unsigned arithmetic never signals an exceptional condition, even
|
1106 |
|
|
when dividing by zero.)
|
1107 |
|
|
If rD is zero, unsigned division by $z=2^t$ gives exactly the same quotient as
|
1108 |
|
|
shifting right~$t$ via the \.{SRU} command, and
|
1109 |
|
|
exactly the same remainder as anding with $z-1$ via the \.{AND} command.
|
1110 |
|
|
Section 4.3.1 of {\sl Seminumerical Algorithms\/}
|
1111 |
|
|
explains how to use unsigned division to obtain the quotient and remainder
|
1112 |
|
|
of extremely large numbers.
|
1113 |
|
|
|
1114 |
|
|
@* Floating point computations.
|
1115 |
|
|
Floating point arithmetic conforming to the famous IEEE/ANSI
|
1116 |
|
|
Standard~754 is provided for arbitrary 64-bit numbers. The IEEE standard
|
1117 |
|
|
refers to such numbers as ``double format'' quantities, but \MMIX\
|
1118 |
|
|
calls them simply floating point numbers because 64-bit quantities are
|
1119 |
|
|
the~norm.
|
1120 |
|
|
@^floating point arithmetic@>
|
1121 |
|
|
@^IEEE/ANSI Standard 754@>
|
1122 |
|
|
@^subnormal numbers@>
|
1123 |
|
|
@^normal numbers@>
|
1124 |
|
|
@^NaN@>
|
1125 |
|
|
@^overflow@>
|
1126 |
|
|
@^underflow@>
|
1127 |
|
|
@^invalid exception@>
|
1128 |
|
|
@^inexact exception@>
|
1129 |
|
|
@^signaling NaN@>
|
1130 |
|
|
@^quiet NaN@>
|
1131 |
|
|
@^infinity@>
|
1132 |
|
|
@^rounding modes@>
|
1133 |
|
|
|
1134 |
|
|
A positive floating point number has 53 bits of precision and can range
|
1135 |
|
|
from approximately $10^{-308}$ to $10^{308}$. ``Subnormal numbers''
|
1136 |
|
|
between $10^{-324}$ and $10^{-308}$ can also be represented, but with fewer
|
1137 |
|
|
bits of precision.
|
1138 |
|
|
Floating point numbers can be
|
1139 |
|
|
infinite, and they satisfy such identities as $1.0/\infty=+0.0$, $-2.8\times\infty
|
1140 |
|
|
=-\infty$. Floating
|
1141 |
|
|
point quantities can also be ``Not-a-Numbers'' or NaNs, which are
|
1142 |
|
|
further classified into signaling NaNs and quiet NaNs.
|
1143 |
|
|
|
1144 |
|
|
Five kinds
|
1145 |
|
|
of exceptions can occur during floating point computations, and they
|
1146 |
|
|
each have code letters: Floating
|
1147 |
|
|
overflow~(O) or underflow~(U); floating divide by zero~(Z);
|
1148 |
|
|
floating inexact~(X); and floating invalid~(I).
|
1149 |
|
|
For example, the multiplication of sufficiently small integers causes
|
1150 |
|
|
no exceptions, and the division of 91.0 by~13.0 is also exception-free,
|
1151 |
|
|
but the division 1.0/3.0 is inexact. The multiplication of extremely
|
1152 |
|
|
large or extremely small floating point numbers is inexact and it
|
1153 |
|
|
also causes overflow or underflow.
|
1154 |
|
|
Invalid results occur when taking the square root of a negative
|
1155 |
|
|
number; mathematicians can remember the I exception
|
1156 |
|
|
by relating it to the square root of $-1.0$.
|
1157 |
|
|
Invalid results also occur when trying to convert infinity
|
1158 |
|
|
or a quiet NaN to a fixed-point
|
1159 |
|
|
integer, or when any signaling NaN is encountered, or when
|
1160 |
|
|
mathematically undefined operations like $\infty-\infty$ or $0/0$ are
|
1161 |
|
|
requested.
|
1162 |
|
|
(Programmers can be sure that they have not erroneously
|
1163 |
|
|
used uninitialized floating point data if they initialize all their variables
|
1164 |
|
|
to signaling NaN values.)
|
1165 |
|
|
|
1166 |
|
|
Four different rounding modes for inexact results are available:
|
1167 |
|
|
round to nearest (and to even in case of ties);
|
1168 |
|
|
round off (toward zero); round up (toward $+\infty)$;
|
1169 |
|
|
or round down (toward $-\infty$). \MMIX\
|
1170 |
|
|
has a special {\it arithmetic status register\/}~rA that specifies the
|
1171 |
|
|
@^rA@>
|
1172 |
|
|
current rounding mode and the user's current preferences for exception
|
1173 |
|
|
handling.
|
1174 |
|
|
|
1175 |
|
|
\def\NaN{{\rm NaN}}
|
1176 |
|
|
IEEE standard arithmetic provides an excellent foundation for scientific
|
1177 |
|
|
calculations, and it will be thoroughly explained in the fourth
|
1178 |
|
|
edition of {\sl Seminumerical Algorithms}, Section 4.2.
|
1179 |
|
|
For our present purposes, we need not study all the details; but
|
1180 |
|
|
we do need to specify \MMIX's behavior with respect to several
|
1181 |
|
|
things that are not completely defined by the standard.
|
1182 |
|
|
For example, the IEEE standard does not fully define the
|
1183 |
|
|
result of operations with NaNs.
|
1184 |
|
|
|
1185 |
|
|
When an octabyte represents a floating point number
|
1186 |
|
|
in \MMIX's registers, the leftmost bit is the sign; then come 11 bits for an
|
1187 |
|
|
exponent~$e$; and the remaining 52 bits are the fraction part~$f$.
|
1188 |
|
|
We regard $e$ as an integer between 0 and $(11111111111)_2=2047$, and we regard $f$ as
|
1189 |
|
|
a fraction between 0 and $(.111\ldots1)_2=1-2^{-52}$.
|
1190 |
|
|
Each octabyte has the following
|
1191 |
|
|
significance:
|
1192 |
|
|
$$\vbox{\halign{\hfil$\pm#$,\quad if \hfil\cr
|
1193 |
|
|
0.0&$e=f=0$ (zero);\cr
|
1194 |
|
|
2^{-1022}f&$e=0$ and $f>0$ (subnormal);\cr
|
1195 |
|
|
2^{\mkern1mu e-1023}(1+f)&$0
|
1196 |
|
|
\infty&$e=2047$ and $f=0$ (infinite);\cr
|
1197 |
|
|
\NaN(f)&$e=2047$ and $0
|
1198 |
|
|
\NaN(f)&$e=2047$ and $f\ge1/2$ (quiet NaN).\cr}}$$
|
1199 |
|
|
Notice that $+0.0$ is distinguished from $-0.0$; this fact is
|
1200 |
|
|
important for interval arithmetic.
|
1201 |
|
|
@^minus zero@>
|
1202 |
|
|
|
1203 |
|
|
Exercise: What 64 bits represent the floating point number 1.0?
|
1204 |
|
|
Answer: We want $e=1023$ and $f=0$, so the answer is \Hex{3ff0000000000000}.
|
1205 |
|
|
|
1206 |
|
|
Exercise: What is the largest finite floating point number?
|
1207 |
|
|
Answer: We want $e=2046$ and $f=1-2^{-52}$, so the answer is
|
1208 |
|
|
$\Hex{7fefffffffffffff}=2^{1024}-2^{971}$.
|
1209 |
|
|
|
1210 |
|
|
@ The seven IEEE floating point arithmetic operations (addition, subtraction,
|
1211 |
|
|
multiplication, division, remainder, square root, and nearest-integer)
|
1212 |
|
|
all share common features, called the {\it standard floating point
|
1213 |
|
|
conventions\/} in the discussion below:
|
1214 |
|
|
@^standard floating point conventions@>
|
1215 |
|
|
@^overflow@>
|
1216 |
|
|
@^underflow@>
|
1217 |
|
|
The operation is performed on floating point numbers found in two registers,
|
1218 |
|
|
\$Y and~\$Z, except that square root and integerization
|
1219 |
|
|
involve only one operand.
|
1220 |
|
|
If neither input operand is a NaN, we first determine the exact result,
|
1221 |
|
|
then round it using the current rounding mode
|
1222 |
|
|
found in special register~rA\null. Infinite results are exact and need
|
1223 |
|
|
no rounding. A floating overflow exception occurs if the rounded
|
1224 |
|
|
result is finite but needs an exponent greater than 2046.
|
1225 |
|
|
A floating underflow exception occurs if the rounded result needs an exponent
|
1226 |
|
|
less than~1 and either (i)~the unrounded result cannot be represented exactly
|
1227 |
|
|
@^rA@>
|
1228 |
|
|
as a subnormal number or (ii)~the ``floating underflow trip'' is enabled in~rA\null.
|
1229 |
|
|
(Trips are discussed below.)
|
1230 |
|
|
NaNs are treated specially as follows: If either \$Y or~\$Z is a signaling NaN,
|
1231 |
|
|
an invalid exception occurs and the NaN is quieted by adding 1/2 to its
|
1232 |
|
|
fraction part. Then if \$Z is a quiet NaN, the result is set
|
1233 |
|
|
to \$Z; otherwise if \$Y is a quiet NaN, the result is set to \$Y\null.
|
1234 |
|
|
(Registers \$Y and \$Z do not actually change.)
|
1235 |
|
|
|
1236 |
|
|
\bull\\>
|
1237 |
|
|
@.FADD@>
|
1238 |
|
|
The floating point sum $\rY+\rZ$ is computed by the
|
1239 |
|
|
standard floating point conventions just described,
|
1240 |
|
|
and placed in register~X\null.
|
1241 |
|
|
An invalid exception occurs if the sum is $(+\infty)+(-\infty)$ or
|
1242 |
|
|
$(-\infty)+(+\infty)$; in that case the result is $\NaN(1/2)$ with the sign
|
1243 |
|
|
of~\$Z\null. If the sum is exactly zero and the current mode is
|
1244 |
|
|
not rounding-down, the result is $+0.0$ except that $(-0.0)+(-0.0)=-0.0$. If the
|
1245 |
|
|
@^minus zero@>
|
1246 |
|
|
sum is exactly zero and the current mode is rounding-down, the result
|
1247 |
|
|
is $-0.0$ except that $(+0.0)+(+0.0)=+0.0$.
|
1248 |
|
|
These rules for signed zeros turn out to be useful when doing interval
|
1249 |
|
|
arithmetic: If the lower bound of an interval is $+0.0$ or if the
|
1250 |
|
|
upper bound is $-0.0$, the interval does not contain zero, so the
|
1251 |
|
|
numbers in the interval have a known sign.
|
1252 |
|
|
|
1253 |
|
|
Floating point underflow cannot occur unless the U-trip has been enabled,
|
1254 |
|
|
because any underflowing result of floating point
|
1255 |
|
|
addition can be represented exactly as a subnormal number.
|
1256 |
|
|
|
1257 |
|
|
Silly but instructive exercise: Find all pairs of numbers $(\rY,\rZ)$ such
|
1258 |
|
|
that the commands \
|
1259 |
|
|
the same result in~\$X
|
1260 |
|
|
(although \.{FADD} may cause floating exceptions).
|
1261 |
|
|
Answer: Of course \$Y or \$Z could be zero, if the other one is not a signaling
|
1262 |
|
|
NaN. Or one could be signaling and the other \Hex{0008000000000000}.
|
1263 |
|
|
Other possibilities
|
1264 |
|
|
occur when they are both positive and less than
|
1265 |
|
|
\Hex{0010000000000001}; or when one operand is \Hex{0000000000000001}
|
1266 |
|
|
and the other is an odd number between \Hex{0020000000000001} and
|
1267 |
|
|
\Hex{002ffffffffffffd} inclusive (rounding to nearest).
|
1268 |
|
|
And still more surprising possibilities exist, such as
|
1269 |
|
|
\Hex{7f6001b4c67bc809}\thinspace+\thinspace\Hex{ff5ffb6a4534a3f7}.
|
1270 |
|
|
All eight families of solutions will be revealed some day in the fourth edition
|
1271 |
|
|
of {\sl Seminumerical Algorithms}.
|
1272 |
|
|
|
1273 |
|
|
\bull\
|
1274 |
|
|
@.FSUB@>
|
1275 |
|
|
This instruction is equivalent to \.{FADD}, but with the sign of~\$Z negated
|
1276 |
|
|
unless \$Z is a~NaN.
|
1277 |
|
|
|
1278 |
|
|
\bull\
|
1279 |
|
|
@.FMUL@>
|
1280 |
|
|
The floating point product $\rY\times\rZ$ is computed by
|
1281 |
|
|
the standard floating point conventions, and placed in register~X\null.
|
1282 |
|
|
An invalid exception occurs if
|
1283 |
|
|
the product is $(\pm0.0)\times(\pm\infty)$ or $(\pm\infty)\times(\pm0.0)$;
|
1284 |
|
|
in that case the result is $\pm\NaN(1/2)$. No exception occurs for the
|
1285 |
|
|
product $(\pm\infty)\times(\pm\infty)$. If neither \$Y nor~\$Z is a NaN,
|
1286 |
|
|
the sign of the result is the product of the signs of \$Y and~\$Z\null.
|
1287 |
|
|
|
1288 |
|
|
\bull\
|
1289 |
|
|
@.FDIV@>
|
1290 |
|
|
The floating point quotient $\rY\?/\rZ$ is computed by
|
1291 |
|
|
the standard floating point conventions, and placed in \$X\null.
|
1292 |
|
|
@^standard floating point conventions@>
|
1293 |
|
|
A floating divide by zero exception occurs if the
|
1294 |
|
|
quotient is $(\hbox{normal or subnormal})/(\pm0.0)$. An invalid exception occurs if
|
1295 |
|
|
the quotient is $(\pm0.0)/(\pm0.0)$ or $(\pm\infty)/(\pm\infty)$; in that case the
|
1296 |
|
|
result is $\pm\NaN(1/2)$. No exception occurs for the
|
1297 |
|
|
quotient $(\pm\infty)/(\pm0.0)$. If neither \$Y nor~\$Z is a NaN,
|
1298 |
|
|
the sign of the result is the product of the signs of \$Y and~\$Z\null.
|
1299 |
|
|
|
1300 |
|
|
If a floating point number in register X is known to have an exponent between
|
1301 |
|
|
2 and~2046, the instruction \
|
1302 |
|
|
|
1303 |
|
|
\bull\
|
1304 |
|
|
@.FREM@>
|
1305 |
|
|
The floating point remainder $\rY\,{\rm rem}\,\rZ$ is computed by
|
1306 |
|
|
the standard floating point conventions, and placed in register~X\null.
|
1307 |
|
|
(The IEEE standard defines the remainder to be $\rY-n\times\rZ$,
|
1308 |
|
|
where $n$ is the nearest integer to $\rY/\rZ$, and $n$ is an even
|
1309 |
|
|
integer in case of ties. This is not the same as the remainder
|
1310 |
|
|
$\rY\bmod\rZ$ computed by \.{DIV} or \.{DIVU}.)
|
1311 |
|
|
A zero remainder has the sign of~\$Y\null.
|
1312 |
|
|
An invalid exception occurs if \$Y is infinite and/or \$Z is zero; in
|
1313 |
|
|
that case the result is $\NaN(1/2)$ with the sign of~\$Y\null.
|
1314 |
|
|
|
1315 |
|
|
\bull\
|
1316 |
|
|
@.FSQRT@>
|
1317 |
|
|
The floating point square root $\sqrt\rZ$ is computed by the
|
1318 |
|
|
standard floating point conventions, and placed in register~X\null. An
|
1319 |
|
|
invalid exception occurs if \$Z is a negative number (either infinite, normal,
|
1320 |
|
|
or subnormal); in that case the result is $-\NaN(1/2)$. No exception occurs
|
1321 |
|
|
when taking the square root of $-0.0$ or $+\infty$. In all cases the sign of
|
1322 |
|
|
the result is the sign of~\$Z\null.
|
1323 |
|
|
|
1324 |
|
|
\bull\
|
1325 |
|
|
@.FINT@>
|
1326 |
|
|
The floating point number in register~Z is rounded (if
|
1327 |
|
|
necessary) to a floating point integer, using the current
|
1328 |
|
|
rounding mode, and placed in register~X\null. Infinite values and quiet NaNs
|
1329 |
|
|
are not changed; signaling NaNs are treated as in the standard conventions.
|
1330 |
|
|
Floating point overflow and underflow exceptions cannot occur.
|
1331 |
|
|
|
1332 |
|
|
The Y field of \.{FSQRT} and \.{FINT} can be used to specify a
|
1333 |
|
|
special rounding mode, as explained below.
|
1334 |
|
|
|
1335 |
|
|
@ Besides doing arithmetic, we need to compare floating point numbers
|
1336 |
|
|
with each other, taking proper account of NaNs and the fact that $-0.0$
|
1337 |
|
|
should be considered equal to $+0.0$. The following instructions are
|
1338 |
|
|
analogous to the comparison operators \.{CMP} and \.{CMPU} that we
|
1339 |
|
|
have used for integers.
|
1340 |
|
|
@^minus zero@>
|
1341 |
|
|
|
1342 |
|
|
\bull\
|
1343 |
|
|
@.FCMP@>
|
1344 |
|
|
Register X is set to $-1$ if $\rY<\rZ$ according to the conventions of
|
1345 |
|
|
floating point arithmetic, or to~1 if $\rY>\rZ$ according to those
|
1346 |
|
|
conventions. Otherwise it is set to~0. An invalid exception
|
1347 |
|
|
occurs if either \$Y or \$Z is a NaN; in such cases the result is zero.
|
1348 |
|
|
|
1349 |
|
|
\bull\
|
1350 |
|
|
@.FEQL@>
|
1351 |
|
|
Register X is set to 1 if $\rY=\rZ$ according to the conventions of
|
1352 |
|
|
floating point arithmetic. Otherwise it is set to~0. The result is zero if
|
1353 |
|
|
either \$Y or \$Z is a NaN, even if a NaN is being compared with itself.
|
1354 |
|
|
However, no invalid exception occurs, not even when \$Y or \$Z is a signaling
|
1355 |
|
|
NaN\null. (Perhaps \MMIX\ differs slightly from the IEEE standard in this
|
1356 |
|
|
regard, but programmers sometimes need to look at signaling NaNs without
|
1357 |
|
|
encountering side effects.
|
1358 |
|
|
Programmers who insist on raising
|
1359 |
|
|
an invalid exception whenever a signaling NaN is compared for floating equality
|
1360 |
|
|
should issue the instructions \
|
1361 |
|
|
\.{FEQL}~\.{\$X,\$Y,\$Z}.)
|
1362 |
|
|
|
1363 |
|
|
Suppose $w$, $x$, $y$, and $z$ are unsigned 64-bit integers with
|
1364 |
|
|
$w
|
1365 |
|
|
while the leftmost bits of $y$ and~$z$ are~1.
|
1366 |
|
|
Then we have $w
|
1367 |
|
|
as unsigned integers, but $y
|
1368 |
|
|
integers, because $y$ and~$z$ are negative. Furthermore, we have
|
1369 |
|
|
$z
|
1370 |
|
|
floating point numbers, assuming that no NaNs are present,
|
1371 |
|
|
because the leftmost bit of a floating point
|
1372 |
|
|
number represents its sign and the remaining bits represent its magnitude.
|
1373 |
|
|
The case $y=w$ occurs in floating point comparison
|
1374 |
|
|
if and only if $y$ is the representation of $-0.0$
|
1375 |
|
|
and $w$ is the representation of $+0.0$.
|
1376 |
|
|
|
1377 |
|
|
\bull\
|
1378 |
|
|
@.FUN@>
|
1379 |
|
|
Register X is set to 1 if \$Y and \$Z are unordered according to the conventions
|
1380 |
|
|
of floating point arithmetic (namely, if either one is a NaN); otherwise
|
1381 |
|
|
register~X is set to~0. No invalid exception occurs, not even when \$Y or \$Z is
|
1382 |
|
|
a signaling NaN\null.
|
1383 |
|
|
|
1384 |
|
|
\smallskip
|
1385 |
|
|
The IEEE standard discusses 26 different possible
|
1386 |
|
|
relations on floating point numbers;
|
1387 |
|
|
\MMIX\ implements 14 of them with single instructions, followed by
|
1388 |
|
|
a branch (or by a \.{ZS} to make a ``pure'' 0~or~1 result); all 26
|
1389 |
|
|
can be evaluated with a sequence of at most four \MMIX\ commands
|
1390 |
|
|
and a subsequent branch. The
|
1391 |
|
|
hardest case to handle is `?$>=$' (unordered or greater or equal,
|
1392 |
|
|
to be computed without exceptions), for which the following
|
1393 |
|
|
sequence makes $\rX\ge0$ if and only if $\rY\mathrel?>=\rZ$:
|
1394 |
|
|
$$\vbox{\halign{&\tt#\hfil\ \cr
|
1395 |
|
|
&FUN &\$255,\$Y,\$Z\cr
|
1396 |
|
|
&BP &\$255,1F&\% skip ahead if unordered\cr
|
1397 |
|
|
&FCMP&\$X,\$Y,\$Z&\% \$X=[\$Y>\$Z]-[\$Y<\$Z]; no exceptions will arise\cr
|
1398 |
|
|
1H&CSNZ &\$X,\$255,1&\% \$X=1 if unordered\cr
|
1399 |
|
|
}}$$
|
1400 |
|
|
|
1401 |
|
|
@ Exercise: Suppose \MMIX\ had no \.{FINT} instruction. Explain how to
|
1402 |
|
|
@.FINT@>
|
1403 |
|
|
obtain the equivalent of \
|
1404 |
|
|
program should do the proper thing with respect to NaNs and exceptions.
|
1405 |
|
|
(For example, it should cause an invalid exception if and only if \$Z is
|
1406 |
|
|
a signaling NaN; it should cause an inexact exception only if \$Z needs
|
1407 |
|
|
to be rounded to another value.)
|
1408 |
|
|
@^emulation@>
|
1409 |
|
|
|
1410 |
|
|
Answer: (The assembler prefixes hexadecimal constants by \.\#.)
|
1411 |
|
|
$$\vbox{\halign{&\tt#\hfil\ \cr
|
1412 |
|
|
&SETH &\$0,\char`\#4330&\% \$0=2\char`\^52\cr
|
1413 |
|
|
&SET &\$1,\$Z&\% \$1=\$Z\cr
|
1414 |
|
|
&ANDNH &\$1,\char`\#8000&\% \$1=abs(\$Z)\cr
|
1415 |
|
|
&ANDN &\$2,\$Z,\$1&\% \$2=signbit(\$Z)\cr
|
1416 |
|
|
&FUN &\$3,\$Z,\$Z&\% \$3=[\$Z is a NaN]\cr
|
1417 |
|
|
&BNZ &\$3,1F&\% skip ahead if \$Z is a NaN\cr
|
1418 |
|
|
&FCMP &\$3,\$1,\$0&\% \$3=[abs(\$Z)>2\char`\^52]-[abs(\$Z)<2\char`\^52]\cr
|
1419 |
|
|
&CSNN &\$0,\$3,0&\% set \$0=0 if \$3>=0\cr
|
1420 |
|
|
&OR &\$0,\$2,\$0&\% attach sign of \$Z to \$0\cr
|
1421 |
|
|
1H\ &FADD &\$1,\$Z,\$0&\% \$1=\$Z+\$0\cr
|
1422 |
|
|
&FSUB &\$X,\$1,\$0&\% \$X=\$1-\$0\cr}}$$
|
1423 |
|
|
This program handles most cases of interest by adding and subtracting
|
1424 |
|
|
$\pm2^{52}$ using floating point arithmetic.
|
1425 |
|
|
It would be incorrect to do this in all cases;
|
1426 |
|
|
for example, such addition/subtraction might fail to give the correct
|
1427 |
|
|
answer when \$Z is a small negative
|
1428 |
|
|
quantity (if rounding toward zero), or when \$Z is a number like
|
1429 |
|
|
$2^{105}+2^{53}$ (if rounding to nearest).
|
1430 |
|
|
|
1431 |
|
|
@ \MMIX\ goes beyond the IEEE standard to define additional relations
|
1432 |
|
|
between floating point numbers, as suggested by the theory in
|
1433 |
|
|
Section 4.2.2 of {\sl Seminumerical Algorithms}. Given a nonnegative
|
1434 |
|
|
number~$\epsilon$, each normal floating point number $u=(f,e)$ has
|
1435 |
|
|
a {\it neighborhood\/}
|
1436 |
|
|
$$N_\epsilon(u)=\{x\,\mid\,\vert x-u\vert\le 2^{e-1022}\epsilon\};$$
|
1437 |
|
|
we also define $N_\epsilon(0)=\{0\}$,
|
1438 |
|
|
$N_\epsilon(u)=\{x\mid\vert x-u\vert\le2^{-1021}\epsilon\}$ if $u$ is
|
1439 |
|
|
subnormal; $N_\epsilon(\pm\infty)=\{\pm\infty\}$ if $\epsilon<1$,
|
1440 |
|
|
$N_\epsilon(\pm\infty)=\{$everything except $\mp\infty\}$ if $1\le\epsilon<2$,
|
1441 |
|
|
$N_\epsilon(\pm\infty)=\{$everything$\}$ if $\epsilon\ge2$. Then we write
|
1442 |
|
|
$$\vbox{\halign{$u#v\ (\epsilon)$, \hfil\cr
|
1443 |
|
|
\prec&if $u
|
1444 |
|
|
\sim&if $u\in N_\epsilon(v)$ or $v\in N_\epsilon(u)$;\cr
|
1445 |
|
|
\approx&if $u\in N_\epsilon(v)$ and $v\in N_\epsilon(u)$;\cr
|
1446 |
|
|
\succ&if $u>N_\epsilon(v)$ and $N_\epsilon(u)>v$.\cr}}$$
|
1447 |
|
|
|
1448 |
|
|
\def\rE{{\rm rE}}
|
1449 |
|
|
\bull\
|
1450 |
|
|
@.FCMPE@>
|
1451 |
|
|
Register X is set to $-1$ if $\rY\prec\rZ\ \ (\rE)$ according to the
|
1452 |
|
|
conventions of {\sl Seminumerical Algorithms} as stated above; it is set to~1
|
1453 |
|
|
if $\rY\succ\rZ\ \ (\rE)$ according to those conventions; otherwise
|
1454 |
|
|
it is set to~0. Here rE is a floating point number in
|
1455 |
|
|
@^rE@>
|
1456 |
|
|
the special {\it epsilon register\/}, which is used only by the
|
1457 |
|
|
floating point comparison operations \.{FCMPE}, \.{FEQLE}, and \.{FUNE}.
|
1458 |
|
|
An invalid exception occurs, and the result is zero,
|
1459 |
|
|
if any of \$Y, \$Z, or rE are NaN, or if rE is negative.
|
1460 |
|
|
If no such exception occurs, exactly one of the three conditions
|
1461 |
|
|
$\rY\prec\rZ$, $\rY\sim\rZ$, $\rY\succ\rZ$ holds with respect to~rE.
|
1462 |
|
|
|
1463 |
|
|
\bull\
|
1464 |
|
|
@.FEQLE@>
|
1465 |
|
|
Register X is set to 1 if $\rY\approx\rZ\ \ (\rE)$ according to the
|
1466 |
|
|
conventions of {\sl Seminumerical Algorithms\/} as stated above; otherwise
|
1467 |
|
|
it is set to~0.
|
1468 |
|
|
An invalid exception occurs, and the result is zero,
|
1469 |
|
|
if any of \$Y, \$Z, or rE are NaN, or if rE is negative.
|
1470 |
|
|
Notice that the relation $\rY\approx\rZ$ computed by \.{FEQLE} is
|
1471 |
|
|
stronger than the relation $\rY\sim\rZ$ computed by \.{FCMPE}.
|
1472 |
|
|
|
1473 |
|
|
\bull\
|
1474 |
|
|
@.FUNE@>
|
1475 |
|
|
Register X is set to 1 if
|
1476 |
|
|
\$Y, \$Z, or~rE are exceptional as discussed for \.{FCMPE} and \.{FEQLE};
|
1477 |
|
|
otherwise it is set to~0. No exceptions occur, even if \$Y, \$Z, or~rE is
|
1478 |
|
|
a signaling NaN.
|
1479 |
|
|
|
1480 |
|
|
\smallskip\noindent
|
1481 |
|
|
Exercise: What floating point numbers does \.{FCMPE} regard
|
1482 |
|
|
as $\sim0.0$ with respect to
|
1483 |
|
|
$\epsilon=1/2$, when no exceptions arise? \ Answer: Zero, subnormal
|
1484 |
|
|
numbers, and normal numbers with $f=0$.
|
1485 |
|
|
(The numbers similar to zero with respect to~$\epsilon$ are zero,
|
1486 |
|
|
subnormal numbers with $f\le2\epsilon$, normal numbers with $f\le2\epsilon-1$,
|
1487 |
|
|
and $\pm\infty$ if $\epsilon>=1$.)
|
1488 |
|
|
|
1489 |
|
|
@ The IEEE standard also defines 32-bit floating point quantities, which
|
1490 |
|
|
it calls ``single format'' numbers. \MMIX\ calls them {\it short floats},
|
1491 |
|
|
@^short float@>
|
1492 |
|
|
and converts between 32-bit and 64-bit forms when such numbers are
|
1493 |
|
|
loaded from memory or stored into memory. A short float consists of a sign
|
1494 |
|
|
bit followed by an 8-bit exponent and a 23-bit fraction. After it has
|
1495 |
|
|
been loaded into one of\/ \MMIX's registers, its 52-bit fraction part
|
1496 |
|
|
will have 29 trailing zero bits, and its exponent~$e$ will be one of the
|
1497 |
|
|
256 values 0, $(01110000001)_2=897$, $(01110000010)_2=898$, \dots,
|
1498 |
|
|
$(10001111110)_2=1150$, or~2047, unless it was subnormal; a subnormal
|
1499 |
|
|
short float loads into a normal number with $874\le e\le896$.
|
1500 |
|
|
|
1501 |
|
|
\bull\
|
1502 |
|
|
@.LDSF@>
|
1503 |
|
|
Register~X is set to the 64-bit floating point number corresponding
|
1504 |
|
|
to the 32-bit floating point number represented by
|
1505 |
|
|
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
|
1506 |
|
|
No arithmetic exceptions occur, not even if a signaling NaN is loaded.
|
1507 |
|
|
|
1508 |
|
|
\bull\
|
1509 |
|
|
@.STSF@>
|
1510 |
|
|
The value obtained by rounding register~X to a 32-bit floating
|
1511 |
|
|
point number is placed in $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
|
1512 |
|
|
Rounding is done with the current rounding mode, in a manner
|
1513 |
|
|
exactly analogous to the standard conventions for rounding 64-bit results,
|
1514 |
|
|
except that the precision and exponent range are limited. In particular,
|
1515 |
|
|
floating overflow, underflow, and inexact exceptions might occur;
|
1516 |
|
|
a signaling NaN will trigger an invalid exception and it will become quiet.
|
1517 |
|
|
The fraction part of a NaN is truncated if necessary to a multiple of
|
1518 |
|
|
$2^{-23}$, by ignoring the least significant 29 bits.
|
1519 |
|
|
|
1520 |
|
|
If we load any two short floats and operate on them once with either \.{FADD},
|
1521 |
|
|
\.{FSUB}, \.{FMUL}, \.{FDIV}, \.{FREM}, \.{FSQRT}, or \.{FINT}, and if we then
|
1522 |
|
|
store the result as a short float, we obtain the results required by
|
1523 |
|
|
the IEEE standard for single format arithmetic, because
|
1524 |
|
|
the double format can be shown to have enough precision to avoid any
|
1525 |
|
|
problems of ``double rounding.'' But programmers are usually better
|
1526 |
|
|
off sticking
|
1527 |
|
|
to 64-bit arithmetic unless they have a strong reason to emulate the
|
1528 |
|
|
precise behavior of a 32-bit computer; 32 bits do not offer
|
1529 |
|
|
much precision.
|
1530 |
|
|
|
1531 |
|
|
@ Of course we need to be able to go back and forth between integers and
|
1532 |
|
|
floating point values.
|
1533 |
|
|
|
1534 |
|
|
\bull\
|
1535 |
|
|
@.FIX@>
|
1536 |
|
|
The floating point number in register~Z is converted to an integer
|
1537 |
|
|
as with the \.{FINT} instruction, and the resulting integer (mod~$2^{64}$)
|
1538 |
|
|
is placed in register~X\null.
|
1539 |
|
|
An invalid exception occurs if \$Z is infinite
|
1540 |
|
|
or a NaN; in that case \$X is simply set equal to~\$Z\null. A float-to-fix
|
1541 |
|
|
exception occurs if the result is less than
|
1542 |
|
|
@^float-to-fix exception@>
|
1543 |
|
|
@^short float@>
|
1544 |
|
|
$-2^{63}$ or greater than $2^{63}-1$.
|
1545 |
|
|
|
1546 |
|
|
\bull\
|
1547 |
|
|
@.FIXU@>
|
1548 |
|
|
This instruction is identical to \.{FIX} except that no float-to-fix
|
1549 |
|
|
exception occurs.
|
1550 |
|
|
|
1551 |
|
|
\bull\
|
1552 |
|
|
@.FLOT@>
|
1553 |
|
|
The integer in \$Z or the immediate constant~Z is
|
1554 |
|
|
converted to the nearest floating point value (using the current rounding
|
1555 |
|
|
mode) and placed in register~X\null. A floating inexact exception
|
1556 |
|
|
occurs if rounding is necessary.
|
1557 |
|
|
|
1558 |
|
|
\bull\
|
1559 |
|
|
@.FLOTU@>
|
1560 |
|
|
\.{FLOTU} is like \.{FLOT}, but \$Z is treated as an unsigned integer.
|
1561 |
|
|
|
1562 |
|
|
\bull\
|
1563 |
|
|
\
|
1564 |
|
|
@.SFLOT@>
|
1565 |
|
|
@.SFLOTU@>
|
1566 |
|
|
The \.{SFLOT} instructions are like the \.{FLOT} instructions, except that
|
1567 |
|
|
they round to a floating point number whose fraction part is a multiple
|
1568 |
|
|
of $2^{-23}$. (Thus, the resulting value will not be changed by a ``store
|
1569 |
|
|
short float'' instruction.) Such conversions appear in \MMIX's repertoire only
|
1570 |
|
|
to establish complete conformance with the IEEE standard; a programmer
|
1571 |
|
|
needs them only when emulating a 32-bit machine.
|
1572 |
|
|
@^emulation@>
|
1573 |
|
|
|
1574 |
|
|
@ Since the variants of \.{FIX} and \.{FLOT} involve only one input operand (\$Z
|
1575 |
|
|
or~Z), their Y~field is normally zero. A programmer can, however, force the
|
1576 |
|
|
mode of rounding used with these commands by setting
|
1577 |
|
|
$$\vbox{\halign{$\yy=#$,\quad &\.{ROUND\_#};\hfil\cr
|
1578 |
|
|
1&OFF\cr
|
1579 |
|
|
2&UP\cr
|
1580 |
|
|
3&DOWN\cr
|
1581 |
|
|
4&NEAR\cr}}$$
|
1582 |
|
|
for example, the instruction \
|
1583 |
|
|
exponent~$e$ of register~X to $1086-l$ if \$Z is a nonzero quantity with
|
1584 |
|
|
$l$~leading zero bits. Thus we can count leading zeros by continuing
|
1585 |
|
|
with \.{SETL}~\.{\$0,1086}; \.{SR}~\.{\$X,\$X,52}; \.{SUB}~\.{\$X,\$0,\$X};
|
1586 |
|
|
\.{CSZ}~\.{\$X,\$Z,64}.
|
1587 |
|
|
@^counting leading zeros@>
|
1588 |
|
|
@.FLOT@>
|
1589 |
|
|
@.FLOTU@>
|
1590 |
|
|
@.SFLOT@>
|
1591 |
|
|
@.SFLOTU@>
|
1592 |
|
|
@.FIX@>
|
1593 |
|
|
@.FIXU@>
|
1594 |
|
|
@:ROUND_OFF}\.{ROUND\_OFF@>
|
1595 |
|
|
@:ROUND_UP}\.{ROUND\_UP@>
|
1596 |
|
|
@:ROUND_DOWN}\.{ROUND\_DOWN@>
|
1597 |
|
|
@:ROUND_NEAR}\.{ROUND\_NEAR@>
|
1598 |
|
|
|
1599 |
|
|
The Y field can also be used in the same way
|
1600 |
|
|
to specify any desired rounding mode in the other
|
1601 |
|
|
floating point instructions that have only a single operand, namely
|
1602 |
|
|
\.{FSQRT} and~\.{FINT}.
|
1603 |
|
|
@.FSQRT@>
|
1604 |
|
|
@.FINT@>
|
1605 |
|
|
An illegal instruction interrupt occurs if Y exceeds~4 in any of these
|
1606 |
|
|
commands.
|
1607 |
|
|
@^illegal instructions@>
|
1608 |
|
|
|
1609 |
|
|
@* Subroutine linkage.
|
1610 |
|
|
\MMIX\ has several special operations designed to facilitate the process of
|
1611 |
|
|
calling and implementing subroutines. The key notion is the idea of a
|
1612 |
|
|
hardware-supported {\it register stack}, which can coexist with a
|
1613 |
|
|
software-supported stack of variables that are not maintained in registers.
|
1614 |
|
|
From a programmer's standpoint, \MMIX\ maintains a potentially unbounded list
|
1615 |
|
|
$S[0]$, $S[1]$, \dots,~$S[\tau-1]$ of octabytes holding the contents
|
1616 |
|
|
of registers that are temporarily inaccessible; initially $\tau=0$.
|
1617 |
|
|
When a subroutine is entered, registers can be ``pushed'' on to the end of
|
1618 |
|
|
this list, increasing~$\tau$; when the subroutine has finished its
|
1619 |
|
|
execution, the registers are ``popped'' off again and $\tau$~decreases.
|
1620 |
|
|
|
1621 |
|
|
Our discussion so far has treated all 256 registers \$0, \$1, \dots,~\$255 as if
|
1622 |
|
|
they were alike. But in fact, \MMIX\ maintains two internal one-byte counters
|
1623 |
|
|
$L$ and~$G$, where $0\le\ll\le\gg<256$, with the property that
|
1624 |
|
|
$$\vbox{\halign{#\hfil\cr
|
1625 |
|
|
registers 0, 1, \dots, $\ll-1$ are ``local'';\cr
|
1626 |
|
|
registers @!|L|, $\ll+1$, \dots, $\gg-1$ are ``marginal'';\cr
|
1627 |
|
|
registers @!|G|, $\gg+1$, \dots, 255 are ``global.''\cr}}$$
|
1628 |
|
|
A marginal register is zero when its value is read.
|
1629 |
|
|
@^illegal instructions@>
|
1630 |
|
|
@^rG@>
|
1631 |
|
|
@^rL@>
|
1632 |
|
|
@^local registers@>
|
1633 |
|
|
@^marginal registers@>
|
1634 |
|
|
@^global registers@>
|
1635 |
|
|
@^register stack@>
|
1636 |
|
|
|
1637 |
|
|
The $G$ counter is normally set to a fixed value once and for all when a program
|
1638 |
|
|
is loaded, thereby defining the number of program variables that will live
|
1639 |
|
|
entirely in registers rather than in memory during the course of execution.
|
1640 |
|
|
A programmer may, however, change~$G$ dynamically using the \.{PUT}
|
1641 |
|
|
instruction described below.
|
1642 |
|
|
|
1643 |
|
|
The $L$ counter starts at 0. If an instruction places a value into a register
|
1644 |
|
|
that is currently marginal, namely a register $x$ such that
|
1645 |
|
|
$\ll\le x<\gg$, the value of~$L$ will increase to $x+1$, and any
|
1646 |
|
|
newly local registers will be zero. For example, if $\ll=10$ and
|
1647 |
|
|
$\gg=200$, the instruction \
|
1648 |
|
|
instruction \
|
1649 |
|
|
\$15 to $\$5+\$200$, and $L$~to~16. (The process of clearing registers and
|
1650 |
|
|
increasing~$L$ might take quite a few machine cycles in the worst case. We will
|
1651 |
|
|
see later that \MMIX\ is able to take care of any high-priority interrupts
|
1652 |
|
|
that might occur during this time.)
|
1653 |
|
|
|
1654 |
|
|
\bull\
|
1655 |
|
|
\bul\
|
1656 |
|
|
@.PUSHGO@>
|
1657 |
|
|
@.PUSHJ@>
|
1658 |
|
|
Suppose first that $\xx<\ll$.
|
1659 |
|
|
Register~X is set equal to the number~X, then
|
1660 |
|
|
registers 0, 1, \dots,~X are pushed onto the register stack as
|
1661 |
|
|
described below.
|
1662 |
|
|
If this instruction is in
|
1663 |
|
|
location $\lambda$, the value $\lambda+4$ is placed into the special {\it
|
1664 |
|
|
return-jump register\/}~rJ\null. Then control jumps to instruction
|
1665 |
|
|
@^rJ@>
|
1666 |
|
|
$\lambda+4\rm YZ$ or $\lambda+4\rm YZ-262144$ or
|
1667 |
|
|
$\rY+\rZ$ or $\rY+\zz$, as in a
|
1668 |
|
|
\.{JMP} or \.{GO} command.
|
1669 |
|
|
|
1670 |
|
|
Pushing the first $\xx+1$ registers onto the stack means essentially that we
|
1671 |
|
|
set $S[\tau]\gets\$0$, $S[\tau+1]\gets\$1$, \dots, $S[\tau+\xx]\gets\$\xx$,
|
1672 |
|
|
$\tau\gets\tau+\xx+1$, $\$0\gets\$(\xx+1)$, \dots,
|
1673 |
|
|
$\$(\ll-\xx-2)\gets\$(\ll-1)$, $\ll\gets\ll-\xx-1$. For example, if
|
1674 |
|
|
$\xx=1$ and $\ll=5$, the current contents of \$0 and the number~1 are
|
1675 |
|
|
placed on the register stack, where they will be temporarily inaccessible.
|
1676 |
|
|
Then control jumps to a subroutine with $L$ reduced to~3; the registers that we
|
1677 |
|
|
had been calling \$2, \$3, and \$4 appear as \$0, \$1, and \$2 to the subroutine.
|
1678 |
|
|
|
1679 |
|
|
If $\ll\le\xx<\gg$, the value of $\ll$ increases to $\xx+1$ as described
|
1680 |
|
|
above; then the rules for $\xx<\ll$ apply.
|
1681 |
|
|
|
1682 |
|
|
If $\xx\ge\gg$ the actions are similar, except that {\it all\/} of the local
|
1683 |
|
|
registers \$0, \dots,~$\$(\ll-1)$ are placed on the register stack
|
1684 |
|
|
followed by the number~$L$, and $L$~is reset to zero. In particular, the
|
1685 |
|
|
instruction \
|
1686 |
|
|
onto the stack and sets $L$ to zero, regardless of the previous value of~$L$.
|
1687 |
|
|
|
1688 |
|
|
We will see later that \MMIX\ is able to achieve the effect of pushing and
|
1689 |
|
|
renaming local registers without actually doing very much work at all.
|
1690 |
|
|
|
1691 |
|
|
\bull\
|
1692 |
|
|
@.POP@>
|
1693 |
|
|
This command preserves X of the current local registers,
|
1694 |
|
|
undoes the effect of the most recent \.{PUSHJ} or \.{PUSHGO}, and jumps
|
1695 |
|
|
to the instruction in $\mm_4[{\rm4YZ+rJ}]$. If $\xx>0$, the value of
|
1696 |
|
|
$\$(\xx-1)$ goes into the ``hole'' position where \.{PUSHJ} or
|
1697 |
|
|
\.{PUSHGO} stored the number of registers previously pushed.
|
1698 |
|
|
|
1699 |
|
|
The formal details of \.{POP} are slightly complicated, but we will see that
|
1700 |
|
|
they make sense: If $\xx>\ll$, we first replace X by $\ll+1$. Then we
|
1701 |
|
|
set $x\gets S[\tau-1]\bmod 256$; this is the effective value of the X~field
|
1702 |
|
|
in the push instruction that is being undone. Stack position $S[\tau-1]$ is
|
1703 |
|
|
now set to $\$(\xx-1)$ if $0<\xx\le L$, otherwise it is set to zero.
|
1704 |
|
|
Then we essentially set
|
1705 |
|
|
$\ll\gets\min(x+\xx,\gg)$, $\$(\ll-1)\gets\$(\ll-x-2)$, \dots,
|
1706 |
|
|
$\$(x+1)\gets\$0$, $\$x\gets S[\tau-1]$, \dots,
|
1707 |
|
|
$\$0\gets S[\tau-x-1]$, $\tau\gets\tau-x-1$. The operating system should
|
1708 |
|
|
@^operating system@>
|
1709 |
|
|
arrange things so that a memory-protection
|
1710 |
|
|
interrupt will occur if a program does more pops than pushes.
|
1711 |
|
|
(If $x>\gg$, these formulas don't make sense as written; we actually
|
1712 |
|
|
set $\$j\gets S[\tau-x-1+j]$ for $\ll>j\ge0$ in that rare case.)
|
1713 |
|
|
|
1714 |
|
|
Suppose, for example, that a subroutine has three input parameters
|
1715 |
|
|
$(\$0,\$1,\$2)$ and produces two outputs $(\$0,\$1)$. If the subroutine does
|
1716 |
|
|
not call any other subroutines, it can simply end with \.{POP} \.{2,0},
|
1717 |
|
|
because rJ will contain the return address. Otherwise it should begin by
|
1718 |
|
|
saving rJ, for example with the instruction \
|
1719 |
|
|
using local registers \$0 through~\$3, and it should use \
|
1720 |
|
|
\
|
1721 |
|
|
calling sub-subroutines; finally it should \
|
1722 |
|
|
saying \.{POP}~\.{2,0}. To call the subroutine from another routine that
|
1723 |
|
|
has, say, 6~local registers, we would put the input arguments into \$7, \$8,
|
1724 |
|
|
and~\$9, then issue the command \.{PUSHGO} \.{\$6,base,Subr};
|
1725 |
|
|
in due time the outputs of the subroutine will appear in \$7 and~\$6.
|
1726 |
|
|
|
1727 |
|
|
Notice that the push and pop commands make use of a one-place ``hole'' in the
|
1728 |
|
|
register stack, between the registers that are pushed down and the registers
|
1729 |
|
|
that remain local. (The hole is position \$6 in the example just considered.)
|
1730 |
|
|
\MMIX\ needs this hole position to remember the number of
|
1731 |
|
|
registers that are pushed down.
|
1732 |
|
|
A subroutine with no outputs ends with \
|
1733 |
|
|
(becomes marginal). A subroutine with one output~\$0 ends with \
|
1734 |
|
|
the hole gets the former value of~\$0. A subroutine with two outputs
|
1735 |
|
|
$(\$0,\$1)$ ends with \
|
1736 |
|
|
this case, therefore, the relative order of the two outputs has been switched
|
1737 |
|
|
on the register stack. If a subroutine has, say, five outputs
|
1738 |
|
|
$(\$0,\ldots,\$4)$, it ends with \
|
1739 |
|
|
where it is followed by $(\$0,\$1,\$2,\$3)$.
|
1740 |
|
|
\MMIX\ makes this curious permutation in the case of multiple outputs because
|
1741 |
|
|
the hole is most easily plugged by moving one value down (namely~\$4) instead
|
1742 |
|
|
of by sliding each of five values down in the stack.
|
1743 |
|
|
|
1744 |
|
|
These conventions for parameter passing are admittedly a bit confusing in the
|
1745 |
|
|
general case, and I~suppose people who use them extensively might someday find
|
1746 |
|
|
themselves talking about ``the infamous \MMIX\ register shuffle.'' However,
|
1747 |
|
|
there is good use for subroutines that convert
|
1748 |
|
|
a sequence of register contents like $(x,a,b,c)$ into $(f,a,b,c)$ where
|
1749 |
|
|
$f$ is a function of $a$, $b$, and $c$ but not~$x$. Moreover,
|
1750 |
|
|
\.{PUSHGO} and \.{POP} can be implemented with great efficiency,
|
1751 |
|
|
and subroutine linkage tends to be a significant bottleneck when
|
1752 |
|
|
other conventions are used.
|
1753 |
|
|
|
1754 |
|
|
Information about a subroutine's calling conventions needs to be communicated
|
1755 |
|
|
to a debugger. That can readily be done at the same time as we inform the
|
1756 |
|
|
debugger about the symbolic names of addresses in memory.
|
1757 |
|
|
|
1758 |
|
|
A subroutine that uses 50 local registers will not function properly if it is
|
1759 |
|
|
called by a program that sets $G$ less than~50. \MMIX\ does not allow the
|
1760 |
|
|
value of~$G$ to become less than~32. Therefore any subroutine that avoids
|
1761 |
|
|
global registers and uses at most~32 local registers
|
1762 |
|
|
can be sure to work properly regardless of the current value of~$G$.
|
1763 |
|
|
|
1764 |
|
|
The rules stated above imply that a \.{PUSHJ} or
|
1765 |
|
|
\.{PUSHGO} instruction with $\xx=255$ pushes all of the currently defined
|
1766 |
|
|
local registers onto the stack and sets $L$ to~zero.
|
1767 |
|
|
This makes $G$ local registers available for use by the subroutine
|
1768 |
|
|
jumped~to. If that subroutine later returns with \.{POP} \.{0,0}, the former
|
1769 |
|
|
value of~$L$ and the former contents of \$0, \dots,~$\$(\ll-1)$ will be
|
1770 |
|
|
restored (assuming that $G$ doesn't decrease).
|
1771 |
|
|
|
1772 |
|
|
A \.{POP} instruction with $\xx=255$
|
1773 |
|
|
preserves all the local registers as outputs of
|
1774 |
|
|
the subroutine (provided that the total doesn't exceed~$G$ after popping),
|
1775 |
|
|
and puts zero into the hole (unless $L=G=255$). The best policy, however, is
|
1776 |
|
|
almost always to use \.{POP} with a small value of~X, and in general to keep
|
1777 |
|
|
the value of~$L$ as small as
|
1778 |
|
|
possible by decreasing it when registers are no longer active.
|
1779 |
|
|
A smaller value of~$L$ means that \MMIX\ can change context more
|
1780 |
|
|
easily when switching from one process to another.
|
1781 |
|
|
|
1782 |
|
|
@* System considerations.
|
1783 |
|
|
High-performance implementations of\/ \MMIX\ gain speed by keeping {\it
|
1784 |
|
|
caches\/} of instructions and data that are likely to be needed as computation
|
1785 |
|
|
@^caches@>
|
1786 |
|
|
proceeds. [See M.~V. Wilkes, {\sl IEEE Transactions\/ \bf EC-14} (1965),
|
1787 |
|
|
270--271; J.~S. Liptay, {\sl IBM System J. \bf7} (1968), 15--21.]
|
1788 |
|
|
@^Wilkes, Maurice Vincent@>
|
1789 |
|
|
@^Liptay, John S.@>
|
1790 |
|
|
Careful programmers can make the computer run even faster by giving
|
1791 |
|
|
hints about how to maintain such caches.
|
1792 |
|
|
|
1793 |
|
|
\bull\
|
1794 |
|
|
@.LDUNC@>
|
1795 |
|
|
These instructions, which have the same meaning as \.{LDO}, also
|
1796 |
|
|
inform the computer that the loaded octabyte (and its neighbors in a cache
|
1797 |
|
|
block) will probably not be read or written in the near future.
|
1798 |
|
|
|
1799 |
|
|
\bull\
|
1800 |
|
|
@.STUNC@>
|
1801 |
|
|
These instructions, which have the same meaning as \.{STO}, also
|
1802 |
|
|
inform the computer that the stored octabyte (and its neighbors in a cache
|
1803 |
|
|
block) will probably not be read or written in the near future.
|
1804 |
|
|
|
1805 |
|
|
\bull\
|
1806 |
|
|
@.PRELD@>
|
1807 |
|
|
These instructions have no effect on registers or memory, but they inform the
|
1808 |
|
|
computer that many of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
|
1809 |
|
|
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
|
1810 |
|
|
will probably be loaded and/or stored in the near future.
|
1811 |
|
|
No protection failure occurs if the memory is not accessible.
|
1812 |
|
|
|
1813 |
|
|
\bull\
|
1814 |
|
|
@.PREGO@>
|
1815 |
|
|
These instructions have no effect on registers or memory, but they inform the
|
1816 |
|
|
computer that many of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
|
1817 |
|
|
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
|
1818 |
|
|
will probably be used as instructions in the near future.
|
1819 |
|
|
No protection failure occurs if the memory is not accessible.
|
1820 |
|
|
|
1821 |
|
|
\bull\
|
1822 |
|
|
@.PREST@>
|
1823 |
|
|
These instructions have no effect on registers or memory if the computer has
|
1824 |
|
|
no data cache. But when such a cache exists, they inform the
|
1825 |
|
|
computer that all of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
|
1826 |
|
|
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
|
1827 |
|
|
will definitely be stored in the near future before they are loaded.
|
1828 |
|
|
(Therefore it is permissible for the machine to ignore the present contents of
|
1829 |
|
|
those bytes. Also, if those bytes are being shared by several processors,
|
1830 |
|
|
the current processor should try to acquire exclusive access.)
|
1831 |
|
|
No protection failure occurs if the memory is not accessible.
|
1832 |
|
|
|
1833 |
|
|
\bull\
|
1834 |
|
|
@.SYNCD@>
|
1835 |
|
|
When executed from nonnegative locations, these instructions have no effect on
|
1836 |
|
|
registers or memory if neither a write buffer nor a ``write back''
|
1837 |
|
|
data cache are present. But when such a buffer or cache exists, they force the
|
1838 |
|
|
computer to make sure that all data for the $\xx+1$ bytes
|
1839 |
|
|
$\mm[\rY+\rZ]$ through $\mm[\rY+\rZ+\xx]$, or
|
1840 |
|
|
$\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
|
1841 |
|
|
will be present in memory.
|
1842 |
|
|
(Otherwise the result of a previous store instruction might appear only
|
1843 |
|
|
in the cache; the computer is being told that now is the time to
|
1844 |
|
|
write the information back, if it hasn't already been written. A program
|
1845 |
|
|
can use this feature before outputting directly from memory.)
|
1846 |
|
|
No protection failure occurs if the memory is not accessible.
|
1847 |
|
|
|
1848 |
|
|
The action is similar when \.{SYNCD} is executed from a negative address,
|
1849 |
|
|
but in this case the specified bytes are also removed from the data
|
1850 |
|
|
cache (and from a secondary cache, if present). The operating system can
|
1851 |
|
|
use this feature when a page of virtual memory is being swapped out,
|
1852 |
|
|
or when data is input directly into memory.
|
1853 |
|
|
@^operating system@>
|
1854 |
|
|
|
1855 |
|
|
\bull\
|
1856 |
|
|
@.SYNCID@>
|
1857 |
|
|
When executed from nonnegative locations these instructions have no effect on
|
1858 |
|
|
registers or memory if the computer has no instruction cache separate from a
|
1859 |
|
|
data cache. But when such a cache exists, they force the
|
1860 |
|
|
computer to make sure that the $\xx+1$ bytes
|
1861 |
|
|
$\mm[\rY+\rZ]$ through $\mm[\rY+\rZ+\xx]$, or
|
1862 |
|
|
$\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
|
1863 |
|
|
will be interpreted correctly
|
1864 |
|
|
if used as instructions before they are next modified.
|
1865 |
|
|
(Generally speaking, an \MMIX\ program is not expected to store anything in
|
1866 |
|
|
memory locations that are also being used as instructions.
|
1867 |
|
|
Therefore \MMIX's instruction cache is allowed to become inconsistent with
|
1868 |
|
|
respect to its data cache. Programmers who insist on executing instructions
|
1869 |
|
|
that have been fabricated dynamically, for example when setting a breakpoint
|
1870 |
|
|
for debugging, must first \.{SYNCID} those instructions
|
1871 |
|
|
in order to guarantee that the intended results will be obtained.) A \.{SYNCID}
|
1872 |
|
|
command might be implemented in several ways; for example, the machine
|
1873 |
|
|
might update its instruction cache to agree with its data cache. A simpler
|
1874 |
|
|
solution, which is good enough because the need for \.{SYNCID} ought to
|
1875 |
|
|
be rare, removes instructions in the specified range
|
1876 |
|
|
from the instruction cache, if
|
1877 |
|
|
present, so that they will have to be fetched from memory the next time
|
1878 |
|
|
they are needed; in this case the machine also carries out the effect of
|
1879 |
|
|
a~\.{SYNCD} command.
|
1880 |
|
|
No protection failure occurs if the memory is not accessible.
|
1881 |
|
|
|
1882 |
|
|
The behavior is more drastic, but faster, when \.{SYNCID} is executed
|
1883 |
|
|
from a negative location. Then all bytes in the specified range are
|
1884 |
|
|
simply removed from all caches, and the memory corresponding to
|
1885 |
|
|
any ``dirty'' cache blocks involving such bytes is {\it not\/} brought up
|
1886 |
|
|
to date. An operating system can use this version of the command
|
1887 |
|
|
when pages of virtual memory are being discarded (for example, when
|
1888 |
|
|
a program is being terminated).
|
1889 |
|
|
|
1890 |
|
|
@ \MMIX\ is designed to work not only on a single processor but also
|
1891 |
|
|
in situations where several processors
|
1892 |
|
|
share a common memory. The following commands are useful
|
1893 |
|
|
for efficient operation in such circumstances.
|
1894 |
|
|
|
1895 |
|
|
\bull\
|
1896 |
|
|
@.CSWAP@>
|
1897 |
|
|
If the octabyte $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$
|
1898 |
|
|
is equal to the contents of the special {\it prediction register\/}~rP,
|
1899 |
|
|
@^rP@>
|
1900 |
|
|
it is replaced in memory with the contents of register~X, and
|
1901 |
|
|
register~X is set equal to~1. Otherwise the octabyte in memory
|
1902 |
|
|
replaces rP and register~X is set to zero.
|
1903 |
|
|
This is an atomic (indivisible, uninterruptible) operation,
|
1904 |
|
|
useful for interprocess communication
|
1905 |
|
|
when independent computers are sharing the same memory.
|
1906 |
|
|
|
1907 |
|
|
The compare-and-swap operation was introduced by IBM in late
|
1908 |
|
|
models of the
|
1909 |
|
|
@^IBM Corporation@>
|
1910 |
|
|
@^compare-and-swap@>
|
1911 |
|
|
@^atomic instruction@>
|
1912 |
|
|
System/370 architecture, and it soon spread to several
|
1913 |
|
|
@^System/370@>
|
1914 |
|
|
other machines. Significant ways to use it are discussed, for example,
|
1915 |
|
|
in section 7.2.3 of Harold Stone's
|
1916 |
|
|
{\sl High-Performance Computer Architecture\/} (Reading, Massachusetts:\
|
1917 |
|
|
Addison--Wesley, 1987), and in sections 8.2 and 8.3 of {\sl Transaction
|
1918 |
|
|
Processing\/} by Jim Gray and Andreas Reuter (San Francisco:\ Morgan
|
1919 |
|
|
Kaufmann, 1993). % Kaufmann: stet
|
1920 |
|
|
@^Stone, Harold Stuart@>
|
1921 |
|
|
@^Gray, James Nicholas@>
|
1922 |
|
|
@^Reuter, Andreas Horst@>
|
1923 |
|
|
|
1924 |
|
|
\bull\
|
1925 |
|
|
@.SYNC@>
|
1926 |
|
|
If $\rm XYZ=0$, the machine drains its pipeline (that is, it
|
1927 |
|
|
stalls until all preceding instructions have completed their activity).
|
1928 |
|
|
If $\rm XYZ=1$, the machine controls its actions less drastically,
|
1929 |
|
|
in such a way that all
|
1930 |
|
|
store instructions preceding this \.{SYNC} will be completed
|
1931 |
|
|
before all store instructions after it.
|
1932 |
|
|
If $\rm XYZ=2$, the machine controls its actions in such a way that all
|
1933 |
|
|
load instructions preceding this \.{SYNC} will be completed
|
1934 |
|
|
before all load instructions after it.
|
1935 |
|
|
If $\rm XYZ=3$, the machine controls its actions
|
1936 |
|
|
in such a way that all {\it load or store\/} instructions preceding this
|
1937 |
|
|
\.{SYNC} will be completed before all load or store instructions after it.
|
1938 |
|
|
If $\rm XYZ=4$, the machine goes into a power-saver mode, in which
|
1939 |
|
|
@^power-saver mode@>
|
1940 |
|
|
instructions may be executed more slowly (or not at all) until some kind
|
1941 |
|
|
of ``wake-up'' signal is received.
|
1942 |
|
|
If $\rm XYZ=5$, the machine empties its write buffer and
|
1943 |
|
|
cleans its data caches, if any (including a possible secondary cache);
|
1944 |
|
|
the caches retain their data,
|
1945 |
|
|
but the cache contents also appear in memory.
|
1946 |
|
|
If $\rm XYZ=6$, the machine clears its virtual address translation
|
1947 |
|
|
caches (see below).
|
1948 |
|
|
If $\rm XYZ=7$, the machine clears its instruction and data caches,
|
1949 |
|
|
discarding any information in the data caches that wasn't previously
|
1950 |
|
|
in memory. (``Clearing'' is stronger than ``cleaning''; a clear cache
|
1951 |
|
|
remembers nothing. Clearing is also faster, because it simply obliterates
|
1952 |
|
|
everything.)
|
1953 |
|
|
If $\rm XYZ>7$, an illegal instruction interrupt occurs.
|
1954 |
|
|
|
1955 |
|
|
Of course no \.{SYNC} is necessary between a command that loads from or stores
|
1956 |
|
|
into memory and a subsequent command that loads from or stores into exactly
|
1957 |
|
|
the same location. However, \.{SYNC} might be necessary in certain cases even
|
1958 |
|
|
on a one-processor system, because input/output processes take place in
|
1959 |
|
|
parallel with ordinary computation.
|
1960 |
|
|
|
1961 |
|
|
The cases $\rm XYZ>3$ are {\it privileged}, in the sense that
|
1962 |
|
|
only the operating system can use them. More precisely, if a \.{SYNC}
|
1963 |
|
|
command is encountered with $\rm XYZ=4$ or
|
1964 |
|
|
$\rm XYZ=5$ or $\rm XYZ=6$ or $\rm XYZ=7$,
|
1965 |
|
|
a ``privileged instruction interrupt'' occurs unless that interrupt
|
1966 |
|
|
is currently disabled. Only the operating system can disable
|
1967 |
|
|
interrupts (see below).
|
1968 |
|
|
@^privileged operations@>
|
1969 |
|
|
|
1970 |
|
|
@* Trips and traps.
|
1971 |
|
|
Special register rA records the current status information
|
1972 |
|
|
about arithmetic exceptions. Its least significant byte contains eight
|
1973 |
|
|
``event'' bits called DVWIOUZX from left to right, where D stands for
|
1974 |
|
|
integer divide check, V~for integer overflow, W~for float-to-fix overflow,
|
1975 |
|
|
I~for invalid operation, O~for floating overflow, U~for
|
1976 |
|
|
floating underflow, Z~for floating division by zero, and X~for floating
|
1977 |
|
|
inexact. % The low order five bits agree with SPARC I conventions
|
1978 |
|
|
% but Alpha, for example, uses the order VXUOZI
|
1979 |
|
|
The next least significant byte of rA contains eight
|
1980 |
|
|
``enable'' bits with the same names DVWIOUZX and the same meanings.
|
1981 |
|
|
When an exceptional condition occurs, there are two cases: If the
|
1982 |
|
|
corresponding enable bit is~0, the corresponding event bit is set
|
1983 |
|
|
to~1. But if the corresponding enable bit is~1, \MMIX\ interrupts
|
1984 |
|
|
its current instruction stream and executes a special ``exception
|
1985 |
|
|
handler.'' Thus, the event bits record exceptions that have not been
|
1986 |
|
|
``tripped.''
|
1987 |
|
|
@^overflow@>
|
1988 |
|
|
@^underflow@>
|
1989 |
|
|
@^exceptions@>
|
1990 |
|
|
@^handlers@>
|
1991 |
|
|
@^float-to-fix exception@>
|
1992 |
|
|
@^inexact exception@>
|
1993 |
|
|
@^invalid exception@>
|
1994 |
|
|
@^divide check exception@>
|
1995 |
|
|
|
1996 |
|
|
Floating point overflow always causes two exceptions, O and~X\null.
|
1997 |
|
|
(The strictest interpretation of the IEEE standard would raise exception~X
|
1998 |
|
|
on overflow only if floating overflow is not enabled, but \MMIX\ always
|
1999 |
|
|
considers an overflowed result to be inexact.)
|
2000 |
|
|
Floating point underflow always causes both U and~X when underflow is
|
2001 |
|
|
not enabled, and it might cause both U and~X when underflow is enabled.
|
2002 |
|
|
If both enable bits are set to~1 in such cases, the overflow or underflow
|
2003 |
|
|
handler is called and the inexact handler is ignored. All other types
|
2004 |
|
|
of exceptions arise one at a time, so there is no ambiguity about which
|
2005 |
|
|
exception handler should be invoked unless exceptions are raised by
|
2006 |
|
|
``ropcode~2'' (see below); in general the first enabled exception
|
2007 |
|
|
in the list DVWIOUZX takes precedence.
|
2008 |
|
|
|
2009 |
|
|
What about the six high-order bytes of the status register rA?
|
2010 |
|
|
@^rA@>
|
2011 |
|
|
@^rounding modes@>
|
2012 |
|
|
At present, only two of those 48 bits are defined;
|
2013 |
|
|
the others must be zero for compatibility
|
2014 |
|
|
with possible future extensions. The two bits corresponding to $2^{17}$
|
2015 |
|
|
and $2^{16}$ in rA specify a rounding mode, as follows: 00~means
|
2016 |
|
|
round to nearest (the default); 01~means round off (toward zero);
|
2017 |
|
|
10~means round up (toward positive infinity); and
|
2018 |
|
|
11~means round down (toward negative infinity).
|
2019 |
|
|
% Alpha conventions differ: 10,00,11,01 for nearest,off,up,down
|
2020 |
|
|
|
2021 |
|
|
@ The execution of\/ \MMIX\ programs can be interrupted in several ways.
|
2022 |
|
|
We have just seen that arithmetic exceptions will cause interrupts if
|
2023 |
|
|
they are enabled; so will illegal or privileged instructions, or instructions
|
2024 |
|
|
@^illegal instructions@>
|
2025 |
|
|
@^privileged operations@>
|
2026 |
|
|
@^emulation@>
|
2027 |
|
|
@^interrupts@>
|
2028 |
|
|
@^I/O@>
|
2029 |
|
|
@^input/output@>
|
2030 |
|
|
that are emulated in software instead of provided by the hardware.
|
2031 |
|
|
Input/output operations or external timers are another common source
|
2032 |
|
|
of interrupts; the operating system knows how to deal with
|
2033 |
|
|
all gadgets that might be hooked up to an \MMIX\ processor chip.
|
2034 |
|
|
Interrupts occur also when memory accesses fail---for example if
|
2035 |
|
|
memory is nonexistent or protected.
|
2036 |
|
|
Power failures that force the machine to use its backup battery power
|
2037 |
|
|
in order to keep running in an emergency,
|
2038 |
|
|
or hardware failures like parity errors,
|
2039 |
|
|
all must be handled as gracefully as possible.
|
2040 |
|
|
|
2041 |
|
|
Users can also force interrupts to happen by giving explicit \.{TRAP} or
|
2042 |
|
|
\.{TRIP} instructions:
|
2043 |
|
|
|
2044 |
|
|
\bull\
|
2045 |
|
|
@.TRIP@>
|
2046 |
|
|
@.TRAP@>
|
2047 |
|
|
Both of these instructions interrupt processing and transfer control
|
2048 |
|
|
to a handler. The difference between them is that \.{TRAP}
|
2049 |
|
|
is handled by the operating system but \.{TRIP} is handled by the user.
|
2050 |
|
|
@^operating system@>
|
2051 |
|
|
More precisely, the X, Y, and Z fields of \.{TRAP} have special significance
|
2052 |
|
|
predefined by the operating system kernel. For example, a system call---say an I/O
|
2053 |
|
|
command, or a command to allocate more memory---might be invoked
|
2054 |
|
|
by certain settings of X, Y, and~Z\null.
|
2055 |
|
|
The X, Y, and Z fields of \.{TRIP}, on the other hand, are definable by
|
2056 |
|
|
users for their own applications, and users also define their own
|
2057 |
|
|
handlers. ``Trip handler'' programs
|
2058 |
|
|
invoked by \.{TRIP} are interruptible, but interrupts are normally inhibited
|
2059 |
|
|
while a \.{TRAP} is being serviced. Specific details about the
|
2060 |
|
|
precise actions of \.{TRIP} and \.{TRAP} appear below, together
|
2061 |
|
|
with the description of another command called \.{RESUME} that
|
2062 |
|
|
returns control from a handler to the interrupted program.
|
2063 |
|
|
|
2064 |
|
|
Only two variants of \.{TRAP} are predefined by the \MMIX\ architecture:
|
2065 |
|
|
If $\rm XYZ=0$ in a \.{TRAP}
|
2066 |
|
|
command, a user process should terminate. If $\rm XYZ=1$,
|
2067 |
|
|
the operating system should provide default action for cases in which
|
2068 |
|
|
the user has not provided any handler for a particular
|
2069 |
|
|
kind of interrupt (see below).
|
2070 |
|
|
|
2071 |
|
|
A few additional variants of \.{TRAP} are predefined in the rudimentary
|
2072 |
|
|
operating system used with \MMIX\ simulators. These variants, which
|
2073 |
|
|
allow simple input/output operations to be done, all have $\xx=0$,
|
2074 |
|
|
and the Y~field is a small positive constant. For example, $\yy=1$ invokes
|
2075 |
|
|
the \.{Fopen} routine, which opens a file. (See the program
|
2076 |
|
|
{\mc MMIX-SIM} for full details.)
|
2077 |
|
|
@^I/O@>
|
2078 |
|
|
@^input/output@>
|
2079 |
|
|
|
2080 |
|
|
@ Non-catastrophic interrupts in \MMIX\ are always {\it precise}, in the sense that all legal
|
2081 |
|
|
instructions before a certain point have effectively been executed, and
|
2082 |
|
|
no instructions after that point have yet been executed. The current
|
2083 |
|
|
instruction, which may or may not have been completed at the time of
|
2084 |
|
|
interrupt and which may or may not need to be resumed after the interrupt has
|
2085 |
|
|
been serviced, is
|
2086 |
|
|
put into the special {\it execution register\/}~rX, and its operands (if any)
|
2087 |
|
|
are placed in special registers rY and~rZ\null. The address of the following
|
2088 |
|
|
instruction is placed in the special {\it where-interrupted
|
2089 |
|
|
register\/}~rW\null.
|
2090 |
|
|
@^interrupts@>
|
2091 |
|
|
@^rW@>
|
2092 |
|
|
@^rX@>
|
2093 |
|
|
@^rY@>
|
2094 |
|
|
@^rZ@>
|
2095 |
|
|
The instruction in~rX may not be the same as the instruction in
|
2096 |
|
|
location $\rm rW-4$; for example, it may be an instruction that
|
2097 |
|
|
branched or jumped to~rW\null. It might also be an instruction
|
2098 |
|
|
inserted internally by the \MMIX\ processor.
|
2099 |
|
|
(For example, the computer silently inserts an internal instruction
|
2100 |
|
|
that increases~$L$ before an instruction
|
2101 |
|
|
like \
|
2102 |
|
|
occurs, between the inserted instruction and the \.{ADD},
|
2103 |
|
|
the instruction in~rX will
|
2104 |
|
|
say \.{ADD}, because an internal instruction retains the identity of the
|
2105 |
|
|
actual command that spawned it; but rW will point to the {\it real\/}
|
2106 |
|
|
\.{ADD} command.)
|
2107 |
|
|
|
2108 |
|
|
When an instruction has the normal meaning ``set \$X to
|
2109 |
|
|
the result of \$Y~op~\$Z'' or ``set \$X to the result of \$Y~op~Z,''
|
2110 |
|
|
special registers rY and~rZ will relate in the
|
2111 |
|
|
obvious way to the Y and~Z operands of the instruction; but this is not
|
2112 |
|
|
always the case. For example, after an interrupted
|
2113 |
|
|
store instruction, the first operand~rY will hold
|
2114 |
|
|
the virtual memory address (\$Y plus either \$Z or~Z),
|
2115 |
|
|
and the second operand~rZ will be the octabyte to be stored in memory
|
2116 |
|
|
(including bytes that have not changed, in cases like \.{STB}). In
|
2117 |
|
|
other cases the actual
|
2118 |
|
|
contents of rY and~rZ are defined by each implementation of\/ \MMIX,
|
2119 |
|
|
and programmers should not rely on their significance.
|
2120 |
|
|
|
2121 |
|
|
Some instructions take an unpredictable and possibly long amount of time, so
|
2122 |
|
|
it may be necessary to interrupt them in progress. For example, the \.{FREM}
|
2123 |
|
|
@.FREM@>
|
2124 |
|
|
instruction (floating point remainder) is extremely difficult to compute
|
2125 |
|
|
rapidly if its first operand has an exponent of~2046 and its second operand
|
2126 |
|
|
has an exponent of~1. In such cases the rY and rZ registers saved during an
|
2127 |
|
|
interrupt show the current state of the computation, not necessarily the
|
2128 |
|
|
original values of the operands. The value of $\rm rY\,{rem}\,rZ$ will still
|
2129 |
|
|
be the desired remainder, but rY may well have been reduced to a
|
2130 |
|
|
number that has an exponent closer to the exponent of~rZ\null.
|
2131 |
|
|
After the interrupt has been processed, the remainder
|
2132 |
|
|
computation will continue where it left off.
|
2133 |
|
|
(Alternatively, an operation like \.{FREM} or even \.{FADD} might be
|
2134 |
|
|
implemented in software instead of hardware, as we will see later.)
|
2135 |
|
|
|
2136 |
|
|
Another example arises with an instruction like \.{PREST} (prestore), which can
|
2137 |
|
|
@.PREST@>
|
2138 |
|
|
specify prestoring up to 256 bytes. An implementation of\/ \MMIX\ might choose
|
2139 |
|
|
to prestore only 32 or 64 bytes at a time, depending on the cache block size;
|
2140 |
|
|
then it can change the contents of rX to reflect the unfinished part of
|
2141 |
|
|
a partially completed \.{PREST} command.
|
2142 |
|
|
|
2143 |
|
|
Commands that decrease $G$, pop the stack, save the
|
2144 |
|
|
current context, or unsave an old context also are interruptible. Register~rX
|
2145 |
|
|
is used to communicate information about partial completion in such a
|
2146 |
|
|
way that the interruption will be essentially ``invisible'' after
|
2147 |
|
|
a program is resumed.
|
2148 |
|
|
|
2149 |
|
|
@ Three kinds of interruption are possible: trips, forced traps, and
|
2150 |
|
|
dynamic traps. We will discuss each of these in turn.
|
2151 |
|
|
@^interrupts@>
|
2152 |
|
|
@^trips@>
|
2153 |
|
|
@^traps@>
|
2154 |
|
|
@^forced traps@>
|
2155 |
|
|
@^dynamic traps@>
|
2156 |
|
|
@^handlers@>
|
2157 |
|
|
@^operating system@>
|
2158 |
|
|
|
2159 |
|
|
A \.{TRIP} instruction puts itself into the right half of the execution
|
2160 |
|
|
@.TRIP@>
|
2161 |
|
|
register~rX, and sets the 32 bits of the left half to \Hex{80000000}.
|
2162 |
|
|
(Therefore rX is {\it negative\/}; this fact will
|
2163 |
|
|
tell the \.{RESUME} command not to \.{TRIP} again.) The special registers
|
2164 |
|
|
rY and rZ are set to the contents of the registers specified by the
|
2165 |
|
|
Y and Z fields of the \.{TRIP} command, namely \$Y and~\$Z.
|
2166 |
|
|
Then \$255 is placed into the special {\it bootstrap
|
2167 |
|
|
register\/}~rB, and \$255 is set to~rJ. \MMIX\ now takes its next instruction
|
2168 |
|
|
@^rB@>
|
2169 |
|
|
from virtual memory address~0.
|
2170 |
|
|
|
2171 |
|
|
Arithmetic exceptions interrupt the computation in essentially the
|
2172 |
|
|
same way as \.{TRIP}, if they are enabled. The only difference is that
|
2173 |
|
|
their handlers begin at the respective addresses
|
2174 |
|
|
16, 32, 48, 64, 80, 96, 112, and~128, for exception bits D, V, W, I, O, U,
|
2175 |
|
|
Z, and~X of~rA; registers rY and~rZ are set to the operands of the
|
2176 |
|
|
interrupted instruction as explained earlier.
|
2177 |
|
|
|
2178 |
|
|
A 16-byte block of memory is just enough for a sequence of commands like
|
2179 |
|
|
$$\hbox{\tt PUSHJ 255,Handler; PUT rJ,\$255; GET \$255,rB; RESUME}$$
|
2180 |
|
|
which will invoke a user's handler. And if the user does not choose to
|
2181 |
|
|
provide a custom-designed handler, the operating system provides a
|
2182 |
|
|
default handler via the instructions
|
2183 |
|
|
$$\hbox{\tt TRAP 1; GET \$255,rB; RESUME.}$$
|
2184 |
|
|
|
2185 |
|
|
A trip handler might simply record the fact that tripping occurred.
|
2186 |
|
|
But the handler for an arithmetic interrupt might want to change the
|
2187 |
|
|
default result of a computation. In such cases, the handler should place
|
2188 |
|
|
the desired substitute result into~rZ, and it should change the most
|
2189 |
|
|
significant byte of~rX from \Hex{80} to \Hex{02}. This will have the desired
|
2190 |
|
|
effect, because of the rules of \.{RESUME} explained below, {\it unless\/}
|
2191 |
|
|
the exception occurred on a command like \.{STB} or \.{STSF}. (A~bit more
|
2192 |
|
|
work is needed to alter the effect of a command that stores into memory.)
|
2193 |
|
|
|
2194 |
|
|
Instructions in {\it negative\/} virtual locations do not invoke trip
|
2195 |
|
|
handlers, either for \.{TRIP} or for arithmetic exceptions. Such instructions
|
2196 |
|
|
are reserved for the operating system, as we will see.
|
2197 |
|
|
@^negative locations@>
|
2198 |
|
|
|
2199 |
|
|
@ A \.{TRAP} instruction interrupts the computation essentially
|
2200 |
|
|
@^interrupts@>
|
2201 |
|
|
like \.{TRIP}, but with the following modifications:
|
2202 |
|
|
@^rT@>
|
2203 |
|
|
@.TRAP@>
|
2204 |
|
|
@^rK@>
|
2205 |
|
|
(i)~the interrupt mask register~rK is cleared
|
2206 |
|
|
to zero, thereby inhibiting interrupts; (ii)~control jumps to virtual memory
|
2207 |
|
|
address~rT, not zero; (iii)~information is placed
|
2208 |
|
|
@^rBB@>
|
2209 |
|
|
@^rWW@>
|
2210 |
|
|
@^rXX@>
|
2211 |
|
|
@^rYY@>
|
2212 |
|
|
@^rZZ@>
|
2213 |
|
|
in a separate set of special registers rBB, rWW, rXX, rYY, and~rZZ, instead of
|
2214 |
|
|
rB, rW, rX, rY, and~rZ\null. (These special registers are needed because a trap
|
2215 |
|
|
might occur while processing a \.{TRIP}.)
|
2216 |
|
|
|
2217 |
|
|
Another kind of forced trap occurs on implementations of\/ \MMIX\ that
|
2218 |
|
|
emulate certain instructions in software rather than in hardware.
|
2219 |
|
|
Such instructions cause a \.{TRAP} even though their opcode is something
|
2220 |
|
|
else like \.{FREM} or \.{FADD} or \.{DIV}. The trap handler can tell
|
2221 |
|
|
what instruction to emulate by looking at the opcode, which appears
|
2222 |
|
|
in~rXX\null.
|
2223 |
|
|
In such cases the left-hand half of~rXX is set to \Hex{02000000}; the handler
|
2224 |
|
|
emulating \.{FADD}, say, should compute the floating point sum of rYY and~rZZ
|
2225 |
|
|
and place the result in~rZZ\null. A~subsequent
|
2226 |
|
|
\.{RESUME}~\.1 will then place the value of~rZZ in the proper register.
|
2227 |
|
|
@^emulation@>
|
2228 |
|
|
@^forced traps@>
|
2229 |
|
|
|
2230 |
|
|
Implementations of\/ \MMIX\ might also emulate the process of
|
2231 |
|
|
virtual-address-to-physical-address translation described below,
|
2232 |
|
|
instead of providing for page table calculations in hardware.
|
2233 |
|
|
Then if, say, a \.{LDB} instruction does not know the physical memory
|
2234 |
|
|
address corresponding to a specified virtual address, it will cause
|
2235 |
|
|
a forced trap with the left half of~rXX set to \Hex{03000000} and with
|
2236 |
|
|
rYY set to the virtual address in question. The trap handler should
|
2237 |
|
|
place the physical page address into~rZZ; then \.{RESUME}~\.1 will
|
2238 |
|
|
complete~the~\.{LDB}.
|
2239 |
|
|
|
2240 |
|
|
@ The third and final kind of interrupt is called a {\it dynamic\/} trap.
|
2241 |
|
|
@^interrupts@>
|
2242 |
|
|
@^dynamic traps@>
|
2243 |
|
|
Such interruptions occur when one or more of the 64 bits in the
|
2244 |
|
|
special {\it interrupt request register\/}~rQ have been set to~1,
|
2245 |
|
|
@^rQ@>
|
2246 |
|
|
@^rK@>
|
2247 |
|
|
and when at least one corresponding bit of the special
|
2248 |
|
|
{\it interrupt mask register\/}~rK is also equal to~1. The bit positions
|
2249 |
|
|
of rQ and~rK have the general form
|
2250 |
|
|
$$\beginword
|
2251 |
|
|
&\field{24}{24}&&\field88&&\field{24}{24}&&\field88\cr
|
2252 |
|
|
\noalign{\hrule}
|
2253 |
|
|
\\&low-priority I/O&\\&program&\\&high-priority I/O&\\&machine&\\\cr
|
2254 |
|
|
\noalign{\hrule}\endword$$
|
2255 |
|
|
where the 8-bit ``program'' bits are called \.{rwxnkbsp} and have
|
2256 |
|
|
the following meanings:
|
2257 |
|
|
$$\vbox{\halign{\.# bit: \hfil\cr
|
2258 |
|
|
r&instruction tries to load from a page without read permission;\cr
|
2259 |
|
|
w&instruction tries to store to a page without write permission;\cr
|
2260 |
|
|
x&instruction appears in a page without execute permission;\cr
|
2261 |
|
|
n&instruction refers to a negative virtual address;\cr
|
2262 |
|
|
k&instruction is privileged, for use by the ``kernel'' only;\cr
|
2263 |
|
|
b&instruction breaks the rules of\/ \MMIX;\cr
|
2264 |
|
|
s&instruction violates security (see below);\cr
|
2265 |
|
|
p&instruction comes from a privileged (negative) virtual address.\cr}}$$
|
2266 |
|
|
Negative addresses are for the use of the operating system only;
|
2267 |
|
|
@^operating system@>
|
2268 |
|
|
@^protection bits@>
|
2269 |
|
|
@^permission bits@>
|
2270 |
|
|
@^security violation@>
|
2271 |
|
|
@^privileged instructions@>
|
2272 |
|
|
@^illegal instructions@>
|
2273 |
|
|
@^page fault@>
|
2274 |
|
|
a security violation occurs if an instruction in a nonnegative address
|
2275 |
|
|
is executed without the \.{rwxnkbsp} bits of~rK all set to~1.
|
2276 |
|
|
(In such cases the \.s bits of both rQ and~rK are set to~1.)
|
2277 |
|
|
|
2278 |
|
|
The eight ``machine'' bits of rQ and rK represent the most urgent
|
2279 |
|
|
kinds of interrupts. The rightmost bit stands for power failure,
|
2280 |
|
|
the next for memory parity error, the next for nonexistent memory,
|
2281 |
|
|
the next for rebooting, etc.
|
2282 |
|
|
Interrupts that need especially quick service, like requests from
|
2283 |
|
|
a high-speed network, also are allocated bit positions near the right end.
|
2284 |
|
|
Low priority I/O devices like keyboards are assigned to bits at the left.
|
2285 |
|
|
The allocation of input/output devices to bit positions will
|
2286 |
|
|
differ from implementation to implementation, depending on
|
2287 |
|
|
what devices are available.
|
2288 |
|
|
@^I/O@>
|
2289 |
|
|
@^input/output@>
|
2290 |
|
|
|
2291 |
|
|
Once $\rm rQ\land rK$ becomes nonzero, the machine waits
|
2292 |
|
|
briefly until it can give a precise interrupt.
|
2293 |
|
|
Then it proceeds as with a forced trap,
|
2294 |
|
|
except that it uses the special ``dynamic
|
2295 |
|
|
trap address register''~rTT instead of~rT. The trap handler that
|
2296 |
|
|
@^rTT@>
|
2297 |
|
|
begins at location~rTT can figure out the reason for interrupt by
|
2298 |
|
|
examining $\rm rQ\land rK$. (For example, after the instructions
|
2299 |
|
|
$$\hbox spread-10pt{\tt\spaceskip .5em minus .1em
|
2300 |
|
|
GET \$0,rQ; LDOU \$1,savedK; AND \$0,\$0,\$1; SUBU \$1,\$0,1;
|
2301 |
|
|
SADD \$2,\$1,\$0; ANDN \$1,\$0,\$1}$$
|
2302 |
|
|
the highest-priority offending bit will be in \$1 and its position will be
|
2303 |
|
|
in~\$2.)
|
2304 |
|
|
@^counting trailing zeros@>
|
2305 |
|
|
|
2306 |
|
|
If the interrupted instruction contributed 1s to any of the \.{rwxnkbsp} bits
|
2307 |
|
|
of~rQ, the corresponding bits are set to~1 also in~rXX\null. A~dynamic trap
|
2308 |
|
|
handler might be able to use this information (although it should
|
2309 |
|
|
service higher-priority interrupts first if the right half
|
2310 |
|
|
of $\rm rQ\land rK$ is nonzero).
|
2311 |
|
|
@^rX@>
|
2312 |
|
|
|
2313 |
|
|
The rules of\/ \MMIX\ are rigged
|
2314 |
|
|
so that only the operating system can execute instructions
|
2315 |
|
|
with interrupts suppressed. Therefore the operating system can in fact
|
2316 |
|
|
use instructions that would interrupt an ordinary program. Control of
|
2317 |
|
|
register rK turns out to be the ultimate privilege, and in a sense the
|
2318 |
|
|
only important one.
|
2319 |
|
|
@^privileged operations@>
|
2320 |
|
|
|
2321 |
|
|
An instruction that causes a dynamic trap is usually executed before the
|
2322 |
|
|
interruption occurs. However, an instruction that traps with
|
2323 |
|
|
bits \.x, \.k, or \.b does nothing; a load instruction that traps
|
2324 |
|
|
with \.r or \.n loads zero; a store instruction that traps with any
|
2325 |
|
|
of \.{rwxnkbsp} stores nothing.
|
2326 |
|
|
|
2327 |
|
|
@ After a trip handler or trap handler has done its thing, it
|
2328 |
|
|
generally invokes the following command.
|
2329 |
|
|
|
2330 |
|
|
\bull\
|
2331 |
|
|
@.RESUME@>
|
2332 |
|
|
@^interrupts@>
|
2333 |
|
|
@^handlers@>
|
2334 |
|
|
If the Z field of this instruction is zero,
|
2335 |
|
|
\MMIX\ will use the
|
2336 |
|
|
information found in special registers rW, rX, rY, and~rZ to restart an
|
2337 |
|
|
@^rW@>
|
2338 |
|
|
@^rX@>
|
2339 |
|
|
@^rY@>
|
2340 |
|
|
@^rZ@>
|
2341 |
|
|
@^rBB@>
|
2342 |
|
|
@^rWW@>
|
2343 |
|
|
@^rXX@>
|
2344 |
|
|
@^rYY@>
|
2345 |
|
|
@^rZZ@>
|
2346 |
|
|
@^rK@>
|
2347 |
|
|
interrupted computation. If the execution register rX is negative, it will be
|
2348 |
|
|
ignored and instructions will be executed starting at virtual address~rW\null;
|
2349 |
|
|
otherwise the instruction in the right half of the execution register will be
|
2350 |
|
|
inserted into the program as if it had appeared in location $\rm rW-4$,
|
2351 |
|
|
subject to certain modifications that we will explain momentarily,
|
2352 |
|
|
and the {\it next\/} instruction will come from rW.
|
2353 |
|
|
|
2354 |
|
|
If the Z field of \.{RESUME}
|
2355 |
|
|
is 1 and if this instruction appears in a negative location,
|
2356 |
|
|
registers rWW, rXX, rYY, and~rZZ are used instead of rW, rX, rY, and~rZ\null.
|
2357 |
|
|
Also, just before resuming the computation,
|
2358 |
|
|
mask register rK is set to \$255 and \$255 is set to rBB\null.
|
2359 |
|
|
(Only the operating system gets to use this feature.)
|
2360 |
|
|
@^operating system@>
|
2361 |
|
|
|
2362 |
|
|
An interrupt handler within the operating system might choose to allow itself
|
2363 |
|
|
to be interrupted. In such cases it should save the contents of
|
2364 |
|
|
rBB, rWW, rXX, rYY, and~rZZ on some kind of stack, before making rK nonzero.
|
2365 |
|
|
Then, before resuming whatever caused the base level interrupt, it
|
2366 |
|
|
must again disable all interrupts; this can be done with \.{TRAP},
|
2367 |
|
|
because the trap handler can tell from the virtual address in~rWW that
|
2368 |
|
|
it has been invoked by the operating system. Once rK is again zero,
|
2369 |
|
|
the contents of rBB, rWW, rXX, rYY, and~rZZ are restored from the stack,
|
2370 |
|
|
the outer level interrupt mask is placed in \$255, and \
|
2371 |
|
|
finishes the job.
|
2372 |
|
|
|
2373 |
|
|
Values of Z greater than 1 are reserved for possible later
|
2374 |
|
|
definition. Therefore they cause an illegal instruction interrupt (that
|
2375 |
|
|
is, they set the `\.b' bit of~rQ) in the present version of\/ \MMIX.
|
2376 |
|
|
@^illegal instructions@>
|
2377 |
|
|
|
2378 |
|
|
If the execution register rX is nonnegative, its leftmost byte controls
|
2379 |
|
|
the way its right-hand half will be inserted into the program.
|
2380 |
|
|
Let's call this byte the ``ropcode.'' A ropcode of~0 simply
|
2381 |
|
|
inserts the instruction into the execution stream; a ropcode of~1
|
2382 |
|
|
is similar, but it substitutes rY and rZ for the
|
2383 |
|
|
two operands, assuming that this makes sense for the operation considered.
|
2384 |
|
|
@^ropcodes@>
|
2385 |
|
|
|
2386 |
|
|
Ropcode~2 inserts a command that sets \$X to rZ, where
|
2387 |
|
|
X~is the second byte in the right half of rX\null.
|
2388 |
|
|
This ropcode is normally used with forced-trap emulations, so that the result
|
2389 |
|
|
of an emulated instruction is placed into the correct register.
|
2390 |
|
|
It also uses the third-from-left byte of~rX to raise any or all of the
|
2391 |
|
|
arithmetic exceptions DVWIOUZX, at the same time as rZ is
|
2392 |
|
|
being placed in \$X. Emulated instructions and
|
2393 |
|
|
explicit \.{TRAP} commands can therefore cause overflow, say,
|
2394 |
|
|
just as ordinary instructions can.
|
2395 |
|
|
(Such new exceptions may, of
|
2396 |
|
|
course, spawn a trip interrupt, if any of the corresponding bits are enabled
|
2397 |
|
|
in~rA.)
|
2398 |
|
|
@^rA@>
|
2399 |
|
|
@^emulation@>
|
2400 |
|
|
|
2401 |
|
|
Finally, ropcode 3 is the same as ropcode 0, except that it also
|
2402 |
|
|
tells \MMIX\ to treat rZ as the page table entry for the virtual
|
2403 |
|
|
address~rY\null. (See the discussion of virtual address translation below.)
|
2404 |
|
|
Ropcodes greater than~3 are not permitted; moreover,
|
2405 |
|
|
only \
|
2406 |
|
|
|
2407 |
|
|
The ropcode rules in the previous paragraphs should of course be understood to
|
2408 |
|
|
involve rWW, rXX, rYY, and rZZ instead of rW, rX, rY, and~rZ when
|
2409 |
|
|
the ropcode is seen by \.{RESUME}~\.1. Thus, in particular, ropcode~3
|
2410 |
|
|
always applies to rYY and~rZZ, never to rY and~rZ.
|
2411 |
|
|
|
2412 |
|
|
Special restrictions must hold if resumption is to work properly: Ropcodes
|
2413 |
|
|
0~and~3 must not insert a \.{RESUME} instruction; ropcode~1 must insert
|
2414 |
|
|
a ``normal'' instruction, namely one whose opcode begins with
|
2415 |
|
|
one of the hexadecimal digits \Hex{0}, \Hex{1}, \Hex{2}, \Hex{3}, \Hex{6},
|
2416 |
|
|
\Hex{7}, \Hex{C}, \Hex{D}, or~\Hex{E}. (See the opcode chart below.)
|
2417 |
|
|
Some implementations may also allow ropcode~1 with \.{SYNCD[I]}
|
2418 |
|
|
and \.{SYNCID[I]}, so that those instructions can conveniently be
|
2419 |
|
|
interrupted.
|
2420 |
|
|
Moreover, the destination register \$X used with ropcode 1 or~2 must
|
2421 |
|
|
not be marginal. All of these restrictions hold automatically in normal
|
2422 |
|
|
use; they are relevant only if the programmer tries to do something tricky.
|
2423 |
|
|
|
2424 |
|
|
Notice that the slightly tricky sequence
|
2425 |
|
|
$$\hbox{\tt LDA \$0,Loc; PUT rW,\$0; LDTU \$1,Inst; PUT rX,\$1; RESUME}$$
|
2426 |
|
|
will execute an almost arbitrary instruction \.{Inst} as if it had been in
|
2427 |
|
|
location \.{Loc-4}, and then will jump to location \.{Loc} (assuming
|
2428 |
|
|
that \.{Inst} doesn't branch elsewhere).
|
2429 |
|
|
|
2430 |
|
|
@* Special registers.
|
2431 |
|
|
@^special registers@>
|
2432 |
|
|
Quite a few special registers have been mentioned so far, and \MMIX\ actually
|
2433 |
|
|
has even more. It is time now to enumerate them all, together with their
|
2434 |
|
|
internal code numbers:
|
2435 |
|
|
$$\vbox{\halign{\hfil#,\quad\hfil\cr
|
2436 |
|
|
rA&arithmetic status register [21]\cr
|
2437 |
|
|
rB&bootstrap register (trip) [0]\cr
|
2438 |
|
|
rC&cycle counter [8]\cr
|
2439 |
|
|
rD÷nd register [1]\cr
|
2440 |
|
|
rE&epsilon register [2]\cr
|
2441 |
|
|
rF&failure location register [22]\cr
|
2442 |
|
|
rG&global threshold register [19]\cr
|
2443 |
|
|
rH&himult register [3]\cr
|
2444 |
|
|
rI&interval counter [12]\cr
|
2445 |
|
|
rJ&return-jump register [4]\cr
|
2446 |
|
|
rK&interrupt mask register [15]\cr
|
2447 |
|
|
rL&local threshold register [20]\cr
|
2448 |
|
|
rM&multiplex mask register [5]\cr
|
2449 |
|
|
rN&serial number [9]\cr
|
2450 |
|
|
rO®ister stack offset [10]\cr
|
2451 |
|
|
rP&prediction register [23]\cr
|
2452 |
|
|
rQ&interrupt request register [16]\cr
|
2453 |
|
|
rR&remainder register [6]\cr
|
2454 |
|
|
rS®ister stack pointer [11]\cr
|
2455 |
|
|
rT&trap address register [13]\cr
|
2456 |
|
|
rU&usage counter [17]\cr
|
2457 |
|
|
rV&virtual translation register [18]\cr
|
2458 |
|
|
rW&where-interrupted register (trip) [24]\cr
|
2459 |
|
|
rX&execution register (trip) [25]\cr
|
2460 |
|
|
rY&Y operand (trip) [26]\cr
|
2461 |
|
|
rZ&Z operand (trip) [27]\cr
|
2462 |
|
|
rBB&bootstrap register (trap) [7]\cr
|
2463 |
|
|
rTT&dynamic trap address register [14]\cr
|
2464 |
|
|
rWW&where-interrupted register (trap) [28]\cr
|
2465 |
|
|
rXX&execution register (trap) [29]\cr
|
2466 |
|
|
rYY&Y operand (trap) [30]\cr
|
2467 |
|
|
rZZ&Z operand (trap) [31]\cr}}$$
|
2468 |
|
|
@^rG@>
|
2469 |
|
|
@^rL@>
|
2470 |
|
|
In this list rG and rL are what we have been calling simply $G$ and $L$; \
|
2471 |
|
|
rC, rF, rI, rN, rO, rS, rU, and~rV have not been mentioned before.
|
2472 |
|
|
|
2473 |
|
|
@ The {\it cycle counter\/}~rC advances by~1 on every ``clock pulse'' of the
|
2474 |
|
|
@^rC@>
|
2475 |
|
|
\MMIX\ processor. Thus if \MMIX\ is running at 500 MHz, the cycle
|
2476 |
|
|
counter increases every 2 nanoseconds. There is no need to worry about
|
2477 |
|
|
rC overflowing; even if it were to increase once every nanosecond,
|
2478 |
|
|
it wouldn't reach $2^{64}$ until more than 584.55 years have gone by.
|
2479 |
|
|
|
2480 |
|
|
The {\it interval counter\/}~rI is similar, but it {\it decreases\/}
|
2481 |
|
|
@^rI@>
|
2482 |
|
|
by~1 on each cycle, and causes an {\it interval interrupt\/}
|
2483 |
|
|
when it reaches zero. Such interrupts can be extremely useful for
|
2484 |
|
|
``continuous profiling'' as a means of studying
|
2485 |
|
|
the empirical running time of programs;
|
2486 |
|
|
see Jennifer~M. Anderson, Lance~M. Berc, Jeffrey Dean, Sanjay Ghemawat,
|
2487 |
|
|
Monika~R. Henzinger, Shun-Tak~A. Leung, Richard~L. Sites, Mark~T. Vandevoorde,
|
2488 |
|
|
Carl~A. Waldspurger, and William~E. Weihl, {\sl ACM Transactions on Computer
|
2489 |
|
|
Systems\/ \bf15} (1997), 357--390.
|
2490 |
|
|
The interval interrupt is achieved by setting the leftmost bit of the
|
2491 |
|
|
``machine'' byte of~rQ equal to~1; this is the eighth-least-significant bit.
|
2492 |
|
|
@^rQ@>
|
2493 |
|
|
@^continuous profiling@>
|
2494 |
|
|
@^performance monitoring@>
|
2495 |
|
|
@^Anderson, Jennifer-Ann Monique@>
|
2496 |
|
|
@^Berc, Lance Michael@>
|
2497 |
|
|
@^Dean, Jeffrey Adgate@>
|
2498 |
|
|
@^Ghemawat, Sanjay@>
|
2499 |
|
|
@^Henzinger, Monika Hildegard Rauch@>
|
2500 |
|
|
@^Leung, Shun-Tak Albert@>
|
2501 |
|
|
@^Sites, Richard Lee@>
|
2502 |
|
|
@^Vandevoorde, Mark Thierry@>
|
2503 |
|
|
@^Waldspurger, Carl Alan@>
|
2504 |
|
|
@^Weihl, William Edward@>
|
2505 |
|
|
|
2506 |
|
|
The {\it usage counter\/}~rU consists of three fields $(u_p,u_m,u_c)$,
|
2507 |
|
|
@^rU@>
|
2508 |
|
|
called the usage pattern~$u_p$, the usage mask~$u_m$,
|
2509 |
|
|
and the usage count~$u_c$. The most significant byte of~rU is the usage
|
2510 |
|
|
pattern; the next most significant byte is the usage mask; and
|
2511 |
|
|
the remaining 48 bits are the usage count. Whenever an instruction whose
|
2512 |
|
|
${\rm OP}\land u_m=u_p$ has been executed, the value of $u_c$ increases by~1
|
2513 |
|
|
(mod~$2^{48}$).
|
2514 |
|
|
Thus, for example, the OP-code chart below implies that
|
2515 |
|
|
all instructions are counted if $u_p=u_m=0$;
|
2516 |
|
|
all loads and stores are counted together with \.{GO} and \.{PUSHGO}
|
2517 |
|
|
if $u_p=(10000000)_2$ and $u_m=(11000000)_2$;
|
2518 |
|
|
all floating point instructions are counted together with fixed point
|
2519 |
|
|
multiplications and divisions if $u_p=0$ and $u_m=(11100000)_2$;
|
2520 |
|
|
fixed point multiplications and divisions alone are counted if
|
2521 |
|
|
$u_p=(00011000)_2$ and $u_m=(11111000)_2$; completed subroutine calls
|
2522 |
|
|
are counted if $u_p=\.{POP}$ and $u_m=(11111111)_2$.
|
2523 |
|
|
Instructions in negative locations, which belong to the operating system,
|
2524 |
|
|
are exceptional: They are included in the usage count only if the leading bit
|
2525 |
|
|
of $u_c$ is~1.
|
2526 |
|
|
@^negative locations@>
|
2527 |
|
|
|
2528 |
|
|
Incidentally, the 64-bit counters rC and rI can be implemented rather cheaply with
|
2529 |
|
|
only two levels of logic, using an old trick called ``carry-save addition''
|
2530 |
|
|
[see, for example, G.~Metze and J.~E. Robertson, {\sl Proc.\ International
|
2531 |
|
|
Conf.\ Information Processing\/} (Paris:\ 1959), 389--396]. One nice
|
2532 |
|
|
embodiment of this idea is to
|
2533 |
|
|
@^Metze, Gernot@>
|
2534 |
|
|
@^Robertson, James Evans@>
|
2535 |
|
|
@^carry-save addition@>
|
2536 |
|
|
represent a binary number~$x$ in a redundant form as the difference $x'-x''$
|
2537 |
|
|
of two binary numbers. Any two such numbers can be added without carry
|
2538 |
|
|
propagation as follows: Let
|
2539 |
|
|
$$f(x,y,z)=
|
2540 |
|
|
(x\land\bar y)\lor(x\land z)\lor(\bar y\land z), \qquad
|
2541 |
|
|
% ((x\oplus y)\land(x\oplus z))\oplus z, \qquad
|
2542 |
|
|
g(x,y,z)=x\oplus y\oplus z.$$
|
2543 |
|
|
Then it is easy to check that $x-y+z=2f(x,y,z)-g(x,y,z)$; we need only verify
|
2544 |
|
|
this in the eight cases when $x$, $y$, and~$z$ are 0 or~1.
|
2545 |
|
|
Thus we can subtract~1 from a counter $x'-x''$ by setting
|
2546 |
|
|
$$(x',x'')\gets(f(x',x'',-1)\LL1,\;g(x',x'',-1));$$
|
2547 |
|
|
we can add~1 by setting
|
2548 |
|
|
$(x',x'')\gets(g(x'',x',-1),f(x'',x',-1)\LL1)$.
|
2549 |
|
|
The result is zero if and only if
|
2550 |
|
|
$x'=x''$. We need not actually compute the difference $x'-x''$ until
|
2551 |
|
|
we need to examine the register. The computation
|
2552 |
|
|
of $f(x,y,z)$ and $g(x,y,z)$ is particularly simple in the special
|
2553 |
|
|
cases $z=0$ and $z=-1$. A similar trick works for~rU,
|
2554 |
|
|
but extra care is needed in that case
|
2555 |
|
|
because several instructions might finish at the same time.
|
2556 |
|
|
(Thanks to Frank Yellin for his improvements to this paragraph.)
|
2557 |
|
|
@^Yellin, Frank Nathan@>
|
2558 |
|
|
|
2559 |
|
|
@ The special {\it serial number register\/}~rN is permanently set to
|
2560 |
|
|
@^rN@>
|
2561 |
|
|
the time this particular instance of\/ \MMIX\ was created (measured as the
|
2562 |
|
|
number of seconds since 00:00:00 Greenwich Mean Time on 1~January 1970),
|
2563 |
|
|
in its five least significant bytes. The three most significant bytes
|
2564 |
|
|
are permanently set to the {\it version number\/} of the \MMIX\ architecture
|
2565 |
|
|
that is being implemented together with
|
2566 |
|
|
two additional bytes that modify the version
|
2567 |
|
|
number. This quantity serves as an essentially unique identification
|
2568 |
|
|
number for each copy of\/ \MMIX.
|
2569 |
|
|
@^version number@>
|
2570 |
|
|
|
2571 |
|
|
Version 1.0.0 of the architecture is described in the present document.
|
2572 |
|
|
Version~1.0.1 is similar, but simplified to avoid the
|
2573 |
|
|
complications of pipelines and operating systems.
|
2574 |
|
|
Other versions may become necessary in the future.
|
2575 |
|
|
|
2576 |
|
|
@ The {\it register stack offset\/}~rO and {\it register stack
|
2577 |
|
|
pointer\/}~rS are especially interesting, because they are used to implement
|
2578 |
|
|
@^register stack@>
|
2579 |
|
|
@^rO@>
|
2580 |
|
|
@^rS@>
|
2581 |
|
|
\MMIX's register stack~$S[0]$, $S[1]$, $S[2]$,~\dots.
|
2582 |
|
|
|
2583 |
|
|
The operating system
|
2584 |
|
|
initializes a register stack by assigning a large area of virtual memory to
|
2585 |
|
|
each running process, beginning at an address like
|
2586 |
|
|
\Hex{6000000000000000}.
|
2587 |
|
|
If this starting address is~$\sigma$, stack entry $S[k]$ will go into
|
2588 |
|
|
the octabyte $\mm_8[\sigma+8k]$. Stack underflow will be detected because
|
2589 |
|
|
the process does not have permission to read from $\mm[\sigma-1]$.
|
2590 |
|
|
Stack overflow will be detected because something will give out---either
|
2591 |
|
|
the user's budget or the user's patience or the user's swap space---long before
|
2592 |
|
|
$2^{61}$~bytes of virtual memory are filled by a register stack.
|
2593 |
|
|
@^terabytes@>
|
2594 |
|
|
|
2595 |
|
|
The \MMIX\ hardware maintains the register stack by having two banks
|
2596 |
|
|
of 64-bit general-purpose registers, one for globals and one for locals.
|
2597 |
|
|
The global registers $\rm g[32]$, $\rm g[33]$, \dots, $\rm g[255]$ are used for
|
2598 |
|
|
register numbers that are $\ge\gg$ in \MMIX\ commands;
|
2599 |
|
|
recall that $G$~is always 32 or more. The local
|
2600 |
|
|
registers come from another array that contains $2^n$ registers for
|
2601 |
|
|
some~$n$ where $8\le n\le10$; for simplicity of exposition
|
2602 |
|
|
we will assume that there are exactly 512 local
|
2603 |
|
|
registers, but there may be only 256 or there may be 1024.
|
2604 |
|
|
|
2605 |
|
|
\def\l{{\rm l}}
|
2606 |
|
|
@^ring of local registers@>
|
2607 |
|
|
The local register slots l[0], l[1], \dots, l[511] act as a cyclic buffer with
|
2608 |
|
|
addresses that wrap around mod~512, so that $\l[512]=\l[0]$,
|
2609 |
|
|
$\l[513]=\l[1]$, etc. This buffer is divided into three parts by three
|
2610 |
|
|
pointers, which we will call $\alpha$, $\beta$, and $\gamma$.
|
2611 |
|
|
$$\epsfbox{mmix.1}$$
|
2612 |
|
|
Registers $\l[\alpha]$, $\l[\alpha+1]$, \dots,~$\l[\beta-1]$ are
|
2613 |
|
|
what program instructions currently call \$0, \$1, \dots,~$\$(\ll-1)$;
|
2614 |
|
|
registers $\l[\beta]$, $\l[\beta+1]$, \dots,~$\l[\gamma-1]$ are currently
|
2615 |
|
|
unused; and registers $\l[\gamma]$, $\l[\gamma+1]$, \dots,~$\l[\alpha-1]$
|
2616 |
|
|
contain items of the register stack that have been pushed down but not yet
|
2617 |
|
|
stored in memory. Special register~rS holds the virtual memory address where
|
2618 |
|
|
$\l[\gamma]$ will be stored, if necessary. Special register~rO holds the
|
2619 |
|
|
address where $\l[\alpha]$ will be stored; this always equals $8\tau$ plus
|
2620 |
|
|
the address of~$S[0]$. We can deduce the values of $\alpha$, $\beta$,
|
2621 |
|
|
and~$\gamma$ from the contents of rL, rO, and~rS, because
|
2622 |
|
|
$$\rm\alpha=(rO/8)\bmod512,\qquad \beta=(\alpha+rL)\bmod512,\qquad
|
2623 |
|
|
\hbox{and}\qquad \gamma=(rS/8)\bmod512.$$
|
2624 |
|
|
|
2625 |
|
|
To maintain this situation we need to make sure that the pointers $\alpha$,
|
2626 |
|
|
$\beta$, and $\gamma$ never move past each other. A~\.{PUSHJ} or
|
2627 |
|
|
\.{PUSHGO} operation simply
|
2628 |
|
|
advances $\alpha$ toward~$\beta$, so it is very simple. The first part of a
|
2629 |
|
|
\.{POP} operation, which moves $\beta$ toward~$\alpha$, is also very simple.
|
2630 |
|
|
But the next part of a~\.{POP} requires $\alpha$ to move downward, and
|
2631 |
|
|
memory accesses might be required. \MMIX\ will decrease rS by~8 (thereby
|
2632 |
|
|
decreasing $\gamma$ by~1) and set $\l[\gamma]\gets\mm_8[{\rm rS}]$,
|
2633 |
|
|
one or more times if necessary, to keep $\alpha$ from decreasing
|
2634 |
|
|
past~$\gamma$. Similarly, the operation of increasing~$L$ may cause \MMIX\ to
|
2635 |
|
|
set $\mm_8[{\rm rS}]\gets\l[\gamma]$ and increase rS by~8 (thereby increasing
|
2636 |
|
|
$\gamma$ by~1) one or more times, to keep $\beta$ from increasing
|
2637 |
|
|
past~$\gamma$. (Actually $\beta$ is never allowed to increase to the point
|
2638 |
|
|
where it becomes {\it equal\/} to $\gamma$.)
|
2639 |
|
|
If many registers need to be loaded or stored at once,
|
2640 |
|
|
these operations are interruptible.
|
2641 |
|
|
|
2642 |
|
|
[A somewhat similar scheme was introduced by David R. Ditzel and H.~R.
|
2643 |
|
|
McLellan in {\sl SIGPLAN Notices\/ \bf17},\thinspace4 (April 1982), 48--56,
|
2644 |
|
|
and incorporated in the so-called {\mc CRISP} architecture developed at
|
2645 |
|
|
AT{\AM}T Bell Labs. An even more similar scheme was adopted in the late 1980s
|
2646 |
|
|
@^AT{\AM}T Bell Laboratories@>
|
2647 |
|
|
@^Advanced Micro Devices@>
|
2648 |
|
|
by Advanced Micro Devices, in the processors of their Am29000 series---a
|
2649 |
|
|
family of computers whose instructions have essentially the
|
2650 |
|
|
format `OP~X~Y~Z' used by~\MMIX.]
|
2651 |
|
|
@^Ditzel, David Roger@>
|
2652 |
|
|
@^McClellan, Hubert Rae, Jr.@>
|
2653 |
|
|
|
2654 |
|
|
Limited versions of\/ \MMIX, having fewer registers, can also be envisioned. For
|
2655 |
|
|
example, we might have only 32 local registers $\l[0]$, $\l[1]$,
|
2656 |
|
|
\dots,~$\l[31]$ and only 32 global registers $\rm g[224]$, $\rm g[225]$,
|
2657 |
|
|
\dots,~$\rm g[255]$. Such a machine could run any \MMIX\ program that
|
2658 |
|
|
maintains the inequalities $\ll<32$ and $\gg\ge224$.
|
2659 |
|
|
|
2660 |
|
|
@ Access to \MMIX's special registers is obtained via the \.{GET} and
|
2661 |
|
|
\.{PUT} commands.
|
2662 |
|
|
@^special registers@>
|
2663 |
|
|
@^rL@>
|
2664 |
|
|
@^rQ@>
|
2665 |
|
|
|
2666 |
|
|
\bull\
|
2667 |
|
|
@.GET@>
|
2668 |
|
|
Register X is set to the contents of the special register identified by
|
2669 |
|
|
its code number~Z, using the code numbers listed earlier.
|
2670 |
|
|
An illegal instruction interrupt occurs if $\zz\ge32$.
|
2671 |
|
|
|
2672 |
|
|
Every special register is readable; \MMIX\ does not keep secrets from
|
2673 |
|
|
an inquisitive user. But of course only the operating system is allowed
|
2674 |
|
|
@^operating system@>
|
2675 |
|
|
to change registers like rK and~rQ (the interrupt mask and request
|
2676 |
|
|
registers). And not even the operating system is allowed to change~rC
|
2677 |
|
|
(the cycle counter) or rN~(the serial number) or the stack pointers
|
2678 |
|
|
rO~and~rS.
|
2679 |
|
|
|
2680 |
|
|
\bull\
|
2681 |
|
|
@.PUT@>
|
2682 |
|
|
the Y field must be zero.\>
|
2683 |
|
|
The special register identified by~X is set to
|
2684 |
|
|
the contents of register Z or to the unsigned byte~Z itself,
|
2685 |
|
|
if permissible. Some changes are, however, impermissible:
|
2686 |
|
|
Bits of rA that are always zero must remain zero; the leading seven bytes
|
2687 |
|
|
of rG and rL must remain zero, and rL must not exceed~rG;
|
2688 |
|
|
special registers 8--11 (namely rC, rN, rO, and~rS) must not change;
|
2689 |
|
|
special registers 12--18 (namely
|
2690 |
|
|
rI, rK, rQ, rT, rU, rV, and~rTT) can be changed only if the privilege
|
2691 |
|
|
bit of rK is zero;
|
2692 |
|
|
and certain bits of~rQ (depending on available hardware) might not
|
2693 |
|
|
allow software to change them from 0 to~1. Moreover, any bits of~rQ that have
|
2694 |
|
|
changed from 0 to~1 since the most recent \
|
2695 |
|
|
will remain~1 after \.{PUT}~\.{rQ,z}.
|
2696 |
|
|
The \.{PUT} command will not increase~rL; it sets rL to the minimum
|
2697 |
|
|
of the current value and the new value. (A~program should say
|
2698 |
|
|
\
|
2699 |
|
|
|
2700 |
|
|
Impermissible \.{PUT} commands cause an illegal instruction interrupt,
|
2701 |
|
|
or (in the case of rI, rK, rQ, rT, rU, rV, and~rTT) a privileged
|
2702 |
|
|
operation interrupt.
|
2703 |
|
|
@^illegal instructions@>
|
2704 |
|
|
@^privileged operations@>
|
2705 |
|
|
|
2706 |
|
|
\bull\
|
2707 |
|
|
@.SAVE@>
|
2708 |
|
|
@^register stack@>
|
2709 |
|
|
@^ring of local registers@>
|
2710 |
|
|
@^rO@>
|
2711 |
|
|
@^rS@>
|
2712 |
|
|
\
|
2713 |
|
|
so must the Z field of~\.{SAVE}, the X~field of \.{UNSAVE}.\>
|
2714 |
|
|
@.UNSAVE@>
|
2715 |
|
|
The \.{SAVE} instruction stores all registers and special registers
|
2716 |
|
|
that might affect the computation of the currently running process.
|
2717 |
|
|
First the current local registers \$0, \$1, \dots,~$\$(\ll-1)$ are
|
2718 |
|
|
pushed down as in \.{PUSHGO}~\.{\$255}, and $L$~is set to zero.
|
2719 |
|
|
Then the current global
|
2720 |
|
|
registers $\$\gg$, $\$(\gg+1)$, \dots,~\$255 are placed above them
|
2721 |
|
|
in the register stack; finally
|
2722 |
|
|
rB, rD, rE, rH, rJ, rM, rR, rP, rW, rX, rY, and~rZ
|
2723 |
|
|
are placed at the very top, followed by registers rG and~rA packed
|
2724 |
|
|
into eight bytes:
|
2725 |
|
|
$$\beginword
|
2726 |
|
|
&\field88&&\field{24}{24}&&\field{32}{32}\cr
|
2727 |
|
|
\noalign{\hrule}
|
2728 |
|
|
\\&rG&\\&0&\\&rA&\\\cr
|
2729 |
|
|
\noalign{\hrule}\endword$$
|
2730 |
|
|
The address of the topmost octabyte is then placed in register~X, which
|
2731 |
|
|
must be a global register. (This instruction is interruptible. If an
|
2732 |
|
|
interrupt occurs while the registers are being saved, we will have
|
2733 |
|
|
$\alpha=\beta=\gamma$ in the ring of local registers;
|
2734 |
|
|
thus rO will equal~rS and rL will be zero. The interrupt handler
|
2735 |
|
|
essentially has a new register stack, starting on top of the partially
|
2736 |
|
|
saved context.) Immediately after a \.{SAVE} the values of rO and~rS
|
2737 |
|
|
are equal to the location of the first byte following the stack
|
2738 |
|
|
just saved. The current register stack is effectively empty at this
|
2739 |
|
|
point; thus one shouldn't do a \.{POP} until this context
|
2740 |
|
|
or some other context has been unsaved.
|
2741 |
|
|
@^rO@>
|
2742 |
|
|
@^rS@>
|
2743 |
|
|
|
2744 |
|
|
The \.{UNSAVE} instruction goes the other way, restoring all the
|
2745 |
|
|
registers when given an address in register~Z that was returned
|
2746 |
|
|
by a previous \.{SAVE}. Immediately after an \.{UNSAVE} the values of
|
2747 |
|
|
rO and~rS will be equal. Like \.{SAVE}, this instruction is interruptible.
|
2748 |
|
|
|
2749 |
|
|
The operating system uses \.{SAVE} and \.{UNSAVE}
|
2750 |
|
|
to switch context between different processes.
|
2751 |
|
|
It can also use \.{UNSAVE} to
|
2752 |
|
|
establish suitable initial values of rO and~rS\null.
|
2753 |
|
|
But a user program that knows what it is doing can in fact allocate its own
|
2754 |
|
|
register stack or stacks and do its own process switching.
|
2755 |
|
|
|
2756 |
|
|
Caution: \.{UNSAVE} is destructive, in the sense that a program can't reliably
|
2757 |
|
|
\.{UNSAVE} twice from the same saved context. Once an
|
2758 |
|
|
\.{UNSAVE} has been done,
|
2759 |
|
|
further operations are likely to change the memory
|
2760 |
|
|
record of what was saved. Moreover, an interrupt during the middle
|
2761 |
|
|
of an \.{UNSAVE} may have already clobbered some of the data in memory before
|
2762 |
|
|
the \.{UNSAVE} has completely finished, although the data will appear
|
2763 |
|
|
properly in all registers.
|
2764 |
|
|
|
2765 |
|
|
@* Virtual and physical addresses.
|
2766 |
|
|
Virtual 64-bit addresses are converted to physical addresses in a manner
|
2767 |
|
|
@^virtual addresses@>
|
2768 |
|
|
@^physical addresses@>
|
2769 |
|
|
governed by the special {\it virtual translation register\/}~rV. Thus
|
2770 |
|
|
@^rV@>
|
2771 |
|
|
$\rm M[A]$ really refers to $\rm m[\phi(A)]$, where m~is the physical
|
2772 |
|
|
memory array and $\phi(A)$
|
2773 |
|
|
is determined by the physical mapping function~$\phi$. The details of
|
2774 |
|
|
this conversion are rather technical and of interest mainly to the operating
|
2775 |
|
|
system, but two simple rules are important to ordinary users:
|
2776 |
|
|
@^operating system@>
|
2777 |
|
|
|
2778 |
|
|
\bull Negative addresses are mapped directly to physical addresses, by simply
|
2779 |
|
|
@^negative locations@>
|
2780 |
|
|
suppressing the sign bit:
|
2781 |
|
|
$$\phi(A)=A+2^{63}=A\land\Hex{7fffffffffffffff},\qquad
|
2782 |
|
|
\hbox{if $A<0$.}$$
|
2783 |
|
|
{\it All accesses to negative addresses are privileged}, for use by the
|
2784 |
|
|
operating system only.
|
2785 |
|
|
@^privileged operations@>
|
2786 |
|
|
(Thus, for example, the trap addresses in~rT and~rTT should be negative,
|
2787 |
|
|
because they are addresses inside the operating system.) Moreover, all physical
|
2788 |
|
|
addresses $\ge2^{48}$ are intended for use by memory-mapped I/O devices;
|
2789 |
|
|
values read from or written to such locations are never placed in a cache.
|
2790 |
|
|
@^I/O@>
|
2791 |
|
|
@^input/output@>
|
2792 |
|
|
@^memory-mapped input/output@>
|
2793 |
|
|
|
2794 |
|
|
\bull Nonnegative addresses belong to four {\it segments}, depending on
|
2795 |
|
|
@^segments@>
|
2796 |
|
|
whether the three leading bits are 000, 001, 010, or 011. These $2^{61}$-byte
|
2797 |
|
|
segments are traditionally used for a program's text, data, dynamic
|
2798 |
|
|
memory, and register stack, respectively, but such conventions are
|
2799 |
|
|
not mandatory. There are four mappings $\phi_0$, $\phi_1$, $\phi_2$,
|
2800 |
|
|
and~$\phi_3$ of 61-bit addresses into 48-bit physical memory space, one for
|
2801 |
|
|
each segment:
|
2802 |
|
|
$$\phi(A)=\phi_{\lfloor A/2^{61}\rfloor}(A\bmod2^{61}),\qquad
|
2803 |
|
|
\hbox{if $0\le A<2^{63}$.}$$
|
2804 |
|
|
In general, the machine is able to access smaller addresses of a segment more
|
2805 |
|
|
efficiently than larger addresses. Thus a programmer should let each segment
|
2806 |
|
|
grow upward from zero, trying to keep any of the 61-bit addresses from
|
2807 |
|
|
becoming larger than necessary, although arbitrary addresses are legal.
|
2808 |
|
|
|
2809 |
|
|
@ Now it's time for the technical details of virtual address translation.
|
2810 |
|
|
@^segments@>
|
2811 |
|
|
@^virtual addresses@>
|
2812 |
|
|
@^physical addresses@>
|
2813 |
|
|
@^rV@>
|
2814 |
|
|
The mappings $\phi_0$, $\phi_1$, $\phi_2$, and~$\phi_3$ are defined
|
2815 |
|
|
by the following rules.
|
2816 |
|
|
\smallskip
|
2817 |
|
|
|
2818 |
|
|
(1) The first two bytes of rV are four nybbles called $b_1$, $b_2$, $b_3$,
|
2819 |
|
|
$b_4$; we also define $b_0=0$. Segment~$i$ has at most $1024^{\,b_{i+1}-b_i}$
|
2820 |
|
|
pages. In particular, segment~$i$ must have at most one page when
|
2821 |
|
|
$b_i=b_{i+1}$, and it must be entirely empty if $b_i>b_{i+1}$.
|
2822 |
|
|
|
2823 |
|
|
(2) The next byte of rV, $s$, specifies the current {\it page size},
|
2824 |
|
|
which is $2^s$ bytes. We must have $s\ge13$ (hence at least 8192~bytes
|
2825 |
|
|
per page). Values of~$s$ larger than, say, 20 or~so are of use only in rather
|
2826 |
|
|
large programs that will reside in main memory for long periods of time,
|
2827 |
|
|
because memory protection and swapping are applied to entire pages.
|
2828 |
|
|
The maximum legal value of~$s$ is~48.
|
2829 |
|
|
|
2830 |
|
|
(3) The remaining five bytes of rV are a 27-bit {\it root location\/}~$r$,
|
2831 |
|
|
a 10-bit {\it address space number\/}~$n$, and a 3-bit {\it function
|
2832 |
|
|
field\/}~$f$:
|
2833 |
|
|
$$\centerline{$\hbox{rV}=\beginword
|
2834 |
|
|
&\field44&&\field44&&\field44&&\field44&&\field88&&
|
2835 |
|
|
\field{27}{27}&&\field{10}{10}&&\field33\cr
|
2836 |
|
|
\noalign{\hrule}
|
2837 |
|
|
\\&$b_1$&\\&$b_2$&\\&$b_3$&\\&$b_4$&\\&$s$&\\&$r$&\\&$n$&\\&$f$&\\\cr
|
2838 |
|
|
\noalign{\hrule}\endword$}$$
|
2839 |
|
|
Normally $f=0$; if $f=1$, virtual address translation will be done by
|
2840 |
|
|
software instead of hardware, and the $b_1$, $b_2$, $b_3$, $b_4$,
|
2841 |
|
|
and~$r$ fields of~rV will be ignored by the hardware.
|
2842 |
|
|
(Values of $f>1$ are reserved for possible future use; if $f>1$
|
2843 |
|
|
when \MMIX\ tries to translate an address, a memory-protection
|
2844 |
|
|
failure will occur.)
|
2845 |
|
|
@^illegal instructions@>
|
2846 |
|
|
|
2847 |
|
|
(4) Each page has an 8-byte {\it page table entry\/} (PTE), which looks
|
2848 |
|
|
@^page table entry@>
|
2849 |
|
|
@^PTE@>
|
2850 |
|
|
like this:
|
2851 |
|
|
$$\centerline{$\hbox{PTE}=\beginword
|
2852 |
|
|
&\field{16}{16}&&\field{32}{48-s}&&\field3{s-13}&&\field{10}{10}&&
|
2853 |
|
|
\field33\cr
|
2854 |
|
|
\noalign{\hrule}
|
2855 |
|
|
\\&$x$&\\&$a$&\\&$y$&\\&$n$&\\&$p$&\\\cr
|
2856 |
|
|
\noalign{\hrule}\endword$}$$
|
2857 |
|
|
Here $x$ and $y$ are ignored (thus they are usable for any purpose by the
|
2858 |
|
|
operating
|
2859 |
|
|
system); $2^s a$~is the physical address of byte~0 on the page; and $n$~is
|
2860 |
|
|
the address space number (which must match the number in~rV). The final three
|
2861 |
|
|
bits are the {\it protection bits\/} $p_r\,p_w\,p_x$; the user needs
|
2862 |
|
|
$p_r=1$ to load from this page, $p_w=1$ to store on this page, and
|
2863 |
|
|
$p_x=1$ to execute instructions on this page. If $n$~fails to
|
2864 |
|
|
match the number in~rV, or if the appropriate protection bit is zero,
|
2865 |
|
|
a memory-protection fault occurs.
|
2866 |
|
|
@^protection fault@>
|
2867 |
|
|
|
2868 |
|
|
Page table entries should be writable only by the operating system.
|
2869 |
|
|
The 16 ignored bits of~$x$ imply that physical memory size is limited
|
2870 |
|
|
to $2^{48}$ bytes (namely 256 large terabytes); that should be enough capacity
|
2871 |
|
|
for awhile, if not for the entire new millennium.
|
2872 |
|
|
@^terabytes@>
|
2873 |
|
|
|
2874 |
|
|
(5) A given 61-bit address $A$ belongs to page $\lfloor A/2^s\rfloor$ of
|
2875 |
|
|
its segment, and
|
2876 |
|
|
$$\phi_i(A)=2^s\,a+(A\bmod2^s)$$
|
2877 |
|
|
if $a$ is the address in the PTE for page $\lfloor A/2^s\rfloor$ of
|
2878 |
|
|
segment~$i$.
|
2879 |
|
|
|
2880 |
|
|
(6) Suppose $\lfloor A/2^s\rfloor=(a_4a_3a_2a_1a_0)_{1024}$ in the
|
2881 |
|
|
radix-1024 number system. In the common case $a_4=a_3=a_2=a_1=0$, the
|
2882 |
|
|
PTE is simply the octabyte ${\rm m}_8[2^{13}(r+b_i)+8a_0]$; this rule
|
2883 |
|
|
defines the mapping for the first 1024 pages. The next million or~so pages are
|
2884 |
|
|
accessed through an auxiliary {\it page table pointer}
|
2885 |
|
|
@^page table pointer@>
|
2886 |
|
|
@^PTP@>
|
2887 |
|
|
$$\centerline{$\hbox{PTP}=\beginword
|
2888 |
|
|
&\field11&&\field{50}{50}&&\field{10}{10}&&\field33\cr
|
2889 |
|
|
\noalign{\hrule}
|
2890 |
|
|
\\&1&\\&$c$&\\&$n$&\\&$q$&\\\cr
|
2891 |
|
|
\noalign{\hrule}\endword$}$$
|
2892 |
|
|
in ${\rm m}_8[2^{13}(r+b_i+1)+8a_1]$; here the sign must be~1 and the
|
2893 |
|
|
$n$-field must match~rV, but the $q$~bits are ignored. The desired PTE for
|
2894 |
|
|
page $(a_1a_0)_{1024}$ is then in ${\rm m}_8[2^{13}c+8a_0]$. The next billion
|
2895 |
|
|
or so pages, namely the pages $(a_2a_1a_0)_{1024}$ with $a_2\ne0$,
|
2896 |
|
|
are accessed similarly, through an auxiliary PTP at level~two; and
|
2897 |
|
|
so on.
|
2898 |
|
|
|
2899 |
|
|
Notice that if $b_3=b_4$, there is just one page in segment~3, and its PTE
|
2900 |
|
|
appears all alone in physical location $2^{13}(r+b_3)$.
|
2901 |
|
|
Otherwise the PTEs appear in 1024-octabyte blocks. We usually
|
2902 |
|
|
have $0
|
2903 |
|
|
worthy of mention: In this special case there is only one page, and the
|
2904 |
|
|
segment bits of a virtual address are ignored; the other $61-s$ bits of each
|
2905 |
|
|
virtual address must be zero.
|
2906 |
|
|
|
2907 |
|
|
If $s=13$, $b_1=3$, $b_2=2$, $b_3=1$, and $b_4=0$, there are at most
|
2908 |
|
|
$2^{30}$ pages of 8192 bytes each, all belonging to segment~0. This is
|
2909 |
|
|
essentially the virtual memory setup in the Alpha~21064 computers with
|
2910 |
|
|
{\mc DIGITAL~UNIX}$^{\rm\,TM}$.
|
2911 |
|
|
@^Alpha computers@>
|
2912 |
|
|
|
2913 |
|
|
I know these rules look extremely complicated, and I sincerely wish I could
|
2914 |
|
|
have found an alternative that would be both simple and efficient in practice.
|
2915 |
|
|
I tried various schemes based on hashing, but came to the conclusion that
|
2916 |
|
|
``trie'' methods such as those described here are better for this application.
|
2917 |
|
|
Indeed, the page tables in most contemporary computers are based on very
|
2918 |
|
|
similar ideas, but with significantly smaller virtual addresses and without
|
2919 |
|
|
the shortcut for small page numbers. I tried also to find formats for rV
|
2920 |
|
|
and the page tables that would match byte boundaries in a more friendly way,
|
2921 |
|
|
but the corresponding page sizes did not work well. Fortunately these grungy
|
2922 |
|
|
details are almost always completely hidden from ordinary users.
|
2923 |
|
|
|
2924 |
|
|
@ Of course \MMIX\ can't afford to perform a lengthy calculation of physical
|
2925 |
|
|
addresses every time it accesses memory. The machine therefore maintains a
|
2926 |
|
|
{\it translation cache\/} (TC),
|
2927 |
|
|
@^translation caches@>
|
2928 |
|
|
@^TC@>
|
2929 |
|
|
which contains the translations of recently
|
2930 |
|
|
accessed pages. (In fact, there usually are two such caches,
|
2931 |
|
|
one for instructions
|
2932 |
|
|
and one for data.) A~TC holds a set of 64-bit translation keys
|
2933 |
|
|
$$\beginword
|
2934 |
|
|
&\field{1.2}1&&\field22&&\field{44.8}{61-s}&&\field3{s-13}&&\field{10}{10}&&
|
2935 |
|
|
\field33\cr
|
2936 |
|
|
\noalign{\hrule}
|
2937 |
|
|
\\&0&\\&$i$&\\&$v$&\\&0&\\&$n$&\\&0&\\\cr
|
2938 |
|
|
\noalign{\hrule}\endword$$
|
2939 |
|
|
associated with 38-bit translations
|
2940 |
|
|
$$\beginword
|
2941 |
|
|
&\field{32}{48-s}&&\field3{s-13}&&\field33\cr
|
2942 |
|
|
\noalign{\hrule}
|
2943 |
|
|
\\&$a$&\\&0&\\&$p$&\\\cr
|
2944 |
|
|
\noalign{\hrule}\endword$$
|
2945 |
|
|
representing the relevant parts of the PTE for page $v$ of segment $i$.
|
2946 |
|
|
Different processes typically have different values of~$n$, and possibly also
|
2947 |
|
|
different values of~$s$. The operating system needs a way to keep such caches
|
2948 |
|
|
up to date when pages are being allocated, moved, swapped, or recycled.
|
2949 |
|
|
The operating system also likes to know which pages have been recently
|
2950 |
|
|
used. The \.{LDVTS} instructions facilitate such operations:
|
2951 |
|
|
@^protection bits@>
|
2952 |
|
|
@^permission bits@>
|
2953 |
|
|
|
2954 |
|
|
\bull\
|
2955 |
|
|
@.LDVTS@>
|
2956 |
|
|
The sum $\rY+\rZ$ or $\rY+\zz$ should have the form of
|
2957 |
|
|
a translation cache key as above,
|
2958 |
|
|
except that the rightmost three bits need not be zero.
|
2959 |
|
|
If this key is present in a TC,
|
2960 |
|
|
the rightmost three bits replace the current protection code~$p$;
|
2961 |
|
|
however, if $p$ is thereby set to zero, the key is removed from
|
2962 |
|
|
the TC. Register~X is set to 0 if the key was not present
|
2963 |
|
|
in any translation cache, or to 1 if the key was present in the TC
|
2964 |
|
|
for instructions, or to 2 if the key was present in the TC for data,
|
2965 |
|
|
or to~3 if the key was present in both. This instruction is for the
|
2966 |
|
|
operating system only.
|
2967 |
|
|
|
2968 |
|
|
@ We mentioned earlier that
|
2969 |
|
|
cheap versions of\/ \MMIX\ might calculate the physical addresses with
|
2970 |
|
|
@^emulation@>
|
2971 |
|
|
@^rV@>
|
2972 |
|
|
software instead of hardware, using forced traps when the operating
|
2973 |
|
|
system needs to do page table calculations.
|
2974 |
|
|
@^operating system@>
|
2975 |
|
|
Here is some code that could be used for
|
2976 |
|
|
such purposes; it defines the translation process precisely, given a
|
2977 |
|
|
nonnegative virtual
|
2978 |
|
|
address in register~rYY\null. First we must unpack the fields of~rV and
|
2979 |
|
|
@^virtual addresses@>
|
2980 |
|
|
@^physical addresses@>
|
2981 |
|
|
@^rV@>
|
2982 |
|
|
@^PTE@>
|
2983 |
|
|
@^PTP@>
|
2984 |
|
|
@^segments@>
|
2985 |
|
|
compute the relevant base addresses for PTEs and PTPs:
|
2986 |
|
|
$$\vbox{\halign{&\tt#\hfil\ \cr
|
2987 |
|
|
&GET &virt,rYY\cr
|
2988 |
|
|
&GET &\$7,rV &\% \$7=(virtual translation register)\cr
|
2989 |
|
|
&SRU &\$1,virt,61 &\% \$1=i (segment number of virtual address)\cr
|
2990 |
|
|
&SLU &\$1,\$1,2 \cr
|
2991 |
|
|
&NEG &\$1,52,\$1 &\% \$1=52-4i\cr
|
2992 |
|
|
&SRU &\$1,\$7,\$1 \cr
|
2993 |
|
|
&SLU &\$2,\$1,4 \cr
|
2994 |
|
|
&SETL &\$0,\#f000 \cr
|
2995 |
|
|
&AND &\$1,\$1,\$0 &\% \$1=b[i]<<12\cr
|
2996 |
|
|
&AND &\$2,\$2,\$0 &\% \$2=b[i+1]<<12\cr
|
2997 |
|
|
&SLU &\$3,\$7,24 \cr
|
2998 |
|
|
&SRU &\$3,\$3,37 \cr
|
2999 |
|
|
&SLU &\$3,\$3,13 &\% \$3=(r field of rV)\cr
|
3000 |
|
|
&ORH &\$3,\#8000 &\% make \$3 a physical address\cr
|
3001 |
|
|
&2ADDU &base,\$1,\$3 &\% base=address of first page table\cr
|
3002 |
|
|
&2ADDU &limit,\$2,\$3 &\% limit=address after last page table\cr
|
3003 |
|
|
&SRU &s,\$7,40 \cr
|
3004 |
|
|
&AND &s,s,\#ff &\% s=(s field of rV)\cr
|
3005 |
|
|
&CMP &\$0,s,13 \cr
|
3006 |
|
|
&BN &\$0,Fail &\% s must be 13 or more\cr
|
3007 |
|
|
&CMP &\$0,s,49 \cr
|
3008 |
|
|
&BNN &\$0,Fail &\% s must be 48 or less\cr
|
3009 |
|
|
&SETH &mask,\#8000 \cr
|
3010 |
|
|
&ORL &mask,\#1ff8&\% mask=(sign bit and n field)\cr
|
3011 |
|
|
&ORH &\$7,\#8000 &\% set sign bit for PTP validation below\cr
|
3012 |
|
|
&ANDNH &virt,\#e000 &\% zero out the segment number\cr
|
3013 |
|
|
&SRU &\$0,virt,s &\% \$0=a4a3a2a1a0 (page number of virt)\cr
|
3014 |
|
|
&ZSZ &\$1,\$0,1 &\% \$1=[page number is zero]\cr
|
3015 |
|
|
&ADD &limit,limit,\$1&\% increase limit if page number is zero\cr
|
3016 |
|
|
&SETL&\$6,\#3ff\cr
|
3017 |
|
|
}}$$
|
3018 |
|
|
The next part of the routine finds the ``digits'' of
|
3019 |
|
|
the page number $(a_4a_3a_2a_1a_0)_{1024}$, from right to left:
|
3020 |
|
|
$$
|
3021 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3022 |
|
|
&OR &\$5,base,0\cr
|
3023 |
|
|
&SRU &\$1,\$0,10\cr
|
3024 |
|
|
&PBZ &\$1,1F\cr
|
3025 |
|
|
&AND &\$0,\$0,\$6\cr
|
3026 |
|
|
&INCL &base,\#2000\cr}}
|
3027 |
|
|
\qquad
|
3028 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3029 |
|
|
&OR &\$5,base,0\cr
|
3030 |
|
|
&SRU &\$2,\$1,10\cr
|
3031 |
|
|
&PBZ &\$2,2F\cr
|
3032 |
|
|
&AND &\$1,\$1,\$6\cr
|
3033 |
|
|
&INCL &base,\#2000\cr}}
|
3034 |
|
|
\qquad
|
3035 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3036 |
|
|
&OR &\$5,base,0\cr
|
3037 |
|
|
&SRU &\$3,\$2,10\cr
|
3038 |
|
|
&PBZ &\$3,3F\cr
|
3039 |
|
|
&AND &\$2,\$2,\$6\cr
|
3040 |
|
|
&INCL &base,\#2000\cr}}
|
3041 |
|
|
\qquad
|
3042 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3043 |
|
|
&OR &\$5,base,0\cr
|
3044 |
|
|
&SRU &\$4,\$3,10\cr
|
3045 |
|
|
&PBZ &\$4,4F\cr
|
3046 |
|
|
&AND &\$3,\$3,\$6\cr
|
3047 |
|
|
&INCL &base,\#2000\cr}}
|
3048 |
|
|
$$
|
3049 |
|
|
Then the process cascades back through PTPs.
|
3050 |
|
|
$$
|
3051 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3052 |
|
|
&OR &\$5,base,0\cr
|
3053 |
|
|
&8ADDU&\$6,\$4,base\cr
|
3054 |
|
|
&LDO &base,\$6,0\cr
|
3055 |
|
|
&XOR &\$6,base,\$7\cr
|
3056 |
|
|
&AND &\$6,\$6,mask\cr
|
3057 |
|
|
&BNZ &\$6,Fail\cr}}
|
3058 |
|
|
\quad
|
3059 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3060 |
|
|
&ANDNL&base,\#1fff\cr
|
3061 |
|
|
4H&8ADDU &\$6,\$3,base\cr
|
3062 |
|
|
&LDO &base,\$6,0\cr
|
3063 |
|
|
&XOR &\$6,base,\$7\cr
|
3064 |
|
|
&AND &\$6,\$6,mask\cr
|
3065 |
|
|
&BNZ &\$6,Fail\cr}}
|
3066 |
|
|
\quad
|
3067 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3068 |
|
|
&ANDNL&base,\#1fff\cr
|
3069 |
|
|
3H&8ADDU &\$6,\$2,base\cr
|
3070 |
|
|
&LDO &base,\$6,0\cr
|
3071 |
|
|
&XOR &\$6,base,\$7\cr
|
3072 |
|
|
&AND &\$6,\$6,mask\cr
|
3073 |
|
|
&BNZ &\$6,Fail\cr}}
|
3074 |
|
|
\quad
|
3075 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3076 |
|
|
&ANDNL&base,\#1fff\cr
|
3077 |
|
|
2H&8ADDU &\$6,\$1,base\cr
|
3078 |
|
|
&LDO &base,\$6,0\cr
|
3079 |
|
|
&XOR &\$6,base,\$7\cr
|
3080 |
|
|
&AND &\$6,\$6,mask\cr
|
3081 |
|
|
&BNZ &\$6,Fail\cr}}
|
3082 |
|
|
$$
|
3083 |
|
|
Finally we obtain the PTE and communicate it to the machine.
|
3084 |
|
|
If errors have been detected, we set the translation to zero; actually
|
3085 |
|
|
any translation with permission bits zero would have the same effect.
|
3086 |
|
|
$$\chardef\_=`\_
|
3087 |
|
|
\vcenter{\halign{&\tt#\hfil\ \cr
|
3088 |
|
|
&ANDNL &base,\#1fff &\% remove low 13 bits of PTP\cr
|
3089 |
|
|
1H &8ADDU &\$6,\$0,base \cr
|
3090 |
|
|
&LDO &base,\$6,0 &\% base=PTE\cr
|
3091 |
|
|
&XOR &\$6,base,\$7\cr
|
3092 |
|
|
&ANDN&\$6,\$6,\#7\cr
|
3093 |
|
|
&SLU &\$6,\$6,51\cr
|
3094 |
|
|
&BNZ &\$6,Fail &\% branch if n doesn't match\cr
|
3095 |
|
|
&CMP &\$6,\$5,limit \cr
|
3096 |
|
|
&BN &\$6,Ready &\% did we run off the end of the page table?\cr
|
3097 |
|
|
Fail&SETL &base,0 &\% errors lead to PTE of zero\cr
|
3098 |
|
|
Ready&PUT&rZZ,base\cr
|
3099 |
|
|
&LDO&\$255,IntMask &\% load the desired setting of rK\cr
|
3100 |
|
|
&RESUME&1 &\% now the machine will digest the translation\cr}}$$
|
3101 |
|
|
All loads and stores in this program deal with negative virtual addresses.
|
3102 |
|
|
This effectively shuts off memory mapping and makes the page tables
|
3103 |
|
|
inaccessible to the user.\looseness=-1
|
3104 |
|
|
|
3105 |
|
|
The program assumes that the ropcode in rXX is 3 (which it is when
|
3106 |
|
|
a forced trap is triggered by the need for virtual translation).
|
3107 |
|
|
@^ropcodes@>
|
3108 |
|
|
@^translation caches@>
|
3109 |
|
|
|
3110 |
|
|
The translation from virtual pages to physical pages need not actually
|
3111 |
|
|
follow the rules for PTPs and PTEs; any other mapping could be
|
3112 |
|
|
substituted by operating systems with special needs. But people usually
|
3113 |
|
|
want compatibility between different implementations whenever
|
3114 |
|
|
possible. The only parts of~rV that \MMIX\ really needs are the $s$~field,
|
3115 |
|
|
which defines page sizes, and the $n$~field, which keeps TC entries
|
3116 |
|
|
of one process from being confused with the TC entries of another.
|
3117 |
|
|
|
3118 |
|
|
@* The complete instruction set. We have now described all of\/ \MMIX's
|
3119 |
|
|
special registers---except one: The special
|
3120 |
|
|
{\it failure location register\/}~rF is set
|
3121 |
|
|
@^rF@>
|
3122 |
|
|
to a physical memory address when a parity error or other memory
|
3123 |
|
|
fault occurs. (The instruction leading to this error will probably be
|
3124 |
|
|
long gone before such a fault is detected; for example, the machine might
|
3125 |
|
|
be trying to write old data from a cache in order to make room for
|
3126 |
|
|
new data. Thus there is generally no connection between the current virtual
|
3127 |
|
|
program location~rW and the physical location of a memory error. But knowledge
|
3128 |
|
|
of the latter location can still be useful for hardware repair, or when
|
3129 |
|
|
an operating system is booting up.)
|
3130 |
|
|
|
3131 |
|
|
@ One additional instruction proves to be useful.
|
3132 |
|
|
|
3133 |
|
|
\bull\
|
3134 |
|
|
This command lubricates the disk drives, fans, magnetic tape drives,
|
3135 |
|
|
laser printers, scanners, and any other mechanical equipment hooked
|
3136 |
|
|
up to \MMIX, if necessary. Fields X, Y, and~Z are ignored.
|
3137 |
|
|
@.SWYM@>
|
3138 |
|
|
|
3139 |
|
|
The \.{SWYM} command was originally included in \MMIX's repertoire because
|
3140 |
|
|
machines occasionally need grease to keep in shape, just as
|
3141 |
|
|
human beings occasionally need to swim or do some other kind of exercise
|
3142 |
|
|
in order to maintain good muscle tone. But in fact, \.{SWYM} has turned out to
|
3143 |
|
|
be a ``no-op,'' an instruction that does nothing at all; the
|
3144 |
|
|
@^no-op@>
|
3145 |
|
|
hypothetical manufacturers of our hypothetical machine have pointed out that
|
3146 |
|
|
modern computer equipment is already well oiled and sealed for permanent use.
|
3147 |
|
|
Even so, a no-op instruction provides a good way for software to
|
3148 |
|
|
send signals to the hardware, for such things as scheduling the way
|
3149 |
|
|
instructions are issued on superscalar superpipelined buzzword-compliant
|
3150 |
|
|
machines. Software programs can also use no-ops to communicate with other
|
3151 |
|
|
programs like symbolic debuggers.
|
3152 |
|
|
|
3153 |
|
|
When a forced trap computes the translation~rZZ of a virtual address~rYY,
|
3154 |
|
|
ropcode~3 of \
|
3155 |
|
|
the opcode in~rXX is \.{SWYM}; otherwise $\rm(rYY,rZZ)$ will be put
|
3156 |
|
|
into the TC for data.
|
3157 |
|
|
@^ropcodes@>
|
3158 |
|
|
@^translation caches@>
|
3159 |
|
|
@.RESUME@>
|
3160 |
|
|
@^virtual address emulation@>
|
3161 |
|
|
@^emulation@>
|
3162 |
|
|
|
3163 |
|
|
@ The running time of\/ \MMIX\ programs depends to a great extent
|
3164 |
|
|
on changes in technology.
|
3165 |
|
|
\MMIX\ is a mythical machine, but its mythical hardware exists in
|
3166 |
|
|
cheap, slow versions as well as in costly high-performance models.
|
3167 |
|
|
Details of running time usually depend on things like the amount of main memory
|
3168 |
|
|
available to implement virtual memory, as well as the sizes of
|
3169 |
|
|
caches and other buffers.
|
3170 |
|
|
|
3171 |
|
|
For practical purposes, the running time of an \MMIX\ program can often be
|
3172 |
|
|
estimated satisfactorily by assigning a fixed cost
|
3173 |
|
|
to each operation, based on the approximate running time that would be obtained
|
3174 |
|
|
on a high-performance machine with lots of main memory; so that's what
|
3175 |
|
|
we will do. Each operation will be assumed to take an integer number
|
3176 |
|
|
of~$\upsilon$,
|
3177 |
|
|
where $\upsilon$ (pronounced ``oops'') is a unit that represents the clock cycle time in
|
3178 |
|
|
@^mems@>
|
3179 |
|
|
@^oops@>
|
3180 |
|
|
a pipelined implementation. The value of $\upsilon$ will probably decrease
|
3181 |
|
|
from year to year, but I'll keep calling it $\upsilon$. The running
|
3182 |
|
|
time will also depend on the number of memory references or {\it mems\/}
|
3183 |
|
|
that a program uses;
|
3184 |
|
|
this is the number of load and store instructions. For example,
|
3185 |
|
|
each \.{LDO} (load octa) instruction will be assumed to cost
|
3186 |
|
|
$\mu+\upsilon$, where $\mu$ is the average cost of
|
3187 |
|
|
a memory reference. The total running time of a program might be reported as,
|
3188 |
|
|
say, $35\mu+1000\upsilon$, meaning 35 mems plus 1000~oops. The
|
3189 |
|
|
ratio $\mu/\upsilon$ will probably increase with time, so mem-counting
|
3190 |
|
|
is likely to become increasingly important. [See the discussion of mems in
|
3191 |
|
|
{\sl The Stanford GraphBase\/} (New York:\ ACM Press, 1994).]
|
3192 |
|
|
@^oops@>
|
3193 |
|
|
@^running times, approximate@>
|
3194 |
|
|
|
3195 |
|
|
Integer addition, subtraction, and comparison all take just $1\upsilon$.
|
3196 |
|
|
The same is true for \.{SET}, \.{GET}, \.{PUT}, \.{SYNC}, and \.{SWYM}
|
3197 |
|
|
instructions,
|
3198 |
|
|
as well as bitwise logical operations, shifts, relative jumps, comparisons,
|
3199 |
|
|
conditional assignments,
|
3200 |
|
|
and correctly predicted branches-not-taken or probable-branches-taken.
|
3201 |
|
|
Mispredicted branches or probable branches cost $3\upsilon$, and
|
3202 |
|
|
so do the \.{POP} and \.{GO} commands.
|
3203 |
|
|
Integer multiplication takes $10\upsilon$; integer division weighs in
|
3204 |
|
|
at~$60\upsilon$.
|
3205 |
|
|
@.MUL@>
|
3206 |
|
|
@.DIV@>
|
3207 |
|
|
@.TRAP@>
|
3208 |
|
|
@.TRIP@>
|
3209 |
|
|
@.RESUME@>
|
3210 |
|
|
\.{TRAP}, \.{TRIP}, and \.{RESUME} cost $5\upsilon$ each.
|
3211 |
|
|
|
3212 |
|
|
Most floating point operations have a nominal running time of $4\upsilon$,
|
3213 |
|
|
although the comparison operators \.{FCMP}, \.{FEQL}, and \.{FUN}
|
3214 |
|
|
need only $1\upsilon$.
|
3215 |
|
|
\.{FDIV} and \.{FSQRT} cost $40\upsilon$ each.
|
3216 |
|
|
@.FDIV@>
|
3217 |
|
|
@.FSQRT@>
|
3218 |
|
|
@.FREM@>
|
3219 |
|
|
The actual running time of floating point computations
|
3220 |
|
|
will vary depending on the operands; for example,
|
3221 |
|
|
the machine might need one extra $\upsilon$ for each subnormal input
|
3222 |
|
|
or output, and it might slow down greatly when trips are enabled.
|
3223 |
|
|
The \.{FREM} instruction might typically cost
|
3224 |
|
|
$(3+\delta)\upsilon$, where $\delta$ is the amount
|
3225 |
|
|
by which the exponent of the first operand exceeds the exponent of the
|
3226 |
|
|
second (or zero, if this amount is negative). A floating point
|
3227 |
|
|
operation might take only $1\upsilon$
|
3228 |
|
|
if at least one of its operands is zero, infinity, or~NaN\null.
|
3229 |
|
|
However, the fixed values stated at the beginning of this paragraph
|
3230 |
|
|
will be used for all seat-of-the-pants estimates of running time,
|
3231 |
|
|
since we want to keep the estimates as simple as possible
|
3232 |
|
|
without making them terribly out of line.
|
3233 |
|
|
|
3234 |
|
|
All load and store operations will be assumed to cost $\mu+\upsilon$,
|
3235 |
|
|
except that \.{CSWAP} costs $2\mu+2\upsilon$.
|
3236 |
|
|
(This applies to all OP~codes that begin with
|
3237 |
|
|
\Hex8, \Hex9, \Hex{A}, and \Hex{B}, except \Hex{98}--\Hex{9F} and
|
3238 |
|
|
\Hex{B8}--\Hex{BF}. It's best
|
3239 |
|
|
to keep the rules simple, because $\mu$ is just
|
3240 |
|
|
an approximate device for estimating average memory cost.)
|
3241 |
|
|
\.{SAVE} and \.{UNSAVE} are charged $20\mu+\upsilon$.
|
3242 |
|
|
@.CSWAP@>
|
3243 |
|
|
@.SAVE@>
|
3244 |
|
|
@.UNSAVE@>
|
3245 |
|
|
|
3246 |
|
|
Of course we must remember that these numbers are very rough.
|
3247 |
|
|
We have not included the cost of fetching instructions from memory.
|
3248 |
|
|
Furthermore, an integer multiplication or division might have an effective
|
3249 |
|
|
cost of only $1\upsilon$, if the result is not needed while other
|
3250 |
|
|
numbers are being calculated.
|
3251 |
|
|
Only a detailed simulation can be expected to be truly realistic.
|
3252 |
|
|
|
3253 |
|
|
@ If you think that \MMIX\ has plenty of operation codes, you are right;
|
3254 |
|
|
we have now described them all. Here is a chart that shows their
|
3255 |
|
|
numeric values:
|
3256 |
|
|
\def\oddline#1{\cr
|
3257 |
|
|
\noalign{\nointerlineskip}
|
3258 |
|
|
\omit&\setbox0=\hbox{\lower 2.3pt\hbox{\Hex{#1x}}}\smash{\box0}&
|
3259 |
|
|
\multispan{17}\hrulefill&
|
3260 |
|
|
\setbox0=\hbox{\lower 2.3pt\hbox{\Hex{#1x}}}\smash{\box0}\cr
|
3261 |
|
|
\noalign{\nointerlineskip}}
|
3262 |
|
|
\def\evenline{\cr\noalign{\hrule}}
|
3263 |
|
|
\def\chartstrut{\lower4.5pt\vbox to14pt{}}
|
3264 |
|
|
\def\beginchart{$$\tt\halign to\hsize\bgroup
|
3265 |
|
|
\chartstrut##\tabskip0pt plus10pt&
|
3266 |
|
|
&\hfil##\hfil&\vrule##\cr
|
3267 |
|
|
\lower6.5pt\null
|
3268 |
|
|
&&&\Hex0&&\Hex1&&\Hex2&&\Hex3&&\Hex4&&\Hex 5&&\Hex 6&&\Hex 7&\evenline}
|
3269 |
|
|
\def\endchart{\raise11.5pt\null&&&\Hex 8&&\Hex 9&&\Hex A&&\Hex B&
|
3270 |
|
|
&\Hex C&&\Hex D&&\Hex E&&\Hex F&\cr\egroup$$}
|
3271 |
|
|
\def\\#1[#2]{\multispan3\hfil#1[#2]\hfil}
|
3272 |
|
|
\beginchart
|
3273 |
|
|
&&&TRAP&&FCMP&&FUN&&FEQL&&FADD&&FIX&&FSUB&&FIXU&\oddline 0
|
3274 |
|
|
&&&\\FLOT[I]&&\\FLOTU[I]&&\\SFLOT[I]&&\\SFLOTU[I]&\evenline
|
3275 |
|
|
&&&FMUL&&FCMPE&&FUNE&&FEQLE&&FDIV&&FSQRT&&FREM&&FINT&\oddline 1
|
3276 |
|
|
&&&\\MUL[I]&&\\MULU[I]&&\\DIV[I]&&\\DIVU[I]&\evenline
|
3277 |
|
|
&&&\\ADD[I]&&\\ADDU[I]&&\\SUB[I]&&\\SUBU[I]&\oddline 2
|
3278 |
|
|
&&&\\2ADDU[I]&&\\4ADDU[I]&&\\8ADDU[I]&&\\16ADDU[I]&\evenline
|
3279 |
|
|
&&&\\CMP[I]&&\\CMPU[I]&&\\NEG[I]&&\\NEGU[I]&\oddline 3
|
3280 |
|
|
&&&\\SL[I]&&\\SLU[I]&&\\SR[I]&&\\SRU[I]&\evenline
|
3281 |
|
|
&&&\\BN[B]&&\\BZ[B]&&\\BP[B]&&\\BOD[B]&\oddline 4
|
3282 |
|
|
&&&\\BNN[B]&&\\BNZ[B]&&\\BNP[B]&&\\BEV[B]&\evenline
|
3283 |
|
|
&&&\\PBN[B]&&\\PBZ[B]&&\\PBP[B]&&\\PBOD[B]&\oddline 5
|
3284 |
|
|
&&&\\PBNN[B]&&\\PBNZ[B]&&\\PBNP[B]&&\\PBEV[B]&\evenline
|
3285 |
|
|
&&&\\CSN[I]&&\\CSZ[I]&&\\CSP[I]&&\\CSOD[I]&\oddline 6
|
3286 |
|
|
&&&\\CSNN[I]&&\\CSNZ[I]&&\\CSNP[I]&&\\CSEV[I]&\evenline
|
3287 |
|
|
&&&\\ZSN[I]&&\\ZSZ[I]&&\\ZSP[I]&&\\ZSOD[I]&\oddline 7
|
3288 |
|
|
&&&\\ZSNN[I]&&\\ZSNZ[I]&&\\ZSNP[I]&&\\ZSEV[I]&\evenline
|
3289 |
|
|
&&&\\LDB[I]&&\\LDBU[I]&&\\LDW[I]&&\\LDWU[I]&\oddline 8
|
3290 |
|
|
&&&\\LDT[I]&&\\LDTU[I]&&\\LDO[I]&&\\LDOU[I]&\evenline
|
3291 |
|
|
&&&\\LDSF[I]&&\\LDHT[I]&&\\CSWAP[I]&&\\LDUNC[I]&\oddline 9
|
3292 |
|
|
&&&\\LDVTS[I]&&\\PRELD[I]&&\\PREGO[I]&&\\GO[I]&\evenline
|
3293 |
|
|
&&&\\STB[I]&&\\STBU[I]&&\\STW[I]&&\\STWU[I]&\oddline A
|
3294 |
|
|
&&&\\STT[I]&&\\STTU[I]&&\\STO[I]&&\\STOU[I]&\evenline
|
3295 |
|
|
&&&\\STSF[I]&&\\STHT[I]&&\\STCO[I]&&\\STUNC[I]&\oddline B
|
3296 |
|
|
&&&\\SYNCD[I]&&\\PREST[I]&&\\SYNCID[I]&&\\PUSHGO[I]&\evenline
|
3297 |
|
|
&&&\\OR[I]&&\\ORN[I]&&\\NOR[I]&&\\XOR[I]&\oddline C
|
3298 |
|
|
&&&\\AND[I]&&\\ANDN[I]&&\\NAND[I]&&\\NXOR[I]&\evenline
|
3299 |
|
|
&&&\\BDIF[I]&&\\WDIF[I]&&\\TDIF[I]&&\\ODIF[I]&\oddline D
|
3300 |
|
|
&&&\\MUX[I]&&\\SADD[I]&&\\MOR[I]&&\\MXOR[I]&\evenline
|
3301 |
|
|
&&&SETH&&SETMH&&SETML&&SETL&&INCH&&INCMH&&INCML&&INCL&\oddline E
|
3302 |
|
|
&&&ORH&&ORMH&&ORML&&ORL&&ANDNH&&ANDNMH&&ANDNML&&ANDNL&\evenline
|
3303 |
|
|
&&&\\JMP[B]&&\\PUSHJ[B]&&\\GETA[B]&&\\PUT[I]&\oddline F
|
3304 |
|
|
&&&POP&&RESUME&&SAVE&&UNSAVE&&SYNC&&SWYM&&GET&&TRIP&\evenline
|
3305 |
|
|
\endchart
|
3306 |
|
|
The notation `\.{[I]}' indicates an operation with an ``immediate'' variant
|
3307 |
|
|
in which the Z field denotes a constant instead of a register number.
|
3308 |
|
|
Similarly, `\.{[B]}' indicates an operation with a ``backward'' variant
|
3309 |
|
|
in which a relative address has a negative displacement. Simulators and
|
3310 |
|
|
other programs that need to present \MMIX\ instructions in symbolic
|
3311 |
|
|
form will say that opcode \Hex{20} is \.{ADD} while opcode \Hex{21}
|
3312 |
|
|
is~\.{ADDI}; they will say that \Hex{F2} is \.{PUSHJ} while \Hex{F3}
|
3313 |
|
|
is~\.{PUSHJB}. But the \MMIX\ assembler uses only the forms \.{ADD}
|
3314 |
|
|
and \.{PUSHJ}, not \.{ADDI} or \.{PUSHJB}.
|
3315 |
|
|
|
3316 |
|
|
To read this chart, use the hexadecimal digits at the top, bottom,
|
3317 |
|
|
left, and right.
|
3318 |
|
|
For example, operation code \.{A9} in hexadecimal notation appears in
|
3319 |
|
|
the lower part of the \Hex{Ax} row and in the \Hex1/\Hex9 column; it is
|
3320 |
|
|
\.{STTI}, `store tetrabyte immediate'.
|
3321 |
|
|
@^OP codes, table@>
|
3322 |
|
|
|
3323 |
|
|
%The blank spaces in this chart are undefined opcodes,
|
3324 |
|
|
%reserved for future extension.
|
3325 |
|
|
%If an instruction with such
|
3326 |
|
|
%an opcode is encountered in a user program, it is considered to be
|
3327 |
|
|
%an illegal instruction (like, say, \.{FIX} with the \.Y field greater than~9),
|
3328 |
|
|
%@^illegal instructions@>
|
3329 |
|
|
%triggering an interrupt. Such instructions might become defined in
|
3330 |
|
|
%later versions of\/ \MMIX, at which time the operating system
|
3331 |
|
|
%could probably emulate the new instructions for backward compatibility.
|
3332 |
|
|
%@^version number@>
|
3333 |
|
|
|
3334 |
|
|
\def\\#1{\leavevmode\hbox{\it#1\/\kern.05em}} % italic type for identifiers
|
3335 |
|
|
|
3336 |
|
|
@*Index. (References are to section numbers, not page numbers.)
|