OpenCores
URL https://opencores.org/ocsvn/eco32/eco32/trunk

Subversion Repositories eco32

[/] [eco32/] [trunk/] [fp/] [implementation/] [mmix/] [mmix-doc.w] - Blame information for rev 15

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 15 hellwig
% This file is part of the MMIXware package (c) Donald E Knuth 1999
2
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
3
 
4
\def\title{MMIX}
5
\input epsf % input macros for dvips to include METAPOST illustrations
6
 
7
\def\MMIX{\.{MMIX}}
8
\def\NNIX{\hbox{\mc NNIX}}
9
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
10
\def\beginword{\vcenter\bgroup\let\\=\wordrule\halign\bgroup&\hfil##\hfil\cr}
11
\def\endword{\noalign{\vskip\baselineskip}\egroup\egroup
12
   \advance\belowdisplayskip-\baselineskip}
13
\def\wordrule{\vrule height 9.5pt depth 4.5pt width .4pt}
14
\newdimen\bitwd \bitwd=6.6pt
15
\def\field#1#2{\vrule depth 3pt width 0pt \hbox to#1\bitwd{\hss$#2$\hss}}
16
 
17
\def\XF{\\{XF}}\def\XM{\\{XM}}\def\XD{\\{XD}} % these not in \tt
18
\def\PC{\\{PC}}
19
\def\Jx{\.{J}} % conversely, I type J_ to get J in \tt
20
 
21
\def\s{{\rm s}}
22
\def\rX{{\rm\$X}} \def\rY{{\rm\$Y}} \def\rZ{{\rm\$Z}}
23
\def\mm{{\rm M}} \def\xx{{\rm X}} \def\yy{{\rm Y}} \def\zz{{\rm Z}}
24
%\def\ll{{\rm L}} \def\gg{{\rm G}}
25
\def\ll{L} \def\gg{G}
26
\def\?{\mkern-1mu}
27
 
28
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>
29
 
30
@* Introduction to MMIX.
31
Thirty-eight years have passed since the \.{MIX} computer was designed, and
32
computer architecture has been converging during those years
33
towards a rather different
34
style of machine. Therefore it is time to replace \.{MIX} with a new
35
computer that contains even less saturated fat than its predecessor.
36
 
37
Exercise 1.3.1--25 in the third edition of
38
{\sl Fundamental Algorithms\/} speaks of an extended
39
\.{MIX} called MixMaster, which is upward compatible with the old version.
40
But MixMaster itself is hopelessly obsolete; although it allows for
41
several gigabytes of memory, we can't even use it with {\mc ASCII} code to
42
get lowercase letters. And ouch, the standard subroutine calling convention
43
of \.{MIX} is irrevocably based on self-modifying code! Decimal arithmetic
44
and self-modifying code were popular in 1962, but they sure have disappeared
45
quickly as machines have gotten bigger and faster. A completely new
46
design is called for, based on the principles of RISC architecture as
47
expounded in {\sl Computer Architecture\/} by Hennessy and Patterson
48
(Morgan Kaufmann, 1996). % first ed was "Morgan Kaufman"! but now "nn" is legit
49
@^Hennessy, John LeRoy@>
50
@^Patterson, David Andrew@>
51
 
52
So here is \MMIX, a computer that will totally replace \.{MIX}
53
in the ``ultimate'' editions of {\sl The Art of Computer Programming},
54
Volumes 1--3, and in the first editions of the remaining volumes.
55
I~must confess that
56
I~can hardly wait to own a computer like this.
57
 
58
How do you pronounce \MMIX? I've been saying ``em-mix'' to myself,
59
because the first `\.M' represents a new millennium. Therefore I~use
60
the article ``an'' instead of~``a'' before the name \MMIX\
61
in English phrases like ``an \MMIX\ simulator.''
62
 
63
Incidentally, the {\sl Dictionary of American Regional English\/ \bf3} (1996)
64
lists ``mommix'' as a common dialect word used both as a noun and a verb;
65
to mommix something means to botch it, to bollix it. Only time will
66
tell whether I~have mommixed the definition of \MMIX.
67
 
68
@ The original \.{MIX} computer could be operated without an operating
69
system; you could bootstrap it with punched cards or paper tape and do
70
everything yourself. But nowadays such power is no longer in the hands of
71
ordinary users. The \MMIX\ hardware, like all other computing machines
72
made today, relies on an operating system to get jobs
73
started in their own address spaces and to provide I/O capabilities.
74
 
75
Whenever anybody has asked if I will be writing about operating systems,
76
my reply has always been ``Nix.'' Therefore the name of\/ \MMIX's operating
77
system, \NNIX, will come as no surprise.
78
@:NNIX}{\NNIX\ operating system@>
79
@^operating system@>
80
From time to time I will necessarily have to refer to things that \NNIX\ does
81
for its users, but I am unable to build \NNIX\ myself. Life is
82
too short. It would be wonderful if some expert in operating system design
83
became inspired to write a book that explains exactly how to construct a nice,
84
clean \NNIX\ kernel for an \MMIX\ chip.
85
 
86
@ I am deeply grateful to the many people who have helped me shape the behavior
87
of\/ \MMIX. In particular, John Hennessy and (especially) Dick Sites
88
have made significant contributions.
89
@^Hennessy, John LeRoy@>
90
@^Sites, Richard Lee@>
91
 
92
@ A programmer's introduction to \MMIX\ appears in ``Fascicle~1,'' a booklet
93
@^Fascicle 1@>
94
containing tutorial material that will ultimately appear in the fourth edition
95
of {\sl The Art of Computer Programming}.
96
The description in the following sections is rather different, because
97
we are concerned about a complete implementation, including all of the
98
features used by the operating system and invisible to normal programs.
99
Here it is important to emphasize exceptional cases that were glossed over
100
in the tutorial, and~to consider
101
nitpicky details about things that might go wrong.
102
 
103
@* MMIX basics.
104
\MMIX\ is a 64-bit RISC machine with at least 256 general-purpose registers
105
and a 64-bit address space.
106
Every instruction is four bytes long and has the form
107
$$\vcenter{\offinterlineskip
108
 \def\\#1&{\omit&}
109
 \hrule
110
 \halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr
111
 height 9pt depth4pt&OP&&X&&Y&&Z&\cr}
112
 \hrule}\,.$$
113
The 256 possible OP codes fall into a dozen or so easily remembered
114
@^OP codes@>
115
categories; an instruction usually means, ``Set register X to the
116
result of\/ Y~OP~Z\null.'' For example,
117
$$\vcenter{\offinterlineskip
118
 \def\\#1&{\omit&}
119
 \hrule
120
 \halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr
121
 height 9pt depth4pt&32&&1&&2&&3&\cr}
122
 \hrule}$$
123
sets register~1 to the sum of registers 2 and 3.
124
A few instructions combine the Y and Z bytes into
125
a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field.
126
But the three bytes X, Y, Z usually have three-pronged significance
127
independent of each other.
128
 
129
Instructions are usually represented in a symbolic form corresponding
130
to the \MMIX\ assembly language, in which each operation code has a mnemonic
131
name. For example, operation~32 is \.{ADD}, and the instruction above
132
might be written `\.{ADD} \.{\$1,\$2,\$3}'; a dollar sign `\.\$' symbolizes
133
a register number. In general, the instruction
134
\.{ADD}~\.{\$X,\$Y,\$Z} is the operation of setting $\rX=\rY+\rZ$.
135
An assembly language instruction with two commas has three operand
136
fields X, Y,~Z; an instruction with one comma has two operand fields
137
X,~YZ; an instruction with no comma has one operand field,~XYZ;
138
an instruction with no operands has $\xx=\yy=\zz=0$.
139
 
140
\def\0{\$Z\char'174Z}
141
Most instructions have two forms, one in which the Z field stands for
142
register \$Z, and one in which Z is an unsigned ``immediate'' constant.
143
@^immediate operands@>
144
Thus, for example, the command `\.{ADD} \.{\$X,\$Y,\$Z}' has a counterpart
145
`\.{ADD} \.{\$X,\$Y,Z}', which sets $\rX=\rY+\zz$. Immediate constants
146
are always nonnegative.
147
In the descriptions
148
below we will introduce such pairs of instructions
149
by writing just `\.{ADD}~\.{\$X,\$Y,\0}' instead of naming both
150
cases explicitly.
151
 
152
The operation code for \.{ADD}~\.{\$X,\$Y,\$Z} is 32, but the operation
153
code for \.{ADD}~\.{\$X,\$Y,Z} is~33. The \MMIX\ assembler chooses the correct
154
code by noting whether the third argument is a register number or~not.
155
 
156
Register numbers and constants can be given symbolic names; for example, the
157
assembly language instruction `\.x~\.{IS}~\.{\$1}' makes \.x an
158
abbreviation for register number~1. Similarly, `\.{FIVE}~\.{IS}~\.5'
159
makes \.{FIVE} an abbreviation for the constant~5.
160
After these abbreviations have been specified, the instruction
161
\.{ADD}~\.{x,x,FIVE} increases \$1 by~5, using opcode~33, while
162
the instruction \.{ADD}~\.{x,x,x} doubles \$1 using opcode~32.
163
Symbolic names that stand for register numbers
164
conventionally begin with a lowercase letter, while names that stand
165
for constants conventionally begin with an uppercase letter.
166
This convention is not actually enforced by the assembler,
167
but it tends to reduce a programmer's confusion.
168
 
169
@ A {\it nybble\/} is a 4-bit quantity, often used to denote a decimal
170
or hexadecimal digit.
171
A {\it byte\/} is an 8-bit quantity, often used to denote an alphanumeric
172
character in {\mc ASCII} code. The Unicode standard extends {\mc ASCII} to
173
@^Unicode@>
174
@^ASCII@>
175
essentially all the world's languages by using 16-bit-wide characters called
176
{\it wydes\/}. (Weight watchers know that two nybbles make one byte,
177
but two bytes make one wyde.)
178
In the discussion below we use the term
179
{\it tetrabyte\/} or ``tetra'' for a 4-byte quantity, and the similar term
180
@^nybble@>
181
@^byte@>
182
@^wyde@>
183
@^tetrabyte@>
184
@^octabyte@>
185
{\it octabyte\/} or ``octa'' for an 8-byte quantity. Thus, a tetra is
186
two wydes, an octa is two tetras; an octabyte has 64~bits. Each \MMIX\
187
register can be thought of as containing one octabyte, or two tetras,
188
or four wydes, or eight bytes, or sixteen nybbles.
189
 
190
When bytes, wydes, tetras, and octas represent numbers they are said to be
191
either {\it signed\/} or {\it unsigned}. An unsigned byte is a number between
192
0~and $2^8-1=255$ inclusive; an unsigned wyde lies, similarly, between
193
0~and $2^{16}-1=65535$; an unsigned tetra lies between
194
0~and $2^{32}-1=4{,}294{,}967{,}295$; an unsigned octa lies between
195
0~and $2^{64}-1=18{,}446{,}744{,}073{,}709{,}551{,}615$.
196
Their signed counterparts use the
197
conventions of two's complement notation, by subtracting respectively $2^8$,
198
$2^{16}$, $2^{32}$, or~$2^{64}$ times the most significant bit. Thus,
199
the unsigned bytes 128 through 255 are regarded as the numbers $-128$
200
through~$-1$ when they are evaluated as signed bytes; a signed byte therefore
201
lies between $-128$ and $+127$, inclusive. A signed wyde is a number
202
between $-32768$ and $+32767$; a signed tetra lies between
203
$-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$; a signed octa lies between
204
$-9{,}223{,}372{,}036{,}854{,}775{,}808$ and $+9{,}223{,}372{,}036{,}854{,}775{,}807$.
205
 
206
The virtual memory of\/ \MMIX\ is an array M of $2^{64}$ bytes. If $k$ is any
207
unsigned octabyte, M[$k$]~is a 1-byte quantity. \MMIX\ machines do not
208
actually have such vast memories, but programmers can act as if $2^{64}$ bytes
209
are indeed present, because \MMIX\ provides address translation mechanisms by
210
which an operating system can maintain this illusion.
211
 
212
We use the notation $\mm_{2^t}[k]$ to stand for a number consisting of
213
$2^t$~consecutive bytes starting at location~$k\land\nobreak(2^{64}-2^t)$.
214
(The notation $k\land(2^{64}-2^t)$ means that the least
215
significant $t$ bits of~$k$ are set to~0, and only the least 64~bits
216
of the resulting address are retained. Similarly, the notation
217
$k\lor(2^t-1)$ means that the least significant $t$ bits of~$k$ are set to~1.)
218
All accesses to $2^t$-byte quantities by \MMIX\ are {\it aligned}, in the sense
219
that the first byte is a multiple of~$2^t$.
220
 
221
Addressing is always ``big-endian.'' In other words, the
222
@^big-endian versus little-endian@>
223
@^little-endian versus big-endian@>
224
most significant (leftmost) byte of $\mm_{2^t}[k]$ is
225
$\mm_1[k\land\nobreak(2^{64}-2^t)]$
226
and the least significant (rightmost) byte is $\mm_1[k\lor(2^t-1)]$.
227
We use the notation $\s(\mm_{2^t}[k])$ when we want to regard
228
this $2^t$-byte number as a {\it signed\/} integer.
229
Formally speaking, if $l=2^t$,
230
@^signed integers@>
231
$$\s(\mm_l[k])=\bigl(\mm_1[k\land(-l)]\,\mm_1[k\land(-l)+1]\,\ldots\,
232
 \mm_1[k\lor(l-1)]\bigr)_{256}
233
-2^{8l}[\mm_1[k\land(-l)]\!\ge\!128].$$
234
 
235
@* Loading and storing.
236
Several instructions can be used to get information from memory into
237
registers. For example, the ``load tetra unsigned'' instruction
238
\.{LDTU} \.{\$1,\$4,\$5}
239
puts the four bytes $\mm_4[\$4+\$5]$ into register~1 as an unsigned
240
integer;
241
the most significant four bytes of register~1 are set to zero.
242
The similar instruction \.{LDT} \.{\$1,\$4,\$5}, ``load tetra,'' sets
243
\$1 to the {\it signed\/} integer $\s(\mm_4[\$4+\$5])$.
244
(Instructions generally treat numbers as
245
@^signed integers@>
246
signed unless the operation code specifically calls them
247
unsigned.) In the signed case, the most significant four bytes of the
248
register will be copies of the most significant bit of the tetrabyte
249
loaded; thus they will be all~0s or all~1s, depending on whether the
250
number is $\ge0$ or $<0$.
251
 
252
\def\bull{\smallbreak\textindent{$\bullet$}}
253
\def\bul{\par\textindent{$\bullet$}}
254
\def\<#1 #2 {\.{#1}~\.{#2} }
255
\def\>{\hfill\break}
256
 
257
\bull\
258
@.LDB@>
259
Byte $\s(\mm[\rY+\rZ])$ or $\s(\mm[\rY+\zz])$ is loaded into register~X as a
260
signed number between $-128$ and $+127$, inclusive.
261
 
262
\bull\
263
@.LDBU@>
264
Byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$ is loaded into register~X as an
265
unsigned number between $0$ and $255$, inclusive.
266
 
267
\bull\
268
@.LDW@>
269
Bytes $\s(\mm_2[\rY+\rZ])$ or $\s(\mm_2[\rY+\zz])$
270
are loaded into register~X as a signed number between $-32768$ and $+32767$,
271
inclusive. As mentioned above, our notation $\mm_2[k]$ implies that
272
the least significant bit of the address $\rY+\rZ$ or $\rY+\zz$ is
273
ignored and assumed to be~0.
274
 
275
\bull\
276
@.LDWU@>
277
Bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$ are loaded
278
into register~X as an unsigned number between $0$ and $65535$, inclusive.
279
 
280
\bull\
281
@.LDT@>
282
Bytes $\s(\mm_4[\rY+\rZ])$ or $\s(\mm_4[\rY+\zz])$
283
are loaded into register~X as a signed number between $-2{,}147{,}483{,}648$ and
284
$+2{,}147{,}483{,}647$, inclusive.
285
As mentioned above, our notation $\mm_4[k]$ implies that
286
the two least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are
287
ignored and assumed to be~0.
288
 
289
\bull\
290
@.LDTU@>
291
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$
292
are loaded into register~X as an unsigned number between 0 and
293
4{,}294{,}967{,}296, inclusive.
294
 
295
\bull\
296
@.LDO@>
297
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into
298
register~X\null.
299
As mentioned above, our notation $\mm_8[k]$ implies that
300
the three least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are
301
ignored and assumed to be~0.
302
 
303
\bull\
304
@.LDOU@>
305
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into
306
register~X\null. There is in fact no difference between the behavior of
307
\.{LDOU} and~\.{LDO}, since
308
an octabyte can be regarded as either signed or unsigned. \.{LDOU} is included
309
in \MMIX\ just for completeness and consistency, in spite of the fact that
310
a foolish consistency is the hobgoblin of little minds.
311
@^Emerson, Ralph Waldo@>
312
(Niklaus Wirth made a strong plea for such consistency in his early critique
313
of System/360; see {\sl JACM\/ \bf15} (1967), 37--74.)
314
@^Wirth, Niklaus Emil@>
315
@^System/360@>
316
 
317
\bull\
318
@.LDHT@>
319
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$ are loaded into the most
320
significant half of register~X, and the least significant half is
321
cleared to zero. (One use of ``high tetra arithmetic'' is to detect
322
overflow easily when tetrabytes are added or subtracted.)
323
 
324
\bull\
325
The address $\rY+\rZ$ or $\rY+\zz$ is loaded into register~X. This
326
instruction is simply another name for the \.{ADDU} instruction
327
discussed below; it can
328
be used when the programmer is thinking of memory addresses
329
instead of numbers.
330
The \MMIX\ assembler converts \.{LDA} into the same OP-code as \.{ADDU}.
331
@.LDA@>
332
@.ADDU@>
333
 
334
@ Another family of instructions goes the other way, storing registers into
335
memory. For example, the ``store octa immediate'' command
336
\
337
into $\mm_8[\$2+17]$.
338
 
339
\bull\
340
@.STB@>
341
The least significant byte of register~X is stored into
342
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. An integer overflow exception occurs if
343
@.overflow@>
344
\$X is not between $-128$ and $+127$. (We will discuss overflow and other
345
kinds of exceptions later.)
346
 
347
\bull\\>
348
@.STBU@>
349
The least significant byte of register~X is stored into
350
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. \.{STBU} instructions are the same
351
as \.{STB} instructions, except that no test for overflow is made.
352
 
353
\bull\
354
@.STW@>
355
The two least significant bytes of register~X are stored into
356
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$.
357
An integer overflow exception occurs if
358
\$X is not between $-32768$ and $+32767$.
359
 
360
\bull\\>
361
@.STWU@>
362
The two least significant bytes of register~X are stored into
363
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$.
364
\.{STWU} instructions are the same
365
as \.{STW} instructions, except that no test for overflow is made.
366
 
367
\bull\
368
@.STT@>
369
The four least significant bytes of register~X are stored into
370
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
371
An integer overflow exception occurs if
372
\$X is not between $-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$.
373
 
374
\bull\
375
@.STTU@>
376
The four least significant bytes of register~X are stored into
377
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
378
\.{STTU} instructions are the same
379
as \.{STT} instructions, except that no test for overflow is made.
380
 
381
\bull\
382
@.STO@>
383
Register X is stored into bytes $\mm_8[\rY+\rZ]$ or
384
$\mm_8[\rY+\zz]$.
385
 
386
\bull\
387
@.STOU@>
388
Identical to \.{STO} \.{\$X,\$Y,\0}.
389
 
390
\bull\
391
@.STCO@>
392
An octabyte whose value is the unsigned byte X is stored into
393
$\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$.
394
 
395
\bull\
396
The most significant four bytes of register~X are stored into
397
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
398
@.STHT@>
399
 
400
@* Adding and subtracting.
401
Once numbers are in registers, we can compute with them. Let's consider
402
addition and subtraction first.
403
 
404
\bull\
405
@.ADD@>
406
The sum $\rY+\rZ$ or $\rY+\zz$ is placed into register~X using signed,
407
two's complement arithmetic.
408
An integer overflow exception occurs if the sum is $\ge2^{63}$ or $<-2^{63}$.
409
(We will discuss overflow and other kinds of exceptions later.)
410
@.overflow@>
411
 
412
\bull\
413
@.ADDU@>
414
The sum $(\rY+\rZ)\bmod2^{64}$ or $(\rY+\zz)\bmod2^{64}$
415
is placed into register~X\null.
416
These instructions are the same
417
as \.{ADD}~\.{\$X,\$Y,\0} commands
418
except that no test for overflow is made.
419
(Overflow could be detected if desired by using the command
420
\
421
@.CMPU@>
422
``compare unsigned''; see below.)
423
 
424
\bull\<2ADDU \$X,\$Y,\0 `times 2 and add unsigned'.\>
425
@.2ADDU@>
426
The sum $(2\rY+\rZ)\bmod2^{64}$ or $(2\rY+\zz)\bmod2^{64}$
427
is placed into register~X\null.
428
 
429
\bull\<4ADDU \$X,\$Y,\0 `times 4 and add unsigned'.\>
430
@.4ADDU@>
431
The sum $(4\rY+\rZ)\bmod2^{64}$ or $(4\rY+\zz)\bmod2^{64}$
432
is placed into register~X\null.
433
 
434
\bull\<8ADDU \$X,\$Y,\0 `times 8 and add unsigned'.\>
435
@.8ADDU@>
436
The sum $(8\rY+\rZ)\bmod2^{64}$ or $(8\rY+\zz)\bmod2^{64}$
437
is placed into register~X\null.
438
 
439
\bull\<16ADDU \$X,\$Y,\0 `times 16 and add unsigned'.\>
440
@.16ADDU@>
441
The sum $(16\rY+\rZ)\bmod2^{64}$ or $(16\rY+\zz)\bmod2^{64}$
442
is placed into register~X\null.
443
 
444
\bull\
445
@.SUB@>
446
The difference $\rY-\rZ$ or $\rY-\zz$ is placed into register~X using
447
signed, two's complement arithmetic.
448
An integer overflow exception occurs if the difference is $\ge2^{63}$ or
449
$<-2^{63}$.
450
 
451
\bull\
452
@.SUBU@>
453
The difference $(\rY-\rZ)\bmod2^{64}$ or $(\rY-\zz)\bmod2^{64}$
454
is placed into register~X\null.
455
These two instructions are the same
456
as \.{SUB}~\.{\$X,\$Y,\0} except that no test for overflow is made.
457
 
458
\bull\
459
@.NEG@>
460
The value $\yy-\rZ$ or $\yy-\zz$ is placed into register~X using
461
signed, two's complement arithmetic.
462
An integer overflow exception occurs if the result is greater
463
than~$2^{63}-\nobreak1$.
464
(Notice that in this case \MMIX\ works with the ``immediate'' constant~Y,
465
not register~Y\null. \.{NEG} commands are analogous to the immediate variants
466
of other commands, because they save us from having to put one-byte
467
constants into a register. When $\yy=0$, overflow occurs if and
468
only if $\rZ=-2^{63}$. The instruction \
469
same effect as \.{NEG}~\.{\$X,0,1}.)
470
 
471
\bull\
472
@.NEGU@>
473
The value $(\yy-\rZ)\bmod2^{64}$ or $(\yy-\zz)\bmod2^{64}$
474
is placed into register~X\null.
475
\.{NEGU} instructions are the same
476
as \.{NEG} instructions, except that no test for overflow is made.
477
 
478
@* Bit fiddling.
479
Before looking at multiplication and division, which take longer than
480
addition and subtraction, let's look at some of the other things that
481
\MMIX\ can do fast. There are eighteen instructions for bitwise
482
logical operations on unsigned numbers.
483
 
484
\bull\
485
@.AND@>
486
Each bit of register Y is logically anded with the corresponding bit of
487
register~Z or of the constant~Z, and the result is placed in register~X\null.
488
In other words, a bit of register~X is set to~1 if and only if the
489
corresponding bits of the operands are both~1;
490
in symbols, $\rX=\rY\land\rZ$ or $\rX=\rY\land\zz$.
491
This means in particular that \
492
most significant bytes of register~X, because 0s are prefixed to the
493
constant byte~Z\null.
494
 
495
\bull\
496
@.OR@>
497
Each bit of register Y is logically ored with the corresponding bit of
498
register~Z or of the constant~Z, and the result is placed in register~X\null.
499
In other words, a bit of register~X is set to~0 if and only if the
500
corresponding bits of the operands are both~0;
501
in symbols, $\rX=\rY\lor\rZ$ or $\rX=\rY\lor\zz$.
502
 
503
In the special case $\zz=0$, the immediate variant of
504
this command simply copies register~Y to
505
register~X\null. The \MMIX\ assembler allows us to write
506
`\.{SET}~\.{\$X,\$Y}' as a convenient abbreviation for
507
`\.{OR}~\.{\$X,\$Y,0}'.
508
@.SET@>
509
 
510
\bull\
511
@.XOR@>
512
Each bit of register Y is logically xored with the corresponding bit of
513
register~Z or of the constant~Z, and the result is placed in register~X\null.
514
In other words, a bit of register~X is set to~0 if and only if the
515
corresponding bits of the operands are equal;
516
in symbols, $\rX=\rY\oplus\rZ$ or $\rX=\rY\oplus\zz$.
517
 
518
\bull\
519
@.ANDN@>
520
Each bit of register Y is logically anded with the complement of the
521
corresponding bit of
522
register~Z or of the constant~Z, and the result is placed in register~X\null.
523
In other words, a bit of register~X is set to~1 if and only if the
524
corresponding bit of register~Y is~1 and the other corresponding bit is~0;
525
in symbols, $\rX=\rY\setminus\rZ$ or $\rX=\rY\setminus\zz$.
526
(This is the {\it logical difference\/} operation; if the operands
527
are bit strings representing sets, we are computing the elements that
528
lie in one set but not the other.)
529
 
530
\bull\
531
@.ORN@>
532
Each bit of register Y is logically ored with the complement of the
533
corresponding bit of
534
register~Z or of the constant~Z, and the result is placed in register~X\null.
535
In other words, a bit of register~X is set to~1 if and only if the
536
corresponding bit of register~Y is greater than or equal to the other corresponding bit;
537
in symbols, $\rX=\rY\lor\overline\rZ$
538
or $\rX=\rY\lor\overline\zz$.
539
(This is the complement of $\rZ\setminus\rY$ or $\zz\setminus\rY$.)
540
 
541
\bull\
542
@.NAND@>
543
Each bit of register Y is logically anded with the corresponding bit of
544
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
545
In other words, a bit of register~X is set to~0 if and only if the
546
corresponding bits of the operands are both~1;
547
in symbols, $\rX=\rY\mathbin{\overline\land}\rZ$ or
548
$\rX=\rY\mathbin{\overline\land}\zz$.
549
 
550
\bull\
551
@.NOR@>
552
Each bit of register Y is logically ored with the corresponding bit of
553
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
554
In other words, a bit of register~X is set to~1 if and only if the
555
corresponding bits of the operands are both~0;
556
in symbols, $\rX=\rY\mathbin{\overline\lor}\rZ$ or
557
$\rX=\rY\mathbin{\overline\lor}\zz$.
558
 
559
\bull\
560
@.NAND@>
561
Each bit of register Y is logically xored with the corresponding bit of
562
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
563
In other words, a bit of register~X is set to~1 if and only if the
564
corresponding bits of the operands are equal;
565
in symbols, $\rX=\rY\mathbin{\overline\oplus}\rZ$ or
566
$\rX=\rY\mathbin{\overline\oplus}\zz$.
567
 
568
\bull\
569
@.MUX@>
570
For each bit position~$j$, the $j$th bit of register~X is set either to
571
bit~$j$ of register~Y
572
or to bit~$j$ of the other operand \$Z~or~Z, depending on
573
whether bit~$j$ of the special {\it mask register\/}~rM is 1 or 0:
574
@^rM@>
575
if ${\rm M}_j$ then $\yy_j$ else~$\zz_j$.
576
In symbols, $\rm\rX=(\rY\land rM)\lor(\rZ\land\overline{rM})$ or
577
            $\rm\rX=(\rY\land rM)\lor(\zz\land\overline{rM})$.
578
(\MMIX\ has several such special registers, associated with instructions that
579
need more than two inputs or produce more than one output.)
580
 
581
@ Besides the eighteen bitwise operations, \MMIX\ can also perform unsigned
582
bytewise and biggerwise operations that are somewhat more exotic.
583
 
584
\bull\
585
@.BDIF@>
586
For each byte position~$j$, the $j$th byte of register~X is set to byte~$j$ of
587
register~Y minus byte~$j$ of the other operand \$Z~or~Z, unless that
588
difference is negative; in the latter case, byte~$j$ of~\$X is set to zero.
589
 
590
\bull\
591
@.WDIF@>
592
For each wyde position~$j$, the $j$th wyde of register~X is set to wyde~$j$ of
593
register~Y minus wyde~$j$ of the other operand \$Z~or~Z, unless that
594
difference is negative; in the latter case, wyde~$j$ of~\$X is set to zero.
595
 
596
\bull\
597
@.TDIF@>
598
For each tetra position~$j$, the $j$th tetra of register~X is set to tetra~$j$ of
599
register~Y minus tetra~$j$ of the other operand \$Z~or~Z, unless that
600
difference is negative; in the latter case, tetra~$j$ of~\$X is set to zero.
601
 
602
\bull\
603
@.ODIF@>
604
Register~X is set to register~Y minus the other operand \$Z~or~Z, unless
605
\$Z~or~Z exceeds register~Y; in the latter case,
606
\$X~is set to zero. The operands are treated as unsigned integers.
607
 
608
\smallskip
609
The \.{BDIF} and \.{WDIF} commands are useful
610
in applications to graphics or video; \.{TDIF} and \.{ODIF} are also
611
present for reasons of consistency. For example, if \.a and \.b are
612
registers containing
613
8-byte quantities, their bytewise maxima~\.c and
614
bytewise minima~\.d are computed by
615
$$\hbox{\tt BDIF x,a,b; ADDU c,x,b; SUBU d,a,x;}$$
616
similarly, the individual ``pixel differences'' \.e, namely the absolute
617
values of the differences of corresponding bytes, are computed by
618
$$\hbox{\tt BDIF x,a,b; BDIF y,b,a; OR e,x,y.}$$
619
To add individual
620
bytes of \.a and \.b while clipping all sums to 255 if they don't fit
621
in a single byte, one can say
622
$$\hbox{\tt NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;}$$
623
in other words, complement \.a, apply \.{BDIF}, and complement the result.
624
The operations can also be used to construct efficient operations on
625
strings of bytes or wydes.
626
@^graphics@>
627
@^pixels@>
628
@^saturating arithmetic@>
629
@^nybble@>
630
 
631
Exercise: Implement a ``nybble difference'' instruction that operates
632
in a similar way on sixteen nybbles at a time.
633
 
634
Answer: {\tt\spaceskip=.5em minus .3em
635
AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m;
636
BDIF x,x,y; BDIF xx,xx,yy; OR ans,x,xx} where register \.m contains
637
the mask \Hex{0f0f0f0f0f0f0f0f}.
638
 
639
(The \.{ANDN} operation can be regarded as
640
a ``bit difference'' instruction that operates
641
in a similar way on 64 bits at a time.)
642
 
643
@ Three more pairs of bit-fiddling instructions round out the collection of exotics.
644
 
645
\bull\
646
@.SADD@>
647
Each bit of register Y is logically anded with the complement of the
648
corresponding bit of
649
register~Z or of the constant~Z, and the number of 1~bits in the
650
result is placed in register~X\null.
651
In other words, register~X is set to the number of bit positions
652
in which register~Y has a~1 and the other operand has a~0;
653
in symbols, $\rX=\nu(\rY\setminus\rZ)$ or $\rX=\nu(\rY\setminus\zz)$.
654
When the second operand is zero this operation is sometimes called
655
``population counting,'' because it counts the number of 1s in register~Y\null.
656
@^population counting@>
657
@^counting ones@>
658
 
659
\bull\
660
@.MOR@>
661
Suppose the 64 bits of register Y are indexed as
662
$$y_{00}y_{01}\ldots y_{07}y_{10}y_{11}\ldots y_{17}\ldots
663
  y_{70}y_{71}\ldots y_{77};$$
664
in other words, $y_{ij}$ is the $j$th bit of the $i$th byte, if we
665
number the bits and bytes from 0 to 7 in big-endian fashion from left to right.
666
Let the bits of the other operand, \$Z or~Z, be indexed similarly:
667
$$z_{00}z_{01}\ldots z_{07}z_{10}z_{11}\ldots z_{17}\ldots
668
  z_{70}z_{71}\ldots z_{77}.$$
669
The \.{MOR} operation replaces each bit $x_{ij}$ of register~X by the bit
670
$$ y_{0j}z_{i0}\lor y_{1j}z_{i1}\lor \cdots \lor y_{7j}z_{i7}.$$
671
Thus, for example, if register Z contains the constant
672
\Hex{0102040810204080},
673
\.{MOR} reverses the order of the bytes in register~Y, converting between
674
little-endian and big-endian addressing.
675
@^big-endian versus little-endian@>
676
@^little-endian versus big-endian@>
677
(The $i$th byte of~\$X depends on the bytes of~\$Y as specified by the
678
$i$th byte of~\$Z or~Z\null. If we regard
679
64-bit words as $8\times8$ Boolean matrices, with one byte per column,
680
this operation computes the
681
Boolean product $\rX=\rY\,\rZ$ or $\rX=\rY\,\zz$. Alternatively, if we
682
regard 64-bit words as $8\times8$ matrices with one byte per~{\it row},
683
\.{MOR} computes the Boolean product $\rX=\rZ\,\rY$ or $\rX=\zz\,\rY$
684
with operands in the opposite order. The immediate form
685
\
686
to zero; the other byte is set to the bitwise or of whatever bytes of
687
register~Y are specified by the immediate operand~Z\null.)
688
 
689
Exercise: Explain how to compute a mask \.m that is \Hex{ff} in byte
690
positions where \.a exceeds \.b, \Hex{00} in all other bytes.
691
Answer: \.{BDIF}~\.{x,a,b;} \.{MOR}~\.{m,minusone,x;}
692
here \.{minusone} is a register consisting of all 1s. (Moreover,
693
if we \.{AND} this result
694
with \Hex{8040201008040201}, then \.{MOR} with $\zz=255$, we get
695
a one-byte encoding~of~\.m.)
696
 
697
\bull\
698
@.MXOR@>
699
This operation is like the Boolean multiplication just discussed, but
700
exclusive-or is used to combine the bits. Thus we obtain a matrix
701
product over the field of two elements instead of a Boolean matrix product.
702
This operation can be used to construct hash functions, among many other things.
703
(The hash functions aren't bad, but they are not ``universal'' in the
704
sense of exercise 6.4--72.)
705
@^matrices of bits@>
706
@^Boolean multiplication@>
707
 
708
@ Sixteen ``immediate wyde'' instructions are available for the common
709
case that a 16-bit constant is needed. In this case the Y~and~Z fields
710
of the instruction are regarded as a single 16-bit unsigned number~YZ\null.
711
@^immediate operands@>
712
 
713
\bull\
714
@.SETH@>
715
     \
716
@.SETMH@>
717
     \
718
@.SETML@>
719
     \
720
@.SETL@>
721
The 16-bit unsigned number YZ is shifted left
722
by either 48 or 32 or 16 or 0 bits, respectively, and placed into register~X\null.
723
Thus, for example, \.{SETML} inserts
724
a given value into the second-least-significant wyde of register~X and
725
sets the other three wydes to zero.
726
 
727
\bull\
728
@.INCH@>
729
     \
730
@.INCMH@>
731
     \
732
@.INCML@>
733
     \
734
@.INCL@>
735
The 16-bit unsigned number YZ is shifted left
736
by either 48 or 32 or 16 or 0 bits, respectively, and added to register~X,
737
ignoring overflow; the result is placed back into register~X\null.
738
 
739
If YZ is the hexadecimal constant \Hex{8000}, the command \
740
complements the most significant bit of register~X\null. We will see
741
below that this can be used to negate a floating point number.
742
@^negation, floating point@>
743
 
744
\bull\
745
@.ORH@>
746
     \
747
@.ORMH@>
748
     \
749
@.ORML@>
750
     \
751
@.ORL@>
752
The 16-bit unsigned number YZ is shifted left
753
by either 48 or 32 or 16 or 0 bits, respectively, and ored with register~X;
754
the result is placed back into register~X\null.
755
 
756
Notice that any desired 4-wyde constant \.{GH} \.{IJ} \.{KL} \.{MN}
757
can be inserted into a register with a sequence of four instructions
758
such as
759
$$\hbox{\tt SETH \$X,GH; INCMH \$X,IJ; INCML \$X,KL; INCL \$X,MN;}$$
760
any of these \.{INC} instructions could also be replaced by \.{OR}.
761
 
762
\bull\
763
@.ANDNH@>
764
     \
765
@.ANDNMH@>
766
     \
767
@.ANDNML@>
768
     \
769
@.ANDNL@>
770
The 16-bit unsigned number YZ is shifted left
771
by either 48 or 32 or 16 or 0 bits, respectively, then
772
complemented and anded with register~X;
773
the result is placed back into register~X\null.
774
 
775
If YZ is the hexadecimal
776
constant \Hex{8000}, the command \
777
bit of register~X to be~0. This can be used to compute the absolute value of
778
a floating point number.
779
@^absolute value, floating point@>
780
 
781
@ \MMIX\ knows several ways to shift a register left or right
782
by any number of bits.
783
 
784
\bull\
785
@.SL@>
786
The bits of register~Y are shifted left by \$Z or Z places, and 0s
787
are shifted in from the right; the result is placed in register~X\null.
788
Register~Y is treated as a signed number, but
789
the second operand is treated as an unsigned number.
790
The effect is the same as multiplication by
791
$2^{\mkern1mu\rZ}$ or by $2^\zz$; an integer overflow exception occurs if the
792
result is $\ge2^{63}$ or $<-2^{63}$.
793
In particular, if the second operand is 64 or~more, register~X will
794
become entirely zero, and integer overflow will be signaled unless
795
register~Y was zero.
796
 
797
\bull\
798
@.SLU@>
799
The bits of register~Y are shifted left by \$Z or Z places, and 0s
800
are shifted in from the right; the result is placed in register~X\null.
801
Both operands are treated as unsigned numbers. The \.{SLU} instructions
802
are equivalent to \.{SL}, except that no test for overflow is made.
803
 
804
\bull\
805
@.SR@>
806
The bits of register~Y are shifted right by \$Z or Z places, and copies
807
of the leftmost bit (the sign bit) are shifted in from the left; the result is
808
placed in register~X\null.
809
Register~Y is treated as a signed number, but
810
the second operand is treated as an unsigned number.
811
The effect is the same as division by $2^{\mkern1mu\rZ}$ or by
812
$2^\zz$ and rounding down. In particular, if the second operand is 64 or~more,
813
register~X will become zero if \$Y was nonnegative, $-1$ if \$Y was negative.
814
 
815
\bull\
816
@.SRU@>
817
The bits of register~Y are shifted right by \$Z or Z places, and 0s
818
are shifted in from the left; the result is placed in register~X\null.
819
Both operands are treated as unsigned numbers.
820
The effect is the same as unsigned division of a 64-bit number
821
by $2^{\mkern1mu\rZ}$ or by~$2^\zz$;
822
if the second operand is 64 or~more, register~X will become entirely~zero.
823
 
824
@* Comparisons.
825
Arithmetic and logical operations are nice,
826
but computer programs also need to compare numbers
827
and to change the course of a calculation depending on what they find.
828
\MMIX\ has four comparison instructions to facilitate such decision-making.
829
 
830
\bull\
831
@.CMP@>
832
Register X is set to $-1$ if register Y is less than register Z or less than
833
the unsigned immediate value~Z, using the conventions of signed
834
arithmetic; it is set to 0 if register~Y is equal to register Z or equal to
835
the unsigned immediate value~Z; otherwise it is set to~1.
836
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$.
837
 
838
\bull\
839
@.CMPU@>
840
Register X is set to $-1$ if register Y is less than register Z or less than
841
the unsigned immediate value Z, using the conventions of unsigned
842
arithmetic; it is set to 0 if register Y is equal to register Z or equal to
843
the unsigned immediate value~Z; otherwise it is set to~1.
844
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$.
845
 
846
@ There also are 32 conditional instructions, which choose quickly between
847
two alternative courses of action.
848
 
849
\bull\
850
@.CSN@>
851
If register Y is negative (namely if its most significant bit is~1),
852
register~X is set to the contents of register~Z or to the
853
unsigned immediate value~Z. Otherwise nothing happens.
854
 
855
\bull\
856
@.CSZ@>
857
\bul\
858
@.CSP@>
859
\bul\
860
@.CSOD@>
861
\bul\
862
@.CSNN@>
863
\bul\
864
@.CSNZ@>
865
\bul\
866
@.CSNP@>
867
\bul\
868
@.CSEV@>
869
These instructions are entirely analogous to \.{CSN}, except
870
that register~X changes only if register~Y is respectively zero, positive,
871
odd, nonnegative, nonzero, nonpositive, or nonodd.
872
 
873
\bull\
874
@.ZSN@>
875
If register Y is negative (namely if its most significant bit is~1),
876
register~X is set to the contents of register~Z or to the
877
unsigned immediate value~Z. Otherwise register~X is set to zero.
878
 
879
\bull\
880
@.ZSZ@>
881
\bul\
882
@.ZSP@>
883
\bul\
884
@.ZSOD@>
885
\bul\
886
@.ZSNN@>
887
\bul\
888
@.ZSNZ@>
889
\bul\
890
@.ZSNP@>
891
\bul\
892
@.ZSEV@>
893
These instructions are entirely analogous to \.{ZSN}, except
894
that \$X is set to \$Z or~Z if register~Y is respectively zero, positive,
895
odd, nonnegative, nonzero, nonpositive, or even; otherwise
896
\$X is set to zero.
897
 
898
Notice that the two instructions \
899
the same effect. So do the two instructions \
900
So do \
901
 
902
@* Branches and jumps.
903
\MMIX\ ordinarily executes instructions in sequence, proceeding from
904
an instruction in tetrabyte M$_4[\lambda]$ to the instruction in
905
M$_4[\lambda+4]$. But there are several ways to interrupt
906
the normal flow of control, most of which use the Y and Z fields of
907
an instruction as a combined 16-bit YZ field.
908
For example, \
909
is typical: It means that control should skip ahead 1000 instructions
910
to the command that appears $4000$ bytes after the
911
\.{BNZ}, if register~3 is not equal to zero.
912
 
913
There are eight branch-forward instructions, corresponding to the
914
eight conditions in the \.{CS} and \.{ZS} commands that we discussed earlier.
915
And there are eight similar branch-backward instructions; for example,
916
\
917
instruction that appears $4000$ bytes {\it before\/}
918
this \.{BOD} command, if register~2 is odd. The numeric OP-code when branching
919
backward is one greater than the OP-code when branching
920
forward; the assembler takes care of this automatically, just as it takes
921
cares of changing \.{ADD} from 32 to 33 when necessary.
922
 
923
Since branches are relative to the current location, the \MMIX\ assembler
924
treats branch instructions in a special way.
925
Suppose a programmer writes `\.{BNZ} \.{\$3,Case5}',
926
where \.{Case5} is the address of an instruction in location~$l$.
927
If this instruction appears in location~$\lambda$, the assembler first
928
computes the displacement $\delta=\lfloor(l-\lambda)/4\rfloor$. Then if
929
$\delta$ is nonnegative, the quantity~$\delta$
930
is placed in the YZ field of a \.{BNZ}
931
command, and it should be less than $2^{16}$; if $\delta$ is negative,
932
the quantity $2^{16}+\delta$ is placed in the YZ field of a \.{BNZ}
933
command with OP-code increased by~1,
934
and $\delta$ should not be less than $-2^{16}$.
935
 
936
The symbol \.{@@} used in our examples of
937
\.{BNZ} and \.{BOD} above is interpreted by the
938
assembler as an abbreviation for ``the location of the current
939
instruction.'' In the following
940
notes we will define pairs of branch commands by writing, for example,
941
`\.{BNZ}~\.{\$X,@@+4*YZ[-262144]}'; this stands for a branch-forward
942
command that
943
branches to the current location plus four times~YZ, as well as for
944
a branch-backward command that branches to the current
945
location plus four times $(\rm YZ-65536)$.
946
 
947
\bull\
948
@.BN@>
949
\bul\
950
@.BZ@>
951
\bul\
952
@.BP@>
953
\bul\
954
@.BOD@>
955
\bul\
956
@.BNN@>
957
\bul\
958
@.BNZ@>
959
\bul\
960
@.BNP@>
961
\bul\
962
@.BEV@>
963
If register X is respectively negative, zero, positive, odd, nonnegative,
964
nonzero, nonpositive, or even, and if this instruction appears in memory
965
location $\lambda$, the next instruction is taken from memory location
966
$\lambda+4{\rm YZ}$ (branching forward) or $\lambda+4({\rm YZ}-2^{16})$
967
(branching backward). Thus one can go from location~$\lambda$ to any location
968
between $\lambda-262{,}144$ and $\lambda+262{,}140$, inclusive.
969
 
970
\smallskip
971
Sixteen additional branch instructions called {\it probable branches\/}
972
are also provided. They have exactly the same meaning as ordinary
973
branch instructions; for example, \
974
go backward 4000 bytes if register~2 is odd. But they differ in running time:
975
On some implementations of\/ \MMIX,
976
a branch instruction takes longer when the branch is taken, while a
977
probable branch takes longer when the branch is {\it not\/} taken.
978
Thus programmers should use a \.B instruction when they think branching is
979
relatively unlikely, but they should use \.{PB} when they expect
980
branching to occur more often than not. Here is a list of the
981
probable branch commands, for completeness:
982
 
983
\bull\
984
@.PBN@>
985
\bul\
986
@.PBZ@>
987
\bul\
988
@.PBP@>
989
\bul\
990
@.PBOD@>
991
\bul\
992
@.PBNN@>
993
\bul\
994
@.PBNZ@>
995
\bul\
996
@.PBNP@>
997
\bul\
998
@.PBEV@>
999
 
1000
@ Locations that are relative to the current instruction can be
1001
transformed into absolute locations with \.{GETA} commands.
1002
 
1003
\bull\
1004
@.GETA@>
1005
The value $\lambda+4{\rm YZ}$ or $\lambda+4({\rm YZ}-2^{16})$ is placed in
1006
register~X\null. (The assembly language conventions of branch instructions
1007
apply; for example, we can write `\.{GETA} \.{\$X,Addr}'.)
1008
 
1009
@ \MMIX\ also has unconditional jump instructions, which change the
1010
location of the next instruction no matter what.
1011
 
1012
\bull\
1013
@.JMP@>
1014
A \.{JMP} command treats bytes X, Y, and Z as an unsigned 24-bit
1015
integer XYZ. It allows a program to transfer control from location $\lambda$ to any
1016
location between $\lambda-67\?{,}108{,}864$ and $\lambda+67\?{,}108{,}860$
1017
inclusive, using relative addressing as in the \.{B} and \.{PB} commands.
1018
 
1019
\bull\
1020
@.GO@>
1021
\MMIX\ takes its next instruction from location $\rY+\rZ$ or $\rY+\zz$,
1022
and continues from there. Register~X is set equal to $\lambda+4$, the
1023
location of the instruction that would ordinarily have been executed next.
1024
(\.{GO} is similar to a jump, but it is not relative
1025
to the current location. Since \.{GO} has the same format as a load or store
1026
instruction, a loading routine can treat program labels with the same mechanism
1027
that is used to treat references to data.)
1028
 
1029
An old-fashioned type of subroutine linkage can be implemented by saying
1030
either `\.{GO}~\.{r,subloc,0}' or `\.{GETA}~\.{r,@@+8;}
1031
\.{JMP}~\.{Sub}' to~enter a subroutine,
1032
then `\.{GO}~\.{r,r,0}' to return.
1033
But subroutines are normally entered with the instructions
1034
\.{PUSHJ} or \.{PUSHGO}, described below.
1035
 
1036
The two least significant bits of the address
1037
in a \.{GO} command are essentially ignored. They will, however, appear in
1038
the value of~$\lambda$ returned by \.{GETA} instructions, and in the
1039
return-jump register~rJ after \.{PUSHJ} or \.{PUSHGO} instructions are
1040
performed, and in
1041
@^rJ@>
1042
the where-interrupted register at the time of an interrupt. Therefore they
1043
could be used to send some kind of signal to a subroutine or (less likely)
1044
to an interrupt handler.
1045
 
1046
@* Multiplication and division.
1047
Now for some instructions that make \MMIX\ work harder.
1048
 
1049
\bull\
1050
@.MUL@>
1051
The signed product of the number in register Y by either the
1052
number in register~Z or the unsigned byte~Z
1053
replaces the contents of register~X\null. An
1054
integer overflow exception can occur, as with \.{ADD} or \.{SUB}, if the
1055
result is less than $-2^{63}$ or greater than $2^{63}-1$. (Immediate
1056
multiplication by powers of~2 can be done more rapidly with the \.{SL}
1057
instruction.)
1058
 
1059
\bull\
1060
@.MULU@>
1061
The lower 64 bits of the
1062
unsigned 128-bit product of register~Y and either
1063
register~Z or~Z are placed in register~X, and the upper 64 bits are
1064
placed in the special {\it himult register\/}~rH\null. (Immediate multiplication
1065
@^rH@>
1066
by powers of~2 can be done more rapidly with the \.{SLU} instruction,
1067
if the upper half is not needed.
1068
Furthermore, an instruction like \<4ADDU \$X,\$Y,\$Y is faster than
1069
\.{MULU} \.{\$X,\$Y,5}.)
1070
 
1071
\bull\
1072
@.DIV@>
1073
The signed quotient of the number in register Y divided
1074
by either the number in register~Z or the unsigned byte~Z
1075
replaces the contents of register~X, and the signed remainder
1076
is placed in the special {\it remainder register\/}~rR\null.
1077
@^rR@>
1078
An integer divide check exception occurs if the divisor is zero; in that
1079
case \$X is set to zero and rR is set to~\$Y\null.
1080
@^divide check exception@>
1081
@^overflow@>
1082
An integer overflow exception occurs if the number $-2^{63}$ is divided
1083
by~$-1$; otherwise integer overflow is impossible. The quotient of
1084
$y$ divided by~$z$ is defined to be $\lfloor y/z\rfloor$, and the remainder
1085
is defined to be $y-\lfloor y/z\rfloor z$ (also written $y\bmod z$).
1086
Thus, the remainder is either
1087
zero or has the sign of the divisor. Dividing by $z=2^t$ gives
1088
exactly the same quotient as shifting right~$t$ via the \.{SR} command, and
1089
exactly the same remainder as anding with $z-1$ via the \.{AND} command.
1090
Division of a positive 63-bit number by a positive constant can be accomplished
1091
more quickly by computing the upper half of a suitable unsigned product and
1092
shifting it right appropriately.
1093
 
1094
\bull\
1095
@.DIVU@>
1096
The unsigned 128-bit number obtained by prefixing the special {\it dividend
1097
register}~rD to the contents of register~Y is divided either by the
1098
@^rD@>
1099
unsigned number in register~Z or by the unsigned byte~Z, and the quotient is placed
1100
in register~X\null. The remainder is placed in the remainder
1101
register~rR\null.
1102
However, if rD is greater than or equal to
1103
the divisor (and in particular if the divisor is zero),
1104
then \$X is set to~rD and rR is set to~\$Y\null.
1105
(Unsigned arithmetic never signals an exceptional condition, even
1106
when dividing by zero.)
1107
If rD is zero, unsigned division by $z=2^t$ gives exactly the same quotient as
1108
shifting right~$t$ via the \.{SRU} command, and
1109
exactly the same remainder as anding with $z-1$ via the \.{AND} command.
1110
Section 4.3.1 of {\sl Seminumerical Algorithms\/}
1111
explains how to use unsigned division to obtain the quotient and remainder
1112
of extremely large numbers.
1113
 
1114
@* Floating point computations.
1115
Floating point arithmetic conforming to the famous IEEE/ANSI
1116
Standard~754 is provided for arbitrary 64-bit numbers. The IEEE standard
1117
refers to such numbers as ``double format'' quantities, but \MMIX\
1118
calls them simply floating point numbers because 64-bit quantities are
1119
the~norm.
1120
@^floating point arithmetic@>
1121
@^IEEE/ANSI Standard 754@>
1122
@^subnormal numbers@>
1123
@^normal numbers@>
1124
@^NaN@>
1125
@^overflow@>
1126
@^underflow@>
1127
@^invalid exception@>
1128
@^inexact exception@>
1129
@^signaling NaN@>
1130
@^quiet NaN@>
1131
@^infinity@>
1132
@^rounding modes@>
1133
 
1134
A positive floating point number has 53 bits of precision and can range
1135
from approximately $10^{-308}$ to $10^{308}$. ``Subnormal numbers''
1136
between $10^{-324}$ and $10^{-308}$ can also be represented, but with fewer
1137
bits of precision.
1138
Floating point numbers can be
1139
infinite, and they satisfy such identities as $1.0/\infty=+0.0$, $-2.8\times\infty
1140
=-\infty$. Floating
1141
point quantities can also be ``Not-a-Numbers'' or NaNs, which are
1142
further classified into signaling NaNs and quiet NaNs.
1143
 
1144
Five kinds
1145
of exceptions can occur during floating point computations, and they
1146
each have code letters: Floating
1147
overflow~(O) or underflow~(U); floating divide by zero~(Z);
1148
floating inexact~(X); and floating invalid~(I).
1149
For example, the multiplication of sufficiently small integers causes
1150
no exceptions, and the division of 91.0 by~13.0 is also exception-free,
1151
but the division 1.0/3.0 is inexact. The multiplication of extremely
1152
large or extremely small floating point numbers is inexact and it
1153
also causes overflow or underflow.
1154
Invalid results occur when taking the square root of a negative
1155
number; mathematicians can remember the I exception
1156
by relating it to the square root of $-1.0$.
1157
Invalid results also occur when trying to convert infinity
1158
or a quiet NaN to a fixed-point
1159
integer, or when any signaling NaN is encountered, or when
1160
mathematically undefined operations like $\infty-\infty$ or $0/0$ are
1161
requested.
1162
(Programmers can be sure that they have not erroneously
1163
used uninitialized floating point data if they initialize all their variables
1164
to signaling NaN values.)
1165
 
1166
Four different rounding modes for inexact results are available:
1167
round to nearest (and to even in case of ties);
1168
round off (toward zero); round up (toward $+\infty)$;
1169
or round down (toward $-\infty$). \MMIX\
1170
has a special {\it arithmetic status register\/}~rA that specifies the
1171
@^rA@>
1172
current rounding mode and the user's current preferences for exception
1173
handling.
1174
 
1175
\def\NaN{{\rm NaN}}
1176
IEEE standard arithmetic provides an excellent foundation for scientific
1177
calculations, and it will be thoroughly explained in the fourth
1178
edition of {\sl Seminumerical Algorithms}, Section 4.2.
1179
For our present purposes, we need not study all the details; but
1180
we do need to specify \MMIX's behavior with respect to several
1181
things that are not completely defined by the standard.
1182
For example, the IEEE standard does not fully define the
1183
result of operations with NaNs.
1184
 
1185
When an octabyte represents a floating point number
1186
in \MMIX's registers, the leftmost bit is the sign; then come 11 bits for an
1187
exponent~$e$; and the remaining 52 bits are the fraction part~$f$.
1188
We regard $e$ as an integer between 0 and $(11111111111)_2=2047$, and we regard $f$ as
1189
a fraction between 0 and $(.111\ldots1)_2=1-2^{-52}$.
1190
Each octabyte has the following
1191
significance:
1192
$$\vbox{\halign{\hfil$\pm#$,\quad if &#\hfil\cr
1193
0.0&$e=f=0$ (zero);\cr
1194
2^{-1022}f&$e=0$ and $f>0$ (subnormal);\cr
1195
2^{\mkern1mu e-1023}(1+f)&$0
1196
\infty&$e=2047$ and $f=0$ (infinite);\cr
1197
\NaN(f)&$e=2047$ and $0
1198
\NaN(f)&$e=2047$ and $f\ge1/2$ (quiet NaN).\cr}}$$
1199
Notice that $+0.0$ is distinguished from $-0.0$; this fact is
1200
important for interval arithmetic.
1201
@^minus zero@>
1202
 
1203
Exercise: What 64 bits represent the floating point number 1.0?
1204
Answer: We want $e=1023$ and $f=0$, so the answer is \Hex{3ff0000000000000}.
1205
 
1206
Exercise: What is the largest finite floating point number?
1207
Answer: We want $e=2046$ and $f=1-2^{-52}$, so the answer is
1208
$\Hex{7fefffffffffffff}=2^{1024}-2^{971}$.
1209
 
1210
@ The seven IEEE floating point arithmetic operations (addition, subtraction,
1211
multiplication, division, remainder, square root, and nearest-integer)
1212
all share common features, called the {\it standard floating point
1213
conventions\/} in the discussion below:
1214
@^standard floating point conventions@>
1215
@^overflow@>
1216
@^underflow@>
1217
The operation is performed on floating point numbers found in two registers,
1218
\$Y and~\$Z, except that square root and integerization
1219
involve only one operand.
1220
If neither input operand is a NaN, we first determine the exact result,
1221
then round it using the current rounding mode
1222
found in special register~rA\null. Infinite results are exact and need
1223
no rounding. A floating overflow exception occurs if the rounded
1224
result is finite but needs an exponent greater than 2046.
1225
A floating underflow exception occurs if the rounded result needs an exponent
1226
less than~1 and either (i)~the unrounded result cannot be represented exactly
1227
@^rA@>
1228
as a subnormal number or (ii)~the ``floating underflow trip'' is enabled in~rA\null.
1229
(Trips are discussed below.)
1230
NaNs are treated specially as follows: If either \$Y or~\$Z is a signaling NaN,
1231
an invalid exception occurs and the NaN is quieted by adding 1/2 to its
1232
fraction part. Then if \$Z is a quiet NaN, the result is set
1233
to \$Z; otherwise if \$Y is a quiet NaN, the result is set to \$Y\null.
1234
(Registers \$Y and \$Z do not actually change.)
1235
 
1236
\bull\\>
1237
@.FADD@>
1238
The floating point sum $\rY+\rZ$ is computed by the
1239
standard floating point conventions just described,
1240
and placed in register~X\null.
1241
An invalid exception occurs if the sum is $(+\infty)+(-\infty)$ or
1242
$(-\infty)+(+\infty)$; in that case the result is $\NaN(1/2)$ with the sign
1243
of~\$Z\null. If the sum is exactly zero and the current mode is
1244
not rounding-down, the result is $+0.0$ except that $(-0.0)+(-0.0)=-0.0$. If the
1245
@^minus zero@>
1246
sum is exactly zero and the current mode is rounding-down, the result
1247
is $-0.0$ except that $(+0.0)+(+0.0)=+0.0$.
1248
These rules for signed zeros turn out to be useful when doing interval
1249
arithmetic: If the lower bound of an interval is $+0.0$ or if the
1250
upper bound is $-0.0$, the interval does not contain zero, so the
1251
numbers in the interval have a known sign.
1252
 
1253
Floating point underflow cannot occur unless the U-trip has been enabled,
1254
because any underflowing result of floating point
1255
addition can be represented exactly as a subnormal number.
1256
 
1257
Silly but instructive exercise: Find all pairs of numbers $(\rY,\rZ)$ such
1258
that the commands \
1259
the same result in~\$X
1260
(although \.{FADD} may cause floating exceptions).
1261
Answer: Of course \$Y or \$Z could be zero, if the other one is not a signaling
1262
NaN. Or one could be signaling and the other \Hex{0008000000000000}.
1263
Other possibilities
1264
occur when they are both positive and less than
1265
\Hex{0010000000000001}; or when one operand is \Hex{0000000000000001}
1266
and the other is an odd number between \Hex{0020000000000001} and
1267
\Hex{002ffffffffffffd} inclusive (rounding to nearest).
1268
And still more surprising possibilities exist, such as
1269
\Hex{7f6001b4c67bc809}\thinspace+\thinspace\Hex{ff5ffb6a4534a3f7}.
1270
All eight families of solutions will be revealed some day in the fourth edition
1271
of {\sl Seminumerical Algorithms}.
1272
 
1273
\bull\
1274
@.FSUB@>
1275
This instruction is equivalent to \.{FADD}, but with the sign of~\$Z negated
1276
unless \$Z is a~NaN.
1277
 
1278
\bull\
1279
@.FMUL@>
1280
The floating point product $\rY\times\rZ$ is computed by
1281
the standard floating point conventions, and placed in register~X\null.
1282
An invalid exception occurs if
1283
the product is $(\pm0.0)\times(\pm\infty)$ or $(\pm\infty)\times(\pm0.0)$;
1284
in that case the result is $\pm\NaN(1/2)$. No exception occurs for the
1285
product $(\pm\infty)\times(\pm\infty)$. If neither \$Y nor~\$Z is a NaN,
1286
the sign of the result is the product of the signs of \$Y and~\$Z\null.
1287
 
1288
\bull\
1289
@.FDIV@>
1290
The floating point quotient $\rY\?/\rZ$ is computed by
1291
the standard floating point conventions, and placed in \$X\null.
1292
@^standard floating point conventions@>
1293
A floating divide by zero exception occurs if the
1294
quotient is $(\hbox{normal or subnormal})/(\pm0.0)$. An invalid exception occurs if
1295
the quotient is $(\pm0.0)/(\pm0.0)$ or $(\pm\infty)/(\pm\infty)$; in that case the
1296
result is $\pm\NaN(1/2)$. No exception occurs for the
1297
quotient $(\pm\infty)/(\pm0.0)$. If neither \$Y nor~\$Z is a NaN,
1298
the sign of the result is the product of the signs of \$Y and~\$Z\null.
1299
 
1300
If a floating point number in register X is known to have an exponent between
1301
2 and~2046, the instruction \
1302
 
1303
\bull\
1304
@.FREM@>
1305
The  floating point remainder $\rY\,{\rm rem}\,\rZ$ is computed by
1306
the standard floating point conventions, and placed in register~X\null.
1307
(The IEEE standard defines the remainder to be $\rY-n\times\rZ$,
1308
where $n$ is the nearest integer to $\rY/\rZ$, and $n$ is an even
1309
integer in case of ties. This is not the same as the remainder
1310
$\rY\bmod\rZ$ computed by \.{DIV} or \.{DIVU}.)
1311
A zero remainder has the sign of~\$Y\null.
1312
An invalid exception occurs if \$Y is infinite and/or \$Z is zero; in
1313
that case the result is $\NaN(1/2)$ with the sign of~\$Y\null.
1314
 
1315
\bull\
1316
@.FSQRT@>
1317
The floating point square root $\sqrt\rZ$ is computed by the
1318
standard floating point conventions, and placed in register~X\null.  An
1319
invalid exception occurs if \$Z is a negative number (either infinite, normal,
1320
or subnormal); in that case the result is $-\NaN(1/2)$. No exception occurs
1321
when taking the square root of $-0.0$ or $+\infty$. In all cases the sign of
1322
the result is the sign of~\$Z\null.
1323
 
1324
\bull\
1325
@.FINT@>
1326
The floating point number in register~Z is rounded (if
1327
necessary) to a floating point integer, using the current
1328
rounding mode, and placed in register~X\null. Infinite values and quiet NaNs
1329
are not changed; signaling NaNs are treated as in the standard conventions.
1330
Floating point overflow and underflow exceptions cannot occur.
1331
 
1332
The Y field of \.{FSQRT} and \.{FINT} can be used to specify a
1333
special rounding mode, as explained below.
1334
 
1335
@ Besides doing arithmetic, we need to compare floating point numbers
1336
with each other, taking proper account of NaNs and the fact that $-0.0$
1337
should be considered equal to $+0.0$. The following instructions are
1338
analogous to the comparison operators \.{CMP} and \.{CMPU} that we
1339
have used for integers.
1340
@^minus zero@>
1341
 
1342
\bull\
1343
@.FCMP@>
1344
Register X is set to $-1$ if $\rY<\rZ$ according to the conventions of
1345
floating point arithmetic, or to~1 if $\rY>\rZ$ according to those
1346
conventions. Otherwise it is set to~0. An invalid exception
1347
occurs if either \$Y or \$Z is a NaN; in such cases the result is zero.
1348
 
1349
\bull\
1350
@.FEQL@>
1351
Register X is set to 1 if $\rY=\rZ$ according to the conventions of
1352
floating point arithmetic. Otherwise it is set to~0. The result is zero if
1353
either \$Y or \$Z is a NaN, even if a NaN is being compared with itself.
1354
However, no invalid exception occurs, not even when \$Y or \$Z is a signaling
1355
NaN\null. (Perhaps \MMIX\ differs slightly from the IEEE standard in this
1356
regard, but programmers sometimes need to look at signaling NaNs without
1357
encountering side effects.
1358
Programmers who insist on raising
1359
an invalid exception whenever a signaling NaN is compared for floating equality
1360
should issue the instructions \
1361
\.{FEQL}~\.{\$X,\$Y,\$Z}.)
1362
 
1363
Suppose $w$, $x$, $y$, and $z$ are unsigned 64-bit integers with
1364
$w
1365
while the leftmost bits of $y$ and~$z$ are~1.
1366
Then we have $w
1367
as unsigned integers, but $y
1368
integers, because $y$ and~$z$ are negative. Furthermore, we have
1369
$z
1370
floating point numbers, assuming that no NaNs are present,
1371
because the leftmost bit of a floating point
1372
number represents its sign and the remaining bits represent its magnitude.
1373
The case $y=w$ occurs in floating point comparison
1374
 if and only if $y$ is the representation of $-0.0$
1375
and $w$ is the representation of $+0.0$.
1376
 
1377
\bull\
1378
@.FUN@>
1379
Register X is set to 1 if \$Y and \$Z are unordered according to the conventions
1380
of floating point arithmetic (namely, if either one is a NaN); otherwise
1381
register~X is set to~0. No invalid exception occurs, not even when \$Y or \$Z is
1382
a signaling NaN\null.
1383
 
1384
\smallskip
1385
The IEEE standard discusses 26 different possible
1386
relations on floating point numbers;
1387
\MMIX\ implements 14 of them with single instructions, followed by
1388
a branch (or by a \.{ZS} to make a ``pure'' 0~or~1 result); all 26
1389
can be evaluated with a sequence of at most four \MMIX\ commands
1390
and a subsequent branch. The
1391
hardest case to handle is `?$>=$' (unordered or greater or equal,
1392
to be computed without exceptions), for which the following
1393
sequence makes $\rX\ge0$ if and only if $\rY\mathrel?>=\rZ$:
1394
$$\vbox{\halign{&\tt#\hfil\ \cr
1395
  &FUN &\$255,\$Y,\$Z\cr
1396
  &BP  &\$255,1F&\% skip ahead if unordered\cr
1397
  &FCMP&\$X,\$Y,\$Z&\% \$X=[\$Y>\$Z]-[\$Y<\$Z]; no exceptions will arise\cr
1398
1H&CSNZ &\$X,\$255,1&\% \$X=1 if unordered\cr
1399
}}$$
1400
 
1401
@ Exercise: Suppose \MMIX\ had no \.{FINT} instruction. Explain how to
1402
@.FINT@>
1403
obtain the equivalent of \
1404
program should do the proper thing with respect to NaNs and exceptions.
1405
(For example, it should cause an invalid exception if and only if \$Z is
1406
a signaling NaN; it should cause an inexact exception only if \$Z needs
1407
to be rounded to another value.)
1408
@^emulation@>
1409
 
1410
Answer: (The assembler prefixes hexadecimal constants by \.\#.)
1411
$$\vbox{\halign{&\tt#\hfil\ \cr
1412
  &SETH &\$0,\char`\#4330&\% \$0=2\char`\^52\cr
1413
  &SET &\$1,\$Z&\% \$1=\$Z\cr
1414
  &ANDNH &\$1,\char`\#8000&\% \$1=abs(\$Z)\cr
1415
  &ANDN &\$2,\$Z,\$1&\% \$2=signbit(\$Z)\cr
1416
  &FUN &\$3,\$Z,\$Z&\% \$3=[\$Z is a NaN]\cr
1417
  &BNZ &\$3,1F&\% skip ahead if \$Z is a NaN\cr
1418
  &FCMP &\$3,\$1,\$0&\% \$3=[abs(\$Z)>2\char`\^52]-[abs(\$Z)<2\char`\^52]\cr
1419
  &CSNN &\$0,\$3,0&\% set \$0=0 if \$3>=0\cr
1420
  &OR  &\$0,\$2,\$0&\% attach sign of \$Z to \$0\cr
1421
1H\ &FADD &\$1,\$Z,\$0&\% \$1=\$Z+\$0\cr
1422
  &FSUB &\$X,\$1,\$0&\% \$X=\$1-\$0\cr}}$$
1423
This program handles most cases of interest by adding and subtracting
1424
$\pm2^{52}$ using floating point arithmetic.
1425
It would be incorrect to do this in all cases;
1426
for example, such addition/subtraction might fail to give the correct
1427
answer when \$Z is a small negative
1428
quantity (if rounding toward zero), or when \$Z is a number like
1429
$2^{105}+2^{53}$ (if rounding to nearest).
1430
 
1431
@ \MMIX\ goes beyond the IEEE standard to define additional relations
1432
between floating point numbers, as suggested by the theory in
1433
Section 4.2.2 of {\sl Seminumerical Algorithms}. Given a nonnegative
1434
number~$\epsilon$, each normal floating point number $u=(f,e)$ has
1435
a {\it neighborhood\/}
1436
$$N_\epsilon(u)=\{x\,\mid\,\vert x-u\vert\le 2^{e-1022}\epsilon\};$$
1437
we also define $N_\epsilon(0)=\{0\}$,
1438
$N_\epsilon(u)=\{x\mid\vert x-u\vert\le2^{-1021}\epsilon\}$ if $u$ is
1439
subnormal; $N_\epsilon(\pm\infty)=\{\pm\infty\}$ if $\epsilon<1$,
1440
$N_\epsilon(\pm\infty)=\{$everything except $\mp\infty\}$ if $1\le\epsilon<2$,
1441
$N_\epsilon(\pm\infty)=\{$everything$\}$ if $\epsilon\ge2$. Then we write
1442
$$\vbox{\halign{$u#v\ (\epsilon)$, &#\hfil\cr
1443
\prec&if $u
1444
\sim&if $u\in N_\epsilon(v)$ or $v\in N_\epsilon(u)$;\cr
1445
\approx&if $u\in N_\epsilon(v)$ and $v\in N_\epsilon(u)$;\cr
1446
\succ&if $u>N_\epsilon(v)$ and $N_\epsilon(u)>v$.\cr}}$$
1447
 
1448
\def\rE{{\rm rE}}
1449
\bull\
1450
@.FCMPE@>
1451
Register X is set to $-1$ if $\rY\prec\rZ\ \ (\rE)$ according to the
1452
conventions of {\sl Seminumerical Algorithms} as stated above; it is set to~1
1453
if $\rY\succ\rZ\ \ (\rE)$ according to those conventions; otherwise
1454
it is set to~0. Here rE is a floating point number in
1455
@^rE@>
1456
the special {\it epsilon register\/}, which is used only by the
1457
floating point comparison operations \.{FCMPE}, \.{FEQLE}, and \.{FUNE}.
1458
An invalid exception occurs, and the result is zero,
1459
if any of \$Y, \$Z, or rE are NaN, or if rE is negative.
1460
If no such exception occurs, exactly one of the three conditions
1461
$\rY\prec\rZ$, $\rY\sim\rZ$, $\rY\succ\rZ$ holds with respect to~rE.
1462
 
1463
\bull\
1464
@.FEQLE@>
1465
Register X is set to 1 if $\rY\approx\rZ\ \ (\rE)$ according to the
1466
conventions of {\sl Seminumerical Algorithms\/} as stated above; otherwise
1467
it is set to~0.
1468
An invalid exception occurs, and the result is zero,
1469
if any of \$Y, \$Z, or rE are NaN, or if rE is negative.
1470
Notice that the relation $\rY\approx\rZ$ computed by \.{FEQLE} is
1471
stronger than the relation $\rY\sim\rZ$ computed by \.{FCMPE}.
1472
 
1473
\bull\
1474
@.FUNE@>
1475
Register X is set to 1 if
1476
\$Y, \$Z, or~rE are exceptional as discussed for \.{FCMPE} and \.{FEQLE};
1477
otherwise it is set to~0. No exceptions occur, even if \$Y, \$Z, or~rE is
1478
a signaling NaN.
1479
 
1480
\smallskip\noindent
1481
Exercise: What floating point numbers does \.{FCMPE} regard
1482
as $\sim0.0$ with respect to
1483
$\epsilon=1/2$, when no exceptions arise? \ Answer: Zero, subnormal
1484
numbers, and normal numbers with $f=0$.
1485
(The numbers similar to zero with respect to~$\epsilon$ are zero,
1486
subnormal numbers with $f\le2\epsilon$, normal numbers with $f\le2\epsilon-1$,
1487
and $\pm\infty$ if $\epsilon>=1$.)
1488
 
1489
@ The IEEE standard also defines 32-bit floating point quantities, which
1490
it calls ``single format'' numbers. \MMIX\ calls them {\it short floats},
1491
@^short float@>
1492
and converts between 32-bit and 64-bit forms when such numbers are
1493
loaded from memory or stored into memory. A short float consists of a sign
1494
bit followed by an 8-bit exponent and a 23-bit fraction. After it has
1495
been loaded into one of\/ \MMIX's registers, its 52-bit fraction part
1496
will have 29 trailing zero bits, and its exponent~$e$ will be one of the
1497
256 values 0, $(01110000001)_2=897$, $(01110000010)_2=898$, \dots,
1498
$(10001111110)_2=1150$, or~2047, unless it was subnormal; a subnormal
1499
short float loads into a normal number with $874\le e\le896$.
1500
 
1501
\bull\
1502
@.LDSF@>
1503
Register~X is set to the 64-bit floating point number corresponding
1504
to the 32-bit floating point number represented by
1505
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
1506
No arithmetic exceptions occur, not even if a signaling NaN is loaded.
1507
 
1508
\bull\
1509
@.STSF@>
1510
The value obtained by rounding register~X to a 32-bit floating
1511
point number is placed in $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
1512
Rounding is done with the current rounding mode, in a manner
1513
exactly analogous to the standard conventions for rounding 64-bit results,
1514
except that the precision and exponent range are limited. In particular,
1515
floating overflow, underflow, and inexact exceptions might occur;
1516
a signaling NaN will trigger an invalid exception and it will become quiet.
1517
The fraction part of a NaN is truncated if necessary to a multiple of
1518
$2^{-23}$, by ignoring the least significant 29 bits.
1519
 
1520
If we load any two short floats and operate on them once with either \.{FADD},
1521
\.{FSUB}, \.{FMUL}, \.{FDIV}, \.{FREM}, \.{FSQRT}, or \.{FINT}, and if we then
1522
store the result as a short float, we obtain the results required by
1523
the IEEE standard for single format arithmetic, because
1524
the double format can be shown to have enough precision to avoid any
1525
problems of ``double rounding.'' But programmers are usually better
1526
off sticking
1527
to 64-bit arithmetic unless they have a strong reason to emulate the
1528
precise behavior of a 32-bit computer; 32 bits do not offer
1529
much precision.
1530
 
1531
@ Of course we need to be able to go back and forth between integers and
1532
floating point values.
1533
 
1534
\bull\
1535
@.FIX@>
1536
The floating point number in register~Z is converted to an integer
1537
as with the \.{FINT} instruction, and the resulting integer (mod~$2^{64}$)
1538
is placed in register~X\null.
1539
An invalid exception occurs if \$Z is infinite
1540
or a NaN; in that case \$X is simply set equal to~\$Z\null. A float-to-fix
1541
exception occurs if the result is less than
1542
@^float-to-fix exception@>
1543
@^short float@>
1544
$-2^{63}$ or greater than $2^{63}-1$.
1545
 
1546
\bull\
1547
@.FIXU@>
1548
This instruction is identical to \.{FIX} except that no float-to-fix
1549
exception occurs.
1550
 
1551
\bull\
1552
@.FLOT@>
1553
The integer in \$Z or the immediate constant~Z is
1554
converted to the nearest floating point value (using the current rounding
1555
mode) and placed in register~X\null. A floating inexact exception
1556
occurs if rounding is necessary.
1557
 
1558
\bull\
1559
@.FLOTU@>
1560
\.{FLOTU} is like \.{FLOT}, but \$Z is treated as an unsigned integer.
1561
 
1562
\bull\
1563
\
1564
@.SFLOT@>
1565
@.SFLOTU@>
1566
The \.{SFLOT} instructions are like the \.{FLOT} instructions, except that
1567
they round to a floating point number whose fraction part is a multiple
1568
of $2^{-23}$. (Thus, the resulting value will not be changed by a ``store
1569
short float'' instruction.) Such conversions appear in \MMIX's repertoire only
1570
to establish complete conformance with the IEEE standard; a programmer
1571
needs them only when emulating a 32-bit machine.
1572
@^emulation@>
1573
 
1574
@ Since the variants of \.{FIX} and \.{FLOT} involve only one input operand (\$Z
1575
or~Z), their Y~field is normally zero. A programmer can, however, force the
1576
mode of rounding used with these commands by setting
1577
$$\vbox{\halign{$\yy=#$,\quad &\.{ROUND\_#};\hfil\cr
1578
1&OFF\cr
1579
2&UP\cr
1580
3&DOWN\cr
1581
4&NEAR\cr}}$$
1582
for example, the instruction \
1583
exponent~$e$ of register~X to $1086-l$ if \$Z is a nonzero quantity with
1584
$l$~leading zero bits. Thus we can count leading zeros by continuing
1585
with \.{SETL}~\.{\$0,1086}; \.{SR}~\.{\$X,\$X,52}; \.{SUB}~\.{\$X,\$0,\$X};
1586
\.{CSZ}~\.{\$X,\$Z,64}.
1587
@^counting leading zeros@>
1588
@.FLOT@>
1589
@.FLOTU@>
1590
@.SFLOT@>
1591
@.SFLOTU@>
1592
@.FIX@>
1593
@.FIXU@>
1594
@:ROUND_OFF}\.{ROUND\_OFF@>
1595
@:ROUND_UP}\.{ROUND\_UP@>
1596
@:ROUND_DOWN}\.{ROUND\_DOWN@>
1597
@:ROUND_NEAR}\.{ROUND\_NEAR@>
1598
 
1599
The Y field can also be used in the same way
1600
to specify any desired rounding mode in the other
1601
floating point instructions that have only a single operand, namely
1602
\.{FSQRT} and~\.{FINT}.
1603
@.FSQRT@>
1604
@.FINT@>
1605
An illegal instruction interrupt occurs if Y exceeds~4 in any of these
1606
commands.
1607
@^illegal instructions@>
1608
 
1609
@* Subroutine linkage.
1610
\MMIX\ has several special operations designed to facilitate the process of
1611
calling and implementing subroutines. The key notion is the idea of a
1612
hardware-supported {\it register stack}, which can coexist with a
1613
software-supported stack of variables that are not maintained in registers.
1614
From a programmer's standpoint, \MMIX\ maintains a potentially unbounded list
1615
$S[0]$, $S[1]$, \dots,~$S[\tau-1]$ of octabytes holding the contents
1616
of registers that are temporarily inaccessible; initially $\tau=0$.
1617
When a subroutine is entered, registers can be ``pushed'' on to the end of
1618
this list, increasing~$\tau$; when the subroutine has finished its
1619
execution, the registers are ``popped'' off again and $\tau$~decreases.
1620
 
1621
Our discussion so far has treated all 256 registers \$0, \$1, \dots,~\$255 as if
1622
they were alike. But in fact, \MMIX\ maintains two internal one-byte counters
1623
$L$ and~$G$, where $0\le\ll\le\gg<256$, with the property that
1624
$$\vbox{\halign{#\hfil\cr
1625
registers 0, 1, \dots, $\ll-1$ are ``local'';\cr
1626
registers @!|L|, $\ll+1$, \dots, $\gg-1$ are ``marginal'';\cr
1627
registers @!|G|, $\gg+1$, \dots, 255 are ``global.''\cr}}$$
1628
A marginal register is zero when its value is read.
1629
@^illegal instructions@>
1630
@^rG@>
1631
@^rL@>
1632
@^local registers@>
1633
@^marginal registers@>
1634
@^global registers@>
1635
@^register stack@>
1636
 
1637
The $G$ counter is normally set to a fixed value once and for all when a program
1638
is loaded, thereby defining the number of program variables that will live
1639
entirely in registers rather than in memory during the course of execution.
1640
A programmer may, however, change~$G$ dynamically using the \.{PUT}
1641
instruction described below.
1642
 
1643
The $L$ counter starts at 0. If an instruction places a value into a register
1644
that is currently marginal, namely a register $x$ such that
1645
$\ll\le x<\gg$, the value of~$L$ will increase to $x+1$, and any
1646
newly local registers will be zero. For example, if $\ll=10$ and
1647
$\gg=200$, the instruction \
1648
instruction \
1649
\$15 to $\$5+\$200$, and $L$~to~16. (The process of clearing registers and
1650
increasing~$L$ might take quite a few machine cycles in the worst case. We will
1651
see later that \MMIX\ is able to take care of any high-priority interrupts
1652
that might occur during this time.)
1653
 
1654
\bull\
1655
\bul\
1656
@.PUSHGO@>
1657
@.PUSHJ@>
1658
Suppose first that $\xx<\ll$.
1659
Register~X is set equal to the number~X, then
1660
registers 0, 1, \dots,~X are pushed onto the register stack as
1661
described below.
1662
If this instruction is in
1663
location $\lambda$, the value $\lambda+4$ is placed into the special {\it
1664
return-jump register\/}~rJ\null. Then control jumps to instruction
1665
@^rJ@>
1666
$\lambda+4\rm YZ$ or $\lambda+4\rm YZ-262144$ or
1667
$\rY+\rZ$ or $\rY+\zz$, as in a
1668
\.{JMP} or \.{GO} command.
1669
 
1670
Pushing the first $\xx+1$ registers onto the stack means essentially that we
1671
set $S[\tau]\gets\$0$, $S[\tau+1]\gets\$1$, \dots, $S[\tau+\xx]\gets\$\xx$,
1672
$\tau\gets\tau+\xx+1$, $\$0\gets\$(\xx+1)$, \dots,
1673
$\$(\ll-\xx-2)\gets\$(\ll-1)$, $\ll\gets\ll-\xx-1$. For example, if
1674
$\xx=1$ and $\ll=5$, the current contents of \$0 and the number~1 are
1675
placed on the register stack, where they will be temporarily inaccessible.
1676
Then control jumps to a subroutine with $L$ reduced to~3; the registers that we
1677
had been calling \$2, \$3, and \$4 appear as \$0, \$1, and \$2 to the subroutine.
1678
 
1679
If $\ll\le\xx<\gg$, the value of $\ll$ increases to $\xx+1$ as described
1680
above; then the rules for $\xx<\ll$ apply.
1681
 
1682
If $\xx\ge\gg$ the actions are similar, except that {\it all\/} of the local
1683
registers \$0, \dots,~$\$(\ll-1)$ are placed on the register stack
1684
followed by the number~$L$, and $L$~is reset to zero. In particular, the
1685
instruction \
1686
onto the stack and sets $L$ to zero, regardless of the previous value of~$L$.
1687
 
1688
We will see later that \MMIX\ is able to achieve the effect of pushing and
1689
renaming local registers without actually doing very much work at all.
1690
 
1691
\bull\
1692
@.POP@>
1693
This command preserves X of the current local registers,
1694
undoes the effect of the most recent \.{PUSHJ} or \.{PUSHGO}, and jumps
1695
to the instruction in $\mm_4[{\rm4YZ+rJ}]$. If $\xx>0$, the value of
1696
$\$(\xx-1)$ goes into the ``hole'' position where \.{PUSHJ} or
1697
\.{PUSHGO} stored the number of registers previously pushed.
1698
 
1699
The formal details of \.{POP} are slightly complicated, but we will see that
1700
they make sense: If $\xx>\ll$, we first replace X by $\ll+1$. Then we
1701
set $x\gets S[\tau-1]\bmod 256$; this is the effective value of the X~field
1702
in the push instruction that is being undone. Stack position $S[\tau-1]$ is
1703
now set to $\$(\xx-1)$ if $0<\xx\le L$, otherwise it is set to zero.
1704
Then we essentially set
1705
$\ll\gets\min(x+\xx,\gg)$, $\$(\ll-1)\gets\$(\ll-x-2)$, \dots,
1706
$\$(x+1)\gets\$0$, $\$x\gets S[\tau-1]$, \dots,
1707
$\$0\gets S[\tau-x-1]$, $\tau\gets\tau-x-1$. The operating system should
1708
@^operating system@>
1709
arrange things so that a memory-protection
1710
interrupt will occur if a program does more pops than pushes.
1711
(If $x>\gg$, these formulas don't make sense as written; we actually
1712
set $\$j\gets S[\tau-x-1+j]$ for $\ll>j\ge0$ in that rare case.)
1713
 
1714
Suppose, for example, that a subroutine has three input parameters
1715
$(\$0,\$1,\$2)$ and produces two outputs $(\$0,\$1)$. If the subroutine does
1716
not call any other subroutines, it can simply end with \.{POP} \.{2,0},
1717
because rJ will contain the return address. Otherwise it should begin by
1718
saving rJ, for example with the instruction \
1719
using local registers \$0 through~\$3, and it should use \
1720
\
1721
calling sub-subroutines; finally it should \
1722
saying \.{POP}~\.{2,0}. To call the subroutine from another routine that
1723
has, say, 6~local registers, we would put the input arguments into \$7, \$8,
1724
and~\$9, then issue the command \.{PUSHGO} \.{\$6,base,Subr};
1725
in due time the outputs of the subroutine will appear in \$7 and~\$6.
1726
 
1727
Notice that the push and pop commands make use of a one-place ``hole'' in the
1728
register stack, between the registers that are pushed down and the registers
1729
that remain local. (The hole is position \$6 in the example just considered.)
1730
\MMIX\ needs this hole position to remember the number of
1731
registers that are pushed down.
1732
A subroutine with no outputs ends with \
1733
(becomes marginal). A subroutine with one output~\$0 ends with \
1734
the hole gets the former value of~\$0. A subroutine with two outputs
1735
$(\$0,\$1)$ ends with \
1736
this case, therefore, the relative order of the two outputs has been switched
1737
on the register stack. If a subroutine has, say, five outputs
1738
$(\$0,\ldots,\$4)$, it ends with \
1739
where it is followed by $(\$0,\$1,\$2,\$3)$.
1740
\MMIX\ makes this curious permutation in the case of multiple outputs because
1741
the hole is most easily plugged by moving one value down (namely~\$4) instead
1742
of by sliding each of five values down in the stack.
1743
 
1744
These conventions for parameter passing are admittedly a bit confusing in the
1745
general case, and I~suppose people who use them extensively might someday find
1746
themselves talking about ``the infamous \MMIX\ register shuffle.'' However,
1747
there is good use for subroutines that convert
1748
a sequence of register contents like $(x,a,b,c)$ into $(f,a,b,c)$ where
1749
$f$ is a function of $a$, $b$, and $c$ but not~$x$. Moreover,
1750
\.{PUSHGO} and \.{POP} can be implemented with great efficiency,
1751
and subroutine linkage tends to be a significant bottleneck when
1752
other conventions are used.
1753
 
1754
Information about a subroutine's calling conventions needs to be communicated
1755
to a debugger. That can readily be done at the same time as we inform the
1756
debugger about the symbolic names of addresses in memory.
1757
 
1758
A subroutine that uses 50 local registers will not function properly if it is
1759
called by a program that sets $G$ less than~50. \MMIX\ does not allow the
1760
value of~$G$ to become less than~32. Therefore any subroutine that avoids
1761
global registers and uses at most~32 local registers
1762
can be sure to work properly regardless of the current value of~$G$.
1763
 
1764
The rules stated above imply that a \.{PUSHJ} or
1765
\.{PUSHGO} instruction with $\xx=255$ pushes all of the currently defined
1766
local registers onto the stack and sets $L$ to~zero.
1767
This makes $G$ local registers available for use by the subroutine
1768
jumped~to. If that subroutine later returns with \.{POP} \.{0,0}, the former
1769
value of~$L$ and the former contents of \$0, \dots,~$\$(\ll-1)$ will be
1770
restored (assuming that $G$ doesn't decrease).
1771
 
1772
A \.{POP} instruction with $\xx=255$
1773
preserves all the local registers as outputs of
1774
the subroutine (provided that the total doesn't exceed~$G$ after popping),
1775
and puts zero into the hole (unless $L=G=255$). The best policy, however, is
1776
almost always to use \.{POP} with a small value of~X, and in general to keep
1777
the value of~$L$ as small as
1778
possible by decreasing it when registers are no longer active.
1779
A smaller value of~$L$ means that \MMIX\ can change context more
1780
easily when switching from one process to another.
1781
 
1782
@* System considerations.
1783
High-performance implementations of\/ \MMIX\ gain speed by keeping {\it
1784
caches\/} of instructions and data that are likely to be needed as computation
1785
@^caches@>
1786
proceeds. [See M.~V. Wilkes, {\sl IEEE Transactions\/ \bf EC-14} (1965),
1787
270--271; J.~S. Liptay, {\sl IBM System J. \bf7} (1968), 15--21.]
1788
@^Wilkes, Maurice Vincent@>
1789
@^Liptay, John S.@>
1790
Careful programmers can make the computer run even faster by giving
1791
hints about how to maintain such caches.
1792
 
1793
\bull\
1794
@.LDUNC@>
1795
These instructions, which have the same meaning as \.{LDO}, also
1796
inform the computer that the loaded octabyte (and its neighbors in a cache
1797
block) will probably not be read or written in the near future.
1798
 
1799
\bull\
1800
@.STUNC@>
1801
These instructions, which have the same meaning as \.{STO}, also
1802
inform the computer that the stored octabyte (and its neighbors in a cache
1803
block) will probably not be read or written in the near future.
1804
 
1805
\bull\
1806
@.PRELD@>
1807
These instructions have no effect on registers or memory, but they inform the
1808
computer that many of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
1809
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
1810
will probably be loaded and/or stored in the near future.
1811
No protection failure occurs if the memory is not accessible.
1812
 
1813
\bull\
1814
@.PREGO@>
1815
These instructions have no effect on registers or memory, but they inform the
1816
computer that many of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
1817
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
1818
will probably be used as instructions in the near future.
1819
No protection failure occurs if the memory is not accessible.
1820
 
1821
\bull\
1822
@.PREST@>
1823
These instructions have no effect on registers or memory if the computer has
1824
no data cache. But when such a cache exists, they inform the
1825
computer that all of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
1826
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
1827
will definitely be stored in the near future before they are loaded.
1828
(Therefore it is permissible for the machine to ignore the present contents of
1829
those bytes. Also, if those bytes are being shared by several processors,
1830
the current processor should try to acquire exclusive access.)
1831
No protection failure occurs if the memory is not accessible.
1832
 
1833
\bull\
1834
@.SYNCD@>
1835
When executed from nonnegative locations, these instructions have no effect on
1836
registers or memory if neither a write buffer nor a ``write back''
1837
data cache are present. But when such a buffer or cache exists, they force the
1838
computer to make sure that all data for the $\xx+1$ bytes
1839
$\mm[\rY+\rZ]$ through $\mm[\rY+\rZ+\xx]$, or
1840
$\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
1841
will be present in memory.
1842
(Otherwise the result of a previous store instruction might appear only
1843
in the cache; the computer is being told that now is the time to
1844
write the information back, if it hasn't already been written. A program
1845
can use this feature before outputting directly from memory.)
1846
No protection failure occurs if the memory is not accessible.
1847
 
1848
The action is similar when \.{SYNCD} is executed from a negative address,
1849
but in this case the specified bytes are also removed from the data
1850
cache (and from a secondary cache, if present). The operating system can
1851
use this feature when a page of virtual memory is being swapped out,
1852
or when data is input directly into memory.
1853
@^operating system@>
1854
 
1855
\bull\
1856
@.SYNCID@>
1857
When executed from nonnegative locations these instructions have no effect on
1858
registers or memory if the computer has no instruction cache separate from a
1859
data cache. But when such a cache exists, they force the
1860
computer to make sure that the $\xx+1$ bytes
1861
$\mm[\rY+\rZ]$ through $\mm[\rY+\rZ+\xx]$, or
1862
$\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
1863
will be interpreted correctly
1864
if used as instructions before they are next modified.
1865
(Generally speaking, an \MMIX\ program is not expected to store anything in
1866
memory locations that are also being used as instructions.
1867
Therefore \MMIX's instruction cache is allowed to become inconsistent with
1868
respect to its data cache. Programmers who insist on executing instructions
1869
that have been fabricated dynamically, for example when setting a breakpoint
1870
for debugging, must first \.{SYNCID} those instructions
1871
in order to guarantee that the intended results will be obtained.) A \.{SYNCID}
1872
command might be implemented in several ways; for example, the machine
1873
might update its instruction cache to agree with its data cache. A simpler
1874
solution, which is good enough because the need for \.{SYNCID} ought to
1875
be rare, removes instructions in the specified range
1876
from the instruction cache, if
1877
present, so that they will have to be fetched from memory the next time
1878
they are needed; in this case the machine also carries out the effect of
1879
a~\.{SYNCD} command.
1880
No protection failure occurs if the memory is not accessible.
1881
 
1882
The behavior is more drastic, but faster, when \.{SYNCID} is executed
1883
from a negative location. Then all bytes in the specified range are
1884
simply removed from all caches, and the memory corresponding to
1885
any ``dirty'' cache blocks involving such bytes is {\it not\/} brought up
1886
to date. An operating system can use this version of the command
1887
when pages of virtual memory are being discarded (for example, when
1888
a program is being terminated).
1889
 
1890
@ \MMIX\ is designed to work not only on a single processor but also
1891
in situations where several processors
1892
share a common memory. The following commands are useful
1893
for efficient operation in such circumstances.
1894
 
1895
\bull\
1896
@.CSWAP@>
1897
If the octabyte $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$
1898
is equal to the contents of the special {\it prediction register\/}~rP,
1899
@^rP@>
1900
it is replaced in memory with the contents of register~X, and
1901
register~X is set equal to~1. Otherwise the octabyte in memory
1902
replaces rP and register~X is set to zero.
1903
This is an atomic (indivisible, uninterruptible) operation,
1904
useful for interprocess communication
1905
when independent computers are sharing the same memory.
1906
 
1907
The compare-and-swap operation was introduced by IBM in late
1908
models of the
1909
@^IBM Corporation@>
1910
@^compare-and-swap@>
1911
@^atomic instruction@>
1912
System/370 architecture, and it soon spread to several
1913
@^System/370@>
1914
other machines. Significant ways to use it are discussed, for example,
1915
in section 7.2.3 of Harold Stone's
1916
{\sl High-Performance Computer Architecture\/} (Reading, Massachusetts:\
1917
Addison--Wesley, 1987), and in sections 8.2 and 8.3 of {\sl Transaction
1918
Processing\/} by Jim Gray and Andreas Reuter (San Francisco:\ Morgan
1919
Kaufmann, 1993). % Kaufmann: stet
1920
@^Stone, Harold Stuart@>
1921
@^Gray, James Nicholas@>
1922
@^Reuter, Andreas Horst@>
1923
 
1924
\bull\
1925
@.SYNC@>
1926
If $\rm XYZ=0$, the machine drains its pipeline (that is, it
1927
stalls until all preceding instructions have completed their activity).
1928
If $\rm XYZ=1$, the machine controls its actions less drastically,
1929
in such a way that all
1930
store instructions preceding this \.{SYNC} will be completed
1931
before all store instructions after it.
1932
If $\rm XYZ=2$, the machine controls its actions in such a way that all
1933
load instructions preceding this \.{SYNC} will be completed
1934
before all load instructions after it.
1935
If $\rm XYZ=3$, the machine controls its actions
1936
in such a way that all {\it load or store\/} instructions preceding this
1937
\.{SYNC} will be completed before all load or store instructions after it.
1938
If $\rm XYZ=4$, the machine goes into a power-saver mode, in which
1939
@^power-saver mode@>
1940
instructions may be executed more slowly (or not at all) until some kind
1941
of ``wake-up'' signal is received.
1942
If $\rm XYZ=5$, the machine empties its write buffer and
1943
cleans its data caches, if any (including a possible secondary cache);
1944
the caches retain their data,
1945
but the cache contents also appear in memory.
1946
If $\rm XYZ=6$, the machine clears its virtual address translation
1947
caches (see below).
1948
If $\rm XYZ=7$, the machine clears its instruction and data caches,
1949
discarding any information in the data caches that wasn't previously
1950
in memory. (``Clearing'' is stronger than ``cleaning''; a clear cache
1951
remembers nothing. Clearing is also faster, because it simply obliterates
1952
everything.)
1953
If $\rm XYZ>7$, an illegal instruction interrupt occurs.
1954
 
1955
Of course no \.{SYNC} is necessary between a command that loads from or stores
1956
into memory and a subsequent command that loads from or stores into exactly
1957
the same location. However, \.{SYNC} might be necessary in certain cases even
1958
on a one-processor system, because input/output processes take place in
1959
parallel with ordinary computation.
1960
 
1961
The cases $\rm XYZ>3$ are {\it privileged}, in the sense that
1962
only the operating system can use them. More precisely, if a \.{SYNC}
1963
command is encountered with $\rm XYZ=4$ or
1964
$\rm XYZ=5$ or $\rm XYZ=6$ or $\rm XYZ=7$,
1965
a ``privileged instruction interrupt'' occurs unless that interrupt
1966
is currently disabled. Only the operating system can disable
1967
interrupts (see below).
1968
@^privileged operations@>
1969
 
1970
@* Trips and traps.
1971
Special register rA records the current status information
1972
about arithmetic exceptions. Its least significant byte contains eight
1973
``event'' bits called DVWIOUZX from left to right, where D stands for
1974
integer divide check, V~for integer overflow, W~for float-to-fix overflow,
1975
I~for invalid operation, O~for floating overflow, U~for
1976
floating underflow, Z~for floating division by zero, and X~for floating
1977
inexact. % The low order five bits agree with SPARC I conventions
1978
% but Alpha, for example, uses the order VXUOZI
1979
The next least significant byte of rA contains eight
1980
``enable'' bits with the same names DVWIOUZX and the same meanings.
1981
When an exceptional condition occurs, there are two cases: If the
1982
corresponding enable bit is~0, the corresponding event bit is set
1983
to~1.  But if the corresponding enable bit is~1, \MMIX\ interrupts
1984
its current instruction stream and executes a special ``exception
1985
handler.'' Thus, the event bits record exceptions that have not been
1986
``tripped.''
1987
@^overflow@>
1988
@^underflow@>
1989
@^exceptions@>
1990
@^handlers@>
1991
@^float-to-fix exception@>
1992
@^inexact exception@>
1993
@^invalid exception@>
1994
@^divide check exception@>
1995
 
1996
Floating point overflow always causes two exceptions, O and~X\null.
1997
(The strictest interpretation of the IEEE standard would raise exception~X
1998
on overflow only if floating overflow is not enabled, but \MMIX\ always
1999
considers an overflowed result to be inexact.)
2000
Floating point underflow always causes both U and~X when underflow is
2001
not enabled, and it might cause both U and~X when underflow is enabled.
2002
If both enable bits are set to~1 in such cases, the overflow or underflow
2003
handler is called and the inexact handler is ignored. All other types
2004
of exceptions arise one at a time, so there is no ambiguity about which
2005
exception handler should be invoked unless exceptions are raised by
2006
``ropcode~2'' (see below); in general the first enabled exception
2007
in the list DVWIOUZX takes precedence.
2008
 
2009
What about the six high-order bytes of the status register rA?
2010
@^rA@>
2011
@^rounding modes@>
2012
At present, only two of those 48 bits are defined;
2013
the others must be zero for compatibility
2014
with possible future extensions. The two bits corresponding to $2^{17}$
2015
and $2^{16}$ in rA specify a rounding mode, as follows: 00~means
2016
round to nearest (the default); 01~means round off (toward zero);
2017
10~means round up (toward positive infinity); and
2018
11~means round down (toward negative infinity).
2019
% Alpha conventions differ: 10,00,11,01 for nearest,off,up,down
2020
 
2021
@ The execution of\/ \MMIX\ programs can be interrupted in several ways.
2022
We have just seen that arithmetic exceptions will cause interrupts if
2023
they are enabled; so will illegal or privileged instructions, or instructions
2024
@^illegal instructions@>
2025
@^privileged operations@>
2026
@^emulation@>
2027
@^interrupts@>
2028
@^I/O@>
2029
@^input/output@>
2030
that are emulated in software instead of provided by the hardware.
2031
Input/output operations or external timers are another common source
2032
of interrupts; the operating system knows how to deal with
2033
all gadgets that might be hooked up to an \MMIX\ processor chip.
2034
Interrupts occur also when memory accesses fail---for example if
2035
memory is nonexistent or protected.
2036
Power failures that force the machine to use its backup battery power
2037
in order to keep running in an emergency,
2038
or hardware failures like parity errors,
2039
all must be handled as gracefully as possible.
2040
 
2041
Users can also force interrupts to happen by giving explicit \.{TRAP} or
2042
\.{TRIP} instructions:
2043
 
2044
\bull\
2045
@.TRIP@>
2046
@.TRAP@>
2047
Both of these instructions interrupt processing and transfer control
2048
to a handler. The difference between them is that \.{TRAP}
2049
is handled by the operating system but \.{TRIP} is handled by the user.
2050
@^operating system@>
2051
More precisely, the X, Y, and Z fields of \.{TRAP} have special significance
2052
predefined by the operating system kernel. For example, a system call---say an I/O
2053
command, or a command to allocate more memory---might be invoked
2054
by certain settings of X, Y, and~Z\null.
2055
The X, Y, and Z fields of \.{TRIP}, on the other hand, are definable by
2056
users for their own applications, and users also define their own
2057
handlers. ``Trip handler'' programs
2058
invoked by \.{TRIP} are interruptible, but interrupts are normally inhibited
2059
while a \.{TRAP} is being serviced. Specific details about the
2060
precise actions of \.{TRIP} and \.{TRAP} appear below, together
2061
with the description of another command called \.{RESUME} that
2062
returns control from a handler to the interrupted program.
2063
 
2064
Only two variants of \.{TRAP} are predefined by the \MMIX\ architecture:
2065
If $\rm XYZ=0$ in a \.{TRAP}
2066
command, a user process should terminate. If $\rm XYZ=1$,
2067
the operating system should provide default action for cases in which
2068
the user has not provided any handler for a particular
2069
kind of interrupt (see below).
2070
 
2071
A few additional variants of \.{TRAP} are predefined in the rudimentary
2072
operating system used with \MMIX\ simulators. These variants, which
2073
allow simple input/output operations to be done, all have $\xx=0$,
2074
and the Y~field is a small positive constant. For example, $\yy=1$ invokes
2075
the \.{Fopen} routine, which opens a file. (See the program
2076
{\mc MMIX-SIM} for full details.)
2077
@^I/O@>
2078
@^input/output@>
2079
 
2080
@ Non-catastrophic interrupts in \MMIX\ are always {\it precise}, in the sense that all legal
2081
instructions before a certain point have effectively been executed, and
2082
no instructions after that point have yet been executed. The current
2083
instruction, which may or may not have been completed at the time of
2084
interrupt and which may or may not need to be resumed after the interrupt has
2085
been serviced, is
2086
put into the special {\it execution register\/}~rX, and its operands (if any)
2087
are placed in special registers rY and~rZ\null. The address of the following
2088
instruction is placed in the special {\it where-interrupted
2089
register\/}~rW\null.
2090
@^interrupts@>
2091
@^rW@>
2092
@^rX@>
2093
@^rY@>
2094
@^rZ@>
2095
The instruction in~rX may not be the same as the instruction in
2096
location $\rm rW-4$; for example, it may be an instruction that
2097
branched or jumped to~rW\null. It might also be an instruction
2098
inserted internally by the \MMIX\ processor.
2099
(For example, the computer silently inserts an internal instruction
2100
that increases~$L$ before an instruction
2101
like \
2102
occurs, between the inserted instruction and the \.{ADD},
2103
the instruction in~rX will
2104
say \.{ADD}, because an internal instruction retains the identity of the
2105
actual command that spawned it; but rW will point to the {\it real\/}
2106
\.{ADD} command.)
2107
 
2108
When an instruction has the normal meaning ``set \$X to
2109
the result of \$Y~op~\$Z'' or ``set \$X to the result of \$Y~op~Z,''
2110
special registers rY and~rZ will relate in the
2111
obvious way to the Y and~Z operands of the instruction; but this is not
2112
always the case. For example, after an interrupted
2113
store instruction, the first operand~rY will hold
2114
the virtual memory address (\$Y plus either \$Z or~Z),
2115
and the second operand~rZ will be the octabyte to be stored in memory
2116
(including bytes that have not changed, in cases like \.{STB}). In
2117
other cases the actual
2118
contents of rY and~rZ are defined by each implementation of\/ \MMIX,
2119
and programmers should not rely on their significance.
2120
 
2121
Some instructions take an unpredictable and possibly long amount of time, so
2122
it may be necessary to interrupt them in progress. For example, the \.{FREM}
2123
@.FREM@>
2124
instruction (floating point remainder) is extremely difficult to compute
2125
rapidly if its first operand has an exponent of~2046 and its second operand
2126
has an exponent of~1. In such cases the rY and rZ registers saved during an
2127
interrupt show the current state of the computation, not necessarily the
2128
original values of the operands. The value of $\rm rY\,{rem}\,rZ$ will still
2129
be the desired remainder, but rY may well have been reduced to a
2130
number that has an exponent closer to the exponent of~rZ\null.
2131
After the interrupt has been processed, the remainder
2132
computation will continue where it left off.
2133
(Alternatively, an operation like \.{FREM} or even \.{FADD} might be
2134
implemented in software instead of hardware, as we will see later.)
2135
 
2136
Another example arises with an instruction like \.{PREST} (prestore), which can
2137
@.PREST@>
2138
specify prestoring up to 256 bytes. An implementation of\/ \MMIX\ might choose
2139
to prestore only 32 or 64 bytes at a time, depending on the cache block size;
2140
then it can change the contents of rX to reflect the unfinished part of
2141
a partially completed \.{PREST} command.
2142
 
2143
Commands that decrease $G$, pop the stack, save the
2144
current context, or unsave an old context also are interruptible. Register~rX
2145
is used to communicate information about partial completion in such a
2146
way that the interruption will be essentially ``invisible'' after
2147
a program is resumed.
2148
 
2149
@ Three kinds of interruption are possible: trips, forced traps, and
2150
dynamic traps. We will discuss each of these in turn.
2151
@^interrupts@>
2152
@^trips@>
2153
@^traps@>
2154
@^forced traps@>
2155
@^dynamic traps@>
2156
@^handlers@>
2157
@^operating system@>
2158
 
2159
A \.{TRIP} instruction puts itself into the right half of the execution
2160
@.TRIP@>
2161
register~rX, and sets the 32 bits of the left half to \Hex{80000000}.
2162
(Therefore rX is {\it negative\/}; this fact will
2163
tell the \.{RESUME} command not to \.{TRIP} again.) The special registers
2164
rY and rZ are set to the contents of the registers specified by the
2165
Y and Z fields of the \.{TRIP} command, namely \$Y and~\$Z.
2166
Then \$255 is placed into the special {\it bootstrap
2167
register\/}~rB, and \$255 is set to~rJ. \MMIX\ now takes its next instruction
2168
@^rB@>
2169
from virtual memory address~0.
2170
 
2171
Arithmetic exceptions interrupt the computation in essentially the
2172
same way as \.{TRIP}, if they are enabled. The only difference is that
2173
their handlers begin at the respective addresses
2174
16, 32, 48, 64, 80, 96, 112, and~128, for exception bits D, V, W, I, O, U,
2175
Z, and~X of~rA; registers rY and~rZ are set to the operands of the
2176
interrupted instruction as explained earlier.
2177
 
2178
A 16-byte block of memory is just enough for a sequence of commands like
2179
$$\hbox{\tt PUSHJ 255,Handler; PUT rJ,\$255; GET \$255,rB; RESUME}$$
2180
which will invoke a user's handler. And if the user does not choose to
2181
provide a custom-designed handler, the operating system provides a
2182
default handler via the instructions
2183
$$\hbox{\tt TRAP 1; GET \$255,rB; RESUME.}$$
2184
 
2185
A trip handler might simply record the fact that tripping occurred.
2186
But the handler for an arithmetic interrupt might want to change the
2187
default result of a computation. In such cases, the handler should place
2188
the desired substitute result into~rZ, and it should change the most
2189
significant byte of~rX from \Hex{80} to \Hex{02}. This will have the desired
2190
effect, because of the rules of \.{RESUME} explained below, {\it unless\/}
2191
the exception occurred on a command like \.{STB} or \.{STSF}. (A~bit more
2192
work is needed to alter the effect of a command that stores into memory.)
2193
 
2194
Instructions in {\it negative\/} virtual locations do not invoke trip
2195
handlers, either for \.{TRIP} or for arithmetic exceptions. Such instructions
2196
are reserved for the operating system, as we will see.
2197
@^negative locations@>
2198
 
2199
@ A \.{TRAP} instruction interrupts the computation essentially
2200
@^interrupts@>
2201
like \.{TRIP}, but with the following modifications:
2202
@^rT@>
2203
@.TRAP@>
2204
@^rK@>
2205
(i)~the interrupt mask register~rK is cleared
2206
to zero, thereby inhibiting interrupts; (ii)~control jumps to virtual memory
2207
address~rT, not zero; (iii)~information is placed
2208
@^rBB@>
2209
@^rWW@>
2210
@^rXX@>
2211
@^rYY@>
2212
@^rZZ@>
2213
in a separate set of special registers rBB, rWW, rXX, rYY, and~rZZ, instead of
2214
rB, rW, rX, rY, and~rZ\null. (These special registers are needed because a trap
2215
might occur while processing a \.{TRIP}.)
2216
 
2217
Another kind of forced trap occurs on implementations of\/ \MMIX\ that
2218
emulate certain instructions in software rather than in hardware.
2219
Such instructions cause a \.{TRAP} even though their opcode is something
2220
else like \.{FREM} or \.{FADD} or \.{DIV}. The trap handler can tell
2221
what instruction to emulate by looking at the opcode, which appears
2222
in~rXX\null.
2223
In such cases the left-hand half of~rXX is set to \Hex{02000000}; the handler
2224
emulating \.{FADD}, say, should compute the floating point sum of rYY and~rZZ
2225
and place the result in~rZZ\null. A~subsequent
2226
\.{RESUME}~\.1 will then place the value of~rZZ in the proper register.
2227
@^emulation@>
2228
@^forced traps@>
2229
 
2230
Implementations of\/ \MMIX\ might also emulate the process of
2231
virtual-address-to-physical-address translation described below,
2232
instead of providing for page table calculations in hardware.
2233
Then if, say, a \.{LDB} instruction does not know the physical memory
2234
address corresponding to a specified virtual address, it will cause
2235
a forced trap with the left half of~rXX set to \Hex{03000000} and with
2236
rYY set to the virtual address in question. The trap handler should
2237
place the physical page address into~rZZ; then \.{RESUME}~\.1 will
2238
complete~the~\.{LDB}.
2239
 
2240
@ The third and final kind of interrupt is called a {\it dynamic\/} trap.
2241
@^interrupts@>
2242
@^dynamic traps@>
2243
Such interruptions occur when one or more of the 64 bits in the
2244
special {\it interrupt request register\/}~rQ have been set to~1,
2245
@^rQ@>
2246
@^rK@>
2247
and when at least one corresponding bit of the special
2248
{\it interrupt mask register\/}~rK is also equal to~1. The bit positions
2249
of rQ and~rK have the general form
2250
$$\beginword
2251
&\field{24}{24}&&\field88&&\field{24}{24}&&\field88\cr
2252
\noalign{\hrule}
2253
\\&low-priority I/O&\\&program&\\&high-priority I/O&\\&machine&\\\cr
2254
\noalign{\hrule}\endword$$
2255
where the 8-bit ``program'' bits are called \.{rwxnkbsp} and have
2256
the following meanings:
2257
$$\vbox{\halign{\.# bit: &#\hfil\cr
2258
r&instruction tries to load from a page without read permission;\cr
2259
w&instruction tries to store to a page without write permission;\cr
2260
x&instruction appears in a page without execute permission;\cr
2261
n&instruction refers to a negative virtual address;\cr
2262
k&instruction is privileged, for use by the ``kernel'' only;\cr
2263
b&instruction breaks the rules of\/ \MMIX;\cr
2264
s&instruction violates security (see below);\cr
2265
p&instruction comes from a privileged (negative) virtual address.\cr}}$$
2266
Negative addresses are for the use of the operating system only;
2267
@^operating system@>
2268
@^protection bits@>
2269
@^permission bits@>
2270
@^security violation@>
2271
@^privileged instructions@>
2272
@^illegal instructions@>
2273
@^page fault@>
2274
a security violation occurs if an instruction in a nonnegative address
2275
is executed without the \.{rwxnkbsp} bits of~rK all set to~1.
2276
(In such cases the \.s bits of both rQ and~rK are set to~1.)
2277
 
2278
The eight ``machine'' bits of rQ and rK represent the most urgent
2279
kinds of interrupts. The rightmost bit stands for power failure,
2280
the next for memory parity error, the next for nonexistent memory,
2281
the next for rebooting, etc.
2282
Interrupts that need especially quick service, like requests from
2283
a high-speed network, also are allocated bit positions near the right end.
2284
Low priority I/O devices like keyboards are assigned to bits at the left.
2285
The allocation of input/output devices to bit positions will
2286
differ from implementation to implementation, depending on
2287
what devices are available.
2288
@^I/O@>
2289
@^input/output@>
2290
 
2291
Once $\rm rQ\land rK$ becomes nonzero, the machine waits
2292
briefly until it can give a precise interrupt.
2293
Then it proceeds as with a forced trap,
2294
except that it uses the special ``dynamic
2295
trap address register''~rTT instead of~rT. The trap handler that
2296
@^rTT@>
2297
begins at location~rTT can figure out the reason for interrupt by
2298
examining $\rm rQ\land rK$. (For example, after the instructions
2299
$$\hbox spread-10pt{\tt\spaceskip .5em minus .1em
2300
 GET \$0,rQ; LDOU \$1,savedK; AND \$0,\$0,\$1; SUBU \$1,\$0,1;
2301
   SADD \$2,\$1,\$0; ANDN \$1,\$0,\$1}$$
2302
the highest-priority offending bit will be in \$1 and its position will be
2303
in~\$2.)
2304
@^counting trailing zeros@>
2305
 
2306
If the interrupted instruction contributed 1s to any of the \.{rwxnkbsp} bits
2307
of~rQ, the corresponding bits are set to~1 also in~rXX\null. A~dynamic trap
2308
handler might be able to use this information (although it should
2309
service higher-priority interrupts first if the right half
2310
of $\rm rQ\land rK$ is nonzero).
2311
@^rX@>
2312
 
2313
The rules of\/ \MMIX\ are rigged
2314
so that only the operating system can execute instructions
2315
with interrupts suppressed. Therefore the operating system can in fact
2316
use instructions that would interrupt an ordinary program. Control of
2317
register rK turns out to be the ultimate privilege, and in a sense the
2318
only important one.
2319
@^privileged operations@>
2320
 
2321
An instruction that causes a dynamic trap is usually executed before the
2322
interruption occurs. However, an instruction that traps with
2323
bits \.x, \.k, or \.b does nothing; a load instruction that traps
2324
with \.r or \.n loads zero; a store instruction that traps with any
2325
of \.{rwxnkbsp} stores nothing.
2326
 
2327
@ After a trip handler or trap handler has done its thing, it
2328
generally invokes the following command.
2329
 
2330
\bull\
2331
@.RESUME@>
2332
@^interrupts@>
2333
@^handlers@>
2334
If the Z field of this instruction is zero,
2335
\MMIX\ will use the
2336
information found in special registers rW, rX, rY, and~rZ to restart an
2337
@^rW@>
2338
@^rX@>
2339
@^rY@>
2340
@^rZ@>
2341
@^rBB@>
2342
@^rWW@>
2343
@^rXX@>
2344
@^rYY@>
2345
@^rZZ@>
2346
@^rK@>
2347
interrupted computation. If the execution register rX is negative, it will be
2348
ignored and instructions will be executed starting at virtual address~rW\null;
2349
otherwise the instruction in the right half of the execution register will be
2350
inserted into the program as if it had appeared in location $\rm rW-4$,
2351
subject to certain modifications that we will explain momentarily,
2352
and the {\it next\/} instruction will come from rW.
2353
 
2354
If the Z field of \.{RESUME}
2355
is 1 and if this instruction appears in a negative location,
2356
registers rWW, rXX, rYY, and~rZZ are used instead of rW, rX, rY, and~rZ\null.
2357
Also, just before resuming the computation,
2358
mask register rK is set to \$255 and \$255 is set to rBB\null.
2359
(Only the operating system gets to use this feature.)
2360
@^operating system@>
2361
 
2362
An interrupt handler within the operating system might choose to allow itself
2363
to be interrupted. In such cases it should save the contents of
2364
rBB, rWW, rXX, rYY, and~rZZ on some kind of stack, before making rK nonzero.
2365
Then, before resuming whatever caused the base level interrupt, it
2366
must again disable all interrupts; this can be done with \.{TRAP},
2367
because the trap handler can tell from the virtual address in~rWW that
2368
it has been invoked by the operating system. Once rK is again zero,
2369
the contents of rBB, rWW, rXX, rYY, and~rZZ are restored from the stack,
2370
the outer level interrupt mask is placed in \$255, and \
2371
finishes the job.
2372
 
2373
Values of Z greater than 1 are reserved for possible later
2374
definition. Therefore they cause an illegal instruction interrupt (that
2375
is, they set the `\.b' bit of~rQ) in the present version of\/ \MMIX.
2376
@^illegal instructions@>
2377
 
2378
If the execution register rX is nonnegative, its leftmost byte controls
2379
the way its right-hand half will be inserted into the program.
2380
Let's call this byte the ``ropcode.'' A ropcode of~0 simply
2381
inserts the instruction into the execution stream; a ropcode of~1
2382
is similar, but it substitutes rY and rZ for the
2383
two operands, assuming that this makes sense for the operation considered.
2384
@^ropcodes@>
2385
 
2386
Ropcode~2 inserts a command that sets \$X to rZ, where
2387
X~is the second byte in the right half of rX\null.
2388
This ropcode is normally used with forced-trap emulations, so that the result
2389
of an emulated instruction is placed into the correct register.
2390
It also uses the third-from-left byte of~rX to raise any or all of the
2391
arithmetic exceptions DVWIOUZX, at the same time as rZ is
2392
being placed in \$X. Emulated instructions and
2393
explicit \.{TRAP} commands can therefore cause overflow, say,
2394
just as ordinary instructions can.
2395
(Such new exceptions may, of
2396
course, spawn a trip interrupt, if any of the corresponding bits are enabled
2397
in~rA.)
2398
@^rA@>
2399
@^emulation@>
2400
 
2401
Finally, ropcode 3 is the same as ropcode 0, except that it also
2402
tells \MMIX\ to treat rZ as the page table entry for the virtual
2403
address~rY\null. (See the discussion of virtual address translation below.)
2404
Ropcodes greater than~3 are not permitted; moreover,
2405
only \
2406
 
2407
The ropcode rules in the previous paragraphs should of course be understood to
2408
involve rWW, rXX, rYY, and rZZ instead of rW, rX, rY, and~rZ when
2409
the ropcode is seen by \.{RESUME}~\.1. Thus, in particular, ropcode~3
2410
always applies to rYY and~rZZ, never to rY and~rZ.
2411
 
2412
Special restrictions must hold if resumption is to work properly: Ropcodes
2413
0~and~3 must not insert a \.{RESUME} instruction; ropcode~1 must insert
2414
a ``normal'' instruction, namely one whose opcode begins with
2415
one of the hexadecimal digits \Hex{0}, \Hex{1}, \Hex{2}, \Hex{3}, \Hex{6},
2416
\Hex{7}, \Hex{C}, \Hex{D}, or~\Hex{E}. (See the opcode chart below.)
2417
Some implementations may also allow ropcode~1 with \.{SYNCD[I]}
2418
and \.{SYNCID[I]}, so that those instructions can conveniently be
2419
interrupted.
2420
Moreover, the destination register \$X used with ropcode 1 or~2 must
2421
not be marginal. All of these restrictions hold automatically in normal
2422
use; they are relevant only if the programmer tries to do something tricky.
2423
 
2424
Notice that the slightly tricky sequence
2425
$$\hbox{\tt LDA \$0,Loc; PUT rW,\$0; LDTU \$1,Inst; PUT rX,\$1; RESUME}$$
2426
will execute an almost arbitrary instruction \.{Inst} as if it had been in
2427
location \.{Loc-4}, and then will jump to location \.{Loc} (assuming
2428
that \.{Inst} doesn't branch elsewhere).
2429
 
2430
@* Special registers.
2431
@^special registers@>
2432
Quite a few special registers have been mentioned so far, and \MMIX\ actually
2433
has even more. It is time now to enumerate them all, together with their
2434
internal code numbers:
2435
$$\vbox{\halign{\hfil#,\quad&#;\hfil\cr
2436
rA&arithmetic status register [21]\cr
2437
rB&bootstrap register (trip) [0]\cr
2438
rC&cycle counter [8]\cr
2439
rD÷nd register [1]\cr
2440
rE&epsilon register [2]\cr
2441
rF&failure location register [22]\cr
2442
rG&global threshold register [19]\cr
2443
rH&himult register [3]\cr
2444
rI&interval counter [12]\cr
2445
rJ&return-jump register [4]\cr
2446
rK&interrupt mask register [15]\cr
2447
rL&local threshold register [20]\cr
2448
rM&multiplex mask register [5]\cr
2449
rN&serial number [9]\cr
2450
rO®ister stack offset [10]\cr
2451
rP&prediction register [23]\cr
2452
rQ&interrupt request register [16]\cr
2453
rR&remainder register [6]\cr
2454
rS®ister stack pointer [11]\cr
2455
rT&trap address register [13]\cr
2456
rU&usage counter [17]\cr
2457
rV&virtual translation register [18]\cr
2458
rW&where-interrupted register (trip) [24]\cr
2459
rX&execution register (trip) [25]\cr
2460
rY&Y operand (trip) [26]\cr
2461
rZ&Z operand (trip) [27]\cr
2462
rBB&bootstrap register (trap) [7]\cr
2463
rTT&dynamic trap address register [14]\cr
2464
rWW&where-interrupted register (trap) [28]\cr
2465
rXX&execution register (trap) [29]\cr
2466
rYY&Y operand (trap) [30]\cr
2467
rZZ&Z operand (trap) [31]\cr}}$$
2468
@^rG@>
2469
@^rL@>
2470
In this list rG and rL are what we have been calling simply $G$ and $L$; \
2471
rC, rF, rI, rN, rO, rS, rU, and~rV have not been mentioned before.
2472
 
2473
@ The {\it cycle counter\/}~rC advances by~1 on every ``clock pulse'' of the
2474
@^rC@>
2475
\MMIX\ processor. Thus if \MMIX\ is running at 500 MHz, the cycle
2476
counter increases every 2 nanoseconds. There is no need to worry about
2477
rC overflowing; even if it were to increase once every nanosecond,
2478
it wouldn't reach $2^{64}$ until more than 584.55 years have gone by.
2479
 
2480
The {\it interval counter\/}~rI is similar, but it {\it decreases\/}
2481
@^rI@>
2482
by~1 on each cycle, and causes an {\it interval interrupt\/}
2483
when it reaches zero. Such interrupts can be extremely useful for
2484
``continuous profiling'' as a means of studying
2485
the empirical running time of programs;
2486
see Jennifer~M. Anderson, Lance~M. Berc, Jeffrey Dean, Sanjay Ghemawat,
2487
Monika~R. Henzinger, Shun-Tak~A. Leung, Richard~L. Sites, Mark~T. Vandevoorde,
2488
Carl~A. Waldspurger, and William~E. Weihl, {\sl ACM Transactions on Computer
2489
Systems\/ \bf15} (1997), 357--390.
2490
The interval interrupt is achieved by setting the leftmost bit of the
2491
``machine'' byte of~rQ equal to~1; this is the eighth-least-significant bit.
2492
@^rQ@>
2493
@^continuous profiling@>
2494
@^performance monitoring@>
2495
@^Anderson, Jennifer-Ann Monique@>
2496
@^Berc, Lance Michael@>
2497
@^Dean, Jeffrey Adgate@>
2498
@^Ghemawat, Sanjay@>
2499
@^Henzinger, Monika Hildegard Rauch@>
2500
@^Leung, Shun-Tak Albert@>
2501
@^Sites, Richard Lee@>
2502
@^Vandevoorde, Mark Thierry@>
2503
@^Waldspurger, Carl Alan@>
2504
@^Weihl, William Edward@>
2505
 
2506
The {\it usage counter\/}~rU consists of three fields $(u_p,u_m,u_c)$,
2507
@^rU@>
2508
called the usage pattern~$u_p$, the usage mask~$u_m$,
2509
and the usage count~$u_c$. The most significant byte of~rU is the usage
2510
pattern; the next most significant byte is the usage mask; and
2511
the remaining 48 bits are the usage count. Whenever an instruction whose
2512
${\rm OP}\land u_m=u_p$ has been executed, the value of $u_c$ increases by~1
2513
(mod~$2^{48}$).
2514
Thus, for example, the OP-code chart below implies that
2515
all instructions are counted if $u_p=u_m=0$;
2516
all loads and stores are counted together with \.{GO} and \.{PUSHGO}
2517
if $u_p=(10000000)_2$ and $u_m=(11000000)_2$;
2518
all floating point instructions are counted together with fixed point
2519
multiplications and divisions if $u_p=0$ and $u_m=(11100000)_2$;
2520
fixed point multiplications and divisions alone are counted if
2521
$u_p=(00011000)_2$ and $u_m=(11111000)_2$; completed subroutine calls
2522
are counted if $u_p=\.{POP}$ and $u_m=(11111111)_2$.
2523
Instructions in negative locations, which belong to the operating system,
2524
are exceptional: They are included in the usage count only if the leading bit
2525
of $u_c$ is~1.
2526
@^negative locations@>
2527
 
2528
Incidentally, the 64-bit counters rC and rI can be implemented rather cheaply with
2529
only two levels of logic, using an old trick called ``carry-save addition''
2530
[see, for example, G.~Metze and J.~E. Robertson, {\sl Proc.\ International
2531
Conf.\ Information Processing\/} (Paris:\ 1959), 389--396]. One nice
2532
embodiment of this idea is to
2533
@^Metze, Gernot@>
2534
@^Robertson, James Evans@>
2535
@^carry-save addition@>
2536
represent a binary number~$x$ in a redundant form as the difference $x'-x''$
2537
of two binary numbers. Any two such numbers can be added without carry
2538
propagation as follows: Let
2539
$$f(x,y,z)=
2540
 (x\land\bar y)\lor(x\land z)\lor(\bar y\land z), \qquad
2541
% ((x\oplus y)\land(x\oplus z))\oplus z, \qquad
2542
  g(x,y,z)=x\oplus y\oplus z.$$
2543
Then it is easy to check that $x-y+z=2f(x,y,z)-g(x,y,z)$; we need only verify
2544
this in the eight cases when $x$, $y$, and~$z$ are 0 or~1.
2545
Thus we can subtract~1 from a counter $x'-x''$ by setting
2546
$$(x',x'')\gets(f(x',x'',-1)\LL1,\;g(x',x'',-1));$$
2547
we can add~1 by setting
2548
$(x',x'')\gets(g(x'',x',-1),f(x'',x',-1)\LL1)$.
2549
The result is zero if and only if
2550
$x'=x''$. We need not actually compute the difference $x'-x''$ until
2551
we need to examine the register. The computation
2552
of $f(x,y,z)$ and $g(x,y,z)$ is particularly simple in the special
2553
cases $z=0$ and $z=-1$. A similar trick works for~rU,
2554
but extra care is needed in that case
2555
because several instructions might finish at the same time.
2556
(Thanks to Frank Yellin for his improvements to this paragraph.)
2557
@^Yellin, Frank Nathan@>
2558
 
2559
@ The special {\it serial number register\/}~rN is permanently set to
2560
@^rN@>
2561
the time this particular instance of\/ \MMIX\ was created (measured as the
2562
number of seconds since 00:00:00 Greenwich Mean Time on 1~January 1970),
2563
in its five least significant bytes. The three most significant bytes
2564
are permanently set to the {\it version number\/} of the \MMIX\ architecture
2565
that is being implemented together with
2566
two additional bytes that modify the version
2567
number. This quantity serves as an essentially unique identification
2568
number for each copy of\/ \MMIX.
2569
@^version number@>
2570
 
2571
Version 1.0.0 of the architecture is described in the present document.
2572
Version~1.0.1 is similar, but simplified to avoid the
2573
complications of pipelines and operating systems.
2574
Other versions may become necessary in the future.
2575
 
2576
@ The {\it register stack offset\/}~rO and {\it register stack
2577
pointer\/}~rS are especially interesting, because they are used to implement
2578
@^register stack@>
2579
@^rO@>
2580
@^rS@>
2581
\MMIX's register stack~$S[0]$, $S[1]$, $S[2]$,~\dots.
2582
 
2583
The operating system
2584
initializes a register stack by assigning a large area of virtual memory to
2585
each running process, beginning at an address like
2586
\Hex{6000000000000000}.
2587
If this starting address is~$\sigma$, stack entry $S[k]$ will go into
2588
the octabyte $\mm_8[\sigma+8k]$. Stack underflow will be detected because
2589
the process does not have permission to read from $\mm[\sigma-1]$.
2590
Stack overflow will be detected because something will give out---either
2591
the user's budget or the user's patience or the user's swap space---long before
2592
$2^{61}$~bytes of virtual memory are filled by a register stack.
2593
@^terabytes@>
2594
 
2595
The \MMIX\ hardware maintains the register stack by having two banks
2596
of 64-bit general-purpose registers, one for globals and one for locals.
2597
The global registers $\rm g[32]$, $\rm g[33]$, \dots, $\rm g[255]$ are used for
2598
register numbers that are $\ge\gg$ in \MMIX\ commands;
2599
recall that $G$~is always 32 or more. The local
2600
registers come from another array that contains $2^n$ registers for
2601
some~$n$ where $8\le n\le10$; for simplicity of exposition
2602
we will assume that there are exactly 512 local
2603
registers, but there may be only 256 or there may be 1024.
2604
 
2605
\def\l{{\rm l}}
2606
@^ring of local registers@>
2607
The local register slots l[0], l[1], \dots, l[511] act as a cyclic buffer with
2608
addresses that wrap around mod~512, so that $\l[512]=\l[0]$,
2609
$\l[513]=\l[1]$, etc. This buffer is divided into three parts by three
2610
pointers, which we will call $\alpha$, $\beta$, and $\gamma$.
2611
$$\epsfbox{mmix.1}$$
2612
Registers $\l[\alpha]$, $\l[\alpha+1]$, \dots,~$\l[\beta-1]$ are
2613
what program instructions currently call \$0, \$1, \dots,~$\$(\ll-1)$;
2614
registers $\l[\beta]$, $\l[\beta+1]$, \dots,~$\l[\gamma-1]$ are currently
2615
unused; and registers $\l[\gamma]$, $\l[\gamma+1]$, \dots,~$\l[\alpha-1]$
2616
contain items of the register stack that have been pushed down but not yet
2617
stored in memory. Special register~rS holds the virtual memory address where
2618
$\l[\gamma]$ will be stored, if necessary. Special register~rO holds the
2619
address where $\l[\alpha]$ will be stored; this always equals $8\tau$ plus
2620
the address of~$S[0]$. We can deduce the values of $\alpha$, $\beta$,
2621
and~$\gamma$ from the contents of rL, rO, and~rS, because
2622
$$\rm\alpha=(rO/8)\bmod512,\qquad \beta=(\alpha+rL)\bmod512,\qquad
2623
\hbox{and}\qquad \gamma=(rS/8)\bmod512.$$
2624
 
2625
To maintain this situation we need to make sure that the pointers $\alpha$,
2626
$\beta$, and $\gamma$ never move past each other. A~\.{PUSHJ} or
2627
\.{PUSHGO} operation simply
2628
advances $\alpha$ toward~$\beta$, so it is very simple. The first part of a
2629
\.{POP} operation, which moves $\beta$ toward~$\alpha$, is also very simple.
2630
But the next part of a~\.{POP} requires $\alpha$ to move downward, and
2631
memory accesses might be required. \MMIX\ will decrease rS by~8 (thereby
2632
decreasing $\gamma$ by~1) and set $\l[\gamma]\gets\mm_8[{\rm rS}]$,
2633
one or more times if necessary, to keep $\alpha$ from decreasing
2634
past~$\gamma$. Similarly, the operation of increasing~$L$ may cause \MMIX\ to
2635
set $\mm_8[{\rm rS}]\gets\l[\gamma]$ and increase rS by~8 (thereby increasing
2636
$\gamma$ by~1) one or more times, to keep $\beta$ from increasing
2637
past~$\gamma$. (Actually $\beta$ is never allowed to increase to the point
2638
where it becomes {\it equal\/} to $\gamma$.)
2639
If many registers need to be loaded or stored at once,
2640
these operations are interruptible.
2641
 
2642
[A somewhat similar scheme was introduced by David R. Ditzel and H.~R.
2643
McLellan in {\sl SIGPLAN Notices\/ \bf17},\thinspace4 (April 1982), 48--56,
2644
and incorporated in the so-called {\mc CRISP} architecture developed at
2645
AT{\AM}T Bell Labs. An even more similar scheme was adopted in the late 1980s
2646
@^AT{\AM}T Bell Laboratories@>
2647
@^Advanced Micro Devices@>
2648
by Advanced Micro Devices, in the processors of their Am29000 series---a
2649
family of computers whose instructions have essentially the
2650
format `OP~X~Y~Z' used by~\MMIX.]
2651
@^Ditzel, David Roger@>
2652
@^McClellan, Hubert Rae, Jr.@>
2653
 
2654
Limited versions of\/ \MMIX, having fewer registers, can also be envisioned. For
2655
example, we might have only 32 local registers $\l[0]$, $\l[1]$,
2656
\dots,~$\l[31]$ and only 32 global registers $\rm g[224]$, $\rm g[225]$,
2657
\dots,~$\rm g[255]$. Such a machine could run any \MMIX\ program that
2658
maintains the inequalities $\ll<32$ and $\gg\ge224$.
2659
 
2660
@ Access to \MMIX's special registers is obtained via the \.{GET} and
2661
\.{PUT} commands.
2662
@^special registers@>
2663
@^rL@>
2664
@^rQ@>
2665
 
2666
\bull\
2667
@.GET@>
2668
Register X is set to the contents of the special register identified by
2669
its code number~Z, using the code numbers listed earlier.
2670
An illegal instruction interrupt occurs if $\zz\ge32$.
2671
 
2672
Every special register is readable; \MMIX\ does not keep secrets from
2673
an inquisitive user. But of course only the operating system is allowed
2674
@^operating system@>
2675
to change registers like rK and~rQ (the interrupt mask and request
2676
registers). And not even the operating system is allowed to change~rC
2677
(the cycle counter) or rN~(the serial number) or the stack pointers
2678
rO~and~rS.
2679
 
2680
\bull\
2681
@.PUT@>
2682
the Y field must be zero.\>
2683
The special register identified by~X is set to
2684
the contents of register Z or to the unsigned byte~Z itself,
2685
if permissible. Some changes are, however, impermissible:
2686
Bits of rA that are always zero must remain zero; the leading seven bytes
2687
of rG and rL must remain zero, and rL must not exceed~rG;
2688
special registers 8--11 (namely rC, rN, rO, and~rS) must not change;
2689
special registers 12--18 (namely
2690
rI, rK, rQ, rT, rU, rV, and~rTT) can be changed only if the privilege
2691
bit of rK is zero;
2692
and certain bits of~rQ (depending on available hardware) might not
2693
allow software to change them from 0 to~1. Moreover, any bits of~rQ that have
2694
changed from 0 to~1 since the most recent \
2695
will remain~1 after \.{PUT}~\.{rQ,z}.
2696
The \.{PUT} command will not increase~rL; it sets rL to the minimum
2697
of the current value and the new value. (A~program should say
2698
\
2699
 
2700
Impermissible \.{PUT} commands cause an illegal instruction interrupt,
2701
or (in the case of rI, rK, rQ, rT, rU, rV, and~rTT) a privileged
2702
operation interrupt.
2703
@^illegal instructions@>
2704
@^privileged operations@>
2705
 
2706
\bull\
2707
@.SAVE@>
2708
@^register stack@>
2709
@^ring of local registers@>
2710
@^rO@>
2711
@^rS@>
2712
\
2713
so must the Z field of~\.{SAVE}, the X~field of \.{UNSAVE}.\>
2714
@.UNSAVE@>
2715
The \.{SAVE} instruction stores all registers and special registers
2716
that might affect the computation of the currently running process.
2717
First the current local registers \$0, \$1, \dots,~$\$(\ll-1)$ are
2718
pushed down as in \.{PUSHGO}~\.{\$255}, and $L$~is set to zero.
2719
Then the current global
2720
registers $\$\gg$, $\$(\gg+1)$, \dots,~\$255 are placed above them
2721
in the register stack; finally
2722
rB, rD, rE, rH, rJ, rM, rR, rP, rW, rX, rY, and~rZ
2723
are placed at the very top, followed by registers rG and~rA packed
2724
into eight bytes:
2725
$$\beginword
2726
&\field88&&\field{24}{24}&&\field{32}{32}\cr
2727
\noalign{\hrule}
2728
\\&rG&\\&0&\\&rA&\\\cr
2729
\noalign{\hrule}\endword$$
2730
The address of the topmost octabyte is then placed in register~X, which
2731
must be a global register. (This instruction is interruptible. If an
2732
interrupt occurs while the registers are being saved, we will have
2733
$\alpha=\beta=\gamma$ in the ring of local registers;
2734
thus rO will equal~rS and rL will be zero. The interrupt handler
2735
essentially has a new register stack, starting on top of the partially
2736
saved context.) Immediately after a \.{SAVE} the values of rO and~rS
2737
are equal to the location of the first byte following the stack
2738
just saved. The current register stack is effectively empty at this
2739
point; thus one shouldn't do a \.{POP} until this context
2740
or some other context has been unsaved.
2741
@^rO@>
2742
@^rS@>
2743
 
2744
The \.{UNSAVE} instruction goes the other way, restoring all the
2745
registers when given an address in register~Z that was returned
2746
by a previous \.{SAVE}. Immediately after an \.{UNSAVE} the values of
2747
rO and~rS will be equal. Like \.{SAVE}, this instruction is interruptible.
2748
 
2749
The operating system uses \.{SAVE} and \.{UNSAVE}
2750
to switch context between different processes.
2751
It can also use \.{UNSAVE} to
2752
establish suitable initial values of rO and~rS\null.
2753
But a user program that knows what it is doing can in fact allocate its own
2754
register stack or stacks and do its own process switching.
2755
 
2756
Caution: \.{UNSAVE} is destructive, in the sense that a program can't reliably
2757
\.{UNSAVE} twice from the same saved context. Once an
2758
\.{UNSAVE} has been done,
2759
further operations are likely to change the memory
2760
record of what was saved. Moreover, an interrupt during the middle
2761
of an \.{UNSAVE} may have already clobbered some of the data in memory before
2762
the \.{UNSAVE} has completely finished, although the data will appear
2763
properly in all registers.
2764
 
2765
@* Virtual and physical addresses.
2766
Virtual 64-bit addresses are converted to physical addresses in a manner
2767
@^virtual addresses@>
2768
@^physical addresses@>
2769
governed by the special {\it virtual translation register\/}~rV. Thus
2770
@^rV@>
2771
$\rm M[A]$ really refers to $\rm m[\phi(A)]$, where m~is the physical
2772
memory array and $\phi(A)$
2773
is determined by the physical mapping function~$\phi$. The details of
2774
this conversion are rather technical and of interest mainly to the operating
2775
system, but two simple rules are important to ordinary users:
2776
@^operating system@>
2777
 
2778
\bull Negative addresses are mapped directly to physical addresses, by simply
2779
@^negative locations@>
2780
suppressing the sign bit:
2781
$$\phi(A)=A+2^{63}=A\land\Hex{7fffffffffffffff},\qquad
2782
\hbox{if $A<0$.}$$
2783
{\it All accesses to negative addresses are privileged}, for use by the
2784
operating system only.
2785
@^privileged operations@>
2786
(Thus, for example, the trap addresses in~rT and~rTT should be negative,
2787
because they are addresses inside the operating system.) Moreover, all physical
2788
addresses $\ge2^{48}$ are intended for use by memory-mapped I/O devices;
2789
values read from or written to such locations are never placed in a cache.
2790
@^I/O@>
2791
@^input/output@>
2792
@^memory-mapped input/output@>
2793
 
2794
\bull Nonnegative addresses belong to four {\it segments}, depending on
2795
@^segments@>
2796
whether the three leading bits are 000, 001, 010, or 011. These $2^{61}$-byte
2797
segments are traditionally used for a program's text, data, dynamic
2798
memory, and register stack, respectively, but such conventions are
2799
not mandatory. There are four mappings $\phi_0$, $\phi_1$, $\phi_2$,
2800
and~$\phi_3$ of 61-bit addresses into 48-bit physical memory space, one for
2801
each segment:
2802
$$\phi(A)=\phi_{\lfloor A/2^{61}\rfloor}(A\bmod2^{61}),\qquad
2803
\hbox{if $0\le A<2^{63}$.}$$
2804
In general, the machine is able to access smaller addresses of a segment more
2805
efficiently than larger addresses. Thus a programmer should let each segment
2806
grow upward from zero, trying to keep any of the 61-bit addresses from
2807
becoming larger than necessary, although arbitrary addresses are legal.
2808
 
2809
@ Now it's time for the technical details of virtual address translation.
2810
@^segments@>
2811
@^virtual addresses@>
2812
@^physical addresses@>
2813
@^rV@>
2814
The mappings $\phi_0$, $\phi_1$, $\phi_2$, and~$\phi_3$ are defined
2815
by the following rules.
2816
\smallskip
2817
 
2818
(1) The first two bytes of rV are four nybbles called $b_1$, $b_2$, $b_3$,
2819
$b_4$; we also define $b_0=0$. Segment~$i$ has at most $1024^{\,b_{i+1}-b_i}$
2820
pages. In particular, segment~$i$ must have at most one page when
2821
$b_i=b_{i+1}$, and it must be entirely empty if $b_i>b_{i+1}$.
2822
 
2823
(2) The next byte of rV, $s$, specifies the current {\it page size},
2824
which is $2^s$ bytes. We must have $s\ge13$ (hence at least 8192~bytes
2825
per page). Values of~$s$ larger than, say, 20 or~so are of use only in rather
2826
large programs that will reside in main memory for long periods of time,
2827
because memory protection and swapping are applied to entire pages.
2828
The maximum legal value of~$s$ is~48.
2829
 
2830
(3) The remaining five bytes of rV are a 27-bit {\it root location\/}~$r$,
2831
a 10-bit {\it address space number\/}~$n$, and a 3-bit {\it function
2832
field\/}~$f$:
2833
$$\centerline{$\hbox{rV}=\beginword
2834
&\field44&&\field44&&\field44&&\field44&&\field88&&
2835
 \field{27}{27}&&\field{10}{10}&&\field33\cr
2836
\noalign{\hrule}
2837
\\&$b_1$&\\&$b_2$&\\&$b_3$&\\&$b_4$&\\&$s$&\\&$r$&\\&$n$&\\&$f$&\\\cr
2838
\noalign{\hrule}\endword$}$$
2839
Normally $f=0$; if $f=1$, virtual address translation will be done by
2840
software instead of hardware, and the $b_1$, $b_2$, $b_3$, $b_4$,
2841
and~$r$ fields of~rV will be ignored by the hardware.
2842
(Values of $f>1$ are reserved for possible future use; if $f>1$
2843
when \MMIX\ tries to translate an address, a memory-protection
2844
failure will occur.)
2845
@^illegal instructions@>
2846
 
2847
(4) Each page has an 8-byte {\it page table entry\/} (PTE), which looks
2848
@^page table entry@>
2849
@^PTE@>
2850
like this:
2851
$$\centerline{$\hbox{PTE}=\beginword
2852
&\field{16}{16}&&\field{32}{48-s}&&\field3{s-13}&&\field{10}{10}&&
2853
 \field33\cr
2854
\noalign{\hrule}
2855
\\&$x$&\\&$a$&\\&$y$&\\&$n$&\\&$p$&\\\cr
2856
\noalign{\hrule}\endword$}$$
2857
Here $x$ and $y$ are ignored (thus they are usable for any purpose by the
2858
operating
2859
system); $2^s a$~is the physical address of byte~0 on the page; and $n$~is
2860
the address space number (which must match the number in~rV). The final three
2861
bits are the {\it protection bits\/} $p_r\,p_w\,p_x$; the user needs
2862
$p_r=1$ to load from this page, $p_w=1$ to store on this page, and
2863
$p_x=1$ to execute instructions on this page. If $n$~fails to
2864
match the number in~rV, or if the appropriate protection bit is zero,
2865
a memory-protection fault occurs.
2866
@^protection fault@>
2867
 
2868
Page table entries should be writable only by the operating system.
2869
The 16 ignored bits of~$x$ imply that physical memory size is limited
2870
to $2^{48}$ bytes (namely 256 large terabytes); that should be enough capacity
2871
for awhile, if not for the entire new millennium.
2872
@^terabytes@>
2873
 
2874
(5) A given 61-bit address $A$ belongs to page $\lfloor A/2^s\rfloor$ of
2875
its segment, and
2876
$$\phi_i(A)=2^s\,a+(A\bmod2^s)$$
2877
if $a$ is the address in the PTE for page $\lfloor A/2^s\rfloor$ of
2878
segment~$i$.
2879
 
2880
(6) Suppose $\lfloor A/2^s\rfloor=(a_4a_3a_2a_1a_0)_{1024}$ in the
2881
radix-1024 number system. In the common case $a_4=a_3=a_2=a_1=0$, the
2882
PTE is simply the octabyte ${\rm m}_8[2^{13}(r+b_i)+8a_0]$; this rule
2883
defines the mapping for the first 1024 pages. The next million or~so pages are
2884
accessed through an auxiliary {\it page table pointer}
2885
@^page table pointer@>
2886
@^PTP@>
2887
$$\centerline{$\hbox{PTP}=\beginword
2888
&\field11&&\field{50}{50}&&\field{10}{10}&&\field33\cr
2889
\noalign{\hrule}
2890
\\&1&\\&$c$&\\&$n$&\\&$q$&\\\cr
2891
\noalign{\hrule}\endword$}$$
2892
in ${\rm m}_8[2^{13}(r+b_i+1)+8a_1]$; here the sign must be~1 and the
2893
$n$-field must match~rV, but the $q$~bits are ignored. The desired PTE for
2894
page $(a_1a_0)_{1024}$ is then in ${\rm m}_8[2^{13}c+8a_0]$. The next billion
2895
or so pages, namely the pages $(a_2a_1a_0)_{1024}$ with $a_2\ne0$,
2896
are accessed similarly, through an auxiliary PTP at level~two; and
2897
so on.
2898
 
2899
Notice that if $b_3=b_4$, there is just one page in segment~3, and its PTE
2900
appears all alone in physical location $2^{13}(r+b_3)$.
2901
Otherwise the PTEs appear in 1024-octabyte blocks. We usually
2902
have $0
2903
worthy of mention: In this special case there is only one page, and the
2904
segment bits of a virtual address are ignored; the other $61-s$ bits of each
2905
virtual address must be zero.
2906
 
2907
If $s=13$, $b_1=3$, $b_2=2$, $b_3=1$, and $b_4=0$, there are at most
2908
$2^{30}$ pages of 8192 bytes each, all belonging to segment~0. This is
2909
essentially the virtual memory setup in the Alpha~21064 computers with
2910
{\mc DIGITAL~UNIX}$^{\rm\,TM}$.
2911
@^Alpha computers@>
2912
 
2913
I know these rules look extremely complicated, and I sincerely wish I could
2914
have found an alternative that would be both simple and efficient in practice.
2915
I tried various schemes based on hashing, but came to the conclusion that
2916
``trie'' methods such as those described here are better for this application.
2917
Indeed, the page tables in most contemporary computers are based on very
2918
similar ideas, but with significantly smaller virtual addresses and without
2919
the shortcut for small page numbers. I tried also to find formats for rV
2920
and the page tables that would match byte boundaries in a more friendly way,
2921
but the corresponding page sizes did not work well. Fortunately these grungy
2922
details are almost always completely hidden from ordinary users.
2923
 
2924
@ Of course \MMIX\ can't afford to perform a lengthy calculation of physical
2925
addresses every time it accesses memory. The machine therefore maintains a
2926
{\it translation cache\/} (TC),
2927
@^translation caches@>
2928
@^TC@>
2929
which contains the translations of recently
2930
accessed pages. (In fact, there usually are two such caches,
2931
one for instructions
2932
and one for data.) A~TC holds a set of 64-bit translation keys
2933
$$\beginword
2934
&\field{1.2}1&&\field22&&\field{44.8}{61-s}&&\field3{s-13}&&\field{10}{10}&&
2935
 \field33\cr
2936
\noalign{\hrule}
2937
\\&0&\\&$i$&\\&$v$&\\&0&\\&$n$&\\&0&\\\cr
2938
\noalign{\hrule}\endword$$
2939
associated with 38-bit translations
2940
$$\beginword
2941
&\field{32}{48-s}&&\field3{s-13}&&\field33\cr
2942
\noalign{\hrule}
2943
\\&$a$&\\&0&\\&$p$&\\\cr
2944
\noalign{\hrule}\endword$$
2945
representing the relevant parts of the PTE for page $v$ of segment $i$.
2946
Different processes typically have different values of~$n$, and possibly also
2947
different values of~$s$. The operating system needs a way to keep such caches
2948
up to date when pages are being allocated, moved, swapped, or recycled.
2949
The operating system also likes to know which pages have been recently
2950
used. The \.{LDVTS} instructions facilitate such operations:
2951
@^protection bits@>
2952
@^permission bits@>
2953
 
2954
\bull\
2955
@.LDVTS@>
2956
The sum $\rY+\rZ$ or $\rY+\zz$ should have the form of
2957
a translation cache key as above,
2958
except that the rightmost three bits need not be zero.
2959
If this key is present in a TC,
2960
the rightmost three bits replace the current protection code~$p$;
2961
however, if $p$ is thereby set to zero, the key is removed from
2962
the TC. Register~X is set to 0 if the key was not present
2963
in any translation cache, or to 1 if the key was present in the TC
2964
for instructions, or to 2 if the key was present in the TC for data,
2965
or to~3 if the key was present in both. This instruction is for the
2966
operating system only.
2967
 
2968
@ We mentioned earlier that
2969
cheap versions of\/ \MMIX\ might calculate the physical addresses with
2970
@^emulation@>
2971
@^rV@>
2972
software instead of hardware, using forced traps when the operating
2973
system needs to do page table calculations.
2974
@^operating system@>
2975
Here is some code that could be used for
2976
such purposes; it defines the translation process precisely, given a
2977
nonnegative virtual
2978
address in register~rYY\null. First we must unpack the fields of~rV and
2979
@^virtual addresses@>
2980
@^physical addresses@>
2981
@^rV@>
2982
@^PTE@>
2983
@^PTP@>
2984
@^segments@>
2985
compute the relevant base addresses for PTEs and PTPs:
2986
$$\vbox{\halign{&\tt#\hfil\ \cr
2987
&GET &virt,rYY\cr
2988
&GET &\$7,rV         &\% \$7=(virtual translation register)\cr
2989
&SRU    &\$1,virt,61    &\% \$1=i (segment number of virtual address)\cr
2990
&SLU    &\$1,\$1,2       \cr
2991
&NEG    &\$1,52,\$1      &\% \$1=52-4i\cr
2992
&SRU    &\$1,\$7,\$1      \cr
2993
&SLU    &\$2,\$1,4       \cr
2994
&SETL   &\$0,\#f000     \cr
2995
&AND    &\$1,\$1,\$0      &\% \$1=b[i]<<12\cr
2996
&AND    &\$2,\$2,\$0      &\% \$2=b[i+1]<<12\cr
2997
&SLU    &\$3,\$7,24      \cr
2998
&SRU    &\$3,\$3,37      \cr
2999
&SLU    &\$3,\$3,13      &\% \$3=(r field of rV)\cr
3000
&ORH   &\$3,\#8000     &\% make \$3 a physical address\cr
3001
&2ADDU   &base,\$1,\$3    &\% base=address of first page table\cr
3002
&2ADDU   &limit,\$2,\$3   &\% limit=address after last page table\cr
3003
&SRU    &s,\$7,40      \cr
3004
&AND    &s,s,\#ff       &\% s=(s field of rV)\cr
3005
&CMP    &\$0,s,13       \cr
3006
&BN     &\$0,Fail       &\% s must be 13 or more\cr
3007
&CMP    &\$0,s,49       \cr
3008
&BNN    &\$0,Fail       &\% s must be 48 or less\cr
3009
&SETH  &mask,\#8000 \cr
3010
&ORL   &mask,\#1ff8&\% mask=(sign bit and n field)\cr
3011
&ORH   &\$7,\#8000      &\% set sign bit for PTP validation below\cr
3012
&ANDNH &virt,\#e000     &\% zero out the segment number\cr
3013
&SRU    &\$0,virt,s     &\% \$0=a4a3a2a1a0 (page number of virt)\cr
3014
&ZSZ   &\$1,\$0,1       &\% \$1=[page number is zero]\cr
3015
&ADD    &limit,limit,\$1&\% increase limit if page number is zero\cr
3016
&SETL&\$6,\#3ff\cr
3017
}}$$
3018
The next part of the routine finds the ``digits'' of
3019
the page number $(a_4a_3a_2a_1a_0)_{1024}$, from right to left:
3020
$$
3021
\vcenter{\halign{&\tt#\hfil\ \cr
3022
&OR  &\$5,base,0\cr
3023
&SRU &\$1,\$0,10\cr
3024
&PBZ &\$1,1F\cr
3025
&AND &\$0,\$0,\$6\cr
3026
&INCL &base,\#2000\cr}}
3027
\qquad
3028
\vcenter{\halign{&\tt#\hfil\ \cr
3029
&OR  &\$5,base,0\cr
3030
&SRU &\$2,\$1,10\cr
3031
&PBZ &\$2,2F\cr
3032
&AND &\$1,\$1,\$6\cr
3033
&INCL &base,\#2000\cr}}
3034
\qquad
3035
\vcenter{\halign{&\tt#\hfil\ \cr
3036
&OR  &\$5,base,0\cr
3037
&SRU &\$3,\$2,10\cr
3038
&PBZ &\$3,3F\cr
3039
&AND &\$2,\$2,\$6\cr
3040
&INCL &base,\#2000\cr}}
3041
\qquad
3042
\vcenter{\halign{&\tt#\hfil\ \cr
3043
&OR  &\$5,base,0\cr
3044
&SRU &\$4,\$3,10\cr
3045
&PBZ &\$4,4F\cr
3046
&AND &\$3,\$3,\$6\cr
3047
&INCL &base,\#2000\cr}}
3048
$$
3049
Then the process cascades back through PTPs.
3050
$$
3051
\vcenter{\halign{&\tt#\hfil\ \cr
3052
&OR  &\$5,base,0\cr
3053
&8ADDU&\$6,\$4,base\cr
3054
&LDO  &base,\$6,0\cr
3055
&XOR &\$6,base,\$7\cr
3056
&AND &\$6,\$6,mask\cr
3057
&BNZ &\$6,Fail\cr}}
3058
\quad
3059
\vcenter{\halign{&\tt#\hfil\ \cr
3060
&ANDNL&base,\#1fff\cr
3061
4H&8ADDU &\$6,\$3,base\cr
3062
&LDO  &base,\$6,0\cr
3063
&XOR &\$6,base,\$7\cr
3064
&AND &\$6,\$6,mask\cr
3065
&BNZ &\$6,Fail\cr}}
3066
\quad
3067
\vcenter{\halign{&\tt#\hfil\ \cr
3068
&ANDNL&base,\#1fff\cr
3069
3H&8ADDU &\$6,\$2,base\cr
3070
&LDO  &base,\$6,0\cr
3071
&XOR &\$6,base,\$7\cr
3072
&AND &\$6,\$6,mask\cr
3073
&BNZ &\$6,Fail\cr}}
3074
\quad
3075
\vcenter{\halign{&\tt#\hfil\ \cr
3076
&ANDNL&base,\#1fff\cr
3077
2H&8ADDU &\$6,\$1,base\cr
3078
&LDO  &base,\$6,0\cr
3079
&XOR &\$6,base,\$7\cr
3080
&AND &\$6,\$6,mask\cr
3081
&BNZ &\$6,Fail\cr}}
3082
$$
3083
Finally we obtain the PTE and communicate it to the machine.
3084
If errors have been detected, we set the translation to zero; actually
3085
any translation with permission bits zero would have the same effect.
3086
$$\chardef\_=`\_
3087
\vcenter{\halign{&\tt#\hfil\ \cr
3088
&ANDNL &base,\#1fff &\% remove low 13 bits of PTP\cr
3089
1H &8ADDU &\$6,\$0,base \cr
3090
&LDO  &base,\$6,0  &\% base=PTE\cr
3091
&XOR &\$6,base,\$7\cr
3092
&ANDN&\$6,\$6,\#7\cr
3093
&SLU &\$6,\$6,51\cr
3094
&BNZ &\$6,Fail &\% branch if n doesn't match\cr
3095
&CMP &\$6,\$5,limit \cr
3096
&BN  &\$6,Ready &\% did we run off the end of the page table?\cr
3097
Fail&SETL &base,0 &\% errors lead to PTE of zero\cr
3098
Ready&PUT&rZZ,base\cr
3099
&LDO&\$255,IntMask &\% load the desired setting of rK\cr
3100
&RESUME&1 &\% now the machine will digest the translation\cr}}$$
3101
All loads and stores in this program deal with negative virtual addresses.
3102
This effectively shuts off memory mapping and makes the page tables
3103
inaccessible to the user.\looseness=-1
3104
 
3105
The program assumes that the ropcode in rXX is 3 (which it is when
3106
a forced trap is triggered by the need for virtual translation).
3107
@^ropcodes@>
3108
@^translation caches@>
3109
 
3110
The translation from virtual pages to physical pages need not actually
3111
follow the rules for PTPs and PTEs; any other mapping could be
3112
substituted by operating systems with special needs. But people usually
3113
want compatibility between different implementations whenever
3114
possible. The only parts of~rV that \MMIX\ really needs are the $s$~field,
3115
which defines page sizes, and the $n$~field, which keeps TC entries
3116
of one process from being confused with the TC entries of another.
3117
 
3118
@* The complete instruction set. We have now described all of\/ \MMIX's
3119
special registers---except one: The special
3120
{\it failure location register\/}~rF is set
3121
@^rF@>
3122
to a physical memory address when a parity error or other memory
3123
fault occurs. (The instruction leading to this error will probably be
3124
long gone before such a fault is detected; for example, the machine might
3125
be trying to write old data from a cache in order to make room for
3126
new data. Thus there is generally no connection between the current virtual
3127
program location~rW and the physical location of a memory error. But knowledge
3128
of the latter location can still be useful for hardware repair, or when
3129
an operating system is booting up.)
3130
 
3131
@ One additional instruction proves to be useful.
3132
 
3133
\bull\
3134
This command lubricates the disk drives, fans, magnetic tape drives,
3135
laser printers, scanners, and any other mechanical equipment hooked
3136
up to \MMIX, if necessary. Fields X, Y, and~Z are ignored.
3137
@.SWYM@>
3138
 
3139
The \.{SWYM} command was originally included in \MMIX's repertoire because
3140
machines occasionally need grease to keep in shape, just as
3141
human beings occasionally need to swim or do some other kind of exercise
3142
in order to maintain good muscle tone. But in fact, \.{SWYM} has turned out to
3143
be a ``no-op,'' an instruction that does nothing at all; the
3144
@^no-op@>
3145
hypothetical manufacturers of our hypothetical machine have pointed out that
3146
modern computer equipment is already well oiled and sealed for permanent use.
3147
Even so, a no-op instruction provides a good way for software to
3148
send signals to the hardware, for such things as scheduling the way
3149
instructions are issued on superscalar superpipelined buzzword-compliant
3150
machines. Software programs can also use no-ops to communicate with other
3151
programs like symbolic debuggers.
3152
 
3153
When a forced trap computes the translation~rZZ of a virtual address~rYY,
3154
ropcode~3 of \
3155
the opcode in~rXX is \.{SWYM}; otherwise $\rm(rYY,rZZ)$ will be put
3156
into the TC for data.
3157
@^ropcodes@>
3158
@^translation caches@>
3159
@.RESUME@>
3160
@^virtual address emulation@>
3161
@^emulation@>
3162
 
3163
@ The running time of\/ \MMIX\ programs depends to a great extent
3164
on changes in technology.
3165
\MMIX\ is a mythical machine, but its mythical hardware exists in
3166
cheap, slow versions as well as in costly high-performance models.
3167
Details of running time usually depend on things like the amount of main memory
3168
available to implement virtual memory, as well as the sizes of
3169
caches and other buffers.
3170
 
3171
For practical purposes, the running time of an \MMIX\ program can often be
3172
estimated satisfactorily by assigning a fixed cost
3173
to each operation, based on the approximate running time that would be obtained
3174
on a high-performance machine with lots of main memory; so that's what
3175
we will do. Each operation will be assumed to take an integer number
3176
of~$\upsilon$,
3177
where $\upsilon$ (pronounced ``oops'') is a unit that represents the clock cycle time in
3178
@^mems@>
3179
@^oops@>
3180
a pipelined implementation. The value of $\upsilon$ will probably decrease
3181
from year to year, but I'll keep calling it $\upsilon$. The running
3182
time will also depend on the number of memory references or {\it mems\/}
3183
that a program uses;
3184
this is the number of load and store instructions. For example,
3185
each \.{LDO} (load octa) instruction will be assumed to cost
3186
$\mu+\upsilon$, where $\mu$ is the average cost of
3187
a memory reference. The total running time of a program might be reported as,
3188
say, $35\mu+1000\upsilon$, meaning 35 mems plus 1000~oops. The
3189
ratio $\mu/\upsilon$ will probably increase with time, so mem-counting
3190
is likely to become increasingly important. [See the discussion of mems in
3191
{\sl The Stanford GraphBase\/} (New York:\ ACM Press, 1994).]
3192
@^oops@>
3193
@^running times, approximate@>
3194
 
3195
Integer addition, subtraction, and comparison all take just $1\upsilon$.
3196
The same is true for \.{SET}, \.{GET}, \.{PUT}, \.{SYNC}, and \.{SWYM}
3197
instructions,
3198
as well as bitwise logical operations, shifts, relative jumps, comparisons,
3199
conditional assignments,
3200
and correctly predicted branches-not-taken or probable-branches-taken.
3201
Mispredicted branches or probable branches cost $3\upsilon$, and
3202
so do the \.{POP} and \.{GO} commands.
3203
Integer multiplication takes $10\upsilon$; integer division weighs in
3204
at~$60\upsilon$.
3205
@.MUL@>
3206
@.DIV@>
3207
@.TRAP@>
3208
@.TRIP@>
3209
@.RESUME@>
3210
\.{TRAP}, \.{TRIP}, and \.{RESUME} cost $5\upsilon$ each.
3211
 
3212
Most floating point operations have a nominal running time of $4\upsilon$,
3213
although the comparison operators \.{FCMP}, \.{FEQL}, and \.{FUN}
3214
need only $1\upsilon$.
3215
\.{FDIV} and \.{FSQRT} cost $40\upsilon$ each.
3216
@.FDIV@>
3217
@.FSQRT@>
3218
@.FREM@>
3219
The actual running time of floating point computations
3220
will vary depending on the operands; for example,
3221
the machine might need one extra $\upsilon$ for each subnormal input
3222
or output, and it might slow down greatly when trips are enabled.
3223
The \.{FREM} instruction might typically cost
3224
$(3+\delta)\upsilon$, where $\delta$ is the amount
3225
by which the exponent of the first operand exceeds the exponent of the
3226
second (or zero, if this amount is negative). A floating point
3227
operation might take only $1\upsilon$
3228
if at least one of its operands is zero, infinity, or~NaN\null.
3229
However, the fixed values stated at the beginning of this paragraph
3230
will be used for all seat-of-the-pants estimates of running time,
3231
since we want to keep the estimates as simple as possible
3232
without making them terribly out of line.
3233
 
3234
All load and store operations will be assumed to cost $\mu+\upsilon$,
3235
except that \.{CSWAP} costs $2\mu+2\upsilon$.
3236
(This applies to all OP~codes that begin with
3237
\Hex8, \Hex9, \Hex{A}, and \Hex{B}, except \Hex{98}--\Hex{9F} and
3238
\Hex{B8}--\Hex{BF}. It's best
3239
to keep the rules simple, because $\mu$ is just
3240
an approximate device for estimating average memory cost.)
3241
\.{SAVE} and \.{UNSAVE} are charged $20\mu+\upsilon$.
3242
@.CSWAP@>
3243
@.SAVE@>
3244
@.UNSAVE@>
3245
 
3246
Of course we must remember that these numbers are very rough.
3247
We have not included the cost of fetching instructions from memory.
3248
Furthermore, an integer multiplication or division might have an effective
3249
cost of only $1\upsilon$, if the result is not needed while other
3250
numbers are being calculated.
3251
Only a detailed simulation can be expected to be truly realistic.
3252
 
3253
@ If you think that \MMIX\ has plenty of operation codes, you are right;
3254
we have now described them all. Here is a chart that shows their
3255
numeric values:
3256
\def\oddline#1{\cr
3257
  \noalign{\nointerlineskip}
3258
  \omit&\setbox0=\hbox{\lower 2.3pt\hbox{\Hex{#1x}}}\smash{\box0}&
3259
  \multispan{17}\hrulefill&
3260
  \setbox0=\hbox{\lower 2.3pt\hbox{\Hex{#1x}}}\smash{\box0}\cr
3261
  \noalign{\nointerlineskip}}
3262
\def\evenline{\cr\noalign{\hrule}}
3263
\def\chartstrut{\lower4.5pt\vbox to14pt{}}
3264
\def\beginchart{$$\tt\halign to\hsize\bgroup
3265
    \chartstrut##\tabskip0pt plus10pt&
3266
    &\hfil##\hfil&\vrule##\cr
3267
    \lower6.5pt\null
3268
    &&&\Hex0&&\Hex1&&\Hex2&&\Hex3&&\Hex4&&\Hex 5&&\Hex 6&&\Hex 7&\evenline}
3269
\def\endchart{\raise11.5pt\null&&&\Hex 8&&\Hex 9&&\Hex A&&\Hex B&
3270
  &\Hex C&&\Hex D&&\Hex E&&\Hex F&\cr\egroup$$}
3271
\def\\#1[#2]{\multispan3\hfil#1[#2]\hfil}
3272
\beginchart
3273
&&&TRAP&&FCMP&&FUN&&FEQL&&FADD&&FIX&&FSUB&&FIXU&\oddline 0
3274
&&&\\FLOT[I]&&\\FLOTU[I]&&\\SFLOT[I]&&\\SFLOTU[I]&\evenline
3275
&&&FMUL&&FCMPE&&FUNE&&FEQLE&&FDIV&&FSQRT&&FREM&&FINT&\oddline 1
3276
&&&\\MUL[I]&&\\MULU[I]&&\\DIV[I]&&\\DIVU[I]&\evenline
3277
&&&\\ADD[I]&&\\ADDU[I]&&\\SUB[I]&&\\SUBU[I]&\oddline 2
3278
&&&\\2ADDU[I]&&\\4ADDU[I]&&\\8ADDU[I]&&\\16ADDU[I]&\evenline
3279
&&&\\CMP[I]&&\\CMPU[I]&&\\NEG[I]&&\\NEGU[I]&\oddline 3
3280
&&&\\SL[I]&&\\SLU[I]&&\\SR[I]&&\\SRU[I]&\evenline
3281
&&&\\BN[B]&&\\BZ[B]&&\\BP[B]&&\\BOD[B]&\oddline 4
3282
&&&\\BNN[B]&&\\BNZ[B]&&\\BNP[B]&&\\BEV[B]&\evenline
3283
&&&\\PBN[B]&&\\PBZ[B]&&\\PBP[B]&&\\PBOD[B]&\oddline 5
3284
&&&\\PBNN[B]&&\\PBNZ[B]&&\\PBNP[B]&&\\PBEV[B]&\evenline
3285
&&&\\CSN[I]&&\\CSZ[I]&&\\CSP[I]&&\\CSOD[I]&\oddline 6
3286
&&&\\CSNN[I]&&\\CSNZ[I]&&\\CSNP[I]&&\\CSEV[I]&\evenline
3287
&&&\\ZSN[I]&&\\ZSZ[I]&&\\ZSP[I]&&\\ZSOD[I]&\oddline 7
3288
&&&\\ZSNN[I]&&\\ZSNZ[I]&&\\ZSNP[I]&&\\ZSEV[I]&\evenline
3289
&&&\\LDB[I]&&\\LDBU[I]&&\\LDW[I]&&\\LDWU[I]&\oddline 8
3290
&&&\\LDT[I]&&\\LDTU[I]&&\\LDO[I]&&\\LDOU[I]&\evenline
3291
&&&\\LDSF[I]&&\\LDHT[I]&&\\CSWAP[I]&&\\LDUNC[I]&\oddline 9
3292
&&&\\LDVTS[I]&&\\PRELD[I]&&\\PREGO[I]&&\\GO[I]&\evenline
3293
&&&\\STB[I]&&\\STBU[I]&&\\STW[I]&&\\STWU[I]&\oddline A
3294
&&&\\STT[I]&&\\STTU[I]&&\\STO[I]&&\\STOU[I]&\evenline
3295
&&&\\STSF[I]&&\\STHT[I]&&\\STCO[I]&&\\STUNC[I]&\oddline B
3296
&&&\\SYNCD[I]&&\\PREST[I]&&\\SYNCID[I]&&\\PUSHGO[I]&\evenline
3297
&&&\\OR[I]&&\\ORN[I]&&\\NOR[I]&&\\XOR[I]&\oddline C
3298
&&&\\AND[I]&&\\ANDN[I]&&\\NAND[I]&&\\NXOR[I]&\evenline
3299
&&&\\BDIF[I]&&\\WDIF[I]&&\\TDIF[I]&&\\ODIF[I]&\oddline D
3300
&&&\\MUX[I]&&\\SADD[I]&&\\MOR[I]&&\\MXOR[I]&\evenline
3301
&&&SETH&&SETMH&&SETML&&SETL&&INCH&&INCMH&&INCML&&INCL&\oddline E
3302
&&&ORH&&ORMH&&ORML&&ORL&&ANDNH&&ANDNMH&&ANDNML&&ANDNL&\evenline
3303
&&&\\JMP[B]&&\\PUSHJ[B]&&\\GETA[B]&&\\PUT[I]&\oddline F
3304
&&&POP&&RESUME&&SAVE&&UNSAVE&&SYNC&&SWYM&&GET&&TRIP&\evenline
3305
\endchart
3306
The notation `\.{[I]}' indicates an operation with an ``immediate'' variant
3307
in which the Z field denotes a constant instead of a register number.
3308
Similarly, `\.{[B]}' indicates an operation with a ``backward'' variant
3309
in which a relative address has a negative displacement. Simulators and
3310
other programs that need to present \MMIX\ instructions in symbolic
3311
form will say that opcode \Hex{20} is \.{ADD} while opcode \Hex{21}
3312
is~\.{ADDI}; they will say that \Hex{F2} is \.{PUSHJ} while \Hex{F3}
3313
is~\.{PUSHJB}. But the \MMIX\ assembler uses only the forms \.{ADD}
3314
and \.{PUSHJ}, not \.{ADDI} or \.{PUSHJB}.
3315
 
3316
To read this chart, use the hexadecimal digits at the top, bottom,
3317
left, and right.
3318
For example, operation code \.{A9} in hexadecimal notation appears in
3319
the lower part of the \Hex{Ax} row and in the \Hex1/\Hex9 column; it is
3320
\.{STTI}, `store tetrabyte immediate'.
3321
@^OP codes, table@>
3322
 
3323
%The blank spaces in this chart are undefined opcodes,
3324
%reserved for future extension.
3325
%If an instruction with such
3326
%an opcode is encountered in a user program, it is considered to be
3327
%an illegal instruction (like, say, \.{FIX} with the \.Y field greater than~9),
3328
%@^illegal instructions@>
3329
%triggering an interrupt. Such instructions might become defined in
3330
%later versions of\/ \MMIX, at which time the operating system
3331
%could probably emulate the new instructions for backward compatibility.
3332
%@^version number@>
3333
 
3334
\def\\#1{\leavevmode\hbox{\it#1\/\kern.05em}} % italic type for identifiers
3335
 
3336
@*Index. (References are to section numbers, not page numbers.)

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.