  • This comparison shows the changes necessary to convert path
    /eco32/tags/eco32-0.26/fp/implementation/mmix
    from Rev 15 to Rev 270

/mmix-doc.w
0,0 → 1,3336
% This file is part of the MMIXware package (c) Donald E Knuth 1999
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
 
\def\title{MMIX}
\input epsf % input macros for dvips to include METAPOST illustrations
 
\def\MMIX{\.{MMIX}}
\def\NNIX{\hbox{\mc NNIX}}
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
\def\beginword{\vcenter\bgroup\let\\=\wordrule\halign\bgroup&\hfil##\hfil\cr}
\def\endword{\noalign{\vskip\baselineskip}\egroup\egroup
\advance\belowdisplayskip-\baselineskip}
\def\wordrule{\vrule height 9.5pt depth 4.5pt width .4pt}
\newdimen\bitwd \bitwd=6.6pt
\def\field#1#2{\vrule depth 3pt width 0pt \hbox to#1\bitwd{\hss$#2$\hss}}
 
\def\XF{\\{XF}}\def\XM{\\{XM}}\def\XD{\\{XD}} % these not in \tt
\def\PC{\\{PC}}
\def\Jx{\.{J}} % conversely, I type J_ to get J in \tt
 
\def\s{{\rm s}}
\def\rX{{\rm\$X}} \def\rY{{\rm\$Y}} \def\rZ{{\rm\$Z}}
\def\mm{{\rm M}} \def\xx{{\rm X}} \def\yy{{\rm Y}} \def\zz{{\rm Z}}
%\def\ll{{\rm L}} \def\gg{{\rm G}}
\def\ll{L} \def\gg{G}
\def\?{\mkern-1mu}
 
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>
 
@* Introduction to MMIX.
Thirty-eight years have passed since the \.{MIX} computer was designed, and
computer architecture has been converging during those years
towards a rather different
style of machine. Therefore it is time to replace \.{MIX} with a new
computer that contains even less saturated fat than its predecessor.
 
Exercise 1.3.1--25 in the third edition of
{\sl Fundamental Algorithms\/} speaks of an extended
\.{MIX} called MixMaster, which is upward compatible with the old version.
But MixMaster itself is hopelessly obsolete; although it allows for
several gigabytes of memory, we can't even use it with {\mc ASCII} code to
get lowercase letters. And ouch, the standard subroutine calling convention
of \.{MIX} is irrevocably based on self-modifying code! Decimal arithmetic
and self-modifying code were popular in 1962, but they sure have disappeared
quickly as machines have gotten bigger and faster. A completely new
design is called for, based on the principles of RISC architecture as
expounded in {\sl Computer Architecture\/} by Hennessy and Patterson
(Morgan Kaufmann, 1996). % first ed was "Morgan Kaufman"! but now "nn" is legit
@^Hennessy, John LeRoy@>
@^Patterson, David Andrew@>
 
So here is \MMIX, a computer that will totally replace \.{MIX}
in the ``ultimate'' editions of {\sl The Art of Computer Programming},
Volumes 1--3, and in the first editions of the remaining volumes.
I~must confess that
I~can hardly wait to own a computer like this.
 
How do you pronounce \MMIX? I've been saying ``em-mix'' to myself,
because the first `\.M' represents a new millennium. Therefore I~use
the article ``an'' instead of~``a'' before the name \MMIX\
in English phrases like ``an \MMIX\ simulator.''
 
Incidentally, the {\sl Dictionary of American Regional English\/ \bf3} (1996)
lists ``mommix'' as a common dialect word used both as a noun and a verb;
to mommix something means to botch it, to bollix it. Only time will
tell whether I~have mommixed the definition of \MMIX.
 
@ The original \.{MIX} computer could be operated without an operating
system; you could bootstrap it with punched cards or paper tape and do
everything yourself. But nowadays such power is no longer in the hands of
ordinary users. The \MMIX\ hardware, like all other computing machines
made today, relies on an operating system to get jobs
started in their own address spaces and to provide I/O capabilities.
 
Whenever anybody has asked if I will be writing about operating systems,
my reply has always been ``Nix.'' Therefore the name of\/ \MMIX's operating
system, \NNIX, will come as no surprise.
@:NNIX}{\NNIX\ operating system@>
@^operating system@>
From time to time I will necessarily have to refer to things that \NNIX\ does
for its users, but I am unable to build \NNIX\ myself. Life is
too short. It would be wonderful if some expert in operating system design
became inspired to write a book that explains exactly how to construct a nice,
clean \NNIX\ kernel for an \MMIX\ chip.
 
@ I am deeply grateful to the many people who have helped me shape the behavior
of\/ \MMIX. In particular, John Hennessy and (especially) Dick Sites
have made significant contributions.
@^Hennessy, John LeRoy@>
@^Sites, Richard Lee@>
 
@ A programmer's introduction to \MMIX\ appears in ``Fascicle~1,'' a booklet
@^Fascicle 1@>
containing tutorial material that will ultimately appear in the fourth edition
of {\sl The Art of Computer Programming}.
The description in the following sections is rather different, because
we are concerned about a complete implementation, including all of the
features used by the operating system and invisible to normal programs.
Here it is important to emphasize exceptional cases that were glossed over
in the tutorial, and~to consider
nitpicky details about things that might go wrong.
 
@* MMIX basics.
\MMIX\ is a 64-bit RISC machine with at least 256 general-purpose registers
and a 64-bit address space.
Every instruction is four bytes long and has the form
$$\vcenter{\offinterlineskip
\def\\#1&{\omit&#1&}
\hrule
\halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr
height 9pt depth4pt&OP&&X&&Y&&Z&\cr}
\hrule}\,.$$
The 256 possible OP codes fall into a dozen or so easily remembered
@^OP codes@>
categories; an instruction usually means, ``Set register X to the
result of\/ Y~OP~Z\null.'' For example,
$$\vcenter{\offinterlineskip
\def\\#1&{\omit&#1&}
\hrule
\halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr
height 9pt depth4pt&32&&1&&2&&3&\cr}
\hrule}$$
sets register~1 to the sum of registers 2 and 3.
A few instructions combine the Y and Z bytes into
a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field.
But the three bytes X, Y, Z usually have three-pronged significance
independent of each other.
 
Instructions are usually represented in a symbolic form corresponding
to the \MMIX\ assembly language, in which each operation code has a mnemonic
name. For example, operation~32 is \.{ADD}, and the instruction above
might be written `\.{ADD} \.{\$1,\$2,\$3}'; a dollar sign `\.\$' symbolizes
a register number. In general, the instruction
\.{ADD}~\.{\$X,\$Y,\$Z} is the operation of setting $\rX=\rY+\rZ$.
An assembly language instruction with two commas has three operand
fields X, Y,~Z; an instruction with one comma has two operand fields
X,~YZ; an instruction with no comma has one operand field,~XYZ;
an instruction with no operands has $\xx=\yy=\zz=0$.
 
\def\0{\$Z\char'174Z}
Most instructions have two forms, one in which the Z field stands for
register \$Z, and one in which Z is an unsigned ``immediate'' constant.
@^immediate operands@>
Thus, for example, the command `\.{ADD} \.{\$X,\$Y,\$Z}' has a counterpart
`\.{ADD} \.{\$X,\$Y,Z}', which sets $\rX=\rY+\zz$. Immediate constants
are always nonnegative.
In the descriptions
below we will introduce such pairs of instructions
by writing just `\.{ADD}~\.{\$X,\$Y,\0}' instead of naming both
cases explicitly.
 
The operation code for \.{ADD}~\.{\$X,\$Y,\$Z} is 32, but the operation
code for \.{ADD}~\.{\$X,\$Y,Z} is~33. The \MMIX\ assembler chooses the correct
code by noting whether the third argument is a register number or~not.
 
Register numbers and constants can be given symbolic names; for example, the
assembly language instruction `\.x~\.{IS}~\.{\$1}' makes \.x an
abbreviation for register number~1. Similarly, `\.{FIVE}~\.{IS}~\.5'
makes \.{FIVE} an abbreviation for the constant~5.
After these abbreviations have been specified, the instruction
\.{ADD}~\.{x,x,FIVE} increases \$1 by~5, using opcode~33, while
the instruction \.{ADD}~\.{x,x,x} doubles \$1 using opcode~32.
Symbolic names that stand for register numbers
conventionally begin with a lowercase letter, while names that stand
for constants conventionally begin with an uppercase letter.
This convention is not actually enforced by the assembler,
but it tends to reduce a programmer's confusion.
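The instruction layout and the choice between opcodes 32 and~33 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the \MMIX\ assembler: the helper names are hypothetical, and the caller must say explicitly whether the third operand is a register.

```python
ADD = 32  # opcode for ADD $X,$Y,$Z; the immediate form uses ADD + 1 = 33


def encode(op, x, y, z):
    """Pack the four one-byte fields OP, X, Y, Z into one tetrabyte."""
    for field in (op, x, y, z):
        if not 0 <= field <= 255:
            raise ValueError("each field must fit in one byte")
    return (op << 24) | (x << 16) | (y << 8) | z


def encode_add(x, y, z, z_is_register):
    """Pick opcode 32 or 33 depending on whether Z names a register."""
    return encode(ADD if z_is_register else ADD + 1, x, y, z)


print(hex(encode_add(1, 2, 3, True)))   # ADD $1,$2,$3  -> 0x20010203
print(hex(encode_add(1, 1, 5, False)))  # ADD x,x,FIVE  -> 0x21010105
```

The second call corresponds to the definitions `x IS $1' and `FIVE IS 5' used in the text.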
 
@ A {\it nybble\/} is a 4-bit quantity, often used to denote a decimal
or hexadecimal digit.
A {\it byte\/} is an 8-bit quantity, often used to denote an alphanumeric
character in {\mc ASCII} code. The Unicode standard extends {\mc ASCII} to
@^Unicode@>
@^ASCII@>
essentially all the world's languages by using 16-bit-wide characters called
{\it wydes\/}. (Weight watchers know that two nybbles make one byte,
but two bytes make one wyde.)
In the discussion below we use the term
{\it tetrabyte\/} or ``tetra'' for a 4-byte quantity, and the similar term
@^nybble@>
@^byte@>
@^wyde@>
@^tetrabyte@>
@^octabyte@>
{\it octabyte\/} or ``octa'' for an 8-byte quantity. Thus, a tetra is
two wydes, an octa is two tetras; an octabyte has 64~bits. Each \MMIX\
register can be thought of as containing one octabyte, or two tetras,
or four wydes, or eight bytes, or sixteen nybbles.
 
When bytes, wydes, tetras, and octas represent numbers they are said to be
either {\it signed\/} or {\it unsigned}. An unsigned byte is a number between
0~and $2^8-1=255$ inclusive; an unsigned wyde lies, similarly, between
0~and $2^{16}-1=65535$; an unsigned tetra lies between
0~and $2^{32}-1=4{,}294{,}967{,}295$; an unsigned octa lies between
0~and $2^{64}-1=18{,}446{,}744{,}073{,}709{,}551{,}615$.
Their signed counterparts use the
conventions of two's complement notation, by subtracting respectively $2^8$,
$2^{16}$, $2^{32}$, or~$2^{64}$ times the most significant bit. Thus,
the unsigned bytes 128 through 255 are regarded as the numbers $-128$
through~$-1$ when they are evaluated as signed bytes; a signed byte therefore
lies between $-128$ and $+127$, inclusive. A signed wyde is a number
between $-32768$ and $+32767$; a signed tetra lies between
$-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$; a signed octa lies between
$-9{,}223{,}372{,}036{,}854{,}775{,}808$ and $+9{,}223{,}372{,}036{,}854{,}775{,}807$.
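The rule ``subtract $2^{8n}$ times the most significant bit'' can be written out directly. The following Python sketch uses a hypothetical helper name and is meant only to make the ranges above easy to check.

```python
def as_signed(u, nbytes):
    """Two's complement reading of an unsigned value occupying nbytes bytes."""
    bits = 8 * nbytes
    if not 0 <= u < (1 << bits):
        raise ValueError("value does not fit in the given width")
    msb = u >> (bits - 1)       # most significant bit, 0 or 1
    return u - (msb << bits)    # subtract 2**bits times that bit


print(as_signed(255, 1))        # unsigned byte 255 read as a signed byte: -1
print(as_signed(128, 1))        # unsigned byte 128 read as a signed byte: -128
```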
 
The virtual memory of\/ \MMIX\ is an array M of $2^{64}$ bytes. If $k$ is any
unsigned octabyte, M[$k$]~is a 1-byte quantity. \MMIX\ machines do not
actually have such vast memories, but programmers can act as if $2^{64}$ bytes
are indeed present, because \MMIX\ provides address translation mechanisms by
which an operating system can maintain this illusion.
 
We use the notation $\mm_{2^t}[k]$ to stand for a number consisting of
$2^t$~consecutive bytes starting at location~$k\land\nobreak(2^{64}-2^t)$.
(The notation $k\land(2^{64}-2^t)$ means that the least
significant $t$ bits of~$k$ are set to~0, and only the least 64~bits
of the resulting address are retained. Similarly, the notation
$k\lor(2^t-1)$ means that the least significant $t$ bits of~$k$ are set to~1.)
All accesses to $2^t$-byte quantities by \MMIX\ are {\it aligned}, in the sense
that the address of the first byte is a multiple of~$2^t$.
 
Addressing is always ``big-endian.'' In other words, the
@^big-endian versus little-endian@>
@^little-endian versus big-endian@>
most significant (leftmost) byte of $\mm_{2^t}[k]$ is
$\mm_1[k\land\nobreak(2^{64}-2^t)]$
and the least significant (rightmost) byte is $\mm_1[k\lor(2^t-1)]$.
We use the notation $\s(\mm_{2^t}[k])$ when we want to regard
this $2^t$-byte number as a {\it signed\/} integer.
Formally speaking, if $l=2^t$,
@^signed integers@>
$$\s(\mm_l[k])=\bigl(\mm_1[k\land(-l)]\,\mm_1[k\land(-l)+1]\,\ldots\,
\mm_1[k\lor(l-1)]\bigr)_{256}
-2^{8l}[\mm_1[k\land(-l)]\!\ge\!128].$$
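A small Python model may make the alignment and big-endian conventions concrete. It is a sketch, not the simulator's code: memory is a dictionary from addresses to byte values, unmapped bytes are assumed to read as zero, and the function name is hypothetical.

```python
def load(mem, k, t, signed=False):
    """Return the 2**t-byte quantity at address k, aligned and big-endian."""
    size = 1 << t
    base = k & ((1 << 64) - size)        # clear the t low bits, keep 64 bits
    value = 0
    for offset in range(size):           # most significant byte comes first
        value = (value << 8) | mem.get(base + offset, 0)
    if signed and value >= 1 << (8 * size - 1):
        value -= 1 << (8 * size)         # apply the signed interpretation
    return value


data = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF]
mem = {0x1000 + i: b for i, b in enumerate(data)}

print(hex(load(mem, 0x1003, 1)))         # wyde at 0x1002: 0x4567
print(hex(load(mem, 0x1005, 2)))         # tetra at 0x1004: 0x89abcdef
```

Note that the addresses 0x1003 and 0x1005 are silently rounded down, exactly as the notation prescribes.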
 
@* Loading and storing.
Several instructions can be used to get information from memory into
registers. For example, the ``load tetra unsigned'' instruction
\.{LDTU} \.{\$1,\$4,\$5}
puts the four bytes $\mm_4[\$4+\$5]$ into register~1 as an unsigned
integer;
the most significant four bytes of register~1 are set to zero.
The similar instruction \.{LDT} \.{\$1,\$4,\$5}, ``load tetra,'' sets
\$1 to the {\it signed\/} integer $\s(\mm_4[\$4+\$5])$.
(Instructions generally treat numbers as
@^signed integers@>
signed unless the operation code specifically calls them
unsigned.) In the signed case, the most significant four bytes of the
register will be copies of the most significant bit of the tetrabyte
loaded; thus they will be all~0s or all~1s, depending on whether the
number is $\ge0$ or $<0$.
 
\def\bull{\smallbreak\textindent{$\bullet$}}
\def\bul{\par\textindent{$\bullet$}}
\def\<#1 #2 {\.{#1}~\.{#2} }
\def\>{\hfill\break}
 
\bull\<LDB \$X,\$Y,\0 `load byte'.\>
@.LDB@>
Byte $\s(\mm[\rY+\rZ])$ or $\s(\mm[\rY+\zz])$ is loaded into register~X as a
signed number between $-128$ and $+127$, inclusive.
 
\bull\<LDBU \$X,\$Y,\0 `load byte unsigned'.\>
@.LDBU@>
Byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$ is loaded into register~X as an
unsigned number between $0$ and $255$, inclusive.
 
\bull\<LDW \$X,\$Y,\0 `load wyde'.\>
@.LDW@>
Bytes $\s(\mm_2[\rY+\rZ])$ or $\s(\mm_2[\rY+\zz])$
are loaded into register~X as a signed number between $-32768$ and $+32767$,
inclusive. As mentioned above, our notation $\mm_2[k]$ implies that
the least significant bit of the address $\rY+\rZ$ or $\rY+\zz$ is
ignored and assumed to be~0.
 
\bull\<LDWU \$X,\$Y,\0 `load wyde unsigned'.\>
@.LDWU@>
Bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$ are loaded
into register~X as an unsigned number between $0$ and $65535$, inclusive.
 
\bull\<LDT \$X,\$Y,\0 `load tetra'.\>
@.LDT@>
Bytes $\s(\mm_4[\rY+\rZ])$ or $\s(\mm_4[\rY+\zz])$
are loaded into register~X as a signed number between $-2{,}147{,}483{,}648$ and
$+2{,}147{,}483{,}647$, inclusive.
As mentioned above, our notation $\mm_4[k]$ implies that
the two least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are
ignored and assumed to be~0.
 
\bull\<LDTU \$X,\$Y,\0 `load tetra unsigned'.\>
@.LDTU@>
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$
are loaded into register~X as an unsigned number between 0 and
4{,}294{,}967{,}295, inclusive.
 
\bull\<LDO \$X,\$Y,\0 `load octa'.\>
@.LDO@>
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into
register~X\null.
As mentioned above, our notation $\mm_8[k]$ implies that
the three least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are
ignored and assumed to be~0.
 
\bull\<LDOU \$X,\$Y,\0 `load octa unsigned'.\>
@.LDOU@>
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into
register~X\null. There is in fact no difference between the behavior of
\.{LDOU} and~\.{LDO}, since
an octabyte can be regarded as either signed or unsigned. \.{LDOU} is included
in \MMIX\ just for completeness and consistency, in spite of the fact that
a foolish consistency is the hobgoblin of little minds.
@^Emerson, Ralph Waldo@>
(Niklaus Wirth made a strong plea for such consistency in his early critique
of System/360; see {\sl JACM\/ \bf15} (1968), 37--74.)
@^Wirth, Niklaus Emil@>
@^System/360@>
 
\bull\<LDHT \$X,\$Y,\0 `load high tetra'.\>
@.LDHT@>
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$ are loaded into the most
significant half of register~X, and the least significant half is
cleared to zero. (One use of ``high tetra arithmetic'' is to detect
overflow easily when tetrabytes are added or subtracted.)
 
\bull\<LDA \$X,\$Y,\0 `load address'.\>
The address $\rY+\rZ$ or $\rY+\zz$ is loaded into register~X\null. This
instruction is simply another name for the \.{ADDU} instruction
discussed below; it can
be used when the programmer is thinking of memory addresses
instead of numbers.
The \MMIX\ assembler converts \.{LDA} into the same OP-code as \.{ADDU}.
@.LDA@>
@.ADDU@>
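The difference between the three tetra loads lies only in how the 32 loaded bits are placed in the 64-bit register. The Python sketch below shows the resulting register patterns; the function names merely echo the mnemonics and are not taken from any \MMIX\ software.

```python
MASK64 = (1 << 64) - 1


def ldt(tetra):
    """Pattern after LDT: the tetra's top bit fills the upper four bytes."""
    if tetra & 0x80000000:
        return tetra | 0xFFFFFFFF00000000
    return tetra


def ldtu(tetra):
    """Pattern after LDTU: the upper four bytes are zero."""
    return tetra


def ldht(tetra):
    """Pattern after LDHT: tetra in the upper half, lower half zero."""
    return (tetra << 32) & MASK64


for f in (ldt, ldtu, ldht):
    print(f.__name__, format(f(0x89ABCDEF), "016x"))
```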
 
@ Another family of instructions goes the other way, storing registers into
memory. For example, the ``store octa immediate'' command
\<STO \$3,\$2,17 puts the current contents of register~3
into $\mm_8[\$2+17]$.
 
\bull\<STB \$X,\$Y,\0 `store byte'.\>
@.STB@>
The least significant byte of register~X is stored into
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. An integer overflow exception occurs if
@.overflow@>
\$X is not between $-128$ and $+127$. (We will discuss overflow and other
kinds of exceptions later.)
 
\bull\<STBU \$X,\$Y,\0 `store byte unsigned'.\>
@.STBU@>
The least significant byte of register~X is stored into
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. \.{STBU} instructions are the same
as \.{STB} instructions, except that no test for overflow is made.
 
\bull\<STW \$X,\$Y,\0 `store wyde'.\>
@.STW@>
The two least significant bytes of register~X are stored into
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$.
An integer overflow exception occurs if
\$X is not between $-32768$ and $+32767$.
 
\bull\<STWU \$X,\$Y,\0 `store wyde unsigned'.\>
@.STWU@>
The two least significant bytes of register~X are stored into
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$.
\.{STWU} instructions are the same
as \.{STW} instructions, except that no test for overflow is made.
 
\bull\<STT \$X,\$Y,\0 `store tetra'.\>
@.STT@>
The four least significant bytes of register~X are stored into
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
An integer overflow exception occurs if
\$X is not between $-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$.
 
\bull\<STTU \$X,\$Y,\0 `store tetra unsigned'.\>
@.STTU@>
The four least significant bytes of register~X are stored into
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
\.{STTU} instructions are the same
as \.{STT} instructions, except that no test for overflow is made.
 
\bull\<STO \$X,\$Y,\0 `store octa'.\>
@.STO@>
Register X is stored into bytes $\mm_8[\rY+\rZ]$ or
$\mm_8[\rY+\zz]$.
 
\bull\<STOU \$X,\$Y,\0 `store octa unsigned'.\>
@.STOU@>
Identical to \.{STO} \.{\$X,\$Y,\0}.
 
\bull\<STCO X,\$Y,\0 `store constant octabyte'.\>
@.STCO@>
An octabyte whose value is the unsigned byte X is stored into
$\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$.
 
\bull\<STHT \$X,\$Y,\0 `store high tetra'.\>
The most significant four bytes of register~X are stored into
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
@.STHT@>
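The signed stores differ from the unsigned ones only in the overflow test, as the following Python sketch suggests. It treats the register as a signed Python integer and returns the bytes that would be written together with a flag; the names are hypothetical and exception handling itself is not modeled.

```python
def store_signed(x, nbytes):
    """Sketch of STB/STW/STT: return (bytes stored, overflow flag)."""
    bits = 8 * nbytes
    overflow = not (-(1 << (bits - 1)) <= x < (1 << (bits - 1)))
    low = x & ((1 << bits) - 1)          # least significant bytes of $X
    return low.to_bytes(nbytes, "big"), overflow


def stco(x_byte):
    """Sketch of STCO: the unsigned byte X widened to an octabyte."""
    return x_byte.to_bytes(8, "big")


print(store_signed(-1, 1))    # fits in a signed byte, no overflow
print(store_signed(255, 1))   # same byte stored, but STB would overflow
```

The unsigned variants store the same bytes and simply ignore the flag.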
 
@* Adding and subtracting.
Once numbers are in registers, we can compute with them. Let's consider
addition and subtraction first.
 
\bull\<ADD \$X,\$Y,\0 `add'.\>
@.ADD@>
The sum $\rY+\rZ$ or $\rY+\zz$ is placed into register~X using signed,
two's complement arithmetic.
An integer overflow exception occurs if the sum is $\ge2^{63}$ or $<-2^{63}$.
(We will discuss overflow and other kinds of exceptions later.)
@.overflow@>
 
\bull\<ADDU \$X,\$Y,\0 `add unsigned'.\>
@.ADDU@>
The sum $(\rY+\rZ)\bmod2^{64}$ or $(\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
These instructions are the same
as \.{ADD}~\.{\$X,\$Y,\0} commands
except that no test for overflow is made.
(Overflow could be detected if desired by using the command
\<CMPU ovflo,\$X,\$Y after addition, where \.{CMPU} means
@.CMPU@>
``compare unsigned''; see below.)
 
\bull\<2ADDU \$X,\$Y,\0 `times 2 and add unsigned'.\>
@.2ADDU@>
The sum $(2\rY+\rZ)\bmod2^{64}$ or $(2\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
 
\bull\<4ADDU \$X,\$Y,\0 `times 4 and add unsigned'.\>
@.4ADDU@>
The sum $(4\rY+\rZ)\bmod2^{64}$ or $(4\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
 
\bull\<8ADDU \$X,\$Y,\0 `times 8 and add unsigned'.\>
@.8ADDU@>
The sum $(8\rY+\rZ)\bmod2^{64}$ or $(8\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
 
\bull\<16ADDU \$X,\$Y,\0 `times 16 and add unsigned'.\>
@.16ADDU@>
The sum $(16\rY+\rZ)\bmod2^{64}$ or $(16\rY+\zz)\bmod2^{64}$
is placed into register~X\null.
 
\bull\<SUB \$X,\$Y,\0 `subtract'.\>
@.SUB@>
The difference $\rY-\rZ$ or $\rY-\zz$ is placed into register~X using
signed, two's complement arithmetic.
An integer overflow exception occurs if the difference is $\ge2^{63}$ or
$<-2^{63}$.
 
\bull\<SUBU \$X,\$Y,\0 `subtract unsigned'.\>
@.SUBU@>
The difference $(\rY-\rZ)\bmod2^{64}$ or $(\rY-\zz)\bmod2^{64}$
is placed into register~X\null.
These two instructions are the same
as \.{SUB}~\.{\$X,\$Y,\0} except that no test for overflow is made.
 
\bull\<NEG \$X,Y,\0 `negate'.\>
@.NEG@>
The value $\yy-\rZ$ or $\yy-\zz$ is placed into register~X using
signed, two's complement arithmetic.
An integer overflow exception occurs if the result is greater
than~$2^{63}-\nobreak1$.
(Notice that in this case \MMIX\ works with the ``immediate'' constant~Y,
not register~Y\null. \.{NEG} commands are analogous to the immediate variants
of other commands, because they save us from having to put one-byte
constants into a register. When $\yy=0$, overflow occurs if and
only if $\rZ=-2^{63}$. The instruction \<NEG \$X,1,2 has exactly the
same effect as \.{NEG}~\.{\$X,0,1}.)
 
\bull\<NEGU \$X,Y,\0 `negate unsigned'.\>
@.NEGU@>
The value $(\yy-\rZ)\bmod2^{64}$ or $(\yy-\zz)\bmod2^{64}$
is placed into register~X\null.
\.{NEGU} instructions are the same
as \.{NEG} instructions, except that no test for overflow is made.
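The overflow rules above can be checked with a short Python model that works on 64-bit patterns. This is a sketch with hypothetical names; it returns an overflow flag instead of raising an \MMIX\ exception, and the last helper imitates the unsigned comparison suggested for detecting a carry after \.{ADDU}.

```python
MASK64 = (1 << 64) - 1


def signed64(u):
    """Signed value of a 64-bit pattern."""
    return u - (1 << 64) if u >> 63 else u


def add(y, z):
    """ADD: return (result pattern, overflow flag)."""
    total = signed64(y) + signed64(z)
    return total & MASK64, not (-(1 << 63) <= total < (1 << 63))


def addu(y, z, scale=1):
    """ADDU, 2ADDU, 4ADDU, 8ADDU, 16ADDU for scale = 1, 2, 4, 8, 16."""
    return (scale * y + z) & MASK64


def neg(y_imm, z):
    """NEG with immediate Y: return (result pattern, overflow flag)."""
    diff = y_imm - signed64(z)
    return diff & MASK64, diff > (1 << 63) - 1


def addu_carry(y, z):
    """A carry occurred exactly when the wrapped sum is below y."""
    return addu(y, z) < y
```

For instance, `neg(1, 2)` and `neg(0, 1)` return the same pattern, matching the remark about \.{NEG}~\.{\$X,1,2}.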
 
@* Bit fiddling.
Before looking at multiplication and division, which take longer than
addition and subtraction, let's look at some of the other things that
\MMIX\ can do fast. There are eighteen instructions for bitwise
logical operations on unsigned numbers.
 
\bull\<AND \$X,\$Y,\0 `bitwise and'.\>
@.AND@>
Each bit of register Y is logically anded with the corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bits of the operands are both~1;
in symbols, $\rX=\rY\land\rZ$ or $\rX=\rY\land\zz$.
This means in particular that \<AND \$X,\$Y,Z always zeroes out the seven
most significant bytes of register~X, because 0s are prefixed to the
constant byte~Z\null.
 
\bull\<OR \$X,\$Y,\0 `bitwise or'.\>
@.OR@>
Each bit of register Y is logically ored with the corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~0 if and only if the
corresponding bits of the operands are both~0;
in symbols, $\rX=\rY\lor\rZ$ or $\rX=\rY\lor\zz$.
 
In the special case $\zz=0$, the immediate variant of
this command simply copies register~Y to
register~X\null. The \MMIX\ assembler allows us to write
`\.{SET}~\.{\$X,\$Y}' as a convenient abbreviation for
`\.{OR}~\.{\$X,\$Y,0}'.
@.SET@>
 
\bull\<XOR \$X,\$Y,\0 `bitwise exclusive-or'.\>
@.XOR@>
Each bit of register Y is logically xored with the corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~0 if and only if the
corresponding bits of the operands are equal;
in symbols, $\rX=\rY\oplus\rZ$ or $\rX=\rY\oplus\zz$.
 
\bull\<ANDN \$X,\$Y,\0 `bitwise and-not'.\>
@.ANDN@>
Each bit of register Y is logically anded with the complement of the
corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bit of register~Y is~1 and the other corresponding bit is~0;
in symbols, $\rX=\rY\setminus\rZ$ or $\rX=\rY\setminus\zz$.
(This is the {\it logical difference\/} operation; if the operands
are bit strings representing sets, we are computing the elements that
lie in one set but not the other.)
 
\bull\<ORN \$X,\$Y,\0 `bitwise or-not'.\>
@.ORN@>
Each bit of register Y is logically ored with the complement of the
corresponding bit of
register~Z or of the constant~Z, and the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bit of register~Y is greater than or equal to the other corresponding bit;
in symbols, $\rX=\rY\lor\overline\rZ$
or $\rX=\rY\lor\overline\zz$.
(This is the complement of $\rZ\setminus\rY$ or $\zz\setminus\rY$.)
 
\bull\<NAND \$X,\$Y,\0 `bitwise not-and'.\>
@.NAND@>
Each bit of register Y is logically anded with the corresponding bit of
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
In other words, a bit of register~X is set to~0 if and only if the
corresponding bits of the operands are both~1;
in symbols, $\rX=\rY\mathbin{\overline\land}\rZ$ or
$\rX=\rY\mathbin{\overline\land}\zz$.
 
\bull\<NOR \$X,\$Y,\0 `bitwise not-or'.\>
@.NOR@>
Each bit of register Y is logically ored with the corresponding bit of
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bits of the operands are both~0;
in symbols, $\rX=\rY\mathbin{\overline\lor}\rZ$ or
$\rX=\rY\mathbin{\overline\lor}\zz$.
 
\bull\<NXOR \$X,\$Y,\0 `bitwise not-exclusive-or'.\>
@.NXOR@>
Each bit of register Y is logically xored with the corresponding bit of
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null.
In other words, a bit of register~X is set to~1 if and only if the
corresponding bits of the operands are equal;
in symbols, $\rX=\rY\mathbin{\overline\oplus}\rZ$ or
$\rX=\rY\mathbin{\overline\oplus}\zz$.
 
\bull\<MUX \$X,\$Y,\0 `bitwise multiplex'.\>
@.MUX@>
For each bit position~$j$, the $j$th bit of register~X is set either to
bit~$j$ of register~Y
or to bit~$j$ of the other operand \$Z~or~Z, depending on
whether bit~$j$ of the special {\it mask register\/}~rM is 1 or 0:
@^rM@>
if ${\rm M}_j$ then $\yy_j$ else~$\zz_j$.
In symbols, $\rm\rX=(\rY\land rM)\lor(\rZ\land\overline{rM})$ or
$\rm\rX=(\rY\land rM)\lor(\zz\land\overline{rM})$.
(\MMIX\ has several such special registers, associated with instructions that
need more than two inputs or produce more than one output.)
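The less familiar operations in this list are easy to express with ordinary Boolean operators. The Python sketch below works on 64-bit patterns; the function names echo the mnemonics, and the mask register is passed as an ordinary argument, which is a simplification.

```python
MASK64 = (1 << 64) - 1


def andn(y, z):
    return y & ~z & MASK64


def orn(y, z):
    return (y | ~z) & MASK64


def nand(y, z):
    return ~(y & z) & MASK64


def nor(y, z):
    return ~(y | z) & MASK64


def nxor(y, z):
    return ~(y ^ z) & MASK64


def mux(y, z, rM):
    """Take bit j from y where rM has a 1, and from z where rM has a 0."""
    return (y & rM) | (z & ~rM & MASK64)


print(hex(mux(0xAAAA, 0x5555, 0xFF00)))   # 0xaa55
```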
 
@ Besides the eighteen bitwise operations, \MMIX\ can also perform unsigned
bytewise and biggerwise operations that are somewhat more exotic.
 
\bull\<BDIF \$X,\$Y,\0 `byte difference'.\>
@.BDIF@>
For each byte position~$j$, the $j$th byte of register~X is set to byte~$j$ of
register~Y minus byte~$j$ of the other operand \$Z~or~Z, unless that
difference is negative; in the latter case, byte~$j$ of~\$X is set to zero.
 
\bull\<WDIF \$X,\$Y,\0 `wyde difference'.\>
@.WDIF@>
For each wyde position~$j$, the $j$th wyde of register~X is set to wyde~$j$ of
register~Y minus wyde~$j$ of the other operand \$Z~or~Z, unless that
difference is negative; in the latter case, wyde~$j$ of~\$X is set to zero.
 
\bull\<TDIF \$X,\$Y,\0 `tetra difference'.\>
@.TDIF@>
For each tetra position~$j$, the $j$th tetra of register~X is set to tetra~$j$ of
register~Y minus tetra~$j$ of the other operand \$Z~or~Z, unless that
difference is negative; in the latter case, tetra~$j$ of~\$X is set to zero.
 
\bull\<ODIF \$X,\$Y,\0 `octa difference'.\>
@.ODIF@>
Register~X is set to register~Y minus the other operand \$Z~or~Z, unless
\$Z~or~Z exceeds register~Y; in the latter case,
\$X~is set to zero. The operands are treated as unsigned integers.
 
\smallskip
The \.{BDIF} and \.{WDIF} commands are useful
in applications to graphics or video; \.{TDIF} and \.{ODIF} are also
present for reasons of consistency. For example, if \.a and \.b are
registers containing
8-byte quantities, their bytewise maxima~\.c and
bytewise minima~\.d are computed by
$$\hbox{\tt BDIF x,a,b; ADDU c,x,b; SUBU d,a,x;}$$
similarly, the individual ``pixel differences'' \.e, namely the absolute
values of the differences of corresponding bytes, are computed by
$$\hbox{\tt BDIF x,a,b; BDIF y,b,a; OR e,x,y.}$$
To add individual
bytes of \.a and \.b while clipping all sums to 255 if they don't fit
in a single byte, one can say
$$\hbox{\tt NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;}$$
in other words, complement \.a, apply \.{BDIF}, and complement the result.
The operations can also be used to construct efficient operations on
strings of bytes or wydes.
@^graphics@>
@^pixels@>
@^saturating arithmetic@>
@^nybble@>
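The three idioms just given can be tried out with a Python model of \.{BDIF}. This is only a sketch: the function name and the sample values of \.a and~\.b are arbitrary choices made for illustration.

```python
MASK64 = (1 << 64) - 1


def bdif(y, z):
    """Bytewise saturating difference of two 64-bit patterns."""
    out = 0
    for shift in range(0, 64, 8):
        d = ((y >> shift) & 0xFF) - ((z >> shift) & 0xFF)
        out |= max(d, 0) << shift
    return out


a = 0x10FF007F80010203
b = 0x20FE017F7F020202

x = bdif(a, b)
c = (x + b) & MASK64                       # ADDU c,x,b: bytewise maxima
d = (a - x) & MASK64                       # SUBU d,a,x: bytewise minima
e = bdif(a, b) | bdif(b, a)                # absolute differences
clipped = ~bdif(~a & MASK64, b) & MASK64   # NOR, BDIF, NOR: clipped sums

for name, value in (("c", c), ("d", d), ("e", e), ("clipped", clipped)):
    print(name, format(value, "016x"))
```

No carries or borrows cross byte boundaries in the \.{ADDU} and \.{SUBU} steps, because each byte of \.x is at most the corresponding byte of~\.a.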
 
Exercise: Implement a ``nybble difference'' instruction that operates
in a similar way on sixteen nybbles at a time.
 
Answer: {\tt\spaceskip=.5em minus .3em
AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m;
BDIF x,x,y; BDIF xx,xx,yy; OR ans,x,xx} where register \.m contains
the mask \Hex{0f0f0f0f0f0f0f0f}.
 
(The \.{ANDN} operation can be regarded as
a ``bit difference'' instruction that operates
in a similar way on 64 bits at a time.)
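The answer to the exercise can be checked against a direct nybble-by-nybble computation. The Python sketch below repeats a model of \.{BDIF} so that it stands alone; all names in it are hypothetical.

```python
M = 0x0F0F0F0F0F0F0F0F    # the mask held in register m


def bdif(y, z):
    """Bytewise saturating difference of two 64-bit patterns."""
    out = 0
    for shift in range(0, 64, 8):
        d = ((y >> shift) & 0xFF) - ((z >> shift) & 0xFF)
        out |= max(d, 0) << shift
    return out


def ndif(a, b):
    """Nybble difference built from AND, ANDN, BDIF, and OR."""
    x = bdif(a & M, b & M)        # low nybble of every byte
    xx = bdif(a & ~M, b & ~M)     # high nybble, left in place
    return x | xx


print(hex(ndif(0x12, 0x21)))      # low nybble 2-1, high nybble 1-2 clipped
```

The high nybbles can stay in their original positions because the low nybble of each byte is zero in both operands of the second \.{BDIF}.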
 
@ Three more pairs of bit-fiddling instructions round out the collection of exotics.
 
\bull\<SADD \$X,\$Y,\0 `sideways add'.\>
@.SADD@>
Each bit of register Y is logically anded with the complement of the
corresponding bit of
register~Z or of the constant~Z, and the number of 1~bits in the
result is placed in register~X\null.
In other words, register~X is set to the number of bit positions
in which register~Y has a~1 and the other operand has a~0;
in symbols, $\rX=\nu(\rY\setminus\rZ)$ or $\rX=\nu(\rY\setminus\zz)$.
When the second operand is zero this operation is sometimes called
``population counting,'' because it counts the number of 1s in register~Y\null.
@^population counting@>
@^counting ones@>
 
\bull\<MOR \$X,\$Y,\0 `multiple or'.\>
@.MOR@>
Suppose the 64 bits of register Y are indexed as
$$y_{00}y_{01}\ldots y_{07}y_{10}y_{11}\ldots y_{17}\ldots
y_{70}y_{71}\ldots y_{77};$$
in other words, $y_{ij}$ is the $j$th bit of the $i$th byte, if we
number the bits and bytes from 0 to 7 in big-endian fashion from left to right.
Let the bits of the other operand, \$Z or~Z, be indexed similarly:
$$z_{00}z_{01}\ldots z_{07}z_{10}z_{11}\ldots z_{17}\ldots
z_{70}z_{71}\ldots z_{77}.$$
The \.{MOR} operation replaces each bit $x_{ij}$ of register~X by the bit
$$ y_{0j}z_{i0}\lor y_{1j}z_{i1}\lor \cdots \lor y_{7j}z_{i7}.$$
Thus, for example, if register Z contains the constant
\Hex{0102040810204080},
\.{MOR} reverses the order of the bytes in register~Y, converting between
little-endian and big-endian addressing.
@^big-endian versus little-endian@>
@^little-endian versus big-endian@>
(The $i$th byte of~\$X depends on the bytes of~\$Y as specified by the
$i$th byte of~\$Z or~Z\null. If we regard
64-bit words as $8\times8$ Boolean matrices, with one byte per column,
this operation computes the
Boolean product $\rX=\rY\,\rZ$ or $\rX=\rY\,\zz$. Alternatively, if we
regard 64-bit words as $8\times8$ matrices with one byte per~{\it row},
\.{MOR} computes the Boolean product $\rX=\rZ\,\rY$ or $\rX=\zz\,\rY$
with operands in the opposite order. The immediate form
\<MOR \$X,\$Y,Z always sets the leading seven bytes of register~X
to zero; the other byte is set to the bitwise or of whatever bytes of
register~Y are specified by the immediate operand~Z\null.)
 
Exercise: Explain how to compute a mask \.m that is \Hex{ff} in byte
positions where \.a exceeds \.b, \Hex{00} in all other bytes.
Answer: \.{BDIF}~\.{x,a,b;} \.{MOR}~\.{m,minusone,x;}
here \.{minusone} is a register consisting of all 1s. (Moreover,
if we \.{AND} this result
with \Hex{8040201008040201}, then \.{MOR} with $\zz=255$, we get
a one-byte encoding~of~\.m.)
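The definition of \.{MOR} is compact but easy to misread, so a literal transcription may help. The Python sketch below follows the indexing of the text, with bits and bytes numbered from the left; it is written for clarity rather than speed, and its names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def bit(v, i, j):
    """Bit j of byte i, both numbered 0..7 from the left."""
    return (v >> (63 - (8 * i + j))) & 1


def mor(y, z):
    """x_ij is the OR over k of (y_kj AND z_ik)."""
    x = 0
    for i in range(8):
        for j in range(8):
            if any(bit(y, k, j) & bit(z, i, k) for k in range(8)):
                x |= 1 << (63 - (8 * i + j))
    return x


# Byte reversal with the constant mentioned in the text.
print(format(mor(0x0123456789ABCDEF, 0x0102040810204080), "016x"))

# The mask exercise: x plays the role of the BDIF result.
x = 0x0001000001000001
m = mor(MASK64, x)
print(format(m, "016x"))
print(hex(mor(m & 0x8040201008040201, 255)))   # one-byte encoding of m
```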
 
\bull\<MXOR \$X,\$Y,\0 `multiple exclusive-or'.\>
@.MXOR@>
This operation is like the Boolean multiplication just discussed, but
exclusive-or is used to combine the bits. Thus we obtain a matrix
product over the field of two elements instead of a Boolean matrix product.
This operation can be used to construct hash functions, among many other things.
(The hash functions aren't bad, but they are not ``universal'' in the
sense of exercise 6.4--72.)
@^matrices of bits@>
@^Boolean multiplication@>
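Replacing ``or'' by ``exclusive-or'' in the same transcription gives a model of \.{MXOR}. The following Python sketch, again with hypothetical names, shows that the constant \Hex{8040201008040201} acts as an identity matrix and that colliding bits cancel instead of accumulating.

```python
def bit(v, i, j):
    """Bit j of byte i, both numbered 0..7 from the left."""
    return (v >> (63 - (8 * i + j))) & 1


def mxor(y, z):
    """x_ij is the XOR over k of (y_kj AND z_ik)."""
    x = 0
    for i in range(8):
        for j in range(8):
            parity = 0
            for k in range(8):
                parity ^= bit(y, k, j) & bit(z, i, k)
            if parity:
                x |= 1 << (63 - (8 * i + j))
    return x


IDENTITY = 0x8040201008040201
print(format(mxor(0x0123456789ABCDEF, IDENTITY), "016x"))
print(hex(mxor(0x0F03000000000000, 0xFF)))   # XOR of all eight bytes
```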
 
@ Sixteen ``immediate wyde'' instructions are available for the common
case that a 16-bit constant is needed. In this case the Y~and~Z fields
of the instruction are regarded as a single 16-bit unsigned number~YZ\null.
@^immediate operands@>
 
\bull\<SETH \$X,YZ `set to high wyde';
@.SETH@>
\<SETMH \$X,YZ `set to medium high wyde';
@.SETMH@>
\<SETML \$X,YZ `set to medium low wyde';
@.SETML@>
\<SETL \$X,YZ `set to low wyde'.\>
@.SETL@>
The 16-bit unsigned number YZ is shifted left
by either 48 or 32 or 16 or 0 bits, respectively, and placed into register~X\null.
Thus, for example, \.{SETML} inserts
a given value into the second-least-significant wyde of register~X and
sets the other three wydes to zero.
 
\bull\<INCH \$X,YZ `increase by high wyde';
@.INCH@>
\<INCMH \$X,YZ `increase by medium high wyde';
@.INCMH@>
\<INCML \$X,YZ `increase by medium low wyde';
@.INCML@>
\<INCL \$X,YZ `increase by low wyde'.\>
@.INCL@>
The 16-bit unsigned number YZ is shifted left
by either 48 or 32 or 16 or 0 bits, respectively, and added to register~X,
ignoring overflow; the result is placed back into register~X\null.
 
If YZ is the hexadecimal constant \Hex{8000}, the command \<INCH \$X,YZ
complements the most significant bit of register~X\null. We will see
below that this can be used to negate a floating point number.
@^negation, floating point@>
 
\bull\<ORH \$X,YZ `bitwise or with high wyde';
@.ORH@>
\<ORMH \$X,YZ `bitwise or with medium high wyde';
@.ORMH@>
\<ORML \$X,YZ `bitwise or with medium low wyde';
@.ORML@>
\<ORL \$X,YZ `bitwise or with low wyde'.\>
@.ORL@>
The 16-bit unsigned number YZ is shifted left
by either 48 or 32 or 16 or 0 bits, respectively, and ored with register~X;
the result is placed back into register~X\null.
 
Notice that any desired 4-wyde constant \.{GH} \.{IJ} \.{KL} \.{MN}
can be inserted into a register with a sequence of four instructions
such as
$$\hbox{\tt SETH \$X,GH; INCMH \$X,IJ; INCML \$X,KL; INCL \$X,MN;}$$
any of these \.{INC} instructions could also be replaced by \.{OR}.
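
The four-instruction sequence can be simulated as follows. This is a minimal sketch, not part of the \MMIX\ software: the function names and the table \.{SHIFT} are hypothetical, and registers are modeled as Python integers reduced modulo $2^{64}$.

```python
MASK64 = (1 << 64) - 1
SHIFT = {"H": 48, "MH": 32, "ML": 16, "L": 0}  # shift amount for each wyde position

def set_w(pos, yz):
    """SETH/SETMH/SETML/SETL: place YZ in one wyde, zero the other three."""
    return (yz & 0xFFFF) << SHIFT[pos]

def inc_w(x, pos, yz):
    """INCH/INCMH/INCML/INCL: add the shifted YZ, ignoring overflow."""
    return (x + ((yz & 0xFFFF) << SHIFT[pos])) & MASK64

def or_w(x, pos, yz):
    """ORH/ORMH/ORML/ORL: bitwise or with the shifted YZ."""
    return x | ((yz & 0xFFFF) << SHIFT[pos])

def load_constant(gh, ij, kl, mn, combine=inc_w):
    """SETH x,GH; then three INC (or OR) instructions for the remaining wydes."""
    x = set_w("H", gh)
    x = combine(x, "MH", ij)
    x = combine(x, "ML", kl)
    x = combine(x, "L", mn)
    return x
```

Because \.{SETH} clears the three lower wydes, each later step adds into a wyde that is still zero, so no carries occur and \.{inc\_w} and \.{or\_w} give the same result here.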
 
\bull\<ANDNH \$X,YZ `bitwise and-not high wyde';
@.ANDNH@>
\<ANDNMH \$X,YZ `bitwise and-not medium high wyde';\>
@.ANDNMH@>
\<ANDNML \$X,YZ `bitwise and-not medium low wyde';
@.ANDNML@>
\<ANDNL \$X,YZ `bitwise and-not low wyde'.\>
@.ANDNL@>
The 16-bit unsigned number YZ is shifted left
by either 48 or 32 or 16 or 0 bits, respectively, then
complemented and anded with register~X;
the result is placed back into register~X\null.
 
If YZ is the hexadecimal
constant \Hex{8000}, the command \<ANDNH \$X,YZ forces the most significant
bit of register~X to be~0. This can be used to compute the absolute value of
a floating point number.
@^absolute value, floating point@>
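
Both sign-bit tricks can be tried on ordinary double-precision values. The following Python sketch assumes that the host's \.{float} type is the IEEE 64-bit format; the helper names are ours.

```python
import struct

MASK64 = (1 << 64) - 1

def float_to_bits(v):
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]

def bits_to_float(b):
    """Interpret an unsigned 64-bit integer as a double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def inch(x, yz):
    """INCH: add YZ shifted left 48 bits, ignoring overflow."""
    return (x + (yz << 48)) & MASK64

def andnh(x, yz):
    """ANDNH: clear the bits of x that are set in YZ shifted left 48 bits."""
    return x & ~(yz << 48) & MASK64

def fneg(v):
    """Negate by complementing the sign bit, as INCH with YZ=8000 does."""
    return bits_to_float(inch(float_to_bits(v), 0x8000))

def fabs_bits(v):
    """Absolute value by clearing the sign bit, as ANDNH with YZ=8000 does."""
    return bits_to_float(andnh(float_to_bits(v), 0x8000))
```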
 
@ \MMIX\ knows several ways to shift a register left or right
by any number of bits.
 
\bull\<SL \$X,\$Y,\0 `shift left'.\>
@.SL@>
The bits of register~Y are shifted left by \$Z or Z places, and 0s
are shifted in from the right; the result is placed in register~X\null.
Register~Y is treated as a signed number, but
the second operand is treated as an unsigned number.
The effect is the same as multiplication by
$2^{\mkern1mu\rZ}$ or by $2^\zz$; an integer overflow exception occurs if the
result is $\ge2^{63}$ or $<-2^{63}$.
In particular, if the second operand is 64 or~more, register~X will
become entirely zero, and integer overflow will be signaled unless
register~Y was zero.
 
\bull\<SLU \$X,\$Y,\0 `shift left unsigned'.\>
@.SLU@>
The bits of register~Y are shifted left by \$Z or Z places, and 0s
are shifted in from the right; the result is placed in register~X\null.
Both operands are treated as unsigned numbers. The \.{SLU} instructions
are equivalent to \.{SL}, except that no test for overflow is made.
 
\bull\<SR \$X,\$Y,\0 `shift right'.\>
@.SR@>
The bits of register~Y are shifted right by \$Z or Z places, and copies
of the leftmost bit (the sign bit) are shifted in from the left; the result is
placed in register~X\null.
Register~Y is treated as a signed number, but
the second operand is treated as an unsigned number.
The effect is the same as division by $2^{\mkern1mu\rZ}$ or by
$2^\zz$ and rounding down. In particular, if the second operand is 64 or~more,
register~X will become zero if \$Y was nonnegative, $-1$ if \$Y was negative.
 
\bull\<SRU \$X,\$Y,\0 `shift right unsigned'.\>
@.SRU@>
The bits of register~Y are shifted right by \$Z or Z places, and 0s
are shifted in from the left; the result is placed in register~X\null.
Both operands are treated as unsigned numbers.
The effect is the same as unsigned division of a 64-bit number
by $2^{\mkern1mu\rZ}$ or by~$2^\zz$;
if the second operand is 64 or~more, register~X will become entirely~zero.
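
The four shift instructions can be summarized by a short Python sketch. The function names are hypothetical; register contents are unsigned 64-bit integers, and \.{sl} returns the overflow condition as a second value instead of raising an exception.

```python
MASK64 = (1 << 64) - 1

def signed(u):
    """Interpret an unsigned 64-bit pattern as a two's complement number."""
    return u - (1 << 64) if u >> 63 else u

def sl(y, z):
    """SL: return (result, overflow); z is an unsigned shift count."""
    if z >= 64:
        return 0, y != 0
    exact = signed(y) << z  # same as multiplying by 2**z
    return exact & MASK64, not (-(1 << 63) <= exact < (1 << 63))

def slu(y, z):
    """SLU: like SL, but with no overflow test."""
    return (y << z) & MASK64 if z < 64 else 0

def sr(y, z):
    """SR: arithmetic shift, i.e. division by 2**z rounding down."""
    return (signed(y) >> min(z, 64)) & MASK64

def sru(y, z):
    """SRU: logical shift, i.e. unsigned division by 2**z."""
    return y >> min(z, 64)
```

Capping the count at 64 is only a convenience of the model; it reproduces the stated results for counts of 64 or more without building enormous integers.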
 
@* Comparisons.
Arithmetic and logical operations are nice,
but computer programs also need to compare numbers
and to change the course of a calculation depending on what they find.
\MMIX\ has four comparison instructions to facilitate such decision-making.
 
\bull\<CMP \$X,\$Y,\0 `compare'.\>
@.CMP@>
Register X is set to $-1$ if register Y is less than register Z or less than
the unsigned immediate value~Z, using the conventions of signed
arithmetic; it is set to 0 if register~Y is equal to register Z or equal to
the unsigned immediate value~Z; otherwise it is set to~1.
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$.
 
\bull\<CMPU \$X,\$Y,\0 `compare unsigned'.\>
@.CMPU@>
Register X is set to $-1$ if register Y is less than register Z or less than
the unsigned immediate value Z, using the conventions of unsigned
arithmetic; it is set to 0 if register Y is equal to register Z or equal to
the unsigned immediate value~Z; otherwise it is set to~1.
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$.
 
@ There also are 32 conditional instructions, which choose quickly between
two alternative courses of action.
 
\bull\<CSN \$X,\$Y,\0 `conditionally set if negative'.\>
@.CSN@>
If register Y is negative (namely if its most significant bit is~1),
register~X is set to the contents of register~Z or to the
unsigned immediate value~Z. Otherwise nothing happens.
 
\bull\<CSZ \$X,\$Y,\0 `conditionally set if zero'.
@.CSZ@>
\bul\<CSP \$X,\$Y,\0 `conditionally set if positive'.
@.CSP@>
\bul\<CSOD \$X,\$Y,\0 `conditionally set if odd'.
@.CSOD@>
\bul\<CSNN \$X,\$Y,\0 `conditionally set if nonnegative'.
@.CSNN@>
\bul\<CSNZ \$X,\$Y,\0 `conditionally set if nonzero'.
@.CSNZ@>
\bul\<CSNP \$X,\$Y,\0 `conditionally set if nonpositive'.
@.CSNP@>
\bul\<CSEV \$X,\$Y,\0 `conditionally set if even'.\>
@.CSEV@>
These instructions are entirely analogous to \.{CSN}, except
that register~X changes only if register~Y is respectively zero, positive,
odd, nonnegative, nonzero, nonpositive, or nonodd.
 
\bull\<ZSN \$X,\$Y,\0 `zero or set if negative'.\>
@.ZSN@>
If register Y is negative (namely if its most significant bit is~1),
register~X is set to the contents of register~Z or to the
unsigned immediate value~Z. Otherwise register~X is set to zero.
 
\bull\<ZSZ \$X,\$Y,\0 `zero or set if zero'.
@.ZSZ@>
\bul\<ZSP \$X,\$Y,\0 `zero or set if positive'.
@.ZSP@>
\bul\<ZSOD \$X,\$Y,\0 `zero or set if odd'.
@.ZSOD@>
\bul\<ZSNN \$X,\$Y,\0 `zero or set if nonnegative'.
@.ZSNN@>
\bul\<ZSNZ \$X,\$Y,\0 `zero or set if nonzero'.
@.ZSNZ@>
\bul\<ZSNP \$X,\$Y,\0 `zero or set if nonpositive'.
@.ZSNP@>
\bul\<ZSEV \$X,\$Y,\0 `zero or set if even'.\>
@.ZSEV@>
These instructions are entirely analogous to \.{ZSN}, except
that \$X is set to \$Z or~Z if register~Y is respectively zero, positive,
odd, nonnegative, nonzero, nonpositive, or even; otherwise
\$X is set to zero.
 
Notice that the two instructions \<CMPU r,s,0 and \<ZSNZ r,s,1 have
the same effect. So do the two instructions \<CSNP r,s,0 and \.{ZSP} \.{r,s,r}.
So do \<AND r,s,1 and \.{ZSOD}~\.{r,s,1}.
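
These three equivalences are easy to confirm with a model of the comparison and conditional instructions. The sketch below is ours (the names \.{cmp\_}, \.{cmpu}, \.{cs}, \.{zs}, and \.{CONDITIONS} are hypothetical); operands are unsigned 64-bit integers, and immediate operands are simply passed as small integers.

```python
MASK64 = (1 << 64) - 1

def signed(u):
    """Interpret an unsigned 64-bit pattern as a two's complement number."""
    return u - (1 << 64) if u >> 63 else u

def cmp_(y, z):
    """CMP: [y>z]-[y<z] using signed arithmetic, stored as a 64-bit pattern."""
    return ((signed(y) > signed(z)) - (signed(y) < signed(z))) & MASK64

def cmpu(y, z):
    """CMPU: [y>z]-[y<z] using unsigned arithmetic."""
    return ((y > z) - (y < z)) & MASK64

# The eight conditions shared by the CS and ZS families.
CONDITIONS = {
    "N": lambda y: signed(y) < 0,
    "Z": lambda y: y == 0,
    "P": lambda y: signed(y) > 0,
    "OD": lambda y: y & 1 == 1,
    "NN": lambda y: signed(y) >= 0,
    "NZ": lambda y: y != 0,
    "NP": lambda y: signed(y) <= 0,
    "EV": lambda y: y & 1 == 0,
}

def cs(cond, x, y, z):
    """Conditional set: the new value of register X (unchanged if the test fails)."""
    return z if CONDITIONS[cond](y) else x

def zs(cond, y, z):
    """Zero or set: z if the test on y succeeds, otherwise zero."""
    return z if CONDITIONS[cond](y) else 0
```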
 
@* Branches and jumps.
\MMIX\ ordinarily executes instructions in sequence, proceeding from
an instruction in tetrabyte M$_4[\lambda]$ to the instruction in
M$_4[\lambda+4]$. But there are several ways to interrupt
the normal flow of control, most of which use the Y and Z fields of
an instruction as a combined 16-bit YZ field.
For example, \<BNZ \$3,@@+4000 (branch if nonzero)
is typical: It means that control should skip ahead 1000 instructions
to the command that appears $4000$ bytes after the
\.{BNZ}, if register~3 is not equal to zero.
 
There are eight branch-forward instructions, corresponding to the
eight conditions in the \.{CS} and \.{ZS} commands that we discussed earlier.
And there are eight similar branch-backward instructions; for example,
\<BOD \$2,@@-4000 (branch if odd) takes control to the
instruction that appears $4000$ bytes {\it before\/}
this \.{BOD} command, if register~2 is odd. The numeric OP-code when branching
backward is one greater than the OP-code when branching
forward; the assembler takes care of this automatically, just as it takes
care of changing \.{ADD} from 32 to 33 when necessary.
 
Since branches are relative to the current location, the \MMIX\ assembler
treats branch instructions in a special way.
Suppose a programmer writes `\.{BNZ} \.{\$3,Case5}',
where \.{Case5} is the address of an instruction in location~$l$.
If this instruction appears in location~$\lambda$, the assembler first
computes the displacement $\delta=\lfloor(l-\lambda)/4\rfloor$. Then if
$\delta$ is nonnegative, the quantity~$\delta$
is placed in the YZ field of a \.{BNZ}
command, and it should be less than $2^{16}$; if $\delta$ is negative,
the quantity $2^{16}+\delta$ is placed in the YZ field of a \.{BNZ}
command with OP-code increased by~1,
and $\delta$ should not be less than $-2^{16}$.
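
The assembler's rule can be written out as a short Python sketch. The names \.{encode\_branch} and \.{decode\_branch} are hypothetical, and the first component of the encoded pair stands for the amount by which the OP-code is increased.

```python
def encode_branch(target, here):
    """Return (opcode_increment, yz) for a branch at location `here` to `target`."""
    delta = (target - here) // 4  # floor division, as in the displacement formula
    if 0 <= delta < (1 << 16):
        return 0, delta                 # branch forward
    if -(1 << 16) <= delta < 0:
        return 1, (1 << 16) + delta     # branch backward, OP-code increased by 1
    raise ValueError("branch target out of range")

def decode_branch(opcode_increment, yz, here):
    """Recover the target location from the encoded fields."""
    return here + 4 * (yz - (opcode_increment << 16))
```

With these definitions a branch to $\lambda+4000$ yields the pair $(0,1000)$, and a branch to $\lambda-4000$ yields $(1,64536)$.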
 
The symbol \.{@@} used in our examples of
\.{BNZ} and \.{BOD} above is interpreted by the
assembler as an abbreviation for ``the location of the current
instruction.'' In the following
notes we will define pairs of branch commands by writing, for example,
`\.{BNZ}~\.{\$X,@@+4*YZ[-262144]}'; this stands for a branch-forward
command that
branches to the current location plus four times~YZ, as well as for
a branch-backward command that branches to the current
location plus four times $(\rm YZ-65536)$.
 
\bull\<BN \$X,@@+4*YZ[-262144] `branch if negative'.
@.BN@>
\bul\<BZ \$X,@@+4*YZ[-262144] `branch if zero'.
@.BZ@>
\bul\<BP \$X,@@+4*YZ[-262144] `branch if positive'.
@.BP@>
\bul\<BOD \$X,@@+4*YZ[-262144] `branch if odd'.
@.BOD@>
\bul\<BNN \$X,@@+4*YZ[-262144] `branch if nonnegative'.
@.BNN@>
\bul\<BNZ \$X,@@+4*YZ[-262144] `branch if nonzero'.
@.BNZ@>
\bul\<BNP \$X,@@+4*YZ[-262144] `branch if nonpositive'.
@.BNP@>
\bul\<BEV \$X,@@+4*YZ[-262144] `branch if even'.\>
@.BEV@>
If register X is respectively negative, zero, positive, odd, nonnegative,
nonzero, nonpositive, or even, and if this instruction appears in memory
location $\lambda$, the next instruction is taken from memory location
$\lambda+4{\rm YZ}$ (branching forward) or $\lambda+4({\rm YZ}-2^{16})$
(branching backward). Thus one can go from location~$\lambda$ to any location
between $\lambda-262{,}144$ and $\lambda+262{,}140$, inclusive.
 
\smallskip
Sixteen additional branch instructions called {\it probable branches\/}
are also provided. They have exactly the same meaning as ordinary
branch instructions; for example, \<PBOD \$2,@@-4000 and \<BOD \$2,@@-4000 both
go backward 4000 bytes if register~2 is odd. But they differ in running time:
On some implementations of\/ \MMIX,
a branch instruction takes longer when the branch is taken, while a
probable branch takes longer when the branch is {\it not\/} taken.
Thus programmers should use a \.B instruction when they think branching is
relatively unlikely, but they should use \.{PB} when they expect
branching to occur more often than not. Here is a list of the
probable branch commands, for completeness:
 
\bull\<PBN \$X,@@+4*YZ[-262144] `probable branch if negative'.
@.PBN@>
\bul\<PBZ \$X,@@+4*YZ[-262144] `probable branch if zero'.
@.PBZ@>
\bul\<PBP \$X,@@+4*YZ[-262144] `probable branch if positive'.
@.PBP@>
\bul\<PBOD \$X,@@+4*YZ[-262144] `probable branch if odd'.
@.PBOD@>
\bul\<PBNN \$X,@@+4*YZ[-262144] `probable branch if nonnegative'.
@.PBNN@>
\bul\<PBNZ \$X,@@+4*YZ[-262144] `probable branch if nonzero'.
@.PBNZ@>
\bul\<PBNP \$X,@@+4*YZ[-262144] `probable branch if nonpositive'.
@.PBNP@>
\bul\<PBEV \$X,@@+4*YZ[-262144] `probable branch if even'.
@.PBEV@>
 
@ Locations that are relative to the current instruction can be
transformed into absolute locations with \.{GETA} commands.
 
\bull\<GETA \$X,@@+4*YZ[-262144] `get address'.\>
@.GETA@>
The value $\lambda+4{\rm YZ}$ or $\lambda+4({\rm YZ}-2^{16})$ is placed in
register~X\null. (The assembly language conventions of branch instructions
apply; for example, we can write `\.{GETA} \.{\$X,Addr}'.)
 
@ \MMIX\ also has unconditional jump instructions, which change the
location of the next instruction no matter what.
 
\bull\<JMP @@+4*XYZ[-67108864] `jump'.\>
@.JMP@>
A \.{JMP} command treats bytes X, Y, and Z as an unsigned 24-bit
integer XYZ. It allows a program to transfer control from location $\lambda$ to any
location between $\lambda-67\?{,}108{,}864$ and $\lambda+67\?{,}108{,}860$
inclusive, using relative addressing as in the \.{B} and \.{PB} commands.
 
\bull\<GO \$X,\$Y,\0 `go to location'.\>
@.GO@>
\MMIX\ takes its next instruction from location $\rY+\rZ$ or $\rY+\zz$,
and continues from there. Register~X is set equal to $\lambda+4$, the
location of the instruction that would ordinarily have been executed next.
(\.{GO} is similar to a jump, but it is not relative
to the current location. Since \.{GO} has the same format as a load or store
instruction, a loading routine can treat program labels with the same mechanism
that is used to treat references to data.)
 
An old-fashioned type of subroutine linkage can be implemented by saying
either `\.{GO}~\.{r,subloc,0}' or `\.{GETA}~\.{r,@@+8;}
\.{JMP}~\.{Sub}' to~enter a subroutine,
then `\.{GO}~\.{r,r,0}' to return.
But subroutines are normally entered with the instructions
\.{PUSHJ} or \.{PUSHGO}, described below.
 
The two least significant bits of the address
in a \.{GO} command are essentially ignored. They will, however, appear in
the value of~$\lambda$ returned by \.{GETA} instructions, and in the
return-jump register~rJ after \.{PUSHJ} or \.{PUSHGO} instructions are
performed, and in
@^rJ@>
the where-interrupted register at the time of an interrupt. Therefore they
could be used to send some kind of signal to a subroutine or (less likely)
to an interrupt handler.
 
@* Multiplication and division.
Now for some instructions that make \MMIX\ work harder.
 
\bull\<MUL \$X,\$Y,\0 `multiply'.\>
@.MUL@>
The signed product of the number in register Y by either the
number in register~Z or the unsigned byte~Z
replaces the contents of register~X\null. An
integer overflow exception can occur, as with \.{ADD} or \.{SUB}, if the
result is less than $-2^{63}$ or greater than $2^{63}-1$. (Immediate
multiplication by powers of~2 can be done more rapidly with the \.{SL}
instruction.)
 
\bull\<MULU \$X,\$Y,\0 `multiply unsigned'.\>
@.MULU@>
The lower 64 bits of the
unsigned 128-bit product of register~Y and either
register~Z or~Z are placed in register~X, and the upper 64 bits are
placed in the special {\it himult register\/}~rH\null. (Immediate multiplication
@^rH@>
by powers of~2 can be done more rapidly with the \.{SLU} instruction,
if the upper half is not needed.
Furthermore, an instruction like \<4ADDU \$X,\$Y,\$Y is faster than
\.{MULU} \.{\$X,\$Y,5}.)
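
A Python model makes the roles of the two result halves explicit. This is a sketch with hypothetical names; \.{mul} reports overflow as a flag, \.{mulu} returns the pair (register~X, rH), and \.{addu4} assumes that \.{4ADDU} computes four times its first operand plus its second, modulo $2^{64}$.

```python
MASK64 = (1 << 64) - 1

def signed(u):
    """Interpret an unsigned 64-bit pattern as a two's complement number."""
    return u - (1 << 64) if u >> 63 else u

def mul(y, z):
    """MUL: return (result, overflow) for the signed product."""
    exact = signed(y) * signed(z)
    return exact & MASK64, not (-(1 << 63) <= exact < (1 << 63))

def mulu(y, z):
    """MULU: return (low 64 bits, high 64 bits) of the unsigned 128-bit product."""
    product = y * z
    return product & MASK64, product >> 64

def addu4(y, z):
    """Assumed model of 4ADDU: (4*y + z) mod 2**64."""
    return (4 * y + z) & MASK64
```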
 
\bull\<DIV \$X,\$Y,\0 `divide'.\>
@.DIV@>
The signed quotient of the number in register Y divided
by either the number in register~Z or the unsigned byte~Z
replaces the contents of register~X, and the signed remainder
is placed in the special {\it remainder register\/}~rR\null.
@^rR@>
An integer divide check exception occurs if the divisor is zero; in that
case \$X is set to zero and rR is set to~\$Y\null.
@^divide check exception@>
@^overflow@>
An integer overflow exception occurs if the number $-2^{63}$ is divided
by~$-1$; otherwise integer overflow is impossible. The quotient of
$y$ divided by~$z$ is defined to be $\lfloor y/z\rfloor$, and the remainder
is defined to be $y-\lfloor y/z\rfloor z$ (also written $y\bmod z$).
Thus, the remainder is either
zero or has the sign of the divisor. Dividing by $z=2^t$ gives
exactly the same quotient as shifting right~$t$ via the \.{SR} command, and
exactly the same remainder as anding with $z-1$ via the \.{AND} command.
Division of a positive 63-bit number by a positive constant can be accomplished
more quickly by computing the upper half of a suitable unsigned product and
shifting it right appropriately.
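
The following Python sketch illustrates these rules. The names are ours, and two details are assumptions rather than statements of the text: \.{div} reports exceptions as strings, and in the overflow case it returns the quotient reduced modulo $2^{64}$. The function \.{div10} shows one instance of the last remark, using the multiplier $\lceil2^{67}/10\rceil$, which is a standard choice for the divisor~10 and not something prescribed here.

```python
MASK64 = (1 << 64) - 1

def signed(u):
    """Interpret an unsigned 64-bit pattern as a two's complement number."""
    return u - (1 << 64) if u >> 63 else u

def div(y, z):
    """DIV: return (quotient, remainder, exception) with floor semantics."""
    sy, sz = signed(y), signed(z)
    if sz == 0:
        return 0, y, "divide check"      # X is zero, rR is the dividend
    q, r = sy // sz, sy % sz             # Python's // and % already round down
    exception = "overflow" if q == (1 << 63) else None
    return q & MASK64, r & MASK64, exception

def div10(y):
    """Divide an unsigned 64-bit number by 10 without a division instruction."""
    upper_half = (y * 0xCCCCCCCCCCCCCCCD) >> 64  # what MULU would leave in rH
    return upper_half >> 3                       # SRU by 3
```

Since Python's integer division also rounds toward $-\infty$, the remainder returned by \.{div} automatically has the sign of the divisor.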
 
\bull\<DIVU \$X,\$Y,\0 `divide unsigned'.\>
@.DIVU@>
The unsigned 128-bit number obtained by prefixing the special {\it dividend
register}~rD to the contents of register~Y is divided either by the
@^rD@>
unsigned number in register~Z or by the unsigned byte~Z, and the quotient is placed
in register~X\null. The remainder is placed in the remainder
register~rR\null.
However, if rD is greater than or equal to
the divisor (and in particular if the divisor is zero),
then \$X is set to~rD and rR is set to~\$Y\null.
(Unsigned arithmetic never signals an exceptional condition, even
when dividing by zero.)
If rD is zero, unsigned division by $z=2^t$ gives exactly the same quotient as
shifting right~$t$ via the \.{SRU} command, and
exactly the same remainder as anding with $z-1$ via the \.{AND} command.
Section 4.3.1 of {\sl Seminumerical Algorithms\/}
explains how to use unsigned division to obtain the quotient and remainder
of extremely large numbers.
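
In Python the behavior of \.{DIVU} can be sketched as follows; the function name is hypothetical, and the result is the pair (register~X, rR).

```python
def divu(rD, y, z):
    """DIVU: divide the 128-bit number (rD, y) by z; return (quotient, remainder)."""
    if rD >= z:
        # The quotient would not fit in 64 bits (this includes z == 0).
        return rD, y
    dividend = (rD << 64) | y
    return dividend // z, dividend % z
```

The test on rD guarantees that the quotient is less than $2^{64}$ whenever the division is actually carried out.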
 
@* Floating point computations.
Floating point arithmetic conforming to the famous IEEE/ANSI
Standard~754 is provided for arbitrary 64-bit numbers. The IEEE standard
refers to such numbers as ``double format'' quantities, but \MMIX\
calls them simply floating point numbers because 64-bit quantities are
the~norm.
@^floating point arithmetic@>
@^IEEE/ANSI Standard 754@>
@^subnormal numbers@>
@^normal numbers@>
@^NaN@>
@^overflow@>
@^underflow@>
@^invalid exception@>
@^inexact exception@>
@^signaling NaN@>
@^quiet NaN@>
@^infinity@>
@^rounding modes@>
 
A positive floating point number has 53 bits of precision and can range
from approximately $10^{-308}$ to $10^{308}$. ``Subnormal numbers''
between $10^{-324}$ and $10^{-308}$ can also be represented, but with fewer
bits of precision.
Floating point numbers can be
infinite, and they satisfy such identities as $1.0/\infty=+0.0$, $-2.8\times\infty
=-\infty$. Floating
point quantities can also be ``Not-a-Numbers'' or NaNs, which are
further classified into signaling NaNs and quiet NaNs.
 
Five kinds
of exceptions can occur during floating point computations, and they
each have code letters: Floating
overflow~(O) or underflow~(U); floating divide by zero~(Z);
floating inexact~(X); and floating invalid~(I).
For example, the multiplication of sufficiently small integers causes
no exceptions, and the division of 91.0 by~13.0 is also exception-free,
but the division 1.0/3.0 is inexact. The multiplication of extremely
large or extremely small floating point numbers is inexact and it
also causes overflow or underflow.
Invalid results occur when taking the square root of a negative
number; mathematicians can remember the I exception
by relating it to the square root of $-1.0$.
Invalid results also occur when trying to convert infinity
or a quiet NaN to a fixed-point
integer, or when any signaling NaN is encountered, or when
mathematically undefined operations like $\infty-\infty$ or $0/0$ are
requested.
(Programmers can be sure that they have not erroneously
used uninitialized floating point data if they initialize all their variables
to signaling NaN values.)
 
Four different rounding modes for inexact results are available:
round to nearest (and to even in case of ties);
round off (toward zero); round up (toward $+\infty$);
or round down (toward $-\infty$). \MMIX\
has a special {\it arithmetic status register\/}~rA that specifies the
@^rA@>
current rounding mode and the user's current preferences for exception
handling.
 
\def\NaN{{\rm NaN}}
IEEE standard arithmetic provides an excellent foundation for scientific
calculations, and it will be thoroughly explained in the fourth
edition of {\sl Seminumerical Algorithms}, Section 4.2.
For our present purposes, we need not study all the details; but
we do need to specify \MMIX's behavior with respect to several
things that are not completely defined by the standard.
For example, the IEEE standard does not fully define the
result of operations with NaNs.
 
When an octabyte represents a floating point number
in \MMIX's registers, the leftmost bit is the sign; then come 11 bits for an
exponent~$e$; and the remaining 52 bits are the fraction part~$f$.
We regard $e$ as an integer between 0 and $(11111111111)_2=2047$, and we regard $f$ as
a fraction between 0 and $(.111\ldots1)_2=1-2^{-52}$.
Each octabyte has the following
significance:
$$\vbox{\halign{\hfil$\pm#$,\quad if &#\hfil\cr
0.0&$e=f=0$ (zero);\cr
2^{-1022}f&$e=0$ and $f>0$ (subnormal);\cr
2^{\mkern1mu e-1023}(1+f)&$0<e<2047$ (normal);\cr
\infty&$e=2047$ and $f=0$ (infinite);\cr
\NaN(f)&$e=2047$ and $0<f<1/2$ (signaling NaN);\cr
\NaN(f)&$e=2047$ and $f\ge1/2$ (quiet NaN).\cr}}$$
Notice that $+0.0$ is distinguished from $-0.0$; this fact is
important for interval arithmetic.
@^minus zero@>
 
Exercise: What 64 bits represent the floating point number 1.0?
Answer: We want $e=1023$ and $f=0$, so the answer is \Hex{3ff0000000000000}.
 
Exercise: What is the largest finite floating point number?
Answer: We want $e=2046$ and $f=1-2^{-52}$, so the answer is
$\Hex{7fefffffffffffff}=2^{1024}-2^{971}$.
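
Both answers can be checked by decoding octabytes according to the table above. The Python function \.{decode} below is a sketch of ours: it returns a classification together with the exact value as a \.{Fraction} for finite numbers, the sign for infinities, and the fraction part~$f$ for NaNs. The sign of zero is not preserved by this simple model.

```python
from fractions import Fraction

def decode(bits):
    """Classify a 64-bit pattern and return (kind, value) as described above."""
    sign = -1 if bits >> 63 else 1
    e = (bits >> 52) & 0x7FF                          # 11-bit exponent
    f = Fraction(bits & ((1 << 52) - 1), 1 << 52)     # 52-bit fraction part
    if e == 0:
        if f == 0:
            return "zero", Fraction(0)
        return "subnormal", sign * Fraction(2) ** -1022 * f
    if e < 2047:
        return "normal", sign * Fraction(2) ** (e - 1023) * (1 + f)
    if f == 0:
        return "infinite", sign
    return ("signaling NaN" if f < Fraction(1, 2) else "quiet NaN"), f
```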
 
@ The seven IEEE floating point arithmetic operations (addition, subtraction,
multiplication, division, remainder, square root, and nearest-integer)
all share common features, called the {\it standard floating point
conventions\/} in the discussion below:
@^standard floating point conventions@>
@^overflow@>
@^underflow@>
The operation is performed on floating point numbers found in two registers,
\$Y and~\$Z, except that square root and integerization
involve only one operand.
If neither input operand is a NaN, we first determine the exact result,
then round it using the current rounding mode
found in special register~rA\null. Infinite results are exact and need
no rounding. A floating overflow exception occurs if the rounded
result is finite but needs an exponent greater than 2046.
A floating underflow exception occurs if the rounded result needs an exponent
less than~1 and either (i)~the unrounded result cannot be represented exactly
@^rA@>
as a subnormal number or (ii)~the ``floating underflow trip'' is enabled in~rA\null.
(Trips are discussed below.)
NaNs are treated specially as follows: If either \$Y or~\$Z is a signaling NaN,
an invalid exception occurs and the NaN is quieted by adding 1/2 to its
fraction part. Then if \$Z is a quiet NaN, the result is set
to \$Z; otherwise if \$Y is a quiet NaN, the result is set to \$Y\null.
(Registers \$Y and \$Z do not actually change.)
 
\bull\<FADD \$X,\$Y,\$Z `floating add'.\>
@.FADD@>
The floating point sum $\rY+\rZ$ is computed by the
standard floating point conventions just described,
and placed in register~X\null.
An invalid exception occurs if the sum is $(+\infty)+(-\infty)$ or
$(-\infty)+(+\infty)$; in that case the result is $\NaN(1/2)$ with the sign
of~\$Z\null. If the sum is exactly zero and the current mode is
not rounding-down, the result is $+0.0$ except that $(-0.0)+(-0.0)=-0.0$. If the
@^minus zero@>
sum is exactly zero and the current mode is rounding-down, the result
is $-0.0$ except that $(+0.0)+(+0.0)=+0.0$.
These rules for signed zeros turn out to be useful when doing interval
arithmetic: If the lower bound of an interval is $+0.0$ or if the
upper bound is $-0.0$, the interval does not contain zero, so the
numbers in the interval have a known sign.
 
Floating point underflow cannot occur unless the U-trip has been enabled,
because any underflowing result of floating point
addition can be represented exactly as a subnormal number.
 
Silly but instructive exercise: Find all pairs of numbers $(\rY,\rZ)$ such
that the commands \<FADD \$X,\$Y,\$Z and \<ADDU \$X,\$Y,\$Z both produce
the same result in~\$X
(although \.{FADD} may cause floating exceptions).
Answer: Of course \$Y or \$Z could be zero, if the other one is not a signaling
NaN. Or one could be signaling and the other \Hex{0008000000000000}.
Other possibilities
occur when they are both positive and less than
\Hex{0010000000000001}; or when one operand is \Hex{0000000000000001}
and the other is an odd number between \Hex{0020000000000001} and
\Hex{002ffffffffffffd} inclusive (rounding to nearest).
And still more surprising possibilities exist, such as
\Hex{7f6001b4c67bc809}\thinspace+\thinspace\Hex{ff5ffb6a4534a3f7}.
All eight families of solutions will be revealed some day in the fourth edition
of {\sl Seminumerical Algorithms}.
 
\bull\<FSUB \$X,\$Y,\$Z `floating subtract'.\>
@.FSUB@>
This instruction is equivalent to \.{FADD}, but with the sign of~\$Z negated
unless \$Z is a~NaN.
 
\bull\<FMUL \$X,\$Y,\$Z `floating multiply'.\>
@.FMUL@>
The floating point product $\rY\times\rZ$ is computed by
the standard floating point conventions, and placed in register~X\null.
An invalid exception occurs if
the product is $(\pm0.0)\times(\pm\infty)$ or $(\pm\infty)\times(\pm0.0)$;
in that case the result is $\pm\NaN(1/2)$. No exception occurs for the
product $(\pm\infty)\times(\pm\infty)$. If neither \$Y nor~\$Z is a NaN,
the sign of the result is the product of the signs of \$Y and~\$Z\null.
 
\bull\<FDIV \$X,\$Y,\$Z `floating divide'.\>
@.FDIV@>
The floating point quotient $\rY\?/\rZ$ is computed by
the standard floating point conventions, and placed in \$X\null.
@^standard floating point conventions@>
A floating divide by zero exception occurs if the
quotient is $(\hbox{normal or subnormal})/(\pm0.0)$. An invalid exception occurs if
the quotient is $(\pm0.0)/(\pm0.0)$ or $(\pm\infty)/(\pm\infty)$; in that case the
result is $\pm\NaN(1/2)$. No exception occurs for the
quotient $(\pm\infty)/(\pm0.0)$. If neither \$Y nor~\$Z is a NaN,
the sign of the result is the product of the signs of \$Y and~\$Z\null.
 
If a floating point number in register X is known to have an exponent between
2 and~2046, the instruction \<INCH \$X,\char`\#fff0 will divide it by~2.0.
 
\bull\<FREM \$X,\$Y,\$Z `floating remainder'.\>
@.FREM@>
The floating point remainder $\rY\,{\rm rem}\,\rZ$ is computed by
the standard floating point conventions, and placed in register~X\null.
(The IEEE standard defines the remainder to be $\rY-n\times\rZ$,
where $n$ is the nearest integer to $\rY/\rZ$, and $n$ is an even
integer in case of ties. This is not the same as the remainder
$\rY\bmod\rZ$ computed by \.{DIV} or \.{DIVU}.)
A zero remainder has the sign of~\$Y\null.
An invalid exception occurs if \$Y is infinite and/or \$Z is zero; in
that case the result is $\NaN(1/2)$ with the sign of~\$Y\null.
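
The difference between the IEEE remainder and the $\bmod$ operation is easy to see numerically. The Python sketch below computes the IEEE remainder exactly with rational arithmetic; the name \.{frem} is ours, the function covers only the exception-free cases, and it raises \.{ValueError} where \MMIX\ would signal an invalid exception or propagate a NaN.

```python
import math
from fractions import Fraction

def frem(y, z):
    """IEEE remainder y - n*z, where n is the integer nearest y/z (ties to even)."""
    if math.isnan(y) or math.isnan(z) or math.isinf(y) or z == 0:
        raise ValueError("NaN operand or invalid remainder; not modeled here")
    if math.isinf(z):
        return y                              # n is zero, so the remainder is y
    n = round(Fraction(y) / Fraction(z))      # Fraction rounds ties to even
    r = Fraction(y) - n * Fraction(z)
    # A zero remainder takes the sign of y; other results are exact doubles.
    return math.copysign(0.0, y) if r == 0 else float(r)
```

For example, \.{frem(5.0, 3.0)} is $-1.0$ because the nearest integer to $5/3$ is~2, whereas $5\bmod3=2$.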
 
\bull\<FSQRT \$X,\$Z `floating square root'.\>
@.FSQRT@>
The floating point square root $\sqrt\rZ$ is computed by the
standard floating point conventions, and placed in register~X\null. An
invalid exception occurs if \$Z is a negative number (either infinite, normal,
or subnormal); in that case the result is $-\NaN(1/2)$. No exception occurs
when taking the square root of $-0.0$ or $+\infty$. In all cases the sign of
the result is the sign of~\$Z\null.
 
\bull\<FINT \$X,\$Z `floating integer'.\>
@.FINT@>
The floating point number in register~Z is rounded (if
necessary) to a floating point integer, using the current
rounding mode, and placed in register~X\null. Infinite values and quiet NaNs
are not changed; signaling NaNs are treated as in the standard conventions.
Floating point overflow and underflow exceptions cannot occur.
 
The Y field of \.{FSQRT} and \.{FINT} can be used to specify a
special rounding mode, as explained below.

@ Besides doing arithmetic, we need to compare floating point numbers
with each other, taking proper account of NaNs and the fact that $-0.0$
should be considered equal to $+0.0$. The following instructions are
analogous to the comparison operators \.{CMP} and \.{CMPU} that we
have used for integers.
@^minus zero@>
 
\bull\<FCMP \$X,\$Y,\$Z `floating compare'.\>
@.FCMP@>
Register X is set to $-1$ if $\rY<\rZ$ according to the conventions of
floating point arithmetic, or to~1 if $\rY>\rZ$ according to those
conventions. Otherwise it is set to~0. An invalid exception
occurs if either \$Y or \$Z is a NaN; in such cases the result is zero.
 
\bull\<FEQL \$X,\$Y,\$Z `floating equal to'.\>
@.FEQL@>
Register X is set to 1 if $\rY=\rZ$ according to the conventions of
floating point arithmetic. Otherwise it is set to~0. The result is zero if
either \$Y or \$Z is a NaN, even if a NaN is being compared with itself.
However, no invalid exception occurs, not even when \$Y or \$Z is a signaling
NaN\null. (Perhaps \MMIX\ differs slightly from the IEEE standard in this
regard, but programmers sometimes need to look at signaling NaNs without
encountering side effects.
Programmers who insist on raising
an invalid exception whenever a signaling NaN is compared for floating equality
should issue the instructions \<FSUB \$X,\$Y,\$Y; \<FSUB \$X,\$Z,\$Z just before saying
\.{FEQL}~\.{\$X,\$Y,\$Z}.)
 
Suppose $w$, $x$, $y$, and $z$ are unsigned 64-bit integers with
$w<x<2^{63}\le y<z$. Thus, the leftmost bits of $w$ and~$x$ are~0,
while the leftmost bits of $y$ and~$z$ are~1.
Then we have $w<x<y<z$ when these numbers are considered
as unsigned integers, but $y<z<w<x$ when they are considered as signed
integers, because $y$ and~$z$ are negative. Furthermore, we have
$z<y\le w<x$ when these same 64-bit quantities are considered to be
floating point numbers, assuming that no NaNs are present,
because the leftmost bit of a floating point
number represents its sign and the remaining bits represent its magnitude.
The case $y=w$ occurs in floating point comparison
if and only if $y$ is the representation of $-0.0$
and $w$ is the representation of $+0.0$.
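
A concrete instance may help. In the Python sketch below, which assumes that the host's \.{float} is the IEEE 64-bit format and uses helper names of our own, the four words are the bit patterns of $1.0$, $2.0$, $-1.0$, and $-2.0$.

```python
import struct

def bits(v):
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]

def as_float(b):
    """Interpret an unsigned 64-bit integer as a double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def signed(u):
    """Interpret an unsigned 64-bit pattern as a two's complement number."""
    return u - (1 << 64) if u >> 63 else u

# Four sample words satisfying w < x < 2**63 <= y < z as unsigned integers.
w, x, y, z = bits(1.0), bits(2.0), bits(-1.0), bits(-2.0)
```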
 
\bull\<FUN \$X,\$Y,\$Z `floating unordered'.\>
@.FUN@>
Register X is set to 1 if \$Y and \$Z are unordered according to the conventions
of floating point arithmetic (namely, if either one is a NaN); otherwise
register~X is set to~0. No invalid exception occurs, not even when \$Y or \$Z is
a signaling NaN\null.
 
\smallskip
The IEEE standard discusses 26 different possible
relations on floating point numbers;
\MMIX\ implements 14 of them with single instructions, followed by
a branch (or by a \.{ZS} to make a ``pure'' 0~or~1 result); all 26
can be evaluated with a sequence of at most four \MMIX\ commands
and a subsequent branch. The
hardest case to handle is `?$>=$' (unordered or greater or equal,
to be computed without exceptions), for which the following
sequence makes $\rX\ge0$ if and only if $\rY\mathrel?>=\rZ$:
$$\vbox{\halign{&\tt#\hfil\ \cr
&FUN &\$255,\$Y,\$Z\cr
&BP &\$255,1F&\% skip ahead if unordered\cr
&FCMP&\$X,\$Y,\$Z&\% \$X=[\$Y>\$Z]-[\$Y<\$Z]; no exceptions will arise\cr
1H&CSNZ &\$X,\$255,1&\% \$X=1 if unordered\cr
}}$$
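
The four-instruction sequence can be traced in Python. This is only an illustration with hypothetical function names; Python floats stand in for the register contents, and exceptions are not modeled.

```python
import math

def fun(y, z):
    """FUN: 1 if the operands are unordered, otherwise 0."""
    return int(math.isnan(y) or math.isnan(z))

def fcmp(y, z):
    """FCMP on ordered operands: [y>z]-[y<z]."""
    return (y > z) - (y < z)

def unordered_or_ge(y, z):
    """Return a value that is nonnegative exactly when y ?>= z."""
    t = fun(y, z)            # FUN  $255,$Y,$Z
    x = 0                    # stands for the previous contents of $X
    if not t > 0:            # BP   $255,1F
        x = fcmp(y, z)       # FCMP $X,$Y,$Z
    if t != 0:               # 1H CSNZ $X,$255,1
        x = 1
    return x
```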
 
@ Exercise: Suppose \MMIX\ had no \.{FINT} instruction. Explain how to
@.FINT@>
obtain the equivalent of \<FINT \$X,\$Z using other instructions. Your
program should do the proper thing with respect to NaNs and exceptions.
(For example, it should cause an invalid exception if and only if \$Z is
a signaling NaN; it should cause an inexact exception only if \$Z needs
to be rounded to another value.)
@^emulation@>
 
Answer: (The assembler prefixes hexadecimal constants by \.\#.)
$$\vbox{\halign{&\tt#\hfil\ \cr
&SETH &\$0,\char`\#4330&\% \$0=2\char`\^52\cr
&SET &\$1,\$Z&\% \$1=\$Z\cr
&ANDNH &\$1,\char`\#8000&\% \$1=abs(\$Z)\cr
&ANDN &\$2,\$Z,\$1&\% \$2=signbit(\$Z)\cr
&FUN &\$3,\$Z,\$Z&\% \$3=[\$Z is a NaN]\cr
&BNZ &\$3,1F&\% skip ahead if \$Z is a NaN\cr
&FCMP &\$3,\$1,\$0&\% \$3=[abs(\$Z)>2\char`\^52]-[abs(\$Z)<2\char`\^52]\cr
&CSNN &\$0,\$3,0&\% set \$0=0 if \$3>=0\cr
&OR &\$0,\$2,\$0&\% attach sign of \$Z to \$0\cr
1H\ &FADD &\$1,\$Z,\$0&\% \$1=\$Z+\$0\cr
&FSUB &\$X,\$1,\$0&\% \$X=\$1-\$0\cr}}$$
This program handles most cases of interest by adding and subtracting
$\pm2^{52}$ using floating point arithmetic.
It would be incorrect to do this in all cases;
for example, such addition/subtraction might fail to give the correct
answer when \$Z is a small negative
quantity (if rounding toward zero), or when \$Z is a number like
$2^{105}+2^{53}$ (if rounding to nearest).
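
To see the main idea at work, here is a small worked case (an illustration
added here, assuming that the current rounding mode is round to nearest):
Take $\rZ\approx6.7$. Floating point numbers between $2^{52}$ and $2^{53}$
are spaced exactly one unit apart, so the \.{FADD} must round its sum
to an integer:
$$\eqalign{6.7+2^{52}&\quad\hbox{rounds to}\quad2^{52}+7;\cr
(2^{52}+7)-2^{52}&\quad\hbox{is exactly}\quad7.\cr}$$
The \.{OR} instruction attaches the sign of \$Z to $2^{52}$, and one can see
why that matters by noting that the spacing just below $2^{52}$ is
only~$1/2$. If we were to add $+2^{52}$ to $\rZ=-0.25$ while rounding toward
zero, the sum $2^{52}-0.25$ would round to $2^{52}-0.5$, and the final
subtraction would yield $-0.5$ instead of an integer.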
 
@ \MMIX\ goes beyond the IEEE standard to define additional relations
between floating point numbers, as suggested by the theory in
Section 4.2.2 of {\sl Seminumerical Algorithms}. Given a nonnegative
number~$\epsilon$, each normal floating point number $u=(f,e)$ has
a {\it neighborhood\/}
$$N_\epsilon(u)=\{x\,\mid\,\vert x-u\vert\le 2^{e-1022}\epsilon\};$$
we also define $N_\epsilon(0)=\{0\}$,
$N_\epsilon(u)=\{x\mid\vert x-u\vert\le2^{-1021}\epsilon\}$ if $u$ is
subnormal; $N_\epsilon(\pm\infty)=\{\pm\infty\}$ if $\epsilon<1$,
$N_\epsilon(\pm\infty)=\{$everything except $\mp\infty\}$ if $1\le\epsilon<2$,
$N_\epsilon(\pm\infty)=\{$everything$\}$ if $\epsilon\ge2$. Then we write
$$\vbox{\halign{$u#v\ (\epsilon)$, &#\hfil\cr
\prec&if $u<N_\epsilon(v)$ and $N_\epsilon(u)<v$;\cr
\sim&if $u\in N_\epsilon(v)$ or $v\in N_\epsilon(u)$;\cr
\approx&if $u\in N_\epsilon(v)$ and $v\in N_\epsilon(u)$;\cr
\succ&if $u>N_\epsilon(v)$ and $N_\epsilon(u)>v$.\cr}}$$
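
As a small illustration (added here; the particular numbers are chosen only
for convenience), let $\epsilon=2^{-5}$, $u=1.875$, and $v=2$. Then $u$ has
$e=1023$ and $v$ has $e=1024$, so
$$N_\epsilon(u)=\{x\mid\vert x-1.875\vert\le2^{-4}\},\qquad
N_\epsilon(v)=\{x\mid\vert x-2\vert\le2^{-3}\}.$$
Since $\vert u-v\vert=2^{-3}$, we have $u\in N_\epsilon(v)$ but
$v\notin N_\epsilon(u)$; therefore $u\sim v\ (\epsilon)$ holds but
$u\approx v\ (\epsilon)$ does not.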
 
\def\rE{{\rm rE}}
\bull\<FCMPE \$X,\$Y,\$Z `floating compare (with respect to epsilon)'.\>
@.FCMPE@>
Register X is set to $-1$ if $\rY\prec\rZ\ \ (\rE)$ according to the
conventions of {\sl Seminumerical Algorithms} as stated above; it is set to~1
if $\rY\succ\rZ\ \ (\rE)$ according to those conventions; otherwise
it is set to~0. Here rE is a floating point number in
@^rE@>
the special {\it epsilon register\/}, which is used only by the
floating point comparison operations \.{FCMPE}, \.{FEQLE}, and \.{FUNE}.
An invalid exception occurs, and the result is zero,
if any of \$Y, \$Z, or rE is a NaN, or if rE is negative.
If no such exception occurs, exactly one of the three conditions
$\rY\prec\rZ$, $\rY\sim\rZ$, $\rY\succ\rZ$ holds with respect to~rE.
 
\bull\<FEQLE \$X,\$Y,\$Z `floating equivalent (with respect to epsilon)'.\>
@.FEQLE@>
Register X is set to 1 if $\rY\approx\rZ\ \ (\rE)$ according to the
conventions of {\sl Seminumerical Algorithms\/} as stated above; otherwise
it is set to~0.
An invalid exception occurs, and the result is zero,
if any of \$Y, \$Z, or rE is a NaN, or if rE is negative.
Notice that the relation $\rY\approx\rZ$ computed by \.{FEQLE} is
stronger than the relation $\rY\sim\rZ$ computed by \.{FCMPE}.
 
\bull\<FUNE \$X,\$Y,\$Z `floating unordered (with respect to epsilon)'.\>
@.FUNE@>
Register X is set to 1 if
\$Y, \$Z, or~rE is exceptional as discussed for \.{FCMPE} and \.{FEQLE};
otherwise it is set to~0. No exceptions occur, even if \$Y, \$Z, or~rE is
a signaling NaN.
 
\smallskip\noindent
Exercise: What floating point numbers does \.{FCMPE} regard
as $\sim0.0$ with respect to
$\epsilon=1/2$, when no exceptions arise? \ Answer: Zero, subnormal
numbers, and normal numbers with $f=0$.
(The numbers similar to zero with respect to~$\epsilon$ are zero,
subnormal numbers with $f\le2\epsilon$, normal numbers with $f\le2\epsilon-1$,
and $\pm\infty$ if $\epsilon\ge1$.)
 
@ The IEEE standard also defines 32-bit floating point quantities, which
it calls ``single format'' numbers. \MMIX\ calls them {\it short floats},
@^short float@>
and converts between 32-bit and 64-bit forms when such numbers are
loaded from memory or stored into memory. A short float consists of a sign
bit followed by an 8-bit exponent and a 23-bit fraction. After it has
been loaded into one of\/ \MMIX's registers, its 52-bit fraction part
will have 29 trailing zero bits, and its exponent~$e$ will be one of the
256 values 0, $(01110000001)_2=897$, $(01110000010)_2=898$, \dots,
$(10001111110)_2=1150$, or~2047, unless it was subnormal; a subnormal
short float loads into a normal number with $874\le e\le896$.
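
The exponent values listed above follow from rebiasing: The IEEE single
format uses an exponent bias of 127, while the 64-bit format uses 1023.
Therefore a normal short float whose 8-bit exponent field is $e_{\rm s}$
(a name introduced here only for illustration) is loaded with
$$e=e_{\rm s}+(1023-127)=e_{\rm s}+896,\qquad 1\le e_{\rm s}\le254,$$
which gives the range $897\le e\le1150$. For example, the short float
\Hex{3f800000}, which represents 1.0, has $e_{\rm s}=127$ and therefore
loads as \Hex{3ff0000000000000}, with $e=1023$.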
 
\bull\<LDSF \$X,\$Y,\0 `load short float'.\>
@.LDSF@>
Register~X is set to the 64-bit floating point number corresponding
to the 32-bit floating point number represented by
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
No arithmetic exceptions occur, not even if a signaling NaN is loaded.
 
\bull\<STSF \$X,\$Y,\0 `store short float'.\>
@.STSF@>
The value obtained by rounding register~X to a 32-bit floating
point number is placed in $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$.
Rounding is done with the current rounding mode, in a manner
exactly analogous to the standard conventions for rounding 64-bit results,
except that the precision and exponent range are limited. In particular,
floating overflow, underflow, and inexact exceptions might occur;
a signaling NaN will trigger an invalid exception and it will become quiet.
The fraction part of a NaN is truncated if necessary to a multiple of
$2^{-23}$, by ignoring the least significant 29 bits.
 
If we load any two short floats and operate on them once with any of \.{FADD},
\.{FSUB}, \.{FMUL}, \.{FDIV}, \.{FREM}, \.{FSQRT}, or \.{FINT}, and if we then
store the result as a short float, we obtain the results required by
the IEEE standard for single format arithmetic, because
the double format can be shown to have enough precision to avoid any
problems of ``double rounding.'' But programmers are usually better
off sticking
to 64-bit arithmetic unless they have a strong reason to emulate the
precise behavior of a 32-bit computer; 32 bits do not offer
much precision.
 
@ Of course we need to be able to go back and forth between integers and
floating point values.
 
\bull\<FIX \$X,\$Z `convert floating to fixed'.\>
@.FIX@>
The floating point number in register~Z is converted to an integer
as with the \.{FINT} instruction, and the resulting integer (mod~$2^{64}$)
is placed in register~X\null.
An invalid exception occurs if \$Z is infinite
or a NaN; in that case \$X is simply set equal to~\$Z\null. A float-to-fix
exception occurs if the result is less than
@^float-to-fix exception@>
@^short float@>
$-2^{63}$ or greater than $2^{63}-1$.
 
\bull\<FIXU \$X,\$Z `convert floating to fixed unsigned'.\>
@.FIXU@>
This instruction is identical to \.{FIX} except that no float-to-fix
exception occurs.
 
\bull\<FLOT \$X,\0 `convert fixed to floating'.\>
@.FLOT@>
The integer in \$Z or the immediate constant~Z is
converted to the nearest floating point value (using the current rounding
mode) and placed in register~X\null. A floating inexact exception
occurs if rounding is necessary.
 
\bull\<FLOTU \$X,\0 `convert fixed to floating unsigned'.\>
@.FLOTU@>
\.{FLOTU} is like \.{FLOT}, but \$Z is treated as an unsigned integer.
 
\bull\<SFLOT \$X,\0 `convert fixed to short float';
\<SFLOTU \$X,\0 `convert fixed to short float unsigned'.\>
@.SFLOT@>
@.SFLOTU@>
The \.{SFLOT} instructions are like the \.{FLOT} instructions, except that
they round to a floating point number whose fraction part is a multiple
of $2^{-23}$. (Thus, the resulting value will not be changed by a ``store
short float'' instruction.) Such conversions appear in \MMIX's repertoire only
to establish complete conformance with the IEEE standard; a programmer
needs them only when emulating a 32-bit machine.
@^emulation@>
 
@ Since the variants of \.{FIX} and \.{FLOT} involve only one input operand (\$Z
or~Z), their Y~field is normally zero. A programmer can, however, force the
mode of rounding used with these commands by setting
$$\vbox{\halign{$\yy=#$,\quad &\.{ROUND\_#};\hfil\cr
1&OFF\cr
2&UP\cr
3&DOWN\cr
4&NEAR\cr}}$$
for example, the instruction \<FLOTU \$X,ROUND\_OFF,\$Z will set the
exponent~$e$ of register~X to $1086-l$ if \$Z is a nonzero quantity with
$l$~leading zero bits. Thus we can count leading zeros by continuing
with \.{SETL}~\.{\$0,1086}; \.{SR}~\.{\$X,\$X,52}; \.{SUB}~\.{\$X,\$0,\$X};
\.{CSZ}~\.{\$X,\$Z,64}.
@^counting leading zeros@>
@.FLOT@>
@.FLOTU@>
@.SFLOT@>
@.SFLOTU@>
@.FIX@>
@.FIXU@>
@:ROUND_OFF}\.{ROUND\_OFF@>
@:ROUND_UP}\.{ROUND\_UP@>
@:ROUND_DOWN}\.{ROUND\_DOWN@>
@:ROUND_NEAR}\.{ROUND\_NEAR@>
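
Written out in full, the leading-zero count just described looks as follows;
the comments trace the sample input $\rZ=1$, which has $l=63$ leading zeros.
(Register \$0 serves as scratch space, as in the text.)
$$\vbox{\halign{&\tt#\hfil\ \cr
&FLOTU&\$X,\.{ROUND\_OFF},\$Z&\% \$X=1.0, whose exponent is 1023=1086-63\cr
&SETL&\$0,1086\cr
&SR&\$X,\$X,52&\% \$X=1023 (the sign is 0; the fraction is shifted out)\cr
&SUB&\$X,\$0,\$X&\% \$X=1086-1023=63\cr
&CSZ&\$X,\$Z,64&\% \$X=64 if \$Z is zero\cr}}$$
Rounding toward zero matters here: With \.{ROUND\_NEAR}, an input such as
$2^{64}-1$ would round up to $2^{64}$, whose exponent 1087 would yield the
incorrect count~$-1$.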
 
The Y field can also be used in the same way
to specify any desired rounding mode in the other
floating point instructions that have only a single operand, namely
\.{FSQRT} and~\.{FINT}.
@.FSQRT@>
@.FINT@>
An illegal instruction interrupt occurs if Y exceeds~4 in any of these
commands.
@^illegal instructions@>
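
For example, explicit rounding modes turn \.{FINT} and \.{FIX} into the
familiar floor, ceiling, and truncation operations. (This is a minimal
sketch; the choice of registers \$1, \$2, and \$3 is arbitrary.)
$$\vbox{\halign{&\tt#\hfil\ \cr
&FINT&\$1,\.{ROUND\_DOWN},\$Z&\% \$1=floor(\$Z), as a floating point number\cr
&FINT&\$2,\.{ROUND\_UP},\$Z&\% \$2=ceiling(\$Z), as a floating point number\cr
&FIX&\$3,\.{ROUND\_OFF},\$Z&\% \$3=\$Z truncated toward zero, as an integer\cr}}$$
With $\rZ=-2.5$ these instructions produce $-3.0$, $-2.0$, and the
integer~$-2$, respectively, regardless of the current rounding mode.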
 
@* Subroutine linkage.
\MMIX\ has several special operations designed to facilitate the process of
calling and implementing subroutines. The key notion is the idea of a
hardware-supported {\it register stack}, which can coexist with a
software-supported stack of variables that are not maintained in registers.
From a programmer's standpoint, \MMIX\ maintains a potentially unbounded list
$S[0]$, $S[1]$, \dots,~$S[\tau-1]$ of octabytes holding the contents
of registers that are temporarily inaccessible; initially $\tau=0$.
When a subroutine is entered, registers can be ``pushed'' on to the end of
this list, increasing~$\tau$; when the subroutine has finished its
execution, the registers are ``popped'' off again and $\tau$~decreases.
 
Our discussion so far has treated all 256 registers \$0, \$1, \dots,~\$255 as if
they were alike. But in fact, \MMIX\ maintains two internal one-byte counters
$L$ and~$G$, where $0\le\ll\le\gg<256$, with the property that
$$\vbox{\halign{#\hfil\cr
registers 0, 1, \dots, $\ll-1$ are ``local'';\cr
registers @!|L|, $\ll+1$, \dots, $\gg-1$ are ``marginal'';\cr
registers @!|G|, $\gg+1$, \dots, 255 are ``global.''\cr}}$$
A marginal register is zero when its value is read.
@^illegal instructions@>
@^rG@>
@^rL@>
@^local registers@>
@^marginal registers@>
@^global registers@>
@^register stack@>
 
The $G$ counter is normally set to a fixed value once and for all when a program
is loaded, thereby defining the number of program variables that will live
entirely in registers rather than in memory during the course of execution.
A programmer may, however, change~$G$ dynamically using the \.{PUT}
instruction described below.
 
The $L$ counter starts at 0. If an instruction places a value into a register
that is currently marginal, namely a register $x$ such that
$\ll\le x<\gg$, the value of~$L$ will increase to $x+1$, and any
newly local registers will be zero. For example, if $\ll=10$ and
$\gg=200$, the instruction \<ADD \$5,\$15,1 would simply set \$5 to~1. But the
instruction \<ADD \$15,\$5,\$200 would set \$10, \$11, \dots,~\$14 to zero,
\$15 to $\$5+\$200$, and $L$~to~16. (The process of clearing registers and
increasing~$L$ might take quite a few machine cycles in the worst case. We will
see later that \MMIX\ is able to take care of any high-priority interrupts
that might occur during this time.)
 
\bull\<PUSHJ \$X,@@+4*YZ[-262144] `push registers and jump'.
\bul\<PUSHGO \$X,\$Y,\0 `push registers and go'.\>
@.PUSHGO@>
@.PUSHJ@>
Suppose first that $\xx<\ll$.
Register~X is set equal to the number~X, then
registers 0, 1, \dots,~X are pushed onto the register stack as
described below.
If this instruction is in
location $\lambda$, the value $\lambda+4$ is placed into the special {\it
return-jump register\/}~rJ\null. Then control jumps to instruction
@^rJ@>
$\lambda+4\rm YZ$ or $\lambda+4\rm YZ-262144$ or
$\rY+\rZ$ or $\rY+\zz$, as in a
\.{JMP} or \.{GO} command.
 
Pushing the first $\xx+1$ registers onto the stack means essentially that we
set $S[\tau]\gets\$0$, $S[\tau+1]\gets\$1$, \dots, $S[\tau+\xx]\gets\$\xx$,
$\tau\gets\tau+\xx+1$, $\$0\gets\$(\xx+1)$, \dots,
$\$(\ll-\xx-2)\gets\$(\ll-1)$, $\ll\gets\ll-\xx-1$. For example, if
$\xx=1$ and $\ll=5$, the current contents of \$0 and the number~1 are
placed on the register stack, where they will be temporarily inaccessible.
Then control jumps to a subroutine with $L$ reduced to~3; the registers that we
had been calling \$2, \$3, and \$4 appear as \$0, \$1, and \$2 to the subroutine.
 
If $\ll\le\xx<\gg$, the value of $\ll$ increases to $\xx+1$ as described
above; then the rules for $\xx<\ll$ apply.
 
If $\xx\ge\gg$ the actions are similar, except that {\it all\/} of the local
registers \$0, \dots,~$\$(\ll-1)$ are placed on the register stack
followed by the number~$L$, and $L$~is reset to zero. In particular, the
instruction \<PUSHGO \$255,\$Y,\$Z pushes all the local registers
onto the stack and sets $L$ to zero, regardless of the previous value of~$L$.
 
We will see later that \MMIX\ is able to achieve the effect of pushing and
renaming local registers without actually doing very much work at all.
 
\bull\<POP X,YZ `pop registers and return from subroutine'.\>
@.POP@>
This command preserves X of the current local registers,
undoes the effect of the most recent \.{PUSHJ} or \.{PUSHGO}, and jumps
to the instruction in $\mm_4[{\rm4YZ+rJ}]$. If $\xx>0$, the value of
$\$(\xx-1)$ goes into the ``hole'' position where \.{PUSHJ} or
\.{PUSHGO} stored the number of registers previously pushed.
 
The formal details of \.{POP} are slightly complicated, but we will see that
they make sense: If $\xx>\ll$, we first replace X by $\ll+1$. Then we
set $x\gets S[\tau-1]\bmod 256$; this is the effective value of the X~field
in the push instruction that is being undone. Stack position $S[\tau-1]$ is
now set to $\$(\xx-1)$ if $0<\xx\le\ll$, otherwise it is set to zero.
Then we essentially set
$\ll\gets\min(x+\xx,\gg)$, $\$(\ll-1)\gets\$(\ll-x-2)$, \dots,
$\$(x+1)\gets\$0$, $\$x\gets S[\tau-1]$, \dots,
$\$0\gets S[\tau-x-1]$, $\tau\gets\tau-x-1$. The operating system should
@^operating system@>
arrange things so that a memory-protection
interrupt will occur if a program does more pops than pushes.
(If $x>\gg$, these formulas don't make sense as written; we actually
set $\$j\gets S[\tau-x-1+j]$ for $\ll>j\ge0$ in that rare case.)
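
To illustrate these formulas, here is a trace (added as a worked example;
the values $a$, \dots,~$e$ and $c'$, $d'$, $e'$ are arbitrary) of a
\.{PUSHJ} with $\xx=1$ that is executed when $\ll=5$, followed eventually by
\.{POP}~\.{2,0} in the subroutine:
$$\vbox{\halign{#\hfil\quad&$#$\hfil\quad&\hfil$#$\hfil\quad&$#$\hfil\cr
&\hbox{local registers}&\ll&\hbox{register stack (top at right)}\cr
\noalign{\smallskip}
before the push&(a,b,c,d,e)&5&\ldots\cr
after the push&(c,d,e)&3&\ldots,a,1\cr
before the pop&(c',d',e')&3&\ldots,a,1\cr
after the pop&(a,d',c')&3&\ldots\cr}}$$
The pop finds $x=1$ on top of the stack, replaces that entry by
$\$1=d'$, sets $\ll\gets\min(1+2,\gg)=3$, moves $\$2\gets\$0=c'$, and
finally restores $\$1\gets d'$ and $\$0\gets a$ from the stack.
Thus the caller sees the second output in the hole position~\$1 and the
first output in~\$2.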
 
Suppose, for example, that a subroutine has three input parameters
$(\$0,\$1,\$2)$ and produces two outputs $(\$0,\$1)$. If the subroutine does
not call any other subroutines, it can simply end with \.{POP} \.{2,0},
because rJ will contain the return address. Otherwise it should begin by
saving rJ, for example with the instruction \<GET \$4,rJ if it will be
using local registers \$0 through~\$3, and it should use \<PUSHJ \$5 or
\<PUSHGO \$5 when
calling sub-subroutines; finally it should \<PUT rJ,\$4 before
saying \.{POP}~\.{2,0}. To call the subroutine from another routine that
has, say, 6~local registers, we would put the input arguments into \$7, \$8,
and~\$9, then issue the command \.{PUSHGO} \.{\$6,base,Subr};
in due time the outputs of the subroutine will appear in \$7 and~\$6.
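
The conventions just described might be laid out as follows. This is only a
skeleton: The labels \.{Subr} and \.{Subsub}, the name \.{base}, and the
elided computations are placeholders, not part of any actual program.
$$\vbox{\halign{&\tt#\hfil\ \cr
Subr&GET&\$4,rJ&\% save the return address; \$0 through \$3 are in use\cr
&\dots\cr
&PUSHJ&\$5,Subsub&\% \$5 is the hole; arguments go in \$6, \$7, \dots\cr
&\dots\cr
&PUT&rJ,\$4&\% restore the return address\cr
&POP&2,0&\% return two outputs, \$0 and \$1\cr
\noalign{\smallskip}
&\dots&&\% the caller has put the arguments into \$7, \$8, \$9\cr
&PUSHGO&\$6,base,Subr&\% \$6 is the hole\cr
&\dots&&\% now \$7 holds the first output and \$6 the second\cr}}$$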
 
Notice that the push and pop commands make use of a one-place ``hole'' in the
register stack, between the registers that are pushed down and the registers
that remain local. (The hole is position \$6 in the example just considered.)
\MMIX\ needs this hole position to remember the number of
registers that are pushed down.
A subroutine with no outputs ends with \<POP 0,0 and the hole disappears
(becomes marginal). A subroutine with one output~\$0 ends with \<POP 1,0 and
the hole gets the former value of~\$0. A subroutine with two outputs
$(\$0,\$1)$ ends with \<POP 2,0 and the hole gets the former value of~\$1; in
this case, therefore, the relative order of the two outputs has been switched
on the register stack. If a subroutine has, say, five outputs
$(\$0,\ldots,\$4)$, it ends with \<POP 5,0 and \$4~goes into the hole position,
where it is followed by $(\$0,\$1,\$2,\$3)$.
\MMIX\ makes this curious permutation in the case of multiple outputs because
the hole is most easily plugged by moving one value down (namely~\$4) instead
of by sliding each of five values down in the stack.
 
These conventions for parameter passing are admittedly a bit confusing in the
general case, and I~suppose people who use them extensively might someday find
themselves talking about ``the infamous \MMIX\ register shuffle.'' However,
there is good use for subroutines that convert
a sequence of register contents like $(x,a,b,c)$ into $(f,a,b,c)$ where
$f$ is a function of $a$, $b$, and $c$ but not~$x$. Moreover,
\.{PUSHGO} and \.{POP} can be implemented with great efficiency,
and subroutine linkage tends to be a significant bottleneck when
other conventions are used.
 
Information about a subroutine's calling conventions needs to be communicated
to a debugger. That can readily be done at the same time as we inform the
debugger about the symbolic names of addresses in memory.
 
A subroutine that uses 50 local registers will not function properly if it is
called by a program that sets $G$ less than~50. \MMIX\ does not allow the
value of~$G$ to become less than~32. Therefore any subroutine that avoids
global registers and uses at most~32 local registers
can be sure to work properly regardless of the current value of~$G$.
 
The rules stated above imply that a \.{PUSHJ} or
\.{PUSHGO} instruction with $\xx=255$ pushes all of the currently defined
local registers onto the stack and sets $L$ to~zero.
This makes $G$ local registers available for use by the subroutine
jumped~to. If that subroutine later returns with \.{POP} \.{0,0}, the former
value of~$L$ and the former contents of \$0, \dots,~$\$(\ll-1)$ will be
restored (assuming that $G$ doesn't decrease).
 
A \.{POP} instruction with $\xx=255$
preserves all the local registers as outputs of
the subroutine (provided that the total doesn't exceed~$G$ after popping),
and puts zero into the hole (unless $L=G=255$). The best policy, however, is
almost always to use \.{POP} with a small value of~X, and in general to keep
the value of~$L$ as small as
possible by decreasing it when registers are no longer active.
A smaller value of~$L$ means that \MMIX\ can change context more
easily when switching from one process to another.
 
@* System considerations.
High-performance implementations of\/ \MMIX\ gain speed by keeping {\it
caches\/} of instructions and data that are likely to be needed as computation
@^caches@>
proceeds. [See M.~V. Wilkes, {\sl IEEE Transactions\/ \bf EC-14} (1965),
270--271; J.~S. Liptay, {\sl IBM System J. \bf7} (1968), 15--21.]
@^Wilkes, Maurice Vincent@>
@^Liptay, John S.@>
Careful programmers can make the computer run even faster by giving
hints about how to maintain such caches.
 
\bull\<LDUNC \$X,\$Y,\0 `load octa uncached'.\>
@.LDUNC@>
These instructions, which have the same meaning as \.{LDO}, also
inform the computer that the loaded octabyte (and its neighbors in a cache
block) will probably not be read or written in the near future.
 
\bull\<STUNC \$X,\$Y,\0 `store octa uncached'.\>
@.STUNC@>
These instructions, which have the same meaning as \.{STO}, also
inform the computer that the stored octabyte (and its neighbors in a cache
block) will probably not be read or written in the near future.
 
\bull\<PRELD X,\$Y,\0 `preload data'.\>
@.PRELD@>
These instructions have no effect on registers or memory, but they inform the
computer that many of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
will probably be loaded and/or stored in the near future.
No protection failure occurs if the memory is not accessible.
 
\bull\<PREGO X,\$Y,\0 `prefetch to go'.\>
@.PREGO@>
These instructions have no effect on registers or memory, but they inform the
computer that many of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
will probably be used as instructions in the near future.
No protection failure occurs if the memory is not accessible.
 
\bull\<PREST X,\$Y,\0 `prestore data'.\>
@.PREST@>
These instructions have no effect on registers or memory if the computer has
no data cache. But when such a cache exists, they inform the
computer that all of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
will definitely be stored in the near future before they are loaded.
(Therefore it is permissible for the machine to ignore the present contents of
those bytes. Also, if those bytes are being shared by several processors,
the current processor should try to acquire exclusive access.)
No protection failure occurs if the memory is not accessible.
 
\bull\<SYNCD X,\$Y,\0 `synchronize data'.\>
@.SYNCD@>
When executed from nonnegative locations, these instructions have no effect on
registers or memory if neither a write buffer nor a ``write back''
data cache is present. But when such a buffer or cache exists, they force the
computer to make sure that all data for the $\xx+1$ bytes
$\mm[\rY+\rZ]$ through $\mm[\rY+\rZ+\xx]$, or
$\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
will be present in memory.
(Otherwise the result of a previous store instruction might appear only
in the cache; the computer is being told that now is the time to
write the information back, if it hasn't already been written. A program
can use this feature before outputting directly from memory.)
No protection failure occurs if the memory is not accessible.
 
The action is similar when \.{SYNCD} is executed from a negative address,
but in this case the specified bytes are also removed from the data
cache (and from a secondary cache, if present). The operating system can
use this feature when a page of virtual memory is being swapped out,
or when data is input directly into memory.
@^operating system@>
 
\bull\<SYNCID X,\$Y,\0 `synchronize instructions and data'.\>
@.SYNCID@>
When executed from nonnegative locations these instructions have no effect on
registers or memory if the computer has no instruction cache separate from a
data cache. But when such a cache exists, they force the
computer to make sure that the $\xx+1$ bytes
$\mm[\rY+\rZ]$ through $\mm[\rY+\rZ+\xx]$, or
$\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$,
will be interpreted correctly
if used as instructions before they are next modified.
(Generally speaking, an \MMIX\ program is not expected to store anything in
memory locations that are also being used as instructions.
Therefore \MMIX's instruction cache is allowed to become inconsistent with
respect to its data cache. Programmers who insist on executing instructions
that have been fabricated dynamically, for example when setting a breakpoint
for debugging, must first \.{SYNCID} those instructions
in order to guarantee that the intended results will be obtained.) A \.{SYNCID}
command might be implemented in several ways; for example, the machine
might update its instruction cache to agree with its data cache. A simpler
solution, which is good enough because the need for \.{SYNCID} ought to
be rare, removes instructions in the specified range
from the instruction cache, if
present, so that they will have to be fetched from memory the next time
they are needed; in this case the machine also carries out the effect of
a~\.{SYNCD} command.
No protection failure occurs if the memory is not accessible.
 
The behavior is more drastic, but faster, when \.{SYNCID} is executed
from a negative location. Then all bytes in the specified range are
simply removed from all caches, and the memory corresponding to
any ``dirty'' cache blocks involving such bytes is {\it not\/} brought up
to date. An operating system can use this version of the command
when pages of virtual memory are being discarded (for example, when
a program is being terminated).
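
As a sketch of the dynamic-code scenario mentioned above, suppose \$1
contains a freshly fabricated instruction and \$2 contains the address where
it should be executed. (Both register assignments, and the use of \$3, are
assumptions made for this illustration.)
$$\vbox{\halign{&\tt#\hfil\ \cr
&STTU&\$1,\$2,0&\% store the new instruction in M4[\$2]\cr
&SYNCID&3,\$2,0&\% make bytes M[\$2] through M[\$2+3] safe to execute\cr
&GO&\$3,\$2,0&\% jump to the new instruction; \$3 gets the return location\cr}}$$
The X~field is~3 because \.{SYNCID} covers $\xx+1$ bytes and an instruction
occupies one tetrabyte.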
 
@ \MMIX\ is designed to work not only on a single processor but also
in situations where several processors
share a common memory. The following commands are useful
for efficient operation in such circumstances.
 
\bull\<CSWAP \$X,\$Y,\0 `compare and swap octabytes'.\>
@.CSWAP@>
If the octabyte $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$
is equal to the contents of the special {\it prediction register\/}~rP,
@^rP@>
it is replaced in memory with the contents of register~X, and
register~X is set equal to~1. Otherwise the octabyte in memory
replaces rP and register~X is set to zero.
This is an atomic (indivisible, uninterruptible) operation,
useful for interprocess communication
when independent computers are sharing the same memory.
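
For instance, a simple lock might be built from \.{CSWAP} along the following
lines. This is a minimal sketch under stated assumptions: \$0 holds the
address of a lock octabyte that is 0 when free and 1 when held, \$1 is a
scratch register, and the label \.{Retry} is hypothetical.
$$\vbox{\halign{&\tt#\hfil\ \cr
Retry&PUT&rP,0&\% we expect the lock to be free\cr
&SET&\$1,1&\% the value to be stored if the comparison succeeds\cr
&CSWAP&\$1,\$0,0&\% \$1=1 if the lock was acquired, otherwise \$1=0\cr
&BZ&\$1,Retry&\% try again if somebody else holds the lock\cr
&\dots&&\% critical section\cr
&STCO&0,\$0,0&\% release the lock\cr}}$$
The \.{PUT} is repeated on each attempt because an unsuccessful \.{CSWAP}
overwrites rP with the octabyte found in memory. On a multiprocessor,
suitable \.{SYNC} commands may also be needed around the critical section.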
 
The compare-and-swap operation was introduced by IBM in late
models of the
@^IBM Corporation@>
@^compare-and-swap@>
@^atomic instruction@>
System/370 architecture, and it soon spread to several
@^System/370@>
other machines. Significant ways to use it are discussed, for example,
in section 7.2.3 of Harold Stone's
{\sl High-Performance Computer Architecture\/} (Reading, Massachusetts:\
Addison--Wesley, 1987), and in sections 8.2 and 8.3 of {\sl Transaction
Processing\/} by Jim Gray and Andreas Reuter (San Francisco:\ Morgan
Kaufmann, 1993). % Kaufmann: stet
@^Stone, Harold Stuart@>
@^Gray, James Nicholas@>
@^Reuter, Andreas Horst@>
 
\bull\<SYNC XYZ `synchronize'.\>
@.SYNC@>
If $\rm XYZ=0$, the machine drains its pipeline (that is, it
stalls until all preceding instructions have completed their activity).
If $\rm XYZ=1$, the machine controls its actions less drastically,
in such a way that all
store instructions preceding this \.{SYNC} will be completed
before all store instructions after it.
If $\rm XYZ=2$, the machine controls its actions in such a way that all
load instructions preceding this \.{SYNC} will be completed
before all load instructions after it.
If $\rm XYZ=3$, the machine controls its actions
in such a way that all {\it load or store\/} instructions preceding this
\.{SYNC} will be completed before all load or store instructions after it.
If $\rm XYZ=4$, the machine goes into a power-saver mode, in which
@^power-saver mode@>
instructions may be executed more slowly (or not at all) until some kind
of ``wake-up'' signal is received.
If $\rm XYZ=5$, the machine empties its write buffer and
cleans its data caches, if any (including a possible secondary cache);
the caches retain their data,
but the cache contents also appear in memory.
If $\rm XYZ=6$, the machine clears its virtual address translation
caches (see below).
If $\rm XYZ=7$, the machine clears its instruction and data caches,
discarding any information in the data caches that wasn't previously
in memory. (``Clearing'' is stronger than ``cleaning''; a clear cache
remembers nothing. Clearing is also faster, because it simply obliterates
everything.)
If $\rm XYZ>7$, an illegal instruction interrupt occurs.
 
Of course no \.{SYNC} is necessary between a command that loads from or stores
into memory and a subsequent command that loads from or stores into exactly
the same location. However, \.{SYNC} might be necessary in certain cases even
on a one-processor system, because input/output processes take place in
parallel with ordinary computation.
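
For example, a process that publishes data for another processor might
separate its stores as follows; the registers and the flag convention are
assumptions made for this sketch.
$$\vbox{\halign{&\tt#\hfil\ \cr
&STO&\$1,\$2,0&\% store the data\cr
&SYNC&1&\% earlier stores are completed before later ones\cr
&STO&\$3,\$4,0&\% store a flag announcing that the data is ready\cr}}$$
The receiving process would correspondingly load the flag, issue
\.{SYNC}~\.{2}, and then load the data.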
 
The cases $\rm XYZ>3$ are {\it privileged}, in the sense that
only the operating system can use them. More precisely, if a \.{SYNC}
command is encountered with $\rm XYZ=4$ or
$\rm XYZ=5$ or $\rm XYZ=6$ or $\rm XYZ=7$,
a ``privileged instruction interrupt'' occurs unless that interrupt
is currently disabled. Only the operating system can disable
interrupts (see below).
@^privileged operations@>
 
@* Trips and traps.
Special register rA records the current status information
about arithmetic exceptions. Its least significant byte contains eight
``event'' bits called DVWIOUZX from left to right, where D stands for
integer divide check, V~for integer overflow, W~for float-to-fix overflow,
I~for invalid operation, O~for floating overflow, U~for
floating underflow, Z~for floating division by zero, and X~for floating
inexact. % The low order five bits agree with SPARC I conventions
% but Alpha, for example, uses the order VXUOZI
The next least significant byte of rA contains eight
``enable'' bits with the same names DVWIOUZX and the same meanings.
When an exceptional condition occurs, there are two cases: If the
corresponding enable bit is~0, the corresponding event bit is set
to~1. But if the corresponding enable bit is~1, \MMIX\ interrupts
its current instruction stream and executes a special ``exception
handler.'' Thus, the event bits record exceptions that have not been
``tripped.''
@^overflow@>
@^underflow@>
@^exceptions@>
@^handlers@>
@^float-to-fix exception@>
@^inexact exception@>
@^invalid exception@>
@^divide check exception@>
 
Floating point overflow always causes two exceptions, O and~X\null.
(The strictest interpretation of the IEEE standard would raise exception~X
on overflow only if floating overflow is not enabled, but \MMIX\ always
considers an overflowed result to be inexact.)
Floating point underflow always causes both U and~X when underflow is
not enabled, and it might cause both U and~X when underflow is enabled.
If both enable bits are set to~1 in such cases, the overflow or underflow
handler is called and the inexact handler is ignored. All other types
of exceptions arise one at a time, so there is no ambiguity about which
exception handler should be invoked unless exceptions are raised by
``ropcode~2'' (see below); in general the first enabled exception
in the list DVWIOUZX takes precedence.
 
What about the six high-order bytes of the status register rA?
@^rA@>
@^rounding modes@>
At present, only two of those 48 bits are defined;
the others must be zero for compatibility
with possible future extensions. The two bits corresponding to $2^{17}$
and $2^{16}$ in rA specify a rounding mode, as follows: 00~means
round to nearest (the default); 01~means round off (toward zero);
10~means round up (toward positive infinity); and
11~means round down (toward negative infinity).
% Alpha conventions differ: 10,00,11,01 for nearest,off,up,down
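
The layout of the defined part of rA can be summarized as follows, where
bit~$k$ means the bit whose value is $2^k$:
$$\vbox{\halign{\hfil#\quad&#\hfil\cr
bits 17--16&rounding mode\cr
bits 15--8&enable bits D, V, W, I, O, U, Z, X\cr
bits 7--0&event bits D, V, W, I, O, U, Z, X\cr}}$$
For instance, a program that wants integer overflow to trip and wants
rounding toward zero would set ${\rm rA}=2^{16}+2^{14}=\Hex{14000}$,
perhaps as follows. (This is a sketch; the scratch register \$0 is an
arbitrary choice.)
$$\vbox{\halign{&\tt#\hfil\ \cr
&SETL&\$0,\char`\#4000&\% the enable bit for V\cr
&INCML&\$0,1&\% rounding mode 01\cr
&PUT&rA,\$0\cr}}$$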
 
@ The execution of\/ \MMIX\ programs can be interrupted in several ways.
We have just seen that arithmetic exceptions will cause interrupts if
they are enabled; so will illegal or privileged instructions, or instructions
@^illegal instructions@>
@^privileged operations@>
@^emulation@>
@^interrupts@>
@^I/O@>
@^input/output@>
that are emulated in software instead of provided by the hardware.
Input/output operations or external timers are another common source
of interrupts; the operating system knows how to deal with
all gadgets that might be hooked up to an \MMIX\ processor chip.
Interrupts occur also when memory accesses fail---for example if
memory is nonexistent or protected.
Power failures that force the machine to use its backup battery power
in order to keep running in an emergency,
or hardware failures like parity errors,
all must be handled as gracefully as possible.
 
Users can also force interrupts to happen by giving explicit \.{TRAP} or
\.{TRIP} instructions:
 
\bull\<TRAP X,Y,Z `trap'; \<TRIP X,Y,Z `trip'.\>
@.TRIP@>
@.TRAP@>
Both of these instructions interrupt processing and transfer control
to a handler. The difference between them is that \.{TRAP}
is handled by the operating system but \.{TRIP} is handled by the user.
@^operating system@>
More precisely, the X, Y, and Z fields of \.{TRAP} have special significance
predefined by the operating system kernel. For example, a system call---say an I/O
command, or a command to allocate more memory---might be invoked
by certain settings of X, Y, and~Z\null.
The X, Y, and Z fields of \.{TRIP}, on the other hand, are definable by
users for their own applications, and users also define their own
handlers. ``Trip handler'' programs
invoked by \.{TRIP} are interruptible, but interrupts are normally inhibited
while a \.{TRAP} is being serviced. Specific details about the
precise actions of \.{TRIP} and \.{TRAP} appear below, together
with the description of another command called \.{RESUME} that
returns control from a handler to the interrupted program.
 
Only two variants of \.{TRAP} are predefined by the \MMIX\ architecture:
If $\rm XYZ=0$ in a \.{TRAP}
command, a user process should terminate. If $\rm XYZ=1$,
the operating system should provide default action for cases in which
the user has not provided any handler for a particular
kind of interrupt (see below).
 
A few additional variants of \.{TRAP} are predefined in the rudimentary
operating system used with \MMIX\ simulators. These variants, which
allow simple input/output operations to be done, all have $\xx=0$,
and the Y~field is a small positive constant. For example, $\yy=1$ invokes
the \.{Fopen} routine, which opens a file. (See the program
{\mc MMIX-SIM} for full details.)
@^I/O@>
@^input/output@>
 
@ Non-catastrophic interrupts in \MMIX\ are always {\it precise}, in the sense that all legal
instructions before a certain point have effectively been executed, and
no instructions after that point have yet been executed. The current
instruction, which may or may not have been completed at the time of
interrupt and which may or may not need to be resumed after the interrupt has
been serviced, is
put into the special {\it execution register\/}~rX, and its operands (if any)
are placed in special registers rY and~rZ\null. The address of the following
instruction is placed in the special {\it where-interrupted
register\/}~rW\null.
@^interrupts@>
@^rW@>
@^rX@>
@^rY@>
@^rZ@>
The instruction in~rX may not be the same as the instruction in
location $\rm rW-4$; for example, it may be an instruction that
branched or jumped to~rW\null. It might also be an instruction
inserted internally by the \MMIX\ processor.
(For example, the computer silently inserts an internal instruction
that increases~$L$ before an instruction
like \<ADD \$9,\$1,\$0 if $L$~is currently less than~10. If an interrupt
occurs between the inserted instruction and the \.{ADD},
the instruction in~rX will
say \.{ADD}, because an internal instruction retains the identity of the
actual command that spawned it; but rW will point to the {\it real\/}
\.{ADD} command.)
 
When an instruction has the normal meaning ``set \$X to
the result of \$Y~op~\$Z'' or ``set \$X to the result of \$Y~op~Z,''
special registers rY and~rZ will relate in the
obvious way to the Y and~Z operands of the instruction; but this is not
always the case. For example, after an interrupted
store instruction, the first operand~rY will hold
the virtual memory address (\$Y plus either \$Z or~Z),
and the second operand~rZ will be the octabyte to be stored in memory
(including bytes that have not changed, in cases like \.{STB}). In
other cases the actual
contents of rY and~rZ are defined by each implementation of\/ \MMIX,
and programmers should not rely on their significance.
 
Some instructions take an unpredictable and possibly long amount of time, so
it may be necessary to interrupt them in progress. For example, the \.{FREM}
@.FREM@>
instruction (floating point remainder) is extremely difficult to compute
rapidly if its first operand has an exponent of~2046 and its second operand
has an exponent of~1. In such cases the rY and rZ registers saved during an
interrupt show the current state of the computation, not necessarily the
original values of the operands. The value of $\rm rY\,{rem}\,rZ$ will still
be the desired remainder, but rY may well have been reduced to a
number that has an exponent closer to the exponent of~rZ\null.
After the interrupt has been processed, the remainder
computation will continue where it left off.
(Alternatively, an operation like \.{FREM} or even \.{FADD} might be
implemented in software instead of hardware, as we will see later.)
 
Another example arises with an instruction like \.{PREST} (prestore), which can
@.PREST@>
specify prestoring up to 256 bytes. An implementation of\/ \MMIX\ might choose
to prestore only 32 or 64 bytes at a time, depending on the cache block size;
then it can change the contents of rX to reflect the unfinished part of
a partially completed \.{PREST} command.
 
Commands that decrease $G$, pop the stack, save the
current context, or unsave an old context also are interruptible. Register~rX
is used to communicate information about partial completion in such a
way that the interruption will be essentially ``invisible'' after
a program is resumed.
 
@ Three kinds of interruption are possible: trips, forced traps, and
dynamic traps. We will discuss each of these in turn.
@^interrupts@>
@^trips@>
@^traps@>
@^forced traps@>
@^dynamic traps@>
@^handlers@>
@^operating system@>
 
A \.{TRIP} instruction puts itself into the right half of the execution
@.TRIP@>
register~rX, and sets the 32 bits of the left half to \Hex{80000000}.
(Therefore rX is {\it negative\/}; this fact will
tell the \.{RESUME} command not to \.{TRIP} again.) The special registers
rY and rZ are set to the contents of the registers specified by the
Y and Z fields of the \.{TRIP} command, namely \$Y and~\$Z.
Then \$255 is placed into the special {\it bootstrap
register\/}~rB, and \$255 is set to~rJ. \MMIX\ now takes its next instruction
@^rB@>
from virtual memory address~0.
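
The register traffic of a \.{TRIP} can be summarized by the following
sketch in~C. It is only an illustration of the paragraph above, not the
simulator's code; the structure \.{TripState} and the function
\.{trip\_enter} are hypothetical names, and the value stored in rW follows
the earlier rule that rW receives the address of the following instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file; field names mirror the registers in the text. */
typedef struct {
    uint64_t g255;                     /* general register $255 */
    uint64_t rB, rJ, rW, rX, rY, rZ;   /* special registers touched by TRIP */
    uint64_t pc;                       /* address of the next instruction */
} TripState;

/* Effect of a TRIP instruction `inst` found at address `loc`, where y and z
   are the current contents of $Y and $Z. */
static void trip_enter(TripState *s, uint32_t inst, uint64_t loc,
                       uint64_t y, uint64_t z)
{
    assert((loc >> 63) == 0);  /* negative locations do not invoke trip handlers */
    s->rW = loc + 4;           /* address of the following instruction */
    s->rX = ((uint64_t)0x80000000u << 32) | inst;  /* left half is #80000000 */
    s->rY = y;
    s->rZ = z;
    s->rB = s->g255;           /* $255 is saved in the bootstrap register */
    s->g255 = s->rJ;           /* and then $255 receives rJ */
    s->pc = 0;                 /* the handler starts at virtual address 0 */
}
```

Because the leftmost bit of rX is set, rX is negative when regarded as a
signed octabyte, which is the property that \.{RESUME} tests.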
 
Arithmetic exceptions interrupt the computation in essentially the
same way as \.{TRIP}, if they are enabled. The only difference is that
their handlers begin at the respective addresses
16, 32, 48, 64, 80, 96, 112, and~128, for exception bits D, V, W, I, O, U,
Z, and~X of~rA; registers rY and~rZ are set to the operands of the
interrupted instruction as explained earlier.
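
The handler addresses form an arithmetic progression, so a simulator could
compute them instead of tabulating them. Here is a minimal sketch in~C; the
names \.{trip\_exception\_names} and \.{trip\_handler\_address} are
hypothetical, and the index~$k$ simply counts the exceptions in the order
listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Exception names in the order given in the text. */
static const char trip_exception_names[] = "DVWIOUZX";

/* Handler address for the k-th exception (k = 0 for D, ..., k = 7 for X).
   Address 0 itself is used by the explicit TRIP instruction. */
static uint64_t trip_handler_address(int k)
{
    assert(k >= 0 && k < 8);
    return (uint64_t)16 * (uint64_t)(k + 1);
}
```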
 
A 16-byte block of memory is just enough for a sequence of commands like
$$\hbox{\tt PUSHJ 255,Handler; PUT rJ,\$255; GET \$255,rB; RESUME}$$
which will invoke a user's handler. And if the user does not choose to
provide a custom-designed handler, the operating system provides a
default handler via the instructions
$$\hbox{\tt TRAP 1; GET \$255,rB; RESUME.}$$
 
A trip handler might simply record the fact that tripping occurred.
But the handler for an arithmetic interrupt might want to change the
default result of a computation. In such cases, the handler should place
the desired substitute result into~rZ, and it should change the most
significant byte of~rX from \Hex{80} to \Hex{02}. This will have the desired
effect, because of the rules of \.{RESUME} explained below, {\it unless\/}
the exception occurred on a command like \.{STB} or \.{STSF}. (A~bit more
work is needed to alter the effect of a command that stores into memory.)
 
Instructions in {\it negative\/} virtual locations do not invoke trip
handlers, either for \.{TRIP} or for arithmetic exceptions. Such instructions
are reserved for the operating system, as we will see.
@^negative locations@>
 
@ A \.{TRAP} instruction interrupts the computation essentially
@^interrupts@>
like \.{TRIP}, but with the following modifications:
@^rT@>
@.TRAP@>
@^rK@>
(i)~the interrupt mask register~rK is cleared
to zero, thereby inhibiting interrupts; (ii)~control jumps to virtual memory
address~rT, not zero; (iii)~information is placed
@^rBB@>
@^rWW@>
@^rXX@>
@^rYY@>
@^rZZ@>
in a separate set of special registers rBB, rWW, rXX, rYY, and~rZZ, instead of
rB, rW, rX, rY, and~rZ\null. (These special registers are needed because a trap
might occur while processing a \.{TRIP}.)
 
Another kind of forced trap occurs on implementations of\/ \MMIX\ that
emulate certain instructions in software rather than in hardware.
Such instructions cause a \.{TRAP} even though their opcode is something
else like \.{FREM} or \.{FADD} or \.{DIV}. The trap handler can tell
what instruction to emulate by looking at the opcode, which appears
in~rXX\null.
In such cases the left-hand half of~rXX is set to \Hex{02000000}; the handler
emulating \.{FADD}, say, should compute the floating point sum of rYY and~rZZ
and place the result in~rZZ\null. A~subsequent
\.{RESUME}~\.1 will then place the value of~rZZ in the proper register.
@^emulation@>
@^forced traps@>
 
Implementations of\/ \MMIX\ might also emulate the process of
virtual-address-to-physical-address translation described below,
instead of providing for page table calculations in hardware.
Then if, say, a \.{LDB} instruction does not know the physical memory
address corresponding to a specified virtual address, it will cause
a forced trap with the left half of~rXX set to \Hex{03000000} and with
rYY set to the virtual address in question. The trap handler should
place the physical page address into~rZZ; then \.{RESUME}~\.1 will
complete~the~\.{LDB}.
 
@ The third and final kind of interrupt is called a {\it dynamic\/} trap.
@^interrupts@>
@^dynamic traps@>
Such interruptions occur when one or more of the 64 bits in the
special {\it interrupt request register\/}~rQ have been set to~1,
@^rQ@>
@^rK@>
and when at least one corresponding bit of the special
{\it interrupt mask register\/}~rK is also equal to~1. The bit positions
of rQ and~rK have the general form
$$\beginword
&\field{24}{24}&&\field88&&\field{24}{24}&&\field88\cr
\noalign{\hrule}
\\&low-priority I/O&\\&program&\\&high-priority I/O&\\&machine&\\\cr
\noalign{\hrule}\endword$$
where the eight ``program'' bits are called \.{rwxnkbsp} and have
the following meanings:
$$\vbox{\halign{\.# bit: &#\hfil\cr
r&instruction tries to load from a page without read permission;\cr
w&instruction tries to store to a page without write permission;\cr
x&instruction appears in a page without execute permission;\cr
n&instruction refers to a negative virtual address;\cr
k&instruction is privileged, for use by the ``kernel'' only;\cr
b&instruction breaks the rules of\/ \MMIX;\cr
s&instruction violates security (see below);\cr
p&instruction comes from a privileged (negative) virtual address.\cr}}$$
Negative addresses are for the use of the operating system only;
@^operating system@>
@^protection bits@>
@^permission bits@>
@^security violation@>
@^privileged instructions@>
@^illegal instructions@>
@^page fault@>
a security violation occurs if an instruction in a nonnegative address
is executed without the \.{rwxnkbsp} bits of~rK all set to~1.
(In such cases the \.s bits of both rQ and~rK are set to~1.)
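
The positions of the program bits follow from the field widths shown above:
24~bits of high-priority I/O and 8~machine bits lie to their right, so the
program byte occupies bits 32 through~39 when bit~0 is the least significant.
The following C sketch derives the masks and tests the security rule; the
names \.{program\_bit} and \.{security\_violation} are hypothetical, chosen
for this illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { PROGRAM_SHIFT = 32 };   /* 24 high-priority I/O bits + 8 machine bits */
#define PROGRAM_BITS ((uint64_t)0xff << PROGRAM_SHIFT)

/* Mask in rQ or rK for one of the program bits r, w, x, n, k, b, s, p. */
static uint64_t program_bit(char name)
{
    const char *names = "rwxnkbsp";        /* leftmost bit first */
    const char *p = strchr(names, name);
    assert(name != '\0' && p != NULL);
    return ((uint64_t)0x80 >> (p - names)) << PROGRAM_SHIFT;
}

/* Nonzero if executing an instruction at address loc under mask rK is a
   security violation: the address is nonnegative and some program bit of
   rK is zero. */
static int security_violation(uint64_t loc, uint64_t rK)
{
    int nonnegative = (loc >> 63) == 0;
    return nonnegative && (rK & PROGRAM_BITS) != PROGRAM_BITS;
}
```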
 
The eight ``machine'' bits of rQ and rK represent the most urgent
kinds of interrupts. The rightmost bit stands for power failure,
the next for memory parity error, the next for nonexistent memory,
the next for rebooting, etc.
Interrupts that need especially quick service, like requests from
a high-speed network, also are allocated bit positions near the right end.
Low priority I/O devices like keyboards are assigned to bits at the left.
The allocation of input/output devices to bit positions will
differ from implementation to implementation, depending on
what devices are available.
@^I/O@>
@^input/output@>
 
Once $\rm rQ\land rK$ becomes nonzero, the machine waits
briefly until it can give a precise interrupt.
Then it proceeds as with a forced trap,
except that it uses the special ``dynamic
trap address register''~rTT instead of~rT. The trap handler that
@^rTT@>
begins at location~rTT can figure out the reason for interrupt by
examining $\rm rQ\land rK$. (For example, after the instructions
$$\hbox spread-10pt{\tt\spaceskip .5em minus .1em
GET \$0,rQ; LDOU \$1,savedK; AND \$0,\$0,\$1; SUBU \$1,\$0,1;
SADD \$2,\$1,\$0; ANDN \$1,\$0,\$1}$$
the highest-priority offending bit will be in \$1 and its position will be
in~\$2.)
@^counting trailing zeros@>
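
The same computation can be written in~C, one statement per instruction.
This is a sketch for readers who prefer C notation; \.{sideways\_add} is a
portable stand-in for \.{SADD}, and the function name
\.{highest\_priority} is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 1 bits in x; SADD $X,$Y,$Z counts the 1 bits of $Y & ~$Z. */
static int sideways_add(uint64_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Isolate the rightmost 1 bit of rQ & savedK and report its position. */
static void highest_priority(uint64_t rQ, uint64_t savedK,
                             uint64_t *bit, int *position)
{
    uint64_t t = rQ & savedK;              /* AND  $0,$0,$1 */
    uint64_t m;
    assert(t != 0);                        /* some enabled request is pending */
    m = t - 1;                             /* SUBU $1,$0,1  */
    *position = sideways_add(m & ~t);      /* SADD $2,$1,$0 */
    *bit = t & ~m;                         /* ANDN $1,$0,$1 */
}
```

Subtracting~1 turns the trailing zeros of the pending mask into ones and
clears the rightmost~1, so counting the bits that are in $t-1$ but not
in~$t$ yields the number of trailing zeros.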
 
If the interrupted instruction contributed 1s to any of the \.{rwxnkbsp} bits
of~rQ, the corresponding bits are set to~1 also in~rXX\null. A~dynamic trap
handler might be able to use this information (although it should
service higher-priority interrupts first if the right half
of $\rm rQ\land rK$ is nonzero).
@^rX@>
 
The rules of\/ \MMIX\ are rigged
so that only the operating system can execute instructions
with interrupts suppressed. Therefore the operating system can in fact
use instructions that would interrupt an ordinary program. Control of
register rK turns out to be the ultimate privilege, and in a sense the
only important one.
@^privileged operations@>
 
An instruction that causes a dynamic trap is usually executed before the
interruption occurs. However, an instruction that traps with
bits \.x, \.k, or \.b does nothing; a load instruction that traps
with \.r or \.n loads zero; a store instruction that traps with any
of \.{rwxnkbsp} stores nothing.
 
@ After a trip handler or trap handler has done its thing, it
generally invokes the following command.
 
\bull\<RESUME Z `resume after interrupt'; the X and Y fields must be zero.\>
@.RESUME@>
@^interrupts@>
@^handlers@>
If the Z field of this instruction is zero,
\MMIX\ will use the
information found in special registers rW, rX, rY, and~rZ to restart an
@^rW@>
@^rX@>
@^rY@>
@^rZ@>
@^rBB@>
@^rWW@>
@^rXX@>
@^rYY@>
@^rZZ@>
@^rK@>
interrupted computation. If the execution register rX is negative, it will be
ignored and instructions will be executed starting at virtual address~rW\null;
otherwise the instruction in the right half of the execution register will be
inserted into the program as if it had appeared in location $\rm rW-4$,
subject to certain modifications that we will explain momentarily,
and the {\it next\/} instruction will come from rW.
 
If the Z field of \.{RESUME}
is 1 and if this instruction appears in a negative location,
registers rWW, rXX, rYY, and~rZZ are used instead of rW, rX, rY, and~rZ\null.
Also, just before resuming the computation,
mask register rK is set to \$255 and \$255 is set to rBB\null.
(Only the operating system gets to use this feature.)
@^operating system@>
 
An interrupt handler within the operating system might choose to allow itself
to be interrupted. In such cases it should save the contents of
rBB, rWW, rXX, rYY, and~rZZ on some kind of stack, before making rK nonzero.
Then, before resuming whatever caused the base level interrupt, it
must again disable all interrupts; this can be done with \.{TRAP},
because the trap handler can tell from the virtual address in~rWW that
it has been invoked by the operating system. Once rK is again zero,
the contents of rBB, rWW, rXX, rYY, and~rZZ are restored from the stack,
the outer level interrupt mask is placed in \$255, and \<RESUME 1
finishes the job.
 
Values of Z greater than 1 are reserved for possible later
definition. Therefore they cause an illegal instruction interrupt (that
is, they set the `\.b' bit of~rQ) in the present version of\/ \MMIX.
@^illegal instructions@>
 
If the execution register rX is nonnegative, its leftmost byte controls
the way its right-hand half will be inserted into the program.
Let's call this byte the ``ropcode.'' A ropcode of~0 simply
inserts the instruction into the execution stream; a ropcode of~1
is similar, but it substitutes rY and rZ for the
two operands, assuming that this makes sense for the operation considered.
@^ropcodes@>
 
Ropcode~2 inserts a command that sets \$X to rZ, where
X~is the second byte in the right half of rX\null.
This ropcode is normally used with forced-trap emulations, so that the result
of an emulated instruction is placed into the correct register.
It also uses the third-from-left byte of~rX to raise any or all of the
arithmetic exceptions DVWIOUZX, at the same time as rZ is
being placed in \$X. Emulated instructions and
explicit \.{TRAP} commands can therefore cause overflow, say,
just as ordinary instructions can.
(Such new exceptions may, of
course, spawn a trip interrupt, if any of the corresponding bits are enabled
in~rA.)
@^rA@>
@^emulation@>
 
Finally, ropcode 3 is the same as ropcode 0, except that it also
tells \MMIX\ to treat rZ as the page table entry for the virtual
address~rY\null. (See the discussion of virtual address translation below.)
Ropcodes greater than~3 are not permitted; moreover,
only \<RESUME 1 is allowed to use ropcode~3.
 
The ropcode rules in the previous paragraphs should of course be understood to
involve rWW, rXX, rYY, and rZZ instead of rW, rX, rY, and~rZ when
the ropcode is seen by \.{RESUME}~\.1. Thus, in particular, ropcode~3
always applies to rYY and~rZZ, never to rY and~rZ.
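
The decoding of the execution register can be summarized by a small C sketch.
The enumeration \.{ResumeAction} and the functions \.{resume\_action} and
\.{resume\_instruction} are hypothetical names introduced for this
illustration; the argument \.{rx} stands for rX when the Z~field is~0 and for
rXX when it is~1.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    RESUME_SKIP,       /* register negative: continue at rW, insert nothing */
    RESUME_AGAIN,      /* ropcode 0: insert the instruction as it stands */
    RESUME_CONTINUE,   /* ropcode 1: insert it with rY and rZ as operands */
    RESUME_SET,        /* ropcode 2: set $X to rZ and raise exceptions */
    RESUME_TRANSLATE,  /* ropcode 3: like 0, rZ is the page table entry for rY */
    RESUME_ILLEGAL     /* anything else is not permitted */
} ResumeAction;

static ResumeAction resume_action(uint64_t rx, int z_field)
{
    unsigned ropcode;
    assert(z_field == 0 || z_field == 1);  /* larger Z values are reserved */
    if (rx >> 63) return RESUME_SKIP;
    ropcode = (unsigned)(rx >> 56);        /* leftmost byte */
    switch (ropcode) {
    case 0: return RESUME_AGAIN;
    case 1: return RESUME_CONTINUE;
    case 2: return RESUME_SET;
    case 3: return z_field == 1 ? RESUME_TRANSLATE : RESUME_ILLEGAL;
    default: return RESUME_ILLEGAL;
    }
}

/* The instruction to insert is the right half of the execution register. */
static uint32_t resume_instruction(uint64_t rx)
{
    return (uint32_t)rx;
}
```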
 
Special restrictions must hold if resumption is to work properly: Ropcodes
0~and~3 must not insert a \.{RESUME} instruction; ropcode~1 must insert
a ``normal'' instruction, namely one whose opcode begins with
one of the hexadecimal digits \Hex{0}, \Hex{1}, \Hex{2}, \Hex{3}, \Hex{6},
\Hex{7}, \Hex{C}, \Hex{D}, or~\Hex{E}. (See the opcode chart below.)
Some implementations may also allow ropcode~1 with \.{SYNCD[I]}
and \.{SYNCID[I]}, so that those instructions can conveniently be
interrupted.
Moreover, the destination register \$X used with ropcode 1 or~2 must
not be marginal. All of these restrictions hold automatically in normal
use; they are relevant only if the programmer tries to do something tricky.
 
Notice that the slightly tricky sequence
$$\hbox{\tt LDA \$0,Loc; PUT rW,\$0; LDTU \$1,Inst; PUT rX,\$1; RESUME}$$
will execute an almost arbitrary instruction \.{Inst} as if it had been in
location \.{Loc-4}, and then will jump to location \.{Loc} (assuming
that \.{Inst} doesn't branch elsewhere).
 
@* Special registers.
@^special registers@>
Quite a few special registers have been mentioned so far, and \MMIX\ actually
has even more. It is time now to enumerate them all, together with their
internal code numbers:
$$\vbox{\halign{\hfil#,\quad&#;\hfil\cr
rA&arithmetic status register [21]\cr
rB&bootstrap register (trip) [0]\cr
rC&cycle counter [8]\cr
rD&dividend register [1]\cr
rE&epsilon register [2]\cr
rF&failure location register [22]\cr
rG&global threshold register [19]\cr
rH&himult register [3]\cr
rI&interval counter [12]\cr
rJ&return-jump register [4]\cr
rK&interrupt mask register [15]\cr
rL&local threshold register [20]\cr
rM&multiplex mask register [5]\cr
rN&serial number [9]\cr
rO&register stack offset [10]\cr
rP&prediction register [23]\cr
rQ&interrupt request register [16]\cr
rR&remainder register [6]\cr
rS&register stack pointer [11]\cr
rT&trap address register [13]\cr
rU&usage counter [17]\cr
rV&virtual translation register [18]\cr
rW&where-interrupted register (trip) [24]\cr
rX&execution register (trip) [25]\cr
rY&Y operand (trip) [26]\cr
rZ&Z operand (trip) [27]\cr
rBB&bootstrap register (trap) [7]\cr
rTT&dynamic trap address register [14]\cr
rWW&where-interrupted register (trap) [28]\cr
rXX&execution register (trap) [29]\cr
rYY&Y operand (trap) [30]\cr
rZZ&Z operand (trap) [31]\cr}}$$
@^rG@>
@^rL@>
In this list rG and rL are what we have been calling simply $G$ and $L$; \
rC, rF, rI, rN, rO, rS, rU, and~rV have not been mentioned before.
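
For programs that need the code numbers, the list above can be rearranged
into an array indexed by code. The following C sketch does so; the names
\.{special\_name} and \.{special\_code} are hypothetical, but the table
entries are transcribed from the list.

```c
#include <assert.h>
#include <string.h>

/* Special register names indexed by their internal code numbers. */
static const char *const special_name[32] = {
    "rB",  "rD",  "rE",  "rH",  "rJ",  "rM",  "rR",  "rBB",
    "rC",  "rN",  "rO",  "rS",  "rI",  "rT",  "rTT", "rK",
    "rQ",  "rU",  "rV",  "rG",  "rL",  "rA",  "rF",  "rP",
    "rW",  "rX",  "rY",  "rZ",  "rWW", "rXX", "rYY", "rZZ"
};

/* Code number of a special register, or -1 if the name is unknown. */
static int special_code(const char *name)
{
    int k;
    assert(name != NULL);
    for (k = 0; k < 32; k++)
        if (strcmp(special_name[k], name) == 0) return k;
    return -1;
}
```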
 
@ The {\it cycle counter\/}~rC advances by~1 on every ``clock pulse'' of the
@^rC@>
\MMIX\ processor. Thus if \MMIX\ is running at 500 MHz, the cycle
counter increases every 2 nanoseconds. There is no need to worry about
rC overflowing; even if it were to increase once every nanosecond,
it wouldn't reach $2^{64}$ until more than 584.55 years have gone by.
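
The arithmetic behind this estimate is easy to reproduce. The C sketch below
assumes a year of 365.2425 days; the text does not say which year length was
used, so that constant is an assumption, and the function name
\.{years\_to\_overflow} is hypothetical.

```c
#include <assert.h>

/* Years until a 64-bit counter ticking hz times per second wraps around. */
static double years_to_overflow(double hz)
{
    const double two_to_64 = 18446744073709551616.0;     /* exactly 2^64 */
    const double seconds_per_year = 365.2425 * 86400.0;  /* assumed length */
    assert(hz > 0.0);
    return two_to_64 / hz / seconds_per_year;
}
```

With one tick per nanosecond the result lies between 584.55 and 584.56
years; at 500~MHz the counter lasts twice as long.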
 
The {\it interval counter\/}~rI is similar, but it {\it decreases\/}
@^rI@>
by~1 on each cycle, and causes an {\it interval interrupt\/}
when it reaches zero. Such interrupts can be extremely useful for
``continuous profiling'' as a means of studying
the empirical running time of programs;
see Jennifer~M. Anderson, Lance~M. Berc, Jeffrey Dean, Sanjay Ghemawat,
Monika~R. Henzinger, Shun-Tak~A. Leung, Richard~L. Sites, Mark~T. Vandevoorde,
Carl~A. Waldspurger, and William~E. Weihl, {\sl ACM Transactions on Computer
Systems\/ \bf15} (1997), 357--390.
The interval interrupt is achieved by setting the leftmost bit of the
``machine'' byte of~rQ equal to~1; this is the eighth-least-significant bit.
@^rQ@>
@^continuous profiling@>
@^performance monitoring@>
@^Anderson, Jennifer-Ann Monique@>
@^Berc, Lance Michael@>
@^Dean, Jeffrey Adgate@>
@^Ghemawat, Sanjay@>
@^Henzinger, Monika Hildegard Rauch@>
@^Leung, Shun-Tak Albert@>
@^Sites, Richard Lee@>
@^Vandevoorde, Mark Thierry@>
@^Waldspurger, Carl Alan@>
@^Weihl, William Edward@>
 
The {\it usage counter\/}~rU consists of three fields $(u_p,u_m,u_c)$,
@^rU@>
called the usage pattern~$u_p$, the usage mask~$u_m$,
and the usage count~$u_c$. The most significant byte of~rU is the usage
pattern; the next most significant byte is the usage mask; and
the remaining 48 bits are the usage count. Whenever an instruction whose
${\rm OP}\land u_m=u_p$ has been executed, the value of $u_c$ increases by~1
(mod~$2^{47}$).
Thus, for example, the OP-code chart below implies that
all instructions are counted if $u_p=u_m=0$;
all loads and stores are counted together with \.{GO} and \.{PUSHGO}
if $u_p=(10000000)_2$ and $u_m=(11000000)_2$;
all floating point instructions are counted together with fixed point
multiplications and divisions if $u_p=0$ and $u_m=(11100000)_2$;
fixed point multiplications and divisions alone are counted if
$u_p=(00011000)_2$ and $u_m=(11111000)_2$; completed subroutine calls
are counted if $u_p=\.{POP}$ and $u_m=(11111111)_2$.
Instructions in negative locations, which belong to the operating system,
are exceptional: They are included in the usage count only if the leading bit
of $u_c$ is~1.
@^negative locations@>
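
The matching rule is a single masked comparison, as the following C sketch
shows. The function names are hypothetical; only the field layout and the
test ${\rm OP}\land u_m=u_p$ come from the description above, and the
updating of the count is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of rU: pattern (8 bits), mask (8 bits), count (48 bits). */
static unsigned usage_pattern(uint64_t rU) { return (unsigned)(rU >> 56); }
static unsigned usage_mask(uint64_t rU)    { return (unsigned)(rU >> 48) & 0xff; }
static uint64_t usage_count(uint64_t rU)   { return rU & 0xffffffffffffULL; }

/* Nonzero if an instruction with opcode op satisfies OP & u_m = u_p. */
static int usage_matches(uint64_t rU, unsigned op)
{
    assert(op <= 0xff);
    return (op & usage_mask(rU)) == usage_pattern(rU);
}
```

For example, with $u_p=(10000000)_2$ and $u_m=(11000000)_2$ the test
succeeds exactly for the opcodes from \Hex{80} through \Hex{BF}.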
 
Incidentally, the 64-bit counters rC and rI can be implemented rather cheaply with
only two levels of logic, using an old trick called ``carry-save addition''
[see, for example, G.~Metze and J.~E. Robertson, {\sl Proc.\ International
Conf.\ Information Processing\/} (Paris:\ 1959), 389--396]. One nice
embodiment of this idea is to
@^Metze, Gernot@>
@^Robertson, James Evans@>
@^carry-save addition@>
represent a binary number~$x$ in a redundant form as the difference $x'-x''$
of two binary numbers. Any two such numbers can be added without carry
propagation as follows: Let
$$f(x,y,z)=
(x\land\bar y)\lor(x\land z)\lor(\bar y\land z), \qquad
% ((x\oplus y)\land(x\oplus z))\oplus z, \qquad
g(x,y,z)=x\oplus y\oplus z.$$
Then it is easy to check that $x-y+z=2f(x,y,z)-g(x,y,z)$; we need only verify
this in the eight cases when $x$, $y$, and~$z$ are 0 or~1.
Thus we can subtract~1 from a counter $x'-x''$ by setting
$$(x',x'')\gets(f(x',x'',-1)\LL1,\;g(x',x'',-1));$$
we can add~1 by setting
$(x',x'')\gets(g(x'',x',-1),f(x'',x',-1)\LL1)$.
The result is zero if and only if
$x'=x''$. We need not actually compute the difference $x'-x''$ until
we need to examine the register. The computation
of $f(x,y,z)$ and $g(x,y,z)$ is particularly simple in the special
cases $z=0$ and $z=-1$. A similar trick works for~rU,
but extra care is needed in that case
because several instructions might finish at the same time.
(Thanks to Frank Yellin for his improvements to this paragraph.)
@^Yellin, Frank Nathan@>
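
The update rules can be checked by a short C program fragment that works on
whole 64-bit words at once, with all arithmetic taken mod~$2^{64}$. This is a
sketch of the idea only, not a hardware description; the type
\.{Redundant} and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t f(uint64_t x, uint64_t y, uint64_t z)
{
    return (x & ~y) | (x & z) | (~y & z);
}

static uint64_t g(uint64_t x, uint64_t y, uint64_t z)
{
    return x ^ y ^ z;
}

/* A counter held as the difference pos - neg (mod 2^64). */
typedef struct { uint64_t pos, neg; } Redundant;

static uint64_t value(Redundant r) { return r.pos - r.neg; }

static Redundant decrement(Redundant r)
{
    Redundant n;
    n.pos = f(r.pos, r.neg, ~(uint64_t)0) << 1;
    n.neg = g(r.pos, r.neg, ~(uint64_t)0);
    assert(value(n) == value(r) - 1);   /* no carry chain was needed */
    return n;
}

static Redundant increment(Redundant r)
{
    Redundant n;
    n.pos = g(r.neg, r.pos, ~(uint64_t)0);
    n.neg = f(r.neg, r.pos, ~(uint64_t)0) << 1;
    assert(value(n) == value(r) + 1);
    return n;
}

/* The identity x - y + z = 2f(x,y,z) - g(x,y,z), applied bitwise. */
static int identity_holds(uint64_t x, uint64_t y, uint64_t z)
{
    return x - y + z == 2 * f(x, y, z) - g(x, y, z);
}
```

Each new word depends only on the bits of the two old words in the same
position, apart from the one-place shift, which is why two levels of logic
suffice.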
 
@ The special {\it serial number register\/}~rN is permanently set to
@^rN@>
the time this particular instance of\/ \MMIX\ was created (measured as the
number of seconds since 00:00:00 Greenwich Mean Time on 1~January 1970),
in its five least significant bytes. The three most significant bytes
are permanently set to the {\it version number\/} of the \MMIX\ architecture
that is being implemented together with
two additional bytes that modify the version
number. This quantity serves as an essentially unique identification
number for each copy of\/ \MMIX.
@^version number@>
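
A C sketch of this layout follows. The function names are hypothetical, and
the order of the three leading bytes (version number first, then the two
modifying bytes) is an assumption based on the wording above.

```c
#include <assert.h>
#include <stdint.h>

/* Build a serial number from three version bytes and a creation time. */
static uint64_t make_serial(unsigned version, unsigned sub, unsigned subsub,
                            uint64_t seconds)
{
    assert(version <= 0xff && sub <= 0xff && subsub <= 0xff);
    assert(seconds < ((uint64_t)1 << 40));   /* must fit in five bytes */
    return (uint64_t)version << 56 | (uint64_t)sub << 48
         | (uint64_t)subsub << 40 | seconds;
}

/* Creation time: the five least significant bytes. */
static uint64_t serial_time(uint64_t rN)
{
    return rN & 0xffffffffffULL;
}

/* Version number: the most significant byte (assumed position). */
static unsigned serial_version(uint64_t rN)
{
    return (unsigned)(rN >> 56);
}
```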
 
Version 1.0.0 of the architecture is described in the present document.
Version~1.0.1 is similar, but simplified to avoid the
complications of pipelines and operating systems.
Other versions may become necessary in the future.
 
@ The {\it register stack offset\/}~rO and {\it register stack
pointer\/}~rS are especially interesting, because they are used to implement
@^register stack@>
@^rO@>
@^rS@>
\MMIX's register stack~$S[0]$, $S[1]$, $S[2]$,~\dots.
 
The operating system
initializes a register stack by assigning a large area of virtual memory to
each running process, beginning at an address like
\Hex{6000000000000000}.
If this starting address is~$\sigma$, stack entry $S[k]$ will go into
the octabyte $\mm_8[\sigma+8k]$. Stack underflow will be detected because
the process does not have permission to read from $\mm[\sigma-1]$.
Stack overflow will be detected because something will give out---either
the user's budget or the user's patience or the user's swap space---long before
$2^{61}$~bytes of virtual memory are filled by a register stack.
@^terabytes@>
 
The \MMIX\ hardware maintains the register stack by having two banks
of 64-bit general-purpose registers, one for globals and one for locals.
The global registers $\rm g[32]$, $\rm g[33]$, \dots, $\rm g[255]$ are used for
register numbers that are $\ge\gg$ in \MMIX\ commands;
recall that $G$~is always 32 or more. The local
registers come from another array that contains $2^n$ registers for
some~$n$ where $8\le n\le10$; for simplicity of exposition
we will assume that there are exactly 512 local
registers, but there may be only 256 or there may be 1024.
 
\def\l{{\rm l}}
@^ring of local registers@>
The local register slots l[0], l[1], \dots, l[511] act as a cyclic buffer with
addresses that wrap around mod~512, so that $\l[512]=\l[0]$,
$\l[513]=\l[1]$, etc. This buffer is divided into three parts by three
pointers, which we will call $\alpha$, $\beta$, and $\gamma$.
$$\epsfbox{mmix.1}$$
Registers $\l[\alpha]$, $\l[\alpha+1]$, \dots,~$\l[\beta-1]$ are
what program instructions currently call \$0, \$1, \dots,~$\$(\ll-1)$;
registers $\l[\beta]$, $\l[\beta+1]$, \dots,~$\l[\gamma-1]$ are currently
unused; and registers $\l[\gamma]$, $\l[\gamma+1]$, \dots,~$\l[\alpha-1]$
contain items of the register stack that have been pushed down but not yet
stored in memory. Special register~rS holds the virtual memory address where
$\l[\gamma]$ will be stored, if necessary. Special register~rO holds the
address where $\l[\alpha]$ will be stored; this always equals $8\tau$ plus
the address of~$S[0]$. We can deduce the values of $\alpha$, $\beta$,
and~$\gamma$ from the contents of rL, rO, and~rS, because
$$\rm\alpha=(rO/8)\bmod512,\qquad \beta=(\alpha+rL)\bmod512,\qquad
\hbox{and}\qquad \gamma=(rS/8)\bmod512.$$
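
These three formulas translate directly into~C. The sketch below assumes 512
local registers, as in the exposition; the type \.{RingPointers} and the
function \.{ring\_pointers} are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum { RING_SIZE = 512 };   /* expository assumption; 256 and 1024 are also possible */

typedef struct { unsigned alpha, beta, gamma; } RingPointers;

/* Recover the ring pointers from the special registers rO, rS, and rL. */
static RingPointers ring_pointers(uint64_t rO, uint64_t rS, uint64_t rL)
{
    RingPointers p;
    assert(rO % 8 == 0 && rS % 8 == 0);   /* both are octabyte addresses */
    p.alpha = (unsigned)((rO / 8) % RING_SIZE);
    p.beta  = (unsigned)((p.alpha + rL) % RING_SIZE);
    p.gamma = (unsigned)((rS / 8) % RING_SIZE);
    return p;
}
```

For instance, if the stack begins at \Hex{6000000000000000} and 520
octabytes lie below the current \$0, then $\alpha=8$, because the address of
$S[0]$ divided by~8 is a multiple of~512.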
 
To maintain this situation we need to make sure that the pointers $\alpha$,
$\beta$, and $\gamma$ never move past each other. A~\.{PUSHJ} or
\.{PUSHGO} operation simply
advances $\alpha$ toward~$\beta$, so it is very simple. The first part of a
\.{POP} operation, which moves $\beta$ toward~$\alpha$, is also very simple.
But the next part of a~\.{POP} requires $\alpha$ to move downward, and
memory accesses might be required. \MMIX\ will decrease rS by~8 (thereby
decreasing $\gamma$ by~1) and set $\l[\gamma]\gets\mm_8[{\rm rS}]$,
one or more times if necessary, to keep $\alpha$ from decreasing
past~$\gamma$. Similarly, the operation of increasing~$L$ may cause \MMIX\ to
set $\mm_8[{\rm rS}]\gets\l[\gamma]$ and increase rS by~8 (thereby increasing
$\gamma$ by~1) one or more times, to keep $\beta$ from increasing
past~$\gamma$. (Actually $\beta$ is never allowed to increase to the point
where it becomes {\it equal\/} to $\gamma$.)
If many registers need to be loaded or stored at once,
these operations are interruptible.
 
[A somewhat similar scheme was introduced by David R. Ditzel and H.~R.
McLellan in {\sl SIGPLAN Notices\/ \bf17},\thinspace4 (April 1982), 48--56,
and incorporated in the so-called {\mc CRISP} architecture developed at
AT{\AM}T Bell Labs. An even more similar scheme was adopted in the late 1980s
@^AT{\AM}T Bell Laboratories@>
@^Advanced Micro Devices@>
by Advanced Micro Devices, in the processors of their Am29000 series---a
family of computers whose instructions have essentially the
format `OP~X~Y~Z' used by~\MMIX.]
@^Ditzel, David Roger@>
@^McLellan, Hubert Rae, Jr.@>
 
Limited versions of\/ \MMIX, having fewer registers, can also be envisioned. For
example, we might have only 32 local registers $\l[0]$, $\l[1]$,
\dots,~$\l[31]$ and only 32 global registers $\rm g[224]$, $\rm g[225]$,
\dots,~$\rm g[255]$. Such a machine could run any \MMIX\ program that
maintains the inequalities $\ll<32$ and $\gg\ge224$.
 
@ Access to \MMIX's special registers is obtained via the \.{GET} and
\.{PUT} commands.
@^special registers@>
@^rL@>
@^rQ@>
 
\bull\<GET \$X,Z `get from special register'; the Y field must be zero.\>
@.GET@>
Register X is set to the contents of the special register identified by
its code number~Z, using the code numbers listed earlier.
An illegal instruction interrupt occurs if $\zz\ge32$.
 
Every special register is readable; \MMIX\ does not keep secrets from
an inquisitive user. But of course only the operating system is allowed
@^operating system@>
to change registers like rK and~rQ (the interrupt mask and request
registers). And not even the operating system is allowed to change~rC
(the cycle counter) or rN~(the serial number) or the stack pointers
rO~and~rS.
 
\bull\<PUT X,\0 `put into special register';
@.PUT@>
the Y field must be zero.\>
The special register identified by~X is set to
the contents of register Z or to the unsigned byte~Z itself,
if permissible. Some changes are, however, impermissible:
Bits of rA that are always zero must remain zero; the leading seven bytes
of rG and rL must remain zero, and rL must not exceed~rG;
special registers 8--11 (namely rC, rN, rO, and~rS) must not change;
special registers 12--18 (namely
rI, rK, rQ, rT, rU, rV, and~rTT) can be changed only if the privilege
bit of rK is zero;
and certain bits of~rQ (depending on available hardware) might not
allow software to change them from 0 to~1. Moreover, any bits of~rQ that have
changed from 0 to~1 since the most recent \<GET x,rQ
will remain~1 after \.{PUT}~\.{rQ,z}.
The \.{PUT} command will not increase~rL; it sets rL to the minimum
of the current value and the new value. (A~program should say
\<SETL \$99,0 instead of \<PUT rL,100 when rL is known to be less than~100.)
 
Impermissible \.{PUT} commands cause an illegal instruction interrupt,
or (in the case of rI, rK, rQ, rT, rU, rV, and~rTT) a privileged
operation interrupt.
@^illegal instructions@>
@^privileged operations@>
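
The part of these rules that depends only on the code number can be sketched
in~C as follows. The names are hypothetical. The flag
\.{may\_use\_privileged} stands for the privilege condition on~rK, the
rejection of codes 32 and above is an assumption made by analogy with
\.{GET}, and the value-dependent checks for rA, rG, rL, and~rQ are not
modeled.

```c
#include <assert.h>

typedef enum { PUT_ALLOWED, PUT_ILLEGAL, PUT_PRIVILEGED } PutVerdict;

/* Classify a PUT by the code number in its X field. */
static PutVerdict classify_put(int code, int may_use_privileged)
{
    assert(code >= 0 && code < 256);      /* the X field is one byte */
    if (code >= 32) return PUT_ILLEGAL;   /* assumed, by analogy with GET */
    if (code >= 8 && code <= 11)          /* rC, rN, rO, rS never change */
        return PUT_ILLEGAL;
    if (code >= 12 && code <= 18)         /* rI, rT, rTT, rK, rQ, rU, rV */
        return may_use_privileged ? PUT_ALLOWED : PUT_PRIVILEGED;
    return PUT_ALLOWED;
}

/* PUT never increases rL: the result is the smaller of the two values. */
static unsigned put_rL(unsigned current, unsigned requested)
{
    return requested < current ? requested : current;
}
```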
 
\bull\<SAVE \$X,0 `save process state';
@.SAVE@>
@^register stack@>
@^ring of local registers@>
@^rO@>
@^rS@>
\<UNSAVE 0,\$Z `restore process state'; the Y~field must be~0, and
so must the Z~field of~\.{SAVE} and the X~field of \.{UNSAVE}.\>
@.UNSAVE@>
The \.{SAVE} instruction stores all registers and special registers
that might affect the computation of the currently running process.
First the current local registers \$0, \$1, \dots,~$\$(\ll-1)$ are
pushed down as in \.{PUSHGO}~\.{\$255}, and $L$~is set to zero.
Then the current global
registers $\$\gg$, $\$(\gg+1)$, \dots,~\$255 are placed above them
in the register stack; finally
rB, rD, rE, rH, rJ, rM, rR, rP, rW, rX, rY, and~rZ
are placed at the very top, followed by registers rG and~rA packed
into eight bytes:
$$\beginword
&\field88&&\field{24}{24}&&\field{32}{32}\cr
\noalign{\hrule}
\\&rG&\\&0&\\&rA&\\\cr
\noalign{\hrule}\endword$$
The address of the topmost octabyte is then placed in register~X, which
must be a global register. (This instruction is interruptible. If an
interrupt occurs while the registers are being saved, we will have
$\alpha=\beta=\gamma$ in the ring of local registers;
thus rO will equal~rS and rL will be zero. The interrupt handler
essentially has a new register stack, starting on top of the partially
saved context.) Immediately after a \.{SAVE} the values of rO and~rS
are equal to the location of the first byte following the stack
just saved. The current register stack is effectively empty at this
point; thus one shouldn't do a \.{POP} until this context
or some other context has been unsaved.
@^rO@>
@^rS@>
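
The topmost octabyte of a saved context packs rG and rA as shown in the
diagram above: eight bits of~rG, 24~zero bits, and 32~bits of~rA. A C sketch
of the packing and unpacking follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack rG (8 bits) and rA (32 bits) with 24 zero bits between them. */
static uint64_t pack_rG_rA(uint64_t rG, uint64_t rA)
{
    assert(rG <= 0xff);            /* leading seven bytes of rG are zero */
    assert(rA <= 0xffffffffULL);   /* the field for rA is 32 bits wide */
    return rG << 56 | rA;
}

static unsigned unpack_rG(uint64_t w) { return (unsigned)(w >> 56); }
static uint32_t unpack_rA(uint64_t w) { return (uint32_t)w; }
```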
 
The \.{UNSAVE} instruction goes the other way, restoring all the
registers when given an address in register~Z that was returned
by a previous \.{SAVE}. Immediately after an \.{UNSAVE} the values of
rO and~rS will be equal. Like \.{SAVE}, this instruction is interruptible.
 
The operating system uses \.{SAVE} and \.{UNSAVE}
to switch context between different processes.
It can also use \.{UNSAVE} to
establish suitable initial values of rO and~rS\null.
But a user program that knows what it is doing can in fact allocate its own
register stack or stacks and do its own process switching.
 
Caution: \.{UNSAVE} is destructive, in the sense that a program can't reliably
\.{UNSAVE} twice from the same saved context. Once an
\.{UNSAVE} has been done,
further operations are likely to change the memory
record of what was saved. Moreover, an interrupt during the middle
of an \.{UNSAVE} may have already clobbered some of the data in memory before
the \.{UNSAVE} has completely finished, although the data will appear
properly in all registers.
 
@* Virtual and physical addresses.
Virtual 64-bit addresses are converted to physical addresses in a manner
@^virtual addresses@>
@^physical addresses@>
governed by the special {\it virtual translation register\/}~rV. Thus
@^rV@>
$\rm M[A]$ really refers to $\rm m[\phi(A)]$, where m~is the physical
memory array and $\phi(A)$
is determined by the physical mapping function~$\phi$. The details of
this conversion are rather technical and of interest mainly to the operating
system, but two simple rules are important to ordinary users:
@^operating system@>
 
\bull Negative addresses are mapped directly to physical addresses, by simply
@^negative locations@>
suppressing the sign bit:
$$\phi(A)=A+2^{63}=A\land\Hex{7fffffffffffffff},\qquad
\hbox{if $A<0$.}$$
{\it All accesses to negative addresses are privileged}, for use by the
operating system only.
@^privileged operations@>
(Thus, for example, the trap addresses in~rT and~rTT should be negative,
because they are addresses inside the operating system.) Moreover, all physical
addresses $\ge2^{48}$ are intended for use by memory-mapped I/O devices;
values read from or written to such locations are never placed in a cache.
@^I/O@>
@^input/output@>
@^memory-mapped input/output@>
 
\bull Nonnegative addresses belong to four {\it segments}, depending on
@^segments@>
whether the three leading bits are 000, 001, 010, or 011. These $2^{61}$-byte
segments are traditionally used for a program's text, data, dynamic
memory, and register stack, respectively, but such conventions are
not mandatory. There are four mappings $\phi_0$, $\phi_1$, $\phi_2$,
and~$\phi_3$ of 61-bit addresses into 48-bit physical memory space, one for
each segment:
$$\phi(A)=\phi_{\lfloor A/2^{61}\rfloor}(A\bmod2^{61}),\qquad
\hbox{if $0\le A<2^{63}$.}$$
In general, the machine is able to access smaller addresses of a segment more
efficiently than larger addresses. Thus a programmer should let each segment
grow upward from zero, trying to keep any of the 61-bit addresses from
becoming larger than necessary, although arbitrary addresses are legal.
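
As a rough illustration of these two rules, here is a C sketch that decides
whether a 64-bit virtual address is privileged and, if not, which segment and
61-bit offset it denotes. The names \.{addr\_class}, \.{classify\_address},
and \.{is\_io\_address} are hypothetical helpers made up for this example;
they do not come from any \MMIX\ software.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where a 64-bit virtual address lands. */
typedef struct {
    int privileged;    /* 1 if the address is negative (sign bit set) */
    unsigned segment;  /* 0..3 for nonnegative addresses; 0 otherwise */
    uint64_t offset;   /* physical address if privileged, else A mod 2^61 */
} addr_class;

addr_class classify_address(uint64_t a) {
    addr_class c;
    c.privileged = (int)(a >> 63);
    if (c.privileged) {
        c.segment = 0;
        c.offset = a & UINT64_C(0x7fffffffffffffff);  /* phi(A) = A + 2^63 */
    } else {
        c.segment = (unsigned)(a >> 61);               /* floor(A / 2^61) */
        c.offset = a & ((UINT64_C(1) << 61) - 1);      /* A mod 2^61 */
    }
    return c;
}

/* Physical addresses >= 2^48 are intended for memory-mapped I/O. */
int is_io_address(uint64_t phys) {
    assert((phys >> 63) == 0);   /* the sign bit has already been suppressed */
    return phys >= (UINT64_C(1) << 48);
}
```

For instance, \.{classify\_address} applied to \Hex{4000000000000010} reports
segment~2 with offset \Hex{10}, because the three leading bits are 010.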
 
@ Now it's time for the technical details of virtual address translation.
@^segments@>
@^virtual addresses@>
@^physical addresses@>
@^rV@>
The mappings $\phi_0$, $\phi_1$, $\phi_2$, and~$\phi_3$ are defined
by the following rules.
\smallskip
 
(1) The first two bytes of rV are four nybbles called $b_1$, $b_2$, $b_3$,
$b_4$; we also define $b_0=0$. Segment~$i$ has at most $1024^{\,b_{i+1}-b_i}$
pages. In particular, segment~$i$ must have at most one page when
$b_i=b_{i+1}$, and it must be entirely empty if $b_i>b_{i+1}$.
 
(2) The next byte of rV, $s$, specifies the current {\it page size},
which is $2^s$ bytes. We must have $s\ge13$ (hence at least 8192~bytes
per page). Values of~$s$ larger than, say, 20 or~so are of use only in rather
large programs that will reside in main memory for long periods of time,
because memory protection and swapping are applied to entire pages.
The maximum legal value of~$s$ is~48.
 
(3) The remaining five bytes of rV are a 27-bit {\it root location\/}~$r$,
a 10-bit {\it address space number\/}~$n$, and a 3-bit {\it function
field\/}~$f$:
$$\centerline{$\hbox{rV}=\beginword
&\field44&&\field44&&\field44&&\field44&&\field88&&
\field{27}{27}&&\field{10}{10}&&\field33\cr
\noalign{\hrule}
\\&$b_1$&\\&$b_2$&\\&$b_3$&\\&$b_4$&\\&$s$&\\&$r$&\\&$n$&\\&$f$&\\\cr
\noalign{\hrule}\endword$}$$
Normally $f=0$; if $f=1$, virtual address translation will be done by
software instead of hardware, and the $b_1$, $b_2$, $b_3$, $b_4$,
and~$r$ fields of~rV will be ignored by the hardware.
(Values of $f>1$ are reserved for possible future use; if $f>1$
when \MMIX\ tries to translate an address, a memory-protection
failure will occur.)
@^illegal instructions@>
 
(4) Each page has an 8-byte {\it page table entry\/} (PTE), which looks
@^page table entry@>
@^PTE@>
like this:
$$\centerline{$\hbox{PTE}=\beginword
&\field{16}{16}&&\field{32}{48-s}&&\field3{s-13}&&\field{10}{10}&&
\field33\cr
\noalign{\hrule}
\\&$x$&\\&$a$&\\&$y$&\\&$n$&\\&$p$&\\\cr
\noalign{\hrule}\endword$}$$
Here $x$ and $y$ are ignored (thus they are usable for any purpose by the
operating
system); $2^s a$~is the physical address of byte~0 on the page; and $n$~is
the address space number (which must match the number in~rV). The final three
bits are the {\it protection bits\/} $p_r\,p_w\,p_x$; the user needs
$p_r=1$ to load from this page, $p_w=1$ to store on this page, and
$p_x=1$ to execute instructions on this page. If $n$~fails to
match the number in~rV, or if the appropriate protection bit is zero,
a memory-protection fault occurs.
@^protection fault@>
 
Page table entries should be writable only by the operating system.
The 16 ignored bits of~$x$ imply that physical memory size is limited
to $2^{48}$ bytes (namely 256 large terabytes); that should be enough capacity
for a while, if not for the entire new millennium.
@^terabytes@>
 
(5) A given 61-bit address $A$ belongs to page $\lfloor A/2^s\rfloor$ of
its segment, and
$$\phi_i(A)=2^s\,a+(A\bmod2^s)$$
if $a$ is the address in the PTE for page $\lfloor A/2^s\rfloor$ of
segment~$i$.
 
(6) Suppose $\lfloor A/2^s\rfloor=(a_4a_3a_2a_1a_0)_{1024}$ in the
radix-1024 number system. In the common case $a_4=a_3=a_2=a_1=0$, the
PTE is simply the octabyte ${\rm m}_8[2^{13}(r+b_i)+8a_0]$; this rule
defines the mapping for the first 1024 pages. The next million or~so pages are
accessed through an auxiliary {\it page table pointer}
@^page table pointer@>
@^PTP@>
$$\centerline{$\hbox{PTP}=\beginword
&\field11&&\field{50}{50}&&\field{10}{10}&&\field33\cr
\noalign{\hrule}
\\&1&\\&$c$&\\&$n$&\\&$q$&\\\cr
\noalign{\hrule}\endword$}$$
in ${\rm m}_8[2^{13}(r+b_i+1)+8a_1]$; here the sign must be~1 and the
$n$-field must match~rV, but the $q$~bits are ignored. The desired PTE for
page $(a_1a_0)_{1024}$ is then in ${\rm m}_8[2^{13}c+8a_0]$. The next billion
or so pages, namely the pages $(a_2a_1a_0)_{1024}$ with $a_2\ne0$,
are accessed similarly, through an auxiliary PTP at level~two; and
so on.
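
The field layout of rV in rule~(3), together with the common case of rule~(6),
can be sketched in C as follows. This is only an illustration under the
layout stated above; the type \.{rv\_fields} and the functions
\.{unpack\_rV} and \.{first\_level\_pte\_address} are hypothetical names
introduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned b[5];   /* b[0]=0; b[1..4] are the four leading nybbles */
    unsigned s;      /* the page size is 2^s bytes */
    uint64_t r;      /* 27-bit root location */
    unsigned n;      /* 10-bit address space number */
    unsigned f;      /* 3-bit function field */
} rv_fields;

rv_fields unpack_rV(uint64_t rV) {
    rv_fields v;
    v.b[0] = 0;
    for (int i = 1; i <= 4; i++)
        v.b[i] = (unsigned)(rV >> (64 - 4 * i)) & 0xf;
    v.s = (unsigned)(rV >> 40) & 0xff;
    v.r = (rV >> 13) & 0x7ffffff;
    v.n = (unsigned)(rV >> 3) & 0x3ff;
    v.f = (unsigned)rV & 7;
    return v;
}

/* Physical location 2^13(r+b_i)+8a_0 of the PTE for page a0 < 1024
   of segment i (the common case of rule 6). */
uint64_t first_level_pte_address(rv_fields v, unsigned i, unsigned a0) {
    assert(i < 4 && a0 < 1024);
    return ((v.r + v.b[i]) << 13) + 8 * (uint64_t)a0;
}
```

With the made-up value $\rm rV=\Hex{12340D0000020029}$ the sketch yields
$b_1b_2b_3b_4=1234$, $s=13$, $r=\Hex{10}$, $n=5$, and $f=1$; the PTE for
page~3 of segment~1 would then be sought at $2^{13}(\Hex{10}+1)+24=\Hex{22018}$.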
 
Notice that if $b_3=b_4$, there is just one page in segment~3, and its PTE
appears all alone in physical location $2^{13}(r+b_3)$.
Otherwise the PTEs appear in 1024-octabyte blocks. We usually
have $0<b_1<b_2<b_3<b_4$, but the null case $b_1=b_2=b_3=b_4=0$ is
worthy of mention: In this special case there is only one page, and the
segment bits of a virtual address are ignored; the other $61-s$ bits of each
virtual address must be zero.
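
Rules (4) and~(5) can be condensed into a few lines of C. The following
sketch assumes that the PTE has already been located; it checks the address
space number and one protection bit and then forms $2^s a+(A\bmod2^s)$.
The names \.{pte\_translate}, \.{NO\_TRANSLATION}, and the \.{P\_...}
constants are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NO_TRANSLATION UINT64_MAX  /* hypothetical marker for a protection fault */

enum { P_READ = 4, P_WRITE = 2, P_EXECUTE = 1 };  /* bits p_r, p_w, p_x */

/* Returns phi_i(A) for the 61-bit offset A, or NO_TRANSLATION if n fails
   to match or the needed protection bit is zero. */
uint64_t pte_translate(uint64_t pte, unsigned s, unsigned n,
                       uint64_t offset, unsigned needed)
{
    assert(s >= 13 && s <= 48);
    if (((pte >> 3) & 0x3ff) != n) return NO_TRANSLATION;
    if ((pte & needed) != needed) return NO_TRANSLATION;
    uint64_t in_page = (UINT64_C(1) << s) - 1;
    /* Drop the ignored x field (top 16 bits) and everything below bit s. */
    uint64_t frame = pte & ((UINT64_C(1) << 48) - 1) & ~in_page;  /* 2^s * a */
    return frame + (offset & in_page);
}
```

For example, with $s=13$ and $n=5$ the made-up PTE \Hex{EE02F} has
$a=\Hex{77}$ and all three protection bits set, so offset \Hex{2345} is
mapped to $\Hex{EE000}+\Hex{345}=\Hex{EE345}$.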
 
If $s=13$, $b_1=3$, $b_2=2$, $b_3=1$, and $b_4=0$, there are at most
$2^{30}$ pages of 8192 bytes each, all belonging to segment~0. This is
essentially the virtual memory setup in the Alpha~21064 computers with
{\mc DIGITAL~UNIX}$^{\rm\,TM}$.
@^Alpha computers@>
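
The capacity bound of rule~(1) is easy to tabulate. The C sketch below
returns the base-2 logarithm of $1024^{\,b_{i+1}-b_i}$, or $-1$ when the
segment must be empty; the function name \.{segment\_log2\_max\_pages} is a
hypothetical one used only here. (The page number itself has only $61-s$
bits, which this sketch does not take into account.)

```c
#include <assert.h>
#include <stdint.h>

/* log2 of the maximum number of pages in segment i according to rule (1),
   or -1 if b_i > b_{i+1}, in which case the segment must be empty. */
int segment_log2_max_pages(uint64_t rV, unsigned i) {
    assert(i < 4);
    int lo = i ? (int)((rV >> (64 - 4 * i)) & 0xf) : 0;   /* b_i, with b_0 = 0 */
    int hi = (int)((rV >> (60 - 4 * i)) & 0xf);           /* b_{i+1} */
    return hi >= lo ? 10 * (hi - lo) : -1;
}
```

For the setting $s=13$, $b_1=3$, $b_2=2$, $b_3=1$, $b_4=0$ mentioned above,
that is, for an rV beginning with \Hex{32100D}, the sketch gives 30 for
segment~0 and $-1$ for segments 1, 2, and~3.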
 
I know these rules look extremely complicated, and I sincerely wish I could
have found an alternative that would be both simple and efficient in practice.
I tried various schemes based on hashing, but came to the conclusion that
``trie'' methods such as those described here are better for this application.
Indeed, the page tables in most contemporary computers are based on very
similar ideas, but with significantly smaller virtual addresses and without
the shortcut for small page numbers. I tried also to find formats for rV
and the page tables that would match byte boundaries in a more friendly way,
but the corresponding page sizes did not work well. Fortunately these grungy
details are almost always completely hidden from ordinary users.
 
@ Of course \MMIX\ can't afford to perform a lengthy calculation of physical
addresses every time it accesses memory. The machine therefore maintains a
{\it translation cache\/} (TC),
@^translation caches@>
@^TC@>
which contains the translations of recently
accessed pages. (In fact, there usually are two such caches,
one for instructions
and one for data.) A~TC holds a set of 64-bit translation keys
$$\beginword
&\field{1.2}1&&\field22&&\field{44.8}{61-s}&&\field3{s-13}&&\field{10}{10}&&
\field33\cr
\noalign{\hrule}
\\&0&\\&$i$&\\&$v$&\\&0&\\&$n$&\\&0&\\\cr
\noalign{\hrule}\endword$$
associated with 38-bit translations
$$\beginword
&\field{32}{48-s}&&\field3{s-13}&&\field33\cr
\noalign{\hrule}
\\&$a$&\\&0&\\&$p$&\\\cr
\noalign{\hrule}\endword$$
representing the relevant parts of the PTE for page $v$ of segment $i$.
Different processes typically have different values of~$n$, and possibly also
different values of~$s$. The operating system needs a way to keep such caches
up to date when pages are being allocated, moved, swapped, or recycled.
The operating system also likes to know which pages have been recently
used. The \.{LDVTS} instructions facilitate such operations:
@^protection bits@>
@^permission bits@>
 
\bull\<LDVTS \$X,\$Y,\0 `load virtual translation status'.\>
@.LDVTS@>
The sum $\rY+\rZ$ or $\rY+\zz$ should have the form of
a translation cache key as above,
except that the rightmost three bits need not be zero.
If this key is present in a TC,
the rightmost three bits replace the current protection code~$p$;
however, if $p$ is thereby set to zero, the key is removed from
the TC. Register~X is set to 0 if the key was not present
in any translation cache, or to 1 if the key was present in the TC
for instructions, or to 2 if the key was present in the TC for data,
or to~3 if the key was present in both. This instruction is for the
operating system only.
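
The key and translation formats shown above can be assembled with a few
shifts. The following C sketch is an illustration only; \.{tc\_key} and
\.{tc\_translation} are hypothetical names, and the sketch says nothing
about how a real translation cache is organized.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit key: sign 0, segment i, page v, s-13 zeros, n, three zeros. */
uint64_t tc_key(uint64_t virt, unsigned s, unsigned n) {
    assert((virt >> 63) == 0 && s >= 13 && s <= 48 && n < 1024);
    uint64_t page_bits = virt & ~((UINT64_C(1) << s) - 1);  /* fields i and v */
    return page_bits | ((uint64_t)n << 3);
}

/* 38-bit translation: a (48-s bits), s-13 zeros, protection bits p. */
uint64_t tc_translation(uint64_t pte, unsigned s) {
    assert(s >= 13 && s <= 48);
    uint64_t a = (pte & ((UINT64_C(1) << 48) - 1)) >> s;
    return (a << (s - 10)) | (pte & 7);
}
```

An operand for \.{LDVTS} would then be such a key with the desired new
protection code placed in its rightmost three bits.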
 
@ We mentioned earlier that
cheap versions of\/ \MMIX\ might calculate the physical addresses with
@^emulation@>
@^rV@>
software instead of hardware, using forced traps when the operating
system needs to do page table calculations.
@^operating system@>
Here is some code that could be used for
such purposes; it defines the translation process precisely, given a
nonnegative virtual
address in register~rYY\null. First we must unpack the fields of~rV and
@^virtual addresses@>
@^physical addresses@>
@^rV@>
@^PTE@>
@^PTP@>
@^segments@>
compute the relevant base addresses for PTEs and PTPs:
$$\vbox{\halign{&\tt#\hfil\ \cr
&GET &virt,rYY\cr
&GET &\$7,rV &\% \$7=(virtual translation register)\cr
&SRU &\$1,virt,61 &\% \$1=i (segment number of virtual address)\cr
&SLU &\$1,\$1,2 \cr
&NEG &\$1,52,\$1 &\% \$1=52-4i\cr
&SRU &\$1,\$7,\$1 \cr
&SLU &\$2,\$1,4 \cr
&SETL &\$0,\#f000 \cr
&AND &\$1,\$1,\$0 &\% \$1=b[i]<<12\cr
&AND &\$2,\$2,\$0 &\% \$2=b[i+1]<<12\cr
&SLU &\$3,\$7,24 \cr
&SRU &\$3,\$3,37 \cr
&SLU &\$3,\$3,13 &\% \$3=(r field of rV)\cr
&ORH &\$3,\#8000 &\% make \$3 a physical address\cr
&2ADDU &base,\$1,\$3 &\% base=address of first page table\cr
&2ADDU &limit,\$2,\$3 &\% limit=address after last page table\cr
&SRU &s,\$7,40 \cr
&AND &s,s,\#ff &\% s=(s field of rV)\cr
&CMP &\$0,s,13 \cr
&BN &\$0,Fail &\% s must be 13 or more\cr
&CMP &\$0,s,49 \cr
&BNN &\$0,Fail &\% s must be 48 or less\cr
&SETH &mask,\#8000 \cr
&ORL &mask,\#1ff8&\% mask=(sign bit and n field)\cr
&ORH &\$7,\#8000 &\% set sign bit for PTP validation below\cr
&ANDNH &virt,\#e000 &\% zero out the segment number\cr
&SRU &\$0,virt,s &\% \$0=a4a3a2a1a0 (page number of virt)\cr
&ZSZ &\$1,\$0,1 &\% \$1=[page number is zero]\cr
&ADD &limit,limit,\$1&\% increase limit if page number is zero\cr
&SETL&\$6,\#3ff\cr
}}$$
The next part of the routine finds the ``digits'' of
the page number $(a_4a_3a_2a_1a_0)_{1024}$, from right to left:
$$
\vcenter{\halign{&\tt#\hfil\ \cr
&OR &\$5,base,0\cr
&SRU &\$1,\$0,10\cr
&PBZ &\$1,1F\cr
&AND &\$0,\$0,\$6\cr
&INCL &base,\#2000\cr}}
\qquad
\vcenter{\halign{&\tt#\hfil\ \cr
&OR &\$5,base,0\cr
&SRU &\$2,\$1,10\cr
&PBZ &\$2,2F\cr
&AND &\$1,\$1,\$6\cr
&INCL &base,\#2000\cr}}
\qquad
\vcenter{\halign{&\tt#\hfil\ \cr
&OR &\$5,base,0\cr
&SRU &\$3,\$2,10\cr
&PBZ &\$3,3F\cr
&AND &\$2,\$2,\$6\cr
&INCL &base,\#2000\cr}}
\qquad
\vcenter{\halign{&\tt#\hfil\ \cr
&OR &\$5,base,0\cr
&SRU &\$4,\$3,10\cr
&PBZ &\$4,4F\cr
&AND &\$3,\$3,\$6\cr
&INCL &base,\#2000\cr}}
$$
Then the process cascades back through PTPs.
$$
\vcenter{\halign{&\tt#\hfil\ \cr
&OR &\$5,base,0\cr
&8ADDU&\$6,\$4,base\cr
&LDO &base,\$6,0\cr
&XOR &\$6,base,\$7\cr
&AND &\$6,\$6,mask\cr
&BNZ &\$6,Fail\cr}}
\quad
\vcenter{\halign{&\tt#\hfil\ \cr
&ANDNL&base,\#1fff\cr
4H&8ADDU &\$6,\$3,base\cr
&LDO &base,\$6,0\cr
&XOR &\$6,base,\$7\cr
&AND &\$6,\$6,mask\cr
&BNZ &\$6,Fail\cr}}
\quad
\vcenter{\halign{&\tt#\hfil\ \cr
&ANDNL&base,\#1fff\cr
3H&8ADDU &\$6,\$2,base\cr
&LDO &base,\$6,0\cr
&XOR &\$6,base,\$7\cr
&AND &\$6,\$6,mask\cr
&BNZ &\$6,Fail\cr}}
\quad
\vcenter{\halign{&\tt#\hfil\ \cr
&ANDNL&base,\#1fff\cr
2H&8ADDU &\$6,\$1,base\cr
&LDO &base,\$6,0\cr
&XOR &\$6,base,\$7\cr
&AND &\$6,\$6,mask\cr
&BNZ &\$6,Fail\cr}}
$$
Finally we obtain the PTE and communicate it to the machine.
If errors have been detected, we set the translation to zero; actually
any translation with permission bits zero would have the same effect.
$$\chardef\_=`\_
\vcenter{\halign{&\tt#\hfil\ \cr
&ANDNL &base,\#1fff &\% remove low 13 bits of PTP\cr
1H &8ADDU &\$6,\$0,base \cr
&LDO &base,\$6,0 &\% base=PTE\cr
&XOR &\$6,base,\$7\cr
&ANDN&\$6,\$6,\#7\cr
&SLU &\$6,\$6,51\cr
&BNZ &\$6,Fail &\% branch if n doesn't match\cr
&CMP &\$6,\$5,limit \cr
&BN &\$6,Ready &\% did we run off the end of the page table?\cr
Fail&SETL &base,0 &\% errors lead to PTE of zero\cr
Ready&PUT&rZZ,base\cr
&LDO&\$255,IntMask &\% load the desired setting of rK\cr
&RESUME&1 &\% now the machine will digest the translation\cr}}$$
All loads and stores in this program deal with negative virtual addresses.
This effectively shuts off memory mapping and makes the page tables
inaccessible to the user.\looseness=-1
 
The program assumes that the ropcode in rXX is 3 (which it is when
a forced trap is triggered by the need for virtual translation).
@^ropcodes@>
@^translation caches@>
 
The translation from virtual pages to physical pages need not actually
follow the rules for PTPs and PTEs; any other mapping could be
substituted by operating systems with special needs. But people usually
want compatibility between different implementations whenever
possible. The only parts of~rV that \MMIX\ really needs are the $s$~field,
which defines page sizes, and the $n$~field, which keeps TC entries
of one process from being confused with the TC entries of another.
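
For readers who prefer C, the following sketch restates the table walk of
the \MMIX\ routine above. It is a paraphrase under stated assumptions, not a
replacement: physical memory is modeled by a hypothetical list of
(address, octabyte) pairs, the $f$~field is ignored, and the limit test is
made before the loads rather than after them, which does not change the
result. All names (\.{cell}, \.{load\_octa}, \.{find\_pte}, and the
\.{demo\_...} sample data) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical memory: a short list of (address, octabyte) pairs. */
typedef struct { uint64_t addr, val; } cell;

static uint64_t load_octa(const cell *mem, size_t count, uint64_t addr) {
    for (size_t k = 0; k < count; k++)
        if (mem[k].addr == addr) return mem[k].val;
    return 0;                          /* unlisted locations read as zero */
}

/* Returns the PTE for a nonnegative virtual address, or 0 on any failure. */
uint64_t find_pte(uint64_t rV, uint64_t virt, const cell *mem, size_t count) {
    assert((virt >> 63) == 0);
    unsigned i = (unsigned)(virt >> 61);                    /* segment number */
    uint64_t b_lo = i ? (rV >> (64 - 4 * i)) & 0xf : 0;     /* b_i */
    uint64_t b_hi = (rV >> (60 - 4 * i)) & 0xf;             /* b_{i+1} */
    unsigned s = (unsigned)(rV >> 40) & 0xff;
    uint64_t r = (rV >> 13) & 0x7ffffff;
    const uint64_t n_field = 0x1ff8, sign = UINT64_C(1) << 63;
    if (s < 13 || s > 48) return 0;

    uint64_t page = (virt & ((UINT64_C(1) << 61) - 1)) >> s;
    uint64_t limit = ((r + b_hi) << 13) + (page == 0);
    unsigned digit[5];                 /* radix-1024 digits a0, a1, ... */
    int levels = 0;
    do { digit[levels++] = (unsigned)(page & 0x3ff); page >>= 10; } while (page);

    uint64_t table = ((r + b_lo) << 13) + (uint64_t)(levels - 1) * 0x2000;
    if (table >= limit) return 0;      /* ran off the end of the page tables */
    for (int k = levels - 1; k >= 1; k--) {                 /* follow the PTPs */
        uint64_t ptp = load_octa(mem, count, table + 8 * (uint64_t)digit[k]);
        if (!(ptp & sign) || ((ptp ^ rV) & n_field)) return 0;
        table = ptp & ~sign & ~UINT64_C(0x1fff);            /* 2^13 * c */
    }
    uint64_t pte = load_octa(mem, count, table + 8 * (uint64_t)digit[0]);
    return ((pte ^ rV) & n_field) ? 0 : pte;                /* n must match */
}

/* Made-up sample tables: s=13, r=#10, n=5, f=1, b1=b2=b3=b4=2. */
static const uint64_t demo_rV = UINT64_C(0x22220D0000020029);
static const cell demo_mem[] = {
    { 0x20008, 0xEE02F },                        /* PTE, page 1 of segment 0 */
    { 0x22018, UINT64_C(0x8000000000060028) },   /* PTP for pages 3072..4095 */
    { 0x60010, 0x13202D },                       /* PTE, page 3074 of segment 0 */
};

uint64_t demo_find_pte(uint64_t virt) {
    return find_pte(demo_rV, virt, demo_mem, sizeof demo_mem / sizeof demo_mem[0]);
}
```

With the sample tables, virtual address \Hex{2345} lies on page~1 of
segment~0 and yields the PTE \Hex{EE02F} directly, while address
\Hex{1804010} lies on page $3074=(3,2)_{1024}$ and reaches the PTE
\Hex{13202D} through one PTP. Page~1 of segment~1 fails, because
$b_1=b_2$ allows that segment at most one page.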
 
@* The complete instruction set. We have now described all of\/ \MMIX's
special registers---except one: The special
{\it failure location register\/}~rF is set
@^rF@>
to a physical memory address when a parity error or other memory
fault occurs. (The instruction leading to this error will probably be
long gone before such a fault is detected; for example, the machine might
be trying to write old data from a cache in order to make room for
new data. Thus there is generally no connection between the current virtual
program location~rW and the physical location of a memory error. But knowledge
of the latter location can still be useful for hardware repair, or when
an operating system is booting up.)
 
@ One additional instruction proves to be useful.
 
\bull\<SWYM X,Y,Z `sympathize with your machinery'.\>
This command lubricates the disk drives, fans, magnetic tape drives,
laser printers, scanners, and any other mechanical equipment hooked
up to \MMIX, if necessary. Fields X, Y, and~Z are ignored.
@.SWYM@>
 
The \.{SWYM} command was originally included in \MMIX's repertoire because
machines occasionally need grease to keep in shape, just as
human beings occasionally need to swim or do some other kind of exercise
in order to maintain good muscle tone. But in fact, \.{SWYM} has turned out to
be a ``no-op,'' an instruction that does nothing at all; the
@^no-op@>
hypothetical manufacturers of our hypothetical machine have pointed out that
modern computer equipment is already well oiled and sealed for permanent use.
Even so, a no-op instruction provides a good way for software to
send signals to the hardware, for such things as scheduling the way
instructions are issued on superscalar superpipelined buzzword-compliant
machines. Software programs can also use no-ops to communicate with other
programs like symbolic debuggers.
 
When a forced trap computes the translation~rZZ of a virtual address~rYY,
ropcode~3 of \<RESUME 1 will put $\rm(rYY,rZZ)$ into the TC for instructions if
the opcode in~rXX is \.{SWYM}; otherwise $\rm(rYY,rZZ)$ will be put
into the TC for data.
@^ropcodes@>
@^translation caches@>
@.RESUME@>
@^virtual address emulation@>
@^emulation@>
 
@ The running time of\/ \MMIX\ programs depends to a great extent
on changes in technology.
\MMIX\ is a mythical machine, but its mythical hardware exists in
cheap, slow versions as well as in costly high-performance models.
Details of running time usually depend on things like the amount of main memory
available to implement virtual memory, as well as the sizes of
caches and other buffers.
 
For practical purposes, the running time of an \MMIX\ program can often be
estimated satisfactorily by assigning a fixed cost
to each operation, based on the approximate running time that would be obtained
on a high-performance machine with lots of main memory; so that's what
we will do. Each operation will be assumed to take an integer number
of~$\upsilon$,
where $\upsilon$ (pronounced ``oops'') is a unit that represents the clock cycle time in
@^mems@>
@^oops@>
a pipelined implementation. The value of $\upsilon$ will probably decrease
from year to year, but I'll keep calling it $\upsilon$. The running
time will also depend on the number of memory references or {\it mems\/}
that a program uses;
this is the number of load and store instructions. For example,
each \.{LDO} (load octa) instruction will be assumed to cost
$\mu+\upsilon$, where $\mu$ is the average cost of
a memory reference. The total running time of a program might be reported as,
say, $35\mu+1000\upsilon$, meaning 35 mems plus 1000~oops. The
ratio $\mu/\upsilon$ will probably increase with time, so mem-counting
is likely to become increasingly important. [See the discussion of mems in
{\sl The Stanford GraphBase\/} (New York:\ ACM Press, 1994).]
@^oops@>
@^running times, approximate@>
 
Integer addition, subtraction, and comparison all take just $1\upsilon$.
The same is true for \.{SET}, \.{GET}, \.{PUT}, \.{SYNC}, and \.{SWYM}
instructions,
as well as bitwise logical operations, shifts, relative jumps, comparisons,
conditional assignments,
and correctly predicted branches-not-taken or probable-branches-taken.
Mispredicted branches or probable branches cost $3\upsilon$, and
so do the \.{POP} and \.{GO} commands.
Integer multiplication takes $10\upsilon$; integer division weighs in
at~$60\upsilon$.
@.MUL@>
@.DIV@>
@.TRAP@>
@.TRIP@>
@.RESUME@>
\.{TRAP}, \.{TRIP}, and \.{RESUME} cost $5\upsilon$ each.
 
Most floating point operations have a nominal running time of $4\upsilon$,
although the comparison operators \.{FCMP}, \.{FEQL}, and \.{FUN}
need only $1\upsilon$.
\.{FDIV} and \.{FSQRT} cost $40\upsilon$ each.
@.FDIV@>
@.FSQRT@>
@.FREM@>
The actual running time of floating point computations
will vary depending on the operands; for example,
the machine might need one extra $\upsilon$ for each subnormal input
or output, and it might slow down greatly when trips are enabled.
The \.{FREM} instruction might typically cost
$(3+\delta)\upsilon$, where $\delta$ is the amount
by which the exponent of the first operand exceeds the exponent of the
second (or zero, if this amount is negative). A floating point
operation might take only $1\upsilon$
if at least one of its operands is zero, infinity, or~NaN\null.
However, the fixed values stated at the beginning of this paragraph
will be used for all seat-of-the-pants estimates of running time,
since we want to keep the estimates as simple as possible
without making them terribly out of line.
 
All load and store operations will be assumed to cost $\mu+\upsilon$,
except that \.{CSWAP} costs $2\mu+2\upsilon$.
(This applies to all OP~codes that begin with
\Hex8, \Hex9, \Hex{A}, and \Hex{B}, except \Hex{98}--\Hex{9F} and
\Hex{B8}--\Hex{BF}. It's best
to keep the rules simple, because $\mu$ is just
an approximate device for estimating average memory cost.)
\.{SAVE} and \.{UNSAVE} are charged $20\mu+\upsilon$.
@.CSWAP@>
@.SAVE@>
@.UNSAVE@>
 
Of course we must remember that these numbers are very rough.
We have not included the cost of fetching instructions from memory.
Furthermore, an integer multiplication or division might have an effective
cost of only $1\upsilon$, if the result is not needed while other
numbers are being calculated.
Only a detailed simulation can be expected to be truly realistic.
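
The fixed costs stated above lend themselves to a small table. The C sketch
below records them and adds them up; the grouping into classes and the names
\.{op\_class}, \.{op\_cost}, \.{cost\_add}, \.{cost\_times}, and
\.{cost\_in\_oops} are hypothetical conveniences of this example, and the
floating point entries use the nominal values only.

```c
#include <assert.h>

typedef struct { unsigned mems, oops; } cost;   /* mems*mu + oops*upsilon */

typedef enum {
    OP_SIMPLE,            /* add, subtract, compare, shifts, SET, GET, ... */
    OP_MISPREDICTED,      /* mispredicted branch or probable branch */
    OP_POP_OR_GO, OP_MUL, OP_DIV, OP_TRAP_TRIP_RESUME,
    OP_FLOAT,             /* most floating point operations */
    OP_FDIV_OR_FSQRT, OP_LOAD_OR_STORE, OP_CSWAP, OP_SAVE_OR_UNSAVE
} op_class;

cost op_cost(op_class c) {
    switch (c) {
    case OP_SIMPLE:           return (cost){0, 1};
    case OP_MISPREDICTED:     return (cost){0, 3};
    case OP_POP_OR_GO:        return (cost){0, 3};
    case OP_MUL:              return (cost){0, 10};
    case OP_DIV:              return (cost){0, 60};
    case OP_TRAP_TRIP_RESUME: return (cost){0, 5};
    case OP_FLOAT:            return (cost){0, 4};
    case OP_FDIV_OR_FSQRT:    return (cost){0, 40};
    case OP_LOAD_OR_STORE:    return (cost){1, 1};
    case OP_CSWAP:            return (cost){2, 2};
    case OP_SAVE_OR_UNSAVE:   return (cost){20, 1};
    }
    assert(0 && "unknown op_class");
    return (cost){0, 0};
}

cost cost_add(cost a, cost b) { return (cost){a.mems + b.mems, a.oops + b.oops}; }
cost cost_times(cost a, unsigned k) { return (cost){a.mems * k, a.oops * k}; }

/* Total in units of upsilon, for an assumed ratio mu/upsilon. */
double cost_in_oops(cost c, double mu_over_upsilon) {
    return c.mems * mu_over_upsilon + c.oops;
}
```

For instance, 35 loads or stores together with 965 simple instructions add
up to $35\mu+1000\upsilon$; under the arbitrary assumption
$\mu/\upsilon=10$ that would be $1350\upsilon$.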
 
@ If you think that \MMIX\ has plenty of operation codes, you are right;
we have now described them all. Here is a chart that shows their
numeric values:
\def\oddline#1{\cr
\noalign{\nointerlineskip}
\omit&\setbox0=\hbox{\lower 2.3pt\hbox{\Hex{#1x}}}\smash{\box0}&
\multispan{17}\hrulefill&
\setbox0=\hbox{\lower 2.3pt\hbox{\Hex{#1x}}}\smash{\box0}\cr
\noalign{\nointerlineskip}}
\def\evenline{\cr\noalign{\hrule}}
\def\chartstrut{\lower4.5pt\vbox to14pt{}}
\def\beginchart{$$\tt\halign to\hsize\bgroup
\chartstrut##\tabskip0pt plus10pt&
&\hfil##\hfil&\vrule##\cr
\lower6.5pt\null
&&&\Hex0&&\Hex1&&\Hex2&&\Hex3&&\Hex4&&\Hex 5&&\Hex 6&&\Hex 7&\evenline}
\def\endchart{\raise11.5pt\null&&&\Hex 8&&\Hex 9&&\Hex A&&\Hex B&
&\Hex C&&\Hex D&&\Hex E&&\Hex F&\cr\egroup$$}
\def\\#1[#2]{\multispan3\hfil#1[#2]\hfil}
\beginchart
&&&TRAP&&FCMP&&FUN&&FEQL&&FADD&&FIX&&FSUB&&FIXU&\oddline 0
&&&\\FLOT[I]&&\\FLOTU[I]&&\\SFLOT[I]&&\\SFLOTU[I]&\evenline
&&&FMUL&&FCMPE&&FUNE&&FEQLE&&FDIV&&FSQRT&&FREM&&FINT&\oddline 1
&&&\\MUL[I]&&\\MULU[I]&&\\DIV[I]&&\\DIVU[I]&\evenline
&&&\\ADD[I]&&\\ADDU[I]&&\\SUB[I]&&\\SUBU[I]&\oddline 2
&&&\\2ADDU[I]&&\\4ADDU[I]&&\\8ADDU[I]&&\\16ADDU[I]&\evenline
&&&\\CMP[I]&&\\CMPU[I]&&\\NEG[I]&&\\NEGU[I]&\oddline 3
&&&\\SL[I]&&\\SLU[I]&&\\SR[I]&&\\SRU[I]&\evenline
&&&\\BN[B]&&\\BZ[B]&&\\BP[B]&&\\BOD[B]&\oddline 4
&&&\\BNN[B]&&\\BNZ[B]&&\\BNP[B]&&\\BEV[B]&\evenline
&&&\\PBN[B]&&\\PBZ[B]&&\\PBP[B]&&\\PBOD[B]&\oddline 5
&&&\\PBNN[B]&&\\PBNZ[B]&&\\PBNP[B]&&\\PBEV[B]&\evenline
&&&\\CSN[I]&&\\CSZ[I]&&\\CSP[I]&&\\CSOD[I]&\oddline 6
&&&\\CSNN[I]&&\\CSNZ[I]&&\\CSNP[I]&&\\CSEV[I]&\evenline
&&&\\ZSN[I]&&\\ZSZ[I]&&\\ZSP[I]&&\\ZSOD[I]&\oddline 7
&&&\\ZSNN[I]&&\\ZSNZ[I]&&\\ZSNP[I]&&\\ZSEV[I]&\evenline
&&&\\LDB[I]&&\\LDBU[I]&&\\LDW[I]&&\\LDWU[I]&\oddline 8
&&&\\LDT[I]&&\\LDTU[I]&&\\LDO[I]&&\\LDOU[I]&\evenline
&&&\\LDSF[I]&&\\LDHT[I]&&\\CSWAP[I]&&\\LDUNC[I]&\oddline 9
&&&\\LDVTS[I]&&\\PRELD[I]&&\\PREGO[I]&&\\GO[I]&\evenline
&&&\\STB[I]&&\\STBU[I]&&\\STW[I]&&\\STWU[I]&\oddline A
&&&\\STT[I]&&\\STTU[I]&&\\STO[I]&&\\STOU[I]&\evenline
&&&\\STSF[I]&&\\STHT[I]&&\\STCO[I]&&\\STUNC[I]&\oddline B
&&&\\SYNCD[I]&&\\PREST[I]&&\\SYNCID[I]&&\\PUSHGO[I]&\evenline
&&&\\OR[I]&&\\ORN[I]&&\\NOR[I]&&\\XOR[I]&\oddline C
&&&\\AND[I]&&\\ANDN[I]&&\\NAND[I]&&\\NXOR[I]&\evenline
&&&\\BDIF[I]&&\\WDIF[I]&&\\TDIF[I]&&\\ODIF[I]&\oddline D
&&&\\MUX[I]&&\\SADD[I]&&\\MOR[I]&&\\MXOR[I]&\evenline
&&&SETH&&SETMH&&SETML&&SETL&&INCH&&INCMH&&INCML&&INCL&\oddline E
&&&ORH&&ORMH&&ORML&&ORL&&ANDNH&&ANDNMH&&ANDNML&&ANDNL&\evenline
&&&\\JMP[B]&&\\PUSHJ[B]&&\\GETA[B]&&\\PUT[I]&\oddline F
&&&POP&&RESUME&&SAVE&&UNSAVE&&SYNC&&SWYM&&GET&&TRIP&\evenline
\endchart
The notation `\.{[I]}' indicates an operation with an ``immediate'' variant
in which the Z field denotes a constant instead of a register number.
Similarly, `\.{[B]}' indicates an operation with a ``backward'' variant
in which a relative address has a negative displacement. Simulators and
other programs that need to present \MMIX\ instructions in symbolic
form will say that opcode \Hex{20} is \.{ADD} while opcode \Hex{21}
is~\.{ADDI}; they will say that \Hex{F2} is \.{PUSHJ} while \Hex{F3}
is~\.{PUSHJB}. But the \MMIX\ assembler uses only the forms \.{ADD}
and \.{PUSHJ}, not \.{ADDI} or \.{PUSHJB}.
 
To read this chart, use the hexadecimal digits at the top, bottom,
left, and right.
For example, operation code \.{A9} in hexadecimal notation appears in
the lower part of the \Hex{Ax} row and in the \Hex1/\Hex9 column; it is
\.{STTI}, `store tetrabyte immediate'.
@^OP codes, table@>
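
The `\.{[I]}' convention can be mechanized for the rows \Hex{8x}
through \Hex{Bx} of the chart, where every name covers an even opcode and
its immediate variant. The C sketch below does so in the symbolic form that
a simulator might print; the names \.{base\_name} and
\.{load\_store\_name} are hypothetical, and the function returns a pointer
to a static buffer, so each result should be used before the next call.

```c
#include <assert.h>
#include <string.h>

/* Names from rows #8x, #9x, #Ax, #Bx of the chart, two opcodes per name. */
static const char *const base_name[32] = {
    "LDB",  "LDBU", "LDW",   "LDWU",  "LDT",   "LDTU",  "LDO",    "LDOU",
    "LDSF", "LDHT", "CSWAP", "LDUNC", "LDVTS", "PRELD", "PREGO",  "GO",
    "STB",  "STBU", "STW",   "STWU",  "STT",   "STTU",  "STO",    "STOU",
    "STSF", "STHT", "STCO",  "STUNC", "SYNCD", "PREST", "SYNCID", "PUSHGO"
};

/* Symbolic name of an opcode in the range #80..#BF; an odd opcode is the
   immediate variant and gets the suffix I. */
const char *load_store_name(unsigned op) {
    static char buf[8];               /* longest result has 7 characters */
    assert(op >= 0x80 && op <= 0xBF);
    strcpy(buf, base_name[(op - 0x80) >> 1]);
    if (op & 1) strcat(buf, "I");
    return buf;
}
```

Applied to \Hex{A9} the sketch selects entry 20 of the table, \.{STT}, and
appends the suffix, giving \.{STTI} in agreement with the example above.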
 
%The blank spaces in this chart are undefined opcodes,
%reserved for future extension.
%If an instruction with such
%an opcode is encountered in a user program, it is considered to be
%an illegal instruction (like, say, \.{FIX} with the \.Y field greater than~9),
%@^illegal instructions@>
%triggering an interrupt. Such instructions might become defined in
%later versions of\/ \MMIX, at which time the operating system
%could probably emulate the new instructions for backward compatibility.
%@^version number@>
 
\def\\#1{\leavevmode\hbox{\it#1\/\kern.05em}} % italic type for identifiers
 
@*Index. (References are to section numbers, not page numbers.)
/sort.mms
0,0 → 1,40
          LOC     Data_Segment
x0        GREG    @
X0        IS      @
N         IS      100

j         IS      $0
m         IS      $1
kk        IS      $2
xk        IS      $3
t         IS      $255
          LOC     #100
Maximum   SL      kk,$0,3
          LDO     m,x0,kk
          JMP     ChangeJ
Loop      LDO     xk,x0,kk
          CMP     t,xk,m
          PBNP    t,DecreaseK
ChangeM   SET     m,xk
ChangeJ   SR      j,kk,3
DecreaseK SUB     kk,kk,8
          PBP     kk,Loop
          POP     2,0

Main      GETA    t,9F
          TRAP    0,Fread,StdIn
          SET     $0,N<<3
1H        SR      $2,$0,3
          PUSHJ   1,Maximum
          LDO     $3,x0,$0
          SL      $2,$2,3
          STO     $1,x0,$0
          STO     $3,x0,$2
          SUB     $0,$0,1<<3
          PBNZ    $0,1B
          GETA    t,9F
          TRAP    0,Fwrite,StdOut
          TRAP    0,Halt,0
9H        OCTA    X0+1<<3,N<<3
 
/mmix-arith.w
0,0 → 1,1843
% This file is part of the MMIXware package (c) Donald E Knuth 1999
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
 
\def\title{MMIX-ARITH}
 
\def\MMIX{\.{MMIX}}
\def\MMIXAL{\.{MMIXAL}}
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
\def\dts{\mathinner{\ldotp\ldotp}}
\def\<#1>{\hbox{$\langle\,$#1$\,\rangle$}}\let\is=\longrightarrow
\def\ff{\\{ff\kern-.05em}}
@s ff TeX
@s bool normal @q unreserve a C++ keyword @>
@s xor normal @q unreserve a C++ keyword @>
 
@* Introduction. The subroutines below are used to simulate 64-bit \MMIX\
arithmetic on an old-fashioned 32-bit computer---like the one the author
had when he wrote \MMIXAL\ and the first \MMIX\ simulators in 1998 and 1999.
All operations are fabricated from 32-bit arithmetic, including
a full implementation of the IEEE floating point standard,
assuming only that the \CEE/ compiler has a 32-bit unsigned integer type.
 
Some day 64-bit machines will be commonplace and the awkward manipulations of
the present program will look quite archaic. Interested readers who have such
computers will be able to convert the code to a pure 64-bit form without
difficulty, thereby obtaining much faster and simpler routines. Meanwhile,
however, we can simulate the future and hope for continued progress.
 
This program module has a simple structure, intended to make it
suitable for loading with \MMIX\ simulators and assemblers.
 
@c
#include <stdio.h>
#include <string.h>
#include <ctype.h>
@<Stuff for \CEE/ preprocessor@>@;
typedef enum{@+false,true@+} bool;
@<Tetrabyte and octabyte type definitions@>@;
@<Other type definitions@>@;
@<Global variables@>@;
@<Subroutines@>
 
@ Subroutines of this program are declared first with a prototype,
as in {\mc ANSI C}, then with an old-style \CEE/ function definition.
Here are some preprocessor commands that make this work correctly with both
new-style and old-style compilers.
@^prototypes for functions@>
 
@<Stuff for \CEE/ preprocessor@>=
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
 
@ The definition of type \&{tetra} should be changed, if necessary, so that
it represents an unsigned 32-bit integer.
@^system dependencies@>
 
@<Tetra...@>=
typedef unsigned int tetra;
/* for systems conforming to the LP-64 data model */
typedef struct { tetra h,l;} octa; /* two tetrabytes make one octabyte */
 
@ @d sign_bit ((unsigned)0x80000000)
 
@<Glob...@>=
octa zero_octa; /* |zero_octa.h=zero_octa.l=0| */
octa neg_one={-1,-1}; /* |neg_one.h=neg_one.l=-1| */
octa inf_octa={0x7ff00000,0}; /* floating point $+\infty$ */
octa standard_NaN={0x7ff80000,0}; /* floating point NaN(.5) */
octa aux; /* auxiliary output of a subroutine */
bool overflow; /* set by certain subroutines for signed arithmetic */
 
@ It's easy to add and subtract octabytes, if we aren't terribly
worried about speed.
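
On a machine that does have a 64-bit integer type, the carry and borrow
tests used by the routines that follow can be cross-checked against native
arithmetic. The sketch below is only such a check, written under the
assumption that |uint32_t| and |uint64_t| are available; the names
|octa_pair|, |split|, |join|, |add_pairs|, |sub_pairs|, and |check_pairs|
are hypothetical and are not part of this program.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_pair;   /* high and low tetrabytes */

static octa_pair split(uint64_t v) {
    octa_pair x = { (uint32_t)(v >> 32), (uint32_t)v };
    return x;
}

static uint64_t join(octa_pair x) { return ((uint64_t)x.h << 32) | x.l; }

/* Add two pairs, propagating the carry out of the low half. */
octa_pair add_pairs(octa_pair y, octa_pair z) {
    octa_pair x;
    x.h = y.h + z.h;
    x.l = y.l + z.l;
    if (x.l < y.l) x.h++;     /* the low half wrapped around: carry */
    return x;
}

/* Subtract two pairs, propagating the borrow from the low half. */
octa_pair sub_pairs(octa_pair y, octa_pair z) {
    octa_pair x;
    x.h = y.h - z.h;
    x.l = y.l - z.l;
    if (x.l > y.l) x.h--;     /* the low half wrapped around: borrow */
    return x;
}

/* Compare the pair routines with native 64-bit arithmetic. */
void check_pairs(uint64_t a, uint64_t b) {
    assert(join(add_pairs(split(a), split(b))) == a + b);
    assert(join(sub_pairs(split(a), split(b))) == a - b);
}
```

A call such as |check_pairs(0xffffffff,1)| exercises the carry, since the
low halves sum to zero and the high half must be increased by one.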
 
@<Subr...@>=
octa oplus @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa oplus(y,z) /* compute $y+z$ */
octa y,z;
{@+ octa x;
x.h=y.h+z.h;@+
x.l=y.l+z.l;
if (x.l<y.l) x.h++;
return x;
}
@#
octa ominus @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa ominus(y,z) /* compute $y-z$ */
octa y,z;
{@+ octa x;
x.h=y.h-z.h;@+
x.l=y.l-z.l;
if (x.l>y.l) x.h--;
return x;
}
 
@ In the following subroutine, |delta| is a signed quantity that is
assumed to fit in a signed tetrabyte.
 
@<Subr...@>=
octa incr @,@,@[ARGS((octa,int))@];@+@t}\6{@>
octa incr(y,delta) /* compute $y+\delta$ */
octa y;
int delta;
{@+ octa x;
x.h=y.h;@+ x.l=y.l+delta;
if (delta>=0) {
if (x.l<y.l) x.h++;
}@+else if (x.l>y.l) x.h--;
return x;
}
 
@ Left and right shifts are only a bit more difficult.
 
@<Subr...@>=
octa shift_left @,@,@[ARGS((octa,int))@];@+@t}\6{@>
octa shift_left(y,s) /* shift left by $s$ bits, where $0\le s\le64$ */
octa y;
int s;
{
while (s>=32) y.h=y.l,y.l=0,s-=32;
if (s) {@+register tetra yhl=y.h<<s,ylh=y.l>>(32-s);
y.h=yhl+ylh;@+ y.l<<=s;
}
return y;
}
@#
octa shift_right @,@,@[ARGS((octa,int,int))@];@+@t}\6{@>
octa shift_right(y,s,u) /* shift right, arithmetically if $u=0$ */
octa y;
int s,u;
{
while (s>=32) y.l=y.h, y.h=(u?0: -(y.h>>31)), s-=32;
if (s) {@+register tetra yhl=y.h<<(32-s),ylh=y.l>>s;
y.h=(u? 0:(-(y.h>>31))<<(32-s))+(y.h>>s);@+ y.l=yhl+ylh;
}
return y;
}
 
@* Multiplication. We need to multiply two unsigned 64-bit integers, obtaining
an unsigned 128-bit product. It is easy to do this on a 32-bit machine
by using Algorithm 4.3.1M of {\sl Seminumerical Algorithms}, with $b=2^{16}$.
@^multiprecision multiplication@>
 
The following subroutine returns the lower half of the product, and
puts the upper half into a global octabyte called |aux|.
 
@<Subr...@>=
octa omult @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa omult(y,z)
octa y,z;
{
register int i,j,k;
tetra u[4],v[4],w[8];
register tetra t;
octa acc;
@<Unpack the multiplier and multiplicand to |u| and |v|@>;
for (j=0;j<4;j++) w[j]=0;
for (j=0;j<4;j++)
if (!v[j]) w[j+4]=0;
else {
for (i=k=0;i<4;i++) {
t=u[i]*v[j]+w[i+j]+k;
w[i+j]=t&0xffff, k=t>>16;
}
w[j+4]=k;
}
@<Pack |w| into the outputs |aux| and |acc|@>;
return acc;
}
 
@ @<Glob...@>=
extern octa aux; /* secondary output of subroutines with multiple outputs */
extern bool overflow;
 
@ @<Unpack the mult...@>=
u[3]=y.h>>16, u[2]=y.h&0xffff, u[1]= y.l>>16, u[0]=y.l&0xffff;
v[3]=z.h>>16, v[2]=z.h&0xffff, v[1]= z.l>>16, v[0]=z.l&0xffff;
 
@ @<Pack |w| into the outputs |aux| and |acc|@>=
aux.h=(w[7]<<16)+w[6], aux.l=(w[5]<<16)+w[4];
acc.h=(w[3]<<16)+w[2], acc.l=(w[1]<<16)+w[0];
 
@ Signed multiplication has the same lower half product as unsigned
multiplication. The signed upper half product is obtained with at most two
further subtractions, after which the result has overflowed if and only if
the upper half is unequal to 64 copies of the sign bit in the lower half.
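
The same correction can be tried out on a smaller scale, where an exact
answer is available for comparison. The following C sketch applies it to
32-bit operands with a 64-bit product, so that ``64 copies of the sign bit''
becomes 32 copies; it assumes that |uint32_t|, |uint64_t|, and |int64_t|
exist, and the names |signed_high32|, |overflows32|, and
|check_signed_high32| are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper half of the signed product, obtained from the unsigned upper half
   by at most two subtractions. */
uint32_t signed_high32(uint32_t y, uint32_t z) {
    uint32_t hi = (uint32_t)(((uint64_t)y * z) >> 32);
    if (y & 0x80000000u) hi -= z;
    if (z & 0x80000000u) hi -= y;
    return hi;
}

/* The signed product overflows 32 bits unless the upper half consists of
   32 copies of the sign bit of the lower half. */
int overflows32(uint32_t y, uint32_t z) {
    uint32_t lo = (uint32_t)((uint64_t)y * z);
    uint32_t sign_copies = (lo & 0x80000000u) ? 0xffffffffu : 0;
    return signed_high32(y, z) != sign_copies;
}

/* Compare with the exact 64-bit signed product. */
void check_signed_high32(int32_t y, int32_t z) {
    uint64_t exact = (uint64_t)((int64_t)y * z);
    assert(signed_high32((uint32_t)y, (uint32_t)z) == (uint32_t)(exact >> 32));
}
```

For example, $-2$ times~3 has unsigned upper half~2; subtracting $z=3$
gives all ones, which matches the sign of the lower half, so no overflow is
reported.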
 
@<Subr...@>=
octa signed_omult @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa signed_omult(y,z)
octa y,z;
{
octa acc;
acc=omult(y,z);
if (y.h&sign_bit) aux=ominus(aux,z);
if (z.h&sign_bit) aux=ominus(aux,y);
overflow=(aux.h!=aux.l || (aux.h^(aux.h>>1)^(acc.h&sign_bit)));
return acc;
}
 
@* Division. Long division of an unsigned 128-bit integer by an unsigned
64-bit integer is, of course, one of the most challenging routines
needed for \MMIX\ arithmetic. The following program, based on
Algorithm 4.3.1D of {\sl Seminumerical Algorithms}, computes
octabytes $q$ and $r$ such that $(2^{64}x+y)=qz+r$ and $0\le r<z$,
given octabytes $x$, $y$, and~$z$, assuming that $x<z$.
(If $x\ge z$, it simply sets $q=x$ and $r=y$.)
The quotient~$q$ is returned by the subroutine;
the remainder~$r$ is stored in |aux|.
@^multiprecision division@>
 
@<Subr...@>=
octa odiv @,@,@[ARGS((octa,octa,octa))@];@+@t}\6{@>
octa odiv(x,y,z)
octa x,y,z;
{
register int i,j,k,n,d;
tetra u[8],v[4],q[4],mask,qhat,rhat,vh,vmh;
register tetra t;
octa acc;
@<Check that |x<z|; otherwise give trivial answer@>;
@<Unpack the dividend and divisor to |u| and |v|@>;
@<Determine the number of significant places |n| in the divisor |v|@>;
@<Normalize the divisor@>;
for (j=3;j>=0;j--) @<Determine the quotient digit |q[j]|@>;
@<Unnormalize the remainder@>;
@<Pack |q| and |u| to |acc| and |aux|@>;
return acc;
}
 
@ @<Check that |x<z|; otherwise give trivial answer@>=
if (x.h>z.h || (x.h==z.h && x.l>=z.l)) {
aux=y;@+ return x;
}
 
@ @<Unpack the div...@>=
u[7]=x.h>>16, u[6]=x.h&0xffff, u[5]=x.l>>16, u[4]=x.l&0xffff;
u[3]=y.h>>16, u[2]=y.h&0xffff, u[1]=y.l>>16, u[0]=y.l&0xffff;
v[3]=z.h>>16, v[2]=z.h&0xffff, v[1]=z.l>>16, v[0]=z.l&0xffff;
 
@ @<Determine the number of significant places |n| in the divisor |v|@>=
for (n=4;v[n-1]==0;n--);
 
@ We shift |u| and |v| left by |d| places, where |d| is chosen to
make $2^{15}\le v_{n-1}<2^{16}$.
 
@<Normalize the divisor@>=
vh=v[n-1];
for (d=0;vh<0x8000;d++,vh<<=1);
for (j=k=0; j<n+4; j++) {
t=(u[j]<<d)+k;
u[j]=t&0xffff, k=t>>16;
}
for (j=k=0; j<n; j++) {
t=(v[j]<<d)+k;
v[j]=t&0xffff, k=t>>16;
}
vh=v[n-1];
vmh=(n>1? v[n-2]: 0);
 
@ @<Unnormalize the remainder@>=
mask=(1<<d)-1;
for (j=3; j>=n; j--) u[j]=0;
for (k=0;j>=0;j--) {
t=(k<<16)+u[j];
u[j]=t>>d, k=t&mask;
}
 
@ @<Pack |q| and |u| to |acc| and |aux|@>=
acc.h=(q[3]<<16)+q[2], acc.l=(q[1]<<16)+q[0];
aux.h=(u[3]<<16)+u[2], aux.l=(u[1]<<16)+u[0];
 
@ @<Determine the quotient digit |q[j]|@>=
{
@<Find the trial quotient, $\hat q$@>;
@<Subtract $b^j\hat q v$ from |u|@>;
@<If the result was negative, decrease $\hat q$ by 1@>;
q[j]=qhat;
}
 
@ @<Find the trial quotient, $\hat q$@>=
t=(u[j+n]<<16)+u[j+n-1];
qhat=t/vh, rhat=t-vh*qhat;
if (n>1) while (qhat==0x10000 || qhat*vmh>(rhat<<16)+u[j+n-2]) {
qhat--, rhat+=vh;
if (rhat>=0x10000) break;
}
 
@ After this step, |u[j+n]| will either equal |k| or |k-1|. The
true value of~|u| would be obtained by subtracting~|k| from |u[j+n]|;
but we don't have to fuss over |u[j+n]|, because it won't be examined later.
 
@<Subtract $b^j\hat q v$ from |u|@>=
for (i=k=0; i<n; i++) {
t=u[i+j]+0xffff0000-k-qhat*v[i];
u[i+j]=t&0xffff, k=0xffff-(t>>16);
}
 
@ The correction here occurs only rarely, but it can be necessary---for
example, when dividing the number \Hex{7fff800100000000} by \Hex{800080020005}.
 
@<If the result was negative, decrease $\hat q$ by 1@>=
if (u[j+n]!=k) {
qhat--;
for (i=k=0; i<n; i++) {
t=u[i+j]+v[i]+k;
u[i+j]=t&0xffff, k=t>>16;
}
}
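
@ The whole division routine can be exercised against native arithmetic if
every digit is made half as wide. The sketch below is an assumption-laden
transcription, not part of the simulator: it divides $2^{32}x+y$ by~$z$
with radix-256 digits, assuming $x<z$ and a C99 compiler, so that the
64-bit operators of~C can confirm the answer. The names |div64by32| and
|qr32| are hypothetical. Shrinking the digits of the example above gives
the dividend \Hex{7f810000} and divisor \Hex{808205}, for which the trial
quotient 254 must be decreased to~253.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t q, r; } qr32;

/* Divide 2^32*x+y by z, assuming x<z, using radix-256 digits;
   the steps mirror odiv with every digit half as wide. */
qr32 div64by32(uint32_t x, uint32_t y, uint32_t z)
{
  uint32_t u[8], v[4], q[4], t, k, qhat, rhat, vh, vmh, mask;
  int i, j, n, d;
  qr32 ans;
  assert(x < z);                         /* hence z is nonzero */
  for (i = 0; i < 4; i++) {              /* unpack, least significant first */
    u[i] = (y >> (8 * i)) & 0xff;
    u[i + 4] = (x >> (8 * i)) & 0xff;
    v[i] = (z >> (8 * i)) & 0xff;
  }
  for (n = 4; v[n - 1] == 0; n--) ;      /* significant places of v */
  for (d = 0, vh = v[n - 1]; vh < 0x80; d++, vh <<= 1) ;
  for (j = 0, k = 0; j < n + 4; j++) {   /* normalize u */
    t = (u[j] << d) + k;
    u[j] = t & 0xff, k = t >> 8;
  }
  for (j = 0, k = 0; j < n; j++) {       /* normalize v */
    t = (v[j] << d) + k;
    v[j] = t & 0xff, k = t >> 8;
  }
  vh = v[n - 1];
  vmh = (n > 1 ? v[n - 2] : 0);
  for (j = 3; j >= 0; j--) {
    t = (u[j + n] << 8) + u[j + n - 1];  /* trial quotient */
    qhat = t / vh, rhat = t - vh * qhat;
    if (n > 1)
      while (qhat == 0x100 || qhat * vmh > (rhat << 8) + u[j + n - 2]) {
        qhat--, rhat += vh;
        if (rhat >= 0x100) break;
      }
    for (i = 0, k = 0; i < n; i++) {     /* subtract qhat*v */
      t = u[i + j] + 0xff00 - k - qhat * v[i];
      u[i + j] = t & 0xff, k = 0xff - (t >> 8);
    }
    if (u[j + n] != k) {                 /* result went negative: add v back */
      qhat--;
      for (i = 0, k = 0; i < n; i++) {
        t = u[i + j] + v[i] + k;
        u[i + j] = t & 0xff, k = t >> 8;
      }
    }
    q[j] = qhat;
  }
  mask = (1u << d) - 1;                  /* unnormalize the remainder */
  for (j = 3; j >= n; j--) u[j] = 0;
  for (k = 0; j >= 0; j--) {
    t = (k << 8) + u[j];
    u[j] = t >> d, k = t & mask;
  }
  ans.q = (q[3] << 24) + (q[2] << 16) + (q[1] << 8) + q[0];
  ans.r = (u[3] << 24) + (u[2] << 16) + (u[1] << 8) + u[0];
  return ans;
}
```

With two-place divisors the second test on $\hat q$ already compares
against the entire divisor, so the add-back step can be reached only when
the divisor has three or more places, as in the example just mentioned.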
 
@ Signed division can be reduced to unsigned division in a tedious
but straightforward manner. We assume that the divisor isn't zero.
 
@<Subr...@>=
octa signed_odiv @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa signed_odiv(y,z)
octa y,z;
{
octa yy,zz,q;
register int sy,sz;
if (y.h&sign_bit) sy=2, yy=ominus(zero_octa,y);
else sy=0, yy=y;
if (z.h&sign_bit) sz=1, zz=ominus(zero_octa,z);
else sz=0, zz=z;
q=odiv(zero_octa,yy,zz);
overflow=false;
switch (sy+sz) {
case 2+1: aux=ominus(zero_octa,aux);
if (q.h==sign_bit) overflow=true;
case 0+0: return q;
case 2+0:@+ if (aux.h || aux.l) aux=ominus(zz,aux);
goto negate_q;
case 0+1:@+ if (aux.h || aux.l) aux=ominus(aux,zz);
negate_q:@+ if (aux.h || aux.l) return ominus(neg_one,q);
else return ominus(zero_octa,q);
}
}
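
@ The case analysis above makes the quotient round toward $-\infty$ and
gives the remainder the sign of the divisor. The following sketch shows
the same convention on native 64-bit integers; it is only an illustration
and assumes C99, where the operators \.{/} and \.{\%} truncate toward zero.
The names |floor_div| and |sqr| are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int64_t q, r; } sqr;

/* Floored division: y = q*z + r, with r zero or of the same sign as z. */
sqr floor_div(int64_t y, int64_t z)
{
  sqr a;
  /* exclude a zero divisor and the one overflowing case -2^63 / -1 */
  assert(z != 0 && !(y == INT64_MIN && z == -1));
  a.q = y / z, a.r = y % z;          /* C99 truncates toward zero */
  if (a.r != 0 && ((a.r < 0) != (z < 0)))
    a.q -= 1, a.r += z;              /* step the quotient down, fix r */
  return a;
}
```

Thus $-7$ divided by~2 gives quotient $-4$ and remainder~1, while 7
divided by $-2$ gives quotient $-4$ and remainder $-1$.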
 
@* Bit fiddling. The bitwise operators of \MMIX\ are fairly easy to
implement directly, but three of them occur often enough to deserve
packaging as subroutines.
 
@<Subr...@>=
octa oand @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa oand(y,z) /* compute $y\land z$ */
octa y,z;
{@+ octa x;
x.h=y.h&z.h;@+ x.l=y.l&z.l;
return x;
}
@#
octa oandn @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa oandn(y,z) /* compute $y\land\bar z$ */
octa y,z;
{@+ octa x;
x.h=y.h&~z.h;@+ x.l=y.l&~z.l;
return x;
}
@#
octa oxor @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa oxor(y,z) /* compute $y\oplus z$ */
octa y,z;
{@+ octa x;
x.h=y.h^z.h;@+ x.l=y.l^z.l;
return x;
}
 
@ Here's a fun way to count the number of bits in a tetrabyte.
[This classical trick is called the ``Gillies--Miller method
for sideways addition'' in {\sl The Preparation of Programs
for an Electronic Digital Computer\/} by Wilkes, Wheeler, and
Gill, second edition (Reading, Mass.:\ Addison--Wesley, 1957),
191--193. Some of the tricks used here were suggested by
Balbir Singh, Peter Rossmanith, and Stefan Schwoon.]
@^Gillies, Donald Bruce@>
@^Miller, Jeffrey Charles Percy@>
@^Wilkes, Maurice Vincent@>
@^Wheeler, David John@>
@^Gill, Stanley@>
@^Singh, Balbir@>
@^Rossmanith, Peter@>
@^Schwoon, Stefan@>
 
@<Subr...@>=
int count_bits @,@,@[ARGS((tetra))@];@+@t}\6{@>
int count_bits(x)
tetra x;
{
register int xx=x;
xx=xx-((xx>>1)&0x55555555);
xx=(xx&0x33333333)+((xx>>2)&0x33333333);
xx=(xx+(xx>>4))&0x0f0f0f0f;
xx=xx+(xx>>8);
return (xx+(xx>>16)) & 0xff;
}
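
@ To see the sideways addition at work, one can compare it with a plain
bit-by-bit count. The sketch below restates the computation with the
unsigned type |uint32_t| of C99 (an assumption; the routine above uses
|int|) next to a naive loop. The names |sideways_add| and |naive_count|
are hypothetical.

```c
#include <stdint.h>

/* Sideways addition: sums of 2-bit, 4-bit, 8-bit, ... fields in parallel. */
int sideways_add(uint32_t x)
{
  x = x - ((x >> 1) & 0x55555555);                 /* 16 two-bit counts */
  x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* 8 four-bit counts */
  x = (x + (x >> 4)) & 0x0f0f0f0f;                 /* 4 byte counts */
  x = x + (x >> 8);
  return (int)((x + (x >> 16)) & 0xff);
}

/* Reference: examine one bit at a time. */
int naive_count(uint32_t x)
{
  int n = 0;
  while (x) n += (int)(x & 1), x >>= 1;
  return n;
}
```

Both functions return 32 for \Hex{ffffffff} and 2 for \Hex{80000001}.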
 
@ To compute the nonnegative byte differences of two given tetrabytes,
we can carry out the following 20-step branchless computation:
 
@<Subr...@>=
tetra byte_diff @,@,@[ARGS((tetra,tetra))@];@+@t}\6{@>
tetra byte_diff(y,z)
tetra y,z;
{
register tetra d=(y&0x00ff00ff)+0x01000100-(z&0x00ff00ff);
register tetra m=d&0x01000100;
register tetra x=d&(m-(m>>8));
d=((y>>8)&0x00ff00ff)+0x01000100-((z>>8)&0x00ff00ff);
m=d&0x01000100;
return x+((d&(m-(m>>8)))<<8);
}
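
@ ``Nonnegative difference'' means $\max(0,y-z)$, taken separately in
each byte. The sketch below, which assumes C99 and is not part of the
simulator, sets a transcription of the branchless routine beside an
obvious byte-by-byte loop; |byte_diff32| and |byte_diff_ref| are
hypothetical names.

```c
#include <stdint.h>

/* Branchless form, transcribed from byte_diff with uint32_t operands. */
uint32_t byte_diff32(uint32_t y, uint32_t z)
{
  uint32_t d = (y & 0x00ff00ff) + 0x01000100 - (z & 0x00ff00ff);
  uint32_t m = d & 0x01000100;         /* 1 above each byte with y>=z */
  uint32_t x = d & (m - (m >> 8));     /* keep only those bytes */
  d = ((y >> 8) & 0x00ff00ff) + 0x01000100 - ((z >> 8) & 0x00ff00ff);
  m = d & 0x01000100;
  return x + ((d & (m - (m >> 8))) << 8);
}

/* Reference: max(0, y-z) in each of the four bytes. */
uint32_t byte_diff_ref(uint32_t y, uint32_t z)
{
  uint32_t x = 0;
  int i;
  for (i = 0; i < 32; i += 8) {
    uint32_t yb = (y >> i) & 0xff, zb = (z >> i) & 0xff;
    if (yb > zb) x |= (yb - zb) << i;
  }
  return x;
}
```

For example, the operands \Hex{10203040} and \Hex{01ff0341} give
\Hex{0f002d00}: the second and fourth bytes are clamped to zero.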
 
@ To compute the nonnegative wyde differences of two tetrabytes,
another trick leads to a 15-step branchless computation.
(Research problem: Can |count_bits|, |byte_diff|, or |wyde_diff| be done
with fewer operations?)
 
@<Subr...@>=
tetra wyde_diff @,@,@[ARGS((tetra,tetra))@];@+@t}\6{@>
tetra wyde_diff(y,z)
tetra y,z;
{
register tetra a=((y>>16)-(z>>16))&0x10000;
register tetra b=((y&0xffff)-(z&0xffff))&0x10000;
return y-(z^((y^z)&(b-a-(b>>16))));
}
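
@ The wyde version can be checked in the same way. This sketch assumes
C99; |wyde_diff32| is a transcription with |uint32_t| operands and
|wyde_diff_ref| is a straightforward reference, both names hypothetical.

```c
#include <stdint.h>

/* Branchless form, transcribed from wyde_diff. */
uint32_t wyde_diff32(uint32_t y, uint32_t z)
{
  uint32_t a = ((y >> 16) - (z >> 16)) & 0x10000;       /* high wyde borrows? */
  uint32_t b = ((y & 0xffff) - (z & 0xffff)) & 0x10000; /* low wyde borrows? */
  /* the mask selects the wydes in which z must be replaced by y */
  return y - (z ^ ((y ^ z) & (b - a - (b >> 16))));
}

/* Reference: max(0, y-z) in each of the two wydes. */
uint32_t wyde_diff_ref(uint32_t y, uint32_t z)
{
  uint32_t yh = y >> 16, zh = z >> 16, yl = y & 0xffff, zl = z & 0xffff;
  return ((yh > zh ? yh - zh : 0) << 16) | (yl > zl ? yl - zl : 0);
}
```

Where a wyde of~$y$ is smaller than the corresponding wyde of~$z$, the
mask turns that wyde of~$z$ into a copy of~$y$, so the subtraction leaves
zero there.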
 
@ The last bitwise subroutine we need is the most interesting:
It implements \MMIX's \.{MOR} and \.{MXOR} operations.
 
@<Subr...@>=
octa bool_mult @,@,@[ARGS((octa,octa,bool))@];@+@t}\6{@>
octa bool_mult(y,z,xor)
octa y,z; /* the operands */
bool xor; /* do we do xor instead of or? */
{
octa o,x;
register tetra a,b,c;
register int k;
for (k=0,o=y,x=zero_octa;o.h||o.l;k++,o=shift_right(o,8,1))
if (o.l&0xff) {
a=((z.h>>k)&0x01010101)*0xff;
b=((z.l>>k)&0x01010101)*0xff;
c=(o.l&0xff)*0x01010101;
if (xor) x.h^=a&c, x.l^=b&c;
else x.h|=a&c, x.l|=b&c;
}
return x;
}
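
@ Reading the loop above one byte at a time: byte~$i$ of the result
combines those bytes~$k$ of~$y$ for which bit~$k$ of byte~$i$ of~$z$
is~1, where bytes and bits are both numbered from the right. The sketch
below states that rule directly on a native 64-bit integer. It assumes
C99 and is meant only as a cross-check; the name |mor_ref| is
hypothetical.

```c
#include <stdint.h>

/* Bit-by-bit reference for the rule implemented by bool_mult. */
uint64_t mor_ref(uint64_t y, uint64_t z, int use_xor)
{
  uint64_t x = 0;
  int i, k;
  for (i = 0; i < 8; i++) {                    /* byte i of the result */
    uint64_t xb = 0;
    for (k = 0; k < 8; k++)
      if ((z >> (8 * i + k)) & 1) {            /* bit k of byte i of z */
        uint64_t yb = (y >> (8 * k)) & 0xff;   /* byte k of y */
        if (use_xor) xb ^= yb;
        else xb |= yb;
      }
    x |= xb << (8 * i);
  }
  return x;
}
```

The constant \Hex{8040201008040201} acts as an identity matrix on either
side. With $z=\Hex{ff}$ the rightmost result byte is the or (or the
exclusive-or) of all eight bytes of~$y$.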
 
@* Floating point packing and unpacking. Standard IEEE floating binary
numbers pack a sign, exponent, and fraction into a tetrabyte
or octabyte. In this section we consider basic subroutines that
convert between IEEE format and the separate unpacked components.
 
@d ROUND_OFF 1
@d ROUND_UP 2
@d ROUND_DOWN 3
@d ROUND_NEAR 4
 
@<Glob...@>=
int cur_round; /* the current rounding mode */
 
@ The |fpack| routine takes an octabyte $f$, a raw exponent~$e$,
and a sign~|s|, and packs them
into the floating binary number that corresponds to
$\pm2^{e-1076}f$, using a given rounding mode.
The value of $f$ should satisfy $2^{54}\le f\le 2^{55}$.
 
Thus, for example, the floating binary number $+1.0=\Hex{3ff0000000000000}$
is obtained when $f=2^{54}$, $e=\Hex{3fe}$, and |s='+'|.
The raw exponent~$e$ is usually one less than
the final exponent value; the leading bit of~$f$ is essentially added
to the exponent. (This trick works nicely for subnormal numbers, when
$e<0$, or in cases where the value of $f$ is rounded upwards to $2^{55}$.)
 
Exceptional events are noted by oring appropriate bits into
the global variable |exceptions|. Special considerations apply to
underflow, which is not fully specified by Section 7.4 of the IEEE standard:
Implementations of the standard are free to choose between two definitions
of ``tininess'' and two definitions of ``accuracy loss.''
\MMIX\ determines tininess {\it after\/} rounding, hence a result with
$e<0$ is not necessarily tiny; \MMIX\ treats accuracy loss as equivalent
to inexactness. Thus, a result underflows if and only if
it is tiny and either (i)~it is inexact or (ii)~the underflow trap is enabled.
The |fpack| routine sets |U_BIT| in |exceptions| if and only if the result is
tiny, |X_BIT| if and only if the result is inexact.
@^underflow@>
 
@d X_BIT (1<<8) /* floating inexact */
@d Z_BIT (1<<9) /* floating division by zero */
@d U_BIT (1<<10) /* floating underflow */
@d O_BIT (1<<11) /* floating overflow */
@d I_BIT (1<<12) /* floating invalid operation */
@d W_BIT (1<<13) /* float-to-fix overflow */
@d V_BIT (1<<14) /* integer overflow */
@d D_BIT (1<<15) /* integer divide check */
@d E_BIT (1<<18) /* external (dynamic) trap bit */
 
@<Subr...@>=
octa fpack @,@,@[ARGS((octa,int,char,int))@];@+@t}\6{@>
octa fpack(f,e,s,r)
octa f; /* the normalized fraction part */
int e; /* the raw exponent */
char s; /* the sign */
int r; /* the rounding mode */
{
octa o;
if (e>0x7fd) e=0x7ff, o=zero_octa;
else {
if (e<0) {
if (e<-54) o.h=0, o.l=1;
else {@+octa oo;
o=shift_right(f,-e,1);
oo=shift_left(o,-e);
if (oo.l!=f.l || oo.h!=f.h) o.l |= 1; /* sticky bit */
@^sticky bit@>
}
e=0;
}@+else o=f;
}
@<Round and return the result@>;
}
 
@ @<Glob...@>=
int exceptions; /* bits possibly destined for rA */
 
@ Everything falls together so nicely here, it's almost too good to be true!
 
@<Round and return the result@>=
if (o.l&3) exceptions |= X_BIT;
switch (r) {
case ROUND_DOWN:@+ if (s=='-') o=incr(o,3);@+break;
case ROUND_UP:@+ if (s!='-') o=incr(o,3);
case ROUND_OFF: break;
case ROUND_NEAR: o=incr(o, o.l&4? 2: 1);@+break;
}
o = shift_right(o,2,1);
o.h += e<<20;
if (o.h>=0x7ff00000) exceptions |= O_BIT+X_BIT; /* overflow */
else if (o.h<0x100000) exceptions |= U_BIT; /* tininess */
if (s=='-') o.h |= sign_bit;
return o;
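
@ For normal results the rounding and packing amount to very little
arithmetic. The following sketch repeats those steps on a native 64-bit
integer, assuming C99. It ignores subnormal inputs and the |exceptions|
bits, and the name |pack_normal| is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP, ROUND_DOWN, ROUND_NEAR };

/* Pack a sign, raw exponent e and fraction f (2^54 <= f <= 2^55), where
   the two low bits of f are the next true bit and the sticky bit. */
uint64_t pack_normal(uint64_t f, int e, int negative, int mode)
{
  uint64_t o = f;
  assert(e >= 0 && e <= 0x7fd);
  assert(f >= (1ull << 54) && f <= (1ull << 55));
  switch (mode) {
  case ROUND_DOWN: if (negative) o += 3; break;
  case ROUND_UP: if (!negative) o += 3; break;
  case ROUND_NEAR: o += (o & 4) ? 2 : 1; break;   /* ties go to even */
  default: break;                                 /* ROUND_OFF truncates */
  }
  o >>= 2;
  o += (uint64_t)e << 52;   /* the leading fraction bit bumps the exponent */
  if (negative) o |= 1ull << 63;
  return o;
}
```

With $f=2^{54}$ and $e=\Hex{3fe}$ this produces \Hex{3ff0000000000000};
with $f=2^{55}-1$ and |ROUND_NEAR| the fraction rounds up to $2^{55}$
and the same addition produces $2.0=\Hex{4000000000000000}$.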
 
@ Similarly, |sfpack| packs a short float, from inputs
having the same conventions as |fpack|.
 
@<Subr...@>=
tetra sfpack @,@,@[ARGS((octa,int,char,int))@];@+@t}\6{@>
tetra sfpack(f,e,s,r)
octa f; /* the fraction part */
int e; /* the raw exponent */
char s; /* the sign */
int r; /* the rounding mode */
{
register tetra o;
if (e>0x47d) e=0x47f, o=0;
else {
o=shift_left(f,3).h;
if (f.l&0x1fffffff) o|=1;
if (e<0x380) {
if (e<0x380-25) o=1;
else {@+register tetra o0,oo;
o0 = o;
o = o>>(0x380-e);
oo = o<<(0x380-e);
if (oo!=o0) o |= 1; /* sticky bit */
@^sticky bit@>
}
e=0x380;
}
}
@<Round and return the short result@>;
}
 
@ @<Round and return the short result@>=
if (o&3) exceptions |= X_BIT;
switch (r) {
case ROUND_DOWN:@+ if (s=='-') o+=3;@+break;
case ROUND_UP:@+ if (s!='-') o+=3;
case ROUND_OFF: break;
case ROUND_NEAR: o+=(o&4? 2: 1);@+break;
}
o = o>>2;
o += (e-0x380)<<23;
if (o>=0x7f800000) exceptions |= O_BIT+X_BIT; /* overflow */
else if (o<0x800000) exceptions |= U_BIT; /* tininess */
if (s=='-') o |= sign_bit;
return o;
 
@ The |funpack| routine is, roughly speaking, the opposite of |fpack|.
It takes a given floating point number~$x$ and separates out its
fraction part~$f$, exponent~$e$, and sign~$s$. It clears |exceptions|
to zero. It returns the type of value found: |zro|, |num|, |inf|,
or |nan|. When it returns |num|,
it will have set $f$, $e$, and~$s$
to the values from which |fpack| would produce the original number~$x$
without exceptions.
 
@d zero_exponent (-1000) /* zero is assumed to have this exponent */
 
@<Other type...@>=
typedef enum {@!zro,@!num,@!inf,@!nan}@+ftype;

@ @<Subr...@>=
ftype funpack @,@,@[ARGS((octa,octa*,int*,char*))@];@+@t}\6{@>
ftype funpack(x,f,e,s)
octa x; /* the given floating point value */
octa *f; /* address where the fraction part should be stored */
int *e; /* address where the exponent part should be stored */
char *s; /* address where the sign should be stored */
{
register int ee;
exceptions=0;
*s=(x.h&sign_bit? '-': '+');
*f=shift_left(x,2);
f->h &= 0x3fffff;
ee=(x.h>>20)&0x7ff;
if (ee) {
*e=ee-1;
f->h |= 0x400000;
return (ee<0x7ff? num: f->h==0x400000 && !f->l? inf: nan);
}
if (!x.l && !f->h) {
*e=zero_exponent;@+ return zro;
}
do {@+ ee--;@+ *f=shift_left(*f,1);@+} while (!(f->h&0x400000));
*e=ee;@+ return num;
}
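
@ The normalization of subnormal inputs is the only delicate part of
unpacking. Here is a sketch of the finite, nonzero case on a native
64-bit integer, assuming C99; the names |unpack_finite| and |unpacked|
are hypothetical, and the sign is omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t f; int e; } unpacked;

/* Split a finite nonzero double, given as its bit pattern, into a
   fraction f with leading bit 2^54 and a raw exponent e. */
unpacked unpack_finite(uint64_t x)
{
  unpacked u;
  int ee = (int)((x >> 52) & 0x7ff);
  assert(ee != 0x7ff && (x << 1) != 0);      /* not inf, NaN, or zero */
  u.f = (x << 2) & ((1ull << 54) - 1);       /* two extra low-order bits */
  if (ee) {                                  /* normal: restore hidden bit */
    u.e = ee - 1, u.f |= 1ull << 54;
    return u;
  }
  do ee--, u.f <<= 1;                        /* subnormal: shift into place */
  while (!(u.f & (1ull << 54)));
  u.e = ee;
  return u;
}
```

The smallest subnormal, whose bit pattern is~1, unpacks to $f=2^{54}$
and $e=-52$, in agreement with its value $2^{-52-1076}\cdot2^{54}=2^{-1074}$.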
 
@ @<Subr...@>=
ftype sfunpack @,@,@[ARGS((tetra,octa*,int*,char*))@];@+@t}\6{@>
ftype sfunpack(x,f,e,s)
tetra x; /* the given floating point value */
octa *f; /* address where the fraction part should be stored */
int *e; /* address where the exponent part should be stored */
char *s; /* address where the sign should be stored */
{
register int ee;
exceptions=0;
*s=(x&sign_bit? '-': '+');
f->h=(x>>1)&0x3fffff, f->l=x<<31;
ee=(x>>23)&0xff;
if (ee) {
*e=ee+0x380-1;
f->h |= 0x400000;
return (ee<0xff? num: (x&0x7fffffff)==0x7f800000? inf: nan);
}
if (!(x&0x7fffffff)) {
*e=zero_exponent;@+return zro;
}
do {@+ ee--;@+ *f=shift_left(*f,1);@+} while (!(f->h&0x400000));
*e=ee+0x380;@+ return num;
}
 
@ Since \MMIX\ downplays 32-bit operations, it uses |sfpack| and |sfunpack|
only when loading and storing short floats, or when converting
from fixed point to floating point.
 
@<Subr...@>=
octa load_sf @,@,@[ARGS((tetra))@];@+@t}\6{@>
octa load_sf(z)
tetra z; /* 32 bits to be loaded into a 64-bit register */
{
octa f,x;@+int e;@+char s;@+ftype t;
t=sfunpack(z,&f,&e,&s);
switch (t) {
case zro: x=zero_octa;@+break;
case num: return fpack(f,e,s,ROUND_OFF);
case inf: x=inf_octa;@+break;
case nan: x=shift_right(f,2,1);@+x.h|=0x7ff00000;@+break;
}
if (s=='-') x.h|=sign_bit;
return x;
}
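
@ Loading a short float never rounds, so the result can be compared with
the conversion that C itself performs. The sketch below assumes C99 and a
machine whose |float| and |double| are the IEEE 32-bit and 64-bit
formats; |widen_sf|, |float_bits|, and |double_bits| are hypothetical
names, and the code is an independent restatement rather than the
routine above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen the bit pattern of a short float to that of the equal double. */
uint64_t widen_sf(uint32_t z)
{
  uint64_t sign = (uint64_t)(z >> 31) << 63;
  int ee = (int)((z >> 23) & 0xff);
  uint64_t frac = z & 0x7fffff;
  if (ee == 0xff)                        /* infinity or NaN */
    return sign | 0x7ff0000000000000ull | (frac << 29);
  if (ee == 0) {
    if (frac == 0) return sign;          /* zero */
    while (!(frac & 0x800000)) frac <<= 1, ee--;   /* normalize */
    ee++, frac &= 0x7fffff;              /* drop the now-explicit leading bit */
  }
  return sign | ((uint64_t)(ee + 0x380) << 52) | (frac << 29);
}

uint32_t float_bits(float x)
{
  uint32_t b;
  assert(sizeof x == sizeof b);
  memcpy(&b, &x, sizeof b);
  return b;
}

uint64_t double_bits(double x)
{
  uint64_t b;
  assert(sizeof x == sizeof b);
  memcpy(&b, &x, sizeof b);
  return b;
}
```

The constant \Hex{380} is the difference $1023-127$ between the two
exponent biases, the same quantity that appears in |sfpack| and |sfunpack|.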
 
@ @<Subr...@>=
tetra store_sf @,@,@[ARGS((octa))@];@+@t}\6{@>
tetra store_sf(x)
octa x; /* 64 bits to be stored into a 32-bit word */
{
octa f;@+tetra z;@+int e;@+char s;@+ftype t;
t=funpack(x,&f,&e,&s);
switch (t) {
case zro: z=0;@+break;
case num: return sfpack(f,e,s,cur_round);
case inf: z=0x7f800000;@+break;
case nan:@+ if (!(f.h&0x200000)) {
f.h|=0x200000;@+exceptions|=I_BIT; /* NaN was signaling */
}
z=0x7f800000|(f.h<<1)|(f.l>>31);@+break;
}
if (s=='-') z|=sign_bit;
return z;
}
 
@* Floating multiplication and division.
The hardest fixed point operations were multiplication and division;
but these two operations are the {\it easiest\/} to implement in floating point
arithmetic, once their fixed point counterparts are available.
 
@<Subr...@>=
octa fmult @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa fmult(y,z)
octa y,z;
{
ftype yt,zt;
int ye,ze;
char ys,zs;
octa x,xf,yf,zf;
register int xe;
register char xs;
yt=funpack(y,&yf,&ye,&ys);
zt=funpack(z,&zf,&ze,&zs);
xs=ys+zs-'+'; /* will be |'-'| when the result is negative */
switch (4*yt+zt) {
@t\4@>@<The usual NaN cases@>;
case 4*zro+zro: case 4*zro+num: case 4*num+zro: x=zero_octa;@+break;
case 4*num+inf: case 4*inf+num: case 4*inf+inf: x=inf_octa;@+break;
case 4*zro+inf: case 4*inf+zro: x=standard_NaN;
exceptions|=I_BIT;@+break;
case 4*num+num: @<Multiply nonzero numbers and |return|@>;
}
if (xs=='-') x.h|=sign_bit;
return x;
}
 
@ @<The usual NaN cases@>=
case 4*nan+nan:@+if (!(y.h&0x80000)) exceptions|=I_BIT; /* |y| is signaling */
case 4*zro+nan: case 4*num+nan: case 4*inf+nan:
if (!(z.h&0x80000)) exceptions|=I_BIT, z.h|=0x80000;
return z;
case 4*nan+zro: case 4*nan+num: case 4*nan+inf:
if (!(y.h&0x80000)) exceptions|=I_BIT, y.h|=0x80000;
return y;
 
@ @<Multiply nonzero numbers and |return|@>=
xe=ye+ze-0x3fd; /* the raw exponent */
x=omult(yf,shift_left(zf,9));
if (aux.h>=0x400000) xf=aux;
else xf=shift_left(aux,1), xe--;
if (x.h||x.l) xf.l|=1; /* adjust the sticky bit */
return fpack(xf,xe,xs,cur_round);
 
@ @<Subr...@>=
octa fdivide @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa fdivide(y,z)
octa y,z;
{
ftype yt,zt;
int ye,ze;
char ys,zs;
octa x,xf,yf,zf;
register int xe;
register char xs;
yt=funpack(y,&yf,&ye,&ys);
zt=funpack(z,&zf,&ze,&zs);
xs=ys+zs-'+'; /* will be |'-'| when the result is negative */
switch (4*yt+zt) {
@t\4@>@<The usual NaN cases@>;
case 4*zro+inf: case 4*zro+num: case 4*num+inf: x=zero_octa;@+break;
case 4*num+zro: exceptions|=Z_BIT;
case 4*inf+num: case 4*inf+zro: x=inf_octa;@+break;
case 4*zro+zro: case 4*inf+inf: x=standard_NaN;
exceptions|=I_BIT;@+break;
case 4*num+num: @<Divide nonzero numbers and |return|@>;
}
if (xs=='-') x.h|=sign_bit;
return x;
}
 
@ @<Divide nonzero numbers...@>=
xe=ye-ze+0x3fd; /* the raw exponent */
xf=odiv(yf,zero_octa,shift_left(zf,9));
if (xf.h>=0x800000) {
aux.l|=xf.l&1;
xf=shift_right(xf,1,1);
xe++;
}
if (aux.h||aux.l) xf.l|=1; /* adjust the sticky bit */
return fpack(xf,xe,xs,cur_round);
 
@*Floating addition and subtraction. Now for the bread-and-butter
operation, the sum of two floating point numbers.
It is not terribly difficult, but many cases need to be handled carefully.
 
@<Subr...@>=
octa fplus @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
octa fplus(y,z)
octa y,z;
{
ftype yt,zt;
int ye,ze;
char ys,zs;
octa x,xf,yf,zf;
register int xe,d;
register char xs;
yt=funpack(y,&yf,&ye,&ys);
zt=funpack(z,&zf,&ze,&zs);
switch (4*yt+zt) {
@t\4@>@<The usual NaN cases@>;
case 4*zro+num: return fpack(zf,ze,zs,ROUND_OFF);@+break; /* may underflow */
case 4*num+zro: return fpack(yf,ye,ys,ROUND_OFF);@+break; /* may underflow */
case 4*inf+inf:@+if (ys!=zs) {
exceptions|=I_BIT;@+x=standard_NaN;@+xs=zs;@+break;
}
case 4*num+inf: case 4*zro+inf: x=inf_octa;@+xs=zs;@+break;
case 4*inf+num: case 4*inf+zro: x=inf_octa;@+xs=ys;@+break;
case 4*num+num:@+ if (y.h!=(z.h^0x80000000) || y.l!=z.l)
@<Add nonzero numbers and |return|@>;
case 4*zro+zro: x=zero_octa;
xs=(ys==zs? ys: cur_round==ROUND_DOWN? '-': '+');@+break;
}
if (xs=='-') x.h|=sign_bit;
return x;
}
 
@ @<Add nonzero numbers...@>=
{@+octa o,oo;
if (ye<ze || (ye==ze && (yf.h<zf.h || (yf.h==zf.h && yf.l<zf.l))))
@<Exchange |y| with |z|@>;
d=ye-ze;
xs=ys, xe=ye;
if (d) @<Adjust for difference in exponents@>;
if (ys==zs) {
xf=oplus(yf,zf);
if (xf.h>=0x800000) xe++, d=xf.l&1, xf=shift_right(xf,1,1), xf.l|=d;
}@+else {
xf=ominus(yf,zf);
if (xf.h>=0x800000) xe++, d=xf.l&1, xf=shift_right(xf,1,1), xf.l|=d;
else@+ while (xf.h<0x400000) xe--, xf=shift_left(xf,1);
}
return fpack(xf,xe,xs,cur_round);
}
 
@ @<Exchange |y| with |z|@>=
{
o=yf, yf=zf, zf=o;
d=ye, ye=ze, ze=d;
d=ys, ys=zs, zs=d;
}
 
@ Proper rounding requires two bits to the right of the fraction delivered
to~|fpack|. The first is the true next bit of the result;
the other is a ``sticky'' bit, which is nonzero if any further bits of the
true result are nonzero. Sticky rounding to an integer takes
$x$ into the number $\lfloor x/2\rfloor+\lceil x/2\rceil$.
@^sticky bit@>
 
Some subtleties need to be observed here, in order to
prevent the sticky bit from being shifted left. If we did not
shift |yf| left~1 before shifting |zf| to the right, an incorrect
answer would be obtained in certain cases---for example, if
$|yf|=2^{54}$, $|zf|=2^{54}+2^{53}-1$, $d=52$.
 
@<Adjust for difference in exponents@>=
{
if (d<=2) zf=shift_right(zf,d,1); /* exact result */
else if (d>53) zf.h=0, zf.l=1; /* tricky but OK */
else {
if (ys!=zs) d--,xe--,yf=shift_left(yf,1);
o=zf;
zf=shift_right(o,d,1);
oo=shift_left(zf,d);
if (oo.l!=o.l || oo.h!=o.h) zf.l|=1;
}
}
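
@ The sticky shift and the formula $\lfloor x/2\rfloor+\lceil x/2\rceil$
can be compared directly on small integers. In this sketch, which assumes
C99 and is not part of the simulator, $x$ stands for $n/2^d$; the names
|shift_right_sticky| and |floor_plus_ceil_half| are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift right d places, or-ing a 1 into the low bit if anything was lost. */
uint64_t shift_right_sticky(uint64_t n, int d)
{
  uint64_t q;
  assert(d >= 0 && d < 63);
  q = n >> d;
  if ((q << d) != n) q |= 1;        /* sticky bit */
  return q;
}

/* floor(x/2) + ceil(x/2) for x = n/2^d, in exact integer arithmetic
   (n is assumed small enough that n + 2^(d+1) does not overflow). */
uint64_t floor_plus_ceil_half(uint64_t n, int d)
{
  uint64_t unit;
  assert(d >= 0 && d < 63);
  unit = 1ull << (d + 1);
  return n / unit + (n + unit - 1) / unit;
}
```

For $n=9$ and $d=2$, that is $x=2.25$, both functions give~3: the
integer part~2 with the sticky bit set.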
 
@ The comparison of floating point numbers with respect to $\epsilon$
shares some of the characteristics of floating point addition/subtraction.
In some ways it is simpler, and in other ways it is more difficult;
we might as well deal with it now. % anyways
 
Subroutine |fepscomp(y,z,e,s)| returns 2 if |y|, |z|, or |e| is a NaN
or |e| is negative. It returns 1 if |s=0| and $y\approx z\ (e)$ or if
|s!=0| and $y\sim z\ (e)$,
as defined in Section~4.2.2 of {\sl Seminumerical Algorithms\/};
otherwise it returns~0.
 
@<Subr...@>=
int fepscomp @,@,@[ARGS((octa,octa,octa,int))@];@+@t}\6{@>
int fepscomp(y,z,e,s)
octa y,z,e; /* the operands */
int s; /* test similarity? */
{
octa yf,zf,ef,o,oo;
int ye,ze,ee;
char ys,zs,es;
register int yt,zt,et,d;
et=funpack(e,&ef,&ee,&es);
if (es=='-') return 2;
switch (et) {
case nan: return 2;
case inf: ee=10000;
case num: case zro: break;
}
yt=funpack(y,&yf,&ye,&ys);
zt=funpack(z,&zf,&ze,&zs);
switch (4*yt+zt) {
case 4*nan+nan: case 4*nan+inf: case 4*nan+num: case 4*nan+zro:
case 4*inf+nan: case 4*num+nan: case 4*zro+nan: return 2;
case 4*inf+inf: return (ys==zs || ee>=1023);
case 4*inf+num: case 4*inf+zro: case 4*num+inf: case 4*zro+inf:
return (s && ee>=1022);
case 4*zro+zro: return 1;
case 4*zro+num: case 4*num+zro:@+ if (!s) return 0;
case 4*num+num: break;
}
@<Compare two numbers with respect to epsilon and |return|@>;
}
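
@ The two relations can be tried out on ordinary |double| values. The
sketch below rests on several assumptions: C99, IEEE doubles, normal
nonzero operands, and the reading $u\sim v\ (\epsilon)$ if and only if
$\vert v-u\vert\le\epsilon\max(2^{e_u},2^{e_v})$, with $\min$ in place
of $\max$ for $u\approx v\ (\epsilon)$, where $u=f\cdot2^{e_u}$ and
$1/2\le\vert f\vert<1$. It uses floating arithmetic and is therefore only
an approximation to the exact integer computation of |fepscomp|. All
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Exponent e such that u = f*2^e with 1/2 <= |f| < 1 (normal u only). */
static int exponent_of(double u)
{
  uint64_t b;
  int field;
  memcpy(&b, &u, sizeof b);
  field = (int)((b >> 52) & 0x7ff);
  assert(field != 0 && field != 0x7ff);
  return field - 1022;
}

/* The double 2^e, built from its bit pattern (normal range only). */
static double two_to(int e)
{
  uint64_t b;
  double r;
  assert(e > -1023 && e < 1024);
  b = (uint64_t)(e + 1023) << 52;
  memcpy(&r, &b, sizeof r);
  return r;
}

int is_similar(double u, double v, double eps)   /* uses the larger exponent */
{
  int eu = exponent_of(u), ev = exponent_of(v);
  double diff = (u > v ? u - v : v - u);
  return diff <= eps * two_to(eu > ev ? eu : ev);
}

int is_approx(double u, double v, double eps)    /* uses the smaller exponent */
{
  int eu = exponent_of(u), ev = exponent_of(v);
  double diff = (u > v ? u - v : v - u);
  return diff <= eps * two_to(eu < ev ? eu : ev);
}
```

For example, 1.0 and 0.75 have exponents 1 and~0 in this convention, so
with $\epsilon=1/8$ they are similar but not approximately equal.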
 
@ The relation $y\approx z\ (\epsilon)$ reduces to
$y\sim z\ (\epsilon/2^d)$, if $d$~is the difference between the
larger and smaller exponents of $y$ and~$z$.
 
@<Compare two numbers with respect to epsilon and |return|@>=
@<Unsubnormalize |y| and |z|, if they are subnormal@>;
if (ye<ze || (ye==ze && (yf.h<zf.h || (yf.h==zf.h && yf.l<zf.l))))
@<Exchange |y| with |z|@>;
if (ze==zero_exponent) ze=ye;
d=ye-ze;
if (!s) ee-=d;
if (ee>=1023) return 1; /* if $\epsilon\ge2$, $z\in N_\epsilon(y)$ */
@<Compute the difference of fraction parts, |o|@>;
if (!o.h && !o.l) return 1;
if (ee<968) return 0; /* if $y\ne z$ and $\epsilon<2^{-54}$, $y\not\sim z$ */
if (ee>=1021) ef=shift_left(ef,ee-1021);
else ef=shift_right(ef,1021-ee,1);
return o.h<ef.h || (o.h==ef.h && o.l<=ef.l);
 
@ @<Unsubnormalize |y| and |z|, if they are subnormal@>=
if (ye<0 && yt!=zro) yf=shift_left(y,2), ye=0;
if (ze<0 && zt!=zro) zf=shift_left(z,2), ze=0;
 
@ At this point $y\sim z$ if and only if
$$|yf|+(-1)^{[ys=zs]}|zf|/2^d\le 2^{ee-1021}|ef|=2^{55}\epsilon.$$
We need to evaluate this relation without overstepping the bounds of
our simulated 64-bit registers.
 
When $d>2$, the difference of fraction parts might not fit exactly
in an octabyte;
in that case the numbers are not similar unless $\epsilon>3/8$,
and we replace the difference by the ceiling of the
true result. When $\epsilon<1/8$, our program essentially replaces
$2^{55}\epsilon$ by $\lfloor2^{55}\epsilon\rfloor$. These
truncations are not needed simultaneously. Therefore the logic
is justified by the facts that, if $n$ is an integer, we have
$x\le n$ if and only if $\lceil x\rceil\le n$;
$n\le x$ if and only if $n\le\lfloor x\rfloor$. (Notice that the
concept of ``sticky bit'' is {\it not\/} appropriate here.)
@^sticky bit@>
 
@<Compute the difference of fraction parts, |o|@>=
if (d>54) o=zero_octa,oo=zf;
else o=shift_right(zf,d,1),oo=shift_left(o,d);
if (oo.h!=zf.h || oo.l!=zf.l) { /* truncated result, hence $d>2$ */
if (ee<1020) return 0; /* difference is too large for similarity */
o=incr(o,ys==zs? 0: 1); /* adjust for ceiling */
}
o=(ys==zs? ominus(yf,o): oplus(yf,o));
 
@*Floating point output conversion.
The |print_float| routine converts an octabyte to a floating decimal
representation that will be input as precisely the same value.
@^binary-to-decimal conversion@>
@^radix conversion@>
@^multiprecision conversion@>
 
@<Subr...@>=
static void bignum_times_ten @,@,@[ARGS((bignum*))@];
static void bignum_dec @,@,@[ARGS((bignum*,bignum*,tetra))@];
static int bignum_compare @,@,@[ARGS((bignum*,bignum*))@];
void print_float @,@,@[ARGS((octa))@];@+@t}\6{@>
void print_float(x)
octa x;
{
@<Local variables for |print_float|@>;
if (x.h&sign_bit) printf("-");
@<Extract the exponent |e| and determine the
fraction interval $[f\dts g]$ or $(f\dts g)$@>;
@<Store $f$ and $g$ as multiprecise integers@>;
@<Compute the significant digits |s| and decimal exponent |e|@>;
@<Print the significant digits with proper context@>;
}
 
@ One way to visualize the problem being solved here is to consider
the vastly simpler case in which there are only 2-bit exponents
and 2-bit fractions. Then the sixteen possible 4-bit combinations
have the following interpretations:
$$\def\\{\;\dts\;}
\vbox{\halign{#\qquad&$#$\hfil\cr
0000&[0\\0.125]\cr
0001&(0.125\\0.375)\cr
0010&[0.375\\0.625]\cr
0011&(0.625\\0.875)\cr
0100&[0.875\\1.125]\cr
0101&(1.125\\1.375)\cr
0110&[1.375\\1.625]\cr
0111&(1.625\\1.875)\cr
1000&[1.875\\2.25]\cr
1001&(2.25\\2.75)\cr
1010&[2.75\\3.25]\cr
1011&(3.25\\3.75)\cr
1100&[3.75\\\infty]\cr
1101&\rm NaN(0\\0.375)\cr
1110&\rm NaN[0.375\\0.625]\cr
1111&\rm NaN(0.625\\1)\cr}}$$
Notice that the interval is closed, $[f\dts g]$, when the fraction part
is even; it is open, $(f\dts g)$, when the fraction part is odd.
The printed outputs for these sixteen values, if we actually were
dealing with such short exponents and fractions, would be
\.{0.}, \.{.2}, \.{.5}, \.{.7}, \.{1.}, \.{1.2}, \.{1.5}, \.{1.7},
\.{2.}, \.{2.5}, \.{3.}, \.{3.5}, \.{Inf}, \.{NaN.2}, \.{NaN}, \.{NaN.8},
respectively.
 
@<Extract the exponent |e|...@>=
f=shift_left(x,1);
e=f.h>>21;
f.h&=0x1fffff;
if (!f.h && !f.l) @<Handle the special case when the fraction part is zero@>@;
else {
g=incr(f,1);
f=incr(f,-1);
if (!e) e=1; /* subnormal */
else if (e==0x7ff) {
printf("NaN");
if (g.h==0x100000 && g.l==1) return; /* the ``standard'' NaN */
e=0x3ff; /* extreme NaNs come out OK even without adjusting |f| or |g| */
}@+else f.h|=0x200000, g.h|=0x200000;
}
 
@ @<Local variables for |print_float|@>=
octa f,g; /* lower and upper bounds on the fraction part */
register int e; /* exponent part */
register int j,k; /* all purpose indices */
 
@ The transition points between exponents correspond to powers of~2. At
such points the interval extends only half as far to the left of that
power of~2 as it does to the right. For example, in the 4-bit minifloat numbers
considered above, case 1000 corresponds to the interval $[1.875\;\dts\;2.25]$.
 
@<Handle the special case when the fraction part is zero@>=
{
if (!e) {
printf("0.");@+return;
}
if (e==0x7ff) {
printf("Inf");@+return;
}
e--;
f.h=0x3fffff, f.l=0xffffffff;
g.h=0x400000, g.l=2;
}
 
@ We want to find the ``simplest'' value in the interval corresponding
to the given number, in the sense that it has fewest significant
digits when expressed in decimal notation. Thus, for example,
if the floating point number can be described by a relatively
short string such as `\.{.1}' or `\.{37e100}', we want to discover that
representation.
 
The basic idea is to generate the decimal representations of the
two endpoints of the interval, outputting the leading digits where
both endpoints agree, then making a final decision at the first place where
they disagree.
 
The ``simplest'' value is not always unique. For example, in the
case of 4-bit minifloat numbers we could represent the bit pattern 0001 as
either \.{.2} or \.{.3}, and we could represent 1001 in five equally short
ways: \.{2.3} or \.{2.4} or \.{2.5} or \.{2.6} or \.{2.7}. The
algorithm below tries to choose the middle possibility in such cases.
 
[A solution to the analogous problem for fixed-point representations,
without the additional complication of round-to-even, was used by
the author in the program for \TeX; see {\sl Beauty is Our Business\/}
(Springer, 1990), 233--242.]
@^Knuth, Donald Ervin@>
 
Suppose we are given two fractions $f$ and $g$, where $0\le f<g<1$, and
we want to compute the shortest decimal in the closed interval $[f\dts g]$.
If $f=0$, we are done. Otherwise let $10f=d+f'$ and $10g=e+g'$, where
$0\le f'<1$ and $0\le g'<1$. If $d<e$, we can terminate by outputting
any of the digits $d+1$, \dots,~$e$; otherwise we output the
common digit $d=e$, and repeat the process on the fractions $0\le f'<g'<1$.
A similar procedure works with respect to the open interval $(f\dts g)$.
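
The closed-interval procedure just described can be sketched in a few
lines when the endpoints are small binary fractions. The following
illustration is not the program used below: it assumes C99, takes
$f=a/2^t$ and $g=b/2^t$ with $t\le60$ so that ordinary 64-bit arithmetic
suffices, and picks the middle digit when several would do. The name
|shortest_decimal| is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shortest decimal fraction in the closed interval [a/2^t .. b/2^t],
   written as a digit string without the leading "."; out needs t+1 bytes. */
char *shortest_decimal(uint64_t a, uint64_t b, unsigned t, char *out)
{
  uint64_t mask;
  char *p = out;
  assert(t >= 1 && t <= 60);               /* 10*b must not overflow */
  mask = (1ull << t) - 1;
  assert(a < b && b <= mask);              /* 0 <= f < g < 1 */
  while (a != 0) {                         /* f=0: digits so far equal f */
    unsigned d, e;
    a *= 10, b *= 10;
    d = (unsigned)(a >> t), e = (unsigned)(b >> t);   /* integer parts */
    a &= mask, b &= mask;                  /* fractional parts */
    if (d < e) {                           /* any of d+1..e will do */
      *p++ = (char)('0' + ((d + 1 + e) >> 1));
      break;
    }
    *p++ = (char)('0' + d);                /* common digit */
  }
  *p = '\0';
  return out;
}
```

With $a=3$, $b=5$, $t=3$, the interval $[0.375\dts0.625]$ yields the
single digit~5, matching the output \.{.5} for the 4-bit pattern 0010.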
 
@ The program below carries out the stated algorithm by using multiprecision
arithmetic on 77-place integers with 28 bits each. This choice
facilitates multiplication by~10, and allows us to deal with the whole range of
floating binary numbers using fixed point arithmetic. We keep track of
the leading and trailing digit positions so that trivial operations on
zeros are avoided.
 
If |f| points to a \&{bignum}, its radix-$2^{28}$ digits are
|f->dat[0]| through |f->dat[76]|, from most significant to least significant.
We assume that all digit positions are zero unless they lie in the
subarray between indices |f->a| and |f->b|, inclusive.
Furthermore, both |f->dat[f->a]| and |f->dat[f->b]| are nonzero,
unless |f->a=f->b=bignum_prec-1|.
 
The \&{bignum} data type can be used with any radix less than
$2^{32}$; we will use it later with radix~$10^9$. The |dat| array
is made large enough to accommodate both applications.
 
@d bignum_prec 157 /* would be 77 if we cared only about |print_float| */
 
@<Other type...@>=
typedef struct {
int a; /* index of the most significant digit */
int b; /* index of the least significant digit; must be $\ge a$ */
tetra dat[bignum_prec]; /* the digits; undefined except between |a| and |b| */
} bignum;
 
@ Here, for example, is how we go from $f$ to $10f$, assuming that
overflow will not occur and that the radix is $2^{28}$:
 
@<Subr...@>=
static void bignum_times_ten(f)
bignum *f;
{
register tetra *p,*q; register tetra x,carry;
for (p=&f->dat[f->b],q=&f->dat[f->a],carry=0; p>=q; p--) {
x=*p*10+carry;
*p=x&0xfffffff;
carry=x>>28;
}
*p=carry;
if (carry) f->a--;
if (f->dat[f->b]==0 && f->b>f->a) f->b--;
}
 
@ And here is how we test whether $f<g$, $f=g$, or $f>g$, using any
radix whatever:
 
@<Subr...@>=
static int bignum_compare(f,g)
bignum *f,*g;
{
register tetra *p,*pp,*q,*qq;
if (f->a!=g->a) return f->a > g->a? -1: 1;
pp=&f->dat[f->b], qq=&g->dat[g->b];
for (p=&f->dat[f->a],q=&g->dat[g->a]; p<=pp; p++,q++) {
if (*p!=*q) return *p<*q? -1: 1;
if (q==qq) return p<pp;
}
return -1;
}
 
@ The following subroutine subtracts $g$ from~$f$, assuming that
$f\ge g>0$ and using a given radix.
 
@<Subr...@>=
static void bignum_dec(f,g,r)
bignum *f,*g;
tetra r; /* the radix */
{
register tetra *p,*q,*qq;
register int x,borrow;
while (g->b>f->b) f->dat[++f->b]=0;
qq=&g->dat[g->a];
for (p=&f->dat[g->b],q=&g->dat[g->b],borrow=0;q>=qq;p--,q--) {
x=*p - *q - borrow;
if (x>=0) borrow=0, *p=x;
else borrow=1, *p=x+r;
}
for (;borrow;p--)
if (*p) borrow=0, *p=*p-1;
else *p=r-1;
while (f->dat[f->a]==0) {
if (f->a==f->b) { /* the result is zero */
f->a=f->b=bignum_prec-1, f->dat[bignum_prec-1]=0;
return;
}
f->a++;
}
while (f->dat[f->b]==0) f->b--;
}
 
@ Armed with these subroutines, we are ready to solve the problem.
The first task is to put the numbers into \&{bignum} form.
If the exponent is |e|, the number destined for digit |dat[k]| will
consist of the rightmost 28 bits of the given fraction after it has
been shifted right $c-e-28k$ bits, for some constant~$c$.
We choose $c$ so that,
when $e$ has its maximum value \Hex{7ff}, the leading digit will
go into position |dat[1]|, and so that when the number to be printed
is exactly~1 the integer part of~$g$ will also be exactly~1.
 
@d magic_offset 2112 /* the constant $c$ that makes it work */
@d origin 37 /* the radix point follows |dat[37]| */
 
@<Store $f$ and $g$ as multiprecise integers@>=
k=(magic_offset-e)/28;
ff.dat[k-1]=shift_right(f,magic_offset+28-e-28*k,1).l&0xfffffff;
gg.dat[k-1]=shift_right(g,magic_offset+28-e-28*k,1).l&0xfffffff;
ff.dat[k]=shift_right(f,magic_offset-e-28*k,1).l&0xfffffff;
gg.dat[k]=shift_right(g,magic_offset-e-28*k,1).l&0xfffffff;
ff.dat[k+1]=shift_left(f,e+28*k-(magic_offset-28)).l&0xfffffff;
gg.dat[k+1]=shift_left(g,e+28*k-(magic_offset-28)).l&0xfffffff;
ff.a=(ff.dat[k-1]? k-1: k);
ff.b=(ff.dat[k+1]? k+1: k);
gg.a=(gg.dat[k-1]? k-1: k);
gg.b=(gg.dat[k+1]? k+1: k);
 
@ If $e$ is sufficiently small, the fractions $f$ and $g$ will be less than~1,
and we can use the stated algorithm directly. Of course, if $e$ is
extremely small, a lot of leading zeros need to be lopped off; in the
worst case, we may have to multiply $f$ and~$g$ by~10 more than 300 times.
But hey, we don't need to do that extremely often, and computers are
pretty fast nowadays.
 
In the small-exponent case, the computation always terminates before
$f$ becomes zero, because the interval endpoints are fractions with
denominator $2^t$ for some $t>50$.
 
The invariant relations |ff.dat[ff.a]!=0| and |gg.dat[gg.a]!=0| are
not maintained by the computation here, when |ff.a=origin| or |gg.a=origin|.
But no harm is done, because |bignum_compare| is not used.
 
@<Compute the significant digits |s|...@>=
if (e>0x401) @<Compute the significant digits in the large-exponent case@>@;
else@+{ /* if |e<=0x401| we have |gg.a>=origin| and |gg.dat[origin]<=8| */
if (ff.a>origin) ff.dat[origin]=0;
for (e=1, p=s; gg.a>origin || ff.dat[origin]==gg.dat[origin]; ) {
if (gg.a>origin) e--;
else *p++=ff.dat[origin]+'0', ff.dat[origin]=0, gg.dat[origin]=0;
bignum_times_ten(&ff);
bignum_times_ten(&gg);
}
*p++=((ff.dat[origin]+1+gg.dat[origin])>>1)+'0'; /* the middle digit */
}
*p='\0'; /* terminate the string |s| */
 
@ When |e| is large, we use the stated algorithm by considering $f$ and
$g$ to be fractions whose denominator is a power of~10.
 
An interesting case arises when the number to be converted is
\Hex{44ada56a4b0835bf}, since the interval turns out to be
$$ (69999999999999991611392\ \ \dts\ \ 70000000000000000000000).$$
If this were a closed interval, we could simply give the answer
\.{7e22}; but the number \.{7e22} actually corresponds to
\Hex{44ada56a4b0835c0}
because of the round-to-even rule. Therefore the correct answer is, say,
\.{6.9999999999999995e22}. This example shows that we need a slightly
different strategy in the case of open intervals; we cannot simply
look at the first position in which the endpoints have different
decimal digits. Therefore we change the invariant relation to $0\le f<g\le 1$,
when open intervals are involved,
and we do not terminate the process when $f=0$ or $g=1$.
 
@<Compute the significant digits in the large-exponent case@>=
{@+register int open=x.l&1;
tt.dat[origin]=10;
tt.a=tt.b=origin;
for (e=1;bignum_compare(&gg,&tt)>=open;e++)
bignum_times_ten(&tt);
p=s;
while (1) {
bignum_times_ten(&ff);
bignum_times_ten(&gg);
for (j='0';bignum_compare(&ff,&tt)>=0;j++)
bignum_dec(&ff,&tt,0x10000000),bignum_dec(&gg,&tt,0x10000000);
if (bignum_compare(&gg,&tt)>=open) break;
*p++=j;
if (ff.a==bignum_prec-1 && !open)
goto done; /* $f=0$ in a closed interval */
}
for (k=j;bignum_compare(&gg,&tt)>=open;k++) bignum_dec(&gg,&tt,0x10000000);
*p++=(j+1+k)>>1; /* the middle digit */
done:;
}
 
@ The length of string~|s| will be at most 17. For if $f$ and $g$
agree to 17 places, we have $g/f<1+10^{-16}$; but the
ratio $g/f$ is always $\ge(1+2^{-52}+2^{-53})/(1+2^{-52}-2^{-53})
>1+2\times10^{-16}$.
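
The last inequality can be checked by hand, since the denominator
$1+2^{-52}-2^{-53}$ is simply $1+2^{-53}$:
$$ {1+2^{-52}+2^{-53}\over1+2^{-52}-2^{-53}}
=1+{2\cdot2^{-53}\over1+2^{-53}}
=1+{2^{-52}\over1+2^{-53}}\approx1+2.22\times10^{-16}. $$
This exceeds $1+2\times10^{-16}$, so the bound $g/f<1+10^{-16}$ that would
follow from agreement to 17 places cannot hold.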
 
@<Local variables for |print_float|@>=
bignum ff,gg; /* fractions or numerators of fractions */
bignum tt; /* power of ten (used as the denominator) */
char s[18];
register char *p;
 
@ At this point the significant digits are in string |s|, and |s[0]!='0'|.
If we put a decimal point at the left of~|s|, the result should
be multiplied by $10^e$.
 
We prefer the output `\.{300.}' to the form `\.{3e2}', and we prefer
`\.{.03}' to `\.{3e-2}'. In general, the output will use an
explicit exponent only if the alternative would take more than
18~characters.
 
@<Print the significant digits with proper context@>=
if (e>17 || e<(int)strlen(s)-17)
printf("%c%s%se%d",s[0],(s[1]? ".": ""),s+1,e-1);
else if (e<0) printf(".%0*d%s",-e,0,s);
else if ((int)strlen(s)>=e) printf("%.*s.%s",e,s,s+e);
else printf("%s%0*d.",s,e-(int)strlen(s),0);
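
@ As a hedged illustration, the four-way choice above can be exercised
outside of |print_float|. The function name |format_digits| and its buffer
arguments are assumptions made for this sketch; they do not appear in the
program. The digit string |s| is read as $.s\times10^e$, as in the
preceding section, and is assumed to be nonempty.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the nonempty digit string s, read as
   .s times 10^e, into out[] using the same four cases as above. */
static void format_digits(char *out, size_t n, const char *s, int e)
{
  int len = (int)strlen(s);
  if (e > 17 || e < len - 17)        /* explicit exponent */
    snprintf(out, n, "%c%s%se%d", s[0], (s[1] ? "." : ""), s + 1, e - 1);
  else if (e < 0)                    /* -e zeros between the point and s */
    snprintf(out, n, ".%0*d%s", -e, 0, s);
  else if (len >= e)                 /* point falls inside or just after s */
    snprintf(out, n, "%.*s.%s", e, s, s + e);
  else                               /* e-len zeros between s and the point */
    snprintf(out, n, "%s%0*d.", s, e - len, 0);
}
```

For instance, the digit string \.{3} with $e=3$ gives `\.{300.}' and with
$e=-1$ gives `\.{.03}', while the digit string \.{7} with $e=23$ falls back
to `\.{7e22}'.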
 
@*Floating point input conversion. Going the other way, we want to
be able to convert a given decimal number into its floating binary
@^decimal-to-binary conversion@>
@^radix conversion@>
@^multiprecision conversion@>
equivalent. The following syntax is supported:
$$\vbox{\halign{$#$\hfil\cr
\<digit>\is\.0\mid\.1\mid\.2\mid\.3\mid\.4\mid
\.5\mid\.6\mid\.7\mid\.8\mid\.9\cr
\<digit string>\is\<digit>\mid\<digit string>\<digit>\cr
\<decimal string>\is\<digit string>\..\mid\..\<digit string>\mid
\<digit string>\..\<digit string>\cr
\<optional sign>\is\<empty>\mid\.+\mid\.-\cr
\<exponent>\is\.e\<optional sign>\<digit string>\cr
\<optional exponent>\is\<empty>\mid\<exponent>\cr
\<floating magnitude>\is\<digit string>\<exponent>\mid
\<decimal string>\<optional exponent>\mid\cr
\hskip12em \.{Inf}\mid\.{NaN}\mid\.{NaN.}\<digit string>\cr
\<floating constant>\is\<optional sign>\<floating magnitude>\cr
\<decimal constant>\is\<optional sign>\<digit string>\cr
}}$$
For example, `\.{-3.}' is the floating constant \Hex{c008000000000000}\thinspace;
`\.{1e3}' and `\.{1000}' are both equivalent to \Hex{408f400000000000}\thinspace;
`\.{NaN}' and `\.{+NaN.5}' are both equivalent to \Hex{7ff8000000000000}.
 
The |scan_const| routine looks at a given string and finds the
longest initial substring that matches the syntax of either \<decimal
constant> or \<floating constant>. It puts the corresponding value
into the global octabyte variable~|val|; it also puts the position of the first
unscanned character in the global pointer variable |next_char|.
It returns 1 if a floating constant was found, 0~if a decimal constant
was found, $-1$ if nothing was found. A decimal constant that doesn't
fit in an octabyte is computed modulo~$2^{64}$.
@^syntax of floating point constants@>
 
The value of |exceptions| set by |scan_const| is not necessarily correct.
 
@<Subr...@>=
static void bignum_double @,@,@[ARGS((bignum*))@];
int scan_const @,@,@[ARGS((char*))@];@+@t}\6{@>
int scan_const(s)
char *s;
{
@<Local variables for |scan_const|@>;
val.h=val.l=0;
p=s;
if (*p=='+' || *p=='-') sign=*p++;@+else sign='+';
if (strncmp(p,"NaN",3)==0) NaN=true, p+=3;
else NaN=false;
if ((isdigit(*p)&&!NaN) || (*p=='.' && isdigit(*(p+1))))
@<Scan a number and |return|@>;
if (NaN) @<Return the standard NaN@>;
if (strncmp(p,"Inf",3)==0) @<Return infinity@>;
no_const_found: next_char=s;@+return -1;
}
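
@ The return convention can be illustrated by a small recognizer that only
classifies its input and computes no value. This is a sketch based on the
syntax stated above, not the code of |scan_const|; the name
|classify_const| and the |end| parameter (standing in for the global
|next_char|) are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical recognizer: returns 1 for a floating constant, 0 for a
   decimal constant, -1 if neither; *end gets the first unscanned char. */
static int classify_const(const char *s, const char **end)
{
  const char *p = s, *d;
  int is_float = 0;
  if (*p == '+' || *p == '-') p++;               /* optional sign */
  if (strncmp(p, "NaN", 3) == 0) {
    p += 3;
    if (*p == '.' && isdigit((unsigned char)p[1]))
      for (p++; isdigit((unsigned char)*p); p++) ;
    *end = p;
    return 1;                                    /* NaN takes no exponent */
  }
  if (strncmp(p, "Inf", 3) == 0) { *end = p + 3; return 1; }
  for (d = p; isdigit((unsigned char)*p); p++) ; /* digit string, maybe empty */
  if (*p == '.' && (p > d || isdigit((unsigned char)p[1]))) {
    is_float = 1;                                /* decimal string */
    for (p++; isdigit((unsigned char)*p); p++) ;
  } else if (p == d) { *end = s; return -1; }    /* nothing found */
  if (*p == 'e') {                               /* exponent counts only if complete */
    const char *q = p + 1;
    if (*q == '+' || *q == '-') q++;
    if (isdigit((unsigned char)*q)) {
      for (; isdigit((unsigned char)*q); q++) ;
      p = q;
      is_float = 1;
    }
  }
  *end = p;
  return is_float;
}
```

Under these assumptions the input `\.{1e}' is classified as the decimal
constant `\.1' with the `\.e' left unscanned, because no digit follows
the `\.e'.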
 
@ @<Glob...@>=
octa val; /* value returned by |scan_const| */
char *next_char; /* pointer returned by |scan_const| */
 
@ @<Local variables for |scan_const|@>=
register char *p,*q; /* for string manipulations */
register bool NaN; /* are we processing a NaN? */
int sign; /* |'+'| or |'-'| */
 
@ @<Return the standard NaN@>=
{
next_char=p;
val.h=0x600000, exp=0x3fe;
goto packit;
}
 
@ @<Return infinity@>=
{
next_char=p+3;
goto make_it_infinite;
}
 
@ We saw above that a string of at most 17 digits is enough to characterize
a floating point number, for purposes of output. But a much longer buffer
for digits is needed when we're doing input. For example, consider the
borderline quantity $(1+2^{-53})/2^{1022}$; its decimal expansion, when
written out exactly, is a number with more than 750 significant digits:
\.{2.2250738585...8125e-308}.
If {\it any one\/} of those digits is increased, or if
additional nonzero digits are added as in
\.{2.2250738585...81250000001e-308},
the rounded value is supposed to change from \Hex{0010000000000000}
to \Hex{0010000000000001}.
 
We assume here that the user prefers a perfectly correct answer to
a speedy almost-correct one, so we implement the most general case.
 
@<Scan a number...@>=
{
for (q=buf0,dec_pt=(char*)0;isdigit(*p);p++) {
val=oplus(val,shift_left(val,2)); /* multiply by 5 */
val=incr(shift_left(val,1),*p-'0');
if (q>buf0 || *p!='0')
if (q<buf_max) *q++=*p;
else if (*(q-1)=='0') *(q-1)=*p;
}
if (NaN) *q++='1';
if (*p=='.') @<Scan a fraction part@>;
next_char=p;
if (*p=='e' && !NaN) @<Scan an exponent@>@;
else exp=0;
if (dec_pt) @<Return a floating point constant@>;
if (sign=='-') val=ominus(zero_octa,val);
return 0;
}
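
@ The two-step multiplication by ten is easy to try in isolation. In this
sketch, |uint64_t| stands in for the |octa| type and ordinary operators
replace |oplus|, |shift_left|, and |incr|; the name |scan_decimal| is an
assumption. Unsigned arithmetic wraps around, so an oversized decimal
constant comes out modulo $2^{64}$.

```c
#include <stdint.h>

/* Hypothetical analogue of the digit loop: accumulates leading decimal
   digits of p, with overflow wrapping modulo 2^64. */
static uint64_t scan_decimal(const char *p)
{
  uint64_t val = 0;
  for (; *p >= '0' && *p <= '9'; p++) {
    val = val + (val << 2);                    /* multiply by 5 */
    val = (val << 1) + (uint64_t)(*p - '0');   /* double, then add the digit */
  }
  return val;
}
```

The digit string \.{18446744073709551621}, which is $2^{64}+5$, therefore
yields~5.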
 
@ @<Scan a fraction part@>=
{
dec_pt=q;
p++;
for (zeros=0;isdigit(*p);p++)
if (*p=='0' && q==buf0) zeros++;
else if (q<buf_max) *q++=*p;
else if (*(q-1)=='0') *(q-1)=*p;
}
 
@ The buffer needs room for eight digits of padding at the left, followed
by up to $1022+53-307$ significant digits, followed by a ``sticky'' digit
at position |buf_max-1|, and eight more digits of padding.
 
@d buf0 (buf+8)
@d buf_max (buf+777)
 
@<Glob...@>=
static char buf[785]="00000000"; /* where we put significant input digits */
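
@ The sizes above fit together as follows. The enumeration names in this
sketch are hypothetical; only the values 8, 777, and 785 come from the
program.

```c
/* Hypothetical names for the buffer layout; offsets are relative to buf. */
enum {
  PAD = 8,                      /* padding digits on each side */
  SIG = 1022 + 53 - 307,        /* significant digits kept */
  BUF0_OFF = PAD,               /* offset of buf0 */
  BUF_MAX_OFF = PAD + SIG + 1,  /* offset of buf_max, just past the sticky digit */
  BUF_SIZE = BUF_MAX_OFF + PAD  /* total length of buf */
};
```

Here |SIG| is 768, so the sticky digit sits at offset 776 and the array
needs 785 entries.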
 
@ @<Local variables for |scan_const|@>=
register char* dec_pt; /* position of decimal point in |buf| */
register int exp; /* scanned exponent; later used for raw binary exponent */
register int zeros; /* leading zeros removed after decimal point */
 
@ Here we don't advance |next_char| and force a decimal point until we
know that a syntactically correct exponent exists.
 
The code here will convert extra-large inputs like
`\.{9e+9999999999999999}' into $\infty$ and extra-small inputs into zero.
Strange inputs like `\.{-00.0e9999999}' must also be accommodated.
 
@<Scan an exponent@>=
{@+register char exp_sign;
p++;
if (*p=='+' || *p=='-') exp_sign=*p++;@+else exp_sign='+';
if (isdigit(*p)) {
for (exp=*p++ -'0';isdigit(*p);p++)
if (exp<1000) exp = 10*exp + *p - '0';
if (!dec_pt) dec_pt=q, zeros=0;
if (exp_sign=='-') exp=-exp;
next_char=p;
}
}
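
@ The guard |exp<1000| keeps the scanned exponent from overflowing an |int|
no matter how many digits are present: the magnitude stops growing once it
reaches 1000, so it can never exceed 9999. Here is a standalone sketch, with
the hypothetical name |scan_exponent| and explicit output parameters in place
of the variables |exp| and |next_char|.

```c
/* Hypothetical standalone version: parses [+-]digits starting at p.
   Returns 1 and stores the exponent and end position on success;
   returns 0, storing nothing, if no digit follows the optional sign. */
static int scan_exponent(const char *p, int *exp_out, const char **end)
{
  char sign = '+';
  int exp;
  if (*p == '+' || *p == '-') sign = *p++;
  if (*p < '0' || *p > '9') return 0;             /* not a valid exponent */
  for (exp = *p++ - '0'; *p >= '0' && *p <= '9'; p++)
    if (exp < 1000) exp = 10 * exp + (*p - '0');  /* later digits are skipped */
  *exp_out = (sign == '-') ? -exp : exp;
  *end = p;
  return 1;
}
```

All digits are still consumed after the value saturates, so the position
reported in |end| is the same as it would be with exact arithmetic.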
 
@ @<Return a floating point constant@>=
{
@<Move the digits from |buf| to |ff|@>;
@<Determine the binary fraction and binary exponent@>;
packit: @<Pack and round the answer@>;
return 1;
}
 
@ Now we get ready to compute the binary fraction bits, by putting the
scanned input digits into a multiprecision fixed-point
accumulator |ff| that spans the full necessary range.
After this step, the number that we want to convert to floating binary
will appear in |ff.dat[ff.a]|, |ff.dat[ff.a+1]|, \dots,
|ff.dat[ff.b]|.
The radix-$10^9$ digit in ${\it ff}[36-k]$ is understood to be multiplied
by $10^{9k}$, for $36\ge k\ge-120$.
 
@<Move the digits from |buf| to |ff|@>=
x=buf+341+zeros-dec_pt-exp;
if (q==buf0 || x>=1413) {
make_it_zero: exp=-99999;@+ goto packit;
}
if (x<0) {
make_it_infinite: exp=99999;@+ goto packit;
}
ff.a=x/9;
for (p=q;p<q+8;p++) *p='0'; /* pad with trailing zeros */
q=q-1-(q+341+zeros-dec_pt-exp)%9; /* compute stopping place in |buf| */
for (p=buf0-x%9,k=ff.a;p<=q && k<=156; p+=9, k++)
@<Put the 9-digit number |*p|\thinspace\dots\thinspace|*(p+8)|
into |ff.dat[k]|@>;
ff.b=k-1;
for (x=0;p<=q;p+=9) if (strncmp(p,"000000000",9)!=0) x=1;
ff.dat[156]+=x; /* nonzero digits that fall off the right are sticky */
@^sticky bit@>
while (ff.dat[ff.b]==0) ff.b--;
 
@ @<Put the 9-digit number...@>=
{
for (x=*p-'0',pp=p+1;pp<p+9;pp++) x=10*x + *pp - '0';
ff.dat[k]=x;
}
 
@ @<Local variables for |scan_const|@>=
register int k,x;
register char *pp;
bignum ff,tt;
 
@ Here's a subroutine that is dual to |bignum_times_ten|. It changes $f$
to~$2f$, assuming that overflow will not occur and that the radix is $10^9$.
 
@<Subr...@>=
static void bignum_double(f)
bignum *f;
{
register tetra *p,*q; register int x,carry;
for (p=&f->dat[f->b],q=&f->dat[f->a],carry=0; p>=q; p--) {
x = *p + *p + carry;
if (x>=1000000000) carry=1, *p=x-1000000000;
else carry=0, *p=x;
}
*p=carry;
if (carry) f->a--;
if (f->dat[f->b]==0 && f->b>f->a) f->b--;
}
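
@ The same doubling step can be tried on a plain array. This sketch drops
the |a| and |b| bookkeeping of the |bignum| structure; the name
|double_radix_1e9| and the flat array layout (most significant digit first)
are assumptions.

```c
#include <stdint.h>

/* Hypothetical flat version: dat[0..n-1] holds radix-10^9 digits, most
   significant first; dat[0] must be below 500000000 so no carry escapes. */
static void double_radix_1e9(uint32_t *dat, int n)
{
  uint32_t carry = 0;
  for (int i = n - 1; i >= 0; i--) {
    uint32_t x = dat[i] + dat[i] + carry;  /* at most 1999999999, fits in 32 bits */
    if (x >= 1000000000u) { carry = 1; dat[i] = x - 1000000000u; }
    else { carry = 0; dat[i] = x; }
  }
}
```

Doubling the three digits 0, 600000000, 500000000 gives 1, 200000001, 0,
with a carry passing through both lower positions.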
 
@ @<Determine the binary fraction and binary exponent@>=
val=zero_octa;
if (ff.a>36) {
for (exp=0x3fe;ff.a>36;exp--) bignum_double(&ff);
for (k=54;k;k--) {
if (ff.dat[36]) {
if (k>=32) val.h |= 1<<(k-32);@+else val.l |= 1<<k;
ff.dat[36]=0;
if (ff.b==36) break; /* break if |ff| now zero */
}
bignum_double(&ff);
}
}@+else {
tt.a=tt.b=36, tt.dat[36]=2;
for (exp=0x3fe;bignum_compare(&ff,&tt)>=0;exp++) bignum_double(&tt);
for (k=54;k;k--) {
bignum_double(&ff);
if (bignum_compare(&ff,&tt)>=0) {
if (k>=32) val.h |= 1<<(k-32);@+else val.l |= 1<<k;
bignum_dec(&ff,&tt,1000000000);
if (ff.a==bignum_prec-1) break; /* break if |ff| now zero */
}
}
}
if (k==0) val.l |= 1; /* add sticky bit if |ff| nonzero */
 
@ We need to be careful that the input `\.{NaN.999999999999999999999}' doesn't
get rounded up; it is supposed to yield \Hex{7fffffffffffffff}.
 
Although the input `\.{NaN.0}' is illegal, strictly speaking, we silently
convert it to \Hex{7ff0000000000001}---a number that would be
output as `\.{NaN.0000000000000002}'.
 
@<Pack and round the answer@>=
val=fpack(val,exp,sign,ROUND_NEAR);
if (NaN) {
if ((val.h&0x7fffffff)==0x40000000) val.h |= 0x7fffffff, val.l=0xffffffff;
else if ((val.h&0x7fffffff)==0x3ff00000 && !val.l) val.h|=0x40000000,val.l=1;
else val.h |= 0x40000000;
}
 
@*Floating point remainders. In this section we implement the remainder
of the floating point operations---one of which happens to be the
operation of taking the remainder.
 
The easiest task remaining is to compare two floating point quantities.
Routine |fcomp| returns $-1$~if~$y<z$, 0~if~$y=z$, $+1$~if~$y>z$, and
$+2$~if $y$ and~$z$ are unordered.
 
@<Subr...@>=
int fcomp @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
int fcomp(y,z)
octa y,z;
{
ftype yt,zt;
int ye,ze;
char ys,zs;
octa yf,zf;
register int x;
yt=funpack(y,&yf,&ye,&ys);
zt=funpack(z,&zf,&ze,&zs);
switch (4*yt+zt) {
case 4*nan+nan: case 4*zro+nan: case 4*num+nan: case 4*inf+nan:
case 4*nan+zro: case 4*nan+num: case 4*nan+inf: return 2;
case 4*zro+zro: return 0;
case 4*zro+num: case 4*num+zro: case 4*zro+inf: case 4*inf+zro:
case 4*num+num: case 4*num+inf: case 4*inf+num: case 4*inf+inf:
if (ys!=zs) x=1;
else if (y.h>z.h) x=1;
else if (y.h<z.h) x=-1;
else if (y.l>z.l) x=1;
else if (y.l<z.l) x=-1;
else return 0;
break;
}
return (ys=='-'? -x: x);
}
 
@ Several \MMIX\ operations act on a single floating point number and
accept an arbitrary rounding mode. For example, consider the
operation of rounding to the nearest floating point integer:
 
@<Subr...@>=
octa fintegerize @,@,@[ARGS((octa,int))@];@+@t}\6{@>
octa fintegerize(z,r)
octa z; /* the operand */
int r; /* the rounding mode */
{
ftype zt;
int ze;
char zs;
octa xf,zf;
zt=funpack(z,&zf,&ze,&zs);
if (!r) r=cur_round;
switch (zt) {
case nan:@+if (!(z.h&0x80000)) {@+exceptions|=I_BIT;@+z.h|=0x80000;@+}
 /* fall through */
case inf: case zro: return z;
case num: @<Integerize and |return|@>;
}
}
 
@ @<Integerize...@>=
if (ze>=1074) return fpack(zf,ze,zs,ROUND_OFF); /* already an integer */
if (ze<=1020) xf.h=0,xf.l=1;
else {@+octa oo;
xf=shift_right(zf,1074-ze,1);
oo=shift_left(xf,1074-ze);
if (oo.l!=zf.l || oo.h!=zf.h) xf.l|=1; /* sticky bit */
@^sticky bit@>
}
switch (r) {
case ROUND_DOWN:@+ if (zs=='-') xf=incr(xf,3);@+break;
case ROUND_UP:@+ if (zs!='-') xf=incr(xf,3); /* fall through */
case ROUND_OFF: break;
case ROUND_NEAR: xf=incr(xf, xf.l&4? 2: 1);@+break;
}
xf.l&=0xfffffffc;
if (ze>=1022) return fpack(shift_left(xf,1074-ze),ze,zs,ROUND_OFF);
if (xf.l) xf.h=0x3ff00000, xf.l=0;
if (zs=='-') xf.h|=sign_bit;
return xf;
 
@ To convert floating point to fixed point, we use |fixit|.
 
@<Subr...@>=
octa fixit @,@,@[ARGS((octa,int))@];@+@t}\6{@>
octa fixit(z,r)
octa z; /* the operand */
int r; /* the rounding mode */
{
ftype zt;
int ze;
char zs;
octa zf,o;
zt=funpack(z,&zf,&ze,&zs);
if (!r) r=cur_round;
switch (zt) {
case nan: case inf: exceptions|=I_BIT;@+return z;
case zro: return zero_octa;
case num:@+if (funpack(fintegerize(z,r),&zf,&ze,&zs)==zro) return zero_octa;
if (ze<=1076) o=shift_right(zf,1076-ze,1);
else {
if (ze>1085 || (ze==1085 && (zf.h>0x400000 || @|
(zf.h==0x400000 && (zf.l || zs!='-'))))) exceptions|=W_BIT;
if (ze>=1140) return zero_octa;
o=shift_left(zf,ze-1076);
}
return (zs=='-'? ominus(zero_octa,o): o);
}
}
 
@ Going the other way, we can specify not only a rounding mode but whether
the given fixed point octabyte is signed or unsigned, and whether the
result should be rounded to short precision.
 
@<Subr...@>=
octa floatit @,@,@[ARGS((octa,int,int,int))@];@+@t}\6{@>
octa floatit(z,r,u,p)
octa z; /* octabyte to float */
int r; /* rounding mode */
int u; /* unsigned? */
int p; /* short precision? */
{
int e;@+char s;
register int t;
exceptions=0;
if (!z.h && !z.l) return zero_octa;
if (!r) r=cur_round;
if (!u && (z.h&sign_bit)) s='-', z=ominus(zero_octa,z);@+ else s='+';
e=1076;
while (z.h<0x400000) e--,z=shift_left(z,1);
while (z.h>=0x800000) {
e++;
t=z.l&1;
z=shift_right(z,1,1);
z.l|=t;
}
if (p) @<Convert to short float@>;
return fpack(z,e,s,r);
}
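
@ The normalization loops are easy to isolate. The sketch below uses
|uint64_t| for the octabyte and the hypothetical name |normalize|. It
assumes, as the constant 1076 and the bounds |0x400000| and |0x800000|
suggest, that the pair $(z,e)$ stands for $z\cdot2^{e-1076}$ and that the
fraction should end up with $2^{54}\le z<2^{55}$.

```c
#include <stdint.h>

/* Hypothetical analogue of the two loops in floatit; z must be nonzero.
   Returns the normalized fraction and stores the exponent in *e. */
static uint64_t normalize(uint64_t z, int *e)
{
  *e = 1076;
  while (z < (UINT64_C(1) << 54)) { (*e)--; z <<= 1; }  /* too small: shift left */
  while (z >= (UINT64_C(1) << 55)) {                    /* too large: shift right */
    uint64_t t = z & 1;    /* the bit about to be shifted out */
    (*e)++;
    z = (z >> 1) | t;      /* keep it as a sticky bit */
  }
  return z;
}
```

Under that assumption the input 1 becomes $2^{54}$ with $e=1022$, and any
nonzero bits lost on the right remain visible in the lowest bit for later
rounding.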
 
@ @<Convert to short float@>=
{
register int ex;@+register tetra t;
t=sfpack(z,e,s,r);
ex=exceptions;
sfunpack(t,&z,&e,&s);
exceptions=ex;
}
 
@ The square root operation is more interesting.
 
@<Subr...@>=
octa froot @,@,@[ARGS((octa,int))@];@+@t}\6{@>
octa froot(z,r)
octa z; /* the operand */
int r; /* the rounding mode */
{
ftype zt;
int ze;
char zs;
octa x,xf,rf,zf;
register int xe,k;
if (!r) r=cur_round;
zt=funpack(z,&zf,&ze,&zs);
if (zs=='-' && zt!=zro) exceptions|=I_BIT, x=standard_NaN;
else@+switch (zt) {
case nan:@+ if (!(z.h&0x80000)) exceptions|=I_BIT, z.h|=0x80000;
return z;
case inf: case zro: x=z;@+break;
case num: @<Take the square root and |return|@>;
}
if (zs=='-') x.h|=sign_bit;
return x;
}
 
@ The square root can be found by an adaptation of the old pencil-and-paper
method. If $n=\lfloor\sqrt s\rfloor$, where $s$ is an integer,
we have $s=n^2+r$ where $0\le r\le2n$;
this invariant can be maintained if we replace $s$ by $4s+(0,1,2,3)$
and $n$ by $2n+(0,1)$. The following code implements this idea with
$2n$ in~|xf| and $r$ in~|rf|. (It could easily be made to run about
twice as fast.)
 
@<Take the square root and |return|@>=
xf.h=0, xf.l=2;
xe=(ze+0x3fe)>>1;
if (ze&1) zf=shift_left(zf,1);
rf.h=0, rf.l=(zf.h>>22)-1;
for (k=53;k;k--) {
rf=shift_left(rf,2);@+ xf=shift_left(xf,1);
if (k>=43) rf=incr(rf,(zf.h>>(2*(k-43)))&3);
else if (k>=27) rf=incr(rf,(zf.l>>(2*(k-27)))&3);
if ((rf.l>xf.l && rf.h>=xf.h) || rf.h>xf.h) {
xf.l++;@+rf=ominus(rf,xf);@+xf.l++;
}
}
if (rf.h || rf.l) xf.l++; /* sticky bit */
return fpack(xf,xe,'+',r);
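
@ The pencil-and-paper invariant can be checked on ordinary integers. This
sketch is not |froot|: it keeps $n$ itself rather than $2n$, works on a
64-bit integer, and uses the hypothetical name |isqrt64|. Each iteration
brings down two more bits of~$s$ and decides whether $n$ becomes $2n$ or
$2n+1$.

```c
#include <stdint.h>

/* Hypothetical integer square root, maintaining s = n*n + r, 0 <= r <= 2n,
   for the leading bits of s processed so far. */
static uint32_t isqrt64(uint64_t s)
{
  uint64_t n = 0, r = 0;
  for (int k = 31; k >= 0; k--) {
    r = (r << 2) | ((s >> (2 * k)) & 3);  /* s becomes 4s+(0,1,2,3) */
    n <<= 1;                              /* n becomes 2n */
    if (r >= 2 * n + 1) {                 /* r too large: the new bit is 1 */
      r -= 2 * n + 1;                     /* (n+1)^2 - n^2 = 2n+1 */
      n |= 1;
    }
  }
  return (uint32_t)n;
}
```

A nonzero final |r| means that the root is inexact, which is the condition
that sets the sticky bit in the floating point version.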
 
@ And finally, the genuine floating point remainder. Subroutine |fremstep|
either calculates $y\,{\rm rem}\,z$ or reduces $y$ to a smaller number
having the same remainder with respect to~$z$. In the latter case
the |E_BIT| is set in |exceptions|. A third parameter, |delta|,
gives a decrease in exponent that is acceptable for incomplete results;
if |delta| is sufficiently large, say 2500, the correct result will
always be obtained in one step of |fremstep|.
 
@<Subr...@>=
octa fremstep @,@,@[ARGS((octa,octa,int))@];@+@t}\6{@>
octa fremstep(y,z,delta)
octa y,z;
int delta;
{
ftype yt,zt;
int ye,ze;
char xs,ys,zs;
octa x,xf,yf,zf;
register int xe,thresh,odd;
yt=funpack(y,&yf,&ye,&ys);
zt=funpack(z,&zf,&ze,&zs);
switch (4*yt+zt) {
@t\4@>@<The usual NaN cases@>;
case 4*zro+zro: case 4*num+zro: case 4*inf+zro:
case 4*inf+num: case 4*inf+inf: x=standard_NaN;
exceptions|=I_BIT;@+break;
case 4*zro+num: case 4*zro+inf: case 4*num+inf: return y;
case 4*num+num: @<Remainderize nonzero numbers and |return|@>;
zero_out: x=zero_octa;
}
if (ys=='-') x.h|=sign_bit;
return x;
}
 
@ If there's a huge difference in exponents and the remainder is nonzero,
this computation will take a long time. One could compute
$(2^ny)\,{\rm rem}\,z$ much more quickly for large~$n$ by using $O(\log n)$
multiplications modulo~$z$, but the floating remainder operation isn't
important enough to justify such expensive hardware.
 
Results of floating remainder are always exact, so the rounding mode
is immaterial.
 
@<Remainderize...@>=
odd=0; /* will be 1 if we've subtracted an odd multiple of~$z$ from $y$ */
thresh=ye-delta;
if (thresh<ze) thresh=ze;
while (ye>=thresh) @<Reduce |(ye,yf)| by a multiple of |zf|;
|goto zero_out| if the remainder is zero,
|goto try_complement| if appropriate@>;
if (ye>=ze) {
exceptions|=E_BIT;@+return fpack(yf,ye,ys,ROUND_OFF);
}
if (ye<ze-1) return fpack(yf,ye,ys,ROUND_OFF);
yf=shift_right(yf,1,1);
try_complement: xf=ominus(zf,yf), xe=ze, xs='+' + '-' - ys;
if (xf.h>yf.h || (xf.h==yf.h && (xf.l>yf.l || (xf.l==yf.l && !odd))))
xf=yf, xs=ys;
while (xf.h<0x400000) xe--, xf=shift_left(xf,1);
return fpack(xf,xe,xs,ROUND_OFF);
 
@ Here we are careful not to change the sign of |y|, because a remainder
of~0 is supposed to inherit the original sign of~|y|.
 
@<Reduce |(ye,yf)| by a multiple of |zf|...@>=
{
if (yf.h==zf.h && yf.l==zf.l) goto zero_out;
if (yf.h<zf.h || (yf.h==zf.h && yf.l<zf.l)) {
if (ye==ze) goto try_complement;
ye--, yf=shift_left(yf,1);
}
yf=ominus(yf,zf);
if (ye==ze) odd=1;
while (yf.h<0x400000) ye--, yf=shift_left(yf,1);
}
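
@ The role of |odd| and of the |try_complement| step may be clearer in an
integer setting. The sketch below, with the hypothetical name
|rem_nearest_even|, assumes $y\ge0$ and $z>0$. It picks the remainder of
smaller magnitude and breaks a tie by the parity of the quotient, which is
what the comparison involving |odd| does above.

```c
/* Hypothetical integer analogue, valid for y >= 0 and z > 0: the result
   is y - q*z where q is y/z rounded to nearest, ties going to even q. */
static long rem_nearest_even(long y, long z)
{
  long q = y / z;   /* truncated quotient */
  long r = y % z;   /* 0 <= r < z */
  /* take r - z if it is smaller in magnitude, or on a tie if q is odd */
  if (r > z - r || (r == z - r && (q & 1))) r -= z;
  return r;
}
```

For example, 7 and 2 give $-1$ because the truncated quotient 3 is odd,
while 5 and 2 give $+1$ because the quotient 2 is already even.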
 
@* Index.
 
/deluxe.mmconfig
0,0 → 1,66
% configuration for basic tests --- still under construction
memaddresstime 3
memreadtime 10 memwritetime 10
membusbytes 16
branchpredictbits 2
branchaddressbits 6
branchhistorybits 3
branchdualbits 3
memchunksmax 100
hashprime 127
Scache blocksize 64
Scache setsize 2048
Scache associativity 4 pseudolru
Scache accesstime 2
Dcache blocksize 32
Dcache setsize 512
Dcache victimsize 8
Icache blocksize 32
Icache setsize 256
Icache victimsize 4
DTcache associativity 4 lru
unit BIT1 000000000000000000000000000000000000000000000000ffff00ff00ffc004
unit ALU1 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU2 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU3 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU4 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU5 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU6 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit LSU1 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit LSU2 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit LSU3 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit MUL1 000080f000000000000000000000000000000000000000000000000000000000
unit DIV1 00000c0f00000000000000000000000000000000000000000000000000000000
unit FPU1 7fff730000000000000000000000000000000000000000000000000000000000
dispatchmax 3
commitmax 3
fetchmax 4
memslots 8
renameregs 20
reorderbuffer 40
Dcache writeallocate 1
Scache writeallocate 1
Dcache writeback 1
Scache writeback 1
Dcache ports 2
DTcache ports 2
writebuffer 8
writeholdingtime 3
mul0 1
mul1 2
mul2 2 2 1
mul3 2 2 2 1
mul4 2 2 2 2 2
mul5 2 2 2 2 2
mul6 2 2 2 2 2
mul7 2 2 2 2 2
mul8 2 2 2 2 2
div 10 10 10 10 10 10
fadd 1 1 1 1
fmul 1 1 1 1
fdiv 10 10 10 10
fsqrt 10 10 10 10
feps 1 1 1 1
fix 1 1
flot 1 1
 
/primesfx.mms
0,0 → 1,69
% Example program ... Table of primes (floating point with sharper bound)
L IS 500 The number of primes to find
t IS $255 Temporary storage
n GREG
q GREG
r GREG
jj GREG
kk GREG
pk GREG
mm IS kk
 
LOC Data_Segment
PRIME1 WYDE 2
LOC PRIME1+2*L
ptop GREG @
j0 GREG PRIME1+2-@
BUF OCTA
 
LOC #100
Main SET n,3
SET jj,j0
2H STWU n,ptop,jj
INCL jj,2
3H BZ jj,2F
4H INCL n,2
5H SET kk,j0
fn GREG 0
sqrtn GREG 0
FLOT fn,n
FSQRT sqrtn,fn
0H GREG #3fffff0000000000
FSUB sqrtn,sqrtn,0B
6H LDWU pk,ptop,kk
FLOT t,pk
FREM r,fn,t
BZ r,4B
7H FCMP t,t,sqrtn
BNN t,2B
8H INCL kk,2
JMP 6B
GREG @
Title BYTE "First Five Hundred Primes"
NewLn BYTE #a,0
Blanks BYTE " ",0
2H LDA t,Title
TRAP 0,Fputs,StdOut
NEG mm,2
3H ADD mm,mm,j0
LDA t,Blanks
TRAP 0,Fputs,StdOut
2H LDWU pk,ptop,mm
0H GREG #2030303030000000
STOU 0B,BUF
LDA t,BUF+4
1H DIV pk,pk,10
GET r,rR
INCL r,'0'
STBU r,t,0
SUB t,t,1
PBNZ pk,1B
LDA t,BUF
TRAP 0,Fputs,StdOut
INCL mm,2*L/10
PBN mm,2B
LDA t,NewLn
TRAP 0,Fputs,StdOut
CMP t,mm,2*(L/10-1)
PBNZ t,3B
TRAP 0,Halt,0
/test.mmix
0,0 → 1,12
000000010000: A RIDICULOUS BUT INSTRUCTIVE TEST PROGRAM FOR MMIX-PIPE
f40000038d030004 10000: GETA r0,$+3; LDO r3,r0,4 % start in 8000000000010000
f6130003f000000b 10008: PUT rV,r3; JMP $+11
00300e000000c330 10010: b1=0,b2=0,b3=3,b4=0,s=14,r=6,8n=#330
8000000000010337 10018: level 2 page table pointer
8000000000010330 10020: level 1 page table pointer
ff00000000010337 10028: page table entry, maps seg 2 page (3 4 5) to #10000
4000000c04014000 10030: seg 2, page (3 4 5)
8d0200248d040228 10038: LDO r2,r0,36; LDO r4,r2,40 % r2=[#10030], r4=[#10028]
bf030248f8640002 10040: PUSHGO r3,r2,#48; POP r100,2 % goes to #10048
4700ffff34008000 10048: BOD r0,$-1; INCH r0,#8000 % goes back, then fwd
d101002f00000000 10050: SL r1,r0,47; TRAP 0,0,0
/mmixal.w
0,0 → 1,3258
% This file is part of the MMIXware package (c) Donald E Knuth 1999
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
 
\def\title{MMIXAL}
 
\def\MMIX{\.{MMIX}}
\def\MMIXAL{\.{MMIXAL}}
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
\def\<#1>{\hbox{$\langle\,$#1$\,\rangle$}}\let\is=\longrightarrow
\def\bull{\smallbreak\textindent{$\bullet$}}
@s and normal @q unreserve a C++ keyword @>
@s or normal @q unreserve a C++ keyword @>
@s xor normal @q unreserve a C++ keyword @>
 
\ifx\exotic+
\font\heb=heb8 at 10pt
\font\rus=lhwnr8
\input unicode
\unicodeptsize=8pt
\fi
 
@* Definition of MMIXAL. This program takes input written in \MMIXAL,
the \MMIX\ assembly language, and translates it
@^assembly language@>
into binary files that can be loaded and executed
on \MMIX\ simulators. \MMIXAL\ is much simpler than the ``industrial
strength'' assembly languages that computer manufacturers usually provide,
because it is primarily intended for the simple demonstration programs
in {\sl The Art of Computer Programming}. Yet it tries to have enough
features to serve also as the back end of compilers for \CEE/ and other
high-level languages.
 
Instructions for using the program appear at the end of this document.
First we will discuss the input and output languages in detail; then we'll
consider the translation process, step by step; then we'll put everything
together.
 
@ A program in \MMIXAL\ consists of a series of {\it lines}, each of which
usually contains a single instruction. However, lines with no instructions are
possible, and so are lines with two or more instructions.
 
Each instruction has
three parts called its label field, opcode field, and operand field; these
fields are separated from each other by one or more spaces.
The label field, which is often empty, consists of all characters up to the
first blank space. The opcode field, which is never empty, runs from the first
nonblank after the label to the next blank space. The operand field, which
again might be empty, runs from the next nonblank character (if any) to the
first blank or semicolon that isn't part of a string or character constant.
If the operand field is followed by a semicolon, possibly with intervening
blanks, a new instruction begins immediately after the semicolon; otherwise
the rest of the line is ignored. The end of a line is treated as a blank space
for the purposes of these rules, with the additional proviso that
string or character constants are not allowed to extend from one line to
another.
 
The label field must begin with a letter or a digit; otherwise the entire
line is treated as a comment. Popular ways to introduce comments,
either at the beginning of a line or after the operand field, are to
precede them by the character \.\% as in \TeX, or by \.{//} as in \CPLUSPLUS/;
\MMIXAL\ is not very particular. However, Lisp-style comments introduced
by single semicolons will fail if they follow an instruction, because
they will be assumed to introduce another instruction.
 
@ \MMIXAL\ has no built-in macro capability, nor does it know how to
include header files and such things. But users can run their files
through a standard \CEE/ preprocessor to obtain \MMIXAL\ programs in which
macros and such things have been expanded. (Caution: The preprocessor also
removes \CEE/-style comments, unless it is told not to do so.)
Literate programming tools could also be used for preprocessing.
@^C preprocessor@>
@^literate programming@>
 
If a line begins with the special form `\.\# \<integer> \<string>',
this program interprets it as a {\it line directive\/} emitted by a
preprocessor. For example,
$$\leftline{\indent\.{\# 13 "foo.mms"}}$$
means that the following line was line 13 in the user's source file
\.{foo.mms}. Line directives allow us to correlate errors with the
user's original file; we also pass them to the output, for use by
simulators and debuggers.
@^line directives@>
 
@ \MMIXAL\ deals primarily with {\it symbols\/} and {\it constants}, which it
interprets and combines to form machine language instructions and data.
Constants are simplest, so we will discuss them first.
 
A {\it decimal constant\/} is a sequence of digits, representing a number in
radix~10. A~{\it hexadecimal constant\/} is a sequence of hexadecimal digits,
preceded by~\.\#, representing a number in radix~16:
$$\vbox{\halign{$#$\hfil\cr
\<digit>\is\.0\mid\.1\mid\.2\mid\.3\mid\.4\mid
\.5\mid\.6\mid\.7\mid\.8\mid\.9\cr
\<hex digit>\is\<digit>\mid\.A\mid\.B\mid\.C\mid\.D\mid\.E\mid\.F\mid
\.a\mid\.b\mid\.c\mid\.d\mid\.e\mid\.f\cr
\<decimal constant>\is\<digit>\mid\<decimal constant>\<digit>\cr
\<hex constant>\is\.\#\<hex digit>\mid\<hex constant>\<hex digit>\cr
}}$$
Constants whose value is $2^{64}$ or more are reduced modulo $2^{64}$.
 
@ A {\it character constant\/} is a single character enclosed in
single quote marks; it denotes the {\mc ASCII} or Unicode number
@^Unicode@>
corresponding to that character. For example, \.{'a'}
represents the constant \.{\#61}, also known as~\.{97}. The quoted character
can be
anything except the character that the \CEE/ library calls \.{\\n} or {\it
newline}; that character should be represented as \.{\#a}.
$$\vbox{\halign{$#$\hfil\cr
\<character constant>\is\.'\<single byte character except newline>\.'\cr
\<constant>\is\<decimal constant>\mid\<hex constant>\mid\<character constant>
\cr}}$$
Notice that \.{'''} represents a single quote, the code \.{\#27}; and
\.{'\\'} represents a backslash, the code \.{\#5c}. \MMIXAL~characters are
never ``quoted'' by backslashes as in the \CEE/~language.
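
As a rough illustration of these three kinds of constants, the following
\CEE/ sketch evaluates one constant from a string. It is not taken from the
assembler; the name |parse_constant| is hypothetical, and the code assumes
{\mc ASCII} input and |uint64_t| arithmetic, so that large values wrap
modulo $2^{64}$.

```c
#include <stdint.h>

/* Hypothetical parser for one constant at p; returns 1 on success. */
static int parse_constant(const char *p, uint64_t *val)
{
  uint64_t v = 0;
  if (*p == '\'') {                        /* character constant */
    if (p[1] == '\0' || p[1] == '\n' || p[2] != '\'') return 0;
    *val = (unsigned char)p[1];            /* no backslash escapes */
    return 1;
  }
  if (*p == '#') {                         /* hexadecimal constant */
    int n = 0;
    for (p++; ; p++, n++) {
      if (*p >= '0' && *p <= '9') v = (v << 4) + (uint64_t)(*p - '0');
      else if (*p >= 'a' && *p <= 'f') v = (v << 4) + (uint64_t)(*p - 'a' + 10);
      else if (*p >= 'A' && *p <= 'F') v = (v << 4) + (uint64_t)(*p - 'A' + 10);
      else break;
    }
    if (n == 0) return 0;                  /* a lone '#' is not a constant */
    *val = v;
    return 1;
  }
  if (*p < '0' || *p > '9') return 0;
  for (; *p >= '0' && *p <= '9'; p++)      /* decimal constant */
    v = 10 * v + (uint64_t)(*p - '0');
  *val = v;
  return 1;
}
```

With this sketch the strings \.{\#61}, \.{97}, and \.{'a'} all evaluate
to~97.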
 
In the present implementation
a character constant will always be at most 255, since wyde character
input is not supported.
\ifx\exotic+ But if the input were in Unicode one could write,
say, \.'{\heb\char"40}\.' or \.'{\rus ZH}\.' for \.{\#05d0} or
\.{\#0416}. \fi
The present program
does not support Unicode directly because basic software for inputting and
outputting 16-bit characters was still in a primitive state at the time of
writing. But the data structures below are designed so that a change to
Unicode will not be difficult when the time is ripe.
 
@ A {\it string constant\/} like \.{"Hello"} is an abbreviation for
a sequence of one or more character constants separated by commas:
\.{'H','e','l','l','o'}.
Any character except newline or the double quote mark~\."
can appear between the double quotes of a string constant.
\ifx\exotic+ Similarly,
\."\Uni1.08:24:24:-1:20% Unicode char "9ad8
<002000001800000806ffffff00000002004003ffe00300e00300c00300c003ffc0%
0300c02000043ffffe30000e31008c31ffcc3181cc31818c31818c31ff8c31818c3%
0007c300018>%
\thinspace\Uni1.08:24:24:-1:20% Unicode char "5fb7
<1c038018030018030631ffff30060067860446fffe86ccce0ccccc0ccccc18cccc%
18fffc38c00c38001878fffc58040098030818398618b18318b00b19b0081b300c1%
b3ffc181ff8>%
\thinspace\Uni1.08:24:24:-1:20% Unicode char "7eb3
<0601c00e01800c018018018018218231bfff61b187433186ff3186c631860c3186%
18334630332663b6367e341660380600300600300603b0061e3006f03006c030060%
0303e00300c>%
\kern.1em\." is an abbreviation for
\.'\Uni1.08:24:24:-1:20% Unicode char "9ad8
<002000001800000806ffffff00000002004003ffe00300e00300c00300c003ffc0%
0300c02000043ffffe30000e31008c31ffcc3181cc31818c31818c31ff8c31818c3%
0007c300018>%
\.{','}\Uni1.08:24:24:-1:20% Unicode char "5fb7
<1c038018030018030631ffff30060067860446fffe86ccce0ccccc0ccccc18cccc%
18fffc38c00c38001878fffc58040098030818398618b18318b00b19b0081b300c1%
b3ffc181ff8>%
\.{','}\Uni1.08:24:24:-1:20% Unicode char "7eb3
<0601c00e01800c018018018018218231bfff61b187433186ff3186c631860c3186%
18334630332663b6367e341660380600300600300603b0061e3006f03006c030060%
0303e00300c>%
\.' (namely \.{\#9ad8,\#5fb7,\#7eb3}) when Unicode is supported.
@^Unicode@>
\fi
 
@ A {\it symbol\/} in \MMIXAL\ is any sequence of letters and digits,
beginning with a letter. A~colon~`\.:' or underscore symbol `\.\_'
is regarded as a letter, for purposes of this definition.
All extended-ASCII characters like `{\tt \'e}',
whose 8-bit code exceeds 126, are also treated as letters.
$$\vbox{\halign{$#$\hfil\cr
\<letter>\is\.A\mid\.B\mid\cdots\mid\.Z\mid\.a\mid\.b\mid\cdots\mid\.z\mid
\.:\mid\.\_\mid\<{character with code value $>126$}>\cr
\<symbol>\is\<letter>\mid\<symbol>\<letter>\mid\<symbol>\<digit>\cr
}}$$
 
In future implementations, when \MMIXAL\ is used with Unicode,
@^Unicode@>
all wyde characters whose 16-bit code exceeds 126 will be regarded
as letters; thus \MMIXAL\ symbols will be able to involve Greek letters or
Chinese characters or thousands of other glyphs.

@ A symbol is said to
be {\it fully qualified\/} if it begins with a colon. Every symbol
that is not fully qualified is an abbreviation for the fully qualified
symbol obtained by placing the {\it current prefix\/} in front of it;
the current prefix is always fully qualified. At the beginning of an
\MMIXAL\ program the current prefix is simply the single character~`\.:',
but the user can change it with the \.{PREFIX} command. For example,
$$\vbox{\halign{&\quad\tt#\hfil\cr
ADD&x,y,z&\% means ADD :x,:y,:z\cr
PREFIX&Foo:&\% current prefix is :Foo:\cr
ADD&x,y,z&\% means ADD :Foo:x,:Foo:y,:Foo:z\cr
PREFIX&Bar:&\% current prefix is :Foo:Bar:\cr
ADD&:x,y,:z&\% means ADD :x,:Foo:Bar:y,:z\cr
PREFIX&:&\% current prefix reverts to :\cr
ADD&x,Foo:Bar:y,Foo:z&\% means ADD :x,:Foo:Bar:y,:Foo:z\cr
}}$$
This mechanism allows large programs to avoid conflicts between symbol names,
when parts of the program are independent and/or written by different users.
The current prefix conventionally ends with a colon, but this convention
need not be obeyed.
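
The qualification rule can be sketched in \CEE/ as follows. This is only
an illustration under stated assumptions, not the code that \MMIXAL\ itself
uses; the name |qualify| and its buffer interface are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the fully qualified form of sym into out.
   A symbol that begins with a colon is already fully qualified;
   otherwise the current prefix is placed in front of it.
   Returns 1 if the result fit into the buffer, 0 otherwise. */
int qualify(const char *prefix, const char *sym, char *out, size_t size)
{
    int n;
    assert(prefix[0] == ':'); /* the current prefix is always fully qualified */
    if (sym[0] == ':')
        n = snprintf(out, size, "%s", sym);
    else
        n = snprintf(out, size, "%s%s", prefix, sym);
    return n >= 0 && (size_t)n < size;
}
```

The operand of \.{PREFIX} can be qualified by the same function, which is
consistent with the table above: `\.{PREFIX}~\.{Bar:}' under the prefix
\.{:Foo:} yields \.{:Foo:Bar:}, while `\.{PREFIX}~\.{:}' yields the
single colon again.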
 
@ A {\it local symbol\/} is a decimal digit followed by one of the
letters \.B, \.F, or~\.H, meaning ``backward,'' ``forward,'' or ``here'':
$$\vbox{\halign{$#$\hfill\cr
\<local operand>\is\<digit>\,\.B\mid\<digit>\,\.F\cr
\<local label>\is\<digit>\,\.H\cr
}}$$
The \.B and \.F forms are permitted only in the operand field of \MMIXAL\
instructions; the \.H form is permitted only in the label field. A local
operand such as~\.{2B} stands for the last local label~\.{2H}
in instructions before the current one, or 0 if \.{2H} has not yet appeared
as a label. A~local operand such as~\.{2F} stands
for the first \.{2H} in instructions after the current one. Thus, in a
sequence such as
$$\vbox{\halign{\tt#\cr 2H JMP 2F\cr 2H JMP 2B\cr}}$$
the first instruction jumps to the second and the second jumps to the first.
 
Local symbols are useful for references to nearby points of a program, in
cases where no meaningful name is appropriate. They can also be useful
in special situations where a redefinable symbol is needed; for example,
an instruction like
$$\.{9H IS 9B+1}$$
will maintain a running counter.
 
@ Each symbol receives a value called its {\it equivalent\/} when it
appears in the label field of an instruction; it is said to be {\it defined\/}
after its equivalent has been established. A few symbols, like \.{rA}
and \.{ROUND\_OFF} and \.{Fopen},
are predefined because they refer to fixed constants
associated with the \MMIX\ hardware or its rudimentary operating system;
otherwise every symbol should be
defined exactly once. The two appearances of `\.{2H}' in the example
above do not violate this rule, because the second `\.{2H}' is not the
same symbol as the first.
 
A predefined symbol can be redefined (given a new equivalent). After it
has been redefined it acts like an ordinary symbol and cannot be
redefined again. A complete list of the predefined symbols appears
in the program listing below.
@^predefined symbols@>
 
Equivalents are either {\it pure\/} or {\it register numbers}. A pure
equivalent is an unsigned octabyte, but a register number
equivalent is a one-byte value, between 0 and~255.
A dollar sign is used to change a pure number into a register number;
for example, `\.{\$20}' means register number~20.
 
@ Constants and symbols are combined into {\it expressions\/} in a simple way:
$$\vbox{\halign{$#$\hfil\cr
\<primary expression>\is\<constant>\mid\<symbol>\mid\<local operand>\mid
\.{@@}\mid\cr
\hskip12pc\.(\<expression>\.)\mid\<unary operator>\<primary expression>\cr
\<term>\is\<primary expression>\mid
\<term>\<strong operator>\<primary expression>\cr
\<expression>\is\<term>\mid\<expression>\<weak operator>\<term>\cr
\<unary operator>\is\.+\mid\.-\mid\.\~\mid\.\$\mid\.\&\cr
\<strong operator>\is\.*\mid\./\mid\.{//}\mid\.\%\mid\.{<<}\mid\.{>>}
\mid\.\&\cr
\<weak operator>\is\.+\mid\.-\mid\.{\char'174}\mid\.\^\cr
}}$$
Each expression has a value that is either pure or a register number.
The character \.{@@} stands for the current location, which is always pure.
The unary operators
\.+, \.-, \.\~, \.\$, and \.\& mean, respectively, ``do nothing,''
``subtract from zero,'' ``complement the bits,'' ``change from pure value
to register number,'' and ``take the serial number.'' Only the first of these,
\.+, can be applied to a register number. The last unary operator, \.\&,
applies only to symbols, and it is of interest primarily to system programmers;
it converts a symbol to the unique positive integer that is used to identify
it in the binary file output by \MMIXAL.
@^serial number@>
 
Binary operators come in two flavors, strong and weak. The strong ones
are essentially concerned with multiplication or division: \.{x*y},
\.{x/y}, \.{x//y}, \.{x\%y}, \.{x<<y}, \.{x>>y}, and \.{x\&y}
stand respectively for
$(x\times y)\bmod2^{64}$ (multiplication), $\lfloor x/y\rfloor$ (division),
$\lfloor2^{64}x/y\rfloor$ (fractional division), $x\bmod y$ (remainder),
$(x\times2^y)\bmod2^{64}$ (left~shift), $\lfloor x/2^y\rfloor$
(right shift), and $x\land y$ (bitwise and) on unsigned octabytes.
Division is legal only if $y>0$; fractional division is
legal only if $x<y$. None of the strong binary operations can be
applied to register numbers.
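
Fractional division is the least familiar of these operations, so a small
\CEE/ sketch may help. It computes $\lfloor2^{64}x/y\rfloor$ by binary long
division, one quotient bit per step, which avoids 128-bit arithmetic.
The name |frac_div| is hypothetical, and this is not necessarily how
\MMIXAL\ evaluates \.{x//y}.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of x//y: floor(2^64 * x / y) on unsigned octabytes, for x < y.
   Invariant: r < y at the top of every iteration. */
uint64_t frac_div(uint64_t x, uint64_t y)
{
    uint64_t q = 0, r = x;
    int i;
    assert(x < y); /* fractional division is legal only if x < y */
    for (i = 0; i < 64; i++) {
        int carry = (int)(r >> 63); /* the bit that 2r shifts out */
        r <<= 1;
        q <<= 1;
        if (carry || r >= y) { /* the true value of 2r is at least y */
            r -= y;            /* correct even when 2r overflowed */
            q += 1;
        }
    }
    return q;
}
```

For instance, |frac_div(1,2)| is \Hex{8000000000000000}, the binary
fraction $.1$ scaled by $2^{64}$.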
 
The weak binary operations \.{x+y}, \.{x-y}, \.{x\char'174 y}, and
\.{x\^y} stand respectively for $(x+y)\bmod2^{64}$ (addition),
$(x-y)\bmod2^{64}$ (subtraction),
$x\lor y$ (bitwise or), and $x\oplus y$ (bitwise exclusive-or) on
unsigned octabytes. These operations can be applied to register
numbers only in four contexts: $\<register>+\<pure>$, $\<pure>+\<register>$,
$\<register>-\<pure>$
and $\<register>-\<register>$. For example, if \.{x} denotes \.{\$1} and
\.{y} denotes \.{\$10}, then \.{x+3} and \.{3+x} denote \.{\$4}, and
\.{y-x} denotes the pure value \.{9}.
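
The four legal contexts for addition and subtraction can be captured by
tagging each value as pure or register. The \CEE/ sketch below does this;
the type |value| and the functions |weak_add| and |weak_sub| are hypothetical
names chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value type: an octabyte tagged as pure or register number. */
typedef struct {
    uint64_t val;
    bool is_reg;
} value;

/* x+y: legal unless both operands are register numbers. */
bool weak_add(value x, value y, value *result)
{
    assert(result != NULL);
    if (x.is_reg && y.is_reg) return false;
    result->val = x.val + y.val; /* wraps around mod 2^64 */
    result->is_reg = x.is_reg || y.is_reg;
    return true;
}

/* x-y: pure minus register is illegal;
   register minus register yields a pure value. */
bool weak_sub(value x, value y, value *result)
{
    assert(result != NULL);
    if (!x.is_reg && y.is_reg) return false;
    result->val = x.val - y.val; /* wraps around mod 2^64 */
    result->is_reg = x.is_reg && !y.is_reg;
    return true;
}
```

With \.{x} as \.{\$1} and \.{y} as \.{\$10}, |weak_add| gives \.{\$4}
for \.{x+3}, and |weak_sub| gives the pure value~9 for \.{y-x}.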
 
Register numbers within expressions are allowed to be
arbitrary octabytes, but a register number assigned as the
equivalent of a symbol should not exceed 255.
 
(Incidentally, one might ask why the designer of \MMIXAL\ did not simply
adopt the existing rules of \CEE/ for expressions. The primary reason is that
the designers of \CEE/ chose to give \.{<<}, \.{>>}, and \.\& a lower
precedence than~\.+; but in \MMIXAL\ we want to be able to write things
like \.{o<<24+x<<16+y<<8+z} or \.{@@+yz<<2} or \.{@@+(\#100-@@)\&\#ff}.
Since the conventions of \CEE/ were inappropriate, it was better
to make a clean break, not pretending to have a close relationship
with that language. The new rules are quite easily memorized,
because \MMIXAL\ has just two levels of precedence, and the strong binary
operations are all essentially multiplicative by nature
while the weak binary operations are essentially additive.)
 
@ A symbol is called a {\it future reference\/} until it has been defined.
\MMIXAL\ restricts the use of future references, so that programs can
be assembled quickly in one pass over the input; therefore all
expressions can be evaluated when the \MMIXAL\ processor first sees them.
 
The restrictions are easily stated: Future references
cannot be used in expressions together with unary or binary operators (except
the unary~\.+, which does nothing); moreover, future references
can appear as operands only in instructions that have relative
addresses (namely branches, probable branches, \.{JMP}, \.{PUSHJ},
\.{GETA}) or in octabyte constants (the pseudo-operation \.{OCTA}).
Thus, for example, one can say \.{JMP}~\.{1F} or \.{JMP}~\.{1B-4}, but not
\.{JMP}~\.{1F-4}.
 
@ We noted earlier that each \MMIXAL\ instruction contains
a label field, an opcode field, and an operand field. The label field is
either empty or a symbol or local label; when it is nonempty, the
symbol or local label receives an equivalent. The operand field is
either empty or a sequence of expressions separated by commas; when
it is empty, it is equivalent to the simple operand field~`\.0'.
$$\vbox{\halign{$#$\hfil\cr
\<instruction>\is\<label>\<opcode>\<operand list>\cr
\<label>\is\<empty>\mid\<symbol>\mid\<local label>\cr
\<operand list>\is\<empty>\mid\<expression list>\cr
\<expression list>\is\<expression>\mid\<expression list>\.,\<expression>\cr
}}$$
 
The opcode field either contains a symbolic \MMIX\ operation name (like
\.{ADD}), or an {\it alias operation}, or a {\it pseudo-operation}.
Alias operations are alternate names for \MMIX\ operations whose standard
names are inappropriate in certain contexts.
Pseudo-operations do not correspond
directly to \MMIX\ commands, but they govern the assembly process in
important ways.
 
There are two alias operations:
 
\bull \.{SET} \.{\$X,\$Y} is equivalent to \.{OR} \.{\$X,\$Y,0}; it sets
register~X to register~Y. Similarly, \.{SET} \.{\$X,Y} (when \.Y is
not a register) is equivalent to \.{SETL} \.{\$X,Y}.
@.SET@>
 
\bull \.{LDA} \.{\$X,\$Y,\$Z} is equivalent to \.{ADDU} \.{\$X,\$Y,\$Z};
it loads the address of memory location $\rm \$Y+\$Z$ into register~X.
Similarly, \.{LDA} \.{\$X,\$Y,Z} is equivalent to \.{ADDU} \.{\$X,\$Y,Z}.
@.LDA@>
 
\smallskip
The symbolic operation names for genuine \MMIX\ operations
should not include the suffix~\.I for an immediate operation or the suffix~\.B
for a backward jump; \MMIXAL\ determines such things automatically.
Thus, one never writes \.{ADDI} or \.{JMPB} in the source input to
\MMIXAL, although such opcodes might appear when a simulator or
debugger or disassembler is presenting a numeric instruction in symbolic form.
$$\vbox{\halign{$#$\hfil\cr
\<opcode>\is\<symbolic \MMIX\ operation>\mid\<alias operation>\cr
\hskip12pc\mid\<pseudo-operation>\cr
\<symbolic \MMIX\ operation>\is\.{TRAP}\mid\.{FCMP}\mid\cdots\mid\.{TRIP}\cr
\<alias operation>\is\.{SET}\mid\.{LDA}\cr
\<pseudo-operation>\is\.{IS}\mid\.{LOC}\mid\.{PREFIX}\mid
\.{GREG}\mid\.{LOCAL}\mid\.{BSPEC}\mid\.{ESPEC}\cr
\hskip12pc\mid\.{BYTE}\mid\.{WYDE}\mid\.{TETRA}\mid\.{OCTA}\cr
}}$$
 
@ \MMIX\ operations like \.{ADD} require exactly three expressions as
operands. The first two must be register numbers. The third must be either a
register number or a pure number between 0 and~255; in the latter case,
\.{ADD} becomes \.{ADDI} in the assembled output. Thus, for example,
the command ``set register~1 to the sum of register~2 and register~3'' could be
expressed as
$$\.{ADD \$1,\$2,\$3}$$
or as, say,
$$\.{ADD x,y,y+1}$$
if the equivalent of \.x is \.{\$1} and the equivalent of \.y is \.{\$2}.
The command ``subtract 5 from register~1'' could be expressed as
$$\.{SUB \$1,\$1,5}$$
or as
$$\.{SUB x,x,5}$$
but not as `\.{SUBI} \.{\$1,\$1,5}' or `\.{SUBI} \.{x,x,5}'.
 
\MMIX\ operations like \.{FLOT} require either three operands
(register, pure, register/pure) or only two (register, register/pure).
In the first case the middle operand is the rounding mode, which is
best expressed in terms of the predefined symbolic values
\.{ROUND\_CURRENT}, \.{ROUND\_OFF}, \.{ROUND\_UP}, \.{ROUND\_DOWN},
\.{ROUND\_NEAR}, for $(0,1,2,3,4)$ respectively. In the second case
the middle operand is understood to be zero (namely,
\.{ROUND\_CURRENT}).
@:ROUND_OFF}\.{ROUND\_OFF@>
@:ROUND_UP}\.{ROUND\_UP@>
@:ROUND_DOWN}\.{ROUND\_DOWN@>
@:ROUND_NEAR}\.{ROUND\_NEAR@>
@:ROUND_CURRENT}\.{ROUND\_CURRENT@>
 
\MMIX\ operations like \.{SETL} or \.{INCH}, which involve a wyde
immediate constant, require exactly two operands, (register, pure).
The value of the second operand should fit in two bytes.
 
\MMIX\ operations like \.{BNZ}, which mention a register and a
relative address, also require two operands. The first operand
should be a register number. The second operand should yield a result~$r$
in the range $-2^{16}\le r<2^{16}$ when the current location is subtracted
from it and the result is divided by~4. The second operand might also
be undefined; in that case, the eventual value must satisfy the
restriction stated for defined values. The opcodes \.{GETA} and
\.{PUSHJ} are similar, except that the first operand to \.{PUSHJ}
might also be pure (see below). The \.{JMP} operation is also
similar, but it has only one operand, and it allows the larger
address range $-2^{24}\le r<2^{24}$.
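
The range test for relative addresses can be sketched in \CEE/ as follows.
The function name |relative_offset| is hypothetical, and the sketch assumes
that both addresses are multiples of~4 and that conversion to a signed type
behaves in the usual two's complement way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compute r = (target - loc)/4 and test -2^bits <= r < 2^bits,
   where bits is 16 for branches, GETA, and PUSHJ, and 24 for JMP.
   Assumption of this sketch: both addresses are multiples of 4. */
bool relative_offset(uint64_t loc, uint64_t target, int bits, int64_t *r)
{
    int64_t diff, limit = (int64_t)1 << bits;
    assert(loc % 4 == 0 && target % 4 == 0);
    diff = (int64_t)(target - loc); /* signed distance in bytes */
    *r = diff / 4;                  /* signed distance in tetrabytes */
    return -limit <= *r && *r < limit;
}
```

A target 11 tetrabytes before the current location gives $r=-11$, which is
acceptable to every opcode of this kind; a target $2^{16}$ tetrabytes ahead
is acceptable only to \.{JMP}.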
 
\MMIX\ operations that refer to memory, like \.{LDO} and \.{STHT} and \.{GO},
are treated like \.{ADD}
if they have three operands, except that the first operand should be
pure (not a register number) in the case of \.{PRELD}, \.{PREGO},
\.{PREST}, \.{STCO}, \.{SYNCD}, and \.{SYNCID}. These opcodes
also accept a special two-operand form, in which the second operand
stands for a {\it base address\/} and an immediate offset (see below).
 
The first operand of \.{PUSHJ} and \.{PUSHGO} can be either a pure
number or a register number. In the first case (`\.{PUSHJ}~\.{2,Sub}'
or `\.{PUSHGO}~\.{2,Sub}')
the programmer might be thinking ``let's push down two registers'';
in the second case (`\.{PUSHJ}~\.{\$2,Sub}' or `\.{PUSHGO}~\.{\$2,Sub}')
the programmer might be thinking ``let's make register~2 the hole
position for this subroutine call.'' Both cases result in the same
assembled output.
 
The remaining \MMIX\ opcodes are idiosyncratic:
$$\def\\{{\rm\quad or\quad}}
\vbox{\halign{\tt#\hfill\cr
NEG r,p,z;\cr
PUT s,z;\cr
GET r,s;\cr
POP p,yz;\cr
RESUME xyz;\cr
SAVE r,0;\cr
UNSAVE r;\cr
SYNC xyz;\cr
TRAP x,y,z\\TRAP x,yz\\TRAP xyz;\cr
}}$$
\.{SWYM} and \.{TRIP} are like \.{TRAP}. Here \.s is an integer
between 0 and~31, preferably given by one of the predefined
symbols \.{rA}, \.{rB}, \dots~for special register codes;
\.r is a register number; \.p is a pure byte; \.x, \.y, and \.z are
either register numbers or pure bytes; \.{yz} and \.{xyz} are pure
values that fit respectively in two and three bytes.
 
All of these rules can be summarized by saying that \MMIXAL\ treats each
\MMIX\ opcode in the most natural way. When there are three operands,
they affect fields X,~Y, and~Z of the assembled \MMIX\ instruction;
when there are two operands, they affect fields X and~YZ;
when there is just one operand, it affects field XYZ.
 
@ In all cases when the opcode corresponds to an \MMIX\ operation,
the \MMIXAL\ instruction tells the assembler to carry out four steps:
(1)~Align the current location
so that it is a multiple of~4, by adding 1, 2, or~3 if necessary;
(2)~Define the equivalent of the label field to be the
current location, if the label is nonempty;
(3)~Evaluate the operands and assemble the specified \MMIX\ instruction into
the current location;
(4)~Increase the current location by~4.
 
@ Now let's consider the pseudo-operations, starting with the simplest cases.
 
\bull\<label> \.{IS} \<expression>
defines the value of the label to be the value of the expression,
which must not be a future reference. The expression may be
either pure or a register number.
 
\bull\<label> \.{LOC} \<expression>
first defines the label to be the value of the current location, if the label
is nonempty. Then the current location is changed to the value of the
expression, which must be pure.
 
\smallskip For example, `\.{LOC} \.{\#1000}' will start assembling subsequent
instructions or data at the location whose hexa\-decimal value is \Hex{1000}.
`\.X~\.{LOC}~\.{@@+500}' defines \.X to be the address of the first
of 500 bytes in memory; assembly will continue at location $\.X+500$.
The operation of aligning the current location to a multiple of~256,
if it is not already aligned in that way, can be expressed as
`\.{LOC}~\.{@@+(256-@@)\&255}'.
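
The alignment expression is easy to check in \CEE/, provided that the
difference in precedence is taken into account: \MMIXAL\ treats \.\& as a
strong operator, so \.{@@+(256-@@)\&255} needs an extra pair of parentheses
when transcribed. The function name |align256| is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Transcription of the MMIXAL alignment expression.  The parentheses
   around the masked term are required in C, where & binds less
   tightly than +. */
uint64_t align256(uint64_t loc)
{
    uint64_t aligned = loc + ((256 - loc) & 255);
    assert(aligned % 256 == 0);
    return aligned;
}
```

An aligned location is left unchanged, because $(256-\lambda)\land255$ is
zero whenever $\lambda$ is a multiple of~256.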
 
A less trivial example arises if we want to emit instructions and data into
two separate areas of memory, but we want to intermix them in the
\MMIXAL\ source file. We could start by defining \.{8H} and \.{9H}
to be the starting addresses of the instruction and data segments,
respectively. Then, a sequence of instructions could be enclosed
in `\.{LOC}~\.{8B}; \dots; \.{8H}~\.{IS}~\.{@@}'; a sequence of
data could be enclosed in `\.{LOC}~\.{9B}; \dots; \.{9H}~\.{IS}~\.{@@}'.
Any number of such sequences could then be combined.
Instead of the two pseudo-instructions `\.{8H}~\.{IS}~\.{@@;} \.{LOC}~\.{9B}'
one could in fact write simply `\.{8H}~\.{LOC}~\.{9B}' when
switching from instructions to data.
 
\bull \.{PREFIX} \<symbol>
redefines the current prefix to be the given symbol (fully qualified).
The label field should be blank.
 
@ The next pseudo-operations assemble bytes, wydes, tetrabytes, or
octabytes of data.
 
\bull \<label> \.{BYTE} \<expression list>
defines the label to be the current location, if the label field is nonempty;
then it assembles one byte for each expression in the expression list, and
advances the current location by the number of bytes. The expressions
should all be pure numbers that fit in one byte.
 
String constants are often used in such expression lists.
For example, if the current location is \Hex{1000}, the instruction
\.{BYTE}~\.{"Hello",0} assembles six bytes containing the constants
\.{'H'}, \.{'e'}, \.{'l'}, \.{'l'}, \.{'o'}, and~\.0 into locations
\Hex{1000}, \dots,~\Hex{1005}, and advances the current location
to \Hex{1006}.
 
\bull \<label> \.{WYDE} \<expression list>
is similar, but it first makes the current location even, by adding~1 to it
if necessary. Then it defines the label (if a nonempty label is present),
and assembles each expression as a two-byte value. The current location
is advanced by twice the number of expressions in the list. The
expressions should all be pure numbers that fit in two bytes.
 
\bull \<label> \.{TETRA} \<expression list>
is similar, but it aligns the current location to a multiple of~4
before defining the label; then it
assembles each expression as a four-byte value. The current location
is advanced by $4n$ if there are $n$~expressions in the list. Each
expression should be a pure number that fits in four bytes.
 
\bull \<label> \.{OCTA} \<expression list>
is similar, but it first aligns the current location to a multiple of~8;
it assembles each expression as an eight-byte value. The current location
is advanced by $8n$ if there are $n$~expressions in the list. Any or all
of the expressions may be future references, but they should all
be defined as pure numbers eventually.
 
@ Global registers are important for accessing memory in \MMIX\ programs.
They could be allocated by hand, and defined with \.{IS} instructions,
but \MMIXAL\ provides a mechanism that is usually much more convenient:
 
\bull \<label> \.{GREG} \<expression>
allocates a new global register, and assigns its number as the
equivalent of the label.
At the beginning of assembly, the current global threshold~G is~\$255.
Each distinct \.{GREG} instruction decreases~G by~1; the final value of~G will
be the initial value of~rG when the assembled program is loaded.
 
The value of the expression will be loaded into the global register
at the beginning of the program. {\it If this value is nonzero, it
should remain constant throughout the program execution\/}; such
global registers are considered to be {\it base addresses}. Two or
more base addresses with the same constant value are assigned to the
same global register number.
 
Base addresses can simplify memory accesses in an important way.
Suppose, for example, five octabyte values appear in a data segment,
and their addresses are called \.{AA}, \.{BB}, \.{CC}, \.{DD}, and
\.{EE}:
$$\.{AA LOC @@+8;BB LOC @@+8;CC LOC @@+8;DD LOC @@+8;EE LOC @@+8}$$
Then if you say \.{Base GREG AA}, you will be able to write simply
`\.{LDO}~\.{\$1,AA}' to bring \.{AA} into register~\.{\$1}, and
`\.{LDO}~\.{\$2,CC}' to bring \.{CC} into register~\.{\$2}.
 
Here's how it works: Whenever a memory operation such as
\.{LDO} or \.{STB} or \.{GO} has only two operands, the second
operand should be a pure number whose value can be expressed
as $b+\delta$, where $0\le\delta<256$ and $b$ is the value of
a base address in one of the preceding \.{GREG} commands. The \MMIXAL\
processor will find the closest base address and manufacture an
appropriate command. For example, the instruction `\.{LDO}~\.{\$2,CC}' in the
example of the preceding paragraph would be converted automatically to
`\.{LDO}~\.{\$2,Base,16}'.
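
The search for the closest base address might look like this in \CEE/.
The sketch assumes a table that lists only the \.{GREG} values that qualify
as base addresses; the names |base_entry| and |find_base| are hypothetical,
and the actual \MMIXAL\ data structures may be quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one GREG base address. */
typedef struct {
    uint64_t base; /* constant value loaded into the register */
    int reg;       /* global register number */
} base_entry;

/* Find the base address b closest to addr with 0 <= addr-b < 256.
   On success store the register number and the offset delta. */
bool find_base(const base_entry *table, size_t n, uint64_t addr,
               int *reg, int *delta)
{
    bool found = false;
    uint64_t best = 256;
    size_t i;
    assert(table != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* unsigned subtraction wraps to a huge value if base > addr */
        uint64_t d = addr - table[i].base;
        if (d < best) {
            best = d;
            *reg = table[i].reg;
            found = true;
        }
    }
    if (found) *delta = (int)best;
    return found;
}
```

If \.{Base} is \.{\$254} with the value of \.{AA}, a lookup of \.{CC}
succeeds with register~254 and offset~16; a lookup of an address 1000 bytes
past \.{AA} fails, which corresponds to the error case discussed next.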
 
If no base address is close enough, an error message will be
generated, unless this program is run with the \.{-x} option
on the command line. The \.{-x} option inserts additional instructions
if necessary, using global register~255, so that any address is
accessible. For example,
if there is no base address that allows \.{LDO}~\.{\$2,FF} to be
implemented in a single instruction, but if \.{FF} equals \.{Base+1000},
then the \.{-x} option would assemble two instructions,
$$\.{SETL \$255,1000; LDO \$2,Base,\$255}$$
in place of \.{LDO}~\.{\$2,FF}. Caution:~The \.{-x} feature makes the
number of actual \MMIX\ instructions hard to predict, so extreme care must
be used if your style of coding includes relative branch instructions
in dangerous forms like `\.{BNZ}~\.{x,@@+8}'.
 
This base address convention can be used also with the alias
operation~\.{LDA}. For example, `\.{LDA}~\.{\$3,CC}' loads the
@.LDA@>
address of \.{CC} into register~3, by assembling the instruction
`\.{ADDU}~\.{\$3,Base,16}'.
 
\MMIXAL\ also allows a two-operand form for memory operations such as
$$\hbox{\.{LDO} \.{\$1,\$2}}$$
to be an abbreviation for `\.{LDO} \.{\$1,\$2,0}'.
 
When \MMIXAL\ programs use subroutines with a memory stack in addition
to the built-in register stack, they usually begin with the
instructions `\.{sp}~\.{GREG}~\.{0;fp}~\.{GREG}~\.0'; these instructions
allocate a {\it stack pointer\/} \.{sp=\$254} and a {\it frame pointer\/}
\.{fp=\$253}. However, subroutine libraries are free to implement any
conventions for global registers and stacks that they like.
@^stack pointer@>
@^frame pointer@>
 
@ Short programs rarely run out of global registers, but long programs
need a mechanism to check that \.{GREG} hasn't been used too often.
The following pseudo-instruction provides the necessary safety valve:
 
\bull \.{LOCAL} \<expression>
ensures that the expression will be a local register in the program
being assembled. The expression should be a register number, and
the label field should be blank. At the close of
assembly, \MMIXAL\ will report an error if the final value of~G does
not exceed all register numbers that are declared local in this way.
 
A \.{LOCAL} instruction need not be given unless the register number
is 32 or~more. (\MMIX\ always considers \.{\$0} through \.{\$31} to be
local, so \MMIXAL\ implicitly acts as if the
instruction `\.{LOCAL}~\.{\$31}' were present.)
 
@ Finally, there are two pseudo-instructions to pass information
and hints to the loading routine and/or to debuggers that will be
using the assembled program.
 
\bull \.{BSPEC} \<expression>
begins ``special mode''; the \<expression> should have a value that
fits in two bytes, and the label field should be blank.
 
\bull \.{ESPEC}
ends ``special mode''; the operand field is ignored, and the label
field should be blank.
 
\smallskip\noindent
All material assembled between \.{BSPEC} and \.{ESPEC} is passed
directly to the output, but not loaded as part of the assembled program.
Ordinary \MMIX\ instructions cannot appear in special mode; only the
pseudo-operations \.{IS}, \.{PREFIX}, \.{BYTE}, \.{WYDE}, \.{TETRA},
\.{OCTA}, \.{GREG}, and \.{LOCAL} are allowed. The operand of
\.{BSPEC} should have a value that fits in two bytes; this value
identifies the kind of data that follows. (For example, \.{BSPEC}~\.0
might introduce information about subroutine calling conventions at the
current location, and \.{BSPEC}~\.1 might introduce line numbers from
a high-level-language program that was compiled into the code at
the current place.
System routines often need to pass such information through an assembler
to the operating system, hence \MMIXAL\ provides a general-purpose conduit.)
 
@ A program should begin at the special symbolic location \.{Main}
@.Main@>
(more precisely, at the address corresponding to
the fully qualified symbol \.{:Main}).
This symbol always has serial number~1, and it must always be defined.
@^serial number@>
 
Locations should not receive assembled data more than once.
(More precisely, the loader will load the bitwise~xor of all the
data assembled for each byte position; but the general rule ``do not load
two things into the same byte'' is safest.)
All locations that do not receive assembled data are initially zero,
except that the loading routine will put register stack data into
segment~3, and the operating system may put command-line data and
debugger data into segment~2.
(The rudimentary \MMIX\ operating system starts a program
with the number of command-line arguments in~\$0, and a pointer to
the beginning of an array of argument pointers in~\$1.)
Segments 2 and 3 should not get assembled data, unless the
user is a true hacker who is willing to take the risk that such data
might crash the system.
 
@* Binary MMO output. When the \MMIXAL\ processor assembles a file
called \.{foo.mms}, it produces a binary output file called \.{foo.mmo}.
(The suffix \.{mms} stands for ``\MMIX\ symbolic,'' and \.{mmo} stands
for ``\MMIX\ object.'') Such \.{mmo} files have a simple structure
consisting of a sequence of tetrabytes. Some of the tetrabytes are
instructions to a loading routine; others are data to be loaded.
@^object files@>
 
Loader instructions are distinguished from tetrabytes of data by their
first (most significant) byte, which has the special escape-code value
\Hex{98}, called |mm| in the program below. This code value corresponds
to \MMIX's opcode \.{LDVTS}, which is unlikely to occur in tetras of
data. The second byte~X of a loader instruction is the loader opcode,
called the {\it lopcode}. The third and fourth bytes, Y~and~Z, are
operands. Sometimes they are combined into a single 16-bit operand called~YZ.
@^lopcodes@>
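
The layout of a loader instruction can be made concrete with a few shifts
and masks. The \CEE/ sketch below is only an illustration; the names
|ESCAPE|, |lop_fields|, and |split_tetra| are hypothetical and do not occur
in the program itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESCAPE 0x98 /* the escape code, called mm in the program */

/* Hypothetical decoded form of one tetrabyte of an mmo file. */
typedef struct {
    bool is_lop;   /* true if the first byte is the escape code */
    unsigned x;    /* second byte: the lopcode */
    unsigned y, z; /* third and fourth bytes: one-byte operands */
    unsigned yz;   /* the same two bytes as a single 16-bit operand */
} lop_fields;

lop_fields split_tetra(uint32_t tet)
{
    lop_fields f;
    f.is_lop = (tet >> 24) == ESCAPE;
    f.x = (tet >> 16) & 0xff;
    f.y = (tet >> 8) & 0xff;
    f.z = tet & 0xff;
    f.yz = tet & 0xffff;
    assert(f.yz == f.y * 256 + f.z);
    return f;
}
```

For example, the tetrabyte \Hex{98012001} splits into lopcode~1 with
$\rm Y=\Hex{20}$ and $\rm Z=1$. A loader would apply this test only where
it expects an instruction or data, not to a tetrabyte that has been quoted.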
 
@d mm 0x98
 
@ A small, contrived example will help explain the basic ideas of \.{mmo}
format. Consider the following input file, called \.{test.mms}:
$$\obeyspaces\vbox{\halign{\tt#\hfil\cr
\% A peculiar example of MMIXAL\cr
\ LOC Data\_Segment \% location \#2000000000000000\cr
\ OCTA 1F \% a future reference\cr
a GREG @@ \% \$254 is base address for ABCD\cr
ABCD BYTE "ab" \% two bytes of data\cr
\ LOC \#123456789 \% switch to the instruction segment\cr
Main JMP 1F \% another future reference\cr
\ LOC @@+\#4000 \% skip past 16384 bytes\cr
2H LDB \$3,ABCD+1 \% use the base address\cr
\ BZ \$3,1F; TRAP \% and refer to the future again\cr
\# 3 "foo.mms" \% this comment is a line directive\cr
\ LOC 2B-4*10 \% move 10 tetras before previous location\cr
1H JMP 2B \% resolve previous references to 1F\cr
\ BSPEC 5 \% begin special data of type 5\cr
\ TETRA {\AM}a<<8 \% four bytes of special data\cr
\ WYDE a-\$0 \% two more bytes of special data\cr
\ ESPEC \% end a special data packet\cr
\ LOC ABCD+2 \% resume the data segment\cr
\ BYTE "cd",\#98 \% assemble three more bytes of data\cr
}}$$
It defines a silly program that essentially puts \.{'b'} into register~3;
the program halts when it gets to an all-zero \.{TRAP} instruction
following the~\.{BZ}. But the assembled output of this file illustrates most
of the features of \MMIX\ objects, and in fact \.{test.mms} was the
first test file tried by the author when the \MMIXAL\ processor was originally
written.
 
The binary output file \.{test.mmo} assembled from \.{test.mms} consists
of the following tetrabytes, shown in hexadecimal notation with brief
comments. Fuller explanations
appear with the descriptions of individual lopcodes below.
$$
\halign{\hskip.5in\tt#&\quad#\hfil\cr
98090101&|lop_pre| $1,1$ (preamble, version 1, 1 tetra)\cr
36f4a363&(the file creation time)\cr
% Sat Mar 20 23:44:35 1999
98012001&|lop_loc| $\Hex{20},1$ (data segment, 1 tetra)\cr
00000000&(low tetrabyte of address in data segment)\cr
00000000&(high tetrabyte of \.{OCTA} \.{1F})\cr
00000000&(low tetrabyte, will be fixed up later)\cr
61620000&(\.{"ab"}, padded with trailing zeros)\cr
\noalign{\penalty-200}
98010002&|lop_loc| $0,2$ (instruction segment, 2 tetras)\cr
00000001&(high tetrabyte of address in instruction segment)\cr
2345678c&(low tetrabyte of address, after alignment)\cr
98060002&|lop_file| $0,2$ (file name 0, 2 tetras)\cr
74657374&(\.{"test"})\cr
2e6d6d73&(\.{".mms"})\cr
98070007&|lop_line| 7 (line 7 of the current file)\cr
f0000000&(\.{JMP} \.{1F}, will be fixed up later)\cr
98024000&|lop_skip| \Hex{4000} (advance 16384 bytes)\cr
98070009&|lop_line| 9 (line 9 of the current file)\cr
8103fe01&(\.{LDB} \.{\$3,a,1}, uses base address \.a)\cr
42030000&(\.{BZ} \.{\$3,1F}, will be fixed later)\cr
9807000a&|lop_line| 10 (stay on line 10)\cr
00000000&(\.{TRAP})\cr
98010002&|lop_loc| $0,2$ (instruction segment, 2 tetras)\cr
00000001&(high tetrabyte of address in instruction segment)\cr
2345a768&(low tetrabyte of address \.{1H})\cr
98050010&|lop_fixrx| 16 (fix 16-bit relative address)\cr
0100fff5&(fixup for location \.{@@-4*-11})\cr
98040ff7&|lop_fixr| \Hex{ff7} (fix \.{@@-4*\#ff7})\cr
98032001&|lop_fixo| $\Hex{20},1$ (data segment, 1 tetra)\cr
00000000&(low tetrabyte of data segment address to fix)\cr
98060102&|lop_file| $1,2$ (file name 1, 2 tetras)\cr
666f6f2e&(\.{"foo."})\cr
6d6d7300&(\.{"mms",0})\cr
98070004&|lop_line| 4 (line 4 of the current file)\cr
f000000a&(\.{JMP} \.{2B})\cr
98080005&|lop_spec| 5 (begin special data of type 5)\cr
00000200&(\.{TETRA} \.{\&a<<8})\cr
00fe0000&(\.{WYDE} \.{a-\$0})\cr
98012001&|lop_loc| $\Hex{20},1$ (data segment, 1 tetra)\cr
0000000a&(low tetrabyte of address in data segment)\cr
00006364&(\.{"cd"} with leading zeros, because of alignment)\cr
98000001&|lop_quote| (don't treat next tetrabyte as a lopcode)\cr
98000000&(\.{BYTE} \.{\#98}, padded with trailing zeros)\cr
980a00fe&|lop_post| \$254 (begin postamble, G is 254)\cr
20000000&(high tetrabyte of the initial contents of \$254)\cr
00000008&(low tetrabyte of base address \$254)\cr
00000001&(high tetrabyte of the initial contents of \$255)\cr
2345678c&(low tetrabyte of \$255, is address of \.{Main})\cr
980b0000&|lop_stab| (begin symbol table)\cr
203a5040&(compressed form for symbol table as a ternary trie)\cr
50404020\cr
41204220\cr
43094408\cr
83404020&(\.{ABCD} = \Hex{2000000000000008}, serial 3)\cr
4d206120\cr
69056e01\cr
2345678c\cr
81400f61&(\.{Main} = \Hex{000000012345678c}, serial 1)\cr
fe820000&(\.{a} = \$254, serial 2)\cr
980c000a&|lop_end| (end symbol table, 10 tetras)\cr
}$$
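
The two relative fixups in this listing can be checked by hand, using the
rules for |lop_fixr| and |lop_fixrx| that are stated with the lopcodes.
The \CEE/ sketch below does the arithmetic; the function names
|fixrx_delta| and |fixup_address| are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed delta for lop_fixrx, from the tetrabyte that follows it:
   if the leading byte is 1, the value is (tet & 0xffffff) - 2^z. */
int64_t fixrx_delta(uint32_t tet, int z)
{
    int64_t delta = tet & 0xffffff;
    assert(z == 16 || z == 24);
    assert((tet >> 24) <= 1); /* leading byte is 0 or 1 */
    if ((tet >> 24) == 1)
        delta -= (int64_t)1 << z;
    return delta;
}

/* Address P = lambda - 4*delta of the instruction to be fixed. */
uint64_t fixup_address(uint64_t lambda, int64_t delta)
{
    return lambda - 4 * (uint64_t)delta; /* arithmetic mod 2^64 */
}
```

When the fixups are reached, the current location is \Hex{12345a768}, the
address of \.{1H}. The tetrabyte \Hex{0100fff5} with $\rm Z=16$ gives
$\delta=-11$, hence $\rm P=\Hex{12345a794}$, the location of the \.{BZ};
and |lop_fixr| with $\rm YZ=\Hex{ff7}$ gives $\rm P=\Hex{12345678c}$,
the location of \.{Main}.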
 
@ When a tetrabyte of the \.{mmo} file does not begin with the escape code,
it is loaded into the current location~$\lambda$, and $\lambda$ is increased
to the next higher multiple of~4.
(If $\lambda$ is not a multiple of~4, the tetrabyte actually goes
into location $\lambda\land(-4)=4\lfloor\lambda/4\rfloor$, according
to \MMIX's usual conventions.) The current line number is also increased
by~1, if it is nonzero.
 
When a tetrabyte does begin with the escape code, its next byte
is the lopcode defining a loader instruction. There are thirteen lopcodes:
 
\bull |lop_quote|: $\rm X=\Hex{00}$, $\rm YZ=1$. Treat the next tetra as
an ordinary tetrabyte, even if it begins with the escape code.
 
\bull |lop_loc|: $\rm X=\Hex{01}$, $\rm Y=high$ byte, $\rm Z=tetra$ count
($\rm Z=1$~or~2). Set the current location to the 64-bit address defined
by the next Z tetras, plus $\rm 2^{56}Y$. Usually $\rm Y=0$ (for the
instruction segment) or $\rm Y=\Hex{20}$ (for the data segment).
If $\rm Z=2$, the high tetra appears first.
 
\bull |lop_skip|: $\rm X=\Hex{02}$, $\rm YZ=delta$. Increase the
current location by~YZ.
 
\bull |lop_fixo|: $\rm X=\Hex{03}$, $\rm Y=high$ byte, $\rm Z=tetra$ count
($\rm Z=1$~or~2). Load the value of the current location~$\lambda$ into
octabyte~P, where P~is the 64-bit address defined by the next Z tetras
plus $\rm2^{56}Y$ as in |lop_loc|. (The octabyte at~P was previously assembled
as zero because of a future reference.)
 
\bull |lop_fixr|: $\rm X=\Hex{04}$, $\rm YZ=delta$. Load YZ into the YZ~field
of the tetrabyte in location~P, where P~is
$\rm\lambda-4YZ$, namely the address that precedes the current location
by YZ~tetrabytes. (This tetrabyte was previously loaded with an \MMIX\
instruction that takes a relative address: a branch, probable branch,
\.{JMP}, \.{PUSHJ}, or~\.{GETA}. Its YZ~field was previously
assembled as zero because of a future reference.)
 
\bull |lop_fixrx|: $\rm X=\Hex{05}$, $\rm Y=0$, $\rm Z=16$ or 24.
Proceed as in |lop_fixr|,
but load $\delta$ into tetrabyte $\rm P=\lambda-4\delta$ instead of loading
YZ into $\rm P=\lambda-4YZ$. Here $\delta$ is the value of the tetrabyte
following the |lop_fixrx| instruction; its leading byte will be either
0 or~1. If the leading byte is~1, $\delta$ should be treated as the
{\it negative\/} number $(\delta\land\Hex{ffffff})-2^{\rm Z}$ when
calculating the address~P. (The latter case arises only rarely,
but it is needed when fixing up a relative ``future'' reference that
ultimately leads to a ``backward'' instruction. The value of~$\delta$ that
is xored into location~P in such cases will change \.{BZ} to \.{BZB},
or \.{JMP} to \.{JMPB}, etc.; we have $\rm Z=24$ when fixing a~\.{JMP},
$\rm Z=16$ otherwise.)
 
\bull |lop_file|: $\rm X=\Hex{06}$, $\rm Y=file$ number, $\rm Z=tetra$ count.
Set the current file number to~Y and the current line number to~zero. If this
file number has occurred previously, Z~should be zero; otherwise Z~should be
positive, and the next Z tetrabytes are the characters of the file name in
big-endian order.
Trailing zeros follow the file name if its length is not a multiple of~4.
 
\bull |lop_line|: $\rm X=\Hex{07}$, $\rm YZ=line$ number. Set the current line
number to~YZ\null. If the line number is nonzero, the current file and current
line should correspond to the source location that generated the next data to
be loaded, for use in diagnostic messages. (The \MMIXAL\ processor gives
precise line numbers to the sources of tetrabytes in segment~0, which tend to
be instructions, but not to the sources of tetrabytes assembled in other
segments.)
 
\bull |lop_spec|: $\rm X=\Hex{08}$, $\rm YZ=type$. Begin special data of
type~YZ\null. The subsequent tetrabytes, continuing until the next loader
operation other than |lop_quote|, comprise the special data. A |lop_quote|
instruction allows tetrabytes of special data to begin with the escape code.
 
\bull |lop_pre|: $\rm X=\Hex{09}$, $\rm Y=1$, $\rm Z=tetra$ count. A~|lop_pre|
instruction, which defines the ``preamble,'' must be the first tetrabyte of
every \.{mmo} file. The Y~field specifies the version number of \.{mmo}
format, currently~1; other version numbers may be defined later, but
version~1 should always be supported as described in the present document.
The Z~tetrabytes following a |lop_pre| command provide additional information
that might be of interest to system routines. If $\rm Z>0$, the first tetra
of additional information records the time that this \.{mmo} file was
created, measured in seconds since 00:00:00 Greenwich Mean Time on
1~Jan~1970.
 
\bull |lop_post|: $\rm X=\Hex{0a}$, $\rm Y=0$, $\rm Z=G$ (must be 32~or~more).
This instruction begins the {\it postamble}, which follows all instructions
and data to be loaded. It causes the loaded program to begin with rG equal to
the stated value of~G, and with \$G, \$$\rm(G+1)$, \dots,~\$255 initially set to
the values of the next $\rm(256-G)*2$ tetrabytes. These tetrabytes specify
$\rm 256-G$ octabytes in big-endian fashion (high half first).
 
\bull |lop_stab|: $\rm X=\Hex{0b}$, $\rm YZ=0$. This instruction must appear
immediately after the $\rm(256-G)*2$ tetrabytes following~|lop_post|. It is
followed by the symbol table, which lists the equivalents of all user-defined
symbols in a compact form that will be described later.
 
\bull |lop_end|: $\rm X=\Hex{0c}$, $\rm YZ=tetra$ count. This instruction
must be the very last tetrabyte of each \.{mmo} file. Furthermore,
exactly YZ tetrabytes must appear between it and the |lop_stab| command.
(Therefore a program can easily find the symbol table without reading
forward through the entire \.{mmo} file.)
 
\smallskip
A separate routine called \.{MMOtype} is available to translate
binary \.{mmo} files into human-readable form.
 
@d lop_quote 0x0 /* the quotation lopcode */
@d lop_loc 0x1 /* the location lopcode */
@d lop_skip 0x2 /* the skip lopcode */
@d lop_fixo 0x3 /* the octabyte-fix lopcode */
@d lop_fixr 0x4 /* the relative-fix lopcode */
@d lop_fixrx 0x5 /* extended relative-fix lopcode */
@d lop_file 0x6 /* the file name lopcode */
@d lop_line 0x7 /* the file position lopcode */
@d lop_spec 0x8 /* the special hook lopcode */
@d lop_pre 0x9 /* the preamble lopcode */
@d lop_post 0xa /* the postamble lopcode */
@d lop_stab 0xb /* the symbol table lopcode */
@d lop_end 0xc /* the end-it-all lopcode */
 
@ Many readers will have noticed that \MMIXAL\ has no facilities for
relocatable output, nor does \.{mmo} format support such features. The
author's first drafts of \MMIXAL\ and \.{mmo} did allow relocatable objects,
with external linkages, but the rules were substantially more complicated and
therefore inconsistent with the goals of {\sl The Art of Computer Programming}.
The present design might actually prove to be superior to the current
practice, now that computer memory is significantly cheaper than it
used to be, because one-pass assembly and loading are extremely fast when
relocatability and external linkages are disallowed. Different program modules
can be assembled together about as fast as they could be linked together under
a relocatable scheme, and they can communicate with each other in much more
flexible ways. Debugging tools are enhanced when open-source libraries are
combined with user programs, and such libraries will certainly improve in
quality when their source form is accessible to a larger community of users.
 
@* Basic data types.
This program for the 64-bit \MMIX\ architecture is based on 32-bit integer
arithmetic, because nearly every computer available to the author at the time
of writing was limited in that way.
Details of the basic arithmetic appear in a separate program module
called {\mc MMIX-ARITH}, because the same routines are needed also
for the simulators. The definition of type \&{tetra} should be changed, if
necessary, to conform with the definitions found in {\mc MMIX-ARITH}.
@^system dependencies@>
 
@<Type...@>=
typedef unsigned int tetra;
/* assumes that an int is exactly 32 bits wide */
typedef struct { tetra h,l;} octa; /* two tetrabytes make one octabyte */
typedef enum {@!false,@!true}@+@!bool;
 
@ @<Glob...@>=
extern octa zero_octa; /* |zero_octa.h=zero_octa.l=0| */
extern octa neg_one; /* |neg_one.h=neg_one.l=-1| */
extern octa aux; /* auxiliary output of a subroutine */
extern bool overflow; /* set by certain subroutines for signed arithmetic */
 
@ Most of the subroutines in {\mc MMIX-ARITH} return an octabyte as
a function of two octabytes; for example, |oplus(y,z)| returns the
sum of octabytes |y| and~|z|. Multiplication places the high half of its
product in the global variable~|aux|. Division takes the high half of the
dividend as its first argument~|x| and returns the remainder in~|aux|.
 
@<Sub...@>=
extern octa oplus @,@,@[ARGS((octa y,octa z))@];
/* unsigned $y+z$ */
extern octa ominus @,@,@[ARGS((octa y,octa z))@];
/* unsigned $y-z$ */
extern octa incr @,@,@[ARGS((octa y,int delta))@];
/* unsigned $y+\delta$ ($\delta$ is signed) */
extern octa oand @,@,@[ARGS((octa y,octa z))@];
/* $y\land z$ */
extern octa shift_left @,@,@[ARGS((octa y,int s))@];
/* $y\LL s$, $0\le s\le64$ */
extern octa shift_right @,@,@[ARGS((octa y,int s,int uns))@];
/* $y\GG s$, signed if |!uns| */
extern octa omult @,@,@[ARGS((octa y,octa z))@];
/* unsigned $(|aux|,x)=y\times z$ */
extern octa odiv @,@,@[ARGS((octa x,octa y,octa z))@];
/* unsigned $(x,y)/z$; $|aux|=(x,y)\bmod z$ */
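
@ To illustrate how 64-bit quantities can be handled with 32-bit halves,
here is a minimal sketch of three of the routines declared above. It is an
assumption about how such routines could be written, not the code of
{\mc MMIX-ARITH}. The sketch uses |uint32_t| instead of |unsigned int| so that
the width of a tetrabyte does not depend on the compiler.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;              /* exactly 32 bits wide */
typedef struct { tetra h, l; } octa; /* high half, low half */

octa oplus(octa y, octa z)           /* y+z modulo 2^64 */
{
  octa x;
  x.l = y.l + z.l;
  x.h = y.h + z.h + (x.l < y.l);     /* add the carry out of the low half */
  return x;
}

octa ominus(octa y, octa z)          /* y-z modulo 2^64 */
{
  octa x;
  x.l = y.l - z.l;
  x.h = y.h - z.h - (y.l < z.l);     /* subtract the borrow */
  return x;
}

octa incr(octa y, int delta)         /* y+delta, where delta is signed */
{
  octa d;
  d.l = (tetra)delta;
  d.h = delta < 0 ? 0xffffffffu : 0; /* sign extension to 64 bits */
  return oplus(y, d);
}
```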
 
@ Here's a rudimentary check to see if arithmetic is in trouble.
 
@<Init...@>=
acc=shift_left(neg_one,1);
if (acc.h!=0xffffffff) panic("Type tetra is not implemented correctly");
@.Type tetra...@>
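
@ The check above can be understood from the following sketch of a left
shift on two 32-bit halves. Shifting $-1$ left by one bit must leave all
ones in the high half, and that happens only if bits shifted out of a half
are really discarded. The code is a hypothetical stand-in for the real
|shift_left| of {\mc MMIX-ARITH}; |tetra_looks_sane| is an invented name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;
typedef struct { tetra h, l; } octa;

octa shift_left(octa y, int s)       /* y<<s for 0 <= s <= 64 */
{
  assert(s >= 0 && s <= 64);
  while (s >= 32) {                  /* move whole tetrabytes first */
    y.h = y.l; y.l = 0; s -= 32;
  }
  if (s) {                           /* then shift by 1..31 bits */
    y.h = (y.h << s) | (y.l >> (32 - s));
    y.l <<= s;
  }
  return y;
}

/* The same test that the initialization code performs. */
int tetra_looks_sane(void)
{
  octa neg_one = {0xffffffffu, 0xffffffffu};
  octa acc = shift_left(neg_one, 1);
  return acc.h == 0xffffffffu && acc.l == 0xfffffffeu;
}
```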
 
@ Future versions of this program will work with symbols formed from Unicode
characters, but the present code limits itself to an 8-bit subset.
@^Unicode@>
The type \&{Char} is defined here in order to ease the later transition:
At present, \&{Char} is the same as \&{unsigned} \&{char}, but
\&{Char} can be changed to a 16-bit type in the Unicode version.
 
Other changes will also be necessary when the transition to Unicode is made;
for example, some calls of |fprintf| will become calls of |fwprintf|,
and some occurrences of \.{\%s} will become \.{\%ls} in print formats.
The switchable type name \&{Char} provides at least a first step
towards a brighter future with Unicode.
 
@<Type...@>=
typedef unsigned char Char; /* bytes that will become wydes some day */
 
@ While we're talking about classic systems versus future systems, we
might as well define the |ARGS| macro, which makes function prototypes
available on {\mc ANSI \CEE/} systems without making them
uncompilable on older systems. Each subroutine below is declared first
with a prototype, then with an old-style definition.
 
@<Preprocessor definitions@>=
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
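
@ Here is a small sketch of |ARGS| in use. The doubled parentheses turn the
whole parameter list into a single macro argument. The function
|clamp_buf_size| is hypothetical. Its definition is written in prototype
style, unlike the old-style definitions used in this program, because the
C23 standard no longer accepts old-style definitions.

```c
#include <assert.h>

#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

/* Expands to a full prototype on ANSI C systems, to "()" elsewhere. */
extern int clamp_buf_size ARGS((int requested, int minimum));

/* Returns requested, raised to minimum if it is too small. */
int clamp_buf_size(int requested, int minimum)
{
  assert(minimum > 0);
  return requested < minimum ? minimum : requested;
}
```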
 
@* Basic input and output. Input goes into a buffer that is normally
limited to 72 characters. This limit can be raised by using the
\.{-b} option when invoking the assembler; but short buffers will keep listings
from becoming unwieldy, because a symbolic listing adds 19 characters per~line.
 
@<Initialize everything@>=
if (buf_size<72) buf_size=72;
buffer=(Char*)calloc(buf_size+1,sizeof(Char));
lab_field=(Char*)calloc(buf_size+1,sizeof(Char));
op_field=(Char*)calloc(buf_size,sizeof(Char));
operand_list=(Char*)calloc(buf_size,sizeof(Char));
err_buf=(Char*)calloc(buf_size+60,sizeof(Char));
if (!buffer || !lab_field || !op_field || !operand_list || !err_buf)
panic("No room for the buffers");
@.No room...@>
 
@ @<Glob...@>=
Char *buffer; /* raw input of the current line */
Char *buf_ptr; /* current position within |buffer| */
Char *lab_field; /* copy of the label field of the current instruction */
Char *op_field; /* copy of the opcode field of the current instruction */
Char *operand_list; /* copy of the operand field of the current instruction */
Char *err_buf; /* place where dynamic error messages are sprinted */
 
@ @<Get the next line of input text, or |break| if the input has ended@>=
if (!fgets(buffer,buf_size+1,src_file)) break;
line_no++;
line_listed=false;
j=strlen(buffer);
if (j>0 && buffer[j-1]=='\n') buffer[j-1]='\0'; /* remove the newline */
else if ((j=fgetc(src_file))!=EOF)
@<Flush the excess part of an overlong line@>;
if (buffer[0]=='#') @<Check for a line directive@>;
buf_ptr=buffer;
 
@ @<Flush the excess...@>=
{
while(j!='\n' && j!= EOF) j=fgetc(src_file);
if (!long_warning_given) {
long_warning_given=true;
err("*trailing characters of long input line have been dropped");
@.trailing characters...@>
fprintf(stderr,
"(say `-b <number>' to increase the length of my input buffer)\n");
}@+else err("*trailing characters dropped");
}
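
@ The two sections above can be condensed into one self-contained routine.
This is only a sketch under stated assumptions: |read_line| and its
|truncated| flag are hypothetical names, and, unlike the code above, the
sketch does not report truncation when only the newline itself failed to fit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf, which has room for size bytes including the
   terminating NUL. Returns 0 at end of input and 1 otherwise. Sets
   *truncated to 1 if characters beyond the buffer were discarded. */
int read_line(FILE *f, char *buf, size_t size, int *truncated)
{
  size_t n;
  int c;
  assert(f != NULL && buf != NULL && truncated != NULL && size >= 2);
  *truncated = 0;
  if (!fgets(buf, (int)size, f)) return 0;
  n = strlen(buf);
  if (n > 0 && buf[n - 1] == '\n') {
    buf[n - 1] = '\0';               /* remove the newline */
  } else {
    c = fgetc(f);
    if (c != EOF && c != '\n') *truncated = 1;
    while (c != '\n' && c != EOF) c = fgetc(f);  /* flush the excess */
  }
  return 1;
}
```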
 
@ @<Glob...@>=
int cur_file; /* index of the current file in |filename| */
int line_no; /* current position in the file */
bool line_listed; /* have we listed the buffer contents? */
bool long_warning_given; /* have we given the hint about \.{-b}? */
 
@ We keep track of source file name and line number at all times, for
error reporting and for synchronization data in the object file.
Up to 256 different source file names can be remembered.
 
@<Glob...@>=
Char *filename[257];
/* source file names, including those in line directives */
int filename_count; /* how many |filename| entries have we filled? */
 
@ If the current line is a line directive, it will also be treated
as a comment by the assembler.
 
@<Check for a line directive@>=
{
for (p=buffer+1;isspace(*p);p++);
for (j=*p++-'0';isdigit(*p);p++) j=10*j+*p-'0';
for (;isspace(*p);p++);
if (*p=='\"') {
if (!filename[filename_count]) {
filename[filename_count]=(Char*)calloc(FILENAME_MAX+1,sizeof(Char));
if (!filename[filename_count])
panic("Capacity exceeded: Out of filename memory");
@.Capacity exceeded...@>
}
for (p++,q=filename[filename_count];*p && *p!='\"';p++,q++) *q=*p;
if (*p=='\"' && *(p-1)!='\"') { /* yes, it's a line directive */
*q='\0';
for (k=0;strcmp(filename[k],filename[filename_count])!=0;k++);
if (k==filename_count) filename_count++;
cur_file=k;
line_no=j-1;
}
}
}
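
@ For readers who want to experiment with line directives such as
\.{\# 12 "foo.mms"}, here is a minimal stand-alone parser. It is a sketch,
not the assembler's code: |parse_line_directive| is a hypothetical name, and
the sketch is stricter than the section above, since it insists on at least
one digit and on a nonempty file name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parses "# <number> \"<name>\"". On success stores the number in *line,
   copies the name into name (capacity cap) and returns 1; otherwise
   returns 0 and leaves *line untouched. */
int parse_line_directive(const char *s, int *line, char *name, size_t cap)
{
  const unsigned char *p = (const unsigned char *)s;
  size_t n = 0;
  int value = 0;
  assert(s != NULL && line != NULL && name != NULL && cap > 0);
  if (*p++ != '#') return 0;
  while (isspace(*p)) p++;
  if (!isdigit(*p)) return 0;
  while (isdigit(*p)) value = 10 * value + (*p++ - '0');
  while (isspace(*p)) p++;
  if (*p++ != '"') return 0;
  while (*p && *p != '"') {
    if (n + 1 >= cap) return 0;      /* name does not fit */
    name[n++] = (char)*p++;
  }
  if (*p != '"' || n == 0) return 0; /* unterminated or empty name */
  name[n] = '\0';
  *line = value;
  return 1;
}
```

The assembler itself then sets |line_no| to one less than the parsed number,
because |line_no| is increased again when the next line is read.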
 
@ Archaic versions of the \CEE/ library do not define |FILENAME_MAX|.
 
@<Preprocessor definitions@>=
#ifndef FILENAME_MAX
#define FILENAME_MAX 256
#endif
 
@ @<Local variables@>=
register Char *p,*q; /* the place where we're currently scanning */
 
@ The next several subroutines are useful for preparing a listing of
the assembled results. In such a listing, which the user can request
with a command-line option, we fill the leftmost 19 columns with
a representation of the output that has been assembled from the
input in the buffer. Sometimes the assembled output requires
more than one line, because we have room to output only a tetrabyte per line.
 
The |flush_listing_line| subroutine is called when we have finished
generating one line's worth of assembled material. Its parameter is
a string to be printed between the assembled material and the
buffer contents, if the input line hasn't yet been echoed. The length
of this string should be 19 minus the number of characters already printed
on the current line of the listing.
 
@<Sub...@>=
void flush_listing_line @,@,@[ARGS((char*))@];@+@t}\6{@>
void flush_listing_line(s)
char *s;
{
if (line_listed) fprintf(listing_file,"\n");
else {
fprintf(listing_file,"%s%s\n",s,buffer);
line_listed=true;
}
}
 
@ Only the three least significant hex digits of a location are shown on
the listing, unless the other digits have changed. The following subroutine
prints an extra line when a change needs to be shown.
 
@<Sub...@>=
void update_listing_loc @,@,@[ARGS((int))@];@+@t}\6{@>
void update_listing_loc(k)
int k; /* the location to display, mod 4 */
{
if (cur_loc.h!=listing_loc.h || ((cur_loc.l^listing_loc.l)&0xfffff000)) {
fprintf(listing_file,"%08x%08x:",cur_loc.h,(cur_loc.l&-4)|k);
flush_listing_line("  ");
}
listing_loc.h=cur_loc.h;@+
listing_loc.l=(cur_loc.l&-4)|k;
}
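
@ The test used by |update_listing_loc| can be isolated as follows. This is
a sketch with hypothetical names (|needs_full_loc|, |listed_low|); it shows
only the bit masks, not the printing.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;

/* Nonzero if some hex digit above the three least significant ones
   differs between the current location and the one last listed. */
int needs_full_loc(octa cur, octa listed)
{
  return cur.h != listed.h || ((cur.l ^ listed.l) & 0xfffff000u) != 0;
}

/* Low tetra of the location recorded for byte k of the current tetrabyte. */
uint32_t listed_low(octa cur, int k)
{
  assert(k >= 0 && k < 4);
  return (cur.l & ~(uint32_t)3) | (uint32_t)k;
}
```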
 
@ @<Glob...@>=
octa cur_loc; /* current location of assembled output */
octa listing_loc; /* current location on the listing */
unsigned char hold_buf[4]; /* assembled bytes */
unsigned char held_bits; /* which bytes of |hold_buf| are active? */
unsigned char listing_bits; /* which of them haven't been listed yet? */
bool spec_mode; /* are we between |BSPEC| and |ESPEC|? */
tetra spec_mode_loc; /* number of bytes in the current special output */
 
@ When bytes are assembled, they are placed into the |hold_buf|.
More precisely, a byte assembled for a location that is |j|~plus a
multiple of~4 is placed into |hold_buf[j]|; two auxiliary variables,
|held_bits| and |listing_bits|, are then increased by |1<<j|.
Furthermore, |listing_bits|
is increased by |0x10<<j| if that byte is a future reference to be
resolved later.
 
The bytes are held until we need to output them.
The |listing_clear| routine lists any that have been held
but not yet shown. It should be called only when |listing_bits!=0|.
 
@<Sub...@>=
void listing_clear @,@,@[ARGS((void))@];@+@t}\6{@>
void listing_clear()
{
register int j,k;
for (k=0;k<4;k++) if (listing_bits&(1<<k)) break;
if (spec_mode) fprintf(listing_file,"         ");
else {
update_listing_loc(k);
fprintf(listing_file," ...%03x: ",(listing_loc.l&0xffc)|k);
}
for (j=0;j<4;j++)
if (listing_bits&(0x10<<j)) fprintf(listing_file,"xx");
else if (listing_bits&(1<<j)) fprintf(listing_file,"%02x",hold_buf[j]);
else fprintf(listing_file,"  ");
flush_listing_line("  ");
listing_bits=0;
}
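
@ The eight columns that |listing_clear| devotes to the held bytes can be
produced by a routine like the following. It is a sketch with a hypothetical
name, |format_held|, and it writes to a string instead of the listing file.
A byte marked as a future reference appears as \.{xx}, an inactive byte as
two spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fills out[0..7] with the listing form of the four held bytes and
   stores a terminating NUL in out[8]. The low four bits of bits mark
   active bytes; the next four mark future references. */
void format_held(char out[9], const unsigned char hold[4], unsigned bits)
{
  int j;
  assert(out != NULL && hold != NULL);
  for (j = 0; j < 4; j++) {
    if (bits & (0x10u << j)) memcpy(out + 2 * j, "xx", 2);
    else if (bits & (1u << j))
      snprintf(out + 2 * j, 3, "%02x", (unsigned)hold[j]);
    else memcpy(out + 2 * j, "  ", 2);
  }
  out[8] = '\0';
}
```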
 
@ Error messages are written to |stderr|. If the message begins with
`\.*' it is merely a warning; if it begins with `\.!' it is fatal;
otherwise the error is probably serious enough to make manual correction
necessary, yet it is not tragic. Errors and warnings appear
also on the optional listing file.
 
@d err(m) {@+report_error(m);@+if (m[0]!='*') goto bypass;@+}
@d derr(m,p) {@+sprintf(err_buf,m,p);
report_error(err_buf);@+if (err_buf[0]!='*') goto bypass;@+}
@d dderr(m,p,q) {@+sprintf(err_buf,m,p,q);
report_error(err_buf);@+if (err_buf[0]!='*') goto bypass;@+}
@d panic(m) {@+sprintf(err_buf,"!%s",m);@+report_error(err_buf);@+}
@d dpanic(m,p) {@+err_buf[0]='!';@+sprintf(err_buf+1,m,p);@+
report_error(err_buf);@+}
 
@<Sub...@>=
void report_error @,@,@[ARGS((char*))@];@+@t}\6{@>
void report_error(message)
char *message;
{
if (!filename[cur_file]) filename[cur_file]="(nofile)";
if (message[0]=='*')
fprintf(stderr,"\"%s\", line %d warning: %s\n",
filename[cur_file],line_no,message+1);
else if (message[0]=='!')
fprintf(stderr,"\"%s\", line %d fatal error: %s\n",
filename[cur_file],line_no,message+1);
else {
fprintf(stderr,"\"%s\", line %d: %s!\n",
filename[cur_file],line_no,message);
err_count++;
}
if (listing_file) {
if (!line_listed) flush_listing_line("****************** ");
if (message[0]=='*') fprintf(listing_file,
"************ warning: %s\n",message+1);
else if (message[0]=='!') fprintf(listing_file,
"******** fatal error: %s!\n",message+1);
else fprintf(listing_file,
"********** error: %s!\n",message);
}
if (message[0]=='!') exit(-2);
}
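
@ The convention for the first character of a message can be summarized by
a small classifier. This is an illustrative sketch only; the type |severity|
and the function |severity_of| are hypothetical and do not occur in the
assembler.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SEV_WARNING, SEV_ERROR, SEV_FATAL } severity;

/* Stores the severity implied by the first character of message in *sev
   and returns the text to print, without the '*' or '!' marker. */
const char *severity_of(const char *message, severity *sev)
{
  assert(message != NULL && sev != NULL);
  if (message[0] == '*') { *sev = SEV_WARNING; return message + 1; }
  if (message[0] == '!') { *sev = SEV_FATAL; return message + 1; }
  *sev = SEV_ERROR;                  /* serious, but assembly continues */
  return message;
}
```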
 
@ @<Glob...@>=
int err_count; /* this many errors were found */
 
@ Output to the binary |obj_file| occurs four bytes at a time. The
bytes are assembled in small buffers, not output as single tetrabytes,
because we want the output to be big-endian even when the assembler
is running on a little-endian machine.
@^big-endian versus little-endian@>
@^little-endian versus big-endian@>
 
@d mmo_write(buf) if (fwrite(buf,1,4,obj_file)!=4)
dpanic("Can't write on %s",obj_file_name)
@.Can't write...@>
 
@<Sub...@>=
void mmo_clear @,@,@[ARGS((void))@];
void mmo_out @,@,@[ARGS((void))@];
unsigned char lop_quote_command[4]={mm,lop_quote,0,1};
void mmo_clear() /* clears |hold_buf|, when |held_bits!=0| */
{
if (hold_buf[0]==mm) mmo_write(lop_quote_command);
mmo_write(hold_buf);
if (listing_file && listing_bits) listing_clear();
held_bits=0;
hold_buf[0]=hold_buf[1]=hold_buf[2]=hold_buf[3]=0;
mmo_cur_loc=incr(mmo_cur_loc,4);@+ mmo_cur_loc.l&=-4;
if (mmo_line_no) mmo_line_no++;
}
@#
unsigned char mmo_buf[4];
int mmo_ptr;
void mmo_out() /* output the contents of |mmo_buf| */
{
if (held_bits) mmo_clear();
mmo_write(mmo_buf);
}
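
@ The quoting rule applied by |mmo_clear| can be shown in isolation. The
sketch below writes into a memory buffer instead of |obj_file|; |emit_tetra|
is a hypothetical name, and the escape code is assumed to be \Hex{98},
since |mm| is defined outside this part of the program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MM 0x98       /* assumed value of the escape code |mm| */
#define LOP_QUOTE 0x0

/* Appends the tetrabyte t to out, preceded by a lop_quote instruction
   if t begins with the escape code. Returns the number of bytes
   written, which is 4 or 8. */
size_t emit_tetra(unsigned char *out, const unsigned char t[4])
{
  static const unsigned char quote[4] = {MM, LOP_QUOTE, 0, 1};
  size_t n = 0;
  assert(out != NULL && t != NULL);
  if (t[0] == MM) {
    memcpy(out, quote, 4);
    n = 4;
  }
  memcpy(out + n, t, 4);
  return n + 4;
}
```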
 
@ @<Sub...@>=
void mmo_tetra @,@,@[ARGS((tetra))@];
void mmo_byte @,@,@[ARGS((unsigned char))@];
void mmo_lop @,@,@[ARGS((char,unsigned char,unsigned char))@];
void mmo_lopp @,@,@[ARGS((char,unsigned short))@];
void mmo_tetra(t) /* output a tetrabyte */
tetra t;
{
mmo_buf[0]=t>>24;@+ mmo_buf[1]=(t>>16)&0xff;
mmo_buf[2]=(t>>8)&0xff;@+ mmo_buf[3]=t&0xff;
mmo_out();
}
@#
void mmo_byte(b)
unsigned char b;
{
mmo_buf[(mmo_ptr++)&3]=b;
if (!(mmo_ptr&3)) mmo_out();
}
@#
void mmo_lop(x,y,z) /* output a loader operation */
char x;
unsigned char y,z;
{
mmo_buf[0]=mm;@+ mmo_buf[1]=x;@+ mmo_buf[2]=y;@+ mmo_buf[3]=z;
mmo_out();
}
@#
void mmo_lopp(x,yz) /* output a loader operation with two-byte operand */
char x;
unsigned short yz;
{
mmo_buf[0]=mm;@+ mmo_buf[1]=x;@+
mmo_buf[2]=yz>>8;@+ mmo_buf[3]=yz&0xff;
mmo_out();
}
 
@ The |mmo_loc| subroutine makes the current location in the object file
equal to |cur_loc|.
 
@<Sub...@>=
void mmo_loc @,@,@[ARGS((void))@];@+@t}\6{@>
void mmo_loc()
{
octa o;
if (held_bits) mmo_clear();
o=ominus(cur_loc,mmo_cur_loc);
if (o.h==0 && o.l<0x10000) {
if (o.l) mmo_lopp(lop_skip,o.l);
}@+else {
if (cur_loc.h&0xffffff) {
mmo_lop(lop_loc,0,2);
mmo_tetra(cur_loc.h);
}@+else mmo_lop(lop_loc,cur_loc.h>>24,1);
mmo_tetra(cur_loc.l);
}
mmo_cur_loc=cur_loc;
}
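
@ The decision made by |mmo_loc| has four possible outcomes, which the
following sketch makes explicit. The enumeration |loc_plan| and the function
|plan_loc| are hypothetical; the subtraction is written out so that the
example is self-contained.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;

typedef enum {
  LOC_NONE,        /* already at the right place */
  LOC_SKIP,        /* lop_skip with a 16-bit distance */
  LOC_ONE_TETRA,   /* lop_loc with Y = high byte and Z = 1 */
  LOC_TWO_TETRAS   /* lop_loc with Y = 0 and Z = 2 */
} loc_plan;

loc_plan plan_loc(octa cur, octa mmo)
{
  octa o;
  o.l = cur.l - mmo.l;                    /* unsigned difference cur-mmo */
  o.h = cur.h - mmo.h - (cur.l < mmo.l);
  if (o.h == 0 && o.l < 0x10000u) return o.l ? LOC_SKIP : LOC_NONE;
  return (cur.h & 0xffffffu) ? LOC_TWO_TETRAS : LOC_ONE_TETRA;
}
```

A backward move gives a huge unsigned difference, so it is never encoded
as a skip.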
 
@ Similarly, the |mmo_sync| subroutine makes sure that the current file and
line number in the output file agree with |cur_file| and |line_no|.
 
@<Sub...@>=
void mmo_sync @,@,@[ARGS((void))@];@+@t}\6{@>
void mmo_sync()
{
register int j; register unsigned char *p;
if (cur_file!=mmo_cur_file) {
if (filename_passed[cur_file]) mmo_lop(lop_file,cur_file,0);
else {
mmo_lop(lop_file,cur_file,(strlen(filename[cur_file])+3)>>2);
for (j=0,p=filename[cur_file];*p;p++,j=(j+1)&3) {
mmo_buf[j]=*p;
if (j==3) mmo_out();
}
if (j) {
for (;j<4;j++) mmo_buf[j]=0;
mmo_out();
}
filename_passed[cur_file]=1;
}
mmo_cur_file=cur_file;
mmo_line_no=0;
}
if (line_no!=mmo_line_no) {
if (line_no>=0x10000)
panic("I can't deal with line numbers exceeding 65535");
@.I can't deal with...@>
mmo_lopp(lop_line,line_no);
mmo_line_no=line_no;
}
}
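
@ The way |mmo_sync| pads a file name to a whole number of tetrabytes can be
tried out with the following sketch. The names |name_tetras| and |pack_name|
are hypothetical, and the output goes to a memory buffer rather than to the
object file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of tetrabytes needed for name, as in the Z field of lop_file. */
size_t name_tetras(const char *name)
{
  assert(name != NULL);
  return (strlen(name) + 3) >> 2;
}

/* Copies name into out, followed by zero bytes up to a multiple of 4.
   Returns the number of bytes stored; cap is the capacity of out. */
size_t pack_name(const char *name, unsigned char *out, size_t cap)
{
  size_t len, n;
  assert(name != NULL && out != NULL);
  len = strlen(name);
  n = 4 * name_tetras(name);
  assert(n <= cap);
  memcpy(out, name, len);
  memset(out + len, 0, n - len);     /* trailing zeros */
  return n;
}
```

Since Z is a single byte, a real loader instruction can describe at most
255 tetrabytes of file name; the sketch does not check that limit.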
 
@ @<Glob...@>=
octa mmo_cur_loc; /* current location in the object file */
int mmo_line_no; /* current line number in the \.{mmo} output so far */
int mmo_cur_file; /* index of the current file in the \.{mmo} output so far */
char filename_passed[256]; /* has a filename been recorded in the output? */
 
@ Here is a basic subroutine that assembles |k| bytes starting at |cur_loc|.
The value of |k| should be 1, 2, or~4, and |cur_loc| should be a multiple
of~|k|. The |x_bits| parameter tells which bytes, if any, are part of
a future reference.
 
@<Sub...@>=
void assemble @,@,@[ARGS((char,tetra,unsigned char))@];@+@t}\6{@>
void assemble(k,dat,x_bits)
char k;
tetra dat;
unsigned char x_bits;
{
register int j,jj,l;
if (spec_mode) l=spec_mode_loc;
else {
l=cur_loc.l;
@<Make sure |cur_loc| and |mmo_cur_loc| refer to the same tetrabyte@>;
if (!held_bits && !(cur_loc.h&0xe0000000)) mmo_sync();
}
for (j=0;j<k;j++) {
jj=(l+j)&3;
hold_buf[jj]=(dat>>(8*(k-1-j)))&0xff;
held_bits|=1<<jj;
listing_bits|=1<<jj;
}
listing_bits|=x_bits;
if (((l+k)&3)==0) {
if (listing_file) listing_clear();
mmo_clear();
}
if (spec_mode) spec_mode_loc+=k; else cur_loc=incr(cur_loc,k);
}
 
@ @<Make sure |cur_loc| and |mmo_cur_loc| refer to the same tetrabyte@>=
if (cur_loc.h!=mmo_cur_loc.h || ((cur_loc.l^mmo_cur_loc.l)&0xfffffffc))
mmo_loc();
 
@* The symbol table. Symbols are stored and retrieved by means of
a {\it ternary search trie}, following ideas of Bentley and
Sedgewick. (See {\sl ACM--SIAM Symp.\ on Discrete Algorithms\/ \bf8} (1997),
360--369; R.~Sedgewick, {\sl Algorithms in C\/} (Reading, Mass.:\
Addison--Wesley, 1998), \S15.4.) Each trie node stores a character,
@^Bentley, Jon Louis@>
@^Sedgewick, Robert@>
and there are branches to subtries for the cases where a given character
is less than, equal to, or greater than the character in the trie.
There also is a pointer to a symbol table entry if a symbol ends at
the current node.
 
@s sym_tab_struct int
 
@<Type...@>=
typedef struct ternary_trie_struct {
unsigned short ch; /* the (possibly wyde) character stored here */
struct ternary_trie_struct *left, *mid, *right; /* downward
in the ternary trie */
struct sym_tab_struct *sym; /* equivalents of symbols */
} trie_node;
 
@ We allocate trie nodes in chunks of 1000 at a time.
 
@<Sub...@>=
trie_node* new_trie_node @,@,@[ARGS((void))@];@+@t}\6{@>
trie_node* new_trie_node()
{
register trie_node *t=next_trie_node;
if (t==last_trie_node) {
t=(trie_node*)calloc(1000,sizeof(trie_node));
if (!t) panic("Capacity exceeded: Out of trie memory");
@.Capacity exceeded...@>
last_trie_node=t+1000;
}
next_trie_node=t+1;
return t;
}
@ @<Glob...@>=
trie_node *trie_root; /* root of the trie */
trie_node *op_root; /* root of subtrie for opcodes */
trie_node *next_trie_node, *last_trie_node; /* allocation control */
trie_node *cur_prefix; /* root of subtrie for unqualified symbols */
 
@ The |trie_search| subroutine starts at a given node of the trie and finds
a given string in its middle subtrie, inserting new nodes if necessary.
The string ends at the first character that is neither a letter nor a digit; the location
of the terminating character is stored in global variable~|terminator|.
 
@d isletter(c) (isalpha(c)||c=='_'||c==':'||c>126)
 
@<Sub...@>=
trie_node *trie_search @,@,@[ARGS((trie_node*,Char*))@];
Char *terminator; /* where the search ended */
trie_node *trie_search(t,s)
trie_node *t;
Char *s;
{
register trie_node *tt=t;
register Char *p=s;
while (1) {
if (!isletter(*p) && !isdigit(*p)) {
terminator=p;@+return tt;
}
if (tt->mid) {
tt=tt->mid;
while (*p!=tt->ch) {
if (*p<tt->ch) {
if (tt->left) tt=tt->left;
else {
tt->left=new_trie_node();@+tt=tt->left;@+goto store_new_char;
}
}@+else {
if (tt->right) tt=tt->right;
else {
tt->right=new_trie_node();@+tt=tt->right;@+goto store_new_char;
}
}
}
p++;
}@+else {
tt->mid=new_trie_node();@+tt=tt->mid;
store_new_char: tt->ch=*p++;
}
}
}
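
@ For comparison, here is a compact ternary search trie written with a
pointer to a link instead of |goto|. It is a sketch under simplifying
assumptions: the names |tnode| and |tst_find| are hypothetical, nodes are
allocated one at a time, and the string ends at its null character rather
than at the first character that is not a letter or digit.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tnode {
  unsigned char ch;                  /* character stored in this node */
  struct tnode *left, *mid, *right;  /* smaller, next position, larger */
} tnode;

/* Finds the node that represents string s in the middle subtrie of
   root, creating nodes as necessary. Returns NULL if memory runs out. */
tnode *tst_find(tnode *root, const char *s)
{
  const unsigned char *p = (const unsigned char *)s;
  tnode *t = root;
  assert(root != NULL && s != NULL);
  for (; *p; p++) {
    tnode **link = &t->mid;          /* candidates for this position */
    while (*link && (*link)->ch != *p)
      link = (*p < (*link)->ch) ? &(*link)->left : &(*link)->right;
    if (!*link) {                    /* character not present: add it */
      *link = calloc(1, sizeof(tnode));
      if (!*link) return NULL;
      (*link)->ch = *p;
    }
    t = *link;
  }
  return t;
}
```

Searching twice for the same string yields the same node, so a payload
attached to that node serves as the symbol's table entry.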
 
@ Symbol table nodes hold the serial numbers and
equivalents of defined symbols. They also
hold ``fixup information'' for undefined symbols; this will allow the
loader to correct any previously assembled instructions that refer to such
symbols when they are eventually defined.
 
In the symbol table node for a defined symbol, the |link| field
has one of the special codes |DEFINED| or |REGISTER| or |PREDEFINED|, and the
|equiv| field holds the defined value. The |serial| number
is a unique identifier for all user-defined symbols.
 
In the symbol table node for an undefined symbol, the |equiv| field
is ignored. The |link| field
points to the first node of fixup information; that node is, in turn,
a symbol table node that might link to other fixups. The |serial| number
in a fixup node is either 0 or 1 or 2, meaning respectively ``fixup the
octabyte pointed to by |equiv|'' or ``fixup the relative address in the YZ
field of the instruction pointed to by |equiv|'' or ``fixup the relative
address in the XYZ field of the instruction pointed to by |equiv|.''
 
@s sym_node int
@s bool int
 
@d DEFINED (sym_node*)1 /* code value for octabyte equivalents */
@d REGISTER (sym_node*)2 /* code value for register-number equivalents */
@d PREDEFINED (sym_node*)3 /* code value for not-yet-used equivalents */
@d fix_o 0 /* |serial| code for octabyte fixup */
@d fix_yz 1 /* |serial| code for relative fixup */
@d fix_xyz 2 /* |serial| code for \.{JMP} fixup */
 
@<Type...@>=
typedef struct sym_tab_struct {
int serial; /* serial number of symbol; type number for fixups */
struct sym_tab_struct *link; /* |DEFINED| status or link to fixup */
octa equiv; /* the equivalent value */
} sym_node;
 
@ The allocation of new symbol table nodes proceeds in chunks, like the
allocation of trie nodes. But in this case we also have the possibility
of reusing old fixup nodes that are no longer needed.
 
@d recycle_fixup(pp) pp->link=sym_avail, sym_avail=pp
 
@<Sub...@>=
sym_node* new_sym_node @,@,@[ARGS((bool))@];@+@t}\6{@>
sym_node* new_sym_node(serialize)
bool serialize; /* should the new node receive a unique serial number? */
{
register sym_node *p=sym_avail;
if (p) {
sym_avail=p->link;@+p->link=NULL;@+p->serial=0;@+p->equiv=zero_octa;
}@+else {
p=next_sym_node;
if (p==last_sym_node) {
p=(sym_node*)calloc(1000,sizeof(sym_node));
if (!p) panic("Capacity exceeded: Out of symbol memory");
@.Capacity exceeded...@>
last_sym_node=p+1000;
}
next_sym_node=p+1;
}
if (serialize) p->serial=++serial_number;
return p;
}
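
@ The allocation strategy of |new_sym_node|, namely chunks of 1000 nodes
combined with a stack of recycled nodes, can be sketched independently of
the symbol table. The names |node|, |node_alloc|, and |node_recycle| are
hypothetical, and the sketch returns |NULL| instead of calling |panic|.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK 1000

typedef struct node {
  struct node *link;
  int serial;
} node;

static node *next_node, *last_node;  /* unused part of the current chunk */
static node *avail;                  /* stack of recycled nodes */

node *node_alloc(void)
{
  node *p = avail;
  if (p) {                           /* reuse a recycled node first */
    avail = p->link;
    p->link = NULL;
    p->serial = 0;
    return p;
  }
  if (next_node == last_node) {      /* current chunk is exhausted */
    next_node = calloc(CHUNK, sizeof(node));
    if (!next_node) { last_node = NULL; return NULL; }
    last_node = next_node + CHUNK;
  }
  return next_node++;
}

void node_recycle(node *p)
{
  assert(p != NULL);
  p->link = avail;
  avail = p;
}
```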
@ @<Glob...@>=
int serial_number;
sym_node *sym_root; /* root of the sym */
sym_node *next_sym_node, *last_sym_node; /* allocation control */
sym_node *sym_avail; /* stack of recycled symbol table nodes */
 
@ We initialize the trie by inserting all the predefined symbols.
Opcodes are given the prefix \.{\^}, to distinguish them from
ordinary symbols; this character nicely divides uppercase letters from
lowercase letters.
 
@<Init...@>=
trie_root=new_trie_node();
cur_prefix=trie_root;
op_root=new_trie_node();
trie_root->mid=op_root;
trie_root->ch=':';
op_root->ch='^';
@<Put the \MMIX\ opcodes and \MMIXAL\ pseudo-ops into the trie@>;
@<Put the special register names into the trie@>;
@<Put other predefined symbols into the trie@>;
 
@ Most of the assembly work can be table driven, based on bits that
are stored as the ``equivalents'' of opcode symbols like \.{\^ADD}.
 
@d rel_addr_bit 0x1 /* is YZ or XYZ relative? */
@d immed_bit 0x2 /* should opcode be immediate if Z or YZ not register? */
@d zar_bit 0x4 /* should register status of Z be ignored? */
@d zr_bit 0x8 /* must Z be a register? */
@d yar_bit 0x10 /* should register status of Y be ignored? */
@d yr_bit 0x20 /* must Y be a register? */
@d xar_bit 0x40 /* should register status of X be ignored? */
@d xr_bit 0x80 /* must X be a register? */
@d yzar_bit 0x100 /* should register status of YZ be ignored? */
@d yzr_bit 0x200 /* must YZ be a register? */
@d xyzar_bit 0x400 /* should register status of XYZ be ignored? */
@d xyzr_bit 0x800 /* must XYZ be a register? */
@d one_arg_bit 0x1000 /* is it OK to have zero or one operand? */
@d two_arg_bit 0x2000 /* is it OK to have exactly two operands? */
@d three_arg_bit 0x4000 /* is it OK to have exactly three operands? */
@d many_arg_bit 0x8000 /* is it OK to have more than three operands? */
@d align_bits 0x30000 /* how much alignment: byte, wyde, tetra, or octa? */
@d no_label_bit 0x40000 /* should the label be blank? */
@d mem_bit 0x80000 /* must YZ be a memory reference? */
@d spec_bit 0x100000 /* is this opcode allowed in \.{SPEC} mode? */
 
@<Type...@>=
typedef struct {
Char *name; /* symbolic opcode */
short code; /* numeric opcode */
int bits; /* treatment of operands */
} op_spec;
@#
typedef enum {
@!SET=0x100,@!IS,@!LOC,@!PREFIX,@!BSPEC,@!ESPEC,@!GREG,@!LOCAL,@/
@!BYTE,@!WYDE,@!TETRA,@!OCTA}@+@!pseudo_op;
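
@ To see how the bits are meant to be read, consider the following sketch,
which checks an operand count and extracts the alignment from a |bits|
value. The function names |accepts_operands| and |alignment_bytes| are
hypothetical; the constants repeat the definitions given above.

```c
#include <assert.h>

#define one_arg_bit   0x1000   /* zero or one operand */
#define two_arg_bit   0x2000   /* exactly two operands */
#define three_arg_bit 0x4000   /* exactly three operands */
#define many_arg_bit  0x8000   /* more than three operands */
#define align_bits    0x30000  /* byte, wyde, tetra, or octa */

/* Nonzero if an opcode with the given bits may have n operands. */
int accepts_operands(int bits, int n)
{
  assert(n >= 0);
  if (n <= 1) return (bits & one_arg_bit) != 0;
  if (n == 2) return (bits & two_arg_bit) != 0;
  if (n == 3) return (bits & three_arg_bit) != 0;
  return (bits & many_arg_bit) != 0;
}

/* Alignment in bytes (1, 2, 4, or 8) requested by the given bits. */
int alignment_bytes(int bits)
{
  return 1 << ((bits & align_bits) >> 16);
}
```

For example, the value \Hex{240a2} that the table assigns to \.{ADD} asks for
tetrabyte alignment and exactly three operands.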
 
@ @<Glob...@>=
op_spec op_init_table[]={@/
{"TRAP", 0x00, 0x27554},
@.TRAP@>
{"FCMP", 0x01, 0x240a8},
@.FCMP@>
{"FUN", 0x02, 0x240a8},
@.FUN@>
{"FEQL", 0x03, 0x240a8},@/
@.FEQL@>
{"FADD", 0x04, 0x240a8},
@.FADD@>
{"FIX", 0x05, 0x26288},
@.FIX@>
{"FSUB", 0x06, 0x240a8},
@.FSUB@>
{"FIXU", 0x07, 0x26288},@/
@.FIXU@>
{"FLOT", 0x08, 0x26282},
@.FLOT@>
{"FLOTU", 0x0a, 0x26282},
@.FLOTU@>
{"SFLOT", 0x0c, 0x26282},
@.SFLOT@>
{"SFLOTU", 0x0e, 0x26282},@/
@.SFLOTU@>
{"FMUL", 0x10, 0x240a8},
@.FMUL@>
{"FCMPE", 0x11, 0x240a8},
@.FCMPE@>
{"FUNE", 0x12, 0x240a8},
@.FUNE@>
{"FEQLE", 0x13, 0x240a8},@/
@.FEQLE@>
{"FDIV", 0x14, 0x240a8},
@.FDIV@>
{"FSQRT", 0x15, 0x26288},
@.FSQRT@>
{"FREM", 0x16, 0x240a8},
@.FREM@>
{"FINT", 0x17, 0x26288},@/
@.FINT@>
{"MUL", 0x18, 0x240a2},
@.MUL@>
{"MULU", 0x1a, 0x240a2},
@.MULU@>
{"DIV", 0x1c, 0x240a2},
@.DIV@>
{"DIVU", 0x1e, 0x240a2},@/
@.DIVU@>
{"ADD", 0x20, 0x240a2},
@.ADD@>
{"ADDU", 0x22, 0x240a2},
@.ADDU@>
{"SUB", 0x24, 0x240a2},
@.SUB@>
{"SUBU", 0x26, 0x240a2},@/
@.SUBU@>
{"2ADDU", 0x28, 0x240a2},
@.2ADDU@>
{"4ADDU", 0x2a, 0x240a2},
@.4ADDU@>
{"8ADDU", 0x2c, 0x240a2},
@.8ADDU@>
{"16ADDU", 0x2e, 0x240a2},@/
@.16ADDU@>
{"CMP", 0x30, 0x240a2},
@.CMP@>
{"CMPU", 0x32, 0x240a2},
@.CMPU@>
{"NEG", 0x34, 0x26082},
@.NEG@>
{"NEGU", 0x36, 0x26082},@/
@.NEGU@>
{"SL", 0x38, 0x240a2},
@.SL@>
{"SLU", 0x3a, 0x240a2},
@.SLU@>
{"SR", 0x3c, 0x240a2},
@.SR@>
{"SRU", 0x3e, 0x240a2},@/
@.SRU@>
{"BN", 0x40, 0x22081},
@.BN@>
{"BZ", 0x42, 0x22081},
@.BZ@>
{"BP", 0x44, 0x22081},
@.BP@>
{"BOD", 0x46, 0x22081},@/
@.BOD@>
{"BNN", 0x48, 0x22081},
@.BNN@>
{"BNZ", 0x4a, 0x22081},
@.BNZ@>
{"BNP", 0x4c, 0x22081},
@.BNP@>
{"BEV", 0x4e, 0x22081},@/
@.BEV@>
{"PBN", 0x50, 0x22081},
@.PBN@>
{"PBZ", 0x52, 0x22081},
@.PBZ@>
{"PBP", 0x54, 0x22081},
@.PBP@>
{"PBOD", 0x56, 0x22081},@/
@.PBOD@>
{"PBNN", 0x58, 0x22081},
@.PBNN@>
{"PBNZ", 0x5a, 0x22081},
@.PBNZ@>
{"PBNP", 0x5c, 0x22081},
@.PBNP@>
{"PBEV", 0x5e, 0x22081},@/
@.PBEV@>
{"CSN", 0x60, 0x240a2},
@.CSN@>
{"CSZ", 0x62, 0x240a2},
@.CSZ@>
{"CSP", 0x64, 0x240a2},
@.CSP@>
{"CSOD", 0x66, 0x240a2},@/
@.CSOD@>
{"CSNN", 0x68, 0x240a2},
@.CSNN@>
{"CSNZ", 0x6a, 0x240a2},
@.CSNZ@>
{"CSNP", 0x6c, 0x240a2},
@.CSNP@>
{"CSEV", 0x6e, 0x240a2},@/
@.CSEV@>
{"ZSN", 0x70, 0x240a2},
@.ZSN@>
{"ZSZ", 0x72, 0x240a2},
@.ZSZ@>
{"ZSP", 0x74, 0x240a2},
@.ZSP@>
{"ZSOD", 0x76, 0x240a2},@/
@.ZSOD@>
{"ZSNN", 0x78, 0x240a2},
@.ZSNN@>
{"ZSNZ", 0x7a, 0x240a2},
@.ZSNZ@>
{"ZSNP", 0x7c, 0x240a2},
@.ZSNP@>
{"ZSEV", 0x7e, 0x240a2},@/
@.ZSEV@>
{"LDB", 0x80, 0xa60a2},
@.LDB@>
{"LDBU", 0x82, 0xa60a2},
@.LDBU@>
{"LDW", 0x84, 0xa60a2},
@.LDW@>
{"LDWU", 0x86, 0xa60a2},@/
@.LDWU@>
{"LDT", 0x88, 0xa60a2},
@.LDT@>
{"LDTU", 0x8a, 0xa60a2},
@.LDTU@>
{"LDO", 0x8c, 0xa60a2},
@.LDO@>
{"LDOU", 0x8e, 0xa60a2},@/
@.LDOU@>
{"LDSF", 0x90, 0xa60a2},
@.LDSF@>
{"LDHT", 0x92, 0xa60a2},
@.LDHT@>
{"CSWAP", 0x94, 0xa60a2},
@.CSWAP@>
{"LDUNC", 0x96, 0xa60a2},@/
@.LDUNC@>
{"LDVTS", 0x98, 0xa60a2},
@.LDVTS@>
{"PRELD", 0x9a, 0xa6022},
@.PRELD@>
{"PREGO", 0x9c, 0xa6022},
@.PREGO@>
{"GO", 0x9e, 0xa60a2},@/
@.GO@>
{"STB", 0xa0, 0xa60a2},
@.STB@>
{"STBU", 0xa2, 0xa60a2},
@.STBU@>
{"STW", 0xa4, 0xa60a2},
@.STW@>
{"STWU", 0xa6, 0xa60a2},@/
@.STWU@>
{"STT", 0xa8, 0xa60a2},
@.STT@>
{"STTU", 0xaa, 0xa60a2},
@.STTU@>
{"STO", 0xac, 0xa60a2},
@.STO@>
{"STOU", 0xae, 0xa60a2},@/
@.STOU@>
{"STSF", 0xb0, 0xa60a2},
@.STSF@>
{"STHT", 0xb2, 0xa60a2},
@.STHT@>
{"STCO", 0xb4, 0xa6022},
@.STCO@>
{"STUNC", 0xb6, 0xa60a2},@/
@.STUNC@>
{"SYNCD", 0xb8, 0xa6022},
@.SYNCD@>
{"PREST", 0xba, 0xa6022},
@.PREST@>
{"SYNCID", 0xbc, 0xa6022},
@.SYNCID@>
{"PUSHGO", 0xbe, 0xa6062},@/
@.PUSHGO@>
{"OR", 0xc0, 0x240a2},
@.OR@>
{"ORN", 0xc2, 0x240a2},
@.ORN@>
{"NOR", 0xc4, 0x240a2},
@.NOR@>
{"XOR", 0xc6, 0x240a2},@/
@.XOR@>
{"AND", 0xc8, 0x240a2},
@.AND@>
{"ANDN", 0xca, 0x240a2},
@.ANDN@>
{"NAND", 0xcc, 0x240a2},
@.NAND@>
{"NXOR", 0xce, 0x240a2},@/
@.NXOR@>
{"BDIF", 0xd0, 0x240a2},
@.BDIF@>
{"WDIF", 0xd2, 0x240a2},
@.WDIF@>
{"TDIF", 0xd4, 0x240a2},
@.TDIF@>
{"ODIF", 0xd6, 0x240a2},@/
@.ODIF@>
{"MUX", 0xd8, 0x240a2},
@.MUX@>
{"SADD", 0xda, 0x240a2},
@.SADD@>
{"MOR", 0xdc, 0x240a2},
@.MOR@>
{"MXOR", 0xde, 0x240a2},@/
@.MXOR@>
{"SETH", 0xe0, 0x22080},
@.SETH@>
{"SETMH", 0xe1, 0x22080},
@.SETMH@>
{"SETML", 0xe2, 0x22080},
@.SETML@>
{"SETL", 0xe3, 0x22080},@/
@.SETL@>
{"INCH", 0xe4, 0x22080},
@.INCH@>
{"INCMH", 0xe5, 0x22080},
@.INCMH@>
{"INCML", 0xe6, 0x22080},
@.INCML@>
{"INCL", 0xe7, 0x22080},@/
@.INCL@>
{"ORH", 0xe8, 0x22080},
@.ORH@>
{"ORMH", 0xe9, 0x22080},
@.ORMH@>
{"ORML", 0xea, 0x22080},
@.ORML@>
{"ORL", 0xeb, 0x22080},@/
@.ORL@>
{"ANDNH", 0xec, 0x22080},
@.ANDNH@>
{"ANDNMH", 0xed, 0x22080},
@.ANDNMH@>
{"ANDNML", 0xee, 0x22080},
@.ANDNML@>
{"ANDNL", 0xef, 0x22080},@/
@.ANDNL@>
{"JMP", 0xf0, 0x21001},
@.JMP@>
{"PUSHJ", 0xf2, 0x22041},
@.PUSHJ@>
{"GETA", 0xf4, 0x22081},
@.GETA@>
{"PUT", 0xf6, 0x22002},@/
@.PUT@>
{"POP", 0xf8, 0x23000},
@.POP@>
{"RESUME", 0xf9, 0x21000},
@.RESUME@>
{"SAVE", 0xfa, 0x22080},
@.SAVE@>
{"UNSAVE", 0xfb, 0x23a00},@/
@.UNSAVE@>
{"SYNC", 0xfc, 0x21000},
@.SYNC@>
{"SWYM", 0xfd, 0x27554},
@.SWYM@>
{"GET", 0xfe, 0x22080},
@.GET@>
{"TRIP", 0xff, 0x27554},@/
@.TRIP@>
{"SET",SET, 0x22180},
@.SET@>
{"LDA", 0x22, 0xa60a2},@/
@.LDA@>
{"IS", IS, 0x101400},
@.IS@>
{"LOC", LOC, 0x1400},
@.LOC@>
{"PREFIX", PREFIX, 0x141000},@/
@.PREFIX@>
{"BYTE", BYTE, 0x10f000},
@.BYTE@>
{"WYDE", WYDE, 0x11f000},
@.WYDE@>
{"TETRA", TETRA, 0x12f000},
@.TETRA@>
{"OCTA", OCTA, 0x13f000},@/
@.OCTA@>
{"BSPEC", BSPEC, 0x41400},
@.BSPEC@>
{"ESPEC", ESPEC, 0x141000},@/
@.ESPEC@>
{"GREG", GREG, 0x101000},
@.GREG@>
{"LOCAL", LOCAL, 0x141800}};
@.LOCAL@>
int op_init_size; /* the number of items in |op_init_table| */
 
@ @<Put the \MMIX\ opcodes and \MMIXAL\ pseudo-ops into the trie@>=
op_init_size=(sizeof op_init_table)/sizeof(op_spec);
for (j=0;j<op_init_size;j++) {
tt=trie_search(op_root,op_init_table[j].name);
pp=tt->sym=new_sym_node(false);
pp->link=PREDEFINED;
pp->equiv.h=op_init_table[j].code, pp->equiv.l=op_init_table[j].bits;
}
 
@ @<Local...@>=
register trie_node *tt;
register sym_node *pp,*qq;
 
@ @<Put the special register names into the trie@>=
for (j=0;j<32;j++) {
tt=trie_search(trie_root,special_name[j]);
pp=tt->sym=new_sym_node(false);
pp->link=PREDEFINED;
pp->equiv.l=j;
}
 
@ @<Glob...@>=
Char *special_name[32]={"rB","rD","rE","rH","rJ","rM","rR","rBB",
"rC","rN","rO","rS","rI","rT","rTT","rK","rQ","rU","rV","rG","rL",
"rA","rF","rP","rW","rX","rY","rZ","rWW","rXX","rYY","rZZ"};
@^predefined symbols@>
 
@ @<Type...@>=
typedef struct {
Char* name;
tetra h,l;
}@+predef_spec;
 
@ @<Glob...@>=
predef_spec predefs[]={
{"ROUND_CURRENT",0,0},
@:ROUND_CURRENT}\.{ROUND\_CURRENT@>
{"ROUND_OFF",0,1},
@:ROUND_OFF}\.{ROUND\_OFF@>
{"ROUND_UP",0,2},
@:ROUND_UP}\.{ROUND\_UP@>
{"ROUND_DOWN",0,3},
@:ROUND_DOWN}\.{ROUND\_DOWN@>
{"ROUND_NEAR",0,4},@/
@:ROUND_NEAR}\.{ROUND\_NEAR@>
{"Inf",0x7ff00000,0},@/
@.Inf@>
{"Data_Segment",0x20000000,0},
@:Data_Segment}\.{Data\_Segment@>
{"Pool_Segment",0x40000000,0},
@:Pool_Segment}\.{Pool\_Segment@>
{"Stack_Segment",0x60000000,0},@/
@:Stack_Segment}\.{Stack\_Segment@>
{"D_BIT",0,0x80},
@:D_BIT}\.{D\_BIT@>
{"V_BIT",0,0x40},
@:V_BIT}\.{V\_BIT@>
{"W_BIT",0,0x20},
@:W_BIT}\.{W\_BIT@>
{"I_BIT",0,0x10},
@:I_BIT}\.{I\_BIT@>
{"O_BIT",0,0x08},
@:O_BIT}\.{O\_BIT@>
{"U_BIT",0,0x04},
@:U_BIT}\.{U\_BIT@>
{"Z_BIT",0,0x02},
@:Z_BIT}\.{Z\_BIT@>
{"X_BIT",0,0x01},@/
@:X_BIT}\.{X\_BIT@>
{"D_Handler",0,0x10},
@:D_Handler}\.{D\_Handler@>
{"V_Handler",0,0x20},
@:V_Handler}\.{V\_Handler@>
{"W_Handler",0,0x30},
@:W_Handler}\.{W\_Handler@>
{"I_Handler",0,0x40},
@:I_Handler}\.{I\_Handler@>
{"O_Handler",0,0x50},
@:O_Handler}\.{O\_Handler@>
{"U_Handler",0,0x60},
@:U_Handler}\.{U\_Handler@>
{"Z_Handler",0,0x70},
@:Z_Handler}\.{Z\_Handler@>
{"X_Handler",0,0x80},@/
@:X_Handler}\.{X\_Handler@>
{"StdIn",0,0},
@.StdIn@>
{"StdOut",0,1},
@.StdOut@>
{"StdErr",0,2},@/
@.StdErr@>
{"TextRead",0,0},
@.TextRead@>
{"TextWrite",0,1},
@.TextWrite@>
{"BinaryRead",0,2},
@.BinaryRead@>
{"BinaryWrite",0,3},
@.BinaryWrite@>
{"BinaryReadWrite",0,4},@/
@.BinaryReadWrite@>
{"Halt",0,0},
@.Halt@>
{"Fopen",0,1},
@.Fopen@>
{"Fclose",0,2},
@.Fclose@>
{"Fread",0,3},
@.Fread@>
{"Fgets",0,4},
@.Fgets@>
{"Fgetws",0,5},
@.Fgetws@>
{"Fwrite",0,6},
@.Fwrite@>
{"Fputs",0,7},
@.Fputs@>
{"Fputws",0,8},
@.Fputws@>
{"Fseek",0,9},
@.Fseek@>
{"Ftell",0,10}};
@.Ftell@>
int predef_size;
@^predefined symbols@>
 
@ @<Put other predefined symbols into the trie@>=
predef_size=(sizeof predefs)/sizeof(predef_spec);
for (j=0;j<predef_size;j++) {
tt=trie_search(trie_root,predefs[j].name);
pp=tt->sym=new_sym_node(false);
pp->link=PREDEFINED;
pp->equiv.h=predefs[j].h, pp->equiv.l=predefs[j].l;
}
 
@ We place \.{Main} into the trie at the beginning of assembly,
so that it will show up as an undefined symbol if the user
specifies no starting point.
@.Main@>
 
@<Init...@>=
trie_search(trie_root,"Main")->sym=new_sym_node(true);
 
@ At the end of assembly we traverse the entire symbol table, visiting each
symbol in lexicographic order and transmitting the trie structure to the
output file. We detect any undefined future references at this time.
 
The order of traversal has a simple recursive pattern: To traverse the subtrie
rooted at~|t|, we
$$\vbox{\halign{#\hfil\cr
traverse |t->left|, if the left subtrie is nonempty;\cr
visit |t->sym|, if this symbol table entry is present;\cr
traverse |t->mid|, if the middle subtrie is nonempty;\cr
traverse |t->right|, if the right subtrie is nonempty.\cr
}}$$
This pattern leads to a compact representation in the \.{mmo} file, usually
requiring fewer than two bytes per trie node plus the bytes needed to encode
the equivalents and serial numbers. Each node of the trie is encoded as a
``master byte'' followed by the encodings of the left subtrie,
character, equivalent, middle subtrie, and right subtrie.
The master byte is the sum of
$$\vbox{\halign{#\hfil\cr
\Hex{80}, if the character occupies two bytes instead of one;\cr
\Hex{40}, if the left subtrie is nonempty;\cr
\Hex{20}, if the middle subtrie is nonempty;\cr
\Hex{10}, if the right subtrie is nonempty;\cr
\Hex{01} to \Hex{08}, if the symbol's equivalent is one to eight bytes long;\cr
\Hex{09} to \Hex{0e}, if the symbol's equivalent is $2^{61}$ plus one
to six bytes;\cr
\Hex{0f}, if the symbol's equivalent is \$0 plus one byte;\cr}}$$
the character is omitted if the middle subtrie and the equivalent are
both empty. The ``equivalent'' of an undefined symbol is zero, but
stated as two bytes long.
Symbol equivalents are followed by the serial number, represented as a
sequence of one or more bytes in radix~128; the final byte of the serial
number is tagged by adding~128. (Thus, serial number $2^{14}-1$ is
encoded as \Hex{7fff}; serial number $2^{14}$ is \Hex{010080}.)
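
The radix-128 encoding of serial numbers described above can be tried in
isolation. The following is only a sketch: the helper name |encode_serial|
and its output buffer are assumptions made for illustration, and the real
program emits the bytes through |mmo_byte| instead of storing them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of MMIXAL: write the radix-128 encoding
   of serial into out, most significant digit first, tagging the final
   byte by adding 0x80.  Returns the number of bytes written, which is
   at most 5 for a 32-bit serial number. */
size_t encode_serial(uint32_t serial, unsigned char *out)
{
    int m;
    size_t n = 0;

    /* find the index m of the leading radix-128 digit */
    for (m = 0; m < 4; m++)
        if (serial < ((uint32_t)1 << (7 * (m + 1)))) break;

    /* emit digits m, m-1, ..., 0; only digit 0 carries the tag */
    for (; m >= 0; m--)
        out[n++] = (unsigned char)(((serial >> (7 * m)) & 0x7f)
                                   + (m ? 0 : 0x80));
    return n;
}
```

With this sketch, serial number $2^{14}-1$ yields the two bytes 7f ff and
serial number $2^{14}$ yields the three bytes 01 00 80, in agreement with
the examples in the text.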
 
@ First we prune the trie by removing all predefined symbols that the
user did not redefine.
 
@<Sub...@>=
trie_node* prune @,@,@[ARGS((trie_node*))@];@+@t}\6{@>
trie_node* prune(t)
trie_node* t;
{
register int useful=0;
if (t->sym) {
if (t->sym->serial) useful=1;
else t->sym=NULL;
}
if (t->left) {
t->left=prune(t->left);
if (t->left) useful=1;
}
if (t->mid) {
t->mid=prune(t->mid);
if (t->mid) useful=1;
}
if (t->right) {
t->right=prune(t->right);
if (t->right) useful=1;
}
if (useful) return t;
else return NULL;
}
 
@ Then we output the trie by following the recursive traversal pattern.
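
The traversal order (left, symbol, middle, right) is what makes the symbols
come out in lexicographic order. Here is a small stand-alone sketch of that
idea on a simplified ternary trie. The type |tnode| and the routines
|tnode_insert| and |tnode_walk| are hypothetical stand-ins, not the
|trie_node| machinery of this program; they assume one-byte characters and
nonempty symbols, and they never free their nodes.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified ternary trie node (an assumption for illustration only) */
typedef struct tnode {
    char ch;                          /* character at this node */
    int has_sym;                      /* nonzero if a symbol ends here */
    struct tnode *left, *mid, *right;
} tnode;

/* Insert the nonempty string s below t and return the subtrie root */
tnode *tnode_insert(tnode *t, const char *s)
{
    if (!t) {
        t = calloc(1, sizeof *t);
        if (!t) abort();
        t->ch = *s;
    }
    if (*s < t->ch) t->left = tnode_insert(t->left, s);
    else if (*s > t->ch) t->right = tnode_insert(t->right, s);
    else if (s[1]) t->mid = tnode_insert(t->mid, s + 1);
    else t->has_sym = 1;
    return t;
}

/* Append every symbol below t to out, each followed by one space.
   buf holds the len characters found on middle branches so far;
   the caller must make buf and out large enough. */
void tnode_walk(const tnode *t, char *buf, size_t len, char *out)
{
    if (!t) return;
    tnode_walk(t->left, buf, len, out);      /* smaller characters first */
    buf[len] = t->ch;
    if (t->has_sym) {                        /* visit the symbol itself */
        strncat(out, buf, len + 1);
        strcat(out, " ");
    }
    tnode_walk(t->mid, buf, len + 1, out);   /* then its extensions */
    tnode_walk(t->right, buf, len, out);     /* then larger characters */
}
```

Inserting b, ab, a, and abc in that order and walking from the root with
an empty output string produces the symbols in the order a, ab, abc, b.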
 
@<Sub...@>=
void out_stab @,@,@[ARGS((trie_node*))@];@+@t}\6{@>
void out_stab(t)
trie_node* t;
{
register int m=0,j;
register sym_node *pp;
if (t->ch>0xff) m+=0x80;
if (t->left) m+=0x40;
if (t->mid) m+=0x20;
if (t->right) m+=0x10;
if (t->sym) {
if (t->sym->link==REGISTER) m+=0xf;
else if (t->sym->link==DEFINED)
@<Encode the length of |t->sym->equiv|@>@;
else if (t->sym->link || t->sym->serial==1) @<Report an undefined symbol@>;
}
mmo_byte(m);
if (t->left) out_stab(t->left);
if (m&0x2f) @<Visit |t| and traverse |t->mid|@>;
if (t->right) out_stab(t->right);
}
 
@ A global variable called |sym_buf| holds all characters on middle branches to
the current trie node; |sym_ptr| is the first currently unused
character in |sym_buf|.
@^Unicode@>
 
@<Visit |t| and traverse |t->mid|@>=
{
if (m&0x80) mmo_byte(t->ch>>8);
mmo_byte(t->ch&0xff);
*sym_ptr++=(m&0x80? '?': t->ch); /* Unicode? not yet */
m&=0xf;@+ if (m && t->sym->link) {
if (listing_file) @<Print symbol |sym_buf| and its equivalent@>;
if (m==15) m=1;
else if (m>8) m-=8;
for (;m>0;m--)
if (m>4) mmo_byte((t->sym->equiv.h>>(8*(m-5)))&0xff);
else mmo_byte((t->sym->equiv.l>>(8*(m-1)))&0xff);
for (m=0;m<4;m++) if (t->sym->serial<(1<<(7*(m+1)))) break;
for (;m>=0;m--)
mmo_byte(((t->sym->serial>>(7*m))&0x7f)+(m? 0: 0x80));
}
if (t->mid) out_stab(t->mid);
sym_ptr--;
}
 
@ @<Encode the length of |t->sym->equiv|@>=
{@+register tetra x;
if ((t->sym->equiv.h&0xffff0000)==0x20000000)
m+=8, x=t->sym->equiv.h-0x20000000; /* data segment */
else x=t->sym->equiv.h;
if (x) m+=4;@+ else x=t->sym->equiv.l;
for (j=1;j<4;j++) if (x<(1<<(8*j))) break;
m+=j;
}
 
@ We make room for symbols up to 999 bytes long. Strictly speaking,
the program should check if this limit is exceeded; but really!
 
@<Glob...@>=
Char sym_buf[1000];
Char *sym_ptr;
 
@ The initial `\.:' of each fully qualified symbol is omitted here, since most
users of \MMIXAL\ will probably not need the \.{PREFIX} feature. One
consequence of this omission is that the one-character symbol~`\.:'
itself, which is allowed by the rules of \MMIXAL, is printed as the null
string.
 
@<Print symbol |sym_buf| and its equivalent@>=
{
*sym_ptr='\0';
fprintf(listing_file," %s = ",sym_buf+1);
pp=t->sym;
if (pp->link==DEFINED)
fprintf(listing_file,"#%08x%08x",pp->equiv.h,pp->equiv.l);
else if (pp->link==REGISTER)
fprintf(listing_file,"$%03d",pp->equiv.l);
else fprintf(listing_file,"?");
fprintf(listing_file," (%d)\n",pp->serial);
}
 
@ @<Report an undefined symbol@>=
{
*sym_ptr=(m&0x80? '?': t->ch); /* Unicode? not yet */
*(sym_ptr+1)='\0';
fprintf(stderr,"undefined symbol: %s\n",sym_buf+1);
@.undefined symbol@>
err_count++;
m+=2;
}
 
@ @<Check and output the trie@>=
op_root->mid=NULL; /* annihilate all the opcodes */
prune(trie_root);
sym_ptr=sym_buf;
if (listing_file) fprintf(listing_file,"\nSymbol table:\n");
mmo_lop(lop_stab,0,0);
out_stab(trie_root);
while (mmo_ptr&3) mmo_byte(0);
mmo_lopp(lop_end,mmo_ptr>>2);
 
@* Expressions. The most intricate part of the assembly process is
the task of scanning and evaluating expressions in the operand field.
Fortunately, \MMIXAL's expressions have a simple structure that can
be handled easily with a stack-based approach.
 
Two stacks hold pending data as the operand field is scanned and evaluated.
The |op_stack| contains operators that have not yet been performed; the
|val_stack| contains values that have not yet been used. After an entire
operand list has been scanned, the |op_stack| will be empty and the
|val_stack| will hold the operand values needed to assemble the current
instruction.
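
The two-stack discipline can be seen in miniature in the following sketch.
It is not the code of this program: it handles only decimal constants,
parentheses, and the operators plus, minus, and times, it evaluates a single
expression with no commas, and the names |sketch_op|, |sketch_prec|, and
|sketch_eval| are assumptions made for illustration. The control flow
imitates the scheme described above, in which an operator is performed
when its precedence is at least that of the newly scanned one.

```c
#include <ctype.h>
#include <stdint.h>

#define SKETCH_MAX 64

typedef enum { OUTER_LP, INNER_LP, PLUS, MINUS, TIMES, RP, END } sketch_op;
/* precedences: 0 for delimiters, 1 for weak, 2 for strong operators */
static const int sketch_prec[] = {0, 0, 1, 1, 2, 0, 0};

/* Evaluate s; return 0 and set *result on success, -1 on a syntax error */
int sketch_eval(const char *s, uint64_t *result)
{
    sketch_op ops[SKETCH_MAX], rt_op;
    uint64_t vals[SKETCH_MAX];
    int op_ptr = 0, val_ptr = 0;

    ops[op_ptr++] = OUTER_LP;            /* the outer left parenthesis */
    for (;;) {
        /* opening tokens: left parentheses, then a decimal constant */
        while (*s == '(') {
            if (op_ptr >= SKETCH_MAX) return -1;
            ops[op_ptr++] = INNER_LP;
            s++;
        }
        if (!isdigit((unsigned char)*s) || val_ptr >= SKETCH_MAX) return -1;
        {
            uint64_t acc = 0;
            while (isdigit((unsigned char)*s))
                acc = acc * 10 + (uint64_t)(*s++ - '0');
            vals[val_ptr++] = acc;
        }
    scan_close:                          /* binary operator or closer */
        switch (*s) {
        case '+': rt_op = PLUS; s++; break;
        case '-': rt_op = MINUS; s++; break;
        case '*': rt_op = TIMES; s++; break;
        case ')': rt_op = RP; s++; break;
        case '\0': rt_op = END; break;
        default: return -1;
        }
        while (sketch_prec[ops[op_ptr - 1]] >= sketch_prec[rt_op]) {
            switch (ops[--op_ptr]) {
            case INNER_LP:
                if (rt_op != RP) return -1;   /* missing right parenthesis */
                goto scan_close;
            case OUTER_LP:
                if (rt_op != END) return -1;  /* missing left parenthesis */
                *result = vals[0];
                return 0;
            case PLUS: vals[val_ptr - 2] += vals[val_ptr - 1]; val_ptr--; break;
            case MINUS: vals[val_ptr - 2] -= vals[val_ptr - 1]; val_ptr--; break;
            case TIMES: vals[val_ptr - 2] *= vals[val_ptr - 1]; val_ptr--; break;
            default: return -1;
            }
        }
        if (op_ptr >= SKETCH_MAX) return -1;
        ops[op_ptr++] = rt_op;           /* hold the operator for later */
    }
}
```

Because an operator of equal precedence already on the stack is performed
before the new one is pushed, operators of the same strength associate to
the left; thus 10-4-3 evaluates to 3 in this sketch, and 2+3*4 to 14.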
 
@ Each entry on |op_stack| is one of the |stack_op| constants defined here,
and each such constant has one of the precedence levels defined here.
 
Entries on |val_stack| have |equiv|, |link|, and |status| fields; the |link|
points to a trie node if the expression is a symbol that has not yet
been subjected to any operations.
 
@<Type...@>=
typedef enum {@!negate,@!serialize,@!complement,@!registerize,@!inner_lp,@|
@!plus,@!minus,@!times,@!over,@!frac,@!mod,@!shl,@!shr,@!and,@!or,@!xor,@|
@!outer_lp,@!outer_rp,@!inner_rp} @!stack_op;
typedef enum {@!zero,@!weak,@!strong,@!unary} @!prec;
typedef enum {@!pure,@!reg_val,@!undefined} @!stat;
typedef struct {
octa equiv; /* current value */
trie_node *link; /* trie reference for symbol */
stat status; /* |pure|, |reg_val|, or |undefined| */
} val_node;
 
@ @d top_op op_stack[op_ptr-1] /* top entry on the operator stack */
@d top_val val_stack[val_ptr-1] /* top entry on the value stack */
@d next_val val_stack[val_ptr-2] /* next-to-top entry of the value stack */
 
@<Glob...@>=
stack_op *op_stack; /* stack for pending operators */
int op_ptr; /* number of items on |op_stack| */
val_node *val_stack; /* stack for pending operands */
int val_ptr; /* number of items on |val_stack| */
prec precedence[]={unary,unary,unary,unary,zero,@|
weak,weak,strong,strong,strong,strong,strong,strong,strong,weak,weak,@|
zero,zero,zero}; /* precedences of the respective |stack_op| values */
stack_op rt_op; /* newly scanned operator */
octa acc; /* temporary accumulator */
 
@ @<Init...@>=
op_stack=(stack_op*)calloc(buf_size,sizeof(stack_op));
val_stack=(val_node*)calloc(buf_size,sizeof(val_node));
if (!op_stack || !val_stack) panic("No room for the stacks");
@.No room...@>
 
@ The operand field of an instruction will have been copied into a separate
\&{Char} array called |operand_list| when we reach this part of the program.
 
@<Scan the operand field@>=
p=operand_list;
val_ptr=0; /* |val_stack| is empty */
op_stack[0]=outer_lp, op_ptr=1;
/* |op_stack| contains an ``outer left parenthesis'' */
while (1) {
@<Scan opening tokens until putting something on |val_stack|@>;
scan_close: @<Scan a binary operator or closing token, |rt_op|@>;
while (precedence[top_op]>=precedence[rt_op])
@<Perform the top operation on |op_stack|@>;
hold_op: op_stack[op_ptr++]=rt_op;
}
operands_done:@;
 
@ A comment that follows an empty operand list needs to be detected here.
 
@<Scan opening tokens until putting something on |val_stack|@>=
scan_open:@+if (isletter(*p)) @<Scan a symbol@>@;
else if (isdigit(*p)) {
if (*(p+1)=='F') @<Scan a forward local@>@;
else if (*(p+1)=='B') @<Scan a backward local@>@;
else @<Scan a decimal constant@>;
}@+else@+ switch(*p++) {
case '#': @<Scan a hexadecimal constant@>;@+break;
case '\'': @<Scan a character constant@>;@+break;
case '\"': @<Scan a string constant@>;@+break;
case '@@': @<Scan the current location@>;@+break;
case '-': op_stack[op_ptr++]=negate; /* fall through to the next case */
case '+': goto scan_open;
case '&': op_stack[op_ptr++]=serialize;@+goto scan_open;
case '~': op_stack[op_ptr++]=complement;@+goto scan_open;
case '$': op_stack[op_ptr++]=registerize;@+goto scan_open;
case '(': op_stack[op_ptr++]=inner_lp;@+goto scan_open;
default: if (p==operand_list+1) { /* treat operand list as empty */
operand_list[0]='0', operand_list[1]='\0', p=operand_list;
goto scan_open;
}
if (*(p-1)) derr("syntax error at character `%c'",*(p-1));
derr("syntax error after character `%c'",*(p-2));
@.syntax error...@>
}
 
@ @<Scan a symbol@>=
{
if (*p==':') tt=trie_search(trie_root,p+1);
else tt=trie_search(cur_prefix,p);
p=terminator;
symbol_found: val_ptr++;
pp=tt->sym;
if (!pp) pp=tt->sym=new_sym_node(true);
top_val.link=tt, top_val.equiv=pp->equiv;
if (pp->link==PREDEFINED) pp->link=DEFINED;
top_val.status=(pp->link==DEFINED? pure: pp->link==REGISTER? reg_val:
undefined);
}
 
@ @<Scan a forward local@>=
{
tt=&forward_local_host[*p-'0'];@+ p+=2;@+ goto symbol_found;
}
 
@ @<Scan a backward local@>=
{
tt=&backward_local_host[*p-'0'];@+ p+=2;@+ goto symbol_found;
}
 
@ Statically allocated variables |forward_local_host[j]| and
|backward_local_host[j]| masquerade as nodes of the trie.
 
@<Glob...@>=
trie_node forward_local_host[10], backward_local_host[10];
sym_node forward_local[10], backward_local[10];
 
@ Initially \.{0H}, \.{1H}, \dots, \.{9H} are defined to be zero.
 
@<Init...@>=
for (j=0;j<10;j++) {
forward_local_host[j].sym=&forward_local[j];
backward_local_host[j].sym=&backward_local[j];
backward_local[j].link=DEFINED;
}
 
@ We have already checked to make sure that the character constant is legal.
 
@<Scan a character constant@>=
acc.h=0, acc.l=*p;
p+=2;
goto constant_found;
 
@ @<Scan a string constant@>=
acc.h=0, acc.l=*p;
if (*p=='\"') {
p++; acc.l=0; err("*null string is treated as zero");
@.null string...@>
}@+else if (*(p+1)=='\"') p+=2;
else *p='\"', *--p=',';
goto constant_found;
 
@ @<Scan a decimal constant@>=
acc.h=0, acc.l=*p-'0';
for (p++;isdigit(*p);p++) {
acc=oplus(acc,shift_left(acc,2));
acc=incr(shift_left(acc,1),*p-'0');
}
constant_found: val_ptr++;
top_val.link=NULL;
top_val.equiv=acc;
top_val.status=pure;
 
@ @<Scan a hexadecimal constant@>=
if (!isxdigit(*p)) err("illegal hexadecimal constant");
@.illegal hexadecimal constant@>
acc.h=acc.l=0;
for (;isxdigit(*p);p++) {
acc=incr(shift_left(acc,4),*p-'0');
if (*p>='a') acc=incr(acc,'0'-'a'+10);
else if (*p>='A') acc=incr(acc,'0'-'A'+10);
}
goto constant_found;
 
@ @<Scan the current location@>=
acc=cur_loc;
goto constant_found;
 
@ @<Scan a binary operator or closing token, |rt_op|@>=
switch(*p++) {
case '+': rt_op=plus;@+break;
case '-': rt_op=minus;@+break;
case '*': rt_op=times;@+break;
case '/':@+if (*p!='/') rt_op=over;
else p++,rt_op=frac;@+break;
case '%': rt_op=mod;@+break;
case '<': rt_op=shl;@+goto sh_check;
case '>': rt_op=shr;
sh_check:@+p++;@+if (*(p-1)==*(p-2)) break;
derr("syntax error at `%c'",*(p-2));
@.syntax error...@>
case '&': rt_op=and;@+break;
case '|': rt_op=or;@+break;
case '^': rt_op=xor;@+break;
case ')': rt_op=inner_rp;@+break;
case '\0': case ',': rt_op=outer_rp;@+break;
default: derr("syntax error at `%c'",*(p-1));
}
 
@ @<Perform the top operation on |op_stack|@>=
switch(op_stack[--op_ptr]) {
case inner_lp:@+if (rt_op==inner_rp) goto scan_close;
err("*missing right parenthesis");@+break;
@.missing right parenthesis@>
case outer_lp:@+if (rt_op==outer_rp) {
if (top_val.status==reg_val && (top_val.equiv.l>0xff||top_val.equiv.h)) {
err("*register number too large, will be reduced mod 256");
@.register number...@>
top_val.equiv.h=0, top_val.equiv.l &= 0xff;
}
if (!*(p-1)) goto operands_done;
else rt_op=outer_lp;@+goto hold_op; /* comma */
}@+else {
op_ptr++;
err("*missing left parenthesis");
@.missing left parenthesis@>
goto scan_close;
}
@t\4@>@<Cases for unary operators@>@;
@t\4@>@<Cases for binary operators@>@;
}
 
@ Now we come to the part where equivalents are changed by unary
or binary operators found in the expression being scanned.
 
The most typical operator, and in some ways the fussiest one
to deal with, is binary addition. Once we've written the code for
this case, the other cases almost take care of themselves.
 
@<Cases for binary...@>=
case plus:@+if (top_val.status==undefined)
err("cannot add an undefined quantity");
@.cannot add...@>
if (next_val.status==undefined)
err("cannot add to an undefined quantity");
if (top_val.status==reg_val && next_val.status==reg_val)
err("cannot add two register numbers");
next_val.equiv=oplus(next_val.equiv,top_val.equiv);
fin_bin: next_val.status=(top_val.status==next_val.status? pure: reg_val);
val_ptr--;
delink: top_val.link=NULL;@+break;
 
@ @d unary_check(verb) if (top_val.status!=pure)
derr("can %s pure values only",verb)
 
@<Cases for unary...@>=
case negate: unary_check("negate");
@.can negate...@>
top_val.equiv=ominus(zero_octa,top_val.equiv);@+goto delink;
case complement: unary_check("complement");
@.can complement...@>
top_val.equiv.h=~top_val.equiv.h, top_val.equiv.l=~top_val.equiv.l;
goto delink;
case registerize: unary_check("registerize");
@.can registerize...@>
top_val.status=reg_val;@+goto delink;
case serialize:@+if (!top_val.link)
err("can take serial number of symbol only");
@.can take serial number...@>
top_val.equiv.h=0, top_val.equiv.l=top_val.link->sym->serial;
top_val.status=pure;@+goto delink;
 
@ @d binary_check(verb)
if (top_val.status!=pure || next_val.status!=pure)
derr("can %s pure values only",verb)
 
@<Cases for binary...@>=
case minus:@+if (top_val.status==undefined)
err("cannot subtract an undefined quantity");
@.cannot subtract...@>
if (next_val.status==undefined)
err("cannot subtract from an undefined quantity");
if (top_val.status==reg_val && next_val.status!=reg_val)
err("cannot subtract register number from pure value");
next_val.equiv=ominus(next_val.equiv,top_val.equiv);@+goto fin_bin;
case times: binary_check("multiply");
@.can multiply...@>
next_val.equiv=omult(next_val.equiv,top_val.equiv);@+goto fin_bin;
case over: case mod: binary_check("divide");
@.can divide...@>
if (top_val.equiv.l==0 && top_val.equiv.h==0)
err("*division by zero");
@.division by zero@>
next_val.equiv=odiv(zero_octa,next_val.equiv,top_val.equiv);
if (op_stack[op_ptr]==mod) next_val.equiv=aux;
goto fin_bin;
case frac: binary_check("compute a ratio of");
@.can compute...@>
if (next_val.equiv.h>=top_val.equiv.h &&
(next_val.equiv.l>=top_val.equiv.l || next_val.equiv.h>top_val.equiv.h))
err("*illegal fraction");
@.illegal fraction@>
next_val.equiv=odiv(next_val.equiv,zero_octa,top_val.equiv);@+goto fin_bin;
case shl: case shr: binary_check("compute a bitwise shift of");
if (top_val.equiv.h || top_val.equiv.l>63) next_val.equiv=zero_octa;
else if (op_stack[op_ptr]==shl)
next_val.equiv=shift_left(next_val.equiv,top_val.equiv.l);
else next_val.equiv=shift_right(next_val.equiv,top_val.equiv.l,true);
goto fin_bin;
case and: binary_check("compute bitwise and of");
next_val.equiv.h&=top_val.equiv.h, next_val.equiv.l&=top_val.equiv.l;
goto fin_bin;
case or: binary_check("compute bitwise or of");
next_val.equiv.h|=top_val.equiv.h, next_val.equiv.l|=top_val.equiv.l;
goto fin_bin;
case xor: binary_check("compute bitwise xor of");
next_val.equiv.h^=top_val.equiv.h, next_val.equiv.l^=top_val.equiv.l;
goto fin_bin;
 
@* Assembling an instruction.
Now let's move up from the expression level to the instruction level. We get to
this part of the program at the beginning of a line, or after a
semicolon at the end of an instruction earlier on the current line.
Our current position in the buffer is the value of |buf_ptr|.
 
@<Process the next \MMIXAL\ instruction or comment@>=
p=buf_ptr;@+ buf_ptr="";
@<Scan the label field; |goto bypass| if there is none@>;
@<Scan the opcode field; |goto bypass| if there is none@>;
@<Copy the operand field@>;
buf_ptr=p;
if (spec_mode && !(op_bits&spec_bit))
derr("cannot use `%s' in special mode",op_field);
@.cannot use...@>
if ((op_bits&no_label_bit) && lab_field[0]) {
derr("*label field of `%s' instruction is ignored",op_field);
lab_field[0]='\0';
}
@.label field...ignored@>
if (op_bits&align_bits) @<Align the location pointer@>;
@<Scan the operand field@>;
if (opcode==GREG) @<Allocate a global register@>;
if (lab_field[0]) @<Define the label@>;
@<Do the operation@>;
bypass:@;
 
@ @<Scan the label field; |goto bypass| if there is none@>=
if (!*p) goto bypass;
q=lab_field;
if (!isspace(*p)) {
if (!isdigit(*p)&&!isletter(*p)) goto bypass; /* comment */
for (*q++=*p++;isdigit(*p)||isletter(*p);p++,q++) *q=*p;
if (*p && !isspace(*p)) derr("label syntax error at `%c'",*p);
@.label syntax error...@>
}
*q='\0';
if (isdigit(lab_field[0]) && (lab_field[1]!='H' || lab_field[2]))
derr("improper local label `%s'",lab_field);
@.improper local label...@>
for (p++;isspace(*p);p++);
 
@ We copy the opcode field to a special buffer because we might
want to refer to the symbolic opcode in error messages.
 
@<Scan the opcode field...@>=
q=op_field;@+
while (isletter(*p)||isdigit(*p)) *q++=*p++;
*q='\0';
if (!isspace(*p) && *p && op_field[0]) derr("opcode syntax error at `%c'",*p);
@.opcode syntax error...@>
pp=trie_search(op_root,op_field)->sym;
if (!pp) {
if (op_field[0]) derr("unknown operation code `%s'",op_field);
@.unknown operation code@>
if (lab_field[0]) derr("*no opcode; label `%s' will be ignored",lab_field);
@.no opcode...@>
goto bypass;
}
opcode=pp->equiv.h, op_bits=pp->equiv.l;
while (isspace(*p)) p++;
 
@ @<Glob...@>=
tetra opcode; /* numeric code for \MMIX\ operation or \MMIXAL\ pseudo-op */
tetra op_bits; /* flags describing an operator's special characteristics */
 
@ We copy the operand field to a special buffer so that we can
change string constants while scanning them later.
 
@<Copy the operand field@>=
q=operand_list;
while (*p) {
if (*p==';') break;
if (*p=='\'') {
*q++=*p++;
if (!*p) err("incomplete character constant");
@.incomplete...constant@>
*q++=*p++;
if (*p!='\'') err("illegal character constant");
@.illegal character constant@>
}@+else if (*p=='\"') {
for (*q++=*p++;*p && *p!='\"';p++,q++) *q=*p;
if (!*p) err("incomplete string constant");
}
*q++=*p++;
if (isspace(*p)) break;
}
while (isspace(*p)) p++;
if (*p==';') p++;
else p=""; /* if not followed by semicolon, rest of the line is a comment */
if (q==operand_list) *q++='0'; /* change empty operand field to `\.0' */
*q='\0';
 
@ It is important to do the alignment in this step before defining
the label or evaluating the operand field.
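
The alignment itself is the usual trick of adding $2^j-1$ and then clearing
the low $j$ bits. The sketch below shows it on a plain 64-bit integer; the
name |align_up| is an assumption for illustration, and the program proper
works on an |octa| pair with |incr| and |oand| instead.

```c
#include <stdint.h>

/* Hypothetical helper: round loc up to the next multiple of 2^j,
   for 0 <= j < 64; a value that is already aligned is unchanged. */
uint64_t align_up(uint64_t loc, unsigned j)
{
    uint64_t size = (uint64_t)1 << j;
    return (loc + size - 1) & ~(size - 1);   /* clear the low j bits */
}
```

For example, |align_up(5,2)| gives 8, while |align_up(8,3)| leaves 8
unchanged.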
 
@<Align the location pointer@>=
{
j=(op_bits&align_bits)>>16; /* base-2 logarithm of the required alignment */
acc.h=-1, acc.l=-(1<<j); /* mask that clears the low |j| bits */
cur_loc=oand(incr(cur_loc,(1<<j)-1),acc);
}
 
@ @<Allocate a global register@>=
{
if (val_stack[0].equiv.l || val_stack[0].equiv.h) {
for (j=greg;j<255;j++)
if (greg_val[j].l==val_stack[0].equiv.l &&
greg_val[j].h==val_stack[0].equiv.h) {
cur_greg=j; goto got_greg;
}
}
if (greg==32) err("too many global registers");
@.too many global registers@>
greg--;
greg_val[greg]=val_stack[0].equiv;@+ cur_greg=greg;
got_greg:;
}
 
@ If the label is, say, \.{2H}, we will already have used the old
value of \.{2B} when evaluating the operands. Furthermore, an
operand of \.{2F} will have been treated as undefined, which it
still is.
 
Symbols can be defined more than once, but only if each definition
gives them the same equivalent value.
 
A warning message is given when a predefined symbol is being redefined,
if its predefined value has already been used.
 
@<Define the label@>=
{
sym_node *new_link=DEFINED;
acc=cur_loc;
if (opcode==IS) {
cur_loc=val_stack[0].equiv;
if (val_stack[0].status==reg_val) new_link=REGISTER;
}@+else if (opcode==GREG) cur_loc.h=0, cur_loc.l=cur_greg, new_link=REGISTER;
@<Find the symbol table node, |pp|@>;
if (pp->link==DEFINED || pp->link==REGISTER) {
if (pp->equiv.l!=cur_loc.l||pp->equiv.h!=cur_loc.h || pp->link!=new_link) {
if (pp->serial) derr("symbol `%s' is already defined",lab_field);
@.symbol...already defined@>
pp->serial=++serial_number;
derr("*redefinition of predefined symbol `%s'",lab_field);
@.redefinition...@>
}
}@+ else if (pp->link==PREDEFINED) pp->serial=++serial_number;
else if (pp->link) {
if (new_link==REGISTER) err("future reference cannot be to a register");
@.future reference cannot...@>
do @<Fix prior references to this label@>@;@+while (pp->link);
}
if (isdigit(lab_field[0])) pp=&backward_local[lab_field[0]-'0'];
pp->equiv=cur_loc;@+ pp->link=new_link;
@<Fix references that might be in the |val_stack|@>;
if (listing_file && (opcode==IS || opcode==LOC))
@<Make special listing to show the label equivalent@>;
cur_loc=acc;
}
 
@ @<Fix references that might be in the |val_stack|@>=
if (!isdigit(lab_field[0]))
for (j=0;j<val_ptr;j++)
if (val_stack[j].status==undefined && val_stack[j].link->sym==pp) {
val_stack[j].status=(new_link==REGISTER? reg_val: pure);
val_stack[j].equiv=cur_loc;
}
 
@ @<Find the symbol table node, |pp|@>=
if (isdigit(lab_field[0])) pp=&forward_local[lab_field[0]-'0'];
else {
if (lab_field[0]==':') tt=trie_search(trie_root,lab_field+1);
else tt=trie_search(cur_prefix,lab_field);
pp=tt->sym;
if (!pp) pp=tt->sym=new_sym_node(true);
}
 
@ @<Fix prior references to this label@>=
{
qq=pp->link;
pp->link=qq->link;
mmo_loc();
if (qq->serial==fix_o) @<Fix a future reference from an octabyte@>@;
else @<Fix a future reference from a relative address@>;
recycle_fixup(qq);
}
 
@ @<Fix a future reference from an octabyte@>=
{
if (qq->equiv.h&0xffffff) {
mmo_lop(lop_fixo,0,2);
mmo_tetra(qq->equiv.h);
}@+else mmo_lop(lop_fixo,qq->equiv.h>>24,1);
mmo_tetra(qq->equiv.l);
}
 
@ @<Fix a future reference from a relative address@>=
{
octa o;
o=ominus(cur_loc,qq->equiv);
if (o.l&3)
dderr("*relative address in location #%08x%08x not divisible by 4",
@.relative address...@>
qq->equiv.h,qq->equiv.l);
o=shift_right(o,2,0);@+
k=0;
if (o.h==0)
if (o.l<0x10000) mmo_lopp(lop_fixr,o.l);
else if (qq->serial==fix_xyz && o.l<0x1000000) {
mmo_lop(lop_fixrx,0,24);@+mmo_tetra(o.l);
}@+else k=1;
else if (o.h==0xffffffff)
if (qq->serial==fix_xyz && o.l>=0xff000000) {
mmo_lop(lop_fixrx,0,24);@+mmo_tetra(o.l&0x1ffffff);
}@+else if (qq->serial==fix_yz && o.l>=0xffff0000) {
mmo_lop(lop_fixrx,0,16);@+mmo_tetra(o.l&0x100ffff);
}@+else k=1;
else k=1;
if (k) dderr("relative address in location #%08x%08x is too far away",
qq->equiv.h,qq->equiv.l);
}
 
@ @<Make special listing to show the label equivalent@>=
if (new_link==DEFINED) {
fprintf(listing_file,"(%08x%08x)",cur_loc.h,cur_loc.l);
flush_listing_line(" ");
}@+else {
fprintf(listing_file,"($%03d)",cur_loc.l&0xff);
flush_listing_line(" ");
}
 
@ @<Do the operation@>=
future_bits=0;
if (op_bits&many_arg_bit) @<Do a many-operand operation@>@;
else@+switch (val_ptr) {
case 1:@+if (!(op_bits&one_arg_bit))
derr("opcode `%s' needs more than one operand",op_field);
@.opcode...operand(s)@>
@<Do a one-operand operation@>;
case 2:@+if (!(op_bits&two_arg_bit))
if (op_bits&one_arg_bit)
derr("opcode `%s' must not have two operands",op_field)@;
else derr("opcode `%s' must have more than two operands",op_field);
@<Do a two-operand operation@>;
case 3:@+if (!(op_bits&three_arg_bit))
derr("opcode `%s' must not have three operands",op_field);
@<Do a three-operand operation@>;
default: derr("too many operands for opcode `%s'",op_field);
@.too many operands...@>
}
 
@ The many-operand operators are |BYTE|, |WYDE|, |TETRA|, and |OCTA|.
 
@<Do a many-operand operation@>=
for (j=0;j<val_ptr;j++) {
@<Deal with cases where |val_stack[j]| is impure@>;
k=1<<(opcode-BYTE);
if ((val_stack[j].equiv.h && opcode<OCTA) ||@|
(val_stack[j].equiv.l>0xffff && opcode<TETRA) ||@|
(val_stack[j].equiv.l>0xff && opcode<WYDE))
if (k==1) err("*constant doesn't fit in one byte")@;
@.constant doesn't fit...@>
else derr("*constant doesn't fit in %d bytes",k);
if (k<8) assemble(k,val_stack[j].equiv.l,0);
else if (val_stack[j].status==undefined)
assemble(4,0,0xf0), assemble(4,0,0xf0);
else assemble(4,val_stack[j].equiv.h,0), assemble(4,val_stack[j].equiv.l,0);
}
 
@ @<Deal with cases where |val_stack[j]| is impure@>=
if (val_stack[j].status==reg_val)
err("*register number used as a constant")@;
@.register number...@>
else if (val_stack[j].status==undefined) {
if (opcode!=OCTA) err("undefined constant");
@.undefined constant@>
pp=val_stack[j].link->sym;
qq=new_sym_node(false);
qq->link=pp->link;
pp->link=qq;
qq->serial=fix_o;
qq->equiv=cur_loc;
}
 
@ @<Do a three-operand operation@>=
@<Do the Z field@>;
@<Do the Y field@>;
assemble_X: @<Do the X field@>;
assemble_inst: assemble(4,(opcode<<24)+xyz,future_bits);
break;
 
@ Individual fields of an instruction are placed into
global variables |z|, |y|, |x|, |yz|, and/or |xyz|.
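
How the pieces fit together into one tetrabyte can be sketched as follows.
The function |pack_xyz| is hypothetical and takes the fields as arguments
instead of using the global variables; it assumes that each field is reduced
to a single byte, as the range checks in the program arrange.

```c
#include <stdint.h>

/* Hypothetical helper: combine an opcode and the X, Y, Z bytes into
   the 32-bit instruction word (opcode<<24)+(x<<16)+(y<<8)+z. */
uint32_t pack_xyz(uint32_t opcode, uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t yz = ((y & 0xff) << 8) + (z & 0xff);
    uint32_t xyz = ((x & 0xff) << 16) + yz;
    return ((opcode & 0xff) << 24) + xyz;
}
```

For instance, opcode 0x20 with X=1, Y=2, Z=3 packs to 0x20010203.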
 
@<Glob...@>=
tetra z,y,x,yz,xyz; /* pieces for assembly */
int future_bits; /* places where there are future references */
 
@ @<Do the Z field@>=
if (val_stack[2].status==undefined) err("Z field is undefined");
@.Z field is undefined@>
if (val_stack[2].status==reg_val) {
if (!(op_bits&(immed_bit+zr_bit+zar_bit)))
derr("*Z field of `%s' should not be a register number",op_field);
@.Z field...register number@>
}@+ else if (op_bits&immed_bit) opcode++; /* immediate */
else if (op_bits&zr_bit)
derr("*Z field of `%s' should be a register number",op_field);
if (val_stack[2].equiv.h || val_stack[2].equiv.l>0xff)
err("*Z field doesn't fit in one byte");
@.Z field doesn't fit...@>
z=val_stack[2].equiv.l&0xff;
 
@ @<Do the Y field@>=
if (val_stack[1].status==undefined) err("Y field is undefined");
@.Y field is undefined@>
if (val_stack[1].status==reg_val) {
if (!(op_bits&(yr_bit+yar_bit)))
derr("*Y field of `%s' should not be a register number",op_field);
@.Y field...register number@>
}@+ else if (op_bits&yr_bit)
derr("*Y field of `%s' should be a register number",op_field);
if (val_stack[1].equiv.h || val_stack[1].equiv.l>0xff)
err("*Y field doesn't fit in one byte");
@.Y field doesn't fit...@>
y=val_stack[1].equiv.l&0xff;@+
yz=(y<<8)+z;
 
@ @<Do the X field@>=
if (val_stack[0].status==undefined) err("X field is undefined");
@.X field is undefined@>
if (val_stack[0].status==reg_val) {
if (!(op_bits&(xr_bit+xar_bit)))
derr("*X field of `%s' should not be a register number",op_field);
@.X field...register number@>
}@+ else if (op_bits&xr_bit)
derr("*X field of `%s' should be a register number",op_field);
if (val_stack[0].equiv.h || val_stack[0].equiv.l>0xff)
err("*X field doesn't fit in one byte");
@.X field doesn't fit...@>
x=val_stack[0].equiv.l&0xff;@+
xyz=(x<<16)+yz;
 
@ @<Do a two-operand operation@>=
if (val_stack[1].status==undefined) {
if (op_bits&rel_addr_bit)
@<Assemble YZ as a future reference and |goto assemble_X|@>@;
else err("YZ field is undefined");
@.YZ field is undefined@>
}@+else if (val_stack[1].status==reg_val) {
if (!(op_bits&(immed_bit+yzr_bit+yzar_bit)))
derr("*YZ field of `%s' should not be a register number",op_field);
@.YZ field...register number@>
if (opcode==SET) val_stack[1].equiv.l<<=8,opcode=0xc1; /* change to \.{OR} */
else if (op_bits&mem_bit)
val_stack[1].equiv.l<<=8,opcode++; /* silently append \.{,0} */
}@+ else { /* |val_stack[1].status==pure| */
if (op_bits&mem_bit)
@<Assemble YZ as a memory address and |goto assemble_X|@>;
if (opcode==SET) opcode=0xe3; /* change to \.{SETL} */
else if (op_bits&immed_bit) opcode++; /* immediate */
else if (op_bits&yzr_bit) {
derr("*YZ field of `%s' should be a register number",op_field);
}
if (op_bits&rel_addr_bit)
@<Assemble YZ as a relative address and |goto assemble_X|@>;
}
if (val_stack[1].equiv.h || val_stack[1].equiv.l>0xffff)
err("*YZ field doesn't fit in two bytes");
@.YZ field doesn't fit...@>
yz=val_stack[1].equiv.l&0xffff;
goto assemble_X;
 
@ @<Assemble YZ as a future reference...@>=
{
pp=val_stack[1].link->sym;
qq=new_sym_node(false);
qq->link=pp->link;
pp->link=qq;
qq->serial=fix_yz;
qq->equiv=cur_loc;
yz=0;
future_bits=0xc0;
goto assemble_X;
}
 
@ @<Assemble YZ as a relative address and |goto assemble_X|@>=
{
octa source, dest;
if (val_stack[1].equiv.l&3)
err("*relative address is not divisible by 4");
@.relative address...@>
source=shift_right(cur_loc,2,0);
dest=shift_right(val_stack[1].equiv,2,0);
acc=ominus(dest,source);
if (!(acc.h&0x80000000)) {
if (acc.l>0xffff || acc.h)
err("relative address is more than #ffff tetrabytes forward");
}@+else {
acc=incr(acc,0x10000);
opcode++;
if (acc.l>0xffff || acc.h)
err("relative address is more than #10000 tetrabytes backward");
}
yz=acc.l;
goto assemble_X;
}
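
The relative-address arithmetic above can be restated with ordinary 64-bit integers. The following is only a sketch, not part of MMIXAL: the name encode_yz and its return convention are hypothetical, and a misaligned target is treated here as a hard failure, whereas the assembler merely reports it and continues.

```c
#include <stdint.h>

/* Hypothetical helper (not in MMIXAL): encode a branch from cur_loc to dest
   as a 16-bit YZ field measured in tetrabytes.
   Returns 0 on success, -1 if dest is misaligned or out of range.
   *backward is set to 1 when the caller should use opcode+1. */
static int encode_yz(uint64_t cur_loc, uint64_t dest,
                     unsigned *yz, int *backward)
{
  int64_t delta;
  if (dest & 3) return -1;               /* not divisible by 4 */
  delta = (int64_t)(dest >> 2) - (int64_t)(cur_loc >> 2);
  if (delta >= 0) {
    if (delta > 0xffff) return -1;       /* too far forward */
    *backward = 0;
  } else {
    delta += 0x10000;                    /* bias backward offsets */
    if (delta < 0) return -1;            /* too far backward */
    *backward = 1;
  }
  *yz = (unsigned)delta;
  return 0;
}
```

For instance, a branch at #110 back to #100 is four tetrabytes backward, so the sketch yields YZ = #fffc with the backward flag set. The three-byte XYZ case differs only in using #ffffff and #1000000 as the limits.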
 
@ @<Assemble YZ as a memory address and |goto assemble_X|@>=
{
octa o;
o=val_stack[1].equiv, k=0;
for (j=greg;j<255;j++) if (greg_val[j].h || greg_val[j].l) {
acc=ominus(val_stack[1].equiv,greg_val[j]);
if (acc.h<=o.h && (acc.l<=o.l || acc.h<o.h)) o=acc, k=j;
}
if (o.l<=0xff && !o.h && k) yz=(k<<8)+o.l, opcode++;
else if (!expanding) err("no base address is close enough to the address A")@;
@.no base address...@>
else @<Assemble instructions to put supplementary data in \$255@>;
goto assemble_X;
}
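
The search for a base address can be sketched with plain 64-bit unsigned arithmetic in place of the octa routines. This is an illustration under stated assumptions: pick_base is a hypothetical name, the global register values are passed in as an array of uint64_t, and the function reports failure by returning 0 instead of falling back to expansion.

```c
#include <stdint.h>

/* Hypothetical helper (not in MMIXAL): among global registers greg..254
   that hold a nonzero value, find the one whose value lies closest at or
   below addr.  Returns the register number and stores the offset when the
   offset fits in one byte; returns 0 when no base address is close enough. */
static int pick_base(const uint64_t greg_val[256], int greg,
                     uint64_t addr, unsigned *offset)
{
  uint64_t best = addr;   /* distance from zero: the starting candidate */
  int k = 0, j;
  for (j = greg; j < 255; j++)
    if (greg_val[j]) {
      uint64_t d = addr - greg_val[j];   /* wraps modulo 2^64 if above addr */
      if (d <= best) best = d, k = j;
    }
  if (k && best <= 0xff) {
    *offset = (unsigned)best;
    return k;
  }
  return 0;
}
```

A register whose value exceeds addr produces a huge wrapped difference, so it never wins the comparison; that is how the unsigned test excludes negative offsets without a separate check.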
 
@ @d SETH 0xe0
@d ORH 0xe8
@d ORL 0xeb
 
@<Assemble instructions to put supplementary data in \$255@>=
{
for (j=SETH;j<=ORL;j++) {
switch (j&3) {
case 0: yz=o.h>>16;@+break; /* \.{SETH} */
case 1: yz=o.h&0xffff;@+break; /* \.{SETMH} or \.{ORMH} */
case 2: yz=o.l>>16;@+break; /* \.{SETML} or \.{ORML} */
case 3: yz=o.l&0xffff;@+break; /* \.{SETL} or \.{ORL} */
}
if (yz) {
assemble(4,(j<<24)+(255<<16)+yz,0);
j |= ORH;
}
}
if (k) yz=(k<<8)+255; /* Y = \$$k$, Z = \$255 */
else yz=255<<8, opcode++; /* Y = \$255, Z = 0 */
}
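
The loop that builds an octabyte in \$255 is compact enough to deserve a standalone restatement. The sketch below assumes a hypothetical function expand_into_255 that collects the instruction words in an array instead of calling assemble; the opcode values are the ones defined above.

```c
#include <stdint.h>

enum { SETH = 0xe0, ORH = 0xe8, ORL = 0xeb };

/* Hypothetical helper (not in MMIXAL): produce the instruction words that
   load `value` into $255, one per nonzero wyde.  The first word uses a
   SET-type opcode and the rest use OR-type opcodes.  Returns the count. */
static int expand_into_255(uint64_t value, uint32_t out[4])
{
  int n = 0, j;
  for (j = SETH; j <= ORL; j++) {
    unsigned shift = 48 - 16 * (j & 3);  /* H, MH, ML, L wydes in turn */
    uint32_t yz = (uint32_t)((value >> shift) & 0xffff);
    if (yz) {
      out[n++] = ((uint32_t)j << 24) + (255u << 16) + yz;
      j |= ORH;                          /* later wydes must be ORed in */
    }
  }
  return n;
}
```

Setting the ORH bits in j after the first emitted word is what switches the remaining iterations from the SET family to the OR family, while the low two bits of j still select the wyde.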
 
@ @<Do a one-operand operation@>=
if (val_stack[0].status==undefined) {
if (op_bits&rel_addr_bit)
@<Assemble XYZ as a future reference and |goto assemble_inst|@>@;
else if (opcode!=PREFIX) err("the operand is undefined");
@.the operand is undefined@>
}@+else if (val_stack[0].status==reg_val) {
if (!(op_bits&(xyzr_bit+xyzar_bit)))
derr("*operand of `%s' should not be a register number",op_field);
@.operand...register number@>
}@+ else { /* |val_stack[0].status==pure| */
if (op_bits&xyzr_bit)
derr("*operand of `%s' should be a register number",op_field);
if (op_bits&rel_addr_bit)
@<Assemble XYZ as a relative address and |goto assemble_inst|@>;
}
if (opcode>0xff) @<Do a pseudo-operation and |goto bypass|@>;
if (val_stack[0].equiv.h || val_stack[0].equiv.l>0xffffff)
err("*XYZ field doesn't fit in three bytes");
@.XYZ field doesn't fit...@>
xyz=val_stack[0].equiv.l&0xffffff;
goto assemble_inst;
 
@ @<Assemble XYZ as a future reference...@>=
{
pp=val_stack[0].link->sym;
qq=new_sym_node(false);
qq->link=pp->link;
pp->link=qq;
qq->serial=fix_xyz;
qq->equiv=cur_loc;
xyz=0;
future_bits=0xe0;
goto assemble_inst;
}
 
@ @<Assemble XYZ as a relative address...@>=
{
octa source, dest;
if (val_stack[0].equiv.l&3)
err("*relative address is not divisible by 4");
@.relative address...@>
source=shift_right(cur_loc,2,0);
dest=shift_right(val_stack[0].equiv,2,0);
acc=ominus(dest,source);
if (!(acc.h&0x80000000)) {
if (acc.l>0xffffff || acc.h)
err("relative address is more than #ffffff tetrabytes forward");
}@+else {
acc=incr(acc,0x1000000);
opcode++;
if (acc.l>0xffffff || acc.h)
err("relative address is more than #1000000 tetrabytes backward");
}
xyz=acc.l;
goto assemble_inst;
}
 
@ @<Do a pseudo-operation...@>=
switch(opcode) {
case LOC: cur_loc=val_stack[0].equiv;
case IS: goto bypass;
case PREFIX:@+if (!val_stack[0].link) err("not a valid prefix");
@.not a valid prefix@>
cur_prefix=val_stack[0].link;@+goto bypass;
case GREG:@+if (listing_file) @<Make listing for |GREG|@>;
goto bypass;
case LOCAL:@+if (val_stack[0].equiv.l>lreg) lreg=val_stack[0].equiv.l;
if (listing_file) {
fprintf(listing_file,"($%03d)",val_stack[0].equiv.l);
flush_listing_line(" ");
}
goto bypass;
case BSPEC:@+if (val_stack[0].equiv.l>0xffff || val_stack[0].equiv.h)
err("*operand of `BSPEC' doesn't fit in two bytes");
@.operand of `BSPEC'...@>
mmo_loc();@+mmo_sync();
mmo_lopp(lop_spec,val_stack[0].equiv.l);
spec_mode=true;@+spec_mode_loc=0;@+ goto bypass;
case ESPEC: spec_mode=false;@+goto bypass;
}
 
@ @<Glob...@>=
octa greg_val[256]; /* initial values of global registers */
 
@ @<Make listing for |GREG|@>=
if (val_stack[0].equiv.l || val_stack[0].equiv.h) {
fprintf(listing_file,"($%03d=#%08x",cur_greg,val_stack[0].equiv.h);
flush_listing_line(" ");
fprintf(listing_file," %08x)",val_stack[0].equiv.l);
flush_listing_line(" ");
}@+else {
fprintf(listing_file,"($%03d)",cur_greg);
flush_listing_line(" ");
}
 
@* Running the program. On a \UNIX/-like system, the command
$$\.{mmixal [options] sourcefilename}$$
will assemble the \MMIXAL\ program in file \.{sourcefilename},
writing any error messages on the standard error file. (Nothing is written to
the standard output.) The options, which may appear in any order, are:
 
\bull\.{-o objectfilename}\quad Send the output to a binary file called
\.{objectfilename}.
If no \.{-o} specification is given, the object file name is obtained from the
input file name by changing the final letter from `\.s' to~`\.o', or by
appending `\.{.mmo}' if \.{sourcefilename} doesn't end with~\.s.
 
\bull\.{-l listingname}\quad Output a listing of the assembled input and
output to a text file called \.{listingname}.
 
\bull\.{-x}\quad Expand memory-oriented commands that cannot be assembled
as single instructions, by assembling auxiliary instructions that make
temporary use of global register~\$255.
 
\bull\.{-b bufsize}\quad Allow up to \.{bufsize} characters per line of input.
 
@ Here, finally, is the overall structure of this program.
 
@c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
@#
@<Preprocessor definitions@>@;
@<Type definitions@>@;
@<Global variables@>@;
@<Subroutines@>@;
@#
int main(argc,argv)
int argc;@+
char *argv[];
{
register int j,k; /* all-purpose integers */
@<Local variables@>;
@<Process the command line@>;
@<Initialize everything@>;
while(1) {
@<Get the next line of input text, or |break| if the input has ended@>;
while(1) {
@<Process the next \MMIXAL\ instruction or comment@>;
if (!*buf_ptr) break;
}
if (listing_file) {
if (listing_bits) listing_clear();
else if (!line_listed) flush_listing_line(" ");
}
}
@<Finish the assembly@>;
}
 
@ The space after |"-b"| is optional, because
{\mc MMIX-SIM} does not use a space in this context.
 
@<Process the command line@>=
for (j=1;j<argc-1 && argv[j][0]=='-';j++) if (!argv[j][2]) {
if (argv[j][1]=='x') expanding=1;
else if (argv[j][1]=='o') j++,strcpy(obj_file_name,argv[j]);
else if (argv[j][1]=='l') j++,strcpy(listing_name,argv[j]);
else if (argv[j][1]=='b' && sscanf(argv[j+1],"%d",&buf_size)==1) j++;
else break;
}@+else if (argv[j][1]!='b' || sscanf(argv[j]+2,"%d",&buf_size)!=1) break;
if (j!=argc-1) {
fprintf(stderr,"Usage: %s %s sourcefilename\n",
@.Usage: ...@>
argv[0],"[-x] [-l listingname] [-b buffersize] [-o objectfilename]");
exit(-1);
}
src_file_name=argv[j];
 
@ @<Open the files@>=
src_file=fopen(src_file_name,"r");
if (!src_file) dpanic("Can't open the source file %s",src_file_name);
@.Can't open...@>
if (!obj_file_name[0]) {
j=strlen(src_file_name);
if (src_file_name[j-1]=='s') {
strcpy(obj_file_name,src_file_name);@+ obj_file_name[j-1]='o';
} else sprintf(obj_file_name,"%s.mmo",src_file_name);
}
obj_file=fopen(obj_file_name,"wb");
if (!obj_file) dpanic("Can't open the object file %s",obj_file_name);
if (listing_name[0]) {
listing_file=fopen(listing_name,"w");
if (!listing_file) dpanic("Can't open the listing file %s",listing_name);
}
 
@ @<Glob...@>=
char *src_file_name; /* name of the \MMIXAL\ input file */
char obj_file_name[FILENAME_MAX+1]; /* name of the binary output file */
char listing_name[FILENAME_MAX+1]; /* name of the optional listing file */
FILE *src_file, *obj_file, *listing_file;
int expanding; /* are we expanding instructions when base addresses fail? */
int buf_size; /* maximum number of characters per line of input */
 
@ @<Init...@>=
@<Open the files@>;
filename[0]=src_file_name;
filename_count=1;
@<Output the preamble@>;
 
@ @<Output the preamble@>=
mmo_lop(lop_pre,1,1);
mmo_tetra(time(NULL));
mmo_cur_file=-1;
 
@ @<Finish the assembly@>=
if (lreg>=greg)
dpanic("Danger: Must reduce the number of GREGs by %d",lreg-greg+1);
@.Danger@>
@<Output the postamble@>;
@<Check and output the trie@>;
@<Report any undefined local symbols@>;
if (err_count) {
if (err_count>1) fprintf(stderr,"(%d errors were found.)\n",err_count);
else fprintf(stderr,"(One error was found.)\n");
}
exit(err_count);
 
@ @<Glob...@>=
int greg=255; /* global register allocator */
int cur_greg; /* global register just allocated */
int lreg=32; /* local register allocator */
 
@ @<Output the postamble@>=
mmo_lop(lop_post,0,greg);
greg_val[255]=trie_search(trie_root,"Main")->sym->equiv;
for (j=greg;j<256;j++) {
mmo_tetra(greg_val[j].h);
mmo_tetra(greg_val[j].l);
}
 
@ @<Report any undefined local symbols@>=
for (j=0;j<10;j++) if (forward_local[j].link)
err_count++,fprintf(stderr,"undefined local symbol %dF\n",j);
@.undefined local symbol@>
 
@* Index.
 
/permu-heap.mms
0,0 → 1,75
* Permutation generator a la Heap
N IS 5 $n$ (3, 4, 5, or 6)
t IS $255
j IS $0 $8j$
k IS $1 $8k$
ak IS $2
aj IS $3
 
LOC Data_Segment
a GREG @ Base address for $a_0\ldots a_{n-1}$
A0 IS @
A1 IS @+8
A2 IS @+16
* LOC @+8*N Space for $a_0\ldots a_{n-1}$
BYTE "11111111","22222222","33333333"
BYTE "44444444","55555555","66666666"
BYTE #a,0
LOC (@+7)&-8 (align to octabyte)
c GREG @-8*3 Location of $c_0$
LOC @-8*3+8*N $8c_3\ldots 8c_{n-1}$, initially zero
OCTA -1 $c_n=-1$, a convenient sentinel
u GREG 0 Contents of $a_0$, except in inner loop
v GREG 0 Contents of $a_1$, except in inner loop
w GREG 0 Contents of $a_2$, except in inner loop
 
LOC #100
1H STCO 0,c,k $c_k\gets 0$.
INCL k,8 $k\gets k+1$.
0H LDO j,c,k $j\gets c_k$.
CMP t,j,k
BZ t,1B Loop if $c_k=k$.
BN j,Done Terminate if $c_k<0$ ($k=n$).
LDO ak,a,k Fetch $a_k$.
ADD t,j,8
STO t,c,k $c_k\gets j+1$.
AND t,k,#8
CSZ j,t,0 Set $j\gets 0$ if $k$ is even.
LDO aj,a,j Fetch $a_j$.
STO ak,a,j Replace it by $a_k$.
CSZ u,j,ak Set $u\gets a_k$ if $j=0$.
SUB j,j,8 $j\gets j-1$.
CSZ v,j,ak Set $v\gets a_k$ if $j=0$.
SUB j,j,8 $j\gets j-1$.
CSZ w,j,ak Set $w\gets a_k$ if $j=0$.
STO aj,a,k Replace $a_k$ by what was $a_j$.
In PUSHJ 0,Visit
STO v,A0 $a_0\gets v$.
STO u,A1 $a_1\gets u$.
PUSHJ 0,Visit
STO w,A0 $a_0\gets w$.
STO v,A2 $a_2\gets v$.
PUSHJ 0,Visit
STO u,A0 $a_0\gets u$.
STO w,A1 $a_1\gets w$.
PUSHJ 0,Visit
STO v,A0 $a_0\gets v$.
STO u,A2 $a_2\gets u$.
PUSHJ 0,Visit
STO w,A0 $a_0\gets w$.
STO v,A1 $a_1\gets v$.
PUSHJ 0,Visit
SET t,u Swap $u\leftrightarrow w$.
SET u,w
SET w,t
SET k,8*3 $k\gets3$.
JMP 0B
 
Visit LDA t,A0
TRAP 0,Fputs,StdOut
POP
Main LDO u,A0
LDO v,A1
LDO w,A2
JMP In
Done TRAP 0,Halt,0
/hello.mms
0,0 → 1,9
argv IS $1
LOC #100
Main LDOU $255,argv,0
TRAP 0,Fputs,StdOut
GETA $255,String
TRAP 0,Fputs,StdOut
TRAP 0,Halt,0
String BYTE ", world",#a,0
 
/popup.mms
0,0 → 1,23
* Testing the solution to exercise 1.4.1--16
LOC #100
B GET $2,rJ
PUSHJ $3,C
PUT rJ,$2; POP 2,0
SET $1,1
SET $0,$3
PUT rJ,$2; POP 2,0
 
C BZ $0,1F
CMP $2,$0,5
PBNZ $2,2F
POP 1,0
2H GET $1,rJ
SUB $3,$0,1
PUSHJ $2,C
PUT rJ,$1; POP 1,0
ADD $0,$2,2
PUT rJ,$1
1H POP 1,2
 
Main SET $5,2 manually change this to 5 or 6 or ...
PUSHJ $0,B
/test1.mmconfig
0,0 → 1,36
% FIRST CONFIGURATION TEST, goes with test1.mmix
% The following erroneous lines were commented out one by one while testing:
%sh*t % obscene
%memaddresstime 0 % too small
%memaddresstime unit % unreadable
%branchpredictbits 9 % too large
%membusbytes 9 % not a power of two
%ITcache unit % unknown cache parameter
%mul0 0 % too small
%mul0 256 % too big
%unit antidisestablishmentarianism % too long
%unit 0 0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEG % eh?
%unit 1 0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEFG % 65
%unit 2 0000000000000000000000000000000000000000000000000000000000000000 % 0's
%Dcache blocksize 1024 % exceeds Scache
%Dcache granularity 16 % exceeds blocksize
%Scache granularity 16 % differs from Dcache
memaddresstime 4
memreadtime 5 memwritetime 6 % don't ask why
membusbytes 16
branchpredictbits 2
branchaddressbits 1
branchhistorybits 1
branchdualbits 1
%branchdualbits 30
memchunksmax 2
hashprime 3
Scache blocksize 32
Scache setsize 2
Scache associativity 4 lru
Scache accesstime 2
Icache victimsize 2
unit UNI1 ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
unit UNI2 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
sh 1 1 1
disablesecurity 1
/traffic.mms
0,0 → 1,97
* Traffic Signal Problem
rate GREG 100 % ridiculously small, for testing (shd be 250MHz)
t IS $255
Sensor_Buf IS Data_Segment
GREG Sensor_Buf
 
LOC #100
Lights IS 3
Sensor IS 4
%Lights_Name BYTE "/dev/lights",0
%Sensor_Name BYTE "/dev/sensor",0
Lights_Name BYTE "lights",0 (temporary name)
Sensor_Name BYTE "sensor",0 (temporary name)
Lights_Args OCTA Lights_Name,BinaryWrite
Sensor_Args OCTA Sensor_Name,BinaryRead
Read_Sensor OCTA Sensor_Buf,1
Boulevard BYTE #77,0 DelMar green, WALK; Berkly red, DONT
BYTE #7f,0 DelMar green, DONT; Berkly red, DONT
BYTE #73,0 DelMar green, off; Berkly red, DONT
BYTE #bf,0 DelMar amber, DONT; Berkly red, DONT
Avenue BYTE #dd,0 DelMar red, DONT; Berkly green, WALK
BYTE #df,0 DelMar red, DONT; Berkly green, DONT
BYTE #dc,0 DelMar red, DONT; Berkly green, off
BYTE #ef,0 DelMar red, DONT; Berkly amber, DONT
 
goal GREG % transition time for lights
Main GETA t,Lights_Args
TRAP 0,Fopen,Lights
GETA t,Sensor_Args
TRAP 0,Fopen,Sensor
GET goal,rC
ANDNMH goal,#ffff % temporary patch
JMP 2F
 
GREG @
delay_go GREG
Delay GET t,rC
ANDNMH t,#ffff % temporary patch
SUBU t,t,goal NB: not CMPU
PBN t,Delay
GO delay_go,delay_go,0
 
flash_go GREG
n GREG
green GREG
temp GREG
Flash SET n,8
1H ADD t,green,2*1
TRAP 0,Fputs,Lights DONT WALK
ADD temp,goal,rate
SR t,rate,1
ADDU goal,goal,t
GO delay_go,Delay
ADD t,green,2*2
TRAP 0,Fputs,Lights off
SET goal,temp
GO delay_go,Delay
SUB n,n,1
PBP n,1B
ADD t,green,2*1
TRAP 0,Fputs,Lights DONT WALK
MUL t,rate,4
ADDU goal,goal,t
GO delay_go,Delay
ADD t,green,2*3
TRAP 0,Fputs,Lights DONT WALK, amber
GO flash_go,flash_go,0
 
Wait GET goal,rC
ANDNMH goal,#ffff % temporary patch
1H GETA t,Read_Sensor
TRAP 0,Fread,Sensor
LDB t,Sensor_Buf
BZ t,Wait
GETA green,Boulevard
GO flash_go,Flash
MUL t,rate,8
ADDU goal,goal,t
GO delay_go,Delay
GETA t,Avenue
TRAP 0,Fputs,Lights
MUL t,rate,8
ADDU goal,goal,t
GO delay_go,Delay
GETA green,Avenue
GO flash_go,Flash
GETA t,Read_Sensor
TRAP 0,Fread,Sensor % clear redundant signal
MUL t,rate,5
ADDU goal,goal,t
GO delay_go,Delay
2H GETA t,Boulevard
TRAP 0,Fputs,Lights
MUL t,rate,18
ADDU goal,goal,t
GO delay_go,Delay
JMP 1B
/zero.mms
0,0 → 1,53
LOC #100
a IS $0
n IS $1
z IS $2
t IS $255
 
1H STB z,a,0
SUB n,n,1
ADD a,a,1
Zero BZ n,9F
SET z,0
AND t,a,7
BNZ t,1B
CMP t,n,64
PBNN t,3F
JMP 5F
2H STCO 0,a,0
SUB n,n,8
ADD a,a,8
3H AND t,a,63
PBNZ t,2B
CMP t,n,64
BN t,5F
4H PREST 63,a,0
SUB n,n,64
CMP t,n,64
STCO 0,a,0
STCO 0,a,8
STCO 0,a,16
STCO 0,a,24
STCO 0,a,32
STCO 0,a,40
STCO 0,a,48
STCO 0,a,56
ADD a,a,64
PBNN t,4B
5H CMP t,n,8
BN t,7F
6H STCO 0,a,0
SUB n,n,8
ADD a,a,8
CMP t,n,8
PBNN t,6B
7H BZ n,9F
8H STB z,a,0
SUB n,n,1
ADD a,a,1
PBNZ n,8B
9H POP
 
Main SET a+1,#fff7
SET n+1,146
PUSHJ 0,Zero
/Makefile
0,0 → 1,97
#
# Makefile for MMIXware
#
 
# Be sure that CWEB version 3.0 or greater is installed before proceeding!
# In fact, CWEB 3.61 is recommended for making hardcopy or PDF documentation.
 
# If you prefer optimization to debugging, change -g to something like -O:
CFLAGS = -g
 
# Uncomment the second line if you use pdftex to bypass .dvi files:
PDFTEX = dvipdfm
#PDFTEX = pdftex
 
.SUFFIXES: .dvi .tex .w .ps .pdf .mmo .mmb .mms
 
.tex.dvi:
tex $*.tex
 
.dvi.ps:
dvips $* -o $*.ps
 
.w.c:
if test -r $*.ch; then ctangle $*.w $*.ch; else ctangle $*.w; fi
 
.w.tex:
if test -r $*.ch; then cweave $*.w $*.ch; else cweave $*.w; fi
 
.w.o:
make $*.c
make $*.o
 
.w:
make $*.c
make $*
 
.w.dvi:
make $*.tex
make $*.dvi
 
.w.ps:
make $*.dvi
make $*.ps
 
.w.pdf:
make $*.tex
case "$(PDFTEX)" in \
dvipdfm ) tex "\let\pdf+ \input $*"; dvipdfm $* ;; \
pdftex ) pdftex $* ;; \
esac
 
.mmo.mmb:
mmix -D$*.mmb $*.mmo
 
.mms.mmo:
mmixal -x -b 250 -l $*.mml $*.mms
 
WEBFILES = abstime.w boilerplate.w mmix-arith.w mmix-config.w mmix-doc.w \
mmix-io.w mmix-mem.w mmix-pipe.w mmix-sim.w mmixal.w mmmix.w mmotype.w
CHANGEFILES =
TESTFILES = *.mms silly.run silly.out *.mmconfig *.mmix
MISCFILES = Makefile makefile.dos README mmix.mp mmix.1
ALL = $(WEBFILES) $(TESTFILES) $(MISCFILES)
 
basic: mmixal mmix
 
doc: mmix-doc.ps mmixal.dvi mmix-sim.dvi
dvips -n13 mmixal.dvi -o mmixal-intro.ps
dvips -n8 mmix-sim.dvi -o mmix-sim-intro.ps
 
all: mmixal mmix mmotype mmmix
 
clean:
rm -f *~ *.o *.c *.h *.tex *.log *.dvi *.toc *.idx *.scn *.ps core
 
mmix-pipe.o: mmix-pipe.c abstime
./abstime > abstime.h
$(CC) $(CFLAGS) -c mmix-pipe.c
rm abstime.h
 
mmix-config.o: mmix-pipe.o
 
mmmix: mmix-arith.o mmix-pipe.o mmix-config.o mmix-mem.o mmix-io.o mmmix.c
$(CC) $(CFLAGS) mmmix.c \
mmix-arith.o mmix-pipe.o mmix-config.o mmix-mem.o mmix-io.o -o mmmix
 
mmixal: mmix-arith.o mmixal.c
$(CC) $(CFLAGS) mmixal.c mmix-arith.o -o mmixal
 
mmix: mmix-arith.o mmix-io.o mmix-sim.c abstime
./abstime > abstime.h
$(CC) $(CFLAGS) mmix-sim.c mmix-arith.o mmix-io.o -o mmix
rm abstime.h
 
tarfile: $(ALL)
tar cvf /tmp/mmix.tar $(ALL)
gzip -9 /tmp/mmix.tar
/strcpy.mms
0,0 → 1,98
in IS $2
out IS $3
r IS $4
l IS $5
m IS $6
t IS $7
mm IS $8
tt IS $9
flip GREG #0102040810204080
ones GREG #0101010101010101
 
LOC #100
StrCpy AND in,$0,#7
SLU in,in,3
AND out,$1,#7
SLU out,out,3
SUB r,out,in
LDOU out,$1,0
SUB $1,$1,$0
NEG m,0,1
SRU m,m,in
LDOU in,$0,0
PUT rM,m
NEG mm,0,1
BN r,1F
NEG l,64,r
SLU tt,out,r
MUX in,in,tt
BDIF t,ones,in
AND t,t,m
SRU mm,mm,r
PUT rM,mm
JMP 4F
1H NEG l,0,r
INCL r,64
SUB $1,$1,8
SRU out,out,l
MUX in,in,out
BDIF t,ones,in
AND t,t,m
SRU mm,mm,r
PUT rM,mm
PBZ t,2F
JMP 5F
3H MUX out,tt,out
STOU out,$0,$1
2H SLU out,in,l
LDOU in,$0,8
INCL $0,8
BDIF t,ones,in
4H SRU tt,in,r
PBZ t,3B
SRU mm,t,r
MUX out,tt,out
BNZ mm,1F
STOU out,$0,$1
5H INCL $0,8
SLU out,in,l
SLU mm,t,l
1H LDOU in,$0,$1
MOR mm,mm,flip
SUBU t,mm,1
ANDN mm,mm,t
MOR mm,mm,flip
SUBU mm,mm,1
PUT rM,mm
MUX in,in,out
STOU in,$0,$1
POP 0
 
Main SET $3,#8001
0H SET $0,0
SET $1,#aa
1H STB $1,$0,0
INCL $1,#11
CMP $2,$1,#dd
CSZ $1,$2,#aa
INCL $0,1
CMP $6,$0,32
PBNZ $6,1B
SET $0,$3
ADD $2,$3,$5
SET $1,3
JMP 2F
1H STB $1,$0,0
SUB $1,$1,1
CSZ $1,$1,3
INCL $0,1
2H CMP $6,$0,$2
PBNZ $6,1B
SET $1,0
STB $1,$0,0
PUSHJ 2,StrCpy
SET $6,0
JMP 0B
% put src address in $3
% put dest addr in $4
% put string length in $5
/primes6.mms
0,0 → 1,63
% Example program ... Table of primes
L IS 600 The number of primes to find
t IS $255 Temporary storage
n GREG
q GREG
r GREG
jj GREG
kk GREG
pk GREG
mm IS kk
 
LOC Data_Segment
PRIME1 WYDE 2
LOC PRIME1+2*L
ptop GREG @
j0 GREG PRIME1+2-@
BUF OCTA
 
LOC #100
Main SET n,3
SET jj,j0
2H STWU n,ptop,jj
INCL jj,2
3H BZ jj,2F
4H INCL n,2
5H SET kk,j0
6H LDWU pk,ptop,kk
DIV q,n,pk
GET r,rR
BZ r,4B
7H CMP t,q,pk
BNP t,2B
8H INCL kk,2
JMP 6B
GREG @
Title BYTE "First Six Hundred Primes"
NewLn BYTE #a,0
Blanks BYTE " ",0
2H LDA t,Title
TRAP 0,Fputs,StdOut
NEG mm,2
3H ADD mm,mm,j0
LDA t,Blanks
TRAP 0,Fputs,StdOut
2H LDWU pk,ptop,mm
0H GREG #2030303030000000
STOU 0B,BUF
LDA t,BUF+4
1H DIV pk,pk,10
GET r,rR
INCL r,'0'
STBU r,t,0
SUB t,t,1
PBNZ pk,1B
LDA t,BUF
TRAP 0,Fputs,StdOut
INCL mm,2*L/10
PBN mm,2B
LDA t,NewLn
TRAP 0,Fputs,StdOut
CMP t,mm,2*(L/10-1)
PBNZ t,3B
TRAP 0,Halt,0
/alpha.mms
0,0 → 1,89
* The "alpha channel" exercise in section 7.1.3
x GREG
y GREG
z GREG
m GREG
alpha GREG
t IS $255
l GREG #0101010101010101
h GREG #8080808080808080
mone GREG -1
rodd GREG #4020100804020101
lsh GREG #0080402010080402
 
LOC #100
Main XOR t,x,y
MOR z,rodd,t
AND t,x,y
ADDU z,z,t
AND t,alpha,h
MOR m,mone,t
PUT rM,m
MUX x,z,x
MUX y,y,z
MOR alpha,lsh,alpha
XOR t,x,y
MOR z,t,rodd
AND t,x,y
ADDU z,z,t
AND t,alpha,h
MOR m,t,mone
PUT rM,m
MUX x,z,x
MUX y,y,z
MOR alpha,alpha,lsh
XOR t,x,y
MOR z,t,rodd
AND t,x,y
ADDU z,z,t
AND t,alpha,h
MOR m,t,mone
PUT rM,m
MUX x,z,x
MUX y,y,z
MOR alpha,alpha,lsh
XOR t,x,y
MOR z,t,rodd
AND t,x,y
ADDU z,z,t
AND t,alpha,h
MOR m,t,mone
PUT rM,m
MUX x,z,x
MUX y,y,z
MOR alpha,alpha,lsh
XOR t,x,y
MOR z,t,rodd
AND t,x,y
ADDU z,z,t
AND t,alpha,h
MOR m,t,mone
PUT rM,m
MUX x,z,x
MUX y,y,z
MOR alpha,alpha,lsh
XOR t,x,y
MOR z,t,rodd
AND t,x,y
ADDU z,z,t
AND t,alpha,h
MOR m,t,mone
PUT rM,m
MUX x,z,x
MUX y,y,z
MOR alpha,alpha,lsh
XOR t,x,y
MOR z,t,rodd
AND t,x,y
ADDU z,z,t
AND t,alpha,h
MOR m,t,mone
PUT rM,m
MUX x,z,x
MUX y,y,z
MOR alpha,alpha,lsh
XOR t,x,y
MOR z,t,rodd
AND t,x,y
ADDU z,z,t
TRAP 0,Halt,0
/mmix-config.w
0,0 → 1,1041
% This file is part of the MMIXware package (c) Donald E Knuth 1999
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
 
\def\title{MMIX-CONFIG}
\def\MMIX{\.{MMIX}}
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
@s bool int
@s cache int
@s func int
@s coroutine int
@s octa int
@s cacheset int
@s cacheblock int
@s fetch int
@s control int
@s write_node int
@s internal_opcode int
@s replace_policy int
@s PV TeX
@s mmix_opcode int
@s specnode int
\def\PV{\\{PV}} % use italics, not \tt
@s CPV TeX
\def\CPV{\\{CPV}}
@s OP TeX
\def\OP{\\{OP}}
@s and normal @q unreserve a C++ keyword @>
@s or normal @q unreserve a C++ keyword @>
@s xor normal @q unreserve a C++ keyword @>
 
@*Input format. Configuration files allow this simulator to adapt itself to
infinitely many possible combinations of hardware features. The purpose of the
present module is to read a configuration file, check it for validity, and
set up the relevant data structures.
 
All data in a configuration file consists simply of {\it tokens\/} separated
by one or more units of white space, where a ``token'' is any sequence of
nonspace characters that doesn't contain a percent sign. Percent signs
and anything following them on a line are ignored; this convention allows
a user to include comments in the file. Here's a simple (but weird) example:
$$\vbox{\halign{\tt#\hfil\cr
\% Silly configuration\cr
writebuffer 200\cr
memaddresstime 100\cr
Dcache associativity 4 lru\cr
Dcache blocksize 1024\cr
unit ODD 5555555555555555555555555555555555555555555555555555555555555555\cr
unit EVEN aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\cr
div 40 30 20\ \ \% three-stage divide\cr
}}$$
It means that (1) the write buffer has capacity for 200 octabytes;
(2)~the memory bus takes 100 cycles to process an address;
(3)~there's a D-cache, in which each set has 4 blocks and the replacement
policy is least-recently-used;
(4)~each block in the D-cache has 1024 bytes;
(5)~there are two functional units, one for all the odd-numbered opcodes
and one for all the rest;
(6)~the division instructions take three pipeline stages, spending 40 cycles
in the first stage, 30~in the second, and 20 in the last;
(7)~all other parameters have default values.
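
The token rule just described can be sketched in a few lines of C. This is not the scanner used by the simulator: next_token is a hypothetical name, the input is assumed to be a null-terminated string and not a file, the capacity is assumed to be at least 1, and overlong tokens are silently truncated here where a real reader would presumably report them.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not in MMIX-CONFIG): copy the next token of *p into
   tok (capacity cap >= 1) and advance *p past it.  White space separates
   tokens; a percent sign starts a comment that runs to the end of the line.
   Returns 1 if a token was found, 0 at the end of the input. */
static int next_token(const char **p, char *tok, size_t cap)
{
  const char *s = *p;
  size_t n = 0;
  for (;;) {
    while (isspace((unsigned char)*s)) s++;
    if (*s != '%') break;
    while (*s && *s != '\n') s++;        /* discard the comment */
  }
  if (!*s) {
    *p = s;
    return 0;
  }
  while (*s && !isspace((unsigned char)*s) && *s != '%') {
    if (n + 1 < cap) tok[n++] = *s;      /* truncate instead of overflowing */
    s++;
  }
  tok[n] = '\0';
  *p = s;
  return 1;
}
```

Applied to the example above, successive calls would deliver writebuffer, 200, memaddresstime, 100, and so on, with both comments skipped.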
 
@ Four kinds of specifications can appear in a configuration file,
according to the following syntax:
\def\<#1>{\hbox{$\langle\,$#1$\,\rangle$}}\let\is=\longrightarrow
$$\vbox{\halign{$#$\hfil\cr
\<specification>\is\<PV spec>\mid\<cache spec>\mid\<pipe spec>\mid
\<functional spec>\cr
\<PV spec>\is\<parameter>\<decimal value>\cr
\<cache spec>\is\<cache name>\<cache parameter>\<decimal value>\<policy>\cr
\<pipe spec>\is\<operation>\<pipeline times>\cr
\<functional spec>\is\.{unit}\ \<name>\<64 hexadecimal digits>\cr}}$$
 
@ A \<PV spec> simply assigns a given value to a given parameter. The
possibilities for \<parameter> are as follows:
 
\def\bull#1 {\smallskip\hang\textindent{$\bullet$}\.{#1}\enspace}
\bull fetchbuffer (default 4), maximum instructions in the fetch buffer;
must be $\ge1$.
 
\bull writebuffer (default 2), maximum octabytes in the write buffer;
must be $\ge1$.
 
\bull reorderbuffer (default 5), maximum instructions issued but not
committed; must be $\ge1$.
 
\bull renameregs (default 5), maximum partial results in the reorder
buffer; must be $\ge1$.
 
\bull memslots (default 2), maximum store instructions in the reorder
buffer; must be $\ge1$.
 
\bull localregs (default 256), number of local registers in ring;
must be 256, 512, or 1024.
 
\bull fetchmax (default 2), maximum instructions fetched per cycle;
must be $\ge1$.
 
\bull dispatchmax (default 1), maximum instructions issued per cycle;
must be $\ge1$.
 
\bull peekahead (default 1), maximum lookahead for jumps per cycle.
 
\bull commitmax (default 1), maximum instructions committed per cycle;
must be $\ge1$.
 
\bull fremmax (default 1), maximum reductions in \.{FREM} computation per
cycle; must be $\ge1$.
 
\bull denin (default 1), extra cycles taken if a floating point input
is subnormal.
 
\bull denout (default 1), extra cycles taken if a floating point result
is subnormal.
 
\bull writeholdingtime (default 0), minimum number of cycles for data to
remain in the write buffer.
 
\bull memaddresstime (default 20), cycles to process memory address;
must be $\ge1$.
 
\bull memreadtime (default 20), cycles to read one memory busload;
must be $\ge1$.
 
\bull memwritetime (default 20), cycles to write one memory busload;
must be $\ge1$.
 
\bull membusbytes (default 8), number of bytes per memory busload; must be a
power of~2 that is 8~or~more.
 
\bull branchpredictbits (default 0), number of bits in each branch prediction
table entry; must be $\le8$.
 
\bull branchaddressbits (default 0), number of bits in instruction address
used to index the branch prediction table.
 
\bull branchhistorybits (default 0), number of bits in branch history used to
index the branch prediction table.
 
\bull branchdualbits (default 0), number of bits of
instruction-address-xor-branch-history used to index the branch prediction
table.
 
\bull hardwarepagetable (default 1), is zero if page table calculations
must be emulated by the operating system.
 
\bull disablesecurity (default 0), is 1 if the hot-seat security checks
are turned off. This option is used only for testing purposes; it means
that the `\.s' interrupt will not occur, and the `\.p' interrupt will
be signaled only when going from a nonnegative location to a negative one.
 
\bull memchunksmax (default 1000), maximum number of $2^{16}$-byte chunks of
simulated memory; must be $\ge1$.
 
\bull hashprime (default 2003), prime number used to address simulated memory;
must exceed \.{memchunksmax}, preferably by a factor of about~2.
 
\smallskip\noindent
The values of \.{memchunksmax} and \.{hashprime} affect only the speed of the
simulator, not its results---unless a very huge program is being simulated.
The stated defaults for \.{memchunksmax} and \.{hashprime}
should be adequate for almost all applications.
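
Several of the constraints listed above are simple arithmetic tests. The following sketch shows three of them; the function names are hypothetical, and the check on hashprime verifies only that it exceeds memchunksmax, not that it is prime.

```c
/* Hypothetical validity checks (not in MMIX-CONFIG) for a few parameters. */

static int is_power_of_two(unsigned v)
{
  return v != 0 && (v & (v - 1)) == 0;
}

/* membusbytes: a power of 2 that is 8 or more */
static int membusbytes_ok(unsigned v)
{
  return v >= 8 && is_power_of_two(v);
}

/* branchpredictbits: at most 8 */
static int branchpredictbits_ok(unsigned v)
{
  return v <= 8;
}

/* memchunksmax must be at least 1, and hashprime must exceed it */
static int hash_params_ok(unsigned memchunksmax, unsigned hashprime)
{
  return memchunksmax >= 1 && hashprime > memchunksmax;
}
```

Under these checks a setting such as membusbytes 9 or branchpredictbits 9 is rejected, while membusbytes 16 together with memchunksmax 2 and hashprime 3 is accepted.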
 
@ A \<cache spec> assigns a given value to a parameter affecting one of five
possible caches:
$$\vbox{\halign{$#$\hfil\cr
\<cache spec>\is\<cache name>\<cache parameter>\<decimal value>\<policy>\cr
\<cache name>\is\.{ITcache}\mid\.{DTcache}\mid\.{Icache}\mid\.{Dcache}
\mid\.{Scache}\cr
\<policy>\is\<empty>\mid\.{random}\mid\.{serial}
\mid\.{pseudolru}\mid\.{lru}\cr}}$$
The possibilities for \<cache parameter> are as follows:
 
\bull associativity (default 1), number of cache blocks per cache set;
must be a power of~2. (A cache with associativity~1 is said to be
``direct-mapped.'')
 
\bull blocksize (default 8), number of bytes per cache block; must be a power
of~2, at least equal to the granularity, and at most equal to~8192.
The blocksize of \.{ITcache} and \.{DTcache} must be~8.
 
\bull setsize (default 1), number of sets of cache blocks; must be a power
of~2. (A cache with set size~1 is said to be ``fully associative.'')
 
\bull granularity (default 8), number of bytes per ``dirty bit,'' used to
remember which items of data have changed since they were read from memory;
must be a power of~2 and at least~8. The granularity must be~8 if
\.{writeallocate} is~0.
 
\bull victimsize (default 0), number of cache blocks in the victim buffer,
which holds blocks removed from the main cache sets; must be zero or a power
of~2.
 
\bull writeback (default 0), is 1 in a ``write-back'' cache, which holds dirty
data as long as possible; is 0 in a ``write-through'' cache, which cleans
all data as soon as possible.
 
\bull writeallocate (default 0), is 1 in a ``write-allocate'' cache,
which remembers all recently written data;
is 0 in a ``write-around'' cache, which doesn't make space for newly written
data that fails to hit an existing cache block.
 
\bull accesstime (default 1), number of cycles to query the cache;
must be $\ge1$. (Hits in the S-cache actually require {\it twice}
the accesstime, once to query the tag and once to transmit the data.)
 
\bull copyintime (default 1), number of cycles to move a cache block from
its input buffer into the cache proper; must be $\ge1$.
 
\bull copyouttime (default 1), number of cycles to move a cache block
from the cache proper to its output buffer; must be $\ge1$.
 
\bull ports (default 1), number of processes that can simultaneously
query the cache; must be $\ge1$.
 
\smallskip
The \<policy> parameter should be nonempty only on cache specifications
for parameters
\.{associativity} and \.{victimsize}. If no replacement policy is specified,
\.{random} is the default. All four policies are equivalent when the
\.{associativity} or \.{victimsize} is~1; \.{pseudolru} is equivalent
to \.{lru} when the \.{associativity} or \.{victimsize} is~2.
 
The \.{granularity}, \.{writeback}, \.{writeallocate}, and \.{copyouttime}
parameters affect the performance only of the D-cache and S-cache; the other
three caches are read-only, so they never need to write their data.
 
The \.{ports} parameter affects the performance of the D-cache and
DT-cache, and (if the \.{PREGO} command is used) the performance of the
I-cache and IT-cache. The S-cache accommodates only one process at a time,
regardless of the number of specified ports.
 
Only the translation caches (the IT-cache and DT-cache) are present by
default. But if any specifications are given for, say, an I-cache,
all of the unspecified I-cache parameters take their default values.
 
The existence of an S-cache (secondary cache) implies the existence of both
I-cache and D-cache (primary caches for instructions and data).
The block size of the secondary cache must not be less than the block
size of the primary caches. The secondary cache must have the
same granularity as the D-cache.
 
@ A \<pipe spec> governs the execution time of potentially slow operations.
$$\vbox{\halign{$#$\hfil\cr
\<pipe spec>\is\<operation>\<pipeline times>\cr
\<pipeline times>\is\<decimal value>\mid\<pipeline times>\<decimal value>\cr}}$$
Here the \<operation> is one of the following:
 
\bull mul0 through \.{mul8} (default 10); the values for \.{mul}$j$ refer
to products in which the second operand is less than $2^{8j}$, where $j$
is as small as possible. Thus, for example, \.{mul1} applies to
nonzero one-byte multipliers.
 
\bull div (default 60); this applies to integer division, signed and unsigned.
 
\bull sh (default 1); this applies to left and right shifts, signed and
unsigned.
 
\bull mux (default 1); the multiplex operator.
 
\bull sadd (default 1); the sideways addition operator.
 
\bull mor (default 1); the boolean matrix multiplication operators \.{MOR} and
\.{MXOR}.
 
\bull fadd (default 4); floating point addition and subtraction.
 
\bull fmul (default 4); floating point multiplication.
 
\bull fdiv (default 40); floating point division.
 
\bull fsqrt (default 40); floating point square root.
 
\bull fint (default 4); floating point integerization.
 
\bull fix (default 2); conversion from floating to fixed, signed and unsigned.
 
\bull flot (default 2); conversion from fixed to floating, signed and unsigned.
 
\bull feps (default 4); floating comparison with respect to epsilon.
 
\smallskip\noindent
In each case one can specify a sequence of pipeline stages, with a positive
number of cycles to be spent in each stage. For example, a specification like
`\.{fmul}~\.{3}~\.{1}' would say that a functional unit that supports
\.{FMUL} takes a total of four cycles to compute the floating point product
in two stages; it can start working on a second product after three cycles
have gone by.
 
If a floating point operation has a subnormal input, \.{denin} is added to
the time for the first stage. If a floating point operation has a subnormal
result, \.{denout} is added to the time for the last stage.
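
The arithmetic of the `\.{fmul}~\.{3}~\.{1}' example can be written out as a
minimal sketch. It is an illustration only, not code from the simulator;
the function names |pipe_latency| and |first_stage_busy| and their
parameters are hypothetical, and the sketch assumes that the subnormal
penalties simply add to the first and last stage as described above.

```c
#include <assert.h>

/* Total cycles for one operation whose stages take stage[0..n-1] cycles.
   denin is added to the first stage when an input is subnormal, and
   denout to the last stage when the result is subnormal. */
int pipe_latency(const int *stage, int n, int subnormal_in, int subnormal_out,
                 int denin, int denout)
{
  int i, total = 0;
  assert(n > 0);
  for (i = 0; i < n; i++) {
    assert(stage[i] > 0); /* each stage takes a positive number of cycles */
    total += stage[i];
  }
  if (subnormal_in) total += denin;
  if (subnormal_out) total += denout;
  return total;
}

/* Cycles before the first stage is free to accept a second operation. */
int first_stage_busy(const int *stage, int subnormal_in, int denin)
{
  return stage[0] + (subnormal_in ? denin : 0);
}
```

With stages 3 and~1, |pipe_latency| gives 4 and |first_stage_busy| gives 3,
in agreement with the example; if both penalties are~1 and both apply, the
total becomes~6.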
 
@ The fourth and final kind of specification defines a functional unit:
$$\<functional spec>\is\.{unit}\ \<name>\<64 hexadecimal digits>$$
The symbolic name should be at most fifteen characters long.
The 64 hexadecimal digits contain 256 bits, with `1' for each supported
opcode; the most significant (leftmost) bit is for opcode 0 (\.{TRAP}),
and the least significant bit is for opcode 255 (\.{TRIP}).
 
For example, we can define a load/store unit (which handles register/memory
operations), a multiplication unit (which handles fixed and floating point
multiplication), a boolean unit (which handles only bitwise operations),
and a more general arithmetic-logical unit, as follows:
$$\vbox{\halign{\tt#\hfil\cr
unit LSU 00000000000000000000000000000000fffffffcfffffffc0000000000000000\cr
unit MUL 000080f000000000000000000000000000000000000000000000000000000000\cr
unit BIT 000000000000000000000000000000000000000000000000ffff00ff00ff0000\cr
unit ALU f0000000ffffffffffffffffffffffff0000000300000003ffffffffffffffff\cr
}}$$
 
The order in which units are specified is important, because \MMIX's dispatcher
will try to match each instruction with the first functional unit that
supports its opcode. Therefore it is best to list more specialized
units (like the \.{BIT} unit in this example) before more general ones;
this lets the specialized units have first chance at the instructions
they can handle.
 
There can be any number of functional units, having possibly identical
specifications. One should, however, give each unit a unique name
(e.g., \.{ALU1} and \.{ALU2} if there are two arithmetic-logical units),
since these names are used in diagnostic messages.
 
Opcodes that aren't supported by any specified unit will cause an
emulation trap.
@^emulation@>
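
The bit layout and the first-match rule can be tried out with a small
sketch. It is a hedged illustration under the conventions stated above
(leftmost bit is opcode~0, earlier units win); the names |parse_unit_spec|,
|unit_supports|, and |first_unit_for| are hypothetical and do not appear
in the simulator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert 64 hex digits into eight 32-bit words; returns 0 on success and
   -1 on a malformed string. Word 0 holds opcodes 0..31, leftmost bit first. */
int parse_unit_spec(const char *hex, uint32_t ops[8])
{
  int j;
  uint32_t n = 0;
  if (strlen(hex) != 64) return -1;
  for (j = 0; j < 64; j++) {
    char ch = hex[j];
    uint32_t d;
    if (ch >= '0' && ch <= '9') d = (uint32_t)(ch - '0');
    else if (ch >= 'a' && ch <= 'f') d = (uint32_t)(ch - 'a' + 10);
    else if (ch >= 'A' && ch <= 'F') d = (uint32_t)(ch - 'A' + 10);
    else return -1;
    n = (n << 4) | d;
    if ((j & 7) == 7) { ops[j >> 3] = n; n = 0; } /* eight digits per word */
  }
  return 0;
}

/* Nonzero if the unit described by ops supports the given opcode (0..255). */
int unit_supports(const uint32_t *ops, int opcode)
{
  assert(opcode >= 0 && opcode < 256);
  return (int)((ops[opcode >> 5] >> (31 - (opcode & 31))) & 1u);
}

/* Index of the first unit among units[0..count-1] that supports opcode,
   or -1 if no unit does (the case that leads to emulation). */
int first_unit_for(uint32_t units[][8], int count, int opcode)
{
  int u;
  for (u = 0; u < count; u++)
    if (unit_supports(units[u], opcode)) return u;
  return -1;
}
```

If the \.{BIT} and \.{ALU} specifications of the example are listed in that
order, opcode \Hex{c0} goes to \.{BIT} although \.{ALU} also supports it,
opcode \Hex{d0} goes to \.{ALU}, and opcode \Hex{80} is supported by neither.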
 
@ Full details about the significance of all these parameters can be found
in the \.{mmix-pipe} module, which defines and discusses the data structures
that need to be configured and initialized.
 
Of course the specifications in a configuration file needn't make any sense,
nor need they be practically achievable. We could, for example, specify
a unit that handles only the two opcodes \.{NXOR} and \.{DIVUI};
we could specify 1-cycle division but pipelined 100-cycle shifts, or
1-cycle memory access but 100-cycle cache access. We could create
a thousand rename registers and issue a hundred instructions per cycle,
etc. Some combinations of parameters are clearly ridiculous.
 
But there remain a huge number of possibilities of interest, especially
as technology continues to evolve. By experimenting with configurations that
are extreme by present-day standards, we can see how much might be gained
if the corresponding hardware could be built economically.
 
@* Basic input/output. Let's get ready to program the |MMIX_config| subroutine
by building some simple infrastructure. First we need some macros to
print error messages.
 
@d errprint0(f) fprintf(stderr,f)
@d errprint1(f,a) fprintf(stderr,f,a)
@d errprint2(f,a,b) fprintf(stderr,f,a,b)
@d errprint3(f,a,b,c) fprintf(stderr,f,a,b,c)
@d panic(x)@+ {@+x;@+errprint0("!\n");@+exit(-1);@+}
 
@ And we need a place to look at the input.
 
@d BUF_SIZE 100 /* we don't need long lines */
 
@<Global variables@>=
FILE *config_file; /* input comes from here */
char buffer[BUF_SIZE]; /* input lines go here */
char token[BUF_SIZE]; /* and tokens are copied to here */
char *buf_pointer=buffer; /* this is our current position */
bool token_prescanned; /* does |token| contain the next token already? */
 
@ The |get_token| routine copies the next token of input into the |token|
buffer. After the input has ended, a final `\.{end}' is appended.
 
@<Subroutines@>=
static void get_token @,@,@[ARGS((void))@];@+@t}\6{@>
static void get_token() /* set |token| to the next token of the configuration file */
{
register char *p,*q;
if (token_prescanned) {
token_prescanned=false;@+ return;
}
while(1) { /* scan past white space */
if (*buf_pointer=='\0' || *buf_pointer=='\n' || *buf_pointer=='%') {
if (!fgets(buffer,BUF_SIZE,config_file)) {
strcpy(token,"end");@+return;
}
if (strlen(buffer)==BUF_SIZE-1 && buffer[BUF_SIZE-2]!='\n')
panic(errprint1("config file line too long: `%s...'",buffer));
@.config file line...@>
buf_pointer=buffer;
}@+else if (!isspace(*buf_pointer)) break;
else buf_pointer++;
}
for (p=buf_pointer,q=token;*p && !isspace(*p) && *p!='%';p++,q++) *q=*p;
buf_pointer=p;@+ *q='\0';
return;
}
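
@ The scanning rule can be tried in isolation. The following is a minimal
sketch, not part of the program: it assumes that the whole configuration
sits in one string instead of a file, and the name |next_token| and its
parameters are hypothetical. White space separates tokens, and a percent
sign starts a comment that runs to the end of the line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the next token from *pos into out (capacity cap, at least 1).
   Returns 1 if a token was found, 0 at end of input. A token longer than
   cap-1 characters is split, which this sketch does not treat as an error. */
int next_token(const char **pos, char *out, size_t cap)
{
  const char *p = *pos;
  size_t n = 0;
  assert(cap > 0);
  for (;;) {
    while (*p && isspace((unsigned char)*p)) p++; /* skip white space */
    if (*p != '%') break;
    while (*p && *p != '\n') p++; /* skip a comment to the end of its line */
  }
  if (!*p) { /* nothing left */
    *pos = p;
    out[0] = '\0';
    return 0;
  }
  while (*p && !isspace((unsigned char)*p) && *p != '%' && n + 1 < cap)
    out[n++] = *p++;
  out[n] = '\0';
  *pos = p;
  return 1;
}
```

Repeated calls on a string such as a line of parameters followed by a
comment and a second line yield the parameter names and values one by one,
and the comment is never seen by the caller.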
 
@ The |get_int| routine is called when we wish to input a decimal value.
It returns $-1$ if the next token isn't a string of decimal digits.
 
@<Sub...@>=
static int get_int @,@,@[ARGS((void))@];@+@t}\6{@>
static int get_int()
{@+ int v;
char *p;
get_token();
for (p=token,v=0; *p>='0' && *p<='9'; p++) v=10*v+*p-'0';
if (*p) return -1;
return v;
}
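
@ A standalone version of the same digit scan may make the return
convention easier to see. This is a sketch under stated assumptions: the
name |parse_decimal| is hypothetical, it takes the token as an argument
instead of reading it, and unlike the routine above it also returns $-1$
when the value would not fit in an |int|.

```c
#include <assert.h>
#include <limits.h>

/* Return the value of tok if it consists solely of decimal digits, else -1.
   Overflow is reported as -1 too (an addition made by this sketch). */
int parse_decimal(const char *tok)
{
  int v = 0;
  const char *p;
  assert(tok != 0);
  for (p = tok; *p >= '0' && *p <= '9'; p++) {
    int d = *p - '0';
    if (v > (INT_MAX - d) / 10) return -1; /* 10*v+d would overflow */
    v = 10 * v + d;
  }
  return *p ? -1 : v; /* leftover characters mean "not a number" */
}
```

Because $-1$ is smaller than every legal minimum, a caller that checks the
lower bound of a parameter rejects a malformed number without any
separate test.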
 
@ A simple data structure makes it fairly easy to deal with
parameter/value specifications.
 
@<Type definitions@>=
typedef struct {
char name[20]; /* symbolic name */
int *v; /* internal name */
int defval; /* default value */
int minval, maxval; /* minimum and maximum legal values */
bool power_of_two; /* must it be a power of two? */
} pv_spec;
 
@ Cache parameters are a bit more difficult, but still not bad.
 
@<Type...@>=
typedef enum {@!assoc,@!blksz,@!setsz,@!gran,@!vctsz,
@!wrb,@!wra,@!acctm,@!citm,@!cotm,@!prts} c_param;
@#
typedef struct {
char name[20]; /* symbolic name */
c_param v; /* internal code */
int defval; /* default value */
int minval, maxval; /* minimum and maximum legal values */
bool power_of_two; /* must it be a power of two? */
} cpv_spec;
 
@ Operation codes are the easiest of all.
 
@<Type...@>=
typedef struct {
char name[8]; /* symbolic name */
internal_opcode v; /* internal code */
int defval; /* default value */
} op_spec;
 
@ Most of the parameters are external variables declared in the header
file \.{mmix-pipe.h}; but some are private to this module. Here we
define the main tables used below.
 
@<Glob...@>=
int fetch_buf_size,write_buf_size,reorder_buf_size,mem_bus_bytes,hardware_PT;
int max_cycs=60;
pv_spec PV[]={@/
{"fetchbuffer", &fetch_buf_size, 4, 1, INT_MAX, false},@/
{"writebuffer", &write_buf_size, 2, 1, INT_MAX, false},@/
{"reorderbuffer", &reorder_buf_size, 5, 1, INT_MAX, false},@/
{"renameregs", &max_rename_regs, 5, 1, INT_MAX, false},@/
{"memslots", &max_mem_slots, 2, 1, INT_MAX, false},@/
{"localregs", &lring_size, 256, 256, 1024, true},@/
{"fetchmax", &fetch_max, 2, 1, INT_MAX, false},@/
{"dispatchmax", &dispatch_max, 1, 1, INT_MAX, false},@/
{"peekahead", &peekahead, 1, 0, INT_MAX, false},@/
{"commitmax", &commit_max, 1, 1, INT_MAX, false},@/
{"fremmax", &frem_max, 1, 1, INT_MAX, false},@/
{"denin",&denin_penalty, 1, 0, INT_MAX, false},@/
{"denout",&denout_penalty, 1, 0, INT_MAX, false},@/
{"writeholdingtime", &holding_time, 0, 0, INT_MAX, false},@/
{"memaddresstime", &mem_addr_time, 20, 1, INT_MAX, false},@/
{"memreadtime", &mem_read_time, 20, 1, INT_MAX, false},@/
{"memwritetime", &mem_write_time, 20, 1, INT_MAX, false},@/
{"membusbytes", &mem_bus_bytes, 8, 8, INT_MAX, true},@/
{"branchpredictbits", &bp_n, 0, 0, 8, false},@/
{"branchaddressbits", &bp_a, 0, 0, 32, false},@/
{"branchhistorybits", &bp_b, 0, 0, 32, false},@/
{"branchdualbits", &bp_c, 0, 0, 32, false},@/
{"hardwarepagetable", &hardware_PT, 1, 0, 1, false},@/
{"disablesecurity", (int*)&security_disabled, 0, 0, 1, false},@/
{"memchunksmax", &mem_chunks_max, 1000, 1, INT_MAX, false},@/
{"hashprime", &hash_prime, 2003, 2, INT_MAX, false}};
@#
cpv_spec CPV[]={
{"associativity", assoc, 1, 1, INT_MAX, true},@/
{"blocksize", blksz, 8, 8, 8192, true},@/
{"setsize", setsz, 1, 1, INT_MAX, true},@/
{"granularity", gran, 8, 8, 8192, true},@/
{"victimsize", vctsz, 0, 0, INT_MAX, true},@/
{"writeback", wrb, 0, 0, 1,false},@/
{"writeallocate", wra, 0, 0, 1,false},@/
{"accesstime", acctm, 1, 1, INT_MAX, false},@/
{"copyintime", citm, 1, 1, INT_MAX, false},@/
{"copyouttime", cotm, 1, 1, INT_MAX, false},@/
{"ports", prts, 1, 1, INT_MAX,false}};
@#
op_spec OP[]={
{"mul0", mul0, 10},
{"mul1", mul1, 10},
{"mul2", mul2, 10},
{"mul3", mul3, 10},
{"mul4", mul4, 10},
{"mul5", mul5, 10},
{"mul6", mul6, 10},
{"mul7", mul7, 10},
{"mul8", mul8, 10},@|
{"div", div, 60},
{"sh", sh, 1},
{"mux", mux, 1},
{"sadd", sadd, 1},
{"mor", mor, 1},@|
{"fadd", fadd, 4},
{"fmul", fmul, 4},
{"fdiv", fdiv, 40},
{"fsqrt", fsqrt, 40},
{"fint", fint, 4},@|
{"fix", fix, 2},
{"flot", flot, 2},
{"feps", feps, 4}};
int PV_size,CPV_size,OP_size; /* the number of entries in |PV|, |CPV|, |OP| */
 
@ The |new_cache| routine creates a \&{cache} structure with default values.
(These default values are ``hard-wired'' into the program, not actually
read from the |CPV| table.)
 
@<Sub...@>=
static cache* new_cache @,@,@[ARGS((char*))@];@+@t}\6{@>
static cache* new_cache(name)
char *name;
{@+register cache *c=(cache*)calloc(1,sizeof(cache));
if (!c) panic(errprint1("Can't allocate %s",name));
@.Can't allocate...@>
c->aa=1; /* default associativity, should equal |CPV[0].defval| */
c->bb=8; /* default blocksize */
c->cc=1; /* default setsize */
c->gg=8; /* default granularity */
c->vv=0; /* default victimsize */
c->repl=random; /* default replacement policy */
c->vrepl=random; /* default victim replacement policy */
c->mode=0; /* default mode is write-through and write-around */
c->access_time=c->copy_in_time=c->copy_out_time=1;
c->filler.ctl=&(c->filler_ctl);
c->filler_ctl.ptr_a=(void*)c;
c->filler_ctl.go.o.l=4;
c->flusher.ctl=&(c->flusher_ctl);
c->flusher_ctl.ptr_a=(void*)c;
c->flusher_ctl.go.o.l=4;
c->ports=1;
c->name=name;
return c;
}
 
@ @<Initialize to defaults@>=
PV_size=(sizeof PV)/sizeof(pv_spec);
CPV_size=(sizeof CPV)/sizeof(cpv_spec);
OP_size=(sizeof OP)/sizeof(op_spec);
ITcache=new_cache("ITcache");
DTcache=new_cache("DTcache");
Icache=Dcache=Scache=NULL;
for (j=0;j<PV_size;j++) *(PV[j].v)=PV[j].defval;
for (j=0;j<OP_size;j++) {
pipe_seq[OP[j].v][0]=OP[j].defval;
pipe_seq[OP[j].v][1]=0; /* one stage */
}
 
@* Reading the specs. Before we're ready to process the configuration file,
we need to count the number of functional units, so that we know
how much space to allocate for them.
 
A special background unit is always provided, just to make sure that
\.{TRAP} and \.{TRIP} instructions are handled by somebody.
 
@<Count and allocate the functional units@>=
funit_count=0;
while (strcmp(token,"end")!=0) {
get_token();
if (strcmp(token,"unit")==0) {
funit_count++;
get_token();@+get_token(); /* a unit might be named \.{unit} or \.{end} */
}
}
funit=(func*)calloc(funit_count+1,sizeof(func));
if (!funit) panic(errprint0("Can't allocate the functional units"));
@.Can't allocate...@>
strcpy(funit[funit_count].name,"%%");
@.\%\%@>
funit[funit_count].ops[0]=0x80000000; /* \.{TRAP} */
funit[funit_count].ops[7]=0x1; /* \.{TRIP} */
 
@ Now we can read the specifications and obey them. This program doesn't
bother to be very tolerant of errors, nor does it try to be very efficient.
 
Incidentally, the specifications don't have to be broken into individual lines
in any meaningful way. We simply read them token by token.
 
@<Record all the specs@>=
rewind(config_file);
funit_count=0;
token[0]='\0';
while (strcmp(token,"end")!=0) {
get_token();
if (strcmp(token,"end")==0) break;
@<If |token| is a parameter name, process a PV spec@>;
@<If |token| is a cache name, process a cache spec@>;
@<If |token| is an operation name, process a pipe spec@>;
if (strcmp(token,"unit")==0) @<Process a functional spec@>;
panic(errprint1(
"Configuration syntax error: Specification can't start with `%s'",token));
@.Configuration syntax error...@>
}
 
@ @<If |token| is a parameter name, process a PV spec@>=
for (j=0;j<PV_size;j++) if (strcmp(token,PV[j].name)==0) {
n=get_int();
if (n<PV[j].minval) panic(errprint2(
@.Configuration error...@>
"Configuration error: %s must be >= %d",PV[j].name,PV[j].minval));
if (n>PV[j].maxval) panic(errprint2(
"Configuration error: %s must be <= %d",PV[j].name,PV[j].maxval));
if (PV[j].power_of_two && (n&(n-1))) panic(errprint1(
"Configuration error: %s must be a power of 2",PV[j].name));
*(PV[j].v)=n;
break;
}
if (j<PV_size) continue;
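
@ The three checks applied to each value can be summarized in a small
sketch. It is an illustration only; the function |check_param| and the
enumeration constants are hypothetical names. Note that the test
|n&(n-1)| accepts 0 as well as the powers of~2, which is what allows
\.{victimsize} to be zero.

```c
#include <assert.h>

enum { PARAM_OK, PARAM_TOO_SMALL, PARAM_TOO_LARGE, PARAM_NOT_POWER_OF_2 };

/* Classify n against the limits recorded in a table entry. */
int check_param(int n, int minval, int maxval, int power_of_two)
{
  assert(minval <= maxval);
  if (n < minval) return PARAM_TOO_SMALL; /* also catches -1 from a bad number */
  if (n > maxval) return PARAM_TOO_LARGE;
  if (power_of_two && (n & (n - 1))) return PARAM_NOT_POWER_OF_2;
  return PARAM_OK;
}
```

For \.{localregs}, whose limits are 256 and 1024, the value 512 passes,
384 fails the power-of-2 test, and 128 is too small.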
 
@ @<If |token| is a cache name, process a cache spec@>=
if (strcmp(token,"ITcache")==0) {
pcs(ITcache);@+continue;
}@+else if (strcmp(token,"DTcache")==0) {
pcs(DTcache);@+continue;
}@+else if (strcmp(token,"Icache")==0) {
if (!Icache) Icache=new_cache("Icache");
pcs(Icache);@+continue;
}@+else if (strcmp(token,"Dcache")==0) {
if (!Dcache) Dcache=new_cache("Dcache");
pcs(Dcache);@+continue;
}@+else if (strcmp(token,"Scache")==0) {
if (!Icache) Icache=new_cache("Icache");
if (!Dcache) Dcache=new_cache("Dcache");
if (!Scache) Scache=new_cache("Scache");
pcs(Scache);@+continue;
}
 
@ @<Sub...@>=
static void ppol @,@,@[ARGS((replace_policy*))@];@+@t}\6{@>
static void ppol(rr) /* subroutine to scan for a replacement policy */
replace_policy *rr;
{
get_token();
if (strcmp(token,"random")==0) *rr=random;
else if (strcmp(token,"serial")==0) *rr=serial;
else if (strcmp(token,"pseudolru")==0) *rr=pseudo_lru;
else if (strcmp(token,"lru")==0) *rr=lru;
else token_prescanned=true; /* oops, we should rescan that token */
}
 
@ @<Sub...@>=
static void pcs @,@,@[ARGS((cache*))@];@+@t}\6{@>
static void pcs(c) /* subroutine to process a cache spec */
cache *c;
{
register int j,n;
get_token();
for (j=0;j<CPV_size;j++) if (strcmp(token,CPV[j].name)==0) break;
if (j==CPV_size) panic(errprint1(
"Configuration syntax error: `%s' isn't a cache parameter name",token));
@.Configuration syntax error...@>
n=get_int();
if (n<CPV[j].minval) panic(errprint2(
"Configuration error: %s must be >= %d",CPV[j].name,CPV[j].minval));
@.Configuration error...@>
if (n>CPV[j].maxval) panic(errprint2(
"Configuration error: %s must be <= %d",CPV[j].name,CPV[j].maxval));
if (CPV[j].power_of_two && (n&(n-1))) panic(errprint1(
"Configuration error: %s must be a power of 2",CPV[j].name));
switch (CPV[j].v) {
case assoc: c->aa=n;@+ppol(&(c->repl));@+break;
case blksz: c->bb=n;@+break;
case setsz: c->cc=n;@+break;
case gran: c->gg=n;@+break;
case vctsz: c->vv=n;@+ppol(&(c->vrepl));@+break;
case wrb: c->mode=(c->mode&~WRITE_BACK)+n*WRITE_BACK;@+break;
case wra: c->mode=(c->mode&~WRITE_ALLOC)+n*WRITE_ALLOC;@+break;
case acctm:@+ if (n>max_cycs) max_cycs=n;
c->access_time=n;@+break;
case citm:@+ if (n>max_cycs) max_cycs=n;
c->copy_in_time=n;@+break;
case cotm:@+ if (n>max_cycs) max_cycs=n;
c->copy_out_time=n;@+break;
case prts: c->ports=n;@+break;
}
}
 
@ @<If |token| is an operation name, process a pipe spec@>=
for (j=0;j<OP_size;j++) if (strcmp(token,OP[j].name)==0) {
for (i=0;;i++) {
n=get_int();
if (n<0) break;
if (n==0) panic(errprint0(
"Configuration error: Pipeline cycles must be positive"));
@.Configuration error...@>
if (n>255) panic(errprint0(
"Configuration error: Pipeline cycles must be <= 255"));
if (n>max_cycs) max_cycs=n;
if (i>=pipe_limit) panic(errprint1(
"Configuration error: More than %d pipeline stages",pipe_limit));
pipe_seq[OP[j].v][i]=n;
}
token_prescanned=true;
break;
}
if (j<OP_size) continue;
 
@ @<Process a functional spec@>=
{
get_token();
if (strlen(token)>15) panic(errprint1(
"Configuration error: `%s' is more than 15 characters long",token));
@.Configuration error...@>
strcpy(funit[funit_count].name,token);
get_token();
if (strlen(token)!=64) panic(errprint1(
"Configuration error: unit %s doesn't have 64 hex digit specs",
funit[funit_count].name));
for (i=j=n=0;j<64;j++) {
if (token[j]>='0' && token[j]<='9') n=(n<<4)+(token[j]-'0');
else if (token[j]>='a' && token[j]<='f') n=(n<<4)+(token[j]-'a'+10);
else if (token[j]>='A' && token[j]<='F') n=(n<<4)+(token[j]-'A'+10);
else panic(errprint1(
"Configuration error: `%c' is not a hex digit",token[j]));
if ((j&0x7)==0x7) funit[funit_count].ops[i++]=n, n=0;
}
funit_count++;
continue;
}
 
@* Checking and allocating. The battle is only half over when we've
absorbed all the data of the configuration file. We still must check for
interactions between different quantities, and we must allocate
space for cache blocks, coroutines, etc.
 
One of the most difficult tasks facing us is to determine the maximum number
of pipeline stages needed by each functional unit. Let's tackle that first.
 
@<Allocate coroutines in each functional unit@>=
@<Build table of pipeline stages needed for each opcode@>;
for (j=0;j<=funit_count;j++) {
@<Determine the number of stages, |n|, needed by |funit[j]|@>;
funit[j].k=n;
funit[j].co=(coroutine*)calloc(n,sizeof(coroutine));
for (i=0;i<n;i++) {
funit[j].co[i].name=funit[j].name;
funit[j].co[i].stage=i+1;
}
}
 
@ @<Build table of pipeline stages needed for each opcode@>=
for (j=div;j<=max_pipe_op;j++) int_stages[j]=strlen(pipe_seq[j]);
for (;j<=max_real_command;j++) int_stages[j]=1;
for (j=mul0,n=0;j<=mul8;j++)
if (strlen(pipe_seq[j])>n) n=strlen(pipe_seq[j]);
int_stages[mul]=n;
int_stages[ld]=int_stages[st]=int_stages[frem]=2;
for (j=0;j<256;j++) stages[j]=int_stages[int_op[j]];
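
@ The use of |strlen| to count stages works if each row of |pipe_seq| is a
string of nonzero bytes followed by a zero byte, which also explains why
cycle counts must lie between 1 and~255. The sketch below illustrates that
reading; it is an assumption drawn from the calls above, and the names
|SKETCH_LIMIT|, |set_stages|, and |count_stages| are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_LIMIT 8 /* hypothetical capacity, standing in for pipe_limit */

/* Store n cycle counts as a zero-terminated byte string.
   Returns the number of stages, or -1 if the specification is illegal. */
int set_stages(unsigned char seq[SKETCH_LIMIT + 1], const int *cycles, int n)
{
  int i;
  if (n < 1 || n > SKETCH_LIMIT) return -1;
  for (i = 0; i < n; i++) {
    if (cycles[i] < 1 || cycles[i] > 255) return -1; /* must be a nonzero byte */
    seq[i] = (unsigned char)cycles[i];
  }
  seq[n] = 0; /* the terminator that strlen looks for */
  return n;
}

/* The number of stages is the length of the byte string. */
int count_stages(const unsigned char *seq)
{
  assert(seq != 0);
  return (int)strlen((const char *)seq);
}
```

A two-stage specification with 3 cycles and 1 cycle therefore has length~2,
while a count of 0 or 256 cannot be represented at all.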
 
@ The |int_op| conversion table is similar to the |internal_op| array of
the \\{MMIX\_pipe} routine, but it replaces |divu| by |div|,
|fsub| by |fadd|, etc.
 
@<Glob...@>=
internal_opcode int_op[256]={@/
trap,fcmp,funeq,funeq,fadd,fix,fadd,fix,@/
flot,flot,flot,flot,flot,flot,flot,flot,@/
fmul,feps,feps,feps,fdiv,fsqrt,frem,fint,@/
mul,mul,mul,mul,div,div,div,div,@/
add,add,addu,addu,sub,sub,subu,subu,@/
addu,addu,addu,addu,addu,addu,addu,addu,@/
cmp,cmp,cmpu,cmpu,sub,sub,subu,subu,@/
sh,sh,sh,sh,sh,sh,sh,sh,@/
br,br,br,br,br,br,br,br,@/
br,br,br,br,br,br,br,br,@/
pbr,pbr,pbr,pbr,pbr,pbr,pbr,pbr,@/
pbr,pbr,pbr,pbr,pbr,pbr,pbr,pbr,@/
cset,cset,cset,cset,cset,cset,cset,cset,@/
cset,cset,cset,cset,cset,cset,cset,cset,@/
zset,zset,zset,zset,zset,zset,zset,zset,@/
zset,zset,zset,zset,zset,zset,zset,zset,@/
ld,ld,ld,ld,ld,ld,ld,ld,@/
ld,ld,ld,ld,ld,ld,ld,ld,@/
ld,ld,ld,ld,ld,ld,ld,ld,@/
ld,ld,ld,ld,prego,prego,go,go,@/
st,st,st,st,st,st,st,st,@/
st,st,st,st,st,st,st,st,@/
st,st,st,st,st,st,st,st,@/
st,st,st,st,st,st,pushgo,pushgo,@/
or,or,orn,orn,nor,nor,xor,xor,@/
and,and,andn,andn,nand,nand,nxor,nxor,@/
bdif,bdif,wdif,wdif,tdif,tdif,odif,odif,@/
mux,mux,sadd,sadd,mor,mor,mor,mor,@/
set,set,set,set,addu,addu,addu,addu,@/
or,or,or,or,andn,andn,andn,andn,@/
noop,noop,pushj,pushj,set,set,put,put,@/
pop,resume,save,unsave,sync,noop,get,trip};
int int_stages[max_real_command+1];
/* stages as function of |internal_opcode| */
int stages[256]; /* stages as function of |mmix_opcode| */
 
@ @<Determine the number of stages...@>=
for (i=n=0;i<256;i++)
if (((funit[j].ops[i>>5]<<(i&0x1f))&0x80000000) && stages[i]>n)
n=stages[i];
if (n==0) panic(errprint1(
"Configuration error: unit %s doesn't do anything",funit[j].name));
@.Configuration error...@>
 
@ The next hardest thing on our agenda is to set up the cache structure
fields that depend on the parameters. For example, although we have defined
the parameter in the |bb| field (the block size), we also need to compute the
|b|~field (log of the block size), and we must create the cache blocks
themselves.
 
@<Sub...@>=
static int lg @,@,@[ARGS((int))@];@+@t}\6{@>
static int lg(n) /* compute binary logarithm */
int n;
{@+register int j,l;
for (j=n,l=0;j;j>>=1) l++;
return l-1;
}
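
@ A few sample values show what the logarithm routine returns. The sketch
repeats the same loop under the hypothetical name |floor_lg| so that it can
be compiled alone; it is an illustration, not a replacement.

```c
#include <assert.h>

/* Floor of the binary logarithm of n; returns -1 when n is 0. */
int floor_lg(int n)
{
  int l = 0;
  assert(n >= 0);
  for (; n; n >>= 1) l++; /* count the significant bits */
  return l - 1;
}
```

For the powers of 2 that the configuration allows, the result is exact:
a block size of 8 gives 3 and a block size of 8192 gives 13. A victim size
of 0 gives $-1$.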
 
@ @<Sub...@>=
static void alloc_cache @,@,@[ARGS((cache*,char*))@];@+@t}\6{@>
static void alloc_cache(c,name)
cache *c;
char *name;
{@+register int j,k;
if (c->bb<c->gg) panic(errprint1(
"Configuration error: blocksize of %s is less than granularity",name));
@.Configuration error...@>
if (name[1]=='T' && c->bb!=8) panic(errprint1(
"Configuration error: blocksize of %s must be 8",name));
c->a=lg(c->aa);
c->b=lg(c->bb);
c->c=lg(c->cc);
c->g=lg(c->gg);
c->v=lg(c->vv);
c->tagmask=-(1<<(c->b+c->c));
if (c->a+c->b+c->c>=32) panic(errprint1(
"Configuration error: %s has >= 4 gigabytes of data",name));
if (c->gg!=8 && !(c->mode&WRITE_ALLOC)) panic(errprint2(
"Configuration error: %s does write-around with granularity %d",
name,c->gg));
@<Allocate the cache sets for cache |c|@>;
if (c->vv) @<Allocate the victim cache for cache |c|@>;
c->inbuf.dirty=(char*)calloc(c->bb>>c->g,sizeof(char));
if (!c->inbuf.dirty) panic(errprint1(
"Can't allocate dirty bits for inbuffer of %s",name));
@.Can't allocate...@>
c->inbuf.data=(octa *)calloc(c->bb>>3,sizeof(octa));
if (!c->inbuf.data) panic(errprint1(
"Can't allocate data for inbuffer of %s",name));
c->outbuf.dirty=(char*)calloc(c->bb>>c->g,sizeof(char));
if (!c->outbuf.dirty) panic(errprint1(
"Can't allocate dirty bits for outbuffer of %s",name));
c->outbuf.data=(octa *)calloc(c->bb>>3,sizeof(octa));
if (!c->outbuf.data) panic(errprint1(
"Can't allocate data for outbuffer of %s",name));
if (name[0]!='S') @<Allocate reader coroutines for cache |c|@>;
}
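
@ The derived fields can be illustrated with a hedged sketch. The
computation of the tag mask and of the number of dirty bits follows the
routine above; the way a set index is extracted from an address is an
assumption made for this example, and the type |cache_geom| and all
function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int b;            /* lg of the block size in bytes */
  int c;            /* lg of the number of sets */
  uint32_t tagmask; /* ones in the tag positions */
} cache_geom;       /* hypothetical type, for this sketch only */

int floor_lg(int n)
{
  int l = 0;
  for (; n; n >>= 1) l++;
  return l - 1;
}

cache_geom make_geom(int blocksize, int setsize)
{
  cache_geom g;
  g.b = floor_lg(blocksize);
  g.c = floor_lg(setsize);
  assert(g.b >= 0 && g.c >= 0 && g.b + g.c < 32);
  g.tagmask = ~((UINT32_C(1) << (g.b + g.c)) - 1); /* same bits as -(1<<(b+c)) */
  return g;
}

/* Assumed decomposition: low b bits are the offset, next c bits the set. */
uint32_t set_index(cache_geom g, uint32_t addr)
{
  return (addr & ~g.tagmask) >> g.b;
}

uint32_t tag_of(cache_geom g, uint32_t addr)
{
  return addr & g.tagmask;
}

/* One dirty bit per granule, hence blocksize/granularity bits per block. */
int dirty_bits_per_block(int blocksize, int granularity)
{
  return blocksize >> floor_lg(granularity);
}
```

With 8-byte blocks and 4 sets the mask is \Hex{ffffffe0}; under the assumed
decomposition, address \Hex{1234} falls in set~2 with tag \Hex{1220}.
A 64-byte block with granularity~8 needs 8 dirty bits.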
 
@ @d sign_bit 0x80000000
 
@<Allocate the cache sets for cache |c|@>=
c->set=(cacheset *)calloc(c->cc,sizeof(cacheset));
if (!c->set) panic(errprint1(
"Can't allocate cache sets for %s",name));
@.Can't allocate...@>
for (j=0;j<c->cc;j++) {
c->set[j]=(cacheblock *)calloc(c->aa,sizeof(cacheblock));
if (!c->set[j]) panic(errprint2(
"Can't allocate cache blocks for set %d of %s",j,name));
for (k=0;k<c->aa;k++) {
c->set[j][k].tag.h=sign_bit; /* invalid tag */
c->set[j][k].dirty=(char *)calloc(c->bb>>c->g,sizeof(char));
if (!c->set[j][k].dirty) panic(errprint3(
"Can't allocate dirty bits for block %d of set %d of %s",k,j,name));
c->set[j][k].data=(octa *)calloc(c->bb>>3,sizeof(octa));
if (!c->set[j][k].data) panic(errprint3(
"Can't allocate data for block %d of set %d of %s",k,j,name));
}
}
 
@ @<Allocate the victim cache for cache |c|@>=
{
c->victim=(cacheblock*)calloc(c->vv,sizeof(cacheblock));
if (!c->victim) panic(errprint1(
"Can't allocate blocks for victim cache of %s",name));
for (k=0;k<c->vv;k++) {
c->victim[k].tag.h=sign_bit; /* invalid tag */
c->victim[k].dirty=(char *)calloc(c->bb>>c->g,sizeof(char));
if (!c->victim[k].dirty) panic(errprint2(
"Can't allocate dirty bits for block %d of victim cache of %s",
k,name));
@.Can't allocate...@>
c->victim[k].data=(octa *)calloc(c->bb>>3,sizeof(octa));
if (!c->victim[k].data) panic(errprint2(
"Can't allocate data for block %d of victim cache of %s",k,name));
}
}
 
@ @<Allocate reader coroutines for cache |c|@>=
{
c->reader=(coroutine*)calloc(c->ports,sizeof(coroutine));
if (!c->reader) panic(errprint1(
@.Can't allocate...@>
"Can't allocate readers for %s",name));
for (j=0;j<c->ports;j++) {
c->reader[j].stage=vanish;
c->reader[j].name=(name[0]=='D'? (name[1]=='T'? "DTreader": "Dreader"):
(name[1]=='T'? "ITreader": "Ireader"));
}
}
 
@ @<Allocate the caches@>=
alloc_cache(ITcache,"ITcache");
ITcache->filler.name="ITfiller";@+ ITcache->filler.stage=fill_from_virt;
alloc_cache(DTcache,"DTcache");
DTcache->filler.name="DTfiller";@+ DTcache->filler.stage=fill_from_virt;
if (Icache) {
alloc_cache(Icache,"Icache");
Icache->filler.name="Ifiller";@+ Icache->filler.stage=fill_from_mem;
}
if (Dcache) {
alloc_cache(Dcache,"Dcache");
Dcache->filler.name="Dfiller";@+ Dcache->filler.stage=fill_from_mem;
Dcache->flusher.name="Dflusher";@+ Dcache->flusher.stage=flush_to_mem;
}
if (Scache) {
alloc_cache(Scache,"Scache");
if (Scache->bb<Icache->bb) panic(errprint0(
"Configuration error: Scache blocks smaller than Icache blocks"));
@.Configuration error...@>
if (Scache->bb<Dcache->bb) panic(errprint0(
"Configuration error: Scache blocks smaller than Dcache blocks"));
if (Scache->gg!=Dcache->gg) panic(errprint0(
"Configuration error: Scache granularity differs from the Dcache"));
Icache->filler.stage=fill_from_S;
Dcache->filler.stage=fill_from_S;@+ Dcache->flusher.stage=flush_to_S;
Scache->filler.name="Sfiller";@+ Scache->filler.stage=fill_from_mem;
Scache->flusher.name="Sflusher";@+ Scache->flusher.stage=flush_to_mem;
}
 
@ Now we are nearly done. The only nontrivial task remaining is
to allocate the ring of queues for coroutine scheduling; for this we
need to determine the maximum waiting time that will occur between
scheduler and schedulee.
 
@<Allocate the scheduling queue@>=
bus_words=mem_bus_bytes>>3;
j=(mem_read_time<mem_write_time? mem_write_time: mem_read_time);
n=1;
if (Scache && Scache->bb>n) n=Scache->bb;
if (Icache && Icache->bb>n) n=Icache->bb;
if (Dcache && Dcache->bb>n) n=Dcache->bb;
n=mem_addr_time+((int)(n+bus_words-1)/bus_words)*j;
if (n>max_cycs) max_cycs=n; /* now |max_cycs| bounds the waiting time */
ring_size=max_cycs+1;
ring=(coroutine *)calloc(ring_size,sizeof(coroutine));
if (!ring) panic(errprint0("Can't allocate the scheduling ring"));
@.Can't allocate...@>
{@+register coroutine *p;
for (p=ring;p<ring+ring_size;p++) {
p->name=""; /* header nodes are nameless */
p->stage=max_stage;
}
}
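
@ The bound on the waiting time can be reproduced with a short sketch that
mirrors the arithmetic above. It is illustrative only; the function
|ring_size_for| and its parameter names are hypothetical, and
|max_block_bytes| stands for the largest block size among the S-cache,
I-cache, and D-cache (0 if none of them exists).

```c
#include <assert.h>

/* Size of the scheduling ring: one more than the longest possible wait. */
int ring_size_for(int max_cycs, int mem_addr_time, int mem_read_time,
                  int mem_write_time, int mem_bus_bytes, int max_block_bytes)
{
  int bus_words = mem_bus_bytes >> 3; /* octabytes per bus transfer */
  int slow = mem_read_time < mem_write_time ? mem_write_time : mem_read_time;
  int n = max_block_bytes > 1 ? max_block_bytes : 1;
  assert(bus_words >= 1);
  /* address time plus the slower transfer time for each bus transaction */
  n = mem_addr_time + ((n + bus_words - 1) / bus_words) * slow;
  if (n > max_cycs) max_cycs = n;
  return max_cycs + 1;
}
```

With all defaults and no primary or secondary caches, the memory bound is
$20+20=40$, which is below the initial |max_cycs| of 60, so the ring has
61 entries. With \.{memaddresstime}~3, read and write times of~10,
\.{membusbytes}~16, and a 64-byte block, the bound is $3+32\cdot10=323$
and the ring has 324 entries.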
 
@ @s chunknode int
 
@<Touch up last-minute trivia@>=
if (hash_prime<=mem_chunks_max) panic(errprint0(
"Configuration error: hashprime must exceed memchunksmax"));
@.Configuration error...@>
mem_hash=(chunknode *)calloc(hash_prime+1,sizeof(chunknode));
if (!mem_hash) panic(errprint0("Can't allocate the hash table"));
@.Can't allocate...@>
mem_hash[0].chunk=(octa*)calloc(1<<13,sizeof(octa));
if (!mem_hash[0].chunk) panic(errprint0("Can't allocate chunk 0"));
mem_hash[hash_prime].chunk=(octa*)calloc(1<<13,sizeof(octa));
if (!mem_hash[hash_prime].chunk) panic(errprint0("Can't allocate 0 chunk"));
mem_chunks=1;
fetch_bot=(fetch*)calloc(fetch_buf_size+1,sizeof(fetch));
if (!fetch_bot) panic(errprint0("Can't allocate the fetch buffer"));
fetch_top=fetch_bot+fetch_buf_size;
reorder_bot=(control*)calloc(reorder_buf_size+1,sizeof(control));
if (!reorder_bot) panic(errprint0("Can't allocate the reorder buffer"));
reorder_top=reorder_bot+reorder_buf_size;
wbuf_bot=(write_node*)calloc(write_buf_size+1,sizeof(write_node));
if (!wbuf_bot) panic(errprint0("Can't allocate the write buffer"));
wbuf_top=wbuf_bot+write_buf_size;
if (bp_n==0) bp_table=NULL;
else { /* a branch prediction table is desired */
if (bp_a+bp_b+bp_c>=32) panic(errprint0(
"Configuration error: Branch table has >= 4 gigabytes of data"));
bp_table=(char*)calloc(1<<(bp_a+bp_b+bp_c),sizeof(char));
if (!bp_table) panic(errprint0("Can't allocate the branch table"));
}
l=(specnode*)calloc(lring_size,sizeof(specnode));
if (!l) panic(errprint0("Can't allocate local registers"));
j=bus_words;
if (Icache && Icache->bb>j) j=Icache->bb;
fetched=(octa*)calloc(j,sizeof(octa));
if (!fetched) panic(errprint0("Can't allocate prefetch buffer"));
dispatch_stat=(int*)calloc(dispatch_max+1,sizeof(int));
if (!dispatch_stat) panic(errprint0("Can't allocate dispatch counts"));
no_hardware_PT=1-hardware_PT;
 
@* Putting it all together. Here then is the desired configuration
subroutine.
 
@c
#include <stdio.h> /* |fopen|, |fgets|, |fprintf|, |rewind| */
#include <stdlib.h> /* |calloc|, |exit| */
#include <ctype.h> /* |isspace| */
#include <string.h> /* |strcpy|, |strlen|, |strcmp| */
#include <limits.h> /* |INT_MAX| */
#include "mmix-pipe.h"
@<Type definitions@>@;
@<Global variables@>@;
@<Subroutines@>@;
void MMIX_config(filename)
char *filename;
{@+register int i,j,n;
config_file=fopen(filename,"r");
if (!config_file)
panic(errprint1("Can't open configuration file %s",filename));
@.Can't open...@>
@<Initialize to defaults@>;
@<Count and allocate the functional units@>;
@<Record all the specs@>;
@<Allocate coroutines in each functional unit@>;
@<Allocate the caches@>;
@<Allocate the scheduling queue@>;
@<Touch up...@>;
}
 
@*Index.
/primessf.mms
0,0 → 1,66
% Example program ... Table of primes (using short floats)
L IS 500 The number of primes to find
t IS $255 Temporary storage
fn GREG
q GREG
r GREG
jj GREG
kk GREG
pk GREG
mm IS kk
 
LOC Data_Segment
PRIME1 TETRA #40000000
LOC PRIME1+4*L
ptop GREG @
j0 GREG PRIME1+4-@
BUF OCTA
 
LOC #100
Main FLOT fn,3
SET jj,j0
2H STSF fn,ptop,jj
INCL jj,4
3H BZ jj,2F
0H GREG #4000000000000000
4H FADD fn,fn,0B
5H SET kk,j0
sqrtn GREG 0
FSQRT sqrtn,fn
6H LDSF pk,ptop,kk
FREM r,fn,pk
BZ r,4B
7H FCMP t,pk,sqrtn
BNN t,2B
8H INCL kk,4
JMP 6B
GREG @
Title BYTE "First Five Hundred Primes"
NewLn BYTE #a,0
Blanks BYTE " ",0
2H LDA t,Title
TRAP 0,Fputs,StdOut
NEG mm,4
3H ADD mm,mm,j0
LDA t,Blanks
TRAP 0,Fputs,StdOut
2H LDSF pk,ptop,mm
FIX pk,pk
0H GREG #2030303030000000
STOU 0B,BUF
LDA t,BUF+4
1H DIV pk,pk,10
GET r,rR
INCL r,'0'
STBU r,t,0
SUB t,t,1
PBNZ pk,1B
LDA t,BUF
TRAP 0,Fputs,StdOut
INCL mm,4*L/10
PBN mm,2B
LDA t,NewLn
TRAP 0,Fputs,StdOut
CMP t,mm,4*(L/10-1)
PBNZ t,3B
TRAP 0,Halt,0
/iotest1.mms
0,0 → 1,36
* TESTING I/O (besides what was tested by the copy program)
* (intended for online test)
 
t IS $255
Buf IS Data_Segment+2
LOC Buf+9*2
Arg0 OCTA Buf,9
Arg1 OCTA Filename,BinaryReadWrite
LOC @+1
Filename BYTE "iotest.tmp",0
GREG Buf
LOC #200
Main LDA t,Arg0
TRAP 0,Fgets,StdIn Fgets(StdIn,Buf,9)
LDA t,Buf
TRAP 0,Fputs,StdOut Fputs(StdOut,Buf)
LDA t,Arg0
TRAP 0,Fgetws,StdIn Fgetws(StdIn,Buf,9)
LDA t,Buf
TRAP 0,Fputws,StdOut Fputws(StdOut,Buf)
TRAP 0,Fclose,StdIn Fclose(StdIn)
TRAP 0,Fclose,StdIn Fclose(StdIn)
LDA t,Arg1
TRAP 0,Fopen,StdIn Fopen(StdIn,"iotest.tmp",BinaryReadWrite)
NEG t,1
TRAP 0,Fseek,StdIn Fseek(StdIn,-1)
TRAP 0,Ftell,StdIn Ftell(StdIn)
LDA t,Buf
TRAP 0,Fputws,StdIn Fputws(StdIn,Buf)
SET t,2
TRAP 0,Fseek,StdIn Fseek(StdIn,2)
LDA t,Arg0
TRAP 0,Fgets,StdIn Fgets(StdIn,Buf,9)
TRAP 0,Halt,0
 
/iotest2.mms
0,0 → 1,98
* Additional IO test for the simulated simulator
* (Change "Chunk" to 8 in sim.mms to make the acid test!)
 
t IS $255
h IS 3
 
LOC Data_Segment
* initial value final value
A OCTA #1111111111111111 #0011000011610a00
OCTA #2222222222222222 #222222222262630a
OCTA #3333333333333333 #0033333333646566
OCTA #4444444444444444 #0a00444444313233
OCTA #5555555555555555 #343536373839410a
OCTA #6666666666666666 #0066666666313233
OCTA #7777777777777777 #3435363738394142
OCTA #8888888888888888 #00888888000a0000
OCTA #9999999999999999 #999999990a0a000a
OCTA #1111111111111111 #0000111178787979
OCTA #2222222222222222 #000a000031313232
OCTA #3333333333333333 #333334343535000a
OCTA #4444444444444444 #0000444431313232
OCTA #5555555555555555 #3333343435353636
OCTA #6666666666666666 #0000666666707100
OCTA #7777777777777777 #7777777777777777
OCTA #8888888888888888 #8888888888888888
OCTA #9999999999999999 #9999999999999999
GREG @
GREG @+256
Dat BYTE "xa",#a,"bc",#a,"def",#a,"123456789A",#a,"123456789AB"
BYTE 0,#a,#a,#a,0,#a,"xxyy",0,#a,"1122334455",0,#a
BYTE "112233445566pq",0,0
IOscr BYTE "ioscr.tmp",0
Arg0 OCTA IOscr,BinaryReadWrite
Arg1 OCTA A,0
Arg2 OCTA A,1
Arg3 OCTA A+3,1
Arg4 OCTA A+5,12
Arg5 OCTA A+13,12
Arg6 OCTA A+21,12
Arg7 OCTA A+29,12
Arg8 OCTA A+45,12
Arg9 OCTA A+61,7
Arg10 OCTA A+69,7
Arg11 OCTA A+77,7
Arg12 OCTA A+85,7
Arg13 OCTA A+101,7
Arg14 OCTA A+117,7
Arg15 OCTA A,8*18
 
LOC #100
Main TRAP 0,Fclose,h
LDA t,Arg0
TRAP 0,Fopen,h
LDA t,Dat
TRAP 0,Fputws,h
TRAP 0,Ftell,h
SET t,1000
TRAP 0,Fseek,h
TRAP 0,Ftell,h
SET t,1
TRAP 0,Fseek,h
TRAP 0,Ftell,h
LDA t,Arg1
TRAP 0,Fgets,h
LDA t,Arg2
TRAP 0,Fgets,h
LDA t,Arg3
TRAP 0,Fgetws,h
LDA t,Arg4
TRAP 0,Fgets,h
LDA t,Arg5
TRAP 0,Fgets,h
LDA t,Arg6
TRAP 0,Fgets,h
LDA t,Arg7
TRAP 0,Fgets,h
LDA t,Arg8
TRAP 0,Fgets,h
LDA t,Arg9
TRAP 0,Fgetws,h
LDA t,Arg10
TRAP 0,Fgetws,h
LDA t,Arg11
TRAP 0,Fgetws,h
LDA t,Arg12
TRAP 0,Fgetws,h
LDA t,Arg13
TRAP 0,Fgetws,h
NEG t,3
TRAP 0,Fseek,h
TRAP 0,Ftell,h
LDA t,Arg14
TRAP 0,Fgets,h
SET t,0
TRAP 0,Fseek,h
LDA t,Arg15
TRAP 0,Fwrite,h
TRAP 0,Halt,0
/primesx.mmconfig
0,0 → 1,44
% configuration for primes test -- still in preparation
memaddresstime 3
memreadtime 10 memwritetime 10
membusbytes 16
%branchpredictbits 2
%branchaddressbits 1
%branchhistorybits 1
%branchdualbits 1
memchunksmax 4
hashprime 5
Scache blocksize 64
Scache setsize 512
Scache associativity 4 pseudolru
Scache accesstime 2
Dcache blocksize 32
Dcache setsize 256
Icache blocksize 32
Icache setsize 256
Icache victimsize 2
unit ALU1 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU2 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU3 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU4 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit LSU1 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit LSU2 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit LSU3 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit MUL1 000080f000000000000000000000000000000000000000000000000000000000
unit DIV1 00000c0f00000000000000000000000000000000000000000000000000000000
unit FPU1 7fff730000000000000000000000000000000000000000000000000000000000
dispatchmax 3
commitmax 3
fetchmax 4
memslots 4
renameregs 10
reorderbuffer 20
Dcache writeallocate 1
Scache writeallocate 1
Dcache writeback 1
Scache writeback 1
Dcache ports 2
DTcache ports 2
writebuffer 4
writeholdingtime 5
div 10 10 10 10 10 10
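Each `unit` line above carries 64 hexadecimal digits, that is, 256 bits, one per opcode. The sketch below decodes such a mask, assuming the leftmost bit stands for opcode #00 and the rightmost for opcode #ff. That assumption fits the masks shown: `MUL1` then covers #10 and #18 to #1b, and `DIV1` covers #14, #15 and #1c to #1f. The function name `unit_opcodes` is hypothetical and is not part of the simulator.

```python
def unit_opcodes(mask_hex):
    """Return the opcodes whose bit is set in a 64-hex-digit unit mask.

    Assumption: the most significant bit corresponds to opcode 0x00.
    """
    if len(mask_hex) != 64:
        raise ValueError("a unit mask needs exactly 64 hex digits")
    value = int(mask_hex, 16)
    return [op for op in range(256) if (value >> (255 - op)) & 1]

# The MUL1 mask from the configuration, written as its prefix plus zero padding.
mul1 = "000080f0" + "0" * 56
print([hex(op) for op in unit_opcodes(mul1)])
```

Decoding every unit this way gives a quick check of whether each opcode is handled by at least one functional unit.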
/silly.mms
0,0 → 1,227
* A program that exercises all MMIX operations (more or less)
small GREG #abc
neg_zero GREG #8000000000000000
half GREG #3fe0000000000000
inf GREG #7ff0000000000000
sig_nan GREG #7ff1000000000000
round_off GREG ROUND_OFF<<16
round_up GREG ROUND_UP<<16
round_down GREG ROUND_DOWN<<16
addy GREG #7f6001b4c67bc809
addz GREG #ff5ffb6a4534a3f7
flip GREG #0102040810204080
ry GREG
rz GREG
LOC Data_Segment
GREG @
Start_Inst SUB $4,half,$1
Final_Inst SRU $4,half,1
Load_Test OCTA #8081828384858687
OCTA #88898a8b8c8d8e8f
Jmp_Pop JMP @+8
POP
Load_Begin TETRA #5f030405
Load_End LDUNC $3,$4,5
Big_Begin GO $40,ry,5
Big_End ANDNL $40,(ry-$0)<<8+5
 
LOC #100
Main FCMP $0,neg_zero,$5
FCMP $1,neg_zero,inf
FCMP $2,inf,sig_nan
FUN $3,sig_nan,sig_nan
FEQL $4,$4,neg_zero
FADD $5,half,inf
FADD $6,half,neg_zero
FADD $7,half,half
FADD $8,half,sig_nan
FSUB $9,half,small
PUT rA,round_off
FSUB $9,half,small
FSUB $9,small,half
FSQRT $10,$9
FSUB $11,sig_nan,$10
PUT rA,round_down
FSUB $12,half,half
FSUB $12,$20,$21
FSUB $12,$20,neg_zero
PUT rA,round_up
SUB $0,inf,1 % $0 = largest normal number
FADD $12,$0,small
FIX $12,half
FIXU $14,ROUND_DOWN,$9
FLOT $15,ROUND_DOWN,addy
FLOT $16,ROUND_UP,addy
NEG $1,1 % $1 = -1
FLOT $17,1
FLOT $17,$1
FLOTU $18,255
FLOTU $18,neg_zero
FIX $13,ROUND_NEAR,$18
SFLOT $18,ROUND_DOWN,addy
SFLOT $19,ROUND_UP,addy
FSUB $20,$18,$19
FSUB $20,$16,$15
SFLOT $20,1
SFLOT $20,$1
SFLOTU $21,$1
SFLOTU $21,255
FMUL $22,neg_zero,inf
FMUL $22,half,half
FMUL $23,small,$0
PUT rE,half
FCMPE $24,half,$21
FCMPE $24,neg_zero,small
FCMPE $24,neg_zero,half
FCMPE $24,half,inf
FEQLE $24,$15,$16
PUT rE,neg_zero
FEQLE $24,half,half
FUNE $24,half,half
FSQRT $25,ROUND_UP,$0
FDIV $26,$0,$25
PUT rA,$50
FDIV $26,$0,$25
FMUL $27,$25,$25
FREM $28,$9,half
FREM $29,$9,small
FINT $30,$9
FINT $30,ROUND_UP,small
MUL $31,flip,flip
MUL $32,flip,$1
MUL $33,flip,2
DIV $32,$32,$1
DIV $32,neg_zero,$1
MULU $32,flip,$1
MULU $31,flip,flip
GET $33,rH
PUT rD,$33
DIV $33,$1,3
DIVU $34,$31,flip
ADD $35,addy,addz
FADD $36,addy,addz
CMP $37,$36,$35
GETA $3,1F
PUT rW,$3
LDT $6,Start_Inst
LDTU $7,Final_Inst
1H CMP $5,$6,$7
BNN $5,1F
INCML $6,#100 % increase the opcode
PUT rX,$6 % ropcode 0
RESUME % return to 1B
1H BN $0,@+4*6
PBN $0,@-4*1
BNN $0,@+4*6
PBN $0,@+4*5
PBNN $0,@+4*5
BN $0,@-4*3
BNN $0,@-4*3
PBN $0,@-4*3
PBNN $0,@-4*3
BZ $0,@+4*6
PBZ $0,@-4*1
BNZ $0,@+4*6
PBZ $0,@+4*5
PBNZ $0,@+4*5
BZ $0,@-4*3
BNZ $0,@-4*3
PBZ $0,@-4*3
PBNZ $0,@-4*3
BP $0,@+4*6
PBP $0,@-4*1
BNP $0,@+4*6
PBP $0,@+4*5
PBNP $0,@+4*5
BP $0,@-4*3
BNP $0,@-4*3
PBP $0,@-4*3
PBNP $0,@-4*3
BOD $0,@+4*6
PBOD $0,@-4*1
BEV $0,@+4*6
PBOD $0,@+4*5
PBEV $0,@+4*5
BOD $0,@-4*3
BEV $0,@-4*3
PBOD $0,@-4*3
PBEV $0,@-4*3
LDA $4,Load_Test+4
GETA $3,1F
PUT rW,$3
LDTU $7,Load_End
LDTU $6,Load_Begin
1H CMPU $8,$6,$7
BNN $8,1F
INCML $6,#100 % increase the opcode
PUT rX,$6
RESUME % return to 1B
2H OCTA #fedcba9876543210 % becomes Jmp_Pop
OCTA #ffeeddccbbaa9988 % becomes Jmp_Pop
NEG ry,addy
SET rz,flip
PUT rM,addz
POP
1H GETA $4,2B
SETL $7,4*11
GO $7,$7,$4
GO $7,$4,4*12
PRELD 70,$4,$4
PRELD 70,$4,0
PREGO 70,$4,$4
PREGO 70,$4,0
CSWAP $3,Load_Test+13
GETA $3,1F
PUT rW,$3
SETL rz,1
ADD ry,$4,4
LDOU $40,Jmp_Pop
LDTU $7,Big_End
LDTU $6,Big_Begin
1H CMPU $8,$6,$7
BNN $8,1F
INCML $6,#100 % increase the opcode
PUT rX,$6
SET $5,rz
RESUME % return to 1B
1H SL $40,small,51
SL $40,small,52
SAVE $255,0
PUT rG,small-$0
INCL small-1,U_BIT<<8
FADD $100,small,$200
PUT rA,small-1 % enable underflow trip
TRIP 1,$100,small
FSUB $100,small,$200 % cause underflow trip
PUT rL,10
PUT rL,small
PUSHJ 11,@+4
UNSAVE $255
TRAP 0,Halt,0 % normal exit
 
LOC U_Handler
PUSHJ $255,Handler
3H TRAP 0,$1
SUB $0,$1,1
POP 2,0
4H GET $50,rX
INCH $50,#8100 % ropcode 1
FLOT $60,1
PUT rZ,$60
JMP 2F
 
LOC 0
GET $50,rX
INCH $50,#8200 % ropcode 2
INCMH $50,#ff00-(U_BIT<<8)
TRAP 1
2H PUT rX,$50
GET $255,rB
RESUME
Handler SETL $5,#abcd
GET $1,rJ
PUSHJ 3,3B
SUB $10,$3,$4
PUT rJ,$1
POP 11,(4B-3B)>>2
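The loops marked "increase the opcode" in the listing above walk through a range of instructions by adding #100 to the medium-low wyde of an instruction tetra (INCML), which raises the opcode byte by one, and then executing the result through rX and RESUME. A small Python sketch of the sequence this produces for the first loop follows. The two constants are the encodings of `Start_Inst` and `Final_Inst` as they appear in the simulator trace, and the function name `stepped_instructions` is hypothetical.

```python
def stepped_instructions(start, final):
    """List the tetras executed by the CMP / BNN / INCML / PUT rX / RESUME loop."""
    tetra = start
    executed = []
    while tetra < final:          # CMP $5,$6,$7 ; BNN $5,1F leaves the loop
        tetra += 0x100 << 16      # INCML $6,#100 adds #1000000: opcode + 1
        executed.append(tetra)    # PUT rX,$6 ; RESUME runs this instruction
    return executed

START_INST = 0x2404FC01   # SUB $4,half,$1
FINAL_INST = 0x3F04FC01   # SRU $4,half,1 (immediate form)
for t in stepped_instructions(START_INST, FINAL_INST):
    print(f"{t:08x}")
```

The start instruction itself is never executed by the loop. The first tetra run is the one with the next opcode, and the last is `Final_Inst`.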
 
/coolcomb.mms
0,0 → 1,27
* The "cool-lex" combinations of Ruskey and Williams, ex 7.2.1.3--55(b)
s       IS      4       % the number of 0-bits in each combination
t       IS      3       % the number of 1-bits in each combination; s+t<=8 here
bits    GREG    0
ptr     GREG    0
        LOC     #100
Main    LDA     ptr,Data_Segment  % assemble this with the -x switch!
        SET     bits,(1<<t)-1
1H      PUSHJ   $0,Visit
        ADDU    $0,bits,1
        AND     $0,$0,bits
        SUBU    $1,$0,1
        XOR     $1,$1,$0
        ADDU    $0,$1,1
        AND     $1,$1,bits
        AND     $0,$0,bits
        ODIF    $0,$0,1
        SUBU    $1,$1,$0
        ADDU    bits,bits,$1
        SRU     $0,bits,s+t
        PBZ     $0,1B
        TRAP    0,Halt,0          % simulate this with the -I switch!
Visit   STBU    bits,ptr,0
        INCL    ptr,1
        POP     0,0
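The bit manipulation in the loop above can be followed more easily in a higher-level form. The Python sketch below transcribes the instructions one by one, with 64-bit wraparound made explicit and ODIF written as a saturating subtraction. The name `cool_lex` is hypothetical, and the sketch returns a list where the program stores each combination as a byte.

```python
MASK64 = (1 << 64) - 1   # MMIX registers are 64 bits wide

def cool_lex(s, t):
    """Visit all (s+t)-bit words with t one-bits, in the order of the loop above."""
    bits = (1 << t) - 1                  # SET bits,(1<<t)-1
    visited = []
    while bits >> (s + t) == 0:          # SRU $0,bits,s+t ; PBZ $0,1B
        visited.append(bits)             # PUSHJ $0,Visit
        a = (bits + 1) & bits            # ADDU, AND
        b = ((a - 1) & MASK64) ^ a       # SUBU, XOR
        a = (b + 1) & MASK64             # ADDU
        b &= bits                        # AND $1,$1,bits
        a &= bits                        # AND $0,$0,bits
        a = max(a - 1, 0)                # ODIF $0,$0,1 (saturating difference)
        bits = (bits + b - a) & MASK64   # SUBU, ADDU
    return visited

for word in cool_lex(2, 2):
    print(format(word, "04b"))
```

For the values in the listing, s = 4 and t = 3, the loop visits each of the 35 seven-bit words with three one-bits exactly once.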
 
/valid.mms
0,0 → 1,47
LOC 4
LDVTS $0,$0,0
LOC #100
Main SET $1,4
PUSHJ 0,InstTest
JMP Main
 
a IS #ffffffff % table entry when anything goes
b IS #ffff04ff % table entry when Y <= ROUND_NEAR
c IS #001f00ff % table entry for PUT and PUTI
d IS #ff000000 % table entry for RESUME
e IS #ffff0000 % table entry for SAVE
f IS #ff0000ff % table entry for UNSAVE
g IS #ff000003 % table entry for SYNC
h IS #ffff001f % table entry for GET
table GREG @
TETRA a,a,a,a,a,b,a,b,b,b,b,b,b,b,b,b % 0x
TETRA a,a,a,a,a,b,a,b,a,a,a,a,a,a,a,a % 1x
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 2x
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 3x
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 4x
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 5x
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 6x
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 7x
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 8x
TETRA a,a,a,a,a,a,a,a,0,0,a,a,a,a,a,a % 9x
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Ax
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Bx
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Cx
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Dx
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Ex
TETRA a,a,a,a,a,a,c,c,a,d,e,f,g,a,h,a % Fx
tetra IS $1
maxXYZ IS $2
InstTest BN $0,9F
LDTU tetra,$0,0
SR $0,tetra,22
LDT maxXYZ,table,$0
BDIF $0,tetra,maxXYZ
PBNP maxXYZ,9F
ANDNML $0,#ff00
BNZ $0,9F
MOR tetra,tetra,#4
CMP $0,tetra,18
CSP tetra,$0,0
ODIF $0,tetra,7
9H POP 1,0
/halves.mmix
0,0 → 1,39
000000001000: Hand-assembled halves program
c1f8fa0033f60000 1000: Main OR p,pbase,0; SETL carry,0
f000000420f5f5f6 1008: JMP 1F;Loop ADD acc,acc,carry
77f6f705a1f5f800 1010: ZSOD carry,starp,5; STB acc,p,0
81f7f80137f80001 1018: 1H LDB starp,p,1; INCL p,1
80f5f9f75bf7fffa 1020: LDB acc,half,starp; PBNZ starp,Loop
a1f5f800f1fffff5 1028: STB acc,p,0; JMP Main
100000000000: must preload this address into g249 (f9)
3500000000000000 HALF: BYTE '5'
100000000030:
3030313132323333 BYTE "00112233"
3434310000000000 BYTE "441",0
200000000000: bottom of stack
0000000000000000 20...000: rL
0000000000001000 20...008: f4
0000000000000000 20...010: f5
0000000000000000 20...018: f6
0000000000000000 20...020: f7
0000000000000000 20...028: f8
0000100000000000 20...030: f9
0000100000000039 20...038: fa
0000000000000000 20...040: fb
0000000000000000 20...048: fc
0000000000000000 20...050: fd
0000000000000000 20...058: fe
0000000000000000 20...060: ff
0000000000000000 20...068: rB
0000000000000000 20...070: rD
0000000000000000 20...078: rE
0000000000000000 20...080: rH
0000000000000000 20...088: rJ
0000000000000000 20...090: rM
0000000000000000 20...098: rP
0000000000000000 20...0a0: rR
0000000000000000 20...0a8: rW
0000000000000000 20...0b0: rX
0000000000000000 20...0b8: rY
0000000000000000 20...0c0: rZ
f400000000000000 20...0c8: rG and rA
/primes.mmconfig
0,0 → 1,41
% configuration for primes test -- still in preparation
memaddresstime 3
memreadtime 10 memwritetime 10
membusbytes 16
%branchpredictbits 2
%branchaddressbits 1
%branchhistorybits 1
%branchdualbits 1
memchunksmax 4
hashprime 5
Scache blocksize 64
Scache setsize 512
Scache associativity 4 pseudolru
Scache accesstime 2
Dcache blocksize 32
Dcache setsize 256
Icache blocksize 32
Icache setsize 256
Icache victimsize 2
unit ALU1 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit ALU2 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
unit LSU1 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit LSU2 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit LSU3 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit MUL1 000080f000000000000000000000000000000000000000000000000000000000
unit DIV1 00000c0f00000000000000000000000000000000000000000000000000000000
unit FPU1 7fff730000000000000000000000000000000000000000000000000000000000
%dispatchmax 3
%commitmax 3
%fetchmax 4
memslots 4
renameregs 10
reorderbuffer 20
Dcache writeallocate 1
Scache writeallocate 1
Dcache writeback 1
Scache writeback 1
Dcache ports 2
DTcache ports 2
writebuffer 4
writeholdingtime 5
/test2.mmconfig
0,0 → 1,35
% configuration for test2.mmix
memaddresstime 3
memreadtime 5 memwritetime 4 % don't ask why
membusbytes 16
writeholdingtime 5
branchpredictbits 2
branchaddressbits 1
branchhistorybits 1
branchdualbits 1
memchunksmax 4
hashprime 5
Scache blocksize 64
Scache setsize 2
Scache associativity 4 pseudolru
Scache accesstime 2
Dcache blocksize 32
Dcache setsize 8
Icache setsize 2
Icache victimsize 2
DTcache associativity 4 pseudolru
ITcache associativity 2
unit UNI1 fffffeffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
unit UNI2 FFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
dispatchmax 3
commitmax 3
fetchmax 4
memslots 4
Dcache writeallocate 1
Scache writeallocate 1
Dcache writeback 1
Scache writeback 1
%Dcache ports 2
Icache blocksize 32
%hardwarepagetable 0 % for this version, start at 8000000000000070
%writebuffer 7
/silly.out
0,0 → 1,1679
mmix> i silly.run
GOOD LUCK, DEAR SIMULATOR
(rG,rA)=M8[#60000000000000f0]=#f100000000000000
rS-=8, rZ=M8[#60000000000000e8]=#0000000000000000
rS-=8, rY=M8[#60000000000000e0]=#0000000000000000
rS-=8, rX=M8[#60000000000000d8]=#0000000000000000
rS-=8, rW=M8[#60000000000000d0]=#0000000000000000
rS-=8, rP=M8[#60000000000000c8]=#0000000000000000
rS-=8, rR=M8[#60000000000000c0]=#0000000000000000
rS-=8, rM=M8[#60000000000000b8]=#0000000000000000
rS-=8, rJ=M8[#60000000000000b0]=#0000000000000000
rS-=8, rH=M8[#60000000000000a8]=#0000000000000000
rS-=8, rE=M8[#60000000000000a0]=#0000000000000000
rS-=8, rD=M8[#6000000000000098]=#0000000000000000
rS-=8, rB=M8[#6000000000000090]=#0000000000000000
rS-=8, g[255]=M8[#6000000000000088]=#0000000000000100
rS-=8, g[254]=M8[#6000000000000080]=#0000000000000abc
rS-=8, g[253]=M8[#6000000000000078]=#8000000000000000
rS-=8, g[252]=M8[#6000000000000070]=#3fe0000000000000
rS-=8, g[251]=M8[#6000000000000068]=#7ff0000000000000
rS-=8, g[250]=M8[#6000000000000060]=#7ff1000000000000
rS-=8, g[249]=M8[#6000000000000058]=#0000000000010000
rS-=8, g[248]=M8[#6000000000000050]=#0000000000020000
rS-=8, g[247]=M8[#6000000000000048]=#0000000000030000
rS-=8, g[246]=M8[#6000000000000040]=#7f6001b4c67bc809
rS-=8, g[245]=M8[#6000000000000038]=#ff5ffb6a4534a3f7
rS-=8, g[244]=M8[#6000000000000030]=#0102040810204080
rS-=8, g[243]=M8[#6000000000000028]=#0000000000000000
rS-=8, g[242]=M8[#6000000000000020]=#0000000000000000
rS-=8, g[241]=M8[#6000000000000018]=#2000000000000000
rS-=8, l[2]=M8[#6000000000000010]=#0000000000000002
rS-=8, l[1]=M8[#6000000000000008]=#4000000000000008
rS-=8, l[0]=M8[#6000000000000000]=#0000000000000001
(00000000000000fc: fb0000ff (UNSAVE)) #60000000000000f0: rG=241, ..., rL=2
"silly.mms"
line 29: Main FCMP $0,neg_zero,$5
1. 0000000000000100: 0100fd05 (FCMP) $0=l[0] = -0. cmp 0. = 0
line 30: FCMP $1,neg_zero,inf
1. 0000000000000104: 0101fdfb (FCMP) $1=l[1] = -0. cmp Inf = -1
line 31: FCMP $2,inf,sig_nan
1. 0000000000000108: 0102fbfa (FCMP) rL=3, $2=l[2] = Inf cmp NaN.0625 = 0, rA=#00010
line 32: FUN $3,sig_nan,sig_nan
1. 000000000000010c: 0203fafa (FUN) rL=4, $3=l[3] = [NaN.0625(||)NaN.0625] = 1
line 33: FEQL $4,$4,neg_zero
1. 0000000000000110: 030404fd (FEQL) rL=5, $4=l[4] = [0.(==)-0.] = 1
line 34: FADD $5,half,inf
1. 0000000000000114: 0405fcfb (FADD) rL=6, $5=l[5] = .5 (+) Inf = Inf
line 35: FADD $6,half,neg_zero
1. 0000000000000118: 0406fcfd (FADD) rL=7, $6=l[6] = .5 (+) -0. = .5
line 36: FADD $7,half,half
1. 000000000000011c: 0407fcfc (FADD) rL=8, $7=l[7] = .5 (+) .5 = 1.
line 37: FADD $8,half,sig_nan
1. 0000000000000120: 0408fcfa (FADD) rL=9, $8=l[8] = .5 (+) NaN.0625 = NaN.5625, rA=#00010
line 38: FSUB $9,half,small
1. 0000000000000124: 0609fcfe (FSUB) rL=10, $9=l[9] = .5 (-) 1.3577e-320 = .5, rA=#00011
line 39: PUT rA,round_off
1. 0000000000000128: f61500f9 (PUT) rA = 65536 = #10000
line 40: FSUB $9,half,small
1. 000000000000012c: 0609fcfe (FSUB) $9=l[9] = .5 [-] 1.3577e-320 = .49999999999999994, rA=#10001
line 41: FSUB $9,small,half
1. 0000000000000130: 0609fefc (FSUB) $9=l[9] = 1.3577e-320 [-] .5 = -.49999999999999994, rA=#10001
line 42: FSQRT $10,$9
1. 0000000000000134: 150a0009 (FSQRT) rL=11, $10=l[10] = [sqrt] -.49999999999999994 = -NaN, rA=#10011
line 43: FSUB $11,sig_nan,$10
1. 0000000000000138: 060bfa0a (FSUB) rL=12, $11=l[11] = NaN.0625 [-] -NaN = -NaN, rA=#10011
line 44: PUT rA,round_down
1. 000000000000013c: f61500f7 (PUT) rA = 196608 = #30000
line 45: FSUB $12,half,half
1. 0000000000000140: 060cfcfc (FSUB) rL=13, $12=l[12] = .5 _-_ .5 = -0.
line 46: FSUB $12,$20,$21
1. 0000000000000144: 060c1415 (FSUB) $12=l[12] = 0. _-_ 0. = -0.
line 47: FSUB $12,$20,neg_zero
1. 0000000000000148: 060c14fd (FSUB) $12=l[12] = 0. _-_ -0. = 0.
line 48: PUT rA,round_up
1. 000000000000014c: f61500f8 (PUT) rA = 131072 = #20000
line 49: SUB $0,inf,1 % $0 = largest normal number
1. 0000000000000150: 2500fb01 (SUBI) $0=l[0] = 9218868437227405312 - 1 = 9218868437227405311
line 50: FADD $12,$0,small
1. 0000000000000154: 040c00fe (FADD) $12=l[12] = 1.7976931348623157e308 ^+^ 1.3577e-320 = Inf, rA=#20009
line 51: FIX $12,half
1. 0000000000000158: 050c00fc (FIX) $12=l[12] = ^fix^ .5 = 1
line 52: FIXU $14,ROUND_DOWN,$9
1. 000000000000015c: 070e0309 (FIXU) rL=15, $14=l[14] = _fix_ -.49999999999999994 = #ffffffffffffffff
line 53: FLOT $15,ROUND_DOWN,addy
1. 0000000000000160: 080f03f6 (FLOT) rL=16, $15=l[15] = _flot_ 9178337916516812809 = 9.178337916516813e18, rA=#20009
line 54: FLOT $16,ROUND_UP,addy
1. 0000000000000164: 081002f6 (FLOT) rL=17, $16=l[16] = ^flot^ 9178337916516812809 = 9.178337916516814e18, rA=#20009
line 55: NEG $1,1 % $1 = -1
1. 0000000000000168: 35010001 (NEGI) $1=l[1] = 0 - 1 = -1
line 56: FLOT $17,1
1. 000000000000016c: 09110001 (FLOTI) rL=18, $17=l[17] = ^flot^ 1 = 1.
line 57: FLOT $17,$1
1. 0000000000000170: 08110001 (FLOT) $17=l[17] = ^flot^ -1 = -1.
line 58: FLOTU $18,255
1. 0000000000000174: 0b1200ff (FLOTUI) rL=19, $18=l[18] = ^flot^ 255 = 255.
line 59: FLOTU $18,neg_zero
1. 0000000000000178: 0a1200fd (FLOTU) $18=l[18] = ^flot^ #8000000000000000 = 9.223372036854776e18
line 60: FIX $13,ROUND_NEAR,$18
1. 000000000000017c: 050d0412 (FIX) $13=l[13] = (fix) 9.223372036854776e18 = -9223372036854775808, rA=#20029
line 61: SFLOT $18,ROUND_DOWN,addy
1. 0000000000000180: 0c1203f6 (SFLOT) $18=l[18] = _sflot_ 9178337916516812809 = 9.178337689848513e18, rA=#20029
line 62: SFLOT $19,ROUND_UP,addy
1. 0000000000000184: 0c1302f6 (SFLOT) rL=20, $19=l[19] = ^sflot^ 9178337916516812809 = 9.178338239604326e18, rA=#20029
line 63: FSUB $20,$18,$19
1. 0000000000000188: 06141213 (FSUB) rL=21, $20=l[20] = 9.178337689848513e18 ^-^ 9.178338239604326e18 = -549755813888.
line 64: FSUB $20,$16,$15
1. 000000000000018c: 0614100f (FSUB) $20=l[20] = 9.178337916516814e18 ^-^ 9.178337916516813e18 = 1024.
line 65: SFLOT $20,1
1. 0000000000000190: 0d140001 (SFLOTI) $20=l[20] = ^sflot^ 1 = 1.
line 66: SFLOT $20,$1
1. 0000000000000194: 0c140001 (SFLOT) $20=l[20] = ^sflot^ -1 = -1.
line 67: SFLOTU $21,$1
1. 0000000000000198: 0e150001 (SFLOTU) rL=22, $21=l[21] = ^sflot^ #ffffffffffffffff = 1.8446744073709552e19, rA=#20029
line 68: SFLOTU $21,255
1. 000000000000019c: 0f1500ff (SFLOTUI) $21=l[21] = ^sflot^ 255 = 255.
line 69: FMUL $22,neg_zero,inf
1. 00000000000001a0: 1016fdfb (FMUL) rL=23, $22=l[22] = -0. ^*^ Inf = -NaN, rA=#20039
line 70: FMUL $22,half,half
1. 00000000000001a4: 1016fcfc (FMUL) $22=l[22] = .5 ^*^ .5 = .25
line 71: FMUL $23,small,$0
1. 00000000000001a8: 1017fe00 (FMUL) rL=24, $23=l[23] = 1.3577e-320 ^*^ 1.7976931348623157e308 = 2.440714297335944e-12, rA=#20039
line 72: PUT rE,half
1. 00000000000001ac: f60200fc (PUT) rE = 4602678819172646912 = #3fe0000000000000
line 73: FCMPE $24,half,$21
1. 00000000000001b0: 1118fc15 (FCMPE) rL=25, $24=l[24] = .5 cmp 255. (.5)) = -1
line 74: FCMPE $24,neg_zero,small
1. 00000000000001b4: 1118fdfe (FCMPE) $24=l[24] = -0. cmp 1.3577e-320 (.5)) = 0
line 75: FCMPE $24,neg_zero,half
1. 00000000000001b8: 1118fdfc (FCMPE) $24=l[24] = -0. cmp .5 (.5)) = 0
line 76: FCMPE $24,half,inf
1. 00000000000001bc: 1118fcfb (FCMPE) $24=l[24] = .5 cmp Inf (.5)) = -1
line 77: FEQLE $24,$15,$16
1. 00000000000001c0: 13180f10 (FEQLE) $24=l[24] = [9.178337916516813e18(==)9.178337916516814e18 (.5)] = 1
line 78: PUT rE,neg_zero
1. 00000000000001c4: f60200fd (PUT) rE = -9223372036854775808 = #8000000000000000
line 79: FEQLE $24,half,half
1. 00000000000001c8: 1318fcfc (FEQLE) $24=l[24] = [.5(==).5 (-0.)] = 0, rA=#20039
line 80: FUNE $24,half,half
1. 00000000000001cc: 1218fcfc (FUNE) $24=l[24] = [.5(||).5 (-0.)] = 1
line 81: FSQRT $25,ROUND_UP,$0
1. 00000000000001d0: 15190200 (FSQRT) rL=26, $25=l[25] = ^sqrt^ 1.7976931348623157e308 = 1.3407807929942597e154, rA=#20039
line 82: FDIV $26,$0,$25
1. 00000000000001d4: 141a0019 (FDIV) rL=27, $26=l[26] = 1.7976931348623157e308 ^/^ 1.3407807929942597e154 = 1.3407807929942595e154
line 83: PUT rA,$50
1. 00000000000001d8: f6150032 (PUT) rA = 0 = #0
line 84: FDIV $26,$0,$25
1. 00000000000001dc: 141a0019 (FDIV) $26=l[26] = 1.7976931348623157e308 (/) 1.3407807929942597e154 = 1.3407807929942595e154
line 85: FMUL $27,$25,$25
1. 00000000000001e0: 101b1919 (FMUL) rL=28, $27=l[27] = 1.3407807929942597e154 (*) 1.3407807929942597e154 = Inf, rA=#00009
line 86: FREM $28,$9,half
1. 00000000000001e4: 161c09fc (FREM) rL=29, $28=l[28] = -.49999999999999994 (rem) .5 = 5.551115123125783e-17
line 87: FREM $29,$9,small
1. 00000000000001e8: 161d09fe (FREM) rL=30, $29=l[29] = -.49999999999999994 (rem) 1.3577e-320 = -2.866e-321
line 88: FINT $30,$9
1. 00000000000001ec: 171e0009 (FINT) rL=31, $30=l[30] = (int) -.49999999999999994 = -0.
line 89: FINT $30,ROUND_UP,small
1. 00000000000001f0: 171e02fe (FINT) $30=l[30] = ^int^ 1.3577e-320 = 1.
line 90: MUL $31,flip,flip
1. 00000000000001f4: 181ff4f4 (MUL) rL=32, $31=l[31] = 72624976668147840 * 72624976668147840 = 507802986467049472, rA=#00049
line 91: MUL $32,flip,$1
1. 00000000000001f8: 1820f401 (MUL) rL=33, $32=l[32] = 72624976668147840 * -1 = -72624976668147840
line 92: MUL $33,flip,2
1. 00000000000001fc: 1921f402 (MULI) rL=34, $33=l[33] = 72624976668147840 * 2 = 145249953336295680
line 93: DIV $32,$32,$1
1. 0000000000000200: 1c202001 (DIV) $32=l[32] = -72624976668147840 / -1 = 72624976668147840, rR=0
line 94: DIV $32,neg_zero,$1
1. 0000000000000204: 1c20fd01 (DIV) $32=l[32] = -9223372036854775808 / -1 = -9223372036854775808, rR=0, rA=#00049
line 95: MULU $32,flip,$1
1. 0000000000000208: 1a20f401 (MULU) $32=l[32] = #102040810204080 * #ffffffffffffffff = #fefdfbf7efdfbf80, rH=#10204081020407f
line 96: MULU $31,flip,flip
1. 000000000000020c: 1a1ff4f4 (MULU) $31=l[31] = #102040810204080 * #102040810204080 = #70c142030404000, rH=#1040c2050c1c4
line 97: GET $33,rH
1. 0000000000000210: fe210003 (GET) $33=l[33] = rH = #1040c2050c1c4
line 98: PUT rD,$33
1. 0000000000000214: f6010021 (PUT) rD = 285925104992708 = #1040c2050c1c4
line 99: DIV $33,$1,3
1. 0000000000000218: 1d210103 (DIVI) $33=l[33] = -1 / 3 = -1, rR=2
line 100: DIVU $34,$31,flip
1. 000000000000021c: 1e221ff4 (DIVU) rL=35, $34=l[34] = #1040c2050c1c4070c142030404000 / #102040810204080 = #102040810204080, rR=#0
line 101: ADD $35,addy,addz
1. 0000000000000220: 2023f6f5 (ADD) rL=36, $35=l[35] = 9178337916516812809 + -45041037404232713 = 9133296879112580096
line 102: FADD $36,addy,addz
1. 0000000000000224: 0424f6f5 (FADD) rL=37, $36=l[36] = 3.51258193065761e305 (+) -3.5091543080293444e305 = 3.4276226282657902e302
line 103: CMP $37,$36,$35
1. 0000000000000228: 30252423 (CMP) rL=38, $37=l[37] = 9133296879112580096 cmp 9133296879112580096 = 0
line 104: GETA $3,1F
1. 000000000000022c: f4030004 (GETA) $3=l[3] = #23c
line 105: PUT rW,$3
1. 0000000000000230: f6180003 (PUT) rW = 572 = #23c
line 106: LDT $6,Start_Inst
1. 0000000000000234: 8906f100 (LDTI) $6=l[6] = M4[#2000000000000000] = 604306433
line 107: LDTU $7,Final_Inst
1. 0000000000000238: 8b07f104 (LDTUI) $7=l[7] = M4[#2000000000000000+4] = #3f04fc01
line 108: 1H CMP $5,$6,$7
1. 000000000000023c: 30050607 (CMP) $5=l[5] = 604306433 cmp 1057291265 = -1
line 109: BNN $5,1F
1. 0000000000000240: 48050004 (BNN) -1>=0? No
line 110: INCML $6,#100 % increase the opcode
1. 0000000000000244: e6060100 (INCML) $6=l[6] = #2404fc01 + #1000000 = #2504fc01
line 111: PUT rX,$6 % ropcode 0
1. 0000000000000248: f6190006 (PUT) rX = 621083649 = #2504fc01
line 112: RESUME % return to 1B
1. 000000000000024c: f9000000 (RESUME) {#2504fc01} -> #23c
(0000000000000238: 2504fc01 (SUBI)) $4=l[4] = 4602678819172646912 - 1 = 4602678819172646911
--------
line 108: 1H CMP $5,$6,$7
2. 000000000000023c: 30050607 (CMP) $5=l[5] = 621083649 cmp 1057291265 = -1
line 109: BNN $5,1F
2. 0000000000000240: 48050004 (BNN) -1>=0? No
line 110: INCML $6,#100 % increase the opcode
2. 0000000000000244: e6060100 (INCML) $6=l[6] = #2504fc01 + #1000000 = #2604fc01
line 111: PUT rX,$6 % ropcode 0
2. 0000000000000248: f6190006 (PUT) rX = 637860865 = #2604fc01
line 112: RESUME % return to 1B
2. 000000000000024c: f9000000 (RESUME) {#2604fc01} -> #23c
(0000000000000238: 2604fc01 (SUBU)) $4=l[4] = #3fe0000000000000 - #ffffffffffffffff = #3fe0000000000001
...............................................
line 112: RESUME % return to 1B
3. 000000000000024c: f9000000 (RESUME) {#2704fc01} -> #23c
(0000000000000238: 2704fc01 (SUBUI)) $4=l[4] = #3fe0000000000000 - 1 = #3fdfffffffffffff
...............................................
line 112: RESUME % return to 1B
4. 000000000000024c: f9000000 (RESUME) {#2804fc01} -> #23c
(0000000000000238: 2804fc01 (2ADDU)) $4=l[4] = #3fe0000000000000 <<1+ #ffffffffffffffff = #7fbfffffffffffff
...............................................
line 112: RESUME % return to 1B
5. 000000000000024c: f9000000 (RESUME) {#2904fc01} -> #23c
(0000000000000238: 2904fc01 (2ADDUI)) $4=l[4] = #3fe0000000000000 <<1+ 1 = #7fc0000000000001
...............................................
line 112: RESUME % return to 1B
6. 000000000000024c: f9000000 (RESUME) {#2a04fc01} -> #23c
(0000000000000238: 2a04fc01 (4ADDU)) $4=l[4] = #3fe0000000000000 <<2+ #ffffffffffffffff = #ff7fffffffffffff
...............................................
line 112: RESUME % return to 1B
7. 000000000000024c: f9000000 (RESUME) {#2b04fc01} -> #23c
(0000000000000238: 2b04fc01 (4ADDUI)) $4=l[4] = #3fe0000000000000 <<2+ 1 = #ff80000000000001
...............................................
line 112: RESUME % return to 1B
8. 000000000000024c: f9000000 (RESUME) {#2c04fc01} -> #23c
(0000000000000238: 2c04fc01 (8ADDU)) $4=l[4] = #3fe0000000000000 <<3+ #ffffffffffffffff = #feffffffffffffff
...............................................
line 112: RESUME % return to 1B
9. 000000000000024c: f9000000 (RESUME) {#2d04fc01} -> #23c
(0000000000000238: 2d04fc01 (8ADDUI)) $4=l[4] = #3fe0000000000000 <<3+ 1 = #ff00000000000001
...............................................
line 112: RESUME % return to 1B
10. 000000000000024c: f9000000 (RESUME) {#2e04fc01} -> #23c
(0000000000000238: 2e04fc01 (16ADDU)) $4=l[4] = #3fe0000000000000 <<4+ #ffffffffffffffff = #fdffffffffffffff
...............................................
line 112: RESUME % return to 1B
11. 000000000000024c: f9000000 (RESUME) {#2f04fc01} -> #23c
(0000000000000238: 2f04fc01 (16ADDUI)) $4=l[4] = #3fe0000000000000 <<4+ 1 = #fe00000000000001
...............................................
line 112: RESUME % return to 1B
12. 000000000000024c: f9000000 (RESUME) {#3004fc01} -> #23c
(0000000000000238: 3004fc01 (CMP)) $4=l[4] = 4602678819172646912 cmp -1 = 1
...............................................
line 112: RESUME % return to 1B
13. 000000000000024c: f9000000 (RESUME) {#3104fc01} -> #23c
(0000000000000238: 3104fc01 (CMPI)) $4=l[4] = 4602678819172646912 cmp 1 = 1
...............................................
line 112: RESUME % return to 1B
14. 000000000000024c: f9000000 (RESUME) {#3204fc01} -> #23c
(0000000000000238: 3204fc01 (CMPU)) $4=l[4] = #3fe0000000000000 cmp #ffffffffffffffff = -1
...............................................
line 112: RESUME % return to 1B
15. 000000000000024c: f9000000 (RESUME) {#3304fc01} -> #23c
(0000000000000238: 3304fc01 (CMPUI)) $4=l[4] = #3fe0000000000000 cmp 1 = 1
...............................................
line 112: RESUME % return to 1B
16. 000000000000024c: f9000000 (RESUME) {#3404fc01} -> #23c
(0000000000000238: 3404fc01 (NEG)) $4=l[4] = 252 - -1 = 253
...............................................
line 112: RESUME % return to 1B
17. 000000000000024c: f9000000 (RESUME) {#3504fc01} -> #23c
(0000000000000238: 3504fc01 (NEGI)) $4=l[4] = 252 - 1 = 251
...............................................
line 112: RESUME % return to 1B
18. 000000000000024c: f9000000 (RESUME) {#3604fc01} -> #23c
(0000000000000238: 3604fc01 (NEGU)) $4=l[4] = 252 - #ffffffffffffffff = #fd
...............................................
line 112: RESUME % return to 1B
19. 000000000000024c: f9000000 (RESUME) {#3704fc01} -> #23c
(0000000000000238: 3704fc01 (NEGUI)) $4=l[4] = 252 - 1 = #fb
...............................................
line 112: RESUME % return to 1B
20. 000000000000024c: f9000000 (RESUME) {#3804fc01} -> #23c
(0000000000000238: 3804fc01 (SL)) $4=l[4] = 4602678819172646912 << #ffffffffffffffff = 0, rA=#00049
...............................................
line 112: RESUME % return to 1B
21. 000000000000024c: f9000000 (RESUME) {#3904fc01} -> #23c
(0000000000000238: 3904fc01 (SLI)) $4=l[4] = 4602678819172646912 << 1 = 9205357638345293824
...............................................
line 112: RESUME % return to 1B
22. 000000000000024c: f9000000 (RESUME) {#3a04fc01} -> #23c
(0000000000000238: 3a04fc01 (SLU)) $4=l[4] = #3fe0000000000000 << #ffffffffffffffff = #0
...............................................
line 112: RESUME % return to 1B
23. 000000000000024c: f9000000 (RESUME) {#3b04fc01} -> #23c
(0000000000000238: 3b04fc01 (SLUI)) $4=l[4] = #3fe0000000000000 << 1 = #7fc0000000000000
...............................................
line 112: RESUME % return to 1B
24. 000000000000024c: f9000000 (RESUME) {#3c04fc01} -> #23c
(0000000000000238: 3c04fc01 (SR)) $4=l[4] = 4602678819172646912 >> #ffffffffffffffff = 0
...............................................
line 112: RESUME % return to 1B
25. 000000000000024c: f9000000 (RESUME) {#3d04fc01} -> #23c
(0000000000000238: 3d04fc01 (SRI)) $4=l[4] = 4602678819172646912 >> 1 = 2301339409586323456
...............................................
line 112: RESUME % return to 1B
26. 000000000000024c: f9000000 (RESUME) {#3e04fc01} -> #23c
(0000000000000238: 3e04fc01 (SRU)) $4=l[4] = #3fe0000000000000 >> #ffffffffffffffff = #0
...............................................
line 112: RESUME % return to 1B
27. 000000000000024c: f9000000 (RESUME) {#3f04fc01} -> #23c
(0000000000000238: 3f04fc01 (SRUI)) $4=l[4] = #3fe0000000000000 >> 1 = #1ff0000000000000
...............................................
line 113: 1H BN $0,@+4*6
1. 0000000000000250: 40000006 (BN) 9218868437227405311<0? No
line 114: PBN $0,@-4*1
1. 0000000000000254: 5100ffff (PBNB) 9218868437227405311<0? No (bad guess)
line 115: BNN $0,@+4*6
1. 0000000000000258: 48000006 (BNN) 9218868437227405311>=0? Yes, -> #270 (bad guess)
...
line 121: PBNN $0,@-4*3
1. 0000000000000270: 5900fffd (PBNNB) 9218868437227405311>=0? Yes, -> #264
--------
line 118: BN $0,@-4*3
1. 0000000000000264: 4100fffd (BNB) 9218868437227405311<0? No
line 119: BNN $0,@-4*3
1. 0000000000000268: 4900fffd (BNNB) 9218868437227405311>=0? Yes, -> #25c (bad guess)
--------
line 116: PBN $0,@+4*5
1. 000000000000025c: 50000005 (PBN) 9218868437227405311<0? No (bad guess)
line 117: PBNN $0,@+4*5
1. 0000000000000260: 58000005 (PBNN) 9218868437227405311>=0? Yes, -> #274
...
line 122: BZ $0,@+4*6
1. 0000000000000274: 42000006 (BZ) 9218868437227405311==0? No
line 123: PBZ $0,@-4*1
1. 0000000000000278: 5300ffff (PBZB) 9218868437227405311==0? No (bad guess)
line 124: BNZ $0,@+4*6
1. 000000000000027c: 4a000006 (BNZ) 9218868437227405311!=0? Yes, -> #294 (bad guess)
...
line 130: PBNZ $0,@-4*3
1. 0000000000000294: 5b00fffd (PBNZB) 9218868437227405311!=0? Yes, -> #288
--------
line 127: BZ $0,@-4*3
1. 0000000000000288: 4300fffd (BZB) 9218868437227405311==0? No
line 128: BNZ $0,@-4*3
1. 000000000000028c: 4b00fffd (BNZB) 9218868437227405311!=0? Yes, -> #280 (bad guess)
--------
line 125: PBZ $0,@+4*5
1. 0000000000000280: 52000005 (PBZ) 9218868437227405311==0? No (bad guess)
line 126: PBNZ $0,@+4*5
1. 0000000000000284: 5a000005 (PBNZ) 9218868437227405311!=0? Yes, -> #298
...
line 131: BP $0,@+4*6
1. 0000000000000298: 44000006 (BP) 9218868437227405311>0? Yes, -> #2b0 (bad guess)
...
line 137: BNP $0,@-4*3
1. 00000000000002b0: 4d00fffd (BNPB) 9218868437227405311<=0? No
line 138: PBP $0,@-4*3
1. 00000000000002b4: 5500fffd (PBPB) 9218868437227405311>0? Yes, -> #2a8
--------
line 135: PBNP $0,@+4*5
1. 00000000000002a8: 5c000005 (PBNP) 9218868437227405311<=0? No (bad guess)
line 136: BP $0,@-4*3
1. 00000000000002ac: 4500fffd (BPB) 9218868437227405311>0? Yes, -> #2a0 (bad guess)
--------
line 133: BNP $0,@+4*6
1. 00000000000002a0: 4c000006 (BNP) 9218868437227405311<=0? No
line 134: PBP $0,@+4*5
1. 00000000000002a4: 54000005 (PBP) 9218868437227405311>0? Yes, -> #2b8
...
line 139: PBNP $0,@-4*3
1. 00000000000002b8: 5d00fffd (PBNPB) 9218868437227405311<=0? No (bad guess)
line 140: BOD $0,@+4*6
1. 00000000000002bc: 46000006 (BOD) 9218868437227405311 odd? Yes, -> #2d4 (bad guess)
...
line 146: BEV $0,@-4*3
1. 00000000000002d4: 4f00fffd (BEVB) 9218868437227405311 even? No
line 147: PBOD $0,@-4*3
1. 00000000000002d8: 5700fffd (PBODB) 9218868437227405311 odd? Yes, -> #2cc
--------
line 144: PBEV $0,@+4*5
1. 00000000000002cc: 5e000005 (PBEV) 9218868437227405311 even? No (bad guess)
line 145: BOD $0,@-4*3
1. 00000000000002d0: 4700fffd (BODB) 9218868437227405311 odd? Yes, -> #2c4 (bad guess)
--------
line 142: BEV $0,@+4*6
1. 00000000000002c4: 4e000006 (BEV) 9218868437227405311 even? No
line 143: PBOD $0,@+4*5
1. 00000000000002c8: 56000005 (PBOD) 9218868437227405311 odd? Yes, -> #2dc
...
line 148: PBEV $0,@-4*3
1. 00000000000002dc: 5f00fffd (PBEVB) 9218868437227405311 even? No (bad guess)
line 149: LDA $4,Load_Test+4
1. 00000000000002e0: 2304f10c (ADDUI) $4=l[4] = #2000000000000000 + 12 = #200000000000000c
line 150: GETA $3,1F
1. 00000000000002e4: f4030004 (GETA) $3=l[3] = #2f4
line 151: PUT rW,$3
1. 00000000000002e8: f6180003 (PUT) rW = 756 = #2f4
line 152: LDTU $7,Load_End
1. 00000000000002ec: 8b07f124 (LDTUI) $7=l[7] = M4[#2000000000000000+36] = #97030405
line 153: LDTU $6,Load_Begin
1. 00000000000002f0: 8b06f120 (LDTUI) $6=l[6] = M4[#2000000000000000+32] = #5f030405
line 154: 1H CMPU $8,$6,$7
1. 00000000000002f4: 32080607 (CMPU) $8=l[8] = #5f030405 cmp #97030405 = -1
line 155: BNN $8,1F
1. 00000000000002f8: 4808000c (BNN) -1>=0? No
line 156: INCML $6,#100 % increase the opcode
1. 00000000000002fc: e6060100 (INCML) $6=l[6] = #5f030405 + #1000000 = #60030405
line 157: PUT rX,$6
1. 0000000000000300: f6190006 (PUT) rX = 1610810373 = #60030405
line 158: RESUME % return to 1B
1. 0000000000000304: f9000000 (RESUME) {#60030405} -> #2f4
(00000000000002f0: 60030405 (CSN)) $3=l[3] = 2305843009213693964<0? 0: 756 = 756
--------
line 154: 1H CMPU $8,$6,$7
2. 00000000000002f4: 32080607 (CMPU) $8=l[8] = #60030405 cmp #97030405 = -1
line 155: BNN $8,1F
2. 00000000000002f8: 4808000c (BNN) -1>=0? No
line 156: INCML $6,#100 % increase the opcode
2. 00000000000002fc: e6060100 (INCML) $6=l[6] = #60030405 + #1000000 = #61030405
line 157: PUT rX,$6
2. 0000000000000300: f6190006 (PUT) rX = 1627587589 = #61030405
line 158: RESUME % return to 1B
2. 0000000000000304: f9000000 (RESUME) {#61030405} -> #2f4
(00000000000002f0: 61030405 (CSNI)) $3=l[3] = 2305843009213693964<0? 5: 756 = 756
...............................................
line 158: RESUME % return to 1B
3. 0000000000000304: f9000000 (RESUME) {#62030405} -> #2f4
(00000000000002f0: 62030405 (CSZ)) $3=l[3] = 2305843009213693964==0? 0: 756 = 756
...............................................
line 158: RESUME % return to 1B
4. 0000000000000304: f9000000 (RESUME) {#63030405} -> #2f4
(00000000000002f0: 63030405 (CSZI)) $3=l[3] = 2305843009213693964==0? 5: 756 = 756
...............................................
line 158: RESUME % return to 1B
5. 0000000000000304: f9000000 (RESUME) {#64030405} -> #2f4
(00000000000002f0: 64030405 (CSP)) $3=l[3] = 2305843009213693964>0? 0: 756 = 0
...............................................
line 158: RESUME % return to 1B
6. 0000000000000304: f9000000 (RESUME) {#65030405} -> #2f4
(00000000000002f0: 65030405 (CSPI)) $3=l[3] = 2305843009213693964>0? 5: 0 = 5
...............................................
line 158: RESUME % return to 1B
7. 0000000000000304: f9000000 (RESUME) {#66030405} -> #2f4
(00000000000002f0: 66030405 (CSOD)) $3=l[3] = 2305843009213693964 odd? 0: 5 = 5
...............................................
line 158: RESUME % return to 1B
8. 0000000000000304: f9000000 (RESUME) {#67030405} -> #2f4
(00000000000002f0: 67030405 (CSODI)) $3=l[3] = 2305843009213693964 odd? 5: 5 = 5
...............................................
line 158: RESUME % return to 1B
9. 0000000000000304: f9000000 (RESUME) {#68030405} -> #2f4
(00000000000002f0: 68030405 (CSNN)) $3=l[3] = 2305843009213693964>=0? 0: 5 = 0
...............................................
line 158: RESUME % return to 1B
10. 0000000000000304: f9000000 (RESUME) {#69030405} -> #2f4
(00000000000002f0: 69030405 (CSNNI)) $3=l[3] = 2305843009213693964>=0? 5: 0 = 5
...............................................
line 158: RESUME % return to 1B
11. 0000000000000304: f9000000 (RESUME) {#6a030405} -> #2f4
(00000000000002f0: 6a030405 (CSNZ)) $3=l[3] = 2305843009213693964!=0? 0: 5 = 0
...............................................
line 158: RESUME % return to 1B
12. 0000000000000304: f9000000 (RESUME) {#6b030405} -> #2f4
(00000000000002f0: 6b030405 (CSNZI)) $3=l[3] = 2305843009213693964!=0? 5: 0 = 5
...............................................
line 158: RESUME % return to 1B
13. 0000000000000304: f9000000 (RESUME) {#6c030405} -> #2f4
(00000000000002f0: 6c030405 (CSNP)) $3=l[3] = 2305843009213693964<=0? 0: 5 = 5
...............................................
line 158: RESUME % return to 1B
14. 0000000000000304: f9000000 (RESUME) {#6d030405} -> #2f4
(00000000000002f0: 6d030405 (CSNPI)) $3=l[3] = 2305843009213693964<=0? 5: 5 = 5
...............................................
line 158: RESUME % return to 1B
15. 0000000000000304: f9000000 (RESUME) {#6e030405} -> #2f4
(00000000000002f0: 6e030405 (CSEV)) $3=l[3] = 2305843009213693964 even? 0: 5 = 0
...............................................
line 158: RESUME % return to 1B
16. 0000000000000304: f9000000 (RESUME) {#6f030405} -> #2f4
(00000000000002f0: 6f030405 (CSEVI)) $3=l[3] = 2305843009213693964 even? 5: 0 = 5
...............................................
line 158: RESUME % return to 1B
17. 0000000000000304: f9000000 (RESUME) {#70030405} -> #2f4
(00000000000002f0: 70030405 (ZSN)) $3=l[3] = 2305843009213693964<0? 0: 0 = 0
...............................................
line 158: RESUME % return to 1B
18. 0000000000000304: f9000000 (RESUME) {#71030405} -> #2f4
(00000000000002f0: 71030405 (ZSNI)) $3=l[3] = 2305843009213693964<0? 5: 0 = 0
...............................................
line 158: RESUME % return to 1B
19. 0000000000000304: f9000000 (RESUME) {#72030405} -> #2f4
(00000000000002f0: 72030405 (ZSZ)) $3=l[3] = 2305843009213693964==0? 0: 0 = 0
...............................................
line 158: RESUME % return to 1B
20. 0000000000000304: f9000000 (RESUME) {#73030405} -> #2f4
(00000000000002f0: 73030405 (ZSZI)) $3=l[3] = 2305843009213693964==0? 5: 0 = 0
...............................................
line 158: RESUME % return to 1B
21. 0000000000000304: f9000000 (RESUME) {#74030405} -> #2f4
(00000000000002f0: 74030405 (ZSP)) $3=l[3] = 2305843009213693964>0? 0: 0 = 0
...............................................
line 158: RESUME % return to 1B
22. 0000000000000304: f9000000 (RESUME) {#75030405} -> #2f4
(00000000000002f0: 75030405 (ZSPI)) $3=l[3] = 2305843009213693964>0? 5: 0 = 5
...............................................
line 158: RESUME % return to 1B
23. 0000000000000304: f9000000 (RESUME) {#76030405} -> #2f4
(00000000000002f0: 76030405 (ZSOD)) $3=l[3] = 2305843009213693964 odd? 0: 0 = 0
...............................................
line 158: RESUME % return to 1B
24. 0000000000000304: f9000000 (RESUME) {#77030405} -> #2f4
(00000000000002f0: 77030405 (ZSODI)) $3=l[3] = 2305843009213693964 odd? 5: 0 = 0
...............................................
line 158: RESUME % return to 1B
25. 0000000000000304: f9000000 (RESUME) {#78030405} -> #2f4
(00000000000002f0: 78030405 (ZSNN)) $3=l[3] = 2305843009213693964>=0? 0: 0 = 0
...............................................
line 158: RESUME % return to 1B
26. 0000000000000304: f9000000 (RESUME) {#79030405} -> #2f4
(00000000000002f0: 79030405 (ZSNNI)) $3=l[3] = 2305843009213693964>=0? 5: 0 = 5
...............................................
line 158: RESUME % return to 1B
27. 0000000000000304: f9000000 (RESUME) {#7a030405} -> #2f4
(00000000000002f0: 7a030405 (ZSNZ)) $3=l[3] = 2305843009213693964!=0? 0: 0 = 0
...............................................
line 158: RESUME % return to 1B
28. 0000000000000304: f9000000 (RESUME) {#7b030405} -> #2f4
(00000000000002f0: 7b030405 (ZSNZI)) $3=l[3] = 2305843009213693964!=0? 5: 0 = 5
...............................................
line 158: RESUME % return to 1B
29. 0000000000000304: f9000000 (RESUME) {#7c030405} -> #2f4
(00000000000002f0: 7c030405 (ZSNP)) $3=l[3] = 2305843009213693964<=0? 0: 0 = 0
...............................................
line 158: RESUME % return to 1B
30. 0000000000000304: f9000000 (RESUME) {#7d030405} -> #2f4
(00000000000002f0: 7d030405 (ZSNPI)) $3=l[3] = 2305843009213693964<=0? 5: 0 = 0
...............................................
line 158: RESUME % return to 1B
31. 0000000000000304: f9000000 (RESUME) {#7e030405} -> #2f4
(00000000000002f0: 7e030405 (ZSEV)) $3=l[3] = 2305843009213693964 even? 0: 0 = 0
...............................................
line 158: RESUME % return to 1B
32. 0000000000000304: f9000000 (RESUME) {#7f030405} -> #2f4
(00000000000002f0: 7f030405 (ZSEVI)) $3=l[3] = 2305843009213693964 even? 5: 0 = 5
...............................................
line 158: RESUME % return to 1B
33. 0000000000000304: f9000000 (RESUME) {#80030405} -> #2f4
(00000000000002f0: 80030405 (LDB)) $3=l[3] = M1[#200000000000000c+#0] = -124
...............................................
line 158: RESUME % return to 1B
34. 0000000000000304: f9000000 (RESUME) {#81030405} -> #2f4
(00000000000002f0: 81030405 (LDBI)) $3=l[3] = M1[#200000000000000c+5] = -119
...............................................
line 158: RESUME % return to 1B
35. 0000000000000304: f9000000 (RESUME) {#82030405} -> #2f4
(00000000000002f0: 82030405 (LDBU)) $3=l[3] = M1[#200000000000000c+#0] = #84
...............................................
line 158: RESUME % return to 1B
36. 0000000000000304: f9000000 (RESUME) {#83030405} -> #2f4
(00000000000002f0: 83030405 (LDBUI)) $3=l[3] = M1[#200000000000000c+5] = #89
...............................................
line 158: RESUME % return to 1B
37. 0000000000000304: f9000000 (RESUME) {#84030405} -> #2f4
(00000000000002f0: 84030405 (LDW)) $3=l[3] = M2[#200000000000000c+#0] = -31611
...............................................
line 158: RESUME % return to 1B
38. 0000000000000304: f9000000 (RESUME) {#85030405} -> #2f4
(00000000000002f0: 85030405 (LDWI)) $3=l[3] = M2[#200000000000000c+5] = -30583
...............................................
line 158: RESUME % return to 1B
39. 0000000000000304: f9000000 (RESUME) {#86030405} -> #2f4
(00000000000002f0: 86030405 (LDWU)) $3=l[3] = M2[#200000000000000c+#0] = #8485
...............................................
line 158: RESUME % return to 1B
40. 0000000000000304: f9000000 (RESUME) {#87030405} -> #2f4
(00000000000002f0: 87030405 (LDWUI)) $3=l[3] = M2[#200000000000000c+5] = #8889
...............................................
line 158: RESUME % return to 1B
41. 0000000000000304: f9000000 (RESUME) {#88030405} -> #2f4
(00000000000002f0: 88030405 (LDT)) $3=l[3] = M4[#200000000000000c+#0] = -2071624057
...............................................
line 158: RESUME % return to 1B
42. 0000000000000304: f9000000 (RESUME) {#89030405} -> #2f4
(00000000000002f0: 89030405 (LDTI)) $3=l[3] = M4[#200000000000000c+5] = -2004252021
...............................................
line 158: RESUME % return to 1B
43. 0000000000000304: f9000000 (RESUME) {#8a030405} -> #2f4
(00000000000002f0: 8a030405 (LDTU)) $3=l[3] = M4[#200000000000000c+#0] = #84858687
...............................................
line 158: RESUME % return to 1B
44. 0000000000000304: f9000000 (RESUME) {#8b030405} -> #2f4
(00000000000002f0: 8b030405 (LDTUI)) $3=l[3] = M4[#200000000000000c+5] = #88898a8b
...............................................
line 158: RESUME % return to 1B
45. 0000000000000304: f9000000 (RESUME) {#8c030405} -> #2f4
(00000000000002f0: 8c030405 (LDO)) $3=l[3] = M8[#200000000000000c+#0] = -9186918263483431289
...............................................
line 158: RESUME % return to 1B
46. 0000000000000304: f9000000 (RESUME) {#8d030405} -> #2f4
(00000000000002f0: 8d030405 (LDOI)) $3=l[3] = M8[#200000000000000c+5] = -8608196880778817905
...............................................
line 158: RESUME % return to 1B
47. 0000000000000304: f9000000 (RESUME) {#8e030405} -> #2f4
(00000000000002f0: 8e030405 (LDOU)) $3=l[3] = M8[#200000000000000c+#0] = #8081828384858687
...............................................
line 158: RESUME % return to 1B
48. 0000000000000304: f9000000 (RESUME) {#8f030405} -> #2f4
(00000000000002f0: 8f030405 (LDOUI)) $3=l[3] = M8[#200000000000000c+5] = #88898a8b8c8d8e8f
...............................................
line 158: RESUME % return to 1B
49. 0000000000000304: f9000000 (RESUME) {#90030405} -> #2f4
(00000000000002f0: 90030405 (LDSF)) $3=l[3] = (M4[#200000000000000c+#0]) = -3.1391693585473826e-36
...............................................
line 158: RESUME % return to 1B
50. 0000000000000304: f9000000 (RESUME) {#91030405} -> #2f4
(00000000000002f0: 91030405 (LDSFI)) $3=l[3] = (M4[#200000000000000c+5]) = -8.277958869830208e-34
...............................................
line 158: RESUME % return to 1B
51. 0000000000000304: f9000000 (RESUME) {#92030405} -> #2f4
(00000000000002f0: 92030405 (LDHT)) $3=l[3] = M4[#200000000000000c+#0]<<32 = #8485868700000000
...............................................
line 158: RESUME % return to 1B
52. 0000000000000304: f9000000 (RESUME) {#93030405} -> #2f4
(00000000000002f0: 93030405 (LDHTI)) $3=l[3] = M4[#200000000000000c+5]<<32 = #88898a8b00000000
...............................................
line 158: RESUME % return to 1B
53. 0000000000000304: f9000000 (RESUME) {#94030405} -> #2f4
(00000000000002f0: 94030405 (CSWAP)) $3=l[3] = [M8[#200000000000000c+#0]==0] = 0, rP=#8081828384858687
...............................................
line 158: RESUME % return to 1B
54. 0000000000000304: f9000000 (RESUME) {#95030405} -> #2f4
(00000000000002f0: 95030405 (CSWAPI)) $3=l[3] = [M8[#200000000000000c+5]==-9186918263483431289] = 0, rP=#88898a8b8c8d8e8f
...............................................
line 158: RESUME % return to 1B
55. 0000000000000304: f9000000 (RESUME) {#96030405} -> #2f4
(00000000000002f0: 96030405 (LDUNC)) $3=l[3] = M8[#200000000000000c+#0] = #8081828384858687
...............................................
line 158: RESUME % return to 1B
56. 0000000000000304: f9000000 (RESUME) {#97030405} -> #2f4
(00000000000002f0: 97030405 (LDUNCI)) $3=l[3] = M8[#200000000000000c+5] = #88898a8b8c8d8e8f
...............................................
line 165: 1H GETA $4,2B
1. 0000000000000328: f504fff8 (GETAB) $4=l[4] = #308
line 166: SETL $7,4*11
1. 000000000000032c: e307002c (SETL) $7=l[7] = #2c
line 167: GO $7,$7,$4
1. 0000000000000330: 9e070704 (GO) $7=l[7] = #334, -> #2c+#308
line 168: GO $7,$4,4*12
1. 0000000000000334: 9f070430 (GOI) $7=l[7] = #338, -> #308+48
line 169: PRELD 70,$4,$4
1. 0000000000000338: 9a460404 (PRELD) [#308+#308 .. #656]
line 170: PRELD 70,$4,0
1. 000000000000033c: 9b460400 (PRELDI) [#308 .. #34e]
line 171: PREGO 70,$4,$4
1. 0000000000000340: 9c460404 (PREGO) [#308+#308 .. #656]
line 172: PREGO 70,$4,0
1. 0000000000000344: 9d460400 (PREGOI) [#308 .. #34e]
line 173: CSWAP $3,Load_Test+13
1. 0000000000000348: 9503f115 (CSWAPI) $3=l[3] = [M8[#2000000000000000+21]==-8608196880778817905] = 1, M8[#2000000000000010]=#88898a8b8c8d8e8f
line 174: GETA $3,1F
1. 000000000000034c: f4030007 (GETA) $3=l[3] = #368
line 175: PUT rW,$3
1. 0000000000000350: f6180003 (PUT) rW = 872 = #368
line 176: SETL rz,1
1. 0000000000000354: e3f20001 (SETL) $242=g[242] = #1
line 177: ADD ry,$4,4
1. 0000000000000358: 21f30404 (ADDI) $243=g[243] = 776 + 4 = 780
line 178: LDOU $40,Jmp_Pop
1. 000000000000035c: 8f28f118 (LDOUI) rL=41, $40=l[40] = M8[#2000000000000000+24] = #f0000002f8000000
line 179: LDTU $7,Big_End
1. 0000000000000360: 8b07f12c (LDTUI) $7=l[7] = M4[#2000000000000000+44] = #ef28f305
line 180: LDTU $6,Big_Begin
1. 0000000000000364: 8b06f128 (LDTUI) $6=l[6] = M4[#2000000000000000+40] = #9f28f305
line 181: 1H CMPU $8,$6,$7
1. 0000000000000368: 32080607 (CMPU) $8=l[8] = #9f28f305 cmp #ef28f305 = -1
line 182: BNN $8,1F
1. 000000000000036c: 48080005 (BNN) -1>=0? No
line 183: INCML $6,#100 % increase the opcode
1. 0000000000000370: e6060100 (INCML) $6=l[6] = #9f28f305 + #1000000 = #a028f305
line 184: PUT rX,$6
1. 0000000000000374: f6190006 (PUT) rX = 2687038213 = #a028f305
line 185: SET $5,rz
1. 0000000000000378: c105f200 (ORI) $5=l[5] = 1 = #1
line 186: RESUME % return to 1B
1. 000000000000037c: f9000000 (RESUME) {#a028f305} -> #368
(0000000000000364: a028f305 (STB)) M1[#30c+#1] = -1152921491856162816, M8[#308]=#fedcba9876003210, rA=#00049
--------
line 181: 1H CMPU $8,$6,$7
2. 0000000000000368: 32080607 (CMPU) $8=l[8] = #a028f305 cmp #ef28f305 = -1
line 182: BNN $8,1F
2. 000000000000036c: 48080005 (BNN) -1>=0? No
line 183: INCML $6,#100 % increase the opcode
2. 0000000000000370: e6060100 (INCML) $6=l[6] = #a028f305 + #1000000 = #a128f305
line 184: PUT rX,$6
2. 0000000000000374: f6190006 (PUT) rX = 2703815429 = #a128f305
line 185: SET $5,rz
2. 0000000000000378: c105f200 (ORI) $5=l[5] = 1 = #1
line 186: RESUME % return to 1B
2. 000000000000037c: f9000000 (RESUME) {#a128f305} -> #368
(0000000000000364: a128f305 (STBI)) M1[#30c+5] = -1152921491856162816, M8[#310]=#ff00ddccbbaa9988, rA=#00049
...............................................
line 186: RESUME % return to 1B
3. 000000000000037c: f9000000 (RESUME) {#a228f305} -> #368
(0000000000000364: a228f305 (STBU)) M1[#30c+#1] = #f0000002f8000000, M8[#308]=#fedcba9876003210
...............................................
line 186: RESUME % return to 1B
4. 000000000000037c: f9000000 (RESUME) {#a328f305} -> #368
(0000000000000364: a328f305 (STBUI)) M1[#30c+5] = #f0000002f8000000, M8[#310]=#ff00ddccbbaa9988
...............................................
line 186: RESUME % return to 1B
5. 000000000000037c: f9000000 (RESUME) {#a428f305} -> #368
(0000000000000364: a428f305 (STW)) M2[#30c+#1] = -1152921491856162816, M8[#308]=#fedcba9800003210, rA=#00049
...............................................
line 186: RESUME % return to 1B
6. 000000000000037c: f9000000 (RESUME) {#a528f305} -> #368
(0000000000000364: a528f305 (STWI)) M2[#30c+5] = -1152921491856162816, M8[#310]=#ddccbbaa9988, rA=#00049
...............................................
line 186: RESUME % return to 1B
7. 000000000000037c: f9000000 (RESUME) {#a628f305} -> #368
(0000000000000364: a628f305 (STWU)) M2[#30c+#1] = #f0000002f8000000, M8[#308]=#fedcba9800003210
...............................................
line 186: RESUME % return to 1B
8. 000000000000037c: f9000000 (RESUME) {#a728f305} -> #368
(0000000000000364: a728f305 (STWUI)) M2[#30c+5] = #f0000002f8000000, M8[#310]=#ddccbbaa9988
...............................................
line 186: RESUME % return to 1B
9. 000000000000037c: f9000000 (RESUME) {#a828f305} -> #368
(0000000000000364: a828f305 (STT)) M4[#30c+#1] = -1152921491856162816, M8[#308]=#fedcba98f8000000, rA=#00049
...............................................
line 186: RESUME % return to 1B
10. 000000000000037c: f9000000 (RESUME) {#a928f305} -> #368
(0000000000000364: a928f305 (STTI)) M4[#30c+5] = -1152921491856162816, M8[#310]=#f8000000bbaa9988, rA=#00049
...............................................
line 186: RESUME % return to 1B
11. 000000000000037c: f9000000 (RESUME) {#aa28f305} -> #368
(0000000000000364: aa28f305 (STTU)) M4[#30c+#1] = #f0000002f8000000, M8[#308]=#fedcba98f8000000
...............................................
line 186: RESUME % return to 1B
12. 000000000000037c: f9000000 (RESUME) {#ab28f305} -> #368
(0000000000000364: ab28f305 (STTUI)) M4[#30c+5] = #f0000002f8000000, M8[#310]=#f8000000bbaa9988
...............................................
line 186: RESUME % return to 1B
13. 000000000000037c: f9000000 (RESUME) {#ac28f305} -> #368
(0000000000000364: ac28f305 (STO)) M8[#30c+#1] = -1152921491856162816
...............................................
line 186: RESUME % return to 1B
14. 000000000000037c: f9000000 (RESUME) {#ad28f305} -> #368
(0000000000000364: ad28f305 (STOI)) M8[#30c+5] = -1152921491856162816
...............................................
line 186: RESUME % return to 1B
15. 000000000000037c: f9000000 (RESUME) {#ae28f305} -> #368
(0000000000000364: ae28f305 (STOU)) M8[#30c+#1] = #f0000002f8000000
...............................................
line 186: RESUME % return to 1B
16. 000000000000037c: f9000000 (RESUME) {#af28f305} -> #368
(0000000000000364: af28f305 (STOUI)) M8[#30c+5] = #f0000002f8000000
...............................................
line 186: RESUME % return to 1B
17. 000000000000037c: f9000000 (RESUME) {#b028f305} -> #368
(0000000000000364: b028f305 (STSF)) (M4[#30c+#1]) = -3.105044975643911e231, M8[#308]=#f0000002ff800000, rA=#00049
...............................................
line 186: RESUME % return to 1B
18. 000000000000037c: f9000000 (RESUME) {#b128f305} -> #368
(0000000000000364: b128f305 (STSFI)) (M4[#30c+5]) = -3.105044975643911e231, M8[#310]=#ff800000f8000000, rA=#00049
...............................................
line 186: RESUME % return to 1B
19. 000000000000037c: f9000000 (RESUME) {#b228f305} -> #368
(0000000000000364: b228f305 (STHT)) M4[#30c+#1] = #f0000002f8000000>>32, M8[#308]=#f0000002f0000002
...............................................
line 186: RESUME % return to 1B
20. 000000000000037c: f9000000 (RESUME) {#b328f305} -> #368
(0000000000000364: b328f305 (STHTI)) M4[#30c+5] = #f0000002f8000000>>32, M8[#310]=#f0000002f8000000
...............................................
line 186: RESUME % return to 1B
21. 000000000000037c: f9000000 (RESUME) {#b428f305} -> #368
(0000000000000364: b428f305 (STCO)) M8[#30c+#1] = 40
...............................................
line 186: RESUME % return to 1B
22. 000000000000037c: f9000000 (RESUME) {#b528f305} -> #368
(0000000000000364: b528f305 (STCOI)) M8[#30c+5] = 40
...............................................
line 186: RESUME % return to 1B
23. 000000000000037c: f9000000 (RESUME) {#b628f305} -> #368
(0000000000000364: b628f305 (STUNC)) M8[#30c+#1] = #f0000002f8000000
...............................................
line 186: RESUME % return to 1B
24. 000000000000037c: f9000000 (RESUME) {#b728f305} -> #368
(0000000000000364: b728f305 (STUNCI)) M8[#30c+5] = #f0000002f8000000
...............................................
line 186: RESUME % return to 1B
25. 000000000000037c: f9000000 (RESUME) {#b828f305} -> #368
(0000000000000364: b828f305 (SYNCD)) [#30c+#1 .. #335]
...............................................
line 186: RESUME % return to 1B
26. 000000000000037c: f9000000 (RESUME) {#b928f305} -> #368
(0000000000000364: b928f305 (SYNCDI)) [#30c+5 .. #339]
...............................................
line 186: RESUME % return to 1B
27. 000000000000037c: f9000000 (RESUME) {#ba28f305} -> #368
(0000000000000364: ba28f305 (PREST)) [#30c+#1 .. #335]
...............................................
line 186: RESUME % return to 1B
28. 000000000000037c: f9000000 (RESUME) {#bb28f305} -> #368
(0000000000000364: bb28f305 (PRESTI)) [#30c+5 .. #339]
...............................................
line 186: RESUME % return to 1B
29. 000000000000037c: f9000000 (RESUME) {#bc28f305} -> #368
(0000000000000364: bc28f305 (SYNCID)) [#30c+#1 .. #335]
...............................................
line 186: RESUME % return to 1B
30. 000000000000037c: f9000000 (RESUME) {#bd28f305} -> #368
(0000000000000364: bd28f305 (SYNCIDI)) [#30c+5 .. #339]
...............................................
line 186: RESUME % return to 1B
31. 000000000000037c: f9000000 (RESUME) {#be28f305} -> #368
(0000000000000364: be28f305 (PUSHGO)) l[40]=40, rO=#6000000000000148, rL=0, rJ=#368, -> #30c+#1
--------
line 159: 2H OCTA #fedcba9876543210 % becomes Jmp_Pop
1. 000000000000030d: f8000000 (POP) rL=40, rO=#6000000000000000, -> #368
...............................................
line 186: RESUME % return to 1B
32. 000000000000037c: f9000000 (RESUME) {#bf28f305} -> #368
(0000000000000364: bf28f305 (PUSHGOI)) l[40]=40, rO=#6000000000000148, rL=0, rJ=#368, -> #30c+5
--------
line 160: OCTA #ffeeddccbbaa9988 % becomes Jmp_Pop
1. 0000000000000311: f0000002 (JMP) -> #319
line 161: NEG ry,addy
1. 0000000000000319: 34f300f6 (NEG) $243=g[243] = 0 - 9178337916516812809 = -9178337916516812809
line 162: SET rz,flip
1. 000000000000031d: c1f2f400 (ORI) $242=g[242] = 72624976668147840 = #102040810204080
line 163: PUT rM,addz
1. 0000000000000321: f60500f5 (PUT) rM = -45041037404232713 = #ff5ffb6a4534a3f7
line 164: POP
1. 0000000000000325: f8000000 (POP) rL=40, rO=#6000000000000000, -> #368
...............................................
line 186: RESUME % return to 1B
33. 000000000000037c: f9000000 (RESUME) {#c028f305} -> #368
(0000000000000364: c028f305 (OR)) rL=41, $40=l[40] = #809ffe4b398437f7 | #102040810204080 = #819ffe4b39a477f7
...............................................
line 186: RESUME % return to 1B
34. 000000000000037c: f9000000 (RESUME) {#c128f305} -> #368
(0000000000000364: c128f305 (ORI)) $40=l[40] = #809ffe4b398437f7 | 5 = #809ffe4b398437f7
...............................................
line 186: RESUME % return to 1B
35. 000000000000037c: f9000000 (RESUME) {#c228f305} -> #368
(0000000000000364: c228f305 (ORN)) $40=l[40] = #809ffe4b398437f7 |~ #102040810204080 = #feffffffffdfbfff
...............................................
line 186: RESUME % return to 1B
36. 000000000000037c: f9000000 (RESUME) {#c328f305} -> #368
(0000000000000364: c328f305 (ORNI)) $40=l[40] = #809ffe4b398437f7 |~ 5 = #ffffffffffffffff
...............................................
line 186: RESUME % return to 1B
37. 000000000000037c: f9000000 (RESUME) {#c428f305} -> #368
(0000000000000364: c428f305 (NOR)) $40=l[40] = #809ffe4b398437f7 ~| #102040810204080 = #7e6001b4c65b8808
...............................................
line 186: RESUME % return to 1B
38. 000000000000037c: f9000000 (RESUME) {#c528f305} -> #368
(0000000000000364: c528f305 (NORI)) $40=l[40] = #809ffe4b398437f7 ~| 5 = #7f6001b4c67bc808
...............................................
line 186: RESUME % return to 1B
39. 000000000000037c: f9000000 (RESUME) {#c628f305} -> #368
(0000000000000364: c628f305 (XOR)) $40=l[40] = #809ffe4b398437f7 ^ #102040810204080 = #819dfa4329a47777
...............................................
line 186: RESUME % return to 1B
40. 000000000000037c: f9000000 (RESUME) {#c728f305} -> #368
(0000000000000364: c728f305 (XORI)) $40=l[40] = #809ffe4b398437f7 ^ 5 = #809ffe4b398437f2
...............................................
line 186: RESUME % return to 1B
41. 000000000000037c: f9000000 (RESUME) {#c828f305} -> #368
(0000000000000364: c828f305 (AND)) $40=l[40] = #809ffe4b398437f7 & #102040810204080 = #2040810000080
...............................................
line 186: RESUME % return to 1B
42. 000000000000037c: f9000000 (RESUME) {#c928f305} -> #368
(0000000000000364: c928f305 (ANDI)) $40=l[40] = #809ffe4b398437f7 & 5 = #5
...............................................
line 186: RESUME % return to 1B
43. 000000000000037c: f9000000 (RESUME) {#ca28f305} -> #368
(0000000000000364: ca28f305 (ANDN)) $40=l[40] = #809ffe4b398437f7 \ #102040810204080 = #809dfa4329843777
...............................................
line 186: RESUME % return to 1B
44. 000000000000037c: f9000000 (RESUME) {#cb28f305} -> #368
(0000000000000364: cb28f305 (ANDNI)) $40=l[40] = #809ffe4b398437f7 \ 5 = #809ffe4b398437f2
...............................................
line 186: RESUME % return to 1B
45. 000000000000037c: f9000000 (RESUME) {#cc28f305} -> #368
(0000000000000364: cc28f305 (NAND)) $40=l[40] = #809ffe4b398437f7 ~& #102040810204080 = #fffdfbf7efffff7f
...............................................
line 186: RESUME % return to 1B
46. 000000000000037c: f9000000 (RESUME) {#cd28f305} -> #368
(0000000000000364: cd28f305 (NANDI)) $40=l[40] = #809ffe4b398437f7 ~& 5 = #fffffffffffffffa
...............................................
line 186: RESUME % return to 1B
47. 000000000000037c: f9000000 (RESUME) {#ce28f305} -> #368
(0000000000000364: ce28f305 (NXOR)) $40=l[40] = #809ffe4b398437f7 ~^ #102040810204080 = #7e6205bcd65b8888
...............................................
line 186: RESUME % return to 1B
48. 000000000000037c: f9000000 (RESUME) {#cf28f305} -> #368
(0000000000000364: cf28f305 (NXORI)) $40=l[40] = #809ffe4b398437f7 ~^ 5 = #7f6001b4c67bc80d
...............................................
line 186: RESUME % return to 1B
49. 000000000000037c: f9000000 (RESUME) {#d028f305} -> #368
(0000000000000364: d028f305 (BDIF)) $40=l[40] = #809ffe4b398437f7 bdif #102040810204080 = #7f9dfa4329640077
...............................................
line 186: RESUME % return to 1B
50. 000000000000037c: f9000000 (RESUME) {#d128f305} -> #368
(0000000000000364: d128f305 (BDIFI)) $40=l[40] = #809ffe4b398437f7 bdif 5 = #809ffe4b398437f2
...............................................
line 186: RESUME % return to 1B
51. 000000000000037c: f9000000 (RESUME) {#d228f305} -> #368
(0000000000000364: d228f305 (WDIF)) $40=l[40] = #809ffe4b398437f7 wdif #102040810204080 = #7f9dfa4329640000
...............................................
line 186: RESUME % return to 1B
52. 000000000000037c: f9000000 (RESUME) {#d328f305} -> #368
(0000000000000364: d328f305 (WDIFI)) $40=l[40] = #809ffe4b398437f7 wdif 5 = #809ffe4b398437f2
...............................................
line 186: RESUME % return to 1B
53. 000000000000037c: f9000000 (RESUME) {#d428f305} -> #368
(0000000000000364: d428f305 (TDIF)) $40=l[40] = #809ffe4b398437f7 tdif #102040810204080 = #7f9dfa432963f777
...............................................
line 186: RESUME % return to 1B
54. 000000000000037c: f9000000 (RESUME) {#d528f305} -> #368
(0000000000000364: d528f305 (TDIFI)) $40=l[40] = #809ffe4b398437f7 tdif 5 = #809ffe4b398437f2
...............................................
line 186: RESUME % return to 1B
55. 000000000000037c: f9000000 (RESUME) {#d628f305} -> #368
(0000000000000364: d628f305 (ODIF)) $40=l[40] = #809ffe4b398437f7 odif #102040810204080 = #7f9dfa432963f777
...............................................
line 186: RESUME % return to 1B
56. 000000000000037c: f9000000 (RESUME) {#d728f305} -> #368
(0000000000000364: d728f305 (ODIFI)) $40=l[40] = #809ffe4b398437f7 odif 5 = #809ffe4b398437f2
...............................................
line 186: RESUME % return to 1B
57. 000000000000037c: f9000000 (RESUME) {#d828f305} -> #368
(0000000000000364: d828f305 (MUX)) $40=l[40] = #ff5ffb6a4534a3f7? #809ffe4b398437f7: #102040810204080 = #801ffe4a110463f7
...............................................
line 186: RESUME % return to 1B
58. 000000000000037c: f9000000 (RESUME) {#d928f305} -> #368
(0000000000000364: d928f305 (MUXI)) $40=l[40] = #ff5ffb6a4534a3f7? #809ffe4b398437f7: 5 = #801ffa4a010423f7
...............................................
line 186: RESUME % return to 1B
59. 000000000000037c: f9000000 (RESUME) {#da28f305} -> #368
(0000000000000364: da28f305 (SADD)) $40=l[40] = nu(#809ffe4b398437f7\#102040810204080) = 31
...............................................
line 186: RESUME % return to 1B
60. 000000000000037c: f9000000 (RESUME) {#db28f305} -> #368
(0000000000000364: db28f305 (SADDI)) $40=l[40] = nu(#809ffe4b398437f7\5) = 34
...............................................
line 186: RESUME % return to 1B
61. 000000000000037c: f9000000 (RESUME) {#dc28f305} -> #368
(0000000000000364: dc28f305 (MOR)) $40=l[40] = #809ffe4b398437f7 mor #102040810204080 = #f73784394bfe9f80
...............................................
line 186: RESUME % return to 1B
62. 000000000000037c: f9000000 (RESUME) {#dd28f305} -> #368
(0000000000000364: dd28f305 (MORI)) $40=l[40] = #809ffe4b398437f7 mor 5 = #f7
...............................................
line 186: RESUME % return to 1B
63. 000000000000037c: f9000000 (RESUME) {#de28f305} -> #368
(0000000000000364: de28f305 (MXOR)) $40=l[40] = #809ffe4b398437f7 mxor #102040810204080 = #f73784394bfe9f80
...............................................
line 186: RESUME % return to 1B
64. 000000000000037c: f9000000 (RESUME) {#df28f305} -> #368
(0000000000000364: df28f305 (MXORI)) $40=l[40] = #809ffe4b398437f7 mxor 5 = #73
...............................................
line 186: RESUME % return to 1B
65. 000000000000037c: f9000000 (RESUME) {#e028f305} -> #368
(0000000000000364: e028f305 (SETH)) $40=l[40] = #f305000000000000
...............................................
line 186: RESUME % return to 1B
66. 000000000000037c: f9000000 (RESUME) {#e128f305} -> #368
(0000000000000364: e128f305 (SETMH)) $40=l[40] = #f30500000000
...............................................
line 186: RESUME % return to 1B
67. 000000000000037c: f9000000 (RESUME) {#e228f305} -> #368
(0000000000000364: e228f305 (SETML)) $40=l[40] = #f3050000
...............................................
line 186: RESUME % return to 1B
68. 000000000000037c: f9000000 (RESUME) {#e328f305} -> #368
(0000000000000364: e328f305 (SETL)) $40=l[40] = #f305
...............................................
line 186: RESUME % return to 1B
69. 000000000000037c: f9000000 (RESUME) {#e428f305} -> #368
(0000000000000364: e428f305 (INCH)) $40=l[40] = #f305 + #f305000000000000 = #f30500000000f305
...............................................
line 186: RESUME % return to 1B
70. 000000000000037c: f9000000 (RESUME) {#e528f305} -> #368
(0000000000000364: e528f305 (INCMH)) $40=l[40] = #f30500000000f305 + #f30500000000 = #f305f3050000f305
...............................................
line 186: RESUME % return to 1B
71. 000000000000037c: f9000000 (RESUME) {#e628f305} -> #368
(0000000000000364: e628f305 (INCML)) $40=l[40] = #f305f3050000f305 + #f3050000 = #f305f305f305f305
...............................................
line 186: RESUME % return to 1B
72. 000000000000037c: f9000000 (RESUME) {#e728f305} -> #368
(0000000000000364: e728f305 (INCL)) $40=l[40] = #f305f305f305f305 + #f305 = #f305f305f306e60a
...............................................
line 186: RESUME % return to 1B
73. 000000000000037c: f9000000 (RESUME) {#e828f305} -> #368
(0000000000000364: e828f305 (ORH)) $40=l[40] = #f305f305f306e60a | #f305000000000000 = #f305f305f306e60a
...............................................
line 186: RESUME % return to 1B
74. 000000000000037c: f9000000 (RESUME) {#e928f305} -> #368
(0000000000000364: e928f305 (ORMH)) $40=l[40] = #f305f305f306e60a | #f30500000000 = #f305f305f306e60a
...............................................
line 186: RESUME % return to 1B
75. 000000000000037c: f9000000 (RESUME) {#ea28f305} -> #368
(0000000000000364: ea28f305 (ORML)) $40=l[40] = #f305f305f306e60a | #f3050000 = #f305f305f307e60a
...............................................
line 186: RESUME % return to 1B
76. 000000000000037c: f9000000 (RESUME) {#eb28f305} -> #368
(0000000000000364: eb28f305 (ORL)) $40=l[40] = #f305f305f307e60a | #f305 = #f305f305f307f70f
...............................................
line 186: RESUME % return to 1B
77. 000000000000037c: f9000000 (RESUME) {#ec28f305} -> #368
(0000000000000364: ec28f305 (ANDNH)) $40=l[40] = #f305f305f307f70f \ #f305000000000000 = #f305f307f70f
...............................................
line 186: RESUME % return to 1B
78. 000000000000037c: f9000000 (RESUME) {#ed28f305} -> #368
(0000000000000364: ed28f305 (ANDNMH)) $40=l[40] = #f305f307f70f \ #f30500000000 = #f307f70f
...............................................
line 186: RESUME % return to 1B
79. 000000000000037c: f9000000 (RESUME) {#ee28f305} -> #368
(0000000000000364: ee28f305 (ANDNML)) $40=l[40] = #f307f70f \ #f3050000 = #2f70f
...............................................
line 186: RESUME % return to 1B
80. 000000000000037c: f9000000 (RESUME) {#ef28f305} -> #368
(0000000000000364: ef28f305 (ANDNL)) $40=l[40] = #2f70f \ #f305 = #2040a
...............................................
line 187: 1H SL $40,small,51
1. 0000000000000380: 3928fe33 (SLI) $40=l[40] = 2748 << 51 = 6187945888007061504
line 188: SL $40,small,52
1. 0000000000000384: 3928fe34 (SLI) $40=l[40] = 2748 << 52 = -6070852297695428608, rA=#00049
line 189: SAVE $255,0
M8[#6000000000000000]=l[0]=#7fefffffffffffff, rS+=8
M8[#6000000000000008]=l[1]=#ffffffffffffffff, rS+=8
M8[#6000000000000010]=l[2]=#0000000000000000, rS+=8
M8[#6000000000000018]=l[3]=#0000000000000368, rS+=8
M8[#6000000000000020]=l[4]=#0000000000000308, rS+=8
M8[#6000000000000028]=l[5]=#0102040810204080, rS+=8
M8[#6000000000000030]=l[6]=#00000000ef28f305, rS+=8
M8[#6000000000000038]=l[7]=#00000000ef28f305, rS+=8
M8[#6000000000000040]=l[8]=#0000000000000000, rS+=8
M8[#6000000000000048]=l[9]=#bfdfffffffffffff, rS+=8
M8[#6000000000000050]=l[10]=#fff8000000000000, rS+=8
M8[#6000000000000058]=l[11]=#fff8000000000000, rS+=8
M8[#6000000000000060]=l[12]=#0000000000000001, rS+=8
M8[#6000000000000068]=l[13]=#8000000000000000, rS+=8
M8[#6000000000000070]=l[14]=#ffffffffffffffff, rS+=8
M8[#6000000000000078]=l[15]=#43dfd8006d319ef2, rS+=8
M8[#6000000000000080]=l[16]=#43dfd8006d319ef3, rS+=8
M8[#6000000000000088]=l[17]=#bff0000000000000, rS+=8
M8[#6000000000000090]=l[18]=#43dfd80060000000, rS+=8
M8[#6000000000000098]=l[19]=#43dfd80080000000, rS+=8
M8[#60000000000000a0]=l[20]=#bff0000000000000, rS+=8
M8[#60000000000000a8]=l[21]=#406fe00000000000, rS+=8
M8[#60000000000000b0]=l[22]=#3fd0000000000000, rS+=8
M8[#60000000000000b8]=l[23]=#3d85780000000000, rS+=8
M8[#60000000000000c0]=l[24]=#0000000000000001, rS+=8
M8[#60000000000000c8]=l[25]=#5ff0000000000000, rS+=8
M8[#60000000000000d0]=l[26]=#5fefffffffffffff, rS+=8
M8[#60000000000000d8]=l[27]=#7ff0000000000000, rS+=8
M8[#60000000000000e0]=l[28]=#3c90000000000000, rS+=8
M8[#60000000000000e8]=l[29]=#8000000000000244, rS+=8
M8[#60000000000000f0]=l[30]=#3ff0000000000000, rS+=8
M8[#60000000000000f8]=l[31]=#070c142030404000, rS+=8
M8[#6000000000000100]=l[32]=#fefdfbf7efdfbf80, rS+=8
M8[#6000000000000108]=l[33]=#ffffffffffffffff, rS+=8
M8[#6000000000000110]=l[34]=#0102040810204080, rS+=8
M8[#6000000000000118]=l[35]=#7ebffd1f0bb06c00, rS+=8
M8[#6000000000000120]=l[36]=#7ebffd1f0bb06c00, rS+=8
M8[#6000000000000128]=l[37]=#0000000000000000, rS+=8
M8[#6000000000000130]=l[38]=#0000000000000000, rS+=8
M8[#6000000000000138]=l[39]=#0000000000000000, rS+=8
M8[#6000000000000140]=l[40]=#abc0000000000000, rS+=8
M8[#6000000000000148]=l[41]=#0000000000000029, rS+=8
M8[#6000000000000150]=g[241]=#2000000000000000, rS+=8
M8[#6000000000000158]=g[242]=#0102040810204080, rS+=8
M8[#6000000000000160]=g[243]=#809ffe4b398437f7, rS+=8
M8[#6000000000000168]=g[244]=#0102040810204080, rS+=8
M8[#6000000000000170]=g[245]=#ff5ffb6a4534a3f7, rS+=8
M8[#6000000000000178]=g[246]=#7f6001b4c67bc809, rS+=8
M8[#6000000000000180]=g[247]=#0000000000030000, rS+=8
M8[#6000000000000188]=g[248]=#0000000000020000, rS+=8
M8[#6000000000000190]=g[249]=#0000000000010000, rS+=8
M8[#6000000000000198]=g[250]=#7ff1000000000000, rS+=8
M8[#60000000000001a0]=g[251]=#7ff0000000000000, rS+=8
M8[#60000000000001a8]=g[252]=#3fe0000000000000, rS+=8
M8[#60000000000001b0]=g[253]=#8000000000000000, rS+=8
M8[#60000000000001b8]=g[254]=#0000000000000abc, rS+=8
M8[#60000000000001c0]=g[255]=#0000000000000100, rS+=8
M8[#60000000000001c8]=rB=#0000000000000000, rS+=8
M8[#60000000000001d0]=rD=#0001040c2050c1c4, rS+=8
M8[#60000000000001d8]=rE=#8000000000000000, rS+=8
M8[#60000000000001e0]=rH=#0001040c2050c1c4, rS+=8
M8[#60000000000001e8]=rJ=#0000000000000368, rS+=8
M8[#60000000000001f0]=rM=#ff5ffb6a4534a3f7, rS+=8
M8[#60000000000001f8]=rR=#0000000000000000, rS+=8
M8[#6000000000000200]=rP=#88898a8b8c8d8e8f, rS+=8
M8[#6000000000000208]=rW=#0000000000000368, rS+=8
M8[#6000000000000210]=rX=#00000000ef28f305, rS+=8
M8[#6000000000000218]=rY=#0000000000000000, rS+=8
M8[#6000000000000220]=rZ=#0000000000000000, rS+=8
M8[#6000000000000228]=(rG,rA)=#f100000000000049, rS+=8
1. 0000000000000388: faff0000 (SAVE) rL=0, $255=g[255] = #6000000000000228
line 190: PUT rG,small-$0
1. 000000000000038c: f71300fe (PUTI) rG = 254 = #fe
line 191: INCL small-1,U_BIT<<8
1. 0000000000000390: e7fd0400 (INCL) rL=254, $253=l[67] = #0 + #400 = #400
line 192: FADD $100,small,$200
1. 0000000000000394: 0464fec8 (FADD) $100=l[170] = 1.3577e-320 (+) 0. = 1.3577e-320
line 193: PUT rA,small-1 % enable underflow trip
1. 0000000000000398: f61500fd (PUT) rA = 1024 = #400
line 194: TRIP 1,$100,small
1. 000000000000039c: ff0164fe (TRIP) rW=#3a0, rX=#80000000ff0164fe, rY=#abc, rZ=#abc, rB=#6000000000000228, g[255]=#368, -> #00
...
line 214: GET $50,rX
1. 0000000000000000: fe320019 (GET) $50=l[120] = rX = #80000000ff0164fe
line 215: INCH $50,#8200 % ropcode 2
1. 0000000000000004: e4328200 (INCH) $50=l[120] = #80000000ff0164fe + #8200000000000000 = #2000000ff0164fe
line 216: INCMH $50,#ff00-(U_BIT<<8)
1. 0000000000000008: e532fb00 (INCMH) $50=l[120] = #2000000ff0164fe + #fb0000000000 = #200fb00ff0164fe
Warning: TRIP at location 000000000000039c
line 217: TRAP 1
1. 000000000000000c: 00000001 (TRAP) Halt(1)
line 218: 2H PUT rX,$50
1. 0000000000000010: f6190032 (PUT) rX = 144391169772709118 = #200fb00ff0164fe
line 219: GET $255,rB
1. 0000000000000014: feff0000 (GET) $255=g[255] = rB = #6000000000000228
line 220: RESUME
1. 0000000000000018: f9000000 (RESUME) {#200fb00ff0164fe} -> #3a0
(000000000000039c: ..01..rZ (SET)) $1=l[71] = 2748 = #abc, rA=#004fb
--------
line 195: FSUB $100,small,$200 % cause underflow trip
1. 00000000000003a0: 0664fec8 (FSUB) $100=l[170] = 1.3577e-320 (-) 0. = 1.3577e-320, -> #60
...
line 203: PUSHJ $255,Handler
1. 0000000000000060: f3ffffef (PUSHJB) l[68]=254, rO=#6000000000000a28, rL=0, rJ=#64, -> #1c
...
line 221: Handler SETL $5,#abcd
M8[#6000000000000230]=l[70]=#0000000000000000, rS+=8
M8[#6000000000000238]=l[71]=#0000000000000abc, rS+=8
M8[#6000000000000240]=l[72]=#0000000000000000, rS+=8
M8[#6000000000000248]=l[73]=#0000000000000000, rS+=8
M8[#6000000000000250]=l[74]=#0000000000000000, rS+=8
M8[#6000000000000258]=l[75]=#0000000000000000, rS+=8
1. 000000000000001c: e305abcd (SETL) rL=6, $5=l[74] = #abcd
line 222: GET $1,rJ
1. 0000000000000020: fe010004 (GET) $1=l[70] = rJ = #64
line 223: PUSHJ 3,3B
1. 0000000000000024: f2030010 (PUSHJ) l[72]=3, rO=#6000000000000a48, rL=2, rJ=#28, -> #64
Warning: floating point underflow at location 00000000000003a0
--------
line 204: 3H TRAP 0,$1
1. 0000000000000064: 00000001 (TRAP) Halt(1)
line 205: SUB $0,$1,1
1. 0000000000000068: 25000101 (SUBI) $0=l[73] = 43981 - 1 = 43980
line 206: POP 2,0
1. 000000000000006c: f8020000 (POP) l[72]=#abcd, rL=5, rO=#6000000000000a28, -> #28
...
line 224: SUB $10,$3,$4
M8[#6000000000000260]=l[76]=#0000000000000000, rS+=8
M8[#6000000000000268]=l[77]=#0000000000000000, rS+=8
M8[#6000000000000270]=l[78]=#0000000000000000, rS+=8
M8[#6000000000000278]=l[79]=#0000000000000000, rS+=8
M8[#6000000000000280]=l[80]=#0000000000000000, rS+=8
1. 0000000000000028: 240a0304 (SUB) rL=11, $10=l[79] = 43981 - 43980 = 1
line 225: PUT rJ,$1
1. 000000000000002c: f6040001 (PUT) rJ = 100 = #64
line 226: POP 11,(4B-3B)>>2
rS-=8, l[80]=M8[#6000000000000280]=#0000000000000000
rS-=8, l[79]=M8[#6000000000000278]=#0000000000000000
rS-=8, l[78]=M8[#6000000000000270]=#0000000000000000
rS-=8, l[77]=M8[#6000000000000268]=#0000000000000000
rS-=8, l[76]=M8[#6000000000000260]=#0000000000000000
rS-=8, l[75]=M8[#6000000000000258]=#0000000000000000
rS-=8, l[74]=M8[#6000000000000250]=#0000000000000000
rS-=8, l[73]=M8[#6000000000000248]=#0000000000000000
rS-=8, l[72]=M8[#6000000000000240]=#0000000000000000
rS-=8, l[71]=M8[#6000000000000238]=#0000000000000abc
rS-=8, l[70]=M8[#6000000000000230]=#0000000000000000
1. 0000000000000030: f80b0003 (POP) rL=254, rO=#6000000000000230, -> #64+12
--------
line 207: 4H GET $50,rX
1. 0000000000000070: fe320019 (GET) $50=l[120] = rX = #800000000664fec8
line 208: INCH $50,#8100 % ropcode 1
1. 0000000000000074: e4328100 (INCH) $50=l[120] = #800000000664fec8 + #8100000000000000 = #10000000664fec8
line 209: FLOT $60,1
1. 0000000000000078: 093c0001 (FLOTI) $60=l[130] = (flot) 1 = 1.
line 210: PUT rZ,$60
1. 000000000000007c: f61b003c (PUT) rZ = 4607182418800017408 = #3ff0000000000000
line 211: JMP 2F
1. 0000000000000080: f1ffffe4 (JMPB) -> #10
...
line 218: 2H PUT rX,$50
2. 0000000000000010: f6190032 (PUT) rX = 72057594145210056 = #10000000664fec8
line 219: GET $255,rB
2. 0000000000000014: feff0000 (GET) $255=g[255] = rB = #6000000000000228
line 220: RESUME
2. 0000000000000018: f9000000 (RESUME) {#10000000664fec8} -> #3a4
(00000000000003a0: 0664rYrZ (FSUB)) $100=l[170] = 1.3577e-320 (-) 1. = -1., rA=#004fb
--------
line 196: PUT rL,10
1. 00000000000003a4: f714000a (PUTI) rL = min(rL,10) = 10
line 197: PUT rL,small
1. 00000000000003a8: f61400fe (PUT) rL = min(rL,2748) = 10
line 198: PUSHJ 11,@+4
1. 00000000000003ac: f20b0001 (PUSHJ) l[81]=11, rO=#6000000000000290, rL=0, rJ=#3b0, -> #3b0
line 199: UNSAVE $255
(rG,rA)=M8[#6000000000000228]=#f100000000000049
rS-=8, rZ=M8[#6000000000000220]=#0000000000000000
rS-=8, rY=M8[#6000000000000218]=#0000000000000000
rS-=8, rX=M8[#6000000000000210]=#00000000ef28f305
rS-=8, rW=M8[#6000000000000208]=#0000000000000368
rS-=8, rP=M8[#6000000000000200]=#88898a8b8c8d8e8f
rS-=8, rR=M8[#60000000000001f8]=#0000000000000000
rS-=8, rM=M8[#60000000000001f0]=#ff5ffb6a4534a3f7
rS-=8, rJ=M8[#60000000000001e8]=#0000000000000368
rS-=8, rH=M8[#60000000000001e0]=#0001040c2050c1c4
rS-=8, rE=M8[#60000000000001d8]=#8000000000000000
rS-=8, rD=M8[#60000000000001d0]=#0001040c2050c1c4
rS-=8, rB=M8[#60000000000001c8]=#0000000000000000
rS-=8, g[255]=M8[#60000000000001c0]=#0000000000000100
rS-=8, g[254]=M8[#60000000000001b8]=#0000000000000abc
rS-=8, g[253]=M8[#60000000000001b0]=#8000000000000000
rS-=8, g[252]=M8[#60000000000001a8]=#3fe0000000000000
rS-=8, g[251]=M8[#60000000000001a0]=#7ff0000000000000
rS-=8, g[250]=M8[#6000000000000198]=#7ff1000000000000
rS-=8, g[249]=M8[#6000000000000190]=#0000000000010000
rS-=8, g[248]=M8[#6000000000000188]=#0000000000020000
rS-=8, g[247]=M8[#6000000000000180]=#0000000000030000
rS-=8, g[246]=M8[#6000000000000178]=#7f6001b4c67bc809
rS-=8, g[245]=M8[#6000000000000170]=#ff5ffb6a4534a3f7
rS-=8, g[244]=M8[#6000000000000168]=#0102040810204080
rS-=8, g[243]=M8[#6000000000000160]=#809ffe4b398437f7
rS-=8, g[242]=M8[#6000000000000158]=#0102040810204080
rS-=8, g[241]=M8[#6000000000000150]=#2000000000000000
rS-=8, l[41]=M8[#6000000000000148]=#0000000000000029
rS-=8, l[40]=M8[#6000000000000140]=#abc0000000000000
rS-=8, l[39]=M8[#6000000000000138]=#0000000000000000
rS-=8, l[38]=M8[#6000000000000130]=#0000000000000000
rS-=8, l[37]=M8[#6000000000000128]=#0000000000000000
rS-=8, l[36]=M8[#6000000000000120]=#7ebffd1f0bb06c00
rS-=8, l[35]=M8[#6000000000000118]=#7ebffd1f0bb06c00
rS-=8, l[34]=M8[#6000000000000110]=#0102040810204080
rS-=8, l[33]=M8[#6000000000000108]=#ffffffffffffffff
rS-=8, l[32]=M8[#6000000000000100]=#fefdfbf7efdfbf80
rS-=8, l[31]=M8[#60000000000000f8]=#070c142030404000
rS-=8, l[30]=M8[#60000000000000f0]=#3ff0000000000000
rS-=8, l[29]=M8[#60000000000000e8]=#8000000000000244
rS-=8, l[28]=M8[#60000000000000e0]=#3c90000000000000
rS-=8, l[27]=M8[#60000000000000d8]=#7ff0000000000000
rS-=8, l[26]=M8[#60000000000000d0]=#5fefffffffffffff
rS-=8, l[25]=M8[#60000000000000c8]=#5ff0000000000000
rS-=8, l[24]=M8[#60000000000000c0]=#0000000000000001
rS-=8, l[23]=M8[#60000000000000b8]=#3d85780000000000
rS-=8, l[22]=M8[#60000000000000b0]=#3fd0000000000000
rS-=8, l[21]=M8[#60000000000000a8]=#406fe00000000000
rS-=8, l[20]=M8[#60000000000000a0]=#bff0000000000000
rS-=8, l[19]=M8[#6000000000000098]=#43dfd80080000000
rS-=8, l[18]=M8[#6000000000000090]=#43dfd80060000000
rS-=8, l[17]=M8[#6000000000000088]=#bff0000000000000
rS-=8, l[16]=M8[#6000000000000080]=#43dfd8006d319ef3
rS-=8, l[15]=M8[#6000000000000078]=#43dfd8006d319ef2
rS-=8, l[14]=M8[#6000000000000070]=#ffffffffffffffff
rS-=8, l[13]=M8[#6000000000000068]=#8000000000000000
rS-=8, l[12]=M8[#6000000000000060]=#0000000000000001
rS-=8, l[11]=M8[#6000000000000058]=#fff8000000000000
rS-=8, l[10]=M8[#6000000000000050]=#fff8000000000000
rS-=8, l[9]=M8[#6000000000000048]=#bfdfffffffffffff
rS-=8, l[8]=M8[#6000000000000040]=#0000000000000000
rS-=8, l[7]=M8[#6000000000000038]=#00000000ef28f305
rS-=8, l[6]=M8[#6000000000000030]=#00000000ef28f305
rS-=8, l[5]=M8[#6000000000000028]=#0102040810204080
rS-=8, l[4]=M8[#6000000000000020]=#0000000000000308
rS-=8, l[3]=M8[#6000000000000018]=#0000000000000368
rS-=8, l[2]=M8[#6000000000000010]=#0000000000000000
rS-=8, l[1]=M8[#6000000000000008]=#ffffffffffffffff
rS-=8, l[0]=M8[#6000000000000000]=#7fefffffffffffff
1. 00000000000003b0: fb0000ff (UNSAVE) #6000000000000228: rG=241, ..., rL=41
line 200: TRAP 0,Halt,0 % normal exit
1. 00000000000003b4: 00000000 (TRAP) Halt(0)
1243 instructions, 99 mems, 2557 oops; 179 good guesses, 19 bad
(halted at location #00000000000003b4)
 
Program profile:
"silly.mms"
line 214: GET $50,rX
1. 0000000000000000: fe320019 (GET)
line 215: INCH $50,#8200 % ropcode 2
1. 0000000000000004: e4328200 (INCH)
line 216: INCMH $50,#ff00-(U_BIT<<8)
1. 0000000000000008: e532fb00 (INCMH)
line 217: TRAP 1
1. 000000000000000c: 00000001 (TRAP)
line 218: 2H PUT rX,$50
2. 0000000000000010: f6190032 (PUT)
line 219: GET $255,rB
2. 0000000000000014: feff0000 (GET)
line 220: RESUME
2. 0000000000000018: f9000000 (RESUME)
line 221: Handler SETL $5,#abcd
1. 000000000000001c: e305abcd (SETL)
line 222: GET $1,rJ
1. 0000000000000020: fe010004 (GET)
line 223: PUSHJ 3,3B
1. 0000000000000024: f2030010 (PUSHJ)
line 224: SUB $10,$3,$4
1. 0000000000000028: 240a0304 (SUB)
line 225: PUT rJ,$1
1. 000000000000002c: f6040001 (PUT)
line 226: POP 11,(4B-3B)>>2
1. 0000000000000030: f80b0003 (POP)
--------
line 203: PUSHJ $255,Handler
1. 0000000000000060: f3ffffef (PUSHJB)
line 204: 3H TRAP 0,$1
1. 0000000000000064: 00000001 (TRAP)
line 205: SUB $0,$1,1
1. 0000000000000068: 25000101 (SUBI)
line 206: POP 2,0
1. 000000000000006c: f8020000 (POP)
line 207: 4H GET $50,rX
1. 0000000000000070: fe320019 (GET)
line 208: INCH $50,#8100 % ropcode 1
1. 0000000000000074: e4328100 (INCH)
line 209: FLOT $60,1
1. 0000000000000078: 093c0001 (FLOTI)
line 210: PUT rZ,$60
1. 000000000000007c: f61b003c (PUT)
line 211: JMP 2F
1. 0000000000000080: f1ffffe4 (JMPB)
--------
line 29: Main FCMP $0,neg_zero,$5
1. 0000000000000100: 0100fd05 (FCMP)
line 30: FCMP $1,neg_zero,inf
1. 0000000000000104: 0101fdfb (FCMP)
line 31: FCMP $2,inf,sig_nan
1. 0000000000000108: 0102fbfa (FCMP)
line 32: FUN $3,sig_nan,sig_nan
1. 000000000000010c: 0203fafa (FUN)
line 33: FEQL $4,$4,neg_zero
1. 0000000000000110: 030404fd (FEQL)
line 34: FADD $5,half,inf
1. 0000000000000114: 0405fcfb (FADD)
line 35: FADD $6,half,neg_zero
1. 0000000000000118: 0406fcfd (FADD)
line 36: FADD $7,half,half
1. 000000000000011c: 0407fcfc (FADD)
line 37: FADD $8,half,sig_nan
1. 0000000000000120: 0408fcfa (FADD)
line 38: FSUB $9,half,small
1. 0000000000000124: 0609fcfe (FSUB)
line 39: PUT rA,round_off
1. 0000000000000128: f61500f9 (PUT)
line 40: FSUB $9,half,small
1. 000000000000012c: 0609fcfe (FSUB)
line 41: FSUB $9,small,half
1. 0000000000000130: 0609fefc (FSUB)
line 42: FSQRT $10,$9
1. 0000000000000134: 150a0009 (FSQRT)
line 43: FSUB $11,sig_nan,$10
1. 0000000000000138: 060bfa0a (FSUB)
line 44: PUT rA,round_down
1. 000000000000013c: f61500f7 (PUT)
line 45: FSUB $12,half,half
1. 0000000000000140: 060cfcfc (FSUB)
line 46: FSUB $12,$20,$21
1. 0000000000000144: 060c1415 (FSUB)
line 47: FSUB $12,$20,neg_zero
1. 0000000000000148: 060c14fd (FSUB)
line 48: PUT rA,round_up
1. 000000000000014c: f61500f8 (PUT)
line 49: SUB $0,inf,1 % $0 = largest normal number
1. 0000000000000150: 2500fb01 (SUBI)
line 50: FADD $12,$0,small
1. 0000000000000154: 040c00fe (FADD)
line 51: FIX $12,half
1. 0000000000000158: 050c00fc (FIX)
line 52: FIXU $14,ROUND_DOWN,$9
1. 000000000000015c: 070e0309 (FIXU)
line 53: FLOT $15,ROUND_DOWN,addy
1. 0000000000000160: 080f03f6 (FLOT)
line 54: FLOT $16,ROUND_UP,addy
1. 0000000000000164: 081002f6 (FLOT)
line 55: NEG $1,1 % $1 = -1
1. 0000000000000168: 35010001 (NEGI)
line 56: FLOT $17,1
1. 000000000000016c: 09110001 (FLOTI)
line 57: FLOT $17,$1
1. 0000000000000170: 08110001 (FLOT)
line 58: FLOTU $18,255
1. 0000000000000174: 0b1200ff (FLOTUI)
line 59: FLOTU $18,neg_zero
1. 0000000000000178: 0a1200fd (FLOTU)
line 60: FIX $13,ROUND_NEAR,$18
1. 000000000000017c: 050d0412 (FIX)
line 61: SFLOT $18,ROUND_DOWN,addy
1. 0000000000000180: 0c1203f6 (SFLOT)
line 62: SFLOT $19,ROUND_UP,addy
1. 0000000000000184: 0c1302f6 (SFLOT)
line 63: FSUB $20,$18,$19
1. 0000000000000188: 06141213 (FSUB)
line 64: FSUB $20,$16,$15
1. 000000000000018c: 0614100f (FSUB)
line 65: SFLOT $20,1
1. 0000000000000190: 0d140001 (SFLOTI)
line 66: SFLOT $20,$1
1. 0000000000000194: 0c140001 (SFLOT)
line 67: SFLOTU $21,$1
1. 0000000000000198: 0e150001 (SFLOTU)
line 68: SFLOTU $21,255
1. 000000000000019c: 0f1500ff (SFLOTUI)
line 69: FMUL $22,neg_zero,inf
1. 00000000000001a0: 1016fdfb (FMUL)
line 70: FMUL $22,half,half
1. 00000000000001a4: 1016fcfc (FMUL)
line 71: FMUL $23,small,$0
1. 00000000000001a8: 1017fe00 (FMUL)
line 72: PUT rE,half
1. 00000000000001ac: f60200fc (PUT)
line 73: FCMPE $24,half,$21
1. 00000000000001b0: 1118fc15 (FCMPE)
line 74: FCMPE $24,neg_zero,small
1. 00000000000001b4: 1118fdfe (FCMPE)
line 75: FCMPE $24,neg_zero,half
1. 00000000000001b8: 1118fdfc (FCMPE)
line 76: FCMPE $24,half,inf
1. 00000000000001bc: 1118fcfb (FCMPE)
line 77: FEQLE $24,$15,$16
1. 00000000000001c0: 13180f10 (FEQLE)
line 78: PUT rE,neg_zero
1. 00000000000001c4: f60200fd (PUT)
line 79: FEQLE $24,half,half
1. 00000000000001c8: 1318fcfc (FEQLE)
line 80: FUNE $24,half,half
1. 00000000000001cc: 1218fcfc (FUNE)
line 81: FSQRT $25,ROUND_UP,$0
1. 00000000000001d0: 15190200 (FSQRT)
line 82: FDIV $26,$0,$25
1. 00000000000001d4: 141a0019 (FDIV)
line 83: PUT rA,$50
1. 00000000000001d8: f6150032 (PUT)
line 84: FDIV $26,$0,$25
1. 00000000000001dc: 141a0019 (FDIV)
line 85: FMUL $27,$25,$25
1. 00000000000001e0: 101b1919 (FMUL)
line 86: FREM $28,$9,half
1. 00000000000001e4: 161c09fc (FREM)
line 87: FREM $29,$9,small
1. 00000000000001e8: 161d09fe (FREM)
line 88: FINT $30,$9
1. 00000000000001ec: 171e0009 (FINT)
line 89: FINT $30,ROUND_UP,small
1. 00000000000001f0: 171e02fe (FINT)
line 90: MUL $31,flip,flip
1. 00000000000001f4: 181ff4f4 (MUL)
line 91: MUL $32,flip,$1
1. 00000000000001f8: 1820f401 (MUL)
line 92: MUL $33,flip,2
1. 00000000000001fc: 1921f402 (MULI)
line 93: DIV $32,$32,$1
1. 0000000000000200: 1c202001 (DIV)
line 94: DIV $32,neg_zero,$1
1. 0000000000000204: 1c20fd01 (DIV)
line 95: MULU $32,flip,$1
1. 0000000000000208: 1a20f401 (MULU)
line 96: MULU $31,flip,flip
1. 000000000000020c: 1a1ff4f4 (MULU)
line 97: GET $33,rH
1. 0000000000000210: fe210003 (GET)
line 98: PUT rD,$33
1. 0000000000000214: f6010021 (PUT)
line 99: DIV $33,$1,3
1. 0000000000000218: 1d210103 (DIVI)
line 100: DIVU $34,$31,flip
1. 000000000000021c: 1e221ff4 (DIVU)
line 101: ADD $35,addy,addz
1. 0000000000000220: 2023f6f5 (ADD)
line 102: FADD $36,addy,addz
1. 0000000000000224: 0424f6f5 (FADD)
line 103: CMP $37,$36,$35
1. 0000000000000228: 30252423 (CMP)
line 104: GETA $3,1F
1. 000000000000022c: f4030004 (GETA)
line 105: PUT rW,$3
1. 0000000000000230: f6180003 (PUT)
line 106: LDT $6,Start_Inst
1. 0000000000000234: 8906f100 (LDTI)
line 107: LDTU $7,Final_Inst
1. 0000000000000238: 8b07f104 (LDTUI)
line 108: 1H CMP $5,$6,$7
28. 000000000000023c: 30050607 (CMP)
line 109: BNN $5,1F
28. 0000000000000240: 48050004 (BNN)
line 110: INCML $6,#100 % increase the opcode
27. 0000000000000244: e6060100 (INCML)
line 111: PUT rX,$6 % ropcode 0
27. 0000000000000248: f6190006 (PUT)
line 112: RESUME % return to 1B
27. 000000000000024c: f9000000 (RESUME)
line 113: 1H BN $0,@+4*6
1. 0000000000000250: 40000006 (BN)
line 114: PBN $0,@-4*1
1. 0000000000000254: 5100ffff (PBNB)
line 115: BNN $0,@+4*6
1. 0000000000000258: 48000006 (BNN)
line 116: PBN $0,@+4*5
1. 000000000000025c: 50000005 (PBN)
line 117: PBNN $0,@+4*5
1. 0000000000000260: 58000005 (PBNN)
line 118: BN $0,@-4*3
1. 0000000000000264: 4100fffd (BNB)
line 119: BNN $0,@-4*3
1. 0000000000000268: 4900fffd (BNNB)
line 120: PBN $0,@-4*3
line 121: PBNN $0,@-4*3
1. 0000000000000270: 5900fffd (PBNNB)
line 122: BZ $0,@+4*6
1. 0000000000000274: 42000006 (BZ)
line 123: PBZ $0,@-4*1
1. 0000000000000278: 5300ffff (PBZB)
line 124: BNZ $0,@+4*6
1. 000000000000027c: 4a000006 (BNZ)
line 125: PBZ $0,@+4*5
1. 0000000000000280: 52000005 (PBZ)
line 126: PBNZ $0,@+4*5
1. 0000000000000284: 5a000005 (PBNZ)
line 127: BZ $0,@-4*3
1. 0000000000000288: 4300fffd (BZB)
line 128: BNZ $0,@-4*3
1. 000000000000028c: 4b00fffd (BNZB)
line 129: PBZ $0,@-4*3
line 130: PBNZ $0,@-4*3
1. 0000000000000294: 5b00fffd (PBNZB)
line 131: BP $0,@+4*6
1. 0000000000000298: 44000006 (BP)
line 132: PBP $0,@-4*1
line 133: BNP $0,@+4*6
1. 00000000000002a0: 4c000006 (BNP)
line 134: PBP $0,@+4*5
1. 00000000000002a4: 54000005 (PBP)
line 135: PBNP $0,@+4*5
1. 00000000000002a8: 5c000005 (PBNP)
line 136: BP $0,@-4*3
1. 00000000000002ac: 4500fffd (BPB)
line 137: BNP $0,@-4*3
1. 00000000000002b0: 4d00fffd (BNPB)
line 138: PBP $0,@-4*3
1. 00000000000002b4: 5500fffd (PBPB)
line 139: PBNP $0,@-4*3
1. 00000000000002b8: 5d00fffd (PBNPB)
line 140: BOD $0,@+4*6
1. 00000000000002bc: 46000006 (BOD)
line 141: PBOD $0,@-4*1
line 142: BEV $0,@+4*6
1. 00000000000002c4: 4e000006 (BEV)
line 143: PBOD $0,@+4*5
1. 00000000000002c8: 56000005 (PBOD)
line 144: PBEV $0,@+4*5
1. 00000000000002cc: 5e000005 (PBEV)
line 145: BOD $0,@-4*3
1. 00000000000002d0: 4700fffd (BODB)
line 146: BEV $0,@-4*3
1. 00000000000002d4: 4f00fffd (BEVB)
line 147: PBOD $0,@-4*3
1. 00000000000002d8: 5700fffd (PBODB)
line 148: PBEV $0,@-4*3
1. 00000000000002dc: 5f00fffd (PBEVB)
line 149: LDA $4,Load_Test+4
1. 00000000000002e0: 2304f10c (ADDUI)
line 150: GETA $3,1F
1. 00000000000002e4: f4030004 (GETA)
line 151: PUT rW,$3
1. 00000000000002e8: f6180003 (PUT)
line 152: LDTU $7,Load_End
1. 00000000000002ec: 8b07f124 (LDTUI)
line 153: LDTU $6,Load_Begin
1. 00000000000002f0: 8b06f120 (LDTUI)
line 154: 1H CMPU $8,$6,$7
57. 00000000000002f4: 32080607 (CMPU)
line 155: BNN $8,1F
57. 00000000000002f8: 4808000c (BNN)
line 156: INCML $6,#100 % increase the opcode
56. 00000000000002fc: e6060100 (INCML)
line 157: PUT rX,$6
56. 0000000000000300: f6190006 (PUT)
line 158: RESUME % return to 1B
56. 0000000000000304: f9000000 (RESUME)
line 159: 2H OCTA #fedcba9876543210 % becomes Jmp_Pop
1. 000000000000030c: f8000000 (POP)
line 160: OCTA #ffeeddccbbaa9988 % becomes Jmp_Pop
1. 0000000000000310: f0000002 (JMP)
line 161: NEG ry,addy
1. 0000000000000318: 34f300f6 (NEG)
line 162: SET rz,flip
1. 000000000000031c: c1f2f400 (ORI)
line 163: PUT rM,addz
1. 0000000000000320: f60500f5 (PUT)
line 164: POP
1. 0000000000000324: f8000000 (POP)
line 165: 1H GETA $4,2B
1. 0000000000000328: f504fff8 (GETAB)
line 166: SETL $7,4*11
1. 000000000000032c: e307002c (SETL)
line 167: GO $7,$7,$4
1. 0000000000000330: 9e070704 (GO)
line 168: GO $7,$4,4*12
1. 0000000000000334: 9f070430 (GOI)
line 169: PRELD 70,$4,$4
1. 0000000000000338: 9a460404 (PRELD)
line 170: PRELD 70,$4,0
1. 000000000000033c: 9b460400 (PRELDI)
line 171: PREGO 70,$4,$4
1. 0000000000000340: 9c460404 (PREGO)
line 172: PREGO 70,$4,0
1. 0000000000000344: 9d460400 (PREGOI)
line 173: CSWAP $3,Load_Test+13
1. 0000000000000348: 9503f115 (CSWAPI)
line 174: GETA $3,1F
1. 000000000000034c: f4030007 (GETA)
line 175: PUT rW,$3
1. 0000000000000350: f6180003 (PUT)
line 176: SETL rz,1
1. 0000000000000354: e3f20001 (SETL)
line 177: ADD ry,$4,4
1. 0000000000000358: 21f30404 (ADDI)
line 178: LDOU $40,Jmp_Pop
1. 000000000000035c: 8f28f118 (LDOUI)
line 179: LDTU $7,Big_End
1. 0000000000000360: 8b07f12c (LDTUI)
line 180: LDTU $6,Big_Begin
1. 0000000000000364: 8b06f128 (LDTUI)
line 181: 1H CMPU $8,$6,$7
81. 0000000000000368: 32080607 (CMPU)
line 182: BNN $8,1F
81. 000000000000036c: 48080005 (BNN)
line 183: INCML $6,#100 % increase the opcode
80. 0000000000000370: e6060100 (INCML)
line 184: PUT rX,$6
80. 0000000000000374: f6190006 (PUT)
line 185: SET $5,rz
80. 0000000000000378: c105f200 (ORI)
line 186: RESUME % return to 1B
80. 000000000000037c: f9000000 (RESUME)
line 187: 1H SL $40,small,51
1. 0000000000000380: 3928fe33 (SLI)
line 188: SL $40,small,52
1. 0000000000000384: 3928fe34 (SLI)
line 189: SAVE $255,0
1. 0000000000000388: faff0000 (SAVE)
line 190: PUT rG,small-$0
1. 000000000000038c: f71300fe (PUTI)
line 191: INCL small-1,U_BIT<<8
1. 0000000000000390: e7fd0400 (INCL)
line 192: FADD $100,small,$200
1. 0000000000000394: 0464fec8 (FADD)
line 193: PUT rA,small-1 % enable underflow trip
1. 0000000000000398: f61500fd (PUT)
line 194: TRIP 1,$100,small
1. 000000000000039c: ff0164fe (TRIP)
line 195: FSUB $100,small,$200 % cause underflow trip
1. 00000000000003a0: 0664fec8 (FSUB)
line 196: PUT rL,10
1. 00000000000003a4: f714000a (PUTI)
line 197: PUT rL,small
1. 00000000000003a8: f61400fe (PUT)
line 198: PUSHJ 11,@+4
1. 00000000000003ac: f20b0001 (PUSHJ)
line 199: UNSAVE $255
1. 00000000000003b0: fb0000ff (UNSAVE)
line 200: TRAP 0,Halt,0 % normal exit
1. 00000000000003b4: 00000000 (TRAP)
1243 instructions, 99 mems, 2557 oops; 179 good guesses, 19 bad
(halted at location #00000000000003b4)
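
The MOR, MORI, MXOR, MXORI, SADD and SADDI lines of the trace above can be checked by hand with a small model. The following is a minimal Python sketch, not the simulator's code. It assumes the usual reading of these instructions: each octabyte is an 8x8 bit matrix whose rows are its bytes, most significant first; row i of the result combines those rows k of Y for which bit k (counted from the left) of byte i of Z is set; and SADD counts the bits of Y that are not set in Z. The names `mor`, `mxor`, `sadd` and `MASK64` are illustrative only.

```python
MASK64 = (1 << 64) - 1

def _rows(v):
    """Split a 64-bit value into its 8 bytes, most significant first."""
    return (v & MASK64).to_bytes(8, "big")

def mor(y, z, use_xor=False):
    """Boolean matrix product: row i of the result combines rows k of y
    for which bit k (from the left) of byte i of z is set."""
    yr, zr = _rows(y), _rows(z)
    out = bytearray(8)
    for i in range(8):
        acc = 0
        for k in range(8):
            if zr[i] & (0x80 >> k):
                acc = acc ^ yr[k] if use_xor else acc | yr[k]
        out[i] = acc
    return int.from_bytes(out, "big")

def mxor(y, z):
    """Same as mor, but rows are combined with XOR instead of OR."""
    return mor(y, z, use_xor=True)

def sadd(y, z):
    """Sideways add: number of 1 bits in y that are 0 in z."""
    return bin(y & ~z & MASK64).count("1")

Y = 0x809ffe4b398437f7
Z = 0x0102040810204080
print(hex(mor(Y, Z)))   # the trace reports #f73784394bfe9f80 (byte reversal)
print(hex(mor(Y, 5)))   # the trace reports #f7
print(hex(mxor(Y, 5)))  # the trace reports #73
print(sadd(Y, Z), sadd(Y, 5))  # the trace reports 31 and 34
```

With Z = #0102040810204080 each result row selects exactly one row of Y, which is why MOR and MXOR agree on that operand and both reverse the bytes.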
/number1.mms
0,0 → 1,18
NEG $1,1
STCO 1,$1,1
CMPU $1,$1,1
STB $1,$1,$1
LDOU $1,$1,$1
INCH $1,1
16ADDU $1,$1,$1
MULU $1,$1,$1
PUT rA,1
STW $1,$1,1
SADD $1,$1,1
FLOT $1,$1
PUT rB,$1
XOR $1,$1,1
PBOD $1,@-4*1
NOR $1,$1,$1
SR $1,$1,1
SRU $1,$1,1
/copy.mms
0,0 → 1,63
* SAMPLE PROGRAM: COPY A GIVEN FILE TO STANDARD OUTPUT
 
t IS $255
argc IS $0
argv IS $1
s IS $2
Buf_Size IS 5 ridiculously small for testing
LOC Data_Segment
Buffer LOC @+Buf_Size
GREG @
Arg0 OCTA 0,TextRead
Arg1 OCTA Buffer,Buf_Size
LOC #200 main(argc,argv) {
Main CMP t,argc,2 if (argc==2) goto openit
PBZ t,OpenIt
GETA t,1F fputs("Usage: ",stderr)
TRAP 0,Fputs,StdErr
LDOU t,argv,0 fputs(argv[0],stderr)
TRAP 0,Fputs,StdErr
GETA t,2F fputs(" filename\n",stderr)
Quit TRAP 0,Fputs,StdErr
NEG t,0,1 quit: exit(-1)
TRAP 0,Halt,0
1H BYTE "Usage: ",0
LOC (@+3)&-4 align to tetrabyte
2H BYTE " filename",#a,0
 
OpenIt LDOU s,argv,8 openit: s=argv[1]
STOU s,Arg0
LDA t,Arg0 fopen(argv[1],"r",file[3])
TRAP 0,Fopen,3
PBNN t,CopyIt if (no error) goto copyit
GETA t,1F fputs("Can't open file ",stderr)
TRAP 0,Fputs,StdErr
SET t,s fputs(argv[1],stderr)
TRAP 0,Fputs,StdErr
GETA t,2F fputs("!\n",stderr)
JMP Quit goto quit
1H BYTE "Can't open file ",0
LOC (@+3)&-4 align to tetrabyte
2H BYTE "!",#a,0
 
CopyIt LDA t,Arg1 copyit:
TRAP 0,Fread,3 items=fread(buffer,1,buf_size,file[3])
BN t,EndIt if (items < buf_size) goto endit
LDA t,Arg1 items=fwrite(buffer,1,buf_size,stdout)
TRAP 0,Fwrite,StdOut
PBNN t,CopyIt if (items >= buf_size) goto copyit
Trouble GETA t,1F trouble: fputs("Trouble w...!",stderr)
JMP Quit goto quit
1H BYTE "Trouble writing StdOut!",#a,0
 
EndIt INCL t,Buf_Size
BN t,ReadErr if (ferror(file[3])) goto readerr
STO t,Arg1+8
LDA t,Arg1 n=fwrite(buffer,1,items,stdout)
TRAP 0,Fwrite,StdOut
BN t,Trouble if (n < items) goto trouble
TRAP 0,Halt,0 exit(0)
ReadErr GETA t,1F readerr: fputs("Trouble r...!",stderr)
JMP Quit goto quit }
1H BYTE "Trouble reading!",#a,0
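
The control flow of copy.mms, as described by its own C-style comments, is a fixed-size buffered copy that stops at the first short read. The following is a rough Python sketch of that loop only. The function name `copy_stream` and its parameters are hypothetical, error reporting is left out, and Python streams stand in for the MMIX `Fread`/`Fwrite` calls.

```python
import io

def copy_stream(src, dst, buf_size=5):
    """Copy src to dst in chunks of buf_size bytes; return bytes copied.

    A full chunk corresponds to the CopyIt loop of the listing, and a
    short (possibly empty) chunk to its EndIt branch.
    """
    total = 0
    while True:
        chunk = src.read(buf_size)
        if chunk:
            dst.write(chunk)
            total += len(chunk)
        if len(chunk) < buf_size:  # short read: end of input
            return total

source = io.BytesIO(b"hello, world")
sink = io.BytesIO()
copied = copy_stream(source, sink)
print(copied, sink.getvalue())
```

The tiny default buffer mirrors `Buf_Size IS 5` in the listing, which forces several passes through the loop even for short inputs.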
/sub1.mms
0,0 → 1,22
x0 GREG Data_Segment
t IS $255
LOC Data_Segment+8
OCTA 1,3,2,3
LOC Data_Segment+8*100
OCTA -1
LOC #100
* Maximum of X[1..100]
j IS $0 ;m IS $1 ;kk IS $2 ;xk IS $3
Max100 SETL kk,100*8
LDO m,x0,kk
JMP 2F
3H LDO xk,x0,kk
CMP t,xk,m
PBNP t,5F
SET m,xk
2H SR j,kk,3
5H SUB kk,kk,8
PBP kk,3B
6H POP 2,0
 
Main PUSHJ 0,Max100
/sub2.mms
0,0 → 1,23
x0 GREG Data_Segment
t IS $255
LOC Data_Segment+8
OCTA 1,3,2,3
LOC Data_Segment+8*100
OCTA -1
LOC #100
* Maximum of X[1..100]
j GREG ;m GREG ;kk GREG ;xk GREG ; GREG @
GoMax100 SETL kk,100*8
LDO m,x0,kk
JMP 1F
3H LDO xk,x0,kk
CMP t,xk,m
PBNP t,5F
4H SET m,xk
1H SR j,kk,3
5H SUB kk,kk,8
PBP kk,3B
6H GO kk,$0,0
 
Main GO $0,GoMax100
 
/saddle1.mms
0,0 → 1,52
* Exercise 1.3.2'--18, Solution 1
LOC #100
t IS $255
a00 GREG Data_Segment
a10 GREG Data_Segment+8
ij IS $0 % element index and return register
j GREG % column index
k GREG % size of list of minima
x GREG % current minimum
y GREG % current element
Saddle SET ij,9*8
RowMin SET j,8
LDB x,a10,ij Candidate for row minimum
2H SET k,0 Set list empty.
4H INCL k,1
STB j,a00,k Put column index in list.
1H SUB ij,ij,1 Go left one.
SUB j,j,1
BZ j,ColMax Done with row?
3H LDB y,a10,ij
SUB t,x,y
PBN t,1B Is \.x still minimum?
SET x,y
PBP t,2B New minimum?
JMP 4B Remember another minimum.
ColMax LDB $1,a00,k Get column from list.
ADD j,$1,9*8-8
1H LDB y,a10,j
CMP t,x,y
PBN t,No Is row min${}<{}$column element?
SUB j,j,8
PBP j,1B Done with column?
Yes ADD ij,ij,$1 Yes; $\.{ij}\gets{}$index of saddle.
LDA ij,a10,ij
POP 1,0
No SUB k,k,1 Is list empty?
BP k,ColMax If not, try again.
PBP ij,RowMin Have all rows been tried?
POP 1,0 Yes; $\$0=0$, no saddle.\quad\slug\endmmix
 
aaaa GREG 6364136223846793005 C E Haynes's multiplier
Main SET ij,9*8 assume that $1 = seed
1H MULU $1,$1,aaaa
INCL $1,1
MULU x,$1,5
GET x,rH
SUB x,x,2
STB x,a10,ij
SUB ij,ij,1
PBP ij,1B
PUSHJ 2,Saddle
JMP Main
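
Note on the test driver in saddle1.mms: the Main loop above fills the matrix with pseudo-random bytes. It advances a 64-bit linear congruential generator (multiply by the constant aaaa, add 1), multiplies the result by 5, takes the high 64 bits of that 128-bit product from rH, and subtracts 2, so each entry lies in the range -2..2. The C sketch below reproduces that arithmetic under the assumption of wraparound unsigned 64-bit math; the names lcg_next and scaled_entry are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define LCG_MULT UINT64_C(6364136223846793005)  /* the constant aaaa above */

/* MULU $1,$1,aaaa; INCL $1,1  (arithmetic modulo 2^64) */
static uint64_t lcg_next(uint64_t seed)
{
    return seed * LCG_MULT + 1;
}

/* MULU x,$1,5; GET x,rH; SUB x,x,2
   The high 64 bits of 5*x are computed from the two 32-bit halves of x,
   so no 128-bit integer type is needed. */
static int scaled_entry(uint64_t x)
{
    uint64_t a = x >> 32;                 /* high half of x */
    uint64_t b = x & UINT64_C(0xffffffff); /* low half of x  */
    uint64_t hi = (5 * a + ((5 * b) >> 32)) >> 32;
    assert(hi < 5);
    return (int)hi - 2;
}
```

Taking the high part of the product, rather than a remainder, uses the most significant bits of the generator state, which are the better-behaved bits of a power-of-two-modulus congruential generator.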
/saddle2.mms
0,0 → 1,59
* Exercise 1.3.2'--18, Solution 2
LOC #100
t IS $255
a00 GREG Data_Segment
a10 GREG Data_Segment+8
a20 GREG Data_Segment+8*2
ij GREG % element index
ii GREG % row index times 8
j GREG % column index
x GREG % current maximum
y GREG % current element
z GREG % current min max
ans IS $0 % return register
Phase1 SET j,8 Start at column 8.
SET z,1000 $\.z\gets\infty$ (more or less).
3H ADD ij,j,9*8-2*8
LDB x,a20,ij
1H LDB y,a10,ij
CMP t,x,y Is x<y?
CSN x,t,y If so, update the maximum.
2H SUB ij,ij,8 Move up one.
PBP ij,1B
STB x,a10,ij Store column maximum.
CMP t,x,z Is x<z?
CSN z,t,x If so, update the min max.
SUB j,j,1 Move left a column.
PBP j,3B
Phase2 SET ii,9*8-8 At this point $\.z=\min_jC(j)$
3H ADD ij,ii,8 Prepare to search a row.
SET j,8
1H LDB x,a10,ij
SUB t,z,x Is $\.z>a_{ij}$?
PBP t,No No saddle in this row
PBN t,2F
LDB x,a00,j Is $a_{ij}=C(j)$?
CMP t,x,z
CSZ ans,t,ij If so, remember a possible saddle point.
2H SUB j,j,1 Move left in row.
SUB ij,ij,1
PBP j,1B
LDA ans,a10,ans A saddle point was found here.
POP 1,0
No SUB ii,ii,8
PBP ii,3B Try another row.
SET ans,0
POP 1,0 $\.{ans} = 0$; no saddle.\quad\slug
 
aaaa GREG 6364136223846793005 C E Haynes's multiplier
Main SET ij,9*8 assume that $1 = seed
1H MULU $1,$1,aaaa
INCL $1,1
MULU x,$1,5
GET x,rH
SUB x,x,2
STB x,a10,ij
SUB ij,ij,1
PBP ij,1B
PUSHJ 2,Phase1
JMP Main
/halves.mms
0,0 → 1,29
% Example program ... 2^-n in decimal
%
LOC #2000000000000000 % Data segment
HALF BYTE '5'
LOC @+'0'-1
BYTE "0011223344" % Table of half-digits
DATA BYTE '1',0
%
GREGTOP $g250
pbase GREG DATA-1
half GREG HALF
p GREG 0
starp GREG 0
carry GREG 0
acc GREG 0
LOC #1000
Main OR p,pbase,0 % p = &DATA-1.
SETL carry,0 % carry = 0.
JMP 1F
Loop ADD acc,acc,carry % acc += carry.
ZSOD carry,starp,5 % carry = 5[*p odd].
STB acc,p,0 % *p = acc.
1H LDB starp,p,1
INCL p,1 % p++.
LDB acc,half,starp % acc = half[*p].
PBNZ starp,Loop % repeat until *p='\0'.
STB acc,p,0 % *p = '5'.
JMP Main % repeat indefinitely.
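
Note on halves.mms: each pass of the loop above halves a decimal digit string in place. Every digit d becomes d/2 plus a carry of 5 when the digit to its left was odd (the ZSOD line), and a final '5' is appended because the last digit of the string is always odd. The C sketch below follows the same idea. The function name halve and the explicit capacity argument are assumptions added for safety; the MMIX program relies on zeroed memory after the string instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replace the digit string in buf (first digit is the
   integer part, e.g. "1" for 2^0) by the digits of half its value.
   cap is the total size of buf, including room for the terminator. */
static void halve(char *buf, size_t cap)
{
    int carry = 0;
    char *p = buf;
    assert(buf != NULL && strlen(buf) + 2 <= cap);
    for (; *p != '\0'; p++) {
        int d = *p - '0';
        *p = (char)('0' + d / 2 + carry);  /* acc = half[*p] + carry */
        carry = (d & 1) ? 5 : 0;           /* carry = 5[*p odd] */
    }
    *p++ = '5';      /* the last digit was odd, so half of it leaves a 5 */
    *p = '\0';
}
```

Starting from "1", successive calls give "05", "025", "0125", "00625", that is, 0.5, 0.25, 0.125, 0.0625.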
 
/test.mmconfig
0,0 → 1,37
% CONFIGURATION TEST
% The following erroneous lines have been commented out one by one:
%sh*t % obscene
%memaddresstime 0 % too small
%memaddresstime unit % unreadable
%branchpredictbits 9 % too large
%membusbytes 9 % not a power of two
%ITcache unit % unknown cache parameter
%mul0 0 % too small
%mul0 256 % too big
%unit antidisestablishmentarianism % too long
%unit 0 0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEG % eh?
%unit 1 0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEFG % 65
%unit 2 0000000000000000000000000000000000000000000000000000000000000000 % 0's
%Dcache blocksize 1024 % exceeds Scache
%Dcache granularity 16 % exceeds blocksize
%Scache granularity 16 % differs from Dcache
memaddresstime 4
memreadtime 5 memwritetime 6 % don't ask why
membusbytes 16
branchpredictbits 2
branchaddressbits 1
branchhistorybits 1
branchdualbits 1
%branchdualbits 30
memchunksmax 2
hashprime 3
Scache blocksize 32
Scache setsize 2
Scache associativity 4 lru
Scache accesstime 2
Icache victimsize 2
unit UNI1 ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
unit UNI2 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
sh 1 1 1
 
 
/harm.mms
0,0 → 1,47
* Sum of Rounded Harmonic Series
MaxN IS 10
a GREG % Accumulator
c GREG % $2\cdot10^n$
d GREG % Divisor or digit
r GREG % Scaled reciprocal
s GREG % Scaled sum
m GREG % $m_k$
mm GREG % $m_{k+1}$
nn GREG % $n-\.{MaxN}$
LOC Data_Segment
dec GREG @+3 % decimal point loc
BYTE " ."
% LOC @+MaxN+6
LOC #100
Main NEG nn,MaxN-1
SET c,20
1H SET m,1
SR s,c,1
JMP 2F
3H SUB a,c,1
SL d,r,1
SUB d,d,1
DIV mm,a,d
4H SUB a,mm,m
MUL a,r,a
ADD s,s,a
SET m,mm
2H ADD a,c,m
2ADDU d,m,2
DIV r,a,d
PBNZ r,3B
5H ADD a,nn,MaxN+1
SET d,#a
JMP 7F
6H DIV s,s,10
GET d,rR
INCL d,'0'
7H STB d,dec,a
SUB a,a,1
BZ a,@-4
PBNZ s,6B
8H SUB $255,dec,3
TRAP 0,Fputs,StdOut
9H INCL nn,1
MUL c,c,10
PBNP nn,1B
/sim.mms
0,0 → 1,1136
% Stripped-Down Simulator for MMIX, derived from MMIX-SIM
% To run it on a program like "foo bar"
% first say "mmix -Dfoo.mmb foo bar"
% then "mmix <options> sim foo.mmb"
 
% I apologize for lack of comments; they're in the book though
 
t IS $255
lring_size IS 256 % octabytes in the local register ring
 
LOC Data_Segment
Global LOC @+8*256
g GREG Global % base of 256 global registers
Local LOC @+8*lring_size
l GREG Local % base of lring_size local registers
GREG @
IOArgs OCTA 0,BinaryRead
Chunk0 IS @
 
LOC #100
PREFIX :Mem:
head GREG % address of first chunk
curkey GREG % KEY(head)
alloc GREG % address of next chunk to allocate
Chunk IS #1000 bytes per chunk, is power of 2
addr IS $0
key IS $1
test IS $2
newlink IS $3
p IS $4 % LINK(p)=head
t IS :t
 
KEY IS 0
LINK IS 8
DATA IS 16
nodesize GREG Chunk+3*8 pad with 8 zero bytes
mask GREG Chunk-1
 
:MemFind ANDN key,addr,mask
CMPU t,key,curkey
PBZ t,4F
BN addr,:Error
SET newlink,head
1H SET p,head
LDOU head,p,LINK
PBNZ head,2F
SET head,alloc
STOU key,head,KEY
ADDU alloc,alloc,nodesize
JMP 3F
2H LDOU test,head,KEY
CMPU t,test,key
BNZ t,1B
3H LDOU t,head,LINK
STOU newlink,head,LINK
SET curkey,key
STOU t,p,LINK
4H SUBU t,addr,key
LDA $0,head,DATA
ADDU $0,t,$0
POP 1,0
PREFIX :
 
res IS $2
arg IS res+1
 
ss GREG % rS
oo GREG % rO
ll GREG % 8*rL
gg GREG % 8*rG
aa GREG % rA
ii GREG % rI
uu GREG % rU
cc GREG % rC
 
lring_mask GREG 8*lring_size-1
:GetReg CMPU t,$0,gg
BN t,1F
LDOU $0,g,$0
POP 1,0
1H CMPU t,$0,ll
ADDU $0,$0,oo
AND $0,$0,lring_mask
LDOU $0,l,$0
CSNN $0,t,0
POP 1,0
 
:StackStore GET $0,rJ
AND t,ss,lring_mask \S82
LDOU $1,l,t
SET arg,ss
PUSHJ res,MemFind
STOU $1,res,0 M[rS]<-l[rS]
ADDU ss,ss,8
PUT rJ,$0
POP
:StackLoad GET $0,rJ
SUBU ss,ss,8 \S83
SET arg,ss
PUSHJ res,MemFind
LDOU $1,res,0
AND t,ss,lring_mask
STOU $1,l,t
PUT rJ,$0
POP
:StackRoom SUBU t,ss,oo idiom in \S81,\S101,\S102
SUBU t,t,ll
AND t,t,lring_mask
PBNZ t,1F
GET $0,rJ
PUSHJ res,StackStore
PUT rJ,$0
1H POP
 
* The main loop
loc GREG % where the simulator is at
inst_ptr GREG % where the simulator will be next
inst GREG % the current instruction being simulated
resuming GREG % are we resuming an instruction in rX?
 
Fetch PBZ resuming,1F \S60 (main simulation loop)
SUBU loc,inst_ptr,4
LDTU inst,g,8*rX+4
JMP 2F
1H SET loc,inst_ptr
SET arg,loc
PUSHJ res,MemFind
LDTU inst,res,0
ADDU inst_ptr,loc,4
2H CMPU t,loc,g
BNN t,Error loc>=Data_Segment
 
op GREG % opcode of the current instruction
xx GREG % X field of the current instruction
yy GREG % Y field of the current instruction
zz GREG % Z field of the current instruction
yz GREG % YZ field of the current instruction
f GREG % packed information about the current op
xxx GREG % X field times 8
x GREG % result, or X operand
y GREG % Y operand
z GREG % Z operand
xptr GREG % location where x should be stored
exc GREG % arithmetic exceptions
 
Z_is_immed_bit IS #1
Z_is_source_bit IS #2
Y_is_immed_bit IS #4
Y_is_source_bit IS #8
X_is_source_bit IS #10
X_is_dest_bit IS #20
Rel_addr_bit IS #40
Mem_bit IS #80
 
Info IS #1000
Done IS Info+8*256
info GREG Info % base address for master info table
c255 GREG 8*255
c256 GREG 8*256
 
MOR op,inst,#8
MOR xx,inst,#4
MOR yy,inst,#2
MOR zz,inst,#1
0H GREG -#10000
ANDN yz,inst,0B
SLU xxx,xx,3
SLU t,op,3
LDOU f,info,t
SET x,0
SET y,0
SET z,0
SET exc,0
AND t,f,Rel_addr_bit
PBZ t,1F
PBEV f,2F Convert rel to abs, \S70
9H GREG -#1000000
ANDN yz,inst,9B xyz
ADDU t,yz,9B
JMP 3F
2H ADDU t,yz,0B
3H CSOD yz,op,t
SL t,yz,2
ADDU yz,loc,t
1H PBNN resuming,Install_X Install operands \S71
LDOU y,g,8*rY Install special operands \S127
LDOU z,g,8*rZ
BOD resuming,Install_Y
0H GREG #C1<<56+(x-$0)<<48+(z-$0)<<40+1<<16+X_is_dest_bit
SET f,0B Change to ORI instruction
LDOU exc,g,8*rX
MOR exc,exc,#20
JMP XDest
Install_X AND t,f,X_is_source_bit
PBZ t,1F
SET arg,xxx
PUSHJ res,GetReg
SET x,res
1H SRU t,f,5
AND t,t,#f8
PBZ t,Install_Z
LDOU x,g,t Set x from third op, \S79
Install_Z AND t,f,Z_is_source_bit
PBZ t,1F
SLU arg,zz,3
PUSHJ res,GetReg
SET z,res
JMP Install_Y
1H CSOD z,f,zz Z_is_immed_bit
AND t,op,#f0
CMPU t,t,#e0
PBNZ t,Install_Y
AND t,op,#3 Set z as immediate wyde, \S78
NEG t,3,t
SLU t,t,4
SLU z,yz,t
SET y,x
Install_Y AND t,f,Y_is_immed_bit
PBZ t,1F
SET y,yy
SLU t,yy,40
ADDU f,f,t
1H AND t,f,Y_is_source_bit
BZ t,1F
SLU arg,yy,3
PUSHJ res,GetReg
SET y,res (end of \S71)
1H AND t,f,X_is_dest_bit
BZ t,1F
XDest CMPU t,xxx,gg Install X as dest, \S80
BN t,3F
LDA xptr,g,xxx
JMP 1F
2H ADDU t,oo,ll
AND t,t,lring_mask
STCO 0,l,t
INCL ll,8
PUSHJ res,StackRoom
3H CMPU t,xxx,ll
BNN t,2B
ADD t,xxx,oo
AND t,t,lring_mask
LDA xptr,l,t
1H AND t,f,Mem_bit
PBZ t,1F
ADDU arg,y,z
CMPU t,op,#A0
BN t,2F
CMPU t,arg,g
BN t,Error
2H PUSHJ res,MemFind
1H SRU t,f,32
PUT rX,t
PUT rM,x
PUT rE,x
0H GREG #30000
AND t,aa,0B
ORL t,U_BIT<<8 enable underflow trip
PUT rA,t
0H GREG Done
PUT rW,0B
RESUME
 
MulU MULU x,y,z
GET t,rH
STOU t,g,8*rH
JMP XDone
 
Div DIV x,y,z
JMP 1F
DivU PUT rD,x
DIVU x,y,z
1H GET t,rR
STO t,g,8*rR
JMP XDone
 
Cswap LDOU z,g,8*rP
LDOU y,res,0
CMPU t,y,z
BNZ t,1F
STOU x,res,0
JMP 2F
1H STOU y,g,8*rP
2H ZSZ x,t,1
JMP XDone
 
BTaken ADDU cc,cc,4
PBTaken SUBU cc,cc,2
SET inst_ptr,yz
JMP Update
 
Go SET x,inst_ptr
ADDU inst_ptr,y,z
JMP XDone
 
PushGo ADDU yz,y,z
PushJ SET inst_ptr,yz
CMPU t,xxx,gg
PBN t,1F
SET xxx,ll
SRU xx,xxx,3
INCL ll,8
PUSHJ 0,StackRoom
1H ADDU t,xxx,oo
AND t,t,lring_mask
STOU xx,l,t
ADDU t,loc,4
STOU t,g,8*rJ
INCL xxx,8
SUBU ll,ll,xxx
ADDU oo,oo,xxx
JMP Update
 
Pop SUBU oo,oo,8
BZ xx,1F
CMPU t,ll,xxx
BN t,1F
ADDU t,xxx,oo
AND t,t,lring_mask
LDOU y,l,t
1H CMPU t,oo,ss
PBNN t,1F
PUSHJ 0,StackLoad
1H AND t,oo,lring_mask
LDOU z,l,t
AND z,z,#ff
SLU z,z,3
1H SUBU t,oo,ss
CMPU t,t,z
PBNN t,1F
PUSHJ 0,StackLoad actually gamma=beta possible here!
JMP 1B
1H ADDU ll,ll,8
CMPU t,xxx,ll
CSN ll,t,xxx
ADDU ll,ll,z
CMPU t,gg,ll
CSN ll,t,gg
CMPU t,z,ll
BNN t,1F
AND t,oo,lring_mask
STOU y,l,t
1H LDOU y,g,8*rJ
SUBU oo,oo,z
4ADDU inst_ptr,yz,y
JMP Update
 
Save BNZ yz,Error \S102
CMPU t,xxx,gg
BN t,Error
ADDU t,oo,ll
AND t,t,lring_mask
SRU y,ll,3
STOU y,l,t
INCL ll,8
PUSHJ 0,StackRoom
ADDU oo,oo,ll
SET ll,0
1H PUSHJ 0,StackStore
CMPU t,ss,oo
PBNZ t,1B
SUBU y,gg,8
4H ADDU y,y,8
1H SET arg,ss \S103
PUSHJ res,MemFind
CMPU t,y,8*(rZ+1)
LDOU z,g,y
PBNZ t,2F
SLU z,gg,56-3
ADDU z,z,aa
2H STOU z,res,0
INCL ss,8
BNZ t,1F
CMPU t,y,c255
BZ t,2F
CMPU t,y,8*rR
PBNZ t,4B
SET y,8*rP
JMP 1B
2H SET y,8*rB
JMP 1B
1H SET oo,ss
SUBU x,oo,8
JMP XDone
 
Unsave BNZ xx,Error \S104
BNZ yy,Error
ANDNL z,#7
ADDU ss,z,8
SET y,8*(rZ+2)
1H SUBU y,y,8
4H SUBU ss,ss,8 \S105
SET arg,ss
PUSHJ res,MemFind
LDOU x,res,0
CMPU t,y,8*(rZ+1)
PBNZ t,2F
SRU gg,x,56-3
SLU aa,x,64-18
SRU aa,aa,64-18
JMP 1B
2H STOU x,g,y
3H CMPU t,y,8*rP
CSZ y,t,8*(rR+1)
CSZ y,y,c256
CMPU t,y,gg
PBNZ t,1B
PUSHJ 0,StackLoad
AND t,ss,lring_mask
LDOU x,l,t
AND x,x,#ff
BZ x,1F
SET y,x
2H PUSHJ 0,StackLoad
SUBU y,y,1
PBNZ y,2B
SLU x,x,3
1H SET ll,x
CMPU t,gg,x
CSN ll,t,gg
SET oo,ss
PBNZ uu,Update
BZ resuming,Update
JMP AllDone
 
Get CMPU t,yz,32
BNN t,Error
STOU ii,g,8*rI
STOU cc,g,8*rC
STOU oo,g,8*rO
STOU ss,g,8*rS
STOU uu,g,8*rU
STOU aa,g,8*rA
SR t,ll,3
STOU t,g,8*rL
SR t,gg,3
STOU t,g,8*rG
SLU t,zz,3
LDOU x,g,t
JMP XDone
 
Put BNZ yy,Error
CMPU t,xx,32
BNN t,Error
CMPU t,xx,rC
BN t,PutOK
CMPU t,xx,rF
BN t,1F
PutOK STOU z,g,xxx
JMP Update
1H CMPU t,xx,rG
BN t,Error
SUB t,xx,rL
PBP t,PutA
BN t,PutG
PutL SLU z,z,3 \S98, PUT rL
CMPU t,z,ll
CSN ll,t,z
JMP Update
0H GREG #40000
PutA CMPU t,z,0B \S100, PUT rA
BNN t,Error
SET aa,z
JMP Update
PutG SRU t,z,8
BNZ t,Error
CMPU t,z,32
BN t,Error
SLU z,z,3
CMPU t,z,ll
BN t,Error
JMP 2F
1H SUBU gg,gg,8
STCO 0,g,gg
2H CMPU t,z,gg
PBN t,1B
SET gg,z
JMP Update
 
Resume SLU t,inst,40 \S125
BNZ t,Error
LDOU inst_ptr,g,8*rW
LDOU x,g,8*rX
BN x,Update
SRU xx,x,56
SUBU t,xx,2
BNN t,1F
PBZ xx,2F
SRU y,x,28 rop=1 (RESUME_CONT)
AND y,y,#f
SET z,1
SLU z,z,y
ANDNL z,#70cf
BNZ z,Error
1H BP t,Error
SRU t,x,13
AND t,t,c255
CMPU y,t,ll
BN y,2F
CMPU y,t,gg
BN y,Error
2H MOR t,x,#8
CMPU t,t,#F9 RESUME
BZ t,Error
NEG resuming,xx
CSNN resuming,resuming,1
JMP Update
 
Sync BNZ xx,Error
CMPU t,yz,4
BNN t,Error
JMP Update
 
Trip SET xx,0
JMP TakeTrip
 
Trap STOU inst_ptr,g,8*rWW
0H GREG #8000000000000000
ADDU t,inst,0B
STOU t,g,8*rXX
STOU y,g,8*rYY
STOU z,g,8*rZZ
SRU y,inst,6
CMPU t,y,4*11
BNN t,Error
LDOU t,g,c255
0H GREG @+4
GO y,0B,y
JMP SimHalt
JMP SimFopen
JMP SimFclose
JMP SimFread
JMP SimFgets
JMP SimFgetws
JMP SimFwrite
JMP SimFputs
JMP SimFputws
JMP SimFseek
JMP SimFtell
 
:GetArgs GET $0,rJ
SET y,t
SET arg,t
PUSHJ res,MemFind
LDOU z,res,0 z = virtual address of buffer
SET arg,z
PUSHJ res,MemFind
SET x,res x = physical address of buffer
STO x,IOArgs
SET xx,Mem:Chunk
AND zz,x,Mem:mask
SUB xx,xx,zz xx = bytes from x to chunk end
ADDU arg,y,8
PUSHJ res,MemFind
LDOU zz,res,0 zz = size of buffer
STOU zz,IOArgs+8
PUT rJ,$0
POP
 
GREG @
:SimInst LDA t,IOArgs
JMP DoInst
SimFinish LDA t,IOArgs
SimFclose GETA $0,TrapDone
:DoInst PUT rW,$0
PUT rX,inst
RESUME
 
SimFopen PUSHJ 0,GetArgs
ADDU xx,Mem:alloc,Mem:nodesize
STOU xx,IOArgs % we'll copy the file name here
SET x,xx
1H SET arg,z
PUSHJ res,MemFind
LDBU t,res,0
STBU t,x,0
INCL x,1
INCL z,1
PBNZ t,1B
GO $0,SimInst
3H STCO 0,x,0 % clean up the copied string
CMPU z,xx,x
SUB x,x,8
PBN z,3B
JMP TrapDone
 
TrapDone STO t,g,8*rBB "RESUME 1" works this way
STO t,g,c255
JMP Update
 
SimFread PUSHJ 0,GetArgs
SET y,zz number of bytes to read
1H CMP t,xx,y
PBNN t,SimFinish
STO xx,IOArgs+8 oops, we must cross chunk bdry
SUB y,y,xx
GO $0,SimInst
BN t,1F
ADD z,z,xx
SET arg,z
PUSHJ res,MemFind
STOU res,IOArgs
STO y,IOArgs+8
ADD xx,Mem:mask,1
JMP 1B
1H SUB t,t,y
JMP TrapDone
 
SimFgets PUSHJ 0,GetArgs
CMP t,xx,zz
PBNN t,SimFinish easy if all in one chunk
SET y,zz remaining buf size
SET yy,0 bytes successfully read so far
1H ADD t,xx,1
STO t,IOArgs+8 null character spills off end
GO $0,SimInst
BN t,TrapDone
ADD yy,yy,t
CMP $0,t,xx
SET t,yy
PBNZ $0,TrapDone
ADDU z,z,xx
SET arg,z
PUSHJ res,MemFind
SUBU x,x,1
LDBU t,x,xx look at last byte read
CMP t,t,#0a is it newline?
BZ t,1F
SUB y,y,xx
SET x,res
STOU x,IOArgs
STO y,IOArgs+8
ADD xx,Mem:mask,1
CMP t,xx,y
BN t,1B
GO $0,SimInst
BN t,TrapDone
2H ADD t,yy,t
JMP TrapDone
1H SET t,0
STBU t,res,0
JMP 2B
 
SimFgetws PUSHJ 0,GetArgs
ADD y,zz,zz remaining buf size (bytes)
CMP t,xx,y
PBNN t,SimFinish easy if all in one chunk
SET yy,0 wydes successfully read so far
1H ADD zz,xx,3
SR zz,zz,1 wydes in current chunk, plus 1
STO zz,IOArgs+8 null character spills off end
GO $0,SimInst
BN t,TrapDone
ADDU yy,yy,t
SUB zz,zz,1
CMP $0,t,zz
SET t,yy
PBNZ $0,TrapDone
ADD z,z,xx
SET arg,z
PUSHJ res,MemFind
SUBU x,x,2
LDWU t,x,xx look at last wyde read
CMP t,t,#0a is it newline?
BZ t,1F
SUB y,y,xx
SET x,res
STOU x,IOArgs
SR t,y,1
STO t,IOArgs+8
ADD xx,Mem:mask,1
ANDN y,y,1
CMP t,xx,y
BN t,1B
GO $0,SimInst
BN t,TrapDone
2H ADD t,yy,t
JMP TrapDone
1H SET t,0
STWU t,res,0
JMP 2B
 
SimFwrite IS SimFread yes it works!
 
SimFputs SET xx,0 this many bytes written
SET z,t virtual address of string
1H SET arg,z
PUSHJ res,MemFind
SET t,res physical address of string
GO $0,DoInst
BN t,TrapDone
BZ t,1F
ADD xx,xx,t
ADDU z,z,t
AND t,z,Mem:mask
BZ t,1B
1H SET t,xx
JMP TrapDone
 
SimFputws SET xx,0 this many wydes written
SET z,t virtual address of string
1H SET arg,z
PUSHJ res,MemFind
SET t,res physical address of string
GO $0,DoInst
BN t,TrapDone
BZ t,1F
ADD xx,xx,t
2ADDU z,t,z
AND t,z,Mem:mask
BZ t,1B
1H SET t,xx
JMP TrapDone
 
SimFseek IS SimFclose
SimFtell IS SimFclose
 
GREG @
1H BYTE "Warning: ",0
2H BYTE " at location ",0
3H BYTE #a,0
T0 BYTE "TRIP",0
T1 BYTE "integer divide check",0
T2 BYTE "integer overflow",0
T3 BYTE "float-to-fix overflow",0
T4 BYTE "invalid floating point operation",0
T5 BYTE "floating point overflow",0
T6 BYTE "floating point underflow",0
T7 BYTE "floating point division by zero",0
T8 BYTE "floating point inexact",0
TripType OCTA T0,T1,T2,T3,T4,T5,T6,T7,T8
SimHalt CMP t,zz,1
BZ inst,Exit t=0 on normal exit
BNZ t,Error
CMPU t,loc,#90
BNN t,Error Halt 1 from loc<#90 gives warning
LDA t,1B
TRAP 0,Fputs,StdErr
SR x,loc,1
LDA t,TripType
LDOU t,t,x
TRAP 0,Fputs,StdErr
LDA t,2B
TRAP 0,Fputs,StdErr
LDOU x,g,8*rW
SUBU x,x,4
SRU arg,x,32
PUSHJ res,OutTetra
SET arg,x
PUSHJ res,OutTetra
LDA t,3B
TRAP 0,Fputs,StdErr
LDOU t,g,c255
JMP TrapDone
 
Error NEG t,22 catch-22
Exit TRAP 0,Halt,0
 
s IS $1
0H GREG #0008000400020001
:OutTetra MOR t,$0,0B
SLU s,t,4
XOR t,s,t
0H GREG #0f0f0f0f0f0f0f0f
AND t,t,0B
0H GREG #0606060606060606
ADDU t,t,0B
0H GREG #0000002700000000
MOR s,0B,t
0H GREG #2a2a2a2a2a2a2a2a
ADDU t,t,0B
ADDU s,t,s
STOU s,g,c255
GETA t,OctaArgs
TRAP 0,Fwrite,StdErr
POP 0
 
O IS Done-4
LOC Info
JMP Trap+@-O; BYTE 0,5,0,#0a TRAP
FCMP x,y,z; BYTE 0,1,0,#2a FCMP
FUN x,y,z; BYTE 0,1,0,#2a FUN
FEQL x,y,z; BYTE 0,1,0,#2a FEQL
FADD x,y,z; BYTE 0,4,0,#2a FADD
FIX x,0,z; BYTE 0,4,0,#26 FIX
FSUB x,y,z; BYTE 0,4,0,#2a FSUB
FIXU x,0,z; BYTE 0,4,0,#26 FIXU
FLOT x,0,z; BYTE 0,4,0,#26 FLOT
FLOT x,0,z; BYTE 0,4,0,#25 FLOTI
FLOTU x,0,z; BYTE 0,4,0,#26 FLOTU
FLOTU x,0,z; BYTE 0,4,0,#25 FLOTUI
SFLOT x,0,z; BYTE 0,4,0,#26 SFLOT
SFLOT x,0,z; BYTE 0,4,0,#25 SFLOTI
SFLOTU x,0,z; BYTE 0,4,0,#26 SFLOTU
SFLOTU x,0,z; BYTE 0,4,0,#25 SFLOTUI
FMUL x,y,z; BYTE 0,4,0,#2a FMUL
FCMPE x,y,z; BYTE 0,4,rE,#2a FCMPE
FUNE x,y,z; BYTE 0,1,rE,#2a FUNE
FEQLE x,y,z; BYTE 0,4,rE,#2a FEQLE
FDIV x,y,z; BYTE 0,40,0,#2a FDIV
FSQRT x,0,z; BYTE 0,40,0,#26 FSQRT
FREM x,y,z; BYTE 0,4,0,#2a FREM
FINT x,0,z; BYTE 0,4,0,#26 FINT
MUL x,y,z; BYTE 0,10,0,#2a MUL
MUL x,y,z; BYTE 0,10,0,#29 MULI
JMP MulU+@-O; BYTE 0,10,0,#2a MULU
JMP MulU+@-O; BYTE 0,10,0,#29 MULUI
JMP Div+@-O; BYTE 0,60,0,#2a DIV
JMP Div+@-O; BYTE 0,60,0,#29 DIVI
JMP DivU+@-O; BYTE 0,60,rD,#2a DIVU
JMP DivU+@-O; BYTE 0,60,rD,#29 DIVUI
ADD x,y,z; BYTE 0,1,0,#2a ADD
ADD x,y,z; BYTE 0,1,0,#29 ADDI
ADDU x,y,z; BYTE 0,1,0,#2a ADDU
ADDU x,y,z; BYTE 0,1,0,#29 ADDUI
SUB x,y,z; BYTE 0,1,0,#2a SUB
SUB x,y,z; BYTE 0,1,0,#29 SUBI
SUBU x,y,z; BYTE 0,1,0,#2a SUBU
SUBU x,y,z; BYTE 0,1,0,#29 SUBUI
2ADDU x,y,z; BYTE 0,1,0,#2a 2ADDU
2ADDU x,y,z; BYTE 0,1,0,#29 2ADDUI
4ADDU x,y,z; BYTE 0,1,0,#2a 4ADDU
4ADDU x,y,z; BYTE 0,1,0,#29 4ADDUI
8ADDU x,y,z; BYTE 0,1,0,#2a 8ADDU
8ADDU x,y,z; BYTE 0,1,0,#29 8ADDUI
16ADDU x,y,z; BYTE 0,1,0,#2a 16ADDU
16ADDU x,y,z; BYTE 0,1,0,#29 16ADDUI
CMP x,y,z; BYTE 0,1,0,#2a CMP
CMP x,y,z; BYTE 0,1,0,#29 CMPI
CMPU x,y,z; BYTE 0,1,0,#2a CMPU
CMPU x,y,z; BYTE 0,1,0,#29 CMPUI
NEG x,0,z; BYTE 0,1,0,#26 NEG
NEG x,0,z; BYTE 0,1,0,#25 NEGI
NEGU x,0,z; BYTE 0,1,0,#26 NEGU
NEGU x,0,z; BYTE 0,1,0,#25 NEGUI
SL x,y,z; BYTE 0,1,0,#2a SL
SL x,y,z; BYTE 0,1,0,#29 SLI
SLU x,y,z; BYTE 0,1,0,#2a SLU
SLU x,y,z; BYTE 0,1,0,#29 SLUI
SR x,y,z; BYTE 0,1,0,#2a SR
SR x,y,z; BYTE 0,1,0,#29 SRI
SRU x,y,z; BYTE 0,1,0,#2a SRU
SRU x,y,z; BYTE 0,1,0,#29 SRUI
BN x,BTaken+@-O; BYTE 0,1,0,#50 BN
BN x,BTaken+@-O; BYTE 0,1,0,#50 BNB
BZ x,BTaken+@-O; BYTE 0,1,0,#50 BZ
BZ x,BTaken+@-O; BYTE 0,1,0,#50 BZB
BP x,BTaken+@-O; BYTE 0,1,0,#50 BP
BP x,BTaken+@-O; BYTE 0,1,0,#50 BPB
BOD x,BTaken+@-O; BYTE 0,1,0,#50 BOD
BOD x,BTaken+@-O; BYTE 0,1,0,#50 BODB
BNN x,BTaken+@-O; BYTE 0,1,0,#50 BNN
BNN x,BTaken+@-O; BYTE 0,1,0,#50 BNNB
BNZ x,BTaken+@-O; BYTE 0,1,0,#50 BNZ
BNZ x,BTaken+@-O; BYTE 0,1,0,#50 BNZB
BNP x,BTaken+@-O; BYTE 0,1,0,#50 BNP
BNP x,BTaken+@-O; BYTE 0,1,0,#50 BNPB
BEV x,BTaken+@-O; BYTE 0,1,0,#50 BEV
BEV x,BTaken+@-O; BYTE 0,1,0,#50 BEVB
PBN x,PBTaken+@-O; BYTE 0,3,0,#50 PBN
PBN x,PBTaken+@-O; BYTE 0,3,0,#50 PBNB
PBZ x,PBTaken+@-O; BYTE 0,3,0,#50 PBZ
PBZ x,PBTaken+@-O; BYTE 0,3,0,#50 PBZB
PBP x,PBTaken+@-O; BYTE 0,3,0,#50 PBP
PBP x,PBTaken+@-O; BYTE 0,3,0,#50 PBPB
PBOD x,PBTaken+@-O; BYTE 0,3,0,#50 PBOD
PBOD x,PBTaken+@-O; BYTE 0,3,0,#50 PBODB
PBNN x,PBTaken+@-O; BYTE 0,3,0,#50 PBNN
PBNN x,PBTaken+@-O; BYTE 0,3,0,#50 PBNNB
PBNZ x,PBTaken+@-O; BYTE 0,3,0,#50 PBNZ
PBNZ x,PBTaken+@-O; BYTE 0,3,0,#50 PBNZB
PBNP x,PBTaken+@-O; BYTE 0,3,0,#50 PBNP
PBNP x,PBTaken+@-O; BYTE 0,3,0,#50 PBNPB
PBEV x,PBTaken+@-O; BYTE 0,3,0,#50 PBEV
PBEV x,PBTaken+@-O; BYTE 0,3,0,#50 PBEVB
CSN x,y,z; BYTE 0,1,0,#3a CSN
CSN x,y,z; BYTE 0,1,0,#39 CSNI
CSZ x,y,z; BYTE 0,1,0,#3a CSZ
CSZ x,y,z; BYTE 0,1,0,#39 CSZI
CSP x,y,z; BYTE 0,1,0,#3a CSP
CSP x,y,z; BYTE 0,1,0,#39 CSPI
CSOD x,y,z; BYTE 0,1,0,#3a CSOD
CSOD x,y,z; BYTE 0,1,0,#39 CSODI
CSNN x,y,z; BYTE 0,1,0,#3a CSNN
CSNN x,y,z; BYTE 0,1,0,#39 CSNNI
CSNZ x,y,z; BYTE 0,1,0,#3a CSNZ
CSNZ x,y,z; BYTE 0,1,0,#39 CSNZI
CSNP x,y,z; BYTE 0,1,0,#3a CSNP
CSNP x,y,z; BYTE 0,1,0,#39 CSNPI
CSEV x,y,z; BYTE 0,1,0,#3a CSEV
CSEV x,y,z; BYTE 0,1,0,#39 CSEVI
ZSN x,y,z; BYTE 0,1,0,#2a ZSN
ZSN x,y,z; BYTE 0,1,0,#29 ZSNI
ZSZ x,y,z; BYTE 0,1,0,#2a ZSZ
ZSZ x,y,z; BYTE 0,1,0,#29 ZSZI
ZSP x,y,z; BYTE 0,1,0,#2a ZSP
ZSP x,y,z; BYTE 0,1,0,#29 ZSPI
ZSOD x,y,z; BYTE 0,1,0,#2a ZSOD
ZSOD x,y,z; BYTE 0,1,0,#29 ZSODI
ZSNN x,y,z; BYTE 0,1,0,#2a ZSNN
ZSNN x,y,z; BYTE 0,1,0,#29 ZSNNI
ZSNZ x,y,z; BYTE 0,1,0,#2a ZSNZ
ZSNZ x,y,z; BYTE 0,1,0,#29 ZSNZI
ZSNP x,y,z; BYTE 0,1,0,#2a ZSNP
ZSNP x,y,z; BYTE 0,1,0,#29 ZSNPI
ZSEV x,y,z; BYTE 0,1,0,#2a ZSEV
ZSEV x,y,z; BYTE 0,1,0,#29 ZSEVI
LDB x,res,0; BYTE 1,1,0,#aa LDB
LDB x,res,0; BYTE 1,1,0,#a9 LDBI
LDBU x,res,0; BYTE 1,1,0,#aa LDBU
LDBU x,res,0; BYTE 1,1,0,#a9 LDBUI
LDW x,res,0; BYTE 1,1,0,#aa LDW
LDW x,res,0; BYTE 1,1,0,#a9 LDWI
LDWU x,res,0; BYTE 1,1,0,#aa LDWU
LDWU x,res,0; BYTE 1,1,0,#a9 LDWUI
LDT x,res,0; BYTE 1,1,0,#aa LDT
LDT x,res,0; BYTE 1,1,0,#a9 LDTI
LDTU x,res,0; BYTE 1,1,0,#aa LDTU
LDTU x,res,0; BYTE 1,1,0,#a9 LDTUI
LDO x,res,0; BYTE 1,1,0,#aa LDO
LDO x,res,0; BYTE 1,1,0,#a9 LDOI
LDOU x,res,0; BYTE 1,1,0,#aa LDOU
LDOU x,res,0; BYTE 1,1,0,#a9 LDOUI
LDSF x,res,0; BYTE 1,1,0,#aa LDSF
LDSF x,res,0; BYTE 1,1,0,#a9 LDSFI
LDHT x,res,0; BYTE 1,1,0,#aa LDHT
LDHT x,res,0; BYTE 1,1,0,#a9 LDHTI
JMP Cswap+@-O; BYTE 2,2,0,#ba CSWAP
JMP Cswap+@-O; BYTE 2,2,0,#b9 CSWAPI
LDUNC x,res,0; BYTE 1,1,0,#aa LDUNC
LDUNC x,res,0; BYTE 1,1,0,#a9 LDUNCI
JMP Error+@-O; BYTE 0,1,0,#2a LDVTS
JMP Error+@-O; BYTE 0,1,0,#29 LDVTSI
SWYM 0; BYTE 0,1,0,#0a PRELD
SWYM 0; BYTE 0,1,0,#09 PRELDI
SWYM 0; BYTE 0,1,0,#0a PREGO
SWYM 0; BYTE 0,1,0,#09 PREGOI
JMP Go+@-O; BYTE 0,3,0,#2a GO
JMP Go+@-O; BYTE 0,3,0,#29 GOI
STB x,res,0; BYTE 1,1,0,#9a STB
STB x,res,0; BYTE 1,1,0,#99 STBI
STBU x,res,0; BYTE 1,1,0,#9a STBU
STBU x,res,0; BYTE 1,1,0,#99 STBUI
STW x,res,0; BYTE 1,1,0,#9a STW
STW x,res,0; BYTE 1,1,0,#99 STWI
STWU x,res,0; BYTE 1,1,0,#9a STWU
STWU x,res,0; BYTE 1,1,0,#99 STWUI
STT x,res,0; BYTE 1,1,0,#9a STT
STT x,res,0; BYTE 1,1,0,#99 STTI
STTU x,res,0; BYTE 1,1,0,#9a STTU
STTU x,res,0; BYTE 1,1,0,#99 STTUI
STO x,res,0; BYTE 1,1,0,#9a STO
STO x,res,0; BYTE 1,1,0,#99 STOI
STOU x,res,0; BYTE 1,1,0,#9a STOU
STOU x,res,0; BYTE 1,1,0,#99 STOUI
STSF x,res,0; BYTE 1,1,0,#9a STSF
STSF x,res,0; BYTE 1,1,0,#99 STSFI
STHT x,res,0; BYTE 1,1,0,#9a STHT
STHT x,res,0; BYTE 1,1,0,#99 STHTI
STO xx,res,0; BYTE 1,1,0,#8a STCO
STO xx,res,0; BYTE 1,1,0,#89 STCOI
STUNC x,res,0; BYTE 1,1,0,#9a STUNC
STUNC x,res,0; BYTE 1,1,0,#99 STUNCI
SWYM 0; BYTE 0,1,0,#0a SYNCD
SWYM 0; BYTE 0,1,0,#09 SYNCDI
SWYM 0; BYTE 0,1,0,#0a PREST
SWYM 0; BYTE 0,1,0,#09 PRESTI
SWYM 0; BYTE 0,1,0,#0a SYNCID
SWYM 0; BYTE 0,1,0,#09 SYNCIDI
JMP PushGo+@-O; BYTE 0,3,0,#2a PUSHGO
JMP PushGo+@-O; BYTE 0,3,0,#29 PUSHGOI
OR x,y,z; BYTE 0,1,0,#2a OR
OR x,y,z; BYTE 0,1,0,#29 ORI
ORN x,y,z; BYTE 0,1,0,#2a ORN
ORN x,y,z; BYTE 0,1,0,#29 ORNI
NOR x,y,z; BYTE 0,1,0,#2a NOR
NOR x,y,z; BYTE 0,1,0,#29 NORI
XOR x,y,z; BYTE 0,1,0,#2a XOR
XOR x,y,z; BYTE 0,1,0,#29 XORI
AND x,y,z; BYTE 0,1,0,#2a AND
AND x,y,z; BYTE 0,1,0,#29 ANDI
ANDN x,y,z; BYTE 0,1,0,#2a ANDN
ANDN x,y,z; BYTE 0,1,0,#29 ANDNI
NAND x,y,z; BYTE 0,1,0,#2a NAND
NAND x,y,z; BYTE 0,1,0,#29 NANDI
NXOR x,y,z; BYTE 0,1,0,#2a NXOR
NXOR x,y,z; BYTE 0,1,0,#29 NXORI
BDIF x,y,z; BYTE 0,1,0,#2a BDIF
BDIF x,y,z; BYTE 0,1,0,#29 BDIFI
WDIF x,y,z; BYTE 0,1,0,#2a WDIF
WDIF x,y,z; BYTE 0,1,0,#29 WDIFI
TDIF x,y,z; BYTE 0,1,0,#2a TDIF
TDIF x,y,z; BYTE 0,1,0,#29 TDIFI
ODIF x,y,z; BYTE 0,1,0,#2a ODIF
ODIF x,y,z; BYTE 0,1,0,#29 ODIFI
MUX x,y,z; BYTE 0,1,rM,#2a MUX
MUX x,y,z; BYTE 0,1,rM,#29 MUXI
SADD x,y,z; BYTE 0,1,0,#2a SADD
SADD x,y,z; BYTE 0,1,0,#29 SADDI
MOR x,y,z; BYTE 0,1,0,#2a MOR
MOR x,y,z; BYTE 0,1,0,#29 MORI
MXOR x,y,z; BYTE 0,1,0,#2a MXOR
MXOR x,y,z; BYTE 0,1,0,#29 MXORI
SET x,z; BYTE 0,1,0,#20 SETH
SET x,z; BYTE 0,1,0,#20 SETMH
SET x,z; BYTE 0,1,0,#20 SETML
SET x,z; BYTE 0,1,0,#20 SETL
ADDU x,x,z; BYTE 0,1,0,#30 INCH
ADDU x,x,z; BYTE 0,1,0,#30 INCMH
ADDU x,x,z; BYTE 0,1,0,#30 INCML
ADDU x,x,z; BYTE 0,1,0,#30 INCL
OR x,x,z; BYTE 0,1,0,#30 ORH
OR x,x,z; BYTE 0,1,0,#30 ORMH
OR x,x,z; BYTE 0,1,0,#30 ORML
OR x,x,z; BYTE 0,1,0,#30 ORL
ANDN x,x,z; BYTE 0,1,0,#30 ANDNH
ANDN x,x,z; BYTE 0,1,0,#30 ANDNMH
ANDN x,x,z; BYTE 0,1,0,#30 ANDNML
ANDN x,x,z; BYTE 0,1,0,#30 ANDNL
SET inst_ptr,yz; BYTE 0,1,0,#41 JMP
SET inst_ptr,yz; BYTE 0,1,0,#41 JMPB
JMP PushJ+@-O; BYTE 0,1,0,#60 PUSHJ
JMP PushJ+@-O; BYTE 0,1,0,#60 PUSHJB
SET x,yz; BYTE 0,1,0,#60 GETA
SET x,yz; BYTE 0,1,0,#60 GETAB
JMP Put+@-O; BYTE 0,1,0,#02 PUT
JMP Put+@-O; BYTE 0,1,0,#01 PUTI
JMP Pop+@-O; BYTE 0,3,rJ,#00 POP
JMP Resume+@-O; BYTE 0,5,0,#00 RESUME
JMP Save+@-O; BYTE 20,1,0,#20 SAVE
JMP Unsave+@-O; BYTE 20,1,0,#02 UNSAVE
JMP Sync+@-O; BYTE 0,1,0,#01 SYNC
SWYM x,y,z; BYTE 0,1,0,#00 SWYM
JMP Get+@-O; BYTE 0,1,0,#20 GET
JMP Trip+@-O; BYTE 0,5,0,#0a TRIP
 
Done AND t,f,X_is_dest_bit % doubly defined but OK
BZ t,1F
XDone STOU x,xptr,0
1H GET t,rA
AND t,t,#ff
OR exc,exc,t
AND t,exc,U_BIT+X_BIT Check for trip, \S123
CMPU t,t,U_BIT
PBNZ t,1F branch unless underflow is exact
0H GREG U_BIT<<8
AND t,aa,0B
BNZ t,1F branch if underflow is enabled
ANDNL exc,U_BIT ignore U if exact and not enabled
1H PBZ exc,Update
SRU t,aa,8
AND t,t,exc
PBZ t,4F
SET xx,0 Initiate a trip, \S124
SLU t,t,55
2H INCL xx,1
SLU t,t,1
PBNN t,2B
SET t,#100
SRU t,t,xx
ANDN exc,exc,t
TakeTrip STOU inst_ptr,g,8*rW
SLU inst_ptr,xx,4
INCH inst,#8000
STOU inst,g,8*rX
AND t,f,Mem_bit
PBZ t,1F
ADDU y,y,z
SET z,x
1H STOU y,g,8*rY
STOU z,g,8*rZ
LDOU t,g,c255
STOU t,g,8*rB
LDOU t,g,8*rJ
STOU t,g,c255
4H OR aa,aa,exc
0H GREG #0000000800000004 Update the clocks, \S128
Update MOR t,f,0B $2^{32}$mems + oops
ADDU cc,cc,t
ADDU uu,uu,1
SUBU ii,ii,1
AllDone PBZ resuming,Fetch
CMPU t,op,#F9 RESUME
CSNZ resuming,t,0
JMP Fetch
 
OctaArgs OCTA Global+8*255,8
Infile IS 3
Main LDA Mem:head,Chunk0
ADDU Mem:alloc,Mem:head,Mem:nodesize
GET t,rN
INCL t,1
STOU t,g,8*rN
LDOU t,$1,8 argv[1]
STOU t,IOArgs
LDA t,IOArgs
TRAP 0,Fopen,Infile
BN t,Error
1H GETA t,OctaArgs
TRAP 0,Fread,Infile
BN t,9F
LDOU loc,g,c255
2H GETA t,OctaArgs
TRAP 0,Fread,Infile
LDOU x,g,c255
BN t,Error
SET arg,loc
BZ x,1B
PUSHJ res,MemFind
STOU x,res,0
INCL loc,8
JMP 2B
9H TRAP 0,Fclose,Infile
SUBU loc,loc,8
STOU loc,g,c255 place to UNSAVE
SUBU arg,loc,8*13
PUSHJ res,MemFind
LDOU inst_ptr,res,0 Main
SET arg,#90 Get ready to UNSAVE, \S162
PUSHJ res,MemFind
LDTU x,res,0
SET resuming,1 RESUME_AGAIN
CSNZ inst_ptr,x,#90
0H GREG #FB<<24+255 UNSAVE $255
STOU 0B,g,8*rX
SET gg,c255
JMP Fetch
 
LOC Global+8*rK; OCTA -1
LOC Global+8*rT; OCTA #8000000500000000
LOC Global+8*rTT; OCTA #8000000600000000
LOC Global+8*rV; OCTA #369c200400000000
 
LOC U_Handler
ORL exc,U_BIT
JMP Done
/permu-plain.mms
0,0 → 1,40
* Permutation generator a la plain-changes (mockup only)
t IS $255
a GREG 0
p GREG 0
c GREG 0
fmask GREG #f
magic GREG #8844221188442211
ffmask GREG #ff000000
u IS $0
LOC #100
GREG @
T OCTA #194cb4594cb4594c,#b44,0
Main SET a,#1234
SLU a,a,12 [needed to make the MXOR stuff work]
LDA p,T
JMP 3F
 
1H SRU u,a,12 (trace this)
 
% SLU u,fmask,t
% SLU t,a,4
% XOR t,t,a
% AND t,t,u
% SRU u,t,4
% OR t,t,u
% XOR a,a,t
SLU u,a,t
MXOR u,magic,u
AND u,u,ffmask
SRU u,u,t
XOR a,a,u
 
SRU c,c,3
2H AND t,c,#1c
PBNZ t,1B
ADD p,p,8
3H LDO c,p,0
PBNZ c,2B
TRAP 0,Halt,0
 
/mmix.mp
0,0 → 1,23
% illustrations for mmix.w
 
beginfig(1)
numeric r; r=.5in; % radius of circle
numeric rr; rr=.9in; % radius of arc
 
pickup pencircle scaled .6pt;
draw (r,0){up}...(0,r){left}...(-r,0){down}...(0,-r){right}...cycle;
pickup pencircle scaled .3pt;
for a=-45, 30, 180:
z[a]=(r+10pt,0) rotated (a+10);
draw ((r-4pt,0)--(r+20pt,0)) rotated a;
endfor
label.rt(btex $\alpha$ etex,z[-45]-(2pt,6pt));
label.rt(btex $\beta$ etex,z[30]+(0,2pt));
label.lft(btex $\gamma$ etex,z[180]);
 
drawdblarrow ((rr,0) rotated-45){dir 45}...((rr,0) rotated 30){dir 120};
label.rt(btex $L$ etex,(rr+8pt,0) rotated -10);
endfig;
 
bye.
 
/mmix-mem.w
0,0 → 1,59
% This file is part of the MMIXware package (c) Donald E Knuth 1999
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
 
\def\title{MMIX-MEM}
\def\MMIX{\.{MMIX}}
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
@s octa int
 
@* Memory-mapped input and output. This module supplies procedures for reading
@^I/O@>
@^input/output@>
@^memory-mapped input/output@>
and writing \MMIX\ memory addresses that exceed 48 bits. Such addresses are
used by the operating system for input and output, so they require special
treatment. At present only dummy versions of these routines are implemented.
Users who need nontrivial versions of |spec_read| and/or |spec_write| should
prepare their own and link them with the rest of the simulator.
 
@p
#include <stdio.h>
#include "mmix-pipe.h" /* header file for all modules */
extern octa read_hex(); /* found in the main program module */
static char buf[20];
 
@ If the |interactive_read_bit| of the |verbose| control is set,
the user is supposed to supply values dynamically. Otherwise
zero is read.
 
@p
octa spec_read @,@,@[ARGS((octa))@];@+@t}\6{@>
octa spec_read(addr)
octa addr;
{
octa val;
if (verbose&interactive_read_bit) {
printf("** Read from loc %08x%08x: ",addr.h,addr.l);
fgets(buf,20,stdin);
val=read_hex(buf);
} else val.l=val.h=0;
if (verbose&show_spec_bit)
printf(" (spec_read %08x%08x from %08x%08x at time %d)\n",
val.h,val.l,addr.h,addr.l,ticks.l);
return val;
}
 
@ The default |spec_write| just reports its arguments, without actually
writing anything.
 
@p
void spec_write @,@,@[ARGS((octa,octa))@];@+@t}\6{@>
void spec_write(addr,val)
octa addr,val;
{
if (verbose&show_spec_bit)
printf(" (spec_write %08x%08x to %08x%08x at time %d)\n",
val.h,val.l,addr.h,addr.l,ticks.l);
}
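
Note: the introduction to this module says that users who need nontrivial versions of spec_read and spec_write should prepare their own. As a rough illustration only, the sketch below backs the special addresses with a tiny register file so that a value written can be read back. The device layout (DEV_REGS octabyte registers selected by the low address bits), the dev_index helper, and the ANSI-style prototypes are all assumptions of this sketch; it also defines its own tetra and octa types rather than including mmix-pipe.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;
typedef struct { tetra h, l; } octa;  /* high and low 32-bit halves */

#define DEV_REGS 16                   /* hypothetical device: 16 octabyte registers */
static octa dev_reg[DEV_REGS];        /* zero-initialized backing store */

/* Select a register from the octabyte-aligned low address bits (assumed layout). */
static unsigned dev_index(octa addr)
{
    unsigned i = (unsigned)((addr.l >> 3) % DEV_REGS);
    assert(i < DEV_REGS);
    return i;
}

octa spec_read(octa addr)
{
    return dev_reg[dev_index(addr)];
}

void spec_write(octa addr, octa val)
{
    dev_reg[dev_index(addr)] = val;
}
```

A real replacement would presumably keep the verbose reporting of the default routines and decode the full address; this sketch ignores addr.h entirely.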
 
@* Index.
/mmmix.w
0,0 → 1,548
% This file is part of the MMIXware package (c) Donald E Knuth 1999
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
 
\def\title{MMMIX}
\def\MMIX{\.{MMIX}}
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
@s octa int
@s tetra int
@s bool int
@s fetch int
@s specnode int
 
@* Introduction.
This \.{CWEB} program simulates how the \MMIX\ computer might be
implemented with a high-performance pipeline in many different configurations.
All of the complexities of \MMIX's architecture are treated, except for
multiprocessing and low-level details of memory mapped input/output.
 
The present program module, which contains the main routine for the
\MMIX\ meta-simulator, is primarily devoted to administrative tasks. Other modules
do the actual work after this module has told them what to do.
 
@ A user typically invokes the meta-simulator with a \UNIX/-like command line
of the general form
`\.{mmmix}~\.{configfile}~\.{progfile}',
where the \.{configfile} describes the characteristics
of an \MMIX\ implementation and the \.{progfile} contains a program to
be downloaded and run. Rules for configuration files appear in
the module called \.{mmix-config}. The program file is either
an ``\MMIX\ binary file'' dumped by {\mc MMIX-SIM}, or an
ASCII text file that describes hexadecimal data
in a rudimentary format. It is assumed to be binary if
its name ends with the extension `\.{.mmb}'.
 
@c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mmix-pipe.h"
@#
char *config_file_name, *prog_file_name;
@<Global variables@>@;
@<Subroutines@>@;
 
int main(argc,argv)
int argc;
char *argv[];
{
@<Parse the command line@>;
MMIX_config(config_file_name);
MMIX_init();
mmix_io_init();
@<Input the program@>;
@<Run the simulation interactively@>;
printf("Simulation ended at time %d.\n",ticks.l);
print_stats();
return 0;
}
 
@ The command line might also contain options, some day.
For now I'm forgetting them and simplifying everything until I gain
further experience.
 
@<Parse...@>=
if (argc!=3) {
fprintf(stderr,"Usage: %s configfile progfile\n",argv[0]);
@.Usage: ...@>
exit(-3);
}
config_file_name=argv[1];
prog_file_name=argv[2];
 
@ @<Input the program@>=
if (strlen(prog_file_name)>4 &&
strcmp(prog_file_name+strlen(prog_file_name)-4,".mmb")==0)
@<Input an \MMIX\ binary file@>@;
else @<Input a rudimentary hexadecimal file@>;
fclose(prog_file);
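
@ The suffix test above can be isolated as a small predicate. This is only
a sketch; the name |looks_binary| is hypothetical and does not appear in
the simulator.

```c
#include <string.h>

/* Nonzero if name has at least one character before a final ".mmb". */
int looks_binary(const char *name)
{
  size_t n = strlen(name);
  return n > 4 && strcmp(name + n - 4, ".mmb") == 0;
}
```

Note that a file called just \.{.mmb} is treated as hexadecimal text,
because the length must exceed four, and that the comparison is
case-sensitive.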
 
@* Hexadecimal input to memory.
A rudimentary hexadecimal input format is implemented here so that the
@^hexadecimal files@>
simulator can be run with essentially arbitrary data in the simulated memory.
The rules of this format are extremely simple: Each line of the file
either begins with (i)~12 hexadecimal digits followed by a colon; or
(ii)~a space followed by 16 hexadecimal digits. In case~(i), the 12
hex digits specify a 48-bit physical address, called the current
location. In case~(ii), the 16 hex digits specify an octabyte to be
stored in the current location; the current location is then increased by~8.
The current location should be a multiple of~8, but its three least
significant bits are actually ignored. Arbitrary comments can follow
the specification of a new current location or a new octabyte, as long
as each line is less than 99 characters long. For example, the file
$$\vbox{\halign{\tt#\hfil\cr
0123456789ab: SILLY EXAMPLE\cr
\ 0123456789abcdef first octabyte\cr
\ fedcba9876543210 second\cr}}$$
places the octabyte
\Hex{0123456789abcdef} into memory location \Hex{0123456789a8}
and \Hex{fedcba9876543210} into location \Hex{0123456789b0}.
 
@d BUF_SIZE 100
 
@<Glob...@>=
octa cur_loc;
octa cur_dat;
bool new_chunk;
char buffer[BUF_SIZE];
FILE *prog_file;
 
@ @<Input a rudimentary hexadecimal file@>=
{
prog_file=fopen(prog_file_name,"r");
if (!prog_file) {
fprintf(stderr,"Panic: Can't open MMIX hexadecimal file %s!\n",prog_file_name);
@.Can't open...@>
exit(-3);
}
new_chunk=true;
while (1) {
if (!fgets(buffer,BUF_SIZE,prog_file)) break;
if (buffer[strlen(buffer)-1]!='\n') {
fprintf(stderr,"Panic: Hexadecimal file line too long: `%s...'!\n",buffer);
@.Hexadecimal file line...@>
exit(-3);
}
if (buffer[12]==':') @<Change the current location@>@;
else if (buffer[0]==' ') @<Read an octabyte and advance |cur_loc|@>@;
else {
fprintf(stderr,"Panic: Improper hexadecimal file line: `%s'!\n",buffer);
@.Improper hexadecimal...@>
exit(-3);
}
}
}
 
@ @<Change the current location@>=
{
if (sscanf(buffer,"%4x%8x",&cur_loc.h,&cur_loc.l)!=2) {
fprintf(stderr,"Panic: Improper hexadecimal file location: `%s'!\n",buffer);
@.Improper hexadecimal...@>
exit(-3);
}
new_chunk=true;
}
 
@ @<Read an octabyte and advance |cur_loc|@>=
{
if (sscanf(buffer+1,"%8x%8x",&cur_dat.h,&cur_dat.l)!=2) {
fprintf(stderr,"Panic: Improper hexadecimal file data: `%s'!\n",buffer);
@.Improper hexadecimal...@>
exit(-3);
}
if (new_chunk) mem_write(cur_loc,cur_dat);
else mem_hash[last_h].chunk[(cur_loc.l&0xffff)>>3]=cur_dat;
cur_loc.l+=8;
if ((cur_loc.l&0xfff8)!=0) new_chunk=false;
else {
new_chunk=true;
if ((cur_loc.l&0xffff0000)==0) cur_loc.h++;
}
}
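
@ The rules of the hexadecimal format can be replayed on strings held in
memory. The following sketch is not part of the simulator, and the helper
names |classify_hex_line|, |store_address|, and |next_location| are
hypothetical. It uses the same |sscanf| conversions as the code above, but
it checks the line length before looking at position 12, and it leaves out
the chunk bookkeeping entirely.

```c
#include <stdio.h>

typedef unsigned int tetra;
typedef struct { tetra h, l; } octa;

/* Returns 1 for a location line, 2 for a data line, 0 for anything else.
   A location line updates *loc; a data line fills *dat and leaves *loc alone. */
int classify_hex_line(const char *line, octa *loc, octa *dat)
{
  size_t n = 0;
  while (line[n] != '\0' && n < 13) n++;      /* is line[12] safe to inspect? */
  if (n == 13 && line[12] == ':')
    return sscanf(line, "%4x%8x", &loc->h, &loc->l) == 2 ? 1 : 0;
  if (line[0] == ' ')
    return sscanf(line + 1, "%8x%8x", &dat->h, &dat->l) == 2 ? 2 : 0;
  return 0;
}

/* Address actually used for storing: the three low bits are ignored. */
octa store_address(octa loc)
{
  loc.l &= ~(tetra)7;
  return loc;
}

/* Advance by one octabyte, carrying into the high half on wraparound. */
octa next_location(octa loc)
{
  loc.l = (loc.l + 8) & 0xffffffffu;
  if (loc.l < 8) loc.h++;
  return loc;
}
```

Feeding it the first two lines of the silly example gives location
\Hex{0123456789ab}, which |store_address| rounds down to
\Hex{0123456789a8}, followed by the octabyte \Hex{0123456789abcdef}.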
 
@* Binary input to memory.
When the program file was dumped by {\mc MMIX-SIM}, it
has the simple format discussed in exercise 1.4.3$'$--20 of the \MMIX\ fascicle.
@^binary files@>
@^segments@>
In this case we assume that the user's program has text, data, pool, and stack
segments, as in the conventions of that book.
We load it into four
$2^{32}$-byte pages of physical memory, one for each segment; page zero of
segment~$i$ is mapped to physical location $2^{32}i$. Page tables are kept in
physical locations starting at $2^{32}\times4$; static traps begin at
$2^{32}\times 5$ and dynamic traps at $2^{32}\times6$. (These conventions
agree with the special register settings
$\rm rT=\Hex{8000000500000000}$,
$\rm rTT=\Hex{8000000600000000}$,
$\rm rV=\Hex{369c200400000000}$
assumed by the stripped-down simulator.)
 
@<Input an \MMIX\ binary file@>=
{
prog_file=fopen(prog_file_name,"rb");
if (!prog_file) {
fprintf(stderr,"Panic: Can't open MMIX binary file %s!\n",prog_file_name);
@.Can't open...@>
exit(-3);
}
while (1) {
if (!undump_octa()) break;
new_chunk=true;
cur_loc=cur_dat;
if (cur_loc.h&0x9fffffff) bad_address=true;
else bad_address=false, cur_loc.h >>= 29;
/* apply trivial mapping function for each segment */
@<Input consecutive octabytes beginning at |cur_loc|@>;
}
@<Set up the canned environment@>;
}
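
@ The trivial mapping from segments to physical pages is easy to state on
its own. In this sketch the function name |map_segment| is hypothetical;
the mask and the shift are the ones used in the code above.

```c
typedef unsigned int tetra;
typedef struct { tetra h, l; } octa;

/* Maps a virtual address in segments 0..3 to its physical page.
   Returns 0 and leaves *phys alone if the address is unsupported. */
int map_segment(octa virt, octa *phys)
{
  if (virt.h & 0x9fffffffu) return 0;  /* only bits 62 and 61 may be set */
  phys->h = virt.h >> 29;              /* segment number becomes the page */
  phys->l = virt.l;
  return 1;
}
```

Thus an address in the data segment, whose high tetrabyte is
\Hex{20000000}, lands in physical page~1, while any negative address or any
address beyond the first $2^{32}$ bytes of a segment is rejected.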
 
@ The |undump_octa| routine reads eight bytes from the binary file
|prog_file| into the global octabyte |cur_dat|,
taking care as usual to be big-endian regardless of the host computer's bias.
@^big-endian versus little-endian@>
@^little-endian versus big-endian@>
 
@<Sub...@>=
static bool undump_octa @,@,@[ARGS((void))@];@+@t}\6{@>
static bool undump_octa()
{
register int t0,t1,t2,t3;
t0=fgetc(prog_file);@+ if (t0==EOF) return false;
t1=fgetc(prog_file);@+ if (t1==EOF) goto oops;
t2=fgetc(prog_file);@+ if (t2==EOF) goto oops;
t3=fgetc(prog_file);@+ if (t3==EOF) goto oops;
cur_dat.h=(t0<<24)+(t1<<16)+(t2<<8)+t3;
t0=fgetc(prog_file);@+ if (t0==EOF) goto oops;
t1=fgetc(prog_file);@+ if (t1==EOF) goto oops;
t2=fgetc(prog_file);@+ if (t2==EOF) goto oops;
t3=fgetc(prog_file);@+ if (t3==EOF) goto oops;
cur_dat.l=(t0<<24)+(t1<<16)+(t2<<8)+t3;
return true;
oops: fprintf(stderr,"Premature end of file on %s!\n",prog_file_name);
@.Premature end of file...@>
return false;
}
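
@ The same big-endian packing can be tried without a file. The sketch
below reads from a byte array instead of |prog_file|; the names
|pack_tetra| and |undump_from_buffer| are hypothetical. It converts each
byte to an unsigned tetrabyte before shifting, a precaution taken here so
that a leading byte of 128 or more cannot overflow a signed |int|.

```c
#include <stddef.h>

typedef unsigned int tetra;
typedef struct { tetra h, l; } octa;

/* Packs four bytes, most significant first, into one tetrabyte. */
static tetra pack_tetra(const unsigned char *b)
{
  return ((tetra)b[0] << 24) | ((tetra)b[1] << 16)
       | ((tetra)b[2] << 8) | (tetra)b[3];
}

/* Reads one octabyte from buf at *pos and advances *pos by 8.
   Returns 0, changing nothing, if fewer than 8 bytes remain. */
int undump_from_buffer(const unsigned char *buf, size_t len,
                       size_t *pos, octa *out)
{
  if (*pos > len || len - *pos < 8) return 0;
  out->h = pack_tetra(buf + *pos);
  out->l = pack_tetra(buf + *pos + 4);
  *pos += 8;
  return 1;
}
```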
 
@ @<Input consecutive octabytes beginning at |cur_loc|@>=
while (1) {
if (!undump_octa()) {
fprintf(stderr,"Unexpected end of file on %s!\n",prog_file_name);
@.Unexpected end of file...@>
break;
}
if (!(cur_dat.h || cur_dat.l)) break;
if (bad_address) {
fprintf(stderr,"Panic: Unsupported virtual address %08x%08x!\n",
@.Unsupported virtual address@>
cur_loc.h,cur_loc.l);
exit(-5);
}
if (new_chunk) mem_write(cur_loc,cur_dat);
else mem_hash[last_h].chunk[(cur_loc.l&0xffff)>>3]=cur_dat;
cur_loc.l+=8;
if ((cur_loc.l&0xfff8)!=0) new_chunk=false;
else {
new_chunk=true;
if ((cur_loc.l&0xffff0000)==0) {
bad_address=true; cur_loc.h=(cur_loc.h<<29)+1;
}
}
}
 
@ The primitive operating system assumed in simple programs of {\sl The
Art of Computer Programming\/} will set up text segment, data segment,
pool segment, and stack segment as in {\mc MMIX-SIM}. The runtime stack
will be initialized if we \.{UNSAVE} from the last location loaded
in the \.{.mmb} file.
 
@d rQ 16
 
@<Set up the canned environment@>=
if (cur_loc.h!=3) {
fprintf(stderr,"Panic: MMIX binary file didn't set up the stack!\n");
@.MMIX binary file...@>
exit(-6);
}
inst_ptr.o=mem_read(incr(cur_loc,-8*14)); /* \.{Main} */
inst_ptr.p=NULL;
cur_loc.h=0x60000000;
g[255].o=incr(cur_loc,-8); /* place to \.{UNSAVE} */
cur_dat.l=0x90;
if (mem_read(cur_dat).h) inst_ptr.o=cur_dat; /* start at |0x90| if nonzero */
head->inst=(UNSAVE<<24)+255, tail--; /* prefetch a fabricated command */
head->loc=incr(inst_ptr.o,-4); /* in case the \.{UNSAVE} is interrupted */
g[rT].o.h=0x80000005, g[rTT].o.h=0x80000006;
cur_dat.h=(RESUME<<24)+1, cur_dat.l=0, cur_loc.h=5, cur_loc.l=0;
mem_write(cur_loc,cur_dat); /* the primitive trap handler */
cur_dat.l=cur_dat.h, cur_dat.h=(NEGI<<24)+(255<<16)+1;
cur_loc.h=6, cur_loc.l=8;
mem_write(cur_loc,cur_dat); /* the primitive dynamic trap handler */
cur_dat.h=(GET<<24)+rQ, cur_dat.l=(PUTI<<24)+(rQ<<16), cur_loc.l=0;
mem_write(cur_loc,cur_dat); /* more of the primitive dynamic trap handler */
cur_dat.h=0, cur_dat.l=7; /* generate a PTE with \.{rwx} permission */
cur_loc.h=4; /* beginning of skeleton page table */
mem_write(cur_loc,cur_dat); /* PTE for the text segment */
ITcache->set[0][0].tag=zero_octa;
ITcache->set[0][0].data[0]=cur_dat; /* prime the IT cache */
cur_dat.l=6; /* PTE with read and write permission only */
cur_dat.h=1, cur_loc.l=3<<13;
mem_write(cur_loc,cur_dat); /* PTE for the data segment */
cur_dat.h=2, cur_loc.l=6<<13;
mem_write(cur_loc,cur_dat); /* PTE for the pool segment */
cur_dat.h=3, cur_loc.l=9<<13;
mem_write(cur_loc,cur_dat); /* PTE for the stack segment */
g[rK].o=neg_one; /* enable all interrupts */
g[rV].o.h=0x369c2004;
page_bad=false, page_r=4<<(32-13), page_s=32, page_mask.l=0xffffffff;
page_b[1]=3, page_b[2]=6, page_b[3]=9, page_b[4]=12;
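
@ The numbers assigned to |page_b|, |page_s|, and |page_r| above can be
checked by decoding rV mechanically. The sketch below assumes the field
layout of rV given in the \MMIX\ documentation (four 4-bit fields
$b_1$--$b_4$, then 8 bits of $s$, 27 bits of $r$, 10 bits of $n$, and
3 bits of $f$); that layout is not restated in this module. The names
|rv_fields|, |decode_rV|, and |segment_pte_base| are hypothetical, and a
64-bit integer type is assumed for brevity.

```c
#include <stdint.h>

typedef struct {
  unsigned b[5];     /* b[1]..b[4]; b[0] is taken to be 0 */
  unsigned s;        /* pages have 2^s bytes */
  uint32_t r;        /* root of the page table, in units of 2^13 bytes */
  unsigned n;        /* address space number */
  unsigned f;        /* function field */
} rv_fields;

rv_fields decode_rV(uint64_t rv)
{
  rv_fields v;
  v.b[0] = 0;
  v.b[1] = (unsigned)(rv >> 60) & 0xf;
  v.b[2] = (unsigned)(rv >> 56) & 0xf;
  v.b[3] = (unsigned)(rv >> 52) & 0xf;
  v.b[4] = (unsigned)(rv >> 48) & 0xf;
  v.s = (unsigned)(rv >> 40) & 0xff;
  v.r = (uint32_t)(rv >> 13) & 0x7ffffff;
  v.n = (unsigned)(rv >> 3) & 0x3ff;
  v.f = (unsigned)rv & 7;
  return v;
}

/* Physical address of the first PTE for segment seg (0..3). */
uint64_t segment_pte_base(const rv_fields *v, int seg)
{
  return ((uint64_t)v->r + v->b[seg]) << 13;
}
```

Decoding \Hex{369c200400000000} this way yields $b_1=3$, $b_2=6$, $b_3=9$,
$b_4=12$, $s=32$, and $r=2^{21}$, so the PTE for the data segment sits at
$2^{32}\times4+3\times2^{13}$, which is where the code above writes it.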
 
@* Interaction. When prompted for instructions, this simulator
@.mmmix>@>
understands the following terse commands:
 
\def\bull{\smallbreak\textindent{$\bullet$}}
\def\<#1>{$\langle\,$#1$\,\rangle$}
\bull\<positive integer>: Run for this many clock cycles.
 
\bull\.{@@}\<hexadecimal integer>: Set the instruction pointer
to this virtual address; successive instructions will be fetched from here.
 
\bull\.{b}\<hexadecimal integer>: Set the breakpoint
to this virtual address; simulation will pause when an instruction from the
breakpoint address enters the fetch buffer.
 
\bull\.v\<hexadecimal integer>: Set the desired level of diagnostic
output; each bit in the hexadecimal integer enables certain printouts
when the simulator is running. Bit \Hex1 shows instructions when issued,
deissued, or committed; \Hex2 shows the pipeline and locks after each cycle;
\Hex4 shows each coroutine activation; \Hex8 each coroutine scheduling;
\Hex{10} reports when reading from an uninitialized chunk of memory;
\Hex{20} asks for online input when reading from addresses $\ge2^{48}$;
\Hex{40} reports all I/O to memory address $\ge2^{48}$;
\Hex{80} shows details of branch prediction;
\Hex{100} displays full cache contents including blocks with invalid tags.
 
\bull\.-\<integer>: Deissue this many instructions.
 
\bull\.l\<integer> or \.g\<integer>: Show current ``hot'' contents
of a local or global register.
 
\bull\.m\<hexadecimal integer>: Show current contents of a physical memory
address. (This value may not be up to date; newer values might appear
in the write buffer and/or in the caches.)
 
\bull\.f\<hexadecimal integer>: Insert a tetrabyte into the fetch buffer.
(Use with care!)
 
\bull\.i\<integer>: Set the interval counter rI to the given value; this will
trigger an interrupt after the specified number of cycles.
 
\bull\.{IT}, \.{DT}, \.I, \.D, or \.S: Show current contents of a cache.
 
\bull\.{D*} or \.{S*}: Show dirty blocks of a cache.
 
\bull\.p: Show current contents of the pipeline.
 
\bull\.s: Show current statistics on branch prediction and
speed of instruction issue.
 
\bull\.h: Help (show the possibilities for interaction).
 
\bull\.q: Quit.
 
@<Run the simulation interactively@>=
while (1) {
printf("mmmix> ");@+fflush(stdout);
@.mmmix>@>
if (!fgets(buffer,BUF_SIZE,stdin)) break; /* treat end of input like \.q */
switch (buffer[0]) {
default: what_say:
printf("Eh? Sorry, I don't understand. (Type h for help)\n");
continue;
case 'q': case 'x': goto done;
@<Cases for interaction@>@;
}
}
done:@;
 
@ @<Cases...@>=
case 'h': case '?': printf("The interactive commands are as follows:\n");
printf(" <n> to run for n cycles\n");
printf(" @@<x> to take next instruction from location x\n");
printf(" b<x> to pause when location x is fetched\n");
printf(" v<x> to print specified diagnostics when running;\n");
printf(" x=1[insts enter/leave pipe]+2[whole pipeline each cycle]+\n");
printf(" 4[coroutine activations]+8[coroutine scheduling]+\n");
printf(" 10[uninitialized read]+20[online I/O read]+\n");
printf(" 40[I/O read/write]+80[branch prediction details]+\n");
printf(" 100[invalid cache blocks displayed too]\n");
printf(" -<n> to deissue n instructions\n");
printf(" l<n> to print current value of local register n\n");
printf(" g<n> to print current value of global register n\n");
printf(" m<x> to print current value of memory address x\n");
printf(" f<x> to insert instruction x into the fetch buffer\n");
printf(" i<n> to initiate a timer interrupt after n cycles\n");
printf(" IT, DT, I, D, or S to print current cache contents\n");
printf(" D* or S* to print dirty blocks of a cache\n");
printf(" p to print current pipeline contents\n");
printf(" s to print current stats\n");
printf(" h to print this message\n");
printf(" q to exit\n");
printf("(Here <n> is a decimal integer, <x> is hexadecimal.)\n");
continue;
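
@ Since the argument of the \.v command is a sum of bits, it may help to
see the sum taken apart again. The following sketch is not part of the
simulator; the table |verbose_names| simply repeats the bracketed phrases
of the help message, and the function name |describe_verbose| is
hypothetical.

```c
#include <stdio.h>
#include <string.h>

static const char *const verbose_names[9] = {
  "insts enter/leave pipe",             /* 1 */
  "whole pipeline each cycle",          /* 2 */
  "coroutine activations",              /* 4 */
  "coroutine scheduling",               /* 8 */
  "uninitialized read",                 /* 10 (hex) */
  "online I/O read",                    /* 20 */
  "I/O read/write",                     /* 40 */
  "branch prediction details",          /* 80 */
  "invalid cache blocks displayed too"  /* 100 */
};

/* Writes a '+'-separated list of the diagnostics enabled by v into out. */
void describe_verbose(unsigned v, char *out, size_t size)
{
  int k;
  if (size == 0) return;
  out[0] = '\0';
  for (k = 0; k < 9; k++)
    if (v & (1u << k)) {
      size_t used = strlen(out);
      snprintf(out + used, size - used, "%s%s",
               used ? "+" : "", verbose_names[k]);
    }
}
```

For example, \.{v41} selects the first and seventh entries of the table.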
 
@ @<Cases...@>=
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
if (sscanf(buffer,"%d",&n)!=1) goto what_say;
printf("Running %d at time %d",n,ticks.l);
if (bp.h==(tetra)-1 && bp.l==(tetra)-1) printf("\n");
else printf(" with breakpoint %08x%08x\n",bp.h,bp.l);
MMIX_run(n,bp);@+continue;
case '@@': inst_ptr.o=read_hex(buffer+1);@+inst_ptr.p=NULL;@+continue;
case 'b': bp=read_hex(buffer+1);@+continue;
case 'v': verbose=read_hex(buffer+1).l;@+continue;
 
@ @<Glob...@>=
int n,m; /* temporary integer */
octa bp={-1,-1}; /* breakpoint */
octa tmp; /* an octabyte of temporary interest */
static unsigned char d[BUF_SIZE];
 
@ Here's a simple program to read an octabyte in hexadecimal notation
from a buffer. It changes the buffer by storing a null character
after the input.
@^radix conversion@>
 
@<Sub...@>=
octa read_hex @,@,@[ARGS((char *))@];@+@t}\6{@>
octa read_hex(p)
char *p;
{
register int j,k;
octa val;
val.h=val.l=0;
for (j=0;;j++) {
if (p[j]>='0' && p[j]<='9') d[j]=p[j]-'0';
else if (p[j]>='a' && p[j]<='f') d[j]=p[j]-'a'+10;
else if (p[j]>='A' && p[j]<='F') d[j]=p[j]-'A'+10;
else break;
}
p[j]='\0';
for (j--,k=0;k<=j;k++) {
if (k>=8) val.h+=d[j-k]<<(4*k-32);
else val.l+=d[j-k]<<(4*k);
}
return val;
}
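
@ For comparison, here is a sketch of a variant that accumulates the
digits from left to right and leaves the buffer untouched. It is an
alternative written for illustration, not the routine used by the
simulator, and the name |read_hex_nondestructive| is hypothetical. If more
than 16 digits are present, only the last 16 contribute to the result.

```c
#include <ctype.h>

typedef unsigned int tetra;
typedef struct { tetra h, l; } octa;

/* Converts the leading hexadecimal digits of p into an octabyte. */
octa read_hex_nondestructive(const char *p)
{
  octa val = {0, 0};
  for (; isxdigit((unsigned char)*p); p++) {
    tetra d = isdigit((unsigned char)*p)
            ? (tetra)(*p - '0')
            : (tetra)(tolower((unsigned char)*p) - 'a' + 10);
    /* shift the 64-bit value left by four bits, then add the new digit */
    val.h = ((val.h << 4) | (val.l >> 28)) & 0xffffffffu;
    val.l = ((val.l << 4) | d) & 0xffffffffu;
  }
  return val;
}
```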
 
@ @<Cases...@>=
case '-':@+ if (sscanf(buffer+1,"%d",&n)!=1 || n<0) goto what_say;
if (cool<=hot) m=hot-cool;@+else m=(hot-reorder_bot)+1+(reorder_top-cool);
if (n>m) deissues=m;@+else deissues=n;
continue;
case 'l':@+ if (sscanf(buffer+1,"%d",&n)!=1 || n<0) goto what_say;
if (n>=lring_size) goto what_say;
printf(" l[%d]=%08x%08x\n",n,l[n].o.h,l[n].o.l);@+continue;
case 'm': tmp=mem_read(read_hex(buffer+1));
printf(" m[%s]=%08x%08x\n",buffer+1,tmp.h,tmp.l);@+continue;
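
@ The count computed for the \.- command is a distance in a circular
buffer. The sketch below restates that arithmetic with integer indices in
place of the simulator's pointers, which is an assumption made to keep the
example self-contained; |ring_count| and |clamp_deissues| are hypothetical
names.

```c
/* Number of entries between cool and hot in a ring occupying bot..top,
   where the ring is traversed downward and wraps from bot to top. */
int ring_count(int hot, int cool, int bot, int top)
{
  if (cool <= hot) return hot - cool;
  return (hot - bot) + 1 + (top - cool);
}

/* A request for n deissues is limited to what is available. */
int clamp_deissues(int n, int available)
{
  return n > available ? available : n;
}
```

With a ring of eight slots numbered 0 to 7, both the pair hot=5, cool=2
and the wrapped pair hot=1, cool=6 give a count of three.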
 
@ The register stack pointers, rO and rS, are not kept up to date
in the |g| array. Therefore we have to deduce their values by
examining the pipeline.
 
@<Cases...@>=
case 'g':@+ if (sscanf(buffer+1,"%d",&n)!=1 || n<0) goto what_say;
if (n>=256) goto what_say;
if (n==rO || n==rS) {
if (hot==cool) /* pipeline empty */
g[rO].o=sl3(cool_O), g[rS].o=sl3(cool_S);
else g[rO].o=sl3(hot->cur_O), g[rS].o=sl3(hot->cur_S);
}
printf(" g[%d]=%08x%08x\n",n,g[n].o.h,g[n].o.l);
continue;
 
@ @<Sub...@>=
static octa sl3 @,@,@[ARGS((octa))@];@+@t}\6{@>
static octa sl3(y) /* shift left by 3 bits */
octa y;
{
register tetra yhl=y.h<<3, ylh=y.l>>29;
y.h=yhl+ylh;@+ y.l<<=3;
return y;
}
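
@ The carry from the low half into the high half is the only subtle point
of |sl3|. Here is a stand-alone sketch with fixed-width types, so that the
result can be compared with an ordinary 64-bit shift; the names |octa32|
and |shift_left_3| are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa32;

/* Shifts the 64-bit quantity (h,l) left by three bits. */
octa32 shift_left_3(octa32 y)
{
  uint32_t carry = y.l >> 29;      /* the three bits that cross over */
  y.h = (uint32_t)(y.h << 3) + carry;
  y.l = (uint32_t)(y.l << 3);
  return y;
}
```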
 
@ @<Cases...@>=
case 'I': print_cache(buffer[1]=='T'? ITcache: Icache,false);@+continue;
case 'D': print_cache(buffer[1]=='T'? DTcache: Dcache,@/
buffer[1]=='*');@+continue;
case 'S': print_cache(Scache,buffer[1]=='*');@+continue;
case 'p': print_pipe();@+print_locks();@+continue;
case 's': print_stats();@+continue;
case 'i':@+ if (sscanf(buffer+1,"%d",&n)==1) g[rI].o=incr(zero_octa,n);
continue;
 
@ @<Cases...@>=
case 'f': tmp=read_hex(buffer+1);
{
register fetch* new_tail;
if (tail==fetch_bot) new_tail=fetch_top;
else new_tail=tail-1;
if (new_tail==head) printf("Sorry, the fetch buffer is full!\n");
else {
tail->loc=inst_ptr.o;
tail->inst=tmp.l;
tail->interrupt=0;
tail->noted=false;
tail=new_tail;
}
continue;
}
 
@ A hidden case here, for me when debugging.
It essentially disables the translation caches, by mapping everything
to zero.
 
@<Cases...@>=
case 'd':@+if (ticks.l)
printf("Sorry: I disable ITcache and DTcache only at the beginning!\n");
else {
ITcache->set[0][0].tag=zero_octa;
ITcache->set[0][0].data[0]=seven_octa;
DTcache->set[0][0].tag=zero_octa;
DTcache->set[0][0].data[0]=seven_octa;
g[rK].o=neg_one;
page_bad=false;
page_mask=neg_one;
inst_ptr.p=(specnode*)1;
}@+continue;
 
@ And another case, for me when kludging. At the moment,
it simply lists the functional unit names.
 
But I might decide to put other stuff here when giving a demo.
 
@<Cases...@>=
case 'k':@+ { register int j;
for (j=0;j<funit_count;j++)
printf("unit %s %d\n",funit[j].name,funit[j].k);
}
continue;
 
@ @<Glob...@>=
bool bad_address;
extern bool page_bad;
extern octa page_mask;
extern int page_r,page_s,page_b[5];
extern octa zero_octa;
extern octa neg_one;
octa seven_octa={0,7};
extern octa incr @,@,@[ARGS((octa y,int delta))@];
/* unsigned $y+\delta$ ($\delta$ is signed) */
extern void mmix_io_init @,@,@[ARGS((void))@];
extern void MMIX_config @,@,@[ARGS((char*))@];
 
@* Index.
/crypto1.mms
0,0 → 1,73
* Cryptanalysis Problem (CLASSIFIED) (pipelined)
a       GREG
b       GREG
bb      GREG
c       GREG
t       GREG
x       GREG
y       GREG
        LOC     Data_Segment
freq    GREG    @               Base address for byte counts
        LOC     @+8*(1<<8)      Space for the byte frequencies
p       GREG    @
        BYTE    "abracadabraa",0,"abc"  Trivial test data
ones    GREG    #0101010101010101
        LOC     #100
Start   LDOU    a,p,0
        INCL    p,8
        BDIF    t,ones,a
        BNZ     t,3F            Do main loop, unless near the end.
2H      SRU     b,a,53
        LDO     c,freq,b        Load old count.
        SLU     bb,a,8
        INCL    c,1
        SRU     bb,bb,53
        STO     c,freq,b        Store new count.
        LDO     c,freq,bb
        SLU     b,a,16
        INCL    c,1
        SRU     b,b,53
        STO     c,freq,bb
        LDO     c,freq,b        Load old count.
        SLU     bb,a,24
        INCL    c,1
        SRU     bb,bb,53
        STO     c,freq,b        Store new count.
        LDO     c,freq,bb
        SLU     b,a,32
        INCL    c,1
        SRU     b,b,53
        STO     c,freq,bb
        LDO     c,freq,b        Load old count.
        SLU     bb,a,40
        INCL    c,1
        SRU     bb,bb,53
        STO     c,freq,b        Store new count.
        LDO     c,freq,bb
        SLU     b,a,48
        INCL    c,1
        SRU     b,b,53
        STO     c,freq,bb
        LDO     c,freq,b        Load old count.
        SLU     bb,a,56
        INCL    c,1
        SRU     bb,bb,53
        STO     c,freq,b        Store new count.
        LDO     c,freq,bb
        LDOU    a,p,0
        INCL    p,8
        INCL    c,1
        BDIF    t,ones,a
        STO     c,freq,bb
        PBZ     t,2B            Do main loop, unless near the end.
3H      SRU     b,a,53
        LDO     c,freq,b        Load old count.
        INCL    c,1
        STO     c,freq,b        Store new count.
        SRU     b,b,3
        SLU     a,a,8
        PBNZ    b,3B            Continue unless done.
        POP

Main    IS      Start
 
/cp.mms
0,0 → 1,15
% copy from StdIn to StdOut, no error checking
        LOC     Data_Segment
        GREG    @
ArgR    OCTA    Buf,2           one char at a time
ArgW    OCTA    Buf,1           ditto
Buf     LOC     @+2

        LOC     #100
Main    LDA     $255,ArgR
        TRAP    0,Fgets,StdIn
        BN      $255,Done
        LDA     $255,ArgW
        TRAP    0,Fwrite,StdOut
        JMP     Main
Done    TRAP    0,0,Halt
/crypto2.mms
0,0 → 1,86
* Cryptanalysis Problem (CLASSIFIED) (pipelined, superscalar)
a       GREG
b       GREG
bb      GREG
bbb     GREG
bbbb    GREG
c       GREG
cc      GREG
t       GREG
x       GREG
y       GREG
        LOC     Data_Segment
freq    GREG    @               Base address for even byte counts
        LOC     @+8*(1<<8)      Space for the byte frequencies
freqq   GREG    @               Base address for odd byte counts
        LOC     @+8*(1<<8)      Space for the byte frequencies
p       GREG    @
        BYTE    "abracadabraa",0,"abc"  Trivial test data
ones    GREG    #0101010101010101
        LOC     #100
Start   LDOU    a,p,0
        INCL    p,8
        BDIF    t,ones,a
        SLU     bb,a,8
        BNZ     t,3F            Do main loop, unless near the end.
2H      SRU     b,a,53
        SRU     bb,bb,53
        LDO     c,freq,b
        LDO     cc,freqq,bb
        SLU     bbb,a,16
        SLU     bbbb,a,24
        INCL    c,1
        INCL    cc,1
        SRU     bbb,bbb,53
        SRU     bbbb,bbbb,53
        STO     c,freq,b
        STO     cc,freqq,bb
        LDO     c,freq,bbb
        LDO     cc,freqq,bbbb
        SLU     b,a,32
        SLU     bb,a,40
        INCL    c,1
        INCL    cc,1
        SRU     b,b,53
        SRU     bb,bb,53
        STO     c,freq,bbb
        STO     cc,freqq,bbbb
        LDO     c,freq,b
        LDO     cc,freqq,bb
        SLU     bbb,a,48
        SLU     bbbb,a,56
        INCL    c,1
        INCL    cc,1
        SRU     bbb,bbb,53
        SRU     bbbb,bbbb,53
        STO     c,freq,b
        STO     cc,freqq,bb
        LDO     c,freq,bbb
        LDO     cc,freqq,bbbb
        LDOU    a,p,0
        INCL    p,8
        INCL    c,1
        INCL    cc,1
        BDIF    t,ones,a
        SLU     bb,a,8
        STO     c,freq,bbb
        STO     cc,freqq,bbbb
        PBZ     t,2B
3H      SRU     b,a,53
        LDO     c,freq,b
        INCL    c,1
        STO     c,freq,b
        SRU     b,b,3
        SLU     a,a,8
        PBNZ    b,3B
        SET     p,8*255
4H      LDO     c,freq,p
        LDO     cc,freqq,p
        ADD     c,c,cc
        STO     c,freq,p
        SUB     p,p,8
        PBP     p,4B
        POP

Main    IS      Start
 
/mmotype.w
0,0 → 1,466
% This file is part of the MMIXware package (c) Donald E Knuth 1999
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
 
\def\title{MMOTYPE}
\def\MMIX{\.{MMIX}}
\def\MMIXAL{\.{MMIXAL}}
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
 
@* Introduction. This program reads a binary \.{mmo} file output by
the \MMIXAL\ processor and lists it in human-readable form. With the
\.{-s} option it lists only the symbol table; with the \.{-v} option it
also lists the tetrabytes of input as they are read.
 
@s tetra int
 
@c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
@<Prototype preparations@>@;
@<Type definitions@>@;
@<Global variables@>@;
@<Subroutines@>@;
@#
int main(argc,argv)
int argc;@+char*argv[];
{
register int j,delta,postamble=0;
register char *p;
@<Process the command line@>;
@<Initialize everything@>;
@<List the preamble@>;
do @<List the next item@>@;@+while (!postamble);
@<List the postamble@>;
@<List the symbol table@>;
return 0;
}
 
@ @<Process the command line@>=
listing=1, verbose=0;
for (j=1;j<argc-1 && argv[j][0]=='-' && argv[j][2]=='\0';j++) {
if (argv[j][1]=='s') listing=0;
else if (argv[j][1]=='v') verbose=1;
else break;
}
if (j!=argc-1) {
fprintf(stderr,"Usage: %s [-s] [-v] mmofile\n",argv[0]);
@.Usage: ...@>
exit(-1);
}
 
@ @<Initialize everything@>=
mmo_file=fopen(argv[argc-1],"rb");
if (!mmo_file) {
fprintf(stderr,"Can't open file %s!\n",argv[argc-1]);
@.Can't open...@>
exit(-2);
}
 
@ @<Glob...@>=
int listing; /* are we listing everything? */
int verbose; /* are we also showing the tetras of input as they are read? */
FILE *mmo_file; /* the input file */
 
@ @<Prototype preparations@>=
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif
 
@ A complete definition of \.{mmo} format appears in the \MMIXAL\ document.
Here we need to define only the basic constants used for interpretation.
 
@d mm 0x98 /* the escape code of \.{mmo} format */
@d lop_quote 0x0 /* the quotation lopcode */
@d lop_loc 0x1 /* the location lopcode */
@d lop_skip 0x2 /* the skip lopcode */
@d lop_fixo 0x3 /* the octabyte-fix lopcode */
@d lop_fixr 0x4 /* the relative-fix lopcode */
@d lop_fixrx 0x5 /* extended relative-fix lopcode */
@d lop_file 0x6 /* the file name lopcode */
@d lop_line 0x7 /* the file position lopcode */
@d lop_spec 0x8 /* the special hook lopcode */
@d lop_pre 0x9 /* the preamble lopcode */
@d lop_post 0xa /* the postamble lopcode */
@d lop_stab 0xb /* the symbol table lopcode */
@d lop_end 0xc /* the end-it-all lopcode */
 
@* Low-level arithmetic. This program is intended to work correctly
whenever an |int| has at least 32 bits.
 
@<Type...@>=
typedef unsigned char byte; /* a monobyte */
typedef unsigned int tetra; /* a tetrabyte */
typedef struct {@+tetra h,l;}@+octa; /* an octabyte */
 
@ The |incr| subroutine adds a signed integer to an (unsigned) octabyte.
 
@<Sub...@>=
octa incr @,@,@[ARGS((octa,int))@];
octa incr(o,delta)
octa o;
int delta;
{
register tetra t;
octa x;
if (delta>=0) {
t=0xffffffff-delta;
if (o.l<=t) x.l=o.l+delta, x.h=o.h;
else x.l=o.l-t-1, x.h=o.h+1;
} else {
t=-delta;
if (o.l>=t) x.l=o.l-t, x.h=o.h;
else x.l=o.l+(0xffffffff+delta)+1, x.h=o.h-1;
}
return x;
}
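
@ On a host that has a 64-bit integer type, which this program
deliberately does not require, the case analysis in |incr| collapses to a
single addition. The sketch below is offered only as a cross-check of the
routine above; the names |octa32| and |incr64| are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa32;

/* Adds the signed delta to o, wrapping modulo 2^64. */
octa32 incr64(octa32 o, int delta)
{
  uint64_t v = ((uint64_t)o.h << 32) | o.l;
  v += (uint64_t)(int64_t)delta;      /* sign-extend, then wrap */
  o.h = (uint32_t)(v >> 32);
  o.l = (uint32_t)v;
  return o;
}
```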
 
@* Low-level input. The tetrabytes of an \.{mmo} file are stored in
friendly big-endian fashion, but this program is supposed to work also
on computers that are little-endian. Therefore we read four successive bytes
and pack them into a tetrabyte, instead of reading a single tetrabyte.
 
@<Sub...@>=
void read_tet @,@,@[ARGS((void))@];
void read_tet()
{
if (fread(buf,1,4,mmo_file)!=4) {
fprintf(stderr,"Unexpected end of file after %d tetras!\n",count);
@.Unexpected end of file...@>
exit(-3);
}
yz=(buf[2]<<8)+buf[3];
tet=(((buf[0]<<8)+buf[1])<<16)+yz;
if (verbose) printf(" %08x\n",tet);
count++;
}
 
@ @<Sub...@>=
byte read_byte @,@,@[ARGS((void))@];
byte read_byte()
{
register byte b;
if (!byte_count) read_tet();
b=buf[byte_count];
byte_count=(byte_count+1)&3;
return b;
}
 
@ @<Glob...@>=
int count; /* the number of tetrabytes we've read */
int byte_count; /* index of the next-to-be-read byte */
byte buf[4]; /* the most recently read bytes */
int yz; /* the two least significant bytes */
tetra tet; /* |buf| bytes packed big-endianwise */
 
@ @<Init...@>=
count=byte_count=0;
 
@* The main loop. Now for the bread-and-butter part of this program.
 
@<List the next item@>=
{
read_tet();
loop:@+if (buf[0]==mm) switch (buf[1]) {
case lop_quote:@+if (yz!=1)
err("YZ field of lop_quote should be 1");
@.YZ field...should be 1@>
read_tet();@+break;
@t\4@>@<Cases for lopcodes in the main loop@>@;
default: err("Unknown lopcode");
@.Unknown lopcode@>
}
if (listing) @<List |tet| as a normal item@>;
}
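
@ The test at the top of the main loop depends on splitting a tetrabyte
into its four bytes. A minimal sketch of that split follows; the type
|tetra_fields| and the function |split_tetra| are hypothetical names, and
the constant repeats the value of |mm| defined earlier.

```c
#include <stdint.h>

#define MM 0x98   /* the escape code, same value as mm */

typedef struct {
  int is_lop;      /* nonzero if the tetra begins with the escape code */
  unsigned lop;    /* second byte: the lopcode, meaningful if is_lop */
  unsigned y, z;   /* third and fourth bytes */
  unsigned yz;     /* the two low bytes taken together */
} tetra_fields;

tetra_fields split_tetra(uint32_t tet)
{
  tetra_fields f;
  f.is_lop = ((tet >> 24) & 0xff) == MM;
  f.lop = (tet >> 16) & 0xff;
  f.y = (tet >> 8) & 0xff;
  f.z = tet & 0xff;
  f.yz = tet & 0xffff;
  return f;
}
```

For instance \Hex{98010002} is a |lop_loc| with $\rm Y=0$ and $\rm Z=2$,
while \Hex{12345678} is ordinary data.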
 
@ We want to catch all cases where the rules of \.{mmo} format are
not obeyed. The |err| macro ameliorates this somewhat tedious chore.
 
@d err(m) {@+fprintf(stderr,"Error in tetra %d: %s!\n",count,m);@+ continue;@+}
@.Error in tetra...@>
 
@ In a normal situation, the newly read tetrabyte is simply supposed
to be loaded into the current location. We list not only the current
location but also the current file position, if |cur_line| is nonzero
and |cur_loc| belongs to segment~0.
 
@<List |tet| as a normal item@>=
{
printf("%08x%08x: %08x",cur_loc.h,cur_loc.l,tet);
if (!cur_line) printf("\n");
else {
if (cur_loc.h&0xe0000000) printf("\n");
else {
if (cur_file==listed_file) printf(" (line %d)\n",cur_line);
else {
printf(" (\"%s\", line %d)\n", file_name[cur_file], cur_line);
listed_file=cur_file;
}
}
cur_line++;
}
cur_loc=incr(cur_loc,4);@+ cur_loc.l &=-4;
}
 
@ @<Glob...@>=
octa cur_loc; /* the current location */
int listed_file; /* the most recently listed file number */
int cur_file; /* the most recently selected file number */
int cur_line; /* the current position in |cur_file| */
char *file_name[256]; /* file names seen */
octa tmp; /* an octabyte of temporary interest */
 
@ @<Init...@>=
cur_loc.h=cur_loc.l=0;
listed_file=cur_file=-1;
cur_line=0;
 
@* The simple lopcodes. We have already implemented |lop_quote|, which
falls through to the normal case after reading an extra tetrabyte.
Now let's consider the other lopcodes in turn.
 
@d y buf[2] /* the next-to-least significant byte */
@d z buf[3] /* the least significant byte */
 
@<Cases...@>=
case lop_loc:@+if (z==2) {
j=y;@+ read_tet();@+ cur_loc.h=(j<<24)+tet;
}@+else if (z==1) cur_loc.h=y<<24;
else err("Z field of lop_loc should be 1 or 2");
@:Z field of lop_loc...}\.{Z field of lop\_loc...@>
read_tet();@+ cur_loc.l=tet;
continue;
case lop_skip: cur_loc=incr(cur_loc,yz);@+continue;
 
@ Fixups load information out of order, when future references have
been resolved. The current file name and line number are not considered
relevant.
 
@<Cases...@>=
case lop_fixo:@+if (z==2) {
j=y;@+ read_tet();@+ tmp.h=(j<<24)+tet;
}@+else if (z==1) tmp.h=y<<24;
else err("Z field of lop_fixo should be 1 or 2");
@:Z field of lop_fixo...}\.{Z field of lop\_fixo...@>
read_tet();@+ tmp.l=tet;
if (listing) printf("%08x%08x: %08x%08x\n",tmp.h,tmp.l,cur_loc.h,cur_loc.l);
continue;
case lop_fixr: delta=yz; goto fixr;
case lop_fixrx:j=yz;@+if (j!=16 && j!=24)
err("YZ field of lop_fixrx should be 16 or 24");
@:YZ field of lop_fixrx...}\.{YZ field of lop\_fixrx...@>
read_tet(); delta=tet;
if (delta&0xfe000000) err("increment of lop_fixrx is too large");
@.increment...too large@>
fixr: tmp=incr(cur_loc,-(delta>=0x1000000? (delta&0xffffff)-(1<<j): delta)<<2);
if (listing) printf("%08x%08x: %08x\n",tmp.h,tmp.l,delta);
continue;
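
@ The expression at label |fixr| packs several steps into one line. The
following sketch spells out the same arithmetic with 64-bit integers,
which is an assumption of the sketch and not of the program; the names
|fixr_target| and |fixrx_target| are hypothetical.

```c
#include <stdint.h>

/* Address listed for lop_fixr: delta tetrabytes before cur. */
uint64_t fixr_target(uint64_t cur, uint32_t delta)
{
  return cur - 4 * (uint64_t)delta;
}

/* Address listed for lop_fixrx; j is 16 or 24, delta has at most 25 bits.
   When bit 24 of delta is set, the low 24 bits minus 2^j give a negative
   offset, so the result lies after cur instead of before it. */
uint64_t fixrx_target(uint64_t cur, int j, uint32_t delta)
{
  int64_t d = delta >= 0x1000000u
            ? (int64_t)(delta & 0xffffffu) - ((int64_t)1 << j)
            : (int64_t)delta;
  return cur - (uint64_t)(4 * d);     /* wraps modulo 2^64 when d<0 */
}
```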
 
@ The space for file names isn't allocated until we are sure we need it.
 
@<Cases...@>=
case lop_file:@+if (file_name[y]) {
for (j=z;j>0;j--) read_tet();
cur_file=y;
if (z) err("Two file names with the same number");
@.Two file names...@>
}@+else {
if (!z) err("No name given for newly selected file");
@.No name given...@>
file_name[y]=(char*)calloc(4*z+1,1);
if (!file_name[y]) {
fprintf(stderr,"No room to store the file name!\n");@+exit(-4);
@.No room...@>
}
cur_file=y;
for (j=z,p=file_name[y]; j>0; j--,p+=4) {
read_tet();
*p=buf[0];@+*(p+1)=buf[1];@+*(p+2)=buf[2];@+*(p+3)=buf[3];
}
}
cur_line=0;@+continue;
case lop_line:@+if (cur_file<0) err("No file was selected for lop_line");
@.No file was selected...@>
cur_line=yz;@+continue;
 
@ Special bytes in the file might be in synch with the current location
and/or the current file position, so we list those parameters too.
 
@<Cases...@>=
case lop_spec:@+if (listing) {
printf("Special data %d at loc %08x%08x", yz, cur_loc.h, cur_loc.l);
if (!cur_line) printf("\n");
else if (cur_file==listed_file) printf(" (line %d)\n",cur_line);
else {
printf(" (\"%s\", line %d)\n", file_name[cur_file], cur_line);
listed_file=cur_file;
}
}
while(1) {
read_tet();
if (buf[0]==mm) {
if (buf[1]!=lop_quote || yz!=1) goto loop; /* end of special data */
read_tet();
}
if (listing) printf(" %08x\n",tet);
}
 
@ The other cases shouldn't appear in the main loop.
 
@<Cases...@>=
case lop_pre: err("Can't have another preamble");
@.Can't have another...@>
case lop_post: postamble=1;
if (y) err("Y field of lop_post should be zero");
@:Y field of lop_post...}\.{Y field of lop\_post...@>
if (z<32) err("Z field of lop_post must be 32 or more");
@:Z field of lop_post...}\.{Z field of lop\_post...@>
continue;
case lop_stab: err("Symbol table must follow postamble");
@.Symbol table...@>
case lop_end: err("Symbol table can't end before it begins");
 
@* The preamble and postamble. Now here's what we do before and after
the main loop.
 
@<List the preamble@>=
read_tet(); /* read the first tetrabyte of input */
if (buf[0]!=mm || buf[1]!=lop_pre) {
fprintf(stderr,"Input is not an MMO file (first two bytes are wrong)!\n");
@.Input is not...@>
exit(-5);
}
if (y!=1) fprintf(stderr,
"Warning: I'm reading this file as version 1, not version %d!\n",y);
@.I'm reading this file...@>
if (z>0) {
j=z;
read_tet();
if (listing) {
time_t tt=(time_t)tet; /* |time_t| might be wider than a tetrabyte */
printf("File was created %s",asctime(localtime(&tt)));
}
for (j--;j>0;j--) {
read_tet();
if (listing) printf("Preamble data %08x\n",tet);
}
}
 
@ @<List the postamble@>=
for (j=z;j<256;j++) {
read_tet();@+tmp.h=tet;@+read_tet();
if (listing) {
if (tmp.h || tet) printf("g%03d: %08x%08x\n",j,tmp.h,tet);
else printf("g%03d: 0\n",j);
}
}
 
@* The symbol table. Finally we come to the symbol table, which is
the most interesting part of this program because it recursively
traces an implicit ternary trie structure.
 
@<List the symbol table@>=
read_tet();
if (buf[0]!=mm || buf[1]!=lop_stab) {
fprintf(stderr,"Symbol table does not follow the postamble!\n");
@.Symbol table...@>
exit(-6);
}
if (yz) fprintf(stderr,"YZ field of lop_stab should be zero!\n");
@.YZ field...should be zero@>
printf("Symbol table (beginning at tetra %d):\n",count);
stab_start=count;
sym_ptr=sym_buf;
print_stab();
@<Check the |lop_end|@>;
 
@ The main work is done by a recursive subroutine called |print_stab|,
which manipulates a global array |sym_buf| containing the current
symbol prefix; the global variable |sym_ptr| points to the first
unfilled character of that array.
 
@<Sub...@>=
void print_stab @,@,@[ARGS((void))@];
void print_stab()
{
register int m=read_byte(); /* the master control byte */
register int c; /* the character at the current trie node */
register int j,k;
if (m&0x40) print_stab(); /* traverse the left subtrie, if it is nonempty */
if (m&0x2f) {
@<Read the character |c|@>;
*sym_ptr++=c;
if (sym_ptr==&sym_buf[sym_length_max]) {
fprintf(stderr,"Oops, the symbol is too long!\n");@+exit(-7);
@.Oops...too long@>
}
if (m&0xf)
@<Print the current symbol with its equivalent and serial number@>;
if (m&0x20) print_stab(); /* traverse the middle subtrie */
sym_ptr--;
}
if (m&0x10) print_stab(); /* traverse the right subtrie, if it is nonempty */
}
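@ The tests on |m| in |print_stab| can be summarized by a sketch like the following, which only restates the bit masks used above; the type |master_info| and the function |decode_master| are hypothetical names. Judging from the order of the calls, a node is laid out as its master byte, the left subtrie, the character, the equivalent and serial number (if any), the middle subtrie, and finally the right subtrie.

```c
#include <stdbool.h>

typedef struct {
  bool wide_char;  /* 0x80: the character occupies two bytes */
  bool has_left;   /* 0x40: a left subtrie follows the master byte */
  bool has_middle; /* 0x20: a middle subtrie follows the character */
  bool has_right;  /* 0x10: a right subtrie comes last */
  int equiv_code;  /* low four bits; nonzero when a symbol ends here */
  bool has_char;   /* m&0x2f: a character is stored at this node */
} master_info;

/* Unpack one master control byte into its separate fields. */
static master_info decode_master(int m)
{
  master_info info;
  info.wide_char = (m & 0x80) != 0;
  info.has_left = (m & 0x40) != 0;
  info.has_middle = (m & 0x20) != 0;
  info.has_right = (m & 0x10) != 0;
  info.equiv_code = m & 0xf;
  info.has_char = (m & 0x2f) != 0;
  return info;
}
```

A master byte of \.{0x50}, for instance, describes a node with left and right subtries but no character of its own.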
 
@ The present implementation doesn't support Unicode; characters with
more than 8-bit codes are printed as `\.?'. However, the changes
for 16-bit codes would be quite easy if proper fonts for Unicode output
were available. In that case, |sym_buf| would be an array of wyde characters.
@^Unicode@>
@^system dependencies@>
 
@<Read the character |c|@>=
if (m&0x80) j=read_byte(); /* 16-bit character */
else j=0;
c=read_byte();
if (j) c='?'; /* oops, we can't print |(j<<8)+c| easily at this time */
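@ As a hedged illustration of the remark about 16-bit codes, here is how a wyde character could be assembled from the same bytes. The names |trie_char| and |printable_char| are hypothetical; the sketch reads from a byte array rather than calling |read_byte|.

```c
/* Return the character code stored at a node whose master byte is m;
   p points at the character bytes, and *used receives the number of
   bytes consumed (1 or 2). */
static unsigned trie_char(int m, const unsigned char *p, int *used)
{
  if (m & 0x80) { /* 16-bit character: high byte first */
    *used = 2;
    return ((unsigned)p[0] << 8) + p[1];
  }
  *used = 1;
  return p[0];
}

/* What an 8-bit listing would print for that code. */
static int printable_char(unsigned code)
{
  return code > 0xff ? '?' : (int)code;
}
```

A version with Unicode output would store the full code in a wyde array instead of calling something like |printable_char|.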
 
@ @<Print the current symbol with its equivalent and serial number@>=
{
*sym_ptr='\0';
j=m&0xf;
if (j==15) sprintf(equiv_buf,"$%03d",read_byte());
else if (j<=8) {
strcpy(equiv_buf,"#");
for (;j>0;j--) sprintf(equiv_buf+strlen(equiv_buf),"%02x",read_byte());
if (strcmp(equiv_buf,"#0000")==0) strcpy(equiv_buf,"?"); /* undefined */
}@+else {
strncpy(equiv_buf,"#20000000000000",33-2*j);
equiv_buf[33-2*j]='\0';
for (;j>8;j--) sprintf(equiv_buf+strlen(equiv_buf),"%02x",read_byte());
}
for (j=k=read_byte();; k=read_byte(),j=(j<<7)+k) if (k>=128) break;
/* the serial number is now $j-128$ */
printf(" %s = %s (%d)\n",sym_buf+1,equiv_buf,j-128);
}
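@ The serial number loop above reads a base-128 quantity whose final byte is marked by having its high bit set. A minimal sketch of the same arithmetic on a byte array follows; the name |decode_serial| and its error convention are assumptions made for illustration.

```c
#include <stddef.h>

/* Decode a serial number from p[0..n-1]. Every byte contributes seven
   bits; the last byte is the first one that is 128 or more. Returns the
   serial number, or -1 if no terminating byte was found; *used (if not
   NULL) receives the number of bytes consumed. */
static long decode_serial(const unsigned char *p, size_t n, size_t *used)
{
  long j = 0;
  size_t i;
  for (i = 0; i < n; i++) {
    j = (j << 7) + p[i];
    if (p[i] >= 128) {
      if (used) *used = i + 1;
      return j - 128; /* remove the marker bit of the final byte */
    }
  }
  return -1;
}
```

The single byte \.{81} encodes serial number~1, while the two bytes \.{01}~\.{80} encode serial number~128.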
 
@ @d sym_length_max 1000
 
@<Glob...@>=
int stab_start; /* where the symbol table began */
char sym_buf[sym_length_max];
/* the characters on middle transitions to current node */
char *sym_ptr; /* the character in |sym_buf| following the current prefix */
char equiv_buf[20]; /* equivalent of the current symbol */
 
@ @<Check the |lop_end|@>=
while (byte_count)
if (read_byte()) fprintf(stderr,"Nonzero byte follows the symbol table!\n");
@.Nonzero byte follows...@>
read_tet();
if (buf[0]!=mm || buf[1]!=lop_end)
fprintf(stderr,"The symbol table isn't followed by lop_end!\n");
@.The symbol table isn't...@>
else if (count!=stab_start+yz+1)
fprintf(stderr,"YZ field at lop_end should have been %d!\n",count-stab_start-1);
@:YZ field at lop_end...}\.{YZ field at lop\_end...@>
else {
if (verbose) printf("Symbol table ends at tetra %d.\n",count);
if (fread(buf,1,1,mmo_file))
fprintf(stderr,"Extra bytes follow the lop_end!\n");
@.Extra bytes follow...@>
}
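@ The consistency test on the |lop_end| tetrabyte amounts to a small piece of arithmetic, sketched here under the assumption that |count| is the number of tetrabytes read so far. The function name |expected_yz| and its parameter names are hypothetical.

```c
/* stab_start: the value of count just after the lop_stab tetrabyte was read;
   count_after_end: the value of count just after the lop_end tetrabyte was read.
   Subtracting one for lop_end itself leaves the table length in tetrabytes,
   which is what the YZ field of lop_end is expected to hold. */
static int expected_yz(int stab_start, int count_after_end)
{
  return count_after_end - stab_start - 1;
}
```

If the table began at tetra 100 and the |lop_end| was tetra 108, the YZ field should be~7.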
 
 
@* Index.
/mmix-20081027.tar Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream
mmix-20081027.tar
Property changes :
Added: svn:mime-type
## -0,0 +1 ##
+application/octet-stream
\ No newline at end of property
Index: primesf.mms
===================================================================
--- primesf.mms (nonexistent)
+++ primesf.mms (revision 270)
@@ -0,0 +1,67 @@
+% Example program ... Table of primes (floating point version)
+L IS 500 The number of primes to find
+t IS $255 Temporary storage
+n GREG
+q GREG
+r GREG
+jj GREG
+kk GREG
+pk GREG
+mm IS kk
+
+ LOC Data_Segment
+PRIME1 WYDE 2
+ LOC PRIME1+2*L
+ptop GREG @
+j0 GREG PRIME1+2-@
+BUF OCTA
+
+ LOC #100
+Main SET n,3
+ SET jj,j0
+2H STWU n,ptop,jj
+ INCL jj,2
+3H BZ jj,2F
+4H INCL n,2
+5H SET kk,j0
+fn GREG 0
+sqrtn GREG 0
+ FLOT fn,n
+ FSQRT sqrtn,fn
+6H LDWU pk,ptop,kk
+ FLOT t,pk
+ FREM r,fn,t
+ BZ r,4B
+7H FCMP t,t,sqrtn
+ BNN t,2B
+8H INCL kk,2
+ JMP 6B
+ GREG @
+Title BYTE "First Five Hundred Primes"
+NewLn BYTE #a,0
+Blanks BYTE " ",0
+2H LDA t,Title
+ TRAP 0,Fputs,StdOut
+ NEG mm,2
+3H ADD mm,mm,j0
+ LDA t,Blanks
+ TRAP 0,Fputs,StdOut
+2H LDWU pk,ptop,mm
+0H GREG #2030303030000000
+ STOU 0B,BUF
+ LDA t,BUF+4
+1H DIV pk,pk,10
+ GET r,rR
+ INCL r,'0'
+ STBU r,t,0
+ SUB t,t,1
+ PBNZ pk,1B
+ LDA t,BUF
+ TRAP 0,Fputs,StdOut
+ INCL mm,2*L/10
+ PBN mm,2B
+ LDA t,NewLn
+ TRAP 0,Fputs,StdOut
+ CMP t,mm,2*(L/10-1)
+ PBNZ t,3B
+ TRAP 0,Halt,0
Index: makefile.dos
===================================================================
--- makefile.dos (nonexistent)
+++ makefile.dos (revision 270)
@@ -0,0 +1,117 @@
+#
+# Makefile for MMIXware under DOS
+#
+# Comments to andreas.scherer@pobox.com
+#
+# If you're using nmake, you'll need to save the Unix makefile and
+# rename this file to makefile, as in:
+#
+# ren Makefile Makefile.unix
+# ren Makefile.dos Makefile
+#
+# Then use nmake normally.
+
+# Be sure that CWEB version 3.0 or greater is installed before proceeding!
+# In fact, CWEB 3.6 is recommended for making hardcopy or PDF documentation.
+
+# If you prefer optimization to debugging, change /Zi to something like /GB:
+MAKE = $(MAKE) /$(MAKEFLAGS)
+CFLAGS = /Zi
+
+.SUFFIXES: .dvi .tex .w .ps .pdf
+
+.tex.dvi:
+ tex $*.tex
+
+.tex.pdf:
+ pdftex $*.tex
+
+.dvi.ps:
+ dvips $* -o $*.ps
+
+.w.c:
+ if exist $*.ch ctangle $*.w $*.ch
+ if not exist $*.ch ctangle $*.w
+
+.w.tex:
+ if exist $*.ch cweave $*.w $*.ch
+ if not exist $*.ch cweave $*.w
+
+.w.obj:
+ $(MAKE) $*.c
+ $(MAKE) $*.obj
+
+.w.exe:
+ $(MAKE) $*.c
+ $(MAKE) $*.exe
+
+.w.dvi:
+ $(MAKE) $*.tex
+ $(MAKE) $*.dvi
+
+.w.ps:
+ $(MAKE) $*.dvi
+ $(MAKE) $*.ps
+
+.w.pdf:
+ $(MAKE) $*.tex
+ $(MAKE) $*.pdf
+
+WEBFILES = mmix-def.w mmixal.w "mmix-arith.w" mmix-sim.w mmix-io.w mmix-mem.w \
+ mmotype.w abstime.w mmix-doc.w "mmix-config.w" mmix-pipe.w mmmix.w
+CHANGEFILES =
+TESTFILES = *.mms silly.run silly.out *.mmconfig *.mmix
+MISCFILES = Makefile makefile.dos README mmix.mp mmix.1
+ALL = $(WEBFILES) $(TESTFILES) $(MISCFILES)
+
+basic: mmixal.exe mmix.exe
+
+doc: mmix-doc.ps mmixal.ps mmix-sim.ps abstime.ps
+
+all: mmixal.exe mmix.exe mmotype.exe mmmix.exe
+
+clean:
+ del *~
+ del *.obj
+ del *.c
+ del *.h
+ del *.tex
+ del *.log
+ del *.dvi
+ del *.toc
+ del *.idx
+ del *.scn
+ del *.ps
+ del *.pdf
+ del *.ilk
+ del *.pdb
+
+abstime.exe: abstime.obj
+ $(CC) $(CFLAGS) abstime.obj /Feabstime.exe
+
+"mmix-pipe.obj": "mmix-pipe.c" abstime.exe
+ .\abstime >abstime.h
+ $(CC) $(CFLAGS) -c mmix-pipe.c
+
+mmmix.exe: "mmix-arith.obj" "mmix-pipe.obj" "mmix-config.obj" \
+ "mmix-mem.obj" "mmix-io.obj" mmmix.c
+ $(CC) $(CFLAGS) mmmix.c \
+ "mmix-arith.obj" "mmix-pipe.obj" "mmix-config.obj" "mmix-mem.obj" \
+ "mmix-io.obj" /Femmmix.exe
+
+mmixal.exe: "mmix-arith.obj" mmixal.c
+ $(CC) $(CFLAGS) mmixal.c "mmix-arith.obj" /Femmixal.exe
+
+mmix.exe: "mmix-arith.obj" mmix-io.obj mmix-sim.c abstime.exe
+ .\abstime >abstime.h
+ $(CC) $(CFLAGS) mmix-sim.c \
+ "mmix-arith.obj" mmix-io.obj /Femmix.exe
+
+mmotype.exe: mmotype.obj
+ $(CC) $(CFLAGS) mmotype.obj /Femmotype.exe
+
+tarfile: $(ALL)
+ tar cvf /tmp/mmix.tar $(ALL)
+ gzip -9 /tmp/mmix.tar
+
+
Index: plain.mmconfig
===================================================================
--- plain.mmconfig (nonexistent)
+++ plain.mmconfig (revision 270)
@@ -0,0 +1,42 @@
+% example configuration for basic tests
+memaddresstime 3
+memreadtime 10 memwritetime 10
+membusbytes 16
+branchpredictbits 2
+branchaddressbits 1
+branchhistorybits 1
+branchdualbits 1
+memchunksmax 100
+hashprime 127
+Scache blocksize 64
+Scache setsize 512
+Scache associativity 4 pseudolru
+Scache accesstime 2
+Dcache blocksize 32
+Dcache setsize 256
+Dcache victimsize 2
+Icache blocksize 32
+Icache setsize 256
+Icache victimsize 2
+DTcache associativity 4 lru
+unit ALU1 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
+unit ALU2 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
+unit LSU1 00000000000000000000000000000000fffffffcfffffffc0000000000000000
+unit LSU2 00000000000000000000000000000000fffffffcfffffffc0000000000000000
+unit MUL1 000080f000000000000000000000000000000000000000000000000000000000
+unit DIV1 00000c0f00000000000000000000000000000000000000000000000000000000
+unit FPU1 7fff730000000000000000000000000000000000000000000000000000000000
+memslots 4
+renameregs 10
+reorderbuffer 20
+Dcache writeallocate 1
+Scache writeallocate 1
+Dcache writeback 1
+Scache writeback 1
+Dcache ports 2
+DTcache ports 2
+writebuffer 4
+writeholdingtime 5
+mul0 1
+mul1 2
+mul2 5
Index: hptest.mms
===================================================================
--- hptest.mms (nonexistent)
+++ hptest.mms (revision 270)
@@ -0,0 +1,18 @@
+* Register stack test program by Hans-Peter Nilsson, January 2002
+ LOC #100
+cnt GREG
+max IS 17
+msg BYTE "No bug noticed here",#a,0
+
+Main PUSHJ $16,Recurse
+ GETA $255,msg
+ TRAP 0,Fputs,StdOut
+ TRAP 0,Halt,0
+
+Recurse ADDU cnt,cnt,1
+ CMP $0,cnt,max
+ BZ $0,0F
+ GET $1,rJ
+ PUSHJ $16,Recurse
+ PUT rJ,$1
+0H POP 0,0
Index: permu-langdon.mms
===================================================================
--- permu-langdon.mms (nonexistent)
+++ permu-langdon.mms (revision 270)
@@ -0,0 +1,35 @@
+* Permutation generator a la Langdon
+N IS 6 $n$ (2, 3, ..., 15)
+t IS $255
+k IS $0
+kk IS $1
+c IS $2
+d IS $3
+a GREG 0
+ones GREG #1111111111111111&(1<<(4*N)-1)
+
+ LOC #100
+ GREG @
+ElGordo OCTA #fedcba9876543210&(1<<(4*N)-1)
+Main LDOU a,ElGordo $a\gets\.{\#...3210}$.
+ JMP 2F
+1H SRU a,a,4*(16-N)
+ OR a,a,t
+
+2H ADDU c,a,ones Trace this location to see the perm!
+
+ SRU t,a,4*(N-1)
+ SLU a,a,4*(17-N)
+ PBNZ t,1B
+ SET k,1
+3H SRU d,a,60
+ SLU a,a,4
+ CMP c,d,k
+ SLU kk,k,2
+ SLU d,d,kk
+ OR t,t,d
+ PBNZ c,1B
+ INCL k,1
+ PBNZ a,3B
+ TRAP 0,Halt,0
+
Index: fftswap.mms
===================================================================
--- fftswap.mms (nonexistent)
+++ fftswap.mms (revision 270)
@@ -0,0 +1,67 @@
+% the bit-reversal portion of a 256-point Fast Fourier Transform
+
+t IS $255
+n IS 256
+pi GREG
+pj GREG
+tx GREG
+i GREG
+j GREG
+
+ LOC Data_Segment
+Data GREG @
+% Here follows 256 octabyte pairs for (real,imag) parts of complex data
+% I'm faking it with small integer numbers just for easy testing
+% But it uses long lines, so assemble with "mmixal -b 80"
+ OCTA 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ OCTA 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+ OCTA 32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47
+ OCTA 48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63
+ OCTA 64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
+ OCTA 80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95
+ OCTA 96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111
+ OCTA 112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
+ OCTA 128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143
+ OCTA 144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159
+ OCTA 160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175
+ OCTA 176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191
+ OCTA 192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207
+ OCTA 208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223
+ OCTA 224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239
+ OCTA 240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
+ OCTA 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271
+ OCTA 272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287
+ OCTA 288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303
+ OCTA 304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319
+ OCTA 320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335
+ OCTA 336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351
+ OCTA 352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367
+ OCTA 368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383
+ OCTA 384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399
+ OCTA 400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415
+ OCTA 416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431
+ OCTA 432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447
+ OCTA 448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463
+ OCTA 464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479
+ OCTA 480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495
+ OCTA 496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511
+
+ LOC #100
+Main SET i,n-1
+0H GREG #0102040810204080
+1H MOR j,0B,i
+ CMP t,i,j
+ PBNN t,2F jump if i<=j
+ 16ADDU pi,i,Data pi=&Data[i]
+ 16ADDU pj,j,Data pj=&Data[j]
+ LDO t,pi,0
+ LDO tx,pj,0
+ STO t,pj,0 swap Data[i].real with Data[j].real
+ STO tx,pi,0
+ LDO t,pi,8
+ LDO tx,pj,8
+ STO t,pj,8 swap Data[i].imag with Data[j].imag
+ STO tx,pi,8
+2H SUB i,i,1 i--
+ PBP i,1B repeat until i==0
+ TRAP 0,Halt,0
Index: mmix-sim.w
===================================================================
--- mmix-sim.w (nonexistent)
+++ mmix-sim.w (revision 270)
@@ -0,0 +1,3424 @@
+% This file is part of the MMIXware package (c) Donald E Knuth 1999
+@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
+
+\def\title{MMIX-SIM}
+\def\MMIX{\.{MMIX}}
+\def\NNIX{\hbox{\mc NNIX}}
+\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
+\def\<#1>{\hbox{$\langle\,$#1$\,\rangle$}}\let\is=\longrightarrow
+\def\dts{\mathinner{\ldotp\ldotp}}
+\def\bull{\smallskip\textindent{$\bullet$}}
+@s xor normal @q unreserve a C++ keyword @>
+@s bool normal @q unreserve a C++ keyword @>
+
+@*Introduction. This program simulates a simplified version of the \MMIX\
+computer. Its main goal is to help people create and test \MMIX\ programs for
+{\sl The Art of Computer Programming\/} and related publications. It provides
+only a rudimentary terminal-oriented interface, but it has enough
+infrastructure to support a cool graphical user interface --- which could be
+added by a motivated reader. (Hint, hint.)
+
+\MMIX\ is simplified in the following ways:
+
+\bull
+There is no pipeline, and there are no
+caches. Thus, commands like \.{SYNC} and \.{SYNCD} and \.{PREGO} do nothing.
+
+\bull
+Simulation applies only to user programs, not to an operating system kernel.
+Thus, all addresses must be nonnegative; ``privileged'' commands such as
+\.{PUT}~\.{rK,z} or \.{RESUME}~\.1 or \.{LDVTS}~\.{x,y,z} are not allowed;
+instructions should be executed only from addresses in segment~0
+(addresses less than \Hex{2000000000000000}).
+Certain special registers remain constant: $\rm rF=0$,
+$\rm rK=\Hex{ffffffffffffffff}$,
+$\rm rQ=0$;
+$\rm rT=\Hex{8000000500000000}$,
+$\rm rTT=\Hex{8000000600000000}$,
+$\rm rV=\Hex{369c200400000000}$.
+
+\bull
+No trap interrupts are implemented, except for a few special cases of \.{TRAP}
+that provide rudimentary input-output.
+@^interrupts@>
+
+\bull
+All instructions take a fixed amount of time, given by the rough estimates
+stated in the \MMIX\ documentation.
+For example, \.{MUL} takes $10\upsilon$,
+\.{LDB} takes $\mu+\upsilon\mkern1mu$; all times are expressed in terms of
+$\mu$ and~$\upsilon$, ``mems'' and ``oops.'' The clock register~rC increases by
+@^mems@>
+@^oops@>
+$2^{32}$ for each~$\mu$ and 1~for each~$\upsilon$. But the interval
+counter~rI decreases by~1 for each instruction, and the usage
+counter~rU increases by~1 for each instruction.
+@^rC@>
+@^rI@>
+@^rU@>
+
+@ To run this simulator, assuming \UNIX/ conventions, you say
+`\.{mmix} \<options> \.{progfile} \.{args...}',
+where \.{progfile} is an output of the \.{MMIXAL} assembler,
+\.{args...} is a sequence of optional command-line arguments passed
+to the simulated program, and \<options> is any subset of the following:
+@^command line arguments@>
+
+\bull \.{-t<n>}\quad Trace each instruction the first $n$ times it
+is executed. (The notation \.{<n>} in this option, and in several
+other options and interactive commands below, stands for a decimal integer.)
+
+\bull \.{-e<x>}\quad Trace each instruction that raises an arithmetic
+exception belonging to the given bit pattern. (The notation \.{<x>} in this
+option, and in several other commands below, stands for a hexadecimal integer.)
+The exception bits are DVWIOUZX as they appear in rA, namely
+\Hex{80} for~D (integer divide check), \Hex{40} for~V (integer overflow),
+\dots, \Hex{01} for~X (floating inexact). The option \.{-e} by itself
+is equivalent to \.{-eff}, tracing all eight exceptions.
+
+\bull \.{-r}\quad Trace details of the register stack. This option
+shows all the ``hidden'' loads and stores that occur when octabytes are
+written from the ring of local registers into memory, or read from memory into
+that ring. It also shows the full details of \.{SAVE} and \.{UNSAVE}
+operations.
+
+\bull \.{-l<n>}\quad List the source line corresponding to each traced
+instruction, filling gaps of length $n$ or less.
+For example, if one instruction came from line 10 of the source file
+and the next instruction to be traced came from line 12, line 11 would
+be shown also, provided that $n\ge1$. If \.{<n>} is omitted it is
+assumed to be~3.
+
+\bull \.{-s}\quad Show statistics of running time with each traced instruction.
+
+\bull \.{-P}\quad Show the program profile (that is, the frequency counts
+of each instruction that was executed) when the simulation ends.
+
+\bull \.{-L<n>}\quad List the source lines corresponding to each instruction
+that appears in the program profile, filling gaps of length $n$ or less.
+This option implies \.{-P}. If \.{<n>} is omitted it is assumed to be~3.
+
+\bull \.{-v}\quad Be verbose: \kern-2.5ptTurn on all options.
+(More precisely, the \.{-v} option is
+shorthand for \.{-t9999999999}~\.{-e} \.{-r} \.{-s} \.{-l10}~\.{-L10}.)
+
+\bull \.{-q}\quad Be quiet: Cancel all previously specified options.
+
+\bull \.{-i}\quad Go into interactive mode before starting the simulation.
+
+\bull \.{-I}\quad Go into interactive mode when the simulated program
+halts or pauses for a breakpoint.
+
+\bull \.{-b<n>}\quad Set the buffer size of source lines to $\max(72,n)$.
+
+\bull \.{-c<n>}\quad Set the capacity of the local register ring
+to $\max(256,n)$; this number must be a power of~2.
+
+\bull \.{-f<filename>}\quad Use the named file for standard input to the
+simulated program. This option should be used whenever the simulator
+is not being used interactively, because the simulator will not recognize
+end of file when standard input has been defined in any other way.
+
+\bull \.{-D<filename>}\quad Prepare the named file for use by other
+simulators, instead of actually doing a simulation.
+
+\bull \.{-?}\quad Print the ``\.{Usage}'' message, which summarizes
+the command-line options.
+
+\smallskip\noindent
+The author recommends \.{-t2} \.{-l} \.{-L} for initial offline debugging.
+
+While the program is being simulated, an {\it interrupt\/}
+signal (usually control-C) will cause the simulator to
+@^interrupts@>
+break and go into interactive mode after tracing the current instruction,
+even if \.{-i} and \.{-I} were not specified on the command line.
+
+@ In interactive mode, the user is prompted `\.{mmix>}' and a variety of
+@.mmix>@>
+commands can be typed online. Any command-line option can be given
+in response to such a prompt (including the `\.-' that begins the option),
+and the following operations are also available:
+
+\bull Simply typing \<return> or \.n\<return> to the \.{mmix>} prompt causes
+one \MMIX\ instruction to be executed and traced; then the user is prompted
+again.
+
+\bull \.c continues simulation until the program halts or reaches
+a breakpoint. (Actually the command is `\.c\<return>', but we won't
+bother to mention the \<return> in the following description.)
+
+\bull \.q quits (terminates the simulation), after printing the
+profile (if it was requested) and the final statistics.
+
+\bull \.s prints out the current statistics (the clock times and the
+current instruction location). We have already discussed the \.{-s} option
+on the command line, which
+causes these statistics to be printed automatically;
+but a lot of statistics can fill up a lot of file space, so users may
+prefer to see the statistics only on demand.
+
+\bull \.{l<n><t>}, \.{g<n><t>}, \.{\$<n><t>}, \.{rA<t>}, \.{rB<t>}, \dots,
+\.{rZZ<t>}, and \.{M<x><t>} will show the current value of a local register,
+global register, dynamically numbered register, special register, or memory
+location. Here \.{<t>} specifies the type of value to be displayed;
+if \.{<t>} is `\.!', the value will be given in decimal notation;
+if \.{<t>} is `\..' it will be given in floating point notation;
+if \.{<t>} is `\.\#' it will be given in hexadecimal, and
+if \.{<t>} is `\."' it will be given as a string of eight one-byte
+characters.
+Just typing \.{<t>} by itself will repeat the most recently shown
+value, perhaps in another format; for example, the command `\.{l10\#}'
+will show local register 10 in hexadecimal notation, then the command
+`\.!' will show it in decimal and `\..' will show it as a floating point
+number. If \.{<t>} is empty, the previous type will be repeated;
+the default type is decimal. Register \.{rA} is equivalent to \.{g21},
+according to the numbering used in \.{GET} and \.{PUT} commands.
+
+The `\.{<t>}' in any of these commands can also have the form
+`\.{=<value><t>}', where the value is a decimal or floating point or
+hexadecimal or string constant. (The syntax rules for floating point constants
+appear in {\mc MMIX-ARITH}. A string constant is treated as in the
+\.{BYTE} command of \.{MMIXAL}, but padded at the left with zeros if
+fewer than eight characters are specified.) This assigns a new value
+before displaying it. For example, `\.{l10=.1e3}'
+sets local register 10 equal to 100; `\.{g250="ABCD",\#a}' sets global
+register 250 equal to \Hex{000000414243440a}; `\.{M1000=-Inf}' sets
+M$[\Hex{1000}]_8=\Hex{fff0000000000000}$, the representation of $-\infty$.
+Special registers other than~rI cannot be set to values disallowed by~\.{PUT}.
+Marginal registers cannot be set to nonzero values.
+
+The command `\.{rI=250}' sets the interval counter to 250; this will
+cause a break in simulation after 250 instructions have been executed.
+
+\bull \.{+<n><t>} shows the next $n$ octabytes following the one
+most recently shown, in format \.{<t>}. For example, after `\.{l10\#}'
+a subsequent `\.{+30}' will show \.{l11}, \.{l12}, \dots, \.{l40} in
+hexadecimal notation. After `\.{g200=3}' a subsequent `\.{+30}' will
+set \.{g201}, \.{g202}, \dots, \.{g230} equal to~3, but a subsequent
+`\.{+30!}' would merely display \.{g201} through~\.{g230} in decimal
+notation. Memory addresses will advance by~8 instead of by~1. If \.{<n>}
+is empty, the default value $n=1$ is used.
+
+\bull \.{@@<x>} sets the address of the next tetrabyte to be
+simulated, sort of like a \.{GO} command.
+
+\bull \.{t<x>} says that the instruction in tetrabyte location $x$ should
+always be traced, regardless of its frequency count.
+
+\bull \.{u<x>} undoes the effect of \.{t<x>}.
+
+\bull \.{b[rwx]<x>} sets breakpoints at tetrabyte $x$; here \.{[rwx]}
+stands for any subset of the letters \.r, \.w, and/or~\.x, meaning to
+break when the tetrabyte is read, written, and/or executed. For example,
+`\.{bx1000}' causes a break in the simulation just after the tetrabyte
+in \Hex{1000} is executed; `\.{b1000}' undoes this breakpoint;
+`\.{brwx1000}' causes a break just after any simulated instruction loads,
+stores, or appears in tetrabyte number \Hex{1000}.
+
+\bull \.{T}, \.{D}, \.{P}, \.{S} sets the ``current segment'' to
+\.{Text\_Segment}, \.{Data\_Segment}, \.{Pool\_Segment}, or
+\.{Stack\_Segment}, respectively, namely to \Hex{0}, \Hex{2000000000000000},
+\Hex{4000000000000000}, or \Hex{6000000000000000}. The current segment,
+initially \Hex{0}, is added to all
+memory addresses in \.{M}, \.{@@}, \.{t}, \.{u}, and \.{b} commands.
+@:Text_Segment}\.{Text\_Segment@>
+@:Data_Segment}\.{Data\_Segment@>
+@:Pool_Segment}\.{Pool\_Segment@>
+@:Stack_Segment}\.{Stack\_Segment@>
+
+\bull \.{B} lists all current breakpoints and tracepoints.
+
+\bull \.{i<filename>} reads a sequence of interactive commands from the
+specified file, one command per line, ignoring blank lines. This feature
+can be used to set many breakpoints or to display a number of key
+registers, etc. Included lines that begin with \.\% or \.i are ignored;
+therefore an included file cannot include {\it another\/} file.
+Included lines that begin with a blank space are reproduced in the standard
+output, otherwise ignored.
+
+\bull \.h (help) reminds the user of the available interactive commands.
+
+@* Rudimentary I/O.
+Input and output are provided by the following ten primitive system calls:
+@^I/O@>
+@^input/output@>
+
+\bull \.{Fopen}|(handle,name,mode)|. Here |handle| is a
+one-byte integer, |name| is a string, and |mode| is one of the
+values \.{TextRead}, \.{TextWrite}, \.{BinaryRead}, \.{BinaryWrite},
+\.{BinaryReadWrite}. An \.{Fopen} call associates |handle| with the
+external file called |name| and prepares to do input and/or output
+on that file. It returns 0 if the file was opened successfully; otherwise
+returns the value~$-1$. If |mode| is \.{TextWrite}, \.{BinaryWrite}, or
+\.{BinaryReadWrite},
+any previous contents of the named file are discarded. If |mode| is
+\.{TextRead} or \.{TextWrite}, the file consists of ``lines'' terminated
+by ``newline'' characters, and it is said to be a text file; otherwise
+the file consists of uninterpreted bytes, and it is said to be a binary file.
+@.Fopen@>
+@.TextRead@>
+@.TextWrite@>
+@.BinaryRead@>
+@.BinaryWrite@>
+@.BinaryReadWrite@>
+
+Text files and binary files are essentially equivalent in cases
+where this simulator is hosted by an operating system derived from \UNIX/;
+in such cases files can be written as text and read as binary or vice versa.
+But with other operating systems, text files and binary files often have
+quite different representations, and certain characters with byte
+codes less than~|' '| are forbidden in text. Within any \MMIX\ program,
+the newline character has byte code $\Hex{0a}=10$.
+
+At the beginning of a program three handles have already been opened: The
+``standard input'' file \.{StdIn} (handle~0) has mode \.{TextRead}, the
+``standard output'' file \.{StdOut} (handle~1) has mode \.{TextWrite}, and the
+``standard error'' file \.{StdErr} (handle~2) also has mode \.{TextWrite}.
+@.StdIn@>
+@.StdOut@>
+@.StdErr@>
+When this simulator is being run interactively, lines of standard input
+should be typed following a prompt that says `\.{StdIn>\ }', unless the \.{-f}
+option has been used.
+The standard output and standard error files of the simulated program
+are intermixed with the output of the simulator~itself.
+
+The input/output operations supported by this simulator can perhaps be
+understood most easily with reference to the standard library \.{stdio}
+that comes with the \CEE/ language, because the conventions of~\CEE/
+have been explained in hundreds of books. If we declare an array
+|FILE *file[256]| and set |file[0]=stdin|, |file[1]=stdout|, and
+|file[2]=stderr|, then the simulated system call \.{Fopen}|(handle,name,mode)|
+is essentially equivalent to the \CEE/ expression
+$$\displaylines{
+\hskip5em\hbox{(|file[handle]|?
+ |(file[handle]=freopen(name,mode_string[mode],file[handle]))|:}\hfill\cr
+\hfill\hbox{|(file[handle]=fopen(name,mode_string[mode]))|)? 0: $-1$},%
+ \hskip5em\cr}$$
+if we set |mode_string|[\.{TextRead}]~=~|"r"|,
+|mode_string|[\.{TextWrite}]~=~|"w"|,
+|mode_string|[\.{BinaryRead}]~=~|"rb"|,
+|mode_string|[\.{BinaryWrite}]~=~|"wb"|, and
+|mode_string|[\.{BinaryReadWrite}]~=~|"wb+"|.
+
+\bull \.{Fclose}|(handle)|. If the given file handle has been opened, it is
+closed---no longer associated with any file. Again the result is 0 if
+successful, or $-1$ if the file was already closed or unclosable.
+The \CEE/ equivalent is
+$$\hbox{|fclose(file[handle])? -1: 0|}$$
+with the additional side effect of setting |file[handle]=NULL|.
+
+\bull \.{Fread}|(handle,buffer,size)|.
+The file handle should have been opened with mode \.{TextRead},
+\.{BinaryRead}, or \.{BinaryReadWrite}.
+@.Fread@>
+The next |size| characters are read into \MMIX's memory starting at address
+|buffer|. If an error occurs, the value |-1-size| is returned;
+otherwise, if the end of file does not intervene, 0~is returned;
+otherwise the negative value |n-size| is returned, where |n|~is the number of
+characters successfully read and stored.
+The statement
+$$\hbox{|fread(buffer,1,size,file[handle])-size|}$$
+has the equivalent effect in \CEE/, in the absence of file errors.
+
+\bull \.{Fgets}|(handle,buffer,size)|.
+The file handle should have been opened with mode \.{TextRead},
+\.{BinaryRead}, or \.{BinaryReadWrite}.
+@.Fgets@>
+Characters are read into \MMIX's memory starting at address |buffer|, until
+either |size-1| characters have been read and stored or a newline character has
+been read and stored; the next byte in memory is then set to zero.
+If an error or end of file occurs before reading is complete, the memory
+contents are undefined and the value $-1$ is returned; otherwise
+the number of characters successfully read and stored is returned.
+The equivalent in \CEE/ is
+$$\hbox{|fgets(buffer,size,file[handle])? strlen(buffer): -1|}$$
+if we assume that no null characters were read in; null characters may,
+however, precede a newline, and they are counted just like other characters.
+
+\bull \.{Fgetws}|(handle,buffer,size)|.
+@.Fgetws@>
+This command is the same as \.{Fgets}, except that it applies to wyde
+characters instead of one-byte characters. Up to |size-1| wyde
+characters are read; a wyde newline is $\Hex{000a}$. The \CEE/~version,
+using conventions of the ISO multibyte string extension (MSE), is
+@^MSE@>
+approximately
+$$\hbox{|fgetws(buffer,size,file[handle])? wcslen(buffer): -1|}$$
+where |buffer| now has type |wchar_t*|.
+
+\bull \.{Fwrite}|(handle,buffer,size)|.
+The file handle should have been opened with one of the modes \.{TextWrite},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Fwrite@>
+The next |size| characters are written from \MMIX's memory starting at address
+|buffer|. If no error occurs, 0~is returned;
+otherwise the negative value |n-size| is returned, where |n|~is the number of
+characters successfully written. The statement
+$$\hbox{|fwrite(buffer,1,size,file[handle])-size|}$$
+together with |fflush(file[handle])| has the equivalent effect in \CEE/.
+
+\bull \.{Fputs}|(handle,string)|.
+The file handle should have been opened with mode \.{TextWrite},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Fputs@>
+One-byte characters are written from \MMIX's memory to the file, starting
+at address |string|, up to but not including the first byte equal to~zero.
+The number of bytes written is returned, or $-1$ on error.
+The \CEE/ version is
+$$\hbox{|fputs(string,file[handle])>=0? strlen(string): -1|,}$$
+together with |fflush(file[handle])|.
+
+\bull \.{Fputws}|(handle,string)|.
+The file handle should have been opened with mode \.{TextWrite},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Fputws@>
+Wyde characters are written from \MMIX's memory to the file, starting
+at address |string|, up to but not including the first wyde equal to~zero.
+The number of wydes written is returned, or $-1$ on error.
+The \CEE/+MSE version is
+$$\hbox{|fputws(string,file[handle])>=0? wcslen(string): -1|}$$
+together with |fflush(file[handle])|, where |string| now has type |wchar_t*|.
+
+\bull \.{Fseek}|(handle,offset)|.
+The file handle should have been opened with mode \.{BinaryRead},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Fseek@>
+This operation causes the next input or output operation to begin at
+|offset| bytes from the beginning of the file, if |offset>=0|, or at
+|-offset-1| bytes before the end of the file, if |offset<0|. (For
+example, |offset=0| ``rewinds'' the file to its very beginning;
+|offset=-1| moves forward all the way to the end.) The result is 0
+if successful, or $-1$ if the stated positioning could not be done.
+The \CEE/ version is
+$$\hbox{|fseek(file[handle],@,offset<0? offset+1: offset,@,
+ offset<0? SEEK_END: SEEK_SET)|? $-1$: 0.}$$
+If a file in mode \.{BinaryReadWrite} is used for both reading and writing,
+an \.{Fseek} command must be given when switching from input to output
+or from output to input.
+
+\bull \.{Ftell}|(handle)|.
+The file handle should have been opened with mode \.{BinaryRead}, +\.{BinaryWrite}, or \.{BinaryReadWrite}. +@.Ftell@> +This operation returns the current file position, measured in bytes +from the beginning, or $-1$ if an error has occurred. In this case the +\CEE/ function +$$\hbox{|ftell(file[handle])|}$$ +has exactly the same meaning. + +\smallskip +Although these ten operations are quite primitive, they provide +the necessary functionality for extremely complex input/output behavior. +For example, every function in the \.{stdio} library of \CEE/, +with the exception of the two administrative operations \\{remove} and +\\{rename}, can be implemented as a subroutine in terms of the six basic +operations \.{Fopen}, \.{Fclose}, \.{Fread}, \.{Fwrite}, \.{Fseek}, and +\.{Ftell}. + +Notice that the \MMIX\ function calls are much more consistent than +those in the \CEE/ library. The first argument is always a handle; +the second, if present, is always an address; the third, if present, +is always a size. {\it The result returned is always nonnegative if the +operation was successful, negative if an anomaly arose.} These common +features make the functions reasonably easy to remember. + +@ The ten input/output operations of the previous section are invoked by +\.{TRAP} commands with $\rm X=0$, $\rm Y=\.{Fopen}$ or \.{Fclose} or \dots~or +\.{Ftell}, and $\rm Z=\.{Handle}$. If~there are two arguments, the +second argument is placed in \$255. If there are three arguments, +the address of the second is placed in~\$255; the second argument +is M$[\$255]_8$ and the third argument is M$[\$255+8]_8$. The returned +value will be in \$255 when the system call is finished. (See the +example below.) + +@ The user program starts at symbolic location \.{Main}. At this time +@.Main@> +@:Pool_Segment}\.{Pool\_Segment@> +the global registers are initialized according to the \.{GREG} +statements in the \.{MMIXAL} program, and \$255 is set to the +numeric equivalent of~\.{Main}. 
Local register~\$0 is +initially set to the number of {\it command-line arguments\/}; and +@^command line arguments@> +local register~\$1 points to the first such argument, which +is always a pointer to the program name. Each command-line argument is a +pointer to a string; the last such pointer is M$[\$0\ll3+\$1]_8$, and +M$[\$0\ll3+\$1+8]_8$ is zero. (Register~\$1 will point to an octabyte in +\.{Pool\_Segment}, and the command-line strings will be in that segment +too.) Location M[\.{Pool\_Segment}] will be the address of the first +unused octabyte of the pool segment. + +Registers rA, rB, rD, rE, rF, rH, rI, rJ, rM, rP, rQ, and rR +are initially zero, and $\rm rL=2$. + +A subroutine library loaded with the user program might need to initialize +itself. If an instruction has been loaded into tetrabyte M$[\Hex{90}]_4$, +the simulator actually begins execution at \Hex{90} instead of at~\.{Main}; +in this case \$255 holds the location of~\.{Main}. +@^subroutine library initialization@> +@^initialization of a user program@> +(The routine at \Hex{90} can pass control to \.{Main} without increasing~rL, +if it starts with the slightly tricky sequence +$$\.{PUT rW, \$255;{ } PUT rB, \$255;{ } SETML \$255,\#F700;{ } % PUTI rB,0! + PUT rX,\$255}$$ +and eventually says \.{RESUME}; this \.{RESUME} command will restore +\$255 and~rB. But the user program should {\it not\/} really count on +the fact that rL is initially~2.) + +@ The main program ends when \MMIX\ executes the system +call \.{TRAP}~\.{0}, which is often symbolically written +`\.{TRAP}~\.{0,Halt,0}' to make its intention clear. The contents +of \$255 at that time are considered to be the value ``returned'' +by the main program, as in the |exit| statement of~\CEE/; a nonzero +value indicates an anomalous exit. All open files are closed +@.Halt@> +when the program ends. + +@ Here, for example, is a complete program that copies a text file +to the standard output, given the name of the file to be copied. 
+It includes all necessary error checking. +\vskip-14pt +$$\baselineskip=10pt +\obeyspaces\halign{\qquad\.{#}\hfil\cr +* SAMPLE PROGRAM: COPY A GIVEN FILE TO STANDARD OUTPUT\cr +\noalign{\smallskip} +t IS \$255\cr +argc IS \$0\cr +argv IS \$1\cr +s IS \$2\cr +Buf\_Size IS 1000\cr +{} LOC Data\_Segment\cr +Buffer LOC @@+Buf\_Size\cr +{} GREG @@\cr +Arg0 OCTA 0,TextRead\cr +Arg1 OCTA Buffer,Buf\_Size\cr +\noalign{\smallskip} +{} LOC \#200 main(argc,argv) \{\cr +Main CMP t,argc,2 if (argc==2) goto openit\cr +{} PBZ t,OpenIt\cr +{} GETA t,1F fputs("Usage: ",stderr)\cr +{} TRAP 0,Fputs,StdErr\cr +{} LDOU t,argv,0 fputs(argv[0],stderr)\cr +{} TRAP 0,Fputs,StdErr\cr +{} GETA t,2F fputs(" filename\\n",stderr)\cr +Quit TRAP 0,Fputs,StdErr \cr +{} NEG t,0,1 quit: exit(-1)\cr +{} TRAP 0,Halt,0\cr +1H BYTE "Usage: ",0\cr +{} LOC (@@+3)\&-4 align to tetrabyte\cr +2H BYTE " filename",\#a,0\cr +\noalign{\smallskip} +OpenIt LDOU s,argv,8 openit: s=argv[1]\cr +{} STOU s,Arg0\cr +{} LDA t,Arg0 fopen(argv[1],"r",file[3])\cr +{} TRAP 0,Fopen,3\cr +{} PBNN t,CopyIt if (no error) goto copyit\cr +{} GETA t,1F fputs("Can't open file ",stderr)\cr +{} TRAP 0,Fputs,StdErr\cr +{} SET t,s fputs(argv[1],stderr)\cr +{} TRAP 0,Fputs,StdErr\cr +{} GETA t,2F fputs("!\\n",stderr)\cr +{} JMP Quit goto quit\cr +1H BYTE "Can't open file ",0\cr +{} LOC (@@+3)\&-4 align to tetrabyte\cr +2H BYTE "!",\#a,0\cr +\noalign{\smallskip} +CopyIt LDA t,Arg1 copyit:\cr +{} TRAP 0,Fread,3 items=fread(buffer,1,buf\_size,file[3])\cr +{} BN t,EndIt if (items < buf\_size) goto endit\cr +{} LDA t,Arg1 items=fwrite(buffer,1,buf\_size,stdout)\cr +{} TRAP 0,Fwrite,StdOut\cr +{} PBNN t,CopyIt if (items >= buf\_size) goto copyit\cr +Trouble GETA t,1F trouble: fputs("Trouble w...!",stderr)\cr +{} JMP Quit goto quit\cr +1H BYTE "Trouble writing StdOut!",\#a,0\cr +\noalign{\smallskip} +EndIt INCL t,Buf\_Size\cr +{} BN t,ReadErr if (ferror(file[3])) goto readerr\cr +{} STO t,Arg1+8\cr +{} LDA t,Arg1 
n=fwrite(buffer,1,items,stdout)\cr +{} TRAP 0,Fwrite,StdOut\cr +{} BN t,Trouble if (n < items) goto trouble\cr +{} TRAP 0,Halt,0 exit(0)\cr +ReadErr GETA t,1F readerr: fputs("Trouble r...!",stderr)\cr +{} JMP Quit goto quit \}\cr +1H BYTE "Trouble reading!",\#a,0\cr +}$$ + +@* Basics. To get started, we define a type that provides semantic sugar. + +@= +typedef enum {@!false,@!true}@+@!bool; + +@ This program for the 64-bit \MMIX\ architecture is based on 32-bit integer +arithmetic, because nearly every computer available to the author at the time +of writing (1999) was limited in that way. It uses subroutines +from the {\mc MMIX-ARITH} module, assuming only that type \&{tetra} +represents unsigned 32-bit integers. The definition of \&{tetra} +given here should be changed, if necessary, to agree with the +definition in that module. +@^system dependencies@> + +@= +typedef unsigned int tetra; + /* for systems conforming to the LP-64 data model */ +typedef struct {tetra h,l;} octa; /* two tetrabytes make one octabyte */ +typedef unsigned char byte; /* a monobyte */ + +@ We declare subroutines twice, once with a prototype and once +with the old-style~\CEE/ conventions. The following hack makes +this work with new compilers as well as the old standbys. + +@= +#ifdef __STDC__ +#define ARGS(list) list +#else +#define ARGS(list) () +#endif + +@ @= +void print_hex @,@,@[ARGS((octa))@];@+@t}\6{@> +void print_hex(o) + octa o; +{ + if (o.h) printf("%x%08x",o.h,o.l); + else printf("%x",o.l); +} + +@ Most of the subroutines in {\mc MMIX-ARITH} return an octabyte as +a function of two octabytes; for example, |oplus(y,z)| returns the +sum of octabytes |y| and~|z|. Division inputs the high +half of a dividend in the global variable~|aux| and returns +the remainder in~|aux|. 
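The two-tetrabyte representation of an octabyte can be illustrated by a sketch of unsigned addition with carry. This is not the code of {\mc MMIX-ARITH}; the name |oplus_sketch| is hypothetical, and |tetra| is assumed here to be exactly 32 bits wide (|uint32_t|).

```c
#include <stdint.h>

typedef uint32_t tetra;               /* assumed: exactly 32 bits, unsigned */
typedef struct { tetra h, l; } octa;  /* high and low halves of an octabyte */

/* Unsigned 64-bit sum y+z, computed with 32-bit arithmetic only.
   Hypothetical stand-in for oplus; not the MMIX-ARITH implementation. */
octa oplus_sketch(octa y, octa z)
{
  octa x;
  x.l = y.l + z.l;                /* wraps modulo 2^32 */
  x.h = y.h + z.h + (x.l < y.l);  /* x.l < y.l exactly when the low halves carried */
  return x;
}
```

The comparison |x.l<y.l| detects the carry without needing any type wider than 32 bits, which is the constraint the simulator works under.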
+ +@= +extern octa zero_octa; /* |zero_octa.h=zero_octa.l=0| */ +extern octa neg_one; /* |neg_one.h=neg_one.l=-1| */ +extern octa aux,val; /* auxiliary data */ +extern bool overflow; /* flag set by signed multiplication and division */ +extern int exceptions; /* bits set by floating point operations */ +extern int cur_round; /* the current rounding mode */ +extern char *next_char; /* where a scanned constant ended */ +extern octa oplus @,@,@[ARGS((octa y,octa z))@]; + /* unsigned $y+z$ */ +extern octa ominus @,@,@[ARGS((octa y,octa z))@]; + /* unsigned $y-z$ */ +extern octa incr @,@,@[ARGS((octa y,int delta))@]; + /* unsigned $y+\delta$ ($\delta$ is signed) */ +extern octa oand @,@,@[ARGS((octa y,octa z))@]; + /* $y\land z$ */ +extern octa shift_left @,@,@[ARGS((octa y,int s))@]; + /* $y\LL s$, $0\le s\le64$ */ +extern octa shift_right @,@,@[ARGS((octa y,int s,int uns))@]; + /* $y\GG s$, signed if |!uns| */ +extern octa omult @,@,@[ARGS((octa y,octa z))@]; + /* unsigned $(|aux|,x)=y\times z$ */ +extern octa signed_omult @,@,@[ARGS((octa y,octa z))@]; + /* signed $x=y\times z$ */ +extern octa odiv @,@,@[ARGS((octa x,octa y,octa z))@]; + /* unsigned $(x,y)/z$; $|aux|=(x,y)\bmod z$ */ +extern octa signed_odiv @,@,@[ARGS((octa y,octa z))@]; + /* signed $x=y/z$ */ +extern int count_bits @,@,@[ARGS((tetra z))@]; + /* $x=\nu(z)$ */ +extern tetra byte_diff @,@,@[ARGS((tetra y,tetra z))@]; + /* half of \.{BDIF} */ +extern tetra wyde_diff @,@,@[ARGS((tetra y,tetra z))@]; + /* half of \.{WDIF} */ +extern octa bool_mult @,@,@[ARGS((octa y,octa z,bool xor))@]; + /* \.{MOR} or \.{MXOR} */ +extern octa load_sf @,@,@[ARGS((tetra z))@]; + /* load short float */ +extern tetra store_sf @,@,@[ARGS((octa x))@]; + /* store short float */ +extern octa fplus @,@,@[ARGS((octa y,octa z))@]; + /* floating point $x=y\oplus z$ */ +extern octa fmult @,@,@[ARGS((octa y ,octa z))@]; + /* floating point $x=y\otimes z$ */ +extern octa fdivide @,@,@[ARGS((octa y,octa z))@]; + /* floating point 
$x=y\oslash z$ */ +extern octa froot @,@,@[ARGS((octa,int))@]; + /* floating point $x=\sqrt z$ */ +extern octa fremstep @,@,@[ARGS((octa y,octa z,int delta))@]; + /* floating point $x\,{\rm rem}\,z=y\,{\rm rem}\,z$ */ +extern octa fintegerize @,@,@[ARGS((octa z,int mode))@]; + /* floating point $x={\rm round}(z)$ */ +extern int fcomp @,@,@[ARGS((octa y,octa z))@]; + /* $-1$, 0, 1, or 2 if $y<z$, $y=z$, $y>z$, $y\parallel z$ */ +extern int fepscomp @,@,@[ARGS((octa y,octa z,octa eps,int sim))@]; + /* $x=|sim|?\ [y\sim z\ (\epsilon)]:\ [y\approx z\ (\epsilon)]$ */ +extern octa floatit @,@,@[ARGS((octa z,int mode,int unsgnd,int shrt))@]; + /* fix to float */ +extern octa fixit @,@,@[ARGS((octa z,int mode))@]; + /* float to fix */ +extern void print_float @,@,@[ARGS((octa z))@]; + /* print octabyte as floating decimal */ +extern int scan_const @,@,@[ARGS((char* buf))@]; + /* |val| = floating or integer constant; returns the type */ + +@ Here's a quick check to see if arithmetic is in trouble. + +@d panic(m) {@+fprintf(stderr,"Panic: %s!\n",m);@+exit(-2);@+} +@= +if (shift_left(neg_one,1).h!=0xffffffff) + panic("Incorrect implementation of type tetra"); +@.Incorrect implementation...@> + +@ Binary-to-decimal conversion is used when we want to see an octabyte +as a signed integer. The identity $\lfloor(an+b)/10\rfloor= +\lfloor a/10\rfloor n+\lfloor((a\bmod 10)n+b)/10\rfloor$ is helpful here. 
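As an illustration of the identity with $n=2^{16}$, here is a sketch that divides a 64-bit value, held as two 32-bit halves, by 10 without ever forming a number of more than 32 bits. The names |div10_step| and |u64_to_decimal| are hypothetical, the value is treated as unsigned, and |uint32_t| is assumed to be available.

```c
#include <stdint.h>

/* Divide the unsigned 64-bit value (hi,lo) by 10 in place and return the
   remainder. The identity is applied twice with n = 2^16, so every
   intermediate value stays below 10 * 2^16. */
unsigned div10_step(uint32_t *hi, uint32_t *lo)
{
  uint32_t r = ((*hi % 10) << 16) + (*lo >> 16);   /* carry into the upper half of lo */
  uint32_t t = ((r % 10) << 16) + (*lo & 0xffff);  /* carry into the lower half of lo */
  *hi /= 10;
  *lo = ((r / 10) << 16) + (t / 10);
  return t % 10;
}

/* Write the decimal digits of (hi,lo) into out, which must hold at least
   21 bytes (20 digits plus the terminating zero). Returns out. */
char *u64_to_decimal(uint32_t hi, uint32_t lo, char *out)
{
  char dig[20];
  int j = 0, k = 0;
  do dig[j++] = (char)('0' + div10_step(&hi, &lo));  /* least significant digit first */
  while (hi || lo);
  while (j > 0) out[k++] = dig[--j];                  /* reverse into the output */
  out[k] = '\0';
  return out;
}
```

A caller that wants a signed result could print a minus sign and negate the two halves before calling |u64_to_decimal|.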
+ +@d sign_bit ((unsigned)0x80000000) + +@= +void print_int @,@,@[ARGS((octa))@];@+@t}\6{@> +void print_int(o) + octa o; +{ + register tetra hi=o.h, lo=o.l, r, t; + register int j; + char dig[20]; + if (lo==0 && hi==0) printf("0"); + else { + if (hi&sign_bit) { + printf("-"); + if (lo==0) hi=-hi; + else lo=-lo, hi=~hi; + } + for (j=0;hi;j++) { /* 64-bit division by 10 */ + r=((hi%10)<<16)+(lo>>16); + hi=hi/10; + t=((r%10)<<16)+(lo&0xffff); + lo=((r/10)<<16)+(t/10); + dig[j]=t%10; + } + for (;lo;j++) { + dig[j]=lo%10; + lo=lo/10; + } + for (j--;j>=0;j--) printf("%c",dig[j]+'0'); + } +} + +@* Simulated memory. Chunks of simulated memory, 2048 bytes each, +are kept in a tree structure organized as a {\it treap}, +following ideas of Vuillemin, Aragon, and Seidel +@^Vuillemin, Jean Etienne@> +@^Aragon, Cecilia Rodriguez@> +@^Seidel, Raimund@> +[{\sl Communications of the ACM\/ \bf23} (1980), 229--239; +{\sl IEEE Symp.\ on Foundations of Computer Science\/ \bf30} (1989), 540--546]. +Each node of the treap has two keys: One, called |loc|, is the +base address of 512 simulated tetrabytes; it follows the conventions +of an ordinary binary search tree, with all locations in the left subtree +less than the |loc| of a node and all locations in the right subtree +greater than that~|loc|. The other, called |stamp|, can be thought of as the +time the node was inserted into the tree; all subnodes of a given node +have a larger~|stamp|. By assigning time stamps at random, we maintain +a tree structure that almost always is fairly well balanced. + +Each simulated tetrabyte has an associated frequency count and +source file reference. 
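Since a chunk covers 2048 bytes, or 512 tetrabytes, the low half of an address splits into a chunk base and a tetrabyte index by simple masking. The following sketch shows that arithmetic; the names |chunk_base| and |tetra_index| and the two constants are hypothetical, and the high half of the octabyte address is assumed to be carried along unchanged.

```c
#include <stdint.h>

enum { CHUNK_BYTES = 2048, TETRAS_PER_CHUNK = CHUNK_BYTES / 4 };

/* Low half of the base address of the chunk that contains addr_lo. */
uint32_t chunk_base(uint32_t addr_lo)
{
  return addr_lo & ~(uint32_t)(CHUNK_BYTES - 1);  /* clear the low 11 bits */
}

/* Index, from 0 to 511, of the tetrabyte that contains addr_lo. */
unsigned tetra_index(uint32_t addr_lo)
{
  return (addr_lo & (CHUNK_BYTES - 1)) >> 2;      /* byte offset in chunk, divided by 4 */
}
```

For example, the address with low half |0x12345| lies in the chunk based at |0x12000|, in the tetrabyte of index 209.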
+ +@= +typedef struct { + tetra tet; /* the tetrabyte of simulated memory */ + tetra freq; /* the number of times it was obeyed as an instruction */ + unsigned char bkpt; /* breakpoint information for this tetrabyte */ + unsigned char file_no; /* source file number, if known */ + unsigned short line_no; /* source line number, if known */ +} mem_tetra; +@# +typedef struct mem_node_struct { + octa loc; /* location of the first of 512 simulated tetrabytes */ + tetra stamp; /* time stamp for treap balancing */ + struct mem_node_struct *left, *right; /* pointers to subtrees */ + mem_tetra dat[512]; /* the chunk of simulated tetrabytes */ +} mem_node; + +@ The |stamp| value is actually only pseudorandom, based on the +idea of Fibonacci hashing [see {\sl Sorting and Searching}, Section~6.4]. +This is good enough for our purposes, and it guarantees that +no two stamps will be identical. + +@= +mem_node* new_mem @,@,@[ARGS((void))@];@+@t}\6{@> +mem_node* new_mem() +{ + register mem_node *p; + p=(mem_node*)calloc(1,sizeof(mem_node)); + if (!p) panic("Can't allocate any more memory"); +@.Can't allocate...@> + p->stamp=priority; + priority+=0x9e3779b9; /* $\lfloor2^{32}(\phi-1)\rfloor$ */ + return p; +} + +@ Initially we start with a chunk for the pool segment, since +the simulator will be putting command-line information there before +it runs the program. + +@= +mem_root=new_mem(); +mem_root->loc.h=0x40000000; +last_mem=mem_root; + +@ @= +tetra priority=314159265; /* pseudorandom time stamp counter */ +mem_node *mem_root; /* root of the treap */ +mem_node *last_mem; /* the memory node most recently read or written */ + +@ The |mem_find| routine finds a given tetrabyte in the simulated +memory, inserting a new node into the treap if necessary. 
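The insertion strategy can be tried on a reduced treap whose keys are plain 32-bit integers instead of octabyte locations. This is only a sketch under that assumption; the names |node|, |split|, |treap_insert|, and |treap_ok| are hypothetical, and keys are assumed to be distinct. A caller can generate stamps as the simulator does, starting at 314159265 and adding |0x9e3779b9| each time.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct node {
  uint32_t key, stamp;        /* search-tree key and heap-ordered time stamp */
  struct node *left, *right;
} node;

/* Split tree t by key: smaller keys are hung under *l, larger ones under *r. */
void split(node *t, uint32_t key, node **l, node **r)
{
  while (t) {
    if (key < t->key) { *r = t; r = &t->left; t = *r; }
    else { *l = t; l = &t->right; t = *l; }
  }
  *l = *r = NULL;
}

/* Insert a new key with the given stamp; returns the new node or NULL. */
node *treap_insert(node **root, uint32_t key, uint32_t stamp)
{
  node **q = root, *p, *n;
  /* descend past every node that is older (smaller stamp) than the new one */
  for (p = *root; p && p->stamp < stamp; p = *q)
    q = key < p->key ? &p->left : &p->right;
  n = calloc(1, sizeof *n);
  if (!n) return NULL;
  n->key = key;
  n->stamp = stamp;
  split(p, key, &n->left, &n->right);  /* the displaced subtree becomes n's children */
  *q = n;
  return n;
}

/* Return 1 if t obeys both orderings: lo < key < hi, and no stamp is
   smaller than the stamp of its parent. */
int treap_ok(const node *t, long long lo, long long hi, uint32_t parent_stamp)
{
  if (!t) return 1;
  if (t->key <= lo || t->key >= hi || t->stamp < parent_stamp) return 0;
  return treap_ok(t->left, lo, t->key, t->stamp)
      && treap_ok(t->right, t->key, hi, t->stamp);
}
```

Because the additive constant is odd, the stamp sequence does not repeat until all $2^{32}$ values have been used, which is why the stamps are distinct in practice.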
+ +@= +mem_tetra* mem_find @,@,@[ARGS((octa))@];@+@t}\6{@> +mem_tetra* mem_find(addr) + octa addr; +{ + octa key; + register int offset; + register mem_node *p=last_mem; + key.h=addr.h; + key.l=addr.l&0xfffff800; + offset=addr.l&0x7fc; + if (p->loc.l!=key.l || p->loc.h!=key.h) + @; + return &p->dat[offset>>2]; +} + +@ @= +{@+register mem_node **q; + for (p=mem_root; p; ) { + if (key.l==p->loc.l && key.h==p->loc.h) goto found; + if ((key.l<p->loc.l && key.h<=p->loc.h) || key.h<p->loc.h) p=p->left; + else p=p->right; + } + for (p=mem_root,q=&mem_root; p && p->stamp<priority; p=*q) { + if ((key.l<p->loc.l && key.h<=p->loc.h) || key.h<p->loc.h) q=&p->left; + else q=&p->right; + } + *q=new_mem(); + (*q)->loc=key; + @; + p=*q; +found: last_mem=p; +} + +@ At this point we want to split the binary search tree |p| into two +parts based on the given |key|, forming the left and right subtrees +of the new node~|q|. The effect will be as if |key| had been inserted +before all of |p|'s nodes. + +@= +{ + register mem_node **l=&(*q)->left,**r=&(*q)->right; + while (p) { + if ((key.l<p->loc.l && key.h<=p->loc.h) || key.h<p->loc.h) + *r=p, r=&p->left, p=*r; + else *l=p, l=&p->right, p=*l; + } + *l=*r=NULL; +} + +@* Loading an object file. To get the user's program into memory, +we read in an \MMIX\ object, using modifications of the routines +in the utility program \.{MMOtype}. Complete details of \.{mmo} +format appear in the program for {\mc MMIXAL}; a reader +who hopes to understand this section ought to at least skim +that documentation. +Here we need to define only the basic constants used for interpretation. 
+ +@d mm 0x98 /* the escape code of \.{mmo} format */ +@d lop_quote 0x0 /* the quotation lopcode */ +@d lop_loc 0x1 /* the location lopcode */ +@d lop_skip 0x2 /* the skip lopcode */ +@d lop_fixo 0x3 /* the octabyte-fix lopcode */ +@d lop_fixr 0x4 /* the relative-fix lopcode */ +@d lop_fixrx 0x5 /* extended relative-fix lopcode */ +@d lop_file 0x6 /* the file name lopcode */ +@d lop_line 0x7 /* the file position lopcode */ +@d lop_spec 0x8 /* the special hook lopcode */ +@d lop_pre 0x9 /* the preamble lopcode */ +@d lop_post 0xa /* the postamble lopcode */ +@d lop_stab 0xb /* the symbol table lopcode */ +@d lop_end 0xc /* the end-it-all lopcode */ + +@ We do not load the symbol table. (A more ambitious simulator could +implement \.{MMIXAL}-style expressions for interactive debugging, +but such enhancements are left to the interested reader.) + +@= +mmo_file=fopen(mmo_file_name,"rb"); +if (!mmo_file) { + register char *alt_name=(char*)calloc(strlen(mmo_file_name)+5,sizeof(char)); + if (!alt_name) panic("Can't allocate file name buffer"); +@.Can't allocate...@> + sprintf(alt_name,"%s.mmo",mmo_file_name); + mmo_file=fopen(alt_name,"rb"); + if (!mmo_file) { + fprintf(stderr,"Can't open the object file %s or %s!\n", +@.Can't open...@> + mmo_file_name,alt_name); + exit(-3); + } + free(alt_name); +} +byte_count=0; + +@ @= +FILE *mmo_file; /* the input file */ +int postamble; /* have we encountered |lop_post|? */ +int byte_count; /* index of the next-to-be-read byte */ +byte buf[4]; /* the most recently read bytes */ +int yzbytes; /* the two least significant bytes */ +int delta; /* difference for relative fixup */ +tetra tet; /* |buf| bytes packed big-endianwise */ + +@ The tetrabytes of an \.{mmo} file are stored in +friendly big-endian fashion, but this program is supposed to work also +on computers that are little-endian. Therefore we read four successive bytes +and pack them into a tetrabyte, instead of reading a single tetrabyte. 
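The byte-by-byte packing can be sketched independently of the simulator's globals. The names |pack_be32| and |read_be32| are hypothetical; the casts to |uint32_t| are an addition of this sketch, made so that no shift is performed on a signed |int|.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack four bytes, most significant first, into one 32-bit value.
   The result is the same on big-endian and little-endian hosts. */
uint32_t pack_be32(const unsigned char b[4])
{
  return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
       | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Read one big-endian tetrabyte from f into *out.
   Returns 1 on success, 0 if fewer than four bytes were available. */
int read_be32(FILE *f, uint32_t *out)
{
  unsigned char b[4];
  if (fread(b, 1, 4, f) != 4) return 0;
  *out = pack_be32(b);
  return 1;
}
```

Reading a |uint32_t| directly with |fread| would instead yield a value that depends on the byte order of the host.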
+ +@d mmo_err { + fprintf(stderr,"Bad object file! (Try running MMOtype.)\n"); +@.Bad object file@> + exit(-4); + } + +@= +void read_tet @,@,@[ARGS((void))@];@+@t}\6{@> +void read_tet() +{ + if (fread(buf,1,4,mmo_file)!=4) mmo_err; + yzbytes=(buf[2]<<8)+buf[3]; + tet=(((buf[0]<<8)+buf[1])<<16)+yzbytes; +} + +@ @= +byte read_byte @,@,@[ARGS((void))@];@+@t}\6{@> +byte read_byte() +{ + register byte b; + if (!byte_count) read_tet(); + b=buf[byte_count]; + byte_count=(byte_count+1)&3; + return b; +} + +@ @= +read_tet(); /* read the first tetrabyte of input */ +if (buf[0]!=mm || buf[1]!=lop_pre) mmo_err; +if (ybyte!=1) mmo_err; +if (zbyte==0) obj_time=0xffffffff; +else { + j=zbyte-1; + read_tet();@+ obj_time=tet; /* file creation time */ + for (;j>0;j--) read_tet(); +} + +@ @= +{ + read_tet(); + loop:@+if (buf[0]==mm) switch (buf[1]) { + case lop_quote:@+if (yzbytes!=1) mmo_err; + read_tet();@+break; + @t\4@>@@; + case lop_post: postamble=1; + if (ybyte || zbyte<32) mmo_err; + continue; + default: mmo_err; + } + @; +} + +@ In a normal situation, the newly read tetrabyte is simply supposed +to be loaded into the current location. We load not only the current +location but also the current file position, if |cur_line| is nonzero +and |cur_loc| belongs to segment~0. 
+ +@d mmo_load(loc,val) ll=mem_find(loc), ll->tet^=val + +@= +{ + mmo_load(cur_loc,tet); + if (cur_line) { + ll->file_no=cur_file; + ll->line_no=cur_line; + cur_line++; + } + cur_loc=incr(cur_loc,4);@+ cur_loc.l &=-4; +} + +@ @= +octa cur_loc; /* the current location */ +int cur_file=-1; /* the most recently selected file number */ +int cur_line; /* the current position in |cur_file|, if nonzero */ +octa tmp; /* an octabyte of temporary interest */ +tetra obj_time; /* when the object file was created */ + +@ @= +cur_loc.h=cur_loc.l=0; +cur_file=-1; +cur_line=0; +@; +do @@;@+while (!postamble); +@; +fclose(mmo_file); +cur_line=0; + +@ We have already implemented |lop_quote|, which +falls through to the normal case after reading an extra tetrabyte. +Now let's consider the other lopcodes in turn. + +@d ybyte buf[2] /* the next-to-least significant byte */ +@d zbyte buf[3] /* the least significant byte */ + +@= +case lop_loc:@+if (zbyte==2) { + j=ybyte;@+ read_tet();@+ cur_loc.h=(j<<24)+tet; + }@+else if (zbyte==1) cur_loc.h=ybyte<<24; + else mmo_err; + read_tet();@+ cur_loc.l=tet; + continue; +case lop_skip: cur_loc=incr(cur_loc,yzbytes);@+continue; + +@ Fixups load information out of order, when future references have +been resolved. The current file name and line number are not considered +relevant. + +@= +case lop_fixo:@+if (zbyte==2) { + j=ybyte;@+ read_tet();@+ tmp.h=(j<<24)+tet; + }@+else if (zbyte==1) tmp.h=ybyte<<24; + else mmo_err; + read_tet();@+ tmp.l=tet; + mmo_load(tmp,cur_loc.h); + mmo_load(incr(tmp,4),cur_loc.l); + continue; +case lop_fixr: delta=yzbytes; goto fixr; +case lop_fixrx:j=yzbytes;@+if (j!=16 && j!=24) mmo_err; + read_tet(); delta=tet; + if (delta&0xfe000000) mmo_err; +fixr: tmp=incr(cur_loc,-(delta>=0x1000000? 
(delta&0xffffff)-(1<= +case lop_file:@+if (file_info[ybyte].name) { + if (zbyte) mmo_err; + cur_file=ybyte; + }@+else { + if (!zbyte) mmo_err; + file_info[ybyte].name=(char*)calloc(4*zbyte+1,1); + if (!file_info[ybyte].name) { + fprintf(stderr,"No room to store the file name!\n");@+exit(-5); +@.No room...@> + } + cur_file=ybyte; + for (j=zbyte,p=file_info[ybyte].name; j>0; j--,p+=4) { + read_tet(); + *p=buf[0];@+*(p+1)=buf[1];@+*(p+2)=buf[2];@+*(p+3)=buf[3]; + } + } + cur_line=0;@+continue; +case lop_line:@+if (cur_file<0) mmo_err; + cur_line=yzbytes;@+continue; + +@ Special bytes are ignored (at least for now). + +@= +case lop_spec:@+ while(1) { + read_tet(); + if (buf[0]==mm) { + if (buf[1]!=lop_quote || yzbytes!=1) goto loop; /* end of special data */ + read_tet(); + } + } + +@ Since a chunk of memory holds 512 tetrabytes, the |ll| pointer in the +following loop stays in the same chunk (namely, the first chunk +of segment~3, also known as \.{Stack\_Segment}). +@:Stack_Segment}\.{Stack\_Segment@> +@:Pool_Segment}\.{Pool\_Segment@> + +@= +aux.h=0x60000000;@+ aux.l=0x18; +ll=mem_find(aux); +(ll-1)->tet=2; /* this will ultimately set |rL=2| */ +(ll-5)->tet=argc; /* and $\$0=|argc|$ */ +(ll-4)->tet=0x40000000; +(ll-3)->tet=0x8; /* and $\$1=\.{Pool\_Segment}+8$ */ +G=zbyte;@+ L=0; +for (j=G+G;j<256+256;j++,ll++,aux.l+=4) read_tet(), ll->tet=tet; +inst_ptr.h=(ll-2)->tet, inst_ptr.l=(ll-1)->tet; /* \.{Main} */ +(ll+2*12)->tet=G<<24; +g[255]=incr(aux,12*8); /* we will |UNSAVE| from here, to get going */ + +@* Loading and printing source lines. +The loaded program generally contains cross references to the lines +of symbolic source files, so that the context of each instruction +can be understood. The following sections of this program +make such information available when it is desired. 
+ +Source file data is kept in a \&{file\_node} structure: + +@= +typedef struct { + char *name; /* name of source file */ + int line_count; /* number of lines in the file */ + long *map; /* pointer to map of file positions */ +} file_node; + +@ In partial preparation for the day when source files are in +Unicode, we define a type \&{Char} for the source characters. + +@= +typedef char Char; /* bytes that will become wydes some day */ + +@ @= +file_node file_info[256]; /* data about each source file */ +int buf_size; /* size of buffer for source lines */ +Char *buffer; + +@ As in \.{MMIXAL}, we prefer source lines of length 72 characters or less, +but the user is allowed to increase the limit. (Longer lines will silently +be truncated to the buffer size when the simulator lists them.) + +@= +if (buf_size<72) buf_size=72; +buffer=(Char*)calloc(buf_size+1,sizeof(Char)); +if (!buffer) panic("Can't allocate source line buffer"); +@.Can't allocate...@> + +@ The first time we are called upon to list a line from a given source +file, we make a map of starting locations for each line. Source files +should contain at most 65535 lines. We assume that they contain +no null characters. 
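The idea of a line map can be sketched as follows: record the file position before each line is read, so that any line can later be reached with |fseek|. The name |build_line_map| and the 128-byte buffer are hypothetical, and unlike the simulator's map this sketch numbers lines from~0.

```c
#include <stdio.h>
#include <string.h>

/* Record in map[0..n-1] the starting position of each of the first n lines
   of f (n <= max_lines) and return n. Lines are numbered from 0 here. */
int build_line_map(FILE *f, long *map, int max_lines)
{
  char buf[128];
  size_t len;
  int n = 0;
  while (n < max_lines) {
    long pos = ftell(f);                 /* position before the line is read */
    if (!fgets(buf, sizeof buf, f)) break;
    map[n++] = pos;
    /* a line longer than buf arrives in pieces; skip the remaining pieces */
    while ((len = strlen(buf)) > 0 && buf[len - 1] != '\n')
      if (!fgets(buf, sizeof buf, f)) break;
  }
  return n;
}
```

Line |k| can then be fetched with |fseek(f,map[k],SEEK_SET)| followed by |fgets|.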
+ +@= +void make_map @,@,@[ARGS((void))@];@+@t}\6{@> +void make_map() +{ + long map[65536]; + register int k,l; + register long*p; + @; + for (l=1;l<65536 && !feof(src_file);l++) { + map[l]=ftell(src_file); + loop:@+if (!fgets(buffer,buf_size,src_file)) break; + if (buffer[strlen(buffer)-1]!='\n') goto loop; + } + file_info[cur_file].line_count=l; + file_info[cur_file].map=p=(long*)calloc(l,sizeof(long)); + if (!p) panic("No room for a source-line map"); +@.No room...@> + for (k=1;k + +@= +#include +#include + +@ @= +@^system dependencies@> +{ + struct stat stat_buf; + if (stat(file_info[cur_file].name,&stat_buf)>=0) + if ((tetra)stat_buf.st_mtime > obj_time) + fprintf(stderr, + "Warning: File %s was modified; it may not match the program!\n", +@.File...was modified@> + file_info[cur_file].name); +} + +@ Source lines are listed by the |print_line| routine, preceded by +12 characters containing the line number. If a file error occurs, +nothing is printed---not even an error message; the absence of +listed data is itself a message. + +@= +void print_line @,@,@[ARGS((int))@];@+@t}\6{@> +void print_line(k) + int k; +{ + char buf[11]; + if (k>=file_info[cur_file].line_count) return; + if (fseek(src_file,file_info[cur_file].map[k],SEEK_SET)!=0) return; + if (!fgets(buffer,buf_size,src_file)) return; + sprintf(buf,"%d: ",k); + printf("line %.6s %s",buf,buffer); + if (buffer[strlen(buffer)-1]!='\n') printf("\n"); + line_shown=true; +} + +@ @= +#ifndef SEEK_SET +#define SEEK_SET 0 /* code for setting the file pointer to a given offset */ +#endif + +@ The |show_line| routine is called when we want to output line |cur_line| +of source file number |cur_file|, assuming that |cur_line!=0|. Its job +is primarily to maintain continuity, by opening or reopening the |src_file| +if the source file changes, and by connecting the previously output +lines to the new one. Sometimes no output is necessary, because the +desired line has already been printed. 
+ +@= +void show_line @,@,@[ARGS((void))@];@+@t}\6{@> +void show_line() +{ + register int k; + if (shown_file!=cur_file) @@; + else if (shown_line==cur_line) return; /* already shown */ + if (cur_line>shown_line+gap+1 || cur_line0) + if (cur_line= +FILE *src_file; /* the currently open source file */ +int shown_file=-1; /* index of the most recently listed file */ +int shown_line; /* the line most recently listed in |shown_file| */ +int gap; /* minimum gap between consecutively listed source lines */ +bool line_shown; /* did we list anything recently? */ +bool showing_source; /* are we listing source lines? */ +int profile_gap; /* the |gap| when printing final frequencies */ +bool profile_showing_source; /* |showing_source| within final frequencies */ + +@ @= +{ + if (!src_file) src_file=fopen(file_info[cur_file].name,"r"); + else freopen(file_info[cur_file].name,"r",src_file); + if (!src_file) { + fprintf(stderr,"Warning: I can't open file %s; source listing omitted.\n", +@.I can't open...@> + file_info[cur_file].name); + showing_source=false; + return; + } + printf("\"%s\"\n",file_info[cur_file].name); + shown_file=cur_file; + shown_line=0; + if (!file_info[cur_file].map) make_map(); +} + +@ Here is a simple application of |show_line|. It is a recursive routine that +prints the frequency counts of all instructions that occur in a +given subtree of the simulated memory and that were executed at least once. +The subtree is traversed in symmetric order; therefore the frequencies +appear in increasing order of the instruction locations. + +@= +void print_freqs @,@,@[ARGS((mem_node*))@];@+@t}\6{@> +void print_freqs(p) + mem_node *p; +{ + register int j; + octa cur_loc; + if (p->left) print_freqs(p->left); + for (j=0;j<512;j++) if (p->dat[j].freq) + @loc+4*j|@>; + if (p->right) print_freqs(p->right); +} + +@ An ellipsis (\.{...}) is printed between frequency data for nonconsecutive +instructions, unless source line information intervenes. 
+ +@= +{ + cur_loc=incr(p->loc,4*j); + if (showing_source && p->dat[j].line_no) { + cur_file=p->dat[j].file_no, cur_line=p->dat[j].line_no; + line_shown=false; + show_line(); + if (line_shown) goto loc_implied; + } + if (cur_loc.l!=implied_loc.l || cur_loc.h!=implied_loc.h) + if (profile_started) printf(" 0. ...\n"); + loc_implied: printf("%10d. %08x%08x: %08x (%s)\n", + p->dat[j].freq, cur_loc.h, cur_loc.l, p->dat[j].tet, + info[p->dat[j].tet>>24].name); + implied_loc=incr(cur_loc,4);@+ profile_started=true; +} + +@ @= +octa implied_loc; /* location following the last shown frequency data */ +bool profile_started; /* have we printed at least one frequency count? */ + +@ @= +{ + printf("\nProgram profile:\n"); + shown_file=cur_file=-1;@+ shown_line=cur_line=0; + gap=profile_gap; + showing_source=profile_showing_source; + implied_loc=neg_one; + print_freqs(mem_root); +} + +@* Lists. This simulator needs to deal with 256 different opcodes, +so we might as well enumerate them~now. + +@= +typedef enum{@/ +@!TRAP,@!FCMP,@!FUN,@!FEQL,@!FADD,@!FIX,@!FSUB,@!FIXU,@/ +@!FLOT,@!FLOTI,@!FLOTU,@!FLOTUI,@!SFLOT,@!SFLOTI,@!SFLOTU,@!SFLOTUI,@/ +@!FMUL,@!FCMPE,@!FUNE,@!FEQLE,@!FDIV,@!FSQRT,@!FREM,@!FINT,@/ +@!MUL,@!MULI,@!MULU,@!MULUI,@!DIV,@!DIVI,@!DIVU,@!DIVUI,@/ +@!ADD,@!ADDI,@!ADDU,@!ADDUI,@!SUB,@!SUBI,@!SUBU,@!SUBUI,@/ +@!IIADDU,@!IIADDUI,@!IVADDU,@!IVADDUI,@!VIIIADDU,@!VIIIADDUI,@!XVIADDU,@!XVIADDUI,@/ +@!CMP,@!CMPI,@!CMPU,@!CMPUI,@!NEG,@!NEGI,@!NEGU,@!NEGUI,@/ +@!SL,@!SLI,@!SLU,@!SLUI,@!SR,@!SRI,@!SRU,@!SRUI,@/ +@!BN,@!BNB,@!BZ,@!BZB,@!BP,@!BPB,@!BOD,@!BODB,@/ +@!BNN,@!BNNB,@!BNZ,@!BNZB,@!BNP,@!BNPB,@!BEV,@!BEVB,@/ +@!PBN,@!PBNB,@!PBZ,@!PBZB,@!PBP,@!PBPB,@!PBOD,@!PBODB,@/ +@!PBNN,@!PBNNB,@!PBNZ,@!PBNZB,@!PBNP,@!PBNPB,@!PBEV,@!PBEVB,@/ +@!CSN,@!CSNI,@!CSZ,@!CSZI,@!CSP,@!CSPI,@!CSOD,@!CSODI,@/ +@!CSNN,@!CSNNI,@!CSNZ,@!CSNZI,@!CSNP,@!CSNPI,@!CSEV,@!CSEVI,@/ +@!ZSN,@!ZSNI,@!ZSZ,@!ZSZI,@!ZSP,@!ZSPI,@!ZSOD,@!ZSODI,@/ +@!ZSNN,@!ZSNNI,@!ZSNZ,@!ZSNZI,@!ZSNP,@!ZSNPI,@!ZSEV,@!ZSEVI,@/ 
+@!LDB,@!LDBI,@!LDBU,@!LDBUI,@!LDW,@!LDWI,@!LDWU,@!LDWUI,@/ +@!LDT,@!LDTI,@!LDTU,@!LDTUI,@!LDO,@!LDOI,@!LDOU,@!LDOUI,@/ +@!LDSF,@!LDSFI,@!LDHT,@!LDHTI,@!CSWAP,@!CSWAPI,@!LDUNC,@!LDUNCI,@/ +@!LDVTS,@!LDVTSI,@!PRELD,@!PRELDI,@!PREGO,@!PREGOI,@!GO,@!GOI,@/ +@!STB,@!STBI,@!STBU,@!STBUI,@!STW,@!STWI,@!STWU,@!STWUI,@/ +@!STT,@!STTI,@!STTU,@!STTUI,@!STO,@!STOI,@!STOU,@!STOUI,@/ +@!STSF,@!STSFI,@!STHT,@!STHTI,@!STCO,@!STCOI,@!STUNC,@!STUNCI,@/ +@!SYNCD,@!SYNCDI,@!PREST,@!PRESTI,@!SYNCID,@!SYNCIDI,@!PUSHGO,@!PUSHGOI,@/ +@!OR,@!ORI,@!ORN,@!ORNI,@!NOR,@!NORI,@!XOR,@!XORI,@/ +@!AND,@!ANDI,@!ANDN,@!ANDNI,@!NAND,@!NANDI,@!NXOR,@!NXORI,@/ +@!BDIF,@!BDIFI,@!WDIF,@!WDIFI,@!TDIF,@!TDIFI,@!ODIF,@!ODIFI,@/ +@!MUX,@!MUXI,@!SADD,@!SADDI,@!MOR,@!MORI,@!MXOR,@!MXORI,@/ +@!SETH,@!SETMH,@!SETML,@!SETL,@!INCH,@!INCMH,@!INCML,@!INCL,@/ +@!ORH,@!ORMH,@!ORML,@!ORL,@!ANDNH,@!ANDNMH,@!ANDNML,@!ANDNL,@/ +@!JMP,@!JMPB,@!PUSHJ,@!PUSHJB,@!GETA,@!GETAB,@!PUT,@!PUTI,@/ +@!POP,@!RESUME,@!SAVE,@!UNSAVE,@!SYNC,@!SWYM,@!GET,@!TRIP}@+@!mmix_opcode; + +@ We also need to enumerate the special names for special registers. + +@= +typedef enum{ +@!rB,@!rD,@!rE,@!rH,@!rJ,@!rM,@!rR,@!rBB, + @!rC,@!rN,@!rO,@!rS,@!rI,@!rT,@!rTT,@!rK,@!rQ,@!rU,@!rV,@!rG,@!rL, + @!rA,@!rF,@!rP,@!rW,@!rX,@!rY,@!rZ,@!rWW,@!rXX,@!rYY,@!rZZ} @!special_reg; + +@ @= +char *special_name[32]={"rB","rD","rE","rH","rJ","rM","rR","rBB", + "rC","rN","rO","rS","rI","rT","rTT","rK","rQ","rU","rV","rG","rL", + "rA","rF","rP","rW","rX","rY","rZ","rWW","rXX","rYY","rZZ"}; + +@ Here are the bit codes for arithmetic exceptions. These codes, except +|H_BIT|, are defined also in {\mc MMIX-ARITH}. 
+ +@d X_BIT (1<<8) /* floating inexact */ +@d Z_BIT (1<<9) /* floating division by zero */ +@d U_BIT (1<<10) /* floating underflow */ +@d O_BIT (1<<11) /* floating overflow */ +@d I_BIT (1<<12) /* floating invalid operation */ +@d W_BIT (1<<13) /* float-to-fix overflow */ +@d V_BIT (1<<14) /* integer overflow */ +@d D_BIT (1<<15) /* integer divide check */ +@d H_BIT (1<<16) /* trip */ + +@ The |bkpt| field associated with each tetrabyte of memory has +bits associated with forced tracing and/or +breaking for reading, writing, and/or execution. + +@d trace_bit (1<<3) +@d read_bit (1<<2) +@d write_bit (1<<1) +@d exec_bit (1<<0) + +@ To complete our lists of lists, +we enumerate the rudimentary operating system calls +that are built in to \.{MMIXAL}. + +@d max_sys_call Ftell + +@= +typedef enum{ +@!Halt,@!Fopen,@!Fclose,@!Fread,@!Fgets,@!Fgetws, +@!Fwrite,@!Fputs,@!Fputws,@!Fseek,@!Ftell} @!sys_call; + +@* The main loop. Now let's plunge in to the guts of the simulator, +the master switch that controls most of the action. + +@= +{ + if (resuming) loc=incr(inst_ptr,-4), inst=g[rX].l; + else @; + op=inst>>24;@+xx=(inst>>16)&0xff;@+yy=(inst>>8)&0xff;@+zz=inst&0xff; + f=info[op].flags;@+yz=inst&0xffff; + x=y=z=a=b=zero_octa;@+ exc=0;@+ old_L=L; + if (f&rel_addr_bit) @; + @; + if (f&X_is_dest_bit) @; + w=oplus(y,z); + if (loc.h>=0x20000000) goto privileged_inst; + switch(op) { + @t\4@>@; + } + @; + @; + @; + if (resuming && op!=RESUME) resuming=false; +} + +@ Operands |x| and |a| are usually destinations (results), computed from +the source operands |y|, |z|, and/or~|b|. 
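The field extraction performed at the top of the main loop can be tried in isolation. In this sketch the names |fields| and |split_inst| are hypothetical; the shifts and masks are the ones used for |op|, |xx|, |yy|, |zz|, and |yz|.

```c
#include <stdint.h>

typedef struct { unsigned op, xx, yy, zz, yz; } fields;

/* Split a 32-bit instruction into its opcode and operand fields. */
fields split_inst(uint32_t inst)
{
  fields f;
  f.op = inst >> 24;           /* opcode: the most significant byte */
  f.xx = (inst >> 16) & 0xff;  /* X field */
  f.yy = (inst >> 8) & 0xff;   /* Y field */
  f.zz = inst & 0xff;          /* Z field */
  f.yz = inst & 0xffff;        /* Y and Z taken together as a 16-bit field */
  return f;
}
```

For example, the tetrabyte |0x20010203| splits into opcode |0x20| with X, Y, and Z fields 1, 2, and~3.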
+ +@= +octa w,x,y,z,a,b,ma,mb; /* operands */ +octa *x_ptr; /* destination */ +octa loc; /* location of the current instruction */ +octa inst_ptr; /* location of the next instruction */ +tetra inst; /* the current instruction */ +int old_L; /* value of |L| before the current instruction */ +int exc; /* exceptions raised by the current instruction */ +int tracing_exceptions; /* exception bits that cause tracing */ +int rop; /* ropcode of a resumed instruction */ +int round_mode; /* the style of floating point rounding just used */ +bool resuming; /* are we resuming an interrupted instruction? */ +bool halted; /* did the program come to a halt? */ +bool breakpoint; /* should we pause after the current instruction? */ +bool tracing; /* should we trace the current instruction? */ +bool stack_tracing; /* should we trace details of the register stack? */ +bool interacting; /* are we in interactive mode? */ +bool interact_after_break; /* should we go into interactive mode? */ +bool tripping; /* are we about to go to a trip handler? */ +bool good; /* did the last branch instruction guess correctly? */ +tetra trace_threshold; /* each instruction should be traced this many times */ + +@ @= +register mmix_opcode op; /* operation code of the current instruction */ +register int xx,yy,zz,yz; /* operand fields of the current instruction */ +register tetra f; /* properties of the current |op| */ +register int i,j,k; /* miscellaneous indices */ +register mem_tetra *ll; /* current place in the simulated memory */ +register char *p; /* current place in a string */ + +@ @= +{ + loc=inst_ptr; + ll=mem_find(loc); + inst=ll->tet; + cur_file=ll->file_no; + cur_line=ll->line_no; + ll->freq++; + if (ll->bkpt&exec_bit) breakpoint=true; + tracing=breakpoint||(ll->bkpt&trace_bit)||(ll->freq<=trace_threshold); + inst_ptr=incr(inst_ptr,4); +} + +@ Much of the simulation is table-driven, based on a static data +structure called the \&{op\_info} for each operation code. 
+ +@= +typedef struct { + char *name; /* symbolic name of an opcode */ + unsigned char flags; /* its instruction format */ + unsigned char third_operand; /* its special register input */ + unsigned char mems; /* how many $\mu$ it costs */ + unsigned char oops; /* how many $\upsilon$ it costs */ + char *trace_format; /* how it appears when traced */ +} op_info; + +@ For example, the |flags| field of |info[op]| +tells us how to obtain the operands from the X, Y, and~Z fields +of the current instruction. Each entry records special properties of an +operation code, in binary notation: +\Hex{1}~means Z~is an immediate value, \Hex{2}~means rZ is +a source operand, \Hex{4}~means Y~is an immediate value, \Hex{8}~means rY is a +source operand, \Hex{10}~means rX is a source operand, \Hex{20}~means +rX is a destination, \Hex{40}~means YZ is part of a relative address, +\Hex{80}~means a push or pop or unsave instruction. + +The |trace_format| field will be explained later. + +@d Z_is_immed_bit 0x1 +@d Z_is_source_bit 0x2 +@d Y_is_immed_bit 0x4 +@d Y_is_source_bit 0x8 +@d X_is_source_bit 0x10 +@d X_is_dest_bit 0x20 +@d rel_addr_bit 0x40 +@d push_pop_bit 0x80 + +@= +op_info info[256]={ +@, +@, +@, +@}; + +@ @= +{"TRAP",0x0a,255,0,5,"%r"},@| +{"FCMP",0x2a,0,0,1,"%l = %.y cmp %.z = %x"},@| +{"FUN",0x2a,0,0,1,"%l = [%.y(||)%.z] = %x"},@| +{"FEQL",0x2a,0,0,1,"%l = [%.y(==)%.z] = %x"},@| +{"FADD",0x2a,0,0,4,"%l = %.y %(+%) %.z = %.x"},@| +{"FIX",0x26,0,0,4,"%l = %(fix%) %.z = %x"},@| +{"FSUB",0x2a,0,0,4,"%l = %.y %(-%) %.z = %.x"},@| +{"FIXU",0x26,0,0,4,"%l = %(fix%) %.z = %#x"},@| +{"FLOT",0x26,0,0,4,"%l = %(flot%) %z = %.x"},@| +{"FLOTI",0x25,0,0,4,"%l = %(flot%) %z = %.x"},@| +{"FLOTU",0x26,0,0,4,"%l = %(flot%) %#z = %.x"},@| +{"FLOTUI",0x25,0,0,4,"%l = %(flot%) %z = %.x"},@| +{"SFLOT",0x26,0,0,4,"%l = %(sflot%) %z = %.x"},@| +{"SFLOTI",0x25,0,0,4,"%l = %(sflot%) %z = %.x"},@| +{"SFLOTU",0x26,0,0,4,"%l = %(sflot%) %#z = %.x"},@| +{"SFLOTUI",0x25,0,0,4,"%l = %(sflot%) %z = %.x"},@| 
+{"FMUL",0x2a,0,0,4,"%l = %.y %(*%) %.z = %.x"},@| +{"FCMPE",0x2a,rE,0,4,"%l = %.y cmp %.z (%.b)) = %x"},@| +{"FUNE",0x2a,rE,0,1,"%l = [%.y(||)%.z (%.b)] = %x"},@| +{"FEQLE",0x2a,rE,0,4,"%l = [%.y(==)%.z (%.b)] = %x"},@| +{"FDIV",0x2a,0,0,40,"%l = %.y %(/%) %.z = %.x"},@| +{"FSQRT",0x26,0,0,40,"%l = %(sqrt%) %.z = %.x"},@| +{"FREM",0x2a,0,0,4,"%l = %.y %(rem%) %.z = %.x"},@| +{"FINT",0x26,0,0,4,"%l = %(int%) %.z = %.x"},@| +{"MUL",0x2a,0,0,10,"%l = %y * %z = %x"},@| +{"MULI",0x29,0,0,10,"%l = %y * %z = %x"},@| +{"MULU",0x2a,0,0,10,"%l = %#y * %#z = %#x, rH=%#a"},@| +{"MULUI",0x29,0,0,10,"%l = %#y * %z = %#x, rH=%#a"},@| +{"DIV",0x2a,0,0,60,"%l = %y / %z = %x, rR=%a"},@| +{"DIVI",0x29,0,0,60,"%l = %y / %z = %x, rR=%a"},@| +{"DIVU",0x2a,rD,0,60,"%l = %#b%0y / %#z = %#x, rR=%#a"},@| +{"DIVUI",0x29,rD,0,60,"%l = %#b%0y / %z = %#x, rR=%#a"},@| +{"ADD",0x2a,0,0,1,"%l = %y + %z = %x"},@| +{"ADDI",0x29,0,0,1,"%l = %y + %z = %x"},@| +{"ADDU",0x2a,0,0,1,"%l = %#y + %#z = %#x"},@| +{"ADDUI",0x29,0,0,1,"%l = %#y + %z = %#x"},@| +{"SUB",0x2a,0,0,1,"%l = %y - %z = %x"},@| +{"SUBI",0x29,0,0,1,"%l = %y - %z = %x"},@| +{"SUBU",0x2a,0,0,1,"%l = %#y - %#z = %#x"},@| +{"SUBUI",0x29,0,0,1,"%l = %#y - %z = %#x"},@| +{"2ADDU",0x2a,0,0,1,"%l = %#y <<1+ %#z = %#x"},@| +{"2ADDUI",0x29,0,0,1,"%l = %#y <<1+ %z = %#x"},@| +{"4ADDU",0x2a,0,0,1,"%l = %#y <<2+ %#z = %#x"},@| +{"4ADDUI",0x29,0,0,1,"%l = %#y <<2+ %z = %#x"},@| +{"8ADDU",0x2a,0,0,1,"%l = %#y <<3+ %#z = %#x"},@| +{"8ADDUI",0x29,0,0,1,"%l = %#y <<3+ %z = %#x"},@| +{"16ADDU",0x2a,0,0,1,"%l = %#y <<4+ %#z = %#x"},@| +{"16ADDUI",0x29,0,0,1,"%l = %#y <<4+ %z = %#x"},@| +{"CMP",0x2a,0,0,1,"%l = %y cmp %z = %x"},@| +{"CMPI",0x29,0,0,1,"%l = %y cmp %z = %x"},@| +{"CMPU",0x2a,0,0,1,"%l = %#y cmp %#z = %x"},@| +{"CMPUI",0x29,0,0,1,"%l = %#y cmp %z = %x"},@| +{"NEG",0x26,0,0,1,"%l = %y - %z = %x"},@| +{"NEGI",0x25,0,0,1,"%l = %y - %z = %x"},@| +{"NEGU",0x26,0,0,1,"%l = %y - %#z = %#x"},@| +{"NEGUI",0x25,0,0,1,"%l = %y - %z = %#x"},@| 
+{"SL",0x2a,0,0,1,"%l = %y << %#z = %x"},@| +{"SLI",0x29,0,0,1,"%l = %y << %z = %x"},@| +{"SLU",0x2a,0,0,1,"%l = %#y << %#z = %#x"},@| +{"SLUI",0x29,0,0,1,"%l = %#y << %z = %#x"},@| +{"SR",0x2a,0,0,1,"%l = %y >> %#z = %x"},@| +{"SRI",0x29,0,0,1,"%l = %y >> %z = %x"},@| +{"SRU",0x2a,0,0,1,"%l = %#y >> %#z = %#x"},@| +{"SRUI",0x29,0,0,1,"%l = %#y >> %z = %#x"} + +@ @= +{"BN",0x50,0,0,1,"%b<0? %t%g"},@| +{"BNB",0x50,0,0,1,"%b<0? %t%g"},@| +{"BZ",0x50,0,0,1,"%b==0? %t%g"},@| +{"BZB",0x50,0,0,1,"%b==0? %t%g"},@| +{"BP",0x50,0,0,1,"%b>0? %t%g"},@| +{"BPB",0x50,0,0,1,"%b>0? %t%g"},@| +{"BOD",0x50,0,0,1,"%b odd? %t%g"},@| +{"BODB",0x50,0,0,1,"%b odd? %t%g"},@| +{"BNN",0x50,0,0,1,"%b>=0? %t%g"},@| +{"BNNB",0x50,0,0,1,"%b>=0? %t%g"},@| +{"BNZ",0x50,0,0,1,"%b!=0? %t%g"},@| +{"BNZB",0x50,0,0,1,"%b!=0? %t%g"},@| +{"BNP",0x50,0,0,1,"%b<=0? %t%g"},@| +{"BNPB",0x50,0,0,1,"%b<=0? %t%g"},@| +{"BEV",0x50,0,0,1,"%b even? %t%g"},@| +{"BEVB",0x50,0,0,1,"%b even? %t%g"},@| +{"PBN",0x50,0,0,1,"%b<0? %t%g"},@| +{"PBNB",0x50,0,0,1,"%b<0? %t%g"},@| +{"PBZ",0x50,0,0,1,"%b==0? %t%g"},@| +{"PBZB",0x50,0,0,1,"%b==0? %t%g"},@| +{"PBP",0x50,0,0,1,"%b>0? %t%g"},@| +{"PBPB",0x50,0,0,1,"%b>0? %t%g"},@| +{"PBOD",0x50,0,0,1,"%b odd? %t%g"},@| +{"PBODB",0x50,0,0,1,"%b odd? %t%g"},@| +{"PBNN",0x50,0,0,1,"%b>=0? %t%g"},@| +{"PBNNB",0x50,0,0,1,"%b>=0? %t%g"},@| +{"PBNZ",0x50,0,0,1,"%b!=0? %t%g"},@| +{"PBNZB",0x50,0,0,1,"%b!=0? %t%g"},@| +{"PBNP",0x50,0,0,1,"%b<=0? %t%g"},@| +{"PBNPB",0x50,0,0,1,"%b<=0? %t%g"},@| +{"PBEV",0x50,0,0,1,"%b even? %t%g"},@| +{"PBEVB",0x50,0,0,1,"%b even? %t%g"},@| +{"CSN",0x3a,0,0,1,"%l = %y<0? %z: %b = %x"},@| +{"CSNI",0x39,0,0,1,"%l = %y<0? %z: %b = %x"},@| +{"CSZ",0x3a,0,0,1,"%l = %y==0? %z: %b = %x"},@| +{"CSZI",0x39,0,0,1,"%l = %y==0? %z: %b = %x"},@| +{"CSP",0x3a,0,0,1,"%l = %y>0? %z: %b = %x"},@| +{"CSPI",0x39,0,0,1,"%l = %y>0? %z: %b = %x"},@| +{"CSOD",0x3a,0,0,1,"%l = %y odd? %z: %b = %x"},@| +{"CSODI",0x39,0,0,1,"%l = %y odd? 
%z: %b = %x"},@| +{"CSNN",0x3a,0,0,1,"%l = %y>=0? %z: %b = %x"},@| +{"CSNNI",0x39,0,0,1,"%l = %y>=0? %z: %b = %x"},@| +{"CSNZ",0x3a,0,0,1,"%l = %y!=0? %z: %b = %x"},@| +{"CSNZI",0x39,0,0,1,"%l = %y!=0? %z: %b = %x"},@| +{"CSNP",0x3a,0,0,1,"%l = %y<=0? %z: %b = %x"},@| +{"CSNPI",0x39,0,0,1,"%l = %y<=0? %z: %b = %x"},@| +{"CSEV",0x3a,0,0,1,"%l = %y even? %z: %b = %x"},@| +{"CSEVI",0x39,0,0,1,"%l = %y even? %z: %b = %x"},@| +{"ZSN",0x2a,0,0,1,"%l = %y<0? %z: 0 = %x"},@| +{"ZSNI",0x29,0,0,1,"%l = %y<0? %z: 0 = %x"},@| +{"ZSZ",0x2a,0,0,1,"%l = %y==0? %z: 0 = %x"},@| +{"ZSZI",0x29,0,0,1,"%l = %y==0? %z: 0 = %x"},@| +{"ZSP",0x2a,0,0,1,"%l = %y>0? %z: 0 = %x"},@| +{"ZSPI",0x29,0,0,1,"%l = %y>0? %z: 0 = %x"},@| +{"ZSOD",0x2a,0,0,1,"%l = %y odd? %z: 0 = %x"},@| +{"ZSODI",0x29,0,0,1,"%l = %y odd? %z: 0 = %x"},@| +{"ZSNN",0x2a,0,0,1,"%l = %y>=0? %z: 0 = %x"},@| +{"ZSNNI",0x29,0,0,1,"%l = %y>=0? %z: 0 = %x"},@| +{"ZSNZ",0x2a,0,0,1,"%l = %y!=0? %z: 0 = %x"},@| +{"ZSNZI",0x29,0,0,1,"%l = %y!=0? %z: 0 = %x"},@| +{"ZSNP",0x2a,0,0,1,"%l = %y<=0? %z: 0 = %x"},@| +{"ZSNPI",0x29,0,0,1,"%l = %y<=0? %z: 0 = %x"},@| +{"ZSEV",0x2a,0,0,1,"%l = %y even? %z: 0 = %x"},@| +{"ZSEVI",0x29,0,0,1,"%l = %y even? 
%z: 0 = %x"} + +@ @= +{"LDB",0x2a,0,1,1,"%l = M1[%#y+%#z] = %x"},@| +{"LDBI",0x29,0,1,1,"%l = M1[%#y%?+] = %x"},@| +{"LDBU",0x2a,0,1,1,"%l = M1[%#y+%#z] = %#x"},@| +{"LDBUI",0x29,0,1,1,"%l = M1[%#y%?+] = %#x"},@| +{"LDW",0x2a,0,1,1,"%l = M2[%#y+%#z] = %x"},@| +{"LDWI",0x29,0,1,1,"%l = M2[%#y%?+] = %x"},@| +{"LDWU",0x2a,0,1,1,"%l = M2[%#y+%#z] = %#x"},@| +{"LDWUI",0x29,0,1,1,"%l = M2[%#y%?+] = %#x"},@| +{"LDT",0x2a,0,1,1,"%l = M4[%#y+%#z] = %x"},@| +{"LDTI",0x29,0,1,1,"%l = M4[%#y%?+] = %x"},@| +{"LDTU",0x2a,0,1,1,"%l = M4[%#y+%#z] = %#x"},@| +{"LDTUI",0x29,0,1,1,"%l = M4[%#y%?+] = %#x"},@| +{"LDO",0x2a,0,1,1,"%l = M8[%#y+%#z] = %x"},@| +{"LDOI",0x29,0,1,1,"%l = M8[%#y%?+] = %x"},@| +{"LDOU",0x2a,0,1,1,"%l = M8[%#y+%#z] = %#x"},@| +{"LDOUI",0x29,0,1,1,"%l = M8[%#y%?+] = %#x"},@| +{"LDSF",0x2a,0,1,1,"%l = (M4[%#y+%#z]) = %.x"},@| +{"LDSFI",0x29,0,1,1,"%l = (M4[%#y%?+]) = %.x"},@| +{"LDHT",0x2a,0,1,1,"%l = M4[%#y+%#z]<<32 = %#x"},@| +{"LDHTI",0x29,0,1,1,"%l = M4[%#y%?+]<<32 = %#x"},@| +{"CSWAP",0x3a,0,2,2,"%l = [M8[%#y+%#z]==%a] = %x, %r"},@| +{"CSWAPI",0x39,0,2,2,"%l = [M8[%#y%?+]==%a] = %x, %r"},@| +{"LDUNC",0x2a,0,1,1,"%l = M8[%#y+%#z] = %#x"},@| +{"LDUNCI",0x29,0,1,1,"%l = M8[%#y%?+] = %#x"},@| +{"LDVTS",0x2a,0,0,1,""},@| +{"LDVTSI",0x29,0,0,1,""},@| +{"PRELD",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@| +{"PRELDI",0x09,0,0,1,"[%#y%?+ .. %#x]"},@| +{"PREGO",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@| +{"PREGOI",0x09,0,0,1,"[%#y%?+ .. 
%#x]"},@| +{"GO",0x2a,0,0,3,"%l = %#x, -> %#y+%#z"},@| +{"GOI",0x29,0,0,3,"%l = %#x, -> %#y%?+"},@| +{"STB",0x1a,0,1,1,"M1[%#y+%#z] = %b, M8[%#w]=%#a"},@| +{"STBI",0x19,0,1,1,"M1[%#y%?+] = %b, M8[%#w]=%#a"},@| +{"STBU",0x1a,0,1,1,"M1[%#y+%#z] = %#b, M8[%#w]=%#a"},@| +{"STBUI",0x19,0,1,1,"M1[%#y%?+] = %#b, M8[%#w]=%#a"},@| +{"STW",0x1a,0,1,1,"M2[%#y+%#z] = %b, M8[%#w]=%#a"},@| +{"STWI",0x19,0,1,1,"M2[%#y%?+] = %b, M8[%#w]=%#a"},@| +{"STWU",0x1a,0,1,1,"M2[%#y+%#z] = %#b, M8[%#w]=%#a"},@| +{"STWUI",0x19,0,1,1,"M2[%#y%?+] = %#b, M8[%#w]=%#a"},@| +{"STT",0x1a,0,1,1,"M4[%#y+%#z] = %b, M8[%#w]=%#a"},@| +{"STTI",0x19,0,1,1,"M4[%#y%?+] = %b, M8[%#w]=%#a"},@| +{"STTU",0x1a,0,1,1,"M4[%#y+%#z] = %#b, M8[%#w]=%#a"},@| +{"STTUI",0x19,0,1,1,"M4[%#y%?+] = %#b, M8[%#w]=%#a"},@| +{"STO",0x1a,0,1,1,"M8[%#y+%#z] = %b"},@| +{"STOI",0x19,0,1,1,"M8[%#y%?+] = %b"},@| +{"STOU",0x1a,0,1,1,"M8[%#y+%#z] = %#b"},@| +{"STOUI",0x19,0,1,1,"M8[%#y%?+] = %#b"},@| +{"STSF",0x1a,0,1,1,"%(M4[%#y+%#z]%) = %.b, M8[%#w]=%#a"},@| +{"STSFI",0x19,0,1,1,"%(M4[%#y%?+]%) = %.b, M8[%#w]=%#a"},@| +{"STHT",0x1a,0,1,1,"M4[%#y+%#z] = %#b>>32, M8[%#w]=%#a"},@| +{"STHTI",0x19,0,1,1,"M4[%#y%?+] = %#b>>32, M8[%#w]=%#a"},@| +{"STCO",0x0a,0,1,1,"M8[%#y+%#z] = %b"},@| +{"STCOI",0x09,0,1,1,"M8[%#y%?+] = %b"},@| +{"STUNC",0x1a,0,1,1,"M8[%#y+%#z] = %#b"},@| +{"STUNCI",0x19,0,1,1,"M8[%#y%?+] = %#b"},@| +{"SYNCD",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@| +{"SYNCDI",0x09,0,0,1,"[%#y%?+ .. %#x]"},@| +{"PREST",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@| +{"PRESTI",0x09,0,0,1,"[%#y%?+ .. %#x]"},@| +{"SYNCID",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@| +{"SYNCIDI",0x09,0,0,1,"[%#y%?+ .. 
%#x]"},@| +{"PUSHGO",0xaa,0,0,3,"%lrO=%#b, rL=%a, rJ=%#x, -> %#y+%#z"},@| +{"PUSHGOI",0xa9,0,0,3,"%lrO=%#b, rL=%a, rJ=%#x, -> %#y%?+"} + +@ @= +{"OR",0x2a,0,0,1,"%l = %#y | %#z = %#x"},@| +{"ORI",0x29,0,0,1,"%l = %#y | %z = %#x"},@| +{"ORN",0x2a,0,0,1,"%l = %#y |~ %#z = %#x"},@| +{"ORNI",0x29,0,0,1,"%l = %#y |~ %z = %#x"},@| +{"NOR",0x2a,0,0,1,"%l = %#y ~| %#z = %#x"},@| +{"NORI",0x29,0,0,1,"%l = %#y ~| %z = %#x"},@| +{"XOR",0x2a,0,0,1,"%l = %#y ^ %#z = %#x"},@| +{"XORI",0x29,0,0,1,"%l = %#y ^ %z = %#x"},@| +{"AND",0x2a,0,0,1,"%l = %#y & %#z = %#x"},@| +{"ANDI",0x29,0,0,1,"%l = %#y & %z = %#x"},@| +{"ANDN",0x2a,0,0,1,"%l = %#y \\ %#z = %#x"},@| +{"ANDNI",0x29,0,0,1,"%l = %#y \\ %z = %#x"},@| +{"NAND",0x2a,0,0,1,"%l = %#y ~& %#z = %#x"},@| +{"NANDI",0x29,0,0,1,"%l = %#y ~& %z = %#x"},@| +{"NXOR",0x2a,0,0,1,"%l = %#y ~^ %#z = %#x"},@| +{"NXORI",0x29,0,0,1,"%l = %#y ~^ %z = %#x"},@| +{"BDIF",0x2a,0,0,1,"%l = %#y bdif %#z = %#x"},@| +{"BDIFI",0x29,0,0,1,"%l = %#y bdif %z = %#x"},@| +{"WDIF",0x2a,0,0,1,"%l = %#y wdif %#z = %#x"},@| +{"WDIFI",0x29,0,0,1,"%l = %#y wdif %z = %#x"},@| +{"TDIF",0x2a,0,0,1,"%l = %#y tdif %#z = %#x"},@| +{"TDIFI",0x29,0,0,1,"%l = %#y tdif %z = %#x"},@| +{"ODIF",0x2a,0,0,1,"%l = %#y odif %#z = %#x"},@| +{"ODIFI",0x29,0,0,1,"%l = %#y odif %z = %#x"},@| +{"MUX",0x2a,rM,0,1,"%l = %#b? %#y: %#z = %#x"},@| +{"MUXI",0x29,rM,0,1,"%l = %#b? 
%#y: %z = %#x"},@| +{"SADD",0x2a,0,0,1,"%l = nu(%#y\\%#z) = %x"},@| +{"SADDI",0x29,0,0,1,"%l = nu(%#y%?\\) = %x"},@| +{"MOR",0x2a,0,0,1,"%l = %#y mor %#z = %#x"},@| +{"MORI",0x29,0,0,1,"%l = %#y mor %z = %#x"},@| +{"MXOR",0x2a,0,0,1,"%l = %#y mxor %#z = %#x"},@| +{"MXORI",0x29,0,0,1,"%l = %#y mxor %z = %#x"},@| +{"SETH",0x20,0,0,1,"%l = %#z"},@| +{"SETMH",0x20,0,0,1,"%l = %#z"},@| +{"SETML",0x20,0,0,1,"%l = %#z"},@| +{"SETL",0x20,0,0,1,"%l = %#z"},@| +{"INCH",0x30,0,0,1,"%l = %#y + %#z = %#x"},@| +{"INCMH",0x30,0,0,1,"%l = %#y + %#z = %#x"},@| +{"INCML",0x30,0,0,1,"%l = %#y + %#z = %#x"},@| +{"INCL",0x30,0,0,1,"%l = %#y + %#z = %#x"},@| +{"ORH",0x30,0,0,1,"%l = %#y | %#z = %#x"},@| +{"ORMH",0x30,0,0,1,"%l = %#y | %#z = %#x"},@| +{"ORML",0x30,0,0,1,"%l = %#y | %#z = %#x"},@| +{"ORL",0x30,0,0,1,"%l = %#y | %#z = %#x"},@| +{"ANDNH",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},@| +{"ANDNMH",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},@| +{"ANDNML",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},@| +{"ANDNL",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},@| +{"JMP",0x40,0,0,1,"-> %#z"},@| +{"JMPB",0x40,0,0,1,"-> %#z"},@| +{"PUSHJ",0xe0,0,0,1,"%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},@| +{"PUSHJB",0xe0,0,0,1,"%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},@| +{"GETA",0x60,0,0,1,"%l = %#z"},@| +{"GETAB",0x60,0,0,1,"%l = %#z"},@| +{"PUT",0x02,0,0,1,"%s = %r"},@| +{"PUTI",0x01,0,0,1,"%s = %r"},@| +{"POP",0x80,rJ,0,3,"%lrL=%a, rO=%#b, -> %#y%?+"},@| +{"RESUME",0x00,0,0,5,"{%#b} -> %#z"},@| +{"SAVE",0x20,0,20,1,"%l = %#x"},@| +{"UNSAVE",0x82,0,20,1,"%#z: rG=%x, ..., rL=%a"},@| +{"SYNC",0x01,0,0,1,""},@| +{"SWYM",0x00,0,0,1,""},@| +{"GET",0x20,0,0,1,"%l = %s = %#x"},@| +{"TRIP",0x0a,255,0,5,"rW=%#w, rX=%#x, rY=%#y, rZ=%#z, rB=%#b, g[255]=%#a"} + +@ @= +{ + if ((op&0xfe)==JMP) yz=inst&0xffffff; + if (op&1) yz-=(op==JMPB? 
0x1000000: 0x10000);
  y=inst_ptr;@+ z=incr(loc,yz<<2);
}

@ @<Install operand fields@>=
if (resuming && rop!=RESUME_AGAIN)
  @<Install special operands when resuming an interrupted operation@>@;
else {
  if (f&0x10) @<Set |b| from register X@>;
  if (info[op].third_operand) @<Set |b| from special register@>;
  if (f&0x1) z.l=zz;
  else if (f&0x2) @<Set |z| from register Z@>@;
  else if ((op&0xf0)==SETH) @<Set |z| as an immediate wyde@>;
  if (f&0x4) y.l=yy;
  else if (f&0x8) @<Set |y| from register Y@>;
}

@ There are 256 global registers, |g[0]| through |g[255]|; the
first 32 of them are used for the special registers |rA|, |rB|, etc.
There are |lring_mask+1| local registers, usually 256 but the
user can increase this to a larger power of~2 if desired.

The current values of rL, rG, rO, and rS are kept in separate variables
called |L|, |G|, |O|, and |S| for convenience. (In fact, |O| and |S|
actually hold the values rO/8 and rS/8, modulo |lring_size|.)

@<Set |z| from register Z@>=
{
  if (zz>=G) z=g[zz];
  else if (zz<L) z=l[(O+zz)&lring_mask];
}

@ @<Set |y| from register Y@>=
{
  if (yy>=G) y=g[yy];
  else if (yy<L) y=l[(O+yy)&lring_mask];
}

@ @<Set |b| from register X@>=
{
  if (xx>=G) b=g[xx];
  else if (xx<L) b=l[(O+xx)&lring_mask];
}

@ @<Local...@>=
register int G,L,O; /* accessible copies of key registers */

@ @<Glob...@>=
octa g[256]; /* global registers */
octa *l; /* local registers */
int lring_size; /* the number of local registers (a power of 2) */
int lring_mask; /* one less than |lring_size| */
int S; /* congruent to $\rm rS\GG 3$ modulo |lring_size| */

@ Several of the global registers have constant values, because
of the way \MMIX\ has been simplified in this simulator.

Special register rN has a constant value identifying the time of compilation.
(The macro \.{ABSTIME} is defined externally in the file \.{abstime.h},
which should have just been created by {\mc ABSTIME}\kern.05em;
{\mc ABSTIME} is
a trivial program that computes the value of the standard library function
|time(NULL)|. We assume that this number, which is the number of seconds in
the ``{\mc UNIX} epoch,'' is less than~$2^{32}$. Beware: Our assumption will
fail in February of 2106.)
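The layout of rN can be illustrated with a small sketch. It assumes only
what the surrounding text states: three version bytes share the high
tetrabyte, and the low tetrabyte holds a count of seconds that must stay
below $2^{32}$. The names |pack_version| and |fits_in_tetra| are
hypothetical and do not occur in the simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Pack three version bytes into one tetrabyte, top byte first;
   the lowest byte is left zero. */
uint32_t pack_version(unsigned version, unsigned subversion,
                      unsigned subsubversion)
{
  assert(version < 256 && subversion < 256 && subsubversion < 256);
  return ((uint32_t)version << 24) + ((uint32_t)subversion << 16)
       + ((uint32_t)subsubversion << 8);
}

/* Does a count of seconds since the epoch still fit in a tetrabyte? */
int fits_in_tetra(uint64_t seconds)
{
  return seconds < ((uint64_t)1 << 32);
}
```

With version 1, subversion 0, and subsubversion 1 the packed value is
0x01000100. Since $2^{32}$ seconds is a little more than 136 years, a
clock that starts in 1970 runs out of room early in 2106.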
+@^system dependencies@> + +@d VERSION 1 /* version of the \MMIX\ architecture that we support */ +@d SUBVERSION 0 /* secondary byte of version number */ +@d SUBSUBVERSION 1 /* further qualification to version number */ + +@= +g[rK]=neg_one; +g[rN].h=(VERSION<<24)+(SUBVERSION<<16)+(SUBSUBVERSION<<8); +g[rN].l=ABSTIME; /* see comment and warning above */ +g[rT].h=0x80000005; +g[rTT].h=0x80000006; +g[rV].h=0x369c2004; +if (lring_size<256) lring_size=256; +lring_mask=lring_size-1; +if (lring_size&lring_mask) + panic("The number of local registers must be a power of 2"); +@.The number of local...@> +l=(octa*)calloc(lring_size,sizeof(octa)); +if (!l) panic("No room for the local registers"); +@.No room...@> +cur_round=ROUND_NEAR; + +@ In operations like |INCH|, we want |z| to be the |yz| field, +shifted left 48 bits. We also want |y| to be register~X, which has +previously been placed in |b|; then |INCH| can be simulated as if +it were |ADDU|. + +@= +{ + switch (op&3) { + case 0: z.h=yz<<16;@+break; + case 1: z.h=yz;@+break; + case 2: z.l=yz<<16;@+break; + case 3: z.l=yz;@+break; + } + y=b; +} + +@ @= +b=g[info[op].third_operand]; + +@ @= +if (xx>=G) { + sprintf(lhs,"$%d=g[%d]",xx,xx); + x_ptr=&g[xx]; +}@+else { + while (xx>=L) @; + sprintf(lhs,"$%d=l[%d]",xx,(O+xx)&lring_mask); + x_ptr=&l[(O+xx)&lring_mask]; +} + +@ @= +{ + l[(O+L)&lring_mask]=zero_octa; + L=g[rL].l=L+1; + if (((S-O-L)&lring_mask)==0) stack_store(); +} + +@ The |stack_store| routine advances the ``gamma'' pointer in the +ring of local registers, by storing the oldest local register into memory +location~rS and advancing rS. 
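The ring discipline can be modeled without any of the simulator's memory
machinery. In the sketch below a small array |spill| stands in for the
simulated memory addressed by rS, and |S| counts spilled octabytes without
being reduced modulo the ring size; |ring_model|, |spill_oldest|, and
|reload_newest| are hypothetical names, and breakpoints and tracing are
left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8              /* hypothetical; must be a power of 2 */
#define RING_MASK (RING_SIZE-1)
#define SPILL_MAX 64             /* hypothetical capacity of the stand-in memory */

typedef struct {
  uint64_t ring[RING_SIZE];      /* plays the role of l[] */
  uint64_t spill[SPILL_MAX];     /* plays the role of memory addressed by rS */
  int S;                         /* octabytes spilled so far, like rS/8 */
} ring_model;

/* Store the oldest ring entry to memory and advance S. */
void spill_oldest(ring_model *r)
{
  assert(r->S < SPILL_MAX);
  r->spill[r->S] = r->ring[r->S & RING_MASK];
  r->S++;
}

/* Undo the most recent spill: retreat S, then reload the ring entry. */
void reload_newest(ring_model *r)
{
  assert(r->S > 0);
  r->S--;
  r->ring[r->S & RING_MASK] = r->spill[r->S];
}
```

Spilling two entries, overwriting their ring slots, and reloading twice
brings back the original values in reverse order of storing, which is the
sense in which loading is the inverse of storing.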
+ +@d test_store_bkpt(ll) if ((ll)->bkpt&write_bit) breakpoint=tracing=true + +@= +void stack_store @,@,@[ARGS((void))@];@+@t}\6{@> +void stack_store() +{ + register mem_tetra *ll=mem_find(g[rS]); + register int k=S&lring_mask; + ll->tet=l[k].h;@+test_store_bkpt(ll); + (ll+1)->tet=l[k].l;@+test_store_bkpt(ll+1); + if (stack_tracing) { + tracing=true; + if (cur_line) show_line(); + printf(" M8[#%08x%08x]=l[%d]=#%08x%08x, rS+=8\n", + g[rS].h,g[rS].l,k,l[k].h,l[k].l); + } + g[rS]=incr(g[rS],8), S++; +} + +@ The |stack_load| routine is essentially the inverse of |stack_store|. + +@d test_load_bkpt(ll) if ((ll)->bkpt&read_bit) breakpoint=tracing=true + +@= +void stack_load @,@,@[ARGS((void))@];@+@t}\6{@> +void stack_load() +{ + register mem_tetra *ll; + register int k; + S--, g[rS]=incr(g[rS],-8); + ll=mem_find(g[rS]); + k=S&lring_mask; + l[k].h=ll->tet;@+test_load_bkpt(ll); + l[k].l=(ll+1)->tet;@+test_load_bkpt(ll+1); + if (stack_tracing) { + tracing=true; + if (cur_line) show_line(); + printf(" rS-=8, l[%d]=M8[#%08x%08x]=#%08x%08x\n", + k,g[rS].h,g[rS].l,l[k].h,l[k].l); + } +} + +@* Simulating the instructions. The master switch branches in 256 +directions, one for each \MMIX\ instruction. + +Let's start with |ADD|, since it is somehow the most typical case---not +too easy, and not too hard. The task is to compute |x=y+z|, and to +signal overflow if the sum is out of range. Overflow occurs if and +only if |y| and |z| have the same sign but the sum has a different sign. + +Overflow is one of the eight arithmetic exceptions. We record such +exceptions in a variable called~|exc|, which is set to +zero at the beginning of each cycle and used to update~rA at the end. + +The main control routine has put the input operands into octabytes +|y| and~|z|. It has also made |x_ptr| point to the octabyte where the +result should be placed. 
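The overflow rule just stated can be checked on ordinary 64-bit integers.
This sketch uses |uint64_t| bit patterns in place of the two-tetrabyte
type |octa|; |add_overflows| and |SIGN_BIT64| are hypothetical names, not
identifiers of the simulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIGN_BIT64 0x8000000000000000ULL

/* Compute *x = y + z modulo 2^64 and report signed overflow:
   the operands agree in sign but the sum does not. */
bool add_overflows(uint64_t y, uint64_t z, uint64_t *x)
{
  assert(x != 0);
  *x = y + z; /* unsigned arithmetic wraps around, as the octabyte sum does */
  return ((y ^ z) & SIGN_BIT64) == 0 && ((y ^ *x) & SIGN_BIT64) != 0;
}
```

Adding 1 to the largest positive value wraps to the most negative one and
is reported as overflow, while operands of opposite sign can never
overflow because the first test already fails.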
+ +@= +case ADD: case ADDI: x=w; /* |w=oplus(y,z)| */ + if (((y.h^z.h)&sign_bit)==0 && ((y.h^x.h)&sign_bit)!=0) exc|=V_BIT; +store_x: *x_ptr=x;@+break; + +@ Other cases of signed and unsigned addition and subtraction are, +of course, similar. Overflow occurs in the calculation |x=y-z| if and +only if it occurs in the calculation |y=x+z|. + +@= +case SUB: case SUBI: case NEG: case NEGI: x=ominus(y,z); + if (((x.h^z.h)&sign_bit)==0 && ((x.h^y.h)&sign_bit)!=0) exc|=V_BIT; + goto store_x; +case ADDU: case ADDUI: case INCH: case INCMH: case INCML: case INCL: + x=w;@+goto store_x; +case SUBU: case SUBUI: case NEGU: case NEGUI: x=ominus(y,z);@+goto store_x; +case IIADDU: case IIADDUI: case IVADDU: case IVADDUI: +case VIIIADDU: case VIIIADDUI: case XVIADDU: case XVIADDUI: + x=oplus(shift_left(y,((op&0xf)>>1)-3),z);@+goto store_x; +case SETH: case SETMH: case SETML: case SETL: case GETA: case GETAB: + x=z;@+goto store_x; + +@ Let's get the simple bitwise operations out of the way too. + +@= +case OR: case ORI: case ORH: case ORMH: case ORML: case ORL: + x.h=y.h|z.h;@+ x.l=y.l|z.l;@+ goto store_x; +case ORN: case ORNI: + x.h=y.h|~z.h;@+ x.l=y.l|~z.l;@+ goto store_x; +case NOR: case NORI: + x.h=~(y.h|z.h);@+ x.l=~(y.l|z.l);@+ goto store_x; +case XOR: case XORI: + x.h=y.h^z.h;@+ x.l=y.l^z.l;@+ goto store_x; +case AND: case ANDI: + x.h=y.h&z.h;@+ x.l=y.l&z.l;@+ goto store_x; +case ANDN: case ANDNI: case ANDNH: case ANDNMH: case ANDNML: case ANDNL: + x.h=y.h&~z.h;@+ x.l=y.l&~z.l;@+ goto store_x; +case NAND: case NANDI: + x.h=~(y.h&z.h);@+ x.l=~(y.l&z.l);@+ goto store_x; +case NXOR: case NXORI: + x.h=~(y.h^z.h);@+ x.l=~(y.l^z.l);@+ goto store_x; + +@ The less simple bit manipulations are almost equally simple, +given the subroutines of {\mc MMIX-ARITH}. +The |MUX| operation has three inputs; +in such cases the inputs appear in |y|, |z|, and~|b|. + +@d shift_amt (z.h || z.l>=64? 
64: z.l) + +@= +case SL: case SLI: x=shift_left(y,shift_amt); + a=shift_right(x,shift_amt,0); + if (a.h!=y.h || a.l!=y.l) exc|=V_BIT; + goto store_x; +case SLU: case SLUI: x=shift_left(y,shift_amt);@+goto store_x; +case SR: case SRI: case SRU: case SRUI: + x=shift_right(y,shift_amt,op&0x2);@+goto store_x; +case MUX: case MUXI: + x.h=(y.h&b.h)|(z.h&~b.h);@+ x.l=(y.l&b.l)|(z.l&~b.l); + goto store_x; +case SADD: case SADDI: + x.l=count_bits(y.h&~z.h)+count_bits(y.l&~z.l);@+goto store_x; +case MOR: case MORI: + x=bool_mult(y,z,false);@+goto store_x; +case MXOR: case MXORI: + x=bool_mult(y,z,true);@+goto store_x; +case BDIF: case BDIFI: + x.h=byte_diff(y.h,z.h);@+x.l=byte_diff(y.l,z.l);@+goto store_x; +case WDIF: case WDIFI: + x.h=wyde_diff(y.h,z.h);@+x.l=wyde_diff(y.l,z.l);@+goto store_x; +case TDIF: case TDIFI:@+ + if (y.h>z.h) x.h=y.h-z.h; +tdif_l:@+ if (y.l>z.l) x.l=y.l-z.l;@+ goto store_x; +case ODIF: case ODIFI:@+if (y.h>z.h) x=ominus(y,z); + else if (y.h==z.h) goto tdif_l; + goto store_x; + +@ When an operation has two outputs, the primary output is placed in~|x| +and the auxiliary output is placed in~|a|. + +@= +case MUL: case MULI: x=signed_omult(y,z); +test_overflow:@+if (overflow) exc|=V_BIT; + goto store_x; +case MULU: case MULUI: x=omult(y,z);@+a=g[rH]=aux;@+goto store_x; +case DIV: case DIVI:@+if (!z.l && !z.h) aux=y, exc|=D_BIT, overflow=false; + else x=signed_odiv(y,z); + a=g[rR]=aux;@+goto test_overflow; +case DIVU: case DIVUI: x=odiv(b,y,z);@+a=g[rR]=aux;@+goto store_x; + +@ The floating point routines of {\mc MMIX-ARITH} record exceptional +events in a variable called |exceptions|. Here we simply merge those bits into +the |exc| variable. The |U_BIT| is not exactly the +same as ``underflow,'' but the true definition of underflow will be applied +when |exc| is combined with~rA. 
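One way to picture the deferred treatment of |U_BIT| is sketched below. It
rests on an assumption that is not spelled out in this part of the program:
that a tiny result counts as a true underflow only when it is also inexact
or when the underflow bit is enabled in rA. The function name
|effective_exceptions| is hypothetical; the bit values are the ones defined
earlier for |X_BIT| and |U_BIT|.

```c
#include <assert.h>

#define X_BIT (1<<8)   /* floating inexact */
#define U_BIT (1<<10)  /* floating underflow */

/* Drop U_BIT from exc when the result was exact and underflow is not
   enabled in rA; this is the assumed rule, stated in the lead-in. */
unsigned effective_exceptions(unsigned exc, unsigned rA)
{
  assert((exc & ~0x1ff00u) == 0); /* only bits 8 through 16 are defined */
  if ((exc & (U_BIT | X_BIT)) == U_BIT && !(rA & U_BIT))
    exc &= ~(unsigned)U_BIT;
  return exc;
}
```

Under that assumption an exact tiny result with underflow disabled reports
nothing, whereas the same result with the enable bit set keeps |U_BIT|.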
+ +@= +case FADD: x=fplus(y,z); + fin_float: round_mode=cur_round; + store_fx: exc|=exceptions;@+ goto store_x; +case FSUB: a=z;@+if (fcomp(a,zero_octa)!=2) a.h^=sign_bit; + x=fplus(y,a);@+goto fin_float; +case FMUL: x=fmult(y,z);@+goto fin_float; +case FDIV: x=fdivide(y,z);@+goto fin_float; +case FREM: x=fremstep(y,z,2500);@+goto fin_float; +case FSQRT: x=froot(z,y.l); + fin_unifloat:@+if (y.h || y.l>4) goto illegal_inst; + round_mode=(y.l? y.l: cur_round);@+goto store_fx; +case FINT: x=fintegerize(z,y.l);@+goto fin_unifloat; +case FIX: x=fixit(z,y.l);@+goto fin_unifloat; +case FIXU: x=fixit(z,y.l);@+exceptions&=~W_BIT;@+goto fin_unifloat; +case FLOT: case FLOTI: case FLOTU: case FLOTUI: +case SFLOT: case SFLOTI: case SFLOTU: case SFLOTUI: + x=floatit(z,y.l,op&0x2,op&0x4);@+goto fin_unifloat; + +@ We have now done all of the arithmetic operations except for the +cases that compare two registers and yield a value of $-1$~or~0~or~1. + +@d cmp_zero store_x /* |x| is 0 by default */ + +@= +case CMP: case CMPI:@+if ((y.h&sign_bit)>(z.h&sign_bit)) goto cmp_neg; + if ((y.h&sign_bit)<(z.h&sign_bit)) goto cmp_pos; +case CMPU: case CMPUI:@+if (y.hz.h) goto cmp_pos; + if (y.l= +int register_truth @,@,@[ARGS((octa,mmix_opcode))@];@+@t}\6{@> +int register_truth(o,op) + octa o; + mmix_opcode op; +{@+register int b; + switch ((op>>1) & 0x3) { + case 0: b=o.h>>31;@+break; /* negative? */ + case 1: b=(o.h==0 && o.l==0);@+break; /* zero? */ + case 2: b=(o.h= +case CSN: case CSNI: case CSZ: case CSZI:@/ +case CSP: case CSPI: case CSOD: case CSODI:@/ +case CSNN: case CSNNI: case CSNZ: case CSNZI:@/ +case CSNP: case CSNPI: case CSEV: case CSEVI:@/ +case ZSN: case ZSNI: case ZSZ: case ZSZI:@/ +case ZSP: case ZSPI: case ZSOD: case ZSODI:@/ +case ZSNN: case ZSNNI: case ZSNZ: case ZSNZI:@/ +case ZSNP: case ZSNPI: case ZSEV: case ZSEVI:@/ + x=register_truth(y,op)? z: b;@+goto store_x; + +@ Didn't that feel good, when 32 opcodes reduced to a single case? +We get to do it one more time. 
Happiness! + +@= +case BN: case BNB: case BZ: case BZB:@/ +case BP: case BPB: case BOD: case BODB:@/ +case BNN: case BNNB: case BNZ: case BNZB:@/ +case BNP: case BNPB: case BEV: case BEVB:@/ +case PBN: case PBNB: case PBZ: case PBZB:@/ +case PBP: case PBPB: case PBOD: case PBODB:@/ +case PBNN: case PBNNB: case PBNZ: case PBNZB:@/ +case PBNP: case PBNPB: case PBEV: case PBEVB:@/ + x.l=register_truth(b,op); + if (x.l) { + inst_ptr=z; + good=(op>=PBN); + }@+else good=(op= +case LDB: case LDBI: case LDBU: case LDBUI:@/ + i=56;@+j=(w.l&0x3)<<3; goto fin_ld; +case LDW: case LDWI: case LDWU: case LDWUI:@/ + i=48;@+j=(w.l&0x2)<<3; goto fin_ld; +case LDT: case LDTI: case LDTU: case LDTUI:@/ + i=32;@+j=0;@+ goto fin_ld; +case LDHT: case LDHTI: i=j=0; +fin_ld: ll=mem_find(w);@+test_load_bkpt(ll); + x.h=ll->tet; + x=shift_right(shift_left(x,j),i,op&0x2); +check_ld:@+if (w.h&sign_bit) goto privileged_inst; + goto store_x; +case LDO: case LDOI: case LDOU: case LDOUI: case LDUNC: case LDUNCI: + w.l&=-8;@+ ll=mem_find(w); + test_load_bkpt(ll);@+test_load_bkpt(ll+1); + x.h=ll->tet;@+ x.l=(ll+1)->tet; + goto check_ld; +case LDSF: case LDSFI: ll=mem_find(w);@+test_load_bkpt(ll); + x=load_sf(ll->tet);@+ goto check_ld; + +@ @= +case STB: case STBI: case STBU: case STBUI:@/ + i=56;@+j=(w.l&0x3)<<3; goto fin_pst; +case STW: case STWI: case STWU: case STWUI:@/ + i=48;@+j=(w.l&0x2)<<3; goto fin_pst; +case STT: case STTI: case STTU: case STTUI:@/ + i=32;@+j=0; +fin_pst: ll=mem_find(w); + if ((op&0x2)==0) { + a=shift_right(shift_left(b,i),i,0); + if (a.h!=b.h || a.l!=b.l) exc|=V_BIT; + } + ll->tet^=(ll->tet^(b.l<<(i-32-j))) & ((((tetra)-1)<<(i-32))>>j); + goto fin_st; +case STSF: case STSFI: ll=mem_find(w); + ll->tet=store_sf(b);@+exc=exceptions; + goto fin_st; +case STHT: case STHTI: ll=mem_find(w);@+ ll->tet=b.h; +fin_st: test_store_bkpt(ll); + w.l&=-8;@+ll=mem_find(w); + a.h=ll->tet;@+ a.l=(ll+1)->tet; /* for trace output */ + goto check_st; +case STCO: case STCOI: b.l=xx; +case STO: case 
STOI: case STOU: case STOUI: case STUNC: case STUNCI: + w.l&=-8;@+ll=mem_find(w); + test_store_bkpt(ll);@+ test_store_bkpt(ll+1); + ll->tet=b.h;@+ (ll+1)->tet=b.l; +check_st:@+if (w.h&sign_bit) goto privileged_inst; + break; + +@ The |CSWAP| operation has elements of both loading and storing. +We shuffle some of +the operands around so that they will appear correctly in the trace output. + +@= +case CSWAP: case CSWAPI: w.l&=-8;@+ll=mem_find(w); + test_load_bkpt(ll);@+test_load_bkpt(ll+1); + a=g[rP]; + if (ll->tet==a.h && (ll+1)->tet==a.l) { + x.h=0, x.l=1; + test_store_bkpt(ll);@+test_store_bkpt(ll+1); + ll->tet=b.h, (ll+1)->tet=b.l; + strcpy(rhs,"M8[%#w]=%#b"); + }@+else { + b.h=ll->tet, b.l=(ll+1)->tet; + g[rP]=b; + strcpy(rhs,"rP=%#b"); + } + goto check_ld; + +@ The |GET| command is permissive, but |PUT| is restrictive. + +@= +case GET:@+if (yy!=0 || zz>=32) goto illegal_inst; + x=g[zz]; + goto store_x; +case PUT: case PUTI:@+ if (yy!=0 || xx>=32) goto illegal_inst; + strcpy(rhs,"%z = %#z"); + if (xx>=8) { + if (xx<=11) goto illegal_inst; /* can't change rC, rN, rO, rS */ + if (xx<=18) goto privileged_inst; + if (xx==rA) @@; + else if (xx==rL) @@; + else if (xx==rG) @; + } + g[xx]=z;@+zz=xx;@+break; + +@ @= +{ + x=z;@+ strcpy(rhs,z.h? "min(rL,%#x) = %z": "min(rL,%x) = %z"); + if (z.l>L || z.h) z.h=0, z.l=L; + else old_L=L=z.l; +} + +@ @= +{ + if (z.h!=0 || z.l>255 || z.l= +{ + if (z.h!=0 || z.l>=0x40000) goto illegal_inst; + cur_round=(z.l>=0x10000? z.l>>16: ROUND_NEAR); +} + +@ Pushing and popping are rather delicate, because we want to trace +them coherently. 
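The bookkeeping behind a push and its matching pop can be followed on a
reduced model. The sketch keeps only the ring |l|, the offset |O|, and the
count |L|; it assumes that the pushed register number is below |L| and
leaves out global registers, rJ, rO, spilling through |stack_store| and
|stack_load|, and the limit imposed by rG. The names |frame_model|,
|push_frame|, and |pop_frame| are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 16             /* hypothetical; must be a power of 2 */
#define RING_MASK (RING_SIZE-1)

typedef struct {
  uint64_t l[RING_SIZE];  /* ring of local registers */
  int O;                  /* ring index of the current $0 */
  int L;                  /* number of local registers in use */
} frame_model;

/* Push: leave a hole at $xx that records xx, then slide the window past it. */
void push_frame(frame_model *f, int xx)
{
  assert(0 <= xx && xx < f->L);
  f->l[(f->O + xx) & RING_MASK] = (uint64_t)xx;
  f->L -= xx + 1;
  f->O += xx + 1;
}

/* Pop, returning xx values; the main one lands in the caller's hole. */
void pop_frame(frame_model *f, int xx)
{
  uint64_t y = 0;
  int k;
  assert(f->O > 0 && xx >= 0);
  if (xx != 0 && xx <= f->L) y = f->l[(f->O + xx - 1) & RING_MASK];
  k = (int)(f->l[(f->O - 1) & RING_MASK] & 0xff); /* amount that was pushed */
  f->L = k + (xx <= f->L ? xx : f->L + 1);
  if (f->L > k) f->l[(f->O - 1) & RING_MASK] = y;
  f->O -= k + 1;
}
```

Starting with three locals 10, 20, 30, a push of register 1 leaves the
callee with a single local (the old value 30); if the callee stores 99
there and pops one value, the caller ends with locals 10 and 99 and a
count of 2.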
+ +@= +case PUSHGO: case PUSHGOI: inst_ptr=w;@+goto push; +case PUSHJ: case PUSHJB: inst_ptr=z; +push:@+if (xx>=G) { + xx=L++; + if (((S-O-L)&lring_mask)==0) stack_store(); + } + x.l=xx;@+l[(O+xx)&lring_mask]=x; /* the ``hole'' records the amount pushed */ + sprintf(lhs,"l[%d]=%d, ",(O+xx)&lring_mask,xx); + x=g[rJ]=incr(loc,4); + L-=xx+1;@+ O+=xx+1; + b=g[rO]=incr(g[rO],(xx+1)<<3); +sync_L: a.l=g[rL].l=L;@+break; +case POP:@+if (xx!=0 && xx<=L) y=l[(O+xx-1)&lring_mask]; + if (g[rS].l==g[rO].l) stack_load(); + k=l[(O-1)&lring_mask].l&0xff; + while ((tetra)(O-S)<=(tetra)k) stack_load(); + L=k+(xx<=L? xx: L+1); + if (L>G) L=G; + if (L>k) { + l[(O-1)&lring_mask]=y; + if (y.h) sprintf(lhs,"l[%d]=#%x%08x, ",(O-1)&lring_mask,y.h,y.l); + else sprintf(lhs,"l[%d]=#%x, ",(O-1)&lring_mask,y.l); + }@+else lhs[0]='\0'; + y=g[rJ];@+ z.l=yz<<2;@+ inst_ptr=oplus(y,z); + O-=k+1;@+ b=g[rO]=incr(g[rO],-((k+1)<<3)); + goto sync_L; + +@ To complete our simulation of \MMIX's register stack, we need +to implement |SAVE| and |UNSAVE|. + +@= +case SAVE:@+if (xx; + if (k==255) k=rB; + else if (k==rR) k=rP; + else if (k==rZ+1) break; + else k++; + } + O=S, g[rO]=g[rS]; + x=incr(g[rO],-8);@+goto store_x; + +@ This part of the program naturally has a lot in common with the +|stack_store| subroutine. (There's a little white lie in the +section name; if |k|~is |rZ+1|, we store rG and~rA, not |g[k]|.) + +@= +ll=mem_find(g[rS]); +if (k==rZ+1) x.h=G<<24, x.l=g[rA].l; +else x=g[k]; +ll->tet=x.h;@+test_store_bkpt(ll); +(ll+1)->tet=x.l;@+test_store_bkpt(ll+1); +if (stack_tracing) { + tracing=true; + if (cur_line) show_line(); + if (k>=32) printf(" M8[#%08x%08x]=g[%d]=#%08x%08x, rS+=8\n", + g[rS].h,g[rS].l,k,x.h,x.l); + else printf(" M8[#%08x%08x]=%s=#%08x%08x, rS+=8\n", + g[rS].h,g[rS].l,k==rZ+1? 
"(rG,rA)": special_name[k],x.h,x.l); +} +S++, g[rS]=incr(g[rS],8); + +@ @= +case UNSAVE:@+if (xx!=0 || yy!=0) goto illegal_inst; + z.l&=-8;@+g[rS]=incr(z,8); + for (k=rZ+1;;) { + @; + if (k==rP) k=rR; + else if (k==rB) k=255; + else if (k==G) break; + else k--; + } + S=g[rS].l>>3; + stack_load(); + k=l[S&lring_mask].l&0xff; + for (j=0;jG? G: k; + g[rL].l=L;@+a=g[rL]; + g[rG].l=G;@+break; + +@ @= +g[rS]=incr(g[rS],-8); +ll=mem_find(g[rS]); +test_load_bkpt(ll);@+test_load_bkpt(ll+1); +if (k==rZ+1) x.l=G=g[rG].l=ll->tet>>24, a.l=g[rA].l=(ll+1)->tet&0x3ffff; +else g[k].h=ll->tet, g[k].l=(ll+1)->tet; +if (stack_tracing) { + tracing=true; + if (cur_line) show_line(); + if (k>=32) printf(" rS-=8, g[%d]=M8[#%08x%08x]=#%08x%08x\n", + k,g[rS].h,g[rS].l,ll->tet,(ll+1)->tet); + else if (k==rZ+1) printf(" (rG,rA)=M8[#%08x%08x]=#%08x%08x\n", + g[rS].h,g[rS].l,ll->tet,(ll+1)->tet); + else printf(" rS-=8, %s=M8[#%08x%08x]=#%08x%08x\n", + special_name[k],g[rS].h,g[rS].l,ll->tet,(ll+1)->tet); +} + +@ The cache maintenance instructions don't affect this simulation, +because there are no caches. But if the user has invoked them, we do +provide a bit of information when tracing, indicating the scope of the +instruction. + +@= +case SYNCID: case SYNCIDI: case PREST: case PRESTI: +case SYNCD: case SYNCDI: case PREGO: case PREGOI: +case PRELD: case PRELDI: x=incr(w,xx);@+break; + +@ Several loose ends remain to be nailed down. +% (Incidentally, a ``loose end'' should never be confused with ``Lucent.'') + +@= +case GO: case GOI: x=inst_ptr;@+inst_ptr=w;@+goto store_x; +case JMP: case JMPB: inst_ptr=z; +case SWYM: break; +case SYNC:@+if (xx!=0 || yy!=0 || zz>7) goto illegal_inst; + if (zz<=3) break; +case LDVTS: case LDVTSI: privileged_inst: strcpy(lhs,"!privileged"); + goto break_inst; +illegal_inst: strcpy(lhs,"!illegal"); +break_inst: breakpoint=tracing=true; + if (!interacting && !interact_after_break) halted=true; + break; + +@* Trips and traps. 
We have now implemented 253 of the 256 instructions: all +but \.{TRIP}, \.{TRAP}, and \.{RESUME}. + +The |TRIP| instruction simply turns |H_BIT| on in the |exc| variable; +this will trigger an interruption to location~0. +@^interrupts@> + +The |TRAP| instruction is not simulated, except for the system calls +mentioned in the introduction. + +@= +case TRIP: exc|=H_BIT;@+break; +case TRAP:@+if (xx!=0 || yy>max_sys_call) goto privileged_inst; + strcpy(rhs,trap_format[yy]); + g[rWW]=inst_ptr; + g[rXX].h=sign_bit, g[rXX].l=inst; + g[rYY]=y, g[rZZ]=z; + z.h=0, z.l=zz; + a=incr(b,8); + @; + switch (yy) { +case Halt: @;@+g[rBB]=g[255];@+break; +case Fopen: g[rBB]=mmix_fopen((unsigned char)zz,mb,ma);@+break; +case Fclose: g[rBB]=mmix_fclose((unsigned char)zz);@+break; +case Fread: g[rBB]=mmix_fread((unsigned char)zz,mb,ma);@+break; +case Fgets: g[rBB]=mmix_fgets((unsigned char)zz,mb,ma);@+break; +case Fgetws: g[rBB]=mmix_fgetws((unsigned char)zz,mb,ma);@+break; +case Fwrite: g[rBB]=mmix_fwrite((unsigned char)zz,mb,ma);@+break; +case Fputs: g[rBB]=mmix_fputs((unsigned char)zz,b);@+break; +case Fputws: g[rBB]=mmix_fputws((unsigned char)zz,b);@+break; +case Fseek: g[rBB]=mmix_fseek((unsigned char)zz,b);@+break; +case Ftell: g[rBB]=mmix_ftell((unsigned char)zz);@+break; +} + x=g[255]=g[rBB];@+break; + +@ @= +if (!zz) halted=breakpoint=true; +else if (zz==1) { + if (loc.h || loc.l>=0x90) goto privileged_inst; + print_trip_warning(loc.l>>4,incr(g[rW],-4)); +}@+else goto privileged_inst; + +@ @= +char arg_count[]={1,3,1,3,3,3,3,2,2,2,1}; +char *trap_format[]={ +"Halt(%z)", +"$255 = Fopen(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x", +"$255 = Fclose(%!z) = %x", +"$255 = Fread(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x", +"$255 = Fgets(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x", +"$255 = Fgetws(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x", +"$255 = Fwrite(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x", +"$255 = Fputs(%!z,%#b) = %x", +"$255 = Fputws(%!z,%#b) = %x", +"$255 = Fseek(%!z,%b) = %x", +"$255 = Ftell(%!z) = %x"}; + +@ @= +if 
(arg_count[yy]==3) { + ll=mem_find(b);@+test_load_bkpt(ll);@+test_load_bkpt(ll+1); + mb.h=ll->tet, mb.l=(ll+1)->tet; + ll=mem_find(a);@+test_load_bkpt(ll);@+test_load_bkpt(ll+1); + ma.h=ll->tet, ma.l=(ll+1)->tet; +} + +@ The input/output operations invoked by \.{TRAP}s are +done by subroutines in an auxiliary program module called {\mc MMIX-IO}. +Here we need only declare those subroutines, and write three primitive +interfaces on which they depend. + +@ @= +extern void mmix_io_init @,@,@[ARGS((void))@]; +extern octa mmix_fopen @,@,@[ARGS((unsigned char,octa,octa))@]; +extern octa mmix_fclose @,@,@[ARGS((unsigned char))@]; +extern octa mmix_fread @,@,@[ARGS((unsigned char,octa,octa))@]; +extern octa mmix_fgets @,@,@[ARGS((unsigned char,octa,octa))@]; +extern octa mmix_fgetws @,@,@[ARGS((unsigned char,octa,octa))@]; +extern octa mmix_fwrite @,@,@[ARGS((unsigned char,octa,octa))@]; +extern octa mmix_fputs @,@,@[ARGS((unsigned char,octa))@]; +extern octa mmix_fputws @,@,@[ARGS((unsigned char,octa))@]; +extern octa mmix_fseek @,@,@[ARGS((unsigned char,octa))@]; +extern octa mmix_ftell @,@,@[ARGS((unsigned char))@]; +extern void print_trip_warning @,@,@[ARGS((int,octa))@]; +extern void mmix_fake_stdin @,@,@[ARGS((FILE*))@]; + +@ The subroutine |mmgetchars(buf,size,addr,stop)| reads characters +starting at address |addr| in the simulated memory and stores them +in |buf|, continuing until |size| characters have been read or +some other stopping criterion has been met. If |stop<0| there is +no other criterion; if |stop=0| a null character will also terminate +the process; otherwise |addr| is even, and two consecutive null bytes +starting at an even address will terminate the process. The number +of bytes read and stored, exclusive of terminating nulls, is returned. 
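The three stopping rules just described can be tried in isolation with a minimal sketch. It assumes a flat byte array in place of the simulated memory; the name `getchars_sketch` and its `mem` parameter are hypothetical and are not part of the simulator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mmgetchars: it reads from a flat byte array
   (an assumption) instead of the simulator's memory structure. */
int getchars_sketch(char *buf, int size, const unsigned char *mem,
                    size_t addr, int stop)
{
  int m;
  assert(size >= 0);
  assert(stop <= 0 || (addr & 1) == 0); /* wyde mode expects an even start */
  for (m = 0; m < size; m++) {
    buf[m] = (char)mem[addr + m];
    if (buf[m] == '\0' && stop >= 0) {
      if (stop == 0) return m;          /* a null byte ends a byte string */
      /* stop>0: a null wyde, i.e. two null bytes starting at an even address;
         m>=1 here because addr is even and addr+m is odd */
      if (((addr + m) & 1) && buf[m - 1] == '\0') return m - 1;
    }
  }
  return size;                          /* size limit reached first */
}
```

As in the description, the returned count excludes the terminating nulls, although the nulls themselves have already been stored in `buf`.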
+ +@= +int mmgetchars @,@,@[ARGS((char*,int,octa,int))@];@+@t}\6{@> +int mmgetchars(buf,size,addr,stop) + char *buf; + int size; + octa addr; + int stop; +{ + register char *p; + register int m; + register mem_tetra *ll; + register tetra x; + octa a; + for (p=buf,m=0,a=addr; mtet; + if ((a.l&0x3) || m>size-4) @@; + else @@; + } + return size; +} + +@ @= +{ + *p=(x>>(8*((~a.l)&0x3)))&0xff; + if (!*p && stop>=0) { + if (stop==0) return m; + if ((a.l&0x1) && *(p-1)=='\0') return m-1; + } + p++,m++,a=incr(a,1); +} + +@ @= +{ + *p=x>>24; + if (!*p && (stop==0 || (stop>0 && x<0x10000))) return m; + *(p+1)=(x>>16)&0xff; + if (!*(p+1) && stop==0) return m+1; + *(p+2)=(x>>8)&0xff; + if (!*(p+2) && (stop==0 || (stop>0 && (x&0xffff)==0))) return m+2; + *(p+3)=x&0xff; + if (!*(p+3) && stop==0) return m+3; + p+=4,m+=4,a=incr(a,4); +} + +@ The subroutine |mmputchars(buf,size,addr)| puts |size| characters +into the simulated memory starting at address |addr|. + +@= +void mmputchars @,@,@[ARGS((unsigned char*,int,octa))@];@+@t}\6{@> +void mmputchars(buf,size,addr) + unsigned char *buf; + int size; + octa addr; +{ + register unsigned char *p; + register int m; + register mem_tetra *ll; + octa a; + for (p=buf,m=0,a=addr; msize-4) @@; + else @; + } +} + +@ @= +{ + register int s=8*((~a.l)&0x3); + ll->tet^=(((ll->tet>>s)^*p)&0xff)<= +{ + ll->tet=(*p<<24)+(*(p+1)<<16)+(*(p+2)<<8)+*(p+3); + p+=4,m+=4,a=incr(a,4); +} + +@ When standard input is being read by the simulated program at the same time +as it is being used for interaction, we try to keep the two uses separate +by maintaining a private buffer for the simulated program's \.{StdIn}. +Online input is usually transmitted from the keyboard to a \CEE/ program +a line at a time; therefore an +|fgets| operation works much better than |fread| when we prompt +for new input. But there is a slight complication, because |fgets| +might read a null character before coming to a newline character. 
+We cannot deduce the number of characters read by |fgets| simply +by looking at |strlen(stdin_buf)|. + +@= +char stdin_chr @,@,@[ARGS((void))@];@+@t}\6{@> +char stdin_chr() +{ + register char* p; + while (stdin_buf_start==stdin_buf_end) { + if (interacting) { + printf("StdIn> ");@+fflush(stdout); +@.StdIn>@> + } + if (!fgets(stdin_buf,256,stdin)) + panic("End of file on standard input; use the -f option, not <"); + stdin_buf_start=stdin_buf; + for (p=stdin_buf;p= +char stdin_buf[256]; /* standard input to the simulated program */ +char *stdin_buf_start; /* current position in that buffer */ +char *stdin_buf_end; /* current end of that buffer */ + +@ Just after executing each instruction, we do the following. +Underflow that is exact and not enabled is ignored. (This applies +also to underflow that was triggered by |RESUME_SET|.) + +@= +if ((exc&(U_BIT+X_BIT))==U_BIT && !(g[rA].l&U_BIT)) exc &=~U_BIT; +if (exc) { + if (exc&tracing_exceptions) tracing=true; + j=exc&(g[rA].l|H_BIT); /* find all exceptions that have been enabled */ + if (j) @; + g[rA].l |= exc>>8; +} + +@ @= +{ + tripping=true; + for (k=0; !(j&H_BIT); j<<=1, k++) ; + exc&=~(H_BIT>>k); /* trips taken are not logged as events */ + g[rW]=inst_ptr; + inst_ptr.h=0, inst_ptr.l=k<<4; + g[rX].h=sign_bit, g[rX].l=inst; + if ((op&0xe0)==STB) g[rY]=w, g[rZ]=b; + else g[rY]=y, g[rZ]=z; + g[rB]=g[255]; + g[255]=g[rJ]; + if (op==TRIP) w=g[rW], x=g[rX], a=g[255]; +} + +@ We are finally ready for the last case. + +@= +case RESUME:@+if (xx || yy || zz) goto illegal_inst; +inst_ptr=z=g[rW]; +b=g[rX]; +if (!(b.h&sign_bit)) @; +break; + +@ Here we check to see if the ropcode restrictions hold. +If so, the ropcode will actually be obeyed on the next fetch phase. 
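The bit tests involved can be sketched separately from the simulator. The helper names below are hypothetical. Only the sign-bit test on rX, the use of its leading byte as the ropcode, and the mask `0x8f30` are taken from the surrounding code; the further checks made for `RESUME_SET` and `RESUME_AGAIN` are not reproduced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; rx_h and rx_l are the high and low tetrabytes of rX. */

/* A ropcode is present only when the sign bit of rX is clear. */
bool has_ropcode(uint32_t rx_h) { return (rx_h & 0x80000000u) == 0; }

/* The ropcode is the leading byte of rX. */
unsigned ropcode_of(uint32_t rx_h) { return rx_h >> 24; }

/* RESUME_CONT rejects an instruction in rX whose leading hex digit
   selects a set bit of the mask 0x8f30 (digits 4, 5, 8-b, and f). */
bool cont_allows(uint32_t rx_l)
{
  return ((1u << (rx_l >> 28)) & 0x8f30u) == 0;
}
```

For instance, an instruction word beginning with hex digit 2 passes the `RESUME_CONT` mask test, while one beginning with 8 does not.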
+ +@d RESUME_AGAIN 0 /* repeat the command in rX as if in location $\rm rW-4$ */ +@d RESUME_CONT 1 /* same, but substitute rY and rZ for operands */ +@d RESUME_SET 2 /* set r[X] to rZ */ + +@= +{ + rop=b.h>>24; /* the ropcode is the leading byte of rX */ + switch (rop) { + case RESUME_CONT:@+if ((1<<(b.l>>28))&0x8f30) goto illegal_inst; + case RESUME_SET: k=(b.l>>16)&0xff; + if (k>=L && k>24)==RESUME) goto illegal_inst; + break; + default: goto illegal_inst; + } + resuming=true; +} + +@ @= +if (rop==RESUME_SET) { + op=ORI; + y=g[rZ]; + z=zero_octa; + exc=g[rX].h&0xff00; + f=X_is_dest_bit; +}@+else { /* |RESUME_CONT| */ + y=g[rY]; + z=g[rZ]; +} + +@ We don't want to count the |UNSAVE| that bootstraps the whole process. + +@= +if (g[rU].l || g[rU].h || !resuming) { + g[rC].h+=info[op].mems; /* clock goes up by $2^{32}$ for each $\mu$ */ + g[rC]=incr(g[rC],info[op].oops); /* clock goes up by 1 for each $\upsilon$ */ + g[rU]=incr(g[rU],1); /* usage counter counts total instructions simulated */ + g[rI]=incr(g[rI],-1); /* interval timer counts down by 1 only */ + if (g[rI].l==0 && g[rI].h==0) tracing=breakpoint=true; +} + +@* Tracing. After an instruction has been executed, we often want +to display its effect. This part of the program prints out a +symbolic interpretation of what has just happened. + +@= +if (tracing) { + if (showing_source && cur_line) show_line(); + @; + @; + if (showing_stats || breakpoint) show_stats(breakpoint); + just_traced=true; +}@+else if (just_traced) { + printf(" ...............................................\n"); + just_traced=false; + shown_line=-gap-1; /* gap will not be filled */ +} + +@ @= +bool showing_stats; /* should traced instructions also show the statistics? */ +bool just_traced; /* was the previous instruction traced? 
*/ + +@ @= +if (resuming && op!=RESUME) { + switch (rop) { + case RESUME_AGAIN: printf(" (%08x%08x: %08x (%s)) ", + loc.h,loc.l,inst,info[op].name);@+break; + case RESUME_CONT: printf(" (%08x%08x: %04xrYrZ (%s)) ", + loc.h,loc.l,inst>>16,info[op].name);@+break; + case RESUME_SET: printf(" (%08x%08x: ..%02x..rZ (SET)) ", + loc.h,loc.l,(inst>>16)&0xff);@+break; + } +}@+else { + ll=mem_find(loc); + printf("%10d. %08x%08x: %08x (%s) ",ll->freq,loc.h,loc.l,inst,info[op].name); +} + +@ This part of the simulator was inspired by ideas of E.~H. Satterthwaite, +@^Satterthwaite, Edwin Hallowell, Jr.@> +{\sl Software---Practice and Experience\/ \bf2} (1972), 197--217. +Online debugging tools have improved significantly since Satterthwaite +published his work, but good offline tools are still valuable; +alas, today's algebraic programming languages do not provide tracing +facilities that come anywhere close to the level of quality that Satterthwaite +was able to demonstrate for {\mc ALGOL} in 1970. + +@= +if (lhs[0]=='!') printf("%s instruction!\n",lhs+1); /* privileged or illegal */ +else { + @; + if (z.l==0 && (op==ADDUI||op==ORI)) p="%l = %y = %#x"; /* \.{LDA}, \.{SET} */ + else p=info[op].trace_format; + for (;*p;p++) @; + if (exc) printf(", rA=#%05x", g[rA].l); + if (tripping) tripping=false, printf(", -> #%02x", inst_ptr.l); + printf("\n"); +} + +@ Push, pop, and \.{UNSAVE} instructions display changes to rL and rO +explicitly; otherwise the change is implicit, if |L!=old_L|. + +@= +if (L!=old_L && !(f&push_pop_bit)) printf("rL=%d, ",L); + +@ Each \MMIX\ instruction has a {\it trace format\/} string, which defines +its symbolic representation. For example, the string for \.{ADD} is +|"%l = %y + %z = %x"|; if the instruction is, say, \.{ADD}~\.{\$1,\$2,\$3} +with $\$2=5$ and $\$3=8$, and if the stack offset is 100, the trace output +will be |"$1=l[101] = 5 + 8 = 13"|. 
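The \.{ADD} example can be reproduced with a miniature formatter. This is only a sketch under simplifying assumptions: it handles just \.{\%l}, \.{\%x}, \.{\%y}, and \.{\%z}, prints decimal values only, and writes to a caller-supplied buffer. The name `expand_trace` and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature of the trace formatter: only %l, %x, %y, %z,
   decimal style only; the result is written into out (capacity n). */
void expand_trace(char *out, size_t n, const char *fmt,
                  const char *lhs, long x, long y, long z)
{
  size_t used = 0;
  const char *p;
  if (n == 0) return;
  out[0] = '\0';
  for (p = fmt; *p && used + 1 < n; p++) {
    int w;
    if (*p != '%') w = snprintf(out + used, n - used, "%c", *p);
    else {
      p++;
      switch (*p) {
      case 'l': w = snprintf(out + used, n - used, "%s", lhs); break;
      case 'x': w = snprintf(out + used, n - used, "%ld", x); break;
      case 'y': w = snprintf(out + used, n - used, "%ld", y); break;
      case 'z': w = snprintf(out + used, n - used, "%ld", z); break;
      default:
        w = snprintf(out + used, n - used, "BUG!!");
        if (*p == '\0') p--; /* lone '%' at the end: let the loop terminate */
        break;
      }
    }
    if (w < 0) break;
    used += (size_t)w;       /* on truncation the loop guard stops us */
  }
}
```

Calling `expand_trace(buf, sizeof buf, "%l = %y + %z = %x", "$1=l[101]", 13, 5, 8)` leaves `$1=l[101] = 5 + 8 = 13` in `buf`, matching the example above.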
+ +Percent signs (\.\%) induce special format conventions, as follows: + +\bull \.{\%a}, \.{\%b}, \.{\%p}, \.{\%q}, \.{\%w}, \.{\%x}, \.{\%y}, and +\.{\%z} stand for the numeric contents of octabytes |a|, |b|, |ma|, |mb|, |w|, +|x|, |y|, and~|z|, respectively; a ``style'' character may follow the +percent sign in this case, as explained below. + +\bull \.{\%(} and \.{\%)} are brackets that indicate the mode of +floating point rounding. If |round_mode=ROUND_NEAR|, |ROUND_OFF|, +|ROUND_UP|, |ROUND_DOWN|, the corresponding brackets are +\.(~and~\.), \.[~and~\.], \.\^~and~\.\^, \.\_~and~\.\_. +Such brackets are placed around a floating point operator; +for example, floating point addition is denoted +by `\.{[+]}' when the current rounding mode is rounding-off. + +\bull \.{\%l} stands for the string |lhs|, which usually represents the +``left hand side'' of the +instruction just performed, formatted as a register number and +its equivalent in the ring of local registers (e.g., `\.{\$1=l[101]}') or +as a register number and its equivalent in the array of global registers +(e.g., `\.{\$255=g[255]}'). The \.{POP} instruction +uses |lhs| to indicate how the ``hole'' in the register stack was plugged. + +\bull \.{\%r} means to switch to string |rhs| and continue formatting +from there. This mechanism allows us to use variable formats for opcodes like +\.{TRAP} that have several variants. + +\bull \.{\%t} means to print either `\.{Yes, ->loc}' (where \.{loc} is +the location of the next instruction) or `\.{No}', depending on the +value of~|x|. + +\bull \.{\%g} means to print `\.{ (bad guess)}' if |good| is |false|. + +\bull \.{\%s} stands for the name of special register |g[zz]|. + +\bull \.{\%?} stands for omission of +the following operator if |z=0|. For example, the +memory address of \.{LDBI} is described by `\.{\%\#y\%?+}'; this +means to treat the address as simply `\.{\%\#y}' if |z=0|, +otherwise as `\.{\%\#y+\%z}'. 
This case is used only when +|z| is a relatively small number (|z.h=0|). + +@= +{ + if (*p!='%') fputc(*p,stdout); + else { + style=decimal; + char_switch: switch (*++p) { + @t\4@>@; + default: printf("BUG!!"); /* can't happen */ + } + } +} + +@ Octabytes are printed as decimal numbers unless a +``style'' character intervenes between the percent sign and the +name of the octabyte: `\.\#' denotes hexadecimal notation, prefixed by~\.\#; +`\.0' denotes hexadecimal notation with no prefixed~\.\# and with leading zeros not suppressed; +`\..' denotes floating decimal notation; and +`\.!' means to use the names \.{StdIn}, \.{StdOut}, or \.{StdErr} +if the value is 0, 1, or~2. +@.StdIn@> +@.StdOut@> +@.StdErr@> + +@= +case '#': style=hex;@+ goto char_switch; +case '0': style=zhex;@+ goto char_switch; +case '.': style=floating;@+ goto char_switch; +case '!': style=handle;@+ goto char_switch; + +@ @= +typedef enum {@!decimal,@!hex,@!zhex,@!floating,@!handle} fmt_style; + +@ @= +case 'a': trace_print(a);@+break; +case 'b': trace_print(b);@+break; +case 'p': trace_print(ma);@+break; +case 'q': trace_print(mb);@+break; +case 'w': trace_print(w);@+break; +case 'x': trace_print(x);@+break; +case 'y': trace_print(y);@+break; +case 'z': trace_print(z);@+break; + +@ @= +fmt_style style; +char *stream_name[]={"StdIn","StdOut","StdErr"}; +@.StdIn@> +@.StdOut@> +@.StdErr@> +@# +void trace_print @,@,@[ARGS((octa))@];@+@t}\6{@> +void trace_print(o) + octa o; +{ + switch (style) { + case decimal: print_int(o);@+return; + case hex: fputc('#',stdout);@+print_hex(o);@+return; + case zhex: printf("%08x%08x",o.h,o.l);@+return; + case floating: print_float(o);@+return; + case handle:@+if (o.h==0 && o.l<3) printf(stream_name[o.l]); + else print_int(o);@+return; + } +} + +@ @= +case '(': fputc(left_paren[round_mode],stdout);@+break; +case ')': fputc(right_paren[round_mode],stdout);@+break; +case 't':@+if (x.l) printf(" Yes, -> #"),print_hex(inst_ptr); + else printf(" No");@+break; +case 'g':@+if 
(!good) printf(" (bad guess)");@+break; +case 's': printf(special_name[zz]);@+break; +case '?': p++;@+if (z.l) printf("%c%d",*p,z.l);@+break; +case 'l': printf(lhs);@+break; +case 'r': p=switchable_string;@+break; + +@ @d rhs &switchable_string[1] + +@= +char left_paren[]={0,'[','^','_','('}; /* denotes the rounding mode */ +char right_paren[]={0,']','^','_',')'}; /* denotes the rounding mode */ +char switchable_string[48]; /* holds |rhs|; position 0 is ignored */ + /* |switchable_string| must be able to hold any |trap_format| */ +char lhs[32]; +int good_guesses, bad_guesses; /* branch prediction statistics */ + +@ @= +void show_stats @,@,@[ARGS((bool))@];@+@t}\6{@> +void show_stats(verbose) + bool verbose; +{ + octa o; + printf(" %d instruction%s, %d mem%s, %d oop%s; %d good guess%s, %d bad\n", + g[rU].l,g[rU].l==1? "": "s",@| + g[rC].h,g[rC].h==1? "": "s",@| + g[rC].l,g[rC].l==1? "": "s",@| + good_guesses,good_guesses==1? "": "es",bad_guesses); + if (!verbose) return; + o = halted? incr(inst_ptr,-4): inst_ptr; + printf(" (%s at location #%08x%08x)\n", + halted? "halted": "now", o.h, o.l); +} + +@* Running the program. Now we are ready to fit the pieces together into a +working simulator. 
+ +@c +#include +#include +#include +#include +#include +#include "abstime.h" +@@; +@@; +@@; +@@; +@# +int main(argc,argv) + int argc; + char *argv[]; +{ + @; + mmix_io_init(); + @; + @; + @; + @; + while (1) { + if (interrupt && !breakpoint) breakpoint=interacting=true, interrupt=false; + else { + breakpoint=false; + if (interacting) @; + } + if (halted) break; + do @@; + while ((!interrupt && !breakpoint) || resuming); + if (interact_after_break) interacting=true, interact_after_break=false; + } + end_simulation:@+if (profiling) @; + if (interacting || profiling || showing_stats) show_stats(true); + return g[255].l; /* provide rudimentary feedback for non-interactive runs */ +} + +@ Here we process the command-line options; when we finish, |*cur_arg| +should be the name of the object file to be loaded and simulated. + +@d mmo_file_name *cur_arg + +@= +myself=argv[0]; +for (cur_arg=argv+1;*cur_arg && (*cur_arg)[0]=='-'; cur_arg++) + scan_option(*cur_arg+1,true); +if (!*cur_arg) scan_option("?",true); /* exit with usage note */ +argc -= cur_arg-argv; /* this is the |argc| of the user program */ + +@ Careful readers of the following subroutine will notice a little white bug: +A tracing specification like +\.{t1000000000} or even \.{t0000000000} or even \.{t!!!!!!!!!!} +is silently converted to \.{t4294967295}. + +The \.{-b} and \.{-c} options are effective only on the command line, but they +are harmless while interacting. + +@= +void scan_option @,@,@[ARGS((char*,bool))@];@+@t}\6{@> +void scan_option(arg,usage) + char *arg; /* command-line argument (without the `\.-') */ + bool usage; /* should we exit with usage note if unrecognized? 
*/ +{ + register int k; + switch (*arg) { + case 't':@+if (strlen(arg)>10) trace_threshold=0xffffffff; + else if (sscanf(arg+1,"%d",&trace_threshold)!=1) trace_threshold=0; + return; + case 'e':@+if (!*(arg+1)) tracing_exceptions=0xff; + else if (sscanf(arg+1,"%x",&tracing_exceptions)!=1) tracing_exceptions=0; + return; + case 'r': stack_tracing=true;@+return; + case 's': showing_stats=true;@+return; + case 'l':@+if (!*(arg+1)) gap=3; + else if (sscanf(arg+1,"%d",&gap)!=1) gap=0; + showing_source=true;@+return; + case 'L':@+if (!*(arg+1)) profile_gap=3; + else if (sscanf(arg+1,"%d",&profile_gap)!=1) profile_gap=0; + profile_showing_source=true; + case 'P': profiling=true;@+return; + case 'v': trace_threshold=0xffffffff;@+ tracing_exceptions=0xff; + stack_tracing=true; @+ showing_stats=true; + gap=10, showing_source=true; + profile_gap=10, profile_showing_source=true, profiling=true; + return; + case 'q': trace_threshold=tracing_exceptions=0; + stack_tracing=showing_stats=showing_source=false; + profiling=profile_showing_source=false; + return; + case 'i': interacting=true;@+return; + case 'I': interact_after_break=true;@+return; + case 'b':@+if (sscanf(arg+1,"%d",&buf_size)!=1) buf_size=0;@+return; + case 'c':@+if (sscanf(arg+1,"%d",&lring_size)!=1) lring_size=0;@+return; + case 'f': @;@+return; + case 'D': @;@+return; + default:@+if (usage) { + fprintf(stderr, + "Usage: %s progfile command-line-args...\n",myself); +@.Usage: ...@> + for (k=0;usage_help[k][0];k++) fprintf(stderr,usage_help[k]); + exit(-1); + }@+else@+ for (k=0;usage_help[k][1]!='b';k++) printf(usage_help[k]); + return; + } +} + +@ @= +char *myself; /* |argv[0]|, the name of this simulator */ +char **cur_arg; /* pointer to current place in the argument vector */ +bool interrupt; /* has the user interrupted the simulation recently? */ +bool profiling; /* should we print the profile at the end? 
*/ +FILE *fake_stdin; /* file substituted for the simulated \.{StdIn} */ +FILE *dump_file; /* file used for binary dumps */ +char *usage_help[]={@/ +" with these options: (=decimal number, =hex number)\n",@| +"-t trace each instruction the first n times\n",@| +"-e trace each instruction with an exception matching x\n",@| +"-r trace hidden details of the register stack\n",@| +"-l list source lines when tracing, filling gaps <= n\n",@| +"-s show statistics after each traced instruction\n",@| +"-P print a profile when simulation ends\n",@| +"-L list source lines with the profile\n",@| +"-v be verbose: show almost everything\n",@| +"-q be quiet: show only the simulated standard output\n",@| +"-i run interactively (prompt for online commands)\n",@| +"-I interact, but only after the program halts\n",@| +"-b change the buffer size for source lines\n",@| +"-c change the cyclic local register ring size\n",@| +"-f use given file to simulate standard input\n",@| +"-D dump a file for use by other simulators\n",@| +""}; +char *interactive_help[]={@/ +"The interactive commands are:\n",@| +" trace one instruction\n",@| +"n trace one instruction\n",@| +"c continue until halt or breakpoint\n",@| +"q quit the simulation\n",@| +"s show current statistics\n",@| +"l set and/or show local register in format t\n",@| +"g set and/or show global register in format t\n",@| +"rA set and/or show register rA in format t\n",@| +"$ set and/or show dynamic register in format t\n",@| +"M set and/or show memory octabyte in format t\n",@| +"+ set and/or show n additional octabytes in format t\n",@| +" is ! (decimal) or . 
(floating) or # (hex) or \" (string)\n",@| +" or (previous ) or = (change value)\n",@| +"@@ go to location x\n",@| +"b[rwx] set or reset breakpoint at location x\n",@| +"t trace location x\n",@| +"u untrace location x\n",@| +"T set current segment to Text_Segment\n",@| +"D set current segment to Data_Segment\n",@| +"P set current segment to Pool_Segment\n",@| +"S set current segment to Stack_Segment\n",@| +"B show all current breakpoints and tracepoints\n",@| +"i insert commands from file\n",@| +"-
