% This file is part of the MMIXware package (c) Donald E Knuth 1999 |
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES! |
|
\def\title{MMIX} |
\input epsf % input macros for dvips to include METAPOST illustrations |
|
\def\MMIX{\.{MMIX}} |
\def\NNIX{\hbox{\mc NNIX}} |
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant |
\def\beginword{\vcenter\bgroup\let\\=\wordrule\halign\bgroup&\hfil##\hfil\cr} |
\def\endword{\noalign{\vskip\baselineskip}\egroup\egroup |
\advance\belowdisplayskip-\baselineskip} |
\def\wordrule{\vrule height 9.5pt depth 4.5pt width .4pt} |
\newdimen\bitwd \bitwd=6.6pt |
\def\field#1#2{\vrule depth 3pt width 0pt \hbox to#1\bitwd{\hss$#2$\hss}} |
|
\def\XF{\\{XF}}\def\XM{\\{XM}}\def\XD{\\{XD}} % these not in \tt |
\def\PC{\\{PC}} |
\def\Jx{\.{J}} % conversely, I type J_ to get J in \tt |
|
\def\s{{\rm s}} |
\def\rX{{\rm\$X}} \def\rY{{\rm\$Y}} \def\rZ{{\rm\$Z}} |
\def\mm{{\rm M}} \def\xx{{\rm X}} \def\yy{{\rm Y}} \def\zz{{\rm Z}} |
%\def\ll{{\rm L}} \def\gg{{\rm G}} |
\def\ll{L} \def\gg{G} |
\def\?{\mkern-1mu} |
|
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> |
|
@* Introduction to MMIX. |
Thirty-eight years have passed since the \.{MIX} computer was designed, and |
computer architecture has been converging during those years |
towards a rather different |
style of machine. Therefore it is time to replace \.{MIX} with a new |
computer that contains even less saturated fat than its predecessor. |
|
Exercise 1.3.1--25 in the third edition of |
{\sl Fundamental Algorithms\/} speaks of an extended |
\.{MIX} called MixMaster, which is upward compatible with the old version. |
But MixMaster itself is hopelessly obsolete; although it allows for |
several gigabytes of memory, we can't even use it with {\mc ASCII} code to |
get lowercase letters. And ouch, the standard subroutine calling convention |
of \.{MIX} is irrevocably based on self-modifying code! Decimal arithmetic |
and self-modifying code were popular in 1962, but they sure have disappeared |
quickly as machines have gotten bigger and faster. A completely new |
design is called for, based on the principles of RISC architecture as |
expounded in {\sl Computer Architecture\/} by Hennessy and Patterson |
(Morgan Kaufmann, 1996). % first ed was "Morgan Kaufman"! but now "nn" is legit |
@^Hennessy, John LeRoy@> |
@^Patterson, David Andrew@> |
|
So here is \MMIX, a computer that will totally replace \.{MIX} |
in the ``ultimate'' editions of {\sl The Art of Computer Programming}, |
Volumes 1--3, and in the first editions of the remaining volumes. |
I~must confess that |
I~can hardly wait to own a computer like this. |
|
How do you pronounce \MMIX? I've been saying ``em-mix'' to myself, |
because the first `\.M' represents a new millennium. Therefore I~use |
the article ``an'' instead of~``a'' before the name \MMIX\ |
in English phrases like ``an \MMIX\ simulator.'' |
|
Incidentally, the {\sl Dictionary of American Regional English\/ \bf3} (1996) |
lists ``mommix'' as a common dialect word used both as a noun and a verb; |
to mommix something means to botch it, to bollix it. Only time will |
tell whether I~have mommixed the definition of \MMIX. |
|
@ The original \.{MIX} computer could be operated without an operating |
system; you could bootstrap it with punched cards or paper tape and do |
everything yourself. But nowadays such power is no longer in the hands of |
ordinary users. The \MMIX\ hardware, like all other computing machines |
made today, relies on an operating system to get jobs |
started in their own address spaces and to provide I/O capabilities. |
|
Whenever anybody has asked if I will be writing about operating systems, |
my reply has always been ``Nix.'' Therefore the name of\/ \MMIX's operating |
system, \NNIX, will come as no surprise. |
@:NNIX}{\NNIX\ operating system@> |
@^operating system@> |
From time to time I will necessarily have to refer to things that \NNIX\ does |
for its users, but I am unable to build \NNIX\ myself. Life is |
too short. It would be wonderful if some expert in operating system design |
became inspired to write a book that explains exactly how to construct a nice, |
clean \NNIX\ kernel for an \MMIX\ chip. |
|
@ I am deeply grateful to the many people who have helped me shape the behavior |
of\/ \MMIX. In particular, John Hennessy and (especially) Dick Sites |
have made significant contributions. |
@^Hennessy, John LeRoy@> |
@^Sites, Richard Lee@> |
|
@ A programmer's introduction to \MMIX\ appears in ``Fascicle~1,'' a booklet |
@^Fascicle 1@> |
containing tutorial material that will ultimately appear in the fourth edition |
of {\sl The Art of Computer Programming}. |
The description in the following sections is rather different, because |
we are concerned about a complete implementation, including all of the |
features used by the operating system and invisible to normal programs. |
Here it is important to emphasize exceptional cases that were glossed over |
in the tutorial, and~to consider |
nitpicky details about things that might go wrong. |
|
@* MMIX basics. |
\MMIX\ is a 64-bit RISC machine with at least 256 general-purpose registers |
and a 64-bit address space. |
Every instruction is four bytes long and has the form |
$$\vcenter{\offinterlineskip |
\def\\#1&{\omit&} |
\hrule |
\halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr |
height 9pt depth4pt&OP&&X&&Y&&Z&\cr} |
\hrule}\,.$$ |
The 256 possible OP codes fall into a dozen or so easily remembered |
@^OP codes@> |
categories; an instruction usually means, ``Set register X to the |
result of\/ Y~OP~Z\null.'' For example, |
$$\vcenter{\offinterlineskip |
\def\\#1&{\omit&} |
\hrule |
\halign{&\vrule#&\hbox to 4em{\tt\hfil#\hfil}\cr |
height 9pt depth4pt&32&&1&&2&&3&\cr} |
\hrule}$$ |
sets register~1 to the sum of registers 2 and 3. |
A few instructions combine the Y and Z bytes into |
a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field. |
But the three bytes X, Y, Z usually have three-pronged significance |
independent of each other. |
|
Instructions are usually represented in a symbolic form corresponding |
to the \MMIX\ assembly language, in which each operation code has a mnemonic |
name. For example, operation~32 is \.{ADD}, and the instruction above |
might be written `\.{ADD} \.{\$1,\$2,\$3}'; a dollar sign `\.\$' symbolizes |
a register number. In general, the instruction |
\.{ADD}~\.{\$X,\$Y,\$Z} is the operation of setting $\rX=\rY+\rZ$. |
An assembly language instruction with two commas has three operand |
fields X, Y,~Z; an instruction with one comma has two operand fields |
X,~YZ; an instruction with no comma has one operand field,~XYZ; |
an instruction with no operands has $\xx=\yy=\zz=0$. |
|
\def\0{\$Z\char'174Z} |
Most instructions have two forms, one in which the Z field stands for |
register \$Z, and one in which Z is an unsigned ``immediate'' constant. |
@^immediate operands@> |
Thus, for example, the command `\.{ADD} \.{\$X,\$Y,\$Z}' has a counterpart |
`\.{ADD} \.{\$X,\$Y,Z}', which sets $\rX=\rY+\zz$. Immediate constants |
are always nonnegative. |
In the descriptions |
below we will introduce such pairs of instructions |
by writing just `\.{ADD}~\.{\$X,\$Y,\0}' instead of naming both |
cases explicitly. |
|
The operation code for \.{ADD}~\.{\$X,\$Y,\$Z} is 32, but the operation |
code for \.{ADD}~\.{\$X,\$Y,Z} is~33. The \MMIX\ assembler chooses the correct |
code by noting whether the third argument is a register number or~not. |
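
As a small illustration of this encoding, here are the two tetrabytes
that would result for one particular choice of operands, written in
hexadecimal with one pair of digits for each of the bytes OP, X, Y,~Z.
(The register numbers are arbitrary; note that $32=\Hex{20}$ and
$33=\Hex{21}$.)
$$\vbox{\halign{\tt#\hfil\qquad&#\hfil\cr
ADD \$1,\$2,\$3&\Hex{20010203}\cr
ADD \$1,\$2,3&\Hex{21010203}\cr}}$$
Only the opcode byte differs; it alone tells the machine whether the
final byte means register~3 or the constant~3.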
|
Register numbers and constants can be given symbolic names; for example, the |
assembly language instruction `\.x~\.{IS}~\.{\$1}' makes \.x an |
abbreviation for register number~1. Similarly, `\.{FIVE}~\.{IS}~\.5' |
makes \.{FIVE} an abbreviation for the constant~5. |
After these abbreviations have been specified, the instruction |
\.{ADD}~\.{x,x,FIVE} increases \$1 by~5, using opcode~33, while |
the instruction \.{ADD}~\.{x,x,x} doubles \$1 using opcode~32. |
Symbolic names that stand for register numbers |
conventionally begin with a lowercase letter, while names that stand |
for constants conventionally begin with an uppercase letter. |
This convention is not actually enforced by the assembler, |
but it tends to reduce a programmer's confusion. |
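
To make this concrete, suppose that the two definitions above are in
force and that \$1 initially contains~7 (a value chosen only for
illustration). Then the two instructions would be assembled and executed
as follows:
$$\vbox{\halign{\tt#\hfil\qquad&#\hfil\qquad&#\hfil\cr
ADD x,x,FIVE&\Hex{21010105}&\$1 becomes $7+5=12$\cr
ADD x,x,x&\Hex{20010101}&\$1 becomes $12+12=24$\cr}}$$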
|
@ A {\it nybble\/} is a 4-bit quantity, often used to denote a decimal |
or hexadecimal digit. |
A {\it byte\/} is an 8-bit quantity, often used to denote an alphanumeric |
character in {\mc ASCII} code. The Unicode standard extends {\mc ASCII} to |
@^Unicode@> |
@^ASCII@> |
essentially all the world's languages by using 16-bit-wide characters called |
{\it wydes\/}. (Weight watchers know that two nybbles make one byte, |
but two bytes make one wyde.) |
In the discussion below we use the term |
{\it tetrabyte\/} or ``tetra'' for a 4-byte quantity, and the similar term |
@^nybble@> |
@^byte@> |
@^wyde@> |
@^tetrabyte@> |
@^octabyte@> |
{\it octabyte\/} or ``octa'' for an 8-byte quantity. Thus, a tetra is |
two wydes, an octa is two tetras; an octabyte has 64~bits. Each \MMIX\ |
register can be thought of as containing one octabyte, or two tetras, |
or four wydes, or eight bytes, or sixteen nybbles. |
|
When bytes, wydes, tetras, and octas represent numbers they are said to be |
either {\it signed\/} or {\it unsigned}. An unsigned byte is a number between |
0~and $2^8-1=255$ inclusive; an unsigned wyde lies, similarly, between |
0~and $2^{16}-1=65535$; an unsigned tetra lies between |
0~and $2^{32}-1=4{,}294{,}967{,}295$; an unsigned octa lies between |
0~and $2^{64}-1=18{,}446{,}744{,}073{,}709{,}551{,}615$. |
Their signed counterparts use the |
conventions of two's complement notation, by subtracting respectively $2^8$, |
$2^{16}$, $2^{32}$, or~$2^{64}$ times the most significant bit. Thus, |
the unsigned bytes 128 through 255 are regarded as the numbers $-128$ |
through~$-1$ when they are evaluated as signed bytes; a signed byte therefore |
lies between $-128$ and $+127$, inclusive. A signed wyde is a number |
between $-32768$ and $+32767$; a signed tetra lies between |
$-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$; a signed octa lies between |
$-9{,}223{,}372{,}036{,}854{,}775{,}808$ and $+9{,}223{,}372{,}036{,}854{,}775{,}807$. |
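
For example, the following bit patterns (chosen only for illustration)
are interpreted as shown under the two conventions:
$$\vbox{\halign{#\hfil\qquad&\hfil$#$\qquad&$#$\hfil\cr
\it pattern&\hbox{\it unsigned}&\hbox{\it signed}\cr
\noalign{\smallskip}
byte \Hex{7f}&127&127\cr
byte \Hex{ff}&255&255-2^8=-1\cr
wyde \Hex{8000}&32768&32768-2^{16}=-32768\cr
tetra \Hex{ffffffff}&4{,}294{,}967{,}295&4{,}294{,}967{,}295-2^{32}=-1\cr}}$$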
|
The virtual memory of\/ \MMIX\ is an array M of $2^{64}$ bytes. If $k$ is any |
unsigned octabyte, M[$k$]~is a 1-byte quantity. \MMIX\ machines do not |
actually have such vast memories, but programmers can act as if $2^{64}$ bytes |
are indeed present, because \MMIX\ provides address translation mechanisms by |
which an operating system can maintain this illusion. |
|
We use the notation $\mm_{2^t}[k]$ to stand for a number consisting of |
$2^t$~consecutive bytes starting at location~$k\land\nobreak(2^{64}-2^t)$. |
(The notation $k\land(2^{64}-2^t)$ means that the least |
significant $t$ bits of~$k$ are set to~0, and only the least significant 64~bits
of the resulting address are retained. Similarly, the notation |
$k\lor(2^t-1)$ means that the least significant $t$ bits of~$k$ are set to~1.) |
All accesses to $2^t$-byte quantities by \MMIX\ are {\it aligned}, in the sense |
that the address of the first byte is a multiple of~$2^t$.
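
For example, suppose $k=\Hex{1005}$, an address chosen purely for
illustration. Clearing or setting the least significant $t$~bits gives
$$\vbox{\halign{$#$\hfil\qquad&$#$\hfil\qquad&$#$\hfil\cr
t=1:&k\land(2^{64}-2)=\Hex{1004},&k\lor1=\Hex{1005};\cr
t=2:&k\land(2^{64}-4)=\Hex{1004},&k\lor3=\Hex{1007};\cr
t=3:&k\land(2^{64}-8)=\Hex{1000},&k\lor7=\Hex{1007}.\cr}}$$
Thus $\mm_2[k]$ refers to the bytes at \Hex{1004} and \Hex{1005},
$\mm_4[k]$ to the four bytes from \Hex{1004} through \Hex{1007}, and
$\mm_8[k]$ to the eight bytes from \Hex{1000} through \Hex{1007}.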
|
Addressing is always ``big-endian.'' In other words, the |
@^big-endian versus little-endian@> |
@^little-endian versus big-endian@> |
most significant (leftmost) byte of $\mm_{2^t}[k]$ is |
$\mm_1[k\land\nobreak(2^{64}-2^t)]$ |
and the least significant (rightmost) byte is $\mm_1[k\lor(2^t-1)]$. |
We use the notation $\s(\mm_{2^t}[k])$ when we want to regard |
this $2^t$-byte number as a {\it signed\/} integer. |
Formally speaking, if $l=2^t$, |
@^signed integers@> |
$$\s(\mm_l[k])=\bigl(\mm_1[k\land(-l)]\,\mm_1[k\land(-l)+1]\,\ldots\, |
\mm_1[k\lor(l-1)]\bigr)_{256} |
-2^{8l}[\mm_1[k\land(-l)]\!\ge\!128].$$ |
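
As a check on this formula, suppose (purely for illustration) that
$\mm_1[\Hex{1000}]=\Hex{ff}$ and $\mm_1[\Hex{1001}]=\Hex{fe}$. Then
the case $l=2$ gives
$$\eqalign{\mm_2[\Hex{1000}]&=(255\;254)_{256}=255\cdot256+254=65534=\Hex{fffe},\cr
\s(\mm_2[\Hex{1000}])&=65534-2^{16}\,[255\ge128]=65534-65536=-2.\cr}$$
The same wyde is obtained from $\mm_2[\Hex{1001}]$, because the least
significant bit of the address is ignored.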
|
@* Loading and storing. |
Several instructions can be used to get information from memory into |
registers. For example, the ``load tetra unsigned'' instruction |
\.{LDTU} \.{\$1,\$4,\$5} |
puts the four bytes $\mm_4[\$4+\$5]$ into register~1 as an unsigned |
integer; |
the most significant four bytes of register~1 are set to zero. |
The similar instruction \.{LDT} \.{\$1,\$4,\$5}, ``load tetra,'' sets |
\$1 to the {\it signed\/} integer $\s(\mm_4[\$4+\$5])$. |
(Instructions generally treat numbers as |
@^signed integers@> |
signed unless the operation code specifically calls them |
unsigned.) In the signed case, the most significant four bytes of the |
register will be copies of the most significant bit of the tetrabyte |
loaded; thus they will be all~0s or all~1s, depending on whether the |
number is $\ge0$ or $<0$. |
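
For example, if the tetrabyte $\mm_4[\$4+\$5]$ happens to be
\Hex{80000001} (a value chosen only for illustration), the two
instructions leave different octabytes in register~1:
$$\vbox{\halign{\tt#\hfil\qquad&#\hfil\qquad&$#$\hfil\cr
LDTU \$1,\$4,\$5&\Hex{0000000080000001}&2{,}147{,}483{,}649\cr
LDT \$1,\$4,\$5&\Hex{ffffffff80000001}&2{,}147{,}483{,}649-2^{32}=-2{,}147{,}483{,}647\cr}}$$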
|
\def\bull{\smallbreak\textindent{$\bullet$}} |
\def\bul{\par\textindent{$\bullet$}} |
\def\<#1 #2 {\.{#1}~\.{#2} } |
\def\>{\hfill\break} |
|
\bull\<LDB \$X,\$Y,\0 `load byte'.\> |
@.LDB@> |
Byte $\s(\mm[\rY+\rZ])$ or $\s(\mm[\rY+\zz])$ is loaded into register~X as a |
signed number between $-128$ and $+127$, inclusive. |
|
\bull\<LDBU \$X,\$Y,\0 `load byte unsigned'.\>
@.LDBU@> |
Byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$ is loaded into register~X as an |
unsigned number between $0$ and $255$, inclusive. |
|
\bull\<LDW \$X,\$Y,\0 `load wyde'.\> |
@.LDW@> |
Bytes $\s(\mm_2[\rY+\rZ])$ or $\s(\mm_2[\rY+\zz])$ |
are loaded into register~X as a signed number between $-32768$ and $+32767$, |
inclusive. As mentioned above, our notation $\mm_2[k]$ implies that |
the least significant bit of the address $\rY+\rZ$ or $\rY+\zz$ is |
ignored and assumed to be~0. |
|
\bull\<LDWU \$X,\$Y,\0 `load wyde unsigned'.\>
@.LDWU@> |
Bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$ are loaded |
into register~X as an unsigned number between $0$ and $65535$, inclusive. |
|
\bull\<LDT \$X,\$Y,\0 `load tetra'.\> |
@.LDT@> |
Bytes $\s(\mm_4[\rY+\rZ])$ or $\s(\mm_4[\rY+\zz])$ |
are loaded into register~X as a signed number between $-2{,}147{,}483{,}648$ and |
$+2{,}147{,}483{,}647$, inclusive. |
As mentioned above, our notation $\mm_4[k]$ implies that |
the two least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are |
ignored and assumed to be~0. |
|
\bull\<LDTU \$X,\$Y,\0 `load tetra unsigned'.\> |
@.LDTU@> |
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$ |
are loaded into register~X as an unsigned number between 0 and |
4{,}294{,}967{,}295, inclusive.
|
\bull\<LDO \$X,\$Y,\0 `load octa'.\> |
@.LDO@> |
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into |
register~X\null. |
As mentioned above, our notation $\mm_8[k]$ implies that |
the three least significant bits of the address $\rY+\rZ$ or $\rY+\zz$ are |
ignored and assumed to be~0. |
|
\bull\<LDOU \$X,\$Y,\0 `load octa unsigned'.\> |
@.LDOU@> |
Bytes $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ are loaded into |
register~X\null. There is in fact no difference between the behavior of |
\.{LDOU} and~\.{LDO}, since |
an octabyte can be regarded as either signed or unsigned. \.{LDOU} is included |
in \MMIX\ just for completeness and consistency, in spite of the fact that |
a foolish consistency is the hobgoblin of little minds. |
@^Emerson, Ralph Waldo@> |
(Niklaus Wirth made a strong plea for such consistency in his early critique |
of System/360; see {\sl JACM\/ \bf15} (1968), 37--74.)
@^Wirth, Niklaus Emil@> |
@^System/360@> |
|
\bull\<LDHT \$X,\$Y,\0 `load high tetra'.\> |
@.LDHT@> |
Bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$ are loaded into the most |
significant half of register~X, and the least significant half is |
cleared to zero. (One use of ``high tetra arithmetic'' is to detect |
overflow easily when tetrabytes are added or subtracted.) |
|
\bull\<LDA \$X,\$Y,\0 `load address'.\> |
The address $\rY+\rZ$ or $\rY+\zz$ is loaded into register~X. This |
instruction is simply another name for the \.{ADDU} instruction |
discussed below; it can |
be used when the programmer is thinking of memory addresses |
instead of numbers. |
The \MMIX\ assembler converts \.{LDA} into the same OP-code as \.{ADDU}. |
@.LDA@> |
@.ADDU@> |
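
The following table illustrates several of these rules. It assumes,
purely for the sake of example, that $\$2=1000$ and $\$3=5$, and that the
octabyte $\mm_8[1000]$ is \Hex{0123456789abcdef}; thus the byte at
address 1005 is \Hex{ab}.
$$\vbox{\halign{\tt#\hfil\qquad&#\hfil\qquad&#\hfil\cr
\it instruction&\it bytes fetched&\it new contents of register 1\cr
\noalign{\smallskip}
LDB \$1,\$2,\$3&$\mm[1005]$&\Hex{ffffffffffffffab}\cr
LDBU \$1,\$2,\$3&$\mm[1005]$&\Hex{00000000000000ab}\cr
LDW \$1,\$2,\$3&$\mm_2[1004]$&\Hex{ffffffffffff89ab}\cr
LDWU \$1,\$2,\$3&$\mm_2[1004]$&\Hex{00000000000089ab}\cr
LDO \$1,\$2,\$3&$\mm_8[1000]$&\Hex{0123456789abcdef}\cr
LDHT \$1,\$2,\$3&$\mm_4[1004]$&\Hex{89abcdef00000000}\cr
LDA \$1,\$2,\$3&(none)&1005\cr}}$$
Notice that the wyde, tetra, and octa loads silently round the
address 1005 down to 1004, 1004, and 1000, respectively.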
|
@ Another family of instructions goes the other way, storing registers into |
memory. For example, the ``store octa immediate'' command |
\<STO \$3,\$2,17 puts the current contents of register~3 |
into $\mm_8[\$2+17]$. |
|
\bull\<STB \$X,\$Y,\0 `store byte'.\> |
@.STB@> |
The least significant byte of register~X is stored into |
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. An integer overflow exception occurs if |
@.overflow@> |
\$X is not between $-128$ and $+127$. (We will discuss overflow and other |
kinds of exceptions later.) |
|
\bull\<STBU \$X,\$Y,\0 `store byte unsigned'.\>
@.STBU@> |
The least significant byte of register~X is stored into |
byte $\mm[\rY+\rZ]$ or $\mm[\rY+\zz]$. \.{STBU} instructions are the same |
as \.{STB} instructions, except that no test for overflow is made. |
|
\bull\<STW \$X,\$Y,\0 `store wyde'.\> |
@.STW@> |
The two least significant bytes of register~X are stored into |
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$. |
An integer overflow exception occurs if |
\$X is not between $-32768$ and $+32767$. |
|
\bull\<STWU \$X,\$Y,\0 `store wyde unsigned'.\>
@.STWU@> |
The two least significant bytes of register~X are stored into |
bytes $\mm_2[\rY+\rZ]$ or $\mm_2[\rY+\zz]$. |
\.{STWU} instructions are the same |
as \.{STW} instructions, except that no test for overflow is made. |
|
\bull\<STT \$X,\$Y,\0 `store tetra'.\> |
@.STT@> |
The four least significant bytes of register~X are stored into |
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$. |
An integer overflow exception occurs if |
\$X is not between $-2{,}147{,}483{,}648$ and $+2{,}147{,}483{,}647$. |
|
\bull\<STTU \$X,\$Y,\0 `store tetra unsigned'.\> |
@.STTU@> |
The four least significant bytes of register~X are stored into |
bytes $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$. |
\.{STTU} instructions are the same |
as \.{STT} instructions, except that no test for overflow is made. |
|
\bull\<STO \$X,\$Y,\0 `store octa'.\> |
@.STO@> |
Register X is stored into bytes $\mm_8[\rY+\rZ]$ or |
$\mm_8[\rY+\zz]$. |
|
\bull\<STOU \$X,\$Y,\0 `store octa unsigned'.\> |
@.STOU@> |
Identical to \.{STO} \.{\$X,\$Y,\0}. |
|
\bull\<STCO X,\$Y,\0 `store constant octabyte'.\> |
@.STCO@> |
An octabyte whose value is the unsigned byte X is stored into |
$\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$. |
|
\bull\<STHT \$X,\$Y,\0 `store high tetra'.\> |
The most significant four bytes of register~X are stored into |
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$. |
@.STHT@> |
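
Here are a few sample cases of these rules, assuming for illustration
that $\$2=1000$ and that register~1 has the contents shown:
$$\vbox{\halign{\tt#\hfil\quad&#\hfil\quad&#\hfil\quad&#\hfil\cr
\it instruction&\it register 1&\it effect&\it overflow\cr
\noalign{\smallskip}
STB \$1,\$2,0&200&$\mm[1000]\gets\Hex{c8}$&yes\cr
STBU \$1,\$2,0&200&$\mm[1000]\gets\Hex{c8}$&no test\cr
STB \$1,\$2,0&$-1$&$\mm[1000]\gets\Hex{ff}$&no\cr
STW \$1,\$2,0&$-1$&$\mm_2[1000]\gets\Hex{ffff}$&no\cr
STHT \$1,\$2,0&\Hex{0123456789abcdef}&$\mm_4[1000]\gets\Hex{01234567}$&no test\cr
STCO 255,\$2,0&(not used)&$\mm_8[1000]\gets\Hex{00000000000000ff}$&no test\cr}}$$
The first line overflows because 200 exceeds $+127$; the third does not,
because $-1$ lies between $-128$ and $+127$.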
|
@* Adding and subtracting. |
Once numbers are in registers, we can compute with them. Let's consider |
addition and subtraction first. |
|
\bull\<ADD \$X,\$Y,\0 `add'.\> |
@.ADD@> |
The sum $\rY+\rZ$ or $\rY+\zz$ is placed into register~X using signed, |
two's complement arithmetic. |
An integer overflow exception occurs if the sum is $\ge2^{63}$ or $<-2^{63}$. |
(We will discuss overflow and other kinds of exceptions later.) |
@.overflow@> |
|
\bull\<ADDU \$X,\$Y,\0 `add unsigned'.\> |
@.ADDU@> |
The sum $(\rY+\rZ)\bmod2^{64}$ or $(\rY+\zz)\bmod2^{64}$ |
is placed into register~X\null. |
These instructions are the same |
as \.{ADD}~\.{\$X,\$Y,\0} commands |
except that no test for overflow is made. |
(Overflow could be detected if desired by using the command |
\<CMPU ovflo,\$X,\$Y after addition, where \.{CMPU} means |
@.CMPU@> |
``compare unsigned''; see below.) |
|
\bull\<2ADDU \$X,\$Y,\0 `times 2 and add unsigned'.\> |
@.2ADDU@> |
The sum $(2\rY+\rZ)\bmod2^{64}$ or $(2\rY+\zz)\bmod2^{64}$ |
is placed into register~X\null. |
|
\bull\<4ADDU \$X,\$Y,\0 `times 4 and add unsigned'.\> |
@.4ADDU@> |
The sum $(4\rY+\rZ)\bmod2^{64}$ or $(4\rY+\zz)\bmod2^{64}$ |
is placed into register~X\null. |
|
\bull\<8ADDU \$X,\$Y,\0 `times 8 and add unsigned'.\> |
@.8ADDU@> |
The sum $(8\rY+\rZ)\bmod2^{64}$ or $(8\rY+\zz)\bmod2^{64}$ |
is placed into register~X\null. |
|
\bull\<16ADDU \$X,\$Y,\0 `times 16 and add unsigned'.\> |
@.16ADDU@> |
The sum $(16\rY+\rZ)\bmod2^{64}$ or $(16\rY+\zz)\bmod2^{64}$ |
is placed into register~X\null. |
|
\bull\<SUB \$X,\$Y,\0 `subtract'.\> |
@.SUB@> |
The difference $\rY-\rZ$ or $\rY-\zz$ is placed into register~X using |
signed, two's complement arithmetic. |
An integer overflow exception occurs if the difference is $\ge2^{63}$ or |
$<-2^{63}$. |
|
\bull\<SUBU \$X,\$Y,\0 `subtract unsigned'.\> |
@.SUBU@> |
The difference $(\rY-\rZ)\bmod2^{64}$ or $(\rY-\zz)\bmod2^{64}$ |
is placed into register~X\null. |
These two instructions are the same |
as \.{SUB}~\.{\$X,\$Y,\0} except that no test for overflow is made. |
|
\bull\<NEG \$X,Y,\0 `negate'.\> |
@.NEG@> |
The value $\yy-\rZ$ or $\yy-\zz$ is placed into register~X using |
signed, two's complement arithmetic. |
An integer overflow exception occurs if the result is greater |
than~$2^{63}-\nobreak1$. |
(Notice that in this case \MMIX\ works with the ``immediate'' constant~Y, |
not register~Y\null. \.{NEG} commands are analogous to the immediate variants |
of other commands, because they save us from having to put one-byte |
constants into a register. When $\yy=0$, overflow occurs if and |
only if $\rZ=-2^{63}$. The instruction \<NEG \$X,1,2 has exactly the |
same effect as \.{NEG}~\.{\$X,0,1}.) |
|
\bull\<NEGU \$X,Y,\0 `negate unsigned'.\> |
@.NEGU@> |
The value $(\yy-\rZ)\bmod2^{64}$ or $(\yy-\zz)\bmod2^{64}$ |
is placed into register~X\null. |
\.{NEGU} instructions are the same |
as \.{NEG} instructions, except that no test for overflow is made. |
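
A few numerical cases may help to fix these definitions; the operand
values are chosen only for illustration. If $\rY=2^{63}-1$ and $\zz=1$,
the true sum is $2^{63}$, so \.{ADD}~\.{\$X,\$Y,1} signals integer
overflow, while \.{ADDU}~\.{\$X,\$Y,1} simply produces the octabyte
\Hex{8000000000000000}. If $\rY=2^{64}-1$ and $\zz=2$, regarded as
unsigned numbers, then \.{ADDU} sets $\rX=(2^{64}+1)\bmod2^{64}=1$.
This result is less than~\$Y; and in general the unsigned sum
$(\rY+\rZ)\bmod2^{64}$ is less than~\$Y exactly when a carry out of the
leftmost bit has been lost, which is the condition that an unsigned
comparison of \$X with~\$Y can detect.

The scaled additions are convenient for indexing: If $\rY=3$ and
$\rZ=\Hex{1000}$, then \.{8ADDU}~\.{\$X,\$Y,\$Z} yields
$8\cdot3+\Hex{1000}=\Hex{1018}$, which would be the address of octabyte
number~3 in an array of octabytes that begins at \Hex{1000}.
Finally, \.{NEG}~\.{\$X,0,5} sets $\rX=0-5=-5$, namely the octabyte
\Hex{fffffffffffffffb}; \.{NEGU}~\.{\$X,0,5} produces the same octabyte,
regarded as the unsigned number $2^{64}-5$.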
|
@* Bit fiddling. |
Before looking at multiplication and division, which take longer than |
addition and subtraction, let's look at some of the other things that |
\MMIX\ can do fast. There are eighteen instructions for bitwise |
logical operations on unsigned numbers. |
|
\bull\<AND \$X,\$Y,\0 `bitwise and'.\> |
@.AND@> |
Each bit of register Y is logically anded with the corresponding bit of |
register~Z or of the constant~Z, and the result is placed in register~X\null. |
In other words, a bit of register~X is set to~1 if and only if the |
corresponding bits of the operands are both~1; |
in symbols, $\rX=\rY\land\rZ$ or $\rX=\rY\land\zz$. |
This means in particular that \<AND \$X,\$Y,Z always zeroes out the seven |
most significant bytes of register~X, because 0s are prefixed to the |
constant byte~Z\null. |
|
\bull\<OR \$X,\$Y,\0 `bitwise or'.\> |
@.OR@> |
Each bit of register Y is logically ored with the corresponding bit of |
register~Z or of the constant~Z, and the result is placed in register~X\null. |
In other words, a bit of register~X is set to~0 if and only if the |
corresponding bits of the operands are both~0; |
in symbols, $\rX=\rY\lor\rZ$ or $\rX=\rY\lor\zz$. |
|
In the special case $\zz=0$, the immediate variant of |
this command simply copies register~Y to |
register~X\null. The \MMIX\ assembler allows us to write |
`\.{SET}~\.{\$X,\$Y}' as a convenient abbreviation for |
`\.{OR}~\.{\$X,\$Y,0}'. |
@.SET@> |
|
\bull\<XOR \$X,\$Y,\0 `bitwise exclusive-or'.\> |
@.XOR@> |
Each bit of register Y is logically xored with the corresponding bit of |
register~Z or of the constant~Z, and the result is placed in register~X\null. |
In other words, a bit of register~X is set to~0 if and only if the |
corresponding bits of the operands are equal; |
in symbols, $\rX=\rY\oplus\rZ$ or $\rX=\rY\oplus\zz$. |
|
\bull\<ANDN \$X,\$Y,\0 `bitwise and-not'.\> |
@.ANDN@> |
Each bit of register Y is logically anded with the complement of the |
corresponding bit of |
register~Z or of the constant~Z, and the result is placed in register~X\null. |
In other words, a bit of register~X is set to~1 if and only if the |
corresponding bit of register~Y is~1 and the other corresponding bit is~0; |
in symbols, $\rX=\rY\setminus\rZ$ or $\rX=\rY\setminus\zz$. |
(This is the {\it logical difference\/} operation; if the operands
are bit strings representing sets, we are computing the elements that
lie in the first set but not in the second.)
|
\bull\<ORN \$X,\$Y,\0 `bitwise or-not'.\> |
@.ORN@> |
Each bit of register Y is logically ored with the complement of the |
corresponding bit of |
register~Z or of the constant~Z, and the result is placed in register~X\null. |
In other words, a bit of register~X is set to~1 if and only if the |
corresponding bit of register~Y is greater than or equal to the other corresponding bit; |
in symbols, $\rX=\rY\lor\overline\rZ$ |
or $\rX=\rY\lor\overline\zz$. |
(This is the complement of $\rZ\setminus\rY$ or $\zz\setminus\rY$.) |
|
\bull\<NAND \$X,\$Y,\0 `bitwise not-and'.\> |
@.NAND@> |
Each bit of register Y is logically anded with the corresponding bit of |
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null. |
In other words, a bit of register~X is set to~0 if and only if the |
corresponding bits of the operands are both~1; |
in symbols, $\rX=\rY\mathbin{\overline\land}\rZ$ or |
$\rX=\rY\mathbin{\overline\land}\zz$. |
|
\bull\<NOR \$X,\$Y,\0 `bitwise not-or'.\> |
@.NOR@> |
Each bit of register Y is logically ored with the corresponding bit of |
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null. |
In other words, a bit of register~X is set to~1 if and only if the |
corresponding bits of the operands are both~0; |
in symbols, $\rX=\rY\mathbin{\overline\lor}\rZ$ or |
$\rX=\rY\mathbin{\overline\lor}\zz$. |
|
\bull\<NXOR \$X,\$Y,\0 `bitwise not-exclusive-or'.\> |
@.NXOR@>
Each bit of register Y is logically xored with the corresponding bit of |
register~Z or of the constant~Z, and the complement of the result is placed in register~X\null. |
In other words, a bit of register~X is set to~1 if and only if the |
corresponding bits of the operands are equal; |
in symbols, $\rX=\rY\mathbin{\overline\oplus}\rZ$ or |
$\rX=\rY\mathbin{\overline\oplus}\zz$. |
|
\bull\<MUX \$X,\$Y,\0 `bitwise multiplex'.\> |
@.MUX@> |
For each bit position~$j$, the $j$th bit of register~X is set either to |
bit~$j$ of register~Y |
or to bit~$j$ of the other operand \$Z~or~Z, depending on |
whether bit~$j$ of the special {\it mask register\/}~rM is 1 or 0: |
@^rM@> |
if ${\rm M}_j$ then $\yy_j$ else~$\zz_j$. |
In symbols, $\rm\rX=(\rY\land rM)\lor(\rZ\land\overline{rM})$ or |
$\rm\rX=(\rY\land rM)\lor(\zz\land\overline{rM})$. |
(\MMIX\ has several such special registers, associated with instructions that |
need more than two inputs or produce more than one output.) |
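
The following table may help to keep the nine operations straight. It
assumes, purely for illustration, that register~Y contains the number
\Hex{0c}, that the other operand is \Hex{0a}, and that rM contains
\Hex{03}; all three octabytes have zeros in their seven leading bytes.
$$\vbox{\halign{\tt#\hfil\qquad&#\hfil\cr
\it operation&\it result in register X\cr
\noalign{\smallskip}
AND&\Hex{0000000000000008}\cr
OR&\Hex{000000000000000e}\cr
XOR&\Hex{0000000000000006}\cr
ANDN&\Hex{0000000000000004}\cr
ORN&\Hex{fffffffffffffffd}\cr
NAND&\Hex{fffffffffffffff7}\cr
NOR&\Hex{fffffffffffffff1}\cr
NXOR&\Hex{fffffffffffffff9}\cr
MUX&\Hex{0000000000000008}\cr}}$$
The four operations \.{ORN}, \.{NAND}, \.{NOR}, and \.{NXOR} fill the
leading bytes with 1s here, because each of them complements leading 0s.
In the \.{MUX} case the two rightmost bits come from register~Y and all
other bits come from the other operand.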
|
@ Besides the eighteen bitwise operations, \MMIX\ can also perform unsigned |
bytewise and biggerwise operations that are somewhat more exotic. |
|
\bull\<BDIF \$X,\$Y,\0 `byte difference'.\> |
@.BDIF@> |
For each byte position~$j$, the $j$th byte of register~X is set to byte~$j$ of |
register~Y minus byte~$j$ of the other operand \$Z~or~Z, unless that |
difference is negative; in the latter case, byte~$j$ of~\$X is set to zero. |
|
\bull\<WDIF \$X,\$Y,\0 `wyde difference'.\> |
@.WDIF@> |
For each wyde position~$j$, the $j$th wyde of register~X is set to wyde~$j$ of |
register~Y minus wyde~$j$ of the other operand \$Z~or~Z, unless that |
difference is negative; in the latter case, wyde~$j$ of~\$X is set to zero. |
|
\bull\<TDIF \$X,\$Y,\0 `tetra difference'.\> |
@.TDIF@> |
For each tetra position~$j$, the $j$th tetra of register~X is set to tetra~$j$ of |
register~Y minus tetra~$j$ of the other operand \$Z~or~Z, unless that |
difference is negative; in the latter case, tetra~$j$ of~\$X is set to zero. |
|
\bull\<ODIF \$X,\$Y,\0 `octa difference'.\> |
@.ODIF@> |
Register~X is set to register~Y minus the other operand \$Z~or~Z, unless |
\$Z~or~Z exceeds register~Y; in the latter case, |
\$X~is set to zero. The operands are treated as unsigned integers. |
|
\smallskip |
The \.{BDIF} and \.{WDIF} commands are useful |
in applications to graphics or video; \.{TDIF} and \.{ODIF} are also |
present for reasons of consistency. For example, if \.a and \.b are |
registers containing |
8-byte quantities, their bytewise maxima~\.c and |
bytewise minima~\.d are computed by |
$$\hbox{\tt BDIF x,a,b; ADDU c,x,b; SUBU d,a,x;}$$ |
similarly, the individual ``pixel differences'' \.e, namely the absolute |
values of the differences of corresponding bytes, are computed by |
$$\hbox{\tt BDIF x,a,b; BDIF y,b,a; OR e,x,y.}$$ |
To add individual |
bytes of \.a and \.b while clipping all sums to 255 if they don't fit |
in a single byte, one can say |
$$\hbox{\tt NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;}$$ |
in other words, complement \.a, apply \.{BDIF}, and complement the result. |
The operations can also be used to construct efficient operations on |
strings of bytes or wydes. |
@^graphics@> |
@^pixels@> |
@^saturating arithmetic@> |
@^nybble@> |
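
To see how these sequences work, consider just two of the eight byte
positions, and suppose (purely for illustration) that \.a contains the
bytes 200 and~50 in those positions while \.b contains 100 and~80.
Then the corresponding bytes of the other registers are
$$\vbox{\halign{\tt#\hfil\qquad&\hfil#\qquad&\hfil#\cr
\it after the instruction&\it first byte&\it second byte\cr
\noalign{\smallskip}
BDIF x,a,b&100&0\cr
ADDU c,x,b&200&80\cr
SUBU d,a,x&100&50\cr
BDIF y,b,a&0&30\cr
OR e,x,y&100&30\cr
\noalign{\smallskip}
NOR acomp,a,0&55&205\cr
BDIF x,acomp,b&0&125\cr
NOR clippedsums,x,0&255&130\cr}}$$
The true sums are 300 and 130; the first has been clipped to 255.
No carries or borrows propagate between bytes in the \.{ADDU} and
\.{SUBU} steps, because each byte of \.x is at most the corresponding
byte of~\.a, and each byte of \.x plus the corresponding byte of~\.b is
at most 255.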
|
Exercise: Implement a ``nybble difference'' instruction that operates |
in a similar way on sixteen nybbles at a time. |
|
Answer: {\tt\spaceskip=.5em minus .3em |
AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m; |
BDIF x,x,y; BDIF xx,xx,yy; OR ans,x,xx} where register \.m contains |
the mask \Hex{0f0f0f0f0f0f0f0f}. |
|
(The \.{ANDN} operation can be regarded as |
a ``bit difference'' instruction that operates |
in a similar way on 64 bits at a time.) |
|
@ Three more pairs of bit-fiddling instructions round out the collection of exotics. |
|
\bull\<SADD \$X,\$Y,\0 `sideways add'.\> |
@.SADD@> |
Each bit of register Y is logically anded with the complement of the |
corresponding bit of |
register~Z or of the constant~Z, and the number of 1~bits in the |
result is placed in register~X\null. |
In other words, register~X is set to the number of bit positions |
in which register~Y has a~1 and the other operand has a~0; |
in symbols, $\rX=\nu(\rY\setminus\rZ)$ or $\rX=\nu(\rY\setminus\zz)$. |
When the second operand is zero this operation is sometimes called |
``population counting,'' because it counts the number of 1s in register~Y\null. |
@^population counting@> |
@^counting ones@> |
|
\bull\<MOR \$X,\$Y,\0 `multiple or'.\> |
@.MOR@> |
Suppose the 64 bits of register Y are indexed as |
$$y_{00}y_{01}\ldots y_{07}y_{10}y_{11}\ldots y_{17}\ldots |
y_{70}y_{71}\ldots y_{77};$$ |
in other words, $y_{ij}$ is the $j$th bit of the $i$th byte, if we |
number the bits and bytes from 0 to 7 in big-endian fashion from left to right. |
Let the bits of the other operand, \$Z or~Z, be indexed similarly: |
$$z_{00}z_{01}\ldots z_{07}z_{10}z_{11}\ldots z_{17}\ldots |
z_{70}z_{71}\ldots z_{77}.$$ |
The \.{MOR} operation replaces each bit $x_{ij}$ of register~X by the bit |
$$ y_{0j}z_{i0}\lor y_{1j}z_{i1}\lor \cdots \lor y_{7j}z_{i7}.$$ |
Thus, for example, if register Z contains the constant |
\Hex{0102040810204080}, |
\.{MOR} reverses the order of the bytes in register~Y, converting between |
little-endian and big-endian addressing. |
@^big-endian versus little-endian@> |
@^little-endian versus big-endian@> |
(The $i$th byte of~\$X depends on the bytes of~\$Y as specified by the |
$i$th byte of~\$Z or~Z\null. If we regard |
64-bit words as $8\times8$ Boolean matrices, with one byte per column, |
this operation computes the |
Boolean product $\rX=\rY\,\rZ$ or $\rX=\rY\,\zz$. Alternatively, if we |
regard 64-bit words as $8\times8$ matrices with one byte per~{\it row}, |
\.{MOR} computes the Boolean product $\rX=\rZ\,\rY$ or $\rX=\zz\,\rY$ |
with operands in the opposite order. The immediate form |
\<MOR \$X,\$Y,Z always sets the leading seven bytes of register~X |
to zero; the other byte is set to the bitwise or of whatever bytes of |
register~Y are specified by the immediate operand~Z\null.) |
|
Exercise: Explain how to compute a mask \.m that is \Hex{ff} in byte |
positions where \.a exceeds \.b, \Hex{00} in all other bytes. |
Answer: \.{BDIF}~\.{x,a,b;} \.{MOR}~\.{m,minusone,x;} |
here \.{minusone} is a register consisting of all 1s. (Moreover, |
if we \.{AND} this result |
with \Hex{8040201008040201}, then \.{MOR} with $\zz=255$, we get |
a one-byte encoding~of~\.m.) |
|
\bull\<MXOR \$X,\$Y,\0 `multiple exclusive-or'.\> |
@.MXOR@> |
This operation is like the Boolean multiplication just discussed, but |
exclusive-or is used to combine the bits. Thus we obtain a matrix |
product over the field of two elements instead of a Boolean matrix product. |
This operation can be used to construct hash functions, among many other things. |
(The hash functions aren't bad, but they are not ``universal'' in the |
sense of exercise 6.4--72.) |
@^matrices of bits@> |
@^Boolean multiplication@> |
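
A small Python sketch may make the bit indexing of \.{MOR} and \.{MXOR} concrete. It is only an illustration under stated assumptions: the helper names \.{byte}, \.{mor}, and \.{mxor} are hypothetical, and a register is modeled as a nonnegative Python integer below $2^{64}$.

```python
def byte(w, i):
    """Return byte i of the 64-bit word w, numbering bytes 0..7 from the left."""
    return (w >> (8 * (7 - i))) & 0xFF

def _combine(y, z, use_xor):
    # Hypothetical helper shared by mor and mxor.
    x = 0
    for i in range(8):
        zi = byte(z, i)               # byte i of Z selects bytes of Y
        acc = 0
        for k in range(8):
            if zi & (0x80 >> k):      # z_{ik}: bit k of byte i, counted from the left
                acc = (acc ^ byte(y, k)) if use_xor else (acc | byte(y, k))
        x = (x << 8) | acc            # acc becomes byte i of X
    return x

def mor(y, z):
    """Model of MOR: combine the selected bytes with bitwise or."""
    return _combine(y, z, use_xor=False)

def mxor(y, z):
    """Model of MXOR: combine the selected bytes with exclusive-or."""
    return _combine(y, z, use_xor=True)
```

Under this model, \.{mor} applied to \Hex{0123456789abcdef} and \Hex{0102040810204080} yields \Hex{efcdab8967452301}, the byte reversal mentioned above, and an immediate operand of 255 leaves the bitwise or of all eight bytes in the rightmost byte.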
|
@ Sixteen ``immediate wyde'' instructions are available for the common |
case that a 16-bit constant is needed. In this case the Y~and~Z fields |
of the instruction are regarded as a single 16-bit unsigned number~YZ\null. |
@^immediate operands@> |
|
\bull\<SETH \$X,YZ `set to high wyde'; |
@.SETH@> |
\<SETMH \$X,YZ `set to medium high wyde'; |
@.SETMH@> |
\<SETML \$X,YZ `set to medium low wyde'; |
@.SETML@> |
\<SETL \$X,YZ `set to low wyde'.\> |
@.SETL@> |
The 16-bit unsigned number YZ is shifted left |
by either 48 or 32 or 16 or 0 bits, respectively, and placed into register~X\null. |
Thus, for example, \.{SETML} inserts |
a given value into the second-least-significant wyde of register~X and |
sets the other three wydes to zero. |
|
\bull\<INCH \$X,YZ `increase by high wyde'; |
@.INCH@> |
\<INCMH \$X,YZ `increase by medium high wyde'; |
@.INCMH@> |
\<INCML \$X,YZ `increase by medium low wyde'; |
@.INCML@> |
\<INCL \$X,YZ `increase by low wyde'.\> |
@.INCL@> |
The 16-bit unsigned number YZ is shifted left |
by either 48 or 32 or 16 or 0 bits, respectively, and added to register~X, |
ignoring overflow; the result is placed back into register~X\null. |
|
If YZ is the hexadecimal constant \Hex{8000}, the command \<INCH \$X,YZ |
complements the most significant bit of register~X\null. We will see |
below that this can be used to negate a floating point number. |
@^negation, floating point@> |
|
\bull\<ORH \$X,YZ `bitwise or with high wyde'; |
@.ORH@> |
\<ORMH \$X,YZ `bitwise or with medium high wyde'; |
@.ORMH@> |
\<ORML \$X,YZ `bitwise or with medium low wyde'; |
@.ORML@> |
\<ORL \$X,YZ `bitwise or with low wyde'.\> |
@.ORL@> |
The 16-bit unsigned number YZ is shifted left |
by either 48 or 32 or 16 or 0 bits, respectively, and ored with register~X; |
the result is placed back into register~X\null. |
|
Notice that any desired 4-wyde constant \.{GH} \.{IJ} \.{KL} \.{MN} |
can be inserted into a register with a sequence of four instructions |
such as |
$$\hbox{\tt SETH \$X,GH; INCMH \$X,IJ; INCML \$X,KL; INCL \$X,MN;}$$ |
any of these \.{INC} instructions could also be replaced by \.{OR}. |
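
The following Python sketch models the \.{SET}, \.{INC}, and \.{OR} wyde instructions and rebuilds a 4-wyde constant in the way just described. The names \.{wyde}, \.{setw}, \.{incw}, and \.{orw} are hypothetical, chosen only for this illustration; registers are modeled as integers below $2^{64}$.

```python
MASK64 = (1 << 64) - 1
SHIFT = {"H": 48, "MH": 32, "ML": 16, "L": 0}   # high, medium high, medium low, low

def wyde(yz, pos):
    """Shift the 16-bit unsigned value yz into wyde position pos."""
    if not 0 <= yz <= 0xFFFF:
        raise ValueError("YZ must fit in 16 bits")
    return yz << SHIFT[pos]

def setw(yz, pos):
    """Model of SETH/SETMH/SETML/SETL: the other three wydes become zero."""
    return wyde(yz, pos)

def incw(x, yz, pos):
    """Model of INCH/INCMH/INCML/INCL: add, ignoring overflow."""
    return (x + wyde(yz, pos)) & MASK64

def orw(x, yz, pos):
    """Model of ORH/ORMH/ORML/ORL."""
    return x | wyde(yz, pos)

def build(gh, ij, kl, mn, step=incw):
    """SETH, then three INC (or OR) steps, as in the four-instruction sequence."""
    x = setw(gh, "H")
    x = step(x, ij, "MH")
    x = step(x, kl, "ML")
    return step(x, mn, "L")
```

Because each step touches a wyde that is still zero, no carries occur, which is why \.{INC} and \.{OR} are interchangeable in this particular sequence.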
|
\bull\<ANDNH \$X,YZ `bitwise and-not high wyde'; |
@.ANDNH@> |
\<ANDNMH \$X,YZ `bitwise and-not medium high wyde';\> |
@.ANDNMH@> |
\<ANDNML \$X,YZ `bitwise and-not medium low wyde'; |
@.ANDNML@> |
\<ANDNL \$X,YZ `bitwise and-not low wyde'.\> |
@.ANDNL@> |
The 16-bit unsigned number YZ is shifted left |
by either 48 or 32 or 16 or 0 bits, respectively, then |
complemented and anded with register~X; |
the result is placed back into register~X\null. |
|
If YZ is the hexadecimal |
constant \Hex{8000}, the command \<ANDNH \$X,YZ forces the most significant |
bit of register~X to be~0. This can be used to compute the absolute value of |
a floating point number. |
@^absolute value, floating point@> |
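
As a hedged illustration of the two floating point tricks, here is a Python sketch that applies models of \.{INCH} and \.{ANDNH} to the bit patterns of ordinary doubles. The helper names \.{bits}, \.{flt}, \.{inch}, and \.{andnh} are assumptions made for this example; the conversion relies on Python's standard \.{struct} module.

```python
import struct

MASK64 = (1 << 64) - 1

def bits(f):
    """64-bit pattern of a Python float (an IEEE double)."""
    return struct.unpack(">Q", struct.pack(">d", f))[0]

def flt(octa):
    """Python float whose bit pattern is octa."""
    return struct.unpack(">d", struct.pack(">Q", octa))[0]

def inch(x, yz):
    """Model of INCH: add YZ shifted left 48 bits, ignoring overflow."""
    return (x + (yz << 48)) & MASK64

def andnh(x, yz):
    """Model of ANDNH: clear the bits of the high wyde that are set in YZ."""
    return x & ~(yz << 48) & MASK64

def negate(f):
    return flt(inch(bits(f), 0x8000))      # complements the sign bit

def absolute(f):
    return flt(andnh(bits(f), 0x8000))     # forces the sign bit to 0
```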
|
@ \MMIX\ knows several ways to shift a register left or right |
by any number of bits. |
|
\bull\<SL \$X,\$Y,\0 `shift left'.\> |
@.SL@> |
The bits of register~Y are shifted left by \$Z or Z places, and 0s |
are shifted in from the right; the result is placed in register~X\null. |
Register~Y is treated as a signed number, but |
the second operand is treated as an unsigned number. |
The effect is the same as multiplication by |
$2^{\mkern1mu\rZ}$ or by $2^\zz$; an integer overflow exception occurs if the |
result is $\ge2^{63}$ or $<-2^{63}$. |
In particular, if the second operand is 64 or~more, register~X will |
become entirely zero, and integer overflow will be signaled unless |
register~Y was zero. |
|
\bull\<SLU \$X,\$Y,\0 `shift left unsigned'.\> |
@.SLU@> |
The bits of register~Y are shifted left by \$Z or Z places, and 0s |
are shifted in from the right; the result is placed in register~X\null. |
Both operands are treated as unsigned numbers. The \.{SLU} instructions |
are equivalent to \.{SL}, except that no test for overflow is made. |
|
\bull\<SR \$X,\$Y,\0 `shift right'.\> |
@.SR@> |
The bits of register~Y are shifted right by \$Z or Z places, and copies |
of the leftmost bit (the sign bit) are shifted in from the left; the result is |
placed in register~X\null. |
Register~Y is treated as a signed number, but |
the second operand is treated as an unsigned number. |
The effect is the same as division by $2^{\mkern1mu\rZ}$ or by |
$2^\zz$ and rounding down. In particular, if the second operand is 64 or~more, |
register~X will become zero if \$Y was nonnegative, $-1$ if \$Y was negative. |
|
\bull\<SRU \$X,\$Y,\0 `shift right unsigned'.\> |
@.SRU@> |
The bits of register~Y are shifted right by \$Z or Z places, and 0s |
are shifted in from the left; the result is placed in register~X\null. |
Both operands are treated as unsigned numbers. |
The effect is the same as unsigned division of a 64-bit number |
by $2^{\mkern1mu\rZ}$ or by~$2^\zz$; |
if the second operand is 64 or~more, register~X will become entirely~zero. |
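
The four shift instructions can be sketched in Python as follows. This is a model under stated assumptions rather than a definitive implementation: registers are integers below $2^{64}$, the function names are hypothetical, and \.{sl} reports the overflow exception as a second return value instead of raising it.

```python
MASK64 = (1 << 64) - 1

def signed(x):
    """Interpret a 64-bit pattern as a two's complement number."""
    return x - (1 << 64) if x >> 63 else x

def sl(y, z):
    """Model of SL: return (result, overflow)."""
    if z >= 64:
        return 0, y != 0                    # everything is shifted out
    exact = signed(y) << z                  # same as multiplying by 2**z
    return exact & MASK64, not (-(1 << 63) <= exact < (1 << 63))

def slu(y, z):
    """Model of SLU: like SL, but with no overflow test."""
    return (y << z) & MASK64 if z < 64 else 0

def sr(y, z):
    """Model of SR: arithmetic shift, i.e. division by 2**z rounding down."""
    return (signed(y) >> min(z, 64)) & MASK64

def sru(y, z):
    """Model of SRU: logical shift with 0s entering from the left."""
    return y >> z if z < 64 else 0
```

Python's right shift of a negative integer already rounds toward $-\infty$, which is the behavior described for \.{SR}.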
|
@* Comparisons. |
Arithmetic and logical operations are nice, |
but computer programs also need to compare numbers |
and to change the course of a calculation depending on what they find. |
\MMIX\ has four comparison instructions to facilitate such decision-making. |
|
\bull\<CMP \$X,\$Y,\0 `compare'.\> |
@.CMP@> |
Register X is set to $-1$ if register Y is less than register Z or less than |
the unsigned immediate value~Z, using the conventions of signed |
arithmetic; it is set to 0 if register~Y is equal to register Z or equal to |
the unsigned immediate value~Z; otherwise it is set to~1. |
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$. |
|
\bull\<CMPU \$X,\$Y,\0 `compare unsigned'.\> |
@.CMPU@> |
Register X is set to $-1$ if register Y is less than register Z or less than |
the unsigned immediate value Z, using the conventions of unsigned |
arithmetic; it is set to 0 if register Y is equal to register Z or equal to |
the unsigned immediate value~Z; otherwise it is set to~1. |
In symbols, $\rX=[\rY\!>\!\rZ]-[\rY\!<\!\rZ]$ or $\rX=[\rY\!>\!\zz]-[\rY\!<\!\zz]$. |
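
The difference between the two comparisons shows up only when the leftmost bit is set. A minimal Python sketch, with hypothetical names \.{cmp} and \.{cmpu} and with the result returned as a Python integer $-1$, 0, or~1 (the register itself would hold the corresponding 64-bit two's complement pattern):

```python
def signed(x):
    """Interpret a 64-bit pattern as a two's complement number."""
    return x - (1 << 64) if x >> 63 else x

def cmp(y, z):
    """Model of CMP: [y > z] - [y < z] under signed conventions."""
    a, b = signed(y), signed(z)
    return (a > b) - (a < b)

def cmpu(y, z):
    """Model of CMPU: [y > z] - [y < z] under unsigned conventions."""
    return (y > z) - (y < z)
```

For example, the all-ones pattern is $-1$ when signed but $2^{64}-1$ when unsigned, so it compares below 1 with \.{cmp} and above 1 with \.{cmpu}.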
|
@ There also are 32 conditional instructions, which choose quickly between |
two alternative courses of action. |
|
\bull\<CSN \$X,\$Y,\0 `conditionally set if negative'.\> |
@.CSN@> |
If register Y is negative (namely if its most significant bit is~1), |
register~X is set to the contents of register~Z or to the |
unsigned immediate value~Z. Otherwise nothing happens. |
|
\bull\<CSZ \$X,\$Y,\0 `conditionally set if zero'. |
@.CSZ@> |
\bul\<CSP \$X,\$Y,\0 `conditionally set if positive'. |
@.CSP@> |
\bul\<CSOD \$X,\$Y,\0 `conditionally set if odd'. |
@.CSOD@> |
\bul\<CSNN \$X,\$Y,\0 `conditionally set if nonnegative'. |
@.CSNN@> |
\bul\<CSNZ \$X,\$Y,\0 `conditionally set if nonzero'. |
@.CSNZ@> |
\bul\<CSNP \$X,\$Y,\0 `conditionally set if nonpositive'. |
@.CSNP@> |
\bul\<CSEV \$X,\$Y,\0 `conditionally set if even'.\> |
@.CSEV@> |
These instructions are entirely analogous to \.{CSN}, except |
that register~X changes only if register~Y is respectively zero, positive, |
odd, nonnegative, nonzero, nonpositive, or nonodd. |
|
\bull\<ZSN \$X,\$Y,\0 `zero or set if negative'.\> |
@.ZSN@> |
If register Y is negative (namely if its most significant bit is~1), |
register~X is set to the contents of register~Z or to the |
unsigned immediate value~Z. Otherwise register~X is set to zero. |
|
\bull\<ZSZ \$X,\$Y,\0 `zero or set if zero'. |
@.ZSZ@> |
\bul\<ZSP \$X,\$Y,\0 `zero or set if positive'. |
@.ZSP@> |
\bul\<ZSOD \$X,\$Y,\0 `zero or set if odd'. |
@.ZSOD@> |
\bul\<ZSNN \$X,\$Y,\0 `zero or set if nonnegative'. |
@.ZSNN@> |
\bul\<ZSNZ \$X,\$Y,\0 `zero or set if nonzero'. |
@.ZSNZ@> |
\bul\<ZSNP \$X,\$Y,\0 `zero or set if nonpositive'. |
@.ZSNP@> |
\bul\<ZSEV \$X,\$Y,\0 `zero or set if even'.\> |
@.ZSEV@> |
These instructions are entirely analogous to \.{ZSN}, except |
that \$X is set to \$Z or~Z if register~Y is respectively zero, positive, |
odd, nonnegative, nonzero, nonpositive, or even; otherwise |
\$X is set to zero. |
|
Notice that the two instructions \<CMPU r,s,0 and \<ZSNZ r,s,1 have |
the same effect. So do the two instructions \<CSNP r,s,0 and \.{ZSP} \.{r,s,r}. |
So do \<AND r,s,1 and \.{ZSOD}~\.{r,s,1}. |
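
The three equivalences just noted can be checked mechanically. The Python sketch below is an assumption-laden model, not the machine's definition: \.{cs} and \.{zs} are hypothetical helpers that take the condition as a string, and registers are integers below $2^{64}$.

```python
MASK64 = (1 << 64) - 1

def signed(x):
    return x - (1 << 64) if x >> 63 else x

# The eight conditions, applied to the signed value of register Y.
TEST = {
    "N":  lambda v: v < 0,   "Z":  lambda v: v == 0,
    "P":  lambda v: v > 0,   "OD": lambda v: (v & 1) == 1,
    "NN": lambda v: v >= 0,  "NZ": lambda v: v != 0,
    "NP": lambda v: v <= 0,  "EV": lambda v: (v & 1) == 0,
}

def cs(cond, x, y, z):
    """Conditional set: new value of X (unchanged if the test fails)."""
    return z if TEST[cond](signed(y)) else x

def zs(cond, y, z):
    """Zero or set: new value of X (zero if the test fails)."""
    return z if TEST[cond](signed(y)) else 0

def cmpu(y, z):
    """Unsigned compare, returned as a 64-bit pattern."""
    return ((y > z) - (y < z)) & MASK64
```

Comparing \.{cmpu(s,0)} with \.{zs("NZ",s,1)}, and so on, over a handful of values of \.{s} and \.{r} is a quick sanity check of the claims, though of course not a proof.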
|
@* Branches and jumps. |
\MMIX\ ordinarily executes instructions in sequence, proceeding from |
an instruction in tetrabyte M$_4[\lambda]$ to the instruction in |
M$_4[\lambda+4]$. But there are several ways to interrupt |
the normal flow of control, most of which use the Y and Z fields of |
an instruction as a combined 16-bit YZ field. |
For example, \<BNZ \$3,@@+4000 (branch if nonzero) |
is typical: It means that control should skip ahead 1000 instructions |
to the command that appears $4000$ bytes after the |
\.{BNZ}, if register~3 is not equal to zero. |
|
There are eight branch-forward instructions, corresponding to the |
eight conditions in the \.{CS} and \.{ZS} commands that we discussed earlier. |
And there are eight similar branch-backward instructions; for example, |
\<BOD \$2,@@-4000 (branch if odd) takes control to the |
instruction that appears $4000$ bytes {\it before\/} |
this \.{BOD} command, if register~2 is odd. The numeric OP-code when branching |
backward is one greater than the OP-code when branching |
forward; the assembler takes care of this automatically, just as it takes |
care of changing \.{ADD} from 32 to 33 when necessary. |
|
Since branches are relative to the current location, the \MMIX\ assembler |
treats branch instructions in a special way. |
Suppose a programmer writes `\.{BNZ} \.{\$3,Case5}', |
where \.{Case5} is the address of an instruction in location~$l$. |
If this instruction appears in location~$\lambda$, the assembler first |
computes the displacement $\delta=\lfloor(l-\lambda)/4\rfloor$. Then if |
$\delta$ is nonnegative, the quantity~$\delta$ |
is placed in the YZ field of a \.{BNZ} |
command, and it should be less than $2^{16}$; if $\delta$ is negative, |
the quantity $2^{16}+\delta$ is placed in the YZ field of a \.{BNZ} |
command with OP-code increased by~1, |
and $\delta$ should not be less than $-2^{16}$. |
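
The assembler's rule can be sketched in a few lines of Python. The function names \.{encode} and \.{decode} are hypothetical, and instead of a numeric OP-code the sketch returns a flag saying whether the branch-backward variant (OP-code increased by~1) is needed.

```python
def encode(loc, target):
    """Return (backward, yz) for a branch at location loc to location target."""
    delta = (target - loc) // 4              # floor of (l - lambda)/4
    if 0 <= delta < (1 << 16):
        return False, delta                  # branch forward, YZ = delta
    if -(1 << 16) <= delta < 0:
        return True, (1 << 16) + delta       # branch backward, YZ = 2**16 + delta
    raise ValueError("branch target out of range")

def decode(loc, backward, yz):
    """Location reached by a taken branch with the given YZ field."""
    return loc + 4 * (yz - ((1 << 16) if backward else 0))
```

A branch 4000 bytes ahead therefore gets ${\rm YZ}=1000$, and one 4000 bytes back gets ${\rm YZ}=64536$ in the backward form.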
|
The symbol \.{@@} used in our examples of |
\.{BNZ} and \.{BOD} above is interpreted by the |
assembler as an abbreviation for ``the location of the current |
instruction.'' In the following |
notes we will define pairs of branch commands by writing, for example, |
`\.{BNZ}~\.{\$X,@@+4*YZ[-262144]}'; this stands for a branch-forward |
command that |
branches to the current location plus four times~YZ, as well as for |
a branch-backward command that branches to the current |
location plus four times $(\rm YZ-65536)$. |
|
\bull\<BN \$X,@@+4*YZ[-262144] `branch if negative'. |
@.BN@> |
\bul\<BZ \$X,@@+4*YZ[-262144] `branch if zero'. |
@.BZ@> |
\bul\<BP \$X,@@+4*YZ[-262144] `branch if positive'. |
@.BP@> |
\bul\<BOD \$X,@@+4*YZ[-262144] `branch if odd'. |
@.BOD@> |
\bul\<BNN \$X,@@+4*YZ[-262144] `branch if nonnegative'. |
@.BNN@> |
\bul\<BNZ \$X,@@+4*YZ[-262144] `branch if nonzero'. |
@.BNZ@> |
\bul\<BNP \$X,@@+4*YZ[-262144] `branch if nonpositive'. |
@.BNP@> |
\bul\<BEV \$X,@@+4*YZ[-262144] `branch if even'.\> |
@.BEV@> |
If register X is respectively negative, zero, positive, odd, nonnegative, |
nonzero, nonpositive, or even, and if this instruction appears in memory |
location $\lambda$, the next instruction is taken from memory location |
$\lambda+4{\rm YZ}$ (branching forward) or $\lambda+4({\rm YZ}-2^{16})$ |
(branching backward). Thus one can go from location~$\lambda$ to any location |
between $\lambda-262{,}144$ and $\lambda+262{,}140$, inclusive. |
|
\smallskip |
Sixteen additional branch instructions called {\it probable branches\/} |
are also provided. They have exactly the same meaning as ordinary |
branch instructions; for example, \<PBOD \$2,@@-4000 and \<BOD \$2,@@-4000 both |
go backward 4000 bytes if register~2 is odd. But they differ in running time: |
On some implementations of\/ \MMIX, |
a branch instruction takes longer when the branch is taken, while a |
probable branch takes longer when the branch is {\it not\/} taken. |
Thus programmers should use a \.B instruction when they think branching is |
relatively unlikely, but they should use \.{PB} when they expect |
branching to occur more often than not. Here is a list of the |
probable branch commands, for completeness: |
|
\bull\<PBN \$X,@@+4*YZ[-262144] `probable branch if negative'. |
@.PBN@> |
\bul\<PBZ \$X,@@+4*YZ[-262144] `probable branch if zero'. |
@.PBZ@> |
\bul\<PBP \$X,@@+4*YZ[-262144] `probable branch if positive'. |
@.PBP@> |
\bul\<PBOD \$X,@@+4*YZ[-262144] `probable branch if odd'. |
@.PBOD@> |
\bul\<PBNN \$X,@@+4*YZ[-262144] `probable branch if nonnegative'. |
@.PBNN@> |
\bul\<PBNZ \$X,@@+4*YZ[-262144] `probable branch if nonzero'. |
@.PBNZ@> |
\bul\<PBNP \$X,@@+4*YZ[-262144] `probable branch if nonpositive'. |
@.PBNP@> |
\bul\<PBEV \$X,@@+4*YZ[-262144] `probable branch if even'.\> |
@.PBEV@> |
|
@ Locations that are relative to the current instruction can be |
transformed into absolute locations with \.{GETA} commands. |
|
\bull\<GETA \$X,@@+4*YZ[-262144] `get address'.\> |
@.GETA@> |
The value $\lambda+4{\rm YZ}$ or $\lambda+4({\rm YZ}-2^{16})$ is placed in |
register~X\null. (The assembly language conventions of branch instructions |
apply; for example, we can write `\.{GETA} \.{\$X,Addr}'.) |
|
@ \MMIX\ also has unconditional jump instructions, which change the |
location of the next instruction no matter what. |
|
\bull\<JMP @@+4*XYZ[-67108864] `jump'.\> |
@.JMP@> |
A \.{JMP} command treats bytes X, Y, and Z as an unsigned 24-bit |
integer XYZ. It allows a program to transfer control from location $\lambda$ to any |
location between $\lambda-67\?{,}108{,}864$ and $\lambda+67\?{,}108{,}860$ |
inclusive, using relative addressing as in the \.{B} and \.{PB} commands. |
|
\bull\<GO \$X,\$Y,\0 `go to location'.\> |
@.GO@> |
\MMIX\ takes its next instruction from location $\rY+\rZ$ or $\rY+\zz$, |
and continues from there. Register~X is set equal to $\lambda+4$, the |
location of the instruction that would ordinarily have been executed next. |
(\.{GO} is similar to a jump, but it is not relative |
to the current location. Since \.{GO} has the same format as a load or store |
instruction, a loading routine can treat program labels with the same mechanism |
that is used to treat references to data.) |
|
An old-fashioned type of subroutine linkage can be implemented by saying |
either `\.{GO}~\.{r,subloc,0}' or `\.{GETA}~\.{r,@@+8;} |
\.{JMP}~\.{Sub}' to~enter a subroutine, |
then `\.{GO}~\.{r,r,0}' to return. |
But subroutines are normally entered with the instructions |
\.{PUSHJ} or \.{PUSHGO}, described below. |
|
The two least significant bits of the address |
in a \.{GO} command are essentially ignored. They will, however, appear in |
the value of~$\lambda$ returned by \.{GETA} instructions, and in the |
return-jump register~rJ after \.{PUSHJ} or \.{PUSHGO} instructions are |
performed, and in |
@^rJ@> |
the where-interrupted register at the time of an interrupt. Therefore they |
could be used to send some kind of signal to a subroutine or (less likely) |
to an interrupt handler. |
|
@* Multiplication and division. |
Now for some instructions that make \MMIX\ work harder. |
|
\bull\<MUL \$X,\$Y,\0 `multiply'.\> |
@.MUL@> |
The signed product of the number in register Y by either the |
number in register~Z or the unsigned byte~Z |
replaces the contents of register~X\null. An |
integer overflow exception can occur, as with \.{ADD} or \.{SUB}, if the |
result is less than $-2^{63}$ or greater than $2^{63}-1$. (Immediate |
multiplication by powers of~2 can be done more rapidly with the \.{SL} |
instruction.) |
|
\bull\<MULU \$X,\$Y,\0 `multiply unsigned'.\> |
@.MULU@> |
The lower 64 bits of the |
unsigned 128-bit product of register~Y and either |
register~Z or~Z are placed in register~X, and the upper 64 bits are |
placed in the special {\it himult register\/}~rH\null. (Immediate multiplication |
@^rH@> |
by powers of~2 can be done more rapidly with the \.{SLU} instruction, |
if the upper half is not needed. |
Furthermore, an instruction like \<4ADDU \$X,\$Y,\$Y is faster than |
\.{MULU} \.{\$X,\$Y,5}.) |
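
Here is a hedged Python model of the two multiplications, with the overflow exception of \.{MUL} and the himult register of \.{MULU} returned as second components. The names \.{mul}, \.{mulu}, and \.{addu4} are assumptions for this sketch; \.{addu4} stands for \.{4ADDU}, taken here to compute $4\rY+\rZ$ modulo $2^{64}$.

```python
MASK64 = (1 << 64) - 1

def signed(x):
    return x - (1 << 64) if x >> 63 else x

def mul(y, z):
    """Model of MUL: return (result, overflow)."""
    exact = signed(y) * signed(z)
    return exact & MASK64, not (-(1 << 63) <= exact < (1 << 63))

def mulu(y, z):
    """Model of MULU: return (lower 64 bits for X, upper 64 bits for rH)."""
    product = y * z                      # exact 128-bit product
    return product & MASK64, product >> 64

def addu4(y, z):
    """Model of 4ADDU: (4*y + z) mod 2**64, with no overflow test."""
    return (4 * y + z) & MASK64
```

The lower half of \.{mulu(y,5)} agrees with \.{addu4(y,y)} for every \.{y}, which is the substitution suggested above; only \.{MULU} also delivers the upper half.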
|
\bull\<DIV \$X,\$Y,\0 `divide'.\> |
@.DIV@> |
The signed quotient of the number in register Y divided |
by either the number in register~Z or the unsigned byte~Z |
replaces the contents of register~X, and the signed remainder |
is placed in the special {\it remainder register\/}~rR\null. |
@^rR@> |
An integer divide check exception occurs if the divisor is zero; in that |
case \$X is set to zero and rR is set to~\$Y\null. |
@^divide check exception@> |
@^overflow@> |
An integer overflow exception occurs if the number $-2^{63}$ is divided |
by~$-1$; otherwise integer overflow is impossible. The quotient of |
$y$ divided by~$z$ is defined to be $\lfloor y/z\rfloor$, and the remainder |
is defined to be $y-\lfloor y/z\rfloor z$ (also written $y\bmod z$). |
Thus, the remainder is either |
zero or has the sign of the divisor. Dividing by $z=2^t$ gives |
exactly the same quotient as shifting right~$t$ via the \.{SR} command, and |
exactly the same remainder as anding with $z-1$ via the \.{AND} command. |
Division of a positive 63-bit number by a positive constant can be accomplished |
more quickly by computing the upper half of a suitable unsigned product and |
shifting it right appropriately. |
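
Python's \.{divmod} happens to use the same floor convention, so signed division is easy to model. In the sketch below the names \.{div}, \.{unsigned}, and \.{div3} are hypothetical, exceptions are reported as a string rather than raised, and \.{div3} shows one well-known instance of the multiply-and-shift idea for the constant divisor~3, using the multiplier $(2^{65}+1)/3$.

```python
MASK64 = (1 << 64) - 1

def signed(x):
    return x - (1 << 64) if x >> 63 else x

def unsigned(v):
    """64-bit two's complement pattern of a Python integer."""
    return v & MASK64

def div(y, z):
    """Model of DIV: return (quotient for X, remainder for rR, exception or None)."""
    a, b = signed(y), signed(z)
    if b == 0:
        return 0, y, "divide check"
    q, r = divmod(a, b)                      # floor quotient; r has the sign of b
    return unsigned(q), unsigned(r), ("overflow" if q == (1 << 63) else None)

def div3(y):
    """Divide 0 <= y < 2**63 by 3 using the upper half of an unsigned product."""
    high = (y * 0xAAAAAAAAAAAAAAAB) >> 64    # what MULU would leave in rH
    return high >> 1
```

For a power of two such as $z=8$, the quotient equals an arithmetic right shift by~3 and the remainder equals the low three bits, as stated above.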
|
\bull\<DIVU \$X,\$Y,\0 `divide unsigned'.\> |
@.DIVU@> |
The unsigned 128-bit number obtained by prefixing the special {\it dividend |
register}~rD to the contents of register~Y is divided either by the |
@^rD@> |
unsigned number in register~Z or by the unsigned byte~Z, and the quotient is placed |
in register~X\null. The remainder is placed in the remainder |
register~rR\null. |
However, if rD is greater than or equal to |
the divisor (and in particular if the divisor is zero), |
then \$X is set to~rD and rR is set to~\$Y\null. |
(Unsigned arithmetic never signals an exceptional condition, even |
when dividing by zero.) |
If rD is zero, unsigned division by $z=2^t$ gives exactly the same quotient as |
shifting right~$t$ via the \.{SRU} command, and |
exactly the same remainder as anding with $z-1$ via the \.{AND} command. |
Section 4.3.1 of {\sl Seminumerical Algorithms\/} |
explains how to use unsigned division to obtain the quotient and remainder |
of extremely large numbers. |
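
A minimal Python sketch of \.{DIVU}, assuming that the dividend register is passed in explicitly; the name \.{divu} and the argument order are choices made for this illustration only.

```python
def divu(rd, y, z):
    """Model of DIVU: return (quotient for X, remainder for rR)."""
    if rd >= z:                  # includes division by zero
        return rd, y             # X is set to rD and rR to Y
    dividend = (rd << 64) | y    # 128-bit number: rD prefixed to register Y
    return dividend // z, dividend % z
```

The test \.{rd >= z} is exactly the condition under which the true quotient would not fit in 64 bits, so the quotient returned in the other branch is always below $2^{64}$.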
|
@* Floating point computations. |
Floating point arithmetic conforming to the famous IEEE/ANSI |
Standard~754 is provided for arbitrary 64-bit numbers. The IEEE standard |
refers to such numbers as ``double format'' quantities, but \MMIX\ |
calls them simply floating point numbers because 64-bit quantities are |
the~norm. |
@^floating point arithmetic@> |
@^IEEE/ANSI Standard 754@> |
@^subnormal numbers@> |
@^normal numbers@> |
@^NaN@> |
@^overflow@> |
@^underflow@> |
@^invalid exception@> |
@^inexact exception@> |
@^signaling NaN@> |
@^quiet NaN@> |
@^infinity@> |
@^rounding modes@> |
|
A positive floating point number has 53 bits of precision and can range |
from approximately $10^{-308}$ to $10^{308}$. ``Subnormal numbers'' |
between $10^{-324}$ and $10^{-308}$ can also be represented, but with fewer |
bits of precision. |
Floating point numbers can be |
infinite, and they satisfy such identities as $1.0/\infty=+0.0$, $-2.8\times\infty |
=-\infty$. Floating |
point quantities can also be ``Not-a-Numbers'' or NaNs, which are |
further classified into signaling NaNs and quiet NaNs. |
|
Five kinds |
of exceptions can occur during floating point computations, and they |
each have code letters: Floating |
overflow~(O) or underflow~(U); floating divide by zero~(Z); |
floating inexact~(X); and floating invalid~(I). |
For example, the multiplication of sufficiently small integers causes |
no exceptions, and the division of 91.0 by~13.0 is also exception-free, |
but the division 1.0/3.0 is inexact. The multiplication of extremely |
large or extremely small floating point numbers is inexact and it |
also causes overflow or underflow. |
Invalid results occur when taking the square root of a negative |
number; mathematicians can remember the I exception |
by relating it to the square root of $-1.0$. |
Invalid results also occur when trying to convert infinity |
or a quiet NaN to a fixed-point |
integer, or when any signaling NaN is encountered, or when |
mathematically undefined operations like $\infty-\infty$ or $0/0$ are |
requested. |
(Programmers can be sure that they have not erroneously |
used uninitialized floating point data if they initialize all their variables |
to signaling NaN values.) |
|
Four different rounding modes for inexact results are available: |
round to nearest (and to even in case of ties); |
round off (toward zero); round up (toward $+\infty$); |
or round down (toward $-\infty$). \MMIX\ |
has a special {\it arithmetic status register\/}~rA that specifies the |
@^rA@> |
current rounding mode and the user's current preferences for exception |
handling. |
|
\def\NaN{{\rm NaN}} |
IEEE standard arithmetic provides an excellent foundation for scientific |
calculations, and it will be thoroughly explained in the fourth |
edition of {\sl Seminumerical Algorithms}, Section 4.2. |
For our present purposes, we need not study all the details; but |
we do need to specify \MMIX's behavior with respect to several |
things that are not completely defined by the standard. |
For example, the IEEE standard does not fully define the |
result of operations with NaNs. |
|
When an octabyte represents a floating point number |
in \MMIX's registers, the leftmost bit is the sign; then come 11 bits for an |
exponent~$e$; and the remaining 52 bits are the fraction part~$f$. |
We regard $e$ as an integer between 0 and $(11111111111)_2=2047$, and we regard $f$ as |
a fraction between 0 and $(.111\ldots1)_2=1-2^{-52}$. |
Each octabyte has the following |
significance: |
$$\vbox{\halign{\hfil$\pm#$,\quad if &#\hfil\cr |
0.0&$e=f=0$ (zero);\cr |
2^{-1022}f&$e=0$ and $f>0$ (subnormal);\cr |
2^{\mkern1mu e-1023}(1+f)&$0<e<2047$ (normal);\cr |
\infty&$e=2047$ and $f=0$ (infinite);\cr |
\NaN(f)&$e=2047$ and $0<f<1/2$ (signaling NaN);\cr |
\NaN(f)&$e=2047$ and $f\ge1/2$ (quiet NaN).\cr}}$$ |
Notice that $+0.0$ is distinguished from $-0.0$; this fact is |
important for interval arithmetic. |
@^minus zero@> |
|
Exercise: What 64 bits represent the floating point number 1.0? |
Answer: We want $e=1023$ and $f=0$, so the answer is \Hex{3ff0000000000000}. |
|
Exercise: What is the largest finite floating point number? |
Answer: We want $e=2046$ and $f=1-2^{-52}$, so the answer is |
$\Hex{7fefffffffffffff}=2^{1024}-2^{971}$. |
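
The table of significances translates directly into code. The Python sketch below uses exact rational arithmetic from the standard \.{fractions} module; the name \.{decode} is hypothetical, the sign of a zero is not preserved in the returned value, and infinities and NaNs are reported with the value \.{None}.

```python
from fractions import Fraction

def decode(octa):
    """Classify a 64-bit pattern; return (kind, exact value or None)."""
    sign = -1 if octa >> 63 else 1
    e = (octa >> 52) & 0x7FF                         # 11-bit exponent
    f = Fraction(octa & ((1 << 52) - 1), 1 << 52)    # fraction, 0 <= f < 1
    if e == 0:
        kind = "zero" if f == 0 else "subnormal"
        return kind, sign * f * Fraction(2) ** -1022
    if e < 2047:
        return "normal", sign * (1 + f) * Fraction(2) ** (e - 1023)
    if f == 0:
        return "infinite", None
    return ("signaling NaN" if f < Fraction(1, 2) else "quiet NaN"), None
```

Applied to the answers of the two exercises, it returns the values 1 and $2^{1024}-2^{971}$ respectively.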
|
@ The seven IEEE floating point arithmetic operations (addition, subtraction, |
multiplication, division, remainder, square root, and nearest-integer) |
all share common features, called the {\it standard floating point |
conventions\/} in the discussion below: |
@^standard floating point conventions@> |
@^overflow@> |
@^underflow@> |
The operation is performed on floating point numbers found in two registers, |
\$Y and~\$Z, except that square root and integerization |
involve only one operand. |
If neither input operand is a NaN, we first determine the exact result, |
then round it using the current rounding mode |
found in special register~rA\null. Infinite results are exact and need |
no rounding. A floating overflow exception occurs if the rounded |
result is finite but needs an exponent greater than 2046. |
A floating underflow exception occurs if the rounded result needs an exponent |
less than~1 and either (i)~the unrounded result cannot be represented exactly |
@^rA@> |
as a subnormal number or (ii)~the ``floating underflow trip'' is enabled in~rA\null. |
(Trips are discussed below.) |
NaNs are treated specially as follows: If either \$Y or~\$Z is a signaling NaN, |
an invalid exception occurs and the NaN is quieted by adding 1/2 to its |
fraction part. Then if \$Z is a quiet NaN, the result is set |
to \$Z; otherwise if \$Y is a quiet NaN, the result is set to \$Y\null. |
(Registers \$Y and \$Z do not actually change.) |
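
The NaN rule of the standard conventions can be modeled on bit patterns. This Python sketch is an interpretation of the paragraph above, with hypothetical helper names; ``adding 1/2 to the fraction part'' of a signaling NaN is realized by setting the leftmost fraction bit.

```python
FRACTION = (1 << 52) - 1
QUIET = 1 << 51                      # the fraction bit worth 1/2

def isnan(octa):
    return ((octa >> 52) & 0x7FF) == 0x7FF and (octa & FRACTION) != 0

def issignaling(octa):
    return isnan(octa) and (octa & QUIET) == 0

def nanresult(y, z):
    """Return (result, invalid); result is None when neither operand is a NaN."""
    invalid = issignaling(y) or issignaling(z)
    if isnan(z):
        return z | QUIET, invalid    # Z takes precedence over Y
    if isnan(y):
        return y | QUIET, invalid
    return None, invalid
```

The inputs themselves are left untouched; only the returned result carries the quieted fraction.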
|
\bull\<FADD \$X,\$Y,\$Z `floating add'.\> |
@.FADD@> |
The floating point sum $\rY+\rZ$ is computed by the |
standard floating point conventions just described, |
and placed in register~X\null. |
An invalid exception occurs if the sum is $(+\infty)+(-\infty)$ or |
$(-\infty)+(+\infty)$; in that case the result is $\NaN(1/2)$ with the sign |
of~\$Z\null. If the sum is exactly zero and the current mode is |
not rounding-down, the result is $+0.0$ except that $(-0.0)+(-0.0)=-0.0$. If the |
@^minus zero@> |
sum is exactly zero and the current mode is rounding-down, the result |
is $-0.0$ except that $(+0.0)+(+0.0)=+0.0$. |
These rules for signed zeros turn out to be useful when doing interval |
arithmetic: If the lower bound of an interval is $+0.0$ or if the |
upper bound is $-0.0$, the interval does not contain zero, so the |
numbers in the interval have a known sign. |
|
Floating point underflow cannot occur unless the U-trip has been enabled, |
because any underflowing result of floating point |
addition can be represented exactly as a subnormal number. |
|
Silly but instructive exercise: Find all pairs of numbers $(\rY,\rZ)$ such |
that the commands \<FADD \$X,\$Y,\$Z and \<ADDU \$X,\$Y,\$Z both produce |
the same result in~\$X |
(although \.{FADD} may cause floating exceptions). |
Answer: Of course \$Y or \$Z could be zero, if the other one is not a signaling |
NaN. Or one could be signaling and the other \Hex{0008000000000000}. |
Other possibilities |
occur when they are both positive and less than |
\Hex{0010000000000001}; or when one operand is \Hex{0000000000000001} |
and the other is an odd number between \Hex{0020000000000001} and |
\Hex{002ffffffffffffd} inclusive (rounding to nearest). |
And still more surprising possibilities exist, such as |
\Hex{7f6001b4c67bc809}\thinspace+\thinspace\Hex{ff5ffb6a4534a3f7}. |
All eight families of solutions will be revealed some day in the fourth edition |
of {\sl Seminumerical Algorithms}. |
|
\bull\<FSUB \$X,\$Y,\$Z `floating subtract'.\> |
@.FSUB@> |
This instruction is equivalent to \.{FADD}, but with the sign of~\$Z negated |
unless \$Z is a~NaN. |
|
\bull\<FMUL \$X,\$Y,\$Z `floating multiply'.\> |
@.FMUL@> |
The floating point product $\rY\times\rZ$ is computed by |
the standard floating point conventions, and placed in register~X\null. |
An invalid exception occurs if |
the product is $(\pm0.0)\times(\pm\infty)$ or $(\pm\infty)\times(\pm0.0)$; |
in that case the result is $\pm\NaN(1/2)$. No exception occurs for the |
product $(\pm\infty)\times(\pm\infty)$. If neither \$Y nor~\$Z is a NaN, |
the sign of the result is the product of the signs of \$Y and~\$Z\null. |
|
\bull\<FDIV \$X,\$Y,\$Z `floating divide'.\> |
@.FDIV@> |
The floating point quotient $\rY\?/\rZ$ is computed by |
the standard floating point conventions, and placed in \$X\null. |
@^standard floating point conventions@> |
A floating divide by zero exception occurs if the |
quotient is $(\hbox{normal or subnormal})/(\pm0.0)$. An invalid exception occurs if |
the quotient is $(\pm0.0)/(\pm0.0)$ or $(\pm\infty)/(\pm\infty)$; in that case the |
result is $\pm\NaN(1/2)$. No exception occurs for the |
quotient $(\pm\infty)/(\pm0.0)$. If neither \$Y nor~\$Z is a NaN, |
the sign of the result is the product of the signs of \$Y and~\$Z\null. |
|
If a floating point number in register X is known to have an exponent between |
2 and~2046, the instruction \<INCH \$X,\char`\#fff0 will divide it by~2.0. |
|
\bull\<FREM \$X,\$Y,\$Z `floating remainder'.\> |
@.FREM@> |
The floating point remainder $\rY\,{\rm rem}\,\rZ$ is computed by |
the standard floating point conventions, and placed in register~X\null. |
(The IEEE standard defines the remainder to be $\rY-n\times\rZ$, |
where $n$ is the nearest integer to $\rY/\rZ$, and $n$ is an even |
integer in case of ties. This is not the same as the remainder |
$\rY\bmod\rZ$ computed by \.{DIV} or \.{DIVU}.) |
A zero remainder has the sign of~\$Y\null. |
An invalid exception occurs if \$Y is infinite and/or \$Z is zero; in |
that case the result is $\NaN(1/2)$ with the sign of~\$Y\null. |
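
To see how the IEEE remainder differs from the floor-based $\bmod$, here is a short Python sketch. It assumes Python 3.7 or later, where \.{math.remainder} computes the IEEE 754-style remainder; \.{ieeerem} is a hypothetical reference version that follows the definition literally with exact rationals and does not model the sign of a zero result or the exceptional cases.

```python
import math
from fractions import Fraction

def ieeerem(y, z):
    """y - n*z, where n is the integer nearest y/z (even n in case of ties)."""
    q = Fraction(y) / Fraction(z)
    n = round(q)                       # rounding a Fraction sends ties to even
    return float(Fraction(y) - n * Fraction(z))

def floormod(y, z):
    """The remainder y - floor(y/z)*z, as computed by DIV for integers."""
    return y - math.floor(y / z) * z
```

For instance, 5.0 rem 3.0 is $-1.0$ because the nearest integer to $5/3$ is~2, whereas $5\bmod3=2$.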
|
\bull\<FSQRT \$X,\$Z `floating square root'.\> |
@.FSQRT@> |
The floating point square root $\sqrt\rZ$ is computed by the |
standard floating point conventions, and placed in register~X\null. An |
invalid exception occurs if \$Z is a negative number (either infinite, normal, |
or subnormal); in that case the result is $-\NaN(1/2)$. No exception occurs |
when taking the square root of $-0.0$ or $+\infty$. In all cases the sign of |
the result is the sign of~\$Z\null. |
|
\bull\<FINT \$X,\$Z `floating integer'.\> |
@.FINT@> |
The floating point number in register~Z is rounded (if |
necessary) to a floating point integer, using the current |
rounding mode, and placed in register~X\null. Infinite values and quiet NaNs |
are not changed; signaling NaNs are treated as in the standard conventions. |
Floating point overflow and underflow exceptions cannot occur. |
|
The Y field of \.{FSQRT} and \.{FINT} can be used to specify a |
special rounding mode, as explained below. |
|
@ Besides doing arithmetic, we need to compare floating point numbers |
with each other, taking proper account of NaNs and the fact that $-0.0$ |
should be considered equal to $+0.0$. The following instructions are |
analogous to the comparison operators \.{CMP} and \.{CMPU} that we |
have used for integers. |
@^minus zero@> |
|
\bull\<FCMP \$X,\$Y,\$Z `floating compare'.\> |
@.FCMP@> |
Register X is set to $-1$ if $\rY<\rZ$ according to the conventions of |
floating point arithmetic, or to~1 if $\rY>\rZ$ according to those |
conventions. Otherwise it is set to~0. An invalid exception |
occurs if either \$Y or \$Z is a NaN; in such cases the result is zero. |
|
\bull\<FEQL \$X,\$Y,\$Z `floating equal to'.\> |
@.FEQL@> |
Register X is set to 1 if $\rY=\rZ$ according to the conventions of |
floating point arithmetic. Otherwise it is set to~0. The result is zero if |
either \$Y or \$Z is a NaN, even if a NaN is being compared with itself. |
However, no invalid exception occurs, not even when \$Y or \$Z is a signaling |
NaN\null. (Perhaps \MMIX\ differs slightly from the IEEE standard in this |
regard, but programmers sometimes need to look at signaling NaNs without |
encountering side effects. |
Programmers who insist on raising |
an invalid exception whenever a signaling NaN is compared for floating equality |
should issue the instructions \<FSUB \$X,\$Y,\$Y; \<FSUB \$X,\$Z,\$Z just before saying |
\.{FEQL}~\.{\$X,\$Y,\$Z}.) |
|
Suppose $w$, $x$, $y$, and $z$ are unsigned 64-bit integers with |
$w<x<2^{63}\le y<z$. Thus, the leftmost bits of $w$ and~$x$ are~0, |
while the leftmost bits of $y$ and~$z$ are~1. |
Then we have $w<x<y<z$ when these numbers are considered |
as unsigned integers, but $y<z<w<x$ when they are considered as signed |
integers, because $y$ and~$z$ are negative. Furthermore, we have |
$z<y\le w<x$ when these same 64-bit quantities are considered to be |
floating point numbers, assuming that no NaNs are present, |
because the leftmost bit of a floating point |
number represents its sign and the remaining bits represent its magnitude. |
The case $y=w$ occurs in floating point comparison |
if and only if $y$ is the representation of $-0.0$ |
and $w$ is the representation of $+0.0$. |
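
The three orderings can be checked mechanically. The following Python sketch
is only an illustration and is not part of the \MMIX\ software; the helper
names and the four sample bit patterns are assumptions, chosen so that
$w<x<2^{63}\le y<z$ and so that the case $y=w$ arises.

```python
import struct

def as_signed(bits):
    """Interpret a 64-bit pattern as a two's complement integer."""
    return bits - (1 << 64) if bits >> 63 else bits

def as_float(bits):
    """Interpret a 64-bit pattern as an IEEE double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Hypothetical sample patterns with w < x < 2**63 <= y < z.
w = 0x0000000000000000   # +0.0
x = 0x3FF0000000000000   # +1.0
y = 0x8000000000000000   # -0.0
z = 0xBFF0000000000000   # -1.0

unsigned_order = sorted([w, x, y, z])
signed_order = sorted([w, x, y, z], key=as_signed)
```

Here \.{unsigned\_order} is $(w,x,y,z)$ and \.{signed\_order} is $(y,z,w,x)$;
the floating point values satisfy $z<y\le w<x$, with equality because
$-0.0$ compares equal to $+0.0$.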
|
\bull\<FUN \$X,\$Y,\$Z `floating unordered'.\> |
@.FUN@> |
Register X is set to 1 if \$Y and \$Z are unordered according to the conventions |
of floating point arithmetic (namely, if either one is a NaN); otherwise |
register~X is set to~0. No invalid exception occurs, not even when \$Y or \$Z is |
a signaling NaN\null. |
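
The behavior of \.{FCMP}, \.{FEQL}, and \.{FUN} described above can be
summarized in a few lines of Python. This is a hedged sketch, not the
simulator's code: the function names are assumptions, operands are 64-bit
patterns, and the invalid exception is modeled merely as a returned flag.

```python
import math
import struct

def as_float(bits):
    """Interpret a 64-bit pattern as an IEEE double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def fun(y, z):
    """1 if either operand is a NaN, otherwise 0; never raises invalid."""
    return 1 if math.isnan(as_float(y)) or math.isnan(as_float(z)) else 0

def fcmp(y, z):
    """Return (result, invalid flag); a NaN operand gives (0, True)."""
    if fun(y, z):
        return 0, True
    fy, fz = as_float(y), as_float(z)
    return (fy > fz) - (fy < fz), False

def feql(y, z):
    """1 if the operands compare equal, otherwise 0; never raises invalid."""
    if fun(y, z):
        return 0
    return 1 if as_float(y) == as_float(z) else 0
```

For instance, \.{feql} returns 0 when a NaN is compared with itself, and
\.{fcmp} returns 0 without an invalid flag when $-0.0$ is compared with $+0.0$.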
|
\smallskip |
The IEEE standard discusses 26 different possible |
relations on floating point numbers; |
\MMIX\ implements 14 of them with single instructions, followed by |
a branch (or by a \.{ZS} to make a ``pure'' 0~or~1 result); all 26 |
can be evaluated with a sequence of at most four \MMIX\ commands |
and a subsequent branch. The |
hardest case to handle is `?$>=$' (unordered or greater or equal, |
to be computed without exceptions), for which the following |
sequence makes $\rX\ge0$ if and only if $\rY\mathrel?>=\rZ$: |
$$\vbox{\halign{&\tt#\hfil\ \cr |
&FUN &\$255,\$Y,\$Z\cr |
&BP &\$255,1F&\% skip ahead if unordered\cr |
&FCMP&\$X,\$Y,\$Z&\% \$X=[\$Y>\$Z]-[\$Y<\$Z]; no exceptions will arise\cr |
1H&CSNZ &\$X,\$255,1&\% \$X=1 if unordered\cr |
}}$$ |
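
The control flow of that four-instruction sequence can be traced in Python.
The sketch below is an assumption-laden paraphrase: it works on Python floats
rather than on registers, and the function name is hypothetical.

```python
import math

def unordered_or_ge(y, z):
    """Mirror the sequence above; the result is >= 0 iff y ?>= z."""
    r255 = 1 if (math.isnan(y) or math.isnan(z)) else 0   # FUN $255,$Y,$Z
    x = 0                      # stand-in for whatever $X held beforehand
    if not r255 > 0:           # BP $255,1F branches only when unordered
        x = (y > z) - (y < z)  # FCMP $X,$Y,$Z, reached only without NaNs
    if r255 != 0:              # 1H CSNZ $X,$255,1
        x = 1
    return x
```

Because \.{FCMP} is skipped whenever a NaN is present, no invalid exception
can arise on either path.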
|
@ Exercise: Suppose \MMIX\ had no \.{FINT} instruction. Explain how to |
@.FINT@> |
obtain the equivalent of \<FINT \$X,\$Z using other instructions. Your |
program should do the proper thing with respect to NaNs and exceptions. |
(For example, it should cause an invalid exception if and only if \$Z is |
a signaling NaN; it should cause an inexact exception only if \$Z needs |
to be rounded to another value.) |
@^emulation@> |
|
Answer: (The assembler prefixes hexadecimal constants by \.\#.) |
$$\vbox{\halign{&\tt#\hfil\ \cr |
&SETH &\$0,\char`\#4330&\% \$0=2\char`\^52\cr |
&SET &\$1,\$Z&\% \$1=\$Z\cr |
&ANDNH &\$1,\char`\#8000&\% \$1=abs(\$Z)\cr |
&ANDN &\$2,\$Z,\$1&\% \$2=signbit(\$Z)\cr |
&FUN &\$3,\$Z,\$Z&\% \$3=[\$Z is a NaN]\cr |
&BNZ &\$3,1F&\% skip ahead if \$Z is a NaN\cr |
&FCMP &\$3,\$1,\$0&\% \$3=[abs(\$Z)>2\char`\^52]-[abs(\$Z)<2\char`\^52]\cr |
&CSNN &\$0,\$3,0&\% set \$0=0 if \$3>=0\cr |
&OR &\$0,\$2,\$0&\% attach sign of \$Z to \$0\cr |
1H\ &FADD &\$1,\$Z,\$0&\% \$1=\$Z+\$0\cr |
&FSUB &\$X,\$1,\$0&\% \$X=\$1-\$0\cr}}$$ |
This program handles most cases of interest by adding and subtracting |
$\pm2^{52}$ using floating point arithmetic. |
It would be incorrect to do this in all cases; |
for example, such addition/subtraction might fail to give the correct |
answer when \$Z is a small negative |
quantity (if rounding toward zero), or when \$Z is a number like |
$2^{105}+2^{53}$ (if rounding to nearest). |
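
The idea behind the answer can be tried out in Python, whose floats are
ordinarily IEEE doubles. This sketch assumes the host rounds to nearest
(ties to even), so it illustrates only that one rounding mode; it says nothing
about exceptions, and the name \.{fint\_sketch} is hypothetical.

```python
import math

TWO52 = 2.0 ** 52   # the constant that SETH $0,#4330 builds

def fint_sketch(z):
    """Round z to an integer value by adding and subtracting +-2**52."""
    if math.isnan(z):
        c = TWO52                      # BNZ skips ahead; $0 is still 2**52
    elif abs(z) >= TWO52:
        c = math.copysign(0.0, z)      # CSNN: no fraction bits remain, add 0
    else:
        c = math.copysign(TWO52, z)    # OR: attach the sign of z
    return (z + c) - c                 # FADD, then FSUB
```

Halfway cases go to the even neighbor, so 2.5 becomes 2.0 while 3.5
becomes 4.0; values of magnitude $2^{52}$ or more pass through unchanged.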
|
@ \MMIX\ goes beyond the IEEE standard to define additional relations |
between floating point numbers, as suggested by the theory in |
Section 4.2.2 of {\sl Seminumerical Algorithms}. Given a nonnegative |
number~$\epsilon$, each normal floating point number $u=(f,e)$ has |
a {\it neighborhood\/} |
$$N_\epsilon(u)=\{x\,\mid\,\vert x-u\vert\le 2^{e-1022}\epsilon\};$$ |
we also define $N_\epsilon(0)=\{0\}$, |
$N_\epsilon(u)=\{x\mid\vert x-u\vert\le2^{-1021}\epsilon\}$ if $u$ is |
subnormal; $N_\epsilon(\pm\infty)=\{\pm\infty\}$ if $\epsilon<1$, |
$N_\epsilon(\pm\infty)=\{$everything except $\mp\infty\}$ if $1\le\epsilon<2$, |
$N_\epsilon(\pm\infty)=\{$everything$\}$ if $\epsilon\ge2$. Then we write |
$$\vbox{\halign{$u#v\ (\epsilon)$, &#\hfil\cr |
\prec&if $u<N_\epsilon(v)$ and $N_\epsilon(u)<v$;\cr |
\sim&if $u\in N_\epsilon(v)$ or $v\in N_\epsilon(u)$;\cr |
\approx&if $u\in N_\epsilon(v)$ and $v\in N_\epsilon(u)$;\cr |
\succ&if $u>N_\epsilon(v)$ and $N_\epsilon(u)>v$.\cr}}$$ |
|
\def\rE{{\rm rE}} |
\bull\<FCMPE \$X,\$Y,\$Z `floating compare (with respect to epsilon)'.\> |
@.FCMPE@> |
Register X is set to $-1$ if $\rY\prec\rZ\ \ (\rE)$ according to the |
conventions of {\sl Seminumerical Algorithms} as stated above; it is set to~1 |
if $\rY\succ\rZ\ \ (\rE)$ according to those conventions; otherwise |
it is set to~0. Here rE is a floating point number in |
@^rE@> |
the special {\it epsilon register\/}, which is used only by the |
floating point comparison operations \.{FCMPE}, \.{FEQLE}, and \.{FUNE}. |
An invalid exception occurs, and the result is zero, |
if any of \$Y, \$Z, or rE are NaN, or if rE is negative. |
If no such exception occurs, exactly one of the three conditions |
$\rY\prec\rZ$, $\rY\sim\rZ$, $\rY\succ\rZ$ holds with respect to~rE. |
|
\bull\<FEQLE \$X,\$Y,\$Z `floating equivalent (with respect to epsilon)'.\> |
@.FEQLE@> |
Register X is set to 1 if $\rY\approx\rZ\ \ (\rE)$ according to the |
conventions of {\sl Seminumerical Algorithms\/} as stated above; otherwise |
it is set to~0. |
An invalid exception occurs, and the result is zero, |
if any of \$Y, \$Z, or rE are NaN, or if rE is negative. |
Notice that the relation $\rY\approx\rZ$ computed by \.{FEQLE} is |
stronger than the relation $\rY\sim\rZ$ computed by \.{FCMPE}. |
|
\bull\<FUNE \$X,\$Y,\$Z `floating unordered (with respect to epsilon)'.\> |
@.FUNE@> |
Register X is set to 1 if |
\$Y, \$Z, or~rE are exceptional as discussed for \.{FCMPE} and \.{FEQLE}; |
otherwise it is set to~0. No exceptions occur, even if \$Y, \$Z, or~rE is |
a signaling NaN. |
|
\smallskip\noindent |
Exercise: What floating point numbers does \.{FCMPE} regard |
as $\sim0.0$ with respect to |
$\epsilon=1/2$, when no exceptions arise? \ Answer: Zero, subnormal |
numbers, and normal numbers with $f=0$. |
(The numbers similar to zero with respect to~$\epsilon$ are zero, |
subnormal numbers with $f\le2\epsilon$, normal numbers with $f\le2\epsilon-1$, |
and $\pm\infty$ if $\epsilon\ge1$.) |
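
The neighborhood definitions can be made concrete with exact rational
arithmetic. The following Python sketch is a simplification under stated
assumptions: it handles finite operands only (no NaNs or infinities), it
expects $\epsilon$ as a nonnegative \.{Fraction}, and every function name
is hypothetical.

```python
import math
import sys
from fractions import Fraction

def radius(u, eps):
    """Half-width of N_eps(u) for a finite double u, as an exact fraction."""
    if u == 0:
        return Fraction(0)                    # N_eps(0) = {0}
    if abs(u) < sys.float_info.min:           # u is subnormal
        return Fraction(2) ** -1021 * eps
    _, ex = math.frexp(u)                     # u = m * 2**ex, 0.5 <= |m| < 1
    return Fraction(2) ** ex * eps            # same as 2**(e-1022) * eps

def in_nbhd(x, u, eps):
    """True if x lies in N_eps(u)."""
    return abs(Fraction(x) - Fraction(u)) <= radius(u, eps)

def precedes(u, v, eps):
    """u < N_eps(v) and N_eps(u) < v."""
    fu, fv = Fraction(u), Fraction(v)
    return fu < fv - radius(v, eps) and fu + radius(u, eps) < fv

def similar(u, v, eps):
    return in_nbhd(u, v, eps) or in_nbhd(v, u, eps)

def approx(u, v, eps):
    return in_nbhd(u, v, eps) and in_nbhd(v, u, eps)

def fcmpe(u, v, eps):
    """-1, 0, or +1 as FCMPE would report when no exception arises."""
    if precedes(u, v, eps):
        return -1
    if precedes(v, u, eps):
        return 1
    return 0
```

With $\epsilon=1/2$ this agrees with the exercise: 1.0 (a normal number with
$f=0$) and every subnormal number are similar to zero, but 1.5 is not.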
|
@ The IEEE standard also defines 32-bit floating point quantities, which |
it calls ``single format'' numbers. \MMIX\ calls them {\it short floats}, |
@^short float@> |
and converts between 32-bit and 64-bit forms when such numbers are |
loaded from memory or stored into memory. A short float consists of a sign |
bit followed by an 8-bit exponent and a 23-bit fraction. After it has |
been loaded into one of\/ \MMIX's registers, its 52-bit fraction part |
will have 29 trailing zero bits, and its exponent~$e$ will be one of the |
256 values 0, $(01110000001)_2=897$, $(01110000010)_2=898$, \dots, |
$(10001111110)_2=1150$, or~2047, unless it was subnormal; a subnormal |
short float loads into a normal number with $874\le e\le896$. |
|
\bull\<LDSF \$X,\$Y,\0 `load short float'.\> |
@.LDSF@> |
Register~X is set to the 64-bit floating point number corresponding |
to the 32-bit floating point number represented by |
$\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$. |
No arithmetic exceptions occur, not even if a signaling NaN is loaded. |
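
The claims about exponents and trailing zero bits are easy to verify on an
ordinary computer. In this Python sketch the \.{struct} module stands in for
the widening that \.{LDSF} performs; the function names are assumptions, and
NaN payloads are deliberately not examined because the host's conversion
may alter them.

```python
import struct

def ldsf(word):
    """Widen a 32-bit float pattern to the 64-bit pattern of the same value."""
    value = struct.unpack(">f", struct.pack(">I", word))[0]
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def exponent(bits):
    """The 11-bit exponent field e of a 64-bit pattern."""
    return (bits >> 52) & 0x7FF

def trailing_29(bits):
    """The 29 least significant fraction bits."""
    return bits & ((1 << 29) - 1)
```

The smallest and largest normal short floats load with $e=897$ and $e=1150$,
while the smallest and largest subnormal short floats load with $e=874$ and
$e=896$.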
|
\bull\<STSF \$X,\$Y,\0 `store short float'.\> |
@.STSF@> |
The value obtained by rounding register~X to a 32-bit floating |
point number is placed in $\mm_4[\rY+\rZ]$ or $\mm_4[\rY+\zz]$. |
Rounding is done with the current rounding mode, in a manner |
exactly analogous to the standard conventions for rounding 64-bit results, |
except that the precision and exponent range are limited. In particular, |
floating overflow, underflow, and inexact exceptions might occur; |
a signaling NaN will trigger an invalid exception and it will become quiet. |
The fraction part of a NaN is truncated if necessary to a multiple of |
$2^{-23}$, by ignoring the least significant 29 bits. |
|
If we load any two short floats and operate on them once with either \.{FADD}, |
\.{FSUB}, \.{FMUL}, \.{FDIV}, \.{FREM}, \.{FSQRT}, or \.{FINT}, and if we then |
store the result as a short float, we obtain the results required by |
the IEEE standard for single format arithmetic, because |
the double format can be shown to have enough precision to avoid any |
problems of ``double rounding.'' But programmers are usually better |
off sticking |
to 64-bit arithmetic unless they have a strong reason to emulate the |
precise behavior of a 32-bit computer; 32 bits do not offer |
much precision. |
|
@ Of course we need to be able to go back and forth between integers and |
floating point values. |
|
\bull\<FIX \$X,\$Z `convert floating to fixed'.\> |
@.FIX@> |
The floating point number in register~Z is converted to an integer |
as with the \.{FINT} instruction, and the resulting integer (mod~$2^{64}$) |
is placed in register~X\null. |
An invalid exception occurs if \$Z is infinite |
or a NaN; in that case \$X is simply set equal to~\$Z\null. A float-to-fix |
exception occurs if the result is less than |
@^float-to-fix exception@> |
@^short float@> |
$-2^{63}$ or greater than $2^{63}-1$. |
|
\bull\<FIXU \$X,\$Z `convert floating to fixed unsigned'.\> |
@.FIXU@> |
This instruction is identical to \.{FIX} except that no float-to-fix |
exception occurs. |
|
\bull\<FLOT \$X,\0 `convert fixed to floating'.\> |
@.FLOT@> |
The integer in \$Z or the immediate constant~Z is |
converted to the nearest floating point value (using the current rounding |
mode) and placed in register~X\null. A floating inexact exception |
occurs if rounding is necessary. |
|
\bull\<FLOTU \$X,\0 `convert fixed to floating unsigned'.\> |
@.FLOTU@> |
\.{FLOTU} is like \.{FLOT}, but \$Z is treated as an unsigned integer. |
|
\bull\<SFLOT \$X,\0 `convert fixed to short float'; |
\<SFLOTU \$X,\0 `convert fixed to short float unsigned'.\> |
@.SFLOT@> |
@.SFLOTU@> |
The \.{SFLOT} instructions are like the \.{FLOT} instructions, except that |
they round to a floating point number whose fraction part is a multiple |
of $2^{-23}$. (Thus, the resulting value will not be changed by a ``store |
short float'' instruction.) Such conversions appear in \MMIX's repertoire only |
to establish complete conformance with the IEEE standard; a programmer |
needs them only when emulating a 32-bit machine. |
@^emulation@> |
|
@ Since the variants of \.{FIX} and \.{FLOT} involve only one input operand (\$Z |
or~Z), their Y~field is normally zero. A programmer can, however, force the |
mode of rounding used with these commands by setting |
$$\vbox{\halign{$\yy=#$,\quad &\.{ROUND\_#};\hfil\cr |
1&OFF\cr |
2&UP\cr |
3&DOWN\cr |
4&NEAR\cr}}$$ |
for example, the instruction \<FLOTU \$X,ROUND\_OFF,\$Z will set the |
exponent~$e$ of register~X to $1086-l$ if \$Z is a nonzero quantity with |
$l$~leading zero bits. Thus we can count leading zeros by continuing |
with \.{SETL}~\.{\$0,1086}; \.{SR}~\.{\$X,\$X,52}; \.{SUB}~\.{\$X,\$0,\$X}; |
\.{CSZ}~\.{\$X,\$Z,64}. |
@^counting leading zeros@> |
@.FLOT@> |
@.FLOTU@> |
@.SFLOT@> |
@.SFLOTU@> |
@.FIX@> |
@.FIXU@> |
@:ROUND_OFF}\.{ROUND\_OFF@> |
@:ROUND_UP}\.{ROUND\_UP@> |
@:ROUND_DOWN}\.{ROUND\_DOWN@> |
@:ROUND_NEAR}\.{ROUND\_NEAR@> |
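
The leading-zero trick can be checked in Python. Since Python's own
integer-to-float conversion rounds to nearest, this sketch first truncates
the operand to 53 significant bits in order to imitate \.{ROUND\_OFF}; that
step and the function names are assumptions of the illustration.

```python
import struct

def flotu_round_off(z):
    """Bit pattern of unsigned 64-bit z as a double, rounded toward zero."""
    excess = max(z.bit_length() - 53, 0)
    truncated = (z >> excess) << excess     # now exactly representable
    return struct.unpack(">Q", struct.pack(">d", float(truncated)))[0]

def leading_zeros(z):
    """Count leading zero bits of a 64-bit quantity via the exponent field."""
    if z == 0:
        return 64                           # CSZ $X,$Z,64
    e = flotu_round_off(z) >> 52            # SR $X,$X,52
    return 1086 - e                         # SETL $0,1086; SUB $X,$0,$X
```

Rounding toward zero matters here: with rounding to nearest, an operand such
as $2^{64}-1$ would round up to $2^{64}$ and the exponent would be one too large.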
|
The Y field can also be used in the same way |
to specify any desired rounding mode in the other |
floating point instructions that have only a single operand, namely |
\.{FSQRT} and~\.{FINT}. |
@.FSQRT@> |
@.FINT@> |
An illegal instruction interrupt occurs if Y exceeds~4 in any of these |
commands. |
@^illegal instructions@> |
|
@* Subroutine linkage. |
\MMIX\ has several special operations designed to facilitate the process of |
calling and implementing subroutines. The key notion is the idea of a |
hardware-supported {\it register stack}, which can coexist with a |
software-supported stack of variables that are not maintained in registers. |
From a programmer's standpoint, \MMIX\ maintains a potentially unbounded list |
$S[0]$, $S[1]$, \dots,~$S[\tau-1]$ of octabytes holding the contents |
of registers that are temporarily inaccessible; initially $\tau=0$. |
When a subroutine is entered, registers can be ``pushed'' on to the end of |
this list, increasing~$\tau$; when the subroutine has finished its |
execution, the registers are ``popped'' off again and $\tau$~decreases. |
|
Our discussion so far has treated all 256 registers \$0, \$1, \dots,~\$255 as if |
they were alike. But in fact, \MMIX\ maintains two internal one-byte counters |
$L$ and~$G$, where $0\le\ll\le\gg<256$, with the property that |
$$\vbox{\halign{#\hfil\cr |
registers 0, 1, \dots, $\ll-1$ are ``local'';\cr |
registers @!|L|, $\ll+1$, \dots, $\gg-1$ are ``marginal'';\cr |
registers @!|G|, $\gg+1$, \dots, 255 are ``global.''\cr}}$$ |
A marginal register is zero when its value is read. |
@^illegal instructions@> |
@^rG@> |
@^rL@> |
@^local registers@> |
@^marginal registers@> |
@^global registers@> |
@^register stack@> |
|
The $G$ counter is normally set to a fixed value once and for all when a program |
is loaded, thereby defining the number of program variables that will live |
entirely in registers rather than in memory during the course of execution. |
A programmer may, however, change~$G$ dynamically using the \.{PUT} |
instruction described below. |
|
The $L$ counter starts at 0. If an instruction places a value into a register |
that is currently marginal, namely a register $x$ such that |
$\ll\le x<\gg$, the value of~$L$ will increase to $x+1$, and any |
newly local registers will be zero. For example, if $\ll=10$ and |
$\gg=200$, the instruction \<ADD \$5,\$15,1 would simply set \$5 to~1. But the |
instruction \<ADD \$15,\$5,\$200 would set \$10, \$11, \dots,~\$14 to zero, |
\$15 to $\$5+\$200$, and $L$~to~16. (The process of clearing registers and |
increasing~$L$ might take quite a few machine cycles in the worst case. We will |
see later that \MMIX\ is able to take care of any high-priority interrupts |
that might occur during this time.) |
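
The rules for local, marginal, and global registers can be modeled in a few
lines. This Python sketch is an illustration only; the list \.{regs} and the
two function names are assumptions, and nothing here reflects how an actual
implementation stores its registers.

```python
def read_register(regs, L, G, x):
    """A marginal register reads as zero; others give their contents."""
    return 0 if L <= x < G else regs[x]

def write_register(regs, L, G, x, value):
    """Store value in $x and return the new L (regs has 256 entries)."""
    if L <= x < G:                 # $x is marginal
        for j in range(L, x):
            regs[j] = 0            # newly local registers become zero
        L = x + 1
    regs[x] = value
    return L
```

With $L=10$ and $G=200$, writing \$5 leaves $L$ unchanged, whereas writing
\$15 clears \$10 through \$14 and raises $L$ to~16, as in the example above.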
|
\bull\<PUSHJ \$X,@@+4*YZ[-262144] `push registers and jump'. |
\bul\<PUSHGO \$X,\$Y,\0 `push registers and go'.\> |
@.PUSHGO@> |
@.PUSHJ@> |
Suppose first that $\xx<\ll$. |
Register~X is set equal to the number~X, then |
registers 0, 1, \dots,~X are pushed onto the register stack as |
described below. |
If this instruction is in |
location $\lambda$, the value $\lambda+4$ is placed into the special {\it |
return-jump register\/}~rJ\null. Then control jumps to instruction |
@^rJ@> |
$\lambda+4\rm YZ$ or $\lambda+4\rm YZ-262144$ or |
$\rY+\rZ$ or $\rY+\zz$, as in a |
\.{JMP} or \.{GO} command. |
|
Pushing the first $\xx+1$ registers onto the stack means essentially that we |
set $S[\tau]\gets\$0$, $S[\tau+1]\gets\$1$, \dots, $S[\tau+\xx]\gets\$\xx$, |
$\tau\gets\tau+\xx+1$, $\$0\gets\$(\xx+1)$, \dots, |
$\$(\ll-\xx-2)\gets\$(\ll-1)$, $\ll\gets\ll-\xx-1$. For example, if |
$\xx=1$ and $\ll=5$, the current contents of \$0 and the number~1 are |
placed on the register stack, where they will be temporarily inaccessible. |
Then control jumps to a subroutine with $L$ reduced to~3; the registers that we |
had been calling \$2, \$3, and \$4 appear as \$0, \$1, and \$2 to the subroutine. |
|
If $\ll\le\xx<\gg$, the value of $\ll$ increases to $\xx+1$ as described |
above; then the rules for $\xx<\ll$ apply. |
|
If $\xx\ge\gg$ the actions are similar, except that {\it all\/} of the local |
registers \$0, \dots,~$\$(\ll-1)$ are placed on the register stack |
followed by the number~$L$, and $L$~is reset to zero. In particular, the |
instruction \<PUSHGO \$255,\$Y,\$Z pushes all the local registers |
onto the stack and sets $L$ to zero, regardless of the previous value of~$L$. |
|
We will see later that \MMIX\ is able to achieve the effect of pushing and |
renaming local registers without actually doing very much work at all. |
|
\bull\<POP X,YZ `pop registers and return from subroutine'.\> |
@.POP@> |
This command preserves X of the current local registers, |
undoes the effect of the most recent \.{PUSHJ} or \.{PUSHGO}, and jumps |
to the instruction in $\mm_4[{\rm4YZ+rJ}]$. If $\xx>0$, the value of |
$\$(\xx-1)$ goes into the ``hole'' position where \.{PUSHJ} or |
\.{PUSHGO} stored the number of registers previously pushed. |
|
The formal details of \.{POP} are slightly complicated, but we will see that |
they make sense: If $\xx>\ll$, we first replace X by $\ll+1$. Then we |
set $x\gets S[\tau-1]\bmod 256$; this is the effective value of the X~field |
in the push instruction that is being undone. Stack position $S[\tau-1]$ is |
now set to $\$(\xx-1)$ if $0<\xx\le L$, otherwise it is set to zero. |
Then we essentially set |
$\ll\gets\min(x+\xx,\gg)$, $\$(\ll-1)\gets\$(\ll-x-2)$, \dots, |
$\$(x+1)\gets\$0$, $\$x\gets S[\tau-1]$, \dots, |
$\$0\gets S[\tau-x-1]$, $\tau\gets\tau-x-1$. The operating system should |
@^operating system@> |
arrange things so that a memory-protection |
interrupt will occur if a program does more pops than pushes. |
(If $x>\gg$, these formulas don't make sense as written; we actually |
set $\$j\gets S[\tau-x-1+j]$ for $\ll>j\ge0$ in that rare case.) |
|
Suppose, for example, that a subroutine has three input parameters |
$(\$0,\$1,\$2)$ and produces two outputs $(\$0,\$1)$. If the subroutine does |
not call any other subroutines, it can simply end with \.{POP} \.{2,0}, |
because rJ will contain the return address. Otherwise it should begin by |
saving rJ, for example with the instruction \<GET \$4,rJ if it will be |
using local registers \$0 through~\$3, and it should use \<PUSHJ \$5 or |
\<PUSHGO \$5 when |
calling sub-subroutines; finally it should \<PUT rJ,\$4 before |
saying \.{POP}~\.{2,0}. To call the subroutine from another routine that |
has, say, 6~local registers, we would put the input arguments into \$7, \$8, |
and~\$9, then issue the command \.{PUSHGO} \.{\$6,base,Subr}; |
in due time the outputs of the subroutine will appear in \$7 and~\$6. |
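
The pushing and popping rules can be traced with a toy model. The Python
sketch below follows the formulas given above but makes several
simplifying assumptions: global registers and rJ are ignored, the rare case
$x>G$ is not handled, and the class and method names are hypothetical.

```python
class RegisterStack:
    """Toy model of the local registers and the register stack S."""

    def __init__(self, G=255):
        self.G = G
        self.local = []          # $0 .. $(L-1)
        self.S = []              # S[0] .. S[tau-1]

    def set(self, x, value):
        """Write $x (x < G), growing L with zeros if $x was marginal."""
        if x >= len(self.local):
            self.local.extend([0] * (x + 1 - len(self.local)))
        self.local[x] = value

    def push(self, X):
        """PUSHJ or PUSHGO with the given X field."""
        if X >= self.G:
            self.S.extend(self.local)
            self.S.append(len(self.local))   # the number L goes last
            self.local = []
            return
        self.set(X, X)                       # the hole records the number X
        self.S.extend(self.local[:X + 1])
        self.local = self.local[X + 1:]      # $(X+1).. are renamed $0..

    def pop(self, X):
        """POP X,YZ, without the jump."""
        L = len(self.local)
        if X > L:
            X = L + 1
        x = self.S[-1] % 256                 # X field of the matching push
        self.S[-1] = self.local[X - 1] if 0 < X <= L else 0
        restored = self.S[-(x + 1):]
        del self.S[-(x + 1):]
        kept = self.local[:X - 1] if X > 0 else []
        self.local = (restored + kept)[:min(x + X, self.G)]
```

Replaying the example, a caller with six local registers that sets \$7, \$8,
\$9 and pushes with $\xx=6$ finds, after the subroutine says \.{POP}~\.{2,0},
the subroutine's \$1 in \$6 and its \$0 in \$7.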
|
Notice that the push and pop commands make use of a one-place ``hole'' in the |
register stack, between the registers that are pushed down and the registers |
that remain local. (The hole is position \$6 in the example just considered.) |
\MMIX\ needs this hole position to remember the number of |
registers that are pushed down. |
A subroutine with no outputs ends with \<POP 0,0 and the hole disappears |
(becomes marginal). A subroutine with one output~\$0 ends with \<POP 1,0 and |
the hole gets the former value of~\$0. A subroutine with two outputs |
$(\$0,\$1)$ ends with \<POP 2,0 and the hole gets the former value of~\$1; in |
this case, therefore, the relative order of the two outputs has been switched |
on the register stack. If a subroutine has, say, five outputs |
$(\$0,\ldots,\$4)$, it ends with \<POP 5,0 and \$4~goes into the hole position, |
where it is followed by $(\$0,\$1,\$2,\$3)$. |
\MMIX\ makes this curious permutation in the case of multiple outputs because |
the hole is most easily plugged by moving one value down (namely~\$4) instead |
of by sliding each of five values down in the stack. |
|
These conventions for parameter passing are admittedly a bit confusing in the |
general case, and I~suppose people who use them extensively might someday find |
themselves talking about ``the infamous \MMIX\ register shuffle.'' However, |
there is good use for subroutines that convert |
a sequence of register contents like $(x,a,b,c)$ into $(f,a,b,c)$ where |
$f$ is a function of $a$, $b$, and $c$ but not~$x$. Moreover, |
\.{PUSHGO} and \.{POP} can be implemented with great efficiency, |
and subroutine linkage tends to be a significant bottleneck when |
other conventions are used. |
|
Information about a subroutine's calling conventions needs to be communicated |
to a debugger. That can readily be done at the same time as we inform the |
debugger about the symbolic names of addresses in memory. |
|
A subroutine that uses 50 local registers will not function properly if it is |
called by a program that sets $G$ less than~50. \MMIX\ does not allow the |
value of~$G$ to become less than~32. Therefore any subroutine that avoids |
global registers and uses at most~32 local registers |
can be sure to work properly regardless of the current value of~$G$. |
|
The rules stated above imply that a \.{PUSHJ} or |
\.{PUSHGO} instruction with $\xx=255$ pushes all of the currently defined |
local registers onto the stack and sets $L$ to~zero. |
This makes $G$ local registers available for use by the subroutine |
jumped~to. If that subroutine later returns with \.{POP} \.{0,0}, the former |
value of~$L$ and the former contents of \$0, \dots,~$\$(\ll-1)$ will be |
restored (assuming that $G$ doesn't decrease). |
|
A \.{POP} instruction with $\xx=255$ |
preserves all the local registers as outputs of |
the subroutine (provided that the total doesn't exceed~$G$ after popping), |
and puts zero into the hole (unless $L=G=255$). The best policy, however, is |
almost always to use \.{POP} with a small value of~X, and in general to keep |
the value of~$L$ as small as |
possible by decreasing it when registers are no longer active. |
A smaller value of~$L$ means that \MMIX\ can change context more |
easily when switching from one process to another. |
|
@* System considerations. |
High-performance implementations of\/ \MMIX\ gain speed by keeping {\it |
caches\/} of instructions and data that are likely to be needed as computation |
@^caches@> |
proceeds. [See M.~V. Wilkes, {\sl IEEE Transactions\/ \bf EC-14} (1965), |
270--271; J.~S. Liptay, {\sl IBM System J. \bf7} (1968), 15--21.] |
@^Wilkes, Maurice Vincent@> |
@^Liptay, John S.@> |
Careful programmers can make the computer run even faster by giving |
hints about how to maintain such caches. |
|
\bull\<LDUNC \$X,\$Y,\0 `load octa uncached'.\> |
@.LDUNC@> |
These instructions, which have the same meaning as \.{LDO}, also |
inform the computer that the loaded octabyte (and its neighbors in a cache |
block) will probably not be read or written in the near future. |
|
\bull\<STUNC \$X,\$Y,\0 `store octa uncached'.\> |
@.STUNC@> |
These instructions, which have the same meaning as \.{STO}, also |
inform the computer that the stored octabyte (and its neighbors in a cache |
block) will probably not be read or written in the near future. |
|
\bull\<PRELD X,\$Y,\0 `preload data'.\> |
@.PRELD@> |
These instructions have no effect on registers or memory, but they inform the |
computer that many of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through |
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$, |
will probably be loaded and/or stored in the near future. |
No protection failure occurs if the memory is not accessible. |
|
\bull\<PREGO X,\$Y,\0 `prefetch to go'.\> |
@.PREGO@> |
These instructions have no effect on registers or memory, but they inform the |
computer that many of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through |
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$, |
will probably be used as instructions in the near future. |
No protection failure occurs if the memory is not accessible. |
|
\bull\<PREST X,\$Y,\0 `prestore data'.\> |
@.PREST@> |
These instructions have no effect on registers or memory if the computer has |
no data cache. But when such a cache exists, they inform the |
computer that all of the $\xx+1$ bytes $\mm[\rY+\rZ]$ through |
$\mm[\rY+\rZ+\xx]$, or $\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$, |
will definitely be stored in the near future before they are loaded. |
(Therefore it is permissible for the machine to ignore the present contents of |
those bytes. Also, if those bytes are being shared by several processors, |
the current processor should try to acquire exclusive access.) |
No protection failure occurs if the memory is not accessible. |
|
\bull\<SYNCD X,\$Y,\0 `synchronize data'.\> |
@.SYNCD@> |
When executed from nonnegative locations, these instructions have no effect on |
registers or memory if neither a write buffer nor a ``write back'' |
data cache is present. But when such a buffer or cache exists, they force the |
computer to make sure that all data for the $\xx+1$ bytes |
$\mm[\rY+\rZ]$ through $\mm[\rY+\rZ+\xx]$, or |
$\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$, |
will be present in memory. |
(Otherwise the result of a previous store instruction might appear only |
in the cache; the computer is being told that now is the time to |
write the information back, if it hasn't already been written. A program |
can use this feature before outputting directly from memory.) |
No protection failure occurs if the memory is not accessible. |
|
The action is similar when \.{SYNCD} is executed from a negative address, |
but in this case the specified bytes are also removed from the data |
cache (and from a secondary cache, if present). The operating system can |
use this feature when a page of virtual memory is being swapped out, |
or when data is input directly into memory. |
@^operating system@> |
|
\bull\<SYNCID X,\$Y,\0 `synchronize instructions and data'.\> |
@.SYNCID@> |
When executed from nonnegative locations these instructions have no effect on |
registers or memory if the computer has no instruction cache separate from a |
data cache. But when such a cache exists, they force the |
computer to make sure that the $\xx+1$ bytes |
$\mm[\rY+\rZ]$ through $\mm[\rY+\rZ+\xx]$, or |
$\mm[\rY+\zz]$ through $\mm[\rY+\zz+\xx]$, |
will be interpreted correctly |
if used as instructions before they are next modified. |
(Generally speaking, an \MMIX\ program is not expected to store anything in |
memory locations that are also being used as instructions. |
Therefore \MMIX's instruction cache is allowed to become inconsistent with |
respect to its data cache. Programmers who insist on executing instructions |
that have been fabricated dynamically, for example when setting a breakpoint |
for debugging, must first \.{SYNCID} those instructions |
in order to guarantee that the intended results will be obtained.) A \.{SYNCID} |
command might be implemented in several ways; for example, the machine |
might update its instruction cache to agree with its data cache. A simpler |
solution, which is good enough because the need for \.{SYNCID} ought to |
be rare, removes instructions in the specified range |
from the instruction cache, if |
present, so that they will have to be fetched from memory the next time |
they are needed; in this case the machine also carries out the effect of |
a~\.{SYNCD} command. |
No protection failure occurs if the memory is not accessible. |
|
The behavior is more drastic, but faster, when \.{SYNCID} is executed |
from a negative location. Then all bytes in the specified range are |
simply removed from all caches, and the memory corresponding to |
any ``dirty'' cache blocks involving such bytes is {\it not\/} brought up |
to date. An operating system can use this version of the command |
when pages of virtual memory are being discarded (for example, when |
a program is being terminated). |
|
@ \MMIX\ is designed to work not only on a single processor but also |
in situations where several processors |
share a common memory. The following commands are useful |
for efficient operation in such circumstances. |
|
\bull\<CSWAP \$X,\$Y,\0 `compare and swap octabytes'.\> |
@.CSWAP@> |
If the octabyte $\mm_8[\rY+\rZ]$ or $\mm_8[\rY+\zz]$ |
is equal to the contents of the special {\it prediction register\/}~rP, |
@^rP@> |
it is replaced in memory with the contents of register~X, and |
register~X is set equal to~1. Otherwise the octabyte in memory |
replaces rP and register~X is set to zero. |
This is an atomic (indivisible, uninterruptible) operation, |
useful for interprocess communication |
when independent computers are sharing the same memory. |
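
The register and memory effects of \.{CSWAP} amount to the following. This
Python sketch is hedged accordingly: a dictionary stands in for memory, the
function names are assumptions, and the essential atomicity of the real
instruction is not modeled at all.

```python
def cswap(memory, address, rP, x_value):
    """Return (new rP, new $X) after a compare-and-swap at the address."""
    if memory[address] == rP:
        memory[address] = x_value    # prediction was right: store $X
        return rP, 1
    return memory[address], 0        # prediction was wrong: reload rP

def increment(memory, address):
    """A typical retry loop built on cswap (hypothetical usage)."""
    rP = memory[address]
    while True:
        rP, ok = cswap(memory, address, rP, rP + 1)
        if ok:
            return rP + 1
```

When the swap fails, rP receives the current memory contents, so the next
attempt in the loop starts from an up-to-date prediction.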
|
The compare-and-swap operation was introduced by IBM in late |
models of the |
@^IBM Corporation@> |
@^compare-and-swap@> |
@^atomic instruction@> |
System/370 architecture, and it soon spread to several |
@^System/370@> |
other machines. Significant ways to use it are discussed, for example, |
in section 7.2.3 of Harold Stone's |
{\sl High-Performance Computer Architecture\/} (Reading, Massachusetts:\ |
Addison--Wesley, 1987), and in sections 8.2 and 8.3 of {\sl Transaction |
Processing\/} by Jim Gray and Andreas Reuter (San Francisco:\ Morgan |
Kaufmann, 1993). % Kaufmann: stet |
@^Stone, Harold Stuart@> |
@^Gray, James Nicholas@> |
@^Reuter, Andreas Horst@> |
|
\bull\<SYNC XYZ `synchronize'.\> |
@.SYNC@> |
If $\rm XYZ=0$, the machine drains its pipeline (that is, it |
stalls until all preceding instructions have completed their activity). |
If $\rm XYZ=1$, the machine controls its actions less drastically, |
in such a way that all |
store instructions preceding this \.{SYNC} will be completed |
before all store instructions after it. |
If $\rm XYZ=2$, the machine controls its actions in such a way that all |
load instructions preceding this \.{SYNC} will be completed |
before all load instructions after it. |
If $\rm XYZ=3$, the machine controls its actions |
in such a way that all {\it load or store\/} instructions preceding this |
\.{SYNC} will be completed before all load or store instructions after it. |
If $\rm XYZ=4$, the machine goes into a power-saver mode, in which |
@^power-saver mode@> |
instructions may be executed more slowly (or not at all) until some kind |
of ``wake-up'' signal is received. |
If $\rm XYZ=5$, the machine empties its write buffer and |
cleans its data caches, if any (including a possible secondary cache); |
the caches retain their data, |
but the cache contents also appear in memory. |
If $\rm XYZ=6$, the machine clears its virtual address translation |
caches (see below). |
If $\rm XYZ=7$, the machine clears its instruction and data caches, |
discarding any information in the data caches that wasn't previously |
in memory. (``Clearing'' is stronger than ``cleaning''; a clear cache |
remembers nothing. Clearing is also faster, because it simply obliterates |
everything.) |
If $\rm XYZ>7$, an illegal instruction interrupt occurs. |
|
Of course no \.{SYNC} is necessary between a command that loads from or stores |
into memory and a subsequent command that loads from or stores into exactly |
the same location. However, \.{SYNC} might be necessary in certain cases even |
on a one-processor system, because input/output processes take place in |
parallel with ordinary computation. |
|
The cases $\rm XYZ>3$ are {\it privileged}, in the sense that |
only the operating system can use them. More precisely, if a \.{SYNC} |
command is encountered with $\rm XYZ=4$ or |
$\rm XYZ=5$ or $\rm XYZ=6$ or $\rm XYZ=7$, |
a ``privileged instruction interrupt'' occurs unless that interrupt |
is currently disabled. Only the operating system can disable |
interrupts (see below). |
@^privileged operations@> |
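
The cases of \.{SYNC} can be summarized as a small dispatch table. This Python
sketch merely restates the rules above; the strings, the function name, and
the flag \.{privileged\_interrupt\_enabled} are assumptions of the illustration.

```python
SYNC_ACTIONS = {
    0: "drain the pipeline",
    1: "order stores before later stores",
    2: "order loads before later loads",
    3: "order loads and stores before later loads and stores",
    4: "enter power-saver mode",
    5: "empty the write buffer and clean the data caches",
    6: "clear the virtual address translation caches",
    7: "clear the instruction and data caches",
}

def sync(xyz, privileged_interrupt_enabled=True):
    """Describe what SYNC XYZ does under the rules stated above."""
    if xyz > 7:
        return "illegal instruction interrupt"
    if xyz > 3 and privileged_interrupt_enabled:
        return "privileged instruction interrupt"
    return SYNC_ACTIONS[xyz]
```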
|
@* Trips and traps. |
Special register rA records the current status information |
about arithmetic exceptions. Its least significant byte contains eight |
``event'' bits called DVWIOUZX from left to right, where D stands for |
integer divide check, V~for integer overflow, W~for float-to-fix overflow, |
I~for invalid operation, O~for floating overflow, U~for |
floating underflow, Z~for floating division by zero, and X~for floating |
inexact. % The low order five bits agree with SPARC I conventions |
% but Alpha, for example, uses the order VXUOZI |
The next least significant byte of rA contains eight |
``enable'' bits with the same names DVWIOUZX and the same meanings. |
When an exceptional condition occurs, there are two cases: If the |
corresponding enable bit is~0, the corresponding event bit is set |
to~1. But if the corresponding enable bit is~1, \MMIX\ interrupts |
its current instruction stream and executes a special ``exception |
handler.'' Thus, the event bits record exceptions that have not been |
``tripped.'' |
@^overflow@> |
@^underflow@> |
@^exceptions@> |
@^handlers@> |
@^float-to-fix exception@> |
@^inexact exception@> |
@^invalid exception@> |
@^divide check exception@> |
|
Floating point overflow always causes two exceptions, O and~X\null. |
(The strictest interpretation of the IEEE standard would raise exception~X |
on overflow only if floating overflow is not enabled, but \MMIX\ always |
considers an overflowed result to be inexact.) |
Floating point underflow always causes both U and~X when underflow is |
not enabled, and it might cause both U and~X when underflow is enabled. |
If both enable bits are set to~1 in such cases, the overflow or underflow |
handler is called and the inexact handler is ignored. All other types |
of exceptions arise one at a time, so there is no ambiguity about which |
exception handler should be invoked unless exceptions are raised by |
``ropcode~2'' (see below); in general the first enabled exception |
in the list DVWIOUZX takes precedence. |
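
The enable-bit test and the precedence rule can be sketched in~C as follows.
This is only an illustration under stated assumptions, not a definitive
implementation: the names |raise_exceptions| and |outcome| are hypothetical,
and the sketch leaves untouched any enabled exception that loses the
precedence contest, a case that the rules above do not spell out.

```c
#include <stdint.h>

/* Event bits of rA, named DVWIOUZX from left to right. */
enum { X_BIT = 0x01, Z_BIT = 0x02, U_BIT = 0x04, O_BIT = 0x08,
       I_BIT = 0x10, W_BIT = 0x20, V_BIT = 0x40, D_BIT = 0x80 };

typedef struct {
  uint64_t rA;    /* rA after the untripped exceptions are recorded */
  unsigned trip;  /* the one exception bit whose handler runs, or 0 */
} outcome;

/* exc holds the exceptions raised by one instruction (8 bits). */
static outcome raise_exceptions(uint64_t rA, unsigned exc) {
  unsigned enable = (unsigned)(rA >> 8) & 0xff;
  unsigned bit;
  outcome r;
  exc &= 0xff;
  r.rA = rA | (exc & ~enable);           /* enable bit 0: set the event bit */
  r.trip = 0;
  for (bit = D_BIT; bit != 0; bit >>= 1) /* scan D, V, W, I, O, U, Z, X */
    if (exc & enable & bit) { r.trip = bit; break; }
  return r;
}
```

With rA zero, a floating overflow gives |trip==0| and sets the O and~X event
bits; with both O and~X enabled, only the overflow handler is selected.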
|
What about the six high-order bytes of the status register rA? |
@^rA@> |
@^rounding modes@> |
At present, only two of those 48 bits are defined; |
the others must be zero for compatibility |
with possible future extensions. The two bits corresponding to $2^{17}$ |
and $2^{16}$ in rA specify a rounding mode, as follows: 00~means |
round to nearest (the default); 01~means round off (toward zero); |
10~means round up (toward positive infinity); and |
11~means round down (toward negative infinity). |
% Alpha conventions differ: 10,00,11,01 for nearest,off,up,down |
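
A minimal C sketch of how software might pick the rounding mode out of~rA
appears below. The names |round_mode|, |rounding_mode|, and |rA_is_legal| are
hypothetical; the only facts used are the bit positions $2^{17}$ and $2^{16}$
and the rule that the other high-order bits must be zero.

```c
#include <stdint.h>

typedef enum {
  ROUND_NEAR = 0,  /* 00: round to nearest (the default) */
  ROUND_OFF  = 1,  /* 01: round off (toward zero) */
  ROUND_UP   = 2,  /* 10: round up (toward positive infinity) */
  ROUND_DOWN = 3   /* 11: round down (toward negative infinity) */
} round_mode;

/* Extract the two rounding-mode bits, worth 2^17 and 2^16, from rA. */
static round_mode rounding_mode(uint64_t rA) {
  return (round_mode)((rA >> 16) & 3);
}

/* The remaining 46 high-order bits of rA must be zero. */
static int rA_is_legal(uint64_t rA) {
  return (rA >> 18) == 0;
}
```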
|
@ The execution of\/ \MMIX\ programs can be interrupted in several ways. |
We have just seen that arithmetic exceptions will cause interrupts if |
they are enabled; so will illegal or privileged instructions, or instructions |
@^illegal instructions@> |
@^privileged operations@> |
@^emulation@> |
@^interrupts@> |
@^I/O@> |
@^input/output@> |
that are emulated in software instead of provided by the hardware. |
Input/output operations or external timers are another common source |
of interrupts; the operating system knows how to deal with |
all gadgets that might be hooked up to an \MMIX\ processor chip. |
Interrupts occur also when memory accesses fail---for example if |
memory is nonexistent or protected. |
Power failures that force the machine to use its backup battery power |
in order to keep running in an emergency, |
or hardware failures like parity errors, |
all must be handled as gracefully as possible. |
|
Users can also force interrupts to happen by giving explicit \.{TRAP} or |
\.{TRIP} instructions: |
|
\bull\<TRAP X,Y,Z `trap'; \<TRIP X,Y,Z `trip'.\> |
@.TRIP@> |
@.TRAP@> |
Both of these instructions interrupt processing and transfer control |
to a handler. The difference between them is that \.{TRAP} |
is handled by the operating system but \.{TRIP} is handled by the user. |
@^operating system@> |
More precisely, the X, Y, and Z fields of \.{TRAP} have special significance |
predefined by the operating system kernel. For example, a system call---say an I/O |
command, or a command to allocate more memory---might be invoked |
by certain settings of X, Y, and~Z\null. |
The X, Y, and Z fields of \.{TRIP}, on the other hand, are definable by |
users for their own applications, and users also define their own |
handlers. ``Trip handler'' programs |
invoked by \.{TRIP} are interruptible, but interrupts are normally inhibited |
while a \.{TRAP} is being serviced. Specific details about the |
precise actions of \.{TRIP} and \.{TRAP} appear below, together |
with the description of another command called \.{RESUME} that |
returns control from a handler to the interrupted program. |
|
Only two variants of \.{TRAP} are predefined by the \MMIX\ architecture: |
If $\rm XYZ=0$ in a \.{TRAP} |
command, a user process should terminate. If $\rm XYZ=1$, |
the operating system should provide default action for cases in which |
the user has not provided any handler for a particular |
kind of interrupt (see below). |
|
A few additional variants of \.{TRAP} are predefined in the rudimentary |
operating system used with \MMIX\ simulators. These variants, which |
allow simple input/output operations to be done, all have $\xx=0$, |
and the Y~field is a small positive constant. For example, $\yy=1$ invokes |
the \.{Fopen} routine, which opens a file. (See the program |
{\mc MMIX-SIM} for full details.) |
@^I/O@> |
@^input/output@> |
|
@ Non-catastrophic interrupts in \MMIX\ are always {\it precise}, in the sense that all legal |
instructions before a certain point have effectively been executed, and |
no instructions after that point have yet been executed. The current |
instruction, which may or may not have been completed at the time of |
interrupt and which may or may not need to be resumed after the interrupt has |
been serviced, is |
put into the special {\it execution register\/}~rX, and its operands (if any) |
are placed in special registers rY and~rZ\null. The address of the following |
instruction is placed in the special {\it where-interrupted |
register\/}~rW\null. |
@^interrupts@> |
@^rW@> |
@^rX@> |
@^rY@> |
@^rZ@> |
The instruction in~rX may not be the same as the instruction in |
location $\rm rW-4$; for example, it may be an instruction that |
branched or jumped to~rW\null. It might also be an instruction |
inserted internally by the \MMIX\ processor. |
(For example, the computer silently inserts an internal instruction |
that increases~$L$ before an instruction |
like \<ADD \$9,\$1,\$0 if $L$~is currently less than~10. If an interrupt
occurs between the inserted instruction and the \.{ADD},
the instruction in~rX will |
say \.{ADD}, because an internal instruction retains the identity of the |
actual command that spawned it; but rW will point to the {\it real\/} |
\.{ADD} command.) |
|
When an instruction has the normal meaning ``set \$X to |
the result of \$Y~op~\$Z'' or ``set \$X to the result of \$Y~op~Z,'' |
special registers rY and~rZ will relate in the |
obvious way to the Y and~Z operands of the instruction; but this is not |
always the case. For example, after an interrupted |
store instruction, the first operand~rY will hold |
the virtual memory address (\$Y plus either \$Z or~Z), |
and the second operand~rZ will be the octabyte to be stored in memory |
(including bytes that have not changed, in cases like \.{STB}). In |
other cases the actual |
contents of rY and~rZ are defined by each implementation of\/ \MMIX, |
and programmers should not rely on their significance. |
|
Some instructions take an unpredictable and possibly long amount of time, so |
it may be necessary to interrupt them in progress. For example, the \.{FREM} |
@.FREM@> |
instruction (floating point remainder) is extremely difficult to compute |
rapidly if its first operand has an exponent of~2046 and its second operand |
has an exponent of~1. In such cases the rY and rZ registers saved during an |
interrupt show the current state of the computation, not necessarily the |
original values of the operands. The value of $\rm rY\,{rem}\,rZ$ will still |
be the desired remainder, but rY may well have been reduced to a |
number that has an exponent closer to the exponent of~rZ\null. |
After the interrupt has been processed, the remainder |
computation will continue where it left off. |
(Alternatively, an operation like \.{FREM} or even \.{FADD} might be |
implemented in software instead of hardware, as we will see later.) |
|
Another example arises with an instruction like \.{PREST} (prestore), which can |
@.PREST@> |
specify prestoring up to 256 bytes. An implementation of\/ \MMIX\ might choose |
to prestore only 32 or 64 bytes at a time, depending on the cache block size; |
then it can change the contents of rX to reflect the unfinished part of |
a partially completed \.{PREST} command. |
|
Commands that decrease $G$, pop the stack, save the |
current context, or unsave an old context also are interruptible. Register~rX |
is used to communicate information about partial completion in such a |
way that the interruption will be essentially ``invisible'' after |
a program is resumed. |
|
@ Three kinds of interruption are possible: trips, forced traps, and |
dynamic traps. We will discuss each of these in turn. |
@^interrupts@> |
@^trips@> |
@^traps@> |
@^forced traps@> |
@^dynamic traps@> |
@^handlers@> |
@^operating system@> |
|
A \.{TRIP} instruction puts itself into the right half of the execution |
@.TRIP@> |
register~rX, and sets the 32 bits of the left half to \Hex{80000000}. |
(Therefore rX is {\it negative\/}; this fact will |
tell the \.{RESUME} command not to \.{TRIP} again.) The special registers |
rY and rZ are set to the contents of the registers specified by the |
Y and Z fields of the \.{TRIP} command, namely \$Y and~\$Z. |
Then \$255 is placed into the special {\it bootstrap |
register\/}~rB, and \$255 is set to~rJ. \MMIX\ now takes its next instruction |
@^rB@> |
from virtual memory address~0. |
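
The register traffic of a \.{TRIP} can be summarized by the following C
sketch. It is an illustration only; the structure |trip_state| and the
function |do_trip| are hypothetical names, and rW is set according to the
rule stated earlier for precise interrupts (the address of the following
instruction).

```c
#include <stdint.h>

typedef struct {
  uint64_t rB, rJ, rW, rX, rY, rZ; /* special registers involved in a trip */
  uint64_t g255;                   /* general register $255 */
  uint64_t pc;                     /* address of the next instruction */
} trip_state;

/* inst is the 32-bit TRIP instruction found at address loc;
   y and z are the current contents of $Y and $Z. */
static void do_trip(trip_state *s, uint32_t inst, uint64_t loc,
                    uint64_t y, uint64_t z) {
  s->rW = loc + 4;                             /* the following instruction */
  s->rX = ((uint64_t)0x80000000 << 32) | inst; /* left half #80000000 */
  s->rY = y;
  s->rZ = z;
  s->rB = s->g255;                             /* $255 goes to rB ... */
  s->g255 = s->rJ;                             /* ... and rJ goes to $255 */
  s->pc = 0;                                   /* handler at address 0 */
}
```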
|
Arithmetic exceptions interrupt the computation in essentially the |
same way as \.{TRIP}, if they are enabled. The only difference is that |
their handlers begin at the respective addresses |
16, 32, 48, 64, 80, 96, 112, and~128, for exception bits D, V, W, I, O, U, |
Z, and~X of~rA; registers rY and~rZ are set to the operands of the |
interrupted instruction as explained earlier. |
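
The correspondence between exception bits and handler addresses is simply
``16 times the position in DVWIOUZX, counting from~1.'' A small C sketch,
with the hypothetical name |handler_address| and the invented convention
that an argument which is not a single exception bit yields~0:

```c
/* Map one exception bit of rA (D = 0x80 down to X = 0x01) to the address
   where its trip handler begins: 16 for D, 32 for V, ..., 128 for X.
   Returns 0 if bit is not exactly one of the eight exception bits. */
static unsigned handler_address(unsigned bit) {
  unsigned k, probe;
  for (k = 1, probe = 0x80; probe != 0; k++, probe >>= 1)
    if (bit == probe) return 16 * k;
  return 0;
}
```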
|
A 16-byte block of memory is just enough for a sequence of commands like |
$$\hbox{\tt PUSHJ 255,Handler; PUT rJ,\$255; GET \$255,rB; RESUME}$$ |
which will invoke a user's handler. And if the user does not choose to |
provide a custom-designed handler, the operating system provides a |
default handler via the instructions |
$$\hbox{\tt TRAP 1; GET \$255,rB; RESUME.}$$ |
|
A trip handler might simply record the fact that tripping occurred. |
But the handler for an arithmetic interrupt might want to change the |
default result of a computation. In such cases, the handler should place |
the desired substitute result into~rZ, and it should change the most |
significant byte of~rX from \Hex{80} to \Hex{02}. This will have the desired |
effect, because of the rules of \.{RESUME} explained below, {\it unless\/} |
the exception occurred on a command like \.{STB} or \.{STSF}. (A~bit more |
work is needed to alter the effect of a command that stores into memory.) |
|
Instructions in {\it negative\/} virtual locations do not invoke trip |
handlers, either for \.{TRIP} or for arithmetic exceptions. Such instructions |
are reserved for the operating system, as we will see. |
@^negative locations@> |
|
@ A \.{TRAP} instruction interrupts the computation essentially |
@^interrupts@> |
like \.{TRIP}, but with the following modifications: |
@^rT@> |
@.TRAP@> |
@^rK@> |
(i)~the interrupt mask register~rK is cleared |
to zero, thereby inhibiting interrupts; (ii)~control jumps to virtual memory |
address~rT, not zero; (iii)~information is placed |
@^rBB@> |
@^rWW@> |
@^rXX@> |
@^rYY@> |
@^rZZ@> |
in a separate set of special registers rBB, rWW, rXX, rYY, and~rZZ, instead of |
rB, rW, rX, rY, and~rZ\null. (These special registers are needed because a trap |
might occur while processing a \.{TRIP}.) |
|
Another kind of forced trap occurs on implementations of\/ \MMIX\ that |
emulate certain instructions in software rather than in hardware. |
Such instructions cause a \.{TRAP} even though their opcode is something |
else like \.{FREM} or \.{FADD} or \.{DIV}. The trap handler can tell |
what instruction to emulate by looking at the opcode, which appears |
in~rXX\null. |
In such cases the left-hand half of~rXX is set to \Hex{02000000}; the handler |
emulating \.{FADD}, say, should compute the floating point sum of rYY and~rZZ |
and place the result in~rZZ\null. A~subsequent |
\.{RESUME}~\.1 will then place the value of~rZZ in the proper register. |
@^emulation@> |
@^forced traps@> |
|
Implementations of\/ \MMIX\ might also emulate the process of |
virtual-address-to-physical-address translation described below, |
instead of providing for page table calculations in hardware. |
Then if, say, a \.{LDB} instruction does not know the physical memory |
address corresponding to a specified virtual address, it will cause |
a forced trap with the left half of~rXX set to \Hex{03000000} and with |
rYY set to the virtual address in question. The trap handler should |
place the physical page address into~rZZ; then \.{RESUME}~\.1 will |
complete~the~\.{LDB}. |
|
@ The third and final kind of interrupt is called a {\it dynamic\/} trap. |
@^interrupts@> |
@^dynamic traps@> |
Such interruptions occur when one or more of the 64 bits in the |
special {\it interrupt request register\/}~rQ have been set to~1, |
@^rQ@> |
@^rK@> |
and when at least one corresponding bit of the special |
{\it interrupt mask register\/}~rK is also equal to~1. The bit positions |
of rQ and~rK have the general form |
$$\beginword |
&\field{24}{24}&&\field88&&\field{24}{24}&&\field88\cr |
\noalign{\hrule} |
\\&low-priority I/O&\\&program&\\&high-priority I/O&\\&machine&\\\cr |
\noalign{\hrule}\endword$$ |
where the 8-bit ``program'' bits are called \.{rwxnkbsp} and have |
the following meanings: |
$$\vbox{\halign{\.# bit: &#\hfil\cr |
r&instruction tries to load from a page without read permission;\cr |
w&instruction tries to store to a page without write permission;\cr |
x&instruction appears in a page without execute permission;\cr |
n&instruction refers to a negative virtual address;\cr |
k&instruction is privileged, for use by the ``kernel'' only;\cr |
b&instruction breaks the rules of\/ \MMIX;\cr |
s&instruction violates security (see below);\cr |
p&instruction comes from a privileged (negative) virtual address.\cr}}$$ |
Negative addresses are for the use of the operating system only; |
@^operating system@> |
@^protection bits@> |
@^permission bits@> |
@^security violation@> |
@^privileged instructions@> |
@^illegal instructions@> |
@^page fault@> |
a security violation occurs if an instruction in a nonnegative address |
is executed without the \.{rwxnkbsp} bits of~rK all set to~1. |
(In such cases the \.s bits of both rQ and~rK are set to~1.) |
|
The eight ``machine'' bits of rQ and rK represent the most urgent |
kinds of interrupts. The rightmost bit stands for power failure, |
the next for memory parity error, the next for nonexistent memory, |
the next for rebooting, etc. |
Interrupts that need especially quick service, like requests from |
a high-speed network, also are allocated bit positions near the right end. |
Low priority I/O devices like keyboards are assigned to bits at the left. |
The allocation of input/output devices to bit positions will |
differ from implementation to implementation, depending on |
what devices are available. |
@^I/O@> |
@^input/output@> |
|
Once $\rm rQ\land rK$ becomes nonzero, the machine waits |
briefly until it can give a precise interrupt. |
Then it proceeds as with a forced trap, |
except that it uses the special ``dynamic |
trap address register''~rTT instead of~rT. The trap handler that |
@^rTT@> |
begins at location~rTT can figure out the reason for interrupt by |
examining $\rm rQ\land rK$. (For example, after the instructions |
$$\hbox spread-10pt{\tt\spaceskip .5em minus .1em |
GET \$0,rQ; LDOU \$1,savedK; AND \$0,\$0,\$1; SUBU \$1,\$0,1; |
SADD \$2,\$1,\$0; ANDN \$1,\$0,\$1}$$ |
the highest-priority offending bit (the rightmost~1 of $\rm rQ\land rK$)
will be in \$1 and its position, counting from~0 at the right, will be
in~\$2.)
@^counting trailing zeros@> |
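
For readers who prefer~C, the same bit manipulation can be sketched as
follows. The function names |sideways_add| and |most_urgent| are
hypothetical, and the argument |pending| stands for $\rm rQ\land rK$, which
is assumed to be nonzero.

```c
#include <stdint.h>

/* Number of 1 bits in x, the quantity that SADD computes. */
static unsigned sideways_add(uint64_t x) {
  unsigned n = 0;
  while (x) { x &= x - 1; n++; }
  return n;
}

/* pending = rQ & rK, assumed nonzero.  Returns the rightmost 1 bit and
   stores its position (0 = least significant) in *pos. */
static uint64_t most_urgent(uint64_t pending, unsigned *pos) {
  uint64_t below = pending - 1;          /* SUBU $1,$0,1 */
  *pos = sideways_add(below & ~pending); /* SADD $2,$1,$0 */
  return pending & ~below;               /* ANDN $1,$0,$1 */
}
```

The subtraction turns the trailing zeros of |pending| into ones and clears
the rightmost~1, so counting the ones of |below & ~pending| counts the
trailing zeros.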
|
If the interrupted instruction contributed 1s to any of the \.{rwxnkbsp} bits |
of~rQ, the corresponding bits are set to~1 also in~rXX\null. A~dynamic trap |
handler might be able to use this information (although it should |
service higher-priority interrupts first if the right half |
of $\rm rQ\land rK$ is nonzero). |
@^rX@> |
|
The rules of\/ \MMIX\ are rigged |
so that only the operating system can execute instructions |
with interrupts suppressed. Therefore the operating system can in fact |
use instructions that would interrupt an ordinary program. Control of |
register rK turns out to be the ultimate privilege, and in a sense the |
only important one. |
@^privileged operations@> |
|
An instruction that causes a dynamic trap is usually executed before the |
interruption occurs. However, an instruction that traps with |
bits \.x, \.k, or \.b does nothing; a load instruction that traps |
with \.r or \.n loads zero; a store instruction that traps with any |
of \.{rwxnkbsp} stores nothing. |
|
@ After a trip handler or trap handler has done its thing, it |
generally invokes the following command. |
|
\bull\<RESUME Z `resume after interrupt'; the X and Y fields must be zero.\> |
@.RESUME@> |
@^interrupts@> |
@^handlers@> |
If the Z field of this instruction is zero, |
\MMIX\ will use the |
information found in special registers rW, rX, rY, and~rZ to restart an |
@^rW@> |
@^rX@> |
@^rY@> |
@^rZ@> |
@^rBB@> |
@^rWW@> |
@^rXX@> |
@^rYY@> |
@^rZZ@> |
@^rK@> |
interrupted computation. If the execution register rX is negative, it will be |
ignored and instructions will be executed starting at virtual address~rW\null; |
otherwise the instruction in the right half of the execution register will be |
inserted into the program as if it had appeared in location $\rm rW-4$, |
subject to certain modifications that we will explain momentarily, |
and the {\it next\/} instruction will come from rW. |
|
If the Z field of \.{RESUME} |
is 1 and if this instruction appears in a negative location, |
registers rWW, rXX, rYY, and~rZZ are used instead of rW, rX, rY, and~rZ\null. |
Also, just before resuming the computation, |
mask register rK is set to \$255 and \$255 is set to rBB\null. |
(Only the operating system gets to use this feature.) |
@^operating system@> |
|
An interrupt handler within the operating system might choose to allow itself |
to be interrupted. In such cases it should save the contents of |
rBB, rWW, rXX, rYY, and~rZZ on some kind of stack, before making rK nonzero. |
Then, before resuming whatever caused the base level interrupt, it |
must again disable all interrupts; this can be done with \.{TRAP}, |
because the trap handler can tell from the virtual address in~rWW that |
it has been invoked by the operating system. Once rK is again zero, |
the contents of rBB, rWW, rXX, rYY, and~rZZ are restored from the stack, |
the outer level interrupt mask is placed in \$255, and \<RESUME 1 |
finishes the job. |
|
Values of Z greater than 1 are reserved for possible later |
definition. Therefore they cause an illegal instruction interrupt (that |
is, they set the `\.b' bit of~rQ) in the present version of\/ \MMIX. |
@^illegal instructions@> |
|
If the execution register rX is nonnegative, its leftmost byte controls |
the way its right-hand half will be inserted into the program. |
Let's call this byte the ``ropcode.'' A ropcode of~0 simply |
inserts the instruction into the execution stream; a ropcode of~1 |
is similar, but it substitutes rY and rZ for the |
two operands, assuming that this makes sense for the operation considered. |
@^ropcodes@> |
|
Ropcode~2 inserts a command that sets \$X to rZ, where |
X~is the second byte in the right half of rX\null. |
This ropcode is normally used with forced-trap emulations, so that the result |
of an emulated instruction is placed into the correct register. |
It also uses the third-from-left byte of~rX to raise any or all of the |
arithmetic exceptions DVWIOUZX, at the same time as rZ is |
being placed in \$X. Emulated instructions and |
explicit \.{TRAP} commands can therefore cause overflow, say, |
just as ordinary instructions can. |
(Such new exceptions may, of |
course, spawn a trip interrupt, if any of the corresponding bits are enabled |
in~rA.) |
@^rA@> |
@^emulation@> |
|
Finally, ropcode 3 is the same as ropcode 0, except that it also |
tells \MMIX\ to treat rZ as the page table entry for the virtual |
address~rY\null. (See the discussion of virtual address translation below.) |
Ropcodes greater than~3 are not permitted; moreover, |
only \<RESUME 1 is allowed to use ropcode~3. |
|
The ropcode rules in the previous paragraphs should of course be understood to |
involve rWW, rXX, rYY, and rZZ instead of rW, rX, rY, and~rZ when |
the ropcode is seen by \.{RESUME}~\.1. Thus, in particular, ropcode~3 |
always applies to rYY and~rZZ, never to rY and~rZ. |
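
The fields of the execution register that \.{RESUME} inspects can be picked
apart as in the following C sketch. The names |resume_info| and |decode_rX|
are hypothetical, and the sketch checks only the rules stated so far
(ropcodes above~3 are rejected, and ropcode~3 requires \.{RESUME}~\.1); the
same code applies to rXX when the Z~field is~1.

```c
#include <stdint.h>

typedef struct {
  int skip;          /* 1 if rX is negative: just continue at rW */
  unsigned ropcode;  /* leftmost byte of rX (meaningful when skip is 0) */
  uint32_t inst;     /* right half of rX, the instruction to insert */
  unsigned x;        /* second byte of inst, the X field (ropcode 2) */
  unsigned exc;      /* third-from-left byte of rX: DVWIOUZX (ropcode 2) */
} resume_info;

/* Returns 0 on success, -1 if the ropcode is not permitted.
   resume_z is the Z field of the RESUME instruction (0 or 1). */
static int decode_rX(uint64_t rX, unsigned resume_z, resume_info *out) {
  out->skip = (rX >> 63) != 0;
  out->ropcode = (unsigned)(rX >> 56);
  out->inst = (uint32_t)rX;
  out->x = (out->inst >> 16) & 0xff;
  out->exc = (unsigned)(rX >> 40) & 0xff;
  if (out->skip) return 0;
  if (out->ropcode > 3) return -1;
  if (out->ropcode == 3 && resume_z != 1) return -1;
  return 0;
}
```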
|
Special restrictions must hold if resumption is to work properly: Ropcodes |
0~and~3 must not insert a \.{RESUME} instruction; ropcode~1 must insert |
a ``normal'' instruction, namely one whose opcode begins with |
one of the hexadecimal digits \Hex{0}, \Hex{1}, \Hex{2}, \Hex{3}, \Hex{6}, |
\Hex{7}, \Hex{C}, \Hex{D}, or~\Hex{E}. (See the opcode chart below.) |
Some implementations may also allow ropcode~1 with \.{SYNCD[I]} |
and \.{SYNCID[I]}, so that those instructions can conveniently be |
interrupted. |
Moreover, the destination register \$X used with ropcode 1 or~2 must |
not be marginal. All of these restrictions hold automatically in normal |
use; they are relevant only if the programmer tries to do something tricky. |
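
The test for a ``normal'' instruction depends only on the leading
hexadecimal digit of the opcode, so it amounts to one table lookup. Here is
a C sketch with the hypothetical name |is_normal_opcode|; it does not model
the optional \.{SYNCD[I]} and \.{SYNCID[I]} cases.

```c
/* Does the opcode begin with one of the hexadecimal digits
   0, 1, 2, 3, 6, 7, C, D, or E? */
static int is_normal_opcode(unsigned op) {
  static const unsigned short allowed =
      (1u << 0x0) | (1u << 0x1) | (1u << 0x2) | (1u << 0x3) | (1u << 0x6) |
      (1u << 0x7) | (1u << 0xC) | (1u << 0xD) | (1u << 0xE);
  return (allowed >> ((op >> 4) & 0xf)) & 1;
}
```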
|
Notice that the slightly tricky sequence |
$$\hbox{\tt LDA \$0,Loc; PUT rW,\$0; LDTU \$1,Inst; PUT rX,\$1; RESUME}$$ |
will execute an almost arbitrary instruction \.{Inst} as if it had been in |
location \.{Loc-4}, and then will jump to location \.{Loc} (assuming |
that \.{Inst} doesn't branch elsewhere). |
|
@* Special registers. |
@^special registers@> |
Quite a few special registers have been mentioned so far, and \MMIX\ actually |
has even more. It is time now to enumerate them all, together with their |
internal code numbers: |
$$\vbox{\halign{\hfil#,\quad&#;\hfil\cr |
rA&arithmetic status register [21]\cr |
rB&bootstrap register (trip) [0]\cr |
rC&cycle counter [8]\cr |
rD&dividend register [1]\cr
rE&epsilon register [2]\cr |
rF&failure location register [22]\cr |
rG&global threshold register [19]\cr |
rH&himult register [3]\cr |
rI&interval counter [12]\cr |
rJ&return-jump register [4]\cr |
rK&interrupt mask register [15]\cr |
rL&local threshold register [20]\cr |
rM&multiplex mask register [5]\cr |
rN&serial number [9]\cr |
rO&register stack offset [10]\cr
rP&prediction register [23]\cr |
rQ&interrupt request register [16]\cr |
rR&remainder register [6]\cr |
rS&register stack pointer [11]\cr
rT&trap address register [13]\cr |
rU&usage counter [17]\cr |
rV&virtual translation register [18]\cr |
rW&where-interrupted register (trip) [24]\cr |
rX&execution register (trip) [25]\cr |
rY&Y operand (trip) [26]\cr |
rZ&Z operand (trip) [27]\cr |
rBB&bootstrap register (trap) [7]\cr |
rTT&dynamic trap address register [14]\cr |
rWW&where-interrupted register (trap) [28]\cr |
rXX&execution register (trap) [29]\cr |
rYY&Y operand (trap) [30]\cr |
rZZ&Z operand (trap) [31]\cr}}$$ |
@^rG@> |
@^rL@> |
In this list rG and rL are what we have been calling simply $G$ and $L$; \ |
rC, rF, rI, rN, rO, rS, rU, and~rV have not been mentioned before. |
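
Software that works with code numbers rather than names may find it
convenient to have the same list arranged by code. The following C table is
merely the list above re-sorted; the array name |special_name| is an
assumption made for this illustration.

```c
/* Special register names indexed by internal code number 0..31. */
static const char *const special_name[32] = {
  "rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB",   /*  0.. 7 */
  "rC", "rN", "rO", "rS", "rI", "rT", "rTT", "rK",   /*  8..15 */
  "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",    /* 16..23 */
  "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ" /* 24..31 */
};
```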
|
@ The {\it cycle counter\/}~rC advances by~1 on every ``clock pulse'' of the |
@^rC@> |
\MMIX\ processor. Thus if \MMIX\ is running at 500 MHz, the cycle |
counter increases every 2 nanoseconds. There is no need to worry about |
rC overflowing; even if it were to increase once every nanosecond, |
it wouldn't reach $2^{64}$ until more than 584.55 years have gone by. |
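
The figure just quoted can be checked by a short calculation, assuming a
year of 365.2425 days, namely 31,556,952 seconds (an assumption; the
length of year is not stated above):
$$2^{64}\,{\rm ns}\approx18{,}446{,}744{,}073.7\,{\rm s},\qquad
{18{,}446{,}744{,}073.7\over31{,}556{,}952}\approx584.554.$$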
|
The {\it interval counter\/}~rI is similar, but it {\it decreases\/} |
@^rI@> |
by~1 on each cycle, and causes an {\it interval interrupt\/} |
when it reaches zero. Such interrupts can be extremely useful for |
``continuous profiling'' as a means of studying |
the empirical running time of programs; |
see Jennifer~M. Anderson, Lance~M. Berc, Jeffrey Dean, Sanjay Ghemawat, |
Monika~R. Henzinger, Shun-Tak~A. Leung, Richard~L. Sites, Mark~T. Vandevoorde, |
Carl~A. Waldspurger, and William~E. Weihl, {\sl ACM Transactions on Computer |
Systems\/ \bf15} (1997), 357--390. |
The interval interrupt is achieved by setting the leftmost bit of the |
``machine'' byte of~rQ equal to~1; this is the eighth-least-significant bit. |
@^rQ@> |
@^continuous profiling@> |
@^performance monitoring@> |
@^Anderson, Jennifer-Ann Monique@> |
@^Berc, Lance Michael@> |
@^Dean, Jeffrey Adgate@> |
@^Ghemawat, Sanjay@> |
@^Henzinger, Monika Hildegard Rauch@> |
@^Leung, Shun-Tak Albert@> |
@^Sites, Richard Lee@> |
@^Vandevoorde, Mark Thierry@> |
@^Waldspurger, Carl Alan@> |
@^Weihl, William Edward@> |
|
The {\it usage counter\/}~rU consists of three fields $(u_p,u_m,u_c)$, |
@^rU@> |
called the usage pattern~$u_p$, the usage mask~$u_m$, |
and the usage count~$u_c$. The most significant byte of~rU is the usage |
pattern; the next most significant byte is the usage mask; and |
the remaining 48 bits are the usage count. Whenever an instruction whose |
${\rm OP}\land u_m=u_p$ has been executed, the value of $u_c$ increases by~1 |
(mod~$2^{48}$). |
Thus, for example, the OP-code chart below implies that |
all instructions are counted if $u_p=u_m=0$; |
all loads and stores are counted together with \.{GO} and \.{PUSHGO} |
if $u_p=(10000000)_2$ and $u_m=(11000000)_2$; |
all floating point instructions are counted together with fixed point |
multiplications and divisions if $u_p=0$ and $u_m=(11100000)_2$; |
fixed point multiplications and divisions alone are counted if |
$u_p=(00011000)_2$ and $u_m=(11111000)_2$; completed subroutine calls |
are counted if $u_p=\.{POP}$ and $u_m=(11111111)_2$. |
Instructions in negative locations, which belong to the operating system, |
are exceptional: They are included in the usage count only if the leading bit |
of $u_c$ is~1. |
@^negative locations@> |
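
The counting rule can be expressed in a few lines of~C. This sketch uses the
hypothetical name |usage_step| and takes the byte layout of~rU exactly as
described above; it processes one instruction at a time, although a real
pipeline might finish several at once.

```c
#include <stdint.h>

#define UC_MASK 0xffffffffffffULL  /* the 48-bit usage count */

/* Return the new value of rU after an instruction with opcode op has been
   executed; negative_loc is nonzero for instructions in negative locations. */
static uint64_t usage_step(uint64_t rU, unsigned op, int negative_loc) {
  unsigned up = (unsigned)(rU >> 56);         /* usage pattern */
  unsigned um = (unsigned)(rU >> 48) & 0xff;  /* usage mask */
  uint64_t uc = rU & UC_MASK;                 /* usage count */
  if ((op & um) != up) return rU;             /* pattern does not match */
  if (negative_loc && (uc >> 47) == 0) return rU; /* leading bit of u_c is 0 */
  uc = (uc + 1) & UC_MASK;                    /* increase mod 2^48 */
  return (rU & ~UC_MASK) | uc;
}
```

For instance, with $u_p=(10000000)_2$ and $u_m=(11000000)_2$ an opcode of
\Hex{80} is counted but \Hex{20} is not.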
|
Incidentally, the 64-bit counters rC and rI can be implemented rather cheaply with |
only two levels of logic, using an old trick called ``carry-save addition'' |
[see, for example, G.~Metze and J.~E. Robertson, {\sl Proc.\ International |
Conf.\ Information Processing\/} (Paris:\ 1959), 389--396]. One nice |
embodiment of this idea is to |
@^Metze, Gernot@> |
@^Robertson, James Evans@> |
@^carry-save addition@> |
represent a binary number~$x$ in a redundant form as the difference $x'-x''$ |
of two binary numbers. Any two such numbers can be added without carry |
propagation as follows: Let |
$$f(x,y,z)= |
(x\land\bar y)\lor(x\land z)\lor(\bar y\land z), \qquad |
% ((x\oplus y)\land(x\oplus z))\oplus z, \qquad |
g(x,y,z)=x\oplus y\oplus z.$$ |
Then it is easy to check that $x-y+z=2f(x,y,z)-g(x,y,z)$; we need only verify |
this in the eight cases when $x$, $y$, and~$z$ are 0 or~1. |
Thus we can subtract~1 from a counter $x'-x''$ by setting |
$$(x',x'')\gets(f(x',x'',-1)\LL1,\;g(x',x'',-1));$$ |
we can add~1 by setting |
$(x',x'')\gets(g(x'',x',-1),f(x'',x',-1)\LL1)$. |
The result is zero if and only if |
$x'=x''$. We need not actually compute the difference $x'-x''$ until |
we need to examine the register. The computation |
of $f(x,y,z)$ and $g(x,y,z)$ is particularly simple in the special |
cases $z=0$ and $z=-1$. A similar trick works for~rU, |
but extra care is needed in that case |
because several instructions might finish at the same time. |
(Thanks to Frank Yellin for his improvements to this paragraph.) |
@^Yellin, Frank Nathan@> |
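
The redundant counter can be tried out in software with the following C
sketch, in which 64-bit words play the role of $x'$ and~$x''$ and all
arithmetic is mod~$2^{64}$. The names |csa|, |csa_dec|, |csa_inc|,
|csa_value|, and |csa_is_zero| are hypothetical; hardware would of course
evaluate $f$ and~$g$ with gates, not with C~operators.

```c
#include <stdint.h>

typedef struct { uint64_t xp, xpp; } csa; /* represents xp - xpp (mod 2^64) */

#define MINUS_ONE (~(uint64_t)0)

static uint64_t f(uint64_t x, uint64_t y, uint64_t z) {
  return (x & ~y) | (x & z) | (~y & z);
}

static uint64_t g(uint64_t x, uint64_t y, uint64_t z) {
  return x ^ y ^ z;
}

static csa csa_dec(csa c) {  /* subtract 1 without carry propagation */
  csa r;
  r.xp = f(c.xp, c.xpp, MINUS_ONE) << 1;
  r.xpp = g(c.xp, c.xpp, MINUS_ONE);
  return r;
}

static csa csa_inc(csa c) {  /* add 1 without carry propagation */
  csa r;
  r.xp = g(c.xpp, c.xp, MINUS_ONE);
  r.xpp = f(c.xpp, c.xp, MINUS_ONE) << 1;
  return r;
}

/* Only this step needs a true subtraction with carries. */
static uint64_t csa_value(csa c) { return c.xp - c.xpp; }

static int csa_is_zero(csa c) { return c.xp == c.xpp; }
```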
|
@ The special {\it serial number register\/}~rN is permanently set to |
@^rN@> |
the time this particular instance of\/ \MMIX\ was created (measured as the |
number of seconds since 00:00:00 Greenwich Mean Time on 1~January 1970), |
in its five least significant bytes. The three most significant bytes |
are permanently set to the {\it version number\/} of the \MMIX\ architecture |
that is being implemented together with |
two additional bytes that modify the version |
number. This quantity serves as an essentially unique identification |
number for each copy of\/ \MMIX. |
@^version number@> |
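
A program that wants to inspect rN might unpack it as in the following C
sketch. The names |serial_info| and |decode_rN| are hypothetical, and the
sketch assumes that the version number proper occupies the most significant
byte, followed by its two modifying bytes in order.

```c
#include <stdint.h>

typedef struct {
  unsigned version, subversion, subsubversion; /* the three leading bytes */
  uint64_t created; /* seconds since 00:00:00 GMT on 1 January 1970 */
} serial_info;

static serial_info decode_rN(uint64_t rN) {
  serial_info s;
  s.version = (unsigned)(rN >> 56);
  s.subversion = (unsigned)(rN >> 48) & 0xff;
  s.subsubversion = (unsigned)(rN >> 40) & 0xff;
  s.created = rN & 0xffffffffffULL;  /* five least significant bytes */
  return s;
}
```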
|
Version 1.0.0 of the architecture is described in the present document. |
Version~1.0.1 is similar, but simplified to avoid the |
complications of pipelines and operating systems. |
Other versions may become necessary in the future. |
|
@ The {\it register stack offset\/}~rO and {\it register stack |
pointer\/}~rS are especially interesting, because they are used to implement |
@^register stack@> |
@^rO@> |
@^rS@> |
\MMIX's register stack~$S[0]$, $S[1]$, $S[2]$,~\dots. |
|
The operating system |
initializes a register stack by assigning a large area of virtual memory to |
each running process, beginning at an address like |
\Hex{6000000000000000}. |
If this starting address is~$\sigma$, stack entry $S[k]$ will go into |
the octabyte $\mm_8[\sigma+8k]$. Stack underflow will be detected because |
the process does not have permission to read from $\mm[\sigma-1]$. |
Stack overflow will be detected because something will give out---either |
the user's budget or the user's patience or the user's swap space---long before |
$2^{61}$~bytes of virtual memory are filled by a register stack. |
@^terabytes@> |
|
The \MMIX\ hardware maintains the register stack by having two banks |
of 64-bit general-purpose registers, one for globals and one for locals. |
The global registers $\rm g[32]$, $\rm g[33]$, \dots, $\rm g[255]$ are used for |
register numbers that are $\ge\gg$ in \MMIX\ commands; |
recall that $G$~is always 32 or more. The local |
registers come from another array that contains $2^n$ registers for |
some~$n$ where $8\le n\le10$; for simplicity of exposition |
we will assume that there are exactly 512 local |
registers, but there may be only 256 or there may be 1024. |
|
\def\l{{\rm l}} |
@^ring of local registers@> |
The local register slots l[0], l[1], \dots, l[511] act as a cyclic buffer with |
addresses that wrap around mod~512, so that $\l[512]=\l[0]$, |
$\l[513]=\l[1]$, etc. This buffer is divided into three parts by three |
pointers, which we will call $\alpha$, $\beta$, and $\gamma$. |
$$\epsfbox{mmix.1}$$ |
Registers $\l[\alpha]$, $\l[\alpha+1]$, \dots,~$\l[\beta-1]$ are |
what program instructions currently call \$0, \$1, \dots,~$\$(\ll-1)$; |
registers $\l[\beta]$, $\l[\beta+1]$, \dots,~$\l[\gamma-1]$ are currently |
unused; and registers $\l[\gamma]$, $\l[\gamma+1]$, \dots,~$\l[\alpha-1]$ |
contain items of the register stack that have been pushed down but not yet |
stored in memory. Special register~rS holds the virtual memory address where |
$\l[\gamma]$ will be stored, if necessary. Special register~rO holds the |
address where $\l[\alpha]$ will be stored; this always equals $8\tau$ plus |
the address of~$S[0]$. We can deduce the values of $\alpha$, $\beta$, |
and~$\gamma$ from the contents of rL, rO, and~rS, because |
$$\rm\alpha=(rO/8)\bmod512,\qquad \beta=(\alpha+rL)\bmod512,\qquad |
\hbox{and}\qquad \gamma=(rS/8)\bmod512.$$ |
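
These three formulas translate directly into~C. The sketch below assumes
512 local registers, as in the exposition; the names |LRING_SIZE|,
|ring_pointers|, and |ring_from_specials| are hypothetical.

```c
#include <stdint.h>

#define LRING_SIZE 512u  /* assumed number of local registers */

typedef struct { unsigned alpha, beta, gamma; } ring_pointers;

/* Deduce alpha, beta, and gamma from the contents of rO, rL, and rS. */
static ring_pointers ring_from_specials(uint64_t rO, uint64_t rL,
                                        uint64_t rS) {
  ring_pointers p;
  p.alpha = (unsigned)((rO / 8) % LRING_SIZE);
  p.beta = (unsigned)((p.alpha + rL) % LRING_SIZE);
  p.gamma = (unsigned)((rS / 8) % LRING_SIZE);
  return p;
}
```

If, say, the stack begins at \Hex{6000000000000000} and rO is $8\cdot511$
bytes beyond that address with $\rm rL=2$, we get $\alpha=511$ and
$\beta=1$, because the buffer wraps around.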
|
To maintain this situation we need to make sure that the pointers $\alpha$, |
$\beta$, and $\gamma$ never move past each other. A~\.{PUSHJ} or |
\.{PUSHGO} operation simply |
advances $\alpha$ toward~$\beta$, so it is very simple. The first part of a |
\.{POP} operation, which moves $\beta$ toward~$\alpha$, is also very simple. |
But the next part of a~\.{POP} requires $\alpha$ to move downward, and |
memory accesses might be required. \MMIX\ will decrease rS by~8 (thereby |
decreasing $\gamma$ by~1) and set $\l[\gamma]\gets\mm_8[{\rm rS}]$, |
one or more times if necessary, to keep $\alpha$ from decreasing |
past~$\gamma$. Similarly, the operation of increasing~$L$ may cause \MMIX\ to |
set $\mm_8[{\rm rS}]\gets\l[\gamma]$ and increase rS by~8 (thereby increasing |
$\gamma$ by~1) one or more times, to keep $\beta$ from increasing |
past~$\gamma$. (Actually $\beta$ is never allowed to increase to the point |
where it becomes {\it equal\/} to $\gamma$.) |
If many registers need to be loaded or stored at once, |
these operations are interruptible. |
|
[A somewhat similar scheme was introduced by David R. Ditzel and H.~R. |
McLellan in {\sl SIGPLAN Notices\/ \bf17},\thinspace4 (April 1982), 48--56, |
and incorporated in the so-called {\mc CRISP} architecture developed at |
AT{\AM}T Bell Labs. An even more similar scheme was adopted in the late 1980s |
@^AT{\AM}T Bell Laboratories@> |
@^Advanced Micro Devices@> |
by Advanced Micro Devices, in the processors of their Am29000 series---a |
family of computers whose instructions have essentially the |
format `OP~X~Y~Z' used by~\MMIX.] |
@^Ditzel, David Roger@> |
@^McLellan, Hubert Rae, Jr.@>
|
Limited versions of\/ \MMIX, having fewer registers, can also be envisioned. For |
example, we might have only 32 local registers $\l[0]$, $\l[1]$, |
\dots,~$\l[31]$ and only 32 global registers $\rm g[224]$, $\rm g[225]$, |
\dots,~$\rm g[255]$. Such a machine could run any \MMIX\ program that |
maintains the inequalities $\ll<32$ and $\gg\ge224$. |
|
@ Access to \MMIX's special registers is obtained via the \.{GET} and |
\.{PUT} commands. |
@^special registers@> |
@^rL@> |
@^rQ@> |
|
\bull\<GET \$X,Z `get from special register'; the Y field must be zero.\> |
@.GET@> |
Register X is set to the contents of the special register identified by |
its code number~Z, using the code numbers listed earlier. |
An illegal instruction interrupt occurs if $\zz\ge32$. |
|
Every special register is readable; \MMIX\ does not keep secrets from |
an inquisitive user. But of course only the operating system is allowed |
@^operating system@> |
to change registers like rK and~rQ (the interrupt mask and request |
registers). And not even the operating system is allowed to change~rC |
(the cycle counter) or rN~(the serial number) or the stack pointers |
rO~and~rS. |
|
\bull\<PUT X,\0 `put into special register'; |
@.PUT@> |
the Y field must be zero.\> |
The special register identified by~X is set to |
the contents of register Z or to the unsigned byte~Z itself, |
if permissible. Some changes are, however, impermissible: |
Bits of rA that are always zero must remain zero; the leading seven bytes |
of rG and rL must remain zero, and rL must not exceed~rG; |
special registers 8--11 (namely rC, rN, rO, and~rS) must not change; |
special registers 12--18 (namely |
rI, rK, rQ, rT, rU, rV, and~rTT) can be changed only if the privilege |
bit of rK is zero; |
and certain bits of~rQ (depending on available hardware) might not |
allow software to change them from 0 to~1. Moreover, any bits of~rQ that have |
changed from 0 to~1 since the most recent \<GET x,rQ |
will remain~1 after \.{PUT}~\.{rQ,z}. |
The \.{PUT} command will not increase~rL; it sets rL to the minimum |
of the current value and the new value. (A~program should say |
\<SETL \$99,0 instead of \<PUT rL,100 when rL is known to be less than~100.) |
|
Impermissible \.{PUT} commands cause an illegal instruction interrupt, |
or (in the case of rI, rK, rQ, rT, rU, rV, and~rTT) a privileged |
operation interrupt. |
@^illegal instructions@> |
@^privileged operations@> |
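
The register-number part of these rules can be sketched in a few lines
of~\CEE/. This is only an illustration under stated assumptions, not part of
the definition of\/ \MMIX: the names |put_status|, |classify_put|, and |put_rL|
are hypothetical; the code assumes that \.{PUT} accepts the same 32 code
numbers as \.{GET}; and the checks that depend on the new value (for rA, rG,
and~rQ) are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { PUT_OK, PUT_ILLEGAL, PUT_PRIVILEGED } put_status;

/* Classify a PUT by the code number x of the special register it targets.
   privilege_bit is the relevant bit of rK; registers 12..18 can be changed
   only if that bit is zero. Value-dependent checks are not modeled. */
put_status classify_put(unsigned x, int privilege_bit) {
  assert(x < 32);                        /* assumption: 32 codes, as for GET */
  if (x >= 8 && x <= 11)                 /* rC, rN, rO, rS never change */
    return PUT_ILLEGAL;
  if (x >= 12 && x <= 18 && privilege_bit)
    return PUT_PRIVILEGED;
  return PUT_OK;
}

/* PUT never increases rL: the result is min(current value, new value). */
uint64_t put_rL(uint64_t current_rL, uint64_t requested) {
  return requested < current_rL ? requested : current_rL;
}
```

For instance, |put_rL(5,100)| yields~5, which is why a program that wants
$L=100$ must touch register \$99 instead of saying \.{PUT}~\.{rL,100}.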
|
\bull\<SAVE \$X,0 `save process state'; |
@.SAVE@> |
@^register stack@> |
@^ring of local registers@> |
@^rO@> |
@^rS@> |
\<UNSAVE 0,\$Z `restore process state'; the Y~field must be~0, and |
so must the Z field of~\.{SAVE} and the X~field of \.{UNSAVE}.\>
@.UNSAVE@> |
The \.{SAVE} instruction stores all registers and special registers |
that might affect the computation of the currently running process. |
First the current local registers \$0, \$1, \dots,~$\$(\ll-1)$ are |
pushed down as in \.{PUSHGO}~\.{\$255}, and $L$~is set to zero. |
Then the current global |
registers $\$\gg$, $\$(\gg+1)$, \dots,~\$255 are placed above them |
in the register stack; finally |
rB, rD, rE, rH, rJ, rM, rR, rP, rW, rX, rY, and~rZ |
are placed at the very top, followed by registers rG and~rA packed |
into eight bytes: |
$$\beginword |
&\field88&&\field{24}{24}&&\field{32}{32}\cr |
\noalign{\hrule} |
\\&rG&\\&0&\\&rA&\\\cr |
\noalign{\hrule}\endword$$ |
The address of the topmost octabyte is then placed in register~X, which |
must be a global register. (This instruction is interruptible. If an |
interrupt occurs while the registers are being saved, we will have |
$\alpha=\beta=\gamma$ in the ring of local registers; |
thus rO will equal~rS and rL will be zero. The interrupt handler |
essentially has a new register stack, starting on top of the partially |
saved context.) Immediately after a \.{SAVE} the values of rO and~rS |
are equal to the location of the first byte following the stack |
just saved. The current register stack is effectively empty at this |
point; thus one shouldn't do a \.{POP} until this context |
or some other context has been unsaved. |
@^rO@> |
@^rS@> |
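
The octabyte that holds rG and~rA at the top of a saved context follows
directly from the diagram above: eight bits of~rG, 24 zero bits, and 32 bits
of~rA. Here is a minimal \CEE/ sketch of that packing; the function names
|pack_rG_rA| and |unpack_rG_rA| are hypothetical, chosen only for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack rG (8 bits) and rA (32 bits) as in the topmost octabyte of SAVE:
   rG in the leading byte, then 24 zero bits, then rA. */
uint64_t pack_rG_rA(unsigned rG, uint32_t rA) {
  assert(rG < 256);                 /* the leading seven bytes of rG are zero */
  return ((uint64_t)rG << 56) | (uint64_t)rA;
}

/* The inverse operation, as UNSAVE would need it. */
void unpack_rG_rA(uint64_t packed, unsigned *rG, uint32_t *rA) {
  *rG = (unsigned)(packed >> 56);
  *rA = (uint32_t)(packed & 0xffffffffu);
}
```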
|
The \.{UNSAVE} instruction goes the other way, restoring all the |
registers when given an address in register~Z that was returned |
by a previous \.{SAVE}. Immediately after an \.{UNSAVE} the values of |
rO and~rS will be equal. Like \.{SAVE}, this instruction is interruptible. |
|
The operating system uses \.{SAVE} and \.{UNSAVE} |
to switch context between different processes. |
It can also use \.{UNSAVE} to |
establish suitable initial values of rO and~rS\null. |
But a user program that knows what it is doing can in fact allocate its own |
register stack or stacks and do its own process switching. |
|
Caution: \.{UNSAVE} is destructive, in the sense that a program can't reliably |
\.{UNSAVE} twice from the same saved context. Once an |
\.{UNSAVE} has been done, |
further operations are likely to change the memory |
record of what was saved. Moreover, an interrupt during the middle |
of an \.{UNSAVE} may have already clobbered some of the data in memory before |
the \.{UNSAVE} has completely finished, although the data will appear |
properly in all registers. |
|
@* Virtual and physical addresses. |
Virtual 64-bit addresses are converted to physical addresses in a manner |
@^virtual addresses@> |
@^physical addresses@> |
governed by the special {\it virtual translation register\/}~rV. Thus |
@^rV@> |
$\rm M[A]$ really refers to $\rm m[\phi(A)]$, where m~is the physical |
memory array and $\phi(A)$ |
is determined by the physical mapping function~$\phi$. The details of |
this conversion are rather technical and of interest mainly to the operating |
system, but two simple rules are important to ordinary users: |
@^operating system@> |
|
\bull Negative addresses are mapped directly to physical addresses, by simply |
@^negative locations@> |
suppressing the sign bit: |
$$\phi(A)=A+2^{63}=A\land\Hex{7fffffffffffffff},\qquad |
\hbox{if $A<0$.}$$ |
{\it All accesses to negative addresses are privileged}, for use by the |
operating system only. |
@^privileged operations@> |
(Thus, for example, the trap addresses in~rT and~rTT should be negative, |
because they are addresses inside the operating system.) Moreover, all physical |
addresses $\ge2^{48}$ are intended for use by memory-mapped I/O devices; |
values read from or written to such locations are never placed in a cache. |
@^I/O@> |
@^input/output@> |
@^memory-mapped input/output@> |
|
\bull Nonnegative addresses belong to four {\it segments}, depending on |
@^segments@> |
whether the three leading bits are 000, 001, 010, or 011. These $2^{61}$-byte |
segments are traditionally used for a program's text, data, dynamic |
memory, and register stack, respectively, but such conventions are |
not mandatory. There are four mappings $\phi_0$, $\phi_1$, $\phi_2$, |
and~$\phi_3$ of 61-bit addresses into 48-bit physical memory space, one for |
each segment: |
$$\phi(A)=\phi_{\lfloor A/2^{61}\rfloor}(A\bmod2^{61}),\qquad |
\hbox{if $0\le A<2^{63}$.}$$ |
In general, the machine is able to access smaller addresses of a segment more |
efficiently than larger addresses. Thus a programmer should let each segment |
grow upward from zero, trying to keep any of the 61-bit addresses from |
becoming larger than necessary, although arbitrary addresses are legal. |
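
The two rules above amount to a little bit-twiddling on a 64-bit address.
The following \CEE/ sketch shows one way to express them; all function names
are hypothetical, and addresses are represented as unsigned 64-bit integers,
so that a ``negative'' address is one whose leading bit is~1.

```c
#include <assert.h>
#include <stdint.h>

/* Negative addresses (sign bit set) are privileged. */
int is_privileged_address(uint64_t A) { return (A >> 63) != 0; }

/* phi(A) for negative A: suppress the sign bit. */
uint64_t phi_negative(uint64_t A) {
  assert(A >> 63);
  return A & 0x7fffffffffffffffULL;
}

/* Segment number 0..3 of a nonnegative address: the bits after the sign. */
unsigned segment_of(uint64_t A) {
  assert(!(A >> 63));
  return (unsigned)(A >> 61);
}

/* The 61-bit address within the segment, A mod 2^61. */
uint64_t segment_offset(uint64_t A) { return A & ((1ULL << 61) - 1); }

/* Physical addresses >= 2^48 are intended for memory-mapped I/O. */
int is_mmio(uint64_t phys) { return phys >= (1ULL << 48); }
```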
|
@ Now it's time for the technical details of virtual address translation. |
@^segments@> |
@^virtual addresses@> |
@^physical addresses@> |
@^rV@> |
The mappings $\phi_0$, $\phi_1$, $\phi_2$, and~$\phi_3$ are defined |
by the following rules. |
\smallskip |
|
(1) The first two bytes of rV are four nybbles called $b_1$, $b_2$, $b_3$, |
$b_4$; we also define $b_0=0$. Segment~$i$ has at most $1024^{\,b_{i+1}-b_i}$ |
pages. In particular, segment~$i$ must have at most one page when |
$b_i=b_{i+1}$, and it must be entirely empty if $b_i>b_{i+1}$. |
|
(2) The next byte of rV, $s$, specifies the current {\it page size}, |
which is $2^s$ bytes. We must have $s\ge13$ (hence at least 8192~bytes |
per page). Values of~$s$ larger than, say, 20 or~so are of use only in rather |
large programs that will reside in main memory for long periods of time, |
because memory protection and swapping are applied to entire pages. |
The maximum legal value of~$s$ is~48. |
|
(3) The remaining five bytes of rV are a 27-bit {\it root location\/}~$r$, |
a 10-bit {\it address space number\/}~$n$, and a 3-bit {\it function |
field\/}~$f$: |
$$\centerline{$\hbox{rV}=\beginword |
&\field44&&\field44&&\field44&&\field44&&\field88&& |
\field{27}{27}&&\field{10}{10}&&\field33\cr |
\noalign{\hrule} |
\\&$b_1$&\\&$b_2$&\\&$b_3$&\\&$b_4$&\\&$s$&\\&$r$&\\&$n$&\\&$f$&\\\cr |
\noalign{\hrule}\endword$}$$ |
Normally $f=0$; if $f=1$, virtual address translation will be done by |
software instead of hardware, and the $b_1$, $b_2$, $b_3$, $b_4$, |
and~$r$ fields of~rV will be ignored by the hardware. |
(Values of $f>1$ are reserved for possible future use; if $f>1$ |
when \MMIX\ tries to translate an address, a memory-protection |
failure will occur.) |
@^illegal instructions@> |
|
(4) Each page has an 8-byte {\it page table entry\/} (PTE), which looks |
@^page table entry@> |
@^PTE@> |
like this: |
$$\centerline{$\hbox{PTE}=\beginword |
&\field{16}{16}&&\field{32}{48-s}&&\field3{s-13}&&\field{10}{10}&& |
\field33\cr |
\noalign{\hrule} |
\\&$x$&\\&$a$&\\&$y$&\\&$n$&\\&$p$&\\\cr |
\noalign{\hrule}\endword$}$$ |
Here $x$ and $y$ are ignored (thus they are usable for any purpose by the |
operating |
system); $2^s a$~is the physical address of byte~0 on the page; and $n$~is |
the address space number (which must match the number in~rV). The final three |
bits are the {\it protection bits\/} $p_r\,p_w\,p_x$; the user needs |
$p_r=1$ to load from this page, $p_w=1$ to store on this page, and |
$p_x=1$ to execute instructions on this page. If $n$~fails to |
match the number in~rV, or if the appropriate protection bit is zero, |
a memory-protection fault occurs. |
@^protection fault@> |
|
Page table entries should be writable only by the operating system. |
The 16 ignored bits of~$x$ imply that physical memory size is limited |
to $2^{48}$ bytes (namely 256 large terabytes); that should be enough capacity
for a while, if not for the entire new millennium.
@^terabytes@> |
|
(5) A given 61-bit address $A$ belongs to page $\lfloor A/2^s\rfloor$ of |
its segment, and |
$$\phi_i(A)=2^s\,a+(A\bmod2^s)$$ |
if $a$ is the address in the PTE for page $\lfloor A/2^s\rfloor$ of |
segment~$i$. |
|
(6) Suppose $\lfloor A/2^s\rfloor=(a_4a_3a_2a_1a_0)_{1024}$ in the |
radix-1024 number system. In the common case $a_4=a_3=a_2=a_1=0$, the |
PTE is simply the octabyte ${\rm m}_8[2^{13}(r+b_i)+8a_0]$; this rule |
defines the mapping for the first 1024 pages. The next million or~so pages are |
accessed through an auxiliary {\it page table pointer} |
@^page table pointer@> |
@^PTP@> |
$$\centerline{$\hbox{PTP}=\beginword |
&\field11&&\field{50}{50}&&\field{10}{10}&&\field33\cr |
\noalign{\hrule} |
\\&1&\\&$c$&\\&$n$&\\&$q$&\\\cr |
\noalign{\hrule}\endword$}$$ |
in ${\rm m}_8[2^{13}(r+b_i+1)+8a_1]$; here the sign bit must be~1 and the
$n$-field must match~rV, but the $q$~bits are ignored. The desired PTE for |
page $(a_1a_0)_{1024}$ is then in ${\rm m}_8[2^{13}c+8a_0]$. The next billion |
or so pages, namely the pages $(a_2a_1a_0)_{1024}$ with $a_2\ne0$, |
are accessed similarly, through an auxiliary PTP at level~two; and |
so on. |
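
Rules (1)--(5) can be made concrete with a short \CEE/ sketch that unpacks
the fields of~rV and applies a given PTE to a 61-bit address. This is an
illustration only: the type |rv_fields| and the functions |unpack_rv| and
|apply_pte| are hypothetical names, and the parameter |need| is an assumed
encoding of the required protection bits in the order $p_r\,p_w\,p_x$
(4~to load, 2~to store, 1~to execute).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  unsigned b[5];   /* b[0] = 0; b[1..4] are the four leading nybbles of rV */
  unsigned s;      /* page size is 2^s bytes */
  uint64_t r;      /* 27-bit root location */
  unsigned n;      /* 10-bit address space number */
  unsigned f;      /* 3-bit function field */
} rv_fields;

rv_fields unpack_rv(uint64_t rV) {
  rv_fields v;
  v.b[0] = 0;
  for (int i = 1; i <= 4; i++)
    v.b[i] = (unsigned)(rV >> (64 - 4 * i)) & 15;
  v.s = (unsigned)(rV >> 40) & 0xff;
  v.r = (rV >> 13) & 0x7ffffff;
  v.n = (unsigned)(rV >> 3) & 0x3ff;
  v.f = (unsigned)rV & 7;
  return v;
}

/* Compute phi_i(A) = 2^s a + (A mod 2^s) from the PTE of A's page.
   Returns 1 and stores the physical address if n matches rV and all
   protection bits in need are set; returns 0 on a protection fault. */
int apply_pte(uint64_t pte, const rv_fields *v, uint64_t A,
              unsigned need, uint64_t *phys) {
  assert(v->s >= 13 && v->s <= 48);
  uint64_t offset_mask = (1ULL << v->s) - 1;
  unsigned n = (unsigned)(pte >> 3) & 0x3ff;
  unsigned p = (unsigned)pte & 7;
  if (n != v->n || (p & need) != need) return 0;
  /* bits 47..s of the PTE are a, already in position as 2^s a */
  *phys = (pte & 0xffffffffffffULL & ~offset_mask) + (A & offset_mask);
  return 1;
}
```

The ignored fields $x$ and~$y$ of the PTE drop out automatically, because
only bits 47 through~$s$ are kept.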
|
Notice that if $b_3=b_4$, there is just one page in segment~3, and its PTE |
appears all alone in physical location $2^{13}(r+b_3)$. |
Otherwise the PTEs appear in 1024-octabyte blocks. We usually |
have $0<b_1<b_2<b_3<b_4$, but the null case $b_1=b_2=b_3=b_4=0$ is |
worthy of mention: In this special case there is only one page, and the |
segment bits of a virtual address are ignored; the other $61-s$ bits of each |
virtual address must be zero. |
|
If $s=13$, $b_1=3$, $b_2=2$, $b_3=1$, and $b_4=0$, there are at most |
$2^{30}$ pages of 8192 bytes each, all belonging to segment~0. This is |
essentially the virtual memory setup in the Alpha~21064 computers with |
{\mc DIGITAL~UNIX}$^{\rm\,TM}$. |
@^Alpha computers@> |
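
The page limits implied by rule~(1) are easy to tabulate. The following
\CEE/ fragment is a sketch with a hypothetical function name,
|log2_max_pages|; it returns the base-2 logarithm of the bound
$1024^{\,b_{i+1}-b_i}$, because the bound itself can exceed 64~bits.

```c
#include <assert.h>

/* Base-2 logarithm of the maximum number of pages in segment i,
   given b_i and b_{i+1}; -1 means the segment must be empty. */
int log2_max_pages(unsigned b_i, unsigned b_next) {
  assert(b_i < 16 && b_next < 16);       /* the b fields are nybbles */
  if (b_i > b_next) return -1;
  return 10 * (int)(b_next - b_i);       /* 1024^d = 2^(10d) */
}
```

With the Alpha-like setting $b_1=3$, $b_2=2$ this gives 30 for segment~0
and $-1$ for segment~1, in agreement with the example above.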
|
I know these rules look extremely complicated, and I sincerely wish I could |
have found an alternative that would be both simple and efficient in practice. |
I tried various schemes based on hashing, but came to the conclusion that |
``trie'' methods such as those described here are better for this application. |
Indeed, the page tables in most contemporary computers are based on very |
similar ideas, but with significantly smaller virtual addresses and without |
the shortcut for small page numbers. I tried also to find formats for rV |
and the page tables that would match byte boundaries in a more friendly way, |
but the corresponding page sizes did not work well. Fortunately these grungy |
details are almost always completely hidden from ordinary users. |
|
@ Of course \MMIX\ can't afford to perform a lengthy calculation of physical |
addresses every time it accesses memory. The machine therefore maintains a |
{\it translation cache\/} (TC), |
@^translation caches@> |
@^TC@> |
which contains the translations of recently |
accessed pages. (In fact, there usually are two such caches, |
one for instructions |
and one for data.) A~TC holds a set of 64-bit translation keys |
$$\beginword |
&\field{1.2}1&&\field22&&\field{44.8}{61-s}&&\field3{s-13}&&\field{10}{10}&& |
\field33\cr |
\noalign{\hrule} |
\\&0&\\&$i$&\\&$v$&\\&0&\\&$n$&\\&0&\\\cr |
\noalign{\hrule}\endword$$ |
associated with 38-bit translations |
$$\beginword |
&\field{32}{48-s}&&\field3{s-13}&&\field33\cr |
\noalign{\hrule} |
\\&$a$&\\&0&\\&$p$&\\\cr |
\noalign{\hrule}\endword$$ |
representing the relevant parts of the PTE for page $v$ of segment $i$. |
Different processes typically have different values of~$n$, and possibly also |
different values of~$s$. The operating system needs a way to keep such caches |
up to date when pages are being allocated, moved, swapped, or recycled. |
The operating system also likes to know which pages have been recently |
used. The \.{LDVTS} instructions facilitate such operations: |
@^protection bits@> |
@^permission bits@> |
|
\bull\<LDVTS \$X,\$Y,\0 `load virtual translation status'.\> |
@.LDVTS@> |
The sum $\rY+\rZ$ or $\rY+\zz$ should have the form of |
a translation cache key as above, |
except that the rightmost three bits need not be zero. |
If this key is present in a TC, |
the rightmost three bits replace the current protection code~$p$; |
however, if $p$ is thereby set to zero, the key is removed from |
the TC. Register~X is set to 0 if the key was not present |
in any translation cache, or to 1 if the key was present in the TC |
for instructions, or to 2 if the key was present in the TC for data, |
or to~3 if the key was present in both. This instruction is for the |
operating system only. |
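
A translation cache key can be formed from a nonnegative virtual address by
clearing the low $s$~bits and inserting the address space number. Here is a
minimal \CEE/ sketch, assuming the key layout shown above; the names
|tc_key| and |ldvts_result| are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build the TC key (0, i, v, 0, n, 0) for a nonnegative virtual address. */
uint64_t tc_key(uint64_t virt, unsigned s, unsigned n) {
  assert(!(virt >> 63) && s >= 13 && s <= 48 && n < 1024);
  return (virt & ~((1ULL << s) - 1)) | ((uint64_t)n << 3);
}

/* The value LDVTS places in register X: bit 0 for the instruction TC,
   bit 1 for the data TC. */
unsigned ldvts_result(int in_instruction_tc, int in_data_tc) {
  return (in_instruction_tc ? 1u : 0u) | (in_data_tc ? 2u : 0u);
}
```

To use such a key with \.{LDVTS}, the operating system would place the
desired new protection code in the three rightmost bits.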
|
@ We mentioned earlier that |
cheap versions of\/ \MMIX\ might calculate the physical addresses with |
@^emulation@> |
@^rV@> |
software instead of hardware, using forced traps when the operating |
system needs to do page table calculations. |
@^operating system@> |
Here is some code that could be used for |
such purposes; it defines the translation process precisely, given a |
nonnegative virtual |
address in register~rYY\null. First we must unpack the fields of~rV and |
@^virtual addresses@> |
@^physical addresses@> |
@^rV@> |
@^PTE@> |
@^PTP@> |
@^segments@> |
compute the relevant base addresses for PTEs and PTPs: |
$$\vbox{\halign{&\tt#\hfil\ \cr |
&GET &virt,rYY\cr |
&GET &\$7,rV &\% \$7=(virtual translation register)\cr |
&SRU &\$1,virt,61 &\% \$1=i (segment number of virtual address)\cr |
&SLU &\$1,\$1,2 \cr |
&NEG &\$1,52,\$1 &\% \$1=52-4i\cr |
&SRU &\$1,\$7,\$1 \cr |
&SLU &\$2,\$1,4 \cr |
&SETL &\$0,\#f000 \cr |
&AND &\$1,\$1,\$0 &\% \$1=b[i]<<12\cr |
&AND &\$2,\$2,\$0 &\% \$2=b[i+1]<<12\cr |
&SLU &\$3,\$7,24 \cr |
&SRU &\$3,\$3,37 \cr |
&SLU &\$3,\$3,13 &\% \$3=(r field of rV)\cr |
&ORH &\$3,\#8000 &\% make \$3 a physical address\cr |
&2ADDU &base,\$1,\$3 &\% base=address of first page table\cr |
&2ADDU &limit,\$2,\$3 &\% limit=address after last page table\cr |
&SRU &s,\$7,40 \cr |
&AND &s,s,\#ff &\% s=(s field of rV)\cr |
&CMP &\$0,s,13 \cr |
&BN &\$0,Fail &\% s must be 13 or more\cr |
&CMP &\$0,s,49 \cr |
&BNN &\$0,Fail &\% s must be 48 or less\cr |
&SETH &mask,\#8000 \cr |
&ORL &mask,\#1ff8&\% mask=(sign bit and n field)\cr |
&ORH &\$7,\#8000 &\% set sign bit for PTP validation below\cr |
&ANDNH &virt,\#e000 &\% zero out the segment number\cr |
&SRU &\$0,virt,s &\% \$0=a4a3a2a1a0 (page number of virt)\cr |
&ZSZ &\$1,\$0,1 &\% \$1=[page number is zero]\cr |
&ADD &limit,limit,\$1&\% increase limit if page number is zero\cr |
&SETL&\$6,\#3ff\cr |
}}$$ |
The next part of the routine finds the ``digits'' of |
the page number $(a_4a_3a_2a_1a_0)_{1024}$, from right to left: |
$$ |
\vcenter{\halign{&\tt#\hfil\ \cr |
&OR &\$5,base,0\cr |
&SRU &\$1,\$0,10\cr |
&PBZ &\$1,1F\cr |
&AND &\$0,\$0,\$6\cr |
&INCL &base,\#2000\cr}} |
\qquad |
\vcenter{\halign{&\tt#\hfil\ \cr |
&OR &\$5,base,0\cr |
&SRU &\$2,\$1,10\cr |
&PBZ &\$2,2F\cr |
&AND &\$1,\$1,\$6\cr |
&INCL &base,\#2000\cr}} |
\qquad |
\vcenter{\halign{&\tt#\hfil\ \cr |
&OR &\$5,base,0\cr |
&SRU &\$3,\$2,10\cr |
&PBZ &\$3,3F\cr |
&AND &\$2,\$2,\$6\cr |
&INCL &base,\#2000\cr}} |
\qquad |
\vcenter{\halign{&\tt#\hfil\ \cr |
&OR &\$5,base,0\cr |
&SRU &\$4,\$3,10\cr |
&PBZ &\$4,4F\cr |
&AND &\$3,\$3,\$6\cr |
&INCL &base,\#2000\cr}} |
$$ |
Then the process cascades back through PTPs. |
$$ |
\vcenter{\halign{&\tt#\hfil\ \cr |
&OR &\$5,base,0\cr |
&8ADDU&\$6,\$4,base\cr |
&LDO &base,\$6,0\cr |
&XOR &\$6,base,\$7\cr |
&AND &\$6,\$6,mask\cr |
&BNZ &\$6,Fail\cr}} |
\quad |
\vcenter{\halign{&\tt#\hfil\ \cr |
&ANDNL&base,\#1fff\cr |
4H&8ADDU &\$6,\$3,base\cr |
&LDO &base,\$6,0\cr |
&XOR &\$6,base,\$7\cr |
&AND &\$6,\$6,mask\cr |
&BNZ &\$6,Fail\cr}} |
\quad |
\vcenter{\halign{&\tt#\hfil\ \cr |
&ANDNL&base,\#1fff\cr |
3H&8ADDU &\$6,\$2,base\cr |
&LDO &base,\$6,0\cr |
&XOR &\$6,base,\$7\cr |
&AND &\$6,\$6,mask\cr |
&BNZ &\$6,Fail\cr}} |
\quad |
\vcenter{\halign{&\tt#\hfil\ \cr |
&ANDNL&base,\#1fff\cr |
2H&8ADDU &\$6,\$1,base\cr |
&LDO &base,\$6,0\cr |
&XOR &\$6,base,\$7\cr |
&AND &\$6,\$6,mask\cr |
&BNZ &\$6,Fail\cr}} |
$$ |
Finally we obtain the PTE and communicate it to the machine. |
If errors have been detected, we set the translation to zero; actually |
any translation with permission bits zero would have the same effect. |
$$\chardef\_=`\_ |
\vcenter{\halign{&\tt#\hfil\ \cr |
&ANDNL &base,\#1fff &\% remove low 13 bits of PTP\cr |
1H &8ADDU &\$6,\$0,base \cr |
&LDO &base,\$6,0 &\% base=PTE\cr |
&XOR &\$6,base,\$7\cr |
&ANDN&\$6,\$6,\#7\cr |
&SLU &\$6,\$6,51\cr |
&BNZ &\$6,Fail &\% branch if n doesn't match\cr |
&CMP &\$6,\$5,limit \cr |
&BN &\$6,Ready &\% did we run off the end of the page table?\cr |
Fail&SETL &base,0 &\% errors lead to PTE of zero\cr |
Ready&PUT&rZZ,base\cr |
&LDO&\$255,IntMask &\% load the desired setting of rK\cr |
&RESUME&1 &\% now the machine will digest the translation\cr}}$$ |
All loads and stores in this program deal with negative virtual addresses. |
This effectively shuts off memory mapping and makes the page tables |
inaccessible to the user.\looseness=-1 |
|
The program assumes that the ropcode in rXX is 3 (which it is when |
a forced trap is triggered by the need for virtual translation). |
@^ropcodes@> |
@^translation caches@> |
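
For readers who prefer \CEE/ to assembly language, the same page table walk
can be sketched as follows. This is an illustration under stated assumptions,
not a replacement for the program above: the types |cell| and |phys_mem|
and the functions |read8| and |find_pte| are hypothetical; physical memory is
modeled as a short list of octabytes in which absent cells read as zero;
plain physical addresses are used where the \MMIX\ code uses negative virtual
addresses; and the $f$~field of~rV is not examined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sparse physical memory: a short list of octabytes. */
typedef struct { uint64_t addr, val; } cell;
typedef struct { const cell *cells; int count; } phys_mem;

static uint64_t read8(const phys_mem *m, uint64_t addr) {
  for (int k = 0; k < m->count; k++)
    if (m->cells[k].addr == addr) return m->cells[k].val;
  return 0;                       /* assumption: absent cells read as zero */
}

/* Return the PTE for a nonnegative virtual address, or 0 on any failure. */
uint64_t find_pte(const phys_mem *m, uint64_t rV, uint64_t virt) {
  assert((virt >> 63) == 0);
  unsigned i = (unsigned)(virt >> 61);                  /* segment number */
  unsigned b_i = i ? (unsigned)(rV >> (64 - 4 * i)) & 15 : 0;
  unsigned b_next = (unsigned)(rV >> (60 - 4 * i)) & 15;
  unsigned s = (unsigned)(rV >> 40) & 0xff;
  uint64_t r = (rV >> 13) & 0x7ffffff;
  uint64_t n = (rV >> 3) & 0x3ff;
  if (s < 13 || s > 48) return 0;

  /* Split the page number into radix-1024 digits a0..a4. */
  uint64_t page = (virt & ((1ULL << 61) - 1)) >> s;
  unsigned digit[5], top = 0;
  uint64_t rest = page;
  for (unsigned k = 0; k < 5; k++) {
    digit[k] = (unsigned)(rest & 1023);
    if (digit[k]) top = k;          /* index of the leading nonzero digit */
    rest >>= 10;
  }
  /* The table for level top must lie below the tables of the next
     segment, except that a lone page 0 is allowed when b_i = b_{i+1}. */
  if (!(b_i + top < b_next || (page == 0 && b_i == b_next))) return 0;

  uint64_t base = (r + b_i + top) << 13;
  for (unsigned k = top; k > 0; k--) {          /* cascade through PTPs */
    uint64_t ptp = read8(m, base + 8 * (uint64_t)digit[k]);
    if ((ptp >> 63) != 1 || ((ptp >> 3) & 0x3ff) != n) return 0;
    base = ptp & 0x7fffffffffffe000ULL;         /* 2^13 c */
  }
  uint64_t pte = read8(m, base + 8 * (uint64_t)digit[0]);
  if (((pte >> 3) & 0x3ff) != n) return 0;
  return pte;
}
```

As in the \MMIX\ program, a result of zero stands for failure, since a PTE
with all permission bits zero grants no access.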
|
The translation from virtual pages to physical pages need not actually |
follow the rules for PTPs and PTEs; any other mapping could be |
substituted by operating systems with special needs. But people usually |
want compatibility between different implementations whenever |
possible. The only parts of~rV that \MMIX\ really needs are the $s$~field, |
which defines page sizes, and the $n$~field, which keeps TC entries |
of one process from being confused with the TC entries of another. |
|
@* The complete instruction set. We have now described all of\/ \MMIX's |
special registers---except one: The special |
{\it failure location register\/}~rF is set |
@^rF@> |
to a physical memory address when a parity error or other memory |
fault occurs. (The instruction leading to this error will probably be |
long gone before such a fault is detected; for example, the machine might |
be trying to write old data from a cache in order to make room for |
new data. Thus there is generally no connection between the current virtual |
program location~rW and the physical location of a memory error. But knowledge |
of the latter location can still be useful for hardware repair, or when |
an operating system is booting up.) |
|
@ One additional instruction proves to be useful. |
|
\bull\<SWYM X,Y,Z `sympathize with your machinery'.\> |
This command lubricates the disk drives, fans, magnetic tape drives, |
laser printers, scanners, and any other mechanical equipment hooked |
up to \MMIX, if necessary. Fields X, Y, and~Z are ignored. |
@.SWYM@> |
|
The \.{SWYM} command was originally included in \MMIX's repertoire because |
machines occasionally need grease to keep in shape, just as |
human beings occasionally need to swim or do some other kind of exercise |
in order to maintain good muscle tone. But in fact, \.{SWYM} has turned out to |
be a ``no-op,'' an instruction that does nothing at all; the |
@^no-op@> |
hypothetical manufacturers of our hypothetical machine have pointed out that |
modern computer equipment is already well oiled and sealed for permanent use. |
Even so, a no-op instruction provides a good way for software to |
send signals to the hardware, for such things as scheduling the way |
instructions are issued on superscalar superpipelined buzzword-compliant |
machines. Software programs can also use no-ops to communicate with other |
programs like symbolic debuggers. |
|
When a forced trap computes the translation~rZZ of a virtual address~rYY, |
ropcode~3 of \<RESUME 1 will put $\rm(rYY,rZZ)$ into the TC for instructions if |
the opcode in~rXX is \.{SWYM}; otherwise $\rm(rYY,rZZ)$ will be put |
into the TC for data. |
@^ropcodes@> |
@^translation caches@> |
@.RESUME@> |
@^virtual address emulation@> |
@^emulation@> |
|
@ The running time of\/ \MMIX\ programs depends to a great extent |
on changes in technology. |
\MMIX\ is a mythical machine, but its mythical hardware exists in |
cheap, slow versions as well as in costly high-performance models. |
Details of running time usually depend on things like the amount of main memory |
available to implement virtual memory, as well as the sizes of |
caches and other buffers. |
|
For practical purposes, the running time of an \MMIX\ program can often be |
estimated satisfactorily by assigning a fixed cost |
to each operation, based on the approximate running time that would be obtained |
on a high-performance machine with lots of main memory; so that's what |
we will do. Each operation will be assumed to take an integer number |
of~$\upsilon$, |
where $\upsilon$ (pronounced ``oops'') is a unit that represents the clock cycle time in |
@^mems@> |
@^oops@> |
a pipelined implementation. The value of $\upsilon$ will probably decrease |
from year to year, but I'll keep calling it $\upsilon$. The running |
time will also depend on the number of memory references or {\it mems\/} |
that a program uses; |
this is the number of load and store instructions. For example, |
each \.{LDO} (load octa) instruction will be assumed to cost |
$\mu+\upsilon$, where $\mu$ is the average cost of |
a memory reference. The total running time of a program might be reported as, |
say, $35\mu+1000\upsilon$, meaning 35 mems plus 1000~oops. The |
ratio $\mu/\upsilon$ will probably increase with time, so mem-counting |
is likely to become increasingly important. [See the discussion of mems in |
{\sl The Stanford GraphBase\/} (New York:\ ACM Press, 1994).] |
@^oops@> |
@^running times, approximate@> |
|
Integer addition, subtraction, and comparison all take just $1\upsilon$. |
The same is true for \.{SET}, \.{GET}, \.{PUT}, \.{SYNC}, and \.{SWYM} |
instructions, |
as well as bitwise logical operations, shifts, relative jumps,
conditional assignments, |
and correctly predicted branches-not-taken or probable-branches-taken. |
Mispredicted branches or probable branches cost $3\upsilon$, and |
so do the \.{POP} and \.{GO} commands. |
Integer multiplication takes $10\upsilon$; integer division weighs in |
at~$60\upsilon$. |
@.MUL@> |
@.DIV@> |
@.TRAP@> |
@.TRIP@> |
@.RESUME@> |
\.{TRAP}, \.{TRIP}, and \.{RESUME} cost $5\upsilon$ each. |
|
Most floating point operations have a nominal running time of $4\upsilon$, |
although the comparison operators \.{FCMP}, \.{FEQL}, and \.{FUN} |
need only $1\upsilon$. |
\.{FDIV} and \.{FSQRT} cost $40\upsilon$ each. |
@.FDIV@> |
@.FSQRT@> |
@.FREM@> |
The actual running time of floating point computations |
will vary depending on the operands; for example, |
the machine might need one extra $\upsilon$ for each subnormal input |
or output, and it might slow down greatly when trips are enabled. |
The \.{FREM} instruction might typically cost |
$(3+\delta)\upsilon$, where $\delta$ is the amount |
by which the exponent of the first operand exceeds the exponent of the |
second (or zero, if this amount is negative). A floating point |
operation might take only $1\upsilon$ |
if at least one of its operands is zero, infinity, or~NaN\null. |
However, the fixed values stated at the beginning of this paragraph |
will be used for all seat-of-the-pants estimates of running time, |
since we want to keep the estimates as simple as possible |
without making them terribly out of line. |
|
All load and store operations will be assumed to cost $\mu+\upsilon$, |
except that \.{CSWAP} costs $2\mu+2\upsilon$. |
(This applies to all OP~codes that begin with |
\Hex8, \Hex9, \Hex{A}, and \Hex{B}, except \Hex{98}--\Hex{9F} and |
\Hex{B8}--\Hex{BF}. It's best |
to keep the rules simple, because $\mu$ is just |
an approximate device for estimating average memory cost.) |
\.{SAVE} and \.{UNSAVE} are charged $20\mu+\upsilon$. |
@.CSWAP@> |
@.SAVE@> |
@.UNSAVE@> |
|
Of course we must remember that these numbers are very rough. |
We have not included the cost of fetching instructions from memory. |
Furthermore, an integer multiplication or division might have an effective |
cost of only $1\upsilon$, if the result is not needed while other |
numbers are being calculated. |
Only a detailed simulation can be expected to be truly realistic. |
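
The fixed costs stated above are simple enough to tabulate. The following
\CEE/ sketch adds up seat-of-the-pants estimates in units of $\mu$
and~$\upsilon$; the grouping of operations into classes and all of the
names (|cost|, |op_class|, |op_cost|, |add_cost|) are hypothetical
conveniences of this illustration.

```c
#include <assert.h>

typedef struct { unsigned mems, oops; } cost;   /* counts of mu and upsilon */

typedef enum {
  OP_SIMPLE,        /* add, subtract, compare, SET, GET, PUT, SYNC, SWYM, ... */
  OP_SLOW_JUMP,     /* mispredicted branch, POP, GO */
  OP_MUL,           /* integer multiplication */
  OP_DIV,           /* integer division */
  OP_TRAP,          /* TRAP, TRIP, RESUME */
  OP_FLOAT,         /* most floating point operations */
  OP_FCMP,          /* FCMP, FEQL, FUN */
  OP_FDIV_FSQRT,    /* FDIV, FSQRT */
  OP_LOAD_STORE,    /* loads and stores */
  OP_CSWAP,         /* CSWAP */
  OP_SAVE_UNSAVE,   /* SAVE, UNSAVE */
  OP_CLASS_COUNT
} op_class;

/* Nominal cost of one operation of the given class. */
cost op_cost(op_class c) {
  static const cost table[OP_CLASS_COUNT] = {
    [OP_SIMPLE] = {0, 1},      [OP_SLOW_JUMP] = {0, 3},
    [OP_MUL] = {0, 10},        [OP_DIV] = {0, 60},
    [OP_TRAP] = {0, 5},        [OP_FLOAT] = {0, 4},
    [OP_FCMP] = {0, 1},        [OP_FDIV_FSQRT] = {0, 40},
    [OP_LOAD_STORE] = {1, 1},  [OP_CSWAP] = {2, 2},
    [OP_SAVE_UNSAVE] = {20, 1},
  };
  assert((unsigned)c < OP_CLASS_COUNT);
  return table[c];
}

/* Add count operations of class c to a running total. */
cost add_cost(cost total, op_class c, unsigned count) {
  cost k = op_cost(c);
  total.mems += k.mems * count;
  total.oops += k.oops * count;
  return total;
}
```

For example, 35 loads together with 965 simple operations come to
$35\mu+1000\upsilon$, the total mentioned earlier.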
|
@ If you think that \MMIX\ has plenty of operation codes, you are right; |
we have now described them all. Here is a chart that shows their |
numeric values: |
\def\oddline#1{\cr |
\noalign{\nointerlineskip} |
\omit&\setbox0=\hbox{\lower 2.3pt\hbox{\Hex{#1x}}}\smash{\box0}& |
\multispan{17}\hrulefill& |
\setbox0=\hbox{\lower 2.3pt\hbox{\Hex{#1x}}}\smash{\box0}\cr |
\noalign{\nointerlineskip}} |
\def\evenline{\cr\noalign{\hrule}} |
\def\chartstrut{\lower4.5pt\vbox to14pt{}} |
\def\beginchart{$$\tt\halign to\hsize\bgroup |
\chartstrut##\tabskip0pt plus10pt& |
&\hfil##\hfil&\vrule##\cr |
\lower6.5pt\null |
&&&\Hex0&&\Hex1&&\Hex2&&\Hex3&&\Hex4&&\Hex 5&&\Hex 6&&\Hex 7&\evenline} |
\def\endchart{\raise11.5pt\null&&&\Hex 8&&\Hex 9&&\Hex A&&\Hex B& |
&\Hex C&&\Hex D&&\Hex E&&\Hex F&\cr\egroup$$} |
\def\\#1[#2]{\multispan3\hfil#1[#2]\hfil} |
\beginchart |
&&&TRAP&&FCMP&&FUN&&FEQL&&FADD&&FIX&&FSUB&&FIXU&\oddline 0 |
&&&\\FLOT[I]&&\\FLOTU[I]&&\\SFLOT[I]&&\\SFLOTU[I]&\evenline |
&&&FMUL&&FCMPE&&FUNE&&FEQLE&&FDIV&&FSQRT&&FREM&&FINT&\oddline 1 |
&&&\\MUL[I]&&\\MULU[I]&&\\DIV[I]&&\\DIVU[I]&\evenline |
&&&\\ADD[I]&&\\ADDU[I]&&\\SUB[I]&&\\SUBU[I]&\oddline 2 |
&&&\\2ADDU[I]&&\\4ADDU[I]&&\\8ADDU[I]&&\\16ADDU[I]&\evenline |
&&&\\CMP[I]&&\\CMPU[I]&&\\NEG[I]&&\\NEGU[I]&\oddline 3 |
&&&\\SL[I]&&\\SLU[I]&&\\SR[I]&&\\SRU[I]&\evenline |
&&&\\BN[B]&&\\BZ[B]&&\\BP[B]&&\\BOD[B]&\oddline 4 |
&&&\\BNN[B]&&\\BNZ[B]&&\\BNP[B]&&\\BEV[B]&\evenline |
&&&\\PBN[B]&&\\PBZ[B]&&\\PBP[B]&&\\PBOD[B]&\oddline 5 |
&&&\\PBNN[B]&&\\PBNZ[B]&&\\PBNP[B]&&\\PBEV[B]&\evenline |
&&&\\CSN[I]&&\\CSZ[I]&&\\CSP[I]&&\\CSOD[I]&\oddline 6 |
&&&\\CSNN[I]&&\\CSNZ[I]&&\\CSNP[I]&&\\CSEV[I]&\evenline |
&&&\\ZSN[I]&&\\ZSZ[I]&&\\ZSP[I]&&\\ZSOD[I]&\oddline 7 |
&&&\\ZSNN[I]&&\\ZSNZ[I]&&\\ZSNP[I]&&\\ZSEV[I]&\evenline |
&&&\\LDB[I]&&\\LDBU[I]&&\\LDW[I]&&\\LDWU[I]&\oddline 8 |
&&&\\LDT[I]&&\\LDTU[I]&&\\LDO[I]&&\\LDOU[I]&\evenline |
&&&\\LDSF[I]&&\\LDHT[I]&&\\CSWAP[I]&&\\LDUNC[I]&\oddline 9 |
&&&\\LDVTS[I]&&\\PRELD[I]&&\\PREGO[I]&&\\GO[I]&\evenline |
&&&\\STB[I]&&\\STBU[I]&&\\STW[I]&&\\STWU[I]&\oddline A |
&&&\\STT[I]&&\\STTU[I]&&\\STO[I]&&\\STOU[I]&\evenline |
&&&\\STSF[I]&&\\STHT[I]&&\\STCO[I]&&\\STUNC[I]&\oddline B |
&&&\\SYNCD[I]&&\\PREST[I]&&\\SYNCID[I]&&\\PUSHGO[I]&\evenline |
&&&\\OR[I]&&\\ORN[I]&&\\NOR[I]&&\\XOR[I]&\oddline C |
&&&\\AND[I]&&\\ANDN[I]&&\\NAND[I]&&\\NXOR[I]&\evenline |
&&&\\BDIF[I]&&\\WDIF[I]&&\\TDIF[I]&&\\ODIF[I]&\oddline D |
&&&\\MUX[I]&&\\SADD[I]&&\\MOR[I]&&\\MXOR[I]&\evenline |
&&&SETH&&SETMH&&SETML&&SETL&&INCH&&INCMH&&INCML&&INCL&\oddline E |
&&&ORH&&ORMH&&ORML&&ORL&&ANDNH&&ANDNMH&&ANDNML&&ANDNL&\evenline |
&&&\\JMP[B]&&\\PUSHJ[B]&&\\GETA[B]&&\\PUT[I]&\oddline F |
&&&POP&&RESUME&&SAVE&&UNSAVE&&SYNC&&SWYM&&GET&&TRIP&\evenline |
\endchart |
The notation `\.{[I]}' indicates an operation with an ``immediate'' variant |
in which the Z field denotes a constant instead of a register number. |
Similarly, `\.{[B]}' indicates an operation with a ``backward'' variant |
in which a relative address has a negative displacement. Simulators and |
other programs that need to present \MMIX\ instructions in symbolic |
form will say that opcode \Hex{20} is \.{ADD} while opcode \Hex{21} |
is~\.{ADDI}; they will say that \Hex{F2} is \.{PUSHJ} while \Hex{F3} |
is~\.{PUSHJB}. But the \MMIX\ assembler uses only the forms \.{ADD} |
and \.{PUSHJ}, not \.{ADDI} or \.{PUSHJB}. |
|
To read this chart, use the hexadecimal digits at the top, bottom, |
left, and right. |
For example, operation code \.{A9} in hexadecimal notation appears in |
the lower part of the \Hex{Ax} row and in the \Hex1/\Hex9 column; it is |
\.{STTI}, `store tetrabyte immediate'. |
@^OP codes, table@> |
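
The chart lookup just described is a matter of splitting the opcode byte
into fields. Here is a small \CEE/ sketch; the names |chart_pos|,
|locate_opcode|, and |is_variant| are hypothetical, and |is_variant| is
meaningful only for chart entries marked \.{[I]} or~\.{[B]}.

```c
#include <assert.h>

typedef struct {
  unsigned row;      /* leading hex digit: the row of the chart */
  unsigned column;   /* 0..7; columns are labeled x at the top, x+8 below */
  int lower_half;    /* 1 if the opcode is in the lower part of its row */
} chart_pos;

chart_pos locate_opcode(unsigned opcode) {
  chart_pos p;
  assert(opcode < 256);
  p.row = opcode >> 4;
  p.lower_half = (opcode & 8) != 0;
  p.column = opcode & 7;
  return p;
}

/* For entries marked [I] or [B], the odd opcode is the immediate
   or backward variant. */
int is_variant(unsigned opcode) { return (int)(opcode & 1); }
```

Thus opcode \Hex{A9} lands in row~\Hex{A}, lower half, column \Hex1/\Hex9,
and its low bit marks it as the immediate form of \.{STT}.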
|
%The blank spaces in this chart are undefined opcodes, |
%reserved for future extension. |
%If an instruction with such |
%an opcode is encountered in a user program, it is considered to be |
%an illegal instruction (like, say, \.{FIX} with the \.Y field greater than~9), |
%@^illegal instructions@> |
%triggering an interrupt. Such instructions might become defined in |
%later versions of\/ \MMIX, at which time the operating system |
%could probably emulate the new instructions for backward compatibility. |
%@^version number@> |
|
\def\\#1{\leavevmode\hbox{\it#1\/\kern.05em}} % italic type for identifiers |
|
@*Index. (References are to section numbers, not page numbers.) |
/sort.mms
0,0 → 1,40
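          LOC   Data_Segment
x0        GREG  @                 % base address of the array
X0        IS    @                 % location of X[0]
N         IS    100               % number of items to sort

j         IS    $0                % index of the current maximum (n on entry)
m         IS    $1                % current maximum
kk        IS    $2                % 8k
xk        IS    $3                % X[k]
t         IS    $255              % temporary storage
          LOC   #100
Maximum   SL    kk,$0,3           % kk = 8n
          LDO   m,x0,kk           % m = X[n]
          JMP   ChangeJ
Loop      LDO   xk,x0,kk          % xk = X[k]
          CMP   t,xk,m
          PBNP  t,DecreaseK       % skip ahead if X[k] <= m
ChangeM   SET   m,xk              % m = X[k]
ChangeJ   SR    j,kk,3            % j = k
DecreaseK SUB   kk,kk,8           % k = k-1
          PBP   kk,Loop           % repeat while k > 0
          POP   2,0               % return m and j

Main      GETA  t,9F
          TRAP  0,Fread,StdIn     % read X[1..N]
          SET   $0,N<<3           % $0 = 8i, starting with i = N
1H        SR    $2,$0,3           % argument for Maximum is i
          PUSHJ 1,Maximum         % $1 = max of X[1..i], $2 = its index j
          LDO   $3,x0,$0          % $3 = X[i]
          SL    $2,$2,3           % $2 = 8j
          STO   $1,x0,$0          % X[i] = maximum
          STO   $3,x0,$2          % X[j] = former X[i]
          SUB   $0,$0,1<<3        % i = i-1
          PBNZ  $0,1B             % repeat until i = 0
          GETA  t,9F
          TRAP  0,Fwrite,StdOut   % write X[1..N]
          TRAP  0,Halt,0
9H        OCTA  X0+1<<3,N<<3      % buffer address and size for Fread/Fwrite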
|
|
/mmix-arith.w
0,0 → 1,1843
% This file is part of the MMIXware package (c) Donald E Knuth 1999 |
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES! |
|
\def\title{MMIX-ARITH} |
|
\def\MMIX{\.{MMIX}} |
\def\MMIXAL{\.{MMIXAL}} |
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant |
\def\dts{\mathinner{\ldotp\ldotp}} |
\def\<#1>{\hbox{$\langle\,$#1$\,\rangle$}}\let\is=\longrightarrow |
\def\ff{\\{ff\kern-.05em}} |
@s ff TeX |
@s bool normal @q unreserve a C++ keyword @> |
@s xor normal @q unreserve a C++ keyword @> |
|
@* Introduction. The subroutines below are used to simulate 64-bit \MMIX\ |
arithmetic on an old-fashioned 32-bit computer---like the one the author |
had when he wrote \MMIXAL\ and the first \MMIX\ simulators in 1998 and 1999. |
All operations are fabricated from 32-bit arithmetic, including |
a full implementation of the IEEE floating point standard, |
assuming only that the \CEE/ compiler has a 32-bit unsigned integer type. |
|
Some day 64-bit machines will be commonplace and the awkward manipulations of |
the present program will look quite archaic. Interested readers who have such |
computers will be able to convert the code to a pure 64-bit form without |
difficulty, thereby obtaining much faster and simpler routines. Meanwhile, |
however, we can simulate the future and hope for continued progress. |
|
This program module has a simple structure, intended to make it |
suitable for loading with \MMIX\ simulators and assemblers. |
|
@c |
#include <stdio.h> |
#include <string.h> |
#include <ctype.h> |
@<Stuff for \CEE/ preprocessor@>@; |
typedef enum{@+false,true@+} bool; |
@<Tetrabyte and octabyte type definitions@>@; |
@<Other type definitions@>@; |
@<Global variables@>@; |
@<Subroutines@> |
|
@ Subroutines of this program are declared first with a prototype, |
as in {\mc ANSI C}, then with an old-style \CEE/ function definition. |
Here are some preprocessor commands that make this work correctly with both |
new-style and old-style compilers. |
@^prototypes for functions@> |
|
@<Stuff for \CEE/ preprocessor@>= |
#ifdef __STDC__ |
#define ARGS(list) list |
#else |
#define ARGS(list) () |
#endif |
|
@ The definition of type \&{tetra} should be changed, if necessary, so that |
it represents an unsigned 32-bit integer. |
@^system dependencies@> |
|
@<Tetra...@>= |
typedef unsigned int tetra; |
/* for systems conforming to the LP-64 data model */ |
typedef struct { tetra h,l;} octa; /* two tetrabytes make one octabyte */ |
|
@ @d sign_bit ((unsigned)0x80000000) |
|
@<Glob...@>= |
octa zero_octa; /* |zero_octa.h=zero_octa.l=0| */ |
octa neg_one={-1,-1}; /* |neg_one.h=neg_one.l=-1| */ |
octa inf_octa={0x7ff00000,0}; /* floating point $+\infty$ */ |
octa standard_NaN={0x7ff80000,0}; /* floating point NaN(.5) */ |
octa aux; /* auxiliary output of a subroutine */ |
bool overflow; /* set by certain subroutines for signed arithmetic */ |
|
@ It's easy to add and subtract octabytes, if we aren't terribly |
worried about speed. |
|
@<Subr...@>= |
octa oplus @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa oplus(y,z) /* compute $y+z$ */ |
octa y,z; |
{@+ octa x; |
x.h=y.h+z.h;@+ |
x.l=y.l+z.l; |
if (x.l<y.l) x.h++; |
return x; |
} |
@# |
octa ominus @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa ominus(y,z) /* compute $y-z$ */ |
octa y,z; |
{@+ octa x; |
x.h=y.h-z.h;@+ |
x.l=y.l-z.l; |
if (x.l>y.l) x.h--; |
return x; |
} |
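
The carry test |x.l<y.l| works because unsigned addition wraps modulo $2^{32}$: the low sum is smaller than an addend exactly when a carry occurred, and the borrow test is the mirror image. The following is a minimal sketch of the same idea, assuming a C99 compiler with \.{<stdint.h>}. The names |octa_s|, |oplus_s|, |ominus_s|, |to_u64|, and |from_u64| are illustrative and not part of this file; the two conversion helpers exist only so the result can be compared with native 64-bit arithmetic.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_s;   /* illustrative stand-in for octa */

/* Add two 64-bit values held as pairs of 32-bit halves. */
octa_s oplus_s(octa_s y, octa_s z)
{
  octa_s x;
  x.h = y.h + z.h;
  x.l = y.l + z.l;            /* wraps modulo 2^32 */
  if (x.l < y.l) x.h++;       /* the low half wrapped, so propagate the carry */
  return x;
}

/* Subtract z from y; a borrow occurred exactly when the low difference grew. */
octa_s ominus_s(octa_s y, octa_s z)
{
  octa_s x;
  x.h = y.h - z.h;
  x.l = y.l - z.l;
  if (x.l > y.l) x.h--;
  return x;
}

/* Helpers for comparing against native 64-bit arithmetic. */
uint64_t to_u64(octa_s x) { return ((uint64_t)x.h << 32) | x.l; }

octa_s from_u64(uint64_t v)
{
  octa_s x;
  x.h = (uint32_t)(v >> 32);
  x.l = (uint32_t)v;
  return x;
}
```

On a machine with a native 64-bit type, |to_u64(oplus_s(a,b))| should agree with the ordinary sum of |to_u64(a)| and |to_u64(b)| for all inputs.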
|
@ In the following subroutine, |delta| is a signed quantity that is |
assumed to fit in a signed tetrabyte. |
|
@<Subr...@>= |
octa incr @,@,@[ARGS((octa,int))@];@+@t}\6{@> |
octa incr(y,delta) /* compute $y+\delta$ */ |
octa y; |
int delta; |
{@+ octa x; |
x.h=y.h;@+ x.l=y.l+delta; |
if (delta>=0) { |
if (x.l<y.l) x.h++; |
}@+else if (x.l>y.l) x.h--; |
return x; |
} |
|
@ Left and right shifts are only a bit more difficult. |
|
@<Subr...@>= |
octa shift_left @,@,@[ARGS((octa,int))@];@+@t}\6{@> |
octa shift_left(y,s) /* shift left by $s$ bits, where $0\le s\le64$ */ |
octa y; |
int s; |
{ |
while (s>=32) y.h=y.l,y.l=0,s-=32; |
if (s) {@+register tetra yhl=y.h<<s,ylh=y.l>>(32-s); |
y.h=yhl+ylh;@+ y.l<<=s; |
} |
return y; |
} |
@# |
octa shift_right @,@,@[ARGS((octa,int,int))@];@+@t}\6{@> |
octa shift_right(y,s,u) /* shift right, arithmetically if $u=0$ */ |
octa y; |
int s,u; |
{ |
while (s>=32) y.l=y.h, y.h=(u?0: -(y.h>>31)), s-=32; |
if (s) {@+register tetra yhl=y.h<<(32-s),ylh=y.l>>s; |
y.h=(u? 0:(-(y.h>>31))<<(32-s))+(y.h>>s);@+ y.l=yhl+ylh; |
} |
return y; |
} |
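
In \CEE/, shifting a 32-bit quantity by 32 or more positions is undefined, which is why both routines first move whole tetrabytes and then guard the remaining shift with |if (s)|. The sketch below restates the two shifts with fixed-width types, assuming \.{<stdint.h>} is available. The names |octa_s|, |shl_s|, |shr_s|, and |to_u64| are illustrative, and the sign fill is computed once up front rather than recomputed in the loop.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_s;   /* illustrative stand-in for octa */

/* Shift left by s bits, 0 <= s <= 64, never shifting a 32-bit value by 32 or more. */
octa_s shl_s(octa_s y, int s)
{
  while (s >= 32) { y.h = y.l; y.l = 0; s -= 32; }   /* move whole tetrabytes */
  if (s) {                                           /* now 1 <= s <= 31 */
    y.h = (y.h << s) | (y.l >> (32 - s));            /* bits crossing the boundary */
    y.l <<= s;
  }
  return y;
}

/* Shift right by s bits, 0 <= s <= 64; sign-extending when is_unsigned is 0. */
octa_s shr_s(octa_s y, int s, int is_unsigned)
{
  uint32_t fill = (!is_unsigned && (y.h >> 31)) ? 0xffffffffu : 0;
  while (s >= 32) { y.l = y.h; y.h = fill; s -= 32; }
  if (s) {
    y.l = (y.h << (32 - s)) | (y.l >> s);            /* uses the old high half */
    y.h = (fill << (32 - s)) | (y.h >> s);
  }
  return y;
}

uint64_t to_u64(octa_s x) { return ((uint64_t)x.h << 32) | x.l; }
```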
|
@* Multiplication. We need to multiply two unsigned 64-bit integers, obtaining |
an unsigned 128-bit product. It is easy to do this on a 32-bit machine |
by using Algorithm 4.3.1M of {\sl Seminumerical Algorithms}, with $b=2^{16}$. |
@^multiprecision multiplication@> |
|
The following subroutine returns the lower half of the product, and |
puts the upper half into a global octabyte called |aux|. |
|
@<Subr...@>= |
octa omult @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa omult(y,z) |
octa y,z; |
{ |
register int i,j,k; |
tetra u[4],v[4],w[8]; |
register tetra t; |
octa acc; |
@<Unpack the multiplier and multiplicand to |u| and |v|@>; |
for (j=0;j<4;j++) w[j]=0; |
for (j=0;j<4;j++) |
if (!v[j]) w[j+4]=0; |
else { |
for (i=k=0;i<4;i++) { |
t=u[i]*v[j]+w[i+j]+k; |
w[i+j]=t&0xffff, k=t>>16; |
} |
w[j+4]=k; |
} |
@<Pack |w| into the outputs |aux| and |acc|@>; |
return acc; |
} |
|
@ @<Glob...@>= |
extern octa aux; /* secondary output of subroutines with multiple outputs */ |
extern bool overflow; |
|
@ @<Unpack the mult...@>= |
u[3]=y.h>>16, u[2]=y.h&0xffff, u[1]= y.l>>16, u[0]=y.l&0xffff; |
v[3]=z.h>>16, v[2]=z.h&0xffff, v[1]= z.l>>16, v[0]=z.l&0xffff; |
|
@ @<Pack |w| into the outputs |aux| and |acc|@>= |
aux.h=(w[7]<<16)+w[6], aux.l=(w[5]<<16)+w[4]; |
acc.h=(w[3]<<16)+w[2], acc.l=(w[1]<<16)+w[0]; |
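
The choice $b=2^{16}$ matters because each inner step computes a digit product plus two digits, which is at most $(2^{16}-1)^2+2(2^{16}-1)=2^{32}-1$ and therefore never overflows a tetrabyte. Here is a compact sketch of the same schoolbook loop. It is an illustration under the assumption that \.{<stdint.h>} is available, with |uint64_t| used only to pass values in and out; the name |mul_16bit_digits| is hypothetical.

```c
#include <stdint.h>

/* Multiply two 64-bit values with base-2^16 digits. The digit arithmetic uses
   only 32-bit operations; uint64_t appears only in the interface. The low half
   of the 128-bit product is returned and the high half is stored through hi. */
uint64_t mul_16bit_digits(uint64_t y, uint64_t z, uint64_t *hi)
{
  uint32_t u[4], v[4], w[8] = {0};
  int i, j;
  for (i = 0; i < 4; i++) {                  /* unpack into base-2^16 digits */
    u[i] = (uint32_t)(y >> (16 * i)) & 0xffff;
    v[i] = (uint32_t)(z >> (16 * i)) & 0xffff;
  }
  for (j = 0; j < 4; j++) {
    uint32_t k = 0;                          /* carry, always below 2^16 */
    for (i = 0; i < 4; i++) {
      /* at most (2^16-1)^2 + 2(2^16-1) = 2^32-1, so no 32-bit overflow */
      uint32_t t = u[i] * v[j] + w[i + j] + k;
      w[i + j] = t & 0xffff;
      k = t >> 16;
    }
    w[j + 4] = k;
  }
  *hi = ((uint64_t)w[7] << 48) | ((uint64_t)w[6] << 32) | ((uint64_t)w[5] << 16) | w[4];
  return ((uint64_t)w[3] << 48) | ((uint64_t)w[2] << 32) | ((uint64_t)w[1] << 16) | w[0];
}
```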
|
@ Signed multiplication has the same lower half product as unsigned |
multiplication. The signed upper half product is obtained with at most two |
further subtractions, after which the result has overflowed if and only if |
the upper half is unequal to 64 copies of the sign bit in the lower half. |
|
@<Subr...@>= |
octa signed_omult @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa signed_omult(y,z) |
octa y,z; |
{ |
octa acc; |
acc=omult(y,z); |
if (y.h&sign_bit) aux=ominus(aux,z); |
if (z.h&sign_bit) aux=ominus(aux,y); |
overflow=(aux.h!=aux.l || (aux.h^(aux.h>>1)^(acc.h&sign_bit))); |
return acc; |
} |
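
The two subtractions follow from reading a negative operand as unsigned, which adds $2^{64}$ to it and hence adds the other operand to the upper half of the product. The sketch below illustrates the correction and the overflow test with native 64-bit types, assuming \.{<stdint.h>}. The names |umulhi|, |smulhi|, and |smul_overflows| are hypothetical and do not appear in this file.

```c
#include <stdint.h>

/* Upper 64 bits of the unsigned 128-bit product, built from 32-bit halves. */
uint64_t umulhi(uint64_t y, uint64_t z)
{
  uint64_t yl = (uint32_t)y, yh = y >> 32, zl = (uint32_t)z, zh = z >> 32;
  uint64_t ll = yl * zl, lh = yl * zh, hl = yh * zl, hh = yh * zh;
  uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;   /* cannot overflow */
  return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* Upper half of the signed product, with operands passed as bit patterns. */
uint64_t smulhi(uint64_t y, uint64_t z)
{
  uint64_t hi = umulhi(y, z);
  if (y >> 63) hi -= z;       /* y was negative: undo the extra 2^64 * z */
  if (z >> 63) hi -= y;       /* z was negative: undo the extra 2^64 * y */
  return hi;
}

/* The signed product fits in 64 bits iff the upper half is the sign
   extension of the lower half. */
int smul_overflows(uint64_t y, uint64_t z)
{
  uint64_t lo = y * z;
  uint64_t sign_fill = (lo >> 63) ? UINT64_MAX : 0;
  return smulhi(y, z) != sign_fill;
}
```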
|
@* Division. Long division of an unsigned 128-bit integer by an unsigned |
64-bit integer is, of course, one of the most challenging routines |
needed for \MMIX\ arithmetic. The following program, based on |
Algorithm 4.3.1D of {\sl Seminumerical Algorithms}, computes |
octabytes $q$ and $r$ such that $(2^{64}x+y)=qz+r$ and $0\le r<z$, |
given octabytes $x$, $y$, and~$z$, assuming that $x<z$. |
(If $x\ge z$, it simply sets $q=x$ and $r=y$.) |
The quotient~$q$ is returned by the subroutine; |
the remainder~$r$ is stored in |aux|. |
@^multiprecision division@> |
|
@<Subr...@>= |
octa odiv @,@,@[ARGS((octa,octa,octa))@];@+@t}\6{@> |
octa odiv(x,y,z) |
octa x,y,z; |
{ |
register int i,j,k,n,d; |
tetra u[8],v[4],q[4],mask,qhat,rhat,vh,vmh; |
register tetra t; |
octa acc; |
@<Check that |x<z|; otherwise give trivial answer@>; |
@<Unpack the dividend and divisor to |u| and |v|@>; |
@<Determine the number of significant places |n| in the divisor |v|@>; |
@<Normalize the divisor@>; |
for (j=3;j>=0;j--) @<Determine the quotient digit |q[j]|@>; |
@<Unnormalize the remainder@>; |
@<Pack |q| and |u| to |acc| and |aux|@>; |
return acc; |
} |
|
@ @<Check that |x<z|; otherwise give trivial answer@>= |
if (x.h>z.h || (x.h==z.h && x.l>=z.l)) { |
aux=y;@+ return x; |
} |
|
@ @<Unpack the div...@>= |
u[7]=x.h>>16, u[6]=x.h&0xffff, u[5]=x.l>>16, u[4]=x.l&0xffff; |
u[3]=y.h>>16, u[2]=y.h&0xffff, u[1]=y.l>>16, u[0]=y.l&0xffff; |
v[3]=z.h>>16, v[2]=z.h&0xffff, v[1]=z.l>>16, v[0]=z.l&0xffff; |
|
@ @<Determine the number of significant places |n| in the divisor |v|@>= |
for (n=4;v[n-1]==0;n--); |
|
@ We shift |u| and |v| left by |d| places, where |d| is chosen to |
make $2^{15}\le v_{n-1}<2^{16}$. |
|
@<Normalize the divisor@>= |
vh=v[n-1]; |
for (d=0;vh<0x8000;d++,vh<<=1); |
for (j=k=0; j<n+4; j++) { |
t=(u[j]<<d)+k; |
u[j]=t&0xffff, k=t>>16; |
} |
for (j=k=0; j<n; j++) { |
t=(v[j]<<d)+k; |
v[j]=t&0xffff, k=t>>16; |
} |
vh=v[n-1]; |
vmh=(n>1? v[n-2]: 0); |
|
@ @<Unnormalize the remainder@>= |
mask=(1<<d)-1; |
for (j=3; j>=n; j--) u[j]=0; |
for (k=0;j>=0;j--) { |
t=(k<<16)+u[j]; |
u[j]=t>>d, k=t&mask; |
} |
|
@ @<Pack |q| and |u| to |acc| and |aux|@>= |
acc.h=(q[3]<<16)+q[2], acc.l=(q[1]<<16)+q[0]; |
aux.h=(u[3]<<16)+u[2], aux.l=(u[1]<<16)+u[0]; |
|
@ @<Determine the quotient digit |q[j]|@>= |
{ |
@<Find the trial quotient, $\hat q$@>; |
@<Subtract $b^j\hat q v$ from |u|@>; |
@<If the result was negative, decrease $\hat q$ by 1@>; |
q[j]=qhat; |
} |
|
@ @<Find the trial quotient, $\hat q$@>= |
t=(u[j+n]<<16)+u[j+n-1]; |
qhat=t/vh, rhat=t-vh*qhat; |
if (n>1) while (qhat==0x10000 || qhat*vmh>(rhat<<16)+u[j+n-2]) { |
qhat--, rhat+=vh; |
if (rhat>=0x10000) break; |
} |
|
@ After this step, |u[j+n]| will either equal |k| or |k-1|. The |
true value of~|u| would be obtained by subtracting~|k| from |u[j+n]|; |
but we don't have to fuss over |u[j+n]|, because it won't be examined later. |
|
@<Subtract $b^j\hat q v$ from |u|@>= |
for (i=k=0; i<n; i++) { |
t=u[i+j]+0xffff0000-k-qhat*v[i]; |
u[i+j]=t&0xffff, k=0xffff-(t>>16); |
} |
|
@ The correction here occurs only rarely, but it can be necessary---for |
example, when dividing the number \Hex{7fff800100000000} by \Hex{800080020005}. |
|
@<If the result was negative, decrease $\hat q$ by 1@>= |
if (u[j+n]!=k) { |
qhat--; |
for (i=k=0; i<n; i++) { |
t=u[i+j]+v[i]+k; |
u[i+j]=t&0xffff, k=t>>16; |
} |
} |
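
Because the correction step is exercised so rarely, a slow but independent reference can be useful when testing a routine like |odiv|. The sketch below is not Algorithm~D; it is plain bit-at-a-time restoring division on native 64-bit types, assuming \.{<stdint.h>} and assuming $x<z$ as above. The name |div128_ref| is hypothetical.

```c
#include <stdint.h>

/* Divide the 128-bit value 2^64*x + y by z, assuming x < z, one bit at a time.
   The quotient is returned and the remainder is stored through rem. */
uint64_t div128_ref(uint64_t x, uint64_t y, uint64_t z, uint64_t *rem)
{
  uint64_t q = 0, r = x;                 /* invariant: r < z */
  int i;
  for (i = 63; i >= 0; i--) {
    uint64_t top = r >> 63;              /* bit about to be shifted out of r */
    r = (r << 1) | ((y >> i) & 1);       /* bring down the next dividend bit */
    q <<= 1;
    if (top || r >= z) {                 /* true value 2^64*top + r is at least z */
      r -= z;                            /* exact modulo 2^64, since the result is below z */
      q |= 1;
    }
  }
  *rem = r;
  return q;
}
```

For the example mentioned in the text, calling |div128_ref| with $x=0$ gives a quotient and remainder that can be compared with the outputs of |odiv|.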
|
@ Signed division can be reduced to unsigned division in a tedious |
but straightforward manner. We assume that the divisor isn't zero. |
|
@<Subr...@>= |
octa signed_odiv @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa signed_odiv(y,z) |
octa y,z; |
{ |
octa yy,zz,q; |
register int sy,sz; |
if (y.h&sign_bit) sy=2, yy=ominus(zero_octa,y); |
else sy=0, yy=y; |
if (z.h&sign_bit) sz=1, zz=ominus(zero_octa,z); |
else sz=0, zz=z; |
q=odiv(zero_octa,yy,zz); |
overflow=false; |
switch (sy+sz) { |
case 2+1: aux=ominus(zero_octa,aux); |
if (q.h==sign_bit) overflow=true; |
case 0+0: return q; |
case 2+0:@+ if (aux.h || aux.l) aux=ominus(zz,aux); |
goto negate_q; |
case 0+1:@+ if (aux.h || aux.l) aux=ominus(aux,zz); |
negate_q:@+ if (aux.h || aux.l) return ominus(neg_one,q); |
else return ominus(zero_octa,q); |
} |
} |
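
The case analysis above yields a quotient rounded toward $-\infty$, so a nonzero remainder takes the sign of the divisor. \CEE/'s own |/| and |%| truncate toward zero instead, so a small adjustment is needed to reproduce the same results. The following sketch shows that adjustment with native types, assuming \.{<stdint.h>}; the name |floor_divmod| is hypothetical, and the one overflowing case ($-2^{63}$ divided by $-1$) is excluded because it is undefined in \CEE/.

```c
#include <stdint.h>

/* Floored division: q = floor(y/z), r = y - q*z, for z != 0.
   The caller must not pass y == INT64_MIN with z == -1. */
void floor_divmod(int64_t y, int64_t z, int64_t *q, int64_t *r)
{
  int64_t qq = y / z;        /* C99 and later truncate toward zero */
  int64_t rr = y % z;        /* so rr is zero or has the sign of y */
  if (rr != 0 && ((rr < 0) != (z < 0))) {
    qq -= 1;                 /* step the quotient down to the floor */
    rr += z;                 /* and give the remainder the divisor's sign */
  }
  *q = qq;
  *r = rr;
}
```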
|
@* Bit fiddling. The bitwise operators of \MMIX\ are fairly easy to |
implement directly, but three of them occur often enough to deserve |
packaging as subroutines. |
|
@<Subr...@>= |
octa oand @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa oand(y,z) /* compute $y\land z$ */ |
octa y,z; |
{@+ octa x; |
x.h=y.h&z.h;@+ x.l=y.l&z.l; |
return x; |
} |
@# |
octa oandn @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa oandn(y,z) /* compute $y\land\bar z$ */ |
octa y,z; |
{@+ octa x; |
x.h=y.h&~z.h;@+ x.l=y.l&~z.l; |
return x; |
} |
@# |
octa oxor @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa oxor(y,z) /* compute $y\oplus z$ */ |
octa y,z; |
{@+ octa x; |
x.h=y.h^z.h;@+ x.l=y.l^z.l; |
return x; |
} |
|
@ Here's a fun way to count the number of bits in a tetrabyte. |
[This classical trick is called the ``Gillies--Miller method |
for sideways addition'' in {\sl The Preparation of Programs |
for an Electronic Digital Computer\/} by Wilkes, Wheeler, and |
Gill, second edition (Reading, Mass.:\ Addison--Wesley, 1957), |
191--193. Some of the tricks used here were suggested by |
Balbir Singh, Peter Rossmanith, and Stefan Schwoon.] |
@^Gillies, Donald Bruce@> |
@^Miller, Jeffrey Charles Percy@> |
@^Wilkes, Maurice Vincent@> |
@^Wheeler, David John@> |
@^Gill, Stanley@> |
@^Singh, Balbir@> |
@^Rossmanith, Peter@> |
@^Schwoon, Stefan@> |
|
@<Subr...@>= |
int count_bits @,@,@[ARGS((tetra))@];@+@t}\6{@> |
int count_bits(x) |
tetra x; |
{ |
register int xx=x; |
xx=xx-((xx>>1)&0x55555555); |
xx=(xx&0x33333333)+((xx>>2)&0x33333333); |
xx=(xx+(xx>>4))&0x0f0f0f0f; |
xx=xx+(xx>>8); |
return (xx+(xx>>16)) & 0xff; |
} |
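
Each line of |count_bits| adds neighboring fields in parallel: first 16 two-bit counts, then 8 four-bit counts, then 4 byte counts, and finally the bytes are summed. Here is the same computation written with an unsigned fixed-width type and commented step by step, assuming \.{<stdint.h>}; the name |count_bits_u32| is illustrative.

```c
#include <stdint.h>

/* Sideways addition: number of 1 bits in a 32-bit word. */
int count_bits_u32(uint32_t x)
{
  x = x - ((x >> 1) & 0x55555555u);                  /* 16 two-bit counts */
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 8 four-bit counts */
  x = (x + (x >> 4)) & 0x0f0f0f0fu;                  /* 4 byte counts, each at most 8 */
  x = x + (x >> 8);                                  /* pairs of bytes */
  return (int)((x + (x >> 16)) & 0xff);              /* total, at most 32 */
}
```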
|
@ To compute the nonnegative byte differences of two given tetrabytes, |
we can carry out the following 20-step branchless computation: |
|
@<Subr...@>= |
tetra byte_diff @,@,@[ARGS((tetra,tetra))@];@+@t}\6{@> |
tetra byte_diff(y,z) |
tetra y,z; |
{ |
register tetra d=(y&0x00ff00ff)+0x01000100-(z&0x00ff00ff); |
register tetra m=d&0x01000100; |
register tetra x=d&(m-(m>>8)); |
d=((y>>8)&0x00ff00ff)+0x01000100-((z>>8)&0x00ff00ff); |
m=d&0x01000100; |
return x+((d&(m-(m>>8)))<<8); |
} |
|
@ To compute the nonnegative wyde differences of two tetrabytes, |
another trick leads to a 15-step branchless computation. |
(Research problem: Can |count_bits|, |byte_diff|, or |wyde_diff| be done |
with fewer operations?) |
|
@<Subr...@>= |
tetra wyde_diff @,@,@[ARGS((tetra,tetra))@];@+@t}\6{@> |
tetra wyde_diff(y,z) |
tetra y,z; |
{ |
register tetra a=((y>>16)-(z>>16))&0x10000; |
register tetra b=((y&0xffff)-(z&0xffff))&0x10000; |
return y-(z^((y^z)&(b-a-(b>>16)))); |
} |
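
Both branchless routines compute, field by field, the larger of $0$ and $y_i-z_i$. A straightforward loop makes that specification explicit and can serve as a cross-check. The sketch below assumes \.{<stdint.h>}; |lane_diff_ref| is a hypothetical reference routine, and |wyde_diff_u32| is a transcription of |wyde_diff| with fixed-width types.

```c
#include <stdint.h>

/* Reference: per-field max(0, y_i - z_i) for fields of the given width (8 or 16 bits). */
uint32_t lane_diff_ref(uint32_t y, uint32_t z, int width)
{
  uint32_t mask = (1u << width) - 1, x = 0;
  int shift;
  for (shift = 0; shift < 32; shift += width) {
    uint32_t a = (y >> shift) & mask, b = (z >> shift) & mask;
    if (a > b) x |= (a - b) << shift;
  }
  return x;
}

/* The branchless wyde version from the text, with fixed-width types. */
uint32_t wyde_diff_u32(uint32_t y, uint32_t z)
{
  uint32_t a = ((y >> 16) - (z >> 16)) & 0x10000;       /* set if the high wyde borrows */
  uint32_t b = ((y & 0xffff) - (z & 0xffff)) & 0x10000; /* set if the low wyde borrows */
  /* wherever a field would borrow, replace z by y so that the difference is 0 */
  return y - (z ^ ((y ^ z) & (b - a - (b >> 16))));
}
```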
|
@ The last bitwise subroutine we need is the most interesting: |
It implements \MMIX's \.{MOR} and \.{MXOR} operations. |
|
@<Subr...@>= |
octa bool_mult @,@,@[ARGS((octa,octa,bool))@];@+@t}\6{@> |
octa bool_mult(y,z,xor) |
octa y,z; /* the operands */ |
bool xor; /* do we do xor instead of or? */ |
{ |
octa o,x; |
register tetra a,b,c; |
register int k; |
for (k=0,o=y,x=zero_octa;o.h||o.l;k++,o=shift_right(o,8,1)) |
if (o.l&0xff) { |
a=((z.h>>k)&0x01010101)*0xff; |
b=((z.l>>k)&0x01010101)*0xff; |
c=(o.l&0xff)*0x01010101; |
if (xor) x.h^=a&c, x.l^=b&c; |
else x.h|=a&c, x.l|=b&c; |
} |
return x; |
} |
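
Reading the loop above byte by byte: byte $i$ of the result combines, with or (or xor), those bytes $k$ of~|y| for which bit $k$ of byte $i$ of~|z| is set. The sketch below spells out that reading with two nested loops on a native 64-bit type, assuming \.{<stdint.h>}. The name |mor_ref| is hypothetical, and bytes and bits are numbered from the least significant end.

```c
#include <stdint.h>

/* Reference for bool_mult: result byte i is the OR (or XOR, if use_xor is nonzero)
   of the bytes k of y selected by the set bits k of byte i of z. */
uint64_t mor_ref(uint64_t y, uint64_t z, int use_xor)
{
  uint64_t x = 0;
  int i, k;
  for (i = 0; i < 8; i++) {
    uint8_t zi = (uint8_t)(z >> (8 * i));    /* selector byte */
    uint8_t acc = 0;
    for (k = 0; k < 8; k++)
      if ((zi >> k) & 1) {
        uint8_t yk = (uint8_t)(y >> (8 * k));
        acc = (uint8_t)(use_xor ? acc ^ yk : acc | yk);
      }
    x |= (uint64_t)acc << (8 * i);
  }
  return x;
}
```

With this numbering, the selector \Hex{8040201008040201} leaves |y| unchanged and \Hex{0102040810204080} reverses its bytes.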
|
@* Floating point packing and unpacking. Standard IEEE floating binary |
numbers pack a sign, exponent, and fraction into a tetrabyte |
or octabyte. In this section we consider basic subroutines that |
convert between IEEE format and the separate unpacked components. |
|
@d ROUND_OFF 1 |
@d ROUND_UP 2 |
@d ROUND_DOWN 3 |
@d ROUND_NEAR 4 |
|
@<Glob...@>= |
int cur_round; /* the current rounding mode */ |
|
@ The |fpack| routine takes an octabyte $f$, a raw exponent~$e$, |
and a sign~|s|, and packs them |
into the floating binary number that corresponds to |
$\pm2^{e-1076}f$, using a given rounding mode. |
The value of $f$ should satisfy $2^{54}\le f\le 2^{55}$. |
|
Thus, for example, the floating binary number $+1.0=\Hex{3ff0000000000000}$ |
is obtained when $f=2^{54}$, $e=\Hex{3fe}$, and |s='+'|. |
The raw exponent~$e$ is usually one less than |
the final exponent value; the leading bit of~$f$ is essentially added |
to the exponent. (This trick works nicely for subnormal numbers, when |
$e<0$, or in cases where the value of $f$ is rounded upwards to $2^{55}$.) |
|
Exceptional events are noted by oring appropriate bits into |
the global variable |exceptions|. Special considerations apply to |
underflow, which is not fully specified by Section 7.4 of the IEEE standard: |
Implementations of the standard are free to choose between two definitions |
of ``tininess'' and two definitions of ``accuracy loss.'' |
\MMIX\ determines tininess {\it after\/} rounding, hence a result with |
$e<0$ is not necessarily tiny; \MMIX\ treats accuracy loss as equivalent |
to inexactness. Thus, a result underflows if and only if |
it is tiny and either (i)~it is inexact or (ii)~the underflow trap is enabled. |
The |fpack| routine sets |U_BIT| in |exceptions| if and only if the result is |
tiny, |X_BIT| if and only if the result is inexact. |
@^underflow@> |
|
@d X_BIT (1<<8) /* floating inexact */ |
@d Z_BIT (1<<9) /* floating division by zero */ |
@d U_BIT (1<<10) /* floating underflow */ |
@d O_BIT (1<<11) /* floating overflow */ |
@d I_BIT (1<<12) /* floating invalid operation */ |
@d W_BIT (1<<13) /* float-to-fix overflow */ |
@d V_BIT (1<<14) /* integer overflow */ |
@d D_BIT (1<<15) /* integer divide check */ |
@d E_BIT (1<<18) /* external (dynamic) trap bit */ |
|
@<Subr...@>= |
octa fpack @,@,@[ARGS((octa,int,char,int))@];@+@t}\6{@> |
octa fpack(f,e,s,r) |
octa f; /* the normalized fraction part */ |
int e; /* the raw exponent */ |
char s; /* the sign */ |
int r; /* the rounding mode */ |
{ |
octa o; |
if (e>0x7fd) e=0x7ff, o=zero_octa; |
else { |
if (e<0) { |
if (e<-54) o.h=0, o.l=1; |
else {@+octa oo; |
o=shift_right(f,-e,1); |
oo=shift_left(o,-e); |
if (oo.l!=f.l || oo.h!=f.h) o.l |= 1; /* sticky bit */ |
@^sticky bit@> |
} |
e=0; |
}@+else o=f; |
} |
@<Round and return the result@>; |
} |
|
@ @<Glob...@>= |
int exceptions; /* bits possibly destined for rA */ |
|
@ Everything falls together so nicely here, it's almost too good to be true! |
|
@<Round and return the result@>= |
if (o.l&3) exceptions |= X_BIT; |
switch (r) { |
case ROUND_DOWN:@+ if (s=='-') o=incr(o,3);@+break; |
case ROUND_UP:@+ if (s!='-') o=incr(o,3); |
case ROUND_OFF: break; |
case ROUND_NEAR: o=incr(o, o.l&4? 2: 1);@+break; |
} |
o = shift_right(o,2,1); |
o.h += e<<20; |
if (o.h>=0x7ff00000) exceptions |= O_BIT+X_BIT; /* overflow */ |
else if (o.h<0x100000) exceptions |= U_BIT; /* tininess */ |
if (s=='-') o.h |= sign_bit; |
return o; |
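
The reason everything falls together is that the fraction carries two extra low-order bits, so every rounding mode becomes ``add a small constant, then shift right by~2,'' and any carry out of the fraction simply increments the exponent field. The sketch below illustrates this for the ordinary case $0\le e\le\Hex{7fd}$ on a native 64-bit type, assuming \.{<stdint.h>}. The name |pack_sketch| is hypothetical, and the subnormal, overflow, and |exceptions| handling of |fpack| are deliberately omitted.

```c
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP, ROUND_DOWN, ROUND_NEAR };   /* same values as in the text */

/* Pack a fraction f (2^54 <= f <= 2^55, two extra low bits) and a raw
   exponent e (0 <= e <= 0x7fd) into the bits of an IEEE double. */
uint64_t pack_sketch(uint64_t f, int e, int negative, int mode)
{
  uint64_t o = f;
  switch (mode) {
  case ROUND_DOWN: if (negative) o += 3; break;    /* toward -infinity */
  case ROUND_UP:   if (!negative) o += 3; break;   /* toward +infinity */
  case ROUND_OFF:  break;                          /* truncate */
  case ROUND_NEAR: o += (o & 4) ? 2 : 1; break;    /* ties go to the even neighbor */
  }
  o >>= 2;                          /* discard the two extra bits */
  o += (uint64_t)e << 52;           /* the leading fraction bit bumps the exponent */
  if (negative) o |= (uint64_t)1 << 63;
  return o;
}
```

For instance, $f=2^{54}$ with $e=\Hex{3fe}$ yields \Hex{3ff0000000000000}, the representation of $+1.0$ mentioned earlier, and $f=2^{55}-1$ rounded up carries into the exponent to give $2.0$.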
|
@ Similarly, |sfpack| packs a short float, from inputs |
having the same conventions as |fpack|. |
|
@<Subr...@>= |
tetra sfpack @,@,@[ARGS((octa,int,char,int))@];@+@t}\6{@> |
tetra sfpack(f,e,s,r) |
octa f; /* the fraction part */ |
int e; /* the raw exponent */ |
char s; /* the sign */ |
int r; /* the rounding mode */ |
{ |
register tetra o; |
if (e>0x47d) e=0x47f, o=0; |
else { |
o=shift_left(f,3).h; |
if (f.l&0x1fffffff) o|=1; |
if (e<0x380) { |
if (e<0x380-25) o=1; |
else {@+register tetra o0,oo; |
o0 = o; |
o = o>>(0x380-e); |
oo = o<<(0x380-e); |
if (oo!=o0) o |= 1; /* sticky bit */ |
@^sticky bit@> |
} |
e=0x380; |
} |
} |
@<Round and return the short result@>; |
} |
|
@ @<Round and return the short result@>= |
if (o&3) exceptions |= X_BIT; |
switch (r) { |
case ROUND_DOWN:@+ if (s=='-') o+=3;@+break; |
case ROUND_UP:@+ if (s!='-') o+=3; |
case ROUND_OFF: break; |
case ROUND_NEAR: o+=(o&4? 2: 1);@+break; |
} |
o = o>>2; |
o += (e-0x380)<<23; |
if (o>=0x7f800000) exceptions |= O_BIT+X_BIT; /* overflow */ |
else if (o<0x800000) exceptions |= U_BIT; /* tininess: the 8-bit exponent field is zero */ |
if (s=='-') o |= sign_bit; |
return o; |
|
@ The |funpack| routine is, roughly speaking, the opposite of |fpack|. |
It takes a given floating point number~$x$ and separates out its |
fraction part~$f$, exponent~$e$, and sign~$s$. It clears |exceptions| |
to zero. It returns the type of value found: |zro|, |num|, |inf|, |
or |nan|. When it returns |num|, |
it will have set $f$, $e$, and~$s$ |
to the values from which |fpack| would produce the original number~$x$ |
without exceptions. |
|
@d zero_exponent (-1000) /* zero is assumed to have this exponent */ |
|
@<Other type...@>= |
typedef enum {@!zro,@!num,@!inf,@!nan}@+ftype; |
|
@ @<Subr...@>= |
ftype funpack @,@,@[ARGS((octa,octa*,int*,char*))@];@+@t}\6{@> |
ftype funpack(x,f,e,s) |
octa x; /* the given floating point value */ |
octa *f; /* address where the fraction part should be stored */ |
int *e; /* address where the exponent part should be stored */ |
char *s; /* address where the sign should be stored */ |
{ |
register int ee; |
exceptions=0; |
*s=(x.h&sign_bit? '-': '+'); |
*f=shift_left(x,2); |
f->h &= 0x3fffff; |
ee=(x.h>>20)&0x7ff; |
if (ee) { |
*e=ee-1; |
f->h |= 0x400000; |
return (ee<0x7ff? num: f->h==0x400000 && !f->l? inf: nan); |
} |
if (!x.l && !f->h) { |
*e=zero_exponent;@+ return zro; |
} |
do {@+ ee--;@+ *f=shift_left(*f,1);@+} while (!(f->h&0x400000)); |
*e=ee;@+ return num; |
} |
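
As an illustration of the unpacked format, here is a sketch of the same unpacking on a native 64-bit type, assuming \.{<stdint.h>}. The names |unpack_sketch| and |ftype_s| (with its members) are hypothetical, the sign is returned as a flag rather than a character, and |exceptions| is not touched. A subnormal input is normalized by shifting left until the leading bit reaches position~54, decreasing the exponent each time.

```c
#include <stdint.h>

typedef enum { zro_s, num_s, inf_s, nan_s } ftype_s;   /* illustrative names */

ftype_s unpack_sketch(uint64_t x, uint64_t *f, int *e, int *negative)
{
  int ee = (int)((x >> 52) & 0x7ff);               /* biased exponent field */
  *negative = (int)(x >> 63);
  *f = (x << 2) & (((uint64_t)1 << 54) - 1);       /* 52 fraction bits, two zero bits below */
  if (ee) {
    *e = ee - 1;                                   /* raw exponent is one less */
    *f |= (uint64_t)1 << 54;                       /* restore the hidden bit */
    if (ee < 0x7ff) return num_s;
    return *f == ((uint64_t)1 << 54) ? inf_s : nan_s;
  }
  if (*f == 0) { *e = -1000; return zro_s; }       /* zero_exponent in the text */
  do { ee--; *f <<= 1; } while (!(*f & ((uint64_t)1 << 54)));   /* normalize a subnormal */
  *e = ee;
  return num_s;
}
```

The smallest subnormal, with bit pattern 1, unpacks to $f=2^{54}$ and $e=-52$, which indeed represents $2^{e-1076}f=2^{-1074}$.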
|
@ @<Subr...@>= |
ftype sfunpack @,@,@[ARGS((tetra,octa*,int*,char*))@];@+@t}\6{@> |
ftype sfunpack(x,f,e,s) |
tetra x; /* the given floating point value */ |
octa *f; /* address where the fraction part should be stored */ |
int *e; /* address where the exponent part should be stored */ |
char *s; /* address where the sign should be stored */ |
{ |
register int ee; |
exceptions=0; |
*s=(x&sign_bit? '-': '+'); |
f->h=(x>>1)&0x3fffff, f->l=x<<31; |
ee=(x>>23)&0xff; |
if (ee) { |
*e=ee+0x380-1; |
f->h |= 0x400000; |
return (ee<0xff? num: (x&0x7fffffff)==0x7f800000? inf: nan); |
} |
if (!(x&0x7fffffff)) { |
*e=zero_exponent;@+return zro; |
} |
do {@+ ee--;@+ *f=shift_left(*f,1);@+} while (!(f->h&0x400000)); |
*e=ee+0x380;@+ return num; |
} |
|
@ Since \MMIX\ downplays 32-bit operations, it uses |sfpack| and |sfunpack| |
only when loading and storing short floats, or when converting |
from fixed point to floating point. |
|
@<Subr...@>= |
octa load_sf @,@,@[ARGS((tetra))@];@+@t}\6{@> |
octa load_sf(z) |
tetra z; /* 32 bits to be loaded into a 64-bit register */ |
{ |
octa f,x;@+int e;@+char s;@+ftype t; |
t=sfunpack(z,&f,&e,&s); |
switch (t) { |
case zro: x=zero_octa;@+break; |
case num: return fpack(f,e,s,ROUND_OFF); |
case inf: x=inf_octa;@+break; |
case nan: x=shift_right(f,2,1);@+x.h|=0x7ff00000;@+break; |
} |
if (s=='-') x.h|=sign_bit; |
return x; |
} |
|
@ @<Subr...@>= |
tetra store_sf @,@,@[ARGS((octa))@];@+@t}\6{@> |
tetra store_sf(x) |
octa x; /* 64 bits to be stored into a 32-bit word */ |
{ |
octa f;@+tetra z;@+int e;@+char s;@+ftype t; |
t=funpack(x,&f,&e,&s); |
switch (t) { |
case zro: z=0;@+break; |
case num: return sfpack(f,e,s,cur_round); |
case inf: z=0x7f800000;@+break; |
case nan:@+ if (!(f.h&0x200000)) { |
f.h|=0x200000;@+exceptions|=I_BIT; /* NaN was signaling */ |
} |
z=0x7f800000|(f.h<<1)|(f.l>>31);@+break; |
} |
if (s=='-') z|=sign_bit; |
return z; |
} |
|
@* Floating multiplication and division. |
The hardest fixed point operations were multiplication and division; |
but these two operations are the {\it easiest\/} to implement in floating point |
arithmetic, once their fixed point counterparts are available. |
|
@<Subr...@>= |
octa fmult @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa fmult(y,z) |
octa y,z; |
{ |
ftype yt,zt; |
int ye,ze; |
char ys,zs; |
octa x,xf,yf,zf; |
register int xe; |
register char xs; |
yt=funpack(y,&yf,&ye,&ys); |
zt=funpack(z,&zf,&ze,&zs); |
xs=ys+zs-'+'; /* will be |'-'| when the result is negative */ |
switch (4*yt+zt) { |
@t\4@>@<The usual NaN cases@>; |
case 4*zro+zro: case 4*zro+num: case 4*num+zro: x=zero_octa;@+break; |
case 4*num+inf: case 4*inf+num: case 4*inf+inf: x=inf_octa;@+break; |
case 4*zro+inf: case 4*inf+zro: x=standard_NaN; |
exceptions|=I_BIT;@+break; |
case 4*num+num: @<Multiply nonzero numbers and |return|@>; |
} |
if (xs=='-') x.h|=sign_bit; |
return x; |
} |
|
@ @<The usual NaN cases@>= |
case 4*nan+nan:@+if (!(y.h&0x80000)) exceptions|=I_BIT; /* |y| is signaling */ |
case 4*zro+nan: case 4*num+nan: case 4*inf+nan: |
if (!(z.h&0x80000)) exceptions|=I_BIT, z.h|=0x80000; |
return z; |
case 4*nan+zro: case 4*nan+num: case 4*nan+inf: |
if (!(y.h&0x80000)) exceptions|=I_BIT, y.h|=0x80000; |
return y; |
|
@ @<Multiply nonzero numbers and |return|@>= |
xe=ye+ze-0x3fd; /* the raw exponent */ |
x=omult(yf,shift_left(zf,9)); |
if (aux.h>=0x400000) xf=aux; |
else xf=shift_left(aux,1), xe--; |
if (x.h||x.l) xf.l|=1; /* adjust the sticky bit */ |
return fpack(xf,xe,xs,cur_round); |
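
To see how the exponent and fraction interact, note that the fractions lie in $[2^{54},2^{55})$, so after |zf| is shifted left by~9 the upper half of the 128-bit product lies in $[2^{53},2^{55})$ and needs at most one renormalizing shift. The sketch below reproduces this step on native 64-bit types, assuming \.{<stdint.h>}. The names |umulhi| and |fmult_sketch| are hypothetical, and the result still has to be rounded by a packing routine such as |fpack|.

```c
#include <stdint.h>

/* Upper 64 bits of the unsigned 128-bit product, built from 32-bit halves. */
uint64_t umulhi(uint64_t y, uint64_t z)
{
  uint64_t yl = (uint32_t)y, yh = y >> 32, zl = (uint32_t)z, zh = z >> 32;
  uint64_t ll = yl * zl, lh = yl * zh, hl = yh * zl, hh = yh * zh;
  uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;   /* cannot overflow */
  return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* Multiply unpacked numbers: fractions yf, zf in [2^54, 2^55), raw exponents ye, ze. */
void fmult_sketch(uint64_t yf, int ye, uint64_t zf, int ze, uint64_t *xf, int *xe)
{
  uint64_t z9 = zf << 9;                  /* now in [2^63, 2^64) */
  uint64_t hi = umulhi(yf, z9);           /* in [2^53, 2^55) */
  uint64_t lo = yf * z9;                  /* low 64 bits, needed only for stickiness */
  *xe = ye + ze - 0x3fd;                  /* the raw exponent */
  if (hi < ((uint64_t)1 << 54)) {         /* leading bit is one place too low */
    hi <<= 1;
    (*xe)--;
  }
  if (lo) hi |= 1;                        /* sticky bit records the lost low half */
  *xf = hi;
}
```

For $1.5\times1.5$ the inputs are $yf=zf=3\cdot2^{53}$ and $ye=ze=\Hex{3fe}$; the sketch produces $xf=9\cdot2^{51}$ and $xe=\Hex{3ff}$, that is, $2^{xe-1076}xf=2.25$.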
|
@ @<Subr...@>= |
octa fdivide @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa fdivide(y,z) |
octa y,z; |
{ |
ftype yt,zt; |
int ye,ze; |
char ys,zs; |
octa x,xf,yf,zf; |
register int xe; |
register char xs; |
yt=funpack(y,&yf,&ye,&ys); |
zt=funpack(z,&zf,&ze,&zs); |
xs=ys+zs-'+'; /* will be |'-'| when the result is negative */ |
switch (4*yt+zt) { |
@t\4@>@<The usual NaN cases@>; |
case 4*zro+inf: case 4*zro+num: case 4*num+inf: x=zero_octa;@+break; |
case 4*num+zro: exceptions|=Z_BIT; |
case 4*inf+num: case 4*inf+zro: x=inf_octa;@+break; |
case 4*zro+zro: case 4*inf+inf: x=standard_NaN; |
exceptions|=I_BIT;@+break; |
case 4*num+num: @<Divide nonzero numbers and |return|@>; |
} |
if (xs=='-') x.h|=sign_bit; |
return x; |
} |
|
@ @<Divide nonzero numbers...@>= |
xe=ye-ze+0x3fd; /* the raw exponent */ |
xf=odiv(yf,zero_octa,shift_left(zf,9)); |
if (xf.h>=0x800000) { |
aux.l|=xf.l&1; |
xf=shift_right(xf,1,1); |
xe++; |
} |
if (aux.h||aux.l) xf.l|=1; /* adjust the sticky bit */ |
return fpack(xf,xe,xs,cur_round); |
|
@*Floating addition and subtraction. Now for the bread-and-butter |
operation, the sum of two floating point numbers. |
It is not terribly difficult, but many cases need to be handled carefully. |
|
@<Subr...@>= |
octa fplus @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
octa fplus(y,z) |
octa y,z; |
{ |
ftype yt,zt; |
int ye,ze; |
char ys,zs; |
octa x,xf,yf,zf; |
register int xe,d; |
register char xs; |
yt=funpack(y,&yf,&ye,&ys); |
zt=funpack(z,&zf,&ze,&zs); |
switch (4*yt+zt) { |
@t\4@>@<The usual NaN cases@>; |
case 4*zro+num: return fpack(zf,ze,zs,ROUND_OFF);@+break; /* may underflow */ |
case 4*num+zro: return fpack(yf,ye,ys,ROUND_OFF);@+break; /* may underflow */ |
case 4*inf+inf:@+if (ys!=zs) { |
exceptions|=I_BIT;@+x=standard_NaN;@+xs=zs;@+break; |
} |
case 4*num+inf: case 4*zro+inf: x=inf_octa;@+xs=zs;@+break; |
case 4*inf+num: case 4*inf+zro: x=inf_octa;@+xs=ys;@+break; |
case 4*num+num:@+ if (y.h!=(z.h^0x80000000) || y.l!=z.l) |
@<Add nonzero numbers and |return|@>; |
case 4*zro+zro: x=zero_octa; |
xs=(ys==zs? ys: cur_round==ROUND_DOWN? '-': '+');@+break; |
} |
if (xs=='-') x.h|=sign_bit; |
return x; |
} |
|
@ @<Add nonzero numbers...@>= |
{@+octa o,oo; |
if (ye<ze || (ye==ze && (yf.h<zf.h || (yf.h==zf.h && yf.l<zf.l)))) |
@<Exchange |y| with |z|@>; |
d=ye-ze; |
xs=ys, xe=ye; |
if (d) @<Adjust for difference in exponents@>; |
if (ys==zs) { |
xf=oplus(yf,zf); |
if (xf.h>=0x800000) xe++, d=xf.l&1, xf=shift_right(xf,1,1), xf.l|=d; |
}@+else { |
xf=ominus(yf,zf); |
if (xf.h>=0x800000) xe++, d=xf.l&1, xf=shift_right(xf,1,1), xf.l|=d; |
else@+ while (xf.h<0x400000) xe--, xf=shift_left(xf,1); |
} |
return fpack(xf,xe,xs,cur_round); |
} |
|
@ @<Exchange |y| with |z|@>= |
{ |
o=yf, yf=zf, zf=o; |
d=ye, ye=ze, ze=d; |
d=ys, ys=zs, zs=d; |
} |
|
@ Proper rounding requires two bits to the right of the fraction delivered |
to~|fpack|. The first is the true next bit of the result; |
the other is a ``sticky'' bit, which is nonzero if any further bits of the |
true result are nonzero. Sticky rounding to an integer takes |
$x$ into the number $\lfloor x/2\rfloor+\lceil x/2\rceil$. |
@^sticky bit@> |
|
Some subtleties need to be observed here, in order to |
prevent the sticky bit from being shifted left. If we did not |
shift |yf| left~1 before shifting |zf| to the right, an incorrect |
answer would be obtained in certain cases---for example, if |
$|yf|=2^{54}$, $|zf|=2^{54}+2^{53}-1$, $d=52$. |
|
@<Adjust for difference in exponents@>= |
{ |
if (d<=2) zf=shift_right(zf,d,1); /* exact result */ |
else if (d>53) zf.h=0, zf.l=1; /* tricky but OK */ |
else { |
if (ys!=zs) d--,xe--,yf=shift_left(yf,1); |
o=zf; |
zf=shift_right(o,d,1); |
oo=shift_left(zf,d); |
if (oo.l!=o.l || oo.h!=o.h) zf.l|=1; |
} |
} |
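
The last branch above is a right shift with a sticky bit: shift, shift back, and if anything was lost force the result to be odd. Here is that idiom in isolation on a native 64-bit type, assuming \.{<stdint.h>}; the name |shift_right_sticky| is hypothetical.

```c
#include <stdint.h>

/* Shift x right by d bits (0 <= d <= 63), setting the lowest bit of the
   result if any nonzero bit was shifted out. */
uint64_t shift_right_sticky(uint64_t x, int d)
{
  uint64_t r = x >> d;
  if ((r << d) != x) r |= 1;     /* something was lost, so make the result odd */
  return r;
}
```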
|
@ The comparison of floating point numbers with respect to $\epsilon$ |
shares some of the characteristics of floating point addition/subtraction. |
In some ways it is simpler, and in other ways it is more difficult; |
we might as well deal with it now. % anyways |
|
Subroutine |fepscomp(y,z,e,s)| returns 2 if |y|, |z|, or |e| is a NaN |
or |e| is negative. It returns 1 if |s=0| and $y\approx z\ (e)$ or if |
|s!=0| and $y\sim z\ (e)$, |
as defined in Section~4.2.2 of {\sl Seminumerical Algorithms\/}; |
otherwise it returns~0. |
|
@<Subr...@>= |
int fepscomp @,@,@[ARGS((octa,octa,octa,int))@];@+@t}\6{@> |
int fepscomp(y,z,e,s) |
octa y,z,e; /* the operands */ |
int s; /* test similarity? */ |
{ |
octa yf,zf,ef,o,oo; |
int ye,ze,ee; |
char ys,zs,es; |
register int yt,zt,et,d; |
et=funpack(e,&ef,&ee,&es); |
if (es=='-') return 2; |
switch (et) { |
case nan: return 2; |
case inf: ee=10000; |
case num: case zro: break; |
} |
yt=funpack(y,&yf,&ye,&ys); |
zt=funpack(z,&zf,&ze,&zs); |
switch (4*yt+zt) { |
case 4*nan+nan: case 4*nan+inf: case 4*nan+num: case 4*nan+zro: |
case 4*inf+nan: case 4*num+nan: case 4*zro+nan: return 2; |
case 4*inf+inf: return (ys==zs || ee>=1023); |
case 4*inf+num: case 4*inf+zro: case 4*num+inf: case 4*zro+inf: |
return (s && ee>=1022); |
case 4*zro+zro: return 1; |
case 4*zro+num: case 4*num+zro:@+ if (!s) return 0; |
case 4*num+num: break; |
} |
@<Compare two numbers with respect to epsilon and |return|@>; |
} |
|
@ The relation $y\approx z\ (\epsilon)$ reduces to |
$y\sim z\ (\epsilon/2^d)$, if $d$~is the difference between the |
larger and smaller exponents of $y$ and~$z$. |
|
@<Compare two numbers with respect to epsilon and |return|@>= |
@<Unsubnormalize |y| and |z|, if they are subnormal@>; |
if (ye<ze || (ye==ze && (yf.h<zf.h || (yf.h==zf.h && yf.l<zf.l)))) |
@<Exchange |y| with |z|@>; |
if (ze==zero_exponent) ze=ye; |
d=ye-ze; |
if (!s) ee-=d; |
if (ee>=1023) return 1; /* if $\epsilon\ge2$, $z\in N_\epsilon(y)$ */ |
@<Compute the difference of fraction parts, |o|@>; |
if (!o.h && !o.l) return 1; |
if (ee<968) return 0; /* if $y\ne z$ and $\epsilon<2^{-54}$, $y\not\sim z$ */ |
if (ee>=1021) ef=shift_left(ef,ee-1021); |
else ef=shift_right(ef,1021-ee,1); |
return o.h<ef.h || (o.h==ef.h && o.l<=ef.l); |
|
@ @<Unsubnormalize |y| and |z|, if they are subnormal@>= |
if (ye<0 && yt!=zro) yf=shift_left(y,2), ye=0; |
if (ze<0 && zt!=zro) zf=shift_left(z,2), ze=0; |
|
@ At this point $y\sim z$ if and only if |
$$|yf|+(-1)^{[ys=zs]}|zf|/2^d\le 2^{ee-1021}|ef|=2^{55}\epsilon.$$ |
We need to evaluate this relation without overstepping the bounds of |
our simulated 64-bit registers. |
|
When $d>2$, the difference of fraction parts might not fit exactly |
in an octabyte; |
in that case the numbers are not similar unless $\epsilon>3/8$, |
and we replace the difference by the ceiling of the |
true result. When $\epsilon<1/8$, our program essentially replaces |
$2^{55}\epsilon$ by $\lfloor2^{55}\epsilon\rfloor$. These |
truncations are not needed simultaneously. Therefore the logic |
is justified by the facts that, if $n$ is an integer, we have |
$x\le n$ if and only if $\lceil x\rceil\le n$; |
$n\le x$ if and only if $n\le\lfloor x\rfloor$. (Notice that the |
concept of ``sticky bit'' is {\it not\/} appropriate here.) |
@^sticky bit@> |
|
@<Compute the difference of fraction parts, |o|@>= |
if (d>54) o=zero_octa,oo=zf; |
else o=shift_right(zf,d,1),oo=shift_left(o,d); |
if (oo.h!=zf.h || oo.l!=zf.l) { /* truncated result, hence $d>2$ */ |
if (ee<1020) return 0; /* difference is too large for similarity */ |
o=incr(o,ys==zs? 0: 1); /* adjust for ceiling */ |
} |
o=(ys==zs? ominus(yf,o): oplus(yf,o)); |
|
@*Floating point output conversion. |
The |print_float| routine converts an octabyte to a floating decimal |
representation that will be input as precisely the same value. |
@^binary-to-decimal conversion@> |
@^radix conversion@> |
@^multiprecision conversion@> |
|
@<Subr...@>= |
static void bignum_times_ten @,@,@[ARGS((bignum*))@]; |
static void bignum_dec @,@,@[ARGS((bignum*,bignum*,tetra))@]; |
static int bignum_compare @,@,@[ARGS((bignum*,bignum*))@]; |
void print_float @,@,@[ARGS((octa))@];@+@t}\6{@> |
void print_float(x) |
octa x; |
{ |
@<Local variables for |print_float|@>; |
if (x.h&sign_bit) printf("-"); |
@<Extract the exponent |e| and determine the |
fraction interval $[f\dts g]$ or $(f\dts g)$@>; |
@<Store $f$ and $g$ as multiprecise integers@>; |
@<Compute the significant digits |s| and decimal exponent |e|@>; |
@<Print the significant digits with proper context@>; |
} |
|
@ One way to visualize the problem being solved here is to consider |
the vastly simpler case in which there are only 2-bit exponents |
and 2-bit fractions. Then the sixteen possible 4-bit combinations |
have the following interpretations: |
$$\def\\{\;\dts\;} |
\vbox{\halign{#\qquad&$#$\hfil\cr |
0000&[0\\0.125]\cr |
0001&(0.125\\0.375)\cr |
0010&[0.375\\0.625]\cr |
0011&(0.625\\0.875)\cr |
0100&[0.875\\1.125]\cr |
0101&(1.125\\1.375)\cr |
0110&[1.375\\1.625]\cr |
0111&(1.625\\1.875)\cr |
1000&[1.875\\2.25]\cr |
1001&(2.25\\2.75)\cr |
1010&[2.75\\3.25]\cr |
1011&(3.25\\3.75)\cr |
1100&[3.75\\\infty]\cr |
1101&\rm NaN(0\\0.375)\cr |
1110&\rm NaN[0.375\\0.625]\cr |
1111&\rm NaN(0.625\\1)\cr}}$$ |
Notice that the interval is closed, $[f\dts g]$, when the fraction part |
is even; it is open, $(f\dts g)$, when the fraction part is odd. |
The printed outputs for these sixteen values, if we actually were |
dealing with such short exponents and fractions, would be |
\.{0.}, \.{.2}, \.{.5}, \.{.7}, \.{1.}, \.{1.2}, \.{1.5}, \.{1.7}, |
\.{2.}, \.{2.5}, \.{3.}, \.{3.5}, \.{Inf}, \.{NaN.2}, \.{NaN}, \.{NaN.8}, |
respectively. |
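
The interval endpoints in this table are simply the midpoints between each value and its two neighbors. The sketch below recomputes them for the finite patterns 0001 through 1011; it is an illustration only, and the names |mini_value|, |mini_interval|, and |mini_interval_is_closed| are hypothetical. All quantities involved are exactly representable as \CEE/ doubles.

```c
/* Value of a 4-bit pattern with a 2-bit exponent e and a 2-bit fraction f.
   Pattern 12 (1100) is treated as 4.0 here only to locate the upper endpoint
   of pattern 11; in the table it denotes infinity. */
double mini_value(int bits)
{
  int e = bits >> 2, f = bits & 3;
  if (e == 0) return f * 0.25;                 /* subnormal: no hidden bit */
  return (4 + f) * 0.25 * (1 << (e - 1));      /* hidden bit, then scale by 2^(e-1) */
}

/* Endpoints of the interval of reals that round to the given pattern (1..11). */
void mini_interval(int bits, double *lo, double *hi)
{
  *lo = (mini_value(bits - 1) + mini_value(bits)) / 2;
  *hi = (mini_value(bits) + mini_value(bits + 1)) / 2;
}

/* Ties go to the even fraction, so its interval includes both endpoints. */
int mini_interval_is_closed(int bits)
{
  return (bits & 1) == 0;
}
```

For pattern 1000 the neighbors are $1.75$ and $2.5$, giving the interval $[1.875\;\dts\;2.25]$ shown in the table.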
|
@<Extract the exponent |e|...@>= |
f=shift_left(x,1); |
e=f.h>>21; |
f.h&=0x1fffff; |
if (!f.h && !f.l) @<Handle the special case when the fraction part is zero@>@; |
else { |
g=incr(f,1); |
f=incr(f,-1); |
if (!e) e=1; /* subnormal */ |
else if (e==0x7ff) { |
printf("NaN"); |
if (g.h==0x100000 && g.l==1) return; /* the ``standard'' NaN */ |
e=0x3ff; /* extreme NaNs come out OK even without adjusting |f| or |g| */ |
}@+else f.h|=0x200000, g.h|=0x200000; |
} |
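
@ The sixteen-row table above can be reproduced mechanically. The following
is a minimal sketch in plain C, not part of the program; the names
|mini_value|, |mini_range| and the struct |mini_interval| are hypothetical,
and only the twelve finite patterns are covered. Each interval runs from the
midpoint with the previous value (or from 0) to the midpoint with the next
value, and it is closed exactly when the fraction part is even.

```c
#include <assert.h>

/* Hypothetical helper type: an interval with a closed/open flag. */
typedef struct {
    double lo, hi;
    int closed;   /* 1 for [lo..hi], 0 for (lo..hi) */
} mini_interval;

/* Value of a 4-bit minifloat with 2 exponent bits and 2 fraction bits.
   Pattern 12 is treated as 4.0 (the next power of 2) so that pattern
   1011 gets its upper midpoint; patterns 13..15 are not handled. */
double mini_value(unsigned bits)
{
    unsigned e = (bits >> 2) & 3, f = bits & 3;
    assert(bits <= 12);
    if (e == 0) return f / 4.0;                 /* subnormal: 0, .25, .5, .75 */
    if (e == 3) return 4.0;                     /* stand-in for pattern 1100 */
    return (1.0 + f / 4.0) * (e == 1 ? 1.0 : 2.0);
}

/* The set of reals that round to the finite pattern bits (0..11). */
mini_interval mini_range(unsigned bits)
{
    mini_interval r;
    assert(bits <= 11);
    r.lo = bits ? (mini_value(bits - 1) + mini_value(bits)) / 2 : 0.0;
    r.hi = (mini_value(bits) + mini_value(bits + 1)) / 2;
    r.closed = !(bits & 1);                     /* even fraction: ties land here */
    return r;
}
```

For instance, |mini_range(8)| yields the closed interval $[1.875\;\dts\;2.25]$
of case 1000, and |mini_range(1)| yields the open interval $(0.125\;\dts\;0.375)$.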
|
@ @<Local variables for |print_float|@>= |
octa f,g; /* lower and upper bounds on the fraction part */ |
register int e; /* exponent part */ |
register int j,k; /* all purpose indices */ |
|
@ The transition points between exponents correspond to powers of~2. At |
such points the interval extends only half as far to the left of that |
power of~2 as it does to the right. For example, in the 4-bit minifloat numbers |
considered above, case 1000 corresponds to the interval $[1.875\;\dts\;2.25]$. |
|
@<Handle the special case when the fraction part is zero@>= |
{ |
if (!e) { |
printf("0.");@+return; |
} |
if (e==0x7ff) { |
printf("Inf");@+return; |
} |
e--; |
f.h=0x3fffff, f.l=0xffffffff; |
g.h=0x400000, g.l=2; |
} |
|
@ We want to find the ``simplest'' value in the interval corresponding |
to the given number, in the sense that it has fewest significant |
digits when expressed in decimal notation. Thus, for example, |
if the floating point number can be described by a relatively |
short string such as `\.{.1}' or `\.{37e100}', we want to discover that |
representation. |
|
The basic idea is to generate the decimal representations of the |
two endpoints of the interval, outputting the leading digits where |
both endpoints agree, then making a final decision at the first place where |
they disagree. |
|
The ``simplest'' value is not always unique. For example, in the |
case of 4-bit minifloat numbers we could represent the bit pattern 0001 as |
either \.{.2} or \.{.3}, and we could represent 1001 in five equally short |
ways: \.{2.3} or \.{2.4} or \.{2.5} or \.{2.6} or \.{2.7}. The |
algorithm below tries to choose the middle possibility in such cases. |
|
[A solution to the analogous problem for fixed-point representations, |
without the additional complication of round-to-even, was used by |
the author in the program for \TeX; see {\sl Beauty is Our Business\/} |
(Springer, 1990), 233--242.] |
@^Knuth, Donald Ervin@> |
|
Suppose we are given two fractions $f$ and $g$, where $0\le f<g<1$, and |
we want to compute the shortest decimal in the closed interval $[f\dts g]$. |
If $f=0$, we are done. Otherwise let $10f=d+f'$ and $10g=e+g'$, where |
$0\le f'<1$ and $0\le g'<1$. If $d<e$, we can terminate by outputting |
any of the digits $d+1$, \dots,~$e$; otherwise we output the |
common digit $d=e$, and repeat the process on the fractions $0\le f'<g'<1$. |
A similar procedure works with respect to the open interval $(f\dts g)$. |
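
@ As an illustration only, the closed-interval procedure just described can
be written with exact integer arithmetic when $f$ and $g$ share a denominator.
The function name |shortest_decimal| and its interface are assumptions made
for this sketch; the real program works on multiprecise integers instead.
When $d<e$ the sketch picks the middle of the permissible digits.

```c
#include <assert.h>
#include <stdint.h>

/* Writes the digits after the decimal point of a short decimal fraction
   lying in the closed interval [fn/den .. gn/den] into out, followed by
   a NUL. Assumes 0 <= fn < gn < den and den <= UINT64_MAX/10. */
void shortest_decimal(uint64_t fn, uint64_t gn, uint64_t den, char *out)
{
    assert(fn < gn && gn < den && den <= UINT64_MAX / 10);
    while (fn != 0) {                  /* f = 0: the digits so far are exact */
        uint64_t d = 10 * fn / den;    /* 10f = d + f' */
        uint64_t e = 10 * gn / den;    /* 10g = e + g' */
        fn = 10 * fn % den;
        gn = 10 * gn % den;
        if (d < e) {                   /* any of d+1..e works; take the middle */
            *out++ = (char)('0' + (d + 1 + e) / 2);
            break;
        }
        *out++ = (char)('0' + d);      /* common digit d = e; repeat on f', g' */
    }
    *out = '\0';
}
```

With $f=3/8$ and $g=5/8$ (case 0010 of the minifloat table) the sketch
produces the single digit \.5. A 32-byte buffer is ample under the stated
bound on the denominator.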
|
@ The program below carries out the stated algorithm by using multiprecision |
arithmetic on 77-place integers with 28 bits each. This choice |
facilitates multiplication by~10, and allows us to deal with the whole range of |
floating binary numbers using fixed point arithmetic. We keep track of |
the leading and trailing digit positions so that trivial operations on |
zeros are avoided. |
|
If |f| points to a \&{bignum}, its radix-$2^{28}$ digits are |
|f->dat[0]| through |f->dat[76]|, from most significant to least significant. |
We assume that all digit positions are zero unless they lie in the |
subarray between indices |f->a| and |f->b|, inclusive. |
Furthermore, both |f->dat[f->a]| and |f->dat[f->b]| are nonzero, |
unless |f->a=f->b=bignum_prec-1|. |
|
The \&{bignum} data type can be used with any radix less than |
$2^{32}$; we will use it later with radix~$10^9$. The |dat| array |
is made large enough to accommodate both applications. |
|
@d bignum_prec 157 /* would be 77 if we cared only about |print_float| */ |
|
@<Other type...@>= |
typedef struct { |
int a; /* index of the most significant digit */ |
int b; /* index of the least significant digit; must be $\ge a$ */ |
tetra dat[bignum_prec]; /* the digits; undefined except between |a| and |b| */ |
} bignum; |
|
@ Here, for example, is how we go from $f$ to $10f$, assuming that |
overflow will not occur and that the radix is $2^{28}$: |
|
@<Subr...@>= |
static void bignum_times_ten(f) |
bignum *f; |
{ |
register tetra *p,*q; register tetra x,carry; |
for (p=&f->dat[f->b],q=&f->dat[f->a],carry=0; p>=q; p--) { |
x=*p*10+carry; |
*p=x&0xfffffff; |
carry=x>>28; |
} |
*p=carry; |
if (carry) f->a--; |
if (f->dat[f->b]==0 && f->b>f->a) f->b--; |
} |
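
@ The same carry propagation can be tried in isolation. This is a sketch on
a bare array, assuming most-significant-digit-first order as in |dat|; the
name |radix28_times_ten| is hypothetical, and the bookkeeping of |a| and |b|
is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplies the n-digit radix-2^28 number dig[0..n-1] (most significant
   digit first) by ten in place and returns the carry out of dig[0]. */
uint32_t radix28_times_ten(uint32_t *dig, int n)
{
    uint32_t carry = 0;
    for (int i = n - 1; i >= 0; i--) {
        uint32_t x;
        assert(dig[i] < (1u << 28));
        x = dig[i] * 10 + carry;       /* at most 10*(2^28-1)+9, below 2^32 */
        dig[i] = x & 0xfffffff;        /* keep the low 28 bits */
        carry = x >> 28;               /* pass the rest to the left */
    }
    return carry;
}
```

The 28-bit digit size is what keeps the product and the incoming carry
inside a 32-bit word.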
|
@ And here is how we test whether $f<g$, $f=g$, or $f>g$, using any |
radix whatever: |
|
@<Subr...@>= |
static int bignum_compare(f,g) |
bignum *f,*g; |
{ |
register tetra *p,*pp,*q,*qq; |
if (f->a!=g->a) return f->a > g->a? -1: 1; |
pp=&f->dat[f->b], qq=&g->dat[g->b]; |
for (p=&f->dat[f->a],q=&g->dat[g->a]; p<=pp; p++,q++) { |
if (*p!=*q) return *p<*q? -1: 1; |
if (q==qq) return p<pp; |
} |
return -1; |
} |
|
@ The following subroutine subtracts $g$ from~$f$, assuming that |
$f\ge g>0$ and using a given radix. |
|
@<Subr...@>= |
static void bignum_dec(f,g,r) |
bignum *f,*g; |
tetra r; /* the radix */ |
{ |
register tetra *p,*q,*qq; |
register int x,borrow; |
while (g->b>f->b) f->dat[++f->b]=0; |
qq=&g->dat[g->a]; |
for (p=&f->dat[g->b],q=&g->dat[g->b],borrow=0;q>=qq;p--,q--) { |
x=*p - *q - borrow; |
if (x>=0) borrow=0, *p=x; |
else borrow=1, *p=x+r; |
} |
for (;borrow;p--) |
if (*p) borrow=0, *p=*p-1; |
else *p=r-1; |
while (f->dat[f->a]==0) { |
if (f->a==f->b) { /* the result is zero */ |
f->a=f->b=bignum_prec-1, f->dat[bignum_prec-1]=0; |
return; |
} |
f->a++; |
} |
while (f->dat[f->b]==0) f->b--; |
} |
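
@ A stripped-down version of the borrow loop may help. The sketch below
assumes two numbers of equal length with $f\ge g$, and it uses a 64-bit
signed temporary rather than the |int| of the subroutine above; the name
|radix_subtract| is made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Subtracts g from f, digit by digit from the right, in radix r.
   Both are n-digit numbers, most significant digit first; assumes f >= g. */
void radix_subtract(uint32_t *f, const uint32_t *g, int n, uint32_t r)
{
    int borrow = 0;
    for (int i = n - 1; i >= 0; i--) {
        int64_t x = (int64_t)f[i] - g[i] - borrow;
        if (x >= 0) borrow = 0;
        else borrow = 1, x += r;       /* borrow one unit from the left */
        f[i] = (uint32_t)x;
    }
    assert(borrow == 0);               /* a leftover borrow would mean f < g */
}
```

The radix is a parameter, so the same routine serves for $2^{28}$ and
for $10^9$.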
|
@ Armed with these subroutines, we are ready to solve the problem. |
The first task is to put the numbers into \&{bignum} form. |
If the exponent is |e|, the number destined for digit |dat[k]| will |
consist of the rightmost 28 bits of the given fraction after it has |
been shifted right $c-e-28k$ bits, for some constant~$c$. |
We choose $c$ so that, |
when $e$ has its maximum value \Hex{7ff}, the leading digit will |
go into position |dat[1]|, and so that when the number to be printed |
is exactly~1 the integer part of~$g$ will also be exactly~1. |
|
@d magic_offset 2112 /* the constant $c$ that makes it work */ |
@d origin 37 /* the radix point follows |dat[37]| */ |
|
@<Store $f$ and $g$ as multiprecise integers@>= |
k=(magic_offset-e)/28; |
ff.dat[k-1]=shift_right(f,magic_offset+28-e-28*k,1).l&0xfffffff; |
gg.dat[k-1]=shift_right(g,magic_offset+28-e-28*k,1).l&0xfffffff; |
ff.dat[k]=shift_right(f,magic_offset-e-28*k,1).l&0xfffffff; |
gg.dat[k]=shift_right(g,magic_offset-e-28*k,1).l&0xfffffff; |
ff.dat[k+1]=shift_left(f,e+28*k-(magic_offset-28)).l&0xfffffff; |
gg.dat[k+1]=shift_left(g,e+28*k-(magic_offset-28)).l&0xfffffff; |
ff.a=(ff.dat[k-1]? k-1: k); |
ff.b=(ff.dat[k+1]? k+1: k); |
gg.a=(gg.dat[k-1]? k-1: k); |
gg.b=(gg.dat[k+1]? k+1: k); |
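
@ The effect of |magic_offset| is easier to see on a single 64-bit word.
The following sketch, with the hypothetical name |place_digits|, computes
the same three digits and the index~|k| from a fraction of at most 55 bits;
it assumes a host with |uint64_t| and ignores the |a| and |b| fields.

```c
#include <assert.h>
#include <stdint.h>

#define MAGIC_OFFSET 2112   /* the constant c of the text */

/* Splits frac into the three radix-2^28 digits destined for
   dat[k-1], dat[k], dat[k+1] and returns k. */
int place_digits(uint64_t frac, int e, uint32_t d[3])
{
    int k, s;
    assert(e >= 0 && e <= 0x7ff && frac < ((uint64_t)1 << 55));
    k = (MAGIC_OFFSET - e) / 28;
    s = MAGIC_OFFSET - e - 28 * k;                      /* 0 <= s < 28 */
    d[0] = (uint32_t)(frac >> (s + 28)) & 0xfffffff;    /* shifted right c+28-e-28k */
    d[1] = (uint32_t)(frac >> s) & 0xfffffff;           /* shifted right c-e-28k */
    d[2] = (uint32_t)(frac << (28 - s)) & 0xfffffff;    /* shifted left instead */
    return k;
}
```

With $e=\Hex{7ff}$ the sketch returns $k=2$, so the leading digit goes to
position 1. For the number~1 we have $g=2^{54}+2$ and $e=\Hex{3fe}$, giving
$k=38$ and the digits 1, 0, 8; the integer part~1 sits in position 37.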
|
@ If $e$ is sufficiently small, the fractions $f$ and $g$ will be less than~1, |
and we can use the stated algorithm directly. Of course, if $e$ is |
extremely small, a lot of leading zeros need to be lopped off; in the |
worst case, we may have to multiply $f$ and~$g$ by~10 more than 300 times. |
But hey, we don't need to do that extremely often, and computers are |
pretty fast nowadays. |
|
In the small-exponent case, the computation always terminates before |
$f$ becomes zero, because the interval endpoints are fractions with |
denominator $2^t$ for some $t>50$. |
|
The invariant relations |ff.dat[ff.a]!=0| and |gg.dat[gg.a]!=0| are |
not maintained by the computation here, when |ff.a=origin| or |gg.a=origin|. |
But no harm is done, because |bignum_compare| is not used. |
|
@<Compute the significant digits |s|...@>= |
if (e>0x401) @<Compute the significant digits in the large-exponent case@>@; |
else@+{ /* if |e<=0x401| we have |gg.a>=origin| and |gg.dat[origin]<=8| */ |
if (ff.a>origin) ff.dat[origin]=0; |
for (e=1, p=s; gg.a>origin || ff.dat[origin]==gg.dat[origin]; ) { |
if (gg.a>origin) e--; |
else *p++=ff.dat[origin]+'0', ff.dat[origin]=0, gg.dat[origin]=0; |
bignum_times_ten(&ff); |
bignum_times_ten(&gg); |
} |
*p++=((ff.dat[origin]+1+gg.dat[origin])>>1)+'0'; /* the middle digit */ |
} |
*p='\0'; /* terminate the string |s| */ |
|
@ When |e| is large, we use the stated algorithm by considering $f$ and |
$g$ to be fractions whose denominator is a power of~10. |
|
An interesting case arises when the number to be converted is |
\Hex{44ada56a4b0835bf}, since the interval turns out to be |
$$ (69999999999999991611392\ \ \dts\ \ 70000000000000000000000).$$ |
If this were a closed interval, we could simply give the answer |
\.{7e22}; but the number \.{7e22} actually corresponds to |
\Hex{44ada56a4b0835c0} |
because of the round-to-even rule. Therefore the correct answer is, say, |
\.{6.9999999999999995e22}. This example shows that we need a slightly |
different strategy in the case of open intervals; we cannot simply |
look at the first position in which the endpoints have different |
decimal digits. Therefore we change the invariant relation to $0\le f<g\le 1$, |
when open intervals are involved, |
and we do not terminate the process when $f=0$ or $g=1$. |
|
@<Compute the significant digits in the large-exponent case@>= |
{@+register int open=x.l&1; |
tt.dat[origin]=10; |
tt.a=tt.b=origin; |
for (e=1;bignum_compare(&gg,&tt)>=open;e++) |
bignum_times_ten(&tt); |
p=s; |
while (1) { |
bignum_times_ten(&ff); |
bignum_times_ten(&gg); |
for (j='0';bignum_compare(&ff,&tt)>=0;j++) |
bignum_dec(&ff,&tt,0x10000000),bignum_dec(&gg,&tt,0x10000000); |
if (bignum_compare(&gg,&tt)>=open) break; |
*p++=j; |
if (ff.a==bignum_prec-1 && !open) |
goto done; /* $f=0$ in a closed interval */ |
} |
for (k=j;bignum_compare(&gg,&tt)>=open;k++) bignum_dec(&gg,&tt,0x10000000); |
*p++=(j+1+k)>>1; /* the middle digit */ |
done:; |
} |
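
@ The claim about \.{7e22} can be checked on a machine whose |double| is the
IEEE 64-bit format, assuming that the C library's |strtod| rounds correctly;
both are assumptions about the host, not guarantees of the C standard.
The helper name |double_bits| is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns the bit pattern of the double nearest to the decimal string s.
   Assumes a 64-bit IEEE double and a correctly rounding strtod. */
uint64_t double_bits(const char *s)
{
    double d = strtod(s, NULL);
    uint64_t bits;
    assert(sizeof d == sizeof bits);
    memcpy(&bits, &d, sizeof bits);    /* reinterpret without aliasing tricks */
    return bits;
}
```

Under those assumptions |double_bits("7e22")| ends in \.{c0}, the even
neighbor, while |double_bits("6.9999999999999995e22")| ends in \.{bf}.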
|
@ The length of string~|s| will be at most 17. For if $f$ and $g$ |
agree to 17 places, we have $g/f<1+10^{-16}$; but the |
ratio $g/f$ is always $\ge(1+2^{-52}+2^{-53})/(1+2^{-52}-2^{-53}) |
>1+2\times10^{-16}$. |
|
@<Local variables for |print_float|@>= |
bignum ff,gg; /* fractions or numerators of fractions */ |
bignum tt; /* power of ten (used as the denominator) */ |
char s[18]; |
register char *p; |
|
@ At this point the significant digits are in string |s|, and |s[0]!='0'|. |
If we put a decimal point at the left of~|s|, the result should |
be multiplied by $10^e$. |
|
We prefer the output `\.{300.}' to the form `\.{3e2}', and we prefer |
`\.{.03}' to `\.{3e-2}'. In general, the output will use an |
explicit exponent only if the alternative would take more than |
18~characters. |
|
@<Print the significant digits with proper context@>= |
if (e>17 || e<(int)strlen(s)-17) |
printf("%c%s%se%d",s[0],(s[1]? ".": ""),s+1,e-1); |
else if (e<0) printf(".%0*d%s",-e,0,s); |
else if ((int)strlen(s)>=e) printf("%.*s.%s",e,s,s+e); |
else printf("%s%0*d.",s,e-(int)strlen(s),0); |
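
@ To experiment with the four output cases without the rest of the program,
one can write them into a buffer. This sketch mirrors the code above with
|snprintf| in place of |printf|; the name |format_digits| and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats the digit string s (s[0] != '0'), understood as .s times 10^e,
   into out, using the same four cases as the section above. */
void format_digits(const char *s, int e, char *out, size_t size)
{
    int len = (int)strlen(s);
    assert(s[0] != '\0');
    if (e > 17 || e < len - 17)                       /* explicit exponent */
        snprintf(out, size, "%c%s%se%d", s[0], (s[1] ? "." : ""), s + 1, e - 1);
    else if (e < 0)                                   /* leading zeros after the point */
        snprintf(out, size, ".%0*d%s", -e, 0, s);
    else if (len >= e)                                /* point inside or after the digits */
        snprintf(out, size, "%.*s.%s", e, s, s + e);
    else                                              /* trailing zeros before the point */
        snprintf(out, size, "%s%0*d.", s, e - len, 0);
}
```

The digit string \.{3} with $e=3$ comes out as \.{300.}, and with $e=-1$
as \.{.03}; the digit string \.{7} with $e=23$ comes out as \.{7e22}.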
|
@*Floating point input conversion. Going the other way, we want to |
be able to convert a given decimal number into its floating binary |
@^decimal-to-binary conversion@> |
@^radix conversion@> |
@^multiprecision conversion@> |
equivalent. The following syntax is supported: |
$$\vbox{\halign{$#$\hfil\cr |
\<digit>\is\.0\mid\.1\mid\.2\mid\.3\mid\.4\mid |
\.5\mid\.6\mid\.7\mid\.8\mid\.9\cr |
\<digit string>\is\<digit>\mid\<digit string>\<digit>\cr |
\<decimal string>\is\<digit string>\..\mid\..\<digit string>\mid |
\<digit string>\..\<digit string>\cr |
\<optional sign>\is\<empty>\mid\.+\mid\.-\cr |
\<exponent>\is\.e\<optional sign>\<digit string>\cr |
\<optional exponent>\is\<empty>\mid\<exponent>\cr |
\<floating magnitude>\is\<digit string>\<exponent>\mid |
\<decimal string>\<optional exponent>\mid\cr |
\hskip12em \.{Inf}\mid\.{NaN}\mid\.{NaN.}\<digit string>\cr |
\<floating constant>\is\<optional sign>\<floating magnitude>\cr |
\<decimal constant>\is\<optional sign>\<digit string>\cr |
}}$$ |
For example, `\.{-3.}' is the floating constant \Hex{c008000000000000}\thinspace; |
`\.{1e3}' and `\.{1000.}' are both equivalent to \Hex{408f400000000000}\thinspace; |
`\.{NaN}' and `\.{+NaN.5}' are both equivalent to \Hex{7ff8000000000000}. |
|
The |scan_const| routine looks at a given string and finds the |
longest initial substring that matches the syntax of either \<decimal |
constant> or \<floating constant>. It puts the corresponding value |
into the global octabyte variable~|val|; it also puts the position of the first |
unscanned character in the global pointer variable |next_char|. |
It returns 1 if a floating constant was found, 0~if a decimal constant |
was found, $-1$ if nothing was found. A decimal constant that doesn't |
fit in an octabyte is computed modulo~$2^{64}$. |
@^syntax of floating point constants@> |
|
The value of |exceptions| set by |scan_const| is not necessarily correct. |
|
@<Subr...@>= |
static void bignum_double @,@,@[ARGS((bignum*))@]; |
int scan_const @,@,@[ARGS((char*))@];@+@t}\6{@> |
int scan_const(s) |
char *s; |
{ |
@<Local variables for |scan_const|@>; |
val.h=val.l=0; |
p=s; |
if (*p=='+' || *p=='-') sign=*p++;@+else sign='+'; |
if (strncmp(p,"NaN",3)==0) NaN=true, p+=3; |
else NaN=false; |
if ((isdigit(*p)&&!NaN) || (*p=='.' && isdigit(*(p+1)))) |
@<Scan a number and |return|@>; |
if (NaN) @<Return the standard NaN@>; |
if (strncmp(p,"Inf",3)==0) @<Return infinity@>; |
no_const_found: next_char=s;@+return -1; |
} |
|
@ @<Glob...@>= |
octa val; /* value returned by |scan_const| */ |
char *next_char; /* pointer returned by |scan_const| */ |
|
@ @<Local variables for |scan_const|@>= |
register char *p,*q; /* for string manipulations */ |
register bool NaN; /* are we processing a NaN? */ |
int sign; /* |'+'| or |'-'| */ |
|
@ @<Return the standard NaN@>= |
{ |
next_char=p; |
val.h=0x600000, exp=0x3fe; |
goto packit; |
} |
|
@ @<Return infinity@>= |
{ |
next_char=p+3; |
goto make_it_infinite; |
} |
|
@ We saw above that a string of at most 17 digits is enough to characterize |
a floating point number, for purposes of output. But a much longer buffer |
for digits is needed when we're doing input. For example, consider the |
borderline quantity $(1+2^{-53})/2^{1022}$; its decimal expansion, when |
written out exactly, is a number with more than 750 significant digits: |
\.{2.2250738585...8125e-308}. |
If {\it any one\/} of those digits is increased, or if |
additional nonzero digits are added as in |
\.{2.2250738585...81250000001e-308}, |
the rounded value is supposed to change from \Hex{0010000000000000} |
to \Hex{0010000000000001}. |
|
We assume here that the user prefers a perfectly correct answer to |
a speedy almost-correct one, so we implement the most general case. |
|
@<Scan a number...@>= |
{ |
for (q=buf0,dec_pt=(char*)0;isdigit(*p);p++) { |
val=oplus(val,shift_left(val,2)); /* multiply by 5 */ |
val=incr(shift_left(val,1),*p-'0'); |
if (q>buf0 || *p!='0') |
if (q<buf_max) *q++=*p; |
else if (*(q-1)=='0') *(q-1)=*p; |
} |
if (NaN) *q++='1'; |
if (*p=='.') @<Scan a fraction part@>; |
next_char=p; |
if (*p=='e' && !NaN) @<Scan an exponent@>@; |
else exp=0; |
if (dec_pt) @<Return a floating point constant@>; |
if (sign=='-') val=ominus(zero_octa,val); |
return 0; |
} |
|
@ @<Scan a fraction part@>= |
{ |
dec_pt=q; |
p++; |
for (zeros=0;isdigit(*p);p++) |
if (*p=='0' && q==buf0) zeros++; |
else if (q<buf_max) *q++=*p; |
else if (*(q-1)=='0') *(q-1)=*p; |
} |
|
@ The buffer needs room for eight digits of padding at the left, followed |
by up to $1022+53-307$ significant digits, followed by a ``sticky'' digit |
at position |buf_max-1|, and eight more digits of padding. |
|
@d buf0 (buf+8) |
@d buf_max (buf+777) |
|
@<Glob...@>= |
static char buf[785]="00000000"; /* where we put significant input digits */ |
|
@ @<Local variables for |scan_const|@>= |
register char* dec_pt; /* position of decimal point in |buf| */ |
register int exp; /* scanned exponent; later used for raw binary exponent */ |
register int zeros; /* leading zeros removed after decimal point */ |
|
@ Here we don't advance |next_char| and force a decimal point until we |
know that a syntactically correct exponent exists. |
|
The code here will convert extra-large inputs like |
`\.{9e+9999999999999999}' into $\infty$ and extra-small inputs into zero. |
Strange inputs like `\.{-00.0e9999999}' must also be accommodated. |
|
@<Scan an exponent@>= |
{@+register char exp_sign; |
p++; |
if (*p=='+' || *p=='-') exp_sign=*p++;@+else exp_sign='+'; |
if (isdigit(*p)) { |
for (exp=*p++ -'0';isdigit(*p);p++) |
if (exp<1000) exp = 10*exp + *p - '0'; |
if (!dec_pt) dec_pt=q, zeros=0; |
if (exp_sign=='-') exp=-exp; |
next_char=p; |
} |
} |
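
@ The saturation trick for huge exponents can be isolated as follows.
This is only a sketch: the name |scan_exponent| and the convention of
returning a null pointer when no exponent is present are assumptions,
whereas the program above leaves |next_char| untouched in that case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Scans "e", an optional sign, and a digit string starting at p.
   On success stores the saturated exponent in *exp and returns the first
   unscanned character; returns NULL if no valid exponent is present. */
const char *scan_exponent(const char *p, int *exp)
{
    char sign = '+';
    int v;
    assert(p != NULL && exp != NULL);
    if (*p != 'e') return NULL;
    p++;
    if (*p == '+' || *p == '-') sign = *p++;
    if (!isdigit((unsigned char)*p)) return NULL;      /* "e" or "e+" alone */
    for (v = *p++ - '0'; isdigit((unsigned char)*p); p++)
        if (v < 1000) v = 10 * v + *p - '0';           /* stop growing past 1000 */
    *exp = (sign == '-' ? -v : v);
    return p;
}
```

All digits are consumed, but the value stops growing once it reaches 1000,
so \.{e+9999999999999999} yields 9999 without overflow.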
|
@ @<Return a floating point constant@>= |
{ |
@<Move the digits from |buf| to |ff|@>; |
@<Determine the binary fraction and binary exponent@>; |
packit: @<Pack and round the answer@>; |
return 1; |
} |
|
@ Now we get ready to compute the binary fraction bits, by putting the |
scanned input digits into a multiprecision fixed-point |
accumulator |ff| that spans the full necessary range. |
After this step, the number that we want to convert to floating binary |
will appear in |ff.dat[ff.a]|, |ff.dat[ff.a+1]|, \dots, |
|ff.dat[ff.b]|. |
The radix-$10^9$ digit in ${\it ff}[36-k]$ is understood to be multiplied |
by $10^{9k}$, for $36\ge k\ge-120$. |
|
@<Move the digits from |buf| to |ff|@>= |
x=buf+341+zeros-dec_pt-exp; |
if (q==buf0 || x>=1413) { |
make_it_zero: exp=-99999;@+ goto packit; |
} |
if (x<0) { |
make_it_infinite: exp=99999;@+ goto packit; |
} |
ff.a=x/9; |
for (p=q;p<q+8;p++) *p='0'; /* pad with trailing zeros */ |
q=q-1-(q+341+zeros-dec_pt-exp)%9; /* compute stopping place in |buf| */ |
for (p=buf0-x%9,k=ff.a;p<=q && k<=156; p+=9, k++) |
@<Put the 9-digit number |*p|\thinspace\dots\thinspace|*(p+8)| |
into |ff.dat[k]|@>; |
ff.b=k-1; |
for (x=0;p<=q;p+=9) if (strncmp(p,"000000000",9)!=0) x=1; |
ff.dat[156]+=x; /* nonzero digits that fall off the right are sticky */ |
@^sticky bit@> |
while (ff.dat[ff.b]==0) ff.b--; |
|
@ @<Put the 9-digit number...@>= |
{ |
for (x=*p-'0',pp=p+1;pp<p+9;pp++) x=10*x + *pp - '0'; |
ff.dat[k]=x; |
} |
|
@ @<Local variables for |scan_const|@>= |
register int k,x; |
register char *pp; |
bignum ff,tt; |
|
@ Here's a subroutine that is dual to |bignum_times_ten|. It changes $f$ |
to~$2f$, assuming that overflow will not occur and that the radix is $10^9$. |
|
@<Subr...@>= |
static void bignum_double(f) |
bignum *f; |
{ |
register tetra *p,*q; register int x,carry; |
for (p=&f->dat[f->b],q=&f->dat[f->a],carry=0; p>=q; p--) { |
x = *p + *p + carry; |
if (x>=1000000000) carry=1, *p=x-1000000000; |
else carry=0, *p=x; |
} |
*p=carry; |
if (carry) f->a--; |
if (f->dat[f->b]==0 && f->b>f->a) f->b--; |
} |
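
@ As with |bignum_times_ten|, the doubling step can be exercised on a bare
array. The sketch assumes most-significant-digit-first order and omits the
maintenance of |a| and |b|; the name |radix1e9_double| is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Doubles the n-digit radix-10^9 number dig[0..n-1] (most significant
   digit first) in place and returns the carry out of dig[0]. */
uint32_t radix1e9_double(uint32_t *dig, int n)
{
    uint32_t carry = 0;
    for (int i = n - 1; i >= 0; i--) {
        uint32_t x;
        assert(dig[i] < 1000000000);
        x = dig[i] + dig[i] + carry;   /* below 2*10^9, so it fits in 32 bits */
        if (x >= 1000000000) carry = 1, x -= 1000000000;
        else carry = 0;
        dig[i] = x;
    }
    return carry;
}
```

Doubling the two-digit number $(0,600000000)$ gives $(1,200000000)$.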
|
@ @<Determine the binary fraction and binary exponent@>= |
val=zero_octa; |
if (ff.a>36) { |
for (exp=0x3fe;ff.a>36;exp--) bignum_double(&ff); |
for (k=54;k;k--) { |
if (ff.dat[36]) { |
if (k>=32) val.h |= 1<<(k-32);@+else val.l |= (tetra)1<<k; |
ff.dat[36]=0; |
if (ff.b==36) break; /* break if |ff| now zero */ |
} |
bignum_double(&ff); |
} |
}@+else { |
tt.a=tt.b=36, tt.dat[36]=2; |
for (exp=0x3fe;bignum_compare(&ff,&tt)>=0;exp++) bignum_double(&tt); |
for (k=54;k;k--) { |
bignum_double(&ff); |
if (bignum_compare(&ff,&tt)>=0) { |
if (k>=32) val.h |= 1<<(k-32);@+else val.l |= (tetra)1<<k; |
bignum_dec(&ff,&tt,1000000000); |
if (ff.a==bignum_prec-1) break; /* break if |ff| now zero */ |
} |
} |
} |
if (k==0) val.l |= 1; /* add sticky bit if |ff| nonzero */ |
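
@ The bit-extraction loop is the familiar double-and-compare method for
converting a fraction to binary. Here is a minimal sketch on machine
integers, with a fraction given as a numerator and denominator; the name
|fraction_bits| and the explicit |sticky| result are assumptions of the
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the first k binary digits of num/den (0 <= num < den) as an
   integer and sets *sticky to 1 if a nonzero remainder is left over. */
uint64_t fraction_bits(uint64_t num, uint64_t den, int k, int *sticky)
{
    uint64_t bits = 0;
    assert(num < den && den <= UINT64_MAX / 2 && k >= 0 && k <= 64);
    while (k-- > 0) {
        num *= 2;                                  /* double the fraction */
        bits <<= 1;
        if (num >= den) bits |= 1, num -= den;     /* the integer part is 1 */
    }
    *sticky = (num != 0);
    return bits;
}
```

For $0.625$ and three bits the result is binary 101 with no sticky bit;
for $0.1$ and eight bits it is binary 00011001 with the sticky bit set.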
|
@ We need to be careful that the input `\.{NaN.999999999999999999999}' doesn't |
get rounded up; it is supposed to yield \Hex{7fffffffffffffff}. |
|
Although the input `\.{NaN.0}' is illegal, strictly speaking, we silently |
convert it to \Hex{7ff0000000000001}---a number that would be |
output as `\.{NaN.0000000000000002}'. |
|
@<Pack and round the answer@>= |
val=fpack(val,exp,sign,ROUND_NEAR); |
if (NaN) { |
if ((val.h&0x7fffffff)==0x40000000) val.h |= 0x7fffffff, val.l=0xffffffff; |
else if ((val.h&0x7fffffff)==0x3ff00000 && !val.l) val.h|=0x40000000,val.l=1; |
else val.h |= 0x40000000; |
} |
|
@*Floating point remainders. In this section we implement the remainder |
of the floating point operations---one of which happens to be the |
operation of taking the remainder. |
|
The easiest task remaining is to compare two floating point quantities. |
Routine |fcomp| returns $-1$~if~$y<z$, 0~if~$y=z$, $+1$~if~$y>z$, and |
$+2$~if $y$ and~$z$ are unordered. |
|
@<Subr...@>= |
int fcomp @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
int fcomp(y,z) |
octa y,z; |
{ |
ftype yt,zt; |
int ye,ze; |
char ys,zs; |
octa yf,zf; |
register int x; |
yt=funpack(y,&yf,&ye,&ys); |
zt=funpack(z,&zf,&ze,&zs); |
switch (4*yt+zt) { |
case 4*nan+nan: case 4*zro+nan: case 4*num+nan: case 4*inf+nan: |
case 4*nan+zro: case 4*nan+num: case 4*nan+inf: return 2; |
case 4*zro+zro: return 0; |
case 4*zro+num: case 4*num+zro: case 4*zro+inf: case 4*inf+zro: |
case 4*num+num: case 4*num+inf: case 4*inf+num: case 4*inf+inf: |
if (ys!=zs) x=1; |
else if (y.h>z.h) x=1; |
else if (y.h<z.h) x=-1; |
else if (y.l>z.l) x=1; |
else if (y.l<z.l) x=-1; |
else return 0; |
break; |
} |
return (ys=='-'? -x: x); |
} |
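
@ The same four-way comparison can be sketched directly on 64-bit patterns,
without unpacking. The sketch assumes the IEEE double layout (sign bit,
11 exponent bits, 52 fraction bits); the name |fcomp_bits| is hypothetical.

```c
#include <stdint.h>

/* Compares two doubles given as bit patterns: returns -1 if y<z, 0 if y=z,
   +1 if y>z, and 2 if they are unordered (at least one is a NaN). */
int fcomp_bits(uint64_t y, uint64_t z)
{
    const uint64_t sign = (uint64_t)1 << 63;
    const uint64_t inf = (uint64_t)0x7ff << 52;
    uint64_t ym = y & ~sign, zm = z & ~sign;     /* magnitudes */
    int x;
    if (ym > inf || zm > inf) return 2;          /* a NaN is present */
    if (ym == 0 && zm == 0) return 0;            /* +0 equals -0 */
    if ((y ^ z) & sign) x = 1;                   /* signs differ */
    else if (ym > zm) x = 1;
    else if (ym < zm) x = -1;
    else return 0;
    return (y & sign) ? -x : x;                  /* flip when y is negative */
}
```

As in |fcomp|, the magnitudes are compared as unsigned integers and the
answer is negated when $y$ is negative.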
|
@ Several \MMIX\ operations act on a single floating point number and |
accept an arbitrary rounding mode. For example, consider the |
operation of rounding to the nearest floating point integer: |
|
@<Subr...@>= |
octa fintegerize @,@,@[ARGS((octa,int))@];@+@t}\6{@> |
octa fintegerize(z,r) |
octa z; /* the operand */ |
int r; /* the rounding mode */ |
{ |
ftype zt; |
int ze; |
char zs; |
octa xf,zf; |
zt=funpack(z,&zf,&ze,&zs); |
if (!r) r=cur_round; |
switch (zt) { |
case nan:@+if (!(z.h&0x80000)) {@+exceptions|=I_BIT;@+z.h|=0x80000;@+} /* quiet a signaling NaN, then fall through */ |
case inf: case zro: return z; |
case num: @<Integerize and |return|@>; |
} |
} |
|
@ @<Integerize...@>= |
if (ze>=1074) return fpack(zf,ze,zs,ROUND_OFF); /* already an integer */ |
if (ze<=1020) xf.h=0,xf.l=1; |
else {@+octa oo; |
xf=shift_right(zf,1074-ze,1); |
oo=shift_left(xf,1074-ze); |
if (oo.l!=zf.l || oo.h!=zf.h) xf.l|=1; /* sticky bit */ |
@^sticky bit@> |
} |
switch (r) { |
case ROUND_DOWN:@+ if (zs=='-') xf=incr(xf,3);@+break; |
case ROUND_UP:@+ if (zs!='-') xf=incr(xf,3); /* fall through */ |
case ROUND_OFF: break; |
case ROUND_NEAR: xf=incr(xf, xf.l&4? 2: 1);@+break; |
} |
xf.l&=0xfffffffc; |
if (ze>=1022) return fpack(shift_left(xf,1074-ze),ze,zs,ROUND_OFF); |
if (xf.l) xf.h=0x3ff00000, xf.l=0; |
if (zs=='-') xf.h|=sign_bit; |
return xf; |
|
@ To convert floating point to fixed point, we use |fixit|. |
|
@<Subr...@>= |
octa fixit @,@,@[ARGS((octa,int))@];@+@t}\6{@> |
octa fixit(z,r) |
octa z; /* the operand */ |
int r; /* the rounding mode */ |
{ |
ftype zt; |
int ze; |
char zs; |
octa zf,o; |
zt=funpack(z,&zf,&ze,&zs); |
if (!r) r=cur_round; |
switch (zt) { |
case nan: case inf: exceptions|=I_BIT;@+return z; |
case zro: return zero_octa; |
case num:@+if (funpack(fintegerize(z,r),&zf,&ze,&zs)==zro) return zero_octa; |
if (ze<=1076) o=shift_right(zf,1076-ze,1); |
else { |
if (ze>1085 || (ze==1085 && (zf.h>0x400000 || @| |
(zf.h==0x400000 && (zf.l || zs!='-'))))) exceptions|=W_BIT; |
if (ze>=1140) return zero_octa; |
o=shift_left(zf,ze-1076); |
} |
return (zs=='-'? ominus(zero_octa,o): o); |
} |
} |
|
@ Going the other way, we can specify not only a rounding mode but whether |
the given fixed point octabyte is signed or unsigned, and whether the |
result should be rounded to short precision. |
|
@<Subr...@>= |
octa floatit @,@,@[ARGS((octa,int,int,int))@];@+@t}\6{@> |
octa floatit(z,r,u,p) |
octa z; /* octabyte to float */ |
int r; /* rounding mode */ |
int u; /* unsigned? */ |
int p; /* short precision? */ |
{ |
int e;@+char s; |
register int t; |
exceptions=0; |
if (!z.h && !z.l) return zero_octa; |
if (!r) r=cur_round; |
if (!u && (z.h&sign_bit)) s='-', z=ominus(zero_octa,z);@+ else s='+'; |
e=1076; |
while (z.h<0x400000) e--,z=shift_left(z,1); |
while (z.h>=0x800000) { |
e++; |
t=z.l&1; |
z=shift_right(z,1,1); |
z.l|=t; |
} |
if (p) @<Convert to short float@>; |
return fpack(z,e,s,r); |
} |
|
@ @<Convert to short float@>= |
{ |
register int ex;@+register tetra t; |
t=sfpack(z,e,s,r); |
ex=exceptions; |
sfunpack(t,&z,&e,&s); |
exceptions=ex; |
} |
|
@ The square root operation is more interesting. |
|
@<Subr...@>= |
octa froot @,@,@[ARGS((octa,int))@];@+@t}\6{@> |
octa froot(z,r) |
octa z; /* the operand */ |
int r; /* the rounding mode */ |
{ |
ftype zt; |
int ze; |
char zs; |
octa x,xf,rf,zf; |
register int xe,k; |
if (!r) r=cur_round; |
zt=funpack(z,&zf,&ze,&zs); |
if (zs=='-' && zt!=zro) exceptions|=I_BIT, x=standard_NaN; |
else@+switch (zt) { |
case nan:@+ if (!(z.h&0x80000)) exceptions|=I_BIT, z.h|=0x80000; |
return z; |
case inf: case zro: x=z;@+break; |
case num: @<Take the square root and |return|@>; |
} |
if (zs=='-') x.h|=sign_bit; |
return x; |
} |
|
@ The square root can be found by an adaptation of the old pencil-and-paper |
method. If $n=\lfloor\sqrt s\rfloor$, where $s$ is an integer, |
we have $s=n^2+r$ where $0\le r\le2n$; |
this invariant can be maintained if we replace $s$ by $4s+(0,1,2,3)$ |
and $n$ by $2n+(0,1)$. The following code implements this idea with |
$2n$ in~|xf| and $r$ in~|rf|. (It could easily be made to run about |
twice as fast.) |
|
@<Take the square root and |return|@>= |
xf.h=0, xf.l=2; |
xe=(ze+0x3fe)>>1; |
if (ze&1) zf=shift_left(zf,1); |
rf.h=0, rf.l=(zf.h>>22)-1; |
for (k=53;k;k--) { |
rf=shift_left(rf,2);@+ xf=shift_left(xf,1); |
if (k>=43) rf=incr(rf,(zf.h>>(2*(k-43)))&3); |
else if (k>=27) rf=incr(rf,(zf.l>>(2*(k-27)))&3); |
if ((rf.l>xf.l && rf.h>=xf.h) || rf.h>xf.h) { |
xf.l++;@+rf=ominus(rf,xf);@+xf.l++; |
} |
} |
if (rf.h || rf.l) xf.l++; /* sticky bit */ |
return fpack(xf,xe,'+',r); |
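
@ The invariant $s=n^2+r$, $0\le r\le2n$, is easiest to watch on integers.
The following sketch computes $\lfloor\sqrt s\rfloor$ for a 64-bit~$s$ by
bringing down two bits at a time. It keeps $n$ itself rather than $2n$,
so it is an illustration of the method, not a transcription of the code
above; the name |isqrt_pairs| is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pencil-and-paper square root: returns n = floor(sqrt(s)) and stores the
   remainder r = s - n*n, keeping 0 <= r <= 2n after every step. */
uint32_t isqrt_pairs(uint64_t s, uint64_t *rem)
{
    uint64_t n = 0, r = 0;
    for (int k = 31; k >= 0; k--) {
        r = (r << 2) | ((s >> (2 * k)) & 3);   /* s becomes 4s+(0,1,2,3) */
        n <<= 1;                               /* n becomes 2n ... */
        if (r >= 2 * n + 1) {                  /* ... or 2n+1 if that fits */
            r -= 2 * n + 1;
            n |= 1;
        }
        assert(r <= 2 * n);                    /* the invariant of the text */
    }
    *rem = r;
    return (uint32_t)n;
}
```

For $s=10$ the result is $n=3$ with $r=1$; for $s=2^{64}-1$ it is
$n=2^{32}-1$ with $r=2n$, the largest remainder the invariant allows.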
|
@ And finally, the genuine floating point remainder. Subroutine |fremstep| |
either calculates $y\,{\rm rem}\,z$ or reduces $y$ to a smaller number |
having the same remainder with respect to~$z$. In the latter case |
the |E_BIT| is set in |exceptions|. A third parameter, |delta|, |
gives a decrease in exponent that is acceptable for incomplete results; |
if |delta| is sufficiently large, say 2500, the correct result will |
always be obtained in one step of |fremstep|. |
|
@<Subr...@>= |
octa fremstep @,@,@[ARGS((octa,octa,int))@];@+@t}\6{@> |
octa fremstep(y,z,delta) |
octa y,z; |
int delta; |
{ |
ftype yt,zt; |
int ye,ze; |
char xs,ys,zs; |
octa x,xf,yf,zf; |
register int xe,thresh,odd; |
yt=funpack(y,&yf,&ye,&ys); |
zt=funpack(z,&zf,&ze,&zs); |
switch (4*yt+zt) { |
@t\4@>@<The usual NaN cases@>; |
case 4*zro+zro: case 4*num+zro: case 4*inf+zro: |
case 4*inf+num: case 4*inf+inf: x=standard_NaN; |
exceptions|=I_BIT;@+break; |
case 4*zro+num: case 4*zro+inf: case 4*num+inf: return y; |
case 4*num+num: @<Remainderize nonzero numbers and |return|@>; |
zero_out: x=zero_octa; |
} |
if (ys=='-') x.h|=sign_bit; |
return x; |
} |
|
@ If there's a huge difference in exponents and the remainder is nonzero, |
this computation will take a long time. One could compute |
$(2^ny)\,{\rm rem}\,z$ much more quickly for large~$n$ by using $O(\log n)$ |
multiplications modulo~$z$, but the floating remainder operation isn't |
important enough to justify such expensive hardware. |
|
Results of floating remainder are always exact, so the rounding mode |
is immaterial. |
|
@<Remainderize...@>= |
odd=0; /* will be 1 if we've subtracted an odd multiple of~$z$ from $y$ */ |
thresh=ye-delta; |
if (thresh<ze) thresh=ze; |
while (ye>=thresh) @<Reduce |(ye,yf)| by a multiple of |zf|; |
|goto zero_out| if the remainder is zero, |
|goto try_complement| if appropriate@>; |
if (ye>=ze) { |
exceptions|=E_BIT;@+return fpack(yf,ye,ys,ROUND_OFF); |
} |
if (ye<ze-1) return fpack(yf,ye,ys,ROUND_OFF); |
yf=shift_right(yf,1,1); |
try_complement: xf=ominus(zf,yf), xe=ze, xs='+' + '-' - ys; |
if (xf.h>yf.h || (xf.h==yf.h && (xf.l>yf.l || (xf.l==yf.l && !odd)))) |
xf=yf, xs=ys; |
while (xf.h<0x400000) xe--, xf=shift_left(xf,1); |
return fpack(xf,xe,xs,ROUND_OFF); |
|
@ Here we are careful not to change the sign of |y|, because a remainder |
of~0 is supposed to inherit the original sign of~|y|. |
|
@<Reduce |(ye,yf)| by a multiple of |zf|...@>= |
{ |
if (yf.h==zf.h && yf.l==zf.l) goto zero_out; |
if (yf.h<zf.h || (yf.h==zf.h && yf.l<zf.l)) { |
if (ye==ze) goto try_complement; |
ye--, yf=shift_left(yf,1); |
} |
yf=ominus(yf,zf); |
if (ye==ze) odd=1; |
while (yf.h<0x400000) ye--, yf=shift_left(yf,1); |
} |
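
@ The role of the |odd| flag and of the complement $z-y$ can be seen in a
small integer analogue. The sketch below assumes nonnegative $y$ and
positive $z$, with integers standing in for the fraction parts; the name
|nearest_even_rem| is made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns y - q*z, where q is y/z rounded to the nearest integer and
   ties go to the even q. Assumes y >= 0 and z > 0. */
int64_t nearest_even_rem(int64_t y, int64_t z)
{
    int64_t r;
    int odd;
    assert(y >= 0 && z > 0);
    r = y % z;                     /* ordinary remainder, 0 <= r < z */
    odd = (int)((y / z) & 1);      /* was an odd multiple of z subtracted? */
    if (r > z - r || (r == z - r && odd))
        r -= z;                    /* the complement is closer, or breaks the tie */
    return r;
}
```

Thus 5 rem 3 is $-1$, while 10 rem 4 is $+2$ and 6 rem 4 is $-2$: in the
two tied cases the quotient is forced to be even.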
|
@* Index. |
|
/deluxe.mmconfig
0,0 → 1,66
% configuration for basic tests --- still under construction |
memaddresstime 3 |
memreadtime 10 memwritetime 10 |
membusbytes 16 |
branchpredictbits 2 |
branchaddressbits 6 |
branchhistorybits 3 |
branchdualbits 3 |
memchunksmax 100 |
hashprime 127 |
Scache blocksize 64 |
Scache setsize 2048 |
Scache associativity 4 pseudolru |
Scache accesstime 2 |
Dcache blocksize 32 |
Dcache setsize 512 |
Dcache victimsize 8 |
Icache blocksize 32 |
Icache setsize 256 |
Icache victimsize 4 |
DTcache associativity 4 lru |
unit BIT1 000000000000000000000000000000000000000000000000ffff00ff00ffc004 |
unit ALU1 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU2 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU3 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU4 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU5 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU6 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit LSU1 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit LSU2 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit LSU3 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit MUL1 000080f000000000000000000000000000000000000000000000000000000000 |
unit DIV1 00000c0f00000000000000000000000000000000000000000000000000000000 |
unit FPU1 7fff730000000000000000000000000000000000000000000000000000000000 |
dispatchmax 3 |
commitmax 3 |
fetchmax 4 |
memslots 8 |
renameregs 20 |
reorderbuffer 40 |
Dcache writeallocate 1 |
Scache writeallocate 1 |
Dcache writeback 1 |
Scache writeback 1 |
Dcache ports 2 |
DTcache ports 2 |
writebuffer 8 |
writeholdingtime 3 |
mul0 1 |
mul1 2 |
mul2 2 2 1 |
mul3 2 2 2 1 |
mul4 2 2 2 2 2 |
mul5 2 2 2 2 2 |
mul6 2 2 2 2 2 |
mul7 2 2 2 2 2 |
mul8 2 2 2 2 2 |
div 10 10 10 10 10 10 |
fadd 1 1 1 1 |
fmul 1 1 1 1 |
fdiv 10 10 10 10 |
fsqrt 10 10 10 10 |
feps 1 1 1 1 |
fix 1 1 |
flot 1 1 |
|
/primesfx.mms
0,0 → 1,69
% Example program ... Table of primes (floating point with sharper bound) |
L IS 500 The number of primes to find |
t IS $255 Temporary storage |
n GREG |
q GREG |
r GREG |
jj GREG |
kk GREG |
pk GREG |
mm IS kk |
|
LOC Data_Segment |
PRIME1 WYDE 2 |
LOC PRIME1+2*L |
ptop GREG @ |
j0 GREG PRIME1+2-@ |
BUF OCTA |
|
LOC #100 |
Main SET n,3 |
SET jj,j0 |
2H STWU n,ptop,jj |
INCL jj,2 |
3H BZ jj,2F |
4H INCL n,2 |
5H SET kk,j0 |
fn GREG 0 |
sqrtn GREG 0 |
FLOT fn,n |
FSQRT sqrtn,fn |
0H GREG #3fffff0000000000 |
FSUB sqrtn,sqrtn,0B |
6H LDWU pk,ptop,kk |
FLOT t,pk |
FREM r,fn,t |
BZ r,4B |
7H FCMP t,t,sqrtn |
BNN t,2B |
8H INCL kk,2 |
JMP 6B |
GREG @ |
Title BYTE "First Five Hundred Primes" |
NewLn BYTE #a,0 |
Blanks BYTE " ",0 |
2H LDA t,Title |
TRAP 0,Fputs,StdOut |
NEG mm,2 |
3H ADD mm,mm,j0 |
LDA t,Blanks |
TRAP 0,Fputs,StdOut |
2H LDWU pk,ptop,mm |
0H GREG #2030303030000000 |
STOU 0B,BUF |
LDA t,BUF+4 |
1H DIV pk,pk,10 |
GET r,rR |
INCL r,'0' |
STBU r,t,0 |
SUB t,t,1 |
PBNZ pk,1B |
LDA t,BUF |
TRAP 0,Fputs,StdOut |
INCL mm,2*L/10 |
PBN mm,2B |
LDA t,NewLn |
TRAP 0,Fputs,StdOut |
CMP t,mm,2*(L/10-1) |
PBNZ t,3B |
TRAP 0,Halt,0 |
/test.mmix
0,0 → 1,12
000000010000: A RIDICULOUS BUT INSTRUCTIVE TEST PROGRAM FOR MMIX-PIPE |
f40000038d030004 10000: GETA r0,$+3; LDO r3,r0,4 % start in 8000000000010000 |
f6130003f000000b 10008: PUT rV,r3; JMP $+11 |
00300e000000c330 10010: b1=0,b2=0,b3=3,b4=0,s=14,r=6,8n=#330 |
8000000000010337 10018: level 2 page table pointer |
8000000000010330 10020: level 1 page table pointer |
ff00000000010337 10028: page table entry, maps seg 2 page (3 4 5) to #10000 |
4000000c04014000 10030: seg 2, page (3 4 5) |
8d0200248d040228 10038: LDO r2,r0,36; LDO r4,r2,40 % r2=[#10030], r4=[#10028] |
bf030248f8640002 10040: PUSHGO r3,r2,#48; POP r100,2 % goes to #10048 |
4700ffff34008000 10048: BOD r0,$-1; INCH r0,#8000 % goes back, then fwd |
d101002f00000000 10050: SL r1,r0,47; TRAP 0,0,0 |
/mmixal.w
0,0 → 1,3258
% This file is part of the MMIXware package (c) Donald E Knuth 1999 |
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES! |
|
\def\title{MMIXAL} |
|
\def\MMIX{\.{MMIX}} |
\def\MMIXAL{\.{MMIXAL}} |
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant |
\def\<#1>{\hbox{$\langle\,$#1$\,\rangle$}}\let\is=\longrightarrow |
\def\bull{\smallbreak\textindent{$\bullet$}} |
@s and normal @q unreserve a C++ keyword @> |
@s or normal @q unreserve a C++ keyword @> |
@s xor normal @q unreserve a C++ keyword @> |
|
\ifx\exotic+ |
\font\heb=heb8 at 10pt |
\font\rus=lhwnr8 |
\input unicode |
\unicodeptsize=8pt |
\fi |
|
@* Definition of MMIXAL. This program takes input written in \MMIXAL, |
the \MMIX\ assembly language, and translates it |
@^assembly language@> |
into binary files that can be loaded and executed |
on \MMIX\ simulators. \MMIXAL\ is much simpler than the ``industrial |
strength'' assembly languages that computer manufacturers usually provide, |
because it is primarily intended for the simple demonstration programs |
in {\sl The Art of Computer Programming}. Yet it tries to have enough |
features to serve also as the back end of compilers for \CEE/ and other |
high-level languages. |
|
Instructions for using the program appear at the end of this document. |
First we will discuss the input and output languages in detail; then we'll |
consider the translation process, step by step; then we'll put everything |
together. |
|
@ A program in \MMIXAL\ consists of a series of {\it lines}, each of which |
usually contains a single instruction. However, lines with no instructions are |
possible, and so are lines with two or more instructions. |
|
Each instruction has |
three parts called its label field, opcode field, and operand field; these |
fields are separated from each other by one or more spaces. |
The label field, which is often empty, consists of all characters up to the |
first blank space. The opcode field, which is never empty, runs from the first |
nonblank after the label to the next blank space. The operand field, which |
again might be empty, runs from the next nonblank character (if any) to the |
first blank or semicolon that isn't part of a string or character constant. |
If the operand field is followed by a semicolon, possibly with intervening |
blanks, a new instruction begins immediately after the semicolon; otherwise |
the rest of the line is ignored. The end of a line is treated as a blank space |
for the purposes of these rules, with the additional proviso that |
string or character constants are not allowed to extend from one line to |
another. |
|
The label field must begin with a letter or a digit; otherwise the entire |
line is treated as a comment. Popular ways to introduce comments, |
either at the beginning of a line or after the operand field, are to |
precede them by the character \.\% as in \TeX, or by \.{//} as in \CPLUSPLUS/; |
\MMIXAL\ is not very particular. However, Lisp-style comments introduced |
by single semicolons will fail if they follow an instruction, because |
they will be assumed to introduce another instruction. |
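
The field rules above can be sketched in a few lines of Python. This is only an illustration, not the scanner used by the program below; the names `is_letter`, `scan_operand`, and `split_instructions` are hypothetical, and the sketch makes no attempt to decide whether text after an empty operand field is a comment.

```python
def is_letter(c):
    # MMIXAL letters: A-Z, a-z, colon, underscore, and codes above 126.
    return (c.isascii() and c.isalpha()) or c in ":_" or ord(c) > 126


def scan_operand(line, i):
    """Return the index just past the operand field that starts at line[i]."""
    n = len(line)
    while i < n and not line[i].isspace() and line[i] != ";":
        if line[i] == "'" and i + 2 < n and line[i + 2] == "'":
            i += 3                                # character constant such as ';'
        elif line[i] == '"':
            close = line.find('"', i + 1)         # string constant
            i = n if close < 0 else close + 1
        else:
            i += 1
    return i


def split_instructions(line):
    """Split one source line into (label, opcode, operand) triples."""
    first = line[:1]
    if first and not first.isspace() and not (is_letter(first) or first.isdigit()):
        return []                                 # the whole line is a comment
    result, i, n = [], 0, len(line)
    while i < n:
        j = i
        while j < n and not line[j].isspace():    # label: up to the first blank
            j += 1
        label = line[i:j]
        while j < n and line[j].isspace():
            j += 1
        k = j
        while k < n and not line[k].isspace():    # opcode: up to the next blank
            k += 1
        opcode = line[j:k]
        if not opcode:
            break
        while k < n and line[k].isspace():
            k += 1
        m = scan_operand(line, k)
        result.append((label, opcode, line[k:m]))
        while m < n and line[m].isspace():
            m += 1
        if m < n and line[m] == ";":
            i = m + 1                             # next instruction starts here
        else:
            break                                 # rest of the line is ignored
    return result
```

With this sketch, `split_instructions("x IS 1 ; oops")` yields a second instruction whose opcode is `oops`, which is exactly the Lisp-style comment failure described above.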
|
@ \MMIXAL\ has no built-in macro capability, nor does it know how to |
include header files and such things. But users can run their files |
through a standard \CEE/ preprocessor to obtain \MMIXAL\ programs in which |
macros and such things have been expanded. (Caution: The preprocessor also |
removes \CEE/-style comments, unless it is told not to do so.) |
Literate programming tools could also be used for preprocessing. |
@^C preprocessor@> |
@^literate programming@> |
|
If a line begins with the special form `\.\# \<integer> \<string>', |
this program interprets it as a {\it line directive\/} emitted by a |
preprocessor. For example, |
$$\leftline{\indent\.{\# 13 "foo.mms"}}$$ |
means that the following line was line 13 in the user's source file |
\.{foo.mms}. Line directives allow us to correlate errors with the |
user's original file; we also pass them to the output, for use by |
simulators and debuggers. |
@^line directives@> |
|
@ \MMIXAL\ deals primarily with {\it symbols\/} and {\it constants}, which it |
interprets and combines to form machine language instructions and data. |
Constants are simplest, so we will discuss them first. |
|
A {\it decimal constant\/} is a sequence of digits, representing a number in |
radix~10. A~{\it hexadecimal constant\/} is a sequence of hexadecimal digits, |
preceded by~\.\#, representing a number in radix~16: |
$$\vbox{\halign{$#$\hfil\cr |
\<digit>\is\.0\mid\.1\mid\.2\mid\.3\mid\.4\mid |
\.5\mid\.6\mid\.7\mid\.8\mid\.9\cr |
\<hex digit>\is\<digit>\mid\.A\mid\.B\mid\.C\mid\.D\mid\.E\mid\.F\mid |
\.a\mid\.b\mid\.c\mid\.d\mid\.e\mid\.f\cr |
\<decimal constant>\is\<digit>\mid\<decimal constant>\<digit>\cr |
\<hex constant>\is\.\#\<hex digit>\mid\<hex constant>\<hex digit>\cr |
}}$$ |
Constants whose value is $2^{64}$ or more are reduced modulo $2^{64}$. |
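
As a small illustration of the grammar and the reduction rule, here is a Python sketch; the function name `constant_value` is hypothetical and is not taken from the program below.

```python
import string


def constant_value(text):
    """Value of a decimal or hexadecimal constant, reduced modulo 2**64."""
    if text.startswith("#"):
        digits, radix, allowed = text[1:], 16, string.hexdigits
    else:
        digits, radix, allowed = text, 10, string.digits
    # Reject anything outside the grammar (signs, blanks, underscores, ...).
    if not digits or any(c not in allowed for c in digits):
        raise ValueError("not an MMIXAL constant: %r" % text)
    return int(digits, radix) % 2**64
```

For instance, `constant_value("#ff")` is 255, and the decimal constant `18446744073709551617` (that is, 2^64+1) reduces to 1.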
|
@ A {\it character constant\/} is a single character enclosed in |
single quote marks; it denotes the {\mc ASCII} or Unicode number |
@^Unicode@> |
corresponding to that character. For example, \.{'a'} |
represents the constant \.{\#61}, also known as~\.{97}. The quoted character |
can be |
anything except the character that the \CEE/ library calls \.{\\n} or {\it |
newline}; that character should be represented as \.{\#a}. |
$$\vbox{\halign{$#$\hfil\cr |
\<character constant>\is\.'\<single byte character except newline>\.'\cr |
\<constant>\is\<decimal constant>\mid\<hex constant>\mid\<character constant> |
\cr}}$$ |
Notice that \.{'''} represents a single quote, the code \.{\#27}; and |
\.{'\\'} represents a backslash, the code \.{\#5c}. \MMIXAL~characters are |
never ``quoted'' by backslashes as in the \CEE/~language. |
|
In the present implementation |
a character constant will always be at most 255, since wyde character |
input is not supported. |
\ifx\exotic+ But if the input were in Unicode one could write, |
say, \.'{\heb\char"40}\.' or \.'{\rus ZH}\.' for \.{\#05d0} or |
\.{\#0416}. \fi |
The present program |
does not support Unicode directly because basic software for inputting and |
outputting 16-bit characters was still in a primitive state at the time of |
writing. But the data structures below are designed so that a change to |
Unicode will not be difficult when the time is ripe. |
|
@ A {\it string constant\/} like \.{"Hello"} is an abbreviation for |
a sequence of one or more character constants separated by commas: |
\.{'H','e','l','l','o'}. |
Any character except newline or the double quote mark~\." |
can appear between the double quotes of a string constant. |
\ifx\exotic+ Similarly, |
\."\Uni1.08:24:24:-1:20% Unicode char "9ad8 |
<002000001800000806ffffff00000002004003ffe00300e00300c00300c003ffc0% |
0300c02000043ffffe30000e31008c31ffcc3181cc31818c31818c31ff8c31818c3% |
0007c300018>% |
\thinspace\Uni1.08:24:24:-1:20% Unicode char "5fb7 |
<1c038018030018030631ffff30060067860446fffe86ccce0ccccc0ccccc18cccc% |
18fffc38c00c38001878fffc58040098030818398618b18318b00b19b0081b300c1% |
b3ffc181ff8>% |
\thinspace\Uni1.08:24:24:-1:20% Unicode char "7eb3 |
<0601c00e01800c018018018018218231bfff61b187433186ff3186c631860c3186% |
18334630332663b6367e341660380600300600300603b0061e3006f03006c030060% |
0303e00300c>% |
\kern.1em\." is an abbreviation for |
\.'\Uni1.08:24:24:-1:20% Unicode char "9ad8 |
<002000001800000806ffffff00000002004003ffe00300e00300c00300c003ffc0% |
0300c02000043ffffe30000e31008c31ffcc3181cc31818c31818c31ff8c31818c3% |
0007c300018>% |
\.{','}\Uni1.08:24:24:-1:20% Unicode char "5fb7 |
<1c038018030018030631ffff30060067860446fffe86ccce0ccccc0ccccc18cccc% |
18fffc38c00c38001878fffc58040098030818398618b18318b00b19b0081b300c1% |
b3ffc181ff8>% |
\.{','}\Uni1.08:24:24:-1:20% Unicode char "7eb3 |
<0601c00e01800c018018018018218231bfff61b187433186ff3186c631860c3186% |
18334630332663b6367e341660380600300600300603b0061e3006f03006c030060% |
0303e00300c>% |
\.' (namely \.{\#9ad8,\#5fb7,\#7eb3}) when Unicode is supported. |
@^Unicode@> |
\fi |
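
The rules for character and string constants can be sketched as follows in Python, assuming single-byte input as in the present implementation. The names `char_constant` and `string_constant` are hypothetical.

```python
def char_constant(text):
    """Code of a character constant such as 'a'; there are no backslash escapes."""
    if len(text) != 3 or text[0] != "'" or text[2] != "'" or text[1] == "\n":
        raise ValueError("not a character constant: %r" % text)
    code = ord(text[1])
    if code > 255:
        raise ValueError("wyde characters are not supported in this sketch")
    return code


def string_constant(text):
    """Expand "Hello" into the list of its character codes."""
    if len(text) < 3 or text[0] != '"' or text[-1] != '"':
        raise ValueError("not a string constant: %r" % text)
    body = text[1:-1]
    if '"' in body or "\n" in body:
        raise ValueError("a string cannot contain a double quote or newline")
    return [char_constant("'" + c + "'") for c in body]
```

Here `char_constant("'''")` gives #27 and `string_constant('"Hello"')` gives the five codes of `'H','e','l','l','o'`.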
|
@ A {\it symbol\/} in \MMIXAL\ is any sequence of letters and digits, |
beginning with a letter. A~colon~`\.:' or underscore symbol `\.\_' |
is regarded as a letter, for purposes of this definition. |
All extended-ASCII characters like `{\tt \'e}', |
whose 8-bit code exceeds 126, are also treated as letters. |
$$\vbox{\halign{$#$\hfil\cr |
\<letter>\is\.A\mid\.B\mid\cdots\mid\.Z\mid\.a\mid\.b\mid\cdots\mid\.z\mid |
\.:\mid\.\_\mid\<{character with code value $>126$}>\cr |
\<symbol>\is\<letter>\mid\<symbol>\<letter>\mid\<symbol>\<digit>\cr |
}}$$ |
|
In future implementations, when \MMIXAL\ is used with Unicode, |
@^Unicode@> |
all wyde characters whose 16-bit code exceeds 126 will be regarded |
as letters; thus \MMIXAL\ symbols will be able to involve Greek letters or |
Chinese characters or thousands of other glyphs. |

@ A symbol is said to |
be {\it fully qualified\/} if it begins with a colon. Every symbol |
that is not fully qualified is an abbreviation for the fully qualified |
symbol obtained by placing the {\it current prefix\/} in front of it; |
the current prefix is always fully qualified. At the beginning of an |
\MMIXAL\ program the current prefix is simply the single character~`\.:', |
but the user can change it with the \.{PREFIX} command. For example, |
$$\vbox{\halign{&\quad\tt#\hfil\cr |
ADD&x,y,z&\% means ADD :x,:y,:z\cr |
PREFIX&Foo:&\% current prefix is :Foo:\cr |
ADD&x,y,z&\% means ADD :Foo:x,:Foo:y,:Foo:z\cr |
PREFIX&Bar:&\% current prefix is :Foo:Bar:\cr |
ADD&:x,y,:z&\% means ADD :x,:Foo:Bar:y,:z\cr |
PREFIX&:&\% current prefix reverts to :\cr |
ADD&x,Foo:Bar:y,Foo:z&\% means ADD :x,:Foo:Bar:y,:Foo:z\cr |
}}$$ |
This mechanism allows large programs to avoid conflicts between symbol names, |
when parts of the program are independent and/or written by different users. |
The current prefix conventionally ends with a colon, but this convention |
need not be obeyed. |
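
The qualification rule is simple enough to state in two lines of Python. The sketch below (with the hypothetical name `qualify`) replays the example table above; a `PREFIX` command is modeled by qualifying its operand with the current prefix.

```python
def qualify(symbol, prefix):
    """Return the fully qualified form of symbol under the current prefix."""
    return symbol if symbol.startswith(":") else prefix + symbol


prefix = ":"                          # at the beginning of the program
prefix = qualify("Foo:", prefix)      # PREFIX Foo:  gives :Foo:
inner = qualify("y", prefix)          # y means :Foo:y
prefix = qualify("Bar:", prefix)      # PREFIX Bar:  gives :Foo:Bar:
prefix = qualify(":", prefix)         # PREFIX :     reverts to :
```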
|
@ A {\it local symbol\/} is a decimal digit followed by one of the |
letters \.B, \.F, or~\.H, meaning ``backward,'' ``forward,'' or ``here'': |
$$\vbox{\halign{$#$\hfill\cr |
\<local operand>\is\<digit>\,\.B\mid\<digit>\,\.F\cr |
\<local label>\is\<digit>\,\.H\cr |
}}$$ |
The \.B and \.F forms are permitted only in the operand field of \MMIXAL\ |
instructions; the \.H form is permitted only in the label field. A local |
operand such as~\.{2B} stands for the last local label~\.{2H} |
in instructions before the current one, or 0 if \.{2H} has not yet appeared |
as a label. A~local operand such as~\.{2F} stands |
for the first \.{2H} in instructions after the current one. Thus, in a |
sequence such as |
$$\vbox{\halign{\tt#\cr 2H JMP 2F\cr 2H JMP 2B\cr}}$$ |
the first instruction jumps to the second and the second jumps to the first. |
|
Local symbols are useful for references to nearby points of a program, in |
cases where no meaningful name is appropriate. They can also be useful |
in special situations where a redefinable symbol is needed; for example, |
an instruction like |
$$\.{9H IS 9B+1}$$ |
will maintain a running counter. |
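
A minimal Python model of the backward half of this mechanism follows; forward operands need fixups that a one-pass assembler resolves later, and they are deliberately left out. The class name `LocalSymbols` and its methods are hypothetical.

```python
class LocalSymbols:
    def __init__(self):
        # Value of the most recent dH for each digit d; 0 if none has appeared.
        self.last = [0] * 10

    def define(self, digit, value):
        """A label field dH: the local label receives a new equivalent."""
        self.last[digit] = value

    def backward(self, digit):
        """An operand dB: the last dH before the current instruction."""
        return self.last[digit]


counter = LocalSymbols()
for _ in range(3):
    # Models `9H IS 9B+1`: the operand is evaluated before the label is defined.
    counter.define(9, counter.backward(9) + 1)
```

After the three repetitions, `9B` stands for 3.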
|
@ Each symbol receives a value called its {\it equivalent\/} when it |
appears in the label field of an instruction; it is said to be {\it defined\/} |
after its equivalent has been established. A few symbols, like \.{rA} |
and \.{ROUND\_OFF} and \.{Fopen}, |
are predefined because they refer to fixed constants |
associated with the \MMIX\ hardware or its rudimentary operating system; |
otherwise every symbol should be |
defined exactly once. The two appearances of `\.{2H}' in the example |
above do not violate this rule, because the second `\.{2H}' is not the |
same symbol as the first. |
|
A predefined symbol can be redefined (given a new equivalent). After it |
has been redefined it acts like an ordinary symbol and cannot be |
redefined again. A complete list of the predefined symbols appears |
in the program listing below. |
@^predefined symbols@> |
|
Equivalents are either {\it pure\/} or {\it register numbers}. A pure |
equivalent is an unsigned octabyte, but a register number |
equivalent is a one-byte value, between 0 and~255. |
A dollar sign is used to change a pure number into a register number; |
for example, `\.{\$20}' means register number~20. |
|
@ Constants and symbols are combined into {\it expressions\/} in a simple way: |
$$\vbox{\halign{$#$\hfil\cr |
\<primary expression>\is\<constant>\mid\<symbol>\mid\<local operand>\mid |
\.{@@}\mid\cr |
\hskip12pc\.(\<expression>\.)\mid\<unary operator>\<primary expression>\cr |
\<term>\is\<primary expression>\mid |
\<term>\<strong operator>\<primary expression>\cr |
\<expression>\is\<term>\mid\<expression>\<weak operator>\<term>\cr |
\<unary operator>\is\.+\mid\.-\mid\.\~\mid\.\$\mid\.\&\cr |
\<strong operator>\is\.*\mid\./\mid\.{//}\mid\.\%\mid\.{<<}\mid\.{>>} |
\mid\.\&\cr |
\<weak operator>\is\.+\mid\.-\mid\.{\char'174}\mid\.\^\cr |
}}$$ |
Each expression has a value that is either pure or a register number. |
The character \.{@@} stands for the current location, which is always pure. |
The unary operators |
\.+, \.-, \.\~, \.\$, and \.\& mean, respectively, ``do nothing,'' |
``subtract from zero,'' ``complement the bits,'' ``change from pure value |
to register number,'' and ``take the serial number.'' Only the first of these, |
\.+, can be applied to a register number. The last unary operator, \.\&, |
applies only to symbols, and it is of interest primarily to system programmers; |
it converts a symbol to the unique positive integer that is used to identify |
it in the binary file output by \MMIXAL. |
@^serial number@> |
|
Binary operators come in two flavors, strong and weak. The strong ones |
are essentially concerned with multiplication or division: \.{x*y}, |
\.{x/y}, \.{x//y}, \.{x\%y}, \.{x<<y}, \.{x>>y}, and \.{x\&y} |
stand respectively for |
$(x\times y)\bmod2^{64}$ (multiplication), $\lfloor x/y\rfloor$ (division), |
$\lfloor2^{64}x/y\rfloor$ (fractional division), $x\bmod y$ (remainder), |
$(x\times2^y)\bmod2^{64}$ (left~shift), $\lfloor x/2^y\rfloor$ |
(right shift), and $x\land y$ (bitwise and) on unsigned octabytes. |
Division is legal only if $y>0$; fractional division is |
legal only if $x<y$. None of the strong binary operations can be |
applied to register numbers. |
|
The weak binary operations \.{x+y}, \.{x-y}, \.{x\char'174 y}, and |
\.{x\^y} stand respectively for $(x+y)\bmod2^{64}$ (addition), |
$(x-y)\bmod2^{64}$ (subtraction), |
$x\lor y$ (bitwise or), and $x\oplus y$ (bitwise exclusive-or) on |
unsigned octabytes. These operations can be applied to register |
numbers only in four contexts: $\<register>+\<pure>$, $\<pure>+\<register>$, |
$\<register>-\<pure>$ |
and $\<register>-\<register>$. For example, if \.{x} denotes \.{\$1} and |
\.{y} denotes \.{\$10}, then \.{x+3} and \.{3+x} denote \.{\$4}, and |
\.{y-x} denotes the pure value \.{9}. |
|
Register numbers within expressions are allowed to be |
arbitrary octabytes, but a register number assigned as the |
equivalent of a symbol should not exceed 255. |
|
(Incidentally, one might ask why the designer of \MMIXAL\ did not simply |
adopt the existing rules of \CEE/ for expressions. The primary reason is that |
the designers of \CEE/ chose to give \.{<<}, \.{>>}, and \.\& a lower |
precedence than~\.+; but in \MMIXAL\ we want to be able to write things |
like \.{o<<24+x<<16+y<<8+z} or \.{@@+yz<<2} or \.{@@+(\#100-@@)\&\#ff}. |
Since the conventions of \CEE/ were inappropriate, it was better |
to make a clean break, not pretending to have a close relationship |
with that language. The new rules are quite easily memorized, |
because \MMIXAL\ has just two levels of precedence, and the strong binary |
operations are all essentially multiplicative by nature |
while the weak binary operations are essentially additive.) |
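
To make the two precedence levels concrete, here is a small recursive-descent evaluator in Python. It is a sketch under several assumptions: only pure values are handled (no register numbers, no unary `$` or `&`, no local operands), a zero divisor is rejected for `%` as well as for `/` and `//`, and the names `evaluate`, `strong`, and `weak` are hypothetical rather than taken from the program below.

```python
import re

MASK = (1 << 64) - 1
TOKEN = re.compile(r"#[0-9A-Fa-f]+|\d+|<<|>>|//|[A-Za-z_:][A-Za-z0-9_:]*|[-+~*/%&|^()@]")
STRONG_OPS = {"*", "/", "//", "%", "<<", ">>", "&"}
WEAK_OPS = {"+", "-", "|", "^"}


def strong(op, x, y):
    if op == "*":
        return (x * y) & MASK
    if op == "<<":
        return (x << y) & MASK if y < 64 else 0
    if op == ">>":
        return x >> y if y < 64 else 0
    if op == "&":
        return x & y
    if y == 0:
        raise ValueError("division by zero")   # assumption for % as well
    if op == "/":
        return x // y
    if op == "%":
        return x % y
    if x >= y:
        raise ValueError("fractional division needs x < y")
    return (x << 64) // y                       # op == "//"


def weak(op, x, y):
    if op == "+":
        return (x + y) & MASK
    if op == "-":
        return (x - y) & MASK
    return x | y if op == "|" else x ^ y


def evaluate(text, symbols=None, here=0):
    symbols = symbols or {}
    tokens, pos = [], 0
    while pos < len(text):                      # no blanks inside an operand
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("bad character at position %d" % pos)
        tokens.append(m.group(0))
        pos = m.end()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = take()
        if tok == "(":
            value = expression()
            if take() != ")":
                raise ValueError("missing right parenthesis")
            return value
        if tok == "+":
            return primary()
        if tok == "-":
            return -primary() & MASK
        if tok == "~":
            return primary() ^ MASK
        if tok == "@":
            return here & MASK
        if tok[0] == "#":
            return int(tok[1:], 16) & MASK
        if tok[0].isdigit():
            return int(tok) & MASK
        if tok in symbols:
            return symbols[tok] & MASK
        raise ValueError("cannot evaluate %r" % tok)

    def term():
        value = primary()
        while peek() in STRONG_OPS:
            op = take()
            value = strong(op, value, primary())
        return value

    def expression():
        value = term()
        while peek() in WEAK_OPS:
            op = take()
            value = weak(op, value, term())
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unexpected %r" % tokens[pos])
    return result
```

For example, `evaluate("1+2<<3")` is 17 here, because the shift is a strong operator and binds before the addition; with the precedences of C the same text would mean 24.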
|
@ A symbol is called a {\it future reference\/} until it has been defined. |
\MMIXAL\ restricts the use of future references, so that programs can |
be assembled quickly in one pass over the input; therefore all |
expressions can be evaluated when the \MMIXAL\ processor first sees them. |
|
The restrictions are easily stated: Future references |
cannot be used in expressions together with unary or binary operators (except |
the unary~\.+, which does nothing); moreover, future references |
can appear as operands only in instructions that have relative |
addresses (namely branches, probable branches, \.{JMP}, \.{PUSHJ}, |
\.{GETA}) or in octabyte constants (the pseudo-operation \.{OCTA}). |
Thus, for example, one can say \.{JMP}~\.{1F} or \.{JMP}~\.{1B-4}, but not |
\.{JMP}~\.{1F-4}. |
|
@ We noted earlier that each \MMIXAL\ instruction contains |
a label field, an opcode field, and an operand field. The label field is |
either empty or a symbol or local label; when it is nonempty, the |
symbol or local label receives an equivalent. The operand field is |
either empty or a sequence of expressions separated by commas; when |
it is empty, it is equivalent to the simple operand field~`\.0'. |
$$\vbox{\halign{$#$\hfil\cr |
\<instruction>\is\<label>\<opcode>\<operand list>\cr |
\<label>\is\<empty>\mid\<symbol>\mid\<local label>\cr |
\<operand list>\is\<empty>\mid\<expression list>\cr |
\<expression list>\is\<expression>\mid\<expression list>\.,\<expression>\cr |
}}$$ |
|
The opcode field either contains a symbolic \MMIX\ operation name (like |
\.{ADD}), or an {\it alias operation}, or a {\it pseudo-operation}. |
Alias operations are alternate names for \MMIX\ operations whose standard |
names are inappropriate in certain contexts. |
Pseudo-operations do not correspond |
directly to \MMIX\ commands, but they govern the assembly process in |
important ways. |
|
There are two alias operations: |
|
\bull \.{SET} \.{\$X,\$Y} is equivalent to \.{OR} \.{\$X,\$Y,0}; it sets |
register~X to register~Y. Similarly, \.{SET} \.{\$X,Y} (when \.Y is |
not a register) is equivalent to \.{SETL} \.{\$X,Y}. |
@.SET@> |
|
\bull \.{LDA} \.{\$X,\$Y,\$Z} is equivalent to \.{ADDU} \.{\$X,\$Y,\$Z}; |
it loads the address of memory location $\rm \$Y+\$Z$ into register~X. |
Similarly, \.{LDA} \.{\$X,\$Y,Z} is equivalent to \.{ADDU} \.{\$X,\$Y,Z}. |
@.LDA@> |
|
\smallskip |
The symbolic operation names for genuine \MMIX\ operations |
should not include the suffix~\.I for an immediate operation or the suffix~\.B |
for a backward jump; \MMIXAL\ determines such things automatically. |
Thus, one never writes \.{ADDI} or \.{JMPB} in the source input to |
\MMIXAL, although such opcodes might appear when a simulator or |
debugger or disassembler is presenting a numeric instruction in symbolic form. |
$$\vbox{\halign{$#$\hfil\cr |
\<opcode>\is\<symbolic \MMIX\ operation>\mid\<alias operation>\cr |
\hskip12pc\mid\<pseudo-operation>\cr |
\<symbolic \MMIX\ operation>\is\.{TRAP}\mid\.{FCMP}\mid\cdots\mid\.{TRIP}\cr |
\<alias operation>\is\.{SET}\mid\.{LDA}\cr |
\<pseudo-operation>\is\.{IS}\mid\.{LOC}\mid\.{PREFIX}\mid |
\.{GREG}\mid\.{LOCAL}\mid\.{BSPEC}\mid\.{ESPEC}\cr |
\hskip12pc\mid\.{BYTE}\mid\.{WYDE}\mid\.{TETRA}\mid\.{OCTA}\cr |
}}$$ |
|
@ \MMIX\ operations like \.{ADD} require exactly three expressions as |
operands. The first two must be register numbers. The third must be either a |
register number or a pure number between 0 and~255; in the latter case, |
\.{ADD} becomes \.{ADDI} in the assembled output. Thus, for example, |
the command ``set register~1 to the sum of register~2 and register~3'' could be |
expressed as |
$$\.{ADD \$1,\$2,\$3}$$ |
or as, say, |
$$\.{ADD x,y,y+1}$$ |
if the equivalent of \.x is \.{\$1} and the equivalent of \.y is \.{\$2}. |
The command ``subtract 5 from register~1'' could be expressed as |
$$\.{SUB \$1,\$1,5}$$ |
or as |
$$\.{SUB x,x,5}$$ |
but not as `\.{SUBI} \.{\$1,\$1,5}' or `\.{SUBI} \.{x,x,5}'. |
|
\MMIX\ operations like \.{FLOT} require either three operands |
(register, pure, register/pure) or only two (register, register/pure). |
In the first case the middle operand is the rounding mode, which is |
best expressed in terms of the predefined symbolic values |
\.{ROUND\_CURRENT}, \.{ROUND\_OFF}, \.{ROUND\_UP}, \.{ROUND\_DOWN}, |
\.{ROUND\_NEAR}, for $(0,1,2,3,4)$ respectively. In the second case |
the middle operand is understood to be zero (namely, |
\.{ROUND\_CURRENT}). |
@:ROUND_OFF}\.{ROUND\_OFF@> |
@:ROUND_UP}\.{ROUND\_UP@> |
@:ROUND_DOWN}\.{ROUND\_DOWN@> |
@:ROUND_NEAR}\.{ROUND\_NEAR@> |
@:ROUND_CURRENT}\.{ROUND\_CURRENT@> |
|
\MMIX\ operations like \.{SETL} or \.{INCH}, which involve a wyde |
immediate constant, require exactly two operands, (register, pure). |
The value of the second operand should fit in two bytes. |
|
\MMIX\ operations like \.{BNZ}, which mention a register and a |
relative address, also require two operands. The first operand |
should be a register number. The second operand should yield a result~$r$ |
in the range $-2^{16}\le r<2^{16}$ when the current location is subtracted |
from it and the result is divided by~4. The second operand might also |
be undefined; in that case, the eventual value must satisfy the |
restriction stated for defined values. The opcodes \.{GETA} and |
\.{PUSHJ} are similar, except that the first operand to \.{PUSHJ} |
might also be pure (see below). The \.{JMP} operation is also |
similar, but it has only one operand, and it allows the larger |
address range $-2^{24}\le r<2^{24}$. |
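
The range restriction on relative addresses can be checked as in the following Python sketch. The function name `relative_operand` is hypothetical; the sketch assumes that the current location is already a multiple of 4 and uses floor division for the division by 4. It returns whether the backward form of the opcode is needed, together with the value that goes into the YZ field (or XYZ for \.{JMP}).

```python
def relative_operand(target, here, bits=16):
    """Encode a relative address; use bits=24 for JMP."""
    r = (target - here) // 4
    if not -(1 << bits) <= r < (1 << bits):
        raise ValueError("relative address is too far away")
    backward = r < 0
    # A backward reference stores r + 2**bits in the field.
    return backward, r % (1 << bits)
```

A branch to the instruction just before itself therefore gives `(True, 0xffff)`, in agreement with the assembled tetrabyte `4700ffff` for `BOD r0,$-1` in the test program listed earlier.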
|
\MMIX\ operations that refer to memory, like \.{LDO} and \.{STHT} and \.{GO}, |
are treated like \.{ADD} |
if they have three operands, except that the first operand should be |
pure (not a register number) in the case of \.{PRELD}, \.{PREGO}, |
\.{PREST}, \.{STCO}, \.{SYNCD}, and \.{SYNCID}. These opcodes |
also accept a special two-operand form, in which the second operand |
stands for a {\it base address\/} and an immediate offset (see below). |
|
The first operand of \.{PUSHJ} and \.{PUSHGO} can be either a pure |
number or a register number. In the first case (`\.{PUSHJ}~\.{2,Sub}' |
or `\.{PUSHGO}~\.{2,Sub}') |
the programmer might be thinking ``let's push down two registers''; |
in the second case (`\.{PUSHJ}~\.{\$2,Sub}' or `\.{PUSHGO}~\.{\$2,Sub}') |
the programmer might be thinking ``let's make register~2 the hole |
position for this subroutine call.'' Both cases result in the same |
assembled output. |
|
The remaining \MMIX\ opcodes are idiosyncratic: |
$$\def\\{{\rm\quad or\quad}} |
\vbox{\halign{\tt#\hfill\cr |
NEG r,p,z;\cr |
PUT s,z;\cr |
GET r,s;\cr |
POP p,yz;\cr |
RESUME xyz;\cr |
SAVE r,0;\cr |
UNSAVE r;\cr |
SYNC xyz;\cr |
TRAP x,y,z\\TRAP x,yz\\TRAP xyz;\cr |
}}$$ |
\.{SWYM} and \.{TRIP} are like \.{TRAP}. Here \.s is an integer |
between 0 and~31, preferably given by one of the predefined |
symbols \.{rA}, \.{rB}, \dots~for special register codes; |
\.r is a register number; \.p is a pure byte; \.x, \.y, and \.z are |
either register numbers or pure bytes; \.{yz} and \.{xyz} are pure |
values that fit respectively in two and three bytes. |
|
All of these rules can be summarized by saying that \MMIXAL\ treats each |
\MMIX\ opcode in the most natural way. When there are three operands, |
they affect fields X,~Y, and~Z of the assembled \MMIX\ instruction; |
when there are two operands, they affect fields X and~YZ; |
when there is just one operand, it affects field XYZ. |
|
@ In all cases when the opcode corresponds to an \MMIX\ operation, |
the \MMIXAL\ instruction tells the assembler to carry out four steps: |
(1)~Align the current location |
so that it is a multiple of~4, by adding 1, 2, or~3 if necessary; |
(2)~Define the equivalent of the label field to be the |
current location, if the label is nonempty; |
(3)~Evaluate the operands and assemble the specified \MMIX\ instruction into |
the current location; |
(4)~Increase the current location by~4. |
|
@ Now let's consider the pseudo-operations, starting with the simplest cases. |
|
\bull\<label> \.{IS} \<expression> |
defines the value of the label to be the value of the expression, |
which must not be a future reference. The expression may be |
either pure or a register number. |
|
\bull\<label> \.{LOC} \<expression> |
first defines the label to be the value of the current location, if the label |
is nonempty. Then the current location is changed to the value of the |
expression, which must be pure. |
|
\smallskip For example, `\.{LOC} \.{\#1000}' will start assembling subsequent |
instructions or data at the location whose hexa\-decimal value is \Hex{1000}. |
`\.X~\.{LOC}~\.{@@+500}' defines \.X to be the address of the first |
of 500 bytes in memory; assembly will continue at location $\.X+500$. |
The operation of aligning the current location to a multiple of~256, |
if it is not already aligned in that way, can be expressed as |
`\.{LOC}~\.{@@+(256-@@)\&255}'. |
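
The alignment idiom generalizes to any power of two. A Python rendering of the same arithmetic, with the hypothetical name `align`, is:

```python
def align(loc, boundary):
    """Round loc up to a multiple of boundary, which must be a power of two."""
    # Same shape as @+(256-@)&255: the & is applied before the addition.
    return loc + ((boundary - loc) & (boundary - 1))
```

Thus `align(0x1234, 256)` is `0x1300`, while a location that is already aligned is left unchanged.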
|
A less trivial example arises if we want to emit instructions and data into |
two separate areas of memory, but we want to intermix them in the |
\MMIXAL\ source file. We could start by defining \.{8H} and \.{9H} |
to be the starting addresses of the instruction and data segments, |
respectively. Then, a sequence of instructions could be enclosed |
in `\.{LOC}~\.{8B}; \dots; \.{8H}~\.{IS}~\.{@@}'; a sequence of |
data could be enclosed in `\.{LOC}~\.{9B}; \dots; \.{9H}~\.{IS}~\.{@@}'. |
Any number of such sequences could then be combined. |
Instead of the two pseudo-instructions `\.{8H}~\.{IS}~\.{@@;} \.{LOC}~\.{9B}' |
one could in fact write simply `\.{8H}~\.{LOC}~\.{9B}' when |
switching from instructions to data. |
|
\bull \.{PREFIX} \<symbol> |
redefines the current prefix to be the given symbol (fully qualified). |
The label field should be blank. |
|
@ The next pseudo-operations assemble bytes, wydes, tetrabytes, or |
octabytes of data. |
|
\bull \<label> \.{BYTE} \<expression list> |
defines the label to be the current location, if the label field is nonempty; |
then it assembles one byte for each expression in the expression list, and |
advances the current location by the number of bytes. The expressions |
should all be pure numbers that fit in one byte. |
|
String constants are often used in such expression lists. |
For example, if the current location is \Hex{1000}, the instruction |
\.{BYTE}~\.{"Hello",0} assembles six bytes containing the constants |
\.{'H'}, \.{'e'}, \.{'l'}, \.{'l'}, \.{'o'}, and~\.0 into locations |
\Hex{1000}, \dots,~\Hex{1005}, and advances the current location |
to \Hex{1006}. |
|
\bull \<label> \.{WYDE} \<expression list> |
is similar, but it first makes the current location even, by adding~1 to it |
if necessary. Then it defines the label (if a nonempty label is present), |
and assembles each expression as a two-byte value. The current location |
is advanced by twice the number of expressions in the list. The |
expressions should all be pure numbers that fit in two bytes. |
|
\bull \<label> \.{TETRA} \<expression list> |
is similar, but it aligns the current location to a multiple of~4 |
before defining the label; then it |
assembles each expression as a four-byte value. The current location |
is advanced by $4n$ if there are $n$~expressions in the list. Each |
expression should be a pure number that fits in four bytes. |
|
\bull \<label> \.{OCTA} \<expression list> |
is similar, but it first aligns the current location to a multiple of~8; |
it assembles each expression as an eight-byte value. The current location |
is advanced by $8n$ if there are $n$~expressions in the list. Any or all |
of the expressions may be future references, but they should all |
be defined as pure numbers eventually. |
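
The four data pseudo-operations share one pattern: align, define the label, emit, advance. The Python sketch below models that pattern with a dictionary from addresses to bytes; an ordinary \MMIX\ instruction follows the same steps as a \.{TETRA} with a single value. The name `assemble_data` and the dictionary-based memory are assumptions of the sketch, future references in \.{OCTA} are not modeled, and bytes are stored most significant first, as \MMIX\ is big-endian.

```python
SIZES = {"BYTE": 1, "WYDE": 2, "TETRA": 4, "OCTA": 8}


def assemble_data(op, values, loc, memory, labels, label=""):
    """Emit values at loc; return the new current location."""
    size = SIZES[op]
    loc += -loc % size                    # align to a multiple of the size
    if label:
        labels[label] = loc               # the label gets the aligned location
    for value in values:
        if not 0 <= value < 1 << (8 * size):
            raise ValueError("constant does not fit in %d byte(s)" % size)
        for offset, byte in enumerate(value.to_bytes(size, "big")):
            memory[loc + offset] = byte
        loc += size
    return loc
```

Starting at #1000, assembling `BYTE "Hello",0` in this model fills locations #1000 through #1005 and returns #1006, as in the example above.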
|
@ Global registers are important for accessing memory in \MMIX\ programs. |
They could be allocated by hand, and defined with \.{IS} instructions, |
but \MMIXAL\ provides a mechanism that is usually much more convenient: |
|
\bull \<label> \.{GREG} \<expression> |
allocates a new global register, and assigns its number as the |
equivalent of the label. |
At the beginning of assembly, the current global threshold~G is~\$255. |
Each distinct \.{GREG} instruction decreases~G by~1; the final value of~G will |
be the initial value of~rG when the assembled program is loaded. |
|
The value of the expression will be loaded into the global register |
at the beginning of the program. {\it If this value is nonzero, it |
should remain constant throughout the program execution\/}; such |
global registers are considered to be {\it base addresses}. Two or |
more base addresses with the same constant value are assigned to the |
same global register number. |
|
Base addresses can simplify memory accesses in an important way. |
Suppose, for example, five octabyte values appear in a data segment, |
and their addresses are called \.{AA}, \.{BB}, \.{CC}, \.{DD}, and |
\.{EE}: |
$$\.{AA LOC @@+8;BB LOC @@+8;CC LOC @@+8;DD LOC @@+8;EE LOC @@+8}$$ |
Then if you say \.{Base GREG AA}, you will be able to write simply |
`\.{LDO}~\.{\$1,AA}' to bring \.{AA} into register~\.{\$1}, and |
`\.{LDO}~\.{\$2,CC}' to bring \.{CC} into register~\.{\$2}. |
|
Here's how it works: Whenever a memory operation such as |
\.{LDO} or \.{STB} or \.{GO} has only two operands, the second |
operand should be a pure number whose value can be expressed |
as $b+\delta$, where $0\le\delta<256$ and $b$ is the value of |
a base address in one of the preceding \.{GREG} commands. The \MMIXAL\ |
processor will find the closest base address and manufacture an |
appropriate command. For example, the instruction `\.{LDO}~\.{\$2,CC}' in the |
example of the preceding paragraph would be converted automatically to |
`\.{LDO}~\.{\$2,Base,16}'. |
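
The search for a base address can be sketched in Python as follows. The function name `resolve` and the representation of base addresses as a dictionary from register numbers to values are assumptions of the sketch; the caller is expected to pass only the nonzero \.{GREG} values that precede the instruction.

```python
def resolve(address, bases):
    """Return (register, delta) with address = base + delta and 0 <= delta < 256."""
    best = None
    for reg, base in bases.items():
        delta = address - base
        if 0 <= delta < 256 and (best is None or delta < best[1]):
            best = (reg, delta)           # keep the closest base address
    if best is None:
        raise ValueError("no base address is close enough")
    return best


AA = 0x2000000000000000                   # hypothetical address of AA
bases = {254: AA}                         # Base GREG AA, the first GREG, is $254
reg, delta = resolve(AA + 16, bases)      # CC lies 16 bytes after AA
```

The result `(254, 16)` corresponds to the conversion of `LDO $2,CC` into `LDO $2,Base,16`.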
|
If no base address is close enough, an error message will be |
generated, unless this program is run with the \.{-x} option |
on the command line. The \.{-x} option inserts additional instructions |
if necessary, using global register~255, so that any address is |
accessible. For example, |
if there is no base address that allows \.{LDO}~\.{\$2,FF} to be |
implemented in a single instruction, but if \.{FF} equals \.{Base+1000}, |
then the \.{-x} option would assemble two instructions, |
$$\.{SETL \$255,1000; LDO \$2,Base,\$255}$$ |
in place of \.{LDO}~\.{\$2,FF}. Caution:~The \.{-x} feature makes the |
number of actual \MMIX\ instructions hard to predict, so extreme care must |
be used if your style of coding includes relative branch instructions |
in dangerous forms like `\.{BNZ}~\.{x,@@+8}'. |
|
This base address convention can be used also with the alias |
operation~\.{LDA}. For example, `\.{LDA}~\.{\$3,CC}' loads the |
@.LDA@> |
address of \.{CC} into register~3, by assembling the instruction |
`\.{ADDU}~\.{\$3,Base,16}'. |
|
\MMIXAL\ also allows a two-operand form for memory operations such as |
$$\hbox{\.{LDO} \.{\$1,\$2}}$$ |
to be an abbreviation for `\.{LDO} \.{\$1,\$2,0}'. |
|
When \MMIXAL\ programs use subroutines with a memory stack in addition |
to the built-in register stack, they usually begin with the |
instructions `\.{sp}~\.{GREG}~\.{0;fp}~\.{GREG}~\.0'; these instructions |
allocate a {\it stack pointer\/} \.{sp=\$254} and a {\it frame pointer\/} |
\.{fp=\$253}. However, subroutine libraries are free to implement any |
conventions for global registers and stacks that they like. |
@^stack pointer@> |
@^frame pointer@> |
|
@ Short programs rarely run out of global registers, but long programs |
need a mechanism to check that \.{GREG} hasn't been used too often. |
The following pseudo-instruction provides the necessary safety valve: |
|
\bull \.{LOCAL} \<expression> |
ensures that the expression will be a local register in the program |
being assembled. The expression should be a register number, and |
the label field should be blank. At the close of |
assembly, \MMIXAL\ will report an error if the final value of~G does |
not exceed all register numbers that are declared local in this way. |
|
A \.{LOCAL} instruction need not be given unless the register number |
is 32 or~more. (\MMIX\ always considers \.{\$0} through \.{\$31} to be |
local, so \MMIXAL\ implicitly acts as if the |
instruction `\.{LOCAL}~\.{\$31}' were present.) |
|
@ Finally, there are two pseudo-instructions to pass information |
and hints to the loading routine and/or to debuggers that will be |
using the assembled program. |
|
\bull \.{BSPEC} \<expression> |
begins ``special mode''; the \<expression> should have a value that |
fits in two bytes, and the label field should be blank. |
|
\bull \.{ESPEC} |
ends ``special mode''; the operand field is ignored, and the label |
field should be blank. |
|
\smallskip\noindent |
All material assembled between \.{BSPEC} and \.{ESPEC} is passed |
directly to the output, but not loaded as part of the assembled program. |
Ordinary \MMIX\ instructions cannot appear in special mode; only the |
pseudo-operations \.{IS}, \.{PREFIX}, \.{BYTE}, \.{WYDE}, \.{TETRA}, |
\.{OCTA}, \.{GREG}, and \.{LOCAL} are allowed. The operand of |
\.{BSPEC} should have a value that fits in two bytes; this value |
identifies the kind of data that follows. (For example, \.{BSPEC}~\.0 |
might introduce information about subroutine calling conventions at the |
current location, and \.{BSPEC}~\.1 might introduce line numbers from |
a high-level-language program that was compiled into the code at |
the current place. |
System routines often need to pass such information through an assembler |
to the operating system, hence \MMIXAL\ provides a general-purpose conduit.) |
|
@ A program should begin at the special symbolic location \.{Main} |
@.Main@> |
(more precisely, at the address corresponding to |
the fully qualified symbol \.{:Main}). |
This symbol always has serial number~1, and it must always be defined. |
@^serial number@> |
|
Locations should not receive assembled data more than once. |
(More precisely, the loader will load the bitwise~xor of all the |
data assembled for each byte position; but the general rule ``do not load |
two things into the same byte'' is safest.) |
All locations that do not receive assembled data are initially zero, |
except that the loading routine will put register stack data into |
segment~3, and the operating system may put command-line data and |
debugger data into segment~2. |
(The rudimentary \MMIX\ operating system starts a program |
with the number of command-line arguments in~\$0, and a pointer to |
the beginning of an array of argument pointers in~\$1.) |
Segments 2 and 3 should not get assembled data, unless the |
user is a true hacker who is willing to take the risk that such data |
might crash the system. |
|
@* Binary MMO output. When the \MMIXAL\ processor assembles a file |
called \.{foo.mms}, it produces a binary output file called \.{foo.mmo}. |
(The suffix \.{mms} stands for ``\MMIX\ symbolic,'' and \.{mmo} stands |
for ``\MMIX\ object.'') Such \.{mmo} files have a simple structure |
consisting of a sequence of tetrabytes. Some of the tetrabytes are |
instructions to a loading routine; others are data to be loaded. |
@^object files@> |
|
Loader instructions are distinguished from tetrabytes of data by their |
first (most significant) byte, which has the special escape-code value |
\Hex{98}, called |mm| in the program below. This code value corresponds |
to \MMIX's opcode \.{LDVTS}, which is unlikely to occur in tetras of |
data. The second byte~X of a loader instruction is the loader opcode, |
called the {\it lopcode}. The third and fourth bytes, Y~and~Z, are |
operands. Sometimes they are combined into a single 16-bit operand called~YZ. |
@^lopcodes@> |
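
The layout of a loader instruction can be made concrete with a small
\CEE/ sketch. The type |lop_fields| and the function |split_tetra| are
names invented for this illustration; only the escape code \Hex{98} and the
X, Y, Z, YZ fields come from the description above. The sketch looks at a
single tetrabyte in isolation, so it cannot know whether that tetrabyte
was preceded by a quotation.

```c
#include <stdint.h>

#define MM 0x98  /* the escape code, called mm in the program */

typedef struct {
  int is_lop;        /* nonzero if the tetra begins with the escape code */
  unsigned x, y, z;  /* lopcode and the two operand bytes */
  unsigned yz;       /* Y and Z combined into a 16-bit operand */
} lop_fields;

/* Split a tetrabyte into the fields of a loader instruction. */
lop_fields split_tetra(uint32_t t)
{
  lop_fields f;
  f.is_lop = ((t >> 24) == MM);
  f.x = (t >> 16) & 0xff;
  f.y = (t >> 8) & 0xff;
  f.z = t & 0xff;
  f.yz = t & 0xffff;
  return f;
}
```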
|
@d mm 0x98 |
|
@ A small, contrived example will help explain the basic ideas of \.{mmo} |
format. Consider the following input file, called \.{test.mms}: |
$$\obeyspaces\vbox{\halign{\tt#\hfil\cr |
\% A peculiar example of MMIXAL\cr
\     LOC   Data\_Segment    \% location \#2000000000000000\cr
\     OCTA  1F              \% a future reference\cr
a    GREG  @@               \% \$254 is base address for ABCD\cr
ABCD BYTE  "ab"            \% two bytes of data\cr
\     LOC   \#123456789      \% switch to the instruction segment\cr
Main JMP   1F              \% another future reference\cr
\     LOC   @@+\#4000         \% skip past 16384 bytes\cr
2H   LDB   \$3,ABCD+1       \% use the base address\cr
\     BZ    \$3,1F; TRAP     \% and refer to the future again\cr
\# 3 "foo.mms"              \% this comment is a line directive\cr
\     LOC   2B-4*10         \% move 10 tetras before previous location\cr
1H   JMP   2B              \% resolve previous references to 1F\cr
\     BSPEC 5               \% begin special data of type 5\cr
\     TETRA {\AM}a<<8           \% four bytes of special data\cr
\     WYDE  a-\$0            \% two more bytes of special data\cr
\     ESPEC                 \% end a special data packet\cr
\     LOC   ABCD+2          \% resume the data segment\cr
\     BYTE  "cd",\#98        \% assemble three more bytes of data\cr
}}$$ |
It defines a silly program that essentially puts \.{'b'} into register~3; |
the program halts when it gets to an all-zero \.{TRAP} instruction |
following the~\.{BZ}. But the assembled output of this file illustrates most |
of the features of \MMIX\ objects, and in fact \.{test.mms} was the |
first test file tried by the author when the \MMIXAL\ processor was originally |
written. |
|
The binary output file \.{test.mmo} assembled from \.{test.mms} consists |
of the following tetrabytes, shown in hexadecimal notation with brief |
comments. Fuller explanations |
appear with the descriptions of individual lopcodes below. |
$$ |
\halign{\hskip.5in\tt#&\quad#\hfil\cr |
98090101&|lop_pre| $1,1$ (preamble, version 1, 1 tetra)\cr |
36f4a363&(the file creation time)\cr |
% Sat Mar 20 23:44:35 1999 |
98012001&|lop_loc| $\Hex{20},1$ (data segment, 1 tetra)\cr |
00000000&(low tetrabyte of address in data segment)\cr |
00000000&(high tetrabyte of \.{OCTA} \.{1F})\cr |
00000000&(low tetrabyte, will be fixed up later)\cr |
61620000&(\.{"ab"}, padded with trailing zeros)\cr |
\noalign{\penalty-200} |
98010002&|lop_loc| $0,2$ (instruction segment, 2 tetras)\cr |
00000001&(high tetrabyte of address in instruction segment)\cr |
2345678c&(low tetrabyte of address, after alignment)\cr |
98060002&|lop_file| $0,2$ (file name 0, 2 tetras)\cr |
74657374&(\.{"test"})\cr |
2e6d6d73&(\.{".mms"})\cr |
98070007&|lop_line| 7 (line 7 of the current file)\cr |
f0000000&(\.{JMP} \.{1F}, will be fixed up later)\cr |
98024000&|lop_skip| \Hex{4000} (advance 16384 bytes)\cr |
98070009&|lop_line| 9 (line 9 of the current file)\cr |
8103fe01&(\.{LDB} \.{\$3,a,1}, uses base address \.a)\cr
42030000&(\.{BZ} \.{\$3,1F}, will be fixed later)\cr |
9807000a&|lop_line| 10 (stay on line 10)\cr |
00000000&(\.{TRAP})\cr |
98010002&|lop_loc| $0,2$ (instruction segment, 2 tetras)\cr |
00000001&(high tetrabyte of address in instruction segment)\cr |
2345a768&(low tetrabyte of address \.{1H})\cr |
98050010&|lop_fixrx| 16 (fix 16-bit relative address)\cr |
0100fff5&(fixup for location \.{@@-4*-11})\cr |
98040ff7&|lop_fixr| \Hex{ff7} (fix \.{@@-4*\#ff7})\cr |
98032001&|lop_fixo| $\Hex{20},1$ (data segment, 1 tetra)\cr |
00000000&(low tetrabyte of data segment address to fix)\cr |
98060102&|lop_file| $1,2$ (file name 1, 2 tetras)\cr |
666f6f2e&(\.{"foo."})\cr |
6d6d7300&(\.{"mms",0})\cr |
98070004&|lop_line| 4 (line 4 of the current file)\cr |
f000000a&(\.{JMP} \.{2B})\cr |
98080005&|lop_spec| 5 (begin special data of type 5)\cr |
00000200&(\.{TETRA} \.{\&a<<8})\cr |
00fe0000&(\.{WYDE} \.{a-\$0})\cr |
98012001&|lop_loc| $\Hex{20},1$ (data segment, 1 tetra)\cr |
0000000a&(low tetrabyte of address in data segment)\cr |
00006364&(\.{"cd"} with leading zeros, because of alignment)\cr |
98000001&|lop_quote| (don't treat next tetrabyte as a lopcode)\cr |
98000000&(\.{BYTE} \.{\#98}, padded with trailing zeros)\cr |
980a00fe&|lop_post| \$254 (begin postamble, G is 254)\cr |
20000000&(high tetrabyte of the initial contents of \$254)\cr |
00000008&(low tetrabyte of base address \$254)\cr |
00000001&(high tetrabyte of the initial contents of \$255)\cr |
2345678c&(low tetrabyte of \$255, is address of \.{Main})\cr |
980b0000&|lop_stab| (begin symbol table)\cr |
203a5040&(compressed form for symbol table as a ternary trie)\cr |
50404020\cr |
41204220\cr |
43094408\cr |
83404020&(\.{ABCD} = \Hex{2000000000000008}, serial 3)\cr |
4d206120\cr |
69056e01\cr |
2345678c\cr |
81400f61&(\.{Main} = \Hex{000000012345678c}, serial 1)\cr |
fe820000&(\.{a} = \$254, serial 2)\cr |
980c000a&|lop_end| (end symbol table, 10 tetras)\cr |
}$$ |
|
@ When a tetrabyte of the \.{mmo} file does not begin with the escape code, |
it is loaded into the current location~$\lambda$, and $\lambda$ is increased |
to the next higher multiple of~4. |
(If $\lambda$ is not a multiple of~4, the tetrabyte actually goes |
into location $\lambda\land(-4)=4\lfloor\lambda/4\rfloor$, according |
to \MMIX's usual conventions.) The current line number is also increased |
by~1, if it is nonzero. |
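
The rule for ordinary tetrabytes can be sketched as follows. The structure
|load_state| and the function |place_tetra| are hypothetical names, and a
64-bit integer stands in for the current location~$\lambda$; the sketch
only computes where the tetrabyte goes and how the loader's state advances.

```c
#include <stdint.h>

typedef struct {
  uint64_t loc;  /* the current location, lambda */
  int line;      /* the current line number, or 0 if none is known */
} load_state;

/* Return the address that receives the next ordinary tetrabyte,
   and advance the location and the line number. */
uint64_t place_tetra(load_state *s)
{
  uint64_t target = s->loc & ~(uint64_t)3;  /* lambda AND (-4) */
  s->loc = target + 4;                      /* next higher multiple of 4 */
  if (s->line) s->line++;                   /* only if it is nonzero */
  return target;
}
```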
|
When a tetrabyte does begin with the escape code, its next byte |
is the lopcode defining a loader instruction. There are thirteen lopcodes: |
|
\bull |lop_quote|: $\rm X=\Hex{00}$, $\rm YZ=1$. Treat the next tetra as |
an ordinary tetrabyte, even if it begins with the escape code. |
|
\bull |lop_loc|: $\rm X=\Hex{01}$, $\rm Y=high$ byte, $\rm Z=tetra$ count |
($\rm Z=1$~or~2). Set the current location to the 64-bit address defined |
by the next Z tetras, plus $\rm 2^{56}Y$. Usually $\rm Y=0$ (for the |
instruction segment) or $\rm Y=\Hex{20}$ (for the data segment). |
If $\rm Z=2$, the high tetra appears first. |
|
\bull |lop_skip|: $\rm X=\Hex{02}$, $\rm YZ=delta$. Increase the |
current location by~YZ. |
|
\bull |lop_fixo|: $\rm X=\Hex{03}$, $\rm Y=high$ byte, $\rm Z=tetra$ count |
($\rm Z=1$~or~2). Load the value of the current location~$\lambda$ into |
octabyte~P, where P~is the 64-bit address defined by the next Z tetras |
plus $\rm2^{56}Y$ as in |lop_loc|. (The octabyte at~P was previously assembled |
as zero because of a future reference.) |
|
\bull |lop_fixr|: $\rm X=\Hex{04}$, $\rm YZ=delta$. Load YZ into the YZ~field |
of the tetrabyte in location~P, where P~is |
$\rm\lambda-4YZ$, namely the address that precedes the current location |
by YZ~tetrabytes. (This tetrabyte was previously loaded with an \MMIX\ |
instruction that takes a relative address: a branch, probable branch, |
\.{JMP}, \.{PUSHJ}, or~\.{GETA}. Its YZ~field was previously |
assembled as zero because of a future reference.) |
|
\bull |lop_fixrx|: $\rm X=\Hex{05}$, $\rm Y=0$, $\rm Z=16$ or 24. |
Proceed as in |lop_fixr|, |
but load $\delta$ into tetrabyte $\rm P=\lambda-4\delta$ instead of loading |
YZ into $\rm P=\lambda-4YZ$. Here $\delta$ is the value of the tetrabyte |
following the |lop_fixrx| instruction; its leading byte will be either
0 or~1. If the leading byte is~1, $\delta$ should be treated as the
{\it negative\/} number $(\delta\land\Hex{ffffff})-2^{\rm Z}$ when |
calculating the address~P. (The latter case arises only rarely, |
but it is needed when fixing up a relative ``future'' reference that |
ultimately leads to a ``backward'' instruction. The value of~$\delta$ that |
is xored into location~P in such cases will change \.{BZ} to \.{BZB}, |
or \.{JMP} to \.{JMPB}, etc.; we have $\rm Z=24$ when fixing a~\.{JMP}, |
$\rm Z=16$ otherwise.) |
|
\bull |lop_file|: $\rm X=\Hex{06}$, $\rm Y=file$ number, $\rm Z=tetra$ count. |
Set the current file number to~Y and the current line number to~zero. If this |
file number has occurred previously, Z~should be zero; otherwise Z~should be |
positive, and the next Z tetrabytes are the characters of the file name in |
big-endian order. |
Trailing zeros follow the file name if its length is not a multiple of~4. |
|
\bull |lop_line|: $\rm X=\Hex{07}$, $\rm YZ=line$ number. Set the current line |
number to~YZ\null. If the line number is nonzero, the current file and current |
line should correspond to the source location that generated the next data to |
be loaded, for use in diagnostic messages. (The \MMIXAL\ processor gives |
precise line numbers to the sources of tetrabytes in segment~0, which tend to |
be instructions, but not to the sources of tetrabytes assembled in other |
segments.) |
|
\bull |lop_spec|: $\rm X=\Hex{08}$, $\rm YZ=type$. Begin special data of |
type~YZ\null. The subsequent tetrabytes, continuing until the next loader |
operation other than |lop_quote|, comprise the special data. A |lop_quote| |
instruction allows tetrabytes of special data to begin with the escape code. |
|
\bull |lop_pre|: $\rm X=\Hex{09}$, $\rm Y=1$, $\rm Z=tetra$ count. A~|lop_pre| |
instruction, which defines the ``preamble,'' must be the first tetrabyte of |
every \.{mmo} file. The Y~field specifies the version number of \.{mmo} |
format, currently~1; other version numbers may be defined later, but |
version~1 should always be supported as described in the present document. |
The Z~tetrabytes following a |lop_pre| command provide additional information |
that might be of interest to system routines. If $\rm Z>0$, the first tetra |
of additional information records the time that this \.{mmo} file was |
created, measured in seconds since 00:00:00 Greenwich Mean Time on |
1~Jan~1970. |
|
\bull |lop_post|: $\rm X=\Hex{0a}$, $\rm Y=0$, $\rm Z=G$ (must be 32~or~more). |
This instruction begins the {\it postamble}, which follows all instructions |
and data to be loaded. It causes the loaded program to begin with rG equal to |
the stated value of~G, and with \$G, $\rm\$(G+1)$, \dots,~\$255 initially set to
the values of the next $\rm(256-G)*2$ tetrabytes. These tetrabytes specify |
$\rm 256-G$ octabytes in big-endian fashion (high half first). |
|
\bull |lop_stab|: $\rm X=\Hex{0b}$, $\rm YZ=0$. This instruction must appear |
immediately after the $\rm(256-G)*2$ tetrabytes following~|lop_post|. It is |
followed by the symbol table, which lists the equivalents of all user-defined |
symbols in a compact form that will be described later. |
|
\bull |lop_end|: $\rm X=\Hex{0c}$, $\rm YZ=tetra$ count. This instruction |
must be the very last tetrabyte of each \.{mmo} file. Furthermore, |
exactly YZ tetrabytes must appear between it and the |lop_stab| command. |
(Therefore a program can easily find the symbol table without reading |
forward through the entire \.{mmo} file.) |
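
The address arithmetic of |lop_fixr| and |lop_fixrx| can be checked against
\.{test.mmo} with the following sketch. The function names are invented
for this illustration, and 64-bit integers stand in for octabyte addresses;
the patch itself is modeled as an exclusive-or, as in the remark about
changing \.{BZ} to \.{BZB}.

```c
#include <assert.h>
#include <stdint.h>

/* Address P patched by lop_fixr: yz tetrabytes before lambda. */
uint64_t fixr_target(uint64_t lambda, unsigned yz)
{
  return lambda - 4 * (uint64_t)yz;
}

/* Address P patched by lop_fixrx; delta is the tetrabyte that follows
   the lop_fixrx instruction, and z is 16 or 24. */
uint64_t fixrx_target(uint64_t lambda, uint32_t delta, int z)
{
  int64_t d;
  assert(z == 16 || z == 24);
  if (delta >> 24)  /* leading byte 1: treat delta as negative */
    d = (int64_t)(delta & 0xffffff) - ((int64_t)1 << z);
  else
    d = (int64_t)delta;
  return lambda - (uint64_t)(4 * d);  /* unsigned wraparound adds when d<0 */
}

/* The tetrabyte at P after the fixup value has been xored into it. */
uint32_t patch_tetra(uint32_t old, uint32_t value)
{
  return old ^ value;
}
```

In \.{test.mmo} the current location is \Hex{12345a768} when the fixups
appear: |lop_fixr| \Hex{ff7} reaches the \.{JMP} at \Hex{12345678c}, and the
|lop_fixrx| value \Hex{0100fff5} reaches the \.{BZ} at \Hex{12345a794},
turning \Hex{42030000} into \Hex{4303fff5}.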
|
\smallskip |
A separate routine called \.{MMOtype} is available to translate |
binary \.{mmo} files into human-readable form. |
|
@d lop_quote 0x0 /* the quotation lopcode */ |
@d lop_loc 0x1 /* the location lopcode */ |
@d lop_skip 0x2 /* the skip lopcode */ |
@d lop_fixo 0x3 /* the octabyte-fix lopcode */ |
@d lop_fixr 0x4 /* the relative-fix lopcode */ |
@d lop_fixrx 0x5 /* extended relative-fix lopcode */ |
@d lop_file 0x6 /* the file name lopcode */ |
@d lop_line 0x7 /* the file position lopcode */ |
@d lop_spec 0x8 /* the special hook lopcode */ |
@d lop_pre 0x9 /* the preamble lopcode */ |
@d lop_post 0xa /* the postamble lopcode */ |
@d lop_stab 0xb /* the symbol table lopcode */ |
@d lop_end 0xc /* the end-it-all lopcode */ |
|
@ Many readers will have noticed that \MMIXAL\ has no facilities for |
relocatable output, nor does \.{mmo} format support such features. The |
author's first drafts of \MMIXAL\ and \.{mmo} did allow relocatable objects, |
with external linkages, but the rules were substantially more complicated and |
therefore inconsistent with the goals of {\sl The Art of Computer Programming}. |
The present design might actually prove to be superior to the current |
practice, now that computer memory is significantly cheaper than it |
used to be, because one-pass assembly and loading are extremely fast when |
relocatability and external linkages are disallowed. Different program modules |
can be assembled together about as fast as they could be linked together under |
a relocatable scheme, and they can communicate with each other in much more |
flexible ways. Debugging tools are enhanced when open-source libraries are |
combined with user programs, and such libraries will certainly improve in |
quality when their source form is accessible to a larger community of users. |
|
@* Basic data types. |
This program for the 64-bit \MMIX\ architecture is based on 32-bit integer |
arithmetic, because nearly every computer available to the author at the time |
of writing was limited in that way. |
Details of the basic arithmetic appear in a separate program module |
called {\mc MMIX-ARITH}, because the same routines are needed also |
for the simulators. The definition of type \&{tetra} should be changed, if |
necessary, to conform with the definitions found in {\mc MMIX-ARITH}. |
@^system dependencies@> |
|
@<Type...@>= |
typedef unsigned int tetra; |
/* assumes that an int is exactly 32 bits wide */ |
typedef struct { tetra h,l;} octa; /* two tetrabytes make one octabyte */ |
typedef enum {@!false,@!true}@+@!bool; |
|
@ @<Glob...@>= |
extern octa zero_octa; /* |zero_octa.h=zero_octa.l=0| */ |
extern octa neg_one; /* |neg_one.h=neg_one.l=-1| */ |
extern octa aux; /* auxiliary output of a subroutine */ |
extern bool overflow; /* set by certain subroutines for signed arithmetic */ |
|
@ Most of the subroutines in {\mc MMIX-ARITH} return an octabyte as |
a function of two octabytes; for example, |oplus(y,z)| returns the |
sum of octabytes |y| and~|z|. Division inputs the high |
half of a dividend in the global variable~|aux| and returns |
the remainder in~|aux|. |
|
@<Sub...@>= |
extern octa oplus @,@,@[ARGS((octa y,octa z))@]; |
/* unsigned $y+z$ */ |
extern octa ominus @,@,@[ARGS((octa y,octa z))@]; |
/* unsigned $y-z$ */ |
extern octa incr @,@,@[ARGS((octa y,int delta))@]; |
/* unsigned $y+\delta$ ($\delta$ is signed) */ |
extern octa oand @,@,@[ARGS((octa y,octa z))@]; |
/* $y\land z$ */ |
extern octa shift_left @,@,@[ARGS((octa y,int s))@]; |
/* $y\LL s$, $0\le s\le64$ */ |
extern octa shift_right @,@,@[ARGS((octa y,int s,int uns))@]; |
/* $y\GG s$, signed if |!uns| */ |
extern octa omult @,@,@[ARGS((octa y,octa z))@]; |
/* unsigned $(|aux|,x)=y\times z$ */ |
extern octa odiv @,@,@[ARGS((octa x,octa y,octa z))@]; |
/* unsigned $(x,y)/z$; $|aux|=(x,y)\bmod z$ */ |
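
To show what 64-bit arithmetic on two 32-bit halves looks like, here is a
minimal sketch of addition and subtraction. It is not the text of
{\mc MMIX-ARITH}; the names |oplus_sketch| and |ominus_sketch| are
hypothetical, and |uint32_t| is used so that the 32-bit assumption about
\&{tetra} holds by construction.

```c
#include <stdint.h>

typedef uint32_t tetra;               /* exactly 32 bits wide */
typedef struct { tetra h, l; } octa;  /* high half, then low half */

octa oplus_sketch(octa y, octa z)     /* unsigned y+z, modulo 2^64 */
{
  octa x;
  x.h = y.h + z.h;
  x.l = y.l + z.l;
  if (x.l < y.l) x.h++;               /* carry out of the low half */
  return x;
}

octa ominus_sketch(octa y, octa z)    /* unsigned y-z, modulo 2^64 */
{
  octa x;
  x.h = y.h - z.h;
  x.l = y.l - z.l;
  if (x.l > y.l) x.h--;               /* borrow from the high half */
  return x;
}
```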
|
@ Here's a rudimentary check to see if arithmetic is in trouble. |
|
@<Init...@>= |
acc=shift_left(neg_one,1); |
if (acc.h!=0xffffffff) panic("Type tetra is not implemented correctly"); |
@.Type tetra...@> |
|
@ Future versions of this program will work with symbols formed from Unicode |
characters, but the present code limits itself to an 8-bit subset. |
@^Unicode@> |
The type \&{Char} is defined here in order to ease the later transition: |
At present, \&{Char} is the same as \&{unsigned} \&{char}, but |
\&{Char} can be changed to a 16-bit type in the Unicode version. |
|
Other changes will also be necessary when the transition to Unicode is made; |
for example, some calls of |fprintf| will become calls of |fwprintf|, |
and some occurrences of \.{\%s} will become \.{\%ls} in print formats. |
The switchable type name \&{Char} provides at least a first step |
towards a brighter future with Unicode. |
|
@<Type...@>= |
typedef unsigned char Char; /* bytes that will become wydes some day */ |
|
@ While we're talking about classic systems versus future systems, we |
might as well define the |ARGS| macro, which makes function prototypes |
available on {\mc ANSI \CEE/} systems without making them |
uncompilable on older systems. Each subroutine below is declared first |
with a prototype, then with an old-style definition. |
|
@<Preprocessor definitions@>= |
#ifdef __STDC__ |
#define ARGS(list) list |
#else |
#define ARGS(list) () |
#endif |
|
@* Basic input and output. Input goes into a buffer that is normally |
limited to 72 characters. This limit can be raised, by using the |
\.{-b} option when invoking the assembler; but short buffers will keep listings |
from becoming unwieldy, because a symbolic listing adds 19 characters per~line. |
|
@<Initialize everything@>= |
if (buf_size<72) buf_size=72; |
buffer=(Char*)calloc(buf_size+1,sizeof(Char)); |
lab_field=(Char*)calloc(buf_size+1,sizeof(Char)); |
op_field=(Char*)calloc(buf_size,sizeof(Char)); |
operand_list=(Char*)calloc(buf_size,sizeof(Char)); |
err_buf=(Char*)calloc(buf_size+60,sizeof(Char)); |
if (!buffer || !lab_field || !op_field || !operand_list || !err_buf) |
panic("No room for the buffers"); |
@.No room...@> |
|
@ @<Glob...@>= |
Char *buffer; /* raw input of the current line */ |
Char *buf_ptr; /* current position within |buffer| */ |
Char *lab_field; /* copy of the label field of the current instruction */ |
Char *op_field; /* copy of the opcode field of the current instruction */ |
Char *operand_list; /* copy of the operand field of the current instruction */ |
Char *err_buf; /* place where dynamic error messages are sprinted */ |
|
@ @<Get the next line of input text, or |break| if the input has ended@>= |
if (!fgets(buffer,buf_size+1,src_file)) break; |
line_no++; |
line_listed=false; |
j=strlen(buffer); |
if (buffer[j-1]=='\n') buffer[j-1]='\0'; /* remove the newline */ |
else if ((j=fgetc(src_file))!=EOF) |
@<Flush the excess part of an overlong line@>; |
if (buffer[0]=='#') @<Check for a line directive@>; |
buf_ptr=buffer; |
|
@ @<Flush the excess...@>= |
{ |
while(j!='\n' && j!= EOF) j=fgetc(src_file); |
if (!long_warning_given) { |
long_warning_given=true; |
err("*trailing characters of long input line have been dropped"); |
@.trailing characters...@> |
fprintf(stderr, |
"(say `-b <number>' to increase the length of my input buffer)\n"); |
}@+else err("*trailing characters dropped"); |
} |
|
@ @<Glob...@>= |
int cur_file; /* index of the current file in |filename| */ |
int line_no; /* current position in the file */ |
bool line_listed; /* have we listed the buffer contents? */ |
bool long_warning_given; /* have we given the hint about \.{-b}? */ |
|
@ We keep track of source file name and line number at all times, for |
error reporting and for synchronization data in the object file. |
Up to 256 different source file names can be remembered. |
|
@<Glob...@>= |
Char *filename[257]; |
/* source file names, including those in line directives */ |
int filename_count; /* how many |filename| entries have we filled? */ |
|
@ If the current line is a line directive, it will also be treated |
as a comment by the assembler. |
|
@<Check for a line directive@>= |
{ |
for (p=buffer+1;isspace(*p);p++); |
for (j=*p++-'0';isdigit(*p);p++) j=10*j+*p-'0'; |
for (;isspace(*p);p++); |
if (*p=='\"') { |
if (!filename[filename_count]) { |
filename[filename_count]=(Char*)calloc(FILENAME_MAX+1,sizeof(Char)); |
if (!filename[filename_count]) |
panic("Capacity exceeded: Out of filename memory"); |
@.Capacity exceeded...@> |
} |
for (p++,q=filename[filename_count];*p && *p!='\"';p++,q++) *q=*p; |
if (*p=='\"' && *(p-1)!='\"') { /* yes, it's a line directive */ |
*q='\0'; |
for (k=0;strcmp(filename[k],filename[filename_count])!=0;k++); |
if (k==filename_count) filename_count++; |
cur_file=k; |
line_no=j-1; |
} |
} |
} |
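
The recognition of a line directive can also be sketched as a standalone
function. The name |parse_line_directive| and its interface are assumptions
made for this illustration, and the sketch is slightly stricter than the
code above: it insists on at least one digit and on a nonempty file name.
It returns the line number as written; the assembler itself records that
number minus~1, because |line_no| is incremented when the next line is read.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser: returns 1 and fills *line and name if s has the
   form `# <number> "<file>"', otherwise returns 0. */
int parse_line_directive(const char *s, int *line, char *name, size_t cap)
{
  size_t n = 0;
  int num = 0;
  if (*s++ != '#') return 0;
  while (isspace((unsigned char)*s)) s++;
  if (!isdigit((unsigned char)*s)) return 0;
  while (isdigit((unsigned char)*s)) num = 10 * num + (*s++ - '0');
  while (isspace((unsigned char)*s)) s++;
  if (*s++ != '"') return 0;
  while (*s && *s != '"' && n + 1 < cap) name[n++] = *s++;
  if (*s != '"' || n == 0) return 0;  /* no closing quote, or empty name */
  name[n] = '\0';
  *line = num;
  return 1;
}
```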
|
@ Archaic versions of the \CEE/ library do not define |FILENAME_MAX|. |
|
@<Preprocessor definitions@>= |
#ifndef FILENAME_MAX |
#define FILENAME_MAX 256 |
#endif |
|
@ @<Local variables@>= |
register Char *p,*q; /* the place where we're currently scanning */ |
|
@ The next several subroutines are useful for preparing a listing of |
the assembled results. In such a listing, which the user can request |
with a command-line option, we fill the leftmost 19 columns with |
a representation of the output that has been assembled from the |
input in the buffer. Sometimes the assembled output requires |
more than one line, because we have room to output only a tetrabyte per line. |
|
The |flush_listing_line| subroutine is called when we have finished |
generating one line's worth of assembled material. Its parameter is |
a string to be printed between the assembled material and the |
buffer contents, if the input line hasn't yet been echoed. The length |
of this string should be 19 minus the number of characters already printed |
on the current line of the listing. |
|
@<Sub...@>= |
void flush_listing_line @,@,@[ARGS((char*))@];@+@t}\6{@> |
void flush_listing_line(s) |
char *s; |
{ |
if (line_listed) fprintf(listing_file,"\n"); |
else { |
fprintf(listing_file,"%s%s\n",s,buffer); |
line_listed=true; |
} |
} |
|
@ Only the three least significant hex digits of a location are shown on |
the listing, unless the other digits have changed. The following subroutine |
prints an extra line when a change needs to be shown. |
|
@<Sub...@>= |
void update_listing_loc @,@,@[ARGS((int))@];@+@t}\6{@> |
void update_listing_loc(k) |
int k; /* the location to display, mod 4 */ |
{ |
if (cur_loc.h!=listing_loc.h || ((cur_loc.l^listing_loc.l)&0xfffff000)) { |
fprintf(listing_file,"%08x%08x:",cur_loc.h,(cur_loc.l&-4)|k); |
flush_listing_line("  ");
} |
listing_loc.h=cur_loc.h;@+ |
listing_loc.l=(cur_loc.l&-4)|k; |
} |
|
@ @<Glob...@>= |
octa cur_loc; /* current location of assembled output */ |
octa listing_loc; /* current location on the listing */ |
unsigned char hold_buf[4]; /* assembled bytes */ |
unsigned char held_bits; /* which bytes of |hold_buf| are active? */ |
unsigned char listing_bits; /* which of them haven't been listed yet? */ |
bool spec_mode; /* are we between |BSPEC| and |ESPEC|? */ |
tetra spec_mode_loc; /* number of bytes in the current special output */ |
|
@ When bytes are assembled, they are placed into the |hold_buf|. |
More precisely, a byte assembled for a location that is |j|~plus a |
multiple of~4 is placed into |hold_buf[j]|; two auxiliary variables, |
|held_bits| and |listing_bits|, are then increased by |1<<j|. |
Furthermore, |listing_bits| |
is increased by |0x10<<j| if that byte is a future reference to be |
resolved later. |
|
The bytes are held until we need to output them. |
The |listing_clear| routine lists any that have been held |
but not yet shown. It should be called only when |listing_bits!=0|. |
|
@<Sub...@>= |
void listing_clear @,@,@[ARGS((void))@];@+@t}\6{@> |
void listing_clear() |
{ |
register int j,k; |
for (k=0;k<4;k++) if (listing_bits&(1<<k)) break; |
if (spec_mode) fprintf(listing_file,"         ");
else {
update_listing_loc(k);
fprintf(listing_file," ...%03x: ",(listing_loc.l&0xffc)|k);
}
for (j=0;j<4;j++)
if (listing_bits&(0x10<<j)) fprintf(listing_file,"xx");
else if (listing_bits&(1<<j)) fprintf(listing_file,"%02x",hold_buf[j]);
else fprintf(listing_file,"  ");
flush_listing_line("  ");
listing_bits=0; |
} |
|
@ Error messages are written to |stderr|. If the message begins with |
`\.*' it is merely a warning; if it begins with `\.!' it is fatal; |
otherwise the error is probably serious enough to make manual correction |
necessary, yet it is not tragic. Errors and warnings appear |
also on the optional listing file. |
|
@d err(m) {@+report_error(m);@+if (m[0]!='*') goto bypass;@+} |
@d derr(m,p) {@+sprintf(err_buf,m,p); |
report_error(err_buf);@+if (err_buf[0]!='*') goto bypass;@+} |
@d dderr(m,p,q) {@+sprintf(err_buf,m,p,q); |
report_error(err_buf);@+if (err_buf[0]!='*') goto bypass;@+} |
@d panic(m) {@+sprintf(err_buf,"!%s",m);@+report_error(err_buf);@+} |
@d dpanic(m,p) {@+err_buf[0]='!';@+sprintf(err_buf+1,m,p);@+ |
report_error(err_buf);@+} |
|
@<Sub...@>= |
void report_error @,@,@[ARGS((char*))@];@+@t}\6{@> |
void report_error(message) |
char *message; |
{ |
if (!filename[cur_file]) filename[cur_file]="(nofile)"; |
if (message[0]=='*') |
fprintf(stderr,"\"%s\", line %d warning: %s\n", |
filename[cur_file],line_no,message+1); |
else if (message[0]=='!') |
fprintf(stderr,"\"%s\", line %d fatal error: %s\n", |
filename[cur_file],line_no,message+1); |
else { |
fprintf(stderr,"\"%s\", line %d: %s!\n", |
filename[cur_file],line_no,message); |
err_count++; |
} |
if (listing_file) { |
if (!line_listed) flush_listing_line("****************** "); |
if (message[0]=='*') fprintf(listing_file, |
"************ warning: %s\n",message+1); |
else if (message[0]=='!') fprintf(listing_file, |
"******** fatal error: %s!\n",message+1); |
else fprintf(listing_file, |
"********** error: %s!\n",message); |
} |
if (message[0]=='!') exit(-2); |
} |
|
@ @<Glob...@>= |
int err_count; /* this many errors were found */ |
|
@ Output to the binary |obj_file| occurs four bytes at a time. The |
bytes are assembled in small buffers, not output as single tetrabytes, |
because we want the output to be big-endian even when the assembler |
is running on a little-endian machine. |
@^big-endian versus little-endian@> |
@^little-endian versus big-endian@> |
|
@d mmo_write(buf) if (fwrite(buf,1,4,obj_file)!=4) |
dpanic("Can't write on %s",obj_file_name) |
@.Can't write...@> |
|
@<Sub...@>= |
void mmo_clear @,@,@[ARGS((void))@]; |
void mmo_out @,@,@[ARGS((void))@]; |
unsigned char lop_quote_command[4]={mm,lop_quote,0,1}; |
void mmo_clear() /* clears |hold_buf|, when |held_bits!=0| */ |
{ |
if (hold_buf[0]==mm) mmo_write(lop_quote_command); |
mmo_write(hold_buf); |
if (listing_file && listing_bits) listing_clear(); |
held_bits=0; |
hold_buf[0]=hold_buf[1]=hold_buf[2]=hold_buf[3]=0; |
mmo_cur_loc=incr(mmo_cur_loc,4);@+ mmo_cur_loc.l&=-4; |
if (mmo_line_no) mmo_line_no++; |
} |
@# |
unsigned char mmo_buf[4]; |
int mmo_ptr; |
void mmo_out() /* output the contents of |mmo_buf| */ |
{ |
if (held_bits) mmo_clear(); |
mmo_write(mmo_buf); |
} |
|
@ @<Sub...@>= |
void mmo_tetra @,@,@[ARGS((tetra))@]; |
void mmo_byte @,@,@[ARGS((unsigned char))@]; |
void mmo_lop @,@,@[ARGS((char,unsigned char,unsigned char))@]; |
void mmo_lopp @,@,@[ARGS((char,unsigned short))@]; |
void mmo_tetra(t) /* output a tetrabyte */ |
tetra t; |
{ |
mmo_buf[0]=t>>24;@+ mmo_buf[1]=(t>>16)&0xff; |
mmo_buf[2]=(t>>8)&0xff;@+ mmo_buf[3]=t&0xff; |
mmo_out(); |
} |
@# |
void mmo_byte(b) |
unsigned char b; |
{ |
mmo_buf[(mmo_ptr++)&3]=b; |
if (!(mmo_ptr&3)) mmo_out(); |
} |
@# |
void mmo_lop(x,y,z) /* output a loader operation */ |
char x; |
unsigned char y,z; |
{ |
mmo_buf[0]=mm;@+ mmo_buf[1]=x;@+ mmo_buf[2]=y;@+ mmo_buf[3]=z; |
mmo_out(); |
} |
@# |
void mmo_lopp(x,yz) /* output a loader operation with two-byte operand */ |
char x; |
unsigned short yz; |
{ |
mmo_buf[0]=mm;@+ mmo_buf[1]=x;@+ |
mmo_buf[2]=yz>>8;@+ mmo_buf[3]=yz&0xff; |
mmo_out(); |
} |
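
The byte order produced by these routines can be illustrated without any
file output. In the following sketch the functions |tetra_to_bytes| and
|lopp_to_bytes| are hypothetical stand-ins that fill a caller-supplied
buffer instead of writing to |obj_file|.

```c
#include <stdint.h>

#define MM 0x98  /* the escape code */

/* Split a tetrabyte into four bytes, most significant first. */
void tetra_to_bytes(uint32_t t, unsigned char buf[4])
{
  buf[0] = (unsigned char)(t >> 24);
  buf[1] = (t >> 16) & 0xff;
  buf[2] = (t >> 8) & 0xff;
  buf[3] = t & 0xff;
}

/* Build a loader operation with lopcode x and two-byte operand yz. */
void lopp_to_bytes(unsigned x, unsigned yz, unsigned char buf[4])
{
  buf[0] = MM;
  buf[1] = (unsigned char)x;
  buf[2] = (yz >> 8) & 0xff;
  buf[3] = yz & 0xff;
}
```

For instance, a skip of \Hex{4000} bytes yields the bytes
\.{98}~\.{02}~\.{40}~\.{00}, which is the tetrabyte \.{98024000} of
\.{test.mmo}, whatever the byte order of the host machine.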
|
@ The |mmo_loc| subroutine makes the current location in the object file |
equal to |cur_loc|. |
|
@<Sub...@>= |
void mmo_loc @,@,@[ARGS((void))@];@+@t}\6{@> |
void mmo_loc() |
{ |
octa o; |
if (held_bits) mmo_clear(); |
o=ominus(cur_loc,mmo_cur_loc); |
if (o.h==0 && o.l<0x10000) { |
if (o.l) mmo_lopp(lop_skip,o.l); |
}@+else { |
if (cur_loc.h&0xffffff) { |
mmo_lop(lop_loc,0,2); |
mmo_tetra(cur_loc.h); |
}@+else mmo_lop(lop_loc,cur_loc.h>>24,1); |
mmo_tetra(cur_loc.l); |
} |
mmo_cur_loc=cur_loc; |
} |
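
The decision made by |mmo_loc| can be summarized by a small pure function.
This is a sketch under stated assumptions: |choose_loc|, |loc_kind|, and
|loc_choice| are names invented here, and 64-bit integers replace the
\&{octa} pairs used by the program.

```c
#include <stdint.h>

typedef enum { LOC_NOTHING, LOC_SKIP, LOC_ONE_TETRA, LOC_TWO_TETRAS } loc_kind;

typedef struct {
  loc_kind kind;
  unsigned y;   /* Y field of lop_loc (the high byte), else 0 */
  unsigned yz;  /* YZ field of lop_skip, else 0 */
} loc_choice;

/* Decide how to move the object-file location from `from' to `to'. */
loc_choice choose_loc(uint64_t from, uint64_t to)
{
  loc_choice c = { LOC_NOTHING, 0, 0 };
  uint64_t diff = to - from;            /* unsigned difference, may wrap */
  if (diff < 0x10000) {
    if (diff) { c.kind = LOC_SKIP; c.yz = (unsigned)diff; }
  } else if ((to >> 32) & 0xffffff) {   /* high tetra needs more than Y */
    c.kind = LOC_TWO_TETRAS;            /* lop_loc 0,2 */
  } else {
    c.kind = LOC_ONE_TETRA;             /* lop_loc Y,1 */
    c.y = (unsigned)(to >> 56);
  }
  return c;
}
```

A forward move of \Hex{4000} bytes becomes a |lop_skip|; a move to
\Hex{200000000000000a} becomes |lop_loc| with $\rm Y=\Hex{20}$ and one
tetra; and a backward move within the instruction segment of \.{test.mms}
needs both tetras, as in the listing of \.{test.mmo}.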
|
@ Similarly, the |mmo_sync| subroutine makes sure that the current file and |
line number in the output file agree with |cur_file| and |line_no|. |
|
@<Sub...@>= |
void mmo_sync @,@,@[ARGS((void))@];@+@t}\6{@> |
void mmo_sync() |
{ |
register int j; register unsigned char *p; |
if (cur_file!=mmo_cur_file) { |
if (filename_passed[cur_file]) mmo_lop(lop_file,cur_file,0); |
else { |
mmo_lop(lop_file,cur_file,(strlen(filename[cur_file])+3)>>2); |
for (j=0,p=filename[cur_file];*p;p++,j=(j+1)&3) { |
mmo_buf[j]=*p; |
if (j==3) mmo_out(); |
} |
if (j) { |
for (;j<4;j++) mmo_buf[j]=0; |
mmo_out(); |
} |
filename_passed[cur_file]=1; |
} |
mmo_cur_file=cur_file; |
mmo_line_no=0; |
} |
if (line_no!=mmo_line_no) { |
if (line_no>=0x10000) |
panic("I can't deal with line numbers exceeding 65535"); |
@.I can't deal with...@> |
mmo_lopp(lop_line,line_no); |
mmo_line_no=line_no; |
} |
} |
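
The way a file name is packed into tetrabytes can be sketched as follows.
The functions |name_tetras| and |name_tetra| are hypothetical; they compute
the Z~field of |lop_file| and the individual tetrabytes instead of writing
them out.

```c
#include <stdint.h>
#include <string.h>

/* Number of tetrabytes needed for a file name (the Z field of lop_file). */
unsigned name_tetras(const char *name)
{
  return (unsigned)((strlen(name) + 3) >> 2);
}

/* Tetrabyte number i of the name, big-endian, padded with trailing zeros. */
uint32_t name_tetra(const char *name, unsigned i)
{
  size_t len = strlen(name), k;
  uint32_t t = 0;
  for (k = 0; k < 4; k++) {
    size_t pos = 4 * (size_t)i + k;
    t = (t << 8) | (pos < len ? (unsigned char)name[pos] : 0u);
  }
  return t;
}
```

For \.{"foo.mms"} this gives two tetrabytes, \.{666f6f2e} and \.{6d6d7300},
in agreement with the listing of \.{test.mmo}.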
|
@ @<Glob...@>= |
octa mmo_cur_loc; /* current location in the object file */ |
int mmo_line_no; /* current line number in the \.{mmo} output so far */ |
int mmo_cur_file; /* index of the current file in the \.{mmo} output so far */ |
char filename_passed[256]; /* has a filename been recorded in the output? */ |
|
@ Here is a basic subroutine that assembles |k| bytes starting at |cur_loc|. |
The value of |k| should be 1, 2, or~4, and |cur_loc| should be a multiple |
of~|k|. The |x_bits| parameter tells which bytes, if any, are part of |
a future reference. |
|
@<Sub...@>= |
void assemble @,@,@[ARGS((char,tetra,unsigned char))@];@+@t}\6{@> |
void assemble(k,dat,x_bits) |
char k; |
tetra dat; |
unsigned char x_bits; |
{ |
register int j,jj,l; |
if (spec_mode) l=spec_mode_loc; |
else { |
l=cur_loc.l; |
@<Make sure |cur_loc| and |mmo_cur_loc| refer to the same tetrabyte@>; |
if (!held_bits && !(cur_loc.h&0xe0000000)) mmo_sync(); |
} |
for (j=0;j<k;j++) { |
jj=(l+j)&3; |
hold_buf[jj]=(dat>>(8*(k-1-j)))&0xff; |
held_bits|=1<<jj; |
listing_bits|=1<<jj; |
} |
listing_bits|=x_bits; |
if (((l+k)&3)==0) { |
if (listing_file) listing_clear(); |
mmo_clear(); |
} |
if (spec_mode) spec_mode_loc+=k; else cur_loc=incr(cur_loc,k); |
} |
|
@ @<Make sure |cur_loc| and |mmo_cur_loc| refer to the same tetrabyte@>= |
if (cur_loc.h!=mmo_cur_loc.h || ((cur_loc.l^mmo_cur_loc.l)&0xfffffffc)) |
mmo_loc(); |
|
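@ The byte placement performed by |assemble| can be sketched in isolation.
The structure |tetra_hold| and the function |hold_bytes| are hypothetical
stand-ins for |hold_buf|, |held_bits|, and the central loop; listing output,
synchronization, and \.{SPEC} mode are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  unsigned char buf[4];  /* stands in for hold_buf */
  unsigned held;         /* stands in for held_bits */
} tetra_hold;

/* Deposit the k low-order bytes of dat, most significant first, at byte
   offset loc within the current tetrabyte. Returns 1 when the tetrabyte
   boundary is reached, which is when the real program would flush. */
int hold_bytes(tetra_hold *h, uint32_t loc, int k, uint32_t dat)
{
  int j;
  assert(k == 1 || k == 2 || k == 4);
  assert(loc % (uint32_t)k == 0);       /* loc must be a multiple of k */
  for (j = 0; j < k; j++) {
    unsigned jj = (loc + (uint32_t)j) & 3;
    h->buf[jj] = (unsigned char)((dat >> (8 * (k - 1 - j))) & 0xff);
    h->held |= 1u << jj;
  }
  return ((loc + (uint32_t)k) & 3) == 0;
}
```

With |loc| equal to \Hex{102} and |k=2|, the two bytes land in positions 2
and~3 of the buffer and the function reports that the tetrabyte is complete.
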
@* The symbol table. Symbols are stored and retrieved by means of |
a {\it ternary search trie}, following ideas of Bentley and |
Sedgewick. (See {\sl ACM--SIAM Symp.\ on Discrete Algorithms\/ \bf8} (1997), |
360--369; R.~Sedgewick, {\sl Algorithms in C\/} (Reading, Mass.:\ |
Addison--Wesley, 1998), \S15.4.) Each trie node stores a character, |
@^Bentley, Jon Louis@> |
@^Sedgewick, Robert@> |
and there are branches to subtries for the cases where a given character
is less than, equal to, or greater than the character stored in that node.
There also is a pointer to a symbol table entry if a symbol ends at |
the current node. |
|
@s sym_tab_struct int |
|
@<Type...@>= |
typedef struct ternary_trie_struct { |
unsigned short ch; /* the (possibly wyde) character stored here */ |
struct ternary_trie_struct *left, *mid, *right; /* downward |
in the ternary trie */ |
struct sym_tab_struct *sym; /* equivalents of symbols */ |
} trie_node; |
|
@ We allocate trie nodes in chunks of 1000 at a time. |
|
@<Sub...@>= |
trie_node* new_trie_node @,@,@[ARGS((void))@];@+@t}\6{@> |
trie_node* new_trie_node() |
{ |
register trie_node *t=next_trie_node; |
if (t==last_trie_node) { |
t=(trie_node*)calloc(1000,sizeof(trie_node)); |
if (!t) panic("Capacity exceeded: Out of trie memory"); |
@.Capacity exceeded...@> |
last_trie_node=t+1000; |
} |
next_trie_node=t+1; |
return t; |
} |
|
@ @<Glob...@>= |
trie_node *trie_root; /* root of the trie */ |
trie_node *op_root; /* root of subtrie for opcodes */ |
trie_node *next_trie_node, *last_trie_node; /* allocation control */ |
trie_node *cur_prefix; /* root of subtrie for unqualified symbols */ |
|
@ The |trie_search| subroutine starts at a given node of the trie and finds
a given string in its middle subtrie, inserting new nodes if necessary.
The string ends at the first character that is neither a letter nor a digit;
the location of that terminating character is stored in the global
variable~|terminator|.
|
@d isletter(c) (isalpha(c)||c=='_'||c==':'||c>126) |
|
@<Sub...@>= |
trie_node *trie_search @,@,@[ARGS((trie_node*,Char*))@]; |
Char *terminator; /* where the search ended */ |
trie_node *trie_search(t,s) |
trie_node *t; |
Char *s; |
{ |
register trie_node *tt=t; |
register Char *p=s; |
while (1) { |
if (!isletter(*p) && !isdigit(*p)) { |
terminator=p;@+return tt; |
} |
if (tt->mid) { |
tt=tt->mid; |
while (*p!=tt->ch) { |
if (*p<tt->ch) { |
if (tt->left) tt=tt->left; |
else { |
tt->left=new_trie_node();@+tt=tt->left;@+goto store_new_char; |
} |
}@+else { |
if (tt->right) tt=tt->right; |
else { |
tt->right=new_trie_node();@+tt=tt->right;@+goto store_new_char; |
} |
} |
} |
p++; |
}@+else { |
tt->mid=new_trie_node();@+tt=tt->mid; |
store_new_char: tt->ch=*p++; |
} |
} |
} |
|
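@ A stripped-down sketch of the same search may make the control flow easier
to follow. It differs from |trie_search| in ways that are assumptions of the
sketch: the string ends at a null byte instead of at the first nonsymbol
character, each node is allocated separately, and the |value| field is a
hypothetical payload that replaces the |sym| pointer.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tnode {
  unsigned short ch;
  struct tnode *left, *mid, *right;
  int value;            /* hypothetical payload; 0 means "nothing here" */
} tnode;

static tnode *new_tnode(void)
{
  tnode *t = calloc(1, sizeof(tnode));
  if (!t) abort();
  return t;
}

/* Find the node for string s below t, creating nodes as needed. */
tnode *tst_find(tnode *t, const char *s)
{
  for (; *s; s++) {
    unsigned short c = (unsigned char)*s;
    if (!t->mid) { t->mid = new_tnode(); t->mid->ch = c; }
    t = t->mid;                       /* descend one character */
    while (c != t->ch) {              /* then move sideways to the match */
      tnode **next = (c < t->ch) ? &t->left : &t->right;
      if (!*next) { *next = new_tnode(); (*next)->ch = c; }
      t = *next;
    }
  }
  return t;
}
```

Looking up the same string twice returns the same node, so a value stored
after the first call is seen by the second.
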
@ Symbol table nodes hold the serial numbers and |
equivalents of defined symbols. They also |
hold ``fixup information'' for undefined symbols; this will allow the |
loader to correct any previously assembled instructions that refer to such |
symbols when they are eventually defined. |
|
In the symbol table node for a defined symbol, the |link| field |
has one of the special codes |DEFINED| or |REGISTER| or |PREDEFINED|, and the |
|equiv| field holds the defined value. The |serial| number |
is a unique identifier for all user-defined symbols. |
|
In the symbol table node for an undefined symbol, the |equiv| field |
is ignored. The |link| field |
points to the first node of fixup information; that node is, in turn, |
a symbol table node that might link to other fixups. The |serial| number |
in a fixup node is either 0 or 1 or 2, meaning respectively ``fixup the |
octabyte pointed to by |equiv|'' or ``fixup the relative address in the YZ |
field of the instruction pointed to by |equiv|'' or ``fixup the relative |
address in the XYZ field of the instruction pointed to by |equiv|.'' |
|
@s sym_node int |
@s bool int |
|
@d DEFINED (sym_node*)1 /* code value for octabyte equivalents */ |
@d REGISTER (sym_node*)2 /* code value for register-number equivalents */ |
@d PREDEFINED (sym_node*)3 /* code value for not-yet-used equivalents */ |
@d fix_o 0 /* |serial| code for octabyte fixup */ |
@d fix_yz 1 /* |serial| code for relative fixup */ |
@d fix_xyz 2 /* |serial| code for \.{JMP} fixup */ |
|
@<Type...@>= |
typedef struct sym_tab_struct { |
int serial; /* serial number of symbol; type number for fixups */ |
struct sym_tab_struct *link; /* |DEFINED| status or link to fixup */ |
octa equiv; /* the equivalent value */ |
} sym_node; |
|
@ The allocation of new symbol table nodes proceeds in chunks, like the |
allocation of trie nodes. But in this case we also have the possibility |
of reusing old fixup nodes that are no longer needed. |
|
@d recycle_fixup(pp) pp->link=sym_avail, sym_avail=pp |
|
@<Sub...@>= |
sym_node* new_sym_node @,@,@[ARGS((bool))@];@+@t}\6{@> |
sym_node* new_sym_node(serialize) |
bool serialize; /* should the new node receive a unique serial number? */ |
{ |
register sym_node *p=sym_avail; |
if (p) { |
sym_avail=p->link;@+p->link=NULL;@+p->serial=0;@+p->equiv=zero_octa; |
}@+else { |
p=next_sym_node; |
if (p==last_sym_node) { |
p=(sym_node*)calloc(1000,sizeof(sym_node)); |
if (!p) panic("Capacity exceeded: Out of symbol memory"); |
@.Capacity exceeded...@> |
last_sym_node=p+1000; |
} |
next_sym_node=p+1; |
} |
if (serialize) p->serial=++serial_number; |
return p; |
} |
|
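@ The allocation discipline of |new_sym_node| is chunked allocation backed by
a stack of recycled nodes. The following sketch shows that discipline alone,
with a hypothetical two-field |node| type and without serial numbers.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node { struct node *link; int serial; } node;

enum { CHUNK = 1000 };
static node *next_node, *last_node;  /* allocation control */
static node *avail;                  /* stack of recycled nodes */

void recycle(node *p) { p->link = avail; avail = p; }

node *new_node(void)
{
  node *p = avail;
  if (p) {                /* reuse a recycled node, clearing stale fields */
    avail = p->link;
    p->link = NULL; p->serial = 0;
  } else {
    p = next_node;
    if (p == last_node) { /* current chunk exhausted, or none allocated yet */
      p = calloc(CHUNK, sizeof(node));
      if (!p) abort();
      last_node = p + CHUNK;
    }
    next_node = p + 1;
  }
  return p;
}
```

Because both control pointers start out null, the first call falls into the
chunk-allocation branch without any special initialization.
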
@ @<Glob...@>= |
int serial_number; |
sym_node *sym_root; /* root of the sym */ |
sym_node *next_sym_node, *last_sym_node; /* allocation control */ |
sym_node *sym_avail; /* stack of recycled symbol table nodes */ |
|
@ We initialize the trie by inserting all the predefined symbols. |
Opcodes are given the prefix \.{\^}, to distinguish them from |
ordinary symbols; this character nicely divides uppercase letters from |
lowercase letters. |
|
@<Init...@>= |
trie_root=new_trie_node(); |
cur_prefix=trie_root; |
op_root=new_trie_node(); |
trie_root->mid=op_root; |
trie_root->ch=':'; |
op_root->ch='^'; |
@<Put the \MMIX\ opcodes and \MMIXAL\ pseudo-ops into the trie@>; |
@<Put the special register names into the trie@>; |
@<Put other predefined symbols into the trie@>; |
|
@ Most of the assembly work can be table driven, based on bits that |
are stored as the ``equivalents'' of opcode symbols like \.{\^ADD}. |
|
@d rel_addr_bit 0x1 /* is YZ or XYZ relative? */ |
@d immed_bit 0x2 /* should opcode be immediate if Z or YZ not register? */ |
@d zar_bit 0x4 /* should register status of Z be ignored? */ |
@d zr_bit 0x8 /* must Z be a register? */ |
@d yar_bit 0x10 /* should register status of Y be ignored? */ |
@d yr_bit 0x20 /* must Y be a register? */ |
@d xar_bit 0x40 /* should register status of X be ignored? */ |
@d xr_bit 0x80 /* must X be a register? */ |
@d yzar_bit 0x100 /* should register status of YZ be ignored? */ |
@d yzr_bit 0x200 /* must YZ be a register? */ |
@d xyzar_bit 0x400 /* should register status of XYZ be ignored? */ |
@d xyzr_bit 0x800 /* must XYZ be a register? */ |
@d one_arg_bit 0x1000 /* is it OK to have zero or one operand? */ |
@d two_arg_bit 0x2000 /* is it OK to have exactly two operands? */ |
@d three_arg_bit 0x4000 /* is it OK to have exactly three operands? */ |
@d many_arg_bit 0x8000 /* is it OK to have more than three operands? */ |
@d align_bits 0x30000 /* how much alignment: byte, wyde, tetra, or octa? */ |
@d no_label_bit 0x40000 /* should the label be blank? */ |
@d mem_bit 0x80000 /* must YZ be a memory reference? */ |
@d spec_bit 0x100000 /* is this opcode allowed in \.{SPEC} mode? */ |
|
@<Type...@>= |
typedef struct { |
Char *name; /* symbolic opcode */ |
short code; /* numeric opcode */ |
int bits; /* treatment of operands */ |
} op_spec; |
@# |
typedef enum { |
@!SET=0x100,@!IS,@!LOC,@!PREFIX,@!BSPEC,@!ESPEC,@!GREG,@!LOCAL,@/ |
@!BYTE,@!WYDE,@!TETRA,@!OCTA}@+@!pseudo_op; |
|
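@ To see how the bits are meant to be read, here is a small decoding sketch.
The macro values are copied from the definitions above; the helper functions
|op_alignment| and |arg_count_ok| are hypothetical and are not part of the
assembler.

```c
#include <assert.h>

#define rel_addr_bit 0x1
#define one_arg_bit 0x1000
#define two_arg_bit 0x2000
#define three_arg_bit 0x4000
#define many_arg_bit 0x8000
#define align_bits 0x30000

/* Alignment in bytes: field value 0, 1, 2, 3 means byte, wyde, tetra, octa. */
int op_alignment(int bits) { return 1 << ((bits & align_bits) >> 16); }

/* Is an operand count of n acceptable for an opcode with these bits? */
int arg_count_ok(int bits, int n)
{
  if (n <= 1) return (bits & one_arg_bit) != 0;
  if (n == 2) return (bits & two_arg_bit) != 0;
  if (n == 3) return (bits & three_arg_bit) != 0;
  return (bits & many_arg_bit) != 0;
}
```

For example, the value \Hex{240a2} describes a tetra-aligned opcode that
accepts exactly three operands, while \Hex{27554} accepts one, two, or three.
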
@ @<Glob...@>= |
op_spec op_init_table[]={@/ |
{"TRAP", 0x00, 0x27554}, |
@.TRAP@> |
{"FCMP", 0x01, 0x240a8}, |
@.FCMP@> |
{"FUN", 0x02, 0x240a8}, |
@.FUN@> |
{"FEQL", 0x03, 0x240a8},@/ |
@.FEQL@> |
{"FADD", 0x04, 0x240a8}, |
@.FADD@> |
{"FIX", 0x05, 0x26288}, |
@.FIX@> |
{"FSUB", 0x06, 0x240a8}, |
@.FSUB@> |
{"FIXU", 0x07, 0x26288},@/ |
@.FIXU@> |
{"FLOT", 0x08, 0x26282}, |
@.FLOT@> |
{"FLOTU", 0x0a, 0x26282}, |
@.FLOTU@> |
{"SFLOT", 0x0c, 0x26282}, |
@.SFLOT@> |
{"SFLOTU", 0x0e, 0x26282},@/ |
@.SFLOTU@> |
{"FMUL", 0x10, 0x240a8}, |
@.FMUL@> |
{"FCMPE", 0x11, 0x240a8}, |
@.FCMPE@> |
{"FUNE", 0x12, 0x240a8}, |
@.FUNE@> |
{"FEQLE", 0x13, 0x240a8},@/ |
@.FEQLE@> |
{"FDIV", 0x14, 0x240a8}, |
@.FDIV@> |
{"FSQRT", 0x15, 0x26288}, |
@.FSQRT@> |
{"FREM", 0x16, 0x240a8}, |
@.FREM@> |
{"FINT", 0x17, 0x26288},@/ |
@.FINT@> |
{"MUL", 0x18, 0x240a2}, |
@.MUL@> |
{"MULU", 0x1a, 0x240a2}, |
@.MULU@> |
{"DIV", 0x1c, 0x240a2}, |
@.DIV@> |
{"DIVU", 0x1e, 0x240a2},@/ |
@.DIVU@> |
{"ADD", 0x20, 0x240a2}, |
@.ADD@> |
{"ADDU", 0x22, 0x240a2}, |
@.ADDU@> |
{"SUB", 0x24, 0x240a2}, |
@.SUB@> |
{"SUBU", 0x26, 0x240a2},@/ |
@.SUBU@> |
{"2ADDU", 0x28, 0x240a2}, |
@.2ADDU@> |
{"4ADDU", 0x2a, 0x240a2}, |
@.4ADDU@> |
{"8ADDU", 0x2c, 0x240a2}, |
@.8ADDU@> |
{"16ADDU", 0x2e, 0x240a2},@/ |
@.16ADDU@> |
{"CMP", 0x30, 0x240a2}, |
@.CMP@> |
{"CMPU", 0x32, 0x240a2}, |
@.CMPU@> |
{"NEG", 0x34, 0x26082}, |
@.NEG@> |
{"NEGU", 0x36, 0x26082},@/ |
@.NEGU@> |
{"SL", 0x38, 0x240a2}, |
@.SL@> |
{"SLU", 0x3a, 0x240a2}, |
@.SLU@> |
{"SR", 0x3c, 0x240a2}, |
@.SR@> |
{"SRU", 0x3e, 0x240a2},@/ |
@.SRU@> |
{"BN", 0x40, 0x22081}, |
@.BN@> |
{"BZ", 0x42, 0x22081}, |
@.BZ@> |
{"BP", 0x44, 0x22081}, |
@.BP@> |
{"BOD", 0x46, 0x22081},@/ |
@.BOD@> |
{"BNN", 0x48, 0x22081}, |
@.BNN@> |
{"BNZ", 0x4a, 0x22081}, |
@.BNZ@> |
{"BNP", 0x4c, 0x22081}, |
@.BNP@> |
{"BEV", 0x4e, 0x22081},@/ |
@.BEV@> |
{"PBN", 0x50, 0x22081}, |
@.PBN@> |
{"PBZ", 0x52, 0x22081}, |
@.PBZ@> |
{"PBP", 0x54, 0x22081}, |
@.PBP@> |
{"PBOD", 0x56, 0x22081},@/ |
@.PBOD@> |
{"PBNN", 0x58, 0x22081}, |
@.PBNN@> |
{"PBNZ", 0x5a, 0x22081}, |
@.PBNZ@> |
{"PBNP", 0x5c, 0x22081}, |
@.PBNP@> |
{"PBEV", 0x5e, 0x22081},@/ |
@.PBEV@> |
{"CSN", 0x60, 0x240a2}, |
@.CSN@> |
{"CSZ", 0x62, 0x240a2}, |
@.CSZ@> |
{"CSP", 0x64, 0x240a2}, |
@.CSP@> |
{"CSOD", 0x66, 0x240a2},@/ |
@.CSOD@> |
{"CSNN", 0x68, 0x240a2}, |
@.CSNN@> |
{"CSNZ", 0x6a, 0x240a2}, |
@.CSNZ@> |
{"CSNP", 0x6c, 0x240a2}, |
@.CSNP@> |
{"CSEV", 0x6e, 0x240a2},@/ |
@.CSEV@> |
{"ZSN", 0x70, 0x240a2}, |
@.ZSN@> |
{"ZSZ", 0x72, 0x240a2}, |
@.ZSZ@> |
{"ZSP", 0x74, 0x240a2}, |
@.ZSP@> |
{"ZSOD", 0x76, 0x240a2},@/ |
@.ZSOD@> |
{"ZSNN", 0x78, 0x240a2}, |
@.ZSNN@> |
{"ZSNZ", 0x7a, 0x240a2}, |
@.ZSNZ@> |
{"ZSNP", 0x7c, 0x240a2}, |
@.ZSNP@> |
{"ZSEV", 0x7e, 0x240a2},@/ |
@.ZSEV@> |
{"LDB", 0x80, 0xa60a2}, |
@.LDB@> |
{"LDBU", 0x82, 0xa60a2}, |
@.LDBU@> |
{"LDW", 0x84, 0xa60a2}, |
@.LDW@> |
{"LDWU", 0x86, 0xa60a2},@/ |
@.LDWU@> |
{"LDT", 0x88, 0xa60a2}, |
@.LDT@> |
{"LDTU", 0x8a, 0xa60a2}, |
@.LDTU@> |
{"LDO", 0x8c, 0xa60a2}, |
@.LDO@> |
{"LDOU", 0x8e, 0xa60a2},@/ |
@.LDOU@> |
{"LDSF", 0x90, 0xa60a2}, |
@.LDSF@> |
{"LDHT", 0x92, 0xa60a2}, |
@.LDHT@> |
{"CSWAP", 0x94, 0xa60a2}, |
@.CSWAP@> |
{"LDUNC", 0x96, 0xa60a2},@/ |
@.LDUNC@> |
{"LDVTS", 0x98, 0xa60a2}, |
@.LDVTS@> |
{"PRELD", 0x9a, 0xa6022}, |
@.PRELD@> |
{"PREGO", 0x9c, 0xa6022}, |
@.PREGO@> |
{"GO", 0x9e, 0xa60a2},@/ |
@.GO@> |
{"STB", 0xa0, 0xa60a2}, |
@.STB@> |
{"STBU", 0xa2, 0xa60a2}, |
@.STBU@> |
{"STW", 0xa4, 0xa60a2}, |
@.STW@> |
{"STWU", 0xa6, 0xa60a2},@/ |
@.STWU@> |
{"STT", 0xa8, 0xa60a2}, |
@.STT@> |
{"STTU", 0xaa, 0xa60a2}, |
@.STTU@> |
{"STO", 0xac, 0xa60a2}, |
@.STO@> |
{"STOU", 0xae, 0xa60a2},@/ |
@.STOU@> |
{"STSF", 0xb0, 0xa60a2}, |
@.STSF@> |
{"STHT", 0xb2, 0xa60a2}, |
@.STHT@> |
{"STCO", 0xb4, 0xa6022}, |
@.STCO@> |
{"STUNC", 0xb6, 0xa60a2},@/ |
@.STUNC@> |
{"SYNCD", 0xb8, 0xa6022}, |
@.SYNCD@> |
{"PREST", 0xba, 0xa6022}, |
@.PREST@> |
{"SYNCID", 0xbc, 0xa6022}, |
@.SYNCID@> |
{"PUSHGO", 0xbe, 0xa6062},@/ |
@.PUSHGO@> |
{"OR", 0xc0, 0x240a2}, |
@.OR@> |
{"ORN", 0xc2, 0x240a2}, |
@.ORN@> |
{"NOR", 0xc4, 0x240a2}, |
@.NOR@> |
{"XOR", 0xc6, 0x240a2},@/ |
@.XOR@> |
{"AND", 0xc8, 0x240a2}, |
@.AND@> |
{"ANDN", 0xca, 0x240a2}, |
@.ANDN@> |
{"NAND", 0xcc, 0x240a2}, |
@.NAND@> |
{"NXOR", 0xce, 0x240a2},@/ |
@.NXOR@> |
{"BDIF", 0xd0, 0x240a2}, |
@.BDIF@> |
{"WDIF", 0xd2, 0x240a2}, |
@.WDIF@> |
{"TDIF", 0xd4, 0x240a2}, |
@.TDIF@> |
{"ODIF", 0xd6, 0x240a2},@/ |
@.ODIF@> |
{"MUX", 0xd8, 0x240a2}, |
@.MUX@> |
{"SADD", 0xda, 0x240a2}, |
@.SADD@> |
{"MOR", 0xdc, 0x240a2}, |
@.MOR@> |
{"MXOR", 0xde, 0x240a2},@/ |
@.MXOR@> |
{"SETH", 0xe0, 0x22080}, |
@.SETH@> |
{"SETMH", 0xe1, 0x22080}, |
@.SETMH@> |
{"SETML", 0xe2, 0x22080}, |
@.SETML@> |
{"SETL", 0xe3, 0x22080},@/ |
@.SETL@> |
{"INCH", 0xe4, 0x22080}, |
@.INCH@> |
{"INCMH", 0xe5, 0x22080}, |
@.INCMH@> |
{"INCML", 0xe6, 0x22080}, |
@.INCML@> |
{"INCL", 0xe7, 0x22080},@/ |
@.INCL@> |
{"ORH", 0xe8, 0x22080}, |
@.ORH@> |
{"ORMH", 0xe9, 0x22080}, |
@.ORMH@> |
{"ORML", 0xea, 0x22080}, |
@.ORML@> |
{"ORL", 0xeb, 0x22080},@/ |
@.ORL@> |
{"ANDNH", 0xec, 0x22080}, |
@.ANDNH@> |
{"ANDNMH", 0xed, 0x22080}, |
@.ANDNMH@> |
{"ANDNML", 0xee, 0x22080}, |
@.ANDNML@> |
{"ANDNL", 0xef, 0x22080},@/ |
@.ANDNL@> |
{"JMP", 0xf0, 0x21001}, |
@.JMP@> |
{"PUSHJ", 0xf2, 0x22041}, |
@.PUSHJ@> |
{"GETA", 0xf4, 0x22081}, |
@.GETA@> |
{"PUT", 0xf6, 0x22002},@/ |
@.PUT@> |
{"POP", 0xf8, 0x23000}, |
@.POP@> |
{"RESUME", 0xf9, 0x21000}, |
@.RESUME@> |
{"SAVE", 0xfa, 0x22080}, |
@.SAVE@> |
{"UNSAVE", 0xfb, 0x23a00},@/ |
@.UNSAVE@> |
{"SYNC", 0xfc, 0x21000}, |
@.SYNC@> |
{"SWYM", 0xfd, 0x27554}, |
@.SWYM@> |
{"GET", 0xfe, 0x22080}, |
@.GET@> |
{"TRIP", 0xff, 0x27554},@/ |
@.TRIP@> |
{"SET",SET, 0x22180}, |
@.SET@> |
{"LDA", 0x22, 0xa60a2},@/ |
@.LDA@> |
{"IS", IS, 0x101400}, |
@.IS@> |
{"LOC", LOC, 0x1400}, |
@.LOC@> |
{"PREFIX", PREFIX, 0x141000},@/ |
@.PREFIX@> |
{"BYTE", BYTE, 0x10f000}, |
@.BYTE@> |
{"WYDE", WYDE, 0x11f000}, |
@.WYDE@> |
{"TETRA", TETRA, 0x12f000}, |
@.TETRA@> |
{"OCTA", OCTA, 0x13f000},@/ |
@.OCTA@> |
{"BSPEC", BSPEC, 0x41400}, |
@.BSPEC@> |
{"ESPEC", ESPEC, 0x141000},@/ |
@.ESPEC@> |
{"GREG", GREG, 0x101000}, |
@.GREG@> |
{"LOCAL", LOCAL, 0x141800}}; |
@.LOCAL@> |
int op_init_size; /* the number of items in |op_init_table| */ |
|
@ @<Put the \MMIX\ opcodes and \MMIXAL\ pseudo-ops into the trie@>= |
op_init_size=(sizeof op_init_table)/sizeof(op_spec); |
for (j=0;j<op_init_size;j++) { |
tt=trie_search(op_root,op_init_table[j].name); |
pp=tt->sym=new_sym_node(false); |
pp->link=PREDEFINED; |
pp->equiv.h=op_init_table[j].code, pp->equiv.l=op_init_table[j].bits; |
} |
|
@ @<Local...@>= |
register trie_node *tt; |
register sym_node *pp,*qq; |
|
@ @<Put the special register names into the trie@>= |
for (j=0;j<32;j++) { |
tt=trie_search(trie_root,special_name[j]); |
pp=tt->sym=new_sym_node(false); |
pp->link=PREDEFINED; |
pp->equiv.l=j; |
} |
|
@ @<Glob...@>= |
Char *special_name[32]={"rB","rD","rE","rH","rJ","rM","rR","rBB", |
"rC","rN","rO","rS","rI","rT","rTT","rK","rQ","rU","rV","rG","rL", |
"rA","rF","rP","rW","rX","rY","rZ","rWW","rXX","rYY","rZZ"}; |
@^predefined symbols@> |
|
@ @<Type...@>= |
typedef struct { |
Char* name; |
tetra h,l; |
}@+predef_spec; |
|
@ @<Glob...@>= |
predef_spec predefs[]={ |
{"ROUND_CURRENT",0,0}, |
@:ROUND_CURRENT}\.{ROUND\_CURRENT@> |
{"ROUND_OFF",0,1}, |
@:ROUND_OFF}\.{ROUND\_OFF@> |
{"ROUND_UP",0,2}, |
@:ROUND_UP}\.{ROUND\_UP@> |
{"ROUND_DOWN",0,3}, |
@:ROUND_DOWN}\.{ROUND\_DOWN@> |
{"ROUND_NEAR",0,4},@/ |
@:ROUND_NEAR}\.{ROUND\_NEAR@> |
{"Inf",0x7ff00000,0},@/ |
@.Inf@> |
{"Data_Segment",0x20000000,0}, |
@:Data_Segment}\.{Data\_Segment@> |
{"Pool_Segment",0x40000000,0}, |
@:Pool_Segment}\.{Pool\_Segment@> |
{"Stack_Segment",0x60000000,0},@/ |
@:Stack_Segment}\.{Stack\_Segment@> |
{"D_BIT",0,0x80}, |
@:D_BIT}\.{D\_BIT@> |
{"V_BIT",0,0x40}, |
@:V_BIT}\.{V\_BIT@> |
{"W_BIT",0,0x20}, |
@:W_BIT}\.{W\_BIT@> |
{"I_BIT",0,0x10}, |
@:I_BIT}\.{I\_BIT@> |
{"O_BIT",0,0x08}, |
@:O_BIT}\.{O\_BIT@> |
{"U_BIT",0,0x04}, |
@:U_BIT}\.{U\_BIT@> |
{"Z_BIT",0,0x02}, |
@:Z_BIT}\.{Z\_BIT@> |
{"X_BIT",0,0x01},@/ |
@:X_BIT}\.{X\_BIT@> |
{"D_Handler",0,0x10}, |
@:D_Handler}\.{D\_Handler@> |
{"V_Handler",0,0x20}, |
@:V_Handler}\.{V\_Handler@> |
{"W_Handler",0,0x30}, |
@:W_Handler}\.{W\_Handler@> |
{"I_Handler",0,0x40}, |
@:I_Handler}\.{I\_Handler@> |
{"O_Handler",0,0x50}, |
@:O_Handler}\.{O\_Handler@> |
{"U_Handler",0,0x60}, |
@:U_Handler}\.{U\_Handler@> |
{"Z_Handler",0,0x70}, |
@:Z_Handler}\.{Z\_Handler@> |
{"X_Handler",0,0x80},@/ |
@:X_Handler}\.{X\_Handler@> |
{"StdIn",0,0}, |
@.StdIn@> |
{"StdOut",0,1}, |
@.StdOut@> |
{"StdErr",0,2},@/ |
@.StdErr@> |
{"TextRead",0,0}, |
@.TextRead@> |
{"TextWrite",0,1}, |
@.TextWrite@> |
{"BinaryRead",0,2}, |
@.BinaryRead@> |
{"BinaryWrite",0,3}, |
@.BinaryWrite@> |
{"BinaryReadWrite",0,4},@/ |
@.BinaryReadWrite@> |
{"Halt",0,0}, |
@.Halt@> |
{"Fopen",0,1}, |
@.Fopen@> |
{"Fclose",0,2}, |
@.Fclose@> |
{"Fread",0,3}, |
@.Fread@> |
{"Fgets",0,4}, |
@.Fgets@> |
{"Fgetws",0,5}, |
@.Fgetws@> |
{"Fwrite",0,6}, |
@.Fwrite@> |
{"Fputs",0,7}, |
@.Fputs@> |
{"Fputws",0,8}, |
@.Fputws@> |
{"Fseek",0,9}, |
@.Fseek@> |
{"Ftell",0,10}}; |
@.Ftell@> |
int predef_size; |
@^predefined symbols@> |
|
@ @<Put other predefined symbols into the trie@>= |
predef_size=(sizeof predefs)/sizeof(predef_spec); |
for (j=0;j<predef_size;j++) { |
tt=trie_search(trie_root,predefs[j].name); |
pp=tt->sym=new_sym_node(false); |
pp->link=PREDEFINED; |
pp->equiv.h=predefs[j].h, pp->equiv.l=predefs[j].l; |
} |
|
@ We place \.{Main} into the trie at the beginning of assembly, |
so that it will show up as an undefined symbol if the user |
specifies no starting point. |
@.Main@> |
|
@<Init...@>= |
trie_search(trie_root,"Main")->sym=new_sym_node(true); |
|
@ At the end of assembly we traverse the entire symbol table, visiting each |
symbol in lexicographic order and transmitting the trie structure to the |
output file. We detect any undefined future references at this time. |
|
The order of traversal has a simple recursive pattern: To traverse the subtrie |
rooted at~|t|, we |
$$\vbox{\halign{#\hfil\cr |
traverse |t->left|, if the left subtrie is nonempty;\cr |
visit |t->sym|, if this symbol table entry is present;\cr |
traverse |t->mid|, if the middle subtrie is nonempty;\cr |
traverse |t->right|, if the right subtrie is nonempty.\cr |
}}$$ |
This pattern leads to a compact representation in the \.{mmo} file, usually |
requiring fewer than two bytes per trie node plus the bytes needed to encode |
the equivalents and serial numbers. Each node of the trie is encoded as a |
``master byte'' followed by the encodings of the left subtrie, |
character, equivalent, middle subtrie, and right subtrie. |
The master byte is the sum of |
$$\vbox{\halign{#\hfil\cr |
\Hex{80}, if the character occupies two bytes instead of one;\cr |
\Hex{40}, if the left subtrie is nonempty;\cr |
\Hex{20}, if the middle subtrie is nonempty;\cr |
\Hex{10}, if the right subtrie is nonempty;\cr |
\Hex{01} to \Hex{08}, if the symbol's equivalent is one to eight bytes long;\cr |
\Hex{09} to \Hex{0e}, if the symbol's equivalent is $2^{61}$ plus one |
to six bytes;\cr |
\Hex{0f}, if the symbol's equivalent is \$0 plus one byte;\cr}}$$ |
the character is omitted if the middle subtrie and the equivalent are |
both empty. The ``equivalent'' of an undefined symbol is zero, but |
stated as two bytes long. |
Symbol equivalents are followed by the serial number, represented as a |
sequence of one or more bytes in radix~128; the final byte of the serial |
number is tagged by adding~128. (Thus, serial number $2^{14}-1$ is |
encoded as \Hex{7fff}; serial number $2^{14}$ is \Hex{010080}.) |
|
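@ The radix-128 encoding of serial numbers can be checked with a few lines of
code. This sketch assumes that |unsigned| has at least 32 bits; the function
name |encode_serial| and the output buffer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Write serial in radix 128, most significant digit first, adding 128 to
   the final byte. Returns the number of bytes written (at most 5). */
size_t encode_serial(unsigned serial, unsigned char *out)
{
  int m;
  size_t n = 0;
  for (m = 0; m < 4; m++)               /* m+1 digits suffice */
    if (serial < (1u << (7 * (m + 1)))) break;
  for (; m >= 0; m--)
    out[n++] = (unsigned char)(((serial >> (7 * m)) & 0x7f) + (m ? 0 : 0x80));
  return n;
}
```

Applied to $2^{14}-1$ and $2^{14}$ it produces the two byte sequences quoted
in the preceding section.
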
@ First we prune the trie by removing all predefined symbols that the |
user did not redefine. |
|
@<Sub...@>= |
trie_node* prune @,@,@[ARGS((trie_node*))@];@+@t}\6{@> |
trie_node* prune(t) |
trie_node* t; |
{ |
register int useful=0; |
if (t->sym) { |
if (t->sym->serial) useful=1; |
else t->sym=NULL; |
} |
if (t->left) { |
t->left=prune(t->left); |
if (t->left) useful=1; |
} |
if (t->mid) { |
t->mid=prune(t->mid); |
if (t->mid) useful=1; |
} |
if (t->right) { |
t->right=prune(t->right); |
if (t->right) useful=1; |
} |
if (useful) return t; |
else return NULL; |
} |
|
@ Then we output the trie by following the recursive traversal pattern. |
|
@<Sub...@>= |
void out_stab @,@,@[ARGS((trie_node*))@];@+@t}\6{@> |
void out_stab(t) |
trie_node* t; |
{ |
register int m=0,j; |
register sym_node *pp; |
if (t->ch>0xff) m+=0x80; |
if (t->left) m+=0x40; |
if (t->mid) m+=0x20; |
if (t->right) m+=0x10; |
if (t->sym) { |
if (t->sym->link==REGISTER) m+=0xf; |
else if (t->sym->link==DEFINED) |
@<Encode the length of |t->sym->equiv|@>@; |
else if (t->sym->link || t->sym->serial==1) @<Report an undefined symbol@>; |
} |
mmo_byte(m); |
if (t->left) out_stab(t->left); |
if (m&0x2f) @<Visit |t| and traverse |t->mid|@>; |
if (t->right) out_stab(t->right); |
} |
|
@ A global variable called |sym_buf| holds all characters on middle branches to
the current trie node; |sym_ptr| points to the first currently unused
character in |sym_buf|.
@^Unicode@> |
|
@<Visit |t| and traverse |t->mid|@>= |
{ |
if (m&0x80) mmo_byte(t->ch>>8); |
mmo_byte(t->ch&0xff); |
*sym_ptr++=(m&0x80? '?': t->ch); /* Unicode? not yet */ |
m&=0xf;@+ if (m && t->sym->link) { |
if (listing_file) @<Print symbol |sym_buf| and its equivalent@>; |
if (m==15) m=1; |
else if (m>8) m-=8; |
for (;m>0;m--) |
if (m>4) mmo_byte((t->sym->equiv.h>>(8*(m-5)))&0xff); |
else mmo_byte((t->sym->equiv.l>>(8*(m-1)))&0xff); |
for (m=0;m<4;m++) if (t->sym->serial<(1<<(7*(m+1)))) break; |
for (;m>=0;m--) |
mmo_byte(((t->sym->serial>>(7*m))&0x7f)+(m? 0: 0x80)); |
} |
if (t->mid) out_stab(t->mid); |
sym_ptr--; |
} |
|
@ @<Encode the length of |t->sym->equiv|@>= |
{@+register tetra x; |
if ((t->sym->equiv.h&0xffff0000)==0x20000000) |
m+=8, x=t->sym->equiv.h-0x20000000; /* data segment */ |
else x=t->sym->equiv.h; |
if (x) m+=4;@+ else x=t->sym->equiv.l; |
for (j=1;j<4;j++) if (x<(1<<(8*j))) break; |
m+=j; |
} |
|
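@ The length code can be tried out on its own. In this sketch the two
halves of the equivalent are passed as separate 32-bit arguments; the function
name |equiv_length_code| is hypothetical, and the result is the amount that
would be added to the master byte for a |DEFINED| symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Low four bits of the master byte for a defined, non-register symbol. */
int equiv_length_code(uint32_t h, uint32_t l)
{
  int m = 0, j;
  uint32_t x;
  if ((h & 0xffff0000) == 0x20000000) { /* data segment: encode the offset */
    m += 8; x = h - 0x20000000;
  } else x = h;
  if (x) m += 4; else x = l;            /* four more bytes if h contributes */
  for (j = 1; j < 4; j++)
    if (x < (1u << (8 * j))) break;     /* bytes needed for the leading part */
  return m + j;
}
```

Thus an equivalent of \Hex{100} needs two bytes, and \.{Data\_Segment}
itself yields code~9, meaning $2^{61}$ plus one byte.
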
@ We make room for symbols up to 999 bytes long. Strictly speaking, |
the program should check if this limit is exceeded; but really! |
|
@<Glob...@>= |
Char sym_buf[1000]; |
Char *sym_ptr; |
|
@ The initial `\.:' of each fully qualified symbol is omitted here, since most |
users of \MMIXAL\ will probably not need the \.{PREFIX} feature. One |
consequence of this omission is that the one-character symbol~`\.:' |
itself, which is allowed by the rules of \MMIXAL, is printed as the null |
string. |
|
@<Print symbol |sym_buf| and its equivalent@>= |
{ |
*sym_ptr='\0'; |
fprintf(listing_file," %s = ",sym_buf+1); |
pp=t->sym; |
if (pp->link==DEFINED) |
fprintf(listing_file,"#%08x%08x",pp->equiv.h,pp->equiv.l); |
else if (pp->link==REGISTER) |
fprintf(listing_file,"$%03d",pp->equiv.l); |
else fprintf(listing_file,"?"); |
fprintf(listing_file," (%d)\n",pp->serial); |
} |
|
@ @<Report an undefined symbol@>= |
{ |
*sym_ptr=(m&0x80? '?': t->ch); /* Unicode? not yet */ |
*(sym_ptr+1)='\0'; |
fprintf(stderr,"undefined symbol: %s\n",sym_buf+1); |
@.undefined symbol@> |
err_count++; |
m+=2; |
} |
|
@ @<Check and output the trie@>= |
op_root->mid=NULL; /* annihilate all the opcodes */ |
prune(trie_root); |
sym_ptr=sym_buf; |
if (listing_file) fprintf(listing_file,"\nSymbol table:\n"); |
mmo_lop(lop_stab,0,0); |
out_stab(trie_root); |
while (mmo_ptr&3) mmo_byte(0); |
mmo_lopp(lop_end,mmo_ptr>>2); |
|
@* Expressions. The most intricate part of the assembly process is |
the task of scanning and evaluating expressions in the operand field. |
Fortunately, \MMIXAL's expressions have a simple structure that can |
be handled easily with a stack-based approach. |
|
Two stacks hold pending data as the operand field is scanned and evaluated. |
The |op_stack| contains operators that have not yet been performed; the |
|val_stack| contains values that have not yet been used. After an entire |
operand list has been scanned, the |op_stack| will be empty and the |
|val_stack| will hold the operand values needed to assemble the current |
instruction. |
|
@ Entries on |op_stack| have one of the constant values defined here, and they |
have one of the precedence levels defined here. |
|
Entries on |val_stack| have |equiv|, |link|, and |status| fields; the |link| |
points to a trie node if the expression is a symbol that has not yet |
been subjected to any operations. |
|
@<Type...@>= |
typedef enum {@!negate,@!serialize,@!complement,@!registerize,@!inner_lp,@| |
@!plus,@!minus,@!times,@!over,@!frac,@!mod,@!shl,@!shr,@!and,@!or,@!xor,@| |
@!outer_lp,@!outer_rp,@!inner_rp} @!stack_op; |
typedef enum {@!zero,@!weak,@!strong,@!unary} @!prec; |
typedef enum {@!pure,@!reg_val,@!undefined} @!stat; |
typedef struct { |
octa equiv; /* current value */ |
trie_node *link; /* trie reference for symbol */ |
stat status; /* |pure|, |reg_val|, or |undefined| */ |
} val_node; |
|
@ @d top_op op_stack[op_ptr-1] /* top entry on the operator stack */ |
@d top_val val_stack[val_ptr-1] /* top entry on the value stack */ |
@d next_val val_stack[val_ptr-2] /* next-to-top entry of the value stack */ |
|
@<Glob...@>= |
stack_op *op_stack; /* stack for pending operators */ |
int op_ptr; /* number of items on |op_stack| */ |
val_node *val_stack; /* stack for pending operands */ |
int val_ptr; /* number of items on |val_stack| */ |
prec precedence[]={unary,unary,unary,unary,zero,@| |
weak,weak,strong,strong,strong,strong,strong,strong,strong,weak,weak,@| |
zero,zero,zero}; /* precedences of the respective |stack_op| values */ |
stack_op rt_op; /* newly scanned operator */ |
octa acc; /* temporary accumulator */ |
|
@ @<Init...@>= |
op_stack=(stack_op*)calloc(buf_size,sizeof(stack_op)); |
val_stack=(val_node*)calloc(buf_size,sizeof(val_node)); |
if (!op_stack || !val_stack) panic("No room for the stacks"); |
@.No room...@> |
|
@ The operand field of an instruction will have been copied into a separate |
\&{Char} array called |operand_list| when we reach this part of the program. |
|
@<Scan the operand field@>= |
p=operand_list; |
val_ptr=0; /* |val_stack| is empty */ |
op_stack[0]=outer_lp, op_ptr=1; |
/* |op_stack| contains an ``outer left parenthesis'' */ |
while (1) { |
@<Scan opening tokens until putting something on |val_stack|@>; |
scan_close: @<Scan a binary operator or closing token, |rt_op|@>; |
while (precedence[top_op]>=precedence[rt_op]) |
@<Perform the top operation on |op_stack|@>; |
hold_op: op_stack[op_ptr++]=rt_op; |
} |
operands_done:@; |
|
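@ The shape of this loop is easier to see in a miniature evaluator. The sketch
below keeps the two stacks, the outer left parenthesis, and the rule ``perform
the top operator while its precedence is at least that of the new one.'' It is
an illustration under strong assumptions: operands are decimal constants, the
input is well formed and contains no spaces, the stacks have a fixed size
of~64, and the names |prec_of|, |apply|, and |eval| are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

typedef enum { P_ZERO, P_WEAK, P_STRONG } prec_t;

static prec_t prec_of(char op)
{
  switch (op) {
  case '+': case '-': case '|': case '^': return P_WEAK;
  case '*': case '/': case '%': case '&': return P_STRONG;
  default: return P_ZERO;               /* parentheses and end of input */
  }
}

static long apply(char op, long a, long b)
{
  switch (op) {
  case '+': return a + b;
  case '-': return a - b;
  case '|': return a | b;
  case '^': return a ^ b;
  case '*': return a * b;
  case '/': return a / b;
  case '%': return a % b;
  default:  return a & b;               /* '&' */
  }
}

long eval(const char *p)
{
  char ops[64]; long vals[64];
  int op_ptr = 0, val_ptr = 0;
  ops[op_ptr++] = '(';                  /* the outer left parenthesis */
  for (;;) {
    char rt_op;
    while (*p == '(') ops[op_ptr++] = *p++;   /* opening tokens */
    vals[val_ptr] = 0;
    while (isdigit((unsigned char)*p))
      vals[val_ptr] = 10 * vals[val_ptr] + (*p++ - '0');
    val_ptr++;
  scan_close:
    rt_op = *p ? *p++ : ')';            /* end of input closes the outer paren */
    while (prec_of(ops[op_ptr - 1]) >= prec_of(rt_op)) {
      char op = ops[--op_ptr];
      if (op == '(') {                  /* a parenthesis has been matched */
        if (op_ptr == 0) return vals[0];
        goto scan_close;
      }
      val_ptr--;
      vals[val_ptr - 1] = apply(op, vals[val_ptr - 1], vals[val_ptr]);
    }
    ops[op_ptr++] = rt_op;
  }
}
```

Operators of equal precedence are performed from left to right, because the
comparison uses $\ge$ and not~$>$.
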
@ A comment that follows an empty operand list needs to be detected here. |
|
@<Scan opening tokens until putting something on |val_stack|@>= |
scan_open:@+if (isletter(*p)) @<Scan a symbol@>@; |
else if (isdigit(*p)) { |
if (*(p+1)=='F') @<Scan a forward local@>@; |
else if (*(p+1)=='B') @<Scan a backward local@>@; |
else @<Scan a decimal constant@>; |
}@+else@+ switch(*p++) { |
case '#': @<Scan a hexadecimal constant@>;@+break; |
case '\'': @<Scan a character constant@>;@+break; |
case '\"': @<Scan a string constant@>;@+break; |
case '@@': @<Scan the current location@>;@+break; |
case '-': op_stack[op_ptr++]=negate; |
case '+': goto scan_open; |
case '&': op_stack[op_ptr++]=serialize;@+goto scan_open; |
case '~': op_stack[op_ptr++]=complement;@+goto scan_open; |
case '$': op_stack[op_ptr++]=registerize;@+goto scan_open; |
case '(': op_stack[op_ptr++]=inner_lp;@+goto scan_open; |
default: if (p==operand_list+1) { /* treat operand list as empty */ |
operand_list[0]='0', operand_list[1]='\0', p=operand_list; |
goto scan_open; |
} |
if (*(p-1)) derr("syntax error at character `%c'",*(p-1)); |
derr("syntax error after character `%c'",*(p-2)); |
@.syntax error...@> |
} |
|
@ @<Scan a symbol@>= |
{ |
if (*p==':') tt=trie_search(trie_root,p+1); |
else tt=trie_search(cur_prefix,p); |
p=terminator; |
symbol_found: val_ptr++; |
pp=tt->sym; |
if (!pp) pp=tt->sym=new_sym_node(true); |
top_val.link=tt, top_val.equiv=pp->equiv; |
if (pp->link==PREDEFINED) pp->link=DEFINED; |
top_val.status=(pp->link==DEFINED? pure: pp->link==REGISTER? reg_val: |
undefined); |
} |
|
@ @<Scan a forward local@>= |
{ |
tt=&forward_local_host[*p-'0'];@+ p+=2;@+ goto symbol_found; |
} |
|
@ @<Scan a backward local@>= |
{ |
tt=&backward_local_host[*p-'0'];@+ p+=2;@+ goto symbol_found; |
} |
|
@ Statically allocated variables |forward_local_host[j]| and |
|backward_local_host[j]| masquerade as nodes of the trie. |
|
@<Glob...@>= |
trie_node forward_local_host[10], backward_local_host[10]; |
sym_node forward_local[10], backward_local[10]; |
|
@ Initially \.{0H}, \.{1H}, \dots, \.{9H} are defined to be zero. |
|
@<Init...@>= |
for (j=0;j<10;j++) { |
forward_local_host[j].sym=&forward_local[j]; |
backward_local_host[j].sym=&backward_local[j]; |
backward_local[j].link=DEFINED; |
} |
|
@ We have already checked to make sure that the character constant is legal. |
|
@<Scan a character constant@>= |
acc.h=0, acc.l=*p; |
p+=2; |
goto constant_found; |
|
@ @<Scan a string constant@>= |
acc.h=0, acc.l=*p; |
if (*p=='\"') { |
p++; acc.l=0; err("*null string is treated as zero"); |
@.null string...@> |
}@+else if (*(p+1)=='\"') p+=2; |
else *p='\"', *--p=','; |
goto constant_found; |
|
@ @<Scan a decimal constant@>= |
acc.h=0, acc.l=*p-'0'; |
for (p++;isdigit(*p);p++) { |
acc=oplus(acc,shift_left(acc,2)); |
acc=incr(shift_left(acc,1),*p-'0'); |
} |
constant_found: val_ptr++; |
top_val.link=NULL; |
top_val.equiv=acc; |
top_val.status=pure; |
|
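@ The decimal scan multiplies by ten with two shifts and an addition. The
same idea on a native 64-bit integer looks like this; |scan_decimal| is a
hypothetical name, and |uint64_t| arithmetic is assumed to model the
wraparound behavior of |oplus|, |shift_left|, and |incr|.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Scan decimal digits at *pp, forming 10*acc+d as ((acc+(acc<<2))<<1)+d.
   Arithmetic is modulo 2^64; *pp is advanced past the digits. */
uint64_t scan_decimal(const char **pp)
{
  const char *p = *pp;
  uint64_t acc = 0;
  for (; isdigit((unsigned char)*p); p++) {
    acc = acc + (acc << 2);                    /* 5*acc */
    acc = (acc << 1) + (uint64_t)(*p - '0');   /* 10*acc plus the digit */
  }
  *pp = p;
  return acc;
}
```

The pointer is left at the first nondigit, which is where the scan for a
binary operator or closing token resumes.
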
@ @<Scan a hexadecimal constant@>= |
if (!isxdigit(*p)) err("illegal hexadecimal constant"); |
@.illegal hexadecimal constant@> |
acc.h=acc.l=0; |
for (;isxdigit(*p);p++) { |
acc=incr(shift_left(acc,4),*p-'0'); |
if (*p>='a') acc=incr(acc,'0'-'a'+10); |
else if (*p>='A') acc=incr(acc,'0'-'A'+10); |
} |
goto constant_found; |
|
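@ The hexadecimal scan first adds |*p-'0'| and then corrects the result for
letters. A sketch of that trick, assuming an ASCII-compatible character set;
the names |hex_digit_value| and |scan_hex| are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Value of one hexadecimal digit, computed as in the scanning loop. */
static int hex_digit_value(int c)
{
  int v = c - '0';
  if (c >= 'a') v += '0' - 'a' + 10;       /* lowercase letters */
  else if (c >= 'A') v += '0' - 'A' + 10;  /* uppercase letters */
  return v;
}

uint64_t scan_hex(const char **pp)
{
  const char *p = *pp;
  uint64_t acc = 0;
  for (; isxdigit((unsigned char)*p); p++)
    acc = (acc << 4) + (uint64_t)hex_digit_value(*p);
  *pp = p;
  return acc;
}
```

The lowercase test must come first, since every lowercase letter also
satisfies the uppercase comparison.
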
@ @<Scan the current location@>= |
acc=cur_loc; |
goto constant_found; |
|
@ @<Scan a binary operator or closing token, |rt_op|@>= |
switch(*p++) { |
case '+': rt_op=plus;@+break; |
case '-': rt_op=minus;@+break; |
case '*': rt_op=times;@+break; |
case '/':@+if (*p!='/') rt_op=over; |
else p++,rt_op=frac;@+break; |
case '%': rt_op=mod;@+break; |
case '<': rt_op=shl;@+goto sh_check; |
case '>': rt_op=shr; |
 sh_check:@+p++;@+if (*(p-1)==*(p-2)) break;
derr("syntax error at `%c'",*(p-2)); |
@.syntax error...@> |
case '&': rt_op=and;@+break; |
case '|': rt_op=or;@+break; |
case '^': rt_op=xor;@+break; |
case ')': rt_op=inner_rp;@+break; |
case '\0': case ',': rt_op=outer_rp;@+break; |
default: derr("syntax error at `%c'",*(p-1)); |
} |
|
@ @<Perform the top operation on |op_stack|@>= |
switch(op_stack[--op_ptr]) { |
case inner_lp:@+if (rt_op==inner_rp) goto scan_close; |
err("*missing right parenthesis");@+break; |
@.missing right parenthesis@> |
case outer_lp:@+if (rt_op==outer_rp) { |
if (top_val.status==reg_val && (top_val.equiv.l>0xff||top_val.equiv.h)) { |
err("*register number too large, will be reduced mod 256"); |
@.register number...@> |
top_val.equiv.h=0, top_val.equiv.l &= 0xff; |
} |
if (!*(p-1)) goto operands_done; |
else rt_op=outer_lp;@+goto hold_op; /* comma */ |
}@+else { |
op_ptr++; |
err("*missing left parenthesis"); |
@.missing left parenthesis@> |
goto scan_close; |
} |
@t\4@>@<Cases for unary operators@>@; |
@t\4@>@<Cases for binary operators@>@; |
} |
|
@ Now we come to the part where equivalents are changed by unary |
or binary operators found in the expression being scanned. |
|
The most typical operator, and in some ways the fussiest one |
to deal with, is binary addition. Once we've written the code for |
this case, the other cases almost take care of themselves. |
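The status rule for addition can be looked at in isolation. The following is only a minimal sketch, not part of the assembler: the type |status_t|, its three constants, and the function |add_status| are hypothetical names introduced for illustration. It assumes that an undefined operand, or a sum of two register numbers, is rejected outright, and that otherwise the result is pure exactly when both operands have the same status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three operand statuses. */
typedef enum { st_pure, st_reg_val, st_undefined } status_t;

/* Stores the status of a sum in *out and returns 0, or returns -1
   when the two operands cannot be added. */
static int add_status(status_t a, status_t b, status_t *out)
{
  assert(out != NULL);
  if (a == st_undefined || b == st_undefined) return -1; /* undefined operand */
  if (a == st_reg_val && b == st_reg_val) return -1;     /* two register numbers */
  *out = (a == b) ? st_pure : st_reg_val; /* equal statuses give a pure value */
  return 0;
}
```

Under these assumptions, a register number plus a pure offset is again a register number, while two pure values give a pure value.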
|
@<Cases for binary...@>= |
case plus:@+if (top_val.status==undefined) |
err("cannot add an undefined quantity"); |
@.cannot add...@> |
if (next_val.status==undefined) |
err("cannot add to an undefined quantity"); |
if (top_val.status==reg_val && next_val.status==reg_val) |
err("cannot add two register numbers"); |
next_val.equiv=oplus(next_val.equiv,top_val.equiv); |
fin_bin: next_val.status=(top_val.status==next_val.status? pure: reg_val); |
val_ptr--; |
delink: top_val.link=NULL;@+break; |
|
@ @d unary_check(verb) if (top_val.status!=pure) |
derr("can %s pure values only",verb) |
|
@<Cases for unary...@>= |
case negate: unary_check("negate"); |
@.can negate...@> |
top_val.equiv=ominus(zero_octa,top_val.equiv);@+goto delink; |
case complement: unary_check("complement"); |
@.can complement...@> |
top_val.equiv.h=~top_val.equiv.h, top_val.equiv.l=~top_val.equiv.l; |
goto delink; |
case registerize: unary_check("registerize"); |
@.can registerize...@> |
top_val.status=reg_val;@+goto delink; |
case serialize:@+if (!top_val.link) |
err("can take serial number of symbol only"); |
@.can take serial number...@> |
top_val.equiv.h=0, top_val.equiv.l=top_val.link->sym->serial; |
top_val.status=pure;@+goto delink; |
|
@ @d binary_check(verb) |
if (top_val.status!=pure || next_val.status!=pure) |
derr("can %s pure values only",verb) |
|
@<Cases for binary...@>= |
case minus:@+if (top_val.status==undefined) |
err("cannot subtract an undefined quantity"); |
@.cannot subtract...@> |
if (next_val.status==undefined) |
err("cannot subtract from an undefined quantity"); |
if (top_val.status==reg_val && next_val.status!=reg_val) |
err("cannot subtract register number from pure value"); |
next_val.equiv=ominus(next_val.equiv,top_val.equiv);@+goto fin_bin; |
case times: binary_check("multiply"); |
@.can multiply...@> |
next_val.equiv=omult(next_val.equiv,top_val.equiv);@+goto fin_bin; |
case over: case mod: binary_check("divide"); |
@.can divide...@> |
if (top_val.equiv.l==0 && top_val.equiv.h==0) |
err("*division by zero"); |
@.division by zero@> |
next_val.equiv=odiv(zero_octa,next_val.equiv,top_val.equiv); |
if (op_stack[op_ptr]==mod) next_val.equiv=aux; |
goto fin_bin; |
case frac: binary_check("compute a ratio of"); |
@.can compute...@> |
if (next_val.equiv.h>=top_val.equiv.h && |
(next_val.equiv.l>=top_val.equiv.l || next_val.equiv.h>top_val.equiv.h)) |
err("*illegal fraction"); |
@.illegal fraction@> |
next_val.equiv=odiv(next_val.equiv,zero_octa,top_val.equiv);@+goto fin_bin; |
case shl: case shr: binary_check("compute a bitwise shift of"); |
if (top_val.equiv.h || top_val.equiv.l>63) next_val.equiv=zero_octa; |
else if (op_stack[op_ptr]==shl) |
next_val.equiv=shift_left(next_val.equiv,top_val.equiv.l); |
else next_val.equiv=shift_right(next_val.equiv,top_val.equiv.l,true); |
goto fin_bin; |
case and: binary_check("compute bitwise and of"); |
next_val.equiv.h&=top_val.equiv.h, next_val.equiv.l&=top_val.equiv.l; |
goto fin_bin; |
case or: binary_check("compute bitwise or of"); |
next_val.equiv.h|=top_val.equiv.h, next_val.equiv.l|=top_val.equiv.l; |
goto fin_bin; |
case xor: binary_check("compute bitwise xor of"); |
next_val.equiv.h^=top_val.equiv.h, next_val.equiv.l^=top_val.equiv.l; |
goto fin_bin; |
|
@* Assembling an instruction. |
Now let's move up from the expression level to the instruction level. We get to |
this part of the program at the beginning of a line, or after a |
semicolon at the end of an instruction earlier on the current line. |
Our current position in the buffer is the value of |buf_ptr|. |
|
@<Process the next \MMIXAL\ instruction or comment@>= |
p=buf_ptr;@+ buf_ptr=""; |
@<Scan the label field; |goto bypass| if there is none@>; |
@<Scan the opcode field; |goto bypass| if there is none@>; |
@<Copy the operand field@>; |
buf_ptr=p; |
if (spec_mode && !(op_bits&spec_bit)) |
derr("cannot use `%s' in special mode",op_field); |
@.cannot use...@> |
if ((op_bits&no_label_bit) && lab_field[0]) { |
derr("*label field of `%s' instruction is ignored",op_field); |
lab_field[0]='\0'; |
} |
@.label field...ignored@> |
if (op_bits&align_bits) @<Align the location pointer@>; |
@<Scan the operand field@>; |
if (opcode==GREG) @<Allocate a global register@>; |
if (lab_field[0]) @<Define the label@>; |
@<Do the operation@>; |
bypass:@; |
|
@ @<Scan the label field; |goto bypass| if there is none@>= |
if (!*p) goto bypass; |
q=lab_field; |
if (!isspace(*p)) { |
if (!isdigit(*p)&&!isletter(*p)) goto bypass; /* comment */ |
for (*q++=*p++;isdigit(*p)||isletter(*p);p++,q++) *q=*p; |
if (*p && !isspace(*p)) derr("label syntax error at `%c'",*p); |
@.label syntax error...@> |
} |
*q='\0'; |
if (isdigit(lab_field[0]) && (lab_field[1]!='H' || lab_field[2])) |
derr("improper local label `%s'",lab_field); |
@.improper local label...@> |
for (p++;isspace(*p);p++); |
|
@ We copy the opcode field to a special buffer because we might |
want to refer to the symbolic opcode in error messages. |
|
@<Scan the opcode field...@>= |
q=op_field;@+ |
while (isletter(*p)||isdigit(*p)) *q++=*p++; |
*q='\0'; |
if (!isspace(*p) && *p && op_field[0]) derr("opcode syntax error at `%c'",*p); |
@.opcode syntax error...@> |
pp=trie_search(op_root,op_field)->sym; |
if (!pp) { |
if (op_field[0]) derr("unknown operation code `%s'",op_field); |
@.unknown operation code@> |
if (lab_field[0]) derr("*no opcode; label `%s' will be ignored",lab_field); |
@.no opcode...@> |
goto bypass; |
} |
opcode=pp->equiv.h, op_bits=pp->equiv.l; |
while (isspace(*p)) p++; |
|
@ @<Glob...@>= |
tetra opcode; /* numeric code for \MMIX\ operation or \MMIXAL\ pseudo-op */ |
tetra op_bits; /* flags describing an operator's special characteristics */ |
|
@ We copy the operand field to a special buffer so that we can |
change string constants while scanning them later. |
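The in-place change works roughly as follows: each time the scanner meets a string constant of more than one character, it takes the first character as a value and rewrites the buffer so that the rest looks like a comma followed by a shorter string constant. Here is a small sketch of that idea outside the assembler. The function |expand_string| is a hypothetical name, and the sketch assumes a well-formed constant that starts at the beginning of a writable buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Collects the character values of the string constant at the start of buf,
   rewriting buf in place on the way. Returns the number of values stored. */
static size_t expand_string(char *buf, unsigned char *out, size_t max)
{
  char *p = buf;
  size_t n = 0;
  assert(buf != NULL && out != NULL);
  while (*p == '"' && n < max) {
    p++;                                /* step past the opening quote */
    if (*p == '\0') break;              /* unterminated constant: give up */
    if (*p == '"') {                    /* null string counts as zero */
      out[n++] = 0;
      break;
    }
    out[n++] = (unsigned char)*p;       /* one character is one value */
    if (*(p + 1) == '"') break;         /* that was the last character */
    *p = '"';                           /* turn  "abc"  into  ,"bc"  */
    *--p = ',';
    p++;                                /* skip the comma, as a scanner would */
  }
  return n;
}
```

Applied to a buffer holding \.{"abc"}, this sketch produces the three values for a, b, and c, which is why the operand field has to live in a buffer that may be overwritten.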
|
@<Copy the operand field@>= |
q=operand_list; |
while (*p) { |
if (*p==';') break; |
if (*p=='\'') { |
*q++=*p++; |
if (!*p) err("incomplete character constant"); |
@.incomplete...constant@> |
*q++=*p++; |
if (*p!='\'') err("illegal character constant"); |
@.illegal character constant@> |
}@+else if (*p=='\"') { |
for (*q++=*p++;*p && *p!='\"';p++,q++) *q=*p; |
if (!*p) err("incomplete string constant"); |
} |
*q++=*p++; |
if (isspace(*p)) break; |
} |
while (isspace(*p)) p++; |
if (*p==';') p++; |
else p=""; /* if not followed by semicolon, rest of the line is a comment */ |
if (q==operand_list) *q++='0'; /* change empty operand field to `\.0' */ |
*q='\0'; |
|
@ It is important to do the alignment in this step before defining |
the label or evaluating the operand field. |
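Rounding the location up to a multiple of $2^j$ is the usual add-and-mask computation. The sketch below restates it with a 64-bit integer instead of the two-tetrabyte |octa| type; the function name |align_up| is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds loc up to the next multiple of 2^j. */
static uint64_t align_up(uint64_t loc, unsigned j)
{
  uint64_t size;
  assert(j < 64);
  size = (uint64_t)1 << j;
  /* ~(size-1) is the same bit pattern as -size in two's complement */
  return (loc + size - 1) & ~(size - 1);
}
```

For example, |align_up(0x101,2)| is |0x104|, so a label attached to a tetrabyte at location \.{\#101} names the aligned address only if the alignment happens first.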
|
@<Align the location pointer@>= |
{ |
j=(op_bits&align_bits)>>16; |
acc.h=-1, acc.l=-(1<<j); |
cur_loc=oand(incr(cur_loc,(1<<j)-1),acc); |
} |
|
@ @<Allocate a global register@>= |
{ |
if (val_stack[0].equiv.l || val_stack[0].equiv.h) { |
for (j=greg;j<255;j++) |
if (greg_val[j].l==val_stack[0].equiv.l && |
greg_val[j].h==val_stack[0].equiv.h) { |
cur_greg=j; goto got_greg; |
} |
} |
if (greg==32) err("too many global registers"); |
@.too many global registers@> |
greg--; |
greg_val[greg]=val_stack[0].equiv;@+ cur_greg=greg; |
got_greg:; |
} |
|
@ If the label is, say \.{2H}, we will already have used the old |
value of \.{2B} when evaluating the operands. Furthermore, an |
operand of \.{2F} will have been treated as undefined, which it |
still is. |
|
Symbols can be defined more than once, but only if each definition |
gives them the same equivalent value. |
|
A warning message is given when a predefined symbol is being redefined, |
if its predefined value has already been used. |
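The rule about repeated definitions can be sketched on its own. This is a simplification under stated assumptions: the |symbol| structure and |define_symbol| are hypothetical, and the sketch leaves out predefined symbols, local labels, and the fixing of earlier references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record; not the assembler's own node type. */
typedef struct {
  bool defined;
  bool is_register;
  uint64_t value;
} symbol;

/* Returns 0 if the definition is acceptable, -1 if it conflicts
   with an earlier definition of the same symbol. */
static int define_symbol(symbol *s, uint64_t value, bool is_register)
{
  assert(s != NULL);
  if (s->defined) {
    /* a repeat is fine only when it says exactly the same thing */
    if (s->value != value || s->is_register != is_register) return -1;
    return 0;
  }
  s->defined = true;
  s->is_register = is_register;
  s->value = value;
  return 0;
}
```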
|
@<Define the label@>= |
{ |
sym_node *new_link=DEFINED; |
acc=cur_loc; |
if (opcode==IS) { |
cur_loc=val_stack[0].equiv; |
if (val_stack[0].status==reg_val) new_link=REGISTER; |
}@+else if (opcode==GREG) cur_loc.h=0, cur_loc.l=cur_greg, new_link=REGISTER; |
@<Find the symbol table node, |pp|@>; |
if (pp->link==DEFINED || pp->link==REGISTER) { |
if (pp->equiv.l!=cur_loc.l||pp->equiv.h!=cur_loc.h || pp->link!=new_link) { |
if (pp->serial) derr("symbol `%s' is already defined",lab_field); |
@.symbol...already defined@> |
pp->serial=++serial_number; |
derr("*redefinition of predefined symbol `%s'",lab_field); |
@.redefinition...@> |
} |
}@+ else if (pp->link==PREDEFINED) pp->serial=++serial_number; |
else if (pp->link) { |
if (new_link==REGISTER) err("future reference cannot be to a register"); |
@.future reference cannot...@> |
do @<Fix prior references to this label@>@;@+while (pp->link); |
} |
if (isdigit(lab_field[0])) pp=&backward_local[lab_field[0]-'0']; |
pp->equiv=cur_loc;@+ pp->link=new_link; |
@<Fix references that might be in the |val_stack|@>; |
if (listing_file && (opcode==IS || opcode==LOC)) |
@<Make special listing to show the label equivalent@>; |
cur_loc=acc; |
} |
|
@ @<Fix references that might be in the |val_stack|@>= |
if (!isdigit(lab_field[0])) |
for (j=0;j<val_ptr;j++) |
if (val_stack[j].status==undefined && val_stack[j].link->sym==pp) { |
val_stack[j].status=(new_link==REGISTER? reg_val: pure); |
val_stack[j].equiv=cur_loc; |
} |
|
@ @<Find the symbol table node, |pp|@>= |
if (isdigit(lab_field[0])) pp=&forward_local[lab_field[0]-'0']; |
else { |
if (lab_field[0]==':') tt=trie_search(trie_root,lab_field+1); |
else tt=trie_search(cur_prefix,lab_field); |
pp=tt->sym; |
if (!pp) pp=tt->sym=new_sym_node(true); |
} |
|
@ @<Fix prior references to this label@>= |
{ |
qq=pp->link; |
pp->link=qq->link; |
mmo_loc(); |
if (qq->serial==fix_o) @<Fix a future reference from an octabyte@>@; |
else @<Fix a future reference from a relative address@>; |
recycle_fixup(qq); |
} |
|
@ @<Fix a future reference from an octabyte@>= |
{ |
if (qq->equiv.h&0xffffff) { |
mmo_lop(lop_fixo,0,2); |
mmo_tetra(qq->equiv.h); |
}@+else mmo_lop(lop_fixo,qq->equiv.h>>24,1); |
mmo_tetra(qq->equiv.l); |
} |
|
@ @<Fix a future reference from a relative address@>= |
{ |
octa o; |
o=ominus(cur_loc,qq->equiv); |
if (o.l&3) |
dderr("*relative address in location #%08x%08x not divisible by 4", |
@.relative address...@> |
qq->equiv.h,qq->equiv.l); |
o=shift_right(o,2,0);@+ |
k=0; |
if (o.h==0) |
if (o.l<0x10000) mmo_lopp(lop_fixr,o.l); |
else if (qq->serial==fix_xyz && o.l<0x1000000) { |
mmo_lop(lop_fixrx,0,24);@+mmo_tetra(o.l); |
}@+else k=1; |
else if (o.h==0xffffffff) |
if (qq->serial==fix_xyz && o.l>=0xff000000) { |
mmo_lop(lop_fixrx,0,24);@+mmo_tetra(o.l&0x1ffffff); |
}@+else if (qq->serial==fix_yz && o.l>=0xffff0000) { |
mmo_lop(lop_fixrx,0,16);@+mmo_tetra(o.l&0x100ffff); |
}@+else k=1; |
else k=1; |
if (k) dderr("relative address in location #%08x%08x is too far away", |
qq->equiv.h,qq->equiv.l); |
} |
|
@ @<Make special listing to show the label equivalent@>= |
if (new_link==DEFINED) { |
fprintf(listing_file,"(%08x%08x)",cur_loc.h,cur_loc.l); |
flush_listing_line(" "); |
}@+else { |
fprintf(listing_file,"($%03d)",cur_loc.l&0xff); |
flush_listing_line(" "); |
} |
|
@ @<Do the operation@>= |
future_bits=0; |
if (op_bits&many_arg_bit) @<Do a many-operand operation@>@; |
else@+switch (val_ptr) { |
case 1:@+if (!(op_bits&one_arg_bit)) |
derr("opcode `%s' needs more than one operand",op_field); |
@.opcode...operand(s)@> |
@<Do a one-operand operation@>; |
case 2:@+if (!(op_bits&two_arg_bit)) |
if (op_bits&one_arg_bit) |
derr("opcode `%s' must not have two operands",op_field)@; |
else derr("opcode `%s' must have more than two operands",op_field); |
@<Do a two-operand operation@>; |
case 3:@+if (!(op_bits&three_arg_bit)) |
derr("opcode `%s' must not have three operands",op_field); |
@<Do a three-operand operation@>; |
default: derr("too many operands for opcode `%s'",op_field); |
@.too many operands...@> |
} |
|
@ The many-operand operators are |BYTE|, |WYDE|, |TETRA|, and |OCTA|. |
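The size computation and the range test for these four operators can be sketched as follows. The sketch assumes, as the shift by |opcode-BYTE| suggests, that the four opcodes are numbered consecutively; the enumeration constants and the functions |operand_bytes| and |fits| are hypothetical names, and a single 64-bit integer replaces the |octa| pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical consecutive codes for the four many-operand operators. */
enum { OP_BYTE, OP_WYDE, OP_TETRA, OP_OCTA };

/* Number of bytes assembled per operand: 1, 2, 4, or 8. */
static int operand_bytes(int op)
{
  assert(op >= OP_BYTE && op <= OP_OCTA);
  return 1 << (op - OP_BYTE);
}

/* True if the unsigned value v fits in the operand size of op. */
static bool fits(uint64_t v, int op)
{
  int k = operand_bytes(op);
  return k == 8 || (v >> (8 * k)) == 0;
}
```

Note that under this test a negative constant, whose high-order bits are all ones, does not fit in anything shorter than an octabyte.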
|
@<Do a many-operand operation@>= |
for (j=0;j<val_ptr;j++) { |
@<Deal with cases where |val_stack[j]| is impure@>; |
k=1<<(opcode-BYTE); |
if ((val_stack[j].equiv.h && opcode<OCTA) ||@| |
(val_stack[j].equiv.l>0xffff && opcode<TETRA) ||@| |
(val_stack[j].equiv.l>0xff && opcode<WYDE)) |
if (k==1) err("*constant doesn't fit in one byte")@; |
@.constant doesn't fit...@> |
else derr("*constant doesn't fit in %d bytes",k); |
if (k<8) assemble(k,val_stack[j].equiv.l,0); |
else if (val_stack[j].status==undefined) |
assemble(4,0,0xf0), assemble(4,0,0xf0); |
else assemble(4,val_stack[j].equiv.h,0), assemble(4,val_stack[j].equiv.l,0); |
} |
|
@ @<Deal with cases where |val_stack[j]| is impure@>= |
if (val_stack[j].status==reg_val) |
err("*register number used as a constant")@; |
@.register number...@> |
else if (val_stack[j].status==undefined) { |
if (opcode!=OCTA) err("undefined constant"); |
@.undefined constant@> |
pp=val_stack[j].link->sym; |
qq=new_sym_node(false); |
qq->link=pp->link; |
pp->link=qq; |
qq->serial=fix_o; |
qq->equiv=cur_loc; |
} |
|
@ @<Do a three-operand operation@>= |
@<Do the Z field@>; |
@<Do the Y field@>; |
assemble_X: @<Do the X field@>; |
assemble_inst: assemble(4,(opcode<<24)+xyz,future_bits); |
break; |
|
@ Individual fields of an instruction are placed into |
global variables |z|, |y|, |x|, |yz|, and/or |xyz|. |
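The way the pieces combine into one tetrabyte can be shown in a few lines. This is an illustrative sketch only; the function |pack_xyz| is a hypothetical name, and each field is assumed to have been reduced to one byte already.

```c
#include <assert.h>
#include <stdint.h>

/* Packs an opcode and three one-byte fields into a 32-bit instruction. */
static uint32_t pack_xyz(uint32_t opcode, uint32_t x, uint32_t y, uint32_t z)
{
  uint32_t yz, xyz;
  assert(opcode <= 0xff && x <= 0xff && y <= 0xff && z <= 0xff);
  yz = (y << 8) + z;        /* low two bytes */
  xyz = (x << 16) + yz;     /* low three bytes */
  return (opcode << 24) + xyz;
}
```

For instance, with the \.{ADD} opcode \.{\#20} and fields 1, 2, and 3, the packed word is \.{\#20010203}.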
|
@<Glob...@>= |
tetra z,y,x,yz,xyz; /* pieces for assembly */ |
int future_bits; /* places where there are future references */ |
|
@ @<Do the Z field@>= |
if (val_stack[2].status==undefined) err("Z field is undefined"); |
@.Z field is undefined@> |
if (val_stack[2].status==reg_val) { |
if (!(op_bits&(immed_bit+zr_bit+zar_bit))) |
derr("*Z field of `%s' should not be a register number",op_field); |
@.Z field...register number@> |
}@+ else if (op_bits&immed_bit) opcode++; /* immediate */ |
else if (op_bits&zr_bit) |
derr("*Z field of `%s' should be a register number",op_field); |
if (val_stack[2].equiv.h || val_stack[2].equiv.l>0xff) |
err("*Z field doesn't fit in one byte"); |
@.Z field doesn't fit...@> |
z=val_stack[2].equiv.l&0xff; |
|
@ @<Do the Y field@>= |
if (val_stack[1].status==undefined) err("Y field is undefined"); |
@.Y field is undefined@> |
if (val_stack[1].status==reg_val) { |
if (!(op_bits&(yr_bit+yar_bit))) |
derr("*Y field of `%s' should not be a register number",op_field); |
@.Y field...register number@> |
}@+ else if (op_bits&yr_bit) |
derr("*Y field of `%s' should be a register number",op_field); |
if (val_stack[1].equiv.h || val_stack[1].equiv.l>0xff) |
err("*Y field doesn't fit in one byte"); |
@.Y field doesn't fit...@> |
y=val_stack[1].equiv.l&0xff;@+ |
yz=(y<<8)+z; |
|
@ @<Do the X field@>= |
if (val_stack[0].status==undefined) err("X field is undefined"); |
@.X field is undefined@> |
if (val_stack[0].status==reg_val) { |
if (!(op_bits&(xr_bit+xar_bit))) |
derr("*X field of `%s' should not be a register number",op_field); |
@.X field...register number@> |
}@+ else if (op_bits&xr_bit) |
derr("*X field of `%s' should be a register number",op_field); |
if (val_stack[0].equiv.h || val_stack[0].equiv.l>0xff) |
err("*X field doesn't fit in one byte"); |
@.X field doesn't fit...@> |
x=val_stack[0].equiv.l&0xff;@+ |
xyz=(x<<16)+yz; |
|
@ @<Do a two-operand operation@>= |
if (val_stack[1].status==undefined) { |
if (op_bits&rel_addr_bit) |
@<Assemble YZ as a future reference and |goto assemble_X|@>@; |
else err("YZ field is undefined"); |
@.YZ field is undefined@> |
}@+else if (val_stack[1].status==reg_val) { |
if (!(op_bits&(immed_bit+yzr_bit+yzar_bit))) |
derr("*YZ field of `%s' should not be a register number",op_field); |
@.YZ field...register number@> |
if (opcode==SET) val_stack[1].equiv.l<<=8,opcode=0xc1; /* change to \.{OR} */ |
else if (op_bits&mem_bit) |
val_stack[1].equiv.l<<=8,opcode++; /* silently append \.{,0} */ |
}@+ else { /* |val_stack[1].status==pure| */ |
if (op_bits&mem_bit) |
@<Assemble YZ as a memory address and |goto assemble_X|@>; |
if (opcode==SET) opcode=0xe3; /* change to \.{SETL} */ |
else if (op_bits&immed_bit) opcode++; /* immediate */ |
else if (op_bits&yzr_bit) { |
derr("*YZ field of `%s' should be a register number",op_field); |
} |
if (op_bits&rel_addr_bit) |
@<Assemble YZ as a relative address and |goto assemble_X|@>; |
} |
if (val_stack[1].equiv.h || val_stack[1].equiv.l>0xffff) |
err("*YZ field doesn't fit in two bytes"); |
@.YZ field doesn't fit...@> |
yz=val_stack[1].equiv.l&0xffff; |
goto assemble_X; |
|
@ @<Assemble YZ as a future reference...@>= |
{ |
pp=val_stack[1].link->sym; |
qq=new_sym_node(false); |
qq->link=pp->link; |
pp->link=qq; |
qq->serial=fix_yz; |
qq->equiv=cur_loc; |
yz=0; |
future_bits=0xc0; |
goto assemble_X; |
} |
|
@ @<Assemble YZ as a relative address and |goto assemble_X|@>= |
{ |
octa source, dest; |
if (val_stack[1].equiv.l&3) |
err("*relative address is not divisible by 4"); |
@.relative address...@> |
source=shift_right(cur_loc,2,0); |
dest=shift_right(val_stack[1].equiv,2,0); |
acc=ominus(dest,source); |
if (!(acc.h&0x80000000)) { |
if (acc.l>0xffff || acc.h) |
err("relative address is more than #ffff tetrabytes forward"); |
}@+else { |
acc=incr(acc,0x10000); |
opcode++; |
if (acc.l>0xffff || acc.h) |
err("relative address is more than #10000 tetrabytes backward"); |
} |
yz=acc.l; |
goto assemble_X; |
} |
|
@ @<Assemble YZ as a memory address and |goto assemble_X|@>= |
{ |
octa o; |
o=val_stack[1].equiv, k=0; |
for (j=greg;j<255;j++) if (greg_val[j].h || greg_val[j].l) { |
acc=ominus(val_stack[1].equiv,greg_val[j]); |
if (acc.h<=o.h && (acc.l<=o.l || acc.h<o.h)) o=acc, k=j; |
} |
if (o.l<=0xff && !o.h && k) yz=(k<<8)+o.l, opcode++; |
else if (!expanding) err("no base address is close enough to the address A")@; |
@.no base address...@> |
else @<Assemble instructions to put supplementary data in \$255@>; |
goto assemble_X; |
} |
|
@ @d SETH 0xe0 |
@d ORH 0xe8 |
@d ORL 0xeb |
|
@<Assemble instructions to put supplementary data in \$255@>= |
{ |
for (j=SETH;j<=ORL;j++) { |
switch (j&3) { |
case 0: yz=o.h>>16;@+break; /* \.{SETH} */ |
case 1: yz=o.h&0xffff;@+break; /* \.{SETMH} or \.{ORMH} */ |
case 2: yz=o.l>>16;@+break; /* \.{SETML} or \.{ORML} */ |
case 3: yz=o.l&0xffff;@+break; /* \.{SETL} or \.{ORL} */ |
} |
if (yz) { |
assemble(4,(j<<24)+(255<<16)+yz,0); |
j |= ORH; |
} |
} |
if (k) yz=(k<<8)+255; /* Y = \$$k$, Z = \$255 */ |
else yz=255<<8, opcode++; /* Y = \$255, Z = 0 */ |
} |
|
@ @<Do a one-operand operation@>= |
if (val_stack[0].status==undefined) { |
if (op_bits&rel_addr_bit) |
@<Assemble XYZ as a future reference and |goto assemble_inst|@>@; |
else if (opcode!=PREFIX) err("the operand is undefined"); |
@.the operand is undefined@> |
}@+else if (val_stack[0].status==reg_val) { |
if (!(op_bits&(xyzr_bit+xyzar_bit))) |
derr("*operand of `%s' should not be a register number",op_field); |
@.operand...register number@> |
}@+ else { /* |val_stack[0].status==pure| */ |
if (op_bits&xyzr_bit) |
derr("*operand of `%s' should be a register number",op_field); |
if (op_bits&rel_addr_bit) |
@<Assemble XYZ as a relative address and |goto assemble_inst|@>; |
} |
if (opcode>0xff) @<Do a pseudo-operation and |goto bypass|@>; |
if (val_stack[0].equiv.h || val_stack[0].equiv.l>0xffffff) |
err("*XYZ field doesn't fit in three bytes"); |
@.XYZ field doesn't fit...@> |
xyz=val_stack[0].equiv.l&0xffffff; |
goto assemble_inst; |
|
@ @<Assemble XYZ as a future reference...@>= |
{ |
pp=val_stack[0].link->sym; |
qq=new_sym_node(false); |
qq->link=pp->link; |
pp->link=qq; |
qq->serial=fix_xyz; |
qq->equiv=cur_loc; |
xyz=0; |
future_bits=0xe0; |
goto assemble_inst; |
} |
|
@ @<Assemble XYZ as a relative address...@>= |
{ |
octa source, dest; |
if (val_stack[0].equiv.l&3) |
err("*relative address is not divisible by 4"); |
@.relative address...@> |
source=shift_right(cur_loc,2,0); |
dest=shift_right(val_stack[0].equiv,2,0); |
acc=ominus(dest,source); |
if (!(acc.h&0x80000000)) { |
if (acc.l>0xffffff || acc.h) |
err("relative address is more than #ffffff tetrabytes forward"); |
}@+else { |
acc=incr(acc,0x1000000); |
opcode++; |
if (acc.l>0xffffff || acc.h) |
err("relative address is more than #1000000 tetrabytes backward"); |
} |
xyz=acc.l; |
goto assemble_inst; |
} |
|
@ @<Do a pseudo-operation...@>= |
switch(opcode) { |
case LOC: cur_loc=val_stack[0].equiv; |
case IS: goto bypass; |
case PREFIX:@+if (!val_stack[0].link) err("not a valid prefix"); |
@.not a valid prefix@> |
cur_prefix=val_stack[0].link;@+goto bypass; |
case GREG:@+if (listing_file) @<Make listing for |GREG|@>; |
goto bypass; |
case LOCAL:@+if (val_stack[0].equiv.l>lreg) lreg=val_stack[0].equiv.l; |
if (listing_file) { |
fprintf(listing_file,"($%03d)",val_stack[0].equiv.l); |
flush_listing_line(" "); |
} |
goto bypass; |
case BSPEC:@+if (val_stack[0].equiv.l>0xffff || val_stack[0].equiv.h) |
err("*operand of `BSPEC' doesn't fit in two bytes"); |
@.operand of `BSPEC'...@> |
mmo_loc();@+mmo_sync(); |
mmo_lopp(lop_spec,val_stack[0].equiv.l); |
spec_mode=true;@+spec_mode_loc=0;@+ goto bypass; |
case ESPEC: spec_mode=false;@+goto bypass; |
} |
|
@ @<Glob...@>= |
octa greg_val[256]; /* initial values of global registers */ |
|
@ @<Make listing for |GREG|@>= |
if (val_stack[0].equiv.l || val_stack[0].equiv.h) { |
fprintf(listing_file,"($%03d=#%08x",cur_greg,val_stack[0].equiv.h); |
flush_listing_line(" "); |
fprintf(listing_file," %08x)",val_stack[0].equiv.l); |
flush_listing_line(" "); |
}@+else { |
fprintf(listing_file,"($%03d)",cur_greg); |
flush_listing_line(" "); |
} |
|
@* Running the program. On a \UNIX/-like system, the command |
$$\.{mmixal [options] sourcefilename}$$ |
will assemble the \MMIXAL\ program in file \.{sourcefilename}, |
writing any error messages on the standard error file. (Nothing is written to |
the standard output.) The options, which may appear in any order, are: |
|
\bull\.{-o objectfilename}\quad Send the output to a binary file called |
\.{objectfilename}. |
If no \.{-o} specification is given, the object file name is obtained from the |
input file name by changing the final letter from `\.s' to~`\.o', or by |
appending `\.{.mmo}' if \.{sourcefilename} doesn't end with~\.s. |
|
\bull\.{-l listingname}\quad Output a listing of the assembled input and |
output to a text file called \.{listingname}. |
|
\bull\.{-x}\quad Expand memory-oriented commands that cannot be assembled |
as single instructions, by assembling auxiliary instructions that make |
temporary use of global register~\$255. |
|
\bull\.{-b bufsize}\quad Allow up to \.{bufsize} characters per line of input. |
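The default naming rule for the object file, used when no \.{-o} option is given, can be sketched as below. The function |default_obj_name| is a hypothetical name; unlike a plain |strcpy| or |sprintf|, this sketch also checks that the result fits in the caller's buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the default object file name for src into out (capacity cap).
   Returns 0 on success, -1 if the name would not fit. */
static int default_obj_name(const char *src, char *out, size_t cap)
{
  size_t n;
  int w;
  assert(src != NULL && out != NULL);
  n = strlen(src);
  if (n > 0 && src[n - 1] == 's') {
    if (n + 1 > cap) return -1;
    memcpy(out, src, n + 1);   /* copy including the terminator */
    out[n - 1] = 'o';          /* final s becomes o */
    return 0;
  }
  w = snprintf(out, cap, "%s.mmo", src);  /* otherwise append .mmo */
  return (w < 0 || (size_t)w >= cap) ? -1 : 0;
}
```

So \.{hello.mms} leads to \.{hello.mmo}, while a source file called \.{prog} leads to \.{prog.mmo}.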
|
@ Here, finally, is the overall structure of this program. |
|
@c |
#include <stdio.h> |
#include <stdlib.h> |
#include <ctype.h> |
#include <string.h> |
#include <time.h> |
@# |
@<Preprocessor definitions@>@; |
@<Type definitions@>@; |
@<Global variables@>@; |
@<Subroutines@>@; |
@# |
int main(argc,argv) |
int argc;@+ |
char *argv[]; |
{ |
register int j,k; /* all-purpose integers */ |
@<Local variables@>; |
@<Process the command line@>; |
@<Initialize everything@>; |
while(1) { |
@<Get the next line of input text, or |break| if the input has ended@>; |
while(1) { |
@<Process the next \MMIXAL\ instruction or comment@>; |
if (!*buf_ptr) break; |
} |
if (listing_file) { |
if (listing_bits) listing_clear(); |
else if (!line_listed) flush_listing_line(" "); |
} |
} |
@<Finish the assembly@>; |
} |
|
@ The space after |"-b"| is optional, because |
{\mc MMIX-SIM} does not use a space in this context. |
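Accepting both spellings of this option comes down to deciding where the digits start. The following sketch treats the option by itself; the function |parse_b| is a hypothetical helper, not the loop used in this program.

```c
#include <assert.h>
#include <stdio.h>

/* Parses a -b option at argv[j]. Returns how many argv entries were used:
   1 for "-b200", 2 for "-b 200", or 0 if this is not a valid -b option. */
static int parse_b(char *argv[], int argc, int j, int *buf_size)
{
  const char *a;
  assert(j < argc && buf_size != NULL);
  a = argv[j];
  if (a[0] != '-' || a[1] != 'b') return 0;
  if (a[2] == '\0') {          /* digits are in the next argument */
    if (j + 1 < argc && sscanf(argv[j + 1], "%d", buf_size) == 1) return 2;
    return 0;
  }
  /* digits follow directly, two characters in, after "-b" */
  return sscanf(a + 2, "%d", buf_size) == 1 ? 1 : 0;
}
```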
|
@<Process the command line@>= |
for (j=1;j<argc-1 && argv[j][0]=='-';j++) if (!argv[j][2]) { |
if (argv[j][1]=='x') expanding=1; |
else if (argv[j][1]=='o') j++,strcpy(obj_file_name,argv[j]); |
else if (argv[j][1]=='l') j++,strcpy(listing_name,argv[j]); |
else if (argv[j][1]=='b' && sscanf(argv[j+1],"%d",&buf_size)==1) j++; |
else break; |
}@+else if (argv[j][1]!='b' || sscanf(argv[j]+2,"%d",&buf_size)!=1) break;
if (j!=argc-1) { |
fprintf(stderr,"Usage: %s %s sourcefilename\n", |
@.Usage: ...@> |
argv[0],"[-x] [-l listingname] [-b buffersize] [-o objectfilename]"); |
exit(-1); |
} |
src_file_name=argv[j]; |
|
@ @<Open the files@>= |
src_file=fopen(src_file_name,"r"); |
if (!src_file) dpanic("Can't open the source file %s",src_file_name); |
@.Can't open...@> |
if (!obj_file_name[0]) { |
j=strlen(src_file_name); |
if (src_file_name[j-1]=='s') { |
strcpy(obj_file_name,src_file_name);@+ obj_file_name[j-1]='o'; |
} else sprintf(obj_file_name,"%s.mmo",src_file_name); |
} |
obj_file=fopen(obj_file_name,"wb"); |
if (!obj_file) dpanic("Can't open the object file %s",obj_file_name); |
if (listing_name[0]) { |
listing_file=fopen(listing_name,"w"); |
if (!listing_file) dpanic("Can't open the listing file %s",listing_name); |
} |
|
@ @<Glob...@>= |
char *src_file_name; /* name of the \MMIXAL\ input file */ |
char obj_file_name[FILENAME_MAX+1]; /* name of the binary output file */ |
char listing_name[FILENAME_MAX+1]; /* name of the optional listing file */ |
FILE *src_file, *obj_file, *listing_file; |
int expanding; /* are we expanding instructions when base addresses fail? */
int buf_size; /* maximum number of characters per line of input */ |
|
@ @<Init...@>= |
@<Open the files@>; |
filename[0]=src_file_name; |
filename_count=1; |
@<Output the preamble@>; |
|
@ @<Output the preamble@>= |
mmo_lop(lop_pre,1,1); |
mmo_tetra(time(NULL)); |
mmo_cur_file=-1; |
|
@ @<Finish the assembly@>= |
if (lreg>=greg) |
dpanic("Danger: Must reduce the number of GREGs by %d",lreg-greg+1); |
@.Danger@> |
@<Output the postamble@>; |
@<Check and output the trie@>; |
@<Report any undefined local symbols@>; |
if (err_count) { |
if (err_count>1) fprintf(stderr,"(%d errors were found.)\n",err_count); |
else fprintf(stderr,"(One error was found.)\n"); |
} |
exit(err_count); |
|
@ @<Glob...@>= |
int greg=255; /* global register allocator */ |
int cur_greg; /* global register just allocated */ |
int lreg=32; /* local register allocator */ |
|
@ @<Output the postamble@>= |
mmo_lop(lop_post,0,greg); |
greg_val[255]=trie_search(trie_root,"Main")->sym->equiv; |
for (j=greg;j<256;j++) { |
mmo_tetra(greg_val[j].h); |
mmo_tetra(greg_val[j].l); |
} |
|
@ @<Report any undefined local symbols@>= |
for (j=0;j<10;j++) if (forward_local[j].link) |
err_count++,fprintf(stderr,"undefined local symbol %dF\n",j); |
@.undefined local symbol@> |
|
@* Index. |
|
/permu-heap.mms
0,0 → 1,75
* Permutation generator a la Heap |
N IS 5 $n$ (3, 4, 5, or 6) |
t IS $255 |
j IS $0 $8j$ |
k IS $1 $8k$ |
ak IS $2 |
aj IS $3 |
|
LOC Data_Segment |
a GREG @ Base address for $a_0\ldots a_{n-1}$ |
A0 IS @ |
A1 IS @+8 |
A2 IS @+16 |
* LOC @+8*N Space for $a_0\ldots a_{n-1}$ |
BYTE "11111111","22222222","33333333" |
BYTE "44444444","55555555","66666666" |
BYTE #a,0 |
LOC (@+7)&-8 (align to octabyte) |
c GREG @-8*3 Location of $c_0$ |
LOC @-8*3+8*N $8c_3\ldots 8c_{n-1}$, initially zero |
OCTA -1 $c_n=-1$, a convenient sentinel |
u GREG 0 Contents of $a_0$, except in inner loop |
v GREG 0 Contents of $a_1$, except in inner loop |
w GREG 0 Contents of $a_2$, except in inner loop |
|
LOC #100 |
1H STCO 0,c,k $c_k\gets 0$. |
INCL k,8 $k\gets k+1$. |
0H LDO j,c,k $j\gets c_k$. |
CMP t,j,k |
BZ t,1B Loop if $c_k=k$. |
BN j,Done Terminate if $c_k<0$ ($k=n$). |
LDO ak,a,k Fetch $a_k$. |
ADD t,j,8 |
STO t,c,k $c_k\gets j+1$. |
AND t,k,#8 |
CSZ j,t,0 Set $j\gets 0$ if $k$ is even. |
LDO aj,a,j Fetch $a_j$. |
STO ak,a,j Replace it by $a_k$. |
CSZ u,j,ak Set $u\gets a_k$ if $j=0$. |
SUB j,j,8 $j\gets j-1$. |
CSZ v,j,ak Set $v\gets a_k$ if $j=0$. |
SUB j,j,8 $j\gets j-1$. |
CSZ w,j,ak Set $w\gets a_k$ if $j=0$. |
STO aj,a,k Replace $a_k$ by what was $a_j$. |
In PUSHJ 0,Visit |
STO v,A0 $a_0\gets v$. |
STO u,A1 $a_1\gets u$. |
PUSHJ 0,Visit |
STO w,A0 $a_0\gets w$. |
STO v,A2 $a_2\gets v$. |
PUSHJ 0,Visit |
STO u,A0 $a_0\gets u$. |
STO w,A1 $a_1\gets w$. |
PUSHJ 0,Visit |
STO v,A0 $a_0\gets v$. |
STO u,A2 $a_2\gets u$. |
PUSHJ 0,Visit |
STO w,A0 $a_0\gets w$. |
STO v,A1 $a_1\gets v$. |
PUSHJ 0,Visit |
SET t,u Swap $u\leftrightarrow w$. |
SET u,w |
SET w,t |
SET k,8*3 $k\gets3$. |
JMP 0B |
|
Visit LDA t,A0 |
TRAP 0,Fputs,StdOut |
POP |
Main LDO u,A0 |
LDO v,A1 |
LDO w,A2 |
JMP In |
Done TRAP 0,Halt,0 |
/hello.mms
0,0 → 1,9
argv IS $1 |
LOC #100 |
Main LDOU $255,argv,0 |
TRAP 0,Fputs,StdOut |
GETA $255,String |
TRAP 0,Fputs,StdOut |
TRAP 0,Halt,0 |
String BYTE ", world",#a,0 |
|
/popup.mms
0,0 → 1,23
* Testing the solution to exercise 1.4.1--16 |
LOC #100 |
B GET $2,rJ |
PUSHJ $3,C |
PUT rJ,$2; POP 2,0 |
SET $1,1 |
SET $0,$3 |
PUT rJ,$2; POP 2,0 |
|
C BZ $0,1F |
CMP $2,$0,5 |
PBNZ $2,2F |
POP 1,0 |
2H GET $1,rJ |
SUB $3,$0,1 |
PUSHJ $2,C |
PUT rJ,$1; POP 1,0 |
ADD $0,$2,2 |
PUT rJ,$1 |
1H POP 1,2 |
|
Main SET $5,2 manually change this to 5 or 6 or ... |
PUSHJ $0,B |
/test1.mmconfig
0,0 → 1,36
% FIRST CONFIGURATION TEST, goes with test1.mmix |
% The following erroneous lines were commented out one by one while testing: |
%sh*t % obscene |
%memaddresstime 0 % too small |
%memaddresstime unit % unreadable |
%branchpredictbits 9 % too large |
%membusbytes 9 % not a power of two |
%ITcache unit % unknown cache parameter |
%mul0 0 % too small |
%mul0 256 % too big |
%unit antidisestablishmentarianism % too long |
%unit 0 0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEG % eh? |
%unit 1 0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEFG % 65 |
%unit 2 0000000000000000000000000000000000000000000000000000000000000000 % 0's |
%Dcache blocksize 1024 % exceeds Scache |
%Dcache granularity 16 % exceeds blocksize |
%Scache granularity 16 % differs from Dcache |
memaddresstime 4 |
memreadtime 5 memwritetime 6 % don't ask why |
membusbytes 16 |
branchpredictbits 2 |
branchaddressbits 1 |
branchhistorybits 1 |
branchdualbits 1 |
%branchdualbits 30 |
memchunksmax 2 |
hashprime 3 |
Scache blocksize 32 |
Scache setsize 2 |
Scache associativity 4 lru |
Scache accesstime 2 |
Icache victimsize 2 |
unit UNI1 ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff |
unit UNI2 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF |
sh 1 1 1 |
disablesecurity 1 |
/traffic.mms
0,0 → 1,97
* Traffic Signal Problem |
rate GREG 100 % ridiculously small, for testing (should be 250MHz)
t IS $255 |
Sensor_Buf IS Data_Segment |
GREG Sensor_Buf |
|
LOC #100 |
Lights IS 3 |
Sensor IS 4 |
%Lights_Name BYTE "/dev/lights",0 |
%Sensor_Name BYTE "/dev/sensor",0 |
Lights_Name BYTE "lights",0 (temporary name) |
Sensor_Name BYTE "sensor",0 (temporary name) |
Lights_Args OCTA Lights_Name,BinaryWrite |
Sensor_Args OCTA Sensor_Name,BinaryRead |
Read_Sensor OCTA Sensor_Buf,1 |
Boulevard BYTE #77,0 DelMar green, WALK; Berkly red, DONT |
BYTE #7f,0 DelMar green, DONT; Berkly red, DONT |
BYTE #73,0 DelMar green, off; Berkly red, DONT |
BYTE #bf,0 DelMar amber, DONT; Berkly red, DONT |
Avenue BYTE #dd,0 DelMar red, DONT; Berkly green, WALK |
BYTE #df,0 DelMar red, DONT; Berkly green, DONT |
BYTE #dc,0 DelMar red, DONT; Berkly green, off |
BYTE #ef,0 DelMar red, DONT; Berkly amber, DONT |
|
goal GREG % transition time for lights |
Main GETA t,Lights_Args |
TRAP 0,Fopen,Lights |
GETA t,Sensor_Args |
TRAP 0,Fopen,Sensor |
GET goal,rC |
ANDNMH goal,#ffff % temporary patch |
JMP 2F |
|
GREG @ |
delay_go GREG |
Delay GET t,rC |
ANDNMH t,#ffff % temporary patch |
SUBU t,t,goal NB: not CMPU |
PBN t,Delay |
GO delay_go,delay_go,0 |
|
flash_go GREG |
n GREG |
green GREG |
temp GREG |
Flash SET n,8 |
1H ADD t,green,2*1 |
TRAP 0,Fputs,Lights DONT WALK |
ADD temp,goal,rate |
SR t,rate,1 |
ADDU goal,goal,t |
GO delay_go,Delay |
ADD t,green,2*2 |
TRAP 0,Fputs,Lights off |
SET goal,temp |
GO delay_go,Delay |
SUB n,n,1 |
PBP n,1B |
ADD t,green,2*1 |
TRAP 0,Fputs,Lights DONT WALK |
MUL t,rate,4 |
ADDU goal,goal,t |
GO delay_go,Delay |
ADD t,green,2*3 |
TRAP 0,Fputs,Lights DONT WALK, amber |
GO flash_go,flash_go,0 |
|
Wait GET goal,rC |
ANDNMH goal,#ffff % temporary patch |
1H GETA t,Read_Sensor |
TRAP 0,Fread,Sensor |
LDB t,Sensor_Buf |
BZ t,Wait |
GETA green,Boulevard |
GO flash_go,Flash |
MUL t,rate,8 |
ADDU goal,goal,t |
GO delay_go,Delay |
GETA t,Avenue |
TRAP 0,Fputs,Lights |
MUL t,rate,8 |
ADDU goal,goal,t |
GO delay_go,Delay |
GETA green,Avenue |
GO flash_go,Flash |
GETA t,Read_Sensor |
TRAP 0,Fread,Sensor % clear redundant signal |
MUL t,rate,5 |
ADDU goal,goal,t |
GO delay_go,Delay |
2H GETA t,Boulevard |
TRAP 0,Fputs,Lights |
MUL t,rate,18 |
ADDU goal,goal,t |
GO delay_go,Delay |
JMP 1B |
/zero.mms
0,0 → 1,53
LOC #100 |
a IS $0 |
n IS $1 |
z IS $2 |
t IS $255 |
|
1H STB z,a,0 |
SUB n,n,1 |
ADD a,a,1 |
Zero BZ n,9F |
SET z,0 |
AND t,a,7 |
BNZ t,1B |
CMP t,n,64 |
PBNN t,3F |
JMP 5F |
2H STCO 0,a,0 |
SUB n,n,8 |
ADD a,a,8 |
3H AND t,a,63 |
PBNZ t,2B |
CMP t,n,64 |
BN t,5F |
4H PREST 63,a,0 |
SUB n,n,64 |
CMP t,n,64 |
STCO 0,a,0 |
STCO 0,a,8 |
STCO 0,a,16 |
STCO 0,a,24 |
STCO 0,a,32 |
STCO 0,a,40 |
STCO 0,a,48 |
STCO 0,a,56 |
ADD a,a,64 |
PBNN t,4B |
5H CMP t,n,8 |
BN t,7F |
6H STCO 0,a,0 |
SUB n,n,8 |
ADD a,a,8 |
CMP t,n,8 |
PBNN t,6B |
7H BZ n,9F |
8H STB z,a,0 |
SUB n,n,1 |
ADD a,a,1 |
PBNZ n,8B |
9H POP |
|
Main SET a+1,#fff7 |
SET n+1,146 |
PUSHJ 0,Zero |
/Makefile
0,0 → 1,97
# |
# Makefile for MMIXware |
# |
|
# Be sure that CWEB version 3.0 or greater is installed before proceeding! |
# In fact, CWEB 3.61 is recommended for making hardcopy or PDF documentation. |
|
# If you prefer optimization to debugging, change -g to something like -O: |
CFLAGS = -g |
|
# Uncomment the second line if you use pdftex to bypass .dvi files: |
PDFTEX = dvipdfm |
#PDFTEX = pdftex |
|
.SUFFIXES: .dvi .tex .w .ps .pdf .mmo .mmb .mms |
|
.tex.dvi: |
tex $*.tex |
|
.dvi.ps: |
dvips $* -o $*.ps |
|
.w.c: |
if test -r $*.ch; then ctangle $*.w $*.ch; else ctangle $*.w; fi |
|
.w.tex: |
if test -r $*.ch; then cweave $*.w $*.ch; else cweave $*.w; fi |
|
.w.o: |
make $*.c |
make $*.o |
|
.w: |
make $*.c |
make $* |
|
.w.dvi: |
make $*.tex |
make $*.dvi |
|
.w.ps: |
make $*.dvi |
make $*.ps |
|
.w.pdf: |
make $*.tex |
case "$(PDFTEX)" in \ |
dvipdfm ) tex "\let\pdf+ \input $*"; dvipdfm $* ;; \ |
pdftex ) pdftex $* ;; \ |
esac |
|
.mmo.mmb: |
mmix -D$*.mmb $*.mmo |
|
.mms.mmo: |
mmixal -x -b 250 -l $*.mml $*.mms |
|
WEBFILES = abstime.w boilerplate.w mmix-arith.w mmix-config.w mmix-doc.w \ |
mmix-io.w mmix-mem.w mmix-pipe.w mmix-sim.w mmixal.w mmmix.w mmotype.w |
CHANGEFILES = |
TESTFILES = *.mms silly.run silly.out *.mmconfig *.mmix |
MISCFILES = Makefile makefile.dos README mmix.mp mmix.1 |
ALL = $(WEBFILES) $(TESTFILES) $(MISCFILES) |
|
basic: mmixal mmix |
|
doc: mmix-doc.ps mmixal.dvi mmix-sim.dvi |
dvips -n13 mmixal.dvi -o mmixal-intro.ps |
dvips -n8 mmix-sim.dvi -o mmix-sim-intro.ps |
|
all: mmixal mmix mmotype mmmix |
|
clean: |
rm -f *~ *.o *.c *.h *.tex *.log *.dvi *.toc *.idx *.scn *.ps core |
|
mmix-pipe.o: mmix-pipe.c abstime |
./abstime > abstime.h |
$(CC) $(CFLAGS) -c mmix-pipe.c |
rm abstime.h |
|
mmix-config.o: mmix-pipe.o |
|
mmmix: mmix-arith.o mmix-pipe.o mmix-config.o mmix-mem.o mmix-io.o mmmix.c |
$(CC) $(CFLAGS) mmmix.c \ |
mmix-arith.o mmix-pipe.o mmix-config.o mmix-mem.o mmix-io.o -o mmmix |
|
mmixal: mmix-arith.o mmixal.c |
$(CC) $(CFLAGS) mmixal.c mmix-arith.o -o mmixal |
|
mmix: mmix-arith.o mmix-io.o mmix-sim.c abstime |
./abstime > abstime.h |
$(CC) $(CFLAGS) mmix-sim.c mmix-arith.o mmix-io.o -o mmix |
rm abstime.h |
|
tarfile: $(ALL) |
tar cvf /tmp/mmix.tar $(ALL) |
gzip -9 /tmp/mmix.tar |
/strcpy.mms
0,0 → 1,98
in IS $2 |
out IS $3 |
r IS $4 |
l IS $5 |
m IS $6 |
t IS $7 |
mm IS $8 |
tt IS $9 |
flip GREG #0102040810204080 |
ones GREG #0101010101010101 |
|
LOC #100 |
StrCpy AND in,$0,#7 |
SLU in,in,3 |
AND out,$1,#7 |
SLU out,out,3 |
SUB r,out,in |
LDOU out,$1,0 |
SUB $1,$1,$0 |
NEG m,0,1 |
SRU m,m,in |
LDOU in,$0,0 |
PUT rM,m |
NEG mm,0,1 |
BN r,1F |
NEG l,64,r |
SLU tt,out,r |
MUX in,in,tt |
BDIF t,ones,in |
AND t,t,m |
SRU mm,mm,r |
PUT rM,mm |
JMP 4F |
1H NEG l,0,r |
INCL r,64 |
SUB $1,$1,8 |
SRU out,out,l |
MUX in,in,out |
BDIF t,ones,in |
AND t,t,m |
SRU mm,mm,r |
PUT rM,mm |
PBZ t,2F |
JMP 5F |
3H MUX out,tt,out |
STOU out,$0,$1 |
2H SLU out,in,l |
LDOU in,$0,8 |
INCL $0,8 |
BDIF t,ones,in |
4H SRU tt,in,r |
PBZ t,3B |
SRU mm,t,r |
MUX out,tt,out |
BNZ mm,1F |
STOU out,$0,$1 |
5H INCL $0,8 |
SLU out,in,l |
SLU mm,t,l |
1H LDOU in,$0,$1 |
MOR mm,mm,flip |
SUBU t,mm,1 |
ANDN mm,mm,t |
MOR mm,mm,flip |
SUBU mm,mm,1 |
PUT rM,mm |
MUX in,in,out |
STOU in,$0,$1 |
POP 0 |
|
Main SET $3,#8001 |
0H SET $0,0 |
SET $1,#aa |
1H STB $1,$0,0 |
INCL $1,#11 |
CMP $2,$1,#dd |
CSZ $1,$2,#aa |
INCL $0,1 |
CMP $6,$0,32 |
PBNZ $6,1B |
SET $0,$3 |
ADD $2,$3,$5 |
SET $1,3 |
JMP 2F |
1H STB $1,$0,0 |
SUB $1,$1,1 |
CSZ $1,$1,3 |
INCL $0,1 |
2H CMP $6,$0,$2 |
PBNZ $6,1B |
SET $1,0 |
STB $1,$0,0 |
PUSHJ 2,StrCpy |
SET $6,0 |
JMP 0B |
% put src address in $3 |
% put dest addr in $4 |
% put string length in $5 |
/primes6.mms
0,0 → 1,63
% Example program ... Table of primes |
L IS 600 The number of primes to find |
t IS $255 Temporary storage |
n GREG |
q GREG |
r GREG |
jj GREG |
kk GREG |
pk GREG |
mm IS kk |
|
LOC Data_Segment |
PRIME1 WYDE 2 |
LOC PRIME1+2*L |
ptop GREG @ |
j0 GREG PRIME1+2-@ |
BUF OCTA |
|
LOC #100 |
Main SET n,3 |
SET jj,j0 |
2H STWU n,ptop,jj |
INCL jj,2 |
3H BZ jj,2F |
4H INCL n,2 |
5H SET kk,j0 |
6H LDWU pk,ptop,kk |
DIV q,n,pk |
GET r,rR |
BZ r,4B |
7H CMP t,q,pk |
BNP t,2B |
8H INCL kk,2 |
JMP 6B |
GREG @ |
Title BYTE "First Six Hundred Primes" |
NewLn BYTE #a,0 |
Blanks BYTE " ",0 |
2H LDA t,Title |
TRAP 0,Fputs,StdOut |
NEG mm,2 |
3H ADD mm,mm,j0 |
LDA t,Blanks |
TRAP 0,Fputs,StdOut |
2H LDWU pk,ptop,mm |
0H GREG #2030303030000000 |
STOU 0B,BUF |
LDA t,BUF+4 |
1H DIV pk,pk,10 |
GET r,rR |
INCL r,'0' |
STBU r,t,0 |
SUB t,t,1 |
PBNZ pk,1B |
LDA t,BUF |
TRAP 0,Fputs,StdOut |
INCL mm,2*L/10 |
PBN mm,2B |
LDA t,NewLn |
TRAP 0,Fputs,StdOut |
CMP t,mm,2*(L/10-1) |
PBNZ t,3B |
TRAP 0,Halt,0 |
/alpha.mms
0,0 → 1,89
* The "alpha channel" exercise in section 7.1.3 |
x GREG |
y GREG |
z GREG |
m GREG |
alpha GREG |
t IS $255 |
l GREG #0101010101010101 |
h GREG #8080808080808080 |
mone GREG -1 |
rodd GREG #4020100804020101 |
lsh GREG #0080402010080402 |
|
LOC #100 |
Main XOR t,x,y |
MOR z,rodd,t |
AND t,x,y |
ADDU z,z,t |
AND t,alpha,h |
MOR m,mone,t |
PUT rM,m |
MUX x,z,x |
MUX y,y,z |
MOR alpha,lsh,alpha |
XOR t,x,y |
MOR z,t,rodd |
AND t,x,y |
ADDU z,z,t |
AND t,alpha,h |
MOR m,t,mone |
PUT rM,m |
MUX x,z,x |
MUX y,y,z |
MOR alpha,alpha,lsh |
XOR t,x,y |
MOR z,t,rodd |
AND t,x,y |
ADDU z,z,t |
AND t,alpha,h |
MOR m,t,mone |
PUT rM,m |
MUX x,z,x |
MUX y,y,z |
MOR alpha,alpha,lsh |
XOR t,x,y |
MOR z,t,rodd |
AND t,x,y |
ADDU z,z,t |
AND t,alpha,h |
MOR m,t,mone |
PUT rM,m |
MUX x,z,x |
MUX y,y,z |
MOR alpha,alpha,lsh |
XOR t,x,y |
MOR z,t,rodd |
AND t,x,y |
ADDU z,z,t |
AND t,alpha,h |
MOR m,t,mone |
PUT rM,m |
MUX x,z,x |
MUX y,y,z |
MOR alpha,alpha,lsh |
XOR t,x,y |
MOR z,t,rodd |
AND t,x,y |
ADDU z,z,t |
AND t,alpha,h |
MOR m,t,mone |
PUT rM,m |
MUX x,z,x |
MUX y,y,z |
MOR alpha,alpha,lsh |
XOR t,x,y |
MOR z,t,rodd |
AND t,x,y |
ADDU z,z,t |
AND t,alpha,h |
MOR m,t,mone |
PUT rM,m |
MUX x,z,x |
MUX y,y,z |
MOR alpha,alpha,lsh |
XOR t,x,y |
MOR z,t,rodd |
AND t,x,y |
ADDU z,z,t |
TRAP 0,Halt,0 |
/mmix-config.w
0,0 → 1,1041
% This file is part of the MMIXware package (c) Donald E Knuth 1999 |
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES! |
|
\def\title{MMIX-CONFIG} |
\def\MMIX{\.{MMIX}} |
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant |
@s bool int |
@s cache int |
@s func int |
@s coroutine int |
@s octa int |
@s cacheset int |
@s cacheblock int |
@s fetch int |
@s control int |
@s write_node int |
@s internal_opcode int |
@s replace_policy int |
@s PV TeX |
@s mmix_opcode int |
@s specnode int |
\def\PV{\\{PV}} % use italics, not \tt |
@s CPV TeX |
\def\CPV{\\{CPV}} |
@s OP TeX |
\def\OP{\\{OP}} |
@s and normal @q unreserve a C++ keyword @> |
@s or normal @q unreserve a C++ keyword @> |
@s xor normal @q unreserve a C++ keyword @> |
|
@*Input format. Configuration files allow this simulator to adapt itself to |
infinitely many possible combinations of hardware features. The purpose of the |
present module is to read a configuration file, check it for validity, and |
set up the relevant data structures. |
|
All data in a configuration file consists simply of {\it tokens\/} separated |
by one or more units of white space, where a ``token'' is any sequence of |
nonspace characters that doesn't contain a percent sign. Percent signs |
and anything following them on a line are ignored; this convention allows |
a user to include comments in the file. Here's a simple (but weird) example: |
$$\vbox{\halign{\tt#\hfil\cr |
\% Silly configuration\cr |
writebuffer 200\cr |
memaddresstime 100\cr |
Dcache associativity 4 lru\cr |
Dcache blocksize 1024\cr |
unit ODD 5555555555555555555555555555555555555555555555555555555555555555\cr |
unit EVEN aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\cr |
div 40 30 20\ \ \% three-stage divide\cr |
}}$$ |
It means that (1) the write buffer has capacity for 200 octabytes; |
(2)~the memory bus takes 100 cycles to process an address; |
(3)~there's a D-cache, in which each set has 4 blocks and the replacement |
policy is least-recently-used; |
(4)~each block in the D-cache has 1024 bytes; |
(5)~there are two functional units, one for all the odd-numbered opcodes |
and one for all the rest; |
(6)~the division instructions take three pipeline stages, spending 40 cycles |
in the first stage, 30~in the second, and 20 in the last; |
(7)~all other parameters have default values. |
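
The token convention just described can be sketched in a few lines of~C.
This is only an illustration under stated assumptions: the name |next_token|
and its interface are hypothetical, and the sketch scans a string held in
memory instead of reading a file line by line as the real |get_token|
routine does.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of MMIXware: copy the next token of the
   text into out (capacity cap >= 1, counting the terminating NUL) and
   advance *cursor past it. Returns 1 if a token was found, 0 at the end. */
int next_token(const char **cursor, char *out, size_t cap)
{
  const char *p = *cursor;
  size_t len = 0, n;
  assert(cap >= 1);
  for (;;) {
    while (isspace((unsigned char)*p)) p++;  /* skip white space */
    if (*p != '%') break;
    p += strcspn(p, "\n");                   /* a comment runs to end of line */
  }
  *cursor = p;
  if (*p == '\0') return 0;                  /* nothing but space and comments */
  while (p[len] != '\0' && p[len] != '%' && !isspace((unsigned char)p[len]))
    len++;                                   /* a token ends at space or '%' */
  n = len < cap - 1 ? len : cap - 1;         /* truncate if out is too small */
  memcpy(out, p, n);
  out[n] = '\0';
  *cursor = p + len;
  return 1;
}
```

Calling |next_token| repeatedly on the silly configuration yields
\.{writebuffer}, \.{200}, \.{memaddresstime}, and so on; the comment line and
the trailing remark about the three-stage divide never appear as tokens.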
|
@ Four kinds of specifications can appear in a configuration file, |
according to the following syntax: |
\def\<#1>{\hbox{$\langle\,$#1$\,\rangle$}}\let\is=\longrightarrow |
$$\vbox{\halign{$#$\hfil\cr |
\<specification>\is\<PV spec>\mid\<cache spec>\mid\<pipe spec>\mid |
\<functional spec>\cr |
\<PV spec>\is\<parameter>\<decimal value>\cr |
\<cache spec>\is\<cache name>\<cache parameter>\<decimal value>\<policy>\cr |
\<pipe spec>\is\<operation>\<pipeline times>\cr |
\<functional spec>\is\.{unit}\ \<name>\<64 hexadecimal digits>\cr}}$$ |
|
@ A \<PV spec> simply assigns a given value to a given parameter. The |
possibilities for \<parameter> are as follows: |
|
\def\bull#1 {\smallskip\hang\textindent{$\bullet$}\.{#1}\enspace} |
\bull fetchbuffer (default 4), maximum instructions in the fetch buffer; |
must be $\ge1$. |
|
\bull writebuffer (default 2), maximum octabytes in the write buffer; |
must be $\ge1$. |
|
\bull reorderbuffer (default 5), maximum instructions issued but not |
committed; must be $\ge1$. |
|
\bull renameregs (default 5), maximum partial results in the reorder |
buffer; must be $\ge1$. |
|
\bull memslots (default 2), maximum store instructions in the reorder |
buffer; must be $\ge1$. |
|
\bull localregs (default 256), number of local registers in ring; |
must be 256, 512, or 1024. |
|
\bull fetchmax (default 2), maximum instructions fetched per cycle; |
must be $\ge1$. |
|
\bull dispatchmax (default 1), maximum instructions issued per cycle; |
must be $\ge1$. |
|
\bull peekahead (default 1), maximum lookahead for jumps per cycle. |
|
\bull commitmax (default 1), maximum instructions committed per cycle; |
must be $\ge1$. |
|
\bull fremmax (default 1), maximum reductions in \.{FREM} computation per |
cycle; must be $\ge1$. |
|
\bull denin (default 1), extra cycles taken if a floating point input |
is subnormal. |
|
\bull denout (default 1), extra cycles taken if a floating point result |
is subnormal. |
|
\bull writeholdingtime (default 0), minimum number of cycles for data to |
remain in the write buffer. |
|
\bull memaddresstime (default 20), cycles to process memory address; |
must be $\ge1$. |
|
\bull memreadtime (default 20), cycles to read one memory busload; |
must be $\ge1$. |
|
\bull memwritetime (default 20), cycles to write one memory busload; |
must be $\ge1$. |
|
\bull membusbytes (default 8), number of bytes per memory busload; must be a |
power of~2 that is 8~or~more. |
|
\bull branchpredictbits (default 0), number of bits in each branch prediction |
table entry; must be $\le8$. |
|
\bull branchaddressbits (default 0), number of bits in instruction address |
used to index the branch prediction table. |
|
\bull branchhistorybits (default 0), number of bits in branch history used to |
index the branch prediction table. |
|
\bull branchdualbits (default 0), number of bits of |
instruction-address-xor-branch-history used to index the branch prediction |
table. |
|
\bull hardwarepagetable (default 1), is zero if page table calculations |
must be emulated by the operating system. |
|
\bull disablesecurity (default 0), is 1 if the hot-seat security checks |
are turned off. This option is used only for testing purposes; it means |
that the `\.s' interrupt will not occur, and the `\.p' interrupt will |
be signaled only when going from a nonnegative location to a negative one. |
|
\bull memchunksmax (default 1000), maximum number of $2^{16}$-byte chunks of |
simulated memory; must be $\ge1$. |
|
\bull hashprime (default 2003), prime number used to address simulated memory; |
must exceed \.{memchunksmax}, preferably by a factor of about~2. |
|
\smallskip\noindent |
The values of \.{memchunksmax} and \.{hashprime} affect only the speed of the |
simulator, not its results---unless an extremely large program is being simulated. |
The stated defaults for \.{memchunksmax} and \.{hashprime} |
should be adequate for almost all applications. |
|
@ A \<cache spec> assigns a given value to a parameter affecting one of five |
possible caches: |
$$\vbox{\halign{$#$\hfil\cr |
\<cache spec>\is\<cache name>\<cache parameter>\<decimal value>\<policy>\cr |
\<cache name>\is\.{ITcache}\mid\.{DTcache}\mid\.{Icache}\mid\.{Dcache} |
\mid\.{Scache}\cr |
\<policy>\is\<empty>\mid\.{random}\mid\.{serial} |
\mid\.{pseudolru}\mid\.{lru}\cr}}$$ |
The possibilities for \<cache parameter> are as follows: |
|
\bull associativity (default 1), number of cache blocks per cache set; |
must be a power of~2. (A cache with associativity~1 is said to be |
``direct-mapped.'') |
|
\bull blocksize (default 8), number of bytes per cache block; must be a power |
of~2, at least equal to the granularity, and at most equal to~8192. |
The blocksize of \.{ITcache} and \.{DTcache} must be~8. |
|
\bull setsize (default 1), number of sets of cache blocks; must be a power |
of~2. (A cache with set size~1 is said to be ``fully associative.'') |
|
\bull granularity (default 8), number of bytes per ``dirty bit,'' used to |
remember which items of data have changed since they were read from memory; |
must be a power of~2 and at least~8. The granularity must be~8 if |
\.{writeallocate} is~0. |
|
\bull victimsize (default 0), number of cache blocks in the victim buffer, |
which holds blocks removed from the main cache sets; must be zero or a power |
of~2. |
|
\bull writeback (default 0), is 1 in a ``write-back'' cache, which holds dirty |
data as long as possible; is 0 in a ``write-through'' cache, which cleans |
all data as soon as possible. |
|
\bull writeallocate (default 0), is 1 in a ``write-allocate'' cache, |
which remembers all recently written data; |
is 0 in a ``write-around'' cache, which doesn't make space for newly written |
data that fails to hit an existing cache block. |
|
\bull accesstime (default 1), number of cycles to query the cache; |
must be $\ge1$. (Hits in the S-cache actually require {\it twice} |
the accesstime, once to query the tag and once to transmit the data.) |
|
\bull copyintime (default 1), number of cycles to move a cache block from |
its input buffer into the cache proper; must be $\ge1$. |
|
\bull copyouttime (default 1), number of cycles to move a cache block |
from the cache proper to its output buffer; must be $\ge1$. |
|
\bull ports (default 1), number of processes that can simultaneously |
query the cache; must be $\ge1$. |
|
\smallskip |
The \<policy> parameter should be nonempty only on cache specifications |
for parameters |
\.{associativity} and \.{victimsize}. If no replacement policy is specified, |
\.{random} is the default. All four policies are equivalent when the |
\.{associativity} or \.{victimsize} is~1; \.{pseudolru} is equivalent |
to \.{lru} when the \.{associativity} or \.{victimsize} is~2. |
|
The \.{granularity}, \.{writeback}, \.{writeallocate}, and \.{copyouttime} |
parameters affect the performance only of the D-cache and S-cache; the other |
three caches are read-only, so they never need to write their data. |
|
The \.{ports} parameter affects the performance of the D-cache and |
DT-cache, and (if the \.{PREGO} command is used) the performance of the |
I-cache and IT-cache. The S-cache accommodates only one process at a time, |
regardless of the number of specified ports. |
|
Only the translation caches (the IT-cache and DT-cache) are present by |
default. But if any specifications are given for, say, an I-cache, |
all of the unspecified I-cache parameters take their default values. |
|
The existence of an S-cache (secondary cache) implies the existence of both |
I-cache and D-cache (primary caches for instructions and data). |
The block size of the secondary cache must not be less than the block |
size of the primary caches. The secondary cache must have the |
same granularity as the D-cache. |
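
To see how \.{associativity}, \.{blocksize}, and \.{setsize} fit together,
here is a small sketch. The type |cache_geom| and the three functions are
hypothetical names introduced only for illustration, and |cache_set| assumes
the conventional mapping in which the low-order bits of the block number
select the set; the simulator's actual data structures are defined in
\.{mmix-pipe}.

```c
#include <assert.h>

/* Hypothetical record of three cache parameters from the configuration. */
typedef struct {
  int associativity;  /* blocks per set */
  int blocksize;      /* bytes per block */
  int setsize;        /* number of sets */
} cache_geom;

/* Nonzero if n is a positive power of 2. (A victimsize of zero is legal
   too, so a caller checking victimsize must allow 0 separately.) */
int is_power_of_two(int n) { return n > 0 && (n & (n - 1)) == 0; }

/* Total data capacity in bytes, not counting any victim buffer. */
long cache_bytes(const cache_geom *g)
{
  return (long)g->associativity * g->blocksize * g->setsize;
}

/* Set selected by a byte address under the conventional mapping:
   discard the offset within the block, keep the low-order set bits. */
int cache_set(const cache_geom *g, unsigned long addr)
{
  assert(is_power_of_two(g->blocksize) && is_power_of_two(g->setsize));
  return (int)((addr / (unsigned long)g->blocksize)
               & (unsigned long)(g->setsize - 1));
}
```

For instance, a cache with associativity~4, blocksize~32, and setsize~2 holds
$4\times32\times2=256$ bytes, and consecutive 32-byte blocks alternate
between its two sets.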
|
@ A \<pipe spec> governs the execution time of potentially slow operations. |
$$\vbox{\halign{$#$\hfil\cr |
\<pipe spec>\is\<operation>\<pipeline times>\cr |
\<pipeline times>\is\<decimal value>\mid\<pipeline times>\<decimal value>\cr}}$$ |
Here the \<operation> is one of the following: |
|
\bull mul0 through \.{mul8} (default 10); the values for \.{mul}$j$ refer |
to products in which the second operand is less than $2^{8j}$, where $j$ |
is as small as possible. Thus, for example, \.{mul1} applies to |
nonzero one-byte multipliers. |
|
\bull div (default 60); this applies to integer division, signed and unsigned. |
|
\bull sh (default 1); this applies to left and right shifts, signed and |
unsigned. |
|
\bull mux (default 1); the multiplex operator. |
|
\bull sadd (default 1); the sideways addition operator. |
|
\bull mor (default 1); the boolean matrix multiplication operators \.{MOR} and |
\.{MXOR}. |
|
\bull fadd (default 4); floating point addition and subtraction. |
|
\bull fmul (default 4); floating point multiplication. |
|
\bull fdiv (default 40); floating point division. |
|
\bull fsqrt (default 40); floating point square root. |
|
\bull fint (default 4); floating point integerization. |
|
\bull fix (default 2); conversion from floating to fixed, signed and unsigned. |
|
\bull flot (default 2); conversion from fixed to floating, signed and unsigned. |
|
\bull feps (default 4); floating comparison with respect to epsilon. |
|
\smallskip\noindent |
In each case one can specify a sequence of pipeline stages, with a positive |
number of cycles to be spent in each stage. For example, a specification like |
`\.{fmul}~\.{3}~\.{1}' would say that a functional unit that supports |
\.{FMUL} takes a total of four cycles to compute the floating point product |
in two stages; it can start working on a second product after three cycles |
have gone by. |
|
If a floating point operation has a subnormal input, \.{denin} is added to |
the time for the first stage. If a floating point operation has a subnormal |
result, \.{denout} is added to the time for the last stage. |
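
The arithmetic behind these rules is simple enough to sketch. The function
names below are hypothetical, and the sketch assumes only what is stated
above: stage times add up, \.{denin} is charged to the first stage, and
\.{denout} is charged to the last.

```c
#include <assert.h>

/* Hypothetical helper: total cycles for one operation whose pipeline
   stages take stage[0..nstages-1] cycles. denin and denout are the
   configured penalties; sub_in and sub_out say whether a floating point
   input or the result is subnormal. */
int pipe_latency(const int stage[], int nstages,
                 int denin, int sub_in, int denout, int sub_out)
{
  int k, total = 0;
  assert(nstages >= 1);
  for (k = 0; k < nstages; k++) total += stage[k];
  if (sub_in) total += denin;    /* charged to the first stage */
  if (sub_out) total += denout;  /* charged to the last stage */
  return total;
}

/* Cycles spent in the first stage, after which that stage is free
   to begin another operation. */
int pipe_first_stage(const int stage[], int denin, int sub_in)
{
  return stage[0] + (sub_in ? denin : 0);
}
```

With the specification `\.{fmul}~\.{3}~\.{1}' and normal operands,
|pipe_latency| gives 4 and |pipe_first_stage| gives 3, in agreement with the
example above.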
|
@ The fourth and final kind of specification defines a functional unit: |
$$\<functional spec>\is\.{unit}\ \<name>\<64 hexadecimal digits>$$ |
The symbolic name should be at most fifteen characters long. |
The 64 hexadecimal digits contain 256 bits, with `1' for each supported |
opcode; the most significant (leftmost) bit is for opcode 0 (\.{TRAP}), |
and the least significant bit is for opcode 255 (\.{TRIP}). |
|
For example, we can define a load/store unit (which handles register/memory |
operations), a multiplication unit (which handles fixed and floating point |
multiplication), a boolean unit (which handles only bitwise operations), |
and a more general arithmetic-logical unit, as follows: |
$$\vbox{\halign{\tt#\hfil\cr |
unit LSU 00000000000000000000000000000000fffffffcfffffffc0000000000000000\cr |
unit MUL 000080f000000000000000000000000000000000000000000000000000000000\cr |
unit BIT 000000000000000000000000000000000000000000000000ffff00ff00ff0000\cr |
unit ALU f0000000ffffffffffffffffffffffff0000000300000003ffffffffffffffff\cr |
}}$$ |
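
A reader who wants to check such a bit map by hand may find the following
sketch useful. The function |unit_supports| is a hypothetical helper, not
part of the simulator; it relies only on the rule stated above, that the
leftmost bit of the 64 digits belongs to opcode~0.

```c
#include <assert.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: nonzero if the 64-digit unit specification spec
   has a 1 bit for opcode (0..255); the leftmost bit is opcode 0. */
int unit_supports(const char *spec, int opcode)
{
  int digit;
  assert(strlen(spec) == 64 && opcode >= 0 && opcode <= 255);
  digit = hex_value(spec[opcode / 4]);     /* four opcodes per digit */
  assert(digit >= 0);
  return (digit >> (3 - opcode % 4)) & 1;  /* leftmost bit comes first */
}
```

Applied to the \.{MUL} unit above, the digit `\.8' in position~4 covers
opcodes 16--19 and selects only opcode \Hex{10}, while the `\.f' in
position~6 selects opcodes \Hex{18} through \Hex{1b}.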
|
The order in which units are specified is important, because \MMIX's dispatcher |
will try to match each instruction with the first functional unit that |
supports its opcode. Therefore it is best to list more specialized |
units (like the \.{BIT} unit in this example) before more general ones; |
this lets the specialized units have first chance at the instructions |
they can handle. |
|
There can be any number of functional units, having possibly identical |
specifications. One should, however, give each unit a unique name |
(e.g., \.{ALU1} and \.{ALU2} if there are two arithmetic-logical units), |
since these names are used in diagnostic messages. |
|
Opcodes that aren't supported by any specified unit will cause an |
emulation trap. |
@^emulation@> |
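
The first-match rule can be sketched as follows. Everything here is
illustrative: |unit_desc| and |first_unit_for| are hypothetical names, and
the 256 bits are assumed to be packed into eight 32-bit words with opcode~0
in the top bit of the first word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unit record: 256 opcode bits packed into eight 32-bit
   words, most significant bit first, so opcode 0 is the top bit of ops[0]. */
typedef struct {
  const char *name;
  uint32_t ops[8];
} unit_desc;

/* Index of the first unit in the list that supports opcode, or -1 if no
   listed unit does (the case that leads to an emulation trap). */
int first_unit_for(const unit_desc *units, int count, int opcode)
{
  int j;
  assert(opcode >= 0 && opcode <= 255);
  for (j = 0; j < count; j++)
    if (units[j].ops[opcode >> 5] & (UINT32_C(0x80000000) >> (opcode & 31)))
      return j;
  return -1;
}
```

If \.{BIT} is listed before \.{ALU}, opcode \Hex{c0} goes to \.{BIT}; with
the order reversed it would go to \.{ALU}, and the specialized unit would
get work only from instructions that the general unit cannot accept at all.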
|
@ Full details about the significance of all these parameters can be found |
in the \.{mmix-pipe} module, which defines and discusses the data structures |
that need to be configured and initialized. |
|
Of course the specifications in a configuration file needn't make any sense, |
nor need they be practically achievable. We could, for example, specify |
a unit that handles only the two opcodes \.{NXOR} and \.{DIVUI}; |
we could specify 1-cycle division but pipelined 100-cycle shifts, or |
1-cycle memory access but 100-cycle cache access. We could create |
a thousand rename registers and issue a hundred instructions per cycle, |
etc. Some combinations of parameters are clearly ridiculous. |
|
But there remain a huge number of possibilities of interest, especially |
as technology continues to evolve. By experimenting with configurations that |
are extreme by present-day standards, we can see how much might be gained |
if the corresponding hardware could be built economically. |
|
@* Basic input/output. Let's get ready to program the |MMIX_config| subroutine |
by building some simple infrastructure. First we need some macros to |
print error messages. |
|
@d errprint0(f) fprintf(stderr,f) |
@d errprint1(f,a) fprintf(stderr,f,a) |
@d errprint2(f,a,b) fprintf(stderr,f,a,b) |
@d errprint3(f,a,b,c) fprintf(stderr,f,a,b,c) |
@d panic(x)@+ {@+x;@+errprint0("!\n");@+exit(-1);@+} |
|
@ And we need a place to look at the input. |
|
@d BUF_SIZE 100 /* we don't need long lines */ |
|
@<Global variables@>= |
FILE *config_file; /* input comes from here */ |
char buffer[BUF_SIZE]; /* input lines go here */ |
char token[BUF_SIZE]; /* and tokens are copied to here */ |
char *buf_pointer=buffer; /* this is our current position */ |
bool token_prescanned; /* does |token| contain the next token already? */ |
|
@ The |get_token| routine copies the next token of input into the |token| |
buffer. After the input has ended, a final `\.{end}' is appended. |
|
@<Subroutines@>= |
static void get_token @,@,@[ARGS((void))@];@+@t}\6{@> |
static void get_token() /* set |token| to the next token of the configuration file */ |
{ |
register char *p,*q; |
if (token_prescanned) { |
token_prescanned=false;@+ return; |
} |
while(1) { /* scan past white space */ |
if (*buf_pointer=='\0' || *buf_pointer=='\n' || *buf_pointer=='%') { |
if (!fgets(buffer,BUF_SIZE,config_file)) { |
strcpy(token,"end");@+return; |
} |
if (strlen(buffer)==BUF_SIZE-1 && buffer[BUF_SIZE-2]!='\n') |
panic(errprint1("config file line too long: `%s...'",buffer)); |
@.config file line...@> |
buf_pointer=buffer; |
}@+else if (!isspace(*buf_pointer)) break; |
else buf_pointer++; |
} |
for (p=buf_pointer,q=token;*p && !isspace(*p) && *p!='%';p++,q++) *q=*p; |
buf_pointer=p;@+ *q='\0'; |
return; |
} |
|
@ The |get_int| routine is called when we wish to input a decimal value. |
It returns $-1$ if the next token isn't a string of decimal digits. |
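
The conversion step accumulates $10v+d$ digit by digit, with no test for
overflow. A more defensive variant might look like the sketch below;
|parse_decimal| is a hypothetical name, and unlike the routine in this
module it works on a string passed as an argument and rejects an empty one.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical variant of the conversion step: return the value of the
   decimal string s, or -1 if s is empty, contains a non-digit, or denotes
   a value larger than INT_MAX. */
int parse_decimal(const char *s)
{
  int v = 0;
  assert(s != 0);
  if (*s == '\0') return -1;
  for (; *s >= '0' && *s <= '9'; s++) {
    int d = *s - '0';
    if (v > (INT_MAX - d) / 10) return -1;  /* 10*v+d would overflow */
    v = 10 * v + d;
  }
  return *s ? -1 : v;  /* leftover characters mean it wasn't a number */
}
```

The guard is exact, because $10v+d\le{}$|INT_MAX| holds if and only if
$v\le\lfloor(|INT_MAX|-d)/10\rfloor$.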
|
@<Sub...@>= |
static int get_int @,@,@[ARGS((void))@];@+@t}\6{@> |
static int get_int() |
{@+ int v; |
char *p; |
get_token(); |
for (p=token,v=0; *p>='0' && *p<='9'; p++) v=10*v+*p-'0'; |
if (*p) return -1; |
return v; |
} |
|
@ A simple data structure makes it fairly easy to deal with |
parameter/value specifications. |
|
@<Type definitions@>= |
typedef struct { |
char name[20]; /* symbolic name */ |
int *v; /* internal name */ |
int defval; /* default value */ |
int minval, maxval; /* minimum and maximum legal values */ |
bool power_of_two; /* must it be a power of two? */ |
} pv_spec; |
|
@ Cache parameters are a bit more difficult, but still not bad. |
|
@<Type...@>= |
typedef enum {@!assoc,@!blksz,@!setsz,@!gran,@!vctsz, |
@!wrb,@!wra,@!acctm,@!citm,@!cotm,@!prts} c_param; |
@# |
typedef struct { |
char name[20]; /* symbolic name */ |
c_param v; /* internal code */ |
int defval; /* default value */ |
int minval, maxval; /* minimum and maximum legal values */ |
bool power_of_two; /* must it be a power of two? */ |
} cpv_spec; |
|
@ Operation codes are the easiest of all. |
|
@<Type...@>= |
typedef struct { |
char name[8]; /* symbolic name */ |
internal_opcode v; /* internal code */ |
int defval; /* default value */ |
} op_spec; |
|
@ Most of the parameters are external variables declared in the header |
file \.{mmix-pipe.h}; but some are private to this module. Here we |
define the main tables used below. |
|
@<Glob...@>= |
int fetch_buf_size,write_buf_size,reorder_buf_size,mem_bus_bytes,hardware_PT; |
int max_cycs=60; |
pv_spec PV[]={@/ |
{"fetchbuffer", &fetch_buf_size, 4, 1, INT_MAX, false},@/ |
{"writebuffer", &write_buf_size, 2, 1, INT_MAX, false},@/ |
{"reorderbuffer", &reorder_buf_size, 5, 1, INT_MAX, false},@/ |
{"renameregs", &max_rename_regs, 5, 1, INT_MAX, false},@/ |
{"memslots", &max_mem_slots, 2, 1, INT_MAX, false},@/ |
{"localregs", &lring_size, 256, 256, 1024, true},@/ |
{"fetchmax", &fetch_max, 2, 1, INT_MAX, false},@/ |
{"dispatchmax", &dispatch_max, 1, 1, INT_MAX, false},@/ |
{"peekahead", &peekahead, 1, 0, INT_MAX, false},@/ |
{"commitmax", &commit_max, 1, 1, INT_MAX, false},@/ |
{"fremmax", &frem_max, 1, 1, INT_MAX, false},@/ |
{"denin",&denin_penalty, 1, 0, INT_MAX, false},@/ |
{"denout",&denout_penalty, 1, 0, INT_MAX, false},@/ |
{"writeholdingtime", &holding_time, 0, 0, INT_MAX, false},@/ |
{"memaddresstime", &mem_addr_time, 20, 1, INT_MAX, false},@/ |
{"memreadtime", &mem_read_time, 20, 1, INT_MAX, false},@/ |
{"memwritetime", &mem_write_time, 20, 1, INT_MAX, false},@/ |
{"membusbytes", &mem_bus_bytes, 8, 8, INT_MAX, true},@/ |
{"branchpredictbits", &bp_n, 0, 0, 8, false},@/ |
{"branchaddressbits", &bp_a, 0, 0, 32, false},@/ |
{"branchhistorybits", &bp_b, 0, 0, 32, false},@/ |
{"branchdualbits", &bp_c, 0, 0, 32, false},@/ |
{"hardwarepagetable", &hardware_PT, 1, 0, 1, false},@/ |
{"disablesecurity", (int*)&security_disabled, 0, 0, 1, false},@/ |
{"memchunksmax", &mem_chunks_max, 1000, 1, INT_MAX, false},@/ |
{"hashprime", &hash_prime, 2003, 2, INT_MAX, false}}; |
@# |
cpv_spec CPV[]={ |
{"associativity", assoc, 1, 1, INT_MAX, true},@/ |
{"blocksize", blksz, 8, 8, 8192, true},@/ |
{"setsize", setsz, 1, 1, INT_MAX, true},@/ |
{"granularity", gran, 8, 8, 8192, true},@/ |
{"victimsize", vctsz, 0, 0, INT_MAX, true},@/ |
{"writeback", wrb, 0, 0, 1,false},@/ |
{"writeallocate", wra, 0, 0, 1,false},@/ |
{"accesstime", acctm, 1, 1, INT_MAX, false},@/ |
{"copyintime", citm, 1, 1, INT_MAX, false},@/ |
{"copyouttime", cotm, 1, 1, INT_MAX, false},@/ |
{"ports", prts, 1, 1, INT_MAX,false}}; |
@# |
op_spec OP[]={ |
{"mul0", mul0, 10}, |
{"mul1", mul1, 10}, |
{"mul2", mul2, 10}, |
{"mul3", mul3, 10}, |
{"mul4", mul4, 10}, |
{"mul5", mul5, 10}, |
{"mul6", mul6, 10}, |
{"mul7", mul7, 10}, |
{"mul8", mul8, 10},@| |
{"div", div, 60}, |
{"sh", sh, 1}, |
{"mux", mux, 1}, |
{"sadd", sadd, 1}, |
{"mor", mor, 1},@| |
{"fadd", fadd, 4}, |
{"fmul", fmul, 4}, |
{"fdiv", fdiv, 40}, |
{"fsqrt", fsqrt, 40}, |
{"fint", fint, 4},@| |
{"fix", fix, 2}, |
{"flot", flot, 2}, |
{"feps", feps, 4}}; |
int PV_size,CPV_size,OP_size; /* the number of entries in |PV|, |CPV|, |OP| */ |
|
@ The |new_cache| routine creates a \&{cache} structure with default values. |
(These default values are ``hard-wired'' into the program, not actually |
read from the |CPV| table.) |
|
@<Sub...@>= |
static cache* new_cache @,@,@[ARGS((char*))@];@+@t}\6{@> |
static cache* new_cache(name) |
char *name; |
{@+register cache *c=(cache*)calloc(1,sizeof(cache)); |
if (!c) panic(errprint1("Can't allocate %s",name)); |
@.Can't allocate...@> |
c->aa=1; /* default associativity, should equal |CPV[0].defval| */ |
c->bb=8; /* default blocksize */ |
c->cc=1; /* default setsize */ |
c->gg=8; /* default granularity */ |
c->vv=0; /* default victimsize */ |
c->repl=random; /* default replacement policy */ |
c->vrepl=random; /* default victim replacement policy */ |
c->mode=0; /* default mode is write-through and write-around */ |
c->access_time=c->copy_in_time=c->copy_out_time=1; |
c->filler.ctl=&(c->filler_ctl); |
c->filler_ctl.ptr_a=(void*)c; |
c->filler_ctl.go.o.l=4; |
c->flusher.ctl=&(c->flusher_ctl); |
c->flusher_ctl.ptr_a=(void*)c; |
c->flusher_ctl.go.o.l=4; |
c->ports=1; |
c->name=name; |
return c; |
} |
|
@ @<Initialize to defaults@>= |
PV_size=(sizeof PV)/sizeof(pv_spec); |
CPV_size=(sizeof CPV)/sizeof(cpv_spec); |
OP_size=(sizeof OP)/sizeof(op_spec); |
ITcache=new_cache("ITcache"); |
DTcache=new_cache("DTcache"); |
Icache=Dcache=Scache=NULL; |
for (j=0;j<PV_size;j++) *(PV[j].v)=PV[j].defval; |
for (j=0;j<OP_size;j++) { |
pipe_seq[OP[j].v][0]=OP[j].defval; |
pipe_seq[OP[j].v][1]=0; /* one stage */ |
} |
|
@* Reading the specs. Before we're ready to process the configuration file, |
we need to count the number of functional units, so that we know |
how much space to allocate for them. |
|
A special background unit is always provided, just to make sure that |
\.{TRAP} and \.{TRIP} instructions are handled by somebody. |
|
@<Count and allocate the functional units@>= |
funit_count=0; |
while (strcmp(token,"end")!=0) { |
get_token(); |
if (strcmp(token,"unit")==0) { |
funit_count++; |
get_token();@+get_token(); /* a unit might be named \.{unit} or \.{end} */ |
} |
} |
funit=(func*)calloc(funit_count+1,sizeof(func)); |
if (!funit) panic(errprint0("Can't allocate the functional units")); |
@.Can't allocate...@> |
strcpy(funit[funit_count].name,"%%"); |
@.\%\%@> |
funit[funit_count].ops[0]=0x80000000; /* \.{TRAP} */ |
funit[funit_count].ops[7]=0x1; /* \.{TRIP} */ |
|
@ Now we can read the specifications and obey them. This program doesn't |
bother to be very tolerant of errors, nor does it try to be very efficient. |
|
Incidentally, the specifications don't have to be broken into individual lines |
in any meaningful way. We simply read them token by token. |
|
@<Record all the specs@>= |
rewind(config_file); |
funit_count=0; |
token[0]='\0'; |
while (strcmp(token,"end")!=0) { |
get_token(); |
if (strcmp(token,"end")==0) break; |
@<If |token| is a parameter name, process a PV spec@>; |
@<If |token| is a cache name, process a cache spec@>; |
@<If |token| is an operation name, process a pipe spec@>; |
if (strcmp(token,"unit")==0) @<Process a functional spec@>; |
panic(errprint1( |
"Configuration syntax error: Specification can't start with `%s'",token)); |
@.Configuration syntax error...@> |
} |
|
@ @<If |token| is a parameter name, process a PV spec@>= |
for (j=0;j<PV_size;j++) if (strcmp(token,PV[j].name)==0) { |
n=get_int(); |
if (n<PV[j].minval) panic(errprint2( |
@.Configuration error...@> |
"Configuration error: %s must be >= %d",PV[j].name,PV[j].minval)); |
if (n>PV[j].maxval) panic(errprint2( |
"Configuration error: %s must be <= %d",PV[j].name,PV[j].maxval)); |
if (PV[j].power_of_two && (n&(n-1))) panic(errprint1( |
"Configuration error: %s must be a power of 2",PV[j].name)); |
*(PV[j].v)=n; |
break; |
} |
if (j<PV_size) continue; |
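
@ The three checks applied to each parameter can be sketched in isolation
as follows. This is only an illustration under stated assumptions: the
function |check_param| and its return strings are hypothetical, and the
real program calls |panic| instead of returning. Note that |n&(n-1)| is
zero for |n==0| as well, so the lower bound is what excludes zero when zero
is not wanted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the checks on one table entry.
   Returns NULL if |n| is acceptable, otherwise a short reason. */
static const char *check_param(int n, int minval, int maxval, int power_of_two)
{
  assert(minval <= maxval);
  if (n < minval) return "too small";
  if (n > maxval) return "too large";
  if (power_of_two && (n & (n - 1))) return "not a power of 2";
  return NULL; /* n&(n-1) clears the lowest set bit; zero means at most one bit */
}
```

For example, 32 passes a power-of-two check, whereas 24 does not, because
|24&23| is 16.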
|
@ @<If |token| is a cache name, process a cache spec@>= |
if (strcmp(token,"ITcache")==0) { |
pcs(ITcache);@+continue; |
}@+else if (strcmp(token,"DTcache")==0) { |
pcs(DTcache);@+continue; |
}@+else if (strcmp(token,"Icache")==0) { |
if (!Icache) Icache=new_cache("Icache"); |
pcs(Icache);@+continue; |
}@+else if (strcmp(token,"Dcache")==0) { |
if (!Dcache) Dcache=new_cache("Dcache"); |
pcs(Dcache);@+continue; |
}@+else if (strcmp(token,"Scache")==0) { |
if (!Icache) Icache=new_cache("Icache"); |
if (!Dcache) Dcache=new_cache("Dcache"); |
if (!Scache) Scache=new_cache("Scache"); |
pcs(Scache);@+continue; |
} |
|
@ @<Sub...@>= |
static void ppol @,@,@[ARGS((replace_policy*))@];@+@t}\6{@> |
static void ppol(rr) /* subroutine to scan for a replacement policy */ |
replace_policy *rr; |
{ |
get_token(); |
if (strcmp(token,"random")==0) *rr=random; |
else if (strcmp(token,"serial")==0) *rr=serial; |
else if (strcmp(token,"pseudolru")==0) *rr=pseudo_lru; |
else if (strcmp(token,"lru")==0) *rr=lru; |
else token_prescanned=true; /* oops, we should rescan that token */ |
} |
|
@ @<Sub...@>= |
static void pcs @,@,@[ARGS((cache*))@];@+@t}\6{@> |
static void pcs(c) /* subroutine to process a cache spec */ |
cache *c; |
{ |
register int j,n; |
get_token(); |
for (j=0;j<CPV_size;j++) if (strcmp(token,CPV[j].name)==0) break; |
if (j==CPV_size) panic(errprint1( |
"Configuration syntax error: `%s' isn't a cache parameter name",token)); |
@.Configuration syntax error...@> |
n=get_int(); |
if (n<CPV[j].minval) panic(errprint2( |
"Configuration error: %s must be >= %d",CPV[j].name,CPV[j].minval)); |
@.Configuration error...@> |
if (n>CPV[j].maxval) panic(errprint2( |
"Configuration error: %s must be <= %d",CPV[j].name,CPV[j].maxval)); |
if (CPV[j].power_of_two && (n&(n-1))) panic(errprint1( |
"Configuration error: %s must be a power of 2",CPV[j].name)); |
switch (CPV[j].v) { |
case assoc: c->aa=n;@+ppol(&(c->repl));@+break; |
case blksz: c->bb=n;@+break; |
case setsz: c->cc=n;@+break; |
case gran: c->gg=n;@+break; |
case vctsz: c->vv=n;@+ppol(&(c->vrepl));@+break; |
case wrb: c->mode=(c->mode&~WRITE_BACK)+n*WRITE_BACK;@+break; |
case wra: c->mode=(c->mode&~WRITE_ALLOC)+n*WRITE_ALLOC;@+break; |
case acctm:@+ if (n>max_cycs) max_cycs=n; |
c->access_time=n;@+break; |
case citm:@+ if (n>max_cycs) max_cycs=n; |
c->copy_in_time=n;@+break; |
case cotm:@+ if (n>max_cycs) max_cycs=n; |
c->copy_out_time=n;@+break; |
case prts: c->ports=n;@+break; |
} |
} |
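
@ The updates of |c->mode| for |wrb| and |wra| clear one flag bit and then
add it back if the parameter is 1. A minimal sketch of that idiom follows.
The numeric values given here to |WRITE_BACK| and |WRITE_ALLOC| are
assumptions made for the illustration (any two distinct single bits behave
the same way), and |set_flag| is a hypothetical helper; the sketch also
assumes that the parameter has already been restricted to 0 or 1.

```c
#include <assert.h>

#define WRITE_BACK 1  /* assumed value, for illustration only */
#define WRITE_ALLOC 2 /* assumed value, for illustration only */

/* Hypothetical helper: set (n==1) or clear (n==0) one single-bit flag.
   Because the bit is cleared first, adding n*flag cannot carry. */
static int set_flag(int mode, int flag, int n)
{
  assert(n == 0 || n == 1);
  return (mode & ~flag) + n * flag;
}
```

Applying |set_flag| twice with the same arguments gives the same result as
applying it once, so a repeated spec in the configuration file is harmless.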
|
@ @<If |token| is an operation name, process a pipe spec@>= |
for (j=0;j<OP_size;j++) if (strcmp(token,OP[j].name)==0) { |
for (i=0;;i++) { |
n=get_int(); |
if (n<0) break; |
if (n==0) panic(errprint0( |
"Configuration error: Pipeline cycles must be positive")); |
@.Configuration error...@> |
if (n>255) panic(errprint0( |
"Configuration error: Pipeline cycles must be <= 255")); |
if (n>max_cycs) max_cycs=n; |
if (i>=pipe_limit) panic(errprint1( |
"Configuration error: More than %d pipeline stages",pipe_limit)); |
pipe_seq[OP[j].v][i]=n; |
} |
token_prescanned=true; |
break; |
} |
if (j<OP_size) continue; |
|
@ @<Process a functional spec@>= |
{ |
get_token(); |
if (strlen(token)>15) panic(errprint1( |
"Configuration error: `%s' is more than 15 characters long",token)); |
@.Configuration error...@> |
strcpy(funit[funit_count].name,token); |
get_token(); |
if (strlen(token)!=64) panic(errprint1( |
"Configuration error: unit %s doesn't have 64 hex digit specs", |
funit[funit_count].name)); |
for (i=j=n=0;j<64;j++) { |
if (token[j]>='0' && token[j]<='9') n=(n<<4)+(token[j]-'0'); |
else if (token[j]>='a' && token[j]<='f') n=(n<<4)+(token[j]-'a'+10); |
else if (token[j]>='A' && token[j]<='F') n=(n<<4)+(token[j]-'A'+10); |
else panic(errprint1( |
"Configuration error: `%c' is not a hex digit",token[j])); |
if ((j&0x7)==0x7) funit[funit_count].ops[i++]=n, n=0; |
} |
funit_count++; |
continue; |
} |
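
@ The conversion of the 64 hexadecimal digits into eight 32-bit words can
be tried out separately. The sketch below follows the loop above, but it
is an illustration only: |parse_unit_spec| is a hypothetical helper that
reports failure by returning $-1$ where the program itself would call
|panic|.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert 64 hex digits into eight 32-bit words.
   Returns 0 on success, -1 on a wrong length or a non-hex character. */
static int parse_unit_spec(const char *spec, uint32_t ops[8])
{
  uint32_t n = 0;
  int i = 0, j;
  assert(spec != NULL && ops != NULL);
  if (strlen(spec) != 64) return -1;
  for (j = 0; j < 64; j++) {
    char ch = spec[j];
    if (ch >= '0' && ch <= '9') n = (n << 4) + (uint32_t)(ch - '0');
    else if (ch >= 'a' && ch <= 'f') n = (n << 4) + (uint32_t)(ch - 'a' + 10);
    else if (ch >= 'A' && ch <= 'F') n = (n << 4) + (uint32_t)(ch - 'A' + 10);
    else return -1;
    if ((j & 0x7) == 0x7) { ops[i++] = n; n = 0; } /* every 8 digits fill a word */
  }
  return 0;
}
```

A spec whose fifth and sixth groups of eight digits are \.{fffffffc}, with
zeros elsewhere, sets |ops[4]| and |ops[5]|; those two words cover opcodes
\Hex{80} through \Hex{bf}, and the two low-order zero bits of each word
leave out the last two opcodes of each group of 32.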
|
@* Checking and allocating. The battle is only half over when we've |
absorbed all the data of the configuration file. We still must check for |
interactions between different quantities, and we must allocate |
space for cache blocks, coroutines, etc. |
|
One of the most difficult tasks facing us is to determine the maximum number |
of pipeline stages needed by each functional unit. Let's tackle that first. |
|
@<Allocate coroutines in each functional unit@>= |
@<Build table of pipeline stages needed for each opcode@>; |
for (j=0;j<=funit_count;j++) { |
@<Determine the number of stages, |n|, needed by |funit[j]|@>; |
funit[j].k=n; |
funit[j].co=(coroutine*)calloc(n,sizeof(coroutine)); |
for (i=0;i<n;i++) { |
funit[j].co[i].name=funit[j].name; |
funit[j].co[i].stage=i+1; |
} |
} |
|
@ @<Build table of pipeline stages needed for each opcode@>= |
for (j=div;j<=max_pipe_op;j++) int_stages[j]=strlen(pipe_seq[j]); |
for (;j<=max_real_command;j++) int_stages[j]=1; |
for (j=mul0,n=0;j<=mul8;j++) |
if (strlen(pipe_seq[j])>n) n=strlen(pipe_seq[j]); |
int_stages[mul]=n; |
int_stages[ld]=int_stages[st]=int_stages[frem]=2; |
for (j=0;j<256;j++) stages[j]=int_stages[int_op[j]]; |
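
@ The stage counts above rely on each row of |pipe_seq| being a
zero-terminated string of bytes, one byte per stage, which is why every
cycle count must lie between 1 and 255. A minimal sketch of that
representation follows; the constant |PIPE_LIMIT| and the helper
|record_pipe| are assumptions introduced for the illustration.

```c
#include <assert.h>
#include <string.h>

#define PIPE_LIMIT 90 /* assumed bound, for illustration only */

/* Hypothetical helper: store |count| cycle times as a zero-terminated
   byte string and return the stage count that strlen reports for it. */
static int record_pipe(unsigned char seq[PIPE_LIMIT + 1],
                       const int *cycles, int count)
{
  int i;
  assert(count >= 1 && count <= PIPE_LIMIT);
  memset(seq, 0, PIPE_LIMIT + 1);
  for (i = 0; i < count; i++) {
    assert(cycles[i] >= 1 && cycles[i] <= 255); /* a zero byte would end the string */
    seq[i] = (unsigned char)cycles[i];
  }
  return (int)strlen((const char *)seq);
}
```

A specification such as \.{div 10 10 10 10 10 10} would thus be recorded
as six bytes of value 10 followed by a zero, giving six stages.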
|
@ The |int_op| conversion table is similar to the |internal_op| array of |
the \\{MMIX\_pipe} routine, but it replaces |divu| by |div|, |
|fsub| by |fadd|, etc. |
|
@<Glob...@>= |
internal_opcode int_op[256]={@/ |
trap,fcmp,funeq,funeq,fadd,fix,fadd,fix,@/ |
flot,flot,flot,flot,flot,flot,flot,flot,@/ |
fmul,feps,feps,feps,fdiv,fsqrt,frem,fint,@/ |
mul,mul,mul,mul,div,div,div,div,@/ |
add,add,addu,addu,sub,sub,subu,subu,@/ |
addu,addu,addu,addu,addu,addu,addu,addu,@/ |
cmp,cmp,cmpu,cmpu,sub,sub,subu,subu,@/ |
sh,sh,sh,sh,sh,sh,sh,sh,@/ |
br,br,br,br,br,br,br,br,@/ |
br,br,br,br,br,br,br,br,@/ |
pbr,pbr,pbr,pbr,pbr,pbr,pbr,pbr,@/ |
pbr,pbr,pbr,pbr,pbr,pbr,pbr,pbr,@/ |
cset,cset,cset,cset,cset,cset,cset,cset,@/ |
cset,cset,cset,cset,cset,cset,cset,cset,@/ |
zset,zset,zset,zset,zset,zset,zset,zset,@/ |
zset,zset,zset,zset,zset,zset,zset,zset,@/ |
ld,ld,ld,ld,ld,ld,ld,ld,@/ |
ld,ld,ld,ld,ld,ld,ld,ld,@/ |
ld,ld,ld,ld,ld,ld,ld,ld,@/ |
ld,ld,ld,ld,prego,prego,go,go,@/ |
st,st,st,st,st,st,st,st,@/ |
st,st,st,st,st,st,st,st,@/ |
st,st,st,st,st,st,st,st,@/ |
st,st,st,st,st,st,pushgo,pushgo,@/ |
or,or,orn,orn,nor,nor,xor,xor,@/ |
and,and,andn,andn,nand,nand,nxor,nxor,@/ |
bdif,bdif,wdif,wdif,tdif,tdif,odif,odif,@/ |
mux,mux,sadd,sadd,mor,mor,mor,mor,@/ |
set,set,set,set,addu,addu,addu,addu,@/ |
or,or,or,or,andn,andn,andn,andn,@/ |
noop,noop,pushj,pushj,set,set,put,put,@/ |
pop,resume,save,unsave,sync,noop,get,trip}; |
int int_stages[max_real_command+1]; |
/* stages as function of |internal_opcode| */ |
int stages[256]; /* stages as function of |mmix_opcode| */ |
|
@ @<Determine the number of stages...@>= |
for (i=n=0;i<256;i++) |
if (((funit[j].ops[i>>5]<<(i&0x1f))&0x80000000) && stages[i]>n) |
n=stages[i]; |
if (n==0) panic(errprint1( |
"Configuration error: unit %s doesn't do anything",funit[j].name)); |
@.Configuration error...@> |
|
@ The next hardest thing on our agenda is to set up the cache structure |
fields that depend on the parameters. For example, although we have defined |
the parameter in the |bb| field (the block size), we also need to compute the |
|b|~field (log of the block size), and we must create the cache blocks |
themselves. |
|
@<Sub...@>= |
static int lg @,@,@[ARGS((int))@];@+@t}\6{@> |
static int lg(n) /* compute binary logarithm */ |
int n; |
{@+register int j,l; |
for (j=n,l=0;j;j>>=1) l++; |
return l-1; |
} |
|
@ @<Sub...@>= |
static void alloc_cache @,@,@[ARGS((cache*,char*))@];@+@t}\6{@> |
static void alloc_cache(c,name) |
cache *c; |
char *name; |
{@+register int j,k; |
if (c->bb<c->gg) panic(errprint1( |
"Configuration error: blocksize of %s is less than granularity",name)); |
@.Configuration error...@> |
if (name[1]=='T' && c->bb!=8) panic(errprint1( |
"Configuration error: blocksize of %s must be 8",name)); |
c->a=lg(c->aa); |
c->b=lg(c->bb); |
c->c=lg(c->cc); |
c->g=lg(c->gg); |
c->v=lg(c->vv); |
c->tagmask=-(1<<(c->b+c->c)); |
if (c->a+c->b+c->c>=32) panic(errprint1( |
"Configuration error: %s has >= 4 gigabytes of data",name)); |
if (c->gg!=8 && !(c->mode&WRITE_ALLOC)) panic(errprint2( |
"Configuration error: %s does write-around with granularity %d", |
name,c->gg)); |
@<Allocate the cache sets for cache |c|@>; |
if (c->vv) @<Allocate the victim cache for cache |c|@>; |
c->inbuf.dirty=(char*)calloc(c->bb>>c->g,sizeof(char)); |
if (!c->inbuf.dirty) panic(errprint1( |
"Can't allocate dirty bits for inbuffer of %s",name)); |
@.Can't allocate...@> |
c->inbuf.data=(octa *)calloc(c->bb>>3,sizeof(octa)); |
if (!c->inbuf.data) panic(errprint1( |
"Can't allocate data for inbuffer of %s",name)); |
c->outbuf.dirty=(char*)calloc(c->bb>>c->g,sizeof(char)); |
if (!c->outbuf.dirty) panic(errprint1( |
"Can't allocate dirty bits for outbuffer of %s",name)); |
c->outbuf.data=(octa *)calloc(c->bb>>3,sizeof(octa)); |
if (!c->outbuf.data) panic(errprint1( |
"Can't allocate data for outbuffer of %s",name)); |
if (name[0]!='S') @<Allocate reader coroutines for cache |c|@>; |
} |
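
@ The derived fields can be checked by hand with a small stand-alone
version of |lg| and of the tag mask computation. This is a sketch only:
the prototype is written in ISO style, the assertion on |n| is an added
precondition, and |tag_mask| is a hypothetical helper that is not part of
the program.

```c
#include <assert.h>

/* Binary logarithm, rounded down; returns -1 when n is 0. */
static int lg(int n)
{
  int l = 0;
  assert(n >= 0); /* a negative n would never shift down to zero */
  for (; n; n >>= 1) l++;
  return l - 1;
}

/* Hypothetical helper: mask that keeps the address bits above the
   block offset and the set index, as in c->tagmask. */
static int tag_mask(int blocksize, int setsize)
{
  return -(1 << (lg(blocksize) + lg(setsize)));
}
```

For instance, a block size of 32 and a set size of 256 give |b=5| and
|c=8|, so the mask is $-2^{13}$, which clears the low 13 bits of an
address. A victim size of 0 yields |lg(0)=-1|.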
|
@ @d sign_bit 0x80000000 |
|
@<Allocate the cache sets for cache |c|@>= |
c->set=(cacheset *)calloc(c->cc,sizeof(cacheset)); |
if (!c->set) panic(errprint1( |
"Can't allocate cache sets for %s",name)); |
@.Can't allocate...@> |
for (j=0;j<c->cc;j++) { |
c->set[j]=(cacheblock *)calloc(c->aa,sizeof(cacheblock)); |
if (!c->set[j]) panic(errprint2( |
"Can't allocate cache blocks for set %d of %s",j,name)); |
for (k=0;k<c->aa;k++) { |
c->set[j][k].tag.h=sign_bit; /* invalid tag */ |
c->set[j][k].dirty=(char *)calloc(c->bb>>c->g,sizeof(char)); |
if (!c->set[j][k].dirty) panic(errprint3( |
"Can't allocate dirty bits for block %d of set %d of %s",k,j,name)); |
c->set[j][k].data=(octa *)calloc(c->bb>>3,sizeof(octa)); |
if (!c->set[j][k].data) panic(errprint3( |
"Can't allocate data for block %d of set %d of %s",k,j,name)); |
} |
} |
|
@ @<Allocate the victim cache for cache |c|@>= |
{ |
c->victim=(cacheblock*)calloc(c->vv,sizeof(cacheblock)); |
if (!c->victim) panic(errprint1( |
"Can't allocate blocks for victim cache of %s",name)); |
for (k=0;k<c->vv;k++) { |
c->victim[k].tag.h=sign_bit; /* invalid tag */ |
c->victim[k].dirty=(char *)calloc(c->bb>>c->g,sizeof(char)); |
if (!c->victim[k].dirty) panic(errprint2( |
"Can't allocate dirty bits for block %d of victim cache of %s", |
k,name)); |
@.Can't allocate...@> |
c->victim[k].data=(octa *)calloc(c->bb>>3,sizeof(octa)); |
if (!c->victim[k].data) panic(errprint2( |
"Can't allocate data for block %d of victim cache of %s",k,name)); |
} |
} |
|
@ @<Allocate reader coroutines for cache |c|@>= |
{ |
c->reader=(coroutine*)calloc(c->ports,sizeof(coroutine)); |
if (!c->reader) panic(errprint1( |
@.Can't allocate...@> |
"Can't allocate readers for %s",name)); |
for (j=0;j<c->ports;j++) { |
c->reader[j].stage=vanish; |
c->reader[j].name=(name[0]=='D'? (name[1]=='T'? "DTreader": "Dreader"): |
(name[1]=='T'? "ITreader": "Ireader")); |
} |
} |
|
@ @<Allocate the caches@>= |
alloc_cache(ITcache,"ITcache"); |
ITcache->filler.name="ITfiller";@+ ITcache->filler.stage=fill_from_virt; |
alloc_cache(DTcache,"DTcache"); |
DTcache->filler.name="DTfiller";@+ DTcache->filler.stage=fill_from_virt; |
if (Icache) { |
alloc_cache(Icache,"Icache"); |
Icache->filler.name="Ifiller";@+ Icache->filler.stage=fill_from_mem; |
} |
if (Dcache) { |
alloc_cache(Dcache,"Dcache"); |
Dcache->filler.name="Dfiller";@+ Dcache->filler.stage=fill_from_mem; |
Dcache->flusher.name="Dflusher";@+ Dcache->flusher.stage=flush_to_mem; |
} |
if (Scache) { |
alloc_cache(Scache,"Scache"); |
if (Scache->bb<Icache->bb) panic(errprint0( |
"Configuration error: Scache blocks smaller than Icache blocks")); |
@.Configuration error...@> |
if (Scache->bb<Dcache->bb) panic(errprint0( |
"Configuration error: Scache blocks smaller than Dcache blocks")); |
if (Scache->gg!=Dcache->gg) panic(errprint0( |
"Configuration error: Scache granularity differs from the Dcache")); |
Icache->filler.stage=fill_from_S; |
Dcache->filler.stage=fill_from_S;@+ Dcache->flusher.stage=flush_to_S; |
Scache->filler.name="Sfiller";@+ Scache->filler.stage=fill_from_mem; |
Scache->flusher.name="Sflusher";@+ Scache->flusher.stage=flush_to_mem; |
} |
|
@ Now we are nearly done. The only nontrivial task remaining is |
to allocate the ring of queues for coroutine scheduling; for this we |
need to determine the maximum waiting time that will occur between |
scheduler and schedulee. |
|
@<Allocate the scheduling queue@>= |
bus_words=mem_bus_bytes>>3; |
j=(mem_read_time<mem_write_time? mem_write_time: mem_read_time); |
n=1; |
if (Scache && Scache->bb>n) n=Scache->bb; |
if (Icache && Icache->bb>n) n=Icache->bb; |
if (Dcache && Dcache->bb>n) n=Dcache->bb; |
n=mem_addr_time+((int)(n+bus_words-1)/bus_words)*j; |
if (n>max_cycs) max_cycs=n; /* now |max_cycs| bounds the waiting time */ |
ring_size=max_cycs+1; |
ring=(coroutine *)calloc(ring_size,sizeof(coroutine)); |
if (!ring) panic(errprint0("Can't allocate the scheduling ring")); |
@.Can't allocate...@> |
{@+register coroutine *p; |
for (p=ring;p<ring+ring_size;p++) { |
p->name=""; /* header nodes are nameless */ |
p->stage=max_stage; |
} |
} |
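
@ The size of the ring can be worked out by hand for a given
configuration. The sketch below repeats the arithmetic above in a
stand-alone function; |ring_slots| and its parameter names are hypothetical
and are passed explicitly only to make the illustration self-contained.

```c
#include <assert.h>

/* Hypothetical helper: number of ring entries needed, given the memory
   timing, the largest cache block size, and the largest delay seen so far. */
static int ring_slots(int mem_addr_time, int mem_read_time, int mem_write_time,
                      int mem_bus_bytes, int max_block, int max_cycs)
{
  int bus_words = mem_bus_bytes >> 3;
  int j = mem_read_time < mem_write_time ? mem_write_time : mem_read_time;
  int n = max_block > 1 ? max_block : 1;
  assert(bus_words >= 1);
  n = mem_addr_time + ((n + bus_words - 1) / bus_words) * j; /* round up */
  if (n > max_cycs) max_cycs = n;
  return max_cycs + 1;
}
```

With an address time of 3, read and write times of 10, a 16-byte bus, and
a largest block size of 64, the bound is $3+32\cdot10=323$, so the ring
needs 324 entries unless some pipeline or cache delay is larger.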
|
@ @s chunknode int |
|
@<Touch up last-minute trivia@>= |
if (hash_prime<=mem_chunks_max) panic(errprint0( |
"Configuration error: hashprime must exceed memchunksmax")); |
@.Configuration error...@> |
mem_hash=(chunknode *)calloc(hash_prime+1,sizeof(chunknode)); |
if (!mem_hash) panic(errprint0("Can't allocate the hash table")); |
@.Can't allocate...@> |
mem_hash[0].chunk=(octa*)calloc(1<<13,sizeof(octa)); |
if (!mem_hash[0].chunk) panic(errprint0("Can't allocate chunk 0")); |
mem_hash[hash_prime].chunk=(octa*)calloc(1<<13,sizeof(octa)); |
if (!mem_hash[hash_prime].chunk) panic(errprint0("Can't allocate 0 chunk")); |
mem_chunks=1; |
fetch_bot=(fetch*)calloc(fetch_buf_size+1,sizeof(fetch)); |
if (!fetch_bot) panic(errprint0("Can't allocate the fetch buffer")); |
fetch_top=fetch_bot+fetch_buf_size; |
reorder_bot=(control*)calloc(reorder_buf_size+1,sizeof(control)); |
if (!reorder_bot) panic(errprint0("Can't allocate the reorder buffer")); |
reorder_top=reorder_bot+reorder_buf_size; |
wbuf_bot=(write_node*)calloc(write_buf_size+1,sizeof(write_node)); |
if (!wbuf_bot) panic(errprint0("Can't allocate the write buffer")); |
wbuf_top=wbuf_bot+write_buf_size; |
if (bp_n==0) bp_table=NULL; |
else { /* a branch prediction table is desired */ |
if (bp_a+bp_b+bp_c>=32) panic(errprint0( |
"Configuration error: Branch table has >= 4 gigabytes of data")); |
bp_table=(char*)calloc(1<<(bp_a+bp_b+bp_c),sizeof(char)); |
if (!bp_table) panic(errprint0("Can't allocate the branch table")); |
} |
l=(specnode*)calloc(lring_size,sizeof(specnode)); |
if (!l) panic(errprint0("Can't allocate local registers")); |
j=bus_words; |
if (Icache && Icache->bb>j) j=Icache->bb; |
fetched=(octa*)calloc(j,sizeof(octa)); |
if (!fetched) panic(errprint0("Can't allocate prefetch buffer")); |
dispatch_stat=(int*)calloc(dispatch_max+1,sizeof(int)); |
if (!dispatch_stat) panic(errprint0("Can't allocate dispatch counts")); |
no_hardware_PT=1-hardware_PT; |
|
@* Putting it all together. Here then is the desired configuration |
subroutine. |
|
@c |
#include <stdio.h> /* |fopen|, |fgets|, |sscanf|, |rewind| */ |
#include <stdlib.h> /* |calloc|, |exit| */ |
#include <ctype.h> /* |isspace| */ |
#include <string.h> /* |strcpy|, |strlen|, |strcmp| */ |
#include <limits.h> /* |INT_MAX| */ |
#include "mmix-pipe.h" |
@<Type definitions@>@; |
@<Global variables@>@; |
@<Subroutines@>@; |
void MMIX_config(filename) |
char *filename; |
{@+register int i,j,n; |
config_file=fopen(filename,"r"); |
if (!config_file) |
panic(errprint1("Can't open configuration file %s",filename)); |
@.Can't open...@> |
@<Initialize to defaults@>; |
@<Count and allocate the functional units@>; |
@<Record all the specs@>; |
@<Allocate coroutines in each functional unit@>; |
@<Allocate the caches@>; |
@<Allocate the scheduling queue@>; |
@<Touch up...@>; |
} |
|
@*Index. |
/primessf.mms
0,0 → 1,66
% Example program ... Table of primes (using short floats) |
L IS 500 The number of primes to find |
t IS $255 Temporary storage |
fn GREG |
q GREG |
r GREG |
jj GREG |
kk GREG |
pk GREG |
mm IS kk |
|
LOC Data_Segment |
PRIME1 TETRA #40000000 |
LOC PRIME1+4*L |
ptop GREG @ |
j0 GREG PRIME1+4-@ |
BUF OCTA |
|
LOC #100 |
Main FLOT fn,3 |
SET jj,j0 |
2H STSF fn,ptop,jj |
INCL jj,4 |
3H BZ jj,2F |
0H GREG #4000000000000000 |
4H FADD fn,fn,0B |
5H SET kk,j0 |
sqrtn GREG 0 |
FSQRT sqrtn,fn |
6H LDSF pk,ptop,kk |
FREM r,fn,pk |
BZ r,4B |
7H FCMP t,pk,sqrtn |
BNN t,2B |
8H INCL kk,4 |
JMP 6B |
GREG @ |
Title BYTE "First Five Hundred Primes" |
NewLn BYTE #a,0 |
Blanks BYTE " ",0 |
2H LDA t,Title |
TRAP 0,Fputs,StdOut |
NEG mm,4 |
3H ADD mm,mm,j0 |
LDA t,Blanks |
TRAP 0,Fputs,StdOut |
2H LDSF pk,ptop,mm |
FIX pk,pk |
0H GREG #2030303030000000 |
STOU 0B,BUF |
LDA t,BUF+4 |
1H DIV pk,pk,10 |
GET r,rR |
INCL r,'0' |
STBU r,t,0 |
SUB t,t,1 |
PBNZ pk,1B |
LDA t,BUF |
TRAP 0,Fputs,StdOut |
INCL mm,4*L/10 |
PBN mm,2B |
LDA t,NewLn |
TRAP 0,Fputs,StdOut |
CMP t,mm,4*(L/10-1) |
PBNZ t,3B |
TRAP 0,Halt,0 |
/iotest1.mms
0,0 → 1,36
* TESTING I/O (besides what was tested by the copy program) |
* (intended for online test) |
|
t IS $255 |
Buf IS Data_Segment+2 |
LOC Buf+9*2 |
Arg0 OCTA Buf,9 |
Arg1 OCTA Filename,BinaryReadWrite |
LOC @+1 |
Filename BYTE "iotest.tmp",0 |
GREG Buf |
|
LOC #200 |
Main LDA t,Arg0 |
TRAP 0,Fgets,StdIn Fgets(StdIn,Buf,9) |
LDA t,Buf |
TRAP 0,Fputs,StdOut Fputs(StdOut,Buf) |
LDA t,Arg0 |
TRAP 0,Fgetws,StdIn Fgetws(StdIn,Buf,9) |
LDA t,Buf |
TRAP 0,Fputws,StdOut Fputws(StdOut,Buf) |
TRAP 0,Fclose,StdIn Fclose(StdIn) |
TRAP 0,Fclose,StdIn Fclose(StdIn) |
LDA t,Arg1 |
TRAP 0,Fopen,StdIn Fopen(StdIn,"iotest.tmp",BinaryReadWrite) |
NEG t,1 |
TRAP 0,Fseek,StdIn Fseek(StdIn,-1) |
TRAP 0,Ftell,StdIn Ftell(StdIn) |
LDA t,Buf |
TRAP 0,Fputws,StdIn Fputws(StdIn,Buf) |
SET t,2 |
TRAP 0,Fseek,StdIn Fseek(StdIn,2) |
LDA t,Arg0 |
TRAP 0,Fgets,StdIn Fgets(StdIn,Buf,9) |
TRAP 0,Halt,0 |
|
/iotest2.mms
0,0 → 1,98
* Additional IO test for the simulated simulator |
* (Change "Chunk" to 8 in sim.mms to make the acid test!) |
|
t IS $255 |
h IS 3 |
|
LOC Data_Segment |
* initial value final value |
A OCTA #1111111111111111 #0011000011610a00 |
OCTA #2222222222222222 #222222222262630a |
OCTA #3333333333333333 #0033333333646566 |
OCTA #4444444444444444 #0a00444444313233 |
OCTA #5555555555555555 #343536373839410a |
OCTA #6666666666666666 #0066666666313233 |
OCTA #7777777777777777 #3435363738394142 |
OCTA #8888888888888888 #00888888000a0000 |
OCTA #9999999999999999 #999999990a0a000a |
OCTA #1111111111111111 #0000111178787979 |
OCTA #2222222222222222 #000a000031313232 |
OCTA #3333333333333333 #333334343535000a |
OCTA #4444444444444444 #0000444431313232 |
OCTA #5555555555555555 #3333343435353636 |
OCTA #6666666666666666 #0000666666707100 |
OCTA #7777777777777777 #7777777777777777 |
OCTA #8888888888888888 #8888888888888888 |
OCTA #9999999999999999 #9999999999999999 |
GREG @ |
GREG @+256 |
Dat BYTE "xa",#a,"bc",#a,"def",#a,"123456789A",#a,"123456789AB" |
BYTE 0,#a,#a,#a,0,#a,"xxyy",0,#a,"1122334455",0,#a |
BYTE "112233445566pq",0,0 |
IOscr BYTE "ioscr.tmp",0 |
Arg0 OCTA IOscr,BinaryReadWrite |
Arg1 OCTA A,0 |
Arg2 OCTA A,1 |
Arg3 OCTA A+3,1 |
Arg4 OCTA A+5,12 |
Arg5 OCTA A+13,12 |
Arg6 OCTA A+21,12 |
Arg7 OCTA A+29,12 |
Arg8 OCTA A+45,12 |
Arg9 OCTA A+61,7 |
Arg10 OCTA A+69,7 |
Arg11 OCTA A+77,7 |
Arg12 OCTA A+85,7 |
Arg13 OCTA A+101,7 |
Arg14 OCTA A+117,7 |
Arg15 OCTA A,8*18 |
|
LOC #100 |
Main TRAP 0,Fclose,h |
LDA t,Arg0 |
TRAP 0,Fopen,h |
LDA t,Dat |
TRAP 0,Fputws,h |
TRAP 0,Ftell,h |
SET t,1000 |
TRAP 0,Fseek,h |
TRAP 0,Ftell,h |
SET t,1 |
TRAP 0,Fseek,h |
TRAP 0,Ftell,h |
LDA t,Arg1 |
TRAP 0,Fgets,h |
LDA t,Arg2 |
TRAP 0,Fgets,h |
LDA t,Arg3 |
TRAP 0,Fgetws,h |
LDA t,Arg4 |
TRAP 0,Fgets,h |
LDA t,Arg5 |
TRAP 0,Fgets,h |
LDA t,Arg6 |
TRAP 0,Fgets,h |
LDA t,Arg7 |
TRAP 0,Fgets,h |
LDA t,Arg8 |
TRAP 0,Fgets,h |
LDA t,Arg9 |
TRAP 0,Fgetws,h |
LDA t,Arg10 |
TRAP 0,Fgetws,h |
LDA t,Arg11 |
TRAP 0,Fgetws,h |
LDA t,Arg12 |
TRAP 0,Fgetws,h |
LDA t,Arg13 |
TRAP 0,Fgetws,h |
NEG t,3 |
TRAP 0,Fseek,h |
TRAP 0,Ftell,h |
LDA t,Arg14 |
TRAP 0,Fgets,h |
SET t,0 |
TRAP 0,Fseek,h |
LDA t,Arg15 |
TRAP 0,Fwrite,h |
TRAP 0,Halt,0 |
/primesx.mmconfig
0,0 → 1,44
% configuration for primes test -- still in preparation |
memaddresstime 3 |
memreadtime 10 memwritetime 10 |
membusbytes 16 |
%branchpredictbits 2 |
%branchaddressbits 1 |
%branchhistorybits 1 |
%branchdualbits 1 |
memchunksmax 4 |
hashprime 5 |
Scache blocksize 64 |
Scache setsize 512 |
Scache associativity 4 pseudolru |
Scache accesstime 2 |
Dcache blocksize 32 |
Dcache setsize 256 |
Icache blocksize 32 |
Icache setsize 256 |
Icache victimsize 2 |
unit ALU1 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU2 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU3 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU4 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit LSU1 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit LSU2 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit LSU3 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit MUL1 000080f000000000000000000000000000000000000000000000000000000000 |
unit DIV1 00000c0f00000000000000000000000000000000000000000000000000000000 |
unit FPU1 7fff730000000000000000000000000000000000000000000000000000000000 |
dispatchmax 3 |
commitmax 3 |
fetchmax 4 |
memslots 4 |
renameregs 10 |
reorderbuffer 20 |
Dcache writeallocate 1 |
Scache writeallocate 1 |
Dcache writeback 1 |
Scache writeback 1 |
Dcache ports 2 |
DTcache ports 2 |
writebuffer 4 |
writeholdingtime 5 |
div 10 10 10 10 10 10 |
/silly.mms
0,0 → 1,227
* A program that exercises all MMIX operations (more or less) |
small GREG #abc |
neg_zero GREG #8000000000000000 |
half GREG #3fe0000000000000 |
inf GREG #7ff0000000000000 |
sig_nan GREG #7ff1000000000000 |
round_off GREG ROUND_OFF<<16 |
round_up GREG ROUND_UP<<16 |
round_down GREG ROUND_DOWN<<16 |
addy GREG #7f6001b4c67bc809 |
addz GREG #ff5ffb6a4534a3f7 |
flip GREG #0102040810204080 |
ry GREG |
rz GREG |
LOC Data_Segment |
GREG @ |
Start_Inst SUB $4,half,$1 |
Final_Inst SRU $4,half,1 |
Load_Test OCTA #8081828384858687 |
OCTA #88898a8b8c8d8e8f |
Jmp_Pop JMP @+8 |
POP |
Load_Begin TETRA #5f030405 |
Load_End LDUNC $3,$4,5 |
Big_Begin GO $40,ry,5 |
Big_End ANDNL $40,(ry-$0)<<8+5 |
|
LOC #100 |
Main FCMP $0,neg_zero,$5 |
FCMP $1,neg_zero,inf |
FCMP $2,inf,sig_nan |
FUN $3,sig_nan,sig_nan |
FEQL $4,$4,neg_zero |
FADD $5,half,inf |
FADD $6,half,neg_zero |
FADD $7,half,half |
FADD $8,half,sig_nan |
FSUB $9,half,small |
PUT rA,round_off |
FSUB $9,half,small |
FSUB $9,small,half |
FSQRT $10,$9 |
FSUB $11,sig_nan,$10 |
PUT rA,round_down |
FSUB $12,half,half |
FSUB $12,$20,$21 |
FSUB $12,$20,neg_zero |
PUT rA,round_up |
SUB $0,inf,1 % $0 = largest normal number |
FADD $12,$0,small |
FIX $12,half |
FIXU $14,ROUND_DOWN,$9 |
FLOT $15,ROUND_DOWN,addy |
FLOT $16,ROUND_UP,addy |
NEG $1,1 % $1 = -1 |
FLOT $17,1 |
FLOT $17,$1 |
FLOTU $18,255 |
FLOTU $18,neg_zero |
FIX $13,ROUND_NEAR,$18 |
SFLOT $18,ROUND_DOWN,addy |
SFLOT $19,ROUND_UP,addy |
FSUB $20,$18,$19 |
FSUB $20,$16,$15 |
SFLOT $20,1 |
SFLOT $20,$1 |
SFLOTU $21,$1 |
SFLOTU $21,255 |
FMUL $22,neg_zero,inf |
FMUL $22,half,half |
FMUL $23,small,$0 |
PUT rE,half |
FCMPE $24,half,$21 |
FCMPE $24,neg_zero,small |
FCMPE $24,neg_zero,half |
FCMPE $24,half,inf |
FEQLE $24,$15,$16 |
PUT rE,neg_zero |
FEQLE $24,half,half |
FUNE $24,half,half |
FSQRT $25,ROUND_UP,$0 |
FDIV $26,$0,$25 |
PUT rA,$50 |
FDIV $26,$0,$25 |
FMUL $27,$25,$25 |
FREM $28,$9,half |
FREM $29,$9,small |
FINT $30,$9 |
FINT $30,ROUND_UP,small |
MUL $31,flip,flip |
MUL $32,flip,$1 |
MUL $33,flip,2 |
DIV $32,$32,$1 |
DIV $32,neg_zero,$1 |
MULU $32,flip,$1 |
MULU $31,flip,flip |
GET $33,rH |
PUT rD,$33 |
DIV $33,$1,3 |
DIVU $34,$31,flip |
ADD $35,addy,addz |
FADD $36,addy,addz |
CMP $37,$36,$35 |
GETA $3,1F |
PUT rW,$3 |
LDT $6,Start_Inst |
LDTU $7,Final_Inst |
1H CMP $5,$6,$7 |
BNN $5,1F |
INCML $6,#100 % increase the opcode |
PUT rX,$6 % ropcode 0 |
RESUME % return to 1B |
1H BN $0,@+4*6 |
PBN $0,@-4*1 |
BNN $0,@+4*6 |
PBN $0,@+4*5 |
PBNN $0,@+4*5 |
BN $0,@-4*3 |
BNN $0,@-4*3 |
PBN $0,@-4*3 |
PBNN $0,@-4*3 |
BZ $0,@+4*6 |
PBZ $0,@-4*1 |
BNZ $0,@+4*6 |
PBZ $0,@+4*5 |
PBNZ $0,@+4*5 |
BZ $0,@-4*3 |
BNZ $0,@-4*3 |
PBZ $0,@-4*3 |
PBNZ $0,@-4*3 |
BP $0,@+4*6 |
PBP $0,@-4*1 |
BNP $0,@+4*6 |
PBP $0,@+4*5 |
PBNP $0,@+4*5 |
BP $0,@-4*3 |
BNP $0,@-4*3 |
PBP $0,@-4*3 |
PBNP $0,@-4*3 |
BOD $0,@+4*6 |
PBOD $0,@-4*1 |
BEV $0,@+4*6 |
PBOD $0,@+4*5 |
PBEV $0,@+4*5 |
BOD $0,@-4*3 |
BEV $0,@-4*3 |
PBOD $0,@-4*3 |
PBEV $0,@-4*3 |
LDA $4,Load_Test+4 |
GETA $3,1F |
PUT rW,$3 |
LDTU $7,Load_End |
LDTU $6,Load_Begin |
1H CMPU $8,$6,$7 |
BNN $8,1F |
INCML $6,#100 % increase the opcode |
PUT rX,$6 |
RESUME % return to 1B |
2H OCTA #fedcba9876543210 % becomes Jmp_Pop |
OCTA #ffeeddccbbaa9988 % becomes Jmp_Pop |
NEG ry,addy |
SET rz,flip |
PUT rM,addz |
POP |
1H GETA $4,2B |
SETL $7,4*11 |
GO $7,$7,$4 |
GO $7,$4,4*12 |
PRELD 70,$4,$4 |
PRELD 70,$4,0 |
PREGO 70,$4,$4 |
PREGO 70,$4,0 |
CSWAP $3,Load_Test+13 |
GETA $3,1F |
PUT rW,$3 |
SETL rz,1 |
ADD ry,$4,4 |
LDOU $40,Jmp_Pop |
LDTU $7,Big_End |
LDTU $6,Big_Begin |
1H CMPU $8,$6,$7 |
BNN $8,1F |
INCML $6,#100 % increase the opcode |
PUT rX,$6 |
SET $5,rz |
RESUME % return to 1B |
1H SL $40,small,51 |
SL $40,small,52 |
SAVE $255,0 |
PUT rG,small-$0 |
INCL small-1,U_BIT<<8 |
FADD $100,small,$200 |
PUT rA,small-1 % enable underflow trip |
TRIP 1,$100,small |
FSUB $100,small,$200 % cause underflow trip |
PUT rL,10 |
PUT rL,small |
PUSHJ 11,@+4 |
UNSAVE $255 |
TRAP 0,Halt,0 % normal exit |
|
LOC U_Handler |
PUSHJ $255,Handler |
3H TRAP 0,$1 |
SUB $0,$1,1 |
POP 2,0 |
4H GET $50,rX |
INCH $50,#8100 % ropcode 1 |
FLOT $60,1 |
PUT rZ,$60 |
JMP 2F |
|
LOC 0 |
GET $50,rX |
INCH $50,#8200 % ropcode 2 |
INCMH $50,#ff00-(U_BIT<<8) |
TRAP 1 |
2H PUT rX,$50 |
GET $255,rB |
RESUME |
Handler SETL $5,#abcd |
GET $1,rJ |
PUSHJ 3,3B |
SUB $10,$3,$4 |
PUT rJ,$1 |
POP 11,(4B-3B)>>2 |
|
/coolcomb.mms
0,0 → 1,27
* The "cool-lex" combinations of Ruskey and Williams, ex 7.2.1.3--55(b) |
s IS 4 % the number of 0-bits in each combination |
t IS 3 % the number of 1-bits in each combination; s+t<=8 here |
bits GREG 0 |
ptr GREG 0 |
LOC #100 |
Main LDA ptr,Data_Segment % assemble this with the -x switch! |
SET bits,(1<<t)-1 |
1H PUSHJ $0,Visit |
ADDU $0,bits,1 |
AND $0,$0,bits |
SUBU $1,$0,1 |
XOR $1,$1,$0 |
ADDU $0,$1,1 |
AND $1,$1,bits |
AND $0,$0,bits |
ODIF $0,$0,1 |
SUBU $1,$1,$0 |
ADDU bits,bits,$1 |
SRU $0,bits,s+t |
PBZ $0,1B |
TRAP 0,Halt,0 % simulate this with the -I switch! |
Visit STBU bits,ptr,0 |
INCL ptr,1 |
POP 0,0 |
|
|
/valid.mms
0,0 → 1,47
LOC 4 |
LDVTS $0,$0,0 |
LOC #100 |
Main SET $1,4 |
PUSHJ 0,InstTest |
JMP Main |
|
a IS #ffffffff % table entry when anything goes |
b IS #ffff04ff % table entry when Y <= ROUND_NEAR |
c IS #001f00ff % table entry for PUT and PUTI |
d IS #ff000000 % table entry for RESUME |
e IS #ffff0000 % table entry for SAVE |
f IS #ff0000ff % table entry for UNSAVE |
g IS #ff000003 % table entry for SYNC |
h IS #ffff001f % table entry for GET |
table GREG @ |
TETRA a,a,a,a,a,b,a,b,b,b,b,b,b,b,b,b % 0x |
TETRA a,a,a,a,a,b,a,b,a,a,a,a,a,a,a,a % 1x |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 2x |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 3x |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 4x |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 5x |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 6x |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 7x |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % 8x |
TETRA a,a,a,a,a,a,a,a,0,0,a,a,a,a,a,a % 9x |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Ax |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Bx |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Cx |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Dx |
TETRA a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a % Ex |
TETRA a,a,a,a,a,a,c,c,a,d,e,f,g,a,h,a % Fx |
tetra IS $1 |
maxXYZ IS $2 |
InstTest BN $0,9F |
LDTU tetra,$0,0 |
SR $0,tetra,22 |
LDT maxXYZ,table,$0 |
BDIF $0,tetra,maxXYZ |
PBNP maxXYZ,9F |
ANDNML $0,#ff00 |
BNZ $0,9F |
MOR tetra,tetra,#4 |
CMP $0,tetra,18 |
CSP tetra,$0,0 |
ODIF $0,tetra,7 |
9H POP 1,0 |
/halves.mmix
0,0 → 1,39
000000001000: Hand-assembled halves program |
c1f8fa0033f60000 1000: Main OR p,pbase,0; SETL carry,0 |
f000000420f5f5f6 1008: JMP 1F;Loop ADD acc,acc,carry |
77f6f705a1f5f800 1010: ZSOD carry,starp,5; STB acc,p,0 |
81f7f80137f80001 1018: 1H LDB starp,p,1; INCL p,1 |
80f5f9f75bf7fffa 1020: LDB acc,half,starp; PBNZ starp,Loop |
a1f5f800f1fffff5 1028: STB acc,p,0; JMP Main |
100000000000: must preload this address into g249 (f9) |
3500000000000000 HALF: BYTE '5' |
100000000030: |
3030313132323333 BYTE "00112233" |
3434310000000000 BYTE "441",0 |
200000000000: bottom of stack |
0000000000000000 20...000: rL |
0000000000001000 20...008: f4 |
0000000000000000 20...010: f5 |
0000000000000000 20...018: f6 |
0000000000000000 20...020: f7 |
0000000000000000 20...028: f8 |
0000100000000000 20...030: f9 |
0000100000000039 20...038: fa |
0000000000000000 20...040: fb |
0000000000000000 20...048: fc |
0000000000000000 20...050: fd |
0000000000000000 20...058: fe |
0000000000000000 20...060: ff |
0000000000000000 20...068: rB |
0000000000000000 20...070: rD |
0000000000000000 20...078: rE |
0000000000000000 20...080: rH |
0000000000000000 20...088: rJ |
0000000000000000 20...090: rM |
0000000000000000 20...098: rP |
0000000000000000 20...0a0: rR |
0000000000000000 20...0a8: rW |
0000000000000000 20...0b0: rX |
0000000000000000 20...0b8: rY |
0000000000000000 20...0c0: rZ |
f400000000000000 20...0c8: rG and rA |
/primes.mmconfig
0,0 → 1,41
% configuration for primes test -- still in preparation |
memaddresstime 3 |
memreadtime 10 memwritetime 10 |
membusbytes 16 |
%branchpredictbits 2 |
%branchaddressbits 1 |
%branchhistorybits 1 |
%branchdualbits 1 |
memchunksmax 4 |
hashprime 5 |
Scache blocksize 64 |
Scache setsize 512 |
Scache associativity 4 pseudolru |
Scache accesstime 2 |
Dcache blocksize 32 |
Dcache setsize 256 |
Icache blocksize 32 |
Icache setsize 256 |
Icache victimsize 2 |
unit ALU1 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit ALU2 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe |
unit LSU1 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit LSU2 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit LSU3 00000000000000000000000000000000fffffffcfffffffc0000000000000000 |
unit MUL1 000080f000000000000000000000000000000000000000000000000000000000 |
unit DIV1 00000c0f00000000000000000000000000000000000000000000000000000000 |
unit FPU1 7fff730000000000000000000000000000000000000000000000000000000000 |
%dispatchmax 3 |
%commitmax 3 |
%fetchmax 4 |
memslots 4 |
renameregs 10 |
reorderbuffer 20 |
Dcache writeallocate 1 |
Scache writeallocate 1 |
Dcache writeback 1 |
Scache writeback 1 |
Dcache ports 2 |
DTcache ports 2 |
writebuffer 4 |
writeholdingtime 5 |
/test2.mmconfig
0,0 → 1,35
% configuration for test2.mmix |
memaddresstime 3 |
memreadtime 5 memwritetime 4 % don't ask why |
membusbytes 16 |
writeholdingtime 5 |
branchpredictbits 2 |
branchaddressbits 1 |
branchhistorybits 1 |
branchdualbits 1 |
memchunksmax 4 |
hashprime 5 |
Scache blocksize 64 |
Scache setsize 2 |
Scache associativity 4 pseudolru |
Scache accesstime 2 |
Dcache blocksize 32 |
Dcache setsize 8 |
Icache setsize 2 |
Icache victimsize 2 |
DTcache associativity 4 pseudolru |
ITcache associativity 2 |
unit UNI1 fffffeffffffffffffffffffffffffffffffffffffffffffffffffffffffffff |
unit UNI2 FFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF |
dispatchmax 3 |
commitmax 3 |
fetchmax 4 |
memslots 4 |
Dcache writeallocate 1 |
Scache writeallocate 1 |
Dcache writeback 1 |
Scache writeback 1 |
%Dcache ports 2 |
Icache blocksize 32 |
%hardwarepagetable 0 % for this version, start at 8000000000000070 |
%writebuffer 7 |
/silly.out
0,0 → 1,1679
mmix> i silly.run |
GOOD LUCK, DEAR SIMULATOR |
(rG,rA)=M8[#60000000000000f0]=#f100000000000000 |
rS-=8, rZ=M8[#60000000000000e8]=#0000000000000000 |
rS-=8, rY=M8[#60000000000000e0]=#0000000000000000 |
rS-=8, rX=M8[#60000000000000d8]=#0000000000000000 |
rS-=8, rW=M8[#60000000000000d0]=#0000000000000000 |
rS-=8, rP=M8[#60000000000000c8]=#0000000000000000 |
rS-=8, rR=M8[#60000000000000c0]=#0000000000000000 |
rS-=8, rM=M8[#60000000000000b8]=#0000000000000000 |
rS-=8, rJ=M8[#60000000000000b0]=#0000000000000000 |
rS-=8, rH=M8[#60000000000000a8]=#0000000000000000 |
rS-=8, rE=M8[#60000000000000a0]=#0000000000000000 |
rS-=8, rD=M8[#6000000000000098]=#0000000000000000 |
rS-=8, rB=M8[#6000000000000090]=#0000000000000000 |
rS-=8, g[255]=M8[#6000000000000088]=#0000000000000100 |
rS-=8, g[254]=M8[#6000000000000080]=#0000000000000abc |
rS-=8, g[253]=M8[#6000000000000078]=#8000000000000000 |
rS-=8, g[252]=M8[#6000000000000070]=#3fe0000000000000 |
rS-=8, g[251]=M8[#6000000000000068]=#7ff0000000000000 |
rS-=8, g[250]=M8[#6000000000000060]=#7ff1000000000000 |
rS-=8, g[249]=M8[#6000000000000058]=#0000000000010000 |
rS-=8, g[248]=M8[#6000000000000050]=#0000000000020000 |
rS-=8, g[247]=M8[#6000000000000048]=#0000000000030000 |
rS-=8, g[246]=M8[#6000000000000040]=#7f6001b4c67bc809 |
rS-=8, g[245]=M8[#6000000000000038]=#ff5ffb6a4534a3f7 |
rS-=8, g[244]=M8[#6000000000000030]=#0102040810204080 |
rS-=8, g[243]=M8[#6000000000000028]=#0000000000000000 |
rS-=8, g[242]=M8[#6000000000000020]=#0000000000000000 |
rS-=8, g[241]=M8[#6000000000000018]=#2000000000000000 |
rS-=8, l[2]=M8[#6000000000000010]=#0000000000000002 |
rS-=8, l[1]=M8[#6000000000000008]=#4000000000000008 |
rS-=8, l[0]=M8[#6000000000000000]=#0000000000000001 |
(00000000000000fc: fb0000ff (UNSAVE)) #60000000000000f0: rG=241, ..., rL=2 |
"silly.mms" |
line 29: Main FCMP $0,neg_zero,$5 |
1. 0000000000000100: 0100fd05 (FCMP) $0=l[0] = -0. cmp 0. = 0 |
line 30: FCMP $1,neg_zero,inf |
1. 0000000000000104: 0101fdfb (FCMP) $1=l[1] = -0. cmp Inf = -1 |
line 31: FCMP $2,inf,sig_nan |
1. 0000000000000108: 0102fbfa (FCMP) rL=3, $2=l[2] = Inf cmp NaN.0625 = 0, rA=#00010 |
line 32: FUN $3,sig_nan,sig_nan |
1. 000000000000010c: 0203fafa (FUN) rL=4, $3=l[3] = [NaN.0625(||)NaN.0625] = 1 |
line 33: FEQL $4,$4,neg_zero |
1. 0000000000000110: 030404fd (FEQL) rL=5, $4=l[4] = [0.(==)-0.] = 1 |
line 34: FADD $5,half,inf |
1. 0000000000000114: 0405fcfb (FADD) rL=6, $5=l[5] = .5 (+) Inf = Inf |
line 35: FADD $6,half,neg_zero |
1. 0000000000000118: 0406fcfd (FADD) rL=7, $6=l[6] = .5 (+) -0. = .5 |
line 36: FADD $7,half,half |
1. 000000000000011c: 0407fcfc (FADD) rL=8, $7=l[7] = .5 (+) .5 = 1. |
line 37: FADD $8,half,sig_nan |
1. 0000000000000120: 0408fcfa (FADD) rL=9, $8=l[8] = .5 (+) NaN.0625 = NaN.5625, rA=#00010 |
line 38: FSUB $9,half,small |
1. 0000000000000124: 0609fcfe (FSUB) rL=10, $9=l[9] = .5 (-) 1.3577e-320 = .5, rA=#00011 |
line 39: PUT rA,round_off |
1. 0000000000000128: f61500f9 (PUT) rA = 65536 = #10000 |
line 40: FSUB $9,half,small |
1. 000000000000012c: 0609fcfe (FSUB) $9=l[9] = .5 [-] 1.3577e-320 = .49999999999999994, rA=#10001 |
line 41: FSUB $9,small,half |
1. 0000000000000130: 0609fefc (FSUB) $9=l[9] = 1.3577e-320 [-] .5 = -.49999999999999994, rA=#10001 |
line 42: FSQRT $10,$9 |
1. 0000000000000134: 150a0009 (FSQRT) rL=11, $10=l[10] = [sqrt] -.49999999999999994 = -NaN, rA=#10011 |
line 43: FSUB $11,sig_nan,$10 |
1. 0000000000000138: 060bfa0a (FSUB) rL=12, $11=l[11] = NaN.0625 [-] -NaN = -NaN, rA=#10011 |
line 44: PUT rA,round_down |
1. 000000000000013c: f61500f7 (PUT) rA = 196608 = #30000 |
line 45: FSUB $12,half,half |
1. 0000000000000140: 060cfcfc (FSUB) rL=13, $12=l[12] = .5 _-_ .5 = -0. |
line 46: FSUB $12,$20,$21 |
1. 0000000000000144: 060c1415 (FSUB) $12=l[12] = 0. _-_ 0. = -0. |
line 47: FSUB $12,$20,neg_zero |
1. 0000000000000148: 060c14fd (FSUB) $12=l[12] = 0. _-_ -0. = 0. |
line 48: PUT rA,round_up |
1. 000000000000014c: f61500f8 (PUT) rA = 131072 = #20000 |
line 49: SUB $0,inf,1 % $0 = largest normal number |
1. 0000000000000150: 2500fb01 (SUBI) $0=l[0] = 9218868437227405312 - 1 = 9218868437227405311 |
line 50: FADD $12,$0,small |
1. 0000000000000154: 040c00fe (FADD) $12=l[12] = 1.7976931348623157e308 ^+^ 1.3577e-320 = Inf, rA=#20009 |
line 51: FIX $12,half |
1. 0000000000000158: 050c00fc (FIX) $12=l[12] = ^fix^ .5 = 1 |
line 52: FIXU $14,ROUND_DOWN,$9 |
1. 000000000000015c: 070e0309 (FIXU) rL=15, $14=l[14] = _fix_ -.49999999999999994 = #ffffffffffffffff |
line 53: FLOT $15,ROUND_DOWN,addy |
1. 0000000000000160: 080f03f6 (FLOT) rL=16, $15=l[15] = _flot_ 9178337916516812809 = 9.178337916516813e18, rA=#20009 |
line 54: FLOT $16,ROUND_UP,addy |
1. 0000000000000164: 081002f6 (FLOT) rL=17, $16=l[16] = ^flot^ 9178337916516812809 = 9.178337916516814e18, rA=#20009 |
line 55: NEG $1,1 % $1 = -1 |
1. 0000000000000168: 35010001 (NEGI) $1=l[1] = 0 - 1 = -1 |
line 56: FLOT $17,1 |
1. 000000000000016c: 09110001 (FLOTI) rL=18, $17=l[17] = ^flot^ 1 = 1. |
line 57: FLOT $17,$1 |
1. 0000000000000170: 08110001 (FLOT) $17=l[17] = ^flot^ -1 = -1. |
line 58: FLOTU $18,255 |
1. 0000000000000174: 0b1200ff (FLOTUI) rL=19, $18=l[18] = ^flot^ 255 = 255. |
line 59: FLOTU $18,neg_zero |
1. 0000000000000178: 0a1200fd (FLOTU) $18=l[18] = ^flot^ #8000000000000000 = 9.223372036854776e18 |
line 60: FIX $13,ROUND_NEAR,$18 |
1. 000000000000017c: 050d0412 (FIX) $13=l[13] = (fix) 9.223372036854776e18 = -9223372036854775808, rA=#20029 |
line 61: SFLOT $18,ROUND_DOWN,addy |
1. 0000000000000180: 0c1203f6 (SFLOT) $18=l[18] = _sflot_ 9178337916516812809 = 9.178337689848513e18, rA=#20029 |
line 62: SFLOT $19,ROUND_UP,addy |
1. 0000000000000184: 0c1302f6 (SFLOT) rL=20, $19=l[19] = ^sflot^ 9178337916516812809 = 9.178338239604326e18, rA=#20029 |
line 63: FSUB $20,$18,$19 |
1. 0000000000000188: 06141213 (FSUB) rL=21, $20=l[20] = 9.178337689848513e18 ^-^ 9.178338239604326e18 = -549755813888. |
line 64: FSUB $20,$16,$15 |
1. 000000000000018c: 0614100f (FSUB) $20=l[20] = 9.178337916516814e18 ^-^ 9.178337916516813e18 = 1024. |
line 65: SFLOT $20,1 |
1. 0000000000000190: 0d140001 (SFLOTI) $20=l[20] = ^sflot^ 1 = 1. |
line 66: SFLOT $20,$1 |
1. 0000000000000194: 0c140001 (SFLOT) $20=l[20] = ^sflot^ -1 = -1. |
line 67: SFLOTU $21,$1 |
1. 0000000000000198: 0e150001 (SFLOTU) rL=22, $21=l[21] = ^sflot^ #ffffffffffffffff = 1.8446744073709552e19, rA=#20029 |
line 68: SFLOTU $21,255 |
1. 000000000000019c: 0f1500ff (SFLOTUI) $21=l[21] = ^sflot^ 255 = 255. |
line 69: FMUL $22,neg_zero,inf |
1. 00000000000001a0: 1016fdfb (FMUL) rL=23, $22=l[22] = -0. ^*^ Inf = -NaN, rA=#20039 |
line 70: FMUL $22,half,half |
1. 00000000000001a4: 1016fcfc (FMUL) $22=l[22] = .5 ^*^ .5 = .25 |
line 71: FMUL $23,small,$0 |
1. 00000000000001a8: 1017fe00 (FMUL) rL=24, $23=l[23] = 1.3577e-320 ^*^ 1.7976931348623157e308 = 2.440714297335944e-12, rA=#20039 |
line 72: PUT rE,half |
1. 00000000000001ac: f60200fc (PUT) rE = 4602678819172646912 = #3fe0000000000000 |
line 73: FCMPE $24,half,$21 |
1. 00000000000001b0: 1118fc15 (FCMPE) rL=25, $24=l[24] = .5 cmp 255. (.5)) = -1 |
line 74: FCMPE $24,neg_zero,small |
1. 00000000000001b4: 1118fdfe (FCMPE) $24=l[24] = -0. cmp 1.3577e-320 (.5)) = 0 |
line 75: FCMPE $24,neg_zero,half |
1. 00000000000001b8: 1118fdfc (FCMPE) $24=l[24] = -0. cmp .5 (.5)) = 0 |
line 76: FCMPE $24,half,inf |
1. 00000000000001bc: 1118fcfb (FCMPE) $24=l[24] = .5 cmp Inf (.5)) = -1 |
line 77: FEQLE $24,$15,$16 |
1. 00000000000001c0: 13180f10 (FEQLE) $24=l[24] = [9.178337916516813e18(==)9.178337916516814e18 (.5)] = 1 |
line 78: PUT rE,neg_zero |
1. 00000000000001c4: f60200fd (PUT) rE = -9223372036854775808 = #8000000000000000 |
line 79: FEQLE $24,half,half |
1. 00000000000001c8: 1318fcfc (FEQLE) $24=l[24] = [.5(==).5 (-0.)] = 0, rA=#20039 |
line 80: FUNE $24,half,half |
1. 00000000000001cc: 1218fcfc (FUNE) $24=l[24] = [.5(||).5 (-0.)] = 1 |
line 81: FSQRT $25,ROUND_UP,$0 |
1. 00000000000001d0: 15190200 (FSQRT) rL=26, $25=l[25] = ^sqrt^ 1.7976931348623157e308 = 1.3407807929942597e154, rA=#20039 |
line 82: FDIV $26,$0,$25 |
1. 00000000000001d4: 141a0019 (FDIV) rL=27, $26=l[26] = 1.7976931348623157e308 ^/^ 1.3407807929942597e154 = 1.3407807929942595e154 |
line 83: PUT rA,$50 |
1. 00000000000001d8: f6150032 (PUT) rA = 0 = #0 |
line 84: FDIV $26,$0,$25 |
1. 00000000000001dc: 141a0019 (FDIV) $26=l[26] = 1.7976931348623157e308 (/) 1.3407807929942597e154 = 1.3407807929942595e154 |
line 85: FMUL $27,$25,$25 |
1. 00000000000001e0: 101b1919 (FMUL) rL=28, $27=l[27] = 1.3407807929942597e154 (*) 1.3407807929942597e154 = Inf, rA=#00009 |
line 86: FREM $28,$9,half |
1. 00000000000001e4: 161c09fc (FREM) rL=29, $28=l[28] = -.49999999999999994 (rem) .5 = 5.551115123125783e-17 |
line 87: FREM $29,$9,small |
1. 00000000000001e8: 161d09fe (FREM) rL=30, $29=l[29] = -.49999999999999994 (rem) 1.3577e-320 = -2.866e-321 |
line 88: FINT $30,$9 |
1. 00000000000001ec: 171e0009 (FINT) rL=31, $30=l[30] = (int) -.49999999999999994 = -0. |
line 89: FINT $30,ROUND_UP,small |
1. 00000000000001f0: 171e02fe (FINT) $30=l[30] = ^int^ 1.3577e-320 = 1. |
line 90: MUL $31,flip,flip |
1. 00000000000001f4: 181ff4f4 (MUL) rL=32, $31=l[31] = 72624976668147840 * 72624976668147840 = 507802986467049472, rA=#00049 |
line 91: MUL $32,flip,$1 |
1. 00000000000001f8: 1820f401 (MUL) rL=33, $32=l[32] = 72624976668147840 * -1 = -72624976668147840 |
line 92: MUL $33,flip,2 |
1. 00000000000001fc: 1921f402 (MULI) rL=34, $33=l[33] = 72624976668147840 * 2 = 145249953336295680 |
line 93: DIV $32,$32,$1 |
1. 0000000000000200: 1c202001 (DIV) $32=l[32] = -72624976668147840 / -1 = 72624976668147840, rR=0 |
line 94: DIV $32,neg_zero,$1 |
1. 0000000000000204: 1c20fd01 (DIV) $32=l[32] = -9223372036854775808 / -1 = -9223372036854775808, rR=0, rA=#00049 |
line 95: MULU $32,flip,$1 |
1. 0000000000000208: 1a20f401 (MULU) $32=l[32] = #102040810204080 * #ffffffffffffffff = #fefdfbf7efdfbf80, rH=#10204081020407f |
line 96: MULU $31,flip,flip |
1. 000000000000020c: 1a1ff4f4 (MULU) $31=l[31] = #102040810204080 * #102040810204080 = #70c142030404000, rH=#1040c2050c1c4 |
line 97: GET $33,rH |
1. 0000000000000210: fe210003 (GET) $33=l[33] = rH = #1040c2050c1c4 |
line 98: PUT rD,$33 |
1. 0000000000000214: f6010021 (PUT) rD = 285925104992708 = #1040c2050c1c4 |
line 99: DIV $33,$1,3 |
1. 0000000000000218: 1d210103 (DIVI) $33=l[33] = -1 / 3 = -1, rR=2 |
line 100: DIVU $34,$31,flip |
1. 000000000000021c: 1e221ff4 (DIVU) rL=35, $34=l[34] = #1040c2050c1c4070c142030404000 / #102040810204080 = #102040810204080, rR=#0 |
line 101: ADD $35,addy,addz |
1. 0000000000000220: 2023f6f5 (ADD) rL=36, $35=l[35] = 9178337916516812809 + -45041037404232713 = 9133296879112580096 |
line 102: FADD $36,addy,addz |
1. 0000000000000224: 0424f6f5 (FADD) rL=37, $36=l[36] = 3.51258193065761e305 (+) -3.5091543080293444e305 = 3.4276226282657902e302 |
line 103: CMP $37,$36,$35 |
1. 0000000000000228: 30252423 (CMP) rL=38, $37=l[37] = 9133296879112580096 cmp 9133296879112580096 = 0 |
line 104: GETA $3,1F |
1. 000000000000022c: f4030004 (GETA) $3=l[3] = #23c |
line 105: PUT rW,$3 |
1. 0000000000000230: f6180003 (PUT) rW = 572 = #23c |
line 106: LDT $6,Start_Inst |
1. 0000000000000234: 8906f100 (LDTI) $6=l[6] = M4[#2000000000000000] = 604306433 |
line 107: LDTU $7,Final_Inst |
1. 0000000000000238: 8b07f104 (LDTUI) $7=l[7] = M4[#2000000000000000+4] = #3f04fc01 |
line 108: 1H CMP $5,$6,$7 |
1. 000000000000023c: 30050607 (CMP) $5=l[5] = 604306433 cmp 1057291265 = -1 |
line 109: BNN $5,1F |
1. 0000000000000240: 48050004 (BNN) -1>=0? No |
line 110: INCML $6,#100 % increase the opcode |
1. 0000000000000244: e6060100 (INCML) $6=l[6] = #2404fc01 + #1000000 = #2504fc01 |
line 111: PUT rX,$6 % ropcode 0 |
1. 0000000000000248: f6190006 (PUT) rX = 621083649 = #2504fc01 |
line 112: RESUME % return to 1B |
1. 000000000000024c: f9000000 (RESUME) {#2504fc01} -> #23c |
(0000000000000238: 2504fc01 (SUBI)) $4=l[4] = 4602678819172646912 - 1 = 4602678819172646911 |
-------- |
line 108: 1H CMP $5,$6,$7 |
2. 000000000000023c: 30050607 (CMP) $5=l[5] = 621083649 cmp 1057291265 = -1 |
line 109: BNN $5,1F |
2. 0000000000000240: 48050004 (BNN) -1>=0? No |
line 110: INCML $6,#100 % increase the opcode |
2. 0000000000000244: e6060100 (INCML) $6=l[6] = #2504fc01 + #1000000 = #2604fc01 |
line 111: PUT rX,$6 % ropcode 0 |
2. 0000000000000248: f6190006 (PUT) rX = 637860865 = #2604fc01 |
line 112: RESUME % return to 1B |
2. 000000000000024c: f9000000 (RESUME) {#2604fc01} -> #23c |
(0000000000000238: 2604fc01 (SUBU)) $4=l[4] = #3fe0000000000000 - #ffffffffffffffff = #3fe0000000000001 |
............................................... |
line 112: RESUME % return to 1B |
3. 000000000000024c: f9000000 (RESUME) {#2704fc01} -> #23c |
(0000000000000238: 2704fc01 (SUBUI)) $4=l[4] = #3fe0000000000000 - 1 = #3fdfffffffffffff |
............................................... |
line 112: RESUME % return to 1B |
4. 000000000000024c: f9000000 (RESUME) {#2804fc01} -> #23c |
(0000000000000238: 2804fc01 (2ADDU)) $4=l[4] = #3fe0000000000000 <<1+ #ffffffffffffffff = #7fbfffffffffffff |
............................................... |
line 112: RESUME % return to 1B |
5. 000000000000024c: f9000000 (RESUME) {#2904fc01} -> #23c |
(0000000000000238: 2904fc01 (2ADDUI)) $4=l[4] = #3fe0000000000000 <<1+ 1 = #7fc0000000000001 |
............................................... |
line 112: RESUME % return to 1B |
6. 000000000000024c: f9000000 (RESUME) {#2a04fc01} -> #23c |
(0000000000000238: 2a04fc01 (4ADDU)) $4=l[4] = #3fe0000000000000 <<2+ #ffffffffffffffff = #ff7fffffffffffff |
............................................... |
line 112: RESUME % return to 1B |
7. 000000000000024c: f9000000 (RESUME) {#2b04fc01} -> #23c |
(0000000000000238: 2b04fc01 (4ADDUI)) $4=l[4] = #3fe0000000000000 <<2+ 1 = #ff80000000000001 |
............................................... |
line 112: RESUME % return to 1B |
8. 000000000000024c: f9000000 (RESUME) {#2c04fc01} -> #23c |
(0000000000000238: 2c04fc01 (8ADDU)) $4=l[4] = #3fe0000000000000 <<3+ #ffffffffffffffff = #feffffffffffffff |
............................................... |
line 112: RESUME % return to 1B |
9. 000000000000024c: f9000000 (RESUME) {#2d04fc01} -> #23c |
(0000000000000238: 2d04fc01 (8ADDUI)) $4=l[4] = #3fe0000000000000 <<3+ 1 = #ff00000000000001 |
............................................... |
line 112: RESUME % return to 1B |
10. 000000000000024c: f9000000 (RESUME) {#2e04fc01} -> #23c |
(0000000000000238: 2e04fc01 (16ADDU)) $4=l[4] = #3fe0000000000000 <<4+ #ffffffffffffffff = #fdffffffffffffff |
............................................... |
line 112: RESUME % return to 1B |
11. 000000000000024c: f9000000 (RESUME) {#2f04fc01} -> #23c |
(0000000000000238: 2f04fc01 (16ADDUI)) $4=l[4] = #3fe0000000000000 <<4+ 1 = #fe00000000000001 |
............................................... |
line 112: RESUME % return to 1B |
12. 000000000000024c: f9000000 (RESUME) {#3004fc01} -> #23c |
(0000000000000238: 3004fc01 (CMP)) $4=l[4] = 4602678819172646912 cmp -1 = 1 |
............................................... |
line 112: RESUME % return to 1B |
13. 000000000000024c: f9000000 (RESUME) {#3104fc01} -> #23c |
(0000000000000238: 3104fc01 (CMPI)) $4=l[4] = 4602678819172646912 cmp 1 = 1 |
............................................... |
line 112: RESUME % return to 1B |
14. 000000000000024c: f9000000 (RESUME) {#3204fc01} -> #23c |
(0000000000000238: 3204fc01 (CMPU)) $4=l[4] = #3fe0000000000000 cmp #ffffffffffffffff = -1 |
............................................... |
line 112: RESUME % return to 1B |
15. 000000000000024c: f9000000 (RESUME) {#3304fc01} -> #23c |
(0000000000000238: 3304fc01 (CMPUI)) $4=l[4] = #3fe0000000000000 cmp 1 = 1 |
............................................... |
line 112: RESUME % return to 1B |
16. 000000000000024c: f9000000 (RESUME) {#3404fc01} -> #23c |
(0000000000000238: 3404fc01 (NEG)) $4=l[4] = 252 - -1 = 253 |
............................................... |
line 112: RESUME % return to 1B |
17. 000000000000024c: f9000000 (RESUME) {#3504fc01} -> #23c |
(0000000000000238: 3504fc01 (NEGI)) $4=l[4] = 252 - 1 = 251 |
............................................... |
line 112: RESUME % return to 1B |
18. 000000000000024c: f9000000 (RESUME) {#3604fc01} -> #23c |
(0000000000000238: 3604fc01 (NEGU)) $4=l[4] = 252 - #ffffffffffffffff = #fd |
............................................... |
line 112: RESUME % return to 1B |
19. 000000000000024c: f9000000 (RESUME) {#3704fc01} -> #23c |
(0000000000000238: 3704fc01 (NEGUI)) $4=l[4] = 252 - 1 = #fb |
............................................... |
line 112: RESUME % return to 1B |
20. 000000000000024c: f9000000 (RESUME) {#3804fc01} -> #23c |
(0000000000000238: 3804fc01 (SL)) $4=l[4] = 4602678819172646912 << #ffffffffffffffff = 0, rA=#00049 |
............................................... |
line 112: RESUME % return to 1B |
21. 000000000000024c: f9000000 (RESUME) {#3904fc01} -> #23c |
(0000000000000238: 3904fc01 (SLI)) $4=l[4] = 4602678819172646912 << 1 = 9205357638345293824 |
............................................... |
line 112: RESUME % return to 1B |
22. 000000000000024c: f9000000 (RESUME) {#3a04fc01} -> #23c |
(0000000000000238: 3a04fc01 (SLU)) $4=l[4] = #3fe0000000000000 << #ffffffffffffffff = #0 |
............................................... |
line 112: RESUME % return to 1B |
23. 000000000000024c: f9000000 (RESUME) {#3b04fc01} -> #23c |
(0000000000000238: 3b04fc01 (SLUI)) $4=l[4] = #3fe0000000000000 << 1 = #7fc0000000000000 |
............................................... |
line 112: RESUME % return to 1B |
24. 000000000000024c: f9000000 (RESUME) {#3c04fc01} -> #23c |
(0000000000000238: 3c04fc01 (SR)) $4=l[4] = 4602678819172646912 >> #ffffffffffffffff = 0 |
............................................... |
line 112: RESUME % return to 1B |
25. 000000000000024c: f9000000 (RESUME) {#3d04fc01} -> #23c |
(0000000000000238: 3d04fc01 (SRI)) $4=l[4] = 4602678819172646912 >> 1 = 2301339409586323456 |
............................................... |
line 112: RESUME % return to 1B |
26. 000000000000024c: f9000000 (RESUME) {#3e04fc01} -> #23c |
(0000000000000238: 3e04fc01 (SRU)) $4=l[4] = #3fe0000000000000 >> #ffffffffffffffff = #0 |
............................................... |
line 112: RESUME % return to 1B |
27. 000000000000024c: f9000000 (RESUME) {#3f04fc01} -> #23c |
(0000000000000238: 3f04fc01 (SRUI)) $4=l[4] = #3fe0000000000000 >> 1 = #1ff0000000000000 |
............................................... |
line 113: 1H BN $0,@+4*6 |
1. 0000000000000250: 40000006 (BN) 9218868437227405311<0? No |
line 114: PBN $0,@-4*1 |
1. 0000000000000254: 5100ffff (PBNB) 9218868437227405311<0? No (bad guess) |
line 115: BNN $0,@+4*6 |
1. 0000000000000258: 48000006 (BNN) 9218868437227405311>=0? Yes, -> #270 (bad guess) |
... |
line 121: PBNN $0,@-4*3 |
1. 0000000000000270: 5900fffd (PBNNB) 9218868437227405311>=0? Yes, -> #264 |
-------- |
line 118: BN $0,@-4*3 |
1. 0000000000000264: 4100fffd (BNB) 9218868437227405311<0? No |
line 119: BNN $0,@-4*3 |
1. 0000000000000268: 4900fffd (BNNB) 9218868437227405311>=0? Yes, -> #25c (bad guess) |
-------- |
line 116: PBN $0,@+4*5 |
1. 000000000000025c: 50000005 (PBN) 9218868437227405311<0? No (bad guess) |
line 117: PBNN $0,@+4*5 |
1. 0000000000000260: 58000005 (PBNN) 9218868437227405311>=0? Yes, -> #274 |
... |
line 122: BZ $0,@+4*6 |
1. 0000000000000274: 42000006 (BZ) 9218868437227405311==0? No |
line 123: PBZ $0,@-4*1 |
1. 0000000000000278: 5300ffff (PBZB) 9218868437227405311==0? No (bad guess) |
line 124: BNZ $0,@+4*6 |
1. 000000000000027c: 4a000006 (BNZ) 9218868437227405311!=0? Yes, -> #294 (bad guess) |
... |
line 130: PBNZ $0,@-4*3 |
1. 0000000000000294: 5b00fffd (PBNZB) 9218868437227405311!=0? Yes, -> #288 |
-------- |
line 127: BZ $0,@-4*3 |
1. 0000000000000288: 4300fffd (BZB) 9218868437227405311==0? No |
line 128: BNZ $0,@-4*3 |
1. 000000000000028c: 4b00fffd (BNZB) 9218868437227405311!=0? Yes, -> #280 (bad guess) |
-------- |
line 125: PBZ $0,@+4*5 |
1. 0000000000000280: 52000005 (PBZ) 9218868437227405311==0? No (bad guess) |
line 126: PBNZ $0,@+4*5 |
1. 0000000000000284: 5a000005 (PBNZ) 9218868437227405311!=0? Yes, -> #298 |
... |
line 131: BP $0,@+4*6 |
1. 0000000000000298: 44000006 (BP) 9218868437227405311>0? Yes, -> #2b0 (bad guess) |
... |
line 137: BNP $0,@-4*3 |
1. 00000000000002b0: 4d00fffd (BNPB) 9218868437227405311<=0? No |
line 138: PBP $0,@-4*3 |
1. 00000000000002b4: 5500fffd (PBPB) 9218868437227405311>0? Yes, -> #2a8 |
-------- |
line 135: PBNP $0,@+4*5 |
1. 00000000000002a8: 5c000005 (PBNP) 9218868437227405311<=0? No (bad guess) |
line 136: BP $0,@-4*3 |
1. 00000000000002ac: 4500fffd (BPB) 9218868437227405311>0? Yes, -> #2a0 (bad guess) |
-------- |
line 133: BNP $0,@+4*6 |
1. 00000000000002a0: 4c000006 (BNP) 9218868437227405311<=0? No |
line 134: PBP $0,@+4*5 |
1. 00000000000002a4: 54000005 (PBP) 9218868437227405311>0? Yes, -> #2b8 |
... |
line 139: PBNP $0,@-4*3 |
1. 00000000000002b8: 5d00fffd (PBNPB) 9218868437227405311<=0? No (bad guess) |
line 140: BOD $0,@+4*6 |
1. 00000000000002bc: 46000006 (BOD) 9218868437227405311 odd? Yes, -> #2d4 (bad guess) |
... |
line 146: BEV $0,@-4*3 |
1. 00000000000002d4: 4f00fffd (BEVB) 9218868437227405311 even? No |
line 147: PBOD $0,@-4*3 |
1. 00000000000002d8: 5700fffd (PBODB) 9218868437227405311 odd? Yes, -> #2cc |
-------- |
line 144: PBEV $0,@+4*5 |
1. 00000000000002cc: 5e000005 (PBEV) 9218868437227405311 even? No (bad guess) |
line 145: BOD $0,@-4*3 |
1. 00000000000002d0: 4700fffd (BODB) 9218868437227405311 odd? Yes, -> #2c4 (bad guess) |
-------- |
line 142: BEV $0,@+4*6 |
1. 00000000000002c4: 4e000006 (BEV) 9218868437227405311 even? No |
line 143: PBOD $0,@+4*5 |
1. 00000000000002c8: 56000005 (PBOD) 9218868437227405311 odd? Yes, -> #2dc |
... |
line 148: PBEV $0,@-4*3 |
1. 00000000000002dc: 5f00fffd (PBEVB) 9218868437227405311 even? No (bad guess) |
line 149: LDA $4,Load_Test+4 |
1. 00000000000002e0: 2304f10c (ADDUI) $4=l[4] = #2000000000000000 + 12 = #200000000000000c |
line 150: GETA $3,1F |
1. 00000000000002e4: f4030004 (GETA) $3=l[3] = #2f4 |
line 151: PUT rW,$3 |
1. 00000000000002e8: f6180003 (PUT) rW = 756 = #2f4 |
line 152: LDTU $7,Load_End |
1. 00000000000002ec: 8b07f124 (LDTUI) $7=l[7] = M4[#2000000000000000+36] = #97030405 |
line 153: LDTU $6,Load_Begin |
1. 00000000000002f0: 8b06f120 (LDTUI) $6=l[6] = M4[#2000000000000000+32] = #5f030405 |
line 154: 1H CMPU $8,$6,$7 |
1. 00000000000002f4: 32080607 (CMPU) $8=l[8] = #5f030405 cmp #97030405 = -1 |
line 155: BNN $8,1F |
1. 00000000000002f8: 4808000c (BNN) -1>=0? No |
line 156: INCML $6,#100 % increase the opcode |
1. 00000000000002fc: e6060100 (INCML) $6=l[6] = #5f030405 + #1000000 = #60030405 |
line 157: PUT rX,$6 |
1. 0000000000000300: f6190006 (PUT) rX = 1610810373 = #60030405 |
line 158: RESUME % return to 1B |
1. 0000000000000304: f9000000 (RESUME) {#60030405} -> #2f4 |
(00000000000002f0: 60030405 (CSN)) $3=l[3] = 2305843009213693964<0? 0: 756 = 756 |
-------- |
line 154: 1H CMPU $8,$6,$7 |
2. 00000000000002f4: 32080607 (CMPU) $8=l[8] = #60030405 cmp #97030405 = -1 |
line 155: BNN $8,1F |
2. 00000000000002f8: 4808000c (BNN) -1>=0? No |
line 156: INCML $6,#100 % increase the opcode |
2. 00000000000002fc: e6060100 (INCML) $6=l[6] = #60030405 + #1000000 = #61030405 |
line 157: PUT rX,$6 |
2. 0000000000000300: f6190006 (PUT) rX = 1627587589 = #61030405 |
line 158: RESUME % return to 1B |
2. 0000000000000304: f9000000 (RESUME) {#61030405} -> #2f4 |
(00000000000002f0: 61030405 (CSNI)) $3=l[3] = 2305843009213693964<0? 5: 756 = 756 |
............................................... |
line 158: RESUME % return to 1B |
3. 0000000000000304: f9000000 (RESUME) {#62030405} -> #2f4 |
(00000000000002f0: 62030405 (CSZ)) $3=l[3] = 2305843009213693964==0? 0: 756 = 756 |
............................................... |
line 158: RESUME % return to 1B |
4. 0000000000000304: f9000000 (RESUME) {#63030405} -> #2f4 |
(00000000000002f0: 63030405 (CSZI)) $3=l[3] = 2305843009213693964==0? 5: 756 = 756 |
............................................... |
line 158: RESUME % return to 1B |
5. 0000000000000304: f9000000 (RESUME) {#64030405} -> #2f4 |
(00000000000002f0: 64030405 (CSP)) $3=l[3] = 2305843009213693964>0? 0: 756 = 0 |
............................................... |
line 158: RESUME % return to 1B |
6. 0000000000000304: f9000000 (RESUME) {#65030405} -> #2f4 |
(00000000000002f0: 65030405 (CSPI)) $3=l[3] = 2305843009213693964>0? 5: 0 = 5 |
............................................... |
line 158: RESUME % return to 1B |
7. 0000000000000304: f9000000 (RESUME) {#66030405} -> #2f4 |
(00000000000002f0: 66030405 (CSOD)) $3=l[3] = 2305843009213693964 odd? 0: 5 = 5 |
............................................... |
line 158: RESUME % return to 1B |
8. 0000000000000304: f9000000 (RESUME) {#67030405} -> #2f4 |
(00000000000002f0: 67030405 (CSODI)) $3=l[3] = 2305843009213693964 odd? 5: 5 = 5 |
............................................... |
line 158: RESUME % return to 1B |
9. 0000000000000304: f9000000 (RESUME) {#68030405} -> #2f4 |
(00000000000002f0: 68030405 (CSNN)) $3=l[3] = 2305843009213693964>=0? 0: 5 = 0 |
............................................... |
line 158: RESUME % return to 1B |
10. 0000000000000304: f9000000 (RESUME) {#69030405} -> #2f4 |
(00000000000002f0: 69030405 (CSNNI)) $3=l[3] = 2305843009213693964>=0? 5: 0 = 5 |
............................................... |
line 158: RESUME % return to 1B |
11. 0000000000000304: f9000000 (RESUME) {#6a030405} -> #2f4 |
(00000000000002f0: 6a030405 (CSNZ)) $3=l[3] = 2305843009213693964!=0? 0: 5 = 0 |
............................................... |
line 158: RESUME % return to 1B |
12. 0000000000000304: f9000000 (RESUME) {#6b030405} -> #2f4 |
(00000000000002f0: 6b030405 (CSNZI)) $3=l[3] = 2305843009213693964!=0? 5: 0 = 5 |
............................................... |
line 158: RESUME % return to 1B |
13. 0000000000000304: f9000000 (RESUME) {#6c030405} -> #2f4 |
(00000000000002f0: 6c030405 (CSNP)) $3=l[3] = 2305843009213693964<=0? 0: 5 = 5 |
............................................... |
line 158: RESUME % return to 1B |
14. 0000000000000304: f9000000 (RESUME) {#6d030405} -> #2f4 |
(00000000000002f0: 6d030405 (CSNPI)) $3=l[3] = 2305843009213693964<=0? 5: 5 = 5 |
............................................... |
line 158: RESUME % return to 1B |
15. 0000000000000304: f9000000 (RESUME) {#6e030405} -> #2f4 |
(00000000000002f0: 6e030405 (CSEV)) $3=l[3] = 2305843009213693964 even? 0: 5 = 0 |
............................................... |
line 158: RESUME % return to 1B |
16. 0000000000000304: f9000000 (RESUME) {#6f030405} -> #2f4 |
(00000000000002f0: 6f030405 (CSEVI)) $3=l[3] = 2305843009213693964 even? 5: 0 = 5 |
............................................... |
line 158: RESUME % return to 1B |
17. 0000000000000304: f9000000 (RESUME) {#70030405} -> #2f4 |
(00000000000002f0: 70030405 (ZSN)) $3=l[3] = 2305843009213693964<0? 0: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
18. 0000000000000304: f9000000 (RESUME) {#71030405} -> #2f4 |
(00000000000002f0: 71030405 (ZSNI)) $3=l[3] = 2305843009213693964<0? 5: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
19. 0000000000000304: f9000000 (RESUME) {#72030405} -> #2f4 |
(00000000000002f0: 72030405 (ZSZ)) $3=l[3] = 2305843009213693964==0? 0: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
20. 0000000000000304: f9000000 (RESUME) {#73030405} -> #2f4 |
(00000000000002f0: 73030405 (ZSZI)) $3=l[3] = 2305843009213693964==0? 5: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
21. 0000000000000304: f9000000 (RESUME) {#74030405} -> #2f4 |
(00000000000002f0: 74030405 (ZSP)) $3=l[3] = 2305843009213693964>0? 0: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
22. 0000000000000304: f9000000 (RESUME) {#75030405} -> #2f4 |
(00000000000002f0: 75030405 (ZSPI)) $3=l[3] = 2305843009213693964>0? 5: 0 = 5 |
............................................... |
line 158: RESUME % return to 1B |
23. 0000000000000304: f9000000 (RESUME) {#76030405} -> #2f4 |
(00000000000002f0: 76030405 (ZSOD)) $3=l[3] = 2305843009213693964 odd? 0: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
24. 0000000000000304: f9000000 (RESUME) {#77030405} -> #2f4 |
(00000000000002f0: 77030405 (ZSODI)) $3=l[3] = 2305843009213693964 odd? 5: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
25. 0000000000000304: f9000000 (RESUME) {#78030405} -> #2f4 |
(00000000000002f0: 78030405 (ZSNN)) $3=l[3] = 2305843009213693964>=0? 0: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
26. 0000000000000304: f9000000 (RESUME) {#79030405} -> #2f4 |
(00000000000002f0: 79030405 (ZSNNI)) $3=l[3] = 2305843009213693964>=0? 5: 0 = 5 |
............................................... |
line 158: RESUME % return to 1B |
27. 0000000000000304: f9000000 (RESUME) {#7a030405} -> #2f4 |
(00000000000002f0: 7a030405 (ZSNZ)) $3=l[3] = 2305843009213693964!=0? 0: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
28. 0000000000000304: f9000000 (RESUME) {#7b030405} -> #2f4 |
(00000000000002f0: 7b030405 (ZSNZI)) $3=l[3] = 2305843009213693964!=0? 5: 0 = 5 |
............................................... |
line 158: RESUME % return to 1B |
29. 0000000000000304: f9000000 (RESUME) {#7c030405} -> #2f4 |
(00000000000002f0: 7c030405 (ZSNP)) $3=l[3] = 2305843009213693964<=0? 0: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
30. 0000000000000304: f9000000 (RESUME) {#7d030405} -> #2f4 |
(00000000000002f0: 7d030405 (ZSNPI)) $3=l[3] = 2305843009213693964<=0? 5: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
31. 0000000000000304: f9000000 (RESUME) {#7e030405} -> #2f4 |
(00000000000002f0: 7e030405 (ZSEV)) $3=l[3] = 2305843009213693964 even? 0: 0 = 0 |
............................................... |
line 158: RESUME % return to 1B |
32. 0000000000000304: f9000000 (RESUME) {#7f030405} -> #2f4 |
(00000000000002f0: 7f030405 (ZSEVI)) $3=l[3] = 2305843009213693964 even? 5: 0 = 5 |
............................................... |
line 158: RESUME % return to 1B |
33. 0000000000000304: f9000000 (RESUME) {#80030405} -> #2f4 |
(00000000000002f0: 80030405 (LDB)) $3=l[3] = M1[#200000000000000c+#0] = -124 |
............................................... |
line 158: RESUME % return to 1B |
34. 0000000000000304: f9000000 (RESUME) {#81030405} -> #2f4 |
(00000000000002f0: 81030405 (LDBI)) $3=l[3] = M1[#200000000000000c+5] = -119 |
............................................... |
line 158: RESUME % return to 1B |
35. 0000000000000304: f9000000 (RESUME) {#82030405} -> #2f4 |
(00000000000002f0: 82030405 (LDBU)) $3=l[3] = M1[#200000000000000c+#0] = #84 |
............................................... |
line 158: RESUME % return to 1B |
36. 0000000000000304: f9000000 (RESUME) {#83030405} -> #2f4 |
(00000000000002f0: 83030405 (LDBUI)) $3=l[3] = M1[#200000000000000c+5] = #89 |
............................................... |
line 158: RESUME % return to 1B |
37. 0000000000000304: f9000000 (RESUME) {#84030405} -> #2f4 |
(00000000000002f0: 84030405 (LDW)) $3=l[3] = M2[#200000000000000c+#0] = -31611 |
............................................... |
line 158: RESUME % return to 1B |
38. 0000000000000304: f9000000 (RESUME) {#85030405} -> #2f4 |
(00000000000002f0: 85030405 (LDWI)) $3=l[3] = M2[#200000000000000c+5] = -30583 |
............................................... |
line 158: RESUME % return to 1B |
39. 0000000000000304: f9000000 (RESUME) {#86030405} -> #2f4 |
(00000000000002f0: 86030405 (LDWU)) $3=l[3] = M2[#200000000000000c+#0] = #8485 |
............................................... |
line 158: RESUME % return to 1B |
40. 0000000000000304: f9000000 (RESUME) {#87030405} -> #2f4 |
(00000000000002f0: 87030405 (LDWUI)) $3=l[3] = M2[#200000000000000c+5] = #8889 |
............................................... |
line 158: RESUME % return to 1B |
41. 0000000000000304: f9000000 (RESUME) {#88030405} -> #2f4 |
(00000000000002f0: 88030405 (LDT)) $3=l[3] = M4[#200000000000000c+#0] = -2071624057 |
............................................... |
line 158: RESUME % return to 1B |
42. 0000000000000304: f9000000 (RESUME) {#89030405} -> #2f4 |
(00000000000002f0: 89030405 (LDTI)) $3=l[3] = M4[#200000000000000c+5] = -2004252021 |
............................................... |
line 158: RESUME % return to 1B |
43. 0000000000000304: f9000000 (RESUME) {#8a030405} -> #2f4 |
(00000000000002f0: 8a030405 (LDTU)) $3=l[3] = M4[#200000000000000c+#0] = #84858687 |
............................................... |
line 158: RESUME % return to 1B |
44. 0000000000000304: f9000000 (RESUME) {#8b030405} -> #2f4 |
(00000000000002f0: 8b030405 (LDTUI)) $3=l[3] = M4[#200000000000000c+5] = #88898a8b |
............................................... |
line 158: RESUME % return to 1B |
45. 0000000000000304: f9000000 (RESUME) {#8c030405} -> #2f4 |
(00000000000002f0: 8c030405 (LDO)) $3=l[3] = M8[#200000000000000c+#0] = -9186918263483431289 |
............................................... |
line 158: RESUME % return to 1B |
46. 0000000000000304: f9000000 (RESUME) {#8d030405} -> #2f4 |
(00000000000002f0: 8d030405 (LDOI)) $3=l[3] = M8[#200000000000000c+5] = -8608196880778817905 |
............................................... |
line 158: RESUME % return to 1B |
47. 0000000000000304: f9000000 (RESUME) {#8e030405} -> #2f4 |
(00000000000002f0: 8e030405 (LDOU)) $3=l[3] = M8[#200000000000000c+#0] = #8081828384858687 |
............................................... |
line 158: RESUME % return to 1B |
48. 0000000000000304: f9000000 (RESUME) {#8f030405} -> #2f4 |
(00000000000002f0: 8f030405 (LDOUI)) $3=l[3] = M8[#200000000000000c+5] = #88898a8b8c8d8e8f |
............................................... |
line 158: RESUME % return to 1B |
49. 0000000000000304: f9000000 (RESUME) {#90030405} -> #2f4 |
(00000000000002f0: 90030405 (LDSF)) $3=l[3] = (M4[#200000000000000c+#0]) = -3.1391693585473826e-36 |
............................................... |
line 158: RESUME % return to 1B |
50. 0000000000000304: f9000000 (RESUME) {#91030405} -> #2f4 |
(00000000000002f0: 91030405 (LDSFI)) $3=l[3] = (M4[#200000000000000c+5]) = -8.277958869830208e-34 |
............................................... |
line 158: RESUME % return to 1B |
51. 0000000000000304: f9000000 (RESUME) {#92030405} -> #2f4 |
(00000000000002f0: 92030405 (LDHT)) $3=l[3] = M4[#200000000000000c+#0]<<32 = #8485868700000000 |
............................................... |
line 158: RESUME % return to 1B |
52. 0000000000000304: f9000000 (RESUME) {#93030405} -> #2f4 |
(00000000000002f0: 93030405 (LDHTI)) $3=l[3] = M4[#200000000000000c+5]<<32 = #88898a8b00000000 |
............................................... |
line 158: RESUME % return to 1B |
53. 0000000000000304: f9000000 (RESUME) {#94030405} -> #2f4 |
(00000000000002f0: 94030405 (CSWAP)) $3=l[3] = [M8[#200000000000000c+#0]==0] = 0, rP=#8081828384858687 |
............................................... |
line 158: RESUME % return to 1B |
54. 0000000000000304: f9000000 (RESUME) {#95030405} -> #2f4 |
(00000000000002f0: 95030405 (CSWAPI)) $3=l[3] = [M8[#200000000000000c+5]==-9186918263483431289] = 0, rP=#88898a8b8c8d8e8f |
............................................... |
line 158: RESUME % return to 1B |
55. 0000000000000304: f9000000 (RESUME) {#96030405} -> #2f4 |
(00000000000002f0: 96030405 (LDUNC)) $3=l[3] = M8[#200000000000000c+#0] = #8081828384858687 |
............................................... |
line 158: RESUME % return to 1B |
56. 0000000000000304: f9000000 (RESUME) {#97030405} -> #2f4 |
(00000000000002f0: 97030405 (LDUNCI)) $3=l[3] = M8[#200000000000000c+5] = #88898a8b8c8d8e8f |
............................................... |
line 165: 1H GETA $4,2B |
1. 0000000000000328: f504fff8 (GETAB) $4=l[4] = #308 |
line 166: SETL $7,4*11 |
1. 000000000000032c: e307002c (SETL) $7=l[7] = #2c |
line 167: GO $7,$7,$4 |
1. 0000000000000330: 9e070704 (GO) $7=l[7] = #334, -> #2c+#308 |
line 168: GO $7,$4,4*12 |
1. 0000000000000334: 9f070430 (GOI) $7=l[7] = #338, -> #308+48 |
line 169: PRELD 70,$4,$4 |
1. 0000000000000338: 9a460404 (PRELD) [#308+#308 .. #656] |
line 170: PRELD 70,$4,0 |
1. 000000000000033c: 9b460400 (PRELDI) [#308 .. #34e] |
line 171: PREGO 70,$4,$4 |
1. 0000000000000340: 9c460404 (PREGO) [#308+#308 .. #656] |
line 172: PREGO 70,$4,0 |
1. 0000000000000344: 9d460400 (PREGOI) [#308 .. #34e] |
line 173: CSWAP $3,Load_Test+13 |
1. 0000000000000348: 9503f115 (CSWAPI) $3=l[3] = [M8[#2000000000000000+21]==-8608196880778817905] = 1, M8[#2000000000000010]=#88898a8b8c8d8e8f |
line 174: GETA $3,1F |
1. 000000000000034c: f4030007 (GETA) $3=l[3] = #368 |
line 175: PUT rW,$3 |
1. 0000000000000350: f6180003 (PUT) rW = 872 = #368 |
line 176: SETL rz,1 |
1. 0000000000000354: e3f20001 (SETL) $242=g[242] = #1 |
line 177: ADD ry,$4,4 |
1. 0000000000000358: 21f30404 (ADDI) $243=g[243] = 776 + 4 = 780 |
line 178: LDOU $40,Jmp_Pop |
1. 000000000000035c: 8f28f118 (LDOUI) rL=41, $40=l[40] = M8[#2000000000000000+24] = #f0000002f8000000 |
line 179: LDTU $7,Big_End |
1. 0000000000000360: 8b07f12c (LDTUI) $7=l[7] = M4[#2000000000000000+44] = #ef28f305 |
line 180: LDTU $6,Big_Begin |
1. 0000000000000364: 8b06f128 (LDTUI) $6=l[6] = M4[#2000000000000000+40] = #9f28f305 |
line 181: 1H CMPU $8,$6,$7 |
1. 0000000000000368: 32080607 (CMPU) $8=l[8] = #9f28f305 cmp #ef28f305 = -1 |
line 182: BNN $8,1F |
1. 000000000000036c: 48080005 (BNN) -1>=0? No |
line 183: INCML $6,#100 % increase the opcode |
1. 0000000000000370: e6060100 (INCML) $6=l[6] = #9f28f305 + #1000000 = #a028f305 |
line 184: PUT rX,$6 |
1. 0000000000000374: f6190006 (PUT) rX = 2687038213 = #a028f305 |
line 185: SET $5,rz |
1. 0000000000000378: c105f200 (ORI) $5=l[5] = 1 = #1 |
line 186: RESUME % return to 1B |
1. 000000000000037c: f9000000 (RESUME) {#a028f305} -> #368 |
(0000000000000364: a028f305 (STB)) M1[#30c+#1] = -1152921491856162816, M8[#308]=#fedcba9876003210, rA=#00049 |
-------- |
line 181: 1H CMPU $8,$6,$7 |
2. 0000000000000368: 32080607 (CMPU) $8=l[8] = #a028f305 cmp #ef28f305 = -1 |
line 182: BNN $8,1F |
2. 000000000000036c: 48080005 (BNN) -1>=0? No |
line 183: INCML $6,#100 % increase the opcode |
2. 0000000000000370: e6060100 (INCML) $6=l[6] = #a028f305 + #1000000 = #a128f305 |
line 184: PUT rX,$6 |
2. 0000000000000374: f6190006 (PUT) rX = 2703815429 = #a128f305 |
line 185: SET $5,rz |
2. 0000000000000378: c105f200 (ORI) $5=l[5] = 1 = #1 |
line 186: RESUME % return to 1B |
2. 000000000000037c: f9000000 (RESUME) {#a128f305} -> #368 |
(0000000000000364: a128f305 (STBI)) M1[#30c+5] = -1152921491856162816, M8[#310]=#ff00ddccbbaa9988, rA=#00049 |
............................................... |
line 186: RESUME % return to 1B |
3. 000000000000037c: f9000000 (RESUME) {#a228f305} -> #368 |
(0000000000000364: a228f305 (STBU)) M1[#30c+#1] = #f0000002f8000000, M8[#308]=#fedcba9876003210 |
............................................... |
line 186: RESUME % return to 1B |
4. 000000000000037c: f9000000 (RESUME) {#a328f305} -> #368 |
(0000000000000364: a328f305 (STBUI)) M1[#30c+5] = #f0000002f8000000, M8[#310]=#ff00ddccbbaa9988 |
............................................... |
line 186: RESUME % return to 1B |
5. 000000000000037c: f9000000 (RESUME) {#a428f305} -> #368 |
(0000000000000364: a428f305 (STW)) M2[#30c+#1] = -1152921491856162816, M8[#308]=#fedcba9800003210, rA=#00049 |
............................................... |
line 186: RESUME % return to 1B |
6. 000000000000037c: f9000000 (RESUME) {#a528f305} -> #368 |
(0000000000000364: a528f305 (STWI)) M2[#30c+5] = -1152921491856162816, M8[#310]=#ddccbbaa9988, rA=#00049 |
............................................... |
line 186: RESUME % return to 1B |
7. 000000000000037c: f9000000 (RESUME) {#a628f305} -> #368 |
(0000000000000364: a628f305 (STWU)) M2[#30c+#1] = #f0000002f8000000, M8[#308]=#fedcba9800003210 |
............................................... |
line 186: RESUME % return to 1B |
8. 000000000000037c: f9000000 (RESUME) {#a728f305} -> #368 |
(0000000000000364: a728f305 (STWUI)) M2[#30c+5] = #f0000002f8000000, M8[#310]=#ddccbbaa9988 |
............................................... |
line 186: RESUME % return to 1B |
9. 000000000000037c: f9000000 (RESUME) {#a828f305} -> #368 |
(0000000000000364: a828f305 (STT)) M4[#30c+#1] = -1152921491856162816, M8[#308]=#fedcba98f8000000, rA=#00049 |
............................................... |
line 186: RESUME % return to 1B |
10. 000000000000037c: f9000000 (RESUME) {#a928f305} -> #368 |
(0000000000000364: a928f305 (STTI)) M4[#30c+5] = -1152921491856162816, M8[#310]=#f8000000bbaa9988, rA=#00049 |
............................................... |
line 186: RESUME % return to 1B |
11. 000000000000037c: f9000000 (RESUME) {#aa28f305} -> #368 |
(0000000000000364: aa28f305 (STTU)) M4[#30c+#1] = #f0000002f8000000, M8[#308]=#fedcba98f8000000 |
............................................... |
line 186: RESUME % return to 1B |
12. 000000000000037c: f9000000 (RESUME) {#ab28f305} -> #368 |
(0000000000000364: ab28f305 (STTUI)) M4[#30c+5] = #f0000002f8000000, M8[#310]=#f8000000bbaa9988 |
............................................... |
line 186: RESUME % return to 1B |
13. 000000000000037c: f9000000 (RESUME) {#ac28f305} -> #368 |
(0000000000000364: ac28f305 (STO)) M8[#30c+#1] = -1152921491856162816 |
............................................... |
line 186: RESUME % return to 1B |
14. 000000000000037c: f9000000 (RESUME) {#ad28f305} -> #368 |
(0000000000000364: ad28f305 (STOI)) M8[#30c+5] = -1152921491856162816 |
............................................... |
line 186: RESUME % return to 1B |
15. 000000000000037c: f9000000 (RESUME) {#ae28f305} -> #368 |
(0000000000000364: ae28f305 (STOU)) M8[#30c+#1] = #f0000002f8000000 |
............................................... |
line 186: RESUME % return to 1B |
16. 000000000000037c: f9000000 (RESUME) {#af28f305} -> #368 |
(0000000000000364: af28f305 (STOUI)) M8[#30c+5] = #f0000002f8000000 |
............................................... |
line 186: RESUME % return to 1B |
17. 000000000000037c: f9000000 (RESUME) {#b028f305} -> #368 |
(0000000000000364: b028f305 (STSF)) (M4[#30c+#1]) = -3.105044975643911e231, M8[#308]=#f0000002ff800000, rA=#00049 |
............................................... |
line 186: RESUME % return to 1B |
18. 000000000000037c: f9000000 (RESUME) {#b128f305} -> #368 |
(0000000000000364: b128f305 (STSFI)) (M4[#30c+5]) = -3.105044975643911e231, M8[#310]=#ff800000f8000000, rA=#00049 |
............................................... |
line 186: RESUME % return to 1B |
19. 000000000000037c: f9000000 (RESUME) {#b228f305} -> #368 |
(0000000000000364: b228f305 (STHT)) M4[#30c+#1] = #f0000002f8000000>>32, M8[#308]=#f0000002f0000002 |
............................................... |
line 186: RESUME % return to 1B |
20. 000000000000037c: f9000000 (RESUME) {#b328f305} -> #368 |
(0000000000000364: b328f305 (STHTI)) M4[#30c+5] = #f0000002f8000000>>32, M8[#310]=#f0000002f8000000 |
............................................... |
line 186: RESUME % return to 1B |
21. 000000000000037c: f9000000 (RESUME) {#b428f305} -> #368 |
(0000000000000364: b428f305 (STCO)) M8[#30c+#1] = 40 |
............................................... |
line 186: RESUME % return to 1B |
22. 000000000000037c: f9000000 (RESUME) {#b528f305} -> #368 |
(0000000000000364: b528f305 (STCOI)) M8[#30c+5] = 40 |
............................................... |
line 186: RESUME % return to 1B |
23. 000000000000037c: f9000000 (RESUME) {#b628f305} -> #368 |
(0000000000000364: b628f305 (STUNC)) M8[#30c+#1] = #f0000002f8000000 |
............................................... |
line 186: RESUME % return to 1B |
24. 000000000000037c: f9000000 (RESUME) {#b728f305} -> #368 |
(0000000000000364: b728f305 (STUNCI)) M8[#30c+5] = #f0000002f8000000 |
............................................... |
line 186: RESUME % return to 1B |
25. 000000000000037c: f9000000 (RESUME) {#b828f305} -> #368 |
(0000000000000364: b828f305 (SYNCD)) [#30c+#1 .. #335] |
............................................... |
line 186: RESUME % return to 1B |
26. 000000000000037c: f9000000 (RESUME) {#b928f305} -> #368 |
(0000000000000364: b928f305 (SYNCDI)) [#30c+5 .. #339] |
............................................... |
line 186: RESUME % return to 1B |
27. 000000000000037c: f9000000 (RESUME) {#ba28f305} -> #368 |
(0000000000000364: ba28f305 (PREST)) [#30c+#1 .. #335] |
............................................... |
line 186: RESUME % return to 1B |
28. 000000000000037c: f9000000 (RESUME) {#bb28f305} -> #368 |
(0000000000000364: bb28f305 (PRESTI)) [#30c+5 .. #339] |
............................................... |
line 186: RESUME % return to 1B |
29. 000000000000037c: f9000000 (RESUME) {#bc28f305} -> #368 |
(0000000000000364: bc28f305 (SYNCID)) [#30c+#1 .. #335] |
............................................... |
line 186: RESUME % return to 1B |
30. 000000000000037c: f9000000 (RESUME) {#bd28f305} -> #368 |
(0000000000000364: bd28f305 (SYNCIDI)) [#30c+5 .. #339] |
............................................... |
line 186: RESUME % return to 1B |
31. 000000000000037c: f9000000 (RESUME) {#be28f305} -> #368 |
(0000000000000364: be28f305 (PUSHGO)) l[40]=40, rO=#6000000000000148, rL=0, rJ=#368, -> #30c+#1 |
-------- |
line 159: 2H OCTA #fedcba9876543210 % becomes Jmp_Pop |
1. 000000000000030d: f8000000 (POP) rL=40, rO=#6000000000000000, -> #368 |
............................................... |
line 186: RESUME % return to 1B |
32. 000000000000037c: f9000000 (RESUME) {#bf28f305} -> #368 |
(0000000000000364: bf28f305 (PUSHGOI)) l[40]=40, rO=#6000000000000148, rL=0, rJ=#368, -> #30c+5 |
-------- |
line 160: OCTA #ffeeddccbbaa9988 % becomes Jmp_Pop |
1. 0000000000000311: f0000002 (JMP) -> #319 |
line 161: NEG ry,addy |
1. 0000000000000319: 34f300f6 (NEG) $243=g[243] = 0 - 9178337916516812809 = -9178337916516812809 |
line 162: SET rz,flip |
1. 000000000000031d: c1f2f400 (ORI) $242=g[242] = 72624976668147840 = #102040810204080 |
line 163: PUT rM,addz |
1. 0000000000000321: f60500f5 (PUT) rM = -45041037404232713 = #ff5ffb6a4534a3f7 |
line 164: POP |
1. 0000000000000325: f8000000 (POP) rL=40, rO=#6000000000000000, -> #368 |
............................................... |
line 186: RESUME % return to 1B |
33. 000000000000037c: f9000000 (RESUME) {#c028f305} -> #368 |
(0000000000000364: c028f305 (OR)) rL=41, $40=l[40] = #809ffe4b398437f7 | #102040810204080 = #819ffe4b39a477f7 |
............................................... |
line 186: RESUME % return to 1B |
34. 000000000000037c: f9000000 (RESUME) {#c128f305} -> #368 |
(0000000000000364: c128f305 (ORI)) $40=l[40] = #809ffe4b398437f7 | 5 = #809ffe4b398437f7 |
............................................... |
line 186: RESUME % return to 1B |
35. 000000000000037c: f9000000 (RESUME) {#c228f305} -> #368 |
(0000000000000364: c228f305 (ORN)) $40=l[40] = #809ffe4b398437f7 |~ #102040810204080 = #feffffffffdfbfff |
............................................... |
line 186: RESUME % return to 1B |
36. 000000000000037c: f9000000 (RESUME) {#c328f305} -> #368 |
(0000000000000364: c328f305 (ORNI)) $40=l[40] = #809ffe4b398437f7 |~ 5 = #ffffffffffffffff |
............................................... |
line 186: RESUME % return to 1B |
37. 000000000000037c: f9000000 (RESUME) {#c428f305} -> #368 |
(0000000000000364: c428f305 (NOR)) $40=l[40] = #809ffe4b398437f7 ~| #102040810204080 = #7e6001b4c65b8808 |
............................................... |
line 186: RESUME % return to 1B |
38. 000000000000037c: f9000000 (RESUME) {#c528f305} -> #368 |
(0000000000000364: c528f305 (NORI)) $40=l[40] = #809ffe4b398437f7 ~| 5 = #7f6001b4c67bc808 |
............................................... |
line 186: RESUME % return to 1B |
39. 000000000000037c: f9000000 (RESUME) {#c628f305} -> #368 |
(0000000000000364: c628f305 (XOR)) $40=l[40] = #809ffe4b398437f7 ^ #102040810204080 = #819dfa4329a47777 |
............................................... |
line 186: RESUME % return to 1B |
40. 000000000000037c: f9000000 (RESUME) {#c728f305} -> #368 |
(0000000000000364: c728f305 (XORI)) $40=l[40] = #809ffe4b398437f7 ^ 5 = #809ffe4b398437f2 |
............................................... |
line 186: RESUME % return to 1B |
41. 000000000000037c: f9000000 (RESUME) {#c828f305} -> #368 |
(0000000000000364: c828f305 (AND)) $40=l[40] = #809ffe4b398437f7 & #102040810204080 = #2040810000080 |
............................................... |
line 186: RESUME % return to 1B |
42. 000000000000037c: f9000000 (RESUME) {#c928f305} -> #368 |
(0000000000000364: c928f305 (ANDI)) $40=l[40] = #809ffe4b398437f7 & 5 = #5 |
............................................... |
line 186: RESUME % return to 1B |
43. 000000000000037c: f9000000 (RESUME) {#ca28f305} -> #368 |
(0000000000000364: ca28f305 (ANDN)) $40=l[40] = #809ffe4b398437f7 \ #102040810204080 = #809dfa4329843777 |
............................................... |
line 186: RESUME % return to 1B |
44. 000000000000037c: f9000000 (RESUME) {#cb28f305} -> #368 |
(0000000000000364: cb28f305 (ANDNI)) $40=l[40] = #809ffe4b398437f7 \ 5 = #809ffe4b398437f2 |
............................................... |
line 186: RESUME % return to 1B |
45. 000000000000037c: f9000000 (RESUME) {#cc28f305} -> #368 |
(0000000000000364: cc28f305 (NAND)) $40=l[40] = #809ffe4b398437f7 ~& #102040810204080 = #fffdfbf7efffff7f |
............................................... |
line 186: RESUME % return to 1B |
46. 000000000000037c: f9000000 (RESUME) {#cd28f305} -> #368 |
(0000000000000364: cd28f305 (NANDI)) $40=l[40] = #809ffe4b398437f7 ~& 5 = #fffffffffffffffa |
............................................... |
line 186: RESUME % return to 1B |
47. 000000000000037c: f9000000 (RESUME) {#ce28f305} -> #368 |
(0000000000000364: ce28f305 (NXOR)) $40=l[40] = #809ffe4b398437f7 ~^ #102040810204080 = #7e6205bcd65b8888 |
............................................... |
line 186: RESUME % return to 1B |
48. 000000000000037c: f9000000 (RESUME) {#cf28f305} -> #368 |
(0000000000000364: cf28f305 (NXORI)) $40=l[40] = #809ffe4b398437f7 ~^ 5 = #7f6001b4c67bc80d |
............................................... |
line 186: RESUME % return to 1B |
49. 000000000000037c: f9000000 (RESUME) {#d028f305} -> #368 |
(0000000000000364: d028f305 (BDIF)) $40=l[40] = #809ffe4b398437f7 bdif #102040810204080 = #7f9dfa4329640077 |
............................................... |
line 186: RESUME % return to 1B |
50. 000000000000037c: f9000000 (RESUME) {#d128f305} -> #368 |
(0000000000000364: d128f305 (BDIFI)) $40=l[40] = #809ffe4b398437f7 bdif 5 = #809ffe4b398437f2 |
............................................... |
line 186: RESUME % return to 1B |
51. 000000000000037c: f9000000 (RESUME) {#d228f305} -> #368 |
(0000000000000364: d228f305 (WDIF)) $40=l[40] = #809ffe4b398437f7 wdif #102040810204080 = #7f9dfa4329640000 |
............................................... |
line 186: RESUME % return to 1B |
52. 000000000000037c: f9000000 (RESUME) {#d328f305} -> #368 |
(0000000000000364: d328f305 (WDIFI)) $40=l[40] = #809ffe4b398437f7 wdif 5 = #809ffe4b398437f2 |
............................................... |
line 186: RESUME % return to 1B |
53. 000000000000037c: f9000000 (RESUME) {#d428f305} -> #368 |
(0000000000000364: d428f305 (TDIF)) $40=l[40] = #809ffe4b398437f7 tdif #102040810204080 = #7f9dfa432963f777 |
............................................... |
line 186: RESUME % return to 1B |
54. 000000000000037c: f9000000 (RESUME) {#d528f305} -> #368 |
(0000000000000364: d528f305 (TDIFI)) $40=l[40] = #809ffe4b398437f7 tdif 5 = #809ffe4b398437f2 |
............................................... |
line 186: RESUME % return to 1B |
55. 000000000000037c: f9000000 (RESUME) {#d628f305} -> #368 |
(0000000000000364: d628f305 (ODIF)) $40=l[40] = #809ffe4b398437f7 odif #102040810204080 = #7f9dfa432963f777 |
............................................... |
line 186: RESUME % return to 1B |
56. 000000000000037c: f9000000 (RESUME) {#d728f305} -> #368 |
(0000000000000364: d728f305 (ODIFI)) $40=l[40] = #809ffe4b398437f7 odif 5 = #809ffe4b398437f2 |
............................................... |
line 186: RESUME % return to 1B |
57. 000000000000037c: f9000000 (RESUME) {#d828f305} -> #368 |
(0000000000000364: d828f305 (MUX)) $40=l[40] = #ff5ffb6a4534a3f7? #809ffe4b398437f7: #102040810204080 = #801ffe4a110463f7 |
............................................... |
line 186: RESUME % return to 1B |
58. 000000000000037c: f9000000 (RESUME) {#d928f305} -> #368 |
(0000000000000364: d928f305 (MUXI)) $40=l[40] = #ff5ffb6a4534a3f7? #809ffe4b398437f7: 5 = #801ffa4a010423f7 |
............................................... |
line 186: RESUME % return to 1B |
59. 000000000000037c: f9000000 (RESUME) {#da28f305} -> #368 |
(0000000000000364: da28f305 (SADD)) $40=l[40] = nu(#809ffe4b398437f7\#102040810204080) = 31 |
............................................... |
line 186: RESUME % return to 1B |
60. 000000000000037c: f9000000 (RESUME) {#db28f305} -> #368 |
(0000000000000364: db28f305 (SADDI)) $40=l[40] = nu(#809ffe4b398437f7\5) = 34 |
............................................... |
line 186: RESUME % return to 1B |
61. 000000000000037c: f9000000 (RESUME) {#dc28f305} -> #368 |
(0000000000000364: dc28f305 (MOR)) $40=l[40] = #809ffe4b398437f7 mor #102040810204080 = #f73784394bfe9f80 |
............................................... |
line 186: RESUME % return to 1B |
62. 000000000000037c: f9000000 (RESUME) {#dd28f305} -> #368 |
(0000000000000364: dd28f305 (MORI)) $40=l[40] = #809ffe4b398437f7 mor 5 = #f7 |
............................................... |
line 186: RESUME % return to 1B |
63. 000000000000037c: f9000000 (RESUME) {#de28f305} -> #368 |
(0000000000000364: de28f305 (MXOR)) $40=l[40] = #809ffe4b398437f7 mxor #102040810204080 = #f73784394bfe9f80 |
............................................... |
line 186: RESUME % return to 1B |
64. 000000000000037c: f9000000 (RESUME) {#df28f305} -> #368 |
(0000000000000364: df28f305 (MXORI)) $40=l[40] = #809ffe4b398437f7 mxor 5 = #73 |
............................................... |
line 186: RESUME % return to 1B |
65. 000000000000037c: f9000000 (RESUME) {#e028f305} -> #368 |
(0000000000000364: e028f305 (SETH)) $40=l[40] = #f305000000000000 |
............................................... |
line 186: RESUME % return to 1B |
66. 000000000000037c: f9000000 (RESUME) {#e128f305} -> #368 |
(0000000000000364: e128f305 (SETMH)) $40=l[40] = #f30500000000 |
............................................... |
line 186: RESUME % return to 1B |
67. 000000000000037c: f9000000 (RESUME) {#e228f305} -> #368 |
(0000000000000364: e228f305 (SETML)) $40=l[40] = #f3050000 |
............................................... |
line 186: RESUME % return to 1B |
68. 000000000000037c: f9000000 (RESUME) {#e328f305} -> #368 |
(0000000000000364: e328f305 (SETL)) $40=l[40] = #f305 |
............................................... |
line 186: RESUME % return to 1B |
69. 000000000000037c: f9000000 (RESUME) {#e428f305} -> #368 |
(0000000000000364: e428f305 (INCH)) $40=l[40] = #f305 + #f305000000000000 = #f30500000000f305 |
............................................... |
line 186: RESUME % return to 1B |
70. 000000000000037c: f9000000 (RESUME) {#e528f305} -> #368 |
(0000000000000364: e528f305 (INCMH)) $40=l[40] = #f30500000000f305 + #f30500000000 = #f305f3050000f305 |
............................................... |
line 186: RESUME % return to 1B |
71. 000000000000037c: f9000000 (RESUME) {#e628f305} -> #368 |
(0000000000000364: e628f305 (INCML)) $40=l[40] = #f305f3050000f305 + #f3050000 = #f305f305f305f305 |
............................................... |
line 186: RESUME % return to 1B |
72. 000000000000037c: f9000000 (RESUME) {#e728f305} -> #368 |
(0000000000000364: e728f305 (INCL)) $40=l[40] = #f305f305f305f305 + #f305 = #f305f305f306e60a |
............................................... |
line 186: RESUME % return to 1B |
73. 000000000000037c: f9000000 (RESUME) {#e828f305} -> #368 |
(0000000000000364: e828f305 (ORH)) $40=l[40] = #f305f305f306e60a | #f305000000000000 = #f305f305f306e60a |
............................................... |
line 186: RESUME % return to 1B |
74. 000000000000037c: f9000000 (RESUME) {#e928f305} -> #368 |
(0000000000000364: e928f305 (ORMH)) $40=l[40] = #f305f305f306e60a | #f30500000000 = #f305f305f306e60a |
............................................... |
line 186: RESUME % return to 1B |
75. 000000000000037c: f9000000 (RESUME) {#ea28f305} -> #368 |
(0000000000000364: ea28f305 (ORML)) $40=l[40] = #f305f305f306e60a | #f3050000 = #f305f305f307e60a |
............................................... |
line 186: RESUME % return to 1B |
76. 000000000000037c: f9000000 (RESUME) {#eb28f305} -> #368 |
(0000000000000364: eb28f305 (ORL)) $40=l[40] = #f305f305f307e60a | #f305 = #f305f305f307f70f |
............................................... |
line 186: RESUME % return to 1B |
77. 000000000000037c: f9000000 (RESUME) {#ec28f305} -> #368 |
(0000000000000364: ec28f305 (ANDNH)) $40=l[40] = #f305f305f307f70f \ #f305000000000000 = #f305f307f70f |
............................................... |
line 186: RESUME % return to 1B |
78. 000000000000037c: f9000000 (RESUME) {#ed28f305} -> #368 |
(0000000000000364: ed28f305 (ANDNMH)) $40=l[40] = #f305f307f70f \ #f30500000000 = #f307f70f |
............................................... |
line 186: RESUME % return to 1B |
79. 000000000000037c: f9000000 (RESUME) {#ee28f305} -> #368 |
(0000000000000364: ee28f305 (ANDNML)) $40=l[40] = #f307f70f \ #f3050000 = #2f70f |
............................................... |
line 186: RESUME % return to 1B |
80. 000000000000037c: f9000000 (RESUME) {#ef28f305} -> #368 |
(0000000000000364: ef28f305 (ANDNL)) $40=l[40] = #2f70f \ #f305 = #2040a |
............................................... |
line 187: 1H SL $40,small,51 |
1. 0000000000000380: 3928fe33 (SLI) $40=l[40] = 2748 << 51 = 6187945888007061504 |
line 188: SL $40,small,52 |
1. 0000000000000384: 3928fe34 (SLI) $40=l[40] = 2748 << 52 = -6070852297695428608, rA=#00049 |
line 189: SAVE $255,0 |
M8[#6000000000000000]=l[0]=#7fefffffffffffff, rS+=8 |
M8[#6000000000000008]=l[1]=#ffffffffffffffff, rS+=8 |
M8[#6000000000000010]=l[2]=#0000000000000000, rS+=8 |
M8[#6000000000000018]=l[3]=#0000000000000368, rS+=8 |
M8[#6000000000000020]=l[4]=#0000000000000308, rS+=8 |
M8[#6000000000000028]=l[5]=#0102040810204080, rS+=8 |
M8[#6000000000000030]=l[6]=#00000000ef28f305, rS+=8 |
M8[#6000000000000038]=l[7]=#00000000ef28f305, rS+=8 |
M8[#6000000000000040]=l[8]=#0000000000000000, rS+=8 |
M8[#6000000000000048]=l[9]=#bfdfffffffffffff, rS+=8 |
M8[#6000000000000050]=l[10]=#fff8000000000000, rS+=8 |
M8[#6000000000000058]=l[11]=#fff8000000000000, rS+=8 |
M8[#6000000000000060]=l[12]=#0000000000000001, rS+=8 |
M8[#6000000000000068]=l[13]=#8000000000000000, rS+=8 |
M8[#6000000000000070]=l[14]=#ffffffffffffffff, rS+=8 |
M8[#6000000000000078]=l[15]=#43dfd8006d319ef2, rS+=8 |
M8[#6000000000000080]=l[16]=#43dfd8006d319ef3, rS+=8 |
M8[#6000000000000088]=l[17]=#bff0000000000000, rS+=8 |
M8[#6000000000000090]=l[18]=#43dfd80060000000, rS+=8 |
M8[#6000000000000098]=l[19]=#43dfd80080000000, rS+=8 |
M8[#60000000000000a0]=l[20]=#bff0000000000000, rS+=8 |
M8[#60000000000000a8]=l[21]=#406fe00000000000, rS+=8 |
M8[#60000000000000b0]=l[22]=#3fd0000000000000, rS+=8 |
M8[#60000000000000b8]=l[23]=#3d85780000000000, rS+=8 |
M8[#60000000000000c0]=l[24]=#0000000000000001, rS+=8 |
M8[#60000000000000c8]=l[25]=#5ff0000000000000, rS+=8 |
M8[#60000000000000d0]=l[26]=#5fefffffffffffff, rS+=8 |
M8[#60000000000000d8]=l[27]=#7ff0000000000000, rS+=8 |
M8[#60000000000000e0]=l[28]=#3c90000000000000, rS+=8 |
M8[#60000000000000e8]=l[29]=#8000000000000244, rS+=8 |
M8[#60000000000000f0]=l[30]=#3ff0000000000000, rS+=8 |
M8[#60000000000000f8]=l[31]=#070c142030404000, rS+=8 |
M8[#6000000000000100]=l[32]=#fefdfbf7efdfbf80, rS+=8 |
M8[#6000000000000108]=l[33]=#ffffffffffffffff, rS+=8 |
M8[#6000000000000110]=l[34]=#0102040810204080, rS+=8 |
M8[#6000000000000118]=l[35]=#7ebffd1f0bb06c00, rS+=8 |
M8[#6000000000000120]=l[36]=#7ebffd1f0bb06c00, rS+=8 |
M8[#6000000000000128]=l[37]=#0000000000000000, rS+=8 |
M8[#6000000000000130]=l[38]=#0000000000000000, rS+=8 |
M8[#6000000000000138]=l[39]=#0000000000000000, rS+=8 |
M8[#6000000000000140]=l[40]=#abc0000000000000, rS+=8 |
M8[#6000000000000148]=l[41]=#0000000000000029, rS+=8 |
M8[#6000000000000150]=g[241]=#2000000000000000, rS+=8 |
M8[#6000000000000158]=g[242]=#0102040810204080, rS+=8 |
M8[#6000000000000160]=g[243]=#809ffe4b398437f7, rS+=8 |
M8[#6000000000000168]=g[244]=#0102040810204080, rS+=8 |
M8[#6000000000000170]=g[245]=#ff5ffb6a4534a3f7, rS+=8 |
M8[#6000000000000178]=g[246]=#7f6001b4c67bc809, rS+=8 |
M8[#6000000000000180]=g[247]=#0000000000030000, rS+=8 |
M8[#6000000000000188]=g[248]=#0000000000020000, rS+=8 |
M8[#6000000000000190]=g[249]=#0000000000010000, rS+=8 |
M8[#6000000000000198]=g[250]=#7ff1000000000000, rS+=8 |
M8[#60000000000001a0]=g[251]=#7ff0000000000000, rS+=8 |
M8[#60000000000001a8]=g[252]=#3fe0000000000000, rS+=8 |
M8[#60000000000001b0]=g[253]=#8000000000000000, rS+=8 |
M8[#60000000000001b8]=g[254]=#0000000000000abc, rS+=8 |
M8[#60000000000001c0]=g[255]=#0000000000000100, rS+=8 |
M8[#60000000000001c8]=rB=#0000000000000000, rS+=8 |
M8[#60000000000001d0]=rD=#0001040c2050c1c4, rS+=8 |
M8[#60000000000001d8]=rE=#8000000000000000, rS+=8 |
M8[#60000000000001e0]=rH=#0001040c2050c1c4, rS+=8 |
M8[#60000000000001e8]=rJ=#0000000000000368, rS+=8 |
M8[#60000000000001f0]=rM=#ff5ffb6a4534a3f7, rS+=8 |
M8[#60000000000001f8]=rR=#0000000000000000, rS+=8 |
M8[#6000000000000200]=rP=#88898a8b8c8d8e8f, rS+=8 |
M8[#6000000000000208]=rW=#0000000000000368, rS+=8 |
M8[#6000000000000210]=rX=#00000000ef28f305, rS+=8 |
M8[#6000000000000218]=rY=#0000000000000000, rS+=8 |
M8[#6000000000000220]=rZ=#0000000000000000, rS+=8 |
M8[#6000000000000228]=(rG,rA)=#f100000000000049, rS+=8 |
1. 0000000000000388: faff0000 (SAVE) rL=0, $255=g[255] = #6000000000000228 |
line 190: PUT rG,small-$0 |
1. 000000000000038c: f71300fe (PUTI) rG = 254 = #fe |
line 191: INCL small-1,U_BIT<<8 |
1. 0000000000000390: e7fd0400 (INCL) rL=254, $253=l[67] = #0 + #400 = #400 |
line 192: FADD $100,small,$200 |
1. 0000000000000394: 0464fec8 (FADD) $100=l[170] = 1.3577e-320 (+) 0. = 1.3577e-320 |
line 193: PUT rA,small-1 % enable underflow trip |
1. 0000000000000398: f61500fd (PUT) rA = 1024 = #400 |
line 194: TRIP 1,$100,small |
1. 000000000000039c: ff0164fe (TRIP) rW=#3a0, rX=#80000000ff0164fe, rY=#abc, rZ=#abc, rB=#6000000000000228, g[255]=#368, -> #00 |
... |
line 214: GET $50,rX |
1. 0000000000000000: fe320019 (GET) $50=l[120] = rX = #80000000ff0164fe |
line 215: INCH $50,#8200 % ropcode 2 |
1. 0000000000000004: e4328200 (INCH) $50=l[120] = #80000000ff0164fe + #8200000000000000 = #2000000ff0164fe |
line 216: INCMH $50,#ff00-(U_BIT<<8) |
1. 0000000000000008: e532fb00 (INCMH) $50=l[120] = #2000000ff0164fe + #fb0000000000 = #200fb00ff0164fe |
Warning: TRIP at location 000000000000039c |
line 217: TRAP 1 |
1. 000000000000000c: 00000001 (TRAP) Halt(1) |
line 218: 2H PUT rX,$50 |
1. 0000000000000010: f6190032 (PUT) rX = 144391169772709118 = #200fb00ff0164fe |
line 219: GET $255,rB |
1. 0000000000000014: feff0000 (GET) $255=g[255] = rB = #6000000000000228 |
line 220: RESUME |
1. 0000000000000018: f9000000 (RESUME) {#200fb00ff0164fe} -> #3a0 |
(000000000000039c: ..01..rZ (SET)) $1=l[71] = 2748 = #abc, rA=#004fb |
-------- |
line 195: FSUB $100,small,$200 % cause underflow trip |
1. 00000000000003a0: 0664fec8 (FSUB) $100=l[170] = 1.3577e-320 (-) 0. = 1.3577e-320, -> #60 |
... |
line 203: PUSHJ $255,Handler |
1. 0000000000000060: f3ffffef (PUSHJB) l[68]=254, rO=#6000000000000a28, rL=0, rJ=#64, -> #1c |
... |
line 221: Handler SETL $5,#abcd |
M8[#6000000000000230]=l[70]=#0000000000000000, rS+=8 |
M8[#6000000000000238]=l[71]=#0000000000000abc, rS+=8 |
M8[#6000000000000240]=l[72]=#0000000000000000, rS+=8 |
M8[#6000000000000248]=l[73]=#0000000000000000, rS+=8 |
M8[#6000000000000250]=l[74]=#0000000000000000, rS+=8 |
M8[#6000000000000258]=l[75]=#0000000000000000, rS+=8 |
1. 000000000000001c: e305abcd (SETL) rL=6, $5=l[74] = #abcd |
line 222: GET $1,rJ |
1. 0000000000000020: fe010004 (GET) $1=l[70] = rJ = #64 |
line 223: PUSHJ 3,3B |
1. 0000000000000024: f2030010 (PUSHJ) l[72]=3, rO=#6000000000000a48, rL=2, rJ=#28, -> #64 |
Warning: floating point underflow at location 00000000000003a0 |
-------- |
line 204: 3H TRAP 0,$1 |
1. 0000000000000064: 00000001 (TRAP) Halt(1) |
line 205: SUB $0,$1,1 |
1. 0000000000000068: 25000101 (SUBI) $0=l[73] = 43981 - 1 = 43980 |
line 206: POP 2,0 |
1. 000000000000006c: f8020000 (POP) l[72]=#abcd, rL=5, rO=#6000000000000a28, -> #28 |
... |
line 224: SUB $10,$3,$4 |
M8[#6000000000000260]=l[76]=#0000000000000000, rS+=8 |
M8[#6000000000000268]=l[77]=#0000000000000000, rS+=8 |
M8[#6000000000000270]=l[78]=#0000000000000000, rS+=8 |
M8[#6000000000000278]=l[79]=#0000000000000000, rS+=8 |
M8[#6000000000000280]=l[80]=#0000000000000000, rS+=8 |
1. 0000000000000028: 240a0304 (SUB) rL=11, $10=l[79] = 43981 - 43980 = 1 |
line 225: PUT rJ,$1 |
1. 000000000000002c: f6040001 (PUT) rJ = 100 = #64 |
line 226: POP 11,(4B-3B)>>2 |
rS-=8, l[80]=M8[#6000000000000280]=#0000000000000000 |
rS-=8, l[79]=M8[#6000000000000278]=#0000000000000000 |
rS-=8, l[78]=M8[#6000000000000270]=#0000000000000000 |
rS-=8, l[77]=M8[#6000000000000268]=#0000000000000000 |
rS-=8, l[76]=M8[#6000000000000260]=#0000000000000000 |
rS-=8, l[75]=M8[#6000000000000258]=#0000000000000000 |
rS-=8, l[74]=M8[#6000000000000250]=#0000000000000000 |
rS-=8, l[73]=M8[#6000000000000248]=#0000000000000000 |
rS-=8, l[72]=M8[#6000000000000240]=#0000000000000000 |
rS-=8, l[71]=M8[#6000000000000238]=#0000000000000abc |
rS-=8, l[70]=M8[#6000000000000230]=#0000000000000000 |
1. 0000000000000030: f80b0003 (POP) rL=254, rO=#6000000000000230, -> #64+12 |
-------- |
line 207: 4H GET $50,rX |
1. 0000000000000070: fe320019 (GET) $50=l[120] = rX = #800000000664fec8 |
line 208: INCH $50,#8100 % ropcode 1 |
1. 0000000000000074: e4328100 (INCH) $50=l[120] = #800000000664fec8 + #8100000000000000 = #10000000664fec8 |
line 209: FLOT $60,1 |
1. 0000000000000078: 093c0001 (FLOTI) $60=l[130] = (flot) 1 = 1. |
line 210: PUT rZ,$60 |
1. 000000000000007c: f61b003c (PUT) rZ = 4607182418800017408 = #3ff0000000000000 |
line 211: JMP 2F |
1. 0000000000000080: f1ffffe4 (JMPB) -> #10 |
... |
line 218: 2H PUT rX,$50 |
2. 0000000000000010: f6190032 (PUT) rX = 72057594145210056 = #10000000664fec8 |
line 219: GET $255,rB |
2. 0000000000000014: feff0000 (GET) $255=g[255] = rB = #6000000000000228 |
line 220: RESUME |
2. 0000000000000018: f9000000 (RESUME) {#10000000664fec8} -> #3a4 |
(00000000000003a0: 0664rYrZ (FSUB)) $100=l[170] = 1.3577e-320 (-) 1. = -1., rA=#004fb |
-------- |
line 196: PUT rL,10 |
1. 00000000000003a4: f714000a (PUTI) rL = min(rL,10) = 10 |
line 197: PUT rL,small |
1. 00000000000003a8: f61400fe (PUT) rL = min(rL,2748) = 10 |
line 198: PUSHJ 11,@+4 |
1. 00000000000003ac: f20b0001 (PUSHJ) l[81]=11, rO=#6000000000000290, rL=0, rJ=#3b0, -> #3b0 |
line 199: UNSAVE $255 |
(rG,rA)=M8[#6000000000000228]=#f100000000000049 |
rS-=8, rZ=M8[#6000000000000220]=#0000000000000000 |
rS-=8, rY=M8[#6000000000000218]=#0000000000000000 |
rS-=8, rX=M8[#6000000000000210]=#00000000ef28f305 |
rS-=8, rW=M8[#6000000000000208]=#0000000000000368 |
rS-=8, rP=M8[#6000000000000200]=#88898a8b8c8d8e8f |
rS-=8, rR=M8[#60000000000001f8]=#0000000000000000 |
rS-=8, rM=M8[#60000000000001f0]=#ff5ffb6a4534a3f7 |
rS-=8, rJ=M8[#60000000000001e8]=#0000000000000368 |
rS-=8, rH=M8[#60000000000001e0]=#0001040c2050c1c4 |
rS-=8, rE=M8[#60000000000001d8]=#8000000000000000 |
rS-=8, rD=M8[#60000000000001d0]=#0001040c2050c1c4 |
rS-=8, rB=M8[#60000000000001c8]=#0000000000000000 |
rS-=8, g[255]=M8[#60000000000001c0]=#0000000000000100 |
rS-=8, g[254]=M8[#60000000000001b8]=#0000000000000abc |
rS-=8, g[253]=M8[#60000000000001b0]=#8000000000000000 |
rS-=8, g[252]=M8[#60000000000001a8]=#3fe0000000000000 |
rS-=8, g[251]=M8[#60000000000001a0]=#7ff0000000000000 |
rS-=8, g[250]=M8[#6000000000000198]=#7ff1000000000000 |
rS-=8, g[249]=M8[#6000000000000190]=#0000000000010000 |
rS-=8, g[248]=M8[#6000000000000188]=#0000000000020000 |
rS-=8, g[247]=M8[#6000000000000180]=#0000000000030000 |
rS-=8, g[246]=M8[#6000000000000178]=#7f6001b4c67bc809 |
rS-=8, g[245]=M8[#6000000000000170]=#ff5ffb6a4534a3f7 |
rS-=8, g[244]=M8[#6000000000000168]=#0102040810204080 |
rS-=8, g[243]=M8[#6000000000000160]=#809ffe4b398437f7 |
rS-=8, g[242]=M8[#6000000000000158]=#0102040810204080 |
rS-=8, g[241]=M8[#6000000000000150]=#2000000000000000 |
rS-=8, l[41]=M8[#6000000000000148]=#0000000000000029 |
rS-=8, l[40]=M8[#6000000000000140]=#abc0000000000000 |
rS-=8, l[39]=M8[#6000000000000138]=#0000000000000000 |
rS-=8, l[38]=M8[#6000000000000130]=#0000000000000000 |
rS-=8, l[37]=M8[#6000000000000128]=#0000000000000000 |
rS-=8, l[36]=M8[#6000000000000120]=#7ebffd1f0bb06c00 |
rS-=8, l[35]=M8[#6000000000000118]=#7ebffd1f0bb06c00 |
rS-=8, l[34]=M8[#6000000000000110]=#0102040810204080 |
rS-=8, l[33]=M8[#6000000000000108]=#ffffffffffffffff |
rS-=8, l[32]=M8[#6000000000000100]=#fefdfbf7efdfbf80 |
rS-=8, l[31]=M8[#60000000000000f8]=#070c142030404000 |
rS-=8, l[30]=M8[#60000000000000f0]=#3ff0000000000000 |
rS-=8, l[29]=M8[#60000000000000e8]=#8000000000000244 |
rS-=8, l[28]=M8[#60000000000000e0]=#3c90000000000000 |
rS-=8, l[27]=M8[#60000000000000d8]=#7ff0000000000000 |
rS-=8, l[26]=M8[#60000000000000d0]=#5fefffffffffffff |
rS-=8, l[25]=M8[#60000000000000c8]=#5ff0000000000000 |
rS-=8, l[24]=M8[#60000000000000c0]=#0000000000000001 |
rS-=8, l[23]=M8[#60000000000000b8]=#3d85780000000000 |
rS-=8, l[22]=M8[#60000000000000b0]=#3fd0000000000000 |
rS-=8, l[21]=M8[#60000000000000a8]=#406fe00000000000 |
rS-=8, l[20]=M8[#60000000000000a0]=#bff0000000000000 |
rS-=8, l[19]=M8[#6000000000000098]=#43dfd80080000000 |
rS-=8, l[18]=M8[#6000000000000090]=#43dfd80060000000 |
rS-=8, l[17]=M8[#6000000000000088]=#bff0000000000000 |
rS-=8, l[16]=M8[#6000000000000080]=#43dfd8006d319ef3 |
rS-=8, l[15]=M8[#6000000000000078]=#43dfd8006d319ef2 |
rS-=8, l[14]=M8[#6000000000000070]=#ffffffffffffffff |
rS-=8, l[13]=M8[#6000000000000068]=#8000000000000000 |
rS-=8, l[12]=M8[#6000000000000060]=#0000000000000001 |
rS-=8, l[11]=M8[#6000000000000058]=#fff8000000000000 |
rS-=8, l[10]=M8[#6000000000000050]=#fff8000000000000 |
rS-=8, l[9]=M8[#6000000000000048]=#bfdfffffffffffff |
rS-=8, l[8]=M8[#6000000000000040]=#0000000000000000 |
rS-=8, l[7]=M8[#6000000000000038]=#00000000ef28f305 |
rS-=8, l[6]=M8[#6000000000000030]=#00000000ef28f305 |
rS-=8, l[5]=M8[#6000000000000028]=#0102040810204080 |
rS-=8, l[4]=M8[#6000000000000020]=#0000000000000308 |
rS-=8, l[3]=M8[#6000000000000018]=#0000000000000368 |
rS-=8, l[2]=M8[#6000000000000010]=#0000000000000000 |
rS-=8, l[1]=M8[#6000000000000008]=#ffffffffffffffff |
rS-=8, l[0]=M8[#6000000000000000]=#7fefffffffffffff |
1. 00000000000003b0: fb0000ff (UNSAVE) #6000000000000228: rG=241, ..., rL=41 |
line 200: TRAP 0,Halt,0 % normal exit |
1. 00000000000003b4: 00000000 (TRAP) Halt(0) |
1243 instructions, 99 mems, 2557 oops; 179 good guesses, 19 bad |
(halted at location #00000000000003b4) |
|
Program profile: |
"silly.mms" |
line 214: GET $50,rX |
1. 0000000000000000: fe320019 (GET) |
line 215: INCH $50,#8200 % ropcode 2 |
1. 0000000000000004: e4328200 (INCH) |
line 216: INCMH $50,#ff00-(U_BIT<<8) |
1. 0000000000000008: e532fb00 (INCMH) |
line 217: TRAP 1 |
1. 000000000000000c: 00000001 (TRAP) |
line 218: 2H PUT rX,$50 |
2. 0000000000000010: f6190032 (PUT) |
line 219: GET $255,rB |
2. 0000000000000014: feff0000 (GET) |
line 220: RESUME |
2. 0000000000000018: f9000000 (RESUME) |
line 221: Handler SETL $5,#abcd |
1. 000000000000001c: e305abcd (SETL) |
line 222: GET $1,rJ |
1. 0000000000000020: fe010004 (GET) |
line 223: PUSHJ 3,3B |
1. 0000000000000024: f2030010 (PUSHJ) |
line 224: SUB $10,$3,$4 |
1. 0000000000000028: 240a0304 (SUB) |
line 225: PUT rJ,$1 |
1. 000000000000002c: f6040001 (PUT) |
line 226: POP 11,(4B-3B)>>2 |
1. 0000000000000030: f80b0003 (POP) |
-------- |
line 203: PUSHJ $255,Handler |
1. 0000000000000060: f3ffffef (PUSHJB) |
line 204: 3H TRAP 0,$1 |
1. 0000000000000064: 00000001 (TRAP) |
line 205: SUB $0,$1,1 |
1. 0000000000000068: 25000101 (SUBI) |
line 206: POP 2,0 |
1. 000000000000006c: f8020000 (POP) |
line 207: 4H GET $50,rX |
1. 0000000000000070: fe320019 (GET) |
line 208: INCH $50,#8100 % ropcode 1 |
1. 0000000000000074: e4328100 (INCH) |
line 209: FLOT $60,1 |
1. 0000000000000078: 093c0001 (FLOTI) |
line 210: PUT rZ,$60 |
1. 000000000000007c: f61b003c (PUT) |
line 211: JMP 2F |
1. 0000000000000080: f1ffffe4 (JMPB) |
-------- |
line 29: Main FCMP $0,neg_zero,$5 |
1. 0000000000000100: 0100fd05 (FCMP) |
line 30: FCMP $1,neg_zero,inf |
1. 0000000000000104: 0101fdfb (FCMP) |
line 31: FCMP $2,inf,sig_nan |
1. 0000000000000108: 0102fbfa (FCMP) |
line 32: FUN $3,sig_nan,sig_nan |
1. 000000000000010c: 0203fafa (FUN) |
line 33: FEQL $4,$4,neg_zero |
1. 0000000000000110: 030404fd (FEQL) |
line 34: FADD $5,half,inf |
1. 0000000000000114: 0405fcfb (FADD) |
line 35: FADD $6,half,neg_zero |
1. 0000000000000118: 0406fcfd (FADD) |
line 36: FADD $7,half,half |
1. 000000000000011c: 0407fcfc (FADD) |
line 37: FADD $8,half,sig_nan |
1. 0000000000000120: 0408fcfa (FADD) |
line 38: FSUB $9,half,small |
1. 0000000000000124: 0609fcfe (FSUB) |
line 39: PUT rA,round_off |
1. 0000000000000128: f61500f9 (PUT) |
line 40: FSUB $9,half,small |
1. 000000000000012c: 0609fcfe (FSUB) |
line 41: FSUB $9,small,half |
1. 0000000000000130: 0609fefc (FSUB) |
line 42: FSQRT $10,$9 |
1. 0000000000000134: 150a0009 (FSQRT) |
line 43: FSUB $11,sig_nan,$10 |
1. 0000000000000138: 060bfa0a (FSUB) |
line 44: PUT rA,round_down |
1. 000000000000013c: f61500f7 (PUT) |
line 45: FSUB $12,half,half |
1. 0000000000000140: 060cfcfc (FSUB) |
line 46: FSUB $12,$20,$21 |
1. 0000000000000144: 060c1415 (FSUB) |
line 47: FSUB $12,$20,neg_zero |
1. 0000000000000148: 060c14fd (FSUB) |
line 48: PUT rA,round_up |
1. 000000000000014c: f61500f8 (PUT) |
line 49: SUB $0,inf,1 % $0 = largest normal number |
1. 0000000000000150: 2500fb01 (SUBI) |
line 50: FADD $12,$0,small |
1. 0000000000000154: 040c00fe (FADD) |
line 51: FIX $12,half |
1. 0000000000000158: 050c00fc (FIX) |
line 52: FIXU $14,ROUND_DOWN,$9 |
1. 000000000000015c: 070e0309 (FIXU) |
line 53: FLOT $15,ROUND_DOWN,addy |
1. 0000000000000160: 080f03f6 (FLOT) |
line 54: FLOT $16,ROUND_UP,addy |
1. 0000000000000164: 081002f6 (FLOT) |
line 55: NEG $1,1 % $1 = -1 |
1. 0000000000000168: 35010001 (NEGI) |
line 56: FLOT $17,1 |
1. 000000000000016c: 09110001 (FLOTI) |
line 57: FLOT $17,$1 |
1. 0000000000000170: 08110001 (FLOT) |
line 58: FLOTU $18,255 |
1. 0000000000000174: 0b1200ff (FLOTUI) |
line 59: FLOTU $18,neg_zero |
1. 0000000000000178: 0a1200fd (FLOTU) |
line 60: FIX $13,ROUND_NEAR,$18 |
1. 000000000000017c: 050d0412 (FIX) |
line 61: SFLOT $18,ROUND_DOWN,addy |
1. 0000000000000180: 0c1203f6 (SFLOT) |
line 62: SFLOT $19,ROUND_UP,addy |
1. 0000000000000184: 0c1302f6 (SFLOT) |
line 63: FSUB $20,$18,$19 |
1. 0000000000000188: 06141213 (FSUB) |
line 64: FSUB $20,$16,$15 |
1. 000000000000018c: 0614100f (FSUB) |
line 65: SFLOT $20,1 |
1. 0000000000000190: 0d140001 (SFLOTI) |
line 66: SFLOT $20,$1 |
1. 0000000000000194: 0c140001 (SFLOT) |
line 67: SFLOTU $21,$1 |
1. 0000000000000198: 0e150001 (SFLOTU) |
line 68: SFLOTU $21,255 |
1. 000000000000019c: 0f1500ff (SFLOTUI) |
line 69: FMUL $22,neg_zero,inf |
1. 00000000000001a0: 1016fdfb (FMUL) |
line 70: FMUL $22,half,half |
1. 00000000000001a4: 1016fcfc (FMUL) |
line 71: FMUL $23,small,$0 |
1. 00000000000001a8: 1017fe00 (FMUL) |
line 72: PUT rE,half |
1. 00000000000001ac: f60200fc (PUT) |
line 73: FCMPE $24,half,$21 |
1. 00000000000001b0: 1118fc15 (FCMPE) |
line 74: FCMPE $24,neg_zero,small |
1. 00000000000001b4: 1118fdfe (FCMPE) |
line 75: FCMPE $24,neg_zero,half |
1. 00000000000001b8: 1118fdfc (FCMPE) |
line 76: FCMPE $24,half,inf |
1. 00000000000001bc: 1118fcfb (FCMPE) |
line 77: FEQLE $24,$15,$16 |
1. 00000000000001c0: 13180f10 (FEQLE) |
line 78: PUT rE,neg_zero |
1. 00000000000001c4: f60200fd (PUT) |
line 79: FEQLE $24,half,half |
1. 00000000000001c8: 1318fcfc (FEQLE) |
line 80: FUNE $24,half,half |
1. 00000000000001cc: 1218fcfc (FUNE) |
line 81: FSQRT $25,ROUND_UP,$0 |
1. 00000000000001d0: 15190200 (FSQRT) |
line 82: FDIV $26,$0,$25 |
1. 00000000000001d4: 141a0019 (FDIV) |
line 83: PUT rA,$50 |
1. 00000000000001d8: f6150032 (PUT) |
line 84: FDIV $26,$0,$25 |
1. 00000000000001dc: 141a0019 (FDIV) |
line 85: FMUL $27,$25,$25 |
1. 00000000000001e0: 101b1919 (FMUL) |
line 86: FREM $28,$9,half |
1. 00000000000001e4: 161c09fc (FREM) |
line 87: FREM $29,$9,small |
1. 00000000000001e8: 161d09fe (FREM) |
line 88: FINT $30,$9 |
1. 00000000000001ec: 171e0009 (FINT) |
line 89: FINT $30,ROUND_UP,small |
1. 00000000000001f0: 171e02fe (FINT) |
line 90: MUL $31,flip,flip |
1. 00000000000001f4: 181ff4f4 (MUL) |
line 91: MUL $32,flip,$1 |
1. 00000000000001f8: 1820f401 (MUL) |
line 92: MUL $33,flip,2 |
1. 00000000000001fc: 1921f402 (MULI) |
line 93: DIV $32,$32,$1 |
1. 0000000000000200: 1c202001 (DIV) |
line 94: DIV $32,neg_zero,$1 |
1. 0000000000000204: 1c20fd01 (DIV) |
line 95: MULU $32,flip,$1 |
1. 0000000000000208: 1a20f401 (MULU) |
line 96: MULU $31,flip,flip |
1. 000000000000020c: 1a1ff4f4 (MULU) |
line 97: GET $33,rH |
1. 0000000000000210: fe210003 (GET) |
line 98: PUT rD,$33 |
1. 0000000000000214: f6010021 (PUT) |
line 99: DIV $33,$1,3 |
1. 0000000000000218: 1d210103 (DIVI) |
line 100: DIVU $34,$31,flip |
1. 000000000000021c: 1e221ff4 (DIVU) |
line 101: ADD $35,addy,addz |
1. 0000000000000220: 2023f6f5 (ADD) |
line 102: FADD $36,addy,addz |
1. 0000000000000224: 0424f6f5 (FADD) |
line 103: CMP $37,$36,$35 |
1. 0000000000000228: 30252423 (CMP) |
line 104: GETA $3,1F |
1. 000000000000022c: f4030004 (GETA) |
line 105: PUT rW,$3 |
1. 0000000000000230: f6180003 (PUT) |
line 106: LDT $6,Start_Inst |
1. 0000000000000234: 8906f100 (LDTI) |
line 107: LDTU $7,Final_Inst |
1. 0000000000000238: 8b07f104 (LDTUI) |
line 108: 1H CMP $5,$6,$7 |
28. 000000000000023c: 30050607 (CMP) |
line 109: BNN $5,1F |
28. 0000000000000240: 48050004 (BNN) |
line 110: INCML $6,#100 % increase the opcode |
27. 0000000000000244: e6060100 (INCML) |
line 111: PUT rX,$6 % ropcode 0 |
27. 0000000000000248: f6190006 (PUT) |
line 112: RESUME % return to 1B |
27. 000000000000024c: f9000000 (RESUME) |
line 113: 1H BN $0,@+4*6 |
1. 0000000000000250: 40000006 (BN) |
line 114: PBN $0,@-4*1 |
1. 0000000000000254: 5100ffff (PBNB) |
line 115: BNN $0,@+4*6 |
1. 0000000000000258: 48000006 (BNN) |
line 116: PBN $0,@+4*5 |
1. 000000000000025c: 50000005 (PBN) |
line 117: PBNN $0,@+4*5 |
1. 0000000000000260: 58000005 (PBNN) |
line 118: BN $0,@-4*3 |
1. 0000000000000264: 4100fffd (BNB) |
line 119: BNN $0,@-4*3 |
1. 0000000000000268: 4900fffd (BNNB) |
line 120: PBN $0,@-4*3 |
line 121: PBNN $0,@-4*3 |
1. 0000000000000270: 5900fffd (PBNNB) |
line 122: BZ $0,@+4*6 |
1. 0000000000000274: 42000006 (BZ) |
line 123: PBZ $0,@-4*1 |
1. 0000000000000278: 5300ffff (PBZB) |
line 124: BNZ $0,@+4*6 |
1. 000000000000027c: 4a000006 (BNZ) |
line 125: PBZ $0,@+4*5 |
1. 0000000000000280: 52000005 (PBZ) |
line 126: PBNZ $0,@+4*5 |
1. 0000000000000284: 5a000005 (PBNZ) |
line 127: BZ $0,@-4*3 |
1. 0000000000000288: 4300fffd (BZB) |
line 128: BNZ $0,@-4*3 |
1. 000000000000028c: 4b00fffd (BNZB) |
line 129: PBZ $0,@-4*3 |
line 130: PBNZ $0,@-4*3 |
1. 0000000000000294: 5b00fffd (PBNZB) |
line 131: BP $0,@+4*6 |
1. 0000000000000298: 44000006 (BP) |
line 132: PBP $0,@-4*1 |
line 133: BNP $0,@+4*6 |
1. 00000000000002a0: 4c000006 (BNP) |
line 134: PBP $0,@+4*5 |
1. 00000000000002a4: 54000005 (PBP) |
line 135: PBNP $0,@+4*5 |
1. 00000000000002a8: 5c000005 (PBNP) |
line 136: BP $0,@-4*3 |
1. 00000000000002ac: 4500fffd (BPB) |
line 137: BNP $0,@-4*3 |
1. 00000000000002b0: 4d00fffd (BNPB) |
line 138: PBP $0,@-4*3 |
1. 00000000000002b4: 5500fffd (PBPB) |
line 139: PBNP $0,@-4*3 |
1. 00000000000002b8: 5d00fffd (PBNPB) |
line 140: BOD $0,@+4*6 |
1. 00000000000002bc: 46000006 (BOD) |
line 141: PBOD $0,@-4*1 |
line 142: BEV $0,@+4*6 |
1. 00000000000002c4: 4e000006 (BEV) |
line 143: PBOD $0,@+4*5 |
1. 00000000000002c8: 56000005 (PBOD) |
line 144: PBEV $0,@+4*5 |
1. 00000000000002cc: 5e000005 (PBEV) |
line 145: BOD $0,@-4*3 |
1. 00000000000002d0: 4700fffd (BODB) |
line 146: BEV $0,@-4*3 |
1. 00000000000002d4: 4f00fffd (BEVB) |
line 147: PBOD $0,@-4*3 |
1. 00000000000002d8: 5700fffd (PBODB) |
line 148: PBEV $0,@-4*3 |
1. 00000000000002dc: 5f00fffd (PBEVB) |
line 149: LDA $4,Load_Test+4 |
1. 00000000000002e0: 2304f10c (ADDUI) |
line 150: GETA $3,1F |
1. 00000000000002e4: f4030004 (GETA) |
line 151: PUT rW,$3 |
1. 00000000000002e8: f6180003 (PUT) |
line 152: LDTU $7,Load_End |
1. 00000000000002ec: 8b07f124 (LDTUI) |
line 153: LDTU $6,Load_Begin |
1. 00000000000002f0: 8b06f120 (LDTUI) |
line 154: 1H CMPU $8,$6,$7 |
57. 00000000000002f4: 32080607 (CMPU) |
line 155: BNN $8,1F |
57. 00000000000002f8: 4808000c (BNN) |
line 156: INCML $6,#100 % increase the opcode |
56. 00000000000002fc: e6060100 (INCML) |
line 157: PUT rX,$6 |
56. 0000000000000300: f6190006 (PUT) |
line 158: RESUME % return to 1B |
56. 0000000000000304: f9000000 (RESUME) |
line 159: 2H OCTA #fedcba9876543210 % becomes Jmp_Pop |
1. 000000000000030c: f8000000 (POP) |
line 160: OCTA #ffeeddccbbaa9988 % becomes Jmp_Pop |
1. 0000000000000310: f0000002 (JMP) |
line 161: NEG ry,addy |
1. 0000000000000318: 34f300f6 (NEG) |
line 162: SET rz,flip |
1. 000000000000031c: c1f2f400 (ORI) |
line 163: PUT rM,addz |
1. 0000000000000320: f60500f5 (PUT) |
line 164: POP |
1. 0000000000000324: f8000000 (POP) |
line 165: 1H GETA $4,2B |
1. 0000000000000328: f504fff8 (GETAB) |
line 166: SETL $7,4*11 |
1. 000000000000032c: e307002c (SETL) |
line 167: GO $7,$7,$4 |
1. 0000000000000330: 9e070704 (GO) |
line 168: GO $7,$4,4*12 |
1. 0000000000000334: 9f070430 (GOI) |
line 169: PRELD 70,$4,$4 |
1. 0000000000000338: 9a460404 (PRELD) |
line 170: PRELD 70,$4,0 |
1. 000000000000033c: 9b460400 (PRELDI) |
line 171: PREGO 70,$4,$4 |
1. 0000000000000340: 9c460404 (PREGO) |
line 172: PREGO 70,$4,0 |
1. 0000000000000344: 9d460400 (PREGOI) |
line 173: CSWAP $3,Load_Test+13 |
1. 0000000000000348: 9503f115 (CSWAPI) |
line 174: GETA $3,1F |
1. 000000000000034c: f4030007 (GETA) |
line 175: PUT rW,$3 |
1. 0000000000000350: f6180003 (PUT) |
line 176: SETL rz,1 |
1. 0000000000000354: e3f20001 (SETL) |
line 177: ADD ry,$4,4 |
1. 0000000000000358: 21f30404 (ADDI) |
line 178: LDOU $40,Jmp_Pop |
1. 000000000000035c: 8f28f118 (LDOUI) |
line 179: LDTU $7,Big_End |
1. 0000000000000360: 8b07f12c (LDTUI) |
line 180: LDTU $6,Big_Begin |
1. 0000000000000364: 8b06f128 (LDTUI) |
line 181: 1H CMPU $8,$6,$7 |
81. 0000000000000368: 32080607 (CMPU) |
line 182: BNN $8,1F |
81. 000000000000036c: 48080005 (BNN) |
line 183: INCML $6,#100 % increase the opcode |
80. 0000000000000370: e6060100 (INCML) |
line 184: PUT rX,$6 |
80. 0000000000000374: f6190006 (PUT) |
line 185: SET $5,rz |
80. 0000000000000378: c105f200 (ORI) |
line 186: RESUME % return to 1B |
80. 000000000000037c: f9000000 (RESUME) |
line 187: 1H SL $40,small,51 |
1. 0000000000000380: 3928fe33 (SLI) |
line 188: SL $40,small,52 |
1. 0000000000000384: 3928fe34 (SLI) |
line 189: SAVE $255,0 |
1. 0000000000000388: faff0000 (SAVE) |
line 190: PUT rG,small-$0 |
1. 000000000000038c: f71300fe (PUTI) |
line 191: INCL small-1,U_BIT<<8 |
1. 0000000000000390: e7fd0400 (INCL) |
line 192: FADD $100,small,$200 |
1. 0000000000000394: 0464fec8 (FADD) |
line 193: PUT rA,small-1 % enable underflow trip |
1. 0000000000000398: f61500fd (PUT) |
line 194: TRIP 1,$100,small |
1. 000000000000039c: ff0164fe (TRIP) |
line 195: FSUB $100,small,$200 % cause underflow trip |
1. 00000000000003a0: 0664fec8 (FSUB) |
line 196: PUT rL,10 |
1. 00000000000003a4: f714000a (PUTI) |
line 197: PUT rL,small |
1. 00000000000003a8: f61400fe (PUT) |
line 198: PUSHJ 11,@+4 |
1. 00000000000003ac: f20b0001 (PUSHJ) |
line 199: UNSAVE $255 |
1. 00000000000003b0: fb0000ff (UNSAVE) |
line 200: TRAP 0,Halt,0 % normal exit |
1. 00000000000003b4: 00000000 (TRAP) |
1243 instructions, 99 mems, 2557 oops; 179 good guesses, 19 bad |
(halted at location #00000000000003b4) |
/number1.mms
0,0 → 1,18
NEG $1,1 |
STCO 1,$1,1 |
CMPU $1,$1,1 |
STB $1,$1,$1 |
LDOU $1,$1,$1 |
INCH $1,1 |
16ADDU $1,$1,$1 |
MULU $1,$1,$1 |
PUT rA,1 |
STW $1,$1,1 |
SADD $1,$1,1 |
FLOT $1,$1 |
PUT rB,$1 |
XOR $1,$1,1 |
PBOD $1,@-4*1 |
NOR $1,$1,$1 |
SR $1,$1,1 |
SRU $1,$1,1 |
/copy.mms
0,0 → 1,63
* SAMPLE PROGRAM: COPY A GIVEN FILE TO STANDARD OUTPUT |
|
t IS $255 |
argc IS $0 |
argv IS $1 |
s IS $2 |
Buf_Size IS 5 ridiculously small for testing |
LOC Data_Segment |
Buffer LOC @+Buf_Size |
GREG @ |
Arg0 OCTA 0,TextRead |
Arg1 OCTA Buffer,Buf_Size |
|
LOC #200 main(argc,argv) { |
Main CMP t,argc,2 if (argc==2) goto openit |
PBZ t,OpenIt |
GETA t,1F fputs("Usage: ",stderr) |
TRAP 0,Fputs,StdErr |
LDOU t,argv,0 fputs(argv[0],stderr) |
TRAP 0,Fputs,StdErr |
GETA t,2F fputs(" filename\n",stderr) |
Quit TRAP 0,Fputs,StdErr |
NEG t,0,1 quit: exit(-1) |
TRAP 0,Halt,0 |
1H BYTE "Usage: ",0 |
LOC (@+3)&-4 align to tetrabyte |
2H BYTE " filename",#a,0 |
|
OpenIt LDOU s,argv,8 openit: s=argv[1] |
STOU s,Arg0 |
LDA t,Arg0 fopen(argv[1],"r",file[3]) |
TRAP 0,Fopen,3 |
PBNN t,CopyIt if (no error) goto copyit |
GETA t,1F fputs("Can't open file ",stderr) |
TRAP 0,Fputs,StdErr |
SET t,s fputs(argv[1],stderr) |
TRAP 0,Fputs,StdErr |
GETA t,2F fputs("!\n",stderr) |
JMP Quit goto quit |
1H BYTE "Can't open file ",0 |
LOC (@+3)&-4 align to tetrabyte |
2H BYTE "!",#a,0 |
|
CopyIt LDA t,Arg1 copyit: |
TRAP 0,Fread,3 items=fread(buffer,1,buf_size,file[3]) |
BN t,EndIt if (items < buf_size) goto endit |
LDA t,Arg1 items=fwrite(buffer,1,buf_size,stdout) |
TRAP 0,Fwrite,StdOut |
PBNN t,CopyIt if (items >= buf_size) goto copyit |
Trouble GETA t,1F trouble: fputs("Trouble w...!",stderr) |
JMP Quit goto quit |
1H BYTE "Trouble writing StdOut!",#a,0 |
|
EndIt INCL t,Buf_Size |
BN t,ReadErr if (ferror(file[3])) goto readerr |
STO t,Arg1+8 |
LDA t,Arg1 n=fwrite(buffer,1,items,stdout) |
TRAP 0,Fwrite,StdOut |
BN t,Trouble if (n < items) goto trouble |
TRAP 0,Halt,0 exit(0) |
ReadErr GETA t,1F readerr: fputs("Trouble r...!",stderr) |
JMP Quit goto quit } |
1H BYTE "Trouble reading!",#a,0 |
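
The control flow of copy.mms above (read a fixed-size chunk, write it, stop at the first short read) can be sketched in Python. This is only an illustration under stated assumptions: `copy_stream` and `BUF_SIZE` are hypothetical names, in-memory streams stand in for the file handles, and the error paths (`Trouble`, `ReadErr`) are not modeled.

```python
import io

BUF_SIZE = 5  # mirrors Buf_Size in copy.mms; deliberately tiny


def copy_stream(src, dst, buf_size=BUF_SIZE):
    """Copy src to dst in fixed-size chunks; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(buf_size)   # plays the role of TRAP 0,Fread,3
        dst.write(chunk)             # plays the role of TRAP 0,Fwrite,StdOut
        total += len(chunk)
        if len(chunk) < buf_size:    # short read: end of input (the EndIt path)
            return total


src = io.BytesIO(b"hello, world")
dst = io.BytesIO()
copied = copy_stream(src, dst)
```

A five-byte buffer forces several loop iterations even on short inputs, which is presumably why the MMIX source chooses such a small size for testing.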
/sub1.mms
0,0 → 1,22
x0 GREG Data_Segment |
t IS $255 |
LOC Data_Segment+8 |
OCTA 1,3,2,3 |
LOC Data_Segment+8*100 |
OCTA -1 |
LOC #100 |
* Maximum of X[1..100] |
j IS $0 ;m IS $1 ;kk IS $2 ;xk IS $3 |
Max100 SETL kk,100*8 |
LDO m,x0,kk |
JMP 2F |
3H LDO xk,x0,kk |
CMP t,xk,m |
PBNP t,5F |
SET m,xk |
2H SR j,kk,3 |
5H SUB kk,kk,8 |
PBP kk,3B |
6H POP 2,0 |
|
Main PUSHJ 0,Max100 |
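
The Max100 loop in sub1.mms scans X[100] down to X[1], replacing the current maximum only when a strictly larger element is seen. A minimal Python sketch of that scan follows; the name `max_scan_down` and the list-based array are assumptions made for illustration, with index 0 left unused so that indices match the 1-based X[1..100].

```python
def max_scan_down(x, n):
    """Return (j, m): m is the maximum of x[1..n], j the index where the scan found it."""
    k = n
    m, j = x[k], k          # start with the last element, as LDO m,x0,kk does
    k -= 1
    while k > 0:
        if x[k] > m:        # PBNP t,5F skips the update when x[k] <= m
            m, j = x[k], k
        k -= 1
    return j, m


# Data laid out as in sub1.mms: X[1..4] = 1,3,2,3 and X[100] = -1, the rest zero.
x = [0] * 101
x[1:5] = [1, 3, 2, 3]
x[100] = -1
result = max_scan_down(x, 100)
```

Because the scan runs downward and updates only on a strict increase, ties are resolved in favor of the larger index.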
/sub2.mms
0,0 → 1,23
x0 GREG Data_Segment |
t IS $255 |
LOC Data_Segment+8 |
OCTA 1,3,2,3 |
LOC Data_Segment+8*100 |
OCTA -1 |
LOC #100 |
* Maximum of X[1..100] |
j GREG ;m GREG ;kk GREG ;xk GREG ; GREG @ |
GoMax100 SETL kk,100*8 |
LDO m,x0,kk |
JMP 1F |
3H LDO xk,x0,kk |
CMP t,xk,m |
PBNP t,5F |
4H SET m,xk |
1H SR j,kk,3 |
5H SUB kk,kk,8 |
PBP kk,3B |
6H GO kk,$0,0 |
|
Main GO $0,GoMax100 |
|
/saddle1.mms
0,0 → 1,52
* Exercise 1.3.2'--18, Solution 1 |
LOC #100 |
t IS $255 |
a00 GREG Data_Segment |
a10 GREG Data_Segment+8 |
ij IS $0 % element index and return register |
j GREG % column index |
k GREG % size of list of minima |
x GREG % current minimum |
y GREG % current element |
Saddle SET ij,9*8 |
RowMin SET j,8 |
LDB x,a10,ij Candidate for row minimum |
2H SET k,0 Set list empty. |
4H INCL k,1 |
STB j,a00,k Put column index in list. |
1H SUB ij,ij,1 Go left one. |
SUB j,j,1 |
BZ j,ColMax Done with row? |
3H LDB y,a10,ij |
SUB t,x,y |
PBN t,1B Is \.x still minimum? |
SET x,y |
PBP t,2B New minimum? |
JMP 4B Remember another minimum. |
ColMax LDB $1,a00,k Get column from list. |
ADD j,$1,9*8-8 |
1H LDB y,a10,j |
CMP t,x,y |
PBN t,No Is row min${}<{}$column element? |
SUB j,j,8 |
PBP j,1B Done with column? |
Yes ADD ij,ij,$1 Yes; $\.{ij}\gets{}$index of saddle. |
LDA ij,a10,ij |
POP 1,0 |
No SUB k,k,1 Is list empty? |
BP k,ColMax If not, try again. |
PBP ij,RowMin Have all rows been tried? |
POP 1,0 Yes; $\$0=0$, no saddle.\quad\slug\endmmix |
|
aaaa GREG 6364136223846793005 C E Haynes's multiplier |
Main SET ij,9*8 assume that $1 = seed |
1H MULU $1,$1,aaaa |
INCL $1,1 |
MULU x,$1,5 |
GET x,rH |
SUB x,x,2 |
STB x,a10,ij |
SUB ij,ij,1 |
PBP ij,1B |
PUSHJ 2,Saddle |
JMP Main |
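
The Main routine of saddle1.mms fills the matrix with pseudorandom entries: a 64-bit linear congruential step, followed by taking the high half of 5 times the state and subtracting 2. The Python sketch below illustrates that arithmetic; `next_element` and `fill_matrix` are hypothetical helper names, and the seed is assumed to arrive as an ordinary integer rather than in register $1.

```python
MASK64 = (1 << 64) - 1
MULTIPLIER = 6364136223846793005  # the constant loaded into register aaaa


def next_element(state):
    """One generator step: return (new_state, matrix_entry)."""
    state = (state * MULTIPLIER + 1) & MASK64   # MULU $1,$1,aaaa ; INCL $1,1
    value = ((state * 5) >> 64) - 2             # MULU x,$1,5 ; GET x,rH ; SUB x,x,2
    return state, value


def fill_matrix(seed, count=9 * 8):
    """Generate `count` entries, one per iteration of the 1H loop."""
    state, entries = seed, []
    for _ in range(count):
        state, value = next_element(state)
        entries.append(value)
    return state, entries


final_state, entries = fill_matrix(0)
```

Since the state is below 2^64, the high half of 5 times the state lies in 0..4, so every entry falls in the range -2..2.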
/saddle2.mms
0,0 → 1,59
* Exercise 1.3.2'--18, Solution 2 |
LOC #100 |
t IS $255 |
a00 GREG Data_Segment |
a10 GREG Data_Segment+8 |
a20 GREG Data_Segment+8*2 |
ij GREG % element index |
ii GREG % row index times 8 |
j GREG % column index |
x GREG % current maximum |
y GREG % current element |
z GREG % current min max |
ans IS $0 % return register |
Phase1 SET j,8 Start at column 8. |
SET z,1000 $\.z\gets\infty$ (more or less). |
3H ADD ij,j,9*8-2*8 |
LDB x,a20,ij |
1H LDB y,a10,ij |
CMP t,x,y Is x<y? |
CSN x,t,y If so, update the maximum. |
2H SUB ij,ij,8 Move up one. |
PBP ij,1B |
STB x,a10,ij Store column maximum. |
CMP t,x,z Is x<z? |
CSN z,t,x If so, update the min max. |
SUB j,j,1 Move left a column. |
PBP j,3B |
Phase2 SET ii,9*8-8 At this point $\.z=\min_jC(j)$ |
3H ADD ij,ii,8 Prepare to search a row. |
SET j,8 |
1H LDB x,a10,ij |
SUB t,z,x Is $\.z>a_{ij}$? |
PBP t,No No saddle in this row |
PBN t,2F |
LDB x,a00,j Is $a_{ij}=C(j)$? |
CMP t,x,z |
CSZ ans,t,ij If so, remember a possible saddle point. |
2H SUB j,j,1 Move left in row. |
SUB ij,ij,1 |
PBP j,1B |
LDA ans,a10,ans A saddle point was found here. |
POP 1,0 |
No SUB ii,ii,8 |
PBP ii,3B Try another row. |
SET ans,0 |
POP 1,0 $\.{ans} = 0$; no saddle.\quad\slug |
|
aaaa GREG 6364136223846793005 C E Haynes's multiplier |
Main SET ij,9*8 assume that $1 = seed |
1H MULU $1,$1,aaaa |
INCL $1,1 |
MULU x,$1,5 |
GET x,rH |
SUB x,x,2 |
STB x,a10,ij |
SUB ij,ij,1 |
PBP ij,1B |
PUSHJ 2,Phase1 |
JMP Main |
/halves.mms
0,0 → 1,29
% Example program ... 2^-n in decimal |
% |
LOC #2000000000000000 % Data segment |
HALF BYTE '5' |
LOC @+'0'-1 |
BYTE "0011223344" % Table of half-digits |
DATA BYTE '1',0 |
% |
GREGTOP $g250 |
pbase GREG DATA-1 |
half GREG HALF |
p GREG 0 |
starp GREG 0 |
carry GREG 0 |
acc GREG 0 |
LOC #1000 |
Main OR p,pbase,0 % p = &DATA-1. |
SETL carry,0 % carry = 0. |
JMP 1F |
Loop ADD acc,acc,carry % acc += carry. |
ZSOD carry,starp,5 % carry = 5[*p odd]. |
STB acc,p,0 % *p = acc. |
1H LDB starp,p,1 |
INCL p,1 % p++. |
LDB acc,half,starp % acc = half[*p]. |
PBNZ starp,Loop % repeat until *p='\0'. |
STB acc,p,0 % *p = '5'. |
JMP Main % repeat indefinitely. |
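
Each pass of the loop in halves.mms halves a decimal digit string in place: every digit becomes half of itself plus a carry of 5 when the previous digit was odd, and a final 5 is appended. A small Python sketch of one pass is given here as an illustration; `halve` is a hypothetical name, and the string starts with the units digit, as the `DATA` byte '1' suggests.

```python
def halve(digits):
    """Halve a decimal digit string whose last digit is odd (as every 2^-n is)."""
    out = []
    carry = 0
    for ch in digits:
        d = int(ch)
        out.append(str(d // 2 + carry))   # LDB acc,half,starp ; ADD acc,acc,carry
        carry = 5 if d % 2 else 0         # ZSOD carry,starp,5
    out.append("5")                       # the final STB stores half[0] = '5'
    return "".join(out)


value = "1"
history = [value]
for _ in range(4):
    value = halve(value)
    history.append(value)
```

Like the MMIX program, the sketch appends 5 unconditionally; that is correct only because the last digit of the string is always odd (1 initially, 5 thereafter).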
|
/test.mmconfig
0,0 → 1,37
% CONFIGURATION TEST |
% The following erroneous lines have been commented out one by one: |
%sh*t % obscene |
%memaddresstime 0 % too small |
%memaddresstime unit % unreadable |
%branchpredictbits 9 % too large |
%membusbytes 9 % not a power of two |
%ITcache unit % unknown cache parameter |
%mul0 0 % too small |
%mul0 256 % too big |
%unit antidisestablishmentarianism % too long |
%unit 0 0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEG % eh? |
%unit 1 0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEFG % 65 |
%unit 2 0000000000000000000000000000000000000000000000000000000000000000 % 0's |
%Dcache blocksize 1024 % exceeds Scache |
%Dcache granularity 16 % exceeds blocksize |
%Scache granularity 16 % differs from Dcache |
memaddresstime 4 |
memreadtime 5 memwritetime 6 % don't ask why |
membusbytes 16 |
branchpredictbits 2 |
branchaddressbits 1 |
branchhistorybits 1 |
branchdualbits 1 |
%branchdualbits 30 |
memchunksmax 2 |
hashprime 3 |
Scache blocksize 32 |
Scache setsize 2 |
Scache associativity 4 lru |
Scache accesstime 2 |
Icache victimsize 2 |
unit UNI1 ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff |
unit UNI2 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF |
sh 1 1 1 |
|
|
/harm.mms
0,0 → 1,47
* Sum of Rounded Harmonic Series |
MaxN IS 10 |
a GREG % Accumulator |
c GREG % $2\cdot10^n$ |
d GREG % Divisor or digit |
r GREG % Scaled reciprocal |
s GREG % Scaled sum |
m GREG % $m_k$ |
mm GREG % $m_{k+1}$ |
nn GREG % $n-\.{MaxN}$ |
LOC Data_Segment |
dec GREG @+3 % decimal point loc |
BYTE " ." |
% LOC @+MaxN+6 |
LOC #100 |
Main NEG nn,MaxN-1 |
SET c,20 |
1H SET m,1 |
SR s,c,1 |
JMP 2F |
3H SUB a,c,1 |
SL d,r,1 |
SUB d,d,1 |
DIV mm,a,d |
4H SUB a,mm,m |
MUL a,r,a |
ADD s,s,a |
SET m,mm |
2H ADD a,c,m |
2ADDU d,m,2 |
DIV r,a,d |
PBNZ r,3B |
5H ADD a,nn,MaxN+1 |
SET d,#a |
JMP 7F |
6H DIV s,s,10 |
GET d,rR |
INCL d,'0' |
7H STB d,dec,a |
SUB a,a,1 |
BZ a,@-4 |
PBNZ s,6B |
8H SUB $255,dec,3 |
TRAP 0,Fputs,StdOut |
9H INCL nn,1 |
MUL c,c,10 |
PBNP nn,1B |
/sim.mms
0,0 → 1,1136
% Stripped-Down Simulator for MMIX, derived from MMIX-SIM |
% To run it on a program like "foo bar" |
% first say "mmix -Dfoo.mmb foo bar" |
% then "mmix <options> sim foo.mmb" |
|
% I apologize for lack of comments; they're in the book though |
|
t IS $255 |
lring_size IS 256 % octabytes in the local register ring |
|
LOC Data_Segment |
Global LOC @+8*256 |
g GREG Global % base of 256 global registers |
Local LOC @+8*lring_size |
l GREG Local % base of lring_size local registers |
GREG @ |
IOArgs OCTA 0,BinaryRead |
Chunk0 IS @ |
|
LOC #100 |
PREFIX :Mem: |
head GREG % address of first chunk |
curkey GREG % KEY(head) |
alloc GREG % address of next chunk to allocate |
Chunk IS #1000 bytes per chunk, is power of 2 |
addr IS $0 |
key IS $1 |
test IS $2 |
newlink IS $3 |
p IS $4 % LINK(p)=head |
t IS :t |
|
KEY IS 0 |
LINK IS 8 |
DATA IS 16 |
nodesize GREG Chunk+3*8 pad with 8 zero bytes |
mask GREG Chunk-1 |
|
:MemFind ANDN key,addr,mask |
CMPU t,key,curkey |
PBZ t,4F |
BN addr,:Error |
SET newlink,head |
1H SET p,head |
LDOU head,p,LINK |
PBNZ head,2F |
SET head,alloc |
STOU key,head,KEY |
ADDU alloc,alloc,nodesize |
JMP 3F |
2H LDOU test,head,KEY |
CMPU t,test,key |
BNZ t,1B |
3H LDOU t,head,LINK |
STOU newlink,head,LINK |
SET curkey,key |
STOU t,p,LINK |
4H SUBU t,addr,key |
LDA $0,head,DATA |
ADDU $0,t,$0 |
POP 1,0 |
PREFIX : |
|
res IS $2 |
arg IS res+1 |
|
ss GREG % rS |
oo GREG % rO |
ll GREG % 8*rL |
gg GREG % 8*rG |
aa GREG % rA |
ii GREG % rI |
uu GREG % rU |
cc GREG % rC |
|
lring_mask GREG 8*lring_size-1 |
:GetReg CMPU t,$0,gg |
BN t,1F |
LDOU $0,g,$0 |
POP 1,0 |
1H CMPU t,$0,ll |
ADDU $0,$0,oo |
AND $0,$0,lring_mask |
LDOU $0,l,$0 |
CSNN $0,t,0 |
POP 1,0 |
|
:StackStore GET $0,rJ |
AND t,ss,lring_mask \S82 |
LDOU $1,l,t |
SET arg,ss |
PUSHJ res,MemFind |
STOU $1,res,0 M[rS]<-l[rS] |
ADDU ss,ss,8 |
PUT rJ,$0 |
POP |
:StackLoad GET $0,rJ |
SUBU ss,ss,8 \S83 |
SET arg,ss |
PUSHJ res,MemFind |
LDOU $1,res,0 |
AND t,ss,lring_mask |
STOU $1,l,t |
PUT rJ,$0 |
POP |
:StackRoom SUBU t,ss,oo idiom in \S81,\S101,\S102 |
SUBU t,t,ll |
AND t,t,lring_mask |
PBNZ t,1F |
GET $0,rJ |
PUSHJ res,StackStore |
PUT rJ,$0 |
1H POP |
|
* The main loop |
loc GREG % where the simulator is at |
inst_ptr GREG % where the simulator will be next |
inst GREG % the current instruction being simulated |
resuming GREG % are we resuming an instruction in rX? |
|
Fetch PBZ resuming,1F \S60 (main simulation loop) |
SUBU loc,inst_ptr,4 |
LDTU inst,g,8*rX+4 |
JMP 2F |
1H SET loc,inst_ptr |
SET arg,loc |
PUSHJ res,MemFind |
LDTU inst,res,0 |
ADDU inst_ptr,loc,4 |
2H CMPU t,loc,g |
BNN t,Error loc>=Data_Segment |
|
op GREG % opcode of the current instruction |
xx GREG % X field of the current instruction |
yy GREG % Y field of the current instruction |
zz GREG % Z field of the current instruction |
yz GREG % YZ field of the current instruction |
f GREG % packed information about the current op |
xxx GREG % X field times 8 |
x GREG % result, or X operand |
y GREG % Y operand |
z GREG % Z operand |
xptr GREG % location where x should be stored |
exc GREG % arithmetic exceptions |
|
Z_is_immed_bit IS #1 |
Z_is_source_bit IS #2 |
Y_is_immed_bit IS #4 |
Y_is_source_bit IS #8 |
X_is_source_bit IS #10 |
X_is_dest_bit IS #20 |
Rel_addr_bit IS #40 |
Mem_bit IS #80 |
|
Info IS #1000 |
Done IS Info+8*256 |
info GREG Info % base address for master info table |
c255 GREG 8*255 |
c256 GREG 8*256 |
|
MOR op,inst,#8 |
MOR xx,inst,#4 |
MOR yy,inst,#2 |
MOR zz,inst,#1 |
0H GREG -#10000 |
ANDN yz,inst,0B |
SLU xxx,xx,3 |
SLU t,op,3 |
LDOU f,info,t |
SET x,0 |
SET y,0 |
SET z,0 |
SET exc,0 |
AND t,f,Rel_addr_bit |
PBZ t,1F |
PBEV f,2F Convert rel to abs, \S70 |
9H GREG -#1000000 |
ANDN yz,inst,9B xyz |
ADDU t,yz,9B |
JMP 3F |
2H ADDU t,yz,0B |
3H CSOD yz,op,t |
SL t,yz,2 |
ADDU yz,loc,t |
1H PBNN resuming,Install_X Install operands \S71 |
LDOU y,g,8*rY Install special operands \S127 |
LDOU z,g,8*rZ |
BOD resuming,Install_Y |
0H GREG #C1<<56+(x-$0)<<48+(z-$0)<<40+1<<16+X_is_dest_bit |
SET f,0B Change to ORI instruction |
LDOU exc,g,8*rX |
MOR exc,exc,#20 |
JMP XDest |
Install_X AND t,f,X_is_source_bit |
PBZ t,1F |
SET arg,xxx |
PUSHJ res,GetReg |
SET x,res |
1H SRU t,f,5 |
AND t,t,#f8 |
PBZ t,Install_Z |
LDOU x,g,t Set x from third op, \S79 |
Install_Z AND t,f,Z_is_source_bit |
PBZ t,1F |
SLU arg,zz,3 |
PUSHJ res,GetReg |
SET z,res |
JMP Install_Y |
1H CSOD z,f,zz Z_is_immed_bit |
AND t,op,#f0 |
CMPU t,t,#e0 |
PBNZ t,Install_Y |
AND t,op,#3 Set z as immediate wyde, \S78 |
NEG t,3,t |
SLU t,t,4 |
SLU z,yz,t |
SET y,x |
Install_Y AND t,f,Y_is_immed_bit |
PBZ t,1F |
SET y,yy |
SLU t,yy,40 |
ADDU f,f,t |
1H AND t,f,Y_is_source_bit |
BZ t,1F |
SLU arg,yy,3 |
PUSHJ res,GetReg |
SET y,res (end of \S71) |
1H AND t,f,X_is_dest_bit |
BZ t,1F |
XDest CMPU t,xxx,gg Install X as dest, \S80 |
BN t,3F |
LDA xptr,g,xxx |
JMP 1F |
2H ADDU t,oo,ll |
AND t,t,lring_mask |
STCO 0,l,t |
INCL ll,8 |
PUSHJ res,StackRoom |
3H CMPU t,xxx,ll |
BNN t,2B |
ADD t,xxx,oo |
AND t,t,lring_mask |
LDA xptr,l,t |
1H AND t,f,Mem_bit |
PBZ t,1F |
ADDU arg,y,z |
CMPU t,op,#A0 |
BN t,2F |
CMPU t,arg,g |
BN t,Error |
2H PUSHJ res,MemFind |
1H SRU t,f,32 |
PUT rX,t |
PUT rM,x |
PUT rE,x |
0H GREG #30000 |
AND t,aa,0B |
ORL t,U_BIT<<8 enable underflow trip |
PUT rA,t |
0H GREG Done |
PUT rW,0B |
RESUME |
|
MulU MULU x,y,z |
GET t,rH |
STOU t,g,8*rH |
JMP XDone |
|
Div DIV x,y,z |
JMP 1F |
DivU PUT rD,x |
DIVU x,y,z |
1H GET t,rR |
STO t,g,8*rR |
JMP XDone |
|
Cswap LDOU z,g,8*rP |
LDOU y,res,0 |
CMPU t,y,z |
BNZ t,1F |
STOU x,res,0 |
JMP 2F |
1H STOU y,g,8*rP |
2H ZSZ x,t,1 |
JMP XDone |
|
BTaken ADDU cc,cc,4 |
PBTaken SUBU cc,cc,2 |
SET inst_ptr,yz |
JMP Update |
|
Go SET x,inst_ptr |
ADDU inst_ptr,y,z |
JMP XDone |
|
PushGo ADDU yz,y,z |
PushJ SET inst_ptr,yz |
CMPU t,xxx,gg |
PBN t,1F |
SET xxx,ll |
SRU xx,xxx,3 |
INCL ll,8 |
PUSHJ 0,StackRoom |
1H ADDU t,xxx,oo |
AND t,t,lring_mask |
STOU xx,l,t |
ADDU t,loc,4 |
STOU t,g,8*rJ |
INCL xxx,8 |
SUBU ll,ll,xxx |
ADDU oo,oo,xxx |
JMP Update |
|
Pop SUBU oo,oo,8 |
BZ xx,1F |
CMPU t,ll,xxx |
BN t,1F |
ADDU t,xxx,oo |
AND t,t,lring_mask |
LDOU y,l,t |
1H CMPU t,oo,ss |
PBNN t,1F |
PUSHJ 0,StackLoad |
1H AND t,oo,lring_mask |
LDOU z,l,t |
AND z,z,#ff |
SLU z,z,3 |
1H SUBU t,oo,ss |
CMPU t,t,z |
PBNN t,1F |
PUSHJ 0,StackLoad actually gamma=beta possible here! |
JMP 1B |
1H ADDU ll,ll,8 |
CMPU t,xxx,ll |
CSN ll,t,xxx |
ADDU ll,ll,z |
CMPU t,gg,ll |
CSN ll,t,gg |
CMPU t,z,ll |
BNN t,1F |
AND t,oo,lring_mask |
STOU y,l,t |
1H LDOU y,g,8*rJ |
SUBU oo,oo,z |
4ADDU inst_ptr,yz,y |
JMP Update |
|
Save BNZ yz,Error \S102 |
CMPU t,xxx,gg |
BN t,Error |
ADDU t,oo,ll |
AND t,t,lring_mask |
SRU y,ll,3 |
STOU y,l,t |
INCL ll,8 |
PUSHJ 0,StackRoom |
ADDU oo,oo,ll |
SET ll,0 |
1H PUSHJ 0,StackStore |
CMPU t,ss,oo |
PBNZ t,1B |
SUBU y,gg,8 |
4H ADDU y,y,8 |
1H SET arg,ss \S103 |
PUSHJ res,MemFind |
CMPU t,y,8*(rZ+1) |
LDOU z,g,y |
PBNZ t,2F |
SLU z,gg,56-3 |
ADDU z,z,aa |
2H STOU z,res,0 |
INCL ss,8 |
BNZ t,1F |
CMPU t,y,c255 |
BZ t,2F |
CMPU t,y,8*rR |
PBNZ t,4B |
SET y,8*rP |
JMP 1B |
2H SET y,8*rB |
JMP 1B |
1H SET oo,ss |
SUBU x,oo,8 |
JMP XDone |
|
Unsave BNZ xx,Error \S104 |
BNZ yy,Error |
ANDNL z,#7 |
ADDU ss,z,8 |
SET y,8*(rZ+2) |
1H SUBU y,y,8 |
4H SUBU ss,ss,8 \S105 |
SET arg,ss |
PUSHJ res,MemFind |
LDOU x,res,0 |
CMPU t,y,8*(rZ+1) |
PBNZ t,2F |
SRU gg,x,56-3 |
SLU aa,x,64-18 |
SRU aa,aa,64-18 |
JMP 1B |
2H STOU x,g,y |
3H CMPU t,y,8*rP |
CSZ y,t,8*(rR+1) |
CSZ y,y,c256 |
CMPU t,y,gg |
PBNZ t,1B |
PUSHJ 0,StackLoad |
AND t,ss,lring_mask |
LDOU x,l,t |
AND x,x,#ff |
BZ x,1F |
SET y,x |
2H PUSHJ 0,StackLoad |
SUBU y,y,1 |
PBNZ y,2B |
SLU x,x,3 |
1H SET ll,x |
CMPU t,gg,x |
CSN ll,t,gg |
SET oo,ss |
PBNZ uu,Update |
BZ resuming,Update |
JMP AllDone |
|
Get CMPU t,yz,32 |
BNN t,Error |
STOU ii,g,8*rI |
STOU cc,g,8*rC |
STOU oo,g,8*rO |
STOU ss,g,8*rS |
STOU uu,g,8*rU |
STOU aa,g,8*rA |
SR t,ll,3 |
STOU t,g,8*rL |
SR t,gg,3 |
STOU t,g,8*rG |
SLU t,zz,3 |
LDOU x,g,t |
JMP XDone |
|
Put BNZ yy,Error |
CMPU t,xx,32 |
BNN t,Error |
CMPU t,xx,rC |
BN t,PutOK |
CMPU t,xx,rF |
BN t,1F |
PutOK STOU z,g,xxx |
JMP Update |
1H CMPU t,xx,rG |
BN t,Error |
SUB t,xx,rL |
PBP t,PutA |
BN t,PutG |
PutL SLU z,z,3 \S98, PUT rL |
CMPU t,z,ll |
CSN ll,t,z |
JMP Update |
0H GREG #40000 |
PutA CMPU t,z,0B \S100, PUT rA |
BNN t,Error |
SET aa,z |
JMP Update |
PutG SRU t,z,8 |
BNZ t,Error |
CMPU t,z,32 |
BN t,Error |
SLU z,z,3 |
CMPU t,z,ll |
BN t,Error |
JMP 2F |
1H SUBU gg,gg,8 |
STCO 0,g,gg |
2H CMPU t,z,gg |
PBN t,1B |
SET gg,z |
JMP Update |
|
Resume SLU t,inst,40 \S125 |
BNZ t,Error |
LDOU inst_ptr,g,8*rW |
LDOU x,g,8*rX |
BN x,Update |
SRU xx,x,56 |
SUBU t,xx,2 |
BNN t,1F |
PBZ xx,2F |
SRU y,x,28 rop=1 (RESUME_CONT) |
AND y,y,#f |
SET z,1 |
SLU z,z,y |
ANDNL z,#70cf |
BNZ z,Error |
1H BP t,Error |
SRU t,x,13 |
AND t,t,c255 |
CMPU y,t,ll |
BN y,2F |
CMPU y,t,gg |
BN y,Error |
2H MOR t,x,#8 |
CMPU t,t,#F9 RESUME |
BZ t,Error |
NEG resuming,xx |
CSNN resuming,resuming,1 |
JMP Update |
|
Sync BNZ xx,Error |
CMPU t,yz,4 |
BNN t,Error |
JMP Update |
|
Trip SET xx,0 |
JMP TakeTrip |
|
Trap STOU inst_ptr,g,8*rWW |
0H GREG #8000000000000000 |
ADDU t,inst,0B |
STOU t,g,8*rXX |
STOU y,g,8*rYY |
STOU z,g,8*rZZ |
SRU y,inst,6 |
CMPU t,y,4*11 |
BNN t,Error |
LDOU t,g,c255 |
0H GREG @+4 |
GO y,0B,y |
JMP SimHalt |
JMP SimFopen |
JMP SimFclose |
JMP SimFread |
JMP SimFgets |
JMP SimFgetws |
JMP SimFwrite |
JMP SimFputs |
JMP SimFputws |
JMP SimFseek |
JMP SimFtell |
|
:GetArgs GET $0,rJ |
SET y,t |
SET arg,t |
PUSHJ res,MemFind |
LDOU z,res,0 z = virtual address of buffer |
SET arg,z |
PUSHJ res,MemFind |
SET x,res x = physical address of buffer |
STO x,IOArgs |
SET xx,Mem:Chunk |
AND zz,x,Mem:mask |
SUB xx,xx,zz xx = bytes from x to chunk end |
ADDU arg,y,8 |
PUSHJ res,MemFind |
LDOU zz,res,0 zz = size of buffer |
STOU zz,IOArgs+8 |
PUT rJ,$0 |
POP |
|
GREG @ |
:SimInst LDA t,IOArgs |
JMP DoInst |
SimFinish LDA t,IOArgs |
SimFclose GETA $0,TrapDone |
:DoInst PUT rW,$0 |
PUT rX,inst |
RESUME |
|
SimFopen PUSHJ 0,GetArgs |
ADDU xx,Mem:alloc,Mem:nodesize |
STOU xx,IOArgs % we'll copy the file name here |
SET x,xx |
1H SET arg,z |
PUSHJ res,MemFind |
LDBU t,res,0 |
STBU t,x,0 |
INCL x,1 |
INCL z,1 |
PBNZ t,1B |
GO $0,SimInst |
3H STCO 0,x,0 % clean up the copied string |
CMPU z,xx,x |
SUB x,x,8 |
PBN z,3B |
JMP TrapDone |
|
TrapDone STO t,g,8*rBB "RESUME 1" works this way |
STO t,g,c255 |
JMP Update |
|
SimFread PUSHJ 0,GetArgs |
SET y,zz number of bytes to read |
1H CMP t,xx,y |
PBNN t,SimFinish |
STO xx,IOArgs+8 oops, we must cross chunk bdry |
SUB y,y,xx |
GO $0,SimInst |
BN t,1F |
ADD z,z,xx |
SET arg,z |
PUSHJ res,MemFind |
STOU res,IOArgs |
STO y,IOArgs+8 |
ADD xx,Mem:mask,1 |
JMP 1B |
1H SUB t,t,y |
JMP TrapDone |
|
SimFgets PUSHJ 0,GetArgs |
CMP t,xx,zz |
PBNN t,SimFinish easy if all in one chunk |
SET y,zz remaining buf size |
SET yy,0 bytes successfully read so far |
1H ADD t,xx,1 |
STO t,IOArgs+8 null character spills off end |
GO $0,SimInst |
BN t,TrapDone |
ADD yy,yy,t |
CMP $0,t,xx |
SET t,yy |
PBNZ $0,TrapDone |
ADDU z,z,xx |
SET arg,z |
PUSHJ res,MemFind |
SUBU x,x,1 |
LDBU t,x,xx look at last byte read |
CMP t,t,#0a is it newline? |
BZ t,1F |
SUB y,y,xx |
SET x,res |
STOU x,IOArgs |
STO y,IOArgs+8 |
ADD xx,Mem:mask,1 |
CMP t,xx,y |
BN t,1B |
GO $0,SimInst |
BN t,TrapDone |
2H ADD t,yy,t |
JMP TrapDone |
1H SET t,0 |
STBU t,res,0 |
JMP 2B |
|
SimFgetws PUSHJ 0,GetArgs |
ADD y,zz,zz remaining buf size (bytes) |
CMP t,xx,y |
PBNN t,SimFinish easy if all in one chunk |
SET yy,0 wydes successfully read so far |
1H ADD zz,xx,3 |
SR zz,zz,1 wydes in current chunk, plus 1 |
STO zz,IOArgs+8 null character spills off end |
GO $0,SimInst |
BN t,TrapDone |
ADDU yy,yy,t |
SUB zz,zz,1 |
CMP $0,t,zz |
SET t,yy |
PBNZ $0,TrapDone |
ADD z,z,xx |
SET arg,z |
PUSHJ res,MemFind |
SUBU x,x,2 |
LDWU t,x,xx look at last wyde read |
CMP t,t,#0a is it newline? |
BZ t,1F |
SUB y,y,xx |
SET x,res |
STOU x,IOArgs |
SR t,y,1 |
STO t,IOArgs+8 |
ADD xx,Mem:mask,1 |
ANDN y,y,1 |
CMP t,xx,y |
BN t,1B |
GO $0,SimInst |
BN t,TrapDone |
2H ADD t,yy,t |
JMP TrapDone |
1H SET t,0 |
STWU t,res,0 |
JMP 2B |
|
SimFwrite IS SimFread yes it works! |
|
SimFputs SET xx,0 this many bytes written |
SET z,t virtual address of string |
1H SET arg,z |
PUSHJ res,MemFind |
SET t,res physical address of string |
GO $0,DoInst |
BN t,TrapDone |
BZ t,1F |
ADD xx,xx,t |
ADDU z,z,t |
AND t,z,Mem:mask |
BZ t,1B |
1H SET t,xx |
JMP TrapDone |
|
SimFputws SET xx,0 this many wydes written |
SET z,t virtual address of string |
1H SET arg,z |
PUSHJ res,MemFind |
SET t,res physical address of string |
GO $0,DoInst |
BN t,TrapDone |
BZ t,1F |
ADD xx,xx,t |
2ADDU z,t,z |
AND t,z,Mem:mask |
BZ t,1B |
1H SET t,xx |
JMP TrapDone |
|
SimFseek IS SimFclose |
SimFtell IS SimFclose |
|
GREG @ |
1H BYTE "Warning: ",0 |
2H BYTE " at location ",0 |
3H BYTE #a,0 |
T0 BYTE "TRIP",0 |
T1 BYTE "integer divide check",0 |
T2 BYTE "integer overflow",0 |
T3 BYTE "float-to-fix overflow",0 |
T4 BYTE "invalid floating point operation",0 |
T5 BYTE "floating point overflow",0 |
T6 BYTE "floating point underflow",0 |
T7 BYTE "floating point division by zero",0 |
T8 BYTE "floating point inexact",0 |
TripType OCTA T0,T1,T2,T3,T4,T5,T6,T7,T8 |
SimHalt CMP t,zz,1 |
BZ inst,Exit t=0 on normal exit |
BNZ t,Error |
CMPU t,loc,#90 |
BNN t,Error Halt 1 from loc<#90 gives warning |
LDA t,1B |
TRAP 0,Fputs,StdErr |
SR x,loc,1 |
LDA t,TripType |
LDOU t,t,x |
TRAP 0,Fputs,StdErr |
LDA t,2B |
TRAP 0,Fputs,StdErr |
LDOU x,g,8*rW |
SUBU x,x,4 |
SRU arg,x,32 |
PUSHJ res,OutTetra |
SET arg,x |
PUSHJ res,OutTetra |
LDA t,3B |
TRAP 0,Fputs,StdErr |
LDOU t,g,c255 |
JMP TrapDone |
|
Error NEG t,22 catch-22 |
Exit TRAP 0,Halt,0 |
|
s IS $1 |
0H GREG #0008000400020001 |
:OutTetra MOR t,$0,0B |
SLU s,t,4 |
XOR t,s,t |
0H GREG #0f0f0f0f0f0f0f0f |
AND t,t,0B |
0H GREG #0606060606060606 |
ADDU t,t,0B |
0H GREG #0000002700000000 |
MOR s,0B,t |
0H GREG #2a2a2a2a2a2a2a2a |
ADDU t,t,0B |
ADDU s,t,s |
STOU s,g,c255 |
GETA t,OctaArgs |
TRAP 0,Fwrite,StdErr |
POP 0 |
|
O IS Done-4 |
LOC Info |
JMP Trap+@-O; BYTE 0,5,0,#0a TRAP |
FCMP x,y,z; BYTE 0,1,0,#2a FCMP |
FUN x,y,z; BYTE 0,1,0,#2a FUN |
FEQL x,y,z; BYTE 0,1,0,#2a FEQL |
FADD x,y,z; BYTE 0,4,0,#2a FADD |
FIX x,0,z; BYTE 0,4,0,#26 FIX |
FSUB x,y,z; BYTE 0,4,0,#2a FSUB |
FIXU x,0,z; BYTE 0,4,0,#26 FIXU |
FLOT x,0,z; BYTE 0,4,0,#26 FLOT |
FLOT x,0,z; BYTE 0,4,0,#25 FLOTI |
FLOTU x,0,z; BYTE 0,4,0,#26 FLOTU |
FLOTU x,0,z; BYTE 0,4,0,#25 FLOTUI |
SFLOT x,0,z; BYTE 0,4,0,#26 SFLOT |
SFLOT x,0,z; BYTE 0,4,0,#25 SFLOTI |
SFLOTU x,0,z; BYTE 0,4,0,#26 SFLOTU |
SFLOTU x,0,z; BYTE 0,4,0,#25 SFLOTUI |
FMUL x,y,z; BYTE 0,4,0,#2a FMUL |
FCMPE x,y,z; BYTE 0,4,rE,#2a FCMPE |
FUNE x,y,z; BYTE 0,1,rE,#2a FUNE |
FEQLE x,y,z; BYTE 0,4,rE,#2a FEQLE |
FDIV x,y,z; BYTE 0,40,0,#2a FDIV |
FSQRT x,0,z; BYTE 0,40,0,#26 FSQRT |
FREM x,y,z; BYTE 0,4,0,#2a FREM |
FINT x,0,z; BYTE 0,4,0,#26 FINT |
MUL x,y,z; BYTE 0,10,0,#2a MUL |
MUL x,y,z; BYTE 0,10,0,#29 MULI |
JMP MulU+@-O; BYTE 0,10,0,#2a MULU |
JMP MulU+@-O; BYTE 0,10,0,#29 MULUI |
JMP Div+@-O; BYTE 0,60,0,#2a DIV |
JMP Div+@-O; BYTE 0,60,0,#29 DIVI |
JMP DivU+@-O; BYTE 0,60,rD,#2a DIVU |
JMP DivU+@-O; BYTE 0,60,rD,#29 DIVUI |
ADD x,y,z; BYTE 0,1,0,#2a ADD |
ADD x,y,z; BYTE 0,1,0,#29 ADDI |
ADDU x,y,z; BYTE 0,1,0,#2a ADDU |
ADDU x,y,z; BYTE 0,1,0,#29 ADDUI |
SUB x,y,z; BYTE 0,1,0,#2a SUB |
SUB x,y,z; BYTE 0,1,0,#29 SUBI |
SUBU x,y,z; BYTE 0,1,0,#2a SUBU |
SUBU x,y,z; BYTE 0,1,0,#29 SUBUI |
2ADDU x,y,z; BYTE 0,1,0,#2a 2ADDU |
2ADDU x,y,z; BYTE 0,1,0,#29 2ADDUI |
4ADDU x,y,z; BYTE 0,1,0,#2a 4ADDU |
4ADDU x,y,z; BYTE 0,1,0,#29 4ADDUI |
8ADDU x,y,z; BYTE 0,1,0,#2a 8ADDU |
8ADDU x,y,z; BYTE 0,1,0,#29 8ADDUI |
16ADDU x,y,z; BYTE 0,1,0,#2a 16ADDU |
16ADDU x,y,z; BYTE 0,1,0,#29 16ADDUI |
CMP x,y,z; BYTE 0,1,0,#2a CMP |
CMP x,y,z; BYTE 0,1,0,#29 CMPI |
CMPU x,y,z; BYTE 0,1,0,#2a CMPU |
CMPU x,y,z; BYTE 0,1,0,#29 CMPUI |
NEG x,0,z; BYTE 0,1,0,#26 NEG |
NEG x,0,z; BYTE 0,1,0,#25 NEGI |
NEGU x,0,z; BYTE 0,1,0,#26 NEGU |
NEGU x,0,z; BYTE 0,1,0,#25 NEGUI |
SL x,y,z; BYTE 0,1,0,#2a SL |
SL x,y,z; BYTE 0,1,0,#29 SLI |
SLU x,y,z; BYTE 0,1,0,#2a SLU |
SLU x,y,z; BYTE 0,1,0,#29 SLUI |
SR x,y,z; BYTE 0,1,0,#2a SR |
SR x,y,z; BYTE 0,1,0,#29 SRI |
SRU x,y,z; BYTE 0,1,0,#2a SRU |
SRU x,y,z; BYTE 0,1,0,#29 SRUI |
BN x,BTaken+@-O; BYTE 0,1,0,#50 BN |
BN x,BTaken+@-O; BYTE 0,1,0,#50 BNB |
BZ x,BTaken+@-O; BYTE 0,1,0,#50 BZ |
BZ x,BTaken+@-O; BYTE 0,1,0,#50 BZB |
BP x,BTaken+@-O; BYTE 0,1,0,#50 BP |
BP x,BTaken+@-O; BYTE 0,1,0,#50 BPB |
BOD x,BTaken+@-O; BYTE 0,1,0,#50 BOD |
BOD x,BTaken+@-O; BYTE 0,1,0,#50 BODB |
BNN x,BTaken+@-O; BYTE 0,1,0,#50 BNN |
BNN x,BTaken+@-O; BYTE 0,1,0,#50 BNNB |
BNZ x,BTaken+@-O; BYTE 0,1,0,#50 BNZ |
BNZ x,BTaken+@-O; BYTE 0,1,0,#50 BNZB |
BNP x,BTaken+@-O; BYTE 0,1,0,#50 BNP |
BNP x,BTaken+@-O; BYTE 0,1,0,#50 BNPB |
BEV x,BTaken+@-O; BYTE 0,1,0,#50 BEV |
BEV x,BTaken+@-O; BYTE 0,1,0,#50 BEVB |
PBN x,PBTaken+@-O; BYTE 0,3,0,#50 PBN |
PBN x,PBTaken+@-O; BYTE 0,3,0,#50 PBNB |
PBZ x,PBTaken+@-O; BYTE 0,3,0,#50 PBZ |
PBZ x,PBTaken+@-O; BYTE 0,3,0,#50 PBZB |
PBP x,PBTaken+@-O; BYTE 0,3,0,#50 PBP |
PBP x,PBTaken+@-O; BYTE 0,3,0,#50 PBPB |
PBOD x,PBTaken+@-O; BYTE 0,3,0,#50 PBOD |
PBOD x,PBTaken+@-O; BYTE 0,3,0,#50 PBODB |
PBNN x,PBTaken+@-O; BYTE 0,3,0,#50 PBNN |
PBNN x,PBTaken+@-O; BYTE 0,3,0,#50 PBNNB |
PBNZ x,PBTaken+@-O; BYTE 0,3,0,#50 PBNZ |
PBNZ x,PBTaken+@-O; BYTE 0,3,0,#50 PBNZB |
PBNP x,PBTaken+@-O; BYTE 0,3,0,#50 PBNP |
PBNP x,PBTaken+@-O; BYTE 0,3,0,#50 PBNPB |
PBEV x,PBTaken+@-O; BYTE 0,3,0,#50 PBEV |
PBEV x,PBTaken+@-O; BYTE 0,3,0,#50 PBEVB |
CSN x,y,z; BYTE 0,1,0,#3a CSN |
CSN x,y,z; BYTE 0,1,0,#39 CSNI |
CSZ x,y,z; BYTE 0,1,0,#3a CSZ |
CSZ x,y,z; BYTE 0,1,0,#39 CSZI |
CSP x,y,z; BYTE 0,1,0,#3a CSP |
CSP x,y,z; BYTE 0,1,0,#39 CSPI |
CSOD x,y,z; BYTE 0,1,0,#3a CSOD |
CSOD x,y,z; BYTE 0,1,0,#39 CSODI |
CSNN x,y,z; BYTE 0,1,0,#3a CSNN |
CSNN x,y,z; BYTE 0,1,0,#39 CSNNI |
CSNZ x,y,z; BYTE 0,1,0,#3a CSNZ |
CSNZ x,y,z; BYTE 0,1,0,#39 CSNZI |
CSNP x,y,z; BYTE 0,1,0,#3a CSNP |
CSNP x,y,z; BYTE 0,1,0,#39 CSNPI |
CSEV x,y,z; BYTE 0,1,0,#3a CSEV |
CSEV x,y,z; BYTE 0,1,0,#39 CSEVI |
ZSN x,y,z; BYTE 0,1,0,#2a ZSN |
ZSN x,y,z; BYTE 0,1,0,#29 ZSNI |
ZSZ x,y,z; BYTE 0,1,0,#2a ZSZ |
ZSZ x,y,z; BYTE 0,1,0,#29 ZSZI |
ZSP x,y,z; BYTE 0,1,0,#2a ZSP |
ZSP x,y,z; BYTE 0,1,0,#29 ZSPI |
ZSOD x,y,z; BYTE 0,1,0,#2a ZSOD |
ZSOD x,y,z; BYTE 0,1,0,#29 ZSODI |
ZSNN x,y,z; BYTE 0,1,0,#2a ZSNN |
ZSNN x,y,z; BYTE 0,1,0,#29 ZSNNI |
ZSNZ x,y,z; BYTE 0,1,0,#2a ZSNZ |
ZSNZ x,y,z; BYTE 0,1,0,#29 ZSNZI |
ZSNP x,y,z; BYTE 0,1,0,#2a ZSNP |
ZSNP x,y,z; BYTE 0,1,0,#29 ZSNPI |
ZSEV x,y,z; BYTE 0,1,0,#2a ZSEV |
ZSEV x,y,z; BYTE 0,1,0,#29 ZSEVI |
LDB x,res,0; BYTE 1,1,0,#aa LDB |
LDB x,res,0; BYTE 1,1,0,#a9 LDBI |
LDBU x,res,0; BYTE 1,1,0,#aa LDBU |
LDBU x,res,0; BYTE 1,1,0,#a9 LDBUI |
LDW x,res,0; BYTE 1,1,0,#aa LDW |
LDW x,res,0; BYTE 1,1,0,#a9 LDWI |
LDWU x,res,0; BYTE 1,1,0,#aa LDWU |
LDWU x,res,0; BYTE 1,1,0,#a9 LDWUI |
LDT x,res,0; BYTE 1,1,0,#aa LDT |
LDT x,res,0; BYTE 1,1,0,#a9 LDTI |
LDTU x,res,0; BYTE 1,1,0,#aa LDTU |
LDTU x,res,0; BYTE 1,1,0,#a9 LDTUI |
LDO x,res,0; BYTE 1,1,0,#aa LDO |
LDO x,res,0; BYTE 1,1,0,#a9 LDOI |
LDOU x,res,0; BYTE 1,1,0,#aa LDOU |
LDOU x,res,0; BYTE 1,1,0,#a9 LDOUI |
LDSF x,res,0; BYTE 1,1,0,#aa LDSF |
LDSF x,res,0; BYTE 1,1,0,#a9 LDSFI |
LDHT x,res,0; BYTE 1,1,0,#aa LDHT |
LDHT x,res,0; BYTE 1,1,0,#a9 LDHTI |
JMP Cswap+@-O; BYTE 2,2,0,#ba CSWAP |
JMP Cswap+@-O; BYTE 2,2,0,#b9 CSWAPI |
LDUNC x,res,0; BYTE 1,1,0,#aa LDUNC |
LDUNC x,res,0; BYTE 1,1,0,#a9 LDUNCI |
JMP Error+@-O; BYTE 0,1,0,#2a LDVTS |
JMP Error+@-O; BYTE 0,1,0,#29 LDVTSI |
SWYM 0; BYTE 0,1,0,#0a PRELD |
SWYM 0; BYTE 0,1,0,#09 PRELDI |
SWYM 0; BYTE 0,1,0,#0a PREGO |
SWYM 0; BYTE 0,1,0,#09 PREGOI |
JMP Go+@-O; BYTE 0,3,0,#2a GO |
JMP Go+@-O; BYTE 0,3,0,#29 GOI |
STB x,res,0; BYTE 1,1,0,#9a STB |
STB x,res,0; BYTE 1,1,0,#99 STBI |
STBU x,res,0; BYTE 1,1,0,#9a STBU |
STBU x,res,0; BYTE 1,1,0,#99 STBUI |
STW x,res,0; BYTE 1,1,0,#9a STW |
STW x,res,0; BYTE 1,1,0,#99 STWI |
STWU x,res,0; BYTE 1,1,0,#9a STWU |
STWU x,res,0; BYTE 1,1,0,#99 STWUI |
STT x,res,0; BYTE 1,1,0,#9a STT |
STT x,res,0; BYTE 1,1,0,#99 STTI |
STTU x,res,0; BYTE 1,1,0,#9a STTU |
STTU x,res,0; BYTE 1,1,0,#99 STTUI |
STO x,res,0; BYTE 1,1,0,#9a STO |
STO x,res,0; BYTE 1,1,0,#99 STOI |
STOU x,res,0; BYTE 1,1,0,#9a STOU |
STOU x,res,0; BYTE 1,1,0,#99 STOUI |
STSF x,res,0; BYTE 1,1,0,#9a STSF |
STSF x,res,0; BYTE 1,1,0,#99 STSFI |
STHT x,res,0; BYTE 1,1,0,#9a STHT |
STHT x,res,0; BYTE 1,1,0,#99 STHTI |
STO xx,res,0; BYTE 1,1,0,#8a STCO |
STO xx,res,0; BYTE 1,1,0,#89 STCOI |
STUNC x,res,0; BYTE 1,1,0,#9a STUNC |
STUNC x,res,0; BYTE 1,1,0,#99 STUNCI |
SWYM 0; BYTE 0,1,0,#0a SYNCD |
SWYM 0; BYTE 0,1,0,#09 SYNCDI |
SWYM 0; BYTE 0,1,0,#0a PREST |
SWYM 0; BYTE 0,1,0,#09 PRESTI |
SWYM 0; BYTE 0,1,0,#0a SYNCID |
SWYM 0; BYTE 0,1,0,#09 SYNCIDI |
JMP PushGo+@-O; BYTE 0,3,0,#2a PUSHGO |
JMP PushGo+@-O; BYTE 0,3,0,#29 PUSHGOI |
OR x,y,z; BYTE 0,1,0,#2a OR |
OR x,y,z; BYTE 0,1,0,#29 ORI |
ORN x,y,z; BYTE 0,1,0,#2a ORN |
ORN x,y,z; BYTE 0,1,0,#29 ORNI |
NOR x,y,z; BYTE 0,1,0,#2a NOR |
NOR x,y,z; BYTE 0,1,0,#29 NORI |
XOR x,y,z; BYTE 0,1,0,#2a XOR |
XOR x,y,z; BYTE 0,1,0,#29 XORI |
AND x,y,z; BYTE 0,1,0,#2a AND |
AND x,y,z; BYTE 0,1,0,#29 ANDI |
ANDN x,y,z; BYTE 0,1,0,#2a ANDN |
ANDN x,y,z; BYTE 0,1,0,#29 ANDNI |
NAND x,y,z; BYTE 0,1,0,#2a NAND |
NAND x,y,z; BYTE 0,1,0,#29 NANDI |
NXOR x,y,z; BYTE 0,1,0,#2a NXOR |
NXOR x,y,z; BYTE 0,1,0,#29 NXORI |
BDIF x,y,z; BYTE 0,1,0,#2a BDIF |
BDIF x,y,z; BYTE 0,1,0,#29 BDIFI |
WDIF x,y,z; BYTE 0,1,0,#2a WDIF |
WDIF x,y,z; BYTE 0,1,0,#29 WDIFI |
TDIF x,y,z; BYTE 0,1,0,#2a TDIF |
TDIF x,y,z; BYTE 0,1,0,#29 TDIFI |
ODIF x,y,z; BYTE 0,1,0,#2a ODIF |
ODIF x,y,z; BYTE 0,1,0,#29 ODIFI |
MUX x,y,z; BYTE 0,1,rM,#2a MUX |
MUX x,y,z; BYTE 0,1,rM,#29 MUXI |
SADD x,y,z; BYTE 0,1,0,#2a SADD |
SADD x,y,z; BYTE 0,1,0,#29 SADDI |
MOR x,y,z; BYTE 0,1,0,#2a MOR |
MOR x,y,z; BYTE 0,1,0,#29 MORI |
MXOR x,y,z; BYTE 0,1,0,#2a MXOR |
MXOR x,y,z; BYTE 0,1,0,#29 MXORI |
SET x,z; BYTE 0,1,0,#20 SETH |
SET x,z; BYTE 0,1,0,#20 SETMH |
SET x,z; BYTE 0,1,0,#20 SETML |
SET x,z; BYTE 0,1,0,#20 SETL |
ADDU x,x,z; BYTE 0,1,0,#30 INCH |
ADDU x,x,z; BYTE 0,1,0,#30 INCMH |
ADDU x,x,z; BYTE 0,1,0,#30 INCML |
ADDU x,x,z; BYTE 0,1,0,#30 INCL |
OR x,x,z; BYTE 0,1,0,#30 ORH |
OR x,x,z; BYTE 0,1,0,#30 ORMH |
OR x,x,z; BYTE 0,1,0,#30 ORML |
OR x,x,z; BYTE 0,1,0,#30 ORL |
ANDN x,x,z; BYTE 0,1,0,#30 ANDNH |
ANDN x,x,z; BYTE 0,1,0,#30 ANDNMH |
ANDN x,x,z; BYTE 0,1,0,#30 ANDNML |
ANDN x,x,z; BYTE 0,1,0,#30 ANDNL |
SET inst_ptr,yz; BYTE 0,1,0,#41 JMP |
SET inst_ptr,yz; BYTE 0,1,0,#41 JMPB |
JMP PushJ+@-O; BYTE 0,1,0,#60 PUSHJ |
JMP PushJ+@-O; BYTE 0,1,0,#60 PUSHJB |
SET x,yz; BYTE 0,1,0,#60 GETA |
SET x,yz; BYTE 0,1,0,#60 GETAB |
JMP Put+@-O; BYTE 0,1,0,#02 PUT |
JMP Put+@-O; BYTE 0,1,0,#01 PUTI |
JMP Pop+@-O; BYTE 0,3,rJ,#00 POP |
JMP Resume+@-O; BYTE 0,5,0,#00 RESUME |
JMP Save+@-O; BYTE 20,1,0,#20 SAVE |
JMP Unsave+@-O; BYTE 20,1,0,#02 UNSAVE |
JMP Sync+@-O; BYTE 0,1,0,#01 SYNC |
SWYM x,y,z; BYTE 0,1,0,#00 SWYM |
JMP Get+@-O; BYTE 0,1,0,#20 GET |
JMP Trip+@-O; BYTE 0,5,0,#0a TRIP |
|
Done AND t,f,X_is_dest_bit % doubly defined but OK |
BZ t,1F |
XDone STOU x,xptr,0 |
1H GET t,rA |
AND t,t,#ff |
OR exc,exc,t |
AND t,exc,U_BIT+X_BIT Check for trip, \S123 |
CMPU t,t,U_BIT |
PBNZ t,1F branch unless underflow is exact |
0H GREG U_BIT<<8 |
AND t,aa,0B |
BNZ t,1F branch if underflow is enabled |
ANDNL exc,U_BIT ignore U if exact and not enabled |
1H PBZ exc,Update |
SRU t,aa,8 |
AND t,t,exc |
PBZ t,4F |
SET xx,0 Initiate a trip, \S124 |
SLU t,t,55 |
2H INCL xx,1 |
SLU t,t,1 |
PBNN t,2B |
SET t,#100 |
SRU t,t,xx |
ANDN exc,exc,t |
TakeTrip STOU inst_ptr,g,8*rW |
SLU inst_ptr,xx,4 |
INCH inst,#8000 |
STOU inst,g,8*rX |
AND t,f,Mem_bit |
PBZ t,1F |
ADDU y,y,z |
SET z,x |
1H STOU y,g,8*rY |
STOU z,g,8*rZ |
LDOU t,g,c255 |
STOU t,g,8*rB |
LDOU t,g,8*rJ |
STOU t,g,c255 |
4H OR aa,aa,exc |
0H GREG #0000000800000004 Update the clocks, \S128 |
Update MOR t,f,0B $2^{32}$mems + oops |
ADDU cc,cc,t |
ADDU uu,uu,1 |
SUBU ii,ii,1 |
AllDone PBZ resuming,Fetch |
CMPU t,op,#F9 RESUME |
CSNZ resuming,t,0 |
JMP Fetch |
|
OctaArgs OCTA Global+8*255,8 |
Infile IS 3 |
Main LDA Mem:head,Chunk0 |
ADDU Mem:alloc,Mem:head,Mem:nodesize |
GET t,rN |
INCL t,1 |
STOU t,g,8*rN |
LDOU t,$1,8 argv[1] |
STOU t,IOArgs |
LDA t,IOArgs |
TRAP 0,Fopen,Infile |
BN t,Error |
1H GETA t,OctaArgs |
TRAP 0,Fread,Infile |
BN t,9F |
LDOU loc,g,c255 |
2H GETA t,OctaArgs |
TRAP 0,Fread,Infile |
LDOU x,g,c255 |
BN t,Error |
SET arg,loc |
BZ x,1B |
PUSHJ res,MemFind |
STOU x,res,0 |
INCL loc,8 |
JMP 2B |
9H TRAP 0,Fclose,Infile |
SUBU loc,loc,8 |
STOU loc,g,c255 place to UNSAVE |
SUBU arg,loc,8*13 |
PUSHJ res,MemFind |
LDOU inst_ptr,res,0 Main |
SET arg,#90 Get ready to UNSAVE, \S162 |
PUSHJ res,MemFind |
LDTU x,res,0 |
SET resuming,1 RESUME_AGAIN |
CSNZ inst_ptr,x,#90 |
0H GREG #FB<<24+255 UNSAVE $255 |
STOU 0B,g,8*rX |
SET gg,c255 |
JMP Fetch |
|
LOC Global+8*rK; OCTA -1 |
LOC Global+8*rT; OCTA #8000000500000000 |
LOC Global+8*rTT; OCTA #8000000600000000 |
LOC Global+8*rV; OCTA #369c200400000000 |
|
LOC U_Handler |
ORL exc,U_BIT |
JMP Done |
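
The MemFind routine near the start of the listing above keeps simulated memory in a linked list of fixed-size chunks, and moves the chunk it finds to the front of the list so that repeated references stay cheap. The C sketch below restates that idea; it is not a translation of the MMIX code. The names struct chunk, mem_find, and CHUNK_BYTES are invented for the illustration, calloc stands in for the listing's sequential allocator, and the listing's curkey shortcut and its rejection of negative addresses are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_BYTES 0x1000u            /* bytes per chunk, a power of 2 */

struct chunk {
  uint64_t key;                        /* address of the chunk's first byte */
  struct chunk *link;                  /* next chunk in the list */
  unsigned char data[CHUNK_BYTES + 8]; /* contents, padded with 8 zero bytes */
};

static struct chunk *head = NULL;      /* most recently used chunk */

/* Return a host pointer to the simulated byte at addr, creating a zeroed
   chunk on first use and moving the chunk found to the front of the list. */
unsigned char *mem_find(uint64_t addr)
{
  uint64_t key = addr & ~(uint64_t)(CHUNK_BYTES - 1);
  struct chunk *p = NULL, *q = head;
  while (q != NULL && q->key != key) {  /* linear search; p trails q */
    p = q;
    q = q->link;
  }
  if (q == NULL) {                      /* not found: make a fresh chunk */
    q = calloc(1, sizeof *q);
    assert(q != NULL);
    q->key = key;
  } else if (p != NULL) {               /* found past the front: unlink it */
    p->link = q->link;
  }
  if (q != head) {                      /* put the chunk at the front */
    q->link = head;
    head = q;
  }
  return q->data + (addr - key);
}
```

Memory that has never been written reads as zero, because every new chunk is cleared when it is created.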
/permu-plain.mms
0,0 → 1,40
* Permutation generator a la plain-changes (mockup only) |
t IS $255 |
a GREG 0 |
p GREG 0 |
c GREG 0 |
fmask GREG #f |
magic GREG #8844221188442211 |
ffmask GREG #ff000000 |
u IS $0 |
LOC #100 |
GREG @ |
T OCTA #194cb4594cb4594c,#b44,0 |
Main SET a,#1234 |
SLU a,a,12 [needed to make the MXOR stuff work] |
LDA p,T |
JMP 3F |
|
1H SRU u,a,12 (trace this) |
|
% SLU u,fmask,t |
% SLU t,a,4 |
% XOR t,t,a |
% AND t,t,u |
% SRU u,t,4 |
% OR t,t,u |
% XOR a,a,t |
SLU u,a,t |
MXOR u,magic,u |
AND u,u,ffmask |
SRU u,u,t |
XOR a,a,u |
|
SRU c,c,3 |
2H AND t,c,#1c |
PBNZ t,1B |
ADD p,p,8 |
3H LDO c,p,0 |
PBNZ c,2B |
TRAP 0,Halt,0 |
|
/mmix.mp
0,0 → 1,23
% illustrations for mmix.w |
|
beginfig(1) |
numeric r; r=.5in; % radius of circle |
numeric rr; rr=.9in; % radius of arc |
|
pickup pencircle scaled .6pt; |
draw (r,0){up}...(0,r){left}...(-r,0){down}...(0,-r){right}...cycle; |
pickup pencircle scaled .3pt; |
for a=-45, 30, 180: |
z[a]=(r+10pt,0) rotated (a+10); |
draw ((r-4pt,0)--(r+20pt,0)) rotated a; |
endfor |
label.rt(btex $\alpha$ etex,z[-45]-(2pt,6pt)); |
label.rt(btex $\beta$ etex,z[30]+(0,2pt)); |
label.lft(btex $\gamma$ etex,z[180]); |
|
drawdblarrow ((rr,0) rotated-45){dir 45}...((rr,0) rotated 30){dir 120}; |
label.rt(btex $L$ etex,(rr+8pt,0) rotated -10); |
endfig; |
|
bye. |
|
/mmix-mem.w
0,0 → 1,59
% This file is part of the MMIXware package (c) Donald E Knuth 1999 |
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES! |
|
\def\title{MMIX-MEM} |
\def\MMIX{\.{MMIX}} |
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant |
@s octa int |
|
@* Memory-mapped input and output. This module supplies procedures for reading |
@^I/O@> |
@^input/output@> |
@^memory-mapped input/output@> |
and writing \MMIX\ memory addresses that exceed 48 bits. Such addresses are |
used by the operating system for input and output, so they require special |
treatment. At present only dummy versions of these routines are implemented. |
Users who need nontrivial versions of |spec_read| and/or |spec_write| should |
prepare their own and link them with the rest of the simulator. |
|
@p |
#include <stdio.h> |
#include "mmix-pipe.h" /* header file for all modules */ |
extern octa read_hex(); /* found in the main program module */ |
static char buf[20]; |
|
@ If the |interactive_read_bit| of the |verbose| control is set, |
the user is supposed to supply values dynamically. Otherwise |
zero is read. |
|
@p |
octa spec_read @,@,@[ARGS((octa))@];@+@t}\6{@> |
octa spec_read(addr) |
octa addr; |
{ |
octa val; |
if (verbose&interactive_read_bit) { |
printf("** Read from loc %08x%08x: ",addr.h,addr.l); |
fgets(buf,20,stdin); |
val=read_hex(buf); |
} else val.l=val.h=0; |
if (verbose&show_spec_bit) |
printf(" (spec_read %08x%08x from %08x%08x at time %d)\n", |
val.h,val.l,addr.h,addr.l,ticks.l); |
return val; |
} |
|
@ The default |spec_write| just reports its arguments, without actually |
writing anything. |
|
@p |
void spec_write @,@,@[ARGS((octa,octa))@];@+@t}\6{@> |
void spec_write(addr,val) |
octa addr,val; |
{ |
if (verbose&show_spec_bit) |
printf(" (spec_write %08x%08x to %08x%08x at time %d)\n", |
val.h,val.l,addr.h,addr.l,ticks.l); |
} |
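
As one hypothetical shape for a nontrivial pair of routines, the sketch below remembers the last value written to each special address and returns it on a later read. It is an illustration only, not part of MMIXware: |octa_sketch| is a stand-in for the simulator's |octa| (assumed to have the tetrabyte fields |h| and |l| used above), and the names |sketch_spec_read|, |sketch_spec_write|, and |MAX_PORTS| are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for octa: high and low tetrabytes. */
typedef struct { uint32_t h, l; } octa_sketch;

#define MAX_PORTS 16                   /* the table is deliberately tiny */

static struct { octa_sketch addr, val; } ports[MAX_PORTS];
static int port_count = 0;

/* Remember the last value written to each special address. */
void sketch_spec_write(octa_sketch addr, octa_sketch val)
{
  int i;
  for (i = 0; i < port_count; i++)
    if (ports[i].addr.h == addr.h && ports[i].addr.l == addr.l) break;
  if (i == port_count) {               /* first write to this address */
    assert(port_count < MAX_PORTS);
    ports[port_count++].addr = addr;
  }
  ports[i].val = val;
}

/* Return the last value written to addr, or zero if there was none. */
octa_sketch sketch_spec_read(octa_sketch addr)
{
  octa_sketch zero = {0, 0};
  int i;
  for (i = 0; i < port_count; i++)
    if (ports[i].addr.h == addr.h && ports[i].addr.l == addr.l)
      return ports[i].val;
  return zero;
}
```

A real replacement would keep the names and argument conventions of |spec_read| and |spec_write| so that it links with the rest of the simulator.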
|
@* Index. |
/mmmix.w
0,0 → 1,548
% This file is part of the MMIXware package (c) Donald E Knuth 1999 |
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES! |
|
\def\title{MMMIX} |
\def\MMIX{\.{MMIX}} |
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant |
@s octa int |
@s tetra int |
@s bool int |
@s fetch int |
@s specnode int |
|
@* Introduction. |
This \.{CWEB} program simulates how the \MMIX\ computer might be |
implemented with a high-performance pipeline in many different configurations. |
All of the complexities of \MMIX's architecture are treated, except for |
multiprocessing and low-level details of memory-mapped input/output. |
|
The present program module, which contains the main routine for the |
\MMIX\ meta-simulator, is primarily devoted to administrative tasks. Other modules |
do the actual work after this module has told them what to do. |
|
@ A user typically invokes the meta-simulator with a \UNIX/-like command line |
of the general form |
`\.{mmmix}~\.{configfile}~\.{progfile}', |
where the \.{configfile} describes the characteristics |
of an \MMIX\ implementation and the \.{progfile} contains a program to |
be downloaded and run. Rules for configuration files appear in |
the module called \.{mmix-config}. The program file is either |
an ``\MMIX\ binary file'' dumped by {\mc MMIX-SIM}, or an |
ASCII text file that describes hexadecimal data |
in a rudimentary format. It is assumed to be binary if |
its name ends with the extension `\.{.mmb}'. |
|
@c |
#include <stdio.h> |
#include <stdlib.h> |
#include <string.h> |
#include "mmix-pipe.h" |
@# |
char *config_file_name, *prog_file_name; |
@<Global variables@>@; |
@<Subroutines@>@; |
|
int main(argc,argv) |
int argc; |
char *argv[]; |
{ |
@<Parse the command line@>; |
MMIX_config(config_file_name); |
MMIX_init(); |
mmix_io_init(); |
@<Input the program@>; |
@<Run the simulation interactively@>; |
printf("Simulation ended at time %d.\n",ticks.l); |
print_stats(); |
return 0; |
} |
|
@ The command line might also contain options, some day. |
For now I'm forgetting them and simplifying everything until I gain |
further experience. |
|
@<Parse...@>= |
if (argc!=3) { |
fprintf(stderr,"Usage: %s configfile progfile\n",argv[0]); |
@.Usage: ...@> |
exit(-3); |
} |
config_file_name=argv[1]; |
prog_file_name=argv[2]; |
|
@ @<Input the program@>= |
if (strlen(prog_file_name)>4 && |
strcmp(prog_file_name+strlen(prog_file_name)-4,".mmb")==0) |
@<Input an \MMIX\ binary file@>@; |
else @<Input a rudimentary hexadecimal file@>; |
fclose(prog_file); |
|
@* Hexadecimal input to memory. |
A rudimentary hexadecimal input format is implemented here so that the |
@^hexadecimal files@> |
simulator can be run with essentially arbitrary data in the simulated memory. |
The rules of this format are extremely simple: Each line of the file |
either begins with (i)~12 hexadecimal digits followed by a colon; or |
(ii)~a space followed by 16 hexadecimal digits. In case~(i), the 12 |
hex digits specify a 48-bit physical address, called the current |
location. In case~(ii), the 16 hex digits specify an octabyte to be |
stored in the current location; the current location is then increased by~8. |
The current location should be a multiple of~8, but its three least |
significant bits are actually ignored. Arbitrary comments can follow |
the specification of a new current location or a new octabyte, as long |
as each line is less than 99 characters long. For example, the file |
$$\vbox{\halign{\tt#\hfil\cr |
0123456789ab: SILLY EXAMPLE\cr |
\ 0123456789abcdef first octabyte\cr |
\ fedcba9876543210 second\cr}}$$ |
places the octabyte |
\Hex{0123456789abcdef} into memory location \Hex{0123456789a8} |
and \Hex{fedcba9876543210} into location \Hex{0123456789b0}. |
|
@d BUF_SIZE 100 |
|
@<Glob...@>= |
octa cur_loc; |
octa cur_dat; |
bool new_chunk; |
char buffer[BUF_SIZE]; |
FILE *prog_file; |
|
@ @<Input a rudimentary hexadecimal file@>= |
{ |
prog_file=fopen(prog_file_name,"r"); |
if (!prog_file) { |
fprintf(stderr,"Panic: Can't open MMIX hexadecimal file %s!\n",prog_file_name); |
@.Can't open...@> |
exit(-3); |
} |
new_chunk=true; |
while (1) { |
if (!fgets(buffer,BUF_SIZE,prog_file)) break; |
if (buffer[strlen(buffer)-1]!='\n') { |
fprintf(stderr,"Panic: Hexadecimal file line too long: `%s...'!\n",buffer); |
@.Hexadecimal file line...@> |
exit(-3); |
} |
if (buffer[12]==':') @<Change the current location@>@; |
else if (buffer[0]==' ') @<Read an octabyte and advance |cur_loc|@>@; |
else { |
fprintf(stderr,"Panic: Improper hexadecimal file line: `%s'!\n",buffer); |
@.Improper hexadecimal...@> |
exit(-3); |
} |
} |
} |
|
@ @<Change the current location@>= |
{ |
if (sscanf(buffer,"%4x%8x",&cur_loc.h,&cur_loc.l)!=2) { |
fprintf(stderr,"Panic: Improper hexadecimal file location: `%s'!\n",buffer); |
@.Improper hexadecimal...@> |
exit(-3); |
} |
new_chunk=true; |
} |
|
@ @<Read an octabyte and advance |cur_loc|@>= |
{ |
if (sscanf(buffer+1,"%8x%8x",&cur_dat.h,&cur_dat.l)!=2) { |
fprintf(stderr,"Panic: Improper hexadecimal file data: `%s'!\n",buffer); |
@.Improper hexadecimal...@> |
exit(-3); |
} |
if (new_chunk) mem_write(cur_loc,cur_dat); |
else mem_hash[last_h].chunk[(cur_loc.l&0xffff)>>3]=cur_dat; |
cur_loc.l+=8; |
if ((cur_loc.l&0xfff8)!=0) new_chunk=false; |
else { |
new_chunk=true; |
if ((cur_loc.l&0xffff0000)==0) cur_loc.h++; |
} |
} |
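
The line format can also be exercised outside the simulator. The following self-contained sketch classifies one line and extracts its 48-bit location or its octabyte, using the same |sscanf| patterns as the code above. The names |parse_hex_line| and |hex_line| are assumptions made for this example, 64-bit integers replace the two-tetrabyte |octa|, and the three low bits of a location are cleared at once instead of being ignored later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Result of classifying one line of the hexadecimal format. */
enum hex_line { HEX_LOCATION, HEX_OCTABYTE, HEX_BAD };

/* Parse one line.  A location line updates *loc (low three bits cleared);
   a data line stores the octabyte in *dat and leaves *loc unchanged. */
enum hex_line parse_hex_line(const char *line, uint64_t *loc, uint64_t *dat)
{
  unsigned int hi, lo;
  size_t len;
  assert(line != NULL && loc != NULL && dat != NULL);
  len = strlen(line);
  if (len > 12 && line[12] == ':') {    /* case (i): 12 hex digits, colon */
    if (sscanf(line, "%4x%8x", &hi, &lo) != 2) return HEX_BAD;
    *loc = (((uint64_t)hi << 32) | lo) & ~(uint64_t)7;
    return HEX_LOCATION;
  }
  if (line[0] == ' ') {                 /* case (ii): space, 16 hex digits */
    if (sscanf(line + 1, "%8x%8x", &hi, &lo) != 2) return HEX_BAD;
    *dat = ((uint64_t)hi << 32) | lo;
    return HEX_OCTABYTE;
  }
  return HEX_BAD;
}
```

A caller would store |*dat| at |*loc| after each octabyte line and then add 8 to |*loc|, as the rules of the format require.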
|
@* Binary input to memory. |
When the program file was dumped by {\mc MMIX-SIM}, it |
has the simple format discussed in exercise 1.4.3$'$--20 of the \MMIX\ fascicle. |
@^binary files@> |
@^segments@> |
In this case we assume that the user's program has text, data, pool, and stack |
segments, as in the conventions of that book. |
We load it into four |
$2^{32}$-byte pages of physical memory, one for each segment; page zero of |
segment~$i$ is mapped to physical location $2^{32}i$. Page tables are kept in |
physical locations starting at $2^{32}\times4$; static traps begin at |
$2^{32}\times 5$ and dynamic traps at $2^{32}\times6$. (These conventions |
agree with the special register settings |
$\rm rT=\Hex{8000000500000000}$, |
$\rm rTT=\Hex{8000000600000000}$, |
$\rm rV=\Hex{369c200400000000}$ |
assumed by the stripped-down simulator.) |
|
@<Input an \MMIX\ binary file@>= |
{ |
prog_file=fopen(prog_file_name,"rb"); |
if (!prog_file) { |
fprintf(stderr,"Panic: Can't open MMIX binary file %s!\n",prog_file_name); |
@.Can't open...@> |
exit(-3); |
} |
while (1) { |
if (!undump_octa()) break; |
new_chunk=true; |
cur_loc=cur_dat; |
if (cur_loc.h&0x9fffffff) bad_address=true; |
else bad_address=false, cur_loc.h >>= 29; |
/* apply trivial mapping function for each segment */ |
@<Input consecutive octabytes beginning at |cur_loc|@>; |
} |
@<Set up the canned environment@>; |
} |
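
The "trivial mapping function" applied above can be sketched on its own. The sketch assumes that the only acceptable virtual addresses are those in segments 0 through 3, whose high tetra has nothing set except address bits 62 and 61. The name `map_segment` is hypothetical.

```c
typedef unsigned int tetra;            /* assumed to hold at least 32 bits */
typedef struct { tetra h, l; } octa;

/* Map a virtual address in segment i (0..3) to physical address 2^32*i + offset.
   Returns 1 and rewrites *addr on success, 0 if the address is unsupported. */
static int map_segment(octa *addr)
{
  if (addr->h & 0x9fffffff) return 0;  /* only address bits 62 and 61 may be set */
  addr->h >>= 29;                      /* the segment number becomes the high tetra */
  return 1;
}
```

For instance, an address whose high tetra is 0x20000000 (the start of the data segment) ends up with high tetra 1, that is, at physical location $2^{32}$.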
|
@ The |undump_octa| routine reads eight bytes from the binary file |
|prog_file| into the global octabyte |cur_dat|, |
taking care as usual to be big-endian regardless of the host computer's bias. |
@^big-endian versus little-endian@> |
@^little-endian versus big-endian@> |
|
@<Sub...@>= |
static bool undump_octa @,@,@[ARGS((void))@];@+@t}\6{@> |
static bool undump_octa() |
{ |
register int t0,t1,t2,t3; |
t0=fgetc(prog_file);@+ if (t0==EOF) return false; |
t1=fgetc(prog_file);@+ if (t1==EOF) goto oops; |
t2=fgetc(prog_file);@+ if (t2==EOF) goto oops; |
t3=fgetc(prog_file);@+ if (t3==EOF) goto oops; |
cur_dat.h=(t0<<24)+(t1<<16)+(t2<<8)+t3; |
t0=fgetc(prog_file);@+ if (t0==EOF) goto oops; |
t1=fgetc(prog_file);@+ if (t1==EOF) goto oops; |
t2=fgetc(prog_file);@+ if (t2==EOF) goto oops; |
t3=fgetc(prog_file);@+ if (t3==EOF) goto oops; |
cur_dat.l=(t0<<24)+(t1<<16)+(t2<<8)+t3; |
return true; |
oops: fprintf(stderr,"Premature end of file on %s!\n",prog_file_name); |
@.Premature end of file...@> |
return false; |
} |
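
The big-endian packing done by |undump_octa| can also be shown on a byte buffer instead of a file. This is a minimal sketch with hypothetical names (`tetra_from_be`, `octa_from_be`). It casts each byte to an unsigned type before shifting, so the result does not depend on how the host treats shifts of signed values.

```c
typedef unsigned int tetra;            /* assumed to hold at least 32 bits */
typedef struct { tetra h, l; } octa;

/* Pack four bytes, most significant first, into a tetrabyte. */
static tetra tetra_from_be(const unsigned char *b)
{
  return ((tetra)b[0] << 24) | ((tetra)b[1] << 16)
       | ((tetra)b[2] << 8)  |  (tetra)b[3];
}

/* Pack eight bytes, most significant first, into an octabyte. */
static octa octa_from_be(const unsigned char *b)
{
  octa x;
  x.h = tetra_from_be(b);      /* bytes 0..3 form the high half */
  x.l = tetra_from_be(b + 4);  /* bytes 4..7 form the low half */
  return x;
}
```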
|
@ @<Input consecutive octabytes beginning at |cur_loc|@>= |
while (1) { |
if (!undump_octa()) { |
fprintf(stderr,"Unexpected end of file on %s!\n",prog_file_name); |
@.Unexpected end of file...@> |
break; |
} |
if (!(cur_dat.h || cur_dat.l)) break; |
if (bad_address) { |
fprintf(stderr,"Panic: Unsupported virtual address %08x%08x!\n", |
@.Unsupported virtual address@> |
cur_loc.h,cur_loc.l); |
exit(-5); |
} |
if (new_chunk) mem_write(cur_loc,cur_dat); |
else mem_hash[last_h].chunk[(cur_loc.l&0xffff)>>3]=cur_dat; |
cur_loc.l+=8; |
if ((cur_loc.l&0xfff8)!=0) new_chunk=false; |
else { |
new_chunk=true; |
if ((cur_loc.l&0xffff0000)==0) { |
bad_address=true; cur_loc.h=(cur_loc.h<<29)+1; |
} |
} |
} |
|
@ The primitive operating system assumed in simple programs of {\sl The |
Art of Computer Programming\/} will set up text segment, data segment, |
pool segment, and stack segment as in {\mc MMIX-SIM}. The runtime stack |
will be initialized if we \.{UNSAVE} from the last location loaded |
in the \.{.mmb} file. |
|
@d rQ 16 |
|
@<Set up the canned environment@>= |
if (cur_loc.h!=3) { |
fprintf(stderr,"Panic: MMIX binary file didn't set up the stack!\n"); |
@.MMIX binary file...@> |
exit(-6); |
} |
inst_ptr.o=mem_read(incr(cur_loc,-8*14)); /* \.{Main} */ |
inst_ptr.p=NULL; |
cur_loc.h=0x60000000; |
g[255].o=incr(cur_loc,-8); /* place to \.{UNSAVE} */ |
cur_dat.l=0x90; |
if (mem_read(cur_dat).h) inst_ptr.o=cur_dat; /* start at |0x90| if nonzero */ |
head->inst=(UNSAVE<<24)+255, tail--; /* prefetch a fabricated command */ |
head->loc=incr(inst_ptr.o,-4); /* in case the \.{UNSAVE} is interrupted */ |
g[rT].o.h=0x80000005, g[rTT].o.h=0x80000006; |
cur_dat.h=(RESUME<<24)+1, cur_dat.l=0, cur_loc.h=5, cur_loc.l=0; |
mem_write(cur_loc,cur_dat); /* the primitive trap handler */ |
cur_dat.l=cur_dat.h, cur_dat.h=(NEGI<<24)+(255<<16)+1; |
cur_loc.h=6, cur_loc.l=8; |
mem_write(cur_loc,cur_dat); /* the primitive dynamic trap handler */ |
cur_dat.h=(GET<<24)+rQ, cur_dat.l=(PUTI<<24)+(rQ<<16), cur_loc.l=0; |
mem_write(cur_loc,cur_dat); /* more of the primitive dynamic trap handler */ |
cur_dat.h=0, cur_dat.l=7; /* generate a PTE with \.{rwx} permission */ |
cur_loc.h=4; /* beginning of skeleton page table */ |
mem_write(cur_loc,cur_dat); /* PTE for the text segment */ |
ITcache->set[0][0].tag=zero_octa; |
ITcache->set[0][0].data[0]=cur_dat; /* prime the IT cache */ |
cur_dat.l=6; /* PTE with read and write permission only */ |
cur_dat.h=1, cur_loc.l=3<<13; |
mem_write(cur_loc,cur_dat); /* PTE for the data segment */ |
cur_dat.h=2, cur_loc.l=6<<13; |
mem_write(cur_loc,cur_dat); /* PTE for the pool segment */ |
cur_dat.h=3, cur_loc.l=9<<13; |
mem_write(cur_loc,cur_dat); /* PTE for the stack segment */ |
g[rK].o=neg_one; /* enable all interrupts */ |
g[rV].o.h=0x369c2004; |
page_bad=false, page_r=4<<(32-13), page_s=32, page_mask.l=0xffffffff; |
page_b[1]=3, page_b[2]=6, page_b[3]=9, page_b[4]=12; |
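
The constants assigned to |page_b|, |page_s|, and |page_r| above agree with the value given to rV. The following sketch unpacks rV into its fields, assuming the layout described in the MMIX documentation (four 4-bit fields $b_1$ to $b_4$, an 8-bit $s$, a 27-bit $r$, a 10-bit $n$, and a 3-bit $f$). The struct and function names are hypothetical.

```c
typedef unsigned int tetra;            /* assumed to hold at least 32 bits */

struct rv_fields {
  int b[5];                 /* b[1..4] from rV; b[0] is taken to be 0 */
  int s;                    /* pages are 2^s bytes long */
  unsigned long long root;  /* physical address of the root page table */
  int n;                    /* address space number */
  int f;                    /* function field */
};

/* Unpack rV, given as high tetra h and low tetra l. */
static struct rv_fields decode_rv(tetra h, tetra l)
{
  struct rv_fields v;
  v.b[0] = 0;
  v.b[1] = (h >> 28) & 0xf;
  v.b[2] = (h >> 24) & 0xf;
  v.b[3] = (h >> 20) & 0xf;
  v.b[4] = (h >> 16) & 0xf;
  v.s = (h >> 8) & 0xff;
  /* r is 27 bits wide: the low 8 bits of h followed by the top 19 bits of l;
     the root address is r times 2^13 */
  v.root = (((unsigned long long)(h & 0xff) << 19) | ((l >> 13) & 0x7ffff)) << 13;
  v.n = (l >> 3) & 0x3ff;
  v.f = l & 7;
  return v;
}
```

With rV equal to \Hex{369c200400000000} this yields $b_1=3$, $b_2=6$, $b_3=9$, $b_4=12$, $s=32$, and a root address of $2^{32}\times4$, matching the page-table location used above.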
|
@* Interaction. When prompted for instructions, this simulator |
@.mmmix>@> |
understands the following terse commands: |
|
\def\bull{\smallbreak\textindent{$\bullet$}} |
\def\<#1>{$\langle\,$#1$\,\rangle$} |
\bull\<positive integer>: Run for this many clock cycles. |
|
\bull\.{@@}\<hexadecimal integer>: Set the instruction pointer |
to this virtual address; successive instructions will be fetched from here. |
|
\bull\.{b}\<hexadecimal integer>: Set the breakpoint |
to this virtual address; simulation will pause when an instruction from the |
breakpoint address enters the fetch buffer. |
|
\bull\.v\<hexadecimal integer>: Set the desired level of diagnostic |
output; each bit in the hexadecimal integer enables certain printouts |
when the simulator is running. Bit \Hex1 shows instructions when issued, |
deissued, or committed; \Hex2 shows the pipeline and locks after each cycle; |
\Hex4 shows each coroutine activation; \Hex8 each coroutine scheduling; |
\Hex{10} reports when reading from an uninitialized chunk of memory; |
\Hex{20} asks for online input when reading from addresses $\ge2^{48}$; |
\Hex{40} reports all I/O to memory addresses $\ge2^{48}$; |
\Hex{80} shows details of branch prediction; |
\Hex{100} displays full cache contents including blocks with invalid tags. |
|
\bull\.-\<integer>: Deissue this many instructions. |
|
\bull\.l\<integer> or \.g\<integer>: Show current ``hot'' contents |
of a local or global register. |
|
\bull\.m\<hexadecimal integer>: Show current contents of a physical memory |
address. (This value may not be up to date; newer values might appear |
in the write buffer and/or in the caches.) |
|
\bull\.f\<hexadecimal integer>: Insert a tetrabyte into the fetch buffer. |
(Use with care!) |
|
\bull\.i\<integer>: Set the interval counter rI to the given value; this will |
trigger an interrupt after the specified number of cycles. |
|
\bull\.{IT}, \.{DT}, \.I, \.D, or \.S: Show current contents of a cache. |
|
\bull\.{D*} or \.{S*}: Show dirty blocks of a cache. |
|
\bull\.p: Show current contents of the pipeline. |
|
\bull\.s: Show current statistics on branch prediction and |
speed of instruction issue. |
|
\bull\.h: Help (show the possibilities for interaction). |
|
\bull\.q: Quit. |
|
@<Run the simulation interactively@>= |
while (1) { |
printf("mmmix> ");@+fflush(stdout); |
@.mmmix>@> |
if (!fgets(buffer,BUF_SIZE,stdin)) break; /* quit at end of input */ |
switch (buffer[0]) { |
default: what_say: |
printf("Eh? Sorry, I don't understand. (Type h for help)\n"); |
continue; |
case 'q': case 'x': goto done; |
@<Cases for interaction@>@; |
} |
} |
done:@; |
|
@ @<Cases...@>= |
case 'h': case '?': printf("The interactive commands are as follows:\n"); |
printf(" <n> to run for n cycles\n"); |
printf(" @@<x> to take next instruction from location x\n"); |
printf(" b<x> to pause when location x is fetched\n"); |
printf(" v<x> to print specified diagnostics when running;\n"); |
printf(" x=1[insts enter/leave pipe]+2[whole pipeline each cycle]+\n"); |
printf(" 4[coroutine activations]+8[coroutine scheduling]+\n"); |
printf(" 10[uninitialized read]+20[online I/O read]+\n"); |
printf(" 40[I/O read/write]+80[branch prediction details]+\n"); |
printf(" 100[invalid cache blocks displayed too]\n"); |
printf(" -<n> to deissue n instructions\n"); |
printf(" l<n> to print current value of local register n\n"); |
printf(" g<n> to print current value of global register n\n"); |
printf(" m<x> to print current value of memory address x\n"); |
printf(" f<x> to insert instruction x into the fetch buffer\n"); |
printf(" i<n> to initiate a timer interrupt after n cycles\n"); |
printf(" IT, DT, I, D, or S to print current cache contents\n"); |
printf(" D* or S* to print dirty blocks of a cache\n"); |
printf(" p to print current pipeline contents\n"); |
printf(" s to print current stats\n"); |
printf(" h to print this message\n"); |
printf(" q to exit\n"); |
printf("(Here <n> is a decimal integer, <x> is hexadecimal.)\n"); |
continue; |
|
@ @<Cases...@>= |
case '0': case '1': case '2': case '3': case '4': |
case '5': case '6': case '7': case '8': case '9': |
if (sscanf(buffer,"%d",&n)!=1) goto what_say; |
printf("Running %d at time %d",n,ticks.l); |
if (bp.h==(tetra)-1 && bp.l==(tetra)-1) printf("\n"); |
else printf(" with breakpoint %08x%08x\n",bp.h,bp.l); |
MMIX_run(n,bp);@+continue; |
case '@@': inst_ptr.o=read_hex(buffer+1);@+inst_ptr.p=NULL;@+continue; |
case 'b': bp=read_hex(buffer+1);@+continue; |
case 'v': verbose=read_hex(buffer+1).l;@+continue; |
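
The hexadecimal argument of the \.v command is a bit mask. A minimal sketch of how such a mask can be tested follows. The bit values come from the list of commands above, but the enumerator names and the helper `verbose_has` are hypothetical, chosen only for this illustration.

```c
/* Diagnostic bits, with values as listed for the v command. */
enum verbose_bits {
  V_ISSUE       = 0x001,  /* instructions issued, deissued, or committed */
  V_PIPELINE    = 0x002,  /* pipeline and locks after each cycle */
  V_COROUTINE   = 0x004,  /* coroutine activations */
  V_SCHEDULE    = 0x008,  /* coroutine scheduling */
  V_UNINIT      = 0x010,  /* reads from uninitialized chunks */
  V_ONLINE_READ = 0x020,  /* online input for addresses >= 2^48 */
  V_IO          = 0x040,  /* all I/O to addresses >= 2^48 */
  V_BRANCH      = 0x080,  /* branch prediction details */
  V_FULL_CACHE  = 0x100   /* cache blocks with invalid tags too */
};

/* Nonzero if the given diagnostic bit is enabled in the mask. */
static int verbose_has(unsigned int mask, unsigned int bit)
{
  return (mask & bit) != 0;
}
```

Typing \.{v41}, for example, would enable the instruction trace and the I/O report and nothing else.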
|
@ @<Glob...@>= |
int n,m; /* temporary integer */ |
octa bp={-1,-1}; /* breakpoint */ |
octa tmp; /* an octabyte of temporary interest */ |
static unsigned char d[BUF_SIZE]; |
|
@ Here's a simple program to read an octabyte in hexadecimal notation |
from a buffer. It changes the buffer by storing a null character |
after the input. |
@^radix conversion@> |
|
@<Sub...@>= |
octa read_hex @,@,@[ARGS((char *))@];@+@t}\6{@> |
octa read_hex(p) |
char *p; |
{ |
register int j,k; |
octa val; |
val.h=val.l=0; |
for (j=0;;j++) { |
if (p[j]>='0' && p[j]<='9') d[j]=p[j]-'0'; |
else if (p[j]>='a' && p[j]<='f') d[j]=p[j]-'a'+10; |
else if (p[j]>='A' && p[j]<='F') d[j]=p[j]-'A'+10; |
else break; |
} |
p[j]='\0'; |
for (j--,k=0;k<=j;k++) { |
if (k>=8) val.h+=d[j-k]<<(4*k-32); |
else val.l+=d[j-k]<<(4*k); |
} |
return val; |
} |
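
An alternative way to carry out the same radix conversion is to shift the accumulated value left by one hex digit at a time, carrying from the low half into the high half. The sketch below uses the hypothetical name `parse_hex_octa`. Unlike |read_hex| it does not modify its input, and digits beyond the sixteenth simply shift out at the top.

```c
#include <ctype.h>

typedef unsigned int tetra;            /* assumed to hold at least 32 bits */
typedef struct { tetra h, l; } octa;

/* Convert the leading run of hex digits in p to an octabyte. */
static octa parse_hex_octa(const char *p)
{
  octa val = {0, 0};
  for (; isxdigit((unsigned char)*p); p++) {
    tetra d = isdigit((unsigned char)*p)
                ? (tetra)(*p - '0')
                : (tetra)(tolower((unsigned char)*p) - 'a' + 10);
    /* multiply the 64-bit value by 16 and add the new digit */
    val.h = ((val.h << 4) | (val.l >> 28)) & 0xffffffff;
    val.l = ((val.l << 4) | d) & 0xffffffff;
  }
  return val;
}
```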
|
@ @<Cases...@>= |
case '-':@+ if (sscanf(buffer+1,"%d",&n)!=1 || n<0) goto what_say; |
if (cool<=hot) m=hot-cool;@+else m=(hot-reorder_bot)+1+(reorder_top-cool); |
if (n>m) deissues=m;@+else deissues=n; |
continue; |
case 'l':@+ if (sscanf(buffer+1,"%d",&n)!=1 || n<0) goto what_say; |
if (n>=lring_size) goto what_say; |
printf(" l[%d]=%08x%08x\n",n,l[n].o.h,l[n].o.l);@+continue; |
case 'm': tmp=mem_read(read_hex(buffer+1)); |
printf(" m[%s]=%08x%08x\n",buffer+1,tmp.h,tmp.l);@+continue; |
|
@ The register stack pointers, rO and rS, are not kept up to date |
in the |g| array. Therefore we have to deduce their values by |
examining the pipeline. |
|
@<Cases...@>= |
case 'g':@+ if (sscanf(buffer+1,"%d",&n)!=1 || n<0) goto what_say; |
if (n>=256) goto what_say; |
if (n==rO || n==rS) { |
if (hot==cool) /* pipeline empty */ |
g[rO].o=sl3(cool_O), g[rS].o=sl3(cool_S); |
else g[rO].o=sl3(hot->cur_O), g[rS].o=sl3(hot->cur_S); |
} |
printf(" g[%d]=%08x%08x\n",n,g[n].o.h,g[n].o.l); |
continue; |
|
@ @<Sub...@>= |
static octa sl3 @,@,@[ARGS((octa))@];@+@t}\6{@> |
static octa sl3(y) /* shift left by 3 bits */ |
octa y; |
{ |
register tetra yhl=y.h<<3, ylh=y.l>>29; |
y.h=yhl+ylh;@+ y.l<<=3; |
return y; |
} |
|
@ @<Cases...@>= |
case 'I': print_cache(buffer[1]=='T'? ITcache: Icache,false);@+continue; |
case 'D': print_cache(buffer[1]=='T'? DTcache: Dcache,@/ |
buffer[1]=='*');@+continue; |
case 'S': print_cache(Scache,buffer[1]=='*');@+continue; |
case 'p': print_pipe();@+print_locks();@+continue; |
case 's': print_stats();@+continue; |
case 'i':@+ if (sscanf(buffer+1,"%d",&n)==1) g[rI].o=incr(zero_octa,n); |
continue; |
|
@ @<Cases...@>= |
case 'f': tmp=read_hex(buffer+1); |
{ |
register fetch* new_tail; |
if (tail==fetch_bot) new_tail=fetch_top; |
else new_tail=tail-1; |
if (new_tail==head) printf("Sorry, the fetch buffer is full!\n"); |
else { |
tail->loc=inst_ptr.o; |
tail->inst=tmp.l; |
tail->interrupt=0; |
tail->noted=false; |
tail=new_tail; |
} |
continue; |
} |
|
@ A hidden case here, for me when debugging. |
It essentially disables the translation caches, by mapping everything |
to zero. |
|
@<Cases...@>= |
case 'd':@+if (ticks.l) |
printf("Sorry: I disable ITcache and DTcache only at the beginning!\n"); |
else { |
ITcache->set[0][0].tag=zero_octa; |
ITcache->set[0][0].data[0]=seven_octa; |
DTcache->set[0][0].tag=zero_octa; |
DTcache->set[0][0].data[0]=seven_octa; |
g[rK].o=neg_one; |
page_bad=false; |
page_mask=neg_one; |
inst_ptr.p=(specnode*)1; |
}@+continue; |
|
@ And another case, for me when kludging. At the moment, |
it simply lists the functional unit names. |
|
But I might decide to put other stuff here when giving a demo. |
|
@<Cases...@>= |
case 'k':@+ { register int j; |
for (j=0;j<funit_count;j++) |
printf("unit %s %d\n",funit[j].name,funit[j].k); |
} |
continue; |
|
@ @<Glob...@>= |
bool bad_address; |
extern bool page_bad; |
extern octa page_mask; |
extern int page_r,page_s,page_b[5]; |
extern octa zero_octa; |
extern octa neg_one; |
octa seven_octa={0,7}; |
extern octa incr @,@,@[ARGS((octa y,int delta))@]; |
/* unsigned $y+\delta$ ($\delta$ is signed) */ |
extern void mmix_io_init @,@,@[ARGS((void))@]; |
extern void MMIX_config @,@,@[ARGS((char*))@]; |
|
@* Index. |
/crypto1.mms
0,0 → 1,73
* Cryptanalysis Problem (CLASSIFIED) (pipelined) |
a GREG |
b GREG |
bb GREG |
c GREG |
t GREG |
x GREG |
y GREG |
LOC Data_Segment |
freq GREG @ Base address for byte counts |
LOC @+8*(1<<8) Space for the byte frequencies |
p GREG @ |
BYTE "abracadabraa",0,"abc" Trivial test data |
ones GREG #0101010101010101 |
LOC #100 |
Start LDOU a,p,0 |
INCL p,8 |
BDIF t,ones,a |
BNZ t,3F Do main loop, unless near the end. |
2H SRU b,a,53 |
LDO c,freq,b Load old count. |
SLU bb,a,8 |
INCL c,1 |
SRU bb,bb,53 |
STO c,freq,b Store new count. |
LDO c,freq,bb |
SLU b,a,16 |
INCL c,1 |
SRU b,b,53 |
STO c,freq,bb |
LDO c,freq,b Load old count. |
SLU bb,a,24 |
INCL c,1 |
SRU bb,bb,53 |
STO c,freq,b Store new count. |
LDO c,freq,bb |
SLU b,a,32 |
INCL c,1 |
SRU b,b,53 |
STO c,freq,bb |
LDO c,freq,b Load old count. |
SLU bb,a,40 |
INCL c,1 |
SRU bb,bb,53 |
STO c,freq,b Store new count. |
LDO c,freq,bb |
SLU b,a,48 |
INCL c,1 |
SRU b,b,53 |
STO c,freq,bb |
LDO c,freq,b Load old count. |
SLU bb,a,56 |
INCL c,1 |
SRU bb,bb,53 |
STO c,freq,b Store new count. |
LDO c,freq,bb |
LDOU a,p,0 |
INCL p,8 |
INCL c,1 |
BDIF t,ones,a |
STO c,freq,bb |
PBZ t,2B Do main loop, unless near the end. |
3H SRU b,a,53 |
LDO c,freq,b Load old count. |
INCL c,1 |
STO c,freq,b Store new count. |
SRU b,b,3 |
SLU a,a,8 |
PBNZ b,3B Continue unless done. |
POP |
|
Main IS Start |
|
/cp.mms
0,0 → 1,15
% copy from StdIn to StdOut, no error checking |
LOC Data_Segment |
GREG @ |
ArgR OCTA Buf,2 one char at a time |
ArgW OCTA Buf,1 ditto |
Buf LOC @+2 |
|
LOC #100 |
Main LDA $255,ArgR |
TRAP 0,Fgets,StdIn |
BN $255,Done |
LDA $255,ArgW |
TRAP 0,Fwrite,StdOut |
JMP Main |
Done TRAP 0,0,Halt |
/crypto2.mms
0,0 → 1,86
* Cryptanalysis Problem (CLASSIFIED) (pipelined, superscalar) |
a GREG |
b GREG |
bb GREG |
bbb GREG |
bbbb GREG |
c GREG |
cc GREG |
t GREG |
x GREG |
y GREG |
LOC Data_Segment |
freq GREG @ Base address for even byte counts |
LOC @+8*(1<<8) Space for the byte frequencies |
freqq GREG @ Base address for odd byte counts |
LOC @+8*(1<<8) Space for the byte frequencies |
p GREG @ |
BYTE "abracadabraa",0,"abc" Trivial test data |
ones GREG #0101010101010101 |
LOC #100 |
Start LDOU a,p,0 |
INCL p,8 |
BDIF t,ones,a |
SLU bb,a,8 |
BNZ t,3F Do main loop, unless near the end. |
2H SRU b,a,53 |
SRU bb,bb,53 |
LDO c,freq,b |
LDO cc,freqq,bb |
SLU bbb,a,16 |
SLU bbbb,a,24 |
INCL c,1 |
INCL cc,1 |
SRU bbb,bbb,53 |
SRU bbbb,bbbb,53 |
STO c,freq,b |
STO cc,freqq,bb |
LDO c,freq,bbb |
LDO cc,freqq,bbbb |
SLU b,a,32 |
SLU bb,a,40 |
INCL c,1 |
INCL cc,1 |
SRU b,b,53 |
SRU bb,bb,53 |
STO c,freq,bbb |
STO cc,freqq,bbbb |
LDO c,freq,b |
LDO cc,freqq,bb |
SLU bbb,a,48 |
SLU bbbb,a,56 |
INCL c,1 |
INCL cc,1 |
SRU bbb,bbb,53 |
SRU bbbb,bbbb,53 |
STO c,freq,b |
STO cc,freqq,bb |
LDO c,freq,bbb |
LDO cc,freqq,bbbb |
LDOU a,p,0 |
INCL p,8 |
INCL c,1 |
INCL cc,1 |
BDIF t,ones,a |
SLU bb,a,8 |
STO c,freq,bbb |
STO cc,freqq,bbbb |
PBZ t,2B |
3H SRU b,a,53 |
LDO c,freq,b |
INCL c,1 |
STO c,freq,b |
SRU b,b,3 |
SLU a,a,8 |
PBNZ b,3B |
SET p,8*255 |
4H LDO c,freq,p |
LDO cc,freqq,p |
ADD c,c,cc |
STO c,freq,p |
SUB p,p,8 |
PBP p,4B |
POP |
|
Main IS Start |
|
/mmotype.w
0,0 → 1,466
% This file is part of the MMIXware package (c) Donald E Knuth 1999 |
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES! |
|
\def\title{MMOTYPE} |
\def\MMIX{\.{MMIX}} |
\def\MMIXAL{\.{MMIXAL}} |
\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant |
|
@* Introduction. This program reads a binary \.{mmo} file output by |
the \MMIXAL\ processor and lists it in human-readable form. It lists |
only the symbol table if invoked with the \.{-s} option, and it also lists |
the tetrabytes of input as they are read if invoked with the \.{-v} option. |
|
@s tetra int |
|
@c |
#include <stdio.h> |
#include <stdlib.h> |
#include <time.h> |
#include <string.h> |
@<Prototype preparations@>@; |
@<Type definitions@>@; |
@<Global variables@>@; |
@<Subroutines@>@; |
@# |
int main(argc,argv) |
int argc;@+char*argv[]; |
{ |
register int j,delta,postamble=0; |
register char *p; |
@<Process the command line@>; |
@<Initialize everything@>; |
@<List the preamble@>; |
do @<List the next item@>@;@+while (!postamble); |
@<List the postamble@>; |
@<List the symbol table@>; |
return 0; |
} |
|
@ @<Process the command line@>= |
listing=1, verbose=0; |
for (j=1;j<argc-1 && argv[j][0]=='-' && argv[j][2]=='\0';j++) { |
if (argv[j][1]=='s') listing=0; |
else if (argv[j][1]=='v') verbose=1; |
else break; |
} |
if (j!=argc-1) { |
fprintf(stderr,"Usage: %s [-s] [-v] mmofile\n",argv[0]); |
@.Usage: ...@> |
exit(-1); |
} |
|
@ @<Initialize everything@>= |
mmo_file=fopen(argv[argc-1],"rb"); |
if (!mmo_file) { |
fprintf(stderr,"Can't open file %s!\n",argv[argc-1]); |
@.Can't open...@> |
exit(-2); |
} |
|
@ @<Glob...@>= |
int listing; /* are we listing everything? */ |
int verbose; /* are we also showing the tetras of input as they are read? */ |
FILE *mmo_file; /* the input file */ |
|
@ @<Prototype preparations@>= |
#ifdef __STDC__ |
#define ARGS(list) list |
#else |
#define ARGS(list) () |
#endif |
|
@ A complete definition of \.{mmo} format appears in the \MMIXAL\ document. |
Here we need to define only the basic constants used for interpretation. |
|
@d mm 0x98 /* the escape code of \.{mmo} format */ |
@d lop_quote 0x0 /* the quotation lopcode */ |
@d lop_loc 0x1 /* the location lopcode */ |
@d lop_skip 0x2 /* the skip lopcode */ |
@d lop_fixo 0x3 /* the octabyte-fix lopcode */ |
@d lop_fixr 0x4 /* the relative-fix lopcode */ |
@d lop_fixrx 0x5 /* extended relative-fix lopcode */ |
@d lop_file 0x6 /* the file name lopcode */ |
@d lop_line 0x7 /* the file position lopcode */ |
@d lop_spec 0x8 /* the special hook lopcode */ |
@d lop_pre 0x9 /* the preamble lopcode */ |
@d lop_post 0xa /* the postamble lopcode */ |
@d lop_stab 0xb /* the symbol table lopcode */ |
@d lop_end 0xc /* the end-it-all lopcode */ |
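
To show how these constants are used, here is a small sketch that splits a tetrabyte into the four bytes examined by the main loop below: the escape code, the lopcode, and the Y and Z fields. The struct `lop_fields` and the function `decode_lop` are hypothetical names, and only the byte layout implied by this program is assumed.

```c
typedef unsigned int tetra;            /* assumed to hold at least 32 bits */

#define MM_ESCAPE 0x98                 /* same value as mm above */

struct lop_fields {
  int is_lop;   /* nonzero if the first byte is the escape code */
  int code;     /* second byte: the lopcode */
  int y, z;     /* third and fourth bytes */
  int yz;       /* the two least significant bytes together */
};

/* Split a big-endian tetrabyte into its lopcode fields. */
static struct lop_fields decode_lop(tetra t)
{
  struct lop_fields d;
  d.is_lop = ((t >> 24) & 0xff) == MM_ESCAPE;
  d.code = (t >> 16) & 0xff;
  d.y = (t >> 8) & 0xff;
  d.z = t & 0xff;
  d.yz = t & 0xffff;
  return d;
}
```

For example, the tetrabyte 0x98090101 decodes as the preamble lopcode (0x9) with Y=1 and Z=1.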
|
@* Low-level arithmetic. This program is intended to work correctly |
whenever an |int| has at least 32 bits. |
|
@<Type...@>= |
typedef unsigned char byte; /* a monobyte */ |
typedef unsigned int tetra; /* a tetrabyte */ |
typedef struct {@+tetra h,l;}@+octa; /* an octabyte */ |
|
@ The |incr| subroutine adds a signed integer to an (unsigned) octabyte. |
|
@<Sub...@>= |
octa incr @,@,@[ARGS((octa,int))@]; |
octa incr(o,delta) |
octa o; |
int delta; |
{ |
register tetra t; |
octa x; |
if (delta>=0) { |
t=0xffffffff-delta; |
if (o.l<=t) x.l=o.l+delta, x.h=o.h; |
else x.l=o.l-t-1, x.h=o.h+1; |
} else { |
t=-delta; |
if (o.l>=t) x.l=o.l-t, x.h=o.h; |
else x.l=o.l+(0xffffffff+delta)+1, x.h=o.h-1; |
} |
return x; |
} |
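
On a host with a 64-bit integer type, the same result can be obtained with ordinary modular arithmetic, which gives a convenient cross-check for |incr|. The sketch below assumes a C99 compiler with `<stdint.h>`, and the name `incr64` is hypothetical. It is not part of the program, which deliberately avoids relying on 64-bit types.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;

/* Add a signed delta to an unsigned octabyte, modulo 2^64. */
static octa incr64(octa o, int delta)
{
  octa x;
  uint64_t v = ((uint64_t)o.h << 32) | o.l;
  v += (uint64_t)(int64_t)delta;   /* sign-extend delta, then wrap around */
  x.h = (uint32_t)(v >> 32);
  x.l = (uint32_t)v;
  return x;
}
```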
|
@* Low-level input. The tetrabytes of an \.{mmo} file are stored in |
friendly big-endian fashion, but this program is supposed to work also |
on computers that are little-endian. Therefore we read four successive bytes |
and pack them into a tetrabyte, instead of reading a single tetrabyte. |
|
@<Sub...@>= |
void read_tet @,@,@[ARGS((void))@]; |
void read_tet() |
{ |
if (fread(buf,1,4,mmo_file)!=4) { |
fprintf(stderr,"Unexpected end of file after %d tetras!\n",count); |
@.Unexpected end of file...@> |
exit(-3); |
} |
yz=(buf[2]<<8)+buf[3]; |
tet=(((buf[0]<<8)+buf[1])<<16)+yz; |
if (verbose) printf(" %08x\n",tet); |
count++; |
} |
|
@ @<Sub...@>= |
byte read_byte @,@,@[ARGS((void))@]; |
byte read_byte() |
{ |
register byte b; |
if (!byte_count) read_tet(); |
b=buf[byte_count]; |
byte_count=(byte_count+1)&3; |
return b; |
} |
|
@ @<Glob...@>= |
int count; /* the number of tetrabytes we've read */ |
int byte_count; /* index of the next-to-be-read byte */ |
byte buf[4]; /* the most recently read bytes */ |
int yz; /* the two least significant bytes */ |
tetra tet; /* |buf| bytes packed big-endianwise */ |
|
@ @<Init...@>= |
count=byte_count=0; |
|
@* The main loop. Now for the bread-and-butter part of this program. |
|
@<List the next item@>= |
{ |
read_tet(); |
loop:@+if (buf[0]==mm) switch (buf[1]) { |
case lop_quote:@+if (yz!=1) |
err("YZ field of lop_quote should be 1"); |
@.YZ field...should be 1@> |
read_tet();@+break; |
@t\4@>@<Cases for lopcodes in the main loop@>@; |
default: err("Unknown lopcode"); |
@.Unknown lopcode@> |
} |
if (listing) @<List |tet| as a normal item@>; |
} |
|
@ We want to catch all cases where the rules of \.{mmo} format are |
not obeyed. The |err| macro ameliorates this somewhat tedious chore. |
|
@d err(m) {@+fprintf(stderr,"Error in tetra %d: %s!\n",count,m);@+ continue;@+} |
@.Error in tetra...@> |
|
@ In a normal situation, the newly read tetrabyte is simply supposed |
to be loaded into the current location. We list not only the current |
location but also the current file position, if |cur_line| is nonzero |
and |cur_loc| belongs to segment~0. |
|
@<List |tet| as a normal item@>= |
{ |
printf("%08x%08x: %08x",cur_loc.h,cur_loc.l,tet); |
if (!cur_line) printf("\n"); |
else { |
if (cur_loc.h&0xe0000000) printf("\n"); |
else { |
if (cur_file==listed_file) printf(" (line %d)\n",cur_line); |
else { |
printf(" (\"%s\", line %d)\n", file_name[cur_file], cur_line); |
listed_file=cur_file; |
} |
} |
cur_line++; |
} |
cur_loc=incr(cur_loc,4);@+ cur_loc.l &=-4; |
} |
|
@ @<Glob...@>= |
octa cur_loc; /* the current location */ |
int listed_file; /* the most recently listed file number */ |
int cur_file; /* the most recently selected file number */ |
int cur_line; /* the current position in |cur_file| */ |
char *file_name[256]; /* file names seen */ |
octa tmp; /* an octabyte of temporary interest */ |
|
@ @<Init...@>= |
cur_loc.h=cur_loc.l=0; |
listed_file=cur_file=-1; |
cur_line=0; |
|
@* The simple lopcodes. We have already implemented |lop_quote|, which |
falls through to the normal case after reading an extra tetrabyte. |
Now let's consider the other lopcodes in turn. |
|
@d y buf[2] /* the next-to-least significant byte */ |
@d z buf[3] /* the least significant byte */ |
|
@<Cases...@>= |
case lop_loc:@+if (z==2) { |
j=y;@+ read_tet();@+ cur_loc.h=(j<<24)+tet; |
}@+else if (z==1) cur_loc.h=y<<24; |
else err("Z field of lop_loc should be 1 or 2"); |
@:Z field of lop_loc...}\.{Z field of lop\_loc...@> |
read_tet();@+ cur_loc.l=tet; |
continue; |
case lop_skip: cur_loc=incr(cur_loc,yz);@+continue; |
|
@ Fixups load information out of order, when future references have |
been resolved. The current file name and line number are not considered |
relevant. |
|
@<Cases...@>= |
case lop_fixo:@+if (z==2) { |
j=y;@+ read_tet();@+ tmp.h=(j<<24)+tet; |
}@+else if (z==1) tmp.h=y<<24; |
else err("Z field of lop_fixo should be 1 or 2"); |
@:Z field of lop_fixo...}\.{Z field of lop\_fixo...@> |
read_tet();@+ tmp.l=tet; |
if (listing) printf("%08x%08x: %08x%08x\n",tmp.h,tmp.l,cur_loc.h,cur_loc.l); |
continue; |
case lop_fixr: delta=yz; goto fixr; |
case lop_fixrx:j=yz;@+if (j!=16 && j!=24) |
err("YZ field of lop_fixrx should be 16 or 24"); |
@:YZ field of lop_fixrx...}\.{YZ field of lop\_fixrx...@> |
read_tet(); delta=tet; |
if (delta&0xfe000000) err("increment of lop_fixrx is too large"); |
@.increment...too large@> |
fixr: tmp=incr(cur_loc,-(delta>=0x1000000? (delta&0xffffff)-(1<<j): delta)<<2); |
if (listing) printf("%08x%08x: %08x\n",tmp.h,tmp.l,delta); |
continue; |
|
@ The space for file names isn't allocated until we are sure we need it. |
|
@<Cases...@>= |
case lop_file:@+if (file_name[y]) { |
for (j=z;j>0;j--) read_tet(); |
cur_file=y; |
if (z) err("Two file names with the same number"); |
@.Two file names...@> |
}@+else { |
if (!z) err("No name given for newly selected file"); |
@.No name given...@> |
file_name[y]=(char*)calloc(4*z+1,1); |
if (!file_name[y]) { |
fprintf(stderr,"No room to store the file name!\n");@+exit(-4); |
@.No room...@> |
} |
cur_file=y; |
for (j=z,p=file_name[y]; j>0; j--,p+=4) { |
read_tet(); |
*p=buf[0];@+*(p+1)=buf[1];@+*(p+2)=buf[2];@+*(p+3)=buf[3]; |
} |
} |
cur_line=0;@+continue; |
case lop_line:@+if (cur_file<0) err("No file was selected for lop_line"); |
@.No file was selected...@> |
cur_line=yz;@+continue; |
|
@ Special bytes in the file might be in synch with the current location |
and/or the current file position, so we list those parameters too. |
|
@<Cases...@>= |
case lop_spec:@+if (listing) { |
printf("Special data %d at loc %08x%08x", yz, cur_loc.h, cur_loc.l); |
if (!cur_line) printf("\n"); |
else if (cur_file==listed_file) printf(" (line %d)\n",cur_line); |
else { |
printf(" (\"%s\", line %d)\n", file_name[cur_file], cur_line); |
listed_file=cur_file; |
} |
} |
while(1) { |
read_tet(); |
if (buf[0]==mm) { |
if (buf[1]!=lop_quote || yz!=1) goto loop; /* end of special data */ |
read_tet(); |
} |
if (listing) printf(" %08x\n",tet); |
} |
|
@ The other cases shouldn't appear in the main loop. |
|
@<Cases...@>= |
case lop_pre: err("Can't have another preamble"); |
@.Can't have another...@> |
case lop_post: postamble=1; |
if (y) err("Y field of lop_post should be zero"); |
@:Y field of lop_post...}\.{Y field of lop\_post...@> |
if (z<32) err("Z field of lop_post must be 32 or more"); |
@:Z field of lop_post...}\.{Z field of lop\_post...@> |
continue; |
case lop_stab: err("Symbol table must follow postamble"); |
@.Symbol table...@> |
case lop_end: err("Symbol table can't end before it begins"); |
|
@* The preamble and postamble. Now here's what we do before and after |
the main loop. |
|
@<List the preamble@>= |
read_tet(); /* read the first tetrabyte of input */ |
if (buf[0]!=mm || buf[1]!=lop_pre) { |
fprintf(stderr,"Input is not an MMO file (first two bytes are wrong)!\n"); |
@.Input is not...@> |
exit(-5); |
} |
if (y!=1) fprintf(stderr, |
"Warning: I'm reading this file as version 1, not version %d!\n",y); |
@.I'm reading this file...@> |
if (z>0) { |
j=z; |
read_tet(); |
if (listing) { |
time_t created=(time_t)tet; /* |time_t| may be wider than a |tetra| */ |
printf("File was created %s",asctime(localtime(&created))); |
} |
for (j--;j>0;j--) { |
read_tet(); |
if (listing) printf("Preamble data %08x\n",tet); |
} |
} |
|
@ @<List the postamble@>= |
for (j=z;j<256;j++) { |
read_tet();@+tmp.h=tet;@+read_tet(); |
if (listing) { |
if (tmp.h || tet) printf("g%03d: %08x%08x\n",j,tmp.h,tet); |
else printf("g%03d: 0\n",j); |
} |
} |
|
@* The symbol table. Finally we come to the symbol table, which is |
the most interesting part of this program because it recursively |
traces an implicit ternary trie structure. |
|
@<List the symbol table@>= |
read_tet(); |
if (buf[0]!=mm || buf[1]!=lop_stab) { |
fprintf(stderr,"Symbol table does not follow the postamble!\n"); |
@.Symbol table...@> |
exit(-6); |
} |
if (yz) fprintf(stderr,"YZ field of lop_stab should be zero!\n"); |
@.YZ field...should be zero@> |
printf("Symbol table (beginning at tetra %d):\n",count); |
stab_start=count; |
sym_ptr=sym_buf; |
print_stab(); |
@<Check the |lop_end|@>; |
|
@ The main work is done by a recursive subroutine called |print_stab|, |
which manipulates a global array |sym_buf| containing the current |
symbol prefix; the global variable |sym_ptr| points to the first |
unfilled character of that array. |
|
@<Sub...@>= |
void print_stab @,@,@[ARGS((void))@]; |
void print_stab() |
{ |
register int m=read_byte(); /* the master control byte */ |
register int c; /* the character at the current trie node */ |
register int j,k; |
if (m&0x40) print_stab(); /* traverse the left subtrie, if it is nonempty */ |
if (m&0x2f) { |
@<Read the character |c|@>; |
*sym_ptr++=c; |
if (sym_ptr==&sym_buf[sym_length_max]) { |
fprintf(stderr,"Oops, the symbol is too long!\n");@+exit(-7); |
@.Oops...too long@> |
} |
if (m&0xf) |
@<Print the current symbol with its equivalent and serial number@>; |
if (m&0x20) print_stab(); /* traverse the middle subtrie */ |
sym_ptr--; |
} |
if (m&0x10) print_stab(); /* traverse the right subtrie, if it is nonempty */ |
} |
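
The tests applied to the master control byte in |print_stab| can be summarized by unpacking the byte into named fields. This is a sketch based only on the masks used in the code above, and the struct `stab_node` and function `decode_master` are hypothetical names.

```c
struct stab_node {
  int wide;      /* 0x80: the character has a 16-bit code */
  int left;      /* 0x40: a left subtrie follows */
  int middle;    /* 0x20: a middle subtrie follows the character */
  int right;     /* 0x10: a right subtrie follows */
  int equiv;     /* 0x0f: nonzero if a symbol ends at this node */
  int has_char;  /* nonzero if a character is stored at this node */
};

/* Unpack the master control byte of a ternary trie node. */
static struct stab_node decode_master(int m)
{
  struct stab_node n;
  n.wide = (m & 0x80) != 0;
  n.left = (m & 0x40) != 0;
  n.middle = (m & 0x20) != 0;
  n.right = (m & 0x10) != 0;
  n.equiv = m & 0x0f;
  n.has_char = (m & 0x2f) != 0;  /* a middle subtrie or a symbol implies a character */
  return n;
}
```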
|
@ The present implementation doesn't support Unicode; characters with |
more than 8-bit codes are printed as `\.?'. However, the changes |
for 16-bit codes would be quite easy if proper fonts for Unicode output |
were available. In that case, |sym_buf| would be an array of wyde characters. |
@^Unicode@> |
@^system dependencies@> |
|
@<Read the character |c|@>= |
if (m&0x80) j=read_byte(); /* 16-bit character */ |
else j=0; |
c=read_byte(); |
if (j) c='?'; /* oops, we can't print |(j<<8)+c| easily at this time */ |
|
@ @<Print the current symbol with its equivalent and serial number@>= |
{ |
*sym_ptr='\0'; |
j=m&0xf; |
if (j==15) sprintf(equiv_buf,"$%03d",read_byte()); |
else if (j<=8) { |
strcpy(equiv_buf,"#"); |
for (;j>0;j--) sprintf(equiv_buf+strlen(equiv_buf),"%02x",read_byte()); |
if (strcmp(equiv_buf,"#0000")==0) strcpy(equiv_buf,"?"); /* undefined */ |
}@+else { |
strncpy(equiv_buf,"#20000000000000",33-2*j); |
equiv_buf[33-2*j]='\0'; |
for (;j>8;j--) sprintf(equiv_buf+strlen(equiv_buf),"%02x",read_byte()); |
} |
for (j=k=read_byte();; k=read_byte(),j=(j<<7)+k) if (k>=128) break; |
/* the serial number is now $j-128$ */ |
printf(" %s = %s (%d)\n",sym_buf+1,equiv_buf,j-128); |
} |
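
The serial number at the end of each symbol is stored seven bits at a time, and the final byte is the one whose value is 128 or more. The loop above can be restated over a byte buffer as follows. This is a sketch with the hypothetical name `decode_serial`, and it returns $-1$ when no terminating byte is found, whereas the program itself simply keeps reading.

```c
#include <stddef.h>

/* Decode a serial number from b[0..n-1]; *used receives the bytes consumed.
   Returns the serial number, or -1 if no byte >= 128 terminates the sequence. */
static long decode_serial(const unsigned char *b, size_t n, size_t *used)
{
  long j = 0;
  size_t i;
  for (i = 0; i < n; i++) {
    j = (j << 7) + b[i];      /* append the next group of bits */
    if (b[i] >= 128) {        /* the last byte carries the extra 128 */
      *used = i + 1;
      return j - 128;
    }
  }
  return -1;
}
```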
|
@ @d sym_length_max 1000 |
|
@<Glob...@>= |
int stab_start; /* where the symbol table began */ |
char sym_buf[sym_length_max]; |
/* the characters on middle transitions to current node */ |
char *sym_ptr; /* the character in |sym_buf| following the current prefix */ |
char equiv_buf[20]; /* equivalent of the current symbol */ |
|
@ @<Check the |lop_end|@>= |
while (byte_count) |
if (read_byte()) fprintf(stderr,"Nonzero byte follows the symbol table!\n"); |
@.Nonzero byte follows...@> |
read_tet(); |
if (buf[0]!=mm || buf[1]!=lop_end) |
fprintf(stderr,"The symbol table isn't followed by lop_end!\n"); |
@.The symbol table isn't...@> |
else if (count!=stab_start+yz+1) |
fprintf(stderr,"YZ field at lop_end should have been %d!\n",count-stab_start-1); |
@:YZ field at lop_end...}\.{YZ field at lop\_end...@> |
else { |
if (verbose) printf("Symbol table ends at tetra %d.\n",count); |
if (fread(buf,1,1,mmo_file)) |
fprintf(stderr,"Extra bytes follow the lop_end!\n"); |
@.Extra bytes follow...@> |
} |
|
|
@* Index. |
/mmix-20081027.tar
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
mmix-20081027.tar
Property changes :
Added: svn:mime-type
## -0,0 +1 ##
+application/octet-stream
\ No newline at end of property
Index: primesf.mms
===================================================================
--- primesf.mms (nonexistent)
+++ primesf.mms (revision 270)
@@ -0,0 +1,67 @@
+% Example program ... Table of primes (floating point version)
+L IS 500 The number of primes to find
+t IS $255 Temporary storage
+n GREG
+q GREG
+r GREG
+jj GREG
+kk GREG
+pk GREG
+mm IS kk
+
+ LOC Data_Segment
+PRIME1 WYDE 2
+ LOC PRIME1+2*L
+ptop GREG @
+j0 GREG PRIME1+2-@
+BUF OCTA
+
+ LOC #100
+Main SET n,3
+ SET jj,j0
+2H STWU n,ptop,jj
+ INCL jj,2
+3H BZ jj,2F
+4H INCL n,2
+5H SET kk,j0
+fn GREG 0
+sqrtn GREG 0
+ FLOT fn,n
+ FSQRT sqrtn,fn
+6H LDWU pk,ptop,kk
+ FLOT t,pk
+ FREM r,fn,t
+ BZ r,4B
+7H FCMP t,t,sqrtn
+ BNN t,2B
+8H INCL kk,2
+ JMP 6B
+ GREG @
+Title BYTE "First Five Hundred Primes"
+NewLn BYTE #a,0
+Blanks BYTE " ",0
+2H LDA t,Title
+ TRAP 0,Fputs,StdOut
+ NEG mm,2
+3H ADD mm,mm,j0
+ LDA t,Blanks
+ TRAP 0,Fputs,StdOut
+2H LDWU pk,ptop,mm
+0H GREG #2030303030000000
+ STOU 0B,BUF
+ LDA t,BUF+4
+1H DIV pk,pk,10
+ GET r,rR
+ INCL r,'0'
+ STBU r,t,0
+ SUB t,t,1
+ PBNZ pk,1B
+ LDA t,BUF
+ TRAP 0,Fputs,StdOut
+ INCL mm,2*L/10
+ PBN mm,2B
+ LDA t,NewLn
+ TRAP 0,Fputs,StdOut
+ CMP t,mm,2*(L/10-1)
+ PBNZ t,3B
+ TRAP 0,Halt,0
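Note on primesf.mms: the search loop divides each odd candidate by the odd primes found so far and accepts the candidate once a trial divisor reaches its square root. The C sketch below follows the same control flow in double arithmetic. It is an illustration under stated assumptions, not a transcription: the name `first_primes` is hypothetical, a truncated quotient stands in for `FREM`, and the test `pk*pk >= n` stands in for the `FSQRT`/`FCMP` comparison so that no math library is needed.

```c
/* Hypothetical helper: fill p[0..count-1] with the first count primes.
   Requires count >= 2. Primes are stored as 16-bit values, like the
   WYDE table in primesf.mms. */
void first_primes(unsigned short *p, int count)
{
  int j = 2;          /* number of primes stored so far */
  unsigned n = 3;     /* current odd candidate */
  p[0] = 2;
  p[1] = 3;
  while (j < count) {
    double fn;
    int k;
    n += 2;
    fn = (double)n;
    for (k = 1; ; k++) {                       /* skip p[0]=2: n is odd */
      double pk = (double)p[k];
      double q = (double)(unsigned)(fn / pk);  /* truncated quotient */
      if (fn - q * pk == 0.0) break;           /* pk divides n: composite */
      if (pk * pk >= fn) {                     /* pk >= sqrt(n): n is prime */
        p[j++] = (unsigned short)n;
        break;
      }
    }
  }
}
```

The remainder test comes before the square-root test, as in the MMIX loop, so a perfect square such as 9 is rejected when its root is tried.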
Index: makefile.dos
===================================================================
--- makefile.dos (nonexistent)
+++ makefile.dos (revision 270)
@@ -0,0 +1,117 @@
+#
+# Makefile for MMIXware under DOS
+#
+# Comments to andreas.scherer@pobox.com
+#
+# If you're using nmake, you'll need to save the Unix makefile and
+# rename this file to makefile, as in:
+#
+# ren Makefile Makefile.unix
+# ren Makefile.dos Makefile
+#
+# Then use nmake normally.
+
+# Be sure that CWEB version 3.0 or greater is installed before proceeding!
+# In fact, CWEB 3.6 is recommended for making hardcopy or PDF documentation.
+
+# If you prefer optimization to debugging, change /Zi to something like /GB:
+MAKE = $(MAKE) /$(MAKEFLAGS)
+CFLAGS = /Zi
+
+.SUFFIXES: .dvi .tex .w .ps .pdf
+
+.tex.dvi:
+ tex $*.tex
+
+.tex.pdf:
+ pdftex $*.tex
+
+.dvi.ps:
+ dvips $* -o $*.ps
+
+.w.c:
+ if exist $*.ch ctangle $*.w $*.ch
+ if not exist $*.ch ctangle $*.w
+
+.w.tex:
+ if exist $*.ch cweave $*.w $*.ch
+ if not exist $*.ch cweave $*.w
+
+.w.obj:
+ $(MAKE) $*.c
+ $(MAKE) $*.obj
+
+.w.exe:
+ $(MAKE) $*.c
+ $(MAKE) $*.exe
+
+.w.dvi:
+ $(MAKE) $*.tex
+ $(MAKE) $*.dvi
+
+.w.ps:
+ $(MAKE) $*.dvi
+ $(MAKE) $*.ps
+
+.w.pdf:
+ $(MAKE) $*.tex
+ $(MAKE) $*.pdf
+
+WEBFILES = mmix-def.w mmixal.w "mmix-arith.w" mmix-sim.w mmix-io.w mmix-mem.w \
+ mmotype.w abstime.w mmix-doc.w "mmix-config.w" mmix-pipe.w mmmix.w
+CHANGEFILES =
+TESTFILES = *.mms silly.run silly.out *.mmconfig *.mmix
+MISCFILES = Makefile makefile.dos README mmix.mp mmix.1
+ALL = $(WEBFILES) $(TESTFILES) $(MISCFILES)
+
+basic: mmixal.exe mmix.exe
+
+doc: mmix-doc.ps mmixal.ps mmix-sim.ps abstime.ps
+
+all: mmixal.exe mmix.exe mmotype.exe mmmix.exe
+
+clean:
+ del *~
+ del *.obj
+ del *.c
+ del *.h
+ del *.tex
+ del *.log
+ del *.dvi
+ del *.toc
+ del *.idx
+ del *.scn
+ del *.ps
+ del *.pdf
+ del *.ilk
+ del *.pdb
+
+abstime.exe: abstime.obj
+ $(CC) $(CFLAGS) abstime.obj /Feabstime.exe
+
+"mmix-pipe.obj": "mmix-pipe.c" abstime.exe
+ .\abstime >abstime.h
+ $(CC) $(CFLAGS) -c mmix-pipe.c
+
+mmmix.exe: "mmix-arith.obj" "mmix-pipe.obj" "mmix-config.obj" \
+ "mmix-mem.obj" "mmix-io.obj" mmmix.c
+ $(CC) $(CFLAGS) mmmix.c \
+ "mmix-arith.obj" "mmix-pipe.obj" "mmix-config.obj" "mmix-mem.obj" \
+ "mmix-io.obj" /Femmmix.exe
+
+mmixal.exe: "mmix-arith.obj" mmixal.c
+ $(CC) $(CFLAGS) mmixal.c "mmix-arith.obj" /Femmixal.exe
+
+mmix.exe: "mmix-arith.obj" mmix-io.obj mmix-sim.c abstime.exe
+ .\abstime >abstime.h
+ $(CC) $(CFLAGS) mmix-sim.c \
+ "mmix-arith.obj" mmix-io.obj /Femmix.exe
+
+mmotype.exe: mmotype.obj
+ $(CC) $(CFLAGS) mmotype.obj /Femmotype.exe
+
+tarfile: $(ALL)
+ tar cvf /tmp/mmix.tar $(ALL)
+ gzip -9 /tmp/mmix.tar
+
+
Index: plain.mmconfig
===================================================================
--- plain.mmconfig (nonexistent)
+++ plain.mmconfig (revision 270)
@@ -0,0 +1,42 @@
+% example configuration for basic tests
+memaddresstime 3
+memreadtime 10 memwritetime 10
+membusbytes 16
+branchpredictbits 2
+branchaddressbits 1
+branchhistorybits 1
+branchdualbits 1
+memchunksmax 100
+hashprime 127
+Scache blocksize 64
+Scache setsize 512
+Scache associativity 4 pseudolru
+Scache accesstime 2
+Dcache blocksize 32
+Dcache setsize 256
+Dcache victimsize 2
+Icache blocksize 32
+Icache setsize 256
+Icache victimsize 2
+DTcache associativity 4 lru
+unit ALU1 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
+unit ALU2 00000000ffffffffffffffffffffffff0000000300000003fffffffffffffffe
+unit LSU1 00000000000000000000000000000000fffffffcfffffffc0000000000000000
+unit LSU2 00000000000000000000000000000000fffffffcfffffffc0000000000000000
+unit MUL1 000080f000000000000000000000000000000000000000000000000000000000
+unit DIV1 00000c0f00000000000000000000000000000000000000000000000000000000
+unit FPU1 7fff730000000000000000000000000000000000000000000000000000000000
+memslots 4
+renameregs 10
+reorderbuffer 20
+Dcache writeallocate 1
+Scache writeallocate 1
+Dcache writeback 1
+Scache writeback 1
+Dcache ports 2
+DTcache ports 2
+writebuffer 4
+writeholdingtime 5
+mul0 1
+mul1 2
+mul2 5
Index: hptest.mms
===================================================================
--- hptest.mms (nonexistent)
+++ hptest.mms (revision 270)
@@ -0,0 +1,18 @@
+* Register stack test program by Hans-Peter Nilsson, January 2002
+ LOC #100
+cnt GREG
+max IS 17
+msg BYTE "No bug noticed here",#a,0
+
+Main PUSHJ $16,Recurse
+ GETA $255,msg
+ TRAP 0,Fputs,StdOut
+ TRAP 0,Halt,0
+
+Recurse ADDU cnt,cnt,1
+ CMP $0,cnt,max
+ BZ $0,0F
+ GET $1,rJ
+ PUSHJ $16,Recurse
+ PUT rJ,$1
+0H POP 0,0
Index: permu-langdon.mms
===================================================================
--- permu-langdon.mms (nonexistent)
+++ permu-langdon.mms (revision 270)
@@ -0,0 +1,35 @@
+* Permutation generator a la Langdon
+N IS 6 $n$ (2, 3, ..., 15)
+t IS $255
+k IS $0
+kk IS $1
+c IS $2
+d IS $3
+a GREG 0
+ones GREG #1111111111111111&(1<<(4*N)-1)
+
+ LOC #100
+ GREG @
+ElGordo OCTA #fedcba9876543210&(1<<(4*N)-1)
+Main LDOU a,ElGordo $a\gets\.{\#...3210}$.
+ JMP 2F
+1H SRU a,a,4*(16-N)
+ OR a,a,t
+
+2H ADDU c,a,ones Trace this location to see the perm!
+
+ SRU t,a,4*(N-1)
+ SLU a,a,4*(17-N)
+ PBNZ t,1B
+ SET k,1
+3H SRU d,a,60
+ SLU a,a,4
+ CMP c,d,k
+ SLU kk,k,2
+ SLU d,d,kk
+ OR t,t,d
+ PBNZ c,1B
+ INCL k,1
+ PBNZ a,3B
+ TRAP 0,Halt,0
+
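Note on permu-langdon.mms: the program packs the permutation into the nybbles of one register and rotates prefixes with shift instructions. The cyclic-shift idea behind it can be sketched on an ordinary byte array, as below. This is a sketch of the general method under stated assumptions, not a transcription of the MMIX code: the name `langdon_all` and the output-buffer interface are hypothetical, and the visiting order is not claimed to match the MMIX program's.

```c
#include <string.h>

/* Hypothetical helper: write every permutation of the n distinct bytes
   x[0..n-1] to out (n bytes each, back to back) and return how many
   were written. out must have room for n! * n bytes.
   This sketch assumes 1 <= n <= 16 and returns 0 otherwise. */
long langdon_all(const char *x, int n, char *out)
{
  char a[16];   /* working copy of the current permutation */
  long count = 0;
  int k;
  if (n < 1 || n > 16) return 0;
  memcpy(a, x, (size_t)n);
  for (;;) {
    memcpy(out + count * n, a, (size_t)n);   /* "visit" the permutation */
    count++;
    for (k = n; k > 1; k--) {
      char first = a[0];
      memmove(a, a + 1, (size_t)(k - 1));    /* rotate a[0..k-1] left by one */
      a[k - 1] = first;
      if (a[k - 1] != x[k - 1]) break;       /* prefix not back in place */
    }
    if (k <= 1) return count;                /* every prefix returned home */
  }
}
```

Each visit is followed by a rotation of the full array; only when that rotation restores the last element does the method fall back to rotating a shorter prefix.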
Index: fftswap.mms
===================================================================
--- fftswap.mms (nonexistent)
+++ fftswap.mms (revision 270)
@@ -0,0 +1,67 @@
+% the bit-reversal portion of a 256-point Fast Fourier Transform
+
+t IS $255
+n IS 256
+pi GREG
+pj GREG
+tx GREG
+i GREG
+j GREG
+
+ LOC Data_Segment
+Data GREG @
+% Here follows 256 octabyte pairs for (real,imag) parts of complex data
+% I'm faking it with small integer numbers just for easy testing
+% But it uses long lines, so assemble with "mmixal -b 80"
+ OCTA 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+ OCTA 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+ OCTA 32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47
+ OCTA 48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63
+ OCTA 64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
+ OCTA 80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95
+ OCTA 96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111
+ OCTA 112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
+ OCTA 128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143
+ OCTA 144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159
+ OCTA 160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175
+ OCTA 176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191
+ OCTA 192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207
+ OCTA 208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223
+ OCTA 224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239
+ OCTA 240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
+ OCTA 256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271
+ OCTA 272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287
+ OCTA 288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303
+ OCTA 304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319
+ OCTA 320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335
+ OCTA 336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351
+ OCTA 352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367
+ OCTA 368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383
+ OCTA 384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399
+ OCTA 400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415
+ OCTA 416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431
+ OCTA 432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447
+ OCTA 448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463
+ OCTA 464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479
+ OCTA 480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495
+ OCTA 496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511
+
+ LOC #100
+Main SET i,n-1
+0H GREG #0102040810204080
+1H MOR j,0B,i
+ CMP t,i,j
+ PBNN t,2F jump if i>=j
+ 16ADDU pi,i,Data pi=&Data[i]
+ 16ADDU pj,j,Data pj=&Data[j]
+ LDO t,pi,0
+ LDO tx,pj,0
+ STO t,pj,0 swap Data[i].real with Data[j].real
+ STO tx,pi,0
+ LDO t,pi,8
+ LDO tx,pj,8
+ STO t,pj,8 swap Data[i].imag with Data[j].imag
+ STO tx,pi,8
+2H SUB i,i,1 i--
+ PBP i,1B repeat until i==0
+ TRAP 0,Halt,0
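Note on fftswap.mms: the loop computes, for each index i, the index j obtained by reversing the eight bits of i, and swaps the two (real, imaginary) pairs once per pair of indices. A minimal C sketch of that permutation follows. The names `bitrev8`, `cpx`, and `fft_bit_reverse` are assumptions for illustration, and the bit reversal is done with an explicit loop rather than a single `MOR` instruction.

```c
/* Reverse the low 8 bits of i. */
unsigned bitrev8(unsigned i)
{
  unsigned j = 0;
  int b;
  for (b = 0; b < 8; b++) {
    j = (j << 1) | (i & 1);   /* shift the lowest bit of i into j */
    i >>= 1;
  }
  return j;
}

/* Hypothetical record type: one complex value as two octabyte-sized parts. */
typedef struct { long re, im; } cpx;

/* Permute data[0..255] in place so that the element at index i
   trades places with the element at index bitrev8(i). */
void fft_bit_reverse(cpx *data)
{
  unsigned i;
  for (i = 255; i > 0; i--) {
    unsigned j = bitrev8(i);
    if (i < j) {              /* swap each pair exactly once */
      cpx t = data[i];
      data[i] = data[j];
      data[j] = t;
    }
  }
}
```

Swapping only when i < j is what keeps the permutation from undoing itself; indices that are their own reversal (such as 0 and 255) are left alone.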
Index: mmix-sim.w
===================================================================
--- mmix-sim.w (nonexistent)
+++ mmix-sim.w (revision 270)
@@ -0,0 +1,3424 @@
+% This file is part of the MMIXware package (c) Donald E Knuth 1999
+@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
+
+\def\title{MMIX-SIM}
+\def\MMIX{\.{MMIX}}
+\def\NNIX{\hbox{\mc NNIX}}
+\def\Hex#1{\hbox{$^{\scriptscriptstyle\#}$\tt#1}} % experimental hex constant
+\def\<#1>{\hbox{$\langle\,$#1$\,\rangle$}}\let\is=\longrightarrow
+\def\dts{\mathinner{\ldotp\ldotp}}
+\def\bull{\smallskip\textindent{$\bullet$}}
+@s xor normal @q unreserve a C++ keyword @>
+@s bool normal @q unreserve a C++ keyword @>
+
+@*Introduction. This program simulates a simplified version of the \MMIX\
+computer. Its main goal is to help people create and test \MMIX\ programs for
+{\sl The Art of Computer Programming\/} and related publications. It provides
+only a rudimentary terminal-oriented interface, but it has enough
+infrastructure to support a cool graphical user interface --- which could be
+added by a motivated reader. (Hint, hint.)
+
+\MMIX\ is simplified in the following ways:
+
+\bull
+There is no pipeline, and there are no
+caches. Thus, commands like \.{SYNC} and \.{SYNCD} and \.{PREGO} do nothing.
+
+\bull
+Simulation applies only to user programs, not to an operating system kernel.
+Thus, all addresses must be nonnegative; ``privileged'' commands such as
+\.{PUT}~\.{rK,z} or \.{RESUME}~\.1 or \.{LDVTS}~\.{x,y,z} are not allowed;
+instructions should be executed only from addresses in segment~0
+(addresses less than \Hex{2000000000000000}).
+Certain special registers remain constant: $\rm rF=0$,
+$\rm rK=\Hex{ffffffffffffffff}$,
+$\rm rQ=0$;
+$\rm rT=\Hex{8000000500000000}$,
+$\rm rTT=\Hex{8000000600000000}$,
+$\rm rV=\Hex{369c200400000000}$.
+
+\bull
+No trap interrupts are implemented, except for a few special cases of \.{TRAP}
+that provide rudimentary input-output.
+@^interrupts@>
+
+\bull
+All instructions take a fixed amount of time, given by the rough estimates
+stated in the \MMIX\ documentation. For example, \.{MUL} takes $10\upsilon$,
+\.{LDB} takes $\mu+\upsilon\mkern1mu$; all times are expressed in terms of
+$\mu$ and~$\upsilon$, ``mems'' and ``oops.'' The clock register~rC increases by
+@^mems@>
+@^oops@>
+$2^{32}$ for each~$\mu$ and 1~for each~$\upsilon$. But the interval
+counter~rI decreases by~1 for each instruction, and the usage
+counter~rU increases by~1 for each instruction.
+@^rC@>
+@^rI@>
+@^rU@>
+
+@ To run this simulator, assuming \UNIX/ conventions, you say
+`\.{mmix} \<options> \.{progfile} \.{args...}',
+where \.{progfile} is an output of the \.{MMIXAL} assembler,
+\.{args...} is a sequence of optional command-line arguments passed
+to the simulated program, and \<options> is any subset of the following:
+@^command line arguments@>
+
+\bull \.{-t<n>}\quad Trace each instruction the first $n$ times it
+is executed. (The notation \.{<n>} in this option, and in several
+other options and interactive commands below, stands for a decimal integer.)
+
+\bull \.{-e<x>}\quad Trace each instruction that raises an arithmetic
+exception belonging to the given bit pattern. (The notation \.{<x>} in this
+option, and in several other commands below, stands for a hexadecimal integer.)
+The exception bits are DVWIOUZX as they appear in rA, namely
+\Hex{80} for~D (integer divide check), \Hex{40} for~V (integer overflow),
+\dots, \Hex{01} for~X (floating inexact). The option \.{-e} by itself
+is equivalent to \.{-eff}, tracing all eight exceptions.
+
+\bull \.{-r}\quad Trace details of the register stack. This option
+shows all the ``hidden'' loads and stores that occur when octabytes are
+written from the ring of local registers into memory, or read from memory into
+that ring. It also shows the full details of \.{SAVE} and \.{UNSAVE}
+operations.
+
+\bull \.{-l<n>}\quad List the source line corresponding to each traced
+instruction, filling gaps of length $n$ or less.
+For example, if one instruction came from line 10 of the source file
+and the next instruction to be traced came from line 12, line 11 would
+be shown also, provided that $n\ge1$. If \.{<n>} is omitted it is
+assumed to be~3.
+
+\bull \.{-s}\quad Show statistics of running time with each traced instruction.
+
+\bull \.{-P}\quad Show the program profile (that is, the frequency counts
+of each instruction that was executed) when the simulation ends.
+
+\bull \.{-L<n>}\quad List the source lines corresponding to each instruction
+that appears in the program profile, filling gaps of length $n$ or less.
+This option implies \.{-P}. If \.{<n>} is omitted it is assumed to be~3.
+
+\bull \.{-v}\quad Be verbose: \kern-2.5ptTurn on all options.
+(More precisely, the \.{-v} option is
+shorthand for \.{-t9999999999}~\.{-e} \.{-r} \.{-s} \.{-l10}~\.{-L10}.)
+
+\bull \.{-q}\quad Be quiet: Cancel all previously specified options.
+
+\bull \.{-i}\quad Go into interactive mode before starting the simulation.
+
+\bull \.{-I}\quad Go into interactive mode when the simulated program
+halts or pauses for a breakpoint.
+
+\bull \.{-b<n>}\quad Set the buffer size of source lines to $\max(72,n)$.
+
+\bull \.{-c<n>}\quad Set the capacity of the local register ring
+to $\max(256,n)$; this number must be a power of~2.
+
+\bull \.{-f<filename>}\quad Use the named file for standard input to the
+simulated program. This option should be used whenever the simulator
+is not being used interactively, because the simulator will not recognize
+end of file when standard input has been defined in any other way.
+
+\bull \.{-D<filename>}\quad Prepare the named file for use by other
+simulators, instead of actually doing a simulation.
+
+\bull \.{-?}\quad Print the ``\.{Usage}'' message, which summarizes
+the command-line options.
+
+\smallskip\noindent
+The author recommends \.{-t2} \.{-l} \.{-L} for initial offline debugging.
+
+While the program is being simulated, an {\it interrupt\/}
+signal (usually control-C) will cause the simulator to
+@^interrupts@>
+break and go into interactive mode after tracing the current instruction,
+even if \.{-i} and \.{-I} were not specified on the command line.
+
+@ In interactive mode, the user is prompted `\.{mmix>}' and a variety of
+@.mmix>@>
+commands can be typed online. Any command-line option can be given
+in response to such a prompt (including the `\.-' that begins the option),
+and the following operations are also available:
+
+\bull Simply typing \<return> or \.n\<return> to the \.{mmix>} prompt causes
+one \MMIX\ instruction to be executed and traced; then the user is prompted
+again.
+
+\bull \.c continues simulation until the program halts or reaches
+a breakpoint. (Actually the command is `\.c\<return>', but we won't
+bother to mention the \<return> in the following description.)
+
+\bull \.q quits (terminates the simulation), after printing the
+profile (if it was requested) and the final statistics.
+
+\bull \.s prints out the current statistics (the clock times and the
+current instruction location). We have already discussed the \.{-s} option
+on the command line, which
+causes these statistics to be printed automatically;
+but a lot of statistics can fill up a lot of file space, so users may
+prefer to see the statistics only on demand.
+
+\bull \.{l<n><t>}, \.{g<n><t>}, \.{\$<n><t>}, \.{rA<t>}, \.{rB<t>}, \dots,
+\.{rZZ<t>}, and \.{M<x><t>} will show the current value of a local register,
+global register, dynamically numbered register, special register, or memory
+location. Here \.{<t>} specifies the type of value to be displayed;
+if \.{<t>} is `\.!', the value will be given in decimal notation;
+if \.{<t>} is `\..' it will be given in floating point notation;
+if \.{<t>} is `\.\#' it will be given in hexadecimal, and
+if \.{<t>} is `\."' it will be given as a string of eight one-byte
+characters. Just typing \.{<t>} by itself will repeat the most recently shown
+value, perhaps in another format; for example, the command `\.{l10\#}'
+will show local register 10 in hexadecimal notation, then the command
+`\.!' will show it in decimal and `\..' will show it as a floating point
+number. If \.{<t>} is empty, the previous type will be repeated;
+the default type is decimal. Register \.{rA} is equivalent to \.{g22},
+according to the numbering used in \.{GET} and \.{PUT} commands.
+
+The `\.{<t>}' in any of these commands can also have the form
+`\.{=<value>}', where the value is a decimal or floating point or
+hexadecimal or string constant. (The syntax rules for floating point constants
+appear in {\mc MMIX-ARITH}. A string constant is treated as in the
+\.{BYTE} command of \.{MMIXAL}, but padded at the left with zeros if
+fewer than eight characters are specified.) This assigns a new value
+before displaying it. For example, `\.{l10=.1e3}'
+sets local register 10 equal to 100; `\.{g250="ABCD",\#a}' sets global
+register 250 equal to \Hex{000000414243440a}; `\.{M1000=-Inf}' sets
+M$[\Hex{1000}]_8=\Hex{fff0000000000000}$, the representation of $-\infty$.
+Special registers other than~rI cannot be set to values disallowed by~\.{PUT}.
+Marginal registers cannot be set to nonzero values.
+
+The command `\.{rI=250}' sets the interval counter to 250; this will
+cause a break in simulation after 250 instructions have been executed.
+
+\bull \.{+<n><t>} shows the next $n$ octabytes following the one
+most recently shown, in format \.{<t>}. For example, after `\.{l10\#}'
+a subsequent `\.{+30}' will show \.{l11}, \.{l12}, \dots, \.{l40} in
+hexadecimal notation. After `\.{g200=3}' a subsequent `\.{+30}' will
+set \.{g201}, \.{g202}, \dots, \.{g230} equal to~3, but a subsequent
+`\.{+30!}' would merely display \.{g201} through~\.{g230} in decimal
+notation. Memory addresses will advance by~8 instead of by~1. If \.{<n>}
+is empty, the default value $n=1$ is used.
+
+\bull \.{@@<x>} sets the address of the next tetrabyte to be
+simulated, sort of like a \.{GO} command.
+
+\bull \.{t<x>} says that the instruction in tetrabyte location $x$ should
+always be traced, regardless of its frequency count.
+
+\bull \.{u<x>} undoes the effect of \.{t<x>}.
+
+\bull \.{b[rwx]<x>} sets breakpoints at tetrabyte $x$; here \.{[rwx]}
+stands for any subset of the letters \.r, \.w, and/or~\.x, meaning to
+break when the tetrabyte is read, written, and/or executed. For example,
+`\.{bx1000}' causes a break in the simulation just after the tetrabyte
+in \Hex{1000} is executed; `\.{b1000}' undoes this breakpoint;
+`\.{brwx1000}' causes a break just after any simulated instruction loads,
+stores, or appears in tetrabyte number \Hex{1000}.
+
+\bull \.{T}, \.{D}, \.{P}, \.{S} sets the ``current segment'' to
+\.{Text\_Segment}, \.{Data\_Segment}, \.{Pool\_Segment}, or
+\.{Stack\_Segment}, respectively, namely to \Hex{0}, \Hex{2000000000000000},
+\Hex{4000000000000000}, or \Hex{6000000000000000}. The current segment,
+initially \Hex{0}, is added to all
+memory addresses in \.{M}, \.{@@}, \.{t}, \.{u}, and \.{b} commands.
+@:Text_Segment}\.{Text\_Segment@>
+@:Data_Segment}\.{Data\_Segment@>
+@:Pool_Segment}\.{Pool\_Segment@>
+@:Stack_Segment}\.{Stack\_Segment@>
+
+\bull \.{B} lists all current breakpoints and tracepoints.
+
+\bull \.{i<filename>} reads a sequence of interactive commands from the
+specified file, one command per line, ignoring blank lines. This feature
+can be used to set many breakpoints or to display a number of key
+registers, etc. Included lines that begin with \.\% or \.i are ignored;
+therefore an included file cannot include {\it another\/} file.
+Included lines that begin with a blank space are reproduced in the standard
+output, otherwise ignored.
+
+\bull \.h (help) reminds the user of the available interactive commands.
+
+@* Rudimentary I/O.
+Input and output are provided by the following ten primitive system calls:
+@^I/O@>
+@^input/output@>
+
+\bull \.{Fopen}|(handle,name,mode)|. Here |handle| is a
+one-byte integer, |name| is a string, and |mode| is one of the
+values \.{TextRead}, \.{TextWrite}, \.{BinaryRead}, \.{BinaryWrite},
+\.{BinaryReadWrite}. An \.{Fopen} call associates |handle| with the
+external file called |name| and prepares to do input and/or output
+on that file. It returns 0 if the file was opened successfully; otherwise
+returns the value~$-1$. If |mode| is \.{TextWrite}, \.{BinaryWrite}, or
+\.{BinaryReadWrite},
+any previous contents of the named file are discarded. If |mode| is
+\.{TextRead} or \.{TextWrite}, the file consists of ``lines'' terminated
+by ``newline'' characters, and it is said to be a text file; otherwise
+the file consists of uninterpreted bytes, and it is said to be a binary file.
+@.Fopen@>
+@.TextRead@>
+@.TextWrite@>
+@.BinaryRead@>
+@.BinaryWrite@>
+@.BinaryReadWrite@>
+
+Text files and binary files are essentially equivalent in cases
+where this simulator is hosted by an operating system derived from \UNIX/;
+in such cases files can be written as text and read as binary or vice versa.
+But with other operating systems, text files and binary files often have
+quite different representations, and certain characters with byte
+codes less than~|' '| are forbidden in text. Within any \MMIX\ program,
+the newline character has byte code $\Hex{0a}=10$.
+
+At the beginning of a program three handles have already been opened: The
+``standard input'' file \.{StdIn} (handle~0) has mode \.{TextRead}, the
+``standard output'' file \.{StdOut} (handle~1) has mode \.{TextWrite}, and the
+``standard error'' file \.{StdErr} (handle~2) also has mode \.{TextWrite}.
+@.StdIn@>
+@.StdOut@>
+@.StdErr@>
+When this simulator is being run interactively, lines of standard input
+should be typed following a prompt that says `\.{StdIn>\ }', unless the \.{-f}
+option has been used.
+The standard output and standard error files of the simulated program
+are intermixed with the output of the simulator~itself.
+
+The input/output operations supported by this simulator can perhaps be
+understood most easily with reference to the standard library \.{stdio}
+that comes with the \CEE/ language, because the conventions of~\CEE/
+have been explained in hundreds of books. If we declare an array
+|FILE *file[256]| and set |file[0]=stdin|, |file[1]=stdout|, and
+|file[2]=stderr|, then the simulated system call \.{Fopen}|(handle,name,mode)|
+is essentially equivalent to the \CEE/ expression
+$$\displaylines{
+\hskip5em\hbox{(|file[handle]|?
+ |(file[handle]=freopen(name,mode_string[mode],file[handle]))|:}\hfill\cr
+\hfill\hbox{|(file[handle]=fopen(name,mode_string[mode]))|)? 0: $-1$},%
+ \hskip5em\cr}$$
+if we set |mode_string|[\.{TextRead}]~=~|"r"|,
+|mode_string|[\.{TextWrite}]~=~|"w"|,
+|mode_string|[\.{BinaryRead}]~=~|"rb"|,
+|mode_string|[\.{BinaryWrite}]~=~|"wb"|, and
+|mode_string|[\.{BinaryReadWrite}]~=~|"wb+"|.
+
+\bull \.{Fclose}|(handle)|. If the given file handle has been opened, it is
+closed---no longer associated with any file. Again the result is 0 if
+successful, or $-1$ if the file was already closed or unclosable.
+The \CEE/ equivalent is
+$$\hbox{|fclose(file[handle])? -1: 0|}$$
+with the additional side effect of setting |file[handle]=NULL|.
+
+\bull \.{Fread}|(handle,buffer,size)|.
+The file handle should have been opened with mode \.{TextRead},
+\.{BinaryRead}, or \.{BinaryReadWrite}.
+@.Fread@>
+The next |size| characters are read into \MMIX's memory starting at address
+|buffer|. If an error occurs, the value |-1-size| is returned;
+otherwise, if the end of file does not intervene, 0~is returned;
+otherwise the negative value |n-size| is returned, where |n|~is the number of
+characters successfully read and stored. The statement
+$$\hbox{|fread(buffer,1,size,file[handle])-size|}$$
+has the equivalent effect in \CEE/, in the absence of file errors.
+
+\bull \.{Fgets}|(handle,buffer,size)|.
+The file handle should have been opened with mode \.{TextRead},
+\.{BinaryRead}, or \.{BinaryReadWrite}.
+@.Fgets@>
+Characters are read into \MMIX's memory starting at address |buffer|, until
+either |size-1| characters have been read and stored or a newline character has
+been read and stored; the next byte in memory is then set to zero.
+If an error or end of file occurs before reading is complete, the memory
+contents are undefined and the value $-1$ is returned; otherwise
+the number of characters successfully read and stored is returned.
+The equivalent in \CEE/ is
+$$\hbox{|fgets(buffer,size,file[handle])? strlen(buffer): -1|}$$
+if we assume that no null characters were read in; null characters may,
+however, precede a newline, and they are counted just like other characters.
+
+\bull \.{Fgetws}|(handle,buffer,size)|.
+@.Fgetws@>
+This command is the same as \.{Fgets}, except that it applies to wyde
+characters instead of one-byte characters. Up to |size-1| wyde
+characters are read; a wyde newline is $\Hex{000a}$. The \CEE/~version,
+using conventions of the ISO multibyte string extension (MSE), is
+@^MSE@>
+approximately
+$$\hbox{|fgetws(buffer,size,file[handle])? wcslen(buffer): -1|}$$
+where |buffer| now has type |wchar_t*|.
+
+\bull \.{Fwrite}|(handle,buffer,size)|.
+The file handle should have been opened with one of the modes \.{TextWrite},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Fwrite@>
+The next |size| characters are written from \MMIX's memory starting at address
+|buffer|. If no error occurs, 0~is returned;
+otherwise the negative value |n-size| is returned, where |n|~is the number of
+characters successfully written. The statement
+$$\hbox{|fwrite(buffer,1,size,file[handle])-size|}$$
+together with |fflush(file[handle])| has the equivalent effect in \CEE/.
+
+\bull \.{Fputs}|(handle,string)|.
+The file handle should have been opened with mode \.{TextWrite},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Fputs@>
+One-byte characters are written from \MMIX's memory to the file, starting
+at address |string|, up to but not including the first byte equal to~zero.
+The number of bytes written is returned, or $-1$ on error.
+The \CEE/ version is
+$$\hbox{|fputs(string,file[handle])>=0? strlen(string): -1|,}$$
+together with |fflush(file[handle])|.
+
+\bull \.{Fputws}|(handle,string)|.
+The file handle should have been opened with mode \.{TextWrite},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Fputws@>
+Wyde characters are written from \MMIX's memory to the file, starting
+at address |string|, up to but not including the first wyde equal to~zero.
+The number of wydes written is returned, or $-1$ on error.
+The \CEE/+MSE version is
+$$\hbox{|fputws(string,file[handle])>=0? wcslen(string): -1|}$$
+together with |fflush(file[handle])|, where |string| now has type |wchar_t*|.
+
+\bull \.{Fseek}|(handle,offset)|.
+The file handle should have been opened with mode \.{BinaryRead},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Fseek@>
+This operation causes the next input or output operation to begin at
+|offset| bytes from the beginning of the file, if |offset>=0|, or at
+|-offset-1| bytes before the end of the file, if |offset<0|. (For
+example, |offset=0| ``rewinds'' the file to its very beginning;
+|offset=-1| moves forward all the way to the end.) The result is 0
+if successful, or $-1$ if the stated positioning could not be done.
+The \CEE/ version is
+$$\hbox{|fseek(file[handle],@,offset<0? offset+1: offset,@,
+ offset<0? SEEK_END: SEEK_SET)|? $-1$: 0.}$$
+If a file in mode \.{BinaryReadWrite} is used for both reading and writing,
+an \.{Fseek} command must be given when switching from input to output
+or from output to input.
+
+\bull \.{Ftell}|(handle)|.
+The file handle should have been opened with mode \.{BinaryRead},
+\.{BinaryWrite}, or \.{BinaryReadWrite}.
+@.Ftell@>
+This operation returns the current file position, measured in bytes
+from the beginning, or $-1$ if an error has occurred. In this case the
+\CEE/ function
+$$\hbox{|ftell(file[handle])|}$$
+has exactly the same meaning.
+
+\smallskip
+Although these ten operations are quite primitive, they provide
+the necessary functionality for extremely complex input/output behavior.
+For example, every function in the \.{stdio} library of \CEE/,
+with the exception of the two administrative operations \\{remove} and
+\\{rename}, can be implemented as a subroutine in terms of the six basic
+operations \.{Fopen}, \.{Fclose}, \.{Fread}, \.{Fwrite}, \.{Fseek}, and
+\.{Ftell}.
+
+Notice that the \MMIX\ function calls are much more consistent than
+those in the \CEE/ library. The first argument is always a handle;
+the second, if present, is always an address; the third, if present,
+is always a size. {\it The result returned is always nonnegative if the
+operation was successful, negative if an anomaly arose.} These common
+features make the functions reasonably easy to remember.
+
+@ The ten input/output operations of the previous section are invoked by
+\.{TRAP} commands with $\rm X=0$, $\rm Y=\.{Fopen}$ or \.{Fclose} or \dots~or
+\.{Ftell}, and $\rm Z=\.{Handle}$. If~there are two arguments, the
+second argument is placed in \$255. If there are three arguments,
+the address of the second is placed in~\$255; the second argument
+is M$[\$255]_8$ and the third argument is M$[\$255+8]_8$. The returned
+value will be in \$255 when the system call is finished. (See the
+example below.)
+
+@ The user program starts at symbolic location \.{Main}. At this time
+@.Main@>
+@:Pool_Segment}\.{Pool\_Segment@>
+the global registers are initialized according to the \.{GREG}
+statements in the \.{MMIXAL} program, and \$255 is set to the
+numeric equivalent of~\.{Main}. Local register~\$0 is
+initially set to the number of {\it command-line arguments\/}; and
+@^command line arguments@>
+local register~\$1 points to the first such argument, which
+is always a pointer to the program name. Each command-line argument is a
+pointer to a string; the last such pointer is M$[\$0\ll3+\$1]_8$, and
+M$[\$0\ll3+\$1+8]_8$ is zero. (Register~\$1 will point to an octabyte in
+\.{Pool\_Segment}, and the command-line strings will be in that segment
+too.) Location M[\.{Pool\_Segment}] will be the address of the first
+unused octabyte of the pool segment.
+
+Registers rA, rB, rD, rE, rF, rH, rI, rJ, rM, rP, rQ, and rR
+are initially zero, and $\rm rL=2$.
+
+A subroutine library loaded with the user program might need to initialize
+itself. If an instruction has been loaded into tetrabyte M$[\Hex{90}]_4$,
+the simulator actually begins execution at \Hex{90} instead of at~\.{Main};
+in this case \$255 holds the location of~\.{Main}.
+@^subroutine library initialization@>
+@^initialization of a user program@>
+(The routine at \Hex{90} can pass control to \.{Main} without increasing~rL,
+if it starts with the slightly tricky sequence
+$$\.{PUT rW, \$255;{ } PUT rB, \$255;{ } SETML \$255,\#F700;{ } % PUTI rB,0!
+ PUT rX,\$255}$$
+and eventually says \.{RESUME}; this \.{RESUME} command will restore
+\$255 and~rB. But the user program should {\it not\/} really count on
+the fact that rL is initially~2.)
+
+@ The main program ends when \MMIX\ executes the system
+call \.{TRAP}~\.{0}, which is often symbolically written
+`\.{TRAP}~\.{0,Halt,0}' to make its intention clear. The contents
+of \$255 at that time are considered to be the value ``returned''
+by the main program, as in the |exit| statement of~\CEE/; a nonzero
+value indicates an anomalous exit. All open files are closed
+@.Halt@>
+when the program ends.
+
+@ Here, for example, is a complete program that copies a text file
+to the standard output, given the name of the file to be copied.
+It includes all necessary error checking.
+\vskip-14pt
+$$\baselineskip=10pt
+\obeyspaces\halign{\qquad\.{#}\hfil\cr
+* SAMPLE PROGRAM: COPY A GIVEN FILE TO STANDARD OUTPUT\cr
+\noalign{\smallskip}
+t IS \$255\cr
+argc IS \$0\cr
+argv IS \$1\cr
+s IS \$2\cr
+Buf\_Size IS 1000\cr
+{} LOC Data\_Segment\cr
+Buffer LOC @@+Buf\_Size\cr
+{} GREG @@\cr
+Arg0 OCTA 0,TextRead\cr
+Arg1 OCTA Buffer,Buf\_Size\cr
+\noalign{\smallskip}
+{} LOC \#200 main(argc,argv) \{\cr
+Main CMP t,argc,2 if (argc==2) goto openit\cr
+{} PBZ t,OpenIt\cr
+{} GETA t,1F fputs("Usage: ",stderr)\cr
+{} TRAP 0,Fputs,StdErr\cr
+{} LDOU t,argv,0 fputs(argv[0],stderr)\cr
+{} TRAP 0,Fputs,StdErr\cr
+{} GETA t,2F fputs(" filename\\n",stderr)\cr
+Quit TRAP 0,Fputs,StdErr \cr
+{} NEG t,0,1 quit: exit(-1)\cr
+{} TRAP 0,Halt,0\cr
+1H BYTE "Usage: ",0\cr
+{} LOC (@@+3)\&-4 align to tetrabyte\cr
+2H BYTE " filename",\#a,0\cr
+\noalign{\smallskip}
+OpenIt LDOU s,argv,8 openit: s=argv[1]\cr
+{} STOU s,Arg0\cr
+{} LDA t,Arg0 fopen(argv[1],"r",file[3])\cr
+{} TRAP 0,Fopen,3\cr
+{} PBNN t,CopyIt if (no error) goto copyit\cr
+{} GETA t,1F fputs("Can't open file ",stderr)\cr
+{} TRAP 0,Fputs,StdErr\cr
+{} SET t,s fputs(argv[1],stderr)\cr
+{} TRAP 0,Fputs,StdErr\cr
+{} GETA t,2F fputs("!\\n",stderr)\cr
+{} JMP Quit goto quit\cr
+1H BYTE "Can't open file ",0\cr
+{} LOC (@@+3)\&-4 align to tetrabyte\cr
+2H BYTE "!",\#a,0\cr
+\noalign{\smallskip}
+CopyIt LDA t,Arg1 copyit:\cr
+{} TRAP 0,Fread,3 items=fread(buffer,1,buf\_size,file[3])\cr
+{} BN t,EndIt if (items < buf\_size) goto endit\cr
+{} LDA t,Arg1 items=fwrite(buffer,1,buf\_size,stdout)\cr
+{} TRAP 0,Fwrite,StdOut\cr
+{} PBNN t,CopyIt if (items >= buf\_size) goto copyit\cr
+Trouble GETA t,1F trouble: fputs("Trouble w...!",stderr)\cr
+{} JMP Quit goto quit\cr
+1H BYTE "Trouble writing StdOut!",\#a,0\cr
+\noalign{\smallskip}
+EndIt INCL t,Buf\_Size\cr
+{} BN t,ReadErr if (ferror(file[3])) goto readerr\cr
+{} STO t,Arg1+8\cr
+{} LDA t,Arg1 n=fwrite(buffer,1,items,stdout)\cr
+{} TRAP 0,Fwrite,StdOut\cr
+{} BN t,Trouble if (n < items) goto trouble\cr
+{} TRAP 0,Halt,0 exit(0)\cr
+ReadErr GETA t,1F readerr: fputs("Trouble r...!",stderr)\cr
+{} JMP Quit goto quit \}\cr
+1H BYTE "Trouble reading!",\#a,0\cr
+}$$
+
+@* Basics. To get started, we define a type that provides semantic sugar.
+
+@=
+typedef enum {@!false,@!true}@+@!bool;
+
+@ This program for the 64-bit \MMIX\ architecture is based on 32-bit integer
+arithmetic, because nearly every computer available to the author at the time
+of writing (1999) was limited in that way. It uses subroutines
+from the {\mc MMIX-ARITH} module, assuming only that type \&{tetra}
+represents unsigned 32-bit integers. The definition of \&{tetra}
+given here should be changed, if necessary, to agree with the
+definition in that module.
+@^system dependencies@>
+
+@=
+typedef unsigned int tetra;
+ /* for systems conforming to the LP-64 data model */
+typedef struct {tetra h,l;} octa; /* two tetrabytes make one octabyte */
+typedef unsigned char byte; /* a monobyte */
+
+@ We declare subroutines twice, once with a prototype and once
+with the old-style~\CEE/ conventions. The following hack makes
+this work with new compilers as well as the old standbys.
+
+@=
+#ifdef __STDC__
+#define ARGS(list) list
+#else
+#define ARGS(list) ()
+#endif
+
+@ @=
+void print_hex @,@,@[ARGS((octa))@];@+@t}\6{@>
+void print_hex(o)
+ octa o;
+{
+ if (o.h) printf("%x%08x",o.h,o.l);
+ else printf("%x",o.l);
+}
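
The formatting rule of |print_hex| can be tried in isolation with a buffer-writing variant. This is only a sketch: the name |format_hex| and its buffer parameters are hypothetical, and |tetra| is assumed to be a 32-bit |unsigned int|.

```c
#include <stdio.h>
#include <string.h>

typedef unsigned int tetra;            /* assumed to be 32 bits wide */
typedef struct { tetra h,l; } octa;    /* two tetrabytes make one octabyte */

/* hypothetical variant of print_hex that writes into buf instead of stdout */
int format_hex(char *buf, size_t size, octa o)
{
  /* the low half is zero-padded only when a nonzero high half precedes it */
  if (o.h) return snprintf(buf,size,"%x%08x",o.h,o.l);
  return snprintf(buf,size,"%x",o.l);
}
```

Leading zeros are suppressed, so an octabyte whose high half is zero prints as a plain tetrabyte.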
+
+@ Most of the subroutines in {\mc MMIX-ARITH} return an octabyte as
+a function of two octabytes; for example, |oplus(y,z)| returns the
+sum of octabytes |y| and~|z|. Division inputs the high
+half of a dividend in the global variable~|aux| and returns
+the remainder in~|aux|.
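
As an illustration of arithmetic on two-tetrabyte octabytes, here is one way an unsigned sum could be formed. It is a sketch under the assumption that |tetra| is a 32-bit |unsigned int|; the name |oplus_sketch| is hypothetical, and the real |oplus| in {\mc MMIX-ARITH} may be written differently.

```c
typedef unsigned int tetra;            /* assumed to be 32 bits wide */
typedef struct { tetra h,l; } octa;    /* high and low halves */

/* hypothetical stand-in for oplus: unsigned y+z modulo 2^64 */
octa oplus_sketch(octa y, octa z)
{
  octa x;
  x.l=(y.l+z.l)&0xffffffffu;
  /* the low sum wrapped around exactly when it is smaller than an addend */
  x.h=(y.h+z.h+(x.l<y.l? 1u: 0u))&0xffffffffu;
  return x;
}
```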
+
+@=
+extern octa zero_octa; /* |zero_octa.h=zero_octa.l=0| */
+extern octa neg_one; /* |neg_one.h=neg_one.l=-1| */
+extern octa aux,val; /* auxiliary data */
+extern bool overflow; /* flag set by signed multiplication and division */
+extern int exceptions; /* bits set by floating point operations */
+extern int cur_round; /* the current rounding mode */
+extern char *next_char; /* where a scanned constant ended */
+extern octa oplus @,@,@[ARGS((octa y,octa z))@];
+ /* unsigned $y+z$ */
+extern octa ominus @,@,@[ARGS((octa y,octa z))@];
+ /* unsigned $y-z$ */
+extern octa incr @,@,@[ARGS((octa y,int delta))@];
+ /* unsigned $y+\delta$ ($\delta$ is signed) */
+extern octa oand @,@,@[ARGS((octa y,octa z))@];
+ /* $y\land z$ */
+extern octa shift_left @,@,@[ARGS((octa y,int s))@];
+ /* $y\LL s$, $0\le s\le64$ */
+extern octa shift_right @,@,@[ARGS((octa y,int s,int uns))@];
+ /* $y\GG s$, signed if |!uns| */
+extern octa omult @,@,@[ARGS((octa y,octa z))@];
+ /* unsigned $(|aux|,x)=y\times z$ */
+extern octa signed_omult @,@,@[ARGS((octa y,octa z))@];
+ /* signed $x=y\times z$ */
+extern octa odiv @,@,@[ARGS((octa x,octa y,octa z))@];
+ /* unsigned $(x,y)/z$; $|aux|=(x,y)\bmod z$ */
+extern octa signed_odiv @,@,@[ARGS((octa y,octa z))@];
+ /* signed $x=y/z$ */
+extern int count_bits @,@,@[ARGS((tetra z))@];
+ /* $x=\nu(z)$ */
+extern tetra byte_diff @,@,@[ARGS((tetra y,tetra z))@];
+ /* half of \.{BDIF} */
+extern tetra wyde_diff @,@,@[ARGS((tetra y,tetra z))@];
+ /* half of \.{WDIF} */
+extern octa bool_mult @,@,@[ARGS((octa y,octa z,bool xor))@];
+ /* \.{MOR} or \.{MXOR} */
+extern octa load_sf @,@,@[ARGS((tetra z))@];
+ /* load short float */
+extern tetra store_sf @,@,@[ARGS((octa x))@];
+ /* store short float */
+extern octa fplus @,@,@[ARGS((octa y,octa z))@];
+ /* floating point $x=y\oplus z$ */
+extern octa fmult @,@,@[ARGS((octa y ,octa z))@];
+ /* floating point $x=y\otimes z$ */
+extern octa fdivide @,@,@[ARGS((octa y,octa z))@];
+ /* floating point $x=y\oslash z$ */
+extern octa froot @,@,@[ARGS((octa,int))@];
+ /* floating point $x=\sqrt z$ */
+extern octa fremstep @,@,@[ARGS((octa y,octa z,int delta))@];
+ /* floating point $x\,{\rm rem}\,z=y\,{\rm rem}\,z$ */
+extern octa fintegerize @,@,@[ARGS((octa z,int mode))@];
+ /* floating point $x={\rm round}(z)$ */
+extern int fcomp @,@,@[ARGS((octa y,octa z))@];
+ /* $-1$, 0, 1, or 2 if $y<z$, $y=z$, $y>z$, $y\parallel z$ */
+extern int fepscomp @,@,@[ARGS((octa y,octa z,octa eps,int sim))@];
+ /* $x=|sim|?\ [y\sim z\ (\epsilon)]:\ [y\approx z\ (\epsilon)]$ */
+extern octa floatit @,@,@[ARGS((octa z,int mode,int unsgnd,int shrt))@];
+ /* fix to float */
+extern octa fixit @,@,@[ARGS((octa z,int mode))@];
+ /* float to fix */
+extern void print_float @,@,@[ARGS((octa z))@];
+ /* print octabyte as floating decimal */
+extern int scan_const @,@,@[ARGS((char* buf))@];
+ /* |val| = floating or integer constant; returns the type */
+
+@ Here's a quick check to see if arithmetic is in trouble.
+
+@d panic(m) {@+fprintf(stderr,"Panic: %s!\n",m);@+exit(-2);@+}
+@=
+if (shift_left(neg_one,1).h!=0xffffffff)
+ panic("Incorrect implementation of type tetra");
+@.Incorrect implementation...@>
+
+@ Binary-to-decimal conversion is used when we want to see an octabyte
+as a signed integer. The identity $\lfloor(an+b)/10\rfloor=
+\lfloor a/10\rfloor n+\lfloor((a\bmod 10)n+b)/10\rfloor$ is helpful here.
+
+@d sign_bit ((unsigned)0x80000000)
+
+@=
+void print_int @,@,@[ARGS((octa))@];@+@t}\6{@>
+void print_int(o)
+ octa o;
+{
+ register tetra hi=o.h, lo=o.l, r, t;
+ register int j;
+ char dig[20];
+ if (lo==0 && hi==0) printf("0");
+ else {
+ if (hi&sign_bit) {
+ printf("-");
+ if (lo==0) hi=-hi;
+ else lo=-lo, hi=~hi;
+ }
+ for (j=0;hi;j++) { /* 64-bit division by 10 */
+ r=((hi%10)<<16)+(lo>>16);
+ hi=hi/10;
+ t=((r%10)<<16)+(lo&0xffff);
+ lo=((r/10)<<16)+(t/10);
+ dig[j]=t%10;
+ }
+ for (;lo;j++) {
+ dig[j]=lo%10;
+ lo=lo/10;
+ }
+ for (j--;j>=0;j--) printf("%c",dig[j]+'0');
+ }
+}
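
The division-by-10 loop can be exercised on its own by collecting the digits in a string. The following sketch mirrors the steps of |print_int|; the name |octa_to_decimal| and its output buffer are hypothetical, and |tetra| is assumed to be a 32-bit |unsigned int|.

```c
#include <string.h>

typedef unsigned int tetra;            /* assumed to be 32 bits wide */
typedef struct { tetra h,l; } octa;

/* hypothetical variant of print_int; out must hold at least 21 chars */
void octa_to_decimal(octa o, char *out)
{
  tetra hi=o.h, lo=o.l, r, t;
  char dig[20];
  int j=0;
  if (hi==0 && lo==0) { strcpy(out,"0"); return; }
  if (hi&0x80000000u) {                /* negative: emit sign, then negate */
    *out++='-';
    if (lo==0) hi=0u-hi;
    else lo=0u-lo, hi=~hi;
  }
  for (;hi;j++) {                      /* 64-bit division by 10, 16 bits at a time */
    r=((hi%10)<<16)+(lo>>16);
    hi=hi/10;
    t=((r%10)<<16)+(lo&0xffff);
    lo=((r/10)<<16)+(t/10);
    dig[j]=(char)(t%10);
  }
  for (;lo;j++) {                      /* ordinary 32-bit division finishes the job */
    dig[j]=(char)(lo%10);
    lo=lo/10;
  }
  while (j>0) *out++=(char)(dig[--j]+'0');   /* digits were produced in reverse */
  *out='\0';
}
```

Each pass of the first loop applies the identity quoted above twice with $n=2^{16}$, so no intermediate value exceeds 32 bits.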
+
+@* Simulated memory. Chunks of simulated memory, 2048 bytes each,
+are kept in a tree structure organized as a {\it treap},
+following ideas of Vuillemin, Aragon, and Seidel
+@^Vuillemin, Jean Etienne@>
+@^Aragon, Cecilia Rodriguez@>
+@^Seidel, Raimund@>
+[{\sl Communications of the ACM\/ \bf23} (1980), 229--239;
+{\sl IEEE Symp.\ on Foundations of Computer Science\/ \bf30} (1989), 540--546].
+Each node of the treap has two keys: One, called |loc|, is the
+base address of 512 simulated tetrabytes; it follows the conventions
+of an ordinary binary search tree, with all locations in the left subtree
+less than the |loc| of a node and all locations in the right subtree
+greater than that~|loc|. The other, called |stamp|, can be thought of as the
+time the node was inserted into the tree; all subnodes of a given node
+have a larger~|stamp|. By assigning time stamps at random, we maintain
+a tree structure that is almost always fairly well balanced.
+
+Each simulated tetrabyte has an associated frequency count and
+source file reference.
+
+@=
+typedef struct {
+ tetra tet; /* the tetrabyte of simulated memory */
+ tetra freq; /* the number of times it was obeyed as an instruction */
+ unsigned char bkpt; /* breakpoint information for this tetrabyte */
+ unsigned char file_no; /* source file number, if known */
+ unsigned short line_no; /* source line number, if known */
+} mem_tetra;
+@#
+typedef struct mem_node_struct {
+ octa loc; /* location of the first of 512 simulated tetrabytes */
+ tetra stamp; /* time stamp for treap balancing */
+ struct mem_node_struct *left, *right; /* pointers to subtrees */
+ mem_tetra dat[512]; /* the chunk of simulated tetrabytes */
+} mem_node;
+
+@ The |stamp| value is actually only pseudorandom, based on the
+idea of Fibonacci hashing [see {\sl Sorting and Searching}, Section~6.4].
+This is good enough for our purposes, and it guarantees that
+no two stamps will be identical.
+
+@=
+mem_node* new_mem @,@,@[ARGS((void))@];@+@t}\6{@>
+mem_node* new_mem()
+{
+ register mem_node *p;
+ p=(mem_node*)calloc(1,sizeof(mem_node));
+ if (!p) panic("Can't allocate any more memory");
+@.Can't allocate...@>
+ p->stamp=priority;
+ priority+=0x9e3779b9; /* $\lfloor2^{32}(\phi-1)\rfloor$ */
+ return p;
+}
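
The stamp sequence is easy to reproduce outside the simulator. In this sketch the helper |next_stamp| is hypothetical; the starting value and the increment are the ones used by |new_mem|, and |tetra| is assumed to be an |unsigned int| of at least 32 bits.

```c
typedef unsigned int tetra;            /* assumed to be at least 32 bits wide */

tetra priority=314159265;              /* same starting value as the simulator */

/* hypothetical helper: return the current stamp and advance the counter */
tetra next_stamp(void)
{
  tetra s=priority;
  priority=(priority+0x9e3779b9u)&0xffffffffu;   /* add modulo 2^32 */
  return s;
}
```

Because the increment is odd, the sequence does not repeat until $2^{32}$ stamps have been issued, which is why no two nodes can share a stamp in practice.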
+
+@ Initially we start with a chunk for the pool segment, since
+the simulator will be putting command-line information there before
+it runs the program.
+
+@=
+mem_root=new_mem();
+mem_root->loc.h=0x40000000;
+last_mem=mem_root;
+
+@ @=
+tetra priority=314159265; /* pseudorandom time stamp counter */
+mem_node *mem_root; /* root of the treap */
+mem_node *last_mem; /* the memory node most recently read or written */
+
+@ The |mem_find| routine finds a given tetrabyte in the simulated
+memory, inserting a new node into the treap if necessary.
+
+@=
+mem_tetra* mem_find @,@,@[ARGS((octa))@];@+@t}\6{@>
+mem_tetra* mem_find(addr)
+ octa addr;
+{
+ octa key;
+ register int offset;
+ register mem_node *p=last_mem;
+ key.h=addr.h;
+ key.l=addr.l&0xfffff800;
+ offset=addr.l&0x7fc;
+ if (p->loc.l!=key.l || p->loc.h!=key.h)
+ @;
+ return &p->dat[offset>>2];
+}
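
The two masks in |mem_find| split the low half of an address into a chunk base and a tetrabyte index. A small sketch, with hypothetical helper names |chunk_base| and |tetra_index|, assuming a 32-bit |tetra|:

```c
typedef unsigned int tetra;            /* assumed to be 32 bits wide */

/* hypothetical helper: base address (low half) of the 2048-byte chunk */
tetra chunk_base(tetra low)
{
  return low&0xfffff800u;
}

/* hypothetical helper: index of the addressed tetrabyte within dat[512] */
int tetra_index(tetra low)
{
  return (int)((low&0x7fcu)>>2);       /* the two lowest bits are ignored */
}
```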
+
+@ @=
+{@+register mem_node **q;
+ for (p=mem_root; p; ) {
+ if (key.l==p->loc.l && key.h==p->loc.h) goto found;
+ if ((key.l<p->loc.l && key.h<=p->loc.h) || key.h<p->loc.h) p=p->left;
+ else p=p->right;
+ }
+ for (p=mem_root,q=&mem_root; p && p->stamp<priority; p=*q) {
+ if ((key.l<p->loc.l && key.h<=p->loc.h) || key.h<p->loc.h) q=&p->left;
+ else q=&p->right;
+ }
+ *q=new_mem();
+ (*q)->loc=key;
+ @;
+ p=*q;
+found: last_mem=p;
+}
+
+@ At this point we want to split the binary search tree |p| into two
+parts based on the given |key|, forming the left and right subtrees
+of the new node~|*q|. The effect will be as if |key| had been inserted
+before all of |p|'s nodes.
+
+@=
+{
+ register mem_node **l=&(*q)->left,**r=&(*q)->right;
+ while (p) {
+ if ((key.l<p->loc.l && key.h<=p->loc.h) || key.h<p->loc.h)
+ *r=p, r=&p->left, p=*r;
+ else *l=p, l=&p->right, p=*l;
+ }
+ *l=*r=NULL;
+}
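
The search, the descent by stamp, and the split can be seen together in a reduced treap with integer keys. This is a sketch, not the simulator's code: the type |node| and the functions |find_or_insert| and |is_treap| are hypothetical, and plain |int| keys stand in for the octabyte |loc|.

```c
#include <stdlib.h>

typedef struct node {
  int key;                             /* plays the role of loc */
  unsigned stamp;                      /* smaller stamp means closer to the root */
  struct node *left,*right;
} node;

node *root;
unsigned priority=314159265u;          /* stamp for the next new node */

/* hypothetical counterpart of mem_find for integer keys */
node *find_or_insert(int key)
{
  node *p,**q,**l,**r;
  for (p=root; p; p= key<p->key? p->left: p->right)
    if (key==p->key) return p;         /* already present */
  /* descend past every node that is older than the new one will be */
  for (p=root,q=&root; p && p->stamp<priority; p=*q)
    q= key<p->key? &p->left: &p->right;
  *q=(node*)calloc(1,sizeof(node));
  if (!*q) abort();
  (*q)->key=key;
  (*q)->stamp=priority;
  priority+=0x9e3779b9u;
  /* split the displaced subtree p into keys below and above the new key */
  l=&(*q)->left; r=&(*q)->right;
  while (p) {
    if (key<p->key) *r=p, r=&p->left, p=*r;
    else *l=p, l=&p->right, p=*l;
  }
  *l=*r=NULL;
  return *q;
}

/* check search-tree order on keys and heap order on stamps */
int is_treap(const node *p, int lo, int hi)
{
  if (!p) return 1;
  if (p->key<=lo || p->key>=hi) return 0;
  if (p->left && p->left->stamp<=p->stamp) return 0;
  if (p->right && p->right->stamp<=p->stamp) return 0;
  return is_treap(p->left,lo,p->key) && is_treap(p->right,p->key,hi);
}
```

Nodes are never moved in memory, only relinked, so a pointer returned by an earlier call stays valid after later insertions.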
+
+@* Loading an object file. To get the user's program into memory,
+we read in an \MMIX\ object, using modifications of the routines
+in the utility program \.{MMOtype}. Complete details of \.{mmo}
+format appear in the program for {\mc MMIXAL}; a reader
+who hopes to understand this section ought to at least skim
+that documentation.
+Here we need to define only the basic constants used for interpretation.
+
+@d mm 0x98 /* the escape code of \.{mmo} format */
+@d lop_quote 0x0 /* the quotation lopcode */
+@d lop_loc 0x1 /* the location lopcode */
+@d lop_skip 0x2 /* the skip lopcode */
+@d lop_fixo 0x3 /* the octabyte-fix lopcode */
+@d lop_fixr 0x4 /* the relative-fix lopcode */
+@d lop_fixrx 0x5 /* extended relative-fix lopcode */
+@d lop_file 0x6 /* the file name lopcode */
+@d lop_line 0x7 /* the file position lopcode */
+@d lop_spec 0x8 /* the special hook lopcode */
+@d lop_pre 0x9 /* the preamble lopcode */
+@d lop_post 0xa /* the postamble lopcode */
+@d lop_stab 0xb /* the symbol table lopcode */
+@d lop_end 0xc /* the end-it-all lopcode */
+
+@ We do not load the symbol table. (A more ambitious simulator could
+implement \.{MMIXAL}-style expressions for interactive debugging,
+but such enhancements are left to the interested reader.)
+
+@=
+mmo_file=fopen(mmo_file_name,"rb");
+if (!mmo_file) {
+ register char *alt_name=(char*)calloc(strlen(mmo_file_name)+5,sizeof(char));
+ if (!alt_name) panic("Can't allocate file name buffer");
+@.Can't allocate...@>
+ sprintf(alt_name,"%s.mmo",mmo_file_name);
+ mmo_file=fopen(alt_name,"rb");
+ if (!mmo_file) {
+ fprintf(stderr,"Can't open the object file %s or %s!\n",
+@.Can't open...@>
+ mmo_file_name,alt_name);
+ exit(-3);
+ }
+ free(alt_name);
+}
+byte_count=0;
+
+@ @=
+FILE *mmo_file; /* the input file */
+int postamble; /* have we encountered |lop_post|? */
+int byte_count; /* index of the next-to-be-read byte */
+byte buf[4]; /* the most recently read bytes */
+int yzbytes; /* the two least significant bytes */
+int delta; /* difference for relative fixup */
+tetra tet; /* |buf| bytes packed big-endianwise */
+
+@ The tetrabytes of an \.{mmo} file are stored in
+friendly big-endian fashion, but this program is supposed to work also
+on computers that are little-endian. Therefore we read four successive bytes
+and pack them into a tetrabyte, instead of reading a single tetrabyte.
+
+@d mmo_err {
+ fprintf(stderr,"Bad object file! (Try running MMOtype.)\n");
+@.Bad object file@>
+ exit(-4);
+ }
+
+@=
+void read_tet @,@,@[ARGS((void))@];@+@t}\6{@>
+void read_tet()
+{
+ if (fread(buf,1,4,mmo_file)!=4) mmo_err;
+ yzbytes=(buf[2]<<8)+buf[3];
+ tet=(((buf[0]<<8)+buf[1])<<16)+yzbytes;
+}
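
The byte packing done by |read_tet| does not depend on the host's byte order, as a stand-alone sketch can show. The name |pack_tetra| is hypothetical; the casts keep the shifts in unsigned arithmetic, and |tetra| is assumed to be a 32-bit |unsigned int|.

```c
typedef unsigned int tetra;            /* assumed to be 32 bits wide */

/* hypothetical helper: combine four bytes, most significant first */
tetra pack_tetra(const unsigned char b[4])
{
  tetra yz=((tetra)b[2]<<8)+b[3];      /* the two least significant bytes */
  return ((((tetra)b[0]<<8)+b[1])<<16)+yz;
}
```

For instance, the bytes \Hex{98}, \Hex{09}, \Hex{01}, \Hex{01} (the escape code, |lop_pre|, and two parameter bytes) pack into the tetrabyte \Hex{98090101}.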
+
+@ @=
+byte read_byte @,@,@[ARGS((void))@];@+@t}\6{@>
+byte read_byte()
+{
+ register byte b;
+ if (!byte_count) read_tet();
+ b=buf[byte_count];
+ byte_count=(byte_count+1)&3;
+ return b;
+}
+
+@ @=
+read_tet(); /* read the first tetrabyte of input */
+if (buf[0]!=mm || buf[1]!=lop_pre) mmo_err;
+if (ybyte!=1) mmo_err;
+if (zbyte==0) obj_time=0xffffffff;
+else {
+ j=zbyte-1;
+ read_tet();@+ obj_time=tet; /* file creation time */
+ for (;j>0;j--) read_tet();
+}
+
+@ @=
+{
+ read_tet();
+ loop:@+if (buf[0]==mm) switch (buf[1]) {
+ case lop_quote:@+if (yzbytes!=1) mmo_err;
+ read_tet();@+break;
+ @t\4@>@@;
+ case lop_post: postamble=1;
+ if (ybyte || zbyte<32) mmo_err;
+ continue;
+ default: mmo_err;
+ }
+ @;
+}
+
+@ In a normal situation, the newly read tetrabyte is simply supposed
+to be loaded into the current location. We load not only the current
+location but also the current file position, if |cur_line| is nonzero
+and |cur_loc| belongs to segment~0.
+
+@d mmo_load(loc,val) ll=mem_find(loc), ll->tet^=val
+
+@=
+{
+ mmo_load(cur_loc,tet);
+ if (cur_line) {
+ ll->file_no=cur_file;
+ ll->line_no=cur_line;
+ cur_line++;
+ }
+ cur_loc=incr(cur_loc,4);@+ cur_loc.l &=-4;
+}
+
+@ @=
+octa cur_loc; /* the current location */
+int cur_file=-1; /* the most recently selected file number */
+int cur_line; /* the current position in |cur_file|, if nonzero */
+octa tmp; /* an octabyte of temporary interest */
+tetra obj_time; /* when the object file was created */
+
+@ @=
+cur_loc.h=cur_loc.l=0;
+cur_file=-1;
+cur_line=0;
+@;
+do @@;@+while (!postamble);
+@;
+fclose(mmo_file);
+cur_line=0;
+
+@ We have already implemented |lop_quote|, which
+falls through to the normal case after reading an extra tetrabyte.
+Now let's consider the other lopcodes in turn.
+
+@d ybyte buf[2] /* the next-to-least significant byte */
+@d zbyte buf[3] /* the least significant byte */
+
+@=
+case lop_loc:@+if (zbyte==2) {
+ j=ybyte;@+ read_tet();@+ cur_loc.h=(j<<24)+tet;
+ }@+else if (zbyte==1) cur_loc.h=ybyte<<24;
+ else mmo_err;
+ read_tet();@+ cur_loc.l=tet;
+ continue;
+case lop_skip: cur_loc=incr(cur_loc,yzbytes);@+continue;
+
+@ Fixups load information out of order, when future references have
+been resolved. The current file name and line number are not considered
+relevant.
+
+@=
+case lop_fixo:@+if (zbyte==2) {
+ j=ybyte;@+ read_tet();@+ tmp.h=(j<<24)+tet;
+ }@+else if (zbyte==1) tmp.h=ybyte<<24;
+ else mmo_err;
+ read_tet();@+ tmp.l=tet;
+ mmo_load(tmp,cur_loc.h);
+ mmo_load(incr(tmp,4),cur_loc.l);
+ continue;
+case lop_fixr: delta=yzbytes; goto fixr;
+case lop_fixrx:j=yzbytes;@+if (j!=16 && j!=24) mmo_err;
+ read_tet(); delta=tet;
+ if (delta&0xfe000000) mmo_err;
+fixr: tmp=incr(cur_loc,-(delta>=0x1000000? (delta&0xffffff)-(1<<j): delta)<<2);
+ mmo_load(tmp,delta);
+ continue;
+
+@ @=
+case lop_file:@+if (file_info[ybyte].name) {
+ if (zbyte) mmo_err;
+ cur_file=ybyte;
+ }@+else {
+ if (!zbyte) mmo_err;
+ file_info[ybyte].name=(char*)calloc(4*zbyte+1,1);
+ if (!file_info[ybyte].name) {
+ fprintf(stderr,"No room to store the file name!\n");@+exit(-5);
+@.No room...@>
+ }
+ cur_file=ybyte;
+ for (j=zbyte,p=file_info[ybyte].name; j>0; j--,p+=4) {
+ read_tet();
+ *p=buf[0];@+*(p+1)=buf[1];@+*(p+2)=buf[2];@+*(p+3)=buf[3];
+ }
+ }
+ cur_line=0;@+continue;
+case lop_line:@+if (cur_file<0) mmo_err;
+ cur_line=yzbytes;@+continue;
+
+@ Special bytes are ignored (at least for now).
+
+@=
+case lop_spec:@+ while(1) {
+ read_tet();
+ if (buf[0]==mm) {
+ if (buf[1]!=lop_quote || yzbytes!=1) goto loop; /* end of special data */
+ read_tet();
+ }
+ }
+
+@ Since a chunk of memory holds 512 tetrabytes, the |ll| pointer in the
+following loop stays in the same chunk (namely, the first chunk
+of segment~3, also known as \.{Stack\_Segment}).
+@:Stack_Segment}\.{Stack\_Segment@>
+@:Pool_Segment}\.{Pool\_Segment@>
+
+@=
+aux.h=0x60000000;@+ aux.l=0x18;
+ll=mem_find(aux);
+(ll-1)->tet=2; /* this will ultimately set |rL=2| */
+(ll-5)->tet=argc; /* and $\$0=|argc|$ */
+(ll-4)->tet=0x40000000;
+(ll-3)->tet=0x8; /* and $\$1=\.{Pool\_Segment}+8$ */
+G=zbyte;@+ L=0;
+for (j=G+G;j<256+256;j++,ll++,aux.l+=4) read_tet(), ll->tet=tet;
+inst_ptr.h=(ll-2)->tet, inst_ptr.l=(ll-1)->tet; /* \.{Main} */
+(ll+2*12)->tet=G<<24;
+g[255]=incr(aux,12*8); /* we will |UNSAVE| from here, to get going */
+
+@* Loading and printing source lines.
+The loaded program generally contains cross references to the lines
+of symbolic source files, so that the context of each instruction
+can be understood. The following sections of this program
+make such information available when it is desired.
+
+Source file data is kept in a \&{file\_node} structure:
+
+@=
+typedef struct {
+ char *name; /* name of source file */
+ int line_count; /* number of lines in the file */
+ long *map; /* pointer to map of file positions */
+} file_node;
+
+@ In partial preparation for the day when source files are in
+Unicode, we define a type \&{Char} for the source characters.
+
+@=
+typedef char Char; /* bytes that will become wydes some day */
+
+@ @=
+file_node file_info[256]; /* data about each source file */
+int buf_size; /* size of buffer for source lines */
+Char *buffer;
+
+@ As in \.{MMIXAL}, we prefer source lines of length 72 characters or fewer,
+but the user is allowed to increase the limit. (Longer lines will silently
+be truncated to the buffer size when the simulator lists them.)
+
+@=
+if (buf_size<72) buf_size=72;
+buffer=(Char*)calloc(buf_size+1,sizeof(Char));
+if (!buffer) panic("Can't allocate source line buffer");
+@.Can't allocate...@>
+
+@ The first time we are called upon to list a line from a given source
+file, we make a map of starting locations for each line. Source files
+should contain at most 65535 lines. We assume that they contain
+no null characters.
+
+@=
+void make_map @,@,@[ARGS((void))@];@+@t}\6{@>
+void make_map()
+{
+ long map[65536];
+ register int k,l;
+ register long*p;
+ @;
+ for (l=1;l<65536 && !feof(src_file);l++) {
+ map[l]=ftell(src_file);
+ loop:@+if (!fgets(buffer,buf_size,src_file)) break;
+ if (buffer[strlen(buffer)-1]!='\n') goto loop;
+ }
+ file_info[cur_file].line_count=l;
+ file_info[cur_file].map=p=(long*)calloc(l,sizeof(long));
+ if (!p) panic("No room for a source-line map");
+@.No room...@>
+ for (k=1;k<l;k++) p[k]=map[k];
+}
+
+@=
+#include <sys/types.h>
+#include <sys/stat.h>
+
+@ @=
+@^system dependencies@>
+{
+ struct stat stat_buf;
+ if (stat(file_info[cur_file].name,&stat_buf)>=0)
+ if ((tetra)stat_buf.st_mtime > obj_time)
+ fprintf(stderr,
+ "Warning: File %s was modified; it may not match the program!\n",
+@.File...was modified@>
+ file_info[cur_file].name);
+}
+
+@ Source lines are listed by the |print_line| routine, preceded by
+12 characters containing the line number. If a file error occurs,
+nothing is printed---not even an error message; the absence of
+listed data is itself a message.
+
+@=
+void print_line @,@,@[ARGS((int))@];@+@t}\6{@>
+void print_line(k)
+ int k;
+{
+ char buf[11];
+ if (k>=file_info[cur_file].line_count) return;
+ if (fseek(src_file,file_info[cur_file].map[k],SEEK_SET)!=0) return;
+ if (!fgets(buffer,buf_size,src_file)) return;
+ sprintf(buf,"%d:    ",k);
+ printf("line %.6s %s",buf,buffer);
+ if (buffer[strlen(buffer)-1]!='\n') printf("\n");
+ line_shown=true;
+}
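
The interplay between the line map and the seek in |print_line| can be sketched with two small routines. The names |build_map|, |fetch_line|, and |MAX_LINES| are hypothetical, and the sketch assumes that every line of the file, including the last, ends with a newline.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINES 16                   /* hypothetical small limit for the sketch */

/* record the file position at which each line starts; lines count from 1.
   Returns one more than the number of lines found. */
int build_map(FILE *f, long map[MAX_LINES])
{
  char buf[64];
  int l;
  rewind(f);
  for (l=1;l<MAX_LINES;l++) {
    map[l]=ftell(f);
    do {                               /* long lines take several reads */
      if (!fgets(buf,sizeof buf,f)) return l;
    } while (buf[strlen(buf)-1]!='\n');
  }
  return l;
}

/* fetch line k by seeking to its recorded position; returns 0 on failure */
int fetch_line(FILE *f, const long map[], int count, int k, char *out, int size)
{
  if (k<1 || k>=count) return 0;
  if (fseek(f,map[k],SEEK_SET)!=0) return 0;
  return fgets(out,size,f)!=NULL;
}
```

Once the map exists, any line can be listed without rereading the lines that precede it.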
+
+@ @=
+#ifndef SEEK_SET
+#define SEEK_SET 0 /* code for setting the file pointer to a given offset */
+#endif
+
+@ The |show_line| routine is called when we want to output line |cur_line|
+of source file number |cur_file|, assuming that |cur_line!=0|. Its job
+is primarily to maintain continuity, by opening or reopening the |src_file|
+if the source file changes, and by connecting the previously output
+lines to the new one. Sometimes no output is necessary, because the
+desired line has already been printed.
+
+@=
+void show_line @,@,@[ARGS((void))@];@+@t}\6{@>
+void show_line()
+{
+ register int k;
+ if (shown_file!=cur_file) @@;
+ else if (shown_line==cur_line) return; /* already shown */
+ if (cur_line>shown_line+gap+1 || cur_line0)
+ if (cur_line=
+FILE *src_file; /* the currently open source file */
+int shown_file=-1; /* index of the most recently listed file */
+int shown_line; /* the line most recently listed in |shown_file| */
+int gap; /* minimum gap between consecutively listed source lines */
+bool line_shown; /* did we list anything recently? */
+bool showing_source; /* are we listing source lines? */
+int profile_gap; /* the |gap| when printing final frequencies */
+bool profile_showing_source; /* |showing_source| within final frequencies */
+
+@ @=
+{
+ if (!src_file) src_file=fopen(file_info[cur_file].name,"r");
+ else freopen(file_info[cur_file].name,"r",src_file);
+ if (!src_file) {
+ fprintf(stderr,"Warning: I can't open file %s; source listing omitted.\n",
+@.I can't open...@>
+ file_info[cur_file].name);
+ showing_source=false;
+ return;
+ }
+ printf("\"%s\"\n",file_info[cur_file].name);
+ shown_file=cur_file;
+ shown_line=0;
+ if (!file_info[cur_file].map) make_map();
+}
+
+@ Here is a simple application of |show_line|. It is a recursive routine that
+prints the frequency counts of all instructions that occur in a
+given subtree of the simulated memory and that were executed at least once.
+The subtree is traversed in symmetric order; therefore the frequencies
+appear in increasing order of the instruction locations.
+
+@=
+void print_freqs @,@,@[ARGS((mem_node*))@];@+@t}\6{@>
+void print_freqs(p)
+ mem_node *p;
+{
+ register int j;
+ octa cur_loc;
+ if (p->left) print_freqs(p->left);
+ for (j=0;j<512;j++) if (p->dat[j].freq)
+ @loc+4*j|@>;
+ if (p->right) print_freqs(p->right);
+}
+
+@ An ellipsis (\.{...}) is printed between frequency data for nonconsecutive
+instructions, unless source line information intervenes.
+
+@=
+{
+ cur_loc=incr(p->loc,4*j);
+ if (showing_source && p->dat[j].line_no) {
+ cur_file=p->dat[j].file_no, cur_line=p->dat[j].line_no;
+ line_shown=false;
+ show_line();
+ if (line_shown) goto loc_implied;
+ }
+ if (cur_loc.l!=implied_loc.l || cur_loc.h!=implied_loc.h)
+ if (profile_started) printf(" 0. ...\n");
+ loc_implied: printf("%10d. %08x%08x: %08x (%s)\n",
+ p->dat[j].freq, cur_loc.h, cur_loc.l, p->dat[j].tet,
+ info[p->dat[j].tet>>24].name);
+ implied_loc=incr(cur_loc,4);@+ profile_started=true;
+}
+
+@ @=
+octa implied_loc; /* location following the last shown frequency data */
+bool profile_started; /* have we printed at least one frequency count? */
+
+@ @=
+{
+ printf("\nProgram profile:\n");
+ shown_file=cur_file=-1;@+ shown_line=cur_line=0;
+ gap=profile_gap;
+ showing_source=profile_showing_source;
+ implied_loc=neg_one;
+ print_freqs(mem_root);
+}
+
+@* Lists. This simulator needs to deal with 256 different opcodes,
+so we might as well enumerate them~now.
+
+@=
+typedef enum{@/
+@!TRAP,@!FCMP,@!FUN,@!FEQL,@!FADD,@!FIX,@!FSUB,@!FIXU,@/
+@!FLOT,@!FLOTI,@!FLOTU,@!FLOTUI,@!SFLOT,@!SFLOTI,@!SFLOTU,@!SFLOTUI,@/
+@!FMUL,@!FCMPE,@!FUNE,@!FEQLE,@!FDIV,@!FSQRT,@!FREM,@!FINT,@/
+@!MUL,@!MULI,@!MULU,@!MULUI,@!DIV,@!DIVI,@!DIVU,@!DIVUI,@/
+@!ADD,@!ADDI,@!ADDU,@!ADDUI,@!SUB,@!SUBI,@!SUBU,@!SUBUI,@/
+@!IIADDU,@!IIADDUI,@!IVADDU,@!IVADDUI,@!VIIIADDU,@!VIIIADDUI,@!XVIADDU,@!XVIADDUI,@/
+@!CMP,@!CMPI,@!CMPU,@!CMPUI,@!NEG,@!NEGI,@!NEGU,@!NEGUI,@/
+@!SL,@!SLI,@!SLU,@!SLUI,@!SR,@!SRI,@!SRU,@!SRUI,@/
+@!BN,@!BNB,@!BZ,@!BZB,@!BP,@!BPB,@!BOD,@!BODB,@/
+@!BNN,@!BNNB,@!BNZ,@!BNZB,@!BNP,@!BNPB,@!BEV,@!BEVB,@/
+@!PBN,@!PBNB,@!PBZ,@!PBZB,@!PBP,@!PBPB,@!PBOD,@!PBODB,@/
+@!PBNN,@!PBNNB,@!PBNZ,@!PBNZB,@!PBNP,@!PBNPB,@!PBEV,@!PBEVB,@/
+@!CSN,@!CSNI,@!CSZ,@!CSZI,@!CSP,@!CSPI,@!CSOD,@!CSODI,@/
+@!CSNN,@!CSNNI,@!CSNZ,@!CSNZI,@!CSNP,@!CSNPI,@!CSEV,@!CSEVI,@/
+@!ZSN,@!ZSNI,@!ZSZ,@!ZSZI,@!ZSP,@!ZSPI,@!ZSOD,@!ZSODI,@/
+@!ZSNN,@!ZSNNI,@!ZSNZ,@!ZSNZI,@!ZSNP,@!ZSNPI,@!ZSEV,@!ZSEVI,@/
+@!LDB,@!LDBI,@!LDBU,@!LDBUI,@!LDW,@!LDWI,@!LDWU,@!LDWUI,@/
+@!LDT,@!LDTI,@!LDTU,@!LDTUI,@!LDO,@!LDOI,@!LDOU,@!LDOUI,@/
+@!LDSF,@!LDSFI,@!LDHT,@!LDHTI,@!CSWAP,@!CSWAPI,@!LDUNC,@!LDUNCI,@/
+@!LDVTS,@!LDVTSI,@!PRELD,@!PRELDI,@!PREGO,@!PREGOI,@!GO,@!GOI,@/
+@!STB,@!STBI,@!STBU,@!STBUI,@!STW,@!STWI,@!STWU,@!STWUI,@/
+@!STT,@!STTI,@!STTU,@!STTUI,@!STO,@!STOI,@!STOU,@!STOUI,@/
+@!STSF,@!STSFI,@!STHT,@!STHTI,@!STCO,@!STCOI,@!STUNC,@!STUNCI,@/
+@!SYNCD,@!SYNCDI,@!PREST,@!PRESTI,@!SYNCID,@!SYNCIDI,@!PUSHGO,@!PUSHGOI,@/
+@!OR,@!ORI,@!ORN,@!ORNI,@!NOR,@!NORI,@!XOR,@!XORI,@/
+@!AND,@!ANDI,@!ANDN,@!ANDNI,@!NAND,@!NANDI,@!NXOR,@!NXORI,@/
+@!BDIF,@!BDIFI,@!WDIF,@!WDIFI,@!TDIF,@!TDIFI,@!ODIF,@!ODIFI,@/
+@!MUX,@!MUXI,@!SADD,@!SADDI,@!MOR,@!MORI,@!MXOR,@!MXORI,@/
+@!SETH,@!SETMH,@!SETML,@!SETL,@!INCH,@!INCMH,@!INCML,@!INCL,@/
+@!ORH,@!ORMH,@!ORML,@!ORL,@!ANDNH,@!ANDNMH,@!ANDNML,@!ANDNL,@/
+@!JMP,@!JMPB,@!PUSHJ,@!PUSHJB,@!GETA,@!GETAB,@!PUT,@!PUTI,@/
+@!POP,@!RESUME,@!SAVE,@!UNSAVE,@!SYNC,@!SWYM,@!GET,@!TRIP}@+@!mmix_opcode;
+
+@ We also need to enumerate the special names for special registers.
+
+@=
+typedef enum{
+@!rB,@!rD,@!rE,@!rH,@!rJ,@!rM,@!rR,@!rBB,
+ @!rC,@!rN,@!rO,@!rS,@!rI,@!rT,@!rTT,@!rK,@!rQ,@!rU,@!rV,@!rG,@!rL,
+ @!rA,@!rF,@!rP,@!rW,@!rX,@!rY,@!rZ,@!rWW,@!rXX,@!rYY,@!rZZ} @!special_reg;
+
+@ @=
+char *special_name[32]={"rB","rD","rE","rH","rJ","rM","rR","rBB",
+ "rC","rN","rO","rS","rI","rT","rTT","rK","rQ","rU","rV","rG","rL",
+ "rA","rF","rP","rW","rX","rY","rZ","rWW","rXX","rYY","rZZ"};
+
+@ Here are the bit codes for arithmetic exceptions. These codes, except
+|H_BIT|, are defined also in {\mc MMIX-ARITH}.
+
+@d X_BIT (1<<8) /* floating inexact */
+@d Z_BIT (1<<9) /* floating division by zero */
+@d U_BIT (1<<10) /* floating underflow */
+@d O_BIT (1<<11) /* floating overflow */
+@d I_BIT (1<<12) /* floating invalid operation */
+@d W_BIT (1<<13) /* float-to-fix overflow */
+@d V_BIT (1<<14) /* integer overflow */
+@d D_BIT (1<<15) /* integer divide check */
+@d H_BIT (1<<16) /* trip */
+
+@ The |bkpt| field associated with each tetrabyte of memory has
+bits associated with forced tracing and/or
+breaking for reading, writing, and/or execution.
+
+@d trace_bit (1<<3)
+@d read_bit (1<<2)
+@d write_bit (1<<1)
+@d exec_bit (1<<0)
+
+@ To complete our lists of lists,
+we enumerate the rudimentary operating system calls
+that are built in to \.{MMIXAL}.
+
+@d max_sys_call Ftell
+
+@=
+typedef enum{
+@!Halt,@!Fopen,@!Fclose,@!Fread,@!Fgets,@!Fgetws,
+@!Fwrite,@!Fputs,@!Fputws,@!Fseek,@!Ftell} @!sys_call;
+
+@* The main loop. Now let's plunge in to the guts of the simulator,
+the master switch that controls most of the action.
+
+@=
+{
+ if (resuming) loc=incr(inst_ptr,-4), inst=g[rX].l;
+ else @;
+ op=inst>>24;@+xx=(inst>>16)&0xff;@+yy=(inst>>8)&0xff;@+zz=inst&0xff;
+ f=info[op].flags;@+yz=inst&0xffff;
+ x=y=z=a=b=zero_octa;@+ exc=0;@+ old_L=L;
+ if (f&rel_addr_bit) @;
+ @;
+ if (f&X_is_dest_bit) @;
+ w=oplus(y,z);
+ if (loc.h>=0x20000000) goto privileged_inst;
+ switch(op) {
+ @t\4@>@;
+ }
+ @;
+ @;
+ @;
+ if (resuming && op!=RESUME) resuming=false;
+}
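
The field extraction at the top of the main loop can be tried on a single instruction word. In this sketch the struct |inst_fields| and the function |split_inst| are hypothetical; the shifts and masks are the ones used above, and |tetra| is assumed to be a 32-bit |unsigned int|.

```c
typedef unsigned int tetra;            /* assumed to be 32 bits wide */

typedef struct {
  int op;                              /* operation code, bits 31..24 */
  int xx,yy,zz;                        /* the X, Y, and Z bytes */
  int yz;                              /* Y and Z taken together as 16 bits */
} inst_fields;

/* hypothetical helper: break an instruction into its fields */
inst_fields split_inst(tetra inst)
{
  inst_fields f;
  f.op=(int)((inst>>24)&0xff);
  f.xx=(int)((inst>>16)&0xff);
  f.yy=(int)((inst>>8)&0xff);
  f.zz=(int)(inst&0xff);
  f.yz=(int)(inst&0xffff);
  return f;
}
```

The word \Hex{20010203}, for example, has opcode \Hex{20} (|ADD| in the enumeration above) with $\xx=1$, $\yy=2$, and $\zz=3$.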
+
+@ Operands |x| and |a| are usually destinations (results), computed from
+the source operands |y|, |z|, and/or~|b|.
+
+@=
+octa w,x,y,z,a,b,ma,mb; /* operands */
+octa *x_ptr; /* destination */
+octa loc; /* location of the current instruction */
+octa inst_ptr; /* location of the next instruction */
+tetra inst; /* the current instruction */
+int old_L; /* value of |L| before the current instruction */
+int exc; /* exceptions raised by the current instruction */
+int tracing_exceptions; /* exception bits that cause tracing */
+int rop; /* ropcode of a resumed instruction */
+int round_mode; /* the style of floating point rounding just used */
+bool resuming; /* are we resuming an interrupted instruction? */
+bool halted; /* did the program come to a halt? */
+bool breakpoint; /* should we pause after the current instruction? */
+bool tracing; /* should we trace the current instruction? */
+bool stack_tracing; /* should we trace details of the register stack? */
+bool interacting; /* are we in interactive mode? */
+bool interact_after_break; /* should we go into interactive mode? */
+bool tripping; /* are we about to go to a trip handler? */
+bool good; /* did the last branch instruction guess correctly? */
+tetra trace_threshold; /* each instruction should be traced this many times */
+
+@ @=
+register mmix_opcode op; /* operation code of the current instruction */
+register int xx,yy,zz,yz; /* operand fields of the current instruction */
+register tetra f; /* properties of the current |op| */
+register int i,j,k; /* miscellaneous indices */
+register mem_tetra *ll; /* current place in the simulated memory */
+register char *p; /* current place in a string */
+
+@ @=
+{
+ loc=inst_ptr;
+ ll=mem_find(loc);
+ inst=ll->tet;
+ cur_file=ll->file_no;
+ cur_line=ll->line_no;
+ ll->freq++;
+ if (ll->bkpt&exec_bit) breakpoint=true;
+ tracing=breakpoint||(ll->bkpt&trace_bit)||(ll->freq<=trace_threshold);
+ inst_ptr=incr(inst_ptr,4);
+}
+
+@ Much of the simulation is table-driven, based on a static data
+structure called the \&{op\_info} for each operation code.
+
+@=
+typedef struct {
+ char *name; /* symbolic name of an opcode */
+ unsigned char flags; /* its instruction format */
+ unsigned char third_operand; /* its special register input */
+ unsigned char mems; /* how many $\mu$ it costs */
+ unsigned char oops; /* how many $\upsilon$ it costs */
+ char *trace_format; /* how it appears when traced */
+} op_info;
+
+@ For example, the |flags| field of |info[op]|
+tells us how to obtain the operands from the X, Y, and~Z fields
+of the current instruction. Each entry records special properties of an
+operation code, in binary notation:
+\Hex{1}~means Z~is an immediate value, \Hex{2}~means rZ is
+a source operand, \Hex{4}~means Y~is an immediate value, \Hex{8}~means rY is a
+source operand, \Hex{10}~means rX is a source operand, \Hex{20}~means
+rX is a destination, \Hex{40}~means YZ is part of a relative address,
+\Hex{80}~means a push or pop or unsave instruction.
+
+The |trace_format| field will be explained later.
+
+@d Z_is_immed_bit 0x1
+@d Z_is_source_bit 0x2
+@d Y_is_immed_bit 0x4
+@d Y_is_source_bit 0x8
+@d X_is_source_bit 0x10
+@d X_is_dest_bit 0x20
+@d rel_addr_bit 0x40
+@d push_pop_bit 0x80
+
+@=
+op_info info[256]={
+@,
+@,
+@,
+@};
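
The flag bits described above can be tested with ordinary masks. The following is only an illustrative sketch: the helper names |register_sources| and |writes_register_x| are hypothetical and do not appear in the simulator, while the bit values are copied from the definitions above.

```c
#include <stdbool.h>

/* Flag bits, copied from the definitions above. */
#define Z_is_immed_bit  0x1
#define Z_is_source_bit 0x2
#define Y_is_immed_bit  0x4
#define Y_is_source_bit 0x8
#define X_is_source_bit 0x10
#define X_is_dest_bit   0x20
#define rel_addr_bit    0x40
#define push_pop_bit    0x80

/* Hypothetical helper: how many of rX, rY, rZ does this format read? */
int register_sources(unsigned char flags)
{
    int n = 0;
    if (flags & Z_is_source_bit) n++;
    if (flags & Y_is_source_bit) n++;
    if (flags & X_is_source_bit) n++;
    return n;
}

/* Hypothetical helper: does this format write register X? */
bool writes_register_x(unsigned char flags)
{
    return (flags & X_is_dest_bit) != 0;
}
```

For instance, the value 0x2a used for |ADD| reads two registers and writes rX, while 0x1a (used for the store instructions) reads two registers and writes none.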
+
+@ @=
+{"TRAP",0x0a,255,0,5,"%r"},@|
+{"FCMP",0x2a,0,0,1,"%l = %.y cmp %.z = %x"},@|
+{"FUN",0x2a,0,0,1,"%l = [%.y(||)%.z] = %x"},@|
+{"FEQL",0x2a,0,0,1,"%l = [%.y(==)%.z] = %x"},@|
+{"FADD",0x2a,0,0,4,"%l = %.y %(+%) %.z = %.x"},@|
+{"FIX",0x26,0,0,4,"%l = %(fix%) %.z = %x"},@|
+{"FSUB",0x2a,0,0,4,"%l = %.y %(-%) %.z = %.x"},@|
+{"FIXU",0x26,0,0,4,"%l = %(fix%) %.z = %#x"},@|
+{"FLOT",0x26,0,0,4,"%l = %(flot%) %z = %.x"},@|
+{"FLOTI",0x25,0,0,4,"%l = %(flot%) %z = %.x"},@|
+{"FLOTU",0x26,0,0,4,"%l = %(flot%) %#z = %.x"},@|
+{"FLOTUI",0x25,0,0,4,"%l = %(flot%) %z = %.x"},@|
+{"SFLOT",0x26,0,0,4,"%l = %(sflot%) %z = %.x"},@|
+{"SFLOTI",0x25,0,0,4,"%l = %(sflot%) %z = %.x"},@|
+{"SFLOTU",0x26,0,0,4,"%l = %(sflot%) %#z = %.x"},@|
+{"SFLOTUI",0x25,0,0,4,"%l = %(sflot%) %z = %.x"},@|
+{"FMUL",0x2a,0,0,4,"%l = %.y %(*%) %.z = %.x"},@|
+{"FCMPE",0x2a,rE,0,4,"%l = %.y cmp %.z (%.b)) = %x"},@|
+{"FUNE",0x2a,rE,0,1,"%l = [%.y(||)%.z (%.b)] = %x"},@|
+{"FEQLE",0x2a,rE,0,4,"%l = [%.y(==)%.z (%.b)] = %x"},@|
+{"FDIV",0x2a,0,0,40,"%l = %.y %(/%) %.z = %.x"},@|
+{"FSQRT",0x26,0,0,40,"%l = %(sqrt%) %.z = %.x"},@|
+{"FREM",0x2a,0,0,4,"%l = %.y %(rem%) %.z = %.x"},@|
+{"FINT",0x26,0,0,4,"%l = %(int%) %.z = %.x"},@|
+{"MUL",0x2a,0,0,10,"%l = %y * %z = %x"},@|
+{"MULI",0x29,0,0,10,"%l = %y * %z = %x"},@|
+{"MULU",0x2a,0,0,10,"%l = %#y * %#z = %#x, rH=%#a"},@|
+{"MULUI",0x29,0,0,10,"%l = %#y * %z = %#x, rH=%#a"},@|
+{"DIV",0x2a,0,0,60,"%l = %y / %z = %x, rR=%a"},@|
+{"DIVI",0x29,0,0,60,"%l = %y / %z = %x, rR=%a"},@|
+{"DIVU",0x2a,rD,0,60,"%l = %#b%0y / %#z = %#x, rR=%#a"},@|
+{"DIVUI",0x29,rD,0,60,"%l = %#b%0y / %z = %#x, rR=%#a"},@|
+{"ADD",0x2a,0,0,1,"%l = %y + %z = %x"},@|
+{"ADDI",0x29,0,0,1,"%l = %y + %z = %x"},@|
+{"ADDU",0x2a,0,0,1,"%l = %#y + %#z = %#x"},@|
+{"ADDUI",0x29,0,0,1,"%l = %#y + %z = %#x"},@|
+{"SUB",0x2a,0,0,1,"%l = %y - %z = %x"},@|
+{"SUBI",0x29,0,0,1,"%l = %y - %z = %x"},@|
+{"SUBU",0x2a,0,0,1,"%l = %#y - %#z = %#x"},@|
+{"SUBUI",0x29,0,0,1,"%l = %#y - %z = %#x"},@|
+{"2ADDU",0x2a,0,0,1,"%l = %#y <<1+ %#z = %#x"},@|
+{"2ADDUI",0x29,0,0,1,"%l = %#y <<1+ %z = %#x"},@|
+{"4ADDU",0x2a,0,0,1,"%l = %#y <<2+ %#z = %#x"},@|
+{"4ADDUI",0x29,0,0,1,"%l = %#y <<2+ %z = %#x"},@|
+{"8ADDU",0x2a,0,0,1,"%l = %#y <<3+ %#z = %#x"},@|
+{"8ADDUI",0x29,0,0,1,"%l = %#y <<3+ %z = %#x"},@|
+{"16ADDU",0x2a,0,0,1,"%l = %#y <<4+ %#z = %#x"},@|
+{"16ADDUI",0x29,0,0,1,"%l = %#y <<4+ %z = %#x"},@|
+{"CMP",0x2a,0,0,1,"%l = %y cmp %z = %x"},@|
+{"CMPI",0x29,0,0,1,"%l = %y cmp %z = %x"},@|
+{"CMPU",0x2a,0,0,1,"%l = %#y cmp %#z = %x"},@|
+{"CMPUI",0x29,0,0,1,"%l = %#y cmp %z = %x"},@|
+{"NEG",0x26,0,0,1,"%l = %y - %z = %x"},@|
+{"NEGI",0x25,0,0,1,"%l = %y - %z = %x"},@|
+{"NEGU",0x26,0,0,1,"%l = %y - %#z = %#x"},@|
+{"NEGUI",0x25,0,0,1,"%l = %y - %z = %#x"},@|
+{"SL",0x2a,0,0,1,"%l = %y << %#z = %x"},@|
+{"SLI",0x29,0,0,1,"%l = %y << %z = %x"},@|
+{"SLU",0x2a,0,0,1,"%l = %#y << %#z = %#x"},@|
+{"SLUI",0x29,0,0,1,"%l = %#y << %z = %#x"},@|
+{"SR",0x2a,0,0,1,"%l = %y >> %#z = %x"},@|
+{"SRI",0x29,0,0,1,"%l = %y >> %z = %x"},@|
+{"SRU",0x2a,0,0,1,"%l = %#y >> %#z = %#x"},@|
+{"SRUI",0x29,0,0,1,"%l = %#y >> %z = %#x"}
+
+@ @=
+{"BN",0x50,0,0,1,"%b<0? %t%g"},@|
+{"BNB",0x50,0,0,1,"%b<0? %t%g"},@|
+{"BZ",0x50,0,0,1,"%b==0? %t%g"},@|
+{"BZB",0x50,0,0,1,"%b==0? %t%g"},@|
+{"BP",0x50,0,0,1,"%b>0? %t%g"},@|
+{"BPB",0x50,0,0,1,"%b>0? %t%g"},@|
+{"BOD",0x50,0,0,1,"%b odd? %t%g"},@|
+{"BODB",0x50,0,0,1,"%b odd? %t%g"},@|
+{"BNN",0x50,0,0,1,"%b>=0? %t%g"},@|
+{"BNNB",0x50,0,0,1,"%b>=0? %t%g"},@|
+{"BNZ",0x50,0,0,1,"%b!=0? %t%g"},@|
+{"BNZB",0x50,0,0,1,"%b!=0? %t%g"},@|
+{"BNP",0x50,0,0,1,"%b<=0? %t%g"},@|
+{"BNPB",0x50,0,0,1,"%b<=0? %t%g"},@|
+{"BEV",0x50,0,0,1,"%b even? %t%g"},@|
+{"BEVB",0x50,0,0,1,"%b even? %t%g"},@|
+{"PBN",0x50,0,0,1,"%b<0? %t%g"},@|
+{"PBNB",0x50,0,0,1,"%b<0? %t%g"},@|
+{"PBZ",0x50,0,0,1,"%b==0? %t%g"},@|
+{"PBZB",0x50,0,0,1,"%b==0? %t%g"},@|
+{"PBP",0x50,0,0,1,"%b>0? %t%g"},@|
+{"PBPB",0x50,0,0,1,"%b>0? %t%g"},@|
+{"PBOD",0x50,0,0,1,"%b odd? %t%g"},@|
+{"PBODB",0x50,0,0,1,"%b odd? %t%g"},@|
+{"PBNN",0x50,0,0,1,"%b>=0? %t%g"},@|
+{"PBNNB",0x50,0,0,1,"%b>=0? %t%g"},@|
+{"PBNZ",0x50,0,0,1,"%b!=0? %t%g"},@|
+{"PBNZB",0x50,0,0,1,"%b!=0? %t%g"},@|
+{"PBNP",0x50,0,0,1,"%b<=0? %t%g"},@|
+{"PBNPB",0x50,0,0,1,"%b<=0? %t%g"},@|
+{"PBEV",0x50,0,0,1,"%b even? %t%g"},@|
+{"PBEVB",0x50,0,0,1,"%b even? %t%g"},@|
+{"CSN",0x3a,0,0,1,"%l = %y<0? %z: %b = %x"},@|
+{"CSNI",0x39,0,0,1,"%l = %y<0? %z: %b = %x"},@|
+{"CSZ",0x3a,0,0,1,"%l = %y==0? %z: %b = %x"},@|
+{"CSZI",0x39,0,0,1,"%l = %y==0? %z: %b = %x"},@|
+{"CSP",0x3a,0,0,1,"%l = %y>0? %z: %b = %x"},@|
+{"CSPI",0x39,0,0,1,"%l = %y>0? %z: %b = %x"},@|
+{"CSOD",0x3a,0,0,1,"%l = %y odd? %z: %b = %x"},@|
+{"CSODI",0x39,0,0,1,"%l = %y odd? %z: %b = %x"},@|
+{"CSNN",0x3a,0,0,1,"%l = %y>=0? %z: %b = %x"},@|
+{"CSNNI",0x39,0,0,1,"%l = %y>=0? %z: %b = %x"},@|
+{"CSNZ",0x3a,0,0,1,"%l = %y!=0? %z: %b = %x"},@|
+{"CSNZI",0x39,0,0,1,"%l = %y!=0? %z: %b = %x"},@|
+{"CSNP",0x3a,0,0,1,"%l = %y<=0? %z: %b = %x"},@|
+{"CSNPI",0x39,0,0,1,"%l = %y<=0? %z: %b = %x"},@|
+{"CSEV",0x3a,0,0,1,"%l = %y even? %z: %b = %x"},@|
+{"CSEVI",0x39,0,0,1,"%l = %y even? %z: %b = %x"},@|
+{"ZSN",0x2a,0,0,1,"%l = %y<0? %z: 0 = %x"},@|
+{"ZSNI",0x29,0,0,1,"%l = %y<0? %z: 0 = %x"},@|
+{"ZSZ",0x2a,0,0,1,"%l = %y==0? %z: 0 = %x"},@|
+{"ZSZI",0x29,0,0,1,"%l = %y==0? %z: 0 = %x"},@|
+{"ZSP",0x2a,0,0,1,"%l = %y>0? %z: 0 = %x"},@|
+{"ZSPI",0x29,0,0,1,"%l = %y>0? %z: 0 = %x"},@|
+{"ZSOD",0x2a,0,0,1,"%l = %y odd? %z: 0 = %x"},@|
+{"ZSODI",0x29,0,0,1,"%l = %y odd? %z: 0 = %x"},@|
+{"ZSNN",0x2a,0,0,1,"%l = %y>=0? %z: 0 = %x"},@|
+{"ZSNNI",0x29,0,0,1,"%l = %y>=0? %z: 0 = %x"},@|
+{"ZSNZ",0x2a,0,0,1,"%l = %y!=0? %z: 0 = %x"},@|
+{"ZSNZI",0x29,0,0,1,"%l = %y!=0? %z: 0 = %x"},@|
+{"ZSNP",0x2a,0,0,1,"%l = %y<=0? %z: 0 = %x"},@|
+{"ZSNPI",0x29,0,0,1,"%l = %y<=0? %z: 0 = %x"},@|
+{"ZSEV",0x2a,0,0,1,"%l = %y even? %z: 0 = %x"},@|
+{"ZSEVI",0x29,0,0,1,"%l = %y even? %z: 0 = %x"}
+
+@ @=
+{"LDB",0x2a,0,1,1,"%l = M1[%#y+%#z] = %x"},@|
+{"LDBI",0x29,0,1,1,"%l = M1[%#y%?+] = %x"},@|
+{"LDBU",0x2a,0,1,1,"%l = M1[%#y+%#z] = %#x"},@|
+{"LDBUI",0x29,0,1,1,"%l = M1[%#y%?+] = %#x"},@|
+{"LDW",0x2a,0,1,1,"%l = M2[%#y+%#z] = %x"},@|
+{"LDWI",0x29,0,1,1,"%l = M2[%#y%?+] = %x"},@|
+{"LDWU",0x2a,0,1,1,"%l = M2[%#y+%#z] = %#x"},@|
+{"LDWUI",0x29,0,1,1,"%l = M2[%#y%?+] = %#x"},@|
+{"LDT",0x2a,0,1,1,"%l = M4[%#y+%#z] = %x"},@|
+{"LDTI",0x29,0,1,1,"%l = M4[%#y%?+] = %x"},@|
+{"LDTU",0x2a,0,1,1,"%l = M4[%#y+%#z] = %#x"},@|
+{"LDTUI",0x29,0,1,1,"%l = M4[%#y%?+] = %#x"},@|
+{"LDO",0x2a,0,1,1,"%l = M8[%#y+%#z] = %x"},@|
+{"LDOI",0x29,0,1,1,"%l = M8[%#y%?+] = %x"},@|
+{"LDOU",0x2a,0,1,1,"%l = M8[%#y+%#z] = %#x"},@|
+{"LDOUI",0x29,0,1,1,"%l = M8[%#y%?+] = %#x"},@|
+{"LDSF",0x2a,0,1,1,"%l = (M4[%#y+%#z]) = %.x"},@|
+{"LDSFI",0x29,0,1,1,"%l = (M4[%#y%?+]) = %.x"},@|
+{"LDHT",0x2a,0,1,1,"%l = M4[%#y+%#z]<<32 = %#x"},@|
+{"LDHTI",0x29,0,1,1,"%l = M4[%#y%?+]<<32 = %#x"},@|
+{"CSWAP",0x3a,0,2,2,"%l = [M8[%#y+%#z]==%a] = %x, %r"},@|
+{"CSWAPI",0x39,0,2,2,"%l = [M8[%#y%?+]==%a] = %x, %r"},@|
+{"LDUNC",0x2a,0,1,1,"%l = M8[%#y+%#z] = %#x"},@|
+{"LDUNCI",0x29,0,1,1,"%l = M8[%#y%?+] = %#x"},@|
+{"LDVTS",0x2a,0,0,1,""},@|
+{"LDVTSI",0x29,0,0,1,""},@|
+{"PRELD",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@|
+{"PRELDI",0x09,0,0,1,"[%#y%?+ .. %#x]"},@|
+{"PREGO",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@|
+{"PREGOI",0x09,0,0,1,"[%#y%?+ .. %#x]"},@|
+{"GO",0x2a,0,0,3,"%l = %#x, -> %#y+%#z"},@|
+{"GOI",0x29,0,0,3,"%l = %#x, -> %#y%?+"},@|
+{"STB",0x1a,0,1,1,"M1[%#y+%#z] = %b, M8[%#w]=%#a"},@|
+{"STBI",0x19,0,1,1,"M1[%#y%?+] = %b, M8[%#w]=%#a"},@|
+{"STBU",0x1a,0,1,1,"M1[%#y+%#z] = %#b, M8[%#w]=%#a"},@|
+{"STBUI",0x19,0,1,1,"M1[%#y%?+] = %#b, M8[%#w]=%#a"},@|
+{"STW",0x1a,0,1,1,"M2[%#y+%#z] = %b, M8[%#w]=%#a"},@|
+{"STWI",0x19,0,1,1,"M2[%#y%?+] = %b, M8[%#w]=%#a"},@|
+{"STWU",0x1a,0,1,1,"M2[%#y+%#z] = %#b, M8[%#w]=%#a"},@|
+{"STWUI",0x19,0,1,1,"M2[%#y%?+] = %#b, M8[%#w]=%#a"},@|
+{"STT",0x1a,0,1,1,"M4[%#y+%#z] = %b, M8[%#w]=%#a"},@|
+{"STTI",0x19,0,1,1,"M4[%#y%?+] = %b, M8[%#w]=%#a"},@|
+{"STTU",0x1a,0,1,1,"M4[%#y+%#z] = %#b, M8[%#w]=%#a"},@|
+{"STTUI",0x19,0,1,1,"M4[%#y%?+] = %#b, M8[%#w]=%#a"},@|
+{"STO",0x1a,0,1,1,"M8[%#y+%#z] = %b"},@|
+{"STOI",0x19,0,1,1,"M8[%#y%?+] = %b"},@|
+{"STOU",0x1a,0,1,1,"M8[%#y+%#z] = %#b"},@|
+{"STOUI",0x19,0,1,1,"M8[%#y%?+] = %#b"},@|
+{"STSF",0x1a,0,1,1,"%(M4[%#y+%#z]%) = %.b, M8[%#w]=%#a"},@|
+{"STSFI",0x19,0,1,1,"%(M4[%#y%?+]%) = %.b, M8[%#w]=%#a"},@|
+{"STHT",0x1a,0,1,1,"M4[%#y+%#z] = %#b>>32, M8[%#w]=%#a"},@|
+{"STHTI",0x19,0,1,1,"M4[%#y%?+] = %#b>>32, M8[%#w]=%#a"},@|
+{"STCO",0x0a,0,1,1,"M8[%#y+%#z] = %b"},@|
+{"STCOI",0x09,0,1,1,"M8[%#y%?+] = %b"},@|
+{"STUNC",0x1a,0,1,1,"M8[%#y+%#z] = %#b"},@|
+{"STUNCI",0x19,0,1,1,"M8[%#y%?+] = %#b"},@|
+{"SYNCD",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@|
+{"SYNCDI",0x09,0,0,1,"[%#y%?+ .. %#x]"},@|
+{"PREST",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@|
+{"PRESTI",0x09,0,0,1,"[%#y%?+ .. %#x]"},@|
+{"SYNCID",0x0a,0,0,1,"[%#y+%#z .. %#x]"},@|
+{"SYNCIDI",0x09,0,0,1,"[%#y%?+ .. %#x]"},@|
+{"PUSHGO",0xaa,0,0,3,"%lrO=%#b, rL=%a, rJ=%#x, -> %#y+%#z"},@|
+{"PUSHGOI",0xa9,0,0,3,"%lrO=%#b, rL=%a, rJ=%#x, -> %#y%?+"}
+
+@ @=
+{"OR",0x2a,0,0,1,"%l = %#y | %#z = %#x"},@|
+{"ORI",0x29,0,0,1,"%l = %#y | %z = %#x"},@|
+{"ORN",0x2a,0,0,1,"%l = %#y |~ %#z = %#x"},@|
+{"ORNI",0x29,0,0,1,"%l = %#y |~ %z = %#x"},@|
+{"NOR",0x2a,0,0,1,"%l = %#y ~| %#z = %#x"},@|
+{"NORI",0x29,0,0,1,"%l = %#y ~| %z = %#x"},@|
+{"XOR",0x2a,0,0,1,"%l = %#y ^ %#z = %#x"},@|
+{"XORI",0x29,0,0,1,"%l = %#y ^ %z = %#x"},@|
+{"AND",0x2a,0,0,1,"%l = %#y & %#z = %#x"},@|
+{"ANDI",0x29,0,0,1,"%l = %#y & %z = %#x"},@|
+{"ANDN",0x2a,0,0,1,"%l = %#y \\ %#z = %#x"},@|
+{"ANDNI",0x29,0,0,1,"%l = %#y \\ %z = %#x"},@|
+{"NAND",0x2a,0,0,1,"%l = %#y ~& %#z = %#x"},@|
+{"NANDI",0x29,0,0,1,"%l = %#y ~& %z = %#x"},@|
+{"NXOR",0x2a,0,0,1,"%l = %#y ~^ %#z = %#x"},@|
+{"NXORI",0x29,0,0,1,"%l = %#y ~^ %z = %#x"},@|
+{"BDIF",0x2a,0,0,1,"%l = %#y bdif %#z = %#x"},@|
+{"BDIFI",0x29,0,0,1,"%l = %#y bdif %z = %#x"},@|
+{"WDIF",0x2a,0,0,1,"%l = %#y wdif %#z = %#x"},@|
+{"WDIFI",0x29,0,0,1,"%l = %#y wdif %z = %#x"},@|
+{"TDIF",0x2a,0,0,1,"%l = %#y tdif %#z = %#x"},@|
+{"TDIFI",0x29,0,0,1,"%l = %#y tdif %z = %#x"},@|
+{"ODIF",0x2a,0,0,1,"%l = %#y odif %#z = %#x"},@|
+{"ODIFI",0x29,0,0,1,"%l = %#y odif %z = %#x"},@|
+{"MUX",0x2a,rM,0,1,"%l = %#b? %#y: %#z = %#x"},@|
+{"MUXI",0x29,rM,0,1,"%l = %#b? %#y: %z = %#x"},@|
+{"SADD",0x2a,0,0,1,"%l = nu(%#y\\%#z) = %x"},@|
+{"SADDI",0x29,0,0,1,"%l = nu(%#y%?\\) = %x"},@|
+{"MOR",0x2a,0,0,1,"%l = %#y mor %#z = %#x"},@|
+{"MORI",0x29,0,0,1,"%l = %#y mor %z = %#x"},@|
+{"MXOR",0x2a,0,0,1,"%l = %#y mxor %#z = %#x"},@|
+{"MXORI",0x29,0,0,1,"%l = %#y mxor %z = %#x"},@|
+{"SETH",0x20,0,0,1,"%l = %#z"},@|
+{"SETMH",0x20,0,0,1,"%l = %#z"},@|
+{"SETML",0x20,0,0,1,"%l = %#z"},@|
+{"SETL",0x20,0,0,1,"%l = %#z"},@|
+{"INCH",0x30,0,0,1,"%l = %#y + %#z = %#x"},@|
+{"INCMH",0x30,0,0,1,"%l = %#y + %#z = %#x"},@|
+{"INCML",0x30,0,0,1,"%l = %#y + %#z = %#x"},@|
+{"INCL",0x30,0,0,1,"%l = %#y + %#z = %#x"},@|
+{"ORH",0x30,0,0,1,"%l = %#y | %#z = %#x"},@|
+{"ORMH",0x30,0,0,1,"%l = %#y | %#z = %#x"},@|
+{"ORML",0x30,0,0,1,"%l = %#y | %#z = %#x"},@|
+{"ORL",0x30,0,0,1,"%l = %#y | %#z = %#x"},@|
+{"ANDNH",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},@|
+{"ANDNMH",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},@|
+{"ANDNML",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},@|
+{"ANDNL",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},@|
+{"JMP",0x40,0,0,1,"-> %#z"},@|
+{"JMPB",0x40,0,0,1,"-> %#z"},@|
+{"PUSHJ",0xe0,0,0,1,"%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},@|
+{"PUSHJB",0xe0,0,0,1,"%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},@|
+{"GETA",0x60,0,0,1,"%l = %#z"},@|
+{"GETAB",0x60,0,0,1,"%l = %#z"},@|
+{"PUT",0x02,0,0,1,"%s = %r"},@|
+{"PUTI",0x01,0,0,1,"%s = %r"},@|
+{"POP",0x80,rJ,0,3,"%lrL=%a, rO=%#b, -> %#y%?+"},@|
+{"RESUME",0x00,0,0,5,"{%#b} -> %#z"},@|
+{"SAVE",0x20,0,20,1,"%l = %#x"},@|
+{"UNSAVE",0x82,0,20,1,"%#z: rG=%x, ..., rL=%a"},@|
+{"SYNC",0x01,0,0,1,""},@|
+{"SWYM",0x00,0,0,1,""},@|
+{"GET",0x20,0,0,1,"%l = %s = %#x"},@|
+{"TRIP",0x0a,255,0,5,"rW=%#w, rX=%#x, rY=%#y, rZ=%#z, rB=%#b, g[255]=%#a"}
+
+@ @=
+{
+ if ((op&0xfe)==JMP) yz=inst&0xffffff;
+ if (op&1) yz-=(op==JMPB? 0x1000000: 0x10000);
+ y=inst_ptr;@+ z=incr(loc,yz<<2);
+}
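
As a rough illustration of the relative-address conversion above, here is a sketch that uses a plain 64-bit integer in place of the simulator's octabyte type. The function name |relative_target| is hypothetical. It assumes that |JMP| and |JMPB| are opcodes 0xf0 and 0xf1, and that an odd opcode denotes a backward reference, as the code above suggests.

```c
#include <stdint.h>

/* Hypothetical sketch: compute the target of a relative-address instruction.
   loc is the address of the instruction itself; inst is its 32-bit encoding. */
uint64_t relative_target(uint64_t loc, uint32_t inst)
{
    unsigned op = inst >> 24;
    int is_jump = (op & 0xfe) == 0xf0;   /* assumed: JMP=0xf0, JMPB=0xf1 */
    int64_t yz = is_jump ? (int64_t)(inst & 0xffffff)   /* 24-bit offset */
                         : (int64_t)(inst & 0xffff);    /* 16-bit offset */
    if (op & 1)                           /* backward form: offset is negative */
        yz -= is_jump ? 0x1000000 : 0x10000;
    return loc + (uint64_t)(yz * 4);      /* offsets count tetrabytes */
}
```

The multiplication by 4 plays the role of |yz<<2| in the original while avoiding a left shift of a negative value.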
+
+@ @=
+if (resuming && rop!=RESUME_AGAIN)
+ @@;
+else {
+ if (f&0x10) @;
+ if (info[op].third_operand) @;
+ if (f&0x1) z.l=zz;
+ else if (f&0x2) @@;
+ else if ((op&0xf0)==SETH) @;
+ if (f&0x4) y.l=yy;
+ else if (f&0x8) @;
+}
+
+@ There are 256 global registers, |g[0]| through |g[255]|; the
+first 32 of them are used for the special registers |rA|, |rB|, etc.
+There are |lring_mask+1| local registers, usually 256 but the
+user can increase this to a larger power of~2 if desired.
+
+The current values of rL, rG, rO, and rS are kept in separate variables
+called |L|, |G|, |O|, and |S| for convenience. (In fact, |O| and |S|
+actually hold the values rO/8 and rS/8, modulo |lring_size|.)
+
+@=
+{
+ if (zz>=G) z=g[zz];
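+ else if (zz<L) z=l[(O+zz)&lring_mask];
+}
+
+@ @=
+{
+ if (yy>=G) y=g[yy];
+ else if (yy<L) y=l[(O+yy)&lring_mask];
+}
+
+@ @=
+{
+ if (xx>=G) b=g[xx];
+ else if (xx<L) b=l[(O+xx)&lring_mask];
+}
+
+@ @=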
+register int G,L,O; /* accessible copies of key registers */
+
+@ @=
+octa g[256]; /* global registers */
+octa *l; /* local registers */
+int lring_size; /* the number of local registers (a power of 2) */
+int lring_mask; /* one less than |lring_size| */
+int S; /* congruent to $\rm rS\GG 3$ modulo |lring_size| */
+
+@ Several of the global registers have constant values, because
+of the way \MMIX\ has been simplified in this simulator.
+
+Special register rN has a constant value identifying the time of compilation.
+(The macro \.{ABSTIME} is defined externally in the file \.{abstime.h},
+which should have just been created by {\mc ABSTIME}\kern.05em;
+{\mc ABSTIME} is
+a trivial program that computes the value of the standard library function
+|time(NULL)|. We assume that this number, which is the number of seconds in
+the ``{\mc UNIX} epoch,'' is less than~$2^{32}$. Beware: Our assumption will
+fail in February of 2106.)
+@^system dependencies@>
+
+@d VERSION 1 /* version of the \MMIX\ architecture that we support */
+@d SUBVERSION 0 /* secondary byte of version number */
+@d SUBSUBVERSION 1 /* further qualification to version number */
+
+@=
+g[rK]=neg_one;
+g[rN].h=(VERSION<<24)+(SUBVERSION<<16)+(SUBSUBVERSION<<8);
+g[rN].l=ABSTIME; /* see comment and warning above */
+g[rT].h=0x80000005;
+g[rTT].h=0x80000006;
+g[rV].h=0x369c2004;
+if (lring_size<256) lring_size=256;
+lring_mask=lring_size-1;
+if (lring_size&lring_mask)
+ panic("The number of local registers must be a power of 2");
+@.The number of local...@>
+l=(octa*)calloc(lring_size,sizeof(octa));
+if (!l) panic("No room for the local registers");
+@.No room...@>
+cur_round=ROUND_NEAR;
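
The power-of-2 test in the initialization above relies on the fact that a positive integer $n$ is a power of~2 exactly when |n&(n-1)| is zero. A minimal sketch of that check follows; the function name |normalize_ring_size| is hypothetical, and returning 0 stands in for the call of |panic|.

```c
/* Hypothetical sketch of the ring-size validation performed above.
   Returns the size that would be used, or 0 where the simulator would panic. */
int normalize_ring_size(int requested)
{
    int size = requested < 256 ? 256 : requested;  /* at least 256 registers */
    int mask = size - 1;
    return (size & mask) ? 0 : size;  /* nonzero AND means not a power of 2 */
}
```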
+
+@ In operations like |INCH|, we want |z| to be the |yz| field,
+shifted left 48 bits. We also want |y| to be register~X, which has
+previously been placed in |b|; then |INCH| can be simulated as if
+it were |ADDU|.
+
+@=
+{
+ switch (op&3) {
+ case 0: z.h=yz<<16;@+break;
+ case 1: z.h=yz;@+break;
+ case 2: z.l=yz<<16;@+break;
+ case 3: z.l=yz;@+break;
+ }
+ y=b;
+}
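
With a single 64-bit integer in place of the two halves |z.h| and |z.l|, the four cases above collapse into one shift. This is only a sketch under that assumption; the name |wyde_operand| is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: place the 16-bit yz field in the wyde selected by
   the low two bits of the opcode (0 = high, 1 = medium high,
   2 = medium low, 3 = low). */
uint64_t wyde_operand(unsigned op, uint16_t yz)
{
    unsigned shift = 48 - 16 * (op & 3);
    return (uint64_t)yz << shift;
}
```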
+
+@ @=
+b=g[info[op].third_operand];
+
+@ @=
+if (xx>=G) {
+ sprintf(lhs,"$%d=g[%d]",xx,xx);
+ x_ptr=&g[xx];
+}@+else {
+ while (xx>=L) @;
+ sprintf(lhs,"$%d=l[%d]",xx,(O+xx)&lring_mask);
+ x_ptr=&l[(O+xx)&lring_mask];
+}
+
+@ @=
+{
+ l[(O+L)&lring_mask]=zero_octa;
+ L=g[rL].l=L+1;
+ if (((S-O-L)&lring_mask)==0) stack_store();
+}
+
+@ The |stack_store| routine advances the ``gamma'' pointer in the
+ring of local registers, by storing the oldest local register into memory
+location~rS and advancing rS.
+
+@d test_store_bkpt(ll) if ((ll)->bkpt&write_bit) breakpoint=tracing=true
+
+@=
+void stack_store @,@,@[ARGS((void))@];@+@t}\6{@>
+void stack_store()
+{
+ register mem_tetra *ll=mem_find(g[rS]);
+ register int k=S&lring_mask;
+ ll->tet=l[k].h;@+test_store_bkpt(ll);
+ (ll+1)->tet=l[k].l;@+test_store_bkpt(ll+1);
+ if (stack_tracing) {
+ tracing=true;
+ if (cur_line) show_line();
+ printf(" M8[#%08x%08x]=l[%d]=#%08x%08x, rS+=8\n",
+ g[rS].h,g[rS].l,k,l[k].h,l[k].l);
+ }
+ g[rS]=incr(g[rS],8), S++;
+}
+
+@ The |stack_load| routine is essentially the inverse of |stack_store|.
+
+@d test_load_bkpt(ll) if ((ll)->bkpt&read_bit) breakpoint=tracing=true
+
+@=
+void stack_load @,@,@[ARGS((void))@];@+@t}\6{@>
+void stack_load()
+{
+ register mem_tetra *ll;
+ register int k;
+ S--, g[rS]=incr(g[rS],-8);
+ ll=mem_find(g[rS]);
+ k=S&lring_mask;
+ l[k].h=ll->tet;@+test_load_bkpt(ll);
+ l[k].l=(ll+1)->tet;@+test_load_bkpt(ll+1);
+ if (stack_tracing) {
+ tracing=true;
+ if (cur_line) show_line();
+ printf(" rS-=8, l[%d]=M8[#%08x%08x]=#%08x%08x\n",
+ k,g[rS].h,g[rS].l,l[k].h,l[k].l);
+ }
+}
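
The interplay of |stack_store| and |stack_load| can be seen in miniature with a toy ring. The sketch below is not the simulator's code: the type |toy_stack|, the ring size of 8, and the small array standing in for simulated memory are all assumptions made for illustration, and breakpoints and tracing are omitted.

```c
#include <stdint.h>

#define RING_SIZE 8u                /* assumed; the simulator uses at least 256 */
#define RING_MASK (RING_SIZE - 1u)

typedef struct {
    uint64_t l[RING_SIZE];  /* ring of local registers */
    uint64_t mem[32];       /* toy stand-in for the stack area in memory */
    unsigned S;             /* number of octabytes spilled, playing the role of rS/8 */
} toy_stack;

/* Spill the oldest local register to memory, then advance S. */
void toy_store(toy_stack *t)
{
    t->mem[t->S] = t->l[t->S & RING_MASK];
    t->S++;
}

/* The inverse: retreat S, then reload the register from memory. */
void toy_load(toy_stack *t)
{
    t->S--;
    t->l[t->S & RING_MASK] = t->mem[t->S];
}
```

Note that the store advances the pointer after writing, while the load retreats before reading, which is what makes the two routines inverses of each other.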
+
+@* Simulating the instructions. The master switch branches in 256
+directions, one for each \MMIX\ instruction.
+
+Let's start with |ADD|, since it is somehow the most typical case---not
+too easy, and not too hard. The task is to compute |x=y+z|, and to
+signal overflow if the sum is out of range. Overflow occurs if and
+only if |y| and |z| have the same sign but the sum has a different sign.
+
+Overflow is one of the eight arithmetic exceptions. We record such
+exceptions in a variable called~|exc|, which is set to
+zero at the beginning of each cycle and used to update~rA at the end.
+
+The main control routine has put the input operands into octabytes
+|y| and~|z|. It has also made |x_ptr| point to the octabyte where the
+result should be placed.
+
+@=
+case ADD: case ADDI: x=w; /* |w=oplus(y,z)| */
+ if (((y.h^z.h)&sign_bit)==0 && ((y.h^x.h)&sign_bit)!=0) exc|=V_BIT;
+store_x: *x_ptr=x;@+break;
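
The overflow rule stated above can be checked in isolation. Here is a sketch that uses a 64-bit unsigned integer in place of the simulator's |octa| pair; the function name |add_overflows| is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch: does the signed 64-bit sum y+z overflow?
   Operands are passed as raw bit patterns so that the addition itself
   is well-defined unsigned arithmetic. */
bool add_overflows(uint64_t y, uint64_t z)
{
    const uint64_t sign_bit = UINT64_C(1) << 63;
    uint64_t x = y + z;                      /* wraps modulo 2^64 */
    return ((y ^ z) & sign_bit) == 0         /* same sign going in */
        && ((y ^ x) & sign_bit) != 0;        /* different sign coming out */
}
```

Operands of opposite sign can never overflow, so the first test alone rules out half of all cases.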
+
+@ Other cases of signed and unsigned addition and subtraction are,
+of course, similar. Overflow occurs in the calculation |x=y-z| if and
+only if it occurs in the calculation |y=x+z|.
+
+@=
+case SUB: case SUBI: case NEG: case NEGI: x=ominus(y,z);
+ if (((x.h^z.h)&sign_bit)==0 && ((x.h^y.h)&sign_bit)!=0) exc|=V_BIT;
+ goto store_x;
+case ADDU: case ADDUI: case INCH: case INCMH: case INCML: case INCL:
+ x=w;@+goto store_x;
+case SUBU: case SUBUI: case NEGU: case NEGUI: x=ominus(y,z);@+goto store_x;
+case IIADDU: case IIADDUI: case IVADDU: case IVADDUI:
+case VIIIADDU: case VIIIADDUI: case XVIADDU: case XVIADDUI:
+ x=oplus(shift_left(y,((op&0xf)>>1)-3),z);@+goto store_x;
+case SETH: case SETMH: case SETML: case SETL: case GETA: case GETAB:
+ x=z;@+goto store_x;
+
+@ Let's get the simple bitwise operations out of the way too.
+
+@=
+case OR: case ORI: case ORH: case ORMH: case ORML: case ORL:
+ x.h=y.h|z.h;@+ x.l=y.l|z.l;@+ goto store_x;
+case ORN: case ORNI:
+ x.h=y.h|~z.h;@+ x.l=y.l|~z.l;@+ goto store_x;
+case NOR: case NORI:
+ x.h=~(y.h|z.h);@+ x.l=~(y.l|z.l);@+ goto store_x;
+case XOR: case XORI:
+ x.h=y.h^z.h;@+ x.l=y.l^z.l;@+ goto store_x;
+case AND: case ANDI:
+ x.h=y.h&z.h;@+ x.l=y.l&z.l;@+ goto store_x;
+case ANDN: case ANDNI: case ANDNH: case ANDNMH: case ANDNML: case ANDNL:
+ x.h=y.h&~z.h;@+ x.l=y.l&~z.l;@+ goto store_x;
+case NAND: case NANDI:
+ x.h=~(y.h&z.h);@+ x.l=~(y.l&z.l);@+ goto store_x;
+case NXOR: case NXORI:
+ x.h=~(y.h^z.h);@+ x.l=~(y.l^z.l);@+ goto store_x;
+
+@ The less simple bit manipulations are almost equally simple,
+given the subroutines of {\mc MMIX-ARITH}.
+The |MUX| operation has three inputs;
+in such cases the inputs appear in |y|, |z|, and~|b|.
+
+@d shift_amt (z.h || z.l>=64? 64: z.l)
+
+@=
+case SL: case SLI: x=shift_left(y,shift_amt);
+ a=shift_right(x,shift_amt,0);
+ if (a.h!=y.h || a.l!=y.l) exc|=V_BIT;
+ goto store_x;
+case SLU: case SLUI: x=shift_left(y,shift_amt);@+goto store_x;
+case SR: case SRI: case SRU: case SRUI:
+ x=shift_right(y,shift_amt,op&0x2);@+goto store_x;
+case MUX: case MUXI:
+ x.h=(y.h&b.h)|(z.h&~b.h);@+ x.l=(y.l&b.l)|(z.l&~b.l);
+ goto store_x;
+case SADD: case SADDI:
+ x.l=count_bits(y.h&~z.h)+count_bits(y.l&~z.l);@+goto store_x;
+case MOR: case MORI:
+ x=bool_mult(y,z,false);@+goto store_x;
+case MXOR: case MXORI:
+ x=bool_mult(y,z,true);@+goto store_x;
+case BDIF: case BDIFI:
+ x.h=byte_diff(y.h,z.h);@+x.l=byte_diff(y.l,z.l);@+goto store_x;
+case WDIF: case WDIFI:
+ x.h=wyde_diff(y.h,z.h);@+x.l=wyde_diff(y.l,z.l);@+goto store_x;
+case TDIF: case TDIFI:@+
+ if (y.h>z.h) x.h=y.h-z.h;
+tdif_l:@+ if (y.l>z.l) x.l=y.l-z.l;@+ goto store_x;
+case ODIF: case ODIFI:@+if (y.h>z.h) x=ominus(y,z);
+ else if (y.h==z.h) goto tdif_l;
+ goto store_x;
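
Two of the cases above, |MUX| and |SADD|, are easy to restate on plain 64-bit integers. The sketch below assumes that representation; the names |mux64| and |sadd64| are hypothetical, and the bit-counting loop is a simple substitute for the |count_bits| routine of {\mc MMIX-ARITH}.

```c
#include <stdint.h>

/* Hypothetical sketch of MUX: take bits of y where the mask m is 1,
   and bits of z where it is 0. */
uint64_t mux64(uint64_t y, uint64_t z, uint64_t m)
{
    return (y & m) | (z & ~m);
}

/* Hypothetical sketch of SADD: count the bit positions where y is 1
   and z is 0. */
int sadd64(uint64_t y, uint64_t z)
{
    int n = 0;
    uint64_t v;
    for (v = y & ~z; v; v &= v - 1)  /* each step clears the lowest 1 bit */
        n++;
    return n;
}
```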
+
+@ When an operation has two outputs, the primary output is placed in~|x|
+and the auxiliary output is placed in~|a|.
+
+@=
+case MUL: case MULI: x=signed_omult(y,z);
+test_overflow:@+if (overflow) exc|=V_BIT;
+ goto store_x;
+case MULU: case MULUI: x=omult(y,z);@+a=g[rH]=aux;@+goto store_x;
+case DIV: case DIVI:@+if (!z.l && !z.h) aux=y, exc|=D_BIT, overflow=false;
+ else x=signed_odiv(y,z);
+ a=g[rR]=aux;@+goto test_overflow;
+case DIVU: case DIVUI: x=odiv(b,y,z);@+a=g[rR]=aux;@+goto store_x;
+
+@ The floating point routines of {\mc MMIX-ARITH} record exceptional
+events in a variable called |exceptions|. Here we simply merge those bits into
+the |exc| variable. The |U_BIT| is not exactly the
+same as ``underflow,'' but the true definition of underflow will be applied
+when |exc| is combined with~rA.
+
+@=
+case FADD: x=fplus(y,z);
+ fin_float: round_mode=cur_round;
+ store_fx: exc|=exceptions;@+ goto store_x;
+case FSUB: a=z;@+if (fcomp(a,zero_octa)!=2) a.h^=sign_bit;
+ x=fplus(y,a);@+goto fin_float;
+case FMUL: x=fmult(y,z);@+goto fin_float;
+case FDIV: x=fdivide(y,z);@+goto fin_float;
+case FREM: x=fremstep(y,z,2500);@+goto fin_float;
+case FSQRT: x=froot(z,y.l);
+ fin_unifloat:@+if (y.h || y.l>4) goto illegal_inst;
+ round_mode=(y.l? y.l: cur_round);@+goto store_fx;
+case FINT: x=fintegerize(z,y.l);@+goto fin_unifloat;
+case FIX: x=fixit(z,y.l);@+goto fin_unifloat;
+case FIXU: x=fixit(z,y.l);@+exceptions&=~W_BIT;@+goto fin_unifloat;
+case FLOT: case FLOTI: case FLOTU: case FLOTUI:
+case SFLOT: case SFLOTI: case SFLOTU: case SFLOTUI:
+ x=floatit(z,y.l,op&0x2,op&0x4);@+goto fin_unifloat;
+
+@ We have now done all of the arithmetic operations except for the
+cases that compare two registers and yield a value of $-1$~or~0~or~1.
+
+@d cmp_zero store_x /* |x| is 0 by default */
+
+@=
+case CMP: case CMPI:@+if ((y.h&sign_bit)>(z.h&sign_bit)) goto cmp_neg;
+ if ((y.h&sign_bit)<(z.h&sign_bit)) goto cmp_pos;
+case CMPU: case CMPUI:@+if (y.h<z.h) goto cmp_neg;
+ if (y.h>z.h) goto cmp_pos;
+ if (y.l=
+int register_truth @,@,@[ARGS((octa,mmix_opcode))@];@+@t}\6{@>
+int register_truth(o,op)
+ octa o;
+ mmix_opcode op;
+{@+register int b;
+ switch ((op>>1) & 0x3) {
+ case 0: b=o.h>>31;@+break; /* negative? */
+ case 1: b=(o.h==0 && o.l==0);@+break; /* zero? */
+ case 2: b=(o.h=
+case CSN: case CSNI: case CSZ: case CSZI:@/
+case CSP: case CSPI: case CSOD: case CSODI:@/
+case CSNN: case CSNNI: case CSNZ: case CSNZI:@/
+case CSNP: case CSNPI: case CSEV: case CSEVI:@/
+case ZSN: case ZSNI: case ZSZ: case ZSZI:@/
+case ZSP: case ZSPI: case ZSOD: case ZSODI:@/
+case ZSNN: case ZSNNI: case ZSNZ: case ZSNZI:@/
+case ZSNP: case ZSNPI: case ZSEV: case ZSEVI:@/
+ x=register_truth(y,op)? z: b;@+goto store_x;
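
The order of the opcodes in the tables above (N, Z, P, OD, then NN, NZ, NP, EV) suggests how one test can serve all of these cases: two opcode bits select a basic condition and a third bit negates it. The following sketch works on a plain 64-bit integer and is written under that assumption; the name |truth_of| is hypothetical, and this is not a transcription of |register_truth|.

```c
#include <stdint.h>

/* Hypothetical sketch: evaluate the condition encoded in an opcode.
   Bits 1-2 of op choose negative/zero/positive/odd; bit 3 negates. */
int truth_of(uint64_t o, unsigned op)
{
    int b;
    switch ((op >> 1) & 0x3) {
    case 0: b = (int)(o >> 63); break;               /* negative? */
    case 1: b = (o == 0); break;                     /* zero? */
    case 2: b = (o != 0 && (o >> 63) == 0); break;   /* positive? */
    default: b = (int)(o & 1); break;                /* odd? */
    }
    return (op & 0x8) ? !b : b;
}
```

The test values used with this sketch assume that |BN| is opcode 0x40, so that |BZ|, |BP|, |BOD|, |BNN|, |BNP|, and |BEV| are 0x42, 0x44, 0x46, 0x48, 0x4c, and 0x4e.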
+
+@ Didn't that feel good, when 32 opcodes reduced to a single case?
+We get to do it one more time. Happiness!
+
+@=
+case BN: case BNB: case BZ: case BZB:@/
+case BP: case BPB: case BOD: case BODB:@/
+case BNN: case BNNB: case BNZ: case BNZB:@/
+case BNP: case BNPB: case BEV: case BEVB:@/
+case PBN: case PBNB: case PBZ: case PBZB:@/
+case PBP: case PBPB: case PBOD: case PBODB:@/
+case PBNN: case PBNNB: case PBNZ: case PBNZB:@/
+case PBNP: case PBNPB: case PBEV: case PBEVB:@/
+ x.l=register_truth(b,op);
+ if (x.l) {
+ inst_ptr=z;
+ good=(op>=PBN);
+ }@+else good=(op=
+case LDB: case LDBI: case LDBU: case LDBUI:@/
+ i=56;@+j=(w.l&0x3)<<3; goto fin_ld;
+case LDW: case LDWI: case LDWU: case LDWUI:@/
+ i=48;@+j=(w.l&0x2)<<3; goto fin_ld;
+case LDT: case LDTI: case LDTU: case LDTUI:@/
+ i=32;@+j=0;@+ goto fin_ld;
+case LDHT: case LDHTI: i=j=0;
+fin_ld: ll=mem_find(w);@+test_load_bkpt(ll);
+ x.h=ll->tet;
+ x=shift_right(shift_left(x,j),i,op&0x2);
+check_ld:@+if (w.h&sign_bit) goto privileged_inst;
+ goto store_x;
+case LDO: case LDOI: case LDOU: case LDOUI: case LDUNC: case LDUNCI:
+ w.l&=-8;@+ ll=mem_find(w);
+ test_load_bkpt(ll);@+test_load_bkpt(ll+1);
+ x.h=ll->tet;@+ x.l=(ll+1)->tet;
+ goto check_ld;
+case LDSF: case LDSFI: ll=mem_find(w);@+test_load_bkpt(ll);
+ x=load_sf(ll->tet);@+ goto check_ld;

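
The shift-left-then-shift-right idiom used by |fin_ld| above isolates one field of a tetrabyte and extends it to 64 bits. A sketch on plain integers follows, assuming big-endian byte order within the tetrabyte as in the code above. The name |load_field| is hypothetical, and |width| is the field size in bytes (1, 2, or~4).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch: extract the field of the given width that contains
   byte address addr from the tetrabyte tet, with sign or zero extension. */
uint64_t load_field(uint32_t tet, unsigned addr, unsigned width, bool is_unsigned)
{
    unsigned i = 64 - 8 * width;                     /* final right shift */
    unsigned j = (addr & 3u & ~(width - 1u)) << 3;   /* bit offset of the field */
    uint64_t x = ((uint64_t)tet << 32) << j;         /* field now at the top */
    uint64_t r = x >> i;                             /* zero-extended result */
    if (!is_unsigned && (x >> 63))
        r |= ~UINT64_C(0) << (64 - i);               /* fill in the sign bits */
    return r;
}
```

The explicit sign fill avoids relying on the behavior of right shifts of negative signed values, which the C standard leaves implementation-defined.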
+
+@ @=
+case STB: case STBI: case STBU: case STBUI:@/
+ i=56;@+j=(w.l&0x3)<<3; goto fin_pst;
+case STW: case STWI: case STWU: case STWUI:@/
+ i=48;@+j=(w.l&0x2)<<3; goto fin_pst;
+case STT: case STTI: case STTU: case STTUI:@/
+ i=32;@+j=0;
+fin_pst: ll=mem_find(w);
+ if ((op&0x2)==0) {
+ a=shift_right(shift_left(b,i),i,0);
+ if (a.h!=b.h || a.l!=b.l) exc|=V_BIT;
+ }
+ ll->tet^=(ll->tet^(b.l<<(i-32-j))) & ((((tetra)-1)<<(i-32))>>j);
+ goto fin_st;
+case STSF: case STSFI: ll=mem_find(w);
+ ll->tet=store_sf(b);@+exc=exceptions;
+ goto fin_st;
+case STHT: case STHTI: ll=mem_find(w);@+ ll->tet=b.h;
+fin_st: test_store_bkpt(ll);
+ w.l&=-8;@+ll=mem_find(w);
+ a.h=ll->tet;@+ a.l=(ll+1)->tet; /* for trace output */
+ goto check_st;
+case STCO: case STCOI: b.l=xx;
+case STO: case STOI: case STOU: case STOUI: case STUNC: case STUNCI:
+ w.l&=-8;@+ll=mem_find(w);
+ test_store_bkpt(ll);@+ test_store_bkpt(ll+1);
+ ll->tet=b.h;@+ (ll+1)->tet=b.l;
+check_st:@+if (w.h&sign_bit) goto privileged_inst;
+ break;
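
The one-line update of |ll->tet| above is a masked merge: only the bits under the mask are replaced by bits of the new value. A sketch of the same idea on a single tetrabyte follows. The name |store_field| is hypothetical, |width| is the field size in bytes (1, 2, or~4), and the shift |i| here corresponds to |i-32| in the code above.

```c
#include <stdint.h>

/* Hypothetical sketch: return tet with the field of the given width at
   byte address addr replaced by the low-order bits of value.
   Memory is big-endian, so byte 0 is the most significant. */
uint32_t store_field(uint32_t tet, uint32_t value, unsigned addr, unsigned width)
{
    unsigned i = 32 - 8 * width;                     /* field width as a shift */
    unsigned j = (addr & 3u & ~(width - 1u)) << 3;   /* bit offset of the field */
    uint32_t mask = ((uint32_t)(UINT32_C(0xffffffff) << i)) >> j;
    return tet ^ ((tet ^ (value << (i - j))) & mask);
}
```

The expression |t^((t^v)&m)| keeps the bits of |t| where |m| is 0 and takes the bits of |v| where |m| is 1, without needing a separate clearing step.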
+
+@ The |CSWAP| operation has elements of both loading and storing.
+We shuffle some of
+the operands around so that they will appear correctly in the trace output.
+
+@=
+case CSWAP: case CSWAPI: w.l&=-8;@+ll=mem_find(w);
+ test_load_bkpt(ll);@+test_load_bkpt(ll+1);
+ a=g[rP];
+ if (ll->tet==a.h && (ll+1)->tet==a.l) {
+ x.h=0, x.l=1;
+ test_store_bkpt(ll);@+test_store_bkpt(ll+1);
+ ll->tet=b.h, (ll+1)->tet=b.l;
+ strcpy(rhs,"M8[%#w]=%#b");
+ }@+else {
+ b.h=ll->tet, b.l=(ll+1)->tet;
+ g[rP]=b;
+ strcpy(rhs,"rP=%#b");
+ }
+ goto check_ld;
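
Stripped of breakpoints and trace strings, the |CSWAP| logic above is a compare-and-swap against the prediction register~rP. The sketch below restates it as a pure function on 64-bit values; the type |cswap_result| and the function |cswap64| are hypothetical, and no atomicity is implied, since the simulation is sequential.

```c
#include <stdint.h>

typedef struct {
    uint64_t mem;  /* the octabyte in memory afterwards */
    uint64_t rP;   /* the prediction register afterwards */
    int x;         /* 1 if the swap happened, otherwise 0 */
} cswap_result;

/* Hypothetical sketch: if memory equals rP, store b and report success;
   otherwise copy the memory contents into rP and report failure. */
cswap_result cswap64(uint64_t mem, uint64_t rP, uint64_t b)
{
    cswap_result r;
    if (mem == rP) {
        r.mem = b;  r.rP = rP;  r.x = 1;
    } else {
        r.mem = mem;  r.rP = mem;  r.x = 0;
    }
    return r;
}
```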
+
+@ The |GET| command is permissive, but |PUT| is restrictive.
+
+@=
+case GET:@+if (yy!=0 || zz>=32) goto illegal_inst;
+ x=g[zz];
+ goto store_x;
+case PUT: case PUTI:@+ if (yy!=0 || xx>=32) goto illegal_inst;
+ strcpy(rhs,"%z = %#z");
+ if (xx>=8) {
+ if (xx<=11) goto illegal_inst; /* can't change rC, rN, rO, rS */
+ if (xx<=18) goto privileged_inst;
+ if (xx==rA) @@;
+ else if (xx==rL) @@;
+ else if (xx==rG) @;
+ }
+ g[xx]=z;@+zz=xx;@+break;
+
+@ @=
+{
+ x=z;@+ strcpy(rhs,z.h? "min(rL,%#x) = %z": "min(rL,%x) = %z");
+ if (z.l>L || z.h) z.h=0, z.l=L;
+ else old_L=L=z.l;
+}
+
+@ @=
+{
+ if (z.h!=0 || z.l>255 || z.l=
+{
+ if (z.h!=0 || z.l>=0x40000) goto illegal_inst;
+ cur_round=(z.l>=0x10000? z.l>>16: ROUND_NEAR);
+}
+
+@ Pushing and popping are rather delicate, because we want to trace
+them coherently.
+
+@=
+case PUSHGO: case PUSHGOI: inst_ptr=w;@+goto push;
+case PUSHJ: case PUSHJB: inst_ptr=z;
+push:@+if (xx>=G) {
+ xx=L++;
+ if (((S-O-L)&lring_mask)==0) stack_store();
+ }
+ x.l=xx;@+l[(O+xx)&lring_mask]=x; /* the ``hole'' records the amount pushed */
+ sprintf(lhs,"l[%d]=%d, ",(O+xx)&lring_mask,xx);
+ x=g[rJ]=incr(loc,4);
+ L-=xx+1;@+ O+=xx+1;
+ b=g[rO]=incr(g[rO],(xx+1)<<3);
+sync_L: a.l=g[rL].l=L;@+break;
+case POP:@+if (xx!=0 && xx<=L) y=l[(O+xx-1)&lring_mask];
+ if (g[rS].l==g[rO].l) stack_load();
+ k=l[(O-1)&lring_mask].l&0xff;
+ while ((tetra)(O-S)<=(tetra)k) stack_load();
+ L=k+(xx<=L? xx: L+1);
+ if (L>G) L=G;
+ if (L>k) {
+ l[(O-1)&lring_mask]=y;
+ if (y.h) sprintf(lhs,"l[%d]=#%x%08x, ",(O-1)&lring_mask,y.h,y.l);
+ else sprintf(lhs,"l[%d]=#%x, ",(O-1)&lring_mask,y.l);
+ }@+else lhs[0]='\0';
+ y=g[rJ];@+ z.l=yz<<2;@+ inst_ptr=oplus(y,z);
+ O-=k+1;@+ b=g[rO]=incr(g[rO],-((k+1)<<3));
+ goto sync_L;
+
+@ To complete our simulation of \MMIX's register stack, we need
+to implement |SAVE| and |UNSAVE|.
+
+@=
+case SAVE:@+if (xx;
+ if (k==255) k=rB;
+ else if (k==rR) k=rP;
+ else if (k==rZ+1) break;
+ else k++;
+ }
+ O=S, g[rO]=g[rS];
+ x=incr(g[rO],-8);@+goto store_x;
+
+@ This part of the program naturally has a lot in common with the
+|stack_store| subroutine. (There's a little white lie in the
+section name; if |k|~is |rZ+1|, we store rG and~rA, not |g[k]|.)
+
+@=
+ll=mem_find(g[rS]);
+if (k==rZ+1) x.h=G<<24, x.l=g[rA].l;
+else x=g[k];
+ll->tet=x.h;@+test_store_bkpt(ll);
+(ll+1)->tet=x.l;@+test_store_bkpt(ll+1);
+if (stack_tracing) {
+ tracing=true;
+ if (cur_line) show_line();
+ if (k>=32) printf(" M8[#%08x%08x]=g[%d]=#%08x%08x, rS+=8\n",
+ g[rS].h,g[rS].l,k,x.h,x.l);
+ else printf(" M8[#%08x%08x]=%s=#%08x%08x, rS+=8\n",
+ g[rS].h,g[rS].l,k==rZ+1? "(rG,rA)": special_name[k],x.h,x.l);
+}
+S++, g[rS]=incr(g[rS],8);
+
+@ @=
+case UNSAVE:@+if (xx!=0 || yy!=0) goto illegal_inst;
+ z.l&=-8;@+g[rS]=incr(z,8);
+ for (k=rZ+1;;) {
+ @;
+ if (k==rP) k=rR;
+ else if (k==rB) k=255;
+ else if (k==G) break;
+ else k--;
+ }
+ S=g[rS].l>>3;
+ stack_load();
+ k=l[S&lring_mask].l&0xff;
+ for (j=0;jG? G: k;
+ g[rL].l=L;@+a=g[rL];
+ g[rG].l=G;@+break;
+
+@ @=
+g[rS]=incr(g[rS],-8);
+ll=mem_find(g[rS]);
+test_load_bkpt(ll);@+test_load_bkpt(ll+1);
+if (k==rZ+1) x.l=G=g[rG].l=ll->tet>>24, a.l=g[rA].l=(ll+1)->tet&0x3ffff;
+else g[k].h=ll->tet, g[k].l=(ll+1)->tet;
+if (stack_tracing) {
+ tracing=true;
+ if (cur_line) show_line();
+ if (k>=32) printf(" rS-=8, g[%d]=M8[#%08x%08x]=#%08x%08x\n",
+ k,g[rS].h,g[rS].l,ll->tet,(ll+1)->tet);
+ else if (k==rZ+1) printf(" (rG,rA)=M8[#%08x%08x]=#%08x%08x\n",
+ g[rS].h,g[rS].l,ll->tet,(ll+1)->tet);
+ else printf(" rS-=8, %s=M8[#%08x%08x]=#%08x%08x\n",
+ special_name[k],g[rS].h,g[rS].l,ll->tet,(ll+1)->tet);
+}
+
+@ The cache maintenance instructions don't affect this simulation,
+because there are no caches. But if the user has invoked them, we do
+provide a bit of information when tracing, indicating the scope of the
+instruction.
+
+@=
+case SYNCID: case SYNCIDI: case PREST: case PRESTI:
+case SYNCD: case SYNCDI: case PREGO: case PREGOI:
+case PRELD: case PRELDI: x=incr(w,xx);@+break;
+
+@ Several loose ends remain to be nailed down.
+% (Incidentally, a ``loose end'' should never be confused with ``Lucent.'')
+
+@=
+case GO: case GOI: x=inst_ptr;@+inst_ptr=w;@+goto store_x;
+case JMP: case JMPB: inst_ptr=z;
+case SWYM: break;
+case SYNC:@+if (xx!=0 || yy!=0 || zz>7) goto illegal_inst;
+ if (zz<=3) break;
+case LDVTS: case LDVTSI: privileged_inst: strcpy(lhs,"!privileged");
+ goto break_inst;
+illegal_inst: strcpy(lhs,"!illegal");
+break_inst: breakpoint=tracing=true;
+ if (!interacting && !interact_after_break) halted=true;
+ break;
+
+@* Trips and traps. We have now implemented 253 of the 256 instructions: all
+but \.{TRIP}, \.{TRAP}, and \.{RESUME}.
+
+The |TRIP| instruction simply turns |H_BIT| on in the |exc| variable;
+this will trigger an interruption to location~0.
+@^interrupts@>
+
+The |TRAP| instruction is not simulated, except for the system calls
+mentioned in the introduction.
+
+@=
+case TRIP: exc|=H_BIT;@+break;
+case TRAP:@+if (xx!=0 || yy>max_sys_call) goto privileged_inst;
+ strcpy(rhs,trap_format[yy]);
+ g[rWW]=inst_ptr;
+ g[rXX].h=sign_bit, g[rXX].l=inst;
+ g[rYY]=y, g[rZZ]=z;
+ z.h=0, z.l=zz;
+ a=incr(b,8);
+ @;
+ switch (yy) {
+case Halt: @;@+g[rBB]=g[255];@+break;
+case Fopen: g[rBB]=mmix_fopen((unsigned char)zz,mb,ma);@+break;
+case Fclose: g[rBB]=mmix_fclose((unsigned char)zz);@+break;
+case Fread: g[rBB]=mmix_fread((unsigned char)zz,mb,ma);@+break;
+case Fgets: g[rBB]=mmix_fgets((unsigned char)zz,mb,ma);@+break;
+case Fgetws: g[rBB]=mmix_fgetws((unsigned char)zz,mb,ma);@+break;
+case Fwrite: g[rBB]=mmix_fwrite((unsigned char)zz,mb,ma);@+break;
+case Fputs: g[rBB]=mmix_fputs((unsigned char)zz,b);@+break;
+case Fputws: g[rBB]=mmix_fputws((unsigned char)zz,b);@+break;
+case Fseek: g[rBB]=mmix_fseek((unsigned char)zz,b);@+break;
+case Ftell: g[rBB]=mmix_ftell((unsigned char)zz);@+break;
+}
+ x=g[255]=g[rBB];@+break;
+
+@ @=
+if (!zz) halted=breakpoint=true;
+else if (zz==1) {
+ if (loc.h || loc.l>=0x90) goto privileged_inst;
+ print_trip_warning(loc.l>>4,incr(g[rW],-4));
+}@+else goto privileged_inst;
+
+@ @=
+char arg_count[]={1,3,1,3,3,3,3,2,2,2,1};
+char *trap_format[]={
+"Halt(%z)",
+"$255 = Fopen(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
+"$255 = Fclose(%!z) = %x",
+"$255 = Fread(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
+"$255 = Fgets(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
+"$255 = Fgetws(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
+"$255 = Fwrite(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
+"$255 = Fputs(%!z,%#b) = %x",
+"$255 = Fputws(%!z,%#b) = %x",
+"$255 = Fseek(%!z,%b) = %x",
+"$255 = Ftell(%!z) = %x"};
+
+@ @=
+if (arg_count[yy]==3) {
+ ll=mem_find(b);@+test_load_bkpt(ll);@+test_load_bkpt(ll+1);
+ mb.h=ll->tet, mb.l=(ll+1)->tet;
+ ll=mem_find(a);@+test_load_bkpt(ll);@+test_load_bkpt(ll+1);
+ ma.h=ll->tet, ma.l=(ll+1)->tet;
+}
+
+@ The input/output operations invoked by \.{TRAP}s are
+done by subroutines in an auxiliary program module called {\mc MMIX-IO}.
+Here we need only declare those subroutines, and write three primitive
+interfaces on which they depend.
+
+@ @=
+extern void mmix_io_init @,@,@[ARGS((void))@];
+extern octa mmix_fopen @,@,@[ARGS((unsigned char,octa,octa))@];
+extern octa mmix_fclose @,@,@[ARGS((unsigned char))@];
+extern octa mmix_fread @,@,@[ARGS((unsigned char,octa,octa))@];
+extern octa mmix_fgets @,@,@[ARGS((unsigned char,octa,octa))@];
+extern octa mmix_fgetws @,@,@[ARGS((unsigned char,octa,octa))@];
+extern octa mmix_fwrite @,@,@[ARGS((unsigned char,octa,octa))@];
+extern octa mmix_fputs @,@,@[ARGS((unsigned char,octa))@];
+extern octa mmix_fputws @,@,@[ARGS((unsigned char,octa))@];
+extern octa mmix_fseek @,@,@[ARGS((unsigned char,octa))@];
+extern octa mmix_ftell @,@,@[ARGS((unsigned char))@];
+extern void print_trip_warning @,@,@[ARGS((int,octa))@];
+extern void mmix_fake_stdin @,@,@[ARGS((FILE*))@];
+
+@ The subroutine |mmgetchars(buf,size,addr,stop)| reads characters
+starting at address |addr| in the simulated memory and stores them
+in |buf|, continuing until |size| characters have been read or
+some other stopping criterion has been met. If |stop<0| there is
+no other criterion; if |stop=0| a null character will also terminate
+the process; otherwise |addr| is even, and two consecutive null bytes
+starting at an even address will terminate the process. The number
+of bytes read and stored, exclusive of terminating nulls, is returned.
+
+@=
+int mmgetchars @,@,@[ARGS((char*,int,octa,int))@];@+@t}\6{@>
+int mmgetchars(buf,size,addr,stop)
+ char *buf;
+ int size;
+ octa addr;
+ int stop;
+{
+ register char *p;
+ register int m;
+ register mem_tetra *ll;
+ register tetra x;
+ octa a;
+ for (p=buf,m=0,a=addr; m<size;) {
+ ll=mem_find(a);@+test_load_bkpt(ll);
+ x=ll->tet;
+ if ((a.l&0x3) || m>size-4) @@;
+ else @@;
+ }
+ return size;
+}
+
+@ @=
+{
+ *p=(x>>(8*((~a.l)&0x3)))&0xff;
+ if (!*p && stop>=0) {
+ if (stop==0) return m;
+ if ((a.l&0x1) && *(p-1)=='\0') return m-1;
+ }
+ p++,m++,a=incr(a,1);
+}
+
+@ @=
+{
+ *p=x>>24;
+ if (!*p && (stop==0 || (stop>0 && x<0x10000))) return m;
+ *(p+1)=(x>>16)&0xff;
+ if (!*(p+1) && stop==0) return m+1;
+ *(p+2)=(x>>8)&0xff;
+ if (!*(p+2) && (stop==0 || (stop>0 && (x&0xffff)==0))) return m+2;
+ *(p+3)=x&0xff;
+ if (!*(p+3) && stop==0) return m+3;
+ p+=4,m+=4,a=incr(a,4);
+}
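+
+@ The stopping rules of |mmgetchars| can be tried out in isolation.
+The following is only a sketch under simplifying assumptions: the
+hypothetical function |get_chars| reads from a flat byte array |mem|
+instead of the simulator's memory tree, handles one byte at a time,
+and takes an array index where the real routine takes an octabyte address.
+

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mmgetchars: mem is a plain byte array and
   addr is an index into it. Returns the number of bytes stored in buf,
   not counting a terminating null (or pair of nulls). */
int get_chars(char *buf, int size, const unsigned char *mem,
              size_t addr, int stop)
{
  int m;
  assert(buf != NULL && mem != NULL && size >= 0);
  for (m = 0; m < size; m++, addr++) {
    buf[m] = (char)mem[addr];
    if (mem[addr] != 0 || stop < 0) continue;  /* only a null can stop us */
    if (stop == 0) return m;                   /* one null byte suffices */
    /* stop>0: need two nulls, the first of them at an even address */
    if ((addr & 1) && m > 0 && buf[m - 1] == '\0') return m - 1;
  }
  return size;                                 /* the size limit was reached */
}
```

+With the bytes \.{a}, \.{b}, 0, \.{c}, 0, 0 starting at an even index,
+this sketch stops after two bytes when |stop=0| and after four bytes
+when |stop>0|, because the lone null at index~2 is the first half of a
+two-byte character.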
+
+@ The subroutine |mmputchars(buf,size,addr)| puts |size| characters
+into the simulated memory starting at address |addr|.
+
+@=
+void mmputchars @,@,@[ARGS((unsigned char*,int,octa))@];@+@t}\6{@>
+void mmputchars(buf,size,addr)
+ unsigned char *buf;
+ int size;
+ octa addr;
+{
+ register unsigned char *p;
+ register int m;
+ register mem_tetra *ll;
+ octa a;
+ for (p=buf,m=0,a=addr; m<size;) {
+ ll=mem_find(a);@+test_store_bkpt(ll);
+ if ((a.l&0x3) || m>size-4) @@;
+ else @;
+ }
+}
+
+@ @=
+{
+ register int s=8*((~a.l)&0x3);
+ ll->tet^=(((ll->tet>>s)^*p)&0xff)<<s;
+ p++,m++,a=incr(a,1);
+}
+
+@ @=
+{
+ ll->tet=(*p<<24)+(*(p+1)<<16)+(*(p+2)<<8)+*(p+3);
+ p+=4,m+=4,a=incr(a,4);
+}
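+
+@ The one-byte store relies on an exclusive-or trick: the difference
+between the old byte and the new byte is shifted into place and XORed
+into the tetrabyte, so the other three bytes are untouched. Here is a
+minimal standalone sketch; the names |put_byte| and |get_byte| are
+hypothetical, and |uint32_t| stands in for the simulator's |tetra| type.
+

```c
#include <assert.h>
#include <stdint.h>

/* Replace the byte of tet selected by the two low-order address bits.
   Byte 0 is the most significant one (big-endian order). */
uint32_t put_byte(uint32_t tet, uint32_t addr_low, unsigned char c)
{
  int s = 8 * (int)((~addr_low) & 0x3);     /* shift amount: 24, 16, 8, or 0 */
  assert(s >= 0 && s <= 24);
  tet ^= (((tet >> s) ^ c) & 0xffu) << s;   /* XOR in (old byte ^ new byte) */
  return tet;
}

/* The matching read of a single byte. */
unsigned char get_byte(uint32_t tet, uint32_t addr_low)
{
  return (unsigned char)((tet >> (8 * ((~addr_low) & 0x3))) & 0xff);
}
```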
+
+@ When standard input is being read by the simulated program at the same time
+as it is being used for interaction, we try to keep the two uses separate
+by maintaining a private buffer for the simulated program's \.{StdIn}.
+Online input is usually transmitted from the keyboard to a \CEE/ program
+a line at a time; therefore an
+|fgets| operation works much better than |fread| when we prompt
+for new input. But there is a slight complication, because |fgets|
+might read a null character before coming to a newline character.
+We cannot deduce the number of characters read by |fgets| simply
+by looking at |strlen(stdin_buf)|.
+
+@=
+char stdin_chr @,@,@[ARGS((void))@];@+@t}\6{@>
+char stdin_chr()
+{
+ register char* p;
+ while (stdin_buf_start==stdin_buf_end) {
+ if (interacting) {
+ printf("StdIn> ");@+fflush(stdout);
+@.StdIn>@>
+ }
+ if (!fgets(stdin_buf,256,stdin))
+ panic("End of file on standard input; use the -f option, not <");
+ stdin_buf_start=stdin_buf;
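+ for (p=stdin_buf;p<stdin_buf+254;p++) if(*p=='\n') break;
+ stdin_buf_end=p+1;
+ }
+ return *stdin_buf_start++;
+}
+
+@ @=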
+char stdin_buf[256]; /* standard input to the simulated program */
+char *stdin_buf_start; /* current position in that buffer */
+char *stdin_buf_end; /* current end of that buffer */
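+
+@ To see why |strlen| is not good enough after |fgets|, consider a line
+that contains a null character before its newline. The sketch below
+scans for the newline instead; |line_length| and its |limit| parameter
+are hypothetical names introduced only for this illustration.
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of characters in the line stored in buf, found by scanning
   for the newline rather than for the first null. At most limit
   positions are examined before giving up. */
size_t line_length(const char *buf, size_t limit)
{
  size_t i;
  assert(buf != NULL);
  for (i = 0; i < limit; i++)
    if (buf[i] == '\n') break;
  return i + 1;   /* one past the newline, or one past the limit */
}
```

+For a buffer holding \.{a}, \.{b}, null, \.{c}, \.{d}, newline,
+|strlen| reports 2 while the scan reports 6.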
+
+@ Just after executing each instruction, we do the following.
+Underflow that is exact and not enabled is ignored. (This applies
+also to underflow that was triggered by |RESUME_SET|.)
+
+@=
+if ((exc&(U_BIT+X_BIT))==U_BIT && !(g[rA].l&U_BIT)) exc &=~U_BIT;
+if (exc) {
+ if (exc&tracing_exceptions) tracing=true;
+ j=exc&(g[rA].l|H_BIT); /* find all exceptions that have been enabled */
+ if (j) @;
+ g[rA].l |= exc>>8;
+}
+
+@ @=
+{
+ tripping=true;
+ for (k=0; !(j&H_BIT); j<<=1, k++) ;
+ exc&=~(H_BIT>>k); /* trips taken are not logged as events */
+ g[rW]=inst_ptr;
+ inst_ptr.h=0, inst_ptr.l=k<<4;
+ g[rX].h=sign_bit, g[rX].l=inst;
+ if ((op&0xe0)==STB) g[rY]=w, g[rZ]=b;
+ else g[rY]=y, g[rZ]=z;
+ g[rB]=g[255];
+ g[255]=g[rJ];
+ if (op==TRIP) w=g[rW], x=g[rX], a=g[255];
+}
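+
+@ The two steps above, filtering out an exact underflow that is not
+enabled and then locating the handler of the leftmost enabled
+exception, can be sketched on plain integers. The bit positions below
+are assumptions taken from the definitions made elsewhere in this
+program (|X_BIT| at bit~8 up through |H_BIT| at bit~16); the function
+names are hypothetical.
+

```c
#include <assert.h>

#define X_BIT (1u << 8)    /* floating inexact (assumed position) */
#define U_BIT (1u << 10)   /* floating underflow (assumed position) */
#define H_BIT (1u << 16)   /* trip (assumed position) */

/* Drop an underflow that is exact and not enabled in enable_bits. */
unsigned filter_underflow(unsigned exc, unsigned enable_bits)
{
  if ((exc & (U_BIT + X_BIT)) == U_BIT && !(enable_bits & U_BIT))
    exc &= ~U_BIT;
  return exc;
}

/* Handler address for the leftmost bit of j, 16 bytes per handler. */
unsigned trip_location(unsigned j)
{
  unsigned k;
  assert((j & 0x1ff00u) != 0);   /* some bit in positions 8..16 is set */
  for (k = 0; !(j & H_BIT); j <<= 1, k++) ;
  return k << 4;
}
```

+Under these assumptions a \.{TRIP} goes to location 0, and the handlers
+for the other exceptions follow at multiples of 16, with floating
+inexact last at location 128.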
+
+@ We are finally ready for the last case.
+
+@=
+case RESUME:@+if (xx || yy || zz) goto illegal_inst;
+inst_ptr=z=g[rW];
+b=g[rX];
+if (!(b.h&sign_bit)) @;
+break;
+
+@ Here we check to see if the ropcode restrictions hold.
+If so, the ropcode will actually be obeyed on the next fetch phase.
+
+@d RESUME_AGAIN 0 /* repeat the command in rX as if in location $\rm rW-4$ */
+@d RESUME_CONT 1 /* same, but substitute rY and rZ for operands */
+@d RESUME_SET 2 /* set r[X] to rZ */
+
+@=
+{
+ rop=b.h>>24; /* the ropcode is the leading byte of rX */
+ switch (rop) {
+ case RESUME_CONT:@+if ((1<<(b.l>>28))&0x8f30) goto illegal_inst;
+ case RESUME_SET: k=(b.l>>16)&0xff;
+ if (k>=L && k<G) goto illegal_inst;
+ case RESUME_AGAIN:@+if ((b.l>>24)==RESUME) goto illegal_inst;
+ break;
+ default: goto illegal_inst;
+ }
+ resuming=true;
+}
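+
+@ The test |(1<<(b.l>>28))&0x8f30| is a compact set-membership check on
+the leading hexadecimal digit of the opcode held in rX. The constant
+|0x8f30| has bits 4, 5, 8, 9, 10, 11, and~15 set, so opcodes that begin
+with those digits are rejected for |RESUME_CONT|. A small sketch, with
+the hypothetical name |resume_cont_forbidden|:
+

```c
#include <assert.h>

/* Nonzero if the opcode in the top byte of inst begins with a hex digit
   in the set {4,5,8,9,A,B,F}, which the mask 0x8f30 encodes bitwise. */
int resume_cont_forbidden(unsigned long inst)
{
  unsigned digit;
  assert(inst <= 0xffffffffUL);          /* a 32-bit instruction word */
  digit = (unsigned)((inst >> 28) & 0xf);
  return ((1u << digit) & 0x8f30u) != 0;
}
```

+An instruction word such as |0x20010203| passes this check, while
+|0x80010203| does not.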
+
+@ @=
+if (rop==RESUME_SET) {
+ op=ORI;
+ y=g[rZ];
+ z=zero_octa;
+ exc=g[rX].h&0xff00;
+ f=X_is_dest_bit;
+}@+else { /* |RESUME_CONT| */
+ y=g[rY];
+ z=g[rZ];
+}
+
+@ We don't want to count the |UNSAVE| that bootstraps the whole process.
+
+@=
+if (g[rU].l || g[rU].h || !resuming) {
+ g[rC].h+=info[op].mems; /* clock goes up by $2^{32}$ for each $\mu$ */
+ g[rC]=incr(g[rC],info[op].oops); /* clock goes up by 1 for each $\upsilon$ */
+ g[rU]=incr(g[rU],1); /* usage counter counts total instructions simulated */
+ g[rI]=incr(g[rI],-1); /* interval timer counts down by 1 only */
+ if (g[rI].l==0 && g[rI].h==0) tracing=breakpoint=true;
+}
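+
+@ The clock layout is easy to picture with a single 64-bit integer:
+mems are counted in the upper half and oops in the lower half, which is
+why |show_stats| can report |g[rC].h| and |g[rC].l| separately. The
+following sketch assumes a |uint64_t| in place of the simulator's
+|octa| pair; the function names and the bound in the assertion are
+hypothetical.
+

```c
#include <assert.h>
#include <stdint.h>

/* Advance a clock laid out like rC: mems in the upper 32 bits,
   oops in the lower 32 bits. */
uint64_t advance_clock(uint64_t clock, unsigned mems, unsigned oops)
{
  assert(mems < 256 && oops < 256);  /* assumed bound on one instruction */
  clock += (uint64_t)mems << 32;     /* 2^32 for each mem */
  clock += oops;                     /* 1 for each oop; carries propagate */
  return clock;
}

uint32_t clock_mems(uint64_t clock) { return (uint32_t)(clock >> 32); }
uint32_t clock_oops(uint64_t clock) { return (uint32_t)(clock & 0xffffffffu); }
```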
+
+@* Tracing. After an instruction has been executed, we often want
+to display its effect. This part of the program prints out a
+symbolic interpretation of what has just happened.
+
+@=
+if (tracing) {
+ if (showing_source && cur_line) show_line();
+ @;
+ @;
+ if (showing_stats || breakpoint) show_stats(breakpoint);
+ just_traced=true;
+}@+else if (just_traced) {
+ printf(" ...............................................\n");
+ just_traced=false;
+ shown_line=-gap-1; /* gap will not be filled */
+}
+
+@ @=
+bool showing_stats; /* should traced instructions also show the statistics? */
+bool just_traced; /* was the previous instruction traced? */
+
+@ @=
+if (resuming && op!=RESUME) {
+ switch (rop) {
+ case RESUME_AGAIN: printf(" (%08x%08x: %08x (%s)) ",
+ loc.h,loc.l,inst,info[op].name);@+break;
+ case RESUME_CONT: printf(" (%08x%08x: %04xrYrZ (%s)) ",
+ loc.h,loc.l,inst>>16,info[op].name);@+break;
+ case RESUME_SET: printf(" (%08x%08x: ..%02x..rZ (SET)) ",
+ loc.h,loc.l,(inst>>16)&0xff);@+break;
+ }
+}@+else {
+ ll=mem_find(loc);
+ printf("%10d. %08x%08x: %08x (%s) ",ll->freq,loc.h,loc.l,inst,info[op].name);
+}
+
+@ This part of the simulator was inspired by ideas of E.~H. Satterthwaite,
+@^Satterthwaite, Edwin Hallowell, Jr.@>
+{\sl Software---Practice and Experience\/ \bf2} (1972), 197--217.
+Online debugging tools have improved significantly since Satterthwaite
+published his work, but good offline tools are still valuable;
+alas, today's algebraic programming languages do not provide tracing
+facilities that come anywhere close to the level of quality that Satterthwaite
+was able to demonstrate for {\mc ALGOL} in 1970.
+
+@=
+if (lhs[0]=='!') printf("%s instruction!\n",lhs+1); /* privileged or illegal */
+else {
+ @;
+ if (z.l==0 && (op==ADDUI||op==ORI)) p="%l = %y = %#x"; /* \.{LDA}, \.{SET} */
+ else p=info[op].trace_format;
+ for (;*p;p++) @;
+ if (exc) printf(", rA=#%05x", g[rA].l);
+ if (tripping) tripping=false, printf(", -> #%02x", inst_ptr.l);
+ printf("\n");
+}
+
+@ Push, pop, and \.{UNSAVE} instructions display changes to rL and rO
+explicitly; otherwise the change is implicit, if |L!=old_L|.
+
+@=
+if (L!=old_L && !(f&push_pop_bit)) printf("rL=%d, ",L);
+
+@ Each \MMIX\ instruction has a {\it trace format\/} string, which defines
+its symbolic representation. For example, the string for \.{ADD} is
+|"%l = %y + %z = %x"|; if the instruction is, say, \.{ADD}~\.{\$1,\$2,\$3}
+with $\$2=5$ and $\$3=8$, and if the stack offset is 100, the trace output
+will be |"$1=l[101] = 5 + 8 = 13"|.
+
+Percent signs (\.\%) induce special format conventions, as follows:
+
+\bull \.{\%a}, \.{\%b}, \.{\%p}, \.{\%q}, \.{\%w}, \.{\%x}, \.{\%y}, and
+\.{\%z} stand for the numeric contents of octabytes |a|, |b|, |ma|, |mb|, |w|,
+|x|, |y|, and~|z|, respectively; a ``style'' character may follow the
+percent sign in this case, as explained below.
+
+\bull \.{\%(} and \.{\%)} are brackets that indicate the mode of
+floating point rounding. If |round_mode=ROUND_NEAR|, |ROUND_OFF|,
+|ROUND_UP|, |ROUND_DOWN|, the corresponding brackets are
+\.(~and~\.), \.[~and~\.], \.\^~and~\.\^, \.\_~and~\.\_.
+Such brackets are placed around a floating point operator;
+for example, floating point addition is denoted
+by `\.{[+]}' when the current rounding mode is rounding-off.
+
+\bull \.{\%l} stands for the string |lhs|, which usually represents the
+``left hand side'' of the
+instruction just performed, formatted as a register number and
+its equivalent in the ring of local registers (e.g., `\.{\$1=l[101]}') or
+as a register number and its equivalent in the array of global registers
+(e.g., `\.{\$255=g[255]}'). The \.{POP} instruction
+uses |lhs| to indicate how the ``hole'' in the register stack was plugged.
+
+\bull \.{\%r} means to switch to string |rhs| and continue formatting
+from there. This mechanism allows us to use variable formats for opcodes like
+\.{TRAP} that have several variants.
+
+\bull \.{\%t} means to print either `\.{Yes, ->loc}' (where \.{loc} is
+the location of the next instruction) or `\.{No}', depending on the
+value of~|x|.
+
+\bull \.{\%g} means to print `\.{ (bad guess)}' if |good| is |false|.
+
+\bull \.{\%s} stands for the name of special register |g[zz]|.
+
+\bull \.{\%?} stands for omission of
+the following operator if |z=0|. For example, the
+memory address of \.{LDBI} is described by `\.{\%\#y\%?+}'; this
+means to treat the address as simply `\.{\%\#y}' if |z=0|,
+otherwise as `\.{\%\#y+\%z}'. This case is used only when
+|z| is a relatively small number (|z.h=0|).
+
+@=
+{
+ if (*p!='%') fputc(*p,stdout);
+ else {
+ style=decimal;
+ char_switch: switch (*++p) {
+ @t\4@>@;
+ default: printf("BUG!!"); /* can't happen */
+ }
+ }
+}
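+
+@ A miniature version of this interpreter may help in reading the
+trace format strings. The sketch below is not the simulator's code: it
+knows only \.{\%x}, \.{\%y}, \.{\%z}, the hexadecimal style character,
+and \.{\%?}, it works on plain unsigned integers, and it writes into a
+caller-supplied buffer. The name |mini_trace| is hypothetical.
+

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand fmt into out (capacity n), substituting x, y, and z. */
void mini_trace(char *out, size_t n, const char *fmt,
                unsigned x, unsigned y, unsigned z)
{
  size_t len = 0;
  const char *p;
  assert(out != NULL && fmt != NULL && n > 0);
  out[0] = '\0';
  for (p = fmt; *p && len + 1 < n; p++) {
    int hex = 0;
    unsigned v;
    if (*p != '%') {                      /* ordinary character */
      out[len++] = *p; out[len] = '\0';
      continue;
    }
    p++;
    if (*p == '#') hex = 1, p++;          /* style character */
    if (*p == '\0') break;
    if (*p == '?') {                      /* omit the operator if z is zero */
      p++;
      if (*p == '\0') break;
      if (z) len += (size_t)snprintf(out + len, n - len, "%c%u", *p, z);
    } else {
      switch (*p) {
      case 'x': v = x; break;
      case 'y': v = y; break;
      case 'z': v = z; break;
      default: continue;                  /* unknown code: skip it */
      }
      if (hex) len += (size_t)snprintf(out + len, n - len, "#%x", v);
      else len += (size_t)snprintf(out + len, n - len, "%u", v);
    }
    if (len >= n) len = n - 1;            /* output was truncated */
  }
}
```

+With |y=32| the format |"%#y%?+"| expands to \.{\#20} when |z=0| and to
+\.{\#20+4} when |z=4|.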
+
+@ Octabytes are printed as decimal numbers unless a
+``style'' character intervenes between the percent sign and the
+name of the octabyte: `\.\#' denotes hexadecimal notation, prefixed by~\.\#;
+`\.0' denotes hexadecimal notation with no prefixed~\.\# and with leading zeros not suppressed;
+`\..' denotes floating decimal notation; and
+`\.!' means to use the names \.{StdIn}, \.{StdOut}, or \.{StdErr}
+if the value is 0, 1, or~2.
+@.StdIn@>
+@.StdOut@>
+@.StdErr@>
+
+@=
+case '#': style=hex;@+ goto char_switch;
+case '0': style=zhex;@+ goto char_switch;
+case '.': style=floating;@+ goto char_switch;
+case '!': style=handle;@+ goto char_switch;
+
+@ @=
+typedef enum {@!decimal,@!hex,@!zhex,@!floating,@!handle} fmt_style;
+
+@ @=
+case 'a': trace_print(a);@+break;
+case 'b': trace_print(b);@+break;
+case 'p': trace_print(ma);@+break;
+case 'q': trace_print(mb);@+break;
+case 'w': trace_print(w);@+break;
+case 'x': trace_print(x);@+break;
+case 'y': trace_print(y);@+break;
+case 'z': trace_print(z);@+break;
+
+@ @=
+fmt_style style;
+char *stream_name[]={"StdIn","StdOut","StdErr"};
+@.StdIn@>
+@.StdOut@>
+@.StdErr@>
+@#
+void trace_print @,@,@[ARGS((octa))@];@+@t}\6{@>
+void trace_print(o)
+ octa o;
+{
+ switch (style) {
+ case decimal: print_int(o);@+return;
+ case hex: fputc('#',stdout);@+print_hex(o);@+return;
+ case zhex: printf("%08x%08x",o.h,o.l);@+return;
+ case floating: print_float(o);@+return;
+ case handle:@+if (o.h==0 && o.l<3) printf("%s",stream_name[o.l]);
+ else print_int(o);@+return;
+ }
+}
+
+@ @=
+case '(': fputc(left_paren[round_mode],stdout);@+break;
+case ')': fputc(right_paren[round_mode],stdout);@+break;
+case 't':@+if (x.l) printf(" Yes, -> #"),print_hex(inst_ptr);
+ else printf(" No");@+break;
+case 'g':@+if (!good) printf(" (bad guess)");@+break;
+case 's': printf("%s",special_name[zz]);@+break;
+case '?': p++;@+if (z.l) printf("%c%d",*p,z.l);@+break;
+case 'l': printf("%s",lhs);@+break;
+case 'r': p=switchable_string;@+break;
+
+@ @d rhs &switchable_string[1]
+
+@=
+char left_paren[]={0,'[','^','_','('}; /* denotes the rounding mode */
+char right_paren[]={0,']','^','_',')'}; /* denotes the rounding mode */
+char switchable_string[48]; /* holds |rhs|; position 0 is ignored */
+ /* |switchable_string| must be able to hold any |trap_format| */
+char lhs[32];
+int good_guesses, bad_guesses; /* branch prediction statistics */
+
+@ @=
+void show_stats @,@,@[ARGS((bool))@];@+@t}\6{@>
+void show_stats(verbose)
+ bool verbose;
+{
+ octa o;
+ printf(" %d instruction%s, %d mem%s, %d oop%s; %d good guess%s, %d bad\n",
+ g[rU].l,g[rU].l==1? "": "s",@|
+ g[rC].h,g[rC].h==1? "": "s",@|
+ g[rC].l,g[rC].l==1? "": "s",@|
+ good_guesses,good_guesses==1? "": "es",bad_guesses);
+ if (!verbose) return;
+ o = halted? incr(inst_ptr,-4): inst_ptr;
+ printf(" (%s at location #%08x%08x)\n",
+ halted? "halted": "now", o.h, o.l);
+}
+
+@* Running the program. Now we are ready to fit the pieces together into a
+working simulator.
+
+@c
+#include
+#include
+#include
+#include
+#include
+#include "abstime.h"
+@@;
+@@;
+@@;
+@@;
+@#
+int main(argc,argv)
+ int argc;
+ char *argv[];
+{
+ @;
+ mmix_io_init();
+ @