% chapter included in forwardcom.tex
\documentclass[forwardcom.tex]{subfiles}
\begin{document}
\RaggedRight

\chapter{Instruction lists}\label{chap:InstructionLists}
The ForwardCom instructions are listed in a comma-separated file instruction\_list.csv. This file is intended for use by assemblers, disassemblers, debuggers and emulators. The list is preliminary and subject to change. Please remember to keep the lists in this document and the list in the instruction\_list.csv file synchronized.
\vv

The instruction list file has the following fields:

\begin{longtable} {|p{18mm}|p{100mm}|}
\caption{Fields in instruction list file}
\label{table:fieldsInInstructionListFile}
\\
\hline
\bfseries Field & \bfseries Meaning  \\
\hline
Name & Name of instruction as used by assembler.  \\
\hline
Category & 1: single format instruction, \newline
2: unused,  \newline
3: multi-format instruction,  \newline
4: jump instruction. \\
\hline
Formats & See table \ref{table:MeaningOfFormatsFieldInInstructionListFile} below.  \\
\hline
Template & 0xA - 0xE for template A - E,  \newline
0x0 for multiple templates. \\
\hline
Variant &
D0:  No destination operand, no operand type.\newline
D1:  No destination operand, but operand type specified.\newline
D2:  Operand type ignored.\newline
D3:  Destination register used for other purpose.\newline
F0:  Can have mask register, but not fallback register.\newline
F1:  Can have fallback register without mask register.\newline
I2:  Immediate source operand is integer regardless of specified operand type.\newline
M0:  Memory operand is destination.\newline
%M1:  E formats with a memory operand use IM3 as an extra immediate operand.\newline (obsolete)
On: n bits of IM3 in E template format used for options (IM3 can be used for shift count only if it is not used for options).\newline
R0:  Destination is a general purpose register.\newline
R1:  First source operand is a general purpose register.\newline
R2:  Second source operand is a general purpose register.\newline
RL:  RT is a general purpose register specifying vector length.\newline
U0:  Integer operands are unsigned.\newline
U3:  Integer operands are unsigned if option bit 3 is set.\newline \hspace{6mm}
(compare instruction).\newline
H0:  Half precision floating point instruction.\newline
X0:  Source register can be a special pointer (threadp, datap, ip).\newline
X1:  Source register is special register.\newline
X2:  Source register is capabilities register.\newline
X3:  Source register is performance monitor register.\newline
X4:  Source register is system register.\newline
Y0-4: Destination register is one of the above.
\\ \hline
Source operands & Number of source operands, including register, memory and immediate operands, but not including mask, option bits, vector length, and index. \\
\hline
OP1 & Operation code OP1. \\
\hline
OP2 & Additional operation code OP2. Zero if none. \\
\hline
Operand types general purpose registers & Hexadecimal number indicating required and optional support for each operand type with general purpose registers. See table \ref{table:OperandTypesInInstructionList} below for meaning of each bit. \\
\hline
Operand types scalar & Hexadecimal number indicating required and optional support for each operand type for scalar operations in vector registers. See table \ref{table:OperandTypesInInstructionList} below for meaning of each bit. \\
\hline
Operand types vector & Hexadecimal number indicating required and optional support for each operand type for vector operations. See table \ref{table:OperandTypesInInstructionList} below for meaning of each bit. \\
\hline
Immediate operand type & Type of immediate operand for single-format instructions. See table \ref{table:immediateOperantTypesInInstructionList} below. \\
\hline
Description & Description of the instruction and comments. \\
\hline
\end{longtable}

\pagebreak % The text in the multirow box below disappears if there is a page break in it.
% Put page break here instead to prevent this
\label{table_format_field_in_list}
\begin{longtable} {|p{18mm}|p{20mm} p{80mm}|}
\caption{Meaning of formats field in instruction list file}
\label{table:MeaningOfFormatsFieldInInstructionListFile}
\\
\hline
\bfseries Category & \multicolumn{2}{|l|}{\bfseries Interpretation of formats field} \\
\hline
1.  Single format instruction & \multicolumn{2}{|p{102mm}|}{
Number with three hexadecimal digits. \newline
The leftmost digit is the value of the IL field (0-3). \newline
The middle digit is the value of the mode field or the combined M+mode field (0-9).\newline
The rightmost digit is the sub-mode defined by OP2 in E template modes or OP1 in mode 2.5.x. Zero otherwise. \newline
For example 0x223 means format 2.2.3.
}  \\
\hline
% \pagebreak % The text in the multirow box disappears if there is a page break in it.
% Put pagebreak here to prevent this, or before the table
% \hline
\multirow{27}{*}{\parbox[t]{18mm}{3. Multi-format instruction}}
&  \multicolumn{2}{|l|}{
Hexadecimal number composed of one bit for each format supported:} \\
&  0x0000001 & Format 0.0: three general purpose registers. \\
&  0x0000002 & Format 0.1: two general purpose registers, 8-bit immediate. \\
&  0x0000004 & Format 0.2: Three vector registers. \\
&  0x0000008 & Format 0.3: Two vectors, 8-bit immediate. \\
&  0x0000010 & Format 0.4: One vector, memory operand. \\
&  0x0000020 & Format 0.5: One vector, memory operand with negative index. \\
&  0x0000040 & Format 0.6: One vector, scalar memory operand with index. \\
&  0x0000080 & Format 0.7: One vector, scalar memory operand with 8-bit offset. \\
&  0x0000100 & Format 0.8: One g. p. register, memory operand with index. \\
&  0x0000200 & Format 0.9: One g. p. register, memory operand with 8-bit offset. \\

&  0x0001000 & Format 2.8: Three g. p. registers, 32-bit immediate. \\
&  0x0002000 & Format 2.1: Two g. p. registers, memory with 32-bit offset. \\
&  0x0004000 & Format 2.3: Three vector registers, 32-bit immediate. \\
&  0x0008000 & Format 2.4: One vector register, memory with 32-bit offset. \\

&  0x0010000 & Format 2.0.0: Three g. p. reg., memory with 16-bit offset. \\
&  0x0020000 & Format 2.0.1: Two g. p. reg., memory with unscaled index. \\
&  0x0040000 & Format 2.0.2: Two g. p. reg., memory with scaled index. \\
&  0x0080000 & Format 2.0.3: Two g. p. reg., memory with index and limit.\\
&  0x0400000 & Format 2.0.6: Four g. p. reg.\\
&  0x0800000 & Format 2.0.7: Three g. p. registers, 16-bit shifted immediate. \\

&  0x1000000 & Format 2.2.0: Two vector reg., scalar memory w. 16-bit offset. \\
&  0x2000000 & Format 2.2.1: Two vector reg., memory with 16-bit offset. \\
&  0x4000000 & Format 2.2.2: Two vector reg., memory with negative index. \\
&  0x8000000 & Format 2.2.3: Two vector reg., scalar memory w. index and limit. \\
& 0x40000000 & Format 2.2.6: Four vector reg. \\
& 0x80000000 & Format 2.2.7: Three vector registers, 16-bit shifted immediate.\\

& 0x100000000 & Format 3.8:   Three g. p. registers, 64-bit immediate. \\
& 0x40000 0000 & Format 3.3:   Three vector registers, 64-bit immediate. \\

&  0x100000 0000 & Format 3.0.0: Three g. p. reg., memory with 32-bit offset. \\
&  0x800000 0000 & Format 3.0.3: Two g. p. reg., memory with index and 32-bit limit.\\
&  0x2000000 0000 & Format 3.0.5: One g. p. reg., memory with index and 16-bit offset, 32-bit immediate.\\
&  0x8000000 0000 & Format 3.0.7: Three g. p. registers, 32-bit shifted immediate. \\

&  0x10000000 0000 & Format 3.2.0: Two vector reg., scalar memory w. 32-bit offset. \\
&  0x20000000 0000 & Format 3.2.1: Two vector reg., memory with 32-bit offset. \\
&  0x80000000 0000 & Format 3.2.3: Two vector reg., scalar memory w. index and 32-bit limit. \\
&  0x200000000 0000 & Format 3.2.5: One vector reg., memory with 16-bit offset, and 32-bit immediate. \\
&  0x800000000 0000 & Format 3.2.7: Three vector registers, float or 32-bit shifted immediate.\\

\hline

\multirow{12}{*}{\parbox[t]{18mm}{4. Jump instruction}}
&  \multicolumn{2}{|l|}{
Hexadecimal number composed of one bit for each format supported:} \\
&  0x00001 & Format 1.6.0 B: Two registers, 8 bit offset. \\
&  0x00002 & Format 1.7.1 C: One register, 8 bit immediate, 8 bit offset. \\
&  0x00010 & Format 2.5.0 A: Three registers, 24 bit offset. \\
&  0x00020 & Format 2.5.1 B: Two registers, 16 bit immediate, 16 bit offset. \\
&  0x00040 & Format 2.5.2 B: One register, memory operand with 16 bit address, 16 bit  offset. \\
&  0x00080 & Format 2.5.3 B: Unused. \\
&  0x00100 & Format 2.5.4 C: One register, 8 bit immediate, 32 bit offset. \\
&  0x00200 & Format 2.5.5 C: One register, 32 bit immediate, 8 bit offset. \\
&  0x01000 & Format 3.1.0 A: Two registers, memory operand w 32 bit address, 24 bit offset. \\
&  0x02000 & Format 3.1.1 B: Two registers, 32 bit immediate, 32 bit offset. \\
&  0x10000 & Format 1.6.1 B: Memory operand with 8 bit offset. \\
&  0x20000 & Format 1.6.2 A: Reg. and memory w. scaled index. \\
&  0x40000 & Format 1.6.3 A: Three registers. \\
&  0x100000 & Format 1.7.0 D: No register, 24 bit address. \\
%  &  0x200000 & Format 1.7.2 C: 16-bit offset. Unused \\
&  0x400000 & Format 1.7.3 C: One register. \\
&  0x800000 & Format 1.7.4 C: 16 bit immediate. \\
&  0x1000000 & Format 1.7.5 C: 16 bit fixed immediate. \\
&  0x2000000 & Format 1.7.A C: Format 1.7 with 64 bit operand size. \\
&  0x10000000 & Format 2.5.1 X: Two registers, 2x16 bit immediate. \\
&  0x20000000 & Format 2.5.2 X: One register, memory operand with 32 bit offset. \\
&  0x40000000 & Format 2.5.4 X: 64 bit operand size. \\
&  0x80000000 & Format 2.5.5 X: Conditional trap. \\
&  0x100000000 & Format 2.5.7 C: System call, 16 bit function, 32 bit module. \\
& 0x1000000 0000 & Format 3.1.1 X: System call, 32 bit function, 32 bit module. \\
\hline
\end{longtable}
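
As a minimal sketch of how a tool might interpret the formats field, the following Python fragment decodes a category 1 value into its three hexadecimal digits and a category 3 value into the list of formats whose bits are set. The function names and the dictionary are hypothetical and not part of the ForwardCom tools. Only a subset of the bit assignments from the table above is included.

```python
# Illustrative decoder for the "Formats" field.
# All names here are hypothetical; they are not taken from the ForwardCom tools.

# Subset of the category 3 (multi-format) bit assignments from the table above.
MULTI_FORMAT_BITS = {
    0x0000001: "0.0",
    0x0000002: "0.1",
    0x0000004: "0.2",
    0x0000008: "0.3",
    0x0000010: "0.4",
    0x0000020: "0.5",
    0x0000040: "0.6",
    0x0000080: "0.7",
    0x0000100: "0.8",
    0x0000200: "0.9",
    0x0001000: "2.8",
    0x0002000: "2.1",
    0x0004000: "2.3",
    0x0008000: "2.4",
}


def decode_single_format(value):
    """Category 1: split three hex digits into IL, mode and sub-mode."""
    il = (value >> 8) & 0xF        # leftmost digit: IL field
    mode = (value >> 4) & 0xF      # middle digit: mode or M+mode field
    sub_mode = value & 0xF         # rightmost digit: sub-mode
    return "{:X}.{:X}.{:X}".format(il, mode, sub_mode)


def decode_multi_format(mask):
    """Category 3: list the formats whose bit is set in the mask."""
    return [name for bit, name in MULTI_FORMAT_BITS.items() if mask & bit]


print(decode_single_format(0x223))     # 2.2.3, the example given in the table
print(decode_multi_format(0x0000105))  # formats 0.0, 0.2 and 0.8
```

The jump instruction formats of category 4 could be handled in the same way with a second dictionary.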

\begin{longtable} {|p{18mm}|p{100mm}|}
\caption{
Indication of operand types supported for general purpose registers, scalars in vector
registers, or vectors. The value is a hexadecimal number composed of one bit for each operand
type supported}
\label{table:OperandTypesInInstructionList} \\
\hline
0x0001 & 8-bit integer supported. \\
0x0002 & 16-bit integer supported. \\
0x0004 & 32-bit integer supported. \\
0x0008 & 64-bit integer supported. \\
0x0010 & 128-bit integer supported. \\
0x0020 & single precision floating point supported. \\
0x0040 & double precision floating point supported. \\
0x0080 & quadruple precision floating point supported. \\
0x0100 & 8-bit integer optionally supported. \\
0x0200 & 16-bit integer optionally supported. \\
0x0400 & 32-bit integer optionally supported. \\
0x0800 & 64-bit integer optionally supported. \\
0x1000 & 128-bit integer optionally supported. \\
0x2000 & single precision floating point optionally supported. \\
0x4000 & double precision floating point optionally supported. \\
0x8000 & quadruple precision floating point optionally supported. \\
\hline
\end{longtable}
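
The operand type fields can be decoded in a similar way. The sketch below splits such a value into required and optionally supported types. It assumes, as the table above indicates, that the lower eight bits mean required support and the upper eight bits mean optional support for the same eight types. The short type names and the function name are hypothetical.

```python
# Illustrative decoder for the operand type bit fields.
# The short type names and the function name are hypothetical.

OPERAND_TYPES = [
    "int8", "int16", "int32", "int64", "int128",
    "float32", "float64", "float128",
]


def decode_operand_types(mask):
    """Return (required, optional) lists of operand type names."""
    # Bits 0-7: type is supported.
    required = [name for i, name in enumerate(OPERAND_TYPES)
                if mask & (1 << i)]
    # Bits 8-15: type is optionally supported.
    optional = [name for i, name in enumerate(OPERAND_TYPES)
                if mask & (1 << (i + 8))]
    return required, optional


required, optional = decode_operand_types(0x100F)
print(required)  # the four integer sizes from 8 to 64 bits
print(optional)  # 128-bit integer
```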

\begin{longtable} {|p{18mm}|p{100mm}|}
\caption{
Immediate operand type for single-format instructions}
\label{table:immediateOperantTypesInInstructionList}
\\
\hline
0 & none or multi-format. \\
% 1 & 4-bit signed integer. \\
2 & 8-bit signed integer. \\
3 & 16-bit signed integer. \\
4 & 32-bit signed integer. \\
5 & 64-bit signed integer. \\
6 & 8-bit signed integer shifted by specified count. \\
7 & 16-bit signed integer shifted by specified count. \\
8 & 16-bit signed integer shifted by 16. \\
9 & 32-bit signed integer shifted by 32. \\
% 17 & 4-bit unsigned integer. \\
18 & 8-bit unsigned integer. \\
19 & 16-bit unsigned integer. \\
20 & 32-bit unsigned integer. \\
21 & 64-bit unsigned integer. \\
24 & two 8-bit unsigned integers. \\
25 & two 8-bit and one 6-bit unsigned integers. \\
26 & two 16-bit unsigned integers. \\
27 & one 16-bit and one 32-bit unsigned integer. \\
28 & two 32-bit unsigned integers. \\
29 & one 16-bit and two 8-bit unsigned integers. \\
% 33 & 4-bit unsigned integer converted to float. \\
34 & 8-bit signed integer converted to float. \\
35 & 16-bit signed integer converted to float. \\
64 & half precision floating point. \\
65 & single precision floating point. \\
66 & double precision floating point. \\
100 & determined by operand type. \\
in & a number prefixed by 'i' indicates an implicit value.
The implicit immediate operand with this value does not need to be written in the assembly code. \\
\hline
\end{longtable}
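
The following sketch shows one possible way to interpret the immediate operand type field, including the implicit values marked with an 'i' prefix. It assumes that the type codes are written as decimal numbers and that the prefix is a lower case 'i'; neither detail is confirmed by the table above. The names are hypothetical, and only some of the type codes are included.

```python
# Illustrative parser for the "Immediate operand type" field.
# Assumptions: decimal type codes and a lower case 'i' prefix for implicit values.
# All names are hypothetical.

IMMEDIATE_TYPES = {
    0: "none or multi-format",
    2: "8-bit signed integer",
    3: "16-bit signed integer",
    4: "32-bit signed integer",
    5: "64-bit signed integer",
    18: "8-bit unsigned integer",
    19: "16-bit unsigned integer",
    20: "32-bit unsigned integer",
    21: "64-bit unsigned integer",
    64: "half precision floating point",
    65: "single precision floating point",
    66: "double precision floating point",
    100: "determined by operand type",
}


def parse_immediate_field(field):
    """Return (description, implicit_value); implicit_value is None if absent."""
    text = field.strip()
    if text.startswith("i"):
        # A number prefixed by 'i' is an implicit immediate value.
        return "implicit value", int(text[1:])
    return IMMEDIATE_TYPES.get(int(text), "unknown"), None


print(parse_immediate_field("18"))
print(parse_immediate_field("i5"))
```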

Jump instructions are listed on page \pageref{table:controlTransferInstructions}. All other categories of instructions are listed in the following tables.

\section{List of multi-format instructions}
The following list covers general instructions that can be coded in most or all of the formats
assigned to multi-format instructions.

\begin{longtable} {|p{25mm}|p{12mm}|p{12mm}|p{100mm}|}
\caption{
List of multi-format instructions}
\label{table:ListOfMultiFormatInstructions} \\
\hline
\bfseries Instruction & \bfseries OP1 & \bfseries Source ope-rands & \bfseries Description \\
\hline
nop          &  0 & 0 & No operation. \\
store        &  1 & 1 & Store value to memory. \\
move         &  2 & 1 & Copy value. \\
prefetch     &  3 & 1 & Prefetch from memory. \\
sign\_extend &  4 & 1 & Sign-extend smaller integer to 64 bits. \\
sign\_extend\_ add & 5 & 2 & Sign-extend smaller integer to 64 bits and add 64-bit register. \\
compare      &  7 & 2 & Compare. Uses condition codes, see p. \pageref{table:conditionCodesForCompareInstruction}. \\
add          &  8 & 2 & src1 + src2. \\
sub          &  9 & 2 & src1 - src2. \\
sub\_rev     & 10 & 2 & src2 - src1. \\
mul          & 11 & 2 & src1 $\cdot$ src2. \\
mul\_hi      & 12 & 2 & (src1 $\cdot$ src2) $>>$ OS, signed (integer only). \\
mul\_hi\_u   & 13 & 2 & (src1 $\cdot$ src2) $>>$ OS, unsigned (integer only). \\
div          & 14 & 2 & src1 / src2, signed division (optional for integer vectors). \\
div\_u       & 15 & 2 & src1 / src2, unsigned integer division (optional for vectors). \\
div\_rev     & 16 & 2 & src2 / src1, signed division (optional for integer vectors). \\
rem          & 18 & 2 & Modulo or remainder, signed (optional for integer vectors). \\
rem\_u       & 19 & 2 & Modulo or remainder, unsigned (optional for integer vectors). \\
min          & 20 & 2 & Signed minimum. \\
min\_u       & 21 & 2 & Minimum. Unsigned for integers, abs for f.p. \\
max          & 22 & 2 & Signed maximum. \\
max\_u       & 23 & 2 & Maximum. Unsigned for integers, abs for f.p. \\
and          & 26 & 2 & src1 \& src2. \\
or           & 27 & 2 & src1 \textbar{} src2. \\
xor          & 28 & 2 & src1 \^{} src2. \\
mul\_2pow    & 32 & 2 & src1 * $2^{src2}$. Multiply by integer power of 2. Floating point only. \\
shift\_left  & 32 & 2 & src1 $<<$ src2. Shift left. Integer only. \\
rotate       & 33 & 2 & Rotate left if src2 positive, right if negative. \\
shift\_right\_s & 34 & 2 & src1 $>>$ src2. Integer shift right with sign extension.\\
shift\_right\_u & 35 & 2 & src1 $>>$ src2. Integer shift right with zero extension.\\
clear\_bit   & 36 & 2 & Clear bit. src1 \& \~{} (1 $<<$ src2). \\
set\_bit     & 37 & 2 & Set bit. src1 \textbar{} (1 $<<$ src2). \\
toggle\_bit  & 38 & 2 & Toggle bit. src1 \^{} (1 $<<$ src2). \\
test\_bit    & 39 & 2 & Test single bit. (src1 $>>$ src2) \& 1. \\
test\_bits\_and & 40 & 2 & Test if all indicated bits are 1. (src1 \& src2) == src2 \\
test\_bits\_or   & 41 & 2 & Test if at least one indicated bit is 1. (src1 \& src2) != 0 \\
add          & 44 & 2 & src1 + src2 (float16, optional). \\
sub          & 45 & 2 & src1 - src2 (float16, optional). \\
mul          & 46 & 2 & src1 * src2 (float16, optional). \\
mul\_add     & 48 & 3 & $\pm$ src1 $\cdot$ src2 $\pm$ src3 (float16, optional). \\
mul\_add     & 49 & 3 & $\pm$ src1 $\cdot$ src2 $\pm$ src3 (optional). \\
mul\_add2    & 50 & 3 & $\pm$ src1 $\cdot$ src3 $\pm$ src2 (optional). \\
add\_add     & 51 & 3 & $\pm$ src1 $\pm$ src2 $\pm$ src3 (optional). \\
select\_bits & 52 & 3 & src1 \& src3 \textbar{} src2 \& \~{}src3 \\
funnel\_shift & 53 & 3 & Concatenate src1 and src2 and shift right by src3. \\
userdef56 - userdef62
& 56-62 & 2 & Reserved for user-defined instructions. \\
undef        & 63 & 2 & Undefined code. Generates trap. \\
\hline
\end{longtable}

\section{List of single-format instructions}
These instructions are mostly available in only one or a few formats.

\begin{longtable} {|p{25mm}|p{14mm}|p{10mm}|p{95mm}|}
\caption{List of single-format instructions with general purpose registers}
\label{table:ListOfSingleFormatInstructionsGP} \\
\hline
\bfseries Instruction & \bfseries Format &\bfseries OP1 & \bfseries Description \\
\hline
move          & 1.1 C &  0 & Move 16-bit sign-extended constant to 32-bit general purpose register. \\
move          & 1.1 C &  1 & Move 16-bit sign-extended constant to 64-bit general purpose register. \\
move          & 1.1 C &  3 & Move 16-bit zero-extended constant to 64-bit general purpose register. \\
move          & 1.1 C & 4  & RD = IM2 \textless\textless{} IM1. Sign-extend IM2 to 32 bits and shift left by the unsigned value IM1. \\
move          & 1.1 C & 5  & RD = IM2 \textless\textless{} IM1. Sign-extend IM2 to 64 bits and shift left by the unsigned value IM1. \\
add           & 1.1 C &  6  & Add 16-bit sign-extended constant to 32-bit general purpose register. \\
mul           & 1.1 C &  8  & Multiply 32-bit general purpose register by 16-bit sign-extended constant. \\
add           & 1.1 C & 10  & RD += IM2 \textless\textless{} IM1. Sign-extend IM2 to 32 bits, shift left by the unsigned value IM1, add to RD. \\
add           & 1.1 C & 11  & RD += IM2 \textless\textless{} IM1. Sign-extend IM2 to 64 bits, shift left by the unsigned value IM1, add to RD. \\
and           & 1.1 C & 12  & RD \&= IM2 \textless\textless{} IM1. Sign-extend IM2 to 32 bits, shift left by the unsigned value IM1, AND with RD. \\
and           & 1.1 C & 13  & RD \&= IM2 \textless\textless{} IM1. Sign-extend IM2 to 64 bits, shift left by the unsigned value IM1, AND with RD. \\
or            & 1.1 C & 14  & RD \textbar{}= IM2 \textless\textless{} IM1. Sign-extend IM2 to 32 bits, shift left by the unsigned value IM1, OR with RD. \\
or            & 1.1 C & 15  & RD \textbar{}= IM2 \textless\textless{} IM1. Sign-extend IM2 to 64 bits, shift left by the unsigned value IM1, OR with RD. \\
xor           & 1.1 C & 16  & RD \^{}= IM2 \textless\textless{} IM1. Sign-extend IM2 to 32 bits, shift left by the unsigned value IM1, XOR with RD. \\
xor           & 1.1 C & 17  & RD \^{}= IM2 \textless\textless{} IM1. Sign-extend IM2 to 64 bits, shift left by the unsigned value IM1, XOR with RD. \\
add           & 1.1 C & 18  & RD += (IM1,IM2) \textless\textless{} 16. Shift 16-bit zero-extended constant left by 16 and add to 32-bit general purpose register. \\

abs           & 1.8 B &  0  & Absolute value of integer. IM1 determines handling of overflow: 0: wrap around, 1: saturate, 2: zero. \\
%shift\_add    & 1.8 B &  1  & Shift and add. RD += RS \textless\textless{} IM1 \\
bitscan       & 1.8 B &  2 & Bit scan forward or reverse. Find index to first or last set bit. \\
roundp2       & 1.8 B &  3 & Round up or down to nearest power of 2. \\
popcount      & 1.8 B &  4 & Count the number of bits that are 1.\\
read\_spec    & 1.8 B & 32  & Read special register RS into g. p. register RD. \\
write\_spec   & 1.8 B & 33  & Write g. p. register RS to special register RD. \\
read\_capabi-lities & 1.8 B & 34  & Read capabilities register RS into g. p. register RD. \\
write\_capabi-lities & 1.8 B & 35  & Write g. p. register RS to capabilities register RD. \\
read\_perfs   & 1.8 B & 37  & Read performance counter, serializing. \\
read\_sys     & 1.8 B & 38  & Read system register RS into g. p. register RD. \\
write\_sys    & 1.8 B & 39  & Write g. p. register RS to system register RD. \\
push          & 1.8 B & 56  & Push g. p. register RS to stack with pointer RD. \\
pop           & 1.8 B & 57  & Pop g. p. register RS from stack with pointer RD. \\
input         & 1.8 B & 62  & Read RD from input port with address IM1 or RS. (privileged instruction) \\
output        & 1.8 B & 63  & Write RD to output port with address IM1 or RS. (privileged instruction) \\

truth\_tab3   & 2.0.6 E & 8.1 & Boolean function of three inputs, given by a truth table. \\

move\_bits    & 2.0.7 E & 0.1 & Replace one or more contiguous bits at one position of RS with contiguous bits from another position of RT. Optional. \\

move          & 2.9 A &  0  & Load 32-bit constant into the high part of a general purpose register. The low part is zero. RD = IM2 \textless\textless{} 32. \\
insert\_hi    & 2.9 A &  1  & Insert 32-bit constant into the high part of a general purpose register, leaving the low part unchanged.
RD = (RT \& 0xFFFFFFFF) \textbar{} (IM2 \textless\textless{} 32). \\
add           & 2.9 A &  2  & Add zero-extended 32-bit constant to general purpose register. \\
sub           & 2.9 A &  3  & Subtract zero-extended 32-bit constant from general purpose register. \\
add           & 2.9 A &  4  & Add 32-bit constant to high part of general purpose register. RD = RT + (IM2 \textless\textless{} 32). \\
and           & 2.9 A &  5  & AND high part of general purpose register with 32-bit constant. RD = RT \& (IM2 \textless\textless{} 32). \\
or            & 2.9 A &  6  & OR high part of general purpose register with 32-bit constant. RD = RT \textbar{} (IM2 \textless\textless{} 32). \\
xor           & 2.9 A &  7  & XOR high part of general purpose register with 32-bit constant. RD = RT \^{} (IM2 \textless\textless{} 32). \\
address       & 2.9 A & 32  & RD = RT + IM2, RT can be THREADP (28), DATAP (29) or IP (30). \\
\hline
\end{longtable}
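
The constant-loading instructions in the table above can be modeled as follows. The sketch assumes that IM1 and IM2 in format 1.1 C are 8 bits each, which is inferred from the 16-bit combined constant (IM1,IM2) but not stated explicitly here. The handling of shift counts larger than the register size is also an assumption: the result is simply truncated. All function names are hypothetical.

```python
# Illustrative model of shifted constants (format 1.1 C) and high-part
# constants (format 2.9 A). Field widths are assumptions of this sketch.

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend(value, from_bits):
    """Sign-extend the low 'from_bits' bits of value to a Python integer."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


def move_shifted(im2, im1, width):
    """RD = IM2 << IM1, with IM2 sign-extended and IM1 unsigned.

    Assumes 8-bit IM1 and IM2; width is 32 or 64.
    """
    return (sign_extend(im2, 8) << (im1 & 0xFF)) & ((1 << width) - 1)


def move_hi(im2):
    """Format 2.9 A move: RD = IM2 << 32, low part zero."""
    return ((im2 & MASK32) << 32) & MASK64


def insert_hi(rt, im2):
    """RD = (RT & 0xFFFFFFFF) | (IM2 << 32)."""
    return (rt & MASK32) | ((im2 & MASK32) << 32)


print(hex(move_shifted(0xFF, 4, 32)))                  # 0xfffffff0
print(hex(move_hi(1)))                                 # 0x100000000
print(hex(insert_hi(0x1122334455667788, 0xDEADBEEF)))  # 0xdeadbeef55667788
```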

\begin{longtable} {|p{25mm}|p{14mm}|p{10mm}|p{95mm}|}
\caption{List of single-format instructions with vector registers and mixed register types}
\label{table:ListOfSingleFormatInstructionsVector} \\
\hline
\bfseries Instruction & \bfseries Format &\bfseries OP1. OP2 & \bfseries Description \\
\hline
get\_len      & 1.2 A &  0 & Get length of vector register RT into general purpose register RD. \\
get\_num      & 1.2 A &  1 & Get length of vector register RT divided by the operand size. \\
set\_len      & 1.2 A &  2 & RD = vector register RS with length changed to value of RT. \\
set\_num      & 1.2 A &  3 & Change the length of vector register RS to RT$\cdot$OS. \\
insert        & 1.2 A &  4 & Replace one element in vector RD, starting at offset RT$\cdot$OS, with scalar RS. \\
extract       & 1.2 A & 5 & Extract one element from vector RS, starting at offset RT$\cdot$OS, with size OS into scalar in vector register RD. \\
broad         & 1.2 A & 6 & Broadcast first element of vector RS into all elements of RD with length RT bytes. \\
compress\_ sparse& 1.2 A &  8 & Compress sparse vector elements indicated by mask bits into contiguous vector. (optional). \\
expand\_sparse& 1.2 A & 9 & Expand contiguous vector into sparse vector with positions indicated by mask bits. RT = length of output vector. (optional). \\

bits2bool     & 1.2 A & 12 & The lower n bits of RT are unpacked into a boolean vector RD with length RS, with one bit in each element, where n = RS / OS. \\

shift\_expand & 1.2 A & 16 & Shift vector RS up by RT bytes and extend the vector length by RT. The lower RT bytes of RD will be zero. \\
shift\_reduce & 1.2 A & 17 & Shift vector RS down RT bytes and reduce the length by RT. The lower RT bytes are lost. \\
shift\_up     & 1.2 A & 18 & Shift elements of vector RS up RT elements. The lower RT elements of RD will be zero, the upper RT elements are lost. \\
shift\_down   & 1.2 A & 19 & Shift elements of vector RS down RT elements. The upper RT elements of RD will be zero, the lower RT elements are lost. \\
%rotate\_up  & 1.2 A & 20 & Rotate vector up one element. Optional. \\
%rotate\_down  & 1.2 A & 21 & Rotate vector down one element. Optional. \\

div\_ex    & 1.2 A & 24 & Divide vector of double-size signed integers RS by signed integers RT. RS has element size 2$\cdot$OS. These are divided by the even numbered
elements of RT with size OS. The truncated results are stored in
the even-numbered elements of RD. The remainders are stored in
the odd-numbered elements of RD. (Optional for vectors). \\
div\_ex\_u    & 1.2 A & 25 & Same, with unsigned integers. (Optional for vectors). \\
mul\_ex       & 1.2 A & 26 & Multiply even-numbered signed integer vector elements to double size result. \\
mul\_ex\_u    & 1.2 A & 27 & Multiply even-numbered unsigned integer vector elements to double size result. \\
sqrt          & 1.2 A & 28 & Square root (floating point, optional). \\

add\_ss       & 1.2 A & 32 & Add integer vectors, signed with saturation (optional). \\
add\_us       & 1.2 A & 33 & Add integer vectors, unsigned with saturation (optional). \\
sub\_ss       & 1.2 A & 34 & Subtract integer vectors, signed with saturation (optional). \\
sub\_us       & 1.2 A & 35 & Subtract integer vectors, unsigned with saturation (optional). \\
mul\_ss       & 1.2 A & 36 & Multiply integer vectors, signed with saturation (optional). \\
mul\_us       & 1.2 A & 37 & Multiply integer vectors, unsigned with saturation (optional). \\
add\_oc       & 1.2 A & 38 & Add with overflow check (optional). \\
sub\_oc       & 1.2 A & 39 & Subtract with overflow check (optional). \\
mul\_oc       & 1.2 A & 40 & Multiply with overflow check (optional). \\
div\_oc       & 1.2 A & 41 & Divide with overflow check (optional). \\
add\_c        & 1.2 A & 42 & Add with carry. Vector has two elements. The upper element is used as carry on input and output (optional). \\
sub\_b        & 1.2 A & 43 & Subtract with borrow. Vector has two elements. The upper element is used as borrow on input and output (optional). \\

read\_spev    & 1.2 A & 56 & Read special vector register. Length RT. \\
read\_call\_ stack & 1.2 A & 58 & Read internal call stack. RD = vector register destination of length RS, RT-RS = internal address (privileged instruction). \\
write\_call\_ stack & 1.2 A & 59 & Write internal call stack. RD = vector register source of length RS, RT-RS = internal address (privileged instruction). \\

read\_memory\_ map & 1.2 A & 60 & Read memory map. RD = vector register destination of length RS, RT-RS = internal address (privileged instruction). \\
write\_memory\_ map & 1.2 A & 61 & Write memory map. RD = vector register source of length RS, RT-RS = internal address (privileged instruction). \\

input         & 1.2 A & 62 & Read from input port. RD = vector register, RT = port address, RS = vector length (privileged instruction). \\
output        & 1.2 A & 63 & Write to output port. RD = vector register source operand, RT = port address, RS = vector length (privileged instruction). \\

gp2vec        & 1.3 B &  0 & Move value of general purpose register RS to scalar in vector register RD. \\

vec2gp        & 1.3 B &  1 & Move value of first element of vector register RS to general purpose register RD. \\

make\_sequen-ce& 1.3 B & 3 & Make a vector with RS sequential numbers. First value is IM1. \\

insert        & 1.3 B &  4 & Replace one element in vector RD, starting at offset IM1$\cdot$OS, with first element in RS. \\

extract       & 1.3 B & 5 & Extract one element from vector RS, starting at offset IM1$\cdot$OS into a scalar in vector register RD. \\

compress      & 1.3 B  &  6 & Compress vector to half the length and half the element size. Double precision $\rightarrow$ single precision, 64-bit
integer $\rightarrow$ 32-bit integer, etc. \\

expand        & 1.3 B &  7 & Expand vector to the double length and the double element size. Half precision $\rightarrow$ single precision, 32-bit integer $\rightarrow$ 64-bit integer, etc. \\

float2int     & 1.3 B & 12 & Conversion of floating point to integer with the same operand size. The rounding mode is specified in IM1. \\
int2float     & 1.3 B & 13 & Conversion of integer to floating point with same operand size. \\

round         & 1.3 B & 14 & Round floating point to integer in floating point  representation. The rounding mode is specified in IM1. \\
round2n       & 1.3 B & 15 & Round to nearest multiple of $2^n$. \newline
RD = $2^n\cdot$ round($2^{-n}\cdot$ RS). $n$ is a signed integer constant in IM1 (optional). \\
abs           & 1.3 B & 16 & Absolute value of integer. IM1 determines handling of overflow: 0: wrap around, 1: saturate, 2: zero. \\

fp\_category  & 1.3 B & 17 & Check if floating point numbers belong to the categories indicated by constant. \\

broad         & 1.3 B & 18 & Broadcast 8-bit constant into all elements of RD with length RS (31 in RS field gives scalar output). \\

broadcast\_ max & 1.3 B & 19 & Broadcast 8-bit constant into all elements of RD with maximum vector length. \\

byte\_reverse & 1.3 B & 20 & Reverse the order of bytes in each element of vector. \\
bit\_reverse  & 1.3 B & 20 & Reverse the order of bits in each element of vector (optional). \\

bitscan       & 1.3 B & 21 & Bit scan forward or reverse. Find index to lowest set bit. \\

popcount      & 1.3 B & 22 & Count the number of bits that are 1 (optional for vectors). \\

bool2bits     & 1.3 B & 25 & A boolean vector with n elements is packed into the lower n bits of RD, taking bit 0 of each element. The length of RD is at least sufficient to contain n bits. \\

bool\_reduce  & 1.3 B & 26 & An integer vector is reduced by combining bit 0 of all elements. The output is a scalar integer where bit 0 is the
AND combination of all the bits, and bit 1 is the OR combination of
all the bits. The remaining bits are reserved for future use. \\

category\_ reduce & 1.3 B & 26 & A floating point vector is reduced to a scalar integer where each bit indicates that the source vector contains at least one element in a certain category, such as NAN, zero, normal positive, etc. \\

push          & 1.3 B & 56  & Push vector register RS to stack with pointer RD. \\
pop           & 1.3 B & 57  & Pop vector register RS from stack with pointer RD. \\
clear         & 1.3 B & 58  & Clear vector register RS. \\

move          & 1.4 C &  0 & Move 16 bit integer constant to 16-bit scalar (optional). \\
add           & 1.4 C &  1 & Add broadcasted 16 bit constant to 16-bit vector elements (optional). \\
and           & 1.4 C &  2 & AND broadcasted 16 bit constant with 16-bit vector elements (optional). \\
or            & 1.4 C &  3 & OR broadcasted 16 bit constant with 16-bit vector elements (optional). \\
xor           & 1.4 C &  4 & XOR broadcasted 16 bit constant with 16-bit vector elements (optional). \\

move          & 1.4 C &  8 & RD = IM2 \textless\textless{} IM1. Sign-extend IM2 to 32 bits and shift left by the unsigned value IM1 to make 32 bit scalar (optional). \\
move          & 1.4 C &  9 & RD = IM2 \textless\textless{} IM1. Sign-extend IM2 to 64 bits and shift left by the unsigned value IM1 to make 64 bit scalar (optional). \\
add           & 1.4 C & 10 & RD += IM2 \textless\textless{} IM1. Add broadcast shifted signed constant to 32-bit vector elements (optional). \\
add           & 1.4 C & 11 & RD += IM2 \textless\textless{} IM1. Add broadcast shifted signed constant to 64-bit vector elements (optional). \\
and           & 1.4 C & 12 & RD \&= IM2 \textless\textless{} IM1. AND broadcast shifted signed constant with 32-bit vector elements (optional). \\
and           & 1.4 C & 13 & RD \&= IM2 \textless\textless{} IM1. AND broadcast shifted signed constant with 64-bit vector elements (optional). \\
or            & 1.4 C & 14 & RD \textbar{}= IM2 \textless\textless{} IM1. OR broadcast shifted signed constant with 32-bit vector elements (optional). \\
or            & 1.4 C & 15 & RD \textbar{}= IM2 \textless\textless{} IM1. OR broadcast shifted signed constant with 64-bit vector elements (optional). \\
xor           & 1.4 C & 16 & RD \^{}= IM2 \textless\textless{} IM1. XOR broadcast shifted signed constant with 32-bit vector elements (optional). \\
xor           & 1.4 C & 17 & RD \^{}= IM2 \textless\textless{} IM1. XOR broadcast shifted signed constant with 64-bit vector elements (optional). \\

move          & 1.4 C & 32 & Move converted half precision floating point constant to single
precision scalar (optional). \\
move          & 1.4 C & 33 & Move converted half precision floating point constant to double
precision scalar (optional). \\
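add           & 1.4 C & 34 & Add broadcast half precision floating point constant to single
precision vector (optional). \\
add           & 1.4 C & 35 & Add broadcast half precision floating point constant to double
precision vector (optional). \\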
mul           & 1.4 C & 36 & Multiply broadcast half precision floating point constant with single precision vector (optional). \\
mul           & 1.4 C & 37 & Multiply broadcast half precision floating point constant with double precision vector (optional). \\
add\_h        & 1.4 C & 40 & Add constant to half precision vector (optional). \\
mul\_h        & 1.4 C & 41 & Multiply half precision vector with constant (optional). \\
concatenate   & 2.2.6 E & 0.1 & A vector RU of length RT and a vector RS of length RT are concatenated into a vector RD of length 2$\cdot$RT. \\
permute       & 2.2.6 E & 1.1 & The vector elements of RU are permuted within each block of size RT bytes, using indices in RS. Each index is relative to the
beginning of a block. An index out of range produces zero. The
maximum block size is implementation dependent. \\
interleave    & 2.2.6 E & 2.1 & Interleave elements of vectors RU and RS of length RT/2 to produce vector RD of length RT. Even-numbered elements of the destination come from RU and odd-numbered elements from RS. (optional). \\
truth\_tab3   & 2.2.6 E & 8.1 & Boolean function of three inputs, given by a truth table. \\

move\_bits    & 2.2.7 E & 0.1 & Replace one or more contiguous bits at one position of RS with contiguous bits from another position of RT (optional). \\
mask\_length  & 2.2.7 E & 1.1 & Make mask with true in the first RT bytes. Option bits in IM2. \\
repeat\_block  & 2.2.7 E & 8.1 & Repeat a block of data to make a longer vector. RS is input vector containing data block to repeat. IM2 is length in bytes of the block to repeat (must be a multiple of 4). RT is the length of destination vector RD. (optional). \\
repeat\_within \_blocks & 2.2.7 E & 9.1 & Broadcast the first element of each block of data in a vector to the entire block. RS is input vector containing data blocks. IM2 is length in bytes of each block (must be a multiple of the operand size). RT is length of destination vector RD. The operand size must be at least 4 bytes. (optional). \\

load\_hi      & 2.6 A & 0 & Make vector of two elements. dest[0] = 0, dest[1] = IM2. \\
insert\_hi    & 2.6 A & 1 & Make vector of two elements. dest[0] = src1[0], dest[1] = IM2. \\
make\_mask    & 2.6 A & 2 & Make vector where bit 0 of each element comes from bits in IM2, the remaining bits come from RT. \\
replace       & 2.6 A & 3 & Replace elements in RT by constant IM2. \\
replace\_even & 2.6 A & 4 & Replace even-numbered elements in RT by constant
IM2. \\
replace\_odd  & 2.6 A & 5 & Replace odd-numbered elements in RT by constant
IM2. \\
broad         & 2.6 A & 6 & Broadcast 32-bit or float32 constant into all elements of RD with length RT (31 in RT field gives scalar output). \\
permute       & 2.6 A & 8 & The vector elements of RS are permuted within each block of size RT bytes. The 4$\cdot$n bits of IM2 are used as index with 4 bits for
each element in blocks of size n. The same pattern is used in each
block. The number of elements in each block, n = RT / OS $\leq$ 8. \\
replace       & 3.1 A & 32 & Replace elements in RT by constant IM2,IM3. \\
broad         & 3.1 A & 33 & Broadcast 64-bit or float64 constant into all elements of RD with length RT (31 in RT field gives scalar output). \\
\hline
\end{longtable}
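
As an illustration of the shifted-constant forms of format 1.4 C listed above (OP1 = 8 and 9), the following worked example uses the hypothetical field values IM2 $= -3$ and IM1 $= 4$. These values are chosen only for illustration and are assumed to fit in their fields:
\[
\mathrm{RD} = \mathrm{IM2} \cdot 2^{\mathrm{IM1}} = -3 \cdot 2^{4} = -48 ,
\]
which is 0xFFFFFFD0 as a 32-bit scalar (OP1 = 8) and 0xFFFFFFFFFFFFFFD0 as a 64-bit scalar (OP1 = 9). Because IM2 is sign-extended before the shift, the upper bits of the result are filled with copies of the sign bit.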

\begin{longtable} {|p{25mm}|p{14mm}|p{10mm}|p{95mm}|}
\caption{List of single-format instructions with memory operands.}
\label{table:ListOfSingleFormatInstructionsMemory} \\
\hline
\bfseries Instruction & \bfseries Format &\bfseries OP1, OP2 & \bfseries Description \\
\hline
store         & 2.5 B &  8 & Store 32-bit constant IM2 to memory operand [RS+IM1] (optional). \\

fence         & 2.5 B & 16 & Memory fence at address [RS+IM2]. IM1 indicates a read, write, or full fence.\\

compare\_swap & 2.5 B & 18 & Atomic compare and exchange with address [RS+IM2].\\

read\_insert  & 2.5 A & 32 & Replace one element in vector RD, starting at offset
RT$\cdot$OS, with scalar memory operand [RS+IM2] (optional).  \\

extract\_store& 2.5 A & 40 & Extract one element from vector RD, starting at offset RT$\cdot$OS, with size OS into memory operand [RS+IM2] (optional). \\

\hline
\end{longtable}
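
As a worked example of the offset RT$\cdot$OS used by read\_insert and extract\_store, assume that OS denotes the operand size in bytes and that RT holds the hypothetical value 3, with OS $= 4$ (32-bit elements):
\[
\mathrm{RT} \cdot \mathrm{OS} = 3 \cdot 4 = 12 .
\]
Under these assumptions the element occupies bytes 12 to 15 of RD, which is the element with index 3 when counting from zero.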
\vspace{4mm}

\section{List of control transfer instructions}

\begin{longtable}
{|p{12mm}|p{16mm}|p{60mm}|p{55mm}|}
%\nopagebreak
\caption{Condition codes for control transfer instructions with integer operands in general purpose registers }
\label{table:controlTransferInstructions} \\
\hline
\bfseries OPJ & \bfseries bit 0 \newline of OPJ & \bfseries Instruction & \bfseries Comment \\
\hline
0-7 & part of offset & Unconditional jump with 24-bit offset (jump) & Format 1.7 D. Bits 0-2 of OPJ are part of offset \\
\hline
8-15 & part of offset & Unconditional call with 24-bit offset (call) & Format 1.7 D. Bits 0-2 of OPJ are part of offset \\
\hline
0-1 & invert & sub/jump\_zero, \newline sub/jump\_nzero & Not format 1.7. Not floating point \\
\hline
2-3 & invert & sub/jump\_neg, \newline sub/jump\_nneg & Not format 1.7. Not floating point  \\
\hline
4-5 & invert & sub/jump\_pos, \newline sub/jump\_npos & Not format 1.7. Not floating point \\
\hline
6-7 & invert & sub/jump\_overfl, \newline sub/jump\_noverfl & Not format 1.7. Not floating point \\
\hline
8-9 & invert & sub/jump\_borrow, \newline sub/jump\_nborrow & Not format 1.7. Not floating point \\
\hline
10-11 & invert & and/jump\_zero, \newline and/jump\_nzero & Not format 1.7 \\
\hline
12-13 & invert & or/jump\_zero, \newline or/jump\_nzero & Not format 1.7 \\
\hline
14-15 & invert & xor/jump\_zero, \newline xor/jump\_nzero & Not format 1.7 \\
\hline
16-17 & invert & add/jump\_zero, \newline add/jump\_nzero & Not floating point \\
\hline
18-19 & invert & add/jump\_neg, \newline add/jump\_nneg & Not floating point \\
\hline
20-21 & invert & add/jump\_pos, \newline add/jump\_npos & Not floating point \\
\hline
22-23 & invert & add/jump\_overfl, \newline add/jump\_noverfl & Not floating point \\
\hline
24-25 & invert & add/jump\_carry, \newline add/jump\_ncarry & Not floating point \\
\hline
26-27 & invert & test\_bit/jump\_true, \newline test\_bit/jump\_false &  \\
\hline
28-29 & invert & test\_bits\_and/jump\_true, \newline test\_bits\_and/jump\_false &  \\
\hline
30-31 & invert & test\_bits\_or/jump\_true, \newline test\_bits\_or/jump\_false & \\
\hline
32-33 & invert & compare/jump\_equal, \newline compare/jump\_nequal & \\
\hline
34-35 & invert & compare/jump\_sbelow, \newline compare/jump\_saboveeq &  \\
\hline
36-37 & invert & compare/jump\_sabove, \newline compare/jump\_sbeloweq &  \\
\hline
38-39 & invert & compare/jump\_ubelow, \newline compare/jump\_uaboveeq &  \\
\hline
40-41 & invert & compare/jump\_uabove, \newline compare/jump\_ubeloweq & \\
\hline
42-47 & invert & Reserved for future use. & \\
\hline

48-49 & invert & increment\_compare/jump\_below, \newline /jump\_aboveeq & \\
\hline
50-51 & invert & increment\_compare/jump\_above, \newline /jump\_beloweq & \\
\hline
52-53 & invert & sub\_maxlen/jump\_pos, \newline sub\_maxlen/jump\_npos &  \\
\hline
54-57 &  & Reserved for future use. & \\
\hline
58-59 & 0 jump \newline 1 call & Indirect jump or call with memory operand. & Format 1.6 B and 2.5.2. \\
\hline
58-59 & 0 jump \newline 1 call & Unconditional direct jump or call & Format 2.5.4 and 3.1.1. \\
\hline
60-61 & 0 jump\_ relative \newline 1 call\_ relative & Jump or call with relative address in memory, table index, and arbitrary reference point &
Format 1.6 A and 2.5.2 \\
\hline
60-61 & 0 jump \newline 1 call & Indirect jump or call to value of register & Format 1.7 C \\
\hline
62 & 0 & return  & Format 1.6 C  \\
\hline
62 & 0 & sys\_return & Format 1.7 C  \\
\hline
63 & 1 & sys\_call. ID in register & Format 1.6 A \\
\hline
63 & 1 & sys\_call. ID in constants & Format 2.5.7 and 3.1.1. \\
\hline
63 & 1 & trap or filler & Format 1.7 C \\
\hline
63 & 1 & Conditional traps & Format 2.5.5. \\
\hline
\end{longtable}
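
The rows marked invert in the table above can be read as follows, assuming that the first-named instruction of each pair has the even OPJ value, as the ordering of the table suggests. Writing $c$ for the even code of a pair and $i$ for bit 0 of OPJ,
\[
\mathrm{OPJ} = c + i, \qquad i \in \{0, 1\} .
\]
For example, $c = 34$ gives compare/jump\_sbelow for $\mathrm{OPJ} = 34$ and the inverted condition compare/jump\_saboveeq for $\mathrm{OPJ} = 35$. This reading applies only to the rows marked invert, not to the rows where bit 0 is part of the offset or selects between jump and call.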

\begin{longtable}
{|p{10mm}|p{14mm}|p{65mm}|p{40mm}|}
%\nopagebreak
\caption{Condition codes for control transfer instructions with floating point operands in vector registers }
\label{table:controlTransferInstructionsFloat} \\
\hline
OPJ & bit 0 \newline of OPJ & Instruction & Comment \\
\hline
32-33 & invert & compare/jump\_equal, \newline compare/jump\_nequal & false if unordered \\
\hline
0-1 & invert & compare/jump\_equal\_uo, \newline compare/jump\_nequal\_uo & true if unordered \\
\hline
34-35 & invert & compare/jump\_below, \newline compare/jump\_aboveeq & false if unordered \\
\hline
2-3 & invert & compare/jump\_below\_uo, \newline compare/jump\_aboveeq\_uo & true if unordered \\
\hline
36-37 & invert & compare/jump\_above, \newline compare/jump\_beloweq & false if unordered \\
\hline
4-5 & invert & compare/jump\_above\_uo, \newline compare/jump\_beloweq\_uo & true if unordered \\
\hline
38-39 & invert & compare/jump\_abs\_below, \newline compare/jump\_abs\_aboveeq & false if unordered \\
\hline
6-7 & invert & compare/jump\_abs\_below\_uo, \newline compare/jump\_abs\_aboveeq\_uo & true if unordered \\
\hline
40-41 & invert & compare/jump\_abs\_above, \newline compare/jump\_abs\_beloweq & false if unordered \\
\hline
8-9 & invert & compare/jump\_abs\_above\_uo, \newline compare/jump\_abs\_beloweq\_uo & true if unordered \\
\hline
24-25 & invert & fp\_category/jump\_true, \newline fp\_category/jump\_false &  \\
\hline

\multicolumn{4}{|c|}{} \\
\multicolumn{4}{|c|}{ The following instructions treat floating point operands as integers in vector registers: } \\
\multicolumn{4}{|c|}{} \\
\hline

10-11 & invert & and/jump\_zero, \newline and/jump\_nzero & \\
\hline
12-13 & invert & or/jump\_zero, \newline or/jump\_nzero &  \\
\hline
14-15 & invert & xor/jump\_zero, \newline xor/jump\_nzero &  \\
\hline
26-27 & invert & test\_bit/jump\_true, \newline test\_bit/jump\_false &  \\
\hline
28-29 & invert & test\_bits\_and/jump\_true, \newline test\_bits\_and/jump\_false &  \\
\hline
30-31 & invert & test\_bits\_or/jump\_true, \newline test\_bits\_or/jump\_false & \\
\hline

\end{longtable}
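
The difference between the ordered and unordered variants in the table above can be illustrated with a hypothetical pair of operands $a = \mathrm{NaN}$ and $b = 1$. Reading the comments for the first-named instruction of each pair, and writing $\mathrm{unordered}(a,b)$ as an illustrative predicate that is true when at least one operand is NaN,
\[
\mathrm{jump\_below}(a,b) = (a < b), \qquad
\mathrm{jump\_below\_uo}(a,b) = (a < b) \lor \mathrm{unordered}(a,b) .
\]
Since any comparison involving NaN is unordered, $a < b$ is false, so compare/jump\_below (OPJ 34) does not jump, while compare/jump\_below\_uo (OPJ 2) does.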

See page \pageref{descriptionOfControlTransferInstructions} for
detailed descriptions of control transfer instructions.

\end{document}