% chapter included in forwardcom.tex
\documentclass[forwardcom.tex]{subfiles}
\begin{document}
\RaggedRight

\chapter{Description of instructions}
\label{chap:DescriptionOfInstructions}
\vv
 
\subsection{Data move and conversion instructions}
\vv

\subsubsection{broad}

\label{table:broadInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 A &  6 & vector and g.p. register \\ \hline
1.3 B & 18 & g.p. register, and 8-bit signed constant \\ \hline
2.6   &  6 & g.p. register, and 32-bit signed or float constant \\ \hline
3.1   & 33 & g.p. register, and 64-bit signed or double constant \\ \hline
\end{tabular}
\vv

float v0 = broad(v1, r2)\\
float v0 = broad(r2, 2.5)
\vv

Broadcast a constant or the first element of a source vector into all
elements of the destination vector with the length in bytes indicated by a general purpose register.
\vv

This instruction can have a mask but not a fallback register. The fallback value is zero.\\
(This instruction is not called broadcast because that is a reserved keyword).
 
\subsubsection{broadcast\_max}

\label{table:broadcastMaxInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.3 B & 19 & vector and 8-bit signed constant \\ \hline
\end{tabular}
\vv

float v0 = broadcast\_max(1)
\vv

Broadcast a small constant to all elements of a vector with maximum length.
\vv
 
\subsubsection{compress}
\label{table:compressInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.3 B & 6 & vectors \\ \hline
\end{tabular}
\vv

double v0 = compress(v1, 0)
\vv

All the elements of a vector are converted to half the element size. The length of the output vector will be half the length of the input vector. The OT field specifies the operand type of the input vector. Double precision floating point numbers are converted to single precision. Integer elements are converted to half the size. Support for the following conversions is optional: single precision float to half precision, quadruple precision to double precision, 8-bit integer to 4-bit.
\vv

Overflow options and rounding mode are specified in IM1 as follows:

\label{table:compressOptions}
\begin{tabular}{|p{16mm}|p{130mm}|}
\hline
\bfseries IM1 bits & \bfseries meaning \\ \hline
bit 0-2 & Floating point exception control: \newline
000 = exceptions are controlled by NUMCONTR. See page \pageref{table:FPExceptionResults} \newline
001 = overflow generates NAN code \newline
010 = underflow generates NAN code \newline
011 = overflow and underflow generate NAN code \newline
100 = underflow and inexact generate NAN code \newline
101 = overflow, underflow, and inexact generate NAN code \newline
111 = no conditions generate NAN code
\\ \hline
bit 0-2 & Integer overflow control: \newline
000 = integer overflow wraps around \newline
100 = signed integer overflow gives zero \newline
101 = signed integer overflow gives signed saturation \newline
110 = unsigned integer overflow gives zero \newline
111 = unsigned integer overflow gives unsigned saturation
\\ \hline
bit 3-5 & Floating point rounding mode: \newline
000 = rounding mode determined by NUMCONTR \newline
001 = odd if not exact \newline
100 = nearest or even \newline
101 = down \newline
110 = up \newline
111 = towards zero
\\ \hline
\end{tabular}
\vv

The rounding mode ``odd if not exact'' works in the following way:
Truncate the superfluous mantissa bits. If the result is not exact then set the least significant bit to 1.
This rounding mode is needed to avoid double rounding errors when rounding in multiple steps. Use the odd rounding mode in every step except the last.
For example, to convert from double precision to half precision, use the odd rounding mode in the first step from double to single precision, then use ``nearest or even'' in the last step from single to half precision.
\vv
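
The effect of the odd rounding mode can be illustrated with a minimal Python sketch. It models a mantissa as a plain integer and ignores exponents, carries into the exponent, and NANs; the function names are hypothetical and are not part of ForwardCom.

```python
def round_to_odd(mantissa: int, drop_bits: int) -> int:
    """Truncate drop_bits low bits; set the LSB if any dropped bit was 1."""
    kept = mantissa >> drop_bits
    inexact = (mantissa & ((1 << drop_bits) - 1)) != 0
    return (kept | 1) if inexact else kept


def round_nearest_even(mantissa: int, drop_bits: int) -> int:
    """Drop drop_bits low bits, rounding to nearest with ties to even."""
    kept = mantissa >> drop_bits
    rest = mantissa & ((1 << drop_bits) - 1)
    half = 1 << (drop_bits - 1)
    if rest > half or (rest == half and (kept & 1)):
        kept += 1
    return kept


x = 0b10100001  # 8-bit mantissa, to be reduced to 2 bits

direct = round_nearest_even(x, 6)                          # one step
naive = round_nearest_even(round_nearest_even(x, 4), 2)    # two nearest-even steps
via_odd = round_nearest_even(round_to_odd(x, 4), 2)        # odd first, nearest-even last
```

In this sketch the two nearest-even steps give a different result from the single-step conversion, while using the odd mode in the first step reproduces the single-step result.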
 
Overflow in integer conversion can be detected by doing the conversion twice, using an ``overflow gives zero'' option and the corresponding saturation option. Overflow has occurred if the two results are different.
\vv
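
A minimal sketch of this overflow test, assuming signed integers and a scalar model of one vector element. The helper names and the string-valued mode parameter are hypothetical stand-ins for the IM1 option bits.

```python
def compress_int(value: int, bits: int, mode: str) -> int:
    """Convert a signed integer to a narrower signed type with `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if lo <= value <= hi:
        return value                      # fits: both modes agree
    if mode == "zero":
        return 0                          # "overflow gives zero"
    if mode == "saturate":
        return hi if value > hi else lo   # signed saturation
    raise ValueError("unknown mode")


def overflowed(value: int, bits: int) -> bool:
    """Overflow occurred if the two conversion results differ."""
    return compress_int(value, bits, "zero") != compress_int(value, bits, "saturate")
```

The comparison works because a saturated result is never zero, so the two modes can only agree when the value fits in the narrower type.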
 
NANs are converted by preserving the least significant bits of the payload and the quiet bit. This differs from most other microprocessors, which preserve the most significant bits of binary floating point NAN payloads.
\vv
 
\subsubsection{compress\_sparse}
\label{table:compressSparseInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 A & 8 & vectors. Optional \\ \hline
\end{tabular}
\vv

int32 v0 = compress\_sparse(v1), mask = v2
\vv

Compress sparse vector elements indicated by mask bits into a contiguous vector.
\vv

The algorithm of this instruction is:
Start with a zero-length destination vector.
For each element in the mask vector that is true, take an element from the corresponding position in the source vector and append it to the destination vector.
The length of the destination vector will be the number of true mask elements
times the element size.
\vv
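
The algorithm above can be sketched in Python as follows. Vectors are modeled as lists of elements and the mask as a list of truth values, which is an assumption made for illustration; the function name merely mirrors the instruction name.

```python
def compress_sparse(source, mask):
    """Collect the source elements whose mask element is true."""
    dest = []                      # start with a zero-length destination
    for element, selected in zip(source, mask):
        if selected:
            dest.append(element)   # append selected elements contiguously
    return dest


result = compress_sparse([10, 20, 30, 40], [True, False, True, True])
# Length in bytes, assuming 4-byte (int32) elements.
length_in_bytes = len(result) * 4
```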
 
This instruction cannot have a fallback register.
\vv
 
\subsubsection{concatenate}
\label{table:concatenateInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.2.6 & 0.1 & vectors \\ \hline
\end{tabular}
\vv

float v0 = concatenate(v1, v2, r3)
\vv

A vector v1 of length r3 bytes and a vector v2 of
length r3 bytes are concatenated into a result vector
of length 2$\cdot$r3, with v2 in the high end.
\vv

This instruction cannot have a mask.
\vv
 
\subsubsection{expand}
\label{table:expandInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.3 B & 7 & vectors \\ \hline
\end{tabular}
\vv

float  v0 = expand(v1, 0)
\vv

This is the opposite of compress. The length of the output vector is double the length of the input vector if the maximum vector length is not exceeded.
\vv

The OT field specifies the operand type of the output vector. Single precision floating point numbers are converted to double precision. Integers are converted to the double size by sign-extension or zero-extension. Support for the following conversions is optional: half precision float to single precision, double precision to quadruple precision, 4-bit integer to 8-bit.
\vv

Options are specified in IM1:
\vv

\label{table:expandOptions}
\begin{tabular}{|p{20mm}|p{120mm}|}
\hline
\bfseries IM1 bits & \bfseries meaning \\ \hline
bit 0-1 & integer options: \newline
00 = sign extension \newline
10 = zero extension
\\ \hline
\end{tabular}
\vv
 
\subsubsection{expand\_sparse}
\label{table:expandSparseInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 A & 9 & vectors. Optional \\ \hline
\end{tabular}
\vv

int32 v0 = expand\_sparse(v1, r2), mask = v3
\vv

This is the opposite of compress\_sparse.

Expand a contiguous vector into a sparse vector with positions indicated by mask bits.

The second operand is a general purpose register indicating the length in bytes of the output vector.
\vv

The algorithm of this instruction is:\\
Set an index i1 to position zero in the source vector.\\
Let another index i2 loop through the elements of the mask vector. For each i2 do:\\
\hspace{4mm} if mask[i2] then\\
\hspace{8mm}   destination[i2] = source[i1]; increment i1\\
\hspace{4mm} else\\
\hspace{8mm}   destination[i2] = 0\\
  end for

\vv
The length of the destination vector is given by the second operand. Elements at positions where the mask is false are set to zero. This instruction cannot have a fallback register.
\vv
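
The loop above can be written as a short Python sketch. Vectors are modeled as lists, and the output length is given as a number of elements rather than bytes. Inserting zero when the source or the mask runs out is an assumption of this sketch, not something the text above specifies.

```python
def expand_sparse(source, mask, num_elements):
    """Spread contiguous source elements to the positions where mask is true."""
    dest = []
    i1 = 0                                   # index into the source vector
    for i2 in range(num_elements):           # index into mask and destination
        if i2 < len(mask) and mask[i2]:
            # Assumption: an exhausted source supplies zero.
            dest.append(source[i1] if i1 < len(source) else 0)
            i1 += 1
        else:
            dest.append(0)
    return dest


result = expand_sparse([10, 30, 40], [True, False, True, True], 4)
```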
 
\subsubsection{extract}
\label{table:extractInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 A &  5 & vectors  \\ \hline
1.3 B &  5 & vectors  \\ \hline
\end{tabular}
\vv

float v0 = extract(v1, r2)\\
float v0 = extract(v1, 5)
\vv

Extract one element from the source vector at the given position and
broadcast it into all elements of vector register RD with same length and operand size as the source vector.
The index can be a constant or a general purpose register.
This index indicates which vector element to extract.
The size of the vector elements must match the operand type.
\vv

An index out of range will produce zero. An operand size of 128 bits can be used, even if this size is not otherwise supported.
This instruction cannot have a mask.
\vv
 
\subsubsection{float2int}
\label{table:float2intInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.3 B & 12 & vectors  \\ \hline
\end{tabular}
\vv

int32 v0 = float2int(v1, 0)
\vv

Conversion of floating point values to integers with the same operand size.\\
float16 is converted to int16. float32 is converted to int32. float64 is converted to int64.
\vv

The bits in IM1 specify rounding mode and error control, according to the following table:
\vv

\label{table:float2intOptions}
\begin{tabular}{|p{16mm}|p{120mm}|}
\hline
\bfseries IM1 bit & \bfseries Meaning \\ \hline
0-2 & overflow control: \newline
000 = integer overflow wraps around \newline
100 = signed integer overflow gives zero \newline
101 = signed integer overflow gives signed saturation \newline
110 = unsigned integer overflow gives zero \newline
111 = unsigned integer overflow gives unsigned saturation \\
\hline
3-4 & rounding mode: \newline
00 = nearest or even\newline
01 = down\newline
10 = up\newline
11 = truncate towards zero \\
\hline
5 & 0: NAN gives 0. 1: NAN gives MIN\_INT \\
\hline
\end{tabular}
\vv

To check for overflow: Compare the results for the ``overflow gives zero'' and ``overflow gives saturation'' options.

To check if the result is exact: Compare the results for rounding down and rounding up.
 
\subsubsection{get\_len}
\label{table:getLenInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 A & 0 & vectors  \\ \hline
\end{tabular}
\vv

Get length in bytes of vector register RT into general purpose register RD.
\vv

This instruction cannot have a mask.

\subsubsection{get\_num}
\label{table:getNumInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 A & 1 & vectors  \\ \hline
\end{tabular}
\vv

Get the number of elements in vector register RT into general purpose register RD. This is equal to the length divided by the operand size. The result is a 64-bit integer.
\vv

This instruction cannot have a mask.

\subsubsection{gp2vec}
\label{table:gp2vecInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.3 B & 0 & g.p. register in, vector register out \\ \hline
\end{tabular}
\vv

int64 v0 = gp2vec(r1)
\vv

Move integer value of general purpose register RS to
scalar in vector register RD.
\vv
 
\subsubsection{insert}
\label{table:insertInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 A & 4 & vectors \\ \hline
1.3 B & 4 & vectors \\ \hline
\end{tabular}
\vv

float v0 = insert(v0, v1, r2) \\
float v0 = insert(v0, v1, 5)
\vv

Replace one element in the first vector with the first element of the second vector.
The index to the position of replacement can be a constant or a general purpose register. This index indicates which vector element to replace.
The size of the vector elements must match the operand type.
The destination register must be the same as the first source operand.
\vv

An index out of range will leave the vector unchanged. An operand size of 128 bits can be used, even if this size is not otherwise supported.
\vv

This instruction cannot have a mask.
\vv
 
\subsubsection{insert\_hi}
\label{table:insertHiInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.9 & 1 & general purpose register, 32-bit immediate constant \\ \hline
2.6 & 1 & vector register, 32-bit immediate constant \\ \hline
\end{tabular}
\vv

int64 r0 = insert\_hi(r1, 2) \\
float v0 = insert\_hi(v1, 2.1)
\vv

With a general purpose register: insert a 32-bit constant into the high part of the
register, leaving the low part
unchanged. \\
dest = (src1 \& 0xFFFFFFFF) $|$ (IM2 $<<$ 32).
\vv

With a vector register: make a vector of two elements. A constant is inserted into the second element, leaving the first element unchanged.\\
dest[0] = src1[0], dest[1] = IM2.
\vv
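
The general purpose register form can be checked with a small Python sketch. The 64-bit mask is needed only because Python integers are unbounded; the function name mirrors the instruction name for illustration.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register


def insert_hi(src1: int, im2: int) -> int:
    """dest = (src1 & 0xFFFFFFFF) | (IM2 << 32), truncated to 64 bits."""
    return ((src1 & 0xFFFFFFFF) | ((im2 & 0xFFFFFFFF) << 32)) & MASK64


example = insert_hi(0x1111111122222222, 0xABCD)
```

One plausible use is to build a 64-bit constant from two 32-bit halves: load the low half first, then insert the high half.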
 
\subsubsection{int2float}
\label{table:int2floatInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.3 B & 13 & vectors \\ \hline
\end{tabular}
\vv

int64 v0 = int2float(v1, 0)
\vv

Conversion of signed or unsigned integers to floating point numbers with same operand size.\\
int16 is converted to float16. int32 is converted to float32. int64 is converted to float64.
\vv

Options are coded in IM1:

\label{table:int2floatOptions}
\begin{tabular}{|p{20mm}|p{120mm}|}
\hline
\bfseries IM1\newline bit number & \bfseries Meaning \\ \hline
2 & Inexact result gives NAN. See page \pageref{table:FPExceptionResults}.
\\ \hline
\end{tabular}
\vv
 
\subsubsection{interleave}
\label{table:interleaveInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.2.6 & 2.1 & vectors. Optional \\ \hline
\end{tabular}
\vv

float v0 = interleave(v1, v2, r3)
\vv

Interleave the inputs from two vectors, v1 and v2, so that the even-numbered elements come from v1 and the odd-numbered elements come from v2. The length in bytes of the destination vector is indicated by a general purpose register, r3. The length of each input vector is half the indicated value.
\vv
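
A minimal Python sketch of the interleaving order, with vectors modeled as lists and the destination length given in elements rather than bytes. Padding with zero when an input is shorter than half the destination is an assumption of this sketch.

```python
def interleave(v1, v2, num_elements):
    """Even-numbered destination elements come from v1, odd-numbered from v2."""
    half = num_elements // 2       # each input contributes half the elements
    dest = []
    for i in range(half):
        dest.append(v1[i] if i < len(v1) else 0)   # destination element 2*i
        dest.append(v2[i] if i < len(v2) else 0)   # destination element 2*i + 1
    return dest


result = interleave([0, 2, 4, 6], [1, 3, 5, 7], 8)
```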
 
This instruction can have a mask but not a fallback register. The fallback value is zero.
\vv

\subsubsection{load\_hi}
\label{table:loadHiInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.5 & 0 & vector. 32-bit immediate constant \\ \hline
\end{tabular}
\vv

float v0 = load\_hi(1.2)
\vv

Make a vector of two elements. dest[0] = 0, dest[1] = IM2.
\vv
 
\subsubsection{move}
\label{table:moveInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
multi &  2 & all types \\ \hline
1.1 C &  0 & 32-bit register = 16-bit sign-extended constant \\ \hline
1.1 C &  1 & 64-bit register = 16-bit sign-extended constant \\ \hline
1.1 C &  3 & 64-bit register = 16-bit zero-extended constant \\ \hline
1.1 C &  4 & 32-bit register = 8-bit sign-extended constant with left shift \\ \hline
1.1 C &  5 & 64-bit register = 8-bit sign-extended constant with left shift \\ \hline
1.4 C &  0 & vector register 16-bit scalar = 16-bit constant. Optional  \\ \hline
1.4 C &  8 & vector register 32-bit scalar = 8-bit sign-extended constant with left shift. Optional \\ \hline
1.4 C &  9 & vector register 64-bit scalar = 8-bit sign-extended constant with left shift. Optional \\ \hline
1.4 C & 32 & vector register single precision scalar = half precision immediate constant. Optional \\ \hline
1.4 C & 33 & vector register double precision scalar = half precision immediate constant. Optional \\ \hline
\end{tabular}
\vv

Copy a value from a register, memory operand or immediate constant to a register. If the destination is a vector register and the source is an immediate constant then the result will be a scalar. The value will not be broadcast because there is no other input operand that specifies the vector length. If a vector is desired then use the broad instruction instead.
\vv

The move instruction with an immediate operand is the preferred method for setting a register to zero.
\vv
 
\subsubsection{permute}
\label{table:permuteInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.2.6 & 1.1 & vectors \\ \hline
2.6   & 8   & vectors and 32-bit immediate constant \\ \hline
\end{tabular}
\vv

float v0 = permute(v1, v2, r3) \\
float v0 = permute(v1, r3, 5)
\vv

This instruction permutes the elements of a vector v1. The vector is divided into blocks of size r3 bytes each. The block size must be a power of 2 and a multiple of the operand size. Elements can be moved arbitrarily between positions within each block, but not between blocks. Each element of the output vector is a copy of an element in the input vector, selected by the corresponding index in an index vector v2 or a constant. The indexes are relative to the start of the block they belong to, so that an index of zero will select the first element in the block of the input vector and insert it in the corresponding position of the output vector. The same element in the input vector can be copied to multiple elements in the output vector. An index out of range will produce a zero. The indexes are interpreted as integers regardless of the operand type.
\vv

The permute instruction has two versions. The first version specifies the indexes in a vector with the same length and element size as the input vector.
\vv

The second version specifies the indexes as a 32-bit immediate constant with 4 bits per element. This constant is split into a maximum of 8 elements with 4 bits in each, where the least significant four bits are the index for the first element in the block.
If the blocks have more than 8 elements each then the sequence of 8 elements is repeated to fill a block. The same pattern of indexes will be applied to all blocks in the second version of the permute instruction.
\vv
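
The second version can be sketched in Python as follows. Vectors are modeled as lists and the block size is given in elements rather than bytes; both are simplifications made for illustration, and the function name is hypothetical.

```python
def permute_imm(source, block_elems, pattern):
    """Permute within blocks using a 32-bit constant holding eight 4-bit indexes."""
    # The least significant nibble is the index for the first element of a block.
    indexes = [(pattern >> (4 * i)) & 0xF for i in range(8)]
    dest = []
    for pos in range(len(source)):
        within = pos % block_elems          # position inside the current block
        block_start = pos - within
        idx = indexes[within % 8]           # the 8 indexes repeat in larger blocks
        if idx < block_elems and block_start + idx < len(source):
            dest.append(source[block_start + idx])
        else:
            dest.append(0)                  # index out of range produces zero
    return dest


# Reverse the order of the elements within each block of four.
reversed_blocks = permute_imm([10, 11, 12, 13, 20, 21, 22, 23], 4, 0x0123)
```

The same index pattern is applied to every block, which matches the description of the second version above.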
 
The maximum block size for the permute instruction is implementation-dependent and given by a special register. The reason for this limitation of block size is that the complexity of the hardware grows quadratically with the block size. A full permutation is possible if the vector length does not exceed the maximum block size. A trap is generated if r3 is bigger than the maximum block size.
\vv

The outputs of multiple permute instructions can be combined by using out-of-range indexes to produce zeroes for unused outputs and then combining the outputs by bitwise OR.
The fallback value is zero if a mask is used.
\vv

Permute instructions are essential for a vector processor because it is often necessary to rearrange data to facilitate the vector processing. These instructions are useful for reordering data, for transposing a matrix, etc.
\vv

Permute instructions can also be used for parallel table lookup when the block size is big enough to contain the entire table.
\vv

Finally, permute instructions can be used for gathering and scattering data within an area not bigger than the vector length or the block size.
\vv
 
\subsubsection{read\_insert}
\label{table:readInsertInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.5 A & 32 & vectors. Optional \\ \hline
\end{tabular}
\vv

int32 v0 = read\_insert(v0, r1, [r2+0x8, scalar])
\vv

Replace one element in vector RD, starting
at offset RT$\cdot$OS, with scalar memory operand
[RS+IM2].

(OS = operand size).
 
\subsubsection{repeat\_block}
\label{table:repeatBlockInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.2.7 & 8.1 & vectors. Optional \\ \hline
\end{tabular}
\vv

float v0 = repeat\_block(v1, r2, 8)
\vv

Repeat a block of data to make a longer vector. This is the same as broadcast, but with a larger block of data. v1 is an input vector containing a data block to repeat. A constant (IM2) is the length in bytes of the block to repeat. This must be a multiple of 4. r2 is the length in bytes of the result vector. This instruction is useful for matrix multiplication.
\vv

This instruction cannot have a mask.
\vv
 
\subsubsection{repeat\_within\_blocks}
\label{table:repeatWithinBlockInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.2.7 & 9.1 & vectors. Optional \\ \hline
\end{tabular}
\vv

float v0 = repeat\_within\_blocks(v1, r2, 8)
\vv

This divides a vector into blocks and broadcasts the first element of each block to the rest of the block. The block size is given by a constant (IM2). This must be a multiple of the operand size, and at least 4 bytes. There may be a maximum limit to the block size. r2 is the length in bytes of the result vector. This instruction is useful for matrix multiplication.
\vv

For example, if the input vector contains (0,1,2,3,4,5,6,7,8) and the block size is 3 times the operand size, then the result will be (0,0,0,3,3,3,6,6,6).
\vv
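
The example above can be reproduced with a short Python sketch. Vectors are modeled as lists, and the block size and result length are given in elements rather than bytes. Supplying zero when the result is longer than the input is an assumption of this sketch.

```python
def repeat_within_blocks(source, block_elems, num_elements=None):
    """Broadcast the first element of each block to the rest of that block."""
    n = len(source) if num_elements is None else num_elements
    dest = []
    for pos in range(n):
        first = pos - pos % block_elems      # index of the first element of the block
        dest.append(source[first] if first < len(source) else 0)
    return dest


result = repeat_within_blocks([0, 1, 2, 3, 4, 5, 6, 7, 8], 3)
```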
 
This instruction cannot have a mask.
\vv

\subsubsection{replace}
\label{table:replaceInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.6 & 3 & vectors and 32-bit immediate constant \\ \hline
3.1 & 32 & vectors and 64-bit immediate constant. Optional \\ \hline
\end{tabular}
\vv

int32 v0 = replace(v1, 1), mask=v2, fallback=v3\\
double v0 = replace(v1, 2.3)
\vv

All elements of src1 are replaced by the integer or floating point constant src2.
\vv

When used without a mask, the constant is simply broadcast to make a vector of the same length as src1. This is useful for broadcasting a constant to all elements of a vector. Only the length of src1 (in bytes) is used, not its contents, when this instruction is used without a mask.
\vv

When used with a mask, the elements of src1 are selectively replaced. Elements that are not selected by the mask will be taken from a fallback register.
 
\subsubsection{replace\_even}
\label{table:replaceEvenInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.6 & 4 & vectors and 32-bit immediate constant \\ \hline
\end{tabular}
\vv

Same as replace. Only even-numbered vector elements are replaced.

\subsubsection{replace\_odd}
\label{table:replaceOddInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2.6 & 5 & vectors and 32-bit immediate constant \\ \hline
\end{tabular}
\vv

Same as replace. Only odd-numbered vector elements are replaced.
 
\subsubsection{set\_len}
\label{table:setLenInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 & 2 & vectors \\ \hline
\end{tabular}
\vv

v1 = set\_len(v2, r3)
\vv

Sets the length of a vector register to the number of bytes specified by a general purpose register. If the specified length is more than the maximum length for the specified operand type then the maximum length will be used.
\vv

If the output vector is longer than the input vector then the extra elements will be zero. If the output vector is shorter than the input vector then the extra elements will be discarded.
\vv

This instruction cannot have a mask.
\vv
 
\subsubsection{set\_num}
\label{table:setNumInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 & 3 & vectors \\ \hline
\end{tabular}
\vv

v1 = set\_num(v2, r3)
\vv

The length of a vector register is changed to the value of a general purpose register. The length is indicated as a number of elements. If the length is increased then the extra elements will be zero. If the length is decreased then the superfluous elements are lost.

\vv
This instruction differs from set\_len by multiplying the length by the operand size.
This instruction cannot have a mask.
 
\subsubsection{shift\_down}
\label{table:shiftDownInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 & 19 & vectors \\ \hline
\end{tabular}
\vv

int32 v0 = shift\_down(v1, r2)
\vv

Shift elements of a vector down by the number of elements (n) indicated by a general purpose register.
The upper n elements of the result will be zero, the lower n elements are lost. The length of the vector is not changed.
\vv
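
As a rough illustration, the element movement can be modeled in Python with a list standing in for the vector. Treating a shift count larger than the vector as producing all zeros is an assumption of this sketch.

```python
def shift_down(vector, n):
    """Move every element n positions down; fill the upper n elements with zero."""
    n = max(0, min(n, len(vector)))   # assumption: large counts clear the vector
    return vector[n:] + [0] * n       # length is unchanged


result = shift_down([1, 2, 3, 4, 5], 2)
```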
 
This instruction differs from shift\_reduce by indicating the shift count as a number of elements rather than a number of bytes, and by not changing the length of the vector.
\vv

This instruction cannot have a mask.
\vv

\subsubsection{shift\_expand}
\label{table:shiftExpandInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 & 16 & vectors \\ \hline
\end{tabular}
\vv

int32 v0 = shift\_expand(v1, r2)
\vv

The length of a vector is expanded by the specified number of bytes by adding zero-bytes at the low end and shifting all bytes up. If the resulting length is more than the maximum vector length for the specified operand type then the upper bytes are lost.
\vv

This instruction cannot have a mask.
\vv
 
\subsubsection{shift\_reduce}
\label{table:shiftReduceInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 & 17 & vectors \\ \hline
\end{tabular}
\vv

int32 v0 = shift\_reduce(v1, r2)
\vv

The length of a vector is reduced by the specified number of bytes by removing bytes at the low end and shifting all bytes down. If the resulting length is less than zero then the result will be a zero-length vector. The specified operand type is ignored.
\vv

This instruction cannot have a mask.
\vv
 
\subsubsection{shift\_up}
\label{table:shiftUpInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1.2 & 18 & vectors \\ \hline
\end{tabular}
\vv

int32 v0 = shift\_up(v1, r2)
\vv

Shift elements of a vector up by the number of elements (n) indicated by a general purpose register.
The lower n elements of RD will be zero, the upper n elements are lost. The length of the vector is not changed.
\vv

This instruction differs from shift\_expand by indicating the shift count as a number of elements rather than a number of bytes, and by not changing the length of the vector.
\vv

This instruction cannot have a mask.
\vv
 
\subsubsection{sign\_extend}
\label{table:signExtendInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
multi & 4 & general purpose and integer scalar \\ \hline
\end{tabular}
\vv

int8 r0 = sign\_extend(r1)  // result is 64 bits\\
int8 v0 = sign\_extend(v1)  // lower 8 bits of each 64-bit element are extended to 64 bits\\
int8 v0 = sign\_extend([r1, scalar]) // memory operand is 8 bits, result is a 64-bit scalar
\vv

Sign-extend a smaller integer to 64 bits.

\vv
The input can be an 8-bit, 16-bit or 32-bit integer. This integer is sign-extended to produce a 64-bit output in a general purpose register or a scalar in a vector register. If the input is a vector then only the first element in each 64-bit block of the input vector is used. Floating point types cannot be used.
 
\subsubsection{sign\_extend\_add}
\label{table:signExtendAddInstruction}
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
\hline
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
multi & 5 & general purpose registers \\ \hline
\end{tabular}
\vv

int8 r0 = sign\_extend\_add(r1, r2) \\
int32 r0 = sign\_extend\_add(r1, [r2]), options = 2
\vv

src2 is an integer of the specified size, often a memory operand.
This integer is sign-extended to produce a 64-bit integer.
The sign-extended value is optionally shifted left by a value of 1 .. 3, specified in the options.
The result is added to the 64-bit integer in src1 and the result is stored in the 64-bit destination register.
\vv

This instruction is useful for converting relative pointers to absolute pointers, where the reference point is in src1. The relative pointer may be scaled by a factor of 1, 2, 4, or 8, corresponding to a shift count of 0, 1, 2, or 3, respectively. Support for larger scale factors is optional.
\vv
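
The relative pointer conversion can be sketched in Python as follows. The helper names are hypothetical, registers are modeled as integers truncated to 64 bits, and the result wraps around on overflow.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def sign_extend_add(src1: int, src2: int, bits: int, shift: int = 0) -> int:
    """dest = src1 + (sign_extend(src2) << shift), wrapped to 64 bits."""
    return (src1 + (sign_extend(src2, bits) << shift)) & MASK64


# A 16-bit relative pointer of -4, scaled by 4, relative to reference point 0x1000.
absolute = sign_extend_add(0x1000, 0xFFFC, 16, shift=2)
```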
 
790
This instruction does not sign-extend when the operand size is 64 bits, but it can still add and shift 64-bit integers.
791
\vv
792
 
793
This instruction will not generate traps in case of signed or unsigned overflow.
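\vv

A minimal C sketch of this calculation for operand sizes below 64 bits is given here. The function name and the parameters bits and shift are assumptions for illustration. Unsigned arithmetic is used so that overflow wraps around silently, as described above.

\begin{lstlisting}[frame=none, language=C]
#include <stdint.h>

/* Hypothetical model of sign_extend_add.
   bits:  size of src2 in bits (8, 16 or 32)
   shift: shift count from the options (0 .. 3) */
uint64_t sign_extend_add_model(uint64_t src1, uint64_t src2,
                               unsigned bits, unsigned shift)
{
    uint64_t sign = (uint64_t)1 << (bits - 1);
    /* sign-extend src2 to 64 bits */
    uint64_t ext  = ((src2 & ((sign << 1) - 1)) ^ sign) - sign;
    /* scale the relative pointer and add the reference point */
    return src1 + (ext << shift);
}
\end{lstlisting}

For example, a 32-bit relative pointer of -4 scaled by 4 and added to a reference point of 0x1000 gives 0xFF0 in this model.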
794
\vv
795
 
796
 
797
\subsubsection{vec2gp}
798
\label{table:vec2gpInstruction}
799
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
800
\hline
801
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
802
1.3 B & 1 & vector register in, g.p. register out \\ \hline
803
\end{tabular}
804
\vv
805
 
806
int64 r0 = vec2gp(v1)
807
\vv
808
 
809
Copy value of first element of vector register RS to general purpose register RD. Integers are sign-extended. Single precision floating point values are zero-extended.
810
\vv
811
 
812
 
813
\subsection{Data read and write instructions}
814
\vv
815
 
816
\subsubsection{address}
817
\label{table:addressInstruction}
818
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
819
\hline
820
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
821
2.9 A & 32 & general purpose register \\ \hline
822
\end{tabular}
823
\vv
824
 
825
int64 r1 = address([mydata])
826
\vv
827
 
828
Gives the address of a data object in static memory.
829
\vv
830
 
831
The value must be shifted two places to the right if used as the target for a jump or call instruction, because code addresses are based on 32-bit words rather than bytes.
832
\vv
833
 
834
 
835
\subsubsection{clear}
836
\label{table:clearInstruction}
837
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
838
\hline
839
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
840
1.3 B & 58 & vector. Optional \\ \hline
841
\end{tabular}
842
\vv
843
 
844
clear(v5)      // clear one vector register \\
845
clear(v5, 8)   // clear vector registers v5 - v8
846
\vv
847
 
848
Clear one or more vector registers by setting the length to zero. A cleared register is regarded as unused.
849
\vv
850
 
851
It may be advantageous to clear vector registers after use. This will mean that there is less data to save during a task switch.
852
\vv
853
 
854
 
855
\subsubsection{extract\_store}
856
\label{table:extractStoreInstruction}
857
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
858
\hline
859
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
860
2.5 A & 40 & vector. Optional \\ \hline
861
\end{tabular}
862
\vv
863
 
864
int32 [r3+8, scalar] = extract\_store(v1, r2)
865
\vv
866
 
867
Extract one element from vector RD, starting at offset RT$\cdot$OS, with size OS into memory operand [RS+IM2].
868
 
869
(OS = operand size).
870
 
871
 
872
\subsubsection{fence}
873
\label{table:fenceInstruction}
874
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
875
\hline
876
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
877
2.5 B & 16 & memory operand and immediate. Optional \\ \hline
878
\end{tabular}
879
\vv
880
 
881
int32   fence([r1], 2)
882
\vv
883
 
884
Memory fence at address [RS+IM2].
885
\vv
886
 
887
Options indicated by IM1:
888
 
889
\begin{longtable}{|p{20mm}|p{50mm}|}
890
\endfirsthead
891
\endhead
892
\hline
893
\bfseries IM1 value & \bfseries meaning \\ \hline
894
1 & read fence \\ \hline
895
2 & write fence \\ \hline
896
3 & read and write fence \\ \hline
897
\end{longtable}
898
\vv
899
 
900
\subsubsection{move}
901
The move instruction, described on page \pageref{table:moveInstruction},
can load a register from a memory operand.
903
 
904
\subsubsection{pop}
905
\label{table:popInstruction}
906
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
907
\hline
908
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
909
1.8 B & 57 & general purpose registers. Optional \\
910
1.3 B & 57 & vector registers. Optional \\ \hline
911
\end{tabular}
912
\vv
913
 
914
\begin{lstlisting}[frame=none]
pop(r5)         // pop 64-bit register r5 off the stack
pop(r1, r2, 6)  // pop registers r2-r6 from stack pointed to by r1
pop(v5)         // pop vector register v5 off the stack
pop(v5, 9)      // pop vector registers v5-v9 off the stack
\end{lstlisting}
920
\vv
921
 
922
The pop instruction can pop one or more registers from a stack. The registers are popped in reverse order.
923
\vv
924
 
925
An optional first register (RD) indicates a stack pointer. The default stack pointer (SP) is used if not specified. An optional last operand is an index of the last register to pop. The syntax for the POP instruction has no equal sign. The operand size is 64 bits by default. A different operand type is allowed only for general purpose registers.
926
\vv
927
 
928
The stack is growing backwards by default. The last register is read from the address pointed to by the stack pointer. Then the stack pointer is incremented by the amount that was occupied by the register. This is 8 bytes by default for a general purpose register or a variable amount for a vector register. This process is repeated if multiple registers are popped. Registers are pushed in forward order and popped in reverse order.
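\vv

The order of operations for general purpose registers on a backwards-growing stack can be sketched in C as follows. The names push\_model, pop\_model, mem and sp are assumptions for illustration, and each register is assumed to occupy the default 8 bytes. Vector registers are not modelled.

\begin{lstlisting}[frame=none, language=C]
#include <stdint.h>
#include <string.h>

/* Hypothetical model: sp is a byte offset into mem,
   r[] stands for the general purpose registers. */
void push_model(uint8_t *mem, uint64_t *sp, const uint64_t r[],
                int first, int last)
{
    for (int i = first; i <= last; i++) {  /* forward order */
        *sp -= 8;                          /* decrement, then store */
        memcpy(mem + *sp, &r[i], 8);
    }
}

void pop_model(const uint8_t *mem, uint64_t *sp, uint64_t r[],
               int first, int last)
{
    for (int i = last; i >= first; i--) {  /* reverse order */
        memcpy(&r[i], mem + *sp, 8);       /* load, then increment */
        *sp += 8;
    }
}
\end{lstlisting}

Pushing r2-r6 and then popping r2-r6 restores both the registers and the stack pointer in this model.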
929
\vv
930
 
931
It is possible to make a forward-growing stack for general purpose registers by adding 0x80 to the last operand. A stack containing vector registers cannot grow forwards because the pop instruction needs to read the vector length stored at the beginning of each field before it can read the rest of the vector.
932
\vv
933
 
934
See the push instruction on page \pageref{table:pushInstruction} for more details.
935
\vv
936
 
937
 
938
\subsubsection{prefetch}
939
\label{table:prefetchInstruction}
940
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
941
\hline
942
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
943
multi & 3 & memory operand. Optional \\ \hline
944
\end{tabular}
945
\vv
946
 
947
Prefetch memory operand into cache for later read or write.
948
Different variants (not yet defined) can be specified by option bits in IM3 for formats with E template.
949
\vv
950
 
951
 
952
\subsubsection{push}
953
\label{table:pushInstruction}
954
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
955
\hline
956
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
957
1.8 B & 56 & general purpose register. Optional \\
958
1.3 B & 56 & vector register. Optional \\ \hline
959
\end{tabular}
960
\vv
961
 
962
\begin{lstlisting}[frame=none]
push(r5)           // push 64-bit register r5 on the stack
push(r1, r2, 6)    // push registers r2-r6 on stack pointed to by r1
push(r1, r2, 0x86) // push registers r2-r6 on forward growing stack r1
push(v5, 9)        // push vector registers v5-v9 on the stack
\end{lstlisting}
968
\vv
969
 
970
The push instruction can push one or more registers on a stack.
971
\vv
972
 
973
An optional first register (RD) indicates a stack pointer. The default stack pointer (SP) is used if not specified. An optional last operand is an index of the last register to push. The syntax for the PUSH instruction has no equal sign. The operand size is 64 bits by default. A different operand type is allowed only for general purpose registers.
974
\vv
975
 
976
The stack is growing backwards by default.
977
The stack pointer is decremented by the amount that will be occupied by the register. The first register is then stored to the address pointed to by the stack pointer. This size is 8 bytes for a full general purpose register or a variable amount for a vector register. This process is repeated if multiple registers are pushed.
978
\vv
979
 
980
It is possible to make a forward-growing stack for general purpose registers by adding 0x80 to the last operand. This may be used as an increment-pointer-and-store instruction. A stack containing vector registers cannot grow forwards because a later pop instruction needs to read the vector length stored at the beginning of each field before it can read the rest of the vector.
981
\vv
982
 
983
Note that vector registers are stored in an implementation-dependent way by the push instruction. The microprocessor may compress the data or it may insert extra space for optimal alignment of memory access. The programmer should make no assumption about how the vector elements are stored. A pushed vector register can only be restored by a pop instruction on the same microprocessor that pushed it, or on an identical one. If the memory image is moved before restoring, it must be moved by a multiple of the maximum vector length. The maximum amount of memory occupied by a pushed vector register is 8 bytes plus the maximum vector length.
984
\vv
985
 
986
See also the pop instruction on page \pageref{table:popInstruction} for more details.
987
\vv
988
 
989
 
990
\subsubsection{store}
991
\label{table:storeInstruction}
992
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
993
\hline
994
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
995
multi &  1 & memory operand and g.p. or vector register \\ \hline
996
2.5 B &  8 & memory operand and 32-bit constant. Optional \\ \hline
997
\end{tabular}
998
\vv
999
 
1000
int32 [r0+r1*4] = r1\\
1001
float [r0, length = r1] = v2 \\
1002
float [r0 + 0x10] = 2.5
1003
\vv
1004
 
1005
Write the value of a register or constant to a memory operand.
1006
\vv
1007
 
1008
The size of the memory operand is determined by the operand size OS when a scalar memory operand is specified, or by the vector length register in RS when a vector operand is specified.
1009
\vv
1010
 
1011
An immediate constant cannot be bigger than 32 bits. A 64 bit integer constant can only be used if it fits into a 32-bit signed integer. A float64 constant can only be used if it can be represented as single precision without loss of precision.
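\vv

An assembler or compiler could test these two conditions along the following lines. This is a sketch only. The helper names are hypothetical, and the floating point test assumes IEEE 754 arithmetic on the host. A NAN is reported as not fitting by this simple test.

\begin{lstlisting}[frame=none, language=C]
#include <stdint.h>

/* Hypothetical helper: does a 64-bit integer constant
   fit into a 32-bit signed integer? */
int fits_int32(int64_t c)
{
    return c >= INT32_MIN && c <= INT32_MAX;
}

/* Hypothetical helper: can a double precision constant be
   represented as single precision without loss of precision? */
int fits_float32(double c)
{
    return (double)(float)c == c;  /* exact round trip */
}
\end{lstlisting}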
1012
\vv
1013
 
1014
The hardware must be able to handle memory operand sizes that are not powers of 2 without touching additional memory (read and rewrite beyond the memory operand is not allowed unless access from other threads is blocked during the operation and any access violation is suppressed).
1015
It is allowed for the hardware to write the operand in a piecemeal fashion.
1016
\vv
1017
 
1018
Masked operation with a mask of zero will leave the corresponding memory element untouched. An explicit fallback value cannot be specified.
1019
\vv
1020
 
1021
 
1022
\subsection{General arithmetic instructions}
1023
 
1024
\subsubsection{abs}
1025
\label{table:absInstruction}
1026
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1027
\hline
1028
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1029
1.8 B &  0 & g.p. registers \\ \hline
1030
1.3 B & 16 & vector registers \\ \hline
1031
\end{tabular}
1032
\vv
1033
 
1034
int32 r0 = abs(r1, 1)
1035
\vv
1036
 
1037
 
1038
Absolute value of signed number.
1039
\vv
1040
 
1041
Signed integers can overflow when the input is the minimum value.
1042
The handling of overflow for signed integers is controlled by the constant IM1 as follows:
1043
 
1044
\begin{longtable}{|p{12mm}|p{80mm}|}
1045
\endfirsthead
1046
\endhead
1047
\hline
1048
\bfseries IM1 & \bfseries result when input is INT\_MIN \\ \hline
1049
 
1050
1  & INT\_MAX (saturation)  \\ \hline
1051
2  & zero                   \\ \hline
1052
\end{longtable}
1053
\vv
1054
 
1055
 
1056
\subsubsection{add}
1057
\label{table:addInstruction}
1058
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1059
\hline
1060
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1061
multi &  8 & all standard types \\ \hline
1062
multi & 44 & float16. Optional \\ \hline
1063
1.1 C &  6 & 32-bit register and 16-bit sign-extended constant \\ \hline
1064
1.1 C & 10 & 32-bit register and 8-bit sign-extended constant shifted left by another constant. \\ \hline
1065
1.1 C & 11 & 64-bit register and 8-bit sign-extended constant shifted left by another constant. \\ \hline
1066
1.1 C & 18 & 32-bit register and 16-bit zero-extended constant shifted left by 16 \\ \hline
1067
2.9   &  2 & g.p. register and 32-bit zero-extended constant \\ \hline
1068
2.9   &  4 & g.p. register and 32-bit constant shifted left by 32 \\ \hline
1069
1.4 C &  1 & vector of 16-bit integer elements and broadcast 16 bit integer constant. Optional \\ \hline
1070
1.4 C & 10 & vector of 32-bit integer elements and broadcast 8-bit sign-extended constant shifted left by another constant. Optional \\ \hline
1071
1.4 C & 11 & vector of 64-bit integer elements and broadcast 8-bit sign-extended constant shifted left by another constant. Optional \\ \hline
1072
1.4 C & 34 & single precision floating point vector and broadcast half precision floating point constant. Optional \\ \hline
1073
1.4 C & 35 & double precision floating point vector and broadcast half precision floating point constant. Optional \\ \hline
1074
1.4 C & 40 & half precision floating point vector and broadcast half precision floating point constant. Optional \\ \hline
1075
\end{tabular}
1076
\vv
1077
 
1078
int32 r0 = r1 + r2 \\
1079
int32 r0 = r1 + 2 \\
1080
int32+ r0 += 4 \\
1081
int32+ r0++ \\
1082
float v0 = v1 + [r2 + 8, length = r5]
1083
\vv
1084
 
1085
Addition.
1086
\vv
1087
 
1088
If you want to add a 64-bit constant to a general purpose register, and triple size instructions are not supported, then add the lower half first using the zero-extended version, and then add the upper half using the shifted version.
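\vv

The two-step addition can be illustrated in C as follows. The function name is an assumption for the example. The lower half must be zero-extended rather than sign-extended, because a sign-extended lower half would change the upper 32 bits of the sum.

\begin{lstlisting}[frame=none, language=C]
#include <stdint.h>

/* Hypothetical model: add a 64-bit constant c to r in two steps. */
uint64_t add_const64_model(uint64_t r, uint64_t c)
{
    uint32_t lo = (uint32_t)c;          /* lower half */
    uint32_t hi = (uint32_t)(c >> 32);  /* upper half */
    r += (uint64_t)lo;        /* 32-bit zero-extended constant     */
    r += (uint64_t)hi << 32;  /* 32-bit constant shifted left by 32 */
    return r;
}
\end{lstlisting}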
1089
 
1090
 
1091
\subsubsection{add\_add}
1092
\label{table:addAddInstruction}
1093
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1094
\hline
1095
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1096
multi & 51 & all types. Optional \\ \hline
1097
\end{tabular}
1098
\vspace{3mm}
1099
 
1100
This gives two additions in one instruction:
1101
\vv
1102
 
1103
dest = $\pm$ src1 $\pm$ src2 $\pm$ src3
1104
\vv
1105
 
1106
For optimal precision with floating point operands, the intermediate sum of the two numerically largest operands should preferably be calculated first with extended precision.
1107
\vv
1108
 
1109
The signs of the operands can be inverted as indicated by the following option bits:
1110
 
1111
\begin{longtable} {|p{20mm}|p{75mm}|}
1112
\caption{Control bits for add\_add}
1113
\label{table:ControlBitsForAddAdd} \\
1114
\endfirsthead
1115
\endhead
1116
\hline
1117
\bfseries Option bits & \bfseries Meaning   \\
1118
\hline
1119
bit 0 & change sign of src1 \\
1120
bit 1 & change sign of src2 \\
1121
bit 2 & change sign of src3 \\
1122
\hline
1123
\end{longtable}
1124
 
1125
There is no sign change if there are no option bits.
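\vv

The effect of the option bits can be sketched for integer operands as follows. The function name is hypothetical, and wrap-around on overflow is an assumption of this model, not something specified above for add\_add.

\begin{lstlisting}[frame=none, language=C]
#include <stdint.h>

/* Hypothetical integer model of add_add; opt holds option bits 0-2. */
int64_t add_add_model(int64_t src1, int64_t src2, int64_t src3,
                      unsigned opt)
{
    /* unsigned arithmetic: overflow wraps around (assumed) */
    uint64_t a = (uint64_t)src1;
    uint64_t b = (uint64_t)src2;
    uint64_t c = (uint64_t)src3;
    if (opt & 1) a = 0 - a;  /* bit 0: change sign of src1 */
    if (opt & 2) b = 0 - b;  /* bit 1: change sign of src2 */
    if (opt & 4) c = 0 - c;  /* bit 2: change sign of src3 */
    return (int64_t)(a + b + c);
}
\end{lstlisting}

With options = 6, for example, this model computes src1 - src2 - src3.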
1126
\vv
1127
 
1128
This instruction may be supported for integer operands or floating point or both.
1129
\vv
1130
 
1131
 
1132
\subsubsection{compare} \label{compare}
1133
\label{table:compareInstruction}
1134
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1135
\hline
1136
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1137
multi &  7 & all types \\ \hline
1138
\end{tabular}
1139
\vv
1140
 
1141
Examples:\\
1142
int8 r0 = r1 $>$ r2 \\
1143
uint8 r0 = r1 $>$ r2 \\
1144
float v0 = v1 $<=$ 2.3 \\
1145
int32 r0 = compare(r1, 2), mask=r3, fallback=r4, options=0b1001
1146
\vv
1147
 
1148
The compare instruction compares two source operands and generates a boolean scalar or vector where bit 0 indicates the result. This instruction can do different compare operations depending on option bits 0-3 defined according to the following table:
1149
 
1150
\begin{longtable} {|p{14mm}|p{50mm}|p{50mm}|}
1151
\caption{Condition codes for compare instruction}
1152
\label{table:conditionCodesForCompareInstruction} \\
1153
\endfirsthead
1154
\endhead
1155
\hline
1156
\bfseries Bit 3-2-1-0 & \bfseries Meaning for integer & \bfseries Meaning for floating point \\
1157
\hline
1158
\_ 0 0 0 & a $=$ b    & a $=$ b \\
1159
\_ 0 0 1 & a $\neq$ b & a $\neq$ b \\
1160
\_ 0 1 0 & a $<$ b    & a $<$ b \\
1161
\_ 0 1 1 & a $\geq$ b & a $\geq$ b \\
1162
\_ 1 0 0 & a $>$ b    & a $>$ b \\
1163
\_ 1 0 1 & a $\leq$ b & a $\leq$ b \\
1164
\_ 1 1 0 &            & abs(a) $<$ abs(b) \\
1165
\_ 1 1 1 &            & abs(a) $\geq$ abs(b) \\
1166
\hline
1167
 
1168
1 \_ \_ \_ & compare as unsigned & unordered gives 1 \\
1169
\hline
1170
\end{longtable}
1171
 
1172
Option bit 3 indicates how to treat floating point NAN inputs. A compare operation is considered unordered if at least one floating point input operand is NAN. The translation of high level language operators to ordered and unordered compare operations is listed on page \pageref{table:floatCompareJumpInstructions}.
1173
\vv
1174
 
1175
The result is indicated in bit 0 of the destination register. It is 1 for true and 0 for false. The remaining bits are copied from a mask register, or zero if there is no mask register. The number of mask bits available is implementation dependent.
1176
\vv
1177
 
1178
The condition code is zero (indicating compare for equal) if there are no option bits.
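\vv

The integer column of table \ref{table:conditionCodesForCompareInstruction} can be sketched in C as follows. The function name is hypothetical. Only 64-bit integer operands are modelled, and the value returned for condition codes 6 and 7, which have no integer meaning in the table, is a placeholder.

\begin{lstlisting}[frame=none, language=C]
#include <stdint.h>

/* Hypothetical model of compare for 64-bit integers.
   opt bits 0-2 select the condition, bit 3 selects unsigned. */
int compare_int_model(uint64_t a, uint64_t b, unsigned opt)
{
    int lt, gt;
    if (opt & 8) {                 /* bit 3: compare as unsigned */
        lt = a < b;
        gt = a > b;
    } else {                       /* signed: flip sign bits first */
        uint64_t bias = (uint64_t)1 << 63;
        lt = (a ^ bias) < (b ^ bias);
        gt = (a ^ bias) > (b ^ bias);
    }
    switch (opt & 7) {
    case 0:  return !lt && !gt;    /* a == b */
    case 1:  return lt || gt;      /* a != b */
    case 2:  return lt;            /* a <  b */
    case 3:  return !lt;           /* a >= b */
    case 4:  return gt;            /* a >  b */
    case 5:  return !gt;           /* a <= b */
    default: return 0;             /* placeholder, see text */
    }
}
\end{lstlisting}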
1179
\vv
1180
 
1181
A fallback register can be used as operand for an extra boolean operation, with or without a mask. Only bit 0 of the fallback register is used.
1182
This option is controlled by option bits 4-5:
1183
 
1184
\begin{longtable} {|p{25mm}|p{50mm}|p{50mm}|}
1185
\caption{Alternative use of fallback register}
1186
\label{table:AlternativeFallbackForCompare} \\
1187
\endfirsthead
1188
\endhead
1189
\hline
1190
\bfseries bit 5 bit 4 & \bfseries Output with mask & \bfseries Output without mask \\
1191
\hline
1192
\hspace{5mm} 0 0 & mask ? result : fallback  & result \\
1193
\hline
1194
\hspace{5mm} 0 1 & mask \&\& result \&\& fallback & result \&\& fallback \\
1195
\hline
1196
\hspace{5mm} 1 0 & mask \&\& (result $||$ fallback) & result $||$ fallback \\
1197
\hline
1198
\hspace{5mm} 1 1 & mask \&\& (result \^{} fallback) & result \^{} fallback \\
1199
\hline
1200
\end{longtable}
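
The rows of table \ref{table:AlternativeFallbackForCompare} can be written out in C as follows. The function name and the has\_mask flag are assumptions for the example. All inputs are booleans taken from bit 0 of the corresponding registers.

\begin{lstlisting}[frame=none, language=C]
/* Hypothetical model of the alternative use of the fallback register.
   opt holds the option bits; only bits 4-5 are used here. */
int compare_fallback_model(int result, int fallback, int mask,
                           int has_mask, unsigned opt)
{
    int m = has_mask ? mask : 1;   /* no mask behaves like mask = 1 */
    switch ((opt >> 4) & 3) {
    case 0:  return has_mask ? (mask ? result : fallback) : result;
    case 1:  return m && result && fallback;
    case 2:  return m && (result || fallback);
    default: return m && (result ^ fallback);
    }
}
\end{lstlisting}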
1201
\vv
1202
 
1203
 
1204
\subsubsection{div}
1205
\label{table:divInstruction}
1206
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1207
\hline
1208
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1209
multi & 14 & all types. Optional for integer vectors \\ \hline
1210
\end{tabular}
1211
\vv
1212
 
1213
int32 r0 = r1 / r2 \\
1214
int32 r0 = div(r1, r2), options = 4\\
1215
float v0 = v1 / [r2 + 8, length = r5]
1216
\vv
1217
 
1218
Signed division.
1219
 
1220
\vv
1221
This instruction has multiple rounding modes. The rounding mode for integer operands is controlled by option bits (IM3) as follows:
1222
 
1223
\begin{longtable} {|p{25mm}|p{80mm}|}
1224
\caption{division instructions}
1225
\label{table:DivInstructions} \\
1226
\endfirsthead
1227
\endhead
1228
\hline
1229
\bfseries Option bits 0-3 & \bfseries Meaning   \\
1230
\hline
1231
 
1232
\hline
1233
 
1234
 
1235
 
1236
 
1237
\hline
1238
other values & Not allowed \\
1239
\hline
1240
\end{longtable}
1241
Truncation is always used with integer operands when there are no option bits.
1242
 
1243
\vv
1244
The rounding mode for floating point operands is controlled by the mask or numeric control register. Option bits must be zero for floating point operands.
1245
 
1246
\vv
1247
Division of floating point operands by zero gives $\pm$INF (or NAN if exceptions are enabled).
1248
 
1249
Division of integer operands by zero gives INT\_MAX or INT\_MIN.
1250
 
1251
Overflow occurs by division of INT\_MIN by -1. The result will wrap around to give INT\_MIN.
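\vv

The special cases for integer operands can be sketched in C for 32-bit operands with the default truncation. The function name is hypothetical. The text above does not say when division by zero gives INT\_MAX and when it gives INT\_MIN, so selecting by the sign of the dividend is purely an assumption of this sketch.

\begin{lstlisting}[frame=none, language=C]
#include <stdint.h>

/* Hypothetical model of signed 32-bit division with truncation. */
int32_t div32_model(int32_t a, int32_t b)
{
    if (b == 0)                       /* division by zero */
        return a < 0 ? INT32_MIN : INT32_MAX;  /* sign rule assumed */
    if (a == INT32_MIN && b == -1)    /* overflow wraps around */
        return INT32_MIN;
    return a / b;                     /* C truncates toward zero */
}
\end{lstlisting}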
1252
\vv
1253
 
1254
 
1255
\subsubsection{div\_rev}
1256
\label{table:divRevInstruction}
1257
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1258
\hline
1259
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1260
multi & 16 & all types. Optional for integer vectors \\ \hline
1261
\end{tabular}
1262
\vv
1263
 
1264
int32 r0 = 10 / r2 \\
1265
int32 v0 = div\_rev(v1, v2), options = 4
1266
\vv
1267
 
1268
Same as div, with the two source operands swapped.
1269
 
1270
The rounding mode is controlled in the same way as for the div instruction.
1271
\vv
1272
 
1273
 
1274
\subsubsection{div\_u}
1275
\label{table:divUInstruction}
1276
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1277
\hline
1278
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1279
multi & 15 & all integer types. Optional for integer vectors \\ \hline
1280
\end{tabular}
1281
\vv
1282
 
1283
uint32 r0 = r1 / r2 \\
1284
uint32 v0 = div\_u(v1, v2), options=4
1285
\vv
1286
 
1287
Unsigned integer division.
1288
 
1289
The rounding mode is controlled in the same way as for the div instruction, see page \pageref{table:DivInstructions}.
1290
 
1291
\vv
1292
Division by zero gives UINT\_MAX.
1293
\vv
1294
 
1295
 
1296
\subsubsection{div\_ex}
1297
\label{table:divExInstruction}
1298
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1299
\hline
1300
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1301
1.2 A & 24 & Integer vectors. Optional for more than one element \\ \hline
1302
\end{tabular}
1303
\vv
1304
 
1305
Divide vector of double-size signed integers RS by signed integers RT. RS has element size 2$\cdot$OS. These are divided by the even-numbered elements of RT with size OS. The truncated results are stored in the even-numbered elements of RD. The remainders are stored in the odd-numbered elements of RD.
1307
(OS = operand size).
1308
\vv
1309
 
1310
 
1311
\subsubsection{div\_ex\_u}
1312
\label{table:divExUInstruction}
1313
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1314
\hline
1315
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1316
1.2 A & 25 & Integer vectors. Optional for more than one element \\ \hline
1317
\end{tabular}
1318
\vv
1319
 
1320
Divide vector of double-size unsigned integers RS by unsigned integers RT. RS has element size 2$\cdot$OS. These are divided by the even-numbered elements of RT with size OS. The truncated results are stored in the even-numbered elements of RD. The remainders are stored in the odd-numbered elements of RD.
1321
(OS = operand size).
1322
\vv
1323
 
1324
 
1325
\subsubsection{max}
1326
\label{table:maxInstruction}
1327
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1328
\hline
1329
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1330
multi & 22 & all types \\ \hline
1331
\end{tabular}
1332
\vv
1333
 
1334
int32 r0 = max(r1, r2) \\
1335
float v0 = max(v1, v2)
1336
\vv
1337
 
1338
Get the maximum of two numbers:
1339
 
1340
max(src1,src2) = src1 \textgreater{} src2 ? src1 : src2
1341
\vv
1342
 
1343
Integer operands are treated as signed.
1344
\vv
1345
 
1346
The handling of floating point NAN operands follows the definition of the maximum function in the 2019 revision of the IEEE floating point standard 754, which guarantees the propagation of NANs, unlike the 1985 and 2008 versions of the standard.
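\vv

The difference from the plain formula above is that a NAN in either operand is propagated. A minimal C sketch, with a hypothetical function name, could look like this. The ordering of -0 and +0 and the choice of payload when both operands are NAN are not modelled here.

\begin{lstlisting}[frame=none, language=C]
/* Hypothetical model of floating point max with NAN propagation. */
double max_model(double a, double b)
{
    if (a != a) return a;   /* a is NAN: propagate it */
    if (b != b) return b;   /* b is NAN: propagate it */
    return a > b ? a : b;
}
\end{lstlisting}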
1347
\vv
1348
 
1349
 
1350
\subsubsection{max\_abs}
1351
\label{table:maxAbsInstruction}
1352
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1353
\hline
1354
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1355
multi & 23 & all floating point types \\ \hline
1356
\end{tabular}
1357
\vv
1358
 
1359
float v0 = max\_abs(v1, v2)
1360
\vv
1361
 
1362
Gives the maximum of the absolute values of two floating point numbers.
1363
\vv
1364
 
1365
max\_abs(src1, src2) = max(abs(src1), abs(src2))
1366
\vv
1367
 
1368
NAN values are treated in the same way as for the max instruction.
1369
\vv
1370
 
1371
 
1372
\subsubsection{max\_u}
1373
\label{table:maxUInstruction}
1374
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1375
\hline
1376
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1377
multi & 23 & all integer types \\ \hline
1378
\end{tabular}
1379
\vv
1380
 
1381
uint32 r0 = max\_u(r1, r2)
1382
\vv
1383
 
1384
Gives the maximum of two unsigned integers.
1385
\vv
1386
 
1387
max\_u(src1,src2) = src1 \textgreater{} src2 ? src1 : src2
1388
\vv
1389
 
1390
 
1391
\subsubsection{min}
1392
\label{table:minInstruction}
1393
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1394
\hline
1395
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1396
multi & 20 & all types \\ \hline
1397
\end{tabular}
1398
\vv
1399
 
1400
int32 r0 = min(r1, r2)\\
1401
float v0 = min(v1, v2)
1402
\vv
1403
 
1404
Get the minimum of two numbers:
1405
 
1406
min(src1,src2) = src1 \textless{} src2 ? src1 : src2
1407
\vv
1408
 
1409
Integer operands are treated as signed.
1410
\vv
1411
 
1412
The handling of floating point NAN operands follows the definition of the minimum function in the 2019 revision of the IEEE floating point standard 754, which guarantees the propagation of NANs, unlike the 1985 and 2008 versions of the standard.
1413
\vv
1414
 
1415
 
1416
\subsubsection{min\_abs}
1417
\label{table:minAbsInstruction}
1418
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1419
\hline
1420
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1421
multi & 21 & all floating point types \\ \hline
1422
\end{tabular}
1423
\vv
1424
 
1425
float v0 = min\_abs(v1, v2)
1426
\vv
1427
 
1428
Gives the minimum of the absolute values of two floating point numbers.
1429
\vv
1430
 
1431
min\_abs(src1, src2) = min(abs(src1), abs(src2))
1432
\vv
1433
 
1434
NAN values are treated in the same way as for the min instruction.
1435
\vv
1436
 
1437
 
1438
\subsubsection{min\_u}
1439
\label{table:minUInstruction}
1440
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1441
\hline
1442
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1443
multi & 21 & all integer types \\ \hline
1444
\end{tabular}
1445
\vv
1446
 
1447
uint32 r0 = min\_u(r1, r2)
1448
\vv
1449
 
1450
Gives the minimum of two unsigned integers.
1451
\vv
1452
 
1453
min\_u(src1,src2) = src1 \textless{} src2 ? src1 : src2
1454
\vv
1455
 
1456
 
1457
\subsubsection{mul}
1458
\label{table:mulInstruction}
1459
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1460
\hline
1461
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1462
multi & 11 & all standard types \\ \hline
1463
multi & 46 & float16. Optional \\ \hline
1464
1.1 C &  8 & general purpose register and 16-bit sign-extended integer constant \\ \hline
1465
1.4 C & 36 & single precision floating point vector and broadcast half-precision floating point constant. Optional \\ \hline
1466
1.4 C & 37 & double precision floating point vector and broadcast half-precision floating point constant. Optional \\ \hline
1467
1.4 C & 41 & half precision floating point vector and broadcast half-precision floating point constant. Optional \\ \hline
1468
\end{tabular}
1469
\vv
1470
 
1471
int32 r0 = r1 * r2 \\
1472
float v0 *= 5.0
1473
\vv
1474
 
1475
Multiplication.
1476
\vv
1477
 
1478
The same instruction can be used for signed and unsigned integers.
1479
\vv
1480
 
1481
 
1482
\subsubsection{mul\_add, mul\_add2}
1483
\label{table:mulAddInstruction}
1484
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1485
\hline
1486
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1487
multi & 49 & mul\_add: dest = $\pm$ src1 $\cdot$ src2 $\pm$ src3. All types. Optional \\
1488
multi & 50 & mul\_add2: dest = $\pm$ src1 $\cdot$ src3 $\pm$ src2. All types. Optional \\
1489
multi & 48 & mul\_add. float16. Optional \\ \hline
1490
\hline
1491
\end{tabular}
1492
\vv
1493
 
1494
Fused multiply and add.
1495
\vv
1496
 
1497
The fused multiply-and-add instruction can often improve the performance of floating point code significantly. The intermediate product is calculated with extended precision according to the IEEE 754-2008 standard.
1498
\vv
1499
 
1500
The signs of the operands can be inverted as indicated by the following option bits:
1501
 
1502
\begin{longtable} {|p{20mm}|p{75mm}|}
1503
\caption{Control bits for mul\_add and mul\_add2}
1504
\label{table:ControlBitsForMulAdd} \\
1505
\endfirsthead
1506
\endhead
1507
\hline
1508
\bfseries Option bits &  \bfseries Meaning   \\
1509
\hline
1510
bit 0 & change sign of product in even-numbered vector elements \\
1511
bit 1 & change sign of product in odd-numbered vector elements \\
1512
bit 2 & change sign of addend in even-numbered vector elements \\
1513
bit 3 & change sign of addend in odd-numbered vector elements \\
1514
\hline
1515
\end{longtable}
1516
 
1517
\vv
1518
These option bits make it possible to do multiply-and-add, multiply-and-subtract, multiply-and-reverse-subtract, etc. It can also do multiply with alternating add and subtract, which is useful in calculations with complex numbers.
1519
There is no sign change if there are no option bits.
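
The effect of the option bits in table \ref{table:ControlBitsForMulAdd} on each vector element can be sketched in C as follows. The function name and the array-based interface are assumptions for the example. The product is not fused in this sketch, so the rounding may differ from the real instruction.

\begin{lstlisting}[frame=none, language=C]
/* Hypothetical element-wise model of mul_add.
   opt holds option bits 0-3, n is the number of elements. */
void mul_add_model(double *dest, const double *src1,
                   const double *src2, const double *src3,
                   int n, unsigned opt)
{
    for (int i = 0; i < n; i++) {
        int odd = i & 1;
        double prod = src1[i] * src2[i];
        double add  = src3[i];
        /* bit 0 (even elements) or bit 1 (odd elements) */
        if (opt & (odd ? 2u : 1u)) prod = -prod;
        /* bit 2 (even elements) or bit 3 (odd elements) */
        if (opt & (odd ? 8u : 4u)) add = -add;
        dest[i] = prod + add;
    }
}
\end{lstlisting}

In this model, options = 12 gives multiply-and-subtract in all elements, while options = 4 subtracts in the even-numbered elements and adds in the odd-numbered elements.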
1520
 
1521
\vv
1522
Support for integer operands is optional. Support for floating point operands is optional but desired.
1523
\vv
1524
 
1525
 
1526
\subsubsection{mul\_ex}
1527
\label{table:mulExInstruction}
1528
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1529
\hline
1530
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1531
1.2 A & 26 & integer vectors \\ \hline
1532
\end{tabular}
1533
\vv
1534
 
1535
int32 v0 = mul\_ex(v1, v2)
1536
\vv
1537
 
1538
Extended multiply, signed.
1539
\vv
1540
 
1541
Multiply even-numbered signed integer vector elements to double size result. The result extends into the next odd-numbered vector element.
1542
\vv
1543
 
1544
 
1545
\subsubsection{mul\_ex\_u}
1546
\label{table:mulExUInstruction}
1547
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1548
\hline
1549
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1550
1.2 A & 27 & integer vectors \\ \hline
1551
\end{tabular}
1552
\vv
1553
 
1554
uint32 v0 = mul\_ex\_u(v1, v2)
1555
\vv
1556
 
1557
Extended multiply, unsigned.
1558
\vv
1559
 
1560
Multiply even-numbered unsigned integer vector elements to double size result. The result extends into the next odd-numbered vector element.
1561
\vv
1562
 
1563
 
1564
\subsubsection{mul\_hi}
1565
\label{table:mulHiInstruction}
1566
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1567
\hline
1568
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1569
multi & 12 & integer vectors \\ \hline
1570
\end{tabular}
1571
\vv
1572
 
1573
int32 r0 = mul\_hi(r1, r2) \\
1574
int32 v0 = mul\_hi(v1, 2)
1575
\vv
1576
 
1577
High part of signed integer product.
1578
\vv
1579
 
1580
dest = (src1 $\cdot$ src2) $>>$ OS
1581
 
1582
(Signed, OS = operand size in bits).
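\vv

For OS = 32 the formula can be illustrated in C as follows. The function names are hypothetical. The unsigned variant is included only for comparison with mul\_hi\_u. The conversion of the upper half back to a signed type assumes the usual two's complement behavior of the compiler.

\begin{lstlisting}[frame=none, language=C]
#include <stdint.h>

/* Hypothetical model of mul_hi for OS = 32 bits. */
int32_t mul_hi32_model(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;  /* full 64-bit product */
    return (int32_t)((uint64_t)p >> 32);  /* upper 32 bits */
}

/* Hypothetical model of mul_hi_u for OS = 32 bits. */
uint32_t mul_hi_u32_model(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
\end{lstlisting}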
1583
\vv
1584
 
1585
 
1586
\subsubsection{mul\_hi\_u}
1587
\label{table:mulHiUInstruction}
1588
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1589
\hline
1590
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1591
multi & 13 & integer vectors \\ \hline
1592
\end{tabular}
1593
\vv
1594
 
1595
uint32 r0 = mul\_hi\_u(r1, r2)
1596
\vv
1597
 
1598
High part of unsigned integer product.
1599
\vv
1600
 
1601
dest = (src1 $\cdot$ src2) $>>$ OS
1602
 
1603
(Unsigned, OS = operand size in bits).
1604
\vv
1605
 
1606
 
1607
\subsubsection{mul\_2pow}
1608
\label{table:mul2PosInstruction}
1609
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1610
\hline
1611
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1612
multi & 32 & all floating point types \\ \hline
1613
\end{tabular}
1614
\vv
1615
 
1616
Multiply by power of 2.
1617
 
1618
dest = src1 * $2^{src2}$
1619
 
1620
src1 and dest are floating point vectors, while src2 is interpreted as a signed integer vector with the same element size as src1 and dest.
1621
\vv
1622
 
1623
Overflow will produce infinity. The result will be zero rather than a subnormal number in case of underflow, regardless of control bits in the mask or numeric control register.
1624
The reason for this is that
1625
speed has priority here. This instruction will typically take a single clock cycle, while floating point multiplication by a power of 2 takes multiple clock cycles.
1626
This is useful for fast multiplication or division by a power of 2.
1627
\vv
1628
 
1629
This instruction has the same op1 code as shift\_left, but applies to floating point types only.
1630
\vv
1631
 
1632
 
1633
\subsubsection{rem}
1634
\label{table:remInstruction}
1635
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1636
\hline
1637
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1638
multi & 18 & all types. Optional for vectors of more than one element \\ \hline
1639
\end{tabular}
1640
\vv
1641
 
1642
int32 r0 = r1 \% r2 \\
1643
float v0 = rem(v1, v2)
1644
\vv
1645
 
1646
Modulo.
1647
 
1648
\vv
1649
Signed with integer operands or floating point operands.
1650
 
1651
\vv
1652
A floating point number modulo zero gives NAN.
1653
An integer modulo zero gives zero.
1654
\vv
1655
 
1656
 
1657
\subsubsection{rem\_u}
1658
\label{table:remUInstruction}
1659
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1660
\hline
1661
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1662
multi & 19 & integers. Optional for vectors of more than one element \\ \hline
1663
\end{tabular}
1664
\vv
1665
 
1666
uint32 r0 = r1 \% r2
1667
\vv
1668
 
1669
Unsigned modulo or remainder.
1670
 
1671
\vv
1672
An integer modulo zero gives zero.
1673
\vv
1674
 
1675
 
1676
\subsubsection{round}
1677
\label{table:roundInstruction}
1678
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1679
\hline
1680
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1681
1.3 B & 14 & floating point vectors \\ \hline
1682
\end{tabular}
1683
\vv
1684
 
1685
float v0 = round(v1, 0)
1686
\vv
1687
 
1688
Round floating point number to integer in floating point representation.
1689
\vv
1690
 
1691
The rounding mode is specified in bits 0-1 of IM1. See table \ref{table:maskBits} on page \pageref{table:maskBits}.
1692
\vv
1693
 
1694
 
1695
\subsubsection{roundp2}
1696
\label{table:roundP2Instruction}
1697
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1698
\hline
1699
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1700
1.8 B &  3 & g.p. registers \\ \hline
1701
\end{tabular}
1702
\vv
1703
 
1704
int64 r0 = roundp2(r1, 1)
1705
\vv
1706
 
1707
Round unsigned integer up or down to the nearest power of 2.
1708
\vv
1709
 
1710
Options:
1711
 
1712
\label{table:roundp2Options}
1713
\begin{tabular}{|p{16mm}|p{122mm}|}
1714
\hline
1715
\bfseries IM1 bits & \bfseries meaning \\ \hline
1716
bit 0 & 0: Round down to power of 2:\newline
dest = 1 \textless\textless{} bitscan\_reverse(src1).\newline
        1: Round up to power of 2:\newline
dest = ((src1 \& (src1-1)) == 0) ? src1 : 1 \textless\textless{}  (bitscan\_reverse(src1) + 1)
1720
\\ \hline
1721
bit 4 & 0: returns 0 if the input is 0.\newline
1722
        1: returns -1 if the input is 0.\\ \hline
1723
bit 5 & 0: returns 0 if the result overflows.\newline
1724
        1: returns -1 if the result overflows.\\ \hline
1725
\end{tabular}
1726
\vv
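The option bits in the table above can be sketched in C for a 64-bit operand. The function names are hypothetical, and the sketch assumes that overflow can only occur when rounding up a value above $2^{63}$, where the result $2^{64}$ does not fit in the register.

```c
#include <stdint.h>

/* Index of the highest set bit; x must be nonzero. */
int bitscan_reverse_u64(uint64_t x) {
    int index = 0;
    while (x >>= 1) index++;
    return index;
}

/* Hypothetical model of roundp2 on a 64-bit operand.
   im1 bit 0: 0 = round down, 1 = round up.
   im1 bit 4: result for zero input is 0 or -1.
   im1 bit 5: result on overflow is 0 or -1. */
uint64_t roundp2_u64(uint64_t src1, unsigned im1) {
    if (src1 == 0) return (im1 & 0x10) ? UINT64_MAX : 0;
    int top = bitscan_reverse_u64(src1);
    if ((im1 & 1) == 0 || (src1 & (src1 - 1)) == 0) {
        return (uint64_t)1 << top;  /* round down, or already a power of 2 */
    }
    if (top == 63) {                /* assumed overflow case: 2^64 does not fit */
        return (im1 & 0x20) ? UINT64_MAX : 0;
    }
    return (uint64_t)1 << (top + 1);
}
```

For instance, roundp2\_u64(5, 0) gives 4 and roundp2\_u64(5, 1) gives 8, while a power of 2 such as 8 is returned unchanged in both directions.
\vv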
1727
 
1728
 
1729
\subsubsection{round2n}
1730
\label{table:round2nInstruction}
1731
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1732
\hline
1733
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1734
1.3 B & 15 & vector registers. Optional \\ \hline
1735
\end{tabular}
1736
\vv
1737
 
1738
float v0 = round2n(v1, -4)
1739
\vv
1740
 
1741
Round to nearest multiple of $2^n$.
1742
 
1743
dest = $2^n\cdot$ round($2^{-n}\cdot$ src1)
1744
 
1745
n is a signed integer constant in IM1.
1746
\vv
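As a worked illustration of the formula, take $n = -4$ as in the example above and assume, only for the sake of this illustration, that src1 = 3.14159 and that round() rounds to the nearest integer:

dest = $2^{-4}\cdot$ round($2^{4}\cdot 3.14159$) = $2^{-4}\cdot$ round(50.26544) = $2^{-4}\cdot 50$ = 3.125
\vv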
1747
 
1748
 
1749
\subsubsection{sqrt}
1750
\label{table:sqrtInstruction}
1751
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1752
\hline
1753
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1754
1.2 A & 28 & floating point vectors. Optional \\ \hline
1755
\end{tabular}
1756
\vv
1757
 
1758
Square root.
1759
\vv
1760
 
1761
 
1762
\subsubsection{sub}
1763
\label{table:subInstruction}
1764
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1765
\hline
1766
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1767
multi &  9 & all standard types \\ \hline
1768
multi & 45 & float16. Optional \\ \hline
1769
2.9   &  3 & g.p. register and 32-bit zero-extended constant \\ \hline
1770
\end{tabular}
1771
\vv
1772
 
1773
int32 r0 = r1 - r2 \\
1774
int32 r0 = r1 - 2 \\
1775
int32+ r0 -= 4 \\
1776
int32+ r0-{-} \\
1777
float v0 = v1 - [r2 + 8, length = r5]
1778
\vv
1779
 
1780
Subtraction.
1781
\vv
1782
 
1783
 
1784
\subsubsection{sub\_rev}
1785
\label{table:subRevInstruction}
1786
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1787
\hline
1788
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1789
multi & 10 & all types \\ \hline
1790
%1.1 C &  3 & g.p. register and 16-bit sign-extended constant \\ \hline
1791
\end{tabular}
1792
\vv
1793
 
1794
int32 r0 = 1 - r2 \\
1795
int32 v0 = - v2 + v1 \\
1796
float v0 = -v1 + [r2 + 8, length = r5]
1797
\vv
1798
 
1799
Reverse subtraction.
1800
\vv
1801
 
1802
dest = src2 - src1.
1803
\vv
1804
 
1805
 
1806
\subsection{Arithmetic instructions with carry, overflow check, or saturation}
1807
These instructions do not generate traps on overflow because they provide alternative ways of handling overflow.
1808
\vv
1809
 
1810
\subsubsection{abs}
1811
See page \pageref{table:absInstruction}.
1812
\vv
1813
 
1814
\subsubsection{add\_c}
1815
\label{table:addCInstruction}
1816
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1817
\hline
1818
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1819
1.2 A & 42 & integer vectors with two elements. Optional \\ \hline
1820
\end{tabular}
1821
\vv
1822
 
1823
Addition with carry.
1824
\vv
1825
 
1826
The vector has two elements. The upper element of src1 is used as carry in. The upper element of dest is used as carry out. Only the lower element of src2 is used.
1827
\vv
1828
 
1829
Longer vectors are not supported. See page
1830
\pageref{highPrecisionArithmetic} for an alternative for longer vectors.
1831
\vv
1832
 
1833
\subsubsection{add\_oc}
1834
\label{table:addOcInstruction}
1835
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1836
\hline
1837
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1838
1.2 A & 38 & vector registers. Optional \\ \hline
1839
\end{tabular}
1840
\vv
1841
 
1842
Integer addition with overflow check.
1843
\vv
1844
 
1845
Instructions with overflow check use the even-numbered vector elements for the arithmetic operation. The odd-numbered vector element that follows each of them is used for overflow detection.
1846
\vv
1847
 
1848
Overflow conditions are indicated with the following bits:
1849
\vv
1850
 
1851
bit 0. Unsigned integer overflow (carry or borrow).
1852
 
1853
bit 1. Signed integer overflow.
1854
\vv
1855
 
1856
The values are propagated so that the overflow result of the operation is OR'ed with the corresponding values of both input operands.
1857
\vv
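The flag propagation described above can be sketched in C for a single pair of 32-bit elements. The function name add\_oc32 and the packing of the two elements into one 64-bit value are assumptions made for this illustration only.

```c
#include <stdint.h>

/* Hypothetical model of add_oc for one pair of 32-bit elements.
   Each argument packs the even element (value) in bits 0-31 and the
   following odd element (overflow flags) in bits 32-63. */
uint64_t add_oc32(uint64_t a, uint64_t b) {
    uint32_t av = (uint32_t)a, bv = (uint32_t)b;
    uint32_t af = (uint32_t)(a >> 32), bf = (uint32_t)(b >> 32);
    uint32_t sum = av + bv;                    /* wraps modulo 2^32 */
    uint32_t carry = sum < av;                 /* bit 0: unsigned overflow */
    /* bit 1: signed overflow, i.e. equal operand signs but a
       different sign in the result */
    uint32_t sov = ((~(av ^ bv) & (av ^ sum)) >> 31) & 1u;
    uint32_t flags = carry | (sov << 1) | af | bf;  /* propagate input flags */
    return ((uint64_t)flags << 32) | sum;
}
```

Because the flags of both inputs are OR'ed into the result, a chain of such operations can be checked for overflow once at the end instead of after every step.
\vv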
1858
 
1859
\subsubsection{add\_ss}
1860
\label{table:addSsInstruction}
1861
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1862
\hline
1863
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1864
1.2 A & 32 & integer vectors. Optional \\ \hline
1865
\end{tabular}
1866
\vv
1867
 
1868
Add signed integers with saturation.
1869
 
1870
Overflow and underflow produce INT\_MAX and INT\_MIN, respectively.
1871
\vv
1872
 
1873
\subsubsection{add\_us}
1874
\label{table:addUsInstruction}
1875
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1876
\hline
1877
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1878
1.2 A & 33 & integer vectors. Optional \\ \hline
1879
\end{tabular}
1880
\vv
1881
 
1882
Add unsigned integers with saturation.
1883
 
1884
Overflow produces UINT\_MAX.
1885
\vv
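The saturation behavior of add\_ss and add\_us can be sketched in C for 32-bit elements. The function names are hypothetical; the signed version computes the exact sum in 64 bits and clamps it, which is one of several possible ways to detect overflow.

```c
#include <stdint.h>

/* Hypothetical model of add_ss for 32-bit elements. */
int32_t add_ss32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;   /* exact in 64 bits */
    if (sum > INT32_MAX) return INT32_MAX;   /* overflow saturates upwards */
    if (sum < INT32_MIN) return INT32_MIN;   /* underflow saturates downwards */
    return (int32_t)sum;
}

/* Hypothetical model of add_us for 32-bit elements. */
uint32_t add_us32(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;                    /* wraps on overflow */
    return (sum < a) ? UINT32_MAX : sum;     /* a wrapped sum is smaller than a */
}
```
\vv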
1886
 
1887
\subsubsection{compress\_ss}
1888
\label{table:compressSsInstruction}
1889
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1890
\hline
1891
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1892
1.2 A & 5 & integer vectors. Optional \\ \hline
1893
\end{tabular}
1894
\vv
1895
 
1896
Compress, signed with saturation.
1897
\vv
1898
 
1899
Same as compress (see page \pageref{table:compressInstruction}). Integers are treated as signed and compressed with saturation. Floating point operands cannot be used.
1900
Masks cannot be used and overflow traps cannot be enabled for this instruction.
1901
\vv
1902
 
1903
\subsubsection{compress\_us}
1904
\label{table:compressUsInstruction}
1905
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1906
\hline
1907
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1908
1.2 A & 6 & integer vectors. Optional \\ \hline
1909
\end{tabular}
1910
\vv
1911
 
1912
Compress, unsigned with saturation.
1913
\vv
1914
 
1915
Same as compress (see page \pageref{table:compressInstruction}). Integers are treated as unsigned and compressed with saturation. Floating point operands cannot be used.
1916
Masks cannot be used and overflow traps cannot be enabled for this instruction.
1917
\vv
1918
 
1919
 
1920
\subsubsection{div\_oc}
1921
\label{table:divOcInstruction}
1922
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1923
\hline
1924
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1925
1.2 A & 41 & vector registers. Optional \\ \hline
1926
\end{tabular}
1927
\vv
1928
 
1929
Divide signed integers with overflow check.
1930
 
1931
See add\_oc for options.
1932
\vv
1933
 
1934
\subsubsection{mul\_oc}
1935
\label{table:mulOcInstruction}
1936
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1937
\hline
1938
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1939
1.2 A & 40 & vector registers. Optional \\ \hline
1940
\end{tabular}
1941
\vv
1942
 
1943
Multiply integers with overflow check.
1944
 
1945
See add\_oc for options.
1946
\vv
1947
 
1948
\subsubsection{mul\_ss}
1949
\label{table:mulSsInstruction}
1950
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1951
\hline
1952
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1953
1.2 A & 36 & integer vectors. Optional \\ \hline
1954
\end{tabular}
1955
\vv
1956
 
1957
Multiply signed integers with saturation.
1958
 
1959
Overflow and underflow produce INT\_MAX and INT\_MIN, respectively.
1960
\vv
1961
 
1962
\subsubsection{mul\_us}
1963
\label{table:mulUsInstruction}
1964
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1965
\hline
1966
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1967
1.2 A & 37 & integer vectors. Optional \\ \hline
1968
\end{tabular}
1969
\vv
1970
 
1971
Multiply unsigned integers with saturation.
1972
 
1973
Overflow produces UINT\_MAX.
1974
\vv
1975
 
1976
\subsubsection{sub\_b}
1977
\label{table:subBInstruction}
1978
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1979
\hline
1980
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
1981
1.2 A & 43 & integer vectors with two elements. Optional \\ \hline
1982
\end{tabular}
1983
\vv
1984
 
1985
Subtraction with borrow.
1986
\vv
1987
 
1988
The vector has two elements. The upper element of src1 is used as borrow in. The upper element of dest is used as borrow out. Only the lower element of src2 is used.
1989
\vv
1990
 
1991
Longer vectors are not supported. See page
1992
\pageref{highPrecisionArithmetic} for an alternative for longer vectors.
1993
\vv
1994
 
1995
\subsubsection{sub\_oc}
1996
\label{table:subOcInstruction}
1997
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
1998
\hline
1999
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2000
1.2 A & 39 & vector registers. Optional \\ \hline
2001
\end{tabular}
2002
\vv
2003
 
2004
Subtract integers with overflow check.
2005
 
2006
See add\_oc for options.
2007
\vv
2008
 
2009
\subsubsection{sub\_ss}
2010
\label{table:subSsInstruction}
2011
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2012
\hline
2013
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2014
1.2 A & 34 & integer vectors. Optional \\ \hline
2015
\end{tabular}
2016
\vv
2017
 
2018
Subtract signed integers with saturation.
2019
 
2020
Overflow and underflow produce INT\_MAX and INT\_MIN, respectively.
2021
\vv
2022
 
2023
\subsubsection{sub\_us}
2024
\label{table:subUsInstruction}
2025
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2026
\hline
2027
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2028
1.2 A & 35 & integer vectors. Optional \\ \hline
2029
\end{tabular}
2030
\vv
2031
 
2032
Subtract unsigned integers with saturation.
2033
 
2034
Overflow and underflow produce UINT\_MAX and 0, respectively.
2035
\vv
2036
 
2037
\subsection{Logic and bit manipulation instructions}
2038
 
2039
\subsubsection{and}
2040
\label{table:andInstruction}
2041
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2042
\hline
2043
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2044
multi & 26 & all types \\ \hline
2045
1.1 C & 12 & 32-bit register and 8-bit signed constant shifted left by another constant \\ \hline
2046
1.1 C & 13 & 64-bit register and 8-bit signed constant shifted left by another constant \\ \hline
2047
2.9   &  5 & g.p. register and 32-bit constant shifted left by 32 \\ \hline
2048
1.4 C &  2 & vector of 16-bit integers, and broadcast 16-bit constant. Optional \\ \hline
2049
1.4 C & 12 & vector of 32-bit integers, and broadcast sign-extended 8-bit constant shifted left by another constant. Optional \\ \hline
2050
1.4 C & 13 & vector of 64-bit integers, and broadcast sign-extended 8-bit constant shifted left by another constant. Optional \\ \hline
2051
\end{tabular}
2052
\vv
2053
 
2054
int32 r0 = r1 \& r2 \\
2055
int32 v0 = v1 \& 2
2056
\vv
2057
 
2058
Bitwise boolean and.
2059
\vv
2060
 
2061
Floating point operands are treated as integers.
2062
 
2063
Do not use a floating point type with a constant operand unless you want the operand to be interpreted as floating point.
2064
\vv
2065
 
2066
\subsubsection{or}
2067
\label{table:orInstruction}
2068
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2069
\hline
2070
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2071
multi & 27 & all types \\ \hline
2072
1.1 C & 14 & 32-bit register and 8-bit signed constant shifted left by another constant \\ \hline
2073
1.1 C & 15 & 64-bit register and 8-bit signed constant shifted left by another constant \\ \hline
2074
2.9   &  6 & g.p. register and 32-bit constant shifted left by 32 \\ \hline
2075
1.4 C &  3 & vector of 16-bit integers, and broadcast 16-bit constant. Optional \\ \hline
2076
1.4 C & 14 & vector of 32-bit integers, and broadcast sign-extended 8-bit constant shifted left by another constant. Optional \\ \hline
2077
1.4 C & 15 & vector of 64-bit integers, and broadcast sign-extended 8-bit constant shifted left by another constant. Optional \\ \hline
2078
\end{tabular}
2079
\vv
2080
 
2081
int32 r0 = r1 $|$ r2 \\
2082
int32 v0 = v1 $|$ 2
2083
\vv
2084
 
2085
Bitwise boolean or.
2086
\vv
2087
 
2088
Floating point operands are treated as integers.
2089
 
2090
Do not use a floating point type with a constant operand unless you want the operand to be interpreted as floating point.
2091
\vv
2092
 
2093
\subsubsection{xor}
2094
\label{table:xorInstruction}
2095
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2096
\hline
2097
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2098
multi & 28 & all types \\ \hline
2099
1.1 C & 16 & 32-bit register and 8-bit signed constant shifted left by another constant \\ \hline
2100
1.1 C & 17 & 64-bit register and 8-bit signed constant shifted left by another constant \\ \hline
2101
2.9   &  7 & g.p. register and 32-bit constant shifted left by 32 \\ \hline
2102
1.4 C &  4 & vector of 16-bit integers, and broadcast 16-bit constant. Optional \\ \hline
2103
1.4 C & 16 & vector of 32-bit integers, and broadcast sign-extended 8-bit constant shifted left by another constant. Optional \\ \hline
2104
1.4 C & 17 & vector of 64-bit integers, and broadcast sign-extended 8-bit constant shifted left by another constant. Optional \\ \hline
2105
\end{tabular}
2106
\vv
2107
 
2108
int32 r0 = r1 \^{} r2 \\
2109
int32 v0 = v1 \^{} 2
2110
\vv
2111
 
2112
Bitwise boolean exclusive or.
2113
\vv
2114
 
2115
Floating point operands are treated as integers.
2116
 
2117
Do not use a floating point type with a constant operand unless you want the operand to be interpreted as floating point.
2118
\vv
2119
 
2120
\subsubsection{bit\_reverse byte\_reverse}
2121
\label{table:bitReverseInstruction}
2122
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2123
\hline
2124
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2125
1.3 B & 20 & vectors \\ \hline
2126
\end{tabular}
2127
\vv
2128
 
2129
int32 v0 = byte\_reverse(v1, 0)\\
2130
int32 v0 = bit\_reverse(v1, 1)
2131
\vv
2132
 
2133
IM1 = 0: Reverse the order of bytes within each vector element. This is useful for converting big-endian file data.\\
2134
IM1 = 1: Reverse the order of bits in each element of a vector.
2135
\vv
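The two modes can be sketched in C for a single 32-bit element. The function names are hypothetical and the sketch models one element only, not a whole vector.

```c
#include <stdint.h>

/* IM1 = 0: reverse the order of the four bytes in a 32-bit element. */
uint32_t byte_reverse32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* IM1 = 1: reverse the order of the 32 bits in an element. */
uint32_t bit_reverse32(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);   /* move the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}
```

For example, byte\_reverse32 turns the big-endian word 0x12345678 into 0x78563412.
\vv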
2136
 
2137
 
2138
\subsubsection{bits2bool}
2139
\label{table:bits2boolInstruction}
2140
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2141
\hline
2142
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2143
1.2 A & 12 & integer vectors \\ \hline
2144
\end{tabular}
2145
\vv
2146
 
2147
int32 v0 = bits2bool(r1, v2)
2148
\vv
2149
 
2150
Expand contiguous bits in a vector register to a boolean vector with each bit of the source going into bit 0 of each element of the destination.
2151
The remaining bits of each element are copied from the first element of the mask or the numeric control register. The number of mask or NUMCONTR bits available is implementation dependent.
2152
\vv
2153
 
2154
The length in bytes of the result vector is specified by a general purpose register in RS.
2155
\vv
2156
 
2157
This instruction cannot have a fallback register.
2158
\vv
2159
 
2160
\subsubsection{bitscan}
2161
\label{table:bitscanInstruction}
2162
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2163
\hline
2164
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2165
1.8 B &  2 & general purpose registers \\ \hline
2166
1.3 B & 21 & integer vectors. Optional \\ \hline
2167
\end{tabular}
2168
\vv
2169
 
2170
int32 r0 = bitscan(r1, 0)\\
2171
int64 v0 = bitscan(v1, 1)
2172
\vv
2173
 
2174
Bit scan forward or reverse. Option bits are given in the second operand:
2175
\vv
2176
 
2177
\label{table:bitscanOptions}
2178
\begin{tabular}{|p{16mm}|p{122mm}|}
2179
\hline
2180
\bfseries IM1 bits & \bfseries meaning \\ \hline
2181
bit 0 & 0: forward scan. Find index to the lowest set bit.\newline
2182
        1: reverse scan. Find index to the highest set bit.\\
2183
\hline
2184
bit 4 & 0: returns  0 if the input is 0.\newline
2185
        1: returns -1 if the input is 0.\\ \hline
2186
\end{tabular}
2187
\vv
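The option bits in the table above can be sketched in C for a 32-bit operand. The function name bitscan32 is hypothetical, and the simple loops stand in for what a hardware implementation would do in a single step.

```c
#include <stdint.h>

/* Hypothetical model of bitscan on a 32-bit operand.
   im1 bit 0: 0 = forward (lowest set bit), 1 = reverse (highest set bit).
   im1 bit 4: result for zero input is 0 or -1. */
int32_t bitscan32(uint32_t src1, unsigned im1) {
    if (src1 == 0) return (im1 & 0x10) ? -1 : 0;
    int index;
    if (im1 & 1) {
        for (index = 31; !((src1 >> index) & 1u); index--) { }
    } else {
        for (index = 0; !((src1 >> index) & 1u); index++) { }
    }
    return index;
}
```

Note that with bit 4 cleared, an input of 0 and an input of 1 both give 0; setting bit 4 makes the zero case distinguishable.
\vv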
2188
 
2189
 
2190
\subsubsection{bool\_reduce}
2191
\label{table:boolReduceInstruction}
2192
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2193
\hline
2194
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2195
1.3 B & 26 & integer vectors \\ \hline
2196
\end{tabular}
2197
\vv
2198
 
2199
int32 v0 = bool\_reduce(v1)
2200
\vv
2201
 
2202
A boolean vector is reduced by combining bit 0 of all elements.
2203
 
2204
The output is a scalar integer where bit 0 is the AND combination of all the bits, and bit 1 is the OR combination of all the bits. The remaining bits are reserved for future use.
2205
\vv
2206
 
2207
This instruction cannot have a mask.
2208
\vv
2209
 
2210
 
2211
\subsubsection{bool2bits}
2212
\label{table:bool2bitsInstruction}
2213
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2214
\hline
2215
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2216
1.3 B & 25 & integer vectors \\ \hline
2217
\end{tabular}
2218
\vv
2219
 
2220
int64 v0 = bool2bits(v1)
2221
\vv
2222
 
2223
A boolean vector with n elements is packed into the lower n bits of RD, taking bit 0 of each element.
2224
The length of RD will be at least sufficient to contain n bits.
2225
\vv
2226
 
2227
This instruction cannot have a mask.
2228
\vv
2229
 
2230
 
2231
\subsubsection{category\_reduce}
2232
\label{table:categoryReduceInstruction}
2233
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2234
\hline
2235
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2236
1.3 B & 26 & floating point vectors \\ \hline
2237
\end{tabular}
2238
\vv
2239
 
2240
float v0 = category\_reduce(v1)
2241
\vv
2242
 
2243
A floating point vector is analyzed and each element is classified as belonging to one of the eight categories listed below. Each bit in the output indicates that at least one element in RT belongs to the corresponding category.
2244
\vv
2245
 
2246
\begin{tabular}{|p{24mm}|p{115mm}|}
2247
\hline
2248
\bfseries Bit number & \bfseries Category \\ \hline
2249
 
2250
0 & at least one element is NAN \\
1 & at least one element is zero \\
2251
2 & at least one element is negative subnormal \\
2252
3 & at least one element is positive subnormal \\
2253
4 & at least one element is negative normal \\
2254
5 & at least one element is positive normal \\
2255
6 & at least one element is negative infinity \\
2256
7 & at least one element is positive infinity \\
2257
\hline
2258
\end{tabular}
2259
\vv
2260
 
2261
This instruction cannot have a mask.
2262
\vv
2263
 
2264
 
2265
\subsubsection{clear\_bit}
2266
\label{table:clearBitInstruction}
2267
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2268
\hline
2269
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2270
multi & 36 & all types \\ \hline
2271
\end{tabular}
2272
\vv
2273
 
2274
Clear bit number src2 in src1.
2275
\vv
2276
 
2277
dest = src1 \& \~{}(1 $<<$ src2).
2278
 
2279
\vv
2280
Floating point operands are treated as integers.
2281
\vv
2282
 
2283
 
2284
\subsubsection{set\_bit}
2285
\label{table:setBitInstruction}
2286
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2287
\hline
2288
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2289
multi & 37 & all integer types \\ \hline
2290
\end{tabular}
2291
\vv
2292
 
2293
Set bit number src2 in src1 to one.
2294
\vv
2295
 
2296
dest = src1 $|$ (1 $<<$ src2)
2297
\vv
2298
 
2299
 
2300
\subsubsection{toggle\_bit}
2301
\label{table:toggleBitInstruction}
2302
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2303
\hline
2304
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2305
multi & 38 & all types \\ \hline
2306
\end{tabular}
2307
\vv
2308
 
2309
Change the value of bit number src2 in src1 to its opposite.
2310
\vv
2311
 
2312
dest = src1 \^{} (1 $<<$ src2)
2313
\vv
2314
 
2315
 
2316
\subsubsection{compare}
2317
See page \pageref{table:compareInstruction}.
2318
\vv
2319
 
2320
 
2321
\subsubsection{fp\_category}
2322
\label{table:fpCategoryInstruction}
2323
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2324
\hline
2325
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2326
1.3 B & 17 & floating point vectors \\ \hline
2327
\end{tabular}
2328
\vv
2329
 
2330
float v0 = fp\_category(v1, 1)
2331
\vv
2332
 
2333
The input is a floating point vector. The output is a boolean vector where bit 0 of each element indicates if the input RS belongs to any of the categories indicated by the bits in the immediate operand IM1. The remaining bits of the output are taken from the numeric control register. The number of NUMCONTR bits available is implementation dependent.
2334
Any floating point value will belong to one, and only one, of these categories.
2335
 
2336
\begin{longtable} {|p{20mm}|p{90mm}|}
2337
\caption{Meaning of bits in fp\_category}
2338
\label{table:fpCategoryInstructionBits} \\
2339
\endfirsthead
2340
\endhead
2341
\hline
2342
\bfseries Bit number & \bfseries Meaning  \\
2343
\hline
2344
 
2345
0 & NAN \\
1 & $\pm$ Zero \\
2346
2 & $-$ Subnormal \\
2347
3 & $+$ Subnormal \\
2348
4 & $-$ Normal \\
2349
5 & $+$ Normal \\
2350
6 & $-$ Infinite  \\
2351
7 & $+$ Infinite  \\
2352
\hline
2353
\end{longtable}
2354
\vv
2355
 
2356
 
2357
\subsubsection{make\_mask}
2358
\label{table:makeMaskInstruction}
2359
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2360
\hline
2361
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2362
2.6 &  2 & integer vectors \\ \hline
2363
\end{tabular}
2364
\vv
2365
 
2366
int32 v0 = make\_mask(v1, 2), mask=v3
2367
\vv
2368
 
2369
Make a mask from the bits of the 32-bit integer constant src2. Each bit of the constant goes into bit 0 of one element of the output. The remaining bits of each element are taken from a mask register, or from NUMCONTR if there is no mask. The number of mask or NUMCONTR bits available is implementation dependent.
2370
The length of the output is the same as the length of src1. If there are more than 32 elements in the vector then the bit pattern of src2 is repeated.
2371
\vv
2372
 
2373
\subsubsection{make\_sequence}
2374
\label{table:makeSequenceInstruction}
2375
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2376
\hline
2377
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2378
1.3 B &  4 & all vectors \\ \hline
2379
\end{tabular}
2380
\vv
2381
 
2382
int32 v0 = make\_sequence(r1, 2)
2383
\vv
2384
 
2385
Makes a vector of sequential numbers. The number of elements is indicated by a general purpose register.
2386
The first element is equal to the immediate operand IM1, the next element is IM1+1, etc. IM1 must be an integer in the range $-128$ to $127$.
2387
\vv
2388
 
2389
 
2390
\subsubsection{mask\_length}
2391
\label{table:maskLengthInstruction}
2392
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2393
\hline
2394
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2395
2.2.7 & 1.1 & integer vectors \\ \hline
2396
\end{tabular}
2397
\vv
2398
 
2399
int64 v0 = mask\_length(v1, r2, 0), options=2
2400
\vv
2401
 
2402
Make a boolean vector to mask the first n bytes of a vector, where n is the value of a general purpose register r2. \\
2403
The result vector will have the same length as the input vector v1. r2 indicates the length of the part that is enabled by the mask.
2404
\vv
2405
 
2406
The following option bits can be specified: \\
2407
bit 0 = 0: bit 0 will be 1 in the first n bytes in the output and 0 in the rest. \\
2408
bit 0 = 1: bit 0 will be 0 in the first n bytes in the output and 1 in the rest. \\
2409
bit 1 = 1: copy remaining bits from input vector v1 into each vector element. \\
2410
bit 2 = 1: copy remaining bits from the numeric control register. \\
2411
bit 4 = 1: broadcast remaining bits from a constant (IM2) into all 32-bit words of the result. \\
2412
\hspace{17mm} Bits 1-7 of IM2 go to bits 1-7 of the result. \\
\hspace{17mm} Bits 8-11 of IM2 go to bits 20-23 of the result. \\
\hspace{17mm} Bits 12-15 of IM2 go to bits 26-29 of the result. \\
2415
Output bits that are not set by any of these options will be zero.
2416
If multiple options are specified, the results will be OR'ed.
2417
 
2418
\vv
2419
This instruction can have a mask but not a fallback register. The fallback value is zero.
2420
\vv
2421
 
2422
 
2423
\subsubsection{move\_bits}
2424
\label{table:moveBitsInstruction}
2425
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2426
\hline
2427
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2428
2.0.7 & 0.1 & general purpose registers. Optional \\ \hline
2429
2.2.7 & 0.1 & integer vectors. Optional \\ \hline
2430
\end{tabular}
2431
\vv
2432
 
2433
int16 r0 = move\_bits(r1, r2, 3, 4, 5) \\
2434
int32 v0 = move\_bits(v1, v2, 3, 4, 5) \\
2435
\vv
2436
 
2437
Extract, insert, or move bit fields.
2438
\vv
2439
 
2440
Takes one or more contiguous bits from position src3 in the second source operand (src2) and inserts them at position src4 in the first source operand (src1). The remaining bits of src1 are unchanged. \\
The third source operand (src3) is the bit position in src2 to take bits from. \\
The fourth source operand (src4) is the bit position in src1 to insert the bits at. \\
2443
The fifth source operand (src5) is the number of bits to move. \\
2444
The first two source operands must be registers, the remaining operands must be constants.
2445
\vv
2446
 
2447
Definition:\\
2448
m = (1 $<<$ src5) - 1 \\
2449
b = src2 $>>$ src3 \\
2450
dest = (src1 \& \~{}(m $<<$ src4)) $|$ ((b \& m) $<<$ src4)
2451
\vv
2452
 
2453
Examples:\\
2454
int16 r1 = 0x1234\\
2455
int16 r2 = 0xABCD\\
2456
// extract 4 bits from r2, starting from position 8, and insert into position 0 of r1:\\
2457
int16 r0 = move\_bits(r1, r2, 8, 0, 4) // = 0x123B \\
2458
// insert 8 bits from position 0 of r2 into position 4 of r1:\\
2459
int16 r0 = move\_bits(r1, r2, 0, 4, 8) // = 0x1CD4 \\
2460
// move 4 bits from position 8 in r2 into the same position of r1:\\
2461
int16 r0 = move\_bits(r1, r2, 8, 8, 4) // = 0x1B34 \\
2462
\vv
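The definition above can be checked against the three examples with a short C sketch for 16-bit operands. The function name move\_bits16 and the parameter names from, to and count (standing for src3, src4 and src5) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of move_bits for 16-bit operands, following the
   definition above: from = src3, to = src4, count = src5. */
uint16_t move_bits16(uint16_t src1, uint16_t src2,
                     unsigned from, unsigned to, unsigned count) {
    uint32_t m = (1u << count) - 1u;       /* mask of count one-bits */
    uint32_t b = (uint32_t)src2 >> from;   /* source field moved down to bit 0 */
    return (uint16_t)((src1 & ~(m << to)) | ((b & m) << to));
}
```

With src1 = 0x1234 and src2 = 0xABCD this reproduces the values 0x123B, 0x1CD4 and 0x1B34 given in the examples.
\vv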
2463
 
2464
 
2465
\subsubsection{popcount}
2466
\label{table:popcountInstruction}
2467
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2468
\hline
2469
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2470
1.8 B &  4 & general purpose registers. Optional \\ \hline
2471
1.3 B & 22 & integer vectors. Optional \\ \hline
2472
\end{tabular}
2473
\vv
2474
 
2475
int32 r0 = popcount(r1) \\
2476
int32 v0 = popcount(v1)
2477
\vv
2478
 
2479
The popcount instruction counts the number of 1-bits in an integer. It can also be used for parity generation.
2480
\vv
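The relation between bit counting and parity can be sketched in C for a 32-bit operand. The function names are hypothetical, and the loop is only a reference model of what the instruction computes.

```c
#include <stdint.h>

/* Hypothetical model of popcount on a 32-bit operand. */
uint32_t popcount32(uint32_t x) {
    uint32_t count = 0;
    while (x != 0) {
        x &= x - 1;    /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Parity is the least significant bit of the population count:
   1 for an odd number of 1-bits, 0 for an even number. */
uint32_t parity32(uint32_t x) {
    return popcount32(x) & 1u;
}
```
\vv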
2481
 
2482
 
2483
\subsubsection{rotate}
2484
\label{table:rotateInstruction}
2485
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2486
\hline
2487
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2488
multi & 33 & all integer types \\ \hline
2489
\end{tabular}
2490
\vv
2491
 
2492
dest = rotate(src1, src2)
2493
\vv
2494
 
2495
Rotate the bits of src1 left if src2 is positive, or right if src2 is negative.
2496
\vv
2497
 
2498
 
2499
\subsubsection{shift\_left}
2500
\label{table:shiftLeftInstruction}
2501
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2502
\hline
2503
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2504
multi & 32 & all integer types \\ \hline
2505
\end{tabular}
2506
\vv
2507
 
2508
Shift integer left.
2509
 
2510
dest = src1 $<<$ src2
2511
\vv
2512
 
2513
The result is zero if src2 is outside the range 0 $\leq$ src2 $<$ number\_of\_bits.
2514
\vv
2515
 
2516
This instruction has the same op1 code as mul\_2pow, but applies to integer operand types only.
2517
\vv
2518
 
2519
 
2520
\subsubsection{shift\_right\_s}
2521
\label{table:shiftRightSInstruction}
2522
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2523
\hline
2524
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2525
multi & 34 & all integer types \\ \hline
2526
\end{tabular}
2527
\vv
2528
 
2529
Shift integer right with sign extension (arithmetic shift).
2530
\vv
2531
 
2532
int32 dest = src1 $>>$ src2
2533
\vv
2534
 
2535
The result is 0 or -1 if src2 is outside the range 0 $\leq$ src2 $<$ number\_of\_bits.
2536
 
2537
\vv
2538
 
2539
 
2540
\subsubsection{shift\_right\_u}
2541
\label{table:shiftRightUInstruction}
2542
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2543
\hline
2544
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2545
multi & 35 & all integer types \\ \hline
2546
\end{tabular}
2547
\vv
2548
 
2549
Shift integer right with zero extension (logical shift).
2550
\vv
2551
 
2552
uint32 dest = src1 $>>$ src2
2553
\vv
2554
 
2555
The result is zero if src2 is outside the range 0 $\leq$ src2 $<$ number\_of\_bits.
2556
\vv
2557
 
2558
 
2559
\subsubsection{funnel\_shift}
2560
\label{table:funnelShiftInstruction}
2561
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2562
\hline
2563
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2564
multi & 53 & all integer types \\ \hline
2565
\end{tabular}
2566
\vv
2567
 
2568
int64 r1 = funnel\_shift(r2, r3, r4) \\
2569
int64 v1 = funnel\_shift(v2, v3, r4) \\
2570
\vv
2571
 
2572
This instruction concatenates two bit fields and shifts this to the right. This is useful for dealing with unaligned bit fields or unaligned vectors.
2573
\vv
2574
 
2575
dest = (src1 $>>$ src3) $|$ (src2 $<<$ (operand\_size - src3))
2576
\vv
2577
 
2578
For general purpose registers: Operand 1 (low) and operand 2 (high), with n bits each, are concatenated into a bit field with 2n bits. This bit field is shifted right by the number of bits indicated by the third operand. The lower n bits of the result are returned. The result is zero if src3 is outside the range 0 $\leq$ src3 $<$ n.
2579
\vv
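The general purpose register case can be sketched in C for n = 64. The function name funnel\_shift64 is hypothetical. A shift count of zero is handled separately because shifting a 64-bit value by 64 bits is undefined in C.

```c
#include <stdint.h>

/* Hypothetical model of funnel_shift for 64-bit general purpose registers.
   src1 is the low half and src2 the high half of a 128-bit field. */
uint64_t funnel_shift64(uint64_t src1, uint64_t src2, uint64_t src3) {
    if (src3 >= 64) return 0;      /* shift count out of range */
    if (src3 == 0) return src1;    /* avoid an undefined 64-bit shift in C */
    return (src1 >> src3) | (src2 << (64 - src3));
}
```

Passing the same value as both src1 and src2 rotates it right by src3 bits in this scalar sketch.
\vv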
2580
 
2581
For vector registers: When the operands are vector registers, this instruction shifts whole vector elements rather than bits, and the shift count counts vector elements.
2582
Vector operand 1 (low) with n elements and vector operand 2 (high), with n elements or less, are concatenated into a larger vector with at most 2n elements. This concatenated vector is shifted down by the number of elements indicated by the third operand. The lower n elements of the result are returned. The result is zero if src3 is outside the range 0 $\leq$ src3 $<$ n.
2583
\vv
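The vector case can be sketched with Python lists standing in for vector registers. The function name is hypothetical, and padding a shorter high operand with zeros is an assumption of this sketch rather than something stated above.

```python
def funnel_shift_vector(low: list, high: list, count: int) -> list:
    """Model of funnel_shift on vectors; count is a number of elements."""
    n = len(low)
    if not 0 <= count < n:
        return [0] * n                      # out-of-range count gives zero
    # assumption: a high operand with fewer than n elements is zero-padded
    padded_high = list(high[:n]) + [0] * (n - len(high))
    combined = list(low) + padded_high      # low elements first, then high
    return combined[count:count + n]        # shift down and keep n elements
```

Using the same list for both inputs rotates the vector, so funnel\_shift\_vector([1, 2, 3, 4], [1, 2, 3, 4], 3) gives [4, 1, 2, 3].
\vv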
2584
 
2585
Some implementations may work slowly for high shift counts.
2586
\vv
2587
 
2588
This instruction will rotate a vector if both input vectors are the same.
2589
\vv
2590
 
2591
A funnel shift in the opposite direction can be made by swapping the first two operands and subtracting the shift count from the operand size.
2592
\vv
2593
 
2594
 
2595
\subsubsection{select\_bits}
2596
\label{table:selectBitsInstruction}
2597
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2598
\hline
2599
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2600
multi & 52 & all integer types \\ \hline
2601
\end{tabular}
2602
\vv
2603
 
2604
int32 r0 = select\_bits(r1, r2, r3)
2605
\vv
2606
 
2607
dest = src1 \& src3 \textbar{} src2 \& \~{}src3
2608
\vv
2609
 
2610
This instruction combines bits from the first two source operands, using the third source operand as selector.
2611
\vv
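A short Python sketch of the formula above may help. The function name and the bits parameter are illustrative assumptions. Each result bit comes from src1 where the selector bit is 1 and from src2 where it is 0.

```python
def select_bits(src1: int, src2: int, src3: int, bits: int = 32) -> int:
    """Take bits from src1 where src3 is 1 and from src2 where src3 is 0."""
    mask = (1 << bits) - 1
    return ((src1 & src3) | (src2 & ~src3)) & mask
```

For example, select\_bits(0xAAAAAAAA, 0x55555555, 0xFFFF0000) gives 0xAAAA5555.
\vv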
2612
 
2613
 
2614
\subsubsection{test\_bit}
2615
\label{table:testBitInstruction}
2616
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2617
\hline
2618
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2619
multi & 39 & all integer types \\ \hline
2620
\end{tabular}
2621
\vv
2622
 
2623
Test the value of bit number src2 in src1, and make it the least significant bit of the output, to use as a boolean. The result is zero if src2 is out of range.
2624
\vv
2625
 
2626
result = (src1 $>>$ src2) \& 1.
2627
\vv
2628
 
2629
The result is indicated in bit 0 of the destination register.
2630
The remaining bits of the output may be taken from a mask register or numeric control register. The number of mask or NUMCONTR bits available is implementation dependent.
2631
\vv
2632
 
2633
A fallback register can be used as an operand for an extra boolean operation, with or without a mask. Only bit 0 of the fallback register is used.
2634
The boolean operation is controlled by option bits 0-1.
2635
Option bit 2 inverts the result, bit 3 inverts the fallback, and bit 4 inverts the mask. These options are summarized in the following table, giving the value of bit 0 of the destination register.
2636
 
2637
\begin{longtable} {|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{60mm}|}
2638
\caption{Alternative use of mask and fallback register controlled by option bits}
2639
\label{table:AlternativeMaskUseForTestBit} \\
2640
\endfirsthead
2641
\endhead
2642
\hline
2643
\bfseries bit 4 & \bfseries bit 3 & \bfseries bit 2 & \bfseries bit 1 & \bfseries bit 0 & \bfseries Output \\
2644
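\hline
0 & 0 & 0 & 0 & 0 & mask ? result : fallback \\
0 & 0 & 1 & 0 & 0 & mask ? !result : fallback \\
0 & 1 & 0 & 0 & 0 & mask ? result : !fallback \\
0 & 1 & 1 & 0 & 0 & mask ? !result : !fallback \\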
2649
1 & 0 & 0 & 0 & 0 & !mask ? result : fallback \\
2650
1 & 0 & 1 & 0 & 0 & !mask ? !result : fallback \\
2651
1 & 1 & 0 & 0 & 0 & !mask ? result : !fallback \\
2652
1 & 1 & 1 & 0 & 0 & !mask ? !result : !fallback \\
2653
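\hline
0 & 0 & 0 & 0 & 1 & mask \& result \& fallback \\
0 & 0 & 1 & 0 & 1 & mask \& !result \& fallback \\
0 & 1 & 0 & 0 & 1 & mask \& result \& !fallback \\
0 & 1 & 1 & 0 & 1 & mask \& !result \& !fallback \\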
2658
1 & 0 & 0 & 0 & 1 & !mask \& result \& fallback \\
2659
1 & 0 & 1 & 0 & 1 & !mask \& !result \& fallback \\
2660
1 & 1 & 0 & 0 & 1 & !mask \& result \& !fallback \\
2661
1 & 1 & 1 & 0 & 1 & !mask \& !result \& !fallback \\
2662
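\hline
0 & 0 & 0 & 1 & 0 & mask \& (result $|$ fallback) \\
0 & 0 & 1 & 1 & 0 & mask \& (!result $|$ fallback) \\
0 & 1 & 0 & 1 & 0 & mask \& (result $|$ !fallback) \\
0 & 1 & 1 & 1 & 0 & mask \& (!result $|$ !fallback) \\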
2667
1 & 0 & 0 & 1 & 0 & !mask \& (result $|$ fallback) \\
2668
1 & 0 & 1 & 1 & 0 & !mask \& (!result $|$ fallback) \\
2669
1 & 1 & 0 & 1 & 0 & !mask \& (result $|$ !fallback) \\
2670
1 & 1 & 1 & 1 & 0 & !mask \& (!result $|$ !fallback) \\
2671
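\hline
0 & 0 & 0 & 1 & 1 & mask \& (result \^{} fallback) \\
0 & 0 & 1 & 1 & 1 & mask \& (!result \^{} fallback) \\
0 & 1 & 0 & 1 & 1 & mask \& (result \^{} !fallback) \\
0 & 1 & 1 & 1 & 1 & mask \& (!result \^{} !fallback) \\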
2676
1 & 0 & 0 & 1 & 1 & !mask \& (result \^{} fallback) \\
2677
1 & 0 & 1 & 1 & 1 & !mask \& (!result \^{} fallback) \\
2678
1 & 1 & 0 & 1 & 1 & !mask \& (result \^{} !fallback) \\
2679
1 & 1 & 1 & 1 & 1 & !mask \& (!result \^{} !fallback) \\
2680
\hline
2681
\end{longtable}
2682
\vv
2683
 
2684
The value of mask is 1 if there is no mask register.
2685
The remaining bits are copied from the mask register if option bit 5 is set, or from the numeric control register if there is no mask and bit 5 is set. The remaining bits are zero if option bit 5 is not set. The number of mask or NUMCONTR bits available is implementation dependent.
2686
\vv
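The effect of option bits 0-4 on bit 0 of the destination can be sketched in Python as follows. The function names model\_test\_bit and destination\_bit0 are hypothetical. The sketch models bit 0 only and leaves out the handling of the remaining bits by option bit 5.

```python
def model_test_bit(src1: int, src2: int, bits: int = 32) -> int:
    """Basic result of test_bit: the selected bit, or 0 if src2 is out of range."""
    if not 0 <= src2 < bits:
        return 0
    return (src1 >> src2) & 1


def destination_bit0(result: int, mask: int, fallback: int, options: int) -> int:
    """Bit 0 of the destination for option bits 0-4; all inputs are 0 or 1."""
    if options & 0b00100:
        result ^= 1                  # option bit 2 inverts the result
    if options & 0b01000:
        fallback ^= 1                # option bit 3 inverts the fallback
    if options & 0b10000:
        mask ^= 1                    # option bit 4 inverts the mask
    operation = options & 0b00011    # option bits 0-1 select the boolean operation
    if operation == 0:
        return result if mask else fallback
    if operation == 1:
        return mask & result & fallback
    if operation == 2:
        return mask & (result | fallback)
    return mask & (result ^ fallback)
```

When there is no mask register, mask is passed as 1, in line with the rule above.
\vv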
2687
 
2688
 
2689
\subsubsection{test\_bits\_and}
2690
\label{table:testBitsAndInstruction}
2691
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2692
\hline
2693
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2694
multi & 40 & all integer types \\ \hline
2695
\end{tabular}
2696
\vv
2697
 
2698
Test if the indicated bits are all 1.
2699
 
2700
result = ((src1 \& src2) == src2)
2701
\vv
2702
 
2703
The result is indicated in bit 0 of the destination register.
2704
The remaining bits of the output may be taken from a mask register or numeric control register.
2705
\vv
2706
 
2707
A fallback register can be used as an operand for an extra boolean operation, with or without a mask. Only bit 0 of the fallback register is used. These options are controlled by option bits 0-4 in the same way as for test\_bit, as indicated in table \ref{table:AlternativeMaskUseForTestBit}.
2708
\vv
2709
 
2710
The remaining bits are copied from the mask register if option bit 5 is set, or from the numeric control register if there is no mask and bit 5 is set. The remaining bits are zero if option bit 5 is not set. The number of mask or NUMCONTR bits available is implementation dependent.
2711
\vv
2712
 
2713
 
2714
\subsubsection{test\_bits\_or}
2715
\label{table:testBitsOrInstruction}
2716
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2717
\hline
2718
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2719
multi & 41 & all integer types \\ \hline
2720
\end{tabular}
2721
\vv
2722
 
2723
Test if at least one of the indicated bits is 1.
2724
 
2725
result = ((src1 \& src2) != 0)
2726
\vv
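The difference between test\_bits\_and and test\_bits\_or shows up when only some of the indicated bits are set. The following Python sketch models the two basic results. The function names are hypothetical, and the mask, fallback, and option handling is left out.

```python
def bits_and(src1: int, src2: int) -> int:
    """1 if every bit indicated by src2 is set in src1, else 0."""
    return int((src1 & src2) == src2)


def bits_or(src1: int, src2: int) -> int:
    """1 if at least one bit indicated by src2 is set in src1, else 0."""
    return int((src1 & src2) != 0)
```

With src1 = 0b1001 and src2 = 0b0011, only one of the two indicated bits is set, so bits\_and gives 0 and bits\_or gives 1. With src2 = 0, the formulas give 1 for bits\_and and 0 for bits\_or.
\vv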
2727
 
2728
The result is indicated in bit 0 of the destination register.
2729
The remaining bits of the output may be taken from a mask register or numeric control register.
2730
\vv
2731
 
2732
A fallback register can be used as an operand for an extra boolean operation, with or without a mask. Only bit 0 of the fallback register is used. These options are controlled by option bits 0-4 in the same way as for test\_bit, as indicated in table \ref{table:AlternativeMaskUseForTestBit}.
2733
\vv
2734
 
2735
The remaining bits are copied from the mask register if option bit 5 is set, or from the numeric control register if there is no mask and bit 5 is set. The remaining bits are zero if option bit 5 is not set. The number of mask or NUMCONTR bits available is implementation dependent.
2736
\vv
2737
 
2738
 
2739
\subsubsection{truth\_tab3}
2740
\label{table:truthTab3Instruction}
2741
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
2742
\hline
2743
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
2744
2.0.6 & 8.1 & general purpose registers. optional \\ \hline
2745
2.2.6 & 8.1 & integer vectors. optional \\ \hline
2746
\end{tabular}
2747
\vv
2748
 
2749
int32 r0 = truth\_tab3(r1, r2, r3, 0xF2), options=0 \\
2750
int32 v0 = truth\_tab3(v1, v2, v3, 0xF2), options=0
2751
\vv
2752
 
2753
This instruction can compute an arbitrary bitwise boolean function of three integer variables, expressed by an 8-bit truth table in an immediate constant. Each bit of the result is the boolean function of the corresponding bits of the three input registers. The boolean function is calculated for each bit position separately. Three bits from the three input registers are combined into a 3-bit index, where the bit from the first input register goes into the least significant bit and the bit from the last input register goes into the most significant bit. This index then selects one bit from the truth table to go into the result.
2754
\vv
2755
 
2756
For example, the boolean function F = A $\&$ $\sim$ B $|$ C has the truth table 0b11110010 or 0xF2.
2757
\vv
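The truth table constant in this example can be derived mechanically. The following Python sketch builds the table from a boolean function and applies it bit by bit, assuming options = 0 so that all bits are computed. The function names and the bits parameter are illustrative assumptions.

```python
def truth_table_for(function) -> int:
    """Build the 8-bit truth table of a boolean function of three bits."""
    table = 0
    for index in range(8):
        a = index & 1            # bit from the first input (least significant)
        b = (index >> 1) & 1     # bit from the second input
        c = (index >> 2) & 1     # bit from the third input (most significant)
        if function(a, b, c) & 1:
            table |= 1 << index
    return table


def truth_tab3(src1: int, src2: int, src3: int, table: int, bits: int = 32) -> int:
    """Apply the truth table to each bit position separately."""
    result = 0
    for position in range(bits):
        index = ((src1 >> position) & 1) \
            | (((src2 >> position) & 1) << 1) \
            | (((src3 >> position) & 1) << 2)
        result |= ((table >> index) & 1) << position
    return result
```

Calling truth\_table\_for with the function (a \& \~{}b) $|$ c gives 0xF2, which matches the constant used in the example.
\vv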
2758
 
2759
This can be used as a universal instruction for bitwise logic functions of up to three inputs. Functions of two inputs can be obtained by using the same register for two of the three input registers.
2760
\vv
2761
 
2762
This instruction can also be used for manipulating masks where only bit 0 contains the boolean result. The remaining bits are controlled by options according to the table below. This is useful when the result is used as a mask for floating point instructions:
2763
 
2764
\begin{longtable} {|p{20mm}|p{75mm}|}
2765
\caption{Options for truth\_tab3}
2766
\label{table:OptionsForTruthTab3} \\
2767
\endfirsthead
2768
\endhead
2769
\hline
2770
\bfseries Options & \bfseries Meaning   \\
2771
\hline
2772
 
2773
1 & bit 0 contains a boolean result. The remaining bits are zero \\ \hline
2774
2 & bit 0 contains a boolean result. The remaining bits are taken from a mask or numeric control register. The number of mask or NUMCONTR bits available is implementation dependent. \\ \hline
2775
\end{longtable}
2776
\vv
2777
 
2778
 
2779
\subsection{Combined arithmetic/logic and branch instructions with integer operands}
2780
\label{descriptionOfControlTransferInstructions}
2781
These instructions perform an arithmetic or logic operation and a conditional jump
depending on the result. Each instruction can be coded in a number of different formats,
described on page \pageref{table:jumpInstructionFormats}.
2784
\vv
2785
 
2786
The instructions are listed below in pairs, where the second instruction has the branch condition inverted.
2787
\vv
2788
 
2789
These instructions cannot have a mask.
2790
The destination operand, if any, should preferably be the same as the first source operand for optimal performance. The second source operand may be a register, a memory operand, or an immediate constant with no more than 32 bits.
2791
\vv
2792
 
2793
 
2794
\subsubsection{add/jump\_zero}
2795
\label{table:addJumpZeroInstruction}
2796
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
2797
\hline
2798
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2799
all & 16 & add/jump\_zero & integer \\ \hline
2800
all & 17 & add/jump\_nzero & integer\\ \hline
2801
\end{tabular}
2802
\vv
2803
 
2804
Add two integer operands and jump if the result is zero.
2805
 
2806
 
2807
\subsubsection{add/jump\_neg}
2808
\label{table:addJumpNegInstruction}
2809
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
2810
\hline
2811
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2812
all & 18 & add/jump\_neg & integer \\ \hline
2813
all & 19 & add/jump\_nneg & integer\\ \hline
2814
\end{tabular}
2815
\vv
2816
 
2817
Add two integer operands and jump if the signed result is negative.
2818
 
2819
The result will wrap around in the case of overflow and jump if the result has the sign bit set.
2820
 
2821
 
2822
\subsubsection{add/jump\_pos}
2823
\label{table:addJumpPosInstruction}
2824
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
2825
\hline
2826
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2827
all & 20 & add/jump\_pos & integer \\ \hline
2828
all & 21 & add/jump\_npos & integer\\ \hline
2829
\end{tabular}
2830
\vv
2831
 
2832
Add two integer operands and jump if the signed result is positive.
2833
 
2834
The result will wrap around in the case of overflow and jump if the result is not zero and does not have the sign bit set.
2835
 
2836
 
2837
\subsubsection{add/jump\_overflow}
2838
\label{table:addJumpOverflInstruction}
2839
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
2840
\hline
2841
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2842
all & 22 & add/jump\_overflow & integer \\ \hline
2843
all & 23 & add/jump\_noverflow & integer\\ \hline
2844
\end{tabular}
2845
\vv
2846
 
2847
Add two signed integer operands and jump if the result overflows.
2848
\vv
2849
 
2850
 
2851
\subsubsection{add/jump\_carry}
2852
\label{table:addJumpCarryInstruction}
2853
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
2854
\hline
2855
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2856
all & 24 & add/jump\_carry & integer \\ \hline
2857
all & 25 & add/jump\_ncarry & integer\\ \hline
2858
\end{tabular}
2859
\vv
2860
 
2861
Add two unsigned integer operands and jump if the operation produces a carry.
2862
\vv
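The conditions tested by the add/jump instructions can be sketched in Python as follows. The function name, the dictionary keys, and the default operand size of 32 bits are assumptions made for illustration. The variants with an n prefix jump on the inverted condition.

```python
def add_with_conditions(a: int, b: int, bits: int = 32):
    """Add two operands with wrap-around and report the jump conditions."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask                      # wrap-around result
    conditions = {
        "zero": result == 0,
        "neg": bool(result & sign),
        "pos": result != 0 and not (result & sign),
        # signed overflow: both operands have the same sign, the result differs
        "overflow": bool(~(a ^ b) & (a ^ result) & sign),
        # unsigned carry out of the most significant bit
        "carry": total > mask,
    }
    return result, conditions
```

Adding 1 to 0x7FFFFFFF sets overflow but not carry, while adding 1 to 0xFFFFFFFF sets carry and zero but not overflow.
\vv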
2863
 
2864
 
2865
\subsubsection{increment\_compare/jump\_above/below}
2866
\label{table:addJumpInstruction}
2867
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
2868
\hline
2869
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2870
all & 48 & increment\_compare/jump\_below & integer \\ \hline
2871
all & 49 & increment\_compare/jump\_aboveeq & integer \\ \hline
2872
all & 50 & increment\_compare/jump\_above & integer \\ \hline
2873
all & 51 & increment\_compare/jump\_beloweq & integer \\ \hline
2874
\end{tabular}
2875
\vv
2876
 
2877
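Add 1 to the first source operand and jump if the signed result is less than a certain limit. The other variants jump if the result is above or equal to, above, or below or equal to the limit. The result is saved in the destination operand. This is useful for implementing a simple ``for'' loop.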
2878
\vv
2879
 
2880
The result will wrap around from INT\_MAX to INT\_MIN in case of overflow.
2881
\vv
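The loop pattern can be sketched in Python as follows, with the instruction placed at the end of the loop body. The function names and the 32-bit default are assumptions of this sketch, which models the jump\_below variant only.

```python
def to_signed(value: int, bits: int = 32) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def increment_compare_jump_below(counter: int, limit: int, bits: int = 32):
    """Return the incremented counter and whether the jump is taken."""
    counter = to_signed(counter + 1, bits)     # wraps from INT_MAX to INT_MIN
    return counter, counter < to_signed(limit, bits)


def run_loop(start: int, limit: int) -> list:
    """Collect the counter values seen by the loop body."""
    visited = []
    i = start
    while True:
        visited.append(i)                      # loop body
        i, jump = increment_compare_jump_below(i, limit)
        if not jump:
            return visited
```

Because the test is at the end of the loop, the body runs at least once, so run\_loop(5, 5) visits the value 5 once.
\vv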
2882
 
2883
 
2884
\subsubsection{sub/jump\_zero}
2885
\label{table:subJumpZeroInstruction}
2886
\begin{tabular}{|p{20mm}|p{12mm}|p{56mm}|p{50mm}|}
2887
\hline
2888
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2889
Not 1.7 &  0 & sub/jump\_zero & integer \\ \hline
2890
Not 1.7 &  1 & sub/jump\_nzero  & integer\\ \hline
2891
\end{tabular}
2892
\vv
2893
 
2894
Subtract two integer operands and jump if the result is zero.
2895
\vv
2896
 
2897
Immediate constants are not supported. The assembler will automatically convert a sub/jump\_zero instruction to an add/jump\_zero instruction with the negated constant.
2898
\vv
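The conversion works because subtracting a constant and adding its two's complement negation give the same wrapped result, so a condition that depends only on the result, such as zero, is unchanged. A small Python check of this identity, with hypothetical function names and an assumed operand size of 32 bits:

```python
def wrapped(value: int, bits: int = 32) -> int:
    """Reduce value to the low `bits` bits, as a register would store it."""
    return value & ((1 << bits) - 1)


def sub_result(x: int, c: int, bits: int = 32) -> int:
    """Result of x - c with wrap-around."""
    return wrapped(x - c, bits)


def add_negated_result(x: int, c: int, bits: int = 32) -> int:
    """Result of x + (-c), where -c is stored as a two's complement constant."""
    return wrapped(x + wrapped(-c, bits), bits)
```

Both functions return the same value for any x and c, for example 0 when x and c are equal.
\vv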
2899
 
2900
\subsubsection{sub/jump\_neg}
2901
\label{table:subJumpNegInstruction}
2902
\begin{tabular}{|p{20mm}|p{12mm}|p{56mm}|p{50mm}|}
2903
\hline
2904
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2905
Not 1.7 &  2 & sub/jump\_neg & integer \\ \hline
2906
Not 1.7 &  3 & sub/jump\_nneg & integer\\ \hline
2907
\end{tabular}
2908
\vv
2909
 
2910
Subtract two integer operands and jump if the signed result is negative.
2911
 
2912
The result will wrap around in the case of overflow and jump if the result has the sign bit set.
2913
\vv
2914
 
2915
Immediate constants are not supported. The assembler will automatically convert a sub/jump\_neg instruction to an add/jump\_neg instruction with the negated constant.
2916
\vv
2917
 
2918
\subsubsection{sub/jump\_pos}
2919
\label{table:subJumpPosInstruction}
2920
\begin{tabular}{|p{20mm}|p{12mm}|p{56mm}|p{50mm}|}
2921
\hline
2922
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2923
Not 1.7 &  4 & sub/jump\_pos & integer \\ \hline
2924
Not 1.7 &  5 & sub/jump\_npos & integer\\ \hline
2925
\end{tabular}
2926
\vv
2927
 
2928
Subtract two integer operands and jump if the signed result is positive.
2929
 
2930
The result will wrap around in the case of overflow and jump if the result is not zero and does not have the sign bit set.
2931
\vv
2932
 
2933
Immediate constants are not supported. The assembler will automatically convert a sub/jump\_pos instruction to an add/jump\_pos instruction with the negated constant.
2934
\vv
2935
 
2936
\subsubsection{sub/jump\_overflow}
2937
\label{table:subJumpOverflInstruction}
2938
\begin{tabular}{|p{20mm}|p{12mm}|p{56mm}|p{50mm}|}
2939
\hline
2940
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2941
Not 1.7 &  6 & sub/jump\_overflow & integer \\ \hline
2942
Not 1.7 &  7 & sub/jump\_noverflow & integer\\ \hline
2943
\end{tabular}
2944
\vv
2945
 
2946
Subtract two signed integer operands and jump if the result overflows.
2947
\vv
2948
 
2949
Immediate constants are not supported. The assembler will automatically convert a sub/jump\_overflow instruction to an add/jump\_overflow instruction with the negated constant.
2950
\vv
2951
 
2952
\subsubsection{sub/jump\_borrow}
2953
\label{table:subJumpBorrowInstruction}
2954
\begin{tabular}{|p{20mm}|p{12mm}|p{56mm}|p{50mm}|}
2955
\hline
2956
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2957
Not 1.7 &  8 & sub/jump\_borrow & integer \\ \hline
2958
Not 1.7 &  9 & sub/jump\_nborrow & integer\\ \hline
2959
\end{tabular}
2960
\vv
2961
 
2962
Subtract two unsigned integer operands and jump if the operation produces a borrow.
2963
\vv
2964
 
2965
Immediate constants are not supported. The assembler will automatically convert a sub/jump\_borrow instruction to an add/jump\_borrow instruction with the negated constant.
2966
\vv
2967
 
2968
\subsubsection{sub\_maxlen/jump\_pos}
2969
\label{table:subMaxlenJumpPosInstruction}
2970
\begin{tabular}{|p{24mm}|p{12mm}|p{52mm}|p{50mm}|}
2971
\hline
2972
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2973
1.7C, 2.5.1B, 2.5.4C & 52 & sub\_maxlen/jump\_pos & integer \\ \hline
2974
1.7C, 2.5.1B, 2.5.4C & 53 & sub\_maxlen/jump\_npos & integer \\ \hline
2975
\end{tabular}
2976
\vv
2977
 
2978
Subtract the maximum vector length (in bytes) from a general purpose register and jump if the result is positive.
2979
The immediate operand indicates the operand type for which the maximum vector length is obtained. The operand size for the source and destination register is 64 bits in C formats.
2980
\vv
2981
 
2982
This instruction makes it easy to implement the type of vector loop described on page \pageref{vectorLoops}.
2983
\vv
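The control flow of such a loop can be sketched in Python as follows. This is a rough model under two assumptions that are not stated in this section: the byte count is positive on entry, and each iteration processes the smaller of the remaining byte count and the maximum vector length. The function and parameter names are hypothetical.

```python
def vector_loop_chunks(total_bytes: int, max_vector_length: int) -> list:
    """Return the number of bytes handled in each loop iteration."""
    remaining = total_bytes
    chunks = []
    while True:
        # assumption: the vector length is limited to the remaining byte count
        chunks.append(min(remaining, max_vector_length))
        remaining -= max_vector_length         # the sub_maxlen part
        if not remaining > 0:                  # the jump_pos part
            return chunks
```

With 100 bytes and a maximum vector length of 32 bytes, the loop runs four times and the last iteration handles the remaining 4 bytes.
\vv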
2984
 
2985
 
2986
\subsubsection{and/jump\_zero}
2987
\label{table:andJumpZeroInstruction}
2988
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
2989
\hline
2990
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
2991
Not 1.7 & 10 & and/jump\_zero & all \\ \hline
2992
Not 1.7 & 11 & and/jump\_nzero & all \\ \hline
2993
\end{tabular}
2994
\vv
2995
 
2996
Bitwise and. Jump if zero.
2997
\vv
2998
 
2999
dest = src1 \& src2
3000
 
3001
jump if dest == 0
3002
\vv
3003
 
3004
All operands are treated as integers.
3005
Floating point operands are treated as unsigned integer scalars in vector registers.
3006
\vv
3007
 
3008
\subsubsection{or/jump\_zero}
3009
\label{table:orJumpZeroInstruction}
3010
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
3011
\hline
3012
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
3013
Not 1.7 & 12 & or/jump\_zero & all \\ \hline
3014
Not 1.7 & 13 & or/jump\_nzero & all \\ \hline
3015
\end{tabular}
3016
\vv
3017
 
3018
Bitwise or. Jump if zero.
3019
\vv
3020
 
3021
dest = src1 $|$ src2
3022
 
3023
jump if dest == 0
3024
\vv
3025
 
3026
All operands are treated as integers.
3027
Floating point operands are treated as unsigned integer scalars in vector registers.
3028
\vv
3029
 
3030
\subsubsection{xor/jump\_zero}
3031
\label{table:xorJumpZeroInstruction}
3032
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
3033
\hline
3034
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
3035
Not 1.7 & 14 & xor/jump\_zero & all \\ \hline
3036
Not 1.7 & 15 & xor/jump\_nzero & all \\ \hline
3037
\end{tabular}
3038
\vv
3039
 
3040
Bitwise exclusive or. Jump if zero.
3041
\vv
3042
 
3043
dest = src1 \^{ } src2
3044
 
3045
jump if dest == 0
3046
\vv
3047
 
3048
All operands are treated as integers.
3049
Floating point operands are treated as unsigned integer scalars in vector registers.
3050
 
3051
 
3052
\subsubsection{test\_bit/jump\_true}
3053
\label{table:testBitJumpTrueInstruction}
3054
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
3055
\hline
3056
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
3057
all & 26 & test\_bit/jump\_true & all \\ \hline
3058
all & 27 & test\_bit/jump\_false & all \\ \hline
3059
\end{tabular}
3060
\vv
3061
 
3062
int test\_bit(r1, 3), jump\_true target \\
3063
if (int r1 \& 8) \{jump target\}
3064
\vv
3065
 
3066
Test a single bit in the first source operand as indicated by an index in the second source operand and jump if the indicated bit is 1. There is no destination operand.
3067
\vv
3068
 
3069
jump if ((src1 $>>$ src2) \& 1) == 1
3070
\vv
3071
 
3072
All operands are treated as unsigned integers.
3073
Floating point operands are treated as integer scalars in vector registers.
3074
\vv
3075
 
3076
 
3077
\subsubsection{test\_bits\_and/jump\_true}
3078
\label{table:testBitsAndJumpInstruction}
3079
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
3080
\hline
3081
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
3082
all & 28 & test\_bits\_and/jump\_true & all \\ \hline
3083
all & 29 & test\_bits\_and/jump\_false & all \\ \hline
3084
\end{tabular}
3085
\vv
3086
 
3087
int test\_bits\_and(r1, 7), jump\_true target \\
3088
if (int (r1 \& 7) == 7) \{jump target\}
3089
\vv
3090
 
3091
Test the AND combination of the bits indicated by the second source operand. Jump if the indicated bits are all 1. There is no destination operand.
3092
\vv
3093
 
3094
jump if (src1 \& src2) == src2
3095
\vv
3096
 
3097
All operands are treated as unsigned integers.
3098
Floating point operands are treated as integer scalars in vector registers.
3099
\vv
3100
 
3101
 
3102
\subsubsection{test\_bits\_or/jump\_true}
3103
\label{table:testBitsOrJumpInstruction}
3104
\begin{tabular}{|p{16mm}|p{12mm}|p{60mm}|p{50mm}|}
3105
\hline
3106
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries operands \\ \hline
3107
all & 30 & test\_bits\_or/jump\_true & all \\ \hline
3108
all & 31 & test\_bits\_or/jump\_false & all \\ \hline
3109
\end{tabular}
3110
\vv
3111
 
3112
int test\_bits\_or(r1, 7), jump\_true target \\
3113
if (int r1 \& 7) \{jump target\}
3114
\vv
3115
 
3116
Test the OR combination of the bits indicated by the second source operand. Jump if at least one of the indicated bits is 1. There is no destination operand.
3117
\vv
3118
 
3119
jump if (src1 \& src2) != 0
3120
\vv
3121
 
3122
All operands are treated as unsigned integers.
3123
Floating point operands are treated as integer scalars in vector registers.
3124
\vv
3125
 
3126
 
3127
\subsubsection{Integer compare and branch instructions}
3128
int64 compare(r1, r2), jump\_equal target
3129
\vv
3130
 
3131
Compare instructions have no destination operand.
3132
Overflow cannot occur.
3133
\vv
3134
 
3135
\label{table:integerCompareJumpInstructions}
3136
\begin{tabular}{|p{12mm}|p{60mm}|p{50mm}|}
3137
\hline
3138
\bfseries opcode & \bfseries instruction & \bfseries jump condition \\ \hline
3139
32 & compare/jump\_equal & r1 = r2 \\ \hline
3140
33 & compare/jump\_nequal  & r1 $\neq$ r2 \\ \hline
3141
34 & compare/jump\_sbelow & r1 $<$ r2, signed \\ \hline
3142
35 & compare/jump\_saboveeq & r1 $\geq$ r2, signed \\ \hline
3143
36 & compare/jump\_sabove & r1 $>$ r2, signed  \\ \hline
3144
37 & compare/jump\_sbeloweq  & r1 $\leq$ r2, signed \\ \hline
3145
38 & compare/jump\_ubelow & r1 $<$ r2, unsigned \\ \hline
3146
39 & compare/jump\_uaboveeq  & r1 $\geq$ r2, unsigned \\ \hline
3147
40 & compare/jump\_uabove & r1 $>$ r2, unsigned \\ \hline
3148
41 & compare/jump\_ubeloweq  & r1 $\leq$ r2, unsigned \\ \hline
3149
\end{tabular}
3150
\vv
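The signed and unsigned conditions differ only in how the same register bits are interpreted. The following Python sketch compares two 64-bit register values both ways. The function names follow the instruction names for readability but are otherwise hypothetical.

```python
def to_signed(value: int, bits: int = 64) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def jump_sbelow(r1: int, r2: int, bits: int = 64) -> bool:
    """Condition of compare/jump_sbelow: r1 < r2 as signed integers."""
    return to_signed(r1, bits) < to_signed(r2, bits)


def jump_ubelow(r1: int, r2: int, bits: int = 64) -> bool:
    """Condition of compare/jump_ubelow: r1 < r2 as unsigned integers."""
    mask = (1 << bits) - 1
    return (r1 & mask) < (r2 & mask)
```

A register holding 0xFFFFFFFFFFFFFFFF is below 1 in the signed comparison, where it means -1, but above 1 in the unsigned comparison.
\vv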
3151
 
3152
 
3153
\subsection{Floating point branch instructions}
3154
The conditional jump instructions use general purpose registers for integer operands with at most 64 bits, and vector registers when a floating point type is specified. Only the first element of a floating point vector is used.
3155
\vv
3156
 
3157
Addition and subtraction instructions with conditional branching do not support floating point operands.
3158
\vv
3159
 
3160
 
3161
\subsubsection{Floating point compare and branch instructions}
3162
double compare(v1, v2), jump\_above target
3163
\vv
3164
 
3165
Compare instructions have no destination operand.
3166
Overflow cannot occur. \\
3167
0.0 and -0.0 are treated as equal.
3168
\vv
3169
 
3170
The unordered versions of the floating point compare instructions are true when any input operand is NAN. The versions without the \_uo suffix are false when any operand is NAN.
The unordered versions are needed because conditions are often inverted in the compilation process. For example, the inverse of compare/jump\_below is not compare/jump\_aboveeq but compare/jump\_aboveeq\_uo. This is a consequence of the rule that all comparisons except '!=' return false when the inputs are unordered, i.e. when at least one operand is NAN, according to the IEEE 754 standard for floating point arithmetic.
3172
\vspace{4mm}
3173
 
3174
\label{table:floatCompareJumpInstructions}
3175
\begin{tabular}{|p{12mm}|p{60mm}|p{40mm}|p{40mm}|}
3176
\hline
3177
\bfseries opcode & \bfseries instruction & \bfseries jump condition & \bfseries high level language \\ \hline
3178
32 & compare/jump\_equal & v1 = v2 & a == b \\ \hline
33 & compare/jump\_nequal  & v1 $\neq$ v2 &  \\ \hline
1 & compare/jump\_nequal\_uo  & v1 $\neq$ v2, or unordered & a != b \\ \hline
34 & compare/jump\_below & v1 $<$ v2 & a < b  \\ \hline
2 & compare/jump\_below\_uo & v1 $<$ v2, or unordered & !(a >= b)  \\ \hline
35 & compare/jump\_aboveeq & v1 $\geq$ v2 & a >= b  \\ \hline
3 & compare/jump\_aboveeq\_uo & v1 $\geq$ v2, or unordered & !(a < b)  \\ \hline
36 & compare/jump\_above & v1 $>$ v2 & a > b  \\ \hline
4 & compare/jump\_above\_uo & v1 $>$ v2, or unordered & !(a <= b)   \\ \hline
37 & compare/jump\_beloweq  & v1 $\leq$ v2 & a <= b  \\ \hline
5 & compare/jump\_beloweq\_uo  & v1 $\leq$ v2, or unordered & !(a > b)  \\ \hline
38 & compare/jump\_abs\_below & abs(v1) $<$ abs(v2) &   \\ \hline
6 & compare/jump\_abs\_below\_uo & abs(v1) $<$ abs(v2), or unordered &   \\ \hline
39 & compare/jump\_abs\_aboveeq & abs(v1) $\geq$ abs(v2) &   \\ \hline
7 & compare/jump\_abs\_aboveeq\_uo & abs(v1) $\geq$ abs(v2), or unordered &   \\ \hline
40 & compare/jump\_abs\_above & abs(v1) $>$ abs(v2) &    \\ \hline
8 & compare/jump\_abs\_above\_uo & abs(v1) $>$ abs(v2), or unordered &    \\ \hline
41 & compare/jump\_abs\_beloweq  & abs(v1) $\leq$ abs(v2) &   \\ \hline
9 & compare/jump\_abs\_beloweq\_uo  & abs(v1) $\leq$ abs(v2), or unordered &   \\ \hline
3199
24 & fp\_category/jump\_true & value belongs to one of the indicated categories &   \\ \hline
3200
25 &  fp\_category/jump\_false & value does not belong to any of the indicated categories &  \\ \hline
3201
\hline
3202
\end{tabular}
3203
\vv
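The need for the unordered variants can be checked with a short Python sketch, since Python floats follow the same comparison rule for NAN. The function names mirror the instruction names for readability but are otherwise hypothetical models of the jump conditions.

```python
import math


def unordered(a: float, b: float) -> bool:
    """True when at least one operand is NAN."""
    return math.isnan(a) or math.isnan(b)


def jump_below(a: float, b: float) -> bool:
    return a < b                         # false when unordered


def jump_aboveeq(a: float, b: float) -> bool:
    return a >= b                        # false when unordered


def jump_aboveeq_uo(a: float, b: float) -> bool:
    return unordered(a, b) or a >= b     # true when unordered


def jump_abs_below(a: float, b: float) -> bool:
    return abs(a) < abs(b)               # sign bits are ignored
```

For a NAN operand, jump\_below and jump\_aboveeq are both false, so only jump\_aboveeq\_uo is the exact inverse of jump\_below.
\vv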
3204
 
3205
The \_abs conditions ignore the sign bits and compare the absolute values of the two operands.
3206
\vv
3207
 
3208
The fp\_category/jump\_true instruction tests if the value of the first operand belongs to any of the categories indicated by the second source operand, which is an integer. The categories are indicated according to table \ref{table:fpCategoryInstructionBits} on page \pageref{table:fpCategoryInstructionBits}.
3209
\vv
3210
 
3211
 
3212
\subsection{Unconditional and indirect jump, call, and return instructions}
3213
Control transfer instructions are available in a number of different formats, described on
3214
page \pageref{table:jumpInstructionFormats}.
3215
 
3216
 
3217
\subsubsection{Direct jump}
3218
\label{table:jumpInstruction}
3219
\begin{tabular}{|p{14mm}|p{12mm}|p{110mm}|}
3220
\hline
3221
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3222
%1.7 C & 58 & jump with 16 bit relative address (not supported) \\ \hline
3223
1.7 D &  0 & jump with 24 bit relative address \\ \hline
3224
2.5.4 C & 58 & jump with 32 bit relative address \\ \hline
3225
3.1.1 B & 58 & jump with 64 bit absolute address (optional) \\ \hline
3226
\end{tabular}
3227
\vv
3228
 
3229
Unconditional jump.
3230
 
3231
 
3232
\subsubsection{Direct function call}
3233
\label{table:callInstruction}
3234
\begin{tabular}{|p{14mm}|p{12mm}|p{110mm}|}
3235
\hline
3236
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3237
%1.7 C & 59 & call with 16 bit relative address (not supported) \\ \hline
3238
1.7 D &  8 & call with 24 bit relative address \\ \hline
3239
2.5.4 C & 59 & call with 32 bit relative address \\ \hline
3240
3.1.1 B & 59 & call with 64 bit absolute address (optional) \\ \hline
3241
\end{tabular}
3242
\vv
3243
 
3244
Function call.
3245
\vv
3246
 
3247
The return address is stored on the call stack. The calling conventions are described in chapter \ref{chap:functionCallingConventions}.
3248
 
3249
 
3250
\subsubsection{Indirect jump}
3251
 
3252
\label{table:indirectJumpInstruction}
3253
\begin{tabular}{|p{14mm}|p{12mm}|p{110mm}|}
3254
\hline
3255
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3256
1.6 B & 58 & 64 bit absolute address in memory operand with 8 bit offset \\ \hline
3257
1.7 C & 60 & 64 bit absolute address in register \\ \hline
3258
1.6 A & 60 & Multi-way jump with table of relative addresses (see below) \\ \hline
3259
2.5.2 B & 58 & Absolute address in memory operand with 32 bit offset \\ \hline
3260
\end{tabular}
3261
\vv
3262
 
3263
 
3264
\subsubsection{Indirect call}
3265
\label{table:IndirectCallInstruction}
3266
\begin{tabular}{|p{14mm}|p{12mm}|p{110mm}|}
3267
\hline
3268
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3269
1.6 B & 59 & 64 bit absolute address in memory operand with 8 bit offset \\ \hline
3270
1.7 C & 61 & 64 bit absolute address in register \\ \hline
3271
1.6 A & 61 & Multi-way call with table of relative addresses (see below) \\ \hline
3272
2.5.2 B & 59 & Absolute address in memory operand with 32 bit offset \\ \hline
3273
\end{tabular}
3274
\vv
3275
 
3276
 
3277
\subsubsection{Relative and multi-way jump and call}
3278
\label{table:multiwayJumpCallInstructions}
3279
\begin{tabular}{|p{14mm}|p{12mm}|p{110mm}|}
3280
\hline
3281
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3282
1.6 A   & 60 & Jump with table of relative addresses. \linebreak Has reference point, base and scaled index  \\ \hline
3283
2.5.2 B & 60 & Jump with relative address. \linebreak Has reference point, base and offset  \\ \hline
3284
1.6 A   & 61 & Call with table of relative addresses. \linebreak Has reference point, base and scaled index    \\ \hline
3285
2.5.2 B & 61 & Call with relative address. \linebreak Has reference point, base and offset \\ \hline
3286
\end{tabular}
3287
\vv
3288
 
3289
\label{relativeJumpInstruction}
3290
The multi-way and relative jump and call instructions, jump\_relative and call\_relative, use pointers stored in memory relative to an arbitrary reference point.
3291
These instructions are intended to facilitate multi-way branches
3292
(switch/case statements), function tables in code interpreters, virtual function tables in object oriented languages with polymorphism, and general use of relative pointers. The relative pointers stored in memory use 8, 16, or 32 bits, depending on the distance to the reference point, while absolute pointers need 64 bits. This saves memory space and cache space.
3293
\vv
3294
 
3295
Relative pointers to jump or call addresses are stored in memory as signed offsets relative to an arbitrary reference point. The reference point may be the table address, the ip\_base, or any reference point defined by the programmer. The operand type specifies the size of the table entries.
3296
\vv
3297
 
3298
This instruction works as follows. Calculate the address of a table entry as the base pointer plus the offset (unscaled) or the index (RT) scaled by the operand size. Read a relative pointer from this address, sign-extend to 64 bits, and scale by 4. Then add the reference point (RD). Jump or call to the calculated address. The array index (RT) is scaled by the operand size, while the table entries are scaled by the instruction word size (4). The reference point must be aligned by 4.
3299
\vv
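The address calculation for the table-based form can be sketched in Python as follows. Memory is modelled as a bytes object, and the little-endian byte order of the table entries is an assumption of this sketch. The function and parameter names are hypothetical.

```python
def build_table(relative_offsets, entry_size: int) -> bytes:
    """Pack signed table entries of entry_size bytes (little-endian assumed)."""
    return b"".join(v.to_bytes(entry_size, "little", signed=True)
                    for v in relative_offsets)


def relative_target(memory: bytes, base: int, index: int,
                    entry_size: int, reference_point: int) -> int:
    """Compute the jump or call target from a table of relative pointers."""
    if reference_point % 4 != 0:
        raise ValueError("the reference point must be aligned by 4")
    entry_address = base + index * entry_size      # index scaled by operand size
    raw = memory[entry_address:entry_address + entry_size]
    relative = int.from_bytes(raw, "little", signed=True)   # sign-extended entry
    # the entry is scaled by the instruction word size and added to the reference
    return (reference_point + 4 * relative) & 0xFFFFFFFFFFFFFFFF
```

With 16-bit entries 0, 3, -2, 100 and a reference point of 0x1000, index 2 selects the entry -2 and gives the target 0x1000 - 8 = 0xFF8.
\vv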
3300
 
3301
In format 1.6A, the base pointer is in RS, the scaled index in RT, and the reference point in RD. In format 2.5.2B, the base pointer is in RS, the unscaled offset in IM2, and the reference point in RD.
3302
\vv
3303
 
3304
A table of pointers used by the table-based jump\_relative and call\_relative instructions is preferably placed in the constant data section (CONST). This makes it possible to use the table base as reference point. This also improves security by giving read-only access to the table.
3305
\vv
3306
 
3307
These instructions cannot have a mask and will not generate overflow traps in case of overflow in the address calculation, but you will get access violation traps when attempting to access an illegal memory address.
3308
\vv
3309
 
3310
 
3311
\subsubsection{return}
3312
\label{table:returnInstruction}
3313
\begin{tabular}{|p{14mm}|p{12mm}|p{110mm}|}
3314
\hline
3315
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3316
1.6 B & 62 & \\ \hline
3317
\end{tabular}
3318
\vv
3319
 
3320
Return from function call. The return address is taken from the call stack.
3321
\vv
3322
 
3323
Return instructions do not need a stack offset when the calling conventions specified in chapter \ref{chap:functionCallingConventions} are used.
3324
 
3325
 
3326
\subsubsection{breakpoint}
3327
\label{table:breakpointInstruction}
3328
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3329
\hline
3330
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3331
1.7 C & 63 & \\ \hline
3332
\end{tabular}
3333
\vv
3334
 
3335
This instruction is used as a debug breakpoint.
3336
\vv
3337
 
3338
It is the same as trap(1). The complete instruction code word is 0x7FE00001.
3339
\vv
3340
 
3341
\subsubsection{filler}
3342
\label{table:fillerInstruction}
3343
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3344
\hline
3345
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3346
1.7 C & 63 & \\ \hline
3347
\end{tabular}
3348
\vv
3349
 
3350
This instruction is used for filling unused code memory. It will generate a trap (interrupt) if executed.
3351
\vv
3352
 
3353
All fields are filled with ones. The complete instruction code word is 0x7FFFFFFF.
3354
\vv
3355
 
3356
 
3357
\subsubsection{System call, system return, and traps}
3358
See page \pageref{table:sysCallInstruction}.
3359
\vv
3360
 
3361
 
3362
\subsection{Miscellaneous instructions}
3363
 
3364
\subsubsection{address}
3365
\label{table:addressInstruction}
3366
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3367
\hline
3368
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3369
2.9 B & 32 & g.p. registers \\ \hline
3370
\end{tabular}
3371
\vv
3372
 
3373
int64 r1 = address [memory\_label]
3374
\vv
3375
 
3376
Calculate an address relative to a pointer by adding a 32-bit sign-extended constant to a special pointer register. The pointer register can be THREADP (28), DATAP (29), IP (30), or SP (31).
3377
\vv
3378
 
3379
 
3380
\subsubsection{compare\_swap}
3381
\label{table:compareSwapInstruction}
3382
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3383
\hline
3384
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3385
2.5 A & 18 & g. p. registers and memory operand with 32 bit offset. Optional \\ \hline
3386
\end{tabular}
3387
\vv
3388
 
3389
int32 r1 = compare\_swap(r1, r2, [r3+0x100])
3390
\vv
3391
 
3392
Atomic compare and swap instruction, used for thread synchronization and for lock-free data sharing between threads. src1 and src2 are register operands, src3 is a memory operand, which must be aligned to a natural address. All operands are treated as integers, regardless of the specified operand type. The operation is:
3393
 
3394
\begin{lstlisting}[frame=none]
3395
   temp = src3;
3396
   if (temp == src1) src3 = src2;
3397
   return temp;
3398
\end{lstlisting}
3399
 
3400
This instruction cannot have a mask.
3401
\vv
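As an illustration of how the operation above is typically used, the following C sketch models compare\_swap with the standard C11 atomics and builds a lock-free increment on top of it. The function names compare\_swap\_model and atomic\_increment are hypothetical, and the sketch assumes a 32-bit operand.

\begin{lstlisting}[frame=none]
#include <stdatomic.h>
#include <stdint.h>

/* Model of the operation above. Returns the old memory value and
   stores new_val only if the old value equals expected. */
int32_t compare_swap_model(int32_t expected, int32_t new_val,
                           _Atomic int32_t *mem)
{
    int32_t temp = expected;
    /* on failure, temp receives the current memory value */
    atomic_compare_exchange_strong(mem, &temp, new_val);
    return temp;                  /* old value in both cases */
}

/* Lock-free increment: retry until no other thread intervened.
   Returns the new counter value. */
int32_t atomic_increment(_Atomic int32_t *counter)
{
    int32_t old = atomic_load(counter);
    for (;;) {
        int32_t seen = compare_swap_model(old, old + 1, counter);
        if (seen == old) return old + 1;   /* swap succeeded */
        old = seen;                        /* try again with fresh value */
    }
}
\end{lstlisting}

The retry loop is the usual pattern: the swap succeeds only if the value in memory is still the one that was read before the new value was calculated.
\vv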
3402
 
3403
Further atomic instructions can be implemented if needed, preferably with the same format and consecutive values of OP1.
3404
\vv
3405
 
3406
 
3407
\subsubsection{nop}
3408
\label{table:nopInstruction}
3409
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3410
\hline
3411
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3412
multi &  0 & \\ \hline
3413
3.0   &  0 & \\ \hline
3414
\end{tabular}
3415
\vv
3416
 
3417
No operation. Used as a filler to replace removed code or to align code entries.
3418
\vv
3419
 
3420
Unused bits may be used for debugging information, etc.
3421
\vv
3422
 
3423
The processor is allowed to skip NOPs as fast as it can at an early stage in the pipeline. These NOPs cannot be used as timing delays, only as fillers.
3424
\vv
3425
 
3426
 
3427
\subsubsection{undef}
3428
\label{table:undefInstruction}
3429
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3430
\hline
3431
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3432
multi & 63 & \\ \hline
3433
\end{tabular}
3434
\vv
3435
 
3436
Undefined code. Guaranteed to generate a trap (interrupt) in all future implementations.
3437
\vv
3438
 
3439
\subsubsection{userdef}
3440
\label{table:userdefInstruction}
3441
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3442
\hline
3443
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3444
multi & 56-62 & any types \\ \hline
3445
\end{tabular}
3446
\vv
3447
 
3448
Reserved for user-defined instructions.
3449
\vv
3450
 
3451
 
3452
\subsection{System instructions}
3453
These instructions cannot have a mask.
3454
\vv
3455
 
3456
\subsubsection{input}
3457
 
3458
\label{table:inputInstruction}
3459
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3460
\hline
3461
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3462
1.8 B & 62 & general purpose registers \\ \hline
3463
1.2 A & 62 & vector registers \\ \hline
3464
\end{tabular}
3465
\vv
3466
 
3467
int32 r0 = input(r1, 4) \\
3468
int64 v0 = input(r1, r2)
3469
\vv
3470
 
3471
Read from input port into register RD. Privileged instruction.
3472
\vv
3473
 
3474
General purpose register input with immediate port address:\\
3475
The immediate operand contains a port address in the interval 0 - 254. Register RS is ignored.
3476
\vv
3477
 
3478
General purpose register input with port address in register:\\
3479
The immediate operand is 255. Register RS contains a 64 bit port address.
3480
\vv
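The choice between the two general purpose register forms can be sketched as follows. This is a hypothetical assembler-side helper, not part of the specification. The type port\_operand and the function encode\_port are invented names for illustration.

\begin{lstlisting}[frame=none]
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical description of how a port address is encoded */
typedef struct {
    uint8_t imm;      /* immediate operand: 0-254 = port, 255 = use RS */
    bool    uses_rs;  /* true if the port address must be placed in RS */
} port_operand;

/* Choose the immediate form when the port address fits in 0-254,
   otherwise the register form with the immediate operand set to 255 */
port_operand encode_port(uint64_t port)
{
    port_operand p;
    if (port <= 254) {
        p.imm = (uint8_t)port;
        p.uses_rs = false;
    } else {
        p.imm = 255;
        p.uses_rs = true;
    }
    return p;
}
\end{lstlisting}
\vv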
3481
 
3482
Vector register input with port address in register:\\
3483
RS = port address. RT = vector length in bytes. \\
3484
Vector input is not necessarily supported for all input ports.\\
3485
Masks are not necessarily supported.
3486
\vv
3487
 
3488
 
3489
\subsubsection{output}
3490
\label{table:outputInstruction}
3491
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3492
\hline
3493
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3494
1.8 B & 63 & general purpose registers \\ \hline
3495
1.2 A & 63 & vector registers \\ \hline
3496
\end{tabular}
3497
\vv
3498
 
3499
int32 output(r1, r2, 4)\\
3500
int64 output(v0, r1, r2)
3501
\vv
3502
 
3503
Write register value RD to output port. Privileged instruction.
3504
\vv
3505
 
3506
General purpose register output with immediate port address:\\
3507
The immediate operand contains a port address in the interval 0 - 254. Register RS is ignored.
3508
\vv
3509
 
3510
General purpose register output with port address in register:\\
3511
The immediate operand is 255. Register RS contains a 64 bit port address.
3512
\vv
3513
 
3514
Vector register output with port address in register:\\
3515
RS = port address. RT = vector length in bytes. \\
3516
Vector output is not necessarily supported for all output ports.\\
3517
Masks are not necessarily supported.
3518
\vv
3519
 
3520
 
3521
\subsubsection{read\_capabilities, write\_capabilities}
3522
\label{table:readCapabilitiesInstruction}
3523
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3524
\hline
3525
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3526
1.8 B & 34 & read\_capabilities(capabilities register, constant) \\ \hline
3527
1.8 B & 35 & write\_capabilities(g.p. register, constant) \\ \hline
3528
\end{tabular}
3529
\vv
3530
 
3531
Preliminary specification.
3532
\vv
3533
 
3534
Read or write processor capabilities register. These registers are used for indicating capabilities of the processor, such as support for optional instructions and limitations to vector lengths. These registers are initialized with their default values at program start.
3535
\vv
3536
 
3537
The immediate constant in IM1 may determine details of the operation.
3538
\vv
3539
 
3540
\begin{longtable} {|p{20mm}|p{90mm}|}
3541
\caption{List of capabilities registers}
3542
\label{table:capabilitiesRegisters} \\
3543
\endfirsthead
3544
\endhead
3545
\hline
3546
\bfseries Capabilities register number & \bfseries Meaning  \\
3547
\hline
3548
capab0 & Microprocessor model or brand ID  \\
3549
capab1 & Microprocessor version number  \\
3550
\hline
3551
capab2 & Disable error traps. Bit 0: unknown instructions, bit 1: wrong instruction operands, bit 2: array overflow, bit 3: memory read violation, bit 4: memory write violation, bit 5: misaligned memory access. \\
3552
\hline
3553
capab4 & Code cache size, level 1  \\
3554
capab5 & Data cache size, level 1  \\
3555
\hline
3556
capab8  &  Support for operand sizes in general purpose registers. Bit 0: int8, bit 1: int16, bit 2: int32, bit 3: int64 \\
3557
capab9  &  Support for operand sizes in vector registers. \linebreak
3558
Bit 0: int8, bit 1: int16, bit 2: int32, bit 3: int64, bit 4: int128, bit 5: float32, bit 6: float64, bit 7: float128, bit 8: float16.\\
3559
\hline
3560
 
3561
capab12  &  Maximum vector length for general instructions. \\
3562
capab13  &  Maximum vector length for permute instructions. \\
3563
capab14  &  Maximum block size for permute instructions. \\
3564
capab15  &  Maximum vector length for compress\_sparse and expand\_sparse. \\
3565
\hline
3566
 
3567
\hline
3568
\end{longtable}
3569
 
3570
Some capabilities registers can be modified for test purposes or to tell the software not to use a specific instruction.
3571
\vv
3572
 
3573
Setting bits in capab2 will suppress error traps. Instead, the errors will be counted in performance counter registers described on page \pageref{table:performanceCounters}. To test if a particular instruction is supported, set bit 0 in capab2, reset the performance counter, try to execute the instruction, and read the performance counter again.
3574
 
3575
\vv
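The bit assignments of capab2 can be written as masks, as in the following C sketch. The constant and function names are hypothetical. Only the bit positions are taken from table \ref{table:capabilitiesRegisters}. Reading and writing the register itself requires the read\_capabilities and write\_capabilities instructions and is not modeled here.

\begin{lstlisting}[frame=none]
#include <stdint.h>
#include <stdbool.h>

/* Bit masks for capab2. A set bit disables the corresponding trap. */
enum {
    CAPAB2_UNKNOWN_INSTRUCTION = 1u << 0,
    CAPAB2_WRONG_OPERANDS      = 1u << 1,
    CAPAB2_ARRAY_OVERFLOW      = 1u << 2,
    CAPAB2_READ_VIOLATION      = 1u << 3,
    CAPAB2_WRITE_VIOLATION     = 1u << 4,
    CAPAB2_MISALIGNED_ACCESS   = 1u << 5
};

/* Return the capab2 value with the selected error traps suppressed */
uint64_t suppress_traps(uint64_t capab2, uint64_t selected)
{
    return capab2 | selected;
}

/* True if the trap for the given error type is suppressed */
bool trap_suppressed(uint64_t capab2, uint64_t error_bit)
{
    return (capab2 & error_bit) != 0;
}
\end{lstlisting}
\vv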
3576
Changing the values of the maximum vector length has the following effects. If the maximum length is reduced below the physical capability then any attempt to make a longer vector will result in the reduced length. The behavior of vector registers that already had a longer length before the maximum length was reduced is implementation dependent. If the maximum vector length is set to a higher value than the physical capability then any attempt to make a vector longer than the physical capability will cause a trap to facilitate emulation, if the platform supports emulation.
3577
\vv
3578
 
3579
Capabilities registers 12-15 can be increased for the purpose of emulation. The values of capabilities registers 12-15 must be powers of 2.
3580
\vv
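The rules for the maximum vector length can be summarized in a small C model. This is a sketch under stated assumptions: the names are hypothetical, lengths are in bytes, and a request that exceeds both the configured maximum and the physical capability is assumed to be reduced to the configured maximum before the comparison with the physical capability.

\begin{lstlisting}[frame=none]
#include <stdint.h>
#include <stdbool.h>

/* Values written to capab12-15 must be powers of 2 */
bool is_power_of_2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

typedef enum { LEN_OK, LEN_TRAP } len_status;

/* requested    : vector length that the program attempts to make
   max_config   : maximum length set in the capabilities register
   max_physical : maximum length supported by the hardware
   *result      : resulting length, valid only if LEN_OK is returned */
len_status request_length(uint64_t requested, uint64_t max_config,
                          uint64_t max_physical, uint64_t *result)
{
    uint64_t len = requested;
    if (len > max_config) len = max_config;   /* reduced length */
    if (len > max_physical) return LEN_TRAP;  /* trap for emulation */
    *result = len;
    return LEN_OK;
}
\end{lstlisting}
\vv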
3581
 
3582
 
3583
\subsubsection{read\_memory\_map, write\_memory\_map}
3584
\label{table:readMemoryMapInstruction}
3585
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3586
\hline
3587
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3588
1.2 A & 60 & vector = read\_memory\_map(base, index) \\ \hline
3589
1.2 A & 61 & write\_memory\_map(vector, base, index) \\ \hline
3590
\end{tabular}
3591
\vv
3592
 
3593
Preliminary specification.
3594
\vv
3595
 
3596
int64 v0 = read\_memory\_map(r2, r3)
3597
\vv
3598
 
3599
Read memory map and save it to a vector register. Privileged instruction.\\
3600
RD = destination vector register, RT-RS = internal address.
3601
\vv
3602
 
3603
int64 write\_memory\_map(v1, r2, r3)
3604
\vv
3605
 
3606
Write a vector register to memory map. RD = vector register source. RT-RS = internal address. Privileged instruction.
3607
\vv
3608
 
3609
 
3610
\subsubsection{read\_call\_stack, write\_call\_stack}
3611
\label{table:readCallStackInstruction}
3612
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3613
\hline
3614
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3615
1.2 A & 58 & read\_call\_stack(r1, r2) \\
3616
\hline
3617
1.2 A & 59 & write\_call\_stack(v1, r2, r3) \\ \hline
3618
\end{tabular}
3619
\vv
3620
 
3621
Preliminary specification.
3622
\vv
3623
 
3624
int64 v0 = read\_call\_stack(r1, r2)
3625
\vv
3626
 
3627
Read the internal call stack into a vector register. This instruction is used for saving the internal call stack to system memory in case of overflow.
3628
Privileged instruction.
3629
\vv
3630
 
3631
RD = destination vector register, RT-RS = internal address.
3632
\vv
3633
 
3634
int64 write\_call\_stack(v1, r2, r3)
3635
\vv
3636
 
3637
Write a vector register to the internal call stack. This instruction is used for restoring the internal call stack.
3638
Privileged instruction.
3639
\vv
3640
 
3641
 
3642
\subsubsection{read\_perf}
3643
\label{table:readPerfInstruction}
3644
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3645
\hline
3646
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3647
1.8 B & 36 & performance counter register, constant \\ \hline
3648
\end{tabular}
3649
\vv
3650
 
3651
int64 r0 = read\_perf(perf1, 1)
3652
\vv
3653
 
3654
A number of internal registers are used for counting performance related events.
3655
This instruction reads performance counter registers and performance related information. Some performance counters may be implementation-specific.
3656
\vv
3657
 
3658
\begin{longtable} {|p{15mm}|p{15mm}|p{85mm}|}
3659
\caption{List of performance counter registers}
3660
\label{table:performanceCounters} \\
3661
\endfirsthead
3662
\endhead
3663
\hline
3664
\bfseries Performance counter & \bfseries Second operand & \bfseries Meaning  \\
3665
\hline
3666
perf0  & -1 & Reset all performance counters \\
3667
\hline
3668
perf1  & 1 & CPU clock cycles \\
3669
perf1  & 0 & Reset CPU clock cycles counter \\
3670
\hline
3671
perf2  & 1 & Number of instructions executed \\
3672
perf2  & 2 & Number of double size instructions \\
3673
perf2  & 3 & Number of triple size instructions \\
3674
perf2  & 4 & General purpose register instructions \\
3675
perf2  & 5 & G. p. register instructions with mask zero \\
3676
perf2  & 0 & Reset counters \\
3677
\hline
3678
perf3  & 1 & Vector instructions executed \\
3679
perf3  & 0 & Reset counter \\
3680
\hline
3681
perf4  & 1 & Vector registers in use. Returns one bit for each vector register \\
3682
\hline
3683
perf5  & 1 & Jumps, calls, and return instructions \\
3684
perf5  & 2 & Direct, unconditional jumps, calls, and returns \\
3685
perf5  & 3 & Indirect jumps and calls \\
3686
perf5  & 4 & Conditional jumps \\
3687
perf5  & 0 & Reset counters \\
3688
\hline
3689
perf16 & 1  & Unknown instructions attempted \\
3690
perf16 & 2  & Wrong operands for instruction \\
3691
perf16 & 3  & Array overflow  \\
3692
perf16 & 4  & Memory read violation \\
3693
perf16 & 5  & Memory write violation \\
3694
perf16 & 6  & Memory access misaligned \\
3695
perf16 & 62 & Code address where first error occurred  \\
3696
perf16 & 63 & Type of first error \\
3697
perf16 & 0  & Reset error counters \\
3698
\hline
3699
\end{longtable}
3700
\vv
3701
 
3702
The perf16 register is useful for detecting errors when error traps are disabled using the capabilities registers described on page \pageref{table:capabilitiesRegisters}.
3703
\vv
3704
 
3705
 
3706
\subsubsection{read\_perfs}
3707
\label{table:readPerfsInstruction}
3708
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3709
\hline
3710
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3711
1.8 B & 37 & performance counter register, constant \\ \hline
3712
\end{tabular}
3713
\vv
3714
 
3715
This is the same as the read\_perf instruction, but serializing. The pipeline is flushed before reading the counter so that no instruction can execute out of order with read\_perfs.
3716
\vv
3717
 
3718
 
3719
\subsubsection{read\_spec, write\_spec}
3720
\label{table:readSpecInstruction}
3721
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3722
\hline
3723
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3724
1.8 B & 32 & read\_spec(special register, constant)\\ \hline
3725
1.8 B & 33 & write\_spec(g.p. register, constant) \\ \hline
3726
\end{tabular}
3727
\vv
3728
 
3729
int64 r0 = read\_spec(spec1, 0) \\
3730
int64 r1 = read\_spec(datap) \\
3731
\vv
3732
 
3733
Read or write a special system register. The following special registers are currently defined. The size is 64 bits. These registers are initialized with their default values at program start.
3734
\vv
3735
 
3736
The immediate operand (IM1) is currently unused. This instruction cannot have a mask.
3737
\vv
3738
 
3739
\begin{longtable} {|p{25mm}|p{15mm}|p{80mm}|}
3740
\caption{List of special registers}
3741
\label{table:specialRegisters} \\
3742
\endfirsthead
3743
\endhead
3744
\hline
3745
\bfseries Special register name & \bfseries number & \bfseries Meaning  \\
3746
\hline
3747
numcontr & spec0  & Numeric control register \\
3748
threadp  & spec1  & Thread environment block pointer \\
3749
datap    & spec2  & Data section pointer \\
3750
\hline
3751
\end{longtable}
3752
 
3753
\vv
3754
 
3755
 
3756
\subsubsection{read\_spev}
3757
\label{table:readSpevInstruction}
3758
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3759
\hline
3760
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3761
1.2 A & 56 & special vector register, general purpose register \\ \hline
3762
\end{tabular}
3763
\vv
3764
 
3765
int64 v0 = read\_spev(spec0, r2)
3766
\vv
3767
 
3768
Read a special register (spec0 in the example above) into the destination vector register, with the length in bytes given by the general purpose register operand (r2 in the example).
3769
\vv
3770
 
3771
The following special registers are currently defined:
3772
 
3773
\begin{longtable} {|p{15mm}|p{100mm}|}
3774
\caption{Special registers that can be read into vectors}
3775
\label{table:specialVectorRegisters} \\
3776
\endfirsthead
3777
\endhead
3778
\hline
3779
\bfseries Special register number & \bfseries Meaning  \\
3780
\hline
3781
spec0 & Numeric control register (NUMCONTR). The value is broadcast into all elements of the destination register with the indicated operand size and length.  \\
3782
\hline
3783
spec48 & Name of processor. The output is a zero-terminated UTF-8 string containing the brand name and model name of the microprocessor. \\
3784
\hline
3785
\end{longtable}
3786
\vv
3787
 
3788
\subsubsection{read\_sys, write\_sys}
3789
\label{table:readSysInstruction}
3790
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3791
\hline
3792
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3793
1.8 B & 38 & read\_sys(system register, constant) \\ \hline
3794
1.8 B & 39 & write\_sys(g.p. register, constant) \\ \hline
3795
\end{tabular}
3796
\vv
3797
 
3798
Read or write system register. Details are not defined yet. These instructions are privileged.
3799
 
3800
\vv
3801
 
3802
 
3803
\subsubsection{sys\_call}
3804
\label{systemCallInstruction}
3805
System calls use ID numbers rather than addresses to identify system functions.
3806
The ID is the combination of a module ID identifying a particular system module or device driver and a function ID identifying a particular function within this module. The module ID and the function ID are both 16 or 32 bits, so that the combined system call ID is up to 64 bits.
3807
The sys\_call instruction has the following variants:
3808
 
3809
\begin{longtable}
3810
{|p{20mm}|p{20mm}|p{20mm}|p{30mm}|p{30mm}|}
3811
\caption{Variants of system call instruction}
3812
\label{table:sysCallInstruction}
3813
\endfirsthead
3814
\endhead
3815
\hline
3816
Format & Operand type & Register operands & Module ID & Function ID \\
3817
\hline
3818
1.6 A & 32 bit & 3 & RT bit 16-31 & RT bit 0-15 \\
3819
\hline
3820
1.6 A & 64 bit & 3 & RT bit 32-63 & RT bit 0-31 \\
3821
\hline
3822
2.5.7 C & 64 bit & 0  & IM3 bit 0-31 & IM1,IM2 bit 0-15 \\
3823
\hline
3824
3.1.2 B & 64 bit & 2  & IM3 bit 0-31 & IM2 bit 0-31 \\
3825
\hline
3826
\end{longtable}
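
For the register forms (format 1.6 A) in the table above, the value in RT can be composed as in the following C sketch. The function names syscall\_id32 and syscall\_id64 are hypothetical. The bit positions are taken from the table.

\begin{lstlisting}[frame=none]
#include <stdint.h>

/* 32-bit operand type: module ID in bits 16-31, function ID in bits 0-15 */
uint32_t syscall_id32(uint16_t module_id, uint16_t function_id)
{
    return ((uint32_t)module_id << 16) | function_id;
}

/* 64-bit operand type: module ID in bits 32-63, function ID in bits 0-31 */
uint64_t syscall_id64(uint32_t module_id, uint32_t function_id)
{
    return ((uint64_t)module_id << 32) | function_id;
}
\end{lstlisting}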
3827
 
3828
The sys\_call instruction can indicate a block of memory to be shared with the system function. The register specified in RD points to the memory block, and the register specified in RS holds its length. The caller must have access rights to this memory block. The system function gets the same access rights to the block as the calling thread has, i.e. read access and/or write access. This is useful for fast transfer of data between the caller and the system function. No other memory is accessible to both the caller and the called function. If the RD and RS fields are both r0 then no memory block is shared. If RD and RS are both SP then all the application's data memory is shared. The sys\_call instruction in format 2.5.7 C has no register operands and no shared memory block. System calls cannot have a mask.
3829
\vv
3830
 
3831
Parameters for system functions are transferred in registers, following the same calling conventions as normal functions. The registers used for function parameters are usually different from the registers in the RD, RS and RT fields. Function parameters that do not fit into registers must reside in the shared memory block.
3832
 
3833
 
3834
\subsubsection{sys\_return}
3835
\label{table:sysReturnInstruction}
3836
\begin{tabular}{|p{12mm}|p{12mm}|p{110mm}|}
3837
\hline
3838
\bfseries format & \bfseries opcode & \bfseries operands \\ \hline
3839
1.7 C & 62 & \\ \hline
3840
\end{tabular}
3841
\vv
3842
 
3843
Return from system call.
3844
 
3845
\subsubsection{trap}
3846
\label{traps}
3847
\label{table:trapInstruction}
3848
\begin{tabular}{|p{12mm}|p{12mm}|p{30mm}|p{80mm}|}
3849
\hline
3850
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries immediate operand \\ \hline
3851
1.7 C & 63 & trap & 0-254 \\ \hline
3852
1.7 C & 63 & filler & 255 \\ \hline
3853
\end{tabular}
3854
\vv
3855
 
3856
Traps work like interrupts. The unconditional trap has an 8-bit interrupt number in IM1. This is an index into the interrupt vector table, which initially starts at absolute address zero. The unconditional trap instruction may use IM2 for additional information.
3857
\vv
3858
 
3859
A trap instruction with all 1's in all fields (instruction code word 0x7FFFFFFF) can be used as filler in unused parts of code memory.
3860
 
3861
\subsubsection{conditional trap}
3862
\label{table:conditionalTrapInstructions}
3863
\begin{tabular}{|p{12mm}|p{12mm}|p{30mm}|p{80mm}|}
3864
\hline
3865
\bfseries format & \bfseries opcode & \bfseries instruction & \bfseries immediate operand \\ \hline
3866
2.5.5C & 63 & compare, trap\_uabove & limit \\ \hline
3867
%2.5.5 C & 63 & conditional trap & IM2 = interrupt number, IM3 = operand \\ \hline
3868
\end{tabular}
3869
\vv
3870
 
3871
Conditional traps are currently not supported.
3872
\vv
3873
 
3874
The conditional trap generates a trap if the specified condition is true.\\
3875
IM2 contains the interrupt number. \\
3876
IM3 contains an immediate operand.
3877
%the condition code OPJ, specified in table \ref{table:controlTransferInstructions}.
3878
\vv
3879
 
3880
Compare/trap\_uabove will generate a trap if RD $>$ IM3. This is useful for checking if an array index exceeds the upper bound. The lower bound does not have to be checked because we use unsigned compare.
3881
\vv
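The reason why the lower bound needs no separate check can be shown with a short C model. The function name would\_trap\_uabove is hypothetical. A negative index, reinterpreted as unsigned, becomes a very large number and therefore exceeds any reasonable limit.

\begin{lstlisting}[frame=none]
#include <stdint.h>
#include <stdbool.h>

/* Model of compare/trap_uabove: returns true if a trap would be
   generated, i.e. if the index as unsigned is above the limit */
bool would_trap_uabove(int64_t index, uint64_t limit)
{
    return (uint64_t)index > limit;
}
\end{lstlisting}

With an upper bound of 9, the indexes 0 and 9 pass, while 10 and -1 both trigger the trap.
\vv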
3882
 
3883
 
3884
\section{Common operations that have no dedicated instruction}
3885
This section discusses some common operations that are not implemented as single instructions, and how to code these operations in software.
3886
 
3887
\subsubsection{Change sign}
3888
For integer operands, do a reverse subtract from zero. For floating point operands, use the toggle\_bit instruction on the sign bit.
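
The two methods correspond to the following C sketch. The function names are hypothetical, and the floating point version assumes the IEEE 754 double precision format with the sign in bit 63.

\begin{lstlisting}[frame=none]
#include <stdint.h>
#include <string.h>

/* Integer: reverse subtract from zero. The subtraction is done on
   unsigned values so that it wraps around instead of overflowing. */
int64_t negate_int(int64_t x)
{
    return (int64_t)(0 - (uint64_t)x);
}

/* Floating point: toggle the sign bit, leaving all other bits unchanged */
double negate_double(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= UINT64_C(1) << 63;
    memcpy(&x, &bits, sizeof bits);
    return x;
}
\end{lstlisting}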
3889
 
3890
\subsubsection{Not}
3891
To invert all bits in an integer, do an XOR with -1. To invert a Boolean, do an XOR with 1.
3892
 
3893
\subsubsection{Rotate through carry}
3894
Rotates through carry are rarely used, and common implementations can be very inefficient. A left rotate through carry can be replaced by an add\_c with the same register in both source operands.
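
The equivalence can be checked with the following C model of a 64-bit left rotate through carry by one bit. The function name is hypothetical, and the carry flag is modeled as an ordinary variable.

\begin{lstlisting}[frame=none]
#include <stdint.h>

/* Rotate left through carry by one bit, calculated as x + x + carry.
   *carry holds the carry flag on entry and receives the carry out. */
uint64_t rotate_left_through_carry(uint64_t x, unsigned *carry)
{
    unsigned carry_out = (unsigned)(x >> 63);  /* bit shifted out */
    uint64_t result = x + x + (*carry & 1u);   /* carry enters bit 0 */
    *carry = carry_out;
    return result;
}
\end{lstlisting}

Adding a value to itself shifts it one bit to the left, the carry input fills the vacant bit 0, and the carry output of the addition is the bit that was shifted out.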
3895
 
3896
\subsubsection{Horizontal vector add} \label{horizontalVectorAdd}
3897
See example \ref{exampleHorizontalAdd}.
3898
\vv
3899
 
3900
\section{Unused instructions} \label{unusedInstructions}
3901
Unused instructions and opcodes can be divided into three types:
3902
 
3903
\begin{enumerate}
3904
\item The opcode is reserved for future use. Attempts to execute it will trigger a trap (synchronous interrupt) which can be used for generating an error message or for emulating instructions that are not supported.
3905
\item The opcode is guaranteed to generate a trap, not only in the present version, but also in all future versions. This can be used as a filler in unused parts of the memory or for indicating unrecoverable errors. It can also be used for emulating user-specific instructions.
3906
\item The opcode is ignored and does not trigger a trap. It can be used for future extensions that improve performance or functionality, but which can be safely ignored when not supported.
3907
\end{enumerate}
3908
 
3909
All three types are implemented, where type 1 is the most common.
3910
\vv
3911
 
3912
Nop instructions with nonzero values in unused fields are type 3. These instructions are ignored.
3913
\vv
3914
 
3915
Prefetch and fence instructions with no memory operand, with nonzero values in unused fields, or with undefined values in IM3 are type 3. These instructions are ignored.
3916
\vv
3917
 
3918
Unused bits in masks and numeric control register are type 3. These bits are ignored.
3919
\vv
3920
 
3921
Trap instructions and conditional trap instructions with nonzero values in unused fields or undefined values in any field are type 2. These instructions are guaranteed to generate a trap. A special version of the trap instruction is intended as filler in unused or inaccessible parts of code memory.
3922
\vv
3923
 
3924
The undef instruction is type 2. It is guaranteed to generate a trap in all systems. It can be used for testing purposes and emulation.
3925
\vv
3926
 
3927
The userdef\_\_ instructions are type 1. These instructions are reserved for user-defined and application-specific purposes.
3928
\vv
3929
 
3930
Instructions with erroneous coding should preferably behave as type 1. This includes instruction codes with nonzero values in unused fields, operand types not supported, or any other bit pattern with no defined meaning in any field. Type 3 behavior may alternatively be allowed in these cases. If so, the instruction should behave as if it were coded correctly.
3931
\vv
3932
 
3933
All other opcodes not explicitly defined are type 1. These may be used for future instructions.
3934
\vv
3935
 
3936
Small systems with no operating system and no trap support should define alternative behavior.
3937
 
3938
 
3939
\end{document}