% Source: https://opencores.org/ocsvn/forwardcom/forwardcom/trunk, manual/fwc_instruction_formats.tex, rev 145 (Agner)
% chapter included in forwardcom.tex
\documentclass[forwardcom.tex]{subfiles}
\begin{document}
\RaggedRight

\chapter{Instruction formats}
\section{Formats and templates}
All instructions use one of the general format templates shown below (the most significant bits are shown to the left). The basic layout of the 32-bit code word is shown in template A. Templates B, C, and D are derived from template A by replacing 8, 16, or 24 bits, respectively, with immediate data. Double-size and triple-size instructions can be constructed by adding one or two 32-bit words to one of these templates. For example, template A with an extra 32-bit word containing data is called A2. Template E2 is an extension of template A where the second code word contains an extra register field, extra opcode bits, mode bits, option bits, and data.
\vspace{4mm}

\begin{table}[h!] \label{table:templateA}
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|} \hline
Bits & 2 & 3 & 6 & 5 & 1 & 2 & 5 & 3 & 5 \\ \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & Mask & RT  \\ \hline
\multicolumn{10}{|l|}{
\textbf{Template A}. Has three operand registers and a mask register.} \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateB}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{24mm}|} \hline
Bits & 2 & 3 & 6 & 5 & 1 & 2 & 5 & 8 \\ \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & IM1 \\ \hline
\multicolumn{9}{|l|}{
\textbf{Template B}. Has two operand registers and an 8-bit immediate constant.} \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateC}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{38.5mm}|p{24mm}|} \hline
Bits & 2 & 3 & 6 & 5 & \hspace{15mm} 8 & \hspace{8mm} 8 \\ \hline
Field & IL & Mode & OP1 & RD & \hspace{14mm} IM2 & \hspace{7mm} IM1 \\ \hline
\multicolumn{7}{|l|}{
\textbf{Template C}. Has one operand register and two 8-bit immediate constants.} \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateD}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{81.5mm}|} \hline
Bits & 2 & 3 & 3 & \hspace{33mm} 24 \\ \hline
Field & IL & Mode & OP1 & \hspace{32mm} IM2 \\ \hline
\multicolumn{5}{|l|}{
\textbf{Template D}. Has no register and a 24-bit immediate constant.
} \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateA2}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|} \hline
Bits & 2 & 3 & 6 & 5 & 1 & 2 & 5 & 3 & 5 \\ \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & Mask & RT  \\ \hline
Field & \multicolumn{9}{|c|}{ IM2 } \\ \hline
\multicolumn{10}{|l|}{
\textbf{Template A2}. 2 words. As A, with an extra 32-bit immediate constant.
} \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateB2}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{24mm}|} \hline
Bits & 2 & 3 & 6 & 5 & 1 & 2 & 5 & 8 \\ \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & IM1 \\ \hline
Field & \multicolumn{8}{|c|}{ IM2 } \\ \hline
\multicolumn{9}{|l|}{
\textbf{Template B2}. As B, with an extra 32-bit immediate constant.} \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateC2}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{38.5mm}|p{24mm}|} \hline
Bits & 2 & 3 & 6 & 5 & \hspace{15mm} 8 & \hspace{8mm} 8 \\ \hline
Field & IL & Mode & OP1 & RD & \hspace{14mm} IM2 & \hspace{7mm} IM1 \\ \hline
Field & \multicolumn{6}{|c|}{ IM3 } \\ \hline
\multicolumn{7}{|l|}{
\textbf{Template C2}. As C, with an extra 32-bit immediate constant.} \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateE2}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|} \hline
Bits & 2 & 3 & 6 & 5 & 1 & 2 & 5 & 3 & 5 \\ \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & Mask & RT  \\ \hline
Bits & 3 & 5 & 2 & 6 & \multicolumn{5}{l|}{ \hspace{29mm} 16  } \\ \hline
Field  & Mode2 & RU & OP2 & IM3 & \multicolumn{5}{l|}{ \hspace{28mm} IM2 } \\ \hline
\multicolumn{10}{|l|}{
\textbf{Template E2}. Has 4 register operands, mask, a 16-bit immediate constant, } \\
\multicolumn{10}{|l|}{
and extra bits for mode, opcode, and options. } \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateA3}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|} \hline
Bits & 2 & 3 & 6 & 5 & 1 & 2 & 5 & 3 & 5 \\ \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & Mask & RT  \\ \hline
Field & \multicolumn{9}{|c|}{ IM2 } \\ \hline
Field & \multicolumn{9}{|c|}{ IM3 } \\ \hline
\multicolumn{10}{|l|}{
\textbf{Template A3}. 3 words. As A, with two extra 32-bit immediate constants.
} \\ \hline
\end{tabular}
\end{table}
\vv

\begin{table}[h!] \label{table:templateB3}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{24mm}|} \hline
Bits & 2 & 3 & 6 & 5 & 1 & 2 & 5 & 8 \\ \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & IM1 \\ \hline
Field & \multicolumn{8}{|c|}{ IM2 } \\ \hline
Field & \multicolumn{8}{|c|}{ IM3 } \\ \hline
\multicolumn{9}{|l|}{
\textbf{Template B3}. As B, with two extra 32-bit immediate constants.
} \\ \hline
\end{tabular}
\end{table}
\vspace{4mm}

\begin{table}[h!] \label{table:templateE3}
\vv
\begin{tabular}{|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|p{10mm}|} \hline
Bits & 2 & 3 & 6 & 5 & 1 & 2 & 5 & 3 & 5 \\ \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & Mask & RT  \\ \hline
Bits & 3 & 5 & 2 & 6 & \multicolumn{5}{l|}{ \hspace{29mm} 16  } \\ \hline
Field  & Mode2 & RU & OP2 & IM3 & \multicolumn{5}{l|}{ \hspace{28mm} IM2 } \\ \hline
Field & \multicolumn{9}{|c|}{ IM4 } \\ \hline
\multicolumn{10}{|l|}{
\textbf{Template E3}. As E2, with an extra 32-bit immediate constant.
} \\ \hline
\end{tabular}
\end{table}
\vspace{4mm}
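
As a worked illustration of template A (a sketch derived only from the field widths above, with the most significant bits to the left), the fields occupy the following bit positions in the 32-bit code word $w$:
\vv

\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|} \hline
Field & IL & Mode & OP1 & RD & M & OT & RS & Mask & RT \\ \hline
Bits & 31--30 & 29--27 & 26--21 & 20--16 & 15 & 14--13 & 12--8 & 7--5 & 4--0 \\ \hline
\end{tabular}
\vv

A field of $n$ bits with its lowest bit at position $p$ can be extracted as $\lfloor w / 2^{p} \rfloor \bmod 2^{n}$. The field values in the following example are chosen arbitrarily for illustration (OP1 = 8 is a hypothetical opcode number, not taken from the instruction list): IL = 0, Mode = 0, OP1 = 8, RD = 1, M = 0, OT = 2, RS = 2, Mask = 7, RT = 3.
\[ w = 8 \cdot 2^{21} + 1 \cdot 2^{16} + 2 \cdot 2^{13} + 2 \cdot 2^{8} + 7 \cdot 2^{5} + 3 = \mathrm{0x010142E3} \]
Extracting RS from this word gives $\lfloor \mathrm{0x010142E3} / 2^{8} \rfloor \bmod 2^{5} = 2$.
\vv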

The meaning of each field is described in the following table.

\pagebreak

\begin{longtable} {|p{16mm}|p{16mm}|p{85mm}|}
\caption{Fields in instruction templates} \label{table:fieldsInTemplates} \\
\endfirsthead
\endhead
\hline
\bfseries Field name & \bfseries Meaning & \bfseries Values  \\
\hline
IL & Instruction length & 0 or 1: 1 word = 32 bits \newline
2: 2 words = 64 bits \newline
3: 3 words (possibly more in future extensions if mode $>$ 3)  \\
\hline
Mode & Format & Determines the format template and the use of each field.
Extended with the M bit when needed. \newline
See details below. \\
\hline
Mode2 & Format & Extension to Mode. \\
\hline
OT & Operand type and size (OS) &
0: 8 bit integer, OS = 1 byte  \newline
1: 16 bit integer, OS = 2 bytes \newline
2: 32 bit integer, OS = 4 bytes \newline
3: 64 bit integer, OS = 8 bytes \newline
4: 128 bit integer, OS = 16 bytes (optional) \newline
5: single precision float, OS = 4 bytes \newline
6: double precision float, OS = 8 bytes \newline
7: quadruple precision float, OS = 16 bytes (optional) \newline
The OT field is extended with the M bit when needed. \\
\hline
M & Operand type or mode & Extends the mode field when bit 1 and bit 2 of Mode are both zero (general purpose registers). Extends the OT field otherwise (vector registers).  \\
\hline
OP1 & Opcode & Decides the operation, for example add or move.  \\
\hline
OP2 & Opcode & Opcode extension for single-format instructions. \newline
               May also be used as an extension to IM3. \\
\hline
RD & Destination register & r0 – r31 or v0 – v31. Also used for first source operand and fallback if the instruction format does not specify enough operands. \\
\hline
RS & Source register & r0 – r31 or v0 – v31. Source register, pointer, or fallback. \\
\hline
RT & Source register & r0 – r31 or v0 – v31. Source register, index, or vector length.  \\
\hline
RU & Source register & r0 – r31 or v0 – v31. Source register or fallback. \\
\hline
Mask & Mask register & 0-6 means that a general purpose register or vector register is used for mask and option bits. 7 means no mask.  \\
\hline
IM1 IM2 IM3 IM4 & Immediate data & 8, 16, 24, or 32 bits immediate operand or address offset or option bits. Adjacent IM fields can be merged to make a larger constant. \\
\hline
\end{longtable}
\vv
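
The role of the M bit can be illustrated as follows. This is a reading inferred from the format names in table \ref{table:instructionFormats} and from the position of M next to OT in template A, not a rule stated explicitly in the text. For Mode = 0 or 1, in formats that have an M bit, M acts as a fourth mode bit, so that the second number of the format name is
\[ \mathrm{Mode} + 8 \cdot \mathrm{M} \]
For example, IL = 0, Mode = 1, M = 1 gives format 0.9, while IL = 0, Mode = 1, M = 0 gives format 0.1. For the vector formats, assuming that M supplies the most significant bit of the operand type, the operand type number is
\[ \mathrm{OT} + 4 \cdot \mathrm{M} \]
so that OT = 1 with M = 1 gives type 5, single precision float.
\vv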

Instructions have several different formats, defined by the IL and mode bits, according to table \ref{table:instructionFormats} below. The different formats specify different sizes of immediate data or memory operands with different addressing modes. \\
\vv

Instructions can have up to three source operands (input), one destination operand (output), and a mask. The destination operand always uses the RD field, except where the destination is a memory operand. The source operands use the available operand fields according to the following algorithm: The required source operands are assigned to the available operand fields defined by table \ref{table:instructionFormats} in the following order of priority: immediate data field, memory operand, RT, RS, RU, RD. The operands are assigned in reverse order so that the last operand gets the field that comes first in this order of priority. For example, the instruction r1 = r2 - r3 using template A will be RD = RS - RT. RD is used for both destination and the first source operand only if there are no other vacant register fields.
\vv

The coding of instructions with two or three source operands is indicated in the table in the following way: \\
RD = f2(RS, RT) means that instructions with two input operands (f2) use the register specified in RD as destination operand and RS and RT as source operands.\\
RD = f3(RD, RU, [RS+RT*OS+IM2]) means that instructions with three input operands (f3) use the register specified in RD as both destination and the first source operand. The second source operand is RU. The third source operand is a memory operand with RS as base pointer, RT as index scaled by the operand size, and the constant IM2 as offset.\\
Instructions with only one input operand are coded as f2 with the first source operand omitted.
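\vv

The assignment rule can be checked by hand against two entries of table \ref{table:instructionFormats}. In the following sketch, src1, src2, and src3 are illustrative names for the first, second, and third source operand. The last operand takes the first field in the priority order, and the remaining operands follow in reverse order.
\vv

\begin{tabular}{|p{14mm}|p{28mm}|p{32mm}|p{42mm}|} \hline
\bfseries Format & \bfseries Available fields & \bfseries Two sources (f2) & \bfseries Three sources (f3) \\ \hline
0.1 & IM1, RS, RD & src2 = IM1, src1 = RS & src3 = IM1, src2 = RS, src1 = RD \\ \hline
2.0.6 & RT, RS, RU, RD & src2 = RT, src1 = RS & src3 = RT, src2 = RS, src1 = RU \\ \hline
\end{tabular}
\vv

This agrees with the table entries RD = f2(RS, IM1) and RD = f3(RD, RS, IM1) for format 0.1, and RD = f2(RS, RT) and RD = f3(RU, RS, RT) for format 2.0.6.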

\begin{longtable} {|p{10mm}|p{6mm}|p{9mm}|p{7mm}|p{80mm}|}
\caption{List of instruction formats} \label{table:instructionFormats} \\
\endfirsthead
\endhead
\hline
\bfseries Format name & \bfseries IL & \bfseries Mode. \small Mode2 & \bfseries Tem-plate & \bfseries Use \\
\hline
0.0 & 0 & 0 & A & Three general purpose register operands.\newline
RD = f2(RS, RT). RD = f3(RD, RS, RT).\\

\hline
0.1 & 0 & 1 & B & Two general purpose registers and 8-bit immediate operand. \newline
RD = f2(RS, IM1). RD = f3(RD, RS, IM1).\\

\hline
0.2 & 0 & 2 & A & Three vector register operands.\newline
RD = f2(RS, RT). RD = f3(RD, RS, RT).\\

\hline
0.3 & 0 & 3 & B & Two vector registers and a broadcast 8-bit immediate operand. \newline
RD = f2(RS, IM1). RD = f3(RD, RS, IM1).\\

\hline
0.4 & 0 & 4 & A & One vector register and memory operand. Vector length specified by general purpose register. \newline
RD = f2(RD, [RS]). length=RT.\\

\hline
0.5 & 0 & 5 & A & One vector register and a memory operand with base pointer and negative index.  This is used for vector loops as explained on page \pageref{vectorLoops}. \newline
RD = f2(RD, [RS-RT]). length=RT.\\

\hline
0.6 & 0 & 6 & A & One vector register and a scalar memory operand with base pointer and scaled index. \newline
RD = f2(RD, [RS+RT*OS]).\\

\hline
0.7 & 0 & 7 & B & One vector register and a scalar memory operand with base pointer and 8-bit offset. \newline
RD = f2(RD, [RS+IM1*OS]).\\

\hline
0.8 & 0 & 0 M=1 & A & One general purpose register and a memory operand with base pointer and scaled index. \newline
RD = f2(RD, [RS+RT*OS]).\\

\hline
0.9 & 0 & 1 M=1 & B & One general purpose register and a memory operand with base pointer and 8-bit offset. \newline
RD = f2(RD, [RS+IM1*OS]).\\

\hline
1.0 & 1 & 0 & A & Single-format instructions. Three general purpose register operands. \newline
RD = f2(RS, RT). RD = f3(RD, RS, RT).\\

\hline
1.1 & 1 & 1 & C & Single-format instructions. One general purpose register and a 16-bit immediate operand. \newline
RD = f2(RD, IM1-2).\\

\hline
1.2 & 1 & 2 & A & Single-format instructions. Three vector register operands. \newline
RD = f2(RS, RT). RD = f3(RD, RS, RT).\\

\hline
1.3 & 1 & 3 & B & Single-format instructions. Two vector registers and a broadcast 8-bit immediate operand. \newline
RD = f2(RS, IM1). RD = f3(RD, RS, IM1). \\

\hline
1.4 & 1 & 4 & C & Single-format instructions. One vector register and a broadcast 16-bit immediate operand. \newline
RD = f2(RD, IM1-2). \\

\hline
1.5 & 1 & 5 &  & Vacant. May be used for application-specific vector instructions. \\

\hline
1.6A & 1 & 6 & A & Multiway jump instructions and system calls with three register operands.\\

\hline
1.6B & 1 & 6 & B & Jump instructions with two register operands and 8 bit offset.\\

\hline
1.7C & 1 & 7 & C & Jump instructions with one register operand, 8 bit constant (IM2) and 8 bit offset (IM1).\\

\hline
1.7D & 1 & 7 & D & Jump instructions with no register and 24 bit offset.  \\

\hline
1.8 & 1 & 0 M=1 & B & Single-format instructions. Two general purpose registers and an 8-bit immediate operand.\newline
RD = f2(RS, IM1). RD = f3(RD, RS, IM1).\\

\hline
1.9 &  &  &  & There is no format 1.9 because 1.1 has no M bit.\\

\hline
2.0.0 & 2 & 0.0  & E2 & Three general purpose registers and a memory operand with base and 16 bit offset.\newline
RD = f2(RT, [RS+IM2]). \newline
RD = f3(RU, RT, [RS+IM2]).\\

\hline
2.0.1 & 2 & 0.1  & E2 & Two general purpose registers and a memory operand with base, index and optional 16 bit offset, no scale.\newline
RD = f2(RU, [RS+RT+IM2]).\newline
RD = f3(RD, RU, [RS+RT+IM2]). \\

\hline
2.0.2 & 2 & 0.2  & E2 & Two general purpose registers and a memory operand with base,  scaled index, and optional 16 bit offset.\newline
RD = f2(RU, [RS+RT*OS+IM2]). \newline
RD = f3(RD, RU, [RS+RT*OS+IM2]). \\

\hline
2.0.3 & 2 & 0.3  & E2 & Two general purpose registers and a memory operand with base, scaled index, and 16-bit limit. Optional. \newline
RD = f2(RU, [RS+RT*OS]). \newline
RD = f3(RD, RU, [RS+RT*OS]). \newline
Limit RT $\leq$ IM2 (unsigned).\newline
Support for this format is optional. \\

\hline
2.0.5 & 2 & 0.5  & E2 & One general purpose register and a memory operand with base, scaled index, 16-bit offset, and an 8-bit immediate operand using IM3 extended with OP2.  Optional. \newline
RD = f2([RS+RT*OS+IM2], IM3). \newline
RD = f3(RU, [RS+RT*OS+IM2], IM3). \\

\hline
2.0.6 & 2 & 0.6  & E2 & Four general purpose registers.\newline
RD = f2(RS, RT). \newline
RD = f3(RU, RS, RT).\\

\hline
2.0.7 & 2 & 0.7  & E2 & Three general purpose registers and a 16-bit integer with left shift.\newline
RD = f2(RT, IM2). \newline
RD = f3(RS, RT, IM2).\newline
IM2 (signed) is shifted left by the 6-bit unsigned value of IM3, or without shift if IM3 is used for other purposes. \\

\hline
2.1 & 2 & 1 & A2 & Two general purpose registers and a memory operand with base and 32 bit offset (IM2). \newline
RD = f2(RT, [RS+IM2]). \newline
RD = f3(RD, RT, [RS+IM2]).\\

\hline
2.2.0 & 2 & 2.0 & E2 & Two vector registers and a broadcast scalar memory operand with base  and 16 bit offset.\newline
RD = f2(RU, [RS+IM2]). \newline
RD = f3(RD, RU, [RS+IM2]). \newline
Broadcast to length RT.\\

\hline
2.2.1 & 2 & 2.1 & E2 & Two vector registers and a memory operand with base and 16 bit offset.\newline
RD = f2(RU, [RS+IM2]). \newline
RD = f3(RD, RU, [RS+IM2]).\newline
Length=RT.\\

\hline
2.2.2 & 2 & 2.2 & E2 & Two vector registers and a scalar memory operand with base and scaled index. \newline
RD = f2(RU, [RS+RT*OS+IM2]). \newline
RD = f3(RD, RU, [RS+RT*OS+IM2]). \\

\hline
2.2.3 & 2 & 2.3 & E2 & Two vector registers and a scalar memory operand with base, scaled index, and 16-bit limit. Optional. \newline
RD = f2(RU, [RS+RT*OS]). \newline
RD = f3(RD, RU, [RS+RT*OS]).\newline
Limit RT $\leq$ IM2 (unsigned).\\

\hline
2.2.4 & 2 & 2.4 & E2 & Two vector registers and a memory operand with base and negative index. \newline
RD = f2(RU, [RS-RT+IM2]). \newline
RD = f3(RD, RU, [RS-RT+IM2]). \newline
Length=RT. \\

\hline
2.2.5 & 2 & 2.5  & E2 & One vector register and a memory operand with base, 16-bit offset, and an 8-bit immediate operand using IM3 extended with OP2. Optional. \newline
RD = f2([RS+IM2], IM3). \newline
RD = f3(RU, [RS+IM2], IM3). \newline
Length=RT.\\

\hline
2.2.6 & 2 & 2.6 & E2 & Four vector registers.\newline
RD = f2(RS, RT). \newline
RD = f3(RU, RS, RT).\\

\hline
2.2.7 & 2 & 2.7 & E2 & Three vector registers and a broadcast immediate half-precision float or 16-bit integer with left shift.\newline
RD = f2(RT, IM2). \newline
RD = f3(RS, RT, IM2).\newline
Floating point operands: IM2 is half precision. \newline
Integer operands: IM2 (signed) is shifted left by the 6-bit unsigned value of IM3, or without shift if IM3 is used for other purposes. \\

\hline
2.3 & 2 & 3 & A2 & Three vector registers and a broadcast 32-bit immediate operand.\newline
RD = f2(RT, IM2). \newline
RD = f3(RS, RT, IM2).\\

\hline
2.4 & 2 & 4 & A2 & One vector register and a memory operand with base and 32 bit offset.\newline
RD = f2(RD, [RS+IM2]). length=RT.\\

\hline
2.5 & 2 & 5 & A2, B2, C2 & Jump instructions for OP1 $<$ 8. Single format instructions with memory operands or mixed register types for OP1 $\geq$ 8.\\

\hline
2.6 & 2 & 6 & A2 & Single-format instructions. Three vector registers and a 32-bit immediate operand.\newline
RD = f2(RT, IM2). \newline
RD = f3(RS, RT, IM2).\\

\hline
2.7 & 2 & 7 &  & Currently unused.\\

\hline
2.8 & 2 & 0 M=1 & A2 & Three general purpose registers and a 32-bit immediate operand.\newline
RD = f2(RT, IM2). \newline
RD = f3(RS, RT, IM2).\\

\hline
2.9 & 2 & 1 M=1 & A2 & Single-format instructions. Three general purpose registers and a 32-bit immediate operand.\newline
RD = f2(RT, IM2). \newline
RD = f3(RS, RT, IM2).\\


\hline
3.0.0 & 3 & 0.0  & E3 & Three general purpose registers and a memory operand with base and 32 bit offset.\newline
RD = f2(RT, [RS+IM4]). \newline
RD = f3(RU, RT, [RS+IM4]).\\

\hline
3.0.2 & 3 & 0.2  & E3 & Two general purpose registers and a memory operand with base, scaled index, and 32 bit offset.\newline
RD = f2(RU, [RS+RT*OS+IM4]). \newline
RD = f3(RD, RU, [RS+RT*OS+IM4]). \\

\hline
3.0.3 & 3 & 0.3  & E3 & Two general purpose registers and a memory operand with base, scaled index, and 32-bit limit. Optional. \newline
RD = f2(RU, [RS+RT*OS]). \newline
RD = f3(RD, RU, [RS+RT*OS]). \newline
Limit RT $\leq$ IM4 (unsigned). \\

\hline
3.0.5 & 3 & 0.5  & E3 & One general purpose register and a memory operand with base, scaled index, 16-bit offset, and a 32-bit immediate operand. Optional. \newline
RD = f2([RS+RT*OS+IM2], IM4). \newline
RD = f3(RU, [RS+RT*OS+IM2], IM4). \\

\hline
3.0.7 & 3 & 0.7  & E3 & Three general purpose registers and a 32-bit integer with left shift.\newline
RD = f2(RT, IM4 $<<$ IM2). \newline
RD = f3(RS, RT, IM4 $<<$ IM2). \newline
IM4 (signed) is shifted left by the unsigned value of IM2. \\

\hline
3.1 & 3 & 1 & A3, B3 & Jump instructions for OP1 $<$ 8. Single format instructions with memory operands or mixed register types for OP1 $\geq$ 8.\\

\hline
3.2.0 & 3 & 2.0 & E3 & Two vector registers and a broadcast scalar memory operand with base and 32 bit offset.\newline
RD = f2(RU, [RS+IM4]). \newline
RD = f3(RD, RU, [RS+IM4]). \newline
Broadcast to length RT.\\

\hline
3.2.1 & 3 & 2.1 & E3 & Two vector registers and a memory operand with base and 32 bit offset.\newline
RD = f2(RU, [RS+IM4]). \newline
RD = f3(RD, RU, [RS+IM4]). \newline
Length=RT.\\

\hline
3.2.2 & 3 & 2.2 & E3 & Two vector registers and a scalar memory operand with base, scaled index, and 32-bit offset. Optional. \newline
RD = f2(RU, [RS+RT*OS+IM4]). \newline
RD = f3(RD, RU, [RS+RT*OS+IM4]).\\

\hline
3.2.3 & 3 & 2.3 & E3 & Two vector registers and a scalar memory operand with base, scaled index, and 32-bit limit. Optional. \newline
RD = f2(RU, [RS+RT*OS]). \newline
RD = f3(RD, RU, [RS+RT*OS]). \newline
Limit RT $\leq$ IM4 (unsigned).\\

\hline
3.2.5 & 3 & 2.5  & E3 & One vector register and a memory operand with base, 16-bit offset, and a 32-bit immediate operand. Optional. \newline
RD = f2([RS+IM2], IM4). \newline
RD = f3(RU, [RS+IM2], IM4). \newline
Length=RT.\\

\hline
3.2.7 & 3 & 2.7 & E3 & Three vector registers and a broadcast single precision float or 32-bit integer with left shift.\newline
RD = f2(RT, IM4). \newline
RD = f3(RS, RT, IM4).\newline
Floating point operands: IM4 is single precision. \newline
Integer operands: IM4 (signed) is shifted left by the unsigned value of IM2. \\

\hline
3.3 & 3 & 3 & A3 & Three vector registers and a broadcast 64-bit immediate operand.\newline
RD = f2(RT, IM3:IM2). \newline
RD = f3(RS, RT, IM3:IM2).\\

\hline
3.8 & 3 & 0 M=1 & A3 & Three general purpose registers and a 64-bit immediate operand. \newline
RD = f2(RT, IM3:IM2). \newline
RD = f3(RS, RT, IM3:IM2).\\

\hline
3.9 &  &  &  & There is no format 3.9 because 3.1 uses the M bit.\\

\hline
4.x & 3 & 4-7 &  & Reserved for future 4-word instructions and longer. \\
\hline
\end{longtable}


%\vspace{2mm}
%\subsection{Maximum number of input operands}
%The hardware supports a maximum of four or five input dependencies. Three-input instructions cannot have both a mask and a memory operand with base and index or vector length if the hardware has a limit of four input dependencies. For example the mul\_add instruction cannot have a mask in format 2.2.0 if the hardware does not support five inputs.

\vv
\section{Coding of operands}
\subsection{Operand type}
The type and size of operands are determined by the OT field as indicated above. The operand type is 32-bit integer if there is no OT field, unless otherwise specified. The operand size (OS) is the size in bytes of a scalar operand or a vector element. This is equal to the number of bits divided by 8.

\subsection{Register type}
The instructions can use either general purpose registers or vector registers. General purpose registers are used for source and destination operands and for masks if the Mode field is 0 or 1 (with M = 0 or 1). Vector registers are used for source and destination operands and for masks if Mode is 2-7. Jump instructions use vector registers if M = 1. A few single-format instructions deviate from this rule and use mixed register types.

\subsection{Pointer register}
Instructions with a memory operand always use an address relative to a base pointer. The base pointer can be a general purpose register, the data section pointer, the thread data pointer, the instruction pointer, or the stack pointer. The pointer is determined by the RS field. This field is interpreted as follows.
\vv

Single-size instructions with a memory operand (formats 0.4 - 0.9) can use any of the registers r0-r31 as base pointer. r31 is the stack pointer.
\vv

Larger instructions with a memory operand and an offset field of at least 16 bits (formats 2.0.x, 2.1, 2.2.x, 2.4, 2.9, 3.0.x, 3.2.x) can use the same registers, except r28 - r30, which are replaced by the thread pointer (THREADP), data section pointer (DATAP), and instruction pointer (IP), respectively.
\vv

The instruction pointer may be used for addressing data in a read-only data section. This works in the following way. The address of the end of the current instruction is used as a reference point. This is the same as the address of the next instruction. The reason for using the end of the instruction as reference point is that it makes relocation in the linker independent of the instruction length in most cases. This address is multiplied by 4 when used as a data address because the instruction pointer addresses 32-bit word units while data pointers address byte units.
\vv
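
As a worked illustration with hypothetical numbers, suppose that an instruction in format 2.0.0 has RS = 30 (IP), that the instruction ends at word address $\mathrm{0x1000}$, and that the 16-bit offset is $\mathrm{IM2} = \mathrm{0x20}$. A 16-bit offset is not scaled, so the byte address of the memory operand would be
\[ 4 \cdot \mathrm{0x1000} + \mathrm{0x20} = \mathrm{0x4020} \]
\vv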


\subsection{Index register}
Instruction formats with an index can use r0 - r30 as index in the RT field. A value of 31 in the index field means no index. The signed index is multiplied by the operand size (OS) for formats 0.6, 0.8, 2.0.2, 2.0.3, 2.0.5, 2.2.2, 2.2.3, 3.0.2, 3.0.3, 3.0.5, 3.2.2, and 3.2.3; by 1 for format 2.0.1; or by -1 for formats 0.5 and 2.2.4. The result is added to the address given by the base pointer.

\subsection{Offsets}
Offsets can be 8, 16, or 32 bits. The value is sign-extended to 64 bits. An 8-bit offset is multiplied by the operand size OS, as given by the OT field. An offset of 16 or 32 bits is not scaled. The result is added to the address given by the base pointer and the index.
\vv
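
The address calculation can be summarized as follows. This is a sketch assembled from the rules above, where $B$ is the value of the base pointer, $X$ is the signed value of the index register, and $s$ is the index scale factor, which is OS, 1, or $-1$ depending on the format:
\[ \mathrm{address} = B + s \cdot X + \mathrm{offset} \]
An 8-bit offset is multiplied by OS before it is added. Two examples with hypothetical register values: \\
Format 0.6 with OT = 2 (OS = 4), $B = \mathrm{0x1000}$, and $X = 3$ gives $\mathrm{0x1000} + 4 \cdot 3 = \mathrm{0x100C}$. \\
Format 0.7 with OT = 3 (OS = 8), $B = \mathrm{0x2000}$, and $\mathrm{IM1} = -2$ gives $\mathrm{0x2000} + (-2) \cdot 8 = \mathrm{0x1FF0}$.
\vv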

Support for addressing modes with both index and offset is optional (formats 2.0.1, 2.0.2, 2.0.5, 2.2.2, 2.2.4, 3.0.2, 3.0.5, 3.2.2). Hardware implementations where the use of two additions in the address calculation would cause timing problems may allow only an index with an offset of zero, or an offset with no index (RT = 31).
\vv

\subsection{Limit on index}
Formats 2.0.3, 2.2.3, 3.0.3, and 3.2.3 have a 16-bit or 32-bit limit on the index register. This is useful for checking array limits. A trap is generated if the value of the index register, interpreted as unsigned, is bigger than the unsigned limit. This feature is optional.
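\vv

For example (hypothetical values, and assuming 64-bit general purpose registers), consider an array of 100 elements accessed in format 2.0.3 with $\mathrm{IM2} = 99$. An index value of 99 is accepted, an index value of 100 generates a trap, and an index value of $-1$ also generates a trap because it is interpreted as the unsigned value $2^{64} - 1$.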

\subsection{Vector length}
The vector length of memory operands is specified by r0-r30 in the RT field for formats with a vector memory operand. A value of 31 in the RT field indicates a scalar with the same length as the operand size (OS).
\vv

The value of the vector length register indicates the vector length in bytes (not the number of elements). If the value is bigger than the maximum vector length then the maximum vector length is used. If the indicated vector length is zero or negative then the resulting vector will be empty and nothing will be read or written.
\vv

The vector length must be a multiple of the operand size OS, as indicated by the OT field. If the vector length is not a multiple of the operand size then the partial vector element will be zero.
\vv

The vector length for source operands in vector registers is stored in the register itself.
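\vv

The rules for the length of a vector memory operand can be summarized as follows. This is a sketch in which $R$ denotes the signed value of the vector length register and $L_{\mathrm{max}}$ the maximum vector length of the hardware, both in bytes:
\[ L = \min(\max(R, 0),\; L_{\mathrm{max}}) \]
For example, assuming a hypothetical $L_{\mathrm{max}} = 64$: $R = 100$ gives $L = 64$; $R = 32$ gives $L = 32$, which is 8 elements with OS = 4; and $R = -8$ gives an empty vector.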

\subsection{Combining vectors with different lengths}
The length of the destination register of a vector instruction will be the same as the vector length of the first source operand.
\vv

A consequence of this is that the length of the result is determined by the order of the operands when vectors of different lengths are combined.
\vv

If the source operands have different lengths then the lengths will be adjusted as follows. If a vector source operand is too long then the extra elements will be ignored. If a vector source operand is too short then the missing elements will be zero.
\vv

A scalar memory operand is not broadcast but treated as a short vector. It is padded with zeroes to the vector length of the destination.
\vv

A broadcast memory operand will use the vector length given by the vector length register. If this is less than the length of the destination then it is padded with zeroes.
\vv

An immediate operand will be broadcast to the vector length of the destination.
\vv
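
As a worked illustration with hypothetical values, let OS = 4, let the vector $a = (a_0, a_1, a_2, a_3)$ have a length of 16 bytes, and let the vector $b = (b_0, b_1)$ have a length of 8 bytes. With $a$ as the first source operand, an addition gives a result of 16 bytes in which the missing elements of $b$ are zero:
\[ a + b = (a_0 + b_0,\; a_1 + b_1,\; a_2 + 0,\; a_3 + 0) \]
With $b$ as the first source operand, the result has a length of 8 bytes and the extra elements of $a$ are ignored:
\[ b + a = (b_0 + a_0,\; b_1 + a_1) \]
\vv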

\subsection{Immediate constants}
Immediate constants can be 8, 16, 32, or 64 bits. Immediate fields are aligned to natural addresses. They are interpreted as follows.
\vv

If OT specifies an integer type then the field is interpreted as an integer. If the field is smaller than the operand size then it is sign-extended to the appropriate size. If the field is larger than the operand size then the superfluous upper bits are ignored. The truncation of an immediate operand that is too large will not trigger any overflow condition.
\vv

If OT specifies a floating point type then the field is interpreted as follows. Immediate fields of 8 bits are interpreted as signed integers and converted to floating point numbers of the desired precision. A 16-bit field is interpreted as a half precision floating point number (subnormal numbers are supported for float16). A 32-bit field is interpreted as a single precision floating point number. It is converted to the desired precision if necessary. A 64-bit field is interpreted as a double precision floating point number. A 64-bit field is not allowed with a single precision operand type.
\vv

Some instruction formats allow immediate integer constants with a left shift. Large integer constants with a limited number of significant bits can be represented with fewer bits in this way.

Formats 2.0.7 and 2.2.7 allow a 16-bit immediate constant in IM2 to be shifted left by the unsigned value of IM3 to give a 64-bit signed value, except for instructions that use IM3 for other purposes.

Formats 3.0.7 and 3.2.7 allow a 32-bit immediate constant in IM4 to be shifted left by the unsigned value of IM2. Any overflow beyond 64 bits is ignored.

Some single-format instructions also use shifted constants.
\vv
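
The rules above can be illustrated with a few worked cases. The constants are chosen for illustration only and are not taken from the instruction list.
\vv

\begin{tabular}{|p{40mm}|p{35mm}|p{45mm}|} \hline
\bfseries Immediate field & \bfseries Operand type & \bfseries Resulting value \\ \hline
8 bits, 0xFE & 32 bit integer & 0xFFFFFFFE = $-2$ (sign-extended) \\ \hline
8 bits, 0xFE & double precision float & $-2.0$ (converted from signed integer) \\ \hline
16 bits, 0x3C00 & half precision float & 1.0 (0x3C00 is 1.0 in half precision) \\ \hline
32 bits, 0x12345678 & 16 bit integer & 0x5678 (upper bits ignored) \\ \hline
IM2 = 0x0003, IM3 = 20, format 2.0.7 & 64 bit integer & $3 \cdot 2^{20}$ = 0x300000 \\ \hline
IM2 = 0xFFFF, IM3 = 8, format 2.0.7 & 64 bit integer & $-1 \cdot 2^{8} = -256$ \\ \hline
\end{tabular}
\vv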
605
 
606
An instruction can be made compact by using the smallest size that fits the actual value of the constant.
607
 
608
 
\subsection{Mask register and fallback register} \label{MaskAndFallback}
The 3-bit mask field in formats with templates of type A or E indicates a mask register. Registers r0-r6 can be used as masks if the destination is a general purpose register. Vector registers v0-v6 can be used as masks if the destination is a vector register. A value of 7 in the mask field means no mask and unconditional execution using the options specified in the numeric control register.
\vv
 
If the mask is a vector register then it is interpreted as a vector with the same element size as indicated by the OT field. Each element of the mask register is applied to the corresponding element of the result.
\vv
 
The mask has multiple purposes. The primary purpose is for conditional execution. An instruction is not executed if bit 0 of the mask is zero. In this case, the destination will get a fallback value instead of the result of the calculation, and any numerical error condition will be suppressed. Vector instructions are executed conditionally for each vector element separately, so that each vector element is enabled if bit 0 of the corresponding vector element of the mask register is 1.
\vv
 
The fallback value is taken from an extra register if the instruction has fewer than three source operands and the format has a vacant register field, or from the first source register operand otherwise. The fallback cannot be different from the first source register if the instruction has three source operands, even if there is a vacant register field.
If the instruction format has more than one vacant register field, then the field that would be used for the first source operand if the instruction had three source operands is used for the fallback register.
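\vv

The per-element behavior of masking with a fallback value can be illustrated with a small C model of a masked 32-bit integer addition. The function and parameter names are hypothetical. The fallback values are simply passed in as an array, so the selection of the fallback register described above is not modeled.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Model of a masked vector addition with 32-bit integer elements.
   Element i is calculated only if bit 0 of mask[i] is 1;
   otherwise the destination gets the fallback value. */
void masked_add_i32(int32_t *dest, const int32_t *a, const int32_t *b,
                    const uint32_t *mask, const int32_t *fallback,
                    size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (mask[i] & 1u) {
            /* unsigned addition wraps around without overflow */
            dest[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
        } else {
            dest[i] = fallback[i];
        }
    }
}
\end{verbatim}

Only bit 0 of each mask element decides whether the element is calculated. The remaining mask bits are ignored in this model.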
\vv
 
Registers r31 (stack pointer) and v31 cannot be used as fallback registers. Instead, the fallback value will be zero if a register number of 31 is indicated.
Register r31 or v31 should not be used as the first source register if it is also used as fallback, because this would cause ambiguity about the fallback value. (The fallback value will not be zero in this case.)
\vv
 
A memory write has no fallback register. Instead, the value of the memory operand will be unchanged if the mask has a zero in bit 0.
\vv
 
The remaining bits of the mask are used for specifying various options.
The meanings of these mask bits are described in the next section.
 
\section{Coding of masks}
A mask register can be a general purpose register r0-r6 or a vector register v0-v6. A value of 7 in the mask field means no mask.
\vv
 
The bits in the mask register are coded as follows.
 
\begin{longtable}
{|p{15mm}|p{90mm}|}
\caption{Bits in mask register and numeric control register}
\label{table:maskBits}
\endfirsthead
\endhead
\hline
\bfseries Bit number & \bfseries Meaning \\
 \hline

1 & Guaranteed to be ignored. \\
\hline
2-7 & Numerical exception control. See page \pageref{table:FPExceptionResults}. \\
2 & Floating point division by zero generates NAN \\
3 & Floating point overflow generates NAN \\
4 & Floating point underflow generates NAN \\
5 & Floating point inexact generates NAN \\
  & Bits 2-7 may also be used for controlling integer division by zero and integer overflow \\ \hline
10-12 & Floating point rounding mode: \newline
000 = nearest or even \newline
001 = down \newline
010 = up \newline
011 = towards zero \newline
This feature is optional.\\ \hline
13 & Support subnormal numbers in single and higher precision.
(Subnormal numbers are always supported for half precision). This feature is optional.\\ \hline

18-23 & Instruction-specific option bits.\\
\hline
26-30 & Possible use for enabling numerical traps. Not used in the standard version. \\
\hline
31 & Constant execution time. This bit makes instructions take the same number of clock cycles regardless of the values of mask and operands. The guarantee provided by this bit is useful for cryptographic applications. This feature is optional. \\
\hline
\end{longtable}
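As an illustration of this bit layout, the following C sketch extracts the individual fields from a mask element of at least 32 bits. The structure and function names are hypothetical. Bit 0 is the conditional execution bit described in the previous section.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical container for the fields of a 32-bit mask element */
typedef struct {
    unsigned enabled;        /* bit 0: execute this element      */
    unsigned rounding_mode;  /* bits 10-12: rounding mode        */
    unsigned subnormal;      /* bit 13: subnormal support        */
    unsigned option_bits;    /* bits 18-23: instruction-specific */
    unsigned constant_time;  /* bit 31: constant execution time  */
} mask_fields;

mask_fields decode_mask(uint32_t m)
{
    mask_fields f;
    f.enabled       = m & 1u;
    f.rounding_mode = (m >> 10) & 7u;
    f.subnormal     = (m >> 13) & 1u;
    f.option_bits   = (m >> 18) & 0x3Fu;
    f.constant_time = (m >> 31) & 1u;
    return f;
}
\end{verbatim}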
 
Bits 8, 16, 24, etc. in a vector mask register can be used like bit 0 for 8-bit and 16-bit operand sizes. All other bits are reserved for future use.
\vv
 
Vector instructions treat the mask register as a vector with the same element size (OS) as the operands. Each element of the mask vector has the bit codes as listed above. The different vector elements can have different mask bits.
\vv
 
The numeric control register (NUMCONTR) is used as mask when the mask field is 7 or absent. The NUMCONTR register is broadcast to all elements of a vector, using as many bits of NUMCONTR as indicated by the operand size, when an instruction has no mask register. The number of bits in NUMCONTR is implementation dependent (usually 16 or more). Any missing bits will be zero.
The same NUMCONTR value is applied to all vector elements.
Bit 0 of NUMCONTR is always 1.
\vv
 
The instruction-specific option bits (bits 18-23) may be used for various options in specific instructions. The option bits in the mask are considered zero in vector operands with an 8-bit or 16-bit operand type because each mask element has too few bits in this case.
\vv
 
\section{Format for jump, call and branch instructions}
Most branches in software are based on the result of an arithmetic or logic instruction (ALU). The ForwardCom design combines the ALU instruction and the conditional jump into a single instruction. For example, a loop control can be implemented with a single instruction that counts down and jumps until it reaches zero or counts up until it reaches a certain limit.
\vv
 
The jumps, calls, branches, and multiway branches use the following formats.
 
\begin{longtable}
{|p{10mm}|p{8mm}|p{8mm}|p{8mm}|p{8mm}|p{70mm}|}
\caption{List of formats for control transfer instructions}
\label{table:jumpInstructionFormats}
\endfirsthead
\endhead
\hline
\bfseries Format & \bfseries IL & \bfseries Mode & \bfseries OP1 & \bfseries Tem-plate & \bfseries Description \\
\hline
1.6 A & 1 & 6 & OPJ & A & Multiway jumps and calls with three register operands.  \\
\hline
1.6 B & 1 & 6 & OPJ & B & Short jump with two register operands (RD, RS) and 8-bit offset (IM1).  \\
\hline
1.7 C & 1 & 7 & OPJ & C & Short jump with one register operand (RD), an 8-bit immediate constant (IM2) and 8-bit offset (IM1). \\
\hline
1.7 D & 1 & 7 & 0-15 & D & Jump or call with 24-bit offset. \\
\hline

2.5.0 & 2 & 5 & 0 & A2 & Double size jump with three register operands (RD, RS, RT),
and a 24-bit address offset (IM2). OPJ in upper 8 bits of IM2. \\
\hline

2.5.1 & 2 & 5 & 1 & B2 & Double size jump with a register destination operand, a register source operand, a 16-bit immediate operand (IM2 lower half), and
a 16-bit jump offset (IM2 upper half). OPJ in IM1. \\
\hline

2.5.2 & 2 & 5 & 2 & B2 & Double size jump with one register operand (RD),
a memory operand with base RS and 16-bit address offset (IM2 lower half),
and a 16-bit jump offset (IM2 upper half). OPJ in IM1. Optional. \\
\hline

2.5.4 & 2 & 5 & 4 & C2 & Double size jump with one register operand (RD), one 8-bit immediate constant (IM2) and 32-bit offset (IM3). OPJ in IM1. \\
\hline

2.5.5 & 2 & 5 & 5 & C2 & Double size jump with one register operand (RD), an 8-bit offset (IM2) and a 32-bit immediate constant (IM3).  OPJ in IM1. \\
\hline

2.5.7 & 2 & 5 & 7 & C2 & Double size system call, 16-bit constant (IM1,IM2) and 32-bit constant (IM3). No OPJ. \\
\hline

3.1.0 & 3 & 1 & 0 & A3 & Triple size jump with two register operands (RD, RT),
a 24-bit jump offset (IM2), and a memory operand with base RS and 32-bit address offset (IM3). OPJ in last byte of IM2. Optional. \\
\hline

3.1.1 & 3 & 1 & 1 & B3 & Triple size jump with a register destination operand, a register source operand (RS), a 32-bit jump offset (IM2), and a 32-bit immediate operand (IM3). OPJ in IM1. Optional. \\
\hline

\end{longtable}
 
The jump, call, and branch instructions have signed offsets of 8, 16, 24, or 32 bits relative to the instruction pointer or, more precisely, relative to the end of the instruction. This offset is multiplied by the instruction word size (= 4 bytes) to cover an address range of $\pm$ 512 bytes for short conditional jumps with an 8-bit offset, $\pm$ 128 kilobytes for jumps and calls with a 16-bit offset, $\pm$ 32 megabytes for a 24-bit offset, and $\pm$ 8 gigabytes for a 32-bit offset.
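\vv

The address calculation can be sketched in C as follows. The function name \texttt{jump\_target} and its parameters are hypothetical. The caller is assumed to know the address of the jump instruction and its length in 32-bit words, and the offset is assumed to be sign-extended already.

\begin{verbatim}
#include <stdint.h>

/* Target address of a relative jump.
   instr_address: address of the jump instruction
   instr_words:   length of the instruction in 32-bit words (1-3)
   offset:        signed offset field, already sign-extended */
uint64_t jump_target(uint64_t instr_address, unsigned instr_words,
                     int32_t offset)
{
    /* offsets are relative to the end of the instruction */
    uint64_t end = instr_address + 4u * instr_words;
    /* the offset counts 4-byte instruction words */
    return end + (uint64_t)((int64_t)offset * 4);
}
\end{verbatim}

With this model, a single-size jump with an offset of $-1$ jumps to itself, and the largest positive 8-bit offset, 127, reaches 508 bytes beyond the end of the instruction.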
\vv
 
The OPJ field defines the operation and jump condition. This field has 6 bits in the single size version and 8 bits in the longer format versions. The two extra bits in the longer versions are reserved for future use.
\vv
 
The versions with template C and C2 have no OT field. The operand type is 32-bit integer when there is no OT field, unless otherwise noted. It is not possible to use formats with template C or C2 with other operand types.
\vv
 
The instructions will use vector registers when there is an OT field and M = 1. In other words, the combined ALU-and-branch instructions will use vector registers only when a floating point type is specified (or 128-bit integer type, if supported). General purpose registers are used in all other cases. Only the first element of a vector register is used.
Logical instructions will interpret the value in a vector register as an integer, when a floating point type is specified. Only the compare instructions interpret the operands as floating point when a floating point type is indicated. Branch instructions with addition and subtraction cannot use floating point operands. The codes that these instructions would use are used for floating point compare instructions instead.
\vv
 
The combined ALU and conditional jump instructions can be coded in the formats listed above. Subtraction with a constant cannot be coded in format 1.7 C. The assembler will replace subtraction with a small immediate constant by addition with the negative constant. The code space that would have been used by subtraction in format 1.7 C is instead used for coding direct jump and call instructions with a 24-bit offset using format 1.7 D, where the lower three bits of OP1 are used as part of the 24-bit offset.
\vv
 
Unconditional and indirect jumps and calls use the formats indicated above, where unused fields must be zero. Bit 0 of the OPJ field is zero for unconditional jump instructions and one for call instructions.
\vv
 
See page \pageref{table:controlTransferInstructions} for a list of OPJ condition codes.
\vv
 
 
\section{Assignment of opcodes}
The opcodes and formats for new instructions can be assigned according to the following rules.
 
\begin{itemize}
\item Multi-format instructions. Often-used instructions that need to support many different operand types, addressing modes, and formats use all or most of the following formats: 0.0 - 0.9, 2.0.x, 2.1, 2.2.x, 2.3, 2.4, 2.8, 3.0.x, 3.2.x, 3.3, and 3.8. The same value of OP1 is used in all these formats. OP2 must be 0, except in formats 2.0.5 and 2.2.5 that use OP2 for other purposes. Instructions with few source operands should have the lowest values of OP1. Available OP1 values are a limited resource that should be economized. Instructions for integers only and instructions for floating point only may share the same OP1 value.

\item Control transfer instructions, i.e. jumps, branches, calls and returns, can be coded as short instructions with IL = 1, mode = 6 - 7, and OP1 = 0 - 63 or as double-size instructions with IL = 2, mode = 5, OP1 = 0 - 7, and optionally as triple-size instructions with IL = 3, mode = 1, OP1 = 0-7. See page \pageref{table:jumpInstructionFormats}.

\item Short single-format instructions with general purpose registers. Use mode 1.0, 1.1, and 1.8, with any value of OP1. Mode 1.0 is currently unused and may be reserved for future purposes.

\item Short single-format instructions with vector registers. Use mode 1.2, 1.3, and 1.4 with any value of OP1.
 
\item Double-size single-format instructions with general purpose registers can use mode 2.9 with any value of OP1, and mode 2.0.x (except 2.0.5) with any value of OP1 and OP2 $\neq$ 0 (give similar instructions the same value of OP2). If more combinations are needed then use IM3 for further subdivision of the code space.

\item Double-size single-format instructions with vector registers can use mode 2.6 with any value of OP1, and mode 2.2.x (except 2.2.5) with any value of OP1 and OP2 $\neq$ 0 (give similar instructions the same value of OP2). If more combinations are needed then use IM3 for further subdivision of the code space.

\item Double-size single-format instructions with mixed vector and general purpose registers or with memory operands can use mode 2.5 with OP1 in the range 8-63.

\item Triple-size single-format instructions with general purpose registers can use mode 3.0.x with any value of OP1 and OP2 $\neq$ 0.

\item Triple-size single-format instructions with vector registers can use mode 3.2.x with any value of OP1 and OP2 $\neq$ 0.

\item Triple-size single-format instructions with mixed register types can use mode 3.1 with OP1 in the range 8-63.
 
\item Possible future instructions longer than three 32-bit words should be coded with IL = 3, mode = 4-7.

\item New options or other modifications to existing instructions can use IM3 bits in template E or mask register bits.

\item New addressing modes and formats may be implemented as single-format read and write instructions. Template E formats use Mode2 for distinguishing between different formats.
Other single-format templates may be divided into groups of eight consecutive OP1 values with the same format.
New addressing modes or other formats that apply to all multi-format instructions can use vacant values of Mode2 with E templates.

\item Format 1.0 is intended for single-format instructions with three general purpose registers. There are currently no such instructions. Therefore, format 1.0 A or B may be used for application-specific single-size instructions or for other purposes. Note that the M bit is not available in format 1.0 because this bit is used for distinguishing format 1.8 from 1.0. This means that format 1.0 cannot be used for vector instructions without violating the general coding scheme.

\item Format 1.5 is vacant and can be used for single-format instructions with vector registers.

\end{itemize}
 
Application-specific instructions should preferably use E template formats with OP2 $\neq$ 0. There are many vacant opcodes in these formats. General multi-purpose instructions may use some of the more crowded formats.
\vv
 
Unused register fields may have the same value as the first source register operand in order to avoid false dependences. Unused mask fields have the value 7 in instructions that can have a mask.
All other unused fields must be zero. The instructions with the fewest input operands should preferably have the lowest OP1 codes.
\vv
 
The file forwardcom\_sourcecode\_documentation has a checklist of what to do when making or modifying instructions.
\vv
 
 
\end{document}