====================================================
HiCoVec Processor Instruction Set & Coding
====================================================

www.opencores.org/projects/hicovec

This document originates from Prof. Dr.-Ing. Gundolf Kiefer, who teaches at the University of Applied Sciences in Augsburg.
It is the foundation of the HiCoVec processor. Having been vital during the development of the processor, it
now serves to provide detailed information about the instruction set and its coding for people who want
to write applications.


General
===========
- Each instruction word consists of exactly 32 bits. This keeps instruction decoding and execution simple.
- With a few exceptions, the first 12 bits define a scalar operation and the following 20 bits a vector operation (VLIW/EPIC principle).
- "000" encodes the NOP command, thereby allowing the other unit (scalar/vector) to use the remaining bits.

Hardware components:
- instruction register (32 bit)
- memory interface
- scalar unit
- vector unit

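To illustrate the 12/20 bit split of the instruction word described in this section, here is a minimal Python sketch. It assumes that the leftmost bit of the patterns in this document is the most significant bit (bit 31) of the word; the document does not state this explicitly. The names `pack_word` and `unpack_word` are hypothetical helpers, not part of the HiCoVec project.

```python
SCALAR_BITS = 12  # first 12 bits: scalar operation
VECTOR_BITS = 20  # following 20 bits: vector operation


def pack_word(scalar: int, vector: int) -> int:
    """Combine a 12-bit scalar field and a 20-bit vector field into one word.

    Assumption: the scalar field occupies the upper 12 bits (bits 31..20).
    """
    if not 0 <= scalar < (1 << SCALAR_BITS):
        raise ValueError("scalar field must fit into 12 bits")
    if not 0 <= vector < (1 << VECTOR_BITS):
        raise ValueError("vector field must fit into 20 bits")
    return (scalar << VECTOR_BITS) | vector


def unpack_word(word: int) -> tuple[int, int]:
    """Split a 32-bit instruction word into (scalar field, vector field)."""
    if not 0 <= word < (1 << 32):
        raise ValueError("instruction word must fit into 32 bits")
    return word >> VECTOR_BITS, word & ((1 << VECTOR_BITS) - 1)


if __name__ == "__main__":
    word = pack_word(0xABC, 0x12345)
    print(f"{word:08X}")
    print(unpack_word(word))
```

With all don't-care bits set to 0, a word whose scalar field is 0 carries a scalar NOP, and a word whose vector field is 0 carries a VNOP.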
Scalar unit
==============
Principle:
- accumulator machine
- load/store architecture
- addressing modes: absolute, register indirect, register indirect with displacement
- address calculation using the ALU (ADD operation)

Hardware components:
- registers: A, X, Y (each 32 bit)
- flags: Carry, Zero
- ALU
- instruction counter
- control unit
- some multiplexers

ALU commands:
    01 oooo dd ss tt --------------------    or, with immediate operand n:
    01 oooo dd ss 00 000-nnnnnnnnnnnnnnnn    d, s, t

    d: destination register: 00 = none, 01 = A, 10 = X, 11 = Y
    s: 1st source register:  00 = 0,    01 = A, 10 = X, 11 = Y
    t: 2nd source register:  00 = n,    01 = A, 10 = X, 11 = Y

    o: operation
       0000: ADD               0001: ADC
       0010: SUB               0011: SBC
       0100: INC
       0110: DEC
       1000: AND               1001: OR
       1010: XOR               1011: MUL (optional)
       1100: LSL (insert 0)    1101: ROL (insert carry)
       1110: LSR (insert 0)    1111: ROR (insert carry)

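The ALU command coding above can be sketched as a small encoder. This is an illustration only: it assumes that the leftmost pattern bit is the most significant bit of the word and that all don't-care bits are written as 0. The function name `encode_alu` and its argument conventions are hypothetical.

```python
ALU_OPS = {
    "ADD": 0b0000, "ADC": 0b0001, "SUB": 0b0010, "SBC": 0b0011,
    "INC": 0b0100, "DEC": 0b0110, "AND": 0b1000, "OR": 0b1001,
    "XOR": 0b1010, "MUL": 0b1011, "LSL": 0b1100, "ROL": 0b1101,
    "LSR": 0b1110, "ROR": 0b1111,
}
DEST = {None: 0b00, "A": 0b01, "X": 0b10, "Y": 0b11}
SRC = {"A": 0b01, "X": 0b10, "Y": 0b11}


def encode_alu(op, d, s, t):
    """Encode an ALU command as a 32-bit word.

    d: "A", "X", "Y" or None (no destination)
    s: "A", "X", "Y" or the integer 0 (constant zero, s = 00)
    t: "A", "X", "Y" or an integer immediate n (t = 00)
    """
    s_code = 0b00 if s == 0 else SRC[s]
    if isinstance(t, int):
        if not 0 <= t <= 0xFFFF:
            raise ValueError("immediate n must fit into 16 bits")
        t_code, low = 0b00, t  # vector part becomes 000- followed by n
    else:
        t_code, low = SRC[t], 0  # vector part left as all-zero (VNOP)
    scalar = (0b01 << 10) | (ALU_OPS[op] << 6) | (DEST[d] << 4) | (s_code << 2) | t_code
    return (scalar << 20) | low


if __name__ == "__main__":
    print(f"{encode_alu('ADD', 'A', 'A', 1):08X}")    # ADD A, A, 1
    print(f"{encode_alu('ADD', 'X', 'A', 'Y'):08X}")  # ADD X, A, Y
```

Whether n is treated as signed or unsigned is not stated in this document; the sketch only accepts the raw 16-bit pattern.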
Load/store commands:
    10 00-- dd ss tt --------------------    or, with immediate operand n:
    10 00-- dd ss 00 000-nnnnnnnnnnnnnnnn    LD d, s + t
    10 10-- -- ss tt --------------------    or, with immediate operand n:
    10 10-- -- ss 00 000-nnnnnnnnnnnnnnnn    ST s + t, A

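The three addressing modes (absolute, register indirect, register indirect with displacement) all follow from the single address expression s + t. The sketch below assumes that the s and t fields of the load/store commands use the same coding as in the ALU commands (s = 00 selects the constant 0, t = 00 selects the immediate n), that n is used as an unsigned 16-bit value, and that the sum wraps at 32 bits. None of these points is spelled out in this document, and `effective_address` is a hypothetical name.

```python
REGISTER_BY_CODE = {0b01: "A", 0b10: "X", 0b11: "Y"}


def effective_address(s, t, n, regs):
    """Compute s + t for a load/store command.

    s, t: 2-bit field values; n: immediate; regs: dict with keys "A", "X", "Y".
    """
    first = 0 if s == 0b00 else regs[REGISTER_BY_CODE[s]]   # s = 00 -> 0
    second = n if t == 0b00 else regs[REGISTER_BY_CODE[t]]  # t = 00 -> n
    return (first + second) & 0xFFFFFFFF  # assumed 32-bit wrap-around


if __name__ == "__main__":
    regs = {"A": 0, "X": 0x1000, "Y": 8}
    print(hex(effective_address(0b00, 0b00, 0x20, regs)))  # absolute
    print(hex(effective_address(0b10, 0b00, 0, regs)))     # register indirect
    print(hex(effective_address(0b10, 0b00, 4, regs)))     # with displacement
    print(hex(effective_address(0b10, 0b11, 0, regs)))     # register + register
```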
Support for the vector unit (scalar part only; see below for full details):
    10 01-- -- ss tt --------------------    VLD ..., s + t    ; t != 00
    10 1100 -- ss tt --------------------    VST s + t, ...    ; t != 00

    10 1110 -- ss tt --------------------    MOV r(...), s     ; t != 00
    10 1111 dd -- tt --------------------    MOV d, v(...)     ; t != 00
    10 1101 -- ss -- --------------------    MOVA r, s

Jump commands:
    00 0--- -- -- -- --------------------    NOP
    00 1000 00 ss tt --------------------    JMP s+t
    00 1000 dd ss tt --------------------    JAL A/X/Y, s+t    ; Jump-And-Link (useful for subroutines)
    00 101- -- -- -- --------------------    HALT
    00 1100 00 ss tt --------------------    JNC s+t
    00 1101 00 ss tt --------------------    JC s+t
    00 1110 00 ss tt --------------------    JNZ s+t
    00 1111 00 ss tt --------------------    JZ s+t

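The jump opcodes above follow a regular pattern: for the conditional jumps (11xx), one bit selects the flag and the lowest bit gives the flag value that triggers the jump. A minimal sketch of that decision, reading the bit meanings off the table above (the function name `jump_taken` is hypothetical):

```python
def jump_taken(opcode: int, carry: bool, zero: bool) -> bool:
    """Decide whether a command of class 00 with the given 4-bit opcode jumps."""
    if opcode == 0b1000:           # JMP / JAL: unconditional
        return True
    if opcode >> 2 == 0b11:        # 1100..1111: conditional jumps
        flag = zero if opcode & 0b10 else carry  # bit 1 selects Zero or Carry
        wanted = bool(opcode & 0b01)             # bit 0 is the required flag value
        return flag == wanted
    return False                   # NOP (0---) and HALT (101-) never jump


if __name__ == "__main__":
    print(jump_taken(0b1100, carry=False, zero=False))  # JNC with carry clear
    print(jump_taken(0b1111, carry=False, zero=False))  # JZ with zero clear
```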
Other:
    11 0000 -- ee ff --------------------    set/clear flags
        ee: select flag (Z, C)               CLC, CLZ, SEC, SEZ
        ff: new value
        e.g.:
    11 0000 -- 01 -0 --------------------    CLC (clear carry)


Vector unit
==============
Principle:
- N registers, each K * 32 bit wide
- N, K are configurable
- Direct access is only possible for R0 to R15. The other registers can be used as temporary storage using
  the VMOV command.
- The word length of an operation can be specified (following the Intel naming convention):
    - QW = Quadword (64 bit)
    - DW = Doubleword (32 bit)
    - W = Word (16 bit)
    - B = Byte (8 bit)

- N and K can be chosen freely. Instruction decoding is independent of the selected values, with the following
  exceptions:
    * K has to be divisible by 2 (otherwise the 64 bit mode would make no sense)
    * N cannot exceed 16 in instruction words (the remaining registers are addressable via the VMOV command)
    * VSHUF word length is K * 32 / 4

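The consequences of a given choice of N and K can be tabulated with a few lines of Python. This is a sketch under the constraints listed above; the function name `vector_layout` and the shape of the returned dictionary are hypothetical.

```python
WORD_LENGTHS = {"QW": 64, "DW": 32, "W": 16, "B": 8}  # bits per element


def vector_layout(n_regs: int, k: int) -> dict:
    """Derive register width and element counts from the configuration N, K."""
    if n_regs < 1:
        raise ValueError("N must be at least 1")
    if k < 1 or k % 2 != 0:
        raise ValueError("K has to be divisible by 2")
    width = k * 32
    return {
        "register_bits": width,
        # Only R0 to R15 can be named directly in an instruction word.
        "direct_registers": min(n_regs, 16),
        # Number of elements per register for each word length.
        "elements": {name: width // bits for name, bits in WORD_LENGTHS.items()},
        "vshuf_word_length": width // 4,
    }


if __name__ == "__main__":
    print(vector_layout(n_regs=32, k=2))
```

For K = 2 this gives 64-bit registers and a VSHUF word length of 16 bit.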
Hardware components:
- N registers, each K * 32 bit wide (N, K configurable)
- K 32-bit ALUs
- vector control unit
- shuffle unit
- select unit (used to select data for transfer to the scalar unit)
- vector ALU control unit
- some multiplexers


VALU commands:
    ------------ 01 ww oooo rrrr vvvv wwww    (.B/.W/.DW/.QW) r, v, w

    ww:   word length: 00 = 8 bit, 01 = 16 bit, 10 = 32 bit, 11 = 64 bit
    rrrr: destination vector register (R0...R7)
    vvvv: 1st source vector register (R0...R7)
    wwww: 2nd source vector register (R0...R7)

    oooo: operation
       0000: VADD
       0010: VSUB
       1000: VAND
       1001: VOR
       1010: VXOR
       1011: VMUL (optional)
       1100: VLSL
       1110: VLSR


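As an illustration of the VALU coding above, the following sketch builds the 20-bit vector part of an instruction word. It assumes that the leftmost pattern bit is the most significant one and that the scalar part is left as NOP with all don't-care bits 0, so the vector part equals the full word. The register check follows the 4-bit field width. `encode_valu` is a hypothetical name.

```python
WORD_LENGTH = {"B": 0b00, "W": 0b01, "DW": 0b10, "QW": 0b11}
VALU_OPS = {
    "VADD": 0b0000, "VSUB": 0b0010, "VAND": 0b1000, "VOR": 0b1001,
    "VXOR": 0b1010, "VMUL": 0b1011, "VLSL": 0b1100, "VLSR": 0b1110,
}


def encode_valu(op: str, width: str, r: int, v: int, w: int) -> int:
    """Encode e.g. VADD.W r, v, w; the scalar part stays NOP (all zero)."""
    for reg in (r, v, w):
        if not 0 <= reg <= 15:
            raise ValueError("register number must fit into 4 bits")
    return (
        (0b01 << 18)                   # VALU command class
        | (WORD_LENGTH[width] << 16)   # ww
        | (VALU_OPS[op] << 12)         # oooo
        | (r << 8) | (v << 4) | w      # rrrr vvvv wwww
    )


if __name__ == "__main__":
    print(f"{encode_valu('VADD', 'W', 1, 2, 3):08X}")   # VADD.W R1, R2, R3
    print(f"{encode_valu('VXOR', 'DW', 0, 0, 0):08X}")  # VXOR.DW R0, R0, R0
```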
Transfer commands (in cooperation with the scalar unit):
    10 01-- -- ss tt 10 -- 0010 rrrr ---- ----    VLD r, s + t     ; t != 00
    10 1100 -- ss tt 10 -- 0011 ---- vvvv ----    VST s + t, v     ; t != 00
    10 1101 -- ss -- 10 -- 0110 rrrr ---- ----    MOVA r, s
    10 1110 -- ss tt 10 -- 0100 rrrr ---- ----    MOV r(t), s      ; t != 00
    10 1111 dd -- tt 10 -- 0101 ---- vvvv ----    MOV d, v(t)      ; t != 00


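Transfer commands occupy both halves of the instruction word. The sketch below encodes VLD and VST from the patterns above, again assuming the leftmost pattern bit is the most significant one and don't-care bits are 0. For simplicity it only accepts registers for s and t; the condition t != 00 is enforced by the lookup. The function names are hypothetical.

```python
SCALAR_REG = {"A": 0b01, "X": 0b10, "Y": 0b11}


def _transfer_word(scalar_op: int, s: str, t: str, vector_op: int, r: int, v: int) -> int:
    """Assemble scalar part '10 oooo -- ss tt' and vector part '10 -- oooo rrrr vvvv ----'."""
    if not (0 <= r <= 15 and 0 <= v <= 15):
        raise ValueError("vector register number must fit into 4 bits")
    if t not in SCALAR_REG:
        raise ValueError("t must be a register (t != 00)")
    scalar = (0b10 << 10) | (scalar_op << 6) | (SCALAR_REG[s] << 2) | SCALAR_REG[t]
    vector = (0b10 << 18) | (vector_op << 12) | (r << 8) | (v << 4)
    return (scalar << 20) | vector


def encode_vld(r: int, s: str, t: str) -> int:
    """VLD r, s + t"""
    return _transfer_word(0b0100, s, t, 0b0010, r, 0)


def encode_vst(s: str, t: str, v: int) -> int:
    """VST s + t, v"""
    return _transfer_word(0b1100, s, t, 0b0011, 0, v)


if __name__ == "__main__":
    print(f"{encode_vld(1, 'X', 'Y'):08X}")  # VLD R1, X + Y
    print(f"{encode_vst('X', 'Y', 2):08X}")  # VST X + Y, R2
```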
Shuffle command: (BIT ORDER REVERSED !!!)
    000-nnnnnnnn 11 ww ssss rrrr vvvv wwww    VSHUF r,v,w,wwssssnnnnnnnn

VSHUF allows fast, parallel transfer of data within or between vector registers.
ww defines a word length W <= 32*K/4. r[i], v[i] and w[i] are the i-th partial words of the vector
registers r, v and w. n[i] denotes the i-th bit group (2 bits) of n and s[i] the i-th bit of s.

For i <= 3:
    r[i] <- v[n[i]], if s[i] = 0
    r[i] <- w[n[i]], if s[i] = 1
General:
    r[i] <- v[n[i % 4] + i/4], if s[i % 4] = 0
    r[i] <- w[n[i % 4] + i/4], if s[i % 4] = 1

Example:
    Command: VSHUF R2, 16, R3:2, R3:3, R5:1, R5:2
    Coding:  nnnnnnnn = 10 11 01 10, ssss = 0011, vvvv = 3, wwww = 5
    Effect:  R2(31:0) <- R3(63:32), R2(63:32) <- R5(47:16)

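The example above can be reproduced with a small simulation of the rule for i <= 3. The sketch covers only the case of exactly four partial words per register (W = 32*K/4) and assumes that partial word i occupies bits (i+1)*W-1 down to i*W, which matches the stated effect of the example. The selectors n and s are passed as lists in the order in which they are written in the example coding (first entry = index 0). The function names are hypothetical.

```python
def split_words(value: int, word_length: int, count: int = 4) -> list[int]:
    """Split a register value into partial words; index 0 is the lowest word."""
    mask = (1 << word_length) - 1
    return [(value >> (i * word_length)) & mask for i in range(count)]


def join_words(words: list[int], word_length: int) -> int:
    """Inverse of split_words."""
    result = 0
    for i, word in enumerate(words):
        result |= word << (i * word_length)
    return result


def vshuf4(v: int, w: int, n: list[int], s: list[int], word_length: int) -> int:
    """r[i] <- v[n[i]] if s[i] = 0, else w[n[i]], for i = 0..3."""
    v_words = split_words(v, word_length)
    w_words = split_words(w, word_length)
    out = [(w_words if s[i] else v_words)[n[i]] for i in range(4)]
    return join_words(out, word_length)


if __name__ == "__main__":
    r3 = 0x3333_2222_1111_0000  # partial words 0..3: 0x0000, 0x1111, 0x2222, 0x3333
    r5 = 0x7777_6666_5555_4444  # partial words 0..3: 0x4444, 0x5555, 0x6666, 0x7777
    # VSHUF R2, 16, R3:2, R3:3, R5:1, R5:2
    r2 = vshuf4(r3, r5, n=[2, 3, 1, 2], s=[0, 0, 1, 1], word_length=16)
    print(f"{r2:016X}")
```

The lower half of the result equals R3(63:32) and the upper half equals R5(47:16), as stated in the example.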
Other:
    ------------ 00 0- ---- ---- ---- ----    VNOP
    ------------ 00 1- 1000 rrrr vvvv ----    VMOL r,v
    ------------ 00 1- 1100 rrrr vvvv ----    VMOR r,v
    ------------ 00 1- 0001 rrrr vvvv ----    VMOV r, v
    000-nnnnnnnn 00 1- 0010 rrrr ---- ----    VMOV r, R
    000-nnnnnnnn 00 1- 0011 ---- vvvv ----    VMOV R, v