URL https://opencores.org/ocsvn/hmta/hmta/trunk

Subversion Repositories hmta

[/] [hmta/] [trunk/] [docs/] [HyperMTA.txt] - Blame information for rev 4



HyperMTA Processor Specifications
 
This is only a preliminary release and it is not complete.
 
 
#######################################################################
#######################################################################
 
User/System Access
 
Registers:
R0-R31: General Purpose Integer
F0-F31: Floating Point Registers
C0(F0,F1)-C15(F30,F31): Complex Floating Point Registers
 
Instruction Format and Next Instruction Data Placement:
| 42 Bits(I) | 42 Bits(I) | 42 Bits(I) | 2 Bits(11) |
 
| 64 Bits(D) | 20 Bits(D) | 42 Bits(I) | 2 Bits(01) |
NID0(I/F)    NID1(I)
 
| 42 Bits(I) | 42 Bits(I) | 42 Bits(D) | 2 Bits(10) |
                          NID0(I)
 
|              126 Bits(D)             | 2 Bits(00) | (Complex)
NID0(C)
// Each float is stored in 63 bits; its least significant bit is zeroed or set to one by the instruction
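
As an illustration of the four layouts above, the following Python sketch
splits a 128-bit instruction word into its fields. The bit ordering is an
assumption and is not stated in this specification: the 2-bit tag is taken
to be the least significant bits, and the fields are taken in the order
drawn, most significant first. The names LAYOUTS and decode_bundle are
hypothetical.

```python
# Field kinds and widths for each 2-bit tag, in the order drawn above.
# "I" marks an instruction slot, "D" marks inlined data.
LAYOUTS = {
    0b11: [("I", 42), ("I", 42), ("I", 42)],
    0b01: [("D", 64), ("D", 20), ("I", 42)],
    0b10: [("I", 42), ("I", 42), ("D", 42)],
    0b00: [("D", 126)],
}


def decode_bundle(word):
    """Split a 128-bit word into (tag, [(kind, value), ...]).

    Assumption: the tag occupies bits 1..0 and the first field drawn
    occupies the most significant bits.
    """
    if not 0 <= word < (1 << 128):
        raise ValueError("instruction word must fit in 128 bits")
    tag = word & 0b11
    shift = 128
    fields = []
    for kind, width in LAYOUTS[tag]:
        shift -= width
        fields.append((kind, (word >> shift) & ((1 << width) - 1)))
    return tag, fields
```

Every layout sums to 126 payload bits plus the tag, so one table-driven
loop covers all four cases.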
 
Next Instruction Data:
Data inlined within the next instruction. It is one of the ways in which
we hide memory latency.
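
The note under the complex format says each float is stored in 63 bits,
with the least significant bit supplied by the instruction. A minimal
Python sketch of that reconstruction follows. It assumes the floats are
IEEE 754 binary64 values, that the 63 stored bits are the upper 63 bits
of each value, and that the first float sits in the upper half of the
126-bit payload. None of this is stated in the specification, and the
name unpack_complex_nid is hypothetical.

```python
import struct

MASK63 = (1 << 63) - 1


def unpack_complex_nid(payload, lsb=0):
    """Rebuild two 64-bit floats from a 126-bit complex NID payload.

    lsb is the bit (0 or 1) that the instruction supplies as the least
    significant bit of both floats.
    """
    if not 0 <= payload < (1 << 126):
        raise ValueError("payload must fit in 126 bits")
    if lsb not in (0, 1):
        raise ValueError("lsb must be 0 or 1")

    def widen(bits63):
        # Restore the missing low bit, then reinterpret as a double.
        bits64 = (bits63 << 1) | lsb
        return struct.unpack(">d", struct.pack(">Q", bits64))[0]

    first = (payload >> 63) & MASK63
    second = payload & MASK63
    return widen(first), widen(second)
```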
 
Internal VLIW Branching:
This is another way to hide latency.
 
The following scheme reduces branch misprediction penalties, since they will
be more costly in this system:
 
| Conditional Branch | ALU OP | ALU OP |
If true the instruction will execute:
| ALU | ALU | IGNORE THIS ALU OP |
else
| IGNORE THIS | ALU | ALU OP |
 
Both of those were the same instruction. The branch instruction contains
a mask selecting which molecules of the next instruction to execute. In a
standard pipelined system all three can be executed, and in the write-back
stage only the correct molecules will be written back. This can eliminate
small loops. A special return instruction then returns execution to normal,
executing all instructions. As in the example, it is possible to have
shared instructions which will be executed either way.
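
The write-back filtering described above can be sketched in Python as
follows. This is a simplified model, not the hardware design: all three
molecules are evaluated against the same register state, and only those
selected by the mask are written back. The names run_bundle, EXAMPLE,
TAKEN_MASK and NOT_TAKEN_MASK are hypothetical, as is the idea of one
mask per branch outcome.

```python
def run_bundle(regs, molecules, mask):
    """Execute every molecule, then write back only the masked ones.

    molecules: list of (dest, func) pairs; func maps the register dict
    to a value. mask: bit i set means molecule i is written back.
    """
    # All molecules read the register state from before the bundle.
    results = [(dest, func(regs)) for dest, func in molecules]
    new_regs = dict(regs)
    for index, (dest, value) in enumerate(results):
        if mask & (1 << index):
            new_regs[dest] = value
    return new_regs


# Example bundle: molecule 1 is shared and appears in both masks.
EXAMPLE = [
    ("r1", lambda r: r["r0"] + 1),  # molecule 0, taken path only
    ("r2", lambda r: r["r0"] * 2),  # molecule 1, executed either way
    ("r3", lambda r: r["r0"] - 1),  # molecule 2, not-taken path only
]
TAKEN_MASK = 0b011
NOT_TAKEN_MASK = 0b110
```

Calling run_bundle with TAKEN_MASK updates r1 and r2 and leaves r3 alone;
NOT_TAKEN_MASK updates r2 and r3 instead.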
 
ISA:
 
Arithmetic(64 bit):
ADD reg,reg,reg/imm
SUB reg,reg,reg/imm
MUL reg,reg,reg/imm
MULU reg,reg,reg/imm
DIV reg,reg,reg/imm
DIVU reg,reg,reg/imm
MOD reg,reg,reg/imm
MODU reg,reg,reg/imm
LMUL reg,reg,reg,reg/imm // Long multiply
LMULU reg,reg,reg,reg/imm // Long multiply unsigned
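
A long multiply produces a result wider than one register. The Python
sketch below models LMUL and LMULU under the assumption that the two
destination registers receive the high and low 64-bit halves of a
128-bit product. The specification does not state the operand order or
the result layout, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed64(value):
    """Interpret the low 64 bits of value as a two's complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def lmulu(a, b):
    """Unsigned 64x64 multiply, returning (high, low) 64-bit halves."""
    product = (a & MASK64) * (b & MASK64)
    return product >> 64, product & MASK64


def lmul(a, b):
    """Signed 64x64 multiply, returning (high, low) as raw 64-bit patterns."""
    product = (to_signed64(a) * to_signed64(b)) & MASK128
    return product >> 64, product & MASK64
```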
 
Logic(64 bit):
OR reg,reg,reg/imm
AND reg,reg,reg/imm
XOR reg,reg,reg/imm
NOT reg,reg
SHL reg,reg,reg/imm
SHR reg,reg,reg/imm
ROL reg,reg,reg/imm
ROR reg,reg,reg/imm
PCNT reg,reg
PCNTZ reg,reg
PCNTC reg,reg
CHG reg,reg
SB reg,imm // Set Bit
CB reg,imm // Clear Bit
TB reg,imm // Toggle Bit
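
The single-bit instructions can be modelled in a few lines of Python.
This sketch assumes that bit 0 is the least significant bit, that
results wrap to 64 bits, and that PCNT is a population count (the number
of set bits). PCNTZ and PCNTC are not modelled because the specification
does not define them. The function names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1


def _check_bit(bit):
    if not 0 <= bit <= 63:
        raise ValueError("bit index must be in 0..63")


def sb(value, bit):
    """SB: set one bit."""
    _check_bit(bit)
    return (value | (1 << bit)) & MASK64


def cb(value, bit):
    """CB: clear one bit."""
    _check_bit(bit)
    return value & ~(1 << bit) & MASK64


def tb(value, bit):
    """TB: toggle one bit."""
    _check_bit(bit)
    return (value ^ (1 << bit)) & MASK64


def pcnt(value):
    """PCNT: count the set bits in the low 64 bits (assumed meaning)."""
    return bin(value & MASK64).count("1")
```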
 
Floating Point(64 Bit):
FADD reg,reg,reg
FSUB reg,reg,reg
FMUL reg,reg,reg
FDIV reg,reg,reg
FMOD reg,reg,reg
FABS reg,reg
FNEG reg,reg // Make Negative
FPOS reg,reg // Make Positive
FTSIGN reg,reg // Toggle Sign
FSQ reg,reg
FCMP reg,reg
FRND reg // Random Generator
FPI reg // Load PI
FE reg // Load E
FZERO reg // Load ZERO
FONE reg // Load ONE
FFLOOR reg,reg
FCEIL reg,reg
FINV reg,reg // 1/reg
 
Complex(128 Bit):
CADD reg,reg,reg
CSUB reg,reg,reg
CMUL reg,reg,reg
CDIV reg,reg,reg
CMOD reg,reg,reg // Do we really need this? I don't think so.
CSQ reg,reg
CCMP reg,reg // ?
CI reg // Load I
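
The register list pairs each complex register with two floating point
registers, C0(F0,F1) through C15(F30,F31). The Python sketch below shows
that aliasing with CMUL as the example operation. It assumes the even
register holds the real part and the odd register the imaginary part,
which the specification does not state. The names complex_pair and cmul
are hypothetical.

```python
def complex_pair(n):
    """Return the indices of the two F registers backing complex register Cn."""
    if not 0 <= n <= 15:
        raise ValueError("complex register index must be in 0..15")
    return 2 * n, 2 * n + 1


def cmul(f, dest, a, b):
    """Model CMUL Cdest,Ca,Cb on a list f of 32 floating point registers."""
    # Read all operands first so that dest may alias a or b.
    ar, ai = (f[i] for i in complex_pair(a))
    br, bi = (f[i] for i in complex_pair(b))
    dr, di = complex_pair(dest)
    f[dr] = ar * br - ai * bi  # real part (assumed in the even register)
    f[di] = ar * bi + ai * br  # imaginary part (assumed in the odd register)
```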
 
Branch: // Avoid if possible; use internal VLIW branching
JMP rel
JMP reg
JMP{condition} rel
JMP{condition} reg
CALL rel
CALL reg
CALL{condition} rel
CALL{condition} reg
CALL [reg+8*cccc]
CALL{condition} [reg+8*cccc]
RETURN
RETURN{condition}
 
Internal VLIW Branching:
// Selects which molecules of each atom to execute until a return is reached
// This is another way to hide memory latency
IVB{condition} moleculemask(0,1,2, or any combination)
IVRET
 
Interrupt:
THROW reg/imm
RTI // Return from Interrupt
 
Data Movement:
MOV reg,reg // Move
MOVS reg,sreg // Move Special
MOVS sreg,reg
PREFETCH // Data Prefetch
PREFETCHI // Instruction Prefetch
LOADB(U)
LOADW(U)
LOADD(U)
LOADQ(U)
STOREB
STOREW
STORED
STOREQ
LOADF // Load/Store Float
STOREF
LOADC // Load/Store Complex
STOREC
LOADNID // Load from Next Instruction Data
LOADFNID // Load Float from Next Instruction Data
LOADCNID // Load Complex from Next Instruction Data
EXTRACT reg(dest),reg(src),imm(start),imm(stop)
DEPOSITE reg(dest),reg(src),reg(srcb),imm(start),imm(stop)
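
EXTRACT and DEPOSITE are bit-field operations. The Python sketch below
shows one plausible reading: start and stop are inclusive bit positions
with bit 0 as the least significant bit, EXTRACT right-justifies the
field, and DEPOSITE copies the low bits of srcb into that field of src.
These semantics are assumptions, since the specification gives only the
operand lists. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def field_mask(start, stop):
    """Mask covering bits start..stop inclusive (bit 0 = LSB)."""
    if not 0 <= start <= stop <= 63:
        raise ValueError("need 0 <= start <= stop <= 63")
    return ((1 << (stop - start + 1)) - 1) << start


def extract(src, start, stop):
    """EXTRACT: return the field start..stop of src, right-justified."""
    return (src & field_mask(start, stop)) >> start


def deposit(src, srcb, start, stop):
    """DEPOSITE: return src with field start..stop replaced by srcb's low bits."""
    mask = field_mask(start, stop)
    return ((src & ~mask) | ((srcb << start) & mask)) & MASK64
```

Under this reading, extract(deposit(x, v, s, e), s, e) returns v again
whenever v fits in the field.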
 
System:
TLBR reg(threadid),reg(tlbvalueh),reg(tlbvaluel)
TLBW reg(threadid),reg(tlbvalueh),reg(tlbvaluel)
 
Interrupts: -- Avoid this unless absolutely necessary
THROW reg/imm(vector) // Throw Exception
RETI // Return from Interrupt
 
System:
IFENCE // Instruction Fence
DFENCE // Data Fence
REGISTER reg(threadptr),imm(interrupt vector) // Registers an interrupt
SYSCALL // Syscall (Pauses Current Stream/Flags for Service)
 
Process Management: // Dispatched through MP Bus (8 Threads = 1 Process)
PROCESS.LOAD reg(addrptr),reg(processorid:processid)
PROCESS.STORE reg(addrptr),reg(processorid:processid)
PROCESS.START reg(processorid:processid)
PROCESS.STOP reg(processorid:processid)
 
Thread Management: // Dispatched through MP Bus
THREAD.LOAD reg(addrptr),reg(processorid:threadid) // Loads a thread's state
THREAD.STORE reg(addrptr),reg(processorid:threadid) // Saves a thread's state
THREAD.START reg(processorid:threadid) // Continues execution of a thread
THREAD.STOP reg(processorid:threadid) // Stops execution of a thread
BREAK // Debugger Support
 
Processor Management: // Dispatched through MP Bus
PROCESSOR.START reg(processorid) // Start the processor
PROCESSOR.STOP reg(processorid) // Stop the processor
PROCESSOR.PAUSE reg(processorid) // Pause a processor and all its streams
PROCESSOR.CONTINUE reg(processorid) // Resume a processor from pause
PROCESSOR.RESET reg(processorid) // Restart a processor
PROCESSOR.PING reg(processorid),reg(result/hop count) // Pings a processor
 // result is number of hops to processor or 0 for nonexistent
 
Processor IDs:
0000: Startup
0001: Master Processor (OS Only)
0002-FFFE: Slave Processors
FFFF: Broadcast ID
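
The ID ranges above can be expressed as a small classifier. This Python
sketch assumes processor IDs are 16-bit values written in hexadecimal,
as the four-digit listing suggests. The function name and the returned
labels are hypothetical.

```python
def classify_processor_id(pid):
    """Map a processor ID to the role listed in the table above."""
    if not 0 <= pid <= 0xFFFF:
        raise ValueError("processor ID must fit in 16 bits (assumed width)")
    if pid == 0x0000:
        return "startup"
    if pid == 0x0001:
        return "master"
    if pid == 0xFFFF:
        return "broadcast"
    return "slave"  # 0002-FFFE
```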
 
Routing:
           |             |               |
           |             |               |
           |             |               |
           |             |               |
-----------1-------------2---------------3------------
           |             |               |
           |             |               |
           |             |               |
           |             |               |
-----------4-------------5 NO CONNECTION 6------------
           |             |               |
           |             |               |
           |             |               |
           |             |               |
-----------7-------------8---------------9------------
           |             |               |
           |             |               |
           |             |               |
           |             |               |
 
Each router automatically keeps track of processor IDs and their routing keys,
and each router tries to route to a specified processor by the best available path.
When a processor is assigned a processor ID, it automatically tells the router its
ID, and from then on the router builds routing key tables as data transfers occur.
Routers also buffer memory transfers and cache for their own memory banks.
The routing processors must be capable of sustaining 1 memory read/write to
each processor every clock cycle. Instructions will have a small special
buffer so that small loops can run without any memory access penalty.
(That is, loops not implemented with Internal VLIW Branching.)
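
To make the routing diagram concrete, the Python sketch below models the
nine numbered nodes as a graph, leaves out the 5-6 link marked NO
CONNECTION, and finds the shortest hop count with a breadth-first
search. It ignores the lines that run off the edge of the diagram, and
it returns 0 for an unknown node in the spirit of PROCESSOR.PING. A real
router would learn its tables from traffic rather than from a global
map, so this only illustrates the topology. The names LINKS, build_graph
and hop_count are hypothetical.

```python
from collections import deque

# Links read off the diagram; the 5-6 link is absent.
LINKS = [
    (1, 2), (2, 3), (4, 5), (7, 8), (8, 9),          # horizontal
    (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (6, 9),  # vertical
]


def build_graph(links):
    """Build an undirected adjacency map from a list of links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_count(graph, src, dst):
    """Shortest number of hops from src to dst, or 0 if dst is unreachable.

    Note that src == dst also yields 0 in this sketch.
    """
    if src not in graph or dst not in graph:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in sorted(graph[node]):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return 0
```

With the 5-6 link missing, traffic from node 5 to node 6 has to detour
through the top or bottom row.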
 
I/O Interfacing:
There are memory-based I/O chips connected to the memory routing network.
They are able to throw interrupts by signalling processors through the MP
Bus that a service request is pending.
 
CPU Bus Interface:
Consists of the MP Bus interface, which connects to the microkernel RISC
processor, and the memory I/O interface, which is 128 bits wide and carries
data transfers.
 
#######################################################################
#######################################################################
MicroKernel Support Processor's ISA (small RISC core) -- Incomplete
This microprocessor runs part of the OS and manages the MP Bus.
 
Arithmetic:
ADD
SUB
SHR
SHL
ROR
ROL
RND // Random Number Generator
Arguments: reg,reg,reg
Arguments: reg,reg,imm16
 
Logic:
OR
AND
XOR
NOT
Arguments: reg,reg,reg
 
Memory:
LB/LW/LD(S)
SB/SW/SD
Arguments: reg,[reg+imm16]
 
Branch:
BEQ(L)
BNE(L)
BZ(L)
BNZ(L)
BC(L)
BNC(L)
J(L)
JR(L)
 
Interrupts/Special:
NOP // No Operation
 
MP(MultiProcessing) Interconnect:
MPIREAD // Read Buffer
MPIWRITE // Write Buffer
MPIREQ? // Branch on Request Pending
 
Threads:
TSREQ? // Branch on Thread Service Request
 
Local Processor Manipulation:
PSTOP // Processor
PSTART
TSTOP reg(threadid) // Thread
TSTART reg(threadid)

Line No.	Rev	Author	Line
1	2	alikat	`HyperMTA Processor Specifications`