# Code generation for architectures with ILP

by zzeng on 2002-05-29
source: http://www.geocities.com/creta_ru/doc/VLIW-draft-eng.pdf
```
 1. procedure explore_vertex (the current vertex, the context - the list of all ready inputs)
 2. begin
 3.   while def[the current vertex, I] != null
 4.     find the block connected with the data through DU
 5.     notify this block that the appropriate input is ready; mark the input in the context
 6.     if this correlative block has all inputs ready
 7.       insert into the GX a new vertex - this block;
 8.       insert into the GX a new arc - the current vertex -> the given block;
 9.       call explore_vertex (the correlative block, the context);
10.   while there is a block with all inputs ready
11.     insert into the GX a new vertex - this block;
12.     insert into the GX a new arc - the current vertex -> the given block;
13.     call explore_vertex (the found block, the context);
14. end;
15. insert into the GX the vertex __input__;
16. call explore_vertex (__input__, an empty context);
```
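The procedure above can be sketched concretely. In the following Python fragment (our own formulation, not part of the original; names like `du_predecessors` are assumptions), a GX vertex is the set of blocks already coded, and an arc appends one block whose inputs are all ready:

```python
# Sketch: build the expansion graph GX over a data-flow graph G.
# A GX vertex is the frozenset of blocks already coded; an arc adds one
# block whose inputs (DU predecessors) are all already available.

def build_gx(du_predecessors):
    """du_predecessors: block -> set of blocks whose outputs it consumes."""
    all_blocks = set(du_predecessors)
    start = frozenset()                      # plays the role of __input__
    vertices, arcs = {start}, []
    stack = [start]
    while stack:
        done = stack.pop()
        for block in all_blocks - done:
            if du_predecessors[block] <= done:       # all inputs ready
                nxt = done | {block}
                arcs.append((done, block, nxt))
                if nxt not in vertices:
                    vertices.add(nxt)
                    stack.append(nxt)
    return vertices, arcs

# The eight-number summation used in the text: four adds, two partial
# sums (adds1, adds2), and one last sum whose name ("final") is ours.
deps = {"add1": set(), "add2": set(), "add3": set(), "add4": set(),
        "adds1": {"add1", "add2"}, "adds2": {"add3", "add4"},
        "final": {"adds1", "adds2"}}
```

Even for this seven-block example, GX has 26 vertexes against 7 blocks in G, which illustrates the exponential growth discussed next.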
In fact, each time we add a vertex (a block) to the graph GX, we recursively add to it all blocks that have become ready. The number of vertexes of GX is exponential in the size of the initial graph G. In such a graph, any path from the input to the output represents some order of code generation, and our task is to find the shortest path, which can be done by greedy algorithms, for example Dijkstra's algorithm.

The most important problem here is the selection of weights for the arcs of GX. In the simplest variant, where the weight of an arc equals the change in the current length of the code in ticks, the search for the shortest path degenerates into a trivial enumeration of most of the variants: intermediate path lengths are distributed very densely, so even the presence of rough instruction conflicts will not push a variant beyond the limits of consideration. With a well-chosen system of weights, it is possible to reach the finish having touched only a small part of the possible variants. The following variant is proposed for consideration:

1. the weight of an arc (i.e. the cost of adding the code of some block to the code already generated) is determined dynamically, directly at the moment when we try to add it;
2. path length is not additive but multiplicative: we do not add but multiply the lengths of arcs. If we still follow Dijkstra's algorithm, the length of a path is extended by multiplying its current length by some penalty coefficient of coding the new block.

It is necessary to mention another problem which, fortunately, has a simple solution. When searching for the shortest path in GX, we can face a choice between absolutely identical continuations. For example, in the above-mentioned summation, at the very beginning we have the opportunity to code any of the blocks add1, add2, add3, add4; due to the symmetry of the graph, these variants are absolutely equivalent.
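Before continuing with the symmetry problem, point 2 above can be made concrete. If every penalty coefficient is at least 1, the multiplicative length never shrinks along a path, so ordinary Dijkstra still applies; working in log space turns the products back into sums. A minimal sketch under these assumptions (the graph and coefficients below are invented for illustration):

```python
import heapq
import math

def dijkstra_multiplicative(arcs, start, goal):
    """Shortest path where the length of a path is the PRODUCT of its
    arc coefficients (each assumed >= 1, so lengths never decrease).
    arcs: vertex -> list of (neighbor, penalty_coefficient)."""
    best = {start: 0.0}                  # best known log-length per vertex
    heap = [(0.0, start)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == goal:
            return math.exp(d)           # back from log space to a product
        if d > best.get(v, math.inf):
            continue                     # stale heap entry
        for w, coeff in arcs.get(v, ()):
            nd = d + math.log(coeff)     # multiplying = adding logarithms
            if nd < best.get(w, math.inf):
                best[w] = nd
                heapq.heappush(heap, (nd, w))
    return math.inf
```

The point of the transform is that the standard non-negative-weight precondition of Dijkstra translates into "all penalty coefficients are at least 1".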
Similarly, adds1 and adds2 are equivalent. By removing such trivial branchings, we could reduce the number of variants at least by a factor of 4*3*2*2. The solution seems simple enough: to each vertex of the graph of the algorithm we assign some metric obtained from the description of the paths reachable from that vertex, computed in a stable way (here we mean stability of the metric under isomorphic transformations of the graph G). Then, analyzing the neighbors of any vertex, we expand only those branches that lead to vertexes with different metrics.

So far, we have treated basic blocks as given: their contents did not interest us, we only assumed they were correct and reasonably optimal. But since we are going to charge penalties for coding blocks, it is time to look closely at their contents. Obviously, blocks consist of instructions. It is no less obvious that these instructions cannot be static, because it is impossible to provide blocks for all occasions, and nobody can make a programmer operate with millions of blocks. So the contents of a block is a code pattern with a set of parameters. Parameters can be used, for example, to set the actual numbers of registers; the compiler's duty would then consist in attaching blocks to one another through their parameters. This variant is simple enough but not very flexible, so, just to add variety, we will split the concept of the block in two:

1. the type of the block - in the above example of summation, all blocks were of the same type. A definition of a type must include:
   - a name; using these names, the programmer builds the graph of the algorithm
   - a list of inputs of the block - just names, needed to define DU
   - a list of outputs of the block - similarly, just names needed to define DU; some outputs can branch
2.
the implementation of the block - a variant of its coding. It contains:
   - the name of the type of the block
   - a list of parameters (variables) which the compiler must define before coding the given implementation
   - a list of inputs whose names equal the names of the inputs of the type, each with a definition of the (optionally parameterized) place where the data for that input is expected to be found; for example, given a parameter x and a pool of registers A, such a place can be A[x]
   - a similar list of outputs, with the names from the description of the type and with definitions of the places where the results of executing the implementation's code will appear
   - the code itself - a list of instructions with optionally parameterized arguments; all instructions are written as consecutive, the compiler making them parallel where possible

In other words, when defining the graph of the algorithm the programmer operates with types of blocks, whereas the compiler, when generating code, tries to substitute every available implementation of a block of the given type. For example, a block dividing two variables could have the following implementations (it's a joke):

- hardware division (if it is available)
- Newton-Raphson approximation
- taking the logarithm / subtraction / taking the antilogarithm

It should be noted that the graph GX must now be redefined. In both places (lines 7 and 11), instead of

    insert into GX a vertex - this block;
    insert into GX an arc - the current vertex -> the given block;
    call explore_vertex (the found block, the context);

there must be:

    while implementation of the block != null
      insert into GX a vertex - this implementation;
      insert into GX an arc - the current vertex -> the given implementation;
      call explore_vertex (the found implementation, the context);

The resulting graph will be called GXX.

## 6. Trial realization

First, we have to create a description of the target machine architecture that is as universal as possible.
First of all, we define the places where data can reside. They are:

- registers
- memory block(s) - for example, if memory is spliced
- devices of the processor - all possible adders, multipliers, memory paths...
- special memory - for example, ports of the periphery, cache memory of all levels

Note that such places can be both scalar (ports, devices of the processor) and vector (memory).
```
pool {
  comment="Unit_L1 First memory gateway";
  name="L1"; range={0-0};
};
pool {
  comment="Unit_L2 Second memory gateway";
  name="L2"; range={0-0};
};
pool {
  comment="Unit_S1 First summator";
  name="S1"; range={0-0};
};
pool {
  comment="Unit_S2 Second summator";
  name="S2"; range={0-0};
};
pool {
  comment="Register_pool_A";
  name="A"; range={0-15};
};
pool {
  comment="Register_pool_B";
  name="B"; range={0-15};
};
pool {
  comment="Memory_pool";
  name="MEM";
  range={0-100000000};
};
```
The attributes of the pool tag are straightforward: name is an identifier, comment is a comment, and range gives the bounds of the allowable interval of the vector; if the bounds are equal, the pool is a scalar.

Now our task is to describe the atomic operations of the processor.
```
cop {
  time=7;
  inargs={ A = "1111111" };
  outargs={ A = "1111111" };
  units={ S1 = "1111111" };
};
cop {
  time=7;
  inargs={ B = "1111111" };
  outargs={ B = "1111111" };
  units={ S2 = "1111111" };
};
cop {
  time=9;
  inargs={ B = "111111111" };
  outargs={ A = "111111111" };
  units={ S1 = "111111111" };
};
cop {
  time=9;
  inargs={ A = "111111111" };
  outargs={ B = "111111111" };
  units={ S2 = "111111111" };
};
cop {
  time=9;
  inargs={ MEM = "111111111" };
  outargs={ A = "111111111" };
  units={ L1 = "111111111" };
};
cop {
  time=9;
  inargs={ MEM = "111111111" };
  outargs={ B = "111111111" };
  units={ L2 = "111111111" };
};
cop {
  name="STORE";
  comment="memory word store via L1";
  time=9;
  inargs={ A = "111111111" };
  outargs={ MEM = "111111111" };
  units={ L1 = "111111111" };
};
cop {
  name="STORE";
  comment="memory word store via L2";
  time=9;
  inargs={ B = "111111111" };
  outargs={ MEM = "111111111" };
  units={ L2 = "111111111" };
};
cop {
  name="MOV";
  comment="register copying in A";
  time=1;
  inargs={ A = "1" };
  outargs={ A = "1" };
};
cop {
  name="MOV";
  comment="register copying in B";
  time=1;
  inargs={ B = "1" };
  outargs={ B = "1" };
};
cop {
  name="MOV";
  comment="register copying from A to B";
  time=2;
  inargs={ A = "11" };
  outargs={ B = "11" };
};
cop {
  name="MOV";
  comment="register copying from B to A";
  time=2;
  inargs={ B = "11" };
  outargs={ A = "11" };
};
```
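The 0/1 strings in these descriptions mark, tick by tick, when an argument's pool or a device is busy, which is exactly what a packer needs to decide whether two instructions may overlap. A hypothetical sketch of such a conflict test (our own formulation; for simplicity it treats a whole pool as a single resource, whereas a real packer would resolve concrete cell indices):

```python
def occupancy(cop, offset):
    """Expand a cop's busy strings into (resource, absolute tick) pairs
    for an instruction issued at tick `offset`. `cop` mirrors the cop
    tag: {"inargs": {...}, "outargs": {...}, "units": {...}}."""
    busy = set()
    for group in ("inargs", "outargs", "units"):
        for resource, mask in cop.get(group, {}).items():
            for tick, bit in enumerate(mask):
                if bit == "1":
                    busy.add((resource, offset + tick))
    return busy

def conflicts(cop_a, off_a, cop_b, off_b):
    """True if the two issued instructions contend for any resource-tick."""
    return bool(occupancy(cop_a, off_a) & occupancy(cop_b, off_b))
```

Under this test, the first two cops above (using S1/A and S2/B) can be issued in the same tick, while two instances of the first cop must be issued at least 7 ticks apart.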
Attributes of the cop tag: name is an identifier, comment is a comment, and time is the formal time in clock ticks required for execution. The inargs, outargs and units attributes each describe an argument of the instruction. The difference between them is that an argument described as a unit does not have to be referenced in the code, while an outargs argument, in contrast to an inargs one, keeps a cell occupied: the cell cannot be reused by the compiler within the current block without being explicitly referenced. A description of an argument is a name=value pair, where the name corresponds to the name of a pool tag and the value is a 0/1 bit string whose length equals the time field; a 1 means that the given device is busy at the given tick (for simplicity, we assume for now that all devices are busy during the entire execution of the instruction).

Now for the most interesting thing, the basic blocks. The types:
```
block {
  comment="plus";
  name="plus";
  input {left, right};
  output {out};
};
block {
  comment="bind";
  name="bind";
  input {in};
  output {out};
};
```
and the implementations:
```
implementation {
  comment="plus";
  name="plus";
  var=y,x,z;
  input {
    left=A(x);
    right=A(y);
  };
  output {
    out=A(z);
  };
  code {
    MOV A(y) A(z);
  };
};
implementation {
  comment="plus";
  name="plus";
  var=y,x,z;
  input {
    left=B(x);
    right=B(y);
  };
  output {
    out=B(z);
  };
  code {
    MOV B(y) B(z);
  };
};
implementation {
  comment="bind";
  name="bind";
  var=y,x;
  input {
    in=A(x);
  };
  output {
    out=A(y);
  };
  code {
    MOV A(x) A(y);
  };
};
implementation {
  comment="bind";
  name="bind";
  var=y,x;
  input {
    in=B(x);
  };
  output {
    out=B(y);
  };
  code {
    MOV B(x) B(y);
  };
};
implementation {
  comment="bind";
  name="bind";
  var=y,x;
  input {
    in=A(x);
  };
  output {
    out=B(y);
  };
  code {
    MOV A(x) B(y);
  };
};
implementation {
  comment="bind";
  name="bind";
  var=y,x;
  input {
    in=B(x);
  };
  output {
    out=A(y);
  };
  code {
    MOV B(x) A(y);
  };
};
implementation {
  comment="bind";
  name="bind";
  var=y,x;
  input {
    in=MEM(x);
  };
  output {
    out=A(y);
  };
};
implementation {
  comment="bind";
  name="bind";
  var=y,x;
  input {
    in=MEM(x);
  };
  output {
    out=B(y);
  };
};
implementation {
  comment="bind";
  name="bind";
  var=y,x;
  input {
    in=B(x);
  };
  output {
    out=MEM(y);
  };
  code {
    STORE B(x) MEM(y);
  };
};
implementation {
  comment="bind";
  name="bind";
  var=y,x;
  input {
    in=A(x);
  };
  output {
    out=MEM(y);
  };
  code {
    STORE A(x) MEM(y);
  };
};
```
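To illustrate the substitution step: an implementation of a type is applicable only if each of its declared input places matches the pool where that input's data currently resides. A rough sketch with our own ad-hoc structures (not the compiler's actual representation):

```python
def applicable(implementations, block_type, data_places):
    """Filter implementations of `block_type` to those whose declared
    input pools match where each input's data currently resides.
    data_places: input name -> pool name, e.g. {"in": "MEM"}."""
    found = []
    for impl in implementations:
        if impl["type"] != block_type:
            continue
        if all(impl["inputs"].get(name) == pool
               for name, pool in data_places.items()):
            found.append(impl)
    return found

# Mirrors two of the "bind" implementations listed above.
impls = [
    {"type": "bind", "inputs": {"in": "A"},   "outputs": {"out": "B"}},
    {"type": "bind", "inputs": {"in": "MEM"}, "outputs": {"out": "A"}},
]
```

With the data sitting in MEM, only the second variant survives the filter; each surviving variant then becomes its own vertex of GXX.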
The attributes name and comment keep the meanings they had earlier. Note that the attribute name is no longer unique, and this is what makes branching possible. Externally, a block is a black box with a set of inputs and outputs. Both inputs (the input tag) and outputs (the output tag) are data cells, which means internal parameters are needed. A parameter (the var tag) is an index of a cell of one or another pool; it can be associated not only with the inputs/outputs but also used in the definition of the internal life of the block. By the moment an implementation of a block is coded, all parameters for its inputs must be defined. However, if extra parameters, beyond those for the inputs, are defined, this becomes an additional place of branching.

In our simple task only one block has been used (the adder), although there is also a set of blocks with implicitly predefined names. For example, "BIND" blocks are meant for moving data from the output of one block to the input of another, and a "JUMP" block arises when there is a need to merge branches of code born by the branching of a block's output (in the given example there are no such outputs). Our case is the simplest one, but even here there are many variants of transfers. In reality there should be many more variants, including double-threaded ones, etc. The number of variants will not have a significant effect on the speed of the search, since short variants are processed first; long variants are processed only when resources are idle and when, for some reason, such a path is cheaper.

It is necessary to pay special attention to the case of non-homogeneous memory. We have already said that processors sometimes have spliced memory. Sometimes memory pools have different access times because of their physical organization (ROM, FLASH...), and sometimes it is allowed to take addresses of the cache memory; all these situations are easy to describe in our terms. Cache memory in the case of a VLIW processor deserves a separate discussion, which we will come to later.

Now it is time to describe the algorithm.
```
item {
  block=plus;
};
item {
  block=plus;
};
item {
  block=plus;
};
item {
  block=plus;
};
item {
  block=plus;
};
item {
  block=plus;
};
item {
  block=plus;
};
```
There are seven blocks: the first four sum our eight numbers in pairs, and the others sum the intermediate results. Note that the name attribute is again a unique identifier, while the block attribute refers to the group (type) identifier of the block. Now it is the turn of the inter-block connections.
```
link {
  from=in_1;
  to=...;
};
...
link {
  from=...;
  to=out_1;
};
```
where the "from" attribute refers to a particular output of a particular block, and the "to" attribute to a particular input of a particular block. Some of these connections refer to the inputs of the algorithm:
```
input {
  in_1=MEM(0);
  in_2=MEM(1);
  in_3=MEM(2);
  in_4=MEM(3);
  in_5=MEM(4);
  in_6=MEM(5);
  in_7=MEM(6);
  in_8=MEM(7);
};
```
and to the outputs:
```
output {
  out_1=MEM(1);
};
```
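To tie the running example together: the eight-number summation has three levels of additions, each addition occupies a summator for 7 ticks (the time of the first two cops), and there are two summators, S1 and S2. A rough greedy list-scheduling sketch (our own illustration, not the article's search algorithm; the block names below follow the text, except "final", which is ours) shows the resulting code lengths:

```python
def list_schedule(deps, latency, n_units):
    """Greedy list scheduling: repeatedly start the first ready op on
    the unit that lets it begin earliest. deps: op -> set of ops whose
    results it needs; every op takes `latency` ticks on one unit."""
    finish = {}                         # op -> tick when its result is ready
    units_free_at = [0] * n_units
    remaining = set(deps)
    while remaining:
        # ops whose dependencies have all been scheduled already
        ready = sorted(op for op in remaining if deps[op] <= set(finish))
        op = ready[0]
        earliest = max([finish[d] for d in deps[op]], default=0)
        u = min(range(n_units), key=lambda i: max(units_free_at[i], earliest))
        start = max(units_free_at[u], earliest)
        finish[op] = start + latency
        units_free_at[u] = finish[op]
        remaining.remove(op)
    return max(finish.values())         # total length of the schedule

deps = {"add1": set(), "add2": set(), "add3": set(), "add4": set(),
        "adds1": {"add1", "add2"}, "adds2": {"add3", "add4"},
        "final": {"adds1", "adds2"}}
```

With two summators, the greedy schedule finishes in 28 ticks (the four adds in two waves, then the two partial sums, then the final one); with four summators it would drop to the 21-tick critical path. Loads and stores via L1/L2 are ignored in this sketch.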