@c Copyright (c) 2006, 2007, 2008 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@c ---------------------------------------------------------------------
@c Loop Representation
@c ---------------------------------------------------------------------

@node Loop Analysis and Representation
@chapter Analysis and Representation of Loops

GCC provides extensive infrastructure for working with natural loops,
i.e., strongly connected components of the CFG with only one entry
block.  This chapter describes the representation of loops in GCC, both
on GIMPLE and on RTL, as well as the interfaces to loop-related analyses
(induction variable analysis and number of iterations analysis).

@menu
* Loop representation::         Representation and analysis of loops.
* Loop querying::               Getting information about loops.
* Loop manipulation::           Loop manipulation functions.
* LCSSA::                       Loop-closed SSA form.
* Scalar evolutions::           Induction variables on GIMPLE.
* loop-iv::                     Induction variables on RTL.
* Number of iterations::        Number of iterations analysis.
* Dependency analysis::         Data dependency analysis.
* Lambda::                      Linear loop transformations framework.
* Omega::                       A solver for linear programming problems.
@end menu

@node Loop representation
@section Loop representation
@cindex Loop representation
@cindex Loop analysis

This section describes the representation of loops in GCC, and the
functions that can be used to build, modify and analyze this
representation.  Most of the interfaces and data structures are declared
in @file{cfgloop.h}.  At the moment, loop structures are analyzed and
this information is updated only by the optimization passes that deal
with loops, but some efforts are being made to make it available
throughout most of the optimization passes.

In general, a natural loop has one entry block (header) and possibly
several back edges (latches) leading to the header from the inside of
the loop.  Loops with several latches may appear if several loops share
a single header, or if there is a branching in the middle of the loop.
The representation of loops in GCC however allows only loops with a
single latch.  During loop analysis, headers of such loops are split and
forwarder blocks are created in order to disambiguate their structures.
A heuristic based on profile information and the structure of the
induction variables in the loops is used to determine whether the
latches correspond to sub-loops or to control flow in a single loop.
This means that the analysis sometimes changes the CFG, and if you run
it in the middle of an optimization pass, you must be able to deal with
the new blocks.  You may avoid CFG changes by passing the
@code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} flag to the loop discovery; note,
however, that most other loop manipulation functions will not work
correctly for loops with multiple latch edges (the functions that only
query membership of blocks to loops and subloop relationships, or
enumerate and test loop exits, can be expected to work).

The body of a loop is the set of blocks that are dominated by its header
and reachable from its latch against the direction of edges in the
CFG@.  The loops are organized in a containment hierarchy (tree) such
that all the loops immediately contained inside loop L are the children
of L in the tree.  This tree is represented by the @code{struct loops}
structure.  The root of this tree is a fake loop that contains all
blocks in the function.  Each of the loops is represented in a
@code{struct loop} structure.  Each loop is assigned an index (the
@code{num} field of the @code{struct loop} structure), and the pointer
to the loop is stored in the corresponding field of the @code{larray}
vector in the loops structure.  The indices do not have to be
contiguous; there may be empty (@code{NULL}) entries in the
@code{larray} created by deleting loops.  Also, there is no guarantee on
the relative order of a loop and its subloops in the numbering.  The
index of a loop never changes.

The entries of the @code{larray} field should not be accessed directly.
The function @code{get_loop} returns the loop description for a loop
with the given index.  The @code{number_of_loops} function returns the
number of loops in the function.  To traverse all loops, use the
@code{FOR_EACH_LOOP} macro.  The @code{flags} argument of the macro is
used to determine the direction of traversal and the set of loops
visited.  Each loop is guaranteed to be visited exactly once, regardless
of the changes to the loop tree, and the loops may be removed during the
traversal.  The newly created loops are never traversed; if they need to
be visited, this must be done separately after their creation.  The
@code{FOR_EACH_LOOP} macro allocates temporary variables.  If the
@code{FOR_EACH_LOOP} loop were ended using @code{break} or @code{goto},
they would not be released; the @code{FOR_EACH_LOOP_BREAK} macro must be
used instead.

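For illustration, a minimal traversal might look like the following
sketch (based on the declarations in @file{cfgloop.h}; the predicates
are hypothetical, and the iterator details may differ between releases):

@smallexample
loop_iterator li;
struct loop *loop;

/* Visit every loop in the function, innermost loops first.  */
FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
  @{
    if (!interesting_loop_p (loop))    /* hypothetical predicate */
      continue;

    if (found_what_we_need_p (loop))   /* hypothetical predicate */
      /* Exiting early with a plain break would leak the iterator's
         temporaries; this macro releases them and breaks.  */
      FOR_EACH_LOOP_BREAK (li);
  @}
@end smallexample
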
Each basic block contains a reference to the innermost loop it belongs
to (@code{loop_father}).  For this reason, it is only possible to have
one @code{struct loops} structure initialized at the same time for each
CFG@.  The global variable @code{current_loops} contains the
@code{struct loops} structure.  Many of the loop manipulation functions
assume that dominance information is up-to-date.

The loops are analyzed through the @code{loop_optimizer_init} function.
The argument of this function is a set of flags represented in an
integer bitmask.  These flags specify what other properties of the loop
structures should be calculated/enforced and preserved later:

@itemize
@item @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES}: If this flag is set, no
changes to the CFG will be performed in the loop analysis; in
particular, loops with multiple latch edges will not be disambiguated.
If a loop has multiple latches, its latch block is set to
@code{NULL}@.  Most of the loop manipulation functions will not work for
loops in this shape.  No other flags that require CFG changes can be
passed to @code{loop_optimizer_init}.
@item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
a way that each loop has only one entry edge, and additionally, the
source block of this entry edge has only one successor.  This creates a
natural place where the code can be moved out of the loop, and ensures
that the entry edge of the loop leads from its immediate super-loop.
@item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
force the latch block of each loop to have only one successor.  This
ensures that the latch of the loop does not belong to any of its
sub-loops, and makes manipulation of the loops significantly easier.
Most of the loop manipulation functions assume that the loops are in
this shape.  Note that with this flag, the ``normal'' loop without any
control flow inside and with one exit consists of two basic blocks.
@item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
edges in the strongly connected components that are not natural loops
(have more than one entry block) are marked with the
@code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
flag is not set for blocks and edges that belong to natural loops that
are in such an irreducible region (but it is set for the entry and exit
edges of such a loop, if they lead to/from this region).
@item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
and updated for each loop.  This makes some functions (e.g.,
@code{get_loop_exit_edges}) more efficient.  Some functions (e.g.,
@code{single_exit}) can be used only if the lists of exits are
recorded.
@end itemize

These properties may also be computed/enforced later, using the
functions @code{create_preheaders}, @code{force_single_succ_latches},
@code{mark_irreducible_loops} and @code{record_loop_exits}.

The memory occupied by the loops structures should be freed with the
@code{loop_optimizer_finalize} function.

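A pass that uses the loop infrastructure therefore typically brackets
its work as follows (a minimal sketch; @code{LOOPS_NORMAL} is the
conventional combination of the preheader, simple-latch and
irreducible-region flags, and the pass and helper names are
hypothetical):

@smallexample
static unsigned int
my_loop_pass (void)
@{
  loop_iterator li;
  struct loop *loop;

  /* Build the loop tree and enforce the requested properties.  */
  loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);

  FOR_EACH_LOOP (li, loop, 0)
    process_loop (loop);        /* hypothetical per-loop work */

  /* Free the loop structures again.  */
  loop_optimizer_finalize ();
  return 0;
@}
@end smallexample
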
The CFG manipulation functions in general do not update loop structures.
Specialized versions that additionally do so are provided for the most
common tasks.  On GIMPLE, the @code{cleanup_tree_cfg_loop} function can
be used to clean up the CFG while updating the loops structures if
@code{current_loops} is set.

@node Loop querying
@section Loop querying
@cindex Loop querying

The functions to query the information about loops are declared in
@file{cfgloop.h}.  Some of the information can be taken directly from
the structures.  The @code{loop_father} field of each basic block
contains the innermost loop to which the block belongs.  The most useful
fields of the loop structure (that are kept up-to-date at all times)
are:

@itemize
@item @code{header}, @code{latch}: Header and latch basic blocks of the
loop.
@item @code{num_nodes}: Number of basic blocks in the loop (including
the basic blocks of the sub-loops).
@item @code{depth}: The depth of the loop in the loops tree, i.e., the
number of super-loops of the loop.
@item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
sub-loop, and the sibling of the loop in the loops tree.
@end itemize

There are other fields in the loop structures, many of them used only by
some of the passes, or not updated during CFG changes; in general, they
should not be accessed directly.

The most important functions to query loop structures are:

@itemize
@item @code{flow_loops_dump}: Dumps the information about loops to a
file.
@item @code{verify_loop_structure}: Checks consistency of the loop
structures.
@item @code{loop_latch_edge}: Returns the latch edge of a loop.
@item @code{loop_preheader_edge}: If loops have preheaders, returns
the preheader edge of a loop.
@item @code{flow_loop_nested_p}: Tests whether a loop is a sub-loop of
another loop.
@item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
to a loop (including its sub-loops).
@item @code{find_common_loop}: Finds the common super-loop of two loops.
@item @code{superloop_at_depth}: Returns the super-loop of a loop with
the given depth.
@item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
number of insns in the loop, on GIMPLE and on RTL.
@item @code{loop_exit_edge_p}: Tests whether an edge is an exit from a
loop.
@item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
with the @code{EDGE_LOOP_EXIT} flag.
@item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
@code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
loop in depth-first search order in reversed CFG, ordered by dominance
relation, and breadth-first search order, respectively.
@item @code{single_exit}: Returns the single exit edge of the loop, or
@code{NULL} if the loop has more than one exit.  You can only use this
function if the @code{LOOPS_HAVE_RECORDED_EXITS} property is used.
@item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
@item @code{just_once_each_iteration_p}: Returns true if the basic block
is executed exactly once during each iteration of a loop (that is, it
does not belong to a sub-loop, and it dominates the latch of the loop).
@end itemize

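For instance, given a @code{struct loop *loop}, the following sketch
walks its body and checks which blocks execute in every iteration
(@code{get_loop_body} returns a freshly allocated array of
@code{loop->num_nodes} blocks that the caller must @code{free};
@code{examine_block} is hypothetical):

@smallexample
basic_block *body = get_loop_body (loop);
unsigned i;

for (i = 0; i < loop->num_nodes; i++)
  @{
    basic_block bb = body[i];

    /* Blocks of sub-loops are included in the body as well.  */
    if (bb->loop_father != loop)
      continue;

    /* A block executed in every iteration dominates the latch.  */
    if (just_once_each_iteration_p (loop, bb))
      examine_block (bb);
  @}

free (body);
@end smallexample
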
@node Loop manipulation
@section Loop manipulation
@cindex Loop manipulation

The loops tree can be manipulated using the following functions:

@itemize
@item @code{flow_loop_tree_node_add}: Adds a node to the tree.
@item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
@item @code{add_bb_to_loop}: Adds a basic block to a loop.
@item @code{remove_bb_from_loops}: Removes a basic block from loops.
@end itemize

Most low-level CFG functions update loops automatically.  The following
functions handle some more complicated cases of CFG manipulations:

@itemize
@item @code{remove_path}: Removes an edge and all blocks it dominates.
@item @code{split_loop_exit_edge}: Splits an exit edge of the loop,
ensuring that PHI node arguments remain in the loop (this ensures that
loop-closed SSA form is preserved).  Only useful on GIMPLE.
@end itemize

Finally, there are some higher-level loop transformations implemented.
While some of them are written so that they should work on non-innermost
loops, they are mostly untested in that case, and at the moment, they
are only reliable for the innermost loops:

@itemize
@item @code{create_iv}: Creates a new induction variable.  Only works on
GIMPLE@.  @code{standard_iv_increment_position} can be used to find a
suitable place for the iv increment (see the sketch after this list).
@item @code{duplicate_loop_to_header_edge},
@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
on GIMPLE) duplicate the body of the loop a prescribed number of times
on one of the edges entering the loop header, thus performing either
loop unrolling or loop peeling.  @code{can_duplicate_loop_p}
(@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
loop.
@item @code{loop_version}, @code{tree_ssa_loop_version}: These functions
create a copy of a loop, and a branch before the two copies that selects
one of them depending on the prescribed condition.  This is useful for
optimizations that need to verify some assumptions at run time (one of
the copies of the loop is usually left unchanged, while the other one is
transformed in some way).
@item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
extra iterations to make the number of iterations divisible by the
unroll factor, updating the exit condition, and removing the exits that
now cannot be taken.  Works only on GIMPLE.
@end itemize

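The following sketch illustrates @code{create_iv}.  It is only an
illustration; the statement-iterator type and the exact signature should
be checked against the sources of the release at hand:

@smallexample
block_stmt_iterator incr_pos;
bool insert_after;
tree var_before, var_after;

/* Find the canonical place for the increment of the new iv.  */
standard_iv_increment_position (loop, &incr_pos, &insert_after);

/* Create an iv running from 0 with step 1; var_before holds its
   value at the start of an iteration, var_after the incremented
   value.  */
create_iv (build_int_cst (unsigned_type_node, 0),
           build_int_cst (unsigned_type_node, 1),
           NULL_TREE, loop, &incr_pos, insert_after,
           &var_before, &var_after);
@end smallexample
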
@node LCSSA
@section Loop-closed SSA form
@cindex LCSSA
@cindex Loop-closed SSA form

Throughout the loop optimizations on tree level, one extra condition is
enforced on the SSA form: no SSA name is used outside of the loop in
which it is defined.  The SSA form satisfying this condition is called
``loop-closed SSA form'' -- LCSSA@.  To enforce LCSSA, PHI nodes must be
created at the exits of the loops for the SSA names that are used
outside of them.  Only the real operands (not virtual SSA names) are
held in LCSSA, in order to save memory.

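For example, if a sum computed inside a loop is used after the loop, a
single-argument PHI node is created at the exit (a schematic SSA
fragment with made-up names):

@smallexample
/* inside the loop */
sum_3 = sum_1 + a_2;

/* PHI node created at the loop exit to close the SSA form */
sum_4 = phi (sum_3);

/* statements after the loop use sum_4, never sum_3 directly */
result_5 = sum_4 * 2;
@end smallexample
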
There are various benefits of LCSSA:

@itemize
@item Many optimizations (value range analysis, final value
replacement) are interested in the values that are defined in the loop
and used outside of it, i.e., exactly those for which we create new PHI
nodes.
@item In induction variable analysis, it is not necessary to specify the
loop in which the analysis should be performed -- the scalar evolution
analysis always returns the results with respect to the loop in which
the SSA name is defined.
@item It makes updating of SSA form during loop transformations simpler.
Without LCSSA, operations like loop unrolling may force creation of PHI
nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
updated locally.  However, since we only keep real operands in LCSSA, we
cannot use this advantage (we could have local updating of real
operands, but it is not much more efficient than to use generic SSA form
updating for it as well; the amount of changes to SSA is the same).
@end itemize

However, it also means that LCSSA must be updated.  This is usually
straightforward, unless you create a new value in a loop and use it
outside, or unless you manipulate loop exit edges (functions are
provided to make these manipulations simple).
@code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
LCSSA is preserved.

@node Scalar evolutions
@section Scalar evolutions
@cindex Scalar evolutions
@cindex IV analysis on GIMPLE

Scalar evolutions (SCEV) are used to represent results of induction
variable analysis on GIMPLE@.  They enable us to represent variables
with complicated behavior in a simple and consistent way (we only use it
to express values of polynomial induction variables, but it is possible
to extend it).  The interfaces to SCEV analysis are declared in
@file{tree-scalar-evolution.h}.  To use the scalar evolutions analysis,
@code{scev_initialize} must be called first; to stop using SCEV,
@code{scev_finalize} should be called.  The SCEV analysis caches its
results in order to save time and memory.  This cache, however, is
invalidated by most of the loop transformations, including removal of
code.  If such a transformation is performed, @code{scev_reset} must be
called to clean the caches.

Given an SSA name, its behavior in loops can be analyzed using the
@code{analyze_scalar_evolution} function.  The returned SCEV however
does not have to be fully analyzed and it may contain references to
other SSA names defined in the loop.  To resolve these (potentially
recursive) references, the @code{instantiate_parameters} or
@code{resolve_mixers} functions must be used.
@code{instantiate_parameters} is useful when you use the results of SCEV
only for some analysis, and when you work with a whole nest of loops at
once.  It will try replacing all SSA names by their SCEV in all loops,
including the super-loops of the current loop, thus providing complete
information about the behavior of the variable in the loop nest.
@code{resolve_mixers} is useful if you work with only one loop at a
time, and if you possibly need to create code based on the value of the
induction variable.  It will only resolve the SSA names defined in the
current loop, leaving the SSA names defined outside unchanged, even if
their evolution in the outer loops is known.

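A typical query therefore looks like the following sketch (assuming
SCEV has already been initialized; @code{process_iv} is a hypothetical
consumer, and @code{CHREC_LEFT}/@code{CHREC_RIGHT} are the accessors
for the base and step of a chrec):

@smallexample
tree chrec = analyze_scalar_evolution (loop, name);
chrec = instantiate_parameters (loop, chrec);

if (chrec == chrec_dont_know)
  return;                       /* No usable evolution.  */

if (TREE_CODE (chrec) == POLYNOMIAL_CHREC)
  /* CHREC_LEFT is the base, CHREC_RIGHT the step of the iv.  */
  process_iv (CHREC_LEFT (chrec), CHREC_RIGHT (chrec));
@end smallexample
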
The SCEV is a normal tree expression, except for the fact that it may
contain several special tree nodes.  One of them is
@code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
expressed.  The other one is @code{POLYNOMIAL_CHREC}.  A polynomial
chrec has three arguments -- base, step and loop (both base and step may
contain further polynomial chrecs).  The type of the expression and of
the base and step must be the same.  A variable has evolution
@code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
loop) equivalent to @code{x_1} in the following example:

@smallexample
while (@dots{})
  @{
    x_1 = phi (base, x_2);
    x_2 = x_1 + step;
  @}
@end smallexample

Note that this includes the language restrictions on the operations.
For example, if we compile C code and @code{x} has a signed type, then
overflow in the addition would cause undefined behavior, and we may
assume that this does not happen.  Hence, the value with this SCEV
cannot overflow (which restricts the number of iterations of such a
loop).

In many cases, one wants to restrict the attention just to affine
induction variables.  For these, the extra expressive power of SCEV is
not useful, and may complicate the optimizations.  In this case, the
@code{simple_iv} function may be used to analyze a value -- the result
is a loop-invariant base and step.

@node loop-iv
@section IV analysis on RTL
@cindex IV analysis on RTL

The induction variable analysis on RTL is simpler and only allows
analysis of affine induction variables, in one loop at a time.  The
interface is declared in @file{cfgloop.h}.  Before analyzing induction
variables in a loop L, the @code{iv_analysis_loop_init} function must be
called on L.  After the analysis (possibly calling
@code{iv_analysis_loop_init} for several loops) is finished,
@code{iv_analysis_done} should be called.  The following functions can
be used to access the results of the analysis:

@itemize
@item @code{iv_analyze}: Analyzes a single register used in the given
insn.  If no use of the register in this insn is found, the following
insns are scanned, so that this function can be called on the insn
returned by @code{get_condition}.
@item @code{iv_analyze_result}: Analyzes the result of the assignment in
the given insn.
@item @code{iv_analyze_expr}: Analyzes a more complicated expression.
All its operands are analyzed by @code{iv_analyze}, and hence they must
be used in the specified insn or one of the following insns.
@end itemize

The description of the induction variable is provided in @code{struct
rtx_iv}.  In order to handle subregs, the representation is a bit
complicated; if the value of the @code{extend} field is not
@code{UNKNOWN}, the value of the induction variable in the i-th
iteration is

@smallexample
delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
@end smallexample

with the following exception: if @code{first_special} is true, then the
value in the first iteration (when @code{i} is zero) is @code{delta +
mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
then @code{first_special} must be false, @code{delta} must be 0,
@code{mult} must be 1, and the value in the i-th iteration is

@smallexample
subreg_@{mode@} (base + i * step)
@end smallexample

The function @code{get_iv_value} can be used to perform these
calculations.

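Putting this together, analyzing a register might look like the
following sketch (assuming @code{insn} uses @code{reg} inside
@code{loop}; the value in a given iteration comes back as an RTL
expression):

@smallexample
struct rtx_iv iv;

iv_analysis_loop_init (loop);

if (iv_analyze (insn, reg, &iv))
  @{
    /* Value of the iv in iteration number 5, as an RTL expression.  */
    rtx val = get_iv_value (&iv, GEN_INT (5));
    @dots{}
  @}

iv_analysis_done ();
@end smallexample
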
@node Number of iterations
@section Number of iterations analysis
@cindex Number of iterations analysis

Both on GIMPLE and on RTL, there are functions available to determine
the number of iterations of a loop, with a similar interface.  The
number of iterations of a loop in GCC is defined as the number of
executions of the loop latch.  In many cases, it is not possible to
determine the number of iterations unconditionally -- the determined
number is correct only if some assumptions are satisfied.  The analysis
tries to verify these conditions using the information contained in the
program; if it fails, the conditions are returned together with the
result.  The following information and conditions are provided by the
analysis:

@itemize
@item @code{assumptions}: If this condition is false, the rest of
the information is invalid.
@item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
this condition is true, the loop exits in the first iteration.
@item @code{infinite}: If this condition is true, the loop is infinite.
This condition is only available on RTL@.  On GIMPLE, conditions for
finiteness of the loop are included in @code{assumptions}.
@item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
that gives the number of iterations (i.e., the number of executions of
the loop latch).
@end itemize

Both on GIMPLE and on RTL, it is necessary for the induction variable
analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
On GIMPLE, the results are stored in the @code{struct tree_niter_desc}
structure.  The number of iterations before the loop is exited through a
given exit can be determined using the @code{number_of_iterations_exit}
function.  On RTL, the results are returned in the @code{struct
niter_desc} structure.  The corresponding function is named
@code{check_simple_exit}.  There are also functions that pass through
all the exits of a loop and try to find one whose number of iterations
is easy to determine -- @code{find_loop_niter} on GIMPLE and
@code{find_simple_exit} on RTL@.  Finally, there are functions that
provide the same information, but additionally cache it, so that
repeated queries for the number of iterations are not so costly --
@code{number_of_latch_executions} on GIMPLE and
@code{get_simple_loop_desc} on RTL.

Note that some of these functions may behave slightly differently than
others -- some of them return only the expression for the number of
iterations, and fail if there are some assumptions.  The function
@code{number_of_latch_executions} works only for single-exit loops.
The function @code{number_of_cond_exit_executions} can be used to
determine the number of executions of the exit condition of a
single-exit loop (i.e., the @code{number_of_latch_executions} increased
by one).

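On GIMPLE, a query for a particular exit might look like this sketch
(SCEV must already be initialized; the fields of @code{struct
tree_niter_desc} are the @code{assumptions}, @code{may_be_zero} and
@code{niter} conditions described above, and @code{use_niter} is
hypothetical):

@smallexample
struct tree_niter_desc niter;
edge exit = single_exit (loop);

if (exit && number_of_iterations_exit (loop, exit, &niter, true))
  @{
    if (integer_onep (niter.assumptions)
        && integer_zerop (niter.may_be_zero))
      /* niter.niter is an unconditionally valid expression for the
         number of executions of the latch.  */
      use_niter (niter.niter);
  @}
@end smallexample
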
@node Dependency analysis
@section Data Dependency Analysis
@cindex Data Dependency Analysis

The code for the data dependence analysis can be found in
@file{tree-data-ref.c} and its interface and data structures are
described in @file{tree-data-ref.h}.  The function that computes the
data dependences for all the array and pointer references for a given
loop is @code{compute_data_dependences_for_loop}.  This function is
currently used by the linear loop transform and the vectorization
passes.  Before calling this function, one has to allocate two vectors:
the first vector will contain the set of data references that are
contained in the analyzed loop body, and the second vector will contain
the dependence relations between the data references.  Thus if the
vector of data references is of size @code{n}, the vector containing the
dependence relations will contain @code{n*n} elements.  However, if the
analyzed loop contains side effects, such as calls that potentially can
interfere with the data references in the current analyzed loop, the
analysis stops while scanning the loop body for data references, and
inserts a single @code{chrec_dont_know} in the dependence relation
array.

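A call sequence might look like this sketch (written against the
@code{VEC} interface from @file{vec.h}; the exact signatures should be
checked in @file{tree-data-ref.h}):

@smallexample
VEC (data_reference_p, heap) *datarefs
  = VEC_alloc (data_reference_p, heap, 10);
VEC (ddr_p, heap) *dependence_relations
  = VEC_alloc (ddr_p, heap, 10);

/* Also compute self- and read-read dependences.  */
compute_data_dependences_for_loop (loop, true, &datarefs,
                                   &dependence_relations);

/* @dots{} inspect the relations @dots{} */

free_dependence_relations (dependence_relations);
free_data_refs (datarefs);
@end smallexample
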
The data references are discovered in a particular order during the
scanning of the loop body: the loop body is analyzed in execution order,
and the data references of each statement are pushed at the end of the
data reference array.  Two data references thus occur in the array in
the same order as they syntactically occur in the program.  This
syntactic order is important in some classical data dependence tests,
and mapping this order to the elements of this array avoids costly
queries to the loop body representation.

Three types of data references are currently handled: @code{ARRAY_REF},
@code{INDIRECT_REF} and @code{COMPONENT_REF}@.  The data structure for
the data reference is @code{data_reference}, where
@code{data_reference_p} is the name of a pointer to the data reference
structure.  The structure contains the following elements:

@itemize
@item @code{base_object_info}: Provides information about the base object
of the data reference and its access functions.  These access functions
represent the evolution of the data reference in the loop relative to
its base, in keeping with the classical meaning of the data reference
access function for the support of arrays.  For example, for a reference
@code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
one for each array subscript, are:
@code{@{i_init, +, i_step@}_1, @{j_init, +, j_step@}_2}.

@item @code{first_location_in_loop}: Provides information about the first
location accessed by the data reference in the loop and about the access
function used to represent evolution relative to this location.  This
data is used to support pointers, and is not used for arrays (for which
we have base objects).  Pointer accesses are represented as a
one-dimensional access that starts from the first location accessed in
the loop.  For example:

@smallexample
for1 i
  for2 j
    *((int *)p + i + j) = a[i][j];
@end smallexample

The access function of the pointer access is @code{@{0, +, 4B@}_for2}
relative to @code{p + i}.  The access functions of the array are
@code{@{i_init, +, i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
relative to @code{a}.

Usually, the object the pointer refers to is either unknown, or we can't
prove that the access is confined to the boundaries of a certain object.

Two data references can be compared only if at least one of these two
representations has all its fields filled for both data references.

The current strategy for data dependence tests is as follows: if both
@code{a} and @code{b} are represented as arrays, compare
@code{a.base_object} and @code{b.base_object}; if they are equal, apply
dependence tests (use access functions based on base_objects).  Else if
both @code{a} and @code{b} are represented as pointers, compare
@code{a.first_location} and @code{b.first_location}; if they are equal,
apply dependence tests (use access functions based on the first
location).  However, if @code{a} and @code{b} are represented
differently, only try to prove that the bases are definitely different.

@item Aliasing information.
@item Alignment information.
@end itemize

The structure describing the relation between two data references is
@code{data_dependence_relation} and the shorter name for a pointer to
such a structure is @code{ddr_p}.  This structure contains:

@itemize
@item a pointer to each data reference,
@item a tree node @code{are_dependent} that is set to @code{chrec_known}
if the analysis has proved that there is no dependence between these two
data references, to @code{chrec_dont_know} if the analysis was not able
to determine any useful result and potentially there could exist a
dependence between these data references, and to @code{NULL_TREE} if
there exists a dependence relation between the data references; in the
latter case, the description of this dependence relation is given in the
@code{subscripts}, @code{dir_vects}, and @code{dist_vects} arrays,
@item a boolean that determines whether the dependence relation can be
represented by a classical distance vector,
@item an array @code{subscripts} that contains a description of each
subscript of the data references.  Given two array accesses, a
subscript is the tuple composed of the access functions for a given
dimension.  For example, given @code{A[f1][f2][f3]} and
@code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
g2), (f3, g3)}.
@item two arrays @code{dir_vects} and @code{dist_vects} that contain
classical representations of the data dependences under the form of
direction and distance dependence vectors,
@item an array of loops @code{loop_nest} that contains the loops to
which the distance and direction vectors refer.
@end itemize

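Consumers then discriminate on @code{are_dependent}, for which accessor
macros exist in @file{tree-data-ref.h}; a sketch (the helpers called in
the branches are hypothetical):

@smallexample
ddr_p ddr;
unsigned i;

for (i = 0; VEC_iterate (ddr_p, dependence_relations, i, ddr); i++)
  @{
    if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
      continue;         /* Proved independent.  */

    if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
      give_up ();       /* Must be treated as possibly dependent.  */
    else
      /* NULL_TREE: dependent; the distance vectors describe it.  */
      examine_dists (DDR_DIST_VECTS (ddr));
  @}
@end smallexample
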
Several functions for pretty printing the information extracted by the
data dependence analysis are available: @code{dump_ddrs} prints the
details of a data dependence relations array with maximum verbosity,
@code{dump_dist_dir_vectors} prints only the classical distance and
direction vectors for a data dependence relations array, and
@code{dump_data_references} prints the details of the data references
contained in a data reference array.

@node Lambda
@section Linear loop transformations framework
@cindex Linear loop transformations framework

Lambda is a framework that allows transformations of loops using
non-singular matrix-based transformations of the iteration space and
loop bounds.  This allows compositions of skewing, scaling, interchange,
and reversal transformations.  These transformations are often used to
improve cache behavior or to remove inner loop dependencies to allow
parallelization and vectorization to take place.

To perform these transformations, Lambda requires that the loopnest be
converted into an internal form that can be matrix transformed easily.
To do this conversion, the function
@code{gcc_loopnest_to_lambda_loopnest} is provided.  If the loop cannot
be transformed using lambda, this function will return NULL.

Once a @code{lambda_loopnest} is obtained from the conversion function,
it can be transformed by using @code{lambda_loopnest_transform}, which
takes a transformation matrix to apply.  Note that it is up to the
caller to verify that the transformation matrix is legal to apply to the
loop (dependence respecting, etc.); Lambda simply applies whatever
matrix it is given.  It could be extended to make legal matrices out of
any non-singular matrix, but this is not currently implemented.
Legality of a matrix for a given loopnest can be verified using
@code{lambda_transform_legal_p}.

Given a transformed loopnest, conversion back into GCC IR is done by
@code{lambda_loopnest_to_gcc_loopnest}.  This function will modify the
loops so that they match the transformed loopnest.

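The overall workflow is then roughly the following (a schematic sketch
only; the auxiliary arguments taken by the conversion functions are
elided and depend on the release):

@smallexample
lambda_loopnest nest, transformed;
lambda_trans_matrix trans;
int depth = @dots{};            /* number of loops in the nest */

nest = gcc_loopnest_to_lambda_loopnest (@dots{});
if (!nest)
  return;                       /* Not transformable by Lambda.  */

trans = @dots{};                /* build the transformation matrix */
if (lambda_transform_legal_p (trans, depth, dependence_relations))
  @{
    transformed = lambda_loopnest_transform (nest, trans);
    lambda_loopnest_to_gcc_loopnest (@dots{});
  @}
@end smallexample
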
@node Omega
@section Omega, a solver for linear programming problems
@cindex Omega a solver for linear programming problems

The data dependence analysis contains several solvers triggered
sequentially from the less complex ones to the more sophisticated ones.
For ensuring the consistency of the results of these solvers, a data
dependence check pass has been implemented, based on two different
solvers.  The second method that has been integrated into GCC is based
on the Omega dependence solver, written in the 1990s by William Pugh
and David Wonnacott.  Data dependence tests can be formulated using a
subset of the Presburger arithmetic that can be translated to linear
constraint systems.  These linear constraint systems can then be
solved using the Omega solver.

The Omega solver uses the Fourier-Motzkin algorithm for variable
elimination: a linear constraint system containing @code{n} variables
is reduced to a linear constraint system with @code{n-1} variables.
The Omega solver can also be used for solving other problems that can
be expressed as a system of linear equalities and inequalities.  The
Omega solver is known to have an exponential worst case, also known
under the name of ``omega nightmare'' in the literature, but in
practice, the omega test is known to be efficient for the common data
dependence tests.

The interface used by the Omega solver for describing the linear
programming problems is described in @file{omega.h}, and the entry
point of the solver is @code{omega_solve_problem}.