@c Copyright (c) 2006, 2007, 2008 Free Software Foundation, Inc.
@c Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@c ---------------------------------------------------------------------
@c Loop Representation
@c ---------------------------------------------------------------------

@node Loop Analysis and Representation
@chapter Analysis and Representation of Loops

GCC provides extensive infrastructure for working with natural loops,
i.e., strongly connected components of the CFG with only one entry
block.  This chapter describes the representation of loops in GCC, both
on GIMPLE and on RTL, as well as the interfaces to loop-related
analyses (induction variable analysis and number of iterations
analysis).

@menu
* Loop representation::         Representation and analysis of loops.
* Loop querying::               Getting information about loops.
* Loop manipulation::           Loop manipulation functions.
* LCSSA::                       Loop-closed SSA form.
* Scalar evolutions::           Induction variables on GIMPLE.
* loop-iv::                     Induction variables on RTL.
* Number of iterations::        Number of iterations analysis.
* Dependency analysis::         Data dependency analysis.
* Lambda::                      Linear loop transformations framework.
* Omega::                       A solver for linear programming problems.
@end menu

@node Loop representation
@section Loop representation
@cindex Loop representation
@cindex Loop analysis

This chapter describes the representation of loops in GCC, and functions
that can be used to build, modify and analyze this representation.  Most
of the interfaces and data structures are declared in @file{cfgloop.h}.
At the moment, loop structures are analyzed and this information is
updated only by the optimization passes that deal with loops, but some
efforts are being made to make it available throughout most of the
optimization passes.

In general, a natural loop has one entry block (header) and possibly
several back edges (latches) leading to the header from the inside of
the loop.  Loops with several latches may appear if several loops share
a single header, or if there is a branching in the middle of the loop.
The representation of loops in GCC however allows only loops with a
single latch.  During loop analysis, headers of such loops are split
and forwarder blocks are created in order to disambiguate their
structures.  A heuristic based on profile information and the structure
of the induction variables in the loops is used to determine whether
the latches correspond to sub-loops or to control flow in a single
loop.  This means that the analysis sometimes changes the CFG, and if
you run it in the middle of an optimization pass, you must be able to
deal with the new blocks.  You may avoid CFG changes by passing the
@code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} flag to the loop discovery.
Note, however, that most other loop manipulation functions will not
work correctly for loops with multiple latch edges (the functions that
only query membership of blocks to loops and subloop relationships, or
enumerate and test loop exits, can be expected to work).

The body of the loop is the set of blocks that are dominated by its
header, and reachable from its latch against the direction of edges in
the CFG@.  The loops are organized in a containment hierarchy (tree)
such that all the loops immediately contained inside loop L are the
children of L in the tree.  This tree is represented by the
@code{struct loops} structure.  The root of this tree is a fake loop
that contains all blocks in the function.  Each of the loops is
represented in a @code{struct loop} structure.  Each loop is assigned
an index (the @code{num} field of the @code{struct loop} structure),
and the pointer to the loop is stored in the corresponding field of the
@code{larray} vector in the loops structure.  The indices do not have
to be contiguous; there may be empty (@code{NULL}) entries in the
@code{larray} created by deleting loops.  Also, there is no guarantee
on the relative order of a loop and its subloops in the numbering.  The
index of a loop never changes.

The entries of the @code{larray} field should not be accessed directly.
The function @code{get_loop} returns the loop description for a loop
with the given index.  The @code{number_of_loops} function returns the
number of loops in the function.  To traverse all loops, use the
@code{FOR_EACH_LOOP} macro.  The @code{flags} argument of the macro is
used to determine the direction of traversal and the set of loops
visited.  Each loop is guaranteed to be visited exactly once,
regardless of the changes to the loop tree, and the loops may be
removed during the traversal.  Newly created loops are never traversed;
if they need to be visited, this must be done separately after their
creation.  The @code{FOR_EACH_LOOP} macro allocates temporary
variables.  If the @code{FOR_EACH_LOOP} loop were ended using
@code{break} or @code{goto}, they would not be released; the
@code{FOR_EACH_LOOP_BREAK} macro must be used instead.
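
As a concrete illustration, here is a minimal sketch of such a
traversal.  It assumes the @code{loop_iterator} type and the
@code{LI_ONLY_INNERMOST} flag from @file{cfgloop.h} as of this writing
(the exact spelling may differ between GCC versions), and
@code{interesting_loop_p} and @code{process_loop} are hypothetical
helpers standing in for pass-specific logic:

@smallexample
loop_iterator li;
struct loop *loop;

/* Visit only the innermost loops.  */
FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
  @{
    if (!interesting_loop_p (loop))   /* hypothetical predicate */
      continue;

    /* Leaving the traversal early must release the iterator.  */
    if (process_loop (loop))          /* hypothetical helper */
      FOR_EACH_LOOP_BREAK (li);
  @}
@end smallexample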

Each basic block contains the reference to the innermost loop it
belongs to (@code{loop_father}).  For this reason, it is only possible
to have one @code{struct loops} structure initialized at the same time
for each CFG@.  The global variable @code{current_loops} contains the
@code{struct loops} structure.  Many of the loop manipulation functions
assume that dominance information is up-to-date.

The loops are analyzed through the @code{loop_optimizer_init} function.
The argument of this function is a set of flags represented in an
integer bitmask.  These flags specify what other properties of the loop
structures should be calculated/enforced and preserved later:

@itemize
@item @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES}: If this flag is set, no
changes to the CFG will be performed in the loop analysis, in
particular, loops with multiple latch edges will not be disambiguated.
If a loop has multiple latches, its latch block is set to @code{NULL}.
Most of the loop manipulation functions will not work for loops in this
shape.  No other flags that require CFG changes can be passed to
@code{loop_optimizer_init}.
@item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in
such a way that each loop has only one entry edge, and additionally,
the source block of this entry edge has only one successor.  This
creates a natural place where the code can be moved out of the loop,
and ensures that the entry edge of the loop leads from its immediate
super-loop.
@item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
force the latch block of each loop to have only one successor.  This
ensures that the latch of the loop does not belong to any of its
sub-loops, and makes manipulation of the loops significantly easier.
Most of the loop manipulation functions assume that the loops are in
this shape.  Note that with this flag, the ``normal'' loop without any
control flow inside and with one exit consists of two basic blocks.
@item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
edges in the strongly connected components that are not natural loops
(have more than one entry block) are marked with
@code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
flag is not set for blocks and edges that belong to natural loops that
are in such an irreducible region (but it is set for the entry and exit
edges of such a loop, if they lead to/from this region).
@item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
and updated for each loop.  This makes some functions (e.g.,
@code{get_loop_exit_edges}) more efficient.  Some functions (e.g.,
@code{single_exit}) can be used only if the lists of exits are
recorded.
@end itemize

These properties may also be computed/enforced later, using the
functions @code{create_preheaders}, @code{force_single_succ_latches},
@code{mark_irreducible_loops} and @code{record_loop_exits}.

The memory occupied by the loops structures should be freed with the
@code{loop_optimizer_finalize} function.
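
A typical pass that works with loops therefore brackets its body
between the two calls.  A minimal sketch, using the flags described
above (which flags a real pass requests depends on what it needs):

@smallexample
loop_optimizer_init (LOOPS_HAVE_PREHEADERS
                     | LOOPS_HAVE_SIMPLE_LATCHES
                     | LOOPS_HAVE_RECORDED_EXITS);

/* ... analyze and transform the loops of the current function ... */

loop_optimizer_finalize ();
@end smallexample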

The CFG manipulation functions in general do not update loop
structures.  Specialized versions that additionally do so are provided
for the most common tasks.  On GIMPLE, the @code{cleanup_tree_cfg_loop}
function can be used to clean up the CFG while updating the loop
structures, if @code{current_loops} is set.

@node Loop querying
@section Loop querying
@cindex Loop querying

The functions to query the information about loops are declared in
@file{cfgloop.h}.  Some of the information can be taken directly from
the structures.  The @code{loop_father} field of each basic block
contains the innermost loop to which the block belongs.  The most
useful fields of the loop structure (that are kept up-to-date at all
times) are:

@itemize
@item @code{header}, @code{latch}: Header and latch basic blocks of the
loop.
@item @code{num_nodes}: Number of basic blocks in the loop (including
the basic blocks of the sub-loops).
@item @code{depth}: The depth of the loop in the loops tree, i.e., the
number of super-loops of the loop.
@item @code{outer}, @code{inner}, @code{next}: The super-loop, the
first sub-loop, and the sibling of the loop in the loops tree.
@end itemize

There are other fields in the loop structures, many of them used only
by some of the passes, or not updated during CFG changes; in general,
they should not be accessed directly.

The most important functions to query loop structures are:

@itemize
@item @code{flow_loops_dump}: Dumps the information about loops to a
file.
@item @code{verify_loop_structure}: Checks consistency of the loop
structures.
@item @code{loop_latch_edge}: Returns the latch edge of a loop.
@item @code{loop_preheader_edge}: If loops have preheaders, returns
the preheader edge of a loop.
@item @code{flow_loop_nested_p}: Tests whether a loop is a sub-loop of
another loop.
@item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
to a loop (including its sub-loops).
@item @code{find_common_loop}: Finds the common super-loop of two loops.
@item @code{superloop_at_depth}: Returns the super-loop of a loop with
the given depth.
@item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
number of insns in the loop, on GIMPLE and on RTL.
@item @code{loop_exit_edge_p}: Tests whether an edge is an exit from a
loop.
@item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
with the @code{EDGE_LOOP_EXIT} flag.
@item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
@code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
loop in depth-first search order in reversed CFG, ordered by dominance
relation, and breadth-first search order, respectively.
@item @code{single_exit}: Returns the single exit edge of the loop, or
@code{NULL} if the loop has more than one exit.  You can only use this
function if the @code{LOOPS_HAVE_RECORDED_EXITS} property is used.
@item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
@item @code{just_once_each_iteration_p}: Returns true if the basic block
is executed exactly once during each iteration of a loop (that is, it
does not belong to a sub-loop, and it dominates the latch of the loop).
@end itemize
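
For example, a pass that wants to look at every block of a single loop
can combine @code{get_loop_body} with the @code{loop_father} and
@code{num_nodes} fields.  A minimal sketch, assuming @code{loop} points
to the loop being examined and (as in the current sources) that
@code{get_loop_body} returns a heap-allocated array that the caller
must free:

@smallexample
basic_block *body = get_loop_body (loop);
unsigned i, n = 0;

/* Count the blocks that belong to LOOP directly,
   not to one of its sub-loops.  */
for (i = 0; i < loop->num_nodes; i++)
  if (body[i]->loop_father == loop)
    n++;

free (body);
@end smallexample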

@node Loop manipulation
@section Loop manipulation
@cindex Loop manipulation

The loops tree can be manipulated using the following functions:

@itemize
@item @code{flow_loop_tree_node_add}: Adds a node to the tree.
@item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
@item @code{add_bb_to_loop}: Adds a basic block to a loop.
@item @code{remove_bb_from_loops}: Removes a basic block from loops.
@end itemize

Most low-level CFG functions update loops automatically.  The following
functions handle some more complicated cases of CFG manipulations:

@itemize
@item @code{remove_path}: Removes an edge and all blocks it dominates.
@item @code{split_loop_exit_edge}: Splits an exit edge of the loop,
ensuring that PHI node arguments remain in the loop (this ensures that
loop-closed SSA form is preserved).  Only useful on GIMPLE.
@end itemize

Finally, there are some higher-level loop transformations implemented.
While some of them are written so that they should work on non-innermost
loops, they are mostly untested in that case, and at the moment, they
are only reliable for the innermost loops:

@itemize
@item @code{create_iv}: Creates a new induction variable.  Only works
on GIMPLE@.  @code{standard_iv_increment_position} can be used to find
a suitable place for the iv increment.  (See the sketch after this
list.)
@item @code{duplicate_loop_to_header_edge},
@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
on GIMPLE) duplicate the body of the loop a prescribed number of times
on one of the edges entering the loop header, thus performing either
loop unrolling or loop peeling.  @code{can_duplicate_loop_p}
(@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
loop.
@item @code{loop_version}, @code{tree_ssa_loop_version}: These
functions create a copy of a loop, and a branch before them that
selects one of them depending on the prescribed condition.  This is
useful for optimizations that need to verify some assumptions at
runtime (one of the copies of the loop is usually left unchanged, while
the other one is transformed in some way).
@item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
extra iterations to make the number of iterations divisible by the
unroll factor, updating the exit condition, and removing the exits that
now cannot be taken.  Works only on GIMPLE.
@end itemize
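
A minimal sketch of creating an induction variable at the standard
position.  The signatures shown follow the GIMPLE versions as of this
writing and may well differ between GCC releases; @code{loop},
@code{base} and @code{step} are assumed to be supplied by the caller,
and passing @code{NULL_TREE} as the variable is assumed to let
@code{create_iv} create a fresh temporary:

@smallexample
block_stmt_iterator incr_pos;
bool insert_after;
tree var_before, var_after;

/* Find the canonical place in LOOP for the IV increment.  */
standard_iv_increment_position (loop, &incr_pos, &insert_after);

/* Create an IV with the loop-invariant BASE and STEP; VAR_BEFORE
   and VAR_AFTER receive the SSA names holding the value of the IV
   before and after the increment.  */
create_iv (base, step, NULL_TREE, loop, &incr_pos, insert_after,
           &var_before, &var_after);
@end smallexample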

@node LCSSA
@section Loop-closed SSA form
@cindex LCSSA
@cindex Loop-closed SSA form

Throughout the loop optimizations on tree level, one extra condition is
enforced on the SSA form:  No SSA name is used outside of the loop in
which it is defined.  The SSA form satisfying this condition is called
``loop-closed SSA form'' -- LCSSA@.  To enforce LCSSA, PHI nodes must
be created at the exits of the loops for the SSA names that are used
outside of them.  Only the real operands (not virtual SSA names) are
held in LCSSA, in order to save memory.
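
For illustration, here is a schematic GIMPLE fragment before and after
rewriting into LCSSA; the extra PHI node at the loop exit is the only
change:

@smallexample
/* Before:  */
loop:
  a_1 = @dots{};
  goto loop or exit;
exit:
  use (a_1);

/* After:  */
loop:
  a_1 = @dots{};
  goto loop or exit;
exit:
  a_2 = PHI <a_1>;
  use (a_2);
@end smallexample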

There are various benefits of LCSSA:

@itemize
@item Many optimizations (value range analysis, final value
replacement) are interested in the values that are defined in the loop
and used outside of it, i.e., exactly those for which we create new PHI
nodes.
@item In induction variable analysis, it is not necessary to specify
the loop in which the analysis should be performed -- the scalar
evolution analysis always returns the results with respect to the loop
in which the SSA name is defined.
@item It makes updating of SSA form during loop transformations
simpler.  Without LCSSA, operations like loop unrolling may force
creation of PHI nodes arbitrarily far from the loop, while in LCSSA,
the SSA form can be updated locally.  However, since we only keep real
operands in LCSSA, we cannot use this advantage (we could have local
updating of real operands, but it is not much more efficient than to
use generic SSA form updating for it as well; the amount of changes to
SSA is the same).
@end itemize

However, it also means LCSSA must be updated.  This is usually
straightforward, unless you create a new value in a loop and use it
outside, or unless you manipulate loop exit edges (functions are
provided to make these manipulations simple).
@code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
LCSSA is preserved.

@node Scalar evolutions
@section Scalar evolutions
@cindex Scalar evolutions
@cindex IV analysis on GIMPLE

Scalar evolutions (SCEV) are used to represent results of induction
variable analysis on GIMPLE@.  They enable us to represent variables
with complicated behavior in a simple and consistent way (we only use
it to express values of polynomial induction variables, but it is
possible to extend it).  The interfaces to SCEV analysis are declared
in @file{tree-scalar-evolution.h}.  To start using scalar evolutions
analysis, @code{scev_initialize} must be called; to stop using it,
@code{scev_finalize} should be called.  SCEV analysis caches results in
order to save time and memory.  This cache, however, is invalidated by
most of the loop transformations, including removal of code.  If such a
transformation is performed, @code{scev_reset} must be called to clean
the caches.

Given an SSA name, its behavior in loops can be analyzed using the
@code{analyze_scalar_evolution} function.  The returned SCEV however
does not have to be fully analyzed and it may contain references to
other SSA names defined in the loop.  To resolve these (potentially
recursive) references, the @code{instantiate_parameters} or
@code{resolve_mixers} functions must be used.
@code{instantiate_parameters} is useful when you use the results of
SCEV only for some analysis, and when you work with a whole nest of
loops at once.  It will try replacing all SSA names by their SCEV in
all loops, including the super-loops of the current loop, thus
providing complete information about the behavior of the variable in
the loop nest.  @code{resolve_mixers} is useful if you work with only
one loop at a time, and if you possibly need to create code based on
the value of the induction variable.  It will only resolve the SSA
names defined in the current loop, leaving the SSA names defined
outside unchanged, even if their evolution in the outer loops is known.
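
A minimal sketch of the usual calling sequence; the surrounding pass
structure is omitted, and @code{name} is assumed to be the SSA name of
interest in @code{loop}:

@smallexample
tree ev;

scev_initialize ();

/* Evolution of NAME with respect to the loop in which it is
   defined, with references to other SSA names resolved.  */
ev = analyze_scalar_evolution (loop, name);
ev = instantiate_parameters (loop, ev);

/* ... use EV; for instance, test whether it is a POLYNOMIAL_CHREC ... */

scev_finalize ();
@end smallexample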

The SCEV is a normal tree expression, except for the fact that it may
contain several special tree nodes.  One of them is
@code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
expressed.  The other one is @code{POLYNOMIAL_CHREC}.  A polynomial
chrec has three arguments -- base, step and loop (both base and step
may contain further polynomial chrecs).  The type of the expression and
of the base and step must be the same.  A variable has evolution
@code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
loop) equivalent to @code{x_1} in the following example:

@smallexample
while (@dots{})
  @{
    x_1 = phi (base, x_2);
    x_2 = x_1 + step;
  @}
@end smallexample

Note that this includes the language restrictions on the operations.
For example, if we compile C code and @code{x} has signed type, then
the overflow in addition would cause undefined behavior, and we may
assume that this does not happen.  Hence, the value with this SCEV
cannot overflow (which restricts the number of iterations of such a
loop).

In many cases, one wants to restrict the attention just to affine
induction variables.  In this case, the extra expressive power of SCEV
is not useful, and may complicate the optimizations.  Instead, the
@code{simple_iv} function may be used to analyze a value -- the result
is a loop-invariant base and step.
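
A minimal sketch of its use.  The signature shown (in particular the
@code{affine_iv} result and the final argument allowing a non-constant
step) reflects the sources as of this writing and may differ between
releases; @code{stmt} and @code{op} are assumed to be the statement and
the operand being analyzed:

@smallexample
affine_iv iv;

if (simple_iv (loop, stmt, op, &iv, false))
  @{
    /* OP behaves as an affine IV: its value is
       IV.BASE + i * IV.STEP in iteration i, with both
       IV.BASE and IV.STEP invariant in LOOP.  */
  @}
@end smallexample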

@node loop-iv
@section IV analysis on RTL
@cindex IV analysis on RTL

The induction variable analysis on RTL is simpler than the one on
GIMPLE: it only allows analysis of affine induction variables, and only
in one loop at a time.  The interface is declared in @file{cfgloop.h}.
Before analyzing induction variables in a loop L, the
@code{iv_analysis_loop_init} function must be called on L.
After the analysis (possibly calling @code{iv_analysis_loop_init} for
several loops) is finished, @code{iv_analysis_done} should be called.
The following functions can be used to access the results of the
analysis:

 | 
      
         | 383 |  |  | @itemize
 | 
      
         | 384 |  |  | @item @code{iv_analyze}: Analyzes a single register used in the given
 | 
      
         | 385 |  |  | insn.  If no use of the register in this insn is found, the following
 | 
      
         | 386 |  |  | insns are scanned, so that this function can be called on the insn
 | 
      
         | 387 |  |  | returned by get_condition.
 | 
      
         | 388 |  |  | @item @code{iv_analyze_result}: Analyzes result of the assignment in the
 | 
      
         | 389 |  |  | given insn.
 | 
      
         | 390 |  |  | @item @code{iv_analyze_expr}: Analyzes a more complicated expression.
 | 
      
         | 391 |  |  | All its operands are analyzed by @code{iv_analyze}, and hence they must
 | 
      
         | 392 |  |  | be used in the specified insn or one of the following insns.
 | 
      
         | 393 |  |  | @end itemize
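
A minimal sketch of the calling sequence, assuming @code{insn} and
@code{reg} have been obtained by the caller (for instance from
@code{get_condition}); field and function names follow the sources as
of this writing:

@smallexample
struct rtx_iv iv;

iv_analysis_loop_init (loop);

if (iv_analyze (insn, reg, &iv))
  @{
    /* REG is an induction variable; its description, for example
       the BASE and STEP fields, can now be read from IV.  */
  @}

/* Once all loops of interest have been analyzed.  */
iv_analysis_done ();
@end smallexample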

The description of the induction variable is provided in @code{struct
rtx_iv}.  In order to handle subregs, the representation is a bit
complicated; if the value of the @code{extend} field is not
@code{UNKNOWN}, the value of the induction variable in the i-th
iteration is

@smallexample
delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
@end smallexample

with the following exception:  if @code{first_special} is true, then
the value in the first iteration (when @code{i} is zero) is @code{delta
+ mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
then @code{first_special} must be false, @code{delta} must be 0,
@code{mult} must be 1, and the value in the i-th iteration is

@smallexample
subreg_@{mode@} (base + i * step)
@end smallexample

The function @code{get_iv_value} can be used to perform these
calculations.

@node Number of iterations
@section Number of iterations analysis
@cindex Number of iterations analysis

Both on GIMPLE and on RTL, there are functions available to determine
the number of iterations of a loop, with a similar interface.  The
number of iterations of a loop in GCC is defined as the number of
executions of the loop latch.  In many cases, it is not possible to
determine the number of iterations unconditionally -- the determined
number is correct only if some assumptions are satisfied.  The analysis
tries to verify these conditions using the information contained in the
program; if it fails, the conditions are returned together with the
result.  The following information and conditions are provided by the
analysis:

@itemize
@item @code{assumptions}: If this condition is false, the rest of
the information is invalid.
@item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
this condition is true, the loop exits in the first iteration.
@item @code{infinite}: If this condition is true, the loop is infinite.
This condition is only available on RTL@.  On GIMPLE, conditions for
finiteness of the loop are included in @code{assumptions}.
@item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
that gives the number of iterations.  The number of iterations is
defined as the number of executions of the loop latch.
@end itemize

Both on GIMPLE and on RTL, it is necessary for the induction variable
analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
On GIMPLE, the results are stored to the @code{struct tree_niter_desc}
structure.  The number of iterations before the loop is exited through
a given exit can be determined using the
@code{number_of_iterations_exit} function.  On RTL, the results are
returned in the @code{struct niter_desc} structure.  The corresponding
function is named @code{check_simple_exit}.  There are also functions
that pass through all the exits of a loop and try to find one with an
easy-to-determine number of iterations -- @code{find_loop_niter} on
GIMPLE and @code{find_simple_exit} on RTL@.  Finally, there are
functions that provide the same information, but additionally cache it,
so that repeated calls to the number of iterations analysis are not so
costly -- @code{number_of_latch_executions} on GIMPLE and
@code{get_simple_loop_desc} on RTL.
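
On GIMPLE, a typical query looks as follows.  This is a minimal sketch,
assuming the function signature as of this writing (the final argument
controls whether warnings may be issued) and a loop whose exits have
been recorded:

@smallexample
struct tree_niter_desc desc;
edge exit = single_exit (loop);

if (exit && number_of_iterations_exit (loop, exit, &desc, true))
  @{
    /* DESC.NITER gives the number of iterations, valid only if
       DESC.ASSUMPTIONS holds; if DESC.MAY_BE_ZERO is not trivially
       false, the loop may exit before the first iteration whenever
       that condition is true.  */
  @}
@end smallexample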

Note that some of these functions may behave slightly differently than
others -- some of them return only the expression for the number of
iterations, and fail if there are some assumptions.  The function
@code{number_of_latch_executions} works only for single-exit loops.
The function @code{number_of_cond_exit_executions} can be used to
determine the number of executions of the exit condition of a
single-exit loop (i.e., the @code{number_of_latch_executions} increased
by one).

@node Dependency analysis
@section Data Dependency Analysis
@cindex Data Dependency Analysis

The code for the data dependence analysis can be found in
@file{tree-data-ref.c} and its interface and data structures are
described in @file{tree-data-ref.h}.  The function that computes the
data dependences for all the array and pointer references for a given
loop is @code{compute_data_dependences_for_loop}.  This function is
currently used by the linear loop transform and the vectorization
passes.  Before calling this function, one has to allocate two vectors:
the first vector will contain the set of data references that are
contained in the analyzed loop body, and the second vector will contain
the dependence relations between the data references.  Thus if the
vector of data references is of size @code{n}, the vector containing
the dependence relations will contain @code{n*n} elements.  However, if
the analyzed loop contains side effects, such as calls that potentially
can interfere with the data references in the currently analyzed loop,
the analysis stops while scanning the loop body for data references,
and inserts a single @code{chrec_dont_know} in the dependence relation
array.
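
A minimal sketch of the allocation and the call, using the vector types
from the sources as of this writing; the exact vector API and the
boolean parameter (assumed here to request self and read-read
dependences as well) may differ between releases:

@smallexample
VEC (data_reference_p, heap) *datarefs;
VEC (ddr_p, heap) *dependence_relations;

datarefs = VEC_alloc (data_reference_p, heap, 10);
dependence_relations = VEC_alloc (ddr_p, heap, 10);

compute_data_dependences_for_loop (loop, true, &datarefs,
                                   &dependence_relations);

/* ... inspect the dependence relations ... */

free_dependence_relations (dependence_relations);
free_data_refs (datarefs);
@end smallexample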

The data references are discovered in a particular order during the
scanning of the loop body: the loop body is analyzed in execution
order, and the data references of each statement are pushed at the end
of the data reference array.  Two data references appear in the array
in the same order as they syntactically occur in the program.  This
syntactic order is important in some classical data dependence tests,
and mapping this order to the elements of this array avoids costly
queries to the loop body representation.

Three types of data references are currently handled: @code{ARRAY_REF},
@code{INDIRECT_REF} and @code{COMPONENT_REF}@.  The data structure for
the data reference is @code{data_reference}, where
@code{data_reference_p} is the name of a pointer to the data reference
structure.  The structure contains the following elements:

@itemize
@item @code{base_object_info}: Provides information about the base
object of the data reference and its access functions.  These access
functions represent the evolution of the data reference in the loop
relative to its base, in keeping with the classical meaning of the data
reference access function for the support of arrays.  For example, for
a reference @code{a.b[i][j]}, the base object is @code{a.b} and the
access functions, one for each array subscript, are:
@code{@{i_init, +, i_step@}_1, @{j_init, +, j_step@}_2}.

 | 
      
         | 517 |  |  | @item @code{first_location_in_loop}: Provides information about the first
 | 
      
         | 518 |  |  | location accessed by the data reference in the loop and about the access
 | 
      
         | 519 |  |  | function used to represent evolution relative to this location. This data
 | 
      
         | 520 |  |  | is used to support pointers, and is not used for arrays (for which we
 | 
      
         | 521 |  |  | have base objects). Pointer accesses are represented as a one-dimensional
 | 
      
         | 522 |  |  | access that starts from the first location accessed in the loop. For
 | 
      
         | 523 |  |  | example:
 | 
      
         | 524 |  |  |  
 | 
      
         | 525 |  |  | @smallexample
 | 
      
         | 526 |  |  |       for1 i
 | 
      
         | 527 |  |  |          for2 j
 | 
      
         | 528 |  |  |           *((int *)p + i + j) = a[i][j];
 | 
      
         | 529 |  |  | @end smallexample
 | 
      
         | 530 |  |  |  
 | 
      
         | 531 |  |  | The access function of the pointer access is @code{@{0, + 4B@}_for2}
 | 
      
         | 532 |  |  | relative to @code{p + i}. The access functions of the array are
 | 
      
         | 533 |  |  | @code{@{i_init, + i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
 | 
      
         | 534 |  |  | relative to @code{a}.
 | 
      
         | 535 |  |  |  
 | 
      
         | 536 |  |  | Usually, the object the pointer refers to is either unknown, or we can't
 | 
      
         | 537 |  |  | prove that the access is confined to the boundaries of a certain object.

Two data references can be compared only if at least one of these two
representations has all its fields filled for both data references.

The current strategy for data dependence tests is as follows:
If both @code{a} and @code{b} are represented as arrays, compare
@code{a.base_object} and @code{b.base_object};
if they are equal, apply dependence tests (use access functions based
on base_objects).
Else if both @code{a} and @code{b} are represented as pointers, compare
@code{a.first_location} and @code{b.first_location};
if they are equal, apply dependence tests (use access functions based
on first location).
However, if @code{a} and @code{b} are represented differently, only try
to prove that the bases are definitely different.

@item Aliasing information.
@item Alignment information.
@end itemize

The structure describing the relation between two data references is
@code{data_dependence_relation} and the shorter name for a pointer to
such a structure is @code{ddr_p}.  This structure contains:

@itemize
@item a pointer to each data reference,
@item a tree node @code{are_dependent} that is set to @code{chrec_known}
if the analysis has proved that there is no dependence between these
two data references, @code{chrec_dont_know} if the analysis was not
able to determine any useful result and potentially there could exist a
dependence between these data references, and @code{are_dependent} is
set to @code{NULL_TREE} if there exists a dependence relation between
the data references, and the description of this dependence relation is
given in the @code{subscripts}, @code{dir_vects}, and @code{dist_vects}
arrays,
@item a boolean that determines whether the dependence relation can be
represented by a classical distance vector,
@item an array @code{subscripts} that contains a description of each
subscript of the data references.  Given two array accesses, a
subscript is the tuple composed of the access functions for a given
dimension.  For example, given @code{A[f1][f2][f3]} and
@code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
g2), (f3, g3)}.
@item two arrays @code{dir_vects} and @code{dist_vects} that contain
classical representations of the data dependences under the form of
direction and distance dependence vectors,
@item an array of loops @code{loop_nest} that contains the loops to
which the distance and direction vectors refer.
@end itemize

Several functions for pretty printing the information extracted by the
data dependence analysis are available: @code{dump_ddrs} prints with
maximum verbosity the details of a data dependence relations array,
@code{dump_dist_dir_vectors} prints only the classical distance and
direction vectors for a data dependence relations array, and
@code{dump_data_references} prints the details of the data references
contained in a data reference array.

@node Lambda
@section Linear loop transformations framework
@cindex Linear loop transformations framework

Lambda is a framework that allows transformations of loops using
non-singular matrix-based transformations of the iteration space and
loop bounds.  This allows compositions of skewing, scaling, interchange,
and reversal transformations.  These transformations are often used to
improve cache behavior or remove inner loop dependencies to allow
parallelization and vectorization to take place.
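
For example, interchanging a two-deep loop nest with iteration vector
(i, j) corresponds to applying the non-singular matrix

@smallexample
( 0 1 )   (i)   (j)
( 1 0 ) * (j) = (i)
@end smallexample

and reversing the inner loop corresponds to the matrix with rows
(1 0) and (0 -1), which maps (i, j) to (i, -j).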

To perform these transformations, Lambda requires that the loopnest be
converted into an internal form that can be matrix transformed easily.
To do this conversion, the function
@code{gcc_loopnest_to_lambda_loopnest} is provided.  If the loop cannot
be transformed using lambda, this function will return @code{NULL}.

Once a @code{lambda_loopnest} is obtained from the conversion function,
it can be transformed by using @code{lambda_loopnest_transform}, which
takes a transformation matrix to apply.  Note that it is up to the
caller to verify that the transformation matrix is legal to apply to
the loop (dependence respecting, etc.).  Lambda simply applies whatever
matrix it is given.  It can be extended to make legal matrices out of
any non-singular matrix, but this is not currently implemented.
Legality of a matrix for a given loopnest can be verified using
@code{lambda_transform_legal_p}.

Given a transformed loopnest, conversion back into GCC IR is done by
@code{lambda_loopnest_to_gcc_loopnest}.  This function will modify the
loops so that they match the transformed loopnest.


@node Omega
@section Omega, a solver for linear programming problems
@cindex Omega, a solver for linear programming problems

The data dependence analysis contains several solvers triggered
sequentially from the less complex ones to the more sophisticated ones.
To ensure the consistency of the results of these solvers, a data
dependence check pass has been implemented based on two different
solvers.  The second method that has been integrated into GCC is based
on the Omega dependence solver, written in the 1990s by William Pugh
and David Wonnacott.  Data dependence tests can be formulated using a
subset of Presburger arithmetic that can be translated to linear
constraint systems.  These linear constraint systems can then be
solved using the Omega solver.

The Omega solver uses the Fourier-Motzkin algorithm for variable
elimination: a linear constraint system containing @code{n} variables
is reduced to a linear constraint system with @code{n-1} variables.
The Omega solver can also be used for solving other problems that can
be expressed as a system of linear equalities and inequalities.  The
Omega solver is known to have an exponential worst case, also known as
the ``omega nightmare'' in the literature, but in practice, the omega
test is known to be efficient for the common data dependence tests.
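
As a small worked example of Fourier-Motzkin elimination, consider the
system with variables @code{x} and @code{y}:

@smallexample
1 <= x,   x <= y,   y <= 3
@end smallexample

Eliminating @code{x} combines each lower bound of @code{x} with each
upper bound, here producing @code{1 <= y}; together with the remaining
constraint this gives the system @code{1 <= y, y <= 3} in one fewer
variable, which is satisfiable (e.g., @code{y = 1}), so the original
system is satisfiable as well.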

The interface used by the Omega solver for describing the linear
programming problems is described in @file{omega.h}, and the main entry
point of the solver is @code{omega_solve_problem}.