1 |
711 |
jeremybenn |
@c markers: BUG TODO
|
2 |
|
|
|
3 |
|
|
@c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
|
4 |
|
|
@c 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
|
5 |
|
|
@c Free Software Foundation, Inc.
|
6 |
|
|
@c This is part of the GCC manual.
|
7 |
|
|
@c For copying conditions, see the file gcc.texi.
|
8 |
|
|
|
9 |
|
|
@node Passes
|
10 |
|
|
@chapter Passes and Files of the Compiler
|
11 |
|
|
@cindex passes and files of the compiler
|
12 |
|
|
@cindex files and passes of the compiler
|
13 |
|
|
@cindex compiler passes and files
|
14 |
|
|
|
15 |
|
|
This chapter is dedicated to giving an overview of the optimization and
|
16 |
|
|
code generation passes of the compiler. In the process, it describes
|
17 |
|
|
some of the language front end interface, though this description is no
|
18 |
|
|
where near complete.
|
19 |
|
|
|
20 |
|
|
@menu
|
21 |
|
|
* Parsing pass:: The language front end turns text into bits.
|
22 |
|
|
* Gimplification pass:: The bits are turned into something we can optimize.
|
23 |
|
|
* Pass manager:: Sequencing the optimization passes.
|
24 |
|
|
* Tree SSA passes:: Optimizations on a high-level representation.
|
25 |
|
|
* RTL passes:: Optimizations on a low-level representation.
|
26 |
|
|
@end menu
|
27 |
|
|
|
28 |
|
|
@node Parsing pass
|
29 |
|
|
@section Parsing pass
|
30 |
|
|
@cindex GENERIC
|
31 |
|
|
@findex lang_hooks.parse_file
|
32 |
|
|
The language front end is invoked only once, via
|
33 |
|
|
@code{lang_hooks.parse_file}, to parse the entire input. The language
|
34 |
|
|
front end may use any intermediate language representation deemed
|
35 |
|
|
appropriate. The C front end uses GENERIC trees (@pxref{GENERIC}), plus
|
36 |
|
|
a double handful of language specific tree codes defined in
|
37 |
|
|
@file{c-common.def}. The Fortran front end uses a completely different
|
38 |
|
|
private representation.
|
39 |
|
|
|
40 |
|
|
@cindex GIMPLE
|
41 |
|
|
@cindex gimplification
|
42 |
|
|
@cindex gimplifier
|
43 |
|
|
@cindex language-independent intermediate representation
|
44 |
|
|
@cindex intermediate representation lowering
|
45 |
|
|
@cindex lowering, language-dependent intermediate representation
|
46 |
|
|
At some point the front end must translate the representation used in the
|
47 |
|
|
front end to a representation understood by the language-independent
|
48 |
|
|
portions of the compiler. Current practice takes one of two forms.
|
49 |
|
|
The C front end manually invokes the gimplifier (@pxref{GIMPLE}) on each function,
|
50 |
|
|
and uses the gimplifier callbacks to convert the language-specific tree
|
51 |
|
|
nodes directly to GIMPLE before passing the function off to be compiled.
|
52 |
|
|
The Fortran front end converts from a private representation to GENERIC,
|
53 |
|
|
which is later lowered to GIMPLE when the function is compiled. Which
|
54 |
|
|
route to choose probably depends on how well GENERIC (plus extensions)
|
55 |
|
|
can be made to match up with the source language and necessary parsing
|
56 |
|
|
data structures.
|
57 |
|
|
|
58 |
|
|
BUG: Gimplification must occur before nested function lowering,
|
59 |
|
|
and nested function lowering must be done by the front end before
|
60 |
|
|
passing the data off to cgraph.
|
61 |
|
|
|
62 |
|
|
TODO: Cgraph should control nested function lowering. It would
|
63 |
|
|
only be invoked when it is certain that the outer-most function
|
64 |
|
|
is used.
|
65 |
|
|
|
66 |
|
|
TODO: Cgraph needs a gimplify_function callback. It should be
|
67 |
|
|
invoked when (1) it is certain that the function is used, (2)
|
68 |
|
|
warning flags specified by the user require some amount of
|
69 |
|
|
compilation in order to honor, (3) the language indicates that
|
70 |
|
|
semantic analysis is not complete until gimplification occurs.
|
71 |
|
|
Hum@dots{} this sounds overly complicated. Perhaps we should just
|
72 |
|
|
have the front end gimplify always; in most cases it's only one
|
73 |
|
|
function call.
|
74 |
|
|
|
75 |
|
|
The front end needs to pass all function definitions and top level
|
76 |
|
|
declarations off to the middle-end so that they can be compiled and
|
77 |
|
|
emitted to the object file. For a simple procedural language, it is
|
78 |
|
|
usually most convenient to do this as each top level declaration or
|
79 |
|
|
definition is seen. There is also a distinction to be made between
|
80 |
|
|
generating functional code and generating complete debug information.
|
81 |
|
|
The only thing that is absolutely required for functional code is that
|
82 |
|
|
function and data @emph{definitions} be passed to the middle-end. For
|
83 |
|
|
complete debug information, function, data and type declarations
|
84 |
|
|
should all be passed as well.
|
85 |
|
|
|
86 |
|
|
@findex rest_of_decl_compilation
|
87 |
|
|
@findex rest_of_type_compilation
|
88 |
|
|
@findex cgraph_finalize_function
|
89 |
|
|
In any case, the front end needs each complete top-level function or
|
90 |
|
|
data declaration, and each data definition should be passed to
|
91 |
|
|
@code{rest_of_decl_compilation}. Each complete type definition should
|
92 |
|
|
be passed to @code{rest_of_type_compilation}. Each function definition
|
93 |
|
|
should be passed to @code{cgraph_finalize_function}.
|
94 |
|
|
|
95 |
|
|
TODO: I know rest_of_compilation currently has all sorts of
|
96 |
|
|
RTL generation semantics. I plan to move all code generation
|
97 |
|
|
bits (both Tree and RTL) to compile_function. Should we hide
|
98 |
|
|
cgraph from the front ends and move back to rest_of_compilation
|
99 |
|
|
as the official interface? Possibly we should rename all three
|
100 |
|
|
interfaces such that the names match in some meaningful way and
|
101 |
|
|
that is more descriptive than "rest_of".
|
102 |
|
|
|
103 |
|
|
The middle-end will, at its option, emit the function and data
|
104 |
|
|
definitions immediately or queue them for later processing.
|
105 |
|
|
|
106 |
|
|
@node Gimplification pass
|
107 |
|
|
@section Gimplification pass
|
108 |
|
|
|
109 |
|
|
@cindex gimplification
|
110 |
|
|
@cindex GIMPLE
|
111 |
|
|
@dfn{Gimplification} is a whimsical term for the process of converting
|
112 |
|
|
the intermediate representation of a function into the GIMPLE language
|
113 |
|
|
(@pxref{GIMPLE}). The term stuck, and so words like ``gimplification'',
|
114 |
|
|
``gimplify'', ``gimplifier'' and the like are sprinkled throughout this
|
115 |
|
|
section of code.
|
116 |
|
|
|
117 |
|
|
While a front end may certainly choose to generate GIMPLE directly if
|
118 |
|
|
it chooses, this can be a moderately complex process unless the
|
119 |
|
|
intermediate language used by the front end is already fairly simple.
|
120 |
|
|
Usually it is easier to generate GENERIC trees plus extensions
|
121 |
|
|
and let the language-independent gimplifier do most of the work.
|
122 |
|
|
|
123 |
|
|
@findex gimplify_function_tree
|
124 |
|
|
@findex gimplify_expr
|
125 |
|
|
@findex lang_hooks.gimplify_expr
|
126 |
|
|
The main entry point to this pass is @code{gimplify_function_tree}
|
127 |
|
|
located in @file{gimplify.c}. From here we process the entire
|
128 |
|
|
function gimplifying each statement in turn. The main workhorse
|
129 |
|
|
for this pass is @code{gimplify_expr}. Approximately everything
|
130 |
|
|
passes through here at least once, and it is from here that we
|
131 |
|
|
invoke the @code{lang_hooks.gimplify_expr} callback.
|
132 |
|
|
|
133 |
|
|
The callback should examine the expression in question and return
|
134 |
|
|
@code{GS_UNHANDLED} if the expression is not a language specific
|
135 |
|
|
construct that requires attention. Otherwise it should alter the
|
136 |
|
|
expression in some way to such that forward progress is made toward
|
137 |
|
|
producing valid GIMPLE@. If the callback is certain that the
|
138 |
|
|
transformation is complete and the expression is valid GIMPLE, it
|
139 |
|
|
should return @code{GS_ALL_DONE}. Otherwise it should return
|
140 |
|
|
@code{GS_OK}, which will cause the expression to be processed again.
|
141 |
|
|
If the callback encounters an error during the transformation (because
|
142 |
|
|
the front end is relying on the gimplification process to finish
|
143 |
|
|
semantic checks), it should return @code{GS_ERROR}.
|
144 |
|
|
|
145 |
|
|
@node Pass manager
|
146 |
|
|
@section Pass manager
|
147 |
|
|
|
148 |
|
|
The pass manager is located in @file{passes.c}, @file{tree-optimize.c}
|
149 |
|
|
and @file{tree-pass.h}.
|
150 |
|
|
Its job is to run all of the individual passes in the correct order,
|
151 |
|
|
and take care of standard bookkeeping that applies to every pass.
|
152 |
|
|
|
153 |
|
|
The theory of operation is that each pass defines a structure that
|
154 |
|
|
represents everything we need to know about that pass---when it
|
155 |
|
|
should be run, how it should be run, what intermediate language
|
156 |
|
|
form or on-the-side data structures it needs. We register the pass
|
157 |
|
|
to be run in some particular order, and the pass manager arranges
|
158 |
|
|
for everything to happen in the correct order.
|
159 |
|
|
|
160 |
|
|
The actuality doesn't completely live up to the theory at present.
|
161 |
|
|
Command-line switches and @code{timevar_id_t} enumerations must still
|
162 |
|
|
be defined elsewhere. The pass manager validates constraints but does
|
163 |
|
|
not attempt to (re-)generate data structures or lower intermediate
|
164 |
|
|
language form based on the requirements of the next pass. Nevertheless,
|
165 |
|
|
what is present is useful, and a far sight better than nothing at all.
|
166 |
|
|
|
167 |
|
|
Each pass should have a unique name.
|
168 |
|
|
Each pass may have its own dump file (for GCC debugging purposes).
|
169 |
|
|
Passes with a name starting with a star do not dump anything.
|
170 |
|
|
Sometimes passes are supposed to share a dump file / option name.
|
171 |
|
|
To still give these unique names, you can use a prefix that is delimited
|
172 |
|
|
by a space from the part that is used for the dump file / option name.
|
173 |
|
|
E.g. When the pass name is "ud dce", the name used for dump file/options
|
174 |
|
|
is "dce".
|
175 |
|
|
|
176 |
|
|
TODO: describe the global variables set up by the pass manager,
|
177 |
|
|
and a brief description of how a new pass should use it.
|
178 |
|
|
I need to look at what info RTL passes use first@enddots{}
|
179 |
|
|
|
180 |
|
|
@node Tree SSA passes
|
181 |
|
|
@section Tree SSA passes
|
182 |
|
|
|
183 |
|
|
The following briefly describes the Tree optimization passes that are
|
184 |
|
|
run after gimplification and what source files they are located in.
|
185 |
|
|
|
186 |
|
|
@itemize @bullet
|
187 |
|
|
@item Remove useless statements
|
188 |
|
|
|
189 |
|
|
This pass is an extremely simple sweep across the gimple code in which
|
190 |
|
|
we identify obviously dead code and remove it. Here we do things like
|
191 |
|
|
simplify @code{if} statements with constant conditions, remove
|
192 |
|
|
exception handling constructs surrounding code that obviously cannot
|
193 |
|
|
throw, remove lexical bindings that contain no variables, and other
|
194 |
|
|
assorted simplistic cleanups. The idea is to get rid of the obvious
|
195 |
|
|
stuff quickly rather than wait until later when it's more work to get
|
196 |
|
|
rid of it. This pass is located in @file{tree-cfg.c} and described by
|
197 |
|
|
@code{pass_remove_useless_stmts}.
|
198 |
|
|
|
199 |
|
|
@item Mudflap declaration registration
|
200 |
|
|
|
201 |
|
|
If mudflap (@pxref{Optimize Options,,-fmudflap -fmudflapth
|
202 |
|
|
-fmudflapir,gcc,Using the GNU Compiler Collection (GCC)}) is
|
203 |
|
|
enabled, we generate code to register some variable declarations with
|
204 |
|
|
the mudflap runtime. Specifically, the runtime tracks the lifetimes of
|
205 |
|
|
those variable declarations that have their addresses taken, or whose
|
206 |
|
|
bounds are unknown at compile time (@code{extern}). This pass generates
|
207 |
|
|
new exception handling constructs (@code{try}/@code{finally}), and so
|
208 |
|
|
must run before those are lowered. In addition, the pass enqueues
|
209 |
|
|
declarations of static variables whose lifetimes extend to the entire
|
210 |
|
|
program. The pass is located in @file{tree-mudflap.c} and is described
|
211 |
|
|
by @code{pass_mudflap_1}.
|
212 |
|
|
|
213 |
|
|
@item OpenMP lowering
|
214 |
|
|
|
215 |
|
|
If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers
|
216 |
|
|
OpenMP constructs into GIMPLE.
|
217 |
|
|
|
218 |
|
|
Lowering of OpenMP constructs involves creating replacement
|
219 |
|
|
expressions for local variables that have been mapped using data
|
220 |
|
|
sharing clauses, exposing the control flow of most synchronization
|
221 |
|
|
directives and adding region markers to facilitate the creation of the
|
222 |
|
|
control flow graph. The pass is located in @file{omp-low.c} and is
|
223 |
|
|
described by @code{pass_lower_omp}.
|
224 |
|
|
|
225 |
|
|
@item OpenMP expansion
|
226 |
|
|
|
227 |
|
|
If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
|
228 |
|
|
parallel regions into their own functions to be invoked by the thread
|
229 |
|
|
library. The pass is located in @file{omp-low.c} and is described by
|
230 |
|
|
@code{pass_expand_omp}.
|
231 |
|
|
|
232 |
|
|
@item Lower control flow
|
233 |
|
|
|
234 |
|
|
This pass flattens @code{if} statements (@code{COND_EXPR})
|
235 |
|
|
and moves lexical bindings (@code{BIND_EXPR}) out of line. After
|
236 |
|
|
this pass, all @code{if} statements will have exactly two @code{goto}
|
237 |
|
|
statements in its @code{then} and @code{else} arms. Lexical binding
|
238 |
|
|
information for each statement will be found in @code{TREE_BLOCK} rather
|
239 |
|
|
than being inferred from its position under a @code{BIND_EXPR}. This
|
240 |
|
|
pass is found in @file{gimple-low.c} and is described by
|
241 |
|
|
@code{pass_lower_cf}.
|
242 |
|
|
|
243 |
|
|
@item Lower exception handling control flow
|
244 |
|
|
|
245 |
|
|
This pass decomposes high-level exception handling constructs
|
246 |
|
|
(@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form
|
247 |
|
|
that explicitly represents the control flow involved. After this
|
248 |
|
|
pass, @code{lookup_stmt_eh_region} will return a non-negative
|
249 |
|
|
number for any statement that may have EH control flow semantics;
|
250 |
|
|
examine @code{tree_can_throw_internal} or @code{tree_can_throw_external}
|
251 |
|
|
for exact semantics. Exact control flow may be extracted from
|
252 |
|
|
@code{foreach_reachable_handler}. The EH region nesting tree is defined
|
253 |
|
|
in @file{except.h} and built in @file{except.c}. The lowering pass
|
254 |
|
|
itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}.
|
255 |
|
|
|
256 |
|
|
@item Build the control flow graph
|
257 |
|
|
|
258 |
|
|
This pass decomposes a function into basic blocks and creates all of
|
259 |
|
|
the edges that connect them. It is located in @file{tree-cfg.c} and
|
260 |
|
|
is described by @code{pass_build_cfg}.
|
261 |
|
|
|
262 |
|
|
@item Find all referenced variables
|
263 |
|
|
|
264 |
|
|
This pass walks the entire function and collects an array of all
|
265 |
|
|
variables referenced in the function, @code{referenced_vars}. The
|
266 |
|
|
index at which a variable is found in the array is used as a UID
|
267 |
|
|
for the variable within this function. This data is needed by the
|
268 |
|
|
SSA rewriting routines. The pass is located in @file{tree-dfa.c}
|
269 |
|
|
and is described by @code{pass_referenced_vars}.
|
270 |
|
|
|
271 |
|
|
@item Enter static single assignment form
|
272 |
|
|
|
273 |
|
|
This pass rewrites the function such that it is in SSA form. After
|
274 |
|
|
this pass, all @code{is_gimple_reg} variables will be referenced by
|
275 |
|
|
@code{SSA_NAME}, and all occurrences of other variables will be
|
276 |
|
|
annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have
|
277 |
|
|
been inserted as necessary for each basic block. This pass is
|
278 |
|
|
located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}.
|
279 |
|
|
|
280 |
|
|
@item Warn for uninitialized variables
|
281 |
|
|
|
282 |
|
|
This pass scans the function for uses of @code{SSA_NAME}s that
|
283 |
|
|
are fed by default definition. For non-parameter variables, such
|
284 |
|
|
uses are uninitialized. The pass is run twice, before and after
|
285 |
|
|
optimization (if turned on). In the first pass we only warn for uses that are
|
286 |
|
|
positively uninitialized; in the second pass we warn for uses that
|
287 |
|
|
are possibly uninitialized. The pass is located in @file{tree-ssa.c}
|
288 |
|
|
and is defined by @code{pass_early_warn_uninitialized} and
|
289 |
|
|
@code{pass_late_warn_uninitialized}.
|
290 |
|
|
|
291 |
|
|
@item Dead code elimination
|
292 |
|
|
|
293 |
|
|
This pass scans the function for statements without side effects whose
|
294 |
|
|
result is unused. It does not do memory life analysis, so any value
|
295 |
|
|
that is stored in memory is considered used. The pass is run multiple
|
296 |
|
|
times throughout the optimization process. It is located in
|
297 |
|
|
@file{tree-ssa-dce.c} and is described by @code{pass_dce}.
|
298 |
|
|
|
299 |
|
|
@item Dominator optimizations
|
300 |
|
|
|
301 |
|
|
This pass performs trivial dominator-based copy and constant propagation,
|
302 |
|
|
expression simplification, and jump threading. It is run multiple times
|
303 |
|
|
throughout the optimization process. It is located in @file{tree-ssa-dom.c}
|
304 |
|
|
and is described by @code{pass_dominator}.
|
305 |
|
|
|
306 |
|
|
@item Forward propagation of single-use variables
|
307 |
|
|
|
308 |
|
|
This pass attempts to remove redundant computation by substituting
|
309 |
|
|
variables that are used once into the expression that uses them and
|
310 |
|
|
seeing if the result can be simplified. It is located in
|
311 |
|
|
@file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}.
|
312 |
|
|
|
313 |
|
|
@item Copy Renaming
|
314 |
|
|
|
315 |
|
|
This pass attempts to change the name of compiler temporaries involved in
|
316 |
|
|
copy operations such that SSA->normal can coalesce the copy away. When compiler
|
317 |
|
|
temporaries are copies of user variables, it also renames the compiler
|
318 |
|
|
temporary to the user variable resulting in better use of user symbols. It is
|
319 |
|
|
located in @file{tree-ssa-copyrename.c} and is described by
|
320 |
|
|
@code{pass_copyrename}.
|
321 |
|
|
|
322 |
|
|
@item PHI node optimizations
|
323 |
|
|
|
324 |
|
|
This pass recognizes forms of PHI inputs that can be represented as
|
325 |
|
|
conditional expressions and rewrites them into straight line code.
|
326 |
|
|
It is located in @file{tree-ssa-phiopt.c} and is described by
|
327 |
|
|
@code{pass_phiopt}.
|
328 |
|
|
|
329 |
|
|
@item May-alias optimization
|
330 |
|
|
|
331 |
|
|
This pass performs a flow sensitive SSA-based points-to analysis.
|
332 |
|
|
The resulting may-alias, must-alias, and escape analysis information
|
333 |
|
|
is used to promote variables from in-memory addressable objects to
|
334 |
|
|
non-aliased variables that can be renamed into SSA form. We also
|
335 |
|
|
update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
|
336 |
|
|
aggregates so that we get fewer false kills. The pass is located
|
337 |
|
|
in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}.
|
338 |
|
|
|
339 |
|
|
Interprocedural points-to information is located in
|
340 |
|
|
@file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}.
|
341 |
|
|
|
342 |
|
|
@item Profiling
|
343 |
|
|
|
344 |
|
|
This pass rewrites the function in order to collect runtime block
|
345 |
|
|
and value profiling data. Such data may be fed back into the compiler
|
346 |
|
|
on a subsequent run so as to allow optimization based on expected
|
347 |
|
|
execution frequencies. The pass is located in @file{predict.c} and
|
348 |
|
|
is described by @code{pass_profile}.
|
349 |
|
|
|
350 |
|
|
@item Lower complex arithmetic
|
351 |
|
|
|
352 |
|
|
This pass rewrites complex arithmetic operations into their component
|
353 |
|
|
scalar arithmetic operations. The pass is located in @file{tree-complex.c}
|
354 |
|
|
and is described by @code{pass_lower_complex}.
|
355 |
|
|
|
356 |
|
|
@item Scalar replacement of aggregates
|
357 |
|
|
|
358 |
|
|
This pass rewrites suitable non-aliased local aggregate variables into
|
359 |
|
|
a set of scalar variables. The resulting scalar variables are
|
360 |
|
|
rewritten into SSA form, which allows subsequent optimization passes
|
361 |
|
|
to do a significantly better job with them. The pass is located in
|
362 |
|
|
@file{tree-sra.c} and is described by @code{pass_sra}.
|
363 |
|
|
|
364 |
|
|
@item Dead store elimination
|
365 |
|
|
|
366 |
|
|
This pass eliminates stores to memory that are subsequently overwritten
|
367 |
|
|
by another store, without any intervening loads. The pass is located
|
368 |
|
|
in @file{tree-ssa-dse.c} and is described by @code{pass_dse}.
|
369 |
|
|
|
370 |
|
|
@item Tail recursion elimination
|
371 |
|
|
|
372 |
|
|
This pass transforms tail recursion into a loop. It is located in
|
373 |
|
|
@file{tree-tailcall.c} and is described by @code{pass_tail_recursion}.
|
374 |
|
|
|
375 |
|
|
@item Forward store motion
|
376 |
|
|
|
377 |
|
|
This pass sinks stores and assignments down the flowgraph closer to their
|
378 |
|
|
use point. The pass is located in @file{tree-ssa-sink.c} and is
|
379 |
|
|
described by @code{pass_sink_code}.
|
380 |
|
|
|
381 |
|
|
@item Partial redundancy elimination
|
382 |
|
|
|
383 |
|
|
This pass eliminates partially redundant computations, as well as
|
384 |
|
|
performing load motion. The pass is located in @file{tree-ssa-pre.c}
|
385 |
|
|
and is described by @code{pass_pre}.
|
386 |
|
|
|
387 |
|
|
Just before partial redundancy elimination, if
|
388 |
|
|
@option{-funsafe-math-optimizations} is on, GCC tries to convert
|
389 |
|
|
divisions to multiplications by the reciprocal. The pass is located
|
390 |
|
|
in @file{tree-ssa-math-opts.c} and is described by
|
391 |
|
|
@code{pass_cse_reciprocal}.
|
392 |
|
|
|
393 |
|
|
@item Full redundancy elimination
|
394 |
|
|
|
395 |
|
|
This is a simpler form of PRE that only eliminates redundancies that
|
396 |
|
|
occur on all paths. It is located in @file{tree-ssa-pre.c} and
|
397 |
|
|
described by @code{pass_fre}.
|
398 |
|
|
|
399 |
|
|
@item Loop optimization
|
400 |
|
|
|
401 |
|
|
The main driver of the pass is placed in @file{tree-ssa-loop.c}
|
402 |
|
|
and described by @code{pass_loop}.
|
403 |
|
|
|
404 |
|
|
The optimizations performed by this pass are:
|
405 |
|
|
|
406 |
|
|
Loop invariant motion. This pass moves only invariants that
|
407 |
|
|
would be hard to handle on RTL level (function calls, operations that expand to
|
408 |
|
|
nontrivial sequences of insns). With @option{-funswitch-loops} it also moves
|
409 |
|
|
operands of conditions that are invariant out of the loop, so that we can use
|
410 |
|
|
just trivial invariantness analysis in loop unswitching. The pass also includes
|
411 |
|
|
store motion. The pass is implemented in @file{tree-ssa-loop-im.c}.
|
412 |
|
|
|
413 |
|
|
Canonical induction variable creation. This pass creates a simple counter
|
414 |
|
|
for number of iterations of the loop and replaces the exit condition of the
|
415 |
|
|
loop using it, in case when a complicated analysis is necessary to determine
|
416 |
|
|
the number of iterations. Later optimizations then may determine the number
|
417 |
|
|
easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}.
|
418 |
|
|
|
419 |
|
|
Induction variable optimizations. This pass performs standard induction
|
420 |
|
|
variable optimizations, including strength reduction, induction variable
|
421 |
|
|
merging and induction variable elimination. The pass is implemented in
|
422 |
|
|
@file{tree-ssa-loop-ivopts.c}.
|
423 |
|
|
|
424 |
|
|
Loop unswitching. This pass moves the conditional jumps that are invariant
|
425 |
|
|
out of the loops. To achieve this, a duplicate of the loop is created for
|
426 |
|
|
each possible outcome of conditional jump(s). The pass is implemented in
|
427 |
|
|
@file{tree-ssa-loop-unswitch.c}. This pass should eventually replace the
|
428 |
|
|
RTL level loop unswitching in @file{loop-unswitch.c}, but currently
|
429 |
|
|
the RTL level pass is not completely redundant yet due to deficiencies
|
430 |
|
|
in tree level alias analysis.
|
431 |
|
|
|
432 |
|
|
The optimizations also use various utility functions contained in
|
433 |
|
|
@file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
|
434 |
|
|
@file{cfgloopmanip.c}.
|
435 |
|
|
|
436 |
|
|
Vectorization. This pass transforms loops to operate on vector types
|
437 |
|
|
instead of scalar types. Data parallelism across loop iterations is exploited
|
438 |
|
|
to group data elements from consecutive iterations into a vector and operate
|
439 |
|
|
on them in parallel. Depending on available target support the loop is
|
440 |
|
|
conceptually unrolled by a factor @code{VF} (vectorization factor), which is
|
441 |
|
|
the number of elements operated upon in parallel in each iteration, and the
|
442 |
|
|
@code{VF} copies of each scalar operation are fused to form a vector operation.
|
443 |
|
|
Additional loop transformations such as peeling and versioning may take place
|
444 |
|
|
to align the number of iterations, and to align the memory accesses in the
|
445 |
|
|
loop.
|
446 |
|
|
The pass is implemented in @file{tree-vectorizer.c} (the main driver),
|
447 |
|
|
@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
|
448 |
|
|
and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
|
449 |
|
|
functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
|
450 |
|
|
Analysis of data references is in @file{tree-data-ref.c}.
|
451 |
|
|
|
452 |
|
|
SLP Vectorization. This pass performs vectorization of straight-line code. The
|
453 |
|
|
pass is implemented in @file{tree-vectorizer.c} (the main driver),
|
454 |
|
|
@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and
|
455 |
|
|
@file{tree-vect-data-refs.c}.
|
456 |
|
|
|
457 |
|
|
Autoparallelization. This pass splits the loop iteration space to run
|
458 |
|
|
into several threads. The pass is implemented in @file{tree-parloops.c}.
|
459 |
|
|
|
460 |
|
|
Graphite is a loop transformation framework based on the polyhedral
|
461 |
|
|
model. Graphite stands for Gimple Represented as Polyhedra. The
|
462 |
|
|
internals of this infrastructure are documented in
|
463 |
|
|
@w{@uref{http://gcc.gnu.org/wiki/Graphite}}. The passes working on
|
464 |
|
|
this representation are implemented in the various @file{graphite-*}
|
465 |
|
|
files.
|
466 |
|
|
|
467 |
|
|
@item Tree level if-conversion for vectorizer
|
468 |
|
|
|
469 |
|
|
This pass applies if-conversion to simple loops to help vectorizer.
|
470 |
|
|
We identify if convertible loops, if-convert statements and merge
|
471 |
|
|
basic blocks in one big block. The idea is to present loop in such
|
472 |
|
|
form so that vectorizer can have one to one mapping between statements
|
473 |
|
|
and available vector operations. This pass is located in
|
474 |
|
|
@file{tree-if-conv.c} and is described by @code{pass_if_conversion}.
|
475 |
|
|
|
476 |
|
|
@item Conditional constant propagation
|
477 |
|
|
|
478 |
|
|
This pass relaxes a lattice of values in order to identify those
|
479 |
|
|
that must be constant even in the presence of conditional branches.
|
480 |
|
|
The pass is located in @file{tree-ssa-ccp.c} and is described
|
481 |
|
|
by @code{pass_ccp}.
|
482 |
|
|
|
483 |
|
|
A related pass that works on memory loads and stores, and not just
|
484 |
|
|
register values, is located in @file{tree-ssa-ccp.c} and described by
|
485 |
|
|
@code{pass_store_ccp}.
|
486 |
|
|
|
487 |
|
|
@item Conditional copy propagation
|
488 |
|
|
|
489 |
|
|
This is similar to constant propagation but the lattice of values is
|
490 |
|
|
the ``copy-of'' relation. It eliminates redundant copies from the
|
491 |
|
|
code. The pass is located in @file{tree-ssa-copy.c} and described by
|
492 |
|
|
@code{pass_copy_prop}.
|
493 |
|
|
|
494 |
|
|
A related pass that works on memory copies, and not just register
|
495 |
|
|
copies, is located in @file{tree-ssa-copy.c} and described by
|
496 |
|
|
@code{pass_store_copy_prop}.
|
497 |
|
|
|
498 |
|
|
@item Value range propagation
|
499 |
|
|
|
500 |
|
|
This transformation is similar to constant propagation but
|
501 |
|
|
instead of propagating single constant values, it propagates
|
502 |
|
|
known value ranges. The implementation is based on Patterson's
|
503 |
|
|
range propagation algorithm (Accurate Static Branch Prediction by
|
504 |
|
|
Value Range Propagation, J. R. C. Patterson, PLDI '95). In
|
505 |
|
|
contrast to Patterson's algorithm, this implementation does not
|
506 |
|
|
propagate branch probabilities nor it uses more than a single
|
507 |
|
|
range per SSA name. This means that the current implementation
|
508 |
|
|
cannot be used for branch prediction (though adapting it would
|
509 |
|
|
not be difficult). The pass is located in @file{tree-vrp.c} and is
|
510 |
|
|
described by @code{pass_vrp}.
|
511 |
|
|
|
512 |
|
|
@item Folding built-in functions
|
513 |
|
|
|
514 |
|
|
This pass simplifies built-in functions, as applicable, with constant
|
515 |
|
|
arguments or with inferable string lengths. It is located in
|
516 |
|
|
@file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}.
|
517 |
|
|
|
518 |
|
|
@item Split critical edges
|
519 |
|
|
|
520 |
|
|
This pass identifies critical edges and inserts empty basic blocks
|
521 |
|
|
such that the edge is no longer critical. The pass is located in
|
522 |
|
|
@file{tree-cfg.c} and is described by @code{pass_split_crit_edges}.
|
523 |
|
|
|
524 |
|
|
@item Control dependence dead code elimination
|
525 |
|
|
|
526 |
|
|
This pass is a stronger form of dead code elimination that can
|
527 |
|
|
eliminate unnecessary control flow statements. It is located
|
528 |
|
|
in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}.
|
529 |
|
|
|
530 |
|
|
@item Tail call elimination
|
531 |
|
|
|
532 |
|
|
This pass identifies function calls that may be rewritten into
|
533 |
|
|
jumps. No code transformation is actually applied here, but the
|
534 |
|
|
data and control flow problem is solved. The code transformation
|
535 |
|
|
requires target support, and so is delayed until RTL@. In the
|
536 |
|
|
meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility.
|
537 |
|
|
The pass is located in @file{tree-tailcall.c} and is described by
|
538 |
|
|
@code{pass_tail_calls}. The RTL transformation is handled by
|
539 |
|
|
@code{fixup_tail_calls} in @file{calls.c}.
|
540 |
|
|
|
541 |
|
|
@item Warn for function return without value
|
542 |
|
|
|
543 |
|
|
For non-void functions, this pass locates return statements that do
|
544 |
|
|
not specify a value and issues a warning. Such a statement may have
|
545 |
|
|
been injected by falling off the end of the function. This pass is
|
546 |
|
|
run last so that we have as much time as possible to prove that the
|
547 |
|
|
statement is not reachable. It is located in @file{tree-cfg.c} and
|
548 |
|
|
is described by @code{pass_warn_function_return}.
|
549 |
|
|
|
550 |
|
|
@item Mudflap statement annotation
|
551 |
|
|
|
552 |
|
|
If mudflap is enabled, we rewrite some memory accesses with code to
|
553 |
|
|
validate that the memory access is correct. In particular, expressions
|
554 |
|
|
involving pointer dereferences (@code{INDIRECT_REF}, @code{ARRAY_REF},
|
555 |
|
|
etc.) are replaced by code that checks the selected address range
|
556 |
|
|
against the mudflap runtime's database of valid regions. This check
|
557 |
|
|
includes an inline lookup into a direct-mapped cache, based on
|
558 |
|
|
shift/mask operations of the pointer value, with a fallback function
|
559 |
|
|
call into the runtime. The pass is located in @file{tree-mudflap.c} and
|
560 |
|
|
is described by @code{pass_mudflap_2}.
|
561 |
|
|
|
562 |
|
|
@item Leave static single assignment form
|
563 |
|
|
|
564 |
|
|
This pass rewrites the function such that it is in normal form. At
|
565 |
|
|
the same time, we eliminate as many single-use temporaries as possible,
|
566 |
|
|
so the intermediate language is no longer GIMPLE, but GENERIC@. The
|
567 |
|
|
pass is located in @file{tree-outof-ssa.c} and is described by
|
568 |
|
|
@code{pass_del_ssa}.
|
569 |
|
|
|
570 |
|
|
@item Merge PHI nodes that feed into one another
|
571 |
|
|
|
572 |
|
|
This is part of the CFG cleanup passes. It attempts to join PHI nodes
|
573 |
|
|
from a forwarder CFG block into another block with PHI nodes. The
|
574 |
|
|
pass is located in @file{tree-cfgcleanup.c} and is described by
|
575 |
|
|
@code{pass_merge_phi}.
|
576 |
|
|
|
577 |
|
|
@item Return value optimization
|
578 |
|
|
|
579 |
|
|
If a function always returns the same local variable, and that local
|
580 |
|
|
variable is an aggregate type, then the variable is replaced with the
|
581 |
|
|
return value for the function (i.e., the function's DECL_RESULT). This
|
582 |
|
|
is equivalent to the C++ named return value optimization applied to
|
583 |
|
|
GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by
|
584 |
|
|
@code{pass_nrv}.
|
585 |
|
|
|
586 |
|
|
@item Return slot optimization
|
587 |
|
|
|
588 |
|
|
If a function returns a memory object and is called as @code{var =
|
589 |
|
|
foo()}, this pass tries to change the call so that the address of
|
590 |
|
|
@code{var} is sent to the caller to avoid an extra memory copy. This
|
591 |
|
|
pass is located in @code{tree-nrv.c} and is described by
|
592 |
|
|
@code{pass_return_slot}.
|
593 |
|
|
|
594 |
|
|
@item Optimize calls to @code{__builtin_object_size}
|
595 |
|
|
|
596 |
|
|
This is a propagation pass similar to CCP that tries to remove calls
|
597 |
|
|
to @code{__builtin_object_size} when the size of the object can be
|
598 |
|
|
computed at compile-time. This pass is located in
|
599 |
|
|
@file{tree-object-size.c} and is described by
|
600 |
|
|
@code{pass_object_sizes}.
|
601 |
|
|
|
602 |
|
|
@item Loop invariant motion
|
603 |
|
|
|
604 |
|
|
This pass removes expensive loop-invariant computations out of loops.
|
605 |
|
|
The pass is located in @file{tree-ssa-loop.c} and described by
|
606 |
|
|
@code{pass_lim}.
|
607 |
|
|
|
608 |
|
|
@item Loop nest optimizations
|
609 |
|
|
|
610 |
|
|
This is a family of loop transformations that works on loop nests. It
|
611 |
|
|
includes loop interchange, scaling, skewing and reversal and they are
|
612 |
|
|
all geared to the optimization of data locality in array traversals
|
613 |
|
|
and the removal of dependencies that hamper optimizations such as loop
|
614 |
|
|
parallelization and vectorization. The pass is located in
|
615 |
|
|
@file{tree-loop-linear.c} and described by
|
616 |
|
|
@code{pass_linear_transform}.
|
617 |
|
|
|
618 |
|
|
@item Removal of empty loops
|
619 |
|
|
|
620 |
|
|
This pass removes loops with no code in them. The pass is located in
|
621 |
|
|
@file{tree-ssa-loop-ivcanon.c} and described by
|
622 |
|
|
@code{pass_empty_loop}.
|
623 |
|
|
|
624 |
|
|
@item Unrolling of small loops
|
625 |
|
|
|
626 |
|
|
This pass completely unrolls loops with few iterations. The pass
|
627 |
|
|
is located in @file{tree-ssa-loop-ivcanon.c} and described by
|
628 |
|
|
@code{pass_complete_unroll}.
|
629 |
|
|
|
630 |
|
|
@item Predictive commoning
|
631 |
|
|
|
632 |
|
|
This pass makes the code reuse the computations from the previous
|
633 |
|
|
iterations of the loops, especially loads and stores to memory.
|
634 |
|
|
It does so by storing the values of these computations to a bank
|
635 |
|
|
of temporary variables that are rotated at the end of loop. To avoid
|
636 |
|
|
the need for this rotation, the loop is then unrolled and the copies
|
637 |
|
|
of the loop body are rewritten to use the appropriate version of
|
638 |
|
|
the temporary variable. This pass is located in @file{tree-predcom.c}
|
639 |
|
|
and described by @code{pass_predcom}.
|
640 |
|
|
|
641 |
|
|
@item Array prefetching
|
642 |
|
|
|
643 |
|
|
This pass issues prefetch instructions for array references inside
|
644 |
|
|
loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and
|
645 |
|
|
described by @code{pass_loop_prefetch}.
|
646 |
|
|
|
647 |
|
|
@item Reassociation
|
648 |
|
|
|
649 |
|
|
This pass rewrites arithmetic expressions to enable optimizations that
|
650 |
|
|
operate on them, like redundancy elimination and vectorization. The
|
651 |
|
|
pass is located in @file{tree-ssa-reassoc.c} and described by
|
652 |
|
|
@code{pass_reassoc}.
|
653 |
|
|
|
654 |
|
|
@item Optimization of @code{stdarg} functions
|
655 |
|
|
|
656 |
|
|
This pass tries to avoid the saving of register arguments into the
|
657 |
|
|
stack on entry to @code{stdarg} functions. If the function doesn't
|
658 |
|
|
use any @code{va_start} macros, no registers need to be saved. If
|
659 |
|
|
@code{va_start} macros are used, the @code{va_list} variables don't
|
660 |
|
|
escape the function, it is only necessary to save registers that will
|
661 |
|
|
be used in @code{va_arg} macros. For instance, if @code{va_arg} is
|
662 |
|
|
only used with integral types in the function, floating point
|
663 |
|
|
registers don't need to be saved. This pass is located in
|
664 |
|
|
@code{tree-stdarg.c} and described by @code{pass_stdarg}.
|
665 |
|
|
|
666 |
|
|
@end itemize
|
667 |
|
|
|
668 |
|
|
@node RTL passes
|
669 |
|
|
@section RTL passes
|
670 |
|
|
|
671 |
|
|
The following briefly describes the RTL generation and optimization
|
672 |
|
|
passes that are run after the Tree optimization passes.
|
673 |
|
|
|
674 |
|
|
@itemize @bullet
|
675 |
|
|
@item RTL generation
|
676 |
|
|
|
677 |
|
|
@c Avoiding overfull is tricky here.
|
678 |
|
|
The source files for RTL generation include
|
679 |
|
|
@file{stmt.c},
|
680 |
|
|
@file{calls.c},
|
681 |
|
|
@file{expr.c},
|
682 |
|
|
@file{explow.c},
|
683 |
|
|
@file{expmed.c},
|
684 |
|
|
@file{function.c},
|
685 |
|
|
@file{optabs.c}
|
686 |
|
|
and @file{emit-rtl.c}.
|
687 |
|
|
Also, the file
|
688 |
|
|
@file{insn-emit.c}, generated from the machine description by the
|
689 |
|
|
program @code{genemit}, is used in this pass. The header file
|
690 |
|
|
@file{expr.h} is used for communication within this pass.
|
691 |
|
|
|
692 |
|
|
@findex genflags
|
693 |
|
|
@findex gencodes
|
694 |
|
|
The header files @file{insn-flags.h} and @file{insn-codes.h},
|
695 |
|
|
generated from the machine description by the programs @code{genflags}
|
696 |
|
|
and @code{gencodes}, tell this pass which standard names are available
|
697 |
|
|
for use and which patterns correspond to them.
|
698 |
|
|
|
699 |
|
|
@item Generation of exception landing pads
|
700 |
|
|
|
701 |
|
|
This pass generates the glue that handles communication between the
|
702 |
|
|
exception handling library routines and the exception handlers within
|
703 |
|
|
the function. Entry points in the function that are invoked by the
|
704 |
|
|
exception handling library are called @dfn{landing pads}. The code
|
705 |
|
|
for this pass is located in @file{except.c}.
|
706 |
|
|
|
707 |
|
|
@item Control flow graph cleanup
|
708 |
|
|
|
709 |
|
|
This pass removes unreachable code, simplifies jumps to next, jumps to
|
710 |
|
|
jump, jumps across jumps, etc. The pass is run multiple times.
|
711 |
|
|
For historical reasons, it is occasionally referred to as the ``jump
|
712 |
|
|
optimization pass''. The bulk of the code for this pass is in
|
713 |
|
|
@file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c}
|
714 |
|
|
and @file{jump.c}.
|
715 |
|
|
|
716 |
|
|
@item Forward propagation of single-def values
|
717 |
|
|
|
718 |
|
|
This pass attempts to remove redundant computation by substituting
|
719 |
|
|
variables that come from a single definition, and
|
720 |
|
|
seeing if the result can be simplified. It performs copy propagation
|
721 |
|
|
and addressing mode selection. The pass is run twice, with values
|
722 |
|
|
being propagated into loops only on the second run. The code is
|
723 |
|
|
located in @file{fwprop.c}.
|
724 |
|
|
|
725 |
|
|
@item Common subexpression elimination
|
726 |
|
|
|
727 |
|
|
This pass removes redundant computation within basic blocks, and
|
728 |
|
|
optimizes addressing modes based on cost. The pass is run twice.
|
729 |
|
|
The code for this pass is located in @file{cse.c}.
|
730 |
|
|
|
731 |
|
|
@item Global common subexpression elimination
|
732 |
|
|
|
733 |
|
|
This pass performs two
|
734 |
|
|
different types of GCSE depending on whether you are optimizing for
|
735 |
|
|
size or not (LCM based GCSE tends to increase code size for a gain in
|
736 |
|
|
speed, while Morel-Renvoise based GCSE does not).
|
737 |
|
|
When optimizing for size, GCSE is done using Morel-Renvoise Partial
|
738 |
|
|
Redundancy Elimination, with the exception that it does not try to move
|
739 |
|
|
invariants out of loops---that is left to the loop optimization pass.
|
740 |
|
|
If MR PRE GCSE is done, code hoisting (aka unification) is also done, as
|
741 |
|
|
well as load motion.
|
742 |
|
|
If you are optimizing for speed, LCM (lazy code motion) based GCSE is
|
743 |
|
|
done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM
|
744 |
|
|
based GCSE also does loop invariant code motion. We also perform load
|
745 |
|
|
and store motion when optimizing for speed.
|
746 |
|
|
Regardless of which type of GCSE is used, the GCSE pass also performs
|
747 |
|
|
global constant and copy propagation.
|
748 |
|
|
The source file for this pass is @file{gcse.c}, and the LCM routines
|
749 |
|
|
are in @file{lcm.c}.
|
750 |
|
|
|
751 |
|
|
@item Loop optimization
|
752 |
|
|
|
753 |
|
|
This pass performs several loop related optimizations.
|
754 |
|
|
The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain
|
755 |
|
|
generic loop analysis and manipulation code. Initialization and finalization
|
756 |
|
|
of loop structures is handled by @file{loop-init.c}.
|
757 |
|
|
A loop invariant motion pass is implemented in @file{loop-invariant.c}.
|
758 |
|
|
Basic block level optimizations---unrolling, peeling and unswitching loops---
|
759 |
|
|
are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}.
|
760 |
|
|
Replacing of the exit condition of loops by special machine-dependent
|
761 |
|
|
instructions is handled by @file{loop-doloop.c}.
|
762 |
|
|
|
763 |
|
|
@item Jump bypassing
|
764 |
|
|
|
765 |
|
|
This pass is an aggressive form of GCSE that transforms the control
|
766 |
|
|
flow graph of a function by propagating constants into conditional
|
767 |
|
|
branch instructions. The source file for this pass is @file{gcse.c}.
|
768 |
|
|
|
769 |
|
|
@item If conversion
|
770 |
|
|
|
771 |
|
|
This pass attempts to replace conditional branches and surrounding
|
772 |
|
|
assignments with arithmetic, boolean value producing comparison
|
773 |
|
|
instructions, and conditional move instructions. In the very last
|
774 |
|
|
invocation after reload, it will generate predicated instructions
|
775 |
|
|
when supported by the target. The code is located in @file{ifcvt.c}.
|
776 |
|
|
|
777 |
|
|
@item Web construction
|
778 |
|
|
|
779 |
|
|
This pass splits independent uses of each pseudo-register. This can
|
780 |
|
|
improve effect of the other transformation, such as CSE or register
|
781 |
|
|
allocation. The code for this pass is located in @file{web.c}.
|
782 |
|
|
|
783 |
|
|
@item Instruction combination
|
784 |
|
|
|
785 |
|
|
This pass attempts to combine groups of two or three instructions that
|
786 |
|
|
are related by data flow into single instructions. It combines the
|
787 |
|
|
RTL expressions for the instructions by substitution, simplifies the
|
788 |
|
|
result using algebra, and then attempts to match the result against
|
789 |
|
|
the machine description. The code is located in @file{combine.c}.
|
790 |
|
|
|
791 |
|
|
@item Register movement
|
792 |
|
|
|
793 |
|
|
This pass looks for cases where matching constraints would force an
|
794 |
|
|
instruction to need a reload, and this reload would be a
|
795 |
|
|
register-to-register move. It then attempts to change the registers
|
796 |
|
|
used by the instruction to avoid the move instruction. The code is
|
797 |
|
|
located in @file{regmove.c}.
|
798 |
|
|
|
799 |
|
|
@item Mode switching optimization
|
800 |
|
|
|
801 |
|
|
This pass looks for instructions that require the processor to be in a
|
802 |
|
|
specific ``mode'' and minimizes the number of mode changes required to
|
803 |
|
|
satisfy all users. What these modes are, and what they apply to are
|
804 |
|
|
completely target-specific. The code for this pass is located in
|
805 |
|
|
@file{mode-switching.c}.
|
806 |
|
|
|
807 |
|
|
@cindex modulo scheduling
|
808 |
|
|
@cindex sms, swing, software pipelining
|
809 |
|
|
@item Modulo scheduling
|
810 |
|
|
|
811 |
|
|
This pass looks at innermost loops and reorders their instructions
|
812 |
|
|
by overlapping different iterations. Modulo scheduling is performed
|
813 |
|
|
immediately before instruction scheduling. The code for this pass is
|
814 |
|
|
located in @file{modulo-sched.c}.
|
815 |
|
|
|
816 |
|
|
@item Instruction scheduling
|
817 |
|
|
|
818 |
|
|
This pass looks for instructions whose output will not be available by
|
819 |
|
|
the time that it is used in subsequent instructions. Memory loads and
|
820 |
|
|
floating point instructions often have this behavior on RISC machines.
|
821 |
|
|
It re-orders instructions within a basic block to try to separate the
|
822 |
|
|
definition and use of items that otherwise would cause pipeline
|
823 |
|
|
stalls. This pass is performed twice, before and after register
|
824 |
|
|
allocation. The code for this pass is located in @file{haifa-sched.c},
|
825 |
|
|
@file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and
|
826 |
|
|
@file{sched-vis.c}.
|
827 |
|
|
|
828 |
|
|
@item Register allocation
|
829 |
|
|
|
830 |
|
|
These passes make sure that all occurrences of pseudo registers are
|
831 |
|
|
eliminated, either by allocating them to a hard register, replacing
|
832 |
|
|
them by an equivalent expression (e.g.@: a constant) or by placing
|
833 |
|
|
them on the stack. This is done in several subpasses:
|
834 |
|
|
|
835 |
|
|
@itemize @bullet
|
836 |
|
|
@item
|
837 |
|
|
Register move optimizations. This pass makes some simple RTL code
|
838 |
|
|
transformations which improve the subsequent register allocation. The
|
839 |
|
|
source file is @file{regmove.c}.
|
840 |
|
|
|
841 |
|
|
@item
|
842 |
|
|
The integrated register allocator (@acronym{IRA}). It is called
|
843 |
|
|
integrated because coalescing, register live range splitting, and hard
|
844 |
|
|
register preferencing are done on-the-fly during coloring. It also
|
845 |
|
|
has better integration with the reload pass. Pseudo-registers spilled
|
846 |
|
|
by the allocator or the reload have still a chance to get
|
847 |
|
|
hard-registers if the reload evicts some pseudo-registers from
|
848 |
|
|
hard-registers. The allocator helps to choose better pseudos for
|
849 |
|
|
spilling based on their live ranges and to coalesce stack slots
|
850 |
|
|
allocated for the spilled pseudo-registers. IRA is a regional
|
851 |
|
|
register allocator which is transformed into Chaitin-Briggs allocator
|
852 |
|
|
if there is one region. By default, IRA chooses regions using
|
853 |
|
|
register pressure but the user can force it to use one region or
|
854 |
|
|
regions corresponding to all loops.
|
855 |
|
|
|
856 |
|
|
Source files of the allocator are @file{ira.c}, @file{ira-build.c},
|
857 |
|
|
@file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c},
|
858 |
|
|
@file{ira-emit.c}, @file{ira-lives}, plus header files @file{ira.h}
|
859 |
|
|
and @file{ira-int.h} used for the communication between the allocator
|
860 |
|
|
and the rest of the compiler and between the IRA files.
|
861 |
|
|
|
862 |
|
|
@cindex reloading
|
863 |
|
|
@item
|
864 |
|
|
Reloading. This pass renumbers pseudo registers with the hardware
|
865 |
|
|
registers numbers they were allocated. Pseudo registers that did not
|
866 |
|
|
get hard registers are replaced with stack slots. Then it finds
|
867 |
|
|
instructions that are invalid because a value has failed to end up in
|
868 |
|
|
a register, or has ended up in a register of the wrong kind. It fixes
|
869 |
|
|
up these instructions by reloading the problematical values
|
870 |
|
|
temporarily into registers. Additional instructions are generated to
|
871 |
|
|
do the copying.
|
872 |
|
|
|
873 |
|
|
The reload pass also optionally eliminates the frame pointer and inserts
|
874 |
|
|
instructions to save and restore call-clobbered registers around calls.
|
875 |
|
|
|
876 |
|
|
Source files are @file{reload.c} and @file{reload1.c}, plus the header
|
877 |
|
|
@file{reload.h} used for communication between them.
|
878 |
|
|
@end itemize
|
879 |
|
|
|
880 |
|
|
@item Basic block reordering
|
881 |
|
|
|
882 |
|
|
This pass implements profile guided code positioning. If profile
|
883 |
|
|
information is not available, various types of static analysis are
|
884 |
|
|
performed to make the predictions normally coming from the profile
|
885 |
|
|
feedback (IE execution frequency, branch probability, etc). It is
|
886 |
|
|
implemented in the file @file{bb-reorder.c}, and the various
|
887 |
|
|
prediction routines are in @file{predict.c}.
|
888 |
|
|
|
889 |
|
|
@item Variable tracking
|
890 |
|
|
|
891 |
|
|
This pass computes where the variables are stored at each
|
892 |
|
|
position in code and generates notes describing the variable locations
|
893 |
|
|
to RTL code. The location lists are then generated according to these
|
894 |
|
|
notes to debug information if the debugging information format supports
|
895 |
|
|
location lists. The code is located in @file{var-tracking.c}.
|
896 |
|
|
|
897 |
|
|
@item Delayed branch scheduling
|
898 |
|
|
|
899 |
|
|
This optional pass attempts to find instructions that can go into the
|
900 |
|
|
delay slots of other instructions, usually jumps and calls. The code
|
901 |
|
|
for this pass is located in @file{reorg.c}.
|
902 |
|
|
|
903 |
|
|
@item Branch shortening
|
904 |
|
|
|
905 |
|
|
On many RISC machines, branch instructions have a limited range.
|
906 |
|
|
Thus, longer sequences of instructions must be used for long branches.
|
907 |
|
|
In this pass, the compiler figures out what how far each instruction
|
908 |
|
|
will be from each other instruction, and therefore whether the usual
|
909 |
|
|
instructions, or the longer sequences, must be used for each branch.
|
910 |
|
|
The code for this pass is located in @file{final.c}.
|
911 |
|
|
|
912 |
|
|
@item Register-to-stack conversion
|
913 |
|
|
|
914 |
|
|
Conversion from usage of some hard registers to usage of a register
|
915 |
|
|
stack may be done at this point. Currently, this is supported only
|
916 |
|
|
for the floating-point registers of the Intel 80387 coprocessor. The
|
917 |
|
|
code for this pass is located in @file{reg-stack.c}.
|
918 |
|
|
|
919 |
|
|
@item Final
|
920 |
|
|
|
921 |
|
|
This pass outputs the assembler code for the function. The source files
|
922 |
|
|
are @file{final.c} plus @file{insn-output.c}; the latter is generated
|
923 |
|
|
automatically from the machine description by the tool @file{genoutput}.
|
924 |
|
|
The header file @file{conditions.h} is used for communication between
|
925 |
|
|
these files. If mudflap is enabled, the queue of deferred declarations
|
926 |
|
|
and any addressed constants (e.g., string literals) is processed by
|
927 |
|
|
@code{mudflap_finish_file} into a synthetic constructor function
|
928 |
|
|
containing calls into the mudflap runtime.
|
929 |
|
|
|
930 |
|
|
@item Debugging information output
|
931 |
|
|
|
932 |
|
|
This is run after final because it must output the stack slot offsets
|
933 |
|
|
for pseudo registers that did not get hard registers. Source files
|
934 |
|
|
are @file{dbxout.c} for DBX symbol table format, @file{sdbout.c} for
|
935 |
|
|
SDB symbol table format, @file{dwarfout.c} for DWARF symbol table
|
936 |
|
|
format, files @file{dwarf2out.c} and @file{dwarf2asm.c} for DWARF2
|
937 |
|
|
symbol table format, and @file{vmsdbgout.c} for VMS debug symbol table
|
938 |
|
|
format.
|
939 |
|
|
|
940 |
|
|
@end itemize
|