README for GPROF

This is the GNU profiler.  It is distributed with other "binary
utilities" which should be in ../binutils.  See ../binutils/README for
more general notes, including where to send bug reports.

This file documents the changes and new features available with this
version of GNU gprof.

* New Features

 o Long options

 o Supports generalized file format, without breaking backward compatibility:
   new file format supports basic-block execution counts and non-realtime
   histograms (see below)

 o Supports profiling at the line level: flat profiles, call-graph profiles,
   and execution-counts can all be displayed at a level that identifies
   individual lines rather than just functions

 o Test-coverage support (similar to Sun tcov program): source files
   can be annotated with the number of times a function was invoked
   or with the number of times each basic-block in a function was
   executed

 o Generalized histograms: not just execution-time, but arbitrary
   histograms are supported (for example, performance counter based
   profiles)

 o Powerful mechanism to select data to be included/excluded from
   analysis and/or output

 o Support for DEC OSF/1 v3.0

 o Full cross-platform profiling support: gprof uses BFD to support
   arbitrary, non-native object file formats and non-native byte-orders
   (this feature has not been tested yet)

 o In the call-graph function index, static function names are now
   printed together with the filename in which the function was defined
   (requires bfd_find_nearest_line() support and symbolic debugging
   information to be present in the executable file)

 o Major overhaul of source code (compiles cleanly with -Wall, etc.)

* Supported Platforms

The current version is known to work on:

 o DEC OSF/1 v3.0
        All features supported.

 o SunOS 4.1.x
        All features supported.

 o Solaris 2.3
        Line-level profiling unsupported because bfd_find_nearest_line()
        is not fully implemented for ELF binaries.

 o HP-UX 9.01
        Line-level profiling unsupported because bfd_find_nearest_line()
        is not fully implemented for SOM binaries.

* Detailed Description

** User Interface Changes

The command-line interface is backwards compatible with earlier
versions of GNU gprof and Berkeley gprof.  The only exception is
the option to delete arcs from the call graph.  The old syntax
was:

        -k fromname toname

while the new syntax is:

        -k fromname/toname

This change was necessary to be compatible with long-option parsing.
Also, "fromname" and "toname" can now be arbitrary symspecs rather
than just function names (see below for an explanation of symspecs).
For example, option "-k gprof.c/" suppresses all arcs due to calls out
of file "gprof.c".
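
As an illustration of the new "-k" argument form, the following C
sketch splits "fromname/toname" at the first slash.  It is not taken
from the gprof sources: the function name split_arc_spec() and the
fixed-size buffers are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gprof: splits an arc specification
   "fromname/toname" at the first slash.  Either side may be empty, as
   in "gprof.c/".  Returns 0 on success, or -1 if there is no slash or
   if one side does not fit into a buffer of `size' bytes. */
int split_arc_spec(const char *arg, char *from, char *to, size_t size)
{
    const char *slash;
    size_t from_len, to_len;

    assert(arg != NULL && from != NULL && to != NULL);

    slash = strchr(arg, '/');
    if (slash == NULL)
        return -1;              /* old two-argument form, rejected here */

    from_len = (size_t)(slash - arg);
    to_len = strlen(slash + 1);
    if (from_len >= size || to_len >= size)
        return -1;

    memcpy(from, arg, from_len);
    from[from_len] = '\0';
    memcpy(to, slash + 1, to_len + 1);  /* also copies the final NUL */
    return 0;
}
```

With the argument "gprof.c/", this sketch yields "gprof.c" as the
"from" part and an empty "to" part.  A file name that itself contains
a slash would be ambiguous here; this README does not say how gprof
treats that case.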

*** Sym Specs

It is often necessary to apply gprof only to specific parts of a
program.  GNU gprof has a simple but powerful mechanism to achieve
this.  So-called "symspecs" provide the foundation for this
mechanism.  A symspec selects the parts of a profiled program to which
an operation should be applied.  The syntax of a symspec is
simple:

          filename_containing_a_dot
        | funcname_not_containing_a_dot
        | linenumber
        | ( [ any_filename ] `:' ( any_funcname | linenumber ) )

Here are some examples:

        main.c          Selects everything in file "main.c".  The
                        dot in the string tells gprof to interpret
                        the string as a filename, rather than as
                        a function name.  To select a file whose
                        name does not contain a dot, a trailing colon
                        should be specified.  For example, "odd:" is
                        interpreted as the file named "odd".

        main            Selects all functions named "main".  Notice
                        that there may be multiple instances of the
                        same function name because some of the
                        definitions may be local (i.e., static).
                        Unless a function name is unique in a program,
                        you must use the colon notation explained
                        below to specify a function from a specific
                        source file.  Sometimes, function names contain
                        dots.  In such cases, it is necessary to
                        add a leading colon to the name.  For example,
                        ":.mul" selects function ".mul".

        main.c:main     Selects function "main" in file "main.c".

        main.c:134      Selects line 134 in file "main.c".

IMPLEMENTATION NOTE: The source code uses the type sym_id for symspecs.
At some point, this probably ought to be changed to "sym_spec" to make
reading the code easier.
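
The symspec grammar above can be sketched in C as follows.  This is a
minimal illustration, not the sym_id code in the gprof sources: the
struct, the function names, the 64-byte field sizes, and the choice to
split at the first colon are all assumptions of this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; not the sym_id type used by gprof. */
struct symspec {
    char file[64];   /* "" if no file name was given */
    char func[64];   /* "" if no function name was given */
    int line;        /* 0 if no line number was given */
};

/* Returns 1 if s is a non-empty string of decimal digits. */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Copies len bytes of src into dst and terminates the string. */
static int copy_field(char *dst, size_t size, const char *src, size_t len)
{
    if (len >= size)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Stores s as a line number if it is numeric, else as a function name. */
static int set_func_or_line(struct symspec *out, const char *s)
{
    if (all_digits(s)) {
        out->line = (int)strtol(s, NULL, 10);
        return 0;
    }
    return copy_field(out->func, sizeof out->func, s, strlen(s));
}

/* Parses one symspec.  Returns 0 on success, -1 if a field is too long. */
int parse_symspec(const char *spec, struct symspec *out)
{
    const char *colon;

    assert(spec != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    colon = strchr(spec, ':');
    if (colon != NULL) {
        /* [ any_filename ] ':' ( any_funcname | linenumber ) */
        if (copy_field(out->file, sizeof out->file, spec,
                       (size_t)(colon - spec)) != 0)
            return -1;
        if (colon[1] == '\0')
            return 0;           /* "odd:" selects the whole file */
        return set_func_or_line(out, colon + 1);
    }
    if (strchr(spec, '.') != NULL)  /* a dot marks a file name */
        return copy_field(out->file, sizeof out->file, spec, strlen(spec));
    return set_func_or_line(out, spec);
}
```

For the examples listed above, this sketch reports "main.c" as a file,
"main" as a function, "odd:" as the file "odd", ":.mul" as the
function ".mul", and "main.c:134" as line 134 of file "main.c".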

*** Long options

GNU gprof now supports long options.  The following is a list of all
supported options.  Options that are listed without description
operate in the same manner as the corresponding option in older
versions of gprof.

Short Form:     Long Form:
-----------     ----------
-l              --line
                        Request profiling at the line-level rather
                        than just at the function level.  Source
                        lines are identified by symbols of the form:

                                func (file:line)

                        where "func" is the function name, "file" is the
                        file name and "line" is the line-number that
                        corresponds to the line.

                        To work properly, the binary must contain symbolic
                        debugging information.  This means that the sources
                        have to be compiled with option "-g" specified.
                        Functions for which there is no symbolic debugging
                        information available are treated as if "--line"
                        had not been specified.  However, the line number
                        printed with such symbols is usually incorrect
                        and should be ignored.

-a              --no-static
-A[symspec]     --annotated-source[=symspec]
                        Request output in the form of annotated source
                        files.  If "symspec" is specified, print output only
                        for symbols selected by "symspec".  If the option
                        is specified multiple times, annotated output is
                        generated for the union of all symspecs.

                        Examples:

                          -A            Prints annotated source for all
                                        source files.
                          -Agprof.c     Prints annotated source for file
                                        gprof.c.
                          -Afoobar      Prints annotated source for files
                                        containing a function named "foobar".
                                        The entire file will be printed, but
                                        only the function itself will be
                                        annotated with profile data.

-J[symspec]     --no-annotated-source[=symspec]
                        Suppress annotated source output.  If specified
                        without argument, annotated output is suppressed
                        completely.  With an argument, annotated output
                        is suppressed only for the symbols selected by
                        "symspec".  If the option is specified multiple
                        times, annotated output is suppressed for the
                        union of all symspecs.  This option has lower
                        precedence than --annotated-source.

-p[symspec]     --flat-profile[=symspec]
                        Request output in the form of a flat profile
                        (unless any other output-style option is specified,
                        this option is turned on by default).  If
                        "symspec" is specified, include only symbols
                        selected by "symspec" in flat profile.  If the
                        option is specified multiple times, the flat
                        profile includes symbols selected by the union
                        of all symspecs.

-P[symspec]     --no-flat-profile[=symspec]
                        Suppress output in the flat profile.  If given
                        without an argument, the flat profile is suppressed
                        completely.  If "symspec" is specified, suppress
                        the selected symbols in the flat profile.  If the
                        option is specified multiple times, the union of
                        the selected symbols is suppressed.  This option
                        has lower precedence than --flat-profile.

-q[symspec]     --graph[=symspec]
                        Request output in the form of a call-graph
                        (unless any other output-style option is specified,
                        this option is turned on by default).  If "symspec"
                        is specified, include only symbols selected by
                        "symspec" in the call-graph.  If the option is
                        specified multiple times, the call-graph includes
                        symbols selected by the union of all symspecs.

-Q[symspec]     --no-graph[=symspec]
                        Suppress output in the call-graph.  If given without
                        an argument, the call-graph is suppressed completely.
                        With a "symspec", suppress the selected symbols
                        from the call-graph.  If the option is specified
                        multiple times, the union of the selected symbols
                        is suppressed.  This option has lower precedence
                        than --graph.

-C[symspec]     --exec-counts[=symspec]
                        Request output in the form of execution counts.
                        If "symspec" is present, include only symbols
                        selected by "symspec" in the execution count
                        listing.  If the option is specified multiple
                        times, the execution count listing includes
                        symbols selected by the union of all symspecs.

-Z[symspec]     --no-exec-counts[=symspec]
                        Suppress output in the execution count listing.
                        If given without an argument, the listing is
                        suppressed completely.  With a "symspec", suppress
                        the selected symbols from the execution count
                        listing.  If the option is specified multiple
                        times, the union of the selected symbols is
                        suppressed.  This option has lower precedence
                        than --exec-counts.

-i              --file-info
                        Print information about the profile files that
                        are read.  The information consists of the
                        number and types of records present in the
                        profile file.  Currently, a profile file can
                        contain any number and any combination of histogram,
                        call-graph, or basic-block count records.

-s              --sum

-x              --all-lines
                        This option affects annotated source output only.
                        By default, only the lines at the beginning of
                        a basic-block are annotated.  If this option is
                        specified, every line in a basic-block is annotated
                        by repeating the annotation for the first line.
                        This option is identical to tcov's "-a".

-I dirs         --directory-path=dirs
                        This option affects annotated source output only.
                        Specifies the list of directories to be searched
                        for source files.  The argument "dirs" is a
                        colon-separated list of directories.  By default,
                        gprof searches for source files relative to the
                        current working directory only.

-z              --display-unused-functions

-m num          --min-count=num
                        This option affects annotated source and execution
                        count output only.  Symbols that are executed
                        fewer than "num" times are suppressed.  For annotated
                        source output, suppressed symbols are marked
                        by five hash-marks (#####).  In an execution count
                        output, suppressed symbols do not appear at all.

-L              --print-path
                        Normally, source filenames are printed with the path
                        component suppressed.  With this option, gprof
                        can be forced to print the full pathname of
                        source filenames.  The full pathname is determined
                        from symbolic debugging information in the image file
                        and is relative to the directory in which the compiler
                        was invoked.

-y              --separate-files
                        This option affects annotated source output only.
                        Normally, gprof prints annotated source files
                        to standard-output.  If this option is specified,
                        annotated source for a file named "path/filename"
                        is generated in the file "filename-ann".  That is,
                        annotated output is always generated in
                        gprof's current working directory.  Care has to
                        be taken if a program consists of files that have
                        identical filenames, but distinct paths.

-c              --static-call-graph

-t num          --table-length=num
                        This option affects annotated source output only.
                        After annotating a source file, gprof generates
                        an execution count summary consisting of a table
                        of lines with the top execution counts.  By
                        default, this table is ten entries long.
                        This option can be used to change the table length
                        or, by specifying an argument value of 0, it can be
                        suppressed completely.

-n symspec      --time=symspec
                        Only symbols selected by "symspec" are considered
                        in total and percentage time computations.
                        However, this option does not affect percentage time
                        computation for the flat profile.
                        If the option is specified multiple times, the union
                        of all selected symbols is used in time computations.

-N symspec      --no-time=symspec
                        Exclude the symbols selected by "symspec" from
                        total and percentage time computations.
                        However, this option does not affect percentage time
                        computation for the flat profile.
                        This option is ignored if any --time options are
                        specified.

-w num          --width=num
                        Sets the output line width.  Currently, this option
                        affects the printing of the call-graph function index
                        only.

-e
-E
-f
-F
-k
-b              --brief
-d[num]         --debug[=num]

-h              --help
                        Prints a usage message.

-O name         --file-format=name
                        Selects the format of the profile data files.
                        Recognized formats are "auto", "bsd", "magic",
                        and "prof".  The last one is not yet supported.
                        Format "auto" attempts to detect the file format
                        automatically (this is the default behavior).
                        It attempts to read the profile data files as
                        "magic" files and if this fails, falls back to
                        the "bsd" format.  "bsd" forces gprof to read
                        the data files in the BSD format.  "magic" forces
                        gprof to read the data files in the "magic" format.

-T              --traditional
-v              --version
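
Several option pairs in the list above (for example --graph and
--no-graph) state that the "no" form has lower precedence.  The C
sketch below shows one plausible reading of that rule.  It is a
simplified model and not code from gprof: symbols are plain names, the
symspecs are assumed to be expanded into name lists already, and the
function names are invented for this example.  The argument-less
forms, which suppress an output style completely, are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name' occurs in the list `names' of `count' entries. */
static int contains(const char *const *names, size_t count, const char *name)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Hypothetical decision function.  `incl' holds the symbols selected
   by the requesting option (e.g. --graph=symspec), `excl' those
   selected by the suppressing option (e.g. --no-graph=symspec).  If
   anything was included explicitly, only the include list decides;
   the exclude list is consulted only when the include list is empty. */
int symbol_is_shown(const char *name,
                    const char *const *incl, size_t n_incl,
                    const char *const *excl, size_t n_excl)
{
    assert(name != NULL);

    if (n_incl > 0)
        return contains(incl, n_incl, name);
    return !contains(excl, n_excl, name);
}
```

Under this reading, a symbol selected by both options is shown,
because the including option wins.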

** File Format Changes

The old BSD-derived format used for profile data does not contain a
magic cookie that would allow checking whether a data file really is a
gprof file.  Furthermore, it does not provide a version number, thus
rendering changes to the file format almost impossible.  GNU gprof
uses a new file format that provides these features.  For backward
compatibility, GNU gprof continues to support the old BSD-derived
format, but not all features are supported with it.  For example,
basic-block execution counts cannot be accommodated by the old file
format.

The new file format is defined in header file "gmon_out.h".  It
consists of a header containing the magic cookie and a version number,
as well as some spare bytes available for future extensions.  All data
in a profile data file is in the native format of the host on which
the profile was collected.  GNU gprof adapts automatically to the
byte-order in use.
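
To make the role of the header concrete, here is a hedged C sketch of
a header check.  The field sizes, the cookie string "gmon", the
version value 1, and all identifiers are assumptions of this example;
"gmon_out.h" is the authoritative definition.  Detecting the byte
order by decoding the version field both ways is one possible
technique, and is not claimed to be how gprof itself adapts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values and layout, for illustration only. */
#define SKETCH_MAGIC   "gmon"
#define SKETCH_VERSION 1u

struct sketch_hdr {
    unsigned char cookie[4];    /* magic cookie */
    unsigned char version[4];   /* version, in the collecting host's order */
    unsigned char spare[12];    /* reserved for future extensions */
};

enum sketch_order { ORDER_UNKNOWN, ORDER_LITTLE, ORDER_BIG };

/* Decodes four bytes as a little-endian 32-bit value. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes four bytes as a big-endian 32-bit value. */
static uint32_t get_be32(const unsigned char *p)
{
    return (uint32_t)p[3] | ((uint32_t)p[2] << 8)
         | ((uint32_t)p[1] << 16) | ((uint32_t)p[0] << 24);
}

/* Returns the byte order in which the version field decodes to the
   expected version, or ORDER_UNKNOWN if the cookie or the version does
   not match.  A reader could fall back to the old BSD-derived format
   when ORDER_UNKNOWN is returned. */
enum sketch_order check_header(const struct sketch_hdr *hdr)
{
    assert(hdr != NULL);

    if (memcmp(hdr->cookie, SKETCH_MAGIC, 4) != 0)
        return ORDER_UNKNOWN;
    if (get_le32(hdr->version) == SKETCH_VERSION)
        return ORDER_LITTLE;
    if (get_be32(hdr->version) == SKETCH_VERSION)
        return ORDER_BIG;
    return ORDER_UNKNOWN;
}
```

The old format offers no such cookie to test, which is why a file in
that format cannot be positively identified.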

In the new file format, the header is followed by a sequence of
records.  Currently, there are three different record types: histogram
records, call-graph arc records, and basic-block execution count
records.  Each file can contain any number of each record type.  When
reading a file, GNU gprof will ensure records of the same type are
compatible with each other and compute the union of all records.  For
example, for basic-block execution counts, the union is simply the sum
of all execution counts for each basic-block.
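
The "union" of basic-block execution counts described above can be
sketched in C as a per-address sum.  The struct and function names are
hypothetical, and the sketch simplifies in one respect: it treats two
pairs as the same basic-block only if their addresses are equal,
whereas a real reader would first map each address to its basic-block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of one address/count pair. */
struct bb_count {
    unsigned long addr;
    unsigned long count;
};

/* Merges the `n_src' pairs in `src' into `table', which holds
   `*n_table' entries and has room for `capacity' entries.  Counts for
   an address that is already present are added up; unseen addresses
   are appended.  Returns 0 on success, -1 if the table is full. */
int merge_bb_counts(struct bb_count *table, size_t *n_table, size_t capacity,
                    const struct bb_count *src, size_t n_src)
{
    size_t i, j;

    assert(table != NULL && n_table != NULL);

    for (i = 0; i < n_src; i++) {
        for (j = 0; j < *n_table; j++)
            if (table[j].addr == src[i].addr)
                break;
        if (j == *n_table) {            /* address not seen before */
            if (*n_table == capacity)
                return -1;
            table[j].addr = src[i].addr;
            table[j].count = 0;
            (*n_table)++;
        }
        table[j].count += src[i].count;
    }
    return 0;
}
```

Merging the records of two runs this way gives, for each address, the
total number of executions across both runs.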

*** Histogram Records

Histogram records consist of a header that is followed by an array of
bins.  The header contains the text-segment range that the histogram
spans, the size of the histogram in bytes (unlike in the old BSD
format, this does not include the size of the header), the rate of the
profiling clock, and the physical dimension that the bin counts
represent after being scaled by the profiling clock rate.  The
physical dimension is specified in two parts: a long name of up to 15
characters and a single character abbreviation.  For example, a
histogram representing real-time would specify the long name as
"seconds" and the abbreviation as "s".  This feature is useful for
architectures that support performance monitor hardware (which,
fortunately, is becoming increasingly common).  For example, under DEC
OSF/1, the "uprofile" command can be used to produce a histogram of,
say, instruction cache misses.  In this case, the dimension in the
histogram header could be set to "i-cache misses" and the abbreviation
could be set to "1" (because it is simply a count, not a physical
dimension).  Also, the profiling rate would have to be set to 1 in
this case.

Histogram bins are 16-bit numbers and each bin represents an equal
amount of text-space.  For example, if the text-segment is one
thousand bytes long and if there are ten bins in the histogram, each
bin represents one hundred bytes.
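
The bin arithmetic above can be written out as a short C sketch.  The
struct and function names are hypothetical and do not describe the
on-disk header.  Two assumptions are made: the upper bound of the text
range is exclusive, and "scaling by the profiling clock rate" means
dividing a bin count by the rate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the text range a histogram spans. */
struct hist_range {
    unsigned long low_pc;    /* first text address covered */
    unsigned long high_pc;   /* one past the last address (assumed) */
    size_t num_bins;         /* number of 16-bit bins */
};

/* Amount of text-space, in bytes, represented by each bin. */
double bytes_per_bin(const struct hist_range *h)
{
    assert(h != NULL && h->num_bins > 0 && h->high_pc > h->low_pc);
    return (double)(h->high_pc - h->low_pc) / (double)h->num_bins;
}

/* Index of the bin that covers `pc', or -1 if `pc' is out of range. */
long bin_for_pc(const struct hist_range *h, unsigned long pc)
{
    unsigned long long offset, span;

    assert(h != NULL && h->num_bins > 0 && h->high_pc > h->low_pc);

    if (pc < h->low_pc || pc >= h->high_pc)
        return -1;
    offset = pc - h->low_pc;
    span = h->high_pc - h->low_pc;
    /* offset < span, so the result is always below num_bins */
    return (long)(offset * h->num_bins / span);
}

/* Converts a raw bin count into the header's physical dimension,
   e.g. seconds for a clock rate in ticks per second (assumed). */
double scaled_value(uint16_t bin_count, unsigned long rate)
{
    assert(rate > 0);
    return (double)bin_count / (double)rate;
}
```

For a text-segment of one thousand bytes and ten bins, bytes_per_bin()
yields one hundred, matching the example above.  With a rate of 1, as
suggested for counts such as "i-cache misses", scaled_value() returns
the bin count unchanged.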

*** Call-Graph Records

Call-graph records have a format that is identical to the one used in
the BSD-derived file format.  It consists of an arc in the call graph
and a count indicating the number of times the arc was traversed
during program execution.  Arcs are specified by a pair of addresses:
the first must be within the caller's function and the second must be
within the callee's function.  When performing profiling at the
function level, these addresses can point anywhere within the
respective function.  However, when profiling at the line-level, it is
better if the addresses are as close to the call-site/entry-point as
possible.  This will ensure that the line-level call-graph is able to
identify exactly which line of source code performed calls to a
function.

*** Basic-Block Execution Count Records

Basic-block execution count records consist of a header followed by a
sequence of address/count pairs.  The header simply specifies the
length of the sequence.  In an address/count pair, the address
identifies a basic-block and the count specifies the number of times
that basic-block was executed.  Any address within the basic-block can
be used.

IMPLEMENTATION NOTE: gcc -a can be used to instrument a program to
record basic-block execution counts.  However, the __bb_exit_func()
that is currently present in libgcc2.c does not generate a gmon.out
file in a suitable format.  This should be fixed for future releases
of gcc.  In the meantime, contact davidm@cs.arizona.edu for a version
of __bb_exit_func() that is appropriate.