OpenCores Subversion repository "openrisc"
File: openrisc/trunk/gnu-stable/gcc-4.5.1/libstdc++-v3/doc/xml/manual/profile_mode.xml (rev 826, author jeremybenn)
Keywords: C++, library, profile
Profile Mode

Intro

  Goal: Give performance improvement advice based on recognition of
  suboptimal usage patterns of the standard library.

  Method: Wrap the standard library code.  Insert calls to an
  instrumentation library to record the internal state of various
  components at interesting entry/exit points to/from the standard
  library.  Process the trace, recognize suboptimal patterns, and give
  advice.  For details, see the paper presented at CGO 2009.

  Strengths:

  - Unintrusive solution.  The application code does not require any
    modification.
  - The advice is call context sensitive, so it can precisely identify
    interesting dynamic performance behavior.
  - The overhead model is pay-per-view.  When you turn off a diagnostic
    class at compile time, its overhead disappears.

  Drawbacks:

  - You must recompile the application code with custom options.
  - You must run the application on representative input.  The advice is
    input dependent.
  - The execution time will increase, in some cases by factors.
Using the Profile Mode

  This is the anticipated common workflow for program foo.cc:

$ cat foo.cc
#include <vector>
int main() {
  std::vector<int> v;
  for (int k = 0; k < 1024; ++k) v.insert(v.begin(), k);
}

$ g++ -D_GLIBCXX_PROFILE foo.cc
$ ./a.out
$ cat libstdcxx-profile.txt
vector-to-list: improvement = 5: call stack = 0x804842c ...
    : advice = change std::vector to std::list
vector-size: improvement = 3: call stack = 0x804842c ...
    : advice = change initial container size from 0 to 1024
  Anatomy of a warning:

  - Warning id.  This is a short descriptive string for the class that
    this warning belongs to, e.g., "vector-to-list".
  - Estimated improvement.  This is an approximation of the benefit
    expected from implementing the change suggested by the warning.  It
    is given on a log10 scale.  Negative values mean that the alternative
    would actually do worse than the current choice.  In the example
    above, 5 comes from the fact that the overhead of inserting at the
    beginning of a vector vs. a list is around 1024 * 1024 / 2, which is
    around 10^5.  The improvement from setting the initial size to 1024
    is in the range of 10^3, since the overhead of dynamic resizing is
    linear in this case.
  - Call stack.  Currently, the addresses are printed without symbol
    name or code location attribution.  Users are expected to postprocess
    the output using, for instance, addr2line.
  - The warning message.  For some warnings, this is static text, e.g.,
    "change vector to list".  For other warnings, such as the one above,
    the message contains numeric advice, e.g., the suggested initial size
    of the vector.
 
  Three files are generated.  libstdcxx-profile.txt contains human
  readable advice.  libstdcxx-profile.raw contains implementation
  specific data about each diagnostic.  Its format is not documented, but
  it is sufficient to generate all the advice given in
  libstdcxx-profile.txt.  The advantage of keeping this raw format is
  that traces from multiple executions can be aggregated simply by
  concatenating the raw traces.  We intend to offer an external utility
  program that can issue advice from a trace.
  libstdcxx-profile.conf.out lists the actual diagnostic parameters
  used.  To alter parameters, edit this file and rename it to
  libstdcxx-profile.conf.
  Advice is given regardless of whether the transformation is valid.  For
  instance, we advise changing a map to an unordered_map even if the
  application semantics require that data be ordered.  We believe such
  warnings can help users understand the performance behavior of their
  application better, which can lead to changes at a higher abstraction
  level.
Tuning the Profile Mode

  Compile time switches and environment variables (see also file
  profiler.h).  Unless specified otherwise, they can be set at compile
  time using -D_<name> or by setting variable <name> in the environment
  where the program is run, before starting execution.

  - _GLIBCXX_PROFILE_NO_<diagnostic>: disable specific diagnostics.  See
    section Diagnostics for possible values.  (Environment variables not
    supported.)
  - _GLIBCXX_PROFILE_TRACE_PATH_ROOT: set an alternative root path for
    the output files.
  - _GLIBCXX_PROFILE_MAX_WARN_COUNT: set it to the maximum number of
    warnings desired.  The default value is 10.
  - _GLIBCXX_PROFILE_MAX_STACK_DEPTH: if set to 0, the advice will be
    collected and reported for the program as a whole, and not for each
    call context.  This could also be used in continuous regression
    tests, where you just need to know whether there is a regression or
    not.  The default value is 32.
  - _GLIBCXX_PROFILE_MEM_PER_DIAGNOSTIC: set a limit on how much memory
    to use for the accounting tables for each diagnostic type.  When this
    limit is reached, new events are ignored until the memory usage
    decreases under the limit.  Generally, this means that newly created
    containers will not be instrumented until some live containers are
    deleted.  The default is 128 MB.
  - _GLIBCXX_PROFILE_NO_THREADS: make the library not use threads.  If
    thread local storage (TLS) is not available, you will get a
    preprocessor error asking you to set -D_GLIBCXX_PROFILE_NO_THREADS
    if your program is single-threaded.  Multithreaded execution without
    TLS is not supported.  (Environment variable not supported.)
  - _GLIBCXX_HAVE_EXECINFO_H: this name should be defined automatically
    at library configuration time.  If your library was configured
    without execinfo.h, but you have it in your include path, you can
    define it explicitly.  Without it, advice is collected for the
    program as a whole, and not for each call context.  (Environment
    variable not supported.)
Design

  Table: Profile Code Location

    Code Location                         Use
    ------------------------------------  -----------------------------------
    libstdc++-v3/include/std/*            Preprocessor code to redirect to
                                          profile extension headers.
    libstdc++-v3/include/profile/*        Profile extension public headers
                                          (map, vector, ...).
    libstdc++-v3/include/profile/impl/*   Profile extension internals.
                                          Implementation files are only
                                          included from impl/profiler.h,
                                          which is the only file included
                                          from the public headers.
Wrapper Model

  In order to get our instrumented library version included instead of
  the release one, we use the same wrapper model as the debug mode.  We
  subclass entities from the release version.  Wherever _GLIBCXX_PROFILE
  is defined, the release namespace is std::__norm, whereas the profile
  namespace is std::__profile.  Using plain std translates into
  std::__profile.

  Whenever possible, we try to wrap at the public interface level, e.g.,
  in unordered_set rather than in hashtable, in order not to depend on
  implementation.

  Mixing object files built with and without the profile mode must not
  affect the program execution.  However, there are no guarantees to the
  accuracy of diagnostics when using even a single object not built with
  -D_GLIBCXX_PROFILE.  Currently, mixing the profile mode with debug and
  parallel extensions is not allowed.  Mixing them at compile time will
  result in preprocessor errors.  Mixing them at link time is undefined.
Instrumentation

  Instead of instrumenting every public entry and exit point, we chose to
  add instrumentation on demand, as needed by individual diagnostics.
  The main reason is that some diagnostics require us to extract bits of
  internal state that are particular only to that diagnostic.  We plan to
  formalize this later, after we learn more about the requirements of
  several diagnostics.

  All the instrumentation points can be switched on and off using
  -D_GLIBCXX_PROFILE[_NO]_<diagnostic> options.  With all the
  instrumentation calls off, there should be negligible overhead over the
  release version.  This property is needed to support diagnostics based
  on timing of internal operations.  For such diagnostics, we anticipate
  turning most of the instrumentation off in order to prevent profiling
  overhead from polluting time measurements, and thus diagnostics.

  All the instrumentation on/off compile time switches live in
  include/profile/profiler.h.
Run Time Behavior

  For practical reasons, the instrumentation library processes the trace
  partially rather than dumping it to disk in raw form.  Each event is
  processed when it occurs: it is usually assigned a cost and aggregated
  into the database of a specific diagnostic class.  The cost model is
  based largely on the standard performance guarantees, but in some cases
  we use knowledge about GCC's standard library implementation.

  Information is indexed by (1) call stack and (2) instance id or address
  to be able to understand and summarize precise
  creation-use-destruction dynamic chains.  Although the analysis is
  sensitive to dynamic instances, the reports are only sensitive to call
  context.  Whenever a dynamic instance is destroyed, we accumulate its
  effect to the corresponding entry for the call stack of its constructor
  location.

  For details, see the paper presented at CGO 2009.
Analysis and Diagnostics

  Final analysis takes place offline, and it is based entirely on the
  generated trace and debugging info in the application binary.  See
  section Diagnostics for a list of analysis types that we plan to
  support.

  The input to the analysis is a table indexed by profile type and call
  stack.  The data type for each entry depends on the profile type.
Cost Model

  While it is likely that cost models become complex as we get into more
  sophisticated analysis, we will try to follow a simple set of rules at
  the beginning.

  Relative benefit estimation:  The idea is to estimate or measure the
  cost of all operations in the original scenario versus the scenario we
  advise to switch to.  For instance, when advising to change a vector to
  a list, an occurrence of the insert method will generally count as a
  benefit.  Its magnitude depends on (1) the number of elements that get
  shifted and (2) whether it triggers a reallocation.

  Synthetic measurements:  We will measure the relative difference
  between similar operations on different containers.  We plan to write a
  battery of small tests that compare the times of the executions of
  similar methods on different containers.  The idea is to run these
  tests on the target machine.  If this training phase is very quick, we
  may decide to perform it at library initialization time.  The results
  can be cached on disk and reused across runs.

  Timers:  We plan to use timers for operations of larger granularity,
  such as sort.  For instance, we can switch between different sort
  methods on the fly and report the one that performs best for each call
  context.

  Show stoppers:  We may decide that the presence of an operation
  nullifies the advice.  For instance, when considering switching from
  set to unordered_set, if we detect use of operator ++, we will simply
  not issue the advice, since this could signal that the use case
  requires a sorted container.
Reports

  There are two types of reports.  First, if we recognize a pattern for
  which we have a substitute that is likely to give better performance,
  we print the advice and estimated performance gain.  The advice is
  usually associated with a code position and possibly a call stack.

  Second, we report performance characteristics for which we do not have
  a clear solution for improvement.  For instance, we can point out to
  the user the top 10 multimap locations which have the worst data
  locality in actual traversals.  Although this does not offer a
  solution, it helps the user focus on the key problems and ignore the
  uninteresting ones.
Testing

  First, we want to make sure we preserve the behavior of the release
  mode.  You can just type "make check-profile", which builds and runs
  the whole test suite in profile mode.

  Second, we want to test the correctness of each diagnostic.  We created
  a profile directory in the test suite.  Each diagnostic must come with
  at least two tests, one for false positives and one for false
  negatives.
Extensions for Custom Containers

  Many large projects use their own data structures instead of the ones
  in the standard library.  If these data structures are similar in
  functionality to the standard library, they can be instrumented with
  the same hooks that are used to instrument the standard library.  The
  instrumentation API is exposed in file profiler.h (look for
  "Instrumentation hooks").
Empirical Cost Model

  Currently, the cost model uses formulas with predefined relative
  weights for alternative containers or container implementations.  For
  instance, iterating through a vector is X times faster than iterating
  through a list.

  (Under development.)  We are working on customizing this to a
  particular machine by providing an automated way to compute the actual
  relative weights for operations on the given machine.

  (Under development.)  We plan to provide a performance parameter
  database format that can be filled in either by hand or by an automated
  training mechanism.  The analysis module will then use this database
  instead of the built-in generic parameters.
Implementation Issues
Stack Traces

  Accurate stack traces are needed during profiling since we group events
  by call context and dynamic instance.  Without accurate traces,
  diagnostics may be hard to interpret.  For instance, when giving advice
  to the user it is imperative to reference application code, not library
  code.

  Currently we are using the libc backtrace routine to get stack traces.
  _GLIBCXX_PROFILE_MAX_STACK_DEPTH can be set to 0 if you are willing to
  give up call context information, or to a small positive value to
  reduce run time overhead.
Symbolization of Instruction Addresses

  The profiling and analysis phases use only instruction addresses.  An
  external utility such as addr2line is needed to postprocess the result.
  We do not plan to add symbolization support in the profile extension.
  This would require access to symbol tables, debug information tables,
  external programs or libraries and other system dependent information.
Concurrency

  Our current model is simplistic, but precise.  We cannot afford to
  approximate because some of our diagnostics require precise matching of
  operations to container instance and call context.  During profiling,
  we keep a single information table per diagnostic.  There is a single
  lock per information table.
Using the Standard Library in the Instrumentation Implementation

  As much as we would like to avoid uses of libstdc++ within our
  instrumentation library, containers such as unordered_map are very
  appealing.  We plan to use them as long as they are named properly to
  avoid ambiguity.
Malloc Hooks

  User applications/libraries can provide malloc hooks.  When the
  implementation of the malloc hooks uses libstdc++, there can be an
  infinite cycle between the profile mode instrumentation and the malloc
  hook code.

  We protect against reentrance to the profile mode instrumentation code,
  which should avoid this problem in most cases.  The protection
  mechanism is thread safe and exception safe.  This mechanism does not
  prevent reentrance to the malloc hook itself, which could still result
  in deadlock, if, for instance, the malloc hook uses non-recursive
  locks.  XXX: A definitive solution to this problem would be for the
  profile extension to use a custom allocator internally, and perhaps not
  to use libstdc++.
Construction and Destruction of Global Objects

  The profiling library state is initialized at the first call to a
  profiling method.  This allows us to record the construction of all
  global objects.  However, we cannot do the same at destruction time.
  The trace is written by a function registered by atexit, thus invoked
  by exit.
Developer Information

Big Picture

  The profile mode headers are included with -D_GLIBCXX_PROFILE through
  preprocessor directives in include/std/*.

  Instrumented implementations are provided in include/profile/*.  All
  instrumentation hooks are macros defined in
  include/profile/profiler.h.

  All the implementation of the instrumentation hooks is in
  include/profile/impl/*.  Although all the code gets included, and thus
  is publicly visible, only a small number of functions are called from
  outside this directory.  All calls to hook implementations must be done
  through macros defined in profiler.h.  The macro must ensure (1) that
  the call is guarded against reentrance and (2) that the call can be
  turned off at compile time using a -D_GLIBCXX_PROFILE_... compiler
  option.
How To Add A Diagnostic

  Let's say the diagnostic name is "magic".

  If you need to instrument a header not already under
  include/profile/*, first edit the corresponding header under
  include/std/ and add a preprocessor directive such as the one in
  include/std/vector:

#ifdef _GLIBCXX_PROFILE
# include <profile/vector>
#endif

  If the file you need to instrument is not yet under include/profile/,
  make a copy of the one in include/debug, or of the main
  implementation.  You'll need to include the main implementation and
  inherit the classes you want to instrument.  Then define the methods
  you want to instrument, define the instrumentation hooks and add calls
  to them.  Look at include/profile/vector for an example.

  Add macros for the instrumentation hooks in
  include/profile/impl/profiler.h.  Hook names must start with
  __profcxx_.  Make sure they expand to no code with
  -D_GLIBCXX_PROFILE_NO_MAGIC.  Make sure all calls to any method in
  namespace __gnu_profile are protected against reentrance using the
  macro _GLIBCXX_PROFILE_REENTRANCE_GUARD.  All names of methods in
  namespace __gnu_profile called from profiler.h must start with
  __trace_magic_.

  Add the implementation of the diagnostic.

  - Create new file include/profile/impl/profiler_magic.h.
  - Define class __magic_info: public __object_info_base.  This is the
    representation of a line in the object table.  The __merge method is
    used to aggregate information across all dynamic instances created at
    the same call context.  The __magnitude method must return the
    estimation of the benefit as a number of small operations, e.g.,
    number of words copied.  The __write method is used to produce the
    raw trace.  The __advice method is used to produce the advice string.
  - Define class __magic_stack_info: public __magic_info.  This defines
    the content of a line in the stack table.
  - Define class __trace_magic: public __trace_base<__magic_info,
    __magic_stack_info>.  It defines the content of the trace associated
    with this diagnostic.

  Add initialization and reporting calls in
  include/profile/impl/profiler_trace.h.  Use __trace_vector_to_list as
  an example.

  Add documentation in file doc/xml/manual/profile_mode.xml.
Diagnostics

  The table below presents all the diagnostics we intend to implement.
  Each diagnostic has a corresponding compile time switch
  -D_GLIBCXX_PROFILE_<diagnostic>.  Groups of related diagnostics can be
  turned on with a single switch.  For instance,
  -D_GLIBCXX_PROFILE_LOCALITY is equivalent to
  -D_GLIBCXX_PROFILE_SOFTWARE_PREFETCH
  -D_GLIBCXX_PROFILE_RBTREE_LOCALITY.

  The benefit, cost, expected frequency and accuracy of each diagnostic
  was given a grade from 1 to 10, where 10 is highest.  A high benefit
  means that, if the diagnostic is accurate, the expected performance
  improvement is high.  A high cost means that turning this diagnostic on
  leads to high slowdown.  A high frequency means that we expect this to
  occur relatively often.  A high accuracy means that the diagnostic is
  unlikely to be wrong.  These grades are not perfect.  They are just
  meant to guide users with specific needs or time budgets.
716
  Table: Profile Diagnostics

    Group       Flag                   Benefit  Cost  Freq.  Implemented
    ----------  ---------------------  -------  ----  -----  ----------------------
    CONTAINERS  HASHTABLE_TOO_SMALL         10     1     10  yes
                HASHTABLE_TOO_LARGE          5     1     10  yes
                INEFFICIENT_HASH             7     3     10  yes
                VECTOR_TOO_SMALL             8     1     10  yes
                VECTOR_TOO_LARGE             5     1     10  yes
                VECTOR_TO_HASHTABLE          7     7     10  no
                HASHTABLE_TO_VECTOR          7     7     10  no
                VECTOR_TO_LIST               8     5     10  yes
                LIST_TO_VECTOR              10     5     10  no
                ORDERED_TO_UNORDERED        10     5     10  only map/unordered_map
    ALGORITHMS  SORT                         7     8      7  no
    LOCALITY    SOFTWARE_PREFETCH            8     8      5  no
                RBTREE_LOCALITY              4     8      5  no
                FALSE_SHARING                8    10     10  no
Diagnostic Template

  Switch:  _GLIBCXX_PROFILE_<diagnostic>.

  Goal:  What problem will it diagnose?

  Fundamentals:  What is the fundamental reason why this is a problem?

  Sample runtime reduction:  Percentage reduction in execution time.
  When reduction is more than a constant factor, describe the reduction
  rate formula.

  Recommendation:  What would the advice look like?

  To instrument:  What libstdc++ components need to be instrumented?

  Analysis:  How do we decide when to issue the advice?

  Cost model:  How do we measure benefits?  Math goes here.

  Example:

program code
...
advice sample
Containers

  Switch:  _GLIBCXX_PROFILE_CONTAINERS.
Hashtable Too Small

  Switch:  _GLIBCXX_PROFILE_HASHTABLE_TOO_SMALL.

  Goal:  Detect hashtables with many rehash operations, small
  construction size and large destruction size.

  Fundamentals:  Rehash is very expensive.  Read content, follow chains
  within bucket, evaluate hash function, place at new location in
  different order.

  Sample runtime reduction:  36%.  Code similar to example below.

  Recommendation:  Set initial size to N at construction site S.

  To instrument:  unordered_set, unordered_map constructor, destructor,
  rehash.

  Analysis:  For each dynamic instance of unordered_[multi]set|map,
  record initial size and call context of the constructor.  Record size
  increase, if any, after each relevant operation such as insert.  Record
  the estimated rehash cost.

  Cost model:  Number of individual rehash operations * cost per rehash.

  Example:

1 unordered_set<int> us;
2 for (int k = 0; k < 1000000; ++k) {
3   us.insert(k);
4 }

foo.cc:1: advice: Changing initial unordered_set size from 10 to 1000000 saves 1025530 rehash operations.
971
Hashtable Too Large

  Switch:  _GLIBCXX_PROFILE_HASHTABLE_TOO_LARGE.

  Goal:  Detect hashtables which are never filled up because fewer
  elements than reserved are ever inserted.

  Fundamentals:  Save memory, which is good in itself and may also
  improve memory reference performance through fewer cache and TLB
  misses.

  Sample runtime reduction:  unknown.

  Recommendation:  Set initial size to N at construction site S.

  To instrument:  unordered_set, unordered_map constructor, destructor,
  rehash.

  Analysis:  For each dynamic instance of unordered_[multi]set|map,
  record initial size and call context of the constructor, and correlate
  it with its size at destruction time.

  Cost model:  Number of iteration operations + memory saved.

  Example:

1 vector<unordered_set<int>> v(100000, unordered_set<int>(100));
2 for (int k = 0; k < 100000; ++k) {
3   for (int j = 0; j < 10; ++j) {
4     v[k].insert(k + j);
5   }
6 }

foo.cc:1: advice: Changing initial unordered_set size from 100 to 10 saves N
bytes of memory and M iteration steps.
1016
Inefficient Hash

  Switch:  _GLIBCXX_PROFILE_INEFFICIENT_HASH.

  Goal:  Detect hashtables with polarized distribution.

  Fundamentals:  A non-uniform distribution may lead to long chains, thus
  possibly increasing complexity by a factor up to the number of
  elements.

  Sample runtime reduction:  factor up to container size.

  Recommendation:  Change hash function for container built at site S.
  Distribution score = N.  Access score = S.  Longest chain = C, in
  bucket B.

  To instrument:  unordered_set, unordered_map constructor, destructor,
  [], insert, iterator.

  Analysis:  Count the exact number of link traversals.

  Cost model:  Total number of links traversed.

  Example:

class dumb_hash {
 public:
  size_t operator() (int i) const { return 0; }
};
...
  unordered_set<int, dumb_hash> hs;
  ...
  for (int i = 0; i < COUNT; ++i) {
    hs.find(i);
  }
1063
Vector Too Small

  Switch:  _GLIBCXX_PROFILE_VECTOR_TOO_SMALL.

  Goal:  Detect vectors with many resize operations, small construction
  size and large destruction size.

  Fundamentals:  Resizing can be expensive.  Copying large amounts of
  data takes time.  Resizing many small vectors may have allocation
  overhead and affect locality.

  Sample runtime reduction:  %.

  Recommendation:  Set initial size to N at construction site S.

  To instrument:  vector.

  Analysis:  For each dynamic instance of vector, record initial size and
  call context of the constructor.  Record size increase, if any, after
  each relevant operation such as push_back.  Record the estimated resize
  cost.

  Cost model:  Total number of words copied * time to copy a word.

  Example:

1 vector<int> v;
2 for (int k = 0; k < 1000000; ++k) {
3   v.push_back(k);
4 }

foo.cc:1: advice: Changing initial vector size from 10 to 1000000 saves
copying 4000000 bytes and 20 memory allocations and deallocations.
1098
1099
1100
1101
1102
 
Vector Too Large

  Switch:
  _GLIBCXX_PROFILE_VECTOR_TOO_LARGE.

  Goal: Detect vectors which are
  never filled up because fewer elements than reserved are ever
  inserted.

  Fundamentals: Save memory, which
  is good in itself and may also improve memory reference performance through
  fewer cache and TLB misses.

  Sample runtime reduction: %.

  Recommendation:
  Set the initial size to N at construction site S.

  To instrument: vector.

  Analysis:
  For each dynamic instance of vector,
  record the initial size and the call context of the constructor, and
  correlate it with its size at destruction time.

  Cost model:
  Total amount of memory saved.

  Example:

1 vector<vector<int>> v(100000, vector<int>(100));
2 for (int k = 0; k < 100000; ++k) {
3   for (int j = 0; j < 10; ++j) {
4     v[k][j] = k + j;
5   }
6 }

foo.cc:1: advice: Changing initial vector size from 100 to 10 saves N
bytes of memory and may reduce the number of cache and TLB misses.
 
Vector to Hashtable

  Switch:
  _GLIBCXX_PROFILE_VECTOR_TO_HASHTABLE.

  Goal: Detect uses of
  vector that can be substituted with unordered_set
  to reduce execution time.

  Fundamentals:
  Linear search in a vector is very expensive, whereas searching in a hashtable
  is very quick.

  Sample runtime reduction: factor of up
  to the container size.

  Recommendation: Replace
  vector with unordered_set at site S.

  To instrument: vector
  operations and access methods.

  Analysis:
  For each dynamic instance of vector,
  record the call context of the constructor.  Issue the advice only if the
  only methods called on this vector are push_back,
  insert and find.

  Cost model:
  Cost(vector::push_back) + cost(vector::insert) + cost(find, vector) -
  cost(unordered_set::insert) - cost(unordered_set::find).

  Example:

1  vector<int> v;
...
2  for (int i = 0; i < 1000; ++i) {
3    find(v.begin(), v.end(), i);
4  }

foo.cc:1: advice: Changing "vector" to "unordered_set" will save about 500,000
comparisons.
 
Hashtable to Vector

  Switch:
  _GLIBCXX_PROFILE_HASHTABLE_TO_VECTOR.

  Goal: Detect uses of
  unordered_set that can be substituted with vector
  to reduce execution time.

  Fundamentals:
  Hashtable iterators are slower than vector iterators.

  Sample runtime reduction: 95%.

  Recommendation: Replace
  unordered_set with vector at site S.

  To instrument: unordered_set
  operations and access methods.

  Analysis:
  For each dynamic instance of unordered_set,
  record the call context of the constructor.  Issue the advice only if the
  number of find, insert and []
  operations on this unordered_set is small relative to the
  number of elements, and methods begin or end
  are invoked (suggesting iteration).

  Cost model:
  Number of indirections.

  Example:

1  unordered_set<int> us;
...
2  int s = 0;
3  for (unordered_set<int>::iterator it = us.begin(); it != us.end(); ++it) {
4    s += *it;
5  }

foo.cc:1: advice: Changing "unordered_set" to "vector" will save about N
indirections and may achieve better data locality.
 
Vector to List

  Switch:
  _GLIBCXX_PROFILE_VECTOR_TO_LIST.

  Goal: Detect cases where
  vector could be substituted with list for
  better performance.

  Fundamentals:
  Inserting in the middle of a vector is expensive compared to inserting in a
  list.

  Sample runtime reduction: factor of up to
  the container size.

  Recommendation: Replace vector with list
  at site S.

  To instrument: vector
  operations and access methods.

  Analysis:
  For each dynamic instance of vector,
  record the call context of the constructor.  Record the overhead of each
  insert operation based on the current size and the insert position.
  Report instances with high insertion overhead.

  Cost model:
  (Sum(cost(vector::method)) - Sum(cost(list::method)), for
  method in [push_back, insert, erase])
  + (Cost(iterate vector) - Cost(iterate list))

  Example:

1  vector<int> v;
2  for (int i = 0; i < 10000; ++i) {
3    v.insert(v.begin(), i);
4  }

foo.cc:1: advice: Changing "vector" to "list" will save about 5,000,000
operations.
 
List to Vector

  Switch:
  _GLIBCXX_PROFILE_LIST_TO_VECTOR.

  Goal: Detect cases where
  list could be substituted with vector for
  better performance.

  Fundamentals:
  Iterating through a vector is faster than through a list.

  Sample runtime reduction: 64%.

  Recommendation: Replace list with vector
  at site S.

  To instrument: list
  operations and access methods.

  Analysis:
  Issue the advice if there are no insert operations.

  Cost model:
  (Sum(cost(vector::method)) - Sum(cost(list::method)), for
  method in [push_back, insert, erase])
  + (Cost(iterate vector) - Cost(iterate list))

  Example:

1  list<int> l;
...
2  int sum = 0;
3  for (list<int>::iterator it = l.begin(); it != l.end(); ++it) {
4    sum += *it;
5  }

foo.cc:1: advice: Changing "list" to "vector" will save about 1,000,000 indirect
memory references.
 
List to Forward List (Slist)

  Switch:
  _GLIBCXX_PROFILE_LIST_TO_SLIST.

  Goal: Detect cases where
  list could be substituted with forward_list for
  better performance.

  Fundamentals:
  The memory footprint of a forward_list is smaller than that of a list.
  This has beneficial effects on the memory subsystem, e.g., fewer cache misses.

  Sample runtime reduction: 40%.
  Note that the reduction is only noticeable if the size of the list node is
  in fact larger than that of the forward_list node.  For memory allocators
  with size classes, you will only notice an effect when the two node sizes
  belong to different allocator size classes.

  Recommendation: Replace list with
  forward_list at site S.

  To instrument: list
  operations and iteration methods.

  Analysis:
  Issue the advice if there are no backwards traversals
  or insertions before a given node.

  Cost model:
  Always true.

  Example:

1  list<int> l;
...
2  int sum = 0;
3  for (list<int>::iterator it = l.begin(); it != l.end(); ++it) {
4    sum += *it;
5  }

foo.cc:1: advice: Change "list" to "forward_list".
 
Ordered to Unordered Associative Container

  Switch:
  _GLIBCXX_PROFILE_ORDERED_TO_UNORDERED.

  Goal: Detect cases where ordered
  associative containers can be replaced with unordered ones.

  Fundamentals:
  Insert and search are quicker in a hashtable than in
  a red-black tree.

  Sample runtime reduction: 52%.

  Recommendation:
  Replace set with unordered_set at site S.

  To instrument:
  set, multiset, map,
  multimap methods.

  Analysis:
  Issue the advice only if we are not using operator ++ on any
  iterator on a particular [multi]set|map.

  Cost model:
  (Sum(cost(hashtable::method)) - Sum(cost(rbtree::method)), for
  method in [insert, erase, find])
  + (Cost(iterate hashtable) - Cost(iterate rbtree))

  Example:

1  set<int> s;
2  for (int i = 0; i < 100000; ++i) {
3    s.insert(i);
4  }
5  int sum = 0;
6  for (int i = 0; i < 100000; ++i) {
7    sum += *s.find(i);
8  }
 

Algorithms

  Switch:
  _GLIBCXX_PROFILE_ALGORITHMS.
Sort Algorithm Performance

  Switch:
  _GLIBCXX_PROFILE_SORT.

  Goal: Give a measure of sort algorithm
  performance based on actual input.  For instance, advise Radix Sort over
  Quick Sort for a particular call context.

  Fundamentals:
  See papers:
  A framework for adaptive algorithm selection in STAPL and
  Optimizing Sorting with Machine Learning Algorithms.

  Sample runtime reduction: 60%.

  Recommendation: Change the sort algorithm
  at site S from X Sort to Y Sort.

  To instrument: sort
  algorithm.

  Analysis:
  Issue the advice if the cost model tells us that another sort algorithm
  would do better on this input.  This requires us to know which algorithm
  our sort implementation uses in release mode.

  Cost model:
  Runtime(algo) for algo in [radix, quick, merge, ...]
 

Data Locality

  Switch:
  _GLIBCXX_PROFILE_LOCALITY.
 
Need Software Prefetch

  Switch:
  _GLIBCXX_PROFILE_SOFTWARE_PREFETCH.

  Goal: Discover sequences of indirect
  memory accesses that are not regular, and thus cannot be predicted by
  hardware prefetchers.

  Fundamentals:
  Indirect references are hard to predict and are very expensive when they
  miss in caches.

  Sample runtime reduction: 25%.

  Recommendation: Insert prefetch
  instructions.

  To instrument: Vector iterator and
  access operator [].

  Analysis:
  First, get the cache line size and page size from the system.
  Then record iterator dereference sequences for which the value is a pointer.
  For each sequence within a container, issue a warning if successive pointer
  addresses are not within cache lines and do not form a linear pattern
  (otherwise they may be prefetched by hardware).
  If they also step across page boundaries, make the warning stronger.

  The same analysis applies to containers other than vector.
  However, we cannot give the same advice for linked structures, such as list,
  as there is no random access to the n-th element.  The user may still be
  able to benefit from this information, for instance by employing frays (user
  level light weight threads) to hide the latency of chasing pointers.

  This analysis is a little oversimplified.  A better cost model could be
  created by understanding the capability of the hardware prefetcher.
  This model could be trained automatically by running a set of synthetic
  cases.

  Cost model:
  Total distance between pointer values of successive elements in vectors
  of pointers.

  Example:

1 int zero = 0;
2 vector<int*> v(10000000, &zero);
3 for (int k = 0; k < 10000000; ++k) {
4   v[random() % 10000000] = new int(k);
5 }
6 for (int j = 0; j < 10000000; ++j) {
7   count += (*v[j] == 0 ? 0 : 1);
8 }

foo.cc:7: advice: Insert prefetch instruction.
 
Linked Structure Locality

  Switch:
  _GLIBCXX_PROFILE_RBTREE_LOCALITY.

  Goal: Give a measure of the locality of
  objects stored in linked structures (lists, red-black trees and hashtables)
  with respect to their actual traversal patterns.

  Fundamentals: Allocation can be tuned
  to a specific traversal pattern, resulting in better data locality.
  See paper:
  Custom Memory Allocation for Free.

  Sample runtime reduction: 30%.

  Recommendation:
  High scatter score N for container built at site S.
  Consider changing the allocation sequence or choosing a structure conscious
  allocator.

  To instrument: Methods of all
  containers using linked structures.

  Analysis:
  First, get the cache line size and page size from the system.
  Then record the number of successive elements that are on a different line
  or page, for each traversal method such as find.  Give advice
  only if the ratio between this number and the total number of node hops
  is above a threshold.

  Cost model:
  Sum(same_cache_line(this, previous))

  Example:

 1  set<int> s;
 2  for (int i = 0; i < 10000000; ++i) {
 3    s.insert(i);
 4  }
 5  set<int> s1, s2;
 6  for (int i = 0; i < 10000000; ++i) {
 7    s1.insert(i);
 8    s2.insert(i);
 9  }
...
      // Fast, better locality.
10    for (set<int>::iterator it = s.begin(); it != s.end(); ++it) {
11      sum += *it;
12    }
      // Slow, elements are further apart.
13    for (set<int>::iterator it = s1.begin(); it != s1.end(); ++it) {
14      sum += *it;
15    }

foo.cc:5: advice: High scatter score NNN for set built here.  Consider changing
the allocation sequence or switching to a structure conscious allocator.
 

Multithreaded Data Access

  The diagnostics in this group are not meant to be implemented short term.
  They require compiler support to know when container elements are written
  to.  Instrumentation can only tell us when elements are referenced.

  Switch:
  _GLIBCXX_PROFILE_MULTITHREADED.
 
Data Dependence Violations at Container Level

  Switch:
  _GLIBCXX_PROFILE_DDTEST.

  Goal: Detect container elements
  that are referenced from multiple threads in a parallel region or
  across parallel regions.

  Fundamentals:
  Sharing data between threads requires communication and perhaps locking,
  which may be expensive.

  Sample runtime reduction: ?%.

  Recommendation: Change the data
  distribution or the parallel algorithm.

  To instrument: Container access methods
  and iterators.

  Analysis:
  Keep a shadow for each container.  Record iterator dereferences and
  container member accesses.  Issue advice for elements referenced by
  multiple threads.
  See paper:
  The LRPD test: speculative run-time parallelization of loops with
  privatization and reduction parallelization.

  Cost model:
  Number of accesses to elements referenced from multiple threads.
 
False Sharing

  Switch:
  _GLIBCXX_PROFILE_FALSE_SHARING.

  Goal: Detect elements in the
  same container which share a cache line, are written by at least one
  thread, and accessed by different threads.

  Fundamentals: Under these assumptions,
  cache protocols require
  communication to invalidate lines, which may be expensive.

  Sample runtime reduction: 68%.

  Recommendation: Reorganize the container
  or use padding to avoid false sharing.

  To instrument: Container access methods
  and iterators.

  Analysis:
  First, get the cache line size.
  For each shared container, record all the associated iterator dereferences
  and member access methods with the thread id.  Compare the address lists
  across threads to detect references in two different threads to the same
  cache line.  Issue a warning only if the ratio to total references is
  significant.  Do the same for iterator dereference values if they are
  pointers.

  Cost model:
  Number of accesses to the same cache line from different threads.

  Example:

1     vector<int> v(2, 0);
2 #pragma omp parallel for shared(v, SIZE) schedule(static, 1)
3     for (i = 0; i < SIZE; ++i) {
4       v[i % 2] += i;
5     }

OMP_NUM_THREADS=2 ./a.out
foo.cc:1: advice: Change container structure or padding to avoid false
sharing in multithreaded access at foo.cc:4.  Detected N shared cache lines.
 

Statistics

  Switch:
  _GLIBCXX_PROFILE_STATISTICS.

  In some cases the cost model may not tell us anything because the costs
  appear to offset the benefits.  Consider the choice between a vector and
  a list.  When there are both inserts and iteration, automatic advice
  may not be issued.  However, the programmer may still be able to make use
  of this information in a different way.

  This diagnostic does not issue any advice, but it prints statistics for
  each container construction site.  The statistics contain the cost
  of each operation actually performed on the container.

Bibliography

  Lixia Liu and Silvius Rus.
  Perflint: A Context Sensitive Performance Advisor for C++ Programs.
  Proceedings of the 2009 International Symposium on Code Generation
  and Optimization, 2009.
