1 |
424 |
jeremybenn |
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
2 |
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
3 |
|
|
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Diagnostics</title><meta name="generator" content="DocBook XSL Stylesheets V1.75.2" /><meta name="keywords" content=" C++ , library , profile " /><meta name="keywords" content=" ISO C++ , library " /><link rel="home" href="../spine.html" title="The GNU C++ Library Documentation" /><link rel="up" href="profile_mode.html" title="Chapter 32. Profile Mode" /><link rel="prev" href="bk01pt12ch32s06.html" title="Developer Information" /><link rel="next" href="ext_allocators.html" title="Chapter 33. Allocators" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Diagnostics</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="bk01pt12ch32s06.html">Prev</a> </td><th width="60%" align="center">Chapter 32. Profile Mode</th><td width="20%" align="right"> <a accesskey="n" href="ext_allocators.html">Next</a></td></tr></table><hr /></div><div class="sect1" title="Diagnostics"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="manual.ext.profile_mode.diagnostics"></a>Diagnostics</h2></div></div></div><p>
|
4 |
|
|
The table below presents all the diagnostics we intend to implement.
|
5 |
|
|
Each diagnostic has a corresponding compile time switch
|
6 |
|
|
<code class="code">-D_GLIBCXX_PROFILE_<diagnostic></code>.
|
7 |
|
|
Groups of related diagnostics can be turned on with a single switch.
|
8 |
|
|
For instance, <code class="code">-D_GLIBCXX_PROFILE_LOCALITY</code> is equivalent to
|
9 |
|
|
<code class="code">-D_GLIBCXX_PROFILE_SOFTWARE_PREFETCH
|
10 |
|
|
-D_GLIBCXX_PROFILE_RBTREE_LOCALITY</code>.
|
11 |
|
|
</p><p>
|
12 |
|
|
The benefit, cost, expected frequency and accuracy of each diagnostic
|
13 |
|
|
was given a grade from 1 to 10, where 10 is highest.
|
14 |
|
|
A high benefit means that, if the diagnostic is accurate, the expected
|
15 |
|
|
performance improvement is high.
|
16 |
|
|
A high cost means that turning this diagnostic on leads to high slowdown.
|
17 |
|
|
A high frequency means that we expect this to occur relatively often.
|
18 |
|
|
A high accuracy means that the diagnostic is unlikely to be wrong.
|
19 |
|
|
These grades are not perfect. They are just meant to guide users with
|
20 |
|
|
specific needs or time budgets.
|
21 |
|
|
</p><div class="table"><a id="id626153"></a><p class="title"><b>Table 32.2. Diagnostics</b></p><div class="table-contents"><table summary="Diagnostics" border="1"><colgroup><col align="left" /><col align="left" /><col align="left" /><col align="left" /><col align="left" /><col align="left" /></colgroup><thead><tr><th align="left">Group</th><th align="left">Flag</th><th align="left">Benefit</th><th align="left">Cost</th><th align="left">Freq.</th><th align="left">Implemented</th></tr></thead><tbody><tr><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.containers" target="_top">
|
22 |
|
|
CONTAINERS</a></td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.hashtable_too_small" target="_top">
|
23 |
|
|
HASHTABLE_TOO_SMALL</a></td><td align="left">10</td><td align="left">1</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.hashtable_too_large" target="_top">
|
24 |
|
|
HASHTABLE_TOO_LARGE</a></td><td align="left">5</td><td align="left">1</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.inefficient_hash" target="_top">
|
25 |
|
|
INEFFICIENT_HASH</a></td><td align="left">7</td><td align="left">3</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.vector_too_small" target="_top">
|
26 |
|
|
VECTOR_TOO_SMALL</a></td><td align="left">8</td><td align="left">1</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.vector_too_large" target="_top">
|
27 |
|
|
VECTOR_TOO_LARGE</a></td><td align="left">5</td><td align="left">1</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.vector_to_hashtable" target="_top">
|
28 |
|
|
VECTOR_TO_HASHTABLE</a></td><td align="left">7</td><td align="left">7</td><td align="left"> </td><td align="left">10</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.hashtable_to_vector" target="_top">
|
29 |
|
|
HASHTABLE_TO_VECTOR</a></td><td align="left">7</td><td align="left">7</td><td align="left"> </td><td align="left">10</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.vector_to_list" target="_top">
|
30 |
|
|
VECTOR_TO_LIST</a></td><td align="left">8</td><td align="left">5</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.list_to_vector" target="_top">
|
31 |
|
|
LIST_TO_VECTOR</a></td><td align="left">10</td><td align="left">5</td><td align="left"> </td><td align="left">10</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.assoc_ord_to_unord" target="_top">
|
32 |
|
|
ORDERED_TO_UNORDERED</a></td><td align="left">10</td><td align="left">5</td><td align="left"> </td><td align="left">10</td><td align="left">only map/unordered_map</td></tr><tr><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.algorithms" target="_top">
|
33 |
|
|
ALGORITHMS</a></td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.algorithms.sort" target="_top">
|
34 |
|
|
SORT</a></td><td align="left">7</td><td align="left">8</td><td align="left"> </td><td align="left">7</td><td align="left">no</td></tr><tr><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.locality" target="_top">
|
35 |
|
|
LOCALITY</a></td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.locality.sw_prefetch" target="_top">
|
36 |
|
|
SOFTWARE_PREFETCH</a></td><td align="left">8</td><td align="left">8</td><td align="left"> </td><td align="left">5</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.locality.linked" target="_top">
|
37 |
|
|
RBTREE_LOCALITY</a></td><td align="left">4</td><td align="left">8</td><td align="left"> </td><td align="left">5</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="ulink" href="#manual.ext.profile_mode.analysis.mthread.false_share" target="_top">
|
38 |
|
|
FALSE_SHARING</a></td><td align="left">8</td><td align="left">10</td><td align="left"> </td><td align="left">10</td><td align="left">no</td></tr></tbody></table></div></div><br class="table-break" /><div class="sect2" title="Diagnostic Template"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.template"></a>Diagnostic Template</h3></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
39 |
|
|
<code class="code">_GLIBCXX_PROFILE_<diagnostic></code>.
|
40 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> What problem will it diagnose?
|
41 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>.
|
42 |
|
|
What is the fundamental reason why this is a problem</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>
|
43 |
|
|
Percentage reduction in execution time. When reduction is more than
|
44 |
|
|
a constant factor, describe the reduction rate formula.
|
45 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>
|
46 |
|
|
What would the advise look like?</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span>
|
47 |
|
|
What stdlibc++ components need to be instrumented?</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
48 |
|
|
How do we decide when to issue the advice?</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
49 |
|
|
How do we measure benefits? Math goes here.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
50 |
|
|
</p><pre class="programlisting">
|
51 |
|
|
program code
|
52 |
|
|
...
|
53 |
|
|
advice sample
|
54 |
|
|
</pre><p>
|
55 |
|
|
</p></li></ul></div></div><div class="sect2" title="Containers"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.containers"></a>Containers</h3></div></div></div><p>
|
56 |
|
|
<span class="emphasis"><em>Switch:</em></span>
|
57 |
|
|
<code class="code">_GLIBCXX_PROFILE_CONTAINERS</code>.
|
58 |
|
|
</p><div class="sect3" title="Hashtable Too Small"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_too_small"></a>Hashtable Too Small</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
59 |
|
|
<code class="code">_GLIBCXX_PROFILE_HASHTABLE_TOO_SMALL</code>.
|
60 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables with many
|
61 |
|
|
rehash operations, small construction size and large destruction size.
|
62 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Rehash is very expensive.
|
63 |
|
|
Read content, follow chains within bucket, evaluate hash function, place at
|
64 |
|
|
new location in different order.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> 36%.
|
65 |
|
|
Code similar to example below.
|
66 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>
|
67 |
|
|
Set initial size to N at construction site S.
|
68 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span>
|
69 |
|
|
<code class="code">unordered_set, unordered_map</code> constructor, destructor, rehash.
|
70 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
71 |
|
|
For each dynamic instance of <code class="code">unordered_[multi]set|map</code>,
|
72 |
|
|
record initial size and call context of the constructor.
|
73 |
|
|
Record size increase, if any, after each relevant operation such as insert.
|
74 |
|
|
Record the estimated rehash cost.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
75 |
|
|
Number of individual rehash operations * cost per rehash.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
76 |
|
|
</p><pre class="programlisting">
|
77 |
|
|
1 unordered_set<int> us;
|
78 |
|
|
2 for (int k = 0; k < 1000000; ++k) {
|
79 |
|
|
3 us.insert(k);
|
80 |
|
|
4 }
|
81 |
|
|
|
82 |
|
|
foo.cc:1: advice: Changing initial unordered_set size from 10 to 1000000 saves 1025530 rehash operations.
|
83 |
|
|
</pre><p>
|
84 |
|
|
</p></li></ul></div></div><div class="sect3" title="Hashtable Too Large"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_too_large"></a>Hashtable Too Large</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
85 |
|
|
<code class="code">_GLIBCXX_PROFILE_HASHTABLE_TOO_LARGE</code>.
|
86 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables which are
|
87 |
|
|
never filled up because fewer elements than reserved are ever
|
88 |
|
|
inserted.
|
89 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Save memory, which
|
90 |
|
|
is good in itself and may also improve memory reference performance through
|
91 |
|
|
fewer cache and TLB misses.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> unknown.
|
92 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>
|
93 |
|
|
Set initial size to N at construction site S.
|
94 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span>
|
95 |
|
|
<code class="code">unordered_set, unordered_map</code> constructor, destructor, rehash.
|
96 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
97 |
|
|
For each dynamic instance of <code class="code">unordered_[multi]set|map</code>,
|
98 |
|
|
record initial size and call context of the constructor, and correlate it
|
99 |
|
|
with its size at destruction time.
|
100 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
101 |
|
|
Number of iteration operations + memory saved.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
102 |
|
|
</p><pre class="programlisting">
|
103 |
|
|
1 vector<unordered_set<int>> v(100000, unordered_set<int>(100)) ;
|
104 |
|
|
2 for (int k = 0; k < 100000; ++k) {
|
105 |
|
|
3 for (int j = 0; j < 10; ++j) {
|
106 |
|
|
4 v[k].insert(k + j);
|
107 |
|
|
5 }
|
108 |
|
|
6 }
|
109 |
|
|
|
110 |
|
|
foo.cc:1: advice: Changing initial unordered_set size from 100 to 10 saves N
|
111 |
|
|
bytes of memory and M iteration steps.
|
112 |
|
|
</pre><p>
|
113 |
|
|
</p></li></ul></div></div><div class="sect3" title="Inefficient Hash"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.inefficient_hash"></a>Inefficient Hash</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
114 |
|
|
<code class="code">_GLIBCXX_PROFILE_INEFFICIENT_HASH</code>.
|
115 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables with polarized
|
116 |
|
|
distribution.
|
117 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> A non-uniform
|
118 |
|
|
distribution may lead to long chains, thus possibly increasing complexity
|
119 |
|
|
by a factor up to the number of elements.
|
120 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> factor up
|
121 |
|
|
to container size.
|
122 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change hash function
|
123 |
|
|
for container built at site S. Distribution score = N. Access score = S.
|
124 |
|
|
Longest chain = C, in bucket B.
|
125 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span>
|
126 |
|
|
<code class="code">unordered_set, unordered_map</code> constructor, destructor, [],
|
127 |
|
|
insert, iterator.
|
128 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
129 |
|
|
Count the exact number of link traversals.
|
130 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
131 |
|
|
Total number of links traversed.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
132 |
|
|
</p><pre class="programlisting">
|
133 |
|
|
class dumb_hash {
|
134 |
|
|
public:
|
135 |
|
|
size_t operator() (int i) const { return 0; }
|
136 |
|
|
};
|
137 |
|
|
...
|
138 |
|
|
unordered_set<int, dumb_hash> hs;
|
139 |
|
|
...
|
140 |
|
|
for (int i = 0; i < COUNT; ++i) {
|
141 |
|
|
hs.find(i);
|
142 |
|
|
}
|
143 |
|
|
</pre><p>
|
144 |
|
|
</p></li></ul></div></div><div class="sect3" title="Vector Too Small"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_too_small"></a>Vector Too Small</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
145 |
|
|
<code class="code">_GLIBCXX_PROFILE_VECTOR_TOO_SMALL</code>.
|
146 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span>Detect vectors with many
|
147 |
|
|
resize operations, small construction size and large destruction size..
|
148 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Resizing can be expensive.
|
149 |
|
|
Copying large amounts of data takes time. Resizing many small vectors may
|
150 |
|
|
have allocation overhead and affect locality.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>%.
|
151 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>
|
152 |
|
|
Set initial size to N at construction site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>.
|
153 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
154 |
|
|
For each dynamic instance of <code class="code">vector</code>,
|
155 |
|
|
record initial size and call context of the constructor.
|
156 |
|
|
Record size increase, if any, after each relevant operation such as
|
157 |
|
|
<code class="code">push_back</code>. Record the estimated resize cost.
|
158 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
159 |
|
|
Total number of words copied * time to copy a word.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
160 |
|
|
</p><pre class="programlisting">
|
161 |
|
|
1 vector<int> v;
|
162 |
|
|
2 for (int k = 0; k < 1000000; ++k) {
|
163 |
|
|
3 v.push_back(k);
|
164 |
|
|
4 }
|
165 |
|
|
|
166 |
|
|
foo.cc:1: advice: Changing initial vector size from 10 to 1000000 saves
|
167 |
|
|
copying 4000000 bytes and 20 memory allocations and deallocations.
|
168 |
|
|
</pre><p>
|
169 |
|
|
</p></li></ul></div></div><div class="sect3" title="Vector Too Large"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_too_large"></a>Vector Too Large</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
170 |
|
|
<code class="code">_GLIBCXX_PROFILE_VECTOR_TOO_LARGE</code>
|
171 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span>Detect vectors which are
|
172 |
|
|
never filled up because fewer elements than reserved are ever
|
173 |
|
|
inserted.
|
174 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Save memory, which
|
175 |
|
|
is good in itself and may also improve memory reference performance through
|
176 |
|
|
fewer cache and TLB misses.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>%.
|
177 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>
|
178 |
|
|
Set initial size to N at construction site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>.
|
179 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
180 |
|
|
For each dynamic instance of <code class="code">vector</code>,
|
181 |
|
|
record initial size and call context of the constructor, and correlate it
|
182 |
|
|
with its size at destruction time.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
183 |
|
|
Total amount of memory saved.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
184 |
|
|
</p><pre class="programlisting">
|
185 |
|
|
1 vector<vector<int>> v(100000, vector<int>(100)) ;
|
186 |
|
|
2 for (int k = 0; k < 100000; ++k) {
|
187 |
|
|
3 for (int j = 0; j < 10; ++j) {
|
188 |
|
|
4 v[k].insert(k + j);
|
189 |
|
|
5 }
|
190 |
|
|
6 }
|
191 |
|
|
|
192 |
|
|
foo.cc:1: advice: Changing initial vector size from 100 to 10 saves N
|
193 |
|
|
bytes of memory and may reduce the number of cache and TLB misses.
|
194 |
|
|
</pre><p>
|
195 |
|
|
</p></li></ul></div></div><div class="sect3" title="Vector to Hashtable"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_to_hashtable"></a>Vector to Hashtable</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
196 |
|
|
<code class="code">_GLIBCXX_PROFILE_VECTOR_TO_HASHTABLE</code>.
|
197 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect uses of
|
198 |
|
|
<code class="code">vector</code> that can be substituted with <code class="code">unordered_set</code>
|
199 |
|
|
to reduce execution time.
|
200 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
201 |
|
|
Linear search in a vector is very expensive, whereas searching in a hashtable
|
202 |
|
|
is very quick.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>factor up
|
203 |
|
|
to container size.
|
204 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace
|
205 |
|
|
<code class="code">vector</code> with <code class="code">unordered_set</code> at site S.
|
206 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>
|
207 |
|
|
operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
208 |
|
|
For each dynamic instance of <code class="code">vector</code>,
|
209 |
|
|
record call context of the constructor. Issue the advice only if the
|
210 |
|
|
only methods called on this <code class="code">vector</code> are <code class="code">push_back</code>,
|
211 |
|
|
<code class="code">insert</code> and <code class="code">find</code>.
|
212 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
213 |
|
|
Cost(vector::push_back) + cost(vector::insert) + cost(find, vector) -
|
214 |
|
|
cost(unordered_set::insert) + cost(unordered_set::find).
|
215 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
216 |
|
|
</p><pre class="programlisting">
|
217 |
|
|
1 vector<int> v;
|
218 |
|
|
...
|
219 |
|
|
2 for (int i = 0; i < 1000; ++i) {
|
220 |
|
|
3 find(v.begin(), v.end(), i);
|
221 |
|
|
4 }
|
222 |
|
|
|
223 |
|
|
foo.cc:1: advice: Changing "vector" to "unordered_set" will save about 500,000
|
224 |
|
|
comparisons.
|
225 |
|
|
</pre><p>
|
226 |
|
|
</p></li></ul></div></div><div class="sect3" title="Hashtable to Vector"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_to_vector"></a>Hashtable to Vector</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
227 |
|
|
<code class="code">_GLIBCXX_PROFILE_HASHTABLE_TO_VECTOR</code>.
|
228 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect uses of
|
229 |
|
|
<code class="code">unordered_set</code> that can be substituted with <code class="code">vector</code>
|
230 |
|
|
to reduce execution time.
|
231 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
232 |
|
|
Hashtable iterator is slower than vector iterator.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>95%.
|
233 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace
|
234 |
|
|
<code class="code">unordered_set</code> with <code class="code">vector</code> at site S.
|
235 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">unordered_set</code>
|
236 |
|
|
operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
237 |
|
|
For each dynamic instance of <code class="code">unordered_set</code>,
|
238 |
|
|
record call context of the constructor. Issue the advice only if the
|
239 |
|
|
number of <code class="code">find</code>, <code class="code">insert</code> and <code class="code">[]</code>
|
240 |
|
|
operations on this <code class="code">unordered_set</code> are small relative to the
|
241 |
|
|
number of elements, and methods <code class="code">begin</code> or <code class="code">end</code>
|
242 |
|
|
are invoked (suggesting iteration).</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
243 |
|
|
Number of .</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
244 |
|
|
</p><pre class="programlisting">
|
245 |
|
|
1 unordered_set<int> us;
|
246 |
|
|
...
|
247 |
|
|
2 int s = 0;
|
248 |
|
|
3 for (unordered_set<int>::iterator it = us.begin(); it != us.end(); ++it) {
|
249 |
|
|
4 s += *it;
|
250 |
|
|
5 }
|
251 |
|
|
|
252 |
|
|
foo.cc:1: advice: Changing "unordered_set" to "vector" will save about N
|
253 |
|
|
indirections and may achieve better data locality.
|
254 |
|
|
</pre><p>
|
255 |
|
|
</p></li></ul></div></div><div class="sect3" title="Vector to List"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_to_list"></a>Vector to List</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
256 |
|
|
<code class="code">_GLIBCXX_PROFILE_VECTOR_TO_LIST</code>.
|
257 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where
|
258 |
|
|
<code class="code">vector</code> could be substituted with <code class="code">list</code> for
|
259 |
|
|
better performance.
|
260 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
261 |
|
|
Inserting in the middle of a vector is expensive compared to inserting in a
|
262 |
|
|
list.
|
263 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>factor up to
|
264 |
|
|
container size.
|
265 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace vector with list
|
266 |
|
|
at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>
|
267 |
|
|
operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
268 |
|
|
For each dynamic instance of <code class="code">vector</code>,
|
269 |
|
|
record the call context of the constructor. Record the overhead of each
|
270 |
|
|
<code class="code">insert</code> operation based on current size and insert position.
|
271 |
|
|
Report instance with high insertion overhead.
|
272 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
273 |
|
|
(Sum(cost(vector::method)) - Sum(cost(list::method)), for
|
274 |
|
|
method in [push_back, insert, erase])
|
275 |
|
|
+ (Cost(iterate vector) - Cost(iterate list))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
276 |
|
|
</p><pre class="programlisting">
|
277 |
|
|
1 vector<int> v;
|
278 |
|
|
2 for (int i = 0; i < 10000; ++i) {
|
279 |
|
|
3 v.insert(v.begin(), i);
|
280 |
|
|
4 }
|
281 |
|
|
|
282 |
|
|
foo.cc:1: advice: Changing "vector" to "list" will save about 5,000,000
|
283 |
|
|
operations.
|
284 |
|
|
</pre><p>
|
285 |
|
|
</p></li></ul></div></div><div class="sect3" title="List to Vector"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.list_to_vector"></a>List to Vector</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
286 |
|
|
<code class="code">_GLIBCXX_PROFILE_LIST_TO_VECTOR</code>.
|
287 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where
|
288 |
|
|
<code class="code">list</code> could be substituted with <code class="code">vector</code> for
|
289 |
|
|
better performance.
|
290 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
291 |
|
|
Iterating through a vector is faster than through a list.
|
292 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>64%.
|
293 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace list with vector
|
294 |
|
|
at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>
|
295 |
|
|
operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
296 |
|
|
Issue the advice if there are no <code class="code">insert</code> operations.
|
297 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
298 |
|
|
(Sum(cost(vector::method)) - Sum(cost(list::method)), for
|
299 |
|
|
method in [push_back, insert, erase])
|
300 |
|
|
+ (Cost(iterate vector) - Cost(iterate list))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
301 |
|
|
</p><pre class="programlisting">
|
302 |
|
|
1 list<int> l;
|
303 |
|
|
...
|
304 |
|
|
2 int sum = 0;
|
305 |
|
|
3 for (list<int>::iterator it = l.begin(); it != l.end(); ++it) {
|
306 |
|
|
4 sum += *it;
|
307 |
|
|
5 }
|
308 |
|
|
|
309 |
|
|
foo.cc:1: advice: Changing "list" to "vector" will save about 1000000 indirect
|
310 |
|
|
memory references.
|
311 |
|
|
</pre><p>
|
312 |
|
|
</p></li></ul></div></div><div class="sect3" title="List to Forward List (Slist)"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.list_to_slist"></a>List to Forward List (Slist)</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
313 |
|
|
<code class="code">_GLIBCXX_PROFILE_LIST_TO_SLIST</code>.
|
314 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where
|
315 |
|
|
<code class="code">list</code> could be substituted with <code class="code">forward_list</code> for
|
316 |
|
|
better performance.
|
317 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
318 |
|
|
The memory footprint of a forward_list is smaller than that of a list.
|
319 |
|
|
This has beneficial effects on memory subsystem, e.g., fewer cache misses.
|
320 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>40%.
|
321 |
|
|
Note that the reduction is only noticeable if the size of the forward_list
|
322 |
|
|
node is in fact larger than that of the list node. For memory allocators
|
323 |
|
|
with size classes, you will only notice an effect when the two node sizes
|
324 |
|
|
belong to different allocator size classes.
|
325 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace list with
|
326 |
|
|
forward_list at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">list</code>
|
327 |
|
|
operations and iteration methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
328 |
|
|
Issue the advice if there are no <code class="code">backwards</code> traversals
|
329 |
|
|
or insertion before a given node.
|
330 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
331 |
|
|
Always true.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
332 |
|
|
</p><pre class="programlisting">
|
333 |
|
|
1 list<int> l;
|
334 |
|
|
...
|
335 |
|
|
2 int sum = 0;
|
336 |
|
|
3 for (list<int>::iterator it = l.begin(); it != l.end(); ++it) {
|
337 |
|
|
4 sum += *it;
|
338 |
|
|
5 }
|
339 |
|
|
|
340 |
|
|
foo.cc:1: advice: Change "list" to "forward_list".
|
341 |
|
|
</pre><p>
|
342 |
|
|
</p></li></ul></div></div><div class="sect3" title="Ordered to Unordered Associative Container"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.assoc_ord_to_unord"></a>Ordered to Unordered Associative Container</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
343 |
|
|
<code class="code">_GLIBCXX_PROFILE_ORDERED_TO_UNORDERED</code>.
|
344 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where ordered
|
345 |
|
|
associative containers can be replaced with unordered ones.
|
346 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
347 |
|
|
Insert and search are quicker in a hashtable than in
|
348 |
|
|
a red-black tree.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>52%.
|
349 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>
|
350 |
|
|
Replace set with unordered_set at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span>
|
351 |
|
|
<code class="code">set</code>, <code class="code">multiset</code>, <code class="code">map</code>,
|
352 |
|
|
<code class="code">multimap</code> methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
353 |
|
|
Issue the advice only if we are not using operator <code class="code">++</code> on any
|
354 |
|
|
iterator on a particular <code class="code">[multi]set|map</code>.
|
355 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
356 |
|
|
(Sum(cost(hashtable::method)) - Sum(cost(rbtree::method)), for
|
357 |
|
|
method in [insert, erase, find])
|
358 |
|
|
+ (Cost(iterate hashtable) - Cost(iterate rbtree))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
359 |
|
|
</p><pre class="programlisting">
|
360 |
|
|
1 set<int> s;
|
361 |
|
|
2 for (int i = 0; i < 100000; ++i) {
|
362 |
|
|
3 s.insert(i);
|
363 |
|
|
4 }
|
364 |
|
|
5 int sum = 0;
|
365 |
|
|
6 for (int i = 0; i < 100000; ++i) {
|
366 |
|
|
7 sum += *s.find(i);
|
367 |
|
|
8 }
|
368 |
|
|
</pre><p>
|
369 |
|
|
</p></li></ul></div></div></div><div class="sect2" title="Algorithms"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.algorithms"></a>Algorithms</h3></div></div></div><p><span class="emphasis"><em>Switch:</em></span>
|
370 |
|
|
<code class="code">_GLIBCXX_PROFILE_ALGORITHMS</code>.
|
371 |
|
|
</p><div class="sect3" title="Sort Algorithm Performance"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.algorithms.sort"></a>Sort Algorithm Performance</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
372 |
|
|
<code class="code">_GLIBCXX_PROFILE_SORT</code>.
|
373 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Give measure of sort algorithm
|
374 |
|
|
performance based on actual input. For instance, advise Radix Sort over
|
375 |
|
|
Quick Sort for a particular call context.
|
376 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
377 |
|
|
See papers:
|
378 |
|
|
<a class="ulink" href="http://portal.acm.org/citation.cfm?doid=1065944.1065981" target="_top">
|
379 |
|
|
A framework for adaptive algorithm selection in STAPL</a> and
|
380 |
|
|
<a class="ulink" href="http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=4228227" target="_top">
|
381 |
|
|
Optimizing Sorting with Machine Learning Algorithms</a>.
|
382 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>60%.
|
383 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change sort algorithm
|
384 |
|
|
at site S from X Sort to Y Sort.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> <code class="code">sort</code>
|
385 |
|
|
algorithm.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
386 |
|
|
Issue the advice if the cost model tells us that another sort algorithm
|
387 |
|
|
would do better on this input. Requires us to know what algorithm we
|
388 |
|
|
are using in our sort implementation in release mode.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
389 |
|
|
Runtime(algo) for algo in [radix, quick, merge, ...]</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
390 |
|
|
</p><pre class="programlisting">
|
391 |
|
|
</pre><p>
|
392 |
|
|
</p></li></ul></div></div></div><div class="sect2" title="Data Locality"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.locality"></a>Data Locality</h3></div></div></div><p><span class="emphasis"><em>Switch:</em></span>
|
393 |
|
|
<code class="code">_GLIBCXX_PROFILE_LOCALITY</code>.
|
394 |
|
|
</p><div class="sect3" title="Need Software Prefetch"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.locality.sw_prefetch"></a>Need Software Prefetch</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
395 |
|
|
<code class="code">_GLIBCXX_PROFILE_SOFTWARE_PREFETCH</code>.
|
396 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Discover sequences of indirect
|
397 |
|
|
memory accesses that are not regular, thus cannot be predicted by
|
398 |
|
|
hardware prefetchers.
|
399 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
400 |
|
|
Indirect references are hard to predict and are very expensive when they
|
401 |
|
|
miss in caches.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>25%.
|
402 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Insert prefetch
|
403 |
|
|
instruction.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Vector iterator and
|
404 |
|
|
access operator [].
|
405 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
406 |
|
|
First, get cache line size and page size from system.
|
407 |
|
|
Then record iterator dereference sequences for which the value is a pointer.
|
408 |
|
|
For each sequence within a container, issue a warning if successive pointer
|
409 |
|
|
addresses are not within cache lines and do not form a linear pattern
|
410 |
|
|
(otherwise they may be prefetched by hardware).
|
411 |
|
|
If they also step across page boundaries, make the warning stronger.
|
412 |
|
|
</p><p>The same analysis applies to containers other than vector.
|
413 |
|
|
However, we cannot give the same advice for linked structures, such as list,
|
414 |
|
|
as there is no random access to the n-th element. The user may still be
|
415 |
|
|
able to benefit from this information, for instance by employing frays (user
|
416 |
|
|
level light weight threads) to hide the latency of chasing pointers.
|
417 |
|
|
</p><p>
|
418 |
|
|
This analysis is a little oversimplified. A better cost model could be
|
419 |
|
|
created by understanding the capability of the hardware prefetcher.
|
420 |
|
|
This model could be trained automatically by running a set of synthetic
|
421 |
|
|
cases.
|
422 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
423 |
|
|
Total distance between pointer values of successive elements in vectors
|
424 |
|
|
of pointers.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
425 |
|
|
</p><pre class="programlisting">
|
426 |
|
|
1 int zero = 0;
|
427 |
|
|
2 vector<int*> v(10000000, &zero);
|
428 |
|
|
3 for (int k = 0; k < 10000000; ++k) {
|
429 |
|
|
4 v[random() % 10000000] = new int(k);
|
430 |
|
|
5 }
|
431 |
|
|
6 for (int j = 0; j < 10000000; ++j) {
|
432 |
|
|
7 count += (*v[j] == 0 ? 0 : 1);
|
433 |
|
|
8 }
|
434 |
|
|
|
435 |
|
|
foo.cc:7: advice: Insert prefetch instruction.
|
436 |
|
|
</pre><p>
|
437 |
|
|
</p></li></ul></div></div><div class="sect3" title="Linked Structure Locality"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.locality.linked"></a>Linked Structure Locality</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
438 |
|
|
<code class="code">_GLIBCXX_PROFILE_RBTREE_LOCALITY</code>.
|
439 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Give measure of locality of
|
440 |
|
|
objects stored in linked structures (lists, red-black trees and hashtables)
|
441 |
|
|
with respect to their actual traversal patterns.
|
442 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Allocation can be tuned
|
443 |
|
|
to a specific traversal pattern, to result in better data locality.
|
444 |
|
|
See paper:
|
445 |
|
|
<a class="ulink" href="http://www.springerlink.com/content/8085744l00x72662/" target="_top">
|
446 |
|
|
Custom Memory Allocation for Free</a>.
|
447 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>30%.
|
448 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>
|
449 |
|
|
High scatter score N for container built at site S.
|
450 |
|
|
Consider changing allocation sequence or choosing a structure conscious
|
451 |
|
|
allocator.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Methods of all
|
452 |
|
|
containers using linked structures.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
453 |
|
|
First, get cache line size and page size from system.
|
454 |
|
|
Then record the number of successive elements that are on different line
|
455 |
|
|
or page, for each traversal method such as <code class="code">find</code>. Give advice
|
456 |
|
|
only if the ratio between this number and the number of total node hops
|
457 |
|
|
is above a threshold.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
458 |
|
|
Sum(same_cache_line(this,previous))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
459 |
|
|
</p><pre class="programlisting">
|
460 |
|
|
1 set<int> s;
|
461 |
|
|
2 for (int i = 0; i < 10000000; ++i) {
|
462 |
|
|
3 s.insert(i);
|
463 |
|
|
4 }
|
464 |
|
|
5 set<int> s1, s2;
|
465 |
|
|
6 for (int i = 0; i < 10000000; ++i) {
|
466 |
|
|
7 s1.insert(i);
|
467 |
|
|
8 s2.insert(i);
|
468 |
|
|
9 }
|
469 |
|
|
...
|
470 |
|
|
// Fast, better locality.
|
471 |
|
|
10 for (set<int>::iterator it = s.begin(); it != s.end(); ++it) {
|
472 |
|
|
11 sum += *it;
|
473 |
|
|
12 }
|
474 |
|
|
// Slow, elements are further apart.
|
475 |
|
|
13 for (set<int>::iterator it = s1.begin(); it != s1.end(); ++it) {
|
476 |
|
|
14 sum += *it;
|
477 |
|
|
15 }
|
478 |
|
|
|
479 |
|
|
foo.cc:5: advice: High scatter score NNN for set built here. Consider changing
|
480 |
|
|
the allocation sequence or switching to a structure conscious allocator.
|
481 |
|
|
</pre><p>
|
482 |
|
|
</p></li></ul></div></div></div><div class="sect2" title="Multithreaded Data Access"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.mthread"></a>Multithreaded Data Access</h3></div></div></div><p>
|
483 |
|
|
The diagnostics in this group are not meant to be implemented short term.
|
484 |
|
|
They require compiler support to know when container elements are written
|
485 |
|
|
to. Instrumentation can only tell us when elements are referenced.
|
486 |
|
|
</p><p><span class="emphasis"><em>Switch:</em></span>
|
487 |
|
|
<code class="code">_GLIBCXX_PROFILE_MULTITHREADED</code>.
|
488 |
|
|
</p><div class="sect3" title="Data Dependence Violations at Container Level"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.mthread.ddtest"></a>Data Dependence Violations at Container Level</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
489 |
|
|
<code class="code">_GLIBCXX_PROFILE_DDTEST</code>.
|
490 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect container elements
|
491 |
|
|
that are referenced from multiple threads in the parallel region or
|
492 |
|
|
across parallel regions.
|
493 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>
|
494 |
|
|
Sharing data between threads requires communication and perhaps locking,
|
495 |
|
|
which may be expensive.
|
496 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>?%.
|
497 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change data
|
498 |
|
|
distribution or parallel algorithm.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Container access methods
|
499 |
|
|
and iterators.
|
500 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
501 |
|
|
Keep a shadow for each container. Record iterator dereferences and
|
502 |
|
|
container member accesses. Issue advice for elements referenced by
|
503 |
|
|
multiple threads.
|
504 |
|
|
See paper: <a class="ulink" href="http://portal.acm.org/citation.cfm?id=207110.207148" target="_top">
|
505 |
|
|
The LRPD test: speculative run-time parallelization of loops with
|
506 |
|
|
privatization and reduction parallelization</a>.
|
507 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
508 |
|
|
Number of accesses to elements referenced from multiple threads
|
509 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
510 |
|
|
</p><pre class="programlisting">
|
511 |
|
|
</pre><p>
|
512 |
|
|
</p></li></ul></div></div><div class="sect3" title="False Sharing"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.mthread.false_share"></a>False Sharing</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span>
|
513 |
|
|
<code class="code">_GLIBCXX_PROFILE_FALSE_SHARING</code>.
|
514 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect elements in the
|
515 |
|
|
same container which share a cache line, are written by at least one
|
516 |
|
|
thread, and accessed by different threads.
|
517 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Under these assumptions,
|
518 |
|
|
cache protocols require
|
519 |
|
|
communication to invalidate lines, which may be expensive.
|
520 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>68%.
|
521 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Reorganize container
|
522 |
|
|
or use padding to avoid false sharing.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Container access methods
|
523 |
|
|
and iterators.
|
524 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span>
|
525 |
|
|
First, get the cache line size.
|
526 |
|
|
For each shared container, record all the associated iterator dereferences
|
527 |
|
|
and member access methods with the thread id. Compare the address lists
|
528 |
|
|
across threads to detect references in two different threads to the same
|
529 |
|
|
cache line. Issue a warning only if the ratio to total references is
|
530 |
|
|
significant. Do the same for iterator dereference values if they are
|
531 |
|
|
pointers.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span>
|
532 |
|
|
Number of accesses to same cache line from different threads.
|
533 |
|
|
</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span>
|
534 |
|
|
</p><pre class="programlisting">
|
535 |
|
|
1 vector<int> v(2, 0);
|
536 |
|
|
2 #pragma omp parallel for shared(v, SIZE) schedule(static, 1)
|
537 |
|
|
3 for (i = 0; i < SIZE; ++i) {
|
538 |
|
|
4 v[i % 2] += i;
|
539 |
|
|
5 }
|
540 |
|
|
|
541 |
|
|
OMP_NUM_THREADS=2 ./a.out
|
542 |
|
|
foo.cc:1: advice: Change container structure or padding to avoid false
|
543 |
|
|
sharing in multithreaded access at foo.cc:4. Detected N shared cache lines.
|
544 |
|
|
</pre><p>
|
545 |
|
|
</p></li></ul></div></div></div><div class="sect2" title="Statistics"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.statistics"></a>Statistics</h3></div></div></div><p>
|
546 |
|
|
<span class="emphasis"><em>Switch:</em></span>
|
547 |
|
|
<code class="code">_GLIBCXX_PROFILE_STATISTICS</code>.
|
548 |
|
|
</p><p>
|
549 |
|
|
In some cases the cost model may not tell us anything because the costs
|
550 |
|
|
appear to offset the benefits. Consider the choice between a vector and
|
551 |
|
|
a list. When there are both inserts and iteration, an automatic advice
|
552 |
|
|
may not be issued. However, the programmer may still be able to make use
|
553 |
|
|
of this information in a different way.
|
554 |
|
|
</p><p>
|
555 |
|
|
This diagnostic will not issue any advice, but it will print statistics for
|
556 |
|
|
each container construction site. The statistics will contain the cost
|
557 |
|
|
of each operation actually performed on the container.
|
558 |
|
|
</p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="bk01pt12ch32s06.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="profile_mode.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="ext_allocators.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Developer Information </td><td width="20%" align="center"><a accesskey="h" href="../spine.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 33. Allocators</td></tr></table></div></body></html>
|