Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

[published in SIGPLAN Notices 23,8 (Aug. 1988), 49-62]


Reinhold P. Weicker
Siemens AG, E STE 35
[now: Siemens AG, AUT E 51]
Postfach 3220
D-8520 Erlangen
Germany (West)


1. Why a Version 2 of Dhrystone?

The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently satisfies
a need for an easy-to-use integer benchmark; it gives a first performance
indication which is more meaningful than MIPS numbers which, in their literal
meaning (million instructions per second), cannot be used across different
instruction sets (e.g. RISC vs. CISC). With the increasing use of the
benchmark, it seems necessary to reconsider the benchmark and to check whether
it can still fulfill this function. Version 2 of Dhrystone is the result of
such a re-evaluation; it has been made for two reasons:

o Dhrystone has been published in Ada [1], and versions in Ada, Pascal and C
  have been distributed by Reinhold Weicker via floppy disk. However, the
  version that was used most often for benchmarking has been the version made
  by Rick Richardson by another translation from the Ada version into the C
  programming language; this has been the version distributed via the UNIX
  network Usenet [2].

  There is an obvious need for a common C version of Dhrystone, since C is at
  present the most popular system programming language for the class of
  systems (microcomputers, minicomputers, workstations) where Dhrystone is
  used most. There should be, as far as possible, only one C version of
  Dhrystone such that results can be compared without restrictions. In the
  past, the C versions distributed by Rick Richardson (Version 1.1) and by
  Reinhold Weicker had small (though not significant) differences.

  Together with the new C version, the Ada and Pascal versions have been
  updated as well.

o As far as it is possible without changes to the Dhrystone statistics,
  optimizing compilers should be prevented from removing significant
  statements. It has turned out in the past that optimizing compilers
  suppressed code generation for too many statements (by "dead code removal"
  or "dead variable elimination"). This has led to the danger that
  benchmarking results obtained by a naive application of Dhrystone - without
  inspection of the code that was generated - could become meaningless.

The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should remain
unchanged as much as possible. (Very few changes were necessary; their impact
should be negligible.) Also, the order of statements should remain unchanged.
Although I am aware of some critical remarks on the benchmark - I agree with
several of them - and know some suggestions for improvement, I didn't want to
change the benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably outweigh
the benefits. If I were to write a new benchmark program, I wouldn't give it
the name "Dhrystone" since this denotes the program published in [1].
However, I do recognize the need for a larger number of representative
programs that can be used as benchmarks; users should always be encouraged to
use more than just one benchmark.

The new versions (version 2.1 for C, Pascal and Ada) will be distributed as
widely as possible. (Version 2.1 differs from version 2.0 distributed via the
UNIX network Usenet in March 1988 only in a few corrections for minor
deficiencies found by users of version 2.0.) Readers who want to use the
benchmark for their own measurements can obtain a copy in machine-readable
form on floppy disk (MS-DOS or XENIX format) from the author.


2. Overall Characteristics of Version 2

In general, version 2 follows - in the parts that are significant for
performance measurement, i.e. within the measurement loop - the published
(Ada) version and the C versions previously distributed. Where the versions
distributed by Rick Richardson [2] and Reinhold Weicker have been different,
it follows the version distributed by Reinhold Weicker. (However, the
differences have been so small that their impact on execution time in all
likelihood has been negligible.) The initialization and UNIX instrumentation
part - which had been omitted in [1] - follows mostly the ideas of Rick
Richardson [2]. However, any changes in the initialization part and in the
printing of the result have no impact on performance measurement since they
are outside the measurement loop. As a concession to older compilers, names
have been made unique within the first 8 characters for the C version.

The original publication of Dhrystone did not contain any statements for time
measurement since they are necessarily system-dependent. However, it turned
out that it is not enough just to enclose the main procedure of Dhrystone in a
loop and to measure the execution time. If the variables that are computed
are not used somehow, there is the danger that the compiler considers them as
"dead variables" and suppresses code generation for a part of the statements.
Therefore in version 2 all variables of "main" are printed at the end of the
program. This also permits some plausibility control for correct execution of
the benchmark.

At several places in the benchmark, code has been added, but only in branches
that are not executed. The intention is that optimizing compilers should be
prevented from moving code out of the measurement loop, or from removing code
altogether. Statements that are executed have been changed in very few places
only. In these cases, only the role of some operands has been changed, and it
was made sure that the numbers defining the "Dhrystone distribution"
(distribution of statements, operand types and locality) still hold as much as
possible. Except for sophisticated optimizing compilers, execution times for
version 2.1 should be the same as for previous versions.

Because of the self-imposed limitation that the order and distribution of the
executed statements should not be changed, there are still cases where
optimizing compilers may not generate code for some statements. To a certain
degree, this is unavoidable for small synthetic benchmarks. Users of the
benchmark are advised to check code listings to see whether code is generated
for all statements of Dhrystone.

Contrary to the suggestion in the published paper and its realization in the
versions previously distributed, no attempt has been made to subtract the time
for the measurement loop overhead. (This calculation has proven difficult to
implement in a correct way, and its omission makes the program simpler.)
However, since the loop check is now part of the benchmark, this does have an
impact - though a very minor one - on the distribution statistics which have
been updated for this version.


3. Discussion of Individual Changes

In this section, all changes are described that affect the measurement loop
and that are not just renamings of variables. All remarks refer to the C
version; the other language versions have been updated similarly.

In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:

o In procedure "main", three statements have been added in the non-executed
  "then" part of the statement

      if (Enum_Loc == Func_1 (Ch_Index, 'C'))

  they are

      strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
      Int_2_Loc = Run_Index;
      Int_Glob = Run_Index;

  The string assignment prevents movement of the preceding assignment to
  Str_2_Loc (5'th statement of "main") out of the measurement loop. (This
  probably will not happen for the C version, but it did happen with another
  language and compiler.) The assignment to Int_2_Loc prevents value
  propagation for Int_2_Loc, and the assignment to Int_Glob makes the value of
  Int_Glob possibly dependent on the value of Run_Index.

o In the three arithmetic computations at the end of the measurement loop in
  "main", the role of some variables has been exchanged, to prevent the
  division from just cancelling out the multiplication as it was in [1]. A
  very smart compiler might have recognized this and suppressed code
  generation for the division.

o For Proc_2, no code has been changed, but the values of the actual parameter
  have changed due to changes in "main".

o In Proc_4, the second assignment has been changed from

      Bool_Loc = Bool_Loc | Bool_Glob;

  to

      Bool_Glob = Bool_Loc | Bool_Glob;

  It now assigns a value to a global variable instead of a local variable
  (Bool_Loc); Bool_Loc would be a "dead variable" which is not used
  afterwards.

o In Func_1, the statement

      Ch_1_Glob = Ch_1_Loc;

  was added in the non-executed "else" part of the "if" statement, to prevent
  the suppression of code generation for the assignment to Ch_1_Loc.

o In Func_2, the second character comparison statement has been changed to

      if (Ch_Loc == 'R')

  ('R' instead of 'X') because a comparison with 'X' is implied in the
  preceding "if" statement.

  Also in Func_2, the statement

      Int_Glob = Int_Loc;

  has been added in the non-executed part of the last "if" statement, in order
  to prevent Int_Loc from becoming a dead variable.

o In Func_3, a non-executed "else" part has been added to the "if" statement.
  While the program would not be incorrect without this "else" part, it is
  considered bad programming practice if a function can be left without a
  return value.

  To compensate for this change, the (non-executed) "else" part in the "if"
  statement of Proc_3 was removed.

The distribution statistics have been changed only by the addition of the
measurement loop iteration (1 additional statement, 4 additional local integer
operands) and by the change in Proc_4 (one operand changed from local to
global). The distribution statistics in the comment headers have been updated
accordingly.


4. String Operations

The string operations (string assignment and string comparison) have not been
changed, to keep the program consistent with the original version.

There has been some concern that the string operations are over-represented in
the program, and that execution time is dominated by these operations. This
was true in particular when optimizing compilers removed too much code in the
main part of the program; this should have been mitigated in version 2.

It should be noted that this is a language-dependent issue: Dhrystone was
first published in Ada, and with Ada or Pascal semantics, the time spent in
the string operations is, at least in all implementations known to me,
considerably smaller. In Ada and Pascal, assignment and comparison of strings
are operators defined in the language, and the upper bounds of the strings
occurring in Dhrystone are part of the type information known at compilation
time. The compilers can therefore generate efficient inline code. In C,
string assignment and comparisons are not part of the language, so the string
operations must be expressed in terms of the C library functions "strcpy" and
"strcmp". (ANSI C allows an implementation to use inline code for these
functions.) In addition to the overhead caused by additional function calls,
these functions are defined for null-terminated strings where the length of
the strings is not known at compilation time; the function has to check every
byte for the termination condition (the null byte).

Obviously, a C library which includes efficiently coded "strcpy" and "strcmp"
functions helps to obtain good Dhrystone results. However, I don't think that
this is unfair since string functions do occur quite frequently in real
programs (editors, command interpreters, etc.). If the string functions are
implemented efficiently, this helps real programs as well as benchmark
programs.

I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs. For
consistency with the original benchmark, I didn't change the program despite
this weakness.


5. Intended Use of Dhrystone

When Dhrystone is used, the following "ground rules" apply:

o Separate compilation (Ada and C versions)

  As mentioned in [1], Dhrystone was written to reflect actual programming
  practice in systems programming. The division into several compilation
  units (5 in the Ada version, 2 in the C version) is intended, as is the
  distribution of inter-module and intra-module subprogram calls. Although on
  many systems there will be no difference in execution time to a Dhrystone
  version where all compilation units are merged into one file, the rule is
  that separate compilation should be used. The intention is that real
  programming practice, where programs consist of several independently
  compiled units, should be reflected. This also implies that the
  compiler, while compiling one unit, has no information about the use of
  variables, register allocation etc. occurring in other compilation units.
  Although in real life compilation units will probably be larger, the
  intention is that these effects of separate compilation are modeled in
  Dhrystone.

  A few language systems have post-linkage optimization available (e.g., final
  register allocation is performed after linkage). This is a borderline case:
  Post-linkage optimization involves additional program preparation time
  (although not as much as compilation in one unit) which may prevent its
  general use in practical programming. I think that since it defeats the
  intentions given above, it should not be used for Dhrystone.

  Unfortunately, ISO/ANSI Pascal does not contain language features for
  separate compilation. Although most commercial Pascal compilers provide
  separate compilation in some way, we cannot use it for Dhrystone since such
  a version would not be portable. Therefore, no attempt has been made to
  provide a Pascal version with several compilation units.

o No procedure merging

  Although Dhrystone contains some very short procedures where execution would
  benefit from procedure merging (inlining, macro expansion of procedures),
  procedure merging is not to be used. The reason is that the percentage of
  procedure and function calls is part of the "Dhrystone distribution" of
  statements contained in [1]. This restriction does not hold for the string
  functions of the C version since ANSI C allows an implementation to use
  inline code for these functions.

o Other optimizations are allowed, but they should be indicated

  It is often hard to draw an exact line between "normal code generation" and
  "optimization" in compilers: Some compilers perform operations by default
  that are invoked in other compilers only when optimization is explicitly
  requested. Also, we cannot avoid that in benchmarking people try to achieve
  results that look as good as possible. Therefore, optimizations performed
  by compilers - other than those listed above - are not forbidden when
  Dhrystone execution times are measured. Dhrystone is not intended to be
  non-optimizable but is intended to be similarly optimizable as normal
  programs. For example, there are several places in Dhrystone where
  performance benefits from optimizations like common subexpression
  elimination, value propagation etc., but normal programs usually also
  benefit from these optimizations. Therefore, no effort was made to
  artificially prevent such optimizations. However, measurement reports
  should indicate which compiler optimization levels have been used, and
  reporting results with different levels of compiler optimization for the
  same hardware is encouraged.

o Default results are those without "register" declarations (C version)

  When Dhrystone results are quoted without additional qualification, they
  should be understood as results obtained without use of the "register"
  attribute. Good compilers should be able to make good use of registers even
  without explicit register declarations ([3], p. 193).

Of course, for experimental purposes, post-linkage optimization, procedure
merging and/or compilation in one unit can be done to determine their effects.
However, Dhrystone numbers obtained under these conditions should be
explicitly marked as such; "normal" Dhrystone results should be understood as
results obtained following the ground rules listed above.

In any case, for serious performance evaluation, users are advised to ask for
code listings and to check them carefully. In this way, when results for
different systems are compared, the reader can get a feeling for how much
performance difference is due to compiler optimization and how much is due to
hardware speed.


6. Acknowledgements

The C version 2.1 of Dhrystone has been developed in cooperation with Rick
Richardson (Tinton Falls, NJ); it incorporates many ideas from the "Version
1.1" distributed previously by him over the UNIX network Usenet. Through his
activity with Usenet, Rick Richardson has made a very valuable contribution to
the dissemination of the benchmark. I also thank Chaim Benedelac (National
Semiconductor), David Ditzel (SUN), Earl Killian and John Mashey (MIPS), Alan
Smith and Rafael Saavedra-Barrera (UC at Berkeley) for their help with
comments on earlier versions of the benchmark.


7. Bibliography

[1]
   Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming Benchmark.
   Communications of the ACM 27, 10 (Oct. 1984), 1013-1030

[2]
   Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text)
   Informal Distribution via "Usenet", Last Version Known to me: Sept. 21,
   1987

[3]
   Brian W. Kernighan and Dennis M. Ritchie: The C Programming Language.
   Prentice-Hall, Englewood Cliffs (NJ) 1978