Understanding Variations in Dhrystone Performance

By Reinhold P. Weicker, Siemens AG, AUT E 51, Erlangen

April 1989

This article has appeared in:

Microprocessor Report, May 1989 (Editor: M. Slater), pp. 16-17

Microprocessor manufacturers tend to credit all the performance measured by
benchmarks to the speed of their processors; they often don't even mention the
programming language and compiler used. Their detailed documents, usually
called "performance brief" or "performance report," do give more
details. However, these details are often lost in the press releases and other
marketing statements. For serious performance evaluation, it is necessary to
study the code generated by the various compilers.

Dhrystone was originally published in Ada (Communications of the ACM, Oct.
1984). However, since good Ada compilers were rare at that time and, together
with UNIX, C became more and more popular, the C version of Dhrystone is the
one now mainly used in industry. There are "official" versions 2.1 for Ada,
Pascal, and C, which are as close together as the languages' semantic
differences permit.

Dhrystone contains two statements where the programming language and its
translation play a major part in the execution time measured by the benchmark:

o String assignment (in procedure Proc_0 / main)
o String comparison (in function Func_2)

In Ada and Pascal, strings are arrays of characters where the length of the
string is part of the type information known at compile time. In C, strings
are also arrays of characters, but there are no operators defined in the
language for assignment and comparison of strings. Instead, functions
"strcpy" and "strcmp" are used. These functions are defined for strings of
arbitrary length, and make use of the fact that strings in C have to end with
a terminating null byte. For general-purpose calls to these functions, the
implementor can assume nothing about the length and the alignment of the
strings involved.
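
As a rough illustration of what such general-purpose routines have to do, the
two functions might be written in portable C as shown here. This is only a
sketch: the names byte_strcpy and byte_strcmp are made up for this example,
and real libraries normally use tuned code instead. Because nothing is known
about length or alignment, both loops proceed one byte at a time until they
reach the terminating null byte.

```c
/* Copy src, including its terminating null byte, to dst. */
char *byte_strcpy(char *dst, const char *src)
{
    char *ret = dst;
    while ((*dst++ = *src++) != '\0') {
        /* one byte per iteration; the length is not known in advance */
    }
    return ret;
}

/* Return <0, 0 or >0 as a compares less than, equal to or greater than b. */
int byte_strcmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```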

The C version of Dhrystone spends a relatively large amount of time in these
two functions. Some time ago, I made measurements on a VAX 11/785 with the
Berkeley UNIX (4.2) compilers (often-used compilers, but certainly not the
most advanced). In the C version, 23% of the time was spent in the string
functions; in the Pascal version, only 10%. On good RISC machines (where less
time is spent in the procedure calling sequence than on a VAX) and with better
optimizing compilers, the percentage is higher; MIPS has reported 34% for an
R3000. Because of this effect, Pascal and Ada Dhrystone results are usually
better than C results (except when the optimization quality of the C compiler
is considerably better than that of the other compilers).
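
To get a feeling for how much such percentages matter, one can estimate the
effect of faster string functions on the overall result with the usual
fraction-of-time argument (Amdahl's law). The sketch below assumes that only
the string functions get faster and that everything else is unchanged; the
function name overall_speedup is hypothetical, and the factor of 2 used in the
note after the code is an arbitrary example value, not a measurement.

```c
/* Overall speedup when a fraction f of the run time (0 <= f < 1) becomes
   s times faster and the remaining 1 - f is left unchanged. */
double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

With the 23% measured on the VAX, string functions that are twice as fast
give overall_speedup(0.23, 2.0), about 1.13; with the 34% reported for the
R3000, the same factor gives about 1.20.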

Several people have noted that the string operations are over-represented in
Dhrystone, mainly because the strings occurring in Dhrystone are longer than
average strings. I admit that this is true, and have said so in my SIGPLAN
Notices paper (Aug. 1988); however, I didn't want to generate confusion by
changing the string lengths from version 1 to version 2.

Even if they are somewhat over-represented in Dhrystone, string operations are
frequent enough that it makes sense to implement them in the most efficient
way possible, not only for benchmarking purposes. This means that they can
and should be written in assembly language code. ANSI C also explicitly allows
the string functions to be implemented as macros, i.e., by inline code.
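
In a C program, this freedom shows up as follows. Under the ANSI C rules, a
library function such as strcpy may additionally be provided as a macro by
<string.h>; writing the name in parentheses (or using #undef) suppresses a
function-like macro, so that the name refers to the real function. The wrapper
names below are hypothetical, and whether the first form is actually expanded
inline depends entirely on the compiler and library in use.

```c
#include <string.h>

/* May be expanded inline if <string.h> defines strcpy as a macro or the
   compiler treats it as a built-in. */
char *copy_possibly_inline(char *dst, const char *src)
{
    return strcpy(dst, src);
}

/* The parentheses keep a function-like macro from being expanded, so the
   name refers to the function itself rather than to a macro. */
char *copy_by_call(char *dst, const char *src)
{
    return (strcpy)(dst, src);
}
```

Both wrappers have the same effect; they differ only in the code a given
system may generate, which is exactly what has to be checked when Dhrystone
numbers are compared.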

There is also a third way to speed up the "strcpy" statement in Dhrystone: For
this particular "strcpy" statement, the source of the assignment is a string
constant. Therefore, in contrast to calls to "strcpy" in the general case, the
compiler knows the length and alignment of the strings involved at compile
time and can generate code in the same efficient way as a Pascal compiler
(word instructions instead of byte instructions).
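
The difference can be made visible at the source level. In the sketch below,
the type Str_30 and the 30-character literal are modeled on Dhrystone's
strings, while the function names are hypothetical. The second function
spells out what an optimizing compiler is entitled to do on its own for the
first one: copy a block whose size is known at compile time, which can then be
done with word instructions.

```c
#include <string.h>

typedef char Str_30[31];  /* 30 characters plus the terminating null byte */

/* Benchmark style: the source of the copy is a string constant. */
void copy_with_strcpy(Str_30 dst)
{
    strcpy(dst, "DHRYSTONE PROGRAM, 2'ND STRING");
}

/* What the compiler may generate instead: a copy of known, fixed size. */
void copy_fixed_length(Str_30 dst)
{
    static const char lit[] = "DHRYSTONE PROGRAM, 2'ND STRING";
    memcpy(dst, lit, sizeof lit);  /* sizeof lit is 31, null byte included */
}
```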

This optimization is not allowed in the case of the "strcmp" call: Here, the
addresses are formal procedure parameters, and no assumptions can be made
about the length or alignment of the strings. Any such assumptions would
indicate an incorrect implementation. They might work for Dhrystone, where the
strings are in fact word-aligned with typical compilers, but other programs
would deliver incorrect results.

So, for an apples-to-apples comparison between processors, and not between
several possible (legal or illegal) degrees of compiler optimization, one
should check that the systems are comparable with respect to the following
three points:

(1) String functions in assembly language vs. in C

Frequently used functions such as the string functions can and should be
written in assembly language, and all serious C language systems known
to me do this. (I list this point for completeness only.) Note that
processors with an instruction that checks a word for a null byte (such
as AMD's 29000 and Intel's 80960) have an advantage here. (This
advantage becomes relatively smaller if optimization (3) is applied.)
Due to the length of the strings involved in Dhrystone, this advantage
may be considered disproportionately large, but it is certainly legal to
use such instructions - after all, these situations are what they were
invented for.

(2) String function code inline vs. as library functions

ANSI C has created a new situation, compared with the older
Kernighan/Ritchie C. In the original C, the definition of the string
functions was not part of the language. Now it is, and inlining is
explicitly allowed. I probably should have stated more clearly in my
SIGPLAN Notices paper that the rule "No procedure inlining for
Dhrystone" referred to the user level procedures only and not to the
library routines.

(3) Fixed-length and alignment assumptions for the strings

Compilers should be allowed to optimize in these cases if (and only if)
it is safe to do so. For Dhrystone, this is the "strcpy" statement, but
not the "strcmp" statement (unless, of course, the "strcmp" code
explicitly checks the alignment at execution time and branches
accordingly). A "Dhrystone switch" for the compiler that causes the
generation of code that may not work under certain circumstances is
certainly inappropriate for comparisons. It has been reported on Usenet
that some C compilers provide such a compiler option; since I don't have
access to all C compilers involved, I cannot verify this.

If the fixed-length and word-alignment assumptions can be used, a wide
bus that permits fast multi-word load instructions certainly does help;
however, this fact by itself should not make a really big difference.
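
A "strcmp" that remains safe in the general case, as mentioned under point
(3), might look like the following sketch. It tests the alignment of both
arguments at execution time and uses word comparisons only if both are
word-aligned; the null-byte test on a whole word is a software stand-in for
the kind of instruction mentioned under point (1). The names, the 4-byte word
size, and the bit trick are assumptions made for this illustration, not code
from any particular library. Note also that the word loop reads the whole
aligned word containing the terminating null byte, which is harmless at the
machine level but not guaranteed by the C language itself; this is one more
reason why such routines are usually written in assembly language.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if at least one of the four bytes of w is a null byte. */
int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

int checked_strcmp(const char *a, const char *b)
{
    /* Word path only if both addresses are multiples of 4. */
    if ((((uintptr_t)a | (uintptr_t)b) & 3u) == 0) {
        for (;;) {
            uint32_t wa, wb;
            memcpy(&wa, a, sizeof wa);  /* load one aligned word from each */
            memcpy(&wb, b, sizeof wb);
            if (wa != wb || has_zero_byte(wa))
                break;                  /* difference or end is in this word */
            a += 4;
            b += 4;
        }
    }
    /* Byte path: finishes the last word, or handles unaligned arguments. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

Code of this kind gives correct results for any arguments, whereas a version
that simply assumes word alignment would work for Dhrystone only by accident.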

A check of these points - something that is necessary for a thorough
evaluation and comparison of the Dhrystone performance claims - requires
object code listings as well as listings for the string functions (strcpy,
strcmp) that are possibly called by the program.

I don't pretend that Dhrystone is a perfect tool to measure the integer
performance of microprocessors. The more it is used and discussed, the more I
myself learn about aspects that I hadn't noticed yet when I wrote the program.
And of course, the very success of a benchmark program is a danger in that
people may tune their compilers and/or hardware to it, and with this action
make it less useful.

Whetstone and Linpack have their critical points also: The Whetstone rating
depends heavily on the speed of the mathematical functions (sine, sqrt, ...),
and Linpack is sensitive to data alignment for some cache configurations.

Introduction of a standard set of public domain benchmark software (something
the SPEC effort attempts) is certainly a worthwhile thing. In the meantime,
people will continue to use whatever is available and widely distributed, and
Dhrystone ratings are probably still better than MIPS ratings if these are -
as often in industry - based on no reproducible derivation. However, any
serious performance evaluation requires more than just a comparison of raw
numbers; one has to make sure that the numbers have been obtained in a
comparable way.