Understanding Variations in Dhrystone Performance
By Reinhold P. Weicker, Siemens AG, AUT E 51, Erlangen
April 1989
This article appeared in:
Microprocessor Report, May 1989 (Editor: M. Slater), pp. 16-17
Microprocessor manufacturers tend to credit all the performance measured by
benchmarks to the speed of their processors; they often don't even mention the
programming language and compiler used. Their detailed documents, usually
called "performance brief" or "performance report," do give more
details. However, these details are often lost in the press releases and other
marketing statements. For serious performance evaluation, it is necessary to
study the code generated by the various compilers.
Dhrystone was originally published in Ada (Communications of the ACM, Oct.
1984). However, since good Ada compilers were rare at that time and, together
with UNIX, C became more and more popular, the C version of Dhrystone is the
one now mainly used in industry. There are "official" versions 2.1 for Ada,
Pascal, and C, which are as close together as the languages' semantic
differences permit.
Dhrystone contains two statements where the programming language and its
translation play a major part in the execution time measured by the benchmark:
o String assignment (in procedure Proc_0 / main)
o String comparison (in function Func_2)
In Ada and Pascal, strings are arrays of characters where the length of the
string is part of the type information known at compile time. In C, strings
are also arrays of characters, but there are no operators defined in the
language for assignment and comparison of strings. Instead, functions
"strcpy" and "strcmp" are used. These functions are defined for strings of
arbitrary length, and make use of the fact that strings in C have to end with
a terminating null byte. For general-purpose calls to these functions, the
implementor can assume nothing about the length and the alignment of the
strings involved.
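To make the contrast concrete, here is a minimal sketch of general-purpose, byte-at-a-time versions of the two functions. The names bytewise_strcpy and bytewise_strcmp are hypothetical, and real C libraries usually use faster, often assembly-language code; the point is only that a routine which may be called with arbitrary strings has to advance one character at a time until it reaches the terminating null byte.

```c
/* Hypothetical byte-at-a-time strcpy: it assumes nothing about the
   length or alignment of either string and stops only after it has
   copied the terminating null byte. */
char *bytewise_strcpy(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Hypothetical byte-at-a-time strcmp: characters are compared as
   unsigned char, as the C standard specifies for strcmp. The loop ends
   at the first difference or at the end of the first string. */
int bytewise_strcmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (int)*p - (int)*q;
}
```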
The C version of Dhrystone spends a relatively large amount of time in these
two functions. Some time ago, I made measurements on a VAX 11/785 with the
Berkeley UNIX (4.2) compilers (often-used compilers, but certainly not the
most advanced). In the C version, 23% of the time was spent in the string
functions; in the Pascal version, only 10%. On good RISC machines (where less
time is spent in the procedure calling sequence than on a VAX) and with better
optimizing compilers, the percentage is higher; MIPS has reported 34% for an
R3000. Because of this effect, Pascal and Ada Dhrystone results are usually
better than C results (except when the optimization quality of the C compiler
is considerably better than that of the other compilers).
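The share of run time spent in the string functions bounds how much the Dhrystone rating can change when only those functions change. The following sketch applies Amdahl's law (a standard formula, not part of this article): if a fraction f of the run time becomes s times faster, the whole program becomes 1 / ((1 - f) + f / s) times faster. The helper name overall_speedup and the factor s = 2 used below are hypothetical.

```c
/* Hypothetical helper (Amdahl's law): overall speedup of a program when
   a fraction f of its run time (0 <= f <= 1) is made s times faster. */
double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

With the 34% figure reported for the R3000, string functions that ran twice as fast would raise the rating by about 20% (overall_speedup(0.34, 2.0) is about 1.20); with the 23% measured on the VAX, by about 13%. In both cases the processor itself is unchanged.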
Several people have noted that the string operations are over-represented in
Dhrystone, mainly because the strings occurring in Dhrystone are longer than
average strings. I admit that this is true, and have said so in my SIGPLAN
Notices paper (Aug. 1988); however, I didn't want to generate confusion by
changing the string lengths from version 1 to version 2.
Even if they are somewhat over-represented in Dhrystone, string operations are
frequent enough that it makes sense to implement them in the most efficient
way possible, not only for benchmarking purposes. This means that they can
and should be written in assembly language. ANSI C also explicitly allows
the string functions to be implemented as macros, i.e., as inline code.
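The C standard also lets a program suppress a macro definition of a library function by enclosing the function name in parentheses. The sketch below (copy_both_ways is a hypothetical name) shows both call forms. Whether either one is actually expanded inline depends on the implementation; a compiler that treats strcpy as a built-in may still inline the parenthesized call.

```c
#include <string.h>

/* Hypothetical example of the two ways to call strcpy. */
void copy_both_ways(char *a, char *b, const char *src)
{
    /* May be a macro or built-in that the implementation expands inline. */
    strcpy(a, src);

    /* The parentheses suppress any macro definition of strcpy, because
       the name is no longer followed directly by '('. */
    (strcpy)(b, src);
}
```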
There is also a third way to speed up the "strcpy" statement in Dhrystone: For
this particular "strcpy" statement, the source of the assignment is a string
constant. Therefore, in contrast to calls to "strcpy" in the general case, the
compiler knows the length and alignment of the strings involved at compile
time and can generate code in the same efficient way as a Pascal compiler
(word instructions instead of byte instructions).
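As a sketch of what the compiler is entitled to do here, the copy below has the same effect as a "strcpy" whose source is a 30-character string constant; the constant is assumed to be the one assigned inside Dhrystone's measurement loop. The 31-byte type mirrors Dhrystone's Str_30 (30 characters plus the null byte); the variable str_2_loc and the function assign_constant are hypothetical stand-ins. Because the size is a compile-time constant and the compiler itself decides where str_2_loc is placed, it may expand the fixed-size memcpy into a short sequence of word loads and stores instead of a byte loop.

```c
#include <string.h>

/* Assumed to match Dhrystone's Str_30: 30 characters plus the null byte. */
typedef char Str_30[31];

/* Hypothetical stand-in for a local string variable in Dhrystone's main. */
static Str_30 str_2_loc;

void assign_constant(void)
{
    /* Same effect as strcpy(str_2_loc, "DHRYSTONE PROGRAM, 2'ND STRING"):
       the source literal occupies exactly 31 bytes including its null, so
       the length is known at compile time and no null-byte scan is needed. */
    memcpy(str_2_loc, "DHRYSTONE PROGRAM, 2'ND STRING", sizeof str_2_loc);
}
```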
This is not allowed in the case of the "strcmp" call: Here, the addresses are
formal procedure parameters, and no assumptions can be made about the length
or alignment of the strings. Any such assumptions would indicate an incorrect
implementation. They might work for Dhrystone, where the strings are in fact
word-aligned with typical compilers, but other programs would deliver
incorrect results.
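A small sketch of why no alignment may be assumed: a perfectly legal caller can pass a pointer into the middle of an array. In the hypothetical helper suffix_equals below, buf and buf + 1 cannot both be word-aligned, so a "strcmp" that silently assumed aligned arguments would misbehave for one of them. The 4-byte word size and the names is_word_aligned and suffix_equals are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

#define WORD_SIZE 4u  /* assumed word size, as on a 32-bit processor */

/* Hypothetical helper: nonzero if p lies on a WORD_SIZE boundary.
   Converting a pointer to uintptr_t is the usual idiom, although ISO C
   leaves the exact mapping implementation-defined. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % WORD_SIZE) == 0;
}

/* Hypothetical legal caller: compares the text after a one-character
   prefix, so strcmp receives buf + 1 rather than buf. */
int suffix_equals(const char *buf, const char *expected)
{
    return strcmp(buf + 1, expected) == 0;
}
```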
So, for an apples-to-apples comparison between processors, and not between
several possible (legal or illegal) degrees of compiler optimization, one
should check that the systems are comparable with respect to the following
three points:
(1) String functions in assembly language vs. in C
Frequently used functions such as the string functions can and should be
written in assembly language, and all serious C language systems known
to me do this. (I list this point for completeness only.) Note that
processors with an instruction that checks a word for a null byte (such
as AMD's 29000 and Intel's 80960) have an advantage here. (This
advantage becomes relatively smaller if optimization (3) is applied.)
Because the strings in Dhrystone are long, this advantage may be
considered out of proportion, but it is certainly legal to use such
instructions; after all, these situations are what they were invented
for.
(2) String function code inline vs. as library functions.
ANSI C has created a new situation, compared with the older
Kernighan/Ritchie C. In the original C, the definition of the string
functions was not part of the language. Now it is, and inlining is
explicitly allowed. I probably should have stated more clearly in my
SIGPLAN Notices paper that the rule "No procedure inlining for
Dhrystone" referred to the user level procedures only and not to the
library routines.
(3) Fixed-length and alignment assumptions for the strings
Compilers should be allowed to optimize in these cases if (and only if)
it is safe to do so. For Dhrystone, this is the "strcpy" statement, but
not the "strcmp" statement (unless, of course, the "strcmp" code
explicitly checks the alignment at execution time and branches
accordingly). A "Dhrystone switch" for the compiler that causes the
generation of code that may not work under certain circumstances is
certainly inappropriate for comparisons. It has been reported in Usenet
that some C compilers provide such a compiler option; since I don't have
access to all C compilers involved, I cannot verify this.
If the fixed-length and word-alignment assumptions can be used, a wide
bus that permits fast multi-word load instructions certainly does help;
however, this fact by itself should not make a really big difference.
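For processors without an instruction that checks a word for a null byte, a widely known bit trick gives the same answer in a few ALU operations. This is a sketch of that technique (it is not described in this article), assuming 32-bit words; has_zero_byte is a hypothetical name. A word-at-a-time string routine would apply such a test to each aligned word it loads and fall back to byte operations only in the word that contains the terminator.

```c
#include <stdint.h>

/* Returns nonzero if and only if at least one of the four bytes of w is
   zero. Subtracting 0x01 from every byte turns a zero byte into 0xFF,
   setting its high bit; "& ~w" discards bytes whose high bit was already
   set in w. Borrows start only at a zero byte, so a word without any
   zero byte never produces a flag, and the test is exact. */
int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}
```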
A check of these points - something that is necessary for a thorough
evaluation and comparison of the Dhrystone performance claims - requires
object code listings as well as listings for the string functions (strcpy,
strcmp) that are possibly called by the program.
I don't pretend that Dhrystone is a perfect tool to measure the integer
performance of microprocessors. The more it is used and discussed, the more I
myself learn about aspects that I hadn't noticed yet when I wrote the program.
And of course, the very success of a benchmark program is a danger in that
people may tune their compilers and/or hardware to it, and with this action
make it less useful.
Whetstone and Linpack also have their weak points: The Whetstone rating
depends heavily on the speed of the mathematical functions (sine, sqrt, ...),
and Linpack is sensitive to data alignment for some cache configurations.
The introduction of a standard set of public domain benchmark software (something
the SPEC effort is attempting) is certainly a worthwhile thing. In the meantime,
people will continue to use whatever is available and widely distributed, and
Dhrystone ratings are probably still better than MIPS ratings if the latter,
as is often the case in industry, are not derived in any reproducible way. However, any
serious performance evaluation requires more than just a comparison of raw
numbers; one has to make sure that the numbers have been obtained in a
comparable way.