Understanding Variations in Dhrystone Performance

By Reinhold P. Weicker, Siemens AG, AUT E 51, Erlangen

April 1989

This article has appeared in:

Microprocessor Report, May 1989 (Editor: M. Slater), pp. 16-17

Microprocessor manufacturers tend to credit all the performance measured by
benchmarks to the speed of their processors; they often don't even mention the
programming language and compiler used. Their detailed documents, usually
called "performance brief" or "performance report," do give more details.
However, these details are often lost in the press releases and other
marketing statements. For serious performance evaluation, it is necessary to
study the code generated by the various compilers.

Dhrystone was originally published in Ada (Communications of the ACM, Oct.
1984). However, since good Ada compilers were rare at that time and, together
with UNIX, C became more and more popular, the C version of Dhrystone is the
one now mainly used in industry. There are "official" versions 2.1 for Ada,
Pascal, and C, which are as close together as the languages' semantic
differences permit.

Dhrystone contains two statements where the programming language and its
translation play a major part in the execution time measured by the benchmark:

  o String assignment (in procedure Proc_0 / main)
  o String comparison (in function Func_2)

In Ada and Pascal, strings are arrays of characters where the length of the
string is part of the type information known at compile time. In C, strings
are also arrays of characters, but there are no operators defined in the
language for assignment and comparison of strings. Instead, functions
"strcpy" and "strcmp" are used. These functions are defined for strings of
arbitrary length, and make use of the fact that strings in C have to end with
a terminating null byte. For general-purpose calls to these functions, the
implementor can assume nothing about the length and the alignment of the
strings involved.

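As an illustration of the general case described above, the following is a
minimal sketch of what the two routines must do when nothing is known about
length or alignment: proceed one byte at a time until the terminating null
byte is found. The names sketch_strcpy and sketch_strcmp are hypothetical,
chosen only to avoid clashing with the library functions; real libraries are
usually more elaborate.

```c
/* Hypothetical names; not the library implementations. */

/* Copy src, including its terminating null byte, to dst one byte at a
   time. Nothing is assumed about the length or alignment of either string. */
char *sketch_strcpy(char *dst, const char *src)
{
    char *p = dst;
    while ((*p++ = *src++) != '\0') {
        /* empty body: the copy happens in the loop condition */
    }
    return dst;
}

/* Compare byte by byte until the strings differ or a terminator is reached.
   Returns a negative, zero, or positive value, like the library strcmp. */
int sketch_strcmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

Each loop iteration handles a single character and tests for the terminator,
which is why long strings make these calls comparatively expensive.
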
The C version of Dhrystone spends a relatively large amount of time in these
two functions. Some time ago, I made measurements on a VAX 11/785 with the
Berkeley UNIX (4.2) compilers (often-used compilers, but certainly not the
most advanced). In the C version, 23% of the time was spent in the string
functions; in the Pascal version, only 10%. On good RISC machines (where less
time is spent in the procedure calling sequence than on a VAX) and with better
optimizing compilers, the percentage is higher; MIPS has reported 34% for an
R3000. Because of this effect, Pascal and Ada Dhrystone results are usually
better than C results (except when the optimization quality of the C compiler
is considerably better than that of the other compilers).

Several people have noted that the string operations are over-represented in
Dhrystone, mainly because the strings occurring in Dhrystone are longer than
average strings. I admit that this is true, and have said so in my SIGPLAN
Notices paper (Aug. 1988); however, I didn't want to generate confusion by
changing the string lengths from version 1 to version 2.

Even if they are somewhat over-represented in Dhrystone, string operations are
frequent enough that it makes sense to implement them in the most efficient
way possible, not only for benchmarking purposes. This means that they can
and should be written in assembly language code. ANSI C also explicitly allows
the string functions to be implemented as macros, i.e., by inline code.

There is also a third way to speed up the "strcpy" statement in Dhrystone: For
this particular "strcpy" statement, the source of the assignment is a string
constant. Therefore, in contrast to calls to "strcpy" in the general case, the
compiler knows the length and alignment of the strings involved at compile
time and can generate code in the same efficient way as a Pascal compiler
(word instructions instead of byte instructions).

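The effect of this third optimization can be sketched in C as follows. The
type name Str_30 and the string literal are modeled on the Dhrystone C source;
the function names copy_general and copy_fixed are hypothetical. Because the
size of a string literal, terminator included, is a compile-time constant,
the call can be treated as a fixed-length block copy, which a compiler is
then free to expand into word moves.

```c
#include <string.h>

/* A 30-character string type: 30 characters plus the terminating null byte. */
typedef char Str_30[31];

/* Source string constant; its size is known at compile time. */
#define STR_1 "DHRYSTONE PROGRAM, 1'ST STRING"

/* Compile-time check (portable to old C dialects): the literal, including
   its terminator, must fit in the destination type. */
typedef char str_1_fits[(sizeof(STR_1) <= sizeof(Str_30)) ? 1 : -1];

/* General form: a library call that searches for the terminator. */
void copy_general(Str_30 dst)
{
    strcpy(dst, STR_1);
}

/* Equivalent fixed-length form: exactly sizeof(STR_1) bytes are copied,
   with no test for the terminator inside the copy. */
void copy_fixed(Str_30 dst)
{
    memcpy(dst, STR_1, sizeof(STR_1));
}
```

Both functions leave the same bytes in the destination; only the second
exposes the constant length to the code generator.
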
This is not allowed in the case of the "strcmp" call: Here, the addresses are
formal procedure parameters, and no assumptions can be made about the length
or alignment of the strings. Any such assumptions would indicate an incorrect
implementation. They might work for Dhrystone, where the strings are in fact
word-aligned with typical compilers, but other programs would deliver
incorrect results.

So, for an apples-to-apples comparison between processors, and not between
several possible (legal or illegal) degrees of compiler optimization, one
should check that the systems are comparable with respect to the following
three points:

(1) String functions in assembly language vs. in C

     Frequently used functions such as the string functions can and should be
     written in assembly language, and all serious C language systems known
     to me do this. (I list this point for completeness only.) Note that
     processors with an instruction that checks a word for a null byte (such
     as AMD's 29000 and Intel's 80960) have an advantage here. (The relative
     advantage shrinks if optimization (3) is applied.) Due to the length of
     the strings involved in Dhrystone, this advantage may be considered
     disproportionately large, but it is certainly legal to use such
     instructions - after all, these situations are what they were invented
     for.

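For readers unfamiliar with the null-byte check mentioned in point (1), the
following is a sketch in portable C of the test such an instruction performs
on a 32-bit word. It uses a well-known bit manipulation idiom and is not the
definition of any particular processor's instruction; the name has_null_byte
is hypothetical.

```c
#include <stdint.h>

/* Return nonzero if at least one of the four bytes of w is zero.
   Subtracting 1 from every byte turns on the top bit of a byte that was
   zero; "& ~w" discards bytes whose top bit was already set in w, and the
   final mask keeps only the top bit of each byte. */
int has_null_byte(uint32_t w)
{
    return ((w - UINT32_C(0x01010101)) & ~w & UINT32_C(0x80808080)) != 0;
}
```

A string routine that works on whole words can use such a test to examine
four characters for the terminator at once, where a byte loop needs four
separate comparisons.
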
(2) String function code inline vs. as library functions

     ANSI C has created a new situation, compared with the older
     Kernighan/Ritchie C. In the original C, the definition of the string
     functions was not part of the language. Now it is, and inlining is
     explicitly allowed. I probably should have stated more clearly in my
     SIGPLAN Notices paper that the rule "No procedure inlining for
     Dhrystone" referred to the user level procedures only and not to the
     library routines.

(3) Fixed-length and alignment assumptions for the strings

     Compilers should be allowed to optimize in these cases if (and only if)
     it is safe to do so. For Dhrystone, this is the "strcpy" statement, but
     not the "strcmp" statement (unless, of course, the "strcmp" code
     explicitly checks the alignment at execution time and branches
     accordingly). A "Dhrystone switch" for the compiler that causes the
     generation of code that may not work under certain circumstances is
     certainly inappropriate for comparisons. It has been reported on Usenet
     that some C compilers provide such a compiler option; since I don't have
     access to all C compilers involved, I cannot verify this.

     If the fixed-length and word-alignment assumption can be used, a wide
     bus that permits fast multi-word load instructions certainly does help;
     however, this fact by itself should not make a really big difference.

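The safe variant named in point (3), a "strcmp" that checks the alignment at
execution time and branches accordingly, might look roughly like the sketch
below. The names checked_strcmp and word_has_null are hypothetical. The
sketch also assumes that reading a whole aligned word never touches
inaccessible memory, even when the terminator lies inside that word; strict
C does not guarantee this, which is one reason production versions are
written in assembly language.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of w is zero (portable stand-in for a hardware test). */
static int word_has_null(uint32_t w)
{
    return ((w - UINT32_C(0x01010101)) & ~w & UINT32_C(0x80808080)) != 0;
}

int checked_strcmp(const char *a, const char *b)
{
    /* Fast path only if BOTH addresses are word-aligned at execution time. */
    if ((((uintptr_t)a | (uintptr_t)b) & (sizeof(uint32_t) - 1)) == 0) {
        for (;;) {
            uint32_t wa, wb;
            /* ASSUMPTION: the aligned word at a and b is fully readable. */
            memcpy(&wa, a, sizeof wa);
            memcpy(&wb, b, sizeof wb);
            if (wa != wb || word_has_null(wa))
                break;          /* let the byte loop resolve this word */
            a += sizeof wa;
            b += sizeof wb;
        }
    }
    /* General path: no assumption about alignment or length. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

Unaligned arguments simply skip the word loop, so the result is the same for
any pair of strings; only the speed differs. A version that omitted the
alignment test would be the unsafe kind of optimization described above.
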
A check of these points - something that is necessary for a thorough
evaluation and comparison of the Dhrystone performance claims - requires
object code listings as well as listings for the string functions (strcpy,
strcmp) that are possibly called by the program.

I don't pretend that Dhrystone is a perfect tool to measure the integer
performance of microprocessors. The more it is used and discussed, the more I
myself learn about aspects that I hadn't noticed yet when I wrote the program.
And of course, the very success of a benchmark program is a danger in that
people may tune their compilers and/or hardware to it, and with this action
make it less useful.

Whetstone and Linpack have their critical points also: The Whetstone rating
depends heavily on the speed of the mathematical functions (sine, sqrt, ...),
and Linpack is sensitive to data alignment for some cache configurations.

Introduction of a standard set of public domain benchmark software (something
the SPEC effort attempts) is certainly a worthwhile thing. In the meantime,
people will continue to use whatever is available and widely distributed, and
Dhrystone ratings are probably still better than MIPS ratings if these are -
as often in industry - based on no reproducible derivation. However, any
serious performance evaluation requires more than just a comparison of raw
numbers; one has to make sure that the numbers have been obtained in a
comparable way.