 +---------------------------------------------------------------------------+
 |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
 |                                                                           |
 | Copyright (C) 1992,1993,1994,1995,1996                                    |
 |                       W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
 |                       Australia.  E-mail billm@suburbia.net               |
 |                                                                           |
 |    This program is free software; you can redistribute it and/or modify   |
 |    it under the terms of the GNU General Public License version 2 as      |
 |    published by the Free Software Foundation.                             |
 |                                                                           |
 |    This program is distributed in the hope that it will be useful,        |
 |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
 |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
 |    GNU General Public License for more details.                           |
 |                                                                           |
 |    You should have received a copy of the GNU General Public License      |
 |    along with this program; if not, write to the Free Software            |
 |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
 |                                                                           |
 +---------------------------------------------------------------------------+



wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387
which was my 80387 emulator for early versions of djgpp (gcc under
msdos); wm-emu387 was in turn based upon emu387 which was written by
DJ Delorie for djgpp.  The interface to the Linux kernel is based upon
the original Linux math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Unfortunately, numerous
facets of the functioning of the FPU are not well covered in the
Reference Manual. The information in the manual has been supplemented
with measurements on real 80486's. Unfortunately, it is simply not
possible to be sure that all of the peculiarities of the 80486 have
been discovered, so there are always likely to be obscure differences
in the detailed behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU,
but is very close.  See "Limitations" later in this file for a list of
some differences.

Please report bugs, etc to me at:
       billm@suburbia.net


--Bill Metzenthen
  October 1996


----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remarkable in these.
(2) Divide has been tuned to get reasonable performance. The algorithm
    is not the obvious one which most people seem to use, but is designed
    to take advantage of the characteristics of the 80386. I expect that
    it has been invented many times before I discovered it, but I have not
    seen it. It is based upon one of those ideas which one carries around
    for years without ever bothering to check it out.
(3) The sqrt function has been tuned to get good performance. It is based
    upon Newton's classic method. Performance was improved by capitalizing
    upon the properties of Newton's method, and the code is once again
    structured taking account of the 80386 characteristics.
(4) The trig, log, and exp functions are based in each case upon quasi-
    "optimal" polynomial approximations. My definition of "optimal" was
    based upon getting good accuracy with reasonable speed.
(5) The argument reducing code for the trig function effectively uses
    a value of pi which is accurate to more than 128 bits. As a consequence,
    the reduced argument is accurate to more than 64 bits for arguments up
    to a few pi, and remains accurate to more than 64 bits for most
    arguments, even those approaching 2^63. This is far superior to an
    80486, which uses a value of pi which is accurate to 66 bits.
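
As a rough illustration of item (3), the following Python sketch shows
the classic Newton iteration for a square root. It is not the emulator's
code, which works on 64-bit significands with the 80386 in mind; the
name newton_sqrt, the starting guess and the iteration count are all
assumptions made for this example. It only shows one well-known
property of the method: the error is roughly squared at each step, so
a handful of iterations is enough.

```python
import math

def newton_sqrt(a, iterations=6):
    """Approximate sqrt(a) for a in [1, 4) with Newton's iteration."""
    x = (1.0 + a) / 2.0          # crude starting guess (an assumption)
    for _ in range(iterations):
        x = 0.5 * (x + a / x)    # Newton step for f(x) = x*x - a
    return x

if __name__ == "__main__":
    # Compare against the library result for a sample argument.
    print(newton_sqrt(2.0), math.sqrt(2.0))
```

A better starting guess would reduce the number of steps needed; the
one used here was chosen only for simplicity.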

The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, it may happen that when the emulator is accessing the user
memory space, swapping may be needed. In this case the emulator may be
temporarily suspended while disk I/O takes place. During this time
another process may use the emulator, thereby perhaps changing static
variables. The code which accesses user memory is confined to five
files:
    fpu_entry.c
    reg_ld_str.c
    load_store.c
    get_address.c
    errors.c
As of version 1.12 of the emulator, no static variables are used
(apart from those in the kernel's per-process tables). The emulator is
therefore now fully re-entrant, rather than having just the restricted
form of re-entrancy which is required by the Linux kernel.

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version 1.20) and the 80486 FPU (apart from bugs). Some of the more
important differences are listed below:

The Roundup flag does not have much meaning for the transcendental
functions and its 80486 value with these functions is likely to differ
from its emulator value.

In a few rare cases the Underflow flag obtained with the emulator will
be different from that obtained with an 80486. This occurs when the
following conditions apply simultaneously:
(a) the operands have a higher precision than the current setting of the
    precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
(e) the magnitude of the exact result would be exactly 2^-16382 if the
    operands were rounded to the current precision before the arithmetic
    operation was performed.
If all of these apply, the emulator will set the Underflow flag but a real
80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
unsupported by the 80486. They are the Pseudo-NaNs, Pseudoinfinities,
and Unnormals. None of these will be generated by an 80486 or by the
emulator. Do not use them. The emulator treats them differently in
detail from the way an 80486 does.

The emulator treats PseudoDenormals differently from an 80486. These
numbers are in fact properly normalised numbers with the exponent
offset by 1, and the emulator treats them as such. Unlike the 80486,
the emulator does not generate a Denormal Operand exception for these
numbers. The arithmetical results produced when using such a number as
an operand are the same for the emulator and a real 80486 (apart from
any slight precision difference for the transcendental functions).
Neither the emulator nor an 80486 produces one of these numbers as the
result of any arithmetic operation. An 80486 can keep one of these
numbers in an FPU register with its identity as a PseudoDenormal, but
the emulator will not; they are always converted to a valid number.
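
For readers unfamiliar with these encodings, the Python sketch below
classifies an 80-bit Extended Real from its 15-bit exponent field and
its 64-bit significand (whose top bit is the explicit integer bit). It
follows the usual x87 definitions and is not taken from the emulator
source; the function name classify_extended and the returned strings
are invented for this illustration, and the sign bit is ignored.

```python
EXP_MAX = 0x7FFF          # all-ones 15-bit exponent field
INT_BIT = 1 << 63         # explicit integer bit of the 64-bit significand

def classify_extended(exponent, significand):
    """Name the class of an 80-bit extended real (sign bit ignored)."""
    int_bit = bool(significand & INT_BIT)
    fraction = significand & (INT_BIT - 1)
    if exponent == 0:
        if significand == 0:
            return "zero"
        # Integer bit set with a zero exponent field: a PseudoDenormal.
        return "pseudo-denormal" if int_bit else "denormal"
    if exponent == EXP_MAX:
        if not int_bit:
            return "pseudo-infinity" if fraction == 0 else "pseudo-NaN"
        return "infinity" if fraction == 0 else "NaN"
    # Any other exponent needs the integer bit set to be a valid number.
    return "normal" if int_bit else "unnormal"

if __name__ == "__main__":
    print(classify_extended(0, INT_BIT))         # exponent 0, integer bit set
    print(classify_extended(0x3FFF, INT_BIT))    # the encoding of 1.0
```

A PseudoDenormal has the same value as the normal number with the same
significand and an exponent field of 1, which is the "offset by 1"
mentioned above.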

Self-modifying code can cause the emulator to fail. An example of such
code is:
          movl %esp,(%ebx)
          fld1
The FPU instruction may be (usually will be) loaded into the prefetch
queue of the CPU before the movl instruction is executed. If the
destination of the 'movl' overlaps the FPU instruction then the bytes
in the prefetch queue and memory will be inconsistent when the FPU
instruction is executed. The emulator will be invoked but will not be
able to find the instruction which caused the device-not-present
exception. For this case, the emulator cannot emulate the behaviour of
an 80486DX.

Handling of the address size override prefix byte (0x67) has not been
extensively tested yet. A major problem exists because using it in
vm86 mode can cause a general protection fault. Address offsets
greater than 0xffff appear to be illegal in vm86 mode but are quite
acceptable (and work) in real mode. A small test program developed to
check the addressing, and which runs successfully in real mode,
crashes dosemu under Linux and also brings Windows down with a general
protection fault message when run under the MS-DOS prompt of Windows
3.1. (The program simply reads data from a valid address).

The emulator supports 16-bit protected mode, with one difference from
an 80486DX.  An 80486DX will allow some floating point instructions to
write a few bytes below the lowest address of the stack.  The emulator
will not allow this in 16-bit protected mode: no instructions are
allowed to write outside the bounds set by the protection.

----------------------- Performance of wm-FPU-emu -----------------------

Speed.
-----

The speed of floating point computation with the emulator will depend
upon instruction mix. Relative performance is best for the instructions
which require most computation. The simple instructions are adversely
affected by the fpu instruction trap overhead.


Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions. All times are in microseconds
measured on a 33MHz 386 with 64k cache. The Turbo C tests were under
ms-dos, the next two columns are for emulators running with the djgpp
ms-dos extender. The final column is for wm-FPU-emu in Linux 0.97,
using libm4.0 (hard).

function      Turbo C        djgpp 1.06        WM-emu387     wm-FPU-emu

   +          60.5           154.8              76.5          139.4
   -          61.1-65.5      157.3-160.8        76.2-79.5     142.9-144.7
   *          71.0           190.8              79.6          146.6
   /          61.2-75.0      261.4-266.9        75.3-91.6     142.2-158.1

 sin()        310.8          4692.0            319.0          398.5
 cos()        284.4          4855.2            308.0          388.7
 tan()        495.0          8807.1            394.9          504.7
 atan()       328.9          4866.4            601.1          419.5-491.9

 sqrt()       128.7          crashed           145.2          227.0
 log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
 exp()        479.1          6619.2            469.1          850.8


The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

 [ Linus' note: I changed look-ahead to be the default under linux, as
   there was no reason not to use it after I had edited it to be
   disabled during tracing ]

            wm-FPU-emu w     original w
            look-ahead       'soft' lib
   +         106.4            190.2
   -         108.6-111.6      192.4-216.2
   *         113.4            193.1
   /         108.8-124.4      700.1-706.2

 sin()       390.5            2642.0
 cos()       381.5            2767.4
 tan()       496.5            3153.3
 atan()      367.2-435.5      2439.4-3396.8

 sqrt()      195.1            4732.5
 log()       358.0-387.5      3359.2-3390.3
 exp()       619.3            4046.4
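
To put the two columns in proportion, the short Python sketch below
divides the time for the original emulator by the time for wm-FPU-emu
with look-ahead. The figures are copied from the table above; only the
rows with a single value in both columns are included, and the names
TIMES and speedup are invented for this example.

```python
# Times in microseconds: (wm-FPU-emu with look-ahead, original emulator).
# Rows that the table gives as ranges are left out.
TIMES = {
    "+":      (106.4, 190.2),
    "*":      (113.4, 193.1),
    "sin()":  (390.5, 2642.0),
    "cos()":  (381.5, 2767.4),
    "tan()":  (496.5, 3153.3),
    "sqrt()": (195.1, 4732.5),
    "exp()":  (619.3, 4046.4),
}

def speedup(name):
    """Ratio of the original emulator's time to wm-FPU-emu's time."""
    new, old = TIMES[name]
    return old / new

if __name__ == "__main__":
    for name in TIMES:
        print(f"{name:8s} {speedup(name):5.1f}x")
```

On these figures the ratio is a little under 2 for add and multiply
and largest for sqrt(), at roughly 24.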


These figures are now somewhat out-of-date. The emulator has become
progressively slower for most functions as more of the 80486 features
have been implemented.


----------------------- Accuracy of wm-FPU-emu -----------------------


The accuracy of the emulator is in almost all cases equal to or better
than that of an Intel 80486 FPU.

The results of the basic arithmetic functions (+,-,*,/), and fsqrt
match those of an 80486 FPU. They are the best possible; the error for
these never exceeds 1/2 an lsb. The fprem and fprem1 instructions
return exact results; they have no error.


The following table compares the emulator accuracy for the sqrt(),
trig and log functions against the Turbo C "emulator". For this table,
each function was tested at about 400 points. Ideal worst-case results
would be 64 bits. The reduced Turbo C accuracy of cos() and tan() for
arguments greater than pi/4 can be thought of as being related to the
precision of the argument x; e.g. an argument of pi/2-(1e-10) which is
accurate to 64 bits can result in a relative accuracy in cos() of
about 64 + log2(cos(x)) = 31 bits.
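
The arithmetic behind that estimate can be checked with a few lines of
Python. The sketch uses the identity cos(pi/2 - d) = sin(d) so that the
small cosine is not itself spoiled by the rounding of pi/2 in double
precision; the function name cos_accuracy_bits is invented for this
example.

```python
import math

def cos_accuracy_bits(delta, arg_bits=64):
    """Estimate arg_bits + log2(cos(x)) for x = pi/2 - delta."""
    cos_x = math.sin(delta)      # cos(pi/2 - delta) == sin(delta)
    return arg_bits + math.log2(cos_x)

if __name__ == "__main__":
    # For delta = 1e-10, log2(cos(x)) is about -33.2.
    print(cos_accuracy_bits(1e-10))
```

For delta = 1e-10 this gives a little under 31 bits, in line with the
figure quoted above.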


Function      Tested x range            Worst result                Turbo C
                                        (relative bits)

sqrt(x)       1 .. 2                    64.1                         63.2
atan(x)       1e-10 .. 200              64.2                         62.8
cos(x)        0 .. pi/2-(1e-10)         64.4 (x <= pi/4)             62.4
                                        64.1 (x = pi/2-(1e-10))      31.9
sin(x)        1e-10 .. pi/2             64.0                         62.8
tan(x)        1e-10 .. pi/2-(1e-10)     64.0 (x <= pi/4)             62.1
                                        64.1 (x = pi/2-(1e-10))      31.9
exp(x)        0 .. 1                    63.1 **                      62.9
log(x)        1+1e-6 .. 2               63.8 **                      62.1

** The accuracy for exp() and log() is low because the FPU (emulator)
does not compute them directly; two operations are required.


The emulator passes the "paranoia" tests (compiled with gcc 2.3.3 or
later) for 'float' variables (24 bit precision numbers) when precision
control is set to 24, 53 or 64 bits, and for 'double' variables (53
bit precision numbers) when precision control is set to 53 bits (a
properly performing FPU cannot pass the 'paranoia' tests for 'double'
variables when precision control is set to 64 bits).

The code for reducing the argument for the trig functions (fsin, fcos,
fptan and fsincos) has been improved and now effectively uses a value
for pi which is accurate to more than 128 bits precision. As a
consequence, the accuracy of these functions for large arguments has
been dramatically improved (and is now very much better than an 80486
FPU). There is also now no degradation of accuracy for fcos and fptan
for operands close to pi/2. Measured results are (note that the
definition of accuracy has changed slightly from that used for the
above table):

Function      Tested x range          Worst result
                                     (absolute bits)

cos(x)        0 .. 9.22e+18              62.0
sin(x)        1e-16 .. 9.22e+18          62.1
tan(x)        1e-16 .. 9.22e+18          61.8

It is possible with some effort to find very large arguments which
give much degraded precision. For example, the integer number
           8227740058411162616.0
is within about 10e-7 of a multiple of pi. To find the tan (for
example) of this number to 64 bits precision it would be necessary to
have a value of pi which had about 150 bits precision. The FPU
emulator computes the result to about 42.6 bits precision (the correct
result is about -9.739715e-8). On the other hand, an 80486 FPU returns
0.01059, which in relative terms is hopelessly inaccurate.
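
The effect of the precision of pi on this example can be reproduced
with exact rational arithmetic. The Python sketch below is not the
emulator's reduction code: it subtracts the nearest multiple of pi
from the argument, once with pi given to 50 decimal places and once
with pi cut to 66 significant bits. Treating the 66-bit value as a
stand-in for the constant used by an 80486 is an assumption, and the
names PI_REF, PI_66 and reduce_mod_pi are invented for this example.

```python
from fractions import Fraction
import math

# pi to 50 decimal places, used here as the reference value.
PI_REF = Fraction("3.14159265358979323846264338327950288419716939937510")

# pi cut to 66 significant bits: 2 integer bits plus 64 fraction bits.
PI_66 = Fraction(math.floor(PI_REF * 2**64), 2**64)

X = 8227740058411162616

def reduce_mod_pi(x, pi):
    """Return x - n*pi for the integer n nearest to x/pi, as a float."""
    n = round(Fraction(x) / pi)
    return float(Fraction(x) - n * pi)

if __name__ == "__main__":
    good = reduce_mod_pi(X, PI_REF)
    poor = reduce_mod_pi(X, PI_66)
    print("50-digit pi: reduced", good, "tan", math.tan(good))
    print("66-bit pi:   reduced", poor, "tan", math.tan(poor))
```

The multiple of pi involved is about 2.6e18, so even an error of a few
parts in 1e21 in the value of pi is enough to swamp a reduced argument
of order 1e-7.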

For arguments close to critical angles (which occur at multiples of
pi/2) the emulator is more accurate than an 80486 FPU. For very large
arguments, the emulator is far more accurate.


Prior to version 1.20 of the emulator, the accuracy of the results for
the transcendental functions (in their principal range) was not as
good as the results from an 80486 FPU. From version 1.20, the accuracy
has been considerably improved and these functions now give measured
worst-case results which are better than the worst-case results given
by an 80486 FPU.

The following table gives the measured results for the emulator. The
number of randomly selected arguments in each case is about half a
million.  The group of three columns gives the frequency of the given
accuracy in number of times per million, thus the second of these
columns shows that an accuracy of between 63.80 and 63.89 bits was
found at a rate of 133 times per one million measurements for fsin.
The results show that the fsin, fcos and fptan instructions return
results which are in error (i.e. less accurate than the best possible
result (which is 64 bits)) for about one per cent of all arguments
between -pi/2 and +pi/2.  The other instructions have a lower
frequency of results which are in error.  The last two columns give
the worst accuracy which was found (in bits) and the approximate value
of the argument which produced it.

                                frequency (per M)
                               -------------------   ---------------
instr   arg range    # tests   63.7   63.8    63.9   worst   at arg
                               bits   bits    bits    bits
-----  ------------  -------   ----   ----   -----   -----  --------
fsin     (0,pi/2)     547756      0    133   10673   63.89  0.451317
fcos     (0,pi/2)     547563      0    126   10532   63.85  0.700801
fptan    (0,pi/2)     536274     11    267   10059   63.74  0.784876
fpatan  4 quadrants   517087      0      8    1855   63.88  0.435121 (4q)
fyl2x     (0,20)      541861      0      0    1323   63.94  1.40923  (x)
fyl2xp1 (-.293,.414)  520256      0      0    5678   63.93  0.408542 (x)
f2xm1     (-1,1)      538847      4    481    6488   63.79  0.167709


Tests performed on an 80486 FPU showed results of lower accuracy. The
following table gives the results which were obtained with an AMD
486DX2/66 (other tests indicate that an Intel 486DX produces
identical results).  The tests were basically the same as those used
to measure the emulator (the values, being random, were in general not
the same).  The total number of tests for each instruction is given
at the end of the table; in each case about 100k tests were performed.
Another line of figures at the end of the table shows that most of the
instructions return results which are in error for more than 10
percent of the arguments tested.

The numbers in the body of the table give the approx number of times a
result of the given accuracy in bits (given in the left-most column)
was obtained per one million arguments. For three of the instructions,
two columns of results are given:
* The second column for f2xm1 gives the number of cases where the
  results of the first column were for a positive argument; this shows
  that this instruction gives better results for positive arguments
  than it does for negative.
* In the cases of fcos and fptan, the first column gives the results
  of the second column with all cases where the argument was greater
  than 1.5 removed.
Unlike the emulator, an 80486 FPU returns results of relatively poor
accuracy for these instructions when the argument approaches pi/2. The
table does not show those cases when the accuracy of the results was
less than 62 bits, which occurs quite often for fsin and fptan when
the argument approaches pi/2. This poor accuracy is discussed above in
relation to the Turbo C "emulator", and the accuracy of the value of
pi.


bits   f2xm1  f2xm1 fpatan   fcos   fcos  fyl2x fyl2xp1  fsin  fptan  fptan
         all    x>0        x<=1.5    all                      x<=1.5    all
62.0       0      0      0      0    437      0      0      0      0    925
62.1       0      0     10      0    894      0      0      0      0   1023
62.2      14      0      0      0   1033      0      0      0      0    945
62.3      57      0      0      0   1202      0      0      0      0   1023
62.4     385      0      0     10   1292      0     23      0      0   1178
62.5    1140      0      0    119   1649      0     39      0      0   1149
62.6    2037      0      0    189   1620      0     16      0      0   1169
62.7    5086     14      0    646   2315     10    101     35     39   1402
62.8    8818     86      0    984   3050     59    287    131    224   2036
62.9   11340   1355      0   2126   4153     79    605    357    321   1948
63.0   15557   4750      0   3319   5376    246   1281    862    808   2688
63.1   20016   8288      0   4620   6628    511   2569   1723   1510   3302
63.2   24945  11127     10   6588   8098   1120   4470   2968   2990   4724
63.3   25686  12382     69   8774  10682   1906   6775   4482   5474   7236
63.4   29219  14722     79  11109  12311   3094   9414   7259   8912  10587
63.5   30458  14936    393  13802  15014   5874  12666   9609  13762  15262
63.6   32439  16448   1277  17945  19028  10226  15537  14657  19158  20346
63.7   35031  16805   4067  23003  23947  18910  20116  21333  25001  26209
63.8   33251  15820   7673  24781  25675  24617  25354  24440  29433  30329
63.9   33293  16833  18529  28318  29233  31267  31470  27748  29676  30601

Per cent with error:
        30.9           3.2          18.5    9.8   13.1   11.6          17.4
Total arguments tested:
       70194  70099 101784 100641 100641 101799 128853 114893 102675 102675


------------------------- Contributors -------------------------------

A number of people have contributed to the development of the
emulator, often by just reporting bugs, sometimes with suggested
fixes, and a few kind people have provided me with access in one way
or another to an 80486 machine. Contributors include (to those people
who I may have forgotten, please forgive me):

Linus Torvalds
Tommy.Thorn@daimi.aau.dk
Andrew.Tridgell@anu.edu.au
Nick Holloway, alfie@dcs.warwick.ac.uk
Hermano Moura, moura@dcs.gla.ac.uk
Jon Jagger, J.Jagger@scp.ac.uk
Lennart Benschop
Brian Gallew, geek+@CMU.EDU
Thomas Staniszewski, ts3v+@andrew.cmu.edu
Martin Howell, mph@plasma.apana.org.au
M Saggaf, alsaggaf@athena.mit.edu
Peter Barker, PETER@socpsy.sci.fau.edu
tom@vlsivie.tuwien.ac.at
Dan Russel, russed@rpi.edu
Daniel Carosone, danielce@ee.mu.oz.au
cae@jpmorgan.com
Hamish Coleman, t933093@minyos.xx.rmit.oz.au
Bruce Evans, bde@kralizec.zeta.org.au
Timo Korvola, Timo.Korvola@hut.fi
Rick Lyons, rick@razorback.brisnet.org.au
Rick, jrs@world.std.com

...and numerous others who responded to my request for help with
a real 80486.

