SoftFloat Release 2b General Documentation

John R. Hauser
2002 May 27


----------------------------------------------------------------------------
Introduction

SoftFloat is a software implementation of floating-point that conforms to
the IEC/IEEE Standard for Binary Floating-Point Arithmetic.  As many as four
formats are supported:  single precision, double precision, extended double
precision, and quadruple precision.  All operations required by the standard
are implemented, except for conversions to and from decimal.

This document gives information about the types defined and the routines
implemented by SoftFloat.  It does not attempt to define or explain the
IEC/IEEE Floating-Point Standard.  Details about the standard are available
elsewhere.


----------------------------------------------------------------------------
Limitations

SoftFloat is written in C and is designed to work with other C code.  The
SoftFloat header files assume an ISO/ANSI-style C compiler.  No attempt
has been made to accommodate compilers that are not ISO-conformant.  In
particular, the distributed header files will not be acceptable to any
compiler that does not recognize function prototypes.

Support for the extended double-precision and quadruple-precision formats
depends on a C compiler that implements 64-bit integer arithmetic.  If the
largest integer format supported by the C compiler is 32 bits, SoftFloat
is limited to only single and double precisions.  When that is the case,
all references in this document to extended double precision, quadruple
precision, and 64-bit integers should be ignored.


----------------------------------------------------------------------------
Contents

    Introduction
    Limitations
    Contents
    Legal Notice
    Types and Functions
    Rounding Modes
    Extended Double-Precision Rounding Precision
    Exceptions and Exception Flags
    Function Details
        Conversion Functions
        Standard Arithmetic Functions
        Remainder Functions
        Round-to-Integer Functions
        Comparison Functions
        Signaling NaN Test Functions
        Raise-Exception Function
    Contact Information



----------------------------------------------------------------------------
Legal Notice

SoftFloat was written by John R. Hauser.  This work was made possible in
part by the International Computer Science Institute, located at Suite 600,
1947 Center Street, Berkeley, California 94704.  Funding was partially
provided by the National Science Foundation under grant MIP-9311980.  The
original version of this code was written as part of a project to build
a fixed-point vector processor in collaboration with the University of
California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.

THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL
LOSSES, COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO
FURTHERMORE EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER
SCIENCE INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES,
COSTS, OR OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE
SOFTWARE.


----------------------------------------------------------------------------
Types and Functions

When 64-bit integers are supported by the compiler, the `softfloat.h'
header file defines four types:  `float32' (single precision), `float64'
(double precision), `floatx80' (extended double precision), and `float128'
(quadruple precision).  The `float32' and `float64' types are defined in
terms of 32-bit and 64-bit integer types, respectively, while the `float128'
type is defined as a structure of two 64-bit integers, taking into account
the byte order of the particular machine being used.  The `floatx80' type
is defined as a structure containing one 16-bit and one 64-bit integer, with
the machine's byte order again determining the order within the structure.

When 64-bit integers are _not_ supported by the compiler, the `softfloat.h'
header file defines only two types:  `float32' and `float64'.  Because
ISO/ANSI C guarantees at least one built-in integer type of 32 bits,
the `float32' type is identified with an appropriate integer type.  The
`float64' type is defined as a structure of two 32-bit integers, with the
machine's byte order determining the order of the fields.

In either case, the types in `softfloat.h' are defined such that if a system
implements the usual C `float' and `double' types according to the IEC/IEEE
Standard, then the `float32' and `float64' types should be indistinguishable
in memory from the native `float' and `double' types.  (On the other hand,
when `float32' or `float64' values are placed in processor registers by
the compiler, the type of registers used may differ from those used for the
native `float' and `double' types.)
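
As a rough illustration of the memory-compatibility remark above, the
following C sketch moves a value between a native `float' and a 32-bit
integer by copying its bytes.  The names `sketch_float32',
`sketch_float32_from_native', and `sketch_float32_to_native' are
hypothetical and are not part of SoftFloat; the sketch assumes that the
native `float' is a 32-bit IEC/IEEE single-precision value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SoftFloat's `float32': a 32-bit integer type. */
typedef uint32_t sketch_float32;

/* Copy the bytes of a native float into the integer stand-in.  memcpy is
   used instead of a pointer cast to stay clear of aliasing problems. */
static sketch_float32 sketch_float32_from_native( float f )
{
    sketch_float32 bits;
    memcpy( &bits, &f, sizeof bits );
    return bits;
}

/* The reverse direction: reinterpret the 32 bits as a native float. */
static float sketch_float32_to_native( sketch_float32 bits )
{
    float f;
    memcpy( &f, &bits, sizeof f );
    return f;
}
```

Under the stated assumption, `sketch_float32_from_native( 1.0f )' yields
the bit pattern 0x3F800000.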
 
SoftFloat implements the following arithmetic operations:

-- Conversions among all the floating-point formats, and also between
   integers (32-bit and 64-bit) and any of the floating-point formats.

-- The usual add, subtract, multiply, divide, and square root operations
   for all floating-point formats.

-- For each format, the floating-point remainder operation defined by the
   IEC/IEEE Standard.

-- For each floating-point format, a ``round to integer'' operation that
   rounds to the nearest integer value in the same format.  (The floating-
   point formats can hold integer values, of course.)

-- Comparisons between two values in the same floating-point format.

The only functions required by the IEC/IEEE Standard that are not provided
are conversions to and from decimal.


----------------------------------------------------------------------------
Rounding Modes

All four rounding modes prescribed by the IEC/IEEE Standard are implemented
for all operations that require rounding.  The rounding mode is selected
by the global variable `float_rounding_mode'.  This variable may be set
to one of the values `float_round_nearest_even', `float_round_to_zero',
`float_round_down', or `float_round_up'.  The rounding mode is initialized
to nearest/even.
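
To make the difference between the four modes concrete, the following C
sketch rounds a sign-magnitude value that carries a few extra fraction
bits.  It is only an illustration under stated assumptions, not SoftFloat's
code:  the enumeration and the function `sketch_round_magnitude' are
hypothetical names, and the real mode values are those defined in
`softfloat.h'.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the four rounding-mode values. */
enum sketch_rounding_mode {
    sketch_round_nearest_even,
    sketch_round_to_zero,
    sketch_round_down,
    sketch_round_up
};

/* Round a sign-magnitude value whose low `shift' bits are fraction bits.
   Returns the rounded magnitude with the fraction bits removed.
   Requires 0 < shift < 64. */
static uint64_t sketch_round_magnitude(
    uint64_t mag, int negative, int shift, enum sketch_rounding_mode mode )
{
    uint64_t half = (uint64_t) 1 << ( shift - 1 );
    uint64_t frac = mag & ( ( half << 1 ) - 1 );   /* bits to be discarded */
    uint64_t result = mag >> shift;                /* truncated magnitude */

    if ( frac == 0 ) return result;                /* exact: nothing to round */
    switch ( mode ) {
    case sketch_round_nearest_even:
        /* Round up past the halfway point, or at it when result is odd. */
        if ( frac > half || ( frac == half && ( result & 1 ) ) ) ++result;
        break;
    case sketch_round_to_zero:
        break;                                     /* truncation is enough */
    case sketch_round_down:                        /* toward minus infinity */
        if ( negative ) ++result;
        break;
    case sketch_round_up:                          /* toward plus infinity */
        if ( ! negative ) ++result;
        break;
    }
    return result;
}
```

With one fraction bit, the magnitude 5 stands for 2.5:  nearest/even and
round-to-zero both give 2, round-up gives 3 for a positive value, and
round-down gives 3 (that is, -3) for a negative one.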
 
 
----------------------------------------------------------------------------
Extended Double-Precision Rounding Precision

For extended double precision (`floatx80') only, the rounding precision
of the standard arithmetic operations is controlled by the global variable
`floatx80_rounding_precision'.  The operations affected are:

   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt

When `floatx80_rounding_precision' is set to its default value of 80, these
operations are rounded (as usual) to the full precision of the extended
double-precision format.  Setting `floatx80_rounding_precision' to 32
or to 64 causes the operations listed to be rounded to reduced precision
equivalent to single precision (`float32') or to double precision
(`float64'), respectively.  When rounding to reduced precision, additional
bits in the result significand beyond the rounding point are set to zero.
The consequences of setting `floatx80_rounding_precision' to a value other
than 32, 64, or 80 are not specified.  Operations other than the ones listed
above are not affected by `floatx80_rounding_precision'.


----------------------------------------------------------------------------
Exceptions and Exception Flags

All five exception flags required by the IEC/IEEE Standard are
implemented.  Each flag is stored as a unique bit in the global variable
`float_exception_flags'.  The positions of the exception flag bits within
this variable are determined by the bit masks `float_flag_inexact',
`float_flag_underflow', `float_flag_overflow', `float_flag_divbyzero', and
`float_flag_invalid'.  The exception flags variable is initialized to all 0,
meaning no exceptions.

An individual exception flag can be cleared with the statement

    float_exception_flags &= ~ float_flag_<name>;

where `<name>' is the appropriate name.  To raise a floating-point
exception, the SoftFloat function `float_raise' should be used (see below).
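
The flag handling described above can be sketched in C as follows.  All
names here (`sketch_exception_flags', the `sketch_flag_...' masks, and the
three helper functions) are hypothetical stand-ins, and the bit values are
chosen only for illustration; the real variable and masks come from
`softfloat.h'.  The sketch also leaves out any trap or abort that the real
`float_raise' may perform.

```c
/* Hypothetical stand-ins for the five flag masks.  The actual bit
   positions are determined by `softfloat.h' and may differ. */
enum {
    sketch_flag_inexact   = 1,
    sketch_flag_underflow = 2,
    sketch_flag_overflow  = 4,
    sketch_flag_divbyzero = 8,
    sketch_flag_invalid   = 16
};

/* Stand-in for the global flags variable, initialized to no exceptions. */
static int sketch_exception_flags = 0;

/* Set the given flags.  They stay set until explicitly cleared. */
static void sketch_raise( int flags )
{
    sketch_exception_flags |= flags;
}

/* Clear the given flags, following the statement pattern in the text. */
static void sketch_clear( int flags )
{
    sketch_exception_flags &= ~ flags;
}

/* Return 1 if any of the given flags is currently set, 0 otherwise. */
static int sketch_test( int flags )
{
    return ( sketch_exception_flags & flags ) != 0;
}
```

A typical pattern is to clear the flags of interest, perform a sequence of
operations, and then test the flags once at the end.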
 
In the terminology of the IEC/IEEE Standard, SoftFloat can detect tininess
for underflow either before or after rounding.  The choice is made by
the global variable `float_detect_tininess', which can be set to either
`float_tininess_before_rounding' or `float_tininess_after_rounding'.
Detecting tininess after rounding is better because it results in fewer
spurious underflow signals.  The other option is provided for compatibility
with some systems.  Like most systems, SoftFloat always detects loss of
accuracy for underflow as an inexact result.


----------------------------------------------------------------------------
Function Details

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversion Functions

All conversions among the floating-point formats are supported, as are all
conversions between a floating-point format and 32-bit and 64-bit signed
integers.  The complete set of conversion functions is:

   int32_to_float32      int64_to_float32
   int32_to_float64      int64_to_float64
   int32_to_floatx80     int64_to_floatx80
   int32_to_float128     int64_to_float128

   float32_to_int32      float32_to_int64
   float64_to_int32      float64_to_int64
   floatx80_to_int32     floatx80_to_int64
   float128_to_int32     float128_to_int64

   float32_to_float64    float32_to_floatx80   float32_to_float128
   float64_to_float32    float64_to_floatx80   float64_to_float128
   floatx80_to_float32   floatx80_to_float64   floatx80_to_float128
   float128_to_float32   float128_to_float64   float128_to_floatx80

Each conversion function takes one operand of the appropriate type and
returns one result.  Conversions from a smaller to a larger floating-point
format are always exact and so require no rounding.  Conversions from 32-bit
integers to double precision and larger formats are also exact, and likewise
for conversions from 64-bit integers to extended double and quadruple
precisions.
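
The exactness claims above follow from the significand widths of the
formats (24 bits for single precision and 53 bits for double precision,
counting the implicit leading bit).  The following C sketch, which is not
part of SoftFloat and uses the hypothetical name `sketch_converts_exactly',
checks whether an integer fits in a significand of a given width.

```c
#include <stdint.h>

/* Return 1 if `v' can be held exactly in a significand of `sig_bits'
   bits, 0 otherwise.  Requires 0 < sig_bits < 64; a format with 64 or
   more significand bits holds every 64-bit integer exactly. */
static int sketch_converts_exactly( int64_t v, int sig_bits )
{
    /* Take the magnitude in unsigned arithmetic so that the most
       negative value is handled without signed overflow. */
    uint64_t mag = ( v < 0 ) ? 0 - (uint64_t) v : (uint64_t) v;

    if ( mag == 0 ) return 1;
    /* Trailing zero bits are absorbed by the exponent and cost no
       significand precision. */
    while ( ( mag & 1 ) == 0 ) mag >>= 1;
    return mag < ( (uint64_t) 1 << sig_bits );
}
```

For example, every 32-bit integer passes the test for 53 bits, whereas the
64-bit integer 2^53 + 1 does not, which is why conversions from 64-bit
integers to double precision may need rounding.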
 
Conversions from floating-point to integer raise the invalid exception if
the source value cannot be rounded to a representable integer of the desired
size (32 or 64 bits).  If the floating-point operand is a NaN, the largest
positive integer is returned.  Otherwise, if the conversion overflows, the
largest integer with the same sign as the operand is returned.

On conversions to integer, if the floating-point operand is not already
an integer value, the operand is rounded according to the current rounding
mode as specified by `float_rounding_mode'.  Because C (and perhaps other
languages) requires that conversions to integers be rounded toward zero, the
following functions are provided for improved speed and convenience:

   float32_to_int32_round_to_zero    float32_to_int64_round_to_zero
   float64_to_int32_round_to_zero    float64_to_int64_round_to_zero
   floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero
   float128_to_int32_round_to_zero   float128_to_int64_round_to_zero

These variant functions ignore `float_rounding_mode' and always round toward
zero.
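
The NaN and overflow rules for conversions to integer can be sketched in C
on a native `double' as follows.  This is an illustration only, not
SoftFloat's implementation:  the function and flag names are hypothetical,
the operand is assumed to be an IEC/IEEE double, and ``largest integer with
the same sign'' is read here as the most negative value for a negative
operand.

```c
#include <stdint.h>

/* Hypothetical stand-in for the invalid exception flag. */
static int sketch_invalid_raised = 0;

/* Convert a native double to a 32-bit integer, rounding toward zero. */
static int32_t sketch_double_to_int32_round_to_zero( double x )
{
    if ( x != x ) {                    /* a NaN compares unequal to itself */
        sketch_invalid_raised = 1;
        return INT32_MAX;              /* NaN: largest positive integer */
    }
    if ( x >= 2147483648.0 ) {         /* still too large after truncation */
        sketch_invalid_raised = 1;
        return INT32_MAX;
    }
    if ( x <= -2147483649.0 ) {        /* still too small after truncation */
        sketch_invalid_raised = 1;
        return INT32_MIN;
    }
    return (int32_t) x;                /* in range: C truncates toward zero */
}
```

Note that an operand such as -2147483648.5 is not an overflow in this
sketch, because rounding toward zero brings it into the representable
range.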
 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Standard Arithmetic Functions

The following standard arithmetic functions are provided:

   float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
   float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
   float128_add   float128_sub   float128_mul   float128_div   float128_sqrt

Each function takes two operands, except for `sqrt' which takes only one.
The operands and result are all of the same type.

Rounding of the extended double-precision (`floatx80') functions is affected
by the `floatx80_rounding_precision' variable, as explained above in the
section _Extended Double-Precision Rounding Precision_.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Remainder Functions

For each format, SoftFloat implements the remainder function according to
the IEC/IEEE Standard.  The remainder functions are:

   float32_rem
   float64_rem
   floatx80_rem
   float128_rem

Each remainder function takes two operands.  The operands and result are all
of the same type.  Given operands x and y, the remainder functions return
the value x - n*y, where n is the integer closest to x/y.  If x/y is exactly
halfway between two integers, n is the even integer closest to x/y.  The
remainder functions are always exact and so require no rounding.
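
The definition x - n*y can be illustrated with integer operands, which
every floating-point format can hold.  The following C sketch is not how
SoftFloat computes the remainder; it only applies the definition, under the
assumptions that y is nonzero and that the operands are small enough for
the intermediate arithmetic not to overflow.  The name
`sketch_int_remainder' is hypothetical.

```c
#include <stdint.h>

/* Return x - n*y, where n is the integer nearest to x/y and ties go to
   the even n.  Assumes y != 0 and operands well within the int64_t range. */
static int64_t sketch_int_remainder( int64_t x, int64_t y )
{
    int64_t n = x / y;                  /* quotient truncated toward zero */
    int64_t r = x - n * y;              /* has the sign of x, |r| < |y| */
    int64_t ar = ( r < 0 ) ? -r : r;
    int64_t ay = ( y < 0 ) ? -y : y;
    int64_t rest = ay - ar;             /* distance to the next multiple of y */

    /* Step n one further from zero when that multiple of y is nearer,
       or on an exact tie when the truncated n is odd. */
    if ( ar > rest || ( ar == rest && ( n % 2 != 0 ) ) ) {
        r = ( ( r < 0 ) == ( y < 0 ) ) ? r - y : r + y;
    }
    return r;
}
```

For x = 5 and y = 2, x/y is 2.5, so n is the even neighbor 2 and the result
is 1.  For x = 7 and y = 2, x/y is 3.5, so n is 4 and the result is -1,
which shows that the remainder can be negative even for positive operands.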
 
Depending on the relative magnitudes of the operands, the remainder
functions can take considerably longer to execute than the other SoftFloat
functions.  This is inherent in the remainder operation itself and is not a
flaw in the SoftFloat implementation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Round-to-Integer Functions

For each format, SoftFloat implements the round-to-integer function
specified by the IEC/IEEE Standard.  The functions are:

   float32_round_to_int
   float64_round_to_int
   floatx80_round_to_int
   float128_round_to_int

Each function takes a single floating-point operand and returns a result of
the same type.  (Note that the result is not an integer type.)  The operand
is rounded to an exact integer according to the current rounding mode, and
the resulting integer value is returned in the same floating-point format.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Comparison Functions

The following floating-point comparison functions are provided:

   float32_eq    float32_le    float32_lt
   float64_eq    float64_le    float64_lt
   floatx80_eq   floatx80_le   floatx80_lt
   float128_eq   float128_le   float128_lt

Each function takes two operands of the same type and returns a 1 or 0
representing either _true_ or _false_.  The abbreviation `eq' stands for
``equal'' (=); `le' stands for ``less than or equal'' (<=); and `lt' stands
for ``less than'' (<).

The standard greater-than (>), greater-than-or-equal (>=), and not-equal
(!=) functions are easily obtained using the functions provided.  The
not-equal function is just the logical complement of the equal function.
The greater-than-or-equal function is identical to the less-than-or-equal
function with the operands reversed, and the greater-than function is
identical to the less-than function with the operands reversed.
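
The derivations just described can be written down directly.  In the
following C sketch, native `float' comparisons stand in for the provided
`eq', `le', and `lt' functions so that the example is self-contained; all
of the `sketch_...' names are hypothetical, and with SoftFloat the three
base functions would be the corresponding library routines.

```c
/* Stand-ins for the provided functions, using native float operands. */
static int sketch_eq( float a, float b ) { return a == b; }
static int sketch_le( float a, float b ) { return a <= b; }
static int sketch_lt( float a, float b ) { return a < b; }

/* Not-equal is the logical complement of equal. */
static int sketch_ne( float a, float b ) { return ! sketch_eq( a, b ); }

/* Greater-than-or-equal is less-than-or-equal with operands reversed. */
static int sketch_ge( float a, float b ) { return sketch_le( b, a ); }

/* Greater-than is less-than with operands reversed. */
static int sketch_gt( float a, float b ) { return sketch_lt( b, a ); }
```

Reversing the operands, rather than complementing `le' to obtain
greater-than, matters when an operand is a NaN:  every ordered comparison
with a NaN is false, so the complement of `le' would wrongly report true.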
 
The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
functions raise the invalid exception if either input is any kind of NaN.
The equal functions, on the other hand, are defined not to raise the invalid
exception on quiet NaNs.  For completeness, SoftFloat provides the following
additional functions:

   float32_eq_signaling    float32_le_quiet    float32_lt_quiet
   float64_eq_signaling    float64_le_quiet    float64_lt_quiet
   floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet
   float128_eq_signaling   float128_le_quiet   float128_lt_quiet

The `signaling' equal functions are identical to the standard functions
except that the invalid exception is raised for any NaN input.  Likewise,
the `quiet' comparison functions are identical to their counterparts except
that the invalid exception is not raised for quiet NaNs.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Signaling NaN Test Functions

The following functions test whether a floating-point value is a signaling
NaN:

   float32_is_signaling_nan
   float64_is_signaling_nan
   floatx80_is_signaling_nan
   float128_is_signaling_nan

The functions take one operand and return 1 if the operand is a signaling
NaN and 0 otherwise.
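
As an illustration of what such a test involves for single precision, the
following C sketch examines a 32-bit pattern directly.  It assumes the
common convention in which a NaN is signaling when the most significant
fraction bit is 0; some processors use the opposite convention, so this
should not be taken as the rule SoftFloat applies on every target.  The
function name is hypothetical.

```c
#include <stdint.h>

/* Return 1 if the single-precision bit pattern `a' is a signaling NaN,
   ASSUMING that a clear most significant fraction bit (bit 22) marks a
   NaN as signaling.  Return 0 otherwise. */
static int sketch_float32_is_signaling_nan( uint32_t a )
{
    uint32_t exponent = ( a >> 23 ) & 0xFF;
    uint32_t fraction = a & 0x007FFFFF;

    return exponent == 0xFF                     /* NaN or infinity */
        && fraction != 0                        /* not an infinity */
        && ( fraction & 0x00400000 ) == 0;      /* quiet bit is clear */
}
```

Under this convention the pattern 0x7FC00000 is a quiet NaN, 0x7F800001 is
a signaling NaN, and 0x7F800000 is positive infinity.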
 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Raise-Exception Function

SoftFloat provides a function for raising floating-point exceptions:

    float_raise

The function takes a mask indicating the set of exceptions to raise.  No
result is returned.  In addition to setting the specified exception flags,
this function may cause a trap or abort appropriate for the current system.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


----------------------------------------------------------------------------
Contact Information

At the time of this writing, the most up-to-date information about
SoftFloat and the latest release can be found at the Web page
`http://www.cs.berkeley.edu/~jhauser/arithmetic/SoftFloat.html'.