Long double format
==================

Each long double is made up of two IEEE doubles.  The value of the
long double is the sum of the values of the two parts (except for
-0.0).  The most significant part is required to be the value of the
long double rounded to the nearest double, as specified by IEEE.  For
Inf values, the least significant part is required to be one of +0.0
or -0.0.  No other requirements are made; so, for example, 1.0 may be
represented as (1.0, +0.0) or (1.0, -0.0), and the low part of a NaN
is don't-care.
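
The pair representation can be modelled in a few lines of Python,
whose 'float' is an IEEE double.  This is only an illustrative sketch:
the tuple layout (hi, lo) and the helper names dd_value and
high_part_is_rounded are assumptions made for the example, not part of
any library.  It handles finite parts only.

```python
from fractions import Fraction

def dd_value(hi, lo):
    """Exact value of the pair (hi, lo) as a rational number."""
    return Fraction(hi) + Fraction(lo)

def high_part_is_rounded(hi, lo):
    """True if hi is the exact value of the pair rounded to the nearest double."""
    # float() of a Fraction performs a correctly rounded division.
    return float(dd_value(hi, lo)) == hi

one_a = (1.0, 0.0)            # 1.0 with a +0.0 low part
one_b = (1.0, -0.0)           # 1.0 with a -0.0 low part
fine = (1.0, 2.0 ** -60)      # 1 + 2^-60, not representable as one double
bad = (1.0, 0.5)              # 1.5 should have 1.5 as its high part

print(dd_value(*one_a) == dd_value(*one_b))
print(high_part_is_rounded(*fine), high_part_is_rounded(*bad))
```

Exact rational arithmetic is used so that the check itself introduces
no rounding.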
      
      
Classification
--------------

A long double can represent any value of the form
  s * 2^e * sum(k=0...105: f_k * 2^(-k))
where 's' is +1 or -1, 'e' is between -968 and 1022 inclusive, f_0 is
1, and f_k for k>0 is 0 or 1.  These are the 'normal' long doubles.
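
As a worked instance of the formula for normal long doubles, the
sketch below evaluates it exactly for s = +1, e = 0, f_0 = f_105 = 1
(that is, 1 + 2^-105) and splits the result into a high and a low
double.  The function names normal_value and to_pair are hypothetical,
and underflow of the low part is not considered.

```python
from fractions import Fraction

def normal_value(s, e, f):
    """Evaluate s * 2^e * sum(f_k * 2^-k, k = 0..105) exactly."""
    assert len(f) == 106 and f[0] == 1
    return s * Fraction(2) ** e * sum(Fraction(bit, 2 ** k) for k, bit in enumerate(f))

def to_pair(value):
    """Split an exact rational into (hi, lo), hi being the nearest double."""
    hi = float(value)                 # correctly rounded high part
    lo = float(value - Fraction(hi))  # what is left over
    return hi, lo

f = [1] + [0] * 104 + [1]             # f_0 = 1 and f_105 = 1
v = normal_value(+1, 0, f)            # 1 + 2^-105
hi, lo = to_pair(v)
print(hi, lo == 2.0 ** -105)
```

A single double cannot hold this value, since 1 + 2^-105 needs 106
significant bits, but the pair (1.0, 2^-105) represents it exactly.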
      
      
A long double can also represent any value of the form
  s * 2^-968 * sum(k=0...105: f_k * 2^(-k))
where 's' is +1 or -1, f_0 is 0, and f_k for k>0 is 0 or 1.  These are
the 'subnormal' long doubles.
      
      
There are four long doubles that represent zero, two that represent
+0.0 and two that represent -0.0.  The sign of the high part is the
sign of the long double, and the sign of the low part is ignored.

Likewise, there are four long doubles that represent infinities, two
for +Inf and two for -Inf.

Each NaN, quiet or signalling, that can be represented as a 'double'
can be represented as a 'long double'.  In fact, there are 2^64
equivalent representations for each one.
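
The counts for zeros and NaNs can be illustrated as follows.  The
16-byte layout used here (big-endian, high part first) is an
assumption made for the example; the text above does not specify byte
order.  The names pack_dd, is_nan_dd and zero_sign are hypothetical.

```python
import math
import struct

def pack_dd(hi, lo_bits):
    """Pack a high double and a raw 64-bit low pattern into 16 bytes."""
    return struct.pack(">dQ", hi, lo_bits)

def is_nan_dd(raw):
    """A long double is a NaN when its high part is a NaN; the low part is ignored."""
    hi, _lo_bits = struct.unpack(">dQ", raw)
    return math.isnan(hi)

def zero_sign(hi, lo):
    """Sign of a zero long double: taken from the high part only."""
    return math.copysign(1.0, hi)

# The four encodings of zero: two for +0.0 and two for -0.0.
ZEROS = [(0.0, 0.0), (0.0, -0.0), (-0.0, 0.0), (-0.0, -0.0)]
print([zero_sign(hi, lo) for hi, lo in ZEROS])

# Any of the 2^64 low patterns leaves a NaN a NaN; a few samples:
LOW_PATTERNS = [0, 1, 0x3FF0000000000000, 0xFFFFFFFFFFFFFFFF]
print(all(is_nan_dd(pack_dd(float("nan"), bits)) for bits in LOW_PATTERNS))
```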
      
      
There are certain other valid long doubles where both parts are
nonzero but the low part represents a value which has a bit set below
2^(e-105).  These, together with the subnormal long doubles, make up
the denormal long doubles.

Many possible long double bit patterns are not valid long doubles.
These do not represent any value.
      
      
Limits
------

The maximum representable long double is 2^1024-2^918.  The smallest
*normal* positive long double is 2^-968.  The smallest denormalised
positive long double is 2^-1074 (this is the same as for 'double').
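
Two of the underlying 'double' limits are easy to check directly.  The
sketch below only examines the individual parts, using Python floats
as IEEE doubles; the variable names are illustrative.

```python
import math
import sys
from fractions import Fraction

# Smallest positive denormal double.  Paired with a zero low part it is
# also the smallest positive denormalised long double named in the text.
smallest = (math.ldexp(1.0, -1074), 0.0)

# The high part is itself a double, so it can be no larger than the
# largest finite double, which is 2^1024 - 2^971.
largest_high = sys.float_info.max

print(smallest[0])   # 5e-324
print(Fraction(largest_high) == Fraction(2) ** 1024 - Fraction(2) ** 971)
```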
      
      
Conversions
-----------

A double can be converted to a long double by adding a zero low part.

A long double can be converted to a double by removing the low part.
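
Both conversions are trivial on the pair model.  In this sketch a long
double is a tuple (hi, lo) of Python floats; from_double and to_double
are hypothetical names.  Dropping the low part gives a correctly
rounded result because the high part is already required to be the
value rounded to the nearest double.

```python
from fractions import Fraction

def from_double(x):
    """double -> long double: attach a zero low part."""
    return (x, 0.0)

def to_double(dd):
    """long double -> double: keep the high part, discard the low part."""
    hi, _lo = dd
    return hi

x = (1.0, 2.0 ** -60)                 # 1 + 2^-60
exact = Fraction(x[0]) + Fraction(x[1])
print(to_double(x) == float(exact))   # the high part is the rounded value
print(to_double(from_double(0.1)) == 0.1)
```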
      
      
Comparisons
-----------

Two long doubles can be compared by comparing the high parts, and if
those compare equal, comparing the low parts.
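
A minimal sketch of this two-step comparison on (hi, lo) tuples
follows.  The name dd_compare and the -1/0/1 return convention are
choices made for the example, and NaN operands are deliberately not
handled.

```python
def dd_compare(a, b):
    """Return -1, 0 or 1 as a is less than, equal to or greater than b."""
    (a_hi, a_lo), (b_hi, b_lo) = a, b
    if a_hi != b_hi:                  # high parts decide when they differ
        return -1 if a_hi < b_hi else 1
    if a_lo != b_lo:                  # otherwise fall back to the low parts
        return -1 if a_lo < b_lo else 1
    return 0

tiny = 2.0 ** -60
print(dd_compare((1.0, 0.0), (1.0, -0.0)))    # two encodings of 1.0
print(dd_compare((1.0, tiny), (1.0, 0.0)))
print(dd_compare((1.0, -tiny), (1.0, 0.0)))
```

Because +0.0 and -0.0 compare equal as doubles, the two encodings of
1.0 compare equal here without any special case.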
      
      
Arithmetic
----------

The unary negate operation operates by negating both the high and low
parts.

An absolute or absolute-negate operation must be done by comparing
against zero and negating if necessary.
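
The sketch below shows why the absolute value cannot be taken part by
part: the low part may have the opposite sign to the high part.  The
names dd_neg, dd_abs and naive_abs are hypothetical.  As a choice of
this example, the sign test inspects the sign bit of the high part so
that -0.0 also maps to +0.0.

```python
import math

def dd_neg(a):
    """Negate a long double by negating both parts."""
    hi, lo = a
    return (-hi, -lo)

def dd_abs(a):
    """Absolute value: negate the whole pair when the high part is negative."""
    hi, _lo = a
    return dd_neg(a) if math.copysign(1.0, hi) < 0.0 else a

def naive_abs(a):
    """Incorrect: clearing the sign of each part separately changes the value."""
    hi, lo = a
    return (abs(hi), abs(lo))

x = (1.0, -(2.0 ** -60))              # 1 - 2^-60, already positive
print(dd_abs(x) == x)                 # unchanged, as expected
print(naive_abs(x) == x)              # False: this became 1 + 2^-60
```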
      
      
Addition and subtraction are performed using library routines.  They
are not at present performed perfectly accurately; the result produced
will be within 1ulp of the range generated by adding or subtracting
1ulp from the input values, where a 'ulp' is 2^(e-106) given the
exponent 'e'.  In the presence of cancellation, this may be
arbitrarily inaccurate.  Subtraction is done by negation and addition.
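
For illustration only, here is one well-known way to add two such
pairs, built on the error-free TwoSum transformation.  It is not
claimed to be the algorithm used by the library routine; it is a
simple textbook variant that, like the routine described above, can
lose accuracy under cancellation.  All function names are
hypothetical, and overflow, infinities and NaNs are ignored.

```python
from fractions import Fraction

def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def fast_two_sum(a, b):
    """Same as two_sum, but assumes |a| >= |b| (or a == 0)."""
    s = a + b
    return s, b - (s - a)

def dd_add(x, y):
    """Add two (hi, lo) pairs; the low parts are folded into the error term."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return fast_two_sum(s, e)

def dd_sub(x, y):
    """Subtraction as negation followed by addition."""
    return dd_add(x, (-y[0], -y[1]))

hi, lo = dd_add((0.1, 0.0), (0.2, 0.0))
print(Fraction(hi) + Fraction(lo) == Fraction(0.1) + Fraction(0.2))
```

The printed comparison shows that the pair keeps the rounding error
that a plain double addition of 0.1 and 0.2 would discard.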
      
      
Multiplication is also performed using a library routine.  Its result
will be within 2ulp of the correct result.
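
As with addition, the following is a textbook sketch rather than the
library routine itself.  It uses Veltkamp splitting and Dekker's exact
product of two doubles, then adds the cross terms; the lo*lo term is
dropped as negligible.  All names are hypothetical, and overflow,
underflow, infinities and NaNs are ignored.

```python
from fractions import Fraction

def split(a):
    """Veltkamp split: a == hi + lo, each half fitting in 26 bits."""
    c = 134217729.0 * a               # 2^27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Return (p, err) with p = fl(a * b) and p + err == a * b exactly."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, err

def dd_mul(x, y):
    """Multiply two (hi, lo) pairs."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]    # cross terms
    hi = p + e                        # renormalise so hi is the rounded sum
    return hi, e - (hi - p)

third_hi = float(Fraction(1, 3))
third = (third_hi, float(Fraction(1, 3) - Fraction(third_hi)))
got = dd_mul(third, third)
exact = (Fraction(third[0]) + Fraction(third[1])) ** 2
rel_err = abs(Fraction(got[0]) + Fraction(got[1]) - exact) / exact
print(rel_err < Fraction(1, 2 ** 100))
```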
      
      
Division is also performed using a library routine.  Its result will
be within 3ulp of the correct result.
      
      
Copyright (C) 2004 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.