Mathematics
Tech Note

Exploring Numbers, Not-A-Number, and Infinity

by Earl F. Glynn

First appeared in October 1998, Delphi Developer, pp. 9-14;
appearing here with corrections, updates and additions.

Contents

Ordinal Types
Real Types
When a Real is Actually an Integer
IEEE 754-1985
"Quiet" vs. "Signaling" NaNs
Other Examples of Handling Exceptions
Mean and Standard Deviation Example
Conclusions
Postscript

Download IEEEDemo.ZIP (125 KB)
with D3/D4/D5 source code for the IEEE754 unit

Note:  NaNs really slow down computations on the Pentium 4, unless you use SSE instructions. (Feb 2003)
See Scientific mathematics page by Dr. Chris Rorden


In this Tech Note, you'll discover how to create and use a NaN (not-a-number) and INF (infinity) values in your Delphi mathematical calculations with the help of the new Int64 feature in Delphi 4. But before considering these floating-point concepts, let's review some of the changes related to numbers introduced in Delphi 4.

Ordinal Types

Delphi 4 introduced two new ordinal types, the LongWord and the Int64. (Table 1 shows all the ordinal types, which versions of Delphi support them, and their range of values.) The LongWord is now an unsigned 32-bit integer type, while the LongInt is a signed 32-bit integer. The difference between these types is how a value is interpreted when the high-order, sign bit is 1. For example, the value $FFFFFFFF is either -1 when interpreted as a LongInt, or 4,294,967,295 when interpreted as a LongWord.

Table 1. Ordinal Types By Version of Delphi
Ordinal Type Delphi Version Min Value Max Value Signed Size[bytes]
ShortInt all -128 127 Yes 1
Byte all 0 255 No 1
SmallInt all

-32768

32767

Yes 2
Word all

0

65535

No 2
LongWord 4-5

0

4,294,967,295

No 4
LongInt all

-2,147,483,648

2,147,483,647

Yes 4
Int64 4-5

-263

263 - 1

Yes 8

With all the trouble caused over the years with the lack of a signed and unsigned 32-bit integer type, it's somewhat surprising the same problem wasn't avoided when introducing a 64-bit integer type. The new Int64 is a signed 64-bit integer (which really is a better implementation of the Comp type described below as a Real type). An unsigned 64-bit integer does not exist at present. [See Ray Lischner's "What's New? 64-Bit Integers"  also in the Oct 98. Delphi Developer, or See Ray Lischner's "Delphi 4 Integers: The Long and the Short of It" in March/April 1999 Visual Developer, pp. 92-93.]

I am happy with the very descriptive name "Int64" instead of the ambiguous "LongLong," which is also defined as a synonym for Int64 in the Windows.PAS unit. Over the years I've always wondered why "generic" names have been recommended, when the size of an integer can affect many algorithms. For some reason "speed" has always been more important than portability of an integer algorithm.

One of the first uses for Int64 is with the DiskSize and DiskFree functions for disks sizes that far exceed the 2 GB limit.

In addition to the LongWord and Int64 types, Delphi 4 changed the definition of the "Generic" ordinal types, Cardinal. As shown in Table 2, a Cardinal in Delphi 4 now uses the full unsigned range that is possible.

Table 2. Generic Ordinal Types By Version of Delphi

Generic Ordinal Type

Delphi Version

Min Value

Max Value

Signed

Size[bytes]

Integer 1

-32768

32767

Yes 2
2-5

-2,147,483,648

2,147,483,647

Yes 4
Cardinal 1

0

65535

No 2
2-3

0

2,147,483,647

No 4
4-5

0

4,294,967,295

No 4

I have been somewhat sloppy with API calls in Delphi 32 when I used an "Integer" instead of a "Cardinal". Since the values were usually relatively small, positive values, I didn't care if I used an Integer or Cardinal. Since the word "Integer" was a little shorter and the "standard" Pascal declaration for years, I've used Integer values with most Win32 API calls. With Delphi 4, many of these API calls now require a Cardinal type to avoid a compiler warning. For example, use of Integer values with the API call GetDIBSizes in Delphi 2 and 3 must now be changed to Cardinal in Delphi 4 (or even the DWORD type that is a synonym available to match C definitions).

In general, arithmetic operations on integers return a value of type Integer — which, in Delphi 2-5, is equivalent to the 32-bit LongInt. Operations in Delphi 4-5 return a value of type Int64 only when performed on an Int64 operand. So if I is an Integer and J is an Int64, use the statement J := Int64(I) + 1 for correct results.

There is at least one problem introduced indirectly by the new Int64 support. Example: Given variables A and B of type Byte or Word, with A := 40 and B := 45, the expression A - B is -5 as expected (as it has been in Delphi 1, 2 and 3), but (A-B) DIV 4 is 1,073,741,822 in Delphi 4.   (A - B) DIV 4 in Delphi 4.01 is now again -1 as it was in Delphi 1-3. [In Delphi 4.00 and 4.01, there was no compiler warning about this possible consequence. Borland claimed this was NOT a bug but still made a change in Delphi 4.02]. In Delphi 4, if A and B are LongWords (32-bit unsigned values), the expression A - B is 4,294,967,291. IMHO, Borland should treat all unsigned quantities in a like way but it's probably difficult to impose new rules to correct poor definitions in the past.  To be consistent, A - B for unsigned quantities should always return an unsigned result.   The treatment should not vary with Byte, Word and LongWord.

Real Types

As shown in Table 3, Delphi 4 introduced a new Real48 type, which is really what the old "Real" type was in Delphi 1-3. Use the Real48 type for compatibility with Turbo Pascal or Delphi 1-3 programs that used Real values. A new {$REALCOMPATIBILITY} compiler directive is available to turn "Real" back into the 6-byte type from the past. Real48 should not be used except for backward compatibility. Real48 is generally slower than other floating-point types, since the FPU is not used with Real48.

Table 3. Real Types By Version of Delphi
Real Type Delphi
Version

Smallest Value

Largest Value

Significant
Digits
Size[bytes]
Single all

1.5 x 10-45

3.4 x 1038

7-8 4
Real48 4

2.9 x 10-39

1.7 x 1038

11-12 6
Double all

5.0 x 10-324

1.7 x 10308

15-16 8
Extended all

3.4 x 10-4932

1.1 x 104932

19-20 10
Comp all

-263 + 1

263 -1

19-20 8
Currency 2-5

-922,337,203,685,477.5808

922,337,203,685,477.5807

19-20 8

The "Real" definition is now a "generic" type, as shown in Table 4. "Real" in Delphi 4/5 is equivalent to "Double." The "Extended" type offers greater precision than other real types but is less portable — the IEEE "Double" has the most precision that is a defined standard for all platforms. Be careful using Extended if you are creating data files to share across platforms. The Double type is adequate for most scientific and engineering calculations, so the new "Real" can be used for most calculations. Since the Double is 64-bits long, performance with the 64-bit processors in the near future will probably be faster with doubles than with the 80-bit extended values.

Table 4. Generic Real Type By Version of Delphi
Generic Real Type Delphi
Version

Smallest Value

Largest Value

Significant
Digits
Size[bytes]
Real 1-3

2.9 x 10-39

1.7 x 1038

11-12 6
4-5

5.0 x 10-324

1.7 x 10308

15-16 8

When a Real is Actually an Integer

Two of the types listed in Table 3, Comp and Currency, are treated like floating point types but are actually an integer for the Comp type and a scaled integer for the Currency type. The Comp (computational) type is native to the Intel CPU and represents a 64-bit integer. However, Comp is maintained for backward compatibility only. Use the Int64 type for better performance in Delphi 4.

Currency is a fixed-point data type that minimizes rounding errors in monetary calculations. It is stored as a scaled 64-bit integer with the four least-significant digits implicitly representing decimal places. When mixed with other real types in assignments and expressions, Currency values are automatically divided or multiplied by 10000. A Currency variable has no "fuzz" that is present with regular floating-point values making it a good choice for monetary calculations.

IEEE 754-1985

Now that we've reviewed the various ordinal and real types in the various versions of Delphi, let's consider how to use a NaN (non-a-number) or an INF (infinity) floating-point value. Delphi 4 helps in doing this, since the new Int64 can be used to initialize a 64-bit Double value.

The set of "real" numbers in mathematics is approximated using floating-point values in computers. Such numbers are often used in scientific and engineering calculations (and sometimes even financial calculations, although the floating point "fuzz" is usually not desirable in financial calculations.) Sometimes mathematical difficulties complicate calculations, such as division by zero or the square root of a negative number (when complex numbers are not needed or appropriate). What should the result be when a program attempts to divide by zero?

At other times, a value can be missing or unknown? How can calculations proceed without a lot of special logic to deal with the missing or unknown values?

The IEEE 754 standard was introduced in the mid-80s and defined values for "Not-a-Number" and for both positive and negative infinity. I won't attempt to explain all the IEEE 754 details here. The following is a brief introduction.

Figure 1 from the IEEEDemo program (in IEEEDemo.ZIP ), which compiles in Delphi 3/4/5, shows various "special" Double values defined by the IEEE 754 standard.   Note that in Delphi 4, "-INF" appears as "INF" but is now correct in Delphi 5.

Figure 1.  A Window in the IEEEDemo program.
ScreenIEEE754.jpg (58514 bytes)

Consider how the hexadecimal representation of these values can be converted to decimal for a "normal" Double. The Figure shows that 100.0 can be represented in exponential notation as 1.00E+2, which is shorthand for 1 x 10, or in hexadecimal as $4059000000000000.

This hexadecimal string needs to be parsed into three fields, the sign bit (s), the exponent (e) and the fraction (f).

DoubleInternalFormat.gif (1743 bytes)

normal double 0 < e < 2047 v = (-1)s   x  2e-1023  x  1.f
"denormal" e = 0 and f <> 0 v = (-1)s   x  2-1022  x  0.f
Zero e = 0 and f = 0 v = (-1)s   x  2-1022  x  0.f
Infinity (INF) e = 2047 and f = 0 v = (-1)s   x  INF
Not a Number (NaN) e = 2047 and f <> 0 NaN

The IEEE Double precision value can be found using the three factors in the "normal double" equation (see above):

When the high-order bit is 0, such as with $4 = 0100 binary, the number is positive.  When the high-order bit is 1, such as with the -0 or -Infinity values (see Table 1), the value is negative.  The first factor for 100.0 in the above equation is 1.

The exponent is the first three hex digits, which is $405 = 1029. This exponent is biased by 1023, so this value must be reduced by 1023. The second factor for 100.0 in the previous equation becomes

21029-1023 = 26 = 64

The fraction has an implied 1, so the signficand is $1.9. In decimal this value becomes 1 9/16, which is the third factor.

Putting these three decimal factors together: 1 x 64 x 1 9/16 = 100.0. Binary floating point values aren't that complicated or mysterious!

The code is simple to create a double constant with the value of 100.0:

CONST x: Double = 100.0;

How can this value be defined using its hexadecimal form?  In Delphi 4/5, with the Int64 type, the task is fairly simple:

{$J-}
CONST HundredBits: Int64 = $4059000000000000;
VAR Hundred: Double ABSOLUTE HundredBits;

The ABSOLUTE directive tells the compiler that the variable Hundred is at the same location in memory as the HundredBits constant. Since both are 64-bit values they should have the same memory alignment.

Use the $J- compiler directive to make sure a constant cannot be changed at runtime. Interestingly, since the HundredBits constant cannot be changed, neither can the Hundred variable! Without the $J- directive, Delphi has allowed a variable constant, but with it we now we have a constant variable!

In Delphi 2 or 3, with only 32-bit LongInts, defining a Double value 100.0 using hex was a bit more involved, and complicated by the "Little Endian" storage format:

{$J-}
CONST HundredBits: ARRAY[1..2] OF LongInt = ($00000000, $40590000);
VAR Hundred: Double ABSOLUTE HundredBits;

Defining most Double constants in hex would be a real pain, but it's the only way to deal with many of the special values shown in Figure 1. The ButtonConstantsClick procedure for the "Constants" button in the IEEEDemo program uses conditional compilation to implement these techniques:

{$IFDEF VER130}
    <Delphi 5 code>
{$ELSE} 
{$IFDEF VER120}
    <Delphi 4 code>
{$ELSE}
    <Delphi 2-3 code>
{$ENDIF}
{$ENDIF}

Download the complete source code file for details. A NaN function and PositiveInfinity and NegativeInfinity functions are in the IEEE754 unit, using constants defined as described previously. These functions can be used to obtain these special values whenever they are needed anywhere in code.

The decimal values in Figure 1 were formatted with the following statement:

Format('%24.17e', [x])

In Delphi 4 for some unexplained reason, Delphi formats both +INF and -INF as simple "INF" (this was fixed in Delphi 5). 

The hex strings in Figure 1 were formatted using a DoubleToHex function, which is defined in a separate IEEE754 unit.

FUNCTION DoubleToHex(CONST d: DOUBLE): STRING;
  VAR
    Overlay: ARRAY[1..2] OF LongInt ABSOLUTE d;
BEGIN
  // Look at element 2 before element 1 because
  // of "Little Endian" order.
  RESULT := IntToHex(Overlay[2],8) + IntToHex(Overlay[1],8);
END {DoubleToHex};

Seemingly, this routine could have been simplified in Delphi 4 to the following, but IntToHex works slightly differently with Int64 values. In D4 (but not D5)  IntToHex suppresses leading 0s in the hex string, so the above Delphi 3 code still works best.

// Delphi 4 code suppresses leading 0s in hex strings below. Why?
// Fixed in Delphi 5
FUNCTION DoubleToHex(CONST d: DOUBLE): STRING;
  VAR
    Overlay: Int64 ABSOLUTE d;
BEGIN
  RESULT := IntToHex(Overlay, 16)
END {DoubleToHex};

The inverse function, HexToDouble, is also defined in the IEEE754 unit:

TYPE
  EIEEEMath = CLASS(Exception);
. . .
FUNCTION HexToDouble(CONST hex: STRING): DOUBLE;
  VAR
    d : DOUBLE;
    Overlay: ARRAY[1..2] OF LongInt ABSOLUTE d;
BEGIN
  IF   LENGTH(hex) <> 16
  THEN RAISE EIEEEMath.Create('Invalid hex string for HexToDouble');
  Overlay[1] := StrToInt('$' + COPY(hex, 9, 8));
  Overlay[2] := StrToInt('$' + Copy(hex, 1, 8));
  RESULT := d
END {HexToDouble};

DoubleToHex provides a way to save all the binary precision of a floating-point value in an ASCII form, and also a way to save the NaN and INF values. Converting from a binary double to an ASCII decimal string and back to the binary double is not always a reversible process, especially if many decimal digits are not saved and restored. DoubleToHex and HexToDouble provide an alternative that saves and restores all bits exactly. (Note: The Borland StrToFloat routine coughs if you pass it a string 'NAN'.)

The ButtonConstantsClick procedure for the "Constants" button in the IEEEDemo program uses both the DoubleToHex and HexToDouble functions to show these functions are inverses for each of the special IEEE values. However, testing equality with NaN and INF values isn't completely straightforward.

Given the Double array named "constant", the following shows the conversion of value constant[i] to a hex string and back to a Double "d":

VAR
  i  : INTEGER;
  d  : Double;
  hex: STRING;
  Constant: ARRAY[1..12] OF DOUBLE;
. . .
hex := DoubleToHex(Constant[i]);
d   := HextoDouble(hex);

Inside a simple comparison loop, the straightforward comparison shown next sometimes fails:

IF   d <> Constant[i]
THEN ShowMessage(FloatToStr(d) + ' <> ' + FloatToStr(Constant[i]));

Exceptions occur when comparing certain of the IEEE special values. "IsNan" and "IsInfinity" routines must first be used:

IF   IsInfinity(d)
THEN BEGIN
  IF   NOT IsInfinity(Constant[i])
  THEN ShowMessage('Infinity: ' + FloatToStr(d) + ' <> ' + FloatToStr(Constant[i]))
END
ELSE
  IF   IsNAN(d)
  THEN BEGIN
    IF   NOT IsNan(Constant[i])
    THEN ShowMessage('NaN: ' + FloatToStr(d) + ' <> ' + FloatToStr(Constant[i]))
  END
  ELSE
    IF   d <> Constant[i]
    THEN ShowMessage(FloatToStr(d) + ' <> ' + FloatToStr(Constant[i])

The IsInfinity and IsNaN routines are much easier in Delphi 4 than Delphi 3:

// Since a NAN is not a single, unique value, a special function is
// needed for this test
FUNCTION IsNAN(CONST d: DOUBLE): BOOLEAN;
  VAR
    Overlay: Int64 ABSOLUTE d;
BEGIN
  RESULT :=
    ((Overlay AND $7FF0000000000000)  = $7FF0000000000000) AND
    ((Overlay AND $000FFFFFFFFFFFFF) <> $0000000000000000)
END {IsNAN};

// Like a NaN, an INF Double value has an exponent of 7FF, but the INF
// values have a fraction field of 0. INF values can be positive or
// negative, which is specified in the high-order, sign bit.
FUNCTION IsInfinity(CONST d: DOUBLE): BOOLEAN;
  VAR
    Overlay: Int64 ABSOLUTE d;
BEGIN
  RESULT :=
    ((Overlay AND $7FF0000000000000) = $7FF0000000000000) AND
    ((Overlay AND $000FFFFFFFFFFFFF) = $0000000000000000) 
END {IsInfinity};

The Delphi 5 versions of these functions use "mask" constants defined by Jon Shemitz, which is a better approach.

See the Delphi 3 and Delphi 5 equivalent routines in the IEEE754 unit.

"Quiet" versus "Signaling" NaNs

NaNs come in two flavors, "quiet" and "signaling." The NaN function in the IEEE754 unit returns the "quiet" NaN, which is the NaN that quietly propagates through any subsequent calculations. The high-order bit of the fraction field for a "Quiet" NaN is 1, while it's zero for the signaling NaN. NaN values are not necessarily unique and can take on many possible values for the fraction field (except 0).

The result of any arithmetic operation involving a quiet NaN is a NaN. The "NAN, +INF, -INF" button in the IEEEDemo programs demonstrates the "quiet" and "signaling" NaNs (as well as IsNaN and IsInfinity functions).

Given a Double value that is a quiet NaN, any arithmetic operation results in a NaN. In the following example, since X is a NaN, X+5 also results in a NaN:

VAR x: DOUBLE;
. . .
x := NAN;
IF   IsNan(x+5)
THEN ShowMessage('Is Nan (Quiet)');

The value for a "Signaling" NaN cannot be returned via a function, since this would immediately generate an exception. So, the NANSignaling constant, defined in the IEEE754 unit, may be used directly as shown in these example:

x := NANSignaling;
IF   IsNan(x)
THEN ShowMessage('Is Nan (Signaling)');

// Force the exception with the Signaling NAN
TRY
  y := 5*x + 1;
  ShowMessage('Will not happen ' + FloatToStr(y))
EXCEPT
  ON EInvalidOp DO ShowMessage('Trap Signaling NaN')
END;

// A signaling NaN raises an invalid operation whenever
// it appears as an operand.
TRY
  x := 25.0*SIN(NANSignaling)
EXCEPT
  On EInvalidOp DO x := 6.0
END;

As shown above, ANY use of a signaling NaN will result in an EInvalidOp exception.

Other Examples of Handling Exceptions

Several other variations of using NaN and INF values are demonstrated when the "Test Exceptions" button (see Figure 1) is pressed. If an exception is not captured, no value is automatically assigned to the result.

VAR
  x,y,z: double;
...
x := 1.0;
y := 0.0;
z := 5.0;

// NAN or INF is not automatically assigned as result of
// invalid operation
TRY
  z := x / y;
EXCEPT
END;

The value of "z" above is still 5.0 since the EZeroDivide exception was not captured. Here's how to capture the exception and assign a PositiveInfinity result using a function in the IEEE754 unit.

// Here's how to assign an INF to the result of a EZeroDivide
TRY
  z := x / y;
EXCEPT
  // Do not confuse with integer EDivByZero
  ON EZeroDivide DO z := PositiveInfinity
END;

Sometimes a NaN value is appropriate for some sort of mathematical "difficulty," such as the square root of a negative value (assuming complex values are not wanted):

// Let's handle the SQRT(-1) here
TRY
  z := SQRT(-1.0)
EXCEPT
  // Do not confuse with EInvalidOperation!
  ON EInvalidOp DO z := NAN
END;

If calculated values get too large or two small, overflows or underflows must be trapped:

// Handle an overflow
x := MaxDouble;
TRY
  z := SQR(x);
EXCEPT
  ON EOverFlow DO z := PositiveInfinity
END;

// Handle an underflow
x := MinDouble;
TRY
  z := SQR(x);
EXCEPT
  // When possible, IEEE math allows
  // "gradual underflow" by allowing
  // subnormal values with less precision.
  // However, in this case, the underflow
  // is extreme, so the value is just set to 0.
  ON EUnderFlow DO z := 0
END;

Certain operations with the IEEE special values need not be trapped. For example:

x := NAN;                    // Quiet NaN
y := 5.0;
z := 13.0 * SQRT(x + y);    // Result is NaN

x := PositiveInfinity;
y := 5.0;
z := 13.0 * SQRT(x + y);    // Result is INF

// You can add two INFs, or multiply two INFS, but
// an attempt to subtract or divide two INFs results
// in Invalid Floating Point Operation.
x := PositiveInfinity;
y := PositiveInfinity;
z := x*y;                    // Result is +INF

x := NAN;                    // Quiet NaN
y := PositiveInfinity;
z := x + y;                  // Result is a NaN

Mean and Standard Deviation Example

One final example will combine the use of NaNs in calculating a mean and standard deviation, with the new dynamic array feature Delphi 4.

Given a series of data, a mean cannot be calculated unless there is at least one value. A standard deviation cannot be calculated unless at least two values are present. The following routine returns NaNs for these values when they are not defined. Any number of values can be passed to the MeanAndStandardDeviation procedure using the open array construct:

PROCEDURE MeanAndStandardDeviation(CONST x: ARRAY OF DOUBLE;
    VAR Mean: DOUBLE;
    VAR StandardDeviation: DOUBLE);
  VAR
    i          : INTEGER;
    N          : INTEGER;
    xSum       : DOUBLE;
    xSquaredSum: DOUBLE;
BEGIN
  xSum := 0.0;
  xSquaredSum := 0.0;

  FOR i := Low(x) TO High(x) DO
  BEGIN
    xSum := xSum + x[i];
    xSquaredSum := xSquaredSum + SQR(x[i])
  END;

  N := High(x) - Low(x) + 1;

  TRY
    Mean := xSum / N;
  EXCEPT
    ON EInvalidOp DO Mean := NAN; // 0.0 / 0
  END;

  TRY
    StandardDeviation := SQRT( (xSquaredSum - xSum*xSum/N) / (N - 1) )
  EXCEPT
    ON EZeroDivide DO StandardDeviation := NAN; // N = 1
    ON EInvalidOp  DO StandardDeviation := NAN  // N = 0
  END
END {MeanAndStandardDeviation};

The ButtonStatsClick procedure shows how to use this routine with dynamic arrays in Delphi 3 and Delphi 4. The Delphi 4/5 solution, shown below, is more elegant, using the new dynamic array construct. (Download the complete source code for the Delphi 3 version.)

procedure TFormIEEE754.ButtonStatsClick(Sender: TObject);
  VAR
    x    :  ARRAY of DOUBLE;
    Count:  CARDINAL;
    Mean :  DOUBLE;
    StandardDeviation:  DOUBLE;
begin
  Memo.Lines.Clear;

  Count := 5;
  SetLength(x, Count); // New Delphi 4 feature

  x[0] := 1.5;
  x[1] := 2.0;
  x[2] := 2.2;
  x[3] := 2.1;
  x[4] := 1.9;

  // Use all 5 values
  MeanAndStandardDeviation(x, Mean, StandardDeviation);

  Memo.Lines.Add ('Count = ' + IntToStr(Count) + ', ' +
      Format('Mean = %6.2f, St.Dev. = %6.2f',
      [Mean, StandardDeviation] ) );

  // Use single value -- NAN for Standard Deviation
  Count := 1;
  SetLength(x, Count);
  MeanAndStandardDeviation(x, Mean, StandardDeviation);
  Memo.Lines.Add ('Count = ' + IntToStr(Count) + ', ' +
      Format('Mean = %6.2f, St.Dev. = %6.2f',
      [Mean, StandardDeviation] ) );

  // Use no values -- NAN for Mean and Standard Deviation
  Count := 0;
  SetLength(x, Count);
  MeanAndStandardDeviation(x, Mean, StandardDeviation);

  Memo.Lines.Add ('Count = ' + IntToStr(Count) + ', ' +
      Format('Mean = %6.2f, St.Dev. = %6.2f',
      [Mean, StandardDeviation] ) );
end;

The results from clicking on the "Mean & St. Dev." Button is shown next.

Count = 5, Mean = 1.94, St.Dev. = 0.27
Count = 1, Mean = 1.50, St.Dev. = NAN
Count = 0, Mean = NAN,  St.Dev. = NAN

Conclusions

Delphi 4 introduced several new numeric features, including LongWord, Real48 and Int64. This Int64 integer makes working with several of the special IEEE 754 double-precision floating point values much simpler. I only wish Borland would add direct support for NaNs and INFs to Delphi by supplying the necessary constants and routines for using IEEE arithmetic.


Postscript

1.  The FormIEEE.PAS unit (in IEEEDemo.ZIP) shows a trick for defining an INF value:

CONST InfinityTrick = 1.0 / 0.0;   // Trick to define INF

2.  Ray Lischner in a UseNet post points out a similar construct works to define a NaN:

CONST NaNTrick = 0.0 / 0.0;  // Trick to define NaN

3.  Dr. Chris Rorden, Department of Psychology, University of Nottingham in the UK, had a problem with Singles and NaNs:

I am trying to read 4-byte floating point values (SINGLES) -- this is an MRI scan.  However, some of the values appear to be "+NaN". The code you have is good for detecting NaNs in DOUBLES, but I get errors when I try to detect  these. It seems like the "-NaNs" are detected, as well as infinities. 

Before I could respond, Dr. Chris solved his own problem and gave me permission to publish his solution here:

I found a solution for rapidly detecting NaN, Infinity and indeterminate IEEE SINGLE values.  I worked this out after reading your excellent page that described DOUBLE values, and the link you added to Steve Hollasch's page.

My technique is to read the 32-bit float as a 32-bit integer and check to see if all of the exponent bits are set to 1 (indicating special values). Your excellent code for treating DOUBLES as integers helped.

This was a nice way for me to learn about IEEE numbers - your code was close enough for me to see the problem was soluble, but I needed to learn about the structure of singles to implement the solution.

Anyway, others might find my code useful:

FUNCTION SpecialSingle (s:single): boolean;
//returns true if s is Infinity, NAN or Indeterminate
//4byte IEEE: msb[31] = signbit, bits[23-30] exponent,
//bits[0..22] mantissa
//exponent of all 1s = Infinity, NAN or Indeterminate
CONST kSpecialExponent = 255 shl 23;
VAR
  Overlay: LongInt ABSOLUTE s;
BEGIN
  IF ((Overlay AND kSpecialExponent) = kSpecialExponent) THEN
    RESULT := true
  ELSE
    RESULT := false;
END;

 

Dr. Chris Rorden
Department of Psychology
University of Nottingham
Nottingham NG7 2RD, UK 
http://www.psychology.nottingham.ac.uk/staff/cr1/ 


Thanks to Jon Shemitz for identifying a bug in the IsInfinity function. (April 2000)


Other References

Floating-Point Unit (excellent article)
http://developer.intel.com/design/intarch/techinfo/pentium/fpu.htm

IEEE Standard 754 Floating Point Numbers
www.research.microsoft.com/~hollasch/cgindex/coding/ieeefloat.html

Floating Point Optimization
www.optimalcode.com/float.htm 

Integer Optimization
www.optimalcode.com/integer.htm 

CmpSci 535 Notes from Lecture 6:  Number Representations
http://www.cs.umass.edu/~weems/CmpSci535/535lecture6.html

John Herbster's T_BinaryFloatingPoint program for analyzing the extended, double, and single binary point numbers.


Updated 26 Feb 2003


since 22 Nov 1998