Numerical Analysis

cfbsoftware · Post by **cfbsoftware** » Sun Nov 23, 2014 11:34 am

There was a similar discussion with 20+ entries on the subject real math earlier this month on the ETH Oberon mailing list. You can see it in the mailing list archives starting here:

https://lists.inf.ethz.ch/pipermail/obe ... 07857.html

lschwerdfager · Post by **lschwerdfager** » Tue Jun 19, 2018 2:18 am

With respect to the following test code, note that the "x outside Foo" value is 2.255140518769849E-17 while the
"x inside Foo" value is -2.036944831557141E-17. Accordingly, it appears that the negative Sqrt argument value originates somehow as a result of passing u and v as Foo arguments. These arguments are stored on the stack; presumably, the stack values are 64 bits wide while the actual REAL values are 80 bits wide. This might account for the loss of precision and hence the negative value being passed to Sqrt. Does anyone have an opinion on this line of possible explanation?

MODULE NumAnalysis;

IMPORT Math, StdLog;

PROCEDURE Foo (u, v: REAL): REAL;

VAR
x: REAL;
result: REAL;

BEGIN
x := 1.0 - u * u - v * v;
StdLog.String ("x inside Foo is ");
StdLog.Real (x);
StdLog.Ln;
result := Math.Sqrt (x);
RETURN result
END Foo;

PROCEDURE Foo2 (x: REAL): REAL;

VAR
result: REAL;

BEGIN
result := Math.Sqrt (x);
RETURN result
END Foo2;

PROCEDURE Test*;

CONST
u = 0.9925092578236596;
v = 0.1221694443563053;

VAR
x, w: REAL;

BEGIN
IF 1.0 - u * u + v * v >= 0.0 THEN
StdLog.String ("1.0 - u * u - v * v = ");
StdLog.Real (1.0 - u * u - v * v);
StdLog.Ln;
StdLog.Ln;
w := Foo2 (1.0 - u * u - v * v);
StdLog.String ("Foo2 is ");
StdLog.Real (w);
StdLog.Ln;
x := 1.0 - u * u - v * v;
StdLog.String ("x outside Foo is ");
StdLog.Real (x);
StdLog.Ln;
w := Foo (u, v);
StdLog.String ("Foo is ");
StdLog.Real (w);
StdLog.Ln
ELSE
StdLog.String ("Out of bounds!");
StdLog.Ln
END
END Test;

END NumAnalysis.

NumAnalysis.Test

Josef Templ · Post by **Josef Templ** » Wed Jun 20, 2018 6:10 am

The expression x := 1.0 - u * u - v * v; is evaluated at run-time with 80 bits internally and the result is rounded
and assigned to a 64 bit variable.
The compiler evaluates constant REAL expressions at compile time with 64 bit IEEE floating point arithmetic
because it saves intermediate results to 64-bit variables.
This explains the difference.
In general, REAL operations should not make any assumptions about the precision.

The Help text file "Platform-Specific Issues" contains some information about that.

BTW, the first IF in Test looks buggy.

- Josef

Robert · Post by **Robert** » Wed Jun 20, 2018 3:33 pm

lschwerdfager wrote:Accordingly, it appears that the negative Sqrt argument value originates somehow as a result of passing u and v as Foo arguments. These arguments are stored on the stack; presumably, the stack values are 64 bits wide while the actual REAL values are 80 bits wide.
Does anyone have an opinion on this line of possible explanation?

No, the explanation is, as Josef said, the difference between compile time (64-bit intermediate results) and run-time (80-bit intermediate results here) arithmetic. Argument passing is not involved.
REAL values are 64 bits.
The software stack is 64 bits.
The hardware floating point unit stack is 80 bits.
Four answers are possible: see the (hopefully) simple example below:

I have written your u & v constants as exactly the values they take internally in 64-bits.

Code: Select all

Compile time arithmetic ...
u                :  0.9925092578236596
v                :  0.1221694443563053
1 -  u^2 - v^2   :        2.25514E−017
1 - (u^2 + v^2)  :        0.00000E+000

Run time arithmetic ...
x                :  0.9925092578236596
y                :  0.1221694443563053
1 -  x^2 - y^2   :       −2.03694E−017
1 - (x^2 + y^2)  :       −2.03830E−017

High precision arithmetic ...
u            =   0.99250925 78236596 18580336 42783993 85511875 15258789 06250000 0000
v            =   0.12216944 43563053 06742527 25020778 59826385 97488403 32031250 0000
1 - u² - v²  =  −0.00000000 00000000 20359482 85512916 29686870 97978011 28353744 1162

Code: Select all

PROCEDURE  Do*;
  CONST
    fw  =  20;
    dp  =  16;
    sf  =   6;
    u   =  0.9925092578236596185803364278399385511875152587890625;
    v   =  0.1221694443563053067425272502077859826385974884033203125;
  VAR
    x, y, a, b, c, d  :  REAL;
  BEGIN
    f.SetToEnd;
    f.StrLn ('Compile time arithmetic ...');
    f.StrRealLn ('u                :', u, fw, dp);
    f.StrRealLn ('v                :', v, fw, dp);
    f.StrRealLn ('1 -  u^2 - v^2   :', 1 -  u * u - v * v,  fw, -sf);
    f.StrRealLn ('1 - (u^2 + v^2)  :', 1 - (u * u + v * v), fw, -sf);

    f.Ln;
    f.StrLn ('Run time arithmetic ...');
    x  :=  u;
    y  :=  v;
    f.StrRealLn ('x                :', x, fw, dp);
    f.StrRealLn ('y                :', y, fw, dp);
    f.StrRealLn ('1 -  x^2 - y^2   :', 1 -  x * x - y * y,  fw, -sf);
    f.StrRealLn ('1 - (x^2 + y^2)  :', 1 - (x * x + y * y), fw, -sf);
  END  Do;

BlackBox Component Builder Forums

Numerical Analysis

Re: Numerical Analysis

Re: Numerical Analysis

Re: Numerical Analysis

Re: Numerical Analysis