lschwerdfager wrote:Accordingly, it appears that the negative Sqrt argument value originates somehow as a result of passing u and v as Foo arguments. These arguments are stored on the stack; presumably, the stack values are 64 bits wide while the actual REAL values are 80 bits wide.
Does anyone have an opinion on this line of possible explanation?
No, the explanation is, as Josef said, the difference between compile time (64-bit intermediate results) and run-time (80-bit intermediate results here) arithmetic. Argument passing is not involved.
REAL values are 64 bits.
The software stack is 64 bits.
The hardware floating point unit stack is 80 bits.
Four answers are possible: see the (hopefully) simple example below:
I have written your u & v constants as
exactly the values they take internally in 64-bits.
Code: Select all
Compile time arithmetic ...
u : 0.9925092578236596
v : 0.1221694443563053
1 - u^2 - v^2 : 2.25514E−017
1 - (u^2 + v^2) : 0.00000E+000
Run time arithmetic ...
x : 0.9925092578236596
y : 0.1221694443563053
1 - x^2 - y^2 : −2.03694E−017
1 - (x^2 + y^2) : −2.03830E−017
High precision arithmetic ...
u = 0.99250925 78236596 18580336 42783993 85511875 15258789 06250000 0000
v = 0.12216944 43563053 06742527 25020778 59826385 97488403 32031250 0000
1 - u² - v² = −0.00000000 00000000 20359482 85512916 29686870 97978011 28353744 1162
Code: Select all
PROCEDURE Do*;
CONST
fw = 20;
dp = 16;
sf = 6;
u = 0.9925092578236596185803364278399385511875152587890625;
v = 0.1221694443563053067425272502077859826385974884033203125;
VAR
x, y, a, b, c, d : REAL;
BEGIN
f.SetToEnd;
f.StrLn ('Compile time arithmetic ...');
f.StrRealLn ('u :', u, fw, dp);
f.StrRealLn ('v :', v, fw, dp);
f.StrRealLn ('1 - u^2 - v^2 :', 1 - u * u - v * v, fw, -sf);
f.StrRealLn ('1 - (u^2 + v^2) :', 1 - (u * u + v * v), fw, -sf);
f.Ln;
f.StrLn ('Run time arithmetic ...');
x := u;
y := v;
f.StrRealLn ('x :', x, fw, dp);
f.StrRealLn ('y :', y, fw, dp);
f.StrRealLn ('1 - x^2 - y^2 :', 1 - x * x - y * y, fw, -sf);
f.StrRealLn ('1 - (x^2 + y^2) :', 1 - (x * x + y * y), fw, -sf);
END Do;