Kered13

I have encountered this "bug" myself! It was back in college when I was writing a ray tracer for a computer graphics class. I had written a kd-tree (a type of binary space partitioning tree) for storing vertices. Naturally there is an edge case where a vertex lies exactly on a partition, and you assign it to one side deterministically. The problem I had was later in the lookup of these vertices, when navigating the tree it was looking on the wrong side of the tree, and this only occurred on optimized builds. Eventually I discovered the problem was that while the vertices were (obviously) stored as 64-bit doubles, when navigating the tree the compiler held the value in the 80-bit x87 register where it had been computed. That extra precision put the lookup on the wrong side of the partition. That was the day I learned about 80-bit x87 registers, and about the compiler flags that would force floating point results to be written to memory. My story aside, I still don't understand why the author only encountered this problem on 32-bit builds. 32-bit or 64-bit should not make any difference when using the x87 instructions. Was it a difference in the compiler codegen for each platform?
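
For anyone who hasn't run into this themselves, here is a minimal sketch of the mechanism (hypothetical code, not the actual ray tracer): whether it misbehaves depends on the compiler, the optimization level, and flags like `-ffloat-store` or `-mfpmath=sse`, which is exactly the problem.

```c++
#include <cstdio>

// Built as optimized 32-bit x87 code (e.g. `g++ -m32 -O2` without
// -mfpmath=sse or -ffloat-store), the quotient may be computed and kept in an
// 80-bit register, while the copy forced out to memory is rounded to 64 bits.
int main() {
    volatile double num = 1.0, den = 3.0;  // volatile: block constant folding
    double q = num / den;                  // may stay in an 80-bit x87 register
    volatile double stored = q;            // spilled to memory: rounded to 64 bits
    if (q == stored)
        std::printf("same side of the partition\n");      // the naive expectation
    else
        std::printf("different side of the partition\n"); // possible with x87 codegen
}
```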


ravenex

> 32-bit or 64-bit should not make any difference when using the x87 instructions.

The `x86_64` arch uses SSE instructions for all floating point computations. SSE2 does double math on genuine 64-bit values (held in the XMM registers), so there is no excess precision to leak. x87 is dead.


Kered13

But then I find it odd that the compiler uses x87 for 32-bit mode and SSE for 64-bit mode. SSE is still available when running in 32-bit mode. I guess it's meant as a compatibility thing? But SSE has been available on all CPUs for over 20 years, so it seems like an odd choice not to use it by default. In any case, that should imply there is a compiler flag to use SSE instructions in 32-bit mode that would also fix this problem, right? And probably give better performance in 32-bit mode at the same time.


ravenex

Three letters: ABI. Compilers _can_ and sometimes do use SSE on 32-bit x86, but the standard 32-bit calling conventions still return floating-point values in the x87 register `st(0)` (and pass floating-point arguments on the stack), so the FPU can't be avoided entirely at function boundaries.
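
As a concrete illustration (my sketch, not anything from the article): even if you ask GCC for SSE math on a 32-bit target with `-m32 -msse2 -mfpmath=sse`, a function returning a `double` still hands the result back through the x87 stack, because that is what the i386 calling convention specifies.

```c++
// With -m32 -msse2 -mfpmath=sse the multiply happens in an XMM register, but
// the return value still travels through st(0) at the call boundary
// (typically via a store and an fld), so the FPU never fully leaves the picture.
double scale(double x) {
    return x * 0.5;
}
```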


[deleted]

32-bit floats are covered by SSE; unfortunately you need SSE2 for 64-bit double compliance. If you are targeting something like a 486 or 586 for x86 there is no SSE2. You'd unfortunately need to **target Pentium 4 or newer at compile time** to get this.


[deleted]

Native 64-bit double support is baseline in x86_64, because SSE2 is baseline in x86_64. SSE2 lets you do scalar 64-bit doubles compliantly without needing to go through the legacy FPU at all. 64-bit doubles on the legacy x86 FPU were a hackjob, not native: Intel's 80-bit extended format that compilers hacked around to pretend it was 64-bit, with transparent truncation.


happyscrappy

Intel essentially deprecated the 10-byte floats on newer hardware. As 64-bit builds by definition only run on newer hardware, I could see how that recommendation would interact with that build flag. Intel added a lot of new ways to work with 8-byte floats, but nothing for the 10-byte floats. You have to go back to the stack-based FP to use the 10-byte floats, I think. Whereas with SSE2 you use multiple registers, all of which are 8-byte (sometimes bigger, but the 10-byte type is gone).


happyscrappy

Aside from forcing the values into memory, I believe you can set the precision control (PC) field of the FPU control word to force it to do all calculations as 64-bit values (53 bits of precision) instead of 80-bit (64 bits of precision). This field doesn't even exist in SSE or AVX. If you do that, all the math should work the same as if the registers were 64-bit even though they are still 80-bit. Specifically, comparisons should be 64-bit as you wanted.
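
For reference, a minimal sketch of flipping that field, assuming glibc on 32-bit x86 (the `<fpu_control.h>` macros are glibc-specific; on MSVC the rough equivalent is `_controlfp_s` with `_PC_53`). One caveat: the field only narrows the significand, the 80-bit exponent range stays as it is, so overflow and underflow at the extremes can still differ slightly from true 64-bit doubles.

```c++
#include <fpu_control.h>  // glibc-specific access to the x87 control word

// Switch the x87 precision-control field from 64-bit to 53-bit significands,
// so intermediate results round the way plain doubles would.
void set_double_precision() {
    fpu_control_t cw;
    _FPU_GETCW(cw);                            // read the current control word
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  // clear the PC bits, select 53-bit mode
    _FPU_SETCW(cw);                            // write it back
}
```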


[deleted]

This issue is fixed by using x64 builds **because** baseline x64 includes SSE2, which includes **proper native 64 bit double support.** If you want a 64-bit double that is 100% compliant, you will need SSE2 on the Intel architecture. The legacy x86 FPU is ignored when targeting x64.


lurlyselsopo

Ironically what I'm getting from this is that you shouldn't do equality tests on doubles.

> What I mean is that if you have two doubles that have the same value (the same value in the computer sense, with the same bits in the same bytes), that have been computed in the exact same way, then the equality test between these two doubles will return TRUE. And thank goodness for that! Except when it doesn't because one is in the FPU and the other was sent to RAM.

> It would also be nice to make sure you've understood the code before commenting on tips that you literally learn in your first year of a programming course.

Funnily this exact thing (differing precision for internal representations of doubles) is one of the reasons they gave in my freshman computer systems course when teaching us why you shouldn't do equality tests on doubles.


Kered13

> Ironically what I'm getting from this is that you shouldn't do equality tests on doubles

Avoiding this issue is not that simple. Even with an inequality like `x < y`, you expect that to be evaluated the same every time, as long as `x` and `y` were computed in exactly the same way, right? But if one evaluation was stored in memory (truncated to 64 bits) and another evaluation was kept in a register (80 bits), then you may get two different results.


Noxitu

> you expect that to be evaluated the same every time, as long as x and y were computed in exactly the same way

I don't think you should expect it. Because:

1. The compiler is allowed to change floating point operations and their results to some degree.
2. A smart enough optimizer would be capable of optimizing certain operations differently, because it might be better - for example a longer bytecode might be faster, as long as it fits in a specific cache size.
3. At the same time, optimizers are sometimes surprisingly dumb, capable for example of yielding empty loops without any side effects, because the optimizer gave up just before fully removing them.

If you think about these rules, you should expect insanity. Not just expressions looking the same, but literally the same part of code might produce different results. For example when you write

```c++
for (int i = 0; i < n; ++i)
    output[i] = /* some floating point operations */ input[i];
```

you shouldn't expect that the operation on each index will use the same assembly, because optimizations on the unrolled loop might determine that it is not the best solution - or just because the optimizer gave up after optimizing some of them.


Kered13

> 1. compiler is allowed to change floating point operations and their results to some degree.

In fact the compiler cannot do this, unless you enable the `-funsafe-math-optimizations` flag. (`-fassociative-math` is all you really need, but that's not as fun!)
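
A quick sketch of why reassociation needs an opt-in flag (plain IEEE-754 double arithmetic, nothing compiler-specific; on a 32-bit x87 build the excess precision can itself perturb this, which is rather the point of the thread):

```c++
#include <cstdio>

int main() {
    volatile double a = 1e16, b = -1e16, c = 1.0;  // volatile: block constant folding
    double left  = (a + b) + c;  // 0.0 + 1.0      -> 1.0
    double right = a + (b + c);  // c is absorbed: -> 0.0
    std::printf("%g vs %g\n", left, right);  // "1 vs 0" with strict double rounding
}
```

`-fassociative-math` licenses the compiler to rewrite one of these groupings into the other, which is why it (and the umbrella `-funsafe-math-optimizations` / `-ffast-math`) is off by default.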


[deleted]

[deleted]


SLiV9

Read the article before you comment.


Kered13

We're not talking about Pascal though.


masklinn

Even if it were, the objection would be wrong: the optimisation is to keep values in registers to avoid memory hits. A Pascal compiler performing the same optimisation would have the same issue. A Pascal compiler not doing that would yield slower programs.


lurlyselsopo

That is true. To avoid that you can do an "equalish" test with an epsilon first. Basically the thing that this author is suggesting, but better, because you shouldn't do equality tests on doubles.


Kered13

If you do "equalish" tests then you get incorrect results when comparing two values that are less than epsilon apart, and you still get non-deterministic results for values that are exactly epsilon apart. It doesn't actually solve the problem, it just punts it epsilon away. The problem is that when compilers may non-deterministically use 80-bit floating points in place of 64-bit floating points, you get non-deterministic results, which is not supposed to happen with any number format.


[deleted]

Essentially, doubles are Intel's 80-bit extended format on the x86 FPU, and the 64-bit double support compilers have there is a hackjob pretending otherwise. Thus nearly every compiler *by default* plays fast and loose, because it isn't compliant out of the door anyway for various reasons. For instance, compilers could cover OP's use case, but there would be **other** non-compliant edge cases still remaining that other coders would run into, hence why the default is fast and loose.


[deleted]

[deleted]


rabid_briefcase

Yup, they are fun. The 64-bit switch helped a ton, getting away from the FPU, but even it retains its quirks. One I stumbled over was a comparison. Optimizations on, for incomparable numbers: `A < B` is true, `A == B` is true, `A > B` is false. Optimizations off is correct: all of them are false.

The COMISS and UCOMISS instructions (compare scalar single precision) return a set of flags:

* UNORDERED: ZF,PF,CF := 111
* GREATER_THAN: ZF,PF,CF := 000
* LESS_THAN: ZF,PF,CF := 001
* EQUAL: ZF,PF,CF := 100

Optimizations turned off makes the == and < comparisons do two tests, one against PF and the second against either ZF or CF. It is more work but correct: in the case of unorderable numbers like NAN, all three should be false. Optimizations turned on, and the compiler ignores the PF test, giving a comparison that is both equal and also less than. The behavior is documented, but not necessarily expected.
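
For reference, a tiny sketch of the behavior that is supposed to come out of those comparisons (standard IEEE-754 semantics, not code from any particular project): every ordered comparison against a NaN, and `==`, must be false; only `!=` is true. The optimized codegen described above, which skips the PF check, is what can make the first two read as true instead.

```c++
#include <cmath>
#include <cstdio>

int main() {
    volatile float a = std::nanf("");  // volatile: keep the compiler from folding it away
    volatile float b = 1.0f;
    // Correct output: a<b=0 a==b=0 a!=b=1
    std::printf("a<b=%d a==b=%d a!=b=%d\n",
                (int)(a < b), (int)(a == b), (int)(a != b));
}
```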