Skip to content

Conversation

clayjohn
Copy link
Member

@clayjohn clayjohn commented Sep 9, 2025

On at least one intel integrated device (Intel Xe) I get a nasty precision bug when using the mobile renderer

Screenshot from 2025-09-09 12-00-47

I have tested on another device and the issue seems to be specific to this GPU. Most likely this is a shader code gen error in the driver (it could also be a bug with how FP16 is handled by Intel integrated GPUs).

We have two options:

  1. Disable FP16 on this device (i.e. maintain a ban list)
  2. Avoid FP16 in this situation.

I opted to avoid FP16 in this situation because I doubt that FP 16 is helping this specific code very much anyway.

This PR forces us to do 1 dot product, 1 max, operation, and 2 subtract operations in high precision that would normally be done at half precision. It also potentially adds an extra half register for most of this function. I say "potentially" since it's not guaranteed that a given compiler would pack spot_dir into a register with another variable anyway.

As a trade off. This PR avoids 3 conversions from vec3() to hvec3(), 1 conversion from float to half, and 2 conversions from half to float.

Overall profiling doesn't reveal any obvious performance difference. So the impact on performance is inconclusive and likely it is highly device dependent.

Since the changes are minimal and the performance impact is inconclusive. I think we should go ahead with this PR. It is extremely bad for people to ship games and have them randomly look horrible on certain devices.

After this PR

Screenshot from 2025-09-09 13-16-55

@clayjohn clayjohn added this to the 4.6 milestone Sep 9, 2025
@clayjohn clayjohn requested a review from DarioSamo September 9, 2025 20:19
@clayjohn clayjohn requested a review from a team as a code owner September 9, 2025 20:19
@clayjohn clayjohn added bug regression cherrypick:4.5 Considered for cherry-picking into a future 4.5.x release labels Sep 9, 2025
Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally on Linux + NVIDIA/llvmpipe, it works as expected. Code looks good to me.

@DarioSamo
Copy link
Contributor

DarioSamo commented Sep 10, 2025

The part that I'm worried about regarding this PR is that there's no clear indication of what we're supposed to avoid to not break this particular device. Whoever decides to refactor or optimize this code in the future again will have no clear reason as to why they should avoid it, and we can't exactly test again unless whoever refactors it has access to it.

Seeing as the reason behind the error is so vague, I'd say the ban list for FP16 on Intel would be the best approach for future maintenance (can be device/driver version based too). If the workaround is pushed as is, we at least need a link back to this PR denoting it is a highly fragile area of code in the specific device this bug was encountered on.

Are we absolutely sure this bug only happens on this device and not elsewhere?

@clayjohn
Copy link
Member Author

The part that I'm worried about regarding this PR is that there's no clear indication of what we're supposed to avoid to not break this particular device. Whoever decides to refactor or optimize this code in the future again will have no clear reason as to why they should avoid it, and we can't exactly test again unless whoever refactors it has access to it.

Seeing as the reason behind the error is so vague, I'd say the ban list for FP16 on Intel would be the best approach for future maintenance (can be device/driver version based too). If the workaround is pushed as is, we at least need a link back to this PR denoting it is a highly fragile area of code in the specific device this bug was encountered on.

Are we absolutely sure this bug only happens on this device and not elsewhere?

@DarioSamo No, I'm not sure. I'm judging based on the fact that this code has been in place for awhile and no one has complained. I am concerned about the increase in complexity and maintenance cost from a ban list if it turns out this is just an issue for a few device/driver combos.

I am not super concerned about someone undoing this change in the future since these specific lines of code were pretty dubious to begin with. The fact that spot_rim needed to be in full precision anyway makes the preceding half calculations a lot less impactfull.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug cherrypick:4.5 Considered for cherry-picking into a future 4.5.x release regression topic:rendering topic:3d
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants