Bug 91232

Summary: Torchlight 1&2 (in Wine): glitchy models when hardware skinning enabled
Product: Mesa Reporter: Béla Gyebrószki <gyebro69>
Component: Drivers/DRI/nouveauAssignee: Nouveau Project <nouveau>
Status: CLOSED NOTOURBUG QA Contact: Nouveau Project <nouveau>
Severity: normal    
Priority: medium CC: gyebro69
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: screenshot, comparison (nouveau vs. nvidia blob)
shader_test that demonstrates the (?) issue
init unknown values to 0
init unknown values to (0, 0, 0, 1)

Description Béla Gyebrószki 2015-07-05 07:08:42 UTC
Created attachment 116947 [details]
screenshot, comparison (nouveau vs. nvidia blob)

I believe this is different from bug #90887, I tried the patch from that bug and it didn't help.
Models in Torchlight 1 and 2 are rendered incorrectly, they have lots of flashing, white rectangles on them.Disabling hardware skinning in the options menu makes the problem go away.
I see similar issues in some other games as well so this bug may affect multiple games.

I tried and reproduced the problem in Mesa 10.1 and 10.4 too.
The software renderer doesn't have the problem.
Disabling shader optimization doesn't help.
The trace replays correctly on Nvidia blob 340.76.

Trace (uncompressed 43 MB):
https://drive.google.com/open?id=0B-tTbLKBl-tORHV2eGdleGk1dzQ

Fedora 22 32-bit
Kernel 4.0.6-300.fc22.i686+PAE
VGA compatible controller: NVIDIA Corporation G92 [GeForce GTS 250] (rev a2) (prog-if 00 [VGA controller])
Mesa 10.6-branchpoint-826-gff0a41b
libdrm-2.4.62-1-g676c806
xf86-video-nouveau-1.0.11-23-gdfd827c
Comment 1 Ilia Mirkin 2015-07-05 16:16:25 UTC
Reproduced on nvc0 as well. Works with llvmpipe.

Switching to opt=0 makes the flickering purple instead of white. Disabling the watchdog makes no difference.
Comment 2 Ilia Mirkin 2015-07-05 16:31:51 UTC
Appears that the draw @30166 is the one that introduces a white polygon. The shader program in question is a little sketchy since it sets gl_FrontColor in vertex shader, but uses gl_SecondaryColor in fragment shader. Investigating whether this is legal and how to handle it.
Comment 3 Ilia Mirkin 2015-07-05 17:18:00 UTC
Created attachment 116959 [details]
shader_test that demonstrates the (?) issue

Looks like they use gl_SecondaryColor in the fragment shader, and the GLSL linker, seeing no writes to gl_SecondaryFrontColor in the vertex shader, creates this IR:

(declare (location=4 shader_in ) vec4 gl_in_TexCoord0)
(declare (location=5 shader_in ) vec4 gl_in_TexCoord1)
(declare (location=1 shader_in ) vec4 gl_Color)
(declare (temporary ) vec4 gl_in_FrontColor1_dummy)
(declare (location=3 shader_in ) float gl_FogFragCoord)
(declare (location=4 shader_out ) vec4 gl_out_TexCoord0)
(declare (temporary ) vec4 gl_out_TexCoord0)
(declare () vec4 ret)
(declare (location=0 uniform ) sampler2D ps_sampler0)
(declare (location=1 uniform ) samplerCube ps_sampler1)
(declare (location=2 uniform ) vec4 specular_enable)
(declare (location=3 uniform ) #anon_struct_0001@0x24cc5d0 ffp_fog)
(function main
  (signature void
    (parameters
    )
    (
      (declare (temporary ) vec4 textureCube_retval)
      (assign  (xyzw) (var_ref textureCube_retval)  (tex vec4 (var_ref ps_sampler1)  (swiz xyz (var_ref gl_in_TexCoord1) ) 0 1 () )) 
      (assign  (xyzw) (var_ref ret)  (expression vec4 * (tex vec4 (var_ref ps_sampler0)  (swiz xy (var_ref gl_in_TexCoord0) ) 0 1 () )(var_ref gl_Color) ) ) 
      (assign  (xyz) (var_ref ret)  (expression vec3 sat (expression vec3 + (swiz xyz (var_ref textureCube_retval) )(swiz xyz (var_ref ret) )) ) ) 
      (assign  (w) (var_ref ret)  (expression float * (swiz w (var_ref textureCube_retval) )(swiz w (var_ref ret) )) ) 
      (assign  (xyzw) (var_ref gl_out_TexCoord0)  (expression vec4 + (expression vec4 * (var_ref gl_in_FrontColor1_dummy) (var_ref specular_enable) ) (var_ref ret) ) ) 
      (assign  (xyz) (var_ref gl_out_TexCoord0)  (expression vec3 lrp (swiz xyz (record_ref (var_ref ffp_fog)  color) )(swiz xyz (var_ref gl_out_TexCoord0) )(expression float sat (expression float * (expression float + (record_ref (var_ref ffp_fog)  end) (expression float neg (var_ref gl_FogFragCoord) ) ) (record_ref (var_ref ffp_fog)  scale) ) ) ) ) 
      (assign  (xyzw) (var_ref gl_out_TexCoord0@64)  (var_ref gl_out_TexCoord0) ) 
    ))

)

This in turn becomes an uninitialized read from TEMP[3] in the TGSI, and nouveau doesn't 0-initialize such vars. specular_enable happens to be (1, 1, 1, 0) here.
Comment 4 Ilia Mirkin 2015-07-05 17:57:11 UTC
Created attachment 116960 [details] [review]
init unknown values to 0

As I suspected, getting nouveau to init these to 0 fixes the issue. But IMO the glsl compiler is the one doing something dodgy here...
Comment 5 Béla Gyebrószki 2015-07-05 18:47:11 UTC
(In reply to Ilia Mirkin from comment #4)
> Created attachment 116960 [details] [review] [review]
> init unknown values to 0
> 
> As I suspected, getting nouveau to init these to 0 fixes the issue. But IMO
> the glsl compiler is the one doing something dodgy here...

The patch works and it fixes the issue not just in Torchlight but also in some other titles like Silent Hunter III, Evil Genius or the Ankh series.
Some games with similar looking problem still have the the problem, but that might be a different issue.
Comment 6 Béla Gyebrószki 2015-07-05 19:50:59 UTC
Traces from games where the problem is fixed with your patch:

Evil Genius (character model corrupted after loading a saved game):
https://drive.google.com/open?id=0B-tTbLKBl-tOMWtxWXdGVm13LXc

Ankh 2 (menu screen heavily corrupted):
https://drive.google.com/open?id=0B-tTbLKBl-tOQ0kydGFFM2xsbXM

===
Maybe unrelated, here are the traces from 2 games where the problem isn't fixed by the patch:

Bloodrayne 2 (objects like the tables in front of the model have those flickering boxes):
https://drive.google.com/open?id=0B-tTbLKBl-tOWGZEZ1dPYUVwUzA
With your patch applied those objects become invisible.

Tomb Raider:Underworld (character model corrupted):
https://drive.google.com/open?id=0B-tTbLKBl-tOOWhES3ZJaGlHeXc
The problem remains with the patch.
Comment 7 Ilia Mirkin 2015-07-05 20:22:20 UTC
(In reply to Béla Gyebrószki from comment #6)
> Traces from games where the problem is fixed with your patch:
> 
> Evil Genius (character model corrupted after loading a saved game):
> https://drive.google.com/open?id=0B-tTbLKBl-tOMWtxWXdGVm13LXc
> 
> Ankh 2 (menu screen heavily corrupted):
> https://drive.google.com/open?id=0B-tTbLKBl-tOQ0kydGFFM2xsbXM

i965 on hsw also has issues with these. And on the original trace in comment 1. Looking at least at the Ankh 2 trace, it appears to also be the gl_SecondaryColor issue -- I assume they're using slightly diff versions of the same engine. All the var names are the same, but the frag shader structure is a tad different.

> 
> ===
> Maybe unrelated, here are the traces from 2 games where the problem isn't
> fixed by the patch:
> 
> Bloodrayne 2 (objects like the tables in front of the model have those
> flickering boxes):
> https://drive.google.com/open?id=0B-tTbLKBl-tOWGZEZ1dPYUVwUzA
> With your patch applied those objects become invisible.

i965 also has funkiness rendering on the table.

This genius shader actually only sets gl_FrontSecondaryColor and then uses gl_Color.w (in addition to gl_SecondaryColor). I bet that has to be 1.0 and not 0.0, hence the transparency fail. I guess it's all from the same engine. And it expects the colors to be (0, 0, 0, 1) when not explicitly set.

> 
> Tomb Raider:Underworld (character model corrupted):
> https://drive.google.com/open?id=0B-tTbLKBl-tOOWhES3ZJaGlHeXc
> The problem remains with the patch.

i965 seems fine with this one. Probably an unrelated bug.

Well, I don't feel so bad. Intel has a QA team (not to mention a dev team). You have probably done 10x more QA on nouveau than the sum of what had ever been done prior.
Comment 8 Ilia Mirkin 2015-07-05 20:57:32 UTC
Created attachment 116966 [details] [review]
init unknown values to (0, 0, 0, 1)

This fixes the table in Bloodrayne 2, I think. (Having some unrelated issues, so not 100% sure.)
Comment 9 Ilia Mirkin 2015-07-06 02:35:13 UTC
I spoke to ChrisF about this, and it appears that the behaviour is undefined at least in GLSL. As the GLSL is being generated by wine, I mentioned it on #winehackers, but there was no response, people probably just weren't around.

Perhaps worth filing a bug on bugs.winehq.org as well?
Comment 10 Béla Gyebrószki 2015-07-06 05:06:27 UTC
Bug reported in Wine bugtracker: https://bugs.winehq.org/show_bug.cgi?id=38869
Comment 11 Henri Verbeet 2015-07-06 10:03:30 UTC
Yeah, this is our (Wine's) bug. We should be intializing varyings read by the fragment shader but not written by the vertex shader, pretty much like the patch on this bug report does, but in wined3d instead of the driver.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.