Bug 22622 - [GM965 GLSL] noise*() cause GPU lockup
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assigned To: Ian Romanick
Keywords: patch
Duplicates: 19546
Depends on:
Blocks: 29044
Reported: 2009-07-05 04:29 UTC by David L.
Modified: 2010-09-09 15:52 UTC (History)

Attachments
Xorg.log (16.33 KB, text/plain)
2009-07-05 04:29 UTC, David L.
intel_gpu_dump (bz2) after progs/glsl/noise lockup (120.33 KB, application/octet-stream)
2009-07-05 04:31 UTC, David L.
intel_gpu_dump (bz2) after progs/glsl/noise2 lockup (133.36 KB, application/octet-stream)
2009-07-05 04:31 UTC, David L.
intel_gpu_dump (bz2) after progs/glsl/multinoise lockup (131.10 KB, application/octet-stream)
2009-07-05 04:32 UTC, David L.
Replace noise opcodes with GLSL implementation (26.99 KB, patch)
2009-07-07 16:26 UTC, Ian Romanick

Description David L. 2009-07-05 04:29:20 UTC
Created attachment 27391 [details]
Xorg.log

All "noise" samples from Mesa progs/glsl cause an immediate GPU lockup on start.

installed versions:

kernel b44866e34ce96cdec2e848ab57808381df871ac8 (== 2.6.30.1)
libdrm 72a29340ea3225550db6b009f4e50c77c7b1f394 (2009-07-03 15:03:03+0200)
mesa 862488075c5537b0613753b0d14c267527fc6199 (2009-07-03 18:53:58+0200)
xf86-video-intel 74227141923a2f5049592219ab80e8733062a5d9 (2009-06-23 14:14:50+0100)
xorg-server d6b8205e699c0c62af76c4a9cbff1402337927b3 (2009-07-03 19:25:33-0700)

hardware:
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02] (rev 03)
00:02.1 Display controller [0380]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a03] (rev 03)
MSI disabled.

Killing the program or X does not return me to a working console; the only way out is a reboot.

No other sample from progs/glsl causes a lockup.
Comment 1 David L. 2009-07-05 04:31:04 UTC
Created attachment 27392 [details]
intel_gpu_dump (bz2) after progs/glsl/noise lockup
Comment 2 David L. 2009-07-05 04:31:35 UTC
Created attachment 27393 [details]
intel_gpu_dump (bz2) after progs/glsl/noise2 lockup
Comment 3 David L. 2009-07-05 04:32:15 UTC
Created attachment 27394 [details]
intel_gpu_dump (bz2) after progs/glsl/multinoise lockup
Comment 4 Gordon Jin 2009-07-05 17:57:26 UTC
*** Bug 19546 has been marked as a duplicate of this bug. ***
Comment 5 David L. 2009-07-06 00:47:49 UTC
When I change the shader in noise.c like this:

@@ -27,17 +27,17 @@
    "uniform float Slice;\n"
    "void main()\n"
    "{\n"
    "   vec4 scale = vec4(5.0);\n"
    "   vec4 p;\n"
    "   p.xy = gl_TexCoord[0].xy;\n"
    "   p.z = Slice;\n"
    "   p.w = 0;\n"
-   "   vec4 n = noise4(p * scale);\n"
+   "   vec4 n = p * scale;\n"
    "   gl_FragColor = n * Scale + Bias;\n"
    "}\n";
 
the hang goes away, so it is definitely an issue with the noise4 function.
Comment 6 David L. 2009-07-06 01:04:50 UTC
noise1 trips the lockup as well. I can't test noise2/noise3 since I have no clue about writing GLSL code.

Adjusting the summary and bumping the priority because of the hang.
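
For the untested noise2/noise3 cases, here is a minimal sketch of how the noise.c fragment shader could be generated for any of noise1 to noise4. `build_noise_shader` is a hypothetical helper, not something in progs/glsl. The `uniform vec4 Scale, Bias;` line is an assumption inferred from the uniform types the program prints (0x8b52 is GL_FLOAT_VEC4); the rest of the body follows the diff in comment 5, with `p.w = 0.0` written as a float literal. The only GLSL facts relied on are that noise1 returns float, noise2 vec2, noise3 vec3 and noise4 vec4, so the result is widened to a vec4 before the scale and bias.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Mesa's progs/glsl/noise.c.
 * Writes a fragment shader that calls noise<n>() (n = 1..4) into buf.
 * Returns 0 on success, -1 if n is out of range or buf is too small.
 */
static int build_noise_shader(int n, char *buf, size_t len)
{
    /* GLSL 1.x return types: noise1 -> float, ..., noise4 -> vec4. */
    static const char *const result_type[4] = {
        "float", "vec2", "vec3", "vec4"
    };
    /* Constructor expressions that widen the result "n" to a vec4. */
    static const char *const widened[4] = {
        "vec4(n)", "vec4(n, 0.0, 0.0)", "vec4(n, 0.0)", "n"
    };
    int written;

    assert(buf != NULL);
    if (n < 1 || n > 4)
        return -1;

    written = snprintf(buf, len,
        "uniform vec4 Scale, Bias;\n"   /* assumed declaration */
        "uniform float Slice;\n"
        "void main()\n"
        "{\n"
        "   vec4 scale = vec4(5.0);\n"
        "   vec4 p;\n"
        "   p.xy = gl_TexCoord[0].xy;\n"
        "   p.z = Slice;\n"
        "   p.w = 0.0;\n"
        "   %s n = noise%d(p * scale);\n"
        "   gl_FragColor = %s * Scale + Bias;\n"
        "}\n",
        result_type[n - 1], n, widened[n - 1]);

    /* snprintf reports the untruncated length; treat truncation as failure. */
    if (written < 0 || (size_t)written >= len)
        return -1;
    return 0;
}
```

Calling `build_noise_shader(3, buf, sizeof buf)` with a 512-byte buffer and handing `buf` to the sample's shader-compile path in place of the built-in string should be enough to exercise noise3 on its own.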
Comment 7 Ian Romanick 2009-07-07 16:26:30 UTC
Created attachment 27485 [details] [review]
Replace noise opcodes with GLSL implementation
Comment 8 David L. 2009-07-08 02:29:09 UTC
Confirmed: the patch fixes the hangs with the noise samples, thank you very much!

EVE still freezes, removing bug #22613 from blocks :(
Comment 9 Gordon Jin 2009-09-07 22:39:38 UTC
(In reply to comment #7)
> Created an attachment (id=27485) [details]
> Replace noise opcodes with GLSL implementation
> 

Ian, is this patch committed?
Comment 10 Ian Romanick 2009-09-08 16:16:18 UTC
(In reply to comment #9)

> Ian, is this patch committed?

It is not.  When this code is used, the GLSL compiler inlines *everything*, and programs quickly become too large (or use too many registers) to run in hardware.  We're going to have to do some work on the GLSL compiler before we can properly implement this.  I don't have a good solution for right now.
Comment 11 Charlie Burrows 2010-02-11 05:55:03 UTC
(In reply to comment #10)
> We're going to have to do some work on the GLSL compiler before we
> can properly implement this.  
 
Ian, can you tell me which bugs filed against GLSL are linked to this issue?
Comment 12 Eric Anholt 2010-03-19 12:42:34 UTC
Are you still experiencing this problem?  I just ran noise and noise2 on my GM45 successfully.
Comment 13 David L. 2010-03-20 13:05:17 UTC
Eric Anholt: yes, it seems to persist:

# ./checkinst
drm.32/c1c8bbf80b1f734e23996bf805dc78f32ebaf56f
drm.64/c1c8bbf80b1f734e23996bf805dc78f32ebaf56f
mesa.32/05c03c6a1bcfb8ad77d3025f166f02ddaa741aa2
mesa.64/05c03c6a1bcfb8ad77d3025f166f02ddaa741aa2
xf86-video-intel/3d4b3f257fbbb69c6f236d9803abe54a90d7d434
# mesa/progs/glsl/noise
Uniforms:
  0: Scale size=1 type=0x8b52 loc=0 value=0.5, 0.4, 0, 0
  1: Bias size=1 type=0x8b52 loc=1 value=0.5, 0.3, 0, 0
  2: Slice size=1 type=0x1406 loc=2 value=0.5, 0, 0, 0
GL_RENDERER = Mesa DRI Intel(R) 965GM GEM 20091221 DEVELOPMENT 
intel_bufmgr_gem.c:1234: Error setting memory domains 1 (00000040 00000000): Input/output error .
^C
# dmesg | tail -n 3
[30715.868133] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[30715.868148] render error detected, EIR: 0x00000000
[30715.868190] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 2664539 at 2664538)

... but at least it doesn't hard-lock my box anymore. I'm not sure the noise code is what causes the GPU hangs, though; I experience these quite frequently with Wine.

With Ian's patch, noise works.
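
For anyone scripting a check around these samples, a small sketch of how the log lines above could be recognized. Both helpers are hypothetical and only match the strings shown in this comment. The -5 returned by i915_do_wait_request and the "Input/output error" text from intel_bufmgr_gem.c appear to be the same errno, EIO, which is 5 on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if a kernel log line looks like the
 * i915 hangcheck report quoted in this bug, 0 otherwise.
 */
static int is_gpu_hang_line(const char *line)
{
    assert(line != NULL);
    return strstr(line, "i915_hangcheck_elapsed") != NULL &&
           strstr(line, "GPU hung") != NULL;
}

/*
 * Hypothetical helper: returns 1 if a kernel-style negative return
 * code is -EIO (the "-5" in the i915_do_wait_request line on Linux).
 */
static int is_io_error(int ret)
{
    return ret == -EIO;
}
```

Feeding each line of `dmesg` output through `is_gpu_hang_line` would distinguish a run that hung the GPU from one that merely rendered incorrectly.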
Comment 14 David L. 2010-03-20 13:07:07 UTC
btw, bug #19546 comment 2 says "Q965 and g45 is fine, this bug may be gm965 one only."
Comment 15 Gordon Jin 2010-03-21 23:31:30 UTC
Right, this only happens on GM965 among our test machines.

But in our case, X can be killed and we return to a working console.

[root@x-gm965 glsl]#./noise
libGL: OpenDriver: trying /opt/X11R7/lib/dri/i965_dri.so
Uniforms:
  0: Scale size=1 type=0x8b52 loc=0 value=0.5, 0.4, 0, 0
  1: Bias size=1 type=0x8b52 loc=1 value=0.5, 0.3, 0, 0
  2: Slice size=1 type=0x1406 loc=2 value=0.5, 0, 0, 0
GL_RENDERER = Mesa DRI Intel(R) 965GM GEM 20091221 DEVELOPMENT
intel_bufmgr_gem.c:1247: Error setting memory domains 1 (00000040 00000000): Inp

[root@x-gm965 glsl]# dmesg | tail -n 3
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1601 at 1600)
Comment 16 Ian Romanick 2010-09-09 15:52:19 UTC
The following patches effectively disable noise in i965 GLSL.  Once real function calls are implemented in i965 drivers, a variation of the GLSL code (attachment #27485) will be used.

commit 1f3c7d968c4313dbb71bc93306556cc9292d06ef
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Sep 1 21:23:52 2010 -0700

    glsl2: Implement noise[1234] built-in functions using ir_unop_noise

commit 2b70dbfe091af5ae7c788e16275e1af2cb1c284c
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Thu Sep 9 15:25:32 2010 -0700

    glsl2: Add EmitNoNoise flag, use it to remove noise opcodes

commit 547131ac8750acabd030972fc768705c13d19ef7
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Thu Sep 9 15:20:09 2010 -0700

    glsl2: Add lowering pass to remove noise opcodes

commit 3a5ce85cfa4914711e56c8cf831699242618928e
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Sep 1 21:12:10 2010 -0700

    glsl2: Add ir_unop_noise
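
To illustrate the idea of a lowering pass that removes noise opcodes, here is a toy sketch in C. It is not the code from the commits above: the real pass lives in Mesa's GLSL compiler, and every name here (`toy_expr`, `toy_lower_noise`, `toy_eval`) is hypothetical. The sketch assumes that "disabling" noise means substituting a constant, zero in this case, for each noise expression before the backend sees it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy IR for illustration only; these are not Mesa's ir_* classes. */
enum toy_op { TOY_CONST, TOY_NOISE, TOY_ADD, TOY_MUL };

struct toy_expr {
    enum toy_op op;
    float value;             /* used by TOY_CONST only */
    struct toy_expr *a, *b;  /* operands; TOY_NOISE uses only a */
};

/*
 * Rewrites every TOY_NOISE node in place into the constant 0.0
 * (an assumption about what "disabling" means) and returns the
 * number of nodes replaced.
 */
static int toy_lower_noise(struct toy_expr *e)
{
    if (e == NULL)
        return 0;
    if (e->op == TOY_NOISE) {
        e->op = TOY_CONST;
        e->value = 0.0f;
        e->a = NULL;
        e->b = NULL;
        return 1;
    }
    return toy_lower_noise(e->a) + toy_lower_noise(e->b);
}

/* Evaluates a tree that no longer contains TOY_NOISE nodes. */
static float toy_eval(const struct toy_expr *e)
{
    assert(e != NULL && e->op != TOY_NOISE);
    switch (e->op) {
    case TOY_CONST: return e->value;
    case TOY_ADD:   return toy_eval(e->a) + toy_eval(e->b);
    case TOY_MUL:   return toy_eval(e->a) * toy_eval(e->b);
    default:        return 0.0f;
    }
}
```

Applied to a tree shaped like the sample shader's `noise(p) * Scale + Bias`, the pass leaves an expression that evaluates to `Bias` alone, so no noise opcode ever reaches the hardware.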