Bugzilla – Bug 47535
[snb] gpu hang with google maps (GL version)
Last modified: 2013-10-04 23:39:58 UTC
Created attachment 58707
xf86-video-intel git: 1c2932e9cb283942567c3dd2695d03b8045da27f
[ 357.750313] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 357.750325] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 357.767446] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 7401 at 7390, next 7402)
[ 353.425] [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
[ 353.425] Backtrace:
[ 353.426] 0: /usr/bin/X (xorg_backtrace+0x36) [0x56d2d6]
[ 353.426] 1: /usr/bin/X (mieqEnqueue+0x273) [0x54dd43]
[ 353.426] 2: /usr/bin/X (0x400000+0x4a3ae) [0x44a3ae]
[ 353.426] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fdf2082c000+0x62d1) [0x7fdf208322d1]
[ 353.426] 4: /usr/bin/X (0x400000+0x72267) [0x472267]
[ 353.426] 5: /usr/bin/X (0x400000+0x97ab3) [0x497ab3]
[ 353.426] 6: /lib64/libpthread.so.0 (0x7fdf24db8000+0x10420) [0x7fdf24dc8420]
[ 353.426] 7: /lib64/libc.so.6 (ioctl+0x7) [0x7fdf23da8807]
[ 353.426] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7fdf2233ef08]
[ 353.426] 9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fdf21e07000+0x37524) [0x7fdf21e3e524]
[ 353.426] 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fdf21e07000+0x3a353) [0x7fdf21e41353]
[ 353.426] 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fdf21e07000+0x6256c) [0x7fdf21e6956c]
[ 353.426] 12: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fdf21e07000+0x70720) [0x7fdf21e77720]
[ 353.426] 13: /usr/bin/X (WakeupHandler+0xa0) [0x43a280]
[ 353.426] 14: /usr/bin/X (WaitForSomething+0x1bc) [0x56a8cc]
[ 353.426] 15: /usr/bin/X (0x400000+0x35e52) [0x435e52]
[ 353.426] 16: /usr/bin/X (0x400000+0x24e6a) [0x424e6a]
[ 353.426] 17: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fdf23cf522d]
[ 353.426] 18: /usr/bin/X (0x400000+0x24a09) [0x424a09]
[ 353.426] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 353.426] [mi] mieq is *NOT* the cause. It is a victim.
[ 354.097] [mi] EQ overflow continuing. 100 events have been dropped.
Created attachment 58710
i915_error_state after upgrade to mesa (git: ca760181b4420696c7e86aa2951d7203522ad1e8 )
It appears that the hang goes away if you disable HiZ. (This can be done via driconf, or by setting the environment variable hiz=false.)
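For testing a single program, the environment-variable route can be scripted. Below is a minimal sketch in Python that starts a child process with hiz=false added to its environment only; the helper name run_without_hiz and the stand-in child command are placeholders for illustration, not part of Mesa or driconf.

```python
import os
import subprocess
import sys


def run_without_hiz(command):
    """Run `command` with hiz=false set in the child's environment only."""
    env = dict(os.environ)   # copy, so the parent's environment is untouched
    env["hiz"] = "false"     # the option named in the comment above
    return subprocess.run(command, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process: it just reports the variable it inherited.
    result = run_without_hiz(
        [sys.executable, "-c", "import os; print(os.environ.get('hiz'))"]
    )
    print(result.stdout.strip())
```

In practice the command list would name the OpenGL program under test instead of the stand-in Python child.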
The simulator is giving me interesting complaints; investigating further...
*** Bug 45806 has been marked as a duplicate of this bug. ***
*** Bug 44108 has been marked as a duplicate of this bug. ***
Confirming that disabling Hierarchical Z stops the hang.
However, overall MapsGL performance is uninspiring. Probably Google's fault since it's still experimental.
MapsGL is also uninspiring on my MacBook Pro with a discrete NVidia card. It's definitely Google's fault.
Maybe we're lucky and one of the SNB workarounds fixed this. Can someone try the latest kernel from my drm-intel-next-queued branch?
I just gave drm-intel-next-queued (a360bb1a83279243a0945a0e646fd6c66521864e) a try (running with mesa 8.0.2, libdrm-2.4.33 and xf86-video-intel-caf9144271a10f90ea580c246b2df3f69a10b7a0) and I'd say the situation definitely improved. I got one initial hang when using google maps; after that it ran almost smoothly. Previously every single zoom on the map caused yet another hangcheck_hung entry.
The log message I got was:
[ 79.722665] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 79.722669] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 79.734067] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
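To count hangs across kernels, the log can be scanned mechanically. This is a rough sketch that assumes the hang lines keep the shape shown in this report; the function name find_gpu_hangs and the regular expression are illustrative and not taken from any i915 tool.

```python
import re

# Matches lines such as:
# [ 79.722665] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
HANG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:(i915_hangcheck_\w+)\].*GPU hung"
)


def find_gpu_hangs(log_text):
    """Return (timestamp_seconds, reporting_function) for each hang line."""
    return [(float(m.group(1)), m.group(2)) for m in HANG_RE.finditer(log_text)]


sample = """\
[ 79.722665] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 79.722669] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 79.734067] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
"""

print(find_gpu_hangs(sample))
```

The pattern accepts both i915_hangcheck_elapsed and i915_hangcheck_hung, since the reporting function differs between the two kernels quoted in this bug.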
Created attachment 60446
i915_error_state running with drm-intel-next-queued kernel
Mashing this register makes the problem go away for me:
$ sudo intel_reg_write 0x2120 '0x1206800'
Value before: 0x6820
Value after: 0x6800
Anyone care to try it out? I'll put together a kernel patch soon.
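The value 0x1206800 is easier to read when split into halves. The sketch below assumes register 0x2120 follows the masked-write convention, in which the upper 16 bits of the written value select which of the lower 16 bits get updated; apply_masked_write is a hypothetical helper that models this, not part of intel-gpu-tools.

```python
def apply_masked_write(current, written):
    """Model a masked 32-bit register write (assumed convention).

    The upper 16 bits of `written` act as a mask; only the masked bits
    of the lower 16 bits are copied into the register.
    """
    mask = (written >> 16) & 0xFFFF
    value = written & 0xFFFF
    return (current & ~mask & 0xFFFF) | (value & mask)


before = 0x6820
after = apply_masked_write(before, 0x1206800)
print(hex(before), "->", hex(after))  # 0x6820 -> 0x6800
```

Under that assumption, the mask half 0x0120 selects bits 5 and 8, and both are written as zero. Bit 8 was already clear, so the only visible change is bit 5 (0x20), which matches the before and after values reported above. The 0x6800 in the lower half falls outside the mask and is ignored.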
I can confirm it works, for me at least.
Works for me, too.
It turns out that Daniel already implemented a workaround for this in the kernel...but the register write isn't sticking. Apparently this register needs to be written later in the initialization process.
I've sent a preliminary (lame) patch to intel-gfx to spark some discussion on what the right fix should be. Hopefully we can land on something soon...
Reassigning to me.
Linus merged the patch into the upstream kernel, and Greg picked it up for 3.3 stable. Hopefully it should be landing in a distro near you soon. :)
Marking fixed. If upgrading kernels is inconvenient, you can also apply the workaround manually via "sudo intel_reg_write 0x2120 0x1206800", or by disabling HiZ and separate stencil (export hiz=false). (The kernel patch does the register write, so the intel_reg_write workaround is just as good.)
Author: Kenneth Graunke <email@example.com>
Date: Fri Apr 27 12:44:41 2012 -0700
drm/i915: Set the Stencil Cache eviction policy to non-LRA mode.
Clearing bit 5 of CACHE_MODE_0 is necessary to prevent GPU hangs in
OpenGL programs such as Google MapsGL, Google Earth, and gzdoom when
using separate stencil buffers. Without it, the GPU tries to use the
LRA eviction policy, which isn't supported. This was supposed to be off
by default, but seems to be on for many machines.
This cannot be done in gen6_init_clock_gating with most of the other
workaround bits; the render ring needs to exist. Otherwise, the
register write gets dropped on the floor (one printk will show it
changed, but a second printk immediately following shows the value
reverts to the old one).
Cc: Rob Castle <firstname.lastname@example.org>
Cc: Eric Appleman <email@example.com>
Cc: Keith Packard <firstname.lastname@example.org>
Signed-off-by: Kenneth Graunke <email@example.com>
Reviewed-by: Daniel Vetter <firstname.lastname@example.org>
Acked-by: Daniel Vetter <email@example.com>
Signed-off-by: Dave Airlie <firstname.lastname@example.org>
*** Bug 48526 has been marked as a duplicate of this bug. ***
*** Bug 48791 has been marked as a duplicate of this bug. ***
*** Bug 48748 has been marked as a duplicate of this bug. ***
Works a treat. Thank you!