Created attachment 98831 [details]
Crash dump from /sys/class/drm/card0/error

When trying to launch XBMC 13.0 on my ThinkPad X41 Tablet with a 915GM, the display gets filled with a solid color, the GPU hangs and does not recover. XBMC doesn't get far enough to generate its own error log. I also don't know if this is a regression because I never tried to use XBMC before.

OS: Arch Linux x86 with Kernel 3.14.2, Mesa 10.1.3.

These are the only related messages in the ring buffer:

[drm] GPU crash dump saved to /sys/class/drm/card0/error
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
i915: render error detected, EIR: 0x00000010
i915: page table error
i915: PGTBL_ER: 0x00100003
[drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
i915: render error detected, EIR: 0x00000010
i915: page table error
i915: PGTBL_ER: 0x00100003
[drm] stuck on render ring
[drm:i915_reset] *ERROR* Failed to reset chip: -19
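For anyone triaging similar logs, here is a minimal sketch of pulling the EIR value out of a "render error detected" line and checking the bit that this log pairs with "page table error". The helper name parse_eir and the macro EIR_PAGE_TABLE_ERROR are made up for illustration; the bit position is inferred from the log above (EIR 0x00000010 reported together with "page table error"), not taken from the driver source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: bit 4 of EIR flags a page table error, as the log above
 * reports EIR 0x00000010 together with "page table error". */
#define EIR_PAGE_TABLE_ERROR (1u << 4)

/*
 * Hypothetical helper: find "EIR: " in a kernel log line and parse the
 * hexadecimal value that follows it. Returns 1 on success, 0 if the
 * line carries no such field.
 */
static int parse_eir(const char *line, unsigned int *eir)
{
    const char *p;

    assert(line != NULL && eir != NULL);

    p = strstr(line, "EIR: ");
    if (p == NULL)
        return 0;

    /* %x accepts an optional "0x" prefix. */
    return sscanf(p + strlen("EIR: "), "%x", eir) == 1;
}
```

Feeding it the line "i915: render error detected, EIR: 0x00000010" yields 0x10, which has EIR_PAGE_TABLE_ERROR set. Lines such as "[drm] stuck on render ring" are rejected.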
That's bad. There are tons of garbage in the ringbuffer. Can you try mplayer -vo xv or mplayer -vo gl, just to see if they suffer something similar that may be easier to debug? Do all the hangs in xbmc look like the one attached? Can you please attach a couple more?
Created attachment 98847 [details] Try 1: Crashdump
Created attachment 98848 [details] Try 1: Kernel log
Created attachment 98849 [details] Try 2: Crashdump
Created attachment 98850 [details] Try 2: Kernel log
Created attachment 98851 [details] Try 3: Crashdump
Created attachment 98852 [details] Try 3: Kernel log
Created attachment 98853 [details] Try 4: Crashdump
Created attachment 98854 [details] Try 4: Kernel log
Created attachment 98855 [details] Try 5: Crashdump
Created attachment 98856 [details] Try 5: Kernel log
Playing a video with mplayer -vo xv and -vo gl works fine. I normally use mpv, which works fine too with --vo=xv, --vo=x11 and --vo=opengl-old. I also collected a few more crashdumps. Once (on try 1) the GPU did recover and X was usable after a short hang, although XBMC still crashed before its GUI showed up. On this try there also were some related entries in the kernel log.
Apart from try 1, they all seem to have the same characteristic of something overwriting the ring buffer. How easy would it be to test with an old mesa, I guess mesa-9.0?
I'd have to recompile a lot of stuff to use older mesa versions, since it's a rolling release distro.

However, I found out something new: the same GPU hang with "EIR stuck" also occurs when using xrandr or lxrandr. No resolution change is needed; simply executing "xrandr" from lxterminal, or starting lxrandr to show the current setting, causes the GPU to hang.

After X has locked up, I can still execute it with "DISPLAY=:0 xrandr" via SSH and get the correct output. If I change the mode, the xrandr output reflects that (* next to a different mode), but the display stays unresponsive. I can also turn the display off, and it really gets turned off. Turning it back on again yields the same solid color as before, though. While doing this, lots of call traces appear in the kernel log again.

If I execute "DISPLAY=:0 xrandr" from SSH after a clean boot, the same crash occurs, but xrandr still runs and produces the correct output.

Since XBMC tries to change the display mode on startup because it normally runs in fullscreen, and mplayer/mpv don't try to change the mode, this might be the same issue.
Created attachment 98866 [details] Hang when executing xrandr: Crashdump
Created attachment 98867 [details] Hang when executing xrandr: Kernel log (annotated)
It has the same symptoms that the ringbuffer has been overwritten. I would try:

ickle@nuc-i3427:/usr/src/linux$ git diff
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 20bd839..dcf90ca 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -337,6 +337,8 @@ int i915_gem_init_stolen(struct drm_device *dev)
 	}
 #endif
 
+	dev_priv->gtt.stolen_size = 0;
+
 	if (dev_priv->gtt.stolen_size == 0)
 		return 0;
(In reply to comment #17)
> +	dev_priv->gtt.stolen_size = 0;

Yep, this worked. xrandr no longer hangs the GPU and XBMC works fine.
Should we just give up on stolen on gen3? After all we have a few bugs with conflicts with mmio bars and other crap ..
Maybe this? http://lists.freedesktop.org/archives/intel-gfx/2013-December/036841.html
(In reply to comment #20)
> Maybe this?
> http://lists.freedesktop.org/archives/intel-gfx/2013-December/036841.html

Oh and in addition we might have to leave a guard page between the GTT and the rest of stolen. I think I saw something like that mentioned somewhere. That patch doesn't have a guard page though.
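To make the idea concrete, here is a rough sketch of keeping the usable stolen range clear of the GTT, with an optional guard on either side. It is a simplified model for illustration only and is not the code from the linked patch; struct range, range_len and trim_stolen are hypothetical names, and the addresses in the usage note are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end). Hypothetical type for illustration. */
struct range {
    uint64_t start;
    uint64_t end;
};

static uint64_t range_len(struct range r)
{
    return r.end > r.start ? r.end - r.start : 0;
}

/*
 * Return the largest part of `stolen` that does not overlap `gtt`
 * widened by `guard` bytes on each side. If the two do not overlap,
 * `stolen` is returned unchanged.
 */
static struct range trim_stolen(struct range stolen, struct range gtt,
                                uint64_t guard)
{
    uint64_t lo, hi;
    struct range below, above;

    assert(stolen.start <= stolen.end && gtt.start <= gtt.end);

    /* Widen the GTT range by the guard, clamping at address zero. */
    lo = gtt.start > guard ? gtt.start - guard : 0;
    hi = gtt.end + guard;

    /* No overlap: nothing to trim. */
    if (hi <= stolen.start || lo >= stolen.end)
        return stolen;

    /* Pieces of stolen left below and above the protected range. */
    below.start = stolen.start;
    below.end = lo > stolen.start ? lo : stolen.start;
    above.start = hi < stolen.end ? hi : stolen.end;
    above.end = stolen.end;

    return range_len(below) >= range_len(above) ? below : above;
}
```

With an invented 8 MiB stolen region at [0x0, 0x800000), a GTT occupying its last 128 KiB and a 4096-byte guard, the usable range shrinks to [0x0, 0x7DF000). Passing a guard of 0 models the variant without a guard page.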
Well fodder for testing for sure, with or without the guard page.
(In reply to comment #20)
> Maybe this?
> http://lists.freedesktop.org/archives/intel-gfx/2013-December/036841.html

Wait, that's not upstream yet?
Assigning to Ville to push his fix.
commit f1e1c2129b79cfdaf07bca37c5a10569fe021abe
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Thu Jun 5 20:02:59 2014 +0300

    drm/i915: Don't clobber the GTT when it's within stolen memory

is now merged, so I'm assuming that this is now fixed.
I can confirm it's fixed on my machine with vanilla kernel 3.15.8.
Closing resolved+fixed. Verification done by Reporter.