Created attachment 111414
For unrelated reasons I am stuck on kernel 3.4.105, which has run without any trouble with xf86-video-intel versions up to and including 2.99.916.
Upgrading xf86-video-intel to 2.99.917 breaks this setup; however, 2.99.917 does work in combination with the latest linux-3.19-rc1+ (which I'm currently testing to see whether it fixes the reason I'm stuck on 3.4.x).
All of this is with X.Org X Server 1.16.3; the regression is reproducible:
[ 19.540530] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 19.541519] render error detected, EIR: 0x00000010
[ 19.541519] IPEIR: 0x00000000
[ 19.541519] IPEHR: 0x00000000
[ 19.541519] INSTDONE: 0xffffffff
[ 19.541519] INSTPS: 0x4001e020
[ 19.541519] INSTDONE1: 0xbfffffff
[ 19.541519] ACTHD: 0x7ffff000
[ 19.541519] page table error
[ 19.541519] PGTBL_ER: 0x00100000
[ 19.541519] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 25.532145] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 25.532154] render error detected, EIR: 0x00000010
[ 25.532158] IPEIR: 0x00000000
[ 25.532162] IPEHR: 0x00000000
[ 25.532165] INSTDONE: 0xffffffff
[ 25.532169] INSTPS: 0x4001e020
[ 25.532172] INSTDONE1: 0xbfffffff
[ 25.532175] ACTHD: 0x7ffff000
[ 25.532179] page table error
[ 25.532182] PGTBL_ER: 0x00100000
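For anyone comparing the two register dumps above, here is a minimal Python sketch that splits the pasted log into one dictionary per error event and checks whether the values changed between the first error and the hangcheck. The helper name `parse_dumps` and the rule "a new dump starts at each EIR line" are assumptions made for illustration; this is not an i915 tool.

```python
import re

# Matches "NAME: 0x..." where NAME is an upper-case register name, e.g.
# "[ 19.541519] INSTDONE: 0xffffffff" or "render error detected, EIR: 0x00000010".
# Lines such as "EIR stuck: 0x00000010, masking" do not match, because the
# word directly before the colon is lower-case.
REG_LINE = re.compile(r"\b([A-Z][A-Z0-9_]+): (0x[0-9a-fA-F]+)")

LOG = """\
[ 19.541519] render error detected, EIR: 0x00000010
[ 19.541519] IPEIR: 0x00000000
[ 19.541519] IPEHR: 0x00000000
[ 19.541519] INSTDONE: 0xffffffff
[ 19.541519] INSTPS: 0x4001e020
[ 19.541519] INSTDONE1: 0xbfffffff
[ 19.541519] ACTHD: 0x7ffff000
[ 19.541519] page table error
[ 19.541519] PGTBL_ER: 0x00100000
[ 19.541519] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 25.532145] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 25.532154] render error detected, EIR: 0x00000010
[ 25.532158] IPEIR: 0x00000000
[ 25.532162] IPEHR: 0x00000000
[ 25.532165] INSTDONE: 0xffffffff
[ 25.532169] INSTPS: 0x4001e020
[ 25.532172] INSTDONE1: 0xbfffffff
[ 25.532175] ACTHD: 0x7ffff000
[ 25.532179] page table error
[ 25.532182] PGTBL_ER: 0x00100000
"""


def parse_dumps(log_text):
    """Return one {register: value} dict per error event in log_text.

    Assumption: every dump begins with the line that reports EIR.
    """
    dumps = []
    for line in log_text.splitlines():
        match = REG_LINE.search(line)
        if not match:
            continue  # not a register line
        name, value = match.group(1), int(match.group(2), 16)
        if name == "EIR":
            dumps.append({})  # start of a new dump
        if dumps:
            dumps[-1][name] = value
    return dumps


dumps = parse_dumps(LOG)
print(len(dumps), "dumps; identical:", dumps[0] == dumps[1])
```

In this log both dumps carry the same register values, so the state did not move between the initial page table error and the hangcheck about six seconds later.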
Created attachment 111415
Yes, it's a bug in the kernel relocation routines.
In case there is a kernel fix, what are my chances of getting it backported to 3.4? ;)
Does it happen with the latest drm-intel-nightly branch from cgit.freedesktop.org/drm-intel?
If it doesn't, a bisect could lead you to the fix commit. If it does, it is still an upstream issue.
As said, it works with 3.19-rc1+, so what I actually need to find out is at which point in the past it was fixed. If there is no prominent fix you can point me to from memory, I will start going back through the last few major releases. There's a reason I keep my .configs around...
I guess Chris might know offhand; otherwise I guess you get to do the long, painful bisect. :/
Oh. I think I know what it might actually have been:
Author: Chris Wilson <email@example.com>
Date: Mon Jan 26 10:47:10 2015 +0000
agp/intel: Serialise after GTT updates
That could explain a few of these similar bugs.
Some things I can tell now:
Kernel versions that hang with >=xf86-video-intel-2.99.917: 3.4.106, 3.10.53
What works: 3.14.33, 3.17.4
Trying to apply the patch on top of 3.14.33 (only to check for backportability) breaks the build, and the same happens for 3.4.106:
drivers/char/agp/intel-gtt.c: In function ‘i810_write_entry’:
drivers/char/agp/intel-gtt.c:331:2: error: implicit declaration of function ‘writel_relaxed’
If I remove the `writel_relaxed` hunks to make the patch succeed, 3.4.106 still hangs. But it seems that patch isn't the real fix anyway - it must be something between 3.10 and 3.14.
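To narrow the 3.10 to 3.14 window, the search can be run as a plain binary search over release versions before going down to commit level. The following Python is a rough sketch under stated assumptions: `first_fixed` and `pretend_test` are hypothetical names, the version list is simply the major releases between the known-broken and known-working kernels, and the real check (build, boot, start X with xf86-video-intel 2.99.917, watch for the hang) is replaced by a stand-in.

```python
def first_fixed(versions, is_fixed):
    """Return the first version for which is_fixed(version) is true.

    Assumes versions[0] is known broken, versions[-1] is known fixed, and
    that the fix, once merged, stays in every later version.
    """
    lo, hi = 0, len(versions) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(versions[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return versions[hi]


# Known broken: 3.10.x. Known working: 3.14.x.
VERSIONS = ["3.10", "3.11", "3.12", "3.13", "3.14"]


def pretend_test(version):
    # Hypothetical stand-in for the manual test. It pretends the fix landed
    # in 3.12 purely to show the search; the real answer is unknown here.
    return tuple(int(part) for part in version.split(".")) >= (3, 12)


print(first_fixed(VERSIONS, pretend_test))
```

With five candidates this needs at most two manual boots. Once the release is known, `git bisect start --term-old=broken --term-new=fixed` (available in reasonably recent git) lets the same search run over commits while looking for a fix rather than a regression.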