I'm getting this just after booting kernel 3.5 (including 3.5.1) on GM45.

[ 298.460954] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 298.462548] i915: render error detected, EIR: 0x00000010
[ 298.462553] i915: IPEIR: 0x00000000
[ 298.462555] i915: IPEHR: 0x01000000
[ 298.462557] i915: INSTDONE: 0xfffffffe
[ 298.462559] i915: INSTPS: 0x0001e000
[ 298.462561] i915: INSTDONE1: 0xffffffff
[ 298.462563] i915: ACTHD: 0x0021aaa0
[ 298.462566] i915: page table error
[ 298.462568] i915: PGTBL_ER: 0x00000001
[ 298.462573] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking

00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)

dmesg, xorg.log and i915_error_state attached.

xorg 1:7.7+1
xserver-xorg 1:7.7+1
xserver-xorg-core 2:1.12.3-1
xserver-xorg-video-intel 2:2.19.0-5
libdrm-intel1:amd64 2.4.33-3
libdrm-intel1:i386 2.4.33-3

There was no such issue with kernel 3.4.
Created attachment 65441 [details] dmesg
Created attachment 65442 [details] Xorg.log
Created attachment 65443 [details] i915_error_state
The GPU is completely idle at the time of the error, and the error comes from the CPU accessing an invalid PTE through the GTT. There should never be an invalid PTE (the entire GTT is meant to point only at buffer objects or the scratch page, valid entries one and all), so this is doubly concerning. Is there any chance you can perform a bisection between 3.4 and 3.5?
I've had the same error once, but without a method to reproduce it, it will be hard to bisect. I haven't seen it since.
It just happened again after a few hours runtime:

[14153.513354] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[14153.514006] i915: render error detected, EIR: 0x00000010
[14153.514006] i915: IPEIR: 0x00000000
[14153.514006] i915: IPEHR: 0x01000000
[14153.514006] i915: INSTDONE: 0xfffffffe
[14153.514006] i915: INSTPS: 0x0001e000
[14153.514006] i915: INSTDONE1: 0xffffffff
[14153.514006] i915: ACTHD: 0x1f80d0b8
[14153.514006] i915: page table error
[14153.514006] i915: PGTBL_ER: 0x00000001
[14153.514006] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking

Package versions:
kernel-3.6_rc3-drm
xorg-server-1.12.4
xf86-video-intel-2.20.5
libdrm-2.4.38
mesa-8.1_rc1_pre20120814
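For anyone comparing occurrences by hand: a minimal sketch, assuming the register lines keep the `NAME: 0x...` shape seen in the two dumps posted so far. The helper names (`parse_registers`, `diff_registers`) are made up for illustration and are not part of any i915 tooling. It extracts the register values from two dmesg excerpts and reports which ones differ.

```python
import re

# Matches "NAME: 0x..." pairs such as "EIR: 0x00000010" or "ACTHD: 0x0021aaa0".
# Lower-case words ("stuck:", "masking") and timestamps are ignored on purpose.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9_]*):\s+(0x[0-9a-fA-F]+)")


def parse_registers(dmesg_text):
    """Return {register name: integer value} for every NAME: 0x... pair found."""
    return {name: int(value, 16) for name, value in REG_RE.findall(dmesg_text)}


def diff_registers(first, second):
    """Return {name: (first value, second value)} for registers whose values differ."""
    shared = sorted(first.keys() & second.keys())
    return {n: (first[n], second[n]) for n in shared if first[n] != second[n]}


# Register lines copied from the two reports above (kernel 3.5 and 3.6_rc3).
DUMP_KERNEL_3_5 = """
[ 298.462548] i915: render error detected, EIR: 0x00000010
[ 298.462553] i915: IPEIR: 0x00000000
[ 298.462555] i915: IPEHR: 0x01000000
[ 298.462557] i915: INSTDONE: 0xfffffffe
[ 298.462559] i915: INSTPS: 0x0001e000
[ 298.462561] i915: INSTDONE1: 0xffffffff
[ 298.462563] i915: ACTHD: 0x0021aaa0
[ 298.462568] i915: PGTBL_ER: 0x00000001
"""

DUMP_KERNEL_3_6_RC3 = """
[14153.514006] i915: render error detected, EIR: 0x00000010
[14153.514006] i915: IPEIR: 0x00000000
[14153.514006] i915: IPEHR: 0x01000000
[14153.514006] i915: INSTDONE: 0xfffffffe
[14153.514006] i915: INSTPS: 0x0001e000
[14153.514006] i915: INSTDONE1: 0xffffffff
[14153.514006] i915: ACTHD: 0x1f80d0b8
[14153.514006] i915: PGTBL_ER: 0x00000001
"""

changed = diff_registers(parse_registers(DUMP_KERNEL_3_5),
                         parse_registers(DUMP_KERNEL_3_6_RC3))
for name, (old, new) in changed.items():
    print(f"{name}: 0x{old:08x} -> 0x{new:08x}")
```

For these two dumps only ACTHD differs; every other register reads the same in both reports.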
Created attachment 66499 [details] output of intel_error_decode

After today's upgrades to mesa master, libdrm-2.4.39 and 3.6_rc4-drm the error happened again. I still have no idea what causes it - the error occurred at wildly different uptimes and always at regular desktop workload.
Pity this is irregular, otherwise I could ask you to switch to SNA and see if that helps.
I've tried to start bisecting but unfortunately it doesn't reproduce with my 'minimal' .config. So it's definitely something configuration-specific.

Any ideas which options to check first?
(In reply to comment #9)
> I've tried to start bisecting but unfortunately it doesn't reproduce with my
> 'minimal' .config. So it's definitely something configuration-specific
>
> Any ideas which options to check first?

That just means it's a timing-related race somewhere. Which makes this really hard to track down :(
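To narrow down which options separate the 'minimal' .config from the one that reproduces, the two files can be diffed option by option. The kernel tree ships scripts/diffconfig for this. The following is only a rough sketch of the same idea, with hypothetical helper names and made-up sample values (the option names are real Kconfig symbols, the settings shown are not taken from anyone's actual config). An option missing from a file is treated as "n" here, which is a simplification.

```python
import re

SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Return {option: value}; '# CONFIG_X is not set' is recorded as 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def config_diff(first, second):
    """Return a sorted list of (option, value in first, value in second) that differ."""
    names = sorted(first.keys() | second.keys())
    return [
        (name, first.get(name, "n"), second.get(name, "n"))
        for name in names
        if first.get(name, "n") != second.get(name, "n")
    ]


# Illustrative fragments only, not real configs from this report.
FULL_CONFIG = """
CONFIG_PREEMPT=y
CONFIG_DRM_I915=m
CONFIG_DRM_I915_KMS=y
"""

MINIMAL_CONFIG = """
# CONFIG_PREEMPT is not set
CONFIG_DRM_I915=y
CONFIG_DRM_I915_KMS=y
"""

for name, full, minimal in config_diff(parse_config(FULL_CONFIG),
                                       parse_config(MINIMAL_CONFIG)):
    print(f"{name}: full={full} minimal={minimal}")
```

Since the suspicion is a timing-related race, options that change scheduling or whether the driver is built in or modular would be the first rows to look at in such a diff.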
Dmitry, since we both have GM45 hardware and share at least one symptom - would you mind testing kernel 3.4.10 for bug 54575?
(In reply to comment #8)
> Pity this is irregular, otherwise I could ask you to switch to SNA and see if
> that helps.

The error didn't occur in ~ 9 hours after switching from UXA to SNA.
Another week with SNA and I think it's safe to say that it really only happens with UXA.
The hint here is that this appears to be a race with pageflipping. So UXA should receive the same level of protection as SNA with current xf86-video-intel.git, and there is yet another bug to be fixed in the kernel...

commit 5a6c82a097e23cadc73eb65ebe6634bd84d363bc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Sep 27 21:17:28 2012 +0100

    drm/i915: Flush the pending flips on the CRTC before modification
Working on the theory that this is also related to the cpu-relocs issue, does using 3.7 help?
It looks like I can't reproduce it using 3.6.2 kernel. So 3.7 is probably also ok. So I think that it's ok to close this as resolved. Thanks.
Ok, closing this as no longer reproducible on latest kernels, thanks a lot for the bug report and please reopen if this issue pops up again.
Well, it still reproduces somehow. But now it happens at very random times. I don't see any conditions. Maybe switching to console, watching video (mplayer -vo gl2) or just switching between multiple X11 sessions is the cause. I can't be sure. And I also don't know any steps to reproduce it.

This is with kernel 3.6.3:

[248021.375714] i915: render error detected, EIR: 0x00000010
[248021.375724] i915: IPEIR: 0x00000000
[248021.375728] i915: IPEHR: 0x01000000
[248021.375733] i915: INSTDONE: 0xfffffffe
[248021.375736] i915: INSTPS: 0x0001e000
[248021.375740] i915: INSTDONE1: 0xffffffff
[248021.375744] i915: ACTHD: 0x098151d0
[248021.375749] i915: page table error
[248021.375752] i915: PGTBL_ER: 0x00000001
[248021.375760] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking

I'm attaching i915_error_state and intel_error_decode output for this case.
Created attachment 69637 [details] i915_error_state for 3.6.3 kernel
Created attachment 69638 [details] intel_error_decode output for 3.6.3 kernel
Smells like a missing mb().
Can you please try http://cgit.freedesktop.org/~ickle/linux-2.6 #master which contains a review of the mb() around GTT access.
I've switched to SNA since comment #13, but the error pops up now and then, seen it in 3.7.x and also in 3.8 (right now in rc6 after 13+ hrs uptime).
Created attachment 74543 [details] output of intel_error_decode

latest output of intel_error_decode
(In reply to comment #22)
> Can you please try http://cgit.freedesktop.org/~ickle/linux-2.6 #master
> which contains a review of the mb() around GTT access.

Chris, could you point out which patches I should take from there so I can try with a stable kernel? ~ickle/linux-2.6 master suffers from bug 58867 which I could patch, but right at startup I already see another ugly kernel oops...
The most interesting of those patches are now in drm-intel-next (http://cgit.freedesktop.org/~danvet/drm-intel) or something like the drm-intel-experimental ppa.
Ok thx, I 'unblurred' current drm-intel-next branch for my system, testing the resulting image now. :)
I've manually picked your commits

d0a57789d5ec807fc218151b2fb2de4da30fbef5
97c809fd9cf5e914322b53773ad0d67efe503fde
a3e30cef4b84f92763ed54c9934d70e2dd591246
9ddcb7df360c62ac6d4090ae60376c26510022f1

from 2012-12-16, all about mb(), as the current drm-intel-next branch kernel image panics on my system after some time. Testing with 3.8_rc7 right now.

Other related packages updated since comment #6:
xf86-video-intel-2.20.19
xorg-server-1.13.2
libdrm-2.4.40
mesa-9.0.1
Happened again with the above mentioned patches, this time very early - before even wlan0 was up. :(
Created attachment 75646 [details] 3.8 dmesg with drm.debug=6

If anything, it seems to happen more often now...

Also updated to xf86-video-intel-2.21.3
(In reply to comment #30)
> Created attachment 75646 [details]
> 3.8 dmesg with drm.debug=6
>
> If anything, it seems to happen more often now...
>
> Also updated to xf86-video-intel-2.21.3

Can you please retest with the latest drm-intel-nightly?
Do we have enough of fastboot upstream yet to fix the regression of not turning off the BIOS outputs whilst we overwrite its memory and PTE?
Nope, fastboot framebuffer reconstruction is still missing :(
In my small collection of dmesg logs, it was last seen in a 3.9.0 kernel. I should probably automate this and grep/save dmesg at each shutdown. However, these days I'm running 3.10 and so far haven't stumbled over it, while not exactly watching out for it. I will do that in the coming days and report back should it happen / then try out drm-intel-nightly.
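The "grep/save dmesg at each shutdown" idea could look roughly like this. It is a sketch under assumptions: the signature strings are taken from the error messages quoted in this report, while the function names, the target directory and the file naming are invented for illustration. Hooking it into the shutdown sequence is left to whatever the init system offers.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Substrings taken from the error messages quoted in this report.
SIGNATURES = ("render error detected", "page table error", "EIR stuck")


def find_error_lines(dmesg_text):
    """Return the dmesg lines that contain one of the known error signatures."""
    return [
        line for line in dmesg_text.splitlines()
        if any(signature in line for signature in SIGNATURES)
    ]


def save_if_error(dmesg_text, log_dir, now=None):
    """Write the full log to log_dir if it contains the error; return the path or None."""
    if not find_error_lines(dmesg_text):
        return None
    now = now or datetime.now()
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    target = log_dir / f"{now:%Y%m%d-%H%M}_dmesg.log"
    target.write_text(dmesg_text)
    return target


def read_dmesg():
    """Return the current kernel log; may need root depending on system settings."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

A shutdown hook would then call something like `save_if_error(read_dmesg(), "/var/log/i915-errors")`, where the directory is a placeholder. Keeping the whole log rather than only the matching lines preserves the context around the error.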
OK, it just happened again in 3.10.0-rc7+ which brings me to drm-intel-nightly next.
Unfortunately both drm-intel-nightly as well as -next are currently unusable on my system - there's no external display output at all. KDE detects when I fire up the DP monitor but X hangs when I actually try to enable some output there. Maybe I'll find a working state somewhere back in git history.
Where? What? When? How? Don't leave us hanging like this! cat /proc/`pidof Xorg`/stack or attach gdb would be useful as would the last traces from the log file.
Sorry, not much time for bug hunting with my current workload. :( However, I found out that what's actually broken is one of my boot params that is needed for correct kms fbcon native resolution detection - after removing i915.panel_ignore_lid=0 I do have output again on DP. So now I'm able to test today's state of drm-intel-nightly.
Created attachment 82046 [details] intel reg dump from drm-intel-nightly with i915.panel_ignore_lid=0 fwiw, attaching the reg dump from nightly without dp output
I haven't seen the error so far using nightly, but it's too early to be safe. Only this has appeared in dmesg when switching on the external display via xrandr (it wouldn't come up by itself, it's the troubled setup from bug 58876): [ 45.127774] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0x11450085
OK, there it is again with a drm-intel-nightly image pulled and built yesterday evening:

[15979.289716] [drm] capturing error event; look for more information in /sys/class/drm/card0/error
[15979.290709] i915: render error detected, EIR: 0x00000010
[15979.290709] i915: IPEIR: 0x00000000
[15979.290709] i915: IPEHR: 0x01000000
[15979.290709] i915: INSTDONE_0: 0xfffffffe
[15979.290709] i915: INSTDONE_1: 0xffffffff
[15979.290709] i915: INSTDONE_2: 0x00000000
[15979.290709] i915: INSTDONE_3: 0x00000000
[15979.290709] i915: INSTPS: 0x0001e000
[15979.290709] i915: ACTHD: 0x164041f8
[15979.290709] i915: page table error
[15979.290709] i915: PGTBL_ER: 0x00000001
[15979.290709] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
Created attachment 82487 [details] output of /sys/class/drm/card0/error attaching error log. Error happened while also using SNA, xf86-video-intel-2.21.12, xorg-server-1.13.4, libdrm-2.4.45, mesa-9.1.4
Created attachment 82488 [details] intel error decode (3.10.0-rc7+ drm-intel-nightly from 13/07/15)
Note that the immediately-after-boot error and the after-several-hours-runtime error are likely two different bugs. Or rather, I have two theories that explain each one independently...
Since I'm currently doing a lot of rebooting due to other issues with i915, I did notice that with 3.8.13 the early after boot error was more or less guaranteed, and that seems to have disappeared as I only noticed it late in the game with recent kernels. Which kind of confirms your theory and that there has been some progress indeed, I guess.
Andreas, what's the situation with current drm-intel-nightly?
(In reply to comment #46)
> Andreas, what's the situation with current drm-intel-nightly?

I just tried the latest state of drm-intel-nightly on the setup that's troubled by bug 57461 and bug 69251 (external display via DisplayPort), and it's got a bit worse:

1.) System freezes every time that - presumably - EDID is accessed. At first there's a noticeable black screen delay between grub2 and init, then it proceeds fine to the login manager, all seems fine at that point.

2.) That 5-6 seconds freeze (total lock, any input is lost) then happens each time I switch between fbcon and login manager, and doing that I can soon provoke the following error in dmesg:

[ 66.836072] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5

3.) Starting the desktop environment results in a multitude of those freezes, presumably because KDE tries to detect and find out a few things about display capabilities, color management and whatnot, startup is considerably delayed by that.

4.) How to reproduce the freeze:

~ $ time oyranos-monitor -l
0: ":0.0" 1920,00x1200,00+0,00+0,00 S2243W
real 0m6.230s
user 0m0.034s
sys 0m0.033s

5.) During first startup of the new kernel image I also got an 'hpd interrupt storm' in dmesg, a few restarts later a familiar error has reappeared:

[ 457.189291] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 457.189296] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 457.189297] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 457.189299] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 457.189300] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 457.190004] i915: render error detected, EIR: 0x00000010
[ 457.190004] i915: IPEIR: 0x00000000
[ 457.190004] i915: IPEHR: 0x54c00006
[ 457.190004] i915: INSTDONE_0: 0x808f837f
[ 457.190004] i915: INSTDONE_1: 0xbf2706ae
[ 457.190004] i915: INSTDONE_2: 0x00000000
[ 457.190004] i915: INSTDONE_3: 0x00000000
[ 457.190004] i915: INSTPS: 0x8001e025
[ 457.190004] i915: ACTHD: 0x01bcb45c
[ 457.190004] i915: page table error
[ 457.190004] i915: PGTBL_ER: 0x00000001
[ 457.190004] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
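The `time oyranos-monitor -l` check from step 4 can be repeated from a script when testing several kernels. This is a minimal sketch: `timed_run` is a hypothetical helper, and the 5 second threshold is an assumption based on the 5-6 second freezes described above, not a measured limit.

```python
import subprocess
import sys
import time


def timed_run(command, threshold_s=5.0):
    """Run command, discard its output, and return (seconds, stalled, exit code)."""
    start = time.monotonic()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    elapsed = time.monotonic() - start
    return elapsed, elapsed >= threshold_s, completed.returncode


# Stand-in command so the sketch runs anywhere; on the affected machine this
# would be ["oyranos-monitor", "-l"] as in step 4 above.
elapsed, stalled, code = timed_run([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f}s stalled={stalled} exit={code}")
```

Wall-clock time is the number that matters here, since the report shows about 6 seconds real against a few hundredths of a second of user and sys time.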
Sorry, the first bug number in above comment #47 should have pointed at bugzilla.kernel.org.
Created attachment 91132 [details] intel-error-decode-131222.log (drm-intel-nightly-3.13.0-rc4+)
Created attachment 91133 [details] intel-reg-dump-131222.log (drm-intel-nightly-3.13.0-rc4+)
Happened now as well on the DP-DVI setup from bug 58876, as early as [ 1716.048044], but here at least there are no 6 sec freezes.
Timeout, please try current drm-intel-nightly.
Hi, I (the submitter of this bug) don't have access to the affected GM45 laptop anymore. We can probably wait a week for the other people on CC...
Created attachment 106161 [details] 20140908-0828_3.16.1-gentoo-stop_i915errdecode-ON.log

*checks logs*

Error was last recorded on 2014-09-08 with kernel 3.16.1 for the first time in about a month since I started saving logs at shutdown:

[ 81.622660] [drm] GPU HANG: ecode -1:0x00000000, reason: Command parser error, iir 0x00008000, action: continue
[ 81.622660] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 81.622660] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 81.622660] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 81.622660] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 81.622660] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 81.622660] i915: render error detected, EIR: 0x00000010
[ 81.622660] i915: IPEIR: 0x00000000
[ 81.622660] i915: IPEHR: 0x01000000
[ 81.622660] i915: INSTDONE_0: 0xfffffffe
[ 81.622660] i915: INSTDONE_1: 0xffffffff
[ 81.622660] i915: INSTDONE_2: 0x00000000
[ 81.622660] i915: INSTDONE_3: 0x00000000
[ 81.622660] i915: INSTPS: 0x0001e000
[ 81.622660] i915: ACTHD: 0x0080b8a0
[ 81.622660] i915: page table error
[ 81.622660] i915: PGTBL_ER: 0x00000001
[ 81.622660] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
Created attachment 106162 [details] 20140908-0828_3.16.1-gentoo-stop_i915regdump-ON.log regdump available as well
*** Bug 79222 has been marked as a duplicate of this bug. ***
This could be related to http://patchwork.freedesktop.org/patch/41094/
I am going to take a risk and say this is fixed by:

commit 983d308cb8f602d1920a8c40196eb2ab6cc07bd2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jan 26 10:47:10 2015 +0000

    agp/intel: Serialise after GTT updates