Created attachment 60417 [details] i915_error_state of latest occurrence of this error

0) On a laptop I use I've run into i915 kernel errors at resume. I see these only every now and then: not on all resumes. The pattern of these errors (and preceding messages) is basically always like this:

<6>[14673.762529] [drm] Changing LVDS panel from (+hsync, +vsync) to (-hsync, -vsync)
<6>[14673.824599] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
<3>[14673.825071] render error detected, EIR: 0x00000010
<3>[14673.825071] page table error
<3>[14673.825071] PGTBL_ER: 0x00000100
<3>[14673.825071] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking

1) I've only noticed this because:
- I tend to check for KERN_ERR messages in the logs; and
- the errors (ie, the messages with a <3> prefix) look scary when text mode briefly flashes by at suspend and at resume.

Otherwise the display on this machine seems to work just fine.

2) The logs on this machine only have information for v3.3 based kernels. I haven't checked whether earlier kernels also do this. Nor have I checked whether this is fixed in the current v3.4 release candidates.

3) Previously reported:
http://lists.freedesktop.org/archives/dri-devel/2012-April/021672.html

Some similar reports, but for messages printed at boot:
http://lists.freedesktop.org/archives/dri-devel/2011-January/007302.html
http://lists.freedesktop.org/archives/dri-devel/2012-January/018158.html

4) I'll try to attach the contents of /debug/dri/0/i915_error_state for the latest occurrence of this error and an excerpt of dmesg (for that suspend and resume cycle) shortly.
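For anyone who wants to check their own logs for the same pattern, the following is a minimal sketch of filtering raw kernel log lines down to the KERN_ERR (<3>) messages. It assumes the lines carry the "<level>[timestamp]" prefix exactly as quoted in the report; the names parse_line, errors_only and SAMPLE are made up for this illustration and are not part of any kernel tooling.

```python
import re

# Matches a raw kernel log line such as "<3>[14673.825071] page table error".
# Assumption: lines carry the "<level>[timestamp]" prefix as quoted in the report.
LINE_RE = re.compile(r"^<(\d)>\[\s*(\d+\.\d+)\]\s?(.*)$")

KERN_ERR = 3  # log level of the "<3>" lines in the report


def parse_line(line):
    """Return (level, timestamp, message), or None if the prefix is missing."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), m.group(3)


def errors_only(lines, max_level=KERN_ERR):
    """Keep messages at KERN_ERR or more severe (numerically lower or equal)."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p[0] <= max_level]


# The log excerpt from the report, used here as sample input.
SAMPLE = [
    "<6>[14673.762529] [drm] Changing LVDS panel from (+hsync, +vsync) to (-hsync, -vsync)",
    "<6>[14673.824599] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state",
    "<3>[14673.825071] render error detected, EIR: 0x00000010",
    "<3>[14673.825071] page table error",
    "<3>[14673.825071] PGTBL_ER: 0x00000100",
    "<3>[14673.825071] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking",
]

if __name__ == "__main__":
    for level, ts, msg in errors_only(SAMPLE):
        print(f"{ts:.6f} <{level}> {msg}")
```

Run against the excerpt above, this keeps the four <3> lines and drops the two <6> informational lines. Lines without the prefix are skipped rather than guessed at.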
Created attachment 60418 [details] dmesg of suspend and resume cycle
Created attachment 60419 [details] [review] Remove too-early-plane enable. Looks like another too-early-plane enable.
Comment on attachment 60419 [details] [review] Remove too-early-plane enable. 0) I hope to rebuild the kernel (or probably just the i915 module) on which I'm running currently (kernel-3.3.2-1.fc16.x86_64 from Fedora 16) with this patch shortly. 1) I'll then try to report in a few days whether I can still trigger this bug or not.
(In reply to comment #3)
> 1) I'll then try to report in a few days whether I can still trigger this bug
> or not.

0) Patch applied cleanly (though at an enormous "offset") to kernel-3.3.2.

1) I've been running this patch for about three days now and haven't been able to trigger this error since. But there's this idea that "the absence of evidence is not the evidence of absence". So I'm currently planning to keep applying this patch on top of whatever Fedora 16 will be shipping as a kernel, just to be sure.

2) I'm not sure about the status of that patch (is it included in any public repo? what public repos is it intended for?). But if you plan on updating it (ie, its description) for the error on resume case and pushing it to whatever repo you're targeting, please feel free to add Reported-by and/or Tested-by tags.
These bugs all have similar symptoms that could be explained and fixed by the following patch. So please do test drm-intel-next-queued and report back. On trying the equivalent patch in the past, it has caused modesetting regression for the initial switch from the BIOS configuration, so do look out for any glitches during boot. Thanks.

commit 969d380a39d33f7533b6dcee35e834109d23f9e9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 24 16:36:50 2012 +0100

    drm/i915: Remove too early plane enable on pre-PCH hardware

    Enabling the plane before we have assigned valid address means that it
    will access random PTE (often with conflicting memory types) and cause
    GPU lockups.

    However, enabling the plane too early appears to workaround a number of
    bugs in our modesetting code.

    Cc: Franz Melchior <melchior.franz@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=39947
    References: https://bugs.freedesktop.org/show_bug.cgi?id=41091
    References: https://bugs.freedesktop.org/show_bug.cgi?id=49041
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(In reply to comment #5) > These bugs all have similar symptoms that could be explained and fixed by the > following patch. So please do test drm-intel-next-queued and report back. 0) I've located that patch at http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queued&id=969d380a39d33f7533b6dcee35e834109d23f9e9. 1) I've run the equivalent of that patch on v3.3.2 for some time now with no obvious (to me) problems and without triggering the error that this patch is supposed to suppress. Is that enough to CC -stable (for the v3.3 kernel)?
On Wed, Apr 25, 2012 at 12:27, <bugzilla-daemon@freedesktop.org> wrote:
> Is that enough to CC -stable (for the v3.3 kernel)?

Nope, because the patch needs to be in a mainline git first. And because it has blown up in the past, it will go through -next, which means 3.5. And even then I'd oppose sending it to stable right away until we have some testing feedback on it.
(In reply to comment #6) > 1) I've run the equivalent of that patch on v3.3.2 for some time now with no > obvious (to me) problems and without triggering the error that this patch is > supposed to suppress. > > Is that enough to CC -stable (for the v3.3 kernel)? I'd be careful, the last time I managed to get the patch upstream it was reverted shortly afterwards because a user reported the initial modeset was corrupt. So I'd play safe and wait for broader testing.
(In reply to comment #7)
> Nope, because the patch needs to be in a mainline git first.

0) I thought the point of CC'ing stable was that the acceptance in mainline and the (start of) backporting to the stable branches could coincide. Ie, once the patch is in mainline the procedure to get the patch in the relevant stable branches starts immediately.

But, anyhow, since comment #7 and comment #8 both oppose backporting to stable without additional testing on those stable branches my point is moot for this issue.

1) As for testing: I'm running the v3.3.4 based kernel of Fedora 16 with the equivalent of the patch of comment #5 for some time now without triggering this issue. So the patch still seems to fix this issue for v3.3 based stable kernels (on this machine!).
A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit c7bd4c25650704d4d065eb4ce2a122d2a80ce804
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 24 16:36:50 2012 +0100

    drm/i915: Remove too early plane enable on pre-PCH hardware
Created attachment 127292 [details] Error code issued by dmesg command