Bug 49041

Summary: [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
Product: DRI Reporter: Paul Bolle <pebolle>
Component: DRM/Intel    Assignee: Daniel Vetter <daniel>
Status: CLOSED FIXED QA Contact:
Severity: normal    
Priority: medium CC: ben, chris, daniel, florian, jbarnes, luidgimichael
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                                            Flags
i915_error_state of latest occurrence of this error    none
dmesg of suspend and resume cycle                      none
Remove too-early-plane enable.                         none
Error code issued by dmesg command                     none

Description Paul Bolle 2012-04-21 06:23:14 UTC
Created attachment 60417
i915_error_state of latest occurrence of this error

0) On a laptop I use, I've run into i915 kernel errors at resume. I see
these only every now and then, not on every resume. The pattern of these
errors (and preceding messages) is basically always like this:

<6>[14673.762529] [drm] Changing LVDS panel from (+hsync, +vsync) to (-hsync, -vsync)
<6>[14673.824599] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
<3>[14673.825071] render error detected, EIR: 0x00000010
<3>[14673.825071] page table error
<3>[14673.825071]   PGTBL_ER: 0x00000100
<3>[14673.825071] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
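The last three lines appear to describe a report-then-clear sequence: the driver decodes EIR (here bit 0x10, which the log labels "page table error"), writes the value back to clear it, and, because the bit is still set on re-read, masks it so it stops firing. The sketch below simulates that sequence in plain C. It is a hypothetical model, not the i915 code: the `fake_regs` struct, `write_eir()`, the write-1-to-clear behaviour and the `cause` field used to make a bit "stuck" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the real MMIO registers.
 * eir:   latched error bits
 * emr:   error mask (1 = masked)
 * cause: bits whose underlying fault persists, so they cannot be cleared */
struct fake_regs {
    uint32_t eir;
    uint32_t emr;
    uint32_t cause;
};

#define ERR_PAGE_TABLE 0x00000010u /* the bit the log reports as "page table error" */

/* Assumed write-1-to-clear semantics: a bit clears unless its cause persists. */
static void write_eir(struct fake_regs *r, uint32_t val)
{
    r->eir &= ~(val & ~r->cause);
}

/* Report, try to clear, and mask any bits that refuse to clear.
 * Returns the bits that ended up masked. */
static uint32_t report_and_clear_eir(struct fake_regs *r)
{
    assert(r != NULL);
    uint32_t eir = r->eir;
    if (!eir)
        return 0;
    if (eir & ERR_PAGE_TABLE)
        printf("page table error\n");
    write_eir(r, eir);
    eir = r->eir;                 /* re-read: anything left is stuck */
    if (eir) {
        printf("EIR stuck: 0x%08x, masking\n", (unsigned)eir);
        r->emr |= eir;            /* keep stuck bits from re-raising */
    }
    return eir;
}
```

With `cause` set to `ERR_PAGE_TABLE` the model reproduces the "stuck, masking" path seen in the log; with `cause` zero the write clears the bit and nothing is masked.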

1) I've only noticed this because:
- I tend to check for KERN_ERR messages in the logs; and
- the errors (i.e., the messages with a <3> prefix) look scary when text
mode briefly flashes by at suspend and at resume.
Otherwise the display on this machine seems to work just fine.

2) The logs on this machine only have information for v3.3 based
kernels. I haven't checked whether earlier kernels also do this. Nor
have I checked whether this is fixed in the current v3.4 release
candidates.

3) Previously reported:
http://lists.freedesktop.org/archives/dri-devel/2012-April/021672.html

Some similar reports, but for messages printed at boot:
http://lists.freedesktop.org/archives/dri-devel/2011-January/007302.html
http://lists.freedesktop.org/archives/dri-devel/2012-January/018158.html

4) I'll try to attach the contents of /debug/dri/0/i915_error_state for the latest occurrence of this error and an excerpt of dmesg (for that suspend and resume cycle) shortly.
Comment 1 Paul Bolle 2012-04-21 06:24:48 UTC
Created attachment 60418
dmesg of suspend and resume cycle
Comment 2 Chris Wilson 2012-04-21 06:33:08 UTC
Created attachment 60419
Remove too-early-plane enable.

Looks like another too-early-plane enable.
Comment 3 Paul Bolle 2012-04-21 06:38:50 UTC
Comment on attachment 60419
Remove too-early-plane enable.

0) I hope to rebuild the kernel I'm currently running (kernel-3.3.2-1.fc16.x86_64 from Fedora 16), or probably just its i915 module, with this patch shortly.

1) I'll then try to report in a few days whether I can still trigger this bug or not.
Comment 4 Paul Bolle 2012-04-23 14:01:08 UTC
(In reply to comment #3)
> 1) I'll then try to report in a few days whether I can still trigger this bug
> or not.

0) The patch applied cleanly (though at an enormous "offset") to kernel-3.3.2.

1) I've been running this patch for about three days now and haven't been able to trigger this error since. But there's this idea that "absence of evidence is not evidence of absence". So I'm currently planning to keep applying this patch on top of whatever Fedora 16 ships as a kernel, just to be sure.

2) I'm not sure about the status of that patch (is it included in any public repo? which public repos is it intended for?). But if you plan on updating it (i.e., its description) for the error-on-resume case and pushing it to whatever repo you're targeting, please feel free to add Reported-by and/or Tested-by tags.
Comment 5 Chris Wilson 2012-04-25 02:28:08 UTC
These bugs all have similar symptoms that could be explained and fixed by the following patch. So please do test drm-intel-next-queued and report back. When an equivalent patch was tried in the past, it caused a modesetting regression in the initial switch away from the BIOS configuration, so do look out for any glitches during boot. Thanks.

commit 969d380a39d33f7533b6dcee35e834109d23f9e9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 24 16:36:50 2012 +0100

    drm/i915: Remove too early plane enable on pre-PCH hardware
    
    Enabling the plane before we have assigned valid address means that it
    will access random PTE (often with conflicting memory types) and cause
    GPU lockups. However, enabling the plane too early appears to workaround
    a number of bugs in our modesetting code.
    
    Cc: Franz Melchior <melchior.franz@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=39947
    References: https://bugs.freedesktop.org/show_bug.cgi?id=41091
    References: https://bugs.freedesktop.org/show_bug.cgi?id=49041
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
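The commit message says that enabling a display plane before it has been given a valid address makes the hardware fetch through random PTEs. The sketch below models only that ordering question. Everything in it is hypothetical and unrelated to the actual i915 structures or registers: `struct plane`, `plane_scanout()`, the two `modeset_enable_*()` helpers, and the convention that a base of 0 means "no address assigned yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plane model: once enabled, the plane fetches from whatever
 * base is programmed. base == 0 stands for "not yet assigned", i.e. a
 * fetch through a stale or random page-table entry. */
struct plane {
    bool enabled;
    uint32_t base;          /* framebuffer offset, 0 = unassigned */
    unsigned bad_fetches;   /* scanouts from an unassigned address */
};

/* One scanout pass by the (simulated) hardware. */
static void plane_scanout(struct plane *p)
{
    if (p->enabled && p->base == 0)
        p->bad_fetches++;
}

/* The ordering the commit removes: enable first, assign the address later. */
static void modeset_enable_early(struct plane *p, uint32_t fb)
{
    p->enabled = true;
    plane_scanout(p);       /* hardware may fetch before the next line runs */
    p->base = fb;
    plane_scanout(p);
}

/* Safe ordering: program a valid address, then enable. */
static void modeset_enable_late(struct plane *p, uint32_t fb)
{
    assert(fb != 0);
    p->base = fb;
    p->enabled = true;
    plane_scanout(p);
}
```

In this model only the early ordering produces a bad fetch; the late ordering never exposes an unassigned address to the scanout.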
Comment 6 Paul Bolle 2012-04-25 03:27:27 UTC
(In reply to comment #5)
> These bugs all have similar symptoms that could be explained and fixed by the
> following patch. So please do test drm-intel-next-queued and report back.

0) I've located that patch at http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queued&id=969d380a39d33f7533b6dcee35e834109d23f9e9.

1) I've run the equivalent of that patch on v3.3.2 for some time now with no obvious (to me) problems and without triggering the error that this patch is supposed to suppress.

Is that enough to CC -stable (for the v3.3 kernel)?
Comment 7 Daniel Vetter 2012-04-25 03:32:16 UTC
On Wed, Apr 25, 2012 at 12:27,  <bugzilla-daemon@freedesktop.org> wrote:
> Is that enough to CC -stable (for the v3.3 kernel)?

Nope, because the patch needs to land in mainline git first. And
because it has blown up in the past, it will go through -next, which
means 3.5. And even then I'd oppose sending it to stable right away
until we have some testing feedback on it.
Comment 8 Chris Wilson 2012-04-25 03:35:19 UTC
(In reply to comment #6)
> 1) I've run the equivalent of that patch on v3.3.2 for some time now with no
> obvious (to me) problems and without triggering the error that this patch is
> supposed to suppress.
> 
> Is that enough to CC -stable (for the v3.3 kernel)?

I'd be careful: the last time I managed to get the patch upstream, it was reverted shortly afterwards because a user reported that the initial modeset was corrupt. So I'd play it safe and wait for broader testing.
Comment 9 Paul Bolle 2012-05-05 04:08:52 UTC
(In reply to comment #7)
> Nope, because the patch needs to be in a mainline git first. 

0) I thought the point of CC'ing stable was that acceptance in mainline and the (start of) backporting to the stable branches could coincide. I.e., once the patch is in mainline, the procedure to get it into the relevant stable branches starts immediately. But, anyhow, since comment #7 and comment #8 both oppose backporting to stable without additional testing on those stable branches, my point is moot for this issue.

1) As for testing: I've been running the v3.3.4-based kernel of Fedora 16 with the equivalent of the patch of comment #5 for some time now without triggering this issue. So the patch still seems to fix this issue for v3.3-based stable kernels (on this machine!).
Comment 10 Florian Mickler 2012-07-01 03:44:54 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit c7bd4c25650704d4d065eb4ce2a122d2a80ce804
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 24 16:36:50 2012 +0100

    drm/i915: Remove too early plane enable on pre-PCH hardware
Comment 11 Luidgi Michael 2016-10-14 09:02:52 UTC
Created attachment 127292 [details]
Error code issued by dmesg command
