Bug 36246 - i915GM: page table error during x11perf (enable plane too early during hotplug load-detect?)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Daniel Vetter
Reported: 2011-04-14 14:13 UTC by Knut Petersen
Modified: 2017-07-24 23:05 UTC


Attachments
i915_error_state (697.02 KB, text/plain)
2011-04-14 14:14 UTC, Knut Petersen
Xorg log (68.49 KB, text/plain)
2011-04-14 14:16 UTC, Knut Petersen
Check that the plane points to the pipe's framebuffer (2.47 KB, patch)
2011-04-16 11:22 UTC, Chris Wilson
Check that the plane points to the pipe's framebuffer (2.52 KB, patch)
2011-04-17 00:02 UTC, Chris Wilson
dmesg after diagnostic patch applied (52.48 KB, text/plain)
2011-04-17 13:40 UTC, Knut Petersen
dmesg after diagnostic patch applied (122.59 KB, text/plain)
2011-04-17 13:51 UTC, Knut Petersen
Xorg log after diagnostic patch applied (41.96 KB, text/plain)
2011-04-17 13:52 UTC, Knut Petersen
Attach an fb to load-detect pipe (12.21 KB, patch)
2011-04-19 13:25 UTC, Chris Wilson
Enable the plane after setting the base (1.18 KB, patch)
2011-04-19 13:26 UTC, Chris Wilson
dmesg with NULL pointer dereference (2.98 KB, text/plain)
2011-04-24 21:55 UTC, Knut Petersen
dmesg after applying 1, 2 & 3 but leaving tv connector enabled (76.83 KB, text/plain)
2011-04-26 20:43 UTC, Scott MacKenzie

Description Knut Petersen 2011-04-14 14:13:32 UTC
During my tests of a freshly compiled Xorg (current git version)
on an AOpen i915GMm-hfs I started "x11perf -all". 

Back at the test machine several hours later I found the keyboard
dead and the screen switched off. An ssh login showed the following kernel messages:

[ 6202.221950] render error detected, EIR: 0x00000010
[ 6202.221955] page table error
[ 6202.221957]   PGTBL_ER: 0x00000010
[ 6202.221961] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 6202.222093] render error detected, EIR: 0x00000010
[ 6202.222096] page table error
[ 6202.222098]   PGTBL_ER: 0x00000010

There was no sign of any errors in the Xorg.log. The PC was otherwise idle.
I killed Xorg. Screen and keyboard came back to life ;-)
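
For readers following the log above, the EIR value can be checked with a few lines of C. This is only a minimal sketch: it assumes that bit 4 of EIR is the page-table-error flag, which is inferred from the log itself (EIR 0x00000010 is reported together with "page table error"). The macro and function names are hypothetical and are not taken from the i915 driver.

```c
#include <stdint.h>

/* Assumption drawn from the log above: EIR 0x00000010 is printed
 * together with "page table error", so bit 4 is treated as that flag.
 * The macro name is illustrative only. */
#define EIR_PAGE_TABLE_ERROR (1u << 4)

/* Returns 1 if the given EIR value flags a page table error, else 0. */
int eir_has_page_table_error(uint32_t eir)
{
    return (eir & EIR_PAGE_TABLE_ERROR) != 0;
}

/* Returns a short description for the one bit this sketch knows about. */
const char *eir_describe(uint32_t eir)
{
    return eir_has_page_table_error(eir)
        ? "page table error"
        : "no page table error flagged";
}
```

Calling eir_has_page_table_error(0x00000010) with the value from the log reports the flag as set; any other EIR bits are simply ignored by this sketch.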
Comment 1 Knut Petersen 2011-04-14 14:14:55 UTC
Created attachment 45634
i915_error_state
Comment 2 Knut Petersen 2011-04-14 14:16:02 UTC
Created attachment 45635
Xorg log
Comment 3 Chris Wilson 2011-04-16 11:21:28 UTC
This is the wtf moment:

PGTBL_ER: 0x00000010
    Display A: Invalid GTT PTE
Plane [0]:
  CNTR: c1000000
  STRIDE: 00000c80
  SIZE: 03ff04ff
  POS: 00000000
  ADDR: 00000000

[Offset 0x0 is the ringbuffer.]

WHAT! Why has that plane been enabled with no surface attached? This sounds like

commit 37d42bfcbc51fd42de15bf05f68586c156d7b76a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 29 10:40:27 2011 +0100

    drm/i915: Disable all outputs early, before KMS takeover
    
    If the outputs are active and continuing to access the GATT when we
    teardown the PTEs, then there is a potential for us to hang the GPU.
    The hang tends to be a PGTBL_ER with either an invalid host access or
    an invalid display plane fetch.
    
    v2: Reorder IRQ initialisation to defer until after GEM is setup.
    
    Reported-by: Pekka Enberg <penberg@kernel.org>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> (855GM)
    Tested-by: Pekka Enberg <penberg@kernel.org>
               # note that this doesn't fix the underlying problem of the
                 PGTBL_ER and pipe underruns being reported immediately upon
                 init on his 965GM MacBook

but the timing has me a little perplexed.
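
The plane dump in this comment can be decoded along the following lines. This is a hedged sketch rather than driver code: it assumes that the top bit of CNTR is the plane-enable bit and that SIZE packs (height - 1) in the upper half and (width - 1) in the lower half, 12 bits each. The struct and function names are hypothetical. The check at the end encodes the observation made above: the plane is enabled while ADDR is 0, and offset 0 is the ringbuffer.

```c
#include <stdint.h>

/* Register values as printed in the error state for Plane [0]. */
struct plane_state {
    uint32_t cntr;   /* plane control register */
    uint32_t stride; /* bytes per scanline */
    uint32_t size;   /* assumed layout: (height - 1) << 16 | (width - 1) */
    uint32_t addr;   /* GTT offset the plane scans out from */
};

/* Assumption: bit 31 of CNTR enables the plane. */
#define PLANE_ENABLE (1u << 31)

int plane_enabled(const struct plane_state *p)
{
    return (p->cntr & PLANE_ENABLE) != 0;
}

unsigned plane_width(const struct plane_state *p)
{
    return (p->size & 0xfffu) + 1;
}

unsigned plane_height(const struct plane_state *p)
{
    return ((p->size >> 16) & 0xfffu) + 1;
}

/* Flags the situation described in the comment: an enabled plane
 * scanning out from offset 0, where no framebuffer is expected. */
int plane_looks_unattached(const struct plane_state *p)
{
    return plane_enabled(p) && p->addr == 0;
}
```

With the values from the dump (CNTR c1000000, STRIDE 00000c80, SIZE 03ff04ff, ADDR 00000000), this reports an enabled 1280x1024 plane with a 3200-byte stride and flags it as looking unattached.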
Comment 4 Chris Wilson 2011-04-16 11:22:24 UTC
Created attachment 45715
Check that the plane points to the pipe's framebuffer

This is a debugging patch to see if we are the cause, or if it is due to nefarious external interference.
Comment 5 Chris Wilson 2011-04-17 00:02:20 UTC
Created attachment 45724
Check that the plane points to the pipe's framebuffer

And now for one that compiles...
Comment 6 Knut Petersen 2011-04-17 01:24:25 UTC
Well, the patch fails to apply against 2.6.38.2 and 2.6.38.3.
Which kernel tree should I use for the test?
Comment 7 Chris Wilson 2011-04-17 01:37:35 UTC
I'm currently writing stable patches against 2.6.39. However, it looks like that patch is throwing out too many false positives (or at least I'm getting plenty of warnings from it, so I need to investigate the ordering a little more closely).
Comment 8 Knut Petersen 2011-04-17 13:38:48 UTC
First warning 3.67 seconds after boot.
I'll run X11perf overnight now.
Comment 9 Knut Petersen 2011-04-17 13:40:28 UTC
Created attachment 45743
dmesg after diagnostic patch applied
Comment 10 Knut Petersen 2011-04-17 13:51:42 UTC
Created attachment 45744
dmesg after diagnostic patch applied

Well, lots of warnings both at boot time and after X11 startup.
X11 screen flickers a bit.
Comment 11 Knut Petersen 2011-04-17 13:52:32 UTC
Created attachment 45745
Xorg log after diagnostic patch applied
Comment 12 Knut Petersen 2011-04-17 13:55:18 UTC
Comment on attachment 45743
dmesg after diagnostic patch applied

First positive 3.6 seconds after boot
Comment 13 Chris Wilson 2011-04-17 14:04:22 UTC
I guess the flicker is from the extra warnings. But the warnings do look like a genuine issue we have with enabling an incomplete crtc for TV detection. Should be easy to solve, but in the meantime, does disabling TV detection prevent the GPU hang?

Something like

diff --git a/drivers/gpu/drm/i915/intel_tv.c b/drivers/gpu/drm/i915/intel_tv.c
index 6b22c1d..447e4a9 100644
--- a/drivers/gpu/drm/i915/intel_tv.c
+++ b/drivers/gpu/drm/i915/intel_tv.c
@@ -1355,6 +1355,8 @@ intel_tv_detect(struct drm_connector *connector, bool force)
        struct intel_tv *intel_tv = intel_attached_tv(connector);
        int type;
 
+       return connector_status_disconnected;
+
        mode = reported_modes[0];
        drm_mode_set_crtcinfo(&mode, CRTC_INTERLACE_HALVE_V);
Comment 14 Chris Wilson 2011-04-19 13:25:41 UTC
Created attachment 45829
Attach an fb to load-detect pipe
Comment 15 Chris Wilson 2011-04-19 13:26:34 UTC
Created attachment 45830
Enable the plane after setting the base
Comment 16 Chris Wilson 2011-04-19 13:27:36 UTC
These 2 patches fixed the warnings found by the diagnostics patch on my 915GM. Can you please test them and see if they prevent the warnings on your machine and the eventual hang?
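
The ordering that the second patch title describes ("Enable the plane after setting the base") can be illustrated with a toy model. This is not the kernel patch and does not touch hardware; the struct and function names are hypothetical, and the model only shows the invariant that scanout should not be switched on before a base address has been programmed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a display plane; not a kernel structure. */
struct toy_plane {
    uint32_t base;       /* offset the plane would scan out from */
    bool     base_valid; /* set once a framebuffer base is programmed */
    bool     enabled;    /* whether scanout is switched on */
};

/* Program the scanout base first. */
void toy_plane_set_base(struct toy_plane *p, uint32_t gtt_offset)
{
    p->base = gtt_offset;
    p->base_valid = true;
}

/* Refuse to enable scanout until a base has been set.
 * Returns 0 on success, -1 if the plane has no base yet. */
int toy_plane_enable(struct toy_plane *p)
{
    if (!p->base_valid)
        return -1;
    p->enabled = true;
    return 0;
}
```

In this model, calling toy_plane_enable() on a freshly zeroed plane fails, while calling toy_plane_set_base() first lets the enable succeed, so the plane never scans out from an unprogrammed address.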
Comment 17 Knut Petersen 2011-04-24 21:52:39 UTC
Well, I tested the 2.6.39-rc4+ kernel

  commit 686c4cbb10fc0e75b29b097290b4f7fc3f010b9e
  Merge: b07ad99 19234c0
  Author: Linus Torvalds <torvalds@linux-foundation.org>
  Date:   Sat Apr 23 22:35:16 2011 -0700

together with
   - your two last patches, 
   - the diagnostic patch and
   - intel_tv_detect changed to return connector_status_disconnected
     as suggested in comment 13.

That kernel does work without any warnings here.

After that I tested the same kernel without the change to intel_tv_detect.
Not as good as it should be: X11 freezes during startup, keyboard is dead,
but ssh login does work and shows a kernel NULL pointer dereference.
Comment 18 Knut Petersen 2011-04-24 21:55:00 UTC
Created attachment 46037
dmesg with NULL pointer dereference
Comment 19 Chris Wilson 2011-04-25 01:17:38 UTC
Ah, there is now a much more substantial patch series to refactor load-detect-pipe on drm-intel-staging. Would be excellent if you could test with that.
Comment 20 Scott MacKenzie 2011-04-26 20:43:24 UTC
Created attachment 46103
dmesg after applying 1, 2 & 3 but leaving tv connector enabled

Attached dmesg output after applying...

/usr/src/linux-2.6.39-rc4$ sudo patch -p1 < patches/0001-drm-i915-Check-that-the-plane-points-to-the-pipe-s-f.patch 
patching file `drivers/gpu/drm/i915/intel_display.c'
Hunk #1 succeeded at 1560 (offset 286 lines).
/usr/src/linux-2.6.39-rc4$ sudo patch -p1 < patches/0002-drm-i915-Attach-a-fb-to-the-load-detect-pipe.patch 
patching file `drivers/gpu/drm/i915/intel_crt.c'
patching file `drivers/gpu/drm/i915/intel_display.c'
Hunk #1 succeeded at 5506 (offset -35 lines).
patching file `drivers/gpu/drm/i915/intel_drv.h'
Hunk #2 succeeded at 290 with fuzz 2 (offset -7 lines).
patching file `drivers/gpu/drm/i915/intel_tv.c'
Hunk #1 succeeded at 1361 (offset -9 lines).
/usr/src/linux-2.6.39-rc4$ sudo patch -p1 < patches/0003-drm-i915-Only-enable-the-plane-after-setting-the-fb-.patch 
patching file `drivers/gpu/drm/i915/intel_display.c'
Hunk #1 succeeded at 5177 (offset -35 lines).

I have a CRT (PAL B/G TV) connected to the YPbPr output, but unfortunately I am still not able to use KMS at all: the screen blanks as soon as KMS modifies the TV output that the BIOS had configured correctly. I would like to know how to force-enable the connector and select an 800x600 VGA and 576i scaler mode.
Comment 21 Knut Petersen 2011-04-26 22:53:49 UTC
There definitely is progress, thanks for your work Chris!

Yesterday's git of the Xorg server and the drm_intel_staging branch of the kernel
work well for me. No page table error, no lockup at Xorg start.

I suppose you do know which patches of the *staging branch need to be passed to 2.6.39 before release ...

Bug 36151 (i915GM: Pageflip completion has impossible msc) is still present.

Well, whenever I write "no problems" I should write more exactly "no problems together with monitors attached to the VGA and DPMI connector". Scott has the
same i915GMm-hfs model but also uses the TV out connectors that I cannot test because I don't have the needed TV hardware here at the office.

Maybe it would be a good idea to close this bug as soon as the fix is included in the master branch of linux-2.6 and to open a new bug "TV-Out broken on i915GM".
Comment 22 Chris Wilson 2011-04-26 23:29:00 UTC
(In reply to comment #21)
> There definitely is progress, thanks for your work Chris!
> 
> Yesterday's git of the Xorg server and the drm_intel_staging branch of the
> kernel
> do work well for me. No page table error etc, no lockup at Xorg start etc.
> 
> I suppose you do know which patches of the *staging branch need to be passed to
> 2.6.39 before release ...

Sadly, all of them. Though only the series of 9 patches is required for this bug. :|
 
> Bug 36151 (i915GM: Pageflip completion has impossible msc) is still present.

Just waiting upon the burst of comprehension so that I can understand just how that bit of code intends to operate and so why it is failing to do so...

> Well, whenever I write "no problems" I should write more exactly "no problems
> together with monitors attached to the VGA and DPMI connector". Scott has the
> same i915GMm-hfs model but also uses the TV out connectors that I cannot test
> because I don't have the needed TV hardware here at the office.
> 
> Maybe it would be a good idea to close this bug as soon as the fix is included
> in the master branch of linux-2.6 and to open a new bug "TV-Out broken on
> i915GM".

Scott, please do track the TV-out bug separately. There are also 3 patches for broken TV-out in staging (which have already been pushed to -fixes and should be upstream shortly).
Comment 23 Scott MacKenzie 2011-04-27 14:17:16 UTC
(In reply to comment #22)
> (In reply to comment #21)
> 
> Scott, please do track the TV-out bug separately. There are also 3 patches for
> broken TV-out in staging (which have already been pushed to -fixes and should
> be upstream shortly).

Will do. Would one of you be kind enough to send me an email with a quick explanation of which statements in my attached kernel ring buffer (dmesg) output show where the TV out is going wrong, whether there might be a temporary workaround or something I could do to take it one or two steps further, and whether I should raise a new bug or contribute to an existing one?
Comment 24 Florian Mickler 2011-06-30 02:56:54 UTC
A patch referencing this bug report has been merged in v3.0-rc1:

commit d2dff872ac44540622ef77a2b7d6ce4a1b145931
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 19 08:36:26 2011 +0100

    drm/i915: Attach a fb to the load-detect pipe
Comment 25 Jesse Barnes 2012-04-16 14:34:01 UTC
Another bad plane fetch... hopefully it's gone in current kernels.  Knut or Scott, can you confirm?
Comment 26 Daniel Vetter 2012-04-16 14:56:37 UTC
I guess with about one year of silence and no complaints from the reporters we can presume this did indeed get fixed. Thanks for reporting this, I'll close this as fixed. Please reopen if that's not the case and you still see issues.

