During my tests of a freshly compiled Xorg (current git version) on an AOpen i915GMm-hfs, I started "x11perf -all". Back at the test machine several hours later, I found the keyboard dead and the screen switched off. An ssh login showed the following kernel messages:

[ 6202.221950] render error detected, EIR: 0x00000010
[ 6202.221955] page table error
[ 6202.221957] PGTBL_ER: 0x00000010
[ 6202.221961] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 6202.222093] render error detected, EIR: 0x00000010
[ 6202.222096] page table error
[ 6202.222098] PGTBL_ER: 0x00000010

There was no sign of any errors in the Xorg.log. The PC was otherwise idle. I killed Xorg, and the screen and keyboard came back to life ;-)
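As a reading aid for the log above, here is a minimal sketch of how an EIR value maps to the message the driver prints. The bit positions are an assumption taken from memory of the pre-GM45 definitions in i915_reg.h of that era (bit 0 instruction error, bit 1 memory refresh error, bit 4 page table error). The macro names and the helper `eir_describe` are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Assumed pre-GM45 EIR bit layout; verify against your kernel tree. */
#define ERR_INSTRUCTION    (1u << 0)
#define ERR_MEMORY_REFRESH (1u << 1)
#define ERR_PAGE_TABLE     (1u << 4)

/* Hypothetical helper: name the first recognised error bit set in eir. */
const char *eir_describe(uint32_t eir)
{
    if (eir & ERR_PAGE_TABLE)
        return "page table error";
    if (eir & ERR_MEMORY_REFRESH)
        return "memory refresh error";
    if (eir & ERR_INSTRUCTION)
        return "instruction error";
    return eir ? "unknown error" : "no error";
}
```

Under these assumptions, the reported value 0x00000010 has only bit 4 set, which matches the "page table error" line in the log.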
Created attachment 45634 [details] i915_error_state
Created attachment 45635 [details] Xorg log
This is the wtf moment:

PGTBL_ER: 0x00000010
Display A: Invalid GTT PTE
Plane [0]:
CNTR: c1000000
STRIDE: 00000c80
SIZE: 03ff04ff
POS: 00000000
ADDR: 00000000

[Offset 0x0 is the ringbuffer.]

WHAT! Why has that plane been enabled with no surface attached?

This sounds like

commit 37d42bfcbc51fd42de15bf05f68586c156d7b76a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Mar 29 10:40:27 2011 +0100

    drm/i915: Disable all outputs early, before KMS takeover

    If the outputs are active and continuing to access the GATT when we
    teardown the PTEs, then there is a potential for us to hang the GPU.
    The hang tends to be a PGTBL_ER with either an invalid host access or
    an invalid display plane fetch.

    v2: Reorder IRQ initialisation to defer until after GEM is setup.

    Reported-by: Pekka Enberg <penberg@kernel.org>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> (855GM)
    Tested-by: Pekka Enberg <penberg@kernel.org> # note that this doesn't fix the underlying problem of the PGTBL_ER and pipe underruns being reported immediately upon init on his 965GM MacBook

but the timing has me a little perplexed.
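To make the plane dump above easier to read, here is a small decoding sketch. It assumes bit 31 of the plane control register is the enable bit and that SIZE stores (height - 1) in the upper 16 bits and (width - 1) in the lower 16 bits. Both are assumptions about the register layout on this hardware generation. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for one "Plane [n]" entry of i915_error_state. */
struct plane_dump {
    uint32_t cntr, stride, size, pos, addr;
};

/* Assumption: bit 31 of the plane control register enables the plane. */
bool plane_enabled(const struct plane_dump *p)
{
    return (p->cntr >> 31) & 1u;
}

/* Assumption: SIZE = ((height - 1) << 16) | (width - 1). */
uint32_t plane_width(const struct plane_dump *p)
{
    return (p->size & 0xffffu) + 1;
}

uint32_t plane_height(const struct plane_dump *p)
{
    return (p->size >> 16) + 1;
}

/* The suspicious case in this report: an enabled plane fetching from
 * GTT offset 0, which is where the ringbuffer lives here. */
bool plane_scans_offset_zero(const struct plane_dump *p)
{
    return plane_enabled(p) && p->addr == 0;
}
```

Under these assumptions the dump decodes to an enabled 1280x1024 plane with a stride of 0xc80 (3200) bytes that is fetching from offset 0.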
Created attachment 45715 [details] [review] Check that the plane points to the pipe's framebuffer This is a debugging patch to see if we are the cause, or if it is due to nefarious external interference.
Created attachment 45724 [details] [review] Check that the plane points to the pipe's framebuffer And now for one that compiles...
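The attached patch itself is not reproduced in this thread. As a rough illustration of the kind of consistency check its title describes, the sketch below uses a simplified, hypothetical model: none of these types or names exist in the kernel, and the enable bit position is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the driver's data structures. */
struct fb_model   { uint32_t gtt_offset; };
struct pipe_model { const struct fb_model *fb; /* NULL if none attached */ };
struct plane_regs { uint32_t cntr; uint32_t addr; };

#define PLANE_ENABLE (1u << 31) /* assumed enable bit */

/* A disabled plane is always consistent. An enabled plane must scan out
 * of the framebuffer attached to its pipe. */
bool plane_matches_pipe_fb(const struct plane_regs *regs,
                           const struct pipe_model *pipe)
{
    if (!(regs->cntr & PLANE_ENABLE))
        return true;
    if (pipe->fb == NULL)
        return false;
    return regs->addr == pipe->fb->gtt_offset;
}
```

In this model, a debugging build would emit a warning whenever the function returns false, which is how an enabled plane with no surface attached would show up in dmesg.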
Well, the patch fails against 2.6.38.2 and 2.6.38.3 Which kernel tree should I use for the test?
I'm currently writing stable patches against 2.6.39. However, it looks like that patch is throwing out too many false positives (or at least I'm getting plenty of warnings from it, so I need to investigate the ordering a little more closely).
First warning 3.67 seconds after boot. I'll run X11perf overnight now.
Created attachment 45743 [details] dmesg after diagnostic patch applied
Created attachment 45744 [details] dmesg after diagnostic patch applied

Well, lots of warnings both at boot time and after X11 startup. The X11 screen flickers a bit.
Created attachment 45745 [details] Xorg log after diagnostic patch applied
Comment on attachment 45743 [details] dmesg after diagnostic patch applied First positive 3.6 seconds after boot
I guess the flicker is from the extra warnings. But the warnings do look like a genuine issue we have with enabling an incomplete crtc for TV detection. Should be easy to solve, but in the meantime, does disabling TV detection prevent the GPU hang? Something like

diff --git a/drivers/gpu/drm/i915/intel_tv.c b/drivers/gpu/drm/i915/intel_tv.c
index 6b22c1d..447e4a9 100644
--- a/drivers/gpu/drm/i915/intel_tv.c
+++ b/drivers/gpu/drm/i915/intel_tv.c
@@ -1355,6 +1355,8 @@ intel_tv_detect(struct drm_connector *connector, bool force)
        struct intel_tv *intel_tv = intel_attached_tv(connector);
        int type;
 
+       return connector_status_disconnected;
+
        mode = reported_modes[0];
        drm_mode_set_crtcinfo(&mode, CRTC_INTERLACE_HALVE_V);
Created attachment 45829 [details] [review] Attach an fb to load-detect pipe
Created attachment 45830 [details] [review] Enable the plane after setting the base
These 2 patches fixed the warnings found by the diagnostics patch on my 915GM. Can you please test them and see if they prevent the warnings on your machine and the eventual hang?
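The ordering idea behind the second patch title ("Enable the plane after setting the base") can be illustrated with a toy register simulation. This is only a sketch under stated assumptions: the struct, the functions, and the enable bit position are hypothetical and do not reflect the actual patch contents.

```c
#include <stdbool.h>
#include <stdint.h>

#define PLANE_ENABLE (1u << 31) /* assumed enable bit */

/* Toy model of one display plane; not the driver's data structures. */
struct plane_sim {
    uint32_t cntr;
    uint32_t base;
    bool base_valid;       /* has a surface address been programmed? */
    bool fetched_invalid;  /* set if the plane ran without a valid base */
};

void sim_write_base(struct plane_sim *p, uint32_t base)
{
    p->base = base;
    p->base_valid = true;
}

void sim_write_cntr(struct plane_sim *p, uint32_t cntr)
{
    p->cntr = cntr;
    /* Enabling before a base is set models the bad plane fetch. */
    if ((cntr & PLANE_ENABLE) && !p->base_valid)
        p->fetched_invalid = true;
}

/* Program the surface address first, then turn the plane on. */
void enable_plane_safely(struct plane_sim *p, uint32_t fb_offset)
{
    sim_write_base(p, fb_offset);
    sim_write_cntr(p, p->cntr | PLANE_ENABLE);
}
```

With the base written first, the simulated plane never fetches from an unset address. Reversing the two writes trips the `fetched_invalid` flag, which corresponds to the window in which the hardware could fetch through an invalid PTE.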
Well, I tested the 2.6.39-rc4+ kernel

commit 686c4cbb10fc0e75b29b097290b4f7fc3f010b9e
Merge: b07ad99 19234c0
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat Apr 23 22:35:16 2011 -0700

together with
- your two last patches,
- the diagnostic patch and
- intel_tv_detect changed to return connector_status_disconnected as suggested in comment 13.

That kernel does work without any warnings here.

After that I tested the same kernel without the change to intel_tv_detect. Not as good as it should be: X11 freezes during startup, keyboard is dead, but ssh login does work and shows a kernel NULL pointer dereference.
Created attachment 46037 [details] dmesg with NULL pointer dereference
Ah, there is now a much more substantial patch series to refactor load-detect-pipe on drm-intel-staging. Would be excellent if you could test with that.
Created attachment 46103 [details] dmesg after applying 1, 2 & 3 but leaving tv connector enabled

Attached dmesg output after applying...

/usr/src/linux-2.6.39-rc4$ sudo patch -p1 < patches/0001-drm-i915-Check-that-the-plane-points-to-the-pipe-s-f.patch
patching file `drivers/gpu/drm/i915/intel_display.c'
Hunk #1 succeeded at 1560 (offset 286 lines).
/usr/src/linux-2.6.39-rc4$ sudo patch -p1 < patches/0002-drm-i915-Attach-a-fb-to-the-load-detect-pipe.patch
patching file `drivers/gpu/drm/i915/intel_crt.c'
patching file `drivers/gpu/drm/i915/intel_display.c'
Hunk #1 succeeded at 5506 (offset -35 lines).
patching file `drivers/gpu/drm/i915/intel_drv.h'
Hunk #2 succeeded at 290 with fuzz 2 (offset -7 lines).
patching file `drivers/gpu/drm/i915/intel_tv.c'
Hunk #1 succeeded at 1361 (offset -9 lines).
/usr/src/linux-2.6.39-rc4$ sudo patch -p1 < patches/0003-drm-i915-Only-enable-the-plane-after-setting-the-fb-.patch
patching file `drivers/gpu/drm/i915/intel_display.c'
Hunk #1 succeeded at 5177 (offset -35 lines).

I have a CRT (PAL B/G TV) connected to the YPbPr output, but unfortunately I am still not able to use KMS at all: the screen blanks when KMS modifies the TV output that the BIOS had configured correctly. I would like to know how to force-enable the connector and select an 800x600 VGA and 576i scaler mode.
There definitely is progress, thanks for your work Chris!

Yesterday's git of the Xorg server and the drm_intel_staging branch of the kernel do work well for me. No page table error, no lockup at Xorg start, etc.

I suppose you do know which patches of the *staging branch need to be passed to 2.6.39 before release ...

Bug 36151 (i915GM: Pageflip completion has impossible msc) is still present.

Well, whenever I write "no problems" I should write more exactly "no problems together with monitors attached to the VGA and DPMI connector". Scott has the same i915GMm-hfs model but also uses the TV out connectors, which I cannot test because I don't have the needed TV hardware here at the office.

Maybe it would be a good idea to close this bug as soon as the fix is included in the master branch of linux-2.6 and to open a new bug "TV-Out broken on i915GM".
(In reply to comment #21)
> There definitely is progress, thanks for your work Chris!
>
> Yesterdays git of the Xorg server and the drm_intel_staging branch of the
> kernel
> do work well for me. No page table error etc, no lockup at Xorg start etc.
>
> I suppose you do know which patches of the *staging branch need to be passed to
> 2.6.39 before release ...

Sadly, all of them. Though only the series of 9 patches required for this bug. :|

> Bug 36151 (i915GM: Pageflip completion has impossible msc) is still present.

Just waiting upon the burst of comprehension so that I can understand just how that bit of code intends to operate and so why it is failing to do so...

> Well, whenever I write "no problems" I should write more exactly "no problems
> together with monitors attached to the VGA and DPMI connector". Scott has the
> same i915GMm-hfs model but also uses the TV out connectors that I cannot test
> because I don't have the needed TV hardware here at the office.
>
> Maybe it would be a good idea to close this bug as soon as the fix is included
> in the master branch of linux-2.6 and to open a new bug "TV-Out broken on
> i915GM".

Scott, please do track the TV-out bug separately. There are also 3 patches for broken TV-out in staging (which have already been pushed to -fixes and should be upstream shortly).
(In reply to comment #22)
> (In reply to comment #21)
>
> Scott, please do track the TV-out bug separately. There are also 3 patches for
> broken TV-out in staging (which have already been pushed to -fixes and should
> be upstream shortly).

Will do. Would one of you be kind enough to shoot me an email with a quick explanation of which statements in my attached kernel r/b (dmesg) output show where the TV out is going wrong, whether there might be a temporary workaround or something I could do to take it one or two steps further, and whether I should be raising a new bug or just contributing to a different one?
A patch referencing this bug report has been merged in v3.0-rc1:

commit d2dff872ac44540622ef77a2b7d6ce4a1b145931
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 19 08:36:26 2011 +0100

    drm/i915: Attach a fb to the load-detect pipe
Another bad plane fetch... hopefully it's gone in current kernels. Knut or Scott, can you confirm?
I guess with about one year of silence and no complaints from the reporters we can presume this has indeed been fixed. Thanks for reporting this, I'll close this as fixed. Please reopen if that's not the case and you still see issues.