Bug 27146 - [gm45] GPU lockup (xorg crash when closing the laptop lid)
Summary: [gm45] GPU lockup (xorg crash when closing the laptop lid)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Jesse Barnes
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 27267
Depends on:
Blocks:
 
Reported: 2010-03-17 15:41 UTC by Geir Ove Myhr
Modified: 2017-07-24 23:08 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg before closing lid (125.54 KB, text/plain)
2010-03-17 15:42 UTC, Geir Ove Myhr
Xorg.0.log before closing lid (31.93 KB, text/plain)
2010-03-17 15:43 UTC, Geir Ove Myhr
intel_gpu_dump output before closing lid (gzipped) (100.13 KB, application/x-gzip)
2010-03-17 15:44 UTC, Geir Ove Myhr
dmesg (with drm.debug=0x02) after closing lid (240.41 KB, text/plain)
2010-03-17 15:46 UTC, Geir Ove Myhr
i915_error_state after lid close (759.78 KB, text/plain)
2010-03-17 15:47 UTC, Geir Ove Myhr
Xorg.0.log after closing lid (34.75 KB, text/plain)
2010-03-17 15:48 UTC, Geir Ove Myhr
intel_gpu_dump output after closing lid (gzipped) (122.84 KB, application/x-gzip)
2010-03-17 15:49 UTC, Geir Ove Myhr
intel_error_decode output on the previously attached i915_error_state (gzipped, lid closed) (139.50 KB, application/x-gzip)
2010-03-17 15:50 UTC, Geir Ove Myhr
dmesg (with drm.debug=0x02) after opening lid (240.41 KB, text/plain)
2010-03-17 15:52 UTC, Geir Ove Myhr
dmesg (with drm.debug=0x02) after opening lid (240.44 KB, text/plain)
2010-03-17 15:53 UTC, Geir Ove Myhr
i915_error_state after lid open (759.78 KB, text/plain)
2010-03-17 15:54 UTC, Geir Ove Myhr
Xorg.0.log after opening lid (35.13 KB, text/plain)
2010-03-17 15:56 UTC, Geir Ove Myhr

Description Geir Ove Myhr 2010-03-17 15:41:24 UTC
Originally reported by Nicolò Chieffo at:
  https://bugs.launchpad.net/bugs/535640

[Problem]

When the laptop lid is closed, the GPU hangs at MI_WAIT_FOR_EVENT Display Pipe B Scan Line Window Wait Enable (0x01800020). Logs and i915_error_state captured with a drm-intel-next kernel before closing the lid, after closing it, and after reopening it are attached.

[Original report]

Binary package hint: xserver-xorg-video-intel

When I close the laptop lid (I have set it to blank the screen, NOT SUSPEND) and open it again, the screen remains black for a while, then Xorg is killed and gdm restarts.

If I quickly switch to a VT, Xorg does not crash.
If gnome-settings-daemon is not running, the bug does not occur.

ProblemType: Crash
Architecture: amd64
Chipset: gm45
Date: Wed Mar 10 11:49:22 2010
DistroRelease: Ubuntu 10.04
DkmsStatus: Error: [Errno 2] No such file or directory
DumpSignature: de05bf80bf83cd22541cb55f1a2ee99e
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
InterpreterPath: /usr/bin/python2.6
MachineType: Dell Inc. Latitude E6400
Package: xserver-xorg-video-intel 2:2.9.1-1ubuntu12
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-16-generic root=UUID=aefb7f45-2b3b-432f-a5c0-a4900283c338 ro quiet splash
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:
 
ProcVersionSignature: Ubuntu 2.6.32-16.25-generic
SourcePackage: xserver-xorg-video-intel
Title: [gm45] GPU lockup de05bf80bf83cd22541cb55f1a2ee99e
Uname: Linux 2.6.32-16-generic x86_64
UserGroups:
 
dmi.bios.date: 12/21/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A20
dmi.board.name: 0RX493
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA20:bd12/21/2009:svnDellInc.:pnLatitudeE6400:pvr:rvnDellInc.:rn0RX493:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude E6400
dmi.sys.vendor: Dell Inc.
system:
 distro:             Ubuntu
 codename:           lucid
 architecture:       x86_64
 kernel:             2.6.32-16-generic (with drm from 2.6.33)

[lspci]
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07)
    	Subsystem: Dell Device [1028:0233]
Comment 1 Geir Ove Myhr 2010-03-17 15:42:15 UTC
Created attachment 34169 [details]
dmesg before closing lid
Comment 2 Geir Ove Myhr 2010-03-17 15:43:00 UTC
Created attachment 34170 [details]
Xorg.0.log before closing lid
Comment 3 Geir Ove Myhr 2010-03-17 15:44:24 UTC
Created attachment 34171 [details]
intel_gpu_dump output before closing lid (gzipped)
Comment 4 Geir Ove Myhr 2010-03-17 15:46:54 UTC
Created attachment 34172 [details]
dmesg (with drm.debug=0x02) after closing lid

Unfortunately, the beginning is missing, since the log has been filled with [drm:i915_add_request] lines.
Comment 5 Geir Ove Myhr 2010-03-17 15:47:52 UTC
Created attachment 34173 [details]
i915_error_state after lid close
Comment 6 Geir Ove Myhr 2010-03-17 15:48:28 UTC
Created attachment 34174 [details]
Xorg.0.log after closing lid
Comment 7 Geir Ove Myhr 2010-03-17 15:49:51 UTC
Created attachment 34175 [details]
intel_gpu_dump output after closing lid (gzipped)
Comment 8 Geir Ove Myhr 2010-03-17 15:50:30 UTC
Created attachment 34176 [details]
intel_error_decode output on the previously attached i915_error_state (gzipped, lid closed)
Comment 9 Geir Ove Myhr 2010-03-17 15:52:27 UTC
Created attachment 34177 [details]
dmesg (with drm.debug=0x02) after opening lid
Comment 10 Geir Ove Myhr 2010-03-17 15:53:31 UTC
Created attachment 34178 [details]
dmesg (with drm.debug=0x02) after opening lid
Comment 11 Geir Ove Myhr 2010-03-17 15:54:19 UTC
Created attachment 34179 [details]
i915_error_state after lid open
Comment 12 Geir Ove Myhr 2010-03-17 15:56:25 UTC
Created attachment 34180 [details]
Xorg.0.log after opening lid
Comment 13 unggnu 2010-03-24 05:34:12 UTC
*** Bug 27267 has been marked as a duplicate of this bug. ***
Comment 14 Geir Ove Myhr 2010-03-24 07:22:29 UTC
The current (hanging) batchbuffer for this bug, obtained by running intel_error_decode on an i915_error_state captured with a drm-intel-next or 2.6.34-rcX kernel, is:

0x0ad96000:      0x09100000: MI_LOAD_SCAN_LINES_INCL
0x0ad96004:      0x00000384: dword 1
0x0ad96008:      0x09100000: MI_LOAD_SCAN_LINES_INCL
0x0ad9600c:      0x00000384: dword 1
0x0ad96010:      0x01800020: MI_WAIT_FOR_EVENT
0x0ad96014: HEAD 0x54f08806: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 1, dst tile 1)
0x0ad96018:      0x03cc0600: format 8888, dst pitch 1536, clipping disabled
0x0ad9601c:      0x00000000: dst (0,0)
0x0ad96020:      0x038405a0: dst (1440,900)
0x0ad96024:      0x02522000: dst offset 0x02522000
0x0ad96028:      0x00000000: src (0,0)
0x0ad9602c:      0x00000600: src pitch 1536
0x0ad96030:      0x0360e000: src offset 0x0360e000
0x0ad96034:      0x02000000: MI_FLUSH
0x0ad96038:      0x00000000: MI_NOOP
0x0ad9603c:      0x05000000: MI_BATCH_BUFFER_END
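
As a cross-check on the decode above, the header dwords can be unpacked by hand. The following is a minimal sketch in Python. The bit positions (MI opcode in bits 28:23, the pipe B scan line window flag in bit 5, the pipe B select in bit 20, start line in the upper half of the operand dword) are assumptions recalled from the xf86-video-intel register header rather than anything stated in this report, and the function names are made up for illustration.

```python
# Sketch only: every constant below is an assumption about the GM45
# command format, not something taken from this bug report.
MI_OPCODE_SHIFT = 23
MI_OPCODE_MASK = 0x3F
MI_WAIT_FOR_EVENT = 0x03
MI_LOAD_SCAN_LINES_INCL = 0x12
WAIT_PIPEB_SCAN_LINE_WINDOW = 1 << 5   # assumed flag bit in MI_WAIT_FOR_EVENT
LOAD_SCAN_LINES_PIPEB = 1 << 20        # assumed pipe select in MI_LOAD_SCAN_LINES_INCL


def mi_opcode(header):
    """Return the MI opcode field (bits 28:23) of a command header dword."""
    return (header >> MI_OPCODE_SHIFT) & MI_OPCODE_MASK


def describe(header):
    """Give a short human-readable name for the two MI commands seen here."""
    op = mi_opcode(header)
    if op == MI_WAIT_FOR_EVENT:
        if header & WAIT_PIPEB_SCAN_LINE_WINDOW:
            return "MI_WAIT_FOR_EVENT (pipe B scan line window)"
        return "MI_WAIT_FOR_EVENT"
    if op == MI_LOAD_SCAN_LINES_INCL:
        pipe = "B" if header & LOAD_SCAN_LINES_PIPEB else "A"
        return "MI_LOAD_SCAN_LINES_INCL (pipe %s)" % pipe
    return "unknown MI opcode 0x%02x" % op


def scan_line_window(dword1):
    """Split the operand dword into (start_line, end_line)."""
    return (dword1 >> 16) & 0xFFFF, dword1 & 0xFFFF
```

Under those assumptions, `describe(0x01800020)` names the pipe B scan line window wait, and `scan_line_window(0x00000384)` yields lines 0 to 900, which is consistent with the 1440x900 blit destination in the dump.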

Most of the 50+ automatically reported Ubuntu bug reports that are duplicates of this one fail to record this batchbuffer with intel_gpu_dump. In the limited i915_error_state that is captured with the Ubuntu kernel, one can see IPEHR: 0x01800020 (the header of the last instruction) in all those reports.
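
For triaging the automatically filed duplicates, this IPEHR check could be scripted. A minimal sketch, assuming the error state text contains a field of the form `IPEHR: 0x...` as quoted above; the function names are hypothetical.

```python
import re

# Matches the "IPEHR: 0x01800020" field as quoted in this comment.
IPEHR_RE = re.compile(r"IPEHR:\s*(0x[0-9a-fA-F]+)")


def find_ipehr(error_state_text):
    """Return the first IPEHR value in the text as an int, or None."""
    match = IPEHR_RE.search(error_state_text)
    return int(match.group(1), 16) if match else None


def hung_in_scanline_wait(error_state_text, expected=0x01800020):
    """True if the last instruction header equals the value seen in this bug."""
    return find_ipehr(error_state_text) == expected
```

A report whose error state lacks the field simply yields `None` and is not counted as a match.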

This last instruction, MI_WAIT_FOR_EVENT (Display Pipe B Scan Line Window Wait Enable), may or may not be related to these lines in the dmesg output:

[ 3188.322798] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe 0
[ 3188.532922] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe 1
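
To see whether these messages coincide with a hang, they could be pulled out of a dmesg capture together with their timestamps. A minimal sketch, assuming the line format is exactly as in the two lines quoted above; `disabled_pipe_events` is a hypothetical helper name.

```python
import re

# Pattern built from the two dmesg lines quoted in this comment.
VBLANK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:gm45_get_vblank_counter\], "
    r"trying to get vblank count for disabled pipe (\d+)"
)


def disabled_pipe_events(dmesg_text):
    """Return a list of (timestamp_seconds, pipe_number) tuples."""
    return [(float(ts), int(pipe)) for ts, pipe in VBLANK_RE.findall(dmesg_text)]
```

Comparing the returned timestamps with the time of the lid close would show whether the pipes were already disabled when the wait was issued.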
Comment 15 Geir Ove Myhr 2010-04-11 02:02:37 UTC
This bug has been marked as fixed downstream. The patch that fixed it was:

  [ Jesse Barnes ]
  * SAUCE: drm/i915: don't change DRM configuration when releasing load
    detect pipe

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commit;h=0d2907f4bead56cff60f91068b3a3efa7149e702

I haven't seen this being applied upstream in linux-2.6.33.y or linux-2.6 git
trees, so I'm not closing this bug yet.
Comment 16 Chris Wilson 2010-05-28 05:49:44 UTC
Jesse, that patch is still MIA. However, I think there may be a more serious issue at work: we need to ensure that the GPU is idle prior to changing connectors, lest we have a pending WAIT.
Comment 17 Chris Wilson 2010-07-10 07:10:24 UTC
Because I know Jesse is this ->||<- close to getting this patch upstream. :)
Comment 18 Geir Ove Myhr 2010-07-10 07:47:11 UTC
(In reply to comment #17)
> Because I know Jesse is this ->||<- close to getting this patch upstream. :)

Last message on the intel-gfx mailing list [1] was that this patch is not needed when the patch "drm/i915: cleanup lvds detection function" [2] is applied. So I suppose it is patch [2] that we will see upstream soon.

[1]: http://lists.freedesktop.org/archives/intel-gfx/2010-July/007383.html
[2]: http://lists.freedesktop.org/archives/intel-gfx/2010-July/007382.html
Comment 19 Jesse Barnes 2010-07-10 09:56:47 UTC
On Sat, 10 Jul 2010 07:47:11 -0700 (PDT)
bugzilla-daemon@freedesktop.org wrote:

> https://bugs.freedesktop.org/show_bug.cgi?id=27146
> 
> --- Comment #18 from Geir Ove Myhr <gomyhr@gmail.com> 2010-07-10 07:47:11 PDT ---
> (In reply to comment #17)
> > Because I know Jesse is this ->||<- close to getting this patch upstream. :)
> 
> Last message on the intel-gfx mailing list [1] was that this patch is not
> needed when the patch "drm/i915: cleanup lvds detection function" [2] is
> applied. So I suppose it is patch [2] that we will see upstream soon.
> 
> [1]: http://lists.freedesktop.org/archives/intel-gfx/2010-July/007383.html
> [2]: http://lists.freedesktop.org/archives/intel-gfx/2010-July/007382.html

No, the cleanup function is separate.  If you still see issues with
current 2.6.35-rc and you need to apply the previous "disable unused"
patch, then we have some other issue going on as well.
Comment 20 Jesse Barnes 2010-07-12 14:07:18 UTC
Put another way; is the patch in http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commit;h=0d2907f4bead56cff60f91068b3a3efa7149e702 still required in current (i.e. 2.6.35-rc) kernels?
Comment 21 Geir Ove Myhr 2010-07-13 00:51:00 UTC
(In reply to comment #20)
> Put another way; is the patch in
> http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commit;h=0d2907f4bead56cff60f91068b3a3efa7149e702
> still required in current (i.e. 2.6.35-rc) kernels?

The Ubuntu 10.10 development version has been without this patch since late May, and we haven't seen the storm of bug reports for this issue that we did for 10.04. Then again, that started only later in the release cycle, when more people began testing. The patch was apparently removed after one person confirmed that it was no longer needed for him [1].

We have had one recent bug report with Maverick with the same type of batchbuffer hanging in MI_WAIT_FOR_EVENT after two MI_LOAD_SCAN_LINES_INCL [2,3], but the circumstances seem to be different (watching video). Chris Wilson has committed a DDX patch [4] that may solve this, but it has not been tested yet.

I have asked for verification at the downstream bug report and I will report back here and clear the NEEDINFO flag when/if I get some feedback.


[1]: https://lists.ubuntu.com/archives/kernel-team/2010-May/010602.html 
[2]: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/603064
[3]: http://bugs.freedesktop.org/show_bug.cgi?id=28964
[4]: video: apply the crtc box checks from dri. (http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=272d1c14a39c32ade39b5a8b080a891f2b3d6e8e)
Comment 22 Geir Ove Myhr 2010-07-13 01:15:27 UTC
Nicolò Chieffo, the original reporter, has tested the 10.10 development version, which is without this patch (and has a kernel based on 2.6.35-rc4), and the problem is gone. I'm marking this as FIXED.

