The X server sometimes crashes on monitor unplug, with this error:

[ 54874.313] (II) config/udev: removing device Targus Soft-Touch Bluetooth Mouse
[ 54874.314] (II) Targus Soft-Touch Bluetooth Mouse: Close
[ 54874.314] (II) UnloadModule: "evdev"
[ 54874.315] (II) Unloading evdev
[ 85626.868] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[ 85733.985] (II) intel(0): EDID vendor "LEN", prod id 16401
[ 85733.985] (II) intel(0): Printing DDC gathered Modelines:
[ 85733.985] (II) intel(0): Modeline "1280x800"x0.0 68.94 1280 1296 1344 1408 800 801 804 816 -hsync -vsync (49.0 kHz)
[ 85733.985] (II) intel(0): Modeline "1280x800"x0.0 60.96 1280 1328 1360 1478 800 803 809 825 -hsync -vsync (41.2 kHz)
[ 85734.185] (II) intel(0): EDID vendor "LEN", prod id 16401
[ 85734.185] (II) intel(0): Printing DDC gathered Modelines:
[ 85734.185] (II) intel(0): Modeline "1280x800"x0.0 68.94 1280 1296 1344 1408 800 801 804 816 -hsync -vsync (49.0 kHz)
[ 85734.185] (II) intel(0): Modeline "1280x800"x0.0 60.96 1280 1328 1360 1478 800 803 809 825 -hsync -vsync (41.2 kHz)
[ 85734.562] (II) intel(0): Allocated new frame buffer 1920x1080 stride 7680, tiled
[ 85735.126] (EE) intel(0): [DRI2] DRI2SwapBuffers: drawable has no back or front?
[ 85735.419] Backtrace:
[ 85735.422] 0: /usr/bin/Xorg (xorg_backtrace+0x2f) [0x4a120f]
[ 85735.423] 1: /usr/bin/Xorg (0x400000+0x61da6) [0x461da6]
[ 85735.423] 2: /lib64/libc.so.6 (0x3000400000+0x33140) [0x3000433140]
[ 85735.423] 3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f2a08675000+0x23b22) [0x7f2a08698b22]
[ 85735.423] 4: /usr/lib64/xorg/modules/extensions/libdri2.so (0x7f2a088c5000+0x2370) [0x7f2a088c7370]
[ 85735.423] 5: /usr/lib64/xorg/modules/extensions/libdri2.so (DRI2GetBuffersWithFormat+0x14) [0x7f2a088c74a4]
[ 85735.423] 6: /usr/lib64/xorg/modules/extensions/libdri2.so (0x7f2a088c5000+0x3d1c) [0x7f2a088c8d1c]
[ 85735.423] 7: /usr/bin/Xorg (0x400000+0x2e6a1) [0x42e6a1]
[ 85735.423] 8: /usr/bin/Xorg (0x400000+0x2292a) [0x42292a]
[ 85735.423] 9: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x300041ee5d]
[ 85735.423] 10: /usr/bin/Xorg (0x400000+0x22c11) [0x422c11]
[ 85735.423] Segmentation fault at address (nil)
[ 85735.423] Fatal server error:
[ 85735.423] Caught signal 11 (Segmentation fault). Server aborting

The 0x23b22 offset in intel_drv.so corresponds to:

/usr/src/debug/xf86-video-intel-2.14.0/src/intel_dri.c:388

The line is in I830DRI2DestroyBuffer():

387 I830DRI2BufferPrivatePtr private = buffer->driverPrivate;
388 if (--private->refcnt == 0) {
389 ScreenPtr screen = private->pixmap->drawable.pScreen;

So, buffer->driverPrivate was probably NULL?
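To illustrate the suspected failure, here is a minimal sketch of a refcounted destroy routine that checks for a missing private pointer before decrementing. The types `struct dri2_buffer` and `struct buffer_private`, the function `destroy_buffer()`, and its return codes are hypothetical stand-ins invented for this example; they are not the driver's real definitions, and this is not necessarily how the driver should handle the case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's types (assumption, not real code). */
struct buffer_private {
    int refcnt;
};

struct dri2_buffer {
    void *driverPrivate; /* NULL if this driver never attached private data */
};

/*
 * Drop one reference on the buffer's private data.
 * Returns -1 if there was nothing to dereference (NULL buffer or NULL
 * private), 0 if references remain, 1 if the private data was released.
 */
int destroy_buffer(struct dri2_buffer *buffer)
{
    struct buffer_private *private;

    if (buffer == NULL || buffer->driverPrivate == NULL)
        return -1; /* without this check, --private->refcnt faults */

    private = buffer->driverPrivate;
    assert(private->refcnt > 0);

    if (--private->refcnt == 0) {
        free(private);
        buffer->driverPrivate = NULL;
        return 1;
    }
    return 0;
}
```

With a check like this, a buffer that arrives without private data is skipped instead of causing a NULL dereference at the `--private->refcnt` step. Whether skipping is the right response, or merely hides the reason the buffer is in that state, is a separate question.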
This looks to be papering over an underlying bug. Can you keep your eyes open for more "DRI2SwapBuffers: drawable has no back or front?"

commit e889d3a709b55a0731ab098b17a3364b9bf39387
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sun Feb 27 10:51:50 2011 +0000

    dri: Protect against destroying a foreign DRI drawable

    I have no clue as to how such an alien drawable reached us, but we have
    the evidence of a segfault to say it can happen.

    Reported-by: Bernie Innocenti <bernie@codewiz.org>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34787
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(In reply to comment #1)
> This looks to be papering over an underlying bug. Can you keep your eyes open
> for more "DRI2SwapBuffers: drawable has no back or front?"

Thanks for the fast response. I've rebuilt the intel driver package from a git snapshot and I'm currently testing it. I'll check the Xorg logs from time to time for more instances of this error.
(In reply to comment #2)
> I've rebuilt the intel driver package from a git snapshot and I'm currently
> testing it. I'll check the Xorg logs from time to time for more instances of
> this error.

Unfortunately, I experienced a new crash on hot-unplug after applying the proposed patch. Here's the backtrace:

[160859.189] 0: /usr/bin/Xorg (xorg_backtrace+0x2f) [0x4a120f]
[160859.189] 1: /usr/bin/Xorg (0x400000+0x61da6) [0x461da6]
[160859.189] 2: /lib64/libc.so.6 (0x3000400000+0x33140) [0x3000433140]
[160859.189] 3: /lib64/libc.so.6 (cfree+0x3c) [0x300047a52c]
[160859.189] 4: /usr/lib64/xorg/modules/extensions/libdri2.so (0x7f3462f9c000+0x2370) [0x7f3462f9e370]
[160859.189] 5: /usr/lib64/xorg/modules/extensions/libdri2.so (DRI2GetBuffersWithFormat+0x14) [0x7f3462f9e4a4]
[160859.189] 6: /usr/lib64/xorg/modules/extensions/libdri2.so (0x7f3462f9c000+0x3d1c) [0x7f3462f9fd1c]
[160859.189] 7: /usr/bin/Xorg (0x400000+0x2e6a1) [0x42e6a1]
[160859.189] 8: /usr/bin/Xorg (0x400000+0x2292a) [0x42292a]
[160859.189] 9: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x300041ee5d]
[160859.189] 10: /usr/bin/Xorg (0x400000+0x22c11) [0x422c11]
[160859.189] Segmentation fault at address (nil)

The last call in libdri2.so is in do_get_buffers(), line 501:

499 for (i = 0; i < count; i++) {
500 if (buffers[i] != NULL)
501 (*ds->DestroyBuffer)(pDraw, buffers[i]); <---
502 }

The call to cfree() in glibc seems bogus. Maybe we crashed somewhere in I830DRI2DestroyBuffer()? It seems likely that driverPrivate is NULL here as well.
This morning, I experienced a different failure mode, maybe related to this bug:

1. I unplugged my DisplayPort monitor
2. The LVDS output lit up with a uniform gray background (which corresponds to my background in GNOME)
3. The cursor continued to move, but nothing else responded
4. I could switch to the console by hitting CTRL-ALT-F1
5. On the console, I could see GPU hung messages (I don't remember the exact text), approx. one per second
6. There were definitely other messages intermixed on the console, but I can't remember what they said
7. After switching virtual consoles a few times, I finally managed to completely hang the machine

I was running 2.6.35.11-83.fc14.x86_64. Newer kernels hang during PM resume on my Lenovo X201, so I cannot test them. If you point me to a drm patch that applies to 2.6.35, I could build a custom kernel.
Today it happened again, but the console did not hang, so I could collect more data (attached). Compiz was definitely involved in some way: when I killed it, the X server suddenly came back to life!
Created attachment 44280 [details] dmesg while the X server was hung
Created attachment 44281 [details] Xorg.log from the hung X server
Created attachment 44282 [details] gdb backtrace of Compiz (hung in glXSwapBuffers())
Created attachment 44283 [details] strace of the Xorg process while it's hung (fd 8 is /dev/dri/card0)
I filed a bug in Fedora against Compiz which is loosely related to this one: https://bugzilla.redhat.com/show_bug.cgi?id=664094
Today, when unplugging a VGA monitor, I got this kernel oops:

general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1a.0/usb1/idVendor
CPU 0
Modules linked in: pl2303 hidp fuse usb_storage rfcomm sco bnep l2cap vboxnetadp vboxnetflt vboxdrv coretemp ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf kvm_intel kvm uinput qcserial usb_wwan arc4 i2400m_usb i2400m ecb wimax snd_hda_codec_intelhdmi btusb bluetooth usbserial snd_hda_codec_conexant iwlagn snd_hda_intel snd_hda_codec snd_hwdep snd_seq iwlcore snd_seq_device snd_pcm mac80211 thinkpad_acpi cfg80211 snd_timer e1000e snd snd_page_alloc joydev iTCO_wdt i2c_i801 wmi iTCO_vendor_support rfkill microcode soundcore i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]

Pid: 2080, comm: kslowd002 Not tainted 2.6.35.11-83.fc14.x86_64 #1 3249CTO/3249CTO
RIP: 0010:[<ffffffff812266a0>] [<ffffffff812266a0>] list_del+0x10/0x8b
RSP: 0018:ffff88012d9cfc90 EFLAGS: 00010286
RAX: dead000000200200 RBX: ffff88008d8b09c8 RCX: ffff880130648000
RDX: 0000000000000000 RSI: ffff88012d9cfff8 RDI: ffff88008d8b09c8
RBP: ffff88012d9cfca0 R08: ffff88012d9cfb5c R09: ffffffff8100aae0
R10: ffff8801ad9cf98f R11: 0000000000000000 R12: ffff88012fd27800
R13: ffff88012fd25000 R14: ffffffffa009ea10 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880002000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f2168285018 CR3: 0000000132253000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kslowd002 (pid: 2080, threadinfo ffff88012d9ce000, task ffff880130648000)
Stack:
 ffff88012fd27800 ffff88008d8b09c0 ffff88012d9cfcf0 ffffffffa0035425
<0> 0000000000000000 0000000000000001 ffff88012fd27800 ffff88012fd27800
<0> 0000000000000000 ffff88012fd27a68 ffffffffa009ea10 0000000000000438
Call Trace:
 [<ffffffffa0035425>] drm_mode_connector_update_edid_property+0x3f/0xf3 [drm]
 [<ffffffffa0067c40>] drm_helper_probe_single_connector_modes+0x110/0x29d [drm_kms_helper]
 [<ffffffffa00654e5>] drm_fb_helper_probe_connector_modes+0x47/0x5f [drm_kms_helper]
 [<ffffffffa0066756>] drm_fb_helper_hotplug_event+0xac/0xc9 [drm_kms_helper]
 [<ffffffffa0094888>] intel_fb_output_poll_changed+0x1c/0x20 [i915]
 [<ffffffffa006757b>] output_poll_execute+0xf2/0x12e [drm_kms_helper]
 [<ffffffff810c9f1b>] slow_work_execute+0x1a2/0x2cc
 [<ffffffff810ca3c5>] slow_work_thread+0x173/0x2a4
 [<ffffffff81066633>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff81469fcf>] ? _raw_spin_unlock_irqrestore+0x17/0x19
 [<ffffffff810ca252>] ? slow_work_thread+0x0/0x2a4
 [<ffffffff81066199>] kthread+0x7f/0x87
 [<ffffffff8100aae4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106611a>] ? kthread+0x0/0x87
 [<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10
Code: 00 00 00 74 05 e8 8b 74 e2 ff 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f c9 c3 90 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 47 08 <4c> 8b 00 49 39 f8 74 1d 48 89 f9 48 c7 c2 fb 18 7b 81 be 30 00
RIP [<ffffffff812266a0>] list_del+0x10/0x8b
 RSP <ffff88012d9cfc90>
Also, a few days ago I was able to reproduce the GPU lockup without plugging or unplugging a monitor. All I did was close and reopen the lid, causing a suspend-to-RAM and resume cycle.
Another hint, maybe useful: on kernel-2.6.38-1.fc15.x86_64, plugging a monitor into the external VGA connector of my Lenovo X201 often results in a flashing red/black screen! Moreover, it seems to be a lot easier to make X hang with a black screen while running a 2.6.38 kernel, but I can't see any output on the console to confirm it. I'm attaching a dmesg.out taken while X was hung.
Created attachment 44702 [details] dmesg output of 2.6.38, taken while Xorg was hung with a black screen
After several days of testing, I couldn't reproduce this bug with kernel-2.6.39-0.rc3.git2.0.fc16.x86_64. However, I'm still seeing several other bugs on monitor plug/unplug while running a compositing GL window manager (compiz in my case).
So the kernel bugs were (upstream):

commit 752d2635ebb12b6122ba05775f7d1ccfef14b275
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Apr 22 11:03:57 2011 +0100

    drm: Take lock around probes for drm_fb_helper_hotplug_event

    We need to hold the dev->mode_config.mutex whilst detecting the output
    status. But we also need to drop it for the call into
    drm_fb_helper_single_fb_probe(), which indirectly acquires the lock when
    attaching the fbcon.

    Failure to do so exposes a race with normal output probing. Detected by
    adding some warnings that the mutex is held to the backend detect routines

and

commit 9a362dd718119042cbe2821edd277c8b98c7fa65
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 16 12:59:17 2011 +0100

    drm/i915: Finish any pending operations on the framebuffer before disabling

    Similar to the case where we are changing from one framebuffer to another,
    we need to be sure that there are no pending WAIT_FOR_EVENTs on the pipe
    for the current framebuffer before switching. If we disable the pipe, and
    then try to execute a WAIT_FOR_EVENT it will block indefinitely and cause
    a GPU hang.

    We attempted to fix this in commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
    (drm/i915: Retire any pending operations on the old scanout when switching)
    for the case of mode switching, but this leaves the condition where we are
    switching off the pipe vulnerable.

    There still remains the race condition where a display may be unplugged,
    switched off by the core, a uevent sent to notify the DDX and the DDX may
    issue a WAIT_FOR_EVENT before it processes the uevent. This window does
    not exist if the pipe is only switched off in response to the uevent. Time
    to make sure that is so...
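The locking rule described in the first commit message (hold the lock while detecting outputs, drop it before the step that takes the same lock itself) can be sketched as follows. This is only a single-threaded toy model: `struct toy_lock`, `struct device`, `detect_outputs()`, `attach_fb()` and `hotplug_event()` are hypothetical names made up for the illustration, and the toy lock reports a failure in the situation where a real non-recursive mutex would self-deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: acquire fails where a real mutex would deadlock. */
struct toy_lock {
    bool held;
};

bool toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical device state for this sketch. */
struct device {
    struct toy_lock mode_config_lock;
    int connectors_probed;
    bool fb_attached;
};

/* Output detection: the caller must already hold the lock. */
void detect_outputs(struct device *dev)
{
    assert(dev->mode_config_lock.held);
    dev->connectors_probed++;
}

/* Stand-in for the probe step that takes the lock internally. */
bool attach_fb(struct device *dev)
{
    if (!toy_lock_acquire(&dev->mode_config_lock))
        return false; /* called with the lock held: would deadlock */
    dev->fb_attached = true;
    toy_lock_release(&dev->mode_config_lock);
    return true;
}

/* Detect under the lock, then drop it before attaching the framebuffer. */
bool hotplug_event(struct device *dev)
{
    if (!toy_lock_acquire(&dev->mode_config_lock))
        return false;
    detect_outputs(dev);
    toy_lock_release(&dev->mode_config_lock);
    return attach_fb(dev);
}
```

In this model, calling `attach_fb()` while still holding the lock fails, and calling `detect_outputs()` without it trips the assertion. Those are the two mistakes the commit message describes.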
Which tree has the latter patch? (9a362dd718119042cbe2821edd277c8b98c7fa65)
We are getting closer to having the flush-before-disable fix upstream.
Step 1 of the flush fixes is upstream:

commit 14667a4bde4361b7ac420d68a2e9e9b9b2df5231
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 3 17:58:35 2012 +0100

    drm/i915: Finish any pending operations on the framebuffer before disabling
And the second step is upstream as well, so I think we have all the pieces in place for this.

commit 0f91128d88bbb8b0a8e7bb93df2c40680871d45a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 17 10:05:38 2012 +0100

    drm/i915: Wait for all pending operations to the fb before disabling the pipe

    During modeset we have to disable the pipe to reconfigure its timings and
    maybe its size. Userspace may have queued up command buffers that depend
    upon the pipe running in a certain configuration and so the commands may
    become confused across the modeset. At the moment, we use a less than
    satisfactory kick-scanline-waits should the GPU hang during the modeset.
    It should be more reliable to wait for the pending operations to complete
    first, even though we still have a window for userspace to submit a broken
    command buffer during the modeset.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
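As a rough illustration of why the ordering matters, the sketch below models a pipe with queued operations that can only complete while the pipe is running. Everything here (`struct pipe_state`, `retire_one()`, `disable_pipe_unsafe()`, `disable_pipe_after_flush()`, `would_hang()`) is a hypothetical simplification written for this comment thread, not i915 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a display pipe with queued work. */
struct pipe_state {
    bool enabled;
    int pending_ops; /* queued operations that need the pipe running */
};

/* Retire one queued operation; only possible while the pipe is enabled. */
bool retire_one(struct pipe_state *p)
{
    if (p->pending_ops == 0 || !p->enabled)
        return false;
    p->pending_ops--;
    return true;
}

/* Block (here: loop) until nothing more can be retired. */
void wait_for_pending(struct pipe_state *p)
{
    while (retire_one(p))
        ;
}

/* Disabling first leaves queued operations waiting on a dead pipe. */
void disable_pipe_unsafe(struct pipe_state *p)
{
    p->enabled = false;
}

/* Drain the queue first, then disable. */
void disable_pipe_after_flush(struct pipe_state *p)
{
    wait_for_pending(p);
    p->enabled = false;
}

/* Operations stuck behind a disabled pipe stand in for the GPU hang. */
bool would_hang(const struct pipe_state *p)
{
    return !p->enabled && p->pending_ops > 0;
}
```

The model ignores the window mentioned in the commit message, where userspace can still submit new work during the modeset. It only shows the difference between disabling with work outstanding and draining first.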