Bug 21414

Summary: [915] 915gm ioctl() freeze with attached batch buffer dump
Product: xorg Reporter: martin <mnemo>
Component: Driver/intel Assignee: Carl Worth <cworth>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium CC: gomyhr, nalimilan
Version: 7.4 (2008.09)   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description martin 2009-04-26 07:39:58 UTC
One Ubuntu user reported an X server freeze with the X server stuck in ioctl(). He installed the 2.6.30-rc2 kernel, was able to reproduce the freeze, and captured a full batch buffer dump.

Driver running in: EXA

Exact versions are basically Ubuntu Jaunty except for the kernel:
intel ddx 2.6.3-0ubuntu9
kernel non-tainted 2.6.30rc2
mesa 7.4-0ubuntu3
xserver 1:7.4~5ubuntu18
libdrm 2.4.5-0ubuntu4

Finally his exact chipset is:
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 03)

Downstream bug report with all the data you need:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/365994

Direct link to the batch buffer dump:
http://launchpadlibrarian.net/25995147/dri_debug.tgz
Comment 1 Jesse Barnes 2009-05-11 11:21:53 UTC
Adjusting severity: crashes & hangs should be marked critical.
Comment 2 Eric Anholt 2009-05-12 15:38:07 UTC
This dump looks like:

commit 1142353b487c155a31011923fbd08ec67e60f505
Author: Keith Packard <keithp@keithp.com>
Date:   Fri May 1 11:44:13 2009 -0700

    intel_batch_start_atomic: fix size passed to intel_batch_require_space (*4)

(batchbuffer starts in the middle of what should have been an atomic batch emission)
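
The "(*4)" in that commit subject suggests a unit mismatch: space reserved as a count of dwords where bytes were expected. The following is a minimal sketch of how such a mismatch could split an atomic emission across two batches. It is not the driver's code; struct batch, batch_require_space(), batch_emit_dword(), emit_atomic() and the 64-byte BATCH_BYTES are all hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_BYTES 64u /* hypothetical, tiny so the effect is easy to see */

struct batch {
    size_t used;      /* bytes already emitted into the current batch */
    unsigned flushes; /* times the batch was submitted and restarted */
};

/* Make sure `bytes` of room remain; otherwise flush and start a new batch. */
static void batch_require_space(struct batch *b, size_t bytes)
{
    assert(bytes <= BATCH_BYTES);
    if (b->used + bytes > BATCH_BYTES) {
        b->flushes++;
        b->used = 0;
    }
}

/* Emit one 32-bit command word (contents omitted in this sketch). */
static void batch_emit_dword(struct batch *b)
{
    batch_require_space(b, sizeof(uint32_t));
    b->used += sizeof(uint32_t);
}

/*
 * Emit `dwords` command words that must stay in one batch.
 * With scale_to_bytes == false the reservation is made in the wrong unit
 * (dwords passed where bytes are expected), which is the assumed bug.
 * Returns true if no flush happened in the middle of the sequence.
 */
static bool emit_atomic(struct batch *b, size_t dwords, bool scale_to_bytes)
{
    size_t reserve = scale_to_bytes ? dwords * sizeof(uint32_t) : dwords;
    unsigned before;
    size_t i;

    batch_require_space(b, reserve);
    before = b->flushes;
    for (i = 0; i < dwords; i++)
        batch_emit_dword(b);
    return b->flushes == before;
}
```

Starting from a batch that already holds 48 of its 64 bytes, an 8-dword sequence reserved without the scaling passes the space check (48 + 8 fits) and is then split after its fourth dword, so the next batch begins mid-sequence. With the scaling, the check flushes first and the whole sequence lands in one fresh batch.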
Comment 3 martin 2009-05-12 15:46:27 UTC
Thanks for suggesting the patch. Looks like cworth already cherry picked it (as commit 115fc9a7d79da07301b96d9fc5c513d33734d273) for 2.7.1 as well.

Thanks a lot.
Comment 4 Milan Bouchet-Valat 2009-05-14 12:23:07 UTC
Sorry, but I'm now using 2.7.1 and I've experienced the freeze twice in two days. I'll try to get a new dump, but the symptoms are the same so I assume that's the same freeze...
Comment 5 Milan Bouchet-Valat 2009-05-16 07:27:01 UTC
I've caught a new batch buffer dump, which looks to my inexperienced eye very different from the old one. Maybe another bug... See http://launchpadlibrarian.net/26815168/dri_debug-new.tgz.
Comment 6 Eric Anholt 2009-05-19 10:32:56 UTC
Milan, don't reopen someone else's fixed bug to submit your bug.  Submit your own bug.
Comment 7 Milan Bouchet-Valat 2009-05-19 10:36:42 UTC
Actually, I'm the original reporter on Launchpad. Sorry for the confusion, I forgot that my report was upstreamed by someone else. So the new batchbuffer dump has been made on the same machine, and is likely to be the same freeze.
Comment 8 Milan Bouchet-Valat 2009-05-21 05:12:11 UTC
Using kernel 2.6.30rc6 fixes it. Bryce Harrington pointed to the following patch, which is indeed included in rc6:
https://bugs.freedesktop.org/attachment.cgi?id=25806

So the present bug is most likely a duplicate of bug 21488 (but it occurred with and without UXA and KMS).

*** This bug has been marked as a duplicate of bug 21488 ***
Comment 9 Milan Bouchet-Valat 2009-05-21 15:27:07 UTC
Reopening for the second time, sorry. Actually, I experienced it again with kernel 2.6.30rc6 and driver 2.7.99.1+git20090519. The output of dmesg is exactly the same as that of the first dump linked here:
[ 1320.512119] Call Trace:
[ 1320.512137] [<c02d296e>] ? rb_erase+0xbe/0x130
[ 1320.512150] [<c0512ed4>] __mutex_lock_slowpath+0xa4/0x100
[ 1320.512159] [<c0512c10>] mutex_lock+0x20/0x40
[ 1320.512197] [<e0a4d688>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 1320.512209] [<c014c8bd>] run_workqueue+0x6d/0x130
[ 1320.512238] [<e0a4d660>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[ 1320.512248] [<c014cf48>] worker_thread+0x88/0xe0
[ 1320.512259] [<c01505a0>] ? autoremove_wake_function+0x0/0x40
[ 1320.512268] [<c014cec0>] ? worker_thread+0x0/0xe0
[ 1320.512277] [<c01501fc>] kthread+0x4c/0x80
[ 1320.512284] [<c01501b0>] ? kthread+0x0/0x80
[ 1320.512294] [<c01039c7>] kernel_thread_helper+0x7/0x10

Only difference: mouse cursor was frozen too this time (not sure those data are really meaningful).

See http://launchpadlibrarian.net/27031541/dri_debug.tgz for the full dump.
Comment 10 Milan Bouchet-Valat 2009-05-27 01:18:00 UTC
Eric: may I hope somebody will look at the new traces soon? I don't want to offend the dev team, and I'm willing to help you as much as I can, but rebooting and losing work every day is *really* annoying. I'm sure you understand that... Thanks! ;-)
Comment 11 Milan Bouchet-Valat 2009-06-08 10:44:10 UTC
Closing since it looks like the bug is fixed with the latest 2.7.99 driver and kernel 2.6.30rc8.
