Bug 29153

Summary: [legacy/ums] kernel panic on Debian 2.6.35-rc5
Product: DRI
Reporter: Martin Sillence <martin>
Component: DRM/Intel
Assignee: Xorg Project Team <xorg-team>
Status: CLOSED FIXED
QA Contact:
Severity: major
Priority: medium
CC: kurt
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                                  Flags
kernel log                                   none
Check master exists before dereferencing.    none
kernel log when kms is disable.              none
xorg log with latest version                 none

Description Martin Sillence 2010-07-19 12:01:15 UTC
The kernel panics when X starts on Debian with the experimental 2.6.35-rc5 kernel.
I'm unable to switch back to a text terminal, but SysRq sync still works.

Attached is the kernel log up to the SysRq sync.
Comment 1 Martin Sillence 2010-07-19 12:02:44 UTC
Created attachment 37184 [details]
kernel log

kernel log from boot to panic
Comment 2 Martin Sillence 2010-07-19 12:20:11 UTC
It's probably worth mentioning that I'm using the _latest_ intel driver:

context:
http://ikibiki.org/blog/2010/07/04/We_need_you_redux/

> Cyril Brulebois <kibi@debian.org> (12/07/2010):
>> It would be nice to know how it goes with the packages I built (for
>> i386 + amd64) and uploaded there:
>>   http://people.debian.org/~kibi/packages/xserver-xorg-video-intel/
>
> I've put a new version there: 2.12.0-1+ickle2


Hardware: 965GM on a Sony SZ680

Previous bug: 28204

Regards,
M
Comment 3 Jesse Barnes 2010-07-19 12:28:04 UTC
Wow, must be an old userspace to use the IRQ_EMIT ioctl...  Looks like we're failing to grab the hw lock, probably because some aspect of the master structures isn't set up at this point.

(gdb) list *i915_irq_emit+0x18a
0x50fa is in i915_irq_emit (drivers/gpu/drm/i915/i915_irq.c:1073).
1068		if (!dev_priv || !dev_priv->render_ring.virtual_start) {
1069			DRM_ERROR("called with no initialization\n");
1070			return -EINVAL;
1071		}
1072	
1073		RING_LOCK_TEST_WITH_RETURN(dev, file_priv);
1074	
1075		mutex_lock(&dev->struct_mutex);
1076		result = i915_emit_irq(dev);
1077		mutex_unlock(&dev->struct_mutex);
Comment 4 Jesse Barnes 2010-07-19 12:29:11 UTC
Chris's legacy branch was implicated in this bug.
Comment 5 Chris Wilson 2010-07-19 12:32:22 UTC
Created attachment 37185 [details] [review]
Check master exists before dereferencing.
Comment 6 Chris Wilson 2010-07-19 12:40:14 UTC
Hmm, master is dereferenced even earlier, and we need to understand when and why master might be NULL in the first place.
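
For illustration only, the kind of guard named in the attachment title might look like the sketch below. This is not the attached patch: every type and function name here (struct sketch_lock, struct sketch_master, struct sketch_file, sketch_lock_test) is a hypothetical stand-in, not a real DRM structure. It only shows each pointer being checked before it is dereferenced, so that a file without a master gets an error back instead of triggering a NULL dereference.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real DRM types. */
struct sketch_lock { int held; };
struct sketch_master { struct sketch_lock *lock; };
struct sketch_file { struct sketch_master *master; };

/*
 * Return 0 if the caller holds the hardware lock, -EINVAL otherwise.
 * Every pointer is checked before it is dereferenced, so a file with
 * no master yields an error instead of a NULL pointer dereference.
 */
int sketch_lock_test(const struct sketch_file *file)
{
	if (file == NULL || file->master == NULL)
		return -EINVAL;
	if (file->master->lock == NULL || !file->master->lock->held)
		return -EINVAL;
	return 0;
}
```

As noted above, a guard at one call site does not help if master is dereferenced earlier on the same path, so this covers only the symptom at the point of the check.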
Comment 7 Martin Sillence 2010-07-19 22:49:55 UTC
Please note that this is a regression: this version of xorg-intel works fine with 2.6.34-1-amd64.
Comment 8 Kurt Roeckx 2010-07-24 10:48:38 UTC
I'm seeing about the same thing using a 2.6.35-rc5 and -rc6 kernel.  I'm using the "ickle2" version mentioned before.

I'll attach my dmesg shortly.
Comment 9 Kurt Roeckx 2010-07-24 10:50:20 UTC
Created attachment 37353 [details]
kernel log when kms is disable.
Comment 10 Kurt Roeckx 2010-07-24 11:29:05 UTC
The patch in the bug report does not fix my issue.  The call trace still looks the same.
Comment 11 Chris Wilson 2010-07-24 15:14:08 UTC
I'm not terribly happy that a broken userspace can cause a kernel BUG(), but I've pushed a fix to my legacy branch.

Maybe we can use this as evidence to remove the broken kernel API...
Comment 12 Kurt Roeckx 2010-07-24 16:03:35 UTC
A bisect for at least my problem results in:
$ git bisect good
8187a2b70e34c727a06617441f74f202b6fefaf9 is the first bad commit
commit 8187a2b70e34c727a06617441f74f202b6fefaf9
Author: Zou Nan hai <nanhai.zou@intel.com>
Date:   Fri May 21 09:08:55 2010 +0800

    drm/i915: introduce intel_ring_buffer structure (V2)

    Introduces a more complete intel_ring_buffer structure with callbacks
    for setup and management of a particular ringbuffer, and converts the
    render ring buffer consumers to use it.

    Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
    Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
    [anholt: Fixed up whitespace fail and rebased against prep patches]
    Signed-off-by: Eric Anholt <eric@anholt.net>

:040000 040000 b90a540c84c2ffa50b8b0bb7292749cef96e75d3 22c06e081bc722df129f2d0dc937950d5f164c5c M      drivers
:040000 040000 6ac1363503569458bf035132b01f206c256701cb 757099565b205b0908a8b903db5c9b00d2c6e142 M      include
Comment 13 Kurt Roeckx 2010-07-26 13:56:23 UTC
As I understand it, this bug seems to contain at least 2 issues:
- X doing some wrong API call
- The kernel having a problem with it.

I think the first issue is solved?  (The bug being set to fixed state)

But I think the second problem should get fixed too, and I'm not sure if anything is happening with that.  Is there a bug in the kernel bug tracker about it?  Should I create one?
Comment 14 Chris Wilson 2010-07-26 15:09:36 UTC
If you can demonstrate that the old userspace, say either 2.6 or 2.9, fails with the current kernel then it needs an immediate fix. If however, it is just one more broken piece in a broken API, the sooner we can kill it with fire the better.

In short, it is not at the top of my list of kernel OOPs to fix. :(
Comment 15 Ben Hutchings 2010-07-26 22:02:49 UTC
(In reply to comment #14)
> If you can demonstrate that the old userspace, say either 2.6 or 2.9, fails
> with the current kernel then it needs an immediate fix. If however, it is just
> one more broken piece in a broken API, the sooner we can kill it with fire the
> better.
> 
> In short, it is not at the top of my list of kernel OOPs to fix. :(

Maybe, but that doesn't mean the bug is fixed. Please don't mark it as such.
Comment 16 Chris Wilson 2010-07-27 00:33:21 UTC
The bug in legacy is fixed.
Comment 17 Kurt Roeckx 2010-07-27 14:45:43 UTC
Created attachment 37415 [details]
xorg log with latest version

There is now an "ickle3" version which contains commit 352016d2da69bfc998a642132ab722940899ad2e.

With that version on a 2.6.32 kernel (Debian version 2.6.32-17), I can get to the login screen, but the screen turns black and the PC locks up somewhere after logging in.  I've attached my xorg log file.  This is with ums.

This looks like a step back since it used to work with this kernel.
Comment 18 Kurt Roeckx 2010-07-27 14:57:54 UTC
I've now also booted it using drm.debug=0x06, and the kernel log ended with:
[  162.381742] [drm:i915_wait_irq], irq_nr=433 breadcrumb=433
[  162.381804] [drm:i915_batchbuffer], i915 batchbuffer, start e9000 used 152 cliprects 0
[  162.381817] [drm:i915_emit_irq],
[  163.237945] [drm:i915_emit_irq],
[  163.237960] [drm:i915_wait_irq], irq_nr=436 breadcrumb=436
[  163.238029] [drm:i915_batchbuffer], i915 batchbuffer, start ed000 used 56 cliprects 0
[  163.238042] [drm:i915_emit_irq],
[  163.716533] [drm:i915_emit_irq],
[  163.716549] [drm:i915_wait_irq], irq_nr=439 breadcrumb=439
[  163.716623] [drm:i915_batchbuffer], i915 batchbuffer, start e9000 used 416 cliprects 0
[  163.716637] [drm:i915_emit_irq],
[  163.717262] [drm:i915_emit_irq],
[  163.717271] [drm:i915_wait_irq], irq_nr=442 breadcrumb=439
[  163.737309] [drm:i915_wait_irq], irq_nr=442 breadcrumb=439
[  163.757321] [drm:i915_wait_irq], irq_nr=442 breadcrumb=439
[  163.777235] [drm:i915_wait_irq], irq_nr=442 breadcrumb=439
[  163.781312] [drm:i915_batchbuffer], i915 batchbuffer, start ed000 used 8 cliprects 0
[  163.781331] [drm:i915_emit_irq],
[  165.836145] [drm:i915_get_vblank_counter], trying to get vblank count for disabled pipe 1
Comment 19 Chris Wilson 2011-01-26 01:59:01 UTC
Should be fixed with:

commit e8616b6ced6137085e6657cc63bc2fe3900b8616
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jan 20 09:57:11 2011 +0000

    drm/i915: Initialise ring vfuncs for old DRI paths
    
    We weren't setting up the vfunc table when initialising the old DRI
    ringbuffer, leading to such OOPSes as:

...
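
As a rough illustration of the failure mode this commit message describes (not the commit itself), the sketch below models a ring whose operations are reached through function pointers. All names (struct sketch_ring, sketch_ring_init, sketch_emit) are hypothetical. If an initialisation path forgets to fill in the table, the first call through it jumps to NULL; the remedy is to set the pointers on every init path, and the NULL check in sketch_emit is only an extra guard added for this example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ring with a table of function pointers ("vfuncs"). */
struct sketch_ring {
	unsigned int next_seqno;
	unsigned int (*add_request)(struct sketch_ring *ring);
};

/* One concrete implementation of the add_request operation. */
static unsigned int sketch_render_add_request(struct sketch_ring *ring)
{
	return ++ring->next_seqno;
}

/* Init path: it must populate the vfunc table, not just the data. */
void sketch_ring_init(struct sketch_ring *ring)
{
	ring->next_seqno = 0;
	ring->add_request = sketch_render_add_request;
}

/*
 * Emit a request through the vfunc table. Returns the new sequence
 * number, or -EINVAL if the table was never set up.
 */
int sketch_emit(struct sketch_ring *ring)
{
	if (ring == NULL || ring->add_request == NULL)
		return -EINVAL;
	return (int)ring->add_request(ring);
}
```
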

commit 5a9a8d1a99c617df82339456fbdd30d6ed3a856b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jan 23 13:03:24 2011 +0000

    drm/i915: Handle the no-interrupts case for UMS by polling
    
    If the driver calls into the kernel to wait for a breadcrumb to pass,
    but hasn't enabled interrupts, fallback to polling the breadcrumb value.
    
    Reported-by: Chris Clayton <chris2553@googlemail.com>
    Tested-by: Chris Clayton <chris2553@googlemail.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
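
The polling fallback described in the second commit message can be sketched roughly as follows. This is a userspace model under stated assumptions, not the kernel code: struct sketch_dev, sketch_fake_read, sketch_stuck_read and sketch_poll_breadcrumb are hypothetical names, the "hardware" is simulated by a counter, and the choice of -EBUSY for a timeout is an assumption made for the example. A caller would take this path only when interrupts are not enabled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device: a breadcrumb value plus a way to read it. */
struct sketch_dev {
	uint32_t breadcrumb;
	uint32_t (*read_breadcrumb)(struct sketch_dev *dev);
};

/* Simulated hardware that makes progress: advances on each read. */
uint32_t sketch_fake_read(struct sketch_dev *dev)
{
	return ++dev->breadcrumb;
}

/* Simulated hardware that is stuck: the breadcrumb never advances. */
uint32_t sketch_stuck_read(struct sketch_dev *dev)
{
	return dev->breadcrumb;
}

/* Wrap-safe test for "current has reached or passed target". */
static bool sketch_passed(uint32_t current, uint32_t target)
{
	return (uint32_t)(current - target) < UINT32_C(0x80000000);
}

/*
 * Poll the breadcrumb up to max_polls times instead of sleeping on an
 * interrupt. Returns 0 once the target has passed, -EBUSY on timeout.
 */
int sketch_poll_breadcrumb(struct sketch_dev *dev, uint32_t target,
			   unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (sketch_passed(dev->read_breadcrumb(dev), target))
			return 0;
	}
	return -EBUSY;
}
```

With the values from the log in comment 18 (waiting for 442 while the breadcrumb reads 439), the stuck reader times out no matter how many polls are allowed, while the advancing reader succeeds on the third poll.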
