Bug 34211

Summary: [915G] page flip completion panic
Product: DRI
Reporter: Tom Leese <leese.thomas81>
Component: DRM/Intel
Assignee: Jesse Barnes <jbarnes>
Status: CLOSED FIXED
QA Contact:
Severity: major
Priority: medium
CC: jbarnes, leese.thomas81, milind.movasha
Version: XOrg git
Hardware: x86 (IA32)
OS: Linux (All)
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=37752
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                          Flags
Kernel Output                        none
Repeatedly outputed kernel trace.    none
xorg logs                            none

Description Tom Leese 2011-02-12 06:53:26 UTC
Created attachment 43294
Kernel Output

I keep getting a kernel panic when using KDE 4.6 and compositing with OpenGL. I have attached an image of the kernel panic output.

I am using Arch Linux with the latest libgl and intel-dri packages, and I am using xf86-video-intel as my video driver.

This is my chipset: "VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04)"
Comment 1 Chris Wilson 2011-02-12 13:54:57 UTC
But what kernel and configuration are you using? Can you record the first OOPS?
Comment 2 Tom Leese 2011-02-13 02:34:55 UTC
(In reply to comment #1)
> But what kernel and configuration are you using? Can you record the first OOPS?

I am using Kernel 2.6.37. Also, this is 100% reproducible.
Comment 3 mborgelt 2011-03-29 01:30:26 UTC
I can also reproduce this since Kernel 2.6.37.
Attached is my kernel trace, which is repeatedly output to the console.
Comment 4 mborgelt 2011-03-29 01:31:45 UTC
Created attachment 44986
Repeatedly outputed kernel trace.
Comment 5 Tom Leese 2011-04-06 12:57:58 UTC
I can also reproduce this problem when using GNOME Shell.
Comment 6 Chris Wilson 2011-04-17 00:33:48 UTC
I would have said this was fixed by

commit 78c6e170badd22c86a5b50a7eb038a02024b8f03
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 31 10:48:04 2011 +0000

    drm/i915: Suppress spurious vblank interrupts

in 2.6.38.
Comment 7 Tom Leese 2011-04-18 10:34:23 UTC
I don't think it's fixed, as I'm using 2.6.38 and can still trigger the bug.

Here is the list of programs that are known to cause it:

KWin when using OpenGL for compositing rendering
GNOME Shell
Warzone2100 game
Comment 8 Milind Movasha 2011-05-30 01:06:33 UTC
I can reproduce this issue with kernel 2.6.39-1.
I am using the same graphics chipset with Arch Linux and the latest xf86-video-intel driver.

http://lists.freedesktop.org/archives/dri-devel/2011-May/011553.html
Comment 9 Tom Leese 2011-05-30 03:27:58 UTC
I can also reproduce this bug with kernel 2.6.39.
Comment 10 Jesse Barnes 2011-06-08 11:17:03 UTC
There are two separate bugs here: one related to mode sets causing a warning in the timer deletion code, and another related to page flipping. Which one is still present? If both, we'll need another bug to keep them separate...
Comment 11 Milind Movasha 2011-06-09 04:11:03 UTC
(In reply to comment #10)
> Which one is still present? 

I can still easily reproduce the issue mentioned in comment #8 

I am using the following packages:

$ pacman -Q kernel26
kernel26 2.6.39.1-1
$ pacman -Q | grep intel
intel-dri 7.10.99.git20110531-1
xf86-video-intel 2.15.0-2

$ lspci | grep -i display
00:02.1 Display controller: Intel Corporation 82915G Integrated Graphics Controller (rev 04)
Comment 12 Milind Movasha 2011-06-09 04:21:37 UTC
Created attachment 47762
xorg logs
Comment 13 Jesse Barnes 2011-06-09 08:44:29 UTC
Can you post the output of:
  $ gdb i915.ko
  (gdb) list *do_intel_finish_page_flip+0x155

If you have symbols compiled in, that should give you a listing of the code that caused the failure and hopefully point me at the root cause.
Comment 14 Milind Movasha 2011-06-09 23:52:21 UTC
Reading symbols from /lib/modules/2.6.39-milindm/kernel/drivers/gpu/drm/i915/i915.ko...done.
(gdb) l *do_intel_finish_page_flip+0x155
0x32b75 is in do_intel_finish_page_flip (drivers/gpu/drm/i915/intel_display.c:6011).
6006			list_add_tail(&e->base.link,
6007				      &e->base.file_priv->event_list);
6008			wake_up_interruptible(&e->base.file_priv->event_wait);
6009		}
6010	
6011		drm_vblank_put(dev, intel_crtc->pipe);
6012	
6013		spin_unlock_irqrestore(&dev->event_lock, flags);
6014	
6015		obj = work->old_fb_obj;

Reading symbols from /lib/modules/2.6.39-milindm/kernel/drivers/gpu/drm/drm.ko...done.
(gdb) l *drm_vblank_put+0x5e
0x703e is in drm_vblank_put (drivers/gpu/drm/drm_irq.c:923).
918	 * Release ownership of a given vblank counter, turning off interrupts
919	 * if possible. Disable interrupts after drm_vblank_offdelay milliseconds.
920	 */
921	void drm_vblank_put(struct drm_device *dev, int crtc)
922	{
923		BUG_ON(atomic_read(&dev->vblank_refcount[crtc]) == 0);
924	
925		/* Last user schedules interrupt disable */
926		if (atomic_dec_and_test(&dev->vblank_refcount[crtc]) &&
927		    (drm_vblank_offdelay > 0))

It looks like the following line from drm_vblank_put() is causing the oops:
923		BUG_ON(atomic_read(&dev->vblank_refcount[crtc]) == 0);
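
For readers following the trace, the check that fires can be illustrated with a small userspace sketch. This is not the kernel code: `struct vblank_model`, `model_vblank_get()` and `model_vblank_put()` are hypothetical names, and the put reports an unbalanced call instead of calling BUG_ON(). It only shows why a put with no matching get trips the check at drm_irq.c:923.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for dev->vblank_refcount[crtc]. */
struct vblank_model {
    atomic_int refcount;
};

/* Models drm_vblank_get(): take one reference on the counter. */
static void model_vblank_get(struct vblank_model *v)
{
    assert(v != NULL);
    atomic_fetch_add(&v->refcount, 1);
}

/*
 * Models drm_vblank_put(). The kernel hits BUG_ON() when the count is
 * already zero; this sketch returns false in that case instead of crashing.
 * Returns true when the put was balanced by an earlier get.
 */
static bool model_vblank_put(struct vblank_model *v)
{
    assert(v != NULL);
    if (atomic_load(&v->refcount) == 0)
        return false; /* the kernel would oops here */
    atomic_fetch_sub(&v->refcount, 1);
    return true;
}
```

In this model, one get followed by two puts makes the second put return false, which corresponds to the flip completion path dropping a vblank reference that was never taken.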
Comment 15 Jesse Barnes 2011-06-16 12:04:17 UTC
Hm, so we do have a race between setting the unpin_work pointer and the vblank_get, which you may be hitting. Ugh, there are lots of races here... but this may help with your particular race, or may just trigger new problems with our buffer object refcounts...

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 81a9059..54be277 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -6297,6 +6297,17 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		return -EBUSY;
 	}
 	intel_crtc->unpin_work = work;
+	ret = drm_vblank_get(dev, intel_crtc->pipe);
+	if (ret) {
+		intel_crtc->unpin_work = NULL;
+		spin_unlock_irqrestore(&dev->event_lock, flags);
+		goto free_work;
+	}
+	/*
+	 * Past this point, if we fail we'll let the flip completion code
+	 * clean up the vblank refcount and pin work.  It'll be a spurious
+	 * completion, but we handle that case.
+	 */
 	spin_unlock_irqrestore(&dev->event_lock, flags);
 
 	intel_fb = to_intel_framebuffer(fb);
@@ -6305,7 +6316,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	mutex_lock(&dev->struct_mutex);
 	ret = intel_pin_and_fence_fb_obj(dev, obj, LP_RING(dev_priv));
 	if (ret)
-		goto cleanup_work;
+		goto cleanup_objs;
 
 	/* Reference the objects for the scheduled work. */
 	drm_gem_object_reference(&work->old_fb_obj->base);
@@ -6412,13 +6423,8 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 cleanup_objs:
 	drm_gem_object_unreference(&work->old_fb_obj->base);
 	drm_gem_object_unreference(&obj->base);
-cleanup_work:
 	mutex_unlock(&dev->struct_mutex);
-
-	spin_lock_irqsave(&dev->event_lock, flags);
-	intel_crtc->unpin_work = NULL;
-	spin_unlock_irqrestore(&dev->event_lock, flags);
-
+free_work:
 	kfree(work);
 
 	return ret;
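
The ordering the patch aims for can be sketched in a single-threaded userspace model. Everything here is an assumption made for illustration: `struct crtc_model` and the `model_*` functions are hypothetical, locking is omitted, and the return values are plain placeholders rather than kernel error codes. The point is only the invariant: pending work is never left visible to the completion path unless a vblank reference has been taken for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CRTC; event_lock is not modelled. */
struct crtc_model {
    int vblank_refcount;   /* stands in for dev->vblank_refcount[pipe] */
    bool vblank_available; /* when false, the modelled vblank get fails */
    void *unpin_work;      /* stands in for intel_crtc->unpin_work */
};

/* Models drm_vblank_get(): returns 0 on success, nonzero on failure. */
static int model_crtc_vblank_get(struct crtc_model *c)
{
    if (!c->vblank_available)
        return -1;
    c->vblank_refcount++;
    return 0;
}

/* Patched ordering: publish the work and take the reference together. */
static int model_page_flip(struct crtc_model *c, void *work)
{
    if (c->unpin_work != NULL)
        return -2; /* a flip is already pending (the driver returns -EBUSY) */
    c->unpin_work = work;
    if (model_crtc_vblank_get(c) != 0) {
        c->unpin_work = NULL; /* never leave work visible without a ref */
        return -1;
    }
    return 0;
}

/* Flip completion: drops a reference only when work is pending. */
static bool model_finish_page_flip(struct crtc_model *c)
{
    if (c->unpin_work == NULL)
        return false; /* spurious completion, nothing to release */
    c->unpin_work = NULL;
    assert(c->vblank_refcount > 0); /* mirrors the BUG_ON in drm_vblank_put */
    c->vblank_refcount--;
    return true;
}
```

With `vblank_available` set to false, `model_page_flip()` fails and a later `model_finish_page_flip()` is a no-op, so the count never goes below zero. The real driver additionally has to hold dev->event_lock around these steps, which this sketch does not attempt to model.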
Comment 16 Milind Movasha 2011-06-23 22:25:29 UTC
(In reply to comment #15)

This patch works for me! I am no longer able to reproduce the issue after applying the patch to the "drm-intel-fixes" branch from the git sources.
Comment 17 Eugeni Dodonov 2011-08-22 09:53:28 UTC
So, just checking, is this fixed now?
Comment 18 Jesse Barnes 2011-08-22 10:28:19 UTC
No, looks like I never pushed the fix, arg.
Comment 19 Chris Wilson 2011-11-09 11:58:45 UTC
We need to get this patch upstreamed, poke.
Comment 20 Jesse Barnes 2011-11-11 12:36:06 UTC
Patch is under "[PATCH] drm/i915: don't set unpin_work if vblank_get fails" on intel-gfx. I think it's ready to commit, but it's been a while.
Comment 21 Jesse Barnes 2012-01-11 10:59:24 UTC
Fix is heading upstream finally.


commit 7317c75e66fce0c9f82fbe6f72f7e5256b315422
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Aug 29 09:45:28 2011 -0700

    drm/i915: don't set unpin_work if vblank_get fails
