Bug 58107 - [sna gm45] Lockups in X on 2.20.15, flip-vs-dpms? race
Summary: [sna gm45] Lockups in X on 2.20.15, flip-vs-dpms? race
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-11 01:28 UTC by Elvis Pranskevichus
Modified: 2017-07-24 22:59 UTC

See Also:
i915 platform:
i915 features:


Attachments
Logs (159.25 KB, application/octet-stream)
2012-12-11 01:28 UTC, Elvis Pranskevichus
Another set of dumps, with error_info this time. (157.95 KB, text/plain)
2012-12-11 17:27 UTC, Elvis Pranskevichus

Description Elvis Pranskevichus 2012-12-11 01:28:12 UTC
My gm45-based system has been locking up randomly in X ever since I upgraded to 2.20. A VT switch sometimes works and sometimes doesn't. Attempts to restart X lock up the system hard. I was finally able to capture *some* information in the logs. intel_gpu_abrt output is attached. Unfortunately, cat /sys/kernel/debug/dri/0/i915_error_state failed with a memory allocation error. It did contain some information after a reboot, although I'm not sure if it's useful. That is also attached.
Comment 1 Elvis Pranskevichus 2012-12-11 01:28:36 UTC
Created attachment 71298
Logs
Comment 2 Chris Wilson 2012-12-11 09:14:29 UTC
Impossible to tell without the error state, but at first glance that looks like a kernel bug.
Comment 3 Elvis Pranskevichus 2012-12-11 16:45:39 UTC
Trouble is, the error state is empty in most cases, just like all the logs. In the rare case when there was an indication that it wasn't empty, I got a memory allocation error trying to cat it. Are there any suggestions as to how to debug this the next time it happens?
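
[Editor's sketch] One way to make sure whatever the kernel does return ends up on disk is to copy the debugfs file in small chunks to a timestamped file as soon as a hang is noticed. The Python below is a minimal sketch under assumptions: the function name `save_error_state`, the output file naming, and the chunk size are all hypothetical, and only the debugfs path comes from this report. It does not work around an allocation failure inside the kernel; it only records the error and keeps any partial output.

```python
from datetime import datetime, timezone
from pathlib import Path

# Path quoted in this report; debugfs must be mounted and readable (usually root only).
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src=ERROR_STATE, dest_dir=".", chunk_size=64 * 1024):
    """Copy `src` to a timestamped file in `dest_dir`.

    Returns the destination path on success, or None if opening or reading
    failed. A partially written file is left in place, since even a
    truncated dump may be worth attaching.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = Path(dest_dir) / f"i915_error_state-{stamp}.txt"
    try:
        # The source is opened first, so nothing is created if it is missing.
        with open(src, "rb") as fin, open(dest, "wb") as fout:
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)
    except OSError as exc:
        print(f"could not save {src}: {exc}")
        return None
    return dest
```

Calling `save_error_state(dest_dir="/var/tmp")` from a root shell or over ssh right after a lockup would be the intended use; the destination directory here is only an example.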
Comment 4 Elvis Pranskevichus 2012-12-11 17:27:22 UTC
Created attachment 71347
Another set of dumps, with error_info this time.

OK, I managed to capture i915_error_info.  You can find it attached along with reg_dumper output and vbios dump.  dmesg is clean.
Comment 5 Chris Wilson 2012-12-11 20:44:09 UTC
Looks like a pageflip versus state race. Are you able to try a 3.7 kernel? We fixed a few issues there recently.
Comment 6 Elvis Pranskevichus 2012-12-11 21:05:02 UTC
I'm running 3.7.0-rc8-00014-g27d7c2a.  The only drm-related commit I'm missing is caf491916b1c1e939a2c7575efb7a77f11fc9bdf
Comment 7 Chris Wilson 2012-12-11 21:42:39 UTC
If you are sure you can hit it again, you can try the mb() patches included in

http://cgit.freedesktop.org/~ickle/linux-2.6/ #master
Comment 8 Chris Wilson 2012-12-11 22:19:47 UTC
I suspect we have a dupe of bug 53385
Comment 9 Elvis Pranskevichus 2012-12-14 20:07:27 UTC
That branch doesn't boot for me as is, but I cherry-picked commits that seemed to contain mb() insertions on top of 3.7:

* drm/i915: Insert a full mb() before reading the seqno from the status page
* drm/i915: Review the memory barriers around CPU access to buffers
* overlay mb()

Is that it, or is there something else in that branch that may be relevant?
Comment 10 Chris Wilson 2012-12-14 21:01:59 UTC
Heh, there were a couple more important ones.

Something has broken in the fastboot patches; at least it is causing problems on one of my machines, which can be avoided with:

diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_
index 64a8079..78bd6f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -319,6 +319,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_de
        struct drm_i915_gem_object *obj;
        struct drm_mm_node *stolen;
 
+       return NULL;
+
        if (dev_priv->mm.stolen_base == 0)
                return NULL;
Comment 11 Chris Wilson 2012-12-16 14:07:51 UTC
Also be advised that xf86-video-intel-2.20.16 fixed a race in SNA for resizing DRI buffers.
Comment 12 Elvis Pranskevichus 2012-12-19 21:31:05 UTC
Unfortunately, 2.20.16 does not fix this issue.  And I'm still unable to boot the ickle/master branch, even with the fix you posted.  I'll try the 3.8-rc0 and see if the latest drm pull has changed anything.
Comment 13 Elvis Pranskevichus 2012-12-21 20:31:03 UTC
Locks up under 3.8 as well.  I booted with drm.debug=0x06 and dmesg is full of this:

[drm:i915_pageflip_stall_check], Pageflip stall detected
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_prepare_page_flip], preparing flip with no unpin work?
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:i915_pageflip_stall_check], Pageflip stall detected
[drm:intel_update_fbc], more than one pipe active, disabling compression

I was also hoping to be able to bisect the driver, but the whole xorg-server API thing doesn't make it easy.
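
[Editor's sketch] When dmesg is flooded like this, a per-message count makes it easier to see which messages dominate. The Python below is a minimal sketch: the function name `summarize_drm_messages` and the regular expression are hypothetical and assume the `[drm:function], message` layout seen in the excerpt above; `SAMPLE` is that excerpt copied verbatim.

```python
import re
from collections import Counter

# Matches lines shaped like "[drm:function_name], message text".
DRM_LINE = re.compile(r"\[drm:(?P<func>\w+)\],?\s*(?P<msg>.*)")

SAMPLE = """\
[drm:i915_pageflip_stall_check], Pageflip stall detected
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_prepare_page_flip], preparing flip with no unpin work?
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:intel_update_fbc], more than one pipe active, disabling compression
[drm:i915_pageflip_stall_check], Pageflip stall detected
[drm:intel_update_fbc], more than one pipe active, disabling compression
"""


def summarize_drm_messages(text):
    """Count occurrences of each (function, message) pair in dmesg-style text."""
    counts = Counter()
    for line in text.splitlines():
        match = DRM_LINE.search(line)
        if match:
            counts[(match.group("func"), match.group("msg").strip())] += 1
    return counts


def print_summary(text):
    """Print the most frequent messages first."""
    for (func, msg), n in summarize_drm_messages(text).most_common():
        print(f"{n:4d}  {func}: {msg}")
```

For the excerpt above, `print_summary(SAMPLE)` reports 7 `intel_update_fbc` lines, 2 `i915_pageflip_stall_check` lines, and 1 `intel_prepare_page_flip` line.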
Comment 14 Elvis Pranskevichus 2013-01-16 02:25:00 UTC
A small update here.

Reverted to xf86-video-intel-2.19.0 and xorg-server-1.12.4.  Rock solid.  2.20.16 still locks up.  I will try to "bisect" through public releases first, which will hopefully narrow the regression range.
Comment 15 Daniel Vetter 2013-01-16 09:49:47 UTC
Just fyi, usually it's quicker to just do a git bisect, since commits are not evenly distributed between releases ...
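
[Editor's sketch] The reason a commit-level bisect tends to be quick is that each test halves the remaining range, so the number of builds grows only logarithmically with the number of commits. The Python below is an illustration of that idea, not a reimplementation of `git bisect`: the function name `bisect_first_bad`, the commit list, and the `is_bad` predicate are all hypothetical stand-ins for "build this commit and see whether X locks up".

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    `commits` is ordered oldest to newest, with commits[0] known good and
    commits[-1] known bad. `is_bad(commit)` stands in for a manual test.
    Returns (first_bad_commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1  # lo is always good, hi is always bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


# Hypothetical history: 1000 commits, regression introduced at commit 137.
history = list(range(1000))
first_bad, tests = bisect_first_bad(history, lambda c: c >= 137)
print(first_bad, tests)
```

With the made-up history above, the search needs about ten tests to isolate a single commit, regardless of how the commits are spread across releases.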
Comment 16 Chris Wilson 2013-02-20 14:52:56 UTC
Hah, I think we have a solution:

commit 21ad833075801a7cd81b5ef1604ffc6c600e5ff9
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Tue Feb 19 15:16:39 2013 +0200

    drm/i915: Fix races in gen4 page flip interrupt handling

