Bug 94250 - Intel driver crashes when undocking Lenovo T450s
Summary: Intel driver crashes when undocking Lenovo T450s
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-22 16:51 UTC by cs_gon
Modified: 2016-02-25 09:40 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
stack trace, without --enable-debug (5.20 KB, text/plain)
2016-02-22 16:52 UTC, cs_gon
no flags Details
stack trace, with --enable-debug (6.45 KB, text/plain)
2016-02-22 16:53 UTC, cs_gon
no flags Details
Xorg.0.log.debug, with --enable-debug (34.42 KB, text/plain)
2016-02-22 16:53 UTC, cs_gon
no flags Details
Xorg.0.log.1 (29.30 KB, text/plain)
2016-02-23 14:07 UTC, cs_gon
no flags Details
stack_trace1.txt (4.36 KB, text/plain)
2016-02-23 14:08 UTC, cs_gon
no flags Details
Xorg.0.log.2 (28.97 KB, text/plain)
2016-02-23 14:09 UTC, cs_gon
no flags Details
stack_trace2.txt (4.19 KB, text/plain)
2016-02-23 14:10 UTC, cs_gon
no flags Details
Xorg.0.log.3 (41.74 KB, text/plain)
2016-02-23 16:39 UTC, cs_gon
no flags Details
stack_trace3.txt (4.21 KB, text/plain)
2016-02-23 16:40 UTC, cs_gon
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description cs_gon 2016-02-22 16:51:20 UTC
I'm using Ubuntu 14.04 since installing the Wily HWE stack, the X-Server began crashing when undocking the notebook (Lenovo T450s).

I tried a newer driver (GIT snapshot up to commit e41040fb55e84fc72612ae503944a99814a5c9d3 , 15th Feb. 2016), but the X-Server still crashes. The stacktrace looks a bit different when using the GIT version, but still seems to be in the same area in the code (SNA page flips). The stackstace for the GIT version is in stacktrace_nodebug.txt.

I have also compiled the driver with --enable-debug, then the X-Server exits due to an assertion (as seen in Xorg.0.log.debug). I'm also attaching the stacktrace for this case (stacktrace_debug.txt).

The kernel is the Ubuntu Wily kernel (4.2.0-27-generic).
Comment 1 cs_gon 2016-02-22 16:52:23 UTC
Created attachment 121894 [details]
stack trace, without --enable-debug
Comment 2 cs_gon 2016-02-22 16:53:05 UTC
Created attachment 121895 [details]
stack trace, with --enable-debug
Comment 3 cs_gon 2016-02-22 16:53:43 UTC
Created attachment 121896 [details]
Xorg.0.log.debug, with --enable-debug
Comment 4 Chris Wilson 2016-02-22 21:10:39 UTC
Hmm. First thought is that the kernel returned an event for an operation that failed, can you please quickly try with

diff --git a/src/sna/sna_dri2.c b/src/sna/sna_dri2.c
index fcfbd9d..7db6f61 100644
--- a/src/sna/sna_dri2.c
+++ b/src/sna/sna_dri2.c
@@ -1914,6 +1914,7 @@ sna_dri2_flip(struct sna_dri2_event *info)
                return false;
        }
 
+       assert(!info->queued);
        if (!sna_page_flip(info->sna, bo, sna_dri2_flip_handler,
                           info->type == FLIP_ASYNC ? NULL : info))
                return false;

and --enable-debug.
Comment 5 Chris Wilson 2016-02-23 09:36:42 UTC
Ok, found one scenario that could explain the traces after a failed flip when undocking, so please test with:

commit 64b1b1f10da59f15a91141c9f76d7d09517f8ea8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 23 09:32:57 2016 +0000

    sna/dri2: Ensure the flipqueue is drained on pageflip failure
    
    If we fail to queue a flip for a CRTC, we attempt to restore the
    original mode. However, if the failure is on a second CRTC, the queued
    flip on the first will still complete causing us to process the event
    twice.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=94250
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 6 cs_gon 2016-02-23 14:06:07 UTC
When using the first patch (comment 4), the code did not exit at the new assertion, but at the one from before (from comment 3).

I think I saw it one time exiting with (but failed to save the log and core dump):
(EE) sna_dri2_schedule_flip:3078 assertion 'info->front == front' failed

And the new patch (comment 5) unfortunately did not fix the problem. After a couple of tries the code seems to exit at one of two assertions. I will add the Xorg logs and corresponding stack traces. One of the assertions is "'info->front == front' failed" from above, so I got the log and stack trace for this one.
Comment 7 cs_gon 2016-02-23 14:07:56 UTC
Created attachment 121918 [details]
Xorg.0.log.1

Xorg log for assertion

(EE) sna_dri2_event_free:1663 assertion '!info->signal' failed
Comment 8 cs_gon 2016-02-23 14:08:37 UTC
Created attachment 121919 [details]
stack_trace1.txt

stack trace for assertion

(EE) sna_dri2_event_free:1663 assertion '!info->signal' failed
Comment 9 cs_gon 2016-02-23 14:09:20 UTC
Created attachment 121920 [details]
Xorg.0.log.2

Xorg log for assertion

(EE) sna_dri2_schedule_flip:3078 assertion 'info->front == front' failed
Comment 10 cs_gon 2016-02-23 14:10:11 UTC
Created attachment 121921 [details]
stack_trace2.txt

stack trace for assertion

(EE) sna_dri2_schedule_flip:3078 assertion 'info->front == front' failed
Comment 11 Chris Wilson 2016-02-23 14:43:31 UTC
(In reply to cs_gon from comment #7)
> Created attachment 121918 [details]
> Xorg.0.log.1
> 
> Xorg log for assertion
> 
> (EE) sna_dri2_event_free:1663 assertion '!info->signal' failed

That is not a huge issue, just a bit of debug code needs cancelling along the error path.

(In reply to cs_gon from comment #10)
> Created attachment 121921 [details]
> stack_trace2.txt
> 
> stack trace for assertion
> 
> (EE) sna_dri2_schedule_flip:3078 assertion 'info->front == front' failed

This is actually more of an issue. This should reduce the assert

commit 3593a2d18928f74ee470f824dc34b8b5b148ce2d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 23 14:36:10 2016 +0000

    sna/dri2: Reset front pointer on frame event across a modeset
    
    If the root window's pixmap is changed (e.g. to resizing the
    framebuffer) then an outstanding flip becomes invalid. The invalid flip
    is marked as having a stale front, and that triggers an assertion if the
    user then tries to schedule flip before the pending flip event is
    processed.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=9425
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

but it is still a little bit of a puzzle how we did not drain the event queue first. That's probably explained by

commit 4cd43faa646e368624079b73b216f6546ede5c16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 23 14:40:39 2016 +0000

    sna: Flush the DRM event queue after a modeset
    
    Changing the mode will cause the DRM events(pageflips, vblanks) to
    be completed and the queue flushed. After applying the CRTC, drain the
    event queue.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 12 cs_gon 2016-02-23 16:37:11 UTC
The last two patches may have fixed it, or at least made things a lot better. 

I found out before, that disconnecting the external monitor from the docking station, instead of undocking the notebook, did cause this problem more frequently (maybe 75% of the time), and I was now unable to reproduce the issue when disconnecting the monitor.

Though during a lot of docking and undocking, I got this assertion one time:
(EE) sna_dri2_schedule_flip:3080 assertion 'info->front == NULL' failed

I will attach the Xorg log and stack trace again.
Comment 13 cs_gon 2016-02-23 16:39:54 UTC
Created attachment 121925 [details]
Xorg.0.log.3

Xorg log for assertion:

(EE) sna_dri2_schedule_flip:3080 assertion 'info->front == NULL' failed
Comment 14 cs_gon 2016-02-23 16:40:38 UTC
Created attachment 121926 [details]
stack_trace3.txt

stack trace for assertion

(EE) sna_dri2_schedule_flip:3080 assertion 'info->front == NULL' failed
Comment 15 Chris Wilson 2016-02-25 09:40:57 UTC
After a bit more thought, the assertion was incorrect and we just expect the stale pointer and not NULL.

Sadly, I squashed the change with

commit d1672806a5222f00dcc2eb24ccddd03f727f71bc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 24 10:33:22 2016 +0000

    sna/dri2: Add active-scanout tracking to single CRTC flips


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.