Bug 98855 - xf86-video-intel crashes with Xorg 1.19 using I-V-O with Nvidia
Summary: xf86-video-intel crashes with Xorg 1.19 using I-V-O with Nvidia
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Duplicates: 99129
Depends on:
Reported: 2016-11-25 14:00 UTC by Pablo Cholaky
Modified: 2016-12-17 16:55 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:

X.org crash (23.24 KB, application/x-trash)
2016-11-25 14:00 UTC, Pablo Cholaky
X.org Crash with gdb (32.58 KB, text/plain)
2016-11-29 17:37 UTC, Pablo Cholaky
X.org crash with gdb and backtrace (34.57 KB, text/plain)
2016-11-29 18:03 UTC, Pablo Cholaky
Handle xf86RandR12 gamma changes in xorg-1.19 (1.82 KB, patch)
2016-11-29 22:07 UTC, Chris Wilson
Backtrace crash Xorg 1.19 with xf86-video-intel patch on attachment 128277 (27.45 KB, text/plain)
2016-12-01 00:35 UTC, Pablo Cholaky
New crash with unknown steps to reproduce (27.99 KB, application/x-trash)
2016-12-05 14:40 UTC, Pablo Cholaky

Description Pablo Cholaky 2016-11-25 14:00:02 UTC
Created attachment 128189 [details]
X.org crash

Using: x11-drivers/xf86-video-intel-2.99.917, mesa 13.0.1 and Xorg 1.19.0 on KDE 5.8.4 and Systemd 232

Everything works fine until I run intel-virtual-output (IVO). The external monitor then comes up fine, but as soon as I open ANY X application, the whole X.org server crashes. This doesn't happen with Xorg 1.18.4.

Hardware: Intel HD 530
External Graphics: Nvidia 980m with 375.20

I can't really use modesetting because I need the VIRTUAL screen, even though it is suggested to drop xf86-video-intel in favour of xf86-video-modesetting for Gen 4 and above.

Steps to reproduce:

1) Run IVO and wait for external monitors to turn on
2) Open ANY application
3) Observe the crash.
Comment 1 Chris Wilson 2016-11-25 14:14:42 UTC
Hmm, another ABI change. Thanks for the backtrace, but it didn't find any symbols for your X server. Could you install the debug symbols for xorg-server and -intel and crash again? :) Or there might be a USE=full-debug build of the -intel ddx which will hopefully generate enough logging to find the crash.
Comment 2 Pablo Cholaky 2016-11-29 17:37:10 UTC
Created attachment 128269 [details]
X.org Crash with gdb

I'm really sorry about the delay. Because I use split debug symbols, it was quite hard for me to debug this.

I hope this log is useful.
Comment 3 Pablo Cholaky 2016-11-29 18:03:49 UTC
Created attachment 128270 [details]
X.org crash with gdb and backtrace

Updated file with backtrace; my bad for not doing this before.
Comment 4 Chris Wilson 2016-11-29 22:07:51 UTC
Created attachment 128277 [details] [review]
Handle xf86RandR12 gamma changes in xorg-1.19
Comment 5 cunio 2016-11-30 15:06:58 UTC
I have just applied the patch and it didn't help.
Comment 6 Chris Wilson 2016-11-30 23:49:40 UTC
I was able to reproduce the crash (using chvt to trigger the CMapReinstallMap), and that is fixed by the attached patch. Please could you verify the patch applied and grab a new backtrace?
Comment 7 Chris Wilson 2016-11-30 23:55:42 UTC
commit 9ac7a3370ab265d4cbdbbf3dc588af88c37048e1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 29 22:01:21 2016 +0000

    sna: Handle xf86Randr12 gamma changes in xorg-xserver-1.19

    commit 17213b74fd7fc4c4e2fe7a3781e7422dd482a0ab
    Author: Michel Dänzer <michel.daenzer@amd.com>
    Date:   Tue Jun 21 16:44:20 2016 +0900

        xfree86/modes: Remove xf86RandR12CrtcGetGamma

    removed the randr_crtc->palettes allocation and initialisation causing a
    later dereference of the gamma table to crash. Looks like that was just
    ABI misuse.
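
For readers following along, the failure mode in that commit message can be sketched in isolation: a reader dereferences a gamma table that it assumes someone else allocated, and the fix is to size the table explicitly first (the real patch does this with RRCrtcGammaSetSize(), as the debugging diff in comment 9 shows). The struct and the fake_* helpers are hypothetical stand-ins, not the xserver's data structures or API; this is only a minimal sketch of the pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CRTC's gamma state; NOT the real RRCrtcRec. */
struct fake_crtc {
    int gamma_size;       /* entries per channel, 0 while unset */
    uint16_t *gamma_red;  /* NULL until a table has been allocated */
};

/* Hypothetical analogue of a "set gamma size" call: allocate a zeroed table. */
static int fake_gamma_set_size(struct fake_crtc *crtc, int size)
{
    uint16_t *table;

    assert(crtc != NULL);
    if (size <= 0)
        return 0;

    table = calloc((size_t)size, sizeof(*table));
    if (table == NULL)
        return 0;

    free(crtc->gamma_red);  /* free(NULL) is a no-op */
    crtc->gamma_red = table;
    crtc->gamma_size = size;
    return 1;
}

/* Reader that refuses to dereference a table nobody allocated. */
static int fake_gamma_read(const struct fake_crtc *crtc, int index,
                           uint16_t *out)
{
    if (crtc->gamma_red == NULL || index < 0 || index >= crtc->gamma_size)
        return 0;
    *out = crtc->gamma_red[index];
    return 1;
}
```

Without the NULL check in fake_gamma_read(), a read before fake_gamma_set_size() would dereference a null pointer, which is roughly the shape of the crash in the backtrace.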
Comment 8 Pablo Cholaky 2016-12-01 00:35:28 UTC
Created attachment 128293 [details]
Backtrace crash Xorg 1.19 with xf86-video-intel patch on attachment 128277 [details] [review]

Tried with the patch and it crashed again.

Attaching new backtrace of the crash.
Comment 9 Chris Wilson 2016-12-01 00:57:59 UTC
Puzzling. The call to GammaSetSize() looks to be the right fix. :|

Could you add this bit of debugging to confirm that the GammaSetSize() is taking effect:

diff --git a/src/sna/sna_display_fake.c b/src/sna/sna_display_fake.c
index fa26bda..b504f3f 100644
--- a/src/sna/sna_display_fake.c
+++ b/src/sna/sna_display_fake.c
@@ -293,6 +293,8 @@ static bool add_fake_output(struct sna *sna, bool late)
                                   RR_Rotate_All | RR_Reflect_All);
                if (!RRCrtcGammaSetSize(crtc->randr_crtc, 256))
                        goto err;
+               ErrorF("crtc->gamma_size=%d, randr_crtc->gammaSize=%d\n",
+                      crtc->gamma_size, crtc->randr_crtc->gammaSize);
Comment 10 Pablo Cholaky 2016-12-01 01:26:57 UTC
Hi Chris,

I apologize, you were right: it looks like the patch wasn't being applied to my build.

The patch (attachment 128277) works great; I can use my external monitor without any crash.

I would say this issue is solved by this patch. Let me do more stress testing tomorrow.

Many thanks.
Comment 11 Pablo Cholaky 2016-12-05 14:40:31 UTC
Created attachment 128346 [details]
New crash with unknown steps to reproduce


I got 2 crashes and I don't know how to reproduce them reliably. They don't happen every time, so I couldn't debug them properly with gdb, but there is at least an Intel debug log.

Steps to reproduce (intermittent):
1) Connect your video output.
2) Run I-V-O
3) Use xrandr to make the system detect the screen (not auto)
4) Open the KDE Displays GUI to enable the screen at a certain position.
5) Crash as soon as I apply the changes.

These steps don't always trigger the crash but, as I said, I was able to reproduce the same problem last week and today.
Comment 12 Chris Wilson 2016-12-05 14:52:05 UTC
The first trace is from a client requesting a swapbuffers for more than 2 years in advance; it's just a warning that we received garbage from the client/upper layers.
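
To make the "more than 2 years in advance" remark concrete, here is a small sketch of the kind of sanity check that could produce such a warning: compare the requested target frame count with the current one and see how long the wait would be at the display's refresh rate. The function name, the limit, and the parameters are all hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: roughly two years, expressed in seconds. */
#define MAX_WAIT_SECONDS (2ULL * 365 * 24 * 60 * 60)

/*
 * Return true if waiting from current_msc until target_msc (both frame
 * counters) at refresh_hz frames per second stays within the limit.
 * A target that is already in the past is treated as plausible.
 */
static bool swap_target_is_plausible(uint64_t current_msc,
                                     uint64_t target_msc,
                                     unsigned refresh_hz)
{
    uint64_t delta;

    if (refresh_hz == 0)
        return false;  /* cannot convert frames to time */
    if (target_msc <= current_msc)
        return true;   /* already due */

    delta = target_msc - current_msc;
    return delta / refresh_hz <= MAX_WAIT_SECONDS;
}
```

At 60 Hz, two years is about 3.8 billion frames, so a target that far ahead is much more likely to be an uninitialised or corrupted value than a real request.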

The second trace is interesting. Looks to be a race in handling the completed flip events in the middle of resizing the framebuffer - looks like the state has been partly updated and then we got the stale unflip request. If you don't mind, please file a new bug for that trace so that I don't forget about it.
Comment 13 Pablo Cholaky 2016-12-05 15:00:00 UTC
Sure, thanks Chris. I will try to find a way to reproduce this bug, debug it and file a new ticket. Or is this trace good enough to investigate it properly?
Comment 14 Chris Wilson 2016-12-05 15:16:15 UTC
Trace is good enough for a bug report without tracking down the cause, though if you can work out how to reproduce it, that makes testing easier :)
Comment 15 Chris Wilson 2016-12-06 09:41:13 UTC
Took a stab at that trace:

commit ff25ad3402be3bc20f7b6e680e49ad03d7c6e2af
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Dec 5 21:28:35 2016 +0000

    sna: Reorder frontbuffer resize vs flip event queue draining

    If we are not careful, we may process an unflip in the middle of
    resizing the frontbuffer - when the ScreenPixmap state is ill-defined.
    First flush all the pending flip events, cancel any residual unflips,
    then update the screen pixmap. This should be enough to close the race.
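
The ordering described in that commit message (flush pending flip events, cancel residual unflips, then update the screen pixmap) can be illustrated with a toy model. Everything here, including struct fake_screen and both functions, is hypothetical and only mimics the sequencing; it is not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical model of the state involved; NOT the driver's structures. */
struct fake_screen {
    int front_width;            /* size of the current front buffer */
    int pending_flips;          /* completed flip events not yet processed */
    bool unflip_queued;         /* a deferred unflip still waiting to run */
    bool resizing;              /* true while front buffer state is half-updated */
    int unflips_during_resize;  /* counts occurrences of the race */
};

/* Process every pending flip event; a queued unflip runs as a side effect. */
static void process_flip_events(struct fake_screen *s)
{
    while (s->pending_flips > 0) {
        s->pending_flips--;
        if (s->unflip_queued) {
            if (s->resizing)
                s->unflips_during_resize++;  /* would see ill-defined state */
            s->unflip_queued = false;
        }
    }
}

static void resize_frontbuffer(struct fake_screen *s, int new_width)
{
    /* 1. Drain the event queue while the old state is still consistent. */
    process_flip_events(s);

    /* 2. Cancel any unflip that is still outstanding. */
    s->unflip_queued = false;

    /* 3. Only now touch the front buffer state. */
    s->resizing = true;
    s->front_width = new_width;
    s->resizing = false;
}
```

If step 1 ran after step 3 had started, the unflip could execute while resizing is true, which is the situation the reordering is meant to rule out.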
Comment 16 Chris Wilson 2016-12-17 16:55:17 UTC
*** Bug 99129 has been marked as a duplicate of this bug. ***
