Bug 100086

Summary: xorg server 1.19.2: Crash with PRIME and multiple displays
Product: xorg
Reporter: bastian.beischer
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: medium
CC: andyrtr, bastian.beischer, dongeryduo, jol, jonas.h.lundberg, michel, peter, root, v_bachvarov, ville.syrjala
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
  Description             Flags
  xorg.log v0             none
  xorg.log from 1.19.1    none

Description bastian.beischer 2017-03-06 17:42:07 UTC
Arch Linux just updated xorg-server to 1.19.2.

Since the update I'm seeing problems with my second display. I'm using a Lenovo W520 laptop which features an Intel integrated GPU and a dedicated NVIDIA GF106GLM [Quadro 2000M] GPU.

The external DisplayPort connector is attached to the NVIDIA GPU, but I'm using the Intel GPU as the primary one. I'm using PRIME, with the Intel GPU rendering the image and the NVIDIA GPU providing the output for the external display:

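# Let the nouveau provider's outputs display images rendered by the Intel provider
xrandr --setprovideroutputsource nouveau Intel
# Names of the connected internal panel (LVDS*) and external DisplayPort output (DP-1*)
primary=$(xrandr | grep -e 'LVDS' | grep -v -e 'disconnected' | awk '{print $1}')
office_con=$(xrandr | grep -e 'DP-1' | grep -v -e 'disconnected' | awk '{print $1}')
if [[ -n "${office_con}" ]]; then
    xrandr --output "${office_con}" --auto --right-of "${primary}"
fi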

Furthermore, I'm using KDE Plasma 5.9 as my desktop environment. After logging in via SDDM, the picture on the external screen does not update; it just shows the black Plasma splash screen, even though Plasma has fully started on the left screen. I can move the mouse over the second screen and see the cursor move.

When I disable the second screen in KDE System Settings (or with xrandr) and then re-enable it, the X server freezes and I have to force my laptop to shut down.

Downgrading to xorg-server 1.19.1-5 fixes the problem. The Arch package 1.19.1-5 contained the following two patches on top of 1.19.1:

https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/xorg-server&id=63f2ddee51705b0055041fdf67895d7383cd07cc
https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/xorg-server&id=fdb75aee720eedd503a7a0ce819e45a6e13a0705
Comment 1 Michel Dänzer 2017-03-07 06:35:55 UTC
Please attach the corresponding Xorg log file.
Comment 2 bastian.beischer 2017-03-07 08:43:09 UTC
Created attachment 130105
xorg.log v0

Xorg log file for a session in which:

a) picture on external screen is frozen (except for mouse) after login (timestamp ~221)
b) external screen is disabled (timestamp ~318)
c) external screen is enabled again and picture is again frozen (timestamp ~375)
Comment 3 bastian.beischer 2017-03-07 08:46:21 UTC
The session in the attached log differs slightly from the original report: the X server does not crash, but the picture on the external screen is frozen (except for the mouse pointer), just as in the original report.

Yesterday my X server crashed when I disabled and re-enabled the external screen, but I can't reproduce that at the moment.
Comment 4 Jonas Lundberg 2017-03-07 08:58:50 UTC
I can confirm this bug, with the same behavior on GNOME 3.22.2.
Downgrading the xorg-server from 1.19.2-1 to 1.19.1-5 fixes it.
Comment 5 bastian.beischer 2017-03-07 09:09:50 UTC
Created attachment 130106
xorg.log from 1.19.1

Here's the xorg.log for a session that does not show the bug (from Xorg 1.19.1). Unfortunately, it looks identical to xorg.log v0.
Comment 6 Michel Dänzer 2017-03-07 09:36:24 UTC
Does the problem also occur with the modesetting driver instead of the intel driver?
Comment 7 bastian.beischer 2017-03-07 10:23:04 UTC
No, it seems to work with the modesetting driver. (I switched both GPUs to the modesetting driver. I tried to switch only the Intel GPU, but I think that failed; at least xrandr --listproviders names both providers "modesetting".)
Comment 8 Michel Dänzer 2017-03-08 01:55:45 UTC
Looks like an intel SNA bug: sna_accel_post_damage/migrate_dirty_tracking need to be fixed to properly handle the root window pixmap not being the screen pixmap (which is the case while a Present client is flipping).
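The distinction can be illustrated with a toy model. The sketch below is hypothetical: ModelScreen and ModelPixmap are simplified stand-ins, not xserver types, and a pixmap's contents are reduced to a frame counter. Following the comment above, it assumes that while a Present client is flipping the root window is backed by the client's buffer rather than by the screen pixmap, so syncing the PRIME shared pixmap from the screen pixmap copies stale contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not xserver types: a pixmap's contents are
 * reduced to the number of the last frame drawn into it. */
typedef struct { int frame; } ModelPixmap;

typedef struct {
    ModelPixmap *screen_pixmap;  /* plays the role of GetScreenPixmap() */
    ModelPixmap *root_pixmap;    /* plays the role of GetWindowPixmap(root) */
} ModelScreen;

/* Assumed behavior while a Present client flips: the root window is
 * backed by the client's buffer; the screen pixmap is left untouched. */
static void model_present_flip(ModelScreen *s, ModelPixmap *client_buffer)
{
    s->root_pixmap = client_buffer;
}

static void model_present_unflip(ModelScreen *s)
{
    s->root_pixmap = s->screen_pixmap;
}

/* Pick the pixmap the PRIME shared pixmap is synced from. */
static const ModelPixmap *model_sync_source(const ModelScreen *s,
                                            bool from_root_window)
{
    assert(s->screen_pixmap != NULL && s->root_pixmap != NULL);
    return from_root_window ? s->root_pixmap : s->screen_pixmap;
}

/* Copy the source contents into the shared pixmap. */
static void model_sync_shared(ModelPixmap *shared, const ModelPixmap *src)
{
    shared->frame = src->frame;
}
```

For example, after model_present_flip(&s, &client), syncing with from_root_window set to false copies the screen pixmap's old frame, while setting it to true copies the client's current frame; once the flip ends, both choices refer to the same pixmap again.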
Comment 9 Chris Wilson 2017-03-09 11:18:39 UTC
You've got to be kidding me.

The screen->pixmap_dirty_list no longer tracks the dirty pixmaps because of

commit b5b292896f647c85f03f53b20b2f03c0e94de428
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Wed Feb 1 18:35:56 2017 +0900

    prime: Sync shared pixmap from root window instead of screen pixmap
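A rough illustration of how a dirty list can appear to stop tracking anything: if a driver only handles entries whose source pointer equals its own front (screen) pixmap, that test can never match once the server registers the root window as the source. This is a hypothetical model; the Model* types and count_tracked are invented for illustration and are not SNA's actual code. It mirrors one real property of the X server: windows and pixmaps both begin with a common drawable header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real xserver headers. Like WindowRec and
 * PixmapRec, both object types begin with a shared drawable header. */
enum { MODEL_DRAWABLE_WINDOW, MODEL_DRAWABLE_PIXMAP };

typedef struct { int type; } ModelDrawable;
typedef struct { ModelDrawable drawable; int frame; } ModelPixmap;
typedef struct { ModelDrawable drawable; ModelPixmap *pixmap; } ModelWindow;

/* One entry of a pixmap_dirty_list-like list. */
typedef struct {
    ModelDrawable *src;      /* source registered for dirty tracking */
    ModelPixmap *slave_dst;  /* shared pixmap scanned out by the other GPU */
} ModelDirty;

/* Hypothetical driver loop: count the entries it would update, keeping
 * only those whose source pointer is exactly its front (screen) pixmap. */
static size_t count_tracked(const ModelDirty *list, size_t n,
                            const ModelPixmap *front)
{
    size_t matched = 0;

    assert(front != NULL);
    for (size_t i = 0; i < n; i++) {
        if (list[i].src == &front->drawable)
            matched++;
    }
    return matched;
}
```

With a 1.19.1-style entry (source is the screen pixmap) the loop finds one entry; with the root window as the source it finds none, so no damage reaches the shared pixmap even though, in this model, the root window is backed by that very same front pixmap.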
Comment 10 Michel Dänzer 2017-03-16 01:57:26 UTC
Thanks for the report, fixed in the 1.19.3 release.

commit 1097bc9c184db4c722d5a8d2c5a4c0da9cdc70f5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Mar 9 11:25:34 2017 +0000

    Revert "prime: Sync shared pixmap from root window instead of screen pixmap"
Comment 11 bastian.beischer 2018-05-16 15:32:19 UTC
I believe that 1097bc9c184db4c722d5a8d2c5a4c0da9cdc70f5 wasn't committed to the branch used for the 1.20.0 release?

I'm seeing problems in 1.20.0 again.
Comment 12 Michel Dänzer 2018-05-16 17:18:00 UTC
(In reply to bastian.beischer from comment #11)
> I'm seeing problems in 1.20.0 again.

Yes, the change remains part of the 1.20 ABI; xf86-video-intel needs to be adapted to it.
Comment 13 bastian.beischer 2018-06-07 09:44:08 UTC
Do we have any updates here?

I should add that the symptoms differ from the original report: I no longer observe a crash; instead, the second display is blank except for the mouse cursor, which is visible and moves with the mouse. This makes me wonder whether there's a connection to:

https://bugs.freedesktop.org/show_bug.cgi?id=105812

I also tried switching both GPU drivers to modesetting, but I couldn't get the external output to work at all (which worked without problems in X server 1.19). That might be a different bug, since I'm getting EDID-related errors in xrandr.

All I can say with certainty at the moment is that there are issues with NVIDIA (reverse) PRIME setups in X server 1.20.
Comment 14 Michel Dänzer 2018-06-11 16:13:43 UTC
*** Bug 106891 has been marked as a duplicate of this bug. ***
Comment 15 Michel Dänzer 2018-07-17 08:00:19 UTC
*** Bug 107253 has been marked as a duplicate of this bug. ***
Comment 16 Peter Wu 2018-08-14 08:43:59 UTC
Hi, after being hampered by this bug for a few months, I decided to give it a go. The current intel module (built for 1.20) has a type confusion issue (it assumes a Pixmap but gets a Window) and crashes under ASAN, which made the cause easy to find.
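The type confusion can be sketched with a simplified model. Below, the hypothetical helper model_dirty_src_pixmap resolves a tracked source to a pixmap by checking the drawable type first, instead of casting the source straight to a pixmap, which is what goes wrong when the source is really a window. In the real server a window's pixmap would be obtained through the screen's GetWindowPixmap hook; here it is just a field. This illustrates the idea under those assumptions and is not Peter's patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: windows and pixmaps share a leading drawable
 * header, so a ModelDrawable * may point at either kind of object. */
enum { MODEL_DRAWABLE_WINDOW, MODEL_DRAWABLE_PIXMAP };

typedef struct { int type; } ModelDrawable;
typedef struct { ModelDrawable drawable; int frame; } ModelPixmap;
typedef struct { ModelDrawable drawable; ModelPixmap *pixmap; } ModelWindow;

/* Resolve a dirty-tracking source to the pixmap holding its contents.
 * Casting src straight to ModelPixmap * would reinterpret a window's
 * fields as pixmap fields; in the report, that kind of access read past
 * the root window's allocation. */
static ModelPixmap *model_dirty_src_pixmap(ModelDrawable *src)
{
    assert(src != NULL);
    if (src->type == MODEL_DRAWABLE_WINDOW)
        return ((ModelWindow *)src)->pixmap;  /* the window's backing pixmap */
    assert(src->type == MODEL_DRAWABLE_PIXMAP);
    return (ModelPixmap *)src;
}
```

Dispatching on the drawable type keeps the helper correct whether the server registers a pixmap or a window as the dirty-tracking source.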

Proposed patches:
[PATCH xf86-video-intel] SNA: fix PRIME output support since xserver 1.20
https://lists.freedesktop.org/archives/intel-gfx/2018-August/173523.html

Optional (it does not affect normal operation, only server exit):
[PATCH xserver] randr: fix RRCrtcDetachScanoutPixmap crash on server exit
https://lists.freedesktop.org/archives/intel-gfx/2018-August/173523.html
Comment 17 Peter Wu 2018-08-14 08:47:34 UTC
Oops, the most important patch had the wrong link, it should be:

[PATCH xf86-video-intel] SNA: fix PRIME output support since xserver 1.20
https://lists.freedesktop.org/archives/intel-gfx/2018-August/173522.html
Comment 18 Jorge Luis Martinez Gomez 2018-08-15 21:51:39 UTC
Awesome to see a patch. Thank you, Peter Wu. I'll try it out as soon as I get back to work on Monday. I didn't get the crash, so I can't test that, but I do get the blank screen with the cursor.
Comment 19 Jorge Luis Martinez Gomez 2018-08-17 15:41:59 UTC
I just tested the SNA patch Peter Wu shared and can confirm it works. :)
Comment 20 Yurii Kolesnykov 2019-09-28 13:38:08 UTC
This patch ships in the Arch Linux xf86-video-intel package [0] and seems to work. I maintain an alternative package in the AUR, xf86-video-intel-git [1], which is very similar to the official one but builds the latest master instead of a fixed commit.

Now I see that the patch needs to be rebased [2]. Peter, could you please rebase and resubmit it? Maybe it will get more attention this time.

[0]https://archlinux.org/packages/extra/x86_64/xf86-video-intel/
[1]https://aur.archlinux.org/packages/xf86-video-intel-git/
[2]https://aur.archlinux.org/packages/xf86-video-intel-git/#comment-709686
Comment 21 Peter Wu 2019-09-29 15:07:35 UTC
Ville has applied a different patch that duplicates much of my earlier patch:
https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=581ddc5d2f55efa2cf5ec76a802fb781ee142b01

However, it appears to miss one crucial detail, which will most likely result in crashes (failed assertions). I'll test the latest git without my patch and prepare an updated patch if needed.
Comment 22 Jorge Luis Martinez Gomez 2019-10-25 00:50:26 UTC
@Peter Wu

Well, I got the blank screen with the cursor again. I filed a regression bug report on Arch Linux here:

https://bugs.archlinux.org/task/64238

There, loqs provided a patch that fixed the issue.
Comment 23 Peter Wu 2019-11-08 17:10:46 UTC
I can confirm that 1:2.99.917+893+gbff5eca4-1 on Arch Linux results in a black screen showing only the cursor. Not great when you have only a few minutes left before presenting to a full room of people... I rebased the patch and arrived at exactly the same diff as the patch in the linked issue tracker.

I will do an ASAN test while the European Intel developers are enjoying their weekend, and submit a patch on Monday.
Comment 24 Peter Wu 2019-11-15 15:27:26 UTC
The crash can still be reproduced with Intel + modesetting. On Arch Linux with xf86-video-intel 1:2.99.917+893+gbff5eca4-1 and xorg-server 1.20.5-4, connecting an external monitor caused an immediate segfault. That it happens immediately is likely due to the "autobind GPUs to the screen" patch.

With pristine xorg-server 1.20.5 plus a glvnd build patch, and xf86-video-intel 2.99.917-893-gbff5eca4 from git, the following ASAN trace appears after running:

    xrandr --setprovideroutputsource modesetting Intel
    xrandr --output HDMI-1-1 --mode 2560x1440  # should not crash

I'll submit the updated patch to the list.

==369074==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6120001ad918 at pc 0x7f33f0a08153 bp 0x7ffd89f50630 sp 0x7ffd89f50620
READ of size 8 at 0x6120001ad918 thread T0
    #0 0x7f33f0a08152 in to_sna_from_pixmap ../../../src/sna/sna.h:521
    #1 0x7f33f0a08152 in sna_pixmap_move_to_gpu ../../../src/sna/sna_accel.c:4222
    #2 0x7f33f0a57f3f in sna_accel_post_damage ../../../src/sna/sna_accel.c:17773
    #3 0x7f33f0a5c561 in sna_accel_block ../../../src/sna/sna_accel.c:18414
    #4 0x7f33f0acce2e in sna_block_handler ../../../src/sna/sna_driver.c:777
    #5 0x55bc9c56e97c in BlockHandler ../xorg-server-1.20.5/dix/dixutils.c:388
    #6 0x55bc9c80ecc0 in WaitForSomething ../xorg-server-1.20.5/os/WaitFor.c:201
    #7 0x55bc9c55edb7 in Dispatch ../xorg-server-1.20.5/dix/dispatch.c:421
    #8 0x55bc9c56cd9c in dix_main ../xorg-server-1.20.5/dix/main.c:276
    #9 0x7f33f4c21152 in __libc_start_main (/usr/lib/libc.so.6+0x27152)
    #10 0x55bc9c4b264d in _start (/tmp/nv/xprefix2/bin/Xorg.bin+0xdd64d)

0x6120001ad918 is located 56 bytes to the right of 288-byte region [0x6120001ad7c0,0x6120001ad8e0)
allocated by thread T0 here:
    #0 0x7f33f5432aca in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x55bc9c5bedbf in _dixAllocateScreenObjectWithPrivates ../xorg-server-1.20.5/dix/privates.c:709
    #2 0x55bc9c5df890 in CreateRootWindow ../xorg-server-1.20.5/dix/window.c:571
    #3 0x55bc9c56cb12 in dix_main ../xorg-server-1.20.5/dix/main.c:220
    #4 0x7f33f4c21152 in __libc_start_main (/usr/lib/libc.so.6+0x27152)
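The offsets in the ASAN report can be checked by hand. The helpers below only redo the pointer arithmetic on the three numbers ASAN printed (faulting address, region start, 288-byte size); the function names are made up for this check.

```c
#include <assert.h>
#include <stdint.h>

/* Numbers copied from the ASAN report. */
static const uint64_t fault_addr   = UINT64_C(0x6120001ad918);
static const uint64_t region_start = UINT64_C(0x6120001ad7c0);
static const uint64_t region_size  = 288;

/* Address one past the last byte of the allocation. */
static uint64_t region_end(uint64_t start, uint64_t size)
{
    return start + size;
}

/* How far beyond the end of the allocation the faulting read begins. */
static uint64_t bytes_past_end(uint64_t addr, uint64_t start, uint64_t size)
{
    uint64_t end = region_end(start, size);

    assert(addr >= end);
    return addr - end;
}

/* Offset of the faulting read from the start of the allocation. */
static uint64_t offset_from_start(uint64_t addr, uint64_t start)
{
    assert(addr >= start);
    return addr - start;
}
```

The region ends at 0x6120001ad8e0, so the 8-byte read starts 56 bytes past the end of the object allocated in CreateRootWindow, 344 bytes from its start. That is consistent with the type confusion described in comment 16: root window memory being read through pixmap code (to_sna_from_pixmap).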
Comment 25 Peter Wu 2019-11-16 16:06:37 UTC
*** Bug 111976 has been marked as a duplicate of this bug. ***
Comment 26 Peter Wu 2019-11-19 22:37:20 UTC
Fixed the black screen issue (nouveau) and crash (modesetting) in xf86-video-intel 2.99.917-895-gcb6bff95.

Sometimes there is still a crash on Xorg exit when an external screen is attached, but at least it does not happen during use (hopefully!). I can live with that, but for completeness the trace is below.

Thread 1 "Xorg.bin" received signal SIGSEGV, Segmentation fault.
0x00005612376305cc in PixmapStopDirtyTracking (src=0x0, slave_dst=0x6110000a9140) at ../xorg-server-1.20.5/dix/pixmap.c:251
251         ScreenPtr screen = src->pScreen;
#0  0x00005612376305cc in PixmapStopDirtyTracking (src=0x0, slave_dst=0x6110000a9140) at ../xorg-server-1.20.5/dix/pixmap.c:251
#1  0x0000561237692a2c in RRCrtcDetachScanoutPixmap (crtc=crtc@entry=0x617000004a00) at ../xorg-server-1.20.5/randr/rrcrtc.c:413
#2  0x0000561237692dcd in RRCrtcDestroyResource (value=0x617000004a00, pid=<optimized out>) at ../xorg-server-1.20.5/randr/rrcrtc.c:900
#3  0x0000561237644021 in doFreeResource (res=0x60300000b4a0, skip=skip@entry=0) at ../xorg-server-1.20.5/dix/resource.c:880
#4  0x0000561237647698 in FreeClientResources (client=0x60e000000040) at ../xorg-server-1.20.5/dix/resource.c:1146
#5  0x0000561237647698 in FreeClientResources (client=0x60e000000040) at ../xorg-server-1.20.5/dix/resource.c:1109
#6  0x00005612376478e5 in FreeAllResources () at ../xorg-server-1.20.5/dix/resource.c:1161
#7  0x00005612375e1e19 in dix_main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../xorg-server-1.20.5/dix/main.c:292
#8  0x00007f9b9ba12153 in __libc_start_main () at /usr/lib/libc.so.6
#9  0x000056123752764e in _start ()
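The exit-time trace shows PixmapStopDirtyTracking being called with src=0x0 and faulting on `ScreenPtr screen = src->pScreen;`. Peter's optional xserver patch from comment 16 targets the caller, RRCrtcDetachScanoutPixmap (per its title). The sketch below only illustrates the simpler callee-side idea of refusing a NULL source instead of dereferencing it. The Model* types and model_stop_dirty_tracking are hypothetical, with the screen's dirty list reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the screen's dirty list is reduced to a count. */
typedef struct { int tracked; } ModelScreen;
typedef struct { ModelScreen *pScreen; } ModelDrawable;

/* Stop tracking (src, slave_dst). Returns false, without touching any
 * state, when either argument is NULL, rather than dereferencing it the
 * way the faulting line in the trace does. */
static bool model_stop_dirty_tracking(ModelDrawable *src, const void *slave_dst)
{
    ModelScreen *screen;

    if (src == NULL || slave_dst == NULL)
        return false;

    screen = src->pScreen;
    assert(screen != NULL && screen->tracked > 0);
    screen->tracked--;  /* stands in for unlinking the dirty entry */
    return true;
}
```

A guard like this only avoids the NULL dereference; why a NULL source reaches the function at all is the question a caller-side fix has to answer.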
