Bug 111834

Summary: Xorg Segfault with intel driver on Intel x5-E8000 (Cherryview) hardware
Product: xorg
Reporter: Stefan Gottwald <gottwald>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED MOVED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: not set    
Priority: not set    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
i915 platform:
i915 features:

Description Stefan Gottwald 2019-09-26 12:58:27 UTC
The device showing the following problem is an IGEL UD2-LX50 (internal: M250C) with two DisplayPorts (one of them is the eDP, used as a normal DP).

The issue is hard to reproduce: we have seen it only once, on a dual-screen setup while waking up from DPMS off. The X server in this case was Xorg 1.20.x, with the intel driver built from current git.

We got the following log in Xorg.0.log:

[1725899.564] (EE) 
[1725899.564] (EE) Backtrace:
[1725899.599] (EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4e) [0x597dbe]
[1725899.599] (EE) 1: /usr/lib/xorg/Xorg (0x400000+0x19bb29) [0x59bb29]
[1725899.599] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f83f5938000+0x12890) [0x7f83f594a890]
[1725899.599] (EE) 3: /usr/lib/xorg/Xorg (0x400000+0xba104) [0x4ba104]
[1725899.600] (EE) 4: /usr/lib/xorg/Xorg (0x400000+0x118808) [0x518808]
[1725899.600] (EE) 5: /usr/lib/xorg/Xorg (FreeCursor+0x71) [0x42a8f1]
[1725899.600] (EE) 6: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f83f2a91000+0x6d7c3) [0x7f83f2afe7c3]
[1725899.600] (EE) 7: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f83f2a91000+0x6f7ff) [0x7f83f2b007ff]
[1725899.600] (EE) 8: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f83f2a91000+0x75114) [0x7f83f2b06114]
[1725899.600] (EE) 9: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f83f2a91000+0x7b793) [0x7f83f2b0c793]
[1725899.600] (EE) 10: /usr/lib/xorg/Xorg (0x400000+0x7759d) [0x47759d]
[1725899.600] (EE) 11: /usr/lib/xorg/Xorg (DPMSSet+0x76) [0x477886]
[1725899.600] (EE) 12: /usr/lib/xorg/Xorg (mieqProcessInputEvents+0x166) [0x57a866]
[1725899.600] (EE) 13: /usr/lib/xorg/Xorg (ProcessInputEvents+0x19) [0x477c59]
[1725899.600] (EE) 14: /usr/lib/xorg/Xorg (0x400000+0x36e37) [0x436e37]
[1725899.600] (EE) 15: /usr/lib/xorg/Xorg (0x400000+0x3b148) [0x43b148]
[1725899.600] (EE) 16: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xe7) [0x7f83f5568b97]
[1725899.600] (EE) 17: /usr/lib/xorg/Xorg (_start+0x29) [0x425099]
[1725899.601] (EE) 
[1725899.601] (EE) Segmentation fault at address 0x18
[1725899.601] (EE) 
Fatal server error:
[1725899.601] (EE) Caught signal 11 (Segmentation fault). Server aborting
[1725899.601] (EE) 
[1725899.601] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[1725899.601] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[1725899.601] (EE) 
[1725899.601] (II) AIGLX: Suspending AIGLX clients for VT switch
[1725899.712] (EE) Server terminated with error (1). Closing log file.
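
A fault at a very small address such as 0x18 usually means that a struct member was accessed through a NULL base pointer, so the faulting address equals the member's offset. The following is only a sketch of that arithmetic. The struct `example_obj` is hypothetical and is NOT the X server's cursor structure; which member was actually touched in this crash is not known from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; not a real X server type. */
struct example_obj {
	uint64_t a;       /* offset 0x00 */
	uint64_t b;       /* offset 0x08 */
	uint64_t c;       /* offset 0x10 */
	int      counter; /* offset 0x18 */
};

/* Verify the assumed layout at compile time (C11). */
static_assert(offsetof(struct example_obj, counter) == 0x18,
	      "counter is expected at offset 0x18");

/*
 * Address that a read of obj->counter would touch for a given base
 * address. With base == 0 (a NULL pointer) this is just the offset.
 */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct example_obj, counter);
}
```

With a base of 0, `member_address()` yields 0x18, which is the kind of address reported in the segfault line above.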

This reminded us of the Cherryview issues with hardware cursors.

Below is a small excerpt from the i915 kernel driver. It means that if you are unlucky and move the mouse cursor to the left screen border, you lose the hardware cursor. This used to be a bigger problem, because the Intel driver was then stuck with the software cursor.

/*
 * There's something wrong with the cursor on CHV pipe C.
 * If it straddles the left edge of the screen then
 * moving it away from the edge or disabling it often
 * results in a pipe underrun, and often that can lead to
 * dead pipe (constant underrun reported, and it scans
 * out just a solid color). To recover from that, the
 * display power well must be turned off and on again.
 * Refuse the put the cursor into that compromised position.
 */
if (IS_CHERRYVIEW(dev_priv) && pipe == PIPE_C &&
    plane_state->base.visible && plane_state->base.crtc_x < 0) {
	DRM_DEBUG_KMS("CHV cursor C not allowed to straddle the left screen edge\n");
	return -EINVAL;
}
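
To make the condition in the excerpt above easier to reason about outside the kernel, here is a minimal standalone sketch of the same check. All names (`pipe_id`, `cursor_state`, `check_chv_cursor`) are hypothetical stand-ins and not the kernel's own types or functions; only the logic of the condition is taken from the excerpt.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's pipe and plane state. */
enum pipe_id { PIPE_A, PIPE_B, PIPE_C };

struct cursor_state {
	bool visible; /* cursor plane is enabled */
	int  crtc_x;  /* x position on the CRTC; negative means past the left edge */
};

/* Returns 0 if the position is acceptable, -EINVAL if it is refused. */
static int check_chv_cursor(bool is_cherryview, enum pipe_id pipe,
			    const struct cursor_state *state)
{
	if (is_cherryview && pipe == PIPE_C &&
	    state->visible && state->crtc_x < 0)
		return -EINVAL;

	return 0;
}
```

Only a visible cursor on pipe C with a negative x position is refused. The same position on pipe A or B, or on non-Cherryview hardware, is accepted, which is why the userspace driver sees the hardware cursor fail only at the left border of one screen.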

The commit below helped as the cursor will switch back to hardware cursor


However, the above commit leads to new problems, as can be seen in the log. Since we had already done some debugging earlier, we dug a little deeper, which was not hard given that FreeCursor appears in the backtrace.

The following is the fix we added to our builds; no crash has been reported since this change:

--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -6428,8 +6428,10 @@ static void restore_swcursor(struct sna
 	/* XXX Force the cursor to be restored (avoiding recursion) */
-	FreeCursor(sna->cursor.ref, None);
-	sna->cursor.ref = NULL;
+	if (sna->cursor.ref) {
+		FreeCursor(sna->cursor.ref, None);
+		sna->cursor.ref = NULL;
+	}
 	RegisterBlockAndWakeupHandlers((void *)__restore_swcursor,
 				       (void *)NoopDDA,

FreeCursor is called in two other places, and there the call is already guarded by if (sna->cursor.ref), so we think this change will fix the problem. The issue appears to be a timing problem: when coming out of DPMS with a mouse move, the switch from software to hardware cursor happens at the same time, and FreeCursor is then called with a NULL pointer.

We are not certain that this is really the solution, as the problem was seen only once with log files available. We have customers reporting similar issues, but without logs. Most of them switched to the modesetting driver, which is not trouble-free either (https://gitlab.freedesktop.org/xorg/xserver/issues/880 and https://gitlab.freedesktop.org/xorg/xserver/issues/881) but seems more stable (no further reports after the switch).
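
The guard described above can be shown in isolation. This is only a sketch of the pattern under the assumption that the reference may already have been dropped by another code path. The types and functions (`cursor`, `cursor_slot`, `release_cursor`, `drop_cursor_ref`) are hypothetical and are not the real X server or SNA API.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real X server or SNA types. */
struct cursor {
	int refcnt;
};

struct cursor_slot {
	struct cursor *ref; /* may already be NULL if another path released it */
};

static int release_calls; /* counts how often the release function ran */

/* Stand-in for a release function that dereferences its argument. */
static void release_cursor(struct cursor *c)
{
	c->refcnt--; /* would fault if c were NULL */
	release_calls++;
}

/* Drop the reference at most once; safe to call when ref is NULL. */
static void drop_cursor_ref(struct cursor_slot *slot)
{
	if (slot->ref) {
		release_cursor(slot->ref);
		slot->ref = NULL;
	}
}
```

Calling `drop_cursor_ref()` twice on the same slot releases the cursor only once; the second call is a no-op instead of a NULL dereference. Note that this only hides the symptom if two paths race for the same reference, which matches the uncertainty expressed above.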
Comment 1 Martin Peres 2019-11-27 13:51:14 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel/issues/168.
