Bug 111834 - Xorg Segfault with intel driver on Intel x5-E8000 (Cherryview) hardware
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: not set not set
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Reported: 2019-09-26 12:58 UTC by Stefan Gottwald
Modified: 2019-09-26 12:58 UTC (History)

Description Stefan Gottwald 2019-09-26 12:58:27 UTC
The affected device is an IGEL UD2-LX50 (internally M250C) with two DisplayPorts (one of them is the eDP port, used as a normal DP).

The issue is hard to reproduce: we have seen it only once, on a dual-screen setup while waking up from DPMS off. The X server in this case was Xorg 1.20.x with the intel driver built from current git.

We got the following log in Xorg.0.log:

[1725899.564] (EE) 
[1725899.564] (EE) Backtrace:
[1725899.599] (EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4e) [0x597dbe]
[1725899.599] (EE) 1: /usr/lib/xorg/Xorg (0x400000+0x19bb29) [0x59bb29]
[1725899.599] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f83f5938000+0x12890) [0x7f83f594a890]
[1725899.599] (EE) 3: /usr/lib/xorg/Xorg (0x400000+0xba104) [0x4ba104]
[1725899.600] (EE) 4: /usr/lib/xorg/Xorg (0x400000+0x118808) [0x518808]
[1725899.600] (EE) 5: /usr/lib/xorg/Xorg (FreeCursor+0x71) [0x42a8f1]
[1725899.600] (EE) 6: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f83f2a91000+0x6d7c3) [0x7f83f2afe7c3]
[1725899.600] (EE) 7: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f83f2a91000+0x6f7ff) [0x7f83f2b007ff]
[1725899.600] (EE) 8: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f83f2a91000+0x75114) [0x7f83f2b06114]
[1725899.600] (EE) 9: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f83f2a91000+0x7b793) [0x7f83f2b0c793]
[1725899.600] (EE) 10: /usr/lib/xorg/Xorg (0x400000+0x7759d) [0x47759d]
[1725899.600] (EE) 11: /usr/lib/xorg/Xorg (DPMSSet+0x76) [0x477886]
[1725899.600] (EE) 12: /usr/lib/xorg/Xorg (mieqProcessInputEvents+0x166) [0x57a866]
[1725899.600] (EE) 13: /usr/lib/xorg/Xorg (ProcessInputEvents+0x19) [0x477c59]
[1725899.600] (EE) 14: /usr/lib/xorg/Xorg (0x400000+0x36e37) [0x436e37]
[1725899.600] (EE) 15: /usr/lib/xorg/Xorg (0x400000+0x3b148) [0x43b148]
[1725899.600] (EE) 16: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xe7) [0x7f83f5568b97]
[1725899.600] (EE) 17: /usr/lib/xorg/Xorg (_start+0x29) [0x425099]
[1725899.601] (EE) 
[1725899.601] (EE) Segmentation fault at address 0x18
[1725899.601] (EE) 
Fatal server error:
[1725899.601] (EE) Caught signal 11 (Segmentation fault). Server aborting
[1725899.601] (EE) 
[1725899.601] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[1725899.601] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[1725899.601] (EE) 
[1725899.601] (II) AIGLX: Suspending AIGLX clients for VT switch
[1725899.712] (EE) Server terminated with error (1). Closing log file.

This reminded us of the Cherryview issues with hardware cursors:

Below is a small excerpt from the i915 kernel driver. It means that if you are unlucky and move the mouse cursor to the left screen border, you lose the hardware cursor. This used to be a bigger problem, because the Intel driver was then stuck with the software cursor.

/*
 * There's something wrong with the cursor on CHV pipe C.
 * If it straddles the left edge of the screen then
 * moving it away from the edge or disabling it often
 * results in a pipe underrun, and often that can lead to
 * dead pipe (constant underrun reported, and it scans
 * out just a solid color). To recover from that, the
 * display power well must be turned off and on again.
 * Refuse the put the cursor into that compromised position.
 */
if (IS_CHERRYVIEW(dev_priv) && pipe == PIPE_C &&
    plane_state->base.visible && plane_state->base.crtc_x < 0) {
	DRM_DEBUG_KMS("CHV cursor C not allowed to straddle the left screen edge\n");
	return -EINVAL;
}
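To make the rejected case concrete, the condition above can be modeled as a small standalone predicate. This is only a sketch: `mock_check_cursor` and `enum mock_pipe` are hypothetical names that stand in for the kernel's plane state and pipe identifiers, and they are not part of i915.

```c
#include <assert.h>   /* only needed for the usage checks mentioned below */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's pipe identifiers. */
enum mock_pipe { MOCK_PIPE_A, MOCK_PIPE_B, MOCK_PIPE_C };

/*
 * Model of the quoted check: on Cherryview, a visible cursor on pipe C
 * whose left edge lies off-screen (crtc_x < 0) is refused with -EINVAL.
 * Every other combination is accepted by this particular check.
 */
static int mock_check_cursor(bool is_cherryview, enum mock_pipe pipe,
                             bool visible, int crtc_x)
{
    if (is_cherryview && pipe == MOCK_PIPE_C && visible && crtc_x < 0)
        return -EINVAL;
    return 0;
}
```

For example, `assert(mock_check_cursor(true, MOCK_PIPE_C, true, -1) == -EINVAL);` holds, while the same position on `MOCK_PIPE_A` or an x position of 0 on pipe C passes the check.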

The commit below helped, as the cursor now switches back to the hardware cursor:

https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=6afed33b2d673d88674f0c76efe500ae414e8e1b

But that commit leads to new problems, as the log shows. Since we had already done some debugging before, we dug a little deeper, which was not hard given that FreeCursor appears in the backtrace.

The following is the fix we added to our builds; no crash has been reported since this change:

--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -6428,8 +6428,10 @@ static void restore_swcursor(struct sna
 	sna->cursor.info->HideCursor(sna->scrn);
 
 	/* XXX Force the cursor to be restored (avoiding recursion) */
-	FreeCursor(sna->cursor.ref, None);
-	sna->cursor.ref = NULL;
+	if (sna->cursor.ref) {
+		FreeCursor(sna->cursor.ref, None);
+		sna->cursor.ref = NULL;
+	}
 
 	RegisterBlockAndWakeupHandlers((void *)__restore_swcursor,
 				       (void *)NoopDDA,

There are two other places where FreeCursor is called, and there the call is already protected by if (sna->cursor.ref) {, so we think this change will fix the problem. The issue seems to be a timing problem: coming out of DPMS with a mouse move while the driver switches from software to hardware cursor at the same time, so that FreeCursor is called with a NULL pointer.

It is not fully clear that this is really the solution, as the problem was seen only once with log files available. We have customers reporting similar issues, but with no logs available. Most of them switched to the modesetting driver, which is also not trouble free (https://gitlab.freedesktop.org/xorg/xserver/issues/880 and https://gitlab.freedesktop.org/xorg/xserver/issues/881) but seems more stable (no further reports after the switch).
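The guard in the patch can be illustrated in isolation with mock types. This is a minimal sketch, not the driver code: `struct mock_cursor`, `struct mock_cursor_state`, `mock_free_cursor` and `release_cursor_ref` are hypothetical stand-ins for the X server cursor, `sna->cursor`, `FreeCursor()` and the patched part of `restore_swcursor()`. It assumes, as the analysis above does, that the release function must never be handed a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the X server's cursor object;
 * NOT the real CursorRec layout. */
struct mock_cursor {
    int refcnt;
};

/* Hypothetical stand-in for the driver state holding the reference. */
struct mock_cursor_state {
    struct mock_cursor *ref;   /* plays the role of sna->cursor.ref */
};

static int free_calls;         /* counts release calls, for illustration */

/* Stand-in for FreeCursor(): it assumes a valid pointer and
 * dereferences it without checking. */
static void mock_free_cursor(struct mock_cursor *cursor)
{
    assert(cursor != NULL);    /* a NULL here models the crash */
    cursor->refcnt--;
    free_calls++;
}

/* Guarded release with the same shape as the patched code: drop the
 * reference only if one is held, then clear it so that a second call
 * becomes a no-op. */
static void release_cursor_ref(struct mock_cursor_state *state)
{
    if (state->ref) {
        mock_free_cursor(state->ref);
        state->ref = NULL;
    }
}
```

With this shape, calling `release_cursor_ref()` twice in a row, or calling it when no reference was ever taken, never reaches `mock_free_cursor()` with a NULL pointer.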

