Bug 102442 - i915 segfault on archlinux / dell e6430 / HD4000 with xrandr --scale
Summary: i915 segfault on archlinux / dell e6430 / HD4000 with xrandr --scale
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-27 22:27 UTC by fanf42
Modified: 2017-08-30 17:13 UTC

See Also:
i915 platform:
i915 features:


Attachments
X.org.0.log (xorg auto config for i915) (30.11 KB, text/plain)
2017-08-27 22:27 UTC, fanf42
X.org.0.log (DRI disabled) (28.15 KB, text/plain)
2017-08-27 22:29 UTC, fanf42
lspci-vv (12.86 KB, text/plain)
2017-08-27 22:32 UTC, fanf42
lspci-nn (2.36 KB, text/plain)
2017-08-27 22:33 UTC, fanf42
dmesg (101.52 KB, text/plain)
2017-08-27 22:36 UTC, fanf42
dmesg with arandr OK (107.50 KB, text/plain)
2017-08-27 22:59 UTC, fanf42
Xorg.0.log for xf86-video-intel-git 1:2.99.917+781+gc8990575-1 with --enable-debug=full (147.04 KB, application/x-xz)
2017-08-30 11:47 UTC, fanf42

Description fanf42 2017-08-27 22:27:17 UTC
Created attachment 133818 [details]
X.org.0.log (xorg auto config for i915)

While trying to understand why Firefox 55 displays a spinning wheel on a tab when I connect/disconnect external monitors ( https://bugzilla.mozilla.org/show_bug.cgi?id=1391216 ), I tried at some point a simple:

xrandr --output LVDS1 --scale "1368x768"

(where "1368x768" is a mode listed by xrandr) and it leads to an Xorg/i915 segfault.
I tried various combinations of DRI mode / AccelMethod, and:

- DRI=2, DRI=3 or DRI=False does not seem to affect whether the bug occurs,
- AccelMethod=sna leads to the segfault, whereas uxa leads to a blank screen (requiring a switch to a console and a restart of lightdm) but no error is logged in /var/log/Xorg.0.log*

The segfault is: 

8<---------------------------------
[    18.992] (EE)
[    18.992] (EE) Backtrace:
[    18.993] (EE) 0: /usr/lib/xorg-server/Xorg (OsLookupColor+0x139) [0x564b734b0f39]
[    18.993] (EE) 1: /usr/lib/libpthread.so.0 (funlockfile+0x50) [0x7f02adc9282f]
[    18.993] (EE) 2: /usr/lib/xorg/modules/drivers/intel_drv.so (_init+0x142c4) [0x7f02aa45a0a4]
[    18.993] (EE) 3: /usr/lib/xorg/modules/drivers/intel_drv.so (_init+0x5d265) [0x7f02aa4ebd35]
[    18.994] (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (_init+0x5d97c) [0x7f02aa4ed0fc]
[    18.994] (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (_init+0x689a8) [0x7f02aa503178]
[    18.994] (EE) 6: /usr/lib/xorg-server/Xorg (xf86DisableUnusedFunctions+0xf4) [0x564b733c3c04]
[    18.994] (EE) 7: /usr/lib/xorg-server/Xorg (xf86PruneDuplicateModes+0x2c1d) [0x564b733cde3d]
[    18.995] (EE) 8: /usr/lib/xorg-server/Xorg (RRCrtcSet+0x122) [0x564b7340ac42]
[    18.995] (EE) 9: /usr/lib/xorg-server/Xorg (ProcRRSetCrtcConfig+0x253) [0x564b7340c4c3]
[    18.995] (EE) 10: /usr/lib/xorg-server/Xorg (SendErrorToClient+0x368) [0x564b7334b1e8]
[    18.995] (EE) 11: /usr/lib/xorg-server/Xorg (InitFonts+0x420) [0x564b7334f1f0]
[    18.996] (EE) 12: /usr/lib/libc.so.6 (__libc_start_main+0xea) [0x7f02ad8fb4ca]
[    18.996] (EE) 13: /usr/lib/xorg-server/Xorg (_start+0x2a) [0x564b73338e9a]
[    18.996] (EE)
[    18.996] (EE) Segmentation fault at address 0x7f02b010c000
[    18.996] (EE)
Fatal server error:
[    18.996] (EE) Caught signal 11 (Segmentation fault). Server aborting

8<---------------------------------


System information:

- Archlinux
% uname -a
Linux luhman16 4.12.8-2-ARCH #1 SMP PREEMPT Fri Aug 18 14:08:02 UTC 2017 x86_64 GNU/Linux

% pacman -Q | grep xorg | sort
xorg-bdftopcf 1.0.5-1
xorg-fonts-alias 1.0.3-1
xorg-fonts-encodings 1.0.4-4
xorg-fonts-misc 1.0.3-5
xorg-fonts-type1 7.7-2
xorg-font-util 1.3.1-1
xorg-font-utils 7.6-4
xorg-iceauth 1.0.7-1
xorg-luit 1.1.1-2
xorg-mkfontdir 1.0.7-8
xorg-mkfontscale 1.1.2-1
xorg-server 1.19.3-3
xorg-server-common 1.19.3-3
xorg-server-utils 7.6-4
xorg-server-xwayland 1.19.3-3
xorg-sessreg 1.1.1-1
xorg-setxkbmap 1.3.1-1
xorg-twm 1.0.9-1
xorg-utils 7.6-9
xorg-xauth 1.0.10-1
xorg-xbacklight 1.2.1-1
xorg-xclock 1.0.7-1
xorg-xcmsdb 1.0.5-1
xorg-xdpyinfo 1.3.2-1
xorg-xdriinfo 1.0.5-2
xorg-xev 1.2.2-1
xorg-xgamma 1.0.6-1
xorg-xhost 1.0.7-1
xorg-xinit 1.3.4-4
xorg-xinput 1.6.2-1
xorg-xkbcomp 1.4.0-1
xorg-xlsatoms 1.1.2-1
xorg-xlsclients 1.1.3-1
xorg-xmodmap 1.0.9-1
xorg-xprop 1.2.2-1
xorg-xrandr 1.5.0-1
xorg-xrdb 1.1.0-2
xorg-xrefresh 1.0.5-1
xorg-xset 1.2.3-1
xorg-xsetroot 1.1.1-2
xorg-xvinfo 1.1.3-1
xorg-xwininfo 1.1.3-1

% pacman -Q | grep xf86 | sort
libxxf86dga 1.1.4-1
libxxf86vm 1.1.4-1
xf86dgaproto 2.1-3
xf86-input-evdev 2.10.5-1
xf86-input-libinput 0.25.1-1
xf86-input-synaptics 1.9.0-1
xf86-video-intel 1:2.99.917+779+g2100efa1-2
xf86-video-nouveau 1.0.15-2
xf86vidmodeproto 2.3.1-3

Attached are relevant log files (dmesg, xorg log and lspci output)
Comment 1 fanf42 2017-08-27 22:29:51 UTC
Created attachment 133819 [details]
X.org.0.log (DRI disabled)

Another Xorg log, with DRI disabled
Comment 2 fanf42 2017-08-27 22:32:24 UTC
Created attachment 133820 [details]
lspci-vv
Comment 3 fanf42 2017-08-27 22:33:22 UTC
Created attachment 133821 [details]
lspci-nn
Comment 4 fanf42 2017-08-27 22:36:34 UTC
Created attachment 133822 [details]
dmesg


I triggered the segfault between these two log lines:

[  914.124797] [drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:54:VGA-1] disconnected
[ 1978.446104] [drm:drm_mode_addfb2 [drm]] [FB:65]
Comment 5 fanf42 2017-08-27 22:59:51 UTC
Created attachment 133823 [details]
dmesg with arandr OK

And if I use arandr (right click -> resolution -> 1368x768), the scaling is done without a segfault.

That also explains why I didn't notice the segfault before (I tend to use xrandr when debugging display problems, which is quite rare, and arandr / the desktop's resolution tool for my day-to-day use).
Comment 6 Chris Wilson 2017-08-29 19:08:24 UTC
The bt is not that useful without the symbols, and that --scale explodes is quite, quite surprising. To get a clean bt, try recompiling with --enable-debug and make sure the symbols aren't stripped on install; that will be a big help.
Comment 7 fanf42 2017-08-29 23:42:24 UTC
I'm a little lost, I'm not succeeding in getting the trace with debug symbols.

I compiled xf86-video-intel with "-O -g -ffast-math -march=native -ggdb3" and added "--enable-debug" to configure.
I took care (ok, it took three tries before I succeeded) not to let Arch's makepkg strip the debug symbols. The resulting /usr/lib/xorg/modules/drivers/intel_drv.so is 15M, so it seems to have what is needed, but I still get the uninformative stack.
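
File size is only a hint. One way to confirm that the installed driver still carries its debug sections is to look for `.debug_info` in the ELF section headers. A minimal sketch, assuming binutils' `readelf` is installed; the helper name `has_debug_info` is made up for illustration:

```shell
# Hypothetical helper: succeeds if the given ELF file has a .debug_info section.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Driver path as given in this report.
drv=/usr/lib/xorg/modules/drivers/intel_drv.so
if has_debug_info "$drv"; then
    echo "$drv: debug info present"
else
    echo "$drv: no debug info found (stripped, or built without -g)"
fi
```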

Is there something else that I have to do to get the nice bt?

(I know I can do it: I also recompiled xorg-server and mesa for that other segfault, https://phab.enlightenment.org/T5957, and it was a fabulous evening. Now I know that mesa is a very big package with debug symbols; I believe that can be counted as a win?)

It's the first time I'm trying to do these things, perhaps I'm missing something obvious?
Comment 8 Chris Wilson 2017-08-30 10:43:24 UTC
Hmm, that sounds like it should have been able to pick up the symbols. The last resort is to use "sudo gdb --pid $(pidof Xorg)" from a remote login.
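
A rough sketch of that last resort, assuming a single Xorg process and an ssh session from another machine (the local display freezes while gdb holds the server stopped). The function name `attach_xorg` is hypothetical; the gdb options used (`-p`, `-ex`) are standard:

```shell
# Hypothetical helper: attach gdb to the running X server from a remote login.
# Assumes exactly one Xorg process; pidof prints several PIDs otherwise.
attach_xorg() {
    sudo gdb -p "$(pidof Xorg)" \
        -ex 'handle SIGPIPE nostop noprint pass' \
        -ex 'continue'
}
```

After calling `attach_xorg`, trigger the crash from the X session; gdb should stop on the SIGSEGV, and typing `bt full` at its prompt prints the backtrace.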

To check that it picked up the recompiled intel_drv.so, after --enable-debug you should get "SNA compiled with assertions enabled" in the Xorg.log. If that is in order, and we still don't have a good stacktrace, use --enable-debug=full and attach the compressed Xorg.log and I'll figure out where it dies based on the last debug message.
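
The banner check can be scripted. A minimal sketch, assuming the log is at /var/log/Xorg.0.log (the path mentioned earlier in this report); the helper name `sna_debug_enabled` is hypothetical:

```shell
# Hypothetical helper: succeeds if the Xorg log contains the debug-build banner.
sna_debug_enabled() {
    grep -qF 'SNA compiled with assertions enabled' "$1" 2>/dev/null
}

# Log path is an assumption; pass another path as the first argument.
log=${1:-/var/log/Xorg.0.log}
if sna_debug_enabled "$log"; then
    echo "debug build of intel_drv.so was loaded"
else
    echo "banner not found; Xorg may still be loading the stock driver"
fi
```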
Comment 9 fanf42 2017-08-30 10:51:11 UTC
Just an idea - where should I use / set "--enable-debug" ? Because perhaps I'm just not doing it right. Is it a config option of "./configure", or a parameter of xorg starting command, or something else?

(sorry if it sounds very dumb)
Comment 10 Chris Wilson 2017-08-30 10:56:28 UTC
It's an option to ./configure (or ./autogen.sh) of xf86-video-intel.
Comment 11 fanf42 2017-08-30 11:45:47 UTC
So, I did the correct thing, and I do get the "SNA compiled with assertions enabled" message.

So here goes the full debug log. 
What I did (if it helps understanding):

- in console, restart lightdm; 
- switch to console 7, log-in in lightdm
- enlightenment starts
- open a terminal, enter: "xrandr -s 800x600", enter 

=> segfault, lightdm restarts. 

Switch back to console 1, copy/compress X.org.log.old. 

Hope it helps. 

Don't hesitate to ask if you need some other piece of xorg recompiled with debug symbols, I'm starting to be good at that :) (ok, modulo the critical fails here)
Comment 12 fanf42 2017-08-30 11:47:36 UTC
Created attachment 133876 [details]
Xorg.0.log for xf86-video-intel-git 1:2.99.917+781+gc8990575-1 with --enable-debug=full
Comment 13 fanf42 2017-08-30 11:49:39 UTC
Comment on attachment 133876 [details]
Xorg.0.log for xf86-video-intel-git 1:2.99.917+781+gc8990575-1 with --enable-debug=full

xf86-video-intel-git 1:2.99.917+781+gc8990575-1 with --enable-debug=full
Comment 14 Chris Wilson 2017-08-30 12:07:58 UTC
Ah, can you tweak the assert

diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index d1f01218..3f70d536 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -565,7 +565,7 @@ static void assert_scanout(struct kgem *kgem, struct kgem_bo *bo,
        assert(drmIoctl(kgem->fd, DRM_IOCTL_MODE_GETFB, &info) == 0);
        gem_close(kgem->fd, info.handle);
 
-       assert(width == info.width && height == info.height);
+       assert(width <= info.width && height <= info.height);
 }
 #else
 #define assert_scanout(k, b, w, h)

and try again?
Comment 15 fanf42 2017-08-30 16:39:10 UTC
Yep, that worked :) 

Congrats !
Comment 16 Chris Wilson 2017-08-30 16:47:05 UTC
Uhoh, that will have only fixed up an assert that you could not have hit before we started testing...
Comment 17 fanf42 2017-08-30 17:13:55 UTC
That's really funny :) OK, I will try to reproduce - but I can't with the previous method, so I need to understand what changed (perhaps I was just using a debug package all along, which can't be totally excluded since 1/ as you said, a bug in --scale is highly unlikely and 2/ I was trying to debug other things: https://phab.enlightenment.org/T5941 https://phab.enlightenment.org/T5957)

Thanks for the help in any case!

