Bug 104388 - [snb] GPU HANG: ecode 6:0:0x85fffff8 in fgfs
Summary: [snb] GPU HANG: ecode 6:0:0x85fffff8 in fgfs
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: low normal
Assignee: Kenneth Graunke
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords: bisected
Depends on:
Blocks:
 
Reported: 2017-12-26 20:58 UTC by Fabrice Bellet
Modified: 2018-08-02 06:06 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
error file (78.95 KB, text/plain)
2017-12-26 20:58 UTC, Fabrice Bellet
glxinfo (40.31 KB, text/plain)
2017-12-26 20:59 UTC, Fabrice Bellet
workaround (1.31 KB, patch)
2018-05-11 12:31 UTC, vadym

Description Fabrice Bellet 2017-12-26 20:58:22 UTC
Created attachment 136392
error file

On a ThinkPad X220 with Intel HD Graphics 3000, running a stock version of Fedora 27 x86_64 and using Xorg instead of Wayland, I can trigger a GPU hang pretty easily by running FlightGear, the flight simulator, when the window has a particular size.

The content of /sys/class/drm/card0/error is attached to this bug.

I tested with Mesa-17.2.2 (packaged in Fedora 27) and with a custom build of Mesa-17.3.1; both show the same problem.

Here is how to reproduce the bug, for example from a Fedora 27 Workstation live USB stick (with Fedora 27 installed on disk, you can skip directly to the dnf install command):

- make sure that gdm will use Xorg instead of Wayland:
# rpm -e xorg-x11-server-Xwayland gnome-session-wayland-session --nodeps
- log out liveuser and log in again
- verify that Xorg is running.
- free some space on / (the live USB stick may not have enough free space to hold the 2.5GB of FlightGear scenery data), for example by removing some *big* packages:
# rpm -qa | grep libreoffice| xargs rpm -e gnome-documents unoconv
# rpm -qa | grep adobe-source | xargs rpm -e
# rpm -qa | egrep '(qemu|libvirt)' | xargs rpm -e gnome-boxes
# rpm -e glibc-all-langpacks java-1.8.0-openjdk-headless gnome-getting-started-docs  javapackages-tools
# rpm -e  gnome-user-docs fedora-workstation-backgrounds
# rpm -e foomatic-db-ppds cldr-emoji-annotation libpinyin-data evolution-langpacks evolution ibus-libpinyin foomatic-db  evolution-ews evolution-help libpinyin foomatic ibus-typing-booster evolution-ews-langpacks libzhuyin  ibus-libzhuyin
# rpm -e firefox
# rpm -e iwl7260-firmware webkitgtk4-plugin-process-gtk2 qt-x11 unicode-ucd libkkc-data  dbusmenu-qt sni-qt adwaita-qt4 ibus-qt libkkc  ibus-kkc
- install FlightGear and wmctrl
# dnf install FlightGear wmctrl
- start it using these startup parameters:
$ fgfs --airport=LKPR --aircraft=mibs --timeofday=afternoon --season=summer --disable-real-weather-fetch --prop:/sim/rendering/multi-sample-buffers=true --prop:/sim/rendering/multi-samples=2 --disable-rembrandt --enable-freeze --disable-terrasync --glideslope=6.0 --offset-distance=5 --on-ground=false --disable-auto-coordination --metar="XXXX 012345Z 15003KT 19SM FEW072 FEW350 25/07 Q1028 NOSIG"
- from another terminal window, resize the FlightGear window to 1024x717, and verify with xwininfo that the window has the expected size (if the window is stuck against the top bar in GNOME, it won't resize properly; the bug also happens if the window is partially offscreen):
$ wmctrl -r FlightGear -e 0,200,200,1024,717
- unfreeze the simulator by pressing the "P" key
- move the view up about a dozen times by pressing the "down arrow" key; you should see sparse clouds in the sky
- press the "left" or "right" arrow key to rotate the view of the sky
- the GPU should hang pretty quickly in this situation (< 30 seconds)

The window height seems to be an important condition for triggering this hang.
Comment 1 Fabrice Bellet 2017-12-26 20:59:27 UTC
Created attachment 136393
glxinfo
Comment 2 Elizabeth 2017-12-27 16:15:23 UTC
Hello Fabrice,
Could you capture an apitrace file of the issue?
http://apitrace.github.io/
Comment 3 Fabrice Bellet 2017-12-27 19:52:24 UTC
Sure, here is one (133 MB):

https://bellet.info/apitrace/fgfs.trace.bz2

[root@localhost ~]# md5sum fgfs.trace.bz2 
baad65506432041193a706c527310e9a  fgfs.trace.bz2
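
A downloaded copy of the trace can be checked against the MD5 sum above. The following is a minimal Python sketch, assuming the file was saved locally as fgfs.trace.bz2; the helper name md5_of_file is hypothetical and not part of any tool mentioned in this thread.

```python
import hashlib

# Checksum quoted in this comment for fgfs.trace.bz2.
EXPECTED_MD5 = "baad65506432041193a706c527310e9a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a 133 MB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing md5_of_file("fgfs.trace.bz2") with EXPECTED_MD5 then tells whether the download is intact.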
Comment 4 Elizabeth 2017-12-27 20:36:17 UTC
Awesome! Thank you. Let me see who can lend me a hand with this.
Comment 5 Elizabeth 2017-12-28 15:20:37 UTC
Hello again, it seems that the apitrace won't be enough. If you have some time, could you try to find a working commit? Thank you.
Comment 6 Fabrice Bellet 2017-12-28 17:01:32 UTC
A working commit of Mesa git?
Comment 7 Fabrice Bellet 2017-12-28 20:54:08 UTC
I tested several older Mesa/kernel versions, and _all_ these versions have the same problem:

mesa-11.0.0-2.20150913.fc23
mesa-12.0.3-2.fc26
mesa-13.0.3-3.fc26
mesa-17.0.1-1.fc27
mesa-17.0.3-1.fc27
mesa-17.1.3-2.fc27
mesa-17.2.4-2.fc27
mesa-17.3.1 (local build)

kernel-4.14.8-300.fc27.x86_64
kernel-4.13.9-300.fc27.x86_64
kernel-4.11.8-300.fc26.x86_64
kernel-4.8.6-300.fc25.x86_64
kernel-4.6.5-300.fc24.x86_64
Comment 8 Fabrice Bellet 2017-12-30 13:21:33 UTC
In case it helps to narrow down the issue: I played with various fgfs startup options and noticed that _both_ the multi-sample option and an odd window height (height & 1 == 1) are required to trigger this bug. An odd window width is safe.


Changing the multi-sample option causes a different visual to be selected; the same visual (190) is used whether 2 or 4 samples are requested:

glXChooseVisual(0x55d7155916a0, 0, {1, GLX_RGBA, GLX_DOUBLEBUFFER, GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8, GLX_BLUE_SIZE, 8, GLX_DEPTH_SIZE, 24, GLX_STENCIL_SIZE, 8, GLX_SAMPLES, 2, 0}) = &{visual = 0x55d71558fe78, visualid = 190, screen = 0, depth = 24, c_class = 4, red_mask = 16711680, green_mask = 65280, blue_mask = 255, colormap_size = 256, bits_per_rgb = 8}

glXChooseVisual(0x559880b4ff20, 0, {1, GLX_RGBA, GLX_DOUBLEBUFFER, GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8, GLX_BLUE_SIZE, 8, GLX_DEPTH_SIZE, 24, GLX_STENCIL_SIZE, 8, 0}) = &{visual = 0x559880b49df8, visualid = 182, screen = 0, depth = 24, c_class = 4, red_mask = 16711680, green_mask = 65280, blue_mask = 255, colormap_size = 256, bits_per_rgb = 8}
Comment 9 Fabrice Bellet 2017-12-30 22:13:58 UTC
OK, I git-bisected the 9.0 branch, because 9.0-branchpoint was affected, but mesa-9.0 was not, and I found that commit dbe13c105f fixes the hang.
Comment 10 Fabrice Bellet 2018-01-09 17:02:42 UTC
More details:
 * it works if I use a fixed guardband size of (-1,1,-1,1), which I think amounts to aligning the guardband with the viewport.

 * it also works if I use the top-level window size (1024,717) as the gb_size value in the function brw_calculate_guardband_size(), instead of (8192,8192). Moreover, it hangs when I choose a gb_size value greater than 718 in the computation of ss_gb_ymin and ss_gb_ymax (and only in the y-range) in this same function. Of course, if a gb_size lower than 717 is used, guardband clipping artifacts become visible, but it doesn't hang.

If needed I can provide a trace of the params used when brw_calculate_guardband_size() is called.
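
For readers unfamiliar with the guardband computation discussed in this comment, here is a rough Python sketch of the arithmetic. It is an illustration only, not the Mesa source: the names gb_size, ss_gb_ymin and ss_gb_ymax come from the comment above, while the function name guardband_ndc, the scale/trans parameter names, and the fallback for a degenerate viewport are assumptions.

```python
def guardband_ndc(fb_width, fb_height, scale_x, scale_y, trans_x, trans_y,
                  gb_size=8192.0):
    """Return (xmin, xmax, ymin, ymax) of a guardband in NDC.

    Assumed viewport transform: screen = ndc * scale + trans.
    """
    if scale_x == 0 or scale_y == 0:
        # Degenerate viewport: fall back to the viewport itself (assumption).
        return (-1.0, 1.0, -1.0, 1.0)

    # Screen-space render area: union of the framebuffer and the viewport.
    ss_ra_xmin = min(0.0, trans_x + scale_x, trans_x - scale_x)
    ss_ra_xmax = max(float(fb_width), trans_x + scale_x, trans_x - scale_x)
    ss_ra_ymin = min(0.0, trans_y + scale_y, trans_y - scale_y)
    ss_ra_ymax = max(float(fb_height), trans_y + scale_y, trans_y - scale_y)

    # Center a guardband of half-extent gb_size on that area.
    ss_gb_xmin = (ss_ra_xmin + ss_ra_xmax) / 2 - gb_size
    ss_gb_xmax = (ss_ra_xmin + ss_ra_xmax) / 2 + gb_size
    ss_gb_ymin = (ss_ra_ymin + ss_ra_ymax) / 2 - gb_size
    ss_gb_ymax = (ss_ra_ymin + ss_ra_ymax) / 2 + gb_size

    # Back to NDC by inverting the viewport transform; sorting handles a
    # negative scale (flipped axis).
    ndc_x = sorted(((ss_gb_xmin - trans_x) / scale_x,
                    (ss_gb_xmax - trans_x) / scale_x))
    ndc_y = sorted(((ss_gb_ymin - trans_y) / scale_y,
                    (ss_gb_ymax - trans_y) / scale_y))
    return (ndc_x[0], ndc_x[1], ndc_y[0], ndc_y[1])


# A viewport covering the whole 1024x717 window (assumed setup).
print(guardband_ndc(1024, 717, 512.0, 358.5, 512.0, 358.5))
```

With these assumed inputs the horizontal guardband is exactly (-16, 16), and the vertical one is about (-22.85, 22.85). Note that the vertical center of a 717-pixel-high render area falls on a half pixel (358.5); whether that is related to the hang is not established in this thread.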
Comment 11 vadym 2018-05-11 12:30:54 UTC
I am able to reproduce this on the same SandyBridge platform, but with Ubuntu 16.04 (with stock mesa-17.2.8). So this is definitely not a Fedora-specific issue. It is reproducible only on SandyBridge (not reproducible on Haswell and Kabylake).

Very strange behavior: this is not simply an odd-height problem. A 1280x801 window doesn't hang, but 1280x799 does. Odd widths are affected too, not only odd heights (e.g., 1025x718). I didn't find any clear pattern here. Attaching a workaround patch.
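
The window sizes reported so far in this thread can be checked against simple parity rules. The sketch below is only a bookkeeping aid: it assumes all observations were made with the same multisample settings, and the two rule functions are hypothetical candidates, not the logic of any patch.

```python
# (width, height, hangs) as reported in this thread.
OBSERVATIONS = [
    (1024, 717, True),   # original report
    (1280, 801, False),  # comment 11
    (1280, 799, True),   # comment 11
    (1025, 718, True),   # comment 11
]


def odd_height_rule(width, height):
    """Candidate rule from comment 8: only an odd height matters."""
    return height % 2 == 1


def any_odd_dimension_rule(width, height):
    """More conservative candidate: either dimension is odd."""
    return width % 2 == 1 or height % 2 == 1


def evaluate(rule):
    """Return (missed_hangs, false_alarms) for a candidate rule."""
    missed = [(w, h) for w, h, hangs in OBSERVATIONS
              if hangs and not rule(w, h)]
    false_alarms = [(w, h) for w, h, hangs in OBSERVATIONS
                    if not hangs and rule(w, h)]
    return missed, false_alarms


for rule in (odd_height_rule, any_odd_dimension_rule):
    print(rule.__name__, evaluate(rule))
```

The odd-height rule misses the 1025x718 hang, while the any-odd-dimension rule covers every reported hang. Both flag 1280x801, which did not hang, so neither rule describes the real trigger exactly.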
Comment 12 vadym 2018-05-11 12:31:50 UTC
Created attachment 139498
workaround
Comment 13 Fabrice Bellet 2018-05-28 10:30:21 UTC
I can no longer help with this bug, because I don't have the required gen6 hardware anymore.
Comment 14 Denis 2018-08-01 12:04:25 UTC
Hi, the patch was accepted and merged into Mesa master. I think the issue can be closed as fixed.


commit 399228ecad37f420be3028165b94d5d8d33516fc
Author: vadym.shovkoplias <vadim.shovkoplias@gmail.com>
Date:   Thu May 24 14:16:46 2018 +0300

    i965: Disable guardband clipping on SandyBridge for odd dimensions
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104388
    Signed-off-by: Andriy Khulap <andriy.khulap@globallogic.com>
    Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>

