Created attachment 136392 [details] error file

On a ThinkPad X220 with Intel HD Graphics 3000, running a stock Fedora 27 x86_64 and using Xorg instead of Wayland, I can trigger a GPU hang pretty easily by running FlightGear, the flight simulator, when the window has a particular size. The /sys/class/drm/card0/error file is attached to this bug. I tested with Mesa-17.2.2 (packaged in Fedora 27) and with a custom build of Mesa-17.3.1, and both show the same problem.

Here is how to reproduce the bug, for example from a Fedora 27 Workstation live USB stick (with Fedora 27 installed on disk, you can skip directly to the dnf install command):

- make sure that gdm will use Xorg instead of Wayland:
  # rpm -e xorg-x11-server-Xwayland gnome-session-wayland-session --nodeps
- log out of the liveuser session and log in again
- verify that Xorg is running
- free some space on / (the live USB stick may not have enough free space to hold the 2.5GB of FlightGear scenery data), for example by removing some *big* packages:
  # rpm -qa | grep libreoffice | xargs rpm -e gnome-documents unoconv
  # rpm -qa | grep adobe-source | xargs rpm -e
  # rpm -qa | egrep '(qemu|libvirt)' | xargs rpm -e gnome-boxes
  # rpm -e glibc-all-langpacks java-1.8.0-openjdk-headless gnome-getting-started-docs javapackages-tools
  # rpm -e gnome-user-docs fedora-workstation-backgrounds
  # rpm -e foomatic-db-ppds cldr-emoji-annotation libpinyin-data evolution-langpacks evolution ibus-libpinyin foomatic-db evolution-ews evolution-help libpinyin foomatic ibus-typing-booster evolution-ews-langpacks libzhuyin ibus-libzhuyin
  # rpm -e firefox
  # rpm -e iwl7260-firmware webkitgtk4-plugin-process-gtk2 qt-x11 unicode-ucd libkkc-data dbusmenu-qt sni-qt adwaita-qt4 ibus-qt libkkc ibus-kkc
- install FlightGear and wmctrl:
  # dnf install FlightGear wmctrl
- start FlightGear with these startup parameters:
  $ fgfs --airport=LKPR --aircraft=mibs --timeofday=afternoon --season=summer --disable-real-weather-fetch \
      --prop:/sim/rendering/multi-sample-buffers=true --prop:/sim/rendering/multi-samples=2 \
      --disable-rembrandt --enable-freeze --disable-terrasync --glideslope=6.0 --offset-distance=5 \
      --on-ground=false --disable-auto-coordination \
      --metar="XXXX 012345Z 15003KT 19SM FEW072 FEW350 25/07 Q1028 NOSIG"
- from another terminal window, resize the FlightGear window to 1024x717, and verify with xwininfo that the window has the expected size (if the window is stuck just below the GNOME top bar, it won't resize properly; the bug also happens if the window is partially offscreen):
  $ wmctrl -r FlightGear -e 0,200,200,1024,717
- unfreeze the simulator by hitting the "P" key
- move the view up a dozen times by hitting the "down arrow" key; you should see sparse clouds in the sky
- press the "left" or "right" arrow key to rotate the view of the sky
- the GPU should freeze pretty quickly in this situation (<30 seconds)

The window height seems to be an important condition to trigger this crash.
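The "verify with xwininfo" step can be scripted. The following is a minimal sketch, not part of the original report: it assumes the usual "Width:" and "Height:" lines in xwininfo output, and it assumes the window title is exactly "FlightGear" (xwininfo -name wants an exact match, unlike wmctrl -r, so the name may need adjusting). The helper names are made up for illustration.

```python
import re
import subprocess


def parse_geometry(xwininfo_text):
    """Extract (width, height) from the text printed by xwininfo."""
    width = re.search(r"^\s*Width:\s*(\d+)", xwininfo_text, re.MULTILINE)
    height = re.search(r"^\s*Height:\s*(\d+)", xwininfo_text, re.MULTILINE)
    if not (width and height):
        raise ValueError("no Width/Height lines found in xwininfo output")
    return int(width.group(1)), int(height.group(1))


def window_geometry(name="FlightGear"):
    """Run xwininfo on the named window (assumed title) and parse its size."""
    result = subprocess.run(
        ["xwininfo", "-name", name],
        check=True, capture_output=True, text=True,
    )
    return parse_geometry(result.stdout)


if __name__ == "__main__":
    w, h = window_geometry()
    print(f"{w}x{h}", "(expected 1024x717)" if (w, h) != (1024, 717) else "(ok)")
```

Running it after the wmctrl command should report 1024x717 if the resize took effect.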
Created attachment 136393 [details] glxinfo
Hello Fabrice, do you think that you could get an apitrace file of the issue? http://apitrace.github.io/
Sure, here is one (133MB): https://bellet.info/apitrace/fgfs.trace.bz2

[root@localhost ~]# md5sum fgfs.trace.bz2
baad65506432041193a706c527310e9a  fgfs.trace.bz2
Awesome! Thank you. Let me see who can lend me a hand with this.
Hello again, it seems that the apitrace alone won't be enough. If you have some time, could you try to find a working commit? Thank you.
A working commit in the Mesa git repository?
I tested several older Mesa and kernel versions, and _all_ of them have the same problem:

mesa-11.0.0-2.20150913.fc23
mesa-12.0.3-2.fc26
mesa-13.0.3-3.fc26
mesa-17.0.1-1.fc27
mesa-17.0.3-1.fc27
mesa-17.1.3-2.fc27
mesa-17.2.4-2.fc27
mesa-17.3.1 (local build)

kernel-4.14.8-300.fc27.x86_64
kernel-4.13.9-300.fc27.x86_64
kernel-4.11.8-300.fc26.x86_64
kernel-4.8.6-300.fc25.x86_64
kernel-4.6.5-300.fc24.x86_64
In case it helps to narrow down the issue, I played with various fgfs startup options, and noticed that _both_ the multi-sample option and an odd window height ((height & 1) == 1) must be selected to trigger this bug. An odd window width is safe.

Changing the multi-sample option causes a different visual to be selected. The same visual (190) is used whether 2 or 4 samples are requested; without multisampling, visual 182 is selected:

glXChooseVisual(0x55d7155916a0, 0, {1, GLX_RGBA, GLX_DOUBLEBUFFER, GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8, GLX_BLUE_SIZE, 8, GLX_DEPTH_SIZE, 24, GLX_STENCIL_SIZE, 8, GLX_SAMPLES, 2, 0}) = &{visual = 0x55d71558fe78, visualid = 190, screen = 0, depth = 24, c_class = 4, red_mask = 16711680, green_mask = 65280, blue_mask = 255, colormap_size = 256, bits_per_rgb = 8}

glXChooseVisual(0x559880b4ff20, 0, {1, GLX_RGBA, GLX_DOUBLEBUFFER, GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8, GLX_BLUE_SIZE, 8, GLX_DEPTH_SIZE, 24, GLX_STENCIL_SIZE, 8, 0}) = &{visual = 0x559880b49df8, visualid = 182, screen = 0, depth = 24, c_class = 4, red_mask = 16711680, green_mask = 65280, blue_mask = 255, colormap_size = 256, bits_per_rgb = 8}
OK, I git-bisected the 9.0 branch, because 9.0-branchpoint was affected, but mesa-9.0 was not, and I found that commit dbe13c105f fixes the hang.
More details:

* it works if I use a fixed guardband size of (-1,1,-1,1), which I think amounts to aligning the guardband with the viewport.
* it also works if I use the top-level window size (1024,717) as the gb_size value in the function brw_calculate_guardband_size(), instead of (8192,8192).

Moreover, it hangs when I choose a gb_size value greater than 718 in the computation of ss_gb_ymin and ss_gb_ymax (and only in the y-range) in this same function. Of course, if a gb_size lower than 717 is used, guardband clipping artifacts become visible, but it doesn't hang.

If needed, I can provide a trace of the parameters used when brw_calculate_guardband_size() is called.
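To make the quantities in this comment concrete, here is a rough Python model of a guardband computation of this kind. It is only a sketch, not the Mesa source: the names gb_size, ss_gb_ymin and ss_gb_ymax come from the comment above, while the viewport-transform parameters (m00, m11 for scale, m30, m31 for translation), the per-axis gb_x/gb_y split, and the assumption that the guardband is centred on the screen-space render area are illustrative assumptions.

```python
def guardband_ndc(fb_width, fb_height, m00, m11, m30, m31,
                  gb_x=8192.0, gb_y=8192.0):
    """Return (xmin, xmax, ymin, ymax) of the guardband in NDC.

    gb_x / gb_y are half-extents in pixels (assumed model, see lead-in).
    """
    # Screen-space render area covered by the framebuffer and the viewport.
    ss_ra_xmin = min(0.0, m30 + m00, m30 - m00)
    ss_ra_xmax = max(float(fb_width), m30 + m00, m30 - m00)
    ss_ra_ymin = min(0.0, m31 + m11, m31 - m11)
    ss_ra_ymax = max(float(fb_height), m31 + m11, m31 - m11)

    # Guardband centred on the render area, still in screen space.
    cx = (ss_ra_xmin + ss_ra_xmax) / 2
    cy = (ss_ra_ymin + ss_ra_ymax) / 2
    ss_gb_xmin, ss_gb_xmax = cx - gb_x, cx + gb_x
    ss_gb_ymin, ss_gb_ymax = cy - gb_y, cy + gb_y

    # Undo the viewport transform to get normalized device coordinates.
    ndc_x = [(ss_gb_xmin - m30) / m00, (ss_gb_xmax - m30) / m00]
    ndc_y = [(ss_gb_ymin - m31) / m11, (ss_gb_ymax - m31) / m11]
    # A negative scale (flipped axis) would swap the bounds, so sort them.
    return (min(ndc_x), max(ndc_x), min(ndc_y), max(ndc_y))


# Full-window viewport for the 1024x717 case from this report.
print(guardband_ndc(1024, 717, 512.0, 358.5, 512.0, 358.5))
```

In this model, shrinking the half-extents to half the viewport size (gb_x=512, gb_y=358.5) collapses the result to (-1, 1, -1, 1), which is the fixed guardband mentioned in the first bullet. Note that an odd height gives a non-integer y scale and centre (358.5 here); whether that is related to the hang is not established in this thread.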
I am able to reproduce this on the same SandyBridge hardware, but with Ubuntu 16.04 (with stock mesa-17.2.8), so this is definitely not a Fedora-specific issue. It is reproducible only on SandyBridge (not reproducible on Haswell and Kabylake).

The behavior is very strange. It is not simply an odd-height problem: a 1280x801 window doesn't hang, but 1280x799 does. Also, not only odd heights are affected, but odd widths too (e.g. 1025x718). I didn't find any clear pattern here.

Attaching a workaround patch.
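The window sizes quoted in this thread can be checked against the two simplest parity rules. This is only a bookkeeping sketch: the observations are the four sizes mentioned in the comments (1024x717 from the original report, the other three from this comment), and the rule names are made up for illustration.

```python
# (width, height, hung?) as reported in this thread.
OBSERVATIONS = [
    (1024, 717, True),
    (1280, 801, False),
    (1280, 799, True),
    (1025, 718, True),
]


def odd_height(width, height):
    """Rule 1: only an odd height triggers the hang."""
    return height % 2 == 1


def any_odd(width, height):
    """Rule 2: an odd width or an odd height triggers the hang."""
    return width % 2 == 1 or height % 2 == 1


def mismatches(rule):
    """Sizes where the rule's prediction disagrees with the observation."""
    return [(w, h) for (w, h, hung) in OBSERVATIONS if rule(w, h) != hung]


for rule in (odd_height, any_odd):
    print(rule.__name__, "disagrees on", mismatches(rule))
```

Neither rule explains every observation: the odd-height rule misses 1025x718 and wrongly predicts a hang at 1280x801, while the any-odd rule only over-predicts at 1280x801. A workaround keyed on "any odd dimension" is therefore conservative with respect to these four data points.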
Created attachment 139498 [details] [review] workaround
I can no longer help on this bug, because I don't have the required gen6 hardware anymore.
Hi, the patch was accepted and added to mesa-master. I think the issue can be closed as fixed.

commit 399228ecad37f420be3028165b94d5d8d33516fc
Author: vadym.shovkoplias <vadim.shovkoplias@gmail.com>
Date:   Thu May 24 14:16:46 2018 +0300

    i965: Disable guardband clipping on SandyBridge for odd dimensions

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104388
    Signed-off-by: Andriy Khulap <andriy.khulap@globallogic.com>
    Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>