Created attachment 137426 [details] error dump

Hello,

When I leave the modesetting driver's AccelMethod option at its default, I very easily get a GPU hang. I just need to run:

startx fvwm

I get an X session with eDP-1 on the left and DP-2 on the right, which I change with

xrandr --output eDP-1 --auto --output DP-2 --auto --above eDP-1

and then I get a GPU hang. I am attaching the /sys/class/drm/card0/error output now, and will attach other logs as well.
Created attachment 137427 [details] Xorg log
Created attachment 137428 [details] dmesg
Created attachment 137429 [details] lspci
Created attachment 137430 [details] xorg.conf
Created attachment 137431 [details] packages versions
Hello Samuel, does it make any difference if you add intel_iommu=igfx_off to the kernel command line in grub? What distro and desktop environment are you using? Would it be possible for you to test with mesa 18.0.0-rc4?
It makes a huge difference indeed! No issue at all so far, despite trying to play videos, do some OpenGL, etc.
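For anyone else trying this workaround, it can help to confirm that the option really reached the running kernel by reading /proc/cmdline. Below is a minimal Python sketch of that check. The helper name `intel_iommu_options` is made up for illustration, and the comma splitting assumes the kernel accepts several sub-options in one `intel_iommu=` parameter (for example `on,igfx_off`).

```python
from pathlib import Path


def intel_iommu_options(cmdline):
    """Collect every sub-option passed via intel_iommu= on a kernel command line.

    Hypothetical helper for illustration; it only inspects the string and
    does not query the IOMMU driver itself.
    """
    options = []
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            value = token.split("=", 1)[1]
            # Assumption: sub-options are comma separated, e.g. "on,igfx_off".
            options.extend(part for part in value.split(",") if part)
    return options


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        # /proc/cmdline holds the parameters the running kernel was booted with.
        found = intel_iommu_options(proc_cmdline.read_text())
        print("intel_iommu options:", found)
        print("igfx_off active on the command line:", "igfx_off" in found)
```

If `igfx_off` is not listed after a reboot, the grub configuration was probably not regenerated.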
Hello Samuel, were you able to test with mesa 18.0.0-rc4 or the latest 17.3.6 release?
17.3.6 gave the same result. I have just upgraded to Debian experimental's 18.0.0-rc4 and my reproduction test case no longer shows any issue. I'll see how well it goes in the long run.
Created attachment 138121 [details] error dump

This morning the same symptom happened, here is the error dump.
linux 4.15.0, mesa 18.0.0~rc4
(In reply to Samuel Thibault from comment #10)
> Created attachment 138121 [details]
> error dump
>
> This morning the same symptom happened, here is the error dump.
> linux 4.15.0, mesa 18.0.0~rc4

So the steps from comment #1 still reproduce it?
Yes. I guess I was just lucky yesterday and should have tried more times.
Hi, I'm going to try to replicate the issue. I just found a BDW with a DP output, but it was having HW issues, so I'll try with another BDW with HDMI. To summarize: Arch Linux + fvwm + eDP and an external display + AccelMethod left at its default, and finally use xrandr to change the outputs, right?
It's Debian Buster :) but yes that's it.
(In reply to Samuel Thibault from comment #14)
> It's Debian Buster :) but yes that's it.

Oh, you're right: (Debian 7.3.0-1). I was looking at a different log.
Hi again, I failed to replicate on a BDW with HDMI. This is what I have done so far:

1. Install debian sid
2. Update and upgrade system
3. Install newer kernel (because I needed to apply this patch https://patchwork.kernel.org/patch/10156067/ for acpi)
4. Install fvwm
5. startx
6. Connect HDMI output (after booting had finished)
7. Used: xrandr --output eDP-1 --auto --output DP-2 --auto --(above/below/left-of/right-of) eDP-1
8. No hang happened. (I can use a terminal only on the primary display, not sure if that's expected from fvwm.)

As for the AccelMethod, I did nothing to it:

gfx@debian:~$ sudo find / -name xorg.conf
/usr/share/doc/xserver-xorg-video-intel/xorg.conf
gfx@debian:~$ cat /usr/share/doc/xserver-xorg-video-intel/xorg.conf
Section "Device"
        Identifier "Intel"
        Driver "intel"
#       Option "AccelMethod" "uxa"
EndSection
gfx@debian:~$

What am I missing to be able to replicate? Thanks.

gfx@debian:~$ uname -a
Linux debian 4.16.0-rc6 #1 SMP Tue Mar 20 02:10:14 PDT 2018 x86_64 GNU/Linux
gfx@debian:~$ glxinfo | grep -i "opengl version"
OpenGL version string: 3.0 Mesa 17.3.6
gfx@debian:~$
Hello,

fvwm indeed doesn't currently detect xinerama changes, so let's just take it out of the picture.

- I boot with the external screen plugged in on VGA, so the Linux console shows up on both screens.
- I run startx xterm. It works fine, with one screen to the right of the other.
- In the xterm window, I run "xrandr --output eDP-1 --auto --output DP-2 --auto --above eDP-1" to get one screen above the other.
- I can press Enter in xterm a couple of times and it still works, until it has to scroll, and at that point things hang.

I'll attach the xrandr output before and after the change, in case details there matter.
Created attachment 138256 [details] before xrandr call
Created attachment 138257 [details] after xrandr call
And I tried again with the HDMI connected from the beginning, and no issue so far.

Could you try the same kernel as mine? It's the mainline one from https://www.kernel.org. You could also add the parameter drm.debug=0x1e in grub to get more debug information, and run dmesg -w over ssh to check for any errors occurring just before the hang.

You mentioned VGA before, is that correct? Don't you mean DP?

The error state indicates a mesa-related issue, but did this work with a previous version of mesa?
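For reference, drm.debug is a bitmask of message categories. The Python sketch below decodes such a mask. The bit table reflects the kernel's DRM_UT_* definitions as I remember them, so treat it as an assumption and check it against the kernel version in use; newer kernels define further bits that are not listed here. The function name `decode_drm_debug` is made up for illustration.

```python
# Assumed category bits (DRM_UT_* in the kernel headers); verify against
# the kernel source for the version you are running.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vbl",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x1E))
```

Under that table, 0x1e enables the driver, kms, prime and atomic categories and leaves out the very verbose core messages.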
Created attachment 138292 [details] dmesg 4.5.12

I'll have to recompile that kernel, but for now here are the dmesg results with 4.15.12 and drm.debug=0x1e.

I actually didn't need to use ssh because, even if Xorg looks frozen, ctrl-alt-f2 works (it just takes some time to take effect; apparently moving the mouse helps).

> You mentioned VGA before, is that correct? Don't you mean DP?

Well, it's really a VGA plug that I have on this laptop, even if in xrandr it happens to be called DP-2.

> did this work with a previous version of mesa?

As far as I can remember, I have never been able to get a stable Xorg workspace without disabling acceleration, before adding intel_iommu=igfx_off.
Created attachment 138304 [details] dmesg 4.16.0-rc6

The symptoms are a bit different with 4.16.0-rc6 indeed (it seems it is able to recover), but I'm still getting hangs. Here is the dmesg with debugging.
Well, I found it interesting that these messages appear just before the hang report in dmesg:

[ 154.524158] DMAR: DRHD: handling fault status reg 3
[ 154.524160] DMAR: [DMA Read] Request device [00:02.0] fault addr 38d000 [fault reason 05] PTE Write access is not set

And looking around, I believe I found why you have to use intel_iommu=igfx_off:

"On HPE ProLiant Gen9-series servers running Red Hat Enterprise Linux 6, Red Hat Enterprise Linux 7, SUSE Linux Enterprise Server 11 SP3, or SUSE Linux Enterprise Server 12 with the I/O Memory Management Unit (IOMMU) option Enabled in the ROM-Based Setup Utility (RBSU) and with "intel_iommu=on" added to the Linux kernel boot parameters, the IP addresses assigned to interface will not be accessible and a message similar to "CPU stuck" may be displayed on the console. In addition, DMAR fault messages are logged in the /var/log/messages as follows:

> dmar: DRHD: handling fault status reg 2
> dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr 791dc000
> DMAR:[fault reason 05] PTE Write access is not set
> dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr 791dc000
> DMAR:[fault reason 05] PTE Write access is not set
> dmar: DMAR:[DMA Write] Request device [02:00.1] fault addr 791dc000
> DMAR:[fault reason 05] PTE Write access is not set

This occurs because of a known limitation that the bnx2x driver has with the Option Card Black Box - Active Health (OCBB) feature when IOMMU is enabled. The network adapter firmware will attempt to access a memory area that is no longer assigned the network devices when bringing up/down the interface or loading/unloading the driver. When this occurs, a reboot is required."

Information from here: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04565693
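To check whether a DMAR fault always precedes the hang, the fault lines can be pulled out of a saved dmesg log. Below is a minimal Python sketch. The regular expression is only modelled on the single-line fault format quoted from this machine's dmesg, not on any documented kernel log format, so other kernels (such as the split lines in the HPE excerpt) will need a different pattern. The names `DMAR_FAULT`, `SAMPLE` and `parse_dmar_faults` are made up for illustration.

```python
import re

# Pattern modelled on the fault line quoted in this report (an assumption,
# not a stable kernel interface).
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<device>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+) \[fault reason (?P<reason>\d+)\] (?P<text>.*)"
)

# The two lines from the 4.16.0-rc6 dmesg quoted above.
SAMPLE = (
    "[ 154.524158] DMAR: DRHD: handling fault status reg 3\n"
    "[ 154.524160] DMAR: [DMA Read] Request device [00:02.0] fault addr 38d000 "
    "[fault reason 05] PTE Write access is not set\n"
)


def parse_dmar_faults(log):
    """Return one dict per DMAR fault line found in a dmesg dump."""
    faults = []
    for line in log.splitlines():
        match = DMAR_FAULT.search(line)
        if match:
            faults.append({
                "op": match["op"],
                "device": match["device"],
                "addr": int(match["addr"], 16),
                "reason": int(match["reason"]),
                "text": match["text"].strip(),
            })
    return faults


if __name__ == "__main__":
    for fault in parse_dmar_faults(SAMPLE):
        print(fault)
```

The faulting device here is 00:02.0, which the attached lspci output can be used to identify; on Intel laptops that address is normally the integrated GPU.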
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1692.