Bug 107420 - [hsw] GPU HANG: ecode 7:0:0x86dffffd in Xorg
Summary: [hsw] GPU HANG: ecode 7:0:0x86dffffd in Xorg
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-29 16:31 UTC by Georg
Modified: 2018-12-10 13:10 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
card0 error (1.04 MB, text/plain)
2018-07-29 16:31 UTC, Georg
GPU Error file (1.04 MB, text/plain)
2018-08-05 15:41 UTC, Georg
GPU Hang (147.93 KB, text/plain)
2018-08-06 13:09 UTC, Georg

Description Georg 2018-07-29 16:31:47 UTC
Created attachment 140876
card0 error

My system is a desktop with Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz.

This bug happens immediately after installing Debian Stretch.
I am able to boot and log in. Using Firefox or a terminal works as well.
When I start Nautilus, the system crashes and I'm pushed back to the login screen.

All the data I share is from this situation.

Btw: I see very similar behavior when I try to install Ubuntu 18.04. I'm not able to run the live system for the installation, neither from a USB stick nor from DVD. The same image runs in VirtualBox (so the image seems to be OK).
Comment 1 Georg 2018-07-29 16:35:29 UTC
Sorry, this may be important too: the system still runs Ubuntu up to 16.04 with no issues at all.
Comment 2 Chris Wilson 2018-07-29 16:36:03 UTC
Update mesa; this should be fixed circa 17.3, current stable is 18.1.
Comment 3 Georg 2018-08-05 15:41:27 UTC
Created attachment 140967
GPU Error file

This is from Ubuntu 18.04 installation DVD
Comment 4 Georg 2018-08-05 15:43:07 UTC
Hmm. I think the problem is something else:
I have really done a lot in the meantime. I even tried to upgrade Mesa. I was finally able to compile it, but I was not able to install it (make install).

But: as mentioned in my first post, Ubuntu seems to have a similar problem on my system. (I cannot imagine that this is a problem on all systems. :-) There would probably be more noise...)

So I booted the installation CD in live mode. I end up in the graphical screen, but I cannot do anything there. If I switch to a terminal (<ctrl><alt>F4), I find the error file attached here, and I see Mesa version 18.0.0. Since that is newer than 17.3, this is the reason for my doubts.

In another post I found this 'workaround': install Ubuntu 16.04 and upgrade to 18.04. OK, I did this and it seems to work as expected.
I'm not proposing this as the solution, but maybe it helps you find the root cause.

Thanks for taking care of me!
Comment 5 Georg 2018-08-05 17:34:15 UTC
Erratum: the workaround does not work either (16.04 --> 18.04). Same behavior.
Comment 6 Georg 2018-08-06 13:09:08 UTC
Created attachment 140980
GPU Hang
Comment 7 Georg 2018-08-06 17:51:47 UTC
OK. I don't have any ideas anymore. I tried to collect more information, but I'm going in circles.

In case anybody has an idea, here is today's summary:
Mainboard: Intel DZ87KLT-75K
CPU: Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz

I'm using the internal graphics; I do not have another graphics adapter installed.

The system has been running Ubuntu 14.04 and 16.04 since 2014 without any (obvious) issue.
There is one partition on which I want to install 18.04. I can install 16.04 on this partition without any problem. I did so several times and used the system for a couple of hours each time, just to see that it works.

Because 18.04 does not even start the installation process from DVD or USB, I tried:
- Ubuntu 18.04 (Mesa 18)
- Debian 9 (Mesa 13)
- Lubuntu 18.04
- openSUSE Leap 15 with
  - Gnome X11
  - Gnome Wayland
  - KDE
- Ubuntu 16.04 --> dist-upgrade to 18.04. It seemed to run, but crashed as well after booting the first or second time (I don't remember which).

Sometimes I get /sys/class/drm/card0/error, sometimes not. But I always see the system crash in graphical mode. It always runs fine in text (terminal) mode.

I've been using Linux long enough to know that there is very often some mystic switch/parameter/whatever that overcomes the quaintest problems.

So don't hesitate to teach me one more. :-)
As you can see I did a lot. If you want me to do more, please advise.
Comment 8 Georg 2018-08-12 15:04:24 UTC
In the meantime I did this:
- Updated the board's firmware (BIOS)
Concentrating on Ubuntu:
- Installed 16.04 and upgraded to 18.04
- Installed a newer kernel (linux-kernel-amdgpu-binaries-master), going from 4.15 to 4.18.0-rc8, including firmware-radeon-ucode_2.10_all

Perhaps it is interesting that things seem to be a little better: after the kernel and firmware upgrade I always reach the login screen of gdm3 or lightdm. This was not the case before; I ended up on a screen without a user name and with wrong scaling.

"dpkg-query -l | grep mesa" shows mesa version 18.0.5.

Besides this, I played with the boot parameters "intel_iommu=igfx_off" and "i915.enable_rc6=0".

I can't believe that I'm the only one in the world with this problem. I use an Intel board (DZ87KLT-75K) with no additional cards in it. And I'm not even able to boot the Ubuntu 18.04 installer. This is 100% reproducible.

Btw: just for testing, I added an external graphics card. In this configuration everything works without any problem. If I switch back to the internal graphics, the problems are back.

Any further hints are more than appreciated.
Comment 9 Mark Janes 2018-08-12 16:29:36 UTC
It has been several years, but I remember being surprised that Ubuntu had issues installing on my Haswell laptop. Can you verify that other distributions install and run properly on your hardware (Arch, Debian testing)?
Comment 10 Georg 2018-08-12 16:33:33 UTC
Lubuntu 18.04, Debian 9 and openSUSE 15 are not working either, with the same symptoms. KDE or Gnome doesn't matter.
Comment 11 Georg 2018-08-13 15:06:35 UTC
Sorry if this sounds stupid: when searching for "gpu hang" and "i915", there are hints that "i915.enable_rc6=0" on the GRUB command line solved the problem.
I tried it as well, but without success.

But!: "modinfo -p i915" shows a lot of module parameters, but no parameter 'enable_rc6'.

Is it possible that with kernels around 4.15.0 (the default in Ubuntu 18.04) the i915 module was 'optimized' so that 'enable_rc6' is always ON and I cannot switch it off anymore?

Btw: "modinfo -p i915" on Ubuntu 16.04 has 'enable_rc6' in its list.
Comment 12 Danylo 2018-08-13 15:48:09 UTC
The parameter i915.enable_rc6 is indeed gone, you can find more in:
https://bugs.freedesktop.org/show_bug.cgi?id=105962

If you didn't use the "i915.enable_rc6=0" with previous kernels it shouldn't be needed in new ones.
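The check from comment 11 can be scripted. Below is a minimal Python sketch that runs `modinfo -p i915` and looks for a parameter name in its output. It assumes each parameter line has the form `name:description`; the helpers `module_params` and `has_param` are hypothetical names, not part of any existing tool.

```python
import subprocess


def module_params(modinfo_output):
    """Return the set of parameter names listed in `modinfo -p` output.

    Assumes each parameter line looks like 'name:description (type)'.
    Lines without a colon, or whose part before the colon contains a
    space (e.g. wrapped description text), are ignored.
    """
    names = set()
    for line in modinfo_output.splitlines():
        name, sep, _ = line.partition(":")
        name = name.strip()
        if sep and name and " " not in name:
            names.add(name)
    return names


def has_param(module, param):
    """Run `modinfo -p <module>` and report whether <param> is listed.

    Raises CalledProcessError if modinfo fails (e.g. unknown module)
    and FileNotFoundError if modinfo is not installed.
    """
    result = subprocess.run(
        ["modinfo", "-p", module], capture_output=True, text=True, check=True
    )
    return param in module_params(result.stdout)
```

For example, `has_param("i915", "enable_rc6")` should return False on a kernel where the parameter has been removed, as reported above for the 4.15 kernel in Ubuntu 18.04, and True on the Ubuntu 16.04 kernel.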
Comment 13 Georg 2018-08-15 15:40:51 UTC
Thanks, Danylo!

Just to inform you: I installed 16.04, which comes with kernel 4.4.??.
I'm not sure this is a valid test, but I then installed 4.18.0-rc8 including firmware-radeon-ucode_2.10_all, nothing else. Now I get the GPU hang within 16.04 as well, together with /sys/class/drm/card0/error.

Now it's your turn. :-|
Comment 14 Danylo 2018-08-15 15:50:47 UTC
Unfortunately I cannot help here; I tried to reproduce the issue on a laptop with a Haswell i5, but everything worked fine.
Comment 15 Mark Janes 2018-08-15 16:36:43 UTC
Since the issue affects multiple distros, but others cannot reproduce it, I expect there is an issue with your hardware.

While working with the hundreds of older systems deployed in our continuous integration lab, I have seen issues like this caused by faulty ram.  I would run memtest86.
Comment 16 Georg 2018-08-22 09:07:52 UTC
Thanks, all.
I really believe that this is something specific to this configuration/chipset.
I do not think that something is broken, because the system runs well with older kernels, and runs well with the current one if I install and use an external graphics adapter.
memtest86 ran for 36h, 8 passes, no errors. My next test will be to rearrange the existing RAM modules to give the internal GPU the chance to use different areas of memory.
I see different error codes, not only 7:0:0x86dffffd and 7:0:0x85dfdffc; I think at least one more, if I remember correctly. Would it be helpful for you to have more?
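To compare the different error codes across several captured hangs, the `ecode` values can be pulled out of saved copies of /sys/class/drm/card0/error. This is a minimal Python sketch; it assumes the code appears in the dump as `ecode <a>:<b>:0x<hex>` (as in this bug's title), and the helper names and file names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed format, taken from this bug's title: "... ecode 7:0:0x86dffffd ...".
ECODE_RE = re.compile(r"ecode\s+(\d+:\d+:0x[0-9a-fA-F]+)")


def extract_ecodes(text):
    """Return every ecode string found in the text of an i915 error dump."""
    return ECODE_RE.findall(text)


def summarize(paths):
    """Map each saved error file to the list of ecodes it contains."""
    return {str(p): extract_ecodes(Path(p).read_text(errors="replace")) for p in paths}


if __name__ == "__main__":
    # Usage: python ecodes.py error-1.txt error-2.txt ...
    for path, codes in summarize(sys.argv[1:]).items():
        print(path, ", ".join(codes) or "(no ecode found)")
```

Each input file would be a copy of /sys/class/drm/card0/error saved right after a hang (e.g. with `sudo cp`), so the script itself does not need root.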
Comment 17 Georg 2018-11-05 15:12:25 UTC
Here I am again. After a break I picked up this issue again, with some interesting results:

Some forums talk about 'nomodeset'. And indeed, it works... better.
Now it is possible to boot the Ubuntu live DVD and to boot an installed 18.04 in graphical mode. The system seems to behave as anyone would expect.

But there are some drawbacks:
- HDMI for the display works fine, but HDMI audio is not available: "pactl list sinks | grep -i hdmi" is empty.
If I boot without 'nomodeset' (and switch to a TTY), I do see an HDMI audio device there.

- It seems that I can work with the Ubuntu GUI for a long time. But if I switch to TTY3, for example, I get an 80x25 text resolution, which is not the case without 'nomodeset'. The only way back is <ctrl>d to close the terminal; switching to another terminal is not possible. Then I am at the login screen again, but now the system is dead: no keyboard, no mouse. Only the clock at the top of the screen refreshes and the screensaver works...
Comment 18 Georg 2018-12-10 09:47:06 UTC
I'm now pretty sure that the root cause is a change on the kernel side: code, behaviour, defaults, interaction with other packages... whatever.

I tried Ubuntu 18.04 and Manjaro (Arch), downgraded the kernel from 4.15 to 4.8 (Ubuntu) and from 4.19 to 4.4 (Manjaro), and the problem is gone.
In both cases I don't do anything else explicitly. I install the kernel and reboot. Entering the desktop shows a freeze (GPU hang) with the current kernel and normal behavior with older versions.

My investigation showed that 4.9 and higher have the problem.
Just for completeness: my BIOS is up to date and 'intel-microcode' seems to be the latest as well.

If anybody wants more information I am happy to help; otherwise I'll keep an ear on some forums and wait...
Comment 19 Denis 2018-12-10 13:10:35 UTC
Hi Georg, as Danylo mentioned, we couldn't reproduce this problem, so the only way forward I see now is to bisect the kernel.

It might be hard because some intermediate versions need extra patches to build, but in my experience it is possible to find these patches (e.g. by googling) and apply them.

There are two kernel sources that can be checked:

https://github.com/freedesktop/drm-intel
https://github.com/freedesktop/drm-tip

The first one should be stable, the second is the most up to date. For your case I would suggest the stable one because, according to you, the failure appeared somewhere around version 4.9.

How-to instructions:

https://git-scm.com/docs/git-bisect
https://01.org/linuxgraphics/documentation/build-guide-0
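The idea behind `git bisect` is a binary search over an ordered history: starting from a known-good and a known-bad point, test the midpoint and keep the half that still contains the transition. The Python sketch below illustrates only that search. The function `first_bad` and the `is_bad` callback (standing in for "build, install, boot, open the desktop, check for a GPU hang") are hypothetical, and the version list in the test simply mirrors the results reported in comment 18.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary search for the first 'bad' entry in an ordered sequence.

    Assumes candidates[0] is known good, candidates[-1] is known bad, and
    that once the problem appears it stays (every later entry is bad).
    This is the same halving strategy `git bisect` applies to commits.
    """
    if len(candidates) < 2:
        raise ValueError("need at least one good and one bad candidate")
    lo, hi = 0, len(candidates) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # In practice: build and boot this kernel, then check for the hang.
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]
```

With an `is_bad` reflecting comment 18 (4.9 and later hang), `first_bad(["4.4", "4.8", "4.9", "4.15", "4.19"], is_bad)` returns "4.9". On the real tree the same search runs over commits: `git bisect start`, `git bisect bad v4.9`, `git bisect good v4.8` (assuming those tags are present in the clone), then marking each built kernel with `git bisect good` or `git bisect bad` until git reports the first bad commit.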

Good luck.

