Created attachment 140876
My system is a desktop with Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz.
This bug happens immediately after installing Debian stretch.
I am able to boot and log in. Using firefox or terminal works as well.
When I start nautilus the system crashes and I'm pushed back to the login screen.
All data I share are from this situation.
Btw: I see very similar behavior if I try to install Ubuntu 18.04. I'm not able to run the live system for installation, neither from USB stick nor from DVD. The same image runs in VirtualBox (so the image seems to be OK).
Sorry, this may be important too: the system is still running with Ubuntu up to 16.04. No issues here at all.
Update mesa; this should be fixed circa 17.3, current stable is 18.1.
Created attachment 140967
GPU Error file
This is from Ubuntu 18.04 installation DVD
Hmm. I think the problem is something else:
I really did a lot in the meantime. I even tried to upgrade 'mesa'. I was finally able to compile, but was not able to install it (make install).
But: as mentioned in my first post, Ubuntu seems to have a similar problem on my system. (I cannot imagine that this is a problem on all systems. :-) There would probably be more noise... )
So I booted the installation CD in live system mode. I end up in the graphical screen but I cannot do anything here. If I switch to a terminal (<ctrl><alt>F4) I find the error file attached here and I see mesa version 18.0.0. This is the reason for my doubts.
In another post I found this 'workaround': Install Ubuntu 16.04 and upgrade to 18.04. Ok. I did this and - it seems to work as expected.
I don't say this to propose it as the solution, but maybe it helps you to find the root cause.
Thanks for taking care of me!
Erratum: the workaround is not working either. (16.04 --> 18.04). Same behavior.
Created attachment 140980
OK. I don't have any idea any more. I tried to collect more information, but I move in a circle:
If anybody has an idea, here is my summary for today:
Mainboard: Intel DZ87KLT-75K
CPU: Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz
I'm using the internal graphics; I do not have another graphics adapter installed.
The system has been running since 2014 with Ubuntu 14.04 and 16.04 without any (obvious) issue.
There is one partition in the system on which I want to install 18.04. On this partition I can install 16.04 without any problem. I did so several times and each time used the system for a couple of hours just to see: It works.
Because 18.04 is not even starting the installation process from DVD or USB, I tried:
- Ubuntu 18.04 (mesa 18)
- Debian 9 (mesa 13)
- lubuntu 18.04
- suse 15 Leap with
  - Gnome X11
  - Gnome Wayland
- Ubuntu 16.04 --> dist-upgrade to 18.04. Seemed to run, but crashed as well after booting the first or second time. (I don't remember)
Sometimes I do have the /sys/class/drm/card0/error, sometimes not. But I always see the system crashing in graphic mode. It always runs fine in terminal mode.
I have been using linux long enough to know that there is very often a mystic switch/parameter/anything to use to overcome the quaintest things.
So don't hesitate to teach me one more. :-)
As you can see I did a lot. If you want me to do more, please advise.
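Since the /sys/class/drm/card0/error file only shows up some of the time, a small check like the following may help to capture it whenever it is there. This is only a sketch: the function name gpu_error_state is made up for illustration, it assumes the integrated GPU is card0, and it assumes that i915 reports "No error state collected" when nothing was recorded (the exact wording may differ between kernel versions).

```shell
#!/bin/sh
# Hypothetical helper: classify the i915 error state file.
# $1 is the file to inspect; defaults to card0 (assumed to be the iGPU).
# Prints "missing", "empty" or "recorded".
gpu_error_state() {
    err_file="${1:-/sys/class/drm/card0/error}"
    if [ ! -r "$err_file" ]; then
        # File absent or not readable (reading usually needs root)
        echo "missing"
    elif ! grep -q . "$err_file" || grep -qi "no error state collected" "$err_file"; then
        # Nothing in the file, or the driver's "nothing recorded" message
        echo "empty"
    else
        echo "recorded"
    fi
}

# Report the state for the default path
gpu_error_state
```

If it prints "recorded", copying the file somewhere persistent before rebooting (for example with cat as root) keeps it available for attaching here.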
In the meantime I did this:
- Update of the board's firmware (BIOS)
Concentrating on Ubuntu:
- Install 16.04, upgrading to 18.04
- Install kernel (linux-kernel-amdgpu-binaries-master) from 4.15 to 4.18.0-rc8 including firmware-radeon-ucode_2.10_all
Perhaps it is interesting that things seem to be a little better: I always reach the login in gdm3 or lightdm after the kernel and firmware upgrade. This was not the case before. Back then I ended up in a screen without a user name and the scaling was wrong.
"dpkg-query -l | grep mesa" shows mesa version 18.0.5.
Besides this I played with boot parameters: "intel_iommu=igfx_off", "i915.enable_rc6=0".
I can't believe that I'm the only one in the world with this problem. I use an Intel board (DZ87KLT-75K) with no additional boards in it. And I'm not even able to boot the installation of Ubuntu 18.04. This reproduces 100%.
Btw: just for testing I added an external graphic board. In this configuration it works without any problem. If I switch back to the internal graphic unit problems are back.
Any further hint is more than appreciated.
It has been several years, but I remember being surprised that Ubuntu had issues installing on my Haswell laptop. Can you verify that other distributions install and run properly on your hardware (arch/debian-testing)?
Lubuntu 18.04, Debian 9 and suse 15 are also not working, with the same symptoms. KDE or Gnome doesn't matter.
Sorry if this is a stupid question: When looking around for "gpu hang" and "i915" there are hints that "i915.enable_rc6=0" in the grub command line did solve the problem.
I tried as well but without success.
But!: "modinfo -p i915" shows a lot of module parameters, but no parameter 'enable_rc6'.
Is it possible that with kernel around 4.15.0 (default of Ubuntu 18.04) the module i915 was 'optimized' so 'enable_rc6' is ON and I cannot switch it any more?
Btw: "modinfo -p i915" for Ubuntu 16.04 has 'enable_rc6' in its list.
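To compare kernels quickly, the "modinfo -p i915" check can be wrapped like this. It is only a sketch: the function names param_listed and check_i915_param are made up, and it assumes that "modinfo -p" prints one "name:description (type)" entry per line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the parameter list on stdin
# (in "modinfo -p" format, one "name:description" entry per line)
# contains the parameter named in $1.
param_listed() {
    grep -q "^$1:"
}

# Check the i915 module of the running kernel for parameter $1.
check_i915_param() {
    if modinfo -p i915 2>/dev/null | param_listed "$1"; then
        echo "$1 is available"
    else
        echo "$1 is not available (removed, or i915 module not found)"
    fi
}

check_i915_param enable_rc6
```

Running it once on the 16.04 kernel and once on the 18.04 kernel shows directly whether the parameter still exists. Passing an option on the kernel command line for a parameter that the module no longer has should have no effect.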
The parameter i915.enable_rc6 is indeed gone, you can find more in:
If you didn't use the "i915.enable_rc6=0" with previous kernels it shouldn't be needed in new ones.
Just to inform you: I installed 16.04 which comes with a kernel 4.4.??.
I'm not sure that this is really valid, but I installed 4.18.0-rc8 including firmware-radeon-ucode_2.10_all. Nothing else. And I have my GPU Hang within 16.04 together with /sys/class/drm/card0/error.
Now it's your turn. :-|
Unfortunately I cannot help here. I have tried to reproduce the issue on a laptop with a Haswell i5, but everything worked fine.
Since the issue affects multiple distros, but others cannot reproduce it, I expect there is an issue with your hardware.
While working with the hundreds of older systems deployed in our continuous integration lab, I have seen issues like this caused by faulty ram. I would run memtest86.
I really believe that this is something with the special configuration/chip set.
I do not think that there is something broken, because the system runs well with older kernels and runs well with the current one if I install and use an external graphics adapter.
memtest86 ran for 36h, 8 passes, no error. My next test will be to mix the existing RAM modules to give the internal GPU the chance to use different areas of it.
I do see different error codes, not only 7:0:0x86dffffd and 7:0:0x85dfdffc. I think at least one more if I remember well. Would it be helpful for you to have more?
Here I am again. After some time of patience I opened this issue again with some interesting results:
Some forums talk about 'nomodeset'. And indeed: it works. - ...better.
Now it is possible to boot the Ubuntu live DVD and boot an installed 18.04 in graphical mode. The system seems to behave as everyone would expect.
But: There are some drawbacks.
- HDMI for display works fine. HDMI for audio is not available. "pactl list sinks|grep -i hdmi" is empty.
If I boot without 'nomodeset' (and switch to a TTY) I do see an HDMI audio device here.
- It seems that I can really work with the Ubuntu GUI for a long time. But if I switch to TTY3 for example, then I have an 80x25 display resolution, which is not the case without 'nomodeset'. The only way back is <ctrl>d to close the terminal. Switching to another terminal is not possible. Then I am in the login screen again, but now the system is dead. No keyboard, no mouse. Only the clock on top of the screen refreshes and the screensaver works...
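When switching back and forth between boots with and without 'nomodeset', it is easy to lose track of which options the running kernel actually got. A sketch for checking that (the function name cmdline_has is made up; /proc/cmdline is the standard place where the kernel exposes its command line):

```shell
#!/bin/sh
# Hypothetical helper: succeeds if kernel option $1 is present as a
# whole word on the command line. $2 is the file to read and defaults
# to /proc/cmdline, i.e. the options of the currently running kernel.
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

if cmdline_has nomodeset; then
    echo "booted with nomodeset (kernel mode setting disabled)"
else
    echo "booted without nomodeset"
fi
```

The same function works for the other options tried in this report, for example cmdline_has intel_iommu=igfx_off.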
I'm now pretty sure that the root cause is a change on the kernel side. Code, behaviour, defaults, interaction with other packages, ... - whatever.
I tried with Ubuntu 18.04 and Manjaro (Arch) and downgraded the kernel from 4.15 to 4.08 (Ubuntu) and from 4.19 to 4.4 (Manjaro) and the problem is gone.
In both cases I don't do anything else explicitly. I install the kernel and reboot. Entering the desktop shows freezing (gpu hang) with the current kernel or normal behavior with older versions.
My investigation showed that 4.9 or higher has my problem.
Just for completeness here: my BIOS is up to date and 'intel-microcode' seems to be the latest as well.
If anybody wants to have more information I am happy to support, otherwise I have my ear close to some forums and wait..
Hi Georg, as Danylo mentioned, we couldn't reproduce this problem, so the only way I see now is to try and bisect the kernel.
It might be hard because of some patches required for building some intermediate versions, but in my experience it is possible to google/find these patches and apply them.
There are two sources of kernels that can be checked:
The first one should be stable, the second is the most up to date (for your case I would suggest stable because, according to you, the failure appeared somewhere in version 4.9)
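A rough sketch of how such a bisect could be started with git. The assumptions here: it is run inside a clone of one of the kernel trees, that tree has the tags v4.8 and v4.9, and v4.8 is the last good release while v4.9 is the first bad one (the report above only says that 4.9 or higher shows the problem, so the good tag may need adjusting). The function name start_kernel_bisect is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: start a bisect between a known-good and a
# known-bad revision. $1 = good revision, $2 = bad revision.
# The defaults v4.8 / v4.9 are assumptions taken from this report.
start_kernel_bisect() {
    git bisect start &&
    git bisect bad "${2:-v4.9}" &&
    git bisect good "${1:-v4.8}"
}

# After calling start_kernel_bisect, git checks out a revision to test.
# Build, install and boot that kernel, start the desktop, then report:
#   git bisect good    # desktop works
#   git bisect bad     # GPU hang
# Repeat until git names the first bad commit, then clean up with:
#   git bisect reset
```

Each round halves the remaining range of commits, so even a full release cycle usually needs only a dozen or so builds.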