Bug 109209 - i915 module results in total lockups without any dmesg trace on a NP900X5N Kaby Lake machine
Status: NEEDINFO
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-02 12:34 UTC by Jan
Modified: 2019-01-07 09:53 UTC

See Also:
i915 platform: KBL
i915 features:


Attachments
dmesg (235.50 KB, text/plain)
2019-01-02 12:34 UTC, Jan
syslog (204.49 KB, text/plain)
2019-01-02 23:00 UTC, Jan
Xorg.0.log (with drm.debug=14 nouveau.modeset=0) (36.44 KB, text/x-log)
2019-01-03 18:18 UTC, Jan

Description Jan 2019-01-02 12:34:20 UTC
Created attachment 142940 [details]
dmesg

I installed various Kali Linux versions, up to Linux 4.20.0-rc7 (downloaded, compiled and installed), on a Samsung NP900X5N laptop and have an issue with the i915 driver after it is loaded.

My configuration:
- i7 7500
- 16 GB / 256 GB SSD
- NVIDIA 940MX (for 3D graphics)

Shortly after loading the module the screen goes black (as if a screen saver kicked in) and stays black. I tried to fix it myself and 'studied' the behaviour for about 20 hours. I think it is a bug in the i915 module itself.

A summary of the tests I performed:

- I tried several versions and distributions. They all result in the same behaviour: the screen goes black. I do not see anything in the logs. I enabled ssh, and the machine is unresponsive after the screen goes black. The more tests I do (rebooting by holding the power key), the sooner the screen goes black. I wonder if the GPU gets too hot locally. By the way, the processor is not hot, as the fan stays off.

- When I disable the driver in grub or in the modprobe.d directory I do not experience any hangs.

- With Windows 10 the machine does not lock up (I kept the machine on for more than 24 hours).

I added the drm.debug=14 module parameter and will attach the /var/log/messages file, covering boot to desktop, to the bug.

When the machine hangs, no ssh connection is possible, so I copied the file afterwards.
The syslog is attached.
Comment 1 Ilia Mirkin 2019-01-02 20:37:35 UTC
To rule out any unfortunate interactions between intel and nouveau, try booting with

nouveau.modeset=0

(but not any other modesetting-related flags).
Comment 2 Jan 2019-01-02 21:38:07 UTC
I already tried that several times, with the kernel parameter 'nouveau.modeset=0'. I just tried it again: black screen within a few minutes after starting.
Comment 3 Jan 2019-01-02 23:00:59 UTC
Created attachment 142946 [details]
syslog

This time I uploaded the syslog, captured with the kernel parameters as requested.
Comment 4 Jani Nikula 2019-01-03 11:05:30 UTC
Please also attach Xorg.0.log (should be at either /var/log/Xorg.0.log or ~/.local/share/xorg/Xorg.0.log) with nouveau.modeset=0.
Comment 5 Jan 2019-01-03 18:18:49 UTC
Created attachment 142965 [details]
Xorg.0.log (with drm.debug=14 nouveau.modeset=0)

Hi Jani et al. I have included the Xorg.0.log. The settings are as requested. Please let me know in case you need more.
Comment 6 Jan 2019-01-05 15:01:18 UTC
A small update that might be of help.

Yesterday I started the laptop with 'i915.modeset=0'. After booting into a shell I used the laptop for hours, same as always when the i915 driver is 'disabled' (I did some reading and testing on how the kernel, udev, block/devices, sys etc. work, and on how I can compile and install a single module without compiling the whole kernel).

Then I removed the i915 driver with 'rmmod i915' (required for the next step, otherwise it complains about 'File Exists'). Subsequently I inserted the module with insmod -f /lib/modules/<version>/kernel/drivers/gpu/drm/i915/i915.ko.

After inserting the module the screen changed and the machine kept working. However, after some time the laptop hangs (I repeated these steps a few times, with the same result).

I also tested removing the driver before the laptop hangs to see what happens. Unfortunately the machine does not switch back to the basic GUI :-( and I needed to connect via SSH. When I remove the module (via SSH) before the hang, the machine keeps working.

Furthermore, I repeatedly removed the module (before an expected hang) and inserted it again. Every (re)insertion of the module seems to 'reset' the time until the next hang.

On one occasion the machine kept running for 15 minutes before a hang, but usually it hangs much sooner.
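
Since the hang leaves no trace in dmesg, the time until a hang can only be measured from data that already reached the disk. A minimal sketch of one way to do that is shown here: append a timestamp every second and fsync it, so that the last line written survives a hard lockup. The function names and the log path in the usage note are made up for illustration and are not part of any existing tool.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one timestamp line and force it to disk before returning."""
    stamp = time.time() if now is None else now
    with open(path, "a", encoding="ascii") as fh:
        fh.write(f"{stamp:.3f}\n")
        fh.flush()
        os.fsync(fh.fileno())  # make sure the line survives a hard lockup
    return stamp


def seconds_survived(path):
    """Return the span between the first and last heartbeat in the file."""
    with open(path, encoding="ascii") as fh:
        stamps = [float(line) for line in fh if line.strip()]
    if not stamps:
        return 0.0
    return stamps[-1] - stamps[0]


def run(path, interval=1.0, beats=None):
    """Start a fresh log and write heartbeats until stopped (or `beats` times)."""
    open(path, "w").close()  # truncate so each run is measured on its own
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        time.sleep(interval)
```

Usage would be to call run("/var/tmp/i915-heartbeat.log") (a hypothetical path) right after inserting the module, and to call seconds_survived() on the same file after the forced reboot. The result is only accurate to about one interval.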


Also tested with:   i915.modeset=1 nouveau.modeset=0 single debug drm.debug=14. I kept the machine running for almost an hour. In single mode it runs much longer.
And tested with:   i915.modeset=1 nouveau.modeset=0 debug drm.debug=14. The machine usually hangs within a few minutes.
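
When comparing runs like these it helps to confirm which parameters the running kernel actually received, which is what /proc/cmdline reports. The following is a small sketch, not an existing tool: parse_cmdline and read_cmdline are illustrative names, and the parser assumes no quoted values containing spaces.

```python
def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the running kernel."""
    with open(path, encoding="utf-8") as fh:
        return parse_cmdline(fh.read())
```

For the first test above, read_cmdline() should contain "single" as a bare flag and "drm.debug" with the value "14". If a key is missing, the bootloader entry was not the one that was edited.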
Comment 7 Jan 2019-01-06 21:11:23 UTC
Based on a hint from [TJ] on the freenode intel-gfx forum, I added intel_idle.max_cstate=1 yesterday. Last night and today I ran the machine for more than 24 hours with the i915 module running in graphics mode without any hang.

Today I also tested with intel_idle.max_cstate=2, which also runs fine. As soon as I use intel_idle.max_cstate=3 the machine hangs within minutes.
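
To see which idle states each max_cstate setting leaves available, the cpuidle entries in sysfs can be listed. This is a rough sketch assuming the usual layout under /sys/devices/system/cpu/cpu0/cpuidle/stateN/ with name and usage files; list_cstates is an illustrative helper. Note that the sysfs state index is not necessarily the same number as the hardware C-state.

```python
from pathlib import Path


def list_cstates(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return [(index, name, usage)] for each cpuidle state of one CPU."""
    states = []
    for state in Path(cpu_dir, "cpuidle").glob("state[0-9]*"):
        index = int(state.name[len("state"):])
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text())  # times the state was entered
        states.append((index, name, usage))
    return sorted(states)
```

Comparing the output between a boot that hangs and one that does not shows which additional states became reachable. It does not by itself prove which state triggers the hang.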

