Bug 109209 - i915 module results in total lockups without any dmesg trace on a NP900X5N Kaby Lake machine
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-02 12:34 UTC by Jan
Modified: 2019-11-29 18:03 UTC

See Also:
i915 platform: KBL
i915 features:


Attachments
dmesg (235.50 KB, text/plain)
2019-01-02 12:34 UTC, Jan
syslog (204.49 KB, text/plain)
2019-01-02 23:00 UTC, Jan
Xorg.0.log (with drm.debug=14 nouveau.modeset=0) (36.44 KB, text/x-log)
2019-01-03 18:18 UTC, Jan
syslog 20191024 (129.20 KB, text/plain)
2019-10-24 20:04 UTC, Jan
xorg 20191024 (30.82 KB, text/x-log)
2019-10-24 20:04 UTC, Jan

Description Jan 2019-01-02 12:34:20 UTC
Created attachment 142940
dmesg

I installed various Kali Linux versions up to Linux 4.20.0-rc7 (downloaded, compiled and installed) on a Samsung NP900X5N laptop and have an issue with the i915 driver after it loads.

My configuration:
- i7 7500
- 16 GB / 256 GB SSD
- NVIDIA 940MX (for 3D graphics)

Shortly after loading the module the screen goes black (as if a screen saver had started) and stays black. I tried to fix it myself and 'studied' the behaviour for about 20 hours; I think it is a bug in the i915 module itself.

A summary of the tests I performed:

- I tried several versions and distributions. They all result in the same behavior: the screen goes black. I do not see anything in the logs. I enabled ssh, and the machine is unresponsive after the screen goes black. The more tests I do (rebooting by holding the power key), the sooner the screen goes black. I wonder if the GPU gets too hot locally. By the way, the processor is not hot, as the fan stays off.

- When I disable the driver in grub or in the modprobe.d directory, I do not experience any hangs.

- With Windows 10 the machine does not lock up (I kept the machine on for more than 24 hours).

I added the drm.debug=14 module parameter and attached the /var/log/messages file, covering boot to desktop, to the bug.

When the machine hangs, no ssh connection is possible, so I copied the file afterwards.
The syslog is attached.
Comment 1 Ilia Mirkin 2019-01-02 20:37:35 UTC
To rule out any unfortunate interactions between intel and nouveau, try booting with

nouveau.modeset=0

(but not any other modesetting-related flags).
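One way to confirm that the parameter actually reached the running kernel is to inspect /proc/cmdline. The following is a minimal sketch; the helper name has_kernel_param is hypothetical and not part of any tool mentioned in this report.

```sh
#!/bin/sh
# has_kernel_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole, space-separated word in CMDLINE.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Command line of the running kernel (empty if /proc is unavailable).
cmdline=$(cat /proc/cmdline 2>/dev/null)

if has_kernel_param "$cmdline" "nouveau.modeset=0"; then
    echo "nouveau.modeset=0 is active"
else
    echo "nouveau.modeset=0 is NOT on the kernel command line"
fi
```

Matching on whole words avoids a false positive on a longer value that merely starts with the same text.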
Comment 2 Jan 2019-01-02 21:38:07 UTC
I already tried that several times with 'nouveau.modeset=0'. I just tried it again: black screen within a few minutes after starting.
Comment 3 Jan 2019-01-02 23:00:59 UTC
Created attachment 142946
syslog

This time I uploaded syslog with Kernel parameters as requested.
Comment 4 Jani Nikula 2019-01-03 11:05:30 UTC
Please also attach Xorg.0.log (should be at either /var/log/Xorg.0.log or ~/.local/share/xorg/Xorg.0.log) with nouveau.modeset=0.
Comment 5 Jan 2019-01-03 18:18:49 UTC
Created attachment 142965
Xorg.0.log (with drm.debug=14 nouveau.modeset=0)

Hi Jani et al. I have included the Xorg.0.log. The settings are as requested. Please let me know in case you need more.
Comment 6 Jan 2019-01-05 15:01:18 UTC
A small update that might be of help.

Yesterday I started the laptop with 'i915.modeset=0'. After booting into shell I used the laptop for hours, same as always when the i915 driver is 'disabled' (did some reading and testing with the laptop how the kernel, udev, block/devices, sys etc works; and how I can compile and install a single module, without the need of compiling the whole kernel).

Then I removed the i915 driver with 'rmmod i915' (required for the next step, otherwise insmod complains about 'File Exists'). Subsequently I inserted the module with insmod -f /lib/modules/<version>/kernel/drivers/gpu/drm/i915/i915.ko.
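The unload and reload steps described here can be sketched as a small script. It only prints the commands instead of running them, because they need root and blank the display while the module is unloaded. Substituting `uname -r` for the `<version>` part of the path and the helper names are assumptions for illustration; some distributions ship compressed modules (for example i915.ko.xz), in which case the path differs.

```sh
#!/bin/sh
# Build the path of the i915 module for a given kernel release.
# Assumption: the module is installed uncompressed under /lib/modules.
i915_module_path() {
    printf '/lib/modules/%s/kernel/drivers/gpu/drm/i915/i915.ko\n' "$1"
}

# Print the reload sequence for a given kernel release (dry run only).
print_reload_commands() {
    echo "rmmod i915"
    echo "insmod -f $(i915_module_path "$1")"
}

# Show the commands for the currently running kernel.
print_reload_commands "$(uname -r)"
```

Printing first makes it easy to check the module path before anything is unloaded.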

After inserting the module the screen changed and the machine kept working. However after some time the laptop hangs (repeated these steps a few times, same results).

I also tested removing the driver before the laptop hangs to see what happens. Unfortunately the machine does not switch back to the basic GUI :-( and I needed to connect via SSH. When I remove the module (via SSH) before the hang, the machine keeps working.

Furthermore, I repeatedly removed the module (before an expected hang) and inserted it again. Every (re)insertion of the module seems to reset the time until the next hang.

On one occasion the machine kept running for 15 minutes before a hang, but usually it hangs much sooner.


Also tested with:   i915.modeset=1 nouveau.modeset=0 single debug drm.debug=14. The machine stayed up for almost an hour; in single-user mode it runs much longer.
And tested with:   i915.modeset=1 nouveau.modeset=0 debug drm.debug=14. The machine usually hangs within a few minutes.
Comment 7 Jan 2019-01-06 21:11:23 UTC
Based on a hint from [TJ] in the freenode intel-gfx channel, I added intel_idle.max_cstate=1 yesterday. Overnight and today I ran the machine for more than 24 hours with the i915 module running in graphics mode without any hang.

Today I also tested with cstate=2, which also runs fine. As soon as I use cstate=3 the machine hangs in minutes.
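A sketch of how the workaround could be checked and made persistent, assuming that "cstate=2" in this thread is shorthand for intel_idle.max_cstate=2. The helper add_kernel_param is hypothetical; the sysfs file is where the intel_idle driver exposes its max_cstate parameter when the driver is in use.

```sh
#!/bin/sh
# add_kernel_param CMDLINE PARAM
# Print CMDLINE with PARAM appended, unless PARAM is already present.
add_kernel_param() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1 }$2" ;;
    esac
}

# Show the limit the running kernel is using, if it is exposed.
param_file=/sys/module/intel_idle/parameters/max_cstate
if [ -r "$param_file" ]; then
    echo "current intel_idle.max_cstate: $(cat "$param_file")"
else
    echo "intel_idle parameter not readable on this system"
fi

# Example: compute a new kernel command line with the workaround added.
add_kernel_param "quiet splash" "intel_idle.max_cstate=2"
```

On Debian-based systems the resulting string would typically go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by running update-grub. The helper does not replace an existing intel_idle.max_cstate entry that has a different value, so such an entry needs to be edited by hand.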
Comment 8 Jan 2019-01-22 16:44:27 UTC
I am using the machine successfully with the kernel parameter cstate=2. The machine has been running stably for 2 weeks straight, while I use it 14 hours a day and suspend it at night.

If there is no response, I suggest closing the bug while clearly stating that it has not been resolved.

I perfectly understand that the bug is not considered important, but let's at least make the status clear to everybody who reads this bug.
Comment 9 Lakshmi 2019-07-03 08:14:07 UTC
Jan, my apologies for responding to this issue so late.

Would you mind testing this issue with the latest drm-tip?
(https://cgit.freedesktop.org/drm-tip).
Comment 10 Lakshmi 2019-07-29 12:04:13 UTC
Jan, Can you please provide feedback with drmtip?
Comment 11 Jan 2019-07-29 15:09:40 UTC
(In reply to Lakshmi from comment #10)
> Jan, Can you please provide feedback with drmtip?

Hi, thanks for responding. I am on holiday. I will test when I am back on 12 August.

Is it possible to send me a link to a compiled version of drm-tip? Compiling the whole kernel took a lot of time, and last time I was unable to insert the compiled i915 driver into my current Debian-based distro.

Oh, what is the likely cause of the bug and what should have fixed it?
Comment 12 Lakshmi 2019-07-30 10:29:13 UTC
(In reply to Jan from comment #11)
> (In reply to Lakshmi from comment #10)
> > Jan, Can you please provide feedback with drmtip?
> 
> Hi, thanks for responding. I am on holiday. Will test when I am back. 12
> August.
> 
> Is it possible to send me a link of the compiled version of the drmtip?
> Compiling the whole kernel took a lot of time and I was unable to insert the
> compiled i915 driver in my current Debian based distro last time.
> 
> Oh what is the likely cause of the bug and what should have fixed it?

Unfortunately we don't have the facility to send you a compiled drm-tip build. This issue was last verified with kernel 4.20, and the current drm-tip kernel is 5.3. Feedback from drm-tip would confirm whether the bug still exists, and the dmesg will be useful during investigation.
Comment 13 Lakshmi 2019-08-28 11:09:34 UTC
(In reply to Lakshmi from comment #12)

No feedback for 4 weeks. Jan, any feedback with drmtip?
Comment 14 Jan 2019-10-02 10:12:34 UTC
Hi Lakshmi, I am currently swamped with work because of a major project we are working on. Anyway, I still have this on my backlog and expect to test it before the end of the month. Apologies for the delay, and please keep this bug open until I have tested it.
Comment 15 Jan 2019-10-20 12:35:49 UTC
Last week I compiled and tested kernel 5.2, 5.3 and 5.4.0rc1. 

With 5.3 and 5.4.0rc1 there is no issue; however only one module gets loaded (checked with lsmod), the temperature module. All other modules including i915 are not loaded. I noticed because a lot of hardware is not working, such as my touch pad, network etc.

With 5.2 all modules get loaded.
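The lsmod check mentioned above can be scripted along these lines. This is a sketch only: module_in_list is a hypothetical helper, and a driver that is built into the kernel rather than built as a module would not appear in lsmod output at all.

```sh
#!/bin/sh
# module_in_list NAME
# Reads lsmod-style output on stdin; succeeds if NAME is in the first column.
module_in_list() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Report whether i915 shows up among the loaded modules.
if lsmod 2>/dev/null | module_in_list i915; then
    echo "i915: loaded"
else
    echo "i915: not loaded"
fi
```

Comparing the first column exactly avoids matching modules that only list i915 in their "Used by" column.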
Comment 16 Lakshmi 2019-10-21 06:01:48 UTC
(In reply to Jan from comment #15)
> Last week I compiled and tested kernel 5.2, 5.3 and 5.4.0rc1. 
> 
Any chance of getting dmesg log from boot from 5.4.0rc1?
Comment 17 Lakshmi 2019-10-21 06:02:55 UTC
(In reply to Lakshmi from comment #16)
> (In reply to Jan from comment #15)
> > Last week I compiled and tested kernel 5.2, 5.3 and 5.4.0rc1. 
> > 
> Any chance of getting dmesg log from boot from 5.4.0rc1?

Also xorg log as stated by Jani in comment 4.
Comment 18 Jan 2019-10-24 20:04:28 UTC
Created attachment 145810
syslog 20191024
Comment 19 Jan 2019-10-24 20:04:59 UTC
Created attachment 145811
xorg 20191024
Comment 20 Jan 2019-10-24 20:05:25 UTC
both uploaded.
Comment 21 Lakshmi 2019-10-25 07:41:42 UTC
(In reply to Jan from comment #18)
> Created attachment 145810
> syslog 20191024

Can you attach the log from boot with kernel parameters drm.debug=0x1e log_buf_len=4M?

(In reply to Jan from comment #15)
> Last week I compiled and tested kernel 5.2, 5.3 and 5.4.0rc1. 
> 
> With 5.3 and 5.4.0rc1 there is no issue; however only one module gets loaded
> (checked with lsmod), the temperature module. All other modules including
> i915 are not loaded. I noticed because a lot of hardware is not working,
> such as my touch pad, network etc.
> 
> With 5.2 all modules gets loaded.

I doubt this is the same issue as in the original bug report.
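For reference, drm.debug is a bitmask, so the value 14 used earlier in this report and the value 0x1e requested in comment 21 differ by a single category. The sketch below decodes a mask into names following the kernel's DRM_UT_* debug categories; only the low six bits are listed, and the helper name decode_drm_debug is hypothetical.

```sh
#!/bin/sh
# decode_drm_debug MASK
# Print the drm.debug categories enabled by MASK (decimal or 0x-prefixed hex).
decode_drm_debug() {
    mask=$(($1))
    out=""
    for entry in 1:CORE 2:DRIVER 4:KMS 8:PRIME 16:ATOMIC 32:VBL; do
        bit=${entry%%:*}    # numeric bit value before the colon
        name=${entry#*:}    # category name after the colon
        if [ $((mask & bit)) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_drm_debug 14     # value used in the original report
decode_drm_debug 0x1e   # value requested in comment 21
```

The first call prints DRIVER KMS PRIME and the second prints DRIVER KMS PRIME ATOMIC, so 0x1e adds atomic modesetting messages on top of what drm.debug=14 already logs.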
Comment 22 Martin Peres 2019-11-29 18:03:01 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/207.

