Bug 106721 - [i915] Linux machine hangs after several reboots
Summary: [i915] Linux machine hangs after several reboots
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Duplicates: 106722
Depends on:
Blocks:
 
Reported: 2018-05-30 11:17 UTC by ccalderon
Modified: 2018-10-18 15:22 UTC
CC List: 2 users

See Also:
i915 platform: BSW/CHT
i915 features: power/Other


Attachments
user log during the reboot test (34.38 KB, text/x-log)
2018-05-30 11:17 UTC, ccalderon
sys log for ubilinux reboot test (4.49 MB, text/plain)
2018-05-30 11:28 UTC, ccalderon
kernel log for Ubilinux reboot test (3.15 MB, text/x-log)
2018-05-30 11:32 UTC, ccalderon
UART output for reboot test (27.49 KB, text/x-log)
2018-05-30 11:36 UTC, ccalderon
Time log for script used in reboot test (5.75 KB, text/x-log)
2018-05-30 11:37 UTC, ccalderon
k4.17:: dmesg file for each reboot (546.34 KB, application/gzip)
2018-06-11 11:53 UTC, ccalderon
k4.17:: Reboot time log (11.17 KB, text/x-log)
2018-06-11 11:55 UTC, ccalderon
k4.17:: System log file (42.06 MB, text/plain)
2018-06-11 12:17 UTC, ccalderon
k4.17:: kernel log for drm k4.17 (65.82 MB, text/x-log)
2018-06-11 12:32 UTC, ccalderon

Description ccalderon 2018-05-30 11:17:21 UTC
Created attachment 139847 [details]
user log during the reboot test

Tested with Ubuntu (16.04, 18.04) and Ubilinux3 and Ubilinux4 (Debian) distros and multiple kernel versions (from 4.4 to 4.15) on UP board machines.

In all cases the issue is the same. After several software reboots (in most cases fewer than 100), the UP machine hangs and becomes unresponsive. The only way to recover is to disconnect the power cable.

Using an FTDI adapter to read the UART output, some ERROR messages were found at the exact moment the machine hangs.
All of the ERROR messages are related to the i915 DRM module, which drives the GPU.

After disabling all i915 modules in the kernel configuration file, rebuilding the full kernel, and testing again, the UP machine kept working properly through more than 1000 reboots. This test was run on both Ubuntu and Ubilinux (Debian).
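
The script used for the reboot test is not included in this report beyond its time log. As a minimal sketch of how such a reboot loop could be driven, assuming it is started once per boot (for example from a systemd unit or an @reboot cron job) and run as root, something like the following would keep a boot counter, a time log, and one dmesg file per boot. The log directory, the variable names, and the function names are all hypothetical.

```shell
# Hypothetical reboot-loop helper; paths and names are assumptions, not the reporter's script.
LOG_DIR=${LOG_DIR:-/var/log/reboot-test}
MAX_REBOOTS=${MAX_REBOOTS:-1000}

# Increment the boot counter stored in "$1/count" and print the new value.
next_boot_count() {
    dir=$1
    mkdir -p "$dir"
    count=0
    [ -f "$dir/count" ] && count=$(cat "$dir/count")
    count=$(( count + 1 ))
    printf '%s\n' "$count" > "$dir/count"
    printf '%s\n' "$count"
}

# Record this boot (timestamp and dmesg), then reboot again unless the limit is reached.
run_reboot_test() {
    n=$(next_boot_count "$LOG_DIR")
    printf '%s boot %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$n" >> "$LOG_DIR/time.log"
    dmesg > "$LOG_DIR/dmesg-$n.txt"
    sync
    [ "$n" -lt "$MAX_REBOOTS" ] && systemctl reboot
}

# run_reboot_test is meant to be called from the boot-time hook; it is not invoked here.
```

If the machine hangs, the last number in the count file shows which reboot failed, and the time log shows when the last successful boot happened.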
Comment 1 ccalderon 2018-05-30 11:25:45 UTC
*** Bug 106722 has been marked as a duplicate of this bug. ***
Comment 2 ccalderon 2018-05-30 11:28:59 UTC
Created attachment 139848 [details]
sys log for ubilinux reboot test
Comment 3 ccalderon 2018-05-30 11:32:47 UTC
Created attachment 139851 [details]
kernel log for Ubilinux reboot test
Comment 4 ccalderon 2018-05-30 11:36:07 UTC
Created attachment 139852 [details]
UART output for reboot test
Comment 5 ccalderon 2018-05-30 11:37:12 UTC
Created attachment 139853 [details]
Time log for script used in reboot test
Comment 6 ccalderon 2018-05-30 13:49:15 UTC
Testing on:
Ubuntu 16.04.3 kernel 4.4/4.9/4.10/4.13
Ubuntu 18.04 kernel 4.15
Ubilinux3 and Ubilinux4 (both Debian-based distros) with kernel 4.4/4.9.
Comment 7 Jani Saarinen 2018-05-31 06:55:31 UTC
Can you also try with our latest drm-tip (https://cgit.freedesktop.org/drm-tip) and send the dmesg captured with drm.debug=0x1e log_buf_len=4M, from boot to hang?

Or add the drm.debug=14 module parameter to get detailed debugging output, and capture the dmesg again.
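
For reference, drm.debug is a bitmask, so the two values suggested above enable slightly different sets of messages. The sketch below decodes a value using the category bits as defined in include/drm/drm_print.h in kernels of this period. That mapping is an assumption worth checking against your own tree, and the function name decode_drm_debug is purely illustrative.

```shell
# Decode a drm.debug value into the debug categories it enables.
# Bit values assumed from include/drm/drm_print.h (DRM_UT_* constants); higher bits are omitted.
decode_drm_debug() {
    mask=$(( $1 ))
    out=""
    for entry in 1:CORE 2:DRIVER 4:KMS 8:PRIME 16:ATOMIC 32:VBL; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( mask & bit )) -ne 0 ]; then
            # Append the name, with a separating space if the list is not empty.
            out="$out${out:+ }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_drm_debug 0x1e   # prints: DRIVER KMS PRIME ATOMIC
decode_drm_debug 14     # prints: DRIVER KMS PRIME
```

Under that mapping, 0x1e differs from 14 (0xe) only by additionally enabling the ATOMIC category.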
Comment 8 Jani Nikula 2018-05-31 09:30:23 UTC
Yeah, use drm-tip or latest upstream kernel with drm.debug=14, attach dmesg as well as the serial console log leading to the hang.

Did you have a display connected? If yes, try without, and vice versa.
Comment 9 ccalderon 2018-06-07 07:32:46 UTC
Hi,

I cloned the latest drm-tip kernel and built it to get the .deb packages.
Then I installed the kernel on my machine and updated GRUB, but when I reboot, the machine hangs after loading the drm kernel.

Also, where exactly do I have to put the drm.debug=0x1e log_buf_len=4M or drm.debug=14 parameters to start debugging? Sorry, that is not really clear to me.

Thank you!
Comment 10 Jani Nikula 2018-06-07 10:13:28 UTC
(In reply to ccalderon from comment #9)
> I cloned the latest drm-tip kernel and built it to get the .deb
> packages.
> Then I installed the kernel on my machine and updated GRUB, but when I
> reboot, the machine hangs after loading the drm kernel.

Possibly depends on your kernel config. You should probably start off with the distro config from /boot. Also make sure you have the updated initramfs.

https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel and relevant pages will help you.

If all else fails, you might have a look at the drm-tip ppa, although I can't endorse it in any way. http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/

> Also, where exactly do I have to put the drm.debug=0x1e log_buf_len=4M or
> drm.debug=14 parameters to start debugging? Sorry, that is not really clear
> to me.

Three basic options, 1) hit 'e' in grub menu before booting the kernel, and add them to the kernel command line (needs to be done for each boot), 2) add them to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and be sure to run update-grub afterwards (permanent), or 3) add them in modprobe configuration (see man modprobe.d) (permanent).
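
As a rough sketch of options 2 and 3 on a Debian/Ubuntu-style system, the fragment below shows where the parameters would go, followed by a small check that they reached the running kernel. The file name drm-debug.conf, the "quiet splash" defaults, and the function name has_kernel_param are illustrative assumptions. Note that log_buf_len is a core kernel parameter, so it can only be given on the kernel command line (options 1 and 2), not through modprobe.d.

```shell
# Option 2: in /etc/default/grub, then run "sudo update-grub":
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash drm.debug=0x1e log_buf_len=4M"
#
# Option 3: in /etc/modprobe.d/drm-debug.conf (hypothetical file name):
#   options drm debug=0x1e
# If the drm module is loaded from the initramfs, the initramfs may need
# regenerating (update-initramfs -u) before this takes effect.

# Return success if a parameter appears as a whole word in a kernel command line.
#   $1 = parameter, e.g. drm.debug=0x1e
#   $2 = command line string (defaults to the running kernel's /proc/cmdline)
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after rebooting with option 1 or 2:
#   has_kernel_param drm.debug=0x1e && echo "drm.debug is set"
```

The check only inspects the kernel command line. A value set through modprobe.d (option 3) would not show up there and would have to be read from the module's parameter under /sys/module/drm/parameters instead.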
Comment 11 ccalderon 2018-06-11 11:53:39 UTC
Created attachment 140115 [details]
k4.17:: dmesg file for each reboot
Comment 12 ccalderon 2018-06-11 11:55:07 UTC
Created attachment 140116 [details]
k4.17:: Reboot time log
Comment 13 ccalderon 2018-06-11 12:17:11 UTC
Created attachment 140118 [details]
k4.17:: System log file
Comment 14 ccalderon 2018-06-11 12:32:15 UTC
Created attachment 140119 [details]
k4.17:: kernel log for drm k4.17
Comment 15 ccalderon 2018-06-11 12:36:41 UTC
Hi,

The latest DRM version was tested using kernel 4.17.
After more than 100 reboots the board hung again.

I have attached the dmesg files for each reboot, along with the kernel and system log files.
GRUB was modified to include drm.debug=0x1e log_buf_len=4M.

Thank you.
Comment 16 Jani Saarinen 2018-06-25 10:06:13 UTC
When sending logs, do not compress them (tar, zip, etc.); just attach plain text files, thanks. I suppose decent enough logs have been provided now.
Comment 17 Lakshmi 2018-09-11 09:17:54 UTC
Reporter, do you still have the issue?
Comment 18 Lakshmi 2018-10-18 15:21:51 UTC
No feedback for more than a month. I assume the reporter no longer has the issue.

In case it appears again:
Please try to reproduce the issue using the latest drm-tip (https://cgit.freedesktop.org/drm-tip).
If the problem persists with the latest drm-tip, set the kernel parameters drm.debug=0x1e log_buf_len=4M and reboot.
Then reproduce the issue and attach the dmesg log.

This way we get more information about the bug, which makes debugging easier.

