Bug 112428 - Unrecoverable GPU hang with 5.4.0 kernel
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: not set major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2019-11-29 13:11 UTC by Laurent Bonnaud
Modified: 2019-11-29 19:54 UTC

Attachments
Full dmesg (77.21 KB, text/plain)
2019-11-29 13:14 UTC, Laurent Bonnaud

Description Laurent Bonnaud 2019-11-29 13:11:21 UTC
Hi,

I was using my system, doing nothing special, and the GPU hung.

There are many reports about GPU hangs, but this one seems different:
 - it occurred with kernel 5.4.0 rather than a 5.3.x kernel (my Intel GPU also had many problems with 5.3.x kernels)
 - the GPU never recovered (which BTW caused some data loss). I had to ssh into the system to collect debug info.

Here is some system info (full details below):

Kernel: Linux xeelee 5.4.0-050400-generic #201911242031 SMP Mon Nov 25 01:35:10 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Distribution: Ubuntu 19.10

Machine: Intel NUC7i5BNB

Display connector: HDMI 2.0

[233850.738984] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[233850.739750] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[233850.739824] i915 0000:00:02.0: Resetting chip for hang on rcs0
[233850.741595] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[233850.742349] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[234291.141681] INFO: task kworker/0:0:5853 blocked for more than 120 seconds.
[234291.141690]       Not tainted 5.4.0-050400-generic #201911242031
[234291.141693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[234291.141697] kworker/0:0     D    0  5853      2 0x80004000
[234291.141823] Workqueue: events i915_hotplug_work_func [i915]
[234291.141826] Call Trace:
[234291.141839]  __schedule+0x2e3/0x740
[234291.141846]  schedule+0x42/0xb0
[234291.141852]  schedule_preempt_disabled+0xe/0x10
[234291.141857]  __ww_mutex_lock.isra.0+0x261/0x7f0
[234291.141864]  __ww_mutex_lock_slowpath+0x16/0x20
[234291.141869]  ww_mutex_lock+0x38/0x90
[234291.141916]  drm_modeset_lock+0x35/0xb0 [drm]
[234291.142025]  intel_dp_retrain_link+0x94/0x1c0 [i915]
[234291.142122]  intel_ddi_hotplug+0x7a/0x350 [i915]
[234291.142130]  ? __switch_to_asm+0x40/0x70
[234291.142135]  ? __switch_to_asm+0x34/0x70
[234291.142140]  ? __switch_to_asm+0x40/0x70
[234291.142146]  ? __switch_to_asm+0x40/0x70
[234291.142238]  i915_hotplug_work_func+0x18b/0x280 [i915]
[234291.142249]  process_one_work+0x1ec/0x3a0
[234291.142256]  worker_thread+0x4d/0x400
[234291.142262]  kthread+0x104/0x140
[234291.142268]  ? process_one_work+0x3a0/0x3a0
[234291.142274]  ? kthread_park+0x90/0x90
[234291.142281]  ret_from_fork+0x35/0x40
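
For triage, the pattern in the log above (repeated engine reset timeouts followed by a hung-task warning) can be picked out of a saved dmesg with a small script. The following is only a minimal sketch and not part of the original report: the function name `summarize` and the regular expressions are assumptions derived solely from the lines quoted above, so other kernel versions may word these messages differently.

```python
import re

# Kernel timestamp prefix, e.g. "[233850.739750] ".
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s(?P<msg>.*)$")
# Message patterns copied from the log lines quoted in this report (assumed stable).
RESET_TIMEOUT_RE = re.compile(r"\*ERROR\* (?P<engine>\w+) reset request timed out")
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize(dmesg_text):
    """Return (reset_timeouts, hung_tasks) found in dmesg-style text.

    reset_timeouts: list of (timestamp, engine)
    hung_tasks:     list of (timestamp, task_name, pid)
    """
    reset_timeouts = []
    hung_tasks = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.rstrip())
        if not m:
            continue  # skip lines without a kernel timestamp
        ts, msg = float(m.group("ts")), m.group("msg")
        reset = RESET_TIMEOUT_RE.search(msg)
        if reset:
            reset_timeouts.append((ts, reset.group("engine")))
            continue
        hung = HUNG_TASK_RE.search(msg)
        if hung:
            hung_tasks.append((ts, hung.group("task"), int(hung.group("pid"))))
    return reset_timeouts, hung_tasks
```

Run against the attached full dmesg, this would list each failed reset attempt and each blocked task with its timestamp, which makes the gap between the failed resets and the first hung-task warning easy to see.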
Comment 1 Laurent Bonnaud 2019-11-29 13:12:23 UTC
Here is full system info:

# inxi -Fm
System:    Host: xeelee Kernel: 5.4.0-050400-generic x86_64 bits: 64 Console: tty 31
           Distro: Ubuntu 19.10 (Eoan Ermine)
Machine:   Type: Desktop Mobo: Intel model: NUC7i5BNB v: J31144-303 serial: GEBN715009CU UEFI: Intel
           v: BNKBL357.86A.0080.2019.0725.1139 date: 07/25/2019
Memory:    RAM: total: 15.59 GiB used: 5.07 GiB (32.6%)
           Array-1: capacity: 32 GiB slots: 2 EC: None
           Device-1: ChannelA-DIMM0 size: No Module Installed
           Device-2: ChannelB-DIMM0 size: 16 GiB speed: 2133 MT/s
CPU:       Topology: Dual Core model: Intel Core i5-7260U bits: 64 type: MCP L2 cache: 4096 KiB
           Speed: 642 MHz min/max: 400/3400 MHz Core speeds (MHz): 1: 682 2: 696
Graphics:  Device-1: Intel Iris Plus Graphics 640 driver: i915 v: kernel
           Display: server: X.org 1.20.5 driver: i915 tty: 113x59
           Message: Advanced graphics data unavailable in console for root.
Audio:     Device-1: Intel Sunrise Point-LP HD Audio driver: snd_hda_intel
           Sound Server: ALSA v: k5.4.0-050400-generic
Network:   Device-1: Intel Ethernet I219-V driver: e1000e
           IF: eno1 state: up speed: 1000 Mbps duplex: full mac: f4:4d:30:6d:87:6e
Partition: ID-1: / size: 465.26 GiB used: 294.53 GiB (63.3%) fs: btrfs dev: /dev/nvme0n1p2
Sensors:   System Temperatures: cpu: 43.5 C mobo: N/A
           Fan Speeds (RPM): N/A
Info:      Processes: 314 Uptime: 2d 17h 39m Init: systemd runlevel: 5 Shell: bash inxi: 3.0.36
Comment 2 Laurent Bonnaud 2019-11-29 13:13:12 UTC
Note that I cannot get more info about this GPU hang:

# cat /sys/class/drm/card0/error
cat: /sys/class/drm/card0/error: Cannot allocate memory
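
When collecting debug info from a script rather than by hand, the same read can be attempted with explicit error handling so that a failure like the one above is recorded instead of aborting the collection. This is a minimal sketch under stated assumptions: the helper name `read_error_state` is hypothetical, and the only path used is the one quoted in this comment.

```python
import errno
import os


def read_error_state(path="/sys/class/drm/card0/error"):
    """Try to read the error state file; return (data, problem).

    Exactly one of the two values is None. On failure, `problem` is a
    short string such as "ENOMEM: Cannot allocate memory".
    """
    try:
        with open(path, "rb") as f:
            return f.read(), None
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one is known.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return None, f"{name}: {exc.strerror}"


if __name__ == "__main__":
    data, problem = read_error_state()
    if problem is not None:
        print("could not read error state:", problem)
    else:
        print(f"read {len(data)} bytes of error state")
```

The `cat` failure quoted above ("Cannot allocate memory") is the message for errno `ENOMEM`, so on that system the sketch would be expected to take the failure branch.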
Comment 3 Laurent Bonnaud 2019-11-29 13:14:10 UTC
Created attachment 146048
Full dmesg
Comment 4 Martin Peres 2019-11-29 19:54:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/670.

