Bug 100110 - [SNB] 4.11-rc1: gpu hang after about 10 minutes of Plasma session usage during rsync file copy
Status: CLOSED DUPLICATE of bug 99671
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-08 10:50 UTC by Martin Steigerwald
Modified: 2017-07-24 22:39 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
complete dmesg of boot where intel gpu hung (61.12 KB, text/plain)
2017-03-08 10:50 UTC, Martin Steigerwald
no flags
Xorg log of session where intel GPU hung (29.85 KB, text/plain)
2017-03-08 10:53 UTC, Martin Steigerwald
no flags
card0-error debug file after gpu hang (20.76 KB, application/octet-stream)
2017-03-08 11:09 UTC, Martin Steigerwald
no flags

Description Martin Steigerwald 2017-03-08 10:50:40 UTC
Created attachment 130120
complete dmesg of boot where intel gpu hung

I got around to testing whether the GPU hangs on my ThinkPad T520 with Sandybridge HD 3000 graphics, which started with 4.9, are fixed in 4.11-rc1. Unfortunately they are not, but this time I had another box from which to SSH into the laptop and gather the error file.

I compiled 4.11-rc1 plus a few commits, booted it and then copied some files around with rsync. I had one Plasma session running kwin_x11 with compositing enabled. During the rsync, which might be related or pure coincidence, the graphics output froze. I was able to switch to TTY1, but the cursor was frozen there too.

I SSH'd from my training workstation into the laptop and saw the following in dmesg:

[ 1393.553774] [drm] GPU HANG: ecode 6:0:0x00ffffff, in kwin_x11 [2261], reason: Hang on render ring, action: reset
[ 1393.553778] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1393.553779] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1393.553780] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1393.553781] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1393.553782] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1393.553833] drm/i915: Resetting chip after gpu hang
[ 1401.549065] drm/i915: Resetting chip after gpu hang
[ 1401.549077] [drm:i915_reset] *ERROR* GPU recovery failed

When the hang happened I waited at least a minute for the GPU to recover, but apparently it wasn't able to.
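The two key facts in the dmesg excerpt (which process and ring hung, and whether the reset brought the GPU back) can be pulled out mechanically when comparing several such hangs. A minimal Python sketch; the regular expression is an assumption fitted to the 4.11-rc1 message format quoted in this report and may not match the wording of other kernel versions:

```python
import re

# Assumption: fitted to the 4.11-rc1 "GPU HANG" line quoted in this report;
# other kernel versions may word the message differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,]+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if it does not match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


def recovery_failed(lines):
    """True if any line reports that the chip reset did not bring the GPU back."""
    return any("GPU recovery failed" in line for line in lines)


# Lines copied from the dmesg excerpt above.
LOG = [
    "[ 1393.553774] [drm] GPU HANG: ecode 6:0:0x00ffffff, in kwin_x11 [2261], "
    "reason: Hang on render ring, action: reset",
    "[ 1393.553833] drm/i915: Resetting chip after gpu hang",
    "[ 1401.549065] drm/i915: Resetting chip after gpu hang",
    "[ 1401.549077] [drm:i915_reset] *ERROR* GPU recovery failed",
]

hangs = [h for h in map(parse_gpu_hang, LOG) if h is not None]
print(hangs[0]["process"], hangs[0]["pid"], hangs[0]["reason"])  # kwin_x11 2261 Hang on render ring
print("recovery failed:", recovery_failed(LOG))  # recovery failed: True
```

Matching on the literal "GPU recovery failed" text is deliberately crude; it distinguishes this hang from the 4.8-era ones that recovered after a few seconds.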

This is on a ThinkPad T520 with Sandybridge HD 3000 graphics running Debian Sid, with Xorg using the modesetting driver. There is no custom configuration such as enabling DRI3; the modesetting driver has been the default on Debian Sid for some time.

I will attach complete dmesg and Xorg log of that boot.

Relevant package versions:

~> apt-show-versions | egrep "libdrm|libgbm|xserver-xorg:|xserver-xorg-core:|libgl1-mesa-dri" | grep amd64
libdrm-amdgpu1:amd64/sid 2.4.74-1 uptodate
libdrm-dev:amd64/sid 2.4.74-1 uptodate
libdrm-intel1:amd64/sid 2.4.74-1 uptodate
libdrm-nouveau2:amd64/sid 2.4.74-1 uptodate
libdrm-radeon1:amd64/sid 2.4.74-1 uptodate
libdrm2:amd64/sid 2.4.74-1 uptodate
libgbm1:amd64/experimental 17.0.1-1 uptodate
libgl1-mesa-dri:amd64/experimental 17.0.1-1 uptodate
xserver-xorg:amd64/sid 1:7.7+18 uptodate
xserver-xorg-core:amd64/sid 2:1.19.2-1 uptodate

What I offer:
- test patches
- test different driver settings
- provide additional information if obtainable within a reasonable amount of time

What I do not offer:
- git bisect (due to the amount of time needed and the fact that this is a production machine)
Comment 1 Martin Steigerwald 2017-03-08 10:53:42 UTC
Created attachment 130121
Xorg log of session where intel GPU hung

Hardware information. No external display was connected.

~> phoronix-test-suite system-info

Phoronix Test Suite v5.2.1
[…]
System Information

Hardware:
Processor: Intel Core i5-2520M @ 3.20GHz (4 Cores), Motherboard: LENOVO 42433WG, Chipset: Intel 2nd Generation Core Family DRAM, Memory: 16384MB, Disk: 300GB INTEL SSDSA2CW30 + 480GB Crucial_CT480M50, Graphics: Intel 2nd Generation Core Family IGP, Audio: Conexant CX20590, Network: Intel 82579LM Gigabit Connection + Intel Centrino Advanced-N 6205

Software:
OS: Debian 9.0, Kernel: 4.8.16-tp520+ (x86_64), Desktop: KDE Frameworks 5, Display Server: X Server 1.19.2, Display Driver: modesetting 1.19.2, OpenGL: 3.3 Mesa 17.0.1, File-System: btrfs, Screen Resolution: 1920x1080
Comment 2 Martin Steigerwald 2017-03-08 11:09:57 UTC
Created attachment 130122
card0-error debug file after gpu hang

I almost forgot what is probably the most important file for you. Here it is. Thanks, Martin
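The error state is held in memory, so it has to be copied out of sysfs over SSH before the machine is rebooted. A minimal Python sketch of that step, assuming the /sys/class/drm/card0/error path printed in the dmesg excerpt of this report; the function name `save_error_state` and the output file naming are hypothetical:

```python
import time
from pathlib import Path

# Path printed by the kernel in the dmesg excerpt of this report.
DEFAULT_ERROR_FILE = Path("/sys/class/drm/card0/error")


def save_error_state(dest_dir, src=DEFAULT_ERROR_FILE):
    """Copy the GPU error state from src into a timestamped file under dest_dir.

    Returns the path of the saved copy. Reading the real sysfs file usually
    requires root. The "card0-error-<timestamp>" name is a hypothetical
    convention chosen for this sketch.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Read until EOF rather than relying on the reported file size, which for
    # sysfs files need not match the amount of data actually returned.
    data = Path(src).read_bytes()
    dest = dest_dir / f"card0-error-{time.strftime('%Y%m%d-%H%M%S')}"
    dest.write_bytes(data)
    return dest
```

For example, calling `save_error_state("/root/gpu-hang")` as root right after a hang leaves a copy that can be attached to the report.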
Comment 3 Chris Wilson 2017-03-08 11:16:46 UTC
Quick question: has the nature of the hangs changed? With 4.9, do you also think they were related to memory pressure (i.e. is rsync a critical factor)?
Comment 4 Martin Steigerwald 2017-03-08 12:25:43 UTC
Chris, I have no idea. Sometimes it hung while playing PlaneShift, and with 4.10 once while switching between two Plasma sessions… but this could all just be coincidence. I currently have no clear way to reproduce these hangs :(. I reported them all here, but this is the first one where I captured the card error file.

All I know is that 4.8 is the last kernel that runs stably. There were some GPU hangs with it as well while playing PlaneShift, but all of them recovered within a few seconds.
Comment 5 Martin Steigerwald 2017-03-08 12:26:18 UTC
Also, this hang with 4.11-rc1 occurred without an external display attached and with just one Plasma session.
Comment 6 Martin Steigerwald 2017-03-08 12:27:54 UTC
Oh, sorry, and regarding memory pressure: I don't think the machine could have been under memory pressure, especially not after a fresh boot with just one Plasma session. Current memory figures:


~> free -m
              total        used        free      shared  buff/cache   available
Mem:          15829        4196         210         268       11422       11032
Swap:         12287           0       12287
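The argument against memory pressure rests on simple arithmetic over these figures: 11032 MiB of 15829 MiB, roughly 70 %, is reported as available and no swap is in use. A minimal Python sketch that parses this `free -m` layout (assuming the procps column headers shown above) and computes that share. Note that the figures were taken after the hang, so they describe the machine at that moment rather than during the rsync:

```python
def parse_free_m(text):
    """Parse `free -m` output into {row_name: {column_name: MiB}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    table = {}
    for line in lines[1:]:
        name, *values = line.split()
        # zip() stops at the shorter list, so the Swap row (3 values)
        # maps onto total/used/free only.
        table[name.rstrip(":")] = dict(zip(header, map(int, values)))
    return table


# Output copied from the comment above.
FREE_M = """\
              total        used        free      shared  buff/cache   available
Mem:          15829        4196         210         268       11422       11032
Swap:         12287           0       12287
"""

table = parse_free_m(FREE_M)
available_pct = 100 * table["Mem"]["available"] / table["Mem"]["total"]
print(f"available: {available_pct:.1f}% of RAM, swap used: {table['Swap']['used']} MiB")
# available: 69.7% of RAM, swap used: 0 MiB
```

The low "free" value (210 MiB) on its own would be misleading here, since most of the memory sits in buff/cache, which the kernel can reclaim; that is why the "available" column is the one used.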
Comment 7 Chris Wilson 2017-03-08 12:28:29 UTC

*** This bug has been marked as a duplicate of bug 99671 ***

