Bug 92201 - [drm] stuck on render ring
Summary: [drm] stuck on render ring
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-09-30 17:19 UTC by Denis Sokolovsky
Modified: 2016-09-22 08:56 UTC
CC List: 3 users

See Also:
i915 platform: SNB
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (274.51 KB, text/plain)
2015-09-30 17:19 UTC, Denis Sokolovsky
dmesg.txt (2.53 KB, text/plain)
2015-09-30 17:24 UTC, Denis Sokolovsky
error-4.1.6.bz2 (249.74 KB, application/octet-stream)
2015-10-01 11:34 UTC, Denis Sokolovsky
kern-drm-nightly.log.bz2 (136.85 KB, application/octet-stream)
2015-10-01 22:51 UTC, Denis Sokolovsky
dmesg.txt.bz2 (103.72 KB, application/x-bzip)
2015-10-02 13:23 UTC, Denis Sokolovsky
error.bz2 (233.76 KB, application/x-bzip)
2015-10-02 13:25 UTC, Denis Sokolovsky

Description Denis Sokolovsky 2015-09-30 17:19:12 UTC
Created attachment 118549
/sys/class/drm/card0/error

The hang always occurs while Chrome is running, but the actual trigger (time, action, environment) is unclear. I built drm-nightly, but since my current kernel has drm/i915 built in and reproducing the bug takes an unpredictable amount of time, I can't report anything about it yet. I first noticed this issue on kernel 4.2.0.

- system architecture: x86_64
- kernel version: 4.2.1-gentoo
- linux distribution: Gentoo
- machine: Lenovo T430 (2344BZU)
- display connector: panel connected via LVDS; VGA and DP++ disconnected
Comment 1 Denis Sokolovsky 2015-09-30 17:24:22 UTC
Created attachment 118550
dmesg.txt

The most relevant part of dmesg. Since I didn't have DRM debugging turned on, this is all the relevant information I have.
Comment 2 Chris Wilson 2015-09-30 18:40:24 UTC
A few reports now with Sandybridge + kernel 4.2.0. Could you please double check with kernel 4.1.6+ to see if it is indeed the new kernel that introduces the fault?
Comment 3 Denis Sokolovsky 2015-10-01 11:34:07 UTC
Created attachment 118557
error-4.1.6.bz2

A friend of mine confirmed a GPU hang on 4.1.6. Unfortunately his logging is somewhat misconfigured, so the only useful log entry is:

Sep 30 08:42:13 workhorse kernel: [drm] GPU HANG: ecode 6:0:0x85fffffc, in plasmashell [4084], reason: Ring hung, action: reset

uname -m: x86_64
uname -r: 4.1.6-gentoo
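For anyone triaging several such logs, here is a minimal Python sketch that pulls the fields out of a "GPU HANG" line like the one above. It assumes the message format exactly as quoted here (other kernel versions may word it differently); `parse_gpu_hang`, `find_hangs` and the `GpuHang` field names are hypothetical, not part of any kernel or i915 tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Pattern modelled on the single "GPU HANG" line quoted in this report;
# the exact wording may differ between kernel versions (assumption).
HANG_RE = re.compile(
    r"\[drm\] GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), "
    r"action: (?P<action>\S+)"
)


class GpuHang(NamedTuple):
    ecode: str   # error code as printed, e.g. "6:0:0x85fffffc"
    comm: str    # name of the process whose batch hung
    pid: int
    reason: str
    action: str


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed hang report, or None if the line is not one."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(m["ecode"], m["comm"], int(m["pid"]), m["reason"], m["action"])


def find_hangs(lines: Iterable[str]) -> List[GpuHang]:
    """Collect every hang report from an iterable of log lines."""
    return [h for h in map(parse_gpu_hang, lines) if h is not None]
```

Passing an open kern.log to `find_hangs` gives one entry per hang, which makes it easy to see whether the hanging process (Chrome, plasmashell, ...) is always the same.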
Comment 4 Chris Wilson 2015-10-01 11:40:58 UTC
(In reply to Denis Sokolovsky from comment #3)
> Created attachment 118557
> error-4.1.6.bz2

That error looks more likely to be a Mesa bug: a definite hang inside a batch, suggesting a userspace bug as opposed to a bug in the kernel submission.
Comment 5 Denis Sokolovsky 2015-10-01 12:02:12 UTC
If the bug is in userspace, then it is most probably in plasmashell, since we have the same Mesa version (11.0.0). Also, I have a lot of options on the kernel command line, namely "i915.semaphores=1 i915.enable_rc6=7 i915.enable_fbc=1 i915.lvds_downclock=1"
Comment 6 Jani Nikula 2015-10-01 12:41:44 UTC
(In reply to Denis Sokolovsky from comment #5)
> Also I have a lot of options in kernel command
> line, namely "i915.semaphores=1 i915.enable_rc6=7 i915.enable_fbc=1
> i915.lvds_downclock=1"

Please reproduce the issue without any of those set. They are basically debug options, and we don't support changing them from their platform specific defaults.
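A quick way to confirm that none of these overrides are still in effect before retesting: a minimal Python sketch that lists the `i915.*` options present on a kernel command line. It assumes the options are passed on the command line (as here, with i915 built in) rather than through modprobe configuration, and it does not handle quoted values containing spaces; `i915_overrides` is a hypothetical helper.

```python
from typing import Dict


def i915_overrides(cmdline: str) -> Dict[str, str]:
    """Return the i915.* module options set on a kernel command line.

    Simplification: tokens are split on whitespace, so quoted values
    containing spaces are not handled.
    """
    opts: Dict[str, str] = {}
    for token in cmdline.split():
        if token.startswith("i915."):
            # "i915.enable_rc6=7" -> ("enable_rc6", "7")
            key, _, value = token[len("i915."):].partition("=")
            opts[key] = value
    return opts
```

On the affected machine, `i915_overrides(open("/proc/cmdline").read())` should return an empty dict once the options have been removed and the system rebooted.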
Comment 7 Denis Sokolovsky 2015-10-01 13:23:05 UTC
Okay, should I try 4.1.6 (4.1.9), 4.2.1 (4.2.2) or drm-intel-nightly?
Comment 8 Denis Sokolovsky 2015-10-01 22:48:40 UTC
Without the extra kernel command-line options, things get much worse. The system hangs completely: no magic SysRq, no network, a broken filesystem, etc. I'm not sure right now, but as far as I remember, I added "i915.semaphores=1" to the command line because I had stability problems before, which were fixed by turning semaphores on.
Comment 9 Denis Sokolovsky 2015-10-01 22:51:52 UTC
Created attachment 118581
kern-drm-nightly.log.bz2

On my system dmesg is saved to kern.log, but because of the system hang I'm not sure it contains all messages up to the point the system died.

I cut some of the messages, because the complete file, even compressed with "bzip2 -9", didn't fit within the 3 MB attachment limit.
Comment 10 Denis Sokolovsky 2015-10-02 13:22:11 UTC
Things are not as straightforward as they initially seemed. With the same configuration and workload, the system on the 4.2.1 kernel runs much longer without the hang I saw on drm-intel-nightly. And even when a hang does occur, the kernel driver manages to restart rendering.
Comment 11 Denis Sokolovsky 2015-10-02 13:23:34 UTC
Created attachment 118599
dmesg.txt.bz2

dmesg from 4.2.1 with DRM debugging enabled
Comment 12 Denis Sokolovsky 2015-10-02 13:25:20 UTC
Created attachment 118601
error.bz2

Corresponding error info
Comment 13 Denis Sokolovsky 2016-01-20 11:26:33 UTC
I'm not sure this bug is still relevant, as I haven't experienced a GPU hang since I switched to the 4.3.0 kernel.
Comment 14 Denis Sokolovsky 2016-01-20 11:48:33 UTC
Actually, the kernel wasn't the only new thing. There were several updates at almost the same time: Chrome (47->48), Mesa (11.0->11.1) and the kernel (4.2.x->4.3.0). Since Dec 20, I can find only one GPU-related warning in the logs (see below), and no hangs with the current setup, my usual workflow and an uptime of 17/38+ days (excluding/including sleep).

Jan  9 20:17:19 isis kernel: [1151010.810868] WARNING: CPU: 0 PID: 0 at /usr/src/linux-4.3.0-gentoo/drivers/gpu/drm/i915/intel_display.c:11293 intel_check_page_flip+0xed/0x100()
Jan  9 20:17:19 isis kernel: [1151010.810871] Kicking stuck page flip: queued at 28897891, now 28897895
Comment 15 yann 2016-09-22 08:56:26 UTC
(In reply to Denis Sokolovsky from comment #14)
> Actually, not only kernel was new. Almost at the same time were few updates:
> chrome (47->48), mesa (11.0->11.1) and kernel (4.2.x->4.3.0). Since Dec 20,
> in logs, I can found only one warning about GPU (see below), but no hungs
> with current setup, usual workflow and uptime for 17/38+ days
> (excluding/including sleeps).
> 
> Jan  9 20:17:19 isis kernel: [1151010.810868] WARNING: CPU: 0 PID: 0 at
> /usr/src/linux-4.3.0-gentoo/drivers/gpu/drm/i915/intel_display.c:11293
> intel_check_page_flip+0xed/0x100()
> Jan  9 20:17:19 isis kernel: [1151010.810871] Kicking stuck page flip:
> queued at 28897891, now 28897895

Closing this GPU hang issue now. Regarding the warning, please update your kernel and, if it still occurs, file a new bug and attach the kernel log.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.