Summary: | [drm] stuck on render ring | |
---|---|---|---
Product: | DRI | Reporter: | Denis Sokolovsky <ganellon>
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | medium | CC: | intel-gfx-bugs, ioann.sys, Manuel.h87
Version: | XOrg git | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | SNB | i915 features: | GPU hang
Created attachment 118550 [details]
dmesg.txt
Most relevant part of dmesg. Since I didn't have drm debug turned on, this is all the relevant info I have.

A few reports now with Sandybridge + kernel 4.2.0. Could you please double check with kernel 4.1.6+ to see if it is indeed the new kernel that introduces the fault?

Created attachment 118557 [details]
error-4.1.6.bz2
A friend of mine confirmed a GPU hang on 4.1.6. Sadly, his logging is somewhat misconfigured, so the only useful log info is:
Sep 30 08:42:13 workhorse kernel: [drm] GPU HANG: ecode 6:0:0x85fffffc, in plasmashell [4084], reason: Ring hung, action: reset
uname -m: x86_64
uname -r: 4.1.6-gentoo
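
For anyone collecting similar reports, the hang summary line above can be pulled apart with a small script. This is only a sketch matched against the single log line quoted in this thread; the field names (`gen`, `ring`, `ecode`) are labels chosen here for illustration, and reading the first number as the GPU generation is an assumption (6 would be consistent with SNB).

```python
import re

# Regex fitted to the one "GPU HANG" line quoted in this report.
# Group names are illustrative labels, not official i915 terminology.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from a GPU HANG log line, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    return info


sample = (
    "Sep 30 08:42:13 workhorse kernel: [drm] GPU HANG: ecode 6:0:0x85fffffc, "
    "in plasmashell [4084], reason: Ring hung, action: reset"
)
print(parse_gpu_hang(sample))
```

Grouping parsed lines by `process` and `ecode` may make it easier to tell whether several reports describe the same hang.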

(In reply to Denis Sokolovsky from comment #3)
> Created attachment 118557 [details]
> error-4.1.6.bz2

That error looks more likely to be a mesa bug: a definite hang inside a batch suggests a userspace bug as opposed to a bug in the kernel submission.

If the bug is in userspace then it is most probably in plasmashell, as we have the same mesa version (11.0.0). Also, I have a lot of options on the kernel command line, namely "i915.semaphores=1 i915.enable_rc6=7 i915.enable_fbc=1 i915.lvds_downclock=1"

(In reply to Denis Sokolovsky from comment #5)
> Also I have a lot of options in kernel command
> line, namely "i915.semaphores=1 i915.enable_rc6=7 i915.enable_fbc=1
> i915.lvds_downclock=1"

Please reproduce the issue without any of those set. They are basically debug options, and we don't support changing them from their platform specific defaults.

Okay, should I try 4.1.6 (4.1.9), 4.2.1 (4.2.2) or drm-intel-nightly?

Without the fancy kernel cmdline stuff things get much worse. The system just hangs completely: no magic sysrq, no network, broken filesystem, etc. I'm not sure right now but, afair, I added "i915.semaphores=1" to the cmdline because I had stability problems before, which were fixed by turning semaphores on.

Created attachment 118581 [details]
kern-drm-nightly.log.bz2
On my system dmesg is saved into kern.log but, due to the system hang, I'm not sure it contains all messages up to the system's death.
I cut part of the messages, because the complete file, compressed with "bzip2 -9", didn't fit in the 3MB limit.
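
When retesting with platform defaults, as requested earlier in this thread, it can help to list exactly which `i915.*` options are set and what the command line looks like without them. The sketch below works on a command-line string only; `root=/dev/sda1` and `quiet` in the example are hypothetical placeholders, and the helper names are made up for illustration.

```python
def i915_options(cmdline):
    """Return the i915.* parameters found in a kernel command line as a dict."""
    options = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            name, value = token[len("i915."):].split("=", 1)
            options[name] = value
    return options


def strip_i915_options(cmdline):
    """Return the command line with every i915.* token removed."""
    return " ".join(t for t in cmdline.split() if not t.startswith("i915."))


# The i915.* options are the ones quoted in this report; the other tokens
# are hypothetical placeholders for whatever else the system boots with.
example = (
    "root=/dev/sda1 i915.semaphores=1 i915.enable_rc6=7 "
    "i915.enable_fbc=1 i915.lvds_downclock=1 quiet"
)
print(i915_options(example))
print(strip_i915_options(example))
```

On a running system the same functions could be fed the contents of `/proc/cmdline`.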

Things are not as straightforward as they seemed initially. With the same configuration and workload, the system on the 4.2.1 kernel runs for a lot longer without the hang that I saw on drm-intel-nightly. And even when a hang occurs, the kernel driver manages to restart rendering.

Created attachment 118599 [details]
dmesg.txt.bz2
dmesg from 4.2.1 with drm debug
Created attachment 118601 [details]
error.bz2
Corresponding error info

Not sure if this bug is still relevant, as I haven't experienced a GPU hang since I switched to the 4.3.0 kernel.

Actually, not only the kernel was new. Almost at the same time there were a few updates: chrome (47->48), mesa (11.0->11.1) and kernel (4.2.x->4.3.0). Since Dec 20 I can find only one warning about the GPU in the logs (see below), but no hangs with the current setup, usual workflow and an uptime of 17/38+ days (excluding/including sleeps).

Jan 9 20:17:19 isis kernel: [1151010.810868] WARNING: CPU: 0 PID: 0 at /usr/src/linux-4.3.0-gentoo/drivers/gpu/drm/i915/intel_display.c:11293 intel_check_page_flip+0xed/0x100()
Jan 9 20:17:19 isis kernel: [1151010.810871] Kicking stuck page flip: queued at 28897891, now 28897895

(In reply to Denis Sokolovsky from comment #14)
> Actually, not only kernel was new. Almost at the same time were few updates:
> chrome (47->48), mesa (11.0->11.1) and kernel (4.2.x->4.3.0). Since Dec 20,
> in logs, I can found only one warning about GPU (see below), but no hungs
> with current setup, usual workflow and uptime for 17/38+ days
> (excluding/including sleeps).
>
> Jan 9 20:17:19 isis kernel: [1151010.810868] WARNING: CPU: 0 PID: 0 at
> /usr/src/linux-4.3.0-gentoo/drivers/gpu/drm/i915/intel_display.c:11293
> intel_check_page_flip+0xed/0x100()
> Jan 9 20:17:19 isis kernel: [1151010.810871] Kicking stuck page flip:
> queued at 28897891, now 28897895

Closing this GPU hang issue now. Regarding the warning, please update your kernel and, if it still occurs, file a new bug and attach the kernel log.

Created attachment 118549 [details]
/sys/class/drm/card0/error

The hang always occurred while chrome was running, but the actual triggering time/action/environment is unclear. I built drm-nightly, but since my current kernel has drm-i915 built in and reproducing the bug takes an unknown amount of time, I can't report anything about it. I first noticed this issue on kernel 4.2.0.

- system architecture: x86_64
- kernel version: 4.2.1-gentoo
- linux distribution: Gentoo
- machine: Lenovo T430 (2344BZU)
- display connector: panel connected via LVDS, VGA and DP++ disconnected
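
Since the error state at `/sys/class/drm/card0/error` is what gets attached to reports like this one, a small helper to save a compressed copy right after a hang may be handy. This is a sketch under assumptions: the function name and default output file name are made up here (the output name simply mirrors the `error.bz2` attachments in this thread), and reading the sysfs file typically needs root.

```python
import bz2
from pathlib import Path


def save_error_state(src="/sys/class/drm/card0/error", dst="error.bz2"):
    """Write a bzip2-compressed copy of the GPU error state to dst.

    Returns the destination path, or None if the source file is missing.
    """
    src_path = Path(src)
    if not src_path.exists():
        return None
    data = src_path.read_bytes()
    # Compression level 9 matches the "bzip2 -9" used for the logs here.
    with bz2.open(dst, "wb", compresslevel=9) as out:
        out.write(data)
    return Path(dst)


# Example call (not executed here, since it touches sysfs):
#   save_error_state(dst="error-4.2.1.bz2")
```

Capturing the file soon after the hang is reported in dmesg seems safest, as a copy held only in memory will not survive a reboot.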