96965 – stuck on render ring; GPU HANG: ecode 9:0:0x86dffffd, in Xorg [9455], reason: Ring hung, action: reset

Status:	RESOLVED INVALID

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Ian Romanick
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2016-07-17 14:11 UTC by Peter Gervai
Modified:	2017-02-10 22:39 UTC
CC List:	1 user

See Also:
i915 platform:	SKL
i915 features:	GPU hang

Attachments
error log from kernel (71.27 KB, application/gzip) 2016-07-17 14:11 UTC, Peter Gervai

Description Peter Gervai 2016-07-17 14:11:03 UTC

Created attachment 125118
error log from kernel

My kernel beat me to this.
Log attached.

Comment 1 Peter Gervai 2016-07-17 14:15:54 UTC

As a sidenote: plenty of ugly stuff is happening all the time, like the following. The Net Wisdom says !wm_changed is harmless, but ugly nevertheless. Resetting usually works, but hangs the machine for 2-5 seconds.

[233977.486646] ------------[ cut here ]------------
[233977.486661] WARNING: CPU: 1 PID: 9455 at drivers/gpu/drm/i915/intel_pm.c:3586 skl_update_other_pipe_wm+0x406/0x420 [i915]
[233977.486662] WARN_ON(!wm_changed)
[233977.486663] Modules linked in:
[233977.486664]  nfnetlink_log cfg80211 zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) cpuid xt_NFQUEUE nft_queue nf_tables nfnetlink_queu
[233977.486689]  snd_hda_codec_hdmi snd_seq_device asus_wmi mxm_wmi sparse_keymap snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_
[233977.486709]  xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack nf_log_ipv4 nf_log_common xt_LOG xt_limit iptable_filter ip_tables x_tab
[233977.486720] CPU: 1 PID: 9455 Comm: Xorg Tainted: P     U  W  OE   4.6.4-xanmod6 #1
[233977.486721] Hardware name: System manufacturer System Product Name/B150-PRO, BIOS 0604 11/19/2015
[233977.486722]  0000000000000286 000000001c661fb1 ffff880823037978 ffffffff814dac24
[233977.486723]  ffffffffc0741f58 ffffffffc0648a06 ffff8808230379f8 ffffffff81097221
[233977.486724]  ffffffffc07582b1 ffff880100000020 ffff880823037a08 ffff8808230379b0
[233977.486725] Call Trace:
[233977.486728]  [<ffffffff814dac24>] dump_stack+0x63/0x8f
[233977.486737]  [<ffffffffc0648a06>] ? skl_update_other_pipe_wm+0x406/0x420 [i915]
[233977.486739]  [<ffffffff81097221>] warn_slowpath_fmt+0xc1/0x160
[233977.486746]  [<ffffffffc0648a06>] skl_update_other_pipe_wm+0x406/0x420 [i915]
[233977.486753]  [<ffffffffc0648e20>] skl_update_wm+0x400/0x980 [i915]
[233977.486763]  [<ffffffffc06a7a0e>] ? gen9_write32+0x29e/0x4b0 [i915]
[233977.486770]  [<ffffffffc064df6e>] intel_update_watermarks+0x1e/0x20 [i915]
[233977.486781]  [<ffffffffc06d0451>] haswell_crtc_enable+0x311/0xa30 [i915]
[233977.486791]  [<ffffffffc06cc5ea>] intel_atomic_commit+0x7fa/0x1a60 [i915]
[233977.486818]  [<ffffffffc059fd6f>] ? drm_atomic_check_only+0x18f/0x600 [drm]
[233977.486824]  [<ffffffffc05a0217>] drm_atomic_commit+0x37/0x60 [drm]
[233977.486829]  [<ffffffffc0608c6d>] drm_atomic_helper_connector_dpms+0xed/0x1a0 [drm_kms_helper]
[233977.486835]  [<ffffffffc058f014>] drm_mode_connector_property_set_ioctl+0x394/0x3b0 [drm]
[233977.486840]  [<ffffffffc0578812>] drm_ioctl+0x152/0x540 [drm]
[233977.486846]  [<ffffffffc058ec80>] ? drm_property_change_valid_put+0x180/0x180 [drm]
[233977.486848]  [<ffffffff812c2853>] do_vfs_ioctl+0xa3/0x610
[233977.486849]  [<ffffffff8104451d>] ? fpu__restore_sig+0x2d/0x40
[233977.486850]  [<ffffffff810354f0>] ? restore_sigcontext+0x140/0x1a0
[233977.486851]  [<ffffffff8103591f>] ? sys_rt_sigreturn+0xef/0x190
[233977.486852]  [<ffffffff812c2e39>] SyS_ioctl+0x79/0x90
[233977.486853]  [<ffffffff81a32776>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[233977.486854] ---[ end trace d8d901b5e0b86a8a ]---

Comment 2 yann 2016-07-18 07:26:30 UTC

Peter, there were two dozen or so workarounds affecting skl/kbl that went to nightly by the end of June. Could you please respin with latest on git://anongit.freedesktop.org/drm-intel ?

Comment 3 yann 2016-08-30 13:32:42 UTC

Peter, did you upgrade your kernel? Is the issue still occurring?

Comment 4 yann 2016-08-30 14:27:13 UTC

(In reply to yann from comment #3)
> Peter, did you upgrade your kernel? Is the issue still occurring?

I mean for the wm issue. If it still occurs, please file another bug in DRI, DRM/Intel.

Regarding the GPU hang, I am assigning it to the Mesa product (please let me know if I am mistaken about this GPU hang).

From this error dump, the hang is happening in a render ring batch with the active head at 0xff06f504, with 0x79000002 (3DSTATE_DRAWING_RECTANGLE) as IPEHR.

Batch extract (around 0xff06f504):


0xff06f4e8:      0x784e0002: 3D UNKNOWN: 3d_965 opcode = 0x784e
0xff06f4ec:      0x00000000: MI_NOOP
0xff06f4f0:      0x00000000: MI_NOOP
0xff06f4f4:      0x00000000: MI_NOOP
0xff06f4f8:      0x780f0000: 3DSTATE_SCISSOR_POINTERS
0xff06f4fc:      0x000073a0:    scissor rect offset
0xff06f500:      0x79000002: 3DSTATE_DRAWING_RECTANGLE
0xff06f504:      0x00000000:    top left: 0,0
0xff06f508:      0x026805e1:    bottom right: 1505,616
0xff06f50c:      0x00000000:    origin: 0,0
0xff06f510:      0x784a0000: 3D UNKNOWN: 3d_965 opcode = 0x784a
0xff06f514:      0x0000c001: MI_NOOP
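The decoded values in the extract above can be reproduced by hand. The following is a minimal Python sketch, not the decoder that produced the dump. The function names are hypothetical, and the header bit layout (command type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, dword length minus 2 in bits 7:0) is an assumption based on the usual Intel 3D command format. The coordinate packing (x in the low 16 bits, y in the high 16 bits) is inferred from the "bottom right: 1505,616" line.

```python
def decode_3d_header(dword):
    """Split a 3D command header dword into bit fields (assumed layout)."""
    return {
        "command_type": (dword >> 29) & 0x7,   # bits 31:29
        "subtype": (dword >> 27) & 0x3,        # bits 28:27
        "opcode": (dword >> 24) & 0x7,         # bits 26:24
        "sub_opcode": (dword >> 16) & 0xFF,    # bits 23:16
        # The length field is assumed to hold (total dwords - 2).
        "total_dwords": (dword & 0xFF) + 2,
    }


def decode_xy(dword):
    """Unpack a coordinate pair: x in the low 16 bits, y in the high 16 bits."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF


header = decode_3d_header(0x79000002)    # IPEHR value from the error dump
bottom_right = decode_xy(0x026805E1)     # dword at 0xff06f508

print(header)
print(bottom_right)  # (1505, 616), matching the extract
```

Under these assumptions the header describes a four-dword command, which agrees with the extract: the header at 0xff06f500 followed by three payload dwords ending at 0xff06f50c.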

Comment 5 Peter Gervai 2016-08-30 20:20:56 UTC

Due to other problems (the general crappiness of i915, regardless of this bug) I have given up trying and swapped my old nvidia card for a newer one. As far as I remember I tried a slightly newer kernel but not very recent ones, since way too much time had already been wasted on all kinds of new components that went into the machine (it was struck by lightning and parts got replaced, which is primarily what led to my adventures with i915), and the machine had to be stabilised for work. The stalls were causing all kinds of problems in other parts of the system, so with all my apologies I had to give up debugging this and go back to working on my tasks.

Testing unfortunately requires plenty of reboots, and the mainboard doesn't quite like to init dual monitors at startup (DVI works, HDMI doesn't, which makes it quite hard to follow boot problems or modeset-related stuff). The kernel had to be forced to always-present due to a detection bug, and the various hacks required to keep the system running (albeit pretty slowly) piled up too high for my tastes.

Therefore I am terribly sorry that I cannot provide more input and testing on this bug (I would say "when time permits", since the card is still present, but my estimate of the probability of that happening is rather low).

Feel free to do whatever you deem appropriate to this poor bug.

Comment 6 Annie 2017-02-10 22:39:06 UTC

Dear Reporter,

This Mesa bug has been in the "NEEDINFO" status for over 60 days. I am closing this bug based on lack of response but feel free to reopen if resolution is still needed. Please ensure you're supplying the correct information as requested.

Thank you.
