Bug 91593 - [BDW] GPU HANG: ecode 8:0:0x85dffffb, in valley_x64 [8637], reason: Ring hung, action: reset
Status: RESOLVED INVALID
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Ian Romanick
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 93223
Depends on:
Blocks:
 
Reported: 2015-08-09 23:55 UTC by Edward O'Callaghan
Modified: 2017-02-10 22:38 UTC

See Also:
i915 platform: BDW
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (2.82 MB, text/plain)
2015-08-09 23:55 UTC, Edward O'Callaghan

Description Edward O'Callaghan 2015-08-09 23:55:37 UTC
Created attachment 117601
/sys/class/drm/card0/error

uname -a

Linux rushlocal.promimlocal 4.1.3-200.fc22.x86_64 #1 SMP Wed Jul 22 19:51:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

dmesg snip
----------

[239726.963055] [drm] stuck on render ring
[239726.967006] [drm] GPU HANG: ecode 8:0:0x85dffffb, in valley_x64 [8637], reason: Ring hung, action: reset
[239726.967015] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[239726.967016] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[239726.967017] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[239726.967018] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[239726.967019] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[239726.969206] drm/i915: Resetting chip after gpu hang
[239736.965134] [drm] stuck on render ring
[239736.968701] [drm] GPU HANG: ecode 8:0:0x85dffffb, in valley_x64 [8637], reason: Ring hung, action: reset
[239736.968739] ------------[ cut here ]------------
[239736.968777] WARNING: CPU: 0 PID: 8506 at drivers/gpu/drm/i915/intel_display.c:10007 intel_mmio_flip_work_func+0x31a/0x330 [i915]()
[239736.968778] WARN_ON(__i915_wait_request(mmio_flip->req, crtc->reset_counter, false, NULL, NULL) != 0)
[239736.968780] Modules linked in:
[239736.968781]  cdc_mbim cdc_wdm cdc_ncm usbnet mii snd_usb_audio snd_usbmidi_lib snd_rawmidi ccm bnep bluetooth xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat tun bridge ebtable_filter ebtables ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm arc4 iwlmvm mac80211 iTCO_wdt snd_hda_codec_realtek iwlwifi iTCO_vendor_support rtsx_pci_ms memstick snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep joydev thinkpad_acpi mei_me mei cfg80211 snd_seq snd_seq_device snd_pcm wmi snd_timer snd tpm_tis i2c_i801 tpm lpc_ich shpchp rfkill soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc dm_crypt
[239736.968817]  8021q garp stp llc mrp i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm e1000e ghash_clmulni_intel serio_raw ptp rtsx_pci pps_core mfd_core video
[239736.968828] CPU: 0 PID: 8506 Comm: kworker/0:1 Not tainted 4.1.3-200.fc22.x86_64 #1
[239736.968830] Hardware name: LENOVO 20BV0020AU/20BV0020AU, BIOS JBET39WW (1.04 ) 12/09/2014
[239736.968844] Workqueue: events intel_mmio_flip_work_func [i915]
[239736.968846]  0000000000000000 0000000031a30932 ffff88004e577cd8 ffffffff8179b4cd
[239736.968848]  0000000000000000 ffff88004e577d30 ffff88004e577d18 ffffffff810a163a
[239736.968850]  ffff88004e577ce8 ffff88042b08c8b8 ffff88043dc17000 ffff88042b08c000
[239736.968852] Call Trace:
[239736.968857]  [<ffffffff8179b4cd>] dump_stack+0x45/0x57
[239736.968860]  [<ffffffff810a163a>] warn_slowpath_common+0x8a/0xc0
[239736.968862]  [<ffffffff810a16c5>] warn_slowpath_fmt+0x55/0x70
[239736.968875]  [<ffffffffa01deaaa>] intel_mmio_flip_work_func+0x31a/0x330 [i915]
[239736.968877]  [<ffffffff810db92c>] ? put_prev_task_fair+0x2c/0x40
[239736.968880]  [<ffffffff810baa6b>] process_one_work+0x1bb/0x410
[239736.968881]  [<ffffffff810bad13>] worker_thread+0x53/0x480
[239736.968883]  [<ffffffff810bacc0>] ? process_one_work+0x410/0x410
[239736.968885]  [<ffffffff810bacc0>] ? process_one_work+0x410/0x410
[239736.968887]  [<ffffffff810c0b88>] kthread+0xd8/0xf0
[239736.968890]  [<ffffffff810c0ab0>] ? kthread_worker_fn+0x180/0x180
[239736.968892]  [<ffffffff817a1e62>] ret_from_fork+0x42/0x70
[239736.968894]  [<ffffffff810c0ab0>] ? kthread_worker_fn+0x180/0x180
[239736.968896] ---[ end trace 743a4072ff49f8ca ]---
[239736.970523] drm/i915: Resetting chip after gpu hang
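
For anyone triaging similar logs, the "GPU HANG" line above can be split into its fields with a short script. This is only a sketch: the regular expression and the parse_gpu_hang helper are hypothetical names, and reading the first two ecode numbers as GPU generation and ring index is an assumption that this report does not confirm.

```python
import re

# Pattern written against the "GPU HANG" line format seen in this report.
# Field names (gen, ring, code) are assumed interpretations, not confirmed.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<code>0x[0-9a-fA-F]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\S+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG dmesg line, or None if it does not match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "gen": int(fields["gen"]),
        "ring": int(fields["ring"]),
        "code": int(fields["code"], 16),
        "comm": fields["comm"],
        "pid": int(fields["pid"]),
        "reason": fields["reason"],
        "action": fields["action"],
    }
```

Feeding it the first hang line from the snippet above yields the process name valley_x64, PID 8637, reason "Ring hung" and action "reset".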
Comment 1 Edward O'Callaghan 2015-08-09 23:58:29 UTC
Hi,

I used the following demo program to produce the hang on Fedora 22:
http://unigine.com/products/valley/

[edward@rushlocal Unigine_Valley-1.0]$ lspci -s 00:02.0 -v
00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 5034
        Flags: bus master, fast devsel, latency 0, IRQ 46
        Memory at e0000000 (64-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 3000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915

Hopefully you can reproduce it as well.
Kind Regards
Comment 2 yann 2016-05-18 17:12:51 UTC
*** Bug 93223 has been marked as a duplicate of this bug. ***
Comment 3 yann 2016-09-27 15:05:52 UTC
We seem to have neglected this bug a bit; apologies.

Improvements have since been pushed to the kernel and Mesa that should benefit your system, so please re-test with the latest kernel and Mesa to see whether this issue still occurs.

In parallel, I am assigning this to the Mesa product (please let me know if I am mistaken about this GPU hang).

Kernel: 4.1.3-200.fc22.x86_64
Platform: Broadwell-U (pci id: 0x1616)
Mesa: [Please confirm your mesa version]

From this error dump, the hang is happening in the render ring batch, with the active head at 0x18a04a94 and 0x7a000004 (PIPE_CONTROL) as IPEHR.

We can also note:
ERROR: 0x00000001
    TLB page fault error (GTT entry not valid)
FAULT_TLB_DATA: 0x0000001c 0xe1b59da4
    Address 0x0000ce1b59da4000 GGTT

and in FAULT_REG of render ring: Invalid PTE Fault
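
The address above can be recomputed from the two FAULT_TLB_DATA dwords. The sketch below is an illustration only: the bit layout (low four bits of the first dword as address bits 47:44, bit 4 as the GGTT flag, second dword as address bits 43:12) is an assumption inferred from the values quoted here, and decode_fault_tlb_data is a hypothetical helper name.

```python
def decode_fault_tlb_data(high, low):
    """Rebuild the faulting address from the two FAULT_TLB_DATA dwords.

    Assumed layout, inferred from the values in this report:
      high[3:0] -> address bits 47:44
      high[4]   -> set when the fault is in the GGTT
      low       -> address bits 43:12
    """
    address = ((high & 0xF) << 44) | (low << 12)
    is_ggtt = bool(high & 0x10)
    return address, is_ggtt
```

With high = 0x0000001c and low = 0xe1b59da4 this gives address 0x0000ce1b59da4000 with the GGTT flag set, matching the decoded line above.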

Batch extract (around 0x18a04a94):

0x18a04a74:      0x78140000: 3D UNKNOWN: 3d_965 opcode = 0x7814
0x18a04a78:      0x80002044: UNKNOWN
Bad count in PIPE_CONTROL
0x18a04a7c:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0x18a04a80:      0x00101c11:    destination address
0x18a04a84:      0x00000000:    immediate dword low
0x18a04a88:      0x00000000:    immediate dword high
Bad count in PIPE_CONTROL
0x18a04a94:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0x18a04a98:      0x00002000:    destination address
0x18a04a9c:      0x00000000:    immediate dword low
0x18a04aa0:      0x00000000:    immediate dword high
Bad count in PIPE_CONTROL
0x18a04aac:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0x18a04ab0:      0x00000001:    destination address
0x18a04ab4:      0x00000000:    immediate dword low
0x18a04ab8:      0x00000000:    immediate dword high
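
The spacing of the PIPE_CONTROL headers in this extract can be cross-checked against the length field in the header dword. The sketch below assumes the usual layout of Intel 3D command headers (type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a length field in bits 7:0 that holds the total dword count minus 2); that layout and the helper names are assumptions, not something stated in this report.

```python
def decode_3d_header(dword):
    """Split a command header dword into fields (assumed bit layout)."""
    return {
        "type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "subopcode": (dword >> 16) & 0xFF,
        # The length field is assumed to store (total dwords - 2).
        "total_dwords": (dword & 0xFF) + 2,
    }


def next_command_address(address, header):
    """Address of the command following the one whose header sits at `address`."""
    return address + 4 * decode_3d_header(header)["total_dwords"]
```

Under that assumption 0x7a000004 describes a 6-dword command, so the header at 0x18a04a7c is followed by the next one at 0x18a04a94, and that one by 0x18a04aac, which is exactly the spacing in the extract. The decoder prints only four dwords per PIPE_CONTROL, which may be why it reports "Bad count in PIPE_CONTROL".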
Comment 4 yann 2016-11-04 15:47:00 UTC
Please test a newer version of Mesa (12 or 13) and mark the bug as REOPENED
if you can reproduce the hang, or RESOLVED/* if you cannot.

If you can reproduce it, please capture and upload an apitrace (https://github.com/apitrace/apitrace) so that we can easily
reproduce it as well.
Comment 5 Annie 2017-02-10 22:38:25 UTC
Dear Reporter,

This Mesa bug has been in the "NEEDINFO" status for over 60 days. I am closing this bug based on lack of response but feel free to reopen if resolution is still needed. Please ensure you're supplying the correct information as requested.

Thank you.

