Summary: | [snb] GPU HANG: ecode 0:0x85fffff8 Freeze with SMP support (stuck on render ring) | |
---|---|---|---|
Product: | DRI | Reporter: | Cedric Sodhi <manday>
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | medium | CC: | alinm.elena, intel-gfx-bugs, przanoni, undying-m, zahnwiegebiss
Version: | XOrg git | |
Hardware: | Other | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | SNB | i915 features: | GPU hang
Created attachment 102271 [details]
dmesg after errors started appearing
Is this a regression? Can you reproduce this on newer Kernels?

(In reply to comment #2)
> Is this a regression? Can you reproduce this on newer Kernels?

If you have any particular version in mind I'll try it.

Something to test:

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ed4376e..d05d1b6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -991,14 +991,17 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 				       cache_level, true, flags);
 		if (++act_pte == I915_PPGTT_PT_ENTRIES) {
+			drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
 			act_pt++;
 			act_pte = 0;
 		}
 	}
-	if (pt_vaddr)
+	if (pt_vaddr) {
+		drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
 		kunmap_atomic(pt_vaddr);
+	}
 }

That patch doesn't apply to drm-intel-nightly of 7. July. Which repository is this?

One based off nightly on the 4th of July. It's a two line patch - just add drm_clflush_virt_range(pd_vaddr, PAGE_SIZE); before kunmap_atomic() in gen6_ppgtt_insert_entries.

I get "pd_vaddr undeclared".

*** Bug 81459 has been marked as a duplicate of this bug. ***

*** Bug 80478 has been marked as a duplicate of this bug. ***

*** Bug 81375 has been marked as a duplicate of this bug. ***

Assuming "pd_vaddr" should have been "pt_vaddr", the patch did not help.
Tested with the latest stable kernel and the problem is still here.

Linux abaddon 3.16.0-5.g07174c1-desktop #1 SMP PREEMPT Wed Aug 13 16:23:31 UTC 2014 (07174c1) x86_64 x86_64 x86_64 GNU/Linux

S | Name                 | Type    | Version     | Arch   | Repository
--+----------------------+---------+-------------+--------+------------------
i | DirectFB-Mesa        | package | 1.7.5-1.1   | x86_64 | openSUSE-13.1-Oss
i | Mesa                 | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-32bit           | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-demo-x          | package | 8.2.0-1.2   | x86_64 | openSUSE-13.1-Oss
i | Mesa-libEGL1         | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libEGL1-32bit   | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libGL-devel     | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libGL1          | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libGL1-32bit    | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libGLESv1_CM1   | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libGLESv2-2     | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libglapi-devel  | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libglapi0       | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | Mesa-libglapi0-32bit | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | libOSMesa-devel      | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | libOSMesa9           | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss
i | libOSMesa9-32bit     | package | 10.2.5-89.1 | x86_64 | openSUSE-13.1-Oss

added Xorg...
S | Name                  | Type    | Version        | Arch   | Repository
--+-----------------------+---------+----------------+--------+------------------
i | xorg-cf-files         | package | 1.0.5-3.5      | noarch | openSUSE-13.1-Oss
i | xorg-scripts          | package | 1.0.1-8.2      | noarch | openSUSE-13.1-Oss
i | xorg-x11              | package | 7.6_1-14.2     | noarch | openSUSE-13.1-Oss
i | xorg-x11-Xvnc         | package | 1.3.1-4.1      | x86_64 | openSUSE-13.1-Oss
i | xorg-x11-driver-video | package | 7.6_1-13.2     | x86_64 | openSUSE-13.1-Oss
i | xorg-x11-essentials   | package | 7.6_1-14.2     | noarch | openSUSE-13.1-Oss
i | xorg-x11-fonts        | package | 7.6-29.2       | noarch | openSUSE-13.1-Oss
i | xorg-x11-fonts-core   | package | 7.6-29.2       | noarch | openSUSE-13.1-Oss
i | xorg-x11-libs         | package | 7.6-45.1       | noarch | openSUSE-13.1-Oss
i | xorg-x11-server       | package | 7.6_1.16.0-3.1 | x86_64 | openSUSE-13.1-Oss
i | xorg-x11-server-extra | package | 7.6_1.16.0-3.1 | x86_64 | openSUSE-13.1-Oss

found a trace too

[ 9564.511631] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[ 9710.676434] [drm] stuck on render ring
[ 9710.677007] [drm] GPU HANG: ecode 0:0x85fffdf8, in Xorg [3335], reason: Ring hung, action: reset
[ 9712.678810] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[ 9779.754376] [drm] stuck on render ring
[ 9779.754968] [drm] GPU HANG: ecode 0:0x85fffdf8, in Xorg [3335], reason: Ring hung, action: reset
[ 9781.756641] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[ 9786.762283] [drm] stuck on render ring
[ 9786.762867] [drm] GPU HANG: ecode 0:0x85fffdf8, in Xorg [3335], reason: Ring hung, action: reset
[ 9788.764994] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[ 9828.809723] [drm] stuck on render ring
[ 9828.810351] [drm] GPU HANG: ecode 0:0x85ff9ff8, in Xorg [3335], reason: Ring hung, action: reset
[ 9830.812078] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[ 9858.843612] [drm] stuck on render ring
[ 9858.844187] [drm] GPU HANG: ecode 0:0x85ff9ff8, in Xorg [3335], reason: Ring hung, action: reset
[ 9860.845890] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[ 9864.850387] [drm] stuck on render ring
[ 9864.850951] [drm] GPU HANG: ecode 0:0x85ff9ff8, in Xorg [3335], reason: Ring hung, action: reset
[ 9866.853178] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[10092.106825] [drm] stuck on render ring
[10092.107410] [drm] GPU HANG: ecode 0:0x85fffdf8, in Xorg [3335], reason: Ring hung, action: reset
[10094.109482] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[10100.115868] [drm] stuck on render ring
[10100.116441] [drm] GPU HANG: ecode 0:0x85fffdf8, in Xorg [3335], reason: Ring hung, action: reset
[10100.130852] ------------[ cut here ]------------
[10100.130886] WARNING: CPU: 1 PID: 3335 at ../drivers/gpu/drm/i915/i915_gem.c:4058 i915_gem_object_pin+0x74f/0x7a0 [i915]()
[10100.130899] Modules linked in: rfcomm ctr ccm uas usb_storage rndis_wlan fuse hidp af_packet bnep ecb dell_wmi sparse_keymap uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core iTCO_wdt v4l2_common iTCO_vendor_support videodev rndis_host cdc_ether usbnet mii dell_laptop dcdbas ath3k btusb bluetooth 6lowpan_iphc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel aesni_intel snd_hda_controller snd_hda_codec snd_hwdep aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_seq arc4 snd_seq_device snd_timer pcspkr joydev snd serio_raw ath9k ath9k_common ath9k_hw ath mac80211 i2c_i801 cfg80211 rfkill lpc_ich mfd_core soundcore shpchp mei_me
[10100.130932] mei thermal wmi battery tpm_tis tpm ac processor dm_mod i915 i2c_algo_bit drm_kms_helper drm xhci_hcd video button sg
[10100.130941] CPU: 1 PID: 3335 Comm: Xorg Tainted: G W 3.16.0-5.g07174c1-desktop #1
[10100.130942] Hardware name: Dell Inc. XPS L322X/0PJHXN, BIOS A10 08/28/2013
[10100.130944] 0000000000000009 ffffffff81619585 0000000000000000 ffffffff8105bab7
[10100.130947] ffff88009fa57a08 ffff8802125cf580 0000000000001000 ffff88009f8ea000
[10100.130949] 0000000000000004 ffffffffa01008bf ffffffffa00f25a0 ffff88009666e580
[10100.130951] Call Trace:
[10100.130963] [<ffffffff8100519e>] dump_trace+0x8e/0x350
[10100.130967] [<ffffffff81005506>] show_stack_log_lvl+0xa6/0x190
[10100.130970] [<ffffffff81006c01>] show_stack+0x21/0x50
[10100.130975] [<ffffffff81619585>] dump_stack+0x49/0x6a
[10100.130979] [<ffffffff8105bab7>] warn_slowpath_common+0x77/0x90
[10100.130992] [<ffffffffa01008bf>] i915_gem_object_pin+0x74f/0x7a0 [i915]
[10100.131040] [<ffffffffa00f354e>] i915_switch_context+0x11e/0x590 [i915]
[10100.131068] [<ffffffffa00f584e>] i915_gem_do_execbuffer.isra.24+0xa3e/0x13d0 [i915]
[10100.131097] [<ffffffffa00f669f>] i915_gem_execbuffer2+0xaf/0x2b0 [i915]
[10100.131125] [<ffffffffa005c8c7>] drm_ioctl+0x1c7/0x5b0 [drm]
[10100.131131] [<ffffffff811c9c27>] do_vfs_ioctl+0x2e7/0x4c0
[10100.131140] [<ffffffff811c9e81>] SyS_ioctl+0x81/0xa0
[10100.131145] [<ffffffff816202ad>] system_call_fastpath+0x1a/0x1f
[10100.131149] [<00007f50138bb727>] 0x7f50138bb726
[10100.131150] ---[ end trace 6f7349d393b8fa48 ]---
[10100.133475] ------------[ cut here ]------------
[10100.133506] WARNING: CPU: 1 PID: 3335 at ../drivers/gpu/drm/i915/i915_gem.c:4058 i915_gem_object_pin+0x74f/0x7a0 [i915]()
[10100.133507] Modules linked in: rfcomm ctr ccm uas usb_storage rndis_wlan fuse hidp af_packet bnep ecb dell_wmi sparse_keymap uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core iTCO_wdt v4l2_common iTCO_vendor_support videodev rndis_host cdc_ether usbnet mii dell_laptop dcdbas ath3k btusb bluetooth 6lowpan_iphc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel aesni_intel snd_hda_controller snd_hda_codec snd_hwdep aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_pcm snd_seq arc4 snd_seq_device snd_timer pcspkr joydev snd serio_raw ath9k ath9k_common ath9k_hw ath mac80211 i2c_i801 cfg80211 rfkill lpc_ich mfd_core soundcore shpchp mei_me
[10100.133537] mei thermal wmi battery tpm_tis tpm ac processor dm_mod i915 i2c_algo_bit drm_kms_helper drm xhci_hcd video button sg
[10100.133546] CPU: 1 PID: 3335 Comm: Xorg Tainted: G W 3.16.0-5.g07174c1-desktop #1
[10100.133547] Hardware name: Dell Inc. XPS L322X/0PJHXN, BIOS A10 08/28/2013
[10100.133549] 0000000000000009 ffffffff81619585 0000000000000000 ffffffff8105bab7
[10100.133551] ffff88009fa57a08 ffff8802125cf580 0000000000001000 ffff88009f8ea000
[10100.133553] 0000000000000004 ffffffffa01008bf ffffffffa00f25a0 ffff88009666e580
[10100.133556] Call Trace:
[10100.133565] [<ffffffff8100519e>] dump_trace+0x8e/0x350
[10100.133568] [<ffffffff81005506>] show_stack_log_lvl+0xa6/0x190
[10100.133571] [<ffffffff81006c01>] show_stack+0x21/0x50
[10100.133575] [<ffffffff81619585>] dump_stack+0x49/0x6a
[10100.133580] [<ffffffff8105bab7>] warn_slowpath_common+0x77/0x90
[10100.133595] [<ffffffffa01008bf>] i915_gem_object_pin+0x74f/0x7a0 [i915]
[10100.133648] [<ffffffffa00f354e>] i915_switch_context+0x11e/0x590 [i915]
[10100.133679] [<ffffffffa00f584e>] i915_gem_do_execbuffer.isra.24+0xa3e/0x13d0 [i915]
[10100.133711] [<ffffffffa00f669f>] i915_gem_execbuffer2+0xaf/0x2b0 [i915]
[10100.133739] [<ffffffffa005c8c7>] drm_ioctl+0x1c7/0x5b0 [drm]
[10100.133746] [<ffffffff811c9c27>] do_vfs_ioctl+0x2e7/0x4c0
[10100.133754] [<ffffffff811c9e81>] SyS_ioctl+0x81/0xa0
[10100.133759] [<ffffffff816202ad>] system_call_fastpath+0x1a/0x1f
[10100.133763] [<00007f50138bb727>] 0x7f50138bb726
[10100.133764] ---[ end trace 6f7349d393b8fa49 ]---
[10102.118543] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[10358.399466] [drm] stuck on render ring
[10358.400047] [drm] GPU HANG: ecode 0:0x85fffff8, in Xorg [3335], reason: Ring hung, action: reset
[10360.402159] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[10495.566306] [drm] stuck on render ring
[10495.566822] [drm] GPU HANG: ecode 0:0x85fffdf8, in Xorg [3335], reason: Ring hung, action: reset
[10497.568905] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[10735.837513] [drm] stuck on render ring
[10735.838051] [drm] GPU HANG: ecode 0:0x85fffff8, in Xorg [3335], reason: Ring hung, action: reset
[10737.840123] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[11321.486431] [drm] stuck on render ring
[11321.487014] [drm] GPU HANG: ecode 0:0x85ff9ff8, in Xorg [3335], reason: Ring hung, action: reset
[11323.489153] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
[11458.649203] [drm] stuck on render ring
[11458.649833] [drm] GPU HANG: ecode 0:0x85fffdf8, in Xorg [3335], reason: Ring hung, action: reset
[11460.651885] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off

Created attachment 105482 [details] [review]
Silver bullet

Any improvement with this? I found this was covering up one bug during hardware init that caused similar symptoms.

I am more and more confused... now the error code changed (this is with the new rc kernel and git Mesa, I did not try your patch yet)

[ 118.563025] [drm] stuck on render ring
[ 118.563613] [drm] GPU HANG: ecode 0:0x85ffbffa, in Xorg [931], reason: Ring hung, action: reset
[ 118.563618] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 118.563619] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 118.563621] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 118.563623] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 118.563625] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 258.105176] show_signal_msg: 111 callbacks suppressed
[ 258.105182] chromium[1800]: segfault at 1f8 ip 00007fc19c576ff8 sp 00007fff2c806360 error 4 in i965_dri.so[7fc19c22f000+4fc000]
[ 258.269060] chromium[1844]: segfault at 1f8 ip 00007f3d84b59ff8 sp 00007fff66cb32c0 error 4 in i965_dri.so[7f3d84812000+4fc000]
[ 258.355290] chromium[1873]: segfault at 1f8 ip 00007fe11eb12ff8 sp 00007fffd50a0020 error 4 in i965_dri.so[7fe11e7cb000+4fc000]
[ 707.792265] [drm] stuck on render ring
[ 707.792797] [drm] GPU HANG: ecode 0:0x85ff9ffa, in Xorg [931], reason: Ring hung, action: reset
[ 1204.141958] [drm] stuck on render ring
[ 1204.142507] [drm] GPU HANG: ecode 0:0x85ffbcfa, in Xorg [931], reason: Ring hung, action: reset
[ 2092.971512] [drm] stuck on render ring
[ 2092.972093] [drm] GPU HANG: ecode 0:0x85ff9dfa, in Xorg [931], reason: Ring hung, action: reset
[ 4127.302035] [drm] stuck on render ring
[ 4127.302619] [drm] GPU HANG: ecode 0:0x85ff9ffa, in Xorg [931], reason: Ring hung, action: reset
[ 4194.222338] [drm] stuck on render ring
[ 4194.222954] [drm] GPU HANG: ecode 0:0x85ffbffa, in Xorg [931], reason: Ring hung, action: reset
[ 4207.205334] [drm] stuck on render ring
[ 4207.205892] [drm] GPU HANG: ecode 0:0x85ff9ffa, in Xorg [931], reason: Ring hung, action: reset
[ 4340.031296] [drm] stuck on render ring
[ 4340.031880] [drm] GPU HANG: ecode 0:0x85fffef8, in Xorg [931], reason: Ring hung, action: reset
[ 4347.022126] [drm] stuck on render ring
[ 4347.022695] [drm] GPU HANG: ecode 0:0x85ff9dfa, in Xorg [931], reason: Ring hung, action: reset
[ 4404.946208] [drm] stuck on render ring
[ 4404.946816] [drm] GPU HANG: ecode 0:0x85ff9ffa, in Xorg [931], reason: Ring hung, action: reset
[ 4558.025281] [drm:intel_dp_start_link_train] *ERROR* too many full retries, give up
[ 4616.668699] [drm] stuck on render ring
[ 4616.669297] [drm] GPU HANG: ecode 0:0x85fffef8, in Xorg [3514], reason: Ring hung, action: reset
[ 5086.053782] [drm] stuck on blitter ring
[ 5086.056212] [drm] GPU HANG: ecode 2:0xfffffffe, in Xorg [3514], reason: Ring hung, action: reset
[ 5392.653547] [drm] stuck on render ring
[ 5392.654138] [drm] GPU HANG: ecode 0:0x85ff9ffa, in Xorg [3514], reason: Ring hung, action: reset
[ 5401.641919] [drm] stuck on render ring
[ 5401.642496] [drm] GPU HANG: ecode 0:0x85ff9dfa, in Xorg [3514], reason: Ring hung, action: reset
[ 5614.366020] [drm] stuck on render ring
[ 5614.366570] [drm] GPU HANG: ecode 0:0x85fffcfe, in Xorg [3514], reason: Ring hung, action: reset
[ 6068.069431] chromium[4612]: segfault at 1f8 ip 00007fd59f45aff8 sp 00007fffd765f6f0 error 4 in i965_dri.so[7fd59f113000+4fc000]
[ 6068.319886] chromium[4642]: segfault at 1f8 ip 00007f9dafc7aff8 sp 00007fff6eea6df0 error 4 in i965_dri.so[7f9daf933000+4fc000]
[ 6068.437668] chromium[4669]: segfault at 1f8 ip 00007fccd99b8ff8 sp 00007fff40ea21b0 error 4 in i965_dri.so[7fccd9671000+4fc000]
[ 6311.460527] [drm] stuck on render ring
[ 6311.461159] [drm] GPU HANG: ecode 0:0x85ff9ffa, in Xorg [3514], reason: Ring hung, action: reset
[ 6325.442251] [drm] stuck on render ring
[ 6325.442815] [drm] GPU HANG: ecode 0:0x85ffbff8, in Xorg [3514], reason: Ring hung, action: reset

*** This bug has been marked as a duplicate of bug 81369 ***

Falsely duped.

Chris, is this fixed by

commit 5e4f518959bdf8a4f9c8f80879e4a0f7a95d2cb3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Feb 13 14:35:59 2015 +0000

    drm/i915: Prevent TLB error on first execution on SNB

in drm-intel-fixes? Its References: line points to this bug, but I feel unsure.

Turns out the problem was the memory.
Alin

Timeout, presuming fixed, closing. Please reopen if the problem persists with latest kernels.

Closing >1 year old resolved+fixed.
Created attachment 102270 [details]
card0/error file as referred to in dmesg

With CONFIG_X86_64_SMP, X either
...works
...does not start after `startx` and freezes with a black screen, underline cursor in the top left and has to be killed by cutting the power
...does not start after `startx` and freezes with a black screen, underline cursor in the top left and has to be killed by SysRq or cutting the power
...works at first and then freezes
...works at first and then starts showing artefacts (large black areas) in anything even slightly rendering-expensive (webpages in Firefox)
...shows artefacts right from the beginning
at complete random.

Without SMP support, no problems are encountered.

Attached are the dmesg output and the card0/error file, captured after X worked at first but then suddenly started showing errors (and eventually froze completely).
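For anyone comparing the hang signatures across the dmesg excerpts in this report, a small script can tally the ecode values. This is only a minimal sketch and not part of the original report: the function name `tally_gpu_hangs` is hypothetical, and the regular expression is an assumption derived from the "[drm] GPU HANG: ecode ..." lines quoted in this bug. In those lines the number before the colon appears to identify the ring (0 accompanies "stuck on render ring", 2 accompanies "stuck on blitter ring").

```python
import re
from collections import Counter

# Assumed line shape, taken from the logs quoted in this report:
#   [drm] GPU HANG: ecode 0:0x85fffdf8, in Xorg [3335], reason: Ring hung, action: reset
HANG_RE = re.compile(r"GPU HANG: ecode (\d+):(0x[0-9a-fA-F]+), in (\S+) \[(\d+)\]")


def tally_gpu_hangs(dmesg_text):
    """Count GPU hang reports per (ring number, ecode) pair in dmesg text."""
    counts = Counter()
    for match in HANG_RE.finditer(dmesg_text):
        ring, ecode, _process, _pid = match.groups()
        counts[(int(ring), ecode.lower())] += 1
    return counts
```

Passing the saved dmesg text to `tally_gpu_hangs` and reading `most_common()` on the result shows which ecode values recur, which may help when checking whether a kernel change altered the hang signature.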