Bug 78960 - [hsw gt1] Death on context load with Xbmc (mesa/libva)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 80337 80608
Depends on:
Blocks:
 
Reported: 2014-05-20 11:58 UTC by asleulv
Modified: 2017-07-24 22:54 UTC
CC List: 7 users

See Also:
i915 platform:
i915 features:


Attachments
error file from the dump (2.80 MB, text/plain)
2014-05-20 11:58 UTC, asleulv
kernel dmesg (47.26 KB, text/rtf)
2014-05-20 11:59 UTC, asleulv
error file from the dump 2 (2.80 MB, application/octet-stream)
2014-05-20 13:19 UTC, asleulv
kernel dmesg 3 (52.36 KB, text/rtf)
2014-05-21 17:28 UTC, asleulv
error file from the dump 3 (2.80 MB, application/octet-stream)
2014-05-21 17:30 UTC, asleulv
GPU stuck (2.81 MB, text/plain)
2014-06-14 17:23 UTC, Peter Frühberger
GPU stuck log from /sys/class/drm/card0/error (22 bytes, text/plain)
2014-10-11 10:42 UTC, marlock9
GPU stuck log from /sys/class/drm/card0/error (1.60 MB, text/plain)
2014-10-11 10:47 UTC, marlock9
GPU stuck log from /sys/class/drm/card0/error part 2 (1.60 MB, text/plain)
2014-10-11 10:48 UTC, marlock9

Description asleulv 2014-05-20 11:58:28 UTC
Created attachment 99403
error file from the dump

When watching movies with XBMC, the machine freezes after some time. XBMC uses OpenGL for rendering and VAAPI for decoding.

Find attached:
error file from the dump
kernel dmesg
Comment 1 asleulv 2014-05-20 11:59:39 UTC
Created attachment 99404
kernel dmesg
Comment 2 Chris Wilson 2014-05-20 12:04:47 UTC
Died whilst loading the context. Could be the mesa overrun bug, but no garbage is evident.
Comment 3 asleulv 2014-05-20 12:13:55 UTC
Hi Chris.

I am running OpenELEC 4.0, which integrates the latest stable mesa 10.1.3. Is there a patch for this overrun problem somewhere? I could test whether it solves this problem. Thanks in advance.
Comment 4 Chris Wilson 2014-05-20 12:24:01 UTC
commit 7ae870211ddc40ef6ed209a322c3a721214bb737
Author: Eric Anholt <eric@anholt.net>
Date:   Mon Apr 14 16:52:43 2014 -0700

    i965: Fix buffer overruns in MSAA MCS buffer clearing.

which is in 10.1.1, so it looks like you already have that patch. Can you try reproducing the issue a few more times to see if the error state has an obvious error?
Comment 5 asleulv 2014-05-20 13:19:52 UTC
Created attachment 99411
error file from the dump 2
Comment 6 Chris Wilson 2014-05-20 13:31:22 UTC
Another context load death, with everything else appearing clean.
Comment 7 asleulv 2014-05-21 17:28:40 UTC
Created attachment 99521
kernel dmesg 3
Comment 8 asleulv 2014-05-21 17:30:39 UTC
Created attachment 99522
error file from the dump 3
Comment 9 Ben Widawsky 2014-06-04 00:04:42 UTC
Just some observations:

I agree with Chris, this is a context restore.

IPEHR: 0x780c0000 => 3DSTATE_VF

We have several mesa bugs with this as the IPEHR (according to Ken)

3DSTATE_VF is the last state loaded (other than the resource streamer).
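
The IPEHR decoding above can be checked by hand. The following is a minimal sketch, assuming the usual Intel render-engine command header layout (command type in bits 31:29, subtype in bits 28:27, opcode in bits 26:24, sub-opcode in bits 23:16). The function names and the one-entry lookup table are illustrative only and are not part of any driver tool.

```python
def decode_gfx_header(dword):
    """Split a 3D-pipeline command header dword into its bit fields.

    Assumed layout: type 31:29, subtype 28:27, opcode 26:24, sub-opcode 23:16.
    """
    return {
        "type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "sub_opcode": (dword >> 16) & 0xFF,
    }


# Hypothetical lookup table holding only the command named in this report.
KNOWN_COMMANDS = {(3, 3, 0, 0x0C): "3DSTATE_VF"}


def name_of(dword):
    """Return the command name for a header dword, or 'unknown'."""
    f = decode_gfx_header(dword)
    key = (f["type"], f["subtype"], f["opcode"], f["sub_opcode"])
    return KNOWN_COMMANDS.get(key, "unknown")


print(name_of(0x780C0000))  # IPEHR value quoted in comment 9
```

Only the top 16 bits identify the command; the low bits of a header normally carry the length field, which is why they are ignored here.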
Comment 10 Peter Frühberger 2014-06-14 17:23:29 UTC
Created attachment 101057
GPU stuck

I could reproduce this today. While rendering 720p50 video with a nice lanczos3 optimized upscaler, my render got stuck, too.

Mesa 10.1.3
Kernel 3.15.0

Same bug?
Comment 11 Chris Wilson 2014-06-21 19:25:22 UTC
*** Bug 80337 has been marked as a duplicate of this bug. ***
Comment 12 Chris Wilson 2014-06-27 22:33:57 UTC
*** Bug 80608 has been marked as a duplicate of this bug. ***
Comment 13 Ferry Toth 2014-09-20 15:06:08 UTC
I am experiencing:

GPU HANG: ecode 0:0x87d3bffa, in kwin, reason: Ring hung, action: reset

It happens sporadically when using Firefox and much more often in Chromium. It must be related to the GPU type, as I don't see this on other machines with Intel graphics.

lspci:
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09)
Comment 14 marlock9 2014-10-11 10:32:10 UTC
I am experiencing this too. It happens when I play a Flash game (e.g. Tanki Online) in Chromium with the Pepper Flash plugin, or listen to music in a Flash music player.
[ 1491.505481] [drm] stuck on render ring
[ 1491.507282] [drm] GPU HANG: ecode 0:0x87d3bffa, in chromium [517], reason: Ring hung, action: reset
[ 1491.507287] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1491.507289] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1491.507292] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1491.507294] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1491.507297] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1493.504508] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
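
The "GPU HANG" lines in the logs above follow a fixed pattern, so reports from several machines can be compared mechanically. The following is a rough sketch that pulls the fields out of such a line. The regular expression is written against the lines quoted in this report only, the ecode is treated as an opaque string, and the helper name parse_hang is made up for illustration.

```python
import re

# Pattern derived from the dmesg lines quoted in this bug; the PID in
# square brackets is optional because comment 13 shows a line without it.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:0x[0-9a-fA-F]+), "
    r"in (?P<process>[^\s,]+)(?: \[(?P<pid>\d+)\])?, "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_hang(line):
    """Return a dict of hang fields, or None if the line is not a hang report."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    if info["pid"] is not None:
        info["pid"] = int(info["pid"])
    return info


line = ("[ 1491.507282] [drm] GPU HANG: ecode 0:0x87d3bffa, "
        "in chromium [517], reason: Ring hung, action: reset")
print(parse_hang(line))
```

Grouping parsed reports by ecode is one way to see that the hangs in kwin and chromium quoted in this thread carry the same code.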
Comment 15 marlock9 2014-10-11 10:42:24 UTC
Created attachment 107709
GPU stuck log from /sys/class/drm/card0/error
Comment 16 marlock9 2014-10-11 10:47:53 UTC
Created attachment 107710
GPU stuck log from /sys/class/drm/card0/error
Comment 17 marlock9 2014-10-11 10:48:37 UTC
Created attachment 107711
GPU stuck log from /sys/class/drm/card0/error part 2
Comment 18 Kete Tefid 2014-10-23 18:15:09 UTC
Hello,
I have the same problem on an Arrandale i3-380M. I encounter it whenever I use mplayer with VAAPI: the movie stops and CPU usage goes high, so I have to kill mplayer and restart it. On my machine it happens very often, sometimes after only a few seconds. However, dmesg does not show anything. Should I install any packages or enable some debug flags in the kernel to make the log show up?
Thanks
Comment 19 Kete Tefid 2014-10-23 18:30:39 UTC
I also just tested with xbmc, and the result was exactly like yours. At first the video froze while the audio continued; shortly afterwards the audio stopped too, and the player had to be killed.
dmesg showed this line:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... bsd ring idle
I had not seen this bug before; it apparently appeared after I updated the system. Are there versions of mesa/libva that work without this bug?
Comment 20 marlock9 2014-10-25 13:48:53 UTC
After recent updates:
[ 5892.079279] [drm] stuck on render ring
[ 5892.081239] [drm] GPU HANG: ecode 0:0x87d3bffa, in chromium [564], reason: Ring hung, action: reset
[ 5892.081244] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 5892.081247] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 5892.081249] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 5892.081251] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 5892.081254] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 6035.985740] [drm] stuck on render ring
[ 6035.987234] [drm] GPU HANG: ecode 0:0x87d3bffa, in chromium [2704], reason: Ring hung, action: reset


A few minutes later (I launched Chromium again and started playing the Flash game):

[ 6037.991290] ------------[ cut here ]------------
[ 6037.991318] WARNING: CPU: 1 PID: 39 at drivers/gpu/drm/i915/intel_pm.c:3486 gen6_enable_rps_interrupts+0x71/0x80 [i915]()
[ 6037.991321] Modules linked in: cdc_acm fuse ctr ccm arc4 joydev ecb iwlmvm coretemp mousedev mac80211 snd_hda_codec_conexant snd_hda_codec_hdmi snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel rtsx_pci_ms ghash_clmulni_intel memstick cryptd psmouse snd_hda_intel uas snd_hda_controller i2c_i801 serio_raw snd_hda_codec iwlwifi thinkpad_acpi cfg80211 wmi lpc_ich thermal evdev tpm_tis tpm snd_hwdep snd_pcm battery shpchp ac nvram led_class rfkill hwmon mac_hid snd_timer snd processor ie31200_edac edac_core mei_me mei soundcore ext4 crc16 mbcache jbd2 usb_storage sd_mod sr_mod crc_t10dif cdrom crct10dif_common rtsx_pci_sdmmc mmc_core atkbd libps2 ahci libahci libata scsi_mod rtsx_pci ehci_pci ehci_hcd
[ 6037.991363]  usbcore usb_common i8042 serio i915 button intel_gtt i2c_algo_bit video drm_kms_helper drm i2c_core [last unloaded: bluetooth]
[ 6037.991374] CPU: 1 PID: 39 Comm: kworker/1:1 Not tainted 3.17.1-1-ARCH #1
[ 6037.991376] Hardware name: LENOVO 20C5005LRT/20C5005LRT, BIOS J9ET93WW (2.13 ) 08/29/2014
[ 6037.991386] Workqueue: events intel_gen6_powersave_work [i915]
[ 6037.991388]  0000000000000000 000000005e0af667 ffff880037453d58 ffffffff81536850
[ 6037.991391]  0000000000000000 ffff880037453d90 ffffffff8107054d ffff8800376c0000
[ 6037.991394]  ffff8800376c7150 0000000000040000 ffff8800376c7118 ffff8800376c0000
[ 6037.991397] Call Trace:
[ 6037.991403]  [<ffffffff81536850>] dump_stack+0x4d/0x6f
[ 6037.991408]  [<ffffffff8107054d>] warn_slowpath_common+0x7d/0xa0
[ 6037.991412]  [<ffffffff8107067a>] warn_slowpath_null+0x1a/0x20
[ 6037.991422]  [<ffffffffa0091d81>] gen6_enable_rps_interrupts+0x71/0x80 [i915]
[ 6037.991432]  [<ffffffffa009999f>] intel_gen6_powersave_work+0x63f/0x10f0 [i915]
[ 6037.991437]  [<ffffffff81088b85>] process_one_work+0x145/0x400
[ 6037.991440]  [<ffffffff8108914b>] worker_thread+0x6b/0x4a0
[ 6037.991444]  [<ffffffff810890e0>] ? init_pwq.part.22+0x10/0x10
[ 6037.991447]  [<ffffffff8108e06a>] kthread+0xea/0x100
[ 6037.991450]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[ 6037.991453]  [<ffffffff8153c77c>] ret_from_fork+0x7c/0xb0
[ 6037.991456]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[ 6037.991458] ---[ end trace 8e62456f40ed3e10 ]---
Comment 21 marlock9 2014-11-03 18:26:55 UTC
Try adding this to the kernel parameters:

quiet pcie_aspm=force drm.vblankoffdelay=1 i915.semaphores=0 i915.modeset=1 i915.use_mmio_flip=1 i915.enable_ppgtt=1 video.use_native_backlight=1

I tested with 3 hours of gaming in HL2 (which also caused the hang) and did not get a single hang!

These parameters also seem to work well:

quiet pcie_aspm=force drm.vblankoffdelay=1 i915.semaphores=0 video.use_native_backlight=1 i915.lvds_downclock=1 i915.modeset=1 i915.use_mmio_flip=1 i915.enable_ppgtt=1 i915.enable_fbc=1
Comment 22 Mika Kuoppala 2014-11-04 14:59:58 UTC
(In reply to marlock9 from comment #21)
> Try to add this to kernel parameters:
> 
> quiet pcie_aspm=force drm.vblankoffdelay=1 i915.semaphores=0 i915.modeset=1
> i915.use_mmio_flip=1 i915.enable_ppgtt=1 video.use_native_backlight=1
> 
> I tested for 3 hour gaming in HL2 (that caused hang also) - no one hang
> anymore!

marlock9, could you please try out which one of the i915.* parameters is
the one curing the hang?
Comment 23 marlock9 2014-11-04 17:37:24 UTC
(In reply to Mika Kuoppala from comment #22)
> (In reply to marlock9 from comment #21)
> > Try to add this to kernel parameters:
> > 
> > quiet pcie_aspm=force drm.vblankoffdelay=1 i915.semaphores=0 i915.modeset=1
> > i915.use_mmio_flip=1 i915.enable_ppgtt=1 video.use_native_backlight=1
> > 
> > I tested for 3 hour gaming in HL2 (that caused hang also) - no one hang
> > anymore!
> 
> marlock9, could you please try out which one of the i915.* parameters is
> the one curing the hang?

It is i915.use_mmio_flip=1
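
Narrowing a working parameter set down to a single option, as requested in comment 22, amounts to booting once per candidate with only that candidate enabled. The following is a small sketch that generates those command lines from the set posted in comment 21. The helper name isolation_cmdlines is hypothetical, and the approach assumes a single i915.* option is responsible, which may not hold in general.

```python
def isolation_cmdlines(cmdline, prefix="i915."):
    """Yield (parameter, command line) pairs, one per prefixed parameter.

    Each command line keeps every non-prefixed parameter and exactly one
    prefixed parameter, so each boot tests that parameter in isolation.
    """
    params = cmdline.split()
    base = [p for p in params if not p.startswith(prefix)]
    for candidate in params:
        if candidate.startswith(prefix):
            yield candidate, " ".join(base + [candidate])


# Parameter set quoted from comment 21.
working = ("quiet pcie_aspm=force drm.vblankoffdelay=1 i915.semaphores=0 "
           "i915.modeset=1 i915.use_mmio_flip=1 i915.enable_ppgtt=1 "
           "video.use_native_backlight=1")

for param, line in isolation_cmdlines(working):
    print(f"{param}: {line}")
```

Each printed line would be booted in turn and the workload that triggers the hang repeated; in this thread that process pointed at i915.use_mmio_flip=1.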
Comment 24 Daniel Vetter 2014-11-26 16:26:02 UTC
Please test this patch

http://patchwork.freedesktop.org/patch/37647/
Comment 25 Chris Wilson 2014-12-19 20:03:42 UTC
commit 2c550183476dfa25641309ae9a28d30feed14379
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 16 10:02:27 2014 +0000

    drm/i915: Disable PSMI sleep messages on all rings around context switches

