Bug 92814 - [snb] 4.3.0 gpu hang in MediaPl~back
Summary: [snb] 4.3.0 gpu hang in MediaPl~back
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-04 12:22 UTC by Jonas Jelten
Modified: 2017-07-03 11:07 UTC
CC List: 3 users

See Also:
i915 platform: SNB
i915 features: GPU hang


Attachments
debug dump (4.73 MB, text/plain)
2015-11-04 12:22 UTC, Jonas Jelten
no flags

Description Jonas Jelten 2015-11-04 12:22:12 UTC
Created attachment 119400
debug dump

[126770.012985] [drm] stuck on render ring
[126770.013842] [drm] GPU HANG: ecode 6:0:0x87e8fffd, in MediaPl~back #1 [93854], reason: Ring hung, action: reset
[126770.013844] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[126770.013845] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[126770.013847] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[126770.013848] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[126770.013849] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[126770.013906] ------------[ cut here ]------------
[126770.013935] WARNING: CPU: 0 PID: 93437 at drivers/gpu/drm/i915/intel_display.c:11197 intel_mmio_flip_work_func+0x6b/0x330 [i915]()
[126770.013937] WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
[126770.013938] Modules linked in:
[126770.013939]  ccm nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables bnep nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables iTCO_wdt uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev btusb btrtl joydev btbcm coretemp iwldvm snd_hda_codec_hdmi btintel media cmac kvm_intel mac80211 snd_hda_codec_generic kvm bluetooth iwlwifi psmouse microcode pcspkr serio_raw cfg80211 sdhci_pci snd_hda_intel sdhci snd_hda_codec i2c_i801 snd_hwdep mmc_core lpc_ich thinkpad_acpi snd_hda_core nvram snd_pcm i915 snd_timer sch_fq_codel rfkill snd evdev soundcore
[126770.013971] CPU: 0 PID: 93437 Comm: kworker/0:1 Tainted: G     U          4.3.0-JJ #142
[126770.013972] Hardware name: LENOVO 4296CTO/4296CTO, BIOS 8DET70WW (1.40 ) 05/14/2015
[126770.013989] Workqueue: events intel_mmio_flip_work_func [i915]
[126770.013990]  0000000000000000 00000000c58d59c4 ffff88003fb17d10 ffffffff8a44d8e4
[126770.013992]  ffff88003fb17d58 ffff88003fb17d48 ffffffff8a09bae6 ffffffffc0204a16
[126770.013994]  ffff880100706c80 ffff88041e216080 ffff8804098a7a80 ffff880100706c80
[126770.013997] Call Trace:
[126770.014002]  [<ffffffff8a44d8e4>] dump_stack+0x44/0x55
[126770.014005]  [<ffffffff8a09bae6>] warn_slowpath_common+0x99/0xb2
[126770.014020]  [<ffffffffc0204a16>] ? intel_mmio_flip_work_func+0x6b/0x330 [i915]
[126770.014023]  [<ffffffff8a09bb56>] warn_slowpath_fmt+0x57/0x73
[126770.014037]  [<ffffffffc0204a16>] intel_mmio_flip_work_func+0x6b/0x330 [i915]
[126770.014054]  [<ffffffff8a0afd60>] process_one_work+0x1ac/0x31a
[126770.014056]  [<ffffffff8a0b0824>] worker_thread+0x285/0x376
[126770.014059]  [<ffffffff8a0b059f>] ? rescuer_thread+0x2c4/0x2c4
[126770.014060]  [<ffffffff8a0b4bf9>] kthread+0xe1/0xe9
[126770.014062]  [<ffffffff8a0b4b18>] ? kthread_worker_fn+0x149/0x149
[126770.014065]  [<ffffffff8a78699f>] ret_from_fork+0x3f/0x70
[126770.014067]  [<ffffffff8a0b4b18>] ? kthread_worker_fn+0x149/0x149
[126770.014069] ---[ end trace ebe03e08f66836be ]---
[126770.015908] drm/i915: Resetting chip after gpu hang
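For triage, the fields of the "GPU HANG" line above can be pulled out mechanically. The following is a minimal sketch, assuming the line format shown in this report; the pattern and the function name `parse_gpu_hang` are illustrative and not part of any kernel or i915 tooling, and other kernel versions may print the line differently.

```python
import re

# Pattern modelled on the "[drm] GPU HANG" line quoted in this report.
# Assumption: the format may differ between kernel versions.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])  # the number in square brackets
    return fields


# Sample copied from the log in the description.
sample = (
    "[126770.013842] [drm] GPU HANG: ecode 6:0:0x87e8fffd, in MediaPl~back #1 "
    "[93854], reason: Ring hung, action: reset"
)
info = parse_gpu_hang(sample)
print(info)
```

Lines that do not contain a hang report yield `None`, so the function can be applied to every line of a dmesg capture.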
Comment 1 Chris Wilson 2015-11-04 12:46:28 UTC
The command streamer (CS) stops working (no more batch buffers executed, no more MI commands, whether LRI or STORE_DWORD) after a libva chained batch buffer. This is not the first time we have seen such behaviour, but it has always been in conjunction with libva.
Comment 2 Jonas Jelten 2015-11-08 03:06:27 UTC
I have these hangs _very_ often, mainly with HTML5 video.
Since many others don't seem to have the problem, I removed the kernel command line options that I have been carrying along for years (since 3.2 or so):

i915.enable_rc6=1 i915.enable_fbc=1 i915.powersave=1 i915.modeset=1 i915.lvds_downclock=1

And now the crashes and hangs are gone.

I'd be glad to further help debugging this.

(Probably related to another bug by me then: #92330)
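To check which i915 options a running system was actually booted with, the kernel command line can be inspected. This is a minimal sketch, assuming a Linux system that exposes `/proc/cmdline`; the helper names `i915_params` and `read_cmdline` are illustrative, and the sample string simply reuses the options quoted in this thread.

```python
def i915_params(cmdline):
    """Collect i915.* options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            key, value = token.split("=", 1)
            params[key[len("i915."):]] = value  # drop the "i915." prefix
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Options as quoted in this report, plus intel_iommu=on from comment 3
# to show that non-i915 tokens are ignored.
sample = (
    "intel_iommu=on i915.enable_rc6=1 i915.enable_fbc=1 i915.powersave=1 "
    "i915.modeset=1 i915.lvds_downclock=1"
)
print(i915_params(sample))
```

On a live system, `i915_params(read_cmdline())` would list the non-default options that are worth mentioning in a bug report.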
Comment 3 Jonas Jelten 2015-11-08 03:44:53 UTC
Very interesting: I forgot that I had changed something else besides those 4 parameters: intel_iommu=on. It was off before (because I had been playing with Xen).

Now, with the IOMMU on again, it does not crash even after I re-enable my beloved 4 parameters (yes, I know, rc6 and modeset are the defaults now :)

To confirm, I double-checked with the IOMMU off again while keeping the 4 parameters, and yes, it hung very quickly when watching YouTube.

=> IOMMU on == good!

Both on Gentoo, Linux 4.3.0-JJ #142 SMP Mon Nov 2 02:43:59 CET 2015 x86_64 Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz GenuineIntel GNU/Linux:

dmesg | grep drm  # with iommu on
[    1.591911] [drm] Initialized drm 1.1.0 20060810
[    9.870699] [drm] Memory usable by graphics device = 2048M
[    9.870780] [drm] VT-d active for gfx access
[    9.870857] [drm] Disabling PPGTT because VT-d is on
[    9.870935] [drm] Replacing VGA console driver
[    9.886575] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    9.886582] [drm] Driver supports precise vblank timestamp query.
[    9.886602] [drm] DMAR active, disabling use of stolen memory
[    9.987376] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
[   10.003780] [drm] Initialized i915 1.6.0 20150731 for 0000:00:02.0 on minor 0
[   10.003797] fbcon: inteldrmfb (fb0) is primary device
[   10.853001] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[   11.464863] drm: not enough stolen space for compressed buffer (need 4325376 more bytes), disabling. Hint: you may be able to increase stolen memory size in the BIOS to avoid this.



dmesg | grep drm  # with iommu off
[    1.586329] [drm] Initialized drm 1.1.0 20060810
[    9.244689] [drm] Memory usable by graphics device = 2048M
[    9.244774] [drm] Replacing VGA console driver
[    9.250961] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    9.250968] [drm] Driver supports precise vblank timestamp query.
[    9.354597] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
[    9.371987] fbcon: inteldrmfb (fb0) is primary device
[    9.372597] [drm] Initialized i915 1.6.0 20150731 for 0000:00:02.0 on minor 0
[   10.227585] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
(...) # another hang I just had:
[  253.551252] ------------[ cut here ]------------
[  253.551341] WARNING: CPU: 2 PID: 973 at drivers/gpu/drm/i915/intel_display.c:3926 intel_crtc_wait_for_pending_flips+0xf8/0x1f1 [i915]()
[  253.551346] WARN_ON(wait_event_timeout(dev_priv->pending_flip_queue, !intel_crtc_has_pending_flip(crtc), 60*HZ) == 0)
[  253.551350] Modules linked in:
[  253.551354]  nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables bnep nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables sch_fq_codel snd_hda_codec_hdmi iTCO_wdt uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core snd_hda_codec_generic v4l2_common videodev joydev media btusb btrtl btbcm btintel cmac iwldvm mac80211 bluetooth coretemp kvm_intel snd_hda_intel kvm iwlwifi snd_hda_codec snd_hwdep cfg80211 sdhci_pci thinkpad_acpi snd_hda_core nvram microcode psmouse sdhci snd_pcm i915 serio_raw mmc_core pcspkr rfkill snd_timer i2c_i801 lpc_ich snd evdev soundcore
[  253.551444] CPU: 2 PID: 973 Comm: X Not tainted 4.3.0-JJ #142
[  253.551447] Hardware name: LENOVO 4296CTO/4296CTO, BIOS 8DET70WW (1.40 ) 05/14/2015
[  253.551452]  0000000000000000 0000000033858303 ffff880401ddfa80 ffffffff8444d8e4
[  253.551459]  ffff880401ddfac8 ffff880401ddfab8 ffffffff8409bae6 ffffffffc04bbc52
[  253.551466]  ffff880409d5e000 0000000000000000 ffff8804092b4800 ffff880409d28ee0
[  253.551473] Call Trace:
[  253.551487]  [<ffffffff8444d8e4>] dump_stack+0x44/0x55
[  253.551498]  [<ffffffff8409bae6>] warn_slowpath_common+0x99/0xb2
[  253.551555]  [<ffffffffc04bbc52>] ? intel_crtc_wait_for_pending_flips+0xf8/0x1f1 [i915]
[  253.551563]  [<ffffffff8409bb56>] warn_slowpath_fmt+0x57/0x73
[  253.551612]  [<ffffffffc04bbc52>] intel_crtc_wait_for_pending_flips+0xf8/0x1f1 [i915]
[  253.551620]  [<ffffffff840cea0a>] ? wait_woken+0x72/0x72
[  253.551667]  [<ffffffffc04bcaf6>] intel_pre_plane_update+0xa7/0x101 [i915]
[  253.551713]  [<ffffffffc04bdcab>] intel_atomic_commit+0xddc/0xed4 [i915]
[  253.551761]  [<ffffffffc04c12dc>] ? intel_atomic_check+0x8a2/0xc15 [i915]
[  253.551808]  [<ffffffffc04a9f00>] ? intel_crtc_duplicate_state+0x3c/0x7a [i915]
[  253.551819]  [<ffffffff8454b256>] ? __drm_atomic_helper_crtc_duplicate_state+0x2f/0x41
[  253.551826]  [<ffffffff845698b5>] ? drm_atomic_check_only+0x168/0x45b
[  253.551832]  [<ffffffff84569bf5>] drm_atomic_commit+0x4d/0x52
[  253.551839]  [<ffffffff8454a8b5>] drm_atomic_helper_disable_plane+0xcb/0x116
[  253.551846]  [<ffffffff8455f066>] __setplane_internal+0x39/0x2aa
[  253.551852]  [<ffffffff8455f715>] drm_mode_setplane+0x145/0x166
[  253.551859]  [<ffffffff84553067>] drm_ioctl+0x23d/0x374
[  253.551865]  [<ffffffff8455f5d0>] ? drm_mode_cursor_common+0x14d/0x14d
[  253.551874]  [<ffffffff841c5307>] do_vfs_ioctl+0x3a2/0x42c
[  253.551880]  [<ffffffff841b766f>] ? __sb_end_write+0x1d/0x1f
[  253.551885]  [<ffffffff841b573b>] ? vfs_write+0x154/0x162
[  253.551892]  [<ffffffff841c53e8>] SyS_ioctl+0x57/0x79
[  253.551901]  [<ffffffff8478662e>] entry_SYSCALL_64_fastpath+0x12/0x71
[  253.551907] ---[ end trace 1fc1877966ffb925 ]---
[  253.551911] ------------[ cut here ]------------
[  253.551959] WARNING: CPU: 2 PID: 973 at drivers/gpu/drm/i915/intel_display.c:3931 intel_crtc_wait_for_pending_flips+0x132/0x1f1 [i915]()
[  253.551962] Removing stuck page flip
[  253.551965] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables bnep nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables sch_fq_codel snd_hda_codec_hdmi iTCO_wdt uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core snd_hda_codec_generic v4l2_common videodev joydev media btusb btrtl btbcm btintel cmac iwldvm mac80211 bluetooth coretemp kvm_intel snd_hda_intel kvm iwlwifi snd_hda_codec snd_hwdep cfg80211 sdhci_pci thinkpad_acpi snd_hda_core nvram microcode psmouse sdhci snd_pcm i915 serio_raw mmc_core pcspkr rfkill snd_timer i2c_i801 lpc_ich snd evdev soundcore
[  253.552042] CPU: 2 PID: 973 Comm: X Tainted: G        W       4.3.0-JJ #142
[  253.552047] Hardware name: LENOVO 4296CTO/4296CTO, BIOS 8DET70WW (1.40 ) 05/14/2015
[  253.552050]  0000000000000000 0000000033858303 ffff880401ddfa80 ffffffff8444d8e4
[  253.552056]  ffff880401ddfac8 ffff880401ddfab8 ffffffff8409bae6 ffffffffc04bbc8c
[  253.552062]  ffff880409d5e000 ffff8804092b49a8 ffff8804092b4800 ffff880409d28ee0
[  253.552068] Call Trace:
[  253.552076]  [<ffffffff8444d8e4>] dump_stack+0x44/0x55
[  253.552083]  [<ffffffff8409bae6>] warn_slowpath_common+0x99/0xb2
[  253.552128]  [<ffffffffc04bbc8c>] ? intel_crtc_wait_for_pending_flips+0x132/0x1f1 [i915]
[  253.552136]  [<ffffffff8409bb56>] warn_slowpath_fmt+0x57/0x73
[  253.552179]  [<ffffffffc04bbc8c>] intel_crtc_wait_for_pending_flips+0x132/0x1f1 [i915]
[  253.552185]  [<ffffffff840cea0a>] ? wait_woken+0x72/0x72
[  253.552228]  [<ffffffffc04bcaf6>] intel_pre_plane_update+0xa7/0x101 [i915]
[  253.552271]  [<ffffffffc04bdcab>] intel_atomic_commit+0xddc/0xed4 [i915]
[  253.552316]  [<ffffffffc04c12dc>] ? intel_atomic_check+0x8a2/0xc15 [i915]
[  253.552361]  [<ffffffffc04a9f00>] ? intel_crtc_duplicate_state+0x3c/0x7a [i915]
[  253.552370]  [<ffffffff8454b256>] ? __drm_atomic_helper_crtc_duplicate_state+0x2f/0x41
[  253.552376]  [<ffffffff845698b5>] ? drm_atomic_check_only+0x168/0x45b
[  253.552382]  [<ffffffff84569bf5>] drm_atomic_commit+0x4d/0x52
[  253.552389]  [<ffffffff8454a8b5>] drm_atomic_helper_disable_plane+0xcb/0x116
[  253.552395]  [<ffffffff8455f066>] __setplane_internal+0x39/0x2aa
[  253.552401]  [<ffffffff8455f715>] drm_mode_setplane+0x145/0x166
[  253.552407]  [<ffffffff84553067>] drm_ioctl+0x23d/0x374
[  253.552413]  [<ffffffff8455f5d0>] ? drm_mode_cursor_common+0x14d/0x14d
[  253.552420]  [<ffffffff841c5307>] do_vfs_ioctl+0x3a2/0x42c
[  253.552427]  [<ffffffff841b766f>] ? __sb_end_write+0x1d/0x1f
[  253.552432]  [<ffffffff841b573b>] ? vfs_write+0x154/0x162
[  253.552438]  [<ffffffff841c53e8>] SyS_ioctl+0x57/0x79
[  253.552445]  [<ffffffff8478662e>] entry_SYSCALL_64_fastpath+0x12/0x71
[  253.552450] ---[ end trace 1fc1877966ffb926 ]---
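The difference between the two dmesg excerpts above can be checked automatically. This is a minimal sketch, assuming the i915 messages look like the ones quoted in this comment; the function names are illustrative and the sample text is a shortened copy of the logs above.

```python
def vtd_active_for_gfx(dmesg_text):
    """True if i915 reported that VT-d (the IOMMU) is active for graphics."""
    return any(
        "VT-d active for gfx access" in line for line in dmesg_text.splitlines()
    )


def count_kernel_warnings(dmesg_text):
    """Count WARNING headers, one per 'cut here' backtrace."""
    return sum("WARNING: CPU:" in line for line in dmesg_text.splitlines())


# Shortened copies of the two excerpts from this comment.
iommu_on = """\
[    9.870699] [drm] Memory usable by graphics device = 2048M
[    9.870780] [drm] VT-d active for gfx access
[    9.870857] [drm] Disabling PPGTT because VT-d is on
"""
iommu_off = """\
[    9.244689] [drm] Memory usable by graphics device = 2048M
[    9.244774] [drm] Replacing VGA console driver
[  253.551341] WARNING: CPU: 2 PID: 973 at drivers/gpu/drm/i915/intel_display.c:3926 intel_crtc_wait_for_pending_flips+0xf8/0x1f1 [i915]()
"""

for name, text in (("iommu on", iommu_on), ("iommu off", iommu_off)):
    print(name, vtd_active_for_gfx(text), count_kernel_warnings(text))
```

Feeding the full output of `dmesg` into both functions gives a quick indication of whether a given boot had the IOMMU enabled for graphics and how many warnings it produced.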
Comment 4 Jani Nikula 2016-06-17 15:54:40 UTC
(In reply to Jonas Jelten from comment #2)
> I have these hangs _very_ often, mainly with html5 video.
> Now i thought, many others don't seem to have the problem, so I removed my
> long-year kernel command line options (since 3.2 or something):
> 
> i915.enable_rc6=1 i915.enable_fbc=1 i915.powersave=1 i915.modeset=1
> i915.lvds_downclock=1
> 
> And now the crashes and hangs are gone.
> 
> I'd be glad to further help debugging this.

Sorry for neglecting this bug. We don't really support changing these parameters from their per-platform defaults. If it works without them, we're happy. Glad you found another way to fix it too (comment #3).

Closing.
Comment 5 Jari Tahvanainen 2017-07-03 11:07:16 UTC
Closing >1 year old resolved+fixed.

