Bug 94141 - [HSW] GPU HANG: ecode 7:0:0xf3cffffe [MI_SET_CONTEXT], in chrome. Introduced between 4.1.0 and 4.4.0.
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 93279 94076 94237
Depends on:
Blocks:
 
Reported: 2016-02-13 21:46 UTC by Matt Turner
Modified: 2016-05-20 12:46 UTC
CC List: 8 users

See Also:
i915 platform: HSW
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (3.05 MB, text/plain)
2016-02-13 21:46 UTC, Matt Turner
/sys/class/drm/card0/error (3.04 MB, text/plain)
2016-03-01 04:51 UTC, Matt Turner
Intel GPU Hang error dump (255.04 KB, application/x-bzip)
2016-03-10 15:33 UTC, Ritesh Raj Sarraf
/sys/class/drm/card0/error after resume (416.13 KB, application/gzip)
2016-03-11 07:49 UTC, Ferry Toth

Description Matt Turner 2016-02-13 21:46:56 UTC
Created attachment 121739
/sys/class/drm/card0/error

------------[ cut here ]------------
WARNING: CPU: 3 PID: 13037 at /home/mattst88/projects/linux/drivers/gpu/drm/i915/intel_display.c:11289 intel_mmio_flip_work_func+0x378/0x3c0()
WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
Modules linked in:
 wl(PO) ax88179_178a reiserfs cfg80211 [last unloaded: wl]
CPU: 3 PID: 13037 Comm: kworker/3:1 Tainted: P           O    4.4.0 #2  
Hardware name: Apple Inc. MacBookPro11,2/Mac-3CBD00234E554E41, BIOS MBP112.88Z.0138.B02.1310181745 10/18/2013
Workqueue: events intel_mmio_flip_work_func
 ffffffff819dcd58 ffff88038dd33d30 ffffffff812f14df ffff88038dd33d78
 ffff88038dd33d68 ffffffff81078a81 ffff8802645e8c40 ffff88037d33bcc0
 ffff88047f2d48c0 00000000000000c0 ffff88047f2d9000 ffff88038dd33dc8
Call Trace:
 [<ffffffff812f14df>] dump_stack+0x44/0x55
 [<ffffffff81078a81>] warn_slowpath_common+0x81/0xc0
 [<ffffffff81078b07>] warn_slowpath_fmt+0x47/0x50
 [<ffffffff81003764>] ? __switch_to+0x354/0x460
 [<ffffffff8144ef58>] intel_mmio_flip_work_func+0x378/0x3c0
 [<ffffffff8108e597>] process_one_work+0x147/0x3d0
 [<ffffffff8108eb36>] worker_thread+0x46/0x440
 [<ffffffff8108eaf0>] ? rescuer_thread+0x2d0/0x2d0
 [<ffffffff81093354>] kthread+0xc4/0xe0
 [<ffffffff81093290>] ? kthread_park+0x50/0x50
 [<ffffffff817294df>] ret_from_fork+0x3f/0x70
 [<ffffffff81093290>] ? kthread_park+0x50/0x50
---[ end trace 5061c2aae0d052d8 ]---



Perhaps a duplicate of bug 93279, bug 94124, and bug 94076 (the ecode 7:0:0xf3cffffe is the same)
Comment 1 Chris Wilson 2016-02-15 20:45:10 UTC
Worrisome: hang with IPEHR == 0x0c000000 [MI_SET_CONTEXT].
Comment 2 Matt Turner 2016-02-15 20:45:48 UTC
*** Bug 93279 has been marked as a duplicate of this bug. ***
Comment 3 Matt Turner 2016-02-15 20:46:05 UTC
*** Bug 94076 has been marked as a duplicate of this bug. ***
Comment 4 Chris Wilson 2016-02-15 20:52:08 UTC
First thought, it is dying when loading the current context. That reminds me of

https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=breadcrumbs&id=b7d9d4bf1fee923b1d546b9f0cdd9ad549352dc6

commit 9c5230ec40288f9f96ba315227a1ff559366eed4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 29 21:02:02 2015 +0000

    drm/i915: Skip MI_SET_CONTEXT for the same context
    
    Fixes regression from
    
    commit 71b7e54f71b899db9f8def67a0e976969384e699
    Author: Daniel Vetter <daniel.vetter@ffwll.ch>
    Date:   Tue Apr 14 17:35:18 2015 +0200
    
        drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

71b7e54f71b899db9f8def67a0e976969384e699 -> 4.2-rc1
Comment 5 Chris Wilson 2016-02-22 09:34:24 UTC
*** Bug 94237 has been marked as a duplicate of this bug. ***
Comment 6 Matt Turner 2016-02-27 02:27:50 UTC
I switched to 4.3.0 to try to narrow the range of possible culprits, and never saw the hang. I switched back to 4.4.0 (the same kernel as before that generated the hang) and haven't seen it since. Will keep an eye out.
Comment 7 Matt Turner 2016-02-27 07:32:49 UTC
Of course less than an hour after commenting that I haven't had a hang in more than a week, I get a hang.

I'll try the patch you mention.
Comment 8 Yury Zhuravlev 2016-02-29 11:00:55 UTC
I will try too. But for the 4.4 kernel, intel_engine_flag needs to be renamed to intel_ring_flag.
Also, I can only trigger this hang in KDE after closing a window (with KDevelop, for example).
Comment 9 Matt Turner 2016-02-29 21:08:29 UTC
Yes, happened with that patch as well.

Not sure if I said it anywhere, but it happens often on resume from suspend.
Comment 10 Matt Turner 2016-03-01 04:51:17 UTC
Created attachment 122050
/sys/class/drm/card0/error

GPU hang in Chrome immediately upon result. IPEHR = 0x70040000 this time.
Comment 11 Matt Turner 2016-03-01 04:51:38 UTC
(In reply to Matt Turner from comment #10)
> Created attachment 122050
> /sys/class/drm/card0/error
> 
> GPU hang in Chrome immediately upon result. IPEHR = 0x70040000 this time.

s/result/resume/
Comment 12 Chris Wilson 2016-03-01 10:45:40 UTC
(In reply to Matt Turner from comment #10)
> Created attachment 122050 [details]
> /sys/class/drm/card0/error
> 
> GPU hang in Chrome immediately upon result. IPEHR = 0x70040000 this time.

Scratches head. Looks like a separate issue regarding PIPELINE_SELECT across suspend. The batch there looks like GPGPU, and it dies on a pipecontrol following the first state setup. Everything else looks intact, so the presumption will be the context save/restore across suspend.
Comment 13 Ferry Toth 2016-03-06 13:47:10 UTC
I have this on Intel 2955U (Haswell-ULT), most often after returning from suspend in kscreenlocker_g, and sometimes during normal use in chromium.

This is on Kubuntu Wily (Linux 4.2); I'm currently running Linux 4.4.4 from the Ubuntu kernel PPA, with no improvement.
Comment 14 Ferry Toth 2016-03-06 13:56:10 UTC
Trace after:
 [drm] GPU HANG: ecode 7:0:0xf3cffffe, in kscreenlocker_g [19371], reason: Ring hung, action: reset

------------[ cut here ]------------
WARNING: CPU: 1 PID: 18403 at /home/kernel/COD/linux/drivers/gpu/drm/i915/intel_display.c:11289 intel_mmio_flip_work_func+0x38e/0x3d0 [i915]()
WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
Modules linked in: nls_utf8 ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c rfcomm drbg ansi_cprng ctr ccm bnep joydev cros_ec_devs cyapatp crc_itu_t atmel_mxt_ts arc4 ath9k ath9k_common ath9k_hw intel_rapl x86_pkg_temp_thermal intel_powerclamp ath cros_ec_lpc binfmt_misc cros_ec coretemp snd_seq_midi kvm_intel mac80211 snd_seq_midi_event chromeos_laptop uvcvideo kvm videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_hda_codec_realtek videobuf2_core v4l2_common snd_hda_codec_hdmi irqbypass crct10dif_pclmul snd_hda_codec_generic crc32_pclmul ath3k videodev snd_rawmidi btusb snd_hda_intel cryptd media cfg80211 btrtl input_leds btbcm serio_raw snd_hda_codec btintel bluetooth snd_hda_core lpc_ich snd_hwdep shpchp snd_pcm snd_seq dw_dmac_pci i2c_designware_pci snd_seq_device snd_timer snd soundcore dw_dmac dw_dmac_core 8250_fintek tpm_infineon i2c_designware_platform i2c_designware_core 8250_dw mac_hid spi_pxa2xx_platform ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 xt_comment xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter parport_pc ppdev ip_tables x_tables lp parport autofs4 btrfs xor raid6_pq i915 i2c_algo_bit ahci libahci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops fjes video sdhci_acpi drm sdhci
CPU: 1 PID: 18403 Comm: kworker/1:4 Tainted: G        W       4.4.4-040404-generic #201603031931
Hardware name: Acer Peppy, BIOS          04/30/2014
Workqueue: events intel_mmio_flip_work_func [i915]
 0000000000000286 00000000cf2349ab ffff88014ed8fd20 ffffffff813ce993
 ffff88014ed8fd68 ffffffffc01e1aa8 ffff88014ed8fd58 ffffffff8107fda2
 ffff88017646b380 ffff88017cb16300 ffff88017cb1ac00 0000000000000040
Call Trace:
 [<ffffffff813ce993>] dump_stack+0x63/0x90
 [<ffffffff8107fda2>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8107fe3c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff810168b0>] ? __switch_to+0x420/0x5a0
 [<ffffffffc017ac8e>] intel_mmio_flip_work_func+0x38e/0x3d0 [i915]
 [<ffffffff81098d92>] process_one_work+0x162/0x480
 [<ffffffff810990fb>] worker_thread+0x4b/0x4c0
 [<ffffffff810990b0>] ? process_one_work+0x480/0x480
 [<ffffffff810990b0>] ? process_one_work+0x480/0x480
 [<ffffffff8109f288>] kthread+0xd8/0xf0
 [<ffffffff8109f1b0>] ? kthread_create_on_node+0x1a0/0x1a0
 [<ffffffff81805b8f>] ret_from_fork+0x3f/0x70
 [<ffffffff8109f1b0>] ? kthread_create_on_node+0x1a0/0x1a0
---[ end trace f857b40eaf2ae0be ]---
Comment 15 Ritesh Raj Sarraf 2016-03-10 15:32:06 UTC
I have the same issue on Debian with a 4.4.4 kernel. Like Matt Turner, I can easily reproduce it during resume from suspend with the Chromium browser running.


I'm also attaching my DRI error dump, along with some other output that may be relevant.

rrs@learner:~/.rrs-home/Community/linux-upstream_GIT (stable-44)$ cat /var/tmp/intel-reg-checker.txt 
  (bit 12) FAIL: MI_FLUSH enable must be set
  (bit  6) FAIL: Vertex Shader Timer Dispatch Enable must be set
MI_MODE (0x209c): 0x00000000
  (bit 14) OK:   Async Flip Performance mode
  (bit 13) OK:   Flush Performance Mode
GFX_MODE (0x229c): 0x00000000
  (bit 13) PERF: Flush TLB Invalidation Mode should be set
GT_MODE (0x7008): 0x00000000
CACHE_MODE_0 (0x7000): 0x00000000
  (bit 15) OK:   Sampler L2 Disable
  (bit  9) PERF: Sampler L2 TLB Prefetch Enable should be set
  (bit  8) OK:   Depth Related Cache Pipelined Flush Disable
  (bit  5) OK:   STC LRA Eviction Policy
  (bit  4) OK:   RCC LRA Eviction Policy
  (bit  3) OK:   Hierarchical Z Disable
  (bit  0) OK:   Render Cache Operational Flush
CACHE_MODE_1 (0x7004): 0x00000000
  (bit 13) OK:   STC Address Lookup Optimization Disable
  (bit 12) OK:   HIZ LRA Eviction Policy
  (bit 11) OK:   DAP Instruction and State Cache Invalidate
  (bit 10) OK:   Instruction L1 Cache and In-Flight Queue Disable
  (bit  9) OK:   Instruction L2 Cache Fill Buffers Disable
  (bit  6) OK:   Pixel Backend sub-span collection Optimization Disable
  (bit  5) OK:   MCS Cache Disable
  (bit  4) OK:   Data Disable
  (bit  1) OK:   Instruction and State L2 Cache Disable
  (bit  0) OK:   Instruction and State L1 Cache Disable
FF_SLICE_CHICKEN (0x2088): 0x00000000
           OK:   chicken bits unset
3D_CHICKEN3 (0x2090): 0x00000000
           OK:   chicken bits unset
FF_SLICE_CS_CHICKEN1 (0x20e0): 0x00000000
           OK:   chicken bits unset
FF_SLICE_CS_CHICKEN2 (0x20e4): 0x00000000
           OK:   chicken bits unset
FF_SLICE_CS_CHICKEN3 (0x20e8): 0x00000000
           OK:   chicken bits unset
COMMON_SLICE_CHICKEN1 (0x7010): 0x00000000
           OK:   chicken bits unset
COMMON_SLICE_CHICKEN2 (0x7014): 0x00000000
           OK:   chicken bits unset
WM_CHICKEN (0x5580): 0x00000000
           OK:   chicken bits unset
HALF_SLICE_CHICKEN (0xe100): 0x00000000
           OK:   chicken bits unset
HALF_SLICE_CHICKEN2 (0xe180): 0x00000000
           OK:   chicken bits unset
ROW_CHICKEN (0xe4f0): 0x00000000
           OK:   chicken bits unset
ROW_CHICKEN2 (0xe4f4): 0x00000000
           OK:   chicken bits unset
ECOSKPD (0x21d0): 0x00000000
           OK:   chicken bits unset
Comment 16 Ritesh Raj Sarraf 2016-03-10 15:33:33 UTC
Created attachment 122208
Intel GPU Hang error dump
Comment 17 Ferry Toth 2016-03-11 07:49:15 UTC
Created attachment 122216
/sys/class/drm/card0/error after resume
Comment 18 Ferry Toth 2016-03-17 07:45:00 UTC
After installing Linux 4.5.0 from Ubuntu's kernel PPA, I have not seen this hang anymore.

It's a bit early to say that it's fixed, but I've got my hopes up.
Comment 19 Thiago Macieira 2016-03-17 16:32:02 UTC
I haven't seen a hang in 4.4.3 either.
Comment 20 Ferry Toth 2016-03-17 20:04:31 UTC
I still had the hang in 4.4.4 (see comment 13), but nothing in the last 3 days with 4.5.0.
Comment 21 Ferry Toth 2016-03-23 23:14:34 UTC
In the 10 days since installing 4.5.0, not one GPU hang, vs. multiple hangs per day using 4.2 - 4.4.

It seems to have been solved.
Comment 22 yann 2016-05-20 07:46:07 UTC
Matt, can you try with >= 4.5 like Ferry and see if the hang still occurs?
Comment 23 Matt Turner 2016-05-20 12:39:15 UTC
It does indeed seem to be fixed. Thank you.

