Summary: | [BDW] GPU hang on resume from suspend | |
---|---|---|---|
Product: | DRI | Reporter: | Timo Aaltonen <tjaalton>
Component: | DRM/Intel | Assignee: | Ben Widawsky <ben>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | critical | |
Priority: | high | CC: | intel-gfx-bugs, yk
Version: | XOrg git | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Attachments: | | |
So it turns out this is probably caused by incomplete ppgtt support in the backport series, which still uses aliasing, but disabling it with i915.enable_ppgtt=0 didn't seem to work.

This bug blocks us from supporting BDW in Ubuntu 14.04. Bumping up the importance to "High" + "Critical". Please let me know if this is inappropriate.

We've now been getting several reports of failures on resume. Are you certain this is a broadwell specific issue?

Kernel trace just before the system dies shows gen8_ppgtt_cleanup() on it, so yes, I think this one is. The 'broadwell' git branch before the recent rebase is stable. Also, it doesn't happen when the S3 cycles are scripted with fwts, but closing/opening the lid makes it hang after ~3 cycles.

[ 899.708190] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 899.708330] IP: [<ffffffffa031e8db>] gen8_ppgtt_cleanup+0x1b/0x60 [i915_bdw]
[ 899.708481] PGD 36b8c067 PUD d0ae0067 PMD 0
[ 899.708560] Oops: 0002 [#1] SMP
[ 899.708618] Modules linked in: rpcsec_gss_krb5 nfsv4 ctr ccm snd_hda_codec_realtek x86_pkg_temp_thermal coretemp kvm_intel sparse_keymap kvm crct10dif_pclmul arc4 dcdbas crc32_pclmul ghash_clmulni_intel aesni_intel rfcomm bnep aes_x86_64 lrw gf128mul bluetooth snd_hda_intel glue_helper ablk_helper uvcvideo cryptd snd_hda_codec snd_hwdep videobuf2_vmalloc snd_pcm videobuf2_memops psmouse snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi videobuf2_core iwlmvm videodev mac80211 snd_seq nfsd i915_bdw auth_rpcgss nfs_acl snd_seq_device nfs intel_ips iwlwifi lockd snd_timer sunrpc drm_kms_helper drm serio_raw mei_me fscache cfg80211 i2c_algo_bit snd lpc_ich mei soundcore wmi video acpi_pad parport_pc mac_hid ppdev lp parport ahci sdhci_pci e1000e libahci sdhci ptp pps_core
[ 899.710220] CPU: 0 PID: 6083 Comm: kworker/0:5 Not tainted 3.13.0-25-generic #47tja1
[ 899.710320] Hardware name: not-really
[ 899.710416] Workqueue: events i915_error_work_func [i915_bdw]
[ 899.710474] task: ffff8800ae372fe0 ti: ffff88009f938000 task.ti: ffff88009f938000
[ 899.710540] RIP: 0010:[<ffffffffa031e8db>] [<ffffffffa031e8db>] gen8_ppgtt_cleanup+0x1b/0x60 [i915_bdw]
[ 899.710656] RSP: 0018:ffff88009f939d38 EFLAGS: 00010286
[ 899.710704] RAX: 0000000000000000 RBX: ffff8800d09d3e00 RCX: 0000000000000001
[ 899.710769] RDX: 0000000000000000 RSI: 00000000b400b3fe RDI: ffff8800d09d3e00
[ 899.710836] RBP: ffff88009f939d40 R08: 0000000000000286 R09: 000000000000000b
[ 899.710903] R10: 00000000e0065000 R11: ffffffff8118626f R12: ffff8800cc1e4000
[ 899.710969] R13: ffff8800cc1e4000 R14: ffff8800cc1e5870 R15: 0000000000000000
[ 899.711046] FS: 0000000000000000(0000) GS:ffff88011f400000(0000) knlGS:0000000000000000
[ 899.711156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 899.711225] CR2: 0000000000000008 CR3: 00000000d0b51000 CR4: 00000000003407f0
[ 899.711291] Stack:
[ 899.711321] ffff8800d09d3e00 ffff88009f939d60 ffffffffa031efe9 0000000000000000
[ 899.711461] ffff8800cd21c000 ffff88009f939d90 ffffffffa0314fe2 ffff8800cd21c000
[ 899.711591] ffff8800cc1e4000 ffff8800cd21c020 0000000000000000 ffff88009f939dc8
[ 899.711687] Call Trace:
[ 899.711751] [<ffffffffa031efe9>] i915_gem_cleanup_aliasing_ppgtt+0x29/0x50 [i915_bdw]
[ 899.711851] [<ffffffffa0314fe2>] i915_gem_init_hw+0x362/0x380 [i915_bdw]
[ 899.711931] [<ffffffffa0301ca1>] i915_reset+0xa1/0x180 [i915_bdw]
[ 899.712008] [<ffffffffa030901d>] i915_error_work_func+0xcd/0x120 [i915_bdw]
[ 899.712086] [<ffffffff810838a2>] process_one_work+0x182/0x450
[ 899.712161] [<ffffffff81084641>] worker_thread+0x121/0x410
[ 899.712247] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
[ 899.712348] [<ffffffff8108b312>] kthread+0xd2/0xf0
[ 899.712442] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 899.712563] [<ffffffff81728ffc>] ret_from_fork+0x7c/0xb0
[ 899.712664] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 899.712778] Code: 51 ff ff ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 8b 97 b0 00 00 00 48 8b 87 b8 00 00 00 48 89 fb <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 48 89 87 b0
[ 899.713280] RIP [<ffffffffa031e8db>] gen8_ppgtt_cleanup+0x1b/0x60 [i915_bdw]

The easiest way to reproduce is to play big_buck_bunny_1080p.ogg with vlc.

I am unable to reproduce this with VLC + big_buck_bunny_1080p.ogg + bdw-backports (and non-composited desktop). I'll try with the USB live image as soon as possible. I think we're really looking at two separate issues though. The first is the hang, and the second is bad cleanup after hang. The latter one I can reproduce by forcing the GPU to wedged. I'll try to come up with a patch to fix that. I have no ideas on the real issue - the hang.

Vanilla bdw-backports kernel fails as well, so it's not just the distro kernel. But a kernel built from the old 'broadwell' branch based on 3.14 doesn't hang hard; there's still a GPU hang, but the system recovers from it.

So the only difference when you run bdw-backports is you have compositing, correct? I will try that.

I have compiz working, yes; our mesa 10.1 is patched to enable bdw.

Can't reproduce the hard hang with vanilla 3.15-rc4, fwiw.

The system freeze with S3/vlc is not happening with the preliminary patch I got from Ben, but it triggers #76368.

Created attachment 98788: Use MMIO for PDPs

Let's just close this. bdw-backports won't fly for long, I need to rebase to 3.15 anyway due to #76368.
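A note on the "bad cleanup after hang" half of this, read from the oops itself rather than from the driver source. The bytes around the faulting instruction in the Code: line decode as `mov 0xb0(%rdi),%rdx`, `mov 0xb8(%rdi),%rax`, then the faulting `mov %rax,0x8(%rdx)`, followed by a load of 0xdead000000100100 (LIST_POISON1 on x86_64). That is the shape of an inlined list_del(). RDX and RAX are both 0 in the register dump, so the entry's next and prev pointers were both NULL, and the write to next->prev lands on address 8, which matches CR2 and the "Oops: 0002" (kernel-mode write to a non-present page). Which list_head sits at offset 0xb0 cannot be told from the log alone. Below is a minimal user-space sketch of that failure mode; `list_del_checked` is a hypothetical guard written for illustration, not a kernel function and not the fix that was applied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal user-space copy of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* An empty list points at itself, as INIT_LIST_HEAD() does in the kernel. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Hypothetical guard, for illustration only: refuse to unlink an entry
 * whose pointers are still zero. An unguarded list_del() would execute
 * entry->next->prev = entry->prev, i.e. a write to
 * NULL + offsetof(struct list_head, prev), which is address 8 on a
 * 64-bit build.
 */
bool list_del_checked(struct list_head *entry)
{
    if (entry->next == NULL || entry->prev == NULL)
        return false;

    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;
    entry->prev = NULL;
    return true;
}
```

Unlinking an entry that went through list_add() succeeds and leaves the head empty again, while an entry that was only zeroed (for example by kzalloc, or by a cleanup path that already ran once) is rejected instead of faulting. That would be consistent with the cleanup running on a ppgtt that was never fully set up or was torn down twice during reset, but the log does not say which.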
Created attachment 97527: error state

Testing a module based on 3.14 + bdw-backports, I sometimes get a GPU hang on resume, which results in a complete system hang shortly after. I managed to get an error state from it, attaching.