Created attachment 97527 [details]
error state

Testing a module based on 3.14 + bdw-backports, I sometimes get a GPU hang on resume, which results in a complete system hang shortly after. I managed to get an error state from it and am attaching it.
So it turns out this is probably caused by incomplete ppgtt support in the backport series, which still uses aliasing, but disabling it with i915.enable_ppgtt=0 didn't seem to work.
This bug blocks us from supporting BDW in Ubuntu 14.04. Bumping the importance up to "High" + "Critical". Please let me know if this is inappropriate.
We've now been getting several reports of failures on resume. Are you certain this is a broadwell specific issue?
The kernel trace just before the system dies shows gen8_ppgtt_cleanup() in it, so yes, I think this one is. The 'broadwell' git branch from before the recent rebase is stable.
Also, it doesn't happen when the S3 cycles are scripted with fwts, but closing/opening the lid makes it hang after ~3 cycles.

[ 899.708190] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 899.708330] IP: [<ffffffffa031e8db>] gen8_ppgtt_cleanup+0x1b/0x60 [i915_bdw]
[ 899.708481] PGD 36b8c067 PUD d0ae0067 PMD 0
[ 899.708560] Oops: 0002 [#1] SMP
[ 899.708618] Modules linked in: rpcsec_gss_krb5 nfsv4 ctr ccm snd_hda_codec_realtek x86_pkg_temp_thermal coretemp kvm_intel sparse_keymap kvm crct10dif_pclmul arc4 dcdbas crc32_pclmul ghash_clmulni_intel aesni_intel rfcomm bnep aes_x86_64 lrw gf128mul bluetooth snd_hda_intel glue_helper ablk_helper uvcvideo cryptd snd_hda_codec snd_hwdep videobuf2_vmalloc snd_pcm videobuf2_memops psmouse snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi videobuf2_core iwlmvm videodev mac80211 snd_seq nfsd i915_bdw auth_rpcgss nfs_acl snd_seq_device nfs intel_ips iwlwifi lockd snd_timer sunrpc drm_kms_helper drm serio_raw mei_me fscache cfg80211 i2c_algo_bit snd lpc_ich mei soundcore wmi video acpi_pad parport_pc mac_hid ppdev lp parport ahci sdhci_pci e1000e libahci sdhci ptp pps_core
[ 899.710220] CPU: 0 PID: 6083 Comm: kworker/0:5 Not tainted 3.13.0-25-generic #47tja1
[ 899.710320] Hardware name: not-really
[ 899.710416] Workqueue: events i915_error_work_func [i915_bdw]
[ 899.710474] task: ffff8800ae372fe0 ti: ffff88009f938000 task.ti: ffff88009f938000
[ 899.710540] RIP: 0010:[<ffffffffa031e8db>] [<ffffffffa031e8db>] gen8_ppgtt_cleanup+0x1b/0x60 [i915_bdw]
[ 899.710656] RSP: 0018:ffff88009f939d38 EFLAGS: 00010286
[ 899.710704] RAX: 0000000000000000 RBX: ffff8800d09d3e00 RCX: 0000000000000001
[ 899.710769] RDX: 0000000000000000 RSI: 00000000b400b3fe RDI: ffff8800d09d3e00
[ 899.710836] RBP: ffff88009f939d40 R08: 0000000000000286 R09: 000000000000000b
[ 899.710903] R10: 00000000e0065000 R11: ffffffff8118626f R12: ffff8800cc1e4000
[ 899.710969] R13: ffff8800cc1e4000 R14: ffff8800cc1e5870 R15: 0000000000000000
[ 899.711046] FS: 0000000000000000(0000) GS:ffff88011f400000(0000) knlGS:0000000000000000
[ 899.711156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 899.711225] CR2: 0000000000000008 CR3: 00000000d0b51000 CR4: 00000000003407f0
[ 899.711291] Stack:
[ 899.711321] ffff8800d09d3e00 ffff88009f939d60 ffffffffa031efe9 0000000000000000
[ 899.711461] ffff8800cd21c000 ffff88009f939d90 ffffffffa0314fe2 ffff8800cd21c000
[ 899.711591] ffff8800cc1e4000 ffff8800cd21c020 0000000000000000 ffff88009f939dc8
[ 899.711687] Call Trace:
[ 899.711751] [<ffffffffa031efe9>] i915_gem_cleanup_aliasing_ppgtt+0x29/0x50 [i915_bdw]
[ 899.711851] [<ffffffffa0314fe2>] i915_gem_init_hw+0x362/0x380 [i915_bdw]
[ 899.711931] [<ffffffffa0301ca1>] i915_reset+0xa1/0x180 [i915_bdw]
[ 899.712008] [<ffffffffa030901d>] i915_error_work_func+0xcd/0x120 [i915_bdw]
[ 899.712086] [<ffffffff810838a2>] process_one_work+0x182/0x450
[ 899.712161] [<ffffffff81084641>] worker_thread+0x121/0x410
[ 899.712247] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
[ 899.712348] [<ffffffff8108b312>] kthread+0xd2/0xf0
[ 899.712442] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 899.712563] [<ffffffff81728ffc>] ret_from_fork+0x7c/0xb0
[ 899.712664] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 899.712778] Code: 51 ff ff ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 8b 97 b0 00 00 00 48 8b 87 b8 00 00 00 48 89 fb <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 48 89 87 b0
[ 899.713280] RIP [<ffffffffa031e8db>] gen8_ppgtt_cleanup+0x1b/0x60 [i915_bdw]
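Since several resume-failure reports are being compared, it may help to pull the faulting symbol and the reliable call-trace entries out of each oops mechanically. The following is only a minimal sketch, assuming the oops text has one log entry per line in the format shown above; the helper names `faulting_symbol` and `call_trace_symbols` are made up for illustration and are not part of any kernel tooling.

```python
import re

# Matches kernel symbol references such as
#   gen8_ppgtt_cleanup+0x1b/0x60 [i915_bdw]
# The trailing "[module]" part is optional (core kernel symbols have none).
SYMBOL_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def faulting_symbol(oops_text):
    """Return (symbol, offset, size, module) from the first IP/RIP line, or None."""
    for line in oops_text.splitlines():
        if "IP:" in line or "RIP" in line:
            m = SYMBOL_RE.search(line)
            if m:
                return (
                    m.group("symbol"),
                    int(m.group("offset"), 16),
                    int(m.group("size"), 16),
                    m.group("module"),  # None for core kernel symbols
                )
    return None


def call_trace_symbols(oops_text):
    """Return symbols listed after 'Call Trace:', skipping unreliable '?' entries."""
    symbols = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        if "] ? " in line:
            continue  # entries marked '?' are guesses by the stack unwinder
        m = SYMBOL_RE.search(line)
        if m is None:
            break  # first non-symbol line (e.g. 'Code:') ends the trace
        symbols.append(m.group("symbol"))
    return symbols
```

Run against the trace above, this should report gen8_ppgtt_cleanup in i915_bdw as the faulting symbol, with i915_gem_cleanup_aliasing_ppgtt as its immediate caller, which makes it easy to check whether other reports die in the same place.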
easiest way to reproduce is to play big_buck_bunny_1080p.ogg with vlc
I am unable to reproduce this with VLC + big_buck_bunny_1080p.ogg + bdw-backports (and non-composited desktop). I'll try with the USB live image as soon as possible. I think we're really looking at two separate issues though. The first is the hang, and the second is bad cleanup after hang. The latter one I can reproduce by forcing the GPU to wedged. I'll try to come up with a patch to fix that. I have no ideas on the real issue - the hang.
The vanilla bdw-backports kernel fails as well, so it's not just the distro kernel. But a kernel built from the old 'broadwell' branch based on 3.14 doesn't hang hard; there's still a GPU hang, but the system recovers from it.
So the only difference when you run bdw-backports is you have compositing, correct? I will try that.
I have compiz working yes, our mesa 10.1 is patched to enable bdw. Can't reproduce the hard hang with vanilla 3.15-rc4, fwiw.
The system freeze with S3/vlc is not happening with the preliminary patch I got from Ben, but it triggers #76368.
Created attachment 98788 [details] [review] Use MMIO for PDPs
Let's just close this. bdw-backports won't fly for long; I need to rebase to 3.15 anyway due to #76368.