Created attachment 128160 netconsole-dump. On the latest -nightly, running the following will sometimes produce a hard hang:
./nexuiz-linux-x86_64-glx -benchmark demos/demo1 -nosound 2>&1 | egrep -e '[0-9]+ frames'
This has so far been hard to reproduce, so bisection has proved futile, but I did finally get it to happen whilst netconsole was running. Please see the attachment.
Hmm, this looks like quite general memory corruption. It is probably due to the vma pin leak, i.e. the hardware was continuing to use pages that had been returned to the system; calamity ensues. Alternatively, some other code went rogue, scribbled over large chunks of memory, and just happened to write zeroes over the unpin_work...
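The invariant that the pin-leak theory says was broken can be sketched as a pin count: backing pages may only be returned to the system once nothing (for example the display engine) still holds a pin on the buffer. This is a minimal sketch assuming a hypothetical `struct buffer` with a plain counter; it is not the i915 object or vma code, which tracks pins differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GPU buffer; NOT the i915 structure. */
struct buffer {
    unsigned int pin_count; /* users that may still access the pages */
    void *pages;            /* backing storage */
};

static void buffer_pin(struct buffer *b)
{
    b->pin_count++;
}

static void buffer_unpin(struct buffer *b)
{
    assert(b->pin_count > 0); /* an unbalanced unpin is itself a bug */
    b->pin_count--;
}

/* Release the backing pages only once nothing holds a pin.
 * Returns false, and keeps the pages, while the buffer is still pinned;
 * freeing them here anyway is the "hardware keeps using returned pages" case. */
static bool buffer_put_pages(struct buffer *b)
{
    if (b->pin_count > 0)
        return false;
    free(b->pages);
    b->pages = NULL;
    return true;
}
```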
Matthew, whilst you hopefully still have that kernel, care to work out which line died? The fault is 120 bytes from the base of the pointer...
120 bytes should be vma->vm, i.e. the lockdep_assert_held(&vma->vm->dev->struct_mutex), and so vma is NULL. Given the persistence of GGTT vmas (they are only destroyed when the object is), it seems more likely that there was another change of state. Still, that loophole is closed by https://patchwork.freedesktop.org/series/15325/
Yeah, as expected, the problematic line is:
0x000000000007c625 <+101>: mov 0x78(%rax),%rcx
which would be vma->fence; vma->vm would be 0x70. I guess I didn't have lockdep enabled... Anyway, as you said, it looks like the vma is NULL. Now we just need to figure out why...
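The offset reasoning above works because a load through a NULL base pointer faults at exactly the field offset: 120 bytes is 0x78, so the faulting address alone identifies the member being read. A minimal sketch, assuming a hypothetical `struct fake_vma` padded so that `vm` sits at 0x70 and `fence` at 0x78 as stated in this thread; the real `struct i915_vma` layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, padded so the two fields land at the offsets quoted
 * in this thread; NOT the real struct i915_vma. */
struct fake_vma {
    char other_fields[0x70];
    uint64_t vm;     /* stands in for a 64-bit pointer, offset 0x70 */
    uint64_t fence;  /* stands in for a 64-bit pointer, offset 0x78 */
};

_Static_assert(offsetof(struct fake_vma, vm) == 0x70, "vm at 0x70");
_Static_assert(offsetof(struct fake_vma, fence) == 0x78, "fence at 0x78 (120)");

/* With a NULL base pointer, the faulting address equals the field offset,
 * so mapping the offset back to a member tells us which access crashed. */
static const char *field_at_offset(size_t off)
{
    if (off == offsetof(struct fake_vma, vm))
        return "vm";
    if (off == offsetof(struct fake_vma, fence))
        return "fence";
    return "unknown";
}
```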
*** Bug 99134 has been marked as a duplicate of this bug. ***
This still reproduces on 4.10-rc2.
This happens occasionally during logout from a lightdm session. Afterwards, a SysRq reboot is required. Skylake i7-6700K, 4.10.0-rc3+.
Jan 16 10:18:53 atlas kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
Jan 16 10:18:53 atlas kernel: IP: intel_unpin_fb_obj+0x63/0xd0 [i915]
Jan 16 10:18:53 atlas kernel: PGD 0
Jan 16 10:18:53 atlas kernel:
Jan 16 10:18:53 atlas kernel: Oops: 0000 [#1] PREEMPT SMP
Jan 16 10:18:53 atlas kernel: Modules linked in: cpuid btrfs ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs dm_mod uas usb_storage loop drbg ansi_cprng authenc echainiv xfrm6_mode_tunnel xfrm4_mode_tunnel hmac xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp fuse esp4 ah4 af_key xfrm_algo ip6table_filter ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_tcpudp xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat crc32c_generic binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul eeepc_wmi crc32_pclmul asus_wmi sparse_keymap ghash_clmulni_intel mxm_wmi pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate snd_hda_intel
Jan 16 10:18:53 atlas kernel: snd_hda_codec snd_hda_core snd_hwdep intel_uncore snd_pcm snd_timer snd intel_rapl_perf iTCO_wdt mei_me joydev serio_raw pcspkr soundcore iTCO_vendor_support sg shpchp mei hci_uart btbcm btqca btintel battery wmi bluetooth rfkill intel_lpss_acpi intel_lpss mfd_core acpi_als tpm_tis evdev acpi_pad tpm_tis_core kfifo_buf industrialio tpm nf_conntrack msr zram zsmalloc ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c multipath linear raid1 raid0 md_mod sd_mod hid_generic usbhid ahci libahci crc32c_intel sata_sil24 i915 psmouse e1000e i2c_algo_bit ptp xhci_pci nvme pps_core i2c_i801 xhci_hcd drm_kms_helper nvme_core libata usbcore drm scsi_mod fan thermal video i2c_hid hid button fjes
Jan 16 10:18:53 atlas kernel: CPU: 0 PID: 4277 Comm: kworker/u16:0 Not tainted 4.10.0-rc3+ #1
Jan 16 10:18:53 atlas kernel: Hardware name: System manufacturer System Product Name/Z170-A, BIOS 2202 09/19/2016
Jan 16 10:18:53 atlas kernel: Workqueue: i915 intel_unpin_work_fn [i915]
Jan 16 10:18:53 atlas kernel: task: ffff9fabf3ab9e80 task.stack: ffffb8af60e24000
Jan 16 10:18:53 atlas kernel: RIP: 0010:intel_unpin_fb_obj+0x63/0xd0 [i915]
Jan 16 10:18:53 atlas kernel: RSP: 0018:ffffb8af60e27de8 EFLAGS: 00010246
Jan 16 10:18:53 atlas kernel: RAX: 0000000000000000 RBX: ffff9faa7f21f700 RCX: 0000000000000001
Jan 16 10:18:53 atlas kernel: RDX: ffffb8af60e27de8 RSI: ffff9fac54c03908 RDI: ffff9faa7f21f700
Jan 16 10:18:53 atlas kernel: RBP: ffff9fa94b96a500 R08: ffff9fa99f7a9f08 R09: 0000000000000002
Jan 16 10:18:53 atlas kernel: R10: 000000000000008a R11: 0000000000000075 R12: 0000000000000001
Jan 16 10:18:53 atlas kernel: R13: ffff9fac54c00000 R14: ffff9fac5b5e9c00 R15: ffff9fac54c00068
Jan 16 10:18:53 atlas kernel: FS: 0000000000000000(0000) GS:ffff9fac6ec00000(0000) knlGS:0000000000000000
Jan 16 10:18:53 atlas kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 10:18:53 atlas kernel: CR2: 0000000000000078 CR3: 0000000373809000 CR4: 00000000003406f0
Jan 16 10:18:53 atlas kernel: Call Trace:
Jan 16 10:18:53 atlas kernel: ? intel_unpin_work_fn+0x50/0x120 [i915]
Jan 16 10:18:53 atlas kernel: ? process_one_work+0x18e/0x440
Jan 16 10:18:53 atlas kernel: ? worker_thread+0x4a/0x480
Jan 16 10:18:53 atlas kernel: ? kthread+0xf4/0x130
Jan 16 10:18:53 atlas kernel: ? process_one_work+0x440/0x440
Jan 16 10:18:53 atlas kernel: ? kthread_create_on_node+0x60/0x60
Jan 16 10:18:53 atlas kernel: ? ret_from_fork+0x25/0x30
Jan 16 10:18:53 atlas kernel: Code: a9 fc ff ff ff 74 64 44 89 e2 48 89 ee 48 89 e7 e8 73 30 ff ff 48 8b 43 08 48 89 e2 48 89 df 48 8d b0 08 39 00 00 e8 bd 25 fc ff <48> 8b 50 78 48 85 d2 74 04 83 6a 20 01 48 89 c7 e8 c8 6b fc ff
Jan 16 10:18:53 atlas kernel: RIP: intel_unpin_fb_obj+0x63/0xd0 [i915] RSP: ffffb8af60e27de8
Jan 16 10:18:53 atlas kernel: CR2: 0000000000000078
Jan 16 10:18:53 atlas kernel: ---[ end trace 97044bd2bd6079bb ]---
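Reading this oops: the RIP field uses the kernel's symbol+offset/length notation, the bytes at the `<48>` marker are `48 8b 50 78`, which decode to `mov 0x78(%rax),%rdx`, and with RAX = 0 that load lands at CR2 = 0x78, the same NULL-vma signature discussed earlier in this report. A small C sketch of that check; `parse_rip` and `is_null_field_deref` are hypothetical helpers written for illustration, not part of any kernel tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split an oops location such as "intel_unpin_fb_obj+0x63/0xd0" into the
 * symbol name, the offset into the function, and the function's length.
 * Returns 1 on success, 0 if the string does not have that shape. */
static int parse_rip(const char *s, char sym[128],
                     unsigned long *off, unsigned long *len)
{
    /* %lx accepts an optional 0x prefix, as strtoul with base 16 does. */
    return sscanf(s, "%127[^+]+%lx/%lx", sym, off, len) == 3;
}

/* A load "mov disp(%base),%reg" reads base + disp. If base is zero and that
 * sum equals CR2, the fault is a NULL pointer dereference of the field at
 * offset disp. */
static int is_null_field_deref(unsigned long base, unsigned long disp,
                               unsigned long cr2)
{
    return base == 0 && base + disp == cr2;
}
```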
A workaround has been applied that will prevent the oops. The root cause is still unknown; hopefully we will stumble upon it soon enough.
commit be1e341513ca23b0668b7b0f26fa6e2ffc46ba20
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jan 16 15:21:27 2017 +0000
    drm/i915: Track pinned vma in intel_plane_state
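Judging only from the commit title, the idea appears to be to record the vma that was actually pinned in the plane state and unpin that exact vma later, rather than looking it up again from the object in the unpin worker (where the lookup came back NULL). A minimal sketch of that pattern, using hypothetical `struct vma` and `struct plane_state` types; this is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; NOT the i915 definitions. */
struct vma {
    unsigned int pin_count;
};

struct plane_state {
    struct vma *vma; /* vma pinned for this plane, or NULL if none */
};

/* Pin at commit time and remember exactly which vma was pinned. */
static void plane_pin_fb(struct plane_state *state, struct vma *vma)
{
    vma->pin_count++;
    state->vma = vma;
}

/* Unpin later (e.g. from a worker) using the pointer recorded at pin time,
 * so there is no second lookup that could come back NULL. */
static void plane_unpin_fb(struct plane_state *state)
{
    struct vma *vma = state->vma;

    if (!vma)
        return; /* nothing pinned for this plane */
    assert(vma->pin_count > 0);
    vma->pin_count--;
    state->vma = NULL;
}
```

Because the pointer is cleared after the first unpin, a repeated unpin of the same plane state is a no-op rather than a second decrement.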
Great! And just in time: my other computer (BDW) locked up with the same oops. I will test and report back after some time on whether it helps. Thanks!
I just tested and can confirm it's fixed in drm-intel-next. In case it helps: prior to the recent patches, I was able to reproduce this consistently by opening a video in vlc/mplayer/mpv and quickly clicking the video so that it rapidly flipped between fullscreen and windowed.
Is there a patch for 4.10-rc5? I'm seeing this intermittently, mostly at login/logoff, and it's a problem.
(In reply to Greg White from comment #11)
> Is there a patch for 4.10-rc5? I'm seeing this intermittently, mostly at
> login/logoff and it's a problem.
See Maarten's "Backport vma fixes for 4.10-rc6" patch: https://patchwork.freedesktop.org/series/18825/
Thanks!