With kernel 4.4, after the machine had been up overnight, we got this Oops on an Intel-based POS machine. The CPU is an Atom D525, and thermal throttling is only supported via p4_clockmod on this chip. The Oops happened around the time of pressing Ctrl-Alt-F2 to switch from X to the console. Xorg server 1.16.4 and xf86-video-intel 2.99.918-ish (git.fdo commit 627ef68a8cd7a51627d5b6a98cb0a5bdb1d9b534) were running on the system. The OS is a Yocto 1.6-based custom OpenEmbedded 64-bit build. The kernel was compiled with module signing support, but Yocto doesn't actually sign its kernel modules, hence the "E" taint flag.

Jan 19 03:52:33 chef01 kernel: ------------[ cut here ]------------
Jan 19 03:52:33 chef01 kernel: WARNING: CPU: 1 PID: 313 at include/linux/kref.h:46 kref_get.part.9+0x1e/0x27 [drm]()
Jan 19 03:52:33 chef01 kernel: Modules linked in: binfmt_misc(E) cpufreq_ondemand(E) tun(E) egalax_ts_serial(E) i915(E) joydev(E) hid_generic(E) usbhid(E) snd_hda_codec_realtek(
Jan 19 03:52:33 chef01 kernel: nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat(E) xt_connmark(E) nf_conntrack(E) ip6_tables(E) iptable_mangle(E)
Jan 19 03:52:33 chef01 kernel: CPU: 1 PID: 313 Comm: Xorg.bin Tainted: G E 4.4.0 #1
Jan 19 03:52:33 chef01 kernel: Hardware name: SI SL20/SL20, BIOS 080016 04/01/2013
Jan 19 03:52:33 chef01 kernel: ffffffffa0174c3f ffff8800741bb990 ffffffff8075359d 0000000000000000
Jan 19 03:52:33 chef01 kernel: ffff8800741bb9c8 ffffffff8048fa08 ffff88007ba6d0c0 ffff88007b4d43c0
Jan 19 03:52:33 chef01 kernel: ffff88007b4d43c0 ffff880056e8d000 ffff88007b8e9000 ffff8800741bb9d8
Jan 19 03:52:33 chef01 kernel: Call Trace:
Jan 19 03:52:33 chef01 kernel: [<ffffffff8075359d>] dump_stack+0x44/0x57
Jan 19 03:52:33 chef01 kernel: [<ffffffff8048fa08>] warn_slowpath_common+0x88/0xc0
Jan 19 03:52:33 chef01 kernel: [<ffffffff8048fafa>] warn_slowpath_null+0x1a/0x20
Jan 19 03:52:33 chef01 kernel: [<ffffffffa01657b3>] kref_get.part.9+0x1e/0x27 [drm]
Jan 19 03:52:33 chef01 kernel: [<ffffffffa0151601>] drm_framebuffer_reference+0x51/0x60 [drm]
Jan 19 03:52:33 chef01 kernel: [<ffffffffa0161ead>] drm_atomic_set_fb_for_plane+0x2d/0x90 [drm]
Jan 19 03:52:33 chef01 kernel: [<ffffffffa01d35e0>] __drm_atomic_helper_set_config+0xd0/0x3b0 [drm_kms_helper]
Jan 19 03:52:33 chef01 kernel: [<ffffffffa01d4506>] restore_fbdev_mode+0x1f6/0x260 [drm_kms_helper]
Jan 19 03:52:33 chef01 kernel: [<ffffffffa01d65e3>] drm_fb_helper_restore_fbdev_mode_unlocked+0x33/0x80 [drm_kms_helper]
Jan 19 03:52:33 chef01 kernel: [<ffffffffa01d665c>] drm_fb_helper_set_par+0x2c/0x50 [drm_kms_helper]
Jan 19 03:52:33 chef01 kernel: [<ffffffffa032153a>] intel_fbdev_set_par+0x1a/0x60 [i915]
Jan 19 03:52:33 chef01 kernel: [<ffffffff807c18b1>] fb_set_var+0x191/0x400
Jan 19 03:52:33 chef01 kernel: [<ffffffff804be317>] ? update_curr+0x67/0x130
Jan 19 03:52:33 chef01 kernel: [<ffffffff804bcd8c>] ? __enqueue_entity+0x6c/0x70
Jan 19 03:52:33 chef01 kernel: [<ffffffff804c13ca>] ? enqueue_entity+0x34a/0x900
Jan 19 03:52:33 chef01 kernel: [<ffffffff807b88fc>] fbcon_blank+0x1bc/0x2b0
Jan 19 03:52:33 chef01 kernel: [<ffffffff808410ba>] do_unblank_screen+0xba/0x1c0
Jan 19 03:52:33 chef01 kernel: [<ffffffff8083774a>] complete_change_console+0x5a/0xe0
Jan 19 03:52:33 chef01 kernel: [<ffffffff808386ed>] vt_ioctl+0xf1d/0x10d0
Jan 19 03:52:33 chef01 kernel: [<ffffffff80595b82>] ? do_wp_page+0x1d2/0x4d0
Jan 19 03:52:33 chef01 kernel: [<ffffffff80586003>] ? __inc_zone_page_state+0x33/0x40
Jan 19 03:52:33 chef01 kernel: [<ffffffff805cb005>] ? mem_cgroup_end_page_stat+0x5/0x50
Jan 19 03:52:33 chef01 kernel: [<ffffffffa014b8b0>] ? drm_setmaster_ioctl+0xf0/0xf0 [drm]
Jan 19 03:52:33 chef01 kernel: [<ffffffff8082bf34>] tty_ioctl+0x3d4/0xbc0
Jan 19 03:52:33 chef01 kernel: [<ffffffff80597995>] ? handle_mm_fault+0xca5/0x16f0
Jan 19 03:52:33 chef01 kernel: [<ffffffff806e6867>] ? selinux_file_ioctl+0x107/0x1d0
Jan 19 03:52:33 chef01 kernel: [<ffffffff805ee61d>] do_vfs_ioctl+0x2cd/0x4b0
Jan 19 03:52:33 chef01 kernel: [<ffffffff806dff43>] ? security_file_ioctl+0x43/0x60
Jan 19 03:52:33 chef01 kernel: [<ffffffff805ee879>] SyS_ioctl+0x79/0x90
Jan 19 03:52:33 chef01 kernel: [<ffffffff80ab24f6>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 19 03:52:33 chef01 kernel: ---[ end trace 80bfc40970d49658 ]---
Jan 19 03:52:33 chef01 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
Jan 19 03:52:36 chef01 kernel: IP: [<ffffffffa031810c>] intel_fb_obj_invalidate+0x1c/0x100 [i915]
Jan 19 03:52:36 chef01 kernel: PGD 74107067 PUD 741a2067 PMD 0
Jan 19 03:52:36 chef01 kernel: Oops: 0000 [#1] SMP
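For readers unfamiliar with the WARNING at include/linux/kref.h:46: as far as I can tell from the 4.4 sources, that line is the check in kref_get() that complains when the reference count was already zero before the increment, i.e. someone took a reference on an object that had already been released. The trace shows this happening for the framebuffer that restore_fbdev_mode() tries to reference. The following is only a userspace sketch of that pattern; struct toy_ref and the toy_* and demo_stale_get names are hypothetical stand-ins, not kernel code, and the sequence in demo_stale_get() is an assumed illustration rather than the confirmed cause of this Oops.

```c
#include <stdio.h>

/* Toy stand-in for struct kref; NOT the kernel type. */
struct toy_ref {
    int count;
};

/* Number of times toy_get() was called on an already-dead object. */
static int toy_warnings;

/* A freshly created object starts with one reference. */
static void toy_init(struct toy_ref *r)
{
    r->count = 1;
}

/* Increment, and complain if the count was 0 before the increment
 * (my reading of what the kref_get() check does). */
static void toy_get(struct toy_ref *r)
{
    if (++r->count < 2) {
        toy_warnings++;
        fprintf(stderr, "WARNING: get on object with zero refcount\n");
    }
}

/* Drop a reference; returns 1 when the last reference went away. */
static int toy_put(struct toy_ref *r)
{
    return --r->count == 0;
}

/* Assumed sequence: the last reference is dropped while a stale
 * pointer is still kept around, and that pointer is used again. */
static int demo_stale_get(void)
{
    struct toy_ref fb;

    toy_warnings = 0;
    toy_init(&fb);   /* count = 1 */
    toy_get(&fb);    /* count = 2, no warning */
    toy_put(&fb);    /* count = 1 */
    toy_put(&fb);    /* count = 0: a real object would be freed here */
    toy_get(&fb);    /* count 0 -> 1: this is the case that warns */
    return toy_warnings;
}
```

Calling demo_stale_get() from a small main() prints the warning once. In the kernel the object would already have been freed at that point, which would be consistent with the NULL pointer dereference that follows the WARNING in the log.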
I hope

commit 0c82312f3f15538f4e6ceda2a82caee8fbac4501
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Dec 4 16:05:26 2015 +0000

    drm/i915: Pin the ifbdev for the info->system_base GGTT mmapping

helps.

Hmm, that patch lost the stable tags.
I found your commit at https://patchwork.freedesktop.org/patch/67152/
Unfortunately, it does not apply cleanly to the final 4.4 kernel.
Could you backport it, please?
(In reply to Zoltán Böszörményi from comment #2)
> I found your commit ID as https://patchwork.freedesktop.org/patch/67152/
> Unfortunately, it does not apply cleanly to kernel 4.4 final.
> Can you backport it please?

Please try current Linus' master, it's there.
I will test it. Will these i915 changes get into 4.4.x?
Kernel 4.5.0-rc1 seems to solve this problem. The machine survives heavy switching back and forth between X and the console. I will leave it running overnight to see whether it survives a few hours of uptime.
The machine survived the night and I can still switch between X and console. When will this code go to a stable release? Can we expect it in 4.4.1?
*** Bug 92119 has been marked as a duplicate of this bug. ***
Is it possible for the developers to backport this and related fixes to 4.4?

I've tried to backport this one on my own (without comprehensive programming knowledge). It works to some extent, but doesn't survive a resume from disk, so most likely other issues are involved, and there may be fixes for those that I haven't found so far.

To be honest, i915 in its current shape as of 4.4.2 is not really usable.

Best regards and thank you for your work,
Manuel Krause
There are still people suffering from this bug downstream on Gentoo, even with kernel 4.4.10. It would be great to get the patch backported to this LTS branch. I have checked whether other distributions have already backported it, but I didn't find anything :(

Thanks a lot
From upstream perspective this is fixed. Closing as such. Unfortunately, the commit doesn't backport cleanly to v4.4 or earlier. Chris, if the backport is trivial, please provide a patch for it.