Bug 93822 - Oops in the i915 driver with kernel 4.4
Summary: Oops in the i915 driver with kernel 4.4
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 92119
Depends on:
Blocks:
 
Reported: 2016-01-22 13:10 UTC by Zoltán Böszörményi
Modified: 2017-07-24 22:43 UTC
CC List: 5 users

See Also:
i915 platform: PNV
i915 features:


Description Zoltán Böszörményi 2016-01-22 13:10:02 UTC
With kernel 4.4, after an overnight uptime, we got this Oops on an Intel-based POS machine. The CPU is an Atom D525, and thermal throttling is only supported via p4_clockmod on this chip. The Oops happened around the time Ctrl-Alt-F2 was pressed to switch from X to the console. Xorg server 1.16.4 and xf86-video-intel 2.99.918-ish (git.fdo commit 627ef68a8cd7a51627d5b6a98cb0a5bdb1d9b534) were running on the system. The OS is a Yocto 1.6-based custom OpenEmbedded 64-bit build. The kernel was compiled with module signing support, but Yocto doesn't actually sign its kernel modules, hence the "E" taint flag.

Jan 19 03:52:33 chef01 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
Jan 19 03:52:36 chef01 kernel: IP: [<ffffffffa031810c>] intel_fb_obj_invalidate+0x1c/0x100 [i915]
Jan 19 03:52:36 chef01 kernel: PGD 74107067 PUD 741a2067 PMD 0 
Jan 19 03:52:36 chef01 kernel: Oops: 0000 [#1] SMP 
Jan 19 03:52:33 chef01 kernel: ------------[ cut here ]------------
Jan 19 03:52:33 chef01 kernel: WARNING: CPU: 1 PID: 313 at include/linux/kref.h:46 kref_get.part.9+0x1e/0x27 [drm]()
Jan 19 03:52:33 chef01 kernel: Modules linked in: binfmt_misc(E) cpufreq_ondemand(E) tun(E) egalax_ts_serial(E) i915(E) joydev(E) hid_generic(E) usbhid(E) snd_hda_codec_realtek(
Jan 19 03:52:33 chef01 kernel:  nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat(E) xt_connmark(E) nf_conntrack(E) ip6_tables(E) iptable_mangle(E)
Jan 19 03:52:33 chef01 kernel: CPU: 1 PID: 313 Comm: Xorg.bin Tainted: G            E   4.4.0 #1
Jan 19 03:52:33 chef01 kernel: Hardware name: SI SL20/SL20, BIOS 080016  04/01/2013
Jan 19 03:52:33 chef01 kernel:  ffffffffa0174c3f ffff8800741bb990 ffffffff8075359d 0000000000000000
Jan 19 03:52:33 chef01 kernel:  ffff8800741bb9c8 ffffffff8048fa08 ffff88007ba6d0c0 ffff88007b4d43c0
Jan 19 03:52:33 chef01 kernel:  ffff88007b4d43c0 ffff880056e8d000 ffff88007b8e9000 ffff8800741bb9d8
Jan 19 03:52:33 chef01 kernel: Call Trace:
Jan 19 03:52:33 chef01 kernel:  [<ffffffff8075359d>] dump_stack+0x44/0x57
Jan 19 03:52:33 chef01 kernel:  [<ffffffff8048fa08>] warn_slowpath_common+0x88/0xc0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff8048fafa>] warn_slowpath_null+0x1a/0x20
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa01657b3>] kref_get.part.9+0x1e/0x27 [drm]
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa0151601>] drm_framebuffer_reference+0x51/0x60 [drm]
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa0161ead>] drm_atomic_set_fb_for_plane+0x2d/0x90 [drm]
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa01d35e0>] __drm_atomic_helper_set_config+0xd0/0x3b0 [drm_kms_helper]
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa01d4506>] restore_fbdev_mode+0x1f6/0x260 [drm_kms_helper]
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa01d65e3>] drm_fb_helper_restore_fbdev_mode_unlocked+0x33/0x80 [drm_kms_helper]
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa01d665c>] drm_fb_helper_set_par+0x2c/0x50 [drm_kms_helper]
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa032153a>] intel_fbdev_set_par+0x1a/0x60 [i915]
Jan 19 03:52:33 chef01 kernel:  [<ffffffff807c18b1>] fb_set_var+0x191/0x400
Jan 19 03:52:33 chef01 kernel:  [<ffffffff804be317>] ? update_curr+0x67/0x130
Jan 19 03:52:33 chef01 kernel:  [<ffffffff804bcd8c>] ? __enqueue_entity+0x6c/0x70
Jan 19 03:52:33 chef01 kernel:  [<ffffffff804c13ca>] ? enqueue_entity+0x34a/0x900
Jan 19 03:52:33 chef01 kernel:  [<ffffffff807b88fc>] fbcon_blank+0x1bc/0x2b0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff808410ba>] do_unblank_screen+0xba/0x1c0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff8083774a>] complete_change_console+0x5a/0xe0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff808386ed>] vt_ioctl+0xf1d/0x10d0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff80595b82>] ? do_wp_page+0x1d2/0x4d0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff80586003>] ? __inc_zone_page_state+0x33/0x40
Jan 19 03:52:33 chef01 kernel:  [<ffffffff805cb005>] ? mem_cgroup_end_page_stat+0x5/0x50
Jan 19 03:52:33 chef01 kernel:  [<ffffffffa014b8b0>] ? drm_setmaster_ioctl+0xf0/0xf0 [drm]
Jan 19 03:52:33 chef01 kernel:  [<ffffffff8082bf34>] tty_ioctl+0x3d4/0xbc0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff80597995>] ? handle_mm_fault+0xca5/0x16f0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff806e6867>] ? selinux_file_ioctl+0x107/0x1d0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff805ee61d>] do_vfs_ioctl+0x2cd/0x4b0
Jan 19 03:52:33 chef01 kernel:  [<ffffffff806dff43>] ? security_file_ioctl+0x43/0x60
Jan 19 03:52:33 chef01 kernel:  [<ffffffff805ee879>] SyS_ioctl+0x79/0x90
Jan 19 03:52:33 chef01 kernel:  [<ffffffff80ab24f6>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 19 03:52:33 chef01 kernel: ---[ end trace 80bfc40970d49658 ]---
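The WARNING at include/linux/kref.h:46 comes from kref_get(). In kernels of this era, kref_get() warns when a reference count was already zero before being incremented. Here that would mean drm_framebuffer_reference() took a reference on a framebuffer that had already been released.

The NULL pointer dereference in intel_fb_obj_invalidate() reported alongside it is consistent with the driver then using that stale fbdev framebuffer. The report itself does not confirm this.

The C sketch below is a minimal user-space model of the kref check, to illustrate the failure mode. `struct model_kref` and the `model_kref_*` functions are hypothetical stand-ins, not the kernel's kref API. The model is simplified: for example, there is no release callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model of a kernel kref; not the kernel's API. */
struct model_kref {
    atomic_int refcount;
};

/* Like kref_init(): a new object starts with one reference. */
static void model_kref_init(struct model_kref *k)
{
    atomic_init(&k->refcount, 1);
}

/*
 * Models the check in kref_get(): the count after the increment must be at
 * least 2, otherwise it was 0 before, meaning the object had already been
 * released. Returns true when the kernel would have printed the WARNING.
 */
static bool model_kref_get(struct model_kref *k)
{
    int after = atomic_fetch_add(&k->refcount, 1) + 1;
    if (after < 2) {
        fprintf(stderr, "WARNING: reference taken on a released object\n");
        return true;
    }
    return false;
}

/*
 * Simplified kref_put(): returns true when the last reference was dropped,
 * i.e. when the real release function would free the object.
 */
static bool model_kref_put(struct model_kref *k)
{
    int before = atomic_fetch_sub(&k->refcount, 1);
    assert(before > 0); /* a put without a matching reference would underflow */
    return before == 1;
}
```

The following sequence reproduces the condition that triggers the warning: initialise, drop the only reference with model_kref_put(), then call model_kref_get() again.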
Comment 1 Chris Wilson 2016-01-22 13:18:12 UTC
I hope

commit 0c82312f3f15538f4e6ceda2a82caee8fbac4501
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Dec 4 16:05:26 2015 +0000

    drm/i915: Pin the ifbdev for the info->system_base GGTT mmapping

helps. Hmm, that patch lost the stable tags.
Comment 2 Zoltán Böszörményi 2016-01-22 13:51:30 UTC
I found your commit at https://patchwork.freedesktop.org/patch/67152/
Unfortunately, it does not apply cleanly to kernel 4.4 final.
Could you backport it, please?
Comment 3 Jani Nikula 2016-01-22 14:40:00 UTC
(In reply to Zoltán Böszörményi from comment #2)
> I found your commit at https://patchwork.freedesktop.org/patch/67152/
> Unfortunately, it does not apply cleanly to kernel 4.4 final.
> Could you backport it, please?

Please try Linus' current master; it's there.
Comment 4 Zoltán Böszörményi 2016-01-22 14:49:49 UTC
I will test it. Will these i915 changes get into 4.4.x?
Comment 5 Zoltán Böszörményi 2016-01-25 13:54:36 UTC
Kernel 4.5.0-rc1 seems to solve this problem.
The machine survives heavy switching back and forth between X and the console.
I will leave it running overnight to see whether it survives a few hours of uptime.
Comment 6 Zoltán Böszörményi 2016-01-26 07:36:18 UTC
The machine survived the night and I can still switch between X and console.

When will this code go to a stable release? Can we expect it in 4.4.1?
Comment 7 Jiri Slaby 2016-02-01 15:19:10 UTC
*** Bug 92119 has been marked as a duplicate of this bug. ***
Comment 8 Manuel Krause 2016-02-20 14:02:53 UTC
Is it possible for the developers to backport this and related fixes to 4.4?

I've tried backporting this one myself (without comprehensive programming knowledge). It works to some extent but doesn't survive a resume from disk. Most likely other issues are involved, and there may be fixes for those that I haven't found so far.

To be honest, in its current shape as of 4.4.2, i915 is not really usable.

Best regards and thank you for your work,
Manuel Krause
Comment 9 Pacho Ramos 2016-05-21 11:12:18 UTC
There are still people suffering from this bug downstream on Gentoo, even with kernel 4.4.10. It would be good to get the patch backported to this LTS branch. I have looked to see whether other distributions have already backported it, but I didn't find it :(

Thanks a lot
Comment 10 Jani Nikula 2016-05-23 06:50:21 UTC
From upstream perspective this is fixed. Closing as such.

Unfortunately, the commit doesn't backport cleanly to v4.4 or earlier. Chris, if the backport is trivial, please provide a patch for it.

