Bug 100111 - [SKL] i915_gem_object_sync "unable to handle kernel NULL pointer dereference" and system lockup returning from suspend
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2017-03-08 11:46 UTC by Chris Down
Modified: 2017-03-24 03:02 UTC
CC List: 2 users

i915 platform: SKL

Description Chris Down 2017-03-08 11:46:38 UTC
I sporadically get complete system lockups with the following stack on kernel 4.10 when trying to return from suspend:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 2fd54c067 PUD 2fd6bd067 PMD 0 
Oops: 0010 [#1] SMP 
Modules linked in: tun ctr ccm fuse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus iTCO_wdt dw_dmac_core iTCO_vendor_support intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel arc4 kvm irqbypass joydev mousedev iwlmvm mac80211 snd_hda_intel snd_hda_codec e1000e iwlwifi psmouse evdev input_leds snd_hda_core mac_hid snd_hwdep snd_pcm ptp pps_core i2c_i801 i915 cfg80211 snd_timer rtsx_pci_ms memstick shpchp drm_kms_helper drm intel_gtt syscopyarea sysfillrect sysimgblt mei_me fb_sys_fops mei i2c_algo_bit thinkpad_acpi nvram snd soundcore led_class thermal rfkill battery ac video fjes wmi tpm_tis tpm button processor sch_fq_codel
 vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) netatop(O) ip_tables x_tables ext4 crc16 mbcache jbd2 jitterentropy_rng sha256_ssse3 sha256_generic hmac drbg ansi_cprng algif_skcipher af_alg hid_multitouch hid_generic usbhid hid dm_crypt dm_mod sd_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd xhci_pci xhci_hcd ahci usbcore libahci libata rtsx_pci scsi_mod usb_common i8042 serio
CPU: 3 PID: 6475 Comm: firefox Tainted: G     U  W  O    4.4.52-1-lts44 #1
Hardware name: LENOVO 20FAS22V05/20FAS22V05, BIOS N1CET43W (1.11 ) 04/14/2016
task: ffff8802937c1d80 ti: ffff8802e54bc000 task.ti: ffff8802e54bc000
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff8802e54bfb00  EFLAGS: 00010282
RAX: ffff8802e54bfbc8 RBX: ffff8800b41983c0 RCX: 000000000004e545
RDX: 000000000004e545 RSI: ffff88030f8920d8 RDI: ffff8800b4198540
RBP: ffff8802e54bfb88 R08: ffff88030e6d9498 R09: 0000000000000000
R10: 00000000000000a0 R11: ffff8800b4198540 R12: 0000000000000001
R13: ffff88030f894568 R14: ffff88030f8920d8 R15: 0000000000000000
FS:  00007fc5702e0740(0000) GS:ffff8803214c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000002e5588000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffffffffa0667b19 0000000000004000 ffff88030f8920dc 0004e545810654a7
 ffff8802e54bfbc8 ffff8800b45c9680 ffff8800b41983c0 ffff88030f890000
 ffff8802d1ea4480 ffff88030f894578 ffff88030f890000 0000000088b36159
Call Trace:
 [<ffffffffa0667b19>] ? i915_gem_object_sync+0x1a9/0x310 [i915]
 [<ffffffffa067ac3d>] intel_execlists_submission+0x1cd/0x420 [i915]
 [<ffffffffa065a804>] i915_gem_do_execbuffer.isra.13+0x1374/0x1410 [i915]
 [<ffffffff8116b52c>] ? __alloc_pages_nodemask+0x14c/0xaa0
 [<ffffffff8109ef9d>] ? ttwu_do_activate.constprop.35+0x5d/0x70
 [<ffffffffa065b5b4>] i915_gem_execbuffer2+0xd4/0x240 [i915]
 [<ffffffffa04d7752>] drm_ioctl+0x152/0x540 [drm]
 [<ffffffffa065b4e0>] ? i915_gem_execbuffer+0x320/0x320 [i915]
 [<ffffffff811ef379>] do_vfs_ioctl+0x2a9/0x4b0
 [<ffffffff81064c73>] ? __do_page_fault+0x193/0x420
 [<ffffffff811ef5f9>] SyS_ioctl+0x79/0x90
 [<ffffffff8158fbae>] entry_SYSCALL_64_fastpath+0x12/0x6d
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff8802e54bfb00>
CR2: 0000000000000000
---[ end trace 58fb3e8de972d5d8 ]---

To try to mitigate this, I tried disabling execlist support with i915.enable_execlists=0 on the kernel command line, but it seems to have no effect: sysfs still reports the parameter as 1.

% sudo cat /sys/module/i915/parameters/enable_execlists
1
% grep -o 'i915.\S\+' /proc/cmdline
i915.enable_execlists=0
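The two commands above can be combined into a small check that flags when a parameter passed on the kernel command line differs from what the loaded module reports. This is a minimal POSIX sh sketch; the function name check_param and the CMDLINE/SYSFS_MODULE override variables are hypothetical, introduced only so the check can be pointed at copies of /proc/cmdline and /sys/module for testing. It only detects the mismatch and does not say why the driver ended up with a different value.

```shell
#!/bin/sh
# Sketch: compare a module parameter requested on the kernel command line
# with the value the loaded module reports in sysfs.
# CMDLINE and SYSFS_MODULE are hypothetical overrides (not kernel interfaces);
# they default to the real locations used in the commands above.
CMDLINE=${CMDLINE:-/proc/cmdline}
SYSFS_MODULE=${SYSFS_MODULE:-/sys/module}

# check_param MODULE PARAM
# Prints one status line; returns 1 only if requested and effective differ.
check_param() {
    module=$1
    param=$2
    # Split the command line on spaces; if the parameter appears more than
    # once, this sketch keeps the last occurrence.
    requested=$(tr ' ' '\n' < "$CMDLINE" | sed -n "s/^$module\.$param=//p" | tail -n 1)
    # Value actually in effect for the loaded module; empty if unreadable.
    effective=$(cat "$SYSFS_MODULE/$module/parameters/$param" 2>/dev/null || true)
    if [ -z "$requested" ]; then
        echo "$module.$param: not on command line (effective: ${effective:-unknown})"
    elif [ "$requested" = "$effective" ]; then
        echo "$module.$param: requested $requested, effective $effective"
    else
        echo "$module.$param: MISMATCH requested $requested, effective ${effective:-unknown}"
        return 1
    fi
}
```

On the system above, running check_param i915 enable_execlists would print a MISMATCH line and return 1, because the command line requests 0 while sysfs reports 1. Since the sketch splits on spaces, it does not handle quoted parameter values that contain spaces.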

This is with the following hardware (a Lenovo T460s):

00:00.0 0600: 8086:1904 (rev 08)
00:02.0 0300: 8086:1916 (rev 07)
00:08.0 0880: 8086:1911
00:14.0 0c03: 8086:9d2f (rev 21)
00:14.2 1180: 8086:9d31 (rev 21)
00:16.0 0780: 8086:9d3a (rev 21)
00:16.3 0700: 8086:9d3d (rev 21)
00:17.0 0106: 8086:9d03 (rev 21)
00:1c.0 0604: 8086:9d10 (rev f1)
00:1c.2 0604: 8086:9d12 (rev f1)
00:1f.0 0601: 8086:9d48 (rev 21)
00:1f.2 0580: 8086:9d21 (rev 21)
00:1f.3 0403: 8086:9d70 (rev 21)
00:1f.4 0c05: 8086:9d23 (rev 21)
00:1f.6 0200: 8086:156f (rev 21)
02:00.0 ff00: 10ec:522a (rev 01)
04:00.0 0280: 8086:24f3 (rev 3a)

vendor_id	: GenuineIntel
cpu family	: 6
model		: 78
model name	: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping	: 3
microcode	: 0x88
Comment 1 Chris Down 2017-03-08 11:48:19 UTC
(Ah, this stack is from when I was testing the 4.4 LTS kernel to see whether it had the same issue, hence the kernel version listed; the lockup also happens on 4.10.)
Comment 2 Chris Wilson 2017-03-08 11:50:42 UTC
Kernel v4.4; please retest with an upstream kernel, preferably https://cgit.freedesktop.org/drm-tip
Comment 3 Chris Wilson 2017-03-08 11:51:27 UTC
(In reply to Chris Down from comment #1)
> (ah, this stack is from when I was testing 4.4 LTS kernel to see if it had
> the same issue, hence the kernel version listed, but this also happens on
> 4.10).

The stacktrace is meaningless for recent kernels.
Comment 4 Chris Down 2017-03-08 15:20:55 UTC
Sure thing, I'll test with drm-tip/6f8585956c95 over the coming days.
Comment 5 Ricardo 2017-03-08 21:16:38 UTC
Thanks, Chris Down. When you attach the new logs, please also change the status of the bug to REOPENED.
Comment 6 Chris Down 2017-03-20 18:07:39 UTC
Unable to reproduce with drm-tip after more than a week, so I assume this was fixed somewhere between 4.10 and drm-tip. Thanks!
Comment 7 Francecso 2017-03-24 03:02:31 UTC
I got the exact same issue on kernel 4.10.4-1-ARCH:

- BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
- IP: gen8_ppgtt_alloc_page_directories.isra.14+0x11f/0x270 [i915]
- RIP: gen8_ppgtt_alloc_page_directories.isra.14+0x11f/0x270 [i915] RSP: ffffc90004727890

