Bug 100516 - [i915][SKL] wayland desktop crash from i915 driver bug: gen8_ppgtt_alloc_page_directories
Summary: [i915][SKL] wayland desktop crash from i915 driver bug: gen8_ppgtt_alloc_page...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-31 15:07 UTC by aappddeevv
Modified: 2017-07-24 23:15 UTC
CC List: 7 users

See Also:
i915 platform: SKL
i915 features:



Description aappddeevv 2017-03-31 15:07:29 UTC
I'm running kernel 4.11-rc4 with Fedora 25 on an XPS 9350 connected to a TP15 dock with 2 external monitors. I've seen this error on 4.10 as well.

Occasionally, about once or twice a day, I hit the following kernel oops. It crashes my Wayland session and I have to reboot. The underlying OS is still running, as I can log in from another machine.


Mar 31 10:47:03 nc6910p kernel: ---[ end trace 7191751e8c8925ea ]---
Mar 31 10:47:03 nc6910p kernel: CR2: 0000000000000018
Mar 31 10:47:03 nc6910p kernel: RIP: gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250 [i915] RSP: ffffbf0085d8b878
Mar 31 10:47:03 nc6910p kernel: Code: e6 48 8b 90 28 03 00 00 48 8b b8 e0 02 00 00 48 8b 52 08 48 83 ca 03 e8 4a d0 ff ff 48 8b 45 b0 48 8b 4d c8 48 8b 10 48 8b 45 d0 <4c> 89 24 ca 48 0f ab 08 0f 1f 44 00 00 e9 53 ff ff ff 65 8b 0
Mar 31 10:47:03 nc6910p kernel: R13: 0000000000000010 R14: 0000000000000000 R15: 0000000000000000
Mar 31 10:47:03 nc6910p kernel: R10: 0000000000000050 R11: 0000000000000246 R12: 00000000c0406469
Mar 31 10:47:03 nc6910p kernel: RBP: 00007ffc29523620 R08: 0000000000000000 R09: 0000000000000000
Mar 31 10:47:03 nc6910p kernel: RDX: 00007ffc29523620 RSI: 00000000c0406469 RDI: 0000000000000010
Mar 31 10:47:03 nc6910p kernel: RAX: ffffffffffffffda RBX: 00001e30454f3000 RCX: 00007fa9fd3ed787
Mar 31 10:47:03 nc6910p kernel: RSP: 002b:00007ffc295235d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 31 10:47:03 nc6910p kernel: RIP: 0033:0x7fa9fd3ed787
Mar 31 10:47:03 nc6910p kernel:  entry_SYSCALL64_slow_path+0x25/0x25
Mar 31 10:47:03 nc6910p kernel:  do_syscall_64+0x67/0x180
Mar 31 10:47:03 nc6910p kernel:  SyS_ioctl+0x79/0x90
Mar 31 10:47:03 nc6910p kernel:  do_vfs_ioctl+0xa3/0x5f0
Mar 31 10:47:03 nc6910p kernel:  ? i915_gem_execbuffer+0x310/0x310 [i915]
Mar 31 10:47:03 nc6910p kernel:  ? seccomp_run_filters+0x52/0xc0
Mar 31 10:47:03 nc6910p kernel:  drm_ioctl+0x209/0x4c0 [drm]
Mar 31 10:47:03 nc6910p kernel:  i915_gem_execbuffer2+0xc5/0x240 [i915]
Mar 31 10:47:03 nc6910p kernel:  i915_gem_do_execbuffer.isra.36+0x4ec/0x1650 [i915]
Mar 31 10:47:03 nc6910p kernel:  i915_gem_execbuffer_reserve.isra.30+0x457/0x490 [i915]
Mar 31 10:47:03 nc6910p kernel:  i915_gem_execbuffer_reserve_vma.isra.29+0x14d/0x1b0 [i915]
Mar 31 10:47:03 nc6910p kernel:  __i915_vma_do_pin+0x3a3/0x460 [i915]
Mar 31 10:47:03 nc6910p kernel:  i915_vma_bind+0x81/0x170 [i915]
Mar 31 10:47:03 nc6910p kernel:  gen8_alloc_va_range+0x25b/0x410 [i915]
Mar 31 10:47:03 nc6910p kernel:  ? add_hole+0xf0/0x110 [drm]
Mar 31 10:47:03 nc6910p kernel:  ? pick_next_task_fair+0x398/0x550
Mar 31 10:47:03 nc6910p kernel:  ? sched_clock+0x9/0x10
Mar 31 10:47:03 nc6910p kernel:  gen8_alloc_va_range_3lvl+0xd4/0x920 [i915]
Mar 31 10:47:03 nc6910p kernel: Call Trace:
Mar 31 10:47:03 nc6910p kernel: CR2: 0000000000000018 CR3: 0000000220044000 CR4: 00000000003406e0
Mar 31 10:47:03 nc6910p kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 31 10:47:03 nc6910p kernel: FS:  00007faa03d64f80(0000) GS:ffff9ee4bed00000(0000) knlGS:0000000000000000
Mar 31 10:47:03 nc6910p kernel: R13: ffff9ee1637a4d10 R14: 00000000fffef000 R15: 0000000000008000
Mar 31 10:47:03 nc6910p kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9ee390806000
Mar 31 10:47:03 nc6910p kernel: RBP: ffffbf0085d8b8d0 R08: 0000000000000000 R09: 0000000000000000
Mar 31 10:47:03 nc6910p kernel: RDX: 0000000000000000 RSI: ffff9ee1d1a74000 RDI: ffff9ee4a73b8000
Mar 31 10:47:03 nc6910p kernel: RAX: ffff9ee43fc76dc0 RBX: 0000000000000003 RCX: 0000000000000003
Mar 31 10:47:03 nc6910p kernel: RSP: 0018:ffffbf0085d8b878 EFLAGS: 00010246
Mar 31 10:47:03 nc6910p kernel: RIP: 0010:gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250 [i915]
Mar 31 10:47:03 nc6910p kernel: task: ffff9ee26905cc00 task.stack: ffffbf0085d88000
Mar 31 10:47:03 nc6910p kernel: Hardware name: Dell Inc. XPS 13 9350/0H67KH, BIOS 1.4.14 02/08/2017
Mar 31 10:47:03 nc6910p kernel: CPU: 2 PID: 24710 Comm: chrome Tainted: G     U     OE   4.11.0-0.rc4.git0.2.local.fc27.x86_64 #1
Mar 31 10:47:03 nc6910p kernel:  int3403_thermal intel_hid intel_lpss int340x_thermal_zone sparse_keymap int3400_thermal acpi_pad acpi_thermal_rel acpi_als kfifo_buf industrialio tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl l
Mar 31 10:47:03 nc6910p kernel:  vfat fat dell_led snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core snd_hda_codec_realtek snd_hda_codec_generic snd_compress snd_pcm_dmaeng
Mar 31 10:47:03 nc6910p kernel: Modules linked in: ccm arc4 mac80211 cfg80211 cdc_ether usbnet snd_usb_audio snd_usbmidi_lib snd_rawmidi r8152 mii rfcomm nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJE
Mar 31 10:47:03 nc6910p kernel: Oops: 0002 [#1] SMP
Mar 31 10:47:03 nc6910p kernel: 
Mar 31 10:47:03 nc6910p kernel: PGD 0 
Mar 31 10:47:03 nc6910p kernel: IP: gen8_ppgtt_alloc_page_directories.isra.38+0x115/0x250 [i915]
Mar 31 10:47:03 nc6910p kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
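
A note on reading the oops above (the log is listed newest-first). The byte marked <4c> in the Code: line starts the faulting instruction; the bytes 4c 89 24 ca decode on x86-64 to mov %r12,(%rdx,%rcx,8), a 64-bit store, which matches "Oops: 0002" (a kernel-mode write to a non-present page). The sketch below recomputes the store's target address from the kernel-mode register dump. The helper name effective_address is made up for illustration. It only shows that a zero base plus index 3 yields the reported CR2; the trace itself does not say which driver pointer was NULL.

```python
# Register values copied from the kernel-mode register dump in the oops.
RDX = 0x0000000000000000  # base register of the faulting store
RCX = 0x0000000000000003  # index register of the faulting store
CR2 = 0x0000000000000018  # faulting address reported by the CPU


def effective_address(base: int, index: int, scale: int, disp: int = 0) -> int:
    """Compute an x86-64 base + index*scale + disp address (hypothetical helper)."""
    return (base + index * scale + disp) & 0xFFFFFFFFFFFFFFFF


# mov %r12,(%rdx,%rcx,8) stores to RDX + RCX * 8.
fault = effective_address(RDX, RCX, 8)
print(hex(fault), "matches CR2" if fault == CR2 else "does not match CR2")
```

The result is consistent with a store into slot 3 of an array of 8-byte entries whose base pointer was NULL.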
Comment 1 aappddeevv 2017-03-31 15:11:02 UTC
The crash happens when moving the mouse or opening a window, say in Chrome. It appears to be random other than the mouse/window/popup action.
Comment 2 Chris Wilson 2017-03-31 15:22:19 UTC
commit e2b763caa6eb68ea56918ee6f79b40b82bdcf7c9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 15 08:43:48 2017 +0000

    drm/i915: Remove bitmap tracking for used-pdpes

That commit is too big for stable and is queued for 4.12.
Comment 3 rockorequin 2017-05-02 03:27:19 UTC
So, because the fix is too big for "stable" kernels, kernels 4.10 and 4.11 are unstable for any PC running Intel graphics? I've seen this happen under X, so it's not just Wayland sessions that crash with this bug.

Is the fix in drm-intel-nightly?
Comment 4 aappddeevv 2017-05-02 12:21:45 UTC
I am running drm-tip, which does not generate this error; it generates another error that seems similar in that it deals with allocation. The devs thought it was a userspace issue with, say, gnome-shell, but I'm not so sure.
Comment 5 Arnaud Kleinveld 2017-05-05 02:57:27 UTC
I am running Fedora 25 with 4.10.13, and Xorg crashes once or twice a day with the following message:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: gen8_ppgtt_alloc_page_directories.isra.36+0x115/0x250 [i915]

I guess the fix didn't make it into 4.10.12?
Comment 6 Benjamin Herrenschmidt 2017-05-17 04:25:42 UTC
Chris, I wonder how you guys find it "ok" to have two major kernel versions (4.10 and 4.11) lock up on users across the board with no intent to backport the fix?

This is hitting *all* the laptops here in ozlabs since the distros have been updating to 4.10. We can't get more than about a day of uptime.

This really needs a workaround of some sort in 4.10 and 4.11.

