Bug 104436 - Regression in kernel 4.15-rc1, GPF and display locks up. Still in rc6
Summary: Regression in kernel 4.15-rc1, GPF and display locks up. Still in rc6
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 104743
Depends on:
Blocks:
 
Reported: 2018-01-01 14:15 UTC by m.keehan
Modified: 2018-03-02 16:05 UTC
CC List: 3 users

See Also:
i915 platform: KBL
i915 features:


Attachments
Full dmesg from boot to error report, with drm.debug=0x1e set. (30.13 KB, application/x-bzip)
2018-01-01 14:15 UTC, m.keehan

Description m.keehan 2018-01-01 14:15:26 UTC
Created attachment 136469
Full dmesg from boot to error report, with drm.debug=0x1e set.

I am getting a General Protection Fault on 4.15-rc1, and the X11 screen locks up within minutes of login, at every boot. The fault is still reproducible on every boot with 4.15-rc6. Previous kernels have all been fine.

file <kernel>: Linux kernel x86 boot executable bzImage, version 4.15.0-rc6 (mkeehan@babelfish) #11 SMP PREEMPT Mon Jan 1 13:11:17 GMT 2018, RO-rootFS, swap_dev 0x4, Normal VGA
I use the Arch distribution, which is up to date as of today, Jan 1, 2018.

I use xf86-video-intel with the i915 driver. I haven't tried booting with nomodeset yet.

I tried bisecting rc1, but found that some kernels marked "good" during the bisection would hang after a few hours of use, whereas rc1 itself hangs within minutes of booting.

The file /sys/class/drm/card0/error was empty. I read it by logging into the locked-up laptop over ssh.

There is also an NVIDIA chip in this Dell XPS 15 laptop, but I don't build the nouveau driver into the kernel, nor do I use the proprietary NVIDIA driver. There are no BIOS options for selecting or disabling the video card.
Comment 1 m.keehan 2018-01-01 16:23:39 UTC
I have been running 4.15-rc6 for a few hours now with nomodeset and the i915 driver removed, and the hang no longer occurs.

(Normally I use the i915 driver because nomodeset causes video tearing, especially when playing movies.)
Comment 2 Elizabeth 2018-01-03 20:31:30 UTC
Hello M, still no luck with the bisection? 

Jan 01 13:40:15 babelfish kernel: general protection fault: 0000 [#1] PREEMPT SMP
Jan 01 13:40:15 babelfish kernel: Modules linked in: ccm hid_logitech_hidpp hid_logitech_dj fuse iptable_raw usbhid iptable_mangle iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter arc4 mousedev hid_multitouch snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt i2c_designware_platform iTCO_vendor_support rtsx_pci_sdmmc i2c_designware_core mmc_core rtsx_pci_ms input_leds memstick intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp dell_laptop kvm_intel nls_iso8859_1 led_class dell_smbios_smm kvm nls_cp437 dcdbas irqbypass vfat crc32_pclmul crc32c_intel fat dell_smm_hwmon pcbc snd_hda_intel ath10k_pci snd_hda_codec tpm_crb ath10k_core idma64 ath snd_hwdep virt_dma snd_hda_core
Jan 01 13:40:15 babelfish kernel:  xhci_pci mac80211 aesni_intel dell_wmi dell_smbios_wmi snd_pcm aes_x86_64 xhci_hcd crypto_simd dell_smbios cryptd glue_helper snd_timer snd atkbd usbcore rtsx_pci intel_cstate intel_lpss_pci serio_raw pcspkr i2c_i801 soundcore intel_lpss intel_rapl_perf i2c_hid intel_wmi_thunderbolt cfg80211 i915 dell_wmi_descriptor usb_common mfd_core thermal hid battery acpi_pad tpm_tis ac tpm_tis_core evdev intel_hid sparse_keymap sg crypto_user ip_tables x_tables ipv6
Jan 01 13:40:15 babelfish kernel: CPU: 0 PID: 3280 Comm: Xorg Not tainted 4.15.0-rc6 #11
Jan 01 13:40:15 babelfish kernel: Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.5.0 08/30/2017
Jan 01 13:40:15 babelfish kernel: RIP: 0010:execlists_schedule+0x75/0x200 [i915]
Jan 01 13:40:15 babelfish kernel: RSP: 0018:ffffb0a3c126bb20 EFLAGS: 00010212
Jan 01 13:40:15 babelfish kernel: RAX: ffff8cee92afc280 RBX: 0000000000000400 RCX: 0000000000000000
Jan 01 13:40:15 babelfish kernel: RDX: 0a3479d738836ec7 RSI: ffff8ceedbd817c0 RDI: ffff8ceedbd81680
Jan 01 13:40:15 babelfish kernel: RBP: ffffb0a3c126bb20 R08: 0000000000000003 R09: ffffb0a3c126bb30
Jan 01 13:40:15 babelfish kernel: R10: 000000000000000e R11: ffff8ceee8f88000 R12: 0000000000000400
Jan 01 13:40:15 babelfish kernel: R13: ffff8ceea8067c00 R14: ffff8ceee9ec8c00 R15: ffff8ceee81fe600
Jan 01 13:40:15 babelfish kernel: FS:  00007fa94a4a6940(0000) GS:ffff8ceeff400000(0000) knlGS:0000000000000000
Jan 01 13:40:15 babelfish kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 01 13:40:15 babelfish kernel: CR2: 00007f7fc10a7008 CR3: 00000004614b8006 CR4: 00000000003606f0
Jan 01 13:40:15 babelfish kernel: Call Trace:
Jan 01 13:40:15 babelfish kernel:  ? __i915_gem_object_flush_for_display+0xe/0x30 [i915]
Jan 01 13:40:15 babelfish kernel:  i915_gem_object_wait_priority+0xad/0x160 [i915]
Jan 01 13:40:15 babelfish kernel:  intel_prepare_plane_fb+0x169/0x2e0 [i915]
Jan 01 13:40:15 babelfish kernel:  drm_atomic_helper_prepare_planes+0x47/0xd0
Jan 01 13:40:15 babelfish kernel:  intel_atomic_commit+0x9e/0x280 [i915]
Jan 01 13:40:15 babelfish kernel:  drm_atomic_helper_page_flip+0x77/0x90
Jan 01 13:40:15 babelfish kernel:  drm_mode_page_flip_ioctl+0x4bd/0x520
Jan 01 13:40:15 babelfish kernel:  ? drm_mode_cursor2_ioctl+0x10/0x10
Jan 01 13:40:15 babelfish kernel:  drm_ioctl_kernel+0x54/0xa0
Jan 01 13:40:15 babelfish kernel:  drm_ioctl+0x2c6/0x380
Jan 01 13:40:15 babelfish kernel:  ? drm_mode_cursor2_ioctl+0x10/0x10
Jan 01 13:40:15 babelfish kernel:  ? vfs_writev+0xb4/0x110
Jan 01 13:40:15 babelfish kernel:  do_vfs_ioctl+0x9c/0x610
Jan 01 13:40:15 babelfish kernel:  SyS_ioctl+0x6f/0x80
Jan 01 13:40:15 babelfish kernel:  entry_SYSCALL_64_fastpath+0x13/0x6c
Jan 01 13:40:15 babelfish kernel: RIP: 0033:0x7fa947d5f337
Jan 01 13:40:15 babelfish kernel: RSP: 002b:00007ffde1127748 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
Jan 01 13:40:15 babelfish kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa947d5f337
Jan 01 13:40:15 babelfish kernel: RDX: 00007ffde1127810 RSI: 00000000c01864b0 RDI: 000000000000000d
Jan 01 13:40:15 babelfish kernel: RBP: 0000559c7bf474d0 R08: 0000000000000003 R09: 0000000000000000
Jan 01 13:40:15 babelfish kernel: R10: 0000000000000001 R11: 0000000000003246 R12: 00007ffde1127630
Jan 01 13:40:15 babelfish kernel: R13: 0000000000000000 R14: 0000559c7c29aef0 R15: 0000000000000020
Jan 01 13:40:15 babelfish kernel: Code: 10 48 8d 44 24 38 48 89 44 24 08 4c 8d 4c 24 10 48 89 6c 24 40 48 89 04 24 49 8b 31 48 8b 16 48 8d 42 f8 48 39 d6 74 69 48 8b 10 <8b> 4a 78 85 c9 74 23 4c 8b 82 38 ff ff ff 4d 8b 80 38 01 00 00 
Jan 01 13:40:15 babelfish kernel: RIP: execlists_schedule+0x75/0x200 [i915] RSP: ffffb0a3c126bb20
Jan 01 13:40:15 babelfish kernel: ---[ end trace 1582d7fb894eb091 ]---
Comment 3 m.keehan 2018-01-05 12:42:07 UTC
No, I gave up on the bisection; I think it would have taken days to complete.

All I can say is that without the i915 driver (xf86-video-intel 1:2.99.917+802+gaf6d8e9e-1 from Arch) I have had no problems running with nomodeset.

Mike.
Comment 4 Chris Wilson 2018-01-08 09:38:28 UTC
commit c218ee03b9315073ce43992792554dafa0626eb8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jan 6 10:56:18 2018 +0000

    drm/i915: Don't adjust priority on an already signaled fence
    
    When we retire a signaled fence, we free the dependency tree. However,
    we skip clearing the list so that if we then try to adjust the priority
    of the signaled fence, we may walk the list of freed dependencies.
    
    [ 3083.156757] ==================================================================
    [ 3083.156806] BUG: KASAN: use-after-free in execlists_schedule+0x199/0x660 [i915]
    [ 3083.156810] Read of size 8 at addr ffff8806bf20f400 by task Xorg/831
    
    [ 3083.156815] CPU: 0 PID: 831 Comm: Xorg Not tainted 4.15.0-rc6-no-psn+ #1
    [ 3083.156817] Hardware name: Notebook                         N24_25BU/N24_25BU, BIOS 5.12 02/17/2017
    [ 3083.156818] Call Trace:
    [ 3083.156823]  dump_stack+0x5c/0x7a
    [ 3083.156827]  print_address_description+0x6b/0x290
    [ 3083.156830]  kasan_report+0x28f/0x380
    [ 3083.156872]  ? execlists_schedule+0x199/0x660 [i915]
    [ 3083.156914]  execlists_schedule+0x199/0x660 [i915]
    [ 3083.156956]  ? intel_crtc_atomic_check+0x146/0x4e0 [i915]
    [ 3083.156997]  ? execlists_submit_request+0xe0/0xe0 [i915]
    [ 3083.157038]  ? i915_vma_misplaced.part.4+0x25/0xb0 [i915]
    [ 3083.157079]  ? __i915_vma_do_pin+0x7c8/0xc80 [i915]
    [ 3083.157121]  ? intel_atomic_state_alloc+0x44/0x60 [i915]
    [ 3083.157130]  ? drm_atomic_helper_page_flip+0x3e/0xb0 [drm_kms_helper]
    [ 3083.157145]  ? drm_mode_page_flip_ioctl+0x7d2/0x850 [drm]
    [ 3083.157159]  ? drm_ioctl_kernel+0xa7/0xf0 [drm]
    [ 3083.157172]  ? drm_ioctl+0x45b/0x560 [drm]
    [ 3083.157211]  i915_gem_object_wait_priority+0x14c/0x2c0 [i915]
    [ 3083.157251]  ? i915_gem_get_aperture_ioctl+0x150/0x150 [i915]
    [ 3083.157290]  ? i915_vma_pin_fence+0x1d8/0x320 [i915]
    [ 3083.157331]  ? intel_pin_and_fence_fb_obj+0x175/0x250 [i915]
    [ 3083.157372]  ? intel_rotation_info_size+0x60/0x60 [i915]
    [ 3083.157413]  ? intel_link_compute_m_n+0x80/0x80 [i915]
    [ 3083.157428]  ? drm_dev_printk+0x1b0/0x1b0 [drm]
    [ 3083.157443]  ? drm_dev_printk+0x1b0/0x1b0 [drm]
    [ 3083.157485]  intel_prepare_plane_fb+0x2f8/0x5a0 [i915]
    [ 3083.157527]  ? intel_crtc_get_vblank_counter+0x80/0x80 [i915]
    [ 3083.157536]  drm_atomic_helper_prepare_planes+0xa0/0x1c0 [drm_kms_helper]
    [ 3083.157587]  intel_atomic_commit+0x12e/0x4e0 [i915]
    [ 3083.157605]  drm_atomic_helper_page_flip+0xa2/0xb0 [drm_kms_helper]
    [ 3083.157621]  drm_mode_page_flip_ioctl+0x7d2/0x850 [drm]
    [ 3083.157638]  ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
    [ 3083.157652]  ? drm_lease_owner+0x1a/0x30 [drm]
    [ 3083.157668]  ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
    [ 3083.157681]  drm_ioctl_kernel+0xa7/0xf0 [drm]
    [ 3083.157696]  drm_ioctl+0x45b/0x560 [drm]
    [ 3083.157711]  ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
    [ 3083.157725]  ? drm_getstats+0x20/0x20 [drm]
    [ 3083.157729]  ? timerqueue_del+0x49/0x80
    [ 3083.157732]  ? __remove_hrtimer+0x62/0xb0
    [ 3083.157735]  ? hrtimer_try_to_cancel+0x173/0x210
    [ 3083.157738]  do_vfs_ioctl+0x13b/0x880
    [ 3083.157741]  ? ioctl_preallocate+0x140/0x140
    [ 3083.157744]  ? _raw_spin_unlock_irq+0xe/0x30
    [ 3083.157746]  ? do_setitimer+0x234/0x370
    [ 3083.157750]  ? SyS_setitimer+0x19e/0x1b0
    [ 3083.157752]  ? SyS_alarm+0x140/0x140
    [ 3083.157755]  ? __rcu_read_unlock+0x66/0x80
    [ 3083.157757]  ? __fget+0xc4/0x100
    [ 3083.157760]  SyS_ioctl+0x74/0x80
    [ 3083.157763]  entry_SYSCALL_64_fastpath+0x1a/0x7d
    [ 3083.157765] RIP: 0033:0x7f6135d0c6a7
    [ 3083.157767] RSP: 002b:00007fff01451888 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
    [ 3083.157769] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f6135d0c6a7
    [ 3083.157771] RDX: 00007fff01451950 RSI: 00000000c01864b0 RDI: 000000000000000c
    [ 3083.157772] RBP: 00007f613076f600 R08: 0000000000000001 R09: 0000000000000000
    [ 3083.157773] R10: 0000000000000060 R11: 0000000000003246 R12: 0000000000000000
    [ 3083.157774] R13: 0000000000000060 R14: 000000000000001b R15: 0000000000000060
    
    [ 3083.157779] Allocated by task 831:
    [ 3083.157783]  kmem_cache_alloc+0xc0/0x200
    [ 3083.157822]  i915_gem_request_await_dma_fence+0x2c4/0x5d0 [i915]
    [ 3083.157861]  i915_gem_request_await_object+0x321/0x370 [i915]
    [ 3083.157900]  i915_gem_do_execbuffer+0x1165/0x19c0 [i915]
    [ 3083.157937]  i915_gem_execbuffer2+0x1ad/0x550 [i915]
    [ 3083.157950]  drm_ioctl_kernel+0xa7/0xf0 [drm]
    [ 3083.157962]  drm_ioctl+0x45b/0x560 [drm]
    [ 3083.157964]  do_vfs_ioctl+0x13b/0x880
    [ 3083.157966]  SyS_ioctl+0x74/0x80
    [ 3083.157968]  entry_SYSCALL_64_fastpath+0x1a/0x7d
    
    [ 3083.157971] Freed by task 831:
    [ 3083.157973]  kmem_cache_free+0x77/0x220
    [ 3083.158012]  i915_gem_request_retire+0x72c/0xa70 [i915]
    [ 3083.158051]  i915_gem_request_alloc+0x1e9/0x8b0 [i915]
    [ 3083.158089]  i915_gem_do_execbuffer+0xa96/0x19c0 [i915]
    [ 3083.158127]  i915_gem_execbuffer2+0x1ad/0x550 [i915]
    [ 3083.158140]  drm_ioctl_kernel+0xa7/0xf0 [drm]
    [ 3083.158153]  drm_ioctl+0x45b/0x560 [drm]
    [ 3083.158155]  do_vfs_ioctl+0x13b/0x880
    [ 3083.158156]  SyS_ioctl+0x74/0x80
    [ 3083.158158]  entry_SYSCALL_64_fastpath+0x1a/0x7d
    
    [ 3083.158162] The buggy address belongs to the object at ffff8806bf20f400
                    which belongs to the cache i915_dependency of size 64
    [ 3083.158166] The buggy address is located 0 bytes inside of
                    64-byte region [ffff8806bf20f400, ffff8806bf20f440)
    [ 3083.158168] The buggy address belongs to the page:
    [ 3083.158171] page:00000000d43decc4 count:1 mapcount:0 mapping:          (null) index:0x0
    [ 3083.158174] flags: 0x17ffe0000000100(slab)
    [ 3083.158179] raw: 017ffe0000000100 0000000000000000 0000000000000000 0000000180200020
    [ 3083.158182] raw: ffffea001afc16c0 0000000500000005 ffff880731b881c0 0000000000000000
    [ 3083.158184] page dumped because: kasan: bad access detected
    
    [ 3083.158187] Memory state around the buggy address:
    [ 3083.158190]  ffff8806bf20f300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    [ 3083.158192]  ffff8806bf20f380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    [ 3083.158195] >ffff8806bf20f400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    [ 3083.158196]                    ^
    [ 3083.158199]  ffff8806bf20f480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    [ 3083.158201]  ffff8806bf20f500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    [ 3083.158203] ==================================================================
    
    Reported-by: Alexandru Chirvasitu <achirvasub@gmail.com>
    Reported-by: Mike Keehan <mike@keehan.net>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104436
    Fixes: 1f181225f8ec ("drm/i915/execlists: Keep request->priority for its lifetime")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
    Cc: Michał Winiarski <michal.winiarski@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
    Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180106105618.13532-1-chris@chris-wilson.co.uk
Comment 5 m.keehan 2018-01-15 19:53:16 UTC
Thanks Chris. I have been running kernel 4.15-rc8 all afternoon with no problems. It certainly seems fixed to me.
Comment 6 Chris Wilson 2018-01-22 23:20:29 UTC
*** Bug 104743 has been marked as a duplicate of this bug. ***

