The kernel crashes after a GPU reset. GPU hangs are frequent and happen at boot time; only occasionally does a hang end in a kernel crash. This has been observed on Chrome OS with a 3.18 kernel that carries i915 backports.

I believe that the NULL pointer access happens at

I915_WRITE(DSPSURF(intel_crtc->plane), intel_crtc->unpin_work->gtt_offset);

in intel_display.c:ilk_do_mmio_flip.

If we assume an ongoing reset, then the call sequence

intel_finish_reset -> intel_complete_page_flips -> intel_finish_page_flip_plane -> do_intel_finish_page_flip -> page_flip_completed

might set intel_crtc->unpin_work = NULL before the MMIO flip worker dereferences it.

We need some help debugging this crash.

<6>[ 6.744129] [drm] stuck on render ring
<6>[ 6.766343] [drm] GPU HANG: ecode 8:0:0x2efe5dbc, reason: Ring hung, action: reset
<6>[ 6.766356] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[ 6.766367] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[ 6.766378] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[ 6.766389] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[ 6.766400] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5>[ 6.769207] drm/i915: Resetting chip after gpu hang
<6>[ 12.739947] [drm] stuck on render ring
<6>[ 12.765654] [drm] GPU HANG: ecode 8:0:0x86dffffd, in chrome [3652], reason: Ring hung, action: reset
<4>[ 12.765733] ------------[ cut here ]------------
<4>[ 12.765764] WARNING: CPU: 2 PID: 41 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/gpu/drm/i915/intel_display.c:11277 intel_mmio_flip_work_func+0x6d/0x315()
<4>[ 12.765787] WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
<4>[ 12.765805] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 cros_ec_sensors ip6table_filter cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf ip6_tables iio_trig_sysfs industrialio iwlmvm iwl7000_mac80211 iwlwifi cfg80211 btusb btbcm btintel bluetooth smsc95xx usbnet mii uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core joydev ppp_async ppp_generic slhc tun
<4>[ 12.765930] CPU: 2 PID: 41 Comm: kworker/2:1 Not tainted 3.18.0-06623-g902cb99 #1
<4>[ 12.765944] Hardware name: GOOGLE Cyan, BIOS Google_Cyan.7287.57.2015_09_30_1147 09/30/2015
<4>[ 12.765962] Workqueue: events intel_mmio_flip_work_func
<4>[ 12.765974] 0000000000000000 000000004607d413 ffff88017a9bbcc8 ffffffff8d5f3d15
<4>[ 12.765996] 0000000000000000 ffff88017a9bbd20 ffff88017a9bbd08 ffffffff8d03dfd9
<4>[ 12.766016] ffff88017a9bbcd8 ffffffff8d345524 ffff88017a97f000 ffff880072f2da00
<4>[ 12.766037] Call Trace:
<4>[ 12.766054] [<ffffffff8d5f3d15>] ? dump_stack+0x46/0x58
<4>[ 12.766070] [<ffffffff8d03dfd9>] ? warn_slowpath_common+0x81/0x9b
<4>[ 12.766085] [<ffffffff8d345524>] ? intel_mmio_flip_work_func+0x6d/0x315
<4>[ 12.766100] [<ffffffff8d03e048>] ? warn_slowpath_fmt+0x55/0x6b
<4>[ 12.766115] [<ffffffff8d345524>] ? intel_mmio_flip_work_func+0x6d/0x315
<4>[ 12.766133] [<ffffffff8d05c849>] ? finish_task_switch+0x5b/0xba
<4>[ 12.766149] [<ffffffff8d051a1b>] ? process_one_work+0x175/0x2ab
<4>[ 12.766163] [<ffffffff8d052c95>] ? worker_thread+0x1fb/0x2ce
<4>[ 12.766178] [<ffffffff8d052a9a>] ? rescuer_thread+0x2d7/0x2d7
<4>[ 12.766192] [<ffffffff8d056863>] ? kthread+0x10e/0x116
<4>[ 12.766207] [<ffffffff8d056755>] ? kthread_stop+0xc0/0xc0
<4>[ 12.766222] [<ffffffff8d5f8bac>] ? ret_from_fork+0x7c/0xb0
<4>[ 12.766237] [<ffffffff8d056755>] ? kthread_stop+0xc0/0xc0
<4>[ 12.766249] ---[ end trace 8d614c29c562a829 ]---
<5>[ 12.767790] drm/i915: Resetting chip after gpu hang
<6>[ 18.740012] [drm] stuck on render ring
<6>[ 18.760304] [drm] GPU HANG: ecode 8:0:0x86dffffd, in chrome [3652], reason: Ring hung, action: reset
<4>[ 18.760635] ------------[ cut here ]------------
<4>[ 18.760665] WARNING: CPU: 0 PID: 1099 at /mnt/host/source/src/third_party/kernel/v3.18/drivers/gpu/drm/i915/intel_display.c:11277 intel_mmio_flip_work_func+0x6d/0x315()
<4>[ 18.760688] WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
<4>[ 18.760706] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 cros_ec_sensors ip6table_filter cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf ip6_tables iio_trig_sysfs industrialio iwlmvm iwl7000_mac80211 iwlwifi cfg80211 btusb btbcm btintel bluetooth smsc95xx usbnet mii uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core joydev ppp_async ppp_generic slhc tun
<4>[ 18.760823] CPU: 0 PID: 1099 Comm: kworker/0:2 Tainted: G W 3.18.0-06623-g902cb99 #1
<4>[ 18.760837] Hardware name: GOOGLE Cyan, BIOS Google_Cyan.7287.57.2015_09_30_1147 09/30/2015
<4>[ 18.760854] Workqueue: events intel_mmio_flip_work_func
<4>[ 18.760866] 0000000000000000 00000000e8ee967d ffff8801760b3cc8 ffffffff8d5f3d15
<4>[ 18.760886] 0000000000000000 ffff8801760b3d20 ffff8801760b3d08 ffffffff8d03dfd9
<4>[ 18.760905] ffff8801760b3cd8 ffffffff8d345524 ffff8801799ceb40 ffff8801798a50c0
<4>[ 18.760925] Call Trace:
<4>[ 18.760940] [<ffffffff8d5f3d15>] ? dump_stack+0x46/0x58
<4>[ 18.760955] [<ffffffff8d03dfd9>] ? warn_slowpath_common+0x81/0x9b
<4>[ 18.760969] [<ffffffff8d345524>] ? intel_mmio_flip_work_func+0x6d/0x315
<4>[ 18.760983] [<ffffffff8d03e048>] ? warn_slowpath_fmt+0x55/0x6b
<4>[ 18.760997] [<ffffffff8d345524>] ? intel_mmio_flip_work_func+0x6d/0x315
<4>[ 18.761014] [<ffffffff8d05c849>] ? finish_task_switch+0x5b/0xba
<4>[ 18.761028] [<ffffffff8d051a1b>] ? process_one_work+0x175/0x2ab
<4>[ 18.761042] [<ffffffff8d052c95>] ? worker_thread+0x1fb/0x2ce
<4>[ 18.761055] [<ffffffff8d052a9a>] ? rescuer_thread+0x2d7/0x2d7
<4>[ 18.761069] [<ffffffff8d056863>] ? kthread+0x10e/0x116
<4>[ 18.761083] [<ffffffff8d056755>] ? kthread_stop+0xc0/0xc0
<4>[ 18.761096] [<ffffffff8d5f8bac>] ? ret_from_fork+0x7c/0xb0
<4>[ 18.761110] [<ffffffff8d056755>] ? kthread_stop+0xc0/0xc0
<4>[ 18.761121] ---[ end trace 8d614c29c562a82a ]---
<5>[ 18.763443] drm/i915: Resetting chip after gpu hang
<1>[ 18.769490] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
<1>[ 18.769515] IP: [<ffffffff8d345716>] intel_mmio_flip_work_func+0x25f/0x315
<4>[ 18.769536] PGD 0
<4>[ 18.769544] Oops: 0000 [#1] SMP
<0>[ 18.773130] gsmi: Log Shutdown Reason 0x03
<4>[ 18.773140] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 cros_ec_sensors ip6table_filter cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf ip6_tables iio_trig_sysfs industrialio iwlmvm iwl7000_mac80211 iwlwifi cfg80211 btusb btbcm btintel bluetooth smsc95xx usbnet mii uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core joydev ppp_async ppp_generic slhc tun
<4>[ 18.773241] CPU: 0 PID: 1099 Comm: kworker/0:2 Tainted: G W 3.18.0-06623-g902cb99 #1
<4>[ 18.773255] Hardware name: GOOGLE Cyan, BIOS Google_Cyan.7287.57.2015_09_30_1147 09/30/2015
<4>[ 18.773272] Workqueue: events intel_mmio_flip_work_func
<4>[ 18.773283] task: ffff880179bfea80 ti: ffff8801760b0000 task.ti: ffff8801760b0000
<4>[ 18.773296] RIP: 0010:[<ffffffff8d345716>] [<ffffffff8d345716>] intel_mmio_flip_work_func+0x25f/0x315
<4>[ 18.773314] RSP: 0018:ffff8801760b3d88 EFLAGS: 00010096
<4>[ 18.773324] RAX: 0000000000000000 RBX: ffff88017b2b7000 RCX: 0000000000180000
<4>[ 18.773337] RDX: 00000000001e1180 RSI: 0000000000000046 RDI: ffff88017a080000
<4>[ 18.773349] RBP: ffff8801760b3de8 R08: 0000000000000001 R09: ffff88017b2b7000
<4>[ 18.773361] R10: 0000000000000000 R11: 000000000000b910 R12: ffff88017a080000
<4>[ 18.773373] R13: 00000000001f0180 R14: ffff8801798a50c0 R15: ffff8801741c5680
<4>[ 18.773386] FS: 0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
<4>[ 18.773399] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[ 18.773410] CR2: 0000000000000048 CR3: 0000000077e06000 CR4: 00000000001007f0
<4>[ 18.773422] Stack:
<4>[ 18.773427] ffff8801760b3db8 ffffffff8d05c849 ffff88017a99c000 ffff880078ac0b40
<4>[ 18.773446] 000003dc77f04c10 00000000e8ee967d ffff8801760b3e28 ffff8801799ceb40
<4>[ 18.773463] ffff8801798a50c0 ffff88017fc11780 0000000000000000 ffff88017fc15b00
<4>[ 18.773481] Call Trace:
<4>[ 18.773495] [<ffffffff8d05c849>] ? finish_task_switch+0x5b/0xba
<4>[ 18.773510] [<ffffffff8d051a1b>] process_one_work+0x175/0x2ab
<4>[ 18.773523] [<ffffffff8d052c95>] worker_thread+0x1fb/0x2ce
<4>[ 18.773535] [<ffffffff8d052a9a>] ? rescuer_thread+0x2d7/0x2d7
<4>[ 18.773548] [<ffffffff8d056863>] kthread+0x10e/0x116
<4>[ 18.773561] [<ffffffff8d056755>] ? kthread_stop+0xc0/0xc0
<4>[ 18.773575] [<ffffffff8d5f8bac>] ret_from_fork+0x7c/0xb0
<4>[ 18.773587] [<ffffffff8d056755>] ? kthread_stop+0xc0/0xc0
<4>[ 18.773597] Code: 00 c0 74 05 80 cc 04 89 c2 b9 01 00 00 00 4c 89 ee 4c 89 e7 41 ff 94 24 d8 00 00 00 48 8b 83 48 07 00 00 41 8b 4c 24 20 4c 89 e7 <8b> 50 48 8b 83 24 04 00 00 41 8b 44 84 30 41 2b 44 24 30 8d b4
<1>[ 18.773724] RIP [<ffffffff8d345716>] intel_mmio_flip_work_func+0x25f/0x315
<4>[ 18.773739] RSP <ffff8801760b3d88>
<4>[ 18.773747] CR2: 0000000000000048
<4>[ 18.773756] ---[ end trace 8d614c29c562a82b ]---
<0>[ 18.781967] Kernel panic - not syncing: Fatal exception
<0>[ 18.782089] Kernel Offset: 0xc000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
<0>[ 18.782293] gsmi: Log Shutdown Reason 0x02
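
To make the suspected race concrete: the faulting instruction in the Code: line (<8b> 50 48, i.e. a 32-bit load from 0x48(%rax)) with RAX = 0 and CR2 = 0x48 is consistent with reading a field through a NULL pointer, which fits the unpin_work hypothesis above. The following is only a userspace model of that hypothesis, not the driver code and not a proposed patch. All names (model_crtc, model_unpin_work, model_do_mmio_flip, model_reset_completes_flip) are hypothetical stand-ins, and the pthread mutex stands in for whatever lock actually protects unpin_work in the driver. It sketches the idea that the flip worker should check unpin_work under the same lock the reset path takes before it clears the pointer.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the i915 structures; not the real layout. */
struct model_unpin_work {
	uint32_t gtt_offset;
};

struct model_crtc {
	pthread_mutex_t lock;                /* assumed lock protecting unpin_work */
	struct model_unpin_work *unpin_work; /* cleared by the reset path */
	uint32_t dspsurf;                    /* stands in for the DSPSURF register */
};

/* Models the reset path reaching page_flip_completed(): drops the pending work. */
static void model_reset_completes_flip(struct model_crtc *crtc)
{
	pthread_mutex_lock(&crtc->lock);
	crtc->unpin_work = NULL;
	pthread_mutex_unlock(&crtc->lock);
}

/*
 * Models the MMIO flip worker. The pointer is checked and dereferenced
 * under the same lock, so a concurrent reset cannot clear it in between.
 * Returns false when the flip was already completed by the reset path.
 */
static bool model_do_mmio_flip(struct model_crtc *crtc)
{
	bool flipped = false;

	pthread_mutex_lock(&crtc->lock);
	if (crtc->unpin_work != NULL) {
		crtc->dspsurf = crtc->unpin_work->gtt_offset;
		flipped = true;
	}
	pthread_mutex_unlock(&crtc->lock);

	return flipped;
}
```

In this model, calling model_do_mmio_flip() after model_reset_completes_flip() returns false instead of faulting. Whether the real worker can simply skip the register write in that case, and which lock is the right one, would need to be confirmed against the actual driver.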
Created attachment 118993 [details] dmesg
Improvements have since been pushed to the kernel and Mesa that should benefit your system and are likely to fix this issue, so please re-test with the latest kernel and Mesa to see whether it still occurs. If it does, please also attach the GPU crash dump located at /sys/class/drm/card0/error
iirc this issue was solved. I will go ahead and close this.
ChromeOS still has this issue. https://bugs.chromium.org/p/chromium/issues/detail?id=776613 Dhinakaran, why do you consider it fixed?
DS,

I filed that bug over two years ago and as far as I can remember some backports made the GPU hang go away.

I'd recommend filing a new bug for the issue you are seeing now if it's reproducible on drm-tip and/or talk to someone who's familiar with GPU hangs.
(In reply to Dhinakaran Pandiyan from comment #5)
> DS,
>
> I filed that bug over two years ago and as far as I can remember some
> backports made the GPU hang go away.
>
> I'd recommend filing a new bug for the issue you are seeing now if it's
> reproducible on drm-tip and/or talk to someone who's familiar with GPU hangs.

Hello Dongseong Hwang, please file a new bug for your case, since Dhinakaran stated that his issue was different to https://bugs.chromium.org/p/chromium/issues/detail?id=776613
For the record, it was fixed by https://patchwork.freedesktop.org/patch/106110/
As reference: https://patchwork.freedesktop.org/patch/104303/

commit 3e7d28b655aefefe51f1d7ac6aba46d6ca03b658
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date:   Thu Jan 4 14:45:54 2018 -0800

    drm-tip: 2018y-01m-04d-22h-45m-20s UTC integration manifest