Bug 90136

Summary: [SKL]Webglc sporadically causes system hang
Product: DRI Reporter: lu hua <huax.lu>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: intel-gfx-bugs
Version: unspecified   
Hardware: All   
OS: Linux (All)   
URL: while :; do sudo ./gem_exec_lut_handle ; done
Whiteboard:
i915 platform: i915 features:
Attachments:
Description         Flags
case list           none
dmesg               none
dmesg(70614661)     none

Description lu hua 2015-04-22 08:10:00 UTC
Created attachment 115262 [details]
case list

==System Environment==
--------------------------
Regression: Not sure; seen only once.

Non-working platforms: SKL

==kernel==
--------------------------
drm-intel-nightly/b9fe357740009b89d4bac30b297bfe9808957e6a
commit b9fe357740009b89d4bac30b297bfe9808957e6a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Apr 20 10:28:37 2015 -0700

    drm-intel-nightly: 2015y-04m-20d-17h-28m-16s UTC integration manifest

==Bug detailed description==
-----------------------------
Ran the full webglc case on SKL-Y with the latest drm-intel-nightly and the mesa master branch. After executing the attached case list, the system hung. ssh still worked, but after running reboot the system stopped responding.
I hit this once; in 4 further cycles I was unable to reproduce it.

dmesg:
[12816.875799] WARNING: CPU: 0 PID: 26375 at drivers/gpu/drm/i915/i915_gem_gtt.c:504 gen8_ppgtt_clear_range+0xcd/0x16e [i915]()
[12816.875803] WARN_ON(!pd->page_table[pde])
[12816.875806] Modules linked in: dm_mod snd_hda_codec_realtek snd_hda_codec_generic ppdev snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep snd_pcm pcspkr snd_timer i2c_i801 snd soundcore joydev wmi battery parport_pc parport ac acpi_cpufreq i915 button video drm_kms_helper drm
[12816.875845] CPU: 0 PID: 26375 Comm: chrome Tainted: G        W       4.0.0_drm-intel-nightly_b9fe35_20150421+ #368
[12816.875849] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.86C.B067.R00.1412310711 12/31/2014
[12816.875852]  0000000000000000 0000000000000009 ffffffff81795847 ffff880147693508
[12816.875859]  ffffffff8103bd5a ffffffff8103bafd ffffffffa0098277 0000000000000000
[12816.875863]  000000009b37f083 0000000000000000 0000000000000013 0000000000000a6c
[12816.875868] Call Trace:
[12816.875874]  [<ffffffff81795847>] ? dump_stack+0x40/0x50
[12816.875881]  [<ffffffff8103bd5a>] ? warn_slowpath_common+0x98/0xb0
[12816.875887]  [<ffffffff8103bafd>] ? add_taint+0x2a/0x2c
[12816.875909]  [<ffffffffa0098277>] ? gen8_ppgtt_clear_range+0xcd/0x16e [i915]
[12816.875916]  [<ffffffff8103bdb7>] ? warn_slowpath_fmt+0x45/0x4a
[12816.875938]  [<ffffffffa0098211>] ? gen8_ppgtt_clear_range+0x67/0x16e [i915]
[12816.875959]  [<ffffffffa0098277>] ? gen8_ppgtt_clear_range+0xcd/0x16e [i915]
[12816.875984]  [<ffffffffa00a0355>] ? i915_vma_unbind+0xb9/0x1e6 [i915]
[12816.876009]  [<ffffffffa00a2b23>] ? i915_gem_shrink+0x166/0x1dc [i915]
[12816.876032]  [<ffffffffa00a2bf9>] ? i915_gem_shrinker_scan+0x60/0x81 [i915]
[12816.876038]  [<ffffffff810de3b3>] ? shrink_slab.part.57.constprop.67+0x1a5/0x2b5
[12816.876044]  [<ffffffff810e038b>] ? shrink_zone+0x67/0x92
[12816.876049]  [<ffffffff810e0771>] ? do_try_to_free_pages+0x20d/0x241
[12816.876055]  [<ffffffff810e0871>] ? try_to_free_pages+0xcc/0x108
[12816.876061]  [<ffffffff810d7b46>] ? __alloc_pages_nodemask+0x48e/0x6fc
[12816.876067]  [<ffffffff81104767>] ? alloc_pages_current+0xad/0xca
[12816.876089]  [<ffffffffa0097cdd>] ? alloc_pt_single+0x75/0x131 [i915]
[12816.876111]  [<ffffffffa009975f>] ? gen8_alloc_va_range+0x2ae/0x6cc [i915]
[12816.876136]  [<ffffffffa009add9>] ? i915_vma_bind+0x9e/0x454 [i915]
[12816.876141]  [<ffffffff81355064>] ? swiotlb_map_sg_attrs+0x84/0x10c
[12816.876166]  [<ffffffffa00a0b2f>] ? i915_gem_object_do_pin+0x6ad/0x77f [i915]
[12816.876188]  [<ffffffffa0094fc1>] ? i915_gem_execbuffer_reserve_vma.isra.12+0x5d/0x103 [i915]
[12816.876212]  [<ffffffffa00952b3>] ? i915_gem_execbuffer_reserve+0x24c/0x2e3 [i915]
[12816.876256]  [<ffffffffa0095916>] ? i915_gem_do_execbuffer.isra.13+0x5cc/0xd88 [i915]
[12816.876277]  [<ffffffff8133543a>] ? idr_get_empty_slot+0x1c5/0x2c4
[12816.876297]  [<ffffffff81334f23>] ? idr_mark_full+0x2b/0x52
[12816.876319]  [<ffffffff81109093>] ? kmem_cache_alloc_trace+0x2a/0xfb
[12816.876348]  [<ffffffffa0017cbc>] ? drm_vma_node_allow+0xaa/0xb2 [drm]
[12816.876364]  [<ffffffff8110948a>] ? __kmalloc+0x65/0x13d
[12816.876390]  [<ffffffffa0097085>] ? i915_gem_execbuffer2+0x16e/0x205 [i915]
[12816.876410]  [<ffffffffa00047ae>] ? drm_ioctl+0x322/0x38d [drm]
[12816.876417]  [<ffffffff81115db7>] ? pipe_read+0x211/0x227
[12816.876446]  [<ffffffffa0096f17>] ? i915_gem_execbuffer+0x339/0x339 [i915]
[12816.876453]  [<ffffffff8110ed0c>] ? new_sync_read+0x6b/0x8f
[12816.876462]  [<ffffffff810a84bb>] ? seccomp_phase1+0x1b7/0x1f0
[12816.876470]  [<ffffffff8111daa6>] ? do_vfs_ioctl+0x360/0x424
[12816.876478]  [<ffffffff8100d3e9>] ? syscall_trace_enter_phase1+0xe4/0x123
[12816.876487]  [<ffffffff8111dbb3>] ? SyS_ioctl+0x49/0x7a
[12816.876506]  [<ffffffff8179b0f2>] ? system_call_fastpath+0x12/0x17
[12816.876518] ---[ end trace c9f0c31965899e15 ]---

==Reproduce steps==
---------------------------- 
1. xinit
2. run full webglc case https://www.khronos.org/registry/webgl/conformance-suites/1.0.2/webgl-conformance-tests.html
Comment 1 lu hua 2015-04-22 08:10:32 UTC
Created attachment 115263 [details]
dmesg
Comment 2 lu hua 2015-04-22 08:12:53 UTC
output shows:
glxinfo failed to connect to X

X not responsible, or DRI disabled, kill it

glxinfo failed to connect to X
Comment 3 Chris Wilson 2015-04-22 09:21:35 UTC
Mika has been discussing this. The basic problem is that we didn't pin the vma prior to the shrinker potentially running and ruining our day. The fix is in http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=3c84cdac9285bf13b69fccbb19816d287f214c4c
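
The race described above can be sketched with a toy model. This is only an illustration in Python, not i915 code and not the linked fix: the names `Vma`, `shrink`, and `bind` are hypothetical, and the shrinker is called directly where the real driver would reach it through a page allocation under memory pressure (as in the call trace in the description).

```python
# Toy model of "pin the vma before the shrinker can run".
# All names here are hypothetical; this is not the i915 implementation.


class Vma:
    """Stand-in for a GPU virtual memory mapping."""

    def __init__(self, name):
        self.name = name
        self.pin_count = 0        # > 0 means "do not evict"
        self.has_node = False     # address range reserved
        self.page_tables = None   # backing page tables, once allocated


def shrink(vmas, warnings):
    """Memory-pressure path: unbind every reserved, unpinned vma."""
    for vma in vmas:
        if vma.has_node and vma.pin_count == 0:
            if vma.page_tables is None:
                # Analogous in spirit to the WARN_ON in the trace:
                # clearing a range whose page table does not exist yet.
                warnings.append("clear_range on %s without page table" % vma.name)
            vma.has_node = False
            vma.page_tables = None


def bind(vma, vmas, warnings, pin_first):
    """Reserve a range, then allocate page tables (which may reclaim)."""
    vma.has_node = True
    vmas.append(vma)
    if pin_first:
        vma.pin_count += 1        # protect the vma before allocating
    shrink(vmas, warnings)        # allocation recursing into the shrinker
    vma.page_tables = ["pt"]
    if not pin_first:
        vma.pin_count += 1        # too late: the vma may already be gone
    return vma.has_node


racy_warnings = []
racy_ok = bind(Vma("racy"), [], racy_warnings, pin_first=False)

safe_warnings = []
safe_ok = bind(Vma("safe"), [], safe_warnings, pin_first=True)
```

In this model the unpinned bind loses its own reservation to the shrinker and records a warning, while the pinned bind survives. How the real fix orders pinning and allocation should be checked against the linked commit.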
Comment 4 lu hua 2015-04-23 06:09:55 UTC
Created attachment 115283 [details]
dmesg(70614661)

Tested on commit 70614661341ffb26de05e84c6958563b45964223, and the system hangs.
After a clean boot, either waiting 5 minutes or running xinit hangs the system.

dmesg:
[  175.024940] ------------[ cut here ]------------
[  175.024984] WARNING: CPU: 1 PID: 5255 at drivers/gpu/drm/drm_mm.c:367 i915_vma_unbind+0x112/0x1bc [i915]()
[  175.024995] Modules linked in: dm_mod ppdev snd_hda_codec_realtek snd_hda_codec_generic pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore i2c_i801 joydev wmi battery parport_pc parport ac acpi_cpufreq i915 button video drm_kms_helper drm
[  175.025030] CPU: 1 PID: 5255 Comm: X Not tainted 4.0.0_kcloud_706146_20150423+ #364
[  175.025034] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.86C.B067.R00.1412310711 12/31/2014
[  175.025038]  0000000000000000 0000000000000009 ffffffff81795a87 0000000000000000
[  175.025044]  ffffffff8103bd5a ffff88009ad86d00 ffffffffa00a166b ffff88009ad86d00
[  175.025050]  ffff8801483c3380 ffff88009ad86d00 0000000000000001 ffff8801442d0000
[  175.025056] Call Trace:
[  175.025068]  [<ffffffff81795a87>] ? dump_stack+0x40/0x50
[  175.025078]  [<ffffffff8103bd5a>] ? warn_slowpath_common+0x98/0xb0
[  175.025109]  [<ffffffffa00a166b>] ? i915_vma_unbind+0x112/0x1bc [i915]
[  175.025137]  [<ffffffffa00a166b>] ? i915_vma_unbind+0x112/0x1bc [i915]
[  175.025163]  [<ffffffffa00a1740>] ? i915_vma_close+0x2b/0x80 [i915]
[  175.025188]  [<ffffffffa00a17fb>] ? i915_gem_close_object+0x66/0x125 [i915]
[  175.025205]  [<ffffffffa0003d35>] ? drm_gem_handle_delete+0xaa/0xbd [drm]
[  175.025221]  [<ffffffffa00047ae>] ? drm_ioctl+0x322/0x38d [drm]
[  175.025241]  [<ffffffffa0003f97>] ? drm_gem_handle_create+0x37/0x37 [drm]
[  175.025248]  [<ffffffff8105ede9>] ? set_next_entity+0x32/0x55
[  175.025257]  [<ffffffff8111dcde>] ? do_vfs_ioctl+0x360/0x424
[  175.025265]  [<ffffffff817987d2>] ? __schedule+0x589/0x7c9
[  175.025271]  [<ffffffff8104fad2>] ? task_work_run+0x84/0x96
[  175.025278]  [<ffffffff8111ddeb>] ? SyS_ioctl+0x49/0x7a
[  175.025284]  [<ffffffff8179b472>] ? system_call_fastpath+0x12/0x17
[  175.025288] ---[ end trace eab6accbedda9078 ]---
[  175.025301] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Comment 5 Chris Wilson 2015-04-23 06:22:33 UTC
(In reply to lu hua from comment #4)

Sure, but that is a bug from another commit.
Comment 6 lu hua 2015-05-22 07:28:33 UTC
Seen only once. I am unable to reproduce it again, so I am closing it.
Comment 7 lu hua 2015-05-22 07:29:00 UTC
Verified. Fixed.
Comment 8 Jari Tahvanainen 2017-07-03 13:58:38 UTC
Closing old verified+fixed.
