https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3332/shard-apl6/igt@drv_selftest@live_gtt.html

OOM killer starting:

<7>[ 1607.338719] [drm:drm_setup_crtcs] desired mode 1024x768 set on crtc 39 (0,0)
<7>[ 1607.339273] [drm:intelfb_create [i915]] no BIOS fb, allocating a new one
<6>[ 1636.138752] Purging GPU memory, 0 pages freed, 845 pages still pinned.
<3>[ 1636.138774] 1 and 0 pages still available in the bound and unbound GPU page lists.
<4>[ 1636.138886] drv_selftest invoked oom-killer: gfp_mask=0x16042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK), nodemask=(null), order=2, oom_score_adj=1000
...
<3>[ 1636.928657] 1 and 0 pages still available in the bound and unbound GPU page lists.
<6>[ 1636.974782] oom_reaper: reaped process 6772 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
<5>[ 1637.006382] owatch: /dev/watchdog0 closed

So it looks like the OOM killer can kill the python3 process, taking the whole test execution down with it. From run.log this just looks like your typical system hang:

Completed CI_IGT_test CI_DRM_3332@shard-apl6 : FAILURE
CI_IGT_test runtime 160 seconds
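For reference, `order=2` in the oom-killer line means the failing request was for 2^2 = 4 physically contiguous pages. A minimal sketch of that arithmetic, assuming 4 KiB pages (the usual x86 page size); `order_to_bytes` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes covered by a page-allocator request of the
 * given order, i.e. 2^order contiguous pages of page_size bytes each.
 * page_size is an assumption (4096 on typical x86 configurations). */
static size_t order_to_bytes(unsigned int order, size_t page_size)
{
    assert(order < 32); /* keep the shift well-defined */
    return ((size_t)1 << order) * page_size;
}
```

With 4 KiB pages, `order_to_bytes(2, 4096)` gives 16384 bytes, so the allocation that triggered the OOM killer was a 16 KiB contiguous block.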
Only seen once so far (I think, at least); it looks to be a kernel leak. At the moment, the obvious thing to do is a run with kmemleak, but my initial guess is that it's a result of an early failure not cleaning up properly. The module's allocations (such as drm_mm, kmem_cache, etc.) are checked on module unload (and by kselftest), but no warning was seen, hence the search for something a little more unusual.
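The suspected bug class is an early-failure path that skips cleanup. A minimal userspace sketch of the goto-unwind pattern kernel code conventionally uses so that every exit releases exactly what was acquired; `tracked_alloc`, `tracked_free`, `run_subtest` and the `live_allocs` counter are hypothetical, with the counter standing in for a leak checker such as kmemleak:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical count of live allocations, standing in for a leak checker. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        assert(live_allocs > 0);
        live_allocs--;
    }
    free(p);
}

/* Hypothetical subtest: acquires two resources and may fail after the first.
 * Each error path jumps to the label that releases only what was acquired. */
static int run_subtest(int fail_early)
{
    int err = 0;
    void *a, *b;

    a = tracked_alloc(64);
    if (!a)
        return -1;

    if (fail_early) {
        err = -1;
        goto out_a; /* early failure must still release 'a' */
    }

    b = tracked_alloc(64);
    if (!b) {
        err = -1;
        goto out_a;
    }

    /* ... exercise a and b ... */

    tracked_free(b);
out_a:
    tracked_free(a);
    return err;
}
```

A leak of the kind suspected here would correspond to an early `return err;` placed before `out_a`, which leaves `live_allocs` non-zero after the failing run.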
(In reply to Chris Wilson from comment #1)

This incomplete is pretty frequent on APL, but due to ftrace messing up pstore and the recent 4.15.0-rc1 fire, we'll have to wait and see if we can get any reasonable data on this.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3415/shard-apl6/igt@drv_selftest@live_gtt.html doesn't have any OOM messages, so I'm changing the title and filing all igt@drv_selftest@live_gtt failures on this bug.

run.log doesn't hint at a timeout or softdog, so a system hang is assumed.

This is the last dmesg:

<7>[ 2890.553261] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01
<5>[ 2891.380887] __shrink_hole timed out at ofset 1ffffff000 [0 - 1000000000000]
<5>[ 2892.632006] lowlevel_hole timed out before 192296/260705
<5>[ 2893.636015] drunk_hole timed out after 114947/521410
<5>[ 2894.637006] walk_hole timed out at 1c93a000
<5>[ 2895.784092] pot_hole timed out after 16/31
<5>[ 2896.837487] fill_hole timed out (npages=279841, prime=23)
<6>[ 2896.842475] Console: switching to colour dummy device 80x25
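The `... timed out ...` lines in that log are at notice level (`<5>`) and read like subtests stopping early once a time budget is used up, reporting how far they got. A minimal sketch of such a time-budgeted loop, under that assumption; `run_with_budget` and `do_step` are hypothetical and this is not the i915 selftest code:

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for one iteration of a "hole" test. */
static void do_step(unsigned long i) { (void)i; }

/* Run up to 'total' steps, but stop as soon as 'budget_ms' milliseconds have
 * elapsed, printing progress in the style of the log ("timed out after x/y").
 * Returns the number of steps completed. */
static unsigned long run_with_budget(unsigned long total, long budget_ms)
{
    struct timespec start, now;
    unsigned long i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < total; i++) {
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
                          (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= budget_ms) {
            printf("timed out after %lu/%lu\n", i, total);
            break;
        }
        do_step(i);
    }
    return i;
}
```

Under this reading, a message such as `drunk_hole timed out after 114947/521410` is an orderly early exit, which fits with run.log pointing at a hang elsewhere rather than at these subtests.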
(In reply to Marta Löfstedt from comment #3)

The dmesg snippet in comment #3 is wrong; it is from this GLK shard run:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3415/shard-glkb4/igt@drv_selftest@live_gtt.html

The last APL dmesgs are:

<7>[ 1653.449383] [drm:intel_fb_initial_config [i915]] Not using firmware configuration
<7>[ 1653.449404] [drm:drm_setup_crtcs] looking for cmdline mode on connector 72
<7>[ 1653.449427] [drm:drm_setup_crtcs] looking for preferred mode on connector 72 0
<7>[ 1653.449434] [drm:drm_setup_crtcs] found mode 1024x768
<7>[ 1653.449439] [drm:drm_setup_crtcs] picking CRTCs for 8192x8192 config
<7>[ 1653.449467] [drm:drm_setup_crtcs] desired mode 1024x768 set on crtc 40 (0,0)
<7>[ 1653.449577] [drm:intelfb_create [i915]] no BIOS fb, allocating a new one
<7>[ 1653.483768] [drm:asle_work [i915]] bclp = 0x800000ff
<7>[ 1653.483842] [drm:asle_work [i915]] updating opregion backlight 255/255
<6>[ 1664.028427] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
To clarify my previous mess: here are two new occurrences of this issue, and both look like system hangs:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3415/shard-apl6/igt@drv_selftest@live_gtt.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3415/shard-glkb4/igt@drv_selftest@live_gtt.html
I think this explains this failure, and it should also prevent the sanitycheck incompletes.

commit c325dd948b4e4e9fe0cc7d612f2101fb3804de5c (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 30 21:41:10 2017 +0000

    igt/drv_selftests: Disable initialising the display

    Many of the selftests try to completely fill global resources; resources
    that are presumed available for bringing up the display. Avoid the
    contention by simply not bringing up the display! This does limit the
    effectiveness of selftesting to GEM for the time being. To exercise KMS
    from selftests we would essentially have to always mock the displays.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=103718
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
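The contention described in the commit message can be illustrated with a toy model: a fixed-size global resource that a selftest deliberately fills, after which the display's framebuffer allocation has nowhere to go. Everything here is hypothetical (`struct aperture`, the 256 MiB size, and 4 bytes per pixel for the 1024x768 mode seen in the logs); it is not i915 code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size global resource shared by selftests and display. */
struct aperture {
    size_t size;
    size_t used;
};

static bool aperture_alloc(struct aperture *ap, size_t bytes)
{
    if (bytes > ap->size - ap->used)
        return false; /* not enough space left */
    ap->used += bytes;
    return true;
}

/* Framebuffer for a 1024x768 mode, assuming 4 bytes per pixel. */
static size_t fb_bytes(void)
{
    return (size_t)1024 * 768 * 4;
}

/* The selftest fills the whole aperture first; then, if the display is
 * enabled, it tries to allocate its framebuffer. Returns whether the
 * display's needs could be met. */
static bool display_fits_after_selftest(bool display_enabled)
{
    struct aperture ap = { .size = (size_t)256 << 20, .used = 0 }; /* arbitrary */

    aperture_alloc(&ap, ap.size); /* selftest takes everything */

    if (!display_enabled)
        return true; /* nothing else competes for the space */
    return aperture_alloc(&ap, fb_bytes());
}
```

Not bringing up the display, as the commit does, corresponds to the `display_enabled == false` branch: the selftest becomes the only consumer, at the cost of no longer exercising KMS.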
Fix included in CI_DRM_3449. I will close this bug.