Bug 106954 - [CI] igt@gem_ctx_create@basic-files - dmesg-warn - gem_ctx_create: page allocation failure
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-18 14:08 UTC by Martin Peres
Modified: 2018-11-01 16:08 UTC

See Also:
i915 platform: KBL
i915 features: firmware/guc, GEM/Other



Description Martin Peres 2018-06-18 14:08:26 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4333/fi-kbl-guc/igt@gem_ctx_create@basic-files.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4334/fi-kbl-guc/igt@gem_ctx_create@basic-files.html

[   45.255378] gem_ctx_create: page allocation failure: order:0, mode:0x8402(__GFP_HIGHMEM|__GFP_RETRY_MAYFAIL|__GFP_ZERO), nodemask=(null)
[   45.255386] CPU: 1 PID: 1326 Comm: gem_ctx_create Tainted: G     U            4.18.0-rc1-CI-CI_DRM_4334+ #1
[   45.255387] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
[   45.255388] Call Trace:
[   45.255392]  ? dump_stack+0x67/0x9b
[   45.255395]  ? warn_alloc+0xee/0x170
[   45.255401]  ? __alloc_pages_nodemask+0xe61/0x1230
[   45.255404]  ? ___slab_alloc.constprop.34+0x221/0x390
[   45.255439]  ? setup_scratch_page+0x173/0x200 [i915]
[   45.255457]  ? gen8_ppgtt_create+0x9a/0x560 [i915]
[   45.255474]  ? i915_ppgtt_create+0x1c/0x180 [i915]
[   45.255489]  ? i915_gem_create_context+0x129/0x2b0 [i915]
[   45.255504]  ? i915_gem_context_open+0x64/0xe0 [i915]
[   45.255521]  ? i915_gem_open+0x91/0xc0 [i915]
[   45.255524]  ? drm_open+0x205/0x460
[   45.255528]  ? drm_stub_open+0xae/0xe0
[   45.255531]  ? chrdev_open+0xa2/0x1c0
[   45.255533]  ? cdev_put.part.1+0x20/0x20
[   45.255536]  ? do_dentry_open.isra.1+0x186/0x2d0
[   45.255539]  ? path_openat+0x4e0/0xb10
[   45.255543]  ? do_filp_open+0x96/0x110
[   45.255548]  ? __alloc_fd+0xe0/0x1e0
[   45.255553]  ? do_sys_open+0x1b8/0x240
[   45.255555]  ? do_sys_open+0x1b8/0x240
[   45.255559]  ? do_syscall_64+0x55/0x190
[   45.255561]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   45.255592] Mem-Info:
[   45.255595] active_anon:81250 inactive_anon:289110 isolated_anon:0
                active_file:42820 inactive_file:29981 isolated_file:0
                unevictable:0 dirty:2 writeback:0 unstable:0
                slab_reclaimable:36282 slab_unreclaimable:540620
                mapped:22660 shmem:289309 pagetables:2374 bounce:0
                free:25541 free_pcp:1373 free_cma:0
[   45.255598] Node 0 active_anon:325000kB inactive_anon:1156440kB active_file:171280kB inactive_file:119924kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:90640kB dirty:8kB writeback:0kB shmem:1157236kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 51200kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   45.255600] DMA free:15884kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15884kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   45.255601] lowmem_reserve[]: 0 2962 7795 7795
[   45.255608] DMA32 free:44640kB min:25628kB low:32032kB high:38436kB active_anon:4kB inactive_anon:486276kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3145048kB managed:3102248kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1848kB local_pcp:112kB free_cma:0kB
[   45.255609] lowmem_reserve[]: 0 0 4833 4833
[   45.255615] Normal free:41640kB min:41820kB low:52272kB high:62724kB active_anon:324996kB inactive_anon:669660kB active_file:171456kB inactive_file:119924kB unevictable:0kB writepending:196kB present:5095424kB managed:4949544kB mlocked:0kB kernel_stack:4176kB pagetables:9496kB bounce:0kB free_pcp:3644kB local_pcp:224kB free_cma:0kB
[   45.255616] lowmem_reserve[]: 0 0 0 0
[   45.255621] DMA: 3*4kB (U) 2*8kB (U) 3*16kB (U) 0*32kB 3*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
[   45.255641] DMA32: 2*4kB (UM) 2*8kB (UM) 6*16kB (M) 7*32kB (UME) 10*64kB (ME) 11*128kB (UM) 9*256kB (UM) 12*512kB (UME) 7*1024kB (UME) 7*2048kB (UME) 3*4096kB (M) = 44632kB
[   45.255663] Normal: 445*4kB (M) 109*8kB (M) 44*16kB (ME) 28*32kB (UME) 11*64kB (ME) 1*128kB (M) 2*256kB (UM) 1*512kB (U) 7*1024kB (UME) 4*2048kB (ME) 5*4096kB (M) = 41948kB
[   45.255686] 362089 total pagecache pages
[   45.255688] 6 pages in swap cache
[   45.255689] Swap cache stats: add 141, delete 135, find 0/0
[   45.255690] Free swap  = 2096380kB
[   45.255691] Total swap = 2097148kB
[   45.255692] 2064115 pages RAM
[   45.255693] 0 pages HighMem/MovableOnly
[   45.255694] 47196 pages reserved

This seems to have been introduced in CI_DRM_4333, as we got two failures in a row.
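
One way to read the zone lines in the log above: the allocator refuses a zone once its free memory is at or below the min watermark plus the lowmem reserve that applies to the requesting zone. The sketch below is a rough approximation of that check, not the kernel's code. It assumes 4 kB pages, that the request targets the Normal zone (index 2 in the lowmem_reserve[] arrays, assuming the order DMA, DMA32, Normal, Movable), and it ignores the adjustments the kernel makes for allocation flags. The LOG string is abbreviated from the report (timestamps and all fields other than free/min dropped); parse_zones and below_min are hypothetical helper names.

```python
import re

PAGE_KB = 4        # assumption: 4 kB pages
NORMAL_IDX = 2     # assumption: zone order DMA, DMA32, Normal, Movable

# Abbreviated from the dmesg output in this report: only free/min are kept.
LOG = """\
DMA free:15884kB min:132kB
lowmem_reserve[]: 0 2962 7795 7795
DMA32 free:44640kB min:25628kB
lowmem_reserve[]: 0 0 4833 4833
Normal free:41640kB min:41820kB
lowmem_reserve[]: 0 0 0 0
"""

ZONE_RE = re.compile(r"^(\w+) free:(\d+)kB min:(\d+)kB")
RESERVE_RE = re.compile(r"^lowmem_reserve\[\]:((?: \d+)+)")


def parse_zones(text):
    """Collect free/min (kB) and the lowmem_reserve array (pages) per zone."""
    zones = []
    for line in text.splitlines():
        m = ZONE_RE.match(line)
        if m:
            zones.append({"name": m.group(1),
                          "free": int(m.group(2)),
                          "min": int(m.group(3)),
                          "reserve_pages": []})
            continue
        m = RESERVE_RE.match(line)
        if m and zones:
            # The reserve line follows the zone line it belongs to.
            zones[-1]["reserve_pages"] = [int(x) for x in m.group(1).split()]
    return zones


def below_min(zone, classzone_idx):
    """Return (refused, threshold_kB) for a request aimed at classzone_idx."""
    threshold = zone["min"] + zone["reserve_pages"][classzone_idx] * PAGE_KB
    return zone["free"] <= threshold, threshold


for zone in parse_zones(LOG):
    refused, threshold = below_min(zone, NORMAL_IDX)
    print(f"{zone['name']}: free={zone['free']}kB "
          f"threshold={threshold}kB refused={refused}")
```

Under these assumptions all three zones come out at or below their threshold, which is consistent with an order:0 request failing even though each zone still reports some free memory.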
Comment 1 Chris Wilson 2018-06-18 14:15:06 UTC
Just one of the usual lack-of-reclaim faux pas. There are a lot of shmem objects there that we would expect to be able to swap out to make room for a new object. However, since we are using MAYFAIL in this case, we purposely don't try very hard. It would be justifiable to remove the warning, as the ENOMEM goes straight back to userspace, but on the other hand we need to improve the shrinker for this case as well. The prudent course of action would be to create a test that reproduces this MAYFAIL reliably. However, the ability of the shrinker to reclaim memory depends on the type of objects and the workload, so it definitely will not be a one-test-fits-all.
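
To put the "lot of shmem objects" in numbers, the Mem-Info page counts can be converted to kB. This is a small sketch, assuming 4 kB pages; the counts are copied from the log in the description, and pages_to_kb is a hypothetical helper name.

```python
PAGE_KB = 4  # assumption: 4 kB pages

# Page counts copied from the Mem-Info section of the log in this report.
TOTAL_RAM_PAGES = 2064115
MEM_INFO_PAGES = {
    "shmem": 289309,
    "slab_unreclaimable": 540620,
    "free": 25541,
}


def pages_to_kb(pages):
    """Convert a page count to kB under the 4 kB page assumption."""
    return pages * PAGE_KB


for name, pages in MEM_INFO_PAGES.items():
    share = 100 * pages / TOTAL_RAM_PAGES
    print(f"{name}: {pages_to_kb(pages)} kB ({share:.1f}% of RAM)")
```

The shmem figure, 1157236 kB, matches the shmem value on the Node 0 line of the log.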
Comment 3 Martin Peres 2018-06-21 10:39:28 UTC
This does not look like it is GuC-related.
Comment 4 Chris Wilson 2018-06-21 10:41:17 UTC
Not directly, no. The pstore says the test had finished and was idle for 60-70s before the watchdog killed the machine abruptly.
Comment 5 Martin Peres 2018-06-21 10:42:52 UTC
(In reply to Chris Wilson from comment #4)
> Not directly, no. The pstore says the test had finished and was idle for
> 60-70s before the watchdog killed the machine abruptly.

Hmm, ok! Should I file another bug?
Comment 6 Chris Wilson 2018-06-21 10:44:32 UTC
Yeah, I think the incomplete here is related to the rc1 fallout Tomi is trying to deal with (incompletes without dmesg).
Comment 7 Martin Peres 2018-06-21 10:57:51 UTC
Separate bug filed: https://bugs.freedesktop.org/show_bug.cgi?id=106988
Comment 8 Martin Peres 2018-11-01 16:08:52 UTC
Used to be seen every 3 runs or so. Now not seen since CI_DRM_4370_88 (3 months, 2 weeks / 1636 runs ago). Closing!

