Bug 111875 - [CI][SHARDS]igt@gem_exec_reuse@contexts - incomplete - Out of memory
Summary: [CI][SHARDS]igt@gem_exec_reuse@contexts - incomplete - Out of memory
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-01 10:54 UTC by Lakshmi
Modified: 2019-11-29 19:36 UTC

See Also:
i915 platform: SKL
i915 features: GEM/Other


Description Lakshmi 2019-10-01 10:54:57 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6977/shard-skl9/igt@gem_exec_reuse@contexts.html
<6> [2075.844850] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),global_oom,task_memcg=/,task=gem_exec_reuse,pid=3690,uid=0
<3> [2075.846994] Out of memory: Killed process 3690 (gem_exec_reuse) total-vm:214048kB, anon-rss:9712kB, file-rss:4kB, shmem-rss:0kB, UID:0 pgtables:425984kB oom_score_adj:1000
<6> [2076.112341] oom_reaper: reaped process 3690 (gem_exec_reuse), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
<4> [2079.622101] java invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
<4> [2079.622127] CPU: 0 PID: 852 Comm: java Tainted: G     U            5.4.0-rc1-CI-CI_DRM_6977+ #1
<4> [2079.622141] Hardware name: Google Caroline/Caroline, BIOS MrChromebox 08/27/2018
<4> [2079.622152] Call Trace:
<4> [2079.622184]  dump_stack+0x67/0x9b
<4> [2079.622206]  dump_header+0x4a/0x3f0
<4> [2079.622231]  oom_kill_process+0xe8/0x200
<4> [2079.622257]  out_of_memory+0xfa/0x380
<4> [2079.622291]  __alloc_pages_slowpath+0xc1d/0xdc0
<4> [2079.622397]  __alloc_pages_nodemask+0x2ce/0x330
<4> [2079.622441]  pagecache_get_page+0xb5/0x240
<4> [2079.622475]  filemap_fault+0x6e5/0x9c0
<4> [2079.622498]  ? filemap_map_pages+0x1cd/0x560
<4> [2079.622539]  ? ext4_filemap_fault+0x22/0x39
<4> [2079.622586]  ext4_filemap_fault+0x2a/0x39
<4> [2079.622605]  __do_fault+0x4a/0xa0
<4> [2079.622632]  __handle_mm_fault+0xa0f/0xf80
<4> [2079.622699]  handle_mm_fault+0x159/0x350
<4> [2079.622733]  __do_page_fault+0x2bb/0x4f0
<4> [2079.622774]  page_fault+0x34/0x40
<4> [2079.622792] RIP: 0033:0x7f0080b13fd3
<4> [2079.622819] Code: Bad RIP value.
<4> [2079.622832] RSP: 002b:00007f0061cbf6e0 EFLAGS: 00010206
<4> [2079.622849] RAX: 00007f007825e4f0 RBX: 00007f007825d800 RCX: 0000000000002744
<4> [2079.622860] RDX: 0000000000002745 RSI: 00000000000001a1 RDI: 00000000c3ec70d8
<4> [2079.622872] RBP: 00007f0061cbf740 R08: 00007ffd29dc0090 R09: 00007f0061cbf770
<4> [2079.622884] R10: 00007f006955d838 R11: 00000000c3ee2860 R12: 00000000000000c7
<4> [2079.622896] R13: 00007f0061cbf750 R14: 0000000000000000 R15: 00007f007825d800
<4> [2079.623119] Mem-Info:
<4> [2079.623164] active_anon:43953 inactive_anon:2312 isolated_anon:0
 active_file:61 inactive_file:129 isolated_file:0
 unevictable:108514 dirty:0 writeback:0 unstable:0
 slab_reclaimable:46619 slab_unreclaimable:752469
 mapped:346 shmem:110826 pagetables:1516 bounce:0
 free:22460 free_pcp:341 free_cma:0
<4> [2079.623213] Node 0 active_anon:175812kB inactive_anon:9248kB active_file:244kB inactive_file:516kB unevictable:434056kB isolated(anon):0kB isolated(file):0kB mapped:1384kB dirty:0kB writeback:0kB shmem:443304kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 108544kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
<4> [2079.623332] DMA free:15400kB min:276kB low:344kB high:412kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:176kB writepending:0kB present:15996kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
<4> [2079.623355] lowmem_reserve[]: 0 1823 3783 3783
<4> [2079.623444] DMA32 free:40100kB min:32444kB low:40552kB high:48660kB active_anon:0kB inactive_anon:0kB active_file:28kB inactive_file:64kB unevictable:337220kB writepending:0kB present:1992708kB managed:1936720kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:496kB local_pcp:248kB free_cma:0kB
<4> [2079.623465] lowmem_reserve[]: 0 0 1959 1959
<4> [2079.623564] Normal free:34340kB min:34856kB low:43568kB high:52280kB active_anon:175812kB inactive_anon:9248kB active_file:0kB inactive_file:384kB unevictable:96660kB writepending:0kB present:2080768kB managed:2006912kB mlocked:0kB kernel_stack:2672kB pagetables:6064kB bounce:0kB free_pcp:868kB local_pcp:124kB free_cma:0kB
<4> [2079.623586] lowmem_reserve[]: 0 0 0 0
<4> [2079.623653] DMA: 2*4kB (U) 0*8kB 2*16kB (UE) 0*32kB 2*64kB (UE) 1*128kB (U) 1*256kB (E) 1*512kB (E) 2*1024kB (UE) 2*2048kB (ME) 2*4096kB (M) = 15400kB
<4> [2079.623836] DMA32: 5*4kB (UME) 3*8kB (UE) 1*16kB (E) 13*32kB (ME) 14*64kB (UME) 8*128kB (ME) 5*256kB (UM) 3*512kB (UM) 6*1024kB (ME) 4*2048kB (ME) 5*4096kB (M) = 40028kB
<4> [2079.623982] Normal: 1501*4kB (UMEH) 866*8kB (ME) 423*16kB (MEH) 197*32kB (UMEH) 88*64kB (UMEH) 21*128kB (UMH) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 34324kB
<6> [2079.624110] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
<4> [2079.624130] 111038 total pagecache pages
<4> [2079.624153] 0 pages in swap cache
<4> [2079.624176] Swap cache stats: add 0, delete 0, find 0/0
<4> [2079.624197] Free swap  = 0kB
<4> [2079.624324] Total swap = 0kB
<4> [2079.624344] 1022368 pages RAM
<4> [2079.624364] 0 pages HighMem/MovableOnly
<4> [2079.624384] 32484 pages reserved
<6> [2079.624408] Unreclaimable slab info:
<6> [2079.624431] Name                      Used          Total
<6> [2079.624463] i915_vma              115187KB     115189KB
<6> [2079.624491] i915_priolist              0KB          8KB
<6> [2079.624519] i915_dependency            0KB          8KB
<6> [2079.624549] drm_i915_gem_object     409648KB     409656KB
<6> [2079.624578] i915_lut_handle        38949KB      38950KB
<6> [2079.624606] intel_context           3594KB       3606KB
<6> [2079.624640] active_node                0KB         32KB
<6> [2079.624710] bio-2                      1KB         15KB
<6> [2079.624764] fib6_nodes                 3KB          8KB
<6> [2079.624792] ip6_dst_cache              5KB         15KB
<6> [2079.624824] RAWv6                     16KB         31KB
<6> [2079.624856] UDPv6                      0KB         31KB
<6> [2079.624893] TCPv6                      3KB         31KB
<6> [2079.624937] sd_ext_cdb                 0KB          7KB
<6> [2079.624965] sgpool-128                 8KB         31KB
<6> [2079.624993] sgpool-64                  4KB         31KB
<6> [2079.625020] sgpool-32                  2KB         31KB
<6> [2079.625048] sgpool-16                  1KB         15KB
<6> [2079.625076] sgpool-8                   1KB         15KB
<6> [2079.625105] mqueue_inode_cache          1KB         30KB
<6> [2079.625144] jbd2_inode                 6KB         31KB
<6> [2079.625175] ext4_system_zone           9KB         15KB
<6> [2079.625203] ext4_bio_post_read_ctx         52KB         54KB
<6> [2079.625385] bio-1                      2KB         15KB
<6> [2079.625407] posix_timers_cache          0KB         31KB
<6> [2079.625423] iommu_devinfo             11KB         16KB
<6> [2079.625439] iommu_domain              45KB         63KB
<6> [2079.625455] iommu_iova             50475KB      50478KB
<6> [2079.625473] UNIX                     187KB        223KB
<6> [2079.625494] tcp_bind_bucket            1KB          8KB
<6> [2079.625510] inet_peer_cache            1KB         15KB
<6> [2079.625531] ip_fib_trie                3KB          7KB
<6> [2079.625547] ip_fib_alias               3KB          7KB
<6> [2079.625564] ip_dst_cache               7KB         47KB
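
For triage, the "Unreclaimable slab info" dump above can be ranked by size with a short script. What follows is a minimal Python sketch, assuming the dmesg line format shown in this report (name, used, total, each size suffixed with KB). The function names and the abbreviated sample are illustrative only and are not part of any CI tooling.

```python
import re

# Matches lines such as "<6> [2079.624463] i915_vma   115187KB   115189KB".
# The header line ("Name  Used  Total") has no KB figures and is skipped.
SLAB_LINE = re.compile(r"\]\s+(\S+)\s+(\d+)KB\s+(\d+)KB\s*$")


def parse_unreclaimable_slabs(log_text):
    """Return {cache_name: (used_kb, total_kb)} parsed from an OOM slab dump."""
    caches = {}
    for line in log_text.splitlines():
        match = SLAB_LINE.search(line)
        if match:
            name, used, total = match.groups()
            caches[name] = (int(used), int(total))
    return caches


def top_caches(caches, count=3):
    """Return the `count` largest caches by used size, largest first."""
    ranked = sorted(caches.items(), key=lambda item: item[1][0], reverse=True)
    return ranked[:count]


# Abbreviated sample copied from the log in this report.
sample = """\
<6> [2079.624408] Unreclaimable slab info:
<6> [2079.624431] Name                      Used          Total
<6> [2079.624463] i915_vma              115187KB     115189KB
<6> [2079.624549] drm_i915_gem_object     409648KB     409656KB
<6> [2079.624578] i915_lut_handle        38949KB      38950KB
<6> [2079.624606] intel_context           3594KB       3606KB
<6> [2079.625455] iommu_iova             50475KB      50478KB
<6> [2079.625473] UNIX                     187KB        223KB
"""

ranked = top_caches(parse_unreclaimable_slabs(sample))
for name, (used_kb, total_kb) in ranked:
    print(f"{name}: {used_kb} KB used of {total_kb} KB")
```

Among the caches quoted in this report, drm_i915_gem_object (409648KB) and i915_vma (115187KB) are the two largest.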
Comment 2 Chris Wilson 2019-10-01 11:44:28 UTC
We've disabled kmemleak in normal builds (i.e. it should still be enabled for kasan; check with Tomi for precise details, or check the kconfig of the relevant run). Bisect is ongoing to find how/why memory usage suddenly exploded.
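
Checking the kconfig of a run for kmemleak can be scripted. This is a minimal Python sketch, assuming the usual kernel .config text format ("CONFIG_FOO=y" when enabled, "# CONFIG_FOO is not set" when disabled) and that kmemleak is controlled by CONFIG_DEBUG_KMEMLEAK. The helper name and the two sample strings are hypothetical and do not come from the CI run.

```python
def kconfig_enabled(kconfig_text, option):
    """Return True if `option` is built in (=y) in kernel .config text."""
    wanted = f"{option}=y"
    for line in kconfig_text.splitlines():
        # Disabled options appear as comments ("# CONFIG_FOO is not set"),
        # so an exact match on "CONFIG_FOO=y" is enough here.
        if line.strip() == wanted:
            return True
    return False


# Hypothetical config fragments, for illustration only.
enabled_sample = "CONFIG_KASAN=y\nCONFIG_DEBUG_KMEMLEAK=y\n"
disabled_sample = "CONFIG_KASAN=y\n# CONFIG_DEBUG_KMEMLEAK is not set\n"

print(kconfig_enabled(enabled_sample, "CONFIG_DEBUG_KMEMLEAK"))
print(kconfig_enabled(disabled_sample, "CONFIG_DEBUG_KMEMLEAK"))
```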
Comment 3 Chris Wilson 2019-10-01 12:53:05 UTC
ickle@broadwell:~/linux$ git bisect bad
c5665868183fec689dbab9fb8505188b2c4f0757 is the first bad commit
commit c5665868183fec689dbab9fb8505188b2c4f0757
Author: Catalin Marinas <catalin.marinas@arm.com>
Date:   Mon Sep 23 15:34:05 2019 -0700

    mm: kmemleak: use the memory pool for early allocations
    
    Currently kmemleak uses a static early_log buffer to trace all memory
    allocation/freeing before the slab allocator is initialised.  Such early
    log is replayed during kmemleak_init() to properly initialise the kmemleak
    metadata for objects allocated up that point.  With a memory pool that
    does not rely on the slab allocator, it is possible to skip this early log
    entirely.
    
    In order to remove the early logging, consider kmemleak_enabled == 1 by
    default while the kmem_cache availability is checked directly on the
    object_cache and scan_area_cache variables.  The RCU callback is only
    invoked after object_cache has been initialised as we wouldn't have any
    concurrent list traversal before this.
    
    In order to reduce the number of callbacks before kmemleak is fully
    initialised, move the kmemleak_init() call to mm_init().
    
    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: remove WARN_ON(), per Catalin]
    Link: http://lkml.kernel.org/r/20190812160642.52134-4-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Qian Cai <cai@lca.pw>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Comment 4 Chris Wilson 2019-10-01 14:00:57 UTC
Reverts cleanly, but still OOMs. Hmm, maybe I've taken a wrong turn in bisecting.
Comment 5 Chris Wilson 2019-10-02 09:50:17 UTC
Fwiw, disabling kmemleak had the desired effect of preventing the incompletes during shard runs.
Comment 6 Martin Peres 2019-11-29 19:36:39 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/471.

