Bug 109801 - [CI][DRMTIP] igt@gem_ppgtt@blt-vs-render-ctx[0n] - dmesg-warn/dmesg-fail/incomplete - swap is full
Summary: [CI][DRMTIP] igt@gem_ppgtt@blt-vs-render-ctx[0n] - dmesg-warn/dmesg-fail/incomplete - swap is full
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other
OS: All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-28 15:26 UTC by Lakshmi
Modified: 2019-12-02 11:12 UTC
CC: 1 user

See Also:
i915 platform: ICL, KBL
i915 features: GEM/Other


Attachments

Description Lakshmi 2019-02-28 15:26:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_232/fi-kbl-guc/igt@gem_ppgtt@blt-vs-render-ctxn.html

<6> [143.802548] [IGT] gem_ppgtt: executing
<6> [143.805225] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctxN
<7> [143.817359] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [143.817378] [drm:intel_power_well_enable [i915]] enabling DC off
<7> [143.817396] [drm:gen9_set_dc_state [i915]] Setting DC state from 02 to 00
<4> [173.507826] gem_ppgtt: page allocation failure: order:0, mode:0x6204d2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_RECLAIMABLE), nodemask=(null)
<4> [173.507909] CPU: 5 PID: 1249 Comm: gem_ppgtt Tainted: G     U            5.0.0-rc8-gcf7f9ddffea0-drmtip_232+ #1
<4> [173.507910] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018
<4> [173.507911] Call Trace:
<4> [173.507915]  dump_stack+0x67/0x9b
<4> [173.507918]  warn_alloc+0xfa/0x180
<4> [173.507922]  ? __mutex_unlock_slowpath+0x46/0x2b0
<4> [173.507925]  __alloc_pages_nodemask+0xda7/0x1110
<4> [173.507933]  shmem_alloc_and_acct_page+0x6f/0x1d0
<4> [173.507936]  shmem_getpage_gfp.isra.8+0x172/0xd10
<4> [173.507941]  shmem_read_mapping_page_gfp+0x3e/0x70
<4> [173.507972]  i915_gem_object_get_pages_gtt+0x203/0x680 [i915]
<4> [173.507994]  ? __i915_gem_object_get_pages+0x18/0xb0 [i915]
<4> [173.507998]  ? lock_acquire+0xa6/0x1c0
<4> [173.508018]  ? i915_gem_set_domain_ioctl+0x1/0x420 [i915]
<4> [173.508036]  ____i915_gem_object_get_pages+0x1d/0xa0 [i915]
<4> [173.508058]  __i915_gem_object_get_pages+0x59/0xb0 [i915]
<4> [173.508084]  i915_gem_set_domain_ioctl+0x34d/0x420 [i915]
<4> [173.508110]  ? i915_gem_obj_prepare_shmem_write+0x280/0x280 [i915]
<4> [173.508114]  drm_ioctl_kernel+0x83/0xf0
<4> [173.508119]  drm_ioctl+0x2f3/0x3b0
<4> [173.508146]  ? i915_gem_obj_prepare_shmem_write+0x280/0x280 [i915]
<4> [173.508153]  ? lock_acquire+0xa6/0x1c0
<4> [173.508158]  do_vfs_ioctl+0xa0/0x6e0
<4> [173.508164]  ksys_ioctl+0x35/0x60
<4> [173.508168]  __x64_sys_ioctl+0x11/0x20
<4> [173.508171]  do_syscall_64+0x55/0x190
<4> [173.508174]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [173.508176] RIP: 0033:0x7f5aacabb5d7
<4> [173.508179] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4> [173.508180] RSP: 002b:00007ffe99650748 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [173.508183] RAX: ffffffffffffffda RBX: 000055f7bc15e960 RCX: 00007f5aacabb5d7
<4> [173.508184] RDX: 00007ffe99650780 RSI: 00000000400c645f RDI: 000000000000000a
<4> [173.508185] RBP: 00007ffe99650780 R08: 000055f7bc159f98 R09: 00007f5aa0930000
<4> [173.508186] R10: 000055f7bc12c010 R11: 0000000000000246 R12: 00000000400c645f
<4> [173.508187] R13: 000000000000000a R14: 00007ffe99650780 R15: 0000000000000000
<4> [173.508232] Mem-Info:
<4> [173.508237] active_anon:101076 inactive_anon:1792281 isolated_anon:120
 active_file:392 inactive_file:213 isolated_file:0
 unevictable:15581 dirty:4 writeback:0 unstable:0
 slab_reclaimable:19495 slab_unreclaimable:46033
 mapped:3368 shmem:1858442 pagetables:2148 bounce:0
 free:27485 free_pcp:2308 free_cma:0
<4> [173.508241] Node 0 active_anon:404304kB inactive_anon:7169124kB active_file:1568kB inactive_file:852kB unevictable:62324kB isolated(anon):480kB isolated(file):0kB mapped:13472kB dirty:16kB writeback:0kB shmem:7433768kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 83968kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
<4> [173.508246] DMA free:15884kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15884kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
<4> [173.508248] lowmem_reserve[]: 0 3007 7788 7788
<4> [173.508257] DMA32 free:46600kB min:26040kB low:32548kB high:39056kB active_anon:92204kB inactive_anon:2933852kB active_file:220kB inactive_file:508kB unevictable:27836kB writepending:0kB present:3145048kB managed:3144764kB mlocked:0kB kernel_stack:16kB pagetables:12kB bounce:0kB free_pcp:3880kB local_pcp:388kB free_cma:0kB
<4> [173.508259] lowmem_reserve[]: 0 0 4781 4781
<4> [173.508268] Normal free:47456kB min:47552kB low:57904kB high:68256kB active_anon:309988kB inactive_anon:4235424kB active_file:1816kB inactive_file:1104kB unevictable:33036kB writepending:0kB present:5095424kB managed:4900620kB mlocked:0kB kernel_stack:4160kB pagetables:8580kB bounce:0kB free_pcp:5328kB local_pcp:476kB free_cma:0kB
<4> [173.508270] lowmem_reserve[]: 0 0 0 0
<4> [173.508274] DMA: 3*4kB (U) 2*8kB (U) 3*16kB (U) 0*32kB 3*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
<4> [173.508284] DMA32: 101*4kB (UME) 48*8kB (UME) 61*16kB (UME) 68*32kB (UME) 39*64kB (UME) 24*128kB (UME) 27*256kB (UME) 14*512kB (ME) 6*1024kB (M) 6*2048kB (UME) 1*4096kB (M) = 46116kB
<4> [173.508296] Normal: 1568*4kB (UMH) 1016*8kB (UMH) 622*16kB (UMEH) 247*32kB (UMEH) 122*64kB (UMEH) 58*128kB (UMEH) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47488kB
<6> [173.508307] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
<4> [173.508308] 1870700 total pagecache pages
<4> [173.508312] 11729 pages in swap cache
<4> [173.508314] Swap cache stats: add 1529938, delete 1517393, find 0/0
<4> [173.508315] Free swap  = 0kB
<4> [173.508317] Total swap = 2097148kB
<4> [173.508318] 2064115 pages RAM
<4> [173.508320] 0 pages HighMem/MovableOnly
<4> [173.508321] 48798 pages reserved
<6> [176.293417] gem_ppgtt (1252) used greatest stack depth: 10264 bytes left
<7> [178.118341] [drm:intel_power_well_disable [i915]] disabling DC off
<7> [178.118363] [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [178.118382] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
<7> [178.118849] [drm:intel_power_well_disable [i915]] disabling always-on
<7> [178.461575] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [178.461594] [drm:intel_power_well_enable [i915]] enabling DC off
<7> [178.461612] [drm:gen9_set_dc_state [i915]] Setting DC state from 02 to 00
<6> [178.473594] [IGT] gem_ppgtt: exiting, ret=0
<5> [178.474666] Setting dangerous option reset - tainting kernel
<7> [178.475210] [drm:intel_power_well_disable [i915]] disabling DC off
<7> [178.475229] [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [178.475245] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
<7> [178.475710] [drm:intel_power_well_disable [i915]] disabling always-on
Comment 1 CI Bug Log 2019-02-28 15:31:26 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* igt@gem_ppgtt@blt-vs-render-ctxn - dmesg-warn - Free swap  = 0kB
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_232/fi-kbl-guc/igt@gem_ppgtt@blt-vs-render-ctxn.html
Comment 2 Chris Wilson 2019-03-11 10:11:27 UTC
So fingers crossed this is just bad resource estimation and not a true leak:

commit 4e7296aa879350b10a216b88fa7f44d919765765 (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Feb 28 15:30:43 2019 +0000

    i915/gem_ppgtt: Estimate resource usage and bail if it means swapping!
    
    fi-kbl-guc's swap ran dry while running blt-vs-render-ctxN, which is
    mildly concerning but conceivable as we never checked there was enough
    memory to run the test to begin with.
    
    Each child needs to keep its own surface and possibly a pair of logical
    contexts (one for rcs and one for bcs) so check that there is enough
    memory to allow all children to co-exist. During execution, we require
    another surface and batch, but these are temporary and so should fit
    fine with a small amount of thrashing on the boundary.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=109801
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
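
As a rough illustration of the estimate-and-bail approach the commit describes (this is not the actual IGT change; the per-child figures and the layout below are assumptions), one could check available memory before forking the children and skip the test if the workload would not fit without swapping:

/*
 * Illustrative sketch only: estimate how much memory all children will
 * need and bail out early if it would force the machine into swap.
 * NUM_CHILDREN, SURFACE_SIZE and CONTEXT_SIZE are assumed values.
 */
#include <stdio.h>
#include <stdlib.h>

#define NUM_CHILDREN   8
#define SURFACE_SIZE   (1ULL << 20)   /* ~1 MiB surface per child (assumed) */
#define CONTEXT_SIZE   (64ULL << 10)  /* rough per-context estimate (assumed) */

/* Read MemAvailable from /proc/meminfo and return it in bytes. */
static unsigned long long mem_available_bytes(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    unsigned long long kb = 0;

    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "MemAvailable: %llu kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb * 1024;
}

int main(void)
{
    /* Each child keeps a surface and possibly two logical contexts. */
    unsigned long long needed =
        NUM_CHILDREN * (SURFACE_SIZE + 2 * CONTEXT_SIZE);

    if (mem_available_bytes() < needed) {
        fprintf(stderr, "skip: not enough free RAM (%llu bytes needed)\n",
                needed);
        return 77; /* conventional "skipped" exit status */
    }

    /* ... run the blt-vs-render children here ... */
    return 0;
}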
Comment 3 Chris Wilson 2019-03-11 14:32:37 UTC
So we have 8 children that only need ~1MiB each, and we still run out of RAM! Something doesn't add up...

<6> [1625.670979] Purging GPU memory, 0 pages freed, 9454 pages still pinned.
<4> [1625.671796] gem_ppgtt invoked oom-killer: gfp_mask=0x6042c0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP), order=1, oom_score_adj=1000
<4> [1625.671839] CPU: 0 PID: 7853 Comm: gem_ppgtt Tainted: G     U            5.0.0-CI-CI_DRM_5731+ #1
<4> [1625.671843] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [1625.671845] Call Trace:
<4> [1625.671853]  dump_stack+0x67/0x9b
<4> [1625.671859]  dump_header+0x52/0x58e
<4> [1625.671866]  ? _raw_spin_unlock_irqrestore+0x55/0x60
<4> [1625.671872]  oom_kill_process+0x309/0x3a0
<4> [1625.671880]  out_of_memory+0x107/0x390
<4> [1625.671886]  __alloc_pages_nodemask+0xd6c/0x1110
<4> [1625.671905]  new_slab+0x3ad/0x520
<4> [1625.671914]  ___slab_alloc.constprop.35+0x2d3/0x380
<4> [1625.671918]  ? __sg_alloc_table+0x7f/0x150
<4> [1625.671929]  ? __sg_alloc_table+0x7f/0x150
<4> [1625.671931]  ? __sg_alloc_table+0x7f/0x150
<4> [1625.671936]  ? __slab_alloc.isra.28.constprop.34+0x3d/0x70
<4> [1625.671939]  __slab_alloc.isra.28.constprop.34+0x3d/0x70
<4> [1625.671945]  __kmalloc+0x29f/0x2e0
<4> [1625.671951]  __sg_alloc_table+0x7f/0x150
<4> [1625.671955]  ? sg_init_one+0xa0/0xa0
<4> [1625.671962]  sg_alloc_table+0x21/0xb0
<4> [1625.672019]  i915_sg_trim+0x3d/0x180 [i915]
<4> [1625.672090]  i915_gem_object_get_pages_gtt+0x2dd/0x6d0 [i915]
<4> [1625.672161]  ? __i915_gem_object_get_pages+0x18/0xb0 [i915]
<4> [1625.672173]  ? lock_acquire+0xa6/0x1c0
<4> [1625.672235]  ? i915_gem_obj_prepare_shmem_write+0x1b1/0x280 [i915]
<4> [1625.672298]  ____i915_gem_object_get_pages+0x1d/0xa0 [i915]
<4> [1625.672359]  __i915_gem_object_get_pages+0x59/0xb0 [i915]
<4> [1625.672421]  i915_gem_set_domain_ioctl+0x34d/0x420 [i915]
<4> [1625.672484]  ? i915_gem_obj_prepare_shmem_write+0x280/0x280 [i915]
<4> [1625.672492]  drm_ioctl_kernel+0x83/0xf0
<4> [1625.672502]  drm_ioctl+0x2f3/0x3b0
<4> [1625.672565]  ? i915_gem_obj_prepare_shmem_write+0x280/0x280 [i915]
<4> [1625.672581]  ? lock_acquire+0xa6/0x1c0
<4> [1625.672591]  do_vfs_ioctl+0xa0/0x6e0
<4> [1625.672605]  ksys_ioctl+0x35/0x60
<4> [1625.672614]  __x64_sys_ioctl+0x11/0x20
<4> [1625.672618]  do_syscall_64+0x55/0x190
<4> [1625.672624]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [1625.672628] RIP: 0033:0x7fe8ba8315d7
<4> [1625.672634] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4> [1625.672638] RSP: 002b:00007ffc6c91ff58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [1625.672643] RAX: ffffffffffffffda RBX: 000056227dfa3580 RCX: 00007fe8ba8315d7
<4> [1625.672646] RDX: 00007ffc6c91ff90 RSI: 00000000400c645f RDI: 0000000000000010
<4> [1625.672649] RBP: 00007ffc6c91ff90 R08: 000056227dfa1018 R09: 00007fe8ae59d000
<4> [1625.672653] R10: 000056227df70010 R11: 0000000000000246 R12: 00000000400c645f
<4> [1625.672656] R13: 0000000000000010 R14: 00007ffc6c91ff90 R15: 0000000000000000
<4> [1625.672846] Mem-Info:
<4> [1625.672864] active_anon:131729 inactive_anon:3788702 isolated_anon:0
 active_file:130 inactive_file:55 isolated_file:0
 unevictable:11283 dirty:0 writeback:0 unstable:0
 slab_reclaimable:32071 slab_unreclaimable:46810
 mapped:2841 shmem:3866099 pagetables:2281 bounce:0
 free:36260 free_pcp:2864 free_cma:0
<4> [1625.672869] Node 0 active_anon:526916kB inactive_anon:15154808kB active_file:520kB inactive_file:220kB unevictable:45132kB isolated(anon):0kB isolated(file):0kB mapped:11364kB dirty:0kB writeback:0kB shmem:15464396kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 153600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
<4> [1625.672874] DMA free:15876kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15876kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
<4> [1625.672876] lowmem_reserve[]: 0 964 15788 15788
<4> [1625.672884] DMA32 free:63260kB min:4124kB low:5152kB high:6180kB active_anon:0kB inactive_anon:979984kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1104624kB managed:1056896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:4512kB local_pcp:500kB free_cma:0kB
<4> [1625.672886] lowmem_reserve[]: 0 0 14823 14823
<4> [1625.672894] Normal free:65904kB min:63392kB low:79240kB high:95088kB active_anon:526916kB inactive_anon:14176132kB active_file:428kB inactive_file:0kB unevictable:44844kB writepending:0kB present:15470592kB managed:15179776kB mlocked:0kB kernel_stack:4608kB pagetables:9124kB bounce:0kB free_pcp:7156kB local_pcp:1348kB free_cma:0kB
<4> [1625.672896] lowmem_reserve[]: 0 0 0 0
<4> [1625.672901] DMA: 1*4kB (U) 0*8kB 2*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15876kB
<4> [1625.672952] DMA32: 11*4kB (ME) 6*8kB (ME) 10*16kB (UM) 21*32kB (UM) 12*64kB (UME) 7*128kB (UM) 7*256kB (ME) 7*512kB (ME) 4*1024kB (M) 5*2048kB (UME) 10*4096kB (M) = 63260kB
<4> [1625.672969] Normal: 2344*4kB (UMEH) 1278*8kB (UMEH) 684*16kB (UME) 371*32kB (UME) 184*64kB (ME) 34*128kB (ME) 14*256kB (ME) 7*512kB (UE) 0*1024kB 0*2048kB 0*4096kB = 65712kB
<6> [1625.672985] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
<4> [1625.672987] 3877427 total pagecache pages
<4> [1625.672994] 11056 pages in swap cache
<4> [1625.672996] Swap cache stats: add 524287, delete 512898, find 0/0
<4> [1625.672999] Free swap  = 0kB
<4> [1625.673001] Total swap = 2097148kB
<4> [1625.673003] 4147802 pages RAM
<4> [1625.673006] 0 pages HighMem/MovableOnly
<4> [1625.673008] 84665 pages reserved
<6> [1625.673010] Tasks state (memory values in pages):
<6> [1625.673013] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
<6> [1625.673153] [    293]     0   293    27267      185   229376        0             0 systemd-journal
<6> [1625.673296] [    319]     0   319    11033      430   114688        0         -1000 systemd-udevd
<6> [1625.673301] [    455]   101   455    17653      141   184320        0             0 systemd-resolve
<6> [1625.673306] [    464]     0   464    17648      196   184320        0             0 systemd-logind
<6> [1625.673311] [    471]     0   471    27620       90   118784        0             0 irqbalance
<6> [1625.673316] [    473]     0   473     1137       16    57344        0             0 acpid
<6> [1625.673321] [    474]     0   474    44724     1980   241664        0             0 networkd-dispat
<6> [1625.673326] [    476]   102   476    65758      484   159744        0             0 rsyslogd
<6> [1625.673331] [    487]   103   487    12577      236   135168        0          -900 dbus-daemon
<6> [1625.673336] [    523]     0   523     2478       32    57344        0             0 rngd
<6> [1625.673341] [    549]     0   549   159622     1205   450560        0             0 NetworkManager
<6> [1625.673346] [    641]     0   641    18073      189   176128        0         -1000 sshd
<6> [1625.673351] [    643]     0   643     6126       34    73728        0             0 agetty
<6> [1625.673356] [    656]     0   656    72218      198   200704        0             0 polkitd
<6> [1625.673360] [    660]     0   660     6414      309    94208        0             0 dhclient
<6> [1625.673365] [   1129]     0  1129    26430      248   237568        0             0 sshd
<6> [1625.673372] [   1134]  1000  1134    19154      264   196608        0             0 systemd
<6> [1625.673377] [   1137]  1000  1137    65300      571   270336        0             0 (sd-pam)
<6> [1625.673381] [   1175]  1000  1175    27173      411   241664        0             0 sshd
<6> [1625.673386] [   1211]  1000  1211  2121546    44098   950272        0             0 java
<6> [1625.673391] [   1340]     0  1340    18483      120   188416        0             0 sudo
<6> [1625.673396] [   1345]     0  1345     2478       26    65536        0             0 rngd
<6> [1625.673402] [   2078]     0  2078    18483      120   192512        0             0 sudo
<6> [1625.673406] [   2083]     0  2083     2478       27    61440        0             0 rngd
<6> [1625.673412] [   5654]     0  5654    93723      428   294912        0             0 packagekitd
<6> [1625.673417] [   6054]  1000  6054     5302       74    86016        0             0 bash
<6> [1625.673422] [   6128]     0  6128    18483      120   180224        0             0 sudo
<6> [1625.673426] [   6133]     0  6133     2478       27    61440        0             0 rngd
<6> [1625.673431] [   6151]  1000  6151     6097       31    81920        0             0 dmesg
<6> [1625.673435] [   6154]     0  6154    18483      121   180224        0             0 sudo
<6> [1625.673440] [   6158]     0  6158    50694      437   421888        0             0 igt_runner
<6> [1625.673456] [   7844]     0  7844    52469     2738   425984        0          1000 gem_ppgtt
<6> [1625.673460] [   7845]     0  7845    50421      467   385024        0          1000 gem_ppgtt
<6> [1625.673465] [   7846]     0  7846    52725      442   385024        0          1000 gem_ppgtt
<6> [1625.673470] [   7847]     0  7847    52725      480   385024        0          1000 gem_ppgtt
<6> [1625.673475] [   7848]     0  7848    52725      677   389120        0          1000 gem_ppgtt
<6> [1625.673480] [   7849]     0  7849    52469      480   385024        0          1000 gem_ppgtt
<6> [1625.673484] [   7850]     0  7850    52469      480   385024        0          1000 gem_ppgtt
<6> [1625.673489] [   7851]     0  7851    52469      480   385024        0          1000 gem_ppgtt
<6> [1625.673494] [   7852]     0  7852    52725      480   385024        0          1000 gem_ppgtt
<6> [1625.673498] [   7853]     0  7853    52725      480   385024        0          1000 gem_ppgtt
Comment 4 CI Bug Log 2019-03-11 14:56:31 UTC
A CI Bug Log filter associated to this bug has been updated:

{- igt@gem_ppgtt@blt-vs-render-ctxn - dmesg-warn - Free swap  = 0kB -}
{+ KBL: igt@gem_ppgtt@blt-vs-render-ctxn - dmesg-warn - Free swap  = 0kB +}

 No new failures caught with the new filter
Comment 5 CI Bug Log 2019-03-11 15:03:49 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL: igt@gem_ppgtt@blt-vs-render-ctxn - dmesg-warn - Free swap  = 0kB -}
{+ KBL ICL: igt@gem_ppgtt@blt-vs-render-ctxn - dmesg-warn - gem_ppgtt: page allocation failure +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5727/shard-iclb3/igt@gem_ppgtt@blt-vs-render-ctxn.html
Comment 7 Chris Wilson 2019-03-15 22:41:36 UTC
I can run gem_ppgtt/blt-vs-render-ctxN in a loop for hours upon hours on kbl. The number of GEM objects stays below 1000 and memory usage is negligible.

I went through the test and rendercopy_gen9 (which is also used for gen11) and nothing looks like a leak, nor does valgrind think there is one (nothing significant, at least).

I suppose if I dump meminfo/slabinfo before the test begins, we should then be able to compare with the allocation failure to see if the shmem usage is solely due to the test.
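
A rough sketch of how that before/after comparison could be captured (illustrative only; the file paths are standard procfs, everything else is an assumption): snapshot /proc/meminfo and /proc/slabinfo before launching the test, then diff against the state in the allocation-failure dump.

/* Illustrative sketch: snapshot /proc/meminfo and /proc/slabinfo around the
 * test run so the counters can be compared with the failure-time dump. */
#include <stdio.h>

/* Copy one procfs file verbatim into the snapshot log with a header line. */
static void snapshot(FILE *log, const char *tag, const char *path)
{
    FILE *src = fopen(path, "r");
    char buf[4096];
    size_t n;

    fprintf(log, "==== %s: %s ====\n", tag, path);
    if (!src) {
        fprintf(log, "(unreadable)\n");
        return;
    }
    while ((n = fread(buf, 1, sizeof(buf), src)) > 0)
        fwrite(buf, 1, n, log);
    fclose(src);
}

int main(void)
{
    FILE *log = fopen("mem-snapshots.log", "w");

    if (!log)
        return 1;

    snapshot(log, "before", "/proc/meminfo");
    snapshot(log, "before", "/proc/slabinfo"); /* usually needs root */

    /* ... run igt@gem_ppgtt@blt-vs-render-ctxN here, e.g. via system() ... */

    snapshot(log, "after", "/proc/meminfo");
    snapshot(log, "after", "/proc/slabinfo");
    fclose(log);
    return 0;
}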

In terms of patches, the one that stands out as being of interest would be

commit 64e3d12f769d60eaee6d2e53a9b7f0b3814f32ed
Author: Kuo-Hsin Yang <vovoy@chromium.org>
Date:   Tue Nov 6 13:23:24 2018 +0000

    mm, drm/i915: mark pinned shmemfs pages as unevictable
Comment 8 CI Bug Log 2019-03-18 06:25:12 UTC
A CI Bug Log filter associated to this bug has been updated:

{- KBL ICL: igt@gem_ppgtt@blt-vs-render-ctxn - dmesg-warn - gem_ppgtt: page allocation failure -}
{+ KBL ICL: igt@gem_ppgtt@blt-vs-render-ctx[0n] - dmesg-warn - gem_ppgtt: page allocation failure +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5755/shard-iclb8/igt@gem_ppgtt@blt-vs-render-ctx0.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5759/shard-iclb3/igt@gem_ppgtt@blt-vs-render-ctx0.html
Comment 9 CI Bug Log 2019-03-20 11:27:05 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@gem_ppgtt@blt-vs-render-ctx - incomplete - No useful logs
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_247/fi-icl-y/igt@gem_ppgtt@blt-vs-render-ctx0.html
Comment 10 CI Bug Log 2019-04-01 07:39:56 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@gem_ppgtt@blt-vs-render-ctx - incomplete - No useful logs -}
{+ ICL: igt@gem_ppgtt@blt-vs-render-ctx - incomplete - No useful logs +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4914/shard-iclb3/igt@gem_ppgtt@blt-vs-render-ctxn.html
Comment 11 Martin Peres 2019-04-23 13:22:52 UTC
Still happening every single run: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_262/fi-icl-y/igt@gem_ppgtt@blt-vs-render-ctx0.html
Comment 12 Francesco Balestrieri 2019-04-29 16:40:31 UTC
Chris,

> I suppose if I dump meminfo/slabinfo before the test begins, we should then be able
> to compare with the allocation failure to see if the shmem usage is solely due to the test.

Should we do just that? It seems to be reproducible fairly easily.
Comment 13 Chris Wilson 2019-04-29 17:09:42 UTC
(In reply to Francesco Balestrieri from comment #12)
> Chris,
> 
> > I suppose if I dump meminfo/slabinfo before the test begins, we should then be able
> > to compare with the allocation failure to see if the shmem usage is solely due to the test.
> 
> Should we do just that? It seems to be reproducible fairly easily.

The CI kernel is missing the slabinfo dump on oom -- not sure which config flag we are missing; I see it on my oomkiller but CI does not. Without that, we don't have much indication of what changed inside the test -- you may as well just log in remotely, look at slabinfo / i915_gem_objects, and decide where to go from there.
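
For anyone doing that interactive triage, the two sources mentioned above can simply be dumped on the target machine; the i915 object summary normally lives in debugfs (the dri/0 path below is the usual location, but the card index is an assumption and may differ per system):

/* Illustrative: dump /proc/slabinfo and the i915 GEM object summary from
 * debugfs for a point-in-time look. Both typically require root. */
#include <stdio.h>

static void dump(const char *path)
{
    FILE *f = fopen(path, "r");
    int c;

    printf("---- %s ----\n", path);
    if (!f) {
        printf("(unreadable)\n");
        return;
    }
    while ((c = fgetc(f)) != EOF)
        putchar(c);
    fclose(f);
}

int main(void)
{
    dump("/proc/slabinfo");
    dump("/sys/kernel/debug/dri/0/i915_gem_objects");
    return 0;
}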

Why, oh why, does this appear to be icl specific?!!!
Comment 14 Francesco Balestrieri 2019-06-03 06:40:47 UTC
> The CI kernel is missing the slabinfo dump on oom -- not sure which config
> flag we are missing; I see it on my oomkiller but CI does not

Martin, Tomi, is this something we could enable in CI?
Comment 15 Chris Wilson 2019-06-11 21:00:46 UTC
This appears to have evaporated into thin air...
Comment 16 CI Bug Log 2019-12-02 11:12:23 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

