Bug 107886

Summary: [CI][DRMTIP] igt@drv_suspend@shrink - dmesg-warn - Out of memory: Kill process 2520 (drv_suspend) score 1865 or sacrifice child
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BXT, CFL, ICL, SKL i915 features:

Description Martin Peres 2018-09-10 10:48:41 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_106/fi-cfl-s3/igt@drv_suspend@shrink.html

[  319.828988] Out of memory: Kill process 2520 (drv_suspend) score 1865 or sacrifice child
[  319.856766] 1 and 0 pages still available in the bound and unbound GPU page lists.
[  319.856808] oom_kill_process: 291 callbacks suppressed
[  319.856809] java invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[  319.856812] CPU: 3 PID: 2527 Comm: java Tainted: G     U            4.19.0-rc2-g7fddb79b0908-drmtip_106+ #1
[  319.856813] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X118.B19.1802080131 02/08/2018
[  319.856814] Call Trace:
[  319.856818]  dump_stack+0x67/0x9b
[  319.856822]  dump_header+0x60/0x42e
[  319.856825]  ? lockdep_hardirqs_on+0xe0/0x1b0
[  319.856827]  ? _raw_spin_unlock_irqrestore+0x39/0x60
[  319.856830]  oom_kill_process+0x310/0x3a0
[  319.856834]  out_of_memory+0x103/0x380
[  319.856837]  __alloc_pages_nodemask+0xd72/0x1120
[  319.856841]  ? lock_acquire+0xa6/0x1c0
[  319.856848]  __read_swap_cache_async+0x148/0x260
[  319.856852]  read_swap_cache_async+0x23/0x60
[  319.856855]  swapin_readahead+0x14a/0x410
[  319.856858]  ? find_get_entry+0x130/0x210
[  319.856861]  ? pagecache_get_page+0x2b/0x210
[  319.856865]  ? do_swap_page+0x2ea/0x950
[  319.856867]  do_swap_page+0x2ea/0x950
[  319.856871]  __handle_mm_fault+0x666/0xee0
[  319.856878]  handle_mm_fault+0x196/0x3a0
[  319.856881]  __do_page_fault+0x295/0x590
[  319.856885]  ? page_fault+0x8/0x30
[  319.856887]  page_fault+0x1e/0x30
[  319.856889] RIP: 0033:0x7fe5d56ea008
[  319.856892] Code: Bad RIP value.
[  319.856893] RSP: 002b:00007fe5bcf9c550 EFLAGS: 00010202
[  319.856896] RAX: 00007fe5d4f40190 RBX: 0000000000000000 RCX: 00000000000002e0
[  319.856897] RDX: 000000076e6a0050 RSI: ffffffffebd3ee55 RDI: 000000000000fb20
[  319.856898] RBP: 00007fe5bcf9c8f0 R08: ffffffffebd3ee55 R09: 00007fff70198080
[  319.856899] R10: 00000000edcd400a R11: 0000000000000246 R12: 0000000000000000
[  319.856900] R13: 0000000000000000 R14: 0000000770f00730 R15: 00007fe53c002800
[  319.856908] Mem-Info:
[  319.856911] active_anon:58 inactive_anon:20 isolated_anon:0
                active_file:79 inactive_file:0 isolated_file:0
                unevictable:3973918 dirty:0 writeback:11 unstable:0
                slab_reclaimable:20990 slab_unreclaimable:21348
                mapped:3974024 shmem:3973983 pagetables:16956 bounce:0
                free:35394 free_pcp:126 free_cma:0
[  319.856914] Node 0 active_anon:232kB inactive_anon:80kB active_file:316kB inactive_file:0kB unevictable:15895672kB isolated(anon):0kB isolated(file):0kB mapped:15896096kB dirty:0kB writeback:44kB shmem:15895932kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[  319.856916] DMA free:15884kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15884kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  319.856917] lowmem_reserve[]: 0 1362 15846 15846
[  319.856925] DMA32 free:63984kB min:5804kB low:7252kB high:8700kB active_anon:0kB inactive_anon:88kB active_file:0kB inactive_file:60kB unevictable:1382628kB writepending:0kB present:1461268kB managed:1461080kB mlocked:1382628kB kernel_stack:0kB pagetables:7076kB bounce:0kB free_pcp:504kB local_pcp:0kB free_cma:0kB
[  319.856926] lowmem_reserve[]: 0 0 14483 14483
[  319.856933] Normal free:61708kB min:61708kB low:77132kB high:92556kB active_anon:32kB inactive_anon:240kB active_file:324kB inactive_file:0kB unevictable:14513044kB writepending:0kB present:15171584kB managed:14835068kB mlocked:14513044kB kernel_stack:4160kB pagetables:60748kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  319.856933] lowmem_reserve[]: 0 0 0 0
[  319.856939] DMA: 1*4kB (U) 1*8kB (U) 2*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
[  319.856961] DMA32: 89*4kB (M) 84*8kB (UM) 60*16kB (UM) 62*32kB (UM) 3*64kB (UM) 2*128kB (U) 7*256kB (U) 4*512kB (U) 5*1024kB (UE) 1*2048kB (E) 12*4096kB (UM) = 64580kB
[  319.856984] Normal: 1596*4kB (M) 633*8kB (ME) 185*16kB (UME) 23*32kB (UME) 26*64kB (ME) 18*128kB (ME) 13*256kB (ME) 31*512kB (M) 24*1024kB (UME) 0*2048kB 0*4096kB = 62888kB
[  319.857006] 3974243 total pagecache pages
[  319.857009] 11 pages in swap cache
[  319.857010] Swap cache stats: add 180946, delete 180946, find 73971/132995
[  319.857011] Free swap  = 1772540kB
[  319.857012] Total swap = 2097148kB
[  319.857013] 4162211 pages RAM
[  319.857014] 0 pages HighMem/MovableOnly
[  319.857015] 84203 pages reserved
[  319.857173] Out of memory: Kill process 2520 (drv_suspend) score 1865 or sacrifice child
[  319.883973] 1 and 0 pages still available in the bound and unbound GPU page lists.
[  319.884025] in:imklog invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[  319.884029] CPU: 10 PID: 468 Comm: in:imklog Tainted: G     U            4.19.0-rc2-g7fddb79b0908-drmtip_106+ #1
[  319.884031] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X118.B19.1802080131 02/08/2018
[  319.884032] Call Trace:
[  319.884038]  dump_stack+0x67/0x9b
[  319.884042]  dump_header+0x60/0x42e
[  319.884045]  ? lockdep_hardirqs_on+0xe0/0x1b0
[  319.884048]  ? _raw_spin_unlock_irqrestore+0x39/0x60
[  319.884052]  oom_kill_process+0x310/0x3a0
[  319.884056]  out_of_memory+0x103/0x380
[  319.884060]  __alloc_pages_nodemask+0xd72/0x1120
[  319.884066]  ? lock_acquire+0xa6/0x1c0
[  319.884078]  __read_swap_cache_async+0x148/0x260
[  319.884084]  read_swap_cache_async+0x23/0x60
[  319.884087]  swapin_readahead+0x14a/0x410
[  319.884091]  ? find_get_entry+0x130/0x210
[  319.884095]  ? pagecache_get_page+0x2b/0x210
[  319.884100]  ? do_swap_page+0x2ea/0x950
[  319.884103]  do_swap_page+0x2ea/0x950
[  319.884110]  __handle_mm_fault+0x666/0xee0
[  319.884119]  handle_mm_fault+0x196/0x3a0
[  319.884123]  __do_page_fault+0x295/0x590
[  319.884127]  ? page_fault+0x8/0x30
[  319.884130]  page_fault+0x1e/0x30
[  319.884132] RIP: 0033:0x561b707cb580
[  319.884135] Code: Bad RIP value.
[  319.884136] RSP: 002b:00007f42be67e3b8 EFLAGS: 00010206
[  319.884139] RAX: 0000000000000000 RBX: 00007f42be67e418 RCX: 00007f42be67ed03
[  319.884140] RDX: 000000000000003e RSI: 000000000000003e RDI: 0000000000000260
[  319.884141] RBP: 00007f42be67e418 R08: 00007f42c0647cc0 R09: a3d70a3d70a3d70b
[  319.884143] R10: 2ce33e6c02ce33e7 R11: 0000000000000000 R12: 0000000000000006
[  319.884144] R13: 00007f42be67ed03 R14: 0000000000001f9f R15: 00007f42be67ed4e
[  319.884192] Mem-Info:
[  319.884196] active_anon:48 inactive_anon:66 isolated_anon:0
                active_file:74 inactive_file:0 isolated_file:0
                unevictable:3973918 dirty:0 writeback:0 unstable:0
                slab_reclaimable:20990 slab_unreclaimable:21344
                mapped:3973872 shmem:3973983 pagetables:16956 bounce:0
                free:35256 free_pcp:314 free_cma:0
[...]
[  320.031550] Out of memory: Kill process 2520 (drv_suspend) score 1865 or sacrifice child
[  320.032610] 1 and 0 pages still available in the bound and unbound GPU page lists.
[...]
[  324.434191] 1 and 0 pages still available in the bound and unbound GPU page lists.
[  324.434248] Out of memory: Kill process 2520 (drv_suspend) score 1865 or sacrifice child
[  346.573368] sd 3:0:0:0: [sda] Synchronizing SCSI cache
[  346.577069] sd 3:0:0:0: [sda] Stopping disk
[  347.829199] NOHZ: local_softirq_pending 02
[  347.829257] NOHZ: local_softirq_pending 02
[  348.118195]  cache: parent cpu1 should not be sleeping
[  348.121307]  cache: parent cpu2 should not be sleeping
[  348.123788]  cache: parent cpu3 should not be sleeping
[  348.126220]  cache: parent cpu4 should not be sleeping
[  348.128739]  cache: parent cpu5 should not be sleeping
[  348.131493]  cache: parent cpu6 should not be sleeping
[  348.134043]  cache: parent cpu7 should not be sleeping
[  348.136626]  cache: parent cpu8 should not be sleeping
[  348.139199]  cache: parent cpu9 should not be sleeping
[  348.141782]  cache: parent cpu10 should not be sleeping
[  348.144547]  cache: parent cpu11 should not be sleeping
[  348.205952] hpet1: lost 706 rtc interrupts
[  348.216378] ------------[ cut here ]------------
[  348.216384] Delta way too big! 18446743730754416149 ts=18446744055903157564 write stamp = 325148741415
               If you just came from a suspend/resume,
               please switch to the trace global clock:
                 echo global > /sys/kernel/debug/tracing/trace_clock
               or add trace_clock=global to the kernel command line
[  348.216401] WARNING: CPU: 4 PID: 2553 at kernel/trace/ring_buffer.c:2769 rb_handle_timestamp.isra.30+0x6c/0x70
[  348.216405] Modules linked in: vgem i915 cdc_ether usbnet r8152 mii x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000e mei_me mei prime_numbers
[  348.216458] CPU: 4 PID: 2553 Comm: kworker/u24:5 Tainted: G     U            4.19.0-rc2-g7fddb79b0908-drmtip_106+ #1
[  348.216461] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X118.B19.1802080131 02/08/2018
[  348.216468] Workqueue: events_unbound async_run_entry_fn
[  348.216477] RIP: 0010:rb_handle_timestamp.isra.30+0x6c/0x70
[  348.216482] Code: 48 8b 73 08 85 c0 48 8b 13 48 c7 c0 00 4c 07 97 49 c7 c0 a0 0d 08 97 4c 0f 44 c0 48 8b 0f 48 c7 c7 c0 4c 07 97 e8 f4 f6 f2 ff <0f> 0b eb a8 41 54 55 48 63 f6 53 48 8b 87 a0 00 00 00 48 8b 1c f0
[  348.216486] RSP: 0000:ffffa3dc00a0fcd8 EFLAGS: 00010282
[  348.216493] RAX: 0000000000000000 RBX: ffffa3dc00a0fcf0 RCX: 0000000000000002
[  348.216496] RDX: 0000000080000002 RSI: ffffffff970f76ca RDI: 00000000ffffffff
[  348.216500] RBP: ffff8995cbdc20c0 R08: 00000000f3233359 R09: 0000000000000000
[  348.216503] R10: ffffa3dc00a0fce8 R11: 0000000000000000 R12: ffff8995cbc20358
[  348.216507] R13: ffff8995cbdc20e0 R14: 00000000000003e8 R15: 0000000000000000
[  348.216511] FS:  0000000000000000(0000) GS:ffff8995cd900000(0000) knlGS:0000000000000000
[  348.216515] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  348.216519] CR2: 0000000000000000 CR3: 00000001fb210001 CR4: 00000000003606e0
[  348.216523] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  348.216526] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  348.216529] Call Trace:
[  348.216539]  ring_buffer_lock_reserve+0x210/0x3f0
[  348.216615]  ? intel_engines_reset_default_submission+0x30/0x30 [i915]
[  348.216623]  __trace_bputs+0x5b/0x130
[  348.216632]  ? __mutex_unlock_slowpath+0x46/0x2b0
[  348.216704]  intel_engines_sanitize+0x26/0x50 [i915]
[  348.216755]  i915_pm_resume_early+0x9c/0x110 [i915]
[  348.216804]  ? vlv_resume_prepare+0x670/0x670 [i915]
[  348.216813]  dpm_run_callback+0x64/0x280
[  348.216824]  device_resume_early+0xa6/0xe0
[  348.216834]  async_resume_early+0x14/0x40
[  348.216840]  async_run_entry_fn+0x34/0x160
[  348.216849]  process_one_work+0x245/0x610
[  348.216864]  worker_thread+0x37/0x380
[  348.216873]  ? process_one_work+0x610/0x610
[  348.216879]  kthread+0x119/0x130
[  348.216886]  ? kthread_park+0x80/0x80
[  348.216895]  ret_from_fork+0x3a/0x50
[  348.216914] irq event stamp: 3920
[  348.216922] hardirqs last  enabled at (3919): [<ffffffff960fd68e>] vprintk_emit+0x2ee/0x310
[  348.216928] hardirqs last disabled at (3920): [<ffffffff96001910>] trace_hardirqs_off_thunk+0x1a/0x1c
[  348.216934] softirqs last  enabled at (3866): [<ffffffff96c0031d>] __do_softirq+0x31d/0x483
[  348.216939] softirqs last disabled at (3859): [<ffffffff960900f9>] irq_exit+0xa9/0xc0
[  348.216946] WARNING: CPU: 4 PID: 2553 at kernel/trace/ring_buffer.c:2769 rb_handle_timestamp.isra.30+0x6c/0x70
[  348.216949] ---[ end trace 337d81b78116b4be ]---
Comment 1 Chris Wilson 2018-09-10 12:37:19 UTC
The test creatively misuses oom to see what happens if we encounter one while trying to suspend.

Despite Java's best efforts to screw up (WHAT IS IT DOING WHILE THE TEST IS RUNNING! AGAIN!!!) it did not.

This should not have been marked as DMESG-WARN.
Comment 2 Martin Peres 2018-09-10 12:38:07 UTC
(In reply to Chris Wilson from comment #1)
> This should not have been marked as DMESG-WARN.

So, this is a bug then ;)
Comment 5 CI Bug Log 2018-12-31 14:27:18 UTC
A CI Bug Log filter associated to this bug has been updated:

{- APL CFL ICL: igt@drv_suspend@shrink - dmesg-warn - Out of memory: Kill process \d+ (drv_suspend) score \d+ or sacrifice child -}
{+ SKL APL CFL ICL: igt@drv_suspend@shrink - dmesg-warn - Out of memory: Kill process \d+ (drv_suspend) score \d+ or sacrifice child +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_169/fi-skl-gvtdvm/igt@i915_suspend@shrink.html
Comment 6 CI Bug Log 2018-12-31 14:27:39 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL APL CFL ICL: igt@drv_suspend@shrink - dmesg-warn - Out of memory: Kill process \d+ (drv_suspend) score \d+ or sacrifice child -}
{+ SKL APL CFL ICL: igt@drv_suspend@shrink - dmesg-warn - Out of memory: Kill process \d+ (i915_suspend) score \d+ or sacrifice child +}

 No new failures caught with the new filter
Comment 7 Martin Peres 2019-03-06 18:58:30 UTC
The test got removed. Problem solved :D
Comment 8 CI Bug Log 2019-03-06 18:58:43 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.