Bug 106886 - [CI][SHARDS] igt@drv_suspend@shrink - incomplete - hard hang?
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-11 13:33 UTC by Martin Peres
Modified: 2019-03-08 15:25 UTC

See Also:
i915 platform: BSW/CHT, CFL, G33, HSW, I945GM, KBL, SNB
i915 features: power/suspend-resume


Comment 1 Chris Wilson 2018-06-11 13:38:31 UTC
It looks like we get into a livelock where we make no progress. Some runs report that they keep hitting our shrinking, but a few others just go quiet.
Comment 2 Martin Peres 2018-06-13 12:23:41 UTC
Reproduced on CFL, this time with pstores!

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_61/fi-cfl-u2/igt@drv_suspend@shrink.html
Comment 3 Martin Peres 2018-06-14 07:54:31 UTC
Hmm, on SNB, the test just gets killed by the OOM killer:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4310/shard-snb5/igt@drv_suspend@shrink.html

Stdout:
child 0 died with signal 9, Killed

Dmesg:
<6>[   66.295205] drv_suspend (1451) used greatest stack depth: 11288 bytes left
<4>[   69.647709] python3 invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
<4>[   69.647741] CPU: 1 PID: 1362 Comm: python3 Tainted: G     U            4.17.0-rc7-CI-CI_DRM_4310+ #1
<4>[   69.647743] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
<4>[   69.647744] Call Trace:
<4>[   69.647751]  dump_stack+0x67/0x9b
<4>[   69.647755]  dump_header+0x60/0x42e
<4>[   69.647759]  ? trace_hardirqs_on_caller+0xe0/0x1b0
<4>[   69.647762]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4>[   69.647767]  oom_kill_process+0x2be/0x6d0
<4>[   69.647773]  out_of_memory+0x103/0x390
<4>[   69.647777]  __alloc_pages_nodemask+0xe3f/0x1250
<4>[   69.647791]  filemap_fault+0x276/0x620
<4>[   69.647798]  ext4_filemap_fault+0x27/0x40
<4>[   69.647802]  __do_fault+0x1b/0x80
<4>[   69.647805]  __handle_mm_fault+0x888/0xe30
<4>[   69.647815]  handle_mm_fault+0x196/0x3a0
<4>[   69.647820]  __do_page_fault+0x295/0x590
<4>[   69.647826]  ? page_fault+0x8/0x30
<4>[   69.647829]  page_fault+0x1e/0x30
<4>[   69.647831] RIP: 0033:0x54e734
<4>[   69.647833] RSP: 002b:00007fc52ac6f260 EFLAGS: 00010246
<4>[   69.647836] RAX: 0000000000000000 RBX: 0000000002037950 RCX: 00007fc5397f510d
<4>[   69.647838] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000a8b1c0
<4>[   69.647839] RBP: 0000000002037950 R08: 0000000000a8b180 R09: 0000000000000000
<4>[   69.647841] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc52b473126
<4>[   69.647843] R13: 00007fc52400a8a0 R14: 00007fc52400a790 R15: 0000000000000057
<4>[   69.647852] Mem-Info:
<4>[   69.647856] active_anon:0 inactive_anon:61 isolated_anon:0
                   active_file:113 inactive_file:0 isolated_file:0
                   unevictable:1926764 dirty:0 writeback:0 unstable:0
                   slab_reclaimable:13274 slab_unreclaimable:12073
                   mapped:1926824 shmem:1926766 pagetables:9135 bounce:0
                   free:25273 free_pcp:1 free_cma:0
<4>[   69.647860] Node 0 active_anon:0kB inactive_anon:244kB active_file:452kB inactive_file:0kB unevictable:7707056kB isolated(anon):0kB isolated(file):0kB mapped:7707296kB dirty:0kB writeback:0kB shmem:7707064kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
<4>[   69.647864] DMA free:15360kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15360kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
<4>[   69.647865] lowmem_reserve[]: 0 3050 7771 7771
<4>[   69.647876] DMA32 free:44916kB min:26476kB low:33092kB high:39708kB active_anon:76kB inactive_anon:156kB active_file:140kB inactive_file:616kB unevictable:3056012kB writepending:132kB present:3238728kB managed:3127532kB mlocked:3056012kB kernel_stack:0kB pagetables:11976kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
<4>[   69.647878] lowmem_reserve[]: 0 0 4721 4721
<4>[   69.647888] Normal free:40816kB min:40976kB low:51220kB high:61464kB active_anon:44kB inactive_anon:148kB active_file:0kB inactive_file:300kB unevictable:4650828kB writepending:44kB present:4978688kB managed:4834472kB mlocked:4650828kB kernel_stack:3872kB pagetables:24564kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB
<4>[   69.647890] lowmem_reserve[]: 0 0 0 0
<4>[   69.647898] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
<4>[   69.647933] DMA32: 77*4kB (UM) 39*8kB (ME) 27*16kB (UME) 80*32kB (UME) 14*64kB (ME) 1*128kB (M) 3*256kB (UM) 1*512kB (U) 3*1024kB (UME) 0*2048kB 9*4096kB (UM) = 45852kB
<4>[   69.647967] Normal: 445*4kB (UME) 221*8kB (UME) 192*16kB (UME) 50*32kB (UME) 61*64kB (ME) 38*128kB (ME) 21*256kB (ME) 9*512kB (UME) 14*1024kB (UME) 0*2048kB 0*4096kB = 41308kB
<6>[   69.648050] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
<4>[   69.648053] 1927023 total pagecache pages
<4>[   69.648059] 1 pages in swap cache
<4>[   69.648062] Swap cache stats: add 80511, delete 80577, find 6280/10999
<4>[   69.648065] Free swap  = 1809148kB
<4>[   69.648068] Total swap = 2097148kB
<4>[   69.648071] 2058350 pages RAM
<4>[   69.648073] 0 pages HighMem/MovableOnly
<4>[   69.648076] 64009 pages reserved
<6>[   69.648079] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
<6>[   69.648133] [  239]     0   239    35727        0   303104      174             0 systemd-journal
<6>[   69.648203] [  276]     0   276    11024        1   114688      460         -1000 systemd-udevd
<6>[   69.648209] [  471]   102   471    17652        0   172032      165             0 systemd-resolve
<6>[   69.648214] [  506]   118   506    11814        0   135168      107             0 avahi-daemon
<6>[   69.648219] [  519]   118   519    11769        0   126976       85             0 avahi-daemon
<6>[   69.648225] [  523]     0   523    45273        0   204800      208             0 thermald
<6>[   69.648230] [  524]     0   524   106814        0   335872      356             0 ModemManager
<6>[   69.648236] [  526]   104   526    65759        0   163840      406             0 rsyslogd
<6>[   69.648241] [  529]     0   529     1138        0    53248       41             0 acpid
<6>[   69.648247] [  537]   106   537    12591        1   147456      250          -900 dbus-daemon
<6>[   69.648252] [  548]     0   548    11188        0   131072      138             0 wpa_supplicant
<6>[   69.648258] [  553]     0   553    17646        1   176128      178             0 systemd-logind
<6>[   69.648263] [  554]     0   554     8135        1   114688       74             0 cron
<6>[   69.648269] [  555]     0   555    27617        0   114688       95             0 irqbalance
<6>[   69.648275] [  557]     0   557    42936        2   217088     1999             0 networkd-dispat
<6>[   69.648280] [  592]     0   592    72217        0   204800      729             0 polkitd
<6>[   69.648286] [  610]     0   610   120439        1   421888      678             0 NetworkManager
<6>[   69.648291] [  736]     0   736    18074        0   184320      188         -1000 sshd
<6>[   69.648297] [  737]     0   737     4350        0    77824       37             0 agetty
<6>[   69.648303] [  741]     0   741     6414        1    86016      304             0 dhclient
<6>[   69.648308] [  904]     0   904    26431        1   249856      247             0 sshd
<6>[   69.648314] [  909]  1000   909    19150        1   188416      271             0 systemd
<6>[   69.648319] [  912]  1000   912    28487        0   253952      609             0 (sd-pam)
<6>[   69.648325] [  951]  1000   951    27046        0   249856      290             0 sshd
<6>[   69.648330] [  985]  1000   985  1515142        0   827392    30069             0 java
<6>[   69.648336] [ 1022]  1000  1022     3526        1    69632       72             0 bash
<6>[   69.648341] [ 1091]     0  1091    16707        1   184320      119             0 sudo
<6>[   69.648347] [ 1096]     0  1096     2479        1    65536       33             0 rngd
<6>[   69.648352] [ 1119]  1000  1119     3807        0    73728       43             0 dmesg
<6>[   69.648358] [ 1121]     0  1121    16707        1   163840      120             0 sudo
<6>[   69.648363] [ 1125]     0  1125     1129        0    57344       22             0 owatch
<6>[   69.648369] [ 1126]     0  1126   338543        0   606208    32399             0 python3
<6>[   69.648375] [ 1450]     0  1450  2042986  1926080 15818752      425          1000 drv_suspend
<6>[   69.648380] [ 1452]     0  1452  2042986  1926766 15814656      419          1000 drv_suspend
<3>[   69.648384] Out of memory: Kill process 1452 (drv_suspend) score 1766 or sacrifice child
<3>[   69.648845] Killed process 1452 (drv_suspend) total-vm:8171944kB, anon-rss:0kB, file-rss:0kB, shmem-rss:7707064kB
<6>[   69.651916] oom_reaper: reaped process 1452 (drv_suspend), now anon-rss:0kB, file-rss:0kB, shmem-rss:7707064kB
<4>[   69.656037] in:imklog invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
<4>[   69.656079] CPU: 3 PID: 543 Comm: in:imklog Tainted: G     U            4.17.0-rc7-CI-CI_DRM_4310+ #1
<4>[   69.656082] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
<4>[   69.656085] Call Trace:
<4>[   69.656091]  dump_stack+0x67/0x9b
<4>[   69.656097]  dump_header+0x60/0x42e
<4>[   69.656102]  ? trace_hardirqs_on_caller+0xe0/0x1b0
<4>[   69.656107]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4>[   69.656115]  oom_kill_process+0x2be/0x6d0
<4>[   69.656125]  out_of_memory+0x103/0x390
<4>[   69.656132]  __alloc_pages_nodemask+0xe3f/0x1250
<4>[   69.656157]  __read_swap_cache_async+0x148/0x260
<4>[   69.656166]  swapin_readahead+0x312/0x410
<4>[   69.656177]  ? pagecache_get_page+0x2b/0x210
<4>[   69.656186]  ? do_swap_page+0x2e2/0x910
<4>[   69.656190]  do_swap_page+0x2e2/0x910
<4>[   69.656202]  __handle_mm_fault+0x65e/0xe30
<4>[   69.656218]  handle_mm_fault+0x196/0x3a0
<4>[   69.656226]  __do_page_fault+0x295/0x590
<4>[   69.656238]  page_fault+0x1e/0x30

Should I file another bug?
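
As a rough cross-check of the Mem-Info numbers in the dmesg output above, the following Python sketch converts the page counts to kB and compares the shmem RSS of the killed process against total RAM. It assumes 4 kB pages (not stated in the log, but consistent with the Node 0 line), and the helper names are made up for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, not stated explicitly in the log

# Values copied from the dmesg output above.
MEM_INFO = "unevictable:1926764 dirty:0 writeback:0 unstable:0"
KILLED = ("Killed process 1452 (drv_suspend) total-vm:8171944kB, "
          "anon-rss:0kB, file-rss:0kB, shmem-rss:7707064kB")
PAGES_RAM = 2058350  # from the "2058350 pages RAM" line


def parse_page_counters(line):
    """Parse 'name:value' pairs (values are page counts) into a dict."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def parse_kb_fields(line):
    """Parse 'name:<n>kB' pairs (values are already in kB) into a dict."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


counters = parse_page_counters(MEM_INFO)
rss = parse_kb_fields(KILLED)

# Convert page counts to kB so they can be compared with the per-node lines.
unevictable_kb = counters["unevictable"] * PAGE_KB
total_ram_kb = PAGES_RAM * PAGE_KB

# Share of RAM held as shmem by the process the OOM killer picked.
shmem_fraction = rss["shmem-rss"] / total_ram_kb

print(f"unevictable: {unevictable_kb} kB")
print(f"total RAM:   {total_ram_kb} kB")
print(f"shmem-rss / RAM: {shmem_fraction:.1%}")
```

With these inputs the unevictable page count converts to the same 7707056kB reported on the Node 0 line, and the shmem RSS of the killed drv_suspend process comes to a little under 94% of RAM, which fits the "all_unreclaimable? yes" state in the log.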
Comment 8 Martin Peres 2018-07-12 11:40:13 UTC
Also seen on SNB: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4535/shard-snb4/igt@drv_suspend@shrink.html

This time, there is nothing interesting in the logs, except a crash.
Comment 9 Martin Peres 2019-03-08 15:25:23 UTC
The test is gone. Closing!
Comment 10 CI Bug Log 2019-03-08 15:25:31 UTC
The CI Bug Log issue associated with this bug has been archived.

New failures matching the above filters will no longer be associated with this bug.

