Bug 100353 - [EXT][NO SNB, IVB] gem_tiled_swapping@non-threaded failing or hanging testhosts
Summary: [EXT][NO SNB, IVB] gem_tiled_swapping@non-threaded failing or hanging testhosts
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL: https://patchwork.freedesktop.org/pat...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-23 10:32 UTC by Tomi Sarvela
Modified: 2017-07-25 13:00 UTC (History)
1 user (show)

See Also:
i915 platform: BDW, HSW, ILK, KBL, SKL
i915 features: GEM/Other


Attachments

Description Tomi Sarvela 2017-03-23 10:32:29 UTC
IGT test gem_tiled_swapping@non-threaded has hanged many testhosts for a week.

On a newest DRM-Tip From a fresh boot, dmesg below happens.
8229a8c712c22ff8e94e3244d4fd942a7dcd89af
2017-03-23_09-57-57 drm-tip: 2017y-03m-23d-09h-57m-34s UTC integration manifest

12:27 < ickle> it's genuine
12:27 < ickle> better give me a bug so I don't lalala foever
12:29 < ickle> on current tip, at least it should no longer hang :|

[   50.304609] Console: switching to colour dummy device 80x25
[   50.305082] [IGT] gem_tiled_swapping: executing
[   50.318345] gem_tiled_swapp (6810): drop_caches: 4
[   51.829865] [IGT] gem_tiled_swapping: starting subtest non-threaded
[   51.872167] gem_tiled_swapp: page allocation failure: order:0, mode:0x1000(__GFP_NORETRY), nodemask=(null)
[   51.872183] CPU: 2 PID: 6810 Comm: gem_tiled_swapp Not tainted 4.11.0-rc3-CI-CI_DRM_2388+ #1
[   51.872184] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017
[   51.872185] Call Trace:
[   51.872188]  dump_stack+0x67/0x92
[   51.872191]  warn_alloc+0xd9/0x170
[   51.872194]  __alloc_pages_nodemask+0xe97/0x1260
[   51.872197]  ? __lock_acquire+0x45b/0x1bb0
[   51.872202]  ? __this_cpu_preempt_check+0x13/0x20
[   51.872203]  ? __percpu_counter_add+0x85/0xb0
[   51.872206]  shmem_alloc_and_acct_page+0x1b8/0x380
[   51.872208]  ? page_cache_prev_hole+0x50/0x50
[   51.872211]  shmem_getpage_gfp.isra.9+0x171/0xc40
[   51.872213]  ? synchronize_rcu_expedited+0x35/0xb0
[   51.872217]  shmem_read_mapping_page_gfp+0x37/0x50
[   51.872233]  i915_gem_object_get_pages_gtt+0x3b0/0x600 [i915]
[   51.872247]  ____i915_gem_object_get_pages+0x25/0x60 [i915]
[   51.872259]  __i915_gem_object_get_pages+0x59/0x70 [i915]
[   51.872271]  i915_gem_set_domain_ioctl+0x2f7/0x330 [i915]
[   51.872282]  ? i915_gem_set_domain_ioctl+0x1e2/0x330 [i915]
[   51.872285]  drm_ioctl+0x200/0x450
[   51.872296]  ? i915_gem_obj_prepare_shmem_write+0x200/0x200 [i915]
[   51.872300]  ? up_write+0x1a/0x40
[   51.872303]  do_vfs_ioctl+0x90/0x6e0
[   51.872305]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[   51.872307]  ? __this_cpu_preempt_check+0x13/0x20
[   51.872309]  ? trace_hardirqs_on_caller+0xe7/0x200
[   51.872311]  SyS_ioctl+0x3c/0x70
[   51.872313]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[   51.872315] RIP: 0033:0x7f57f2a48357
[   51.872316] RSP: 002b:00007ffc98788468 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   51.872319] RAX: ffffffffffffffda RBX: ffffffff8147ed53 RCX: 00007f57f2a48357
[   51.872320] RDX: 00007ffc987884a0 RSI: 00000000400c645f RDI: 0000000000000003
[   51.872321] RBP: ffffc90000dc3f88 R08: 0000000000000003 R09: 0000000104600000
[   51.872322] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000003
[   51.872323] R13: 0000000000000003 R14: 00000000400c645f R15: 00007f57f3f1e000
[   51.872325]  ? __this_cpu_preempt_check+0x13/0x20
[   51.872339] Mem-Info:
[   51.872342] active_anon:1190 inactive_anon:68620 isolated_anon:0
                active_file:1176 inactive_file:944 isolated_file:0
                unevictable:3927540 dirty:6 writeback:833 unstable:0
                slab_reclaimable:6942 slab_unreclaimable:14952
                mapped:2138 shmem:18326 pagetables:8873 bounce:0
                free:33830 free_pcp:1105 free_cma:0
[   51.872345] Node 0 active_anon:4760kB inactive_anon:274480kB active_file:4704kB inactive_file:3776kB unevictable:15710160kB isolated(anon):0kB isolated(file):0kB mapped:8552kB dirty:24kB writeback:3332kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 15552512kB anon_thp: 73304kB writeback_tmp:0kB unstable:0kB pages_scanned:384 all_unreclaimable? no
[   51.872347] DMA free:15876kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15884kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   51.872348] lowmem_reserve[]: 0 2848 15860 15860
[   51.872355] DMA32 free:64072kB min:12124kB low:15152kB high:18180kB active_anon:80kB inactive_anon:10240kB active_file:0kB inactive_file:0kB unevictable:2881644kB writepending:0kB present:3058324kB managed:2956336kB mlocked:2891884kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   51.872356] lowmem_reserve[]: 0 0 13012 13012
[   51.872362] Normal free:55372kB min:55392kB low:69240kB high:83088kB active_anon:4776kB inactive_anon:264240kB active_file:4704kB inactive_file:3776kB unevictable:12828332kB writepending:3356kB present:13615104kB managed:13324684kB mlocked:12879532kB slab_reclaimable:27768kB slab_unreclaimable:59800kB kernel_stack:3712kB pagetables:35492kB bounce:0kB free_pcp:4420kB local_pcp:692kB free_cma:0kB
[   51.872363] lowmem_reserve[]: 0 0 0 0
[   51.872368] DMA: 3*4kB (U) 1*8kB (U) 3*16kB (U) 0*32kB 3*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15876kB
[   51.872388] DMA32: 3*4kB (UM) 3*8kB (UM) 3*16kB (UM) 2*32kB (U) 0*64kB 7*128kB (UM) 7*256kB (UM) 10*512kB (UM) 11*1024kB (UM) 2*2048kB (UM) 10*4096kB (M) = 64276kB
[   51.872409] Normal: 77*4kB (UME) 499*8kB (UM) 377*16kB (UM) 179*32kB (UME) 78*64kB (ME) 66*128kB (UME) 68*256kB (UME) 17*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 55612kB
[   51.872430] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   51.872431] 21844 total pagecache pages
[   51.872434] 1209 pages in swap cache
[   51.872435] Swap cache stats: add 28852, delete 27657, find 294/534
[   51.872436] Free swap  = 8157180kB
[   51.872437] Total swap = 8272892kB
[   51.872438] 4172354 pages RAM
[   51.872439] 0 pages HighMem/MovableOnly
[   51.872440] 98128 pages reserved
[   51.902964] [IGT] gem_tiled_swapping: exiting, ret=99
[   51.927222] Console: switching to colour frame buffer device 240x75
Comment 1 Chris Wilson 2017-03-23 10:34:12 UTC
The issue is that we are not adding the newly mmaped objects to the shrinker list and so instead of swapping out objects, we exhaust all physical memory -> ENOMEM on current tip, system dead due to ineffective oomkiller before.
Comment 2 Chris Wilson 2017-03-23 10:38:13 UTC
Hmm, analysis is probably wrong. We are calling gem_set_domain() which does i915_gem_object_bump_inactive_ggtt() and that should be placing the new objects on the shrinker list (i915->mm.unbound_list).
Comment 3 Chris Wilson 2017-03-23 11:27:33 UTC
The shrinker is being run before the failures, e.g.:

[   31.367525] i915_gem_shrinker_count: count = 256 pages (actual 354 pages)
[   31.367832] i915_gem_shrinker_scan: freed = 256 / 128 pages
[   31.367861] i915_gem_shrinker_count: count = 0 pages (actual 98 pages)
[   31.368090] i915_gem_shrinker_count: count = 0 pages (actual 98 pages)
[   31.368124] i915_gem_shrinker_count: count = 0 pages (actual 98 pages)
[   31.368138] gem_tiled_swapp: page allocation failure: order:0, mode:0x1401000(GFP_NOIO|__GFP_NORETRY), nodemask=(null)
[   31.368147] gem_tiled_swapp cpuset=/ mems_allowed=0
[   31.368153] CPU: 0 PID: 747 Comm: gem_tiled_swapp Not tainted 4.11.0-rc3+ #7
[   31.368155] Hardware name:                  /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
[   31.368156] Call Trace:
[   31.368164]  ? dump_stack+0x46/0x5d
[   31.368168]  ? warn_alloc+0xe6/0x180
[   31.368171]  ? __alloc_pages_slowpath+0xab6/0xb40
[   31.368174]  ? set_next_entity+0x6b/0x9c0
[   31.368177]  ? __alloc_pages_nodemask+0x197/0x1b0
[   31.368179]  ? shmem_alloc_and_acct_page+0x140/0x230
[   31.368181]  ? find_get_entry+0x14/0xe0
[   31.368183]  ? shmem_getpage_gfp.isra.56+0x15a/0xae0
[   31.368186]  ? shmem_read_mapping_page_gfp+0x31/0x60
[   31.368216]  ? i915_gem_object_get_pages_gtt+0x20f/0x5f0 [i915]
[   31.368239]  ? ____i915_gem_object_get_pages+0x38/0x70 [i915]
[   31.368262]  ? __i915_gem_object_get_pages+0x4e/0x60 [i915]
[   31.368284]  ? i915_gem_set_domain_ioctl+0x1bd/0x210 [i915]
[   31.368295]  ? drm_ioctl+0x1d1/0x3b0 [drm]
[   31.368317]  ? i915_gem_obj_prepare_shmem_write+0x140/0x140 [i915]
[   31.368320]  ? do_mmap+0x356/0x440
[   31.368323]  ? do_vfs_ioctl+0x86/0x5b0
[   31.368325]  ? SyS_ioctl+0x36/0x70
[   31.368328]  ? entry_SYSCALL_64_fastpath+0x17/0x98
[   31.368330] Mem-Info:
[   31.368336] active_anon:8601 inactive_anon:78273 isolated_anon:45
 active_file:2772 inactive_file:634 isolated_file:0
 unevictable:1872890 dirty:13 writeback:15104 unstable:0
 slab_reclaimable:992 slab_unreclaimable:4998
 mapped:3590 shmem:68205 pagetables:4174 bounce:0
 free:26901 free_pcp:72 free_cma:0
[   31.368341] Node 0 active_anon:34404kB inactive_anon:313092kB active_file:11088kB inactive_file:2536kB unevictable:7491560kB isolated(anon):180kB isolated(file):0kB mapped:14360kB dirty:52kB writeback:60416kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 272820kB writeback_tmp:0kB unstable:0kB pages_scanned:8 all_unreclaimable? no
[   31.368346] DMA free:15892kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   31.368346] lowmem_reserve[]: 0 1800 7803 7803
[   31.368354] DMA32 free:39836kB min:15560kB low:19448kB high:23336kB active_anon:19528kB inactive_anon:287760kB active_file:4kB inactive_file:12kB unevictable:1498088kB writepending:58328kB present:1926492kB managed:1853664kB mlocked:1498088kB slab_reclaimable:620kB slab_unreclaimable:3852kB kernel_stack:0kB pagetables:2920kB bounce:0kB free_pcp:288kB local_pcp:288kB free_cma:0kB
[   31.368355] lowmem_reserve[]: 0 0 6003 6003
[   31.368362] Normal free:51876kB min:51884kB low:64852kB high:77820kB active_anon:14716kB inactive_anon:24904kB active_file:11084kB inactive_file:2364kB unevictable:5993492kB writepending:2100kB present:6291456kB managed:6147088kB mlocked:5993492kB slab_reclaimable:3348kB slab_unreclaimable:16140kB kernel_stack:1648kB pagetables:13776kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   31.368363] lowmem_reserve[]: 0 0 0 0
[   31.368367] DMA: 3*4kB (U) 3*8kB (U) 3*16kB (U) 2*32kB (U) 2*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[   31.368383] DMA32: 5*4kB (ME) 21*8kB (UE) 19*16kB (UE) 15*32kB (UME) 8*64kB (ME) 5*128kB (UME) 8*256kB (UME) 4*512kB (ME) 1*1024kB (M) 0*2048kB 8*4096kB (M) = 40012kB
[   31.368399] Normal: 286*4kB (UM) 111*8kB (UME) 58*16kB (M) 54*32kB (UM) 28*64kB (UME) 2*128kB (E) 3*256kB (ME) 3*512kB (UM) 2*1024kB (ME) 4*2048kB (UME) 8*4096kB (M) = 52048kB
[   31.368416] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   31.368417] 86889 total pagecache pages
[   31.368421] 15150 pages in swap cache
[   31.368422] Swap cache stats: add 36066, delete 20916, find 22/37
[   31.368423] Free swap  = 8089596kB
[   31.368424] Total swap = 8233980kB
[   31.368425] 2058483 pages RAM
[   31.368426] 0 pages HighMem/MovableOnly
[   31.368426] 54322 pages reserved

and that also says we don't have stray objects left lying around. So why is everything "unevictable"?
Comment 4 Chris Wilson 2017-03-23 11:40:47 UTC
Ugh, the test uses igt_lock_mem(). That explains why everything appears to be unevictable.
Comment 5 Chris Wilson 2017-03-23 11:48:21 UTC
Ok, looks like this is the classic oom with swap + writeable pages, and not the untracked object bug as I first feared.
Comment 6 Chris Wilson 2017-06-14 10:37:24 UTC
https://patchwork.freedesktop.org/patch/160838/
Comment 7 Elizabeth 2017-06-23 22:16:09 UTC
(In reply to Chris Wilson from comment #6)
> https://patchwork.freedesktop.org/patch/160838/

Changing the status to NEEDINFO. Thank you.
Comment 8 Ricardo 2017-07-25 13:00:34 UTC
tested again and the test is now passing..


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.