Bug 108686 - [CI][SHARDS] igt@gem_tiled_swapping@non-threaded - dmesg-warn - page allocation failure
Summary: [CI][SHARDS] igt@gem_tiled_swapping@non-threaded - dmesg-warn - page allocation failure
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other
OS: All
Priority: medium
Severity: normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-07 09:01 UTC by Martin Peres
Modified: 2019-01-21 16:30 UTC
CC: 1 user

See Also:
i915 platform: KBL, SKL
i915 features: GEM/Other


Description Martin Peres 2018-11-07 09:01:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5094/shard-skl9/igt@gem_tiled_swapping@non-threaded.html

<6> [545.153698] Console: switching to colour dummy device 80x25
<6> [545.154241] [IGT] gem_tiled_swapping: executing
<6> [549.923047] [IGT] gem_tiled_swapping: starting subtest non-threaded
<4> [563.412315] gem_tiled_swapp: page allocation failure: order:0, mode:0x40d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE), nodemask=(null)
<4> [563.412643] CPU: 2 PID: 1215 Comm: gem_tiled_swapp Tainted: G     U            4.20.0-rc1-CI-CI_DRM_5094+ #1
<4> [563.412655] Hardware name: Google Caroline/Caroline, BIOS MrChromebox 08/27/2018
<4> [563.412663] Call Trace:
<4> [563.412693]  dump_stack+0x67/0x9b
<4> [563.412716]  warn_alloc+0xee/0x170
<4> [563.412770]  __alloc_pages_nodemask+0xe2d/0x1160
<4> [563.412798]  ? get_partial_node.isra.29+0x1f1/0x460
<4> [563.412863]  new_slab+0x239/0x560
<4> [563.412900]  ___slab_alloc.constprop.34+0x2d3/0x380
<4> [563.412918]  ? xas_nomem+0x38/0x50
<4> [563.412944]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [563.412960]  ? kmem_cache_alloc+0x24d/0x280
<4> [563.412994]  ? xas_nomem+0x38/0x50
<4> [563.413015]  ? __slab_alloc.isra.27.constprop.33+0x3d/0x70
<4> [563.413030]  __slab_alloc.isra.27.constprop.33+0x3d/0x70
<4> [563.413052]  ? xas_nomem+0x38/0x50
<4> [563.413069]  kmem_cache_alloc+0x21c/0x280
<4> [563.413097]  xas_nomem+0x38/0x50
<4> [563.413114]  add_to_swap_cache+0x2ae/0x350
<4> [563.413160]  __read_swap_cache_async+0xed/0x1d0
<4> [563.413191]  swap_cluster_readahead+0x17f/0x250
<4> [563.413244]  ? shmem_swapin+0x7b/0xa0
<4> [563.413259]  shmem_swapin+0x7b/0xa0
<4> [563.413296]  ? find_get_entry+0x1b5/0x2f0
<4> [563.413355]  shmem_getpage_gfp.isra.8+0x75d/0xd90
<4> [563.413415]  shmem_read_mapping_page_gfp+0x3e/0x70
<4> [563.413655]  i915_gem_object_get_pages_gtt+0x1ef/0x660 [i915]
<4> [563.413876]  ____i915_gem_object_get_pages+0x1d/0xa0 [i915]
<4> [563.414064]  __i915_gem_object_get_pages+0x59/0xb0 [i915]
<4> [563.414253]  i915_gem_set_domain_ioctl+0x378/0x450 [i915]
<4> [563.414436]  ? i915_gem_obj_prepare_shmem_write+0x280/0x280 [i915]
<4> [563.414464]  drm_ioctl_kernel+0x81/0xf0
<4> [563.414499]  drm_ioctl+0x2e6/0x3a0
<4> [563.414705]  ? i915_gem_obj_prepare_shmem_write+0x280/0x280 [i915]
<4> [563.414741]  ? vm_mmap_pgoff+0xa3/0xf0
<4> [563.414761]  ? do_mmap+0x3d1/0x570
<4> [563.414798]  do_vfs_ioctl+0xa0/0x6e0
<4> [563.414840]  ksys_ioctl+0x35/0x60
<4> [563.414866]  __x64_sys_ioctl+0x11/0x20
<4> [563.414882]  do_syscall_64+0x55/0x190
<4> [563.414907]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [563.414922] RIP: 0033:0x7f9327ebb5d7
<4> [563.414937] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4> [563.414949] RSP: 002b:00007ffc438eb8c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [563.414965] RAX: ffffffffffffffda RBX: 000055bbaafe746c RCX: 00007f9327ebb5d7
<4> [563.414977] RDX: 00007ffc438eb91c RSI: 00000000400c645f RDI: 0000000000000005
<4> [563.414988] RBP: 00007ffc438eb91c R08: 0000000000000005 R09: 0000000106400000
<4> [563.414999] R10: 0000000000000001 R11: 0000000000000246 R12: 00000000400c645f
<4> [563.415009] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
<4> [563.415208] Mem-Info:
<4> [563.415236] active_anon:30065 inactive_anon:46173 isolated_anon:32
 active_file:1611 inactive_file:1440 isolated_file:0
 unevictable:860417 dirty:13 writeback:126 unstable:0
 slab_reclaimable:6849 slab_unreclaimable:13008
 mapped:2735 shmem:44760 pagetables:3115 bounce:0
 free:22582 free_pcp:472 free_cma:0
<4> [563.415261] Node 0 active_anon:120260kB inactive_anon:184692kB active_file:6444kB inactive_file:5760kB unevictable:3441668kB isolated(anon):128kB isolated(file):0kB mapped:10940kB dirty:52kB writeback:504kB shmem:179040kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 3244032kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
<4> [563.415286] DMA free:15420kB min:272kB low:340kB high:408kB active_anon:0kB inactive_anon:372kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
<4> [563.415294] lowmem_reserve[]: 0 1831 3790 3790
<4> [563.415331] DMA32 free:40264kB min:32528kB low:40660kB high:48792kB active_anon:24kB inactive_anon:1280kB active_file:0kB inactive_file:232kB unevictable:1902592kB writepending:0kB present:1992708kB managed:1945088kB mlocked:1902592kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
<4> [563.415339] lowmem_reserve[]: 0 0 1958 1958
<4> [563.415375] Normal free:34644kB min:34776kB low:43468kB high:52160kB active_anon:120236kB inactive_anon:183040kB active_file:6444kB inactive_file:5528kB unevictable:1539076kB writepending:864kB present:2080768kB managed:2005968kB mlocked:1539076kB kernel_stack:2704kB pagetables:12460kB bounce:0kB free_pcp:1888kB local_pcp:740kB free_cma:0kB
<4> [563.415382] lowmem_reserve[]: 0 0 0 0
<4> [563.415402] DMA: 3*4kB (UE) 2*8kB (UE) 0*16kB 3*32kB (UE) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (E) 2*1024kB (UE) 2*2048kB (ME) 2*4096kB (M) = 15420kB
<4> [563.415475] DMA32: 6*4kB (UME) 4*8kB (UME) 3*16kB (ME) 8*32kB (UM) 7*64kB (ME) 6*128kB (UME) 10*256kB (UME) 7*512kB (UME) 6*1024kB (UME) 3*2048kB (UM) 5*4096kB (M) = 40488kB
<4> [563.415584] Normal: 744*4kB (UMEH) 1114*8kB (MH) 559*16kB (UMH) 251*32kB (UMH) 69*64kB (MEH) 6*128kB (EH) 2*256kB (E) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 34560kB
<6> [563.415695] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
<4> [563.415710] 49537 total pagecache pages
<4> [563.415744] 1724 pages in swap cache
<4> [563.415764] Swap cache stats: add 130568, delete 128835, find 400/644
<4> [563.415779] Free swap  = 871808kB
<4> [563.415794] Total swap = 1392256kB
<4> [563.415809] 1022368 pages RAM
<4> [563.415822] 0 pages HighMem/MovableOnly
<4> [563.415833] 30628 pages reserved
<4> [563.415977] SLUB: Unable to allocate memory on node -1, gfp=0xc0(__GFP_IO|__GFP_FS)
<4> [563.415990]   cache: radix_tree_node, object size: 576, buffer size: 912, default order: 2, min order: 0
<4> [563.416004]   node 0: slabs: 582, objs: 4967, free: 15
<6> [615.568036] java (908) used greatest stack depth: 11736 bytes left
<6> [674.331010] [IGT] gem_tiled_swapping: exiting, ret=0
Comment 1 Chris Wilson 2018-11-07 14:52:34 UTC
Note it didn't cause a rendering error, so it was just a transient allocation failure under heavy mempressure; not really a bug, just the usual noise. Our only goal here is to make sure direct-reclaim is effective enough that we never allow a preventable order-0 allocation failure.
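
For reference, the stack above shows the failing order-0 slab allocation coming from xas_nomem() while add_to_swap_cache() inserts a swap-readahead page into the swap-cache XArray. A minimal sketch of that retry pattern, loosely following the 4.20-era mm code (the function name and details are simplified for illustration, not quoted from the kernel source):

/*
 * Simplified sketch of the swap-cache insertion retry seen in the trace
 * (add_to_swap_cache -> xas_nomem -> kmem_cache_alloc). Approximate.
 */
static int swap_cache_insert_sketch(struct address_space *mapping,
				    struct page *page, pgoff_t idx, gfp_t gfp)
{
	XA_STATE(xas, &mapping->i_pages, idx);

	do {
		xas_lock_irq(&xas);
		/* The store fails with -ENOMEM if no xa_node is available. */
		xas_store(&xas, page);
		xas_unlock_irq(&xas);
		/*
		 * xas_nomem() tries to allocate the missing node with a plain
		 * kmem_cache_alloc(gfp) and returns true to retry the store.
		 * The dmesg splat above is that slab allocation warning; the
		 * gfp mask used on this path does not carry __GFP_NOWARN.
		 */
	} while (xas_nomem(&xas, gfp));

	/*
	 * On failure the caller (__read_swap_cache_async) simply drops the
	 * speculative readahead page, so only a readahead optimisation is
	 * lost; no data or rendering result is affected.
	 */
	return xas_error(&xas);
}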
Comment 2 Martin Peres 2018-11-07 15:03:53 UTC
(In reply to Chris Wilson from comment #1)
> Note it didn't cause a rendering error, so it was just a transient allocation
> failure under heavy mempressure; not really a bug, just the usual noise.
> Our only goal here is to make sure direct-reclaim is effective enough that
> we never allow a preventable order-0 allocation failure.

Shouldn't we hide this message then?

If not, it means we would not catch anything else going wrong whenever this failure happens.

If this is OK, let's close with NOTABUG, but I will keep the issue in cibuglog.
Comment 3 Chris Wilson 2018-11-07 15:17:08 UTC
We already do a couple of allocation passes before we hit the failure message, and the allocation itself will do a couple of attempts before warning. We know we have plenty of memory available, and we know most of it is tied up in GPU buffers, so we know we can do better; what we fail at is ensuring we never run out of kernel reserves.

Mostly trying to say it's not a new problem, nor is it an easy one; but it is a real problem in how the kernel + i915 handle mempressure.
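
To make the "couple of allocation passes" concrete: when i915 pins the shmem backing pages of an object, it first tries a cheap allocation that avoids direct reclaim and warnings, then runs its own shrinker to release GPU buffers and retries, and only falls back to the mapping's full gfp mask as a last resort. A rough sketch of that escalation follows; shmem_read_mapping_page_gfp(), mapping_gfp_constraint(), mapping_gfp_mask() and i915_gem_shrink_all() are existing helpers, while the wrapper function and exact flag choices are approximations of the 4.20-era i915_gem_object_get_pages_gtt(), not the driver source itself:

/*
 * Rough sketch of the escalating allocation strategy used when pinning an
 * object's shmem pages. Flags and structure approximate i915 circa 4.20.
 */
static struct page *get_backing_page_sketch(struct drm_i915_private *i915,
					    struct address_space *mapping,
					    pgoff_t index)
{
	struct page *page;
	gfp_t gfp;

	/* Pass 1: cheap attempt; no direct reclaim, no retries, no warning. */
	gfp = mapping_gfp_constraint(mapping, ~__GFP_RECLAIM);
	gfp |= __GFP_NORETRY | __GFP_NOWARN;
	page = shmem_read_mapping_page_gfp(mapping, index, gfp);
	if (!IS_ERR(page))
		return page;

	/* Pass 2: reclaim our own GPU buffers first, then retry cheaply. */
	i915_gem_shrink_all(i915);
	page = shmem_read_mapping_page_gfp(mapping, index, gfp);
	if (!IS_ERR(page))
		return page;

	/*
	 * Last resort: the mapping's default gfp mask allows direct reclaim
	 * and swapping, and is allowed to warn. Under enough memory pressure
	 * this is where splats like the one above can still appear, even
	 * though the request is only order-0.
	 */
	return shmem_read_mapping_page_gfp(mapping, index,
					   mapping_gfp_mask(mapping));
}

The goal described above is therefore not to hide the warning, but to make the reclaim in the first two passes effective enough that the last-resort path never has to fail.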
Comment 4 Francesco Balestrieri 2018-11-23 11:39:24 UTC
Setting priority to medium based on the discussion above.
Comment 5 Lakshmi 2018-11-29 11:54:15 UTC
Also seen on KBL
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_149/fi-kbl-7560u/igt@gem_tiled_swapping@non-threaded.html

<6> [101.598962] Console: switching to colour dummy device 80x25
<6> [101.599013] [IGT] gem_tiled_swapping: executing
<6> [101.606140] gem_tiled_swapp (1152): drop_caches: 4
<6> [102.928910] [IGT] gem_tiled_swapping: starting subtest non-threaded
<7> [103.519069] [drm:edp_panel_vdd_off_sync [i915]] Turning eDP port A VDD off
<7> [103.519141] [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x00000007
<7> [103.519162] [drm:intel_power_well_disable [i915]] disabling DC off
<7> [103.519183] [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [103.519202] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
<4> [104.717385] gem_tiled_swapp: page allocation failure: order:0, mode:0x40d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE), nodemask=(null)
<4> [104.717423] CPU: 3 PID: 1152 Comm: gem_tiled_swapp Tainted: G     U            4.20.0-rc4-gd04475fd1ae0-drmtip_149+ #1
<4> [104.717425] Hardware name: Dell Inc. XPS 13 9360/0823VW, BIOS 2.3.1 10/03/2017
<4> [104.717426] Call Trace:
<4> [104.717431]  dump_stack+0x67/0x9b
<4> [104.717435]  warn_alloc+0xee/0x170
<4> [104.717441]  __alloc_pages_nodemask+0xddd/0x1170
<4> [104.717452]  new_slab+0x230/0x550
<4> [104.717456]  ___slab_alloc.constprop.34+0x2d3/0x380
<4> [104.717459]  ? xas_nomem+0x38/0x50
<4> [104.717462]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [104.717464]  ? kmem_cache_alloc+0x24d/0x280
<4> [104.717468]  ? xas_nomem+0x38/0x50
<4> [104.717470]  ? __slab_alloc.isra.27.constprop.33+0x3d/0x70
<4> [104.717472]  __slab_alloc.isra.27.constprop.33+0x3d/0x70
<4> [104.717475]  ? xas_nomem+0x38/0x50
<4> [104.717477]  kmem_cache_alloc+0x21c/0x280
<4> [104.717480]  xas_nomem+0x38/0x50
<4> [104.717482]  add_to_swap_cache+0x2ae/0x350
<4> [104.717488]  __read_swap_cache_async+0xed/0x1d0
<4> [104.717492]  swap_cluster_readahead+0x17f/0x250
<4> [104.717499]  ? shmem_swapin+0x7b/0xa0
<4> [104.717501]  shmem_swapin+0x7b/0xa0
<4> [104.717506]  ? find_get_entry+0x1b5/0x2f0
<4> [104.717514]  shmem_getpage_gfp.isra.8+0x76f/0xdb0
<4> [104.717521]  shmem_read_mapping_page_gfp+0x3e/0x70
<4> [104.717555]  i915_gem_object_get_pages_gtt+0x203/0x680 [i915]
<4> [104.717585]  ? __i915_gem_object_get_pages+0x18/0xb0 [i915]
<4> [104.717590]  ? lock_acquire+0xa6/0x1c0
<4> [104.717614]  ____i915_gem_object_get_pages+0x1d/0xa0 [i915]
<4> [104.717636]  __i915_gem_object_get_pages+0x59/0xb0 [i915]
<4> [104.717660]  i915_gem_set_domain_ioctl+0x378/0x450 [i915]
<4> [104.717682]  ? i915_gem_obj_prepare_shmem_write+0x280/0x280 [i915]
<4> [104.717687]  drm_ioctl_kernel+0x81/0xf0
<4> [104.717691]  drm_ioctl+0x2de/0x390
<4> [104.717713]  ? i915_gem_obj_prepare_shmem_write+0x280/0x280 [i915]
<4> [104.717718]  ? vm_mmap_pgoff+0xa3/0xf0
<4> [104.717720]  ? do_mmap+0x3d1/0x570
<4> [104.717726]  do_vfs_ioctl+0xa0/0x6e0
<4> [104.717731]  ksys_ioctl+0x35/0x60
<4> [104.717734]  __x64_sys_ioctl+0x11/0x20
<4> [104.717736]  do_syscall_64+0x55/0x190
<4> [104.717740]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [104.717742] RIP: 0033:0x7fc6c64e95d7
<4> [104.717744] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4> [104.717745] RSP: 002b:00007ffd894537e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [104.717748] RAX: ffffffffffffffda RBX: 0000564165cc146c RCX: 00007fc6c64e95d7
<4> [104.717749] RDX: 00007ffd8945383c RSI: 00000000400c645f RDI: 0000000000000005
<4> [104.717751] RBP: 00007ffd8945383c R08: 0000000000000005 R09: 0000000106400000
<4> [104.717752] R10: 0000000000000001 R11: 0000000000000246 R12: 00000000400c645f
<4> [104.717753] R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
Comment 6 Francesco Balestrieri 2019-01-09 09:14:08 UTC
> If this is OK, let's close with NOTABUG, but I will keep the issue in cibuglog.

Can we resolve as in the above comment?
Comment 7 CI Bug Log 2019-01-21 16:30:29 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@gem_tiled_swapping@non-threaded - dmesg-warn - page allocation failure -}
{+ SKL APL KBL: igt@gem_tiled_swapping@non-threaded - dmesg-warn - page allocation failure +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5456/shard-apl3/igt@gem_tiled_swapping@non-threaded.html

