Bug 101090 - [BAT][BWR] igt@gem_exec_gttfill@basic hit by a page allocation failure
Summary: [BAT][BWR] igt@gem_exec_gttfill@basic hit by a page allocation failure
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-18 13:43 UTC by Martin Peres
Modified: 2017-06-16 17:06 UTC
CC List: 1 user

See Also:
i915 platform: I965G
i915 features: GEM/Other


Description Martin Peres 2017-05-18 13:43:29 UTC
The test igt@gem_exec_gttfill@basic on CI_DRM_2626 on fi-bwr-2160 got hit by a page allocation failure, suggesting memory management issues or just a general lack of RAM.

Here is the stack trace:
[  203.070086] gem_exec_gttfil: page allocation failure: order:0, mode:0x14210d4(GFP_USER|GFP_DMA32|__GFP_NORETRY|__GFP_RECLAIMABLE), nodemask=(null)
[  203.070156] CPU: 1 PID: 1657 Comm: gem_exec_gttfil Not tainted 4.12.0-rc1-CI-CI_DRM_2626+ #1
[  203.070160] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
[  203.070164] Call Trace:
[  203.070175]  dump_stack+0x67/0x97
[  203.070184]  warn_alloc+0xd6/0x160
[  203.070195]  __alloc_pages_nodemask+0xeaf/0x1220
[  203.070211]  ? __percpu_counter_add+0x85/0xb0
[  203.070221]  shmem_alloc_and_acct_page+0x1b6/0x370
[  203.070230]  shmem_getpage_gfp.isra.9+0x16a/0xc30
[  203.070244]  shmem_read_mapping_page_gfp+0x31/0x50
[  203.070324]  i915_gem_object_get_pages_gtt+0x3b2/0x600 [i915]
[  203.070371]  ____i915_gem_object_get_pages+0x25/0x60 [i915]
[  203.070413]  __i915_gem_object_get_pages+0x59/0x70 [i915]
[  203.070458]  __i915_vma_do_pin+0x59a/0x690 [i915]
[  203.070503]  i915_gem_execbuffer_reserve_vma.isra.9+0xc3/0x240 [i915]
[  203.070545]  i915_gem_execbuffer_reserve.isra.10+0x434/0x460 [i915]
[  203.070589]  i915_gem_do_execbuffer.isra.16+0x631/0x1b80 [i915]
[  203.070629]  ? i915_gem_execbuffer2+0x179/0x220 [i915]
[  203.070638]  ? __lock_acquire+0x472/0x1960
[  203.070648]  ? lock_acquire+0xb5/0x210
[  203.070656]  ? __might_fault+0x39/0x90
[  203.070700]  i915_gem_execbuffer2+0xb5/0x220 [i915]
[  203.070710]  drm_ioctl+0x202/0x490
[  203.070750]  ? i915_gem_execbuffer+0x310/0x310 [i915]
[  203.070766]  do_vfs_ioctl+0x90/0x6d0
[  203.070773]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  203.070780]  ? __this_cpu_preempt_check+0x13/0x20
[  203.070785]  ? trace_hardirqs_on_caller+0xe7/0x1c0
[  203.070792]  SyS_ioctl+0x3c/0x70
[  203.070799]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  203.070804] RIP: 0033:0x7f1e4875a987
[  203.070808] RSP: 002b:00007ffc791c70a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  203.070816] RAX: ffffffffffffffda RBX: ffffffff8146fe03 RCX: 00007f1e4875a987
[  203.070820] RDX: 00007ffc791c7200 RSI: 0000000040406469 RDI: 0000000000000003
[  203.070823] RBP: ffffc90000677f88 R08: 0000000000000040 R09: 0000000000000081
[  203.070827] R10: 000000000000000c R11: 0000000000000246 R12: 00000000007149a4
[  203.070831] R13: 0000000000000003 R14: 0000000040406469 R15: 00007ffc791c7250
[  203.070839]  ? __this_cpu_preempt_check+0x13/0x20
[  203.071007] Mem-Info:
[  203.071017] active_anon:37729 inactive_anon:158343 isolated_anon:526
                active_file:11996 inactive_file:1409 isolated_file:0
                unevictable:0 dirty:10 writeback:150 unstable:0
                slab_reclaimable:7294 slab_unreclaimable:9622
                mapped:13745 shmem:132919 pagetables:1534 bounce:0
                free:12184 free_pcp:0 free_cma:0
[  203.071025] Node 0 active_anon:150916kB inactive_anon:633372kB active_file:47984kB inactive_file:5636kB unevictable:0kB isolated(anon):2104kB isolated(file):0kB mapped:54980kB dirty:40kB writeback:600kB shmem:531676kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 32768kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  203.071033] DMA free:4472kB min:732kB low:912kB high:1092kB active_anon:180kB inactive_anon:10984kB active_file:48kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:128kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  203.071037] lowmem_reserve[]: 0 938 938 938
[  203.071062] DMA32 free:44264kB min:44320kB low:55400kB high:66480kB active_anon:150564kB inactive_anon:622184kB active_file:47976kB inactive_file:5996kB unevictable:0kB writepending:792kB present:1021948kB managed:964420kB mlocked:0kB slab_reclaimable:29048kB slab_unreclaimable:38456kB kernel_stack:2736kB pagetables:6136kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  203.071066] lowmem_reserve[]: 0 0 0 0
[  203.071087] DMA: 7*4kB (M) 10*8kB (UM) 7*16kB (UME) 7*32kB (ME) 5*64kB (UME) 3*128kB (UM) 3*256kB (UM) 1*512kB (M) 2*1024kB (UE) 0*2048kB 0*4096kB = 4476kB
[  203.071168] DMA32: 2450*4kB (UMH) 1203*8kB (UMEH) 604*16kB (UMEH) 285*32kB (MEH) 98*64kB (MEH) 4*128kB (MEH) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44992kB
[  203.071273] 146840 total pagecache pages
[  203.071283] 487 pages in swap cache
[  203.071287] Swap cache stats: add 2626, delete 2145, find 0/0
[  203.071291] Free swap  = 2086396kB
[  203.071295] Total swap = 2097148kB
[  203.071299] 259485 pages RAM
[  203.071303] 0 pages HighMem/MovableOnly
[  203.071307] 14403 pages reserved
Comment 1 Chris Wilson 2017-05-18 14:04:41 UTC
The telling part there is "all_unreclaimable? no", yet reclaim failed nevertheless. The only complication with bwr is that we request DMA32, which may make reclaim more unreliable?
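As a rough illustration of the state reclaim was in, the zone lines in the description show DMA32 free memory (44264kB) sitting just under its min watermark (44320kB). Below is a minimal Python sketch that makes that comparison, assuming the abridged zone lines are copied from the log above. The helper names parse_zone and below_min are hypothetical, and the kernel's real watermark check also takes lowmem_reserve and the allocation flags into account, so this is only a first-order reading of the log.

```python
import re

# Zone summary lines, abridged from the Mem-Info dump in the description.
ZONE_LINES = [
    "DMA free:4472kB min:732kB low:912kB high:1092kB",
    "DMA32 free:44264kB min:44320kB low:55400kB high:66480kB",
]

# Matches the leading "ZONE free:..kB min:..kB low:..kB high:..kB" fields.
ZONE_RE = re.compile(
    r"^(?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB "
    r"low:(?P<low>\d+)kB high:(?P<high>\d+)kB"
)


def parse_zone(line):
    """Return the zone name and its free/min/low/high values in kB."""
    m = ZONE_RE.match(line)
    if m is None:
        raise ValueError(f"not a zone summary line: {line!r}")
    fields = {key: int(m[key]) for key in ("free", "min", "low", "high")}
    fields["zone"] = m["zone"]
    return fields


def below_min(zone):
    """True if the zone's free memory is under its min watermark."""
    return zone["free"] < zone["min"]


if __name__ == "__main__":
    for line in ZONE_LINES:
        z = parse_zone(line)
        status = "below min" if below_min(z) else "above min"
        print(f"{z['zone']}: free={z['free']}kB min={z['min']}kB -> {status}")
```

With the values from this report, DMA32 comes out below its min watermark while DMA does not, which fits a GFP_DMA32 request with __GFP_NORETRY giving up rather than retrying reclaim.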
Comment 2 Jani Saarinen 2017-05-29 08:23:24 UTC
Statistics: Failure rate 1/37 run(s) (2%)
Last seen: 2017-05-18
Comment 3 Eero Tamminen 2017-05-30 10:33:48 UTC
I've also seen a similar issue with the Unigine Heaven 4.0 & Valley 1.0 benchmarks, on BYT N2820 with 1x 2GB 1333 DDR3.

It happens after booting Ubuntu 16.04 with the latest 3D stack (kernel, Mesa & X) and running those two benchmarks a few times.

The issue seems to be fairly rare; it happens in ~1/30 test-set runs.

It last happened with last night's build:
kernel git://anongit.freedesktop.org/drm-tip at 43719a23d643830482f87c67aa5118dd18478409 2017-05-29_13-18-38 drm-tip: 2017y-05m-29d-13h-17m-27s UTC integration manifest

-------------------------------------------
[  776.410773] heaven_x64: page allocation failure: order:0, mode:0x14210d2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_RECLAIMABLE), nodemask=(null)
[  776.410788] CPU: 0 PID: 1859 Comm: heaven_x64 Not tainted 4.12.0-rc3-CI-Nightly_1005+ #1
[  776.410790] Hardware name: \xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff \xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff/DN2820FYK, BIOS FYBYT10H.86A.0056.2016.1122.1846 11/22/2016
[  776.410792] Call Trace:
[  776.410802]  dump_stack+0x4f/0x67
[  776.410807]  warn_alloc+0xdb/0x170
[  776.410810]  __alloc_pages_nodemask+0xbf8/0xe90
[  776.410817]  ? __percpu_counter_add+0x85/0xb0
[  776.410821]  shmem_alloc_and_acct_page+0x156/0x250
[  776.410824]  shmem_getpage_gfp.isra.9+0x15b/0xae0
[  776.410891]  ? i915_gem_shrink+0x33b/0x4b0 [i915]
[  776.410895]  shmem_read_mapping_page_gfp+0x33/0x50
[  776.410927]  ? i915_gem_object_get_pages_gtt+0x1cb/0x5b0 [i915]
[  776.410958]  i915_gem_object_get_pages_gtt+0x1fd/0x5b0 [i915]
[  776.410991]  ____i915_gem_object_get_pages+0x20/0x60 [i915]
[  776.411023]  __i915_gem_object_get_pages+0x5c/0x70 [i915]
[  776.411055]  __i915_vma_do_pin+0x1c5/0x3b0 [i915]
[  776.411087]  i915_gem_execbuffer_reserve_vma.isra.9+0x14d/0x1b0 [i915]
[  776.411119]  i915_gem_execbuffer_reserve.isra.10+0x371/0x3d0 [i915]
[  776.411151]  i915_gem_do_execbuffer.isra.17+0x636/0x1740 [i915]
[  776.411153]  ? shmem_write_end+0x5c/0x2b0
[  776.411159]  ? refcount_dec_and_test+0x11/0x20
[  776.411162]  ? __kmalloc+0x2e/0x210
[  776.411193]  i915_gem_execbuffer2+0x90/0x1a0 [i915]
[  776.411198]  drm_ioctl+0x1f7/0x440
[  776.411229]  ? i915_gem_execbuffer+0x290/0x290 [i915]
[  776.411234]  do_vfs_ioctl+0x92/0x5a0
[  776.411238]  ? handle_mm_fault+0xfd/0x210
[  776.411241]  ? __fget+0x73/0xa0
[  776.411244]  SyS_ioctl+0x41/0x70
[  776.411248]  entry_SYSCALL_64_fastpath+0x17/0x98
[  776.411251] RIP: 0033:0x7f18d8a88357
[  776.411253] RSP: 002b:00007fffe11b78d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  776.411256] RAX: ffffffffffffffda RBX: 000000000da264f0 RCX: 00007f18d8a88357
[  776.411258] RDX: 00007fffe11b7930 RSI: 0000000040406469 RDI: 0000000000000007
[  776.411260] RBP: 00007f18d8d4fb20 R08: 0000000000000007 R09: 0000000000000002
[  776.411261] R10: 0002000000000000 R11: 0000000000000246 R12: 0000000000000005
[  776.411263] R13: 000000000e910ea8 R14: 0000000000000000 R15: 0000000000000000
[  776.411266] Mem-Info:
[  776.411272] active_anon:216452 inactive_anon:159741 isolated_anon:0
                active_file:14782 inactive_file:55818 isolated_file:32
                unevictable:0 dirty:23 writeback:0 unstable:0
                slab_reclaimable:4658 slab_unreclaimable:5095
                mapped:77570 shmem:171170 pagetables:6614 bounce:0
                free:13068 free_pcp:0 free_cma:0
[  776.411277] Node 0 active_anon:865808kB inactive_anon:638964kB active_file:59128kB inactive_file:223272kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:310280kB dirty:92kB writeback:0kB shmem:684680kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 299008kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[  776.411283] DMA free:7832kB min:368kB low:460kB high:552kB active_anon:6656kB inactive_anon:1256kB active_file:20kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15832kB mlocked:0kB slab_reclaimable:16kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:20kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  776.411284] lowmem_reserve[]: 0 1869 1869 1869
[  776.411294] DMA32 free:44440kB min:44684kB low:55852kB high:67020kB active_anon:859152kB inactive_anon:637620kB active_file:58772kB inactive_file:223720kB unevictable:0kB writepending:92kB present:1967432kB managed:1916592kB mlocked:0kB slab_reclaimable:18616kB slab_unreclaimable:20356kB kernel_stack:5872kB pagetables:26436kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  776.411295] lowmem_reserve[]: 0 0 0 0
[  776.411300] DMA: 4*4kB (UM) 7*8kB (UME) 3*16kB (U) 3*32kB (UM) 3*64kB (UM) 2*128kB (UM) 2*256kB (ME) 1*512kB (E) 4*1024kB (UME) 1*2048kB (E) 0*4096kB = 7832kB
[  776.411324] DMA32: 2400*4kB (UME) 955*8kB (UMH) 237*16kB (UM) 51*32kB (UMH) 28*64kB (UM) 10*128kB (UMH) 4*256kB (UEH) 4*512kB (UEH) 16*1024kB (UEH) 0*2048kB 0*4096kB = 45192kB
[  776.411347] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  776.411348] 241840 total pagecache pages
[  776.411352] 0 pages in swap cache
[  776.411353] Swap cache stats: add 380267, delete 380251, find 107354/117770
[  776.411355] Free swap  = 1918716kB
[  776.411356] Total swap = 1983484kB
[  776.411357] 495856 pages RAM
[  776.411358] 0 pages HighMem/MovableOnly
[  776.411359] 12750 pages reserved
-------------------------------------------
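To sanity-check the buddy free-list line in the log above, the per-order counts can be summed and compared with the total the kernel printed. This is a small Python sketch, assuming the DMA32 line is copied verbatim from the BYT log; the function names are hypothetical.

```python
import re

# DMA32 buddy free-list line copied from the BYT log above.
BUDDY_LINE = (
    "DMA32: 2400*4kB (UME) 955*8kB (UMH) 237*16kB (UM) 51*32kB (UMH) "
    "28*64kB (UM) 10*128kB (UMH) 4*256kB (UEH) 4*512kB (UEH) "
    "16*1024kB (UEH) 0*2048kB 0*4096kB = 45192kB"
)


def buddy_total_kb(line):
    """Sum count * block size over every 'N*SkB' term in the line."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size_kb) for count, size_kb in terms)


def reported_total_kb(line):
    """Return the total the kernel printed after the '=' sign."""
    match = re.search(r"= (\d+)kB$", line)
    if match is None:
        raise ValueError("no reported total found")
    return int(match.group(1))


if __name__ == "__main__":
    print("computed:", buddy_total_kb(BUDDY_LINE), "kB")
    print("reported:", reported_total_kb(BUDDY_LINE), "kB")
```

The computed sum matches the reported 45192kB. The line also shows 2400 free 4kB blocks, so the order:0 failure is not a fragmentation problem; the zone's free total is simply at about its min watermark (free:44440kB vs min:44684kB in the DMA32 zone line).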

Probably unrelated, but on SKL GT3e (i5-6260U + 2x 2133MHz DDR4), I saw this with the same build:
-------------------------------------------
[  778.771987] [drm] GPU HANG: ecode 9:0:0x84df7ec4, in heaven_x64 [1970], reason: Hang on rcs0, action: reset
-------------------------------------------
Comment 4 Jani Saarinen 2017-06-02 08:10:24 UTC
Last seen: 2017-05-18
Statistics: Failure rate 1/62 run(s) (1%)
Comment 5 Jani Saarinen 2017-06-16 08:13:11 UTC
Last seen: unchanged (2017-05-18).
Statistics: Failure rate 1/103 run(s) (0%)
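The percentages reported in the statistics comments (2%, 1%, 0%) are consistent with the exact failure rate being truncated to a whole number. That is an inference from the numbers, not something the CI tooling confirms. A short Python sketch with a hypothetical helper name:

```python
def failure_rate_percent(failures, runs):
    """Exact failure rate as a percentage."""
    return 100.0 * failures / runs


# (failures, runs) pairs taken from comments 2, 4 and 5.
REPORTS = [(1, 37), (1, 62), (1, 103)]

if __name__ == "__main__":
    for failures, runs in REPORTS:
        exact = failure_rate_percent(failures, runs)
        print(f"{failures}/{runs}: {exact:.2f}% (truncated: {int(exact)}%)")
```

Read this way, the final "0%" stands for slightly under 1% (one failure in 103 runs), not for the failure having disappeared from the history.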
Comment 6 Jani Saarinen 2017-06-16 08:15:37 UTC
Closing this. Whitelisted on CI.

