https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3419/shard-apl2/igt@gem_ppgtt@blt-vs-render-ctxn.html This caught my eye in dmesg: <7>[ 192.797271] [IGT] gem_eio: starting subtest throttle <4>[ 192.817585] Setting dangerous option reset - tainting kernel <4>[ 193.388010] hpet1: lost 7161 rtc interrupts ... <7>[ 196.788603] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915], irq posted? yes, current seqno=1a6b3, last=1a6b4 ... <4>[ 209.345145] hpet1: lost 7161 rtc interrupts <7>[ 209.851911] [drm:i915_reset_device [i915]] resetting chip <5>[ 209.852184] i915 0000:00:02.0: Resetting chip after gpu hang ... <7>[ 210.168410] [IGT] gem_ppgtt: executing <4>[ 210.281048] hpet1: lost 7161 rtc interrupts <7>[ 210.296595] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctxN <4>[ 211.179533] hpet1: lost 7161 rtc interrupts ... <4>[ 349.723277] hpet1: lost 7161 rtc interrupts <4>[ 350.192490] gem_ppgtt: page allocation failure: order:0, mode:0x14204d2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_RECLAIMABLE), nodemask=(null) <4>[ 350.192608] CPU: 0 PID: 2707 Comm: gem_ppgtt Tainted: G U 4.15.0-rc1-CI-CI_DRM_3419+ #1 <4>[ 350.192612] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017 <4>[ 350.192616] Call Trace: <4>[ 350.192627] dump_stack+0x5f/0x86 <4>[ 350.192635] warn_alloc+0xad/0x130 <4>[ 350.192647] ? __mutex_unlock_slowpath+0x38/0x270 <4>[ 350.192656] __alloc_pages_nodemask+0xeb5/0x11d0 <4>[ 350.192666] ? mark_held_locks+0x66/0x90 <4>[ 350.192686] shmem_alloc_and_acct_page+0x7b/0x3a0 <4>[ 350.192697] shmem_getpage_gfp.isra.9+0x163/0xc90 <4>[ 350.192715] shmem_read_mapping_page_gfp+0x2e/0x50 <4>[ 350.192775] i915_gem_object_get_pages_gtt+0x13b/0x660 [i915] <4>[ 350.192826] ? i915_gem_obj_prepare_shmem_write+0x201/0x220 [i915] <4>[ 350.192870] ____i915_gem_object_get_pages+0x17/0x70 [i915] <4>[ 350.192914] __i915_gem_object_get_pages+0x59/0x80 [i915] <4>[ 350.192959] i915_gem_set_domain_ioctl+0x377/0x3e0 [i915] <4>[ 350.193005] ? i915_gem_obj_prepare_shmem_write+0x220/0x220 [i915] <4>[ 350.193012] drm_ioctl_kernel+0x60/0xa0 <4>[ 350.193020] drm_ioctl+0x290/0x330 <4>[ 350.193066] ? i915_gem_obj_prepare_shmem_write+0x220/0x220 [i915] <4>[ 350.193076] ? remove_vma+0x56/0x60 <4>[ 350.193082] ? remove_vma+0x56/0x60 <4>[ 350.193087] ? remove_vma+0x56/0x60 <4>[ 350.193092] ? rcu_read_lock_sched_held+0x6f/0x80 <4>[ 350.193101] do_vfs_ioctl+0x8a/0x680 <4>[ 350.193107] ? entry_SYSCALL_64_fastpath+0x5/0x89 <4>[ 350.193113] ? trace_hardirqs_on_caller+0xde/0x1c0 <4>[ 350.193120] SyS_ioctl+0x36/0x70 <4>[ 350.193128] entry_SYSCALL_64_fastpath+0x1c/0x89 <4>[ 350.193133] RIP: 0033:0x7f5cfdd49587 <4>[ 350.193137] RSP: 002b:00007ffd39791888 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 <4>[ 350.193144] RAX: ffffffffffffffda RBX: 0000559c607423c0 RCX: 00007f5cfdd49587 <4>[ 350.193147] RDX: 00007ffd397918c0 RSI: 00000000400c645f RDI: 0000000000000003 <4>[ 350.193151] RBP: 0000559c60739720 R08: 0000559c60739828 R09: 00007f5cfc22e000 <4>[ 350.193154] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559c60739720 <4>[ 350.193158] R13: 0000559c60739828 R14: 0000000000000d29 R15: 0000559c5f784a30 <4>[ 350.193318] Mem-Info: <4>[ 350.193330] active_anon:1688098 inactive_anon:211034 isolated_anon:32 active_file:139 inactive_file:150 isolated_file:0 unevictable:0 dirty:9 writeback:11 unstable:0 slab_reclaimable:22465 slab_unreclaimable:37058 mapped:4171 shmem:1822976 pagetables:1485 bounce:0 free:26698 free_pcp:1026 free_cma:0 <4>[ 350.193336] Node 0 active_anon:6752392kB inactive_anon:844136kB active_file:556kB inactive_file:600kB unevictable:0kB isolated(anon):128kB isolated(file):0kB mapped:16684kB dirty:36kB writeback:44kB shmem:7291904kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 88064kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no <4>[ 350.193343] DMA free:15896kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB <4>[ 350.193346] lowmem_reserve[]: 0 1772 7774 7774 <4>[ 350.193368] DMA32 free:39320kB min:15376kB low:19220kB high:23064kB active_anon:1449076kB inactive_anon:300352kB active_file:216kB inactive_file:408kB unevictable:0kB writepending:100kB present:1926488kB managed:1817944kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:2096kB local_pcp:428kB free_cma:0kB <4>[ 350.193371] lowmem_reserve[]: 0 0 6001 6001 <4>[ 350.193393] Normal free:51576kB min:52068kB low:65084kB high:78100kB active_anon:5303204kB inactive_anon:543384kB active_file:96kB inactive_file:940kB unevictable:0kB writepending:40kB present:6291456kB managed:6146000kB mlocked:0kB kernel_stack:3296kB pagetables:5936kB bounce:0kB free_pcp:2008kB local_pcp:316kB free_cma:0kB <4>[ 350.193396] lowmem_reserve[]: 0 0 0 0 <4>[ 350.193415] DMA: 2*4kB (U) 2*8kB (U) 2*16kB (U) 3*32kB (U) 2*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB <4>[ 350.193486] DMA32: 75*4kB (ME) 161*8kB (UME) 63*16kB (UME) 34*32kB (ME) 10*64kB (M) 23*128kB (UME) 52*256kB (ME) 12*512kB (UME) 2*1024kB (E) 5*2048kB (M) 0*4096kB = 39012kB <4>[ 350.193561] Normal: 2268*4kB (UME) 1355*8kB (UME) 924*16kB (UME) 448*32kB (UME) 40*64kB (ME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 51592kB <6>[ 350.193620] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB <4>[ 350.193624] 1830981 total pagecache pages <4>[ 350.193632] 7618 pages in swap cache <4>[ 350.193636] Swap cache stats: add 980158, delete 972501, find 6/14 <4>[ 350.193659] Free swap = 0kB <4>[ 350.193662] Total swap = 2097148kB <4>[ 350.193665] 2058482 pages RAM <4>[ 350.193669] 0 pages HighMem/MovableOnly <4>[ 350.193672] 63522 pages reserved <4>[ 350.625960] hpet1: lost 7161 rtc interrupts <4>[ 351.529516] hpet1: lost 7161 rtc interrupts <4>[ 351.563934] gem_ppgtt invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=1000 <4>[ 351.564070] CPU: 3 PID: 2707 Comm: gem_ppgtt Tainted: G U 4.15.0-rc1-CI-CI_DRM_3419+ #1 <4>[ 351.564074] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017 <4>[ 351.564077] Call Trace: <4>[ 351.564090] dump_stack+0x5f/0x86 <4>[ 351.564098] dump_header+0x60/0x424 <4>[ 351.564106] ? trace_hardirqs_on_caller+0xde/0x1c0 <4>[ 351.564113] ? _raw_spin_unlock_irqrestore+0x39/0x60 <4>[ 351.564123] oom_kill_process+0x2b3/0x650 <4>[ 351.564134] out_of_memory+0xf4/0x350 <4>[ 351.564142] __alloc_pages_nodemask+0x108f/0x11d0 <4>[ 351.564171] filemap_fault+0x456/0x600 <4>[ 351.564185] ext4_filemap_fault+0x2a/0x40 <4>[ 351.564192] __do_fault+0x15/0x70 <4>[ 351.564198] __handle_mm_fault+0x9ab/0xda0 <4>[ 351.564217] handle_mm_fault+0x14f/0x2f0 <4>[ 351.564226] __do_page_fault+0x2d1/0x560 <4>[ 351.564239] page_fault+0x22/0x30 <4>[ 351.564245] RIP: 0033:0x7f5cfdccd610 <4>[ 351.564249] RSP: 002b:00007ffd39791898 EFLAGS: 00010246 <4>[ 351.564256] RAX: 0000000000000181 RBX: 0000559c607423c0 RCX: 00007f5cfdd49587 <4>[ 351.564260] RDX: 0000000000000000 RSI: 0000559c607423b0 RDI: 00007f5cfe00cb00 <4>[ 351.564264] RBP: 0000559c60741200 R08: 0000559c607395e0 R09: 00007ffd3979d080 <4>[ 351.564268] R10: 000000000009588e R11: 0000000000000246 R12: 0000559c60739720 <4>[ 351.564272] R13: 0000559c60739828 R14: 0000000000000d4a R15: 0000559c5f784a30 <4>[ 351.564292] Mem-Info: <4>[ 351.564301] active_anon:1687917 inactive_anon:211004 isolated_anon:96 active_file:72 inactive_file:0 isolated_file:0 unevictable:0 dirty:2 writeback:11 unstable:0 slab_reclaimable:22451 slab_unreclaimable:37227 mapped:2735 shmem:1830444 pagetables:1485 bounce:0 free:26606 free_pcp:1126 free_cma:0 <4>[ 351.564308] Node 0 active_anon:6751668kB inactive_anon:844016kB active_file:288kB inactive_file:0kB unevictable:0kB isolated(anon):384kB isolated(file):0kB mapped:10940kB dirty:8kB writeback:44kB shmem:7321776kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10240kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no <4>[ 351.564314] DMA free:15896kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB <4>[ 351.564318] lowmem_reserve[]: 0 1772 7774 7774 <4>[ 351.586427] DMA32 free:39820kB min:15376kB low:19220kB high:23064kB active_anon:1741616kB inactive_anon:5912kB active_file:44kB inactive_file:372kB unevictable:0kB writepending:88kB present:1926488kB managed:1817944kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:2544kB local_pcp:720kB free_cma:0kB <4>[ 351.586430] lowmem_reserve[]: 0 0 6001 6001 <4>[ 351.586455] Normal free:52764kB min:52068kB low:65084kB high:78100kB active_anon:5006776kB inactive_anon:838476kB active_file:140kB inactive_file:0kB unevictable:0kB writepending:24kB present:6291456kB managed:6146000kB mlocked:0kB kernel_stack:3296kB pagetables:5940kB bounce:0kB free_pcp:2376kB local_pcp:644kB free_cma:0kB <4>[ 351.586458] lowmem_reserve[]: 0 0 0 0 <4>[ 351.586479] DMA: 2*4kB (U) 2*8kB (U) 2*16kB (U) 3*32kB (U) 2*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB <4>[ 351.586671] DMA32: 76*4kB (UME) 112*8kB (ME) 66*16kB (ME) 51*32kB (UME) 19*64kB (UME) 11*128kB (UME) 51*256kB (ME) 16*512kB (ME) 2*1024kB (E) 5*2048kB (M) 0*4096kB = 40048kB <4>[ 351.586754] Normal: 2499*4kB (UME) 1359*8kB (UME) 920*16kB (UME) 433*32kB (UME) 49*64kB (UME) 1*128kB (E) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 52708kB <6>[ 351.586824] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB <4>[ 351.586828] 1830078 total pagecache pages <4>[ 351.586837] 0 pages in swap cache <4>[ 351.586842] Swap cache stats: add 980178, delete 980138, find 10/22 <4>[ 351.586846] Free swap = 0kB <4>[ 351.586850] Total swap = 2097148kB <4>[ 351.586854] 2058482 pages RAM <4>[ 351.586858] 0 pages HighMem/MovableOnly <4>[ 351.586861] 63522 pages reserved <6>[ 351.586865] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name <6>[ 351.587119] [ 193] 0 193 11556 316 122880 0 0 systemd-journal <6>[ 351.587421] [ 211] 0 211 11266 392 114688 0 -1000 systemd-udevd <6>[ 351.587432] [ 439] 0 439 44195 203 192512 0 0 thermald <6>[ 351.587440] [ 443] 104 443 64105 231 139264 0 0 rsyslogd <6>[ 351.587449] [ 451] 0 451 105640 883 335872 0 0 ModemManager <6>[ 351.587457] [ 474] 0 474 11661 146 131072 0 0 systemd-logind <6>[ 351.587465] [ 496] 0 496 8086 72 110592 0 0 cron <6>[ 351.587473] [ 507] 0 507 1098 22 53248 0 0 acpid <6>[ 351.587481] [ 512] 106 512 11478 257 135168 0 -900 dbus-daemon <6>[ 351.587488] [ 529] 118 529 11830 107 139264 0 0 avahi-daemon <6>[ 351.587496] [ 533] 0 533 255151 693 274432 0 0 repowerd <6>[ 351.587505] [ 534] 118 534 11797 90 126976 0 0 avahi-daemon <6>[ 351.587513] [ 546] 0 546 99822 746 348160 0 0 NetworkManager <6>[ 351.587521] [ 604] 0 604 4906 63 77824 0 0 irqbalance <6>[ 351.587529] [ 643] 0 643 88880 342 249856 0 0 upowerd <6>[ 351.587537] [ 649] 0 649 76011 240 245760 0 0 polkitd <6>[ 351.587545] [ 756] 102 756 12445 135 143360 0 0 systemd-resolve <6>[ 351.587553] [ 757] 0 757 17504 182 180224 0 -1000 sshd <6>[ 351.587560] [ 856] 0 856 4305 35 73728 0 0 agetty <6>[ 351.587568] [ 936] 0 936 25378 245 237568 0 0 sshd <6>[ 351.587576] [ 942] 1000 942 16248 233 167936 0 0 systemd <6>[ 351.587584] [ 945] 1000 945 22381 562 208896 0 0 (sd-pam) <6>[ 351.587592] [ 958] 1000 958 25412 279 233472 0 0 sshd <6>[ 351.587600] [ 986] 1000 986 1295991 28692 696320 0 0 java <6>[ 351.587607] [ 1041] 1000 1041 1125 27 57344 0 0 sh <6>[ 351.587615] [ 1114] 1000 1114 3760 37 77824 0 0 dmesg <6>[ 351.587623] [ 1116] 0 1116 15572 116 172032 0 0 sudo <6>[ 351.587631] [ 1120] 0 1120 1090 18 57344 0 0 owatch <6>[ 351.587638] [ 1121] 0 1121 263175 32295 561152 0 0 python3 <6>[ 351.587651] [ 1599] 0 1599 4030 214 73728 0 0 dhclient <6>[ 351.587659] [ 2706] 0 2706 14115 1857 147456 0 1000 gem_ppgtt <6>[ 351.587667] [ 2707] 0 2707 13859 104 126976 0 1000 gem_ppgtt <6>[ 351.587675] [ 2709] 0 2709 14115 105 126976 0 1000 gem_ppgtt <6>[ 351.587683] [ 2710] 0 2710 14115 235 126976 0 1000 gem_ppgtt <6>[ 351.587690] [ 2711] 0 2711 14115 105 126976 0 1000 gem_ppgtt <6>[ 351.587698] [ 2712] 0 2712 14115 301 126976 0 1000 gem_ppgtt <6>[ 351.587706] [ 2713] 0 2713 14115 301 126976 0 1000 gem_ppgtt <6>[ 351.587714] [ 2714] 0 2714 14115 105 126976 0 1000 gem_ppgtt <6>[ 351.587722] [ 2715] 0 2715 14627 302 131072 0 1000 gem_ppgtt <3>[ 351.587727] Out of memory: Kill process 2706 (gem_ppgtt) score 1000 or sacrifice child <3>[ 351.588765] Killed process 2715 (gem_ppgtt) total-vm:58508kB, anon-rss:416kB, file-rss:4kB, shmem-rss:788kB <6>[ 351.616709] oom_reaper: reaped process 2715 (gem_ppgtt), now anon-rss:0kB, file-rss:0kB, shmem-rss:788kB <4>[ 352.431360] hpet1: lost 7161 rtc interrupts .. the just "stray" in rest of dmesg.
From dmesg, the problems start already on igt trying to init before gem_eio/throttle. multiring resets (dead gpu?) already before throttle starts and also then during throttle.
(In reply to Mika Kuoppala from comment #1) > From dmesg, the problems start already on igt trying to init before > gem_eio/throttle. multiring resets (dead gpu?) already before throttle starts > and also then during throttle. That's gem_eio, it's purpose is to wedge the gpu and check abi for correct behaviour. Before gem_ppgtt started, it checked that the gpu was no longer wedged, so everything appears in order. What is not so happy is the <4>[ 193.388010] hpet1: lost 7161 rtc interrupts A dead timer. That would explain why we never were able to retire requests and so leaked everything, and everything remained marked as busy.
(In reply to Chris Wilson from comment #2) > What is not so happy is the > > <4>[ 193.388010] hpet1: lost 7161 rtc interrupts > > A dead timer. That would explain why we never were able to retire requests > and so leaked everything, and everything remained marked as busy. No, it doesn't. If we thought everything was still active, we would not have filled swap (only inactive bo can be swapped). Which brings us back to a bo leak, but whose?
Last seen: CI_DRM_3419: 2017-11-30 / 241 runs ago
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.