System Environment:
--------------------------
Platform: GM45/ILK/BYT/BDW
kernel: (drm-intel-nightly) 2a9f2367690435ba1d3132c92a71300d2dbe90ba

Bug detailed description:
-------------------------
It triggers the OOM killer on GM45/ILK/BYT/BDW. Many gem_concurrent_blit subcases have this issue.

output:
Killed

dmesg:
[ 83.425615] Call Trace:
[ 83.425654] [<ffffffff81709bba>] ? dump_stack+0x41/0x51
[ 83.425717] [<ffffffff81705fe3>] ? dump_header.isra.8+0x69/0x194
[ 83.425789] [<ffffffff8106b059>] ? ktime_get_ts+0x49/0xab
[ 83.425854] [<ffffffff812cbcda>] ? ___ratelimit+0xae/0xc8
[ 83.425919] [<ffffffff810a30e9>] ? oom_kill_process+0x7c/0x311
[ 83.425987] [<ffffffff810a2eab>] ? oom_unkillable_task.isra.5+0x6d/0x7e
[ 83.426063] [<ffffffff810a3892>] ? out_of_memory+0x3b2/0x3e5
[ 83.426130] [<ffffffff810a6fdb>] ? __alloc_pages_nodemask+0x666/0x779
[ 83.426206] [<ffffffff810cffbc>] ? alloc_pages_current+0xc5/0xe2
[ 83.426276] [<ffffffff810a2779>] ? filemap_fault+0x25f/0x384
[ 83.426343] [<ffffffff810b7e9b>] ? __do_fault+0xac/0x3c3
[ 83.426417] [<ffffffff810bb538>] ? handle_mm_fault+0x1e7/0x7e4
[ 83.426487] [<ffffffff81711e9e>] ? __do_page_fault+0x422/0x46f
[ 83.426557] [<ffffffff810e94e6>] ? __pollwait+0xcb/0xcb
[ 83.426618] [<ffffffff810e94e6>] ? __pollwait+0xcb/0xcb
[ 83.426680] [<ffffffff8106b059>] ? ktime_get_ts+0x49/0xab
[ 83.426744] [<ffffffff810e96e9>] ? poll_select_set_timeout+0x4e/0x6f
[ 83.426818] [<ffffffff8170f432>] ? page_fault+0x22/0x30
[ 83.426879] Mem-Info:
[ 83.426908] Node 0 DMA per-cpu:
[ 83.426950] CPU 0: hi: 0, btch: 1 usd: 0
[ 83.427004] CPU 1: hi: 0, btch: 1 usd: 0
[ 83.427059] CPU 2: hi: 0, btch: 1 usd: 0
[ 83.427113] CPU 3: hi: 0, btch: 1 usd: 0
[ 83.427167] Node 0 DMA32 per-cpu:
[ 83.427211] CPU 0: hi: 186, btch: 31 usd: 0
[ 83.427266] CPU 1: hi: 186, btch: 31 usd: 0
[ 83.427320] CPU 2: hi: 186, btch: 31 usd: 0
[ 83.427374] CPU 3: hi: 186, btch: 31 usd: 0
[ 83.427440] active_anon:349508 inactive_anon:117402 isolated_anon:0

Reproduce steps:
----------------------------
1. ./gem_concurrent_blit --run-subtest cpu-overwrite-source
commit aee0dcb1ec2075991d310dd6f3fb5e50160847d1
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Dec 3 16:32:52 2013 +0100

    test/gem_concurrent_blt

Please retest with this commit; it should help.
Tested on igt commit ab7cbf9737fe35cc286520379e54ae9882ab402b; it still happens.
Just to check: do all gem_concurrent_blit subtests fail with OOM, or just the cpu-overwrite-source one?
Some subcases pass. The following subcases fail with the OOM killer:
cpu-early-read-interruptible
cpu-gpu-read-after-write-interruptible
cpu-overwrite-source-interruptible
prw-early-read-interruptible
prw-gpu-read-after-write-interruptible
prw-overwrite-source-interruptible
Just to make sure: all the other tests work well, even the cpu-*-forked ones? Also, can you please attach the full OOM splat (it doesn't matter which machine, I just need an example), preferably the full dmesg?
Some subcases randomly trigger the OOM killer:
igt/gem_concurrent_blit/cpu-overwrite-source-forked fails 4 in 5 runs.
igt/gem_concurrent_blit/gtt-overwrite-source-forked fails 4 in 5 runs.
igt/gem_concurrent_blit/cpu-early-read-forked fails 0 in 5 runs.
igt/gem_concurrent_blit/cpu-early-read fails 0 in 5 runs.
Created attachment 90340 (dmesg on ILK)
GFP_NORMAL exhaustion on 32-bit kernels, afaict. Not entirely unexpected, since that will massively increase the shrinker pressure. The ILK is running a 32-bit kernel; is that the case for all the other affected systems, too? 64-bit kernels really should work here...
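One way to check the 32-bit lowmem theory on each affected machine is to look for the LowTotal/HighTotal fields in /proc/meminfo: the kernel only reports them when built with CONFIG_HIGHMEM, which exists only on 32-bit kernels. The following is a minimal sketch in C, not part of igt; the helper name `meminfo_kb` is hypothetical, and it takes the file path as a parameter so it can also be pointed at a saved meminfo dump.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an igt API): return the value in kB of a
 * meminfo field such as "MemTotal" or "LowTotal", or -1 if the file
 * cannot be opened or the field is absent. LowTotal/HighTotal only
 * appear on kernels built with CONFIG_HIGHMEM (32-bit). */
static long meminfo_kb(const char *path, const char *field)
{
	char line[256];
	long value = -1;
	size_t len = strlen(field);
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* Lines look like "LowTotal:         906972 kB"; require the
		 * ':' right after the name so prefixes do not match. */
		if (strncmp(line, field, len) == 0 && line[len] == ':') {
			if (sscanf(line + len + 1, "%ld", &value) != 1)
				value = -1;
			break;
		}
	}

	fclose(f);
	return value;
}
```

For example, `meminfo_kb("/proc/meminfo", "LowTotal")` returns -1 on a 64-bit kernel such as the x86_64 Baytrail setup, whose meminfo has no LowTotal line, and returns the lowmem size on a highmem-enabled 32-bit kernel.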
It also happens on 64-bit kernels: BYT is running a 64-bit kernel.
Increasing priority. Several tests have had to be disabled in nightly testing because of this bug, impacting the execution rate on BYT/BDW.
What do those systems have in common? Memory configuration, 32-vs-64 bit? Random bits of kernel config? Random userspace setup? In other words, why do you regularly see this but I don't?
Tested on Baytrail:

output:
IGT-Version: 1.5-gb5109e6 (x86_64) (Linux: 3.13.0-rc8_drm-intel-nightly_f27f16_20140126+ x86_64)
Killed

# cat /proc/meminfo
MemTotal: 1941356 kB
MemFree: 1871720 kB
Buffers: 7216 kB
Cached: 21728 kB
SwapCached: 6424 kB
Active: 6464 kB
Inactive: 31584 kB
Active(anon): 900 kB
Inactive(anon): 8236 kB
Active(file): 5564 kB
Inactive(file): 23348 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 3964924 kB
SwapFree: 3926980 kB
Dirty: 176 kB
Writeback: 0 kB
AnonPages: 4148 kB
Mapped: 5332 kB
Shmem: 32 kB
Slab: 15748 kB
SReclaimable: 5728 kB
SUnreclaim: 10020 kB
KernelStack: 896 kB
PageTables: 4968 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4935600 kB
Committed_AS: 285944 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 347756 kB
VmallocChunk: 34359377964 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 15104 kB
DirectMap2M: 1972224 kB
A little printf goes a long way...

root@baytrail-t:/usr/src/intel-gpu-tools/tests# ./gem_concurrent_blit
IGT-Version: 1.5-gb5109e6 (i686) (Linux: 3.13.0-rc8+ i686)
using 2x512 buffers, each 1MiB
Subtest prw-overwrite-source: SUCCESS
Subtest prw-early-read: SUCCESS
Subtest prw-gpu-read-after-write: SUCCESS
Subtest prw-overwrite-source-interruptible: SUCCESS
Subtest prw-early-read-interruptible: SUCCESS
Subtest prw-gpu-read-after-write-interruptible: SUCCESS
Subtest prw-overwrite-source-forked: SUCCESS
Subtest prw-early-read-forked: SUCCESS
Subtest prw-gpu-read-after-write-forked: SUCCESS
Subtest cpu-overwrite-source: SUCCESS
Subtest cpu-early-read: SUCCESS
Subtest cpu-gpu-read-after-write: SUCCESS
Subtest cpu-overwrite-source-interruptible: SUCCESS
Subtest cpu-early-read-interruptible: SUCCESS
Subtest cpu-gpu-read-after-write-interruptible: SUCCESS
Subtest cpu-overwrite-source-forked: SUCCESS
Subtest cpu-early-read-forked: SUCCESS
Subtest cpu-gpu-read-after-write-forked: SUCCESS
Subtest gtt-overwrite-source: SUCCESS
Subtest gtt-early-read: SUCCESS
Subtest gtt-gpu-read-after-write: SUCCESS
Subtest gtt-overwrite-source-interruptible: SUCCESS
Subtest gtt-early-read-interruptible: SUCCESS
Subtest gtt-gpu-read-after-write-interruptible: SUCCESS
Subtest gtt-overwrite-source-forked: SUCCESS
Subtest gtt-early-read-forked: SUCCESS
Subtest gtt-gpu-read-after-write-forked: SUCCESS

cat /proc/meminfo
MemTotal: 1929304 kB
MemFree: 1613484 kB
Buffers: 15236 kB
Cached: 270856 kB
SwapCached: 0 kB
Active: 118092 kB
Inactive: 175144 kB
Active(anon): 7200 kB
Inactive(anon): 8656 kB
Active(file): 110892 kB
Inactive(file): 166488 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 1022332 kB
HighFree: 891840 kB
LowTotal: 906972 kB
LowFree: 721644 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 7164 kB
Mapped: 1896 kB
Shmem: 8692 kB
Slab: 14772 kB
SReclaimable: 8096 kB
SUnreclaim: 6676 kB
KernelStack: 648 kB
PageTables: 352 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 964652 kB
Committed_AS: 52136 kB
VmallocTotal: 122880 kB
VmallocUsed: 14076 kB
VmallocChunk: 93428 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 4096 kB
DirectMap4k: 23544 kB
DirectMap4M: 884736 kB

commit 0b4c33f62c2d4a61b0b5e9184524c8ca273400b1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jan 26 14:36:32 2014 +0000

    igt/gem_concurrent_blit: Scale resource usage to RAM correctly

    Note that we use twice the number of buffers, and so we need to
    restrict num_buffers appropriately to fit within RAM.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72255
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
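The commit message describes the sizing rule without showing the code. Below is a minimal C sketch of that rule, not the actual igt change: the helper names `total_ram_bytes` and `clamp_num_buffers` are hypothetical, RAM is read with sysconf(_SC_PHYS_PAGES) (a glibc/Linux extension rather than strict POSIX), and budgeting half of RAM for the test's buffers is an assumption. The point taken from the commit is the factor of two: the test allocates two sets of num_buffers objects (the "2x512" in the printf output), so the limit must divide the budget by 2 * buffer_size, not by buffer_size alone.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: total physical RAM in bytes, or 0 on error.
 * _SC_PHYS_PAGES is a glibc/Linux extension, not strict POSIX. */
static uint64_t total_ram_bytes(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages <= 0 || page_size <= 0)
		return 0;
	return (uint64_t)pages * (uint64_t)page_size;
}

/* Hypothetical helper: limit num_buffers so that both sets of buffers
 * (2 * num_buffers objects of buffer_size bytes each, per the commit
 * message) fit in the budget. Assumption: the budget is half of RAM,
 * leaving the rest for the kernel and other processes. */
static unsigned int clamp_num_buffers(unsigned int num_buffers,
				      uint64_t buffer_size,
				      uint64_t ram_bytes)
{
	uint64_t budget = ram_bytes / 2;
	uint64_t max_buffers;

	if (buffer_size == 0)
		return num_buffers;

	/* Twice the buffers, so divide by 2 * buffer_size. */
	max_buffers = budget / (2 * buffer_size);
	if (num_buffers > max_buffers)
		num_buffers = (unsigned int)max_buffers;
	return num_buffers;
}
```

Worked through with the i686 Baytrail numbers above (MemTotal: 1929304 kB, 1 MiB buffers), the assumed half-RAM budget is 987803648 bytes, which fits 987803648 / (2 × 1048576) = 471 buffers per set (rounded down), so under this budget a request for 512 would be clamped to 471. In a test, the call would look like `clamp_num_buffers(512, 1024 * 1024, total_ram_bytes())`.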
Verified. Fixed.
Closing old verified.