Bug 72255

Summary: [GM45/ILK/BYT/BDW]igt/gem_concurrent_blit/cpu-overwrite-source causes OOM killer
Product: DRI Reporter: lu hua <huax.lu>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: high CC: intel-gfx-bugs
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg on ILK none

Description lu hua 2013-12-03 09:06:37 UTC
System Environment:
--------------------------
Platform: GM45/ILK/BYT/BDW
kernel:   (drm-intel-nightly)2a9f2367690435ba1d3132c92a71300d2dbe90ba

Bug detailed description:
-------------------------
Running the test triggers the OOM killer on GM45/ILK/BYT/BDW.
Many gem_concurrent_blit subcases have this issue.

output:
Killed

dmesg:
[   83.425615] Call Trace:
[   83.425654]  [<ffffffff81709bba>] ? dump_stack+0x41/0x51
[   83.425717]  [<ffffffff81705fe3>] ? dump_header.isra.8+0x69/0x194
[   83.425789]  [<ffffffff8106b059>] ? ktime_get_ts+0x49/0xab
[   83.425854]  [<ffffffff812cbcda>] ? ___ratelimit+0xae/0xc8
[   83.425919]  [<ffffffff810a30e9>] ? oom_kill_process+0x7c/0x311
[   83.425987]  [<ffffffff810a2eab>] ? oom_unkillable_task.isra.5+0x6d/0x7e
[   83.426063]  [<ffffffff810a3892>] ? out_of_memory+0x3b2/0x3e5
[   83.426130]  [<ffffffff810a6fdb>] ? __alloc_pages_nodemask+0x666/0x779
[   83.426206]  [<ffffffff810cffbc>] ? alloc_pages_current+0xc5/0xe2
[   83.426276]  [<ffffffff810a2779>] ? filemap_fault+0x25f/0x384
[   83.426343]  [<ffffffff810b7e9b>] ? __do_fault+0xac/0x3c3
[   83.426417]  [<ffffffff810bb538>] ? handle_mm_fault+0x1e7/0x7e4
[   83.426487]  [<ffffffff81711e9e>] ? __do_page_fault+0x422/0x46f
[   83.426557]  [<ffffffff810e94e6>] ? __pollwait+0xcb/0xcb
[   83.426618]  [<ffffffff810e94e6>] ? __pollwait+0xcb/0xcb
[   83.426680]  [<ffffffff8106b059>] ? ktime_get_ts+0x49/0xab
[   83.426744]  [<ffffffff810e96e9>] ? poll_select_set_timeout+0x4e/0x6f
[   83.426818]  [<ffffffff8170f432>] ? page_fault+0x22/0x30
[   83.426879] Mem-Info:
[   83.426908] Node 0 DMA per-cpu:
[   83.426950] CPU    0: hi:    0, btch:   1 usd:   0
[   83.427004] CPU    1: hi:    0, btch:   1 usd:   0
[   83.427059] CPU    2: hi:    0, btch:   1 usd:   0
[   83.427113] CPU    3: hi:    0, btch:   1 usd:   0
[   83.427167] Node 0 DMA32 per-cpu:
[   83.427211] CPU    0: hi:  186, btch:  31 usd:   0
[   83.427266] CPU    1: hi:  186, btch:  31 usd:   0
[   83.427320] CPU    2: hi:  186, btch:  31 usd:   0
[   83.427374] CPU    3: hi:  186, btch:  31 usd:   0
[   83.427440] active_anon:349508 inactive_anon:117402 isolated_anon:0

Reproduce steps:
----------------------------
1. ./gem_concurrent_blit --run-subtest cpu-overwrite-source
Comment 1 Daniel Vetter 2013-12-03 15:32:57 UTC
commit aee0dcb1ec2075991d310dd6f3fb5e50160847d1
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Dec 3 16:32:52 2013 +0100

    test/gem_concurrent_blt

Please retest with this commit, it should help.
Comment 2 lu hua 2013-12-04 05:56:20 UTC
Tested on igt commit ab7cbf9737fe35cc286520379e54ae9882ab402b. It still happens.
Comment 3 Daniel Vetter 2013-12-04 08:26:12 UTC
Just to check: Do all gem_concurrent_blit subtests fail with OOM or just the cpu-overwrite-source one?
Comment 4 lu hua 2013-12-05 05:36:14 UTC
Some subcases pass. The following subcases fail with the OOM killer:
cpu-early-read-interruptible
cpu-gpu-read-after-write-interruptible
cpu-overwrite-source-interruptible
prw-early-read-interruptible
prw-gpu-read-after-write-interruptible
prw-overwrite-source-interruptible
Comment 5 Daniel Vetter 2013-12-05 07:33:58 UTC
Just to make sure: All the other tests work well, even the cpu-*-forked ones?

Also can you please attach the full OOM splat (doesn't matter which machine, just need an example), preferably full dmesg?
Comment 6 lu hua 2013-12-06 07:42:22 UTC
Some subcases randomly trigger the OOM killer.

igt/gem_concurrent_blit/cpu-overwrite-source-forked fails in 4 of 5 runs.
igt/gem_concurrent_blit/gtt-overwrite-source-forked fails in 4 of 5 runs.

igt/gem_concurrent_blit/cpu-early-read-forked    fail    0/5
igt/gem_concurrent_blit/cpu-early-read           fail    0/5
Comment 7 lu hua 2013-12-06 07:42:45 UTC
Created attachment 90340 [details]
dmesg on ILK
Comment 8 Daniel Vetter 2013-12-06 09:14:08 UTC
GFP_NORMAL exhaustion on 32bit kernels afaict. Not entirely unexpected since that'll massively increase the shrinker pressure.

The ilk is running a 32bit kernel, is that the case for all the other affected systems, too? 64bit kernels really should work here ...
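
For anyone triaging similar reports: one way to tell whether a given /proc/meminfo dump comes from a kernel with a low/high memory split is to look for the LowTotal field, which only shows up on kernels built with highmem support. The following is a minimal sketch in C, not code from IGT or the kernel. The helper names meminfo_kb() and has_lowmem_split() are hypothetical, and the parser assumes the usual "Key:   value kB" layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one "Key:   value kB" entry in a /proc/meminfo-style text buffer.
 * Returns the value in kB, or -1 if the key is not present.
 * meminfo_kb() is a hypothetical helper, not part of IGT or the kernel.
 */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen;
	const char *line = text;

	assert(text && key);
	klen = strlen(key);

	while (line && *line) {
		/* A match must start the line and be followed by ':'. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long kb;

			if (sscanf(line + klen + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Nonzero if the dump carries LowTotal, i.e. the kernel splits low/high memory. */
int has_lowmem_split(const char *text)
{
	return meminfo_kb(text, "LowTotal") >= 0;
}
```

Feeding it the contents of /proc/meminfo read into a buffer is enough. On a dump with a low/high split, meminfo_kb(text, "LowFree") then gives the lowmem headroom, which is the figure that matters for the exhaustion described above.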
Comment 9 lu hua 2013-12-09 05:50:19 UTC
It also happens on a 64-bit kernel.
BYT is running a 64-bit kernel.
Comment 10 Gordon Jin 2014-01-25 00:07:51 UTC
Increasing priority.
Several tests have to be disabled in nightly due to this bug, impacting the execution rate of BYT/BDW.
Comment 11 Chris Wilson 2014-01-25 16:41:29 UTC
What do those systems have in common? Memory configuration, 32-vs-64 bit? Random bits of kernel config? Random userspace setup?

In other words, why do you regularly see this but I don't?
Comment 12 lu hua 2014-01-26 05:01:24 UTC
Test on Baytrail:
output:
IGT-Version: 1.5-gb5109e6 (x86_64) (Linux: 3.13.0-rc8_drm-intel-nightly_f27f16_20140126+ x86_64)
Killed

# cat /proc/meminfo
MemTotal:        1941356 kB
MemFree:         1871720 kB
Buffers:            7216 kB
Cached:            21728 kB
SwapCached:         6424 kB
Active:             6464 kB
Inactive:          31584 kB
Active(anon):        900 kB
Inactive(anon):     8236 kB
Active(file):       5564 kB
Inactive(file):    23348 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3964924 kB
SwapFree:        3926980 kB
Dirty:               176 kB
Writeback:             0 kB
AnonPages:          4148 kB
Mapped:             5332 kB
Shmem:                32 kB
Slab:              15748 kB
SReclaimable:       5728 kB
SUnreclaim:        10020 kB
KernelStack:         896 kB
PageTables:         4968 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4935600 kB
Committed_AS:     285944 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      347756 kB
VmallocChunk:   34359377964 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       15104 kB
DirectMap2M:     1972224 kB
Comment 13 Chris Wilson 2014-01-26 15:24:00 UTC
A little printf goes a long way...

root@baytrail-t:/usr/src/intel-gpu-tools/tests# ./gem_concurrent_blit 
IGT-Version: 1.5-gb5109e6 (i686) (Linux: 3.13.0-rc8+ i686)
using 2x512 buffers, each 1MiB
Subtest prw-overwrite-source: SUCCESS
Subtest prw-early-read: SUCCESS
Subtest prw-gpu-read-after-write: SUCCESS
Subtest prw-overwrite-source-interruptible: SUCCESS
Subtest prw-early-read-interruptible: SUCCESS
Subtest prw-gpu-read-after-write-interruptible: SUCCESS
Subtest prw-overwrite-source-forked: SUCCESS
Subtest prw-early-read-forked: SUCCESS
Subtest prw-gpu-read-after-write-forked: SUCCESS
Subtest cpu-overwrite-source: SUCCESS
Subtest cpu-early-read: SUCCESS
Subtest cpu-gpu-read-after-write: SUCCESS
Subtest cpu-overwrite-source-interruptible: SUCCESS
Subtest cpu-early-read-interruptible: SUCCESS
Subtest cpu-gpu-read-after-write-interruptible: SUCCESS
Subtest cpu-overwrite-source-forked: SUCCESS
Subtest cpu-early-read-forked: SUCCESS
Subtest cpu-gpu-read-after-write-forked: SUCCESS
Subtest gtt-overwrite-source: SUCCESS
Subtest gtt-early-read: SUCCESS
Subtest gtt-gpu-read-after-write: SUCCESS
Subtest gtt-overwrite-source-interruptible: SUCCESS
Subtest gtt-early-read-interruptible: SUCCESS
Subtest gtt-gpu-read-after-write-interruptible: SUCCESS
Subtest gtt-overwrite-source-forked: SUCCESS
Subtest gtt-early-read-forked: SUCCESS
Subtest gtt-gpu-read-after-write-forked: SUCCESS

cat /proc/meminfo 
MemTotal:        1929304 kB
MemFree:         1613484 kB
Buffers:           15236 kB
Cached:           270856 kB
SwapCached:            0 kB
Active:           118092 kB
Inactive:         175144 kB
Active(anon):       7200 kB
Inactive(anon):     8656 kB
Active(file):     110892 kB
Inactive(file):   166488 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       1022332 kB
HighFree:         891840 kB
LowTotal:         906972 kB
LowFree:          721644 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          7164 kB
Mapped:             1896 kB
Shmem:              8692 kB
Slab:              14772 kB
SReclaimable:       8096 kB
SUnreclaim:         6676 kB
KernelStack:         648 kB
PageTables:          352 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      964652 kB
Committed_AS:      52136 kB
VmallocTotal:     122880 kB
VmallocUsed:       14076 kB
VmallocChunk:      93428 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
DirectMap4k:       23544 kB
DirectMap4M:      884736 kB

commit 0b4c33f62c2d4a61b0b5e9184524c8ca273400b1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jan 26 14:36:32 2014 +0000

    igt/gem_concurrent_blit: Scale resource usage to RAM correctly
    
    Note that we use twice the number of buffers, and so we need to restrict
    num_buffers appropriately to fit within RAM.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72255
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
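
The idea in the commit message, that the test allocates two sets of buffers and so the per-set count has to be limited accordingly, can be sketched roughly as follows. This is an illustration only and not the code from the commit. The names total_ram_bytes() and clamp_num_buffers() are hypothetical, the memory budget passed in is an assumption left to the caller, and reading RAM size via sysinfo(2) is just one Linux-specific way to get it.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/sysinfo.h>

/* Total RAM in bytes via sysinfo(2) (Linux-specific); 0 on failure. */
uint64_t total_ram_bytes(void)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return 0;
	return (uint64_t)si.totalram * si.mem_unit;
}

/*
 * Clamp the per-set buffer count so that `sets` sets of buffers, each buffer
 * `buf_bytes` large, stay within `budget_bytes`.
 * All names here are hypothetical; this is not the code from the commit.
 */
unsigned clamp_num_buffers(uint64_t budget_bytes, uint64_t buf_bytes,
			   unsigned sets, unsigned requested)
{
	uint64_t max_per_set;

	assert(buf_bytes > 0 && sets > 0);

	/* Divide the budget across all sets, not just one of them. */
	max_per_set = budget_bytes / buf_bytes / sets;
	return max_per_set < requested ? (unsigned)max_per_set : requested;
}
```

With a 1024 MiB budget, 1 MiB buffers and two sets, a request for 1024 buffers per set is clamped to 512. A caller would typically pass some fraction of total_ram_bytes() as the budget; which fraction is appropriate is a policy choice this sketch does not make.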
Comment 14 lu hua 2014-01-28 05:10:17 UTC
Verified. Fixed.
Comment 15 Elizabeth 2017-10-06 14:41:37 UTC
Closing old verified.
