Bug 68169

Summary: [PNV/ILK/SNB/IVB] igt/gem_concurrent_blit/gtt-gpu-read-after-write-forked causes OOM killer
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: medium
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
  dmesg (flags: none)

Description lu hua 2013-08-16 02:53:59 UTC
Created attachment 84118: dmesg

System Environment:
--------------------------
Platform:    Pineview/Ironlake/Sandybridge/Ivybridge
Kernel:      (drm-intel-nightly)d93f59e86ae93066969fa8ae2a6c9ccc7fc4728d
 
Bug detailed description:
-----------------------------
This test triggers the OOM killer on Pineview/Ironlake/Sandybridge/Ivybridge with the -queued, -fixes, and -nightly kernels. It is a new igt test case.
The following new cases also have this issue:
igt/gem_concurrent_blit/cpu-overwrite-source-forked	
igt/gem_concurrent_blit/gtt-early-read-forked	
igt/gem_concurrent_blit/gtt-gpu-read-after-write-forked	
igt/gem_concurrent_blit/gtt-overwrite-source-forked	
igt/gem_concurrent_blit/prw-early-read-forked	
igt/gem_concurrent_blit/prw-gpu-read-after-write-forked	
igt/gem_concurrent_blit/prw-overwrite-source-forked	

output:
Subtest prw-overwrite-source: SUCCESS
Subtest prw-early-read: SUCCESS
Subtest prw-gpu-read-after-write: SUCCESS
Subtest prw-overwrite-source-interruptible: SUCCESS
Subtest prw-early-read-interruptible: SUCCESS
Subtest prw-gpu-read-after-write-interruptible: SUCCESS

[  377.060662] Call Trace:
[  377.060693]  [<c08706a5>] ? dump_stack+0x3e/0x4e
[  377.060737]  [<c086d9f4>] ? dump_header.isra.9+0x53/0x15e
[  377.060786]  [<c028ce84>] ? oom_kill_process+0x6b/0x2a3
[  377.062849]  [<c028f5fa>] ? get_page_from_freelist+0x382/0x3b6
[  377.064835]  [<c0296050>] ? try_to_free_pages+0x20b/0x25b
[  377.066796]  [<c0453852>] ? security_capable_noaudit+0xc/0xf
[  377.068754]  [<c028d3bd>] ? out_of_memory+0x1c3/0x1f0
[  377.070690]  [<c028fb17>] ? __alloc_pages_nodemask+0x4e9/0x5e9
[  377.072609]  [<c0297bf4>] ? shmem_getpage_gfp+0x2d1/0x55e
[  377.074630]  [<c0297f2a>] ? shmem_read_mapping_page_gfp+0x1f/0x39
[  377.076581]  [<f8454542>] ? i915_gem_object_get_pages_gtt+0x113/0x27e [i915]
[  377.078506]  [<c08743c8>] ? schedule_preempt_disabled+0x5/0x6
[  377.080436]  [<c0873148>] ? __mutex_lock_interruptible_slowpath+0x13c/0x146
[  377.082383]  [<f8451919>] ? i915_gem_object_get_pages+0x4b/0x71 [i915]
[  377.084350]  [<f8455aef>] ? i915_gem_pwrite_ioctl+0x3a4/0x73a [i915]
[  377.086321]  [<f8383b09>] ? drm_ioctl+0x116/0x323 [drm]
[  377.088313]  [<f845574b>] ? i915_gem_fault+0x1a6/0x1a6 [i915]
[  377.090313]  [<f8383c30>] ? drm_ioctl+0x23d/0x323 [drm]
[  377.092312]  [<f845574b>] ? i915_gem_fault+0x1a6/0x1a6 [i915]
[  377.094301]  [<c0292b88>] ? pagevec_lru_move_fn+0x98/0xa6
[  377.096304]  [<c02a5cad>] ? page_add_new_anon_rmap+0x2f/0x8b
[  377.098306]  [<c02a02b3>] ? handle_pte_fault+0x1f3/0x5e3
[  377.100287]  [<f83839f3>] ? drm_copy_field+0x47/0x47 [drm]
[  377.102247]  [<c02c0508>] ? vfs_ioctl+0x18/0x21
[  377.104216]  [<c02c0ed4>] ? do_vfs_ioctl+0x3ec/0x42c
[  377.106196]  [<c0877869>] ? __do_page_fault+0x400/0x43b
[  377.108175]  [<c087781d>] ? __do_page_fault+0x3b4/0x43b
[  377.110161]  [<c02c0f5d>] ? SyS_ioctl+0x49/0x74
[  377.112135]  [<c08793fa>] ? sysenter_do_call+0x12/0x22


[  377.114111] Mem-Info:
[  377.116076] DMA per-cpu:
[  377.118029] CPU    0: hi:    0, btch:   1 usd:   0
[  377.119996] CPU    1: hi:    0, btch:   1 usd:   0
[  377.121948] CPU    2: hi:    0, btch:   1 usd:   0
[  377.123868] CPU    3: hi:    0, btch:   1 usd:   0
[  377.125742] Normal per-cpu:
[  377.127580] CPU    0: hi:  186, btch:  31 usd:   0
[  377.129421] CPU    1: hi:  186, btch:  31 usd:   0
[  377.131257] CPU    2: hi:  186, btch:  31 usd:   0
[  377.133042] CPU    3: hi:  186, btch:  31 usd:   0
[  377.134771] HighMem per-cpu:
[  377.136470] CPU    0: hi:  186, btch:  31 usd:   0
[  377.138169] CPU    1: hi:  186, btch:  31 usd:   0
[  377.139857] CPU    2: hi:  186, btch:  31 usd:   0
[  377.141465] CPU    3: hi:  186, btch:  31 usd:   0
[  377.143023] active_anon:228986 inactive_anon:229510 isolated_anon:0
[  377.143023]  active_file:26 inactive_file:63 isolated_file:0
[  377.143023]  unevictable:0 dirty:0 writeback:45311 unstable:0
[  377.143023]  free:10385 slab_reclaimable:2946 slab_unreclaimable:7241
[  377.143023]  mapped:82 shmem:406963 pagetables:476 bounce:0
[  377.143023]  free_cma:0

Reproduce steps:
----------------------------
1. ./gem_concurrent_blit --runsubtest gtt-gpu-read-after-write-forked
Comment 1 Chris Wilson 2013-08-16 08:53:22 UTC
Well that was inevitable... The original intention was to thrash with lots of contending processes.

Hopefully

commit 1ca607b458b63762846fc95c104b4686eb4eccb3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 16 09:44:13 2013 +0100

    gem_concurrent_blit: Share total num_buffers between all children
    
    Apparently not all machines have more than 4GiB of RAM available.
    Spoilsports.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68169

will make you happier.
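
A minimal sketch of the idea named in the commit title, assuming a fixed total buffer budget that is split evenly among the forked children. The helper name `buffers_per_child` and its parameters are hypothetical and are not taken from the igt source.

```c
#include <assert.h>

/* Hypothetical helper: split a fixed total buffer budget across
 * nchildren forked workers, so the combined allocation stays near the
 * single-process total instead of growing with the number of forks. */
static int buffers_per_child(int total_buffers, int nchildren)
{
	assert(total_buffers > 0);	/* a budget of zero makes no sense here */

	if (nchildren <= 0)
		return total_buffers;	/* no fork: one process keeps the whole budget */

	int share = total_buffers / nchildren;

	/* Each child still needs at least one buffer to do any work; with
	 * more children than buffers the total is exceeded slightly. */
	return share > 0 ? share : 1;
}
```

With a budget of 1024 buffers and 16 children, each child would allocate 64 buffers rather than 1024.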
Comment 2 Chris Wilson 2013-08-16 10:26:59 UTC
And there was also a classic leak:

commit 1b17cb9d04809f9528279abc44ad74f5559df3b5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 16 11:23:22 2013 +0100

    gem_concurrent_blit: Fix the leak from the children.
    
    As the children use the parent's fd, the kernel only has a single filp
    for everyone. Therefore we cannot rely on the process termination
    reaping the buffers we allocate for the children.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68169
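
The shared-filp behaviour described in the commit message can be illustrated without a GPU. The sketch below is only an analogy, assuming a POSIX system: it uses a pipe in place of the DRM fd and a file status flag in place of GEM buffers, and the function name is hypothetical. A forked child changes state on the inherited descriptor and exits; the parent still sees that state afterwards, because both descriptors refer to the same open file description.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a change made by a forked child on an inherited fd is
 * still visible to the parent after the child has exited, 0 if not,
 * and -1 on setup failure. */
static int child_state_survives_exit(void)
{
	int fds[2];

	if (pipe(fds) != 0)
		return -1;

	pid_t pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: set O_NONBLOCK on the inherited descriptor, then exit.
		 * Status flags live in the shared open file description. */
		int flags = fcntl(fds[0], F_GETFL);
		fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);
		_exit(0);
	}

	int status;
	waitpid(pid, &status, 0);

	/* Parent: the child is gone, but its change is not. */
	int seen = (fcntl(fds[0], F_GETFL) & O_NONBLOCK) != 0;

	close(fds[0]);
	close(fds[1]);
	return seen;
}
```

By the same reasoning, buffers a child creates through the parent's fd stay alive until they are freed explicitly or the parent closes the fd.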
Comment 3 Chris Wilson 2013-08-16 11:23:48 UTC
Freeing stuff is hard:


commit 0d320fdcedf9e5fd512597477e9f4913a3c7af33
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 16 12:07:56 2013 +0100

    gem_concurrent_blit: Purge the child bufmgr's cache
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68169
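
The diff for this commit is not shown here, so the following is only a toy model of why a cache needs purging, under the assumption that a userspace buffer manager parks released buffers for reuse instead of returning them to the system. All names (`toy_cache`, `toy_alloc`, `toy_release`, `toy_purge`) are hypothetical and are not libdrm APIs.

```c
#include <stdlib.h>

#define TOY_CACHE_SLOTS 8
#define TOY_BUF_SIZE 4096

/* Toy reuse cache: released buffers are parked here, still allocated. */
struct toy_cache {
	void *slot[TOY_CACHE_SLOTS];
	int cached;
};

static void *toy_alloc(struct toy_cache *c)
{
	if (c->cached > 0)
		return c->slot[--c->cached];	/* reuse a parked buffer */
	return malloc(TOY_BUF_SIZE);
}

static void toy_release(struct toy_cache *c, void *buf)
{
	if (c->cached < TOY_CACHE_SLOTS)
		c->slot[c->cached++] = buf;	/* "freed" by the caller, but memory is kept */
	else
		free(buf);			/* cache full: really free it */
}

/* Actually return every parked buffer to the system.
 * Returns how many buffers were freed. */
static int toy_purge(struct toy_cache *c)
{
	int freed = c->cached;

	while (c->cached > 0)
		free(c->slot[--c->cached]);
	return freed;
}
```

In this model, releasing every buffer is not enough to give the memory back; only the purge does.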
Comment 4 lu hua 2013-08-20 05:45:40 UTC
Verified. Fixed.
Comment 5 Elizabeth 2017-10-06 14:43:59 UTC
Closing old verified.
