Bug 111035

Summary: [CI][SHARDS] igt@gem_create@create-clear - dmesg-fail - gem_create: page allocation failure
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: high
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: SNB
i915 features: GEM/Other

Description Martin Peres 2019-07-01 10:57:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5071/shard-snb1/igt@gem_create@create-clear.html

<6> [923.583591] [IGT] gem_create: executing
<6> [923.590047] [IGT] gem_create: starting subtest create-clear
<4> [927.627455] gem_create: page allocation failure: order:0, mode:0x104cd2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_RECLAIMABLE), nodemask=(null)
<4> [927.627472] CPU: 6 PID: 7843 Comm: gem_create Tainted: G     U            5.2.0-rc6-CI-CI_DRM_6375+ #1
<4> [927.627474] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
<4> [927.627475] Call Trace:
<4> [927.627481]  dump_stack+0x67/0x9b
<4> [927.627485]  warn_alloc+0xfa/0x180
<4> [927.627490]  ? __mutex_unlock_slowpath+0x46/0x2b0
<4> [927.627495]  __alloc_pages_nodemask+0xd5c/0x1130
<4> [927.627506]  shmem_alloc_and_acct_page+0x72/0x1e0
<4> [927.627510]  shmem_getpage_gfp.isra.8+0x156/0x860
<4> [927.627517]  shmem_read_mapping_page_gfp+0x3e/0x70
<4> [927.627573]  shmem_get_pages+0x212/0x6d0 [i915]
<4> [927.627608]  ? __i915_gem_object_get_pages+0x18/0xb0 [i915]
<4> [927.627614]  ? lock_acquire+0xa6/0x1c0
<4> [927.627647]  ____i915_gem_object_get_pages+0x1d/0xa0 [i915]
<4> [927.627677]  __i915_gem_object_get_pages+0x59/0xb0 [i915]
<4> [927.627710]  i915_gem_pread_ioctl+0x3ea/0x7d0 [i915]
<4> [927.627713]  ? drm_dev_exit+0x8/0x40
<4> [927.627746]  ? i915_gem_gtt_pread+0x7a0/0x7a0 [i915]
<4> [927.627749]  drm_ioctl_kernel+0x83/0xf0
<4> [927.627753]  drm_ioctl+0x2f3/0x3b0
<4> [927.627786]  ? i915_gem_gtt_pread+0x7a0/0x7a0 [i915]
<4> [927.627792]  ? lock_acquire+0xa6/0x1c0
<4> [927.627796]  do_vfs_ioctl+0xa0/0x6e0
<4> [927.627800]  ? __fget+0x10f/0x200
<4> [927.627803]  ksys_ioctl+0x35/0x60
<4> [927.627807]  __x64_sys_ioctl+0x11/0x20
<4> [927.627809]  do_syscall_64+0x55/0x1c0
<4> [927.627812]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [927.627814] RIP: 0033:0x7f8acc5305d7
<4> [927.627818] Code: Bad RIP value.
Comment 1 CI Bug Log 2019-07-01 10:58:22 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SNB: igt@gem_create@create-clear - dmesg-fail - gem_create: page allocation failure
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5071/shard-snb1/igt@gem_create@create-clear.html
Comment 2 Chris Wilson 2019-07-01 11:30:44 UTC
That's the impact of making the object's pages unavailable for reclaim while the object is pending its RCU callback. It should be fine to rearrange the free path as in https://patchwork.freedesktop.org/patch/315092/?series=63032&rev=1
Comment 3 Chris Wilson 2019-07-03 19:51:13 UTC
commit c03467ba40f783ebe756114bb68e13a6b404c03a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 3 10:17:17 2019 +0100

    drm/i915/gem: Free pages before rcu-freeing the object
    
    As we have dropped the final reference to the object, we do not need to
    wait until after the rcu grace period to drop its pages. We still require
    struct_mutex to completely unbind the object to release the pages, so we
    still need a free-worker to manage that from process context. By
    scheduling the release of pages before waiting for the rcu should mean
    that we are not trapping those pages from beyond the reach of the
    shrinker.
    
    v2: Pass along the request to skip if the vma is busy to the underlying
    unbind routine, to avoid checking the reservation underneath the
    i915->mm.obj_lock which may be used from inside irq context.
    
    v3: Flip the bit for unbinding while active, for later convenience.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111035
    Fixes: a93615f900bd ("drm/i915: Throw away the active object retirement complexity")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-6-chris@chris-wilson.co.uk
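
The ordering change described in the commit message can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not i915 code: `PagePool`, `simulate`, and the `free_pages_before_grace_period` flag are hypothetical names invented for this example, the "grace period" is modelled as ending only after all rounds have run (a worst case), and no shrinker is modelled. A failed `alloc` here merely stands in for the page allocation failure seen in the log above.

```python
class PagePool:
    """Fixed number of pages standing in for available system memory."""

    def __init__(self, total):
        self.free = total

    def alloc(self, n):
        """Take n pages; return False if the pool cannot satisfy the request."""
        if n > self.free:
            return False
        self.free -= n
        return True

    def release(self, n):
        """Return n pages to the pool."""
        self.free += n


def simulate(free_pages_before_grace_period, total_pages=8, obj_pages=8, rounds=4):
    """Create and destroy `rounds` objects back to back.

    Returns the number of failed allocations. In this toy model the grace
    period only elapses after the loop, so any pages still attached to a
    dead object stay out of reach until then.
    """
    pool = PagePool(total_pages)
    pending = []  # pages still held by objects awaiting the grace period
    failures = 0

    for _ in range(rounds):
        if not pool.alloc(obj_pages):
            failures += 1  # stands in for "page allocation failure"
            continue
        # The final reference to the object is dropped here.
        if free_pages_before_grace_period:
            pool.release(obj_pages)    # new ordering: pages go back at once
            pending.append(0)
        else:
            pending.append(obj_pages)  # old ordering: pages wait with the object

    # Grace period ends: deferred frees finally run.
    for held in pending:
        pool.release(held)
    assert pool.free == total_pages  # nothing leaked in either ordering
    return failures


if __name__ == "__main__":
    print("pages freed after grace period :", simulate(False), "failed allocations")
    print("pages freed before grace period:", simulate(True), "failed allocations")
```

With the defaults, holding pages until the deferred free makes every allocation after the first one fail, while releasing them as soon as the last reference is dropped lets all rounds succeed. The real driver additionally needs a free-worker in process context to unbind the object, as the commit message notes; that part is not modelled here.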
