Bug 110990 - [CI][DRMTIP] igt@gem_cpu_reloc@forked - timeout - Received signal SIGQUIT
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-25 10:07 UTC by Lakshmi
Modified: 2019-07-27 16:13 UTC

See Also:
i915 platform: PNV
i915 features: GEM/Other



Description Lakshmi 2019-06-25 10:07:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_315/fi-pnv-d510/igt@gem_cpu_reloc@forked.html

Starting subtest: forked
Received signal SIGQUITReceived signal .
SIGQUIT.
Stack trace: 
Stack trace: 
Received signal SIGQUIT.
Stack trace: 
Received signal SIGQUIT.
Received signal SIGQUIT.
Stack trace: 
Stack trace: 
Received signal SIGQUIT.
Stack trace: 
 #0 [fatal_sig_handler+0xd6]
 #0 [fatal_sig_handler+0xd6]
 #0 [fat al#_0s i[gf_ahtaanl_sidgl_ehra+n0dxlde6r]+
0xd6]
 #0 [fatal_sig_handler+0xd6]
 #1 [killpg+0x40]
 #1 [killpg+0x40] 
#1 [killpg+0x40]
 #0 [fatal #2 [ w#a1i t[+k0ixl1lfp]g
+0x40]
_sig_handler+0xd6]
 #1 [killpg+0x40]
 #2 [ioctl+0x7]
 #2 [ioctl+0x7]
 #2 [__poll+0x14]
  #3 [igt_fork_hang_detector+0x14c#]1
 [killpg+0x40]
 #2 [ioctl+0x7]
 #3 [drmIoct l#4+ 0[x_2_8r]e
al_main268+0x197]
 #5 [main+0x27]
 #3 [drmIoctl+0x28]
 #4 [__gem_execbuf+0x12]
 #3 [__igt_waitchildren+0x56]
 #6 [__libc_start_main+0xe7]
 #4 [__gem_execbuf+0x12]
 #7 [_start+0x2a]
 #3 [drmIoctl+0x28]
 #4 [igt_waitchildren+0x9]
 #2 [ioctl+0x7]
 #5 [gem_execbuf+0x9]
 #6 [run_t #5 [__real_main268+0x212]
 #5 [gem_execbuf+0x9]
  ##66  [[rmuani_nt+e0sxt2+70]x
24c]
 #4 [__gem_execbuf+0x12]
est+0x24c]
 #7 [__real_main268+0x206]
 #5 [gem_execbuf+0x9]
 #8 [main+0x27]
 #7 [__real_main268+0x206]
 #8 [main+0x27]
 #6 [run_test+0x24c]
 #7 [__real_main268+ #9 [__libc_start_main+0xe7]
 #7 [__libc_start_m a#i3n +[0dxrem7I]o
ctl+0x28]
 #10 [_start+0x2a]
  ##89  [[__s_tlairbtc+_0sxt2aar]t
_main+0xe7]
0x206]
 #10 [_start+0x2a]
 #8 [main+0x27]
 #9 [__libc_start_main+0xe7]
 #10 [_start+0x2a]
 #4 [__gem_execbuf+0x12]
 #5 [gem_execbuf+0x9]
 #6 [run_test+0x24c]
 #7 [__real_main268+0x206]
 #8 [main+0x27]
 #9 [__libc_start_main+0xe7]
 #10 [_start+0x2a]
Comment 1 CI Bug Log 2019-06-25 10:07:57 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* PNV: igt@gem_cpu_reloc@forked - timeout - Received signal SIGQUIT
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_315/fi-pnv-d510/igt@gem_cpu_reloc@forked.html
Comment 2 Chris Wilson 2019-06-25 13:11:26 UTC
Not much to see here, I think the system got swap happy.
Comment 3 Caz.Yokoyama 2019-06-26 16:10:06 UTC
Does "SIGQUITReceived" mean that the kswap daemon picked this process at random and killed it because there were too many processes or too much pinned memory? This has happened only once. Maybe the machine this test ran on has less memory, or does not have enough swap space?
Comment 4 Chris Wilson 2019-06-26 16:22:39 UTC
SIGQUIT is the runner timeout. I am suggesting that variance in runtime on pnv is due to swap thrashing.
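For readers unfamiliar with the pattern, here is a minimal sketch of a runner-style timeout that delivers SIGQUIT to a test's whole process group, which is why every forked child in the log above reports the signal. This is not the CI runner's actual code; `run_with_timeout` and its parameters are hypothetical names chosen for illustration.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in its own process group and return (returncode, timed_out).

    Hypothetical helper, not the real runner: on timeout it sends SIGQUIT
    to the whole group so that forked children receive it as well.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group ID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGQUIT)
        # A negative return code means "terminated by that signal".
        return proc.wait(), True


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(sleeper, 0.5))
```

A test that is merely slow (for example because the machine is swapping) is indistinguishable from a hung one under this scheme, which is the point made in the comment above.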
Comment 5 Chris Wilson 2019-07-03 19:52:30 UTC
Pure postulation, but I expect

commit c03467ba40f783ebe756114bb68e13a6b404c03a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 3 10:17:17 2019 +0100

    drm/i915/gem: Free pages before rcu-freeing the object
    
    As we have dropped the final reference to the object, we do not need to
    wait until after the rcu grace period to drop its pages. We still require
    struct_mutex to completely unbind the object to release the pages, so we
    still need a free-worker to manage that from process context. By
    scheduling the release of pages before waiting for the rcu should mean
    that we are not trapping those pages from beyond the reach of the
    shrinker.
    
    v2: Pass along the request to skip if the vma is busy to the underlying
    unbind routine, to avoid checking the reservation underneath the
    i915->mm.obj_lock which may be used from inside irq context.
    
    v3: Flip the bit for unbinding while active, for later convenience.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111035
    Fixes: a93615f900bd ("drm/i915: Throw away the active object retirement complexity")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-6-chris@chris-wilson.co.uk

to help here.
Comment 6 Francesco Balestrieri 2019-07-23 09:17:43 UTC
So far it hasn't happened again, so either the postulation is correct, or it's a rare one.
Comment 7 Chris Wilson 2019-07-27 16:13:23 UTC
See bug 110619 for ideas on what else we need to collect to make these reports more actionable than a shot in the dark.
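As one illustration of extra data that could make a swap-thrashing hypothesis checkable, the sketch below samples the `pswpin` and `pswpout` counters from `/proc/vmstat` around a test run. It is an assumption about what might be useful, not a summary of what bug 110619 proposes, and the function names are hypothetical.

```python
def parse_swap_counters(vmstat_text):
    """Extract pswpin/pswpout (pages swapped in/out) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two snapshots."""
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("pswpin", "pswpout")
    }


def read_vmstat(path="/proc/vmstat"):
    """Read the raw counter file (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    before = parse_swap_counters(read_vmstat())
    # ... run the test here ...
    after = parse_swap_counters(read_vmstat())
    print(swap_delta(before, after))
```

A large delta recorded alongside a timeout would support the swap explanation; a delta near zero would point elsewhere.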

