https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_315/fi-pnv-d510/igt@gem_cpu_reloc@forked.html

Starting subtest: forked
Received signal SIGQUIT. Stack trace:

(The traces were emitted concurrently by several processes and arrived interleaved on the console; untangled and deduplicated below.)

Parent, waiting on its children:
 #0 [fatal_sig_handler+0xd6]
 #1 [killpg+0x40]
 #2 [wait+0x1f]
 #3 [__igt_waitchildren+0x56]
 #4 [igt_waitchildren+0x9]
 #5 [__real_main268+0x212]
 #6 [main+0x27]
 #7 [__libc_start_main+0xe7]
 #8 [_start+0x2a]

Hang detector:
 #0 [fatal_sig_handler+0xd6]
 #1 [killpg+0x40]
 #2 [__poll+0x14]
 #3 [igt_fork_hang_detector+0x14c]
 #4 [__real_main268+0x197]
 #5 [main+0x27]
 #6 [__libc_start_main+0xe7]
 #7 [_start+0x2a]

Forked children, inside the execbuf ioctl:
 #0 [fatal_sig_handler+0xd6]
 #1 [killpg+0x40]
 #2 [ioctl+0x7]
 #3 [drmIoctl+0x28]
 #4 [__gem_execbuf+0x12]
 #5 [gem_execbuf+0x9]
 #6 [run_test+0x24c]
 #7 [__real_main268+0x206]
 #8 [main+0x27]
 #9 [__libc_start_main+0xe7]
 #10 [_start+0x2a]
The CI Bug Log issue associated with this bug has been updated.

### New filters associated
* PNV: igt@gem_cpu_reloc@forked - timeout - Received signal SIGQUIT - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_315/fi-pnv-d510/igt@gem_cpu_reloc@forked.html
Not much to see here, I think the system got swap happy.
Does SIGQUIT mean that the kswapd daemon picked this process at random and killed it because there were too many processes or too much pinned memory? This has only happened once. Maybe the machine this test is executed on has too little memory, or not enough swap space?
SIGQUIT is the runner timeout. I am suggesting that variance in runtime on pnv is due to swap thrashing.
Pure postulation, but I expect

commit c03467ba40f783ebe756114bb68e13a6b404c03a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 3 10:17:17 2019 +0100

    drm/i915/gem: Free pages before rcu-freeing the object

    As we have dropped the final reference to the object, we do not need
    to wait until after the rcu grace period to drop its pages. We still
    require struct_mutex to completely unbind the object to release the
    pages, so we still need a free-worker to manage that from process
    context. By scheduling the release of pages before waiting for the
    rcu should mean that we are not trapping those pages from beyond the
    reach of the shrinker.

    v2: Pass along the request to skip if the vma is busy to the
    underlying unbind routine, to avoid checking the reservation
    underneath the i915->mm.obj_lock which may be used from inside irq
    context.

    v3: Flip the bit for unbinding while active, for later convenience.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111035
    Fixes: a93615f900bd ("drm/i915: Throw away the active object retirement complexity")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-6-chris@chris-wilson.co.uk

to help here.
So far it hasn't happened again, so either the postulation is correct, or it's a rare one.
See bug 110619 for ideas on what else we need to collect to make these reports more actionable than a shot in the dark.