https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_20/fi-gdg-551/igt@gem_exec_reloc@basic-wc-read-noreloc.html
(gem_exec_reloc:1318) CRITICAL: Test assertion failure function basic_reloc, file ../tests/gem_exec_reloc.c:422:
(gem_exec_reloc:1318) CRITICAL: Failed assertion: reloc.presumed_offset == offset
(gem_exec_reloc:1318) CRITICAL: error: 0x322000 != 0xffffffff
Subtest basic-wc-read-noreloc failed.
It hit the slow path (where we have to tell userspace to do relocations on the next pass) when we did not expect it to. It could be an interrupt, it could be memory pressure, or it could be a bug (in igt or execbuf).
*** Bug 106376 has been marked as a duplicate of this bug. ***
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-gdg-551/igt@gem_exec_big.html
(gem_exec_big:1176) CRITICAL: Test assertion failure function execN, file ../tests/gem_exec_big.c:192:
(gem_exec_big:1176) CRITICAL: Failed assertion: tmp == gem_reloc[n].presumed_offset
(gem_exec_big:1176) CRITICAL: error: -559038845 != 3805184
Test gem_exec_big failed.
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_36/fi-gdg-551/igt@gem_exec_reloc@basic-wc-gtt.html Also seen on a non-noreloc test.
(In reply to Martin Peres from comment #3)
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-gdg-551/igt@gem_exec_big.html
>
> (gem_exec_big:1176) CRITICAL: Test assertion failure function execN, file ../tests/gem_exec_big.c:192:
> (gem_exec_big:1176) CRITICAL: Failed assertion: tmp == gem_reloc[n].presumed_offset
> (gem_exec_big:1176) CRITICAL: error: -559038845 != 3805184
> Test gem_exec_big failed.
Careful, that isn't the same class of failure. That's arguably the same read/write incoherency we see elsewhere in gdg. It's where the offset is 0xffffffff that is a test bug.
(In reply to Chris Wilson from comment #5)
> Careful, that isn't the same class of failure. That's arguably the same
> read/write incoherency we see elsewhere in gdg. It's where the offset is
> 0xffffffff that is a test bug.
ok! I will file another bug then!
Fingers crossed, but
commit fddcd00a49e9122a3579247151e9cb3ce5a1a36e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Sep 3 09:33:35 2018 +0100
    drm/i915: Force the slow path after a user-write error
    If we fail to write the user relocation back when it is changed, force
    ourselves to take the slow relocation path where we can handle faults in
    the write path. There is still an element of dubiousness as having
    patched up the batch to use the correct offset, it no longer matches the
    presumed_offset in the relocation, so a second pass may miss any changes
    in layout.
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-3-chris@chris-wilson.co.uk
seems a more than likely suspect.
(In reply to Chris Wilson from comment #7)
> Fingers crossed, but
>
> commit fddcd00a49e9122a3579247151e9cb3ce5a1a36e
> drm/i915: Force the slow path after a user-write error
>
> seems a more than likely suspect.
This still happened days later:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_108/fi-gdg-551/igt@gem_exec_reloc@basic-wc-cpu-noreloc.html
Last seen three days ago.
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_157/fi-gdg-551/igt@gem_exec_big.html
(gem_exec_big:949) CRITICAL: Test assertion failure function exec1, file ../tests/i915/gem_exec_big.c:112:
(gem_exec_big:949) CRITICAL: Failed assertion: tmp == gem_reloc[0].presumed_offset
(gem_exec_big:949) CRITICAL: error: 0 != 3289088
Test gem_exec_big failed.
A CI Bug Log filter associated to this bug has been updated: {- fi-gdg-551: igt@gem_exec_big - fail - Failed assertion: tmp == gem_reloc[(n|0)].presumed_offset -} {+ GDG HSW: igt@gem_exec_big - fail - Failed assertion: tmp == gem_reloc[(n|0)].presumed_offset +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_197/fi-hsw-peppy/igt@gem_exec_big.html
Still happening, latest occurrence: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_231/fi-gdg-551/igt@gem_exec_big.html
This issue still occurs on the fi-gdg-551 machine every 1-2 weeks. It has not been observed on the HSW machines in the past 4 months. In the past 4 months, the bug has only been observed with the WC and no-reloc flags on the fi-gdg-551 machine. In the most recent occurrence of the issue, the offset and the presumed offset didn't match, and presumed_offset != -1:
(gem_exec_reloc:1007) CRITICAL: Failed assertion: reloc.presumed_offset == offset
(gem_exec_reloc:1007) CRITICAL: error: 0x325000 != 0x327000
The older occurrences of the bug do not have logs available. Would it be possible to retain logs for this filter beyond 2 months?
The only bug that is relevant here is where we report 0xffffffff, i.e. we unexpectedly hit the relocation slow path. Please do not conflate the wider gdg incoherency with this bug.
A CI Bug Log filter associated to this bug has been updated: {- GDG: igt@gem_exec_reloc@basic-wc* - fail - Failed assertion: reloc.presumed_offset == offset -} {+ GDG: igt@gem_exec_reloc@basic-wc* - fail - Failed assertion: reloc.presumed_offset == offset, error: 0x[\da-f]+ != 0xffffffff +} No new failures caught with the new filter
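For triage, the distinction drawn above (only the 0xffffffff case belongs to this bug, other mismatches are the wider gdg incoherency) can be sketched as a small log classifier. This is only an illustration, not how CI Bug Log is implemented: the error pattern is copied from the updated filter, while the function name `classify` and the labels "slow-path", "offset-mismatch" and "other" are made up for this example.

```python
import re

# Error pattern copied from the updated CI Bug Log filter in this bug.
SLOW_PATH_ERROR = re.compile(r"error: 0x[\da-f]+ != 0xffffffff")

# Assertion text as printed by gem_exec_reloc in the logs quoted in this bug.
RELOC_ASSERT = "Failed assertion: reloc.presumed_offset == offset"


def classify(log: str) -> str:
    """Hypothetical helper: sort an igt failure log into one of three buckets.

    "slow-path"       -> presumed_offset reported as 0xffffffff (this bug)
    "offset-mismatch" -> same assertion, but two different real offsets
    "other"           -> any other assertion (e.g. the gem_exec_big failures)
    """
    if RELOC_ASSERT not in log:
        return "other"
    if SLOW_PATH_ERROR.search(log):
        return "slow-path"
    return "offset-mismatch"


if __name__ == "__main__":
    sample = (
        "(gem_exec_reloc:1318) CRITICAL: Failed assertion: "
        "reloc.presumed_offset == offset\n"
        "(gem_exec_reloc:1318) CRITICAL: error: 0x322000 != 0xffffffff\n"
    )
    print(classify(sample))
```

Matching on the assertion text first keeps gem_exec_big failures out of both relocation buckets, which mirrors the request not to conflate them with this bug.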
The CI Bug Log issue associated to this bug has been updated. ### Removed filters * GDG HSW: igt@gem_exec_big - fail - Failed assertion: tmp == gem_reloc[(n|0)].presumed_offset (added on 6 months, 4 weeks ago)
(In reply to Chris Wilson from comment #14) > The only bug that is relevant here is where we report 0xffffffff, i.e. we > unexpectedly hit the relocation slow path. > > Please do not conflate the wider gdg incoherency with this bug. Filing updated! Thanks!
The CI Bug Log issue associated to this bug has been updated. ### New filters associated * GDG: igt@gem_exec_reloc@basic-wc-noreloc - fail - Failed assertion: reloc.presumed_offset == offset - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_355/fi-gdg-551/igt@gem_exec_reloc@basic-wc-noreloc.html
@Chris, there is a new failure captured under this bug:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_389/fi-gdg-551/igt@gem_exec_reloc@basic-wc-read.html
Starting subtest: basic-wc-read
(gem_exec_reloc:964) CRITICAL: Test assertion failure function basic_reloc, file ../tests/i915/gem_exec_reloc.c:424:
(gem_exec_reloc:964) CRITICAL: Failed assertion: reloc.presumed_offset == offset
(gem_exec_reloc:964) CRITICAL: error: 0x30b000 != 0xffffffff
Subtest basic-wc-read failed.
A CI Bug Log filter associated to this bug has been updated: {- GDG: igt@gem_exec_reloc@basic-wc-noreloc - fail - Failed assertion: reloc.presumed_offset == offset -} {+ GDG: igt@gem_exec_reloc@basic-wc-.* - fail - Failed assertion: reloc.presumed_offset == offset +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_389/fi-gdg-551/igt@gem_exec_reloc@basic-wc-gtt-noreloc.html