Bug 106099 - [CI gdg] igt@gem_exec_reloc@basic-wc-(gtt|cpu)* - fail - Failed assertion: reloc.presumed_offset == offset
Status: REOPENED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Tvrtko Ursulin
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Duplicates: 106376
Reported: 2018-04-17 11:40 UTC by Martin Peres
Modified: 2018-12-05 09:38 UTC
i915 platform: I915G
i915 features: GEM/Other


Description Martin Peres 2018-04-17 11:40:12 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_20/fi-gdg-551/igt@gem_exec_reloc@basic-wc-read-noreloc.html

(gem_exec_reloc:1318) CRITICAL: Test assertion failure function basic_reloc, file ../tests/gem_exec_reloc.c:422:
(gem_exec_reloc:1318) CRITICAL: Failed assertion: reloc.presumed_offset == offset
(gem_exec_reloc:1318) CRITICAL: error: 0x322000 != 0xffffffff
Subtest basic-wc-read-noreloc failed.
Comment 1 Chris Wilson 2018-04-17 11:46:34 UTC
It hit the slow path (where we have to tell userspace to do relocations on the next pass) where we did not expect it to. The cause could be an interrupt, memory pressure, or a bug (in igt or execbuf).
Comment 2 Chris Wilson 2018-05-03 13:20:02 UTC
*** Bug 106376 has been marked as a duplicate of this bug. ***
Comment 3 Martin Peres 2018-05-03 14:20:42 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-gdg-551/igt@gem_exec_big.html

(gem_exec_big:1176) CRITICAL: Test assertion failure function execN, file ../tests/gem_exec_big.c:192:
(gem_exec_big:1176) CRITICAL: Failed assertion: tmp == gem_reloc[n].presumed_offset
(gem_exec_big:1176) CRITICAL: error: -559038845 != 3805184
Test gem_exec_big failed.
Comment 4 Martin Peres 2018-05-22 22:28:41 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_36/fi-gdg-551/igt@gem_exec_reloc@basic-wc-gtt.html

Also seen on a non-noreloc test.
Comment 5 Chris Wilson 2018-05-23 08:52:23 UTC
(In reply to Martin Peres from comment #3)
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-gdg-551/igt@gem_exec_big.html
> 
> (gem_exec_big:1176) CRITICAL: Test assertion failure function execN, file
> ../tests/gem_exec_big.c:192:
> (gem_exec_big:1176) CRITICAL: Failed assertion: tmp ==
> gem_reloc[n].presumed_offset
> (gem_exec_big:1176) CRITICAL: error: -559038845 != 3805184
> Test gem_exec_big failed.

Careful, that isn't the same class of failure. That's arguably the same read/write incoherency we see elsewhere in gdg. It's where the offset is 0xffffffff that is a test bug.
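
The two logs print their operands differently (hex in gem_exec_reloc, signed decimal in gem_exec_big), which makes them hard to compare by eye. Below is a minimal Python sketch, not part of igt, that renders the four logged values as unsigned 32-bit hex. The helper names are made up for illustration, and treating the gem_exec_big operands as 32-bit values is an assumption.

```python
ALL_ONES = 0xFFFFFFFF  # 32-bit all-ones pattern, i.e. -1 reinterpreted as u32


def to_u32(value: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def describe(value: int) -> str:
    """Format a logged value as zero-padded 32-bit hex, flagging all-ones."""
    u = to_u32(value)
    note = " (all ones)" if u == ALL_ONES else ""
    return f"{u:#010x}{note}"


# Values copied verbatim from the failure logs quoted in this bug.
logged = {
    "gem_exec_reloc": (0x322000, 0xFFFFFFFF),
    "gem_exec_big": (-559038845, 3805184),
}

for test, (lhs, rhs) in logged.items():
    print(f"{test}: {describe(lhs)} != {describe(rhs)}")
```

In this view only the gem_exec_reloc failure contains the all-ones value 0xffffffff; the gem_exec_big operands come out as 0xdeadbe83 and 0x003a1000.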
Comment 6 Martin Peres 2018-05-23 21:28:15 UTC
(In reply to Chris Wilson from comment #5)
> (In reply to Martin Peres from comment #3)
> > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-gdg-551/igt@gem_exec_big.html
> > 
> > (gem_exec_big:1176) CRITICAL: Test assertion failure function execN, file
> > ../tests/gem_exec_big.c:192:
> > (gem_exec_big:1176) CRITICAL: Failed assertion: tmp ==
> > gem_reloc[n].presumed_offset
> > (gem_exec_big:1176) CRITICAL: error: -559038845 != 3805184
> > Test gem_exec_big failed.
> 
> Careful, that isn't the same class of failure. That's arguably the same
> read/write incoherency we see elsewhere in gdg. It's where the offset is
> 0xffffffff that is a test bug.

OK, I will file another bug then!
Comment 7 Chris Wilson 2018-09-06 19:47:23 UTC
Fingers crossed, but

commit fddcd00a49e9122a3579247151e9cb3ce5a1a36e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 3 09:33:35 2018 +0100

    drm/i915: Force the slow path after a user-write error
    
    If we fail to write the user relocation back when it is changed, force
    ourselves to take the slow relocation path where we can handle faults in
    the write path. There is still an element of dubiousness as having
    patched up the batch to use the correct offset, it no longer matches the
    presumed_offset in the relocation, so a second pass may miss any changes
    in layout.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-3-chris@chris-wilson.co.uk

seems a more than likely suspect.
Comment 8 Martin Peres 2018-09-20 17:29:31 UTC
(In reply to Chris Wilson from comment #7)
> Fingers crossed, but
> 
> commit fddcd00a49e9122a3579247151e9cb3ce5a1a36e
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Sep 3 09:33:35 2018 +0100
> 
>     drm/i915: Force the slow path after a user-write error
>     
>     If we fail to write the user relocation back when it is changed, force
>     ourselves to take the slow relocation path where we can handle faults in
>     the write path. There is still an element of dubiousness as having
>     patched up the batch to use the correct offset, it no longer matches the
>     presumed_offset in the relocation, so a second pass may miss any changes
>     in layout.
>     
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-3-chris@chris-wilson.co.uk
> 
> seems a more than likely suspect.

This still happened days later: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_108/fi-gdg-551/igt@gem_exec_reloc@basic-wc-cpu-noreloc.html
Comment 9 Francesco Balestrieri 2018-12-04 08:42:37 UTC
Last seen three days ago.

