| Summary: | [CI][DRMTIP] igt@gem_tiled_fence_blits@normal - fail - Failed assertion: linear[i] == start_val | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Martin Peres <martin.peres> |
| Component: | IGT | Assignee: | Default DRI bug account <dri-devel> |
| Status: | CLOSED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | medium | CC: | intel-gfx-bugs |
| Version: | XOrg git | Keywords: | regression |
| Hardware: | Other | | |
| OS: | All | | |
| Whiteboard: | ReadyForDev | | |
| i915 platform: | G45, I945GM, I965GM, ILK | i915 features: | GEM/Other |
Description
Martin Peres
2018-10-29 14:38:42 UTC
Comment 1
Neither of those are as likely as

commit ff2db94acb53543acd7ba4e2badff59807069365 (upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jul 23 11:39:09 2018 +0100

    igt/gem_tiled_fence_blits: Remove libdrm_intel dependence

    Modernise the test to use igt's ioctl library as opposed to the
    antiquated libdrm_intel.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Comment 2
gdg/blb can be easily explained: https://patchwork.freedesktop.org/patch/259064/

elk/ilk?

Comment 3 (Chris Wilson)
commit 3aedf1b000e27abfa1bf179205a81efe2b76a508 (upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Oct 29 20:47:35 2018 +0000

    igt/gem_tiled_fence_blits: Remember to mark up fence blits

    Older platforms require fence registers to perform blits, and so
    userspace is expected to mark up the objects to request fences be
    assigned.

    Fixes: ff2db94acb53 ("igt/gem_tiled_fence_blits: Remove libdrm_intel dependence")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108591
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

I haven't yet tested elk/ilk, but on the off chance that this helps...

Comment 4
(In reply to Chris Wilson from comment #3)
> commit 3aedf1b000e27abfa1bf179205a81efe2b76a508 (upstream/master)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Oct 29 20:47:35 2018 +0000
>
>     igt/gem_tiled_fence_blits: Remember to mark up fence blits
>
>     Older platforms require fence registers to perform blits, and so
>     userspace is expected to mark up the objects to request fences be
>     assigned.
>
>     Fixes: ff2db94acb53 ("igt/gem_tiled_fence_blits: Remove libdrm_intel dependence")
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108591
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> I haven't yet tested elk/ilk, but on the off chance that this helps...

Still happening with ILK and ELK:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_138/fi-ilk-650/igt@gem_tiled_fence_blits@normal.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_138/fi-elk-e7500/igt@gem_tiled_fence_blits@normal.html

Starting subtest: normal
(gem_tiled_fence_blits:1194) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_tiled_fence_blits.c:80:
(gem_tiled_fence_blits:1194) CRITICAL: Failed assertion: linear[i] == start_val
(gem_tiled_fence_blits:1194) CRITICAL: Expected 0x1f4c0000, found 0x1f380000 at offset 0x00000000
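For context, the assertion quoted above comes from the test's read-back step: after the chain of tiled, fenced blits, each buffer is mapped and compared dword by dword against the pattern it was seeded with. A paraphrased sketch of that check (not the verbatim IGT source; the buffer size and the way the CPU mapping is obtained are illustrative assumptions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Paraphrased sketch of check_bo(): verify that a linear read-back of one
 * buffer still holds the incrementing pattern written before the blits.
 * "linear" stands in for a CPU mapping of the object. */
static void check_linear_pattern(const uint32_t *linear, size_t dwords,
                                 uint32_t start_val)
{
    for (size_t i = 0; i < dwords; i++) {
        /* This is the comparison that fires in CI, e.g.
         * "Expected 0x1f4c0000, found 0x1f380000 at offset 0x00000000". */
        assert(linear[i] == start_val);
        start_val++;
    }
}
```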
Comment 5 (Chris Wilson)
As expected, elk/ilk is a completely different bug, https://patchwork.freedesktop.org/series/52013/ and ideally shouldn't be grouped up with the igt bug.

Comment 6
(In reply to Chris Wilson from comment #5)
> As expected, elk/ilk is a completely different bug,
> https://patchwork.freedesktop.org/series/52013/
> and ideally shouldn't be grouped up with the igt bug.

commit 55f99bf2a9c331838c981694bc872cd1ec4070b2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 5 09:43:05 2018 +0000

    drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5

    Exercising the gpu reloc path strenuously revealed an issue where the
    updated relocations (from MI_STORE_DWORD_IMM) were not being observed
    upon execution. After some experiments with adding pipecontrols (a lot
    of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
    controls or even the current on), it was discovered that we merely
    needed to delay the EMIT_INVALIDATE by several flushes. It is important
    to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
    needs the delay as opposed to what one might first expect -- that the
    delay is required for the TLB invalidation to take effect (one presumes
    to purge any CS buffers) as opposed to a delay after flushing to ensure
    the writes have landed before triggering invalidation.

    Testcase: igt/gem_tiled_fence_blits
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@vger.kernel.org
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181105094305.5767-1-chris@chris-wilson.co.uk
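The gist of the fix in comment 6, per its commit message, is to pad the ring's EMIT_INVALIDATE path on gen4/gen5 with several extra MI_FLUSH commands so that the invalidation only takes effect after earlier command-streamer writes have landed. A purely illustrative sketch of that idea (not the actual i915 ring-emission code; the opcode macro and the representation of the invalidation as a plain MI_FLUSH are simplified assumptions):

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode 0x04 in the MI command encoding (MI_FLUSH). */
#define MI_FLUSH    (0x04u << 23)

/* Illustrative only: emit a handful of padding flushes before the flush
 * that performs the invalidation, mimicking the "delay the EMIT_INVALIDATE
 * by several flushes" workaround described in the commit message. */
static size_t emit_invalidate_with_delay(uint32_t *cs, unsigned int padding)
{
    size_t n = 0;

    while (padding--)
        cs[n++] = MI_FLUSH;     /* the delay */

    cs[n++] = MI_FLUSH;         /* the invalidation proper */

    return n;                   /* dwords written into the ring */
}
```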
Comment 7
BLB is still happening:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_139/fi-blb-e6850/igt@gem_tiled_fence_blits@normal.html

Starting subtest: normal
(gem_tiled_fence_blits:1040) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_tiled_fence_blits.c:80:
(gem_tiled_fence_blits:1040) CRITICAL: Failed assertion: linear[i] == start_val
(gem_tiled_fence_blits:1040) CRITICAL: Expected 0x06900000, found 0x0aa80000 at offset 0x00000000

Comment 8 (Chris Wilson)
Probably should have mentioned the gpu hang, as that makes it a completely different bug.

<7> [99.123649] hangcheck rcs0
<7> [99.123667] hangcheck   current seqno 1ec, last 1fb, hangcheck 1ec [5952 ms]
<7> [99.123671] hangcheck   Reset count: 0 (global 0)
<7> [99.123676] hangcheck   Requests:
<7> [99.123688] hangcheck     first 1ed [4:2ba8] @ 7677ms: rcs0
<7> [99.123693] hangcheck     last 1fb [4:2bb6] @ 7672ms: rcs0
<7> [99.123704] hangcheck     active 1ed [4:2ba8] @ 7677ms: rcs0
<7> [99.123709] hangcheck     ring->start: 0x00002000
<7> [99.123712] hangcheck     ring->head: 0x0000ad60
<7> [99.123716] hangcheck     ring->tail: 0x0000afc8
<7> [99.123720] hangcheck     ring->emit: 0x0000afc8
<7> [99.123724] hangcheck     ring->space: 0x00006a78
<7> [99.123729] hangcheck [head ad70, postfix ad88, tail ad98, batch 0x00000000_00326000]:
<7> [99.123745] hangcheck [0000] 02000001 00000000 18800080 00326001 02000000 00000000 10800001 000000c0
<7> [99.123750] hangcheck [0020] 000001ed 01000000
<7> [99.123769] hangcheck   RING_START: 0x00002000
<7> [99.123774] hangcheck   RING_HEAD: 0x0000ad84
<7> [99.123778] hangcheck   RING_TAIL: 0x0000afc8
<7> [99.123782] hangcheck   RING_CTL: 0x0001f001
<7> [99.123786] hangcheck   RING_MODE: 0x00000000
<7> [99.123791] hangcheck   ACTHD: 0x00000000_0060ad84
<7> [99.123795] hangcheck   BBADDR: 0x00000000_00000000
<7> [99.123799] hangcheck   DMA_FADDR: 0x00000000_0000cfc8
<7> [99.123803] hangcheck   IPEIR: 0x00000000
<7> [99.123807] hangcheck   IPEHR: 0x02000000
<7> [99.123833] hangcheck     E 1ed [4:2ba8] @ 7677ms: rcs0
<7> [99.123870] hangcheck     E 1ee [4:2ba9] @ 7676ms: rcs0
<7> [99.123874] hangcheck     E 1ef [4:2baa] @ 7676ms: rcs0
<7> [99.123879] hangcheck     E 1f0 [4:2bab] @ 7675ms: rcs0
<7> [99.123884] hangcheck     E 1f1 [4:2bac] @ 7675ms: rcs0
<7> [99.123888] hangcheck     E 1f2 [4:2bad] @ 7675ms: rcs0
<7> [99.123893] hangcheck     E 1f3 [4:2bae] @ 7675ms: rcs0
<7> [99.123897] hangcheck     ...skipping 7 executing requests...
<7> [99.123902] hangcheck     E 1fb [4:2bb6] @ 7672ms: rcs0
<7> [99.123905] hangcheck     Queue priority: -2147483648
<7> [99.123926] hangcheck   gem_tiled_fence [1040] waiting for 1ed
<7> [99.123955] hangcheck IRQ? 0x1 (breadcrumbs? yes)
<7> [99.123959] hangcheck HWSP:
<7> [99.123965] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [99.123968] hangcheck *
<7> [99.123974] hangcheck [00c0] 000001ec 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [99.123979] hangcheck [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [99.123983] hangcheck *
<7> [99.123987] hangcheck Idle? no

Looks intact. So could be a reloc coherency issue and it tried to read/write into garbage, but then it just uses the stale locations of old buffers. Still, that's my leading candidate.

Fwiw, my slow pIIIm i915gm doesn't seem to suffer the same fate.

Comment 9
(In reply to Chris Wilson from comment #8)
> Fwiw, my slow pIIIm i915gm doesn't seem to suffer the same fate.

Except I should check pnv for my closest equiv to blb.

Comment 10 (Chris Wilson)
commit 7fa28e146994da1e8a4124623d7da97b798ea520 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 19 15:41:53 2018 +0000

    drm/i915: Write GPU relocs harder with gen3

    Under moderate amounts of GPU stress, we can observe on Bearlake and
    Pineview (later gen3 models) that we execute the following batch buffer
    before the write into the batch is coherent. Adding extra (tested with
    upto 32x) MI_FLUSH to either the invalidation, flush or both phases does
    not solve the incoherency issue with the relocations, but emitting the
    MI_STORE_DWORD_IMM twice does. So be it.

    Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing")
    Testcase: igt/gem_tiled_fence_blits # blb/pnv
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181119154153.15327-1-chris@chris-wilson.co.uk

Comment 11
(In reply to Chris Wilson from comment #10)
> commit 7fa28e146994da1e8a4124623d7da97b798ea520 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Nov 19 15:41:53 2018 +0000
>
>     drm/i915: Write GPU relocs harder with gen3
>
>     Under moderate amounts of GPU stress, we can observe on Bearlake and
>     Pineview (later gen3 models) that we execute the following batch buffer
>     before the write into the batch is coherent. Adding extra (tested with
>     upto 32x) MI_FLUSH to either the invalidation, flush or both phases does
>     not solve the incoherency issue with the relocations, but emitting the
>     MI_STORE_DWORD_IMM twice does. So be it.
>
>     Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing")
>     Testcase: igt/gem_tiled_fence_blits # blb/pnv
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>     Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20181119154153.15327-1-chris@chris-wilson.co.uk

Oddly enough, this was not sufficient to fix the issue, but it stopped failing after drmtip_176 (https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_176/fi-gdg-551/igt@gem_tiled_fence_blits@normal.html), so closing!
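For reference, the gen3 workaround quoted in comment 10 boils down to emitting the relocation write into the batch twice. A purely illustrative sketch of that idea (not the actual i915 execbuffer code; the command encoding and field layout below are simplified assumptions):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a gen3-style MI_STORE_DWORD_IMM encoding
 * (opcode 0x20); the real driver adds addressing and length flags. */
#define MI_STORE_DWORD_IMM_GEN3    (0x20u << 23)

/* Illustrative only: patch the relocated GPU address into the batch via
 * the command streamer, and do it twice, since a single store was seen
 * to be incoherent under load on blb/pnv. */
static size_t emit_gpu_reloc_twice(uint32_t *cs, uint32_t reloc_addr,
                                   uint32_t target_addr)
{
    size_t n = 0;

    for (int pass = 0; pass < 2; pass++) {
        cs[n++] = MI_STORE_DWORD_IMM_GEN3;
        cs[n++] = reloc_addr;    /* where in the batch the reloc lives */
        cs[n++] = target_addr;   /* the address to patch in */
    }

    return n;                    /* dwords emitted */
}
```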