Bug 99209

Summary:	[EGL, i965] dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.textures.copyteximage2d_texsubimage2d_render
Product:	Mesa	Reporter:	Chad Versace <chadversary>
Component:	Drivers/DRI/i965	Assignee:	Topi Pohjolainen <topi.pohjolainen>
Status:	RESOLVED FIXED	QA Contact:	Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity:	normal
Priority:	medium	CC:	ben, chadversary, harish.krupo.kps, jason, kenneth, lemody, mark.a.janes, pallavi.g, topi.pohjolainen
Version:	git
Hardware:	Other
OS:	All
See Also:	https://bugs.freedesktop.org/show_bug.cgi?id=99085 https://bugs.freedesktop.org/show_bug.cgi?id=98602
Whiteboard:
i915 platform:		i915 features:
Bug Depends on:
Bug Blocks:	98315
Attachments:	hack patch to remove meta usage

Description Chad Versace 2016-12-27 22:59:50 UTC

Test: dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.textures.copytexi
mage2d_texsubimage2d_render

The test fails consistently for me, even when the refcount bugfix in 99085 [1] is applied. It also fails when ran under gdb. Surprisingly, the test consistently *PASSES* when ran under helgrind.

Setup:
    mesa: master@d6545f2                                                      
    deqp: nougat-cts-dev@4acf725 with local patches
    deqp-target: x11_egl
    hw: Intel Broadwell 0x1616

[1]: https://bugs.freedesktop.org/show_bug.cgi?id=99085

Comment 1 Chad Versace 2016-12-28 00:45:20 UTC

I spent 2 hours on this one today, and I'm lost.

Comment 2 Tapani Pälli 2016-12-28 07:14:47 UTC

This is sometimes passing for me. It seems image comparison fails, unfortunately this test does not dump the image data within TestResults.qpa, maybe need to start with that to see what's the diff.

Comment 3 Chad Versace 2016-12-28 19:09:56 UTC

+ken +jason

Helgrind complains about potential read/write races on drm_intel_gem_bo::offset64. According to helgrind, the conflict occurs when intel_batchbuffer_flush() updates the offset in thread1 and brw_update_texture_surfaces() reads the offset during batchbuffer construction in thread2.

Below is the relevant Helgrind snippet. Helgrind claims to detect this error exactly once during the testrun.

Possible data race during write of size 8 at 0xCD29890 by thread #2
Locks held: 3, at addresses 0x7E49FE8 0x7E6E3B8 0xA382CB0
   at 0xB0E1DB7: drm_intel_update_buffer_offsets2 (intel_bufmgr_gem.c:2292)
   by 0xB0E2316: do_exec2 (intel_bufmgr_gem.c:2450)
   by 0xB0E24C0: drm_intel_gem_bo_context_exec (intel_bufmgr_gem.c:2493)
   by 0xA92EE5E: do_flush_locked (intel_batchbuffer.c:359)
   by 0xA92F0F1: _intel_batchbuffer_flush (intel_batchbuffer.c:422)
   by 0xA905724: brw_fence_insert (brw_sync.c:86)
   by 0xA905B7D: brw_dri_create_fence (brw_sync.c:255)
   by 0x811F886: dri2_create_sync (egl_dri2.c:2563)
   by 0x8112249: _eglCreateSync (eglapi.c:1659)
   by 0x811237B: eglCreateSyncKHR (eglapi.c:1684)
   by 0x79D789: eglw::FuncPtrLibrary::createSyncKHR(void*, unsigned int, int const*) const (eglwFuncPtrLibraryImpl.inl:94)
   by 0x60E798: deqp::egl::GLES2ThreadTest::FenceSync::init(deqp::egl::GLES2ThreadTest::EGLThread&, bool) (teglGLES2SharingThreadedTests.cpp:301)

This conflicts with a previous read of size 8 by thread #3
Locks held: 2, at addresses 0xA386C40 0xCD29130
   at 0xB0E14B9: do_bo_emit_reloc (intel_bufmgr_gem.c:2042)
   by 0xB0E1714: drm_intel_gem_bo_emit_reloc (intel_bufmgr_gem.c:2104)
   by 0xB0D651D: drm_intel_bo_emit_reloc (intel_bufmgr.c:205)
   by 0xA90F926: brw_emit_surface_state (brw_wm_surface_state.c:173)
   by 0xA9108A8: brw_update_texture_surface (brw_wm_surface_state.c:646)
   by 0xA911C18: update_stage_texture_surfaces (brw_wm_surface_state.c:1271)
   by 0xA911D42: brw_update_texture_surfaces (brw_wm_surface_state.c:1302)
   by 0xA904480: check_and_emit_atom (brw_state_upload.c:771)
 Address 0xcd29890 is 48 bytes inside a block of size 360 alloc'd
   at 0x4C2DE60: calloc (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
   by 0xB0DACBF: drm_intel_gem_bo_alloc_internal (intel_bufmgr_gem.c:811)
   by 0xB0DBADB: drm_intel_gem_bo_alloc_tiled (intel_bufmgr_gem.c:945)
   by 0xB0D61A7: drm_intel_bo_alloc_tiled (intel_bufmgr.c:83)
   by 0xA93A0E4: miptree_create (intel_mipmap_tree.c:664)
   by 0xA93A154: intel_miptree_create (intel_mipmap_tree.c:688)
   by 0xA948CAB: intel_miptree_create_for_teximage (intel_tex_image.c:109)
   by 0xA9476F3: intel_alloc_texture_image_buffer (intel_tex.c:95)
   by 0xA948E01: intelTexImage (intel_tex_image.c:140)
   by 0xA5C8BAC: _mesa_get_fallback_texture (texobj.c:1049)
   by 0xA5D2385: update_single_program_texture (texstate.c:563)
   by 0xA5D242A: update_program_texture_state (texstate.c:588)
 Block was alloc'd by thread #2

Comment 4 Kenneth Graunke 2016-12-28 21:20:22 UTC

(In reply to Chad Versace from comment #3)
> +ken +jason
> 
> Helgrind complains about potential read/write races on
> drm_intel_gem_bo::offset64. According to helgrind, the conflict occurs when
> intel_batchbuffer_flush() updates the offset in thread1 and
> brw_update_texture_surfaces() reads the offset during batchbuffer
> construction in thread2.

I think that should be harmless.  offset64 is the presumed location of the buffer - i.e. our guess where the kernel relocated it to on the last execbuf.  If we guess correctly, the kernel sees that everything's where we think it is and skips relocating things.  If we guess wrong, it goes ahead and does relocations anyway.

At least, that's how I understand it.  It's good to know about this though.

I did notice that it passes with INTEL_DEBUG=sync (which I suppose is not surprising).

Comment 5 Tapani Pälli 2017-01-09 10:44:40 UTC

Created attachment 128823 [details] [review]
hack

I spent some time with this today. I noticed that this test passes for me consistently if I disable _mesa_meta_pbo_TexSubImage usage in intelTexSubImage. In case I let it use meta and when the test fails, variable 'create_pbo' is true. I'm attaching a hack that makes this test pass in case this gives some additional hints what's happening.

Comment 6 Tapani Pälli 2017-01-09 11:59:19 UTC

Created attachment 128824 [details] [review]
patch to remove meta usage

Here's a patch to remove meta usage, this fixes the issue for me. I don't spot any regression in CI, only bunch of tests hat start to pass when expecting a failure. Could we do such move or is meta version preferred?

Comment 7 Jason Ekstrand 2017-01-09 15:58:33 UTC

(In reply to Tapani Pälli from comment #6)
> Created attachment 128824 [details] [review] [review]
> patch to remove meta usage
> 
> Here's a patch to remove meta usage, this fixes the issue for me. I don't
> spot any regression in CI, only bunch of tests hat start to pass when
> expecting a failure. Could we do such move or is meta version preferred?

Removing meta shouldn't fix anything here since meta isn't really doing anything the user couldn't already do.  If removing meta fixes it, then we're just papering over some other bug.

(In reply to Kenneth Graunke from comment #4)
> (In reply to Chad Versace from comment #3)
> > +ken +jason
> > 
> > Helgrind complains about potential read/write races on
> > drm_intel_gem_bo::offset64. According to helgrind, the conflict occurs when
> > intel_batchbuffer_flush() updates the offset in thread1 and
> > brw_update_texture_surfaces() reads the offset during batchbuffer
> > construction in thread2.

This has me concerned...

> I think that should be harmless.  offset64 is the presumed location of the
> buffer - i.e. our guess where the kernel relocated it to on the last
> execbuf.  If we guess correctly, the kernel sees that everything's where we
> think it is and skips relocating things.  If we guess wrong, it goes ahead
> and does relocations anyway.

Not so much.  The race is actually way more subtle.  When we add a relocation to the BO, we write some value into the BO based on a read of offset64.  We then pass the list of BOs off to the kernel with their offsets.  If the offsets in the list match the kernel's view of addresses, then it skips doing any relocations.  Suppose we have the following:

1) We create a texture.  By default it has offset64 == 0
2) Thread A emits a relocation for some reason
3) Thread B does stuff, flushes the batch, and writes the new value into offset64
4) Thread A finishes what it was doing and flushes the batch

Given that the kernel doesn't move things around on us like crazy, there's a decent chance that offset64 hasn't changed between (3) and (4) so the offsets will match the kernel's view when A submits in (4).  However, because we emitted a reloaction in (2) and the offset changed in (3) before we submit the batch in (4), the offset we provide to the kernel in the BO list is different from the offset with which the relocation was emitted.  The kernel doesn't relocate and things explode.

In other words, the problem isn't so much a race between threads A and B as it is that all relocations need to happen with the same offset as the one we send off to the kernel.  This is fine in a single-context environment because we only have one batch in flight at any given time and so the value of offset64 is always exactly what was returned by the kernel last time and will remain that way until the current batch is submitted and we update it.  In a multi-context environment we're just toast.

The solution here is that each context needs to track its own set of offsets.  There's no real utility in sharing them since each hardware context has it's own GTT on modern hardware.  On older hardware, we'll do the relocation for the first batch we submit with the BO but then the kernel will tell us the offset and future batches won't relocate.  In either case, tracking it globally in libdrm isn't gaining us anything and is, in fact, the problem.

Comment 8 Tapani Pälli 2017-01-09 16:37:39 UTC

(In reply to Jason Ekstrand from comment #7)
> (In reply to Tapani Pälli from comment #6)
> > Created attachment 128824 [details] [review] [review] [review]
> > patch to remove meta usage
> > 
> > Here's a patch to remove meta usage, this fixes the issue for me. I don't
> > spot any regression in CI, only bunch of tests hat start to pass when
> > expecting a failure. Could we do such move or is meta version preferred?
> 
> Removing meta shouldn't fix anything here since meta isn't really doing
> anything the user couldn't already do.  If removing meta fixes it, then
> we're just papering over some other bug.
> 

OK, I figured it might have some bug considering the busy bo (test fails only when the bo is busy), I haven't really gone through that code though.

Comment 9 Chad Versace 2017-01-09 16:58:25 UTC

(In reply to Jason Ekstrand from comment #7)
[...]
> The solution here is that each context needs to track its own set of
> offsets.  There's no real utility in sharing them since each hardware
> context has it's own GTT on modern hardware.  On older hardware, we'll do
> the relocation for the first batch we submit with the BO but then the kernel
> will tell us the offset and future batches won't relocate.  In either case,
> tracking it globally in libdrm isn't gaining us anything and is, in fact,
> the problem.

I infer that a correct fix (not a hack) would require i965 to stop using, at a minimum, the libdrm exec functions (drm_intel_*_exec()). We could, to mitigate the pain of fixing this, and to write a backportable patch, continue using drm_intel_bo throughout i965; we just need to, at least initially, stop using drm_intel_*_exec.  Do you agree? Am I misunderstanding the problem?

Comment 10 Chad Versace 2017-01-09 16:59:45 UTC

(In reply to Chad Versace from comment #9)
> (In reply to Jason Ekstrand from comment #7)
[...]
> I infer that a correct fix (not a hack) would require i965 to stop using, at
> a minimum, the libdrm exec functions (drm_intel_*_exec()). We could, to
> mitigate the pain of fixing this, and to write a backportable patch,
> continue using drm_intel_bo throughout i965; we just need to, at least
> initially, stop using drm_intel_*_exec.  Do you agree? Am I misunderstanding
> the problem?

If that's true, then...

Oh Cthulhu! Why did I assign this bug to myself!?!?

Comment 11 Kenneth Graunke 2017-01-10 06:16:28 UTC

Jason was thinking we use I915_EXEC_NO_RELOC, but we don't.  I don't think that's a problem.  An easy way to sanity check that would be to remove offset64 from Mesa's relocations (effectively always programming an assumed offset of 0).

Comment 12 Kenneth Graunke 2017-01-10 06:33:13 UTC

I think Tapani is on to something here.  If the texture is busy and we can't immediately write to it, _mesa_meta_pbo_TexSubImage decides to create a PBO, copy the user supplied data into the PBO, then essentially blit (texture/render) from the PBO to the destination texture.  It does this to try and avoid stalling.

So...it should totally be legal to do:

   bool tex_busy = true;

namely, always take this "create a PBO and blit" path.  It should always be viable, and this should remove some unpredictability.

However, doing so uncovers a bit problem:

deqp-egl: main/texobj.c:2030: _mesa_unlock_context_textures: Assertion `ctx->Shared->TextureStateStamp == ctx->TextureStateTimestamp' failed.

I think the Meta path is exposing Mesa's bogus texture locking.

Comment 13 Kenneth Graunke 2017-01-10 06:34:01 UTC

(typo..."big problem" not "bit problem".)

Comment 14 Kenneth Graunke 2017-01-10 08:42:34 UTC

if you want to burn some cycles looking into making Mesa's texture locking less incompetent, that would probably be worthwhile.  I'm also a bit concerned that I had to make the texture lock a recursive mutex a while back, when Kristian implemented meta fast clears...it may be that meta altering texturing state in this case isn't really valid - similar to our old recursive-meta hell.

It may be that switching things to BLORP is more of an actual solution than the paper-over we feared it might be.

But, again, feel free to look into it - I'm mostly speculating at this point.

Comment 15 Jason Ekstrand 2017-01-11 01:03:52 UTC

(In reply to Kenneth Graunke from comment #11)
> Jason was thinking we use I915_EXEC_NO_RELOC, but we don't.  I don't think
> that's a problem.  An easy way to sanity check that would be to remove
> offset64 from Mesa's relocations (effectively always programming an assumed
> offset of 0).

I think Ken is right and it's not as dire as I'd feared.  Relocations should be safe so long as we continue to not use NO_RELOC.

(In reply to Kenneth Graunke from comment #14)
> if you want to burn some cycles looking into making Mesa's texture locking
> less incompetent, that would probably be worthwhile.  I'm also a bit
> concerned that I had to make the texture lock a recursive mutex a while
> back, when Kristian implemented meta fast clears...it may be that meta
> altering texturing state in this case isn't really valid - similar to our
> old recursive-meta hell.
> 
> It may be that switching things to BLORP is more of an actual solution than
> the paper-over we feared it might be.

Short-term, yes, that may fix this bug.  It should be easy enough to grab Topi's patches, apply them, and see if that fixes it.  Long-term, we still have a lot of meta paths and getting rid of them will take time.  Not sure if it's better to get rid of them or to fix them.

Comment 16 Ben Widawsky 2017-01-24 06:31:34 UTC

Topi's patches are merged now, right, is this still an issue?

Comment 17 Tapani Pälli 2017-01-24 10:55:17 UTC

(In reply to Ben Widawsky from comment #16)
> Topi's patches are merged now, right, is this still an issue?

Which patches are these?

Comment 18 Tapani Pälli 2017-01-24 11:11:50 UTC

OK found those, I've tested that 'blorp_tex_upload_2' fixes the bug but those patches are not in upstream yet.

Comment 19 Pallavi G 2017-10-11 05:41:55 UTC

We face the similar issue in Android-IA on O.
Please let me know those fixes and whether it merged in upstream?

Comment 20 Kenneth Graunke 2017-10-11 06:14:04 UTC

Not yet, sorry.  Using BLORP for texture upload uncovered a nasty bug in our state upload code that required me to do a fairly invasive rework.  That's landed.  We also uncovered bugs with renderbuffer compression.  The fix for that landed.  With those in place, I finally was able to re-send the BLORP texture upload patches for review.  Ideally they will be able to land late this week or early next week.

In the meantime you can grab the latest version from my branch here:
https://cgit.freedesktop.org/~kwg/mesa/log/?h=tex-upload

Comment 21 Pallavi G 2017-10-11 10:34:59 UTC

Thank you we will check (In reply to Kenneth Graunke from comment #20)
> Not yet, sorry.  Using BLORP for texture upload uncovered a nasty bug in our
> state upload code that required me to do a fairly invasive rework.  That's
> landed.  We also uncovered bugs with renderbuffer compression.  The fix for
> that landed.  With those in place, I finally was able to re-send the BLORP
> texture upload patches for review.  Ideally they will be able to land late
> this week or early next week.
> 
> In the meantime you can grab the latest version from my branch here:
> https://cgit.freedesktop.org/~kwg/mesa/log/?h=tex-upload

Comment 22 Kenneth Graunke 2017-10-13 04:46:39 UTC

Fixed by:

commit dffda6cbbbc1a8c2f7125db1b9cf15fbcdc8eb11
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Tue Jun 6 09:58:07 2017 -0700

    i965: Use blorp instead of meta for PBO texture uploads
    
    Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.