Bug 94080 - [HSW] intel_do_flush_locked failed: Invalid argument in dEQP-GLES31.functional.compute.indirect_dispatch.upload_buffer.single_invocation
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Blocks: i965-deqp
Reported: 2016-02-10 16:58 UTC by Ilia Mirkin
Modified: 2019-03-14 08:40 UTC

Attachments
INTEL_DEBUG=bat,buf output prior to error (27.66 KB, text/plain)
2016-02-10 16:58 UTC, Ilia Mirkin

Description Ilia Mirkin 2016-02-10 16:58:13 UTC
Created attachment 121652
INTEL_DEBUG=bat,buf output prior to error

I reliably get the attached (INTEL_DEBUG=bat,buf) error when running:

MESA_GLES_VERSION_OVERRIDE=3.1 ./deqp-gles31 --deqp-visibility=hidden --deqp-case='dEQP-GLES31.functional.compute.indirect_dispatch.upload_buffer.single_invocation'

Note that I have Ken's recent patch series to fix up some compute state tracking applied (https://patchwork.freedesktop.org/series/3213/), although the error also happens without it.

This dEQP build is from https://android.googlesource.com/platform/external/deqp plus a minor patch to make it actually build (libpng.h -> png.h in CMakeLists.txt). You'll also need Xorg 1.18.1 or the relevant GLX patch for it to work. Please ask if you have trouble getting it up and running or need any additional debug info; this reproduces 100% of the time for me.
Comment 1 Ilia Mirkin 2016-02-10 18:25:29 UTC
Upgrading from kernel 4.3.0 to 4.4.1 fixed it, but... two things:

(a) You shouldn't be exposing GL_ARB_compute_shader in this case
(b) exit(1) is *really* harsh on intel_do_flush_locked failure
Comment 2 Kenneth Graunke 2016-02-10 21:04:01 UTC
I actually addressed (a) in bd21b5460761560 ("i965: Only turn on ARB_compute_shader if we can write registers."), but only for desktop GL.  Presumably we need something that stops us from advertising ES 3.1 as well.

Regarding (b)...we've always done that.  We don't really know why the kernel returned an error from the execbuf2 ioctl, but several options are: 1) the GPU is toast (can't really continue).  2) the kernel has revoked our rights to talk to the GPU after hosing it repeatedly (shouldn't continue).  3) some out of memory condition (who knows what to do?).  4) the new command parser rejected our batch for doing bogus things (a bug in Mesa, so kind of like an assert).

The last reason is the sketchiest.  IMHO the command parser is misdesigned - platforms with the hardware checker simply MI_NOOP disallowed things - but the Gen7/7.5-only software checker -EINVALs your program.  I think it should mimic the hardware behavior.  But, others disagree.

So, that's where we're at.  *shrug*
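For reference, the "Invalid argument" text in the bug title is the C library's description of `EINVAL`. The sketch below assumes the batch submission path reports failure as a negative errno value; `hypo_format_flush_error` is a hypothetical helper, not Mesa's code. It illustrates the point above: the printed message carries only the errno, so on its own it gives the driver little to go on when deciding whether the failure is recoverable.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the message a driver might print when batch
 * submission fails. Assumes the submit path returns a negative errno
 * (e.g. -EINVAL), so strerror(-ret) yields the human-readable text. */
static int hypo_format_flush_error(int ret, char *buf, size_t len)
{
    assert(ret < 0); /* only meaningful for a failed submission */
    return snprintf(buf, len, "intel_do_flush_locked failed: %s",
                    strerror(-ret));
}
```

Called with `-EINVAL`, this produces the same text as the reported error; a driver that follows it with `exit(1)` leaves the application no chance to react.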
Comment 3 Ilia Mirkin 2016-02-10 21:08:38 UTC
(In reply to Kenneth Graunke from comment #2)
> I actually addressed (a) in bd21b5460761560 ("i965: Only turn on
> ARB_compute_shader if we can write registers."), but only for desktop GL. 
> Presumably we need something that stops us from advertising ES 3.1 as well.

I was force-enabling GLES 3.1. However, GL_ARB_compute_shader was exposed for me on Linux kernel 4.3.0, so I guess there's more to it? This failure specifically involves indirect compute dispatch, which I believe is separate from indirect draws.

> 
> Regarding (b)...we've always done that.  We don't really know why the kernel
> returned an error from the execbuf2 ioctl, but several options are: 1) the
> GPU is toast (can't really continue).  2) the kernel has revoked our rights
> to talk to the GPU after hosing it repeatedly (shouldn't continue).  3) some
> out of memory condition (who knows what to do?).  4) the new command parser
> rejected our batch for doing bogus things (a bug in Mesa, so kind of like an
> assert).
> 
> The last reason is the sketchiest.  IMHO the command parser is misdesigned -
> platforms with the hardware checker simply MI_NOOP disallowed things - but
> the Gen7/7.5-only software checker -EINVALs your program.  I think it should
> mimic the hardware behavior.  But, others disagree.
> 
> So, that's where we're at.  *shrug*

Yeah, dealing with unexpected errors sucks. I think the ultimate move is to tear down the context and start from scratch, and to start returning errors if you can't bring things back up properly. You're going to have to deal with this eventually for proper robustness support, but I agree this is a giant pain :)
Comment 4 Matt Turner 2016-11-03 01:11:52 UTC
The test passes for me on HSW (now that we expose ES 3.1). The original bug is fixed. RESOLVED.

