108787 – [BSW] Mesa "total_needs <= urb_chunks" abort in GfxBench CarChase startup, when asserts are enabled

Status:	RESOLVED MOVED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	git
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel 3D Bugs Mailing List
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:	regression

Duplicates (1):	101406
Depends on:
Blocks:	101406

Reported:	2018-11-19 14:32 UTC by Eero Tamminen
Modified:	2019-09-25 19:15 UTC
CC List:	1 user

See Also:	92320
i915 platform:
i915 features:

Description Eero Tamminen 2018-11-19 14:32:33 UTC

Setup:
* BSW N3050
* Ubuntu 18.04
* Gfx stack built from Git
* GfxBench v4 / v5 (reproduced with latter, but v4 should work the same)

Use-case:
* bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_4

Actual outcome:
* Abort with following message:
testfw_app: src/intel/common/gen_urb_config.c:152: gen_get_urb_config: Assertion `total_needs <= urb_chunks' failed.

This regressed between the following Mesa commits:
41c8f99137: 2018-11-12 18:28:04: util: Fix warning in u_cpu_detect on non-x86
e13dd70581: 2018-11-14 14:41:58: i965: avoid 'unused variable' warnings

In case it matters:
* i965 has always misrendered the CarChase benchmark on BSW, see bug 101406
* GfxBench Vulkan Aztec Ruins and SynMark DeferredAA & TexMem128 started
  hanging the GPU around 6th of November, but for now this seems more of
  a drm-tip kernel issue than a Mesa issue

Comment 1 Eero Tamminen 2018-11-28 14:25:44 UTC

CarChase still aborts on BSW N3050 with this error with the latest Mesa Git version, with both the v4.19 and the latest v4.20-rc4 drm-tip kernels.

Comment 2 Eero Tamminen 2018-12-28 09:59:31 UTC

This regression is still happening with the latest v4.20.0-rc7 drm-tip kernel & Mesa.

Comment 3 Kenneth Graunke 2019-02-14 17:12:48 UTC

FWIW you can reproduce this assert on any system by using INTEL_DEVID_OVERRIDE=0x22b0
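
For reference, the override from this comment can be combined with the GfxBench invocation from the description. This is only a sketch: it assumes the same working directory and test binary as in the original report, and 0x22b0 is the device ID quoted above.

```shell
# Make Mesa behave as if the device had PCI ID 0x22b0 (value from this comment)
# and run the CarChase test with the arguments from the bug description.
INTEL_DEVID_OVERRIDE=0x22b0 \
  bin/testfw_app --gfx glfw --gl_api desktop_core \
  --width 1920 --height 1080 --fullscreen 1 --test_id gl_4
```

As the bug title says, the abort only shows up with a Mesa build that has asserts enabled.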

Comment 4 Kenneth Graunke 2019-02-14 17:22:49 UTC

Total needs is 9, URB chunks is 8.

Notably, brw->urb.size == 64, but devinfo->urb.size == 192, so we're probably getting things out of sync when changing L3 configurations.

Oddly KBL GT1 has a devinfo->urb.size of 192 as well, and still uses chv_l3_configs, yet it works fine.

Comment 5 Mark Janes 2019-02-14 22:13:15 UTC

Based on Ken's investigation, this is not a regression for the release.

Comment 6 Kenneth Graunke 2019-02-15 01:47:18 UTC

I'm not positive about that - it may have regressed. There's a fundamental bug which isn't a regression, but this app may have gotten worse.

So, what's going on...is that the program enables SLM, which drops the URB size to 64kB on Cherryview or Broadwell GT1 systems. We also always reserve 32kB of space for push constants, which comes from the URB area, leaving us with only 32kB of URB, which is very small. The app also uses tessellation at the same time, which on Gen8 bumps the minimum number of VS URB entries to 192, due to a HW workaround. And, the VUE size is fairly large, at 128 (bytes?). With the 8kB granularity, we end up needing 9 chunks of URB space, and we can only offer 8 chunks. So we are unable to meet the minimum requirements for the program.
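
The chunk arithmetic above can be roughly reconstructed as follows. This is a sketch, not what gen_get_urb_config literally computes: the sizes are the ones quoted in this comment, the VUE size is taken to be in bytes, and the one-chunk minimum each for HS and DS is an assumption chosen only because it matches the "9 vs. 8" figures from comment 4.

```shell
#!/bin/sh
# Figures quoted in this thread
urb_size_kb=64        # URB size once SLM is enabled on CHV / BDW GT1
chunk_kb=8            # allocation granularity
push_constant_kb=32   # always reserved for push constants
vs_min_entries=192    # Gen8 minimum VS entries when tessellation is used
vue_size_bytes=128    # VUE size; the unit (bytes) is assumed

# ASSUMPTION (not stated in the thread): HS and DS need one chunk each.
hs_chunks=1
ds_chunks=1

chunk_bytes=$(( chunk_kb * 1024 ))
urb_chunks=$(( urb_size_kb / chunk_kb ))
push_chunks=$(( push_constant_kb / chunk_kb ))

# Round the VS requirement up to whole chunks.
vs_bytes=$(( vs_min_entries * vue_size_bytes ))
vs_chunks=$(( (vs_bytes + chunk_bytes - 1) / chunk_bytes ))

total_needs=$(( push_chunks + vs_chunks + hs_chunks + ds_chunks ))
echo "total_needs=$total_needs urb_chunks=$urb_chunks"
```

Under these assumptions the push constants take 4 chunks and the VS minimum takes 3, so even small HS and DS allocations push the total past the 8 available chunks.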

It may make sense to reduce the push constant space when the URB is small, or possibly just for CHV/BDW-GT1 parts. But, we also have VS, HS, DS, and PS competing for that space, so reducing that space could hurt as well...

As to why this might be a regression. It's possible that some compiler changes ended up increasing the VUE sizes, due to extra varyings between stages. This would push it over the limit, when the app would have worked before. So, that would be a regression. But, optimizing that again would not be a full fix, because we can certainly write a test case to hit this path.

One last thought. SLM and tessellation should never both be required at once, since SLM is only needed for compute shaders. We try to avoid changing the L3 configuration mid-batch because it's expensive, so we're probably sticking with the SLM-enabled config when we ought to just switch back to one with the full URB size. This means that any i965 patch which affects the command stream may move flush points such that compute lands in the same batch as this expensive tessellation draw when it didn't before, triggering the issue.

It may make sense to consider tessellation being enabled on BDW-GT1/CHV as a good enough reason to do the expensive transition, because of the extra 192 entry workaround. (Apollolake doesn't have that limitation, and it appears to fit just fine.)

Comment 7 Eero Tamminen 2019-02-18 17:28:29 UTC

(In reply to Kenneth Graunke from comment #6)
> I'm not positive about that - it may have regressed.  There's a fundamental
> bug which isn't a regression, but this app may have gotten worse.
> [technical explanation]

This Mesa change could have been triggered by some change in the build environment, because I can no longer reproduce a working Mesa by rebuilding the commit that originally worked (the old build of commit 41c8f99137 still works fine with the latest run-time environment).

Libdrm changes didn't cause it (I tested it), and I don't think upstream X util-macros, xorgproto, xcbproto, wayland-protocols & libwayland could have affected it.  Ubuntu 18.04 package updates in the build deps (that I'm not building myself) should be just security updates so I don't see them causing this either.

Which leaves Autotools -> Meson change...

From the following Autotools arguments:
     --with-dri-drivers=i965,swrast
     --with-vulkan-drivers=intel
     --with-gallium-drivers=radeonsi
     --enable-llvm
     --with-platforms=x11,drm,wayland,surfaceless
     --enable-dri3
     --enable-gbm
     --enable-gles2
     --enable-glx-tls
     --enable-shared-glapi

To the following Meson arguments:
     -Ddri-drivers=i965,swrast
     -Dvulkan-drivers=intel
     -Dgallium-drivers=radeonsi
     -Dplatforms=x11,wayland,drm,surfaceless
     -Dllvm=true
     -Ddri3=true
     -Dgbm=true
     -Degl=true

And indeed, re-building the Mesa commit 41c8f99137 with Autotools instead of Meson makes CarChase work.

=> Meson build bug?

(Didn't have time to test whether also Mesa tip would work when built with Autotools.)

Comment 8 Eero Tamminen 2019-02-19 12:39:47 UTC

Also with the latest Mesa commit, building with Autotools instead of Meson fixes the problem.  How can using Meson instead of Autotools mess up the i965 URB config???

-> I think this bug is one of the blockers for Meson switch / dropping Autotools support (whether it happens in 19.0 or 19.1).

Comment 9 Dylan Baker 2019-03-06 18:20:38 UTC

Doesn't autotools only build with asserts when --enable-debug is set? Meson controls that separately from the buildtype with -Db_ndebug. Can you reproduce if you add `-Dbuildtype=release` to the meson command line? (As long as your meson is newer than 0.45, asserts default to off when the buildtype is release or plain.)

Comment 10 Dylan Baker 2019-03-06 18:21:38 UTC

I guess I should be clearer: our meson build defaults to -O2 -g unless you change buildtype and/or b_ndebug yourself.

Comment 11 Sergii Romantsov 2019-03-07 09:39:50 UTC

Could it be related to issue https://bugs.freedesktop.org/show_bug.cgi?id=109791?

So with meson < 0.46, it seems asserts may be included in release builds.

Comment 12 Eero Tamminen 2019-03-07 14:46:25 UTC

(In reply to Sergii Romantsov from comment #11)
> Could it be related to issue
> https://bugs.freedesktop.org/show_bug.cgi?id=109791?
> 
> So with meson < 0.46, it seems asserts may be included in release builds.

Meson 0.47.2 from Ubuntu 18.10 seems to work fine as-is in Ubuntu 18.04, so I was able to do a Mesa build with a newer Meson version.

I'm still getting the assert though:
testfw_app: src/intel/common/gen_urb_config.c:152: gen_get_urb_config: Assertion `total_needs <= urb_chunks' failed.


Then I tried Autotools build with --enable-debug.

It gives the same error, and additionally also:
Mesa: User error: GL_INVALID_ENUM in glGetString(GL_EXTENSIONS)


=> it's not a regression or Meson related, but an older issue on BSW.

This may actually explain bug 101406 i.e. why Mesa has never rendered CarChase correctly on BSW.



(In reply to Dylan Baker from comment #9)
> Doesn't autotools only build with asserts when --enable-debug is set? Meson
> controls that separately from the buildtype with -Db_ndebug. Can you
> reproduce if you add `-Dbuildtype=release` to the meson command line? (as
> long as your meson is newer than 0.45 asserts are defaulted to off when the
> buildtype is release or plain)

I don't get the assert with "release".  As I stated above, I was using "debugoptimized".

FYI: I thought it corresponded to RELWITHDEBINFO target in CMake.  With CMake, there's the RELEASE build type that strips the binaries, and RELWITHDEBINFO, which builds with debug symbols and doesn't do stripping i.e. binaries are 100% match to release, just with debug info.

According to Meson docs, "release" build doesn't include debug symbols, but according to "file" command it doesn't strip binaries either.  Meson documentation doesn't say anything about "debugoptimized" enabling asserts:
  https://mesonbuild.com/Running-Meson.html

So I'm not sure whether that's a bug or not.

=> With which Meson options am I supposed to get Mesa binaries that correspond to a release build (no asserts), just with full (-g) debug symbols?


PS. IMHO Meson --help output or manual page should explain the differences between the build targets.  Currently it doesn't seem to be properly documented even in latest Meson docs.

Comment 13 Dylan Baker 2019-03-08 22:35:52 UTC

By default meson doesn't link buildtype and b_ndebug, they're completely separate. We use b_ndebug=if-release in mesa, which is supposed to enable asserts for debug and debugoptimized and disable them for release and plain.

Meson's buildtypes are:
debug : -O0 -g or -Og $userflags
debugoptimized : -O2 -g $userflags
release : -O2 $userflags
plain :  $userflags

So if you want to compare to autotools:
./configure --enable-debug ~= meson -Dbuildtype=debug
./configure != meson -Dbuildtype=plain

Comment 14 Dylan Baker 2019-03-11 17:01:55 UTC

Oops, that should be
./configure ~= meson -Dbuildtype=plain
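
Going by the buildtype list in comment 13 and the fact that buildtype and b_ndebug are separate options, one way to get an optimized build with -g debug symbols but without asserts might look like the following. This is a sketch, not a tested Mesa recipe: the `build` directory name is an assumption, and the Mesa-specific options from comment 7 would be added as needed.

```shell
# debugoptimized gives -O2 -g (per the list in comment 13);
# b_ndebug=true overrides Mesa's b_ndebug=if-release so asserts are compiled out.
meson setup build -Dbuildtype=debugoptimized -Db_ndebug=true
ninja -C build
```

Conversely, leaving b_ndebug at Mesa's default keeps asserts enabled with debugoptimized, which is the configuration that hits the abort in this bug.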

Comment 15 Mark Janes 2019-03-13 15:30:37 UTC

*** Bug 101406 has been marked as a duplicate of this bug. ***

Comment 16 GitLab Migration User 2019-09-25 19:15:16 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1772.
