Bug 102774 - [BDW] [Bisected] Absolute constant buffers break VAAPI in mpv
Summary: [BDW] [Bisected] Absolute constant buffers break VAAPI in mpv
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 17.2
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Keywords: bisected
Reported: 2017-09-15 07:08 UTC by alim
Modified: 2017-10-23 23:22 UTC
CC List: 14 users

Description alim 2017-09-15 07:08:22 UTC
Archlinux
Dell Latitude e7450
i5-5300U
Broadwell HD Graphics 5500

linux 4.12.12
mesa 17.2.0
libva-intel-driver 1.8.3
libva 1.8.3
mpv 0.27.0

mpv -hwdec=vaapi -vo=vaapi video.mkv no longer works with mesa 17.2. It shows a black window and no video plays, although audio and the app UI work fine. There are no problems with mesa 17.1.9. It also affects Chromium when it is patched to enable hardware acceleration via VA-API.

No errors in dmesg, in mpv's output, or in /sys/class/drm/card0/error.

Bisected mesa and found that vaapi stopped working with commit 8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b

If I revert the changes to brw_state_upload.c on top of the 17.2 HEAD, vaapi starts working again.

https://cgit.freedesktop.org/mesa/mesa/commit/?h=17.2&id=8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b
Comment 1 Matt Turner 2017-09-15 07:11:13 UTC
Just a heads up: the recommended vo= option is vo=opengl as far as I know. Does the problem still occur when using that?
Comment 2 alim 2017-09-15 17:07:35 UTC
It plays fine with the vo=opengl option.
Comment 3 Maxim Baz 2017-09-18 08:56:26 UTC
This bug also affects Chromium compiled with the VA-API patch for hardware video acceleration. At least 6 people using Arch Linux have confirmed this so far.

Chromium sometimes just renders videos as a black rectangle, sometimes freezes entirely, and once even crashed X.

Someone reported that reverting the same change @alim mentioned fixes the issue.

Non-exhaustive list of affected Intel CPUs: i7-6500U, i7-7500U, i7-7820HQ.

If this can't / shouldn't be reverted, is there a way to compile Chromium in a different way so that we don't experience this issue?

How we compile Chromium with VA-API today:

https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=chromium-vaapi

History of comments discussing this issue:

https://aur.archlinux.org/packages/chromium-vaapi


I'll ask affected people to CC themselves on this bug, so you can better see how many people are affected.
Comment 4 Kenneth Graunke 2017-09-18 16:53:36 UTC
Oh dear.  I was pretty sure that register is supposed to be context saved and restored, so it shouldn't affect anything else.  But, I guess it must be...

I'll have to look into this.  XDC is this week, though, so it may take a bit longer...
Comment 5 Kenneth Graunke 2017-09-19 03:01:47 UTC
I tried this on Kabylake tonight, and I get a black window with Mesa master, commit 8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b (the bisected commit), commit 86bd3fd864a8383e1d6823114da422f6a948bf1e (the one before that), and 17.1.  So, it doesn't seem like it ever worked.

-vo=gl continues to work fine, just -vo=vaapi doesn't.

I'll have to try on Broadwell...
Comment 6 Stephen Panicho 2017-09-19 04:41:37 UTC
I'm on Kabylake myself and I can confirm that mpv --vo=vaapi --hwdec=vaapi works with mesa 17.1.8, but not with 17.2.0. On the other hand --vo=opengl --hwdec=vaapi works with both versions.

Now, for Chromium built with vaapi, accelerated video playback (testing with YouTube) works with 17.1.8 but not with 17.2.0. With the latter version, all that's presented is a black video with working audio.
Comment 7 Kenneth Graunke 2017-09-19 06:16:27 UTC
So, trying with Mesa 17.1 on Kabylake...mpv renders black with a theora video, GPU hangs immediately with a VC1 video, and with H264, renders garbage colors and/or black and GPU hangs.  The GPU hangs are definitely a vaapi batch, not from Mesa.

I don't think this is a Mesa bug.  It seems like there are a lot of bugs in vaapi.  And the setting my patch changed is definitely supposed to be per-context, so Mesa changing it shouldn't affect vaapi at all.  vaapi probably doesn't use contexts explicitly, but I thought the kernel was supposed to give every process a context by default, to prevent state leaks like this.
Comment 8 Maxim Baz 2017-09-19 12:09:03 UTC
Thanks for looking into this, Kenneth!

How do you suggest we proceed from here?

For example, is it possible for you to file a bug with VA-API, and revert your change to mesa until VA-API is fixed?

There are a bunch of us who are holding off on upgrading to mesa-17.2, and there is probably another bunch of people who experience this issue but don't know what is causing it.
Comment 9 nanericwang 2017-09-20 04:46:41 UTC
My GPU is Intel Gen5. The bug causes Chrome tabs to freeze completely.

Steps to reproduce:
1. Install mesa-17.2.
2. Install chromium-vaapi-bin.
3. Enable VA-API acceleration.
4. Open an HTML5 video.
5. Expected: smooth playback. Actual: the whole process freezes.
Comment 10 Kenneth Graunke 2017-09-20 08:43:38 UTC
(In reply to nanericwang from comment #9)
> My GPU is Intel Gen5. The bug causes Chrome tabs to freeze completely.

Just to clarify, by "Gen5" you mean "5th generation Core Processor", right?  (Naming is hard, 5th Gen CPUs have Gen8 GPUs...)
Comment 11 nanericwang 2017-09-21 02:45:24 UTC
(In reply to Kenneth Graunke from comment #10)
> (In reply to nanericwang from comment #9)
> > My GPU is Intel Gen5. The bug causes Chrome tabs to freeze completely.
> 
> Just to clarify, by "Gen5" you mean "5th generation Core Processor", right? 
> (Naming is hard, 5th Gen CPUs have Gen8 GPUs...)

I mean the 5th Gen GPU, Ironlake.
https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Fifth_generation
Comment 12 Kenneth Graunke 2017-09-21 06:49:31 UTC
(In reply to nanericwang from comment #11)
> I mean the 5th Gen GPU, Ironlake.
> https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Fifth_generation

Okay, the patch in question here only affects Gen8+ (Broadwell and later).  So you must be hitting a different bug (with similar symptoms, but almost certainly a different cause).  Please file a separate report.  Thanks.
Comment 13 nanericwang 2017-09-23 01:16:28 UTC
(In reply to Kenneth Graunke from comment #12)
> (In reply to nanericwang from comment #11)
> > I mean the 5th Gen GPU, Ironlake.
> > https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Fifth_generation
> 
> Okay, the patch in question here only affects Gen8+ (Broadwell and later). 
> So you must be hitting a different bug (with similar symptoms, but almost
> certainly a different cause).  Please file a separate report.  Thanks.

Report filed at:
https://bugs.freedesktop.org/show_bug.cgi?id=102949

Thanks.
Comment 14 Kenneth Graunke 2017-09-24 07:17:15 UTC
Daniel reminded me that X (Glamor) and GNOME Shell use Mesa, so my earlier testing wasn't completely on Mesa 17.1.  I can now confirm the report (on Kabylake).

mpv -hwdec=vaapi -vo=vaapi did indeed work with Mesa 17.1.  It misrenders or GPU hangs with 17.2 and later (due to "i965: Switch to absolute addressing for constant buffer 0").

I ran a few more experiments:

- Starting X/Glamor with 17.1 without a compositor...
  - mpv -hwdec=vaapi -vo=vaapi appears to work with any Mesa version
  - Running Piglit with Mesa 17.2+ concurrently with a working mpv is fine: both mpv and Piglit work.

I think this suggests that CS_DEBUG_MODE2 is indeed saved and restored as part of the context: Glamor and mpv (on 17.1) would expect relative mode, Piglit (on 17.2+) would expect absolute mode, and both worked.  Daniel had suggested there might be a bug where setting the mode made it take effect on all rings, but that does not seem to be the case either.

- Starting X/Glamor with 17.2+ without a compositor...
  - mpv -hwdec=vaapi -vo=vaapi misrenders and GPU hangs.

My theory is that new contexts are inheriting the state from X/Glamor's context.  I'd say maybe it's because it gets the fd from X via DRI3...but that doesn't make much sense, because libva-intel-driver doesn't use DRI3.  Maybe it's because X is the first client on the system?  Or it's drm master?  Or runs just before mpv?

Note that libva-intel-driver doesn't create its own context.  But I think the kernel ought to make one for it, when it opens the fd...
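To make the inheritance theory above concrete, here is a toy C model. It is hypothetical, not kernel or Mesa code, and the names `gpu`, `context`, `switch_to` and `set_mode` are invented for illustration. It assumes the register is saved and restored on every context switch, as the experiments suggest, but that nothing initializes it when a context runs for the first time, so a new context starts with whatever the previous context left in the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not kernel or Mesa code) of one context-saved
 * register bit, e.g. the one selecting absolute vs. relative constant
 * buffer addressing. */

enum cb_mode { CB_RELATIVE = 0, CB_ABSOLUTE = 1 };

struct gpu {
    enum cb_mode live;  /* value currently latched in the hardware */
};

struct context {
    bool has_run;       /* has this context ever been scheduled? */
    enum cb_mode saved; /* copy kept in the context image */
};

/* Schedule `next` after `prev` (prev may be NULL at boot). */
static void switch_to(struct gpu *gpu, struct context *prev,
                      struct context *next)
{
    assert(next != NULL);
    if (prev != NULL)
        prev->saved = gpu->live;   /* save the outgoing context */
    if (next->has_run)
        gpu->live = next->saved;   /* restore the incoming context */
    /* First run: by assumption nothing initializes the register, so
     * `next` inherits whatever the previous context left behind. */
    next->has_run = true;
}

/* A client such as Mesa 17.2+ explicitly programs the mode. */
static void set_mode(struct gpu *gpu, enum cb_mode mode)
{
    gpu->live = mode;
}
```

Under these assumptions, a Glamor context on Mesa 17.2+ that selects absolute mode before mpv first runs leaks that setting into mpv's context, which then keeps it in its own saved image. By contrast, if Glamor on 17.1 and mpv run first, a Piglit context that later sets absolute mode does not affect mpv, because mpv's saved relative mode is restored on the next switch. Both outcomes match the experiments described above.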

Daniel, Chris, do you have any ideas?
Comment 15 Kenneth Graunke 2017-10-05 00:35:35 UTC
Chris seems to think that we should patch libva (and possibly beignet) to initialize this value.

In the meantime, I've posted a patch for Mesa 17.2.x that reverts the offending commit...

https://lists.freedesktop.org/archives/mesa-stable/2017-September/007116.html
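Initializing the value, as Chris suggests, would mean that every client (libva, Beignet, older Mesa) writes the bit explicitly instead of relying on inherited state. As far as I know, INSTPM and CS_DEBUG_MODE2 are masked registers: in a 32-bit write, bits 31:16 select which of bits 15:0 actually change, so a client can force one bit on or off without touching the others. The sketch below only computes and applies such a masked value. The bit position `CB_OFFSET_DISABLE_BIT` and the helper names are assumptions for illustration, not taken from this report; the real register offsets and bit definitions should be checked against the hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only; check the PRM. */
#define CB_OFFSET_DISABLE_BIT ((uint16_t)(1u << 4))

/* Build the dword for a masked register write: bits 31:16 are the
 * write-enable mask, bits 15:0 the new values of the enabled bits. */
static uint32_t masked_write(uint16_t bits, int enable)
{
    return ((uint32_t)bits << 16) | (enable ? (uint32_t)bits : 0u);
}

/* Model how the hardware applies such a write to the register's
 * current value (only bits 15:0 are tracked here). */
static uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
    uint32_t mask = write >> 16;

    assert((reg >> 16) == 0); /* the model only tracks bits 15:0 */
    return (reg & ~mask) | (write & mask);
}
```

A client that expects relative mode would emit `masked_write(CB_OFFSET_DISABLE_BIT, 0)` (for example as the immediate of an MI_LOAD_REGISTER_IMM in its batch), and one that expects absolute mode would pass 1. Either way the result no longer depends on what the previous context left behind.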
Comment 16 Mads 2017-10-10 09:13:44 UTC
I experience the same issue on my Skylake laptop (Dell XPS 15 9550) and also on a Kaby Lake laptop (Dell XPS 15 9560). Works with mesa 17.1.8, not with mesa 17.2.0 and upwards.
Comment 17 Maxim Baz 2017-10-23 16:02:12 UTC
Hey Kenneth, has there been any traction on reverting and/or fixing the code during the past 3 weeks? Looks like the thread you created [1] didn't receive any attention whatsoever.

[1]: https://lists.freedesktop.org/archives/mesa-stable/2017-September/007116.html
Comment 18 Kenneth Graunke 2017-10-23 19:10:12 UTC
(In reply to Maxim Baz from comment #17)
> Hey Kenneth, has there been any traction on reverting and/or fixing the code
> during the past 3 weeks? Looks like the thread you created [1] didn't
> receive any attention whatsoever.
> 
> [1]: https://lists.freedesktop.org/archives/mesa-stable/2017-September/007116.html

True that.  I sent another patch to revert it off of master last week, which stirred up some flames, and Kristian apparently convinced at least -some- kernel people that this needs to be fixed in the kernel.  Whether anything will happen to that end, I have no idea.

In the meantime, I've pushed my revert to master, with the appropriate marking for stable branches.  Emil / Andres should pick it up for the 17.3 / 17.2 stable branches soon.

Thanks for your patience, and sorry about this mess. :(

Fixed in master with:

commit 013d33122028f2492da90a03ae4bc1dab84c3ee9
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Thu Oct 19 14:38:30 2017 -0700

    i965: Revert absolute mode for constant buffer pointers.
    
    The kernel doesn't initialize the value of the INSTPM or CS_DEBUG_MODE2
    registers at context initialization time.  Instead, they're inherited
    from whatever happened to be running on the GPU prior to first run of a
    new context.  So, when we started setting these, other contexts in the
    system started inheriting our values.  Since this controls whether
    3DSTATE_CONSTANT_* takes a pointer or an offset, getting the wrong
    setting is fatal for almost any process which isn't expecting this.
    
    Unfortunately, VA-API and Beignet don't initialize this (nor does older
    Mesa), so they will die horribly if we start doing this.  UXA and SNA
    don't use any push constants, so they are unaffected.
    
    Until we have some kind of solution to this problem, I'm going to revert
    this patch and abandon using the feature for now.  It will lead to fewer
    pushed UBO ranges on Broadwell+, which may lead to lower performance,
    though I don't have any data on the impact.
    
    Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102774
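The commit message's point that the bit decides whether 3DSTATE_CONSTANT_* takes a pointer or an offset can be illustrated with a simplified C model. `resolve_cb0` and its `state_base` parameter are hypothetical: the sketch ignores which base address the hardware actually uses in relative mode and only shows that the same field value resolves to two different addresses. A process that emits offsets while the inherited bit says "absolute" therefore makes the GPU read constants from the wrong place, which is consistent with the misrendering and hangs reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: how the constant buffer 0 field of a
 * 3DSTATE_CONSTANT_* packet would be turned into an address. */
static uint64_t resolve_cb0(uint64_t field, bool offset_disable,
                            uint64_t state_base)
{
    if (offset_disable)
        return field; /* absolute mode: the field is the address */

    /* Relative mode: the field is an offset from a base address;
     * the model assumes the sum does not wrap around. */
    assert(field <= UINT64_MAX - state_base);
    return state_base + field;
}
```

For example, an offset of 0x2000 that a relative-mode client means as `state_base + 0x2000` is read as the absolute address 0x2000 once the inherited bit is set.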

