Bug 98402

Summary: [HSW] [regression] 4.9-rc1 shows corruption with mpv's vaapi-copy
Product: DRI
Reporter: Andreas Reis <andreas.reis>
Component: DRM/Intel
Assignee: Andreas Reis <andreas.reis>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: highest
CC: cristi.magherusan, intel-gfx-bugs, Martin, ricardo.vega, seanvk
Version: DRI git
Keywords: regression
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: HSW
i915 features:
Attachments:
Description | Flags
Screenshot of vaapi-copy video corruption | none
Round min chunk size up to a tile row | none
gfx corruption with regular KDE/Qt application Konversation | none

Description Andreas Reis 2016-10-23 18:27:05 UTC
Created attachment 127505 [details]
Screenshot of vaapi-copy video corruption

Since 4.9-rc1 (also drm-intel-nightly) most 8-bit H.264 videos like…

https://www.youtube.com/watch?v=22jIHfvelZk
(In particular the max quality video returned eg. by "youtube-dl -f 137".)

… are corrupted as in the screenshot when mpv plays their H.264 versions on my Haswell 4770 & 4200U with 'hwdec=vaapi-copy'. No other configuration is needed in mpv.conf.

Unaccelerated 10-bit H.264, H.265, vp9, etc. is unaffected. Also, hwdec=vaapi still works fine.

(mpv's *-copy brings the video data back to system memory after decoding, eg. to apply further filters.)

The patterns of corruption are always the same; it appears as if a particular type of frame causes hiccups.

I'm blaming it on DRM since rebooting into 4.8.3 (Arch vanilla config, same compiler as for 4.9-rc1, same auto-detected gcc optimizations) and playing it there with the same software shows no such corruption.

Involved software is all at respective git head – mesa, xserver, xf86-video-intel, ffmpeg, libva, libva-intel-driver, mpv. gcc is version 6.2.1 20161006.
Comment 1 Chris Wilson 2016-10-23 18:36:58 UTC
* places a bet that libva forgot once again to set the write hazard on their render targets.

Other than timing issues, nothing immediately suggests itself. As you have two kernel endpoints, might as well begin the bisect there...
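
For readers unfamiliar with bisecting: with one known-good and one known-bad kernel, the search for the offending commit is a binary search over the history between them. The following is only a minimal sketch of that idea, not the actual workflow (which would use git bisect on the kernel tree). The function name first_bad and the is_bad predicate are hypothetical; is_bad stands in for "build, boot, and check whether the video is corrupted".

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in an ordered history.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good and commits[-1] is known bad (the two "endpoints").
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression was introduced after mid
    return commits[bad]
```

Each step halves the remaining range, so even a few thousand commits between two releases need only around a dozen test boots.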
Comment 2 Sean V Kelley 2016-10-25 18:22:16 UTC
So does vaapi-copy actually do a copy or are you deriving the surfaces to get the image?

Sean
Comment 3 Sean V Kelley 2016-10-25 18:23:18 UTC
Please test with yamidecode from libyami-utils.  On Arch just yaourt libyami-utils.

Sean
Comment 4 Andreas Reis 2016-10-25 18:41:12 UTC
I'm afraid you'll have to look at mpv's sources for what its vaapi-copy does precisely.

libyami won't build for me on Arch; make DESTDIR="$pkgdir/" install instantly dies with:
Making install in common
make[1]: Entering directory '/tmp/makepkg/libyami/src/libyami-libyami-1.0.0/common'
make[2]: Entering directory '/tmp/makepkg/libyami/src/libyami-libyami-1.0.0/common'
make[2]: Nothing to be done for 'install-exec-am'.
 /usr/bin/mkdir -p '/tmp/makepkg/libyami/pkg/libyami//usr/include/libyami'
 ../0 -m 644 YamiVersion.h '/tmp/makepkg/libyami/pkg/libyami//usr/include/libyami'
/bin/sh: line 11: ../0: No such file or directory
make[2]: *** [Makefile:679: install-libyami_commonincludeHEADERS] Error 127
Comment 5 Sean V Kelley 2016-10-25 21:11:46 UTC
Did you yaourt libyami first?

Sean
Comment 6 Sean V Kelley 2016-10-25 21:19:21 UTC
Odd, yaourt libyami-utils installs fine for me, correctly pulling in libyami deps.  I'm the package maintainer.  I can help you with that later.

Anyway, I will have a look at this issue on HSW.
Comment 7 Andreas Reis 2016-10-25 21:26:47 UTC
Tried again, suddenly it installs (used the git versions). Huh.

yamidecode works fine, apart from all the texture render modes failing with "do not support this render mode".
Comment 8 Sean V Kelley 2016-10-25 23:05:19 UTC
Okay, so I wonder if this is MPV specific then.
Comment 9 Jari Tahvanainen 2016-10-26 13:31:08 UTC
Highest+Blocker due to regression w/o workaround
Comment 10 Sean V Kelley 2016-10-26 16:45:17 UTC
@jari, this is not a "highest blocker". It's not even a bug with the vaapi driver. From what I can tell, the issue is specific to MPV.

Sean
Comment 11 Chris Wilson 2016-10-26 18:15:45 UTC
There is still the issue that the observed behaviour changed between kernel versions. We really do need to identify why to rule out a kernel regression.
Comment 12 Sean V Kelley 2016-10-26 22:01:50 UTC
Changed it back to highest blocker as it still needs to rule out a kernel issue...
Comment 13 Nobody 2016-11-01 15:30:37 UTC
Humberto, can you help me reproduce this?
Comment 14 Nobody 2016-11-01 20:45:16 UTC
No longer any need for a reproduction, Humberto.
Comment 15 Chris Wilson 2016-11-04 22:14:47 UTC
It's related to the partial fencing support. Other reports say it is fixed in -nightly; my suspicion is that https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-next-queued&id=d2a84a76a3b970fa32e6eda3d85e7782f831379e is the likely fix, but I'm waiting upon confirmation.
Comment 16 Chris Wilson 2016-11-07 09:51:12 UTC
Created attachment 127809 [details] [review]
Round min chunk size up to a tile row
Comment 17 Chris Wilson 2016-11-07 11:46:11 UTC
commit 0ef723cbceb6dce8116e75d44c5b8679b2eba69a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 7 10:54:43 2016 +0000

    drm/i915: Round tile chunks up for constructing partial VMAs
    
    When we split a large object up into chunks for GTT faulting (because we
    can't fit the whole object into the aperture) we have to align our cuts
    with the fence registers. Each partial VMA must cover a complete set of
    tile rows or the offset into each partial VMA is not aligned with the
    whole image. Currently we enforce a minimum size on each partial VMA,
    but this minimum size itself was not aligned to the tile row causing
    distortion.
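
The alignment rule described in the commit message can be illustrated with a small sketch. This is not the kernel code: the function and parameter names are hypothetical, the unit is assumed to be pages, and the numbers in the usage note are made up purely to show the arithmetic.

```python
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def round_down(value, multiple):
    """Largest multiple of `multiple` that is <= value."""
    return (value // multiple) * multiple


def partial_chunk(page_offset, total_pages, min_chunk_pages, tile_row_pages):
    """Pick the (start, size) of the chunk covering a faulting page.

    The minimum chunk size is first rounded up to a whole number of
    tile rows, so every chunk start is tile-row aligned.
    """
    chunk = round_up(min_chunk_pages, tile_row_pages)
    start = round_down(page_offset, chunk)
    size = min(chunk, total_pages - start)  # last chunk may be shorter
    return start, size
```

With an assumed minimum of 256 pages and an assumed tile row of 48 pages, the chunk size becomes 288 pages, so a fault at page 300 maps the chunk starting at page 288, which is a multiple of 48. Without the rounding, the chunk would start at page 256, which is 16 pages into a tile row; that is the kind of misalignment the commit message says causes distortion.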
Comment 18 Chris Wilson 2016-11-07 11:46:51 UTC
*** Bug 98504 has been marked as a duplicate of this bug. ***
Comment 19 Jari Tahvanainen 2016-11-08 05:02:40 UTC
Andreas - Please check with the latest drm-intel-nightly whether you can still reproduce the issue.
Comment 20 Martin Steigerwald 2016-11-08 15:18:40 UTC
Created attachment 127846 [details]
gfx corruption with regular KDE/Qt application Konversation

For now the drm-intel-fixes branch (with all commits up to 54905ab5fe7aa453610e31cec640e528aaedb2e2) seems to work okay: the graphics glitches I saw (see attachment) are gone. As these look similar to the corruption shown here and Jani pointed me to this bug report, I am adding the information here.

I can only test with laptop display at the moment. But I will test with external display this evening – in case the issue at hand is DisplayPort related, but according to the git commit Chris mentioned in comment #17, it does not seem to be.
Comment 21 Martin Steigerwald 2016-11-08 15:21:04 UTC
For reference see:

[REGRESSION] Linux 4.9-rc4: gfx glitches on Intel Sandybridge (was: Re: Linux 4.9-rc4)
http://lkml.iu.edu/hypermail/linux/kernel/1611.0/02800.html
Comment 22 Andreas Reis 2016-11-08 15:30:17 UTC
As for me, vaapi-copy videos indeed play fine again.
Comment 23 Martin Steigerwald 2016-11-09 08:21:13 UTC
Also no graphics glitches with external DisplayPort connected display.

*However*, I got a soft freeze and a hard freeze (well after about a minute I gave up and rebooted by pressing power button long enough to forcefully switch off the laptop) when playing PlaneShift using drm-intel-fixes branch.

Unfortunately I have no further time to debug any of this this week, but it seems not all fixes are there or ready for the next stable kernel.
Comment 24 Mihai Dontu 2016-11-13 21:00:05 UTC
v4.9-rc5 solved the gfx corruptions for me with regular KDE/Qt applications.
Comment 25 Jari Tahvanainen 2016-11-14 15:02:47 UTC
Verified by Reporter.
Comment 26 Jari Tahvanainen 2016-11-14 15:06:28 UTC
Closing Verified+Fixed. Martin, file another bug related to soft and hard freeze, if that is still a problem.
