Bug 102818 - [BSW] image corruption issue after edd849e5448c4f6ddc04a5fa1ac5479176660c27
Summary: [BSW] image corruption issue after edd849e5448c4f6ddc04a5fa1ac5479176660c27
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: JP
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords: bisected
Depends on:
Blocks:
 
Reported: 2017-09-17 10:47 UTC by freedesktop
Modified: 2018-04-20 14:26 UTC
CC List: 2 users

See Also:
i915 platform: BSW/CHT
i915 features: display/HDMI


Attachments
drm.debug=0xe (105.83 KB, text/plain)
2017-09-17 10:48 UTC, freedesktop
Xorg.0.log (18.56 KB, text/plain)
2017-09-17 10:48 UTC, freedesktop
xrandr --verbose (8.88 KB, text/plain)
2017-09-17 10:49 UTC, freedesktop
git bisect log (2.66 KB, text/plain)
2017-09-17 11:08 UTC, freedesktop
single frame from the video exhibiting the issue (2.82 MB, image/png)
2017-09-21 12:43 UTC, Jani Nikula
dmesg that includes period when image artifacts appeared (7.56 MB, text/plain)
2017-09-21 13:26 UTC, freedesktop
Reverting 608b20506941969ea30d8c08dc9ae02bb87dbf7d for use with 4.13.3 (1.20 KB, patch)
2017-09-29 08:11 UTC, freedesktop
Steps to confirm that 608b20506941969ea30d8c08dc9ae02bb87dbf7d is BAD (7.53 KB, text/plain)
2017-09-29 08:12 UTC, freedesktop
Hold powerwell for vblanks (1.20 KB, patch)
2017-10-09 08:52 UTC, Chris Wilson

Description freedesktop 2017-09-17 10:47:30 UTC
A user has reported an intermittent issue since 4.12-rc1 which manifests as a "z" shaped flicker/breakup of the displayed image.

The OS is LibreELEC with latest mainline kernel, running Kodi 18.

Since a picture is worth a thousand words, you can view the issue in two short videos:

Video #1[1]: flicker at 1s, 10s and 12s
Video #2[2]: flicker at 1s and 8s

The user has tested with two different HDMI cables and two different TVs, and the result is the same.

I bisected the kernel with the user (bisect log attached) and the bad commit[3] is:

Merge tag 'drm-misc-next-2017-03-21' of git://anongit.freedesktop.org/git/drm-misc into drm-next

Unfortunately this is a merge commit, but it seems to be in the right area.

The kernel log with "drm.debug=0xe" is attached, as is Xorg.0.log.

Unfortunately it hasn't been possible to obtain a vbios dump.

The user has tried on two TVs with two different cables, and the same problem is present with all combinations:

TV #1: Sony KDL 32EX521 SW:PKG3-309-EUA-0104 (1080p, 6 years old)
TV #2: Sony KDL 49X8305C SW:PKG3-473-0107EUB (4K, 1 year old)
Cable #1: Belkin High speed 3m #49749
Cable #2: Startech 1m Highspeed 20276

The PC configuration is:

CPU: Intel Atom x5-E8000 (Braswell) @ 1.04GHz (SolidRun IB8000 SOM[4])
GPU: Intel HD Graphics 400
Driver: intel-vaapi-driver 1.8.3, libva 1.8.3, mesa 17.2

The issue is still present with kernel 4.13.2 and 4.14-rc1.

I'd be happy to create test builds with patches for the user to try out.

Many thanks.

1. http://milhouse.libreelec.tv/other/vsync_issue/IMG_0403.MOV (27MB)
2. http://milhouse.libreelec.tv/other/vsync_issue/IMG_0406.MOV (18MB)
3. https://github.com/torvalds/linux/commit/edd849e5448c4f6ddc04a5fa1ac5479176660c27
4. https://www.solid-run.com/intel-braswell-family/braswell-som-system-on-module/braswell-som-specifications/
Comment 1 freedesktop 2017-09-17 10:48:04 UTC
Created attachment 134287
drm.debug=0xe
Comment 2 freedesktop 2017-09-17 10:48:56 UTC
Created attachment 134288
Xorg.0.log
Comment 3 freedesktop 2017-09-17 10:49:33 UTC
Created attachment 134289
xrandr --verbose
Comment 4 freedesktop 2017-09-17 11:08:17 UTC
Created attachment 134290
git bisect log
Comment 5 Jani Nikula 2017-09-18 08:33:56 UTC
I don't see why git bisect wouldn't bisect into the merge. Care to check the last results again?
Comment 6 freedesktop 2017-09-18 09:22:09 UTC
> I don't see why git bisect wouldn't bisect into the merge.

Sorry Jani, I'm not entirely sure what you mean here - I simply used "git bisect" between 4.11.10 (known good) and 4.12-rc1 (known bad), and based on testing of the resulting kernels git decided that the merge commit was the bad commit.

Can you tell me the best way to bisect the merge commit and I'll give that a go.

My kernel repo is a clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

> Care to check the last results again?

I'll ask the user to check the builds again, there were 14 in total. This may take a few days. Unfortunately I can't reproduce this issue myself.
Comment 7 Jani Nikula 2017-09-21 12:30:29 UTC
Does the dmesg cover the part where the user sees flickering?
Comment 8 Jani Nikula 2017-09-21 12:43:13 UTC
Created attachment 134407
single frame from the video exhibiting the issue
Comment 9 freedesktop 2017-09-21 13:26:45 UTC
Created attachment 134408
dmesg that includes period when image artifacts appeared

Quote from user:

"dmesg | pastebinit -> http://sprunge.us/cTFK
& the issues are reproduced ~20 times during 2min !!!"

So it would appear that when these events occur, nothing is being logged...
Comment 10 freedesktop 2017-09-29 08:10:23 UTC
I used git to bisect between v4.12-rc1 (BAD) and v4.11.10 (GOOD).

Unfortunately git identified a merge commit as the BAD commit:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/patch/?id=edd849e5448c4f6ddc04a5fa1ac5479176660c27

and would not bisect any further. I decided to "bisect" manually, and having done so the first BAD commit is:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/patch/?id=608b20506941969ea30d8c08dc9ae02bb87dbf7d

By reverting 608b20506941969ea30d8c08dc9ae02bb87dbf7d from 4.13.3 (see attached patch), the user has confirmed that 4.13.3 is now GOOD, while the normal 4.13.3 (without the revert) is still BAD.

I'll attach the "revert" patch (which is just for test purposes), and also the steps I took to confirm this conclusion (just for completeness).
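The manual bisection described in this comment is an ordinary binary search for the first bad commit. Below is a minimal Python sketch, assuming the candidate commits have already been flattened into an ordered list (real kernel history is a graph with merges, so `git bisect` works on that graph instead) and that `is_bad` is a hypothetical callback standing in for "build this kernel and have the user test it".

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is known GOOD,
    commits[-1] is known BAD, and the regression never disappears once it
    has been introduced (so GOOD/BAD is monotonic along the list).
    """
    if len(commits) < 2:
        raise ValueError("need at least one known-good and one known-bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] GOOD, commits[hi] BAD
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # one test build per iteration
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each test halves the remaining range, so N candidate commits need roughly log2(N) test builds. An intermittent bug breaks the monotonic assumption: a single BAD build wrongly reported as GOOD steers the search to the wrong commit, which is one reason re-checking the last results (comment 5) is worthwhile.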
Comment 11 freedesktop 2017-09-29 08:11:34 UTC
Created attachment 134562
Reverting 608b20506941969ea30d8c08dc9ae02bb87dbf7d for use with 4.13.3
Comment 12 freedesktop 2017-09-29 08:12:11 UTC
Created attachment 134563
Steps to confirm that 608b20506941969ea30d8c08dc9ae02bb87dbf7d is BAD
Comment 13 JP 2017-10-05 09:19:16 UTC
Dear all,

I'm the originator of this bug. I started my own company, and after some research I decided to base my products on the Intel platform (instead of Amlogic, i.MX, ...).
I am trying to offer end users a complete personal DVB-T/DVB-S streaming solution.
To reach this goal, my company is developing the client and server ourselves (at limited cost).
I am now blocked by the bug above.

May I kindly ask you to review it and let me know whether there is a chance of finding a solution?

Otherwise, I'll be forced to switch from Intel to another platform (I know it's technically the best... but the project and results come first).

Thanks a lot in advance

JP 
Project Manager
Comment 14 Elizabeth 2017-10-05 20:24:26 UTC
Reopening since information requested in comment #5 and comment #7 has been provided. 

Good afternoon JP. If your company has an agreement with Intel please use proper internal channels to speed this up, otherwise allow us to review further.
Comment 15 JP 2017-10-06 06:54:28 UTC
Hi Elizabeth
As I wrote, I'm a small company with a limited budget.
Unfortunately, I don't have any agreement with Intel.

From now on I am simply counting on the developers' dedication...

Thanks in advance
JP
Comment 16 Jani Nikula 2017-10-09 07:33:28 UTC
(In reply to freedesktop from comment #10)
> By reverting 608b20506941969ea30d8c08dc9ae02bb87dbf7d from 4.13.3 (see
> attached patch), the user has confirmed that 4.13.3 is now GOOD, while the
> normal 4.13.3 (without the revert) is still BAD.

608b20506941 ("drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off)")
Comment 17 Chris Wilson 2017-10-09 08:52:24 UTC
Created attachment 134760
Hold powerwell for vblanks

Something like this?
Comment 18 freedesktop 2017-10-11 07:35:37 UTC
Hi Chris. Unfortunately the patch hasn't helped - JP has tested a build based on 4.13.5 + your patch from comment #17, and he has the same Z display corruption as before.
Comment 19 JP 2017-10-19 05:50:59 UTC
Hi 
Any update on the previous topic?
Thanks in advance
JP
Comment 20 Daniel Vetter 2017-11-07 12:20:26 UTC
We have a bunch of bug fixes in flight all over; please retest with the latest drm-tip (and quote its full top commit, since it's a rebasing tree and the sha1 isn't useful), both with and without Chris' patch.
Comment 21 Daniel Vetter 2017-11-07 12:23:49 UTC
Also note: PSR will break the vblank code, so please make sure you don't have it enabled anywhere in the module options.
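One way to double-check the PSR setting on the test box: `i915.enable_psr` is an i915 module parameter, which can be set on the kernel command line and whose effective value is exposed at `/sys/module/i915/parameters/enable_psr` (reading it may require root). The minimal Python sketch below reads both. The value meanings in `describe()` (0 = off, positive = forced on, -1 = per-chip default) are an assumption based on kernels of this period and may differ on other versions.

```python
from pathlib import Path
from typing import Optional

# Effective i915 module parameter in sysfs; reading it may require root.
PSR_PARAM = Path("/sys/module/i915/parameters/enable_psr")


def psr_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the value of the last i915.enable_psr= option on a kernel
    command line, or None if the option is not present."""
    value = None
    for token in cmdline.split():
        if token.startswith("i915.enable_psr="):
            # Assumed: a later occurrence overrides an earlier one.
            value = int(token.split("=", 1)[1])
    return value


def psr_from_sysfs(path: Path = PSR_PARAM) -> Optional[int]:
    """Return the effective parameter value, or None if the file is absent
    (for example, when i915 is not loaded)."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def describe(value: Optional[int]) -> str:
    """Map a parameter value to a label (assumed semantics, see lead-in)."""
    if value is None:
        return "not set"
    if value == 0:
        return "disabled"
    if value > 0:
        return "enabled"
    return "per-chip default"
```

For example, `describe(psr_from_cmdline(Path("/proc/cmdline").read_text()))` reports whether PSR was forced from the boot command line, while `describe(psr_from_sysfs())` reports the value the driver actually ended up with, including anything set through modprobe configuration.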
Comment 22 freedesktop 2017-11-13 17:41:41 UTC
Hi, apologies for the long delay - a bad case of Man Flu has knocked me out for the past week.

On Sunday I produced three test builds for JP based on drm-tip[1]:

"drm-tip: 2017y-11m-12d-14h-36m-14s UTC integration manifest"

The three builds were:

#1113b: drmtip only
#1113c: #1113b + Chris Wilson patch (comment 17)
#1113d: #1113b + my hack patch (comment 11)

After testing, JP has called all three builds "GOOD", as none of them exhibit the Z-shaped corruption. Therefore it doesn't look like any additional patches from this bug are required.

However a different build (exact same userland but with drm-tip replaced by the mainline 4.14 kernel released on Sunday, and no patches from this bug) continues to show the Z-shaped corruption, so the corruption issue is still present in the mainline 4.14 kernel.

Maybe this issue will be fixed in 4.15?

1. https://cgit.freedesktop.org/drm-tip/commit/?id=0dc48f1fad834c3ab95f4d178e9e38e6ea39b6cf
Comment 23 freedesktop 2017-11-13 17:42:42 UTC
Oh and there are no PSR (Panel Self Refresh) settings enabled to my knowledge.
Comment 24 JP 2017-12-08 13:08:40 UTC
Dear all,
it appears the bug is solved in 4.15-rc1.
I'll close the bug once testing is also positive with 4.15.0.
Thanks to all
JP
Comment 25 Jani Saarinen 2018-03-29 07:10:56 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether the bug is still valid or not.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), please do so to see whether the issue is still present there; if you cannot see the issue there, please comment on the bug.
Comment 26 Jani Saarinen 2018-04-20 14:26:44 UTC
Closing, please re-open if still occurs.

