Bug 87682 - Horizontal lines in radeon driver on kernel 3.15 and upwards
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Default DRI bug account
URL: https://www.youtube.com/watch?v=nx2-F...
Reported: 2014-12-24 15:35 UTC by lockheed
Modified: 2016-07-28 16:23 UTC

Attachments
Youtube video depicting the artefacts. (43 bytes, text/plain)
2014-12-24 15:35 UTC, lockheed

Description lockheed 2014-12-24 15:35:20 UTC
Created attachment 111292
Youtube video depicting the artefacts.

This is what I get if I run any kernel newer than 3.14: https://www.youtube.com/watch?v=nx2-Fvihzxg
Those artefacts appear as soon as new kernel is selected in GRUB, and remain after logging into X session.

The last kernel working without artefacts is v3.15-rc2-trusty for Ubuntu, and 3.14.7 for Arch. All later kernels have those artefacts.

I updated mesa and xorg to those from the ppa:oibaf/graphics-drivers PPA on Ubuntu, and to the git versions on Arch, but it did not change a thing. This regression is solely kernel-related.

GPU: Mobility Radeon HD 3200 (RS780M)
System tested: Arch, Ubuntu 14.04, 14.10
Kernels affected: 3.15 and onwards (tested up to 3.19-rc1)
Comment 1 Michel Dänzer 2014-12-25 01:17:13 UTC
Can you isolate the kernel change which introduced the problem with git bisect?
Comment 2 Christian König 2014-12-25 12:37:34 UTC
Most likely another problem caused by the PLL rework. I would guess it's one of those patches.
Comment 3 lockheed 2014-12-25 15:11:14 UTC
@Michel Dänzer, I can contribute to this bug in as much detail as I can, but I don't think I have the necessary combination of time and skill to "bisect" a kernel.

However, since I gave the specific kernel version at which the error emerges, it should be enough information for someone with more knowledge to find the cause.
Comment 4 Alex Deucher 2015-01-05 17:25:57 UTC
Possibly related to https://bugzilla.kernel.org/show_bug.cgi?id=83461
Comment 5 Thom 2016-05-02 23:10:19 UTC
I can confirm this bug: Laptop HP 6735s 2xTurion + RS780M video chip
I happen to have the exact same artefacts with any kernel version higher than 3.13
It affects the built-in LVDS but NOT the VGA output.

I tested kernels up to 4.4.0 (to no avail)

I don't know what "git bisect" is but eager to learn.
I also dropped a note on https://bugzilla.kernel.org/show_bug.cgi?id=83461

I used the link to Lockheed's video as illustration on https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1479136 where I originally filed a bug.

I am in the happy position of being able to dedicate this laptop to any test you want me to throw at it.
Comment 6 Felix Schwarz 2016-05-03 08:02:38 UTC
(In reply to Thom from comment #5)
> I don't know what "git bisect" is but eager to learn.

"bisecting" is a way to find out which commit caused a specific regression.
This involves compiling the linux kernel from git and testing the compiled
versions. If you can find out which commit is the culprit chances are pretty
good that the problem can be fixed quickly.

To learn more about bisecting I suggest searching for "git bisect".
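
A minimal sketch of the bisect workflow, run against a throwaway repository instead of the kernel tree so that it finishes in seconds. The repository, the `BROKEN` marker file and the commit messages are made up for the demonstration. For a kernel regression, each step would instead mean building and booting the checked-out revision and reporting the result by hand with `git bisect good` or `git bisect bad`.

```shell
# Build a throwaway repository with ten commits (hypothetical demo data).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"

i=1
while [ "$i" -le 10 ]; do
    echo "$i" > version.txt
    # Commit 7 introduces the "regression": a file named BROKEN appears.
    if [ "$i" -eq 7 ]; then touch BROKEN; fi
    git add -A
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Tell git which end is bad (newest) and which is good (oldest).
oldest=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$oldest" >/dev/null 2>&1

# Let git drive the search: exit status 0 means "good", 1 means "bad".
# In a manual bisect you would test each revision yourself and then run
# "git bisect good" or "git bisect bad" instead.
git bisect run sh -c 'test ! -e BROKEN' >/dev/null 2>&1

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"

# Return the working tree to where it was before the bisect.
git bisect reset >/dev/null 2>&1
```

Each round halves the remaining range, which is why a few hundred candidate revisions need only about nine or ten test builds.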
Comment 7 Thom 2016-05-04 09:39:34 UTC
Ok, I did my first bisect, it worked out well but I encountered something that puzzles me a bit.
Here is the last part of the bisect:

3.15.0-rc3-00725-g1465967  bad

Bisecting: 658 revisions left to test after this (roughly 9 steps)
[e9dba837640d960f56bef22ff08611955ff8a5b4] Merge tag 'pm+acpi-3.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

3.15.0-rc2-00219-ge9dba83  bad

Bisecting: 355 revisions left to test after this (roughly 8 steps)
[6e66d5dab5d530a368314eb631201a02aabb075d] Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6


3.15.0-rc1-00303-g6e66d5d good

Bisecting: 176 revisions left to test after this (roughly 8 steps)
[4d0fa8a0f01272d4de33704f20303dcecdb55df1] Merge tag 'gpio-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

3.15.0-rc2-00042-g4d0fa8a good

Bisecting: 99 revisions left to test after this (roughly 7 steps)
[76e7745e8e4330fdb30f049303d524261c0b7a2c] Merge tag 'zynq-dt-fixes-for-3.15' of git://git.xilinx.com/linux-xlnx into fixes

3.15.0-rc2-00077-g76e7745 good (how can this be ??)

Bisecting: 49 revisions left to test after this (roughly 6 steps)
[92891ed6b1fdb49655f9a071ef2880a567807375] Merge branch 'fixes_for_v3.15' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping

3.15.0-rc2-00092-g92891ed bad

Bisecting: 22 revisions left to test after this (roughly 5 steps)
[1aae31c8306e5f1bdeafd87b2cd9e3f0df3709e5] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

3.15.0-rc2-00069-g1aae31c bad

Bisecting: 13 revisions left to test after this (roughly 4 steps)
[7740fc52105c9e6d2beac389a9ae0ce7138cf5ab] Input: soc_button_array - fix a crash during rmmod

3.14.0-rc4-00065-g7740fc5 good

Bisecting: 6 revisions left to test after this (roughly 3 steps)
[3ed9a335cfc64b2c83545f341cdddf2347b12b97] drm/radeon/pm: don't walk the crtc list before it has been initialized (v2)

3.15.0-rc1-00075-g3ed9a33 bad

Bisecting: 3 revisions left to test after this (roughly 2 steps)
[c2fb3094669a3205f16a32f4119d0afe40b1a1fd] drm/radeon: improve PLL limit handling in post div calculation

3.15.0-rc1-00071-gc2fb309 bad

Bisecting: 0 revisions left to test after this (roughly 1 step)
[24315814239a3fdb306244c99bd076bc79db4ade] drm/radeon: use fixed PPL ref divider if needed

3.15.0-rc1-00070-g2431581 good

c2fb3094669a3205f16a32f4119d0afe40b1a1fd is the first bad commit

commit c2fb3094669a3205f16a32f4119d0afe40b1a1fd
Author: Christian König <christian.koenig@amd.com>
Date:   Sun Apr 20 13:24:32 2014 +0200

    drm/radeon: improve PLL limit handling in post div calculation
    
    This improves the PLL parameters when we work at
    the limits of the allowed ranges.
    
    Signed-off-by: Christian König <christian.koenig@amd.com>

:040000 040000 5c3ac5ddf911c2c1f8926ecde2d83fdbcd6bb269 4731ceed6e1c149abd6fda6a06318700750f8

So far so good, but what I'm puzzled about is this:

As far as I understand; 3.15.0-rc2-00077-g76e7745 is a later revision (good) than 3.15.0-rc2-00069-g1aae31c (bad) and an earlier revision than 3.15.0-rc2-00092-g92891ed (bad) which doesn't seem to make sense to me.

It is as if someone did a patch to improve on 3.15.0-rc1-00071-gc2fb309 but that it got revoked afterwards, is that possible ?
Comment 8 Chris Bainbridge 2016-05-04 22:13:39 UTC
> As far as I understand; 3.15.0-rc2-00077-g76e7745 is a later revision (good)
> than 3.15.0-rc2-00069-g1aae31c (bad)

This is not correct. The 77/69 does not imply a linear ordering because of forks:

$ git merge-base --is-ancestor 3.15.0-rc2-00069-g1aae31c 3.15.0-rc2-00077-g76e7745; echo $?
1

Trust git ;-)
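
A small sketch of why the counters in those version strings cannot be compared directly, using a throwaway repository with two forks (the branch names `fork-a` and `fork-b` and the tag `v1` are made up). The number in a `git describe` style name counts commits since the tag along that revision's own history, so a "smaller" revision on one fork need not be an ancestor of a "larger" one on another.

```shell
# Throwaway repository with one tagged base commit (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Ancestry Demo"
git commit -q --allow-empty -m "base"
git tag -a v1 -m "v1"

# fork-a: two commits on top of v1.
git checkout -q -b fork-a v1
git commit -q --allow-empty -m "a1"
git commit -q --allow-empty -m "a2"

# fork-b: three commits on top of v1, none shared with fork-a.
git checkout -q -b fork-b v1
git commit -q --allow-empty -m "b1"
git commit -q --allow-empty -m "b2"
git commit -q --allow-empty -m "b3"

desc_a=$(git describe fork-a)   # v1-2-g<hash>
desc_b=$(git describe fork-b)   # v1-3-g<hash>

# Exit status 0 would mean fork-a is an ancestor of fork-b; 1 means it is not.
status=0
git merge-base --is-ancestor fork-a fork-b || status=$?
echo "$desc_a vs $desc_b: is-ancestor exit status $status"
```

Here the counter goes from 2 to 3, yet the ancestry check still returns 1, the same result as for the 69 and 77 revisions above.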

> c2fb3094669a3205f16a32f4119d0afe40b1a1fd is the first bad commit

Not familiar with this code, but from the patch the PLL values are printed out:

         DRM_DEBUG_KMS("%d - %d, pll dividers - fb: %d.%d ref: %d, post %d\n",
                      freq, *dot_clock_p * 10, *fb_div_p, *frac_fb_div_p,
                      ref_div, post_div);

So suggest enabling debug log and compare those two lines from a working and non-working kernel.

It should also be trivial to checkout a recent tag and revert the bad commit (there is a conflict but just delete the avivo_get_fb_ref_div function to resolve it).
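
A sketch of the "check out a tag, then revert one commit" step, again on a throwaway repository so it can be run anywhere. The file `pll.txt`, the tag `v-recent` and the branch `revert-test` are hypothetical stand-ins. In the kernel tree the same two commands would be given a real tag and the hash c2fb3094669a3205f16a32f4119d0afe40b1a1fd, and the revert would stop on the conflict mentioned above instead of applying cleanly.

```shell
# Throwaway repository: a good commit, a "bad" commit, then unrelated work.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Revert Demo"

echo "post_div=17" > pll.txt
git add pll.txt
git commit -q -m "known-good setup"

echo "post_div=14" > pll.txt
git commit -q -am "change post divider"   # stands in for the first bad commit
bad_commit=$(git rev-parse HEAD)

echo "unrelated" > other.txt
git add other.txt
git commit -q -m "later unrelated work"
git tag -a v-recent -m "recent tag"

# Check out the recent tag on a scratch branch and undo only the bad commit.
git checkout -q -b revert-test v-recent
git revert --no-edit "$bad_commit" >/dev/null

reverted_value=$(cat pll.txt)
echo "after revert: $reverted_value"
```

The revert adds a new commit that undoes the change while leaving the later, unrelated work in place, which is what makes it a quick way to confirm a bisect result on a current kernel.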
Comment 9 Thom 2016-05-05 17:01:21 UTC
(In reply to Chris Bainbridge from comment #8)

> This is not correct. The 77/69 does not imply a linear ordering because of
> forks:
> Trust git ;-)

Thanks for the update, that explains everything. I hardly know git, and before yesterday I didn't even know what git or bisecting was...it's a bit overwhelming.

> 
> > c2fb3094669a3205f16a32f4119d0afe40b1a1fd is the first bad commit
> 
> Not familiar with this code, but from the patch the PLL values are printed
> out:
> 
>          DRM_DEBUG_KMS("%d - %d, pll dividers - fb: %d.%d ref: %d, post %d\n",
>                       freq, *dot_clock_p * 10, *fb_div_p, *frac_fb_div_p,
>                       ref_div, post_div);
> 

That is like magic :-) How did you get git to give you the source of that patch so quickly  ?
(I googled for hours on this stuff without success) 

> So suggest enabling debug log and compare those two lines from a working and
> non-working kernel.
>

I assume that I have to enable the debug log via a boot option, because I couldn't find anything in menuconfig that wasn't already marked for inclusion.
What boot option do I have to use to enable the right (and right amount of) debug logging? (And after that, where do I find the log output?)

> It should also be trivial to checkout a recent tag and revert the bad commit

I don't even know yet what that is or how to do that, even after reading the man pages about checkout, tag, revert and commit; but I'm convinced I'll get there in the end ;-)
Comment 10 Thom 2016-05-05 17:37:01 UTC
Hmmm.... I'm afraid I have to enable "debug boot parameters" in menuconfig.
What git command do I use to get the source of a specific kernel version lined up so I can recompile selected kernels for debugging?
Comment 11 Thom 2016-05-06 02:25:44 UTC
ok, some results:

PLL-readings on good working compilations:

3.15.0-rc1-00303-g6e66d5d
[drm:radeon_compute_pll_avivo] 69300 - 6949, pll dividers - fb: 165.0 ref: 2, post 17

3.15.0-rc2-00042-g4d0fa8a
[drm:radeon_compute_pll_avivo] 69300 - 6930, pll dividers - fb: 329.1 ref: 4, post 17

3.15.0-rc2-00077-g76e7745
[drm:radeon_compute_pll_avivo] 69300 - 6930, pll dividers - fb: 329.1 ref: 4, post 17

3.15.0-rc1-00070-g2431581
no output, system hangs loading driver in debug mode
(probably because this one didn't have the patch yet.)
works ok when not in debug mode.



PLL-readings on bad noisy-artefacty compilations:

3.15.0-rc2-00069-g1aae31c
[drm:radeon_compute_pll_avivo] 69300 - 69290, pll dividers - fb: 135.5 ref: 2, post 14

3.15.0-rc1-00071-gc2fb309
[drm:radeon_compute_pll_avivo] 69300 - 69290, pll dividers - fb: 135.5 ref: 2, post 14

3.15.0-rc1-00075-g3ed9a33
[drm:radeon_compute_pll_avivo] 69300 - 69290, pll dividers - fb: 135.5 ref: 2, post 14

3.15.0-rc2-00092-g92891ed
[drm:radeon_compute_pll_avivo] 69300 - 69290, pll dividers - fb: 135.5 ref: 2, post 14


Problem is: I haven't the slightest clue what it all means.
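
A small sketch for pulling the three divider values out of such log lines so that good and bad kernels can be compared side by side. It assumes the kernel was booted with `drm.debug=4` (the boot parameter quoted in comment 14) and that the lines have the format shown above; the helper name `extract_pll` is made up for this example.

```shell
# On a live system the input would come from the kernel log, for example:
#   dmesg | grep radeon_compute_pll_avivo
# Here two lines copied from the readings above are used instead.

extract_pll() {
    # Read debug lines on stdin and print "fb ref post" for each match.
    sed -n 's/.*pll dividers - fb: \([0-9.]*\) ref: \([0-9]*\), post \([0-9]*\).*/\1 \2 \3/p'
}

good_line='[drm:radeon_compute_pll_avivo] 69300 - 6930, pll dividers - fb: 329.1 ref: 4, post 17'
bad_line='[drm:radeon_compute_pll_avivo] 69300 - 69290, pll dividers - fb: 135.5 ref: 2, post 14'

good_pll=$(printf '%s\n' "$good_line" | extract_pll)
bad_pll=$(printf '%s\n' "$bad_line" | extract_pll)

echo "good (fb ref post): $good_pll"
echo "bad  (fb ref post): $bad_pll"
```

In the readings above every bad build reports the same triple (135.5, 2, 14), while the good builds report 165.0/2/17 or 329.1/4/17.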
Comment 12 Thom 2016-05-06 14:26:16 UTC
(In reply to Chris Bainbridge from comment #8)

> So suggest enabling debug log and compare those two lines from a working and
> non-working kernel.

Done (see previous message) :-)
 
> It should also be trivial to checkout a recent tag and revert the bad commit

Done :-)
Reverted the bad commit on current 4.6.0-rc6+ and tested
and it worked like a charm !! no display problems anymore

> (there is a conflict but just delete the avivo_get_fb_ref_div function to
> resolve it).

I did, and thanks to your directions it all worked out perfectly :-)
Comment 13 Chris Bainbridge 2016-05-09 09:26:16 UTC
This might be https://bugzilla.kernel.org/show_bug.cgi?id=75241 - there is a one-line patch there from Christian König but it doesn't look like it was ever merged.
Comment 14 Thom 2016-05-09 20:42:54 UTC
(In reply to Chris Bainbridge from comment #13)
> This might be https://bugzilla.kernel.org/show_bug.cgi?id=75241 - there is
> a one-line patch there from Christian König but it doesn't look like it was
> ever merged.

I did a git fetch origin , git reset --hard origin/master
to get a plain unaltered current kernel again (4.6.0-rc7+)

I changed the one line in ./drivers/gpu/drm/radeon/radeon_display.c:
    fb_div_max = pll->max_feedback_div;
to:
    fb_div_max = min(pll->max_feedback_div, 512u); 
according to:
    https://bugzilla.kernel.org/attachment.cgi?id=142281
    (linked from https://bugzilla.kernel.org/show_bug.cgi?id=75241)

and compiled (make && make modules_install install)

Assuming that I did not make a mistake or overlook something;
this patch didn't work, lots of noise/artefacts.
Timings seem identical to the other "bad" compilations, i.e. nothing changed:

(bootparam drm.debug=4)
 [drm:radeon_compute_pll_avivo] 69300 - 69290, pll dividers - fb: 135.5 ref: 2, post 14

Too bad, but it was absolutely worth a try.
I wonder if "fb" and "post" are consequently too low....is that possible ?
Comment 15 Thom 2016-05-10 01:42:07 UTC
OK, I created a variation of the one-liner patch that works without reverting any of the existing code:

    This patch prevents fb from going lower than 140
    Preventing noise/snow on display . (for RS780M + LVDS)

diff:
@@ void radeon_compute_pll_avivo(struct radeon_pll *pll,

 	/* determine allowed feedback divider range */
-	fb_div_min = pll->min_feedback_div;
+	fb_div_min = max(pll->min_feedback_div, 140u);
 	fb_div_max = pll->max_feedback_div;

 	if (pll->flags & RADEON_PLL_USE_FRAC_FB_DIV) {
 		fb_div_min *= 10;


results in:
[drm:radeon_compute_pll_avivo] 69300 - 69290, pll dividers - fb: 271.0 ref: 4, post 14

This "works for me (TM)"

But it would be good if someone could check if there are no "unforeseen consequences" to this patch.
I don't know much about GPU stuff and I am not familiar with the code.
(and yes I know: hardcoding values is definitely "not done")
Comment 16 Thom 2016-05-10 09:06:02 UTC
fb lower than 140 is possible, my current stock kernel 3.13.0-86 works flawlessly
[drm:radeon_compute_pll_avivo], 6928, pll dividers - fb: 125.8 ref: 2, post 13

(sigh) I just wish I understood why some modes work and some don't
Comment 17 Chris Bainbridge 2016-05-10 22:12:05 UTC
Christian König posted an explanation of the PLL divider values at https://bugzilla.kernel.org/show_bug.cgi?id=91861#c12 (another "no screen after 3.15" bug report)

The various fixes adjust the divider value limits slightly for different displays. The basic formula is commented in the radeon_compute_pll_avivo function:

        dot_clock = (ref_freq * feedback_div) / (ref_div * post_div)

So by adjusting the limits of those values you can find something that works for your laptop display. But I don't know which solution is technically correct - if you don't get a reply here you could try emailing Christian König and asking.
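
As a worked example, the formula can be applied to the divider sets logged earlier in this report. This is a sketch under one assumption that is not stated anywhere in the thread: a reference frequency of 1432 (14.32 MHz in units of 10 kHz), chosen because it reproduces the logged clock values. The function name `dot_clock` and the trick of passing the feedback divider in tenths (135.5 becomes 1355) are only there to keep the shell arithmetic in integers.

```shell
# ASSUMPTION: reference frequency in 10 kHz units, inferred from the logs,
# not taken from this bug report.
REF_FREQ=1432

dot_clock() {
    # $1 = feedback divider in tenths, $2 = ref_div, $3 = post_div
    # dot_clock = (ref_freq * feedback_div) / (ref_div * post_div)
    echo $(( REF_FREQ * $1 / (10 * $2 * $3) ))
}

good_clock=$(dot_clock 3291 4 17)      # fb 329.1, ref 4, post 17 (good kernels)
bad_clock=$(dot_clock 1355 2 14)       # fb 135.5, ref 2, post 14 (bad kernels)
patched_clock=$(dot_clock 2710 4 14)   # fb 271.0, ref 4, post 14 (comment 15)
old_clock=$(dot_clock 1258 2 13)       # fb 125.8, ref 2, post 13 (kernel 3.13)

echo "good:    $good_clock"
echo "bad:     $bad_clock"
echo "patched: $patched_clock"
echo "3.13:    $old_clock"
```

Under that assumption the results are 6930, 6929, 6929 and 6928 in units of 10 kHz, which match the logged 6930, 69290, 69290 and 6928 for a requested 69300 kHz. The failing and the patched divider sets produce the same dot clock, so the resulting clock alone does not separate working from non-working settings; the individual divider values matter as well.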
Comment 18 Thom 2016-05-12 20:58:12 UTC
(In reply to Chris Bainbridge from comment #17)
> if you don't get a reply here you could try emailing Christian
> König and asking.

I did, and Christian responded almost instantly, so I will be busy for quite a while with testing. Don't close this bug yet....work in progress :-)
Comment 19 Thom 2016-06-14 08:03:21 UTC
Patch submitted by Christian König

https://lists.freedesktop.org/archives/dri-devel/2016-June/110724.html

This solved the bug. Thanks everyone for all the help.
Comment 20 Gilbert Smith 2016-07-22 20:03:18 UTC
I have this same problem with an upgrade from 14.04 LTS to 16.04 LTS Ubuntu -

Linux DV7 4.6.4-040604-generic #201607111332 SMP Mon Jul 11 17:34:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

01:05.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RS780M [Mobility Radeon HD 3200] [1002:9612] (prog-if 00 [VGA controller])


I noticed that a patch was submitted. Can I expect to see this in a future kernel or perhaps a RC version after my 4.6.4 kernel?

-------
Comment 21 Thom 2016-07-22 22:34:27 UTC
Gilbert, same with me, also ubuntu 14.04 -> 16.04.

The patch is already in the 4.7+ kernel tree so it should be in the first 4.7 kernel (pre) release.

I'm not familiar with ubuntu's kernel policy and I also don't know anyone who does but I guess that the 4.7 kernel will land in  16.10 or 17.04.
Best to ask the Ubuntu Kernelteam.
Comment 22 Thom 2016-07-22 23:20:07 UTC
addendum:

https://github.com/torvalds/linux/commit/9ef8537e68941d858924a3eacee5a1945767cbab

i.e. kernel 4.7-rc4 and up
Comment 23 Gilbert Smith 2016-07-23 21:02:21 UTC
(In reply to Thom from comment #21)
> Gilbert, same with me, also ubuntu 14.04 -> 16.04.
> 
> The patch is already in the 4.7+ kernel tree so it should be in the first
> 4.7 kernel (pre) release.
> 
> I'm not familiar with ubuntu's kernel policy and I also don't know anyone
> who does but I guess that the 4.7 kernel will land in  16.10 or 17.04.
> Best to ask the Ubuntu Kernelteam.

Thank you for the information. I'll probably stay on the LTS 16.04 but as soon as I get wind of the release of kernel 4.7+ I will install it.

I was able to get my system working properly by reverting to kernel 3.13.0-92-generic.

Here's a link to a discussion I found which states that users who upgraded may use older kernels from 12.04 and 14.04 on 16.04, even if not supported.

http://askubuntu.com/questions/776910/install-old-kernel-in-ubuntu-16-04/801847#801847
Comment 24 Gilbert Smith 2016-07-28 15:53:09 UTC
(In reply to Thom from comment #21)
> Gilbert, same with me, also ubuntu 14.04 -> 16.04.
> 
> The patch is already in the 4.7+ kernel tree so it should be in the first
> 4.7 kernel (pre) release.
> 
> I'm not familiar with ubuntu's kernel policy and I also don't know anyone
> who does but I guess that the 4.7 kernel will land in  16.10 or 17.04.
> Best to ask the Ubuntu Kernelteam.

I just installed the new kernel 4.7.0-040700-generic but it didn't fix the display problem. Reverting back to 3.13.0-92-generic. :(
Comment 25 Thom 2016-07-28 16:23:12 UTC
AFAIK the patch is in since 4.7-RC4.
Could it be that your version is older ?

see also:
https://github.com/torvalds/linux/commit/9ef8537e68941d858924a3eacee5a1945767cbab

