Bug 99275 - Kernel 4.9: amdgpu regression; gui flickers; amd radeon rx 460
Summary: Kernel 4.9: amdgpu regression; gui flickers; amd radeon rx 460
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2017-01-04 19:25 UTC by Reimar Imhof
Modified: 2019-11-19 08:12 UTC
CC List: 5 users



Attachments
hwinfo (415.38 KB, text/plain)
2017-01-04 19:25 UTC, Reimar Imhof
dmesg.txt (67.74 KB, text/plain)
2017-01-04 21:17 UTC, Reimar Imhof
Xorg.0.log (39.89 KB, text/plain)
2017-01-04 21:17 UTC, Reimar Imhof
rpm -qa --queryformat "%{Name} - %{Version} - %{Vendor}\n" | grep build.opensuse.org/X11 (9.01 KB, text/plain)
2017-01-04 21:18 UTC, Reimar Imhof
Xorg.0.log with activated Xorg amdgpu driver (28.62 KB, text/plain)
2017-01-05 18:09 UTC, Reimar Imhof
artifacts on radeon rx460 (240.58 KB, image/png)
2017-02-14 11:55 UTC, alvarex

Description Reimar Imhof 2017-01-04 19:25:45 UTC
Created attachment 128754
hwinfo

System: openSUSE Leap 42.2
Kernel: 4.9.0-4.g1af4b0f from download.opensuse.org/repositories/Kernel:/stable/standard/
Mesa from download.opensuse.org/repositories/X11:/XOrg/openSUSE_Leap_42.2/

Graphics adapter: AMD Radeon RX 460
CPU: i5-6402P

Desktop: KDE Plasma 5 (from 42.2-oss/update)

Problem: The address drop-down in Firefox flickers.

Steps to reproduce:

Start Firefox.
Enter an address (for example www.heise.de).
Enter another address.
The address drop-down starts to flicker.

The problem may only show up when the virtual desktop was switched before Firefox was started. (I use the cube animation for switching virtual desktops.)

The problem also occurs with the Tumbleweed live system.

There is no problem with kernel 4.8.x (tested with 4.8.14).

I've also reported this bug at https://bugzilla.suse.com/show_bug.cgi?id=1017938
Comment 1 Alex Deucher 2017-01-04 19:34:50 UTC
Please attach your xorg log and dmesg output.  Is it just the menu that flickers or the whole screen?  Did you also change any other components (mesa, ddx, etc.)?
Comment 2 Reimar Imhof 2017-01-04 21:17:01 UTC
Created attachment 128756
dmesg.txt
Comment 3 Reimar Imhof 2017-01-04 21:17:39 UTC
Created attachment 128757
Xorg.0.log
Comment 4 Reimar Imhof 2017-01-04 21:18:37 UTC
Created attachment 128758
rpm -qa --queryformat "%{Name} - %{Version} - %{Vendor}\n" | grep build.opensuse.org/X11
Comment 5 Reimar Imhof 2017-01-04 21:24:42 UTC
I've attached the logs and the info about changed components.
They are all from
download.opensuse.org/repositories/X11:/XOrg/openSUSE_Leap_42.2/

The flicker problem is not about the menu but about the drop-down list that opens when you type an address into the Firefox address field.

A second effect:
When I resize a window with kernel 4.8.14, the background (or the windows underneath) shines through.
When I do the same with kernel 4.9, it also starts flickering and shows artifacts from the background.
Comment 6 Reimar Imhof 2017-01-04 21:35:04 UTC
Just something I do not understand:

While looking at the Xorg log, I couldn't find anything about loading xf86-video-amdgpu.
I only found lines about
/usr/lib64/xorg/modules/drivers/ati_drv.so
and
/usr/lib64/xorg/modules/drivers/radeon_drv.so

Do I need a special config for loading the amdgpu driver?

There is no video-driver specific config in /etc/X11/xorg.conf.d/
Comment 7 Michel Dänzer 2017-01-05 09:56:39 UTC
Any chance you can use git bisect to find the change between 4.8 and 4.9 which introduced the problem?

(In reply to Reimar Imhof from comment #6)
> Do I need a special config for loading the amdgpu driver?
> 
> There is no video-driver specific config in /etc/X11/xorg.conf.d/

There should be a /usr/share/X11/xorg.conf.d/10-amdgpu.conf file which causes the amdgpu driver to be used automatically for GPUs controlled by the amdgpu kernel driver. If that file is missing, please report the problem to openSuse.
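Comment #6 found only ati_drv.so and radeon_drv.so in the Xorg log. A minimal sketch for checking which DDX drivers a log shows as loaded, assuming the usual "(II) Loading .../drivers/<name>_drv.so" lines; the module directory and the log location vary between setups (a rootless X server may log under ~/.local/share/xorg/ instead of /var/log/), and the function names are made up for this example.

```shell
#!/bin/sh
# Sketch: list the Xorg video drivers (DDX modules) that an Xorg log shows
# as loaded. Assumes log lines of the form
#   (II) Loading /usr/lib64/xorg/modules/drivers/<name>_drv.so
# Only the "drivers/<name>_drv.so" part is matched, because the module
# directory differs between distributions.

# Print each loaded driver name once, sorted.
loaded_ddx_drivers() {
    grep -o 'drivers/[A-Za-z0-9_]*_drv\.so' "$1" \
        | sed -e 's|^drivers/||' -e 's|_drv\.so$||' \
        | sort -u
}

# Succeed (exit status 0) if the amdgpu DDX appears in the log.
amdgpu_ddx_loaded() {
    loaded_ddx_drivers "$1" | grep -qx 'amdgpu'
}

# Example:
#   loaded_ddx_drivers /var/log/Xorg.0.log
#   amdgpu_ddx_loaded /var/log/Xorg.0.log || echo "amdgpu DDX not loaded"
#   ls -l /usr/share/X11/xorg.conf.d/10-amdgpu.conf
```

If the log only lists ati and radeon, the amdgpu DDX was not picked up, which is the situation the missing 10-amdgpu.conf file would explain.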
Comment 8 Reimar Imhof 2017-01-05 18:09:35 UTC
Created attachment 128779
Xorg.0.log with activated Xorg amdgpu driver

I found an Xorg amdgpu.conf file on the Arch Linux wiki:

Section "Device"
    Identifier "AMD"
    Driver "amdgpu"
EndSection

With this config the amdgpu Xorg driver gets loaded. But the kernel bug still exists.
Comment 9 Reimar Imhof 2017-01-05 18:36:33 UTC
(In reply to Michel Dänzer from comment #7)
> Any chance you can use git bisect to find the change between 4.8 and 4.9
> which introduced the problem?
Sorry, I can't.

> There should be a /usr/share/X11/xorg.conf.d/10-amdgpu.conf file which
> causes the amdgpu driver to be used automatically for GPUs controlled by the
> amdgpu kernel driver. If that file is missing, please report the problem to
> openSuse.
I filed a bug report at https://bugzilla.opensuse.org/show_bug.cgi?id=1018406
Comment 10 Reimar Imhof 2017-01-16 20:55:22 UTC
I've just tried kernel 4.9.4
from download.opensuse.org/repositories/Kernel:/stable/standard/
Same bug.

Do you have any news?
Comment 11 Reimar Imhof 2017-01-29 19:03:43 UTC
I tried a bisect:
git bisect start
# good: [c8d2bc9bc39ebea8437fd974fdbc21847bb897a3] Linux 4.8
git bisect good c8d2bc9bc39ebea8437fd974fdbc21847bb897a3
# bad: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
git bisect bad 69973b830859bc6529a7a0468ba0d80ee5117826
# bad: [a5af7e1fc69a46f29b977fd4b570e0ac414c2338] rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs
git bisect bad a5af7e1fc69a46f29b977fd4b570e0ac414c2338
# bad: [d268dbe76a53d72cc41316eb59e7968db60e77ad] Merge tag 'pinctrl-v4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad d268dbe76a53d72cc41316eb59e7968db60e77ad
# bad: [02bafd96f3a5d8e610b19033ffec55b92459aaae] Merge tag 'docs-4.9' of git://git.lwn.net/linux
git bisect bad 02bafd96f3a5d8e610b19033ffec55b92459aaae
# bad: [9929780e86854833e649b39b290b5fe921eb1701] Merge tag 'driver-core-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect bad 9929780e86854833e649b39b290b5fe921eb1701
# good: [12b7bcb43e6ea834ab2f5dc52d971e379a0ca109] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 12b7bcb43e6ea834ab2f5dc52d971e379a0ca109
# bad: [5e1b834b27fb2c27cde33a0752425f11d10c0b2d] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 5e1b834b27fb2c27cde33a0752425f11d10c0b2d
# bad: [110a9e42b68719f584879c5c5c727bbae90d15f9] Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 110a9e42b68719f584879c5c5c727bbae90d15f9
# bad: [a399d233078edbba7cf7902a6d080100cdf75636] sched/core: Fix incorrect utilization accounting when switching to fair class
git bisect bad a399d233078edbba7cf7902a6d080100cdf75636
# good: [1a3d027c5a6847e5d349c8527f99aada47e5467a] sched/debug: Rename and move enqueue_sleeper()
git bisect good 1a3d027c5a6847e5d349c8527f99aada47e5467a
# good: [53061afee43bc5041b67a45b6d793e7afdcf9ca7] Merge branch 'akpm' (patches from Andrew)
git bisect good 53061afee43bc5041b67a45b6d793e7afdcf9ca7
# good: [a18a579e5f84daa74f64b1f1b652b4a6a8d6f8b4] sched/debug: Hide printk() by default
git bisect good a18a579e5f84daa74f64b1f1b652b4a6a8d6f8b4
# good: [eaf9ef52241b545fe63621266bfc6fd8b06559ff] sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock()
git bisect good eaf9ef52241b545fe63621266bfc6fd8b06559ff
# good: [24fc7edb92eea05946119cc0258c891c26b3b469] sched/core: Introduce 'struct sched_domain_shared'
git bisect good 24fc7edb92eea05946119cc0258c891c26b3b469
# good: [10e2f1acd0106c05229f94c70a344ce3a2c8008b] sched/core: Rewrite and improve select_idle_siblings()
git bisect good 10e2f1acd0106c05229f94c70a344ce3a2c8008b
# bad: [1b568f0aabf280555125bc7cefc08321ff0ebaba] sched/core: Optimize SCHED_SMT
git bisect bad 1b568f0aabf280555125bc7cefc08321ff0ebaba
# first bad commit: [1b568f0aabf280555125bc7cefc08321ff0ebaba] sched/core: Optimize SCHED_SMT

> git bisect bad
1b568f0aabf280555125bc7cefc08321ff0ebaba is the first bad commit
commit 1b568f0aabf280555125bc7cefc08321ff0ebaba
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon May 9 10:38:41 2016 +0200

    sched/core: Optimize SCHED_SMT
    
    Avoid pointless SCHED_SMT code when running on !SMT hardware.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

:040000 040000 2ce55347eeb85f4e6044fac184e944f5235de1d3 454406febf88e80e4c5a8e37a403ad6c79e097d6 M      kernel
Comment 12 Reimar Imhof 2017-01-29 19:04:55 UTC
I tried another bisect:

git bisect start
# bad: [9929780e86854833e649b39b290b5fe921eb1701] Merge tag 'driver-core-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect bad 9929780e86854833e649b39b290b5fe921eb1701
# good: [12b7bcb43e6ea834ab2f5dc52d971e379a0ca109] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 12b7bcb43e6ea834ab2f5dc52d971e379a0ca109
# bad: [5e1b834b27fb2c27cde33a0752425f11d10c0b2d] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 5e1b834b27fb2c27cde33a0752425f11d10c0b2d
# bad: [110a9e42b68719f584879c5c5c727bbae90d15f9] Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 110a9e42b68719f584879c5c5c727bbae90d15f9
# bad: [a399d233078edbba7cf7902a6d080100cdf75636] sched/core: Fix incorrect utilization accounting when switching to fair class
git bisect bad a399d233078edbba7cf7902a6d080100cdf75636
# bad: [1a3d027c5a6847e5d349c8527f99aada47e5467a] sched/debug: Rename and move enqueue_sleeper()
git bisect bad 1a3d027c5a6847e5d349c8527f99aada47e5467a
# good: [0e6d2a67a41321b3ef650b780a279a37855de08e] sched/core: Remove unnecessary NULL-pointer check
git bisect good 0e6d2a67a41321b3ef650b780a279a37855de08e
# good: [62cc20bcf25617dd5ad23356ea46830da3ef7356] Merge branch 'sched/urgent' into sched/core, to pick up fixes
git bisect good 62cc20bcf25617dd5ad23356ea46830da3ef7356
# good: [d8206bb3ffe0eaee03abfad46fd44d8b17142e88] sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()
git bisect good d8206bb3ffe0eaee03abfad46fd44d8b17142e88
# good: [efca03ecbe29a46c2c5ae539563b6326af9dcba7] schedcore: Remove duplicated init_task's preempt_notifiers init
git bisect good efca03ecbe29a46c2c5ae539563b6326af9dcba7
# good: [61c7aca695b6fabe85d0fc424fe8ae2f66f267dd] sched/deadline: Fix the intention to re-evalute tick dependency for offline CPU
git bisect good 61c7aca695b6fabe85d0fc424fe8ae2f66f267dd
# first bad commit: [1a3d027c5a6847e5d349c8527f99aada47e5467a] sched/debug: Rename and move enqueue_sleeper()


 > git bisect good
1a3d027c5a6847e5d349c8527f99aada47e5467a is the first bad commit
commit 1a3d027c5a6847e5d349c8527f99aada47e5467a
Author: Josh Poimboeuf <jpoimboe@redhat.com>
Date:   Fri Jun 17 12:43:23 2016 -0500

    sched/debug: Rename and move enqueue_sleeper()
    
    enqueue_sleeper() doesn't actually enqueue, it just handles some
    statistics and tracepoints.  Rename it to update_stats_enqueue_sleeper()
    and call it from update_stats_enqueue().
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matt Fleming <matt@codeblueprint.co.uk>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/fb20b7159dc4d028c406c0e8d5f8c439b741615b.1466184592.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

:040000 040000 18aa42131bb6e286cc94eb481b7edf59c0b8d3b5 01d1206384de4d8a232427b39e9940806d9dacf3 M      kernel
Comment 13 Reimar Imhof 2017-01-29 19:11:09 UTC
After I had seen that 4.8 had the same problem, I tried to find the first good 4.8.x:

 > git bisect new
107d026ae1c80ac0881f791a58cd115321d111ca is the first new commit
commit 107d026ae1c80ac0881f791a58cd115321d111ca
Author: Will Deacon <will.deacon@arm.com>
Date:   Fri Aug 26 11:36:39 2016 +0100

    arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
    
    commit 3a402a709500c5a3faca2111668c33d96555e35a upstream.
    
    When TIF_SINGLESTEP is set for a task, the single-step state machine is
    enabled and we must take care not to reset it to the active-not-pending
    state if it is already in the active-pending state.
    
    Unfortunately, that's exactly what user_enable_single_step does, by
    unconditionally setting the SS bit in the SPSR for the current task.
    This causes failures in the GDB testsuite, where GDB ends up missing
    expected step traps if the instruction being stepped generates another
    trap, e.g. PTRACE_EVENT_FORK from an SVC instruction.
    
    This patch fixes the problem by preserving the current state of the
    stepping state machine when TIF_SINGLESTEP is set on the current thread.
    
    Cc: <stable@vger.kernel.org>
    Reported-by: Yao Qi <yao.qi@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 98b6a8a0d27ab29b17b6b617e52beb33d56dc604 46030e41ff5783555a6d6e895799aabbe3a0c46e M      arch
Comment 14 Reimar Imhof 2017-01-29 19:13:22 UTC
I've also tried 4.9.6 and 4.10-rc5.

The error still exists with both.
Comment 15 Michel Dänzer 2017-01-31 06:42:20 UTC
There are two basic explanations for not getting consistent bisection results:

* In each case, at least one bad commit was accidentally marked as good. Test longer / more times before declaring a commit as good to avoid this.

* The problem (or at least the trigger) isn't in the kernel but somewhere else.
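The first explanation matters most for an intermittent bug like this flicker. A sketch, assuming "the bug did not show up" can be expressed as a command that exits 0: a commit only counts as good after N clean runs, and a single failure makes it bad. The exit codes match the good/bad convention of `git bisect run`; `classify_commit` is a hypothetical helper, not a git feature.

```shell
#!/bin/sh
# Sketch: classify a commit for an intermittent bug by repeating the test.
# CHECK must exit 0 when the bug did NOT appear and non-zero when it did.
# Returns 0 (good) only after TRIES clean runs, 1 (bad) on the first
# failure; these match the good/bad exit codes of `git bisect run`.
classify_commit() {
    check="$1"
    tries="$2"
    i=1
    while [ "$i" -le "$tries" ]; do
        if ! sh -c "$check"; then
            echo "run $i of $tries: bug reproduced -> bad" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $tries runs -> good" >&2
    return 0
}
```

With too few runs, a commit whose bug only appears on, say, the third attempt is classified as good, which is exactly the misclassification described above.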
Comment 16 Reimar Imhof 2017-02-01 21:08:59 UTC
I had another look at 4.8.
4.8 is OK.

Next I tried 4.9-rc1.
That one is buggy.

Bisecting between the two produced lots of unbootable kernels:

git bisect start
# good: [c8d2bc9bc39ebea8437fd974fdbc21847bb897a3] Linux 4.8
git bisect good c8d2bc9bc39ebea8437fd974fdbc21847bb897a3
# bad: [1001354ca34179f3db924eb66672442a173147dc] Linux 4.9-rc1
git bisect bad 1001354ca34179f3db924eb66672442a173147dc
# skip: [41844e36206be90cd4d962ea49b0abc3612a99d0] Merge tag 'staging-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect skip 41844e36206be90cd4d962ea49b0abc3612a99d0
# skip: [7906a4da0ef845d01e76f2187c23cc71ae00fa1d] net/faraday: Make EDO{R,T}R bits configurable
git bisect skip 7906a4da0ef845d01e76f2187c23cc71ae00fa1d
# skip: [86ec64bc382ad1e0e6bc66f5f705a53e299a3445] ACPICA: Tables: Fix a regression in acpi_tb_find_table()
git bisect skip 86ec64bc382ad1e0e6bc66f5f705a53e299a3445

That took quite some time but didn't help at all.
I can't see this coming to an end, and I hope it will not break my machine.
So I'm stopping here.
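`git bisect run` treats exit code 125 like a manual `git bisect skip`, 0 as good, and any other code from 1 to 127 as bad. A sketch, assuming a build command and a test command that exits 0 when the bug is absent (both placeholders): commits that fail to build are skipped without manual work. Kernels that build but do not boot still have to be skipped by hand, and a flicker test that needs a reboot cannot run inside `git bisect run`, so this only automates the build half; `bisect_step` and the file names in the example are hypothetical.

```shell
#!/bin/sh
# Sketch of one `git bisect run` step that skips unbuildable commits.
# Exit codes follow the git-bisect documentation: 0 = good,
# 125 = skip (cannot be tested), any other code from 1 to 127 = bad.
# BUILD_CMD and TEST_CMD are placeholders; TEST_CMD must exit 0 when the
# bug is absent.
bisect_step() {
    build_cmd="$1"
    test_cmd="$2"
    if ! sh -c "$build_cmd"; then
        echo "build failed -> skip this commit" >&2
        return 125
    fi
    if sh -c "$test_cmd"; then
        return 0        # good
    fi
    return 1            # bad
}

# Example (hypothetical paths, helper kept outside the repository):
#   git bisect run sh -c '. /tmp/bisect-step.sh; bisect_step "make -j4" "/tmp/run-test.sh"'
```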
Comment 17 Alex Deucher 2017-02-01 22:10:55 UTC
Does using the new ucode here help?

https://people.freedesktop.org/~agd5f/radeon_ucode/polaris/
Comment 18 Reimar Imhof 2017-02-03 20:05:58 UTC
That firmware kills KDE Plasma 5 and freezes the machine!

What I've done:
Copied the firmware files to /lib/firmware/amdgpu/
Ran dracut to build a new initrd
Rebooted

The graphical login comes up.
After logging in to KDE Plasma 5, the machine freezes.

After a hard power-off, I logged into IceWM.
That desktop came up. I opened a console. OK.
I started Dolphin and the machine froze immediately. Even the clock in the desktop toolbar stopped.

Before starting Dolphin I had a look at dmesg and couldn't find any new error messages. Unfortunately, I no longer have those messages.
I also had a look at /var/log/Xorg.<i>.log.
No log was written for the KDE session that froze the machine.

Could you please test that firmware with a KDE Plasma 5 desktop? Please!
Comment 19 Alex Deucher 2017-02-03 20:11:54 UTC
(In reply to Reimar Imhof from comment #18)
> That firmware kills kde / plasma5 and freezes the machine!
> 
> What I've done:
> Copied the firmware files to /lib/firmware/amdgpu/
> ran dracut to build new initrd
> reboot
> 
> graphical login comes up
> after logging in to kde plasma 5 the machine freezes
> 
> After a hard power of I logged into icewm.
> That desktop came up. I opened an console. Ok. 
> Started dolphin. Machine has frozen immediately. Even the clock in the
> desktop toolbar stopped working.
> 
> Before starting dolphin I had a look at dmesg. I couldn't find any new error
> messages. Unfortunately I do not have these messages around any more.
> I also had a look at the /var/log/Xorg.<i>.log
> There was no log written from the kde session freezing the machine.
> 
> Could you please test that firmware with a kde plasma 5 desktop? Please!

Can you just try updating the smc firmware?
Comment 20 Reimar Imhof 2017-02-03 20:46:23 UTC
This time I only updated the following files:
polaris12_smc.bin
polaris11_smc_sk.bin
polaris11_smc.bin
polaris10_smc_sk.bin
polaris10_smc.bin

I think the polaris11 files are the relevant ones, since dmesg mentions POLARIS11.

Same effect: the machine freezes when starting KDE :-(((
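Comments #18 and #20 overwrite files in /lib/firmware/amdgpu/ and rebuild the initrd, after which the machine froze. A minimal sketch, assuming the downloaded *.bin files sit in one directory: it saves every file it is about to overwrite, so the distribution firmware can be put back. Directory names and function names are illustrative only.

```shell
#!/bin/sh
# Sketch: install downloaded firmware files, keeping a backup of every file
# that gets overwritten so the previous firmware can be put back.
# Usage: backup_and_install_fw SRC_DIR BACKUP_DIR [FW_DIR]
# FW_DIR defaults to /lib/firmware/amdgpu (writing there needs root).
# Use a fresh BACKUP_DIR each time, or a second run saves the new files.
backup_and_install_fw() {
    src_dir="$1"
    backup_dir="$2"
    fw_dir="${3:-/lib/firmware/amdgpu}"
    mkdir -p "$backup_dir" || return 1
    for new in "$src_dir"/*.bin; do
        [ -e "$new" ] || continue          # glob matched nothing
        name=$(basename "$new")
        if [ -e "$fw_dir/$name" ]; then
            cp -p "$fw_dir/$name" "$backup_dir/$name" || return 1
        fi
        cp "$new" "$fw_dir/$name" || return 1
    done
    return 0
}

# Copy the saved files back. Files that did not exist before the update
# are not removed by this sketch.
restore_fw() {
    backup_dir="$1"
    fw_dir="${2:-/lib/firmware/amdgpu}"
    cp -p "$backup_dir"/*.bin "$fw_dir"/
}

# After installing or restoring, rebuild the initrd (comment #18 used
# dracut; `dracut -f` regenerates it for the running kernel) so that an
# amdgpu module loaded from the initrd sees the same firmware.
```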
Comment 21 Reimar Imhof 2017-02-12 16:12:25 UTC
At
https://bugzilla.kernel.org/show_bug.cgi?id=193651#c8
I found the hint about drm-next-4.11-wip.
I built that kernel, but it still had the same bug.
Comment 22 Reimar Imhof 2017-02-12 16:17:32 UTC
I also built a 4.8 kernel with the 4.9 DRM code:
git clone -b drm-next-4.9-wip git://people.freedesktop.org/~agd5f/linux freedektop-drm-next-4.9-wip
cd freedektop-drm-next-4.9-wip
make rpm

That kernel showed no problems.
Comment 23 Reimar Imhof 2017-02-12 16:30:46 UTC
Since freedektop-drm-next-4.9-wip was OK, I built some pre-4.9 kernels:
(repository: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)

7af8a0f8088831428051976cb06cc1e450f8bab5
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
--> ok

d7a0dab82fef61bebd34f2bbb9314b075153b646
Merge branch 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
--> ok

e606d81d2d9596ab2b4fd0dc052eea0485b7e8c2
Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
--> ok

af79ad2b1f337a00aa150b993635b10bc68dc842
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
--> bug, flickers

So af79ad2b1f337a00aa150b993635b10bc68dc842 turns out to be the first buggy commit.
Pull scheduler changes from Ingo Molnar:
     "The main changes are:
    
       - irqtime accounting cleanups and enhancements. (Frederic Weisbecker)
    
       - schedstat debugging enhancements, make it more broadly runtime
         available. (Josh Poimboeuf)
    
       - More work on asymmetric topology/capacity scheduling. (Morten
         Rasmussen)
    
       - sched/wait fixes and cleanups. (Oleg Nesterov)
    
       - PELT (per entity load tracking) improvements. (Peter Zijlstra)
    
       - Rewrite and enhance select_idle_siblings(). (Peter Zijlstra)
    
       - sched/numa enhancements/fixes (Rik van Riel)
    
       - sched/cputime scalability improvements (Stanislaw Gruszka)
    
       - Load calculation arithmetics fixes. (Dietmar Eggemann)
    
       - sched/deadline enhancements (Tommaso Cucinotta)
    
       - Fix utilization accounting when switching to the SCHED_NORMAL
         policy. (Vincent Guittot)
    
       - ... plus misc cleanups and enhancements"

Do these scheduler changes somehow break amdgpu?
Comment 24 Reimar Imhof 2017-02-12 19:21:08 UTC
I forgot to mention: under high CPU load (about 100%, for example while building a kernel), the 4.8 kernel also flickers and shows artifacts from the background when resizing a window.
The 4.9 kernel, however, shows this bug even at very low CPU load, when the machine is idle (see comment #5).
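Since the 4.8 kernel also flickers at about 100% CPU load, that load can serve as a reproduction aid. A sketch for creating it on demand without building a kernel, assuming `yes` and `getconf _NPROCESSORS_ONLN` are available: it starts one busy process per online CPU (or a given number) for a fixed number of seconds while you resize windows or open the Firefox drop-down. `cpu_load_for` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: keep WORKERS busy processes running for SECONDS seconds.
# Usage: cpu_load_for SECONDS [WORKERS]  (WORKERS defaults to online CPUs)
cpu_load_for() {
    seconds="$1"
    workers="${2:-$(getconf _NPROCESSORS_ONLN)}"
    pids=""
    i=0
    while [ "$i" -lt "$workers" ]; do
        yes > /dev/null &              # one CPU-bound process
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$seconds"
    kill $pids 2>/dev/null             # word splitting on the PID list is intended
    wait $pids 2>/dev/null
    return 0
}

# Example: full load for two minutes while testing
#   cpu_load_for 120
```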
Comment 25 Michel Dänzer 2017-02-13 03:10:32 UTC
(In reply to Reimar Imhof from comment #23)
> 
> Do these scheduler changes somehow break amdgpu?

That's unlikely. See comment #15.
Comment 26 Reimar Imhof 2017-02-13 18:55:47 UTC
Regarding comment #15: the commits were tested properly.

Taken together with comment #24, there is a rendering bug in kernel 4.8 that shows up at 100% CPU load.
With kernel 4.9, the same bug shows up at 0% (idle) CPU load.

With
af79ad2b1f337a00aa150b993635b10bc68dc842
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
it changed from "bug shows at 100% load" to "bug shows at 0% load", and that change is about the scheduler.
To me this seems likely.

In summary, the word "regression" might be wrong.
Comment 27 Michel Dänzer 2017-02-14 02:10:28 UTC
(In reply to Reimar Imhof from comment #26)
> Together with comment #24 there is a render bug in kernel 4.8 that shows up
> at 100% cpu load.
> With kernel 4.9 this same bug shows up at 0% / idle cpu load.
> 
> With 
> af79ad2b1f337a00aa150b993635b10bc68dc842
> Merge branch 'sched-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> it changed from "bug shows at 100% load" to "bug shows at 0% load". And the
> change is something about the scheduler.
> To me this seems likely.

Not really. That commit is a pure merge commit, which makes it unlikely that it behaves any differently from either of its parent commits. So git bisect should have identified one of its ancestor commits instead. The fact that it identified a pure merge commit indicates that the result is incorrect, most likely because at least one commit along the way was incorrectly classified as good (or bad).
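When testing only merge commits points at a merge M (here af79ad2b1f337a00aa150b993635b10bc68dc842), the next step is to bisect the commits M brought in. `M^1` is the merge's first parent, so `M^1..M` is that set, and `git bisect start M M^1` marks M bad and its first parent good. This sketch assumes the first parent is the last good merge tested in comment #23 (check with `git rev-parse M^1`); the helper names are invented.

```shell
#!/bin/sh
# Sketch: bisect inside a merge that a merge-only walk identified.
# "$1^1" is the merge's first parent (the mainline side), so the range
# "$1^1..$1" holds the commits that the merge brought in.

# List the non-merge commits introduced by merge commit $1.
commits_introduced_by_merge() {
    git rev-list --no-merges "$1^1..$1"
}

# Start a bisection with the merge as bad and its first parent as good.
# Assumes the first parent really is good, e.g. the last good merge tested.
start_merge_bisect() {
    git bisect start "$1" "$1^1"
}

# Example (in a kernel checkout):
#   git rev-parse af79ad2b1f337a00aa150b993635b10bc68dc842^1
#   commits_introduced_by_merge af79ad2b1f337a00aa150b993635b10bc68dc842 | wc -l
#   start_merge_bisect af79ad2b1f337a00aa150b993635b10bc68dc842
```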
Comment 28 alvarex 2017-02-14 11:55:46 UTC
Created attachment 129582
artifacts on radeon rx460
Comment 29 alvarex 2017-02-14 11:57:21 UTC
Hi, I think I've run into the same issue, though I'm not quite sure. Firefox and other elements sometimes show artifacts, and, as Reimar says, with kernel 4.8 it doesn't happen.
RX 460 on openSUSE 42.2.
It seems like some sort of memory corruption; my worst fear is that the hardware is defective. What brand is your RX 460? I have an RX 460 Gigabyte Windforce OC edition.

I've attached a screenshot.
Comment 30 alvarex 2017-02-14 12:02:07 UTC
It happens very fast, for a fraction of a second, and then it draws correctly. I had to record the desktop and play the video in slow motion to take the screenshot.
And it's random: sometimes it won't happen for a long period, and sometimes clicking every drop-down menu will trigger it.
Comment 31 alvarex 2017-02-14 14:05:39 UTC
I will try git bisect, but the last time I tried bisecting the kernel I failed miserably; it failed to compile several times.
Comment 32 alvarex 2017-02-14 15:29:17 UTC
I'm sorry, but I can't find a consistent way of reproducing the bug. I assumed the bug would still be there with 4.9, but right now, no matter how hard I try, I cannot reproduce it with 4.9.4, 4.9.9 or 4.9-rc1. I think that in my case it is a hardware problem, possibly unrelated to Reimar's bug. It seems to me that the artifacting varies with load or temperature, and that maybe some kernel versions mitigate the problem so it goes unnoticed while others aggravate it (IMHO).
I really don't know what the cause is in my case.
Comment 33 Reimar Imhof 2017-02-14 19:32:16 UTC
(In reply to Michel Dänzer from comment #27)
> (In reply to Reimar Imhof from comment #26)
> > Together with comment #24 there is a render bug in kernel 4.8 that shows up
> > at 100% cpu load.
> > With kernel 4.9 this same bug shows up at 0% / idle cpu load.
> > 
> > With 
> > af79ad2b1f337a00aa150b993635b10bc68dc842
> > Merge branch 'sched-core-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > it changed from "bug shows at 100% load" to "bug shows at 0% load". And the
> > change is something about the scheduler.
> > To me this seems likely.
> 
> Not really. That commit is a pure merge commit, which makes it unlikely that
> it behaves any differently from either of its parent commits. So git bisect
> should have identified one of its ancestor commits instead. The fact that it
> identified a pure merge commit indicates that the result is incorrect, most
> likely because at least one commit along the way was incorrectly classified
> as good (or bad).

I forgot to mention:
I only looked at the first few merge commits. I did _not_ run "git bisect"; instead, for each merge I ran, for example, "git reset --hard 7af8a0f8088831428051976cb06cc1e450f8bab5" followed by "make rpm" to build "Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux".
"e606d81d2d9596ab2b4fd0dc052eea0485b7e8c2
Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip" was a good commit: no problems at idle CPU, as described in comment #23.
It was followed by "af79ad2b1f337a00aa150b993635b10bc68dc842
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip", which turned out to be the first bad commit (glitches at 0% CPU load).
So all tested commits were pure merge commits.
Comment 34 Reimar Imhof 2017-02-22 21:25:51 UTC
SUSE provided a new kernel-firmware package:
kernel-firmware-20170217-35.1.
It includes amdgpu firmware.
With this firmware, the computer freezes while starting KDE Plasma 5.
Same bug as reported in comment #18.

I also tried the latest firmware from
https://people.freedesktop.org/~agd5f/radeon_ucode/polaris/
This did _not_ help in any way. The system still freezes.

I tested this with kernel 4.10.0 from
http://download.opensuse.org/repositories/Kernel:/stable/standard/
Comment 35 Reimar Imhof 2017-02-22 21:26:11 UTC
The flickering bug is still present with kernel 4.10.0.
Comment 36 Jan Vlug 2018-03-24 11:20:38 UTC
see also https://gitlab.gnome.org/GNOME/mutter/issues/22
Comment 37 Ivan Chebykin 2018-12-19 20:07:48 UTC
This looks related to https://bugs.freedesktop.org/show_bug.cgi?id=105910
Comment 38 Martin Peres 2019-11-19 08:12:24 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/119.

