Bug 110510 - Radeon VII HDMI issues: Flicking/system crashing
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2019-04-24 22:54 UTC by Tom B
Modified: 2019-11-19 09:19 UTC

Attachments
dmesg after setting HDMI screen to 59.94hz (14.72 KB, text/plain)
2019-04-24 22:54 UTC, Tom B

Description Tom B 2019-04-24 22:54:50 UTC
Created attachment 144089
dmesg after setting HDMI screen to 59.94hz

I have a Radeon VII with two 4k displays, one connected via HDMI and one connected via DisplayPort. Neither monitor supports FreeSync.

Linux 5.0.9 
Mesa 19.0.3

The DisplayPort monitor is working fine.

The HDMI monitor has some problems. On its own I can get it working perfectly. Here's the output from xrandr:

DisplayPort-2 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
   3840x2160     60.00*+  30.00    30.00    24.00    29.97    23.98  
   1920x1200     60.00  
   1920x1080     60.00    50.00    59.94    30.00    24.00    29.97    23.98  
   1600x1200     60.00  
   1680x1050     59.95  
   1280x1024     75.02    60.02  
   1440x900      59.89  
   1280x960      60.00  
   1280x800      59.81  
   1152x864      75.00  
   1280x768      59.87  
   1280x720      60.00    50.00    59.94  
   1024x768      75.03    70.07    60.00  
   832x624       74.55  
   800x600       72.19    75.00    60.32    56.25  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       75.00    72.81    66.67    60.00    59.94  
   720x400       70.08  
HDMI-A-0 connected 3840x2160+3840+0 (normal left inverted right x axis y axis) 521mm x 293mm
   3840x2160     60.00*+  60.00    50.00    59.94    30.00    30.00    24.00    29.97    23.98  
   1920x1200     60.00  
   1920x1080     60.00    50.00    59.94    30.00    24.00    29.97    23.98  
   1600x1200     60.00  
   1680x1050     59.88  
   1280x1024     75.02    60.02  
   1440x900      59.90  
   1280x960      60.00  
   1280x800      59.91  
   1152x864      75.00  
   1280x768      59.87  
   1280x720      60.00    50.00    59.94  
   1024x768      75.03    70.07    60.00  
   832x624       74.55  
   800x600       72.19    75.00    60.32    56.25  
   720x576       50.00  
   720x480       60.00    59.94  
   640x480       75.00    72.81    66.67    60.00    59.94  
   720x400       70.08  


When it's connected, it shows two problems intermittently:

1. It flickers occasionally with visual artifacts
2. The screen goes black for a few seconds then comes back


With just the HDMI screen connected this can be solved. If I set the refresh rate to 59.94 the problems go away and everything is working flawlessly. 
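
For reference, one way to apply that refresh rate from a terminal is with xrandr. This is only a sketch: the output name HDMI-A-0 and the mode 3840x2160 are taken from the listing above, while the set_refresh helper and its DRY_RUN switch are made up for illustration.

```shell
# Switch one output to a given mode and refresh rate using xrandr.
# With DRY_RUN set to a non-empty value, the command is printed instead of run.
set_refresh() {
    # $1 = output name, $2 = mode, $3 = refresh rate in Hz
    ${DRY_RUN:+echo} xrandr --output "$1" --mode "$2" --rate "$3"
}
```

For example, `set_refresh HDMI-A-0 3840x2160 59.94` would request the 59.94hz mode on the HDMI screen.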

However, as soon as I connect the second monitor via DisplayPort, this fix no longer works.

With both monitors connected and both running at 60.0hz, the DisplayPort screen is fine but the HDMI screen flickers. An easy fix, I thought: run the HDMI screen at 59.94hz and the flicker will go away.


Unfortunately, what actually happens is the entire session freezes after a few seconds on both X and Wayland. Usually it requires a complete system reset but I have managed to recover from it a couple of times and I've attached my dmesg output.

Sometimes the freeze is instant, other times I get up to 10 seconds before the system freezes.


I also tried 50hz on the HDMI monitor and the same thing happened.

There appear to be two different issues here:


1. HDMI flickers 
2. Running two monitors at different refresh rates causes the driver to crash
Comment 1 Tom B 2019-04-24 23:07:34 UTC
This may be relevant: The crash does not happen if I run the HDMI monitor at 30hz. This means the monitors can be run at different refresh rates. 

The DisplayPort monitor does not support 59.94hz or 50hz. My conclusion is that the crash occurs if the HDMI monitor is set to a refresh rate that the DisplayPort monitor does not support. 


And that explains why the HDMI monitor works on its own. With only the HDMI monitor connected, all the connected displays support 59.94 so everything is fine.
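
That conclusion can be checked against the xrandr listing in the description by comparing the rates each output advertises at 3840x2160. The following is a rough sketch, not a polished tool: the function names are made up, and it assumes the listing format shown above (unindented output lines followed by indented mode lines).

```shell
# Print the refresh rates listed for one output at one mode, one per line.
# Reads xrandr text on stdin.  $1 = output name, $2 = mode.
rates_for() {
    awk -v out="$1" -v mode="$2" '
        /^[^ ]/ { current = $1; next }      # unindented line starts a new section
        current == out && $1 == mode {      # mode line of the wanted output
            for (i = 2; i <= NF; i++) {
                rate = $i
                gsub(/[*+]/, "", rate)      # drop the current/preferred markers
                if (rate != "") print rate
            }
        }
    ' | sort -u
}

# Print the rates the first output lists but the second does not.
# $1 = xrandr text, $2 = first output, $3 = second output, $4 = mode.
rates_only_on_first() {
    second_rates=$(printf '%s\n' "$1" | rates_for "$3" "$4")
    printf '%s\n' "$1" | rates_for "$2" "$4" | while read -r rate; do
        printf '%s\n' "$second_rates" | grep -qxF "$rate" || echo "$rate"
    done
}
```

Running `rates_only_on_first "$(xrandr)" HDMI-A-0 DisplayPort-2 3840x2160` against the listing in the description prints 50.00 and 59.94, the two rates that trigger the freeze.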


I did try adding 3840x2160 59.94 to the DisplayPort monitor via xrandr to see if I could run both displays at 59.94. Unfortunately, it does not seem to support it and I just get a black screen.
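
The report does not say which commands were used to add the mode. A common approach, sketched here purely as an assumption, is to generate a modeline with cvt and register it with xrandr --newmode and --addmode. The helper names modeline_args and add_custom_mode are hypothetical.

```shell
# Turn the "Modeline" line printed by cvt into arguments for xrandr --newmode:
# strip the leading keyword and the quotes around the mode name.
modeline_args() {
    sed -n 's/^ *Modeline *//p' | tr -d '"'
}

# Create a custom mode and attach it to an output.
# $1 = output name, $2 = width, $3 = height, $4 = refresh rate in Hz
add_custom_mode() {
    args=$(cvt "$2" "$3" "$4" | modeline_args)
    name=${args%% *}            # first word is the mode name
    # $args is left unquoted on purpose so it splits into separate arguments.
    xrandr --newmode $args
    xrandr --addmode "$1" "$name"
}
```

With these helpers, `add_custom_mode DisplayPort-2 3840 2160 59.94` would correspond to the attempt described above. Whether the monitor accepts the resulting timings is a separate question; in this report it did not.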
Comment 2 Tom B 2019-04-24 23:21:07 UTC
After freezing the system, here's the output from journalctl --boot=-1 | grep amdgpu

Apr 25 00:10:18 desktop kernel: [drm] amdgpu kernel modesetting enabled.
Apr 25 00:10:18 desktop kernel: fb0: switching to amdgpudrmfb from EFI VGA
Apr 25 00:10:18 desktop kernel: amdgpu 0000:44:00.0: No more image in the PCI ROM
Apr 25 00:10:18 desktop kernel: amdgpu 0000:44:00.0: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
Apr 25 00:10:18 desktop kernel: amdgpu 0000:44:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
Apr 25 00:10:18 desktop kernel: amdgpu 0000:44:00.0: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
Apr 25 00:10:18 desktop kernel: [drm] amdgpu: 16368M of VRAM memory ready
Apr 25 00:10:18 desktop kernel: [drm] amdgpu: 16368M of GTT memory ready.
Apr 25 00:10:18 desktop kernel: amdgpu 0000:44:00.0: Direct firmware load for amdgpu/vega20_ta.bin failed with error -2
Apr 25 00:10:18 desktop kernel: amdgpu 0000:44:00.0: psp v11.0: Failed to load firmware "amdgpu/vega20_ta.bin"
Apr 25 00:10:19 desktop kernel: fbcon: amdgpudrmfb (fb0) is primary device
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: fb0: amdgpudrmfb frame buffer device
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring gfx uses VM inv eng 0 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring sdma0 uses VM inv eng 0 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring page0 uses VM inv eng 1 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring sdma1 uses VM inv eng 4 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring page1 uses VM inv eng 5 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring vce0 uses VM inv eng 12 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring vce1 uses VM inv eng 13 on hub 1
Apr 25 00:10:19 desktop kernel: amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
Apr 25 00:10:19 desktop kernel: [drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0
Apr 25 00:11:10 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=590, emitted seq=591
Apr 25 00:11:10 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page0 timeout, signaled seq=2654, emitted seq=2656
Apr 25 00:11:10 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=23, emitted seq=25
Apr 25 00:11:10 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout, signaled seq=1051, emitted seq=1053
Apr 25 00:11:10 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Apr 25 00:11:10 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasmashell pid 1388 thread plasmashel:cs0 pid 1666
Apr 25 00:11:10 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Apr 25 00:11:10 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_x11 pid 1379 thread kwin_x11:cs0 pid 1615
Apr 25 00:11:10 desktop kernel: amdgpu 0000:44:00.0: GPU reset begin!
Apr 25 00:11:10 desktop kernel: amdgpu 0000:44:00.0: GPU reset begin!
Apr 25 00:11:10 desktop kernel: amdgpu 0000:44:00.0: GPU reset begin!
Apr 25 00:11:10 desktop kernel: amdgpu 0000:44:00.0: GPU reset begin!
Apr 25 00:11:21 desktop kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Apr 25 00:11:31 desktop kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:49:crtc-1] hw_done or flip_done timed out
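
To pick the ring timeouts out of a log like this one, a small filter can help. This sketch assumes the message format shown above; ring_timeouts is a made-up helper name.

```shell
# Print "ring signaled emitted" for every ring-timeout line read from stdin.
# Lines that do not match the amdgpu_job_timedout ring format are dropped.
ring_timeouts() {
    sed -n 's/.*\*ERROR\* ring \([^ ]*\) timeout, signaled seq=\([0-9]*\), emitted seq=\([0-9]*\).*/\1 \2 \3/p'
}
```

Used as `journalctl --boot=-1 | ring_timeouts`, it reduces the first timeout above to `sdma0 590 591`, which makes it easier to see that sdma0, page0, sdma1 and page1 all stalled in the same second.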
Comment 3 Tom B 2019-04-25 00:42:09 UTC
Further testing after reading this: https://bugs.freedesktop.org/show_bug.cgi?id=102646

If I run:

echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level

before setting the refresh rate to 59.94hz, it works perfectly! The system does not freeze at all (at least not in the 5 minutes since I tried it).


The second I ran 

echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level

X froze. 
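
For convenience, the two sysfs writes above could be wrapped in small helpers. This is a sketch only: the function names and the DPM_NODE variable are invented here, card0 is assumed to be the Radeon VII, and only the two levels used in this report are accepted.

```shell
# sysfs node from the commands above; override DPM_NODE if the GPU is not card0.
DPM_NODE=${DPM_NODE:-/sys/class/drm/card0/device/power_dpm_force_performance_level}

# Show the current performance level.
get_dpm_level() {
    cat "$DPM_NODE"
}

# Set the performance level. Writing to the real node needs root.
set_dpm_level() {
    case $1 in
        auto|high) printf '%s\n' "$1" > "$DPM_NODE" ;;
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
}
```

Calling `set_dpm_level high` before changing the refresh rate, and checking the result with `get_dpm_level`, mirrors the sequence that avoided the freeze here.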


A rather annoying chain of bugs:

1. I have to set the refresh rate to 59.94hz to stop the screen flickering (Don't really care about this, I'd happily do this if it was the only issue)

2. It works perfectly with a single display

3. With multiple displays, setting the refresh rate to 59.94 causes the system to freeze and I have to do a hard reset

4. Points 2 and 3 are only relevant if power_dpm_force_performance_level is set to auto.


This is an acceptable compromise for now. The additional temperature/power usage is far more palatable than the flickering screen or system freezes.
Comment 4 Tom B 2019-04-25 13:42:58 UTC
Unfortunately this workaround does not work.

As soon as I launch a game on either monitor the whole system freezes despite the performance setting.

Nothing appeared in journalctl this time but it seems like the same kind of freeze.
Comment 5 Tom B 2019-05-01 19:28:22 UTC
Having swapped my HDMI monitor to use DisplayPort, everything is now perfectly fine. Both monitors work entirely as expected, no flickering or system freezes.
Comment 6 Martin Peres 2019-11-19 09:19:48 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/757.

