103370 – `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

Status:	RESOLVED FIXED

Product:	DRI
Classification:	Unclassified
Component:	DRM/Radeon
Version:	XOrg git
Hardware:	Other All

Importance:	medium normal
Assignee:	Default DRI bug account

Reported:	2017-10-20 09:16 UTC by Shih-Yuan Lee
Modified:	2018-03-15 06:45 UTC
CC List:	5 users

Attachments
dmesg by drm.debug=0xe (313.87 KB, text/plain) 2017-10-20 11:04 UTC, Shih-Yuan Lee
Xorg.0.log (42.38 KB, text/plain) 2017-10-25 04:31 UTC, Shih-Yuan Lee
blacklist radeon dmesg (127.60 KB, text/plain) 2017-10-26 04:14 UTC, Shih-Yuan Lee
blacklist radeon Xorg.0.log (39.47 KB, text/plain) 2017-10-26 04:15 UTC, Shih-Yuan Lee
attachment-12667-0.html (2.28 KB, text/html) 2017-10-26 08:41 UTC, Mike Lothian
workaround for radeon (1.03 KB, patch) 2017-11-21 17:13 UTC, Alex Deucher
workaround for amdgpu (1.05 KB, patch) 2017-11-21 17:13 UTC, Alex Deucher
dmesg (118.09 KB, text/plain) 2017-11-22 08:59 UTC, Shih-Yuan Lee

Description Shih-Yuan Lee 2017-10-20 09:16:14 UTC

While testing with AC power plugged in, running `DRI_PRIME=1 glxgears -info` and `DRI_PRIME=0 glxgears -info`, the system halts and is then forced to shut down automatically.
I tried mainline kernels from 4.10rc7 to v4.14rc5, and they all have the same problem.

Comment 1 Michel Dänzer 2017-10-20 09:19:55 UTC

Please attach the corresponding dmesg output.

(In reply to Shih-Yuan Lee from comment #0)
> While I am doing the tests with AC plugged in by `DRI_PRIME=1 glxgears
> -info` and `DRI_PRIME=0 glxgears -info`, the system halts and then is forced
> to shutdown automatically.

To clarify: DRI_PRIME=1 glxgears works, and the problem only occurs with DRI_PRIME=0 glxgears?

Comment 2 Shih-Yuan Lee 2017-10-20 09:20:57 UTC

If I execute the command on battery power, the system does not halt.
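
Since the hang depends on whether AC power is connected, it can help to record the power source next to each test run. The following is only a minimal sketch, assuming the laptop exposes its adapter through the standard Linux `/sys/class/power_supply` sysfs class with a `type` of `Mains`. The function name `power_source` and the `ac` / `not-ac` labels are made up for illustration.

```shell
# Print "ac" if any mains adapter under the given sysfs directory reports
# online=1, otherwise print "not-ac".
# $1: directory to scan (defaults to /sys/class/power_supply).
power_source() {
    local root dir
    root=${1:-/sys/class/power_supply}
    for dir in "$root"/*; do
        # Skip entries that lack the two attributes we need.
        [ -r "$dir/type" ] && [ -r "$dir/online" ] || continue
        if [ "$(cat "$dir/type")" = "Mains" ] && [ "$(cat "$dir/online")" = "1" ]; then
            echo ac
            return 0
        fi
    done
    echo not-ac
}
```

Calling `power_source` without arguments before `DRI_PRIME=1 glxgears -info` would then show which of the two cases a given run belongs to.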

Comment 3 Shih-Yuan Lee 2017-10-20 09:26:27 UTC

Using DRI_PRIME=0 is just to switch between Intel and AMD Graphics.
Of course, we can omit it completely.

The system halt issue happens when executing `DRI_PRIME=1 glxgears -info`.

u@u:~$ DRI_PRIME=1 glxgears -info
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
GL_RENDERER   = Gallium 0.4 on AMD HAINAN (DRM 2.46.0, LLVM 3.8.0)
GL_VERSION    = 3.0 Mesa 11.2.0
GL_VENDOR     = X.Org
...
u@u:~$ DRI_PRIME=0 glxgears -info
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
GL_RENDERER   = Mesa DRI Intel(R) Kabylake GT1.5 
GL_VERSION    = 3.0 Mesa 11.2.0
GL_VENDOR     = Intel Open Source Technology Center
...

Comment 4 Shih-Yuan Lee 2017-10-20 09:33:24 UTC

(In reply to Shih-Yuan Lee from comment #3)

Sorry, I pasted the wrong logs. GL_VERSION should be "3.0 Mesa 17.0.7" instead.

Comment 5 Shih-Yuan Lee 2017-10-20 09:55:47 UTC

There is no problem on battery power. On AC power the system halts, which makes it impossible to collect any dmesg output.

$ DRI_PRIME=1 glxgears -info
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
GL_RENDERER   = Gallium 0.4 on AMD HAINAN (DRM 2.50.0 / 4.14.0-041400rc5-generic, LLVM 4.0.0)
GL_VERSION    = 3.0 Mesa 17.0.7
GL_VENDOR     = X.Org
...

$ DRI_PRIME=0 glxgears -info
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
GL_RENDERER   = Mesa DRI Intel(R) Kabylake GT1.5
GL_VERSION    = 3.0 Mesa 17.0.7
GL_VENDOR     = Intel Open Source Technology Center
...

Comment 6 Michel Dänzer 2017-10-20 10:01:42 UTC

(In reply to Shih-Yuan Lee from comment #5)
> There is no problem under the battery mode, and because the system halts
> that makes unable to collect any dmesg.

Please attach dmesg captured in battery mode or before the problem occurs.

Comment 7 Shih-Yuan Lee 2017-10-20 11:04:52 UTC

Created attachment 134936
dmesg by drm.debug=0xe

These are the messages from just before the system halt.

Comment 8 Mike Lothian 2017-10-20 15:54:01 UTC

Hmm, I notice these errors:

[    2.050887] [drm:radeon_acpi_init [radeon]] Call to ATCS verify_interface failed: -5
[    2.050994] [drm:radeon_acpi_init [radeon]] Call to ATIF verify_interface failed: -5

which I think are ACPI calls. It might be worth checking that your BIOS/EFI is up to date, and if that doesn't fix things, maybe experiment with the acpi_osi= options.

Comment 9 Shih-Yuan Lee 2017-10-24 10:12:02 UTC

I have tried acpi_osi="Windows 2009", "Windows 2012", "Windows 2013" and "Windows 2015" on the latest mainline kernel 4.14rc6; all of them show the same errors and the system still halts.
The BIOS is also up to date.

Comment 10 Mike Lothian 2017-10-24 10:21:52 UTC

Did this ever work for you?

Comment 11 Shih-Yuan Lee 2017-10-24 10:30:35 UTC

(In reply to Mike Lothian from comment #10)
> Did this ever work for you?

What do you mean by this?

Comment 12 Shih-Yuan Lee 2017-10-24 10:32:50 UTC

BTW, this is a new Dell laptop that is still in development.

Comment 13 Mike Lothian 2017-10-24 10:44:42 UTC

I meant: is this a regression, i.e. did it use to work with an older kernel or Mesa? If it's a new system, perhaps not.

Comment 14 Shih-Yuan Lee 2017-10-24 11:03:31 UTC

Yup, this is a new system. `DRI_PRIME=1 glxgears` never worked properly before.

Comment 15 Mike Lothian 2017-10-24 11:37:30 UTC

Does anything change when you boot the system with radeon.runpm=0? This means the card never powers down.

What distro are you running?

You mention trying older kernel versions; did you try older Mesa versions too?

Can you attach your Xorg.0.log too?
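
To confirm that a parameter such as radeon.runpm=0 actually reached the kernel, the boot command line can be inspected. This is a rough sketch only: `cmdline_param` is a hypothetical helper, and it does not handle quoted values that contain spaces (such as acpi_osi="Windows 2015").

```shell
# Print the value of kernel parameter $1 as found in the command line
# string $2 (defaults to the contents of /proc/cmdline).
# If the parameter appears more than once, the last occurrence is printed.
# Returns non-zero and prints nothing when the parameter is absent.
cmdline_param() {
    local name line word value found
    name=$1
    line=${2-$(cat /proc/cmdline)}
    found=no
    for word in $line; do
        case $word in
            "$name"=*)
                value=${word#*=}
                found=yes
                ;;
        esac
    done
    if [ "$found" = yes ]; then
        printf '%s\n' "$value"
    else
        return 1
    fi
}
```

For example, `cmdline_param radeon.runpm` should print `0` after booting with radeon.runpm=0. When the radeon module is loaded, the effective value is normally also readable from `/sys/module/radeon/parameters/runpm`.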

Comment 16 Mike Lothian 2017-10-24 11:38:51 UTC

Do you also see the issue with amdgpu rather than using the radeon kernel driver?

Comment 17 Shih-Yuan Lee 2017-10-25 04:31:56 UTC

Created attachment 135027
Xorg.0.log

(In reply to Mike Lothian from comment #15)
> Are there any changes when you boot the system with radeon.runpm=0, this
> will mean the card never powers down
> 
> What distro are you running?
> 
> You mention trying older kernel version, did you try older mesa versions too?
> 
> Can you attach your Xorg.0.log too

radeon.runpm=0 doesn't make any difference.

I am running Ubuntu 16.04 LTS, which used Linux kernel 4.4 and Mesa 11.2.0 before I upgraded the system.
After the upgrade, it uses Mesa 17.0.7 instead.

Comment 18 Shih-Yuan Lee 2017-10-25 04:54:29 UTC

(In reply to Mike Lothian from comment #16)
> Do you also see the issue with amdgpu rather than using the radeon kernel
> driver?

amdgpu does not work on this AMD GPU with the kernel parameters "amdgpu.si_support=1 radeon.si_support=0" on Linux kernel 4.14rc6.
The X Window System cannot start.

01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Jet PRO [Radeon R5 M230] [1002:6665] (rev c3)
        Subsystem: Dell Jet PRO [Radeon R5 M230] [1028:0844]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 129
        Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at d0000000 (64-bit, non-prefetchable) [size=256K]
        Region 4: I/O ports at e000 [size=256]
        Expansion ROM at d0040000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: radeon
        Kernel modules: radeon, amdgpu

Comment 19 Mike Lothian 2017-10-25 22:23:46 UTC

You have to blacklist radeon to use amdgpu, as both modules try to claim the device.
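
One common way to blacklist a module is a `blacklist` line in a file under `/etc/modprobe.d/`. The sketch below only writes such a file; the helper name `write_blacklist` and the file name used in the usage note are assumptions for illustration, not something taken from this report.

```shell
# Write a modprobe.d snippet that blacklists kernel module $1 into file $2.
# The target file is overwritten if it already exists.
write_blacklist() {
    local module conf
    module=$1
    conf=$2
    {
        printf '# Keep %s from auto-loading so another driver can claim the device\n' "$module"
        printf 'blacklist %s\n' "$module"
    } > "$conf"
}
```

Run as root, `write_blacklist radeon /etc/modprobe.d/blacklist-radeon.conf` would create the snippet. On Ubuntu the initramfs may also need regenerating (`update-initramfs -u`) if the module is loaded from there. For this SI-generation chip, blacklisting alone is not sufficient: as the kernel log in comment 22 shows, amdgpu additionally needs `radeon.si_support=0 amdgpu.si_support=1` on the kernel command line.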

Comment 20 Shih-Yuan Lee 2017-10-26 04:14:34 UTC

Created attachment 135048
blacklist radeon dmesg

Comment 21 Shih-Yuan Lee 2017-10-26 04:15:03 UTC

Created attachment 135049
blacklist radeon Xorg.0.log

Comment 22 Shih-Yuan Lee 2017-10-26 04:17:26 UTC

(In reply to Mike Lothian from comment #19)
> You have to blacklist radeon to use amdgpu as both modules try and claim the
> device

After blacklisting radeon, `xrandr --listproviders` shows no AMD graphics provider.

[    1.937326] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[    1.937633] amdgpu 0000:01:00.0: SI support provided by radeon.
[    1.937635] amdgpu 0000:01:00.0: Use radeon.si_support=0 amdgpu.si_support=1 to override.

When I use 'radeon.si_support=0 amdgpu.si_support=1', the X Window System cannot start.

Comment 23 Mike Lothian 2017-10-26 08:41:28 UTC

Created attachment 135059
attachment-12667-0.html

Can you show the dmesg and Xorg.0.log with radeon.si_support=0 amdgpu.si_support=1?

Comment 24 Shih-Yuan Lee 2017-11-15 04:43:34 UTC

With radeon.dpm=0, the issue does not occur.

Comment 25 Shih-Yuan Lee 2017-11-15 08:02:54 UTC

The issue does not occur with Mesa 11.2.0 on Ubuntu 16.04.
I found it with Mesa 17.0.7, and Mesa 17.2.4 has it as well.

Comment 26 Timo Aaltonen 2017-11-17 09:56:21 UTC

This was tested to have regressed between Mesa 12.0.3 and 12.0.5, and bisecting points to the following as the first bad commit:

commit d3d33918c79d9e87aedaf6f70ed39f75eed262a0
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Wed Aug 17 17:02:04 2016 +0900

    loader/dri3: Overhaul dri3_update_num_back

Comment 27 Michel Dänzer 2017-11-17 10:12:51 UTC

Thanks for bisecting, but I don't think that commit can be directly responsible for a GPU hang. Before that commit, the DRI3 code in Mesa would only use one back buffer for glxgears, which means that the GPU could only start rendering a new frame after the previous one had finished presenting. Maybe that somehow prevented the hang.

A possible test for this theory is running

 vblank_mode=0 DRI_PRIME=1 glxgears

with Mesa 12.0.3; does that also trigger the hang?

Comment 28 Shih-Yuan Lee 2017-11-17 11:11:54 UTC

`vblank_mode=0 DRI_PRIME=1 glxgears` also triggers the GPU lockup.
With radeon.dpm=0 the lockup does not happen, but there is tearing all the time.

Comment 29 Michel Dänzer 2017-11-17 11:14:00 UTC

Tearing is expected with vblank_mode=0.

Comment 30 Shih-Yuan Lee 2017-11-17 15:37:44 UTC

Tearing does not happen on battery power; it only happens when AC power is plugged in.
Is this behavior also expected?

Comment 31 Michel Dänzer 2017-11-17 15:39:29 UTC

With vblank_mode=0, the only thing that can prevent tearing is luck.

Comment 32 Timo Aaltonen 2017-11-20 07:56:22 UTC

Forwarding a comment from an engineer:

"While reading the source code of the radeon module, I found a bug [1] related to dpm and clocks, so I decided to run some experiments.
I tried setting different max_sclk and max_mclk values to see whether the issue goes away.
1. max_sclk: 70000, max_mclk: 75000 --> have the same issue
2. max_sclk: 50000, max_mclk: 60000 --> pass multi-run test (more than 50 runs)

[1] https://bugs.freedesktop.org/show_bug.cgi?id=76490
"

Comment 33 Alex Deucher 2017-11-20 15:41:11 UTC

(In reply to Michel Dänzer from comment #27)
> Thanks for bisecting, but I don't think that commit can be directly
> responsible for a GPU hang. Before that commit, the DRI3 code in Mesa would
> only use one back buffer for glxgears, which means that the GPU could only
> start rendering a new frame after the previous one had finished presenting.
> Maybe that somehow prevented the hang.

That commit "fixed" a performance regression at the time because it ended up causing enough of a delay that the clocks didn't ramp up.  So it probably exposed a kernel dpm issue.  Without it, the clocks never ramped up enough to cause an issue.  With it, they did.


(In reply to Timo Aaltonen from comment #32)
> forwarding a comment from an engineer:
> 
> "During viewing the source code of radeon module, I found there is a bug [1]
> related to the dpm and clocks. So I decided to do some experiments.
> Tried to set different max_sclk and max_mclk to see if the issue is gone.
> 1. max_sclk: 70000, max_mclk: 75000 --> have the same issue
> 2. max_sclk: 50000, max_mclk: 60000 --> pass multi-run test (more than 50
> runs)
> 
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=76490
> "

I think Sonny fixed this.  It was due to using the wrong firmware.
[    1.827060] [drm] initializing kernel modesetting (HAINAN 0x1002:0x6665 0x1028:0x0844 0xC3).

This chip should be using the radeon/banks_k_2_smc.bin SMC firmware.  Is that available on the test system and kernel?

Comment 34 Alex Deucher 2017-11-20 15:48:10 UTC

The following commits are relevant:
abb2e3c1ce64c8bba678973800c34ea1dc97c42c
6458bd4dfd9414cba5804eb9907fe2a824278c34
ef736d394e85b1bf1fd65ba5e5257b85f6c82325
4e6e98b1e48c9474aed7ce03025ec319b941e26e

Comment 35 Alex Deucher 2017-11-20 15:52:24 UTC

Does reverting a628392cf03e0eef21b345afbb192cbade041741 fix the issue?

Comment 36 Robert Liu 2017-11-21 08:05:53 UTC

(In reply to Alex Deucher from comment #33)
> I think Sonny fixed this.  It was due to using the wrong firmware.
> [    1.827060] [drm] initializing kernel modesetting (HAINAN 0x1002:0x6665
> 0x1028:0x0844 0xC3).  This chip should be using radeon/banks_k_2_smc.bin smc
> firmware.  Is that available on the test system and kernel?
The firmware radeon/banks_k_2_smc.bin is on the test system.
With Ubuntu kernel 4.4.0-101-generic, I am not sure whether the radeon driver is using this firmware.
With Ubuntu kernel 4.13.0-16-generic, I tried both the amdgpu and radeon drivers, but the system hung. As soon as the system hung, amdgpu_pm_info showed 'invalid dpm profile 15'.

(In reply to Alex Deucher from comment #34)
> The following commits are relevant:
> abb2e3c1ce64c8bba678973800c34ea1dc97c42c
> 6458bd4dfd9414cba5804eb9907fe2a824278c34
> ef736d394e85b1bf1fd65ba5e5257b85f6c82325
> 4e6e98b1e48c9474aed7ce03025ec319b941e26e
These commits should already be included in Ubuntu kernel 4.13.0-16-generic.

(In reply to Alex Deucher from comment #35)
> Does reverting a628392cf03e0eef21b345afbb192cbade041741 fix the issue?
Reverting this commit does not fix the issue.


BTW, with 4.13.0-16-generic, I changed max_sclk in drm/radeon/si_dpm.c (as we did with Ubuntu kernel 4.4.0-101-generic) from 75000 to 65000, but still hit the hang.

Comment 37 Robert Liu 2017-11-21 09:21:23 UTC

(In reply to Robert Liu from comment #36)
> BTW, with 4.13.0-16-generic, I change the max_sclk in drm/radeon/si_dpm.c
> (what we did with Ubuntu kernel 4.4.0-101-generic) from 75000 to 65000, but
> still met the hang issue.
With max_sclk restricted to 65000 and max_mclk to 80000, neither radeon nor amdgpu shows the issue.
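
For readers comparing the values tried in this thread: the max_sclk and max_mclk numbers are clock limits in units of 10 kHz, which matches the fix commit quoted at the end of this bug (it sets max_sclk = 60000 and describes that as clamping sclk to 600 MHz). A small conversion sketch, with the hypothetical helper names `clk_to_mhz` and `print_clk_table`:

```shell
# Convert a dpm clock value given in 10 kHz units to whole MHz.
clk_to_mhz() {
    echo $(( $1 / 100 ))
}

# Print one "value -> MHz" line per argument.
print_clk_table() {
    local v
    for v in "$@"; do
        printf '%6d -> %d MHz\n' "$v" "$(clk_to_mhz "$v")"
    done
}
```

For instance, `print_clk_table 75000 65000 60000` lists the sclk limits mentioned in comments 36, 37 and 41 as 750, 650 and 600 MHz.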

Comment 38 Alex Deucher 2017-11-21 17:13:11 UTC

Created attachment 135647
workaround for radeon

Workarounds for radeon and amdgpu to fix the issue.

Comment 39 Alex Deucher 2017-11-21 17:13:34 UTC

Created attachment 135648
workaround for amdgpu

Comment 40 Shih-Yuan Lee 2017-11-22 08:59:24 UTC

Created attachment 135662
dmesg

(In reply to Alex Deucher from comment #38)
> Created attachment 135647
> workaround for radeon
> 
> workarounds for radeon and amdgpu to fix the issue.

I applied this patch on top of the Ubuntu-4.4.0-101.124 Linux kernel, and it seemed to fix the issue at first, but problems showed up later on.

$ seq 20 | while read i; do echo Loop $i; DRI_PRIME=1 glxgears -info|head -n 5; done
Loop 1
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x800000
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x800000
radeonsi: Failed to create a context.
Loop 2
...

Comment 41 Robert Liu 2017-11-24 07:02:54 UTC

So far, with max_sclk set to 60000 and max_mclk to 80000, the system has passed a 24-hour burn-in test (vblank_mode=0 DRI_PRIME=1 glmark2 --run-forever).

Another issue I found: when the adapter is removed, the system suspends. After I wake it up, it continues running the benchmark.

Comment 42 Michel Dänzer 2017-11-24 08:55:09 UTC

(In reply to Robert Liu from comment #41)
> Another issue found is when removing the adapter, the system goes to
> suspend.

That's not directly related to graphics drivers.

Comment 43 Shih-Yuan Lee 2018-01-19 08:59:17 UTC

I can still reproduce the issue after setting max_sclk to 60000 and max_mclk to 80000.

Comment 44 Shih-Yuan Lee 2018-01-29 04:35:20 UTC

I tried max_sclk = 50000 and max_mclk = 60000 on Ubuntu-4.4.0-112.135, but I can still reproduce the GPU lockup.
The first run of `seq 100 | while read i; do echo Loop $i; DRI_PRIME=1 glxgears -info|head -n 3; done` passed.
But the second run of the same command failed.
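
Because a lockup takes the whole machine down, terminal output from a loop like the one above is lost. One possible variation, sketched here under assumptions, appends each iteration to a log file and flushes it to disk before starting the command, so the last started iteration can be read back after a reboot. `repro_loop` and the log path in the usage note are hypothetical names.

```shell
# Run a command COUNT times, logging each iteration to LOG.
# Usage: repro_loop COUNT LOG COMMAND [ARGS...]
repro_loop() {
    local count log i status
    count=$1
    log=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "Loop $i: start" >> "$log"
        sync    # flush the log so the entry survives a hard lockup
        status=0
        # Output of the command is discarded; only its exit status is kept.
        "$@" > /dev/null 2>&1 || status=$?
        echo "Loop $i: exit $status" >> "$log"
        i=$((i + 1))
    done
}
```

A call such as `repro_loop 100 "$HOME/prime-loop.log" env DRI_PRIME=1 timeout 5 glxgears -info` would run glxgears for five seconds per iteration. Since glxgears does not exit on its own, `timeout` is expected to end it, and the logged exit status is then 124 rather than 0.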

Comment 45 Shih-Yuan Lee 2018-03-15 04:18:07 UTC

I can still reproduce this issue on Ubuntu 18.04 with `seq 100 | while read i; do echo Loop $i; DRI_PRIME=1 glxgears -info|head -n2; done`.

Comment 46 Shih-Yuan Lee 2018-03-15 06:45:54 UTC

The Linux kernel used in comment 45 is 4.15.0-10.11 from Ubuntu 18.04.
With the later version 4.15.0-12.13, I can no longer reproduce this issue on Ubuntu 18.04.
4.15.0-12.13 contains the following commit:

commit 239b5f64e12b1f09f506c164dff0374924782979
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Tue Nov 21 12:09:38 2017 -0500

    drm/radeon: Add dpm quirk for Jet PRO (v2)
    
    Fixes stability issues.
    
    v2: clamp sclk to 600 Mhz
    
    Bug: https://bugs.freedesktop.org/show_bug.cgi?id=103370
    Acked-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org

diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
index ee3e742..97a0a63 100644
--- a/drivers/gpu/drm/radeon/si_dpm.c
+++ b/drivers/gpu/drm/radeon/si_dpm.c
@@ -2984,6 +2984,11 @@ static void si_apply_state_adjust_rules(struct radeon_device *rdev,
                    (rdev->pdev->device == 0x6667)) {
                        max_sclk = 75000;
                }
+               if ((rdev->pdev->revision == 0xC3) ||
+                   (rdev->pdev->device == 0x6665)) {
+                       max_sclk = 60000;
+                       max_mclk = 80000;
+               }
        } else if (rdev->family == CHIP_OLAND) {
                if ((rdev->pdev->revision == 0xC7) ||
                    (rdev->pdev->revision == 0x80) ||
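
Note that the added condition joins its two checks with `||`, so the clamp applies when either the PCI revision is 0xC3 or the device ID is 0x6665. The sketch below mirrors only that test, so a given card can be checked from sysfs. It does not model the surrounding chip-family branch, and `jet_pro_quirk_matches` is a hypothetical name.

```shell
# Succeed if the given PCI device ID ($1) or revision ($2) matches the
# condition added by the quirk: revision == 0xC3 OR device == 0x6665.
# Hex digits are compared case-insensitively; a lowercase "0x" prefix is
# expected, as printed by sysfs.
jet_pro_quirk_matches() {
    local device revision
    device=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    revision=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$revision" = "0xc3" ] || [ "$device" = "0x6665" ]
}
```

On the reporter's machine the GPU sits at PCI address 0000:01:00.0, so `jet_pro_quirk_matches "$(cat /sys/bus/pci/devices/0000:01:00.0/device)" "$(cat /sys/bus/pci/devices/0000:01:00.0/revision)"` would be the corresponding check, assuming the running kernel provides the `revision` attribute.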
