Bug 107950

Summary: Delayed freeze with DRI_PRIME=1 on Topaz
Product: DRI
Component: DRM/AMDgpu
Reporter: SET <nmset>
Assignee: Default DRI bug account <dri-devel>
Status: NEW
Severity: major
Priority: medium
CC: nmset
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments:
Description                   Flags
Xorg log                      none
dmesg after reboot            none
kernel log during problems    none

Description SET 2018-09-16 17:09:39 UTC
Host: laptop with a Kaveri iGPU and a Topaz dGPU
Kernel: 4.18.4
Xorg: 1.20.1
Mesa: 18.2.0


When running 'DRI_PRIME=1 glmark2', the system hangs after about 60 seconds. I have to force a reboot with the power button or Magic SysRq; the latter may not completely power off the laptop.

The iGPU is driven by the radeon module and the dGPU by amdgpu. Loading the modules without options, or with the following options (amdgpu cik_support=0 si_support=1; radeon cik_support=1 si_support=0), yields the same result.

I can't tell whether it started with 4.18.4. It is also observed on 4.19-rc2 and 4.19-rc3. This never happened with older kernels.

No such event occurs when using the iGPU.

I cannot bisect, because the last crash badly corrupted the home partition: my home directory simply vanished after fsck recreated the ext4 journal. Fortunately, I was able to recover it from a backup.

Maybe it's not related to amdgpu but rather to Xorg, Mesa or something else. I am reporting it here in case amdgpu is involved in this offloading setup.

Regards.
Comment 1 Alex Deucher 2018-09-17 21:07:58 UTC
Can you attach the xorg log and dmesg output from your system?
Comment 2 SET 2018-09-18 08:31:13 UTC
Created attachment 141623
Xorg log
Comment 3 SET 2018-09-18 08:31:47 UTC
Created attachment 141624
dmesg after reboot
Comment 4 SET 2018-09-18 08:33:36 UTC
Created attachment 141625
kernel log during problems
Comment 5 SET 2018-09-18 08:34:13 UTC
Please see attachments.
Comment 6 Michel Dänzer 2018-09-18 09:44:56 UTC
Does updating xf86-video-ati to 18.1.0 or using EXA instead of glamor help by any chance? Your system is affected by bug 105381.
Comment 7 SET 2018-09-18 11:34:13 UTC
With EXA, the SDDM login screen does not show up.

xf86-video-ati 18.1.0 is in the testing branch of the Arch repositories. I will test it once it is available in stable.
Comment 8 SET 2018-09-19 19:07:16 UTC
(In reply to Michel Dänzer from comment #6)

Tried with xf86-video-ati 18.1.0:

Same delayed freeze.

I think the host overheats. The last line in kernel.log is:

Sep 19 20:44:44 hp2 kernel: [  337.131484] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!

I was monitoring the temperature with the 'sensors' command. The last output for the amdgpu sensor was:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.82 V  
fan1:             N/A
temp1:        +190.0°C  (crit = +104000.0°C, hyst = -273.1°C)
power1:        1.04 kW (cap =  30.00 W)
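Not all of these readings can be physical: 190 °C, a critical limit of 104000 °C, and 1.04 kW against a 30 W cap may mean the sensor values themselves are garbled, in addition to (or instead of) the chip genuinely overheating. Below is a minimal Python sketch for sanity-checking such readings. It assumes the standard hwmon sysfs conventions (temperatures in millidegrees Celsius, power in microwatts) and the attribute names temp1_input, temp1_crit, power1_average and power1_cap under /sys/class/drm/card*/device/hwmon/. The helper names and the plausibility thresholds are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical helpers for illustration; not part of lm-sensors or amdgpu.
# hwmon sysfs convention: temperatures in millidegrees Celsius, power in microwatts.


def millideg_to_c(raw: int) -> float:
    return raw / 1000.0


def microwatt_to_w(raw: int) -> float:
    return raw / 1_000_000.0


def read_int(path: Path) -> int | None:
    """Return the integer in a sysfs file, or None if it is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


# Attribute name -> unit conversion (assumed to match what the driver exposes).
ATTRS = {
    "temp1_input": millideg_to_c,
    "temp1_crit": millideg_to_c,
    "power1_average": microwatt_to_w,
    "power1_cap": microwatt_to_w,
}


def read_hwmon(hwmon_dir: Path) -> dict[str, float]:
    """Read the attributes above from one hwmon directory, in degrees C and watts."""
    readings = {}
    for name, convert in ATTRS.items():
        raw = read_int(hwmon_dir / name)
        if raw is not None:
            readings[name] = convert(raw)
    return readings


def implausible(readings: dict[str, float], max_temp_c: float = 150.0) -> list[str]:
    """List readings that look physically impossible (thresholds are arbitrary assumptions)."""
    problems = []
    temp = readings.get("temp1_input")
    crit = readings.get("temp1_crit")
    power = readings.get("power1_average")
    cap = readings.get("power1_cap")
    if temp is not None and temp > max_temp_c:
        problems.append(f"temp1_input {temp:.1f} C above {max_temp_c:.0f} C")
    if crit is not None and crit > max_temp_c:
        problems.append(f"temp1_crit {crit:.1f} C looks mis-scaled")
    if power is not None and cap and power > 10 * cap:
        problems.append(f"power1_average {power:.1f} W is over 10x the {cap:.1f} W cap")
    return problems


if __name__ == "__main__":
    # Raw hwmon values corresponding to the sensors output quoted in this report.
    reported = {
        "temp1_input": millideg_to_c(190_000),
        "temp1_crit": millideg_to_c(104_000_000),
        "power1_average": microwatt_to_w(1_040_000_000),
        "power1_cap": microwatt_to_w(30_000_000),
    }
    for problem in implausible(reported):
        print(problem)

    # Same check against any live DRM hwmon directories on this machine (none is fine).
    for hwmon_dir in sorted(Path("/sys/class/drm").glob("card*/device/hwmon/hwmon*")):
        print(hwmon_dir, implausible(read_hwmon(hwmon_dir)))
```

On the numbers reported here, all three checks fire; on a healthy idle system the list should be empty. The 150 °C and 10x-cap thresholds are arbitrary.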

Perhaps powerplay needs a fix?

Regards.
