|Summary:||Delayed freeze with DRI_PRIME=1 on Topaz|
|Component:||DRM/AMDgpu||Assignee:||Default DRI bug account <dri-devel>|
|Status:||NEW ---||QA Contact:|
Description SET 2018-09-16 17:09:39 UTC
Host: laptop with Kaveri iGPU and Topaz dGPU
Kernel: 4.18.4
Xorg: 1.20.1
Mesa: 18.2.0

When running 'DRI_PRIME=1 glmark2', the system hangs after about 60 seconds. The only way out is a hard reboot with the power button or magic SysRq, and the latter may not completely power off the laptop.

The iGPU is driven by the radeon module and the dGPU by amdgpu. The result is the same with no module options and with the following options: amdgpu cik_support=0 si_support=1; radeon cik_support=1 si_support=0.

I can't say it started with 4.18.4; it is also observed on 4.19-rc2 and 4.19-rc3. This never happened with older kernels. No such hang occurs when using the iGPU.

I cannot bisect, because the last crash badly corrupted the home partition, and my home directory simply vanished after fsck recreated the ext4 journal. Fortunately, I could recover from backup.

Maybe it's not related to amdgpu, but rather to Xorg, Mesa or something else. I am reporting it here in case it is amdgpu in this offloading context.

Regards.
Comment 1 Alex Deucher 2018-09-17 21:07:58 UTC
Can you attach the xorg log and dmesg output from your system?
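Because the machine hangs hard, the last kernel messages may never reach the regular log files. A minimal sketch of one way to capture them is shown below: it streams 'dmesg --follow' into a file and forces every line to disk before reading the next one. The function names and the output file name are hypothetical, and reading the kernel log may require root on some systems.

```python
import os
import subprocess


def append_durably(fh, line):
    """Write one line and force it to disk before returning."""
    fh.write(line)
    fh.flush()
    os.fsync(fh.fileno())


def follow_kernel_log(out_path, cmd=("dmesg", "--follow")):
    """Stream the output of cmd into out_path until it ends or is interrupted.

    The default command follows the kernel ring buffer. Each line is
    fsync'ed so that it has a chance of surviving a hard hang.
    """
    with open(out_path, "a", encoding="utf-8") as fh:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        try:
            for line in proc.stdout:
                append_durably(fh, line)
        finally:
            proc.terminate()
            proc.wait()


# Example (hypothetical file name), started before 'DRI_PRIME=1 glmark2':
#     follow_kernel_log("kernel-follow.log")
```

Syncing after every line is slow, but here durability matters more than throughput. Writing the file to a partition other than /home may be prudent given the corruption described above.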
Comment 4 SET 2018-09-18 08:33:36 UTC
Created attachment 141625 [details] kernel log during problems
Comment 5 SET 2018-09-18 08:34:13 UTC
Please see attachments.
Comment 6 Michel Dänzer 2018-09-18 09:44:56 UTC
Does updating xf86-video-ati to 18.1.0 or using EXA instead of glamor help by any chance? Your system is affected by bug 105381.
Comment 7 SET 2018-09-18 11:34:13 UTC
With EXA, the sddm login screen does not show up. xf86-video-ati 18.1.0 is still in the testing branch of the Arch repositories; I will test it when it becomes available as stable.
Comment 8 SET 2018-09-19 19:07:16 UTC
(In reply to Michel Dänzer from comment #6)

Tried with xf86-video-ati 18.1.0: same delayed freeze.

I think the host gets overheated. The last line in kernel.log is:

Sep 19 20:44:44 hp2 kernel: [ 337.131484] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!

I was monitoring the temperature with the 'sensors' command. The last output for the amdgpu sensor was:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: +0.82 V
fan1: N/A
temp1: +190.0°C (crit = +104000.0°C, hyst = -273.1°C)
power1: 1.04 kW (cap = 30.00 W)

Perhaps powerplay needs a fix?

Regards.
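Several values in that 'sensors' output look physically implausible, which can be checked mechanically. The sketch below parses the temp1 and power1 lines and flags readings outside assumed bounds. The 150 °C ceiling, the regular expressions and the function name are assumptions made for illustration, not part of lm-sensors or amdgpu.

```python
import re

SAMPLE = """\
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.82 V
fan1:             N/A
temp1:       +190.0°C  (crit = +104000.0°C, hyst = -273.1°C)
power1:        1.04 kW (cap = 30.00 W)
"""

NUM = r"([+-]?\d+(?:\.\d+)?)"
TEMP_RE = re.compile(
    rf"^temp1:\s*{NUM}°C\s*\(crit = {NUM}°C, hyst = {NUM}°C\)", re.M)
POWER_RE = re.compile(
    rf"^power1:\s*{NUM} (m|k)?W\s*\(cap = {NUM} (m|k)?W\)", re.M)
SCALE = {None: 1.0, "m": 1e-3, "k": 1e3}  # unit prefix -> watts multiplier

MAX_PLAUSIBLE_TEMP_C = 150.0    # assumed upper bound for a laptop GPU
NEAR_ABSOLUTE_ZERO_C = -273.0   # anything below this is not a real reading


def check_sensors(text):
    """Return a list of warnings for implausible temp1/power1 readings."""
    warnings = []
    t = TEMP_RE.search(text)
    if t:
        temp, crit, hyst = (float(x) for x in t.groups())
        if temp > MAX_PLAUSIBLE_TEMP_C:
            warnings.append(f"temp1 {temp:.1f} C exceeds assumed maximum")
        if crit > MAX_PLAUSIBLE_TEMP_C:
            warnings.append(f"crit {crit:.1f} C exceeds assumed maximum")
        if hyst < NEAR_ABSOLUTE_ZERO_C:
            warnings.append(f"hyst {hyst:.1f} C is at or below absolute zero")
    p = POWER_RE.search(text)
    if p:
        power = float(p.group(1)) * SCALE[p.group(2)]
        cap = float(p.group(3)) * SCALE[p.group(4)]
        if power > cap:
            warnings.append(f"power1 {power:.0f} W exceeds cap {cap:.0f} W")
    return warnings


for w in check_sensors(SAMPLE):
    print(w)
```

On the output quoted above, all four checks fire. That would be consistent with the sensor values themselves being wrong rather than the chip actually reaching those temperatures, although the script cannot distinguish the two.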