Host : laptop with Kaveri iGPU and Topaz dGPU
kernel : 4.18.4
Xorg : 1.20.1
Mesa : 18.2.0
When running 'DRI_PRIME=1 glmark2', the system hangs after about 60 seconds. I have to force a reboot with the power button or magic SysRq; the latter may not completely power off the laptop.
The iGPU is driven by the radeon module and the dGPU by amdgpu. Running with no module options, or with the following options (amdgpu cik_support=0 si_support=1; radeon cik_support=1 si_support=0), yields the same result.
I can't say it started with 4.18.4. It is also observed on 4.19-rc2/3. This never happened with older kernels.
No such event occurs when using the iGPU.
I cannot bisect, because the last crash badly corrupted the home partition, and my home directory simply vanished after fsck recreated the ext4 journal. Fortunately, I could recover from backup.
Maybe it's not related to amdgpu but rather to Xorg, Mesa or something else. I am reporting it here in case amdgpu is at fault in this offloading context.
Can you attach the xorg log and dmesg output from your system?
Created attachment 141623
Created attachment 141624
dmesg after reboot
Created attachment 141625
kernel log during problems
Please see attachments.
Does updating xf86-video-ati to 18.1.0 or using EXA instead of glamor help by any chance? Your system is affected by bug 105381.
With EXA, the sddm login screen does not show up.
xf86-video-ati 18.1.0 is in the testing branch of the Arch repositories. I will test it when it becomes available as stable.
(In reply to Michel Dänzer from comment #6)
Tried with xf86-video-ati 18.1.0: same delayed freeze.
I think the host gets overheated. The last line in kernel.log is:
Sep 19 20:44:44 hp2 kernel: [ 337.131484] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
I was monitoring the temperature with the 'sensors' command. The last output for the amdgpu sensor was:
Adapter: PCI adapter
vddgfx: +0.82 V
temp1: +190.0°C (crit = +104000.0°C, hyst = -273.1°C)
power1: 1.04 kW (cap = 30.00 W)
Perhaps powerplay needs a fix?
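For anyone who wants to log these readings automatically while reproducing the hang, here is a minimal Python sketch that parses 'sensors'-style text and flags values that look implausible. The sample is copied from the output above. The function names and the plausibility bounds (-40 to 150 °C, and power draw above the reported cap) are my own assumptions for illustration and do not come from the driver.

```python
import re

# Sample copied from the 'sensors' output quoted in this report.
SAMPLE = """\
Adapter: PCI adapter
vddgfx: +0.82 V
temp1: +190.0°C (crit = +104000.0°C, hyst = -273.1°C)
power1: 1.04 kW (cap = 30.00 W)
"""

# Multipliers to convert the power units printed by 'sensors' to watts.
UNIT_SCALE = {"mW": 1e-3, "W": 1.0, "kW": 1e3}


def parse_temps(text):
    """Return every temperature found in text, in degrees Celsius, in order."""
    return [float(v) for v in re.findall(r"([+-]?\d+(?:\.\d+)?)°C", text)]


def parse_power_watts(text):
    """Return every power value found in text, converted to watts, in order."""
    pairs = re.findall(r"(\d+(?:\.\d+)?) (mW|kW|W)\b", text)
    return [float(value) * UNIT_SCALE[unit] for value, unit in pairs]


def implausible_readings(text, temp_range=(-40.0, 150.0)):
    """Return the labels of sensor lines whose values look implausible.

    temp_range is an assumed sanity window, not a hardware limit.
    A power line is flagged when the reading exceeds the reported cap.
    """
    low, high = temp_range
    problems = []
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        temps = parse_temps(rest)
        watts = parse_power_watts(rest)
        if any(not low <= t <= high for t in temps):
            problems.append(label.strip())
        elif len(watts) == 2 and watts[0] > watts[1]:
            problems.append(label.strip())
    return problems


if __name__ == "__main__":
    for name in implausible_readings(SAMPLE):
        print("implausible reading:", name)
```

On the sample above, both the temp1 and power1 lines are flagged: every temperature on the temp1 line falls outside the assumed window, and the power reading is far above the 30 W cap.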
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/531.