Bug 107950 - Delayed freeze with DRI_PRIME=1 on Topaz
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
Reported: 2018-09-16 17:09 UTC by SET
Modified: 2019-11-19 08:56 UTC

Attachments
Xorg log (40.83 KB, text/plain)
2018-09-18 08:31 UTC, SET
dmesg after reboot (90.50 KB, text/plain)
2018-09-18 08:31 UTC, SET
kernel log during problems (2.55 MB, text/plain)
2018-09-18 08:33 UTC, SET

Description SET 2018-09-16 17:09:39 UTC
Host : laptop with Kaveri iGPU and Topaz dGPU
kernel : 4.18.4
Xorg : 1.20.1
Mesa : 18.2.0


When running 'DRI_PRIME=1 glmark2', the system hangs after about 60 seconds. The only way out is a hard reboot with the power button or magic SysRq; the latter may not completely power off the laptop.

The iGPU is driven by the radeon module and the dGPU by amdgpu. The result is the same with no module options and with the following options: amdgpu cik_support=0 si_support=1; radeon cik_support=1 si_support=0.

I can't say whether it started with 4.18.4. It is also observed on 4.19-rc2 and 4.19-rc3. It never happened with older kernels.

No such event occurs when using the iGPU.

I cannot bisect, because the last crash badly corrupted the home partition, and my home directory simply vanished after fsck recreated the ext4 journal. Fortunately, I could recover it from backup.

Maybe it is not related to amdgpu but rather to Xorg, Mesa or something else. I am reporting it here in case amdgpu is at fault in this offloading context.

Regards.
Comment 1 Alex Deucher 2018-09-17 21:07:58 UTC
Can you attach the xorg log and dmesg output from your system?
Comment 2 SET 2018-09-18 08:31:13 UTC
Created attachment 141623
Xorg log
Comment 3 SET 2018-09-18 08:31:47 UTC
Created attachment 141624
dmesg after reboot
Comment 4 SET 2018-09-18 08:33:36 UTC
Created attachment 141625
kernel log during problems
Comment 5 SET 2018-09-18 08:34:13 UTC
Please see attachments.
Comment 6 Michel Dänzer 2018-09-18 09:44:56 UTC
Does updating xf86-video-ati to 18.1.0 or using EXA instead of glamor help by any chance? Your system is affected by bug 105381.
Comment 7 SET 2018-09-18 11:34:13 UTC
With EXA, the sddm login screen does not show up.

xf86-video-ati 18.1.0 is in the testing branch of the Arch repositories. I will test it once it is available in stable.
Comment 8 SET 2018-09-19 19:07:16 UTC
(In reply to Michel Dänzer from comment #6)

Tried with xf86-video-ati 18.1.0:

Same delayed freeze.

I think the host is overheating. The last line in kernel.log is:

Sep 19 20:44:44 hp2 kernel: [  337.131484] amdgpu: [powerplay] GPU over temperature range detected on PCIe 0:0.0!
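
For anyone searching a large kernel log for the same event, here is a minimal sketch. It assumes the message wording matches the single line quoted above exactly; other kernel versions may phrase it differently. The function name find_overtemp is hypothetical.

```python
import re

# Pattern modelled only on the one log line quoted in this report.
OVERTEMP = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+amdgpu: \[powerplay\] "
    r"GPU over temperature range detected on PCIe (?P<dev>\S+?)!"
)

def find_overtemp(lines):
    """Return (uptime_seconds, device) for each over-temperature message."""
    hits = []
    for line in lines:
        match = OVERTEMP.search(line)
        if match:
            hits.append((float(match.group("uptime")), match.group("dev")))
    return hits

# The line from kernel.log quoted above.
sample = [
    "Sep 19 20:44:44 hp2 kernel: [  337.131484] amdgpu: [powerplay] "
    "GPU over temperature range detected on PCIe 0:0.0!",
]
print(find_overtemp(sample))
```

To scan a saved log, pass an open file object instead of the sample list.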

I was monitoring the temperature with the 'sensors' command. The last output for the amdgpu sensor was:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.82 V  
fan1:             N/A
temp1:        +190.0°C  (crit = +104000.0°C, hyst = -273.1°C)
power1:        1.04 kW (cap =  30.00 W)
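
The readings above can be sanity-checked mechanically. This is a minimal sketch, assuming the 'sensors' text layout shown above. The 125 °C plausibility ceiling and the name check_readings are hypothetical choices for illustration, not values taken from the driver.

```python
import re

# Sensor text copied from this report.
SENSORS_OUTPUT = """\
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:       +0.82 V
fan1:             N/A
temp1:        +190.0°C  (crit = +104000.0°C, hyst = -273.1°C)
power1:        1.04 kW (cap =  30.00 W)
"""

# Hypothetical ceiling for a believable GPU temperature, not a driver constant.
MAX_PLAUSIBLE_TEMP_C = 125.0
WATTS_PER_UNIT = {"W": 1.0, "kW": 1000.0}
NUM = r"([+-]?\d+(?:\.\d+)?)"

def check_readings(text):
    """Return a list of warnings for readings that look implausible."""
    warnings = []
    temp = re.search(rf"^temp1:\s+{NUM}°C\s+\(crit = {NUM}°C", text, re.M)
    if temp:
        current, crit = float(temp.group(1)), float(temp.group(2))
        if current > MAX_PLAUSIBLE_TEMP_C:
            warnings.append(f"temp1 reading of {current} C is implausibly high")
        if crit > MAX_PLAUSIBLE_TEMP_C:
            warnings.append(f"temp1 crit limit of {crit} C is implausibly high")
    power = re.search(
        rf"^power1:\s+{NUM} (k?W) \(cap =\s+{NUM} (k?W)\)", text, re.M
    )
    if power:
        draw = float(power.group(1)) * WATTS_PER_UNIT[power.group(2)]
        cap = float(power.group(3)) * WATTS_PER_UNIT[power.group(4)]
        if draw > cap:
            warnings.append(f"power1 draw of {draw:.0f} W exceeds cap of {cap:.0f} W")
    return warnings

for warning in check_readings(SENSORS_OUTPUT):
    print(warning)
```

All three checks fire on the output above: the temperature, its crit limit, and the power draw relative to the reported cap.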

Perhaps powerplay needs a fix?

Regards.
Comment 9 Martin Peres 2019-11-19 08:56:17 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/531.

