Summary: | Hang regression with R7 M370, identified possible culprit commit | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Mauro Santos <registo.mailling> | ||||||||
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel> | ||||||||
Status: | RESOLVED MOVED | QA Contact: | |||||||||
Severity: | normal | ||||||||||
Priority: | medium | ||||||||||
Version: | unspecified | ||||||||||
Hardware: | x86-64 (AMD64) | ||||||||||
OS: | Linux (All) | ||||||||||
Whiteboard: | |||||||||||
Attachments: |
|
Created attachment 130250: patch 1/2

The attached patches should fix it.

Created attachment 130251: patch 2/2

Build fails after applying patch 1 followed by patch 2 with:

drivers/gpu/drm/radeon/si_dpm.c: In function ‘si_get_vce_clock_voltage’:
drivers/gpu/drm/radeon/si_dpm.c:2977:4: error: ‘else’ without a previous ‘if’
    } else if (rdev->family == CHIP_OLAND) {
    ^~~~
drivers/gpu/drm/radeon/si_dpm.c:2985:4: error: ‘max_sclk’ undeclared (first use in this function)
    max_sclk = 75000;
    ^~~~~~~~
drivers/gpu/drm/radeon/si_dpm.c:2985:4: note: each undeclared identifier is reported only once for each function it appears in

The patch changes things inside the si_get_vce_clock_voltage function, but I suppose the changes should be made a few lines below that, in the si_apply_state_adjust_rules function, after the quirks for pitcairn and hainan, right?

Another thing that I'm curious about: any guesses as to why the card needs the maximum core clock limited to 750MHz on Linux but seems to work fine on Windows 10 at 875MHz? I've tried it on Windows 10 (all drivers downloaded via Windows Update) with Unigine Heaven + CPU-Z to monitor the frequencies, and it seems to go along happily with 875MHz core and 900MHz memory clocks.
(In reply to Mauro Santos from comment #3)
> Build fails after applying patch 1 followed by patch 2 with:
>
> drivers/gpu/drm/radeon/si_dpm.c: In function ‘si_get_vce_clock_voltage’:
> drivers/gpu/drm/radeon/si_dpm.c:2977:4: error: ‘else’ without a previous ‘if’
>     } else if (rdev->family == CHIP_OLAND) {
>     ^~~~
> drivers/gpu/drm/radeon/si_dpm.c:2985:4: error: ‘max_sclk’ undeclared (first use in this function)
>     max_sclk = 75000;
>     ^~~~~~~~
> drivers/gpu/drm/radeon/si_dpm.c:2985:4: note: each undeclared identifier is reported only once for each function it appears in
>
> The patch changes things inside the si_get_vce_clock_voltage function, but I
> suppose the changes should be made a few lines below that, in the
> si_apply_state_adjust_rules function, after the quirks for pitcairn and
> hainan, right?

The patch modifies si_apply_state_adjust_rules; I guess it's not applying cleanly to your kernel.

> Another thing that I'm curious about: any guesses as to why the card needs
> the maximum core clock limited to 750MHz on Linux but seems to work fine on
> Windows 10 at 875MHz? I've tried it on Windows 10 (all drivers downloaded
> via Windows Update) with Unigine Heaven + CPU-Z to monitor the frequencies,
> and it seems to go along happily with 875MHz core and 900MHz memory clocks.

There is still some bug in the driver that prevents the higher clocks from working stably on your card. We fixed some issues and the driver was working on the hardware samples we had in house (which is why I removed the workaround), but apparently there are still some variants that are not working correctly.

(In reply to Alex Deucher from comment #4)
> The patch modifies si_apply_state_adjust_rules, I guess it's not applying
> cleanly to your kernel.

I've retried it with the current git tree and it does apply properly. Before, I was trying with kernel 4.9.2.

I can confirm that with the patches that were provided the card does not hang.
I have also tried reverting commit 3a69adfe5617ceba04ad3cff0f9ccad470503fb2 from kernel 4.9.2 (leaving only the sclk limitation) and it also works: no hangs with sclk=750MHz and mclk=900MHz.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/784. |
Created attachment 130246: lspci -nnk and dmesg output

After updating from the kernel 4.9 series to the 4.10 series I have identified a regression when using the discrete GPU on my laptop (Lenovo ThinkPad E560). When running any demanding application with DRI_PRIME=1 the card will hang; one example would be running 'DRI_PRIME=1 glmark2 -b texture'.

I have noticed that the content of /sys/kernel/debug/dri/1/radeon_pm_info has changed between kernel 4.9 and 4.10 when running glmark2.

With 4.9:
power level 4 sclk: 75000 mclk: 80000 vddc: 1050 vddci: 0 pcie gen: 2

With 4.10:
power level 4 sclk: 87500 mclk: 90000 vddc: 1050 vddci: 0 pcie gen: 2

This led me to revert commit 3a69adfe5617ceba04ad3cff0f9ccad470503fb2, which prevents the card from hanging. You can find the output of lspci and dmesg in the attachment for the case with commit 3a69adfe5617ceba04ad3cff0f9ccad470503fb2 reverted.
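
For anyone wanting to compare the two power levels quoted above, here is a minimal Python sketch that parses a radeon_pm_info line and converts the clocks to MHz. It assumes the line format is exactly as quoted in this report and that the clock fields are in units of 10 kHz, which is what the thread implies by equating sclk 75000 with 750MHz. The names `LEVEL_RE`, `parse_power_level` and `to_mhz` are made up for this example and are not part of any driver or tool.

```python
import re

# Pattern for one "power level" line, based only on the two lines quoted
# in this report (assumption: other kernels may format it differently).
LEVEL_RE = re.compile(
    r"power level (?P<level>\d+)\s+sclk: (?P<sclk>\d+)\s+mclk: (?P<mclk>\d+)\s+"
    r"vddc: (?P<vddc>\d+)\s+vddci: (?P<vddci>\d+)\s+pcie gen: (?P<pcie_gen>\d+)"
)


def parse_power_level(line):
    """Return the numeric fields of a power level line as a dict of ints."""
    match = LEVEL_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised power level line: {line!r}")
    return {name: int(value) for name, value in match.groupdict().items()}


def to_mhz(clock_10khz):
    """Convert a clock value in 10 kHz units (assumed) to MHz."""
    return clock_10khz / 100


# The two lines reported for kernel 4.9 and kernel 4.10.
old = parse_power_level(
    "power level 4 sclk: 75000 mclk: 80000 vddc: 1050 vddci: 0 pcie gen: 2"
)
new = parse_power_level(
    "power level 4 sclk: 87500 mclk: 90000 vddc: 1050 vddci: 0 pcie gen: 2"
)

for key in ("sclk", "mclk"):
    print(f"{key}: {to_mhz(old[key]):.0f} MHz -> {to_mhz(new[key]):.0f} MHz")
```

With the two quoted lines this prints the engine clock going from 750 MHz to 875 MHz and the memory clock from 800 MHz to 900 MHz, while vddc stays at 1050 in both cases.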