Everything was working fine with v4.8-rc2, but with v4.8-rc3 the module loads successfully but when I try to use it via vdpau it results in following errors:- [ 101.129388] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 1 test failed (scratch(0xC040)=0xCAFEDEAD) [ 101.340282] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 2 test failed (scratch(0xC040)=0xCAFEDEAD) [ 101.551074] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 3 test failed (scratch(0xC040)=0xCAFEDEAD) [ 101.761909] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 4 test failed (scratch(0xC040)=0xCAFEDEAD) [ 101.972856] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 5 test failed (scratch(0xC040)=0xCAFEDEAD) [ 102.183708] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 6 test failed (scratch(0xC040)=0xCAFEDEAD) [ 102.394620] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 7 test failed (scratch(0xC040)=0xCAFEDEAD) [ 102.605321] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 8 test failed (scratch(0xC040)=0xCAFEDEAD) [ 102.709055] [drm:sdma_v2_4_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 9 test failed (0xCAFEDEAD) [ 102.709065] [drm:amdgpu_resume [amdgpu]] *ERROR* resume of IP block <sdma_v2_4> failed -22 [ 102.709074] [drm:amdgpu_resume_kms [amdgpu]] *ERROR* amdgpu_resume failed (-22). [ 126.965536] [TTM] Buffer eviction failed
After the git bisect, the following commit was leading to error:- commit c39b487f195b93235ee76384427467786f7bf29f Author: Alex Deucher <alexander.deucher@amd.com> Date: Tue Aug 9 00:20:28 2016 -0400 Revert "drm/amdgpu: work around lack of upstream ACPI support for D3cold" This reverts commit c63695cc5e5f685e924e25a8f9555f6e846f1fc6. Now that d3cold support is upstream, there is no more need for this workaround. bug: https://bugs.freedesktop.org/show_bug.cgi?id=97248
Possibly the same root cause as bug 98398.
I had a look at Peter Wu's patch that landed today on drm-fixes. Do we also need to have a patch like that which maybe tweaks amdgpu_atpx_handler to make it work?
@Nayan To see whether the problem is similar as nouveau, I need more information about your machine. Could you provide your acpidump? You can also submit a whole tarball as described at https://bugs.launchpad.net/lpbugreporter/+bug/752542, that would include the acpidump among other things.
Created attachment 127673 [details] Dell_Inc.-Inspiron_5548.tar.gz
After I run any program which uses the AMD gpu the system will hang if I perform any heavy task like starting chrome for example. Please let me know if you need any more info or need to test some patches.
Created attachment 127678 [details] [review] amdgpu patch that checks whether the new interface can be used for PM PCIe port PM is not enabled because this BIOS is pre-2015: 12/04/2014 The BIOS seems to be able to report support for lots of things (can you post a fuller dmesg that include the supported functions?): Scope (\_SB.PCI0.GFX0) { Method (ATPX, 2, Serialized) { // ... If (Arg0 == One) { Name (TMP2, Buffer (0x0100) { 0x00 }) CreateWordField (TMP2, Zero, F1SS) CreateDWordField (TMP2, 0x02, F1VM) CreateDWordField (TMP2, 0x06, F1FG) F1SS = 0x0A F1VM = 0x7FC0 If ((\_SB.PCI0.RP05.PXSX.SGMD & 0x0F) == 0x02) { // ... If (\_SB.PCI0.RP05.PXSX.PXDY == One) { F1FG |= 0x80 /* ATPX_DYNAMIC_PX_SUPPORTED */ F1VM |= 0x80 } // If (\_SB.PCI0.RP05.PXSX.FDPD == One) { F1FG |= 0x0400 /* ATPX_DYNAMIC_DGPU_POWER_OFF_SUPPORTED */ F1VM |= 0x0400 If (OSYS >= 0x07DC) { F1FG |= 0x0800 /* ATPX_DGPU_REQ_POWER_FOR_DISPLAYS */ F1VM |= 0x0800 } } // ... If (OSYS >= 0x07DD) { F1FG |= 0x4000 /* ATPX_MS_HYBRID_GFX_SUPPORTED */ F1VM |= 0x4000 } amdgpu (and radeon) needs to check whether PCIe port PM is really supported. Possible patch is attached (there should probably be a pci_d3cold_disable call somewhere, see commit 279cf3f23870f7eb8ca071115e06d3d5ca0a2b9e for nouveau).
Comment on attachment 127678 [details] [review] amdgpu patch that checks whether the new interface can be used for PM Review of attachment 127678 [details] [review]: ----------------------------------------------------------------- ::: drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c @@ +206,5 @@ > atpx->is_hybrid = false; > if (valid_bits & ATPX_MS_HYBRID_GFX_SUPPORTED) { > printk("ATPX Hybrid Graphics\n"); > + /* Disable legacy PM methods only when pcie port PM is usable. */ > + atpx->functions.power_cntl = !amdgpu_atpx_priv->bridge_pm_usable; With this replaced by atpx->functions.power_cntl = !amdgpu_atpx_priv.bridge_pm_usable; The patch fixes the issue. :)
I will also try to the master branch of kernel to see if bug 98357 is gone
(In reply to Peter Wu from comment #7) > Created attachment 127678 [details] [review] [review] > amdgpu patch that checks whether the new interface can be used for PM > > PCIe port PM is not enabled because this BIOS is pre-2015: 12/04/2014 > The BIOS seems to be able to report support for lots of things (can you post > a fuller dmesg that include the supported functions?): Can we bump the bios white list to 2014? Windows enabled Hybrid graphics on windows 8.x as well as 10, and a number of preliminary win10 systems have 2014 bios versions. I'd prefer that to doing all of these hacks in the drivers.
(In reply to Alex Deucher from comment #10) > (In reply to Peter Wu from comment #7) > > Created attachment 127678 [details] [review] [review] [review] > > amdgpu patch that checks whether the new interface can be used for PM > > > > PCIe port PM is not enabled because this BIOS is pre-2015: 12/04/2014 > > The BIOS seems to be able to report support for lots of things (can you post > > a fuller dmesg that include the supported functions?): > > Can we bump the bios white list to 2014? Windows enabled Hybrid graphics on > windows 8.x as well as 10, and a number of preliminary win10 systems have > 2014 bios versions. I'd prefer that to doing all of these hacks in the drivers. Lowering the minumum from 2015 to 2014 should work for nouveau: https://lists.freedesktop.org/archives/nouveau/2016-July/025619.html You can try to ask Mika, see v4.7-rc2-3-g9d26d3a. Maybe the presence of a power resource (_PR3 or flags.power_resources) can be used as a hint whether to enable port PM.
(In reply to Peter Wu from comment #11) > (In reply to Alex Deucher from comment #10) > > (In reply to Peter Wu from comment #7) > > > Created attachment 127678 [details] [review] [review] [review] [review] > > > amdgpu patch that checks whether the new interface can be used for PM > > > > > > PCIe port PM is not enabled because this BIOS is pre-2015: 12/04/2014 > > > The BIOS seems to be able to report support for lots of things (can you post > > > a fuller dmesg that include the supported functions?): > > > > Can we bump the bios white list to 2014? Windows enabled Hybrid graphics on > > windows 8.x as well as 10, and a number of preliminary win10 systems have > > 2014 bios versions. I'd prefer that to doing all of these hacks in the drivers. > > Lowering the minumum from 2015 to 2014 should work for nouveau: > https://lists.freedesktop.org/archives/nouveau/2016-July/025619.html > For further clarify on your research, the current hybrid graphics spec requires no connectors on the dGPU and hence no audio devices. For devices with connectors on the dGPU, they should use the older vendor specific methods and PR# should not be exposed. > You can try to ask Mika, see v4.7-rc2-3-g9d26d3a. Maybe the presence of a > power resource (_PR3 or flags.power_resources) can be used as a hint whether > to enable port PM. Yes, the presence of _PR3 would be a great way to determine when to enable it.
(In reply to Alex Deucher from comment #12) > methods and PR# should not be exposed. PR3
(In reply to Alex Deucher from comment #12) [..] > > Lowering the minumum from 2015 to 2014 should work for nouveau: > > https://lists.freedesktop.org/archives/nouveau/2016-July/025619.html > > > > For further clarify on your research, the current hybrid graphics spec > requires no connectors on the dGPU and hence no audio devices. For devices > with connectors on the dGPU, they should use the older vendor specific > methods and PR3 should not be exposed. The list was mentioned because of the BIOS dates and whether _PR3 is used. The Windows 8 requirements state that the discrete GPU is render-only[0], but the Win10 reqs have no such requirement[1]. (Are these the same specs you had in mind? If not, could you share the right one?) On my new hybrid nvidia laptop, there are actually DP/HDMI ports to the dGPU (HDMI with audio). [0]: https://msdn.microsoft.com/en-us/library/windows/hardware/dn265510(v=vs.85).aspx [1]: https://msdn.microsoft.com/en-us/windows/hardware/commercialize/design/compatibility/systems#systemfundamentalsgraphicshybridgraphicsmultigpu > > You can try to ask Mika, see v4.7-rc2-3-g9d26d3a. Maybe the presence of a > > power resource (_PR3 or flags.power_resources) can be used as a hint whether > > to enable port PM. > > Yes, the presence of _PR3 would be a great way to determine when to enable > it. Have you been working on this or should I have a look?
(In reply to Peter Wu from comment #14) > (In reply to Alex Deucher from comment #12) > [..] > > > Lowering the minumum from 2015 to 2014 should work for nouveau: > > > https://lists.freedesktop.org/archives/nouveau/2016-July/025619.html > > > > > > > For further clarify on your research, the current hybrid graphics spec > > requires no connectors on the dGPU and hence no audio devices. For devices > > with connectors on the dGPU, they should use the older vendor specific > > methods and PR3 should not be exposed. > > The list was mentioned because of the BIOS dates and whether _PR3 is used. > > The Windows 8 requirements state that the discrete GPU is render-only[0], > but the Win10 reqs have no such requirement[1]. (Are these the same specs > you had in mind? If not, could you share the right one?) On my new hybrid > nvidia laptop, there are actually DP/HDMI ports to the dGPU (HDMI with > audio). > I had access to the PR3 spec internally. I'm not sure if it's available publicly or not. Support for displays was not included in the information I saw or from talking with our internal hybrid graphics architects. It's possible the spec has been updated since I last talked to them earlier this year. I can double check. Does your new laptop use PR3 or the older nvidia proprietary ACPI method? > [0]: > https://msdn.microsoft.com/en-us/library/windows/hardware/dn265510(v=vs.85). > aspx > [1]: > https://msdn.microsoft.com/en-us/windows/hardware/commercialize/design/ > compatibility/systems#systemfundamentalsgraphicshybridgraphicsmultigpu > > > > You can try to ask Mika, see v4.7-rc2-3-g9d26d3a. Maybe the presence of a > > > power resource (_PR3 or flags.power_resources) can be used as a hint whether > > > to enable port PM. > > > > Yes, the presence of _PR3 would be a great way to determine when to enable > > it. > > Have you been working on this or should I have a look? I have not been. Please have a look. Thanks!
(In reply to Alex Deucher from comment #15) > (In reply to Peter Wu from comment #14) [..] > > The Windows 8 requirements state that the discrete GPU is render-only[0], > > but the Win10 reqs have no such requirement[1]. (Are these the same specs > > you had in mind? If not, could you share the right one?) On my new hybrid > > nvidia laptop, there are actually DP/HDMI ports to the dGPU (HDMI with > > audio). > > I had access to the PR3 spec internally. I'm not sure if it's available > publicly or not. Support for displays was not included in the information I > saw or from talking with our internal hybrid graphics architects. It's > possible the spec has been updated since I last talked to them earlier this > year. I can double check. > > Does your new laptop use PR3 or the older nvidia proprietary ACPI method? The laptop (Clevo P651RA) uses PR3 (confirmed via a Windows 10 trace): https://github.com/Bumblebee-Project/bbswitch/issues/115#issuecomment-218622306 > > Have you been working on this or should I have a look? > > I have not been. Please have a look. Thanks! Ok, do you have any certainty about the earliest BIOS date where _PR3 is used? E.g. if the minimum date is lowered to 2014 without checking for _PR3, would it be likely to miss out some models?
(In reply to Peter Wu from comment #16) > Ok, do you have any certainty about the earliest BIOS date where _PR3 is > used? E.g. if the minimum date is lowered to 2014 without checking for _PR3, > would it be likely to miss out some models? It's always used if it's available. I think 2014 should be ok (should catch most of the early ones), but I'll check with our windows architects. I suspect it will come down to OEM/ODMs.
*** Bug 98687 has been marked as a duplicate of this bug. ***
(In reply to Peter Wu from comment #16) > Ok, do you have any certainty about the earliest BIOS date where _PR3 is > used? E.g. if the minimum date is lowered to 2014 without checking for _PR3, > would it be likely to miss out some models? Can you add me to any relevant threads? I'd like to help get this fixed for 4.9.
(In reply to Alex Deucher from comment #19) > (In reply to Peter Wu from comment #16) > > Ok, do you have any certainty about the earliest BIOS date where _PR3 is > > used? E.g. if the minimum date is lowered to 2014 without checking for _PR3, > > would it be likely to miss out some models? > > Can you add me to any relevant threads? I'd like to help get this fixed for > 4.9. I will, but did you get any feedback from the Windows architects yet? I'd like to get some solid reasons that support lowering the version requirement.
(In reply to Peter Wu from comment #20) > > I will, but did you get any feedback from the Windows architects yet? I'd > like to get some solid reasons that support lowering the version requirement. Still waiting to hear back if they have any opinion on the dates. On windows it will always use PR3 if it's available.
Update from our windows team: Hybrid graphics platforms using _PR3 were first available in Oct 2013 (when windows 8.1 was released).
Nayan, as a workaround you may add the pcie_port_pm=force option to your kernel command line, then you should not need the patch. Can you check this? Alex, I don't think that the minimum date should change in 4.8 (and 4.9?) due to the risk of breakage since it is not just limited to amdgpu/radeon. Another concern is that while the year seems a good heuristic, it does not match the checks that Windows 8 performs (which may or may not be an issue): https://msdn.microsoft.com/en-us/windows/hardware/drivers/bringup/firmware-requirements-for-d3cold I'll bring this up with linux-pci developers after the weekend. Should I proceed with proposing workaround amdgpu/radeon patches?
Created attachment 127944 [details] dmesg-with-pcie_port_pm=force Without the patch and with the option pcie_port_pm=force, I get a bunch of errors on resume. I attaching the dmesg.
"Resume" means "runtime resume" as opposed to "resume from system sleep" right (based on the logs). When this happens, what is the output of "sudo lspci -s4: -nnvv"? (is the "rev" field "ff" or something else? Does the Power Management register report D0 or D3?) Can you reproduce the problem in this way: 1. Blacklist amdgpu (and radeon if enabled) 2. Boot system 3. Load amdgpu 4. Wait for runtime PM to kick in (5 seconds) 5. Wake up (e.g. "lspci") If so, can you build your kernel with CONFIG_ACPI_DEBUG=y and either boot with acpi.trace_state=enable? If that gives too many messages, you can also try setting the option between step 2 and 3: echo enable > /sys/module/acpi/parameters/trace_state What is the dmesg for this? (At this moment it is not clear for me what the port PM is doing versus what amdgpu is doing, hopefully some logs will clarify this).
Created attachment 127953 [details] dmesg-with-amdgpu-blacklisted I meant runtime resume. I am attaching the dmesg after blacklisting amdgpu and loading it without the debug option. I will recompile the kernel with the debug options and upload it whenever I am free.
Created attachment 127954 [details] sudo lspci -s4: -nnvv
Ok, when possible, please try to obtain a dmesg with CONFIG_ACPI_DEBUG=y and the trace_state option as described above.
(In reply to Peter Wu from comment #23) > Alex, I don't think that the minimum date should change in 4.8 (and 4.9?) > due to the risk of breakage since it is not just limited to amdgpu/radeon. > Another concern is that while the year seems a good heuristic, it does not > match the checks that Windows 8 performs (which may or may not be an issue): > https://msdn.microsoft.com/en-us/windows/hardware/drivers/bringup/firmware- > requirements-for-d3cold I agree that using the date is probably not a good idea, but it's what we have now. Adjusting the date seems lower impact than changing the policy for these kernels. > > I'll bring this up with linux-pci developers after the weekend. Should I > proceed with proposing workaround amdgpu/radeon patches? Sounds good. I generally don't like hacking around this in the driver, but I guess it's better than nothing at this point. Thanks for your help with this.
Created attachment 128042 [details] dmesg_with_debug_options Sorry for the delay, I compiled my kernel with CONFIG_DEBUG_ACPI=y followed the steps that you mentioned.
(In reply to Nayan Deshmukh from comment #30) > Created attachment 128042 [details] > dmesg_with_debug_options > > Sorry for the delay, I compiled my kernel with CONFIG_DEBUG_ACPI=y followed > the steps that you mentioned. Have you enabled acpi.trace_state=enable (either via cmdline or before loading the module)? It does not appear in the logs.
Created attachment 128064 [details] dmesg_with_debug_options
(In reply to Nayan Deshmukh from comment #32) > Created attachment 128064 [details] > dmesg_with_debug_options These logs do not show the "resume failed" messages that appeared in the original logs. Could you try to trigger this? The logs also shows no \_SB.PCI0.RP05.PC05._ON (or _OFF) method invocations, so pcie_port_pm=off (or force) is likely not going to make a difference here. Btw, to increase the kernel log size from 256K you could pass log_buf_len=1M to your cmdline. Could you try obtaining a log where the "resume failed" messages are shown? These messages also look suspicious, any idea what it could be? (Alex?) [ 221.378665] VI should always have 2 performance levels [ 221.380787] failed to send message 18a ret is 255 [ 221.380794] failed to send pre message 145 ret is 255 [ 221.621346] failed to send message 146 ret is 0 [ 221.862589] failed to send pre message 148 ret is 0 They also appear in bug 97240 and bug 97548.
Created attachment 128075 [details] dmesg-without_blacklist Sorry for the previous logs they had an extra patch so please ignore them. I can get the same error messages if the amdgpu is not blacklisted.
Created attachment 128076 [details] dmesg_with_blacklist This is the dmesg with blacklisting amdgpu, it also leads to similar errors but this is a bit different from the previous one.
The bug is fixed in 4.9-rc7. Thanks for the effort Peter.
Closing resolved+Fixed. Verified by Reporter.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.