Summary: | RX-480 [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD) | ||
---|---|---|---|
Product: | DRI | Reporter: | HV <suzaku.29a> |
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | major | ||
Priority: | medium | CC: | maxim.cournoyer |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Description
HV
2017-05-08 01:38:51 UTC
Created attachment 131249 [details]
dmesg output with amdgpu loading successfully
Created attachment 131250 [details]
lspci short output
Sounds like maybe a power supply issue. Do you have another power supply you could try?

Not at the moment. Maybe some cheapo PSU lying around, but I'm not sure it's going to be able to handle the draw. I currently have an Antec VP500P (500W) and the GPU works on Windows 7 on the same PC (installed alongside Debian and Gentoo). It has been used extensively on Windows without any issues. My friend's PC does have a more powerful PSU (a Seasonic 750W). So maybe during init it needs a bit more of a push on Linux to start up. Is this possible? I'll borrow the 750W PSU during the weekend and give it a try. In the meantime, is there anything else I can test or info I can provide? If not, I'll update/confirm during the weekend.

I tested a Seasonic 750W PSU with the RX480 but I got the same error on boot (amdgpu not loading; the rest of the system boots normally and I still get display output with nomodeset on). Is there anything else I can test for this issue? Regards, HV

Hi! I have the exact same issue ``[drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)`` with an R9 285 GPU. I've had this problem for ages and have been using nomodeset to get by. I'm trying this on Debian 9 (stretch), with kernel ``Linux debian 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux``. I've attached the full dmesg.

The interesting thing is that this seems to be related to the motherboard; when using the very same card (R9 285) in another system, *with the same software* (Debian 9), it works! It is not a hardware problem: the power supply is brand new (Seasonic G550W), the RAM tests fine, the SSD is brand new, the CMOS battery too, etc. The problem occurs on an Asus M2N SLI Deluxe motherboard based system, and it disappears when using the card with an equally old Asus P5W DH Deluxe based system. I notice there is a message saying that the clock source is unstable right before the error occurs; could it be related?
Here is an excerpt of the dmesg:

[ 11.552754] failed to send pre message 5b ret is 0
[ 11.748489] failed to send message 5b ret is 0
[ 11.748536] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 11.748538] clocksource: 'acpi_pm' wd_now: f69088 wd_last: ba62a0 mask: ffffff
[ 11.748539] clocksource: 'tsc' cs_now: 167976623b cs_last: 12698f30d7 mask: ffffffffffffffff
[ 11.957731] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 11.957840] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
[ 11.957879] amdgpu 0000:03:00.0: amdgpu_init failed
[ 12.171671] failed to send pre message 133 ret is 0
[ 12.385424] failed to send message 133 ret is 0
[ 12.385433] DPM is not running right now, no need to disable DPM!
[ 12.386774] clocksource: Switched to clocksource acpi_pm
[ 12.772038]

I can provide dmesg from the working system (using the same software with the same card) if judged useful.

Created attachment 133465 [details]
Success with customized kernel version 4.13.0-rc2+
I managed to make it work by compiling the latest 4.13.0-rc2+ from git://people.freedesktop.org/~agd5f/linux. I used the 'drm-next-4.14-wip' branch and customized it a bit (added the CIK option for the amdgpu driver, removed AGP support, bumped the event rate to 1000 Hz, dropped a few Intel-specific options since I'm using an AMD K10 class CPU, and enabled a few AMD-specific options). I'll try to narrow down exactly what fixes it, but one thing we can see is that clock source skew problems no longer appear in the kernel messages.

Created attachment 133466 [details]
Debian 9 stretch stock kernel failing to initialize R9 285 on Asus M2N SLI Deluxe motherboard
I had forgotten to attach that one.
A few more data points. None of the 'vanilla' kernels could initialize the R9 285 (tonga 1.2) card on this Asus M2N SLI Deluxe motherboard (remember that the same card works easily on an Asus P5W DH Deluxe based system). I've tried building the following kernels (reusing the Debian 9 stable 4.9.0 kernel config as a starting point) and booted them, but I would always get a CAFEDEAD error:

* 4.11.0 from stretch-backports (didn't need to build this one)
* 4.12.7 from kernel.org
* 4.13.0-rc4-1 from kernel.org

None of them worked. So my only success so far is with the drm-next-4.14-wip branch from git://people.freedesktop.org/~agd5f/linux.

I've included the dmesg I get when booting off the 4.13.0-rc4 kernel; it has new error output mentioning powerplay:

[ 5.514767] amdgpu: [powerplay] failed to send message 254 ret is 0
[ 5.514793] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
[ 5.515491] amdgpu: [powerplay] Invalid VDDGFX value!
[ 5.515491] amdgpu: [powerplay] Get EVV Voltage Failed. Abort Driver loading!
[ 5.515493] amdgpu: [powerplay] amdgpu: powerplay initialization failed
[ 5.703544] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 5.703601] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
[ 5.703630] amdgpu 0000:03:00.0: amdgpu_init failed

Created attachment 133478 [details]
dmesg failed init with 4.13.0-rc4
Failure to initialize an R9 285 on an Asus M2N SLI Deluxe motherboard with the 4.13.0-rc4 kernel.
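For anyone comparing logs across kernels, the failure signature quoted in the comments above can be searched for mechanically. What follows is only a minimal sketch, assuming the dmesg output has been saved as plain text. The function name `scan_dmesg` and both patterns are illustrative: they are derived from the excerpts in this report and are not part of any amdgpu or kernel tooling.

```python
import re

# Patterns derived from the dmesg excerpts quoted in this report (assumption:
# other kernel versions may word these messages differently).
RING_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:gfx_v8_0_ring_test_ring \[amdgpu\]\] \*ERROR\* "
    r"amdgpu: ring (?P<ring>\d+) test failed "
    r"\(scratch\((?P<reg>0x[0-9A-Fa-f]+)\)=(?P<val>0x[0-9A-Fa-f]+)\)"
)
TSC_UNSTABLE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+clocksource: .*Marking clocksource 'tsc' as unstable"
)


def scan_dmesg(text):
    """Return (ring_failures, tsc_warning_timestamps) found in a dmesg dump."""
    failures = [
        {
            "ts": float(m["ts"]),
            "ring": int(m["ring"]),
            "reg": m["reg"],
            "val": m["val"],
        }
        for m in RING_FAIL.finditer(text)
    ]
    warnings = [float(m["ts"]) for m in TSC_UNSTABLE.finditer(text)]
    return failures, warnings


# Two lines copied from the excerpt earlier in this report.
sample = (
    "[   11.748536] clocksource: timekeeping watchdog on CPU0: Marking "
    "clocksource 'tsc' as unstable because the skew is too large:\n"
    "[   11.957731] [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: "
    "ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)\n"
)
failures, warnings = scan_dmesg(sample)

# True when a tsc instability warning was logged before the first ring failure.
tsc_warning_first = bool(failures and warnings and warnings[0] < failures[0]["ts"])
```

Running this over the attached logs would only show whether the two messages co-occur and in what order; it does not establish that the clock source skew causes the ring test failure.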
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/165.