95015 – [drm:.r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)

Status:	RESOLVED MOVED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Radeon
Version:	XOrg git
Hardware:	PowerPC Linux (All)

Importance:	medium major
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Duplicates (1):	89886
Depends on:
Blocks:

Reported:	2016-04-19 10:25 UTC by Rui Salvaterra
Modified:	2019-11-19 09:15 UTC
CC List:	6 users

Attachments
dmesg (43.70 KB, text/plain) 2016-04-19 10:25 UTC, Rui Salvaterra
dmesg (Radeon on PCIe x8, no IOMMU bypass) (38.74 KB, text/plain) 2016-04-23 10:40 UTC, Rui Salvaterra
dmesg | grep radeon (2.06 KB, text/plain) 2016-05-08 10:24 UTC, Dominik Klementowski
Ubuntu 16.04.2 LTS apport-cli output from a Toshiba P200 laptop (472.47 KB, text/plain) 2017-05-08 19:22 UTC, raulvior.bcn

Description Rui Salvaterra 2016-04-19 10:25:07 UTC

Created attachment 123045 [details]
dmesg

Full dmesg attached. I have a Power Mac G5 (late 2005, PCIe) with a Radeon HD 6450 (CAICOS, x86 BIOS) installed on the PCIe x16 slot, and I get this error on boot. I also have a GeForce 6600 (Open Firmware) on a PCIe x8 slot, which works fine, minus the usual nouveau caveats. I have no idea if this is related at all, but the x16 slot is directly connected to the U4 northbridge, and is capable of bypassing the DART IOMMU for 64-bit DMA capable devices (like the Radeon). The other slots are connected to a PCIe bridge on the HyperTransport bus, and DMA always goes through the DART. If needed, I can provide ssh access to the machine.

Comment 1 Rui Salvaterra 2016-04-23 10:40:59 UTC

Created attachment 123179 [details]
dmesg (Radeon on PCIe x8, no IOMMU bypass)

So, today I tried moving the Radeon card from the x16 to the x8 slot, and it initialised correctly, as you can see in the attached dmesg. This bug is most certainly caused by some kind of interaction between the card and the IOMMU (bypass). Can someone please take a look at this? Like I wrote before, I'm willing to provide ssh access to the machine (Power Mac G5), if the bug is hard to reproduce. Thanks in advance!

Comment 2 Rui Salvaterra 2016-04-23 10:50:43 UTC

Adding Benjamin Herrenschmidt to the cc list, for the original [1] patch.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-August/085302.html

Comment 3 Rui Salvaterra 2016-04-24 12:06:57 UTC

*** Bug 89886 has been marked as a duplicate of this bug. ***

Comment 4 Oded Gabbay 2016-05-03 20:53:26 UTC

Did you try running a 4.5 kernel?
I added a POWER-relevant fix for radeon in 4.5, which relates to caches (c524498 - drm/radeon: mask out WC from BO on unsupported arches)

Comment 5 Rui Salvaterra 2016-05-03 21:05:24 UTC

(In reply to Oded Gabbay from comment #4)
> Did you try running a 4.5 kernel?
> I added a POWER-relevant fix for radeon in 4.5, which relates to caches
> (c524498 - drm/radeon: mask out WC from BO on unsupported arches)

Hi, Oded

I compiled 4.6-rc6 today and the result is the same (actually, now that I look, the second dmesg I attached is from 4.6-rc4). The card works on the x8 slot (behind the DART) but not on the x16 slot (bypassing the DART).

Thanks,

Rui

Comment 6 Dominik Klementowski 2016-05-08 10:24:09 UTC

Created attachment 123548 [details]
dmesg | grep radeon

I use a Lenovo Z51-70 laptop with Intel HD5500 and a discrete Radeon R9 M375.
I have never been able to switch to the faster card and use it with the radeon module. (I was only able to run it with the proprietary fglrx driver, but it ran so badly that the computer wasn't usable at all.)

I did manage to switch once (that was kernel 4.4, I think), but my screen went blank, and after that I could only reboot the laptop to see anything (rebooting with CTRL+ALT+DEL worked normally).
But the content of vgaswitcheroo changed from:
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynOff:0000:04:00.0
to:
0:IGD: :Off:0000:00:02.0
1:DIS:+:DynOff:0000:04:00.0

So THERE'S HOPE, I thought :D

Since kernel 4.5 I get this error, and the /sys/kernel/debug/vgaswitcheroo file no longer exists.

If I can provide logs, run tests or anything else, just let me know.
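
For anyone comparing the two vgaswitcheroo listings above, here is a minimal Python sketch that splits one line into its fields. The names `SwitcherooClient` and `parse_switcheroo_line` and the field labels are hypothetical, chosen only for illustration; the sketch assumes that `+` marks the currently active GPU and that the last field is the PCI address (which itself contains colons).

```python
from typing import NamedTuple


class SwitcherooClient(NamedTuple):
    """One line of vgaswitcheroo output (field labels are illustrative)."""
    index: int
    kind: str          # e.g. "IGD" (integrated) or "DIS" (discrete)
    active: bool       # True when the third field is "+"
    power: str         # e.g. "Pwr", "Off", "DynOff"
    pci_address: str   # e.g. "0000:04:00.0"


def parse_switcheroo_line(line):
    # Split on the first four colons only, so the PCI address stays intact.
    index, kind, active, power, pci = line.rstrip("\n").split(":", 4)
    return SwitcherooClient(int(index), kind, active == "+", power, pci)


if __name__ == "__main__":
    for text in ("0:IGD: :Off:0000:00:02.0", "1:DIS:+:DynOff:0000:04:00.0"):
        print(parse_switcheroo_line(text))
```

Read this way, the second listing says the discrete card was marked active while still reported as `DynOff`.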

Comment 7 Michel Dänzer 2016-05-09 02:04:27 UTC

(In reply to Dominik Klementowski from comment #6)
> I use Lenovo Z51-70 laptop [...]

This bug report is about a problem which is specific to a different platform. Please file your own report for your problem.

Comment 8 j.ribeirovega 2016-06-22 00:05:44 UTC

I can confirm the exact same error with a PowerMac G5 Quad; the only difference is that I have an AMD TURKS card. The card works in all slots except the x16 one.

Comment 9 intermediadc@hotmail.com 2016-06-22 11:06:38 UTC

I face the same bug on a Quad G5 equipped with a 5450 1 GB and a 6570 2 GB (x86 vboards).

Comment 10 raulvior.bcn 2017-05-08 19:22:49 UTC

Created attachment 131264 [details]
Ubuntu 16.04.2 LTS apport-cli output from a Toshiba P200 laptop

I am having the exact same error in a laptop with an Advanced Micro Devices, Inc. [AMD/ATI] RV630/M76 [Mobility Radeon HD 2600].

Comment 11 raulvior.bcn 2017-05-08 19:24:59 UTC

When I say the exact same error, I mean the same address too.

Comment 12 raulvior.bcn 2017-05-08 19:25:56 UTC

(In reply to raulvior.bcn from comment #11)
> When I say the exact same error, I mean the same address too.

This:
 may 08 19:13:31 username-portatil kernel: [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
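
When comparing logs from several machines, it can help to extract the ring number, scratch register offset and value from a line like the one above. The following is a minimal sketch; the regular expression and the name `parse_ring_test_failure` are hypothetical and based only on the message format quoted in this report.

```python
import re

# Matches the tail of the radeon ring test error as quoted in this report.
RING_TEST_RE = re.compile(
    r"ring (?P<ring>\d+) test failed "
    r"\(scratch\((?P<reg>0x[0-9A-Fa-f]+)\)=(?P<value>0x[0-9A-Fa-f]+)\)"
)


def parse_ring_test_failure(line):
    """Return ring, register offset and value as ints, or None if no match."""
    m = RING_TEST_RE.search(line)
    if m is None:
        return None
    return {
        "ring": int(m["ring"]),
        "reg": int(m["reg"], 16),
        "value": int(m["value"], 16),
    }


if __name__ == "__main__":
    sample = ("kernel: [drm:r600_ring_test [radeon]] *ERROR* radeon: "
              "ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)")
    print(parse_ring_test_failure(sample))
```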

Comment 13 Joshua Cogliati 2017-09-15 21:25:01 UTC

Just for reference, this error comes from the line:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/r600.c#n2848

Basically, the r600 driver writes 0xCAFEDEAD into a scratch register, then uses a ring write to try to write 0xDEADBEEF into the same scratch register, waits, and then reads the scratch register again to see what is in it. If it is not 0xDEADBEEF, the test fails and hardware acceleration is turned off.

The important question is why the ring write is failing, and I don't have an answer.
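
To illustrate the sequence described above, here is a toy Python model. It is not the driver code: `FakeGpu` and `ring_test` are hypothetical names, and the "ring" is just a list that is only consumed when `ring_works` is true. It is meant to show why a failed test reports 0xCAFEDEAD: that is the sentinel written directly to the register, which the ring write never replaced.

```python
SCRATCH_SENTINEL = 0xCAFEDEAD   # written directly to the scratch register
RING_TEST_VALUE = 0xDEADBEEF    # expected to arrive via the ring


class FakeGpu:
    """Hypothetical stand-in for the hardware; not part of the radeon driver."""

    def __init__(self, ring_works):
        self.ring_works = ring_works
        self.scratch = 0
        self.pending = []   # queued ring writes not yet executed

    def write_reg(self, value):
        # Direct register write, bypassing the ring.
        self.scratch = value & 0xFFFFFFFF

    def ring_write(self, value):
        # Queue a write; it takes effect only if the ring is processed.
        self.pending.append(value & 0xFFFFFFFF)

    def read_reg(self):
        # A working GPU drains one queued write before the value is read back.
        if self.ring_works and self.pending:
            self.scratch = self.pending.pop(0)
        return self.scratch


def ring_test(gpu, max_polls=100):
    """Return (passed, last value read from the scratch register)."""
    gpu.write_reg(SCRATCH_SENTINEL)
    gpu.ring_write(RING_TEST_VALUE)
    last = gpu.read_reg()
    for _ in range(max_polls):
        if last == RING_TEST_VALUE:
            return True, last
        last = gpu.read_reg()
    return False, last


if __name__ == "__main__":
    for works in (True, False):
        passed, value = ring_test(FakeGpu(ring_works=works))
        print(f"ring_works={works}: passed={passed}, scratch=0x{value:08X}")
```

In this model the failing case ends with the sentinel still in the register, which matches the value in the error message; it says nothing about why the real ring write does not land.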

Comment 14 Martin Peres 2019-11-19 09:15:22 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/708.