Bug 28671

Summary: Seg. fault and Oops with Radeon KMS (v2.6.34) on PPC ATI Radeon AGP r420 JH.
Product: DRI Reporter: Brett Witherspoon <spoonb>
Component: DRM/Radeon Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: spoonb
Version: unspecified   
Hardware: PowerPC   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description | Flags
Dmesg with call trace. | none
possible fix | none
dmesg after starting X | none
Xorg log with KMS patch applied. | none
Xorg.log seg fault after quitting wm with patch and noaccel enabled. | none
lspci -vnn for video card. | none
attempt to fix connector tables | none
new patch | none
dmesg: first connector table patch | none
take 3 | none
dmesg: after take 3 patch. | none
take 4 | none
reverse digital encoder mapping | none

Description Brett Witherspoon 2010-06-22 09:44:22 UTC
Created attachment 36419 [details] [review]
Dmesg with call trace.

Mentioned on #radeon:

Hello, I have recently adopted a PowerMac G5 (machine7,3 circa 2003-04) with an ATI Radeon X800 XT Mac Edition (AGP chipset R420 JH). When attempting to use the new KMS Radeon drivers (kernel v. 2.6.34) I get a seg. fault and kernel Oops. I am using PPC Gentoo Linux with a 64-bit kernel and 32-bit userland. I have collected the call trace that is given after inserting the module with modesetting enabled. Attached is my entire dmesg with the call trace at the bottom. The module was inserted via SSH with no other fb or video drivers enabled in the kernel.

Let me know if I can provide any additional information that would be useful and I am happy to do any testing.

Thanks,
B. Witherspoon
Comment 1 Alex Deucher 2010-06-22 10:03:59 UTC
Created attachment 36421 [details] [review]
possible fix

Try this patch.  It should avoid this oops, but there may be other changes required for that card.
Comment 2 Brett Witherspoon 2010-06-22 12:30:57 UTC
Created attachment 36423 [details] [review]
dmesg after starting X
Comment 3 Brett Witherspoon 2010-06-22 12:32:13 UTC
Yes, this does avoid the oops. Thank you. The fb console works just as it should. As you expected, there are still some issues with X. The issue is very much like what I experienced with X without KMS when I was testing. I am not sure whether this is a separate issue and another bug report should be filed. If so let me know, but I will report what I have so far here.

Using:
libdrm-2.4.21
mesa-7.8.2
xorg-1.8.1.901
ati-drivers-6.13.0

After starting X I get a cursor and multi-color lines in the background. The window manager and other apps are started, but are not visible. Attempting to switch VT will cause the machine to lock up. It also seems to lock up after a few minutes even if nothing is done. Xorg.log doesn't have any errors I can see, but dmesg after X is started does have some *ERROR* messages. I have attached both. The *ERROR* messages in dmesg are repeated until the machine eventually locks. I was in a bit of a hurry to get the messages before it locked, so they only occur once or twice here. Without KMS the issue is almost the same, but I get a nasty race condition prior to the lockup.

I will also attempt to get some sort of backtrace if it is helpful. I will need to re-compile with debugging symbols. Should libdrm, mesa, xorg, and ati-drivers be enough? I am not sure if gdb is going to be useful in this case but I am happy to give it a run tonight.
Comment 4 Brett Witherspoon 2010-06-22 12:36:00 UTC
Created attachment 36424 [details]
Xorg log with KMS patch applied.
Comment 5 Alex Deucher 2010-06-22 13:39:41 UTC
You are getting a GPU lockup.  You might try:
Option "NoAccel" "True"
in the device section of your xorg.conf as a workaround for the time being.
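
For reference, a minimal sketch of what such a device section could look like. The Identifier string is a placeholder, and Driver "radeon" is an assumption about the setup in use; neither is taken from this report.

```
Section "Device"
    # Placeholder name; use whatever identifier your existing config has.
    Identifier "Radeon X800 XT"
    # Assumed driver name for the xf86-video-ati radeon driver.
    Driver     "radeon"
    # Workaround from this comment: turn off 2D acceleration.
    Option     "NoAccel" "True"
EndSection
```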
Comment 6 Alex Deucher 2010-06-22 13:42:55 UTC
Also, what connectors does the card actually have on it?  DVI + VGA + TV?  DVI + DVI + TV?  We probably need a custom connector table for it.  Finally, can you attach the output of lspci -vnn for the card?
Comment 7 Brett Witherspoon 2010-06-22 17:12:29 UTC
Ah, I had tried NoAccel previously with limited success; however, with your patch applied it works much better. The window manager and apps are visible and there is no lockup when I switch VT's.

However, X does seg fault when I quit the wm. I have attached the backtrace found in the Xorg.log from that seg fault in case it is useful, and the lspci output you requested.

The connectors on the card are quite different from the generic connector table the driver seems to use. The connectors are DVI + ADC (Apple Display Connector, DVI+USB for Cinema Displays; it was available prior to 2005). In case it is relevant, I am currently using the DVI output with a VGA adaptor since I ran out of DVI cables.

Please let me know if I can provide anything else.
Comment 8 Brett Witherspoon 2010-06-22 17:17:03 UTC
Created attachment 36428 [details]
Xorg.log seg fault after quitting wm with patch and noaccel enabled.
Comment 9 Brett Witherspoon 2010-06-22 17:17:41 UTC
Created attachment 36429 [details]
lspci -vnn for video card.
Comment 10 Alex Deucher 2010-06-22 22:51:41 UTC
(In reply to comment #7)
> Ah, I had tried NoAccel previously with limited success, however with your
> patch applied it works much better. The window manager and apps are visible and
> there is no lockup when I switch VT's. 
> 
> However, X does seg fault when I quit the wm. I have attached the backtrace
> found in the Xorg.log from that seg fault in case it is useful and the lspci
> output you requested.
> 

That might be fixed in xf86-video-ati from git master.

As to the GPU lockup, you might try disabling EXA accel routines to see if you can narrow down the problem.  E.g.,
Option "EXANoComposite"
Option "EXANoDownloadFromScreen"
Option "EXANoUploadToScreen"
in the device section of your xorg config.

> The connectors on the card are quite different from the generic connector table
> the driver seems to use. The connectors are DVI + ADC (Apple Display Connector
> DVI+USB for Cinema Displays; it was available prior to 2005). In case it is
> relevant, I am currently using the DVI output with a VGA adaptor since I ran out
> of DVI cables.
> 
> Please let me know if I can provide anything else.

We'll need to come up with a custom connector table.  I will need your help testing to make sure I got the encoder and ddc line routing correct.
Comment 11 Alex Deucher 2010-06-22 23:04:50 UTC
Actually, your lockups are probably AGP related.  Try adding radeon.agpmode=X to your kernel command line where X = -1 or 1 or 2 or 4 or 8, e.g.,
radeon.agpmode=1
and see if any of those help.
Comment 12 Brett Witherspoon 2010-06-23 13:19:06 UTC
Ok I will pull from the git tree and give those a try.

Changing the AGP mode did not improve any of the symptoms. Also disabling one or all of the EXA options did not seem to help either. Adding an additional "RenderAccel" "False" did show some of the wm but it would still lock. The "NoAccel" option seems to be the only one that makes a difference. 

I am happy to test or help in any way I can.
Comment 13 Alex Deucher 2010-06-25 13:58:08 UTC
Created attachment 36499 [details] [review]
attempt to fix connector tables

This patch attempts to fix the connector tables.  We have 2 dacs, 2 tmds controllers, and 4 ddc lines that could be configured.  Each DVI/ADC port is a combination of 1 dac, 1 tmds controller, and 1 DDC line, so we need to find the right combinations.
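
To give a feel for the size of that search, here is a rough sketch in plain C that counts candidate tables. It assumes the two ports never share a dac, tmds controller, or DDC line; that is an assumption for illustration, not something stated in the patch. The function name and the index-based model are hypothetical and do not come from the radeon driver.

```c
#include <assert.h>

/* Hypothetical model: plain indices stand in for the real resources. */
#define NUM_DACS 2
#define NUM_TMDS 2
#define NUM_DDC  4

/*
 * Count the ways to give two ports one dac, one tmds controller, and one
 * DDC line each, assuming (for illustration only) that the two ports
 * never share a resource.
 */
static int count_port_combinations(void)
{
    int count = 0;

    assert(NUM_DACS >= 2 && NUM_TMDS >= 2 && NUM_DDC >= 2);

    for (int dac0 = 0; dac0 < NUM_DACS; dac0++)
        for (int tmds0 = 0; tmds0 < NUM_TMDS; tmds0++)
            for (int ddc0 = 0; ddc0 < NUM_DDC; ddc0++)
                for (int dac1 = 0; dac1 < NUM_DACS; dac1++)
                    for (int tmds1 = 0; tmds1 < NUM_TMDS; tmds1++)
                        for (int ddc1 = 0; ddc1 < NUM_DDC; ddc1++) {
                            /* Skip tables where both ports use the same resource. */
                            if (dac0 == dac1 || tmds0 == tmds1 || ddc0 == ddc1)
                                continue;
                            count++;
                        }
    return count;
}
```

Under those assumptions the count is 2 x 2 x 12 = 48, which is why getting one resource right at a time (DDC first, via EDID detection) is more practical than trying whole tables blindly.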
Comment 14 Alex Deucher 2010-06-25 14:02:06 UTC
Created attachment 36500 [details] [review]
new patch

Sorry, typo in the last patch.
Comment 15 Brett Witherspoon 2010-06-26 13:03:58 UTC
Created attachment 36526 [details] [review]
dmesg: first connector table patch

Ok, gave this table a try. I had to fuzz a few lines to get the patch to apply to the 2.6.34 sources. I assume you must be working with some git branch. If it would be best that I use the same source you're working with, let me know.

Attached is the dmesg. I noticed the connector_id is the same for both entries. Are we testing with just one connector? The driver reports that there is nothing connected to the connector. I do not know how the Connector [0-1] are enumerated, but I was unable to connect a monitor to the ADC port as I don't have an adaptor yet. I ordered one though, so it should be here by the end of next week.

I do not know of any way to determine which physical connector this would be, and I assume that if the dac, tmds or ddc lines are incorrect it might not be detected. Is this the case?

Also, if it is of any use: the connectors appear to be DVI-D (Analog and Digital, Dual link) and of course the ADC one has some additional pins.
Comment 16 Alex Deucher 2010-06-26 14:53:43 UTC
Created attachment 36527 [details] [review]
take 3

Sorry, that one had a typo as well.  This one should add both connectors.  The patches are based on Dave's drm-fixes branch:
http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=shortlog;h=refs/heads/drm-fixes
Comment 17 Brett Witherspoon 2010-06-26 18:47:33 UTC
I thought that was the case. We have two connectors now, but no input to the monitor (at least on the DVI-D output). So our controllers and ddc lines must be wrong?
Comment 18 Alex Deucher 2010-06-27 15:35:36 UTC
(In reply to comment #17)
> I thought that was the case. We have two connectors now, but no input to the
> monitor (at least on the DVI-D output). So our controllers and ddc lines must be
> wrong?

If no monitor is getting detected, the ddc lines are probably wrong.  In the patch edit these two lines:

+		/* DVI - primary dac, dvo */
+		ddc_i2c = combios_setup_i2c_bus(rdev, RADEON_GPIO_DVI_DDC);

+		/* DVI - tv dac, internal tmds */
+		ddc_i2c = combios_setup_i2c_bus(rdev, RADEON_GPIO_MONID);

Make sure they are not the same value.  Try combinations of the following values:

RADEON_GPIO_VGA_DDC
RADEON_GPIO_DVI_DDC
RADEON_GPIO_CRT2_DDC
RADEON_GPIO_MONID

When you get the right one, you should get an edid from your monitor assuming it has one.
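
As a rough aid for working through those combinations, the sketch below lists every ordered pair of distinct DDC lines for the two connector entries. It is standalone C, not driver code: the names are kept as strings so it builds outside the kernel tree, and `struct ddc_pair` and `list_ddc_pairs` are hypothetical helpers made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* The four DDC line names listed above, kept as strings for this sketch. */
static const char *const ddc_lines[] = {
    "RADEON_GPIO_VGA_DDC",
    "RADEON_GPIO_DVI_DDC",
    "RADEON_GPIO_CRT2_DDC",
    "RADEON_GPIO_MONID",
};
#define NUM_DDC_LINES (sizeof(ddc_lines) / sizeof(ddc_lines[0]))

/* Hypothetical helper type: one candidate assignment for the two entries. */
struct ddc_pair {
    const char *first;   /* DDC line for the first connector entry */
    const char *second;  /* DDC line for the second connector entry */
};

/*
 * Fill `out` with every ordered pair of distinct DDC lines and return how
 * many pairs were written (at most `capacity`).
 */
static size_t list_ddc_pairs(struct ddc_pair *out, size_t capacity)
{
    size_t n = 0;

    assert(out != NULL);

    for (size_t i = 0; i < NUM_DDC_LINES; i++) {
        for (size_t j = 0; j < NUM_DDC_LINES; j++) {
            if (i == j)
                continue;   /* the two entries must not use the same line */
            if (n == capacity)
                return n;   /* caller's buffer is full */
            out[n].first = ddc_lines[i];
            out[n].second = ddc_lines[j];
            n++;
        }
    }
    return n;
}
```

With four lines there are 12 such pairs, so a buffer of 12 entries is enough to hold the whole list.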
Comment 19 Brett Witherspoon 2010-06-27 16:46:46 UTC
What I meant by "no input" is that the monitor is detected: dmesg does not complain about no connected monitors, and I am able to find the correct monitor EDID in the /proc/device-tree/pci@... directory. However, the monitor doesn't seem to be receiving input, as it switches between digital and analog and then goes into standby. So would this mean the dac and tmds controllers are wrong?

The other night I thought I would do a little mix/match experimentation with the different ddc, dac and tmds options you mentioned. I was able to find those 4 ddc symbols in the source and work out how to use them, but I have yet to figure out the dac and tmds lines. It was getting late; I found the declarations I need, but I have not yet been able to determine how to use them. I guess I need some more examples.
Comment 20 Brett Witherspoon 2010-06-27 17:14:09 UTC
I can most likely find the examples I need in the other tables we have here. I have some time tonight so I will do some poking around.

Looking at the dmesg I see that the first connector actually has two dac connectors and the second only has a tmds. Does this mean when adding our dac encoder to the second connector we should pass ATOM_DEVICE_CRT2_SUPPORT for supported_device instead of ATOM_DEVICE_CRT1_SUPPORT? I am unsure of that function. I will attach the dmesg for you.
Comment 21 Brett Witherspoon 2010-06-27 17:16:30 UTC
Created attachment 36554 [details] [review]
dmesg: after take 3 patch.
Comment 22 Brett Witherspoon 2010-06-27 17:32:02 UTC
Making that change brings back video in fbcon, but I have the same effects when starting X. Locked...
Comment 23 Alex Deucher 2010-06-27 22:49:34 UTC
Created attachment 36556 [details] [review]
take 4

Argh!  Typo in take 3 as well.  Try this one...
Comment 24 Alex Deucher 2010-06-27 22:50:45 UTC
Note, these patches will just fix up the connectors.  The GPU hang is another issue.
Comment 25 Brett Witherspoon 2010-06-28 13:51:33 UTC
Ok seems to work well. How will I know when we have the correct combinations for the connector table? It seems that multiple combinations work.
Comment 26 Alex Deucher 2010-06-28 14:09:52 UTC
Ideally you'd test both a digital and analog monitor on both ports and make sure it works on each.  Can you attach your xorg log and dmesg from the final version of the patch?  Also, make sure dpms works.  When X is running, try 'xset dpms force off' and make sure the monitor goes off properly.
Comment 27 Brett Witherspoon 2010-07-19 10:14:47 UTC
Sorry for the delay in my response. I have been preoccupied. 

I tested analog and dmcp works fine. However DVI has no output on monitor with either connector.
Comment 28 Alex Deucher 2010-07-21 17:05:02 UTC
(In reply to comment #27)
> I tested analog and dmcp works fine. However DVI has no output on monitor with
> either connector.

What's dmcp?  With the analog monitor, is the driver able to get an edid from the monitor on both ports?  Check 'xrandr --verbose' to see if there is an edid.  If so, then we've gotten the ddc lines correct and the digital encoders are probably reversed.
Comment 29 Alex Deucher 2010-07-21 17:05:58 UTC
Created attachment 37285 [details] [review]
reverse digital encoder mapping

Does this patch help with the digital monitors?
Comment 30 Brett Witherspoon 2010-09-04 20:39:12 UTC
Again I have to apologize. Now that I am back home I was able to test this. This patch fixes the digital monitors. I tested analog/digital on the DVI port and digital on the ADC port. The adapter I have is not compatible with analog. An EDID is found on both and dpms works also.

Should this be marked resolved? I still have a seg fault when exiting X with noaccel on and a gpu hang with it off. Should these be reported as another bug?
Comment 31 Alex Deucher 2010-09-07 11:43:23 UTC
I've sent the connector patch upstream:
http://lists.freedesktop.org/archives/dri-devel/2010-September/003677.html

As to the hangs with accel, BenH was working on some fixes for that related to the AGP bridge on Macs.
Comment 32 Brett Witherspoon 2010-09-13 12:12:00 UTC
Ok thanks. Marking resolved.