Bugzilla – Bug 20954
mesa/drm(git): kernel panic with radeon driver (Radeon 9500 Pro)
Last modified: 2014-01-23 11:09:53 UTC
Created attachment 24374 [details]
Kernel panic message
When I reboot or power off my Linux box, the radeon driver panics. I am unable to dump the message so I've taken a photo and attached it to the bug.
With git bisect I've been able to determine the patches that caused the problem:
If I close the gnome session, switch to a terminal and try to poweroff or reboot, nothing bad happens.
I'm on Gentoo Linux 2.6.28 with x11 overlay. Please, feel free to ask for more information.
I just moved the vblank_cleanup after lastclose, give that a try and see if it is resolved.
(In reply to comment #1)
> I just moved the vblank_cleanup after lastclose, give that a try and see if it
> is resolved.
No, the same kernel panic still appears.
Created attachment 24695 [details]
Kernel panic complete message
Here is the complete panic message. I've used objdump with the radeon.ko module to locate the specific line that causes the error. It seems to be this:
de10: 8b 80 e0 00 00 00 mov 0xe0(%eax),%eax
de16: 8b 40 10 mov 0x10(%eax),%eax
de19: 05 14 02 00 00 add $0x214,%eax
de1e: 8b 00 mov (%eax),%eax
de20: 83 c4 0c add $0xc,%esp
de23: c3 ret
The problem is 0xde16(radeon_get_vblank_counter+0x76) operating with eax = NULL as you can see in the photo. I guess that dev_priv->mmio is NULL and that triggers the oops. I've tested this hypothesis adding a check:
if (dev_priv->mmio == NULL)
This avoids the panic.
Well, this is the information I've been able to gather but I'm not an expert so I'm sorry if I've erroneously brought more darkness than light :)
(In reply to comment #3)
> if (dev_priv->mmio == NULL)
> return -EINVAL;
Looks like a good solution.
Created attachment 25010 [details] [review]
Proposed patch to avoid null pointer use.
Many more reports:
This still exists in latest kernel src and drm git. Rafael, are you going to submit for inclusion?
Dug around in the source to try and confirm the fix, but I can't find the code that would ever make this mmio pointer be NULL. I wonder if we are operating on this structure after it has been freed? Or am I missing something?
(In reply to comment #6)
> Many more reports:
> This still exists in latest kernel src and drm git. Rafael, are you going to
> submit for inclusion?
How? I thought that filing a bug report was the way to fix this issue.
> Dug around in the source to try and confirm the fix, but I can't find the code
> that would ever make this mmio pointer be NULL. I wonder if we are operating on
> this structure after it has been freed? Or am I missing something?
I reviewed the code and also I was unable to find the sentence that turns NULL the pointer. To be honest, I don't see my patch as a fix because it attacks the symptom and not the illness. I'll try again to find the point where "dev_priv->mmio" is "nullified" but I'm afraid that I won't get anything useful.
I've been using the KMS branch  for a while and the panic is gone.
Appologies for reopening this bug if it truly is fixed in the KMS branch. I have been unable to get KMS setup on my machine, but have tried my best to analyze the changes in linux-next to check if this is truly fixed but I can not see how it would be. Also note, I am not an advanced kernel or DRI hacker, so this analysis may be completely bogus. My analysis of the problem is as follows:
During X shutdown, RADEONDRICloseScreen (radeon_dri.c:1915) in the X driver calls the radeon_cp_init (radeon_cp.c:1313) ioctl with RADEON_CLEANUP_CP. radeon_cp_init then calls radeon_do_cleanup_cp (radeon_cp.c:1221) which sets mmio to NULL when it memset's most of dev_priv to 0.
Later at module unload time (or pci unplug time in the KMS branch) drm_unload (drm_drv.c:358) (drm_put_dev (drm_stub.c:490) in KMS) calls drm_vblank_cleanup (drm_irq.c:97) which calls vblank_disable_fn (drm_irq.c:74) which calls radeon_get_vblank_counter (radeon_irq.c:284) and causes an Oops trying to read from mmio, which is NULL.
I'm not sure if the solution is to not zero mmio during cleanup or protect against it being NULL in get_vblank_count or not call get_vblank_count during vblank cleanup (presumably by saving the last vblank during driver cleanup). I don't know enough about the code to make that determination.
As best I can tell, this sequence happens in both the current git master as well as the linux-next KMS branch, but perhaps I am overlooking something... my apologies if this really is properly fixed.
Thanks for reconsidering it.
I'm pretty sure this has been fixed. If not, please reopen.