Summary: | mesa/drm(git): kernel panic with radeon driver (Radeon 9500 Pro) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | SpOeK <SpOeK> | ||||||||
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel> | ||||||||
Status: | RESOLVED FIXED | QA Contact: | |||||||||
Severity: | major | ||||||||||
Priority: | medium | CC: | dan, kevin | ||||||||
Version: | DRI git | ||||||||||
Hardware: | x86 (IA32) | ||||||||||
OS: | Linux (All) | ||||||||||
Whiteboard: | |||||||||||
i915 platform: | i915 features: | ||||||||||
Attachments: |
|
Description
SpOeK@DistroBit.Net
2009-03-30 12:04:29 UTC
I just moved the vblank_cleanup after lastclose, give that a try and see if it is resolved. (In reply to comment #1) > I just moved the vblank_cleanup after lastclose, give that a try and see if it > is resolved. > No, the same kernel panic still appears. Created attachment 24695 [details]
Kernel panic complete message
Here is the complete panic message. I've used objdump with the radeon.ko module to locate the specific line that causes the error. It seems to be this:
radeon_get_vblank_counter(radeon_irq.c:307)
return RADEON_READ(RADEON_CRTC2_CRNT_FRAME);
de10: 8b 80 e0 00 00 00 mov 0xe0(%eax),%eax
de16: 8b 40 10 mov 0x10(%eax),%eax
de19: 05 14 02 00 00 add $0x214,%eax
de1e: 8b 00 mov (%eax),%eax
de20: 83 c4 0c add $0xc,%esp
de23: c3 ret
The problem is 0xde16(radeon_get_vblank_counter+0x76) operating with eax = NULL as you can see in the photo. I guess that dev_priv->mmio is NULL and that triggers the oops. I've tested this hypothesis adding a check:
if (dev_priv->mmio == NULL)
return -EINVAL;
This avoids the panic.
Well, this is the information I've been able to gather but I'm not an expert so I'm sorry if I've erroneously brought more darkness than light :)
(In reply to comment #3) > if (dev_priv->mmio == NULL) > return -EINVAL; Looks like a good solution. Created attachment 25010 [details] [review] Proposed patch to avoid null pointer use. Many more reports: https://bugs.gentoo.org/show_bug.cgi?id=264280 This still exists in latest kernel src and drm git. Rafael, are you going to submit for inclusion? Dug around in the source to try and confirm the fix, but I can't find the code that would ever make this mmio pointer be NULL. I wonder if we are operating on this structure after it has been freed? Or am I missing something? (In reply to comment #6) > Many more reports: > https://bugs.gentoo.org/show_bug.cgi?id=264280 > > This still exists in latest kernel src and drm git. Rafael, are you going to > submit for inclusion? How? I thought that filing a bug report was the way to fix this issue. > Dug around in the source to try and confirm the fix, but I can't find the code > that would ever make this mmio pointer be NULL. I wonder if we are operating on > this structure after it has been freed? Or am I missing something? I reviewed the code and also I was unable to find the sentence that turns NULL the pointer. To be honest, I don't see my patch as a fix because it attacks the symptom and not the illness. I'll try again to find the point where "dev_priv->mmio" is "nullified" but I'm afraid that I won't get anything useful. I've been using the KMS branch [1] for a while and the panic is gone. [1] http://airlied.livejournal.com/66958.html Appologies for reopening this bug if it truly is fixed in the KMS branch. I have been unable to get KMS setup on my machine, but have tried my best to analyze the changes in linux-next to check if this is truly fixed but I can not see how it would be. Also note, I am not an advanced kernel or DRI hacker, so this analysis may be completely bogus. My analysis of the problem is as follows: During X shutdown, RADEONDRICloseScreen (radeon_dri.c:1915) in the X driver calls the radeon_cp_init (radeon_cp.c:1313) ioctl with RADEON_CLEANUP_CP. radeon_cp_init then calls radeon_do_cleanup_cp (radeon_cp.c:1221) which sets mmio to NULL when it memset's most of dev_priv to 0. Later at module unload time (or pci unplug time in the KMS branch) drm_unload (drm_drv.c:358) (drm_put_dev (drm_stub.c:490) in KMS) calls drm_vblank_cleanup (drm_irq.c:97) which calls vblank_disable_fn (drm_irq.c:74) which calls radeon_get_vblank_counter (radeon_irq.c:284) and causes an Oops trying to read from mmio, which is NULL. I'm not sure if the solution is to not zero mmio during cleanup or protect against it being NULL in get_vblank_count or not call get_vblank_count during vblank cleanup (presumably by saving the last vblank during driver cleanup). I don't know enough about the code to make that determination. As best I can tell, this sequence happens in both the current git master as well as the linux-next KMS branch, but perhaps I am overlooking something... my apologies if this really is properly fixed. Thanks for reconsidering it. I'm pretty sure this has been fixed. If not, please reopen. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.