Summary: | Kernel NULL pointer crash when viewing big images in Firefox on Radeon XPress 200M | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Matthijs Kooijman <matthijs> | ||||||||
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel> | ||||||||
Status: | RESOLVED FIXED | QA Contact: | |||||||||
Severity: | normal | ||||||||||
Priority: | medium | CC: | lisaev, pumba88 | ||||||||
Version: | unspecified | ||||||||||
Hardware: | x86-64 (AMD64) | ||||||||||
OS: | Linux (All) | ||||||||||
Whiteboard: | |||||||||||
Attachments: |
|
Description
Matthijs Kooijman
2010-10-06 03:08:00 UTC
Can you attach your dmesg output? Looking at the code, I am puzzled as to how this could happen. The segfault happens because bo->list.next is NULL, which should be impossible: we init the list before calling ttm, and every time we manipulate this list it's with list_del_init.

Thanks for your comments, they allowed me to look around the code a bit and add some debugging instrumentation. I still don't understand the code or your comments completely though.

Regarding your comments: How are you so sure the problem occurs because bo->list.next is NULL? Couldn't it just as well be bo->list.prev, which is also dereferenced in list_del_init? Also, couldn't it be other values, like bo itself, or bo->rdev, etc.?

I've tried to add some debugging output to my kernel to find out exactly which code paths are taken and to confirm that it is indeed those list pointers causing the problem. However, I have not been able to reproduce the exact same problem so far. Out of four tries, the machine locked up completely three times, so I could not get at my debug output through SSH. The fourth time the machine was still responsive, but the crash was different: instead of a NULL pointer dereference, it hit a BUG() in ttm_bo_vm_insert_rb (at ttm_bo.c:1614, though your line numbers might be slightly different due to my patches).

I'm attaching the dmesg output of this crash, which also includes the patch I applied at the bottom. Note that the "radeon_ttm_bo_destroy: Calling list_del_init with bo->list.prev: %p and bo->list.next: %p\n" message is useless, since I forgot to actually pass those arguments to printk.

I suspect that both of these are symptoms of the same underlying problem; perhaps this helps to find that problem. If you have thoughts about possible causes, please think aloud. I might be able to confirm or disprove any suspicions with some instrumentation.

Created attachment 39406 [details]
Dmesg log and patch for a BUG() in ttm_bo_vm_insert_rb
Created attachment 39408 [details]
Kernel log messages from a normal system boot
Here's the dmesg output (or actually, stuff from /var/log/kern.log) from a normal system boot. Is this what you needed?
I have the same problem with Xpress 200M and Ubuntu 10.10 32-bit (xserver-xorg-video-ati 6.13.1-1ubuntu5, linux-image-2.6.35-22.34 (2.6.35.4), xorg-server 1.9.0-0ubuntu7). The freeze occurs when I view specific pictures in Firefox.

Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831735] BUG: unable to handle kernel NULL pointer dereference at 00000004
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831756] IP: [<f84832f1>] radeon_ttm_bo_destroy+0x31/0xa0 [radeon]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831818] *pde = 56116067
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831826] Oops: 0002 [#1] SMP
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831836] last sysfs file: /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831845] Modules linked in: binfmt_misc parport_pc ppdev arc4 joydev snd_hda_codec_realtek pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi radeon snd_seq_midi_event snd_seq ath5k snd_timer snd_seq_device mac80211 ttm drm_kms_helper drm ath yenta_socket i2c_algo_bit pcmcia_rsrc snd ati_agp psmouse cfg80211 soundcore snd_page_alloc shpchp pcmcia_core serio_raw i2c_piix4 led_class agpgart video output lp parport 8139too usbhid 8139cp hid mii sata_sil pata_atiixp
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831966]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831976] Pid: 1006, comm: Xorg Not tainted 2.6.35-22-generic #34-Ubuntu Satellite L30/Satellite L30
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.831986] EIP: 0060:[<f84832f1>] EFLAGS: 00213286 CPU: 0
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832027] EIP is at radeon_ttm_bo_destroy+0x31/0xa0 [radeon]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832035] EAX: f32b8800 EBX: f32b8800 ECX: 00000000 EDX: 00000000
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832042] ESI: f32b8850 EDI: f400c404 EBP: f42e9d3c ESP: f42e9d30
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832050] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832058] Process Xorg (pid: 1006, ti=f42e8000 task=f322cc20 task.ti=f42e8000)
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832064] Stack:
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832069] f32b8850 f32b8878 f400c404 f42e9d50 f8322051 f32b8878 f8321fe0 f400c414
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832088] <0> f42e9d60 c0352edd f32b8850 f400c404 f42e9d74 f832477c f32b8874 f8324710
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832109] <0> 00000001 f42e9d84 c0352edd f400c404 f32b8850 f42e9d94 f83225f2 fffffff4
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832132] Call Trace:
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832158] [<f8322051>] ? ttm_bo_release_list+0x71/0xb0 [ttm]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832174] [<f8321fe0>] ? ttm_bo_release_list+0x0/0xb0 [ttm]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832191] [<c0352edd>] ? kref_put+0x2d/0x60
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832207] [<f832477c>] ? ttm_bo_release+0x6c/0x90 [ttm]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832223] [<f8324710>] ? ttm_bo_release+0x0/0x90 [ttm]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832233] [<c0352edd>] ? kref_put+0x2d/0x60
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832249] [<f83225f2>] ? ttm_bo_unref+0x32/0x50 [ttm]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832266] [<f83246a2>] ? ttm_bo_init+0x1a2/0x1f0 [ttm]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832309] [<f8483521>] ? radeon_bo_create+0x111/0x240 [radeon]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832355] [<f84832c0>] ? radeon_ttm_bo_destroy+0x0/0xa0 [radeon]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832408] [<f8496d36>] ? radeon_gem_object_create+0x76/0xe0 [radeon]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832459] [<f8496e07>] ? radeon_gem_create_ioctl+0x67/0xe0 [radeon]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832474] [<c01c645d>] ? t_start+0x8d/0x90
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832511] [<f81fd98d>] ? drm_ioctl+0x1ad/0x430 [drm]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832524] [<c01c645d>] ? t_start+0x8d/0x90
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832571] [<f8496da0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832590] [<c032e636>] ? apparmor_file_permission+0x16/0x20
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832605] [<c0302a84>] ? security_file_permission+0x14/0x20
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832620] [<c02185d2>] ? rw_verify_area+0x62/0xd0
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832630] [<c01c645d>] ? t_start+0x8d/0x90
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832640] [<c0226622>] ? vfs_ioctl+0x32/0xb0
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832667] [<f81fd7e0>] ? drm_ioctl+0x0/0x430 [drm]
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832676] [<c0226eb9>] ? do_vfs_ioctl+0x79/0x2d0
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832685] [<c0227177>] ? sys_ioctl+0x67/0x80
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832695] [<c01c645d>] ? t_start+0x8d/0x90
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832707] [<c05c9114>] ? syscall_call+0x7/0xb
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832717] [<c01c645d>] ? t_start+0x8d/0x90
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832726] [<c01c645d>] ? t_start+0x8d/0x90
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832732] Code: 89 1c 24 89 74 24 04 89 7c 24 08 0f 1f 44 00 00 8d 58 b0 89 c6 8b 83 34 01 00 00 05 e0 09 00 00 e8 e5 46 14 c8 8b 56 b0 8b 43 04 <89> 42 04 89 10 89 5e b0 8b 83 34 01 00 00 89 5b 04 05 e0 09 00
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832848] EIP: [<f84832f1>] radeon_ttm_bo_destroy+0x31/0xa0 [radeon] SS:ESP 0068:f42e9d30
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832896] CR2: 0000000000000004
Oct 17 17:05:15 SatelliteL30 kernel: [ 2708.832965] ---[ end trace 4821b7bd8308a654 ]---

In my case the freeze happens before the picture shows up.

Is this still happening with 2.6.36? I'm going to try and upgrade my rs480 today but it's a slow machine ;-)

Okay on 64-bit rs480 here, but with 2.6.36-rc8 I'm not able to crash it loading the xkcd large image. So hopefully someone else can ;-)

*** Bug 31038 has been marked as a duplicate of this bug. ***

I've just compiled 2.6.36, which no longer shows this problem. The big images are rendered properly now, there is no lockup and no oopses etc. in dmesg.

Resolving per comment #11, thanks for the update.