Bug 23240 - Neverwinter Nights freezes, X and nwmain consume 100% cpu, and trace shows radeon to blame.
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/r300 (show other bugs)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-10 12:20 UTC by Stephen E. Baker
Modified: 2010-04-10 10:27 UTC (History)
0 users



Attachments
lspci output (1.87 KB, text/plain)
2009-08-25 15:16 UTC, Stephen E. Baker
Current Xorg log (61.48 KB, text/x-log)
2009-08-25 15:21 UTC, Stephen E. Baker

Description Stephen E. Baker 2009-08-10 12:20:16 UTC
During gameplay in NWN, the game freezes and the sound stutters.  The
keyboard does not respond to any key presses (e.g. Ctrl+Alt+Backspace).

If I ssh in I still cannot kill nwmain or X.  They both run at 100% cpu (dual
core) and do not terminate on killall or kill -9.  (nwmain becomes a zombie but
it still consumes 100% cpu)
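
The unkillable-process symptom above can be inspected over ssh with a minimal sketch like the following. It assumes a Linux /proc filesystem; `parse_stat_state` and `describe` are hypothetical helper names, not part of any tool mentioned in this report. A process in uninterruptible sleep (state D), or one stuck inside a kernel call, generally cannot act on signals until it returns from the kernel, which would be consistent with kill -9 having no effect.

```python
from pathlib import Path

# Meaning of the single-letter state codes found in /proc/<pid>/stat.
STATE_NAMES = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible sleep",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat_state(stat_line: str) -> tuple[str, str]:
    """Return (command name, state letter) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' rather than on whitespace.
    left, right = stat_line.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def describe(pid: int) -> str:
    """Read /proc/<pid>/stat (Linux only) and describe the process state."""
    comm, state = parse_stat_state(Path(f"/proc/{pid}/stat").read_text())
    return f"{comm} (pid {pid}): {STATE_NAMES.get(state, state)}"
```

Calling `describe(pid)` with the pid of nwmain or X during a freeze would show whether the process is a zombie, in state D, or still marked as running.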

The freeze occurs on Mesa 7.4 and 7.5 with versions 6.12.1 and 6.12.2 of the radeon driver.  (I haven't tried other versions.)

Reproducible: Always

Steps to Reproduce:
The bug occurs while playing Neverwinter Nights, but not always in the same
place.  I've gotten between 15 minutes and 2 hours in before it crashes.


messages.log of the crash:
Jul 25 23:27:27 goodt60 Uhhuh. NMI received for unknown reason a1 on CPU 0.
Jul 25 23:27:27 goodt60 You have some hardware problem, likely on the PCI bus.
Jul 25 23:27:27 goodt60 Dazed and confused, but trying to continue
Jul 25 23:30:01 goodt60 cron[24468]: (root) CMD (test -x /usr/sbin/run-crons &&
/usr/sbin/run-crons )
Jul 25 23:31:27 goodt60 sshd[24479]: Accepted keyboard-interactive/pam for
stephen from 192.168.1.101 port 45568 ssh2
Jul 25 23:31:27 goodt60 sshd[24479]: pam_unix(sshd:session): session opened for
user stephen by (uid=0)
Jul 25 23:37:32 goodt60 sudo:  stephen : TTY=pts/0 ; PWD=/home/stephen ;
USER=root ; COMMAND=/usr/bin/killall X
Jul 25 23:37:32 goodt60 sudo: pam_unix(sudo:session): session opened for user
root by stephen(uid=0)
Jul 25 23:37:32 goodt60 sudo: pam_unix(sudo:session): session closed for user
root
Jul 25 23:39:46 goodt60 su[24559]: Successful su for root by stephen
Jul 25 23:39:46 goodt60 su[24559]: + pts/0 stephen:root
Jul 25 23:39:46 goodt60 su[24559]: pam_unix(su:session): session opened for
user root by stephen(uid=1000)
Jul 25 23:40:02 goodt60 cron[24565]: (root) CMD (test -x /usr/sbin/run-crons &&
/usr/sbin/run-crons )
Jul 25 23:40:04 goodt60 mtrr: MTRR 2 not used
Jul 25 23:40:05 goodt60 BUG: unable to handle kernel NULL pointer dereference
at 00000001
Jul 25 23:40:05 goodt60 IP: [<f8180009>] radeon_do_cp_idle+0x199/0x1f0 [radeon]
Jul 25 23:40:05 goodt60 *pde = 00000000 
Jul 25 23:40:05 goodt60 Oops: 0000 [#1] SMP 
Jul 25 23:40:05 goodt60 last sysfs file:
/sys/class/power_supply/BAT0/energy_full
Jul 25 23:40:05 goodt60 Modules linked in: radeon usbhid usb_storage ehci_hcd
uhci_hcd iwl3945 usbcore sg joydev
Jul 25 23:40:05 goodt60 
Jul 25 23:40:05 goodt60 Pid: 24364, comm: nwmain Not tainted (2.6.29-gentoo-r5
#1) 20077KU
Jul 25 23:40:05 goodt60 EIP: 0060:[<f8180009>] EFLAGS: 00010202 CPU: 0
Jul 25 23:40:05 goodt60 EIP is at radeon_do_cp_idle+0x199/0x1f0 [radeon]
Jul 25 23:40:05 goodt60 EAX: 00000001 EBX: f649dc00 ECX: 0003ffff EDX: 00027130
Jul 25 23:40:05 goodt60 ESI: f649c400 EDI: f649c400 EBP: f5cc7d08 ESP: f5cc7cf0
Jul 25 23:40:05 goodt60 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Jul 25 23:40:05 goodt60 Process nwmain (pid: 24364, ti=f5cc6000 task=f20bf660
task.ti=f5cc6000)
Jul 25 23:40:05 goodt60 Stack:
Jul 25 23:40:05 goodt60 00000282 f5caebe0 f649c4bc f649dc00 f649c400 f649dc00
f5cc7d28 f8180d36
Jul 25 23:40:05 goodt60 c0363ea1 f5caebe0 f5caebf4 f649c410 f649c400 f649c400
f5cc7d30 f818cf18
Jul 25 23:40:05 goodt60 f5cc7d54 c035ece9 f20a3ac0 00000286 00000008 00000282
f649c410 f6118af8
Jul 25 23:40:05 goodt60 Call Trace:
Jul 25 23:40:05 goodt60 [<f8180d36>] ? radeon_do_release+0xc6/0x130 [radeon]
Jul 25 23:40:05 goodt60 [<c0363ea1>] ? drm_master_destroy+0x101/0x120
Jul 25 23:40:05 goodt60 [<f818cf18>] ? radeon_driver_lastclose+0x8/0x10
[radeon]
Jul 25 23:40:05 goodt60 [<c035ece9>] ? drm_lastclose+0x29/0x2c0
Jul 25 23:40:05 goodt60 [<c035f4f4>] ? drm_release+0x354/0x4f0
Jul 25 23:40:05 goodt60 [<c01824e2>] ? __fput+0xb2/0x1b0
Jul 25 23:40:05 goodt60 [<c01825ff>] ? fput+0x1f/0x30
Jul 25 23:40:05 goodt60 [<c017f647>] ? filp_close+0x47/0x70
Jul 25 23:40:05 goodt60 [<c012b3c0>] ? put_files_struct+0xa0/0xc0
Jul 25 23:40:05 goodt60 [<c012b421>] ? exit_files+0x41/0x50
Jul 25 23:40:05 goodt60 [<c012ce7f>] ? do_exit+0x5cf/0x760
Jul 25 23:40:05 goodt60 [<c0120a9b>] ? dequeue_task_fair+0x3b/0x1d0
Jul 25 23:40:05 goodt60 [<c013395e>] ? recalc_sigpending+0xe/0x40
Jul 25 23:40:05 goodt60 [<c013663b>] ? dequeue_signal+0x2b/0x1a0
Jul 25 23:40:05 goodt60 [<c0113f93>] ? lapic_next_event+0x13/0x20
Jul 25 23:40:05 goodt60 [<c012d043>] ? do_group_exit+0x33/0xa0
Jul 25 23:40:05 goodt60 [<c0136b46>] ? get_signal_to_deliver+0x166/0x370
Jul 25 23:40:05 goodt60 [<c01025e9>] ? do_notify_resume+0x99/0x8b0
Jul 25 23:40:05 goodt60 [<c0145723>] ? getnstimeofday+0x53/0x120
Jul 25 23:40:05 goodt60 [<c02d3ba6>] ? copy_to_user+0x36/0x130
Jul 25 23:40:05 goodt60 [<c01412eb>] ? hrtimer_nanosleep+0x12b/0x170
Jul 25 23:40:05 goodt60 [<c0140890>] ? hrtimer_wakeup+0x0/0x20
Jul 25 23:40:05 goodt60 [<c0141390>] ? sys_nanosleep+0x60/0x70
Jul 25 23:40:05 goodt60 [<c010347e>] ? work_notifysig+0x13/0x19
Jul 25 23:40:05 goodt60 Code: 82 03 00 00 00 40 21 c8 e9 47 ff ff ff 90 8d 74
26 00 8b 83 e4 00 00 00 8b 40 10 8b 00 e9 ae fe ff ff 8b 83 e4 00 00 00 8b 40
10 <8b> 00 e9 6b ff ff ff c7 44 24 04 d2 5a 19 f8 c7 04 24 71 67 19 
Jul 25 23:40:05 goodt60 EIP: [<f8180009>] radeon_do_cp_idle+0x199/0x1f0
[radeon] SS:ESP 0068:f5cc7cf0
Jul 25 23:40:05 goodt60 ---[ end trace 102836e0b42b5cbd ]---
Jul 25 23:40:05 goodt60 Fixing recursive fault but reboot is needed!
Jul 25 23:40:05 goodt60 kdm[8052]: X server for display :0 terminated
unexpectedly
Jul 25 23:40:05 goodt60 su[9287]: pam_unix(su:session): session closed for user
root
Jul 25 23:40:05 goodt60 su[23811]: pam_unix(su:session): session closed for
user root
Jul 25 23:40:05 goodt60 kdm: :0[8159]: pam_unix(kde:session): session closed
for user stephen
Jul 25 23:40:05 goodt60 acpid: client 8105[0:0] has disconnected
Jul 25 23:40:05 goodt60 acpid: client connected from 24579[0:0]
Jul 25 23:40:05 goodt60 acpid: 1 client rule loaded
Jul 25 23:40:20 goodt60 kdm[8052]: X server startup timeout, terminating
Jul 25 23:40:35 goodt60 kdm[8052]: X server termination timeout, killing
Jul 25 23:40:46 goodt60 kdm[8052]: X server is stuck in D state; leaving it
alone
Jul 25 23:40:46 goodt60 kdm[8052]: X server for display :0 cannot be started,
session disabled
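
For reading a log like the one above, a minimal sketch that pulls the faulting symbol out of the "IP:" line of an x86 oops. The regular expression and the `find_fault` name are assumptions made for illustration, based only on the line format visible in this log. Here it would report radeon_do_cp_idle in the radeon module as the location of the NULL pointer dereference.

```python
import re

# Matches the faulting-instruction line of an oops, for example:
#   "IP: [<f8180009>] radeon_do_cp_idle+0x199/0x1f0 [radeon]"
# The trailing "[module]" part is optional because the log may be hard-wrapped.
IP_RE = re.compile(
    r"IP: \[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_fault(log_text):
    """Return details of the first oops IP line in log_text, or None."""
    match = IP_RE.search(log_text)
    if match is None:
        return None
    return {
        "address": match.group("addr"),
        "symbol": match.group("symbol"),
        # Offset of the faulting instruction within the function, in bytes.
        "offset": int(match.group("offset"), 16),
        # Total size of the function, in bytes.
        "size": int(match.group("size"), 16),
        "module": match.group("module"),
    }
```

Passing the contents of messages.log to `find_fault` gives the function and module to mention when reporting the crash.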
Comment 1 Stephen E. Baker 2009-08-10 12:51:21 UTC
A few more things I noticed:
Caps Lock does not respond unless I use Alt+SysRq+R

Sound stops after Alt+SysRq+I but the screen stays frozen on the game no matter what I push

The freezes seem to occur very frequently in particular places, e.g. attacking the white dragon in SoU ch1, while in other places I can play for hours without incident.  Even where they occur frequently, though, I haven't noticed any particular trigger or similarity.

If it wasn't clear, there is no way to use the computer after a freeze short of a reboot.
Comment 2 Stephen E. Baker 2009-08-11 17:02:15 UTC
Today I recreated the freeze. I saw this in my messages log leading up to it:

Aug 11 19:35:59 goodt60 [drm] Num pipes: 1
Aug 11 19:36:05 goodt60 [drm] Num pipes: 1
Aug 11 19:36:10 goodt60 [drm] Num pipes: 1
Aug 11 19:36:12 goodt60 [drm] Num pipes: 1
Aug 11 19:36:12 goodt60 [drm] Num pipes: 1
Aug 11 19:36:14 goodt60 [drm] Num pipes: 1
Aug 11 19:36:21 goodt60 CE: hpet increasing min_delta_ns to 50624 nsec
Aug 11 19:36:22 goodt60 [drm] Num pipes: 1
Aug 11 19:36:24 goodt60 [drm] Num pipes: 1
Aug 11 19:36:25 goodt60 [drm] Num pipes: 1
Aug 11 19:36:26 goodt60 [drm] Num pipes: 1
Aug 11 19:36:30 goodt60 [drm] Num pipes: 1
Aug 11 19:36:32 goodt60 [drm] Num pipes: 1
Aug 11 19:36:33 goodt60 [drm] Num pipes: 1
Aug 11 19:36:34 goodt60 [drm] Num pipes: 1
Aug 11 19:36:38 goodt60 [drm] Num pipes: 1
Aug 11 19:36:40 goodt60 [drm] Num pipes: 1
Aug 11 19:36:41 goodt60 [drm] Num pipes: 1
Aug 11 19:36:42 goodt60 [drm] Num pipes: 1
Aug 11 19:36:47 goodt60 [drm] Num pipes: 1
Aug 11 19:36:53 goodt60 [drm] Num pipes: 1
Aug 11 19:36:53 goodt60 [drm] Num pipes: 1

Does this help?
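
When an excerpt like the one above is dominated by one repeated message, a small sketch such as this can collapse the repeats so that any unusual line stands out (here, the hpet message). It assumes syslog-style lines of the form "Mon DD HH:MM:SS host message"; `message_of` and `summarize` are hypothetical helper names.

```python
from collections import Counter


def message_of(line: str) -> str:
    """Strip the 'Mon DD HH:MM:SS host ' prefix from a syslog-style line."""
    parts = line.strip().split(None, 4)
    # Lines with fewer than five fields are returned unchanged (minus padding).
    return parts[4] if len(parts) == 5 else line.strip()


def summarize(lines):
    """Return (message, count) pairs, most frequent first, skipping blanks."""
    counts = Counter(message_of(line) for line in lines if line.strip())
    return counts.most_common()
```

Feeding it the lines of the messages log from just before a freeze gives a short table of distinct messages and how often each occurred.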
Comment 3 Nicolai Hähnle 2009-08-25 12:30:26 UTC
Some additional information might be useful, especially: What hardware are you using? Consider attaching output of lspci, and your Xorg.0.log.

Also, what about dmesg output *before* things start to go wrong visibly?
Comment 4 Stephen E. Baker 2009-08-25 15:16:39 UTC
Created attachment 28908
lspci output

My computer is a Lenovo ThinkPad T60 with an ATI Radeon X1400.
Comment 5 Stephen E. Baker 2009-08-25 15:21:22 UTC
Created attachment 28909
Current Xorg log

Since filing this bug I've downgraded to Mesa 7.3 and xf86-video-ati 6.12.1, but the freezing occurs with these versions as well as with those mentioned earlier.
Comment 6 Nicolai Hähnle 2009-08-26 06:56:38 UTC
Thank you for the additional information.

This is a typical symptom of hard-to-find and sometimes hardware-specific hardware lockups. The NMI message seems to support a theory I once heard that these lockups are related to the GPU doing stupid things on the bus.

While I can't reproduce this right now and it's hard to tell what's going wrong, here are some things that you could try:

1. If I saw correctly, your card is connected via AGP. Try changing the AGP settings (including disabling AGP entirely).

2. Can you test whether the lockups also occur with a KMS-enabled graphics stack? Unfortunately, that requires updating a lot of things (the kernel with staging drivers enabled, X.Org, and Mesa), but several distributions offer bleeding-edge packages.

To anybody who is listening: Is there a good, updated guide somewhere that explains how to get a KMS graphics stack?
Comment 7 Pauli 2009-08-26 09:47:34 UTC
(In reply to comment #6)
> ...
> To anybody who is listening: Is there a good, updated guide somewhere that
> explains how to get a KMS graphics stack?
> 

http://xorg.freedesktop.org/wiki/radeonBuildHowTo

It needs work to make more detailed info available, but it is a wiki, so in theory anyone could add useful info there.
Comment 8 Stephen E. Baker 2009-08-29 06:39:38 UTC
I switched to Fedora 11 with KMS, but the game was unplayable (missing text, missing textures, very slow).  Setting setnokms fixes these problems in Fedora 11, but I haven't played long enough yet to see whether the game crashes on that distro.

On another note: I'm not certain (this being a laptop), but I thought my graphics card was PCIe.
Comment 9 Stephen E. Baker 2010-04-10 10:27:45 UTC
I haven't been able to reproduce this issue in a recent version of X.

