Bug 13864 - crash on server restart
Summary: crash on server restart
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: 7.3 (2007.09)
Hardware: Other All
Importance: medium major
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2007-12-30 11:07 UTC by Pierre Ossman
Modified: 2009-01-13 15:09 UTC
CC List: 2 users


Attachments
Xorg.0.log (320.32 KB, text/plain)
2007-12-30 11:22 UTC, Pierre Ossman
DMesg log (37.54 KB, text/plain)
2008-04-14 15:00 UTC, Tomasz Sałaciński
This is a very crude attempt, but it may work. (1.15 KB, patch)
2008-04-15 10:22 UTC, Maarten Maathuis
Syslog with "debug=1" in drm.ko (54.33 KB, text/plain)
2008-07-03 06:23 UTC, Paulo Zanoni
Xorg log (109.83 KB, text/plain)
2008-07-03 06:26 UTC, Paulo Zanoni

Description Pierre Ossman 2007-12-30 11:07:04 UTC
When the X server automatically restarts after the last client disconnects, nouveau will crash if RandR 1.2 is enabled.

I know the RandR 1.2 stuff is still a bit sketchy, so this might be known. But if not, I'll be happy to provide info and test code.
Comment 1 Maarten Maathuis 2007-12-30 11:13:32 UTC
An Xorg log or a backtrace, anything that indicates what went wrong, is needed.
Comment 2 Pierre Ossman 2007-12-30 11:21:59 UTC
These are the final lines from X:

nv_output_restore is called
nv_output_restore is called
nv_output_restore is called
pre-Owner: 0x0
post-Owner: 0x0
pre-Owner: 0x0
post-Owner: 0x0
(EE) NOUVEAU(0): [DRI] Locking deadlock.
        Already locked with context 157577356,
        trying to lock with context 2.
(EE) NOUVEAU(0): Error creating device

Fatal server error:
AddScreen/ScreenInit failed for driver 0
Comment 3 Pierre Ossman 2007-12-30 11:22:59 UTC
Created attachment 13418
Xorg.0.log
Comment 4 Pierre Ossman 2007-12-30 11:44:07 UTC
The crash is also present without RandR 1.2 on this card. A case of bad testing on my part. Sorry.
Comment 5 David Heidelberg (okias) 2008-01-01 17:10:10 UTC
Same problem with Gentoo ~amd64, nVidia 7600 GS, and the latest x11-drm, libdrm and xf86-video-nouveau. If the server is restarted with CTRL-ALT-BACKSPACE, it does not crash. If the session is ended with exit in failsafe mode, or with Logout in KDE, it crashes with the same error in Xorg.0.log.
Comment 6 Tomasz Sałaciński 2008-04-14 15:00:36 UTC
Created attachment 15916
DMesg log

This file is quite long, but in the middle you can see some traces of the nouveau driver doing some nasty things. I haven't trimmed the file because I assume you'd prefer the whole message log.
Comment 7 Tomasz Sałaciński 2008-04-14 15:04:57 UTC
I have the same issue, but I must admit that I am running Fedora 9 Beta and I am using the nouveau driver packaged for the distro:

xorg-x11-drv-nouveau-0.0.10-2.20080408git0991281.fc9.i386

The X server crashes and won't start again (actually, I have to do a hard reboot because the console does not come up; only some error messages appear on the screen, and ALT+Fx doesn't work).

I've attached the dmesg log; search for the phrase "nouveau" and you will find a backtrace.

It shows something like this:

Apr 13 10:13:41 Tommy-PC kernel: Fixing recursive fault but reboot is needed!

And unfortunately it tells the truth :( I had to disable RHGB (Red Hat Graphical Boot, which starts the X server to show the boot progress graphically, then restarts it and starts GDM). Because RHGB starts the X server first, the server won't start again and GDM never shows up.

Packages:

xorg-x11-server-Xorg-1.4.99.901-21.20080407.fc9.i386
[root@Tommy-PC tommy]# uname -r
2.6.25-0.218.rc8.git7.fc9.i686
Comment 8 Tomasz Sałaciński 2008-04-14 15:06:01 UTC
And, arr, my card: GeForce 7100 GS. I have a GeForce 8600 GT lying on my desk, but haven't tested it yet.
Comment 9 Maarten Maathuis 2008-04-15 03:58:56 UTC
I've seen this before: for someone reason it partially restarts while X is still running. I wonder if that is valid behaviour.
Comment 10 Maarten Maathuis 2008-04-15 04:06:01 UTC
s/someone/some
Comment 11 Maarten Maathuis 2008-04-15 10:22:33 UTC
Created attachment 15930
This is a very crude attempt, but it may work.

Please test this.
Comment 12 Maarten Maathuis 2008-04-25 11:13:49 UTC
Can someone test this?
Comment 13 Pierre Ossman 2008-04-25 11:55:49 UTC
I'm afraid the patch is insufficient. I get this now:

(EE) [drm] Could not set DRM device bus ID.
(EE) NOUVEAU(0): [dri] DRIScreenInit failed.  Disabling DRI.

Fatal server error:
AddScreen/ScreenInit failed for driver 0


I'm in #nouveau if you have a quick update for me to test.
Comment 14 Maarten Maathuis 2008-04-25 13:10:21 UTC
ossman has unusual and inexplicable problems; can anyone else test?
Comment 15 Paulo Zanoni 2008-06-06 05:47:12 UTC
I have the same problem.

I have tested the patch, and whenever the server restarts, it crashes, giving me the same message Ossman was getting, instead of the "deadlock" message.

(EE) [drm] Could not set DRM device bus ID.
(EE) NOUVEAU(0): [dri] DRIScreenInit failed.  Disabling DRI.

Fatal server error:
AddScreen/ScreenInit failed for driver 0
Comment 16 Maarten Maathuis 2008-06-06 09:22:14 UTC
It would really help if someone who actually has the issue looked at it, because this is not easy to do blindly.
Comment 17 Maarten Maathuis 2008-06-25 10:54:17 UTC
I committed a few changes that might help, please try again.
Comment 18 Paulo Zanoni 2008-07-03 06:23:29 UTC
Created attachment 17495
Syslog with "debug=1" in drm.ko

Tested with July 02 2008 drm/nouveau.
Loaded drm.ko with debug=1.
Comment 19 Paulo Zanoni 2008-07-03 06:26:23 UTC
Created attachment 17496
Xorg log

Tested with July 02 2008 drm/nouveau
The problem still happens for me.

How I reproduced the bug:

1. Start X.
2. Run: DISPLAY=:0.0 xterm
3. Press Ctrl+C to kill xterm.
Comment 20 Stuart Bennett 2008-10-23 08:33:35 UTC
Using the repro method in comment 19, I've pushed some changes that now let the server regenerate without crashing, so anyone who was hitting this, please test.

The problem here is that for every mmap(2) (wrapped in the libdrm function drmMap), the kernel increases the refcount on the struct file associated with the drm file descriptor.  While we can happily call drmClose (which wraps close(2)), this only decreases the filp->f_count refcount by one, and the count needs to hit zero before the fops->release function (drm_release) is called to free all kernel-side resources.
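To make the refcounting argument concrete, here is a toy model in C. It is not kernel code: `toy_file`, `toy_open`, `toy_mmap`, `toy_munmap`, `toy_close` and `toy_release` are hypothetical stand-ins for `struct file`, its `f_count`, and the `fops->release` hook (`drm_release` in the DRM case). It only illustrates the rule described above: each mapping holds its own reference, `close` drops just one, and release runs only when the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical names): stands in for the kernel's
 * struct file and its f_count reference count. */
struct toy_file {
    int f_count;   /* number of outstanding references */
    bool released; /* set once the release hook has run */
};

/* Stand-in for fops->release (drm_release for the DRM device). */
static void toy_release(struct toy_file *f)
{
    f->released = true;
}

/* Drop one reference; release runs only when the count reaches zero. */
static void toy_fput(struct toy_file *f)
{
    assert(f->f_count > 0);
    if (--f->f_count == 0)
        toy_release(f);
}

/* open(2): the descriptor itself holds the first reference. */
static void toy_open(struct toy_file *f)
{
    f->f_count = 1;
    f->released = false;
}

/* mmap(2): each mapping takes its own reference on the file. */
static void toy_mmap(struct toy_file *f)
{
    f->f_count++;
}

/* munmap(2) and close(2) each drop exactly one reference. */
static void toy_munmap(struct toy_file *f) { toy_fput(f); }
static void toy_close(struct toy_file *f)  { toy_fput(f); }
```

With two mappings outstanding, calling `toy_close` leaves `f_count` at 2 and the release hook unrun; only after both mappings are dropped with `toy_munmap` does the count reach zero and `released` become true.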
Calling the release function also means that next time the DRM is opened the opener becomes DRM master, which is necessary for privileged ioctls to work on the next xserver generation.  The solution pushed is to drmUnmap (wrapping munmap(2)) all mappings made on the fd (i.e. the user channel, and all buffer objects).
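The fix pattern (unmap every mapping made on the fd, then close it) can be sketched with plain POSIX `mmap`/`munmap` in place of libdrm's `drmMap`/`drmUnmap`. This is an illustration under assumptions, not the actual nouveau change: `mapping_table`, `map_tracked`, `close_tracked` and the `MAX_MAPPINGS` limit are hypothetical names chosen for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_MAPPINGS 16 /* arbitrary limit for this sketch */

/* Hypothetical bookkeeping: every mapping created on one fd. */
struct mapping_table {
    void  *addr[MAX_MAPPINGS];
    size_t len[MAX_MAPPINGS];
    int    n;
};

/* mmap a region of fd and remember it so it can be unmapped later.
 * Returns NULL on failure or when the table is full. */
static void *map_tracked(struct mapping_table *t, int fd, size_t len,
                         off_t offset)
{
    assert(t->n >= 0 && t->n <= MAX_MAPPINGS);
    if (t->n == MAX_MAPPINGS)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)
        return NULL;
    t->addr[t->n] = p;
    t->len[t->n] = len;
    t->n++;
    return p;
}

/* Unmap every tracked mapping, then close the fd, so that no mapping
 * keeps the underlying open file alive after close(2).
 * Returns 0 on success, -1 if any munmap or the close failed. */
static int close_tracked(struct mapping_table *t, int fd)
{
    int ret = 0;
    while (t->n > 0) {
        t->n--;
        if (munmap(t->addr[t->n], t->len[t->n]) != 0)
            ret = -1;
    }
    if (close(fd) != 0)
        ret = -1;
    return ret;
}
```

The table is drained in reverse creation order only because that is the simplest loop; `munmap` does not require any particular order. What matters is that every mapping is gone by the time the descriptor is closed.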
Comment 21 Stuart Bennett 2009-01-13 15:09:05 UTC
AFAIK the fixes described in comment 20 fixed this. If you still hit this, please reopen with Xorg.0.log and dmesg logs, and preferably details of how to reproduce.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.