Bug 14418 - Server crashes with "Too many input devices" after several resets
Server crashes with "Too many input devices" after several resets
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Server/General
git
Other Linux (All)
: medium normal
Assigned To: Peter Hutterer
Xorg Project Team
:
Depends on:
Blocks: 13766 xorg-7.4 15645
  Show dependency treegraph
 
Reported: 2008-02-07 18:13 UTC by Eamon Walsh
Modified: 2008-04-29 05:53 UTC (History)
7 users (show)

See Also:


Attachments
Server stderr output (3.62 KB, text/plain)
2008-02-07 18:13 UTC, Eamon Walsh
no flags Details
Server logfile (74.00 KB, text/x-log)
2008-02-07 18:15 UTC, Eamon Walsh
no flags Details
call DIDR instead of CloseDevice when shutting down. (1.96 KB, patch)
2008-02-10 00:24 UTC, Peter Hutterer
no flags Details | Splinter Review
call DIDR instead of CloseDevice when shutting down (updated) (3.61 KB, patch)
2008-02-21 18:07 UTC, Peter Hutterer
no flags Details | Splinter Review
0001-xfree86-don-t-free-the-config-file-related-informat.patch (4.55 KB, patch)
2008-04-22 20:38 UTC, Peter Hutterer
no flags Details | Splinter Review
0002-dix-NULL-out-WindowTable.patch (750 bytes, patch)
2008-04-23 21:16 UTC, Peter Hutterer
no flags Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Eamon Walsh 2008-02-07 18:13:26 UTC
Created attachment 14212 [details]
Server stderr output

I'm able to crash a "bare" X server after it resets several times.
 
Server stderr and logfile attached.  The PreInit errors in the stderr 
output might be related to the problem.
 
Will open a bugzilla next week if necessary.

Peter Hutterer writes:
Yeah, I've seen this before.
IIRC this is caused by the devices not resetting properly on a server
restart. The hal code then initialises them each time again, until the
ids run out.

I haven't had time to look at it yet, but I'd put my bets on either
evdev not disabling and deleting the device when it fails to initialise
or the server just ignoring the return code. My bet is on the first.
Comment 1 Eamon Walsh 2008-02-07 18:15:37 UTC
Created attachment 14213 [details]
Server logfile
Comment 2 Peter Hutterer 2008-02-10 00:24:41 UTC
Created attachment 14259 [details] [review]
call DIDR instead of CloseDevice when shutting down.

Problem is caused by the DDX maintaining its own device list in addition to the DIX. When closing the server, we only remove the devices from the DIX, not the DDX. So when the server comes up again, all the hal-enabled devices are still in the DDX's list. Repeat until device IDs run out.

This patch should fix things, although now I get some weird behaviour with the hal hotplugging code. daniels also mentioned that this may not work with Xephyr. consider it as incomplete.
Comment 3 Peter Hutterer 2008-02-21 18:07:35 UTC
Created attachment 14493 [details] [review]
call DIDR instead of CloseDevice when shutting down (updated)

updated patch, including a few strdups in xfree86 to avoid SIGSEGVs when "<default pointer>" is cleaned up.

Tested with master, didn't see any problems with Xephyr. Tested with AutoAddDevices enabled and disabled.


The weird HAL problems went away, not sure why it happend in the first place. Probably compilation issue?


Can I please get an ACK for this before I push?
Comment 4 Daniel Stone 2008-02-21 19:35:43 UTC
On Thu, Feb 21, 2008 at 06:07:36PM -0800, bugzilla-daemon@freedesktop.org wrote:
> Tested with master, didn't see any problems with Xephyr. Tested with
> AutoAddDevices enabled and disabled.

Er, Xephyr has DIDR short-circuited to do nothing, so hmm.  Maybe we need
a flag to differentiate between user request and server removal of
devices.
Comment 5 Peter Hutterer 2008-02-21 19:41:34 UTC
(In reply to comment #4)
> On Thu, Feb 21, 2008 at 06:07:36PM -0800, bugzilla-daemon@freedesktop.org
> wrote:
> > Tested with master, didn't see any problems with Xephyr. Tested with
> > AutoAddDevices enabled and disabled.
> 
> Er, Xephyr has DIDR short-circuited to do nothing, so hmm.  Maybe we need
> a flag to differentiate between user request and server removal of
> devices.

http://cgit.freedesktop.org/xorg/xserver/tree/hw/kdrive/src/kinput.c

2505	void
2506	DeleteInputDeviceRequest(DeviceIntPtr pDev)
2507	{
2508	    RemoveDevice(pDev);
2509	}


Does Xephyr actually do hotplugging? i thought it only does the Xephyr virtual mouse/keyboard anyway.
Comment 6 Peter Hutterer 2008-02-26 19:53:33 UTC
daniel, if you don't have any other comments I'll push this patch to master.
Comment 7 Peter Hutterer 2008-04-11 02:16:51 UTC
pushed as 6d22a9615a0e6ab3d00b0bcb22ff001b6ece02ae
Comment 8 Rene Rask 2008-04-18 18:53:24 UTC
This patch breaks both the intel and the wacom driver when used at the same time.
I have 2 tabletpc with intel graphics and one usb tablet for two laptops using the nouveau driver. The nouvaeu+wacom works fine, but intel+wacom break with this patch.
It breaks on logout, VT and killing X.

I'm running fedora rawhide with the latest updates.

One the X60 tabletpc it doesn't take down the system completely so I could get this backtrace from ssh.
(gdb) bt full
#0  xf86WcmDevProc (pWcm=0x9b0dad0, what=2) at ./xf86Wacom.c:682
	local = (LocalDevicePtr) 0x9ba6568
	priv = (WacomDevicePtr) 0x0
#1  0x0021cc8c in xf86WcmUninit (drv=0x99abd40, local=0x9afa350, flags=0) at ./wcmConfig.c:351
	priv = (WacomDevicePtr) 0x9b1e968
#2  0x080d5292 in DeleteInputDeviceRequest (pDev=0x9b0dad0) at xf86Xinput.c:463
	pInfo = (LocalDevicePtr) 0x9afa350
	drv = (InputDriverPtr) 0x99abd40
	idev = (IDevRec *) 0x998fdc8
#3  0x0807e6e3 in CloseDownDevices () at devices.c:621
	dev = (DeviceIntPtr) 0x9b0dad0
	next = (DeviceIntPtr) 0x9b0dc10
#4  0x0806b3f0 in main (argc=8, argv=0xbf96bd84, envp=0x0) at main.c:461
	i = <value optimized out>
	error = 136273580
	xauthfile = <value optimized out>
	alwaysCheckForInput = {0, 1}


Comment 9 Peter Hutterer 2008-04-21 21:44:09 UTC
(In reply to comment #8)
> This patch breaks both the intel and the wacom driver when used at the same
> time.
> I have 2 tabletpc with intel graphics and one usb tablet for two laptops using
> the nouveau driver. The nouvaeu+wacom works fine, but intel+wacom break with
> this patch.
> It breaks on logout, VT and killing X.
> 
> I'm running fedora rawhide with the latest updates.
> 
> One the X60 tabletpc it doesn't take down the system completely so I could get
> this backtrace from ssh.
> (gdb) bt full
> #0  xf86WcmDevProc (pWcm=0x9b0dad0, what=2) at ./xf86Wacom.c:682
>         local = (LocalDevicePtr) 0x9ba6568
>         priv = (WacomDevicePtr) 0x0

Call order is RemoveDevice() -> CloseDevice() -> xf86WcmDevProc(DEVICE_OFF). CloseDevice then frees the device memory.

After that, DIDR calls wacom's UnInit -> xf86WcmUninit -> xf86DevProc(DEVICE_OFF).

I think this is a bug in wacom. xf86WcmUninit should only remove stuff stored in the LocalDeviceRec, not actually touch the DeviceIntPtr which at that point is already freed. Simply removing the line 

	gWacomModule.DevProc(local->dev, DEVICE_OFF);

in xf86WcmUninit should do the job.

Daniel, Magnus: can you comment on this?
Comment 10 Bernhard Rosenkraenzer 2008-04-22 05:53:24 UTC
The fix for this bug causes bug 15645
Comment 11 Daniel Stone 2008-04-22 07:24:02 UTC
On Tue, Apr 22, 2008 at 05:53:24AM -0700, bugzilla-daemon@freedesktop.org wrote:
> The fix for this bug causes bug 15645

Some drivers are buggy, and don't cope with it; if you find any, please
open one bug per input driver, with logs/backtraces, and they can be
fixed.
Comment 12 Magnus Vigerlof 2008-04-22 10:09:22 UTC
The Wacom X-driver has a problem with its internal structures when removing the defined devices. Since we don't hotplug the input devices this has not been a problem until now when the Xserver remove the devices at shutdown (and restart?). Previously it just left all devices as-is and didn't try to clean up..

I've been working on an alternative wacom-driver (git://git.debian.org:/git/collab-maint/linux-wacom.git, branch hotplug, path src/xdrvhp) which can be hotplugged and also should work better in these situations. I'm currently making an effort to align that branch with master.

My problem at the moment is what kind of hotplug approach we should aim for, i.e. how much work should I put into this one and xinputhotplugd (git://people.freedesktop.org/~wigge/xinputhotplugd) before we really start working on the new InputDevice structure.
Comment 13 Peter Hutterer 2008-04-22 17:55:11 UTC
(In reply to comment #11)
> On Tue, Apr 22, 2008 at 05:53:24AM -0700, bugzilla-daemon@freedesktop.org
> wrote:
> > The fix for this bug causes bug 15645
> 
> Some drivers are buggy, and don't cope with it; if you find any, please
> open one bug per input driver, with logs/backtraces, and they can be
> fixed. 

No, this is my fault. xfree86 DIDR removes the config entry for the device, so in the next server generation there's nothing left to initialise. Reopening the bug.
Comment 14 Peter Hutterer 2008-04-22 20:38:12 UTC
Created attachment 16115 [details] [review]
0001-xfree86-don-t-free-the-config-file-related-informat.patch

An addition to the already existing patch. The configuration parsing code stores information about the devices to be initialised in xf86ConfigLayout.inputs. Pointers to these structs are also passed through the driver, where they are usually stored in dev->conf_idev. 

Freeing the device in DIDR would also free this conf_idev, thus freeing the information about the device. This in turn caused #15654.
It also left us with dangling pointers in xf86ConfigLayout.inputs, which may be the reason for some corruption as also reported on the xorg list.

Please try this patch. It only frees the config information for devices added through HAL, leaving the rest. In the next server generation, it re-initialises the configured input devices.
Comment 15 Peter Hutterer 2008-04-22 20:39:13 UTC
*** Bug 15645 has been marked as a duplicate of this bug. ***
Comment 16 Peter Hutterer 2008-04-23 21:16:29 UTC
Created attachment 16144 [details] [review]
0002-dix-NULL-out-WindowTable.patch

Follow-up to the patch above. CloseDownDevices calls DIDR calls DisableDevice() which attempts to send a DevicePresenceNotify to all windows. At this point, the windows have already be been freed, and accessing the dangling pointers is a bad idea. This patch NULLs out the WindowTable and ensures that no Presence notify event is sent if a window is NULL.
Comment 17 Peter Hutterer 2008-04-24 00:08:31 UTC
Patches pushed to master. Can you please verify if the problem persists?
(i'm pretty sure the wacom problem will still be there)
Comment 18 Peter Hutterer 2008-04-29 05:53:36 UTC
(In reply to comment #17)
> Patches pushed to master. Can you please verify if the problem persists?
> (i'm pretty sure the wacom problem will still be there)
> 



no complaints -> closing bug. 
I claim the wacom issue as NOTOURBUG.