Bug 20703

Summary: HAL sometimes doesn't emit udi-removed signals
Product: hal Reporter: Dan Williams <dcbw>
Component: haldAssignee: Danny Kukawka <danny.kukawka>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: kay, mark, stern
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: HAL log showing "Error removing device" message
udev log showing tty removal events
udev device add/remove log for 2.6.29-0.258.rc8.git2.fc11

Description Dan Williams 2009-03-16 20:26:29 UTC
hal-0.5.12-24.20090226git.fc11.i586

Sierra 881 mobile broadband card.  When the mobile broadband connection is
active in NetworkManager, pulling the card without disconnecting emits removal
signals for some devices, but not for other devices:

Obligatory lshal --monitor output:

23:06:37.994: pci_1033_35 added
23:06:38.180: pci_1033_35_0 added
23:06:38.369: usb_device_1d6b_1_0000_03_00_0 added
23:06:38.476: usb_device_1d6b_1_0000_03_00_0_if0 added
23:06:38.614: usb_device_1d6b_1_0000_03_00_1 added
23:06:38.632: usb_device_1d6b_1_0000_03_00_1_if0 added
23:06:42.845: usb_device_1199_6851_noserial added
23:06:43.104: usb_device_1199_6851_noserial_if0 added
23:06:44.038: computer_power_supply_battery_BAT0 property
battery.voltage.current = 12599 (0x3137)
23:06:45.990: usb_device_1199_6851_noserial_if0_serial_usb_0 added
23:06:46.098: usb_device_1199_6851_noserial_if0_serial_usb_2 added
23:06:48.127: usb_device_1199_6851_noserial_if0_serial_usb_3 added
23:06:48.215: usb_device_1199_6851_noserial_if0_serial_usb_1 added

(activate 3G connection)

23:07:14.063: computer_power_supply_battery_BAT0 property
battery.voltage.current = 12604 (0x313c)
23:07:21.562: net_computer added

(pull card while 3G connection is active)

23:07:40.061: usb_device_1199_6851_noserial_if0 removed
23:07:40.100: usb_device_1199_6851_noserial removed
23:07:40.205: usb_device_1d6b_1_0000_03_00_0_if0 removed
23:07:40.264: usb_device_1d6b_1_0000_03_00_0 removed
23:07:40.365: pci_1033_35 removed
23:07:40.446: usb_device_1d6b_1_0000_03_00_1_if0 removed
23:07:40.490: usb_device_1d6b_1_0000_03_00_1 removed
23:07:40.611: pci_1033_35_0 removed
23:07:40.632: net_computer removed
23:07:44.056: computer_power_supply_battery_BAT0 property
battery.voltage.current = 12599 (0x3137)


Notice how no device removal messages are sent for the following four serial
TTYs:

usb_device_1199_6851_noserial_if0_serial_usb_0 added
usb_device_1199_6851_noserial_if0_serial_usb_2 added
usb_device_1199_6851_noserial_if0_serial_usb_3 added
usb_device_1199_6851_noserial_if0_serial_usb_1 added


NetworkManager kind of depends on HAL to send device-removal signals when a
device gets removed, and if HAL doesn't do that, weird stuff happens, like the
3G card still showing up in the applet menu even though it's been pulled out,
becuase HAL didn't send the removal signal, for the tty, and thus NM doesn't
think the device has been removed.

But the ttys actually *are* removed, because they aren't in the device list (checked with lshal) after the card has been pulled.  So the removal signal for the tty seems to be getting dropped on the floor.
Comment 1 Dan Williams 2009-03-16 20:27:16 UTC
RH bug: https://bugzilla.redhat.com/show_bug.cgi?id=490577
Comment 2 Dan Williams 2009-03-16 20:48:47 UTC
Created attachment 23947 [details]
HAL log showing "Error removing device" message
Comment 3 Dan Williams 2009-03-16 20:49:54 UTC
Created attachment 23948 [details]
udev log showing tty removal events
Comment 4 Danny Kukawka 2009-03-18 07:22:03 UTC
Can you please provide full lshal output before you remove the card?
Comment 5 Danny Kukawka 2009-03-18 07:37:54 UTC
Without lshal output: 

Looks as if the problem is may the sysfs_path. On the ADD event it's e.g.: 
/sys/devices/pci0000:00/0000:00:1e.0/0000:02:01.0/0000:03:00.0/usb2/2-1/2-1:1.0/ttyUSB0/tty/ttyUSB0

But on the REMOVE event it's: 
/sys/2-1:1.0/ttyUSB0/tty/ttyUSB0

That's why HAL can't remove the device from the device list, there is simply no device in the store with this sysfs_path. 

Is this maybe a UDEV or kernel issue?
Comment 6 Kay Sievers 2009-03-18 07:49:58 UTC
Looks like a kernel bug. The parent usb_device is no longer available while the usb_interface, which is normally a child of the usb_device, and the class devices, which are childs of the usb_interface, are removed.

Such events must never happen, they must all start with with a full path starting with "/devices/...":
  UEVENT[1237261056.182478] remove   /2-1:1.0/ttyUSB0/tty/ttyUSB0 (tty)

Seems like a bug in the kernel usb cleanup logic. What kernel version is it? And can this e reproduced with a current vanilla kernel.org kernel?
Comment 7 Dan Williams 2009-03-18 08:43:14 UTC
Kernel was Fedora rawhide 2.6.29-0.176.rc6.git5.fc11.i686.PAE.  Will re-test with 2.6.29-0.258.rc8.git2.fc11.PAE.
Comment 8 Dan Williams 2009-03-18 08:44:13 UTC
udev-139-2.fc11.i586 BTW
Comment 9 Dan Williams 2009-03-18 08:45:21 UTC
note that (so far) this only happens if the 3G device is *active* at the time it's removed; ie NetworkManager has started PPP on one of the ttys.  I can't reproduce the issue if the device is dormant and nothing is using the ttys when yanking the card.
Comment 10 Kay Sievers 2009-03-18 08:52:33 UTC
(In reply to comment #9)
> note that (so far) this only happens if the 3G device is *active* at the time
> it's removed; ie NetworkManager has started PPP on one of the ttys.  I can't
> reproduce the issue if the device is dormant and nothing is using the ttys when
> yanking the card.

Which fits into the expected refcounting/cleanup bug, that the active tty device does not pin the entire parent chain, but for some reason, the usb_interface can disconnect from its parent, the usb_device, during the removal processing.
Comment 11 Dan Williams 2009-03-18 09:54:20 UTC
Behavior still present with 2.6.29-0.258.rc8.git2.fc11.PAE; I get udev events like:

UDEV [....] remove /2-1:1.0/ttyUSB3 (usb-serial)
UDEV_LOG=3
ACTION=remove
DEVPATH=/2-1:1.0/ttyUSB3
SUBSYSTEM=usb-serial
SEQNUM=1267

Oddly, I get *duplicate* udev remove events emitted (with the same sequence #).  Log attached; look for the dupes.
Comment 12 Dan Williams 2009-03-18 09:55:01 UTC
Created attachment 23999 [details]
udev device add/remove log for 2.6.29-0.258.rc8.git2.fc11
Comment 13 Dan Williams 2009-03-18 09:59:06 UTC
For example:

UEVENT[1237394963.457798] remove   /2-1:1.0/ttyUSB0/tty/ttyUSB0 (tty)
UDEV_LOG=3
ACTION=remove
DEVPATH=/2-1:1.0/ttyUSB0/tty/ttyUSB0
SUBSYSTEM=tty
SEQNUM=1260
MAJOR=188
MINOR=0


<lots of other stuff>


UDEV  [1237394964.427524] remove   /2-1:1.0/ttyUSB0/tty/ttyUSB0 (tty)
UDEV_LOG=3
ACTION=remove
DEVPATH=/2-1:1.0/ttyUSB0/tty/ttyUSB0
SUBSYSTEM=tty
SEQNUM=1260
DEVNAME=/dev/ttyUSB0
MAJOR=188
MINOR=0


I wonder what's up with that...? :)
Comment 14 Kay Sievers 2009-03-18 10:15:59 UTC
(In reply to comment #13)
> UEVENT[1237394963.457798] remove   /2-1:1.0/ttyUSB0/tty/ttyUSB0 (tty)
...
> UDEV  [1237394964.427524] remove   /2-1:1.0/ttyUSB0/tty/ttyUSB0 (tty)
...

It's this:
  $ /sbin/udevadm monitor
  monitor will print the received events for:
  UDEV the event which udev sends out after rule processing
  UEVENT the kernel uevent

--kernel, --udev can be used to see only one of them. The output of both events is useful to see the timing, the time it takes to finish an event, and the reordering of udev because of parallel execution.
Comment 15 Kay Sievers 2009-03-18 11:41:59 UTC
Alan, any idea?

We see for a 3G USB-modem, in case it is busy/active and we disconnect it, broken DEVPATH uevents like:
  UEVENT[1237394963.457798] remove   /2-1:1.0/ttyUSB0/tty/ttyUSB0 (tty)

The usb_interface seems to "disconnect" from the parent usb_device before the childs are handled.
Comment 16 Dan Williams 2009-03-25 08:14:57 UTC
Follow-up: Alan posted some patches that fix refcounting and tty destruction in drivers/usb/serial/usb-serial.c that fix the issue.  Definitely a kernel bug.
Comment 17 Mark Ellis 2009-04-20 01:05:51 UTC
(In reply to comment #16)
> Follow-up: Alan posted some patches that fix refcounting and tty destruction in
> drivers/usb/serial/usb-serial.c that fix the issue.  Definitely a kernel bug.
> 

Can someone point me to these patches please. I'm seeing this behaviour on a Pocket PC pda.
Comment 19 Dan Williams 2009-05-10 18:57:06 UTC
FWIW, now fixed in 2.6.27.23 and 2.6.29.3 as well.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.