Bug 20712 - missing_scsi_host() removes scsi_host that is still in use
Status: RESOLVED FIXED
Alias: None
Product: hal
Classification: Unclassified
Component: hald
Version: unspecified
Hardware: Other Linux (All)
Importance: medium major
Assignee: Danny Kukawka
Reported: 2009-03-17 09:51 UTC by Arnout Vandecappelle (Essensium/Mind)
Modified: 2009-08-07 01:46 UTC

Attachments
patch to avoid removing scsi_host device if it still has children (1.30 KB, patch)
2009-03-17 09:51 UTC, Arnout Vandecappelle (Essensium/Mind)
hald log for a system with a SES expander from which a disk is removed. (166.78 KB, text/plain)
2009-03-20 03:39 UTC, Arnout Vandecappelle (Essensium/Mind)
lshal output before removing a disk (173.56 KB, text/plain)
2009-03-20 04:00 UTC, Arnout Vandecappelle (Essensium/Mind)
lshal output after removing a disk (144.51 KB, text/plain)
2009-03-20 04:02 UTC, Arnout Vandecappelle (Essensium/Mind)
hald log for a system with a SES expander from which a disk is removed (489.60 KB, text/plain)
2009-03-20 04:16 UTC, Arnout Vandecappelle (Essensium/Mind)
lshal output after re-inserting a disk (4.58 KB, application/octet-stream)
2009-03-20 04:18 UTC, Arnout Vandecappelle (Essensium/Mind)
patch to avoid removing scsi_host if it still has children (1.46 KB, patch)
2009-03-20 04:39 UTC, Arnout Vandecappelle (Essensium/Mind)

Description Arnout Vandecappelle (Essensium/Mind) 2009-03-17 09:51:31 UTC
Created attachment 23968
patch to avoid removing scsi_host device if it still has children

I have a SAS storage enclosure with removable hard disks.  When one of the hard disks is removed, the scsi_host to which it is connected disappears from HAL, even though it is still in use.  Therefore, all other SCSI devices connected to the same host no longer appear in the HAL tree, because their parent no longer exists.

This happens because missing_scsi_host() removes the host unconditionally.  It should check if the host still has other targets attached to it.  I'm attaching a patch to correct this.
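The check described above can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use HAL's real data structures: `struct device`, `count_children()` and `host_can_be_removed()` are hypothetical names invented for this illustration, and the device store is modelled as a plain array. It assumes the device being removed is still present in the store when the check runs, so the host is kept whenever it has more than one child.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a HAL device object; not the real HalDevice. */
struct device {
    const char *udi;        /* unique device identifier */
    const char *parent_udi; /* UDI of the parent, or NULL for a root device */
};

/* Count the devices in the store whose parent is parent_udi. */
static size_t count_children(const struct device *store, size_t n,
                             const char *parent_udi)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (store[i].parent_udi != NULL &&
            strcmp(store[i].parent_udi, parent_udi) == 0)
            count++;
    }
    return count;
}

/*
 * Decide whether a host may be removed while one of its children is going
 * away.  Assumption: the departing child is still in the store, so more
 * than one child means other targets are still using the host.
 */
static bool host_can_be_removed(const struct device *store, size_t n,
                                const char *host_udi)
{
    return count_children(store, n, host_udi) <= 1;
}
```

With a host that has two targets, `host_can_be_removed()` returns false when one of them is unplugged, so the remaining target keeps its parent. With a single-target host, such as a USB stick, it returns true and the host is removed along with the device.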
Comment 1 Arnout Vandecappelle (Essensium/Mind) 2009-03-17 09:56:21 UTC
Actually, it seems more appropriate to me to react to the scsi_host sysfs events themselves.  The problem with those events is that the sysfs_path they give points to /sys/class/scsi_host/hostN, while HAL wants to have it at /sys/devices/<bus>/scsi_host:hostN.  Wouldn't it be better to just rewrite the sysfs_path?
Comment 2 Danny Kukawka 2009-03-18 07:43:31 UTC
Can you provide the following information from an unpatched, latest HAL (git):

- the full lshal output before and after removing the hard disk from the SAS enclosure
- start hald with --verbose=yes --use-syslog, reproduce the problem, and attach the part of /var/log/messages from the HAL start onward to this bug.
Comment 3 Arnout Vandecappelle (Essensium/Mind) 2009-03-20 03:39:00 UTC
Created attachment 24074
hald log for a system with a SES expander from which a disk is removed.

The attached log was generated with 0.5.12~rc1 (I have trouble accessing git).  I checked on cgit, though, and nothing has changed in the code paths where it goes wrong.

I've edited the log a bit: added comments starting with # in the first column and removed the very useless stuff.  The part where it goes wrong is indicated with #***.

After the patch, I've tested with a USB stick as well; in that case, the scsi_host is properly removed when the USB stick is removed.

lshal output follows shortly.
Comment 4 Arnout Vandecappelle (Essensium/Mind) 2009-03-20 04:00:10 UTC
Created attachment 24075
lshal output before removing a disk

The lshal output was unfortunately taken from a different machine (with exactly the same hardware and software), so the serial numbers of the disks are different.  Shouldn't make too much of a difference, though.
Comment 5 Arnout Vandecappelle (Essensium/Mind) 2009-03-20 04:02:15 UTC
Created attachment 24076
lshal output after removing a disk

In the lshal output after removing the disk, I also included an lshal -lu of some of the devices that are still present, to show which ones are gone.  In the normal lshal output, they don't appear since they don't have a parent anymore.
Comment 6 Arnout Vandecappelle (Essensium/Mind) 2009-03-20 04:16:56 UTC
Created attachment 24077
hald log for a system with a SES expander from which a disk is removed

I've created an additional log of hald on the same machine as the lshal output.  This time, there are no comments :-)  HAL initialisation finished at 12:06:56.  I removed the disk at 12:07:06.  I inserted the disk again at 12:08:54.

Note that things become even worse after re-insertion.  The scsi_host and scsi_device are re-synthesized, and they happen to have the same name as the parent scsi_device of one of the existing drives.
Comment 7 Arnout Vandecappelle (Essensium/Mind) 2009-03-20 04:18:19 UTC
Created attachment 24078
lshal output after re-inserting a disk
Comment 8 Arnout Vandecappelle (Essensium/Mind) 2009-03-20 04:39:57 UTC
Created attachment 24079
patch to avoid removing scsi_host if it still has children

I was able to clone the git repository today, so I have re-created the patch as a git commit against the head.  If I need to change anything about the patch, I'm willing to do it.
Comment 9 Danny Kukawka 2009-08-07 01:46:15 UTC
http://cgit.freedesktop.org/~dkukawka/hal/log/:

commit 195146f263e193dc80a4094b4750999657734243
Author: Arnout Vandecappelle <arnout@mind.be>
Date:   Fri Aug 7 10:44:48 2009 +0200

    don't remove scsi_host if there are still targets using it

    When a scsi device is removed, the host it is connected to is also
    removed by missing_scsi_host().  However, it is possible that other
    targets still exist that are connected to the same host.  In that case,
    the scsi_host should not be removed.  We check if the scsi_host has
    more than one child, and if so it is not removed.

    fd.o#20712

