Created attachment 23968
patch to avoid removing scsi_host device if it still has children
I have a SAS storage enclosure with removable hard disks. When one of the hard disks is removed, the scsi_host to which it is connected disappears from HAL, even though it is still in use. Therefore, all other SCSI devices connected to the same host no longer appear in the HAL tree, because their parent no longer exists.
This happens because missing_scsi_host() removes the host unconditionally. It should check if the host still has other targets attached to it. I'm attaching a patch to correct this.
Actually, it seems more appropriate to me to react to the scsi_host sysfs events themselves. The problem with those events is that the sysfs_path they give points to /sys/class/scsi_host/hostN, while HAL expects the host at /sys/devices/<bus>/scsi_host:hostN. Wouldn't it be better to just rewrite the sysfs_path?
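To make the rewriting idea concrete, here is a minimal sketch in C. It is not HAL code: rewrite_scsi_host_path() is a hypothetical helper, and it assumes the caller already knows the bus directory (the /sys/devices/<bus> part), which this sketch does not try to discover.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of HAL: turn a class-style path such as
 * "/sys/class/scsi_host/host3" into "<bus_dir>/scsi_host:host3".
 * bus_dir is supplied by the caller (an assumption of this sketch).
 * Returns 0 on success, -1 if class_path is not a scsi_host class path
 * or if the result does not fit into out. */
static int rewrite_scsi_host_path(const char *class_path, const char *bus_dir,
                                  char *out, size_t out_len)
{
    static const char prefix[] = "/sys/class/scsi_host/host";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *num;
    size_t i;
    int n;

    if (strncmp(class_path, prefix, prefix_len) != 0)
        return -1;

    /* Everything after "host" must be a non-empty run of digits. */
    num = class_path + prefix_len;
    if (*num == '\0')
        return -1;
    for (i = 0; num[i] != '\0'; i++) {
        if (num[i] < '0' || num[i] > '9')
            return -1;
    }

    n = snprintf(out, out_len, "%s/scsi_host:host%s", bus_dir, num);
    if (n < 0 || (size_t)n >= out_len)
        return -1;   /* formatting error or truncated output */
    return 0;
}
```

The helper only rewrites the string. Whether the rewritten path actually exists in sysfs would still have to be checked by the caller.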
Can you provide the following information from an unpatched, up-to-date HAL (git):
- full lshal output before and after removing the hard disk from the SAS enclosure
- start hald with --verbose=yes --use-syslog, reproduce the problem, and attach to this bug the part of /var/log/messages written since HAL started.
Created attachment 24074
hald log for a system with a SES expander from which a disk is removed.
The attached log was generated with 0.5.12~rc1 (I have trouble accessing git). I checked on cgit, though, and nothing has changed in the code paths where it goes wrong.
I've edited the log a bit: I added comments starting with # in the first column and removed the irrelevant output. The part where it goes wrong is marked with #***.
With the patch applied, I also tested with a USB stick; in that case, the scsi_host is properly removed when the USB stick is removed.
lshal output follows shortly.
Created attachment 24075
lshal output before removing a disk
The lshal output was unfortunately taken from a different machine (with exactly the same hardware and software), so the serial numbers of the disks are different. Shouldn't make too much of a difference, though.
Created attachment 24076
lshal output after removing a disk
In the lshal output after removing the disk, I also included an lshal -lu of some of the devices that are still present, to show which ones are gone. In the normal lshal output, they don't appear since they don't have a parent anymore.
Created attachment 24077
hald log for a system with a SES expander from which a disk is removed
I've created an additional hald log on the same machine that the lshal output was taken from. This time, there are no comments :-) HAL initialisation finished at 12:06:56. I removed the disk at 12:07:06 and inserted it again at 12:08:54.
Note that things become even worse after re-insertion. The scsi_host and scsi_device are re-synthesized, and they happen to have the same name as the parent scsi_device of one of the existing drives.
Created attachment 24078
lshal output after re-inserting a disk
Created attachment 24079
patch to avoid removing scsi_host if it still has children
I was able to clone the git repository today, so I have re-created the patch as a git commit against the head. If I need to change anything about the patch, I'm willing to do it.
Author: Arnout Vandecappelle <email@example.com>
Date: Fri Aug 7 10:44:48 2009 +0200
don't remove scsi_host if there are still targets using it
When a scsi device is removed, the host it is connected to is also
removed by missing_scsi_host(). However, it is possible that other
targets still exist that are connected to the same host. In that case,
the scsi_host should not be removed. We check if the scsi_host has
more than one child, and if so it is not removed.
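
The check described in the commit message can be sketched roughly as below. This is a toy model rather than the patch itself: struct toy_device, count_children() and host_may_be_removed() are hypothetical names, and the device list stands in for HAL's device store. It assumes the device being removed is still in the list when the check runs, which is why "more than one child" means the host is still in use.

```c
#include <stddef.h>
#include <string.h>

/* Toy device record (hypothetical): an identifier plus its parent's identifier. */
struct toy_device {
    const char *udi;
    const char *parent_udi;   /* NULL for a device without a parent */
};

/* Count the devices whose parent is host_udi. */
static size_t count_children(const struct toy_device *devs, size_t n,
                             const char *host_udi)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (devs[i].parent_udi != NULL &&
            strcmp(devs[i].parent_udi, host_udi) == 0)
            count++;
    }
    return count;
}

/* Return 1 if the host may be removed along with the departing device,
 * 0 if other children still depend on it. The departing device is assumed
 * to still be counted, so more than one child means the host stays. */
static int host_may_be_removed(const struct toy_device *devs, size_t n,
                               const char *host_udi)
{
    return count_children(devs, n, host_udi) <= 1;
}
```

With two disks behind one host, as in the SAS enclosure case, the function returns 0 and the host is kept. With a single USB stick behind its own host, it returns 1 and the host goes away together with the stick.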