Bug 1852

Summary: Hal causes SCSI errors on a sym53c8xx card
Product: hal
Reporter: Mathieu Chouquet-Stringer <mchouque>
Component: hald
Assignee: David Zeuthen (not reading bugmail) <zeuthen>
Status: RESOLVED FIXED
QA Contact:
Severity: critical
Priority: high
CC: ma
Version: unspecified
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:

Description Mathieu Chouquet-Stringer 2004-11-14 17:53:54 UTC
When haldaemon starts (hal-0.4.0-10), I get the following:
sym0: SCSI parity error detected: SCR1=132 DBC=50000000 SBCL=0
sym0:0: ERROR (81:0) (8-0-0) (1f/9f/0) @ (scripta 38:f31c0004).
sym0: script cmd = e21c0004
sym0: regdump: da 00 00 9f 47 1f 00 02 00 08 80 00 80 00 0f 02 ff ff ff 00 02 ff
ff ff.
sym0: SCSI BUS reset detected.
sym0: SCSI BUS has been reset.

This is 100% reproducible, but the OS survives the problem. However, if I try
to cycle the service, the SCSI card gets into abort/reset loops: at this stage
the system is more or less useless (i.e. no more I/O to disk).

Here's my version of the SCSI driver:
sym0: <895> rev 0x1 at pci 0000:00:0b.0 irq 10
sym0: Tekram NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.1.18k

I run kernel 2.6.10-rc1 (latest bk as of 11/14/2004).

If you have any questions or if you want me to try patches, let me know.

- Mathieu
Comment 1 Matthias Andree 2005-05-15 03:43:37 UTC
I also see this, and as no-one cared so far, I'm bumping Severity up.

OS: SUSE Linux 9.3
Kernels tried (same results): SUSE's kernel-default-2.6.11.4-20a as well as a
vanilla 2.6.11.9
hal-0.4.7-26 (SUSE)
udev-053-15.2 (SUSE)
hotplug-0.50-19 (SUSE)

Relevant discussion around hwscan(d) took place on Linux-SCSI in November 2004,
see <http://marc.theaimsgroup.com/?t=110048286400004&r=1&w=2>,
particularly <http://marc.theaimsgroup.com/?l=linux-scsi&m=110113027015843&w=2>.
Note, though, that the assertion in that discussion that the 53C876 and higher
fixed this is wrong: I can reproduce the problem on a 53C895 (as well as on a
53C860 and a 53C875; Matt Wilcox reproduced it on a 53C810A).

hald attempts to read 4,096 bytes from the
/sys/devices/pci0000:00/0000:00:0d.0/config file (which pertains to my SYM53C895
based Symbios SCSI adaptor) and gets 256 if run as root (64 if run as regular
user). In the upper 128 bytes of that 256-byte region, some of the SYM53C8XX
adaptors mirror their register block, and reading one of the registers causes
the parity error, with major consequences such as GNOME hanging and the CD-ROM
disappearing from the bus (see the syslog excerpt below).

Please make hald read only 128 (or perhaps 64 if sufficient) bytes from the
/config file of SYM53C8XX devices.
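
To illustrate the request above, here is a minimal Python sketch (not hald's actual code) that reads only the standard 64-byte PCI configuration header from a sysfs config file, so the device-specific upper region should never be touched. The names read_pci_config_header and parse_ids are hypothetical, and the 64-byte limit assumes the caller needs nothing beyond the standard header.

```python
import os

# Size of the standard PCI configuration header, in bytes.
PCI_STD_HEADER_LEN = 64


def read_pci_config_header(config_path, limit=PCI_STD_HEADER_LEN):
    """Read at most `limit` bytes from the start of a PCI config file.

    Hypothetical helper: it requests only the standard header instead of
    asking for 4096 bytes and taking whatever the kernel returns.
    """
    fd = os.open(config_path, os.O_RDONLY)
    try:
        return os.read(fd, limit)
    finally:
        os.close(fd)


def parse_ids(header):
    """Return (vendor_id, device_id) from the first four header bytes.

    Both fields are 16-bit little-endian values.
    """
    vendor_id = int.from_bytes(header[0:2], "little")
    device_id = int.from_bytes(header[2:4], "little")
    return vendor_id, device_id
```

Applied to the first four bytes shown in the strace below (\0\20\f\0), parse_ids gives vendor 0x1000 and device 0x000c, which matches the pci.vendor_id and pci.product_id values in the lshal excerpt.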

strace:
12750 12:12:48.546305 lstat64("/sys/devices/pci0000:00/0000:00:0d.0/config",
{st_mode=S_IFREG|0644, st_size=256, ...}) = 0
12750 12:12:48.546377 stat64("/sys/devices/pci0000:00/0000:00:0d.0/config",
{st_mode=S_IFREG|0644, st_size=256, ...}) = 0
12750 12:12:48.546449 open("/sys/devices/pci0000:00/0000:00:0d.0/config",
O_RDONLY) = 12
12750 12:12:48.546497 read(12, "\0\20\f\0\27\0\0\2\1\0\0\1\20
\0\0\1\320\0\0\0\0\200\354"..., 4096) = 256
12750 12:12:48.553789 close(12)         = 0

syslog:
May 14 19:56:47 merlin kernel: sym0: SCSI parity error detected: SCR1=132
DBC=50000000 SBCL=0
May 14 19:57:18 merlin kernel: sym0:2:0: ABORT operation started.
May 14 19:57:23 merlin kernel: sym0:2:0: ABORT operation timed-out.
May 14 19:57:23 merlin kernel: sym0:2:0: DEVICE RESET operation started.
May 14 19:57:28 merlin kernel: sym0:2:0: DEVICE RESET operation timed-out.
May 14 19:57:28 merlin kernel: sym0:2:0: BUS RESET operation started.
May 14 19:57:28 merlin kernel: sym0: SCSI BUS reset detected.
May 14 19:57:28 merlin kernel: sym0: SCSI BUS has been reset.
May 14 19:57:28 merlin kernel: sym0:2:0: BUS RESET operation complete.
May 14 19:57:38 merlin kernel: sym0:2:0: HOST RESET operation started.
May 14 19:57:48 merlin kernel: scsi: Device offlined - not ready after error
recovery: host 0 channel 0 id 2 lun 0
May 14 19:57:48 merlin kernel: scsi0 (2:0): rejecting I/O to offline device
May 14 19:57:48 merlin last message repeated 4 times
May 14 19:57:48 merlin kernel: cdrom: open failed.
May 14 19:57:48 merlin kernel: scsi0 (2:0): rejecting I/O to offline device
May 14 19:57:50 merlin kernel: cdrom: open failed.
May 14 19:57:50 merlin kernel: scsi0 (2:0): rejecting I/O to offline device
... and so on every two seconds

lspci:
0000:00:0d.0 SCSI storage controller: LSI Logic / Symbios Logic 53c895 (rev 01)
        Subsystem: Tekram Technology Co.,Ltd. DC-390U2W
        Flags: bus master, medium devsel, latency 32, IRQ 16
        I/O ports at d000 [size=256]
        Memory at ec800000 (32-bit, non-prefetchable) [size=256]
        Memory at ec000000 (32-bit, non-prefetchable) [size=4K]

lshal excerpt:
udi = '/org/freedesktop/Hal/devices/pci_1000_c'
  info.parent = '/org/freedesktop/Hal/devices/computer'  (string)
  info.udi = '/org/freedesktop/Hal/devices/pci_1000_c'  (string)
  pci.device_protocol = 0  (0x0)  (int)
  pci.device_subclass = 0  (0x0)  (int)
  pci.device_class = 1  (0x1)  (int)
  info.vendor = 'LSI Logic / Symbios Logic'  (string)
  info.product = '53c895'  (string)
  pci.subsys_product = 'DC-390U2W'  (string)
  pci.subsys_vendor = 'Tekram Technology Co.,Ltd.'  (string)
  pci.product = '53c895'  (string)
  pci.vendor = 'LSI Logic / Symbios Logic'  (string)
  pci.subsys_product_id = 14599  (0x3907)  (int)
  pci.subsys_vendor_id = 7649  (0x1de1)  (int)
  pci.product_id = 12  (0xc)  (int)
  pci.vendor_id = 4096  (0x1000)  (int)
  pci.linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:0d.0'  (string)
  linux.sysfs_path_device = '/sys/devices/pci0000:00/0000:00:0d.0'  (string)
  linux.sysfs_path = '/sys/devices/pci0000:00/0000:00:0d.0'  (string)
  info.bus = 'pci'  (string)

hald output:
12:12:48.528 [I] linux/osspec.c:795: handling
/sys/devices/pci0000:00/0000:00:0d.0 pci
12:12:48.560 [I] device_info.c:1175: scan_fdi_files: Processing file
'6in1-card-reader.fdi'
12:12:48.565 [I] device_info.c:1175: scan_fdi_files: Processing file
'ide-drives.fdi'
12:12:48.572 [I] device_info.c:1175: scan_fdi_files: Processing file
'jetflash-mp3-player.fdi'
12:12:48.576 [I] device_info.c:1175: scan_fdi_files: Processing file
'lexar-media-cf-reader.fdi'
12:12:48.581 [I] device_info.c:1175: scan_fdi_files: Processing file
'lucent-pcmcia-wireless.fdi'
12:12:48.586 [I] device_info.c:1175: scan_fdi_files: Processing file 'sony_dsc.fdi'
12:12:48.591 [I] device_info.c:1175: scan_fdi_files: Processing file
'usb-zip-drives.fdi'
12:12:48.598 [I] device_info.c:1175: scan_fdi_files: Processing file
'storage-policy.fdi'
12:12:48.604 [I] device_info.c:1175: scan_fdi_files: Processing file 'ipod.fdi'
12:12:48.608 [E] device_info.c:334: Could not resolve keypath
'@info.parent:storage.model' on udi '/org/freedesktop/Hal/devices/pci_1000_c'
12:12:48.618 [I] callout.c:318: Invoking /etc/hal/device.d/90-block-subfs.hal
12:12:48.657 [I] callout.c:330: Child pid 13098 for 90-block-subfs.hal
12:12:48.662 [I] callout.c:193: Callouts done for
/org/freedesktop/Hal/devices/pci_109e_36e
12:12:48.667 [W] hald_dbus.c:97: No property volume.is_disc on device with id
/org/freedesktop/Hal/devices/pci_109e_36e
12:12:48.678 [I] callout.c:173: Child pid 13098 terminated
12:12:48.683 [I] callout.c:318: Invoking /etc/hal/device.d/40-hal-hotplug-map.hal
12:12:48.722 [I] callout.c:330: Child pid 13099 for 40-hal-hotplug-map.hal
12:12:48.730 [I] callout.c:173: Child pid 13099 terminated
12:12:48.735 [I] hald.c:81: Added device to GDL;
udi=/org/freedesktop/Hal/devices/pci_1000_c
Comment 2 Matthias Andree 2005-05-15 03:43:49 UTC
*** Bug 3297 has been marked as a duplicate of this bug. ***
Comment 3 Mathieu Chouquet-Stringer 2005-05-15 08:20:58 UTC
FWIW, I currently run FC4T2 which ships hal-0.5.2-1 and I don't have this
problem anymore.
Comment 4 Matthias Andree 2005-05-16 12:06:11 UTC
David posted a patch against 0.4.7 to the hal list. The patch works for me.

Find the patch at:
http://lists.freedesktop.org/archives/hal/2005-May/002546.html
Deep link:
http://lists.freedesktop.org/archives/hal/attachments/20050516/b68022c6/hal-0.4.7-do-not-read-config-file.bin
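
The patch itself is not reproduced in this report, so the following is not its content. It is only a sketch of one way to avoid opening the config file at all, assuming the PCI device directory in sysfs provides the small text attribute files vendor and device holding hexadecimal IDs such as 0x1000. The function name read_pci_ids_from_sysfs is hypothetical.

```python
import os


def read_pci_ids_from_sysfs(device_dir):
    """Return (vendor_id, device_id) read from sysfs text attributes.

    Assumption: `device_dir` contains files named "vendor" and "device",
    each holding one hexadecimal number such as "0x1000". The binary
    "config" file is never opened.
    """
    ids = []
    for attr in ("vendor", "device"):
        with open(os.path.join(device_dir, attr), "r") as f:
            # int(..., 16) accepts an optional "0x" prefix.
            ids.append(int(f.read().strip(), 16))
    return tuple(ids)
```

For the adaptor in this report, the call would be read_pci_ids_from_sysfs("/sys/devices/pci0000:00/0000:00:0d.0").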
Comment 5 David Zeuthen (not reading bugmail) 2005-05-17 08:53:52 UTC
Fixed on hal-0_4-stable branch and part of hal 0.4.8 release. HEAD is not
affected by this. Thanks.
