61998 – Fails to read status from WD raptors

Summary: Fails to read status from WD raptors

Status:	NEW

Alias:	None

Product:	libatasmart
Classification:	Unclassified
Component:	library
Version:	unspecified
Hardware:	Other, OS: All

Importance:	medium (priority), normal (severity)
Assignee:	Lennart Poettering
QA Contact:	Lennart Poettering

Keywords:	have-backtrace, patch

Reported:	2013-03-08 03:40 UTC by Phillip Susi
Modified:	2017-10-04 12:08 UTC
CC List:	7 users

Attachments
fix-status-io-error.patch (1.12 KB, patch) 2013-03-18 23:10 UTC, Phillip Susi
ssd crash (1.16 MB, image/jpeg) 2016-03-23 22:33 UTC, Benjamin Bellec

Description Phillip Susi 2013-03-08 03:40:22 UTC

I get an I/O error when trying to read the health status:

psusi@faldara:~$ sudo skdump /dev/sdc
Device: sat16:/dev/sdc
Type: 16 Byte SCSI ATA SAT Passthru
Size: 35304 MiB
Model: [WDC WD360GD-00FNA0]
Serial: [WD-WMAH91337618]
Firmware: [35.06K35]
SMART Available: yes
Quirks:
Awake: yes
SMART Disk Health Good: Input/output error
Off-line Data Collection Status: [Off-line data collection activity was completed without error.]
Total Time To Complete Off-Line Data Collection: 1572 s
Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
Percent Self-Test Remaining: 0%
Conveyance Self-Test Available: yes
Short/Extended Self-Test Available: yes
Start Self-Test Available: yes
Abort Self-Test Available: yes
Short Self-Test Polling Time: 2 min
Extended Self-Test Polling Time: 28 min
Conveyance Self-Test Polling Time: 5 min
Bad Sectors: 0 sectors
Powered On: 1.7 years
Power Cycles: 5260
Average Powered On Per Power Cycle: 2.8 h
Temperature: 35.0 C
Attribute Parsing Verification: Good
Overall Status: Input/output error
ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
  1 raw-read-error-rate         200   200    51   0           0x000000000000 prefail online  yes  yes 
  3 spin-up-time                 88    84    21   2.1 s       0x340800000000 prefail online  yes  yes 
  4 start-stop-count             95    95    40   5571        0xc31500000000 old-age online  yes  yes 
  5 reallocated-sector-count    200   200   140   0 sectors   0x000000000000 prefail online  yes  yes 
  7 seek-error-rate             200   200    51   0           0x000000000000 prefail online  yes  yes 
  9 power-on-hours               80    80     0   1.7 years   0x833900000000 old-age online  n/a  n/a 
 10 spin-retry-count            100   100    51   0           0x000000000000 prefail online  yes  yes 
 11 calibration-retry-count     100   100    51   0           0x000000000000 prefail online  yes  yes 
 12 power-cycle-count            95    95     0   5260        0x8c1400000000 old-age online  n/a  n/a 
194 temperature-celsius-2       108   253     0   35.0 C      0x230000000000 old-age online  n/a  n/a 
196 reallocated-event-count     200   200     0   0           0x000000000000 old-age online  n/a  n/a 
197 current-pending-sector      200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
198 offline-uncorrectable       200   200     0   0 sectors   0x000000000000 old-age online  n/a  n/a 
199 udma-crc-error-count        200   253     0   0           0x000000000000 old-age online  n/a  n/a 
200 multi-zone-error-rate       200   125    51   0           0x000000000000 prefail offline yes  yes 


I traced it to this code in sk_disk_smart_status():

        /* SAT/USB bridges truncate packets, so we only check for 4F,
         * not for 2C on those */
        if ((d->type == SK_DISK_TYPE_ATA_PASSTHROUGH_12 || cmd[3] == htons(0x00C2U)) &&
            cmd[4] == htons(0x4F00U))
                *good = TRUE;
        else if ((d->type == SK_DISK_TYPE_ATA_PASSTHROUGH_12 || cmd[3] == htons(0x002CU)) &&
                 cmd[4] == htons(0xF400U))
                *good = FALSE;
        else {
>               errno = EIO;
                return -1;
        }

(gdb) print d->type
$5 = SK_DISK_TYPE_ATA_PASSTHROUGH_16
(gdb) print /x cmd
$4 = {0x0, 0x0, 0x0, 0xc200, 0x454f, 0x5000}

Comment 1 Phillip Susi 2013-03-18 23:10:59 UTC

Created attachment 76719
fix-status-io-error.patch

This simple patch fixes the bug. The existing code requires 8 bits that the standard leaves undefined to be zero, and my drives do not set them to zero. The patch masks off the undefined bits when comparing.

Comment 2 Martin Pitt 2013-03-25 06:41:26 UTC

Thanks, Phillip! I applied your patch to the Debian package (and will sync it into Ubuntu).

Comment 3 Phillip Susi 2013-09-25 14:19:29 UTC

Hi Lennart, it has been 6 months since I submitted this patch and it hasn't been applied yet.  Could you take a look and at least comment?

Comment 4 Orion Poplawski 2014-06-16 21:10:18 UTC

The patch appears to fix this for me, as well as bug 53475.

Comment 5 Orion Poplawski 2014-06-16 21:11:45 UTC

*** Bug 53475 has been marked as a duplicate of this bug. ***

Comment 6 Benjamin Bellec 2016-03-23 22:33:02 UTC

Created attachment 122509
ssd crash

I have a cheap SSD (Corsair Force LS) in my work computer, and it crashes regularly. I think this is related to this bug; see my attachment.

Comment 7 Phillip Susi 2016-05-13 00:31:05 UTC

3 years on and this patch is still waiting to be applied.

Comment 8 RASG 2017-10-04 12:08:21 UTC

Another year, and still the same problem.

In my case, it happens with a 500 GB hybrid (HDD+SSD) drive, ST95005620AS.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.