Possibly a dying hard drive, but reads, writes work – unsure about log entries

Question: I recently received a Linux box that was having problems with a Samba share: first, I couldn’t connect; second, ls -la showed an I/O error (close to what can be seen below) with no listing.

I’ve since fully updated the box. After the update the RAID is OK, all the data is accessible, and Samba works like a charm. Unfortunately, I didn’t save the previous logs.

Now, even if everything works, from time to time this pops up in my journalctl:

kernel: ata4: EH complete
kernel: end_request: I/O error, dev sdc, sector 2839546656
kernel: cdb[0]=0x28: 28 00 a9 40 0b 20 00 00 f0 00
kernel: sd 3:0:0:0: [sdc] CDB:
kernel: ASC=0x47 ASCQ=0x0
kernel: sd 3:0:0:0: [sdc]
kernel:         a9 40 0b a0
kernel:         72 0b 47 00 00 00 00 0c 00 0a 80 00 00 00 00 00
kernel: Descriptor sense data with sense descriptors (in hex):
kernel: Sense Key : 0xb [current] [descriptor]
kernel: sd 3:0:0:0: [sdc]
kernel: Result: hostbyte=0x00 driverbyte=0x08
kernel: sd 3:0:0:0: [sdc]
kernel: ata4.00: configured for UDMA/133
kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 330)
kernel: ata4: hard resetting link
kernel: ata4.00: error: { ICRC ABRT }
kernel: ata4.00: status: { DRDY ERR }
kernel: [145B blob data]
kernel: ata4.00: failed command: READ DMA EXT
kernel: ata4: SError: { UnrecovData 10B8B BadCRC }
kernel: ata4.00: BMDMA stat 0x26
kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6

smartctl -t extended (S.M.A.R.T. long (maximum) scan) says nothing three times already.

By “everything works”, I mean:

// Read from drive, write to drive.
find > files.txt
// Another read->write.
du -bc > sizes.txt
// 100 GB random writer
dd if=/dev/urandom of=fillerd bs=512 count=209715200

The files end up uncorrupted and fully readable.

What does the error indicate? Should I be worried? How do I fix it?

Answer: The salient log entries are:

kernel: ata4.00: error: { ICRC ABRT }
kernel: ata4: SError: { UnrecovData 10B8B BadCRC }

These log entries indicate that an error is occurring on the SATA interface between the PC and the HDD. The SATA interface carries data, commands and status reports in frames that are verified using a CRC (Cyclic Redundancy Check) code. The ICRC ABRT message indicates an “Interface CRC error” event and that the command was aborted (“Command aborted”). The other log entries are ancillary information relating to the command that was aborted.

This is not reporting an error relating to the R/W heads or platters of the HDD, since sectors are verified using ECC, not the weaker CRC. More detailed information about these messages is at this libata wiki page.

See this similar question on “SATA drives or chipset throwing DRDY ERR and ICRC ABRT”, where the source of the problem was attributed to the host side of the SATA interface and not the HDD.

Note that an occasional SATA interface error is not considered problematic:

For SATA drives, occasional transmission problems are expected even on otherwise pretty healthy systems. No need to worry about it too much unless the problem repeats itself a lot.

quoted from this Linux post.

smartctl -t extended (S.M.A.R.T. long (maximum) scan) says nothing three times already.

The Extended S.M.A.R.T. test is a self-test that is performed locally on the drive, and apparently does not stress the SATA interface. Hence it doesn’t help resolve the issue, but it does reinforce the notion that the issue is on the interface rather than the media.
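One way to look for corroboration is the drive’s own counter of link CRC errors, which many ATA drives expose as SMART attribute 199 (commonly named UDMA_CRC_Error_Count). The following is only a minimal sketch: it assumes smartmontools is installed, that the drive is /dev/sdc as in the question, and that the vendor reports this attribute under ID 199. The helper name crc_error_count is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): print the raw value of
# SMART attribute 199 from `smartctl -A` output supplied on stdin.
# Assumption: the drive reports interface CRC errors under ID 199.
crc_error_count() {
    # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
    awk '$1 == 199 { print $10 }'
}

# Usage (needs root; /dev/sdc is the device from the question):
#   smartctl -A /dev/sdc | crc_error_count
```

If the raw value keeps rising between checks, that would be consistent with a problem on the cable or ports rather than on the platters. If the command prints nothing, the drive probably does not report this attribute.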

You need to look for a disk diagnostic or exerciser that executes from the host PC. Since the Extended S.M.A.R.T. test can evidently read every sector without error, a near-identical test to read every sector and transfer that sector to the PC over the SATA bus is:

dd if=/dev/sdc of=/dev/null
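With dd’s default 512-byte block size, this read can be slow on a large drive. As a rough sketch, the same test can be wrapped with a larger block size and followed by a look at the kernel log. The helper name read_all and the 1 MiB block size are illustrative choices, not something taken from the original setup; bs=1M is GNU dd syntax.

```shell
#!/bin/sh
# Hypothetical helper: read every byte of a device (or file) and discard it.
# Returns dd's exit status, which is non-zero if a read fails.
read_all() {
    dd if="$1" of=/dev/null bs=1M 2>/dev/null
}

# Usage (needs root; /dev/sdc is the device from the question):
#   read_all /dev/sdc && echo "read completed without I/O errors"
#   journalctl -k | grep -E 'ICRC|BadCRC'   # then check for new interface errors
```

A clean exit status only means that every read eventually succeeded. Interface errors that the kernel recovered from by retrying still show up in the log, so the log check is the more telling part.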

There are three possible sources of hardware failure on the SATA interface:

The SATA cable (e.g. see “Is my drive dying?”). Simple test: replace the cable.
The motherboard’s SATA interface. Test: use a different SATA port, or install an alternate interface, such as a PCI or USB to SATA adapter, with a new cable.
The drive’s SATA interface. Test: install the HDD in another PC with a new cable, and see if the errors follow the drive.

Besides a hardware fault, there have also been reports that implicated the Linux kernel as the cause of SATA errors:

[SOLVED] DRDY ERR and ICRC ABRT in dmesg and console
Repeated DRDY ERR / ICRC ABRT msgs on 2.6.31-19-server

Bottom Line

If you’re only seeing these ICRC ABRT entries in the log at an infrequent “time to time” rate, then you may no longer have a problem. The original issues may be attributable to kernel problems that were eliminated when you updated the system.
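To put a number on “time to time”, the kernel log can be tallied per day. This is a sketch under assumptions: it expects journal lines that begin with an ISO 8601 timestamp, as produced by journalctl -o short-iso, and it counts only the ICRC ABRT line so that each aborted command is counted once. The helper name crc_errors_per_day is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "ICRC ABRT" kernel log lines per calendar day.
# Expects lines starting with an ISO 8601 timestamp (YYYY-MM-DD...) on stdin.
crc_errors_per_day() {
    # Keep only the abort lines, cut the date (first 10 characters), tally.
    grep 'ICRC ABRT' | cut -c1-10 | sort | uniq -c
}

# Usage:
#   journalctl -k -o short-iso | crc_errors_per_day
```

A handful of entries spread over weeks fits the “occasional transmission problems” description quoted above, while counts that climb from day to day would be a reason to go back to the cable and port tests.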

Keep using the system, and back up diligently.
