Question: tl;dr: if a URE occurs on an HDD, will I lose 1 bit, 1 byte, or a whole sector (512 bytes, or 4096 bytes with Advanced Format)? And, if possible, why?

Background: The question arises when a hard disk has a problem reading data. A disk can of course fail completely, leaving all its data lost (disk failure), but the case I am asking about is when just a small part of the data is lost (a URE, an uncorrectable read error).

Even though I have looked for information about UREs, I have found little that is certain. This may be because what happens inside the drive, hidden from direct user interaction (such as ECC correction), is hard for me to relate to what I access as a user: the sectors.

Let us imagine that the hdd has trouble reading data.

In that situation, surely this must mean either that:

  • (a) some bits of the sector cannot be read, or
  • (b) all bits can be read, yet they do not pass a checksum test (of course, a 4096-byte sector is not just 8*4096 bits; it also carries some additional bits/bytes for error checking/correction, i.e. parity bits), or
  • (c) ?

Now, my belief is that when a combination of (a) and (b) has occurred and a reliable reconstruction of the sector’s 4096 bytes cannot be done, it is excessive to assume that all of them are necessarily garbage. If we were aware of the drive’s internal error-correction logic, we might instead say: “Look, something does not check out, and there is a good chance that at least 1, 2, 3, ..., n bits/bytes of the block data are wrong.” If we were redundantly saving “hello,hello…..,hello” ASCII byte strings in this sector, we might still have a fair run of “hello,hello….” before there is a “…Uellohello…” (i.e. “h” -> “U”).

So what is the granularity of a URE?

UPDATE: A comment introduced the idea of a bad sector, suggesting that this reflects the granularity of a URE event. That is not an absurd suggestion, and it may help in answering the question. Yet I just read a related question about pending unreadable sectors (https://unix.stackexchange.com/questions/1869/how-do-i-make-my-disk-unmap-pending-unreadable-sectors), which leads me to think that in some scenarios the boundary of the data lost in a URE is blurrier.

Answer: The error correction code on a hard drive is an additional chunk of data that’s associated with each hardware sector. During writing, the drive firmware calculates this data and writes it along with the user’s data. During reading, the firmware reads the ECC along with the data and checks them together.
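As a rough illustration of that write/read cycle, here is a minimal Python sketch. It is only a toy: `write_sector` and `read_sector` are hypothetical names, and a CRC-32 from `zlib` stands in for the drive's real ECC (a CRC only detects errors, whereas real ECC can also correct a limited number of them).

```python
import zlib

SECTOR_SIZE = 512  # user bytes per hardware sector on a traditional drive


def write_sector(user_data: bytes) -> bytes:
    """Return what is stored: the user data followed by its check data."""
    if len(user_data) != SECTOR_SIZE:
        raise ValueError("a sector is always written as a whole")
    # Assumption: CRC-32 is a stand-in for the firmware's real ECC.
    check = zlib.crc32(user_data).to_bytes(4, "big")
    return user_data + check


def read_sector(stored: bytes) -> bytes:
    """Read data and check data together; hand back data only if they agree."""
    user_data, check = stored[:SECTOR_SIZE], stored[SECTOR_SIZE:]
    if zlib.crc32(user_data).to_bytes(4, "big") != check:
        # The check covers the sector as a whole, so the sector fails as a whole.
        raise IOError("check data does not match sector contents")
    return user_data
```

Flipping a single bit anywhere in the stored bytes makes `read_sector` raise instead of returning 512 bytes.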

For a traditional hard drive the hardware sector is 512 bytes. For an Advanced Format drive it’s 4K bytes (it doesn’t matter whether the drive is presenting 512-byte or 4K-byte sectors at the interface, i.e. 512e vs. 4kn).
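To make the 512e case concrete, this small Python sketch computes which 512-byte logical sectors live in the same 4096-byte hardware sector as a given one. The function name is hypothetical, and it assumes logical sector 0 starts exactly at a hardware sector boundary (no alignment offset).

```python
PHYSICAL_SECTOR = 4096  # bytes per hardware sector on an Advanced Format drive
LOGICAL_SECTOR = 512    # bytes per sector presented at the interface (512e)
LOGICAL_PER_PHYSICAL = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8


def logical_sectors_sharing_physical(lba: int) -> range:
    """Logical sectors stored in the same hardware sector as `lba`.

    Assumption: logical sector 0 is aligned to a hardware sector boundary.
    """
    first = (lba // LOGICAL_PER_PHYSICAL) * LOGICAL_PER_PHYSICAL
    return range(first, first + LOGICAL_PER_PHYSICAL)
```

For example, `logical_sectors_sharing_physical(1003)` gives `range(1000, 1008)`: since the ECC covers the hardware sector, those eight logical sectors share one set of check data.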

The check after a read has basically three possible outcomes:

  • sector was read without error. This is actually not all that common on modern hard drives; bit densities are so high that drives depend on ECC working.

  • sector was read with correctable errors. As implied above this is not uncommon; it is expected. The drive returns the data, with error correction applied, to the user.

  • sector was read but there were too many “wrong bits”; the errors could not be corrected.

In the last case the drive does not normally return any contents whatsoever; it just returns a status indicating the error. This is because it is impossible to determine which bits of the sector are bad, let alone what their values should be, so the entire sector (ECC bits and all) is untrustworthy. The ECC is a “gestalt” that is calculated across the entire sector content, and if it doesn’t match, it’s the entire sector that doesn’t match. So the granularity of a URE is one hardware sector: 512 bytes, or 4096 bytes on an Advanced Format drive.
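The three outcomes can be seen in miniature with a toy code. The sketch below uses an extended Hamming (8,4) code, which corrects one flipped bit and detects two; this is purely an illustration with hypothetical function names, since real drives use far stronger codes computed over the whole sector.

```python
from typing import List, Optional, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming codeword."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data  # data bits
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2                # overall parity bit
    return code


def read(code: List[int]) -> Tuple[str, Optional[List[int]]]:
    """Return (status, data); data is None when the errors are uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Exactly one bit flipped; the syndrome is its position
        # (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Two bits flipped: the syndrome is non-zero but does not say which
        # two, so no data is returned. (Three or more flips would be
        # miscorrected by this toy code.)
        return "uncorrectable", None
    return status, [code[3], code[5], code[6], code[7]]
```

With `word = encode([1, 0, 1, 1])`, flipping bit 5 alone yields `("corrected", [1, 0, 1, 1])`, while flipping bits 5 and 6 yields `("uncorrectable", None)`. In the second case the syndrome is 3, which is neither of the damaged positions, so the decoder has no basis for saying which bits are wrong.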

SpinRite works by simply trying to read the bad sector over and over again, using a “maintenance read” function that returns the data (but without ECC bits) even though the drive says “uncorrectable error”. As said in the description linked by DavidPostill, it may succeed with an error-free (actually “correctable” is more likely) read; or it may be able to deduce, essentially by averaging the returned bits together, a reasonable guess at the sector contents. It has no more ability to precisely correct errors using the ECC than the drive does; that’s mathematically impossible.
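The "averaging" idea can be sketched as a bitwise majority vote over several raw reads of the same sector. This is not SpinRite's actual algorithm, which I don't know in detail; `majority_vote` is a hypothetical helper that only illustrates the principle.

```python
from typing import Sequence


def majority_vote(reads: Sequence[bytes]) -> bytes:
    """Guess sector contents by taking, for each bit, the most common value."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    result = bytearray(length)
    for i in range(length):
        for bit in range(8):
            ones = sum((r[i] >> bit) & 1 for r in reads)
            if ones * 2 > len(reads):  # strict majority of reads saw a 1
                result[i] |= 1 << bit
    return bytes(result)
```

For instance, `majority_vote([b"hello", b"Uello", b"hellp"])` returns `b"hello"`. The result is only a guess: nothing verifies it against the ECC, and with an even number of reads a tied bit comes out as 0.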