Question:


Closed 5 years ago.

Why is it generally considered that RAID 5 systems are not suitable for larger disk sizes? Is the same true for RAID 6?

Reference: http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/

Answer: RAID 5 might not be reliable for large disk sizes because, statistically, storage devices are not immune to errors even when they are working normally. This is what is termed the UBE (sometimes URE) rate, for Unrecoverable Bit Error rate, and it is quoted in full-sector errors per number of bits read. For consumer rotational hard disk drives, this metric is normally specified at 10^-14, meaning that you can expect one failed sector read per 10^14 bits read, or roughly 12.5 TB. (Because of how exponents work, 10^-14 is the same thing as one per 10^14.)

10^14 bits might sound like a big number, but it’s really just a handful of full read passes over a modern large (say 4-6 TB) drive. With RAID 5, once one drive has failed, no redundancy whatsoever remains, which means that any error is non-correctable: if there is any problem reading anything from any of the other drives, the controller (whether hardware or software) won’t know what to do. At that point, your array breaks down.
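To put a number on "a handful of full read passes", here is a minimal sketch of the arithmetic. The function name `full_passes_per_error` and its parameters are made up for illustration. Drive datasheets commonly state this rate per bits read, and the factor of eight between bits and bytes changes the answer considerably, so the unit is a parameter rather than a fixed assumption.

```python
def full_passes_per_error(drive_tb, reads_per_error=1e14, per_bits=True):
    """Full read passes over one drive per expected unreadable sector.

    drive_tb: drive capacity in decimal terabytes (10**12 bytes).
    reads_per_error: the quoted figure (10**14 in the text above).
    per_bits: True if the figure counts bits read, False if it counts bytes.
    """
    # Convert the quoted figure to bytes so it can be compared with capacity.
    bytes_per_error = reads_per_error / 8 if per_bits else reads_per_error
    return bytes_per_error / (drive_tb * 1e12)


for drive_tb in (4, 6):
    by_bits = full_passes_per_error(drive_tb, per_bits=True)
    by_bytes = full_passes_per_error(drive_tb, per_bits=False)
    print(f"{drive_tb} TB drive: {by_bits:.1f} passes if quoted per bits, "
          f"{by_bytes:.1f} passes if quoted per bytes")
```

This is only the expected value implied by the datasheet figure; it says nothing about how errors are distributed over a real drive's life.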

What RAID 6 does is add a second drive’s worth of redundancy to the equation. This means that even if one drive fails entirely, RAID 6 is able to tolerate a read error on one of the other drives in the array at the same time, and still successfully reconstruct your data. This dramatically reduces the probability of a single problem causing your data to become unavailable, although it doesn’t eliminate the possibility: with one drive having failed, instead of one additional drive needing to develop a problem for data to be unrecoverable, two additional drives now need to develop a problem at the same location (the same stripe).
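The difference between "any one read error" and "two read errors in the same place" can be sketched with a deliberately naive model. Everything here is an assumption for illustration: the function names, the 4096-byte sector, treating one sector per drive as a stripe, reading the 10^-14 figure as per bits read, and above all treating read errors as independent, which real drives do not honor.

```python
import math


def p_sector_unreadable(sector_bytes=4096, bits_per_error=1e14):
    """Chance that one sector read fails, taking the quoted rate at face value."""
    return sector_bytes * 8 / bits_per_error


def p_at_least_k_of_m(k, m, q):
    """Probability that at least k of m independent sector reads fail."""
    return sum(math.comb(m, j) * q**j * (1 - q)**(m - j)
               for j in range(k, m + 1))


def p_rebuild_failure(drives, drive_bytes, parity, q, sector_bytes=4096):
    """Chance a rebuild hits an unrecoverable stripe after ONE drive has failed.

    parity=1 models RAID 5, parity=2 models RAID 6.
    """
    survivors = drives - 1
    stripes = drive_bytes / sector_bytes
    # With one drive gone, a stripe is lost once `parity` or more of the
    # surviving drives return a read error for that stripe.
    p_stripe = p_at_least_k_of_m(parity, survivors, q)
    # 1 - (1 - p_stripe)**stripes, written to stay accurate for tiny p_stripe.
    return -math.expm1(stripes * math.log1p(-p_stripe))


q = p_sector_unreadable()
raid5 = p_rebuild_failure(drives=10, drive_bytes=6e12, parity=1, q=q)
raid6 = p_rebuild_failure(drives=10, drive_bytes=6e12, parity=2, q=q)
print(f"RAID 5: {raid5:.3g}   RAID 6: {raid6:.3g}")
```

The absolute numbers should not be trusted, since the model ignores correlated and clustered errors. The point is the gap of many orders of magnitude between needing one error anywhere and needing two errors in the same stripe.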

Of course, that 10^-14 figure is statistical, in the same way that rotational hard drives commonly have a quoted statistical AFR (Annual Failure Rate) on the order of 2.5%. Taken at face value, that would mean that the average drive should last for 20-40 years, which is clearly not the case. Errors tend to happen in batches; you might be able to read 10^16 or 10^17 bytes without any sign of a problem, and then you get dozens or hundreds of read errors in short order.

RAID actually makes that latter problem worse by exposing the drives to very similar workloads and environments (temperature, vibration, power impurities, etc.). The situation is made worse still by the fact that many RAID arrays are commissioned and set up as a group, which means that by the time the first failure happens, all of the drives in the array will have been active for very nearly the same amount of time. All this makes correlated failures vastly more likely: when one drive fails, it is very likely that additional drives are marginal and may fail soon. Merely the stress of the full read pass together with normal user activity may be enough to push an additional drive into failing. As we saw, with RAID 5, with one drive nonfunctional, any read error anywhere else will cause a permanent error and is highly likely to simply bring your array to a halt. With RAID 6, you at least have some margin for further errors during the rebuild (resilvering) process.

Because the UBE rate is stated per amount of data read, and the amount of data read tends to correlate fairly well with how much data can be stored, what used to be a fine setup with a set of 100 MB drives might be a marginal setup with a set of 1 TB drives and might be completely unrealistic with a set of 4-6 TB drives, even if the physical number of drives remains the same. (In other words, ten 100 MB drives vs ten 6 TB drives.)
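The scaling with drive size can be made concrete with a rough sketch that compares ten-drive arrays of different capacities. The function name `p_read_error_during_rebuild` is hypothetical, and the model makes two loud assumptions: the quoted rate is taken at face value, and errors are treated as independent (a Poisson approximation), which the batching described earlier contradicts. The unit of the quoted rate is left as a parameter because bits versus bytes shifts the result by a factor of eight.

```python
import math


def p_read_error_during_rebuild(drives, drive_bytes,
                                reads_per_error=1e14, per_bits=True):
    """Chance of at least one unreadable sector while reading every
    surviving drive in full, after one drive of the array has failed."""
    surviving_bytes = (drives - 1) * drive_bytes
    amount_read = surviving_bytes * 8 if per_bits else surviving_bytes
    expected_errors = amount_read / reads_per_error
    # Poisson approximation: P(at least one) = 1 - exp(-expected_errors).
    return -math.expm1(-expected_errors)


for label, size in (("100 MB", 100e6), ("1 TB", 1e12), ("6 TB", 6e12)):
    p = p_read_error_during_rebuild(drives=10, drive_bytes=size)
    print(f"ten {label} drives: {p:.3g}")
```

With the same drive count, the only thing that changes between the three rows is capacity, which is exactly the effect the paragraph above describes.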

That is why RAID 5 is generally considered inadequate for arrays of the sizes common today; depending on specific needs, RAID 6 or RAID 1+0 is usually recommended instead.

And that’s not even touching on the detail that RAID is not a backup.