Question: Mean Time to Failure (MTTF) is usually given in terms of hours, and by doing some calculations, it seems that a disk should fail only after a good number of years have gone by.
It seems that disks need repair more often than that. Does anyone know why this is so?
I figured that there is something fishy about this metric. Am I interpreting something wrong here?
Answer: First off:
MTTF = Mean Time To Failure
MTTR = Mean Time To Repair
MTBF = Mean Time Between Failures = MTTF + MTTR
MTBF is often more or less equal to MTTF, since repair may take an hour while MTTF may be tens of thousands of hours. MTBF is also often not applicable, since defective products don’t get repaired but simply replaced, because repair costs more than replacement.
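To make the MTBF ≈ MTTF point concrete, here is a minimal sketch of the arithmetic. The figures (50,000 hours MTTF, 1 hour MTTR) are hypothetical values picked to match the "tens of thousands of hours" and "an hour" mentioned above, not data for any real product.

```python
def mtbf_hours(mttf_hours, mttr_hours):
    """MTBF = MTTF + MTTR, all in hours."""
    return mttf_hours + mttr_hours

# Hypothetical example values, not taken from any datasheet.
mttf = 50_000  # mean time to failure, hours
mttr = 1       # mean time to repair, hours

mtbf = mtbf_hours(mttf, mttr)
relative_gap = (mtbf - mttf) / mttf  # how much MTBF exceeds MTTF, as a fraction

print(mtbf, relative_gap)
```

With numbers like these the repair time adds only a tiny fraction to MTTF, which is why the two terms are often used interchangeably.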
Calculating MTTF is a complex statistical exercise that involves estimating the failure odds of each individual part. And it’s not a linear thing, as people sometimes presume. An MTTF of 1,000,000 hours doesn’t mean that among 1000 devices one will fail after 1000 hours, or that among 1,000,000 devices you will get a failure after 1 hour.
Many electronic devices follow the “bathtub curve”, where there are many failures early on, then a long time with hardly any failures, and near the end of life the number of failures rises again. Hard disks also have mechanical parts, whose failure curve is more linear: it slowly ramps up from day 1.
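One common way to picture a bathtub curve is as the sum of three failure-rate (hazard) terms: a decreasing one for early failures, a constant one for random failures, and an increasing one for wear-out. The sketch below uses Weibull hazards for each term. This is a modelling assumption for illustration, and every shape and scale parameter is made up, not fitted to any real drive.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    """Illustrative bathtub-shaped failure rate (failures per hour) at age t hours.

    All parameters below are hypothetical.
    """
    infant = weibull_hazard(t, shape=0.5, scale=2_000_000)  # shape < 1: rate falls with age
    random_ = weibull_hazard(t, shape=1.0, scale=1_000_000)  # shape = 1: constant rate
    wearout = weibull_hazard(t, shape=5.0, scale=60_000)     # shape > 1: rate rises with age
    return infant + random_ + wearout

# Failure rate at a young, a middle-aged and an old drive age (in hours).
for hours in (100, 20_000, 80_000):
    print(hours, bathtub_hazard(hours))
```

With these made-up parameters the rate is higher at 100 hours and at 80,000 hours than at 20,000 hours, which is the bathtub shape. The wear-out term grows from the first hour on, similar to the slow ramp-up described for mechanical parts.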
If the manufacturer quotes, for instance, 1,000,000 hours MTTF (most often measured in POH, Power-On Hours), it means that on average the drive should last more than 100 years (1,000,000 h / 8760 h per year ≈ 114 years). Some drives will last longer, some will fail earlier. So despite the 1,000,000 hours it’s perfectly possible to have a failure after 1000 hours. I once had a drive fail within a week, and that’s when you have to think back to the bathtub curve. The replacement drive has been spinning happily for >50k hours.
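As a rough sketch of the numbers above: the conversion from hours to years is plain arithmetic, and the failure probabilities below assume a constant failure rate (an exponential lifetime model). That model is a deliberate simplification that ignores the bathtub curve, so treat the probabilities as order-of-magnitude illustrations only. The function names are made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

def mttf_to_years(mttf_hours):
    """Convert an MTTF in power-on hours to years of continuous operation."""
    return mttf_hours / HOURS_PER_YEAR

def prob_fail_within(hours, mttf_hours):
    """Probability that a single drive fails within `hours`.

    Assumes a constant failure rate of 1/MTTF (exponential model),
    which is a simplification and not how real drives behave.
    """
    return 1.0 - math.exp(-hours / mttf_hours)

mttf = 1_000_000  # hours, the figure used in the answer

print(round(mttf_to_years(mttf), 1))           # 114.2
print(prob_fail_within(1000, mttf))            # roughly 0.001
print(prob_fail_within(HOURS_PER_YEAR, mttf))  # a bit under 0.009
```

Even under this optimistic model, roughly one drive in a thousand fails within the first 1000 hours, so an early failure does not contradict a 1,000,000-hour MTTF.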