Question: I still don't understand why RAID5 is better than RAID4. I understand that both compute parity bits that are used for recovery if a failure occurs; the only difference is where those parity bits are stored. I have borrowed the diagrams from here: How does parity work on a RAID-5 array
A    B    (A XOR B)
0    0    0
1    1    0
0    1    1
1    0    1
RAID4
Disk1    Disk2    Disk3    Disk4
----------------------------------
data1    data1    data1    parity1
data2    data2    data2    parity2
data3    data3    data3    parity3
data4    data4    data4    parity4
Let's say that the first row is:
data1 = 1
data1 = 0
data1 = 1
parity1 = 0 (COMPUTED: 1 XOR 0 XOR 1 = 0)
RAID5
Disk1    Disk2    Disk3    Disk4
----------------------------------
parity1  data1    data1    data1
data2    parity2  data2    data2
data3    data3    parity3  data3
data4    data4    data4    parity4
Let's say that the first row is:
parity1 = 0 (COMPUTED: 1 XOR 0 XOR 1 = 0)
data1 = 1
data1 = 0
data1 = 1
Scenarios:
1. RAID4 – Disk3 FAILURE:
data1 = 1
data1 = 0
data1 = 1 (COMPUTED: 1 XOR 0 XOR 0 = 1)
parity1 = 0
2. RAID4 – Disk4 (parity) FAILURE:
data1 = 1
data1 = 0
data1 = 1
parity1 = 0 (COMPUTED: 1 XOR 0 XOR 1 = 0)
etc.
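The scenarios above can be checked with a short Python sketch. It is only an illustration under simplifying assumptions: each "disk" holds a single bit from one row, and the helper names `parity` and `reconstruct` are made up for this example, not taken from any RAID implementation.

```python
from functools import reduce
from operator import xor


def parity(data_blocks):
    """XOR all data blocks of one row together to get the parity value."""
    return reduce(xor, data_blocks, 0)


def reconstruct(row, failed):
    """Rebuild the value at index `failed` by XOR-ing the surviving values."""
    survivors = [value for i, value in enumerate(row) if i != failed]
    return reduce(xor, survivors, 0)


data = [1, 0, 1]
raid4_row = data + [parity(data)]   # parity on the last disk: [1, 0, 1, 0]
raid5_row = [parity(data)] + data   # parity on the first disk: [0, 1, 0, 1]

# Whichever single disk fails, the lost value comes back the same way,
# regardless of where the parity value sits in the row.
for row in (raid4_row, raid5_row):
    for failed in range(len(row)):
        assert reconstruct(row, failed) == row[failed]
```

The recovery step never needs to know which position holds the parity, so the placement of parity does not affect correctness.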
In general: when a RAID (4 or 5) array uses N disks and one fails, I can take the values from all the remaining (N-1) disks and XOR them together (since XOR is an associative operation), and I will get the failed value. What is the benefit of rotating the parity across the disks rather than storing it on a dedicated disk? Is there some performance benefit, or something else? Thank you.
Answer: There is a performance difference: with RAID 4, every change requires a write to the single parity disk, which means writes can queue up waiting to update the parity data on that disk.
With RAID 5 this bottleneck is significantly reduced because the parity update load is spread across multiple disks, so there’s less chance of getting stuck in a queue.
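As a rough illustration of that difference, the sketch below counts which disk receives the parity update for a series of small writes. It is a toy model, not a real RAID implementation: the function names are hypothetical, disks are numbered from 0, and the RAID 5 rotation simply follows the diagram in the question (parity for row k on disk k). Real arrays may rotate parity in a different order.

```python
from collections import Counter


def raid4_parity_disk(row, n_disks):
    """RAID 4: parity always lives on the last, dedicated disk."""
    return n_disks - 1


def raid5_parity_disk(row, n_disks):
    """RAID 5 (rotation assumed from the question's diagram): row k -> disk k."""
    return row % n_disks


def parity_writes(parity_disk_fn, rows_written, n_disks):
    """Count how many parity updates land on each disk."""
    return Counter(parity_disk_fn(row, n_disks) for row in rows_written)


# Hypothetical workload: one small write to each of rows 0..7 on a 4-disk array.
rows_written = range(8)
raid4 = parity_writes(raid4_parity_disk, rows_written, 4)
raid5 = parity_writes(raid5_parity_disk, rows_written, 4)

print("RAID 4 parity writes per disk:", dict(raid4))
print("RAID 5 parity writes per disk:", dict(raid5))
```

In this model all eight parity updates hit disk 3 under RAID 4, while under RAID 5 each of the four disks takes two, which is the queueing difference described above.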
Here’s a nice link from Fujitsu with a short explanation and some nice animations to help clarify the performance/penalties of RAID 4 (as well as other RAID levels).