
RAID5 versus RAID10

Guest on 7th June 2022 01:19:42 AM

RAID5 versus RAID10 (or even RAID3 or RAID4)

First, let's get on the same page so we're all talking about apples.

What is RAID5?
OK, here is the deal: RAID5 uses ONLY ONE parity drive per stripe, and many
RAID5 arrays are 5 drives (4 data and 1 parity, though it is not a single
drive holding all of the parity as in RAID3 and RAID4; read on).  If your
counts are different, adjust the calculations appropriately.  If you have
10 drives of, say, 20GB each for 200GB raw, RAID5 will use 20% for parity
(assuming you set it up as two 5-drive arrays), so you will have 160GB of
storage.  Now since RAID10, like mirroring (RAID1), uses 1 (or more) mirror
drive for each primary drive, you are using 50% for redundancy, so to get
the same 160GB of storage you will need 8 pairs, or 16 20GB drives, which
is why RAID5 is so popular.  This intro is just to put things into
perspective.
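The capacity arithmetic above can be sketched in a few lines (the drive counts and sizes are the illustrative figures from the text; the helper names are mine, not any RAID tool's API):

```python
# Hypothetical capacity comparison for the figures used above:
# 10 x 20GB drives as two 5-drive RAID5 arrays vs. RAID10 mirrored pairs.

def raid5_usable(drives, size_gb, drives_per_array=5):
    """Each RAID5 array loses one drive's worth of capacity to parity."""
    arrays = drives // drives_per_array
    return arrays * (drives_per_array - 1) * size_gb

def raid10_drives_needed(target_gb, size_gb):
    """RAID10 mirrors every drive, so usable capacity is half the raw total."""
    pairs = target_gb // size_gb  # each pair yields one drive of usable space
    return pairs * 2

usable = raid5_usable(10, 20)               # 160 GB usable from 10 drives
needed = raid10_drives_needed(usable, 20)   # 16 drives for the same 160 GB
print(usable, needed)
```

This is the whole cost argument for RAID5 in two function calls: the same usable capacity takes 10 drives under RAID5 but 16 under RAID10.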
RAID5 is physically a stripe set like RAID0, but with data recovery
included.  RAID5 reserves one disk block out of each stripe block for
parity data.  The parity block contains an error correction code which can
correct any error in the RAID5 block; in effect it is used in combination
with the remaining data blocks to recreate any single missing block, gone
missing because a drive has failed.  The innovation of RAID5 over RAID3 and
RAID4 is that the parity is distributed on a round-robin basis, so that
there can be independent reading of different blocks from the several
drives.  This is why RAID5 became more popular than RAID3 and RAID4, which
must synchronously read the same block from all drives together.  So, if
Drive2 fails, blocks 1, 2, 4, 5, 6, and 7 are data blocks on this drive and
blocks 3 and 8 are parity blocks on this drive.  That means the parity on
Drive5 will be used to recreate the data block from Drive2 if block 1 is
requested before a new drive replaces Drive2, or during the rebuilding of
the new Drive2 replacement.  Likewise, the parity on Drive1 will be used to
repair block 2 and the parity on Drive3 will repair block 4, etc.  For
block 2 all the data is safely on the remaining drives, but during the
rebuilding of Drive2's replacement a new parity block will be calculated
from the block 2 data and will be written to Drive2.
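As a sketch of the round-robin idea, here is one illustrative layout convention (real implementations differ in rotation direction and starting drive) plus the XOR reconstruction that recovers a lost block:

```python
# Sketch of round-robin parity placement and block recovery for a
# 5-drive RAID5 array.  The rotation convention below is illustrative;
# actual controllers and software RAID layouts vary.

N = 5  # drives in the array

def parity_drive(stripe):
    """Parity rotates round-robin: stripe 0 -> drive 4, stripe 1 -> drive 3, ..."""
    return (N - 1 - stripe) % N

def reconstruct(blocks):
    """XOR the N-1 surviving blocks of a stripe to recover the missing one.
    Works whether the missing block held data or parity."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

# Demo: lose one data block, rebuild it from the survivors plus parity.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = reconstruct(data)  # parity is simply the XOR of the data blocks
rebuilt = reconstruct([data[1], data[2], data[3], parity])
print(rebuilt == data[0])   # the lost block is recovered
```

The same `reconstruct` routine is what runs, block after block, during the degraded reads and the rebuild described below, which is why a RAID5 rebuild touches every surviving drive.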
Now, when a disk block is read from the array, the RAID software/firmware
calculates which RAID block contains the disk block, which drive the disk
block is on, and which drive contains the parity block for that RAID block,
and reads ONLY the one data drive.  It returns the data block.  If you
later modify the data block, it recalculates the parity by subtracting the
old block and adding in the new version, then in two separate operations it
writes the data block followed by the new parity block.  To do this it must
first read the parity block from whichever drive contains the parity for
that stripe block, and reread the unmodified data for the updated block
from the original drive.  This read-read-write-write is known as the RAID5
write penalty.  Since these two writes are sequential and synchronous, the
write system call cannot return until the reread and both writes complete,
for safety, so writing to RAID5 is up to 50% slower than RAID0 for an array
of the same capacity.  (Some software RAID5 implementations avoid the
re-read by keeping an unmodified copy of the original block in memory.)
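The "subtract the old block and add in the new version" step is an XOR in practice.  A minimal sketch of the parity update, with made-up block contents and the four I/Os from the text marked in comments:

```python
# The RAID5 small-write parity update:
#   new_parity = old_parity XOR old_data XOR new_data
# The comments below mark the read-read-write-write sequence described
# in the text.  Block contents are invented for illustration.

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

old_data   = b"\x01\x02\x03\x04"   # the block being overwritten
other_data = b"\x10\x20\x30\x40"   # the rest of the stripe, untouched
old_parity = xor(old_data, other_data)  # parity as originally written

new_data = b"\xAA\xBB\xCC\xDD"
# Read 1: the old data block.  Read 2: the old parity block.
new_parity = xor(xor(old_parity, old_data), new_data)
# Write 1: the new data block.  Write 2: the new parity block.

# Sanity check: recomputing parity from the whole stripe agrees.
assert new_parity == xor(new_data, other_data)
print(new_parity.hex())
```

Note that the untouched stripe members (`other_data`) never need to be read; the XOR identity lets the update work from only the old data and old parity, which is exactly why the penalty is two reads and two writes rather than a full-stripe read.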
Now, what is RAID10?
RAID10 is one of the possible combinations of RAID1 (mirroring) and RAID0
(striping).  There used to be confusion about what RAID01 or RAID10 meant,
and different RAID vendors defined them differently.  About five years or
so ago I proposed the following standard language, which seems to have
taken hold: when N mirrored pairs are striped together, this is called
RAID10 because the mirroring (RAID1) is applied before the striping
(RAID0).  The other option is to create two stripe sets and mirror them one
to the other; this is known as RAID01 (because the RAID0 is applied first).
In either a RAID01 or RAID10 system, each and every disk block is
completely duplicated on its drive's mirror.  Performance-wise, both RAID01
and RAID10 are functionally equivalent.  The difference comes in during
recovery, where RAID01 suffers from some of the same problems I will
describe affecting RAID5, while RAID10 does not.
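To make the layout difference concrete, here is one illustrative way the two schemes might map a logical block to physical drives for 4 pairs (8 drives); the drive numbering is my own convention, not any vendor's:

```python
# Illustrative block placement for 8 drives under the two layouts
# described above.  Drive numbering is an assumption for the sketch.
PAIRS = 4

def raid10_drives(block):
    """RAID10: mirror first, then stripe across the mirrored pairs.
    Each logical block lives on both drives of exactly one pair."""
    pair = block % PAIRS
    return (2 * pair, 2 * pair + 1)

def raid01_drives(block):
    """RAID01: build two 4-drive stripe sets (drives 0-3 and 4-7),
    then mirror one whole stripe set onto the other."""
    member = block % PAIRS
    return (member, member + PAIRS)

print(raid10_drives(5), raid01_drives(5))
```

Either way every block exists on two drives, which is why read performance is equivalent; the difference only shows up when you ask what a single drive failure takes offline (one half of one pair versus one member of an entire stripe set).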
Now, if a drive in the RAID5 array dies, is removed, or is shut off, data
is returned by reading the blocks from the remaining drives and calculating
the missing data using the parity, assuming the defunct drive is not the
parity-block drive for that RAID block.  Note that it takes 4 physical
reads to replace the missing disk block (for a 5-drive array) for four out
of every five disk blocks, leading to a 64% performance degradation until
the problem is discovered and a new drive can be mapped in to begin
recovery.  Performance is degraded further during recovery, because all
drives are being actively accessed in order to rebuild the replacement
drive (see below).
If a drive in the RAID10 array dies, data is returned from its mirror drive
in a single read, with only a minor performance reduction (6.25% on
average, for the 4-pair array as a whole) when two non-contiguous blocks
are needed from the damaged pair (since the two blocks cannot be read in
parallel from both drives), and none otherwise.
One begins to get an inkling of what is going on, and why I dislike RAID5.
But, as they say on late-night infomercials, there's more.
What's wrong besides a bit of performance I don't know I'm missing?
OK, so that brings us to the final question of the day, which is: what is
the problem with RAID5?  It does recover a failed drive, right?  So writes
are slower; I don't do enough writing to worry about it, and the cache
helps a lot too; I've got LOTS of cache!  The problem is that despite the
improved reliability of modern drives and the improved error correction
codes on most drives, and even despite the additional 8 bytes of error
correction that EMC puts on every Clariion drive disk block (if you are
lucky enough to use EMC systems), it is more than a little possible that a
drive will become flaky and begin to return garbage.  This is known as
partial media failure.  Now, SCSI controllers reserve several hundred disk
blocks to be remapped to replace fading sectors with unused ones, but if
the drive is going, these will not last very long and will run out, and
SCSI does NOT report correctable errors back to the OS!  Therefore you will
not know the drive is becoming unstable until it is too late, when there
are no more replacement sectors and the drive begins to return garbage.
[Note that the recently popular IDE/ATA drives do not (TMK) include
bad-sector remapping in their hardware, so garbage is returned that much
sooner.]

When a drive returns garbage, since RAID5 does not EVER check parity on
read (RAID3 and RAID4 do, BTW, and both perform better for databases than
RAID5 to boot), when you write the garbage sector back, garbage parity will
be calculated and your RAID5 integrity is lost!  Similarly, if a drive
fails and one of the remaining drives is flaky, the replacement will be
rebuilt with garbage too, propagating the problem to two blocks instead of
just one.
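Because RAID5 never verifies parity on ordinary reads, the usual mitigation is a periodic background "scrub" that re-reads whole stripes and checks them.  A minimal sketch of that check, assuming simple XOR parity:

```python
# RAID5 reads touch only the one data drive, so silent corruption
# ("partial media failure") goes unnoticed.  A scrub re-reads the full
# stripe and verifies the parity; this is a minimal sketch of that test.

def xor_blocks(blocks):
    """XOR a list of equal-length blocks together."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

def scrub_stripe(data_blocks, parity_block):
    """Return True if the stripe is consistent, False on silent corruption."""
    return xor_blocks(data_blocks) == parity_block

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

print(scrub_stripe(data, parity))   # a healthy stripe checks out

data[2] = b"CCCX"                   # a drive silently returns garbage
print(scrub_stripe(data, parity))   # the scrub catches the mismatch
```

Note the limit of this defense: plain XOR parity can detect that a stripe is inconsistent, but it cannot tell WHICH block is the garbage one, which is exactly why a flaky drive's garbage propagates into the parity (and into rebuilds) as described above.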
Need more?  During recovery, read performance for a RAID5 array is degraded
by as much as 80%.  Some advanced arrays let you configure the preference
more toward recovery or toward performance.  However, doing so will
increase recovery time, and increase the likelihood of losing a second
drive in the array before recovery completes, resulting in catastrophic
data loss.  RAID10, on the other hand, will only be recovering one drive
out of 4 or more pairs, with ONLY the performance of reads from the
recovering pair degraded, making the performance hit to the array overall
only about 20%!  Plus there is no parity calculation time used during
recovery; it's a straight data copy.
What about that thing about losing a second drive?  Well, with RAID10 there
is no danger unless the one mirror that is recovering also fails, and
that's 80% or more less likely than that any other drive in a RAID5 array
will fail!  And since most multiple-drive failures are caused by undetected
manufacturing defects, you can make even this possibility vanishingly small
by making sure to mirror every drive with one from a different
manufacturer's lot number.  ("Oh," you say, "this scenario does not seem
likely!"  Pooh.  We lost 50 drives over two weeks when a batch of 200 IBM
drives began to fail.  IBM discovered that the single lot of drives would
have their spindle bearings freeze after so many hours of operation.
Fortunately, due in part to RAID10 and in part to a herculean effort by DG
techs and our own people over 2 weeks, no data was lost.  HOWEVER, one
RAID5 filesystem was a total loss after a second drive failed during
recovery.  Fortunately everything was on tape.)
Conclusion?  For safety and performance, favor RAID10 first, RAID3 second,
RAID4 third, and RAID5 last!  The original reason for the RAID2-5 specs was
that the high cost of disks was making RAID1, mirroring, impractical.  That
is no longer the case!  Drives are commodity priced; even the biggest,
fastest drives are cheaper in absolute dollars than drives were then, and
cost per MB is a tiny fraction of what it was.  Does RAID5 make ANY sense
anymore?  Obviously, I think not.
To put things into perspective: if a drive costs $1000US (and most are far
less expensive than that), then switching from a 4-pair RAID10 array to a
5-drive RAID5 array will save 3 drives, or $3000US.  What is the cost of
overtime and wear and tear on the technicians, DBAs, managers, and
customers of even a recovery scare?  What is the cost of reduced
performance and possibly reduced customer satisfaction?  Finally, what is
the cost of lost business if data is unrecoverable?  I maintain that the
drives are FAR cheaper!  Hence my mantra:
  163. Art S. Kagel
