{"id":1537,"date":"2018-08-11T16:58:35","date_gmt":"2018-08-11T14:58:35","guid":{"rendered":"http:\/\/borkedcode.com\/wp\/?p=1537"},"modified":"2018-08-11T16:58:35","modified_gmt":"2018-08-11T14:58:35","slug":"raid-5-and-the-sky-is-falling","status":"publish","type":"post","link":"https:\/\/www.borkedcode.com\/wp2\/2018\/08\/11\/raid-5-and-the-sky-is-falling\/","title":{"rendered":"RAID-5 and the Sky Is Falling"},"content":{"rendered":"<h2>The Situation<\/h2>\n<p>I\u2019ve seen, over the last few weeks, more than a few posts on a popular IT hangout site proclaiming in loud, evangelical voice that \u201cRAID-5 is terrible for spinning disks, never use it!\u00a0 If you do, you\u2019re a stupidhead!\u201d and similar statements.<\/p>\n<p>I\u2019m here to tell you that\u2019s not an appropriate answer.\u00a0 In fact, it\u2019s tunnel-vision BS.<\/p>\n<p>I\u2019m also here to remind you that RAID is not a backup \u2013 it is <em>avoidance of downtime<\/em>, and it is <em>reliability of storage.<\/em>\u00a0 If you are relying on a RAID array to protect you from data loss, you need to add some extra figures to your budget.\u00a0 <strong><em><u>You cannot have your production system also be your backup repository.<\/u><\/em><\/strong>\u00a0 If you think that you are safe because all your production data is on a RAID, and you don\u2019t bother with a proper backup, you are going to be in deep kimchee when you have a serious issue with your array.<\/p>\n<p>Now, I suspect there is a kernel of truth inside the concern here \u2013 it seems to stem from an article written last year whose theme was \u201cIs this the end of RAID-5\u201d or something similar.\u00a0 That article was quite accurate in its point \u2013 that with the escalating size of drives today, and the numbers of them we are using to produce our volumes, it is inevitable that a drive failure will occur \u2013 and that during a rebuild, it becomes a mathematical likelihood that a read error will result in a rebuild 
failure.<\/p>\n<p>All quite true.<\/p>\n<p>But in many of the conversations where I\u2019ve seen the doomsayers trumpeting their end-of-the-world mantras, the volume sizes simply do not justify the fear.<\/p>\n<p>Let\u2019s take a realistic look at RAID fails, and figure out the real numbers, so we can all breathe a little calmer, shall we?<\/p>\n<p>As a goal for this article, I want to give you the ability to calculate the odds of data loss in your own RAID systems when we\u2019re done.<\/p>\n<p>First off, we have to look at the risk we are mitigating with RAID\u2026drive failures and read failures.\u00a0 Both come down to a small percentage chance of failure, which is best tied to the figures \u201cAnnualized Failure Rate\u201d (AFR, which represents the % of drives that die in a year) and \u201cUnrecoverable Read Error\u201d (URE, which represents an attempt by an array to read a sector that fails, probably due to a bit error).<\/p>\n<p>Google wrote a paper on drive fails about ten years ago, which showed that drives which don\u2019t die in the first few months of life generally last for five years or so before their AFR starts climbing to about 6%-8%, which is generally considered unacceptable for datacenter or other usage that requires reliability.\u00a0 As it happens, Backblaze (backblaze.com) is a DC that publishes its own empirical hard drive mortality stats regularly, so these figures can be updated in your own records using accurate data for the brands of drive you use.<\/p>\n<p>The most current Backblaze chart as of the time of this writing can be found here:\u00a0 <a href=\"https:\/\/www.backblaze.com\/blog\/hard-drive-stats-for-q1-2018\/\">https:\/\/www.backblaze.com\/blog\/hard-drive-stats-for-q1-2018\/<\/a><\/p>\n<p>So let\u2019s begin, shall we?<\/p>\n<p>During this article, I\u2019m going to spell out several different scenarios, all real-world and all appropriate for both SMBs and personal operations.\u00a0 I have direct, hands-on experience with 
each of them, and it is my hope you\u2019ll be able to perform the same calculations for those arrays within your own sphere of control.<\/p>\n<p>Array 1:\u00a0 4 Western Digital Red drives, 4TB each in a RAID-5 array.<\/p>\n<p>Array 2:\u00a0 4 HGST NAS drives, 8TB each in a RAID-5 array.<\/p>\n<p>Array 3:\u00a0 8 Western Digital Red drives, 6TB each in a RAID-6 array. (we\u2019ll also run over this in RAID-5 just to be thorough)<\/p>\n<p>Array 4:\u00a0 12 Seagate Iron Wolf Pro drives, 10TB each in RAID-6 (as with the above, we\u2019ll hit it at RAID-5 too)<\/p>\n<p>Array 5:\u00a0 12 Seagate Enterprise Capacity drives, 8TB each in RAID-6 (and RAID-5)<\/p>\n<p>Array 6:\u00a0 12 Seagate 300GB Savvio drives, RAID-5<\/p>\n<p>Array 7:\u00a0 7 Seagate 600GB Savvio drives, RAID-5<\/p>\n<p>(Note:\u00a0 Enterprise Capacity drives have been re-branded by Seagate and now go by the name \u201cExos\u201d)<\/p>\n<p>We start by collecting fail rates on those drives, both annualized fail rates from the empirical charts at BackBlaze, and the averaged bit-read error rate.\u00a0 Note that AFR increases with age, high temperature, and power cycles; it lowers for things like using Helium as a filler (despite this making all your data sound like it was recorded by Donald Duck).\u00a0 The bit error rate figures are drawn directly from the manufacturer\u2019s sites (and can often be found as BER, \u201cbit error rate\u201d), so there will be some \u2018wiggle room\u2019 in our final derived figures.<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"141\">Drive<\/td>\n<td width=\"161\">Annualized Failure Rate<\/td>\n<td width=\"142\">Bit Error Rate<\/td>\n<\/tr>\n<tr>\n<td width=\"141\">WD Red 4TB<\/td>\n<td width=\"161\">2.17%<\/td>\n<td width=\"142\">1 per 10e14<\/td>\n<\/tr>\n<tr>\n<td width=\"141\">HGST NAS 8TB<\/td>\n<td width=\"161\">1.2%<\/td>\n<td width=\"142\">1 per 10e14<\/td>\n<\/tr>\n<tr>\n<td width=\"141\">WD Red 6TB<\/td>\n<td width=\"161\">4.19%<\/td>\n<td width=\"142\">1 per 
10e14<\/td>\n<\/tr>\n<tr>\n<td width=\"141\">Iron Wolf Pro 10TB<\/td>\n<td width=\"161\">0.47%<\/td>\n<td width=\"142\">1 per 10e15<\/td>\n<\/tr>\n<tr>\n<td width=\"141\">Seagate EC 8TB<\/td>\n<td width=\"161\">1.08%<\/td>\n<td width=\"142\">1 per 10e15<\/td>\n<\/tr>\n<tr>\n<td width=\"141\">Seagate Savvio .3TB<\/td>\n<td width=\"161\">0.44%<\/td>\n<td width=\"142\">1 per 10e16<\/td>\n<\/tr>\n<tr>\n<td width=\"141\">Seagate Savvio .6TB<\/td>\n<td width=\"161\">0.44%<\/td>\n<td width=\"142\">1 per 10e16<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>For reference, the reason why people often follow up the statement \u201cRAID-5 is crap\u201d with \u201cunless you use an SSD\u201d is because SSDs have a BER of around 1 per 10e17 \u2013 a bit error on an SSD is extremely rare.<\/p>\n<p>With these figures, and with the sizes of the arrays and their types known, we can prepare the variables of the equation we\u2019ll build.<\/p>\n<p>Num:\u00a0 Number of drives in the array<\/p>\n<p>ALoss:\u00a0 Allowed loss \u2013 the number of drives we can afford to lose before unrecoverable data loss occurs.<\/p>\n<p>AFR:\u00a0 Annualized Failure Rate (derived from empirical evidence)<\/p>\n<p>URE:\u00a0 Unrecoverable Read Error, this is the same as \u201cBit Error Rate\u201d above<\/p>\n<p>MTTR:\u00a0 Mean time to repair \u2013 this will vary depending on your drive sizes, cage controller(s), memory, processor, etc.\u00a0 I\u2019m going to just plug in \u201c24 hours\u201d here, you can put in whatever you feel is appropriate.<\/p>\n<p>We\u2019re also going to be playing a probability game with these: since we don\u2019t know exactly when something is going to blow out on us, we can only assume statistical probability.\u00a0 To set the stage, let\u2019s play with a few dice (and that\u2019s something I know quite a bit about, having written a book on Craps some decades ago).\u00a0 We want to establish the probability of a particular event.<\/p>\n<p>The probability of an event = number of sought 
outcomes \/ number of total outcomes<\/p>\n<p>Starting simple, we\u2019ll use a six-sided die.\u00a0 We want to prepare an equation to determine the odds of <em>rolling a one on any of ten rolls.\u00a0 <\/em><\/p>\n<p>So our sought outcome is 1.\u00a0 Number of total outcomes is 6.\u00a0 That gives us 1\/6, or 0.1667.<\/p>\n<p>We\u2019re trying ten times, which complicates matters.\u00a0 It\u2019s not simply additive.\u00a0 It\u2019s <em>multiplicative.<\/em>\u00a0 And when we\u2019re collating multiple independent events, we multiply the odds of each event against each other.\u00a0 The probability of two events A and B happening together, then, is Prob(A) * Prob(B).\u00a0 If we were asking \u201cwhat are the odds of rolling a one on each of ten rolls\u201d it would be pretty easy.\u00a0 But that\u2019s not the question we\u2019re asking.<\/p>\n<p>The question we\u2019re asking is <em>what are the odds of one or more of the rolls being a one?<\/em><\/p>\n<p>We have to invert our approach a bit.\u00a0 We\u2019re going to start with 100% and subtract the chance of <em>not getting a 1.<\/em>\u00a0 If we determine the odds of avoiding a 1 on every single roll, then the chance of getting a 1 on any one of our rolls is the inverse of that.\u00a0 The odds of <em>not<\/em> getting a 1 when we roll are 5\/6, and there are ten tries being made, so (5\/6) raised to the 10<sup>th<\/sup>.\u00a0 Then we simply subtract that from 100% to get our answer.<\/p>\n<p>(5\/6) raised to the 10<sup>th<\/sup> is (9,765,625 \/ 60,466,176), which is 0.1615 \u2013 I rounded a bit.<\/p>\n<p>1 \u2013 0.1615 = 0.8385, which is our result.\u00a0 The odds of rolling a 1 on any of ten individual rolls are 83.85%.<\/p>\n<h2>RAID Types<\/h2>\n<p>A little backgrounder on types of RAID for the uninitiated here \u2013 and there\u2019s no shame in not knowing, this stuff is pretty dry for all but the total platterhead.\u00a0 I guess that means I\u2019m a bit of a dork, but what the hell.<\/p>\n<p>RAID means 
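If you'd rather let a computer check the dice math above, here's a minimal sketch (Python is my choice here, the article itself uses no code):

```python
from fractions import Fraction

# Odds of avoiding a 1 on a single roll: 5/6.
# Avoiding a 1 on all ten independent rolls: (5/6)^10.
p_avoid_all = Fraction(5, 6) ** 10        # 9765625/60466176, about 0.1615

# Odds of at least one 1 in ten rolls: the inverse.
p_at_least_one = 1 - p_avoid_all          # about 0.8385, i.e. 83.85%

print(float(p_avoid_all), float(p_at_least_one))
```

Using `Fraction` keeps the exact 9,765,625 / 60,466,176 ratio around instead of rounding early.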
\u201cRedundant Array of Inexpensive Disks\u201d and first became popular commercially in the late \u201880s and early \u201890s, when hard drives were becoming economically a big deal.\u00a0 Previously, a strategy called \u201cSLED\u201d was considered the go-to model for storage, and it represented \u201cSingle Large Expensive Disk\u201d.\u00a0 RAID took over, because it was a lot more economical to bond multiple inexpensive units into an array that offered capacity equal to a drive which would cost far more than the combined cost of the RAID.<\/p>\n<p>Different RAID types offer different advantages.\u00a0 Importantly, <em>all of them are considered for use as volumes,<\/em> just like you\u2019d consider a hard drive.\u00a0 These aren\u2019t magic, they\u2019re just volumes.\u00a0 How you use them is up to you.\u00a0 <em>When you store production data on them, they need to be backed up using smart backup practice.<\/em><\/p>\n<p>Most mentions you\u2019ll see regarding RAID include various numbers, each of\u00a0 which means something:<\/p>\n<p><u>RAID 0<\/u> \u2013 this form of raid uses at least two disks, and \u201cstripes\u201d data across all of them.\u00a0 This offers fast read performance, fast write performance.\u00a0 Usually this RAID limits its use of any physical drives to the size of the smallest in the group (so if you have three 4TB and one 6TB, it will generally only use 4TB of the 6TB drive).\u00a0 This RAID also provides the used capacity in full for storage, so 3 4TB drives will make a 12TB RAID 0 volume.\u00a0 This RAID adds vulnerability:\u00a0 if any one of the drives in the array is lost, you lose data.<\/p>\n<p><u>RAID 1<\/u> \u2013 this is \u201cmirroring\u201d.\u00a0 It uses an even number of disks (usually just two), and makes an exact copy of volume data on each drive.\u00a0 They don\u2019t have to be the same size, but the volume will only be as big as the smallest drive.\u00a0 Benefit is fast reading (no benefit in write speed) and 
redundant protection \u2013 if you lose a drive, you still have its mirror.\u00a0 It also is fast to create, as adding a second drive only requires that the new drive receive a copy of the other.\u00a0 The performance benefits are limited only to the speed of the slowest member of the array.\u00a0 This method gives up 50% of the total drive capacity to form the mirror.<\/p>\n<p><u>RAID 2<\/u> \u2013 it\u2019s unlikely you\u2019ll ever see this in your life.\u00a0 Uses a disk for parity information in case of loss of a data disk.\u00a0 It\u2019s capable of super-fast performance, but it depended on coordinating the spin of all disks to be in sync with each other.<\/p>\n<p><u>RAID 3<\/u> \u2013 Also extremely rare, this one is good for superfast sequential reads or writes, so perhaps would be good for surveillance camera recording or reading extended video tracks.\u00a0 This also uses a parity disk similar to RAID 2.<\/p>\n<p><u>RAID 4<\/u> \u2013 another rare one, suitable for lots of little reads, not so hot for little writes, also uses a dedicated parity disk like 2 &amp; 3.<\/p>\n<p><u>RAID 5<\/u> \u2013 this is currently the most common form of raid.\u00a0 It stripes data among all its drives, just like RAID 0, but it also dedicates a portion of its array equal to the capacity of one of its disks to parity information and stripes that parity information among all disks in the array.\u00a0 This is different from the previous forms of parity, which used a single disk to store all parity info.\u00a0 RAID 5 can withstand the loss of any one disk without data loss from the array\u2019s volumes, but a second drive loss will take data with it.\u00a0 This array type has an advantage in write speed against a single disk, but not quite as good as RAID 0 since it has to calculate and record parity info.<\/p>\n<p><u>RAID 6<\/u> \u2013 this basically takes the idea of striped parity in RAID 5 and adds redundancy to it:\u00a0 this array stores parity info twice, enabling it 
to resist the loss of two drives without data loss.<\/p>\n<p><u>RAID 10<\/u> \u2013 this is actually \u201cnested\u201d RAID, a combination of 1 (mirroring) and 0 (striping).\u00a0 This requires at least four disks, which are striped and mirrored.\u00a0 Usually this is done for performance, and some data protection.\u00a0 It\u2019s a little bit more protected than RAID 5, in that it can withstand the loss of one drive reliably, and if it loses a second, there\u2019s a chance that second drive won\u2019t cause data loss.\u00a0 However, this one gives up 50% of the total drive capacity to the mirror copies.<\/p>\n<p>There are also a series of other nested forms of RAID, but if you need those you\u2019re well past the scope of this article.<\/p>\n<h2>Parity<\/h2>\n<div id=\"attachment_1556\" style=\"width: 310px\" class=\"wp-caption alignleft\"><a href=\"http:\/\/borkedcode.com\/wp\/wp-content\/uploads\/2018\/08\/600px-RAID_6.svg_.png\"><img fetchpriority=\"high\" decoding=\"async\" aria-describedby=\"caption-attachment-1556\" class=\"wp-image-1556 size-medium\" src=\"http:\/\/borkedcode.com\/wp\/wp-content\/uploads\/2018\/08\/600px-RAID_6.svg_-300x177.png\" alt=\"\" width=\"300\" height=\"177\" \/><\/a><p id=\"caption-attachment-1556\" class=\"wp-caption-text\">Credit: Wikipedia<\/p><\/div>\n<p>In RAID terminology, &#8220;Parity&#8221; is a value calculated from the combination of bits on the disks in the array (most famously an XOR calc, but different vendors can stray from this), which generates a bit value which is recorded in the &#8220;parity&#8221; bit.<\/p>\n<p>In the image here of a RAID 6 array, the first bit of stripe A&#8217;s parity would be generated by taking the first bit of each of A1, A2, and A3, and performing a sequential XOR calculation on them.\u00a0 This would produce a bit that is recorded on Ap (the second parity, Aq, is computed differently &#8211; more on that in a moment).\u00a0 Later, if a disk fails &#8211; say Disk 0 bites it &#8211; then the system can read the data from the bits in A2, A3, and Ap or Aq to 
figure out what belongs where A1 used to be.\u00a0 When a new drive replaces the failed Disk 0, that calculation is run for every bit on the disk, and the new drive is &#8220;rebuilt&#8221; to where the old one was.<\/p>\n<p>There&#8217;s also an important point to be made about the\u00a0<em>types<\/em> of parity you&#8217;re looking at in that image.\u00a0 There are multiple ways to calculate the parity bit that is being used.\u00a0 In RAID 5, the most common is an XOR calculation.\u00a0 In this method &#8220;bit number 1&#8221; on each data stripe is XOR&#8217;ed with the next one, and then the next, etc. until you reach the parity stripe and the result is then recorded there.\u00a0 Effectively this is a &#8220;horizontal&#8221; line drawn through each disk, ending in the parity stripe.\u00a0 So when you need to know what was on that data disk (whether rebuilding or just reading), it can be re-constructed by backing up that XOR equation.<\/p>\n<p>And then&#8230;the gods rose from R&#8217;lyeh to lay down RAID 6 parity.<\/p>\n<p>Most RAID-6 uses an encoding method for its extra parity called &#8220;Reed-Solomon&#8221; (this method is used in a lot of data-reading applications, like barcode scanners, DVD readers, and low-bandwidth radio data transmission).\u00a0 This method manages to record parity against a second missing piece using other data in the array &#8211; RS encoding builds its parity using an algorithm that generates something like a scattergram of source bits, both vertical and horizontal (which makes it resistant to the loss of a second data disk &#8211; if it just copied the XOR result of the first disk then a second data disk would corrupt the intent).\u00a0 I&#8217;m not going to pretend I understand the Galois Field and other heavy-duty math behind this stuff, I just know it exists, it is commonly used for RAID-6, and Dumbledore or the Old Ones were probably involved somewhere along the way.\u00a0 It costs more CPU- and IO-wise, which is why it 
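The XOR half of the story is simple enough to demonstrate for yourself. A toy Python sketch of RAID-5-style parity and reconstruction (toy byte strings of my own choosing, not a real controller):

```python
# Toy RAID-5 parity: three data "stripes" and their byte-wise XOR parity.
a1, a2, a3 = b"\x0f\xaa", b"\xf0\x55", b"\x3c\x99"
parity = bytes(x ^ y ^ z for x, y, z in zip(a1, a2, a3))

# Disk 0 dies and takes a1 with it.  XOR is its own inverse, so running
# the same calculation with the parity stripe standing in for the lost
# stripe rebuilds the missing data exactly.
rebuilt = bytes(p ^ y ^ z for p, y, z in zip(parity, a2, a3))
assert rebuilt == a1
```

The Reed-Solomon second parity of RAID-6 is the part that doesn't fit in a few lines; this only shows the plain XOR stripe.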
isn&#8217;t commonly used in RAID-5.<\/p>\n<p>(I say &#8220;Most&#8221; RAID-6, because other vendors can use different methods &#8211; for example, Adaptec has their own proprietary algorithm in their hardware controllers, different from RS, but the functional result to us as users is the same.)<\/p>\n<h2>Data Loss<\/h2>\n<p>What is it that takes us into data loss territory?\u00a0 Obviously, dropping the entire cage while powered up and running will get us there fast.\u00a0 Let\u2019s make the assumption that if something along those lines were to occur, you\u2019d have an entirely different set of problems, and you wouldn\u2019t have time to be perusing this article. Instead, we\u2019ll focus on natural wear-and-tear.\u00a0 To get to data loss, there are three steps:<\/p>\n<ol>\n<li>Initial drive failure(s), and&#8230;\n<ol>\n<li>Enough drive failures at any point before preservation to exceed our acceptable loss<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>Some other topics we\u2019ll talk about:<\/p>\n<ol start=\"2\">\n<li>Possible drive failure during rebuild (I\u2019ll tell you towards the end here why you should have caution before starting that rebuild)<\/li>\n<\/ol>\n<p>\u2026and\/or\u2026<\/p>\n<ol start=\"3\">\n<li>Read error during rebuild (this is why 2 will require caution)<\/li>\n<\/ol>\n<p>This brings me to a very important point, and one around which this entire discussion revolves:\u00a0 protecting your data.\u00a0 I think the entire \u201cRAID-5 is poopy\u201d argument stems from forgetting that one must never rely on RAID levels as the only protection of your data.\u00a0 RAID serves to make you a nice big volume of capacity, and protects your uptime with some performance benefits.<\/p>\n<p><em>It does not magically provide itself with backup<\/em>.\u00a0 You have to back it up just like anything else.<\/p>\n<p>So if you\u2019re creating a 3TB array, get something that can back that array up and has the capacity on reliable forms of storage to keep 
your data safely.<\/p>\n<h2>Drive Failure<\/h2>\n<h3><span style=\"text-decoration: underline;\">First Failure<\/span><\/h3>\n<p>The initial drive failure is a compound figure of the AFR by the number of drives, and we\u2019ll figure it on an annual rate. This part is pretty simple, let\u2019s go back to our dice equation and substitute drive values:<\/p>\n<p>Drive Loss Rate = <em>what are the odds at least one drive will die in a year?<\/em><\/p>\n<p>If it\u2019s just one drive, that\u2019s easy \u2013 use the AFR.<\/p>\n<p>But it\u2019s multiple drives, so we have to approach it backwards like we did with dice rolls.<\/p>\n<p>So it\u2019s 100% minus (1-AFR)eNumberOfDrives.<\/p>\n<p>For my Array 1, for example:\u00a0 those WD\u2019s have an AFR of .0217.\u00a0 Plugging this into the equation above yields:<\/p>\n<p>100% &#8211; (1-AFR)e4 = 100% &#8211; 91.59% = 8.41%<\/p>\n<p>So I have about an 8.41% chance of losing a drive in a given year.\u00a0 This will change over time as the drives age, etc.<\/p>\n<h3><span style=\"text-decoration: underline;\">Drive Failure Before Preservation<\/span><\/h3>\n<p>So let\u2019s assume I lost a drive.\u00a0 I\u2019m now ticking with no redundancy in my array, and what are my chances of losing another to cause data loss during the window of time I have to secure my data?<\/p>\n<p>This one is also pretty simple \u2013 it\u2019s the same calc we just did, but we\u2019re doing it only for the gap-time before we preserve the data and for the remaining drives in the array.\u00a0 Let\u2019s use two examples \u2013 24 hours, and two weeks.<\/p>\n<p>24 hours:\u00a0 1 \u2013 (1-(AFR * 0.00273))eN<\/p>\n<p>Where AFR is the AFR of the drive, N is the number of drives remaining.\u00a0 The 0.00273 is the fraction of a year represented by 24 hours.<\/p>\n<p>2 weeks:\u00a0 1 \u2013 (1-(AFR * 0.0384))eN<\/p>\n<p>0.0384 is the fraction of a year represented by 2 weeks.<\/p>\n<p>If it\u2019s my Array 1, then we\u2019re working with WD reds which 
have a 0.0217 AFR.\u00a0 I lose a drive, I have three left.\u00a0 Plugging those values in results in:<\/p>\n<p>24 hours:\u00a0 1 \u2013 (1-(0.0217 * 0.00273))e3 = 1 \u2013 (0.99994)e3 = 0.0001777, or 0.01777% chance of failure<\/p>\n<p>2 weeks:\u00a0 1 \u2013 (1-(0.0217 * 0.0384))e3 = 1 \u2013 (0.99917)e3 = 0.002498, or 0.2498% chance of failure<\/p>\n<p>We now know what it will take for my Array 1 to have a <u>data loss<\/u> failure:\u00a0 8.41% (chance of initial drive failure) times the chance of failure during the gap when I am protecting my data.\u00a0 Assuming I\u2019m a lazy bastard, let\u2019s go with 2 weeks, 0.2498%.<\/p>\n<p>That data loss figure comes out to be 0.021%.\u00a0 A little bit more than two chances in ten thousand.<\/p>\n<p>Based on that, I\u2019m pretty comfy with RAID-5.\u00a0 Especially since I take a backup of that array every night.<\/p>\n<h2>Unrecoverable Read Error<\/h2>\n<p>This figure is generally the one that strikes fear into people\u2019s hearts when talking about RAID-5.\u00a0 I want to establish the odds of a read error occurring during the rebuild, so we can really assess what the fearful figure is.<\/p>\n<p>What is Bit Error Rate?\u00a0 In simple terms, BER is calculated as (# of errors \/ total bits read).\u00a0 Let\u2019s find a way to translate these minuscule numbers into something our brains can grok, like a percentage.<\/p>\n<p>To start, we\u2019re reading some big quantities of data from hard drives, so let\u2019s bring that into the equation too \u2013 there are 8 bits in a byte, and 1,000,000,000 bytes in a Gigabyte.\u00a0 Add three more zeroes for a Terabyte.<\/p>\n<p>Be aware that some arrays can see a failure coming, and have the ability to activate a hot-spare to replace the threatened drive \u2013 most SAN units have this capability, for example, and a lot of current NAS vendors do as well.\u00a0 If yours can\u2019t, this is where you should be paying attention to your SMART health reports, so you can see it 
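The Array 1 arithmetic above is a couple of lines of code if you'd rather not chase a calculator. A quick Python sketch (the function name and plugged-in figures are my own):

```python
def p_any_failure(afr: float, drives: int, years: float = 1.0) -> float:
    """Odds that at least one of `drives` fails within `years`,
    by inverting the odds that every drive survives (same trick as the dice)."""
    return 1 - (1 - afr * years) ** drives

# Array 1: four WD Red 4TB drives at 2.17% AFR.
p_initial = p_any_failure(0.0217, 4)                  # ~8.4% chance of a first loss in a year
p_window = p_any_failure(0.0217, 3, years=0.0384)     # ~0.25% second loss in a lazy two-week window
p_data_loss = p_initial * p_window                    # ~0.00021, about 0.021%
print(p_initial, p_window, p_data_loss)
```

Swap in your own AFR, drive count, and window fraction to run the same numbers for your arrays.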
coming and take action beforehand.\u00a0 Usually that action is to install and activate a hot-spare.\u00a0 If you have a hot-spare and it gets activated, it receives a bit-for-bit copy of what\u2019s on the failing drive, and then is promoted to take over the position of the failing disk.\u00a0 This avoids rebuild errors and is much faster than a rebuild, but it doesn\u2019t protect from BER, so if there\u2019s a bit error during the copy then the incorrect bit will be written to the new drive.\u00a0 This might not be a big issue, as many file formats can withstand an occasional error.\u00a0 Might even be that the error takes place on unused space.<\/p>\n<p>Rebuilds of an array are another case entirely.\u00a0 The time required is much greater, since the array is reading <em>every single bit <\/em>from the remaining stripe data on the good drives, and doing an XOR calc using the parity stripe to determine what the missing bit should be, and writing it to the new drive.\u00a0 During a rebuild, that bit error poses a bigger problem.\u00a0 We are unable to read, ergo we can\u2019t do the XOR calc, and that means we have a rebuild failure.<\/p>\n<p>(If we\u2019re in RAID-1, by the way, that\u2019s a block-for-block copy from a good drive to the new drive \u2013 a bit error will end up copying rather than calculating, so there won\u2019t be a failure, just bad data.)<\/p>\n<p>If we had a hot spare, we\u2019d be out of the woods before having to rebuild.\u00a0 But let\u2019s keep looking at that rebuild.<\/p>\n<p>Translating that BER into how likely a rebuild failure is&#8230;the math gets a little sticky.<\/p>\n<p>UREs, just like drive fails, are a matter of odds.\u00a0 Every bit you read is an independent event, with the odds of failure being the bit-read chance that we collected about the drive.\u00a0 The probability equation comes out looking like this:<\/p>\n<p>1 \u2013 (1 \u2013 BER)e(total bits read)<\/p>\n<p>Let\u2019s apply the probabilities we started with at the beginning of this article to the drives 
in my Array 1 now.\u00a0 A reminder, these are WD Red 4TB drives.\u00a0 Western Digital sets a BER value of 1 per 10e14.<\/p>\n<p>Array 1 blows a drive.\u00a0 I\u2019ve got three left, and a new 4TB I popped into the array.\u00a0 I trigger the rebuild.\u00a0 We\u2019ve already said 24 hours, so we\u2019ll stick with that (technically it\u2019s closer to 10h for a 4TB, but big deal).<\/p>\n<p>Edit 10.10.2018 &#8211; I have identified a mistake in my calcs here courtesy of the Spiceworks forum.\u00a0 Parity data is being read from more drives than I originally laid out.\u00a0 By the time you read this, the information below will have been corrected.<\/p>\n<p>My array now has to perform <em>three\u00a0<\/em>reads (two data and one parity) to get each value to be written to the new drive \u2013 a read on each surviving data stripe, and a read on the parity stripe.\u00a0 So I\u2019m actually reading three times the volume of the target drive.<\/p>\n<p>4TB is 4,000,000,000,000 bytes.\u00a0 Three times that is 12,000,000,000,000.\u00a0 8 bits per byte means 96,000,000,000,000.\u00a0 Which is a crap-ton of bits.<\/p>\n<p>However, 10e14 (the BER of our WD drives) is 100,000,000,000,000.\u00a0 That\u2019s an even bigger crap-ton.\u00a0 Not that much bigger, but bigger.<\/p>\n<p>So let\u2019s ask the question, and plug in the numbers.\u00a0 The question:<\/p>\n<p><em>During my rebuild, what are the odds of rolling a mis-read on any of my 96,000,000,000,000 reads?<\/em><\/p>\n<p>As before, let&#8217;s invert this question and ask instead,\u00a0<em>what are the odds of\u00a0<span style=\"text-decoration: underline;\">not<\/span> rolling a mis-read on every one of our reads?\u00a0<\/em>and then subtract that from 1.<\/p>\n<p>The odds of a successful read on each of these reads are 99,999,999,999,999 \/ 100,000,000,000,000.\u00a0 We\u2019re trying 96,000,000,000,000 times.\u00a0 Most of our PCs can\u2019t raise something to the 96-trillionth power, I\u2019m afraid.\u00a0 Even Excel&#8217;s BINOM.DIST will 
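That said, any language with a `log1p` function can handle it by working in log space rather than raising directly to the 96-trillionth power. A quick Python sketch (my own, with WD's stated BER plugged in):

```python
import math

bits_read = 3 * 4e12 * 8      # rebuild reads three full 4 TB drives: 96e12 bit reads
ber = 1e-14                   # WD Red: 1 unrecoverable error per 1e14 bits

# (1 - ber) ** bits_read is hopeless on a pocket calculator, but in log space
# it's just exp(bits_read * ln(1 - ber)); log1p keeps the precision.
p_clean_rebuild = math.exp(bits_read * math.log1p(-ber))
p_ure = 1 - p_clean_rebuild

print(f"{p_ure:.2%}")         # ~61.7%, a hair off the figure below thanks to calculator rounding
```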
barf on numbers this size.\u00a0 You&#8217;re going to need a scientific calculator to get this done.<\/p>\n<p>1 \u2013 (99,999,999,999,999\/100,000,000,000,000)e96,000,000,000,000 =<\/p>\n<p>1 \u2013 (0.99999999999999)e96,000,000,000,000 =<\/p>\n<p>(now you\u2019re going to have to trust me on the following figure, I got it from the scientific calculator at <a href=\"https:\/\/www.mathsisfun.com\/scientific-calculator.html\">https:\/\/www.mathsisfun.com\/scientific-calculator.html<\/a>)<\/p>\n<p>1 \u2013 0.38318679500580827 = 0.6168132049941917<\/p>\n<p>So the odds of a BER giving my Array 1 a bad case of indigestion are 61.68%.\u00a0 That\u2019s a pretty scary figure, actually, and I\u2019ll get to the mitigation of it later.\u00a0 It\u2019s this kind of figure that I think generally gives people enough of the willies to make that crazy \u201cRAID-5 is for poopyheads!\u201d proclamation.\u00a0 Very likely because the people who make that claim assume that this is the end of the road.<\/p>\n<p>Thankfully, we\u2019re looking at odds of <em>data loss<\/em>.\u00a0 Not necessarily <em>rebuild failure<\/em>, though that does factor into the odds of loss.<\/p>\n<h2>The Equation for Data Loss<\/h2>\n<p>In order to have loss of data, basically we have to lose a number of drives that our array cannot tolerate, before we can protect or preserve that data.<\/p>\n<p>Let\u2019s say that window of time comes out to two weeks.\u00a0 That\u2019s probably a lot more than we need, so it will inflate the odds to a conservative number.\u00a0 Two weeks is 336 hours, .0384 of a year.<\/p>\n<p>So given that, the basic odds of data loss are:<\/p>\n<p>For RAID-5, we need to lose a second drive for data loss.\u00a0 That means odds of Initial Loss * odds of another loss during window (remember that these are multiplicative, not additive).\u00a0 If all the arrays I mentioned above were RAID-5, and using the \u201clazy bastard\u201d two-week window, here\u2019s where we\u2019d 
be:<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"72\">Array and # drives<\/td>\n<td width=\"86\">Drive Type<\/td>\n<td width=\"97\">Annualized Failure Rate<\/td>\n<td width=\"89\">Odds of Initial Loss<\/td>\n<td width=\"125\">Loss during Window<\/td>\n<td width=\"98\">Total Chance<\/td>\n<\/tr>\n<tr>\n<td width=\"72\"><\/td>\n<td width=\"86\"><\/td>\n<td width=\"97\"><\/td>\n<td width=\"89\">1 &#8211; (1-AFR)eN<\/td>\n<td width=\"125\">1 \u2013 (1-(AFR * 0.0384))e(N-1)<\/td>\n<td width=\"98\">Initial * Window Loss<\/td>\n<\/tr>\n<tr>\n<td width=\"72\">Array 1 \u2013 4 drives<\/td>\n<td width=\"86\">WD Red 4TB<\/td>\n<td width=\"97\">2.17%<\/td>\n<td width=\"89\">1-(1-.0217)e4 = 8.41%<\/td>\n<td width=\"125\">1-(1-(.0217*.0384))e3 = 0.25%<\/td>\n<td width=\"98\">0.00021, or 0.021%<\/td>\n<\/tr>\n<tr>\n<td width=\"72\">Array 2 \u2013 4 drives<\/td>\n<td width=\"86\">HGST NAS 8TB<\/td>\n<td width=\"97\">1.2%<\/td>\n<td width=\"89\">1-(1-.012)e4 = 4.7%<\/td>\n<td width=\"125\">1-(1-(.012*.0384))e3 = 0.138%<\/td>\n<td width=\"98\">0.00006486, or 0.0065%<\/td>\n<\/tr>\n<tr>\n<td width=\"72\">Array 3 \u2013 8 drives<\/td>\n<td width=\"86\">WD Red 6TB<\/td>\n<td width=\"97\">4.19%<\/td>\n<td width=\"89\">1-(1-.0419)e8 = 28.99%<\/td>\n<td width=\"125\">1-(1-(.0419*.0384))e7 = 1.12%<\/td>\n<td width=\"98\">0.003249, or 0.3249%<\/td>\n<\/tr>\n<tr>\n<td width=\"72\">Array 4 \u2013 12 drives<\/td>\n<td width=\"86\">Iron Wolf Pro 10TB<\/td>\n<td width=\"97\">0.47%<\/td>\n<td width=\"89\">1-(1-.0047)e12 = 5.50%<\/td>\n<td width=\"125\">1-(1-(.0047*.0384))e11 = 0.198%<\/td>\n<td width=\"98\">0.000109, or 0.0109%<\/td>\n<\/tr>\n<tr>\n<td width=\"72\">Array 5 \u2013 12 drives<\/td>\n<td width=\"86\">Seagate EC 8TB<\/td>\n<td width=\"97\">1.08%<\/td>\n<td width=\"89\">1-(1-.0108)e12 = 12.217%<\/td>\n<td width=\"125\">1-(1-(.0108*.0384))e11 = 0.455%<\/td>\n<td width=\"98\">0.0005559, or 0.05559%<\/td>\n<\/tr>\n<tr>\n<td 
width=\"72\">Array 6 \u2013 12 drives<\/td>\n<td width=\"86\">Seagate Savvio .3TB<\/td>\n<td width=\"97\">0.44%<\/td>\n<td width=\"89\">1-(1-.0044)e12 =<\/p>\n<p>5.154%<\/td>\n<td width=\"125\">1-(1-(.0044*.0384))e11 = 0.1857%<\/td>\n<td width=\"98\">0.0000957, or<\/p>\n<p>0.00957%<\/td>\n<\/tr>\n<tr>\n<td width=\"72\">Array 7 \u2013 7 drives<\/td>\n<td width=\"86\">Seagate Savvio .6TB<\/td>\n<td width=\"97\">0.44%<\/td>\n<td width=\"89\">1-(1-.0044)e7 = 3.04%<\/td>\n<td width=\"125\">1-(1-(.0044*.0384))e6 = 0.1013%<\/td>\n<td width=\"98\">0.0000308, or<\/p>\n<p>0.00308%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>I think the values above show definitively that RAID-5 is a perfectly viable storage mechanism.<\/p>\n<h2>RAID-6 Enters the Fray<\/h2>\n<p>With RAID-6, we\u2019re now adding a second parity stripe distributed among the disks of the array.\u00a0 In order for this type of array to fail, we have to have a third disk die during the window.\u00a0 I won\u2019t repeat the entire set of equations, because that would be a pain in the ass.\u00a0 Basically, we\u2019re adding a new column, called \u201cSecond Loss During Window\u201d, which has the exact same formula as the \u201cLoss During Window\u201d one.\u00a0 The only difference is that the exponential is one less.\u00a0 Once we get the result of that column, we multiply it with the Initial Loss and Loss During Window to get the real figure of data loss.<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"50\">Array and # drives<\/td>\n<td width=\"47\">Drive Type<\/td>\n<td width=\"81\">Annualized Failure Rate<\/td>\n<td width=\"67\">Odds of Initial Loss<\/td>\n<td width=\"117\">Loss during Window<\/td>\n<td width=\"117\">2<sup>nd<\/sup> Loss<\/td>\n<td width=\"104\">Total Chance<\/td>\n<\/tr>\n<tr>\n<td width=\"50\"><\/td>\n<td width=\"47\"><\/td>\n<td width=\"81\"><\/td>\n<td width=\"67\">1 &#8211; (1-AFR)eN<\/td>\n<td width=\"117\">1 \u2013 (1-(AFR * 0.0384))e(N-1)<\/td>\n<td width=\"117\">1 \u2013 (1-(AFR * 
0.0384))^(N-2)<\/td>\n<td width=\"104\">Initial * Window * 2nd Loss<\/td>\n<\/tr>\n<tr>\n<td width=\"50\">Array 1 \u2013 4 drives<\/td>\n<td width=\"47\">WD Red 4TB<\/td>\n<td width=\"81\">2.17%<\/td>\n<td width=\"67\">1-(1-.0217)^4 = 8.41%<\/td>\n<td width=\"117\">1-(1-(.0217*.0384))^3 = 0.25%<\/td>\n<td width=\"117\">1-(1-(.0217*.0384))^2 = 0.16%<\/td>\n<td width=\"104\">0.0000003364, or 0.00003364%<\/td>\n<\/tr>\n<tr>\n<td width=\"50\">Array 2 \u2013 4 drives<\/td>\n<td width=\"47\">HGST NAS 8TB<\/td>\n<td width=\"81\">1.2%<\/td>\n<td width=\"67\">1-(1-.012)^4 = 4.7%<\/td>\n<td width=\"117\">1-(1-(.012*.0384))^3 = 0.138%<\/td>\n<td width=\"117\">1-(1-(.012*.0384))^2 = 0.092%<\/td>\n<td width=\"104\">0.0000000597, or 0.00000597%<\/td>\n<\/tr>\n<tr>\n<td width=\"50\">Array 3 \u2013 8 drives<\/td>\n<td width=\"47\">WD Red 6TB<\/td>\n<td width=\"81\">4.19%<\/td>\n<td width=\"67\">1-(1-.0419)^8 = 28.99%<\/td>\n<td width=\"117\">1-(1-(.0419*.0384))^7 = 1.12%<\/td>\n<td width=\"117\">1-(1-(.0419*.0384))^6 = 0.9615%<\/td>\n<td width=\"104\">0.00003122, or 0.003122%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As you can see, even if you\u2019re a lazy bastard about swapping in a replacement drive, RAID-6 makes the odds of data loss during the window of vulnerability vanishingly small.<\/p>\n<h2>Failure Mitigation<\/h2>\n<p>So you had a drive blow out in your RAID-5 or -6 array, and you\u2019re staring at the Loss during Window column now, wondering what to do.<\/p>\n<p>The most important action you can take right now is this:<\/p>\n<p><strong><u>CALM DOWN<\/u><\/strong>.<\/p>\n<p>You haven\u2019t lost data yet.\u00a0 But by hasty action, you might.\u00a0 Stop, breathe.\u00a0 Do NOT touch that array, and do NOT power it down just yet.\u00a0 If one of your disks has checked out of the hotel, when you reboot the cage, there\u2019s a chance the controller could \u201cunrecognize\u201d that disk and re-initialize the array, blowing your data into never-never land.<\/p>\n<p>Steps to take 
here:<\/p>\n<ol>\n<li>DO <strong><u>NOT<\/u><\/strong> STUFF A NEW DRIVE IN THE ARRAY AND REBUILD. NOT YET.<\/li>\n<li>If you haven\u2019t done so already, write down your RAID configuration. Include total capacity, disk types, stripe size, drive order, partitions\/volumes and any other details you can get.<\/li>\n<li>Can you isolate the array from users? If you can, do it.\u00a0 Get their I\/O off the array if possible.<\/li>\n<li>Check your backups and confirm that you have a backup of the array\u2019s data.<\/li>\n<li>Get another volume online that has capacity at least equal to the total used space on the degraded array. One of the easiest methods of doing this is a USB 3.0 drive cradle and a set of SATA drives.\n<ol>\n<li>Copy all your data from the array onto this volume and confirm that it is valid.<\/li>\n<\/ol>\n<\/li>\n<li>If you can confirm that 5.a is done and the copy is good, proceed.<\/li>\n<li>Are all the drives in the cage the same age? If so, get replacements for all of them and start a completely new array with the new ones.\u00a0 Retire the old drives.\n<ol>\n<li>The reason for this is that they have all experienced similar wear-and-tear, and they all probably come from the same batch made at the factory \u2013 if there is a defect in one, there\u2019s a good chance that this defect applies to all of them. You\u2019re better off just dropping them all and replacing them.<\/li>\n<li>If they aren\u2019t the same age, just note the ones that are, and plan to replace them ASAP.<\/li>\n<\/ol>\n<\/li>\n<li>Okay, if steps 4 and 5 are good, NOW you can do a rebuild if you feel you have to. 
I still recommend reinitializing completely fresh and restoring the copied\/backed up data, but I also recognize that convenience is a big draw.<\/li>\n<\/ol>\n<p>Part of the whole debate about the validity of RAID-5 tends to stem from the probability of failure during a rebuild \u2013 which can be unacceptably high with old disks of appreciable size (see my section on UREs above).\u00a0 The argument seems to assume that the array is either not backed up, or is somehow on the critical path for general use by users.<\/p>\n<p>Rebuilding an array while live and in production use should be considered a last resort.\u00a0 You can see above that a rebuild failure becomes a real possibility even with reasonably modest-sized arrays.\u00a0 The fact that current RAID vendors offer live-system rebuilds should be considered a convenience only at this point.\u00a0 When we were using 100GB disks, a live rebuild was a viable option, but that simply doesn\u2019t hold any more.<\/p>\n<p>If your array is in that position \u2013 critical path and not backed up \u2013 then you have a big problem.\u00a0 You need to get a backup arranged <em>yesterday<\/em>.\u00a0 And if it is critical path, then you should ensure that there is a failover plan in place.\u00a0 Never assume that just because you have your critical data on RAID that you are totally safe.\u00a0 You are <em>safer <\/em>in the case of a drive fail, yes, but you aren\u2019t out of the woods.<\/p>\n<p>Stuff to consider that will help you survive an array failure:<\/p>\n<ul>\n<li>Buy a USB cradle or a tape drive that can handle the capacity of your RAID array. 
Use them religiously to preserve your data.\n<ul>\n<li>Test them regularly (monthly is good) to ensure that when a failure does happen, you\u2019re prepared to recover.<\/li>\n<\/ul>\n<\/li>\n<li>Consider a second array of similar capacity, or a big-ass disk that you can house next to the array, that you can set up on a synchronization system (for example, Synology offers \u201cCloud Station Server\u201d and \u201cCloud Sync\u201d apps that can be used to ensure one NAS maintains exactly the same content as the other). That becomes your fail-over.<\/li>\n<li>Unless you absolutely have to, do not rely on a live rebuild to preserve your data.<\/li>\n<li>If you have room in your cage, add another drive and convert your RAID-5 to RAID-6 to buy you extra insurance against multiple drive failure.<\/li>\n<li>Smaller volumes are better than big ones \u2013 you can shovel smaller volumes onto a USB drive more easily than trying to subdivide one large one onto multiple removable drives.<\/li>\n<li>When filling up an array, buy disks of the same brand and capacity, but mix up who you buy them from or buy them over time to protect yourself from factory batch defects.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>There\u2019s no \u201cmagic panacea\u201d here with RAID systems.\u00a0 They\u2019re great, they\u2019re effective, and there are simply some things that they do not do.\u00a0 I hope that I have helped dispel some of the fear about RAID-5 here, and it is also my hope that I have perhaps called attention to any gaps in your data coverage so that you can fill them now rather than wait for the inevitable to occur.\u00a0 With luck, you can breathe a little easier now, and not be too harsh on RAID-5.<\/p>\n<p>Feel free to write me with any questions, comments, death-threats, or mathematical corrections you might feel necessary.\u00a0 Meanwhile, happy computing.<\/p>\n<p>Edit 13.08.2018:\u00a0 I whipped up the figures into a spreadsheet that you can download and use for your own 
arrays as well.<\/p>\n<p>Edit 10.10.2018:\u00a0 edited for clarity, and corrected math on UREs.\u00a0 Also corrected spreadsheet which is linked below.<\/p>\n<p><a href=\"http:\/\/borkedcode.com\/wp\/wp-content\/uploads\/2018\/10\/Array-Life-Expectancy.xlsx\">Array Life Expectancy<\/a><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Situation I\u2019ve seen, over the last few weeks, more than a few posts on a popular IT hangout site proclaiming in loud, evangelical voice that \u201cRAID-5 is terrible for spinning disks, never use it!\u00a0 If you do, you\u2019re a &hellip; <a href=\"https:\/\/www.borkedcode.com\/wp2\/2018\/08\/11\/raid-5-and-the-sky-is-falling\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,42,13,15,17],"tags":[75,119,120],"class_list":["post-1537","post","type-post","status-publish","format-standard","hentry","category-business","category-disk-management","category-hardware","category-it","category-pc-stuff","tag-disk-arrays","tag-raid","tag-reliability"],"_links":{"self":[{"href":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/posts\/1537","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/comments?post=1537"}],"version-history":[{"count":0,"href":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/posts\/1537\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/media?parent=1537"}],"wp:term":[{"taxonomy":"category","embeddable":true,"hr
ef":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/categories?post=1537"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.borkedcode.com\/wp2\/wp-json\/wp\/v2\/tags?post=1537"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
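The probability columns in the post's tables can be recomputed with a short sketch. This is my own reconstruction of the spreadsheet math, not the author's file; it assumes the 0.0384 factor is a roughly two-week rebuild window expressed as a fraction of a year (14/365 ≈ 0.0384).

```python
# Recomputation sketch of the RAID-5/RAID-6 loss tables.
# Assumption: WINDOW = 0.0384 is the rebuild window as a fraction of a year.
WINDOW = 0.0384

def initial_loss(afr: float, n: int) -> float:
    """Chance that at least one of n drives fails within a year."""
    return 1 - (1 - afr) ** n

def window_loss(afr: float, n: int, parity: int = 1) -> float:
    """Chance that a further drive fails during the rebuild window.

    parity=1 models RAID-5 (n-1 remaining drives at risk); use parity=2
    for RAID-6's second loss during the window (n-2 remaining drives)."""
    return 1 - (1 - afr * WINDOW) ** (n - parity)

def raid5_total(afr: float, n: int) -> float:
    """Initial loss times loss during window (the RAID-5 Total Chance column)."""
    return initial_loss(afr, n) * window_loss(afr, n)

def raid6_total(afr: float, n: int) -> float:
    """RAID-5 total times the second loss during the window (RAID-6 column)."""
    return raid5_total(afr, n) * window_loss(afr, n, parity=2)

# Array 1 from the tables: four WD Red 4TB drives, AFR 2.17%
print(f"RAID-5 loss odds: {raid5_total(0.0217, 4):.5%}")
print(f"RAID-6 loss odds: {raid6_total(0.0217, 4):.7%}")
```

Running this reproduces the Array 1 figures (about 8.41% initial loss, 0.25% window loss, 0.021% total for RAID-5) to within rounding of the table values.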