I’m an avid reader of Maximum PC and have had a subscription since they were known as boot back in the late ’90s. (I’ve remained a subscriber even though they put most of their content online, which is a good move, because I still like the printed version.) They’re fairly accurate in most of their reviews, and I respect them for both this and their useful guides and other features.
However, when reading their review of the FragBox II, I came across an interesting quote about failure rates: (Emphasis mine)
So why would Falcon configure RAM in single-channel mode? Falcon gave us three reasons for this decision: There’s only a minimal performance advantage to running dual-channel mode with this box; RAM is the second-most-likely component to fail (the GPU is first), so using just one DIMM cuts the chance of failure in half…
The emphasized part of the statement is wrong in the general case. But why?
It seems to be common sense that if one device has a failure rate of p (within a given time period), then using two of them will double your chances of failure. However, a quick inspection shows that this isn’t the case. For example, this line of thinking would mean that having three of the devices in use would triple your chances of failure. Eventually, with enough devices in use, the chances of failure would grow larger than 100%, an impossibility.
Looking at it another way, just because you are using two devices, each with an expected failure rate of 50% within a given time frame, does not guarantee that at least one will fail within that time period.
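To make the contradiction concrete, here is a minimal Python sketch of the "just multiply" rule of thumb. The function name naive_failure_probability is hypothetical, introduced only for illustration, and the 50% rate is the example value from the paragraph above, not a measured figure.

```python
def naive_failure_probability(p: float, n: int) -> float:
    """The flawed rule of thumb: n devices means n times the failure chance."""
    return n * p


# With a 50% per-device failure rate, the naive rule claims that two devices
# are certain to produce a failure, and that three devices exceed certainty.
for n in (1, 2, 3):
    print(n, naive_failure_probability(0.5, n))
```

Any rule that can return a value above 1 cannot be a probability, so the linear rule has to break down somewhere.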
Looking at it a different way
This leads us to the actual question: What is the expected failure rate with two DIMMs as compared to with just one? First of all, let’s do some quick definitions. Let p denote the probability that a single DIMM will fail within a given time period. Thus, (1-p) is the probability that it will not fail.
Pr(DIMM failure) = p
Pr(DIMM does not fail) = 1 - p
Moving to the case of two DIMMs in operation, we want to find the probability that at least one of them fails; since both are required for operation, at least one failing can be considered “failure” in this case. It isn’t readily apparent how to compute the probability that at least one of the two fails directly, so let’s break it down:
Pr(at least one fail) = 1 - Pr(none of them fail)
We can obtain the probability that at least one fails from the probability that neither of them fails, since these are complementary events: exactly one of the two must occur, so their probabilities sum to 1.
Pr(none of them fail) = (1 - p)^2 = 1 - 2p + p^2
This expression can be derived if we assume that DIMM failures are independent events. Thus,
Pr(at least one fail) = 1 - Pr(none of them fail) = 1 - (1 - 2p + p^2) = 2p - p^2 >= p
From this, we conclude that the chance of failure (at least one DIMM failing) with two DIMMs is at least as high as with one DIMM, and strictly higher whenever 0 < p < 1, since 2p - p^2 - p = p(1 - p) >= 0. However, the chance of failure for two DIMMs is never double the chance of failure for one DIMM, unless the probability of failure is 0.
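The derivation above can be checked numerically. The following is a minimal sketch, assuming independent failures as the derivation does; the function name at_least_one_failure and the sample values of p are illustrative only, not real DIMM failure rates.

```python
def at_least_one_failure(p: float, n: int = 2) -> float:
    """Probability that at least one of n independent devices fails."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    # Complement rule: 1 minus the probability that all n devices survive.
    return 1.0 - (1.0 - p) ** n


for p in (0.01, 0.1, 0.5, 0.9):
    two = at_least_one_failure(p, 2)
    # The two-DIMM value is at least p, but below 2p whenever p > 0.
    print(f"p={p}: two DIMMs={two:.4f}, doubled={2 * p:.4f}")
```

The same function handles any number of devices through the n parameter, which is a generalization beyond the two-DIMM case discussed here.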
For further inspection, let us calculate the difference between the two failure rates, that is, how much more likely two DIMMs are to fail over just one:
Increase in probability of failure = Pr(at least one fail of two DIMMs) - Pr(DIMM failure) = (2p - p^2) - p = p - p^2
Taking the derivative of this function yields the expression 1 - 2p, which is zero at p = 0.5. Since the second derivative is negative, the function has its global maximum there. At this value of p, the increase in the likelihood of failure is:
p - p^2 = (0.5) - (0.5)^2 = 0.25
Thus, for a single-DIMM failure rate of 50%, using two raises the expected overall failure rate dramatically, to 75%. However, I’d expect that failure rates on DIMMs are quite low, and for very low values of p (or very high values), the difference is actually quite small.
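As a sanity check on the calculus, a brute-force scan over a grid of p values should land on the same maximum. This is only a sketch; the function name increase and the 0.01 grid spacing are arbitrary choices made for illustration.

```python
def increase(p: float) -> float:
    """Extra failure probability from adding a second DIMM: p - p^2."""
    return p - p * p


# Scan p = 0.00, 0.01, ..., 1.00 and pick the value with the largest increase.
grid = [i / 100 for i in range(101)]
worst_p = max(grid, key=increase)
print(worst_p, increase(worst_p))
```

Near either end of the grid the increase shrinks toward zero, which matches the remark that the difference is small for very low or very high values of p.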
Going back to the original statement…
So, we’ve ascertained that the failure rate is never exactly double in the situation described. However, in practical situations, the results are quite close. I’ve already suggested that failure rates for DIMMs are quite low; that is, p is usually very close to 0. In order for the two-DIMM failure rate to be double that of a single DIMM, we would need:
Pr(at least one fail of two DIMMs) = 2*Pr(DIMM failure)
Substituting in the expressions derived above, we have:
2p - p^2 = 2p ?
Obviously, this equation is never true, unless p = 0. However, for very small values of p, it is close to being true; that is, the difference between the two sides approaches 0:
lim(p -> 0) [(2p - p^2) - 2p] = lim(p -> 0) (-p^2) = 0
Thus, for very small rates of failure, the statement of failure rates doubling is approximately true. The ratio tells the same story: (2p - p^2) / (2p) = 1 - p/2, which approaches 1 as p approaches 0.
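To see how good the "doubling" approximation is for small p, here is a short sketch comparing the exact two-DIMM failure probability with 2p. The function name two_dimm_failure and the sample values of p are illustrative assumptions, not measured DIMM failure rates.

```python
def two_dimm_failure(p: float) -> float:
    """Exact two-DIMM failure probability under independence: 2p - p^2."""
    return 2 * p - p * p


for p in (0.1, 0.01, 0.001):
    exact = two_dimm_failure(p)
    ratio = exact / (2 * p)  # algebraically equal to 1 - p/2
    print(f"p={p}: exact={exact:.6f}, doubled={2 * p:.6f}, ratio={ratio:.4f}")
```

The smaller p gets, the closer the ratio comes to 1, so the magazine quote overstates the risk only by a factor of 1 - p/2.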
Update: More details
More specifically, this is a case of figuring out the probability of the union of two events. For any two events, A and B, the probability of their union is:
Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
Where Pr(A ∩ B) is the probability of the intersection of the two events, that is, the probability that both occur. For the above problem, the expression reduces to:
Pr(A ∪ B) = Pr(at least one DIMM fails) = Pr(DIMM failure) + Pr(DIMM failure) - Pr(both DIMMs fail) = p + p - p^2 = 2p - p^2
Which is the same expression we arrived at before. As noted before, for very small values of p, this is approximately equal to 2p, or double the original failure rate.
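The union formula can also be verified by listing all four outcomes for the two DIMMs and adding up the ones in which at least one fails. This is a minimal sketch that assumes independent failures, as the derivation does; the function name union_by_enumeration and the value p = 0.1 are illustrative only.

```python
from itertools import product


def union_by_enumeration(p: float) -> float:
    """Sum the probabilities of every outcome in which at least one DIMM fails."""
    total = 0.0
    # Each DIMM either fails (True) or survives (False).
    for a_fails, b_fails in product((True, False), repeat=2):
        # Independence assumption: the joint probability is the product.
        prob = (p if a_fails else 1 - p) * (p if b_fails else 1 - p)
        if a_fails or b_fails:
            total += prob
    return total


p = 0.1  # illustrative value, not a measured DIMM failure rate
print(union_by_enumeration(p), p + p - p * p)
```

Both printed values should agree up to floating-point rounding, since the enumeration and the inclusion-exclusion formula count the same three failing outcomes.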