Reliability vs. Availability in Fault Tolerance

We have learnt the following definitions of Reliability and Availability from the Fault Tolerance Measures post:

Reliability, R(t): The probability that the system has been up continuously during the whole interval [0,t], given that it was up at time 0. This measure is suitable for applications in which even a momentary disruption can prove costly. One example is a computer that controls the automatic braking of a vehicle, where a failure could result in an accident. Reliability is therefore the measure to use when the system cannot be repaired while it is in operation.

Availability, A(t): The fraction of time the system is up during the interval [0,t]. This measure is appropriate for applications in which continuous performance is not vital but where it would be expensive to have the system down for a significant amount of time. High availability is desired for an online merchandise system, for example: downtime can put off customers and lose sales, but an occasional short-duration failure is well tolerated.

Now, it might seem that a system with low reliability must also have low availability. But sometimes a system with low reliability can still be highly available.

For example, suppose a website runs a quiz with a prize every hour. The surge of visitors brings the server down for about a minute each hour; once the traffic subsides, the website comes back up. Such a system has a Mean Time Between Failures (MTBF) of just 1 hour and, consequently, a low reliability; however, its availability is high.
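To put a rough number on "low reliability", here is a minimal sketch that assumes failures follow an exponential model, R(t) = exp(-t / MTTF). That model is an assumption made for illustration and is not part of the original example: the quiz site actually fails on a fixed hourly schedule, so its true R(t) simply drops to zero once t passes the first outage. The function name `reliability` and the 59-minute mean time to failure are likewise illustrative choices.

```python
import math


def reliability(t_minutes: float, mttf_minutes: float) -> float:
    """Return R(t) = exp(-t / MTTF) under an ASSUMED exponential failure model."""
    if t_minutes < 0 or mttf_minutes <= 0:
        raise ValueError("t must be >= 0 and MTTF must be > 0")
    return math.exp(-t_minutes / mttf_minutes)


# Assumption: about 59 minutes of uptime between consecutive failures.
MTTF_MINUTES = 59.0

for t in (10, 60, 24 * 60):
    print(f"R({t} min) = {reliability(t, MTTF_MINUTES):.3g}")
```

Under this assumption the chance of running a full hour without interruption is already well below one half, and the chance of surviving a whole day is negligible.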

The availability can be calculated as follows. In every hour, the website is up for 59 minutes and down for 1 minute.

Availability = uptime / (uptime + downtime) = 59/60 ≈ 0.983

Since the system is up about 98.3% of the time, we can consider it a highly available system despite its low reliability.
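The same arithmetic can be sketched in a few lines of Python. The helper name `availability` is hypothetical, and the sketch assumes that uptime and downtime are measured in the same unit over one representative cycle (here, one hour of the quiz website).

```python
def availability(uptime: float, downtime: float) -> float:
    """Fraction of time the system is up: uptime / (uptime + downtime)."""
    total = uptime + downtime
    if uptime < 0 or downtime < 0 or total <= 0:
        raise ValueError("uptime and downtime must be non-negative, with a positive sum")
    return uptime / total


# Quiz website: up for 59 minutes and down for 1 minute in every hour.
a = availability(59, 1)

# Expected downtime accumulated over a 24-hour day, in minutes.
downtime_per_day = (1 - a) * 24 * 60

print(f"availability = {a:.3f}")
print(f"downtime per day = {downtime_per_day:.1f} minutes")
```

Expressing the result as downtime per day is a useful cross-check: one minute lost in each of 24 hours should add up to roughly 24 minutes.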

Reference: Fault-Tolerant Systems, by Israel Koren and C. Mani Krishna
