Fault Tolerance Measures

Fault tolerance of electronic systems is a major concern for VLSI engineers, as discussed in the post Need of Fault Tolerant VLSI System Design. The objective of this post is to introduce the standard measures used to quantify fault tolerance. A measure is a mathematical abstraction that expresses only some subset of an object's nature.

Reliability, R(t): Probability that the system has been up continuously during the whole interval [0,t], given that it was up at time 0. This measure is suitable for applications in which even a momentary disruption can prove costly. One example is a computer that controls the auto-braking of a vehicle, for which a failure could result in an accident. Reliability is therefore the measure to use when the system cannot be repaired during operation.
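
As a rough illustration of the definition, R(t) can be estimated from test data as the fraction of units that are still running at time t. The sketch below assumes a hypothetical list of observed times to first failure (one per unit, all units started at time 0); the function name and the sample values are made up for this example.

```python
def empirical_reliability(first_failure_times, t):
    """Estimate R(t) as the fraction of units whose first failure occurs after t.

    first_failure_times: observed time of first failure for each unit
    (hypothetical data; every unit is assumed to be up at time 0).
    """
    if not first_failure_times:
        raise ValueError("need at least one observation")
    # A unit counts as reliable over [0, t] only if it has not failed by t.
    survivors = sum(1 for ft in first_failure_times if ft > t)
    return survivors / len(first_failure_times)


# Hypothetical times to first failure, in hours.
sample_times = [120.0, 340.0, 560.0, 800.0]
r_400 = empirical_reliability(sample_times, 400.0)
```

With only a handful of units the estimate is coarse; it is meant to show what the definition counts, not how reliability is predicted in practice.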

Availability, A(t): Fraction of time the system is up during the interval [0,t]. This measure is appropriate for applications in which continuous performance is not vital but where it would be expensive to have the system down for a significant amount of time. High availability is desired for an online merchandise system, because downtime can put off customers and lose sales; however, an occasional short-duration failure can be well tolerated.
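
The definition of A(t) can be sketched directly from an outage log. The example below assumes a hypothetical list of (start, end) downtime intervals that do not overlap; the function name and the numbers are illustrative only.

```python
def availability(down_intervals, t):
    """Fraction of the interval [0, t] during which the system was up.

    down_intervals: list of (start, end) outage intervals, assumed
    non-overlapping (hypothetical log format for this sketch).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    downtime = 0.0
    for start, end in down_intervals:
        # Count only the part of each outage that falls inside [0, t].
        lo = max(start, 0.0)
        hi = min(end, t)
        if hi > lo:
            downtime += hi - lo
    return (t - downtime) / t


# Two hypothetical outages within a 100-hour observation window.
a_100 = availability([(10.0, 12.0), (50.0, 53.0)], 100.0)
```

Here the system is down for 5 of the 100 hours, so the availability over that window is 0.95, even though it failed twice.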

Mean Time to Failure, Mean Time to Repair, Mean Time Between Failures

Mean Time To Failure, MTTF: Average time the system remains up before it goes down (until a failure occurs) and has to be repaired or replaced.

Mean Time Between Failures, MTBF: Average time between two consecutive failures.

The difference between the two is the time needed to repair the system following the first failure. Denoting the Mean Time to Repair by MTTR, we obtain MTBF = MTTF + MTTR.
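
The relation between the three quantities, MTBF = MTTF + MTTR, can be checked with a short calculation. The sketch below assumes a hypothetical maintenance log of alternating up durations and repair durations; the function name and the data are invented for illustration. It also reports MTTF / MTBF, the commonly used long-run (steady-state) availability, which is an addition to the definitions above.

```python
from statistics import mean


def summarize_failure_log(up_times, repair_times):
    """Compute MTTF, MTTR and MTBF from observed durations.

    up_times: durations the system stayed up before each failure.
    repair_times: durations of the repair that followed each failure.
    Both lists are hypothetical sample data for this sketch.
    """
    mttf = mean(up_times)
    mttr = mean(repair_times)
    mtbf = mttf + mttr  # time between consecutive failures = up time + repair time
    return {
        "MTTF": mttf,
        "MTTR": mttr,
        "MTBF": mtbf,
        # Long-run fraction of time the system is up.
        "steady_state_availability": mttf / mtbf,
    }


# Hypothetical log, in hours.
summary = summarize_failure_log([100.0, 200.0, 300.0], [2.0, 4.0, 6.0])
```

When repairs are fast compared with the time between failures, MTTR is small and MTBF is nearly equal to MTTF, which is why the two are often used interchangeably in casual usage.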

One or more of these fault tolerance measures are used to assess how reliable a system is.


All the definitions are taken from Fault-Tolerant Systems, by Israel Koren and C. Mani Krishna.
