Defects, Errors, and Faults

In the electronics industry, incorrectness in products is described in several ways, which can cause confusion about the terms defect, error, and fault. Though these terms are often used interchangeably in the field of VLSI testing, let’s try to draw a fine boundary between their meanings.

Before doing this, we should understand why the study of fault tolerance and VLSI testing is so important. The following discussion will also give you some insight into the meaning of defect, error, and fault.

Faults in electronic systems are defects that can turn into failures. A very small defect, such as a frozen memory bit, a stuck-at fault, an uninitialized variable in software, an alpha-particle hit, or cosmic-ray ionization, can be considered a fault. An error may be viewed as the next level of a fault, one that may lead to unexpected behavior within the system: a mistaken value of a state variable, entry into an infinite loop, or an incorrect calculation result might be considered an error in the system. A failure may be seen as the next level of an error, a situation in which some part of the system does not function as expected [1].

Fault-error-failure cascade can lead to life-threatening hazards [2]

The reliability of electronic systems has always been a concern. The concern intensifies when the system is used in applications such as avionics, space missions, automobiles, or medical equipment, where a fault or hazard may lead to an accident that puts human life at risk. So it is very important for a VLSI engineer to understand the fine difference between defect, error, and fault.

Defect: A defect in an electronic system is the unintended difference between the implemented hardware and its intended design. Defects can be process defects, material defects, age defects, or package defects.

Fault: A representation of a “defect” at the abstracted function level is called a fault. A fault (or failure) can be either a hardware defect or a software/programming mistake (bug).

Error: A wrong output signal produced by a defective system is called an error. An error is an “effect” whose cause is some “defect.”

All the above definitions have been extracted from [3].

Let’s understand these three terms through the following examples:

Example – 1 (Hardware Defect) [3]

Consider a two-input AND gate with its inputs connected to ‘a’ and ‘b’ and its output connected to ‘c’. Suppose the input connected to ‘b’ is wrongly connected to ground. The functional output of this system, as implemented, is c = 0 instead of the correct output c = ab. For this system, we have:

Defect: a short to ground.

Fault: signal b stuck at logic 0.

Error: a = 1, b = 1, output c = 0; correct output c = 1. Notice that the error is not permanent. As long as at least one input is 0, there is no error in the output.
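The behavior in Example 1 can be checked with a tiny fault simulation. The Python sketch below is illustrative only (the function names are hypothetical, not taken from [3]): it applies all four input combinations to a fault-free AND gate and to the same gate with input b stuck at 0, and reports the inputs for which the two outputs disagree.

```python
from itertools import product


def and_gate(a: int, b: int) -> int:
    """Fault-free two-input AND gate: c = ab."""
    return a & b


def and_gate_b_stuck_at_0(a: int, b: int) -> int:
    """Same gate with input b shorted to ground (fault: b stuck-at-0).
    Whatever value is applied to b, the gate sees 0."""
    return a & 0


def error_vectors() -> list:
    """Return the input vectors (a, b) whose faulty output differs from
    the correct output, i.e. the vectors that turn the fault into an error."""
    errors = []
    for a, b in product((0, 1), repeat=2):
        if and_gate(a, b) != and_gate_b_stuck_at_0(a, b):
            errors.append((a, b))
    return errors


for a, b in product((0, 1), repeat=2):
    print(f"a={a} b={b}  correct c={and_gate(a, b)}  "
          f"faulty c={and_gate_b_stuck_at_0(a, b)}")

print("error for inputs:", error_vectors())  # prints: error for inputs: [(1, 1)]
```

Only a = 1, b = 1 exposes the error, which matches the observation above that the output is correct whenever at least one input is 0.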

Example – 2 (Hardware Defect) [4]

Consider an adder circuit with an output line stuck at 1; that line always carries the value 1, regardless of the values of the input operands. This is a fault, but not (yet) an error. The fault causes an error when the adder is used and the result on that line should have been 0 rather than 1.
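A similar sketch can model Example 2. It assumes a 4-bit adder that discards the carry-out and picks output bit 0 as the stuck line; the width, the chosen bit, and the function names are all illustrative assumptions, not details from [4]. The stuck-at-1 fault is modeled by forcing a 1 into that bit of the correct sum.

```python
WIDTH = 4  # assumed adder width (hypothetical, for illustration)
MASK = (1 << WIDTH) - 1


def add(x: int, y: int) -> int:
    """Fault-free WIDTH-bit adder; the carry-out is discarded."""
    return (x + y) & MASK


def add_with_stuck_at_1(x: int, y: int, line: int) -> int:
    """Adder whose output bit `line` is stuck at 1: that bit is forced
    to 1 no matter what the operands are."""
    return add(x, y) | (1 << line)


def shows_error(x: int, y: int, line: int) -> bool:
    """True when the stuck line actually corrupts the result."""
    return add(x, y) != add_with_stuck_at_1(x, y, line)


# Output bit 0 stuck at 1:
print(shows_error(1, 2, 0))  # 1 + 2 = 3 = 0b0011, bit 0 should be 1 -> False (no error)
print(shows_error(2, 2, 0))  # 2 + 2 = 4 = 0b0100, bit 0 should be 0 -> True (error)
```

For bit 0 the error appears only when the true sum is even, so across all 256 operand pairs of this 4-bit adder exactly half (128) expose it. The fault is present in every case, but it becomes an error only in those.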

Example – 3 (software/programming mistake) [4]

Consider, for example, a subroutine that is supposed to compute sin(x) but owing to a programming mistake calculates the absolute value of sin(x) instead. This mistake will result in an execution error only if that particular subroutine is used and the correct result is negative.
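Example 3 can be reproduced directly in code. In this minimal sketch, `buggy_sin` is a hypothetical subroutine containing the mistake described above (it returns |sin(x)|); comparing it with Python’s `math.sin` shows that the bug produces a wrong result only for inputs where sin(x) is negative.

```python
import math


def buggy_sin(x: float) -> float:
    """Hypothetical subroutine with the programming mistake from Example 3:
    it returns |sin(x)| instead of sin(x)."""
    return abs(math.sin(x))


def produces_error(x: float) -> bool:
    """True when calling the buggy subroutine with x gives a wrong result."""
    return buggy_sin(x) != math.sin(x)


print(produces_error(math.pi / 2))   # sin(x) = 1.0, not negative -> False
print(produces_error(-math.pi / 2))  # sin(x) = -1.0, negative -> True
```

If `buggy_sin` is never called, or is only called with arguments whose sine is non-negative, the bug stays latent and no error is ever observed.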

Both faults and errors can spread through the system. For example, if a chip shorts out power to ground, it may cause nearby chips to fail as well. Errors can spread because the output of one unit is used as input by other units. Therefore, understanding the terms defect, error, and fault, and at the same time incorporating the correct fault-tolerance techniques and VLSI testing schemes, is of utmost importance.
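Error propagation between units can be sketched by feeding the faulty AND gate from Example 1 into a second unit. The downstream unit here, d = c XOR e, and all function names are hypothetical choices made only for illustration: because XOR passes any change on c through to d, the wrong value of c reaches d whatever the value of e.

```python
def good_and(a: int, b: int) -> int:
    """Unit 1, fault-free: c = ab."""
    return a & b


def faulty_and(a: int, b: int) -> int:
    """Unit 1 with input b stuck-at-0, as in Example 1."""
    return a & 0


def downstream(c: int, e: int) -> int:
    """Unit 2 (hypothetical): consumes unit 1's output c and computes c XOR e."""
    return c ^ e


def system_output(a: int, b: int, e: int, unit1) -> int:
    """Output d of the two-unit chain, using the given version of unit 1."""
    return downstream(unit1(a, b), e)


# Apply a = 1, b = 1, the only input that makes unit 1 produce an error.
for e in (0, 1):
    good = system_output(1, 1, e, good_and)
    bad = system_output(1, 1, e, faulty_and)
    print(f"e={e}: correct d={good}, faulty d={bad}")
```

In both rows the faulty d differs from the correct d, so the error created inside unit 1 has spread to the output of unit 2.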

[1] S. S. Rout, “Reliability Aware Intelligent Memory Management (RAIMM).”

[2] D. Kalinsky, “Architecture of safety-critical systems,” Embedded Systems Programming magazine (U.S.A.), vol. 18, no. 9, Sept. 2005.

[3] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits.

[4] I. Koren and C. M. Krishna, Fault-Tolerant Systems.