While product reliability has become a major concern for most organizations, many have overlooked the development of good reliability specifications. This oversight can result in ambiguous, unfocused reliability testing during the validation phase of product development. Effective reliability testing requires a well-defined reliability specification. After all, a prime objective of a reliability engineering program is to test and assess product reliability.
A commonly ignored yet critical element of a sound reliability specification is the definition of equipment failure. Even the most vigorous reliability-testing program is of little use if the product being tested has poorly defined failure parameters. This article discusses the essential requirements for establishing concise and effective reliability specifications, and proposes a method for defining equipment failure.
Among the requirements often used to specify equipment reliability is “mean time between failures” (MTBF), which is verified during subsystem and system reliability testing. Essential to reliability testing is an agreed-upon definition of equipment failure, which must be clearly established at the earliest stages of product development. It may seem fairly obvious whether a product has failed or not, but such a definition is necessary for a number of reasons.
One of the most important reasons is that different manufacturers may have different definitions of what sort of behavior actually constitutes a failure. Identical tests performed on the same equipment by different groups may produce radically different results simply because the groups define product failure differently. The reported performance values can then differ significantly even though the underlying equipment performance does not. There are cases where one manufacturer has claimed an MTBF value of 2000 hours/failure whereas another manufacturer has reported 100 hours/failure for the same product. This discrepancy may not be due to one manufacturer having a vastly superior product; rather, it may be due to differences in their definition and assessment of a “failure”.
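To make the effect concrete, the following is a minimal sketch of how one and the same test log can yield either figure quoted above. The interrupt log, the 2000 productive hours, and the two counting rules are hypothetical illustrations, not data from either manufacturer; MTBF is taken here simply as productive time divided by the number of counted failures.

```python
def mtbf(productive_hours, failure_count):
    """Mean time between failures: productive time / counted failures."""
    if failure_count == 0:
        raise ValueError("MTBF is undefined when no failures were counted")
    return productive_hours / failure_count

# Hypothetical log of one test run: (description, part_replaced)
interrupt_log = [("jam cleared by operator", False)] * 19 + [("motor replaced", True)]
productive_hours = 2000.0  # assumed test duration

# Definition A (assumed): only interrupts that needed a part replacement are failures
failures_a = sum(1 for _, part_replaced in interrupt_log if part_replaced)

# Definition B (assumed): every interrupt counts as a failure
failures_b = len(interrupt_log)

mtbf_a = mtbf(productive_hours, failures_a)
mtbf_b = mtbf(productive_hours, failures_b)
print(f"Definition A: {mtbf_a:.0f} hours/failure")
print(f"Definition B: {mtbf_b:.0f} hours/failure")
```

Under these assumptions the same equipment and the same test give 2000 hours/failure under definition A and 100 hours/failure under definition B; only the counting rule changed.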
In an effort to normalize reliability criteria, standards such as SEMI E10 have been created to give customers and suppliers in semiconductor manufacturing a guideline for measuring reliability, availability, and maintainability (RAM). SEMI E10 defines an interrupt as the inability of equipment to perform its intended function due to the occurrence of assists or failures; that is:
- Equipment Interrupt = Sum of all Failures + Sum of all Assists
It further defines assists and failures as follows:
Assist: Any unplanned interruption that occurs during equipment operation where all three of the following conditions apply:
- Equipment operation is resumed through external intervention.
- There is no replacement of a part, other than specified consumables.
- There is no further variation from specifications of equipment operation.
Failure: Any unplanned interruption or variance from the specifications of equipment operation other than assists.
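The definitions above can be read as a simple decision rule. The sketch below encodes it in Python; the `Interrupt` record, its field names, and the sample log are hypothetical and are not part of SEMI E10. Every logged event is assumed to be an unplanned interruption already.

```python
from dataclasses import dataclass

@dataclass
class Interrupt:
    resumed_by_external_intervention: bool
    part_replaced: bool                # other than specified consumables
    further_variation_from_spec: bool  # equipment still out of specification

def classify_e10(interrupt: Interrupt) -> str:
    """An assist needs all three conditions; anything else is a failure."""
    is_assist = (
        interrupt.resumed_by_external_intervention
        and not interrupt.part_replaced
        and not interrupt.further_variation_from_spec
    )
    return "assist" if is_assist else "failure"

# Hypothetical log of three interrupts
log = [
    Interrupt(True, False, False),  # operator clears a jam
    Interrupt(True, True, False),   # a part had to be replaced
    Interrupt(True, False, True),   # still out of spec after intervention
]
labels = [classify_e10(i) for i in log]
assists = labels.count("assist")
failures = labels.count("failure")
interrupts = failures + assists  # Equipment Interrupt = Failures + Assists
print(labels, interrupts)
```

Because a failure is defined as everything that is not an assist, the two counts always add up to the total number of interrupts.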
From the above definitions, one may conclude that a failure is an interrupt that requires replacement of a part, whereas an assist is an interrupt resolved by any other external intervention. In practice, however, many customers judge machine performance by the cost of operation. A customer may therefore not favor equipment that requires frequent external interventions (e.g., equipment adjustments), since many resources would have to be allocated to operate it. For this reason, customers may tend to view equipment adjustments as failures as well. Thus, even with standards, suppliers and customers still commonly struggle to agree on how to classify an interrupt.
Equipment Interrupt (Failure/Assist) Classification
This article proposes classifying failures and assists by their modes of recovery; that is, categorizing them by the means by which a customer resolves the problem.
Just as reliability testing must simulate product usage in the field, failures and assists should be profiled according to the way customers repair them. In a semiconductor production environment, many customers categorize equipment repair activities by the nature of the interrupts. They have repair policies that allocate recovery actions to machine operators and engineering technicians. Such a recovery plan is practical because different interrupts require different skill sets for repair. The plan requires that the customer have a good understanding of the equipment's operating behavior, so that maintenance and repair personnel can be staffed accurately.
This recovery plan technique can be used as a foundation for assist and failure classification. In other words, assists may be classified as machine-induced interrupts that are recovered by a machine operator, whereas failures are machine-induced interrupts that require skilled technicians for comprehensive troubleshooting and in-depth corrective action. Deploying this method also helps determine the cost of equipment ownership: it costs less to resolve an assist, which is handled by a machine operator, and more to resolve a failure, which requires technician involvement.
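The recovery-mode rule and its link to cost of ownership can be sketched as follows. This is only an illustration under stated assumptions: the `Recovery` roles mirror the operator/technician split described above, while the hourly rates and the event log are invented for the example.

```python
from enum import Enum

class Recovery(Enum):
    OPERATOR = "operator"
    TECHNICIAN = "technician"

def classify_by_recovery(recovered_by: Recovery) -> str:
    """Operator-recovered interrupts are assists; technician-recovered are failures."""
    return "assist" if recovered_by is Recovery.OPERATOR else "failure"

# Hypothetical labor rates per hour (assumed values, not from the article)
HOURLY_RATE = {Recovery.OPERATOR: 30.0, Recovery.TECHNICIAN: 90.0}

def recovery_cost(events):
    """Sum labor cost per class for (recovered_by, repair_hours) events."""
    totals = {"assist": 0.0, "failure": 0.0}
    for recovered_by, hours in events:
        totals[classify_by_recovery(recovered_by)] += HOURLY_RATE[recovered_by] * hours
    return totals

# Hypothetical interrupt log: who recovered the machine and how long it took
events = [
    (Recovery.OPERATOR, 0.25),
    (Recovery.OPERATOR, 0.5),
    (Recovery.TECHNICIAN, 2.0),
]
costs = recovery_cost(events)
print(costs)
```

Keying the classification on who performs the recovery means the same log that yields assist and failure counts also yields a labor-cost breakdown, which is the cost-of-ownership view customers tend to care about.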
Product specifications are no longer limited to functionality measures (e.g., speed, capacity, range), because for a product that has poor reliability and is seldom available for use, functionality measures are meaningless. The reliability specification is the backbone of a reliability program and a prerequisite for reliability testing. Without it, implementing a reliability program will be a difficult and frustrating process. A typical equipment reliability specification includes performance indices such as MTBF, and it must always be accompanied by a clear definition of failure. Effective reliability testing hinges on that definition. Without it as a baseline, any reliability discussion becomes meaningless.
This article stressed the importance of defining equipment failure as an integral part of the reliability specification, and proposed a method for classifying equipment interrupts.
1. SEMI E10-99, “Standard for Definition and Measurement of Equipment Reliability, Availability, and Maintainability (RAM),” SEMI (Semiconductor Equipment and Materials International), 805 East Middlefield Road, Mountain View, CA 94043, 1999.