MIL-HDBK-217 vs.

by Barry Ma and Mekonen Buzuayene, Anritsu

For the last three decades, MIL-HDBK-217 has been widely used to predict product reliability [1]. Today, however, highly accelerated life testing (HALT) and highly accelerated stress screening (HASS) are being recognized as effective tools to improve product reliability [2]. The military standard and HALT/HASS cover different areas in the reliability world. Is there any correlation between them?

Manufacturers usually make reliability predictions based on failure models described in MIL-HDBK-217, Bellcore TR-332, or some other model before the product is manufactured or marketed [3,4]. But once a product is delivered to customers and field failure reports begin to arrive, the preliminary reliability prediction is sometimes not borne out by those reports. Some manufacturers have said the prediction model can be highly inaccurate when compared with performance in the field [4,5]. What causes the discrepancy between the reliability prediction and the field failure reports?

The Purpose of MIL-HDBK-217

This military standard is used to estimate the inherent reliability of electronic equipment and systems, based on component failure data. It consists of two basic prediction methods:
The general failure model in MIL-HDBK-217 and Bellcore TR-332 is of the form:
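As a hedged sketch of that form, assuming the handbook's usual notation (λp for the predicted part failure rate and π_i for adjustment factors such as quality and environment, symbols that do not appear in the text here), the base failure rate is scaled by a product of factors:

```latex
\lambda_p = \lambda_b \prod_{i} \pi_i
```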
where: λb = the base failure rate, described by the Arrhenius equation

The Arrhenius equation illustrates the relationship between failure rate and temperature for components. It derives from the observed dependence of chemical reaction, gaseous diffusion, and migration rates on temperature changes:
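In its standard textbook form the relation can be sketched as follows. The symbols A (a constant pre-exponential factor), E_a (activation energy), k (Boltzmann's constant), and T (absolute temperature) are assumed here and are not defined in the text:

```latex
\lambda_b = A \exp\!\left(-\frac{E_a}{k\,T}\right)
```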
where: λb = process rate (component failure rate)

Detailed models are provided for each part type, such as microcircuits, transistors, resistors, and connectors.

The Merit of HALT/HASS

HALT is performed during design to find the weak reliability links in the product. The stresses applied to the product are well beyond normal shipping, storage, and application conditions. HALT consists of:
HASS is performed in the production stage to confirm that all reliability improvements made in HALT are maintained. It ensures that no defects are introduced due to variations in the manufacturing process and vendor parts. It contains the following:
The precipitation and detection screen limits of HASS are based on HALT results. Usually, the precipitation screen limits lie between the operational limits and the destruct limits, and the detection screen limits lie between the spec limits and the operational limits, as shown in Figure 1 [3].

Figure 1. HASS Limits Selected From HALT Data
HALT/HASS has been proven to find latent defects that would very likely precipitate in end-use applications, causing product failures in the field. As a result, the HALT/HASS process can effectively improve product reliability.

Why MIL-HDBK-217 Turns Out Inaccurate Predictions

The prediction techniques described in MIL-HDBK-217 for estimating system reliability are based on the Arrhenius equation, an expression that depends exponentially on temperature. But many failure modes in the real world do not follow the equation. For instance, mechanical vibration and shock, humidity, power on/off cycling, ESD, and dielectric breakdown, all independent of temperature, are common causes of failure. Even some temperature-related stresses, such as temperature cycling and thermal shock, cause failures that do not follow the Arrhenius equation.

More importantly, the reliability of components in many electronic systems is improving. Consequently, component failure is no longer a major reason for system failure. Yet the MIL-HDBK-217 model still predicts system reliability from part failure data.

Figure 2 illustrates the nominal percentage of failures attributable to each of eight predominant failure causes, based on data collected by the Reliability Analysis Center [6]. The definitions of the eight failure causes in Figure 2 are as follows:

Parts (22%): Part failing to perform its intended function.
Design (9%): Inadequate design.
Manufacturing (15%): Anomalies in the manufacturing process.
System Management (4%): Failure to interpret system requirements.
Wear-Out (9%): Wear-out-related failure mechanisms.
No Defect (20%): Perceived failure that cannot be reproduced upon further testing. These failures may or may not be actual failures; however, they are removals and count toward the logistic failure rate.
Induced (12%): An externally applied stress.
Software (9%): Failure to perform its intended function due to a software fault.
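The percentages above can be tallied to see how small a share of failures a parts-based model addresses. This is a minimal Python sketch: the dictionary and function names are illustrative, and treating "Parts" as the only category covered by a part-failure-data model is a simplifying assumption, not a claim from the source data.

```python
# Nominal failure-cause percentages quoted in the text for Figure 2.
failure_causes = {
    "Parts": 22,
    "Design": 9,
    "Manufacturing": 15,
    "System Management": 4,
    "Wear-Out": 9,
    "No Defect": 20,
    "Induced": 12,
    "Software": 9,
}


def share_outside(causes, covered):
    """Return the percentage of failures NOT in the covered categories."""
    total = sum(causes.values())
    covered_total = sum(causes[name] for name in covered)
    return 100.0 * (total - covered_total) / total


# Sanity check: the eight categories should account for all failures.
total_percent = sum(failure_causes.values())

# Assumption: only "Parts" is addressed by a part-failure-data prediction.
non_part_share = share_outside(failure_causes, ["Parts"])

print(f"Total across categories: {total_percent}%")
print(f"Share outside the Parts category: {non_part_share:.0f}%")
```

Under that assumption, roughly three-quarters of the recorded failures fall outside what a prediction built only on part failure data would account for.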
To illustrate the disparity, consider a circuit board containing 338 components of six component types, used in a mobile radio system [4]. The MIL-HDBK-217 prediction for the board is 1.934 failures per million hours, as shown in Table 1. In the field, however, the board showed 19 failures in a total operating time of 4,444,696 hours, a field failure rate of 4.274 failures per million hours. The difference, 4.274 - 1.934 = 2.34 failures per million hours, was not covered by the MIL-HDBK-217 prediction.

Table 1. Contribution to Failure Rate of Each Component in MIL-HDBK-217 Prediction
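The arithmetic behind the field failure rate in the example above can be sketched in a few lines of Python. The function and variable names are illustrative; the inputs are the figures quoted in the text, and the computed rate agrees with the quoted 4.274 to within rounding of the last digit.

```python
def failures_per_million_hours(failures, operating_hours):
    """Convert a failure count and total operating time to failures per 10^6 hours."""
    return failures / operating_hours * 1e6


# Figures quoted in the text for the mobile radio circuit board.
predicted_rate = 1.934          # MIL-HDBK-217 prediction, failures per 10^6 h
field_failures = 19
field_operating_hours = 4_444_696

field_rate = failures_per_million_hours(field_failures, field_operating_hours)
deviation = field_rate - predicted_rate   # part of the rate the prediction missed
ratio = field_rate / predicted_rate       # how many times higher the field rate is

print(f"Field failure rate: {field_rate:.3f} failures per million hours")
print(f"Deviation from prediction: {deviation:.2f} failures per million hours")
print(f"Field rate / predicted rate: {ratio:.2f}")
```

Expressing the gap as a ratio as well as a difference makes it easier to see that the observed rate is a bit more than twice the predicted one.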
In practice, many field failures are caused by unpredictable factors, which are often the main reasons for reliability problems in today’s electronic systems. But those factors can be successfully precipitated, detected, and eliminated during a HALT/HASS process.

Conclusion

Before making a reliability prediction, be certain of one of the two following items:
References
About the Author

Barry Ma is a qualification engineer at Anritsu. He received a B.S. in physics and a master’s and Ph.D. in E.E. from Nanjing University. e-mail: bma@anritsu.com

Published by EE-Evaluation Engineering