Abstract
In this paper it is shown that the bathtub-curve (BTC) based time-derivative of the failure rate at the initial moment of time can serve as a suitable criterion of whether burn-in testing (BIT) should be conducted or can be skipped. It is also shown that this criterion is, in effect, the variance of the random statistical failure rate (SFR) of the mass-produced components that the product manufacturer received from numerous vendors. Because the vendors’ commitments to reliability are unknown, this random SFR might vary in a very wide range, from zero to infinity. A formula for the non-random SFR of a product comprised of mass-produced components with random SFRs is derived, and a solution for the case of the normally distributed random SFR is obtained.
1. Introduction
Burn-in testing (BIT) [1,2,3,4,5,6,7,8,9,10] has for many years been an accepted practice for detecting and eliminating early failures in newly fabricated electronic products prior to shipping the “healthy” ones that survived BIT to customers. BIT is mandatory on most high-reliability procurement contracts, such as for military and aerospace applications, but is also a must for automotive, medical, long-haul telecommunication and other electronic materials, packages and systems, whose high operational performance is paramount. BIT stimulates failures in defective materials and vulnerable structural elements of the manufactured products by accelerating the stresses that will supposedly cause these materials and elements to fail. BIT is usually conducted at the component level, because the cost of testing and replacing parts is the lowest at this level. The products are tested by applying stress extremes, usually, but not necessarily, of the expected operational stressors. It is believed that once a sufficiently long BIT process is complete, no further early failures are likely to occur.
Depending on the anticipated operation conditions of the product and the testing capabilities of a particular manufacturer, BIT can be based on temperature cycling, elevated temperatures, voltage, current, humidity, random vibrations, and so on, or, since the principle of superposition does not work in reliability engineering, on an appropriate combination of these stressors. The duration of stressing depends on the product, the manufacturing technology and the reliability requirements, with consideration of the consequences of possible failures. Elevated temperature (say, 125 °C for 168 h) or elevated-stress screening (say, twenty temperature cycles from −10 °C to 70 °C) is most often used. For complex products, dynamic BIT might be employed. The thermal stress, caused by the change in temperature, is combined in these tests with dynamic (shocks, random vibrations) loading. Such a temperature-dynamic bias is thought to provide worst-case operating conditions [11,12]. For commercial applications, BIT, if conducted at all, does not last longer than one or two days (24 or 48 h). BIT is a costly effort, and its application is therefore thoroughly planned and carefully executed.
It goes without saying that as a result of successfully applying BIT, early failures are avoided and the infant mortality portion (IMP) of the bathtub-curve (BTC) (Figure 1) is eliminated, at the expense of the reduced yield caused by the BIT process. In addition, high BIT stresses might not only eliminate “freaks”, but could also cause permanent damage to the main population of the “healthy” products, thereby reducing their lifetime. It is unclear, however, to what extent this actually happens: the highly accelerated life testing (HALT) [13], a “black box” that tries “to kill many birds with one stone” and is at present the testing procedure of choice employed as a suitable BIT vehicle, is unable to provide any information on that.
Figure 1.
Bathtub curve—“reliability passport” of an electronic product.
It remains unclear what actually happens during and as a result of BIT, and what could be done to effectively eliminate “freaks” while shortening the testing time and not damaging the sound devices. In a mature production, when HALT is relied upon to do the BIT job, it is not easy even to determine whether a decreasing failure rate exists. To determine the failure time for a very low percentage of the production, one has to destroy a large number of devices, unless there are additional considerations of what could be done to enhance the merits of the BIT process and to minimize its shortcomings. Thus, there is an obvious incentive to develop ways in which the BIT process could be quantified, monitored and possibly optimized. Accordingly, in this analysis some important aspects of BIT are addressed for an electronic product comprised of numerous mass-produced components. Our intent is to shed some quantitative light on the BIT process. In particular, we try to develop a suitable and predictable criterion that would be able to answer the fundamental “to burn-in or not to burn-in” question.
Two mutually complementing modeling studies have been carried out here: (1) the analysis of the configuration of the IMP of a BTC of a more or less well established manufacturing technology; and (2) the analysis of the role of the random statistical failure rate (SFR) of the mass-produced components that the product of interest is comprised of. In particular, as far as the second study is concerned, we consider the effect that the random SFR of the mass-produced components might have on the non-random initial SFR of the product. Although this paper offers neither a straightforward and ultimate answer to the “to burn-in or not to burn-in” question nor a way to optimize the BIT process in terms of its cost and duration, the suggested physics-of-failure and statistics-of-failure based criterion and the calculated probabilities of non-failure for the given loading conditions and time of testing provide, in our judgment, a useful step forward in advancing the state of the art in today’s BIT practice.
BIT, being a HALT effort, is, in effect, a failure-oriented-accelerated-test (FOAT) [14,15,16] and, as such, should be geared to a physically meaningful accelerated test model, in order to confirm the anticipated physics of failure and the expected failure modes. The application of the probabilistic design for reliability (PDfR) approach [17,18] and its constituents, FOAT and the multi-parametric Boltzmann-Arrhenius-Zhurkov (BAZ) equation [19,20,21], is beyond the scope of this paper. The PDfR/FOAT/BAZ concept is considered, however, as important future work. Let us briefly elaborate on its substance.
If the well-known Arrhenius model [22] is employed, FOAT should be conducted to determine the corresponding activation energies and other data that characterize the device reliability [23]. The desirable steady-state portion of the BTC occurs, as is known, at the end of the BIT process as a result of the interaction of two major irreversible processes: the “favorable” SFR process, resulting in a decreasing failure rate with time, and the “unfavorable” physics-of-failure-related (PFR) process, resulting in an increasing failure rate. The first process dominates at the IMP of the BTC and is considered in this paper, and the second one dominates at its wear-out portion. These two processes start to compensate for each other at the beginning of the steady-state portion of the BTC, where the failure rate is low enough to be acceptable. The SFR process can be predicted [24,25], for a product comprised of mass-produced components, from purely theoretical considerations. Assuming that the physics-of-failure and statistics-of-failure processes are statistically independent, the failure rates of the second (PFR) process at the given moment of time can be obtained by simply deducting the predicted SFR values from the experimentally obtained BTC ordinates. In our BIT analysis, a different application of the finding of Refs. [24,25] is employed, namely, to quantify, on a probabilistic basis, some more or less well-known considerations underlying the existing BIT practice, including the “to burn-in or not to burn-in” question. Application of the PDfR/FOAT/BAZ concept will hopefully make it possible not only to answer this question for the given manufacturing technology, but, most importantly, to establish the appropriate elevated stresses and their levels, and to decide on the effective BIT duration, so as to minimize the number of devices that will be destroyed and the time of testing.
The numerical example in Appendix B gives an indication of what could be expected from the application of the PDfR/FOAT/BAZ concept.
2. Analysis
2.1. Prediction Based on the Analytical Approximation of the Bathtub-Curve (BTC)
The typical BTC, the “reliability passport” of a mass-produced electronic product (Figure 1), can be approximated by the following expressions [18]:
$$\lambda(t) = \lambda_0 + (\lambda_1 - \lambda_0)\left(1 - \frac{t}{t_1}\right)^{n_1}, \quad 0 \le t \le t_1; \qquad \lambda(t) = \lambda_0 + (\lambda_2 - \lambda_0)\left(\frac{t - t_1}{t_2}\right)^{n_2}, \quad t_1 \le t \le t_1 + t_2. \tag{1}$$
Here $\lambda(t)$ is the time-dependent failure rate, $\lambda_0$ is its steady-state minimum, $\lambda_1$ is its initial (high) value at the beginning of the IMP, $t_1$ is the duration of this portion, $\lambda_2$ is the final (actual or acceptable) value of the failure rate at the end of the wear-out portion, $t_2$ is the duration of this portion, and the exponents $n_1$ and $n_2$ are expressed through the fullnesses $\beta_1$ and $\beta_2$ of the BTC infant-mortality and wear-out portions as $n_{1,2} = \beta_{1,2}/(1 - \beta_{1,2})$. These fullnesses are defined as the ratios of the areas below the BTC (i.e., the areas between the BTC and the time axis) to the areas $(\lambda_1 - \lambda_0)t_1$ and $(\lambda_2 - \lambda_0)t_2$ of the corresponding rectangles. The exponents $n_1$ and $n_2$ change from zero to one when the fullnesses $\beta_1$ and $\beta_2$ change from zero to 0.5. The “to burn-in or not to burn-in” question can be tentatively answered based on the derivative
$$\lambda'(t) = \frac{d\lambda}{dt} = -\frac{n_1(\lambda_1 - \lambda_0)}{t_1}\left(1 - \frac{t}{t_1}\right)^{n_1 - 1} \tag{2}$$
calculated for the initial moment of time $t = 0$. This yields:
$$\lambda'(0) = -\frac{\lambda_1 - \lambda_0}{t_1}\,n_1 = -\frac{\lambda_1 - \lambda_0}{t_1}\,\frac{\beta_1}{1 - \beta_1}. \tag{3}$$
If this derivative is zero or next-to-zero, there is no IMP at all, so that no BIT is needed to eliminate this portion, and “not to burn-in” is the answer to our basic question. This certainly happens when the initial value $\lambda_1$ of the BTC is not different from its steady-state value $\lambda_0$. What is less obvious is that the same result takes place for $\beta_1 = 0$. This means that no more or less durable BIT is needed in such a case, because there are not too many “freaks” in the population, and these “freaks” are characterized by very low probabilities of non-failure, so that the planned BIT process is a next-to-instantaneous one. The maximum value of the fullness $\beta_1$ is $\beta_1 = 0.5$. This corresponds to the case when the IMP of the BTC is a straight line connecting the initial, $\lambda_1$, and the steady-state, $\lambda_0$, values of the BTC. In this case,
$$\lambda'(0) = -\frac{\lambda_1 - \lambda_0}{t_1}. \tag{4}$$
The derivative
$$\frac{d\lambda'(0)}{d\beta_1} = -\frac{\lambda_1 - \lambda_0}{t_1}\,\frac{1}{(1 - \beta_1)^2} \tag{5}$$
of (3) with respect to the fullness $\beta_1$ changes from the value expressed by the formula (4) to a value four times greater when the fullness changes from zero to 0.5. But how can the most likely value of the derivative (3) and the required BIT time be established, even for the worst-case scenario, so that the question “to burn-in or not to burn-in?” could be answered with some certainty? To do that, let us address two additional and independent methodologies: the one based on the use of the SFR [24,25] and, briefly, also the one based on the application of the BAZ constitutive equation [19,20,21].
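The criterion of this subsection can be sketched numerically. The snippet below is a minimal illustration, assuming that the IMP is approximated as $\lambda(t) = \lambda_0 + (\lambda_1 - \lambda_0)(1 - t/t_1)^{n_1}$ with $n_1 = \beta_1/(1 - \beta_1)$; the function names and the input numbers are hypothetical and serve illustration only.

```python
def exponent_from_fullness(beta):
    """Exponent n = beta / (1 - beta) for a fullness 0 <= beta <= 0.5."""
    if not 0.0 <= beta <= 0.5:
        raise ValueError("fullness must lie between 0 and 0.5")
    return beta / (1.0 - beta)


def imp_failure_rate(t, lam0, lam1, t1, beta1):
    """Failure rate on the infant-mortality portion, 0 <= t <= t1."""
    if not 0.0 <= t <= t1:
        raise ValueError("t must lie within the infant-mortality portion")
    n1 = exponent_from_fullness(beta1)
    return lam0 + (lam1 - lam0) * (1.0 - t / t1) ** n1


def initial_slope(lam0, lam1, t1, beta1):
    """Time derivative of the failure rate at t = 0 (the burn-in criterion)."""
    return -(lam1 - lam0) / t1 * exponent_from_fullness(beta1)


if __name__ == "__main__":
    # Hypothetical inputs: failure rates in 1/h, IMP duration in h.
    lam0, lam1, t1 = 1.0e-5, 5.0e-5, 100.0
    for beta1 in (0.0, 0.1, 0.25, 0.5):
        slope = initial_slope(lam0, lam1, t1, beta1)
        print(f"beta1 = {beta1:4.2f}   initial slope = {slope:.3e} 1/h^2")
```

The slope vanishes both for $\lambda_1 = \lambda_0$ and for $\beta_1 = 0$, and reaches $-(\lambda_1 - \lambda_0)/t_1$ for the straight-line case $\beta_1 = 0.5$.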
2.2. Prediction Based on the Analysis of the SFR Process
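In the simplest case of the uniformly distributed random failure rates, when the probability density distribution function $f(\lambda)$ is constant, the formula (A3) of the Appendix A yields:
$$\lambda_{ST}(t) = \frac{1}{t}. \tag{6}$$
In such a case, the probability of non-failure becomes time independent, that is, constant over the entire operation range: $P = \exp[-\lambda_{ST}(t)\,t] = e^{-1} \approx 0.3679$. This result does seem to make physical sense. Let us consider therefore a more realistic case, when the random failure rates of the components are normally distributed:
$$f(\lambda) = \frac{1}{\sqrt{2\pi D}}\exp\left[-\frac{(\lambda - \bar\lambda)^2}{2D}\right]. \tag{7}$$
Here $\bar\lambda$ is the mean value of the random SFR and $D$ is its variance. Introducing (7) into the formula (A3) and using [26], the following expression for the non-random SFR of the product can be obtained:
$$\lambda_{ST}(t) = \sqrt{2D}\,\varphi(\tau). \tag{8}$$
The function
$$\varphi(\tau) = -\tau + \frac{1}{\Psi(\tau)} \tag{9}$$
depends on the dimensionless “physical” (effective) time
$$\tau = t\sqrt{\frac{D}{2}} - \frac{\bar\lambda}{\sqrt{2D}}, \tag{10}$$
and so do the auxiliary function
$$\Psi(\tau) = \sqrt{\pi}\,e^{\tau^2}\left[1 - \Phi(\tau)\right] \approx \frac{1}{\tau}\left(1 - \frac{1}{2\tau^2} + \frac{3}{4\tau^4} - \cdots\right) \tag{11}$$
and the probability integral (Laplace function)
$$\Phi(\tau) = \frac{2}{\sqrt{\pi}}\int_0^{\tau} e^{-x^2}\,dx. \tag{12}$$
The term $\bar\lambda/\sqrt{2D}$ in formula (10) can be interpreted as a sort of a measure of the level of uncertainty of the random SFR. Its value changes from infinity to zero when the variance $D$ changes from zero, in the case of a non-random SFR, to infinity, in the case of an “ideally random” SFR. As is evident from formula (10), the “physical” time $\tau$ of the SFR process depends not only on the “chronological” (actual) time $t$, but also on the mean $\bar\lambda$ and variance $D$ of the components’ random SFR. The rate of change of the “physical” time with the change in the “chronological” time is $d\tau/dt = \sqrt{D/2}$: the larger the standard deviation $\sqrt{D}$ of the random SFR, the faster the “physical” time changes. Considering this relationship, the formula (8) yields:
$$\frac{d\lambda_{ST}}{dt} = \sqrt{2D}\,\frac{d\varphi}{d\tau}\,\frac{d\tau}{dt} = D\,\frac{d\varphi}{d\tau}.$$
The “physical” time $\tau$ is zero when the “chronological” time is $t = \bar\lambda/D$ and, for a given chronological time $t > 0$, changes from $-\infty$ to $\infty$ when the variance $D$ of the random SFR changes from zero to infinity. The function $\varphi(\tau)$ is tabulated in Table 1. It changes from 3 to zero when the “physical” time $\tau$ changes from −3 to infinity, that is, when the “chronological” time changes from zero to infinity. The function $\Psi(\tau)$ in this table is calculated numerically.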
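The closed-form SFR for the normal case can be cross-checked against a direct numerical evaluation of the integral ratio of Appendix A. The sketch below assumes the forms $\lambda_{ST} = \sqrt{2D}\,\varphi(\tau)$, $\varphi = -\tau + 1/\Psi$, $\Psi = \sqrt{\pi}e^{\tau^2}[1 - \Phi(\tau)]$ and $\tau = t\sqrt{D/2} - \bar\lambda/\sqrt{2D}$; the function names, the switch-over thresholds and the integration settings are illustrative choices, not taken from the paper.

```python
import math


def psi(tau):
    """Auxiliary function sqrt(pi) * exp(tau**2) * (1 - Phi(tau))."""
    if tau > 25.0:
        # Asymptotic expansion; avoids overflow of exp(tau**2).
        x = 1.0 / (2.0 * tau * tau)
        return (1.0 - x + 3.0 * x * x) / tau
    if tau < -25.0:
        return math.inf  # grows without bound for large negative tau
    # 1 - Phi(tau) is the complementary error function erfc(tau).
    return math.sqrt(math.pi) * math.exp(tau * tau) * math.erfc(tau)


def phi(tau):
    """Governing function of the effective time: phi = -tau + 1 / psi."""
    return -tau + 1.0 / psi(tau)


def sfr_normal(t, mean, var):
    """Closed-form non-random SFR for a normally distributed random SFR."""
    tau = t * math.sqrt(var / 2.0) - mean / math.sqrt(2.0 * var)
    return math.sqrt(2.0 * var) * phi(tau)


def sfr_direct(t, mean, var, steps=20000):
    """Same quantity by midpoint-rule integration over lambda >= 0."""
    upper = mean + 12.0 * math.sqrt(var)
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        w = math.exp(-(lam - mean) ** 2 / (2.0 * var) - lam * t)
        num += lam * w
        den += w
    return num / den


if __name__ == "__main__":
    # Values of phi on a grid of effective times (a Table 1 style listing).
    for k in range(-6, 7):
        tau = 0.5 * k
        print(f"tau = {tau:5.2f}   phi = {phi(tau):.4f}")
```

Because the two routines share nothing but the model itself, their agreement is a reasonable sanity check on the reconstruction of formulas (8) to (12).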
Table 1.
The governing function $\varphi(\tau)$ of the effective time $\tau$.
The expansion (11) can be used to calculate the auxiliary function $\Psi(\tau)$ for large $\tau$ values, exceeding, say, 2.5, and has been, in effect, employed when computing the Table 1 data. The function $\Psi(\tau)$ changes from infinity to zero when the “physical” time $\tau$ changes from $-\infty$ to $\infty$. For the times $\tau$ below −2.5, the function $\Psi(\tau)$ is large, and the second term in (9) becomes small compared to the first term. In this case the function $\varphi(\tau)$ coincides with the time $\tau$ itself, with an opposite sign though. As evident from Table 1, the derivative $d\varphi/d\tau$ can be put, at the initial moment of time, equal to −1.0, and therefore,
$$\lambda'_{ST}(0) = D\,\frac{d\varphi}{d\tau}\bigg|_{t=0} = -D.$$
This result explains the physical meaning of the initial time derivative of the failure rate of the BTC: its magnitude is the variance of the random SFR of the components.
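As a supplementary check that is not part of the original derivation, the same initial slope follows directly from the general SFR definition of Appendix A for any probability density $f(\lambda)$ normalized on $\lambda \ge 0$. The moments $m_1$ and $m_2$ below are notation introduced here for illustration.

```latex
\begin{aligned}
P(t) &= \int_0^\infty f(\lambda)\,e^{-\lambda t}\,d\lambda, \qquad
\lambda_{ST}(t) = -\frac{P'(t)}{P(t)}, \\
\frac{d\lambda_{ST}}{dt} &= -\frac{P''(t)}{P(t)} + \left[\frac{P'(t)}{P(t)}\right]^2, \\
P(0) &= 1, \qquad P'(0) = -m_1, \qquad P''(0) = m_2, \qquad
m_k = \int_0^\infty \lambda^k f(\lambda)\,d\lambda, \\
\lambda'_{ST}(0) &= -\left(m_2 - m_1^2\right) = -\operatorname{Var}(\lambda).
\end{aligned}
```

For a normal density whose mean exceeds its standard deviation by a factor of about 2.5 or more, truncation at $\lambda = 0$ barely changes the variance, so $\operatorname{Var}(\lambda) \approx D$ and the slope is approximately $-D$.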
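At the initial moment of time, $t = 0$, the formulas (10), (11) and (8) yield:
$$\tau_0 = -\frac{\bar\lambda}{\sqrt{2D}}, \qquad \lambda_{ST}(0) = \sqrt{2D}\,\varphi(\tau_0), \tag{13}$$
where the function
$$\varphi(\tau_0) = -\tau_0 + \frac{1}{\Psi(\tau_0)} = \frac{\bar\lambda}{\sqrt{2D}} + \frac{e^{-\tau_0^2}}{\sqrt{\pi}\left[1 + \Phi\left(\bar\lambda/\sqrt{2D}\right)\right]}$$
is tabulated in Table 2. This function changes from $1/\sqrt{\pi} \approx 0.5642$ to infinity when the factor $\bar\lambda/\sqrt{2D}$ changes from zero to infinity.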
Table 2.
The initial statistical failure rate (SFR) vs. its standard deviation.
With the product’s initial SFR value $\lambda_{ST}(0) = \lambda_1$ (the degradation failure rate is obviously zero at the initial moment of time, so that the initial value of the non-random SFR coincides with the initial value of the BTC), the last formula in (13) yields: $\lambda_1/\sqrt{D} = \sqrt{2}\,\varphi(\tau_0)$. When the ratio $\bar\lambda/\sqrt{D}$ increases from zero to infinity (see Table 2), the ratio $\lambda_1/\sqrt{D}$ increases from $\sqrt{2/\pi} \approx 0.7979$ to infinity. The initial failure rate $\lambda_1$ can be put equal to the mean value $\bar\lambda$ if the ratio $\bar\lambda/\sqrt{D}$ exceeds 2.5. This is usually indeed the case in an actual situation, since the accepted normal distribution, when applied to a random variable that cannot be negative, should be characterized by a significant ratio of its mean value to the standard deviation, so that the negative values of such a distribution, although they exist, are insignificant and do not contribute appreciably to the sought information.
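The dependence of the initial SFR on the mean-to-standard-deviation ratio can be tabulated with a few lines of code. The sketch assumes the relationship $\lambda_{ST}(0)/\sqrt{D} = s + \sqrt{2/\pi}\,e^{-s^2/2}/[1 + \Phi(s/\sqrt{2})]$ with $s = \bar\lambda/\sqrt{D}$, which is the initial-time formula rewritten in terms of $s$; the function name is hypothetical.

```python
import math


def initial_sfr_ratio(s):
    """Ratio lambda_ST(0) / sqrt(D) for s = mean / sqrt(D) >= 0."""
    if s < 0.0:
        raise ValueError("the mean-to-standard-deviation ratio must be non-negative")
    a = s / math.sqrt(2.0)
    # Correction caused by cutting the normal density off at lambda = 0.
    correction = math.sqrt(2.0 / math.pi) * math.exp(-a * a) / (1.0 + math.erf(a))
    return s + correction


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
        print(f"s = {s:3.1f}   lambda_ST(0)/sqrt(D) = {initial_sfr_ratio(s):.4f}")
```

The correction term falls below two percent of the standard deviation at $s = 2.5$, which is consistent with putting the initial failure rate equal to the mean value beyond that ratio.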
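The probability of non-failure,
$$P = \exp\left[-\lambda_{ST}(t)\,t\right],$$
can be calculated as
$$P = \exp\left[-2\,\varphi(\tau)\left(\tau + \frac{\bar\lambda}{\sqrt{2D}}\right)\right]$$
and is tabulated in Table 3 as a function of the “physical” time $\tau$ and the “safety factor” $\bar\lambda/\sqrt{D}$.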
Table 3.
Calculated probabilities of non-failure $P$ as functions of the time $\tau$ and the factor $\bar\lambda/\sqrt{D}$.
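Entries of this kind can be generated with the short sketch below. It follows the definition $P = \exp[-\lambda_{ST}(t)\,t]$ used in this section, rewritten through the effective time as $P = \exp[-2\varphi(\tau)(\tau + s/\sqrt{2})]$ with $s = \bar\lambda/\sqrt{D}$; this rewriting and the function names are assumptions made for illustration, and the direct formula for $\varphi$ is adequate only for moderate effective times (roughly $\tau < 25$).

```python
import math


def phi(tau):
    """Governing function phi(tau) = -tau + exp(-tau^2) / (sqrt(pi) * erfc(tau))."""
    return -tau + math.exp(-tau * tau) / (math.sqrt(math.pi) * math.erfc(tau))


def prob_non_failure(tau, s):
    """P = exp(-lambda_ST * t) expressed through tau and s = mean / sqrt(D)."""
    a = s / math.sqrt(2.0)
    if tau < -a:
        raise ValueError("tau below -s/sqrt(2) corresponds to negative time")
    return math.exp(-2.0 * phi(tau) * (tau + a))


if __name__ == "__main__":
    s_values = (2.5, 3.0, 4.0)
    print("tau    " + "  ".join(f"s={s:3.1f}" for s in s_values))
    for k in range(-3, 7):
        tau = 0.5 * k
        row = []
        for s in s_values:
            # Effective times that precede t = 0 for this s are left blank.
            row.append(f"{prob_non_failure(tau, s):.4f}" if tau >= -s / math.sqrt(2.0) else "  -   ")
        print(f"{tau:5.2f}  " + "  ".join(row))
```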
From (8) we obtain:
The derivative can be evaluated analytically or obtained numerically using Table 3 data.
3. Conclusions
The following conclusions can be drawn from the analysis carried out:
- Two mutually complementing modeling studies have been carried out: (1) the analysis of the configuration of the IMP of the BTC, the reliability “passport” of an established semiconductor technology; and (2) the analysis of the role of the random SFR of the mass-produced components that the product of interest is comprised of.
- The first analysis has shown that the BTC-based time-derivative of the failure rate at the initial moment of time can be considered a suitable criterion of whether BIT should be conducted or can be skipped. If this derivative is small, no BIT might be needed, because the initial part of the IMP is more or less parallel to the time axis, and this is an indication that there are no highly unreliable items (“freaks”) in the lot and that the initial moment of time is, in effect, the start of the steady-state BTC condition. In the opposite extreme case, when this derivative is significant, BIT is needed, but could be made very short, because the “freaks” are so unreliable that even a very short and weak BIT could successfully remove them.
- The second analysis has indicated that the above criterion is, in effect, the variance of the random SFR of the mass-produced components that the product manufacturer received from numerous vendors. Because the vendors’ commitments to reliability are unknown, this random SFR might vary in a very wide range, from zero to infinity.
- A solution for the case of the normally distributed random SFR was obtained. Using this solution, probabilities of non-failure were calculated as functions of time and of the ratio of the mean value of the random SFR of the mass-produced components to its standard deviation (in the analysis of structures this ratio is known as the safety factor). This provides useful information for the next-step investigations and for a more effective answer to our fundamental “to burn-in or not to burn-in” question.
- Although this paper does not offer a straightforward and an ultimate answer to this question, the suggested physics-of-failure and statistics-of-failure based criterion, and the calculated probabilities of non-failure for the given loading conditions and time of testing, provide a useful step forward in advancing today’s BIT practice, which is based on the HALT, a “black box” that has many merits, but does not quantify reliability, even on a deterministic basis.
- Future work should include experimental verification of the suggested “to burn-in or not to burn-in” criterion and the determination of its acceptable values, which would make it possible to answer the “to burn-in or not to burn-in” question. It should also include investigation of the effects of other possible distributions of the random SFR, such as, for example, the Rayleigh distribution.
Conflicts of Interest
The author declares no conflict of interest.
Acronyms
| BAZ | Boltzmann-Arrhenius-Zhurkov’s (equation) |
| BIT | Burn-in Testing |
| BTC | Bathtub Curve |
| DfR | Design for Reliability |
| FOAT | Failure Oriented Accelerated Testing |
| HALT | Highly Accelerated Life Testing |
| IMP | Infant Mortality Portion (of the BTC) |
| PDfR | Probabilistic Design for Reliability |
| SFR | Statistical Failure Rate |
Appendix A
SFR of a Product Comprised of Mass-Produced Components
Consider a typical situation when a product manufacturer receives the components for this product from numerous independent vendors that produce components of different and typically unknown reliability levels. The probability of non-failure of such a product can be sought, assuming that the exponential law of reliability is applicable, as
$$P = \sum_i \alpha_i P_i = \sum_i \alpha_i \exp(-\lambda_i t). \tag{A1}$$
Here $\alpha_i$ is the component fraction received from the $i$-th vendor, $P_i$ is the probability of non-failure of the component, $\lambda_i$ is its (random) failure rate, and $t$ is time. The sum in this formula can be substituted, for a large number of mass-produced components, by the integral:
$$P(t) = \int_0^\infty \exp(-\lambda t)\,dF(\lambda) = \int_0^\infty f(\lambda)\exp(-\lambda t)\,d\lambda, \tag{A2}$$
where $F(\lambda)$ is the probability distribution function and $f(\lambda)$ is the probability density distribution function of the continuously distributed random failure rate $\lambda$ of the mass-produced components. The time-dependent non-random SFR of the product can be determined for the given moment of time $t$ as the ratio of the current rate of the number of products that failed by the time $t$ to the number of products that still remained sound by this time. Substituting the number of the sound items with the probability of non-failure $P(t)$, and the number of the failed items with the probability of failure $Q(t) = 1 - P(t)$, this definition can be written as:
$$\lambda_{ST}(t) = \frac{1}{P(t)}\frac{dQ(t)}{dt} = -\frac{1}{P(t)}\frac{dP(t)}{dt},$$
or, considering (A2), as
$$\lambda_{ST}(t) = \frac{\int_0^\infty \lambda f(\lambda)\exp(-\lambda t)\,d\lambda}{\int_0^\infty f(\lambda)\exp(-\lambda t)\,d\lambda}. \tag{A3}$$
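The discrete form of this result is easy to explore numerically. The sketch below evaluates the product SFR as $-P'(t)/P(t)$ for $P(t) = \sum_i \alpha_i \exp(-\lambda_i t)$ and compares its initial slope with the variance of the vendors' failure rates; the vendor fractions and rates are hypothetical numbers chosen only for illustration.

```python
import math


def product_sfr(t, fractions, rates):
    """SFR of the product, -P'(t)/P(t), for P(t) = sum_i alpha_i * exp(-lambda_i * t)."""
    weights = [a * math.exp(-lam * t) for a, lam in zip(fractions, rates)]
    return sum(w * lam for w, lam in zip(weights, rates)) / sum(weights)


def rate_variance(fractions, rates):
    """Variance of the component failure rate over the vendor mix."""
    mean = sum(a * lam for a, lam in zip(fractions, rates))
    return sum(a * (lam - mean) ** 2 for a, lam in zip(fractions, rates))


if __name__ == "__main__":
    # Hypothetical vendor mix: fractions sum to one, rates in 1/h.
    fractions = [0.5, 0.3, 0.2]
    rates = [1.0e-3, 5.0e-3, 2.0e-2]

    h = 1.0e-3  # small time step for a finite-difference slope at t = 0
    slope = (product_sfr(h, fractions, rates) - product_sfr(0.0, fractions, rates)) / h
    print(f"initial SFR      = {product_sfr(0.0, fractions, rates):.4e} 1/h")
    print(f"initial slope    = {slope:.4e} 1/h^2")
    print(f"minus variance   = {-rate_variance(fractions, rates):.4e} 1/h^2")
    print(f"SFR at 10,000 h  = {product_sfr(1.0e4, fractions, rates):.4e} 1/h")
```

The SFR starts at the mean rate of the mix, decreases with a slope equal to minus the variance of the rates, and tends to the rate of the most reliable vendor as the weaker components fail.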
Appendix B
Prediction Based on the Application of the BAZ Equation
The BAZ model [20,21] enables a simple, easy-to-use and physically meaningful solution to be obtained for the evaluation of the probability of failure of a material or device after the given time in operation at the given temperature and under the given stress. Using this model, the probability of non-failure of the device subjected to an elevated temperature can be sought in the form:
$$P = \exp\left[-\gamma_R R^{*} t \exp\left(-\frac{U_0}{kT}\right)\right].$$
Here $U_0$ is the stress-free activation energy (material/device characteristic), $T$ is the absolute temperature, $k$ is Boltzmann’s constant, $t$ is time, $R^{*}$ is the critical value of the monitored characteristic of degradation (electrical resistance, leakage current, etc.), and $\gamma_R$ is the sensitivity factor. The FOAT aimed at the evaluation of this factor and the stress-free activation energy can be conducted using the following procedure. BIT should be conducted at two different elevated temperatures, $T_1$ and $T_2$. Since the activation energy should remain the same at both temperature levels, the sensitivity factor can be found from the following formula:
$$\gamma_R = \exp\left(\frac{T_2\ln n_2 - T_1\ln n_1}{T_2 - T_1}\right), \qquad n_{1,2} = -\frac{\ln P_{1,2}}{t_{1,2}R^{*}}, \tag{A6}$$
where $P_{1,2}$ are the probabilities of non-failure observed after the testing times $t_{1,2}$ at the temperatures $T_{1,2}$.
Let, for example, the following data be accumulated during a product launch or lot release: the BIT at the temperature $T_1$ was conducted for the time $t_1$, and a fraction $Q_1$ of the tested devices failed ($P_1 = 1 - Q_1$). When the test was conducted for the time $t_2$ with another group of the same devices at the temperature $T_2$, a fraction $Q_2$ of the tested devices failed ($P_2 = 1 - Q_2$). The observed failure modes were mechanical failures of the solder joints, and the failures corresponded to the increase in the electrical resistance to the level of $R^{*}$. Then the second formula in (A6) yields:
and the first formula in (A6) results in the following value of the sensitivity factor
The activation energy is therefore,
or (to make sure that there was no calculation error),
which makes physical sense. The SFR at failure are, for the above probabilities of non-failure,
in the first step of BIT, and is
in the second step.
In an approximate analysis, when the times and are short, one could tentatively evaluate the variance of the random SFR as
A more accurate prediction could be obtained using Table 3 data.
It is noteworthy that a similar approach could be applied with different failure modes, such as short or open circuits, leakage current, charge accumulation, etc.
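The two-temperature procedure of this appendix can be sketched as follows. The code assumes the BAZ form $P = \exp[-\gamma_R R^{*} t \exp(-U_0/kT)]$ and solves it for $\gamma_R$ and $U_0$ from two test results; all input numbers are hypothetical and are not the data of the numerical example above, and the function names are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant, eV/K


def baz_parameters(T1, t1, P1, T2, t2, P2, R_crit):
    """Sensitivity factor and activation energy (eV) from two burn-in tests."""
    if T1 == T2:
        raise ValueError("the two test temperatures must differ")
    ln_n1 = math.log(-math.log(P1) / (t1 * R_crit))
    ln_n2 = math.log(-math.log(P2) / (t2 * R_crit))
    # Equal activation energy at both temperatures fixes ln(gamma).
    ln_gamma = (T2 * ln_n2 - T1 * ln_n1) / (T2 - T1)
    U0_first = -BOLTZMANN_EV_PER_K * T1 * (ln_n1 - ln_gamma)
    U0_second = -BOLTZMANN_EV_PER_K * T2 * (ln_n2 - ln_gamma)  # consistency check
    return math.exp(ln_gamma), U0_first, U0_second


def baz_prob_non_failure(t, T, gamma, U0, R_crit):
    """Probability of non-failure after time t at absolute temperature T."""
    return math.exp(-gamma * R_crit * t * math.exp(-U0 / (BOLTZMANN_EV_PER_K * T)))


if __name__ == "__main__":
    # Hypothetical test data: temperatures in K, times in h, resistance in ohm.
    T1, t1, P1 = 333.0, 24.0, 0.99
    T2, t2, P2 = 358.0, 12.0, 0.95
    R_crit = 100.0
    gamma, U0_first, U0_second = baz_parameters(T1, t1, P1, T2, t2, P2, R_crit)
    print(f"gamma_R = {gamma:.4e} 1/(ohm h)")
    print(f"U0 = {U0_first:.4f} eV (first test), {U0_second:.4f} eV (second test)")
    print(f"P after 48 h at {T1:.0f} K = {baz_prob_non_failure(48.0, T1, gamma, U0_first, R_crit):.4f}")
```

Computing the activation energy from both tests mirrors the consistency check made in the text: the two values must coincide if no calculation error was made.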
References
- Matisoff, B. Handbook of Electronics Manufacturing Engineering, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1994.
- Kececioglu, D.; Sun, F.-B. Burn-In Testing: Its Quantification and Optimization; Prentice Hall: Upper Saddle River, NJ, USA, 1997.
- Ebeling, C. An Introduction to Reliability and Maintainability Engineering; McGraw-Hill: New York, NY, USA, 1997.
- Vollertsen, R.-P. Burn-In. In Proceedings of the 1999 IEEE International Integrated Reliability Workshop Final Report, Lake Tahoe, CA, USA, 18–21 October 1999; pp. 167–173.
- Quality and Reliability; ASIC Products Application Note; Revision 3. SA 14-2280-03; IBM Microelectronics Division: Essex Junction, VT, USA, 1999; p. 15.
- Whipple, P.J.; Lanoux, J.A.; Richard, J.W.; Vu, V.V.; Motorola, Inc. Massive Parallel Semiconductor Manufacturing Test Process. U.S. Patent 6,433,568, 13 August 2002.
- Burn-In. MIL-STD-883F: Test Method Standard, Microchips. Method 1015.9; US DoD: Washington, DC, USA, 2004.
- Noel, M.; Dobbin, A.; van Overloop, D. Reducing the Cost of Test in Burn-in—An Integrated Approach. In Proceedings of the Burn-in and Test, Socket Workshop, Mesa, AZ, USA, 3–6 March 2004; pp. 34–59.
- Ooi, M.P.-L.; Abu Kassim, Z.; Demidenko, S. Shortening Burn-In Test: Application of HVST Weibull Statistical Analysis. IEEE Trans. Instrum. Meas. 2007, 56, 990–999.
- Ng, Y.-H.; Low, Y.-H.; Demidenko, S.N. Improving Efficiency of IC Burn-In Testing. In Proceedings of the IEEE Instrumentation and Measurement Technology Conference, Victoria, BC, Canada, 12–15 May 2008.
- Duane, J.T.; Collins, D.H.; Jason, K.; Freels, J.K.; Huzurbazar, A.V.; Warr, R.L.; Brian, P.; Weaver, B.P. Accelerated Test Methods for Reliability Prediction. IEEE Trans. Aerosp. 1964, 2, 563.
- Nelson, W.B. Accelerated Testing: Statistical Models, Test Plans, and Data Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1990.
- Crowe, D.; Feinberg, A. Design for Reliability (Electronics Handbook Series), 1st ed.; CRC Press: Boca Raton, FL, USA, 2001.
- Suhir, E.; Ghaffarian, R. Electron Device Subjected to Temperature Cycling: Predicted Time-to-Failure. J. Electron. Mater. 2019, 48, 778–779.
- Suhir, E. HALT, FOAT and Their Role in Making a Viable Device into a Reliable Product. In Proceedings of the IEEE-AIAA Aerospace Conference, Big Sky, MT, USA, 1–8 March 2014.
- Suhir, E. Failure-Oriented-Accelerated-Testing (FOAT) and Its Role in Making a Viable IC Package into a Reliable Product. Circuits Assembly, July 2013.
- Suhir, E. Probabilistic Design for Reliability. Chip Scale Rev. 2010, 14, 6.
- Suhir, E. Remaining Useful Lifetime (RUL): Probabilistic Predictive Model. Int. J. Progn. Health Monit. (PHM) 2011, 2, 140.
- Zhurkov, S.N. Kinetic Concept of the Strength of Solids. Int. J. Fract. Mech. 1965, 1, 311–323.
- Suhir, E.; Kang, S. Boltzmann-Arrhenius-Zhurkov (BAZ) Model in Physics-of-Materials Problems. Mod. Phys. Lett. B (MPLB) 2013, 27, 1330009.
- Suhir, E.; Bensoussan, A. Application of Multi-Parametric BAZ Model in Aerospace Optoelectronics. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 1–8 March 2014.
- Arrhenius, S.A. Über die Dissociationswärme und den Einfluß der Temperatur auf den Dissociationsgrad der Elektrolyte. Z. Phys. Chem. 1889, 4, 96–116. (In German)
- Katz, A.; Pecht, M.; Suhir, E. Accelerated Testing in Microelectronics: Review, Pitfalls and New Developments. In Proceedings of the International Symposium on Microelectronics and Packaging, IMAPS, Tel-Aviv, Israel, 15 June 2000.
- Suhir, E. Statistics- and Reliability-Physics-Related Failure Processes. Mod. Phys. Lett. B (MPLB) 2014, 28, 1450105.
- Suhir, E.; Bensoussan, A. Degradation Related Failure Rate Determined from the Experimental Bathtub Curve. In Proceedings of the SAE Conference, Seattle, WA, USA, 22–24 September 2015.
- Gradshteyn, I.S.; Ryzhik, I.M. Tables of Integrals, Series, and Products; Academic Press: San Diego, CA, USA, 1980.
© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).