Case Study of Expected Loss Failure Mode and Effect Analysis Model Based on Maintenance Data

Abstract: Failure mode and effect analysis (FMEA) is one of the most widely employed pre-evaluation techniques for avoiding risks during the product design and manufacturing phases. The risk priority number (RPN), a risk assessment indicator used in FMEA, is widely used in the field because of its simple calculation process, but its limitations as an absolute risk assessment indicator have been pointed out. The FMEA procedures have also been criticized for being unstructured and unsystematic. This work proposes an expected loss-FMEA (EL-FMEA) model that organizes the FMEA procedures and structures quantitative risk assessment metrics. In the EL-FMEA model, collectible maintenance record data are defined, and from these data the failure rates of components and systems and the downtime and uptime of the system are calculated. Based on these calculated values, the expected economic loss is then computed, taking the failure detection time into account. The model also provides an alternative coefficient for evaluating whether a detection system should be installed to reduce the expected loss of failure. Finally, a case study based on maintenance record data presents the application procedure of the EL-FMEA model in detail, and its results verify the practicality of the model.


Introduction
FMEA is an efficient qualitative analysis method adopted in safety system engineering. With it, failure modes, failure effects, potential accidents, and the consequences of accidents can be systematically identified and evaluated [1]. The results of the FMEA help analysts identify and correct the failure modes that have a detrimental effect on the system and improve its performance during the design and production stages [2]. Since its introduction as a methodology for preventing failures, FMEA has been broadly used in various industries, including the aerospace, automotive, semiconductor, aircraft, chemical plant, and steel industries [1,3-8].
Procedures for traditional FMEA are largely divided into system definition and FMEA worksheet creation. System definition, as an introductory stage, identifies functions and operating modes. Worksheet creation in turn consists of failure mode analysis and risk assessment. Failure mode analysis either enumerates failure modes for individual hardware or catalogs functional failures for products and components [9]. Next, the risk associated with a failure mode is usually evaluated using the risk priority number (RPN), which is the product of the occurrence (O), severity (S), and detection (D) ratings of a failure cause (i.e., RPN = O × S × D). Each of the three risk factors is rated on a 1-to-10 scale. The higher the RPN of a failure, the greater the associated risk to system/product reliability [10].
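The RPN calculation described above can be sketched in a few lines. This is only an illustrative sketch: the class name, the cause names, and all ratings below are hypothetical and are not taken from the case study.

```python
from dataclasses import dataclass


@dataclass
class FailureCause:
    """One FMEA worksheet row (hypothetical structure for illustration)."""
    name: str
    occurrence: int  # O, rated on the 1-to-10 scale
    severity: int    # S, rated on the 1-to-10 scale
    detection: int   # D, rated on the 1-to-10 scale

    def rpn(self) -> int:
        """Return RPN = O x S x D after checking the rating scale."""
        for rating in (self.occurrence, self.severity, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("ratings must lie on the 1-to-10 scale")
        return self.occurrence * self.severity * self.detection


# Hypothetical ratings, chosen only to show the ranking step.
causes = [
    FailureCause("bearing wear", 4, 7, 3),
    FailureCause("seal leak", 6, 5, 2),
    FailureCause("loose connector", 2, 9, 8),
]

# Rank causes from highest to lowest RPN.
ranked = sorted(causes, key=lambda c: c.rpn(), reverse=True)
for cause in ranked:
    print(cause.name, cause.rpn())
```

Note that the resulting numbers only order the causes relative to one another; they carry no unit, which is the limitation the expected-loss approach is meant to address.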
However, this traditional FMEA risk-evaluation approach has been extensively criticized in the existing literature for various reasons [2]. The limitations concern both the RPN-based approach and the worksheet creation procedures. First, the limitations of the RPN-based approach include: (i) no consideration of the relative importance among O, S, and D [11-14]; (ii) difficulty in obtaining exact ratings for O, S, and D because of the uncertainty and vagueness of FMEA group members' judgements [15-22]; (iii) no reflection of the interdependence between different causes of failure and their consequent effects [6,23-25]; and (iv) excessive reliance on expert intuition and experience instead of scientific methods for assessing the three risk factors [26,27]. Next, the limitations in terms of FMEA procedures include: (i) FMEA is an inductive and non-structured approach to identifying failure modes [28]; (ii) FMEA worksheets are only practical when a variety of experts and team members who are knowledgeable about the system are available [28]; and (iii) identification of failure modes, causes, and effects requires a lot of time and effort by experts.
Many researchers have attempted to overcome the drawbacks mentioned above, thereby improving the FMEA risk-evaluation method in the process. Summarizing many studies that focused on improving the risk assessment method, Liu et al. reported that the most popular FMEA approach is the fuzzy rule-based system [29-32], followed by grey theory [13,14], cost-based models [16-18,33], AHP/ANP [24,34,35], and linear programming. Others included an integration-based approach [21,22,26,27] and probability-based methodology [19,36]. Apart from these approaches, in our previous studies [10,36], we proposed an economic expected loss model that considers the time between the occurrence of the cause of failure and its detection, targeting a step-tier system and a multi-tier system. The proposed model is a time-dependent FMEA model, which has the advantage of being able to intuitively assess the size of risk, unlike the RPN, the traditional FMEA risk assessment index. Besides, Kwon et al. [37] suggested an expected loss model in which failure was dependent on time and the system was periodically monitored to prevent failures during its mission period. The loss due to each failure mode was assumed to depend on the remaining mission period of the system.
Next, among studies that have focused on improving the FMEA procedure, Peeters et al. proposed a systematic structure that can identify the relationship between failure modes by combining fault tree analysis (FTA) and FMEA, but it has the limitation of consuming a lot of time [28]. Ahn et al. proposed a procedural model for process FMEA based on the process function and equipment model (PFEM), which represents the mapping between process functions and related equipment. The PFEM-based process FMEA identifies failure modes, effects, and causes systematically by combining the voice of customer (VOC) with the defect history [38]. Kondrateva et al. proposed an algorithm for assessing the risk of accidents at energy facilities based on the FMEA, event tree analysis (ETA), and human reliability assessment (HRA) methods, which is intended to increase the effectiveness of monitoring the organization of repair work on electrical equipment and its technical condition [39]. In addition, some studies have applied FTA and ETA to FMEA to systematically identify failure modes, causes, and effects [40,41].
Various studies are being conducted on risk assessment methods and procedures to improve traditional FMEA. However, few previous studies, including our own, present results applicable to industry that consider both the FMEA procedure and the risk assessment method. Moreover, most previous studies presented risk assessment methods or procedures on an academic level, making it difficult for industrial field workers to apply the proposed methodology. Furthermore, although many studies have addressed the limitations of the RPN as a risk assessment indicator, few present readily applicable indicators that also allow risks to be assessed intuitively at industrial sites.
This study proposes an expected loss-based FMEA (EL-FMEA) model that presents both the FMEA procedure and risk assessment indicators. In the EL-FMEA model, maintenance record data that can be sufficiently collected in the industrial field are first defined. Based on the maintenance record data, the calculation of system downtime, uptime, and failure rates is then presented in detail. Next, the method of calculating the expected loss due to a cause failure is presented. Finally, a method for calculating an alternative coefficient is presented to determine, based on the expected loss, whether to install a cause failure detection system. The remainder of this paper is organized as follows. Section 2 defines the maintenance record data necessary for risk assessment and describes the calculation of the expected loss and the best alternative decision coefficient. In Section 3, a case study is performed based on maintenance record data processed for the experiment. Section 4 discusses the results of the case study and proposes ways to use them.

Data Definition and Modeling Procedure
This section explains in detail the variables and calculation processes with which risk evaluation can be performed quantitatively based on the EL-FMEA model. The indicators used in the EL-FMEA model are the expected loss due to failure and the best alternative decision coefficient. The alternative here means installing a detection system (e.g., a sensor) to prevent the recurrence of a failure. The procedures for the EL-FMEA model are shown in Table 1. Prerequisite data for the EL-FMEA model are defined as shown in Part (a) of Table 1 to calculate various indicators for risk assessment. The prerequisite data are the variables necessary for calculating the expected loss, defined on the basis of maintenance record data that can be easily collected in the industrial field. Based on the prerequisite data, we present the process of formulating intermediate variables step by step, as shown in Part (b) of Table 1. Based on the variables in this part, the expected loss and the coefficient of determination for the best alternative can be calculated as shown in Part (c) of Table 1.

Table 1. Procedures for the EL-FMEA model.

(a) Prerequisite data
Variable | Description
T | Total time
τ | Regular maintenance period
τ_{i,1} | System failure time of the i-th system failure
τ_{i,2} | Re-operating time of the i-th system failure
n_C | Number of parts to repair a cause failure
n_peop | Number of repairing people at each system failure
n_prod | Number of productions per unit time
π_C | Price of a part required to be repaired at each cause failure
π_peop | Laboring cost per unit person per unit time
π_prod | Price of a unit product
π_sens | Price of a detection sensor
λ_sens * | Sensor failure rate
* If λ_sens = 1/MTBF_sens is unknown and MTBF_sens is expected to be longer than T, it can be set to 1/T conservatively.

(b) Intermediate variables
Variable | Description | Formulation
τ_D | System downtime |
τ_U | System uptime |
λ_C | Cause failure rate |
λ_S | System failure rate |
 | Maintenance cost of a cause failure |
 | Maintenance cost of a system failure | Observation value
θ | Loss per unit time due to system breakdown | θ = n_peop π_peop + n_prod π_prod

(c) Output values
Variable | Description | Formulation
L | Expected loss for k-causes system |
L_sens | Expected loss for k-causes system with detection sensors |
m | Coefficient of determination for the best alternative | Install detection sensor for m < c_1; do not install detection sensor for m > c_2; hold off otherwise (c_1 = 0.8 and c_2 = 1.2, for instance)
First of all, the primary data required to perform FMEA consist of failure modes, causes, and failure effects, which can be organized as shown in Table 2. Based on FMEA's primary data, the prerequisite data are organized as shown in Part (a) of Table 1. The total time (T) is the system operating time, calculated by multiplying the number of working days by 24 h per day. The regular maintenance period (τ) generally means the inspection cycle performed for preventive maintenance in the industrial field. The system failure time (τ_{i,1}) and re-operating time (τ_{i,2}) of the i-th system failure are defined for calculating system downtime and uptime. Moreover, the number of parts to repair a cause failure (n_C), number of repairing people at each system failure (n_peop), number of productions per unit time (n_prod), price of a part required to be repaired at each cause failure (π_C), laboring cost per unit person per unit time (π_peop), price of a unit product (π_prod), price of a detection sensor (π_sens), and sensor failure rate (λ_sens) are defined. Here, the price of the detection sensor means the installation price of a sensor for detecting a cause failure.
Second, based on the prerequisite data defined in Part (a) of Table 1, intermediate variables for the EL-FMEA model are calculated according to the procedure in Part (b) of Table 1. By performing the procedure in Part (b), as shown in Part (c) of Table 1, two types of expected loss and the alternative coefficient can be calculated. The detailed calculation procedure is described in the next Section 2.2.
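As a rough illustration of how the prerequisite data feed the intermediate variables, the sketch below computes system downtime, uptime, and the loss per unit time θ. Only the formula θ = n_peop π_peop + n_prod π_prod is stated in Table 1; treating downtime as the sum of (τ_{i,2} − τ_{i,1}) over all system failures and uptime as T minus downtime is an assumption made here, and every numerical value is hypothetical.

```python
def system_downtime(failures):
    """Total downtime in hours.

    `failures` is a list of (tau_i1, tau_i2) pairs: the failure time and the
    re-operating time of the i-th system failure. Summing their differences
    is an assumption of this sketch, not a formula quoted from Table 1.
    """
    return sum(restart - fail for fail, restart in failures)


def loss_per_unit_time(n_peop, pi_peop, n_prod, pi_prod):
    """theta = n_peop * pi_peop + n_prod * pi_prod, as given in Table 1."""
    return n_peop * pi_peop + n_prod * pi_prod


# Hypothetical records: 30 working days, two system failures (times in hours).
T = 30 * 24
failures = [(100.0, 104.0), (350.0, 352.5)]

tau_D = system_downtime(failures)  # assumed downtime definition
tau_U = T - tau_D                  # assumed uptime definition
theta = loss_per_unit_time(n_peop=2, pi_peop=30.0, n_prod=10, pi_prod=5.0)

print(tau_D, tau_U, theta)
```

The same calculation maps directly onto spreadsheet columns, which is in keeping with the aim of making the model usable with common data processing tools.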

Risk Evaluation Modeling
In this study, the time to cause failure and the time to system failure were each assumed to be exponentially distributed with their respective failure rates. We calculated the expected loss for two cases: a system without detection sensors and a system with them. First, we considered a system without a detection sensor, in which the failure causes are checked only at a constant maintenance period. Faulty components are replaced or repaired when an abnormality of a cause is detected during regular maintenance. However, there are cases where a cause failure occurs and leads to system failure. In this case, faulty components are replaced or repaired immediately, outside the regular maintenance period. Second, we considered a system with detection sensors. In this case, regular maintenance is unnecessary. We assume that no system failure occurs because an alarm is raised when a cause failure occurs, so a preemptive response is possible.
We first calculated the expected loss when there is no detection sensor. When the regular maintenance period increases, the cost of cause failure decreases, but system failure increases. The expected loss was therefore calculated from both regular and corrective maintenance. The criteria for determining whether to set up the detection sensor are presented by comparing the cost of the detection sensor with the expected loss.
As stated above, we assume that cause failures and system failures occur randomly, following exponential distributions. The causes are checked at every maintenance cycle, and abnormal parts are exchanged or repaired so that all causes are renewed after each maintenance. The probability that a cause failure occurs and is detected at the next regular maintenance equals the cumulative probability of an exponential distribution over the regular maintenance cycle. Empirically, this corresponds to the number of cause failures discovered relative to the total number of tests. The summary of cause failures is as follows:

Similarly, the probability that a system failure occurs and is detected during a system uptime equals the cumulative probability of an exponential distribution. Empirically, this corresponds to the number of system failures discovered relative to the total number of tests. The summary of system failures is as follows:

We calculated the loss both for the case where the detection sensor is installed on the system and for the case where it is not. First, the loss function of the case without the detection sensor is as follows.

• Expected loss without detection sensor of a 1-cause system:
• Expected loss without detection sensor of a k-causes system:

Second, the loss function of the case with the detection sensor is as follows. With a sensor, a cause failure raises an alert before it can develop into a system failure, so events that would otherwise have become system failures are caught as cause failures. Thus, it is reasonable to say that the number of cause failures when the sensor is mounted equals the sum of the numbers of cause failures and system failures when the sensor is not installed. Therefore:

• Detection sensor with a 1-cause system: no system failure occurs
• Detection sensor with a k-causes system: no system failure occurs

If L_sens is significantly lower than L, system administrators should consider mounting detection sensors to reduce the cost due to system failure.
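The exponential assumption used in this section can be illustrated numerically. The sketch below evaluates the cumulative probability 1 − exp(−λτ) over one maintenance cycle and inverts it to estimate a failure rate from the fraction of inspections at which a failure was found. The inversion is a standard consequence of the exponential model and is offered here as an assumption about how such a rate could be estimated, not as the exact estimator of the EL-FMEA model; the function names and the counts are hypothetical, while the 360 h interval corresponds to the 15-day maintenance period of the case study.

```python
import math


def prob_failure_within(rate, interval):
    """Exponential CDF: probability of at least one failure within `interval`."""
    return 1.0 - math.exp(-rate * interval)


def rate_from_counts(n_found, n_checks, interval):
    """Failure rate implied by finding failures at n_found of n_checks inspections.

    Solves 1 - exp(-rate * interval) = n_found / n_checks for the rate.
    """
    fraction = n_found / n_checks
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction of detections must lie in [0, 1)")
    return -math.log(1.0 - fraction) / interval


# Hypothetical counts: a cause failure found at 3 of 24 regular inspections,
# with a 15-day (360 h) maintenance period.
tau = 15 * 24
lambda_c = rate_from_counts(n_found=3, n_checks=24, interval=tau)
print(lambda_c, prob_failure_within(lambda_c, tau))
```

Plugging the estimated rate back into the cumulative probability recovers the observed fraction, which is a quick consistency check on the records.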

Example Case
This paper aims to perform a quantified risk assessment by presenting an analysis procedure (Table 1) for calculating the expected loss of a failure based on maintenance record data obtained in the industrial field. As preliminary work for field application, realistic maintenance records were generated and analyzed based on the EL-FMEA model (see Appendix A). When generating the maintenance records, the regular maintenance period was set to 15 days, and three types of maintenance were considered: (i) regular maintenance (RM); (ii) corrective maintenance (CM); and (iii) sensor maintenance (SM). The maintenance record data (Appendix A) are modeled on the assumption of a general manufacturing facility and consist of 12 failure modes, 12 failure causes, and eight items. Items 1 to 4 are equipped with sensors, so only SM is performed. On the other hand, Items 5 to 8 are not equipped with sensors, so RM and CM are performed. Therefore, in this case study, based on the EL-FMEA model, the expected loss with a detection sensor (L_sens) is estimated for Items 1 to 4, and the expected loss without a detection sensor (L) is estimated for Items 5 to 8. Finally, for Items 5 to 8, the expected loss that would result with a sensor (L_sens) is estimated before any sensor is installed, and the coefficient of determination for the best alternative (m) is calculated to determine whether to install the detection sensor.

Results
Based on the maintenance record in Appendix A, the intermediate variables in Part (b) of Table 1 were calculated as shown in Table 3. τ_D, τ_U, λ_C, and λ_S were calculated for Items 5 to 8, which have no detection sensor, whereas λ_sens was calculated for the case where a detection sensor is installed. Based on the results in Table 3, the EL-FMEA results are summarized in Table 4. First, we calculated the expected loss for Items 1 to 4, which have detection sensors. For an item with a detection sensor, the cause failure is detected before the system fails, so the loss due to system failure is not included, but the cost of the detection sensor is considered. Among the items with detection sensors, Item 1, which has the highest component price and the highest occurrence frequency, showed the highest expected loss.
Next, for Items 5 to 8, both RM and CM were performed, and each failure cause is different. The expected loss for RM is lower than that for CM, even for the same item, because under RM the cause failure is found before system failure. The expected loss of Items 5, 6, and 7 is relatively higher than that of Item 8 because these items cause system failure and require frequent repairs during the regular maintenance period. Also, viewed across all items, the expected loss of Items 1 to 3 was high despite the presence of a detection sensor. In this case, it will be necessary to consider adjusting the sensor's sensitivity or replacing it with highly reliable parts to lower the expected loss.
Finally, the coefficient of determination for the best alternative (m) was derived based on the expected loss of Items 5 to 8, which have no detection sensor. First, the total expected loss of Items 5 to 8 is summarized in Table 5. For example, the expected loss for Item 5 is $638, which is the sum of the RM cost ($120) and the CM cost ($518). Then, L_sens was calculated for the case where a detection sensor is installed to lower the expected loss (L), and m was derived from it. The threshold for m can be set from the user's perspective; here, for example, it is assumed that a detection sensor is installed when m is less than 0.8. For Items 5 to 7, m is less than 0.7 and thus below this threshold, so a detection sensor is to be installed. On the other hand, the decision for Item 8 is withheld.
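The decision rule with thresholds c_1 and c_2 can be sketched as follows. The rule itself (install for m < c_1, do not install for m > c_2, hold off otherwise, with c_1 = 0.8 and c_2 = 1.2) comes from Table 1. Taking m as the ratio L_sens / L is an assumption of this sketch, consistent with the remark that a decision can be based on the ratio of the two expected losses; the L_sens value used for illustration is hypothetical.

```python
def alternative_coefficient(loss_with_sensor, loss_without_sensor):
    """Assumed definition: m = L_sens / L (ratio of the two expected losses)."""
    if loss_without_sensor <= 0:
        raise ValueError("expected loss without sensor must be positive")
    return loss_with_sensor / loss_without_sensor


def sensor_decision(m, c1=0.8, c2=1.2):
    """Decision rule from Table 1, with the example thresholds as defaults."""
    if m < c1:
        return "install"
    if m > c2:
        return "do not install"
    return "hold off"


# Expected loss of Item 5 as reported: RM cost plus CM cost.
loss_item5 = 120 + 518  # 638

# Hypothetical L_sens, used only to exercise the rule.
m_example = alternative_coefficient(400.0, loss_item5)
print(round(m_example, 3), sensor_decision(m_example))

# Coefficient reported for Item 8 in the discussion.
print(sensor_decision(1.147))
```

Keeping c_1 and c_2 as parameters lets each site choose how wide the "hold off" band should be.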

Discussion
The conventional FMEA model classifies the risk of a cause failure into occurrence (O), severity (S), and detection (D), quantifies them on 1-10 point scales, and then derives a risk priority number (RPN) by multiplying them. The RPN is used to rank risk among several causes, but it does not express the amount of risk. In contrast, the expected loss calculated in our EL-FMEA model represents the actual cost, which is what the system operator is most interested in. Therefore, the operator can check both the magnitude and the priority of the risk of a cause from the expected loss. In our model, occurrence (O) corresponds to the probability of a cause failure, detection (D) to the probability of a cause detection, and severity (S) to the cost of the cause.
We created dummy data of the kind that can be collected on-site to verify the model's usefulness (see Tables A1 and A2). The system we envision consists of Items 1 to 8. Items 1 to 4 are equipped with sensors, so only sensor maintenance (SM) occurs. Items 5 to 8, however, are not equipped with sensors, so both regular maintenance (RM) and corrective maintenance (CM) occur. From the maintenance table, we could derive the cause failure rates of Items 5 to 8 and the system failure rates, uptime, and downtime caused by Items 5 to 8. From the detection and failure rates obtained for Items 5 to 8, we derived alternative coefficients to determine whether sensors should be installed on these items. In our example, the alternative coefficients of Items 5 to 7 are smaller than 0.8, and the coefficient of Item 8 is 1.147. Therefore, it is considered economical to install sensors on Items 5 to 7, whereas installation on Item 8 is withheld.
Collecting maintenance record data in industrial fields has recently become very convenient due to various innovative technologies. The EL-FMEA model is applicable to the risk management of systems and processes in both manufacturing and service industries. First, in the manufacturing industry, maintenance records are generally collected for quality control, but it is often not guaranteed that indicators such as failure rate and expected loss (probability) can be calculated from the collected data. Therefore, this study presents, through the EL-FMEA model, a data collection method and procedure that support quantitative analysis. In the service industry as well, service failures, customer complaints, and complaint handling times are known to be collected to manage the service process. Similarly, if the EL-FMEA model is applied, it will be possible to calculate the service failure rate, expected loss, and alternative decision coefficient.
We believe that applying the EL-FMEA model to the industrial field has the following advantages: (i) The EL-FMEA model can be fully implemented with a commonly used data processing tool (e.g., MS Excel); (ii) The risk due to failure can be intuitively evaluated through the expected loss (economic aspect). That is, an acceptable cost of loss can be set to suit the situation; (iii) The expected benefit can be calculated before the risk of failure is eliminated. In other words, a decision can be made based on the difference or ratio between the existing expected loss and the expected loss after the detection sensor is installed; (iv) If the data defined to perform the EL-FMEA are continuously collected, the failure rate and operation rate of the system and items can be obtained, and the reliability of the system and items can also be calculated; (v) If the EL-FMEA model is established in the company system, it can be used to verify quality improvement goals and achievements. In other words, it will be possible to establish failure rate and expected loss reduction goals and check whether the goals are achieved.
If the EL-FMEA model is continuously used in the industrial field, we think that probability-based multifaceted analysis can be performed. In the future, based on this study, we will propose an optimal maintenance period that minimizes the expected loss and try to present it in a way that can be easily applied in the industrial field. In addition, in the Republic of Korea, various punishments are applied under the Occupational Safety and Health Act when a human accident occurs in the industrial field. Recently, through an interview with field workers, we gathered that a risk assessment considering the size of such a loss is necessary. In the future, we intend to present a more advanced EL-FMEA model along with measures that can realistically reflect such a situation.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Based on the prerequisite data defined in Section 2.1 of this study, we arbitrarily modeled the maintenance record data of the case study (Section 3) as shown in Table A2. We modeled the maintenance record data on a general facility consisting of 12 failure modes, 12 failure causes, and eight items. Three types of maintenance were considered: (i) regular maintenance (RM); (ii) corrective maintenance (CM); and (iii) sensor maintenance (SM). Items 1 to 4 are equipped with sensors, so only sensor maintenance (SM) occurs. On the other hand, Items 5 to 8 are not equipped with sensors, so regular maintenance (RM) and corrective maintenance (CM) occur. Abbreviations and variables used in Table A2 are summarized in Table A1.