Maintenance Strategy Selection Based on FMEA/FMECA Approach Using Time Dependent Failure Probability

Maintenance strategies are one of the key aspects determining the availability of a production system and its maintenance cost. For complex machines with multiple components, each of which can have several failure modes, maintenance strategy selection based on a failure mode and effect analysis (FMEA) approach is particularly suitable. The combination of failure probability and failure effect of each component allows an evaluation of different maintenance actions. The aim of this publication is to introduce a new aspect of the approach, allowing failure probabilities of components to be time dependent. This in turn can lead to different ideal maintenance strategies over time, based on the expected wear of a component.


Introduction
Industrial maintenance is playing an increasingly important role in production systems. The interconnections between machines and systems are becoming more complex, partly due to increasing digitization and automation. Malfunctions and failures not only take more time to resolve but also have greater consequences for the production system [1]. To respond to these increasing demands, maintenance must make intelligent use of the available resources. One of the most important factors determining the use of resources and the resulting availability is the selection of the maintenance strategy for the production machines [2]. The literature offers many approaches to maintenance strategy selection [3], with reliability-centered methods being among the most prominent. For complex machines consisting of different components, each of which can have several failure causes, maintenance strategy selection based on failure mode and effect analysis (FMEA) is particularly suitable [4]. Combining the failure probability and failure effect of each component allows different maintenance actions to be evaluated.
The aim of this publication is to introduce a new approach that allows the failure probabilities of components to be time dependent. With this approach, the expected wear of the components and the failure probability of each wear state can be included in the analysis, leading to a time dependent probability for each possible failure. This in turn can lead to different ideal maintenance strategies over time, based on the expected failure probability. The extended FMEA approach is validated with maintenance employees in a real production environment; the results are reported in the Results section and discussed afterwards.

Materials and Methods
First, a short literature review is presented, focusing on the use of FMEA for maintenance strategy and action selection. Based on that, the concept of time dependent failure probability is introduced. An extension of the FMEA approach that integrates the time dependency of failure probability into the evaluation is then outlined.

Literature Review
There have been many publications on maintenance strategy selection, including literature reviews such as [1,3], both of which present reliability-centered approaches to maintenance activity selection as among the most commonly used. In their literature review focusing specifically on reliability-centered maintenance, Sajardj et al. describe failure mode and effect analysis as one of the major frameworks used in reliability-centered maintenance. The analysis of different failure modes and their possible effects is at the center of reliability approaches. Even if the specific method used by maintenance personnel is not called FMEA, most reliability-centered approaches can be traced back to the FMEA approach [4]. An extension of the FMEA that adds a criticality analysis is summarized by Scheu et al. [5] and by Lipol and Haq [6]. The so-called failure mode, effects and criticality analysis (FMECA) ranks the identified failure modes according to failure rate and severity, allowing the maintenance staff to counter the most disruptive failures first. In most cases, the FMEA or FMECA consists of a preparation phase, a structure analysis and a function analysis, followed by the failure analysis. In the failure analysis, the occurrence rate or probability of a failure, the severity of its effect and the detection rate are defined for each possible failure mode [7]. In some cases, the detection rate is not assessed for maintenance strategy decisions [8,9]. Occurrence, severity and detection are each represented by an ordinal value, and their product is the risk priority number of the failure mode.
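The risk priority number described above can be illustrated with a short Python sketch. The worksheet rows, their ratings and the 1 to 10 scale are hypothetical. The `include_detection` switch is an assumption added to cover variants that skip the detection rating [8,9]. Ranking by the product of occurrence and severity only loosely mirrors the FMECA ranking by failure rate and severity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    """One row of a classic FMEA worksheet (example ratings are hypothetical)."""

    name: str
    occurrence: int  # ordinal rating, assumed scale 1 (rare) to 10 (very frequent)
    severity: int    # ordinal rating, assumed scale 1 (negligible) to 10 (critical)
    detection: int   # ordinal rating, assumed scale 1 (always detected) to 10 (undetectable)

    def rpn(self, include_detection: bool = True) -> int:
        """Risk priority number: product of the ordinal ratings.

        include_detection=False drops the detection rating for variants that do not assess it.
        """
        value = self.occurrence * self.severity
        return value * self.detection if include_detection else value


def rank_by_rpn(modes: List[FailureMode], include_detection: bool = True) -> List[FailureMode]:
    """Order failure modes from the highest to the lowest risk priority number."""
    return sorted(modes, key=lambda m: m.rpn(include_detection), reverse=True)


if __name__ == "__main__":
    worksheet = [
        FailureMode("bearing wear", occurrence=6, severity=8, detection=4),
        FailureMode("seal leakage", occurrence=3, severity=5, detection=2),
        FailureMode("sensor drift", occurrence=5, severity=4, detection=7),
    ]
    for mode in rank_by_rpn(worksheet):
        print(f"{mode.name}: RPN = {mode.rpn()}")
```

Because the ratings are ordinal, the product is only a ranking aid. An RPN of 192 is not "twice as risky" as one of 96.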
In their literature review on FMEA implementations, Sharma and Srivastava summarize different advances of the method [10]. Of the 67 publications reviewed, only one addresses the possibility that failure probability can be time dependent. In his study on the reliability of a hydraulic excavator, Majumdar does not use the time dependency of the failure probability within an FMEA; instead, he uses the FMEA to analyze the failure possibilities of the excavator [11]. In another state-of-the-art review of FMEA/FMECA from 2017, Spreafico et al. summarize suggestions for addressing common criticisms of the method [12]. Regarding failure probability in FMEA, two relevant proposals are listed. The first is the use of statistical and logical methods to quantify the probability of failure by Xu et al. [13], who discuss fuzzy approaches to determine a failure probability. The second is the use of historical data to quantify potential failures by García and Gilabert [14]. Both proposals aim to improve the computation of the fixed failure probability value and do not include time dependency. During the literature review, one additional improvement suggestion for FMEA was found that is not included in the reviews by Spreafico et al. or Sharma and Srivastava: Banghart et al. make suggestions on how to deal with subjectivity in the severity classification [15].
To summarize the literature review, failure mode and effect analysis, together with its advances and improvements, plays an important role in maintenance strategy and activity selection, especially in reliability-centered approaches. All of the FMEA approaches and extensions reviewed use a single fixed value for the failure probability, and none of them introduces time dependency into an FMEA.

Failure and Cumulative Failure Probability
To avoid relying solely on a fixed failure probability for the maintenance strategy decision for a machine or component, and to consider the time dependency of failure, the possible types of dependency need to be identified. In its reliability-centered maintenance guide, NASA summarizes six basic types of failure probability curves identified by [16], which show the probability that an item will fail during each time interval [9]. The six curves, represented by dashed lines, are shown in Figure 1. In this summary, only failures with technical causes were considered. Maintenance-relevant machine failures can also be caused by incorrect operation, material defects or other reasons, which can lead to individual failure probability curves beyond the six types [17]. In addition to the failure probability for each time interval, the cumulative failure probabilities for the six types are shown as green lines. For each failure probability curve, the green line represents the probability that an item will have failed before or during the time interval. The cumulative failure probability $\mathrm{CFP}_n$ up to time interval $n$ is calculated by multiplying the complementary failure probabilities $(1 - \mathrm{FP}_i)$ of all intervals up to $n$ and subtracting the product from one, as shown in Formula (1):
$$\mathrm{CFP}_n = 1 - \prod_{i=1}^{n} \left(1 - \mathrm{FP}_i\right) \qquad (1)$$
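As a minimal sketch, Formula (1) can be computed for every interval with a running product of the complementary probabilities $(1 - \mathrm{FP}_i)$. The function name and the example curves are illustrative assumptions. The product form implicitly treats each $\mathrm{FP}_i$ as the probability of failing in interval $i$, given that the item survived all earlier intervals.

```python
from itertools import accumulate
from operator import mul
from typing import Iterable, List


def cumulative_failure_probability(fp: Iterable[float]) -> List[float]:
    """Formula (1): CFP_n = 1 - prod_{i=1..n} (1 - FP_i), for n = 1..N.

    fp holds the failure probability FP_i of each time interval i (values in [0, 1]).
    """
    fp = list(fp)
    if any(not 0.0 <= p <= 1.0 for p in fp):
        raise ValueError("failure probabilities must lie in [0, 1]")
    # Running product of the complements = probability of surviving up to interval n.
    survival = accumulate((1.0 - p for p in fp), mul)
    return [1.0 - s for s in survival]


if __name__ == "__main__":
    # Constant failure probability of 10 % per interval (similar to curve type (a)).
    print(cumulative_failure_probability([0.1, 0.1, 0.1]))
    # Rising failure probability (similar in spirit to curve type (b)).
    print(cumulative_failure_probability([0.02, 0.04, 0.06, 0.08]))
```

For the constant 10 % curve, the cumulative values are approximately 0.1, 0.19 and 0.271. Even a constant failure probability therefore produces a steadily rising cumulative curve.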

Using Cumulative Failure Probability for Maintenance Strategy Decision
In a classic FMEA, the occurrence rate or probability of each failure mode is determined and ranked from low to high. This fixed value is then entered into the calculation as the occurrence rating of that failure mode [7]. This method neglects the time dependency of failure occurrence illustrated in Figure 1. To make use of the time dependency of the failure probability, the FMEA method needs to be extended so that it can cope with a variable occurrence rate. A time dependent cumulative failure probability leads to time dependent risk priority numbers for each failure mode and may therefore also lead to time dependent action requirements. A maintenance measure that would be economically worthwhile under a fixed occurrence rate might not be necessary in the first time periods, while the cumulative failure probability is still low.
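This extension can be stated formally under one assumption: severity $S$ and detection $D$ stay constant over the time frame. The single occurrence rating $O$ of the classic risk priority number is then replaced by one rating $O_n$ per time period $n$ of a time frame with $N$ periods. This notation is introduced here for illustration only.
$$\mathrm{RPN} = O \cdot S \cdot D \quad\longrightarrow\quad \mathrm{RPN}_n = O_n \cdot S \cdot D, \qquad n = 1, \dots, N$$
Here $O_n$ is derived from $\mathrm{CFP}_n$ in one of two ways: the cumulative value is either divided into ranges like a classic occurrence rating or used directly. Since $\mathrm{CFP}_n$ never decreases with $n$, $\mathrm{RPN}_n$ is non-decreasing as long as $S$ and $D$ are constant.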
In general, the new time dependent FMEA can be performed just like a classic FMEA. In the preparation phase, a time frame needs to be established in addition to the classic tasks. This time frame defines how far into the future the failure modes will be analyzed. The subsequent structure and function analyses are not affected by the inclusion of time dependency. The failure analysis phase needs to be expanded to allow for varying failure rates. At the beginning of that phase, the failure modes still need to be identified, for example by using a fault tree analysis [18]. Each failure mode then needs to be investigated for time dependency. For component wear, the different types of failure probability curves shown in Figure 1 can be used and enriched with company data to best fit the individual failure rates. However, time dependency is not limited to machine or component wear. It can also apply to other kinds of failure, for example higher rates of incorrect machine operation during the summer due to inexperienced vacation replacements. Depending on the chosen time frame, even different failure probabilities for day and night shifts may be included in the calculation. Once the failure probabilities are known for each time period of each time dependent failure mode, the cumulative failure probability is calculated using Formula (1). This information can be used in one of two ways. The cumulative failure probability can be divided into ranges and used like the classic occurrence rate, or its actual percentage value can be used directly in the subsequent calculation. The evaluation of the risk priority curve must take into account which of these options was chosen. The severity and detection rate might also vary over time, which is discussed later in this paper. The remaining steps of the risk analysis can be performed in the same way as in the classic FMEA approach. To evaluate optimization or maintenance measures, the former risk priority number is now represented by a risk priority curve, with the chosen time frame on the x-axis and the risk priority number on the y-axis. Figure 2 shows an example of a classic FMEA approach (Figure 2a) and the corresponding time dependent approach. The failure probability and the cumulative failure probability for a component with an overall average occurrence rate of six are shown in Figure 2c. Figure 2b shows the time dependent risk priority number for the range approach and Figure 2d shows it for the actual value approach.
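The two ways of using the cumulative failure probability, ranges versus actual values, can be sketched as follows. The paper does not specify the band limits that map CFP values onto occurrence ratings 1 to 10, so those limits are hypothetical, as are the function names and example values. The actual value variant uses the CFP as a fraction between 0 and 1. The text speaks of a percent value, which would only rescale the curve by a factor of 100.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical band limits: CFP < 0.1 -> occurrence 1, 0.1 <= CFP < 0.2 -> 2, ...,
# CFP >= 0.9 -> 10. The paper does not prescribe these limits.
BAND_LIMITS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def occurrence_from_cfp(cfp: float, limits: Sequence[float] = BAND_LIMITS) -> int:
    """Range approach: map a cumulative failure probability to an ordinal occurrence rating."""
    return bisect_right(limits, cfp) + 1


def rpn_curve_ranges(cfp_curve: Sequence[float], severity: int, detection: int) -> List[int]:
    """Risk priority curve using the ordinal occurrence rating of each period."""
    return [occurrence_from_cfp(c) * severity * detection for c in cfp_curve]


def rpn_curve_actual(cfp_curve: Sequence[float], severity: int, detection: int) -> List[float]:
    """Risk priority curve using the CFP value (0..1) itself in place of the occurrence rating."""
    return [c * severity * detection for c in cfp_curve]


if __name__ == "__main__":
    cfp_curve = [0.05, 0.25, 0.95]  # illustrative CFP values for three periods
    print(rpn_curve_ranges(cfp_curve, severity=8, detection=4))
    print(rpn_curve_actual(cfp_curve, severity=8, detection=4))
```

The two curves live on different scales. With these bands, the range curve reaches up to 10·S·D, while the actual value curve reaches up to S·D. Any action threshold therefore has to be chosen for the variant in use, which is why the evaluation of the risk priority curve depends on this choice.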
Using Figure 2, it is possible not only to determine whether a measure or maintenance action is feasible, but also to see at which point in time the cumulative failure probability becomes high enough to make the action economically sensible. In this way, the single risk priority number is transformed into a time dependent curve.
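Reading the decision point off the risk priority curve amounts to finding the first period in which the curve reaches an action threshold. The sketch below assumes such a threshold exists; the hypothetical parameter `threshold` could, for example, be derived from the cost of the measure. The paper does not define one, and the example curve is illustrative.

```python
from typing import Optional, Sequence


def first_action_period(rpn_curve: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first 1-based period in which the risk priority curve reaches `threshold`.

    Returns None if the curve stays below the threshold for the whole time frame.
    The threshold is a company-specific assumption, not a value from the paper.
    """
    for period, rpn in enumerate(rpn_curve, start=1):
        if rpn >= threshold:
            return period
    return None


if __name__ == "__main__":
    # Illustrative monthly risk priority curve (range approach), not data from the paper.
    curve = [32, 32, 64, 96, 160, 224, 320]
    print(first_action_period(curve, threshold=150))
    print(first_action_period(curve, threshold=400))
```

For a constant curve, as produced by a classic FMEA, the function returns either period 1 or None. This mirrors the all-or-nothing decision of a single risk priority number.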

Results
To validate the expanded FMEA approach for time dependent failure probability, it was used to determine the maintenance strategy and the necessary maintenance actions for a machine in a real production environment. The results are shown using the example of a turning wheel that moves the products inside the machine. A time frame of five years was chosen for the expanded FMEA, based on the experience of the maintenance workers, who expect the components of the machine to have failed after five years without maintenance action. According to the maintenance workers' experience, there are two possible failure causes for the turning wheel. Both lead to a machine shutdown and require the entire turning wheel module to be replaced. Either the bearing of the turning wheel wears, or the gearbox accumulates wear and goes out of synchronization. Because both failure causes are independent of each other and lead to the same maintenance measure, they could be grouped together. To compare the new expanded FMEA with a classic approach, both the static and the time dependent failure probabilities were determined in a workshop with the employees, using historical data and experience. The workshop results for the turning wheel are shown in Figure 3.
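When two independent failure causes lead to the same maintenance measure, their per-period probabilities can be combined into a single failure probability for the grouped entry by way of their complements. This combination rule is an assumption of this sketch, not a procedure stated in the paper. The function names and the illustrative probabilities 0.1 and 0.2 are hypothetical.

```python
from typing import List, Sequence


def grouped_failure_probability(cause_fps: Sequence[float]) -> float:
    """Per-period probability that at least one of several independent failure causes occurs.

    Assumes statistical independence of the causes, so the probability that none of
    them occurs is the product of the complements (1 - p).
    """
    none_occurs = 1.0
    for p in cause_fps:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"failure probability out of range: {p}")
        none_occurs *= 1.0 - p
    return 1.0 - none_occurs


def grouped_curve(*cause_curves: Sequence[float]) -> List[float]:
    """Combine per-period curves of independent causes (all of equal length) period by period."""
    return [grouped_failure_probability(period) for period in zip(*cause_curves)]


if __name__ == "__main__":
    print(grouped_failure_probability([0.1, 0.2]))
    print(grouped_curve([0.1, 0.1], [0.2, 0.3]))
```

For per-period probabilities of 0.1 and 0.2, the grouped probability is 1 − 0.9 · 0.8 = 0.28. This is slightly less than the sum of 0.3, because both causes can occur in the same period.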
The static annual failure probabilities of the bearing and the gearbox were determined to be 1/3 and 1/5, respectively. The time dependent failure probability of both components was chosen to represent a constant failure probability with a wear-out region, as in Figure 1c, with the bearings having a 50% failure probability per month once the wear-out region is reached. To make both approaches comparable, care was taken that the cumulative failure probability of each component was the same in both approaches at a reference point in time. These points are marked in the graphs in Figure 3. In this example, all components were assumed to be new and to start with their initial failure probability. The approach also allows the curves to be shifted, for example if a component was installed two years ago.
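The following sketch places the workshop's static annual probabilities (1/3 for the bearing, 1/5 for the gearbox) next to a type (c) curve on a monthly grid over the five-year time frame. Several inputs are assumptions for illustration: the conversion of the annual values into constant monthly equivalents, the 1 % base probability and the wear-out onset at month 36. Only the 50 % monthly probability in the wear-out region comes from the text. The sketch does not reproduce the calibration to equal cumulative failure probabilities at a reference point. The hypothetical `age_months` parameter shows how a curve can be shifted for a component that is not new.

```python
from typing import List, Sequence


def cfp(fp: Sequence[float]) -> List[float]:
    """Formula (1): cumulative failure probability after each period."""
    survival, out = 1.0, []
    for p in fp:
        survival *= 1.0 - p
        out.append(1.0 - survival)
    return out


def monthly_equivalent(annual_fp: float) -> float:
    """Constant monthly probability whose 12-month cumulative value equals annual_fp."""
    return 1.0 - (1.0 - annual_fp) ** (1.0 / 12.0)


def wear_out_curve(months: int, base_fp: float, wear_out_start: int,
                   wear_out_fp: float = 0.5, age_months: int = 0) -> List[float]:
    """Curve type (c): constant monthly FP, then wear_out_fp per month in the wear-out region.

    age_months shifts the curve for components that are not new (e.g. 24 for a
    component installed two years ago). base_fp and wear_out_start are hypothetical.
    """
    return [base_fp if age_months + m < wear_out_start else wear_out_fp
            for m in range(months)]


if __name__ == "__main__":
    horizon = 60  # five-year time frame in months
    bearing_static = cfp([monthly_equivalent(1 / 3)] * horizon)
    gearbox_static = cfp([monthly_equivalent(1 / 5)] * horizon)
    print(f"static CFP after 5 years: bearing {bearing_static[-1]:.3f}, "
          f"gearbox {gearbox_static[-1]:.3f}")
    # Hypothetical bearing curve: 1 % per month, wear-out region from month 36 on.
    bearing_dynamic = cfp(wear_out_curve(horizon, base_fp=0.01, wear_out_start=36))
    print(f"time dependent bearing CFP after 24 and 40 months: "
          f"{bearing_dynamic[23]:.3f}, {bearing_dynamic[39]:.3f}")
```

With the static values, the cumulative failure probability after five years is 1 − (2/3)^5 ≈ 0.87 for the bearing and 1 − (4/5)^5 ≈ 0.67 for the gearbox.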
Both versions of the FMEA were completed using these failure probabilities, and it was decided to postpone the purchase of a spare turning wheel by two and a half years (30 months). The classic approach recommended the investment in the first year. The results of the time dependent approach, however, convinced the maintenance employees to wait until the cumulative failure probability had risen significantly.

Discussion
The FMEA approach owes its success, among other things, to its simplicity. For each failure mode, only three questions have to be answered: how often does it occur, how severe is the failure, and to what extent can it be detected? Introducing time dependent failure probabilities might seem like making a simple concept more complex than it needs to be. However, the evaluation of the workshop with the maintenance employees showed that this was not the case. The concept was easily understood, and example failure probability curves made it easy for the employees to create failure-cause-specific curves from historical data and their own experience. The results were intuitive and immediately accepted. This could be because the concept of failure probability is very common in maintenance.

Conclusions
After a short literature overview of maintenance strategy and maintenance action selection using FMEA/FMECA, the concept of time dependent failure probability was introduced. The lack of time dependency in the FMEA concept was then shown, and an approach for introducing time dependent failure probability was presented. The approach was validated on a machine in a production line, and an example component was discussed.
Time dependency allows a more accurate description of the expected state of a machine. It also lets maintenance personnel base their decisions on differentiated timelines rather than on fixed values. The approach was easily understood by the employees in the test production environment, and the results were implemented.
Building on these results for the occurrence rate or failure probability, future work should explore how a time dependent failure severity, and possibly even a time dependent failure detection, could be introduced into the FMEA framework.

Figure 1. Failure probabilities (dashed lines) and cumulative failure probabilities (green lines) for the six component failure types based on NASA [9]: (a) constant failure probability, (b) linearly rising failure probability, (c) constant failure probability with a wear-out region, (d) failure probability rising to a constant level, (e) infant mortality of equipment followed by a constant failure probability, and (f) infant mortality followed by a constant and then rising failure probability.

Figure 2. (a) Classic FMEA approach to determine the risk priority number; (b) time dependent risk priority number using ranges; (c) time dependent failure probability and cumulative failure probability of the component; (d) time dependent risk priority number using actual values.

Figure 3. Using the failure probability (fp) and cumulative failure probability (cum-fp) of a turning wheel (tw) to determine necessary maintenance actions (a) for constant failure probabilities and (b) for time dependent failure probabilities.