Fleet Resilience: Evaluating Maintenance Strategies in Critical Equipment

Durán, Orlando; Aguilar, Javier; Capaldo, Andrea; Arata, Adolfo

doi:10.3390/app11010038

¹ Pontificia Universidad Católica de Valparaíso, Valparaíso 243000, Chile

² R-MES Analytics, Las Condes 7550000, Chile

* Author to whom correspondence should be addressed.

Appl. Sci. 2021, 11(1), 38; https://doi.org/10.3390/app11010038

Submission received: 3 December 2020 / Revised: 18 December 2020 / Accepted: 21 December 2020 / Published: 23 December 2020

(This article belongs to the Special Issue Condition Monitoring and Their Applications in Industry)

Abstract

Resilience is an intrinsic characteristic of systems: it expresses a system's capacity to react to disruptive events. A number of metrics have been proposed to represent system resilience; however, only one indicator relates the availability of the system to this characteristic. With such a metric, it is possible to relate the topological aspects of a system and the resources available to respond promptly to the loss of performance caused by unexpected events. This work proposes the adaptation and application of such a resilience index to assess the influence of different maintenance strategies and topologies on fleet resilience. In addition, an application study of an actual mining fleet is provided. A set of critical assets was identified and represented using reliability block diagrams. Monte Carlo simulation experiments were conducted, and the system availability data were extracted. Resilience indexes were then obtained to define the best maintenance policies for critical equipment and to assess the impact of modifying system redundancies. The main results lead to the overall conclusion that redundancy is an important system attribute for improving resilience over time.

Keywords:

resilience; fleets; reliability; availability; maintainability

1. Introduction

Resilience can be defined as the ability of a system to withstand the challenges and threats imposed by its environment. The term originated in the field of ecology [1] but has since been extended to a number of other fields, such as sociology, infrastructure, and industry [1].

In industry, the concept of resilience arises as a means to measure the ability of a physical asset, or a system of physical assets, to react quickly to disruptive events and to recover its operational condition and readiness. However, there is no consensus on the concept, nor a single way of measuring it. Several authors have suggested resilience metrics for different sectors, such as civil engineering, gas supply [2], electric distribution networks [3], water infrastructure [4], and wind farms [5]. Sterbenz et al. [6,7] discuss the use of the concept of resilience across several application areas. In engineering systems, according to the American Society of Mechanical Engineers (ASME) [8], resilience is the ability to withstand disruptive events, both internal and external, without interruption of operation and, if an interruption occurs, to recover completely and quickly.

The definition of resilience and its measurement combine both quantitative and qualitative aspects [9]. A brief review of the literature shows that there are a number of alternative definitions of resilience. Among the terms used to represent it are robustness, survivability, reliability, and availability. Hashimoto et al. [10] proposed three inter-related terms: reliability, as the degree of susceptibility to failure; resilience, as the speed with which the system recovers from a failure; and vulnerability, as the severity of failures and their consequences. Chuanyi Ji et al. [11] define vulnerability as the “non-resilience” characteristic. According to Cai et al. [12], resilience overlaps with adaptability, robustness, redundancy, flexibility, survivability, recoverability, speed, resourcefulness, and reparability. Resilience is also frequently considered as the speed of recovery from a shock. However, the speed of recovery does not account for the magnitude of the damage caused by the shock [4]. According to [13], resilience should capture the recovery process as the ratio between recovered functionality and lost functionality. Cholda [14] proposes a metric called the “quality of resilience”, which summarizes the frequency and extent of interruptions. Bruneau et al. [15] propose four basic properties that explain the concept of resilience through the 4Rs framework: robustness, redundancy, rapidity, and resourcefulness. Robustness is interpreted as the ability to “avoid” events that cause a decline in system functionality. Redundancy refers to the existence of alternative resources and elements in the system to “cope with” such a loss of functionality. Rapidity is the ability to act quickly and expeditiously on the condition that caused the loss of functionality. Finally, resourcefulness represents the capacity of a system to trigger the process of recovering its level of functionality.

Some authors emphasize that recovery does not imply a perfect restoration of the system's functionality, but rather that the system has returned to a level or state in which it can be considered functional. This situation is common in complex engineering systems in which, even in the presence of local failures, the system is not completely shut down but instead undergoes a gradual degradation of performance. Other authors, in contrast, emphasize that the system must recover the level of functionality it had before the shock, or at least a value similar to it [16,17]. This reflects two slightly different ways of understanding resilience. One considers resilience as the property of a system to remain close to a stable equilibrium with regard to its functionality. The other refers to the ability to transform from one state of equilibrium to another, placing more emphasis on the dynamic characteristics of such a transformation [18]. This idea is reaffirmed in [18], which states that much research effort has been directed towards recovering the pre-shock condition, rather than treating recovery as an opportunity to reconstruct or transform the system into a different one with better conditions or characteristics for a state of equilibrium. This can be achieved by incorporating the concept of transformability into the resilience of a system. The work in [16] also highlights the need for a learning process during restoration. This points to the need for a management capable of optimizing its decisions based on the knowledge gained from each disruptive event. In a sense, this concerns the resources the organization has and their costs. Among the main limitations to increasing the resilience of systems are limited resources, which make it impossible to allocate resources to all components of a system. It is therefore necessary to identify the critical components whose improved performance would most benefit the system's resilience, or whose degradation would most affect the system as a whole [17].

Fleet Resilience

Capital-intensive organizations, such as those in the mining sector, require and operate large fleets of large and expensive equipment. This equipment is used to transport and process the material removed from the mine. Major investments in physical assets (capital expenditures, CAPEX), together with operation and maintenance costs (OPEX), can reach several hundred million dollars. The basic components of any logistics infrastructure are the resources that enable the storage, movement, and handling of cargo units, supplemented by computer networks and decision support systems [19]. Typical open-pit equipment consists of trucks, shovels, drills, and other auxiliary units [20,21] that carry out activities such as drilling, loading, hauling, unloading, and auxiliary services such as road conditioning [22,23]. In a fleet-based system, all units work together under a production program with defined standards of availability and efficiency [24]. Material transport is one of the most important aspects of open-pit mining operations: its costs can reach up to 50% of OPEX, so any improvement in this area has a significant economic impact on the organization [25].

Although some aspects of resilience have received attention in the transport sector, there is no unified definition of the term. Based on the fundamental concepts of engineering resilience and ecological resilience, Wang [18] defined ‘resilience in transport’ as the quality that leads to recovery, reliability, and sustainability.

These attributes were integrated by [26] into three dimensions: robustness, redundancy, and recovery. Robustness is the ability of a system to resist disruptions. Redundancy provides the system with spare service capacity when some of its components or elements fail. Robustness and redundancy both concern the system's ability to maintain its service level or performance, whereas recovery concerns its ability to restore its characteristic service level or performance [26]. The work in [27] investigates the role of network topology, and of its characteristics, in the ability of a transportation system to cope with disruptive events. Specifically, that paper hypothesizes that the topological attributes of a transportation system significantly affect its resilience to disaster events. According to its authors, resilience explains not only the inherent ability of the system to absorb externally induced shocks, but also the effective and efficient adaptation measures that can be taken to preserve or re-establish performance after the event [27].

Many quantitative approaches have been proposed to measure the resilience of different transportation systems [16,28], such as railway systems [29] and freight transportation systems [30]. Zhou et al. [31] provided a synthesis of the up-to-date literature on resilient transportation, focusing on concepts and methodologies. The main body of that review was devoted to the characterization of resilience metrics and mathematical models and to the strategies used to enhance resilience [31]. The authors suggested that, through investment in reliability, organizations could achieve significant savings over a system's lifecycle, since improved reliability increases fleet availability and thus makes a larger effective fleet available when needed.

This work proposes the use of a resilience metric for the extraction fleet of an open-pit mine, based on the availability of each element, the topological structure of the fleets, and the criticalities defined according to the reliability and maintainability of each piece of equipment. A case study of a mine located in northern Chile is used to validate the concept and demonstrate its usefulness in evaluating various maintenance strategies, as well as the impact of the system's topological structure. Section 2 discusses the different approaches to quantifying resilience in engineered systems and analyzes resilience metrics, including their relationship with maintenance. The methodological aspects of this research are presented in Section 3. The case study and the analysis of its results are presented in Section 4. Finally, conclusions and future research directions are given in the last section.

2. Resilience Metrics

Because some confusion exists between different terms and definitions of resilience, a wide range of metrics have been proposed to assess and quantify the resilience of a system. Finding the right resilience metric, one that accounts for important properties such as performance and time-related properties while remaining simple to implement, is still an open challenge. In 2016, Yodo and Wang [32] presented a comprehensive review of the engineering resilience literature, suggesting the need to develop a generally applicable engineering resilience analysis framework. It should be acknowledged at the outset that quantitative analysis of resilience has been little used in the engineering context.

Albasrawi et al. [13] suggest a resilience metric to compare strategies for recovering from failures in cyber infrastructure systems. Through simulation, they obtain information about possible cascading failures, providing a stochastic reliability model for resilience assessment. The model is illustrated on a smart-grid type system. The authors do not specify the variable that represents the level of functionality or the performance of the system; they only refer to it as some quantifiable indicator, F(t). The paper by [33] proposes a set of factors used to quantify the resilience of microgrid systems. Resilience is measured through the voltage percentage, the reduction in performance (measured as the percentage reduction in served load), the recovery time, and the time needed to reach a power balance state. Attoh-Okine et al. [34] proposed another metric based on the resilience triangle. This metric is based on the concept of urban infrastructure quality and builds on the concepts of robustness, redundancy, speed, and resourcefulness, which the authors only partially defined. Hu and Mahadevan [35] proposed a metric to measure the resilience of mechanical systems that can be used for system design through time-dependent reliability analysis. The performance measure is expressed as a quantity of interest (the economic value of system performance, or other measures).

Some authors associate two well-known engineering metrics with resilience: reliability and availability at the system level. Huizar et al. [4] mention two main elements: the frequency of disruptive events and their duration. The first element can easily be related to the mean time between failures (MTBF) and to reliability, while the second is a measure of the system's maintainability and can be expressed as the mean time to repair (MTTR). Zhuang et al. [36] associate the concepts of availability and resistance: the former represents the percentage of demand satisfied during an interruption, while the latter represents the response to a failure, even when it is not a complete failure.
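The two elements named by Huizar et al. map directly onto quantities that can be estimated from a maintenance log. The following minimal Python sketch assumes a hypothetical `Stoppage` record holding start and end times in hours, and counts all time outside a stoppage as uptime; it then estimates MTBF as total uptime per failure and MTTR as total downtime per failure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stoppage:
    """One unplanned stoppage (hypothetical record), times in hours."""
    start: float  # instant the failure occurred
    end: float    # instant the equipment returned to service


def mtbf_mttr(stoppages: list[Stoppage], horizon: float) -> tuple[float, float]:
    """Estimate (MTBF, MTTR) from the stoppages observed over [0, horizon].

    MTBF = total uptime / number of failures
    MTTR = total downtime / number of failures
    All time not spent in a stoppage is counted as uptime (an assumption).
    """
    if not stoppages:
        raise ValueError("at least one stoppage is needed")
    downtime = sum(s.end - s.start for s in stoppages)
    n = len(stoppages)
    return (horizon - downtime) / n, downtime / n


# Hypothetical 1000 h observation window with three failures.
log = [Stoppage(100.0, 110.0), Stoppage(400.0, 430.0), Stoppage(800.0, 820.0)]
mtbf, mttr = mtbf_mttr(log, horizon=1000.0)
```

Here the three stoppages of 10, 30, and 20 h give MTTR = 20 h and MTBF = 940/3 ≈ 313.3 h. In a real analysis, planned stops would need to be excluded from the uptime rather than counted as available time.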

The resilience concept can be better represented with Figure 1 (adapted from [37]), which shows the behavior of a given performance metric (e.g., the system's functionality) over time. This metric may vary due to disruptive events. In one such event, performance declines significantly to a minimum value. From that moment on, functional performance increases over time until it reaches an “acceptable” level, indicated in Figure 1 as the final functional state.
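One simple way to read the ratio of recovered to lost functionality attributed to [13] on a curve like the one in Figure 1 is sketched below. It is a minimal illustration, assuming a single disruption, with the first sample taken as the pre-disruption level, the minimum of the trace as the low point, and the last sample as the level reached after recovery; it is not the exact formulation of [13].

```python
from __future__ import annotations


def recovery_ratio(performance: list[float]) -> float:
    """Recovered functionality divided by lost functionality (single disruption).

    Assumes: performance[0] is the pre-disruption level, min(performance)
    is the post-disruption low point, and performance[-1] is the level
    reached after recovery.
    """
    baseline = performance[0]
    low = min(performance)
    final = performance[-1]
    lost = baseline - low
    if lost <= 0:
        return 1.0  # no functionality was lost; treat as fully recovered
    return (final - low) / lost


# Hypothetical performance samples (1.0 = full functionality).
trace = [1.0, 1.0, 0.6, 0.7, 0.85, 0.9]
ratio = recovery_ratio(trace)
```

In this hypothetical trace, 0.4 of functionality is lost and 0.3 is regained, so the ratio is approximately 0.75. A final state below the pre-disruption level, as in Figure 1, naturally yields a ratio below 1.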

Cholda [14] emphasizes that resilience must be seen as a global measure that arises from the structure of the system and takes into account all its parts, their interrelationships, their local performances, and the way they affect the whole. According to Bishop et al. [38], the functionality of a system needs to be considered holistically; thus, different levels of abstraction are both possible and necessary. Among their concerns, Bishop et al. [38] highlight the structure of systems, the cascading effects that occur in the presence of disruptive events, and how to account for them when defining resilient systems. The ISO 14224 standard [39] defines a general hierarchy using terms such as system, subsystem, and component, which can be used to establish an appropriate structure for determining the resilience behavior of a complex system.

Another important concept related to resilience quantification is the gradual or abrupt loss of functionality. Several authors do not treat the effect of a shock only as a binary situation (functional or not: on or off) [40]. Bruneau et al. [15] assume that the system is initially fully functional (100% functional); however, this is not always the case, and the system may not be fully functional before a performance shock. A system's response may be expressed as a given minimum level of service, or as an acceptable range within which the system operates.

This situation is common in complex engineering systems in which, even in the presence of local failures, topologies that combine parallel, redundant, and stand-by equipment prevent the system from being completely shut down; instead, performance degrades gradually. Figure 2 (adapted from [14]) depicts this gradual behavior.

Similarly, the system will not necessarily regain its full functionality after recovery. This may be due to two important aspects: other disruptive events may continue to influence system performance [16], or the organization may not be able to afford the cost of full recovery. This is why resilience analysis must compare the output or service level of the system against certain acceptance levels.

Initially, we addressed the following question: what is the goal of maintenance, and how is it measured? Stapelberg [41], by analogy with the productivity ratio, defines the main goal of maintenance as the correct balance between “maintenance output” and “maintenance input”. This reinforces the idea that the performance of maintenance actions (efficiency and effectiveness) should be measured. Availability is widely accepted as the most effective metric of the operational condition of an asset; that is, availability reflects both the operating condition of the equipment and the effectiveness of the maintenance applied to it.

Resilience and Maintenance

To the best of our knowledge, only one work [12] has explicitly proposed using system-level availability as the parameter for evaluating performance in engineering systems. System-level availability reflects the ratio between the time a physical asset is available and the total or nominal operating time. Availability is an important parameter linked to maintenance and its efficiency. Equation (1) expresses availability in terms of the mean time between failures (MTBF) and the mean time to repair (MTTR).

A_{s} = \frac{MTBF}{MTBF + MTTR}

(1)

The first parameter, MTBF, is closely aligned with the concept of reliability, i.e., the probability that an asset operates without failure over a given period, while maintainability (MTTR) reflects the probability that a repair is completed within a given period. Equation (1) shows that if the reliability of an asset is reduced (lower MTBF), its availability decreases. Likewise, if its maintainability worsens, that is, if the time to repair the asset increases, its availability is also reduced. For this reason, increasing availability at the system level requires improving the reliability of its constituent elements and establishing efficient and expeditious repair procedures. Figure 3 illustrates the concept of availability graphically.
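The two effects described above can be checked numerically with Equation (1). The Python sketch below uses hypothetical MTBF and MTTR values (not taken from Table 2) and shows that lowering MTBF or raising MTTR both reduce availability.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Equation (1): A = MTBF / (MTBF + MTTR)."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


# Hypothetical figures in hours (not taken from Table 2).
base = availability(mtbf=120.0, mttr=8.0)
less_reliable = availability(mtbf=90.0, mttr=8.0)    # lower reliability
slower_repair = availability(mtbf=120.0, mttr=12.0)  # worse maintainability
print(f"{base:.4f} {less_reliable:.4f} {slower_repair:.4f}")  # 0.9375 0.9184 0.9091
```

In this example, a 50% longer repair time costs slightly more availability than a 25% shorter MTBF, which illustrates why both reliability and repair procedures are targets for improvement.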

The availability concept allows resilience to be calculated as an intrinsic characteristic of the engineered system, stemming from its structure, its level of organization, and its maintenance resources. It is therefore possible to assess the effects on system-level availability of changes in the system's topology and in the technical resources allocated to maintenance activities.

System-level availability is estimated by taking into account the structure (topology) of the system. That is, to compute system availability, the connections among the different pieces of equipment (e.g., series or parallel) and the existence of redundant or stand-by equipment must be considered. For example, the time diagram in Figure 4 shows the up and down times of two pieces of equipment arranged in parallel.

For each of these configurations, there are analytical models for obtaining or estimating availability at the system level. Table 1 shows the models for computing system availability for a system with n pieces of equipment arranged according to different topologies (series, parallel, and partial redundancy, or r-out-of-n).
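As a sketch of the kind of models collected in Table 1, the Python functions below implement the standard textbook expressions for series, parallel, and r-out-of-n structures. They assume statistically independent units (and, for r-out-of-n, identical units with availability a); the notation may differ from that of Table 1.

```python
from math import comb, prod


def series(avails):
    """All units must be up: A_s = prod(A_i)."""
    return prod(avails)


def parallel(avails):
    """At least one unit must be up: A_s = 1 - prod(1 - A_i)."""
    return 1.0 - prod(1.0 - a for a in avails)


def r_out_of_n(r, n, a):
    """At least r of n identical, independent units (availability a) must be up."""
    return sum(comb(n, k) * a**k * (1.0 - a) ** (n - k) for k in range(r, n + 1))


# Three hypothetical units, each with 90% availability.
a = 0.9
print(series([a] * 3), parallel([a] * 3), r_out_of_n(2, 3, a))
```

For three units at 90% availability, the results are approximately 0.729 (series), 0.999 (parallel), and 0.972 (2-out-of-3). The r-out-of-n model contains the other two as special cases: r = n gives the series result and r = 1 gives the parallel result.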

In Table 1, Aₛ represents the system availability and Aᵢ represents the availability of equipment i. As already mentioned, the model proposed by Cai et al. [12] is, to our knowledge, the only analytical model that relates system resilience to system availability in the presence of a given set of disruptive events. Considering Figure 5, we have Equation (2):

ρ = \frac{A_{1}}{n (\ln (t_{1}))} \sum_{i = 1}^{n} \frac{A_{2}^{i} A_{3}^{i}}{\ln (t_{3}^{i} - t_{2}^{i})}

(2)

where A₁ corresponds to the steady-state system availability, which holds from the initial moment until instant t₁. At instant t₂, a sudden decrease in system availability occurs, after which availability takes a transitory value A₂. The system then recovers its availability until it reaches a new equilibrium state A₃ at instant t₃. The number of availability drops is denoted by n. Consequently, A₂ⁱ and A₃ⁱ represent the corresponding values associated with the i-th shock (i = 1, …, n), and t₂ⁱ and t₃ⁱ the corresponding instants. Based on [12], the structure (taxonomy) and the maintenance resources (strategies) can be considered the two vital drivers for an engineering system to achieve full resilience. This means that, by setting the structure of a system and its maintenance resources, its system-level resilience is determined. This allows resilience to be considered a critical variable in decisions related to the planning, design, and maintenance management of engineering systems.
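Equation (2) can be evaluated directly once A₁, t₁ and the values (t₂ⁱ, A₂ⁱ, t₃ⁱ, A₃ⁱ) of each shock are known. The Python sketch below is a literal transcription of the formula as written above, using a hypothetical `Shock` record; it is not the authors' MATLAB implementation. Because the formula takes logarithms of t₁ and of t₃ⁱ - t₂ⁱ, both must exceed 1 in the chosen time unit, so the value of ρ depends on that unit.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Shock:
    """One availability drop and its recovery (hypothetical record)."""
    t2: float  # instant of the availability drop
    a2: float  # transitory availability after the drop
    t3: float  # instant at which the new equilibrium is reached
    a3: float  # recovered (equilibrium) availability


def resilience_index(a1: float, t1: float, shocks: list[Shock]) -> float:
    """Equation (2): rho = A1 / (n ln t1) * sum(A2^i A3^i / ln(t3^i - t2^i))."""
    n = len(shocks)
    if n == 0:
        raise ValueError("Equation (2) needs at least one shock")
    if t1 <= 1:
        raise ValueError("t1 must exceed 1 so that ln(t1) > 0")
    total = 0.0
    for s in shocks:
        duration = s.t3 - s.t2
        if duration <= 1:
            raise ValueError("t3 - t2 must exceed 1 so that ln(t3 - t2) > 0")
        total += s.a2 * s.a3 / math.log(duration)
    return a1 / (n * math.log(t1)) * total


# Times chosen so that every logarithm equals 1, making the arithmetic easy to check.
shocks = [Shock(t2=5.0, a2=0.8, t3=5.0 + math.e, a3=0.9),
          Shock(t2=12.0, a2=0.7, t3=12.0 + math.e, a3=0.95)]
rho = resilience_index(a1=1.0, t1=math.e, shocks=shocks)
```

With all logarithms equal to 1, the index reduces to the mean of the products A₂ⁱA₃ⁱ, here (0.72 + 0.665) / 2 ≈ 0.6925. The validation checks make explicit a practical constraint of the formula: short recoveries (t₃ⁱ - t₂ⁱ ≤ 1 time unit) cannot be evaluated.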

3. Proposed Methodology

Several approaches exist for representing a complex engineering system in order to estimate its reliability, maintainability, and overall availability. Bourouni [42] proposes and compares fault tree analysis (FTA) and reliability block diagrams (RBD).

FTA is a deductive methodology that allows the possible causes of disruptive events in a system to be determined and the probabilities of failure to be estimated. Using a fault tree as a basis, the system is dissected in detail to determine the root causes, or combinations of causes, of the top event. Top events are generally significant failures that have serious consequences for system performance. FTA produces both qualitative and quantitative information about the system under study. However, FTA cannot accurately estimate system availability and is therefore used only to obtain good approximations of that parameter. Safder et al. [43] defined system availability using a combination of “AND gates” and “OR gates”.

RBDs represent a system as blocks linked in various configurations or topologies (series, parallel, stand-by, etc.); these links, together with the reliability of the equipment or subsystems represented by each block, determine the reliability of the system as a whole.

According to Bourouni [42], in simple systems the FTA and RBD methods usually produce similar results; in fact, [42] suggests equivalence relations between the two methods. However, FTA is difficult to apply to complex systems (back-up configurations, redundancy, etc.). It follows that the RBD method becomes more suitable as the system configurations to be represented become more complex.
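For simple structures, the equivalence between the two methods can be illustrated with the standard gate formulas for independent basic events. In the Python sketch below (hypothetical unavailability values), an OR gate over unit unavailabilities reproduces a series RBD, and an AND gate reproduces a parallel RBD. This correspondence holds only under the independence assumption and does not cover stand-by or shared-repair configurations.

```python
from math import prod


def or_gate(probs):
    """Top event occurs if ANY independent input event occurs."""
    return 1.0 - prod(1.0 - q for q in probs)


def and_gate(probs):
    """Top event occurs only if ALL independent input events occur."""
    return prod(probs)


# Unavailabilities (probability of being down) of two hypothetical units.
q = [0.05, 0.10]

# Series RBD: the system is down if either unit is down (OR gate).
series_availability = prod(1.0 - qi for qi in q)
assert abs(series_availability - (1.0 - or_gate(q))) < 1e-12

# Parallel RBD: the system is down only if both units are down (AND gate).
parallel_availability = 1.0 - prod(q)
assert abs(parallel_availability - (1.0 - and_gate(q))) < 1e-12
```

Here the series arrangement yields an availability of about 0.855 and the parallel arrangement about 0.995.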

For the present work, the following steps were carried out:

1. Selection of the resilience metric and implementation of the model using MATLAB software.
2. Modeling of the production system through reliability block diagrams (RBD) using R-MES software.
3. Estimation of system availability using real data and reliability, availability and maintainability (RAM) analysis.
4. Simulation of different maintenance strategies and of the insertion of additional equipment (Monte Carlo experiments).
5. Calculation and comparison of resilience indexes for each simulated scenario.

Reliability block diagrams are proposed to support the availability calculations. With this methodology, combined with Monte Carlo simulation experiments, it is possible to estimate the expected availabilities at both the equipment and system levels, taking into account the topological relationships among the pieces of equipment [44]. These experiments also provide useful information on which components account for the largest share of capacity and availability losses. Finally, this approach makes it possible to evaluate trade-offs among reliability, maintainability, and redundancy. Here, we extend these evaluations to observe how the system's resilience metric behaves as these design parameters change.
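The combination of an RBD with Monte Carlo simulation can be sketched in a few lines. The Python example below is a minimal illustration, not the R-MES implementation: it assumes a hypothetical RBD in which one shovel "S" is in series with a 2-out-of-3 group of trucks "H1" to "H3", and it assumes exponentially distributed times to failure and repair in place of the fitted curves used in the study. Each iteration simulates the up/down history of every unit, evaluates the RBD logic at hourly steps, and the fractions of time the system is up are averaged.

```python
import math
import random


def down_intervals(mtbf, mttr, horizon, rng):
    """Simulate one unit's failure/repair cycle; return its (start, end) down intervals.

    Assumes exponential times to failure and to repair (a simplification).
    """
    t, down = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mtbf)  # operate until the next failure
        if t >= horizon:
            return down
        repair = rng.expovariate(1.0 / mttr)
        down.append((t, t + repair))
        t += repair


def up_mask(down, n_steps, step):
    """Up (True) or down (False) state of a unit at times 0, step, 2*step, ..."""
    mask = [True] * n_steps
    for start, end in down:
        for i in range(math.ceil(start / step), min(n_steps, math.ceil(end / step))):
            mask[i] = False
    return mask


def simulate(units, horizon=8760.0, step=1.0, iterations=100, seed=1):
    """Monte Carlo availability of the hypothetical RBD:
    shovel "S" in series with a 2-out-of-3 group of trucks "H1", "H2", "H3"."""
    rng = random.Random(seed)
    n_steps = int(horizon / step)
    total = 0.0
    for _ in range(iterations):
        masks = {name: up_mask(down_intervals(mtbf, mttr, horizon, rng), n_steps, step)
                 for name, (mtbf, mttr) in units.items()}
        up = sum(
            masks["S"][i] and (masks["H1"][i] + masks["H2"][i] + masks["H3"][i]) >= 2
            for i in range(n_steps)
        )
        total += up / n_steps
    return total / iterations


# Hypothetical (MTBF, MTTR) pairs in hours.
units = {"S": (200.0, 10.0), "H1": (100.0, 10.0), "H2": (100.0, 10.0), "H3": (100.0, 10.0)}
print(f"Estimated system availability: {simulate(units):.3f}")
```

Under the exponential assumption, the simulated value can be checked against the analytical result A_S × [3A_H²(1 - A_H) + A_H³] ≈ 0.930 for these inputs. The value of simulation lies in the cases where no closed form is available, such as non-exponential distributions, scheduled preventive stops, or the availability time series needed for Equation (2).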

Cai et al. [12] require the steady-state availability to be defined or computed in order to apply their model. This concept has been extensively discussed in the literature without consensus on how to quantify it, which may make the model difficult to apply in real cases. According to the same authors [12], there is no true steady-state availability; nevertheless, they define steady-state availability as a value whose variation does not exceed 10⁻⁵ within an interval of five consecutive periods, and they call the instant at which this occurs the steady-state time. Another relevant limitation of the model is the lack of thresholds establishing the availability values before and after recovery.
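The steady-state criterion of Cai et al. can be applied mechanically to an availability series. The Python sketch below reads "variation does not exceed 10⁻⁵ within five consecutive periods" as "the spread (maximum minus minimum) of five consecutive values is at most 10⁻⁵"; this is one possible interpretation, and the function name and returned index (the start of the first qualifying window) are illustrative choices.

```python
def steady_state_index(series, tol=1e-5, window=5):
    """Index at which `series` first enters a steady state, or None.

    Interprets the criterion as: max - min over `window` consecutive
    values is at most `tol`. The returned index is the start of the
    first window that satisfies it.
    """
    for i in range(len(series) - window + 1):
        chunk = series[i:i + window]
        if max(chunk) - min(chunk) <= tol:
            return i
    return None


# Hypothetical availability values converging towards 0.90.
values = [0.80, 0.86, 0.89, 0.90, 0.900004, 0.900006, 0.900007, 0.900008, 0.900008]
t_ss = steady_state_index(values)
```

For this hypothetical series, the first qualifying window starts at index 3. For monthly field data, which rarely stabilize to within 10⁻⁵, the function will often return `None`, which illustrates the practical difficulty noted above.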

This paper proposes the adaptation and use of the resilience index of [12] on the premise that the main objective of maintenance organizations and actions is to maximize system availability, ideally to 100%. We therefore propose this value as the reference, or initial, availability value.

Regarding the second issue, we propose the following application principle: any availability loss at the system level corresponds to one or more availability losses in the elements of that system. How these losses affect the system depends on the criticality of each piece of equipment, the topological structure of the system, and the duration of each loss. Since system availability results solely from the individual availabilities combined through the system topology, the availability of the system over time may be represented as shown in Figure 6. In this paper, we consider each loss and recovery of system availability as an individual disruptive event, each being the combined result of one or more disruptive events in one or more pieces of equipment within the system.

Once the time series of system-level availability is obtained, the sequence of local minima and local maxima in that series is identified (Figure 7). All the data required to apply the model of Equation (2) over a given time interval are then available.
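The identification of local minima and maxima can be sketched as follows. This minimal Python example pairs each local minimum (t₂ⁱ, A₂ⁱ) with the next local maximum (t₃ⁱ, A₃ⁱ), producing the tuples needed by Equation (2). It assumes strict comparisons (flat plateaus are not flagged) and treats a final upward step as the end of an unfinished recovery; both are illustrative choices, not necessarily those used in the study.

```python
def shocks_from_series(times, avail):
    """Pair each local minimum (t2, A2) with the next local maximum (t3, A3)."""
    n = len(avail)
    minima, maxima = [], []
    for i in range(1, n - 1):
        if avail[i] < avail[i - 1] and avail[i] <= avail[i + 1]:
            minima.append(i)
        elif avail[i] > avail[i - 1] and avail[i] >= avail[i + 1]:
            maxima.append(i)
    # Treat a final upward step as the end of an unfinished recovery.
    if n >= 2 and avail[-1] > avail[-2]:
        maxima.append(n - 1)
    shocks = []
    for m in minima:
        nxt = next((k for k in maxima if k > m), None)
        if nxt is not None:
            shocks.append((times[m], avail[m], times[nxt], avail[nxt]))
    return shocks


# Hypothetical monthly system availability (months 1 to 8).
months = list(range(1, 9))
series = [0.90, 0.85, 0.88, 0.92, 0.80, 0.83, 0.87, 0.86]
pairs = shocks_from_series(months, series)
```

For this hypothetical series, two shocks are found: a drop to 0.85 in month 2 that recovers to 0.92 in month 4, and a drop to 0.80 in month 5 that recovers to 0.87 in month 7.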

4. Case Study

To demonstrate the validity and usefulness of the model and the proposed methodological procedure, we analyzed the behavior of the resilience index of the fleet used in an open-pit mine located in northern Chile. For the purposes of the study, the equipment was divided into three sub-fleets based on three classes of equipment: electric shovels (S02–S04), drills (D05–D09), and trucks (Hxx), totaling 51 pieces of equipment. The truck and drill sub-fleets were further subdivided according to their types. Support equipment, such as motor graders and irrigation trucks, was not considered. Once the equipment was identified and selected, historical data covering 18 months of continuous operation were collected. From these data, the reliability and maintainability curves of each equipment item were modeled, the corresponding metrics were obtained, and the critical equipment of each sub-fleet was identified. Table 2 shows the mean time between failures (MTBF) and mean time to repair (MTTR) as the main indicators of reliability and maintainability, since they are directly linked to availability and, therefore, to the resilience of the system.

Figure 8 shows the jack-knife diagram with the equipment selected as critical from Table 2 for each sub-fleet. We assumed that these equipment items are the ones that most affect the resilience of the system; therefore, they were selected for stage II of the study, i.e., the Monte Carlo analysis.
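A jack-knife diagram plots, for each item, the number of failures against MTTR (usually on logarithmic axes) and splits the plane with two thresholds. The Python sketch below classifies items using a common convention: the failure-count threshold is the average number of failures per item, and the MTTR threshold is the fleet-wide MTTR (total downtime divided by total failures). The item names and figures are hypothetical, not the data of Table 2. Logarithmic scaling affects only the plot, not the classification.

```python
def jackknife(equipment):
    """Classify items as acute, chronic, both, or normal.

    `equipment` maps an item name to (number_of_failures, total_downtime_hours).
    """
    if any(n <= 0 for n, _ in equipment.values()):
        raise ValueError("every item needs at least one failure to define its MTTR")
    total_failures = sum(n for n, _ in equipment.values())
    total_downtime = sum(d for _, d in equipment.values())
    n_limit = total_failures / len(equipment)      # average failures per item
    mttr_limit = total_downtime / total_failures   # fleet-wide MTTR
    classes = {}
    for name, (n, downtime) in equipment.items():
        chronic = n > n_limit              # fails unusually often
        acute = downtime / n > mttr_limit  # unusually long repairs
        if chronic and acute:
            classes[name] = "acute and chronic"
        elif chronic:
            classes[name] = "chronic"
        elif acute:
            classes[name] = "acute"
        else:
            classes[name] = "normal"
    return classes


# Hypothetical items: (failures, downtime in hours) over the study period.
fleet = {"U1": (40, 400.0), "U2": (10, 300.0), "U3": (45, 900.0), "U4": (15, 90.0)}
result = jackknife(fleet)
```

With these numbers, the thresholds are 27.5 failures and about 15.4 h of MTTR, so U1 is chronic, U2 is acute, U3 is both acute and chronic, and U4 is normal. Items in the acute-and-chronic quadrant are the natural candidates for the kind of critical-equipment selection described above.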

Analyzing the results and the historical series of interventions, we initially observed that the drilling fleet contains the equipment with the greatest impact on system-level unavailability, mainly due to the number of downtime hours caused by corrective maintenance. Finally, it was decided to continue the analysis by selecting the single most critical piece of equipment in each sub-fleet, except for the 1AC truck fleet, for which, owing to its size, 10% of its units (three trucks) were included in the analysis.

4.1. System-Level Resilience

The analysis interval begins on 1 January 2018 and ends on 25 June 2019. The monthly system-level availability of the extraction fleet is shown in Figure 9, where each external disruption and the subsequent recovery of system-level availability are represented by an arrow running from a local minimum to the next local maximum of the series.

As can be seen in Figure 9, four “shocks” affected the availability of the extraction fleet; that is, the availability series presents four local minima, marked t₂ⁱ, and four recovery points, marked t₃ⁱ. The resilience index obtained through Equation (2) for the period of analysis is 0.41, or 41%.

4.2. Simulation and Sensitivity Studies

In the second stage (the sensitivity analysis), various scenarios were proposed to study the behavior of the resilience index under changes in the maintenance strategy. These scenarios can be divided into three groups: (i) a simultaneous preventive maintenance plan for the nine items of critical equipment, with interventions lasting 6 h applied at three different frequencies (3, 6 and 9 months); (ii) preventive interventions on each item of critical equipment separately, with three different durations (6, 9 and 12 h) and the same frequencies mentioned above; and (iii) increased redundancy in the critical fleet, adding, separately, one piece of equipment to each of the drill sub-fleets, to measure the behavior of the resilience index at the sub-fleet, fleet, and extraction-system levels.

These scenarios amount to nine settings for the Class I experiments and 81 for the Class II experiments, that is, nine items of equipment × three durations × three frequencies; there are three experiments of type (iii). Each scenario was simulated in a Monte Carlo experiment of 1000 iterations using the R-MES suite. Figure 10 shows the system-level resilience index obtained in the experiments of category (i).
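To make the Monte Carlo step concrete, the sketch below estimates the availability of a single unit by simulating alternating operating and repair periods over one year (8760 h) and averaging 1000 iterations. It assumes exponentially distributed times between failures and repair times, which is an assumption for illustration only (the distributions used in the R-MES suite are not reported here), and it uses the D08 parameters from Table 2. The function names are hypothetical.

```python
import random
import statistics


def simulate_availability(mtbf: float, mttr: float, horizon: float,
                          rng: random.Random) -> float:
    """One Monte Carlo iteration: fraction of [0, horizon] spent operating.

    Assumes exponential times between failures and exponential repair times.
    """
    t, up_time = 0.0, 0.0
    while t < horizon:
        run = rng.expovariate(1.0 / mtbf)   # operating time until next failure
        up_time += min(run, horizon - t)    # clip the last run at the horizon
        t += run
        if t >= horizon:
            break
        t += rng.expovariate(1.0 / mttr)    # repair duration (unit is down)
    return up_time / horizon


def monte_carlo_availability(mtbf: float, mttr: float, horizon: float = 8760.0,
                             iterations: int = 1000, seed: int = 1) -> float:
    """Mean availability over independent iterations (seeded for repeatability)."""
    rng = random.Random(seed)
    samples = [simulate_availability(mtbf, mttr, horizon, rng)
               for _ in range(iterations)]
    return statistics.mean(samples)


# D08 parameters from Table 2: MTBF = 17.84 h, MTTR = 11.46 h.
estimate = monte_carlo_availability(17.84, 11.46)
analytic = 17.84 / (17.84 + 11.46)  # steady-state A = MTBF / (MTBF + MTTR)
print(f"simulated {estimate:.3f} vs analytic {analytic:.3f}")
```

For exponential operating and repair times the estimate should approach the steady-state value MTBF/(MTBF + MTTR); simulation becomes necessary once preventive interventions, other distributions or fleet topologies are added to the model.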

Figure 11 shows the resilience index of each sub-fleet when preventive maintenance is applied to the critical equipment at the frequencies indicated above.

In the second set of experiments, simulations were carried out for each of the nine items of critical equipment separately, with preventive interventions of three different durations (6, 12 and 18 h) applied at three different intervals (3, 6 and 9 months). Table 3, Table 4 and Table 5 show the results of these experiments.
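The 81 Class II scenarios follow from crossing the nine items of equipment listed in Tables 3–5 with the three intervention durations and three frequencies given in those tables' headings. The sketch below only enumerates the experimental design; the variable names are illustrative and no simulation is run.

```python
from itertools import product

# Critical equipment, in the order listed in Tables 3-5.
critical_equipment = ["D07", "D05", "D08", "H25", "S02", "H34", "H43", "H31", "H04"]
durations_h = [6, 12, 18]         # preventive intervention duration (hours)
frequencies_months = [3, 6, 9]    # interval between interventions (months)

# One scenario per (equipment, duration, frequency) combination.
scenarios = list(product(critical_equipment, durations_h, frequencies_months))
print(len(scenarios))  # 81
```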

First, applying preventive maintenance generally yields significant increases in resilience. The resilience values obtained show little variability and are about 30 percentage points higher than the index obtained from the historical series (roughly 74% against 41%). Table 3, Table 4 and Table 5 highlight the highest indices in each group of experiments and also give the average for each analyzed item of equipment. These results show that in five of the nine experiments, the highest system-level resilience index was obtained when planned maintenance was applied to S02, the most critical piece of equipment (highest failure frequency among the critical equipment). In two further experiments, planned maintenance on S02 gave the second-highest system resilience index. Planned maintenance on S02 therefore has the greatest influence on system resilience. For H43, a truck, the maximum resilience value was reached on four occasions when preventive maintenance was applied to it, even though this truck does not stand out in terms of either reliability or maintainability within the group of the nine most critical items of equipment.
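The counts quoted above can be checked directly against Tables 3–5. The sketch below transcribes the tabulated indices, counts how often each item of equipment reaches the highest value in a column (ties counted as shared first places, an assumption about how the highlighted maxima were read), and compares the mean of all tabulated values with the historical index of 41%. The function name `count_top_positions` is hypothetical.

```python
from collections import Counter

# Resilience index (%) from Tables 3-5: {duration_h: {equipment: [3, 6, 9 months]}}.
RESULTS = {
    6: {"D07": [73, 72, 75], "D05": [75, 73, 76], "D08": [74, 75, 75],
        "H25": [77, 73, 75], "S02": [73, 75, 76], "H34": [75, 76, 75],
        "H43": [73, 72, 76], "H31": [70, 72, 70], "H04": [75, 72, 70]},
    12: {"D07": [72, 73, 74], "D05": [73, 74, 74], "D08": [75, 75, 73],
         "H25": [75, 74, 73], "S02": [73, 74, 75], "H34": [73, 74, 74],
         "H43": [74, 73, 75], "H31": [75, 73, 72], "H04": [73, 72, 72]},
    18: {"D07": [73, 72, 76], "D05": [74, 71, 77], "D08": [74, 73, 75],
         "H25": [73, 72, 75], "S02": [76, 75, 77], "H34": [71, 73, 71],
         "H43": [76, 75, 73], "H31": [75, 73, 74], "H04": [74, 75, 73]},
}


def count_top_positions(results):
    """Count, per equipment, the columns in which it attains the maximum (ties shared)."""
    wins = Counter()
    for table in results.values():
        for col in range(3):
            best = max(values[col] for values in table.values())
            for equipment, values in table.items():
                if values[col] == best:
                    wins[equipment] += 1
    return wins


wins = count_top_positions(RESULTS)
print(wins["S02"], wins["H43"])  # 5 4

# Mean of all 81 tabulated indices versus the historical index of 41%.
all_values = [v for table in RESULTS.values() for values in table.values() for v in values]
gap = sum(all_values) / len(all_values) - 41
print(f"{gap:.1f}")  # 32.8 percentage points
```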

Experiment (iii) measures the impact of increased redundancy on the resilience index. In this case, one drilling unit was added, separately, to each of the existing drilling sub-fleets. Figure 12 shows the system-level resilience index for the original configuration and after adding a unit identical to the original one(s), that is, with the same reliability and maintainability parameters.
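A simplified reading of the redundancy experiment treats a drilling sub-fleet as identical, independent units in parallel, so that the sub-fleet is available while at least one unit is. Under that assumption (the actual reliability block diagrams may impose different success criteria), the parallel formula of Table 1 shows how adding one identical unit raises availability. The unit availability below is derived from the D08 parameters in Table 2 purely for illustration, and the function name `parallel_identical` is hypothetical.

```python
def parallel_identical(unit_availability: float, n_units: int) -> float:
    """Availability of n identical, independent units in parallel.

    The group is assumed available while at least one unit is available.
    """
    return 1.0 - (1.0 - unit_availability) ** n_units


# Steady-state availability of D08 (Table 2): MTBF / (MTBF + MTTR).
a_unit = 17.84 / (17.84 + 11.46)

for n in (1, 2, 3):
    before = parallel_identical(a_unit, n)
    after = parallel_identical(a_unit, n + 1)
    print(f"{n} -> {n + 1} units: {before:.3f} -> {after:.3f}")
```

Each additional unit multiplies the residual unavailability by $(1 - A)$, so in this simplified model the absolute gain shrinks as redundancy grows.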

Figure 13 shows the resilience of each drilling sub-fleet after one additional piece of equipment is added to it.

Figure 13 shows that the largest relative increase is achieved by adding a drill to the ROC sub-fleet, whose resilience index rises more than fourfold. The smallest increase is achieved by adding a drill to the Drilltech sub-fleet, whose resilience rises by a little over 40%.

5. Conclusions

A resilience index based on system availability was adapted and applied, making it possible to measure the impact of different planned maintenance strategies. In addition, the effect of adding equipment to increase redundancy was analyzed.

The proposed methodology made it possible to identify where and how system resilience can be positively influenced by different preventive maintenance strategies. It also makes it possible to avoid excessive maintenance activities that would not significantly improve an overall system characteristic such as resilience. In other words, a balance can be struck between the maintenance strategy, with its resources, and its impact on system resilience, based on overall availability. This balance may shift over time, hence the importance of using Monte Carlo simulations or projections of the behavior of each reliability and maintainability parameter over time. This adds a useful element to decision-making in physical asset management.

Author Contributions

O.D. designed the research; A.C. contributed to the construction of the model and the analysis of results; J.A. contributed to the simulation experiments and to the writing and editing of the manuscript; A.A. contributed to the writing, editing and analysis of results. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This work used the R-MES suite by R-MES Analytics, Santiago, Chile.

Conflicts of Interest

The authors declare no conflict of interest.

References

Hutchison, D.; Sterbenz, J.P.G. Architecture and design for resilient networked systems. Comput. Commun. 2018.
Xueyi, L.; Jinjun, Z.; Huai, S.; Zio, E. Resilience Assessment of China’s Natural Gas Supply System Based on Ecological Network Analysis. In Proceedings of the 2019 4th International Conference on System Reliability and Safety, ICSRS 2019, Rome, Italy, 20–22 November 2019.
Mousavizadeh, S.; Bolandi, T.G.; Haghifam, M.R.; Moghimi, M.; Lu, J. Resiliency analysis of electric distribution networks: A new approach based on modularity concept. Int. J. Electr. Power Energy Syst. 2020.
Huizar, L.H.; Lansey, K.E.; Arnold, R.G. Sustainability, robustness, and resilience metrics for water and other infrastructure systems. Sustain. Resilient Infrastruct. 2018.
Feng, Q.; Zhao, X.; Fan, D.; Cai, B.; Liu, Y.; Ren, Y. Resilience design method based on meta-structure: A case study of offshore wind farm. Reliab. Eng. Syst. Saf. 2019.
Sterbenz, J.P.G.; Hutchison, D.; Çetinkaya, E.K.; Jabbar, A.; Rohrer, J.P.; Schöller, M.; Smith, P. Resilience and survivability in communication networks: Strategies, principles, and survey of disciplines. Comput. Netw. 2010.
Sterbenz, J.P.G.; Hutchison, D.; Çetinkaya, E.K.; Jabbar, A.; Rohrer, J.P.; Schöller, M.; Smith, P. Redundancy, diversity, and connectivity to achieve multilevel network resilience, survivability, and disruption tolerance. Telecommun. Syst. 2014.
ASME-ITI. All Hazards Risk and Resilience–Prioritizing Critical Infrastructure Using the RAMCAP Plus SM Approach; ASME Innovative Technology Institute: Washington, DC, USA, 2009.
Linkov, I.; Eisenberg, D.A.; Plourde, K.; Seager, T.P.; Allen, J.; Kott, A. Resilience metrics for cyber systems. Environ. Syst. Decis. 2013.
Hashimoto, T.; Stedinger, J.R.; Loucks, D.P. Reliability, resiliency, and vulnerability criteria for water resource system performance evaluation. Water Resour. Res. 1982.
Ji, C.; Wei, Y.; Poor, H.V. Resilience of Energy Infrastructure and Services: Modeling, Data Analytics, and Metrics. Proc. IEEE 2017.
Cai, B.; Xie, M.; Liu, Y.; Liu, Y.; Feng, Q. Availability-based engineering resilience metric and its corresponding evaluation methodology. Reliab. Eng. Syst. Saf. 2018.
Albasrawi, M.N.; Jarus, N.; Joshi, K.A.; Sarvestani, S.S. Analysis of reliability and resilience for smart grids. In Proceedings of the International Computer Software and Applications Conference, Vasteras, Sweden, 21–25 July 2014.
Cholda, P.; Tapolcai, J.; Cinkler, T.; Wajda, K.; Jajszczyk, A. Quality of resilience as a network reliability characterization tool. IEEE Netw. 2009.
Bruneau, M.; Chang, S.E.; Eguchi, R.T.; Lee, G.C.; O’Rourke, T.D.; Reinhorn, A.M.; Shinozuka, M.; Tierney, K.; Wallace, W.A.; von Winterfeldt, D. A framework to quantitatively assess and enhance the seismic resilience of communities. 13th World Conference on Earthquake Engineering. Nat. Hazards 2004.
Sun, W.; Bocchini, P.; Davison, B.D. Resilience metrics and measurement methods for transportation infrastructure: The state of the art. Sustain. Resilient Infrastruct. 2020.
Li, J.; Zhou, Y. Optimizing risk mitigation investment strategies for improving post-earthquake road network resilience. Int. J. Transp. Sci. Technol. 2020.
Wang, J. Resilience thinking in transport planning. Civ. Eng. Environ. Syst. 2015, 32, 180–191.
Żurek, J.; Małachowski, J.; Ziółkowski, J.; Szkutnik-Rogoż, J. Reliability Analysis of Technical Means of Transport. Appl. Sci. 2020, 10, 3016.
Burt, C.N.; Caccetta, L. Equipment selection for surface mining: A review. Interfaces 2014.
Topal, E.; Ramazan, S. A new MIP model for mine equipment scheduling by minimizing maintenance cost. Eur. J. Oper. Res. 2010.
Chaowasakoo, P.; Seppälä, H.; Koivo, H. Age-based maintenance for a fleet of haul trucks. J. Qual. Maint. Eng. 2018.
Chaowasakoo, P.; Seppälä, H.; Koivo, H.; Zhou, Q. Improving fleet management in mines: The benefit of heterogeneous match factor. Eur. J. Oper. Res. 2017.
Yanagi, S. An iteration method for reliability evaluation of a fleet system. J. Oper. Res. Soc. 1993, 43, 885–896.
Alarie, S.; Gamache, M. Overview of solution strategies used in truck dispatching systems for open pit mines. Int. J. Surf. Min. Reclam. Environ. 2002.
Cao, M. Transportation Resilience: A summative review on Definition and Connotation. Adv. Intell. Syst. Res. 2015.
Zhang, X.; Miller-Hooks, E.; Denny, K. Assessing the role of network topology in transportation network resilience. J. Transp. Geogr. 2015.
Wan, C.; Yang, Z.; Zhang, D.; Yan, X.; Fan, S. Resilience in transportation systems: A systematic review and future directions. Transp. Rev. 2018, 38, 479–498.
Adjetey-Bahun, K.; Birregah, B.; Châtelet, E.; Planchet, J.L. A model to quantify the resilience of mass railway transportation systems. Reliab. Eng. Syst. Saf. 2016.
Miller-Hooks, E.; Zhang, X.; Faturechi, R. Measuring and maximizing resilience of freight transportation networks. Comput. Oper. Res. 2012.
Zhou, Y.; Wang, J.; Yang, H. Resilience of Transportation Systems: Concepts and Comprehensive Review. IEEE Trans. Intell. Transp. Syst. 2019.
Yodo, N.; Wang, P. Engineering resilience quantification and system design implications: A literature survey. J. Mech. Des. Trans. ASME 2016.
Ibrahim, M.; Alkhraibat, A. Resiliency Assessment of Microgrid Systems. Appl. Sci. 2020, 10, 1824.
Attoh-Okine, N.O.; Cooper, A.T.; Mensah, S.A. Formulation of resilience index of urban infrastructure using belief functions. IEEE Syst. J. 2009.
Hu, Z.; Mahadevan, S. Resilience assessment based on time-dependent system reliability analysis. J. Mech. Des. Trans. ASME 2016.
Zhuang, B.; Lansey, K.; Kang, D. Resilience/availability analysis of municipal water distribution system incorporating adaptive pump operation. J. Hydraul. Eng. 2013.
Hosseini, S.; Barker, K.; Ramirez-Marquez, J.E. A review of definitions and measures of system resilience. Reliab. Eng. Syst. Saf. 2016.
Bishop, M.; Carvalho, M.; Ford, R.; Mayron, L.M. Resilience is more than availability. In Proceedings of the New Security Paradigms Workshop, New York, NY, USA, 12 September 2011.
ISO 14224:2006. Petroleum, Petrochemical and Natural Gas Industries – Collection and Exchange of Reliability and Maintenance Data for Equipment; International Organization for Standardization: London, UK, 2006.
Russell, H.R. Methodology for Quantifying Resiliency of Transportation Systems; Embry-Riddle Aeronautical University: Prescott, FL, USA, 2020.
Stapelberg, R.F. Handbook of Reliability, Availability, Maintainability and Safety in Engineering Design; Springer: London, UK, 2009.
Bourouni, K. Availability assessment of a reverse osmosis plant: Comparison between Reliability Block Diagram and Fault Tree Analysis Methods. Desalination 2013.
Safder, U.; Ifaei, P.; Nam, K.; Rashidi, J.; Yoo, C. Availability and reliability analysis of integrated reverse osmosis–forward osmosis desalination network. Desalin. Water Treat. 2018, 109, 1–7.
Roda, I.; Garetti, M.; Arata, A.; Heidke, E. Model-based evaluation of asset operational availability. In Proceedings of the Summer School Francesco Turco, Senigallia, Italy, 11–13 September 2013.

Figure 1. Concept of system resilience (adapted from [37]).

Figure 2. The concept of gradual performance degradation process (adapted from [14]).

Figure 3. Availability computation.

Figure 4. System availability according to different system topologies.

Figure 5. System availability subject to degradation and shock (adapted from [12]).

Figure 6. System availability as a result of individual availabilities and topological aspects.

Figure 7. Schematic representation of the availability values considered in Equation (2).

Figure 8. Jack-knife diagram.

Figure 9. Actual availability of the extraction fleet.

Figure 10. System resilience index with preventive maintenance on critical equipment.

Figure 11. Resilience indexes of each sub-fleet with preventive maintenance in critical equipment.

Figure 12. Systemic resilience indexes with original topology and addition of drills.

Figure 13. Resilience indexes of the drilling sub-fleets with one additional piece of equipment.

Table 1. Availability metrics according to different system topologies.

Configuration	Equation
Series	$A_{s}(t) = \prod_{i=1}^{n} A_{i}$
Parallel	$A_{s}(t) = 1 - \prod_{i=1}^{n} (1 - A_{i})$
Partial Redundancy (k-out-of-n, identical units with availability $A$)	$A_{s}(t) = \sum_{r=k}^{n} \binom{n}{r} A^{r} (1 - A)^{n - r}$
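The three topologies in Table 1 can be evaluated with short helpers, sketched below under the usual assumption of independent units; the k-out-of-n case further assumes identical units with availability $A$. The function names are illustrative, and the checks at the end confirm that the k-out-of-n expression reduces to the series case for $k = n$ and to the parallel case for $k = 1$.

```python
from math import comb, isclose, prod
from typing import Sequence


def series_availability(avail: Sequence[float]) -> float:
    """All units must be available: product of the unit availabilities."""
    return prod(avail)


def parallel_availability(avail: Sequence[float]) -> float:
    """At least one unit must be available."""
    return 1.0 - prod(1.0 - a for a in avail)


def k_out_of_n_availability(a: float, k: int, n: int) -> float:
    """At least k of n identical, independent units must be available."""
    return sum(comb(n, r) * a**r * (1.0 - a) ** (n - r) for r in range(k, n + 1))


a, n = 0.85, 4  # illustrative unit availability and fleet size
assert isclose(k_out_of_n_availability(a, n, n), series_availability([a] * n))
assert isclose(k_out_of_n_availability(a, 1, n), parallel_availability([a] * n))
print(round(k_out_of_n_availability(a, 3, n), 4))
```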

Table 2. Reliability and maintainability parameters of the fleet equipment.

Equipment	MTBF(h)	MTTR(h)	Equipment	MTBF(h)	MTTR(h)	Equipment	MTBF(h)	MTTR(h)
D08	17.84	11.46	H40	23.06	2.27	H57	98.30	1.11
D09	161.60	2.28	H41	29.09	1.69	H58	75.47	1.03
D06	3.06	7.55	H42	20.62	1.27	H59	68.43	1.88
D07	3.10	9.46	H43	22.88	3.83	H60	50.00	1.19
D05	4.71	7.15	H44	23.18	1.70	H61	117.20	1.78
S04	9.94	1.61	H45	26.01	1.48	H62	110.13	1.04
S03	7.48	1.25	H46	28.10	2.54	H63	176.80	3.86
S02	7.75	1.66	H47	25.82	2.26	H64	251.67	1.30
H31	19.76	3.26	H49	23.77	1.40	H23	12.68	2.07
H32	20.75	2.63	H48	27.81	1.86	H24	12.45	2.03
H33	15.74	2.38	H50	20.85	1.53	H25	10.31	4.93
H34	20.20	3.42	H51	33.66	1.48	H26	15.14	4.93
H35	16.44	1.88	H52	33.02	1.65	H27	14.78	3.28
H36	26.00	1.70	H53	28.21	1.26	H28	14.20	3.28
H37	16.90	1.98	H54	35.45	0.87	H02	19.19	2.77
H38	22.43	2.14	H55	56.47	2.05	H04	17.55	2.73
H39	26.86	2.47	H56	71.11	2.60	H08	19.22	2.77
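For reference, the steady-state (inherent) availability of each unit can be computed from Table 2 as $A = \mathrm{MTBF}/(\mathrm{MTBF} + \mathrm{MTTR})$. The sketch below does this for the nine items of critical equipment used in Tables 3–5; it applies a standard textbook formula for illustration and is not necessarily the exact availability computation represented in Figure 3. The names `CRITICAL` and `steady_state_availability` are hypothetical.

```python
# (MTBF h, MTTR h) for the critical equipment, transcribed from Table 2.
CRITICAL = {
    "D07": (3.10, 9.46), "D05": (4.71, 7.15), "D08": (17.84, 11.46),
    "H25": (10.31, 4.93), "S02": (7.75, 1.66), "H34": (20.20, 3.42),
    "H43": (22.88, 3.83), "H31": (19.76, 3.26), "H04": (17.55, 2.73),
}


def steady_state_availability(mtbf: float, mttr: float) -> float:
    """Long-run fraction of time up for an alternating operation/repair process."""
    return mtbf / (mtbf + mttr)


# Rank equipment from least to most available.
ranking = sorted(CRITICAL, key=lambda name: steady_state_availability(*CRITICAL[name]))
for name in ranking:
    mtbf, mttr = CRITICAL[name]
    print(f"{name}: A = {steady_state_availability(mtbf, mttr):.3f}")
```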

Table 3. Simulation experiment results: planned maintenance applied to each item of equipment separately (duration: 6 h).

Equipment	3 Months	6 Months	9 Months	Average
D07	73%	72%	75%	74%
D05	75%	73%	76%	74%
D08	74%	75%	75%	75%
H25	77%	73%	75%	75%
S02	73%	75%	76%	75%
H34	75%	76%	75%	75%
H43	73%	72%	76%	74%
H31	70%	72%	70%	71%
H04	75%	72%	70%	72%
Ave.	74%	73%	74%	74%

Table 4. Simulation experiment results: planned maintenance applied to each item of equipment separately (duration: 12 h).

Equipment	3 Months	6 Months	9 Months	Average
D07	72%	73%	74%	73%
D05	73%	74%	74%	74%
D08	75%	75%	73%	74%
H25	75%	74%	73%	74%
S02	73%	74%	75%	74%
H34	73%	74%	74%	74%
H43	74%	73%	75%	74%
H31	75%	73%	72%	73%
H04	73%	72%	72%	72%
Ave.	74%	74%	74%	74%

Table 5. Simulation experiment results: planned maintenance applied to each item of equipment separately (duration: 18 h).

Equipment	3 Months	6 Months	9 Months	Average
D07	73%	72%	76%	74%
D05	74%	71%	77%	74%
D08	74%	73%	75%	74%
H25	73%	72%	75%	74%
S02	76%	75%	77%	76%
H34	71%	73%	71%	71%
H43	76%	75%	73%	75%
H31	75%	73%	74%	74%
H04	74%	75%	73%	74%
Ave.	74%	73%	74%	74%

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
