A Review of Reliability and Fault Analysis Methods for Heavy Equipment and Their Components Used in Mining

Prerita Odeyar; Derek B. Apel; Robert Hall; Brett Zon; Krzysztof Skrzypkowski

doi:10.3390/en15176263

¹ School of Mining and Petroleum Engineering, University of Alberta, Edmonton, AB T6G 2R3, Canada

² Department of Mining Engineering and Management (MEM), South Dakota School of Mines, Rapid City, SD 57701, USA

³ North American Construction Group, 27287-100 Avenue, Acheson, AB T7X 6H8, Canada

⁴ Faculty of Civil Engineering and Resource Management, AGH University of Science and Technology, 30-059 Kraków, Poland

Energies 2022, 15(17), 6263; https://doi.org/10.3390/en15176263

This article belongs to the Special Issue Volume II: Mining Innovation

Abstract

To achieve a targeted production level in the mining industry, all machine systems and their subsystems must perform efficiently and reliably throughout their lifetime. The implications of equipment failure have become more critical with the increasing size and intricacy of the machinery. Appropriate maintenance planning reduces the overall maintenance cost, increases machine life, and optimizes life cycle costs. Several techniques have been used in the past to predict reliability, and there has always been scope for improving them. Researchers continue to develop better methods for analyzing faults and reliability, ranging from traditional statistical methods to artificial intelligence. With the advancement of Industry 4.0, the mining industry is steadily moving towards a predictive maintenance approach to correct potential faults and increase equipment reliability. This paper attempts to provide a comprehensive review of the different statistical techniques that have been applied for reliability and fault prediction, covering both theoretical aspects and industrial applications. Further, the advantages and limitations of the algorithms are discussed, and the efficiency of new ML methods is compared with that of the traditional methods.

Keywords:

reliability; fault diagnosis; predictive maintenance; machine learning; lifetime distributions

1. Introduction

Reliability refers to the probability of a system meeting its desired performance standards in yielding output for a specific time duration when used under specific conditions [1]. For instance, if a machine designed to run continuously for 10,000 h completes that period with no faults, the machine is said to be 100% reliable for that period. However, if a failure occurs after 10,000 h of operation, the machine’s reliability after 10,000 h is less than 100% [2]. Component reliability is a function of time and is always measured at a specific operating time. Reliable operation is interrupted or terminated by failures. A failure is an event that results in the inability to complete the required duties and meet the requirements. Theoretically, reliability is defined as R(t) = 1 − F(t), where F(t) is the probability of failure by time t. Availability and maintenance are related to reliability and are regarded as essential components of it [3].

Understanding the complexities, efficiency, and failures of heavy equipment can help achieve better production results and reduce unexpected and unnecessary costs. Industries can maintain consistent levels of productivity by conducting regular reliability assessments [4]. Performance measurement is significant because it identifies gaps between existing and desired performance and shows how far those gaps have been closed [5,6]. A production system consists of many subsystems. To make the system efficient and viable to operate, each subsystem must be optimized with respect to the others. The system’s availability, reliability, and maintainability, as well as its ability to perform as intended, significantly impact the equipment’s effectiveness. Since the mid-1980s, reliability analysis methodologies have steadily gained acceptance as standard tools for developing and operating automated and complex mining systems [7].

A proper maintenance plan is of paramount importance for increasing or maintaining the system’s reliability at a standard level. The role of equipment maintenance has evolved in the last few decades, from merely being a part of production to an essential strategic element in mining operations. Since the early 2000s, maintenance has been recognized as a profit contributor, which has elevated it to the same level of importance as production [8]. With proper maintenance strategies, many abrupt failures can be prevented, decreasing downtime and increasing the system’s reliability. This helps in achieving targeted levels of production in the industry.

Equipment maintenance is so vital that around 35% to 50% of the annual operating budget can be spent on equipment maintenance and repair alone in the mining industry, and around 30% in the construction industry [9,10]. Maintenance in the mining and construction industries has come a long way in the last decade, aided by real-time data availability. There are four common maintenance approaches that can be applied to mine assets: reactive, preventive, condition-based, and prescriptive [11,12].

Reactive maintenance, often known as corrective or unscheduled maintenance, is conducted only when equipment fails. It can therefore result in considerable equipment downtime and many secondary failures, leading to a loss of production [13].

Preventive maintenance (PM) is carried out at predetermined intervals and according to a prescribed criterion; “it is intended to reduce any cost of unplanned maintenance from unexpected equipment failure” (EN 13306 2001). All preventive maintenance programs are time driven. The component to be maintained can be either replaced or reconditioned, depending on its condition. PM can be further categorized into condition-based and predicted maintenance [14].

Condition-based monitoring (CBM) is a form of preventive maintenance in which a system is repaired before it fails, based on signs of fatigue or other failure precursors. CBM creates an optimum maintenance period by extending the time between preventive maintenance actions and reducing the expense of unnecessary maintenance and downtime. CBM relies on the analysis of gathered condition data (such as vibration, crack propagation, oil, pressure, and viscosity) [15].

An overview of the maintenance classifications is shown in Figure 1. Any maintenance strategy should minimize equipment failure rates, improve equipment reliability, prolong the equipment’s life, and reduce maintenance costs. Many key performance indicators (KPIs) are used to monitor the long-term trends in reliability and maintenance performance. These KPIs help determine whether small and large modifications to maintenance practices and systems are having the desired effect over time. The mean time between failures (MTBF) and mean time to failure (MTTF) are two essential KPIs for assessing the system’s reliability and faults. A successful maintenance strategy and sound reliability policies resolve the issues that lead to equipment failures and show a steadily increasing performance trend that stabilizes at industry benchmark levels [16].

Figure 1. Different maintenance strategies [17].
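
As an illustration of the two KPIs named above, the following minimal sketch computes MTBF and MTTF from logged hours and, under the additional assumption of a constant failure rate (exponential lifetimes), converts MTBF into a reliability value R(t) = exp(−t/MTBF). The function names and the sample figures are hypothetical and are not taken from the reviewed studies.

```python
import math


def mtbf(operating_hours, n_failures):
    """Mean time between failures of a repairable unit:
    total operating time divided by the number of failures."""
    if n_failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return operating_hours / n_failures


def mttf(times_to_failure):
    """Mean time to failure of non-repairable items:
    arithmetic mean of the observed lifetimes."""
    if not times_to_failure:
        raise ValueError("no lifetimes supplied")
    return sum(times_to_failure) / len(times_to_failure)


def exponential_reliability(t, mtbf_hours):
    """R(t) = exp(-t / MTBF). Valid only if the failure rate is constant,
    which is an assumption of this sketch."""
    return math.exp(-t / mtbf_hours)


# Hypothetical figures: 1000 operating hours with 4 failures.
m = mtbf(1000.0, 4)
print(m, exponential_reliability(100.0, m))
```

Tracking these values month by month gives the kind of long-term trend the KPIs are intended to reveal; the exponential conversion should be dropped when the failure rate is known to change with time.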

Fault detection and reliability analysis of systems have evolved over the years. The history of the reliability field may be traced back to the early 1930s, when probability concepts were applied to problems associated with electric power generation. The beginning of the maintainability field may be traced back to 1901. By the 1960s, equipment maintenance activities started to be regarded as technical and involved optimizing maintenance solutions and activities [1]. Most literature on reliability and maintenance analysis in the mining and construction industries dates from 1975 onwards. During the 1975–1985 period, several works on the reliability analysis of mining equipment were based on theoretical approaches [18,19,20,21]. Authors used manually drawn probability density functions and Kolmogorov–Smirnov (KS) tests to identify the availability of continuous mine systems [22] and determined the reliability of bucket wheel excavators through probability calculations. In the following decade (1985–1995), graphical methods using total time on test and analytical methods using the KS test and maximum likelihood estimation were used for reliability testing [4,23,24]. Proportional hazard models were used to investigate the effects of two different designs and maintenance of power transmission cables [21]. Fault tree analysis and failure mode, effects and criticality analysis were used in the late 1980s for fault detection and reliability analysis [25,26,27,28]. By the 2000s, best-fit probability distributions obtained with reliability software were extensively used in mining to predict reliability and schedule maintenance using information from reliability plots [29]. Weibull++6 software (from ReliaSoft, Tucson, AZ, USA) was used to determine best-fit distributions for characterizing the failure pattern of two crushing plants and their subsystems. Statgraphics software was used to estimate the parameters of probability distributions for a shovel and its subsystems [30].
Most work in reliability has centered on estimating best-fit distributions for independent and identically distributed (i.i.d.) data and non-homogeneous Poisson process (NHPP) models for non-i.i.d. data [31,32,33]. The genetic algorithm was first applied to the reliability analysis of mining equipment in 2001 [34]. Pareto analysis and statistical modeling of failure and repair distributions were used for the reliability analysis of a hydraulic shovel [35]. Machine learning applications for mine equipment reliability analysis were largely introduced from the late 2000s. Several articles in the last ten years have used machine learning and deep learning for reliability and maintenance analysis. Genetic algorithms, discrete event simulations, support vector machine (SVM) regression, k-nearest neighbor (KNN) models, artificial neural networks (ANN), and reinforcement learning have been widely used for fault prediction and reliability analysis.

2. Methodology

Most of the literature and research work reviewed in this study concerns machine learning applications in fault detection and reliability analysis of equipment and its components, with a focus on the use of artificial intelligence and machine learning. This paper aims to provide a comprehensive review of advanced statistical and ML techniques widely applied for reliability and maintenance analysis, classifying the research according to the different statistical models and ML algorithms in order to offer guidelines and a foundation for further research. In addition, a critical analysis of previous articles was carried out to identify the advantages and shortcomings of the latest technological systems in the fault detection and maintenance field and to identify areas for future study.

To this end, the paper is organized into five sections. Section 1 gives a brief description of the current field of study. Section 2 presents the methodology employed to categorize the previous work. Section 3 presents the application of different traditional reliability methods. Section 4 discusses the application of ML methods used in failure and reliability predictions. Section 5 discusses the conclusions drawn from the review and the potential scope for future work.

Research databases including Google Scholar, Scopus, IEEE Xplore, ScienceDirect, and SpringerLink were mainly used for this study. Keywords such as reliability/maintenance/failure analysis/fault detection, mine equipment (component), and machine learning/statistical/graphical method were used in the searches. Figure 2 shows the number of documents reviewed and used in each segment.

Figure 2. Statistics of number of documents reviewed.

3. Review on Application of Different Traditional Methods Used in Reliability and Fault Analysis

3.1. Graphical Methods

Graphical methods can identify fault times and help monitor and schedule preventive maintenance. The total time on test (TTT) plot graphs the number of failures per unit against the total time on test per unit. This method assumes the times between failures (TBFs) to be independently and identically distributed, so the actual chronological ordering of the TBFs can be ignored. A TTT plot is therefore not suitable for evaluating failure data that have structure or test positive for serial correlation. However, a significant advantage of these plots is that they can be used to analyze incomplete data. The failure rate of the equipment can be inferred from the shape of the plot. If the plot is concave downwards, the equipment is deteriorating (increasing failure rate), but if it is concave upwards, the equipment is improving over time (decreasing failure rate) [36]. If the plot crosses the diagonal multiple times, the equipment has a constant failure rate [4]. Graphical methods can also be used to arrive at maintenance intervals, and TTT plots can be used to monitor equipment health in terms of a constant, increasing, or decreasing failure rate. The technique of TTT plotting, originally suggested by Barlow and Campo, is very simple to use for failure data analysis [4,37,38].
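
The construction of a scaled TTT plot can be sketched as follows. This is a minimal illustration for a complete (uncensored) sample of TBFs, using the usual textbook definition in which the i-th ordered time t(i) is mapped to the point (i/n, TTT(t(i))/TTT(t(n))), with TTT(t(i)) = t(1) + ... + t(i) + (n − i)·t(i). The function name and the sample data are hypothetical.

```python
def scaled_ttt(tbfs):
    """Return the points (i/n, scaled total time on test) of a scaled TTT plot.

    Assumes a complete sample of i.i.d. times between failures.
    Points lying above the diagonal (concave plot) suggest an increasing
    failure rate; points below it suggest a decreasing failure rate.
    """
    ordered = sorted(tbfs)
    n = len(ordered)
    if n == 0:
        raise ValueError("no failure data supplied")
    total = sum(ordered)          # total time on test at the last failure
    points = [(0.0, 0.0)]
    running = 0.0
    for i, t in enumerate(ordered, start=1):
        running += t
        # Time accumulated by all n units up to the i-th ordered failure.
        ttt_i = running + (n - i) * t
        points.append((i / n, ttt_i / total))
    return points


# Hypothetical TBFs in hours.
for u, phi in scaled_ttt([1.0, 2.0, 3.0]):
    print(round(u, 3), round(phi, 3))
```

Plotting the returned pairs together with the diagonal from (0, 0) to (1, 1) gives the picture described above; samples with censored observations would need a modified accumulation step that this sketch does not cover.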

Graphical approaches can also be used to verify the presence of trends in failure and repair data by plotting the cumulative number of failures against the cumulative time [36]. Before modeling the reliability data, the data should also be tested for mutual independence by checking for the presence of serial correlation. Serial correlation can be tested by plotting the ith TBF, X_i, against the (i − 1)th TBF, X_{i − 1}. If the plotted points exhibit no pattern, the TBFs can be interpreted as free from serial correlation. If the plot reveals serial correlation, the TBFs are then plotted at greater lags, such as X_i against X_{i − 2}, X_{i − 3}, X_{i − 4}, etc., to search for serial correlation over greater lags [39]. Since the 1990s, reliability and maintenance engineering has incorporated graphical methods, and recent studies show that graphical methods are still in use for initial exploratory investigation. The input data for the graphical approaches are time-to-failure (TTF) and TBF data. Graphical methods are typically used to estimate the reliability of large equipment such as excavators, draglines, and load-haul-dumpers (LHDs). From the existing literature, it can be deduced that graphical methods are mostly employed in planning maintenance intervals, identifying the machine’s failure trends (increasing/decreasing failure rate), and testing the goodness of fit of other reliability estimation methods [40].
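
The lag plot described above is a visual check; a numerical companion is the sample correlation between X_i and X_{i − k}. The sketch below computes that coefficient for a chosen lag k. It is offered as a complement to the graphical test rather than as the procedure used in the cited studies, and the function name and data are hypothetical.

```python
import math


def lag_correlation(tbfs, lag=1):
    """Pearson correlation between X_i and X_{i-lag} for TBFs kept in
    chronological order. Values near zero are consistent with the
    absence of serial correlation at that lag."""
    if lag < 1 or len(tbfs) - lag < 2:
        raise ValueError("need at least two pairs at the requested lag")
    current = tbfs[lag:]      # X_i
    previous = tbfs[:-lag]    # X_{i-lag}
    n = len(current)
    mean_c = sum(current) / n
    mean_p = sum(previous) / n
    cov = sum((c - mean_c) * (p - mean_p) for c, p in zip(current, previous))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_p = sum((p - mean_p) ** 2 for p in previous)
    if var_c == 0.0 or var_p == 0.0:
        raise ValueError("correlation undefined for constant data")
    return cov / math.sqrt(var_c * var_p)


# Hypothetical TBFs (hours), scanned over lags 1 to 3.
tbfs = [120.0, 95.0, 130.0, 80.0, 110.0, 70.0, 105.0, 60.0]
for k in (1, 2, 3):
    print(k, round(lag_correlation(tbfs, k), 3))
```

Note that the chronological order of the TBFs must be preserved here, unlike in the TTT plot, where the data are sorted.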

In [4], the authors used a TTT plot to estimate the reliability of LHD machines and identified components that needed design improvement. In [23], a scaled TTT plot was used to review the goodness of fit of the power-law-process model using both graphical and analytical procedures. In [41], the authors used TTT plots for i.i.d. failure data to plan maintenance intervals for material handling equipment operating in the mining industry. In [42], the authors collected failure data of hydraulic shovels over a period of 1.5 years, analyzed the machine’s reliability using distribution plots, and studied increasing/decreasing failure rates using a TTT plot. Failure mode and effects analysis (FMEA) and TTT plots were used to study the reliability of a cone crusher [40,43].

3.2. Fault Tree Analysis

Fault tree analysis (FTA) translates a physical system into a logical diagram, making it one of the industry’s most popular approaches for reliability and safety calculations. It can also be used to revise a system’s configuration to make it less vulnerable and less sensitive [44]. Fault trees can also assess the impact of design changes or proposed corrective actions [45]. The causes of an event are deduced using a top-down deductive analysis. The components of a fault tree are “events” and “logic gates”, which connect the events to determine the reason for the top unwanted event. The process of creating a fault tree is one of trial and error, and no failure causes should be overlooked [46]. The completed fault tree is assessed in light of the goals of the analysis. The evaluation has several stages: listing minimal cut sets, grading minimal cut sets, calculating probabilities, and so on. FTA is most useful when quantitative data on the likelihood of events are available, although qualitative analysis is also possible [44]. Other risk analysis approaches are not as effective at discovering faults as fault trees, whose visual presentation of the failure causes makes it simple to identify a single failure that leads to a complete system failure. A fault tree is often normalized to a given interval, and an event’s probability depends on the relationship between the event risk function and this interval. The reliability is calculated through a sequence of gates, considering the probabilities of the outputs of a set of Boolean logic operations. Two major approaches used for determining minimal cut sets for fault trees are Monte Carlo simulation and deterministic methods. A basic fault tree structure is shown in Figure 3. According to the literature, FTA is used in the fault analysis of heavy earth-moving machinery (HEMM). Several studies using standard fault trees (SFTs) and dynamic fault trees (DFTs) have been published in the last five years.
From the previous work, it can be noted that FTA is used both with descriptive and numerical data combined with Boolean algebra to make decisions on optimized maintenance intervals, qualitative and quantitative fault analysis, and reliability estimations of the equipment [47]. FTA was helpful in identifying risk priority number (RPN), equipment value, and impact on value, identifying basic events that cause failures, and building mathematical models by logically correlating the events.

Figure 3. Basic fault tree representation [48].
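
The gate-by-gate probability calculation mentioned above can be sketched for a static fault tree with AND and OR gates. The sketch assumes that all basic events are statistically independent and that no basic event appears in more than one branch; when events repeat, a minimal-cut-set evaluation is needed instead. The tree layout, event names, and probabilities are hypothetical.

```python
from math import prod


def top_event_probability(node, basic_probs):
    """Evaluate a static fault tree recursively.

    A node is either the name of a basic event (str) or a tuple
    (gate, children) with gate in {"AND", "OR"}.
    Assumes independent, non-repeated basic events.
    """
    if isinstance(node, str):
        return basic_probs[node]
    gate, children = node
    child_probs = [top_event_probability(c, basic_probs) for c in children]
    if gate == "AND":
        # All inputs must occur.
        return prod(child_probs)
    if gate == "OR":
        # At least one input occurs.
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree: the system fails if the pump fails OR both valves fail.
tree = ("OR", ["pump_fails", ("AND", ["main_valve_fails", "backup_valve_fails"])])
probs = {"pump_fails": 0.1, "main_valve_fails": 0.2, "backup_valve_fails": 0.5}
print(top_event_probability(tree, probs))
```

The system reliability over the same interval is one minus the returned top-event probability.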

In [49], the authors used fault tree analysis to understand the effect of each component or subsystem of a dragline on its reliability and to gain insight into an optimized maintenance schedule. The probability distribution that best described the TTF data of each dragline subsystem was identified. The obtained distributions were then combined with a fault tree defining the system to identify the influence of individual component reliability on the dragline. The dragging rope was predicted to contribute the highest number of failures within a year, whereas the motors and generators would cause the longest downtime if they failed. Probability values were also useful in deciding which components need attention at certain time intervals. In [50], the authors used fault tree analysis for fault identification of a CNC turning center. Boolean algebra was used to evaluate the fault tree (FT) diagram and to derive the machine’s governing reliability model. Qualitative and quantitative analyses were carried out to identify critical subsystems and components of the CNC turning center. The results comprise an estimate of the reliability of the CNC machine after the one-year warranty period and the number of failures during this period. In [51], the authors used fault tree analysis to analyze failures associated with a mine cage conveyance, showing the various branches of events that can lead to failures and their order of criticality for the associated components. Failures associated with one or more components compromised the effectiveness of the mine cage conveyance as a system, and efforts were geared toward managing the critical components identified in the study by reviewing the existing maintenance plans and developing more robust strategies.
In [52], the authors developed a methodology to determine the critical machine of a company based on impact on production, impact on value, standby availability, and equipment value. The identified machine was then analyzed in detail using failure mode and effect analysis and fault tree analysis to determine its risk priority number (RPN). The RPN is the product of the severity rating, the probability of occurrence, and the probability of detection [53]. A case study applied a fault tree to a heavy-duty machine’s hydraulic system, and the results show that 27 basic events can cause hydraulic system failure, of which oil pollution is the most critical. Because the outcome of a quantitative analysis depends entirely on the precision of the numerical data used, unresolved uncertainties can produce misleading results. Hence, different methodologies, mainly based on fuzzy numbers, have been proposed to tackle the issue of uncertain failure data in FTA.
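
The RPN definition above reduces to a simple product, which the following sketch applies to rank failure modes. The 1 to 10 rating scale is an assumption borrowed from common FMEA practice and is not stated in the reviewed sources; the failure modes and their ratings are hypothetical.

```python
def risk_priority_number(severity, occurrence, detection):
    """RPN = severity x occurrence x detection.
    Each rating is assumed to lie on a 1-10 integer scale."""
    for name, value in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10")
    return severity * occurrence * detection


def rank_failure_modes(ratings):
    """Return (failure mode, RPN) pairs sorted from highest to lowest RPN."""
    scored = [(mode, risk_priority_number(*r)) for mode, r in ratings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical ratings: (severity, occurrence, detection).
ratings = {
    "oil contamination": (8, 6, 5),
    "hose rupture": (9, 3, 2),
    "pump wear": (6, 4, 4),
}
for mode, rpn in rank_failure_modes(ratings):
    print(mode, rpn)
```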

Standard fault trees (SFTs) can only assess the reliability of static systems. The dynamic nature of a system leads to several dynamic failure features, such as functionally dependent events and priorities among failure events. Although SFTs are commonly used for dependability analysis, they are incapable of capturing such dynamic behavior. SFTs have been extended in several ways to support dynamic dependability analysis, such as dynamic fault trees (DFTs), state-event fault trees, and stochastic hybrid fault trees. The DFT is one of the most extensively used dynamic extensions of the SFT, and it captures sequence-dependent behavior, functionally dependent component behavior, and event priority [54,55]. In [56], the authors proposed a method for building the dynamic fault tree of a roadheader. A modular method was used to split the fault tree into dynamic and static modules; a binary decision tree was used to analyze the static module, and the logical relationship between faults was used to assess the dynamic module. In [57], the authors constructed a dynamic fault tree for an electric haulage shearer using a binary decision tree and the Markov method in a modular approach. The study revealed that improper installation of the first shaft bearing, cage off of first shaft bearing, cutting motor damage and poor quality of lubricating oil were the major contributors to the faults of the shearer [44].

FTA’s design concept also reveals its limitations. It focuses on building a mathematical model of a complex physical condition by logically correlating events. If the peripheral, environmental, and operating parameters are not all given, the strategy relies solely on the analyst’s judgement [58]. Another important difficulty with quantitative FTA is the lack of reliable and meaningful failure data and event probabilities. The most notable drawback is the cost of development when FTA is applied to a system for the first time. For investigating small systems, inductive analysis approaches such as failure-mode-and-effects analysis are significantly easier and less expensive to deploy [58].

Even though several fault tree extensions have been proposed, they all have a variety of shortcomings. Even when software tool support is available, many investigations involve a significant amount of manual work. Over the last two decades, researchers have focused on ways to automate the synthesis of dependability information from system models, with the goal of simplifying dependability analysis. As a result, the field of model-based dependability analysis (MBDA) has emerged [44]. As part of MBDA, many tools and approaches for automating the development of dependability analysis artifacts, such as fault trees, have been developed. Because the analyses in MBDA are carried out on formal models, they can be repeated iteratively, which makes it easier to generate updated results when the system design changes. Compared with manual procedures, this process takes less time and costs less money, and because it is more structured, the chances of introducing errors in the analysis or producing incomplete results are reduced. Furthermore, by allowing sections of an existing system model or libraries of previously analyzed components to be reused, the MBDA methodologies give a higher degree of reusability [44,55].

3.3. Probability Distributions and NHPP Models

The reliability of a system and its subsystems can be determined from the failure rate using probability distribution methods. Both parametric and non-parametric methods are used in reliability estimation. Trend and correlation tests can be used to check whether the data points are independent and identically distributed. Parametric distributions can be used if no trend or correlation is observed in the data. Otherwise, non-parametric methods can be used to analyze the data. In non-parametric methods, the failure data are analyzed without assuming any particular distribution. The non-parametric analysis methods include the Kaplan–Meier, simple actuarial, and standard actuarial methods. Reliability evaluation by the parametric method involves fitting the failure data to a statistical distribution, such as the exponential, normal, Weibull, or lognormal. This results in a better understanding of failure, and the resulting model can be used for analytical evaluation of reliability parameters over the whole lifespan of the system.
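
Of the non-parametric methods listed above, the Kaplan–Meier estimator is the simplest to illustrate. The sketch below implements the standard product-limit formula, in which the reliability estimate is multiplied by (1 − d/n) at each failure time, with d failures among n units still at risk. Right-censored observations (units removed before failing) reduce the risk set without lowering the estimate. The function name and data are hypothetical.

```python
def kaplan_meier(durations, failed):
    """Product-limit estimate of R(t).

    durations: operating time of each unit until failure or censoring.
    failed:    True if the unit failed at that time, False if censored.
    Returns a list of (failure time, estimated reliability just after it).
    """
    if len(durations) != len(failed):
        raise ValueError("durations and failed must have the same length")
    data = sorted(zip(durations, failed))
    at_risk = len(data)
    reliability = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all observations that share the same time stamp.
        while i < len(data) and data[i][0] == t:
            failures += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if failures:
            reliability *= 1.0 - failures / at_risk
            curve.append((t, reliability))
        at_risk -= leaving
    return curve


# Hypothetical data: the unit observed for 20 h was withdrawn before failing.
print(kaplan_meier([10.0, 20.0, 30.0, 40.0], [True, False, True, True]))
```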

Parametric probability distributions are used both in stochastic analyses of system reliability, where the systems are mostly assumed to be fully known and the corresponding properties of the system are analyzed, and in statistical inference, where process data are used to estimate the parameters of the distribution, often followed by a specific inference of interest [59]. Goodness-of-fit tests such as the Chi-square, Kolmogorov–Smirnov, Anderson–Darling, and Shapiro–Wilk tests are used to analyze how well a distribution fits the given data. The model that most efficiently describes the data can be selected for reliability estimation based on these goodness-of-fit tests. Among all distributions, the Weibull distribution is usually the most widely used to evaluate system reliability, because it can represent an assortment of life behaviors. In this distribution, the cumulative probability, failure rate, and probability density function (PDF) curves change with variations in the shape parameter β, the scale parameter η, and the location parameter γ. The shape parameter mainly indicates the condition of the system. If β < 1, the failure rate of the system or component decreases with time. This condition can be treated as early-life failure. Weibull distributions with β close or equal to 1 have an approximately constant failure rate, which corresponds to the useful life period. Similarly, Weibull distributions with β > 1 have a failure rate that increases with time, denoted as wear-out failure. A typical ‘bathtub curve’ plot clearly depicts these three failure zones. Figure 4 shows the bathtub curve representing the failure rate over time.

Figure 4. Bathtub curve representing equipment failure rate [60].
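
The role of the shape parameter can be checked numerically with the standard three-parameter Weibull expressions R(t) = exp(−((t − γ)/η)^β) and h(t) = (β/η)((t − γ)/η)^(β − 1). The sketch below is a minimal illustration; the parameter values are hypothetical and chosen only to show the three regions of the bathtub curve.

```python
import math


def weibull_reliability(t, beta, eta, gamma=0.0):
    """R(t) for a Weibull distribution with shape beta, scale eta and
    location gamma. No failures are possible before t = gamma."""
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / eta) ** beta))


def weibull_hazard(t, beta, eta, gamma=0.0):
    """Failure rate h(t); defined here only for t > gamma."""
    if t <= gamma:
        raise ValueError("hazard is evaluated for t > gamma only")
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1.0)


# Hypothetical scale of 1000 h; compare the failure rate at 100 h and 500 h.
for beta in (0.5, 1.0, 2.0):
    early = weibull_hazard(100.0, beta, 1000.0)
    late = weibull_hazard(500.0, beta, 1000.0)
    trend = "decreasing" if late < early else "increasing" if late > early else "constant"
    print(beta, trend)
```

For any β, the reliability at t = γ + η equals exp(−1), about 36.8%, which is why η is often called the characteristic life.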

Most work in the literature on equipment reliability estimation and maintenance analysis is based on probability distributions. TBF, TTF, and time-to-repair (TTR) data are mainly used in parametric and non-parametric estimations. Probability distributions and NHPP models are mainly used in reliability-centered maintenance and to identify critical systems and subsystems of the equipment. In [61], the authors presented a reliability analysis of a shovel-dumper system in an open pit coal mine based on probability density functions and failure rates. The KS test was used to evaluate the best-fit distribution for the TBF data of the shovels and dumpers. In [62], the authors adopted a three-parameter Weibull distribution approach to analyze the data sets of load-haul-dumpers (LHDs) in underground mines using the ‘Isograph Reliability Workbench 13.0’ software package. The parameters were evaluated using best-fit distributions and Weibull likelihood plots, and the percentage reliability of each individual LHD subsystem was estimated. Using the results, the authors identified preventive maintenance time intervals and enhanced the overall reliability of the LHD. The equipment performance evaluation was based on availability and utilization. In [63], the authors presented a case study describing the reliability analysis of crushing plants in a bauxite mine, in which the crushing plants were divided into seven subsystems and reliability analysis was performed for each subsystem using failure data. The parameters of some idealized probability distributions were estimated using ReliaSoft’s Weibull++6 software, and the best-fit distributions that characterized the failure pattern of the two crushing plants and their subsystems were identified. Further, the reliability of both crushing plants and their subsystems was estimated for different mission times using the best-fit distributions.
Other aspects of system failure behavior were also analyzed briefly for machine improvement. Analysis of the total downtime, breakdown frequency, reliability, and maintainability characteristics of the different subsystems shows that the reliability of crushing plant 1 and crushing plant 2 after 10 h reduces to about 64% and 35%, respectively. The study showed the importance of reliability and maintainability analysis for deciding maintenance intervals and for planning and organizing maintenance. In [64], the authors considered two approaches (a basic maintenance approach and a reliability-based approach) to analyze maintenance data. To find the best-fit distribution, different types of statistical distributions were tested with the EasyFit software. The model developed from these data showed that the reliability of loaders No. 1 and No. 2 decreased to zero after approximately 477 h and 309 h of operation, respectively, and the authors suggested that the maintenance program be reviewed to increase reliability. In [65], the authors presented a reliability analysis of load-haul-dumpers in an underground coal mine. The distribution parameters were estimated by both graphical and maximum likelihood estimation (MLE) procedures, and the goodness-of-fit test was carried out using the Cramér–von Mises statistical test. Further, using this analysis, the total cost of operation was reduced by estimating reliability-based preventive maintenance time intervals. In [66], the authors presented a case study describing the reliability analysis and life cycle cost optimization of a band saw cutting machine. A few components followed a parametric distribution, and certain components followed a non-parametric distribution. The failure distribution parameters for each component of the machine were estimated using ReliaSoft’s Weibull++6 software.
The results of the analysis indicate the critical parts of the machine; with certain design changes proposed by the authors, the overall reliability of the system improves by around 16% and the life cycle costs are reduced by 22%. In [67], the authors used a renewal process (Poisson distribution) for modelling the mechanical failures of LHDs. A graphical method was used to test whether the data were independent and identically distributed (i.i.d.). The parameters of various distributions were found using MathWave EasyFit 5.6 Professional software. The Chi-square test was applied to select the best-fit distribution model. Further, the two-parameter lognormal distribution and its parameters are presented using lognormal probability theory. The study reflects that reliability analysis is a powerful tool for determining maintenance intervals. Weekly maintenance was suggested for the machine to achieve a reliability of 75%. In [68], the authors developed a basic methodology for the reliability modelling and development of a maintenance program for a fleet of four drilling rigs. Failure and performance data were collected from the Sarcheshmeh copper mine in Iran over two years. The available data were then classified and analyzed, and the reliability of all subsystems and of the whole rigs was modelled and studied. EasyFit and MS Excel software were used for data analysis and for finding the best-fit distributions and parameters, and the Kolmogorov–Smirnov (K-S) test was used to select the best distributions. NHPP and renewal processes were used for the reliability modelling of the subsystems of the drill rigs. The probability of each fleet state was calculated, and maintenance operations were suggested for 80% reliability.
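
The best-fit selection step that recurs in these studies can be sketched by computing the Kolmogorov–Smirnov statistic D, the largest gap between the empirical and the fitted cumulative distribution functions, for each candidate and preferring the smallest value. The sketch below is not the procedure of any particular software package. It considers only two candidates (exponential and two-parameter lognormal) because their maximum likelihood estimates have closed forms, and it reports D for ranking only: standard KS critical values do not strictly apply when the parameters are estimated from the same sample. All names and data are hypothetical.

```python
import math


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of the sample and cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit_exponential(sample):
    """Exponential CDF with the MLE rate (1 / sample mean)."""
    rate = len(sample) / sum(sample)
    return lambda t: 1.0 - math.exp(-rate * t)


def fit_lognormal(sample):
    """Lognormal CDF with MLE parameters of log(t); needs non-identical data."""
    logs = [math.log(t) for t in sample]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return lambda t: 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def rank_fits(sample):
    """Return (D, distribution name) pairs, best fit first."""
    candidates = {"exponential": fit_exponential(sample),
                  "lognormal": fit_lognormal(sample)}
    return sorted((ks_statistic(sample, cdf), name) for name, cdf in candidates.items())


# Hypothetical TBFs in hours.
tbf_hours = [12.0, 35.0, 7.0, 58.0, 21.0, 90.0, 15.0, 44.0]
for d, name in rank_fits(tbf_hours):
    print(name, round(d, 3))
```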

In [69], the authors studied the reliability of a drum shearer machine using operation and maintenance data from an Iranian mine for a period of two years. The tests for trend and serial correlation showed that the times between successive failures for the cable system were not independent and identically distributed, and the graphical tests revealed that the cable system of the shearer is a deteriorating system. A goodness-of-fit test showed that the power law process model is a good fit for this system’s failure data. After parameter estimation for the power law model, reliability and failure rate plots were obtained. Based on the analysis and results, a period of 125 h was defined as the reliability-based maintenance interval for the cable system of the shearer. The analysis shows that, using this strategy, the system’s reliability would improve by at least 50%. In [70], the authors studied the reliability, availability, and maintainability (RAM) of a 36T dumper machine from failure and repair data using the Kaplan–Meier estimator (KME) method and outlined the constraints and reasons for machine unavailability. The results were verified using maximum likelihood estimation and piecewise exponential estimation methods. The reliability and maintainability of the LHD system were found to be poor, and the authors suggested maintenance planning and machine improvement based on this analysis. The Kaplan–Meier estimator is used to find the design life and optimal maintenance period, which are useful information in maintenance planning. In [71], the authors developed a computational tool, programmed with VBA in Excel, for reliability and failure analysis of underground rock bolters. The developed approach used the modelling of stochastic processes, such as the renewal process, the non-homogeneous Poisson process, and the Bayesian approach. The tool gives the best-associated model, the parameter estimates, the mean time between failures and the reliability estimate. 
This approach is validated with the reliability analysis of inter-failure times from underground rock bolter subsystems over a two-year period. Results show that the Weibull and lognormal probability distributions fit most subsystem inter-failure times. The study revealed that the bolting head, the rock drill, the screen handler, the electric/electronic system, the hydraulic system, the drilling feeder and the structure have a high repair frequency. The hydraulic and electric/electronic subsystems showed the lowest reliability after 50 operating hours. In [23], the authors conducted a preliminary analysis of a fleet of LHD machines, found that the engine and hydraulic systems are the two most critical systems, and selected the hydraulic systems for further study. Maintenance data for two years for these machines were analyzed. The tests for trends and serial correlation showed that times between successive failures for the hydraulic systems were in most cases not independent and identically distributed. Goodness-of-fit tests (Cramer–von Mises test and graphical methods) showed that the power law process model is a good fit for the hydraulic systems’ failure data. Methods for parameter estimation in the power law process model and estimation of optimal maintenance intervals for the LHDs are presented, emphasizing the use of graphical methods for data analysis.
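
The power law process applied in [69] and [23] models the expected cumulative number of failures as E[N(t)] = lambda * t^beta. A minimal sketch of its maximum likelihood fit for time-truncated data is given below. The failure times are invented, and the closed-form estimators are the standard ones for this model, not necessarily the procedure followed in either study. A fitted beta greater than 1 indicates a deteriorating system.

```python
import math


def fit_power_law(failure_times, end_time):
    """Maximum likelihood estimates (lam, beta) for a time-truncated power law process.

    failure_times: cumulative operating hours at each failure.
    end_time: total observation time, at or after the last failure.
    """
    n = len(failure_times)
    beta = n / sum(math.log(end_time / t) for t in failure_times)
    lam = n / end_time ** beta
    return lam, beta


def intensity(t, lam, beta):
    """Failure intensity u(t) = lam * beta * t ** (beta - 1)."""
    return lam * beta * t ** (beta - 1)


# Hypothetical cumulative failure times (hours); failures arrive faster over time.
times = [120, 310, 450, 560, 640, 700, 750]
lam, beta = fit_power_law(times, end_time=800)
print(f"beta = {beta:.2f} ({'deteriorating' if beta > 1 else 'not deteriorating'})")
print(f"failure intensity at 800 h = {intensity(800, lam, beta):.4f} failures/h")
```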

4. Machine Learning Applications in Failure Predictions and Reliability Estimations

Machine Learning (ML) is a subclass of artificial intelligence (AI) that can be defined as a semi-automated system in which computers create an algorithm by learning from observed data. Machine learning algorithms create a model based on training data and use it to make predictions or judgments without having to be explicitly programmed to do so. In recent years, decision makers and the scientific community have paid close attention to the use of machine learning in risk and reliability assessment. Currently, a considerable amount of work is being carried out on mine equipment failure and reliability assessment and on predictive maintenance analysis [72]. A machine learning approach can be used both to predict failures and to identify the important parameters that predict failures.

From the equipment failure perspective, machine learning can be useful to replace or repair a component before a fault happens and restore the original condition of the equipment to maintain reliability. The algorithms use previous failure data or the equipment’s vibration/condition monitoring data to study failures and make predictions. This would lead to decreased downtime and achieve expected production levels at all times. Machine learning helps predict future failures to accurately schedule maintenance operations. ML techniques are designed to derive knowledge out of existing data. The following diagram (Figure 5) gives a basic understanding of ML application for fault analysis.

Figure 5. Workflow for developing data-driven ML model for fault detection.

Businesses can profit from big data since it aids in guiding systems with a prescriptive maintenance strategy. To improve the performance of machine learning algorithms, it is critical to acquire usable data from the dataset [73]. Depending on the availability of labelled data, ML-based data-driven methods can be further classified as supervised, semi-supervised or unsupervised approaches. Machine learning algorithms are classified into taxonomies based on the algorithm’s expected outcome. The following is a list of common algorithm types:

Supervised learning: The algorithm creates a function that maps inputs to outputs. Output variables are known. The classification problem is a common supervised learning challenge in which the learner must learn (or estimate the behaviors of) a function that maps a vector into one of many classes by studying multiple input-output samples of the function.
Unsupervised learning: There is no target or outcome variable to predict/estimate in this method. It is used for clustering populations in different groups and when there is a lack of sufficiently labelled data [74].
Semi-supervised learning: Combines both labelled and unlabelled examples to generate an appropriate function or classifier [75].
Reinforcement learning: The machine is taught to make a certain decision using this algorithm. It works like this: the machine is placed in an environment where it would constantly train itself through trial and error. This system learns from its previous experiences and seeks to capture as much information as possible to make accurate decisions [74].

Predictive modelling can be described as the mathematical problem of approximating a mapping function (f) from input variables (X) to output variables (y). This is called the problem of function approximation. The algorithms are divided into two types, classification and regression, based on the output variable. Classification predictive modelling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y). The output variables are often called labels or categories. The mapping function predicts the class or category for a given observation. Regression predictive modelling is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (y). A continuous output variable is a real value, such as an integer or floating-point value. Classification models use metrics such as accuracy, precision, recall, F1-score, the ROC curve, the confusion matrix, specificity, sensitivity, and AUC to evaluate model performance. Regression models use mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), bias-variance analysis and learning curves to estimate error and evaluate model performance [76]. Classification models are mostly used in the literature to predict and classify faults. A few ML algorithms that are widely used in the literature on equipment reliability and fault analysis are discussed in this section.
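
The classification metrics listed above all derive from the four cells of a binary confusion matrix. The following sketch shows the relationship; the function name and the label vectors are hypothetical, with 1 taken to mean "fault" and 0 "no fault".

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical ground truth and predictions (1 = fault, 0 = no fault).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

For fault data, where failures are usually rare, recall and precision tend to be more informative than accuracy alone, since a model that never predicts a fault can still score a high accuracy.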

4.1. Support Vector Machine (SVM)

Support vector machine (SVM) is a supervised machine learning algorithm that can be used for classification and regression problems. In the SVM algorithm, each data item is plotted as a point in n-dimensional space where n is the number of features considered, with each feature being the value of a particular coordinate [77]. Then, the aim is to perform classification by finding the hyper-plane that differentiates the two classes very well. SVMs maximize the margin around the separating plane, and the decision function is fully specified by a subset of training samples called the support vectors [78,79]. The optimal SVM hyperplane for binary classification is represented in Figure 6.

Figure 6. Optimal hyperplane for binary classification [81].

A separating hyperplane can be used to divide data that are linearly separable. However, the data are frequently non-linear, and the classes are closely intertwined. To account for this, the input data are non-linearly mapped to a high-dimensional space, in which the mapped data become linearly separable. The kernel trick allows SVMs to form nonlinear boundaries. The kernel function’s purpose is to allow operations to be conducted in the input space instead of the possibly high-dimensional feature space. As a result, the two classes can be separated in the feature space. Different kernel functions exist, such as polynomial, radial basis function (RBF), and sigmoid functions, and the choice of a kernel function is determined by the application [80]. From the literature review, it can be noted that SVM is mainly used for forecasting failures, fault diagnosis and pattern recognition. The previous works used TTF, TBF, audio signals, vibration data, and fault states as input data for SVM algorithms. In terms of time frame, SVM was widely popular in mining from 2010 to 2015.
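
The kernel functions named above can be written in a few lines. The sketch below uses common textbook parameterizations (the parameter names gamma, coef0 and degree are assumptions, not taken from the cited works) and checks the kernel trick for one simple case: a degree-2 polynomial kernel evaluated in the 2-D input space gives the same value as an ordinary dot product in an explicit 3-D feature space.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, z, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


def phi(x):
    """Explicit feature map matching the kernel (x . z) ** 2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)
in_input_space = polynomial_kernel(x, z, degree=2, gamma=1.0, coef0=0.0)
in_feature_space = dot(phi(x), phi(z))
print(in_input_space, in_feature_space)  # the two values agree up to rounding
```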

In [82], the authors used SVM to detect defects and fault patterns of unexpected heavy equipment failures. An SVM classifier was used to divide the data into normal and abnormal classes; only the normal data were used for learning with a restricted Boltzmann machine (RBM), and faults in the system were then identified based on the learned patterns. In [83], the authors used the SVM regression (SVR) algorithm to forecast TBFs using historical observations of LHD failures. A Pareto analysis identified the LHD’s engine as the most critical system. TBFs of 32 failures were obtained. Twenty-five records were used for SVR modelling and the remaining records for testing. Mean absolute percentage error (MAPE) and normalized root mean square error (NRMSE) values were used to evaluate model performance. A polynomial kernel function of the third degree resulted in the best predictions (minimum errors). An absolute percentage error value of less than 2% was achieved, demonstrating the excellent forecasting applicability of SVR. In [84], the authors explored the application of the SVM classification approach for pattern recognition and failure forecasting on mining shovels. The failure behavior of a fleet of ten mining shovels during 1 year of operation was investigated, and the shovels were classified into four clusters, based on their reliability, using the k-means clustering algorithm. Future failures were predicted using the SVM classification technique. Historical failure (component type) and time to repair data were used to predict the next failure type for all shovels. Four different kernel functions, namely linear, polynomial, RBF and sigmoid functions, were examined in combination with different values of the C parameter using a grid search. The best C–K pair, that is, the one that resulted in the maximum number of correct classes for the test dataset, was selected for each shovel from each cluster, and the results were validated using particle swarm optimization. 
The SVM technique was shown to be successful, with a prediction accuracy of over 75%. In [85], the authors proposed principal component analysis (PCA) with the SVM method for fault diagnosis of mine hoists. PCA was used to extract relevant time domain and frequency domain features, and from these a multi-class SVM model with nine different fault states as output was built. A comparison of various methods showed that the PCA-SVM method successfully diagnosed faults in the mine hoist system. The RBF kernel function had the best classification properties, and the accuracy of the model turned out to be around 98%. In [86], the authors developed an SVM-based ensemble model for reliability forecasting of a mine dumper. The hyperparameters of the SVM models were selected by applying a genetic algorithm. A case study was conducted investigating a dumper operated at a coal mine in India. Time-to-failure historical data for the LHD were collected, and cumulative time to failure was calculated for reliability forecasting. Study results demonstrate that the developed model performs well, with high accuracy (determination coefficient R² = 0.97) in the prediction of LHD future failure times, and a comparison with other methods demonstrates the superiority of the proposed ensemble SVM model. In [87], the authors proposed an automated operating mode classification method to increase the performance of vibration-based online condition monitoring systems for applications such as gearboxes, motors, and their constituent components. Several variations of the system were tested and found to be successful. The swing machinery system of an electromagnetic excavator was used to see how this method functions on dynamic signals gathered from an operating machine. The empty and full swing cycles are the two classification classes, with vibration and speed as input parameters. 
SVM and other classification models were used to analyze swing performance. Data were collected over a period of 45 h of operation. In [88], the authors developed a method for monitoring and tracking both location and action for automated construction equipment, proposing an audio-based approach for tracking and activity analysis of heavy construction equipment. The equipment generates distinct sound patterns while performing a certain task, and these audio signals are filtered and converted into time–frequency representations. These data are classified into different activity representations using a multiclass SVM classification algorithm, and the results demonstrated the potential to correctly recognize various equipment actions with 80% model accuracy.
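
The two error measures used to assess the SVR forecasts in [83] can be sketched as follows. NRMSE has several conventions; normalization by the range of the observed values is assumed here and may differ from the definition used in that study. The TBF values are invented.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """Root mean square error divided by the range of the actual values (assumed convention)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / (max(actual) - min(actual))


# Hypothetical observed and forecast times between failures (hours).
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(f"MAPE  = {mape(observed, forecast):.2f}%")
print(f"NRMSE = {nrmse(observed, forecast):.4f}")
```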

4.2. The k-Nearest Neighbors (KNN)

The k-nearest neighbors (KNN) method is a supervised machine learning algorithm that can be used to address classification and regression problems [89]. KNN is a kind of instance-based learning (also known as lazy learning), in which the function is only estimated locally and all computation is deferred until classification. KNN is one of the most basic and simplest classification algorithms and is useful when there is very little prior knowledge about the data distribution. The data points are categorized based on how their neighbors are classified. The algorithm’s idea is that data points with similar characteristics lie in close proximity. Given a K value, the nearest K neighbors are chosen for any new point, and the class containing the most points out of the K points is allocated to the new point. The choice of K, as well as the distance measure used to pick the nearest K points, determines the performance of a KNN classifier. A small training sample size can significantly impact the selection of the optimal neighborhood size K, and the sensitivity to the selection of K can significantly decrease KNN classification performance. In general, KNN is susceptible to data sparsity, noisy mislabeled points, and outliers from other classes if the K value chosen is too small or too large [90,91,92]. From the literature review, it can be inferred that KNN has recently been gaining popularity in mining. It is mainly used in fault diagnosis and real-time fault monitoring. Faults are monitored and identified at both system and sub-system levels.
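
The voting rule described above can be sketched in a few lines. The two features (an RMS vibration level and a temperature), the labels and all values are hypothetical, and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Assign the majority class among the k training points nearest to the query."""
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical condition-monitoring samples: (RMS vibration, temperature in deg C).
train_X = [(0.20, 40), (0.30, 42), (0.25, 41), (0.90, 70), (1.00, 75), (0.95, 72)]
train_y = ["normal", "normal", "normal", "fault", "fault", "fault"]

print(knn_predict(train_X, train_y, (0.85, 68), k=3))
print(knn_predict(train_X, train_y, (0.28, 43), k=3))
```

Because the features are left unscaled here, the temperature dominates the distance. In practice, features are usually standardized before applying KNN so that each one contributes comparably.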

In [93], the authors studied a historical failure dataset of a dragline to conduct predictive maintenance. The authors used the k-Nearest Neighbors algorithm to predict the failure mode but there was a chance of overfitting in the methodology. Hence, a combination of the genetic algorithm and k-Nearest Neighbor algorithm was applied for the failure dataset. This enhanced the model performance, and the results were better predicted. In another study, [94], the authors collected vibration signals of main journal-bearings of an IC engine from condition monitoring methods. The vibration signals were classified under normal, oil starvation, and extreme wear fault. Thirty features were extracted from the processing of signals, and KNN and ANN were applied to train the dataset and later for diagnostic use. Variable K ranging from 1 to 20 with the step size of 1 was used to get better classification results. The experimental results showed diagnostic methods were reliable in separating fault conditions in the bearings. In [95], the authors proposed a new methodology of weighted k-Nearest Neighbor classifier where a square inverse weighting technique was used to improve the accuracy of the KNN model for fault diagnosis of rolling bearing elements. Three bearing conditions were classified: healthy, inner, and outer race fault. The algorithm indicated that this method enables fault detection in bearings with high accuracy. In [96], the authors presented a fault diagnosis technique based on acoustic emission (AE) analysis with the Hilbert–Huang transform (HHT) and data mining tool. In [97], the authors proposed a real-time online fault diagnosis method for rolling bearings based on the KNN algorithm. The rolling bearing vibration signal is preprocessed, and feature parameters are extracted. The data was preprocessed, with 100 raw points as one sample, for a total of 8496 samples. 
Different classification models, such as the C4.5 decision tree, the CART algorithm and KNN, were used to classify fault data. Real-time online extraction of the characteristic parameters of the vibration signal was used to realize real-time online fault diagnosis through the fault diagnosis model. Results show that the fault diagnosis model based on the KNN algorithm performs better than the fault diagnosis models based on the other algorithms.

4.3. Naïve Bayes Classifier

Naïve Bayes, a supervised machine learning algorithm, assumes an underlying probability distribution and captures uncertainty about the model logically by calculating probabilities of occurrences. It is used to solve diagnostic and predictive issues. It calculates explicit hypothesis probabilities and is robust to noise in the input data [98]. The naïve Bayes algorithm is a straightforward probability classifier that derives a set of probabilities by counting the frequency and combinations of values in a data set. When assessing the value of the class variable, the method applies Bayes’ theorem and assumes that all variables are independent. In a range of controlled categorization challenges, the algorithm learns quickly [99].

There are different types of Naïve Bayes classifiers. When characteristic values are continuous, it is assumed that the values associated with each class are spread according to the Gaussian distribution, which is the Normal distribution. On multinomial distributed data, multinomial naïve Bayes is preferred. Bernoulli naïve Bayes is employed when data is distributed according to multivariate Bernoulli distributions. That is, multiple features exist, but each one is considered to have a binary value. As a result, binary values are required for features [100,101]. Naïve Bayes has recently earned a lot of attention because of its high learning and prediction accuracy, and more importantly, the algorithm works well for mining data and conditions. In the literature work, naïve Bayes was used in fault diagnosis and assessing faults’ damage and fault classifications.
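
For continuous features, the Gaussian variant described above amounts to estimating a per-class mean and variance for each feature and then combining the class prior with the per-feature Gaussian likelihoods under the independence assumption. The sketch below is a minimal illustration with invented data and hypothetical function names, not the implementation used in any of the cited works.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical samples: (RMS vibration, temperature in deg C).
X = [(0.20, 40), (0.30, 42), (0.25, 41), (0.90, 70), (1.00, 75), (0.95, 72)]
y = ["normal", "normal", "normal", "fault", "fault", "fault"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (0.92, 71)))
```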

In [102], the authors predicted RUL of bearings using the naïve Bayes algorithm. Firstly, the statistical method is used to extract the features of the vibration signal, and the root mean square (RMS) is regarded as the main performance degradation index. Second, the correlation coefficient is used to select the statistical characteristics that have high correlation with the RMS. Then, in order to avoid the fluctuation of the statistical feature, the improved Weibull distributions (WD) algorithm is used to fit the fluctuation feature of bearings at different recession stages, which is used as the input of the naïve Bayes (NB) training stage. During the testing stage, the true fluctuation feature of the bearings is used as the input of NB. After the NB testing, five classes are obtained: health states and four states for bearing degradation. Finally, the exponential smoothing algorithm is used to smooth the five classes and to predict the RUL of bearings. The experimental results show that the proposed method is effective for RUL prediction of bearings. In [98], the authors used Naïve Bayes for bearing fault diagnosis on enhanced independent data. Data-based fault diagnostics of mechanical components has become a new hotspot. Their approach was based on processing the data vector (attribute feature and sample dimension) to reduce the limitations of Naïve Bayes by an independence hypothesis. The statistical characteristics of the bearings’ original signal were extracted, decision trees were used to select important features of the signal, and low correlation features were selected. The authors used SVM models in the next step to prune redundant vectors, and in the last step used Naïve Bayes on the processed data to diagnose faults. In [103], the authors studied non-repairable equipment with multiple and independent failure modes, where only incomplete information about the failure mode was obtained through condition monitoring. 
The study focused on obtaining a probability matrix representing the relationship between actual health and condition monitoring information of the equipment and Naïve Bayes was used as a classifier to classify each failure mode based on the degree of damage. An experimental planetary gearbox system is used to gather condition monitoring data for damage degree classification considering four failure modes. A forward feature selection is used in this paper to find the best set of features. The classification accuracy increases to 94.76%. In [104], the authors applied a Naïve Bayes classifier for diagnosing faults of rolling element bearings and indicated that the Naïve Bayes classifier presented higher levels of accuracy of 96% without any feature engineering requirement.

4.4. Decision Tree

Decision tree is a supervised machine learning method for constructing classification systems based on multiple parameters or generating prediction algorithms for a target variable. In this method, a population is divided into branch-like segments that form an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can handle huge, complex datasets without imposing a complex parametric framework [105]. Decision trees are mainly effective in handling non-linear datasets. Like stepwise selection in regression analysis, decision tree methods can be used to pick the most relevant predictor variables from a large number of features in datasets and to assess the relative importance of these variables on the decision variable. Moreover, decision trees can also handle missing data very well. It is also easy to handle a variety of input data: nominal, numeric and textual [106].

Other objectives in constructing a decision tree can include minimizing the number of nodes or minimizing the average depth in order to find the most important predictors. Pruning is the practice of removing redundant nodes from a tree to obtain the best decision tree possible. A general decision tree structure is represented in Figure 7.

Figure 7. A general decision tree structure [107].

In [108], the authors proposed an equipment reliability model for pumps, designed by applying a data extraction algorithm to equipment maintenance records residing in SAP applications. The authors initially applied unsupervised learning to perform cluster evaluation. Thereafter, the data from the finalized model were passed to a supervised learning algorithm, where the classifier was trained to predict equipment breakdown. The classifier was tested on test data sets, where it was observed that support vector machine (SVM) and decision tree (DT) algorithms were able to classify and predict equipment breakdown with high accuracy and a true positive rate (TPR) of more than 95 percent.

In [109], the authors proposed a fault diagnosis method for an industrial ventilator (fan) based on decision tree analysis. The operation of the fan was monitored in five different conditions: a healthy condition and four faulty conditions, namely faults affecting the inner and outer races of rolling bearings, mass unbalance and mechanical looseness. Fifteen factors, including mean, median and variance indicators (including the greatest three peaks by amplitude in each condition) that described the vibration signals, were extracted for each spectrum. In each condition, 30 signals were recorded, giving 150 indicator vectors, which were divided into two sets. Twelve trees were built on the basis of numeric attributes: DecisionStump, FT, J48, J48graft, LADTree, LMT, NBTree, RandomForest, RandomTree, REPTree, and SimpleCart. Genetic algorithms were used to optimize the search for the most representative tree. The RandomForest tree is recommended for establishing a diagnostic tool for the studied industrial fan. In [110], the authors emphasize the problem of finding good features that discriminate between the different fault conditions of the bearing. The selection of good features is an important phase in pattern recognition and requires detailed domain knowledge. Their paper illustrated the use of a decision tree that identifies the best features from a given set of samples for the purpose of classification. It uses a Proximal Support Vector Machine (PSVM), which has the capability to efficiently classify the faults using statistical features. The criterion used to identify the best feature invokes the concepts of entropy reduction and information gain that are used in decision trees. The vibration signal from a piezoelectric transducer is captured for the following conditions: good bearing, bearing with inner race fault, bearing with outer race fault, and bearing with both inner and outer race faults. The statistical features are selected using the decision tree and classified successfully using PSVM and SVM. 
In [111], the authors used Decision Tree combined with Bayesian network for fault diagnosis of motor faults. This paper describes the model structure and the basic ideas of Decision Tree and Bayesian network, combines the advantages of the two, and solves the uncertainty of diagnosis information effectively.
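
The entropy reduction and information gain criterion mentioned for [110] can be illustrated with a single threshold split on one feature. The sketch below is a generic textbook formulation with invented values and hypothetical function names; it is not the feature selection procedure of the cited paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on values <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - weighted


# Hypothetical vibration feature and bearing condition labels.
kurtosis = [2.9, 3.1, 3.0, 6.5, 7.2, 6.8]
condition = ["good", "good", "good", "fault", "fault", "fault"]

print(information_gain(kurtosis, condition, threshold=4.0))  # clean split, high gain
print(information_gain(kurtosis, condition, threshold=3.0))  # mixed split, lower gain
```

A tree-building algorithm evaluates many candidate features and thresholds in this way and places the split with the highest gain nearest the root, which is why the features that appear in the upper nodes can be read as the most discriminative ones.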

4.5. Logistic Regression

In binary classification, logistic regression analysis performs exceptionally well, particularly when the response is a categorical variable with two classes (0 and 1). Based on the values of predictor variables, either categorical or numerical, logistic regression models can estimate the likelihood of a failure occurrence [112]. In logistic regression, the dependent variable has a Bernoulli distribution. Thus, for any given linear combination of independent variables, an unknown probability, P, of the response variable is estimated. To do so, a link function must be used to link the independent variables to the Bernoulli distribution, with the natural log of the odds, or the logit, acting as the link function. The inverse of this function converts a linear combination of explanatory variables into the probability parameter of the Bernoulli distribution, which lies between 0 and 1.
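
The link between the linear predictor and the failure probability can be made concrete with the logit and its inverse, the sigmoid. In the sketch below, the feature names, weights and bias are hypothetical values chosen for illustration; in practice they would be estimated from data, typically by maximum likelihood.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p); maps (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit; maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def failure_probability(features, weights, bias):
    """P(failure) for a linear predictor bias + sum(w_i * x_i)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for two predictors: operating hours since the last
# service (in hundreds of hours) and a normalized vibration level.
weights, bias = [0.8, 1.5], -4.0
print(failure_probability([1.0, 0.2], weights, bias))  # recently serviced, low vibration
print(failure_probability([4.0, 1.2], weights, bias))  # long interval, high vibration
```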

Logistic regression is a supervised learning technique often used in failure predictions and preventive maintenance strategies. Cost data, failure data, sensor data and acoustic emission signals were the input data used in logistic regression in previous work. The algorithm was used to predict economic success, RPN, equipment reliability, and the machine state in the next 24 h given the current state.

In [113], the authors used logistic regression models based on cost to accurately predict economic success or failure using the fleet data for 378 single axle dump trucks. In [114], the authors proposed a systematic approach for developing a standard equation for the risk priority number (RPN) measure, using the methodology of interval number-based logistic regression. The aim is to reduce risks of failure, using FMEA in terms of the risk priority number (RPN). The logistic regression model helped identify the probability of risk of failure of high-capacity submersible pumps. Another study aimed to propose a model for predicting mechanical equipment failure from various sensor data collected in the manufacturing process. This study constructed a Hadoop-based big data platform to distribute many datasets for research, and performed logistic regression modelling to predict the main variables causing the failure from various collected variables. As a result of the study, the main variables in the manufacturing process that cause equipment failure were derived from the collected sensor data, and the fitness and performance evaluations for the prediction model were made using the ROC curve [115]. In [116], the authors applied logistic regression to predict machine state 24 h in the future, given the current machine state. A confusion matrix was used to evaluate model performance. In [117], the authors used logistic regression models and acoustic emissions (AE) to evaluate the reliability of the cutting tool to determine best maintenance practice. As it is difficult to monitor cutting forces in practice, a combination of both AE and logistic models are effective in reliability analysis. Reliability models are constructed using AE signals and cutting force as parameters. The results show that AE feature extractions and logistic models work effectively in reliability estimations.

4.6. K-Means Algorithm

K-Means clustering is an unsupervised learning approach that is used in machine learning to handle clustering problems. It divides the unlabeled data into several clusters. The K-Means clustering method is easy to apply, scales to large data, converges quickly, and adapts to sparse data. K-Means clusters the data into different groups and provides a simple technique to determine the categories of groups in an unlabeled dataset without any training. It is a centroid-based approach, where each cluster has its own centroid. The goal of this algorithm is to minimize the sum of squared distances between the data points and the centroids of their assigned clusters. The K-means clustering algorithm finds the best positions for the K center points, or centroids, by an iterative process and assigns each data point to its closest centroid. The points nearest to a given centroid form a cluster. The distance of each point from the centroids is calculated in each step using the Euclidean distance. Hence, data points within a cluster are similar in some way and dissimilar to those in other clusters. The K value is defined by the user. The Elbow method is a popular way of selecting the optimal K value. The method is based on the within-cluster sum of squares (WCSS), which measures the total variation within the clusters: WCSS is computed for a range of K values, and the K beyond which WCSS stops decreasing appreciably is chosen [118].
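
The iterative procedure and the WCSS quantity used by the Elbow method can be sketched as follows. The data points are invented, the initial centroids are drawn at random from the data with a fixed seed (one of several possible initialization schemes), and the function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable, so stop
            break
        centroids = new_centroids
    return centroids, clusters


def wcss(centroids, clusters):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(
        math.dist(p, c) ** 2 for c, cluster in zip(centroids, clusters) for p in cluster
    )


# Hypothetical machines described by (mean time between failures, mean repair time).
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
for k in (1, 2, 3):
    centroids, clusters = kmeans(points, k)
    print(k, round(wcss(centroids, clusters), 2))
```

Plotting WCSS against K for such a run and looking for the bend in the curve is the Elbow method described above.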

In [119], the authors implemented a clustering method to group maintainable equipment based on its need for maintenance, according to the time to failure and the location of the machines. The main aim was to reduce the scheduling effort and time and to establish a standard maintenance procedure for the machines in each cell. In [120], the authors examined condition-based equipment data using a data analytics approach to develop a predictive maintenance program. Two models were used in their study: K-means for clustering the failure characteristics and a support vector regression (SVR) model for predicting equipment failure.

4.7. The Neural Network ANN

Artificial neural networks (ANNs) are a learning technique inspired by the biological neurons of the human brain. An ANN is a massively parallel computing system made up of many simple processors connected by a large number of interconnections. Rather than following a set of rules specified by human experts, ANNs learn the underlying rules from a series of given examples. They are typically organized into layers: an input layer, one or more hidden layers, and an output layer.

Furthermore, the relationships between the network processing units are the source of the ANNs’ analytical activity. ANNs are among the most extensively used machine learning algorithms. Multilayer perceptrons (MLPs) with backpropagation learning are based on a supervised technique and have three types of layers: input, hidden, and output [121,122]. Compared to other classic machine learning techniques, ANN models have significant advantages in dealing with random, fuzzy, and nonlinear data. ANNs are best suited for systems with a complicated, large-scale structure and ambiguous data, and they are commonly employed for a wide range of problems [123,124]. ANNs do, however, also have some drawbacks. Training is computationally demanding and, for larger networks, often relies on specialized hardware such as GPUs. ANNs require a large amount of training data to build an appropriate model. When using the sigmoid activation function, ANNs frequently encounter vanishing and exploding gradient problems, and choosing a suitable loss function remains a challenge. ANNs are black boxes in nature: results are based on patterns learned from the training data rather than on an explicitly specified program, which makes the models difficult to modify and to explain to business stakeholders. Despite these shortcomings, neural networks are gaining wide popularity in the mining industry, and researchers are increasingly moving towards the use of ANNs in failure analysis and predictive maintenance. A sample neural network architecture is shown in Figure 8.

Figure 8. Sample neural network architecture [125].
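The gradient problem noted above for sigmoid activations can be illustrated with a small numerical sketch. The derivative of the sigmoid never exceeds 0.25, so the gradient propagated back through a chain of layers is a product of factors that shrinks quickly unless the weights are large, in which case it can grow instead. The chain of single-neuron layers and the weight values below are hypothetical and chosen only to make the arithmetic transparent; they are not taken from any of the reviewed studies.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def chain_gradient(pre_activations, weights):
    """Gradient of the output with respect to the input of a chain of
    single-neuron sigmoid layers: the product of w * sigmoid'(z) per layer."""
    grad = 1.0
    for z, w in zip(pre_activations, weights):
        grad *= w * sigmoid_grad(z)
    return grad

depth = 10
# Unit weights: each layer scales the gradient by at most 0.25 (vanishing).
vanishing = chain_gradient([0.0] * depth, [1.0] * depth)
# Large weights: each layer scales the gradient by 8 * 0.25 = 2 (exploding).
exploding = chain_gradient([0.0] * depth, [8.0] * depth)

print(f"unit weights:  {vanishing:.3e}")
print(f"large weights: {exploding:.1f}")
```

With ten layers the gradient shrinks to roughly one millionth of its original size in the first case and grows by three orders of magnitude in the second, which is why deeper networks commonly rely on other activation functions and careful weight initialization.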

ANN is widely used in reliability and fault analysis of mining machines, and several works in the literature use it for analysis. ANN has been used in mining since the early 2000s. However, the ANN architectures of that time were not as developed as they are today, and only feed-forward networks were used. Presently, ANN predicts equipment failures and reliability with higher accuracy. ANN is used for fault diagnostics of numerous types of rotating machinery, where signal processing techniques extract features that are then input to the ANN model to classify faults [126,127,128,129]. In [130], the authors studied electric motor faults with ANN feedforward networks and self-organizing maps. Data were taken from stator current and mechanical vibration signals for major motor faults. The study showed the effectiveness of both algorithms, with feedforward networks appearing more promising for electric motor analysis. In [131], the authors used multilayer perceptrons (MLPs) to classify dragline faults using two years of failure data. There were 16 causes in total that led to dragline failure. Two different models were developed for the analysis of these faults, using seven causes, seven symptoms and five fault parameters of drag systems. The prediction accuracy of symptoms from causes was 94.2%, and that of faults from symptoms was 97.1%. In [124], the authors demonstrated how neural networks can be used in vibration monitoring analysis of rolling element bearings and showed that they can be effective in handling noisy data. In [132], the authors presented a multi-state algorithm for dynamic condition monitoring of a gear. The algorithm reported the gear status and estimated the mesh stiffness per shaft revolution whenever an abnormality was detected. The network was fed with statistical parameters obtained from the wavelet coefficients derived for the decomposition levels most sensitive to damage; the output was the drop in the averaged torsional meshing stiffness when a failure appears, which is highly related to local failure. In [123], the authors proposed a rotor vibration fault diagnosis approach that transforms multiple vibration signals into symmetrized dot pattern (SDP) images and then identifies the SDP graphical features characteristic of different vibration states using a convolutional neural network (CNN). A CNN can reliably and accurately identify vibration faults by adaptively extracting the feature information of SDP images through deep learning. The proposed approach was tested experimentally using a rotor vibration test bed, and the results were compared to those obtained with an equivalent CNN-based image recognition approach using orbit plot images. The rotor fault diagnosis precision was improved from 92% to 96.5%.

5. Discussions and Conclusions

Various statistical techniques have been reviewed in this paper and categorized based on the method of application. Based on the literature review, it can be concluded that reliability and failure analysis play a significant role in tracking and improving the efficiency of machine systems and subsystems, and a significant amount of work has been carried out in this regard. However, the effectiveness of statistical learning depends on the amount and quality of data that can be collected. The most commonly used data are historical failure data (TBF/TTF, TTR, failure component) and real-time vibration data. As the volume of data increases, so does the complexity of the analysis. With advances in the integration of big data tools, the analysis should become more efficient. Incorrect and missing data often lead to lower analysis quality and accuracy, and this problem can be mitigated by automating the storage of failure data. At present, research is focused more on the analysis of failure data, and less attention is given to automating data collection and storage. This could be one of the significant areas of improvement. As per the literature review, reliability and failures can be analyzed using a wide range of algorithms. Every algorithm has its own advantages and limitations and should be chosen based on the stated problem and data availability. Choosing a sub-optimal or unsuitable algorithm can lead to reduced benefits or even loss of time and money. The business goals should be clearly specified, and the data-driven framework should be properly established before problem solving and the actual statistical analysis begin.

Graphical methods, probability distributions, NHPP models, and supervised and unsupervised classification models are discussed in the analysis. Based on the literature review, probability distributions and NHPP models are widely applied techniques in reliability and maintenance analysis of mine equipment and components. At present, artificial neural networks are gaining importance, and several works in the literature leverage ANN successfully. Table 1 summarizes the methods reviewed, the data types used for each algorithm, the application of each algorithm in the existing literature, and the features that distinguish each method from the other algorithms reviewed.

Table 1. Different methods of reliability analysis and failure prediction, their applications and distinctions.

Graphical methods are the oldest and most convenient techniques for obtaining an overview of the system condition in reliability analysis (whether it has a decreasing, increasing or constant failure rate), and only time between failures (TBF) data is required for the analysis. However, the process is time consuming, and an in-depth analysis of the problem is not possible using this technique. More importantly, the plots cannot be used if the data is not independently and identically distributed (i.i.d.). Graphical methods have been used in mining from the early 1990s to date. Although simple, they are the easiest way to determine the system condition, so they remain in use alongside more complex algorithms for initial data exploration.

Between them, probability distributions and NHPP models cover both i.i.d. and non-i.i.d. data. Probability distributions can be applied if the data is not correlated and shows no trend; otherwise, NHPP models can be used. TBF, TTF or TTR values are the input data for the analysis. A wide range of software is available on the market to make the analysis easier. The reliability of the system and its subsystems at any instant, the overall reliability, the failure rate, and the distribution parameters can be obtained within seconds. Hence, this technique is widely used in reliability estimation. Maintenance intervals can be scheduled by studying probability graphs so as to maintain a certain reliability level. However, the major limitation of this method is that it does not capture the parameters that influence the failures. As mining is a very complex activity, the external and internal parameters that influence equipment failure change constantly from one state to another, and this has a major effect on failure analysis. As the Weibull distribution commonly describes component behavior, a future improvement of this method could be the development of a machine learning algorithm that enhances the Weibull-based curve through the integration of external knowledge.
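As an illustration of how a fitted distribution translates into a maintenance interval, the sketch below evaluates the two-parameter Weibull reliability function R(t) = exp(-(t/η)^β) and inverts it to find the operating time at which reliability falls to a chosen target. The shape and scale values are hypothetical and are not taken from any of the reviewed studies; in practice they would be estimated from TBF or TTF data.

```python
import math

def weibull_reliability(t, beta, eta):
    """Probability of surviving beyond time t: R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))

def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate: h(t) = (beta/eta) * (t/eta)^(beta-1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def interval_for_reliability(target, beta, eta):
    """Operating time at which reliability drops to `target` (0 < target < 1)."""
    return eta * (-math.log(target)) ** (1.0 / beta)

# Hypothetical parameters: beta > 1 indicates wear-out (increasing failure rate).
beta, eta = 1.8, 1200.0  # shape (dimensionless), scale (operating hours)

for hours in (200, 600, 1200):
    print(hours, round(weibull_reliability(hours, beta, eta), 3))

# Maintenance interval that keeps reliability at or above 90%.
interval_90 = interval_for_reliability(0.90, beta, eta)
print(round(interval_90, 1))
```

A shape parameter below, equal to, or above one corresponds to a decreasing, constant, or increasing failure rate respectively, which ties this calculation back to the system conditions identified by the graphical methods.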

Fault trees can effectively uncover the underlying cause of a failure and troubleshoot the problem from its root. Their visual presentation of failure causes makes it simpler to identify a single failure that leads to complete system failure and to find its probability. However, the limitations of FTA follow from its design concept. It focuses on building a mathematical model of a complex physical condition by logically correlating events. If the peripheral, environmental, and operating parameters are not all given, the analysis rests solely on the analyst’s judgement. A static fault tree cannot be applied if the system functions change continuously. Dynamic fault trees can be used in such conditions, and although several fault tree extensions have been proposed, they all have a variety of shortcomings. Even when software tools are available, many investigations involve a significant amount of manual work.
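The way a static fault tree combines basic-event probabilities into a top-event probability can be sketched as follows. The sketch assumes that the basic events are statistically independent, so that an AND gate multiplies probabilities and an OR gate takes the complement of the product of complements. The tree structure, event names and probabilities are hypothetical and serve only to illustrate the calculation.

```python
from math import prod

def evaluate(node, basic_events):
    """Probability of a fault-tree node, assuming independent basic events.

    A node is either the name of a basic event (str) or a tuple
    (gate, children) with gate equal to "AND" or "OR".
    """
    if isinstance(node, str):
        return basic_events[node]
    gate, children = node
    probs = [evaluate(child, basic_events) for child in children]
    if gate == "AND":
        # All inputs must fail.
        return prod(probs)
    if gate == "OR":
        # At least one input fails.
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate}")

# Hypothetical basic-event probabilities over one mission time.
basic_events = {
    "pump_fails": 0.01,
    "primary_valve_fails": 0.02,
    "backup_valve_fails": 0.05,
}

# Hypothetical top event: the hydraulic system fails if the pump fails,
# or if both the primary and the backup valve fail.
tree = ("OR", [
    "pump_fails",
    ("AND", ["primary_valve_fails", "backup_valve_fails"]),
])

top_event_probability = evaluate(tree, basic_events)
print(round(top_event_probability, 5))
```

In this example the pump is a single point of failure and dominates the top-event probability, whereas the redundant valves contribute very little. The independence assumption is also where the analyst’s judgement enters, since common-cause failures would violate it.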

Machine learning offers a wide range of algorithms that are well suited to failure analysis and prediction, and it overcomes most of the limitations of the traditional statistical reliability techniques. Machine learning can work with both i.i.d. and non-i.i.d. data, and the algorithms can capture underlying trends. ML can be faster than most other methods and less expensive, provided the input data is correctly prepared. It takes into account the external and internal feature parameters that influence failures. A variety of ML algorithms is available, and one can be adopted based on the requirements of the business problem. The advantages and shortcomings of the most common algorithms are discussed below.

SVM is one of the best-performing classification and regression algorithms for failure analysis. It can generally categorize failure data into different groups with high classification accuracy. From the literature review, it can be seen that SVM is mainly used for fault pattern recognition and for predicting future failures. SVM deals well with high-dimensional features, is relatively robust to overfitting, and is generally less influenced by outliers. However, SVM is not suitable for large data sets or for noisy data. In the reviewed literature, SVM was mostly used in combination with a pre-processing algorithm (genetic algorithms, principal component analysis). Naïve Bayes and the ANN algorithm are replacing other classification models in mining due to their high learning and prediction accuracy.

K-NN is the easiest algorithm to implement and makes no assumptions about the underlying data. K-NN is used for both failure data and real-time monitoring data, and it achieved high accuracy with failure data in the literature reviewed. However, the accuracy of the model is sensitive to the quality of the data. Overfitting is one of the major problems of K-NN, and to reduce this risk K-NN was used with other algorithms such as the genetic algorithm. K-NN also does not work well with high-dimensional data and needs feature scaling.
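The need for feature scaling can be seen in a small sketch. K-NN relies on distances, so a feature measured in large units (here, operating hours) dominates one measured in small units (here, vibration level) unless both are brought to a common range. The six training records, the query and the use of min-max scaling are hypothetical choices made for illustration.

```python
import math
from collections import Counter

def minmax_scaler(rows):
    """Return a function that rescales each feature to [0, 1] using the
    minimum and maximum observed in `rows` (the training data)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        return tuple(
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        )
    return scale

def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical records: (vibration RMS in mm/s, operating hours).
train_x = [
    (1.0, 4900), (1.2, 5100), (0.9, 9000),   # healthy
    (6.0, 1200), (6.5, 5600), (7.0, 8800),   # faulty
]
train_y = ["healthy", "healthy", "healthy", "faulty", "faulty", "faulty"]
query = (6.2, 5000)  # high vibration, mid-life machine

# Without scaling, operating hours dominate the distance.
raw_prediction = knn_predict(train_x, train_y, query)

# With scaling, both features contribute comparably.
scale = minmax_scaler(train_x)
scaled_prediction = knn_predict([scale(x) for x in train_x], train_y, scale(query))

print(raw_prediction, scaled_prediction)
```

The unscaled model labels the high-vibration query as healthy because its two nearest neighbours are simply the machines with similar operating hours, while the scaled model labels it as faulty.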

The Naïve Bayes algorithm is characterized by an explicit underlying probability model. Naïve Bayes was mainly applied to bearing fault prediction using vibration data. Combined with a forward feature selection method, Naïve Bayes provided excellent accuracy when the data contained incomplete information about the failure mode. It is well suited to analyzing failure data in which the predictors are independent of each other. The disadvantages of the method are its assumption of independent predictors, which might not hold in practice, and its need for prior probabilities.
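For reference, the probability model underlying Naïve Bayes can be written in its standard textbook form, where C_k denotes a fault class and x_1, ..., x_n the predictors (for example, vibration features). The notation is introduced here for illustration and is not taken from the reviewed studies.

```latex
P(C_k \mid x_1, \dots, x_n)
  = \frac{P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)}
         {\sum_{j} P(C_j) \prod_{i=1}^{n} P(x_i \mid C_j)},
\qquad
\hat{C} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```

The product over the predictors is where the independence assumption enters, and P(C_k) is the prior probability that must be supplied or estimated from the class frequencies in the training data.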

Decision trees need little data preparation and are used for constructing classification systems based on multiple parameters or for generating prediction algorithms for a target variable. In reliability and failure analysis, decision trees are mainly used to identify the important features influencing the target variable. SVM, K-NN and Naïve Bayes are used along with decision trees to classify faults. The pruning method used in decision trees is one of the best techniques for accurately selecting parameters for classification models. Decision trees are very easy to understand and are able to handle multi-output problems. Their major limitations are the time taken to train the model and their instability: small variations in the data can produce a very different tree. Decision trees are piecewise constant approximations, which makes it difficult to predict future faults. Decision trees were previously used in fault diagnosis of mining equipment. However, with the improvement of decision tree algorithms, newer methods such as random forest or XGBoost have replaced the traditional decision tree algorithm.

Logistic regression performs well in failure classification. It is mainly used in binary decision models and to estimate the importance of each feature. Logistic regression is easily applied to linearly separable, low-dimensional data, whereas it is prone to overfitting on high-dimensional data.

K-means models were mainly used to categorize data into groups in order to plan a preventive maintenance strategy for each group. K-Means can also be used to separate data into different fault classes, and each of these classes can serve as an input parameter for the training dataset of an SVM or K-NN classifier. K-Means is very easy to implement and computationally fast. However, it is difficult to choose the value of K, which can have a strong impact on the final results, and rescaling the data may result in completely different outputs.

ANN mimics the structure of the human brain, which enables the model to approximate complex non-linear functions with multiple inputs and outputs. As seen, ANN has very high classification accuracy and diverse uses. ANN is applied to both failure data and real-time vibration monitoring data. Like most other models, ANN is prone to overfitting, and the functioning of the network is difficult to explain: no physical meaning is attached to what the network learns from the fault training data. ANN requires a large amount of training data and, with the sigmoid activation function, frequently encounters vanishing and exploding gradient problems. With the amount of quality data increasing in the mining industry, the scope for future applications of deep learning is large. Equipment fault detection using image recognition, the incorporation of rule-based knowledge to implement logical procedures, and the formalization of knowledge on fault detection or equipment reliability within the algorithm are a few areas for future exploration.

Overall, machine learning is a powerful tool in reliability and fault analysis. Although classifiers have shown excellent accuracy, they need to be trained with complete data covering all faults. Most of the literature reviewed uses a single training dataset and a single prediction method, which may not provide the best results. Multiple methods can be applied for a more comprehensive understanding of the data. Ensemble models can be created to predict the outcome by using either different training datasets or different training models. Cross-validation techniques such as K-fold cross-validation can also be employed to obtain a more reliable estimate of model accuracy and to reduce the influence of randomness and overfitting. With the development of AI techniques and the rise of deep learning, intelligent diagnosis is expected to be the future direction of fault diagnosis. At the same time, future diagnostic systems should not rely on data-driven AI methods alone; failure mechanisms and prior knowledge should be closely integrated to improve diagnostic performance. Statistical techniques such as graphical methods and probability distributions can be used when there is no information on failure conditions and to get an overview of system conditions. Machine learning and deep learning algorithms can be employed where there is enough information for analysis. A combination of different techniques may help in better analysis of reliability and faults. At present, fault diagnostic systems are mostly built as a combination of individual parts, such as data collection, feature extraction and dimensionality reduction, and fault recognition, with little consideration of the diagnostic system as a whole. More attention should be paid to complete, end-to-end, integrated and automated diagnostic systems.
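The K-fold cross-validation mentioned above can be sketched as follows. The data are split into K folds, each fold is held out once for testing while the model is trained on the remaining folds, and the K scores are averaged. The one-feature dataset and the simple threshold classifier are hypothetical stand-ins chosen to keep the example self-contained; any of the classifiers discussed in this review could take their place.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def fit_threshold(x_train, y_train):
    """Hypothetical model: classify as faulty (1) above the midpoint
    between the mean vibration of the two classes."""
    mean_healthy = mean([x for x, y in zip(x_train, y_train) if y == 0])
    mean_faulty = mean([x for x, y in zip(x_train, y_train) if y == 1])
    threshold = (mean_healthy + mean_faulty) / 2.0
    return lambda x: int(x > threshold)

def cross_val_accuracy(xs, ys, fit, k=5, seed=0):
    """Average held-out accuracy over the k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k, seed):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return mean(scores)

# Hypothetical vibration readings: 10 healthy (label 0) and 10 faulty (label 1).
xs = [1.0 + 0.1 * i for i in range(10)] + [6.0 + 0.1 * i for i in range(10)]
ys = [0] * 10 + [1] * 10

cv_accuracy = cross_val_accuracy(xs, ys, fit_threshold, k=5)
print(cv_accuracy)
```

Because every record is used for testing exactly once, the averaged score depends less on one particular train and test split than a single hold-out evaluation does. Cross-validation estimates performance more reliably; it does not by itself make a model more accurate.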

Author Contributions

Conceptualization, P.O., D.B.A. and R.H.; methodology, P.O., D.B.A. and R.H.; software, P.O.; validation, B.Z., R.H., D.B.A. and K.S.; formal analysis, P.O., D.B.A., R.H. and B.Z.; investigation, P.O.; resources, B.Z.; data curation, P.O.; writing—P.O.; original draft preparation, P.O., D.B.A. and R.H.; writing—review and editing, P.O., D.B.A., R.H. and K.S.; visualization, P.O.; supervision, D.B.A., R.H. and B.Z.; project administration, D.B.A. and R.H.; funding acquisition, R.H. and B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The funding for this research was provided by the North American Construction Group, Grant NACG Apel.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement