Machine Learning in Predictive Maintenance towards Sustainable Smart Manufacturing in Industry 4.0

Abstract: Recently, with the emergence of Industry 4.0 (I4.0), smart systems, and machine learning (ML) within artificial intelligence (AI), predictive maintenance (PdM) approaches have been extensively applied in industries for handling the health status of industrial equipment. Owing to the digital transformation towards I4.0, information techniques, computerized control, and communication networks, it is possible to collect massive amounts of operational and process condition data generated from several pieces of equipment and to harvest these data for automated fault detection and diagnosis, with the aim of minimizing downtime, increasing the utilization rate of components, and extending their remaining useful lives. PdM is indispensable for sustainable smart manufacturing in I4.0. ML techniques have emerged as a promising tool in PdM applications for smart manufacturing in I4.0 and have therefore attracted increasing attention from authors in recent years. This paper aims to provide a comprehensive review of the recent advancements of ML techniques widely applied to PdM for smart manufacturing in I4.0 by classifying the research according to the ML algorithms, ML category, machinery and equipment used, device used in data acquisition, and classification, size, and type of data. It highlights the key contributions of the researchers and thus offers guidelines and a foundation for further research.


Introduction
Industries are currently going through what professionals have called "The Fourth Industrial Revolution," also known as "Industry 4.0" (I4.0). Industry 4.0 is mainly concerned with the integration of the physical and digital systems of production environments [1]. With the appearance of I4.0, prognostics and health management (PHM) has become an unavoidable trend in the framework of industrial big data and smart manufacturing; at the same time, it offers a reliable solution for handling the health status of industrial equipment. I4.0 and its key technologies play an essential role in making industrial systems autonomous [2,3] and thus make automated data collection from industrial machines and components possible. Depending on the type of data collected, machine learning (ML) algorithms can be applied for automated fault detection and diagnosis. However, it is crucial to select appropriate ML techniques, type of data, data size, and equipment when applying ML in industrial systems. Selecting an inappropriate predictive maintenance (PdM) technique, dataset, or data size may cause time loss and infeasible maintenance scheduling. Therefore, this study presents a comprehensive literature review of existing studies and ML applications, to help researchers and practitioners select appropriate ML techniques, data size, and data type and so obtain a feasible ML application.
Predictive maintenance (PdM) of industrial equipment can perceive performance degradation because it was designed to achieve near-zero hidden dangers, failures, pollution, and accidents in the entire environment of manufacturing processes [4].
The huge amounts of data collected for ML contain very useful information and valuable knowledge that can improve the overall productivity of manufacturing processes and system dynamics, and can also be applied to decision support in several areas, mainly condition-based maintenance and health monitoring [5]. Owing to recent advances in technology, information techniques, computerized control, and communication networks, it is now possible to collect vast volumes of operational and process condition data generated from several pieces of equipment and to harvest them for automated Fault Detection and Diagnosis (FDD) [6]. The collected datasets can also be used to develop more efficient methodologies for intelligent preventive maintenance activities, also known as PdM [7].
ML applications provide advantages that include maintenance cost reduction, repair stop reduction, machine fault reduction, increased spare-part life and inventory reduction, enhanced operator safety, increased production, repair verification, an increase in overall profit, and many more. These advantages are also strongly tied to the maintenance procedures [1][8][9][10][11]. Moreover, fault detection is one of the critical components of predictive maintenance; industries need to detect faults at a very early stage [12]. Maintenance policies can be categorized into the following main classes [13][14][15][16][17].
1. Run to Failure (R2F): also known as corrective maintenance or unplanned maintenance. It is the simplest maintenance technique and is performed only when the equipment has failed. It may lead to high equipment downtime and a high risk of secondary faults, and thus to a very large number of defective products in production.
2. Preventive Maintenance (PvM): also known as scheduled maintenance or time-based maintenance (TBM). PvM refers to maintenance performed periodically on a planned schedule in order to anticipate failures. It sometimes leads to unnecessary maintenance, which increases operating costs. The main aim is to improve the efficiency of the equipment by minimizing failures in production [18].
3. Condition-based Maintenance (CBM): this method is based on constant monitoring of the health of the machine, equipment, or process, so that maintenance is carried out only when it is actually necessary. Maintenance actions are taken only after one or more conditions indicating degradation of the process have been observed. CBM usually cannot be planned in advance.
4. PdM: also known as statistical-based maintenance. Maintenance is scheduled only when needed. Like CBM, it is based on continuous monitoring of the equipment or machine. It uses prediction tools to estimate when maintenance actions are necessary, so that the maintenance can be scheduled. Furthermore, it allows failures to be detected at an early stage from historical data by means of prediction tools such as machine learning methods, integrity factors (such as visual aspects, coloration different from the original, and wear), statistical inference approaches, and engineering techniques.
Any maintenance strategy ought to minimize equipment failure rates, improve equipment condition, prolong the life of the equipment, and reduce maintenance costs. An overview of the maintenance classifications is shown in Figure 1. PdM has turned out to be one of the most promising maintenance strategies capable of achieving those characteristics [19], and it has therefore been applied recently in many fields of study. PdM has attracted the attention of industry and has been applied in the era of I4.0 because of its capability of optimizing the use and management of assets [1,20]. ML, within the context of artificial intelligence (AI) (Figure 1, copyright permission of Figure 1 was obtained on 20 September 2020), has lately emerged as one of the most powerful tools for developing intelligent predictive algorithms in several applications. It has developed into a wide field of research over the past decades. ML can be defined as a technology by which outcomes can be forecast based on a model prepared and trained on past or historical input data and its output behavior [21]. According to Samuel, A.L. [22], ML essentially means that computers are able to solve problems without being specifically programmed to do so. ML approaches are known to have considerable advantages: they can handle multivariate, high-dimensional data and can extract hidden relationships within data in complex, dynamic, and chaotic environments [1,23,24]. However, depending on the ML approach chosen, the performance and advantages might differ. To date, ML techniques have been widely applied in several areas of manufacturing (such as maintenance, optimization, troubleshooting, and control) [23]. Consequently, this paper aims to present the recent advancements of ML techniques applied to PdM from a broad perspective.
This review predominantly uses the Scopus database to acquire and identify the articles. From a comprehensive perspective, this paper aims to pinpoint and categorize the studies based on the ML technique considered, ML category, equipment used, device used in data acquisition, applied data description, data size, and data type.
The following describes how the paper is organized: firstly, this section gives a brief introduction on the current field of study. Secondly, Section 2 presents a brief background on PdM and ML techniques. Thirdly, Section 3 explains the methodology employed in the literature while considering and categorizing the papers to review and how they are grouped. Section 4 presents the comprehensive applications of ML techniques applied to PdM. Subsequently, discussions are drawn based on the analysis carried out in the literature of ML algorithms for PdM. Finally, a conclusion and future research guidelines are given.

PdM and ML Techniques
Currently, the PHM system has become a sure-fire method for maintaining the safety status of equipment (e.g., defect detection and Remaining Useful Life (RUL) prediction). It is accomplished by the systematic use of current testing findings together with AI and IT technology [4]. Additionally, PdM can not only reduce maintenance costs but also prolong the RUL [25]. Incipient issues that may lead to disastrous failures can be correctly forecast, and appropriate steps can be taken on the basis of the prediction outcomes to avoid these failures [4]. Deloitte [31] classifies the technologies that drive PdM into five categories: sensors, network, integration, augmented intelligence, and augmented behavior. Smart sensors gather machine information through built-in sensors or environment information through external sensors. The network provides data storage as well as data transfer by using Bluetooth and WiFi [32,33]. Technology integration allows data management and data accumulation via the Internet of Things (IoT), and augmented intelligence assists data processing and data analytics [30], whereas augmented behavior allows virtualization, computing, and service platforms via apps and tickets to assist the operator [29].
ML algorithms are categorized into three types: (1) supervised, (2) unsupervised, and (3) reinforcement learning (RL) (see Figure 3) [23,34,35]. The aim is to show how complex the structure can be and which learning techniques are commonly available. Moreover, as stated by [23], different algorithms can be combined in order to maximize the classification power. In addition, some ML algorithms are applicable to both unsupervised and supervised learning. In unsupervised machine learning, there is no feedback from an external teacher or knowledgeable expert [23]. The algorithm identifies clusters based on the existing data. The main aim in unsupervised learning is to determine the unknown classes of items by clustering [36], whereas classification belongs to supervised learning. Unsupervised ML basically denotes any ML method that attempts to learn structure in the absence of either an identified output (as in supervised ML) or feedback (as in reinforcement learning (RL)) [23].
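The difference between clustering (no labels) and classification (labels required) can be illustrated with a minimal sketch. The readings, the label names, and the choice of a 1-D k-means and a nearest-centroid classifier are assumptions made for illustration only; they are not taken from any of the reviewed studies.

```python
from statistics import mean

def kmeans_1d(values, k=2, iterations=20):
    """Unsupervised: group unlabeled values into k >= 2 clusters (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Initialise the centroids by spreading them over the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

def fit_nearest_centroid(values, labels):
    """Supervised: learn one centroid per known class label."""
    return {c: mean(v for v, l in zip(values, labels) if l == c) for c in set(labels)}

def predict(model, value):
    """Assign the class whose centroid is closest to the value."""
    return min(model, key=lambda c: abs(value - model[c]))

# Hypothetical vibration amplitudes (arbitrary units), invented for illustration.
readings = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
labels = ["healthy"] * 4 + ["faulty"] * 4

centroids = kmeans_1d(readings)                  # structure found without labels
model = fit_nearest_centroid(readings, labels)   # labels are required here
print(centroids, predict(model, 4.0))
```

The clustering step recovers two groups but cannot name them; only the supervised model can attach the labels "healthy" and "faulty" to a new reading.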

Clustering, self-organizing maps, and association rules are the three main examples of unsupervised learning. In this paper, the reviewed articles are categorized into three ML categories (classification, regression, and clustering), as shown in Figure 3. The data used in the papers are categorized into two types: real data taken from real-world machinery, and simulated or synthetic data generated to meet specific needs such as model validation in ML. The authors of [12,23,32,37,38] have also defined ML categories. RL is characterized by training information being supplied by the environment. In RL, the learner must discover which actions produce the greatest outcomes (a numeric reinforcement signal) by trying them rather than by being told [23]. Nonetheless, some researchers, such as [34], consider RL a special kind of supervised learning. Moreover, RL problems differ from supervised learning in that they are characterized by the absence of labeled examples of "good" and "bad" behavior [23].
Several supervised machine learning algorithms are available, a few of which are shown in Figure 3. Each of these algorithms has its own specific advantages and limitations with regard to the application (either PdM or manufacturing). Selecting the most appropriate ML algorithm for the requirements of the PdM problem can be a major challenge. It is also important to gain experience in applied machine learning by practicing on many different datasets, because each problem requires different subtleties, data preparation, and modelling methods. Datasets are classified into seven categories: multivariate, sequential text, time series, sequential, univariate, text, and domain theory. However, this paper classifies datasets into two categories. The first is real datasets, which are any production data obtained from real production processes and applicable to ML. The second is synthetic datasets, which are any kind of production data applicable to ML that are simulated rather than measured directly in production.
Ethical/legal permission is not required for this study. The study complies with research and publication ethics in obtaining all kinds of data and images.

Survey Methodology and Analysis
The scholarly databases used for this review are Scopus, ScienceDirect, the Institute of Electrical and Electronics Engineers (IEEE), and Google Scholar. The Scopus database was mainly used for this review. The articles reviewed are categorized into two groups. The first group comprises the articles collected from Scopus, which are considered the main featured articles of the research. The second group comprises the articles used as supporting or background work for the introduction and the study in general. This second group, obtained from the four databases stated above, helped in building the theoretical foundations of PdM, ML techniques, and ML algorithms.
The strategy and keywords used to collect the articles from Scopus are as follows:
• First, a search was carried out for "machine learning."
• This was followed by a search within the results for "predictive maintenance." The quotation marks mean that the whole phrase was searched, not its individual words.
With (TITLE-ABS-KEY ("machine learning") AND ("predictive maintenance")) as the search keywords, 788 documents appeared; the survey was carried out on 30 July 2020.


• The documents were then limited to the period 2010 to 2020, which reduced the number of documents to 367.
• The subject area was limited to engineering, energy, and materials science, which reduced the number of documents to 273.
• Reviews and conference reviews were then excluded by document type, leaving a total of 217 documents. In the language section, the documents were limited to English.
Figure 4 shows the number of documents published per year between 2010 and 2020. It confirms that ML techniques in PdM have attracted the attention of researchers only recently (i.e., in the last three years): very few papers were published in 2010 and 2011, whereas the number of published articles spiked in 2017-2018. Thus, it can be concluded that the application of ML techniques to PdM is a new method with growing research interest. This might be due to the increase in the amount of data generated in industries by industrial equipment, systems, or components, and at the same time to the recent advances in ML techniques and their algorithms [1]. Figure 5 shows the documents considered in this review by type: 134 conference papers, 72 articles, and 11 book chapters.
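The screening counts reported above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable and function names are illustrative, and the language filter is omitted because no count is reported for it.

```python
# Document counts reported in the text for each successive Scopus filter.
funnel = [
    ("keyword search", 788),
    ("limited to 2010-2020", 367),
    ("limited to engineering, energy, and materials science", 273),
    ("reviews and conference reviews excluded", 217),
]

def exclusions(steps):
    """Return (step name, number of documents removed by that step)."""
    return [
        (name, previous - count)
        for (_, previous), (name, count) in zip(steps, steps[1:])
    ]

removed = exclusions(funnel)
for name, n in removed:
    print(f"{name}: -{n}")

# Counts by document type reported for the remaining documents.
by_type = {"conference paper": 134, "article": 72, "book chapter": 11}
total_by_type = sum(by_type.values())
print("documents by type:", total_by_type)
```

The three document types sum to the 217 documents that remain after the last filter, so the reported breakdown is internally consistent.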
Of these 217 documents, we were able to download 103. These 103 articles, combined with 33 from IEEE, give a total of 126 articles. They were then analyzed and screened to remove any duplicates. The number of publications is given in Figure 4. The article selection criteria described above are summarized in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart given in Figure 5.
[Figure: published documents by country for the keywords ("Predictive Maintenance" AND "Machine Learning").]

Applications of ML Algorithms in PdM
ML algorithms can be used to solve several problems with the enormous amount of data generated by industries; thus, ML has been widely used in computer science and other areas, such as PdM of manufacturing systems, tools, and machines, which is one of the possible areas of use for data-driven methods (Artificial Neural Network (ANN), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), and Decision Tree (DT)) [4,23]. Recently, ML techniques have been widely applied in various fields of study. Selecting the most appropriate, simplest, and most efficient technique can be a great concern. ML algorithms usually require collecting huge amounts of data on failure scenarios and healthy-condition scenarios for model training. The algorithms that mainly require large amounts of data include Vector Space Model (VSM), LR, DT, and RF. ML algorithm development covers historical data selection, data pre-processing, model selection, model training, model validation, and maintenance. The steps involved in ML algorithm development can be specified as input, feature extraction and selection, features, traditional ML techniques, and output [4]. Similarly, [1] describes the main steps of ML development as historical data, data pre-processing, model selection, training and validation, and model maintenance. Further details on the main steps can be found in [1]. PdM has been broadly applied in industries such as manufacturing using ML techniques [39] and deep learning [40].
[Figure: PRISMA flowchart of the article selection. Box labels: articles identified through IEEE database searching (33); articles after duplicates removed; articles excluded (515); full-text articles assessed for eligibility (273); articles excluded, with reasons; articles included in the literature review.]
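The development steps listed above (historical data, pre-processing, model training, and validation) can be sketched end to end on a toy problem. Everything in the sketch is an assumption made for illustration: the temperature readings are invented, the "model" is a single threshold, and the split is a simple interleaved hold-out rather than a method used in the reviewed papers.

```python
def preprocess(records):
    """Data pre-processing: drop records with a missing sensor value."""
    return [(x, y) for x, y in records if x is not None]

def accuracy(threshold, data):
    """Validation metric: share of records classified correctly."""
    hits = sum((x >= threshold) == bool(y) for x, y in data)
    return hits / len(data)

def train_threshold(train):
    """Model training: choose the threshold with the best training accuracy."""
    candidates = sorted(x for x, _ in train)
    return max(candidates, key=lambda t: accuracy(t, train))

# Historical data: hypothetical (temperature, failed) pairs.
history = [(61, 0), (64, 0), (None, 0), (66, 0), (70, 0), (83, 1),
           (85, 1), (88, 1), (91, 1), (72, 0), (80, 1)]

clean = preprocess(history)
valid = clean[::3]                                        # every third record held out
train = [r for i, r in enumerate(clean) if i % 3 != 0]    # the rest is used for training

threshold = train_threshold(train)
validation_accuracy = accuracy(threshold, valid)
print("threshold:", threshold, "validation accuracy:", validation_accuracy)
# Model maintenance would repeat these steps as new records arrive.
```

On this toy data the threshold fits the training records perfectly but misclassifies one held-out record, which is the kind of gap the validation step is meant to reveal.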

Artificial Neural Network (ANN)
The ANN originates from biology, where neural networks (NN) play a significant role in the human brain [41]. An ANN is an intelligent computational technique inspired by biological neurons [10]. It is a massively parallel computing system consisting of an extremely large number of simple processors with many interconnections. Instead of following a set of rules specified by human experts, ANNs learn the underlying rules from a set of given examples [42]. They are organized in three or more layers (i.e., an input layer, several hidden layers, and an output layer) [43]. Moreover, the analytical capability of ANNs derives from the connections between the network's processing units.
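The layered structure described above can be made concrete with a minimal forward pass through an input layer, one hidden layer, and an output layer. The weights, biases, and the sigmoid activation are hand-picked assumptions for illustration; no training is performed and the values do not come from any reviewed study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each unit sums its weighted inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def forward(inputs, network):
    """Propagate an input vector through every layer in turn."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
network = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),   # hidden layer
    ([[1.2, -0.7]], [0.05]),                    # output layer
]

output = forward([1.0, 2.0], network)
print(output)
```

Each processing unit is trivial on its own; the behavior of the network comes entirely from the weights on the connections between units.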
ANN models are broadly applied in many fields of study because of their capability to learn from examples. In addition, compared with other traditional machine learning algorithms, ANN models have noticeable advantages in handling random, fuzzy, and nonlinear data. ANNs are particularly appropriate for systems with a complex, large-scale structure and unclear information [4]. ANNs are the most commonly applied ML algorithms [1]; they have also been suggested for several industrial applications, including soft sensing [44] and predictive control systems [45]. Hesser, D.F. and Markert, B. [46] trained an ANN model to classify the tool state of a Computer Numerical Control (CNC) milling machine from acceleration data. The study was based on a retrofitting approach intended to bring older machines towards I4.0. Tool wear was monitored by means of a programmable prototyping platform equipped with built-in sensors. The study demonstrates the feasibility of retrofitting older machines. In the study, the built model outperformed Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) models.
Sampaio, G.S. et al. [47] proposed a methodology to process and convert vibration measurements collected from a vibration system that simulated a motor, and to build a dataset for training and testing an ANN model capable of predicting the future condition of the equipment and thus when a failure may happen. The methodology uses frequency and amplitude data, classifies the dataset, and defines a way of calculating the failure time of the vibrating system.
In the study, a Multilayer Perceptron (MLP) was used for the prediction task because it is easy to implement and generalizes well. The proposed ANN model was then compared, in terms of efficiency and Root Mean Square Error (RMSE), with other ML techniques, including Regression Tree (RT), Random Forest (RF), and Support Vector Machine (SVM). The training and comparative results were adequate and showed that the ANN outperformed the others. For medium-term and long-term prediction, the ANN outperformed the other techniques, whereas for short-term prediction the generalization of ANN, RF, and RT was equal.
Falamarzi, A. et al. [48] applied ANN and SVM algorithms to predict gauge degradation measurements for two types of rail track, namely straight and curved segments. Mean squared error and the coefficient of determination were used to evaluate the proposed models, with the ANN achieving a coefficient of determination greater than 0.9. According to the results, both the ANN and SVM models provide satisfactory and broadly similar outcomes, but the ANN models are slightly better than the SVM models at predicting the gauge deviation of straight segments.
Biswal, S. and Sabareesh, G.R. [10] designed and developed a bench-top test rig for investigating the time-domain vibration signatures of several critical components of a wind turbine by imitating the operating conditions of an actual wind turbine, and used it for condition monitoring. They acquired the vibration signatures of the critical components in healthy and faulty conditions, then developed and applied an ANN model to classify the features of the faulty and healthy states. The developed model achieved a classification accuracy of 92.6%.
Zhang, Y. et al. [49] reported a study on a physics-based model and a neural network model for monitoring starter degradation of an Auxiliary Power Unit (APU). In their study, a generic modeling technique was adopted to overcome the lack of component characteristics. Back-propagation and feed-forward neural network models were trained, tested, and compared. Both models were applied under nominal and deteriorated conditions, and their capabilities were validated. Based on the data collected, their analysis concluded that the physics-based approach produces more consistent outcomes for cases with degraded starters, whereas the neural network model showed better results with starters in healthy condition.
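The regression metrics named in these studies (mean squared error, RMSE, and the coefficient of determination) follow their standard definitions, sketched below. The two short value lists are invented for illustration and are not data from the cited papers.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

print(mse(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

A coefficient of determination close to 1 means the model explains most of the variance in the measurements, which is how values such as "greater than 0.9" should be read.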

Support Vector Machine (SVM)
SVM is a well-known ML technique that is widely used for both classification and regression analysis because of its high accuracy [1,50,51]. SVM is defined as a statistical learning concept with an adaptive computational learning method. The SVM learning algorithm is presented in Figure 6. The SVM learning technique maps input vectors nonlinearly into a high-dimensional feature space [52][53][54]. SVM is a supervised ML technique that can perform pattern recognition, classification, and regression analysis. In the PdM of industrial equipment, SVMs have been widely applied to identify a specific status from the acquired signal [55]. Falamarzi, A. et al. [48] applied SVM and ANN algorithms to predict gauge degradation measurements for two types of rail track, namely straight and curved segments. Mean squared error and the coefficient of determination were used to evaluate the proposed models, with the SVM achieving a coefficient of determination greater than 0.75. According to the results, both the ANN and SVM models provide satisfactory and broadly similar outcomes, but the SVM models are slightly better than the ANN models at predicting the gauge deviation of curved segments. The Melbourne tram network was used as the case study.
Xiang, S. et al. [56] proposed a data-driven diagnostics and prognostics framework for machines to increase efficiency and reduce maintenance cost. An accurate data labeling methodology for supervised learning was also developed by comparing the serial numbers of target components on adjacent dates. In the study, real vending machine data were used to validate the proposed framework with three classifiers: SVM, RF, and Gradient Boosting Machines (GBM). Two models were developed for PdM, one for diagnostics and the other for two-stage prognostics. The cross-validated results show that the diagnostics model can achieve an accuracy of more than 80%; thus, the developed SVM model can be applied to diagnostics and prognostics monitoring of complex vending machines. The prognostics model outperforms conventional one-stage prediction models.
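The idea of mapping inputs nonlinearly into a higher-dimensional feature space can be illustrated with a kernel function. The sketch below uses a homogeneous degree-2 polynomial kernel as an assumed example (the cited studies do not state their kernels in this section) and checks that evaluating the kernel in the input space equals a dot product in the explicit feature space.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the input space."""
    return dot(x, z) ** 2

def poly2_feature_map(x):
    """Explicit nonlinear map from a 2-D input to the 3-D feature space of that kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

# Hypothetical 2-D input vectors.
x, z = [1.0, 2.0], [3.0, -1.0]

in_input_space = poly2_kernel(x, z)
in_feature_space = dot(poly2_feature_map(x), poly2_feature_map(z))
print(in_input_space, in_feature_space)
```

Because the two quantities agree, an SVM can work with the kernel alone and never needs to construct the high-dimensional vectors explicitly.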

Decision Tree (DT)
A Decision Tree is a tree-structured model composed primarily of nodes and branches, with the nodes comprising a root node, intermediate nodes, and leaf nodes. The intermediate nodes represent features, and the leaf nodes represent class labels [52]. DT can be used for feature selection [57]. The DT algorithm is presented in Figure 7. DT classifiers have gained considerable popularity in a number of areas, such as character identification, medical diagnosis, and voice recognition. More notably, the DT model can decompose a complicated decision-making mechanism into a series of simpler decisions by recursively splitting the covariate space into subspaces, thereby offering a solution that is easy to interpret [58,59].
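Recursive splitting of the feature space can be sketched in a few functions. The Gini impurity criterion, the threshold search, and the six sample rows are assumptions chosen for illustration; the section does not state which splitting criterion the cited works use.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold) that lowers weighted Gini impurity most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels):
    """Recursively split until a node is pure or no split reduces impurity."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]   # leaf node: a class label
    f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t, build(*zip(*left)), build(*zip(*right)))

def classify(node, row):
    """Follow the branches from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

# Hypothetical [vibration, temperature] rows with string labels.
rows = [[0.2, 60], [0.3, 65], [0.9, 62], [1.1, 80], [0.25, 85], [1.0, 90]]
labels = ["ok", "ok", "fault", "fault", "fault", "fault"]

tree = build(rows, labels)
print(tree, classify(tree, [0.22, 61]))
```

The nested tuples returned by `build` can be read directly as a sequence of simple threshold decisions, which is the interpretability advantage mentioned above.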

Random Forest (RF)
RF was developed by Breiman, L. [60]. It is an ensemble learning algorithm made up of several DT classifiers, and the output category is determined collectively by these individual trees. As the number of trees in the forest increases, the generalization error of the forest converges. RF also has important benefits. For example, it can handle high-dimensional data without feature selection; the trees are independent of each other during training, and implementation is fairly simple; moreover, training is generally fast and, at the same time, the generalization ability is good [4]. In the random forest algorithm, each tree produces a prediction, and the RF combines these tree predictions into the random forest prediction [61]. The RF model is visualized in Figure 8.
Binding, A. et al. [62] reported a study on forecasting the downtime of a printing machine based on real-time predictions of imminent failures. They used unstructured historical machine data to train ML classification algorithms, including RF, XGBoost, and LR, to predict machine failures. Different metrics were analyzed to determine the goodness of fit of the models. These metrics include empirical cross-entropy, the area under the receiver operating characteristic curve (AUC), the receiver operating characteristic (ROC) curve itself, the precision-recall curve (PRC), the number of false positives (FP), true positives (TP), false negatives (FN), and true negatives (TN) at various decision thresholds, and calibration curves of the estimated probabilities. According to the results, all the algorithms performed well and almost equally in terms of ROC, but in terms of decision thresholds, RF and XGBoost performed better than LR.
ML algorithms including Linear Regression, RF, and Symbolic Regression (SR) were applied to model the condition of healthy industrial machinery [63]. The authors proposed a methodology for detecting and predicting drifting behavior (called concept drift) in continuous data streams. Further, a real-world case study on industrial radial fans was presented. On synthetic data, both concept drift detection and prediction were highly successful. Moreover, in the real-world study, on-site experts at the strained radial fan reported that the principle of drift detection had been deployed successfully. However, owing to the lack of continuous deterioration information, the predictability of concept drift is currently based on assumptions and cannot yet be measured, even though the test results are already very promising.
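The way an RF combines independent tree predictions by voting can be sketched as follows. This is a deliberately simplified illustration, not Breiman's full algorithm: each "tree" is a depth-1 stump on one randomly chosen feature, split at the midpoint of a bootstrap sample, and the data are invented.

```python
import random
from collections import Counter

def train_stump(rows, labels, rng):
    """A depth-1 tree on one randomly chosen feature, split at the sample's midpoint."""
    f = rng.randrange(len(rows[0]))
    values = [r[f] for r in rows]
    t = (min(values) + max(values)) / 2

    def majority(side):
        # Fall back to the whole sample if one side of the split is empty.
        return Counter(side or labels).most_common(1)[0][0]

    left = majority([l for r, l in zip(rows, labels) if r[f] <= t])
    right = majority([l for r, l in zip(rows, labels) if r[f] > t])
    return f, t, left, right

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump independently on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sampling with replacement
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict(forest, row):
    """Random forest prediction: majority vote over the individual tree predictions."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical [vibration, temperature] rows.
rows = [[0.2, 60], [0.3, 62], [0.25, 61], [1.0, 90], [1.1, 88], [0.9, 92]]
labels = ["healthy"] * 3 + ["faulty"] * 3

forest = train_forest(rows, labels)
print(predict(forest, [0.22, 61]), predict(forest, [1.05, 89]))
```

An individual stump trained on an unlucky bootstrap sample may be useless, but the vote over many independently trained stumps remains stable, which is the ensemble effect the text describes.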
Janssens, O. et al. [64] proposed a multi-sensor system that uses not only infrared thermal imaging data but also vibration measurements for automatic condition monitoring and fault detection in rotating machines. Feature fusion is used: model-driven features are derived from the vibration measurements and data-driven features are derived from the infrared thermal imaging data. The extracted features are then combined and presented to RF classifiers for the actual fault detection. The study demonstrated that, by combining these two types of sensor data, a variety of conditions, faults, and their combinations can be detected more accurately than with the individual sensor streams.
Lacaille, J. and Rabenoro, T. [65] developed a learning algorithm that can automatically detect and analyze multidimensional datasets of turbofan engine. The developed model uses a very wide population of pre-treatments and statistic tests on the data and has the ability to select good combinations of tests with higher than 85% pre-identification. Quiroz, J.C. et al. [66] proposed a new approach to diagnose broken rotor bar failure in a Line Start-Permanent Magnet Synchronous Motor (LS-PMSM) using RF. The transient current signal during the motor startup was acquired from a healthy motor and a faulty motor with a broken rotor bar fault. The model was trained using features extracted from thirteen different statistical time-domain features, and these features were used in determining the state of the motor where it is operating under faulty or normal conditions. Feature importance was considered for their feature selection in order to reduce the number of features to very few from the RF. Results have shown that RF categorizes the motor disorder as safe or deficient with an accuracy of 98.8% using all the features and an accuracy of 98.4% using only the mean index and impulsion features. A comparison was carried out between the developed model and other traditional ML algorithms including Decision Tree (DT), Naive Bayes classifier (NBC), LR, linear ridge, and SVM, the RF consistently outperforms these algorithms with having a higher accuracy than the other algorithms. The suggested methodology can be used for electronic tracking and fault detection of LS-PMSM motors in the industry, and the findings can be beneficial for the development of preventive maintenance plans in factories.
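Statistical time-domain features of the kind mentioned for the motor current signal can be computed directly from the samples. The six features below are common choices and are an assumption: this section does not list the thirteen features used in [66], and the two sample signals are invented.

```python
import math

def time_domain_features(signal):
    """Compute a few common statistical time-domain features of a sampled signal."""
    n = len(signal)
    mean = sum(signal) / n
    abs_mean = sum(abs(x) for x in signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    variance = sum((x - mean) ** 2 for x in signal) / n
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        "variance": variance,
        "crest_factor": peak / rms,          # peak relative to RMS level
        "impulse_factor": peak / abs_mean,   # peak relative to mean absolute level
    }

# Hypothetical signals: a steady oscillation and one with a single spike.
steady = time_domain_features([1.0, -1.0, 1.0, -1.0])
spiky = time_domain_features([0.0, 0.0, 0.0, 4.0])
print(steady["impulse_factor"], spiky["impulse_factor"])
```

The spike raises the crest and impulse factors far more than it raises the RMS, which is why impulsive features can separate a faulty signal from a healthy one with few inputs.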
Yan, W. and Zhou, J.H. [67] proposed a predictive model that uses Term Frequency-Inverse Document Frequency (TF-IDF) and RF to forecast faults of high sensitivity in advance by analyzing the historical data of aircraft maintenance systems, so that preventive maintenance can be carried out on the basis of the model's predictions. TF-IDF was employed to extract features from the raw data of past consecutive flights. The proposed RF model considered different priorities when classifying the faults. The ROC curve was adopted as the performance metric because the dataset is highly imbalanced. Compared to the other method, the suggested approach reaches the maximum true positive rate of 100% and the lowest false positive rate of 0.13%. For the testing dataset, the proposed method achieves a true positive rate of 66.67% and a false positive rate of 0.13%.
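To make the TF-IDF step concrete, here is a minimal sketch that weights event codes logged per flight. The event codes, the per-flight grouping, and the particular TF-IDF variant (relative term frequency times the natural logarithm of the inverse document frequency, without smoothing) are assumptions for illustration; the study may use a different formulation.

```python
from collections import Counter
from math import log

def tf_idf(documents):
    """Return one {term: weight} dict per document (list of tokens)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical event codes logged during three consecutive flights.
flights = [
    ["E101", "E101", "E205"],
    ["E101", "E330"],
    ["E101", "E205", "E205"],
]
weights = tf_idf(flights)
print(weights[1])
```

A code that appears in every flight (here `E101`) receives zero weight, while a rare code (`E330`) is emphasized; the resulting vectors could then be passed to a classifier such as RF.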

Logistic Regression (LR)
Binding, A. et al. [62] reported a study on forecasting the downtime of a printing machine based on real-time predictions of imminent failures. They used unstructured historical machine data to train ML classification algorithms, including RF, XGBoost, and LR, to predict machine failures. Various metrics were analyzed to determine the goodness of fit of the models: empirical cross-entropy; the receiver operating characteristic (ROC) curve and the area under it (AUC); the precision-recall curve (PRC); the number of false positives (FP), true positives (TP), false negatives (FN), and true negatives (TN) at various decision thresholds; and calibration curves of the estimated probabilities. In terms of ROC, all the algorithms performed well and almost identically, but in terms of decision thresholds, RF and XGBoost performed better than LR. Given a set of independent variables, linear regression estimates a continuous dependent variable, whereas logistic regression estimates a categorical dependent variable [68]. Graphs of the linear regression and logistic regression models are shown in Figure 9.
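The role of the decision threshold can be sketched as follows. A logistic model maps a linear score to a failure probability, and the TP/FP/FN/TN counts then depend on where the threshold is placed. The weight, bias, sensor values, and labels are hypothetical and chosen only for illustration.

```python
from math import exp

def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def confusion_counts(probabilities, labels, threshold):
    """Count TP, FP, FN, TN when predicting failure for p >= threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, actual in zip(probabilities, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["TP"] += 1
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1
        elif predicted == 0 and actual == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Hypothetical one-feature logistic model: p(failure) = sigmoid(2.0 * x - 1.0).
weight, bias = 2.0, -1.0
sensor_values = [-1.0, 0.0, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1]  # 1 = failure observed
probabilities = [sigmoid(weight * x + bias) for x in sensor_values]

at_half = confusion_counts(probabilities, labels, threshold=0.5)
at_strict = confusion_counts(probabilities, labels, threshold=0.8)
print(at_half, at_strict)
```

Raising the threshold removes the false positive in this toy data at the cost of one missed failure, which is the trade-off that threshold-based comparisons of classifiers examine.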

Extreme Gradient Boosted Trees (XGBoost)
XGBoost, developed by Chen, T. and Guestrin, C. [69], is a scalable tree boosting system that is widely used by data scientists and provides state-of-the-art results on many problems. The open-source C++ implementation of XGBoost was used to forecast the downtime of a printing machine based on real-time predictions of imminent failures [62]; the study used unstructured historical machine data to train ML classification algorithms, including RF, XGBoost, and LR, to predict machine failures. Various metrics were analyzed to determine the goodness of fit of the models: empirical cross-entropy; the receiver operating characteristic (ROC) curve and the area under it (AUC); the precision-recall curve (PRC); the number of false positives (FP), true positives (TP), false negatives (FN), and true negatives (TN) at various decision thresholds; and calibration curves of the estimated probabilities. In terms of ROC, all the algorithms performed well and almost identically, but in terms of decision thresholds, XGBoost and RF performed better than LR. The final class is obtained by aggregating the outputs of all the trees in the XGBoost ensemble [70]. An XGBoost tree is presented in Figure 10.

Gradient Boosting Machines (GBM)
GBM is a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. It is an ensemble-based model in which new models are added consecutively, each one updating the prediction results of the previous ones [71,72].
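The consecutive-update idea can be sketched for the simplest case: squared-error loss, a single input feature, and one-split regression trees (stumps) as weak learners. These choices, the function names, and the toy data are assumptions for illustration; this is not the configuration of any reviewed study.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature.
    Assumes xs contains at least two distinct values."""
    best = None
    for split in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x <= split else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: a health indicator that jumps after the third measurement.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
mean_y = sum(ys) / len(ys)

model = gradient_boost(xs, ys)
mse_baseline = mse(lambda x: mean_y, xs, ys)
mse_boosted = mse(model, xs, ys)
print(mse_baseline, mse_boosted)
```

For squared-error loss the residuals are the negative gradient of the loss, which is why fitting each new learner to them amounts to a gradient step in function space.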
A data-driven diagnostics and prognostics framework for machines, intended to increase efficiency and reduce maintenance cost, was proposed by Xiang, S. et al. [56]. In addition, an accurate data labeling methodology was developed for supervised learning by comparing the serial numbers of target components on adjacent dates. Real vending machine data were used to validate the proposed framework with three different classifiers: SVM, RF, and GBM. Two models were developed for PdM, one for diagnostics and the other for two-stage prognostics. The cross-validated results show that the diagnostics model can achieve an accuracy of more than 80%; thus, the developed GBM model can be applied to diagnostics and prognostics monitoring of complex vending machines. The prognostics model outperforms conventional one-stage prediction models.

Linear Regression
Linear regression refers to a multivariate linear combination of regression coefficients [63]. The coefficients are calculated by the generalized least squares technique. Linear regression is deterministic and has few parameters; nothing needs to be adjusted other than the split of the data into training and test sets. Linear regression and Random Forest regression are very common regression algorithms, both in general and for time series, and have already been used in the field of predictive maintenance [73]. A linear regression model in machine learning is presented in Figure 11. ML algorithms including linear regression, RF, and Symbolic Regression (SR) were applied to model the condition of healthy industrial machinery in [63], where a methodology was proposed for detecting and predicting drifting behavior (called concept drift) in continuous data streams. Further, a real-world case study was presented on industrial radial fans. Based on the results obtained using the synthetic data, both concept drift detection and prediction were highly successful.
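A minimal sketch of the idea of modeling healthy behavior and then watching for drift is given below. It fits an ordinary least-squares line to data from a healthy machine and flags drift when the mean absolute residual on new data exceeds a tolerance. The residual-threshold rule, the variable names, and all numbers are assumptions for illustration and are not the detection method of [63].

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def drift_detected(model, xs, ys, tolerance):
    """Flag drift when the mean absolute residual exceeds the tolerance."""
    slope, intercept = model
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors) > tolerance

# Hypothetical healthy-state data: fan speed setting vs. measured power.
healthy_model = fit_line([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.1, 7.9])
slope, intercept = healthy_model

still_healthy = drift_detected(healthy_model, [2.0, 3.0], [4.0, 6.0], tolerance=0.5)
drifting = drift_detected(healthy_model, [2.0, 3.0], [5.0, 7.2], tolerance=0.5)
print(slope, intercept, still_healthy, drifting)
```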

Symbolic Regression (SR)
SR refers to models in the form of a syntax tree composed of arbitrary mathematical symbols (terminals: constants and variables; non-terminals: mathematical functions) that can easily be converted into simple mathematical functions. The syntax trees are evaluated top-down for target estimation [63]. They are developed using the stochastic genetic programming technique from the field of evolutionary algorithms [74].
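The syntax-tree representation can be sketched with nested tuples: non-terminals are `(symbol, left, right)` tuples, and terminals are either constants or variable names. The tuple encoding, the small function set, and the example tree are assumptions for illustration; the genetic programming search that evolves such trees is not shown.

```python
import operator

# Non-terminals: binary mathematical functions available to the trees.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, variables):
    """Recursively evaluate a syntax tree for the given variable values."""
    if isinstance(node, tuple):          # non-terminal: apply the function
        symbol, left, right = node
        return FUNCTIONS[symbol](evaluate(left, variables),
                                 evaluate(right, variables))
    if isinstance(node, str):            # terminal: variable
        return variables[node]
    return node                          # terminal: constant

def to_formula(node):
    """Convert a syntax tree into a readable mathematical expression."""
    if isinstance(node, tuple):
        symbol, left, right = node
        return f"({to_formula(left)} {symbol} {to_formula(right)})"
    return str(node)

# Hypothetical model: 2.0 * speed + load * load
tree = ("+", ("*", 2.0, "speed"), ("*", "load", "load"))
estimate = evaluate(tree, {"speed": 3.0, "load": 2.0})
formula = to_formula(tree)
print(formula, "=", estimate)
```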
SR has been applied to model the condition of healthy industrial machinery [63]. The study proposed a concept drift methodology for continuous data streams. Further, a real-world case study was presented on industrial radial fans. Based on the results obtained using the synthetic data, both concept drift detection and prediction were highly successful. A sample of the SR algorithm is shown in Figure 12.
… (DNN). This was mitigated in their study. Case studies were carried out on machine-fault detection and oil-level prediction; in both cases, the results showed that CNN outperforms the classical FE methods. They added that the proposed method has the potential to improve online condition monitoring (CM), for example of offshore wind turbines. Another potential application is the monitoring of bearings in manufacturing lines. Using thermal imaging together with the trained CNN allows the location of faults in the manufacturing lines to be identified.
Huuhtanen, T. and Jung, A. [76] proposed a study on DL for predictive maintenance of photovoltaic panels. CNN was applied to monitor the operation of the panels: the regular electrical power curve of a photovoltaic panel is estimated from the power curves of the neighboring panels. An unusually large difference between the predicted and the actual (observed) power curve can be used to indicate a malfunctioning panel. By means of numerical experiments, they demonstrated that the proposed method accurately predicts the power curve of a functioning panel and outperforms existing methods that are based on simple interpolation filters.
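The detection logic (predict a panel's power curve from its neighbors, then compare with the observation) can be sketched as follows. Note the substitution: the predictor here is a plain average of the neighboring curves, which is closer to the simple interpolation baselines than to the CNN of [76]. The threshold and all power values are hypothetical.

```python
def predict_from_neighbours(neighbour_curves):
    """Stand-in predictor: point-wise mean of the neighbouring power curves."""
    return [sum(values) / len(values) for values in zip(*neighbour_curves)]

def is_malfunctioning(observed, predicted, max_mean_abs_error):
    """Flag the panel when observed and predicted curves differ too much."""
    error = sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
    return error > max_mean_abs_error

# Hypothetical power curves (W) sampled at five times of day.
neighbours = [
    [0, 50, 100, 50, 0],
    [0, 54, 96, 50, 0],
]
predicted = predict_from_neighbours(neighbours)

working_panel = [0, 51, 99, 49, 0]
faulty_panel = [0, 20, 40, 22, 0]
working_flag = is_malfunctioning(working_panel, predicted, max_mean_abs_error=10.0)
faulty_flag = is_malfunctioning(faulty_panel, predicted, max_mean_abs_error=10.0)
print(predicted, working_flag, faulty_flag)
```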
Pan, Z. et al. [77] proposed a modular cognitive acoustics analytics service for IoT that provides customers with an incremental learning approach to improve their analytical capabilities for nonintuitive and unstructured acoustic data through a combination of acoustic signal processing. They pointed out that the different data formats created in complicated acoustic environments can go through pre-processing and noise reduction stages and then be fed into higher-level analytics platforms. The model allows for acoustic signal-based anomaly detection, acoustic grouping, acoustic signal processing, acoustic array processing, and other features. For classification, the model uses a baseline algorithm when only a small amount of data is available and a DNN-based technique when a large amount of data is available, in order to perform a more accurate classification. The service includes signal processing data, such as sound intensity, spectral centroid, and frequency, and can support numerous applications. The service can also detect several sources of sound, which allows detection and enhancement of the acoustic source. Experimental findings show that the service achieves excellent performance. An application case for the diagnosis of a washing machine is described.
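One of the signal processing quantities mentioned above, the spectral centroid, can be sketched as the magnitude-weighted mean frequency of a signal's spectrum. The naive discrete Fourier transform, the helper names, and the test tones are assumptions for illustration and are not taken from [77]; a production system would typically use an FFT library.

```python
import cmath
import math

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz) using a naive DFT.
    Assumes the signal is not all zeros."""
    n = len(samples)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coefficient = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                          for t, x in enumerate(samples))
        magnitude = abs(coefficient)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum

def tone(frequencies, sample_rate, n_samples):
    """Sum of unit-amplitude sine waves at the given frequencies."""
    return [sum(math.sin(2 * math.pi * f * t / sample_rate) for f in frequencies)
            for t in range(n_samples)]

# Hypothetical recordings: 64 samples at 800 Hz.
single = spectral_centroid(tone([100], 800, 64), 800)
double = spectral_centroid(tone([100, 300], 800, 64), 800)
print(round(single, 3), round(double, 3))
```

A shift of the centroid towards higher frequencies is one simple indicator that the acoustic signature of a machine has changed.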
Jimenez-Cortad, A. et al. [78] carried out a case study based on the application of predictive maintenance to a real machining process. The aim of their study was to increase the tool life of the machine by applying ML methods for RUL prediction. Real-time data were obtained from the computer and then approximated in the analysis. Linear and quadratic regression models were used for RUL estimation, and the study reported accurate RUL predictions for comparison.
Luo, W. et al. [79] employed a predictive maintenance approach for a machine tool driven by a digital twin to avoid faults and casualties. In their study, a hybrid approach was used to calculate RUL, and the results show the prediction error ratio (between the actual value and the predicted value).
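How a linear or quadratic regression model can yield an RUL estimate is sketched below: a polynomial is fitted to a wear indicator over time and extrapolated until it crosses a wear limit. The wear values, the wear limit, the time step, and the helper names are hypothetical; the sketch does not reproduce the data or procedure of [78] or the hybrid approach of [79].

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations and Gauss-Jordan
    elimination. Adequate for tiny, low-degree problems only.
    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x^2 + ..."""
    size = degree + 1
    rows = [[sum(x ** (i + j) for x in xs) for j in range(size)]
            + [sum(y * x ** i for x, y in zip(xs, ys))]
            for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(size):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][size] / rows[i][i] for i in range(size)]

def remaining_useful_life(coefficients, now, wear_limit, step=1.0, horizon=10000.0):
    """Time from 'now' until the fitted wear curve first reaches the limit,
    checked on a grid of 'step'; None if not reached within the horizon."""
    t = now
    while t <= now + horizon:
        wear = sum(c * t ** i for i, c in enumerate(coefficients))
        if wear >= wear_limit:
            return t - now
        t += step
    return None

# Hypothetical tool-wear measurements (arbitrary units) at times in minutes.
times = [0.0, 10.0, 20.0, 30.0, 40.0]
wear = [0.0, 1.0, 4.0, 9.0, 16.0]

rul_linear = remaining_useful_life(polyfit(times, wear, 1), now=40.0, wear_limit=27.0)
rul_quadratic = remaining_useful_life(polyfit(times, wear, 2), now=40.0, wear_limit=27.0)
print(rul_linear, rul_quadratic)
```

On this accelerating wear curve the linear model overestimates the remaining life compared with the quadratic model, which illustrates why the choice of regression form matters for RUL estimation.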
This section summarizes the applications of ML algorithms in PdM. Table 1 presents the analysis and comparison of the algorithms that are mainly applied in the field of PdM according to ML technique, ML category, equipment system, and type of data.

Commercial Platforms available for Machine Learning in Smart Manufacturing Industry 4.0
Data science and machine learning platforms support the development, implementation, and analysis of machine learning algorithms. Such systems integrate intelligent decision-making algorithms with data, thereby enabling developers to build a business solution. Several platforms provide pre-built algorithms and simple workflows, with functionality such as drag-and-drop modeling and visual interfaces that quickly link the required data to the end solution, whereas others require more programming and coding skills. In addition to other machine learning applications, these platforms offer functionality for image recognition, natural language processing, speech recognition, and recommendation systems [125]. The most common platforms are listed in Table 2.
Qubole: Provides a simple and secure data lake platform for ML, streaming, and ad-hoc analytics.

Discussion and Conclusions
The literature review is categorized based on ML technique, ML category, equipment used, device used for data acquisition, description of the applied data, data size, and data type. Based on the comprehensive literature review performed, predictive maintenance continues to be an important method for improving efficiency in all kinds of environments that involve machines that wear down over time. The possibilities for manufacturing and deploying cheap, connected sensors will continue to increase with the rise of IoT. As the amount of data increases with the number of sensors, so will the possibilities for applying machine learning algorithms to perform predictive maintenance.
This paper presents a comprehensive review of ML techniques applied in PdM of industrial components. Recent applications within the timeframe of ten years (i.e., 2010-2020) for several ML algorithms were reviewed and presented. Finally, some discussions have been drawn based on the literature review performed.
It is observed that predictive maintenance has enormous market opportunities, and that machine learning is an innovative solution for implementing predictive maintenance. Yet, according to a PwC survey, only 11% of the companies surveyed have "realized" predictive maintenance based on ML [126]. Some challenges in implementing ML algorithms for PdM in I4.0 are identified in Table 3.
Table 3. Challenges in implementing ML for Industry 4.0 (I4.0).
Most studies used real data; very few applied simulated or synthetic data when developing the ML algorithms. Public datasets used to develop ML-based PdM models include the Bosch data, the SECOM data, the NASA repository dataset for turbofan engines, the CMAPSS aircraft dataset, and the NASA CMAPSS simulation dataset of engine degradation.


Vibration signals acquired using accelerometers are the most used data.
The most applied ML category is classification.
According to existing research, several limitations are highlighted: (1) Although the classifiers have shown excellent accuracy in distinguishing between states, they need to be trained with a complete dataset covering all the faults. (2) Algorithms are selected based on the developer's experience, which can affect the variability of the prediction results. (3) A study carried out with a single prediction method may not produce excellent results; applying other methods and comparing their results can give a better understanding of the problem. (4) Cross-validation of models can fail due to insufficient RAM.
Moreover, some of the works covered in this review employ standard machine learning methods without parameter tuning. Perhaps this is because PdM is a new subject for industry experts and is only beginning to be explored. It is also important to point out that, to obtain good results from a PdM strategy in a plant, the R2F and PvM strategies should already be applied in its processes so that data can be collected for PdM modeling. Based on that data, designing and validating a PdM strategy becomes feasible. During this study, it was noted that machine learning techniques are increasingly being applied to develop PdM applications. Integrating PdM and machine learning provides cost reduction in some applications. Moreover, combining PdM techniques with new sensor technologies can avoid unnecessary replacement of equipment, save costs, and improve process safety, availability, and performance. This paper presents a comprehensive review of ML techniques in PdM by identifying the most used ML techniques, the industrial areas where ML is applied, and the data types used for ML applications; it proposes a way forward and provides a foundation for further research. Below, remarks for further research are made to motivate authors and practitioners.


Extraction of real-time data using an intelligent data acquisition system can help to automate predictive maintenance.
Combining more than one ML model can provide better predictions than using an individual model.
Cloud-based ML model implementations can be studied further.
Classification and anomaly detection algorithms can be combined to maintain the precision of classification models without losing the advantages of anomaly detection. In this way, PdM can be applied to equipment or systems that do not have a large dataset.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflicts of interest.