SOPHIA: An Event-Based IoT and Machine Learning Architecture for Predictive Maintenance in Industry 4.0

Abstract: Predictive Maintenance (PdM) is a prominent strategy comprising all the operational techniques and actions required to ensure machine availability and to prevent a machine-down failure. One of the main challenges of PdM is to design and develop an embedded smart system to monitor and predict the health status of the machine. In this work, we use a data-driven approach based on machine learning, applied to the woodworking industrial machines of a major Italian woodworking corporation. Failure probabilities are predicted through tree-based classification models (Gradient Boosting, Random Forest, and Extreme Gradient Boosting) computed over the temporal evolution of event data. This is achieved by applying temporal feature engineering techniques and training an ensemble of classification algorithms to predict the Remaining Useful Lifetime (RUL) of woodworking machines. The effectiveness of the proposed method is demonstrated by testing on an independent sample of additional woodworking machines that did not present a machine down. The Gradient Boosting model achieved an accuracy, recall, and precision of 98.9%, 99.6%, and 99.1%, respectively. Our predictive maintenance approach, deployed on a Big Data framework, allows multiple connected machines to be screened simultaneously by learning from terabytes of log data. The target prediction provides salient information that can be adopted within maintenance management practice.


Introduction
The increasing availability of Big Data technology platforms and data-driven applications is changing the way decisions are taken in industry in important areas such as scheduling [1], maintenance management [2,3], and quality improvement [4][5][6]. In the manufacturing industry, machines and systems are becoming more advanced and complex. End-users demand comprehensive maintenance services for their production equipment to ensure high availability and prevent machine downtime. In this context, machine learning can be used efficiently for optimal maintenance decision-making. Most companies and manufacturers possess huge amounts of sensor, process, and environment data. Combining these data with information about failures creates useful training data sets for predictive maintenance.
Approaches to maintenance management are grouped into three main categories which, in order of increasing complexity and efficiency [7], are: (i) Run-to-Failure (R2F), (ii) preventive maintenance, and (iii) predictive maintenance. R2F is the simplest approach, in which maintenance interventions are performed only after the occurrence of failures. Preventive maintenance (PM) comprises all the operational techniques and actions required to ensure machine availability and to prevent a "machine-down" failure. PM is defined as regularly scheduled maintenance actions based on average failure rates, driven by time-, meter-, or event-based triggers. A properly implemented PM strategy can provide many benefits to an organization by extending equipment life, optimizing resource expenditures, and balancing work schedules. However, poor maintenance strategies can reduce a plant's overall machine productive capacity by 5% to 20% [8]. Extensive PM programs require many labor resources, and there is often a probability of performing excessive maintenance that has no positive impact on the equipment [9]. Therefore, it is difficult to determine the optimal level of PM, and it may require years of maintenance actions and data collection before payback is realized [10]. Such maintenance is often carried out separately for every component, based on its usage or on some fixed schedule. Unlike traditional maintenance methods, Predictive Maintenance (PdM) could play a central role in asset utilization, service, and after-sales within the new technological services being realized by Industry 4.0. PdM is defined as a process of determining maintenance actions according to regular inspections of an equipment asset's physical parameters, degradation mechanisms, and stressors, in order to correct problems before a machine down occurs [9].
Standard EN 13306:2001 [CEN: "Maintenance terminology", European Standard EN13306, 2001] defines PdM as "condition-based maintenance carried out following a forecast derived from the analysis and evaluation of significant parameters of the degradation of the item". Contrary to classic PM programs, PdM can improve the performance of the equipment, strengthening the business model of companies. By combining sensing, condition-based monitoring (CBM), predictive analytics, and distributed systems technologies, it is possible to provide remote technical help based on continuous monitoring. PdM works particularly well for systems that are easy to monitor and have easily identifiable characteristics that can be statistically analyzed to determine the Remaining Useful Lifetime (RUL) [11]. Although research unanimously states that implementing PdM can bring enormous benefits to businesses in terms of reduced life-cycle costs [12] and increased product reliability [13], advanced maintenance technologies have not yet been well implemented in the manufacturing industry. This is due to various reasons, including the difficulty of acquiring appropriate condition monitoring (CM) data (e.g., vibrations, currents, temperatures, etc.) and of collecting run-to-failure data in the industrial production environment [14]. In fact, this requires machines equipped with sensors and proper data acquisition systems, and it often implies huge investments to redesign sub-systems of the machines. However, many types of equipment, including CNC machine tools, can already provide in real time massive state data about the machining process, together with a large amount of event data related to errors and faults generated by the diagnostic systems embedded in their controllers.
All these data together represent a huge wealth of information that machine tool manufacturers can nowadays easily access thanks to the new Internet of Things (IoT) and Information and Communication Technologies (ICT) introduced with Industry 4.0.
To the best of our knowledge, current research neglects the importance of such data for PdM in the manufacturing industry; very few studies have explored the possibility of performing equipment failure prediction from event log data, and only in the case of electronic equipment such as medical devices [15] and automated teller machines [16]. No studies have proposed similar approaches to predict the failure of specific mechanical components of machine tools in the context of the manufacturing industry. The neglect of event data may result from the erroneous belief that event data are not valuable as long as the CM indicators seem to work well in reducing equipment failures [17]. However, taking advantage of this data source would allow an alternative, low-cost PdM, without requiring additional sensors to be installed on the machines. The idea of using equipment logs to predict faults poses several issues that have not yet been fully explored. In particular, determining predictive features poses a major challenge, as the logs contain a massive amount of data that rarely includes explicit information for failure prediction [15]. The management of these volumes of data requires a system capable of incorporating the entire technology stack: extraction-transformation-load, data filtering, and advanced machine learning algorithms that identify hidden patterns in operational, usage, and maintenance log data to predict machine downtime. To achieve this, Big Data frameworks come in useful for analyzing the data more efficiently and deploying PdM so that it can be executed in a scalable cloud computing environment.
In this context, the main contribution of this research is to present a methodology for PdM that takes advantage of the Big Data information already provided by machine tool data log systems, without the need to install sensors on the machines to collect specific CM data. This work presents a PdM approach that adopts a data-driven solution deployed in a scalable cloud computing environment. We describe a machine learning application for PdM applied to the ball-bearing component of the Electrospindler (ES) machine. We present a data-driven approach based on a multi-step machine learning pipeline comprising (i) log file parser development, (ii) feature engineering, (iii) model building and model evaluation, and (iv) model deployment and monitoring. By applying this computational pipeline, we tested the hypothesis that aggregated event-driven data (errors and warning events) are associated with machine down and can qualify as predictive markers, or KPIs, for the machine down of woodworking machines.

Related Work
As evidenced by many research reviews [14,[17][18][19], during the past twenty years, enormous efforts have been made to improve CBM systems with increasingly effective diagnostic and prognostic capabilities. The approaches predominantly used for the development of PdM systems can be classified into knowledge-driven and data-driven [14,19].
Knowledge-driven approaches typically make use of "a priori" human expertise or involve building models based on comprehensive knowledge of the system physics. When fault models of the system and of their progression are available, model-based approaches provide the most accurate and effective results [19]. However, as system complexity and uncertainties increase, it can be very difficult and costly to adopt such approaches. Data-driven approaches represent the most suitable solution in all practical cases where it is easier to gather data than to build accurate expert systems or physics models [18]. They attempt to derive models directly from historical records, exploiting machine learning algorithms: the basic idea is to learn system behavior by observing deviations from nominal operating conditions that reveal faulty operating states, and to extrapolate this knowledge to determine the RUL of certain components. The most commonly used learning approaches include Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Bayesian Networks (BNs), and Hidden Markov Models (HMMs) [14,19,20]. The ANN is the most commonly used Artificial Intelligence technique in machinery RUL prediction [14], mainly because of its strong nonlinear modeling capability, robustness, and self-learning ability [20]. However, this method often requires a large amount of high-quality training data and is difficult to generalize across different contexts, as its structure and parameters are generally initialized randomly or specified manually [14]. A BN is a probabilistic acyclic graphical model [21], where each node represents a random variable that can be continuous or discrete. Expert knowledge about the domain is needed to identify those random variables and build the graph [22]. The BN is a multiple-state model with the ability to test several outputs in the same model, but it can only be used to predict failure times for previously anticipated faults, with anticipated symptoms.
The Dynamic Bayesian Network (DBN) extends the BN. It is a general state-space model for describing stochastic dynamic systems [23]. DBNs have been used for predicting failures in industrial plants [24]. While the DBN is a potential tool to predict failures in advance, its models are so far applicable only to simple networks with a few variables. Several works have also used HMMs to estimate the sequences of hidden degradation states of a system before a failure occurs [25][26][27]. The accuracy of HMM models depends on the quality of the available temporal data, and they are not suitable for a large number of variables. To collect information about the operating conditions of critical components of mechanical equipment, proper sensors are needed for recording and monitoring CM data (e.g., vibration, temperature, current, etc.). Some studies rely on both CM data and event data. For example, Canizo et al. [28] presented a Big Data analytics approach for the renewable energy field. In particular, they developed a PdM application for wind turbines by using a Big Data processing framework to generate data-driven predictive models based upon historical operational data (e.g., recorded power, wind speed, rotor speed, and generator speed) and system status data, previously stored in the cloud.
Given the increasing amount of data, there is growing interest in the application of machine learning and reasoning to predictive maintenance. In particular, machine learning may lay the foundation for overcoming several challenges, including the need to integrate data from various sources (source heterogeneity) and systems, the need to provide accurate prediction models [29], and the need to meet real-time monitoring demands while dealing with latency and scalability [30]. Hence, the major advantages of ML models in this context can be expressed in terms of predictive performance, computational effort, and interpretability. However, the use of machine learning in this context may raise other challenges, such as obtaining training data, dealing with dynamic operating conditions (domain shifts), and selecting the most appropriate method for each industrial case. Only in recent years have some data-driven approaches for PdM based only on event log data been proposed for several types of equipment. For example, Sipos et al. [15] report the application of multiple instance learning (MIL) to build predictive models from log data for medical equipment. MIL is a variation of supervised learning where a single class label is assigned to a bag of instances (bags are labeled positive or negative). Given bags obtained from different equipment and at different dates, the goal is to build a classifier that will correctly label either unseen bags or unseen instances. However, to the best of our knowledge, no studies have proposed similar approaches to achieve PdM for predicting the failure of specific mechanical components (e.g., roller bearings, motor inverters, etc.) by taking advantage of CNC machine tool data log systems.
In our study, we adapted our PdM application to a Big Data environment using cloud computing and Big Data frameworks, providing the method with the ability to scale and to process the data of hundreds of thousands of connected machines. We built a computational pipeline to deal with high-dimensional data and to model complex interactions among event-based error aggregates parsed from unstructured log files. To achieve this, we tested the hypothesis that aggregated features computed by feature engineering on temporal data can be efficiently modeled with machine learning classifiers. The proposed application can help enable PdM and root-cause analysis of the machine down of woodworking machines.

Log File Data Collection
We concentrated our efforts on the broken ball bearing component, which causes the machine down of Rover woodworking devices. We collected the standard output (stdout) log files of machines having at least five months of historical data. The collected stdout log file data set came from 14 Rover machines: 5 ES with a broken ball bearing component and 9 ES without a broken ball bearing (control group), which were subjected to data processing as described in the following paragraphs.

Big Data Architecture
The Big Data architecture is reported in Figure 1. The Big Data design of the PdM application comprised three major modules: (i) data acquisition, (ii) data processing and data science, and (iii) predictive monitoring. For data acquisition, Azure Blob Storage (https://docs.microsoft.com/enus/azure/hdinsight/hdinsight-hadoop-use-blob-storage) was used. Azure Storage is a distributed, robust, general-purpose storage solution that integrates with Azure HDInsight. HDInsight (the Azure Hadoop distribution) uses a blob container in Azure Blob Storage as the default file system for the cluster. Through a Hadoop Distributed File System (HDFS) interface, the full set of components in HDInsight can operate directly on log files stored as blobs. Data are retrieved through the Azure Blob Storage REST API service using the HDFS interface.
Apache Spark (https://spark.apache.org/) (with its Python API, PySpark) was used for data processing (Figure 1B). Apache Spark is a fast, general-purpose, in-memory engine for large-scale data processing and feature engineering. These features make Apache Spark well suited to processing and analyzing large volumes of log files. Apache Spark is also a very flexible platform; thanks to this, it was possible to install the PySpark libraries needed to run the machine learning algorithms. The PdM application uses two data processing modes: offline processing, to generate predictive models based on historical log data files, and online processing. The Python language is used for the offline mode, while PySpark is employed for the production environment (Figure 1B). For online processing, the Spark job was scheduled through the Linux cron scheduler to run the entire pipeline every 24 h and update the machine-down prediction probabilities with new data. Finally, the updated prediction probabilities for machine-down failure were visualized daily and monitored through a front-end web application.
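The 24 h scheduling described above can be expressed as a crontab entry; the paths, script name, and cluster options below are illustrative placeholders, not taken from the deployed system.

```shell
# Hypothetical crontab entry: run the PySpark PdM pipeline daily at 02:00
# and refresh the machine-down prediction probabilities.
0 2 * * * spark-submit --master yarn --deploy-mode cluster /opt/pdm/pipeline.py >> /var/log/pdm_pipeline.log 2>&1
```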

Data Science Module
The data science module is summarized in Figure 2B. Our methodology can be encapsulated in a protocol that has two main components: the log file parsing module and the Machine Learning (ML) module. The main steps of the protocol are: (i) log-file parsing, (ii) feature engineering for the selected error groups (Table A2), (iii) model building, and (iv) model evaluation.

Log-File Parsing Module
Event triggers and log file timestamps were parsed using ad hoc regular expressions (Figure 3A). In total, 104 errors and warnings were selected and grouped into major error groups: emergency, security control, unexpected stop (KO) error, inverter, overheating, and tool change (Table A2). The target variable was expressed as the RUL and estimated by calculating the remaining number of days to the failure event, based on the internal service records (Figure 3B,C). The RUL variable was discretized into a binary variable and tested for several window periods before machine down (W = {30,20,10} days). The optimal binary RUL split was evaluated and tested in the classification step.
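As an illustration of the RUL discretization step, the binary target for each window W can be derived from the days-to-failure count as follows (the failure date and the date range are hypothetical):

```python
import pandas as pd

# hypothetical failure date taken from a service record
failure_date = pd.Timestamp("2019-06-30")
# daily observations leading up to the failure
dates = pd.date_range("2019-05-15", "2019-06-30", freq="D")

# RUL in days, then binarized for each window W before machine down:
# label 1 when the machine is within W days of the failure, else 0
rul_days = (failure_date - dates).days
binary_rul = {w: (rul_days <= w).astype(int) for w in (30, 20, 10)}
```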

Feature Engineering
We used a 'bag-of-words' approach based on Scikit-learn's CountVectorizer [31] to count the frequency of events (Figure 2A). Feature engineering was then applied to construct the final training data set. In this step, for each error classification group (Table A2), a rolling mean of size 'W' (W = {30,20,15,10} days) was applied to calculate the lagged features (Table 1).

Model Optimization
On the training set, cross-validation (CV) was used to estimate the optimal values of the hyperparameters. The optimal parameter values were determined using a grid search on the training set, performed over several ranges of the following parameters: max depth, learning rate, number of trees, column sample rate, and sample rate, as reported in Table A3. For each approach, the optimized set of hyperparameters was then used to train the classifier on the training group; the performance of the resulting classifier was then evaluated on the testing set. In this way, we achieved unbiased estimates of the performance of each method.
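A minimal sketch of this tuning step, using scikit-learn's GridSearchCV and GradientBoostingClassifier as illustrative stand-ins for the H2O grid search (the parameter ranges below are examples, not the ones in Table A3, and the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# synthetic stand-in for the training set
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# example ranges for max depth, learning rate, number of trees, sample rate
param_grid = {
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "subsample": [0.8, 1.0],
}
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    param_grid, cv=4, scoring="accuracy")
grid.fit(X, y)  # grid.best_params_ holds the selected hyperparameters
```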

Machine Learning Model Building
Using the H2O.ai Python package (https://www.h2o.ai), we trained and tested nine machine learning models with different binary RUL targets (W = {30,20,10} days), comprising Extreme Gradient Boosting (XGBoost), Distributed Random Forest (DRF), and Gradient Boosting Machine (GBM). The key advantage of tree-based models is that they are easy to interpret, since the predictions are given by a set of rules [32,33].
We also performed experiments with other state-of-the-art machine learning methodologies (i.e., SVM with linear and Gaussian kernels, the nearest-neighbor [NN] classifier, and the decision tree [DT]). In our task, XGBoost, DRF, and GBM achieved higher performance (in terms of accuracy, precision, and recall) than SVM and DT. We did not consider neural-network-based models in our comparisons, because their potential may be limited by the interpretability of the model, which does not always allow pattern localization to be performed [34]. On the other hand, the key advantage of tree-based models is that they are easy to interpret, since the predictions are given by a set of rules. The knowledge of these rules and of how the prediction is achieved is key information that may support the operator in the diagnosis and prognosis procedures in the context of PdM.
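The comparison above can be sketched as follows, with scikit-learn implementations standing in for the H2O and SVM models actually used, on synthetic data only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# one tree ensemble vs. two of the baselines mentioned in the text
models = {
    "GBM": GradientBoostingClassifier(random_state=0),
    "SVM (Gaussian kernel)": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}
accuracy = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```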

Experimental Procedures and Metrics
To obtain unbiased estimates of the performances, the data set was divided into two groups, a training set and a testing set, with a 70-30 split. The training set (70%) was used to determine the optimal values of the hyperparameters of each method and to train the classifier; the testing set was then used only to assess classification performance. On the training set, CV with a grid search was used to estimate the optimal hyperparameter values over the parameter ranges reported in Table A3, as described in the Model Optimization section. The ML model that achieved the highest expected value of accuracy (see (1)) was selected as the optimal classifier and tested for several periods before machine down (W = {30,20,10} days).
The performance of the ML model was evaluated in terms of predictive accuracy and model interpretability. The predictive accuracy of the proposed approach was assessed according to standard classification measures (accuracy, recall, and precision). In terms of model interpretability, variable importance was used to determine the relative influence of each variable on the classification task for the three models GBM, DRF, and XGBoost.
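Denoting true positives, true negatives, false positives, and false negatives as TP, TN, FP, and FN, these measures follow the standard definitions:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}
```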

Experimental Results
The analysis was based on a computational pipeline protocol (Figure 2) comprising the following steps: (i) a pre-processing step (log file parsing, feature engineering), (ii) a classification step (model building and model evaluation), and (iii) model deployment (Hadoop Spark cluster and monitoring).

Log File Parser
In the pre-processing step, we developed a log file parser to extract events and pre-process the data (Figure 2A). Stdout log files are a collection of events recorded by the various applications running on the ES equipment. Each input log record consists of a timestamp, expressed in milliseconds, indicating when the event occurred, and a message text (either fully unstructured or generated from a template) describing the event, accompanied by an alphanumeric code indicating the event category or group; these variations reflect the developers' original idea of which states are valuable to report. Events can be aggregated into functional units, called program units, delimited by specific start and stop flags (Table A1). During the extraction phase (Figure 3B), we applied a set of regular expressions to (i) extract program units and (ii) extract specific event codes, including PLC event codes, error codes, inverter codes, etc. (Table A2). In the processing phase (Figure 3C), each record was aggregated into program units and the frequency of all event codes was calculated. The target variable (Figure 3C) was expressed as the RUL and estimated by calculating the remaining number of days to the failure event, based on the internal service records.
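A simplified sketch of the extraction phase is shown below; the start/stop flags and the event-code format are hypothetical placeholders, since the actual patterns are given in Tables A1 and A2:

```python
import re
from collections import Counter

# toy stdout fragment: timestamp in ms, program-unit flags, event codes
log = """\
1561891200000 PGM_START
1561891201000 ERR_INV_021 inverter fault
1561891202500 WRN_EMG_004 emergency stop pressed
1561891204000 PGM_STOP
"""

# (i) extract program units delimited by start/stop flags
units = re.findall(r"PGM_START(.*?)PGM_STOP", log, flags=re.DOTALL)

# (ii) extract event codes and count their frequency per program unit
code_re = re.compile(r"\b(?:ERR|WRN)_[A-Z]+_\d+\b")
unit_freqs = [Counter(code_re.findall(u)) for u in units]
```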

Feature Engineering
We applied a feature engineering technique to create more informative variables for our training data set, using the sliding-window method to calculate novel variables called lag features. Sliding windows are fixed-size windows used for aggregate computations: for each data point in a temporal sequence, aggregates are computed from the data points in a pre-defined window. In our training data set, for each record, a sliding window of size 'W' was chosen as the number of units of time over which to compute the lag features for the daily program units (Table 1). Lag features were computed over the W periods preceding the machine-down failure event, using four window sizes (W = {30,20,15,10} days). The final data set (Figure 3D), consisting of 14 cases (five with machine failure and nine without), comprised 44 aggregated features and 2975 records, which were submitted to the classification step (Figure 2B).
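The lag-feature computation can be sketched with a pandas rolling mean; the counts and the 3-day window below are illustrative (the actual windows are W = {30, 20, 15, 10} days):

```python
import pandas as pd

# hypothetical daily inverter-error counts for one machine
daily = pd.Series([0, 1, 0, 2, 3, 1, 4],
                  index=pd.date_range("2019-06-01", periods=7, freq="D"),
                  name="num_err_inv")

# lag feature: rolling mean over the last W days (here W = 3 for brevity)
W = 3
lag = daily.rolling(window=W, min_periods=1).mean()
lag.name = f"avg_num_err_inv_{W}"
```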

Classification Results
In the classification step, we trained, tuned, and evaluated three tree-based models: Distributed Random Forest (DRF), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). For each model, the optimal values of the hyperparameters were identified using a grid-search method (Table A3). Classification accuracy was estimated by means of four-fold CV on the training data set (70%) and validation on an independent testing data set (30%). Since the GBM model achieved the highest expected value of accuracy in the testing data set for the 30-day RUL target, it was selected as the optimal classifier. The classification accuracy values of all three models are listed in Table 2. The accuracy, sensitivity, and specificity in the testing group were 98.9%, 98.6%, and 98.3%, respectively (Table 2, Figure 4A,B). The Area Under the ROC Curve (AUC) was 0.92 in the testing data set (Figure 4B). The XGBoost and DRF models achieved lower metric values on the testing set (Table 2). The confusion matrix for the GBM classifier supported these observations as well (Figure 4A). This classifier demonstrated reliable performance, showing that it can predict the machine down caused by the ball bearing component with high accuracy.

Model Feature Importance
The high interpretability of the GBM model allows the importance of each feature to be extracted in order to localize the most discriminative predictors. The H2O GBM algorithm [35] calculates the importance of each feature based on whether that variable was selected during splitting in the tree-building process and on how much the squared error improved as a result. Figure 4C shows a horizontal bar plot with the top 10 variables, indicating which variables the GBM has learned to be important in distinguishing machine-down failures from the control group. The features are sorted in decreasing order of importance, assuming that the feature with the highest score is the most discriminant one. Among the most significant variables provided by the optimal model (GBM), the top three belong to the inverter error group (avg_num_err_inv_30), the number of daily errors (avg_num_err_30), and the tool change group (avg_num_err_ut_30), whereas for DRF and XGBoost the top variables belong to the average number of programs (avg_num_pgr_30, avg_num_pgr_15, and avg_num_pgr_10) and the inverter error group (avg_num_err_inv_30) (Figure 4C and Figure A1 in Appendix A).
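The variable-importance ranking can be sketched as follows, using scikit-learn's GradientBoostingClassifier as a stand-in for the H2O GBM, with hypothetical feature names and synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# hypothetical aggregated-feature names for six error groups
names = [f"avg_num_err_{g}_30" for g in ("inv", "emg", "ut", "ko", "sc", "tc")]
X, y = make_classification(n_samples=400, n_features=len(names), random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# sort features by decreasing importance, as in the bar plot of Figure 4C
order = np.argsort(model.feature_importances_)[::-1]
ranked = [(names[i], float(model.feature_importances_[i])) for i in order]
```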

Platform Integration: PdM Application and Machine Down Monitoring
In the deployment phase (Figure 2C), the machine learning pipeline was deployed on an Azure HDInsight Spark cluster with PySpark components. This architecture was composed of two master nodes, two slave nodes, and other services such as an SQL database and distributed storage. The BIESSE manufacturing machines were connected over the internet, providing monitoring of the machine status. As shown in Figure 1, log files were retrieved from blob storage (Figure 1A) and pre-processed by the log file parser (Figure 1B). Raw log files were provided in a compressed format; they were then decompressed, parsed, saved in a staging layer, and submitted to the Data Science module (Figure 1B) to compute the machine-down failure probabilities. To update the machine-down prediction probabilities, a scheduler was configured to run the Spark job pipeline every 24 h. The prediction probabilities were visualized within a dashboard to monitor the machine-down status (Figure 1C).

Discussion
One of the challenges of the Industry 4.0 approach is to design and develop an embedded smart system to monitor the health status of the machine. PdM has been featured as a key theme of Industry 4.0, whose application allows unscheduled downtime to be reduced, consequently improving productivity and reducing production costs. Our work focused on the design and development of a computational pipeline for classifying the health status of the machine based on event information, taking advantage of the event-based errors in log files. Through the application of advanced analytics to these new data streams, the cloud-connected machine benefits from machine learning algorithms to perform PdM.
Our approach was developed with the active involvement of domain experts and was evaluated and shown to be effective by both machine learning and domain standards. The proposed approach allowed multiple connected machines to be screened simultaneously, thus providing day-by-day monitoring that can be adopted within maintenance management. This is achieved by applying temporal feature engineering techniques and training tree-based classification algorithms to predict the RUL of woodworking machines. The effectiveness of the proposed approach is demonstrated by testing on an independent sample of additional woodworking machines. The GBM classifier correctly classified 78 out of 81 records in the machine-down period (30-day RUL machine down) and 605 out of 610 records without machine down, yielding 98.2% accuracy, 98.6% recall, and 98.3% precision (Figure 4A). The area under the ROC curve for the scores was 92.1% (Figure 4B).
We showed that, starting from unstructured log files, trigger events can be efficiently exploited to enable a PdM system. This is due to several unique aspects of log files: they contain timestamps and can be viewed both as symbolic sequences (over event code triggers) and as event frequencies over aggregated temporal windows (lagged features). A previous data-driven PdM approach was to manually create rule-based predictive patterns for a particular component based on a Boolean combination of a few relevant event codes [16]. Such an approach is heavily experience-based and very time-consuming, but it illustrates an important concept: component failure can be predicted by checking daily logs for patterns. Many methods, such as Markov processes [25][26][27] and Bayesian networks [36], have been applied to enable PdM and CBM by estimating failure rates, the deterioration of components, and the breakdown of machines, and by supporting effective scheduling. Our approach encapsulated the developed computational pipeline and achieved a high overall success in predicting the failure of the ball bearing component.
Another major challenge in predictive maintenance is collecting faulty data for training the supervised model. From a machine learning perspective, the purpose is to collect behavior from the normal state through to failure [37]. In real applications, the failure event is not always easy to collect, thus leading to a highly imbalanced setting. However, these data are important for learning a discriminative model between normal and failure samples by training a supervised machine learning model over two classes (normal state, anomaly state). In our approach, the target variable was expressed as the RUL and estimated by calculating the remaining number of days to the failure event, based on the internal service records. The RUL variable was discretized into a binary variable and tested for several window periods before machine down (W = {30,20,10} days). This procedure allows the optimal RUL threshold to be set in order to (i) accurately identify high-risk and low-risk failure events while (ii) alleviating the natural imbalance of this task. In fact, the optimal binary RUL split was evaluated and tested in the classification step. Future work may extend the binary formulation into a multi-class paradigm while alleviating the imbalanced setting.
The RUL prediction task is often affected by uncertainty. In the literature, uncertainty quantification in RUL prediction is approached as an uncertainty propagation problem using statistical models [38]. In our work, we deal with this problem by introducing non-linear boosting methods to learn the intrinsic variability (e.g., variance) of representative data. However, the boosting methods are trained offline. As future work, we plan to extend this methodology by employing an online boosting training strategy that might be able to learn time-varying parameters in the presence of both non-linear and non-stationary conditions.

Conclusions
In conclusion, we presented a log-based PdM application that takes advantage of event-based triggers and applies state-of-the-art machine learning techniques to build predictive models. The main advantage of such an approach is the use of aggregated event-based predictors (errors and warning events) as temporal characteristics to predict potential machine-down failures. The proposed PdM application for woodworking machines was deployed in a distributed Big Data environment to generate data-driven predictive models based upon historical log data. The PdM application, deployed on a Big Data stream processing framework, screens log files and predicts the machine-down state of the woodworking machines every 24 h. The machine-down failure status is visualized through a front-end dashboard where the trend of the predicted probability of each connected machine can be monitored in real time. The presented approach, applied to woodworking industrial machines, could also be applied to other domains, such as IT infrastructure and industrial or medical equipment, in which equipment logs are recorded. Our model evaluation has shown a reliable predictive performance of the classifier (up to 98.9%, 99.6%, and 99.1% accuracy, recall, and precision, respectively). Future work may be addressed to generalize the proposed system to other components or industrial machine types. In this context, domain-adaptation techniques can be exploited in order to mitigate the domain shifts between training and test sets. In this scenario, unsupervised domain adaptation methods can be exploited (e.g., [39]), taking into account the difficulty of collecting RUL information for specific industrial machine types. Another interesting future direction would be to model the temporal dynamics of each feature within pre-defined windows; to achieve this, the boosting-based models could be extended to multiple instance boosting models.

Figure A1. Variable importance of the XGBoost (left side) and DRF (right side) classifiers. The horizontal plot shows the top 10 variables selected by each model to classify machine down against the control group.