A Goal Programming-Based Methodology for Machine Learning Model Selection Decisions: A Predictive Maintenance Application

Abstract: The paper develops a goal programming-based multi-criteria methodology for assessing different machine learning (ML) regression models under accuracy and time efficiency criteria. The methodology provides users with high flexibility in assessing the models, as it allows for a fast and computationally efficient sensitivity analysis of the accuracy and time significance weights, as well as of the accuracy and time threshold values. Four regression models were assessed, namely decision tree, random forest, support vector and neural network regression. The methodology was employed to forecast the times to failure of NASA turbofans. The results reveal that decision tree regression (DTR) is preferred for low accuracy weights (up to 30%) and low accuracy and time efficiency threshold values. As the accuracy weights increase, and for higher accuracy and time efficiency threshold values, random forest regression (RFR) becomes the best choice. The preference for RFR, however, shifts towards the adoption of the neural network for accuracy weights equal to and higher than 90%.


Introduction
Industry 4.0 has revolutionized business processes and operations, as it now provides capabilities for dynamically collecting and managing vast amounts of data from IoT devices [1]. On this basis, software democratization provides an opportunity for efficiently managing these data, with ML models emerging as tools for further exploiting the data and supporting decision-making [2]. These models are trained on a training dataset and evaluated on a test dataset to examine their accuracy, with their success hinging upon their ability to dynamically train on data, capture recent disruptions, and thus provide dynamic forecasts [3].
Artificial neural networks (ANNs) are ML models characterized by a complex algorithmic prediction process, which can in turn lead to more accurate forecasts. This algorithmic complexity results in significantly high training times, especially as the number of features increases [4]. As a result, ANNs lack the potential to be dynamically trained on real-time data, and thus to provide dynamic forecasts. This drawback may prevent the planner from identifying a potential disruption on time and taking appropriate proactive actions.
On the other hand, advanced regression models such as DTR, RFR and support vector regression (SVR) are ML models with a simpler solution process. This simplicity allows for dramatically reduced training times, providing users with the capability to dynamically train the models on datasets retrieved in real time and to further deliver dynamic forecasts [5]. However, the simplicity of the algorithmic process can in turn lead to less accurate forecasts.
Predictive maintenance (PdM) is a condition-based research field that heavily depends on dynamic data from equipment and sensors. It does not substitute traditional periodic maintenance management relying on routine servicing and run-to-failure programs, but makes it more reliable by providing a scheduling tool for dynamic PdM tasks [6]. The dynamic monitoring of the operating condition of machines, and the resulting dynamic estimation of their mean time to failure, results in the following: (i) reduction of unnecessary maintenance operations, (ii) increase in the time that spare parts are used, (iii) prevention of unexpected machine breakdowns and (iv) increase of the available production times [7].
The ability of ML models to handle a large stream of data from IoT sensor devices is one of the main drivers supporting their utilization for PdM [8]. However, based on [8], two critical challenges emerge: (i) the latency associated with the high dynamic training times of ML models, which derives from the need for real-time monitoring of large streams of machine operating condition data, and (ii) the selection of the ML algorithm that best fits a specific scenario.
On this basis, the purpose of this paper is twofold. The first aim is to address the above challenges through the development of a fast, computationally efficient and user-friendly goal programming-based methodology that allows planners to (i) select the ML algorithm that balances prediction accuracy and training time efficiency and (ii) assess ML models in a time-efficient manner for different accuracy and time significance weights, as well as accuracy and time threshold values, thus providing planners with the flexibility to quantify the impact of higher and lower latencies on prediction accuracy and to identify solutions appropriate for different scenarios. The second aim is to illustrate the applicability of the developed methodology in a real-world PdM problem and derive critical managerial insights for predicting the remaining lifetimes of machines and equipment.
The rest of the paper is organized as follows. Section 2 provides a critical synthesis of the state-of-the-art literature on PdM and ML algorithms, while Section 3 presents the goal programming-based methodology for selecting the best ML model, along with an analysis of the ML models that will be examined. Section 4 describes the numerical analysis process, while Section 5 analyzes the results. Finally, Section 6 wraps up with the discussion and conclusions.

Literature Review
There is a large number of PdM methods (e.g., vibration monitoring, thermography, ferrography, acoustic emission, corrosion monitoring) [9] that involve the monitoring and measurement of different process parameters and generate a large volume of data, e.g., actual machine condition, failed components, failure rate, mean time between failures, repair times. These data are collected by different real-time and offline sources and can be used to predict the future trends of the machines and to schedule and plan disruptions and the required repairs at the most cost-effective time point [10]. Moreover, such a massive input of data could benefit from the implementation of machine intelligence in maintenance modeling and management, addressing the needs of Industry 4.0 and building intelligent manufacturing [11]. Zonta et al. (2020) [1] classify the main approaches used for prediction into three categories, namely physical model-based, knowledge-based and data-driven, where the last includes models based on ML algorithms. ML models are characterized by the ability to deal with large amounts of multivariate data and by the learning capability of their algorithms. For this reason, ML is an appropriate tool for PdM, and it is currently being increasingly applied to it. There is a number of predictive algorithms used in ML, where each type captures its own patterns and has a bearing upon the performance of PdM applications. Carvalho et al. (2019) [12] ranked ML algorithms by the extent of their application in PdM, finding that the Random Forest (RF) algorithm [13] is used most frequently, followed by Decision Tree (DT) [14], ANN-based methods [15], Support Vector Machine (SVM) [16], k-Nearest Neighbors (KNN) [17], Linear Discriminant Analysis (LDA) [18] and Bayesian Networks [19]. These algorithms can perform prediction tasks or validate proposed plans. In most cases, the PdM models are developed with the use of real vibration data.
However, manufacturing plants are dynamic units where processes change dynamically. This causes heterogeneity in the data and can affect the predictability of the ML models. Due to these two aspects, there is no universal model that could be applied to different scenarios [19]. Moreover, each ML algorithm has its own specific characteristics and applications.
Examples of papers that use the aforementioned algorithms include [20], where the authors create an IoT-based PdM experimental setup for the detection of a faulty bearing. They model the obtained data with five ML algorithms, namely SVM, LDA, RF, DT and KNN, and evaluate the models with eight different metrics. Syafrudin et al. (2018) [21] propose a real-time monitoring system that utilizes IoT-based sensors, big data processing and a hybrid model for fault detection in order to improve decision-making in automotive manufacturing. This prediction model utilizes density-based spatial clustering of applications with noise to separate outliers from normal sensor data, and RF classification to predict faults. Ali et al. (2019) [22] provide a software middleware that uses the outcomes of real-time data analytics in combination with ML models trained on historical data for real-time production forecasting within a manufacturing unit. The authors apply regression-based approaches for prediction, such as Multiple Linear Regression, SVR, DTR and RFR. The proposed integrated framework allows one to calculate the impact of detected abnormal events and set the optimal production targets accordingly. In the context of real-time decision making, [8] identify high levels of scalability and network bandwidth as compulsory requirements and among the major challenges for applying ML models in PdM. Liu et al. (2018) [23] address this issue by proposing the training of ML models at the edge of the network for real-time feature extraction and anomaly detection in a high-speed railway transportation system. The proposed methodology incorporates an Auto-Associative Neural Network.
The ANN technique is widely used to tackle issues related to PdM. Li et al. (2017) [24] point out that the main benefits of ANNs are fault tolerance, generalization and adaptability, while the lack of an explanation function is identified as a limitation. The authors apply ANNs for fault prediction, focusing on the prognosis process for a backlash error in machining centers. Crespo et al. (2019) [25] combine ANNs with data mining to address the problem of monitoring assets' performance and predicting any loss of energy consumption efficiency, where ANNs are used to identify when asset behavior abnormalities may appear. Daniyan et al. (2020) [26] develop training modules comprising ANNs with a dynamic time series model to predict the state and potential failure of a railcar wheel bearing. The Recurrent Neural Network (RNN) technique is a type of ANN that is also applied to PdM problems and is characterized by the ability to capture the dynamics of sequence data [27]. For example, [28] create a Long Short-Term Memory model using RNNs to predict failures and to estimate the number of remaining cycles, or Remaining Useful Life. Bogojeski et al. (2021) [29] also invoke the RNN technique to model the industrial aging process forecasting problem in the context of predicting the degradation of chemical process equipment.
Moreover, some processes may demand the combination of ML techniques. Huang et al. (2020) [30] deal with multi-source sensing data fusion models and algorithms based on neural networks for mechanical equipment fault diagnosis and prediction. The authors come to the conclusion that these algorithms need to be combined with data preprocessing algorithms (e.g., SVM) to achieve higher accuracy. A number of authors apply a combination of ANN and SVM techniques. Thus, [27] bring together these techniques to predict machine system failure events by monitoring the cutting tool and the spindle motor, where SVM was used to classify the conditions of the cutting tool, while two ANN algorithms were used to monitor the condition of a bearing. The study of [31] focuses on the development of a framework that can prevent failure and extend the lifetime of mechanical, electrical and plumbing components of building facilities. The results show that the proposed model, which combines ANN and SVM techniques, can efficiently predict the future condition of these components for maintenance planning. Beyond this combination, [32] develop a PdM approach towards an early maintenance/failure warning system for floating dock ballast pumps using MATLAB and the SVM algorithm. Gohel et al. (2020) [33] combine SVM and logistic regression algorithms to perform PdM of nuclear infrastructure and to predict the failure of nuclear plant infrastructure and engines.
Çınar et al. (2020) [34] reviewed papers on the application of ML algorithms in PdM published between 2010 and 2020. The authors came to the conclusion that a single prediction method may not provide the best results, and that the combination of more than one ML model could provide more accurate predictions. A multi-criteria decision-making methodology for recommending ML algorithms was provided by [35]; it evaluates and ranks classifiers, helps to learn and build classification models, and includes a criteria selection method, a relative consistent weighting scheme, a ranking method, statistical significance and fitness assessment functions, and implicit and explicit constraint satisfaction at the time of analysis. The performance evaluation of ML algorithms using multi-criteria decision-making techniques was also discussed by [36], who applied the Fuzzy Analytical Hierarchy Process (FAHP) for assigning weights to the criteria and ranking the performance criteria, and implemented Simple Additive Weighting and the TOPSIS model to rank the classifiers for comparison. Meanwhile, [37] proposed a library-based overview of component security evaluation based on multi-criteria decision making and ML algorithms. Earlier, [38] developed a multi-criteria-based active learning approach and applied it to named entity recognition.
A summary of the related research in the realm of ML techniques in PdM is presented in Table 1. The results of the critical synthesis reveal the following:
• To our knowledge, only a small number of research efforts provide a multi-criteria decision-making methodology for ML algorithm selection.
• The aforementioned research efforts provide time-consuming multi-criteria methodologies, are theoretical in nature and cannot be easily adapted to multiple scenarios, as the weights assigned to the ML selection criteria are the result of a time-consuming statistical process. Moreover, they do not relate to any focus areas or case studies, and their applicability in PdM is not demonstrated.
This paper contributes to the existing literature through the following:
• Development of a multi-criteria decision-making methodology for ML model selection that utilizes the method of Goal Programming. The methodology allows for a time-efficient sensitivity analysis of the weights assigned to the criteria and of the criteria threshold values, thus providing the decision maker with a wide range of alternatives for optimal ML model selection and making the methodology suitable for multiple PdM scenarios where the weights on time and accuracy efficiency may differ.
• Assessment of the methodology's applicability on the real-world dataset of NASA turbofan times to failure, and the generation of practical managerial insights for predicting the remaining lifetimes of machines and equipment.

Materials and Methods
The methodological approach employed involves the development and assessment of different types of ML models m ∈ M for the forecasting of a machine's time to failure. The models are fitted on a training dataset and employed to forecast the dependent variable values of a test dataset. The forecasting accuracy of each model is denoted by a_m and is assessed by the Mean Absolute Percentage Error (MAPE) metric, thus ranging from 0 to 100%, while the time efficiency is captured by the model's total training and error generation time, denoted by t_m.
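As a concrete illustration, the MAPE metric used here to quantify forecasting accuracy can be computed as follows (a minimal sketch with synthetic numbers, not the paper's actual turbofan data):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, expressed in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Two forecasts that are off by 10% and 20% give a MAPE of 15%.
error = mape([100.0, 200.0], [110.0, 160.0])
```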
In order to find the model that best balances time efficiency and accuracy, a goal programming-based methodology is employed. Significance weights on forecasting accuracy and time efficiency are set, denoted by w_a and (1 − w_a), respectively, and the best model selected should lead to the lowest value of the deviation function d_m of Equation (1),
where a*, t* represent the target accuracy and time efficiency values, respectively. Thus, as model accuracy improves and the training and error generation times decrease, the value of the deviation function decreases.
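Since Equation (1) is not reproduced here, the sketch below assumes a common goal-programming form for d_m, a weighted sum of the normalized positive deviations of a_m and t_m above their targets; the per-model values are hypothetical, not the paper's measured results:

```python
def deviation(a_m, t_m, a_star, t_star, w_a):
    """Assumed goal-programming deviation for model m.

    a_m: model MAPE (%); t_m: training + error generation time;
    a_star, t_star: target (threshold) values; w_a: accuracy weight.
    Only deviations above the targets are penalized, as is usual in
    goal programming; the exact form of Equation (1) is assumed here.
    """
    d_acc = max(0.0, (a_m - a_star) / a_star)
    d_time = max(0.0, (t_m - t_star) / t_star)
    return w_a * d_acc + (1.0 - w_a) * d_time

# Hypothetical (MAPE %, time in minutes) pairs per model.
models = {"DTR": (5.0, 0.5), "RFR": (3.0, 4.0), "ANN": (1.5, 12.0)}

# The model with the smallest deviation is selected.
best = min(models, key=lambda m: deviation(*models[m], a_star=2.0, t_star=5.0, w_a=0.5))
```

With equal weights and these illustrative numbers, the mid-accuracy, mid-time RFR model attains the lowest deviation, mirroring the trade-off the methodology is designed to expose.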
The nomenclature of the model parameters is summarized in Table 2 (Nomenclature of the deviation function parameters). Four regression models will be examined, namely the SVR, the DTR, the RFR and the neural network regression.

Support Vector Regression (SVR)
The method considers a number of independent variables i ∈ I of a training set, denoted by x_i, and their respective weights, denoted by w_i. For simplicity of mathematical expression, [39] expresses the independent variables x_i as a vector x, and the weights w_i assigned to them as a vector w. These are then used for determining the hyperplane equation function of Equation (2), f(x) = w·x + b, where b corresponds to the hyperplane bias.
The hyperplane line splits the independent variable space into two regions. The first involves the area above the line f(x) = 1 and is denoted by R+, while the second involves the area below the line f(x) = −1 and is denoted by R−. This type of split does not consider the independent variable values of the training set between these two lines when determining the optimal values of w, b, thus providing higher degrees of freedom to the model and the flexibility to achieve a lower forecasting error on a test set [40].
The distance of the hyperplane line from f(x) = 1 is denoted by d+, and from f(x) = −1 by d−. The sum of these distances can be estimated as 2/‖w‖, where ‖w‖ corresponds to the length of the weight vector w. The optimization function and the model's constraints are summarized below [41].
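In practice the SVR optimization above is rarely solved by hand; a minimal sketch with scikit-learn's `SVR` on synthetic data (the dataset, kernel and hyperparameter choices here are illustrative assumptions, not the paper's configuration) looks as follows:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for a training set: 3 features, noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.1, size=200)

# epsilon defines the tube around the hyperplane inside which training
# points incur no loss, mirroring the R+/R- split described above.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X[:150], y[:150])
preds = model.predict(X[150:])
```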

Decision Tree Regression (DTR)
DTR is a tree-structured regression model, normally using the mean squared error criterion to decide how to optimally split a node into two or more sub-nodes [42].
As in the SVR model, x_i represents the values of the independent variables i ∈ I of the training set, and y_i the corresponding values of the dependent variable. The model aims to determine the optimal split variables j ∈ J, denoted by s_j, and split points p ∈ P, denoted by s_p, that define the binary partition of a region into R_1(s_j, s_p) and R_2(s_j, s_p), under the sum of mean square error minimization objectives [41]. The model's optimization function is presented in Equation (4) [43],
where c_1, c_2 correspond to the average values of the y_i variables within each partitioned region, as presented in Equations (5) and (6).

Random Forest Regression (RFR)
Compared to DTR, the RFR model generates multiple decision trees by randomly selecting samples of the independent variables x_i and their respective dependent variable values y_i from the examined dataset. For each decision tree, the model optimizes the same decision variables s_j, s_p as those optimized under the DTR model, under the sum of mean square error minimization objectives. An additional decision variable is considered, involving the number of random (ensemble) trees e ∈ E that the RFR should generate, determined under average mean square error minimization objectives [44]. The forecast of the RFR is then estimated as the average of the forecasts of the randomly generated decision trees.
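The averaging step can be verified directly with scikit-learn's `RandomForestRegressor` (the data and ensemble size are illustrative assumptions): the ensemble forecast equals the mean of the individual trees' forecasts.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data standing in for the training set.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 10.0, size=(200, 3))
y = X[:, 0] + 0.5 * X[:, 1]

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# The forest forecast is the average of the individual trees' forecasts.
averaged = np.mean([t.predict(X[:5]) for t in forest.estimators_], axis=0)
```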

Artificial Neural Networks (ANNs)
Based on [45], a neural network consists of input layers connected with hidden layers, which are in turn connected with output layers. During the training process, the independent variable values x_i are used as inputs to the neurons and are distributed unchanged between the input and the hidden layers.
Between the hidden and the output layer, though, the inputs are transformed into a weighted sum further reduced by the threshold value of the neuron, denoted by o_j, as presented in Equation (7), o_j = Σ_i w_ij·x_i − θ_j, where w_ij corresponds to the weight assigned to the independent variable of input layer neuron i that is distributed to hidden layer neuron j, and θ_j corresponds to the threshold value of neuron j.
The results of the output layer are then expressed as a non-linear function of x_i through Equation (8). The algorithm's objective during the training process is to determine the optimal values of θ_j, w_ij that minimize the mean square error E_p per training pattern p ∈ P, through Equation (9) [45].
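The paper's numerical analysis uses a Keras regressor; as a lightweight stand-in, a one-hidden-layer network can be sketched with scikit-learn's `MLPRegressor` (the data, layer size and iteration budget are illustrative assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic linear target standing in for the training patterns.
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(300, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

# The solver fits the weights w_ij and the neuron thresholds (biases)
# by minimizing the squared prediction error, as in Equation (9).
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
net.fit(X, y)
preds = net.predict(X[:3])
```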

Numerical Analysis
The applicability of the developed methodology is examined through its application to the Turbofan Engine Degradation Simulation Data Set provided by the Prognostics CoE at NASA Ames [46]. The dependent variable involves the turbines' times to failure, while the independent variables involve three operational settings, 26 sensor measurements and the turbofan unit numbers.
The examined models will be trained on a training set and employed for forecasting the test set using Spyder (Python 3.8). The training and forecast times will be captured, along with the derived mean square errors for each model type. A sensitivity analysis will be conducted on the accuracy and time significance weights, denoted by w_a and (1 − w_a) respectively, and on the target accuracy and time values, denoted by a* and t* respectively.
The accuracy and time significance weights will range from 10% to 100% with a step of 10%. The sensitivity analysis of the target accuracy threshold will range from 0.1% to 3% with a step of 0.1%, while that of the time efficiency threshold will range from 1 to 15 min with a step of 0.5 min.
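Once the per-model MAPE and time values are known, the full sweep over these grids is cheap, since each deviation is evaluated in closed form. A sketch of the sweep with hypothetical per-model values and an assumed deviation form (not the paper's Equation (1) or measured results):

```python
import numpy as np

# Hypothetical (MAPE %, time in minutes) per model.
models = {"DTR": (2.5, 0.2), "RFR": (1.2, 3.0), "ANN": (0.4, 12.0)}

# Grids mirroring the ranges described in the text.
weights = np.arange(0.1, 1.01, 0.1)        # w_a from 10% to 100%
acc_targets = np.arange(0.1, 3.01, 0.1)    # a* from 0.1% to 3%
time_targets = np.arange(1.0, 15.01, 0.5)  # t* from 1 to 15 min

def deviation(a_m, t_m, a_star, t_star, w_a):
    # Assumed form: weighted, normalized positive deviations from targets.
    return (w_a * max(0.0, (a_m - a_star) / a_star)
            + (1.0 - w_a) * max(0.0, (t_m - t_star) / t_star))

# Record the winning model for every grid point.
winners = {}
for w_a in weights:
    for a_star in acc_targets:
        for t_star in time_targets:
            best = min(models, key=lambda m: deviation(*models[m], a_star, t_star, w_a))
            winners[(round(w_a, 1), round(a_star, 1), t_star)] = best
```

At full accuracy weight and the tightest thresholds, the most accurate (here, ANN) model wins by construction, which is the kind of regime shift the sensitivity analysis is meant to reveal.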

Results
The results of the forecasting process for each model type are summarized in Table 3. For all models, an 80% training set was considered. An attribute selection process was employed for the regression models, leading to only 4 out of 25 decision variables being retained as important, namely the turbofan's unit number and the readings of sensors 4, 9 and 11. Moreover, specifically for the ANN regression model, a Keras regressor was employed, with a significantly low MAPE value realized after 1000 training epochs. Based on the formulated table, we observe that the ANN model exhibits the best performance in terms of accuracy; however, the time required is too high to allow for a dynamic training process, which could in turn lead to lower accuracy in a dynamic setting. On the other hand, DTR is the most time-efficient model.
Figure 1 depicts the impact of a sensitivity analysis on the accuracy and time efficiency weights on the derived deviation functions of Equation (1). Figures 2-10 illustrate the results of the sensitivity analysis on the accuracy and time threshold values, given fixed accuracy weights ranging from 0.1 to 0.9 and thus respective time efficiency weights ranging from 0.9 to 0.1.
More specifically, Figure 2 clearly illustrates the prevalence of DTR for almost all sensitivity analysis values of the accuracy weights. The prevalence is also high in Figure 3, but not in Figure 4, where RFR seems to be the most preferred algorithm. This preference increases further for higher accuracy and time threshold values (Figures 4-9), further validating the results of Figure 1, where RFR is preferred for accuracy weights equal to and above 30%.
Another interesting finding involves the gradual shift of preference towards ANNs as the accuracy weights increase and for higher accuracy threshold values (Figures 4-9), with ANN prevalence realized for accuracy weights equal to and above 90%, regardless of the sensitivity analysis values of the accuracy and time thresholds (Figure 10).

Discussion and Conclusions
We developed a fast and computationally efficient methodology for assessing ML algorithms considering two criteria, namely accuracy and time efficiency. Accuracy is quantified through the MAPE metric, and time efficiency through the algorithms' training and error generation times.
The methodology is based on goal programming and provides the user with the flexibility to easily assess the models under alternative accuracy and time efficiency weights, and for various values of the accuracy and time efficiency thresholds. The methodology was employed for estimating the time to failure of turbofans. The decision tree, random forest, support vector and neural network regression ML models were examined. The results of the numerical analysis are summarized below:
• The DTR model seems to be the most efficient for dynamically estimating turbofan times to failure when considering an accuracy significance weight of up to 30%. However, its efficiency seems to decrease as the accuracy and time thresholds increase.
• The RFR seems to be more efficient for accuracy weights ranging from 30% to 90%, and for higher accuracy and time threshold values.
• The ANN model exhibits significantly high accuracy and thus seems preferable for accuracy weights ranging from 90% to 100%.