From Novelty Detection to a Genetic Algorithm Optimized Classification for the Diagnosis of a SCADA-Equipped Complex Machine

Abstract: In the field of diagnostics, the fundamental task of detecting damage is essentially a binary classification problem, which is addressed in many cases via Novelty Detection (ND): an observation is classified as novel if it differs significantly from reference, healthy data. In practice, ND is implemented by summarizing a multivariate dataset with univariate distance information called a Novelty Index (NI). As many different approaches are possible to produce NIs, this analysis studies the possibility of implementing a simple classifier in a reduced-dimensionality space of NIs. Beyond such a simple decision-tree-like classification method, the process for obtaining the NIs can serve as a dimension reduction method and, in turn, the NIs can feed other classification algorithms. Finally, a case study is analyzed using the data published by the Prognostics and Health Management Europe (PHME) society on the occasion of the Data Challenge 2021.


Introduction
The maintenance of a mechanical system plays a fundamental role in the industrial field and has repercussions both in terms of safety and economics, as it allows for reducing costs and downtime. In recent years, maintenance techniques have evolved rapidly, passing from corrective and preventive approaches to the more recent condition-based [1] and predictive ones. Research is now focusing on further diagnostic techniques aimed at prescriptive maintenance, which exploits predictions to recommend operational decisions thanks to the recognition of the damage type and, consequently, of its cause [2].
Among the different diagnostic techniques and prominent studies present in the literature [3][4][5][6][7][8][9][10][11], Novelty Detection (ND) is a classification technique based on the recognition of "abnormal" values and is frequently used for fault detection in complex industrial systems. In particular, the novelty information corresponds directly to fault detection once confounding influences, such as working and environmental conditions, are excluded. ND can be based on different types of approaches, among which are distance-based and model-based approaches, support vector methods, other statistical methods, and neural networks. For example, in [12], a multivariate technique, namely Principal Component Analysis (PCA), was implemented for diagnostics via Novelty Detection. In general, ND can be seen as a classification technique between two classes (normal and abnormal or, in the context of diagnostics, healthy and damaged). Since pattern recognition usually involves a higher number of classes, from the point of view of a multiclass diagnostic system for prescriptive maintenance this problem can be decomposed into multiple two-class classifications using ND.

Case Study
The proposed diagnostic method was developed for industrial applications and, in particular, was tested on the dataset that was distributed for the Prognostics and Health Management Europe (PHME) society Data Challenge 2021 [35]. This section describes the dataset and the related test bench used for its acquisition.

Test Bench Description
The machine used by PHME is a complex system, composed primarily of the 4-axis SCARA robot shown in Figure 1. It represents a typical component for the quality control of an industrial production line. Electric fuses, picked up with a vacuum gripper, are tested on this bench (electrical conduction, temperature reached by induced heating). For these controls and for real-time monitoring of the machinery health state, a Supervisory Control and Data Acquisition (SCADA) system composed of 50 sensors was implemented to record the evolution of the quantities of interest and to consider the contributions of the several different components. Please note that the main components making up the entire machine are the 4-axis SCARA robot, a thermal imaging camera and a camera for detecting fuses, Electronically Commutated (EC) and Direct-Current (DC) motors, a pneumatic system, including vacuum pumps and various valves, and an electrical power supply circuit for the control tests. As can be noted, the overall structure of the machine is rather complex and heterogeneous. There are components of a different nature (rotating and non-rotating parts, electric and pneumatic equipment), and this makes the extraction of the features most representative of the machinery health conditions more challenging.

There are no defects throughout the entire quality control line during the tests carried out under healthy conditions. The five different artificial failure forms introduced were obtained by manually altering one or more components. The five introduced faults affect the sensor readings in different ways, so this dataset potentially allows one to classify not only the presence of defects but also their type, from a prescriptive maintenance point of view.

Dataset Description
The recorded experimental dataset comprises 50 signals, relating to different quantities of interest. These quantities vary from measurements of ambient temperature and humidity to pressure measurements inherent to the state of the machine, up to quantities of a different nature, such as CPU temperature and process memory consumption. Each of these signals is described in the dataset through a specific set of fields, referred to a fixed time window (vCnt = number of samples recorded; vFreq = sampling frequency; vMax = maximum recorded value; vMin = minimum recorded value; vTrend = trend of the historical series; value = average value). Appendix A lists the different signals present in the dataset and the related measured fields per sensor. The reference time windows have a duration of 10 s, while each experiment can last from approximately 1 to 3 h. However, the dataset was pre-processed by averaging the available features over the entire acquisition period (i.e., ≈1 to 3 h) to limit the dataset size and to obtain unique features describing each experiment. Furthermore, not all measures have all the above fields. Therefore, the data matrix X0 has dimensions m × n, with m = 70 rows (the tests) and n = 240 columns (the features). In total, 50 tests were performed under healthy conditions, while the five conditions with different failures have a cardinality of 4 tests each, for a total of 70 tests.
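As a minimal sketch of this pre-processing step (the numeric values and field ordering below are invented for illustration, not taken from the PHME dataset), the per-window features of one experiment can be collapsed into a single row of X0 as follows:

```python
import numpy as np

def average_experiment(windows):
    """Collapse the per-10-s-window features of one experiment
    (shape: n_windows x n_features) into a single feature row
    by averaging over the whole acquisition period."""
    windows = np.asarray(windows, dtype=float)
    return windows.mean(axis=0)

# Toy experiment: 5 windows of 4 hypothetical fields (e.g. vCnt, vMax, vMin, value)
exp = [[10, 3.0, 1.0, 2.0],
       [10, 3.2, 1.1, 2.1],
       [10, 3.4, 1.2, 2.2],
       [10, 3.6, 1.3, 2.3],
       [10, 3.8, 1.4, 2.4]]
row = average_experiment(exp)
print(row)  # one row of X0 describing the whole experiment
```

Repeating this for every experiment yields the m × n matrix X0 described above.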
It should be noted that, in the context of diagnostics and health monitoring of mechanical systems, data are generally collected using suitable sensors (e.g., accelerometers, load cells and temperature sensors) positioned on the machinery of interest, both during its operation in optimal conditions and in the presence of (alternatively, simulated) faults, defects, damage or failures. Therefore, each performed test is classified through a specific label describing the condition of the machinery. In the following, the healthy condition of the machinery will be indicated as Class 0, while the damaged conditions will be indicated as Class k, where k ∈ N indexes the type of damage considered.
Finally, this dataset was further pre-processed by standardizing the data using the mean and standard deviation of the healthy class, obtaining the matrix X of size m × n and rank L ≤ min(m, n). Vector C, containing the labels, describes the condition of the machinery in each test and, consequently, has dimensions 1 × m.
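This standardization step can be sketched as follows (a minimal numpy sketch; the variable and function names are illustrative, not from the paper):

```python
import numpy as np

def standardize_on_healthy(X0, labels, healthy_label=0):
    """Standardize every feature using the mean and standard deviation
    computed on the healthy (reference) class only."""
    healthy = X0[labels == healthy_label]
    mu = healthy.mean(axis=0)
    sigma = healthy.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X0 - mu) / sigma

# Toy data: 6 tests x 3 features, first 4 healthy (label 0)
X0 = np.array([[1., 10., 0.],
               [2., 12., 0.],
               [3., 14., 0.],
               [4., 16., 0.],
               [9., 30., 0.],
               [9., 32., 0.]])
C = np.array([0, 0, 0, 0, 1, 1])
X = standardize_on_healthy(X0, C)
# After standardization, the healthy rows have zero mean per feature
print(np.allclose(X[C == 0].mean(axis=0), 0.0))
```

Standardizing on the healthy class (rather than on the whole dataset) keeps the reference distribution centered at the origin, which is what the Novelty Indices measure distances from.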

Proposed Methodology
Novelty Detection (ND) is a semi-supervised methodology for implementing a binary classification problem using only data from a healthy reference condition. When a new data point arrives, its distance from the healthy reference data cloud is measured, and this measure, usually called the Novelty Index (NI), is compared to a threshold to determine whether the new data are sufficiently far away to be considered novel. The NI is then a reduced-dimensionality (1-D) version of the original multivariate dataset.
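The basic ND scheme can be sketched in numpy as follows (a hedged illustration with a simple Euclidean NI and a 3-sigma threshold, one of many possible choices; all names and data are invented):

```python
import numpy as np

def novelty_indices(X_new, X_healthy):
    """Simple Euclidean NI: distance of each new point from the
    centroid of the healthy reference cloud."""
    centroid = X_healthy.mean(axis=0)
    return np.linalg.norm(X_new - centroid, axis=1)

def novelty_threshold(X_healthy, n_sigma=3.0):
    """Threshold set from the spread of the NIs of the healthy data themselves."""
    ni_h = novelty_indices(X_healthy, X_healthy)
    return ni_h.mean() + n_sigma * ni_h.std(ddof=1)

rng = np.random.default_rng(0)
X_healthy = rng.normal(0.0, 1.0, size=(50, 4))    # reference data
X_new = np.vstack([rng.normal(0.0, 1.0, (3, 4)),  # healthy-like points
                   rng.normal(8.0, 1.0, (3, 4))]) # clearly anomalous points
ni = novelty_indices(X_new, X_healthy)
is_novel = ni > novelty_threshold(X_healthy)
print(is_novel)  # the three shifted points are flagged as novel
```

Any monotone distance measure can play the role of the NI; the sections below replace this naive Euclidean version with optimized projections and Mahalanobis distances.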
Different algorithms are available for computing NIs. The simplest involves the projection of the multivariate dataset along a direction which is believed to correspond to the damage-evolution direction. In this case, the NI is simply a linear combination of the features (i.e., a weighted sum of features), and the training consists of determining the weight vector α.
In this work, this task has been tackled by the heuristic maximization of a utility function measuring the separation of the different classes along the direction identified by α.
A Genetic Algorithm (GA) was used to find the optimal α minimizing the p-values of an ANOVA post-hoc test, which indicate the degree of separation. In particular, the average of the p-values measuring the separation between each pair of classes was optimized. In this way, not only the ability of the features to separate classes was considered, but the number of distinguished classes was also maximized.
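A toy version of this optimization can be sketched as follows. Note the assumptions: as a stand-in for the ANOVA post-hoc p-values, a simpler pooled-distance separation measure is maximized here, and the population size, crossover and mutation settings are illustrative, not those used in the paper.

```python
import numpy as np

def separation_utility(ni, labels):
    """Mean pairwise class separation of the 1-D NIs (a simple
    stand-in for the ANOVA post-hoc p-values used in the paper)."""
    classes = np.unique(labels)
    seps = []
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            xa, xb = ni[labels == a], ni[labels == b]
            pooled = np.sqrt(xa.var() + xb.var() + 1e-12)
            seps.append(abs(xa.mean() - xb.mean()) / pooled)
    return float(np.mean(seps))

def ga_optimize_alpha(X, labels, pop=40, gens=60, seed=0):
    """Toy genetic algorithm: evolve the projection weights alpha
    maximizing the class separation of NI = X @ alpha."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    P = rng.normal(size=(pop, n))
    for _ in range(gens):
        fit = np.array([separation_utility(X @ a, labels) for a in P])
        elite = P[np.argsort(fit)[::-1][: pop // 2]]            # selection
        parents = elite[rng.integers(0, len(elite), (pop // 2, 2))]
        children = parents.mean(axis=1)                          # blend crossover
        children += rng.normal(scale=0.1, size=children.shape)  # mutation
        P = np.vstack([elite, children])
    fit = np.array([separation_utility(X @ a, labels) for a in P])
    best = P[fit.argmax()]
    return best / np.linalg.norm(best)

# Toy data: class 1 differs from class 0 only along feature 0
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
labels = np.repeat([0, 1], 20)
X[labels == 1, 0] += 4.0
alpha = ga_optimize_alpha(X, labels)
print(abs(alpha[0]))  # expected to dominate the other weights
```

Because elitism is used, the best utility value never decreases from one generation to the next, so the optimized α separates the classes at least as well as the best random initialization.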
To improve the identification of classes, a more refined NI was implemented, based on the Mahalanobis Distance (MD) [36]:

NI_m = √((x − μ)ᵀ Σ⁻¹ (x − μ))

where Σ is the covariance matrix of the reference class and μ its mean. Considering that datasets with a high number of features are usually handled, it may often be impossible to correctly estimate Σ⁻¹ using a few healthy points in a feature space of large dimension. An optimization scheme similar to the previous one, again based on a GA, was therefore proposed for the selection of a lower-dimensional subspace. The GA was iterated several times, changing the dimensionality, to find the optimal features to be kept for the computation of NI_m. By comparing the optimal utility function values, it is possible to select the final subspace. Appendix A shows the results of the performed GA optimization in terms of the coefficients referred to each feature, which allow obtaining NI_α and NI_m.
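A minimal numpy sketch of NI_m computed in a reduced subspace follows; the subspace indices here are arbitrary placeholders standing in for the GA-selected features, and the data are synthetic.

```python
import numpy as np

def mahalanobis_ni(X, healthy_mask, subspace):
    """NI_m: Mahalanobis distance of every test from the healthy
    reference distribution, computed in a reduced feature subspace
    so that the covariance matrix stays invertible."""
    Xs = X[:, subspace]
    ref = Xs[healthy_mask]
    mu = ref.mean(axis=0)
    cov = np.cov(ref, rowvar=False)
    cov_inv = np.linalg.inv(cov)
    d = Xs - mu
    # Quadratic form (x - mu)^T Sigma^-1 (x - mu) for every row at once
    return np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 30))          # 60 tests, 30 features
healthy = np.zeros(60, dtype=bool)
healthy[:50] = True
X[~healthy, 3] += 5.0                  # fault visible on feature 3
ni_m = mahalanobis_ni(X, healthy, subspace=[1, 3, 7])
print(ni_m[~healthy].mean() > ni_m[healthy].mean())
```

Restricting the computation to a small subspace (3 features here, 19 in the paper) is what keeps the covariance estimate well conditioned when only a few healthy tests are available.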
Merging the information of NI_α and NI_m in a 2-D space, the classification task can be implemented with better results. This shows that a simple classifier in the 2-D space of NIs (such as a decision tree, which basically recognizes the damage class by dividing the resulting 2-D space into regions) can be used for multiclass classification purposes. In addition to this possible use, NIs are also suitable as input for other classifiers, as shown in Section 4. In general, the proposed method can be considered a dimension reduction method for classification algorithms.
To conclude, a flowchart is presented in Figure 2 to summarize and clarify the proposed method.


Results and Discussion
This section shows the results obtained with the proposed method applied to the dataset described in Section 2. After having obtained the NIs in a reduced-dimension space, six of the main classification models were applied: Linear Discriminant Analysis (LDA) [37], k-Nearest Neighbor (kNN) with k = 2, given the small amount of data for the minority classes [38], Decision Trees (DT) [39], Linear Support Vector Machine (SVM) [40], Gaussian Naive Bayes (GNB) and Kernel Naive Bayes (KNB) [41]. These classifiers were adopted both because they are among the most widely used (semi-)supervised machine learning algorithms and to study the performance of the proposed method as the type of algorithm varies. A Monte Carlo Cross-Validation (MCCV) [42] was applied to all the tests to obtain more precise results in terms of performance indices. Indeed, since a classification model needs both a training dataset and a second group of data for verification (called the validation dataset), the choice of these datasets can take place in different ways. A k-fold Cross-Validation (CV) consists of dividing the data into k groups: one group is used as the validation dataset, while the remaining k − 1 are used for training, and the process is repeated k times until every group has been used for validation. In this case, given that the number of samples describing each damaged class is limited to four examples, k = 4 was chosen to have at least one test of each class in every subdivision of the dataset and to train the model correctly. Considering a generic CV on a dataset composed of n samples, divided into n_t samples for the training set and n_v = n − n_t for the validation set, the binomial coefficient (n choose n_v) represents the number of different possible subdivisions. However, each of these subdivisions can bring different results in terms of model generation and, consequently, accuracy.
MCCV is a very effective method that consists of randomly subdividing the samples into training and validation groups and iterating this procedure N = 50 times. Compared to an exhaustive evaluation of all possible subdivisions, the computational complexity is significantly reduced, while the average accuracy tends to the theoretical value of the generated model. Figure 3 shows an example of accuracy trends calculated with MCCV as the number of iterations N increases, demonstrating their convergence.
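The MCCV procedure can be sketched as follows (a self-contained numpy sketch; the nearest-centroid classifier is only a tiny stand-in for the six classifiers listed above, and all settings are illustrative):

```python
import numpy as np

def mccv_accuracy(X, y, fit_predict, n_iter=50, val_frac=0.2, seed=0):
    """Monte Carlo Cross-Validation: repeat a random train/validation
    split n_iter times and average the validation accuracy."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_val = max(1, int(round(val_frac * n)))
    accs = []
    for _ in range(n_iter):
        perm = rng.permutation(n)
        val, tr = perm[:n_val], perm[n_val:]
        y_pred = fit_predict(X[tr], y[tr], X[val])
        accs.append(np.mean(y_pred == y[val]))
    return float(np.mean(accs))

def nearest_centroid(X_tr, y_tr, X_val):
    """Tiny stand-in classifier: assign the class of the nearest centroid."""
    classes = np.unique(y_tr)
    cents = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_val[:, None, :] - cents[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.repeat([0, 1], 30)
acc = mccv_accuracy(X, y, nearest_centroid, n_iter=50)
print(round(acc, 2))  # close to 1.0 on this well-separated toy problem
```

Averaging over the N random splits is what makes the reported accuracies stable without enumerating all (n choose n_v) possible subdivisions.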
Machines 2022, 10, x FOR PEER REVIEW 6 of 14
The proposed method performance will be evaluated through different comparative indices [43]. In addition to the typical accuracy, further indices are used in this study in place of the traditional sensitivity and specificity, since the database used is multi-class and the proposed method aims to recognize not only the damage but also its nature. For this reason, considering a generic confusion matrix, as in Table 1, the following indices are introduced to evaluate the performance of the methods, where the acronyms are as in Table 1.
• Accuracy: this represents the ability of the classifier to correctly recognize positive and negative cases.
• Class Error Rate: this index allows for recognizing how many tests were not correctly classified, despite being recognized as unhealthy. Therefore, it represents the error made in identifying the specific damage: C.E.R. = CE/TC (6)
• Performance Index: this is a redundant index, as it is the product of the indices seen so far, but it allows for observing, simultaneously, the set of previous performances.
• Frobenius Norm: this is a matrix norm defined as the square root of the sum of the absolute squares of its elements.
where A is the confusion matrix after having standardized it by columns and subtracted the identity matrix, a_ij are the elements of the matrix A and k is the number of fault classes. In this way, the results obtained in terms of the Frobenius norm will be greater than or equal to 0: the larger the norm, the worse the classification, and vice versa.
• AUC: the area under the Receiver Operating Characteristic (ROC) curve. The AUC provides a combined measure of performance across all possible classification thresholds.
Among the many different algorithms for calculating NIs, this method initially provides the projection of the multivariate dataset along a direction that is believed to correspond to the evolution of the damage. A Genetic Algorithm was adopted to optimize the results and, thus, to maximize the number of distinct classes. The results concerning NI_α are shown in Figure 4. As is clear from the image, classes 2 and 3 definitely stand out from the healthy values; nevertheless, classes 5, 7 and 9 are more difficult to identify. Because of this, a more refined NI was calculated, based on the Mahalanobis Distance (MD). The resulting MD-NIs from such a subspace are plotted in Figure 5. As can be noticed, classes 2 and 3 definitely stand out again from the healthy values, but classes 5 and 7 are now better identifiable. In any case, a perfect classification is still impossible.
However, merging the information of NI_α and NI_m in a 2-D space (Figure 6), the classification task can be implemented with better results. In this particular case, the classifier was built by segmenting the 2-D space into rectangular regions, as visible in Figure 6. After reducing the original space (having 240 dimensions) to a 2-D space (where the two dimensions correspond to the calculated NIs), it is possible to calculate the performance indices obtained with each classifier and compare them with those detectable using the initial features. These performance indices are shown in Tables 2 and 3.
In Table 4, it is possible to note that all the performance indices concerning the model precision are significantly improved thanks to the proposed method, while the indices indicating the different error types decrease on average. In addition, it can be noted that the variations relating to the LDA and GNB classifiers are not present, since it is not possible to use them with the features of the original space. In fact, given that the proposed method reduces the space dimensionality, it makes it possible to employ classifiers otherwise not usable.
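The confusion-matrix indices used in these comparisons can be sketched as follows (a hedged numpy sketch showing only accuracy and the Frobenius norm, with the column standardization implemented as verbally described above; the toy matrices are invented):

```python
import numpy as np

def frobenius_index(conf):
    """Frobenius norm of the confusion matrix after standardizing it
    by columns and subtracting the identity: 0 for a perfect
    classification, larger values for worse results."""
    A = conf / conf.sum(axis=0, keepdims=True) - np.eye(conf.shape[0])
    return float(np.sqrt((A ** 2).sum()))

def accuracy(conf):
    """Fraction of correctly classified tests (trace over total)."""
    return float(np.trace(conf) / conf.sum())

# Toy confusion matrices: one perfect, one with misclassifications
perfect = np.diag([10, 4, 4])
noisy = np.array([[9, 1, 0],
                  [1, 3, 1],
                  [0, 0, 3]])
print(frobenius_index(perfect))  # 0.0
print(accuracy(perfect))         # 1.0
print(frobenius_index(noisy) > frobenius_index(perfect))
```

As stated above, a larger Frobenius value indicates a worse classification, which the noisy toy matrix illustrates.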
To conclude, Table 5 shows that the classification operation is significantly sped up by the use of NIs. In particular, the proposed method allows for reducing the computational effort by about 97% (reducing the average elapsed time per cycle from 17.40 s, using the original dataset, to 0.57 s, employing the NIs obtained with the proposed method). The reported results were obtained by averaging the time taken over 50 cycles. The computational software used to conduct these experiments is MATLAB R2020b, running on a PC equipped with a 10th gen Intel i7 processor and 16 GB of RAM.

Table 5. Processing times of the classification with different reduced datasets.

Dataset | Average Elapsed Time (%) per Cycle
Original dataset with n features | 100.0%
Multi-NI optimized by means of GA (NI_α, NI_m) | 3.3%

Conclusions
This work exploits simple novelty detection strategies to produce a 2-D space where classification is possible in an easy but satisfactory way. The proposed method was described and subsequently applied to a real industrial case, consisting of a complex quality control line for electronic components. In particular, the first axis is obtained as a linear combination of the original features, while the second axis is obtained as the (Mahalanobis) distance of a new data point from a reference distribution in a subspace composed of 19 selected features. Since the model is parametric, both these features and the linear combination weights were automatically selected by a routine able to optimize a measure of class separation by means of a genetic algorithm. This composition of the features made it possible to extract the most relevant information on the machinery state of health. Despite the presence of components heterogeneous in nature and non-stationary working conditions, the results suggest that such a 2-D data compression can lead to satisfactory diagnostic results, improving the performance of a simple feature extraction technique. In particular, the results showed an improvement in terms of the general performance index ranging from 22% to 49%, depending on the classification algorithm.
In addition to this advantage, the proposed method is able to recognize not only the failure condition of the mechanical system (damage detection) but also the type of damage (damage classification). This characteristic makes the method suitable for a prescriptive maintenance framework.
In general, in addition to being an ND-based classification method, the proposed work can also be applied as a dimension reduction method, since it improves the diagnostic results while simultaneously and significantly decreasing the number of features. This is very important when dealing with big data [44]. This aspect has further related advantages, such as the reduction in the memory needed to save data for diagnostic purposes and the increased speed in the calculation of the predictions. Indeed, a reduction of about 97% in computation time was observed compared to the classification with the original feature dataset. This last advantage makes the method suitable for real-time applications or for applications where timely damage recognition is particularly essential.