Identiﬁcation of Critical Components in the Complex Technical Infrastructure of the Large Hadron Collider Using Relief Feature Ranking and Support Vector Machines

: This work proposes a data-driven methodology for identifying critical components in Complex Technical Infrastructures (CTIs), for which the functional logic and/or the system structure functions are not known due the CTI’s complexity and evolving nature. The methodology uses large amounts of CTI monitoring data acquired over long periods of time and under different operating conditions. The critical components are identiﬁed as those for which the condition monitoring signals permit the optimal classiﬁcation of the CTI functioning or failed state. The methodology includes two stages: in the ﬁrst stage, a feature selection ﬁlter method based on the Relief technique is used to rank the monitoring signals according to their importance with respect to the CTI functioning or failed state; the second stage identiﬁes the subset of signals among those highlighted by the Relief technique that are most informative with respect to the CTI state. This identiﬁcation is performed on the basis of evaluating the performance of a Cost-Sensitive Support Vector Machine (CS-SVM) classiﬁer trained with several subsets of the candidate signals. The capabilities of the methodology proposed are assessed through its application to different benchmarks of highly imbalanced datasets, showing performances that are competitive to those obtained by other methods presented in the literature. The methodology is ﬁnally applied to the monitoring signals of the Large Hadron Collider (LHC) of the European Organization for Nuclear Research (CERN), a CTI for experiments of physics; the criticality of the identiﬁed components has been conﬁrmed by CERN experts.


Introduction
The identification of components critical for the functionalities of industrial and manufacturing systems is of essential importance to detect bottlenecks to which consolidation efforts must be directed, aiming at reducing the probability of abnormal conditions, number of shutdowns and recovery time [1,2]. When the structure function of the system and the reliabilities of its components are known, components critical for the successful functionality of the system can be identified by traditional reliability and risk analysis methods by calculating the so-called component importance measures [3,4]. However, this is unattainable for CTIs, which are large-scale systems of systems composed of thousands of interdependent and interconnected components executing diverse functions and using distinct technologies, e.g., hydraulics, mechanics, and electronics [5,6]. Due to their topological complexity, their spatial distribution, and the distinctions in their functionalities and technologies, the systems of a CTI are designed and constructed independently, and then assembled relying on the direct physical interfaces and considering functional dependencies that only take into account the theoretical scenarios of operation [2,7]. Throughout the life of a CTI, its systems evolve over time to maintain their operation, improve their performance, expand their functions, etc. [7]. This evolution modifies the physical interconnections between the systems of the CTI, which consequently alters the functional dependencies among the components and, therefore, their criticalities [1,8,9].
For two decades, there has been a growing interest from academy, industry and governments in investigating novel and alternative methods for identifying criticalities in different types of CTIs. Hausken [10] proposed a methodology combining defense-attack models and risk analysis, which analytically identifies critical interdependencies among complex infrastructures, and applied it to identify economically critical interdependencies among different industrial sectors in the USA. Wu et al. [11] developed a method relying on network theory for identifying potential geographical and physical interdependencies triggering cascading failures among different infrastructures in situations of touristic attacks. Chopra and Khanna [12] proposed a method based on network and graph theories for the identification of interdependencies among different infrastructures that could magnify the effects of natural disasters on the resilience of the US national economy. Zio and Ferrario [13] developed a system-of-systems modeling framework for analyzing the risk of accidents caused by external events (specifically earthquakes) in a nuclear power plant. Their proposed framework made it possible to consider the interdependent infrastructures in which the plant was embedded (e.g., power and water distribution networks), through the use of Muir Web and Monte Carlo simulation for the quantitative evaluation. Genge et al. [14] presented a sensitivity analysis-based method that required knowledge of the system dynamics in order to highlight the critical control variables regarding cyber-attacks on infrastructures. Patterson and Apostolakis [15] developed a method, combining network theory and Monte Carlo simulation, for identifying critical geographical sectors in dependent infrastructures.
These methods, however, face serious challenges hindering their practical application to CTIs, which requires detailed knowledge of the CTI structure function, architecture and/or dynamics, which is not attainable for complex and evolving systems [1].
Recently, two data-driven approaches have been developed for the analysis of components criticality in CTIs. A wrapper feature selection method based on the use of a differential evolution algorithm for the identification of the subset of monitoring signals that optimizes the performance of a Support Vector Machine model classifying the CTI functioning/failed state was proposed in Baraldi et al. [16]. Although the method was successfully applied to a dataset involving 200 monitoring signals of a real CTI, the computational cost associated with the wrapper feature selection algorithm represents a practical limitation on the size of the CTI to which the method can be applied. Lu et al. [17] developed a data-driven method for the estimation of the importance measures of CTI components based on the use of a random forest model for classifying the CTI functioning/failed state as a function of the individual components' states. The method relies on the assumption that the functioning/failed state of each individual component is known, which is not possible for many real CTIs.
To tackle these challenges, this work presents a contribution in the form of a novel data-driven methodology for the identification of components critical for CTI production availability, based on the processing of a large amount of monitoring data of the CTI. These data include the values of hundreds of signals registered over long periods of time and under several operational conditions of the CTI. The methodology relies on the selection of the subset of signals most relevant and informative with respect to the CTI state, i.e., functioning or failed. Then, the critical components are identified as those whose states are monitored by the selected subset of signals.
The identification of the subset of relevant/informative signals is accomplished via a two-stage methodology. The first stage is formulated as a feature selection problem, i.e., the identification of the subset of monitoring signals that allows the development of the best-performing model for the classification of the CTI functioning/failed state [18,19]. Most feature selection methods can be categorized as either filter or wrapper methods [20]. Filter methods are based on computation, directly from the data and independently of the specific model used for classification, of a numerical evaluation index, which is used to score individual features or features subsets according to their importance or relevance with respect to the classification output [18,20]. On the contrary, wrapper feature selection methods select an optimal subset of features for developing the classification model itself, i.e., the classification model is wrapped within a search algorithm (i.e., optimizer), which aims to identify the feature subset that provides the best classification performance [16,21].
In this work, wrapper feature selection methods are not considered because of the scale of the system, which is characterized by a very large number of monitoring signals. This makes the application of wrapper methods to the whole CTI infeasible, due to the computational effort required for training and testing a dedicated classifier for each of the candidate subsets of features explored by the optimization algorithm [22,23]. Therefore, a filter method based on the Relief ranking technique is employed to identify potential candidate signals related to the CTI state [18,19]. The Relief feature ranking algorithm has been chosen in this work due to its relatively low computational requirements, tolerance to noise, and robustness to feature interactions [24]. The Relief ranking algorithm assigns a proxy statistic, or a score, to each individual feature: low scores are assigned to features that have different values for neighboring instances belonging to the same class, and high scores are assigned to features that have different values for neighboring instances belonging to different classes. The scores assigned by Relief express the relative importance, or relevance, of each feature with respect to the output (i.e., the binary CTI state in our case of interest). These scores range from −1 (worst) to +1 (best), and provide valuable knowledge of the system [18].
The second stage involves a procedure for identifying the subset of most informative signals for the CTI state, among those top-ranked by Relief. This subset is searched by maximizing the performance of a CS-SVM classifier [25,26] of the CTI state on a set of data different from that used for the previous stage of Relief feature ranking. The procedure evaluates the performance of CS-SVM classifiers sequentially built by adding, each iteration, the next top-ranked signal (feature). Therefore, in the first step, only the first-top feature is included in the feature subset, in the second iterating, the second-top feature is added to the feature subset, and the procedure is repeated as long as the performance increases. Then, the components referred to by the chosen subset of top-ranked features are identified as critical.
The main novelties of this work are: (i) the development of a method for the identification of critical components in CTIs that relies fully on data and, thus, unlike most of the existing methods of CTI analysis, it does not require detailed knowledge of the CTI; (ii) the application of the method to the real and complex CTI of the CERN LHC, on the basis of a large amount of monitoring data collected over one year of operation, in contrast to most existing methods of CTI analysis, which investigate simplified and theoretical cases of CTIs.
A two-stage procedure has been adopted for the validation of the proposed method: (i) the individual steps of the proposed method, i.e., the classification using the CS-SVM, the feature ranking provided by the Relief algorithm, and the method for the quantification of the information content of a subset of features have been verified considering 15 benchmark labeled datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository [25,27]; (ii) the identification of the critical components in CTIs has been validated by means of a real case study involving 260 monitoring signals of the LHC particle accelerator of CERN, which is a large-scale and complex system of systems, in which the failure of individual components/units directly influences the accelerator performance and its total availability for physics experiments [28,29]. Since the ground-truth importance ranking of the CTI components is not available, the obtained results have been directly validated by expert judgment of CERN operators knowledgeable of the CTI behavior and its malfunctioning.
The remainder of this paper is structured as follows. The problem of the identification of critical components in CTIs is formulated in Section 2. The proposed methodology based on the Relief feature ranking algorithm is described in Section 3. The performance of the proposed method is validated on the basis of different benchmark datasets in Section 4. The application of the proposed methodology for the identification of the components critical in the CTI of the CERN LHC is shown in Section 5. Finally, the work is concluded in Section 6.

Problem Statement
This work addresses a CTI composed of a large number, M, of components whose individual states (functioning or failed) are unknown. The CTI functioning or failed state at time t is denoted using a binary variable: The CTI functioning or failed state x CTI (t) is assumed to be known to the plant operators. Additionally, the CTI is monitored by a number of N > M sensors, measuring a set of N physical quantities of the CTI components, y(t) = (y 1 (t), y 2 (t), . . . , y n , . . . , y N (t)), such as pressures, temperatures and electrical signals, which implicitly contain real-time information about the CTI individual component states. Figure 1 shows a schematic representation of the information collected by monitoring the CTI components. The goal of this study is to identify the set of components critical for the state of the CTI, c * = c * 1 , c * 2 , . . . , c * , . . . , c * q , with q < M and c * ∈ {c 1 , . . . , c M } for any = 1, . . . , q. To achieve this goal, this work exploits the information content of a dataset D involving a history of instances y(k), x CTI (k) , k = 1, . . . K, where y(k) is the vector of signal values collected at past time instant k and x CTI (k) ∈ {0, 1} represents the associated state of the CTI (0 refers to failed state and 1 to functioning state). Henceforth, we denote the patterns with failed state as positive and belonging to the minority class, while we denote those with functioning state as negative and belonging to the majority class. In fact, since CTI failure is a rare event, the number of positive patterns K + is considerably smaller than that of negative patterns K − , i.e., K + K − . In this work, the principal hypothesis is the existence of an unknown function G : x CTI (t) = G(y(t)), representing a relation between the values of the monitoring signals y(t) and the state of the system x CTI (t). The conjecture is that if a data-driven classifier x CTI (t) = G * (y(t)) can be developed with acceptable classification performance, the hypothesis of the existence of the function x CTI (t) = G(y(t)) can be verified.
Hence, the problem of the identification of the set of components c * critical for the CTI state is linked to that of identifying the subset of signals y * = y * 1 , y * 2 , . . . , y * r , . . . , y * R with r < N and y * r ∈ {y 1 , y 2 , . . . , y i , . . . , y N }, which are most relevant for determining the CTI state x CTI . A subset of signals y 1 is assumed to be more relevant to the CTI state, x CTI , than a subset of signals y 2 if the performance of a classifier G * 1 (y 1 ) : x CTI = G * 1 (y 1 ), developed using the instances y 1 (k), x CTI (k) , k = 1, . . . K, is superior to the performance of a classifier G * 2 (y 2 ) : x CTI = G * 2 (y 2 ), developed using the instances y 2 (k), x CTI (k) , k = 1, . . . K. This, then, leads to the definition of the problem of selecting the subset of signals y * that permits the development of the best data-driven classifier G * of the CTI state. In other words, searching for the best subset of signals y * is led by the performance of the classifier G * developed using y * . Eventually, the critical components c * are identified as those whose operating states are monitored by the optimal subset of signals y * .
It is worth mentioning that with respect to the quality of the monitoring data collected from industrial systems, the performance of the classification method could be reduced by: (i) missing measurements resulting from the failure to collect or store signal values at few time instances [30] and (ii) gross errors due to sensor faults, such as freezing, drifting and miscalibration [31,32]. To address these problems, missing data imputation and data reconciliation methods have been developed and successfully applied for improving the quality of the data collected from industrial systems. Interested readers can refer to [33,34].

Methodology
To identify the optimal subset of monitoring signals y * that allows the best classification of the CTI state (i.e., function or failed), this work proposes a methodology based on the following steps ( Figure 2): • sorting the monitoring signals according to their importance/relevance to the CTI state using the Relief-based feature ranking algorithm (Section 3.1); • identifying the best performing features subset, y * , by the quantification of the information content (Section 3.2); • identifying as critical components those whose states are monitored by the best feature subset y * .

Relief-Based Feature Ranking
The Relief algorithm is a supervised filter method for feature selection that provides a proxy statistic for each feature (also called feature weight), which acts as a measure of the relevance of the feature with respect to a target concept [24]. It was originally designed for application to binary classification problems with discrete or continuous features, and was then generalized to handle multi-class classification and regression problems [18]. The Relief algorithm is also able to rank features of different types, such as continuous, discrete, binary, categorical and textual.
Relief assigns low scores to features that have different values for neighboring instances belonging to the same class and high scores to features that have different values for neighboring instances belonging to different classes [24]. Consider a dataset D containing instances y(k), x CTI (k) , k = 1, . . . , K, where the k − th instance records the features values, y(k) = (y 1 (k), y 2 (k), . . . , y n (k), . . . , y N (k)), and the corresponding CTI functioning/failed state, x CTI (k) = 0/1. The Relief algorithm first initializes the weights of the features to zero, W = w 1 , w 2 , . . . , w j , . . . , w N = 0. Then, the weights are updated over V iterations corresponding to V instances, V ≤ K, randomly selected without replacement. In the v − th iteration, v = 1, 2, . . . V, the features weights are updated on the basis of the difference between their values at this instance, y v , and their values at the Q nearest neighbour instances, y v q , q = 1, 2, . . . , Q, of each class: where p x CTI y v q and p x CTI ( y v ) are the prior probabilities of the class (i.e., CTI state 0/1) to which the instances y v q and y v belong, respectively, which can be estimated from the training data. Notice that the larger the difference of the feature values for instances of the same class, the smaller the assigned weights (second term of Equation (1)) and the larger the difference of the feature values for instances of different class, the larger the assigned weight (third term of Equation (1)). The obtained weights are normalized between −1 and +1, and the normalized weight vector, W = [w 1 , w 2 , . . . , w N ], is used to rank the N features according to their informativeness or relevance. Finally, the ranked signals y rnk = y 1 , y 2 , . . . , y , . . . , y N are obtained, where y is the − th most important feature with respect to the CTI state, x CTI , among all the N features. For further details on the Relief algorithm, the reader is invited to consult [18].

Quantification of the Information Content of the Top-Ranked Feature Subsets
The feature ranking information provided by the Relief technique is used to select the subset of input signals for building a classification model [18]. Commonly adopted practical solutions include directly choosing a predefined maximum number of top-ranked features or selecting a subset of top-ranked features for which the importance weights are higher than a certain threshold.
Assuming that the subset of the top-ranked features most informative of the CTI state is the one that allows to obtain the best classification of the CTI functioning or failed state, the following iterative procedure has been adopted in this work. At the first iteration, an empirical classifier is built to receive in input the first top-ranked feature y 1 and provide as output the CTI state x CTI , and its performance is evaluated. Improved classifiers are then obtained by iteratively adding the next top-ranked feature in the input. In other words, at the generic i-th iteration, the empirical classifier, G * i y rnk sb i : x CTI = G * i y rnk sb i , y rnk sb i = y rnk y 1 : y =i , receiving as input the top-ranked features up to the i-th one, is developed and its classification performance is evaluated. The procedure stops when adding the next top-ranked feature to the inputs produces a classifier that has a performance that does not improve upon that of the previous classifiers.
With respect to the choice of the proper metric to be used for assessing the classification performance, the G mean and F measure metrics are typically considered in the case of highly imbalanced datasets [25]: The recall metric greatly rewards the accurate classification of the rare positive instances, specificity slightly penalizes the wrong classification of the relatively large number of negative instances and precision heavily penalizes large number of FP predictions [25]. Since the G mean is the geometric mean of recall and specificity, its values can be large even if the false alarm rate value is high [25], which is harmful in industrial practice. Therefore, the additional consideration of the F measure , which is the harmonic mean of sensitivity and precision, is necessary to consider false alarms. In this work, as a compromise between the two measures, we also consider the following utility function to assess the performance of the classifiers: where η ∈ [0, 1] is a weighting parameter that balances the importance of the F measure and the G mean : lower values of η result in a performance measure, U, that gives more importance to triggering true alarms than avoiding false alarms, and vice versa. In this work, the same importance is given to the two metrics F measure and G mean , i.e., η is set equal to 0.5.

Classifier
Different types of classification methods, such as artificial neural networks [35,36], SVMs [37], random forest [17], decision trees [38] and k-nearest neighbors [39], can be used to classify the CTI functioning/failed state from monitoring signals. In this work, SVM classifiers have been adopted, since they offer high classification performance, low computational cost, and ability to handle imbalanced datasets by using its Cost-Sensitive adjustment [26,40].
Given a set of input-output training data, SVM maps the input-data original space into a high-dimensional feature space through a basis or kernel function that can be of different types, e.g., linear, polynomial, Gaussian, etc. [41]. Then, the problem becomes the determination of a linear decision boundary separating the training instances of the different classes. The identification of the optimal decision boundary implies the maximization of a distance, also called margin, from the decision boundary to its nearest positive and negative training instances, which are referred to as the support vectors [41].
Given the typical high reliability of CTIs, most of the instances of monitoring signals are recorded while the CTI is operating in the functioning state (majority class), while very few instances correspond to CTI failure events (minority class). To efficiently handle such imbalanced datasets, the Cost-Sensitive setting of the SVM is considered [25]. CS-SVM assigns a greater cost to the incorrect classification of minority class instances than to that of majority class instances. Specifically, the cost value associated with a class is set to be equal to the inverse of the number of available instances belonging to the class.
The CS-SVM hyperparameters, such as the kernel scale, are optimized using a grid search procedure. We also considered the Gaussian kernel type, since it has shown very satisfactory results in several different applications [26,40]. The implementation of CS-SVM classifiers is accomplished via the "fitcsvm" algorithm included in the statistics and machine learning toolbox of Matlab [42].

Validation of Proposed Method on Benchmark Datasets
In this section, the performance of the proposed method is validated using 15 benchmarks classification datasets taken from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository, which are characterized by high imbalance ratios [27]. Table 1 shows the main characteristics of these benchmark datasets. The validation of the method is based on the following steps: (1)   All the performances were obtained by applying a 10-fold cross-validation procedure and using a grid search for setting the parameters of the Gaussian kernel scale (see Section 3.2) of the CS-SVM. Note that the Relief method uses the same training data as the CS-SVM (i.e., nine folds of data).

Validation of the CS-SVM Classifier
This analysis involves a comparison between the classification performance of a CS-SVM that receives in input all the features and two other SVM-based approaches developed by Liu et al. [25] for imbalanced datasets, which are based on Random Under Sampling (RUS) and Synthetic Minority Over-sampling Technique (SMOTE). The work of Liu et al. [25] has been selected as benchmark, since it is the most recent work using filter feature selection, SVM classification and dealing with highly imbalanced datasets. Other methods based on wrapper feature selection approaches are not considered here, since they are not applicable to systems where hundreds of signals are measured, due to their unaffordable computational cost.
According to the results shown in Figure 3, the CS-SVM classifier outperforms two of the techniques (RUS and SMOTE) most commonly used in the literature for handling imbalanced datasets in binary classification, over at least 11 out of 15 datasets. Therefore, it can be concluded that the CS-SVM can be used for the classification of CTI datasets containing a large proportion of negative (functioning state) patterns.

Validation of the Relief Feature Ranking
The CS-SVM classifier is used in combination with the proposed Relief-based feature ranking method, considering as input for the classifier all the features with importance weight greater than or equal to half of the weight value associated to the first-ranked feature. Figure 4 compares the classification performance of the CS-SVM receiving as input the feature subset chosen by the Relief feature ranking method with that of the CS-SVM receiving as input all of the features, and with that of the method proposed in [25]. This latter method combines (i) a filter-based feature selection technique that calculates an index measuring the Between-Class Separability (B-CS) for each individual feature, (ii) an instance selection technique that considers the B-CS and a variable Local Fitness (LF) index, and (iii) a classification model, based on cost-sensitive least square SVM, which is built using only the features and the instances selected in steps (i) and (ii).  Figure 4 shows that the classifiers, trained using only the features identified by the Relief algorithm (i.e., CS-SVM + Relief, represented by red crosses), provide better performances than the classifiers trained using all the input features (blue circles), over the majority of the datasets. Additionally, the proposed method (CS-SVM + Relief) outperforms the method developed by Liu et al., 2017 [25] on seven datasets considering the F − measure, ten datasets considering the G − mean and eight datasets considering U. In the datasets in which the proposed method does not outperform the method in [25], it still achieves very competitive performance.

Analysis of the Relief Robustness
To further verify the robustness of the Relief technique with respect to measurement noise and irrelevant features, the following test is carried out. A modified version of the dataset "poker8vs6" is artificially generated. It includes: (i) the original 10 signals, which will be referred to as A, B, C, D, E, F, G, H, I, J, (ii) 50 artificial signals obtained by adding Gaussian noise to a fraction, Pr%, of the instances of the original signals, assuming five different combinations of σ and Pr for each original signal (see Table 2), and (iii) 60 pure Gaussian noise signals. Notice that the dataset "poker8vs6" has been chosen for this analysis because it is one of the considered datasets characterized by the largest imbalance ratios and the smallest number of positive instances (Table 1). The dataset is randomly divided into two subtests with similar imbalance ratios: the first subset includes 90% of the instances and is used for feature ranking by the Relief method, whereas the second subset contains 10% of the instances and is used for the quantification of the information content. Figure 5 shows the ranking obtained using the Relief algorithm for the 120 features of the modified dataset and their corresponding weights, and Table 3 reports the obtained 20 top-ranked features and their corresponding weights. The notation "C_0.3_0.2", for example, refers to the artificial signal generated from the original signal C by adding a Gaussian noise of standard deviation equal to 0.3 to 20% of the instances. The robustness of the Relief feature selection against noisy measurements and irrelevant features is effectively proven by the obtained results, since the Relief is able to efficiently reject a large number of irrelevant features and to properly rank the relevant features with respect to the amount of noise affecting their measurements. This can be confirmed by the observation that: (1) the artificial signals always follow in the ranking the corresponding original ones (e.g., signal "C_0.2_0.1") is ranked after signal "C", (2) when artificial signals generated from the same original signal are considered, the larger the noise (σ) and the fraction of disturbed instances (Pr%), the lower the signal ranking (e.g., signal "A_0.2_0.1" is ranked before "A_0.3_0.2"), which is coherent with the fact that the addition of noise degrades the signal importance with respect to the true binary output, and (3) all the artificial signals present in the 20 top-ranked signals are generated from the five topranked signals. Additionally, the first appearance of a pure noise signal in the ranking is in position 46 out of 120, and the last 60 signals in the ranging include 45 pure noise signals, which, again, supports the ability of Relief to rank the irrelevant/non-informative signals.

Validation of the Information Content Quantification Procedure
Finally, the procedure proposed in Section 3.2 for selecting a proper subset of the most relevant, informative features is applied to the same modified data "poker8vs6". A CS-SVM classifier is trained and tested 120 times, where, each time, the next top-ranked feature is added to the input of the classifier. The classifiers are trained using the same data previously used in the features ranking stage (90% of the total set), and the classification performance metrics are calculated considering the test set composed of data not used in the feature ranking stage (10% of the total set). Figure 6 shows the performances of the classifiers over the first 35 iterations, since the remaining are not providing improved performances. Note that the increase of the size of the optimal signal subset complicates the engineering interpretations, and, as expected, reduces the classification accuracy due to the curse of dimensionality. The most satisfactory performances with the lowest number of signals are achieved using the three top-ranked features (F measure = 1, G mean = 1 and U = 1). Notice that the procedure of the quantification of the information content is shown to be able to identify a small set of features providing the best performance, which cannot be improved by adding other features. Additionally, these results show the capability of the proposed methodology as a whole (Relief + CS-SVM + information quantification procedure) of handling noisy measurements and irrelevant features in the data.

Case Study: The CERN LHC Electrical Network
The LHC of CERN is a technical infrastructure that provides the services required for the operation of its particle accelerators and experimental areas [43]. Since components malfunctioning, failures or external events can directly influence the performance of the accelerators and total availability for the physics experiments, CERN's technical infrastructure is considered to be critical [44][45][46]. The analysis of the LHC's operation during 2016 revealed that a large amount of the total LHC downtime was caused by electrical disturbances, which led to interruption of the experiments and dumping of the accelerated beams of particles [29]. Therefore, it is of significant importance to identify those components that are responsible for the electrical disturbances propagation to the circulating beam and, therefore, are critical with respect to the CTI availability (because they are responsible of the beam dump in case of electrical disturbances).
For this reason, the case study concerns the propagation of electrical disturbances in the CERN's electrical network [47], which is responsible for distributing electrical power on CERN's campus that extends over a 27 km circumference across the French-Swiss borders, through a hierarchical structure of power lines or levels starting from the tophierarchy, 400 kV line, to the bottom, 400 V line. Electrical disturbances are caused by surges or short circuits that can be generated inside or outside CERN's electrical network and propagate in different ways depending on the operational state and conditions of the CERN CTI components (e.g., healthy, degraded, faulty, under maintenance, etc.). When these disturbances reach critical components, such as power converters, the stability required for controlling the superconducting magnets current is not guaranteed, and consequently, the LHC beams are preventively stopped (an operation called beam dump in technical jargon) to avoid any possible damage to the LHC components.
The LHC electrical network is geographically decomposed into eight zones, which are referred to as LHC Points 1, 2, 3, 4, 5, 6, 7 and 8, and is functionally decomposed into the LHC lines at 400 kV, 66 kV, 18 kV, 3.3 kV and 400 V. Starting from the top-hierarchy, 400 kV line, to the bottom-hierarchy, 400 V line, the number of components/units and parameters to be monitored explodes exponentially from hundreds to tens of thousands. At first, we considered perturbations coming from upper-level hierarchy components to be the most likely to distribute external and internal disturbances due to the hierarchical organization of the network.
As a result, we consider a dataset, D, including a number of N = 260 monitoring signals, whose values were collected during the year 2016 by sensors installed on distribution switchboards and breaker units at the 66 kV and 18 kV lines of the LHC electrical network. The dataset contains K = 3670 instances collected during the occurrence time of an electrical disturbance, which was detected by oscilloscopes installed in specific locations. Since among these instances (electrical disturbances events) only 48 (K + ) of them have led to the beam dump (x CTI = 1), the dataset is characterized by a very large imbalance ratio of 75. 46.
Given the limited number of missing values in the dataset and the high quality of both the instrumentation used to monitor the CTI and its calibration plan, the application of missing data imputation and data reconciliation methods is not required in this case study.

Feature Selection and Informativeness Quantification
The dataset is randomly split into two subsets of similar imbalance ratios: the set used for feature ranking by the Relief method includes 90% of the instances and the set used for the quantification of the information content contains 10% of the instances. The Matlab [42] function "relieff ", considering all the instances V = K for the weights updating and Q = 6 nearest-neighborhood parameter, has been employed. Figure 7 shows the obtained ranking and the associated weights of the 260 signals of the considered dataset, where the signals of each line (66 kV and 18 kV) are highlighted by a different color. Notice that most of the top-ranked features are from the 18 kV line. In particular, only one feature from the 66 kV line is present among the 10 top-ranked features and three among the 20 top-ranked features. This indicates that signals from the LHC 66 kV line are less informative with respect to the LHC state than those from the 18 kV line. This is in accordance with the fact that the 66 kV line is higher in the hierarchy of the electrical network, and it is further away from the sensitive components connected to the LHC. In contrast, the 18 kV line is at the bottom of the network hierarchy and directly feeds components critically affecting the LHC operation, such as power converters. As previously discussed, the Relief algorithm returns only the ranking/weights of the signals, without directly selecting a proper subset of the most relevant or informative signals. This task has been performed by applying the procedure proposed in Section 3.2. In practice, the first classifier is built using the first top-ranked feature, the second classifier is built using the first two top-ranked features, and the procedure is repeated until no clear improvement in the performance is obtained. The classifiers are trained using the same data previously used in the features ranking stage (90% of the total set), whereas, in order to avoid overfitting and to achieve a fair comparison among different feature subsets, the performance indices of the classification have been calculated using a test set formed by data not used during the feature ranking stage (10% of the total). Figure 8 shows the performances of the developed classifiers over the first 35 iterations. The subset of the top-ranked features formed by 24 signals, which provide the maximum U function value, is selected. It includes 19 signals from the 18 kV line and 5 signals from the 66 kV line. Table 4 shows the details of the identified 24 critical signals.

Engineering Analysis of the Best Features Subset
In this section, the 24 top-ranked features selected by the proposed method have been analyzed, revealing that 12 components from the LHC 18 kV line and one component from the 66 kV line can be identified as critical, among tens of components related to all the 260 signals collected from both the 18 kV and 66 kV lines. The analysis of the geographic distribution of the identified critical components over the CERN CTI area is illustrated in Figure 9a. According to the obtained results, the most critical components are situated in the areas of the CTI corresponding to points 1, 2, 4, 5, and 6, which indicates the criticality of these geographical sections with respect to other sections, such as 3, 7 and 8. A third analysis performed on of the selected signals (see Figure 9b) reflected that all selected critical features were power signals. Finally, an analysis was performed to explore the distribution of the identified signals over the critical components ( Figure 10   The criticality of the 13 identified components (mostly power converters) has been confirmed by CERN operators, who support these results by the fact that these specific components belong to the 18 kV power distribution level, which is the most sensitive to perturbations, since it directly feeds the power converters that supply current to the superconducting magnet system. Furthermore, the identified geographical areas are characterized by the presence of old-generation power converters.

Discussion
The results of the analysis, i.e., the identified critical components, can be used to optimize maintenance, monitoring, control, design retrofit and update plans for improving the CTI's reliability and availability.
For example, maintenance can be optimized by following intensive inspection and preventive policies of the identified critical components, while ensuring a minimum availability level of spares. Additionally, the monitoring system of the CTI can be improved by enhancing the sensors installed on the critical components and prioritizing their monitoring signals to supervise the overall state of the CTI. Operation and control of the CTI can benefit from the identification of critical components through the definition of safe operational scenarios able to minimize failure risks, robust control regimes able to immunize the critical components against disturbances, and efficient failure recovery procedures able to quickly retrieve the states of critical components to their normal conditions. In the case study here considered, the obtained results suggest that priority should be given to upgrading the old power converters of the 18 kV line zones.
The expected bottlenecks of the proposed method for critical components identification in CTIs are, by nature, the same ones faced by most data-driven approaches. They mainly include: (a) the quality of data, (b) their information content, and (c) the need for expert knowledge to interpret/validate the obtained results. With respect to (a), missing values, sensor failure and gross errors are expected to reduce the performance of the ranking provided by relief, and, therefore, affect the identification of the critical components. In such situations, missing data imputation [30] and reconciliation techniques [31] can be used to enhance data quality. With respect to (b), the main criticality is that data collected when the CTI is in a failed state, which, given the high reliability of CTIs, can be insufficient for analysis. In this work, we have tackled this challenge by considering data collected over a long period of time (a whole year of operation) and developing a method properly tailored to deal with imbalanced datasets. With respect to (c), benchmark datasets that mimic the complexity of real-world problems have been employed for the verification of the proposed method (see Section 4). It is the opinion of the authors that a validation of the obtained results, when the ground-truth criticality of the components is not known, as in most real situations, requires the contributions of expert judgment to confirm the outcomes.

Conclusions
This work proposes a novel data-driven methodology for the identification of critical components in CTIs, based on the analysis of hundreds of monitoring signals recorded over long periods of time and under different operational conditions of the CTI. The problem has been addressed as a feature selection one with the objective of identifying the subset of signals allowing the best classification of the CTI functioning or failed state.
A filter approach for feature selection based on the Relief technique is used to rank the monitoring signal with respect to the CTI state. The informativeness of different subsets of the top-ranked features are quantified by assessing the performances of a CS-SVM classifier built using the different subsets. The CTI critical components are identified as those associated with the subset of the top-ranked signals that achieves the best classification of the CTI state by means of a CS-SVM.
The performance of the methodology has been verified using literature benchmarks characterized by highly imbalanced datasets. The performance of the CS-SVM built with the subset of the top-ranked features identified by the Relief algorithm shows high competitiveness to those obtained by other methods in the literature.
Finally, the methodology has been applied to the monitoring signals of the CERN's electrical network, leading to the identification of specific components, LHC areas (points 1, 2, 4, 5 and 6), type of signals (power) and even hierarchical levels (18 kV), for which the criticality has been confirmed by the equipment experts and operators.
Future research lines will include: (i) analyzing the robustness and sensitivity of the method performance with respect to uncertainties, such as the quality of the data that may include missing values, noise and/or gross errors, and (ii) testing different techniques that can perform the same function, i.e., different filter feature ranking methods and different classification models.

Conflicts of Interest:
The authors declare no conflict of interest.