Assessing Sensor Integrity for Nuclear Waste Monitoring Using Graph Neural Networks

A deep geological repository for radioactive waste, such as Andra's Cigéo project, requires long-term (persistent) monitoring. To achieve this goal, data from a network of sensors are acquired. This network is subject to deterioration over time due to environmental effects (radioactivity, mechanical deterioration of the cell, etc.), and it is paramount to assess each sensor's integrity and ensure data consistency to enable the precise monitoring of the facilities. Graph neural networks (GNNs) are suitable for detecting faulty sensors in complex networks because they accurately depict the physical phenomena that occur in a system and take the sensor network's local structure into account in their predictions. In this work, we leveraged the experimental data acquired in Andra's Underground Research Laboratory (URL) to train a graph neural network for the assessment of data integrity. The experiment considered in this work emulated the thermal loading of a high-level waste (HLW) demonstrator cell (i.e., the heating of the containment cell by nuclear waste). Using real experimental data acquired in Andra's URL in a deep geological layer was one of the novelties of this work. The model was a GNN that took as input the temperature field from the sensors (at the current and past time steps) and returned the state of each individual sensor, i.e., faulty or not. The other novelty of this work lay in the application of the GraphSAGE model, modified with elements of the Graph Net framework, to detect faulty sensors, with up to half of the sensors in the network being faulty at once. This proportion of faulty sensors was explained by the use of distributed sensors (optic fiber) and the environmental effects on the cell. The GNNs trained on the experimental data were ultimately compared against other standard classification methods (thresholding, artificial neural networks, etc.), which demonstrated their effectiveness in the assessment of data integrity.


Introduction
This section explores the context of the research, ranging from the industrial situation regarding nuclear waste storage to the scientific background regarding graph neural networks.

Andra's Cigéo Project
Radioactive waste storage is a contemporary issue that is addressed in different countries around the globe. One of the most feasible solutions is deep underground storage. However, such storage needs to be actively monitored for safety reasons. This study was part of Andra's Cigéo project, which aims to design, develop and monitor a deep geological repository for radioactive waste. This repository will store higher-activity waste (HAW), including high-level waste (HLW) and intermediate-level waste (ILW), as shown in Figure 1.
Given the dynamic conditions of the storage and the impossibility of accessing the sensors, there is a need for a method that can ensure the consistency of the data. Indeed, the sensor network will evolve over time, in part because of sensor aging, drift and failures.

The High-Level Waste (HLW) Demonstrator Cell
The data used were acquired in Andra's Underground Research Laboratory (URL), in which there is an HLW demonstrator cell. This demonstrator is a prototype of an HLW storage cell and is heavily instrumented with both distributed sensors (such as optic fiber) and point sensors. Figure 2 shows the position of each sensor with respect to the demonstrator cell. This prototype was used in an experiment that consisted of the thermal loading of the cell using the thermal sources presented in Figure 3. This experiment provided the data required for the proposed machine learning algorithms. The use of this experimental data was one of the novelties of this work. To the best knowledge of the authors, no other work dealing with sensor integrity assessment around high-level waste demonstrator cells in a deep geological layer has been published in the literature.

Graph Neural Networks and Message Passing
The Abbreviations section describes the different notations used in this paper. The sensor network presented in Figure 2 can be represented as a graph by linking measurement points that are in the vicinity of each other. Moreover, graphs can capture and represent physical phenomena, such as thermal conduction, where energy conservation operates on the nodes and heat flows through the graph's edges. The case studied in this work was one-dimensional: only the data of an optic fiber sensor, viewed as a unidirectional graph, were considered in order to test the efficiency of a GNN for the assessment (classification) of sensor integrity.
A GNN is a machine learning algorithm that takes a graph as the input and, by modifying the graph's embeddings (of the nodes and the edges), can perform various prediction tasks (i.e., clustering, classification and regression). GNNs can operate predictions at multiple levels [1][2][3][4][5][6][7]:
• At the graph level: for instance, one could give a molecule (as an input graph) and try to find out whether the molecule is toxic.
• At the edge level: typical operations are friend recommendations on a social network graph.
• At the node level: classification tasks can be performed, as was the case in this work, where the state (healthy or faulty) of each sensor (i.e., each node) was derived from the graph of the sensor network.
To perform these predictions, GNNs require the raw data to be transformed into more relevant data that are easier to process. Thus, the input graph is updated iteratively using a mechanism called message passing, which was described by Gilmer et al., 2017 [8]. The message-passing mechanism can be divided into the following tasks [1-9]:
1. Select a node v.
2. Collect information (the messages) from the neighboring nodes N(v) (and edges).
3. Concatenate the messages using a node-order equivariant function α.
4. Update the embedding x_v of the node using an update function ϕ (which can be learned by a neural network). This update function takes the concatenated messages and the selected node's embedding as inputs and outputs the updated node's embedding.

This mechanism, which is presented in Figure 4, leverages the connectivity of the graph by using the information of neighboring nodes (and edges) to update the nodes' embeddings. Equation (1) describes the updating of a single node's embedding using the message-passing mechanism [2,8]:

x'_v = ϕ(x_v, α({x_u : u ∈ N(v)}))

Note that the update function ϕ and the concatenation function α used in the message-passing algorithm are shared across all nodes [9], which means that the same exact functions are used to update all nodes' embeddings. This notion is central to the concept of a GNN.
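The four message-passing steps above can be sketched in a few lines of code. This is a minimal illustration rather than the authors' implementation: the sum aggregation, tanh non-linearity and linear update weights are arbitrary choices of ours.

```python
import numpy as np

def message_passing_step(x, neighbors, W_self, W_msg):
    """One message-passing update applied to every node.

    x         : (n_nodes, d) array of node embeddings
    neighbors : dict mapping node index -> list of neighbor indices
    W_self    : (d, d) weight applied to the node's own embedding
    W_msg     : (d, d) weight applied to the aggregated messages
    """
    x_new = np.empty_like(x)
    for v in range(x.shape[0]):
        # Step 2: collect the messages from the neighboring nodes N(v)
        msgs = x[neighbors[v]]
        # Step 3: aggregate with a node-order equivariant function (here: sum)
        agg = msgs.sum(axis=0)
        # Step 4: update the embedding with the shared update function phi
        x_new[v] = np.tanh(x[v] @ W_self + agg @ W_msg)
    return x_new
```

Because the same `W_self` and `W_msg` are used for every node, the sketch also reflects the weight sharing that the text identifies as central to GNNs.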
Figure 5 explains how to build a complete GNN: the top part describes the GNN model, and the bottom part showcases its application to a simple graph. This model can be split into two main elements [1,2,4,5,9]:

• The core of the GNN aims at transforming the input graph's embeddings into embeddings that are easier to interpret by the second part of the model. This is achieved by stacking multiple GNN layers, each of which is based on message passing. The connectivity of the graph is not modified during this step. At the bottom of Figure 5, the embeddings x_0 are transformed into embeddings x_n by the repeated application of message passing.
• The specific network takes the updated graph as the input and performs the desired prediction task (in the example, node classification). This network's architecture depends on the task at hand (pooling layers are used for graph classification, softmax or sigmoid activation functions for classification, etc.). A loss function is then applied to the model in order to train it.
Equation (2) is a variation of Equation (1). It describes the updating of the nodes' embeddings in layer l of a GNN:

x_v^{l+1} = ϕ^l(x_v^l, α({x_u^l : u ∈ N(v)}))

This equation shows the methodology used to obtain the updated embedding of any node x_v^{l+1} (and, by extension, X_V^{l+1}) given the previous nodes' embeddings [8,9]. From now on, this kind of equation is used to describe a GNN.
A graph convolutional network (GCN) is one of the most straightforward models; it uses the simple convolution operator presented by Kipf and Welling, 2017 [10]. In this model, the update function is the same for all nodes of the graph. Equation (3) presents the GCN model [8-10,12,23,24], which uses an activation function σ (often non-linear, such as ReLU) and two matrices W_0^l and W_1^l that represent the multi-layer perceptron (MLP) shared across all nodes.
Equation (2) is the general form of Equation (3). This model was later expanded by Schlichtkrull et al., 2017 [29], who included different update functions that depend on the type of connection between two nodes. The latter model is called a relational GCN (R-GCN) because the type of relationship between two nodes induces a different convolution operator. This is prompted by the fact that the relationship connecting two nodes is meaningful and should be taken into account by the GNN. For instance, on Facebook, two users might be connected because they are friends or because they blocked each other, both of which are fundamentally different. Equation (4) presents an R-GCN model using a type of relationship between two nodes denoted r. This model's main weakness is the increased algorithmic complexity when dealing with a high number of relations. This problem is partially alleviated by using a basis decomposition for W_r^l or defining W_r^l as a block-diagonal matrix. But even with these improvements, the model is not the most suitable for handling large graphs.
Generalizing even further, graph attention networks (GATs) [31][32][33] use an attention mechanism in the update function. The strength of a GAT is that there is no need to define relationship sets; the attention mechanism defines the relationship between two nodes using their embeddings. Equation (5) presents a GAT model with the attention function A^l(x_u, x_v), which is a learnable parameter. Similar to the R-GCN, this model is not easy to apply to large graphs, given its computational complexity.
The model used in this study for the sensor integrity assessment was a variant of the GCN called GraphSAGE, developed by Hamilton et al., 2018 [1,9,13]. This model uses a chosen node-order-invariant aggregation function (such as a sum, average or max) to collect messages from the neighboring nodes, as shown in Figure 6. This model is efficient when operating on large graphs, which warrants its use. Equation (6) presents the mathematical model of GraphSAGE, which is similar to Equation (2), with α as the chosen aggregation function (e.g., sum) and ϕ^l(a, b) = σ([b, a]W^l):

x_v^{l+1} = σ([x_v^l, α({x_u^l : u ∈ N(v)})] W^l)
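A minimal sketch of this update rule follows. This is our own illustrative code: a single linear layer stands in for the learned W^l, ReLU stands in for σ, and the aggregation is selectable.

```python
import numpy as np

def graphsage_layer(x, neighbors, W, agg="mean"):
    """One GraphSAGE update: x_v' = sigma([x_v, AGG({x_u : u in N(v)})] W).

    x : (n, d) node embeddings; W : (2d, d_out) weight shared by all nodes.
    agg selects a node-order-invariant aggregation: mean, max or sum.
    """
    agg_fn = {"mean": np.mean, "max": np.max, "sum": np.sum}[agg]
    out = []
    for v in range(x.shape[0]):
        msg = agg_fn(x[neighbors[v]], axis=0)      # aggregate the neighborhood
        h = np.concatenate([x[v], msg]) @ W        # shared update weight W^l
        out.append(np.maximum(h, 0.0))             # ReLU as the activation
    return np.array(out)
```

Because the aggregation is node-order invariant and `W` is shared, the cost per layer grows linearly with the number of edges, which is the property that makes GraphSAGE attractive for large graphs.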

Related Works
The model presented in this work was derived from the GraphSAGE model presented by Hamilton et al., 2018 [13] and aimed at detecting faulty sensors within a network. This study therefore built on previous works using GNNs for related tasks. There are plenty of graph neural networks specialized in anomaly detection [7], whether they are used for IT security [34][35][36][37][38], time series [39] or the industrial Internet of things (IoT) [40].
The novelty of the work presented in this paper lay in the combination of three elements: a graph neural network, anomaly detection in a sensor network, and node classification models. Similar work was achieved by Jiang and Luo, 2023 [54], where a model for sensor self-diagnosis was used. In Deng and Hooi, 2021 [39], a GNN was developed for the detection of anomalies in time series measured using sensors.
However, both these models use a partial GAT [31], whereas the model proposed in this paper was based on GraphSAGE [13].The GraphSAGE model has reduced computational costs when dealing with large graphs compared with a GAT, which led to its selection for our application.Another novelty of this work lay in the modification of the GraphSAGE model using elements from the Graph Net framework [2].

Materials and Methods
This section presents the available data and the modifications applied to generate training datasets for our machine learning algorithms. Then, the machine learning models used and the comparative classification models are described.

Generating a Training Dataset
As presented in the Introduction, one of the novelties of this study came from the use of industrial data originating from Andra's URL. These data represent the responses of a subset of the thermal sensors used in the thermal loading of the HLW demonstrator cell. We collected the responses of one of the distributed optic fiber sensors (shown in blue in Figure 2), thus inducing a one-dimensional study case. These data can be viewed as a unidirectional graph, as shown in Figure 7. Figure 8 presents the responses of the distributed sensor over time at various sample points. The lower temperatures were closer to the gallery (i.e., to the cold point) and the higher temperatures were closer to the heat sources.
Figure 9 shows the temperature along the distributed sensor for different time samples. Similar to Figure 8, we can observe the gallery at x = 0 p (where p denotes the sensor position index and consecutive sensors were 5 cm apart), which corresponded to the cold source; heating elements were present in the second half of the HLW demonstrator; and the rock at x = 485 p acted as an insulator (given the small heat flow). The considered time step was one day, while the spatial sampling had a step size of 5 cm. For the rest of this study, the responses of the distributed optic fiber sensors were considered as a series of point sensors. Thus, distributed sensor failures, such as the breaking of the optic fiber, were not modeled but could be easily inferred. The sensor failures considered in this work were partial failures [55][56][57][58][59]. The data from the thermal loading of the HLW demonstrator measured by the optic fiber were assumed to contain no inaccuracies. Therefore, there was a need to introduce inappropriate responses for some of the sensors to simulate partial sensor failures.
The process used to induce inaccuracies is presented below.Before altering the sensor outputs, a time step was chosen.The sensor responses at the current time step were the starting point of the process that added synthetic errors in the sensors' outputs.
First, the number of sensors with inaccuracies needed to be determined. This number followed a discrete uniform distribution between 0 and 250 among the 485 measurement points. Equation (7) presents the distribution of the number of sensors that were degraded. The sensors to degrade were then chosen via an unordered random draw without replacement, which caused one-quarter of the data to be degraded on average, as shown in Equation (8). Once the measurement points to degrade were selected, a mask of sensor integrity was created. This mask returned a Boolean output that described the state of each sensor (i.e., faulty or healthy), as showcased in Equation (9). Inappropriate responses were then induced on the selected sensors. These inaccuracies were modeled by an offset of a minimum of 2 °C and up to 8 °C, which could be either positive or negative. These offsets followed a continuous uniform distribution, as described in Equation (10). The complete process of inducing inaccuracies in otherwise accurate data is presented in Figure 10.

Now that the process of adding inaccuracies to simulate sensor degradation has been presented, the inputs and outputs of our machine learning algorithm can be described. Figure 11 represents the inputs (in black) and outputs (in grey) of our model. The input was a graph that contained three successive sensor responses for all measurement points. The responses at the first two time steps were unaltered, clean sensor responses, as represented by the two dashed curves in Figure 11. However, the third graph was based on data that had been degraded by following the aforementioned process, as shown by the full black curve in Figure 11. The expected output was the sensor integrity mask that identified the degraded data, as shown by the grey curve in Figure 11.
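The degradation process of Equations (7)-(10) can be sketched as follows. This is a sketch under the assumptions stated above (485 measurement points, 0 to 250 degraded sensors drawn uniformly, offsets of random sign uniform in [2, 8] °C); the function and variable names are ours.

```python
import numpy as np

def degrade(temperatures, rng, n_points=485, max_faulty=250):
    """Inject synthetic partial failures into a clean temperature profile.

    Returns the degraded profile and the Boolean integrity mask
    (True = faulty), mirroring Equations (7)-(10) in the text.
    """
    t = temperatures.copy()
    # Eq. (7): number of degraded sensors ~ discrete uniform on {0, ..., 250}
    n_faulty = rng.integers(0, max_faulty + 1)
    # Eqs. (8)-(9): unordered draw without replacement + integrity mask
    faulty = rng.choice(n_points, size=n_faulty, replace=False)
    mask = np.zeros(n_points, dtype=bool)
    mask[faulty] = True
    # Eq. (10): offsets of 2-8 degC with a random sign
    offsets = rng.uniform(2.0, 8.0, size=n_faulty) * rng.choice([-1.0, 1.0], size=n_faulty)
    t[faulty] += offsets
    return t, mask
```

Repeated calls with fresh clean profiles and a seeded `rng` would produce a reproducible training dataset of degraded inputs and integrity masks.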
The use of three consecutive measurements was based on the assumption that at least two correct values were measured before incrementally identifying the faulty ones. The two correct initial measurements (the initial condition) gave a baseline to our model. The measurements at the first time step allowed the model to learn the temperature distribution. The measurements at the second time step enabled the model to learn the evolution of the temperature distribution over time, and therefore permitted a prediction of the temperature distribution at the third time step. The measurements at the third time step then introduced the sensor degradation.
Using the method presented above, two datasets with a total of 5000 inputs and outputs were created: a training dataset and a testing dataset. These datasets were taken from real measurements performed on the HLW demonstrator shown in Figures 2 and 3. The exact same training and testing datasets were used throughout this study for the different kinds of machine learning algorithms presented. Therefore, a comparison between different machine learning models could be performed on the same testing dataset. Later, we compare the efficiency of different trained models with different combinations of hyperparameters.

The Graph Neural Network Architecture
The GNN used for the node-level assessment of the integrity of the measurements was divided into three successive tasks:
• The creation of the input graph.
• The updating of the graph's embeddings, i.e., the core of the GNN, using message-passing layers based on the GraphSAGE model.
• The classification of each independent node.
The first element of the GNN, presented in Figure 12, took the unidirectional graph representing the different measurements along the optic fiber and added the edges' embeddings (which could ultimately correspond to the heat flow between each pair of neighboring nodes). The initial graph (G_0) was composed of three successive time steps, namely, the first two unaltered steps (T(t_1) and T(t_2)) and the third step with added inaccuracies (T_d(t_3)). The edge-embedding calculation was performed by a multi-layer perceptron (MLP) W_Flow applied to neighboring nodes. W_Flow was shared across all the edges. The nodes' embeddings remained unchanged. Equations (11) and (12) model the first part of our GNN. This initialization aimed at introducing physics by creating edges' embeddings similar to a physical flow of energy.
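This first element can be illustrated as follows. It is our own sketch: a single linear layer stands in for the W_Flow MLP, and the names are hypothetical.

```python
import numpy as np

def init_edges(x, edge_index, W_flow):
    """Create an embedding for each edge from its two endpoint nodes.

    x          : (n, 3) node embeddings (three successive temperature steps)
    edge_index : list of (u, v) pairs of the unidirectional graph
    W_flow     : (6, 3) shared weight, a stand-in for the W_Flow MLP
    The resulting embeddings play the role of a heat flow between nodes;
    the nodes' embeddings are left unchanged.
    """
    return np.array([np.concatenate([x[u], x[v]]) @ W_flow
                     for u, v in edge_index])
```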
The nodes' and edges' embeddings of graphs G_0 and G_1 were three-dimensional vectors. Henceforth, all embeddings of any layer l of the GNN were three-dimensional vectors.

The second element was the core of the GNN. It was used to update the graph embeddings, which made them easier to interpret for the classification network. This element was composed of subsets called GNN layers. Each layer updated the embeddings of the nodes and edges, and by stacking them, the core of the GNN was created. Each layer used a variation of the GraphSAGE model, as shown by Figures 13 and 14. The novelty of this model lay in the modification of the GraphSAGE model with elements of the Graph Net (GN) framework developed by Battaglia et al., 2018 [2]. This model aims at inferring physics by deducing the flow (on the edges) from the energy levels (on the nodes) and, inversely, at applying energy conservation (on the nodes) by accumulating the flows (on the edges). Each layer updates the nodes' embeddings by aggregating their neighboring edges and vice versa. Two update functions, namely, W_e^l and W_x^l, are used to update the edges' and the nodes' embeddings, respectively. These update functions depend on the layer l and are built using a multi-layer perceptron (MLP). Moreover, they are shared across all nodes and edges. Equations (13) and (14) describe the updating of the nodes' and the edges' embeddings.

The third element of the network was the classifier presented in Figure 15. Its role was to classify each node with respect to the sensor integrity. This was a local element that, for each node, took the embedding of the corresponding node and the embeddings of its neighboring edges in order to classify the node's sensor state as healthy or faulty. This calculation was performed by an MLP denoted W_CLA, which was shared across all the nodes. Equation (15) describes the network's Boolean prediction Ŝ_v of the sensor's state at node v.

The loss function used for this model was a variation of the binary cross-entropy (BCE) [60] described by Equation (16). We used a weight w to alter the BCE, with the aim of improving the detection of faulty sensors. Increasing the weight w reduced the rate of false negatives (FNs) but increased the rate of false positives (FPs). The false negatives (FNs) were the most critical errors in our system, as presented in Figure 16 later in this section.
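One layer of this GraphSAGE/Graph Net hybrid can be sketched as follows. This is an illustration of the edge and node updates under our own assumptions: single linear layers replace the shared MLPs W_e^l and W_x^l, and the mean is used as the aggregation function.

```python
import numpy as np

def gn_layer(x, e, edge_index, W_e, W_x, agg=np.mean):
    """One GraphSAGE/Graph Net hybrid layer (edge update, then node update).

    Edges are updated from their endpoint nodes (flow inferred from energy
    levels), then nodes from their aggregated incident edges (an analogue
    of energy conservation).
    x : (n, 3) node embeddings; e : (m, 3) edge embeddings
    W_e : (9, 3) edge update weight; W_x : (6, 3) node update weight
    (single linear layers standing in for the shared MLPs W_e^l, W_x^l).
    """
    # Update each edge from its own embedding and its two endpoint nodes
    e_new = np.array([np.concatenate([e[k], x[u], x[v]]) @ W_e
                      for k, (u, v) in enumerate(edge_index)])
    # Update each node from its embedding and its aggregated incident edges
    x_new = np.empty_like(x)
    for v in range(x.shape[0]):
        incident = [k for k, (a, b) in enumerate(edge_index) if v in (a, b)]
        msg = agg(e_new[incident], axis=0)
        x_new[v] = np.concatenate([x[v], msg]) @ W_x
    return x_new, e_new
```

Stacking several such layers, with distinct weights per layer, would form the core of the GNN described above.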
The different hyperparameters used in defining the GNN were as follows:
• The size of the various MLPs in terms of the number of neurons.
• The aggregation functions used for the message-passing layers and the classifier.
• The number of message-passing layers.
• The weight w used in the loss function.
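As an illustration of this last hyperparameter, a weighted BCE of the kind described by Equation (16) can be sketched as follows (our formulation; the exact variant used in the study may differ):

```python
import numpy as np

def weighted_bce(y_true, y_pred, w=1.0, eps=1e-7):
    """Binary cross-entropy with a weight w on the faulty (positive) class.

    Increasing w penalizes missed faulty sensors (false negatives) more,
    at the cost of more false positives.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(w * y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))
```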
The main idea was to create multiple GNNs with different hyperparameters and thus build a benchmark. Table 1 presents all the different hyperparameters tested. Every combination of the hyperparameters presented in Table 1 was used to train a GNN. In total, there were 162 (3^4 × 2) unique sets of hyperparameters. For each set of hyperparameters, 10 similar GNNs (for a total of 1620) were trained on the unique training dataset generated following the process presented in Figure 10.
The training parameters were identical for each GNN. Each GNN was then tested on the unique testing dataset. The results were then displayed in a confusion matrix [62], as presented in Figure 16. We note that the sensors' data were used in predictive models later on, and if a sensor was detected as faulty, it was not used in these models. As such, false positives (FPs) slightly decreased the efficiency of these models because they lowered the amount of usable data. However, false negatives (FNs) fed wrong information to the predictive models, making them a major threat.

The Thresholding Classification Method
To assess the efficiency of the GNN for the classification of the sensors' states, a comparison with established methods was necessary, the first of which was the thresholding method described below. This method was based on the same inputs and outputs as the GNN and was decomposed into the following steps:
1. The prediction of the temperature distribution at the third time step without errors was derived from the distributions at the first two time steps. For this purpose, a linear extrapolation in time was used. Equation (17) presents how to obtain a prediction of the temperature at the third time step.
2. The predicted temperature distribution was compared against the deteriorated response (at the third time step) by determining a threshold ε under which the sensor was presumed healthy and over which the sensor was presumed faulty. Equation (18) shows how the threshold value was used to assess the sensor integrity.
3. Multiple thresholds ε were tested and the one with the best results was retained.
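Steps 1 and 2 can be sketched as follows, assuming uniformly spaced time steps so that the linear extrapolation reduces to 2T(t_2) − T(t_1); the function and variable names are ours.

```python
import numpy as np

def threshold_classifier(T1, T2, T3_measured, eps):
    """Flag sensors whose reading deviates from a linear-in-time prediction.

    T1, T2      : clean responses at the first two time steps
    T3_measured : (possibly degraded) responses at the third time step
    eps         : threshold above which a sensor is declared faulty
    """
    # Step 1: linear extrapolation in time, T_hat(t3) = 2*T(t2) - T(t1)
    T3_predicted = 2.0 * T2 - T1
    # Step 2: compare the prediction and the measurement to the threshold
    return np.abs(T3_measured - T3_predicted) > eps
```

Step 3 then amounts to sweeping `eps` over a grid and keeping the value with the best classification results on the training dataset.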

The Multi-Layer-Perceptron-Based Classifier
The second classification method used to evaluate the GNN model was a simple feedforward neural network. This neural network took as input the three consecutive temperature responses at one of the graph's nodes and output the sensor state (at said node). Thus, the input and output dimensions were, respectively, three and one.
This multi-layer perceptron was composed of the following dense layers:
1. Layer 1 was composed of 15 neurons and used ReLU as its activation function.
2. Layer 2 was composed of five neurons and used ReLU as its activation function.
3. Layer 3 was composed of one neuron and used sigmoid as its activation function.
This multi-layer perceptron's training parameters were as follows:
• The optimizer used was the Adam stochastic gradient descent algorithm [61].
• A validation split of 15%.
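The forward pass of this classifier can be sketched as follows. This is a sketch of the described 15-5-1 architecture only; training with Adam and the validation split are omitted, and the parameter container is our own.

```python
import numpy as np

def mlp_forward(x3, params):
    """Forward pass of the 15-5-1 classifier described above.

    x3     : (3,) three consecutive temperature readings at one node
    params : dict with W1 (3x15), b1, W2 (15x5), b2, W3 (5x1), b3
    Returns the predicted probability that the sensor is faulty.
    """
    h1 = np.maximum(x3 @ params["W1"] + params["b1"], 0.0)   # ReLU, 15 neurons
    h2 = np.maximum(h1 @ params["W2"] + params["b2"], 0.0)   # ReLU, 5 neurons
    z = h2 @ params["W3"] + params["b3"]                     # 1 output neuron
    return 1.0 / (1.0 + np.exp(-z))                          # sigmoid
```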

The Decision Tree Classification Method
The last model used to evaluate the GNN model's performance was a decision tree. This model shared the same inputs and outputs as the multi-layer perceptron and the thresholding method. The scikit-learn [63] decision tree model was used.
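A minimal usage sketch with scikit-learn follows; the training values below are illustrative only, not the experimental data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each sample is the three consecutive temperatures at one measurement point,
# and the label is the (known) sensor state: 0 = healthy, 1 = faulty.
X_train = np.array([[20.0, 20.5, 21.0],   # healthy: smooth evolution
                    [20.0, 20.5, 26.0],   # faulty: ~5 degC jump at t3
                    [18.0, 18.2, 18.4],
                    [18.0, 18.2, 24.9]])
y_train = np.array([0, 1, 0, 1])

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred = clf.predict([[19.0, 19.3, 25.0]])  # large jump: likely flagged faulty
```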

Results
This section provides the results of the different methods applied to the testing dataset. Different metrics are then given to evaluate the performances of the different methods. Finally, these results are compared with each other.

Results of the GNNs
This section shows the results of each of the 1620 GNNs tested on the unique testing dataset. Using the confusion matrix presented in Figure 16, it was possible to use recall metrics [62,64-66], such as the true positive rate (TPR) and true negative rate (TNR), to plot the efficiency of each of the GNNs. Figure 17 presents the recall metrics on the testing dataset for each GNN trained. Equations (19) and (20) describe the recall metrics. Then, by using the accuracy metric [62,64-66], it was possible to plot the proportion of GNNs that attained certain accuracy thresholds, as shown in Figure 18. Equation (21) presents the accuracy metric. The same analysis could be performed using precision metrics [62,64-66], such as the positive predictive value (PPV) and negative predictive value (NPV). Figure 19 shows the distribution of the precision metrics on the testing dataset for each GNN tested. Equations (22) and (23) describe the precision metrics.

Figure 19. Performances of the GNNs on the testing dataset in terms of precision metrics.
Eventually, the same could be achieved using the F1-score [62,64-66], and the results are presented in Figure 20. The F1-score is the harmonic mean of the recall and the precision, as showcased by Equation (24).

Figure 20. Performances of the GNNs on the testing dataset in terms of F1-scores.
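All of these metrics can be computed directly from the confusion matrix counts, as sketched below (a straightforward illustration; the variable names are ours).

```python
def metrics(tp, fp, tn, fn):
    """Recall, precision, accuracy and F1-score from a confusion matrix
    (positives = faulty sensors)."""
    tpr = tp / (tp + fn)                     # recall on faulty sensors
    tnr = tn / (tn + fp)                     # recall on healthy sensors
    ppv = tp / (tp + fp)                     # positive predictive value
    npv = tn / (tn + fn)                     # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * tpr / (ppv + tpr)         # harmonic mean of PPV and TPR
    return {"TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv,
            "accuracy": acc, "F1": f1}
```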
Then, it was interesting to evaluate which of the hyperparameters presented in the previous section had the most influence on the performances of the GNNs. Figures 21 and 22 present the impacts of the various hyperparameters. Figure 21 shows the proportion of networks that performed above a certain accuracy, similarly to Figure 18, only this time, we split the population into groups with regard to the hyperparameters. Each row of the figure corresponds to a set of hyperparameters (aggregation function, number of layers, etc.), and the subfigure on the right is a magnified view of the figure on the left. This magnified view is meaningful because it showcases which hyperparameter gave the best results when only the best GNNs were taken into account. Thus, the GNNs that did not learn sufficiently were excluded.

Results of the Thresholding Classification Method
Later on, the thresholding method was evaluated on the testing dataset using the same threshold ε identified using the training dataset. The confusion matrix of the thresholding method is shown in Table 2.

Results of the Multi-Layer-Perceptron-Based Classifier
Similar to the GNN, the multi-layer-perceptron-based classifier was optimized on the training dataset and was then applied to the test dataset. The confusion matrix associated with this method is shown in Table 3.

Results of the Decision Tree Classification Method
Table 4 presents the confusion matrix that resulted from the application of the decision tree optimized on the training dataset to the test dataset.

Trained GNNs Compared against the Other Classification Methods
Table 5 provides the metrics for the five top-performing GNNs and the other standard classification methods on the test dataset. The hyperparameters used in the top-performing GNNs are presented in Table 6.

Discussion
This section explores the results presented in the previous section. It aims to provide an analysis of the hyperparametric study, review the efficiency of the used model and tackle the limitations of the selected GNN architecture.

The Impacts of Different Hyperparameters
The impacts of the different hyperparameters are discussed using Figures 21 and 22. To better understand Figure 21, subfigures 1 on the left and subfigures 2 on the right must be distinguished. Subfigures 1 represent the whole population of trained GNNs for one hyperparameter, whereas subfigures 2 present only the 30% best networks for this hyperparameter. This means that subfigures 1 showcase which hyperparameter had the better odds of producing a GNN with an accuracy from 70% up to 99%. In contrast, subfigures 2 only focus on networks whose accuracy exceeded 99.5%. The choice between these two options can be made based on the selected objective or precision of the analysis.
First of all, one may choose the hyperparameters that performed the best according to subfigures 1 if the aims are as follows:

•
To have a set of hyperparameters that performs well on average.

•
To train only a few GNNs and have decent results.
On the other hand, one may choose the hyperparameters that perform the best according to subfigures 2 if the goals are as follows:

•
To have a set of hyperparameters that performs the best, even though many trained GNNs might have relatively poor precision.

•
To train a lot of GNNs and pick out the top performers.
Moreover, Figure 22 only presents the distribution of the 100 top-performing GNNs. Thus, it is to be used in the same manner as subfigures 2.
The choice of an aggregation function (both for the classifier and the layers of the GNN) had little impact on the general accuracy of the GNN, as showcased by Figure 21(a1,b1). However, among the top-performing networks, the sum function seemed to perform worse than the max and average functions, as shown in Figures 21(a2,b2) and 22a.
This might have been because the sum was dependent on the size of the neighborhood, whereas the average and maximum were normalized. Indeed, two elements in the graph had only one neighbor (the extremities, i.e., nodes v_1 and v_n of the graph, as shown in Figure 7), while the rest of the nodes had two, which may explain why the networks using the sum aggregation underperformed.
Another impactful parameter was the number of layers of message passing used. Figure 21(c1) shows that the networks with zero layers of message passing performed better overall than the networks with one layer of message passing, which, in turn, performed better overall than the networks with two layers of message passing. However, among the top-performing (15% best) networks, the GNNs with one layer of message passing seemed to perform better, as showcased by Figures 21(c2) and 22b.
This result is somewhat counterintuitive. Indeed, it was expected that the more layers of message passing, the better the result of the GNN. It can be explained by the limited number of epochs: adding another layer of message passing complexifies the optimization of the GNN, and therefore requires more effort and data to learn the correct weights of the network. Moreover, a GNN with a higher number of layers requires optimizing more weights, which increases the risk of falling into a local minimum. Both these phenomena can explain why having fewer layers of message passing increased the overall accuracy but did not transfer to the top-performing GNNs, for which the added complexity permitted better results.
Furthermore, Figure 21(d1) suggests that the medium-sized MLPs performed better overall and that the large-sized MLPs performed worse overall. Figures 21(d2) and 22c seem to show the same trend. This could have been the effect of the same optimization-complexification phenomenon as exemplified by the number of GNN layers, but it could also be linked to the inherent dimension of the data. Indeed, using fewer dimensions than the data's inherent dimension limits the ability to learn because the GNN cannot take into account all the problem's variables, but a higher data dimension can also hinder learning because the information is too diluted.
Moreover, as expected, when the importance of detecting a faulty sensor was increased via the weight w in the BCE formula (Equation (16)), the rate of TPs increased, whereas the rate of TNs decreased.
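A common form of the class-weighted binary cross-entropy illustrates this tradeoff (the exact form of Equation (16) may differ; this is a sketch, with w applied to the positive, i.e., faulty, class):

```python
import math

def weighted_bce(y_true, y_pred, w, eps=1e-7):
    """Binary cross-entropy with a weight w on the positive (faulty) class.
    y_true: 0/1 labels (1 = faulty); y_pred: predicted fault probabilities.
    A common formulation, assumed here for illustration."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * t * math.log(p) - (1.0 - t) * math.log(1.0 - p)
    return total / len(y_true)

# Increasing w penalizes missed faults more heavily, pushing the classifier
# toward more true positives at the cost of true negatives.
loss_w1 = weighted_bce([1, 0], [0.3, 0.2], w=1.0)
loss_w5 = weighted_bce([1, 0], [0.3, 0.2], w=5.0)
assert loss_w5 > loss_w1
```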
Finally, this analysis did not take into consideration the interactions between hyperparameters. For instance, small neural networks may perform well with two layers of message passing but not with zero or one. However, such an analysis is complex, and examining hyperparameters individually already yielded strong empirical results.

Efficiency of GNNs for Sensors' Integrity Assessment
As presented in Table 5, the most precise GNNs outperformed the thresholding method for every metric, which demonstrated the effectiveness of the model for assessing the sensors' integrity. However, the GNN had a longer training time and a slightly longer prediction time. Although the computation time is a relevant comparison criterion, in this application, the measurements are performed daily, and therefore, there is no need for real-time computation.
When compared with the MLP-based classifier, the most precise GNNs had relatively similar results for all metrics. These results may call into question the appeal of a GNN for the studied problem; however, it is worth pointing out that the data used here contained very few topological components given that they were one-dimensional. Moreover, the temperature profiles of the experiment, presented in Figures 8 and 9, were rather simple because they involved a relatively uniform heat source. The sensor failure model was also quite simple (i.e., an offset from 2 °C to 8 °C). The GNN model is expected to outperform the MLP by a substantial margin for a more complex graph, a more complex heat source distribution, or a more intricate sensor failure model, thanks to its ability to gain insights into the underlying physical information and topology.
The decision tree outperformed the GNN for a majority of the metrics but tended to produce more false negatives than the MLP and GNN, which means it might not be the best method for our problem.
In a nutshell, the performance of the proposed method was similar to that of common classification methods for one-dimensional, relatively homogeneous data. This confirmed that the method is effective at detecting sensor failures.

Upscaling the Model to Complex Networks
The next logical step of this research is to apply this model to more complex sensor networks, for instance, a network composed of all the sensors presented in Figure 2. Two challenges seem to arise from this upscaling: adapting the model to a complex graph and creating the graph of the sensors.
The first can be almost entirely bypassed because the model is based on node-order-invariant aggregation functions, such as sum, average, or maximum. However, as discussed in the previous subsection, the sum function is not normalized by the neighborhood size, which might cause issues during upscaling. For the most part, however, applying the same GNN to a different graph is completely feasible. Moreover, because this model is based on the GraphSAGE model [13], it handles large graphs efficiently and requires fewer computational resources than other GNN models.
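This transferability can be sketched with a minimal GraphSAGE-style layer (scalar node features and hand-picked weights, for brevity; not the paper's implementation): because the weights are shared across nodes and the aggregation is node-order invariant, the same trained layer runs unchanged on any graph topology.

```python
def sage_layer(h, neighbors, w_self, w_neigh):
    """One GraphSAGE-style update with mean aggregation (sketch).
    h: list of scalar node features; neighbors: list of neighbor-index lists.
    The layer is defined per node, so it applies to graphs of any shape."""
    out = []
    for i, nbrs in enumerate(neighbors):
        agg = sum(h[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(max(0.0, w_self * h[i] + w_neigh * agg))  # ReLU
    return out

# The same (hypothetical) weights run on a 3-node chain and a 3-node triangle.
chain = [[1], [0, 2], [1]]
triangle = [[1, 2], [0, 2], [0, 1]]
h = [20.0, 21.0, 22.0]  # hypothetical temperatures
print(sage_layer(h, chain, 0.5, 0.5))
print(sage_layer(h, triangle, 0.5, 0.5))
```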
The second and main challenge of this upscaling is to create the sensor network's graph. This task is tricky because we need a systematic way to connect the nodes (i.e., sensors) without the graph becoming too dense or too sparse. Indeed, a very dense graph loses topological expressiveness, in addition to being very costly to process with a GNN. The task is even more complex considering that the sensors are not all in the same part of the cell (some are in concrete, others in a cylinder liner, etc.).
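One illustrative heuristic for controlling this density is a k-nearest-neighbors construction (the paper does not prescribe a rule; positions and k here are hypothetical):

```python
def knn_edges(positions, k):
    """Connect each sensor to its k nearest neighbors (Euclidean distance).
    Returns an undirected edge set; k directly controls graph density."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    edges = set()
    for i, p in enumerate(positions):
        others = sorted((j for j in range(len(positions)) if j != i),
                        key=lambda j: dist2(p, positions[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Hypothetical 2D sensor positions; raising k densifies the graph.
pos = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.5)]
print(sorted(knn_edges(pos, k=2)))
```

A refinement consistent with the heterogeneity noted above would be to restrict candidate neighbors to sensors mounted on the same structural element.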

Limitations of the Used Model
The first limitation of the used model was linked to the definition of the sensor failure model, which was considered to be a simple restricted offset (between 2 °C and 8 °C) in this work. This kind of failure is easier to pinpoint than real errors for two main reasons: first, it is a homogeneous definition, meaning all faults are similar and therefore easier to identify; second, it is a simple definition that does not represent all the various ways a sensor can malfunction. This limitation could be lifted by integrating each type of sensor fault separately into the model, with one class per type of sensor failure.
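The offset failure model can be sketched as follows (a minimal illustration; the sign of the offset and the random-number handling are assumptions, not taken from the paper):

```python
import random

def inject_offset_faults(temps, faulty, low=2.0, high=8.0, rng=None):
    """Offset failure model: each faulty sensor reading is shifted by a
    random offset between `low` and `high` degrees C (sign assumed random
    here for illustration). Clean sensors are left untouched."""
    rng = rng or random.Random(0)
    out = []
    for t, is_faulty in zip(temps, faulty):
        if is_faulty:
            out.append(t + rng.choice([-1.0, 1.0]) * rng.uniform(low, high))
        else:
            out.append(t)
    return out

temps = [20.0, 21.0, 22.0]          # hypothetical clean readings
noisy = inject_offset_faults(temps, [False, True, False])
assert noisy[0] == temps[0] and noisy[2] == temps[2]
assert 2.0 <= abs(noisy[1] - temps[1]) <= 8.0
```

A multi-class extension as proposed above would replace the boolean mask with one fault type per sensor and one injection rule per type.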
The second limitation was the training complexity: obtaining a GNN with decent predictive capabilities requires multiple training sessions. It is therefore necessary to define a list of objective metrics to identify the top-performing networks without bias. This also means that if a new type of fault is identified, the models will need to be retrained to measure their performance on this particular problem.
The third limitation was technical and linked to the computational resources required to run a GNN over large graphs, which demands a lot of RAM (random access memory) and processing power. This limitation could be lifted using subgraph learning [67] and by setting up multithreading. Subgraph learning consists of training the GNN on subsets of the initial graph and then reconstructing the network on the full graph, hence reducing the amount of memory required to store the model.
Another limitation was the use of the GraphSAGE model, which lacks the generalization capabilities of the attention mechanism used in a GAT [31]. However, when dealing with large graphs, computing the attention mechanism is very costly. There is therefore a tradeoff between generalization capabilities and the computational resources required for training and prediction.

Conclusions
In this work, we proposed a novel method based on graph neural networks to assess a sensor's response integrity. The method was applied to real data obtained using Andra's HLW cell demonstrator. It was compared with state-of-the-art methods (i.e., thresholding, MLP, and decision tree) and showed similar performance. The method could perform even better when dealing with more complex data (from a topological and thermal standpoint). Moreover, the GNN can adapt more easily to various data topologies, which warrants its use for the assessment of sensor integrity for nuclear waste monitoring. Multiple GNNs were trained and compared to find the optimal neural network hyperparameters. The results show that a single message-passing layer was often enough for the selected application, while multiple message-passing layers were harder to train and could result in overfitting.
Future works will deal with the whole sensor network instead of only the data along one of the optic fibers (see Figure 2). The main challenge of this upgrade may be the creation of the sensor network's graph, which will have to consider each sensor's location (i.e., the structural element the sensor is mounted on). In contrast, scaling the model from a one-dimensional graph to a complex graph is not a concern since the architecture of the GNN remains similar. Future works may also explore multiple GNN models and alternative sensor failure models.
Further developments will include interpolation under dynamic conditions (evolution of the sensor network, the medium, etc.) using a GNN. Indeed, once a faulty sensor is identified, it is paramount to estimate the temperature at that spot to ensure that numerical models can continue to monitor the facility.

Figure 2. Sensor distribution around the HLW demonstrator cell.

Figure 3. Installation of the thermal source in the HLW demonstrator cell.

Figure 4. Message-passing mechanism and its use in a GNN.

Figure 5. The graph neural network model.

Figure 8. Thermal responses of the optic fiber sensor at set sample points over time.

Figure 9. Responses of the entirety of the distributed sensors for different time samples.

Figure 10. Process of introducing sensor inaccuracies into clean distributed sensor data.

Figure 11. Inputs and outputs of the machine learning algorithm.

Figure 12. First element of the GNN, which was used to create the input graph.

Figure 13. One GNN layer of message passing: updating the nodes' embeddings.

Figure 14. One GNN layer of message passing: updating the edges' embeddings.

Figure 15. The local classifier predicts the sensor state.

• The used optimizer was the Adam stochastic gradient descent algorithm [61].
• A total of 50 training epochs.
• A batch size of 20 per iteration of the gradient descent algorithm.
• A validation split of 15% (same split for all the networks).

Figure 16. Confusion matrix of our problem.

• A total of 200 training epochs.
• A batch size of 200 per iteration of the gradient descent algorithm.

Figure 17. Performances of the GNNs on the testing dataset in terms of recall metrics.

Figure 18. Proportion of GNNs above a certain accuracy.

Figure 21. Proportion of GNNs above a certain accuracy, sorted by hyperparameters.

Figure 22 presents the distribution of hyperparameters for the 100 top-performing GNNs that were trained. Subfigure (a) presents the aggregation functions used for the classification and the message passing, subfigure (b) shows how many layers of message passing the top-performing GNNs used, subfigure (c) shows the sizes of the MLPs used, and subfigure (d) presents the w coefficient used in the BCE loss function.

Figure 22. Distribution of the hyperparameters' set for the 100 top-performing GNNs.

Figure 23 presents the value of ε optimized on the training dataset to provide the best overall performance using the recall metrics.

Figure 23. Optimizing the value of ε on the training dataset.

Table 1. Various hyperparameters used in the benchmark.

Table 2. Confusion matrix of the thresholding method that used the test dataset.

Table 3. Confusion matrix of the multi-layer-perceptron-based classifier that used the test dataset.

Table 4. Confusion matrix of the decision tree classification method that used the test dataset.

Table 5. Comparison between the top-performing GNNs and other standard classification methods.