A Fault Detection System for a Geothermal Heat Exchanger Sensor Based on Intelligent Techniques

This paper proposes a methodology for dealing with an issue of crucial practical importance in real engineering systems: fault detection and recovery of a sensor. The main goal is to define a strategy to identify a malfunctioning sensor and to establish the correct measurement value in those cases. As a case study, we use the data collected from a geothermal heat exchanger installed as part of the heat pump installation in a bioclimatic house. The sensor behaviour is modeled using six different machine learning techniques: random decision forests, gradient boosting, extremely randomized trees, adaptive boosting, k-nearest neighbors, and shallow neural networks. The achieved results suggest that this methodology is a very satisfactory solution for this kind of system.


Introduction
In recent years, most countries have faced an important challenge in terms of global warming, economic instability and dependency on fossil fuel prices. In this context, the use of alternative energies has been promoted by the administrations. The most common alternative energy sources are wind and solar, whose technologies have undergone significant advances. However, in addition to these two, other renewable energies, such as oceanic or geothermal energy, have shown important developments in terms of efficiency [1].
Geothermal energy is defined as the heat energy stored under the ground. Dickson and Fanelli [2] estimated the amount of heat inside the earth at around 42 × 10^12 W. In spite of this high amount of energy, geothermal installations must be placed in specific areas with suitable geological conditions [3]. Around the world, its use represents 15 MW of non-electrical applications, such as industrial processes, bathing or heat pumps, and 9 MW of electrical ones.
The heat exchanger is a crucial component of a geothermal facility, and its main function is to absorb heat from the ground or transfer heat to it. A geothermal heat exchanger can be placed under the ground in vertical or horizontal configurations [4,5]. On the one hand, vertical configurations are more efficient because, at great depths, the ground temperature remains almost constant throughout the year. This means that, compared to the ambient temperature, the ground temperature would be higher in
The Sotavento bioclimatic house is a building dedicated to promoting the use of alternative energies and energy savings. These facilities, founded by the Sotavento Galicia Foundation, are located between the councils of Xermade and Monfero (Lugo), in the autonomous community of Galicia (Spain). Its geographical coordinates are 43°21′ North, 7°52′ West, with an elevation of 640 m above sea level and at a distance of 30 km from the sea.
Two different energy needs must be satisfied in the Sotavento bioclimatic house: The thermal and the electric energy. The thermal system has three different renewable energy sources: Geothermal, solar and biomass. These three sources ensure the thermal demand coverage. The thermal installation can be divided into three parts [27]:

• Generation group: Three different renewable sources are exploited (geothermal, solar and biomass).
• Energy accumulation group: The thermal energy storage is ensured using different accumulators. A solar accumulator of 1000 L receives the thermal energy from the solar system. In series, an inertial accumulator of 800 L stores the heat from the boiler and geothermal systems.
• Consumption group: The thermal system must cover the demand of underfloor heating systems and Domestic Hot Water (DHW). The underfloor heating system is designed to keep the house temperature between 18 °C and 22 °C. The fluid temperature should remain between 35 °C and 40 °C. According to the Spanish Technical Building Code, the DHW demands 240 L/day.
In addition to the thermal systems, the Sotavento bioclimatic house also has an electrical installation with two renewable sources: wind and photovoltaic. The electricity supplies the power systems and the lighting. To avoid power cuts, the house is connected to the power grid and draws from it when needed.

The Geothermal System
A more detailed explanation of the geothermal energy system is presented in this subsection. It is divided into two main parts, described below: the heat pump and the heat exchanger (Figure 1).
Heat pump. The heat pump has two different circuits: the primary one carries the heat from the ground (the geothermal exchanger) to the heat pump unit, and the other one is connected between the unit and the inertial accumulator. The energy absorbed from the ground is measured by two sensors.
Geothermal exchanger. The horizontal exchanger consists of five different circuits. The ground temperature along the exchanger is monitored using sensors distributed in four different loops. A scheme of the sensors located along the exchanger can be seen in Figure 2. Sensors S28 and S29 measure the energy absorbed from the ground and S401 measures the ground temperature. The rest of the sensors monitor the exchanger temperature at different points.

The Dataset
The initial dataset corresponds to the temperatures measured by the sensors during one year, registered with a sample time of 10 min.
Sensors S28 and S29 (the input and the output of the heat exchanger) are located inside the house. Hence, when the heat pump is off, these sensors measure the temperature inside the house. For this reason, the temperature is filtered to take into account only the data when the heat pump is on.
As this work proposes a model capable of predicting the sensor measurements to detect anomalies, only data from correct operation is considered. Then, to avoid wrong samples (bad sample time, bad range, open wires, etc.), the dataset was filtered to discard the erroneous data. After this conditioning step, the samples were reduced from 52,705 to 52,699.
However, as the appearance of any kind of anomaly in a sensor must be detected in a short time, the models were implemented with an amount of data corresponding to two days. These measurements are randomly selected from the 52,699 samples.

Fault Detection and Recovery (FDR) Approach-Used Techniques
The scheme defined for fault detection and recovery approach is shown in Figure 3. It is possible to divide the figure into two parts: The model and the fault detection and recovery block. The first one gives the prediction of each sensor based on the measurements made by the rest of the sensors.
The second one compares the prediction with the real measurement, and analyzes the deviation based on a defined range. If there is a significant deviation, the valid signal is the prediction. Otherwise, the real measurement is set at the output.

FDR Steps
In this subsection, the steps needed to accomplish the developed FDR approach are explained.
Sensor fault detection. Initially, a simple methodology for sensor fault detection is used. The method allows a specific configuration of the deviation range. If the measured sample is out of this range, then a fault is labeled. The deviation percentage is referred to the operating temperature range.
Recovery. If a fault is detected, the wrong sample must be replaced with a predicted value. This prediction could be based on the readings of the other sensors, their previous values, and so on. To accomplish the recovery, a model must be implemented with the aim of predicting an accurate value.
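The detection and recovery steps can be illustrated with a minimal sketch. It assumes that the deviation is measured between the real sample and the model prediction and that the tolerance is a percentage of the operating temperature range. The function name, the 5% tolerance, and the 0 to 20 °C range are hypothetical values chosen for illustration only.

```python
import numpy as np

def detect_and_recover(measured, predicted, t_min, t_max, deviation_pct=5.0):
    """Flag samples that deviate too much from the prediction and replace them.

    The tolerance is deviation_pct percent of the operating range (t_max - t_min).
    All parameter values used here are illustrative assumptions.
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Allowed absolute deviation, expressed relative to the operating range.
    tolerance = (deviation_pct / 100.0) * (t_max - t_min)
    fault = np.abs(measured - predicted) > tolerance
    # Faulty samples are replaced by the prediction; valid ones pass through.
    output = np.where(fault, predicted, measured)
    return fault, output

# Hypothetical readings: the third sample behaves like a sensor fault.
measured = [10.2, 10.4, 25.0, 10.1]
predicted = [10.1, 10.5, 10.3, 10.2]
fault, output = detect_and_recover(measured, predicted, t_min=0.0, t_max=20.0)
```

With these made-up numbers the tolerance is 1 °C, so only the third sample is flagged and replaced by its predicted value.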

Used Techniques
The present subsection shows the different techniques used for accomplishing the objectives of the present research.

Analysis and Preprocessing
From the considered initial data, two different subclasses were created: 1. Day data cases. 2. Night data cases.
Knowing the date of each data record and the precise location of the installation under study, the sunrise and sunset times can be obtained. This is the criterion used to split the data into the two subclasses.
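As an illustration of this split, the sketch below assigns each sample to the day or night subclass once the sunrise and sunset times are known. How those times are obtained is not covered here: the `sun_times` lookup, its values, and the function names are hypothetical.

```python
from datetime import date, datetime, time

def is_daytime(timestamp, sunrise, sunset):
    """Return True when the sample was taken between sunrise and sunset."""
    return sunrise <= timestamp.time() < sunset

def split_day_night(samples, sun_times):
    """Split (timestamp, value) pairs into day and night subclasses.

    sun_times maps each date to a (sunrise, sunset) pair of datetime.time values.
    """
    day, night = [], []
    for timestamp, value in samples:
        sunrise, sunset = sun_times[timestamp.date()]
        target = day if is_daytime(timestamp, sunrise, sunset) else night
        target.append((timestamp, value))
    return day, night

# Hypothetical sunrise and sunset times for one date.
sun_times = {date(2019, 1, 15): (time(8, 55), time(18, 20))}
samples = [
    (datetime(2019, 1, 15, 3, 0), 9.8),
    (datetime(2019, 1, 15, 12, 0), 10.4),
    (datetime(2019, 1, 15, 22, 0), 10.0),
]
day, night = split_day_night(samples, sun_times)
```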
To obtain a representative model, some variables of the raw dataset have been selected. In addition, the previous state of some signals is included as an artificial input, for developing each experiment shown in Section 4.
The use of this extra information can be more beneficial than obtaining the model with the original data features only. The selection of these artificial features is always based on expert knowledge about the system behavior [28].
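A possible way to build such previous-state inputs is sketched below. It assumes that the rows are ordered in time with a constant sampling period, so the lagging must happen before any random selection of samples. The function name and the choice of lagged columns are hypothetical.

```python
import numpy as np

def add_previous_state(X, lag_columns):
    """Append the previous-sample value of selected columns as extra features.

    Row t of the result holds X[t] followed by X[t-1, lag_columns].
    The first row has no previous state and is dropped, so the target
    vector must be trimmed in the same way (y[1:]).
    """
    X = np.asarray(X, dtype=float)
    previous = X[:-1, lag_columns]   # values at t-1
    current = X[1:, :]               # values at t
    return np.hstack([current, previous])

# Toy example: two sensors, three consecutive samples; lag the first sensor.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_lagged = add_previous_state(X, lag_columns=[0])
```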
Based on a description of the new dataset generated from the raw data, a common pre-processing procedure has been developed. It also applies to the experiments that use previous values of different sensors as artificial variables.
The criterion for data normalization is shown in Equation (1):
$$ x' = \frac{x - \mu}{\sigma} \qquad (1) $$
where x is the raw value of a feature, and μ and σ are the mean and the standard deviation of that feature in the training samples. This standard scaling has been implemented with the Python class sklearn.preprocessing.StandardScaler [29]. The main goal of the normalization step is to avoid premature convergence in the first iterations, when the training process of a particular regression method begins [30].
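The following short sketch shows what the scaler does on made-up numbers. It fits the scaler on training data only and checks that the result equals subtracting the mean and dividing by the standard deviation of each feature. The arrays are illustrative and not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test samples with two features.
X_train = np.array([[10.0, 1.0], [12.0, 3.0], [14.0, 5.0]])
X_test = np.array([[12.0, 1.0]])

# The scaler learns the mean and standard deviation from the training data only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Manual equivalent: (x - mean) / std, using the population standard deviation.
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
```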

Regression Techniques
The purpose of the recovery methodology is to mimic the actual behaviour of the sensor. Thus, a predictive model trained with data acquired from the sensor is a sensible approach for achieving a computational representation of the sensor. Six different types of predictive models have been tested: shallow neural networks, extremely randomized trees (ExtraTrees), random decision forests, adaptive boosting, k-nearest neighbors, and gradient boosting.
This choice of regressors aims to capture the complexity of the sensor's behaviour through two subtly different approaches. The shallow neural network solution features a single model capable of increasing its complexity by enlarging the number of neurons in its hidden layer. On the other hand, the extremely randomized trees, adaptive boosting, random decision forests, and gradient boosting regressors belong to the family of ensemble methods. Ensemble models provide their results by combining those obtained from multiple elementary models. In this case, complexity is approached by enlarging the number of simple models in the ensemble.
The Multilayer Perceptron (MLP) is one of the most frequently used shallow neural network architectures. The good performance of this kind of artificial neural network has been proven in similar works such as [31][32][33]. Previous research [34] proved that this technique is capable of providing satisfactory results with much larger amounts of data than those used in this case study.
Ensemble methods, on the other hand, are among the most frequently used techniques because of the excellent results they usually display. Examples of success stories can be found in Kaggle's machine learning competitions (https://www.kaggle.com/), where, along with Deep Neural Networks, ensemble methods such as those reported in this research are most frequently the winning techniques.
Each technique and the set of associated parameters used in this work are explained below:
• Shallow Neural Networks. Artificial neural networks can be used as universal approximators [35]. For this paper, a three-layer Multilayer Perceptron architecture was chosen: an input layer for capturing the sensor information, a hidden layer with non-linear activation functions, and an output layer with a single neuron and a linear activation function to provide the prediction. The most important hyperparameters governing the regressor performance are the hidden layer size, the maximum number of iterations, early stopping, the activation function, the Nesterov momentum and the solver.
• K-Nearest Neighbors. This is a representative of instance-based techniques, or non-generalizing learning. Instead of representing the data via a model, this technique stores the training instances and uses a voting scheme on the nearest neighbors to obtain the prediction for new data. This technique is a popular choice for setting a baseline for the prediction error. The most important hyperparameter is the number of neighbors.
• Adaptive Boosting. This technique belongs to the family of stagewise additive models. The prediction is based on a weighted sum of the simpler weak estimators it comprises. Each weak estimator is designed to concentrate on those samples that previous estimators still found difficult to fit. In this technique, the number of estimators is the most important hyperparameter to tune.
• Random Decision Forests. One of the most popular ensemble methods, random decision forests (RF) comprise a collection of simple decision trees whose results are combined into a final collective result. The basic components of an RF can be built by considering a random limited number of features and/or a random limited number of observations. Thus, each component only has access to a fraction of the information and pays attention to specific details in the portion of information assigned to it. The combination of a number of these simple basic trees most frequently outperforms the results from a larger and more complex single tree. The number of estimators is the most important hyperparameter to tune.
• Extremely Randomized Trees. They are similar to Random Forests, as they combine an ensemble of decision trees. Nevertheless, a few important differences are worth noting. Firstly, Extra Trees can provide piece-wise multilinear approximations to the training dataset instead of the piece-wise constant ones provided by random forests. Secondly, Extra Trees use random values for the choice of the cut point, instead of bootstrapping to find the optimal cut point [36]. As with RF, one of the most important hyperparameters to tune is the number of basic estimators.
• Gradient Boosting. This technique builds the model following a stage-wise approach, adding subsequent basic estimators in order to capture the unexplained information present in the residuals of former weak estimators [37]. The estimators are frequently decision trees and, similarly, the number of basic estimators is among the most important hyperparameters.
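The six families described above map onto Scikit-Learn estimators as in the following sketch. It only shows where each hyperparameter named in the list is set. The concrete values (20 hidden neurons, 100 estimators, and so on) and the dictionary keys are assumptions for illustration, not the configuration used in this work.

```python
from sklearn.ensemble import (
    AdaBoostRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

def make_regressors(random_state=0):
    """Return one estimator per technique; all values are illustrative."""
    return {
        # One hidden layer; nesterovs_momentum only takes effect with solver="sgd".
        "mlp": MLPRegressor(
            hidden_layer_sizes=(20,),
            activation="tanh",
            solver="sgd",
            nesterovs_momentum=True,
            early_stopping=True,
            max_iter=500,
            random_state=random_state,
        ),
        "knn": KNeighborsRegressor(n_neighbors=5),
        "adaboost": AdaBoostRegressor(n_estimators=50, random_state=random_state),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
        "extra_trees": ExtraTreesRegressor(n_estimators=100, random_state=random_state),
        "gradient_boosting": GradientBoostingRegressor(n_estimators=100, random_state=random_state),
    }

regressors = make_regressors()
```

Each entry exposes the same `fit` and `predict` interface, which makes it straightforward to compare the techniques under identical conditions.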

Experiments and Results
This section describes the different experiments carried out and the results obtained.

Experiment Definition
Depending on the predictors used in the predictive model, four different experiments are defined. In each experiment, the four regression techniques mentioned above (shallow neural networks, extremely randomized trees, random decision forests, and gradient boosting) are used to build two types of models, according to the data used for each one:

• Global models: In this case, the whole data set is used for training a single regressor.
• Hybrid models: In this case, the data set is split into two groups according to the day and night criterion. Two different models are fit, one for day usage and another one for the night hours.
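A hybrid model of this kind can be sketched as a thin wrapper that trains one copy of a regressor per subclass and routes each sample to the matching submodel. The class name, the boolean `is_day` flag, and the synthetic data are assumptions made for this illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import ExtraTreesRegressor

class HybridRegressor:
    """Fit one copy of a base regressor on day samples and one on night samples."""

    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y, is_day):
        X, y = np.asarray(X), np.asarray(y)
        is_day = np.asarray(is_day, dtype=bool)
        # clone() gives each submodel its own unfitted copy of the estimator.
        self.day_model_ = clone(self.base_estimator).fit(X[is_day], y[is_day])
        self.night_model_ = clone(self.base_estimator).fit(X[~is_day], y[~is_day])
        return self

    def predict(self, X, is_day):
        X = np.asarray(X)
        is_day = np.asarray(is_day, dtype=bool)
        y_pred = np.empty(len(X), dtype=float)
        # Route every sample to the submodel of its subclass.
        if is_day.any():
            y_pred[is_day] = self.day_model_.predict(X[is_day])
        if (~is_day).any():
            y_pred[~is_day] = self.night_model_.predict(X[~is_day])
        return y_pred

# Synthetic data in which day and night follow different relations.
X = np.arange(40, dtype=float).reshape(-1, 1)
is_day = np.arange(40) % 2 == 0
y = np.where(is_day, X.ravel(), -X.ravel())

hybrid = HybridRegressor(ExtraTreesRegressor(n_estimators=20, random_state=0))
y_pred = hybrid.fit(X, y, is_day).predict(X, is_day)
```

A global model, by contrast, would be a single call to `fit` on all samples without the `is_day` flag.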

Error Metrics
In order to compare the different regression models obtained, the following error metrics have been implemented:
• MAE: Mean Absolute Error. The goal of this metric is to measure the difference between predicted and real values. This metric has some advantages compared to other error measures [38].
• SMAPE: Symmetric Mean Absolute Percentage Error. The main goal of this metric is to explain relative errors through the use of percentages [40], Equation (4).
• MSE: Mean Squared Error. This metric accounts for the variance of the error and can be applied in several forecasting problems [41], Equation (5).
• MAPE: Mean Absolute Percentage Error. This error metric is one of the most common measures of accuracy in regression problems [42], Equation (6).
• NMSE: Normalised Mean Square Error. This is a measure oriented to estimating the overall deviations between observed and predicted values [43], Equation (7).
In all these equations, Y_k is the observed value and Ŷ_k is the predicted value.
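The metrics listed above can be computed as in the following sketch. MAE, MSE and MAPE follow their usual definitions. SMAPE and NMSE have several variants in the literature, so the forms used here (SMAPE with the mean of |Y_k| and |Ŷ_k| in the denominator, NMSE as the MSE divided by the variance of the observed values) are assumptions and may differ from the ones behind the reported results.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (y must not contain zeros)."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def smape(y, y_hat):
    """Symmetric MAPE, in percent (assumed variant, bounded by 200)."""
    return 100.0 * np.mean(np.abs(y_hat - y) / ((np.abs(y) + np.abs(y_hat)) / 2.0))

def nmse(y, y_hat):
    """MSE normalised by the variance of the observed values (assumed variant)."""
    return mse(y, y_hat) / np.var(y)

# Made-up observed and predicted values.
y = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([12.0, 18.0, 33.0, 40.0])
scores = {f.__name__: f(y, y_hat) for f in (mae, mse, mape, smape, nmse)}
```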

Experiments Setup
For each experiment, the dataset was split into two subsets (training and test sets), as is customary in data science projects, in order to provide the error value on a held-out dataset. Such an error represents the capability of the method to generalize the observed behavior to new unseen data. Thus, 70% of the samples are used for training purposes to adjust the parameters of the models, while the remaining 30% are used for final testing. In order to find the best combination of hyperparameters for each model, a grid search with ten-fold cross validation has been carried out. The chosen scoring criterion was the negative mean square error. As a preprocessing step, the data is normalized before entering the regression model. In order to avoid leaking information from the validation folds during cross validation, both the scaler and the regressor are embedded in a pipeline.
The four families of regressors, the scaler, the pipeline tool, and the grid search with cross validation, are implemented in Scikit-Learn's machine learning library [44] which provides easy access to these techniques using Python as programming language for computational purposes.
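The setup described above can be sketched as follows for one regressor. The data is a synthetic stand-in with 288 samples (two days at a 10 min sample time) and four made-up input variables. The grid of `n_estimators` values and the random seeds are hypothetical and are not the search space used in the experiments.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sensor data (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(288, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=288)

# 70% of the samples for training, 30% held out for the final test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scaler and regressor share a pipeline, so the scaler is refit on the
# training part of every fold and never sees the validation part.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", ExtraTreesRegressor(random_state=0)),
])

# Hypothetical search space; step parameters are addressed as <step>__<param>.
param_grid = {"regressor__n_estimators": [10, 50]}

search = GridSearchCV(
    pipeline, param_grid, cv=10, scoring="neg_mean_squared_error"
)
search.fit(X_train, y_train)

# Error of the best pipeline on the held-out test set.
test_mse = np.mean((search.predict(X_test) - y_test) ** 2)
```

After fitting, `search.best_params_` holds the selected hyperparameters and `search.best_score_` the corresponding mean cross-validated score, which is negative because the scorer is the negated mean square error.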
The search space for the best values of the hyperparameters is reported below. Those hyperparameters not mentioned adopt Scikit-Learn default values.

K-Nearest Neighbors
• n_neighbors=range(5, 20, 5)

Tables 1-3 show the results obtained in the experiments by the global and hybrid approaches (best ones in bold). According to most error metrics, the ExtraTrees regressor achieves the best results in both global and hybrid approaches. Of these two, the hybrid approach displays better results, particularly according to the mean absolute error criterion, the easiest one for human beings to interpret. Figures 4 and 5 display the results obtained by the six types of regressors considered, in this case using the data from experiment A. The Extra Trees regressor closely resembles the actual data recorded from the sensor, which is considered a very satisfactory result.

Discussion
The number of experiments, regressors and error metrics reported in this paper builds a complex scenario when attempting to establish a single winning solution. As usually happens in real engineering problems, the solution to a problem is not unique and the context determines the preferred one.
From a strictly numerical point of view, it could be argued that Experiment A frequently displays the best error values. In those cases where it fails to outperform other experiments, the results are not significantly different from the minimum.
Considering the experiments from the point of view of their complexity, Experiment A also represents the simplest configuration, as it requires the lowest number of input variables. In the absence of significant differences in performance with respect to the rest of the experiments, this fact also supports its designation as the preferred configuration.
In economic terms, the context around this case study does not justify a more complex configuration. In some scenarios, e.g., optimizing a quality feature in manufacturing processes, a marginal improvement in the prediction model leads to significant economic benefits. That is hardly the situation of the case study reported in this paper: a predictive model that is used as a backup for the real sensor and whose readings are only considered during malfunctioning.
According to these criteria, Experiment A could be considered the best choice, but the following practicalities might alter the final decision. Firstly, an important issue to consider is the intrinsic precision of the actual sensor being modeled. If the performance difference between alternatives is orders of magnitude lower than the sensor precision, then those alternatives are in fact equally optimal. Secondly, the results must be considered from the point of view of the subsequent data consumption. If the sensor data is to be further processed by an algorithm sensitive to a specific precision, it makes little sense to consider considerably smaller differences, e.g., comparing two temperature readings of 82.15 °F and 82.17 °F when the on-off controller driving a pump already made a decision at a 60 °F threshold. This paper deliberately does not specify the particular subsequent model that the sensor signal feeds, as many such systems can be considered. Essentially, it is the engineer's call to weigh the context factors and choose the optimal solution for the problem at hand, the numerical error scored by each alternative being an important but not unique criterion in the decision making process.
For the case study reported in this paper, Experiment A using Extra Trees was adopted as the preferred solution. Nevertheless, the proposed approach relies on training data from a short period of time, two days, which makes it possible to periodically retrain the models, repeat the comparison to select the new best choice, and adapt to future changes.

Conclusions and Future Works
This research has successfully addressed a methodology for sensor fault detection and for recovering the data missing while a sensor is malfunctioning. The sensor fault detection procedure relies on tagging a sample as a fault when the measured value is out of the deviation range. Moreover, the procedure for recovering missing data is based on several experiments aimed at finding the best way to define a model that reproduces the measurements of a faulty sensor. The selection of input features is relevant when building a robust regression model to predict missing data in a process where temperature is involved; more concretely, the selection of new features and how these are estimated or calculated. In this research, new artificial features based on the sensor values in the previous state are added in order to build and compare a global model and a hybrid model for recovering the missing data of a sensor. Results show that a hybrid model implemented with an Extremely Randomized Trees regressor, composed of day and night submodels that do not include previous state values as artificial features, is the best way to recover missing data.
Future work will explore the improvement of the sensor fault detection procedure via anomaly detection techniques such as Isolation Forest, One-Class SVM (Support Vector Machines), Local Outlier Factor, and Elliptic Envelope. From the point of view of recovering missing data, new experiments based on time series, oriented to avoiding the use of previous state information, will be implemented. New, more complex models and data fusion models will also be used in the next research phase.