A Novel Analytical-ANN Hybrid Model for Borehole Heat Exchanger

Optimizing the operation of ground source heat pumps requires simulation of both the short-term and the long-term response of the borehole heat exchanger (BHE). However, current physical and neural-network-based models are not suited to handling this large range of time scales, especially for large borehole fields. In this study, we present a hybrid model for long-term simulation of BHEs with high resolution in time. The model uses an analytical model with low time resolution to guide an artificial neural network (ANN) model with high time resolution. We trained, tuned, and tested the hybrid model using measured data from a ground source heat pump in real operation. The performance of the hybrid model is compared with an analytical model, a calibrated analytical model, and three different types of neural network models. The hybrid model has a relative RMSE of 6% for the testing period, compared to 22%, 14%, and 12%, respectively, for the analytical model, the calibrated analytical model, and the best of the three investigated neural network models. The hybrid model has a reasonable computational time and was also found to be robust with regard to the model parameters used by the analytical model.


Introduction
Ground source heat pumps (GSHPs) use the heat capacity of the ground to provide efficient heating and cooling for buildings. GSHPs are widely recognized as an essential technology for future energy systems, due to their high efficiency and ability to store energy. In 2015, GSHPs provided 90.9 TWh of energy worldwide, a figure that grew at 10.3% per year from 2010 to 2015 [1]. Sweden is one of the pioneers in GSHP technology: around 20% of the heating for the residential and service sector comes from GSHPs. In recent years, the GSHP market in Sweden has shifted from small GSHPs for single-family houses to large GSHPs and borehole thermal energy storage (BTES) for commercial and institutional buildings, often used in combination with other heat sources [2]. This trend is expected to continue, since the increasing proportion of renewable energy in the energy mix has highlighted the need to integrate thermal storage in district energy systems [3].
When a GSHP works in combination with other heat sources, its operation is not constrained by the heat demand of the building. This gives the heating system flexibility that can be exploited to heat the building at lower economic and environmental cost by optimizing the operation of the heating system, which in turn requires models of the GSHP. Optimal operation requires the short-term response of the borehole heat exchanger (BHE) [4] as well as its long-term response, to ensure that the performance of the GSHP does not deteriorate over its lifetime [5]. Therefore, a long-term model with high time resolution is needed to optimize the operation of a BHE, and this wide range of timescales makes the modeling of BHEs particularly challenging.

Several studies have used artificial neural networks (ANNs) to address the difficulties of modeling BHEs and obtaining their parameters. ANNs leverage the data that has become available in recent years through better monitoring of GSHPs. Esen and Inalli [45], Sun et al. [46], and Fannou et al. [47] used ANNs to predict the overall performance of the GSHP. ANNs have also been used for design [48,49], determination of ground properties [50], and g-function generation [23,51]. Gang and Wang [52] and Lee et al. [53] developed ANN models of the BHE as a component, for control of the GSHP. Both studies developed ANN models to predict the outlet temperature of a BHE, and both used numerical models of a single borehole to train the model. Gang and Wang [52] used the fluid inlet temperature and the temperatures outside the walls of the pipes and of the grouting as inputs. Lee et al. [53] used the heat load of the borehole and the outlet temperature of the previous step as inputs. Both studies concluded that using information from previous time steps improves the accuracy of the model.
The ANN models considered in these studies are not suitable for long-term simulation, as both models were trained using data from short simulations of 9-10 weeks. Moreover, the numerical models used to train the ANNs only considered a limited domain around the borehole, and the interaction between boreholes was not considered.
ANNs were used in the above examples because they avoid many of the issues of classical BHE modeling techniques. ANN models have a computational time of less than a second on a standard computer [53]. ANN models derive their weights from the training data; hence, they do not require parameters that might be hard to obtain. ANN models can also deduce the complex relationships between the inputs and outputs, which are otherwise hard to determine due to natural convection in the borehole, groundwater flow, heterogeneous ground, etc. However, for an ANN model to predict the outputs accurately, it is necessary to choose the right set of inputs and to have a large dataset covering the entire range of operation of the BHE. Due to the large heat capacity of the ground, the thermal response of a BHE is influenced by its entire thermal history. Therefore, any BHE model with accurate long-term predictions must account for this entire thermal history. Selecting the input set for such an ANN is challenging. Moreover, obtaining a training set covering the full range of operation is very difficult, since the thermal history changes with time, especially for an unbalanced system.
In this paper, we propose a hybrid approach that addresses this issue by using a simple long-term analytical model with low time resolution to guide the ANN model. This approach attempts to combine the long-term modeling capability of the analytical model with the accuracy and simplicity of ANN models.

GSHP Installation
The proposed BHE model is applied to a GSHP installation at a hospital in Umeå, Sweden. The GSHP is designed to provide 95% of the cooling load of the hospital and 5% of the heating load. The remaining heating and cooling loads are supplied by the district heating and cooling network and additional heat pumps. The GSHP consists of 125 boreholes, three heat pumps, and a heat exchanger for domestic hot water production, as shown in Figure 2. The boreholes are divided into two groups, A and B, which have independent fluid loops. The GSHP provides both heating and cooling throughout the year. The operation of the heat pumps is governed by the cooling demand during summer and by the heating demand during winter. The GSHP has three operation modes: heat production and free cooling; active cooling and free cooling; and active cooling. During winter, when the heating demand governs the heat pumps, the GSHP operates in heat production and free cooling mode. In this mode, the heat required by the evaporators of heat pumps 1 and 2 is higher than the cooling demand; hence, heat extracted from the BHE provides the additional heat. During summer, the cooling load controls the heat pumps. On most summer days, the GSHP operates in active cooling and free cooling mode, in which free cooling from borehole group B and cooling from heat pumps 1 and 2 supply the cooling demand. The heat produced by heat pumps 1 and 2 is then greater than the space heating demand, and the excess heat is injected into borehole group A. When borehole group A cannot absorb all the excess heat from the heat pumps, both borehole groups A and B are used for heat injection; this is the active cooling mode. Heat pump 3 uses the intercooler of heat pumps 1 and 2 as a source and provides heat for domestic hot water and additional heat for space heating during winter.
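The mode selection described above can be reduced to a simple decision rule. The following Python sketch is our illustrative simplification, not the plant's actual control logic; the function name, arguments, and decision order are assumptions:

```python
def operation_mode(heating_governs, excess_heat_kw, group_a_capacity_kw):
    """Illustrative mode selection following the description in the text.
    heating_governs: True in winter, when the heating demand controls the heat pumps.
    excess_heat_kw: heat from heat pumps 1 and 2 beyond the space heating demand.
    group_a_capacity_kw: heat that borehole group A can currently absorb.
    The thresholds and decision order are our simplification."""
    if heating_governs:
        return "heat production and free cooling"
    if excess_heat_kw <= group_a_capacity_kw:
        # Excess heat fits in group A; group B delivers free cooling.
        return "active cooling and free cooling"
    # Group A saturated: both groups absorb the excess heat.
    return "active cooling"

print(operation_mode(True, 0.0, 0.0))
print(operation_mode(False, 100.0, 200.0))
print(operation_mode(False, 300.0, 200.0))
```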
Borehole groups A and B consist of 62 and 63 boreholes, respectively, as shown in Figure 2. The depths of the boreholes in groups A and B are 250 m and 200 m, respectively, and the distance between the boreholes is 7 m. The boreholes are single U-tube heat exchangers and are naturally filled with groundwater; the groundwater level is 10 m. The working fluid of the BHE is 20% bioethanol, with a heat capacity of 4390 J·(kg·K)−1 and a density of 976.6 kg·m−3. The properties of the ground were assumed to be homogeneous. The borehole resistance, the thermal conductivity, and the undisturbed temperature of the ground were determined by a thermal response test (TRT), while the volumetric heat capacity was obtained from geological data. Table 1 summarizes the properties of the BHE; a more detailed description is given in Puttige et al. [54].

Measurements
The GSHP has been thoroughly monitored since it started operating in February 2016. Temperature sensors measure the inlet and outlet temperatures of each borehole group. The flow rate of each borehole group has been measured since March 2017; the flow rate from February 2016 to March 2017 is estimated using the power consumed by the variable-speed circulation pump of each borehole group. The tolerance of the temperature sensors is 0.15 °C, and the tolerance of the flow meter is 0.33%. The time resolutions of the temperature sensors and the flow meter are 10 min and 1 min, respectively. However, the monitoring system calculates and stores hourly average values; the raw higher-resolution data are deleted at regular intervals. The tolerance of the hourly averaged measurements is smaller, as averaging reduces the random errors.
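The effect of averaging on random errors can be illustrated numerically: averaging n independent samples shrinks the standard deviation of the random error by roughly a factor of √n, while systematic errors are unaffected. A small sketch using the sensor figures above (0.15 °C tolerance, six 10-min samples per hour); the simulated error model is our assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate a sensor with zero-mean random error of std 0.15 °C sampled
# every 10 min: 10000 hours, 6 samples per hour.
raw = rng.normal(0.0, 0.15, (10000, 6))
hourly = raw.mean(axis=1)     # hourly averages stored by the monitoring system
print(raw.std(), hourly.std())  # the hourly std is roughly 0.15 / sqrt(6)
```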
The hourly averaged measurements from 16 February 2016 to 15 July 2020, 1612 days, are used in this study. For 6.4% of this period, measurements are not available due to faults in the monitoring system or maintenance. Most of the missing data occur during 2016: 27.9% of the data are missing during the first year, while only 1.1% are missing for the remaining period. The ground heat loads for the periods without measured data are estimated using the ambient temperature. Figure 3 shows the measured load of borehole groups A and B. The figure shows that the first year contains the most periods without measured data. Figure 3 also shows that the heat injected into the ground during the cooling season is greater than the heat extracted from the ground, i.e., the BHE is not balanced.

Figure 3 also shows that the measured data have a high variance, with some points exhibiting sudden changes in the load. These large differences could be due to a sudden change in the operation mode of the GSHP or to erroneous measurements. The models are trained to minimize the squared error; hence, a few points with high errors will disproportionately influence the model. Due to the large number of data points, random measurement errors will not affect the model. However, malfunctions in the measurement system result in some obviously erroneous measurements, which can be identified as outliers in the measured data.

Filtering of Data
To identify the outliers, we studied the deviation between the measured data and a prediction model. An analytical model was used to identify the outliers, since it is not trained on the measured data and is therefore better suited than the hybrid model presented in this study to identify measurement errors. The analytical model developed by Puttige et al. [54], with one-hour time resolution, was used; Section 3 includes a short explanation of the model. Figure 4 shows a histogram of the deviation between the load from the analytical model and the measured data. The mean residuals for borehole groups A and B are −0.6 kW and 18.5 kW, and the standard deviations of the residuals are 80.4 kW and 57.2 kW, respectively. Data points with residuals outside a window of three times the standard deviation are considered outliers. Note that the deviation includes both measurement errors and model inaccuracies; therefore, a large window was chosen to accommodate the inaccuracies of the analytical model. A total of 1.3% of the points were identified as outliers. The outliers were removed from the data, and the filtered data are used in the following sections.
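The three-sigma filtering step can be sketched as follows. This is a minimal Python illustration on synthetic residuals; the function and variable names are ours, not from the paper:

```python
import numpy as np

def filter_outliers(measured, predicted, n_sigma=3.0):
    """Flag points whose residual (model minus measurement) falls outside
    mean ± n_sigma * std, mirroring the three-sigma window in the text."""
    residuals = predicted - measured
    mu, sigma = residuals.mean(), residuals.std()
    keep = np.abs(residuals - mu) <= n_sigma * sigma
    return measured[keep], keep

# Synthetic example: mostly small residuals plus two gross measurement errors.
rng = np.random.default_rng(0)
measured = rng.normal(0.0, 50.0, 1000)                # loads in kW
predicted = measured + rng.normal(0.0, 10.0, 1000)    # model with small error
measured[10] += 500.0                                 # inject obvious outliers
measured[20] -= 500.0
filtered, keep = filter_outliers(measured, predicted)
print(keep.sum(), "points kept of", keep.size)
```

The large window means only gross errors are removed; ordinary model inaccuracies pass through the filter.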

Hybrid Approach
We present a hybrid ANN model that predicts the long-term performance of a BHE with high time resolution by using the outputs of an analytical model as an input to the ANN model. The hybrid model predicts the hourly load at time t using the inlet temperature and mass flow rate of the last 24 h and the output of an analytical model with 24 h time resolution. With the analytical model guiding it, the ANN model can predict the long-term performance of the BHE with high accuracy. We chose a simple analytical model that uses only a few physical parameters that are easy to obtain; the authors have presented this analytical model for the BHE described in Section 2 in a previous study [54]. A 24 h time resolution was selected for the analytical model to keep the computational time reasonable. Increasing the time resolution of the analytical model improves the accuracy of the hybrid model: an initial trial showed that increasing the time resolution from 30 days to 24 h decreases the relative error of the model by 2.7%, but increasing it further to one hour reduces the relative error by just 0.8%. Therefore, we chose a time resolution of 24 h for the analytical model as a tradeoff between computational time and accuracy.
The analytical model uses an approach proposed by Lamarche [20]. The main reason for selecting this approach was its ability to represent BHEs with multiple inputs and outputs. Although the model considers neither the geothermal gradient nor groundwater flow, it was chosen because of its simplicity; moreover, the geothermal gradient and groundwater flow were not significant at the installation location [54]. The model's inputs are the inlet temperatures and mass flow rates of each borehole group, and its outputs are the temperature at the borehole wall and the heat load of each borehole. The borehole wall refers to the interface between the ground and the groundwater inside the borehole. The model uses the properties of the ground and the geometric properties of the BHE as parameters; these are listed in Table 1 of Section 2. The model first calculates coefficients that depend only on the geometry of the BHE. Calculating these coefficients is computationally intensive, but they are calculated just once for a given geometry and can be reused with different inputs and ground parameters. The model uses these coefficients to determine the current temperature at the wall of each borehole from the current inputs and the effect of the previous inputs. The current loads are then calculated from the borehole wall temperatures. Although the borehole wall temperatures and heat loads are available for each borehole, we only use the average values for each borehole group in this study. For a detailed explanation of the model and its implementation, readers can refer to [54].
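As a rough illustration of how the effect of previous inputs enters the current borehole wall temperature, the following sketch applies generic temporal superposition of load increments with a step-response function. The actual model uses Lamarche's multi-borehole formulation [20,54], not this simplified single response; the response function, initial temperature, and load series here are invented for the example:

```python
import numpy as np

def wall_temperature(loads, step_response, t0=8.0):
    """Generic temporal superposition: each load increment dq(k) contributes
    dq(k) * g(n - k) to the wall temperature at step n. t0 is a placeholder
    undisturbed ground temperature, not the site's measured value."""
    dq = np.diff(np.concatenate(([0.0], loads)))  # load increments, kW
    n = len(loads)
    T = np.full(n, t0)
    for k in range(n):
        T[k:] += dq[k] * step_response[:n - k]
    return T

# Placeholder step response (K per kW) and a simple injection/extraction history.
g = np.log1p(np.arange(1, 101)) * 0.01
loads = np.concatenate((np.full(50, 30.0), np.full(50, -20.0)))  # kW
T_wall = wall_temperature(loads, g)
print(T_wall[0], T_wall[-1])
```

The key point mirrored from the text is that the wall temperature at any step depends on the entire load history, which is why the analytical model must include all past loads.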
The inputs of the ANN model are chosen to include information about the entire thermal history of the BHE. Hence, the selected inputs are the inlet temperature and mass flow rate of the last 24 h and the output of the last step of the analytical model before time t. The outputs of the analytical model are calculated using the inlet temperature and mass flow rate of all times up to the last step. Since the analytical model has a time resolution of 24 h, its last time step can end 1 to 24 h before time t. Hence, the number of hours between the end of the last step of the analytical model and the current time, Naa, is included as an additional input.
Several studies have exploited the fact that the thermal history of the most recent hours is more important than that of the distant past to decrease computational time by aggregating past loads [24,55,56]. These studies show that past loads can be aggregated into larger time steps without affecting the accuracy of the model. We used the same approach to reduce the number of inputs to the ANN model. Reducing the number of inputs reduces the complexity of the ANN model, which not only reduces the computational time but also reduces the possibility of overtraining. Therefore, instead of using hourly steps as the input for the last 24 h, we used six hourly steps and one aggregated step of 18 h for the remainder of the 24 h.
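The input aggregation described above might be sketched as follows; the function name and the use of a simple mean for the aggregated step are our assumptions:

```python
import numpy as np

def aggregate_history(hourly_values):
    """Reduce the last 24 hourly values of one input quantity to 7 values:
    the 6 most recent hours at full resolution, plus the mean of the
    remaining 18 h as one aggregated step. hourly_values[-1] is the most
    recent hour."""
    assert len(hourly_values) == 24
    recent = list(hourly_values[-6:])                 # six hourly steps
    aggregated = float(np.mean(hourly_values[:18]))   # one 18 h step
    return recent + [aggregated]

# Synthetic inlet temperatures over the last 24 h, in °C.
inlet_temp_history = np.linspace(4.0, 6.3, 24)
inputs = aggregate_history(inlet_temp_history)
print(len(inputs))  # 7 values instead of 24
```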
Thus, the inputs to the ANN model are the inlet temperature and mass flow rate of each borehole group for the current step, for each of the six previous hourly steps, and for the 18 h aggregated step, together with the outputs of the analytical model and Naa. The total number of inputs to the ANN is therefore 37: 4 from the current step, 6 × 4 from the six previous hourly steps, 4 from the 18 h aggregated step, 4 from the outputs of the analytical model, and Naa. The two outputs of the ANN model are the heat loads of the current time step for each borehole group. The number of hidden nodes was chosen based on the rule of thumb mentioned in [57]: two-thirds of the number of inputs plus the number of outputs. Therefore, the initial architecture was chosen as one hidden layer with 27 nodes. The activation function of the hidden layer is the hyperbolic tangent, and the output layer has a linear activation function. The ANN was implemented in MATLAB (R2020a, The MathWorks Inc., Natick, MA, USA).
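The resulting architecture (37 inputs, one hidden layer of 27 tanh nodes, 2 linear outputs) can be sketched with NumPy. The weights below are random placeholders; in the paper they are fitted with the Levenberg-Marquardt algorithm in MATLAB, and the class name is ours:

```python
import numpy as np

class BHENetSketch:
    """Forward pass of the architecture described above: 37 inputs,
    27 tanh hidden units, 2 linear outputs (loads of groups A and B)."""
    def __init__(self, n_in=37, n_hidden=27, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def predict(self, x):
        h = np.tanh(self.W1 @ x + self.b1)  # hyperbolic tangent hidden layer
        return self.W2 @ h + self.b2        # linear output layer

net = BHENetSketch()
x = np.zeros(37)          # placeholder input vector (temperatures, flows, Naa)
loads = net.predict(x)
print(loads.shape)        # two loads, one per borehole group
```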
The filtered measured data from the GSHP described in Section 2.1 are used for training and testing of the ANN. The data from 2016 are less reliable than those of the remaining period, as a large portion is missing; hence, the data from 2016 are not used for training or testing of the models. However, the loads during this period affect the future operation of the BHE, so the analytical model includes the data from 2016. Two years of data, 2017 and 2018, are used for training. The data from 2019 are used as a validation set; the error of the validation set after each training iteration is used to determine the stopping criterion of the training algorithm and to select hyperparameters in Section 4.1. The rest of the data, from 2020, is used as a testing set.
The network was trained using the Levenberg-Marquardt (LM) algorithm to minimize the mean squared error (MSE). The probability of overtraining the ANN was reduced using an early stopping criterion, which stops the optimization algorithm if the MSE of the validation set does not decrease for a specified number of iterations. The LM algorithm only finds local minima and might not find a good solution; therefore, the ANN was trained 50 times with different random initial weights, and the network with the lowest MSE for the validation set was selected.
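The restart-and-early-stopping strategy can be sketched as follows. The toy one-parameter fit stands in for the actual LM training of the ANN; all names and numbers here are illustrative, not from the paper:

```python
import numpy as np

def train_with_restarts(train_once, n_restarts=50, seed=0):
    """Train from several random initializations and keep the model with
    the lowest validation MSE, mirroring the strategy in the text.
    train_once(rng) must return (model, val_mse)."""
    rng = np.random.default_rng(seed)
    best_model, best_mse = None, np.inf
    for _ in range(n_restarts):
        model, val_mse = train_once(rng)
        if val_mse < best_mse:
            best_model, best_mse = model, val_mse
    return best_model, best_mse

def early_stopping_fit(rng, patience=5, max_iters=200):
    """Toy one-parameter fit with early stopping on a validation MSE."""
    w = rng.normal()                          # random initial weight
    best_w, best_mse, since_improved = w, np.inf, 0
    for _ in range(max_iters):
        w -= 0.1 * 2.0 * (w - 1.0)            # gradient step on training loss
        val_mse = (w - 1.1) ** 2              # validation optimum differs slightly
        if val_mse < best_mse - 1e-12:
            best_w, best_mse, since_improved = w, val_mse, 0
        else:
            since_improved += 1
            if since_improved >= patience:    # stop when validation stalls
                break
    return best_w, best_mse

model, mse = train_with_restarts(early_stopping_fit)
print(round(mse, 4))
```

As in the paper's setup, the validation error both stops each run and selects among the restarts.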

Results and Discussion
The RMSEs of the ANN model for training, validation, and testing are 12.0 kW, 19.3 kW, and 26.6 kW, respectively. These RMSEs correspond to errors of 3.2%, 6.1%, and 7.3% for training, validation, and testing, where each percentage is taken with respect to the mean absolute load of the corresponding period. The total error corresponds to 4.9%. The relative RMSE is higher for the validation and testing periods than for the training period, since the model overfits the training data to some degree.
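The relative RMSE used throughout this section can be sketched as follows (a minimal Python illustration; the function name and example values are ours):

```python
import numpy as np

def relative_rmse(measured, predicted):
    """RMSE expressed as a percentage of the mean absolute load,
    as defined in the text."""
    rmse = np.sqrt(np.mean((predicted - measured) ** 2))
    return 100.0 * rmse / np.mean(np.abs(measured))

# Example: a constant 10 kW error on loads averaging 200 kW in magnitude -> 5%.
measured = np.array([200.0, -200.0, 200.0, -200.0])
predicted = measured + np.array([10.0, -10.0, 10.0, -10.0])
print(relative_rmse(measured, predicted))  # 5.0
```

Normalizing by the mean absolute load rather than the mean load matters here, since injection and extraction loads have opposite signs and would otherwise partly cancel.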
Due to a large number of data points, the 30-day moving relative RMSE of the hybrid model is plotted in Figure 6, instead of hourly RMSE. The maximum 30-day moving relative RMSE is 12.3%. The relative RMSE for validation and testing is greater than training but the relative RMSE of in the validation and testing periods is similar. The relative RMSE in the testing period has an upward trend. However, the variation is within the error range of the validation period. To check if the relative RMSE increases in the testing period, we used the data from only 2017 training, and 2018 was The total number of inputs to the ANN is 37, 4 from the current step, 6 × 4 from 6 previous hourly steps, 4 from the 18 h aggregated step, 4 from the outputs of the analytical model, and N aa . The two outputs of the ANN model are heat loads of the current time step for each borehole group. The choice of the number of hidden nodes was made based on the rule of thumb mentioned in [57], two-thirds of the number of inputs plus the number of outputs. Therefore, the initial architecture was chosen as one hidden layer with 27 nodes. The activation function of the hidden layer is the hyperbolic tangent, and the output layer has a linear activation function. The ANN was implemented in MATLAB (R2020a, The MathWorks Inc, Natick, MA, USA).
The filtered measured data from the GSHP described in Section 2.1 is used for training and testing of the ANN. The data from 2016 is less reliable than the remaining period, as a large portion of the data is missing. Hence the data from 2016 is not used for training or testing of the models. However, the loads during this period will affect the future operation of the BHE. Therefore, the analytical model includes data from 2016. Two years of data, 2017 and 2018, are used for training. The data from 2019 are used as a validation set. The error of the validation set after each training iteration is used to determine the criteria to stop the training algorithm and to select hyper-parameters in Section 4.1. The rest of the data, 2020, is used as a testing set.
The network was trained using the Levenberg-Marquardt (LM) algorithm to minimize the mean squared error (MSE). The probability of overtraining the ANN was reduced using the early stopping criteria. The early stopping criteria stop the optimization algorithm if the MSE of the validation set does not decrease for a specified number of iterations of the algorithm. The LM algorithm only finds local minima and might not find a good solution. Therefore the ANN was trained multiple times, 50 times, with different random initial weights, and the network with the least MSE for the validation set was selected.

Results and Discussion
The RMSE of the ANN model for training, validation, and testing is 12.0 kW, 19.3 kW, and 26.6 kW, respectively. These RMSEs correspond to errors of 3.2%, 6.1%, and 7.3% for training, validation, and testing. Note that the percentage error is with respect to the mean absolute load of the corresponding period. The total error corresponds to 4.9%. The relative RMSE is higher for the validation and testing periods than for the training period since the model overfits the training data to some degree.
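For reference, the relative RMSE used throughout is the RMSE of the predicted loads divided by the mean absolute measured load of the same period. A minimal sketch, with illustrative numbers rather than values from the study:

```python
import numpy as np

def relative_rmse(measured, predicted):
    """RMSE of predictions divided by the mean absolute measured load."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((predicted - measured) ** 2))
    return rmse / np.mean(np.abs(measured))

q_meas = [100.0, -80.0, 120.0, -100.0]   # kW; sign distinguishes heating/cooling
q_pred = [ 95.0, -85.0, 115.0, -105.0]   # kW
print(relative_rmse(q_meas, q_pred))     # 5 kW RMSE / 100 kW mean abs load = 0.05
```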
Due to the large number of data points, the 30-day moving relative RMSE of the hybrid model is plotted in Figure 6 instead of hourly RMSE. The maximum 30-day moving relative RMSE is 12.3%. The relative RMSE for validation and testing is greater than for training, but the relative RMSE in the validation and testing periods is similar. The relative RMSE in the testing period has an upward trend. However, the variation is within the error range of the validation period. To check if the relative RMSE increases in the testing period, we used the data from only 2017 for training, 2018 for validation, and 2019 and 2020 for testing. Figure 7 shows the moving relative RMSE for this case. For the 19 months of the testing period in Figure 7, the relative RMSE shows no clear increase over time. This implies that the model can be used for predicting the load for multiple years. The relative RMSE for validation and testing is higher in Figure 7 compared to Figure 6. Therefore, we will use the original scheme, with 2017 and 2018 for training, 2019 for validation, and 2020 for testing in this study.
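The 30-day moving relative RMSE of Figure 6 can be sketched as follows for hourly data: each point is the RMSE over a 720 h window divided by the mean absolute measured load of that window. Synthetic data stands in for the measurements:

```python
import numpy as np

def moving_relative_rmse(measured, predicted, window=30 * 24):
    """Relative RMSE over a sliding window (720 h = 30 days of hourly data)."""
    measured = np.asarray(measured, float)
    predicted = np.asarray(predicted, float)
    out = []
    for i in range(len(measured) - window + 1):
        m, p = measured[i:i + window], predicted[i:i + window]
        rmse = np.sqrt(np.mean((p - m) ** 2))
        out.append(rmse / np.mean(np.abs(m)))
    return np.array(out)

rng = np.random.default_rng(2)
hours = np.arange(2000)
q_meas = 100 + 20 * np.sin(hours * 2 * np.pi / 24)      # synthetic hourly load, kW
q_pred = q_meas + rng.normal(scale=5.0, size=len(hours))  # model with ~5 kW error
curve = moving_relative_rmse(q_meas, q_pred)
print(len(curve), round(curve.max(), 3))
```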

Model Improvement
The performance of the model was improved by tuning the hyper-parameters of the model and by using an ensemble of multiple neural networks instead of a single network.


Ensemble
Neural network ensemble is a technique to refine neural network models by using multiple neural networks to solve a problem. In this technique, several ANNs are trained, and the combined output of all the networks produces a better prediction than any individual network. Hansen and Salamon [58] first showed that neural network ensembles could be an effective method to reduce the error of ANNs. Since then, several approaches to training the individual networks have been developed [59]. In this study, we will use the simple approach of training each of the networks using the same set of data with different initial weights. More complex approaches were not explored since they are outside the scope of our study. The output of the ensemble is calculated as the average of the networks, giving equal weight to each network.
Fifty ANNs were trained using the filtered data set similar to the previous model, but instead of using the predictions of the best network, the average of all the networks is used as output. Figure 8 shows the relative RMSE of the average of the first N networks. The ensemble of all 50 networks has a relative RMSE of 2.6%, 5.9%, and 6.6% for training, validation, and testing, respectively. Therefore, the average prediction of all the models is better than the prediction of the best model. However, from Figure 8, we notice that after a certain point adding more models does not result in any significant change in the accuracy of the ensemble. Hence the number of models in the ensemble can be lower than 50. The validation relative RMSE changes by less than 0.1% when the number of models varies from 15 to 50. Therefore, an ensemble of 15 models will be used for the remainder of this study. The relative RMSE for training, validation, and testing of the ensemble of 15 models is 2.7%, 5.9%, and 6.7%, respectively.
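The equal-weight ensemble can be illustrated with synthetic predictions, where each "network" is modeled as the true load plus independent noise; averaging the first N predictions then reduces the error roughly as expected from noise cancellation:

```python
import numpy as np

rng = np.random.default_rng(3)
n_nets, n_samples = 50, 1000
q_true = 100 + 20 * rng.normal(size=n_samples)                     # "measured" load, kW
preds = q_true + rng.normal(scale=10.0, size=(n_nets, n_samples))  # one row per network

def rel_rmse(pred):
    return np.sqrt(np.mean((pred - q_true) ** 2)) / np.mean(np.abs(q_true))

err_single = rel_rmse(preds[0])   # error of one individual network
# Relative RMSE of the equal-weight average of the first N networks (Figure 8 style).
err_vs_n = [rel_rmse(preds[:n].mean(axis=0)) for n in range(1, n_nets + 1)]
print(round(err_single, 3), round(err_vs_n[-1], 3))  # averaging shrinks the error
```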

Choosing the Number of Hidden Nodes
The number of nodes in the hidden layer is an important parameter of the ANN model. If the number of nodes is too low, the model will be under-fitted and will not be able to represent the variance of the system. If the number of nodes is too high, the model has a higher risk of overfitting, and random errors in the training data will be reproduced by the model. Therefore, we tried different numbers of hidden nodes to find an appropriate architecture for the ANN. Table 2 shows the relative RMSE for training, validation, and testing of the model with different numbers of nodes in the hidden layer. Each model is an ensemble of 15 models trained on filtered data. As expected, the training error decreases as the number of nodes increases. The validation error initially decreases as the number of nodes increases. However, increasing the number of nodes after a certain point increases the validation error, i.e., the model overfits the training data. The validation error increases when the number of nodes is higher than 45. We therefore limit the number of nodes in the hidden layer to 45. The number of nodes was not refined further because the difference in the RMSE among the models was not significant, i.e., the choice of the number of nodes would be affected by the random initial weights of the model.
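The node sweep can be sketched as follows. To keep the example self-contained, each "network" here is a random-feature tanh layer with a least-squares output layer rather than the LM-trained MLP of the study; the selection logic, keeping the hidden size with the lowest validation error, is the same:

```python
import numpy as np

# Toy 1-D regression problem standing in for the BHE data.
rng = np.random.default_rng(4)
X_tr = rng.uniform(-3, 3, size=(200, 1)); y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.normal(size=200)
X_va = rng.uniform(-3, 3, size=(100, 1)); y_va = np.sin(X_va[:, 0]) + 0.1 * rng.normal(size=100)

def val_rmse_for(n_hidden, seed=0):
    """Fit a network with n_hidden random tanh features; return validation RMSE."""
    r = np.random.default_rng(seed)
    W = r.normal(size=(n_hidden, 1)); b = r.normal(size=n_hidden)
    H_tr = np.tanh(X_tr @ W.T + b)                      # hidden activations
    beta, *_ = np.linalg.lstsq(H_tr, y_tr, rcond=None)  # output weights
    H_va = np.tanh(X_va @ W.T + b)
    return np.sqrt(np.mean((H_va @ beta - y_va) ** 2))

sizes = [1, 2, 5, 10, 20, 45, 90]
val_rmse = {n: val_rmse_for(n) for n in sizes}
best = min(val_rmse, key=val_rmse.get)   # hidden size with lowest validation error
print(best, round(val_rmse[best], 3))
```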

Choosing Optimization Algorithm
The choice of the optimization algorithm used for training the network can also have an impact on the accuracy of the model. Hence, we tested three optimization algorithms, namely, scaled conjugate gradient (SCG), Levenberg-Marquardt (LM), and Bayesian regularization within the LM algorithm (BR). The BR algorithm does not use the validation set; therefore, a predefined goal for the MSE is set as the stopping criterion for the algorithm. After a few trials, the goal for the MSE was set as 50 kW². Table 3 shows the relative RMSE for training, validation, and testing for models trained using the three algorithms. An ensemble of 15 models with 45 nodes in the hidden layer was used, i.e., the parameters selected in Sections 4.1.1 and 4.1.2. The SCG algorithm performs significantly worse than the other two algorithms. Both LM and BR have similar RMSE values, but the validation RMSE of the LM model is lower. Both algorithms use the LM algorithm to update the weights; the BR algorithm uses regularization to reduce overfitting, whereas in the LM model, overfitting was prevented by early stopping. The results show that LM with early stopping is slightly better than BR, and we therefore chose the LM algorithm.

Choosing the Number of Non-Aggregated Steps
In the initial model, we divided the 24 h thermal history into six non-aggregated 1 h steps and one aggregated 18 h step. Increasing the number of non-aggregated steps will increase the number of inputs of the model. A higher number of inputs means more information, which can result in a more accurate model, but the risk of overfitting and local minima is also higher. Table 4 shows the relative RMSE of the training, validation, and testing sets. The models use the parameters selected in Sections 4.1.1 and 4.1.3. The RMSE of the validation set decreases when the number of non-aggregated steps increases from 3 to 6, but a further increase in the number of non-aggregated steps did not decrease the RMSE of the validation set. We selected six non-aggregated steps since an additional increase did not improve the model, but the computational time for training increased.
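The input construction can be sketched as follows, assuming (as an illustration) that the 4 features per step are the inlet temperatures and mass flow rates of the two borehole groups. The current step and the k most recent hourly steps are kept at full resolution, and the rest of the 24 h history is collapsed into one averaged step:

```python
import numpy as np

def build_inputs(current, history, k=6):
    """current: (4,) features of the current step.
    history: (24, 4) previous 24 hourly steps, newest row last.
    Keeps k hourly steps at full resolution, averages the remaining (24 - k) h."""
    history = np.asarray(history, float)
    recent = history[-k:].ravel()               # k most recent 1 h steps
    aggregated = history[:24 - k].mean(axis=0)  # one (24 - k) h aggregated step
    return np.concatenate([np.asarray(current, float), recent, aggregated])

current = np.ones(4)
hist = np.arange(24 * 4, dtype=float).reshape(24, 4)
x = build_inputs(current, hist, k=6)
print(x.size)   # 4 + 6*4 + 4 = 32 inputs (before adding the analytical-model outputs)
```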

Comparison with Other Models
We evaluated the proposed hybrid model by comparing it with various analytical and ANN models for BHE. The models in the following section use the filtered data set described in Section 2.2 for training and testing. We compared the hybrid model with the two analytical models presented in [54], ANA and ANA_calib. ANA is the same model used by the hybrid model but with a time resolution of one hour. ANA_calib is the ANA model with the properties of the ground calibrated using the measured data. Four parameters are calibrated, namely the thermal conductivity of the ground (k), the volumetric heat capacity of the ground (ρC_p), the thermal resistance between the fluid and the borehole wall (R_b), and the undisturbed temperature of the ground (T_ug). Puttige et al. [54] recommended the use of different parameters for heating and cooling seasons. Based on this recommendation, the loads were determined using two sets of parameters. We used filtered data from 2017 and 2018 to calibrate the model, which is the same dataset as the training set of the hybrid model. Table 5 lists the calibrated model parameters for heating and cooling seasons.

Figure 9 shows the moving relative RMSE for the two analytical models and the final hybrid model. The hybrid model has a lower relative RMSE during the whole period. The ANA_calib and hybrid models use data from 2017 and 2018 for training, and the hybrid model is validated using the data from 2019. None of the models uses the data from 2020. It is therefore best to compare the models by the error of 2020. The relative RMSE for 2020 for the ANA, ANA_calib, and hybrid models is 21.9%, 13.2%, and 6.3%, respectively. The results show that the error of the hybrid model is significantly lower than that of the analytical models.

The hybrid model has an advantage in terms of computational time as well. The two analytical models have a computational time of approx. 5 h for each run on a computer with a 3.4 GHz quad-core processor and 16 GB RAM.
The hybrid model, in contrast, has a computational time of approx. 15 min for the analytical model and less than a second for the ANN model. The computational time for calibrating the ANA_calib model is approx. 40 h, while the time for training the hybrid model is approx. 25 min. Note that the computational times for the analytical models do not include the time required for calculating the geometric coefficients, since they were calculated just once for the BHE and used for all the models.
The physical parameters necessary for both the ANA and hybrid models are the geometry of the BHE and four properties of the ground. The ANA_calib model uses the measured data to calculate the properties of the ground, so it only requires the geometry of the BHE. However, the accuracy required of the parameters is lower for the hybrid model than for the regular ANA model, since the hybrid model is capable of correcting the inaccuracy of the analytical model. We tested this by adding random errors in the range of ±20% to the measured ground parameters, i.e., the measured parameters were multiplied by a random number in the range of 0.8 to 1.2. Two hybrid models with different random errors were tested. The relative RMSE of the models was 6.2% and 6.4% for the testing period.
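The perturbation used in this robustness test can be sketched as follows; the parameter values below are illustrative placeholders, not the measured values from the study:

```python
import numpy as np

rng = np.random.default_rng(5)

params = {             # ground parameters (illustrative numbers only)
    "k": 3.0,          # thermal conductivity of the ground, W/(m K)
    "rho_cp": 2.2e6,   # volumetric heat capacity of the ground, J/(m3 K)
    "R_b": 0.08,       # fluid-to-borehole-wall thermal resistance, (m K)/W
    "T_ug": 8.0,       # undisturbed ground temperature, degC
}

def perturb(p, max_rel_err=0.2):
    """Multiply each parameter by an independent random factor in [0.8, 1.2]."""
    return {name: val * rng.uniform(1 - max_rel_err, 1 + max_rel_err)
            for name, val in p.items()}

noisy = perturb(params)   # parameters handed to the analytical model
for name in params:
    print(name, round(noisy[name] / params[name], 3))  # factor in [0.8, 1.2]
```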

Neural Network Models
We compared the hybrid model with three different types of neural network models. The first is a simple feedforward network, while the other two are networks that use the outputs from previous time steps.
The inputs of the feedforward network (ANN) are similar to the hybrid model but without the outputs of the analytical model and N_aa. Therefore, the inputs of the ANN model are the inlet temperatures and mass flow rates of the current one-hour step, six previous one-hour steps, and one aggregated step of 18 h. The ANN model has 32 inputs. The outputs of the ANN model are the same as the hybrid model, the predicted load of the current step. The suitable number of hidden nodes in the ANN model is determined to be 32, using the same method as Section 4.1.2. An ensemble of 15 ANN models trained using the LM algorithm is used to obtain the predicted loads.
Temporal dependencies of a neural network can be included using a class of neural networks called recurrent neural networks (RNNs). RNNs use the outputs of previous time steps as inputs of the current time step. Since each step of the RNN depends on the previous step(s), the output of the current step depends on the entire sequence of inputs before the current step, as shown in Figure 10. Therefore, RNNs are capable of considering the history of the BHE. Kose and Petlenkov [60] used RNNs to model BHE, but the structure of the RNN used is not clear. Lee et al. [53] predicted the current outlet temperature using different inputs and found that using the four inputs, current heat load, outlet temperature at the previous time step, heat load at the previous time step, and the outlet temperature at two time steps prior, was the best option. However, they did not use an RNN; instead, they used the outlet temperature from the target outputs. In practice, when predicting outlet temperatures multiple time steps ahead, the target output will not be available. Hence, we must use the simulated output. In this study, we will use inputs similar to [53], but we will use simulated outputs from the previous steps instead of target outputs. The inputs to the RNN model are the inlet temperatures and mass flow rate of the current and the previous step for each borehole, and the output of the two previous time steps. The output of the model is the heat loads of borehole groups A and B.
Since the output of the RNN model depends on the sequence of inputs, the inputs must contain the complete time series, i.e., from the start of the operation to the current step. However, the input data contains missing periods. To resolve this issue, we calculated the loads during the missing periods using the ambient temperature. The mass flow rate of the BHE group was assumed to be constant and equal to the average mass flow rate of the borehole group during the missing periods. The inlet temperature was calculated using the analytical model with load and mass flow rates as inputs. This method gives us an estimate of the inputs to the RNN model, but these inputs are not accurate. Therefore, we used weighted MSE to calculate the error only when the inputs to the model are reliable. In order to have the same inputs as the hybrid model, the weights were set to 0 for 2016, during periods of missing data, and the points removed by the filtering process described in Section 2.2. The model uses data from 2017 and 2018 as the training set, 2019 as the validation set, and 2020 as the testing set.
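The weighted MSE can be written as a small helper: squared errors are multiplied by a weight that is 0 for unreliable samples and 1 otherwise, so gap-filled inputs still feed the recurrent state but do not contribute to the training error. A minimal sketch with illustrative numbers:

```python
import numpy as np

def weighted_mse(measured, predicted, weights):
    """MSE over the samples with non-zero weight only."""
    measured, predicted, weights = map(np.asarray, (measured, predicted, weights))
    return np.sum(weights * (predicted - measured) ** 2) / np.sum(weights)

q_meas = np.array([100.0, 90.0, 80.0, 110.0])
q_pred = np.array([ 98.0, 50.0, 82.0, 108.0])   # large error on a gap-filled sample
w      = np.array([  1.0,  0.0,  1.0,   1.0])   # zero weight -> error ignored
print(weighted_mse(q_meas, q_pred, w))          # (4 + 4 + 4) / 3 = 4.0
```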
An ensemble of 15 models was used to determine the output, similar to the hybrid model. The models were trained in the same way, but the early stopping criteria cannot be used for an RNN in MATLAB; therefore, the stopping criterion was set by defining a maximum number of iterations. The maximum number of iterations and the number of nodes in the hidden layer were selected by trial and error. We selected 30 as the maximum number of iterations and 5 as the number of hidden nodes.
Typical RNNs like the one described above suffer from an issue called the vanishing gradient problem, which causes long-term inputs to lose their effect on the output, i.e., the output will only depend on the short-term inputs. Hochreiter and Schmidhuber [61] introduced long short-term memory (LSTM) networks as a solution to the vanishing gradient problem. LSTM resolves the issue of vanishing gradients by using gates, sigmoid functions that decide how much information from the previous steps must be retained, so that important information from the previous steps is not lost. In this study, we will use a simpler variant of LSTM introduced by Cho et al. [62] called the gated recurrent unit (GRU).
The GRU and the RNN models are similar, with the same training, validation, and testing data sets, the same weights for the data, and the same inputs and outputs. However, the output of the GRU, h_t, is not used directly as the output of the network. The size of h_t (the hidden unit size) determines how much information is passed from one time step to the next and should not be fixed by the number of outputs of the network. The GRU is therefore implemented using a four-layer network: a sequence input layer, a GRU layer, a fully connected layer, and a weighted regression layer. The sequence input layer converts the input data into the right format so that it can be used in the GRU layer. The GRU layer uses the sequence of inputs to calculate the hidden states. The fully connected layer converts the hidden states into the required output, i.e., the load of the current step. The weighted regression layer calculates the weighted MSE using the measured and simulated loads.
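A minimal numpy sketch of one GRU step shows how the sigmoid gates control how much of the previous hidden state is retained, with a fully connected layer mapping the hidden state to the load. The weights and sizes are illustrative placeholders, and the exact gate formulation may differ slightly from the MATLAB implementation:

```python
import numpy as np

rng = np.random.default_rng(6)
n_in, n_hidden, n_out = 8, 50, 2   # illustrative sizes; 2 outputs = loads of groups A and B

def init(shape):
    return rng.normal(scale=0.1, size=shape)

# Gate and candidate-state weights (W: input, U: recurrent, b: bias).
Wz, Uz, bz = init((n_hidden, n_in)), init((n_hidden, n_hidden)), np.zeros(n_hidden)
Wr, Ur, br = init((n_hidden, n_in)), init((n_hidden, n_hidden)), np.zeros(n_hidden)
Wh, Uh, bh = init((n_hidden, n_in)), init((n_hidden, n_hidden)), np.zeros(n_hidden)
Wo, bo = init((n_out, n_hidden)), np.zeros(n_out)   # fully connected output layer

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev):
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)             # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)             # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state
    return z * h_prev + (1.0 - z) * h_cand             # keep a gated share of the old state

h = np.zeros(n_hidden)
for x in rng.normal(size=(24, n_in)):  # run the cell over a 24-step input sequence
    h = gru_step(x, h)
load = Wo @ h + bo                     # hidden state -> load of the current step
print(load.shape)
```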
The Adam optimizer was used to train the model. The stopping criterion was set by the maximum number of iterations. The maximum number of iterations, the hidden unit size, and the learning rate of the algorithm were determined by trial and error. The selected hidden unit size, maximum number of iterations, and learning rate are 50, 250, and 0.02, respectively.

Figure 11 shows the moving relative RMSE of the ANN, RNN, GRU, and the final hybrid models. All four models have the same training, validation, and testing data, and all use an ensemble of 15 neural networks. The relative RMSE for the testing set for the hybrid, ANN, RNN, and GRU models is 6.3%, 13.4%, 12.5%, and 13.0%, respectively. The relative RMSE of the hybrid model is about half that of the other models, which shows that the hybrid model is significantly more accurate.

The neural network models do not use the analytical model. Hence, they do not require any geometric parameters of the BHE or the thermal properties of the ground, and they have lower computational times. The run times of the ANN, RNN, and GRU models are 1, 7, and 8 s, respectively, and the training times for the three models are 11, 5, and 90 min. The training and run time of the hybrid model are 25 and 15 min. The ANN, RNN, and GRU models could be a good option if the required parameters are not available or if computational time is more important than accuracy for an application.

Conclusions
The proposed approach for BHE modeling uses an analytical model with low time resolution to guide an ANN model with high time resolution. This hybrid analytical-ANN model extends the range of typical ANN models, which are limited to short-term simulations, to long-term simulations of BHE, while maintaining good accuracy and reasonable computational time.
The hybrid model was applied to a large GSHP with more than four years of measured data. The measured data was used to train, validate, and test the model. After tuning the parameters of the model, we achieved a relative RMSE of 2.4% for training, 5.6% for validation, and 6.3% for testing. The model has a reasonable computational time of 15 min and requires only parameters that are easy to obtain. Moreover, the model is robust with respect to the model parameters used for the supportive analytical model.
The hybrid model was compared with two analytical models and three neural network models and was found to be significantly more accurate than all of them. The best analytical and neural network models have relative RMSEs of 13.9% and 12.6%, respectively, for the test set.