Application of Regression and ANN Models for Heat Pumps with Field Measurements

Developing accurate models is necessary to optimize the operation of heating systems. A large number of field measurements from monitored heat pumps have made it possible to evaluate different heat pump models and improve their accuracy. This study used measured data from a heating system consisting of three heat pumps to compare five regression and two artificial neural network (ANN) models. The models were calibrated using both data provided by the manufacturer and the measured data, and their performance was compared to determine which models are suitable during the design and operation stages. A method to refine the ANN model is also presented. The results indicate that simple regression models are more suitable when only manufacturers' data are available, while ANN models are better suited to utilizing a large amount of measured data. The method to refine the ANN model is effective at increasing the accuracy of the model. The refined models have a relative root mean square error (RMSE) of less than 5%.


Introduction
Heat pumps are an efficient way to provide heating and cooling. In recent years, they have become more environmentally and economically viable due to an increasing share of renewable sources in the energy mix and decreasing electricity prices [1,2]. The use of large heat pumps as a part of heat networks has also increased in recent years as they enable the use of low-temperature heat sources, such as waste heat from industry [3]. Additionally, heat pumps can convert excess electricity into heat, which can be stored and used later [4,5]. Adding large heat pumps to the energy network will allow us to include up to 40% of fluctuating renewable energy sources, such as solar and wind power, without losing efficiency [6]. As a result, we can expect the size and complexity of heat pumps to increase.
Accurate models of heat pumps can be used to improve their operation, as demonstrated in several studies [7,8]. With an increasing number of large heat pumps, the number of field measurements has also increased [5,9,10], which provides an opportunity to improve the accuracy of heat pump models. However, not all models can use the additional data effectively; hence, the choice of model is important.
Underwood [10] described four different types of heat pump models: balanced-state, steady-state, fitted, and dynamic-state models. Balanced-state models use a constant coefficient of performance (COP) and are used to evaluate the seasonal or annual performance of a heat pump. Refrigeration cycle-based steady-state models and regression-based fitted models are used to optimize the long-term operation of a heat pump, which requires a time resolution of one hour. Dynamic models consider the heat pump's transient behavior and are suited for control system design, which requires a time resolution on the order of seconds.
Refrigeration cycle-based models [11,12] use a simplified vapor compression cycle to calculate the thermodynamic states of the refrigerant. Steady-state models of each component, compressor, expansion valve, and heat exchanger (condenser and evaporator) …
There are several examples of ANNs being used to model heat pumps, especially in complex cases. Bechtler et al. [22] used a generalized radial basis function ANN model to predict a heat pump's performance with evaporator outlet temperature, condenser outlet temperature, and evaporator capacity as inputs. They developed the model for three different refrigerants using experimental data and demonstrated that the ANN was an acceptable alternative to refrigeration cycle-based models. Arcaklioglu et al. [23] used a multilayer feed-forward network with a sigmoid transfer function to model heat pumps with binary refrigerant mixtures. They included the mixing ratio as an input and found that the model accurately predicted the heat pump's performance for different mixing ratios. Esen et al. [24] and Benli [25] modeled ground-source heat pumps using a multilayer feed-forward network with a hyperbolic tangent sigmoid transfer function. Esen et al. [26,27] also used other machine learning techniques for modeling ground-source heat pumps. The model predicted the COP of the heat pump using the air temperatures at the inlet and outlet of a condenser fan-coil unit and the ground temperatures at two different depths as inputs. These examples show that ANNs can be a viable option to model a variety of heat pumps. However, in all of the cases reviewed above, the training data were obtained under laboratory conditions rather than from field measurements.
Both black-box and grey-box models require parameters that must be determined for individual heat pumps. The parameters are obtained using either laboratory data or data generated by simulations. Zhang et al. [28] demonstrated that using manufacturers' data to develop regression models can lead to high uncertainty: they trained 13 regression models on the manufacturers' data and showed that all of them fitted the data well. However, when the models were used to predict performance under a dynamic load, they deviated from each other by up to 30%. The deviations were attributed to non-standard operating conditions, such as operation below the design load (i.e., partial load operation) and operating temperatures outside the specified limits. Some models account for partial load through part-load efficiencies for scroll compressors [29] or on-off cycling degradation [18].
However, heat pumps perform differently in the field than in a laboratory, not only because of non-standard operating conditions but also because of transients, faults, or improper installation. Corberan et al. [30] validated their regression model using field measurements and noted that the model's condenser power deviated by up to 13.8% from the field measurements, mainly due to transients. Ruschenburg et al. [19] compared monthly COP values simulated using a regression model with field measurements for five similar installations and found that the average deviation varied from 2% to 13%.
Using field measurements to calibrate regression models can reduce their inaccuracy by accounting for variations between individual installations. The traditional heat pump market consists of single- and multi-family houses with small heat pumps, which are not monitored. However, there has recently been considerable effort to integrate large heat pumps into district heating and cooling networks [2]. Large heat pumps are usually monitored, which makes it possible to calibrate heat pump models using field measurements. ANN-based models have an advantage over traditional regression models when a large amount of monitored data is available, as they can characterize more complex heat pump behavior without explicitly defining the relationship between the inputs and the outputs. Models developed using field measurements can be useful for fault detection or for optimizing the operation of a heating system. Such optimization is particularly valuable when the heating system is flexible. For example, in the present case study, the building's domestic hot water and space heating system is supplied by a ground-source heat pump and by heat from a district heating network, so an accurate model of the heat pump can be used to optimize the distribution of the heat load between the two sources.
This study presents a comparison of seven models (five regression models and two ANN models) calibrated using both field measurements and manufacturers' data. The objective of the comparison is to show how suitable the models are during the design phase, when only manufacturers' data are available, and during operation, when measured data are available. To the best of the authors' knowledge, such a comparison is not available in the literature, and the insight it provides will help practitioners choose the right model. The models were developed for a large ground-source heat pump system that has been monitored since 2017. We also demonstrate how to refine ANN models for heat pumps to utilize field measurements more effectively.

Description of the Studied Facility
The geothermal heating system supplements the heating and cooling provided by the district heating and cooling network at the university hospital in Umeå, Sweden. The installation consists of three heat pumps and a borehole heat exchanger with 125 boreholes. In summer, the heat from space cooling and the excess heat from the heat pump are injected into the borehole heat exchanger; in winter, the borehole heat exchanger acts as a heat source for the heat pumps. A detailed explanation of the borehole heat exchanger and its model is presented in separate articles [31,32]. The geothermal heat pump was designed to satisfy 95% of the hospital's cooling load (5 GWh) and 20% of the heating load (7 GWh). The heating and cooling provided by the geothermal heat pump for a typical year are shown in Figure 2. The cooling load does not vary much with the seasons, whereas the heating load decreases significantly during the summer. Two heat pumps provide heat and cold for space heating and cooling, while the third heat pump provides additional heat for space heating on cold days and heat for hot water production. Figure 3 shows a schematic of the facility. The heat from the condensers of heat pumps 1, 2, and 3 is used for space heating in three stages. The heat from the sub-cooler of heat pumps 1 and 2 is used as a heat source for heat pump 3 and for preheating the domestic hot water.
In the winter season, the heat extracted from the borehole heat exchanger and the hospital's minor space cooling demand provide the heat required for the evaporators of heat pumps 1 and 2. In the summer season, the cooling load dominates the heat pump operation. Because the space heating load is lower than the heat released by the condensers of heat pumps 1 and 2, the excess heat is injected into the borehole heat exchanger. The borehole heat exchanger is also used as a pre-cooler for the evaporators of heat pumps 1 and 2. Heat pumps 1 and 2 have similar operating temperatures, while heat pump 3 has a higher temperature range and a lower heat load. Hence, heat pumps 1 and 2 are the same heat pump model, EMA from the manufacturer EnergyMachines, Gävle, Sweden. Heat pumps 1 and 2 (HP1&2) each have two loops (circuits), and each circuit has twin compressors, as shown in Figure 3. The refrigerant in HP1&2 is R410A. Heat pump 3 (HP3) is a different model, EMB from EnergyMachines, which consists of two circuits, each with a single compressor, as shown in Figure 3. The refrigerant in HP3 is R134a. Although HP1&2 provide both heating and cooling, we consider the primary purpose of each heat pump to be heating and develop the models accordingly.
The installation has been operating since mid-February 2016 and has been fully monitored since January 2017. Table 1 lists the measurements that were performed continuously. Since HP1&2 have a common condenser and evaporator for both circuits, the two circuits have the same inlet and outlet water temperatures. HP3 has a separate condenser and evaporator for each circuit. Compressor utilization time is the cumulative amount of time the compressor has been turned on; it is not measured for HP3. The hourly average of each measurement in Table 1 is calculated and stored in a database. The data are not continuous over the whole measurement period due to faults in the monitoring system, caused by the malfunctioning of one or more sensors or by shutdowns of the monitoring system for maintenance and upgrades. For example, the evaporator and condenser power measurements are not available from March 2019. Each heat pump is considered separately, which allows the data from one heat pump to be used even when the measurements of another heat pump are incomplete. HP1 and HP2 have 14,151 and 10,865 h of complete data, respectively, the majority of which are from the period April 2017 to March 2019. HP3 operates intermittently, as it is only used during the coldest hours and to produce domestic hot water; hence, there are fewer hours with complete data for HP3. However, the two circuits of HP3 operate almost independently and can be considered as separate heat pumps. Circuits 1 and 2 of HP3 have 2952 and 6915 h of complete data, respectively. The measured data for a typical summer and winter day are included in Appendix B.
We divided the measured data into two sets: one for training the models and the other for testing. For HP1&2, most of the continuous data fall in the period from April 2017 to March 2019. At least one year of data is needed to ensure that the training data cover the full range of variation in load and temperatures; hence, the data until April 2018 were used for training, and the remaining data were used for testing. The resulting training set contained 54% of the data points, and the other 46% were used for testing. The data set available for HP3 was smaller and scattered over time, so we randomly split the data for HP3 into two equal sets to ensure that the training and testing sets had similar variation.
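The two splitting strategies can be sketched in Python as follows. This is a minimal illustration, assuming each hourly record is a tuple whose first element is a `datetime` timestamp; the function names are hypothetical, and the cutoff of 1 April 2018 is an assumed reading of "until April 2018".

```python
import random
from datetime import datetime, timedelta


def chronological_split(records, cutoff):
    """HP1&2-style split: records before `cutoff` train the model, the rest test it.

    Each record is assumed to be a tuple whose first element is a datetime.
    """
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def random_half_split(records, seed=0):
    """HP3-style split: shuffle the records and cut them into two (near-)equal halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    start = datetime(2017, 4, 1)
    # Two years of synthetic, gap-free hourly records: (timestamp, COP)
    hourly = [(start + timedelta(hours=h), 3.5) for h in range(2 * 365 * 24)]
    train, test = chronological_split(hourly, cutoff=datetime(2018, 4, 1))
    print(len(train), len(test))
```

With gap-free synthetic data the chronological split is exactly even; with the gaps present in real monitoring data, the two sets will generally differ in size, as in the 54%/46% split reported above.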
The heat delivered by the heat pumps is divided between the sub-cooler and the condenser. However, many of the models used in this study do not consider a sub-cooler; therefore, the sub-cooler power is not included in the calculation of COP.
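As a minimal sketch of this convention, the COP of each hourly sample can be computed from the condenser heat alone. The example assumes that COP is defined as condenser power divided by compressor electrical power (both hourly averages, in kW); the function names are hypothetical, and the second function only illustrates how much higher the COP would appear if sub-cooler heat were counted.

```python
def cop_condenser_only(q_condenser_kw, p_compressor_kw):
    """COP as used for model fitting: condenser heat only, sub-cooler heat excluded."""
    if p_compressor_kw <= 0:
        raise ValueError("compressor power must be positive for a running heat pump")
    return q_condenser_kw / p_compressor_kw


def cop_including_subcooler(q_condenser_kw, q_subcooler_kw, p_compressor_kw):
    """For comparison only: COP if the sub-cooler heat were also counted."""
    return cop_condenser_only(q_condenser_kw + q_subcooler_kw, p_compressor_kw)


print(cop_condenser_only(400.0, 100.0))             # 4.0
print(cop_including_subcooler(400.0, 40.0, 100.0))  # 4.4
```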

Description of Models
Seven data-driven models for heat pumps are chosen for this study: five regression models and two neural network models. Table 2 lists the models and their corresponding equations.
Table 2. List of models.

Model | Equation | Inputs
Bilinear [10]

The bilinear model was the simplest model used in this study. It does not include higher-order terms, but Underwood [33] showed that the bilinear model could achieve good accuracy. The model's simplicity also reduces the risk of overfitting, so the model remains accurate when extrapolating [19]. The second model (biquadratic 1) uses a biquadratic equation with condenser temperature and condenser power as inputs. This model had the best fit among six black-box and gray-box models compared in an earlier investigation [20]. Another commonly used form of the biquadratic model (biquadratic 2) [18] has the evaporator and condenser temperatures as inputs and the condenser and compressor power as outputs. The multivariate polynomial is another model that previous studies have shown to perform well [14,20]. This model uses three inputs: condenser power and the evaporator and condenser temperatures. The ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) model is a biquadratic model with the temperature difference between the condenser and evaporator as one input and condenser power as the other. The outputs of the biquadratic 2 model and the ASHRAE model were converted to COP so that the different models could be compared. Note that the models presented above can be used with temperatures from either the refrigerant side or the water side. We used the refrigerant temperatures because the data provided by the manufacturer used refrigerant-side temperatures. The subscript R in the "Inputs" column indicates that the refrigerant-side temperatures were used.
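The regression models in Table 2 are linear in their coefficients, so they can be fitted with ordinary least squares. The sketch below, in Python with NumPy, fits a generic biquadratic in two inputs x and y (for example T_ER and T_CR, or T_CR - T_ER and Q_C for an ASHRAE-type model). The term set 1, x, x², y, y², xy is an assumed, commonly used biquadratic form and not necessarily the exact form in Table 2; the function names are hypothetical.

```python
import numpy as np


def biquadratic_terms(x, y):
    """Design matrix for the assumed form c0 + c1*x + c2*x**2 + c3*y + c4*y**2 + c5*x*y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2, y, y**2, x * y])


def fit_biquadratic(x, y, cop):
    """Least-squares coefficients minimizing the sum of squared COP errors."""
    A = biquadratic_terms(x, y)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(cop, dtype=float), rcond=None)
    return coeffs


def predict_biquadratic(coeffs, x, y):
    """Evaluate the fitted biquadratic at new inputs."""
    return biquadratic_terms(x, y) @ coeffs


if __name__ == "__main__":
    # Synthetic 1 degC grid of refrigerant temperatures, loosely mimicking manufacturer data
    t_er, t_cr = np.meshgrid(np.arange(-5.0, 11.0), np.arange(30.0, 56.0))
    t_er, t_cr = t_er.ravel(), t_cr.ravel()
    cop = 6.0 + 0.08 * t_er - 0.07 * t_cr + 0.0004 * t_cr**2 - 0.0008 * t_er * t_cr
    print(np.round(fit_biquadratic(t_er, t_cr, cop), 6))
```

For a model with two outputs, such as biquadratic 2, the same fit would be run once for condenser power and once for compressor power, and the COP would then be formed from their ratio (assuming COP is condenser power over compressor power).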
A similar architecture was chosen for the two neural network models used in this study: one hidden layer with five nodes and one node in the output layer. The output of the neural network models was COP, and the hyperbolic tangent function was used as the activation function for the hidden layer. The two neural network models differ only in their number of inputs: the neural network 2 (NN_2) model uses two inputs, the evaporator and condenser temperatures, while the neural network 3 (NN_3) model uses three inputs, the evaporator temperature, condenser temperature, and condenser power. The models were trained with a trust region algorithm. Because the algorithm finds local minima, the result of the optimization depends on the initial weights of the network, which are assigned randomly. Hence, the algorithm was run 100 times with different initial weights, and the solution with the lowest training error was chosen.
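The training procedure can be sketched in Python with NumPy. This is an illustrative sketch under several assumptions: the output node is taken to be linear, the inputs are assumed to be standardized, and Levenberg-Marquardt is used as a stand-in for the unspecified trust region algorithm (it is a damped Gauss-Newton method whose damping factor plays a role similar to a trust-region radius). All names are hypothetical; only the five-node tanh hidden layer, the single output, and the multi-start selection by lowest training error follow the text.

```python
import numpy as np

N_HIDDEN = 5  # one hidden layer with five tanh nodes, as in NN_2 and NN_3


def unpack(theta, n_in):
    """Split the flat parameter vector into layer weights and biases."""
    i = N_HIDDEN * n_in
    W1 = theta[:i].reshape(N_HIDDEN, n_in)
    b1 = theta[i:i + N_HIDDEN]
    w2 = theta[i + N_HIDDEN:i + 2 * N_HIDDEN]
    b2 = theta[i + 2 * N_HIDDEN]
    return W1, b1, w2, b2


def forward(theta, X):
    """Predicted COP for inputs X (n_samples, n_in); a linear output node is assumed."""
    W1, b1, w2, b2 = unpack(theta, X.shape[1])
    H = np.tanh(X @ W1.T + b1)  # hidden-layer activations, shape (n_samples, 5)
    return H @ w2 + b2, H


def jacobian(theta, X):
    """Analytical d(prediction)/d(theta), ordered like theta: W1, b1, w2, b2."""
    w2 = unpack(theta, X.shape[1])[2]
    _, H = forward(theta, X)
    D = (1.0 - H**2) * w2  # derivative w.r.t. each hidden pre-activation
    J_W1 = (D[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)
    return np.hstack([J_W1, D, H, np.ones((X.shape[0], 1))])


def train_lm(X, t, rng, n_iter=200):
    """One Levenberg-Marquardt run from random initial weights; returns (theta, SSE)."""
    n_params = N_HIDDEN * X.shape[1] + 2 * N_HIDDEN + 1
    theta = rng.normal(scale=0.5, size=n_params)  # random initial weights
    lam = 1e-2  # damping: small -> Gauss-Newton step, large -> short gradient step
    sse = np.sum((forward(theta, X)[0] - t) ** 2)
    for _ in range(n_iter):
        r = forward(theta, X)[0] - t
        J = jacobian(theta, X)
        step = np.linalg.solve(J.T @ J + lam * np.eye(n_params), -J.T @ r)
        trial = theta + step
        trial_sse = np.sum((forward(trial, X)[0] - t) ** 2)
        if trial_sse < sse:  # accept the step and trust the local model more
            theta, sse, lam = trial, trial_sse, max(lam / 10.0, 1e-7)
        else:  # reject the step and take more cautious steps next time
            lam = min(lam * 10.0, 1e10)
    return theta, sse


def train_multistart(X, t, n_starts=100, seed=0):
    """Repeat training from different random weights; keep the lowest training error."""
    rng = np.random.default_rng(seed)
    best_theta, best_sse = None, np.inf
    for _ in range(n_starts):
        theta, sse = train_lm(X, t, rng)
        if sse < best_sse:
            best_theta, best_sse = theta, sse
    return best_theta, best_sse
```

With the input columns standardized (mean subtracted, divided by standard deviation), `train_multistart(X, cop)` mirrors the 100-restart selection described above; a handful of restarts is enough for a quick check.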

Development and Evaluation of Models
Models for HP1&2 and HP3 were fitted to both the measured training set described in Section 2 and the performance data supplied by the manufacturer. Both sets of models were then evaluated on the measured testing data set. In the design phase, the only data available for modeling the heat pump operation are the performance data supplied by the manufacturer. Therefore, the models fitted using the manufacturer's data show how far the actual performance deviates from the design calculations.
The performance data supplied by the manufacturer cover full-load, steady-state operation of the heat pump within its theoretical operating range. The data include every combination of evaporator temperature and condenser temperature at a resolution of 1 °C and comprise the evaporator temperature (T_ER), condenser temperature (T_CR), condenser power, compressor power, and COP. Table 3 shows the operating temperature range of the heat pumps. Note that the manufacturer's data were not laboratory measurements but data obtained using the manufacturer's model of the heat pump, which was calibrated using laboratory data. Sample data from the manufacturer are presented in Appendix B.
Table 3. Range of manufacturer's data.

Heat Pump Model | Evaporator Temperature Range | Condenser Temperature Range

The manufacturer's data for HP3 contain some data points where the condenser temperature is lower than the evaporator temperature; these data points were not included in the fitting process. The model parameters were determined by least squares, i.e., by minimizing the sum of the squared errors. Table 4 shows the training errors of the models for HP1&2 and HP3, trained using the manufacturer's data (Manu_fit) and the measured data (Meas_fit). The error is expressed both as the root mean square error (RMSE) of each model on its training data and as a percentage error, calculated as the RMSE divided by the average COP of the training data set. As seen from the first column of Table 4 and the right side of Figure 4, the models that best represented the performance data supplied by the manufacturer (Manu_fit) were the two neural network models; among the regression models, biquadratic 1 had the best fit. The ANN model NN_3 performed slightly better than NN_2, probably because NN_3 has an additional input, Q_C.
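The data cleaning step and the two error measures are simple to compute. The sketch below is a minimal Python version, assuming the manufacturer's rows are dictionaries with hypothetical keys "T_ER" and "T_CR" and that the percentage error is the RMSE divided by the mean COP of the same data set, times 100; the function names are hypothetical.

```python
import math


def drop_invalid_points(rows):
    """Remove manufacturer rows where the condenser temperature is below the evaporator temperature."""
    return [row for row in rows if row["T_CR"] >= row["T_ER"]]


def rmse(predicted, observed):
    """Root mean square error between predicted and observed COP values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def percent_error(predicted, observed):
    """RMSE as a percentage of the mean observed COP."""
    mean_cop = sum(observed) / len(observed)
    return 100.0 * rmse(predicted, observed) / mean_cop


rows = [{"T_ER": 0.0, "T_CR": 40.0, "COP": 4.1}, {"T_ER": 10.0, "T_CR": 5.0, "COP": 9.9}]
print(len(drop_invalid_points(rows)))                   # 1
print(round(rmse([3.0, 4.0], [3.0, 5.0]), 4))           # 0.7071
print(round(percent_error([3.0, 4.0], [3.0, 5.0]), 2))  # 17.68
```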

Results
The ability of the different models to represent the measured data (Meas_fit) was broadly similar, as seen on the left side of Figure 4, but in general worse than that of the Manu_fit models. The NN_3 model had the best fit for Meas_fit as well as for Manu_fit. However, the choice of model was less important for Meas_fit. The manufacturer's data were obtained in an ideal setting in which only the specified inputs affected the COP, so a good model can represent this behavior accurately. In actual operation, the heat pumps have to satisfy a dynamic load and have a sub-cooler. Such non-standard operating conditions, along with measurement errors, influence the measured COP. Hence, the fact that the choice of model matters little for Meas_fit indicates that variables not considered in the models influence the results more than the choice of model does.
The Manu_fit models fit HP1&2 better than HP3. The main difference between HP1&2 and HP3 is that the temperature range specified by the manufacturer is higher for HP3 than for HP1&2. Because of its higher temperature range, HP3 is expected to require a more complex function to represent the relationship between COP and the input variables, which plausibly explains why the Manu_fit models of HP3 have a higher RMSE than those of HP1&2. Since ANN models are better than regression models at representing complex functions, the difference in performance between the regression models and the neural network models is larger for HP3. The Meas_fit models for HP1&2 and HP3 have similar RMSEs because the temperature range of the heat pumps' actual operation is narrower than the operating range specified by the manufacturer.
The models described above were tested on the test set described in Section 2, and the results are presented in Table 5 and Figure 5. Following the naming convention of Table 4, Manu_fit refers to the models fitted using the manufacturer's data, and Meas_fit refers to the models trained using the measured data. The Meas_fit models have a lower RMSE than the Manu_fit models. Among the Manu_fit models, the bilinear model has the lowest RMSE. The models that use condenser power as one of the inputs, i.e., biquadratic 1, multivariate polynomial, ASHRAE, and NN_3, have a higher RMSE than the models that use only the evaporator and condenser temperatures. This can be explained by the fact that, during real operation, the condenser power falls outside the range of the manufacturer's data, since the heat pumps do not operate at 100% load all the time. Fitting the models to measured data eliminates this issue; accordingly, the NN_3 and multivariate polynomial models have the lowest RMSE among the Meas_fit models. The same two models had the worst performance among the Manu_fit models, which emphasizes the importance of choosing the complexity of the model based on the data available.

Refining ANN Models
In the previous section, we noted that using the measured data could improve model accuracy. The neural network model with three inputs, NN_3, has the highest accuracy among the models, as seen in Table 5, which shows that the neural network model can utilize the measured data better than traditional models. Another advantage of neural network models over traditional regression models is that they can easily be refined to better utilize the measured data set and identify more patterns within it. However, there are many possible combinations of inputs, outputs, and architectures for an ANN, and devising a strategy to optimize the ANN model can be challenging. Both input selection and ANN architecture optimization are active research fields [34,35]. Modeling a heat pump is a relatively simple problem for an ANN and hence does not require a complex architecture. We therefore describe a methodology for refining neural network models that is appropriate for heat pump modeling.
To select suitable measures to improve the ANN model, we first examined the error of the NN_3 Meas_fit model. The training errors of the model for HP1&2 and HP3 were 5.2% and 7.1%, respectively, and the testing errors were 6.4% and 7.5%, respectively. For HP3, the testing error is only 0.4 percentage points higher than the training error. The difference is larger for HP1&2, but this is due to changes in operating conditions over the two years: randomly dividing the data into training and testing sets resulted in training and testing errors of 6.4% and 4.7%, respectively, indicating that the model was not overfitting the measured data. The small difference between the training and testing errors indicates that the model has low variance and is unable to capture the variations in the data. The variance of the model can be increased by adding input variables that explain the variations or by increasing the complexity of the model through more hidden layers. Conversely, a testing error much higher than the training error would indicate that the model was overfitting the data. Overfitting can be reduced by regularization, by reducing the model's complexity through a smaller hidden layer, or by using a larger training set. Changing the output variable from COP to compressor power was also tested, since the choice of output affects the ANN model.

Additional Inputs
Adding inputs that explain the variations in COP can increase the accuracy of the model. The advantage of the neural network model over other models is that additional inputs can be included without explicitly defining the relation between the input and output, which can be a complicated and time-consuming process.
The inputs are added to the model by changing the number of nodes in the input layer without changing the architecture of the other layers, i.e., five nodes in the hidden layer and one node in the output layer. Three main types of input that may affect the heat pump's performance are considered: partial operation, sub-cooler operation, and the COP from the manufacturer's data. An input used to improve the accuracy of the model should be correlated with the output but should not be highly correlated with the other inputs, as a new input must add information not available in the existing inputs. To select the inputs, we used a greedy forward selection process, in which the best inputs are sequentially added to the model until the improvement in the model falls below a threshold [36,37].
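The greedy forward selection can be sketched generically in Python. In this minimal sketch, `score` is a hypothetical callable that retrains the model on a given list of inputs and returns its validation error; in the study this would mean retraining the ANN, but any model can be plugged in. The stopping threshold `min_improvement` is an assumed parameter, since its value is not stated here.

```python
def greedy_forward_selection(base_inputs, candidates, score, min_improvement):
    """Add candidate inputs one at a time, best first, until the error gain is too small.

    base_inputs: inputs always used (e.g. ["T_ER", "T_CR", "Q_C"] for NN_3)
    candidates: extra inputs to try (e.g. ["UT_r", "Q_cr", "Q_SC", "T_SC"])
    score: callable(list_of_inputs) -> validation error (lower is better)
    """
    selected = list(base_inputs)
    remaining = [c for c in candidates if c not in selected]
    current_error = score(selected)
    while remaining:
        # Evaluate every remaining candidate added to the current selection
        trial_errors = {c: score(selected + [c]) for c in remaining}
        best = min(trial_errors, key=trial_errors.get)
        if current_error - trial_errors[best] < min_improvement:
            break  # improvement below threshold: stop adding inputs
        selected.append(best)
        remaining.remove(best)
        current_error = trial_errors[best]
    return selected, current_error
```

Each round retrains one model per remaining candidate, so the cost grows roughly with the square of the number of candidates; with only a handful of candidate inputs, as here, that is affordable.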

Inputs Tested
When the heating/cooling requirement is lower than the heat pump's design load, the heat pump is partially or completely switched off. This influences the performance of the heat pump, since its part-load operation differs from full-load operation. Moreover, switching the compressor on and off disturbs the steady state of the heat pump. While the parameter Q_C includes information about the partial operation of the heat pump, an explicit variable representing the fraction of operation may improve the accuracy of the model.
Two different values are used to quantify the fraction of the heat pump's operation: the fraction of time the compressors are switched on in each circuit (UT_r) and the ratio of actual condenser power to design condenser power (Q_cr). Each circuit of HP1&2 has two compressors, which can be switched on and off independently. UT_r is calculated from the average utilization time (UT) of both compressors, measured every hour.
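A minimal sketch of the UT_r calculation, assuming the hourly logs give each compressor's on-time in minutes within the hour (the raw data format is not described in the source). UT_r for a circuit is then the mean on-time fraction of its two compressors; the function name is hypothetical.

```python
def utilization_ratio(on_minutes_c1, on_minutes_c2, period_min=60.0):
    """Hourly UT_r for one circuit: mean on-time fraction of its two compressors."""
    return [
        ((a + b) / 2.0) / period_min
        for a, b in zip(on_minutes_c1, on_minutes_c2)
    ]

print(utilization_ratio([60, 30, 0], [60, 15, 0]))  # [1.0, 0.375, 0.0]
```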
The ratio of condenser load to design condenser load, Q_cr, is an alternative input for estimating the fraction of operation. The design condenser load at each T_CR and T_ER is calculated from the manufacturer's data. Unlike UT_r, which is only available for HP1&2, Q_cr can be used for both HP1&2 and HP3. Moreover, Q_cr can be estimated before measurements are available, from the estimated and design condenser loads. Q_cr can therefore be used in a wider range of applications.
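One way to compute Q_cr is sketched below, assuming the manufacturer's design condenser load is tabulated on a rectangular grid of T_CR and T_ER and interpolated bilinearly. The source only says the design load is "calculated using the manufacturers' data", so the interpolation scheme, the function names, and the grid values are all assumptions for illustration.

```python
from bisect import bisect_right

def bilinear(xs, ys, table, x, y):
    """Bilinear interpolation on a rectangular grid.

    xs, ys: ascending grid coordinates; table[i][j] is the value at (xs[i], ys[j]).
    Points outside the grid are extrapolated linearly from the nearest edge cell.
    """
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * table[i][j]
            + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1]
            + tx * ty * table[i + 1][j + 1])

def condenser_load_ratio(q_c, t_cr, t_er, t_cr_grid, t_er_grid, q_design):
    """Q_cr: measured (or estimated) condenser load over design load at (T_CR, T_ER)."""
    return q_c / bilinear(t_cr_grid, t_er_grid, q_design, t_cr, t_er)

# Hypothetical manufacturer grid (kW): rows are T_CR = 40, 60 degC; columns T_ER = 0, 10 degC.
t_cr_grid, t_er_grid = [40.0, 60.0], [0.0, 10.0]
q_design = [[1000.0, 1200.0], [900.0, 1100.0]]
print(condenser_load_ratio(840.0, 50.0, 5.0, t_cr_grid, t_er_grid, q_design))  # about 0.8
```

Because the design load comes only from the manufacturer's table, the same function works with a planned condenser load before any measurements exist, which is the advantage of Q_cr noted above.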
A sub-cooler is used in both HP1&2 and HP3. The heat from the sub-cooler of HP1&2 serves as the source for HP3 and as a preheater for domestic hot water production, while in HP3 the heat from the sub-cooler is used to preheat the water for space heating. To include the effect of the sub-cooler in the model, we used the sub-cooler power (Q_SC) and sub-cooler temperature (T_SC) as additional inputs. T_SC is calculated as the average of the sub-cooler water inlet temperature (T_SCWin) and outlet temperature (T_SCWout), which were only available for HP1&2. Note that the sub-cooler temperature is more important for HP1&2, since the heat capacity in the sub-cooler of HP1&2 is higher than that in the sub-cooler of HP3.
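The sub-cooler inputs can be sketched as below. T_SC follows the definition in the text. How Q_SC was obtained is not stated in this section, so the water-side energy balance used here (mass flow times specific heat times temperature rise) and the constant `CP_WATER` are assumptions for illustration.

```python
CP_WATER = 4.18  # kJ/(kg K), approximate specific heat of water (assumed)

def subcooler_inputs(t_in, t_out, m_dot):
    """Return (T_SC, Q_SC) for one time step.

    T_SC: mean of the sub-cooler water inlet and outlet temperatures (degC),
    as defined in the text.
    Q_SC: estimated from a water-side energy balance with mass flow m_dot in
    kg/s (an assumption); the result is in kW.
    """
    t_sc = (t_in + t_out) / 2.0
    q_sc = m_dot * CP_WATER * (t_out - t_in)
    return t_sc, q_sc

t_sc, q_sc = subcooler_inputs(20.0, 30.0, 2.0)
print(t_sc, round(q_sc, 1))
```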
The evaporator and condenser temperature ranges specified by the manufacturer are wider than those encountered in real operation. Hence, including the COP obtained from the manufacturer's model (COP_manu) as an input to the neural network model may extend the range over which the model is accurate. COP_manu obtained from the NN_2 inputs model trained on the manufacturer's data was used as the additional input, as illustrated in Figure 6.
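The structure in Figure 6, where the output of a model fitted to the manufacturer's data becomes an extra input, can be sketched as a feature-augmentation step. The row layout (dicts keyed by `T_CR`, `T_ER`) and the linear `toy_manu` stand-in are hypothetical; in the study this role is played by the NN_2 inputs model, presumably taking the two refrigerant-side temperatures.

```python
def add_manufacturer_cop(rows, manu_model):
    """Return copies of the input rows with an extra COP_manu feature.

    rows: list of dicts with at least 'T_CR' and 'T_ER' keys (hypothetical layout).
    manu_model: any callable (t_cr, t_er) -> COP fitted to the manufacturer's data.
    """
    out = []
    for r in rows:
        r2 = dict(r)  # do not modify the caller's data
        r2["COP_manu"] = manu_model(r["T_CR"], r["T_ER"])
        out.append(r2)
    return out

# Hypothetical stand-in for the manufacturer's model (not from the study).
def toy_manu(t_cr, t_er):
    return 8.0 - 0.1 * (t_cr - t_er)

rows = [{"T_CR": 60.0, "T_ER": 10.0, "Q_C": 900.0}]
augmented = add_manufacturer_cop(rows, toy_manu)
print(augmented)
```

The augmented rows are then fed to the ANN with one more input node; the rest of the architecture is unchanged.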

Figure 6. Schematic of the ANN model that includes manufacturer's coefficient of performance (COP) as input.
Table 6. Step 1 of forward selection.

Selection of Inputs
The additional inputs were selected using greedy forward feature selection. In each step, all the remaining inputs were tested, and the best input, defined as the one that gives the largest reduction in relative RMSE for the testing set, was added to the model. We defined two criteria to stop the forward selection process: a threshold of 1% reduction in relative RMSE and a threshold of 0.1%. Therefore, two models are selected for each of HP1&2 and HP3 at the end of the forward selection. Five inputs, namely UT_r, Q_cr, Q_SC, T_SC, and COP_manu, were tested for HP1&2. Only three inputs were tested for HP3, since UT_r and T_SC were not available for HP3.
In the first step, the first additional input for NN_3 was chosen by adding each of the inputs to the NN_3 model. Table 6 shows that the model with Q_SC as the fourth input has the lowest relative RMSE for the testing set of both HP1&2 and HP3. Adding Q_SC to NN_3 results in a 1.5% reduction in relative RMSE for the testing set of HP1&2 and a 0.5% reduction for HP3. According to the first stopping criterion, the forward selection process stops for HP3, and NN_3 is selected as one of the models for HP3.
In the next step, the second additional input was chosen by adding each of the remaining inputs to NN_4_QSC. The results of step 2 are shown in Table 7. Adding T_SC to the NN_4_QSC model reduced the relative RMSE for the testing set of HP1&2 by 0.3%; hence, NN_4_QSC was chosen for HP1&2 using the first stopping criterion. Adding a fifth input to the HP3 model reduced the training error, but it did not improve the testing error. In fact, the relative RMSE for the testing set increased when the fifth variable was added, which implies that the model started overfitting the training data. Therefore, the NN_4_QSC model was chosen for HP3 according to the second stopping criterion.
We tested the effect of adding a third additional input to the HP1&2 model, as shown in Table 8. Since both stopping criteria for the HP3 model were already satisfied, HP3 is not included in step 3. Each of the remaining three variables was added to the NN_5_QSC_TSC model for HP1&2. Adding a sixth variable increases the relative RMSE for the testing set. Hence, the forward selection process is stopped, and NN_5_QSC_TSC is chosen as the second model for HP1&2. In summary, the models chosen for HP3 are NN_3 and NN_4_QSC, and the models chosen for HP1&2 are NN_4_QSC and NN_5_QSC_TSC, using the 1% and 0.1% stopping criteria, respectively. The inputs selected by the method also give some insight into the heat pumps.
Adding the sub-cooler data Q_SC and T_SC as inputs significantly improves the model accuracy for HP1&2, whereas the improvement for HP3 is less significant. Including the effects of the sub-cooler is more important for HP1&2 because its relative sub-cooler power is higher: Q_SC/Q_C is 0.13 for HP1&2, compared with 0.08 for HP3. Q_SC is higher for HP1&2 than for HP3 because of the difference in how the sub-coolers are used, as seen in Figure 3 of Section 2. The sub-cooler of HP1&2 is cooled by the evaporator of HP3, which is at around 13.5 °C lower than the inlet water temperature of the condenser. The temperature drop in the sub-cooler of HP1&2 is therefore significant. In contrast, the sub-cooler of HP3 preheats the water before it enters the condenser, and the temperature drop in the sub-cooler of HP3 is expected to be smaller than in HP1&2.
The variables that account for the partial operation of the heat pumps, UT_r and Q_cr, were not selected as significant inputs. This implies that Q_C, which is already an input to the reference model, is sufficient to account for the effects of partial operation, and additional inputs to capture part-load inefficiencies are not necessary. COP_manu is the other input that was not selected. It was included as a candidate input for the NN_3 model because it may improve the model accuracy under extrapolation. That it was not selected implies that the training and testing sets are similar and that the amount of extrapolation required is minimal.
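The greedy forward selection with two stopping thresholds described in this section can be sketched as below. `evaluate` is a placeholder for the user's own routine that trains an ANN on a given input set and returns the relative RMSE (in %) on the testing set. The base inputs are presumably the three NN_3 inputs, and the error model `toy_eval` with candidates `a` to `d` is synthetic and does not reproduce the study's results.

```python
def forward_selection(base_inputs, candidates, evaluate, threshold):
    """Greedy forward feature selection.

    evaluate(inputs) -> testing-set relative RMSE (%) for a model trained on
    `inputs`. In each step every remaining candidate is tried and the best one
    is added, as long as it lowers the error by at least `threshold`.
    """
    selected = list(base_inputs)
    remaining = list(candidates)
    best_err = evaluate(selected)
    while remaining:
        errs = {c: evaluate(selected + [c]) for c in remaining}
        best = min(errs, key=errs.get)
        if best_err - errs[best] < threshold:
            break  # stopping criterion: improvement below threshold
        selected.append(best)
        remaining.remove(best)
        best_err = errs[best]
    return selected, best_err

# Synthetic error model for illustration only: each input changes the error by a
# fixed amount ("d" makes it worse).
gains = {"a": 2.0, "b": 0.5, "c": 0.05, "d": -0.2}

def toy_eval(inputs):
    return 10.0 - sum(gains.get(x, 0.0) for x in inputs)

base = ["T_CR", "T_ER", "Q_C"]
for thr in (1.0, 0.1):  # the two stopping criteria, in percentage points
    print(thr, forward_selection(base, ["a", "b", "c", "d"], toy_eval, thr))
```

Running the search once per threshold yields one model per stopping criterion, which is how two models per heat pump arise in the procedure above.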

Change of Output
In the above models, COP was used as the output of the models to compare the models with the standard regression models. However, in many cases, compressor power is a more useful output since it represents a physical quantity that can be used for other analyses. We therefore studied whether changing the output to compressor power has any effect on the accuracy of the models.
The four models selected in the previous section were tested with compressor power instead of COP as the output. Table 9 shows that using compressor power for the HP1&2 models results in a lower training error but a higher testing error. In contrast, changing the output for the HP3 models results in significantly lower training and testing errors: the testing error decreases by 2.7% and 2.4% for NN_3 and NN_4_QSC, respectively. Hence, the models with compressor power as output were selected for HP3. These models are referred to as NN_3_PO and NN_4_QSC_PO.
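When the output changes from COP to compressor power, predictions can still be converted to COP for comparison. The sketch below assumes heating COP is defined as condenser power over compressor power and that relative RMSE is normalized by the mean measured value; whether Table 9 compares errors on power or on derived COP is not stated here, and the numbers are hypothetical.

```python
import math

def relative_rmse(measured, predicted):
    """RMSE divided by the mean of the measured values (assumed normalization)."""
    n = len(measured)
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n
    return math.sqrt(mse) / (sum(measured) / n)

def cop_from_power(q_c, p_comp):
    """Heating COP as condenser power divided by compressor power."""
    return [q / p for q, p in zip(q_c, p_comp)]

# Hypothetical values for illustration (kW), not measurements from the study.
q_c = [900.0, 600.0]
p_meas = [300.0, 200.0]
p_pred = [300.0, 240.0]  # predictions of a power-output model

err_power = relative_rmse(p_meas, p_pred)
err_cop = relative_rmse(cop_from_power(q_c, p_meas), cop_from_power(q_c, p_pred))
print(f"relative RMSE, power: {err_power:.3f}; derived COP: {err_cop:.3f}")
```

For the same predictions, the relative error differs depending on whether it is expressed in power or in COP (about 0.113 versus 0.118 here), so errors should be compared on a common quantity.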

Number of Hidden Nodes
All of the above models use one hidden layer with five nodes. The model might learn more complex relations between inputs and output if the number of nodes in the hidden layer, or the number of hidden layers, is increased. However, this also increases the risk of overfitting the training data. Therefore, the number of nodes in the hidden layer was increased for each of the four selected models until the testing error stopped decreasing. Table 10 shows that increasing the number of nodes in the hidden layer reduces the relative RMSE of the training set. The testing error initially decreases as the number of nodes increases, but increasing the number of nodes further results in overfitting, and the testing error increases. Hence, the optimal number of nodes is 15 for NN_4_QSC and 10 for the other three models, NN_5_QSC_TSC, NN_3_PO, and NN_4_QSC_PO.
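The node-count search can be sketched as a sweep that stops as soon as the testing error rises. `train_and_test` is a hypothetical placeholder for the user's training routine, and the error table is synthetic. Stopping at the first rise can miss a later, lower minimum, which is a limitation of this simple rule.

```python
def select_hidden_nodes(node_counts, train_and_test):
    """Return (best node count, its testing error).

    train_and_test(n) -> (train_err, test_err) for a network with n hidden nodes.
    Node counts are tried in ascending order; the sweep stops once the testing
    error no longer improves.
    """
    best_n, best_test = None, float("inf")
    for n in sorted(node_counts):
        _, test_err = train_and_test(n)
        if test_err >= best_test:
            break  # testing error rose: further nodes are likely to overfit
        best_n, best_test = n, test_err
    return best_n, best_test

# Synthetic (train, test) relative RMSE in %, for illustration only.
results = {5: (6.0, 5.0), 10: (5.0, 4.6), 15: (4.2, 4.8), 20: (3.5, 5.2)}
print(select_hidden_nodes(results.keys(), lambda n: results[n]))  # (10, 4.6)
```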

Sensitivity Analysis
The partial correlation coefficients of the four models were calculated to understand the influence of the inputs on the outputs. The correlations were calculated using the measured inputs of the models and the simulated output of each of the four models. Figure 7 shows the partial correlation coefficients between the inputs and outputs of the models for HP1&2 and HP3. For the HP1&2 models, which calculate the COP, the condenser and evaporator temperatures are the most important inputs. Introducing the fifth variable, T_SC, to the NN_4_QSC model reduced the correlation coefficients of Q_C and Q_SC, indicating that T_SC is correlated with Q_C and Q_SC. Since the model is trained on actual measurements in which Q_SC, T_SC, and Q_C vary simultaneously, it is hard for the model to learn the influence of the individual input variables. The correlation coefficient of Q_C is higher for HP3 than for HP1&2 because the HP3 models output compressor power. Increasing T_CR increases the COP, but a high T_CR is also correlated with a high Q_C and hence a higher compressor power. This demonstrates the difficulty in interpreting the results of a black-box model.
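A standard way to compute such partial correlation coefficients is from the inverse of the correlation matrix of the inputs and the model output, as sketched below with NumPy. This is an illustration of the general technique, not necessarily the exact procedure used for Figure 7; the synthetic data stand in for the measured inputs and simulated outputs.

```python
import numpy as np

def partial_correlations(X, y):
    """Partial correlation of each column of X with y, controlling for the
    other columns: -P_iy / sqrt(P_ii * P_yy), where P is the inverse of the
    correlation matrix of [X, y]."""
    data = np.column_stack([X, y])
    prec = np.linalg.inv(np.corrcoef(data, rowvar=False))
    k = data.shape[1] - 1  # index of the output column
    return -prec[:k, k] / np.sqrt(np.diag(prec)[:k] * prec[k, k])

# Synthetic example: the output rises with input 0, falls with input 1,
# and does not depend on input 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=500)
pc = partial_correlations(X, y)
print(np.round(pc, 2))
```

If two inputs move together in the data, as Q_SC, T_SC, and Q_C do in the measurements, their partial correlations with the output are reduced because each one's effect is computed while holding the others fixed.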

Discussion
Section 4 shows that ANN models have the lowest training error among the models when only the manufacturer's data are available. However, when the models trained on the manufacturer's data were tested on the measured data, the bilinear model had a lower testing error than almost all the other, more complex models. This shows that the idealized operation represented in the manufacturer's data does not include many of the variations present in the real operation represented in the measured data. Hence, the additional knowledge learned by the more complex models compared with the bilinear model does not help in predicting the measured operation. Another interesting result in Section 4 is that the models that do not use Q_C as an input perform better than the models that do. This could be because the models with Q_C as an input must extrapolate to part-load conditions, since the manufacturer's data cover only full-load conditions. For these reasons, a simple model is suitable in the design phase, when only the manufacturer's data are available.
During the operation of the heat pump, the models can be trained on the measured data. The measured data generally form a larger data set that covers the heat pump's actual range of operation. Moreover, the measured data also include many inefficiencies and nonstandard operating conditions of the real heat pump. The results of Section 4 demonstrate that more complex models are justified when measured data are available. Among the models, the NN_3 model has the lowest testing error; hence, it makes better use of the larger size and range of the measured data than the regression models.
The NN_3 model performed better than the regression models, but the difference in accuracy was small. To utilize the measured data effectively, the models must be refined. Since ANN models do not require an explicit relation between the inputs and the outputs, they are easier to refine than regression models: we can change the inputs, the outputs, and the architecture of the model. This gives ANN models an advantage over traditional regression models.
In Section 5, a method to refine the ANN model is presented. The method first identified an appropriate set of measures for improving the ANN by examining the error of the model. The measures were then tested and applied sequentially to the model. First, additional inputs to the model were selected through greedy forward selection. Then, a different output was tested for the model. In the last step, the number of nodes in the hidden layer was optimized. We were able to achieve an error of less than 5% for both heat pump models using this method. The relative RMSE for the test set reduced from 6.4% to 4.5% for HP1&2 and from 7.5% to 4.3% for HP3. The possibility of adding more inputs to suit the application makes ANN models suitable for a wide variety of applications, especially when the standard models cannot include some of the causes of variability in the performance of the heat pump.
ANN models can be accurate and flexible, but they also have some inherent limitations. Their dependence on the training data is demonstrated in Section 4: the ANN models trained on the manufacturer's data had a high RMSE when tested on the measured data, because the manufacturer's data from an idealized simulation were not representative of the measured heat pump operation. This points to a more general requirement of ANN models: the training data set must be representative of the data to which the model is applied. A modification to the heat pump will therefore require retraining all of the model parameters.
Since the behavior of an ANN model is hard to explain, we cannot reuse or adjust the parameters of an ANN model of one heat pump to represent a similar heat pump without retraining the model. Hence, ANN models are suitable for applications that do not aim to modify the heat pump, such as fault detection and the optimization of heat pump operation by controlling the inputs.
In this study, we used the refrigerant-side evaporator and condenser temperatures (T_ER and T_CR) as inputs to the models. The refrigerant-side temperatures were provided by the manufacturer and measured during actual operation, so it was convenient to use them in our case. However, in most cases it is more convenient to use the water-side evaporator inlet temperature (T_EWin) and condenser outlet temperature (T_CWout) instead of T_ER and T_CR, because they are easier to access: the source temperature determines T_EWin, and the temperature requirement of the building gives T_CWout. We therefore tested how using T_EWin and T_CWout instead of T_ER and T_CR in the NN_3 model affects the accuracy of the model. Changing the inputs from T_ER and T_CR to T_EWin and T_CWout increased the relative RMSE for the testing data set from 6.4% to 6.5% for the HP1&2 model and from 7.5% to 8.3% for HP3. The accuracy of the model decreased with the water temperatures, but the increase in relative error was less than 1% of the average COP. This suggests that the models can be modified to use the water temperatures instead of the refrigerant temperatures without any major reduction in accuracy.

Conclusions
In this study, we compared the accuracy of different regression and ANN models, trained on the manufacturer's data and the measured data. The study highlighted the importance of considering the quantity and quality of data while choosing a regression or an ANN model. The study showed that in the design phase, when only the manufacturer's data are available, simple models have an advantage. However, the more complex models have higher accuracy compared to the simple models when measured data from actual operation become available. An ANN model with three inputs has the lowest relative error among the models.
We systematically refined the ANN model to utilize the measured data effectively. We obtained an error of less than 5% for ANN models of both heat pumps using this method. The flexibility in the selection of inputs and outputs makes ANN models an attractive option for many applications that require additional inputs to model the operation of the heat pump. However, the need for representative data to train the model is a constraint for the application of ANN models.