Heating, ventilation, and air-conditioning (HVAC) equipment such as chillers and fans has an important impact on the energy use and electrical demand of large commercial and institutional buildings. Building automation systems (BAS) installed in such buildings can record measurements from hundreds of sensors, usually at 15 min intervals. After the measurement datasets are validated, two kinds of problematic situations are often noticed, which are obstacles to the development, validation, and application of prediction models: the datasets are small, and they are incomplete.
The recent use of the transfer learning (TL) method for building applications has opened up new alternative solutions to the problems created by small and incomplete datasets. This paper explores the application of TL to the performance prediction of chillers in a central cooling plant serving a university campus, using the measurement datasets from the BAS as a case study. This section briefly presents the concept of transfer learning and some details about the hyperparameters of deep learning models.
1.1. Transfer Learning
When the prediction of the energy performance of HVAC equipment, such as a large-capacity chiller, is required, the usual approach consists first of collecting measurements over a time interval that is as long as reasonably possible and cost-effective. The dedicated prediction model is then developed and tested, followed by its application. However, in many situations, the available measurement dataset is not adequate for model development, either because the dataset is incomplete, some readings are erroneous, or needed variables are not recorded.
In this situation, the transfer learning (TL) method can help in the development of performance prediction models for HVAC equipment.
Transfer learning is used to improve a learner in a target domain by transferring knowledge obtained from a different but related source domain [3]. Another definition simply says that TL is the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks [4].
When the size and quality of the target training dataset are not sufficient, TL uses a pre-trained model from the source domain as the starting point for generalization to the target domain. Thus, the application of TL from a source domain to a target domain reduces both the need for large target-domain datasets for model training and validation and the training time.
Transfer learning is formally defined by [3] as follows: given a source domain D_S with a corresponding source task T_S, and a target domain D_T with a corresponding target task T_T, transfer learning is the process of improving the target predictive function by using the related information from D_S and T_S, where the two domains are different (D_S ≠ D_T) or the tasks are different (T_S ≠ T_T).
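For readers who prefer a symbolic statement, the definition above can be restated compactly. The following LaTeX block is a sketch that assumes the standard formulation of [3], in which a domain is a feature space together with a marginal distribution and a task is a label space together with a predictive function:

```latex
% Compact restatement of the transfer learning definition (after [3]).
% A domain D = {X, P(X)}: feature space X with marginal distribution P(X).
% A task   T = {Y, f(.)}: label space Y with predictive function f(.).
\begin{align*}
  \mathcal{D}_S &= \{\mathcal{X}_S, P(X_S)\}, & \mathcal{T}_S &= \{\mathcal{Y}_S, f_S(\cdot)\},\\
  \mathcal{D}_T &= \{\mathcal{X}_T, P(X_T)\}, & \mathcal{T}_T &= \{\mathcal{Y}_T, f_T(\cdot)\}.
\end{align*}
% TL improves the target predictive function f_T using knowledge
% from D_S and T_S, given D_S != D_T or T_S != T_T:
\[
  \mathcal{D}_S \neq \mathcal{D}_T
  \ \text{or}\
  \mathcal{T}_S \neq \mathcal{T}_T .
\]
```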
A few definitions are presented here to clarify the method proposed in this paper.
When both the source and the target domains are represented in the same feature space, the transfer learning is called homogeneous; when the domains use different feature spaces, it is called heterogeneous.
Homogeneous transfer learning between the source and target domains can be classified into four transfer categories [4,5]:
- (1)
Instance-based transfer learning reweights samples in the source domain in an attempt to correct for marginal distribution differences; the reweighted samples are then directly used for training in the target domain. For instance, [6] calculated the weights by using the means of the target and source domains.
- (2)
Feature-based transfer learning, which takes one of two approaches. The first approach, called asymmetric feature transformation, works well when both source and target domains have the same labels; it transforms the features of the source domain through reweighting to more closely match the target domain. The second approach discovers underlying meaningful structures between the domains and transforms both domains into a common feature space.
- (3)
Model parameter-based transfer learning starts by using parameters (i.e., the weights of a deep neural network (DNN) model) trained previously on a source domain to initialize the weights of the target domain DNN model. The weights are then fine-tuned for the target domain (see the sketch after this list).
- (4)
Relation-based transfer learning, which uses the common relationship between the source and target domains.
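As an illustration of the model parameter-based category, the following Python sketch shows weight initialization from a pre-trained source model followed by fine-tuning of all layers. This is a minimal, generic example, not the implementation used in this paper; the architecture, layer sizes, learning rate, and the random placeholder data are all hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical MLP architecture, shared by the source and target chillers.
def build_mlp(n_inputs: int, n_outputs: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(n_inputs, 30), nn.ReLU(),
        nn.Linear(30, 15), nn.ReLU(),
        nn.Linear(15, n_outputs),
    )

source_model = build_mlp(7, 1)  # stands in for the model pre-trained on the source chiller
target_model = build_mlp(7, 1)

# (1) Weight initialization: copy the pre-trained source parameters into the target model.
target_model.load_state_dict(source_model.state_dict())

# Placeholder target dataset (random values stand in for the small target-domain dataset).
loader = DataLoader(TensorDataset(torch.randn(64, 7), torch.randn(64, 1)), batch_size=16)

# (2) Fine-tuning: continue training all layers on the target data with a small learning rate.
optimizer = torch.optim.Adam(target_model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
for x_batch, y_batch in loader:
    optimizer.zero_grad()
    loss = loss_fn(target_model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
```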
There are three common transfer learning scenarios [3], in terms of task similarity and the availability of labeled datasets [5]:
- (1)
Inductive transfer learning occurs when labeled data are available in both the source and target domains, and the source and target tasks are different.
- (2)
Transductive transfer learning occurs when labeled data are available only in the source domain; the source and target domains are different, but the tasks are the same.
- (3)
Unsupervised transfer learning occurs when labeled data are available in neither the source nor the target domain, and the source and target tasks are different.
A few examples of applications of the transfer learning method are listed herein: the detection of faults in chillers [6,7], the forecasting of building energy demand [8,9], the control of HVAC systems [10,11], the detection of faults in solar photovoltaic modules [12], the detection of gas path faults across the turbine fleet [13], the forecasting of indoor air temperature [14], and building information extraction [15]. A detailed literature review of applications of transfer learning to HVAC systems is beyond the scope of this paper.
Liu et al. [6] used a laboratory-controlled set-up of two chillers: the source chiller with a 422 kW (120 tons) cooling capacity and the target chiller with a 703 kW (200 tons) cooling capacity. The paper presented the results of applying two transfer learning strategies to a convolutional neural network (CNN) model used for fault detection and diagnosis (FDD): (a) with complete source chiller working data, and (b) with only partial data available. Several approaches were used, starting from the pre-trained CNN model of the source chiller and applying it to the target chiller:
- (i)
by using the pre-trained model without any fine-tuning; and
- (ii)
by using the pre-trained model for weight initialization, and then (b1) fine-tuning the weights of all layers, or (b2) fine-tuning only the weights of the fully connected layers (see the sketch after this list).
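The difference between strategies (b1) and (b2) can be illustrated with a short Python sketch of selective fine-tuning. This is a generic, hypothetical example, not the code of [6]; it assumes a model with convolutional feature layers followed by a fully connected classifier, and the layer sizes and class count are invented.

```python
import torch.nn as nn

# Hypothetical CNN for chiller FDD: convolutional feature extractor + fully connected head.
class ChillerFDDNet(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(64), nn.ReLU(), nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ChillerFDDNet(n_classes=7)  # weights assumed copied from the source chiller model

# Strategy (b1): fine-tune all layers -- leave every parameter trainable (the default).

# Strategy (b2): fine-tune only the fully connected layers -- freeze the feature extractor.
for param in model.features.parameters():
    param.requires_grad = False
```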
The results showed that (i) the performance obtained by fine-tuning all layers was better than that obtained by fine-tuning only the fully connected layers; and (ii) models trained on data from a particular source chiller are difficult to use directly on the target chiller. The results also revealed that the amount of source domain data did not have a significant impact on the improvement of the transfer learning model; however, the more data in the source domain, the better the stability of the model.
Fan et al. [7] presented the development of an FDD model of a target chiller by using a support vector machine (SVM) with datasets imbalanced in size and diversity. The knowledge from one source water-cooled chiller of 422 kW (120 tons) capacity [6], with a large dataset of normal and faulty operations, was applied to a target water-cooled chiller of 703 kW (200 tons) capacity with a smaller dataset. The use of prior knowledge from the source chiller, along with adaptive imbalanced processing, enlarged the datasets of normal and fault situations, which led to better diagnostic performance of the FDD model of the target chiller.
Qian et al. [8] presented the improvement of daily and monthly forecasting of seasonal cooling loads in a building, which combined load simulation with the EnergyPlus program and transfer learning, using only a small amount of available data. The simulation dataset of 15–21 July 2009 was used for training and validation, and the dataset of the May–October 2010 cooling season was used as the target. An instance-based transfer learning strategy was applied. The results indicated that the transfer learning strategy could improve forecasting accuracy when compared with conventional load forecasting methods such as artificial neural networks (ANN).
Fan et al. [9] presented a transfer learning method for 24 h ahead forecasting of building energy demand, using as a case study 407 buildings randomly selected as the source domain and another 100 buildings as the target domain. The usefulness of transfer learning was evaluated by using two learning scenarios and different implementation strategies. First, a pre-trained baseline model was developed from the source domain operational data. Then, the knowledge learned by the pre-trained model was transferred to target buildings using two implementation strategies: (i) feature extraction, where all the model weights are fixed except for those of the output layer or the last few layers; and (ii) weight initialization, where the weights of the pre-trained model are used for initialization only and are then fine-tuned.
Two learning scenarios were simulated: (a) the available training data are insufficient; and (b) the building data are available but cannot adequately describe building operating conditions. The results obtained in scenario (a) showed that the value of the pre-trained model decreased as the amount of training data increased; more stable results were obtained when the pre-trained model was used for weight initialization. The results in scenario (b) showed that the value of the pre-trained model tends to increase as the amount of training data increases.
Zhu et al. [10] presented a framework for transferring the prior knowledge of an information-rich source screw chiller to build the diagnostic model of a new target screw chiller. Domain adaptation transfer learning was applied to overcome the differences in feature distribution between chillers. An adversarial neural network generated the diagnostic model for the target chiller, which has only easy-to-collect normal operation data, along with the prior knowledge from the source chiller. Results indicated that the transferred diagnostic model for the target chiller yields decent diagnostic performance, and the proposed transfer learning approach showed improved performance compared with conventional machine learning models.
Coraci et al. [11] used homogeneous transductive TL to transfer a deep reinforcement learning control policy for the cooling system from one source building to various target buildings, by using hourly synthetic data from a simulation environment that coupled EnergyPlus and Python. The target buildings were derived from the source building by varying the weather conditions, electricity price schedules, occupancy schedules, and building thermophysical properties. The pre-trained control model of the source building was fine-tuned for each target building, and the weight-initialization TL method was used as the knowledge-sharing strategy between source and target buildings.
Reference [12] used TL with CNN models to detect faults in solar photovoltaic (PV) modules. A CNN model was pre-trained using a dataset of thermographic infrared (IR) images of various anomalies found in solar systems, with the IR images separated according to fault classes. An offline augmentation method was used to increase the classification success when the number of fault images was low. The model was re-trained to extract multi-scale feature maps to classify anomalies. The results showed that the average improvement could reach about 24% compared with the conventional model training strategy.
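Offline augmentation of this kind can be sketched in a few lines of Python. The example below is a generic illustration using torchvision, not the pipeline of [12]; the directory names, transform parameters, and the number of augmented copies are hypothetical.

```python
from pathlib import Path
from PIL import Image
from torchvision import transforms

# Hypothetical augmentation pipeline for thermographic IR images of PV modules.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

src_dir, dst_dir = Path("ir_faults"), Path("ir_faults_augmented")
dst_dir.mkdir(exist_ok=True)

# Offline augmentation: generate and store extra copies of each rare fault image
# before training, enlarging the small fault classes on disk.
for img_path in src_dir.glob("*.png"):
    image = Image.open(img_path)
    for i in range(5):  # 5 augmented copies per original image (hypothetical)
        augment(image).save(dst_dir / f"{img_path.stem}_aug{i}.png")
```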
Li et al. [13] proposed a method that is capable of learning transferable cross-domain features while preserving the properties and structures of the source domain as much as possible. The method was applied to the detection of gas path faults across the turbine fleet. The goal was to complete the diagnostic task of the target domain with the help of fault knowledge learned from the source domain.
Bellagarda et al. [14] tested different TL methods applied to neural network models for the forecasting of indoor air temperature in existing buildings. The results indicated that the implementation of TL extended the forecast horizon by 13.4 h on average, and the model's performance maintained acceptable accuracy in 79% of the cases.
Wang et al. [15] applied TL along with deep neural networks to improve queries of building information modeling (BIM) data. They achieved a high precision of 99.76% on the validation dataset.
1.2. Deep Learning
Deep learning (DL) is a machine learning method [16]. An example of a DL model is the multilayer perceptron (MLP), composed of a mathematical function (which may consist of many simpler functions) mapping a set of input values to output values, where each application of a different mathematical function can be regarded as providing a new representation of the input.
DL refers to machine learning models that include multiple levels of nonlinear transformation. Deep neural network (DNN) models are the application of such strategies to neural networks [17].
Figure 1 shows the multilayer perceptron (MLP), a fully connected DNN model, as an example for the prediction task, with sequential layers of three types: (1) one input layer including n inputs, (2) three hidden layers, and (3) one output layer including m outputs. Each layer is composed of adjustable neurons; the number of neurons in the input layer usually depends on the number of independent inputs, and the number of neurons in the output layer depends on the number of predicted target variables.
A DNN is a type of feed-forward neural network; the summation of the inputs with the corresponding weights and biases (e.g., W(1) and b(1) in Figure 1) is sent to the first hidden layer and goes through a non-linear transformation due to the applied activation function. The information from the first hidden layer, along with the corresponding weights and biases, is sent to the next hidden layer for another non-linear transformation, following the feedforward process. Equations (1)–(3) are listed as examples:

$$h^{(1)} = g\big(W^{(1)T} x + b^{(1)}\big) \qquad (1)$$

$$h^{(2)} = g\big(W^{(2)T} h^{(1)} + b^{(2)}\big) \qquad (2)$$

$$\hat{y} = W^{(3)T} h^{(2)} + b^{(3)} \qquad (3)$$

where h(1) and h(2) are the outputs of the first and second hidden layers, g is the activation function, x is the vector of inputs, ŷ is the output of the output layer, and W^T and b are the transposed weights and the biases of the corresponding layer.
A few examples of recent publications (2015–2022) that focused on prediction or classification tasks are listed in Table 1. A detailed literature review of the application of DNN in the field of HVAC systems is beyond the scope of this paper.
Important hyperparameters of a DNN model are the number of hidden layers and the number of neurons in each hidden layer. A DNN model with more hidden layers usually outperforms one with only a single hidden layer in some respects, such as greater accuracy and the prevention of overfitting. All reviewed papers used between two and six hidden layers; however, there is no general rule for setting the number of hidden layers, except for the recommendation found in [18]. Each hidden layer has either an equal or unequal number of neurons. The rectified linear unit (ReLU) activation function is used in seven papers, while four other papers do not give information about the activation functions (marked as NA in Table 1).
According to [18], the best performance and robustness of a DNN model could be achieved with: (i) two hidden layers; (ii) the number of neurons in the first hidden layer equal to 2 × (2 × n + 1), where n is the number of neurons in the input layer; and (iii) the number of neurons in the second hidden layer equal to (2 × n + 1).
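For example, for a hypothetical model with n = 7 inputs, this rule gives 2 × (2 × 7 + 1) = 30 neurons in the first hidden layer and (2 × 7 + 1) = 15 neurons in the second. A one-line Python helper makes the rule explicit:

```python
def hidden_layer_sizes(n: int) -> tuple[int, int]:
    """Sizing rule from [18]: first hidden layer 2*(2n+1) neurons, second (2n+1)."""
    return 2 * (2 * n + 1), (2 * n + 1)

print(hidden_layer_sizes(7))  # (30, 15) for a hypothetical 7-input model
```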
1.3. Objective: Hypothesis Testing
This paper verifies the hypothesis that DNN models pre-trained for one chiller (called the source chiller), with a small dataset of measurements from 14 days in July 2013, can be applied successfully, by using TL strategies, to the prediction of the operating performance of another chiller (called the target chiller) with different datasets, recorded three years later during the cooling season of 2016. Measurements recorded by the BAS of a university campus are used as a case study.
The measurement datasets, obtained from the BAS, are recorded every 15 min. This short time interval brings a larger variation of the data compared with hourly or monthly measurement data or with synthetic data. The use of BAS trend data is also more challenging in terms of missing, noisy, and erroneous data than the use of data from laboratory-controlled conditions or synthetic data. Hence, achieving high transfer learning performance in this study is more challenging.
The paper proposes the use of homogeneous transductive TL between the source chiller and target chiller, with a few strategies for weight initialization. The prediction performance of DNN models is evaluated over a few validation datasets of the target chiller, and the effect of TL is discussed.
The paper is structured as follows: Section 2 presents the method proposed for the application of transfer learning to the case study of existing chillers; Section 3 presents details of the case study based on measurements from the BAS; Section 4 presents the architecture of the DNN models for the prediction of target variables, together with the training and validation datasets; Section 5 presents the discussion of the results; and Section 6 presents the conclusions from the case study of transfer learning.