Semi-Supervised Transfer Learning Methodology for Fault Detection and Diagnosis in Air-Handling Units

Abstract: Heating, ventilation and air-conditioning (HVAC) systems are the major energy consumers among buildings' equipment. Reliable fault detection and diagnosis schemes can effectively reduce their energy consumption and maintenance costs. In this respect, data-driven approaches have shown impressive results, but their accuracy depends on the availability of representative data to train the models, which is not common in real applications. For this reason, transfer learning is attracting growing attention, since it tackles the problem by leveraging the knowledge between datasets, increasing the representativeness of fault scenarios. However, to date, research on transfer learning for heating, ventilation and air-conditioning has mostly been focused on learning algorithms, overlooking the importance of a proper domain similarity analysis over the available data. Thus, this study proposes the design of a transfer learning approach based on a specific data selection methodology to tackle dissimilarity issues. The procedure is supported by neural network models and the analysis of eventual prediction uncertainties resulting from the assessment of the target application samples. To verify the proposed methodology, it is applied to a semi-supervised transfer learning case study composed of two publicly available air-handling unit datasets containing some fault scenarios. Results emphasize the potential of the proposed domain dissimilarity analysis, reaching a classification accuracy of 92% under a transfer learning framework, an increase of 37% in comparison to classical approaches.


Introduction
The world's energy demand has been increasing dramatically over the last few decades due to rapid economic growth and the incorporation of emerging and developing countries into the industrialization and global value chains. In this regard, commercial and residential buildings account for 30% of the total energy delivered, and this energy use is expected to continue growing in the coming years [1]. Among the different building services systems, the biggest energy consumers are heating, ventilation, and air-conditioning (HVAC) systems, reaching values around 40% of the total building consumption [2], and it is expected that the global energy demand for cooling could triple between 2020 and 2050 [1]. Therefore, an optimal operation of HVAC systems is critical for simultaneously maintaining energy efficiency and thermal comfort [3]. Improper control, performance degradation and equipment faults are problems faced by the vast majority of building energy-related systems, thus resulting in significant energy waste [4]. It has been estimated by multiple studies that a proper operation of HVAC systems can lead to energy savings of around 20-30% while maintaining the same thermal comfort for its occupants [1], that is, savings of nearly 10% of the total energy consumption in a tertiary building.
To this extent, building energy management systems (BEMS) are used to collect and store energy consumption and equipment operational data, as well as to manage HVAC systems' operation. Besides BEMS, the surge of IoT-enabled devices and smart sensors has contributed to a huge increase in the data related to building services and their energy consumption. This massive digitalization is leading to advanced studies focused on optimization strategies, such as energy consumption forecasting [5,6] or advanced control strategies [7,8], aiming to increase the energy efficiency of building HVAC systems. Additionally, the deployment of advanced maintenance strategies is considered one of the research lines with the most potential impact on HVAC energy consumption [9]. Once a fault occurs in one of the HVAC sub-components, it leads to a failure or degradation of the system's performance to meet the setpoint requirements, and a significant portion of energy can be wasted continuously [10]. Indeed, without predictive maintenance strategies, such as fault detection and diagnosis (FDD) models, the system will evolve towards critical damage to the HVAC equipment. In this regard, automated fault detection algorithms are expected to use the data available in BEMS to determine the actual HVAC condition and detect possible deviations from the expected behaviour at an early stage [11]. Automated FDD for HVAC systems has been extensively researched in the last few decades, obtaining reliable results and impressive accuracies when detecting and diagnosing faults. However, the availability of a large number of samples for both the normal and faulty operation regimes is a common requirement for these methodologies. Generally, current studies use specific and complete datasets where all the scenarios are well represented, and very few articles can be found that address the lack of representativeness of one or more faulty classes [9].
On the one hand, some researchers tackle this issue by constructing detailed simulation models where all the operating points can be studied, but as R. Zhang and T. Hong mentioned in [12], these simulations end up being highly complex due to the following reasons: (i) HVAC systems consist of highly interrelated components that may present complex coupling effects (e.g., a fan degradation may affect the supply airflow and/or increase fan power, but it also may affect the heat transfer performance of coils and their energy consumption), (ii) operational faults can present diverse impacts on different aspects of the building performance depending on the seasonal period when they occur, (iii) a particular fault may present different operational characteristics and needs to be simulated accordingly (i.e., static vs abrupt vs degradation faults), (iv) the results may vary depending on the empirical curves or the component model employed, and (v) fault modelling should take into account various modes of operation. On the other hand, some authors suggest recreating all faults on the real equipment and using the newly generated samples to train the model. However, in real applications, obtaining data able to characterize the system is also a difficult task, due to two main reasons. First, it is generally hard to obtain a full training dataset with the same distribution as the online operating conditions because of seasonal variations, building envelope, and occupancy variations that affect the system operation and control. Obtaining it would imply a long data collection phase, including sparse samples of a whole year, before the model can be considered robust and reliable enough. Second, in-service HVAC machines are not allowed to work continuously under faulty conditions, since this involves not only high economic costs but also comfort issues. In addition, when the system experiences a fault, it is hardly ever correctly labelled in the building energy management system.
Therefore, most of the studies end up using synthetic data or limited sets of representative acquisitions, leading to non-generalizable solutions with an intrinsic risk of overfitting that can only be applied to specific equipment. As pointed out by S. Lazarova-Molnar and N. Mohammed in [13], a data-driven FDD framework based on taking advantage of several buildings' data streams will contribute effectively to palliate the effects of a limited set of data. Leveraging information coming from different buildings is still a challenging task, and at the same time, it is expected to be a revolution to promote the massive deployment of FDD systems in the sector [14]. This paradigm shift over classical data-driven approaches is known as transfer learning, and it is defined as the transfer of knowledge among similar applications [15], applying the knowledge learnt from previous tasks and domains to a new one. Transfer learning has been successfully applied to other machinery or condition monitoring fields such as bearings in rotating machinery [16] or nuclear power plant monitoring [17]. In the last few years, several works have been published in the field of HVAC applications, especially those related to chillers [18,19]. Nonetheless, they are usually applied to synthetic datasets [20] that are a replica of the real installation or to data collected in a laboratory [19], where the source and target machines share external conditions and several components, making their problem more similar than in actual industrial and real applications. Few researchers have addressed the issue of applying transfer learning methodologies to highly dissimilar datasets, and it remains unclear how these approaches will respond when pushed to their limits to build a common feature space for the studied domains [21].
Studying the domain and data samples to use during model training has been proven to help increase the model's performance by reducing the negative transfer of knowledge caused by highly dissimilar samples [22]. This is even more important for HVAC equipment, since its size and configuration are customized based on the building characteristics, and greater attention should be paid when analysing the domains intended to be combined [23].
In this respect, this study explores, for the first time, transfer learning in the field of FDD for HVAC-related applications considering a high domain shift between the domain that transfers the knowledge and the application domain, which is the real issue in the field of HVAC FDD. This work contributes to the state of the art by proposing a methodology based on the study of the discrepancy between both domains and the usage of a dissimilarity filter that rejects samples that would potentially produce a negative transfer of knowledge during the training process of the selected algorithm. The inclusion of such a filter is the main contribution of the proposal; therefore, a classical neural network architecture is transferred using a simple parameter-based transfer learning scheme to showcase the potential of the filter. To increase the significance of the study, the typical faults of a facility AHU Variable Air Volume (VAV) system have been considered to validate and evaluate the proposed model, and a comparison between the proposed approach and other current transfer learning methodologies has also been studied. Specifically, the originality of this work can be summarized in the following key aspects:
• The proposal of a cross-domain transfer learning approach for FDD in HVAC systems with high dissimilarity between their operating conditions. The study includes the description of the application framework and a quantitative analysis of the transfer learning performance.
• The use of a parameter-based transfer approach for transfer learning in FDD applications for AHU equipment. The proposed methodology for the leverage of knowledge is also compared to other state-of-the-art transfer learning approaches to ensure the validity of the study hypothesis.
• A robust data instance filter for the AHU datasets, based on the reference model's prediction probabilities to evaluate the similarity between the samples.
This dissimilarity filter is compared to other uncertainty measures commonly found in the literature.

Literature Review
The interest in signal processing methodologies that enable the manipulation of data and information of the systems under monitoring and/or control is a reality in several fields of investigation [24][25][26][27]. All these studies make available valuable knowledge which is used as inspiration for applied studies such as the monitoring of HVAC systems' operation. Research on automated FDD in HVAC systems has been an active research field for more than 20 years. Being a system that contains numerous and simultaneously interacting sub-systems, HVAC equipment can be considered complex and hard to model using physical or mathematical representations. In contrast, data-driven approaches are less affected by the complexity of the system, as they rely solely on historical operation data. Currently, data-driven methodologies are considered the most popular choice in the field, thanks to their reduced development complexity and the fast growth of machine-learning-powered methods [28].
Like in other machinery classification problems, many data-driven techniques have been applied to HVAC FDD. Classical approaches such as Support Vector Machines (SVM) [29], Bayesian Networks [30] and Artificial Neural Networks (ANN) [31] are popular choices in the field. Advanced methodologies such as ensemble learning techniques [32] and deep learning algorithms [33,34] have also gained attention in the last few years. Hence, the state of the art proves the viability of applying data-driven schemes for the supervision of HVAC systems in general, and AHUs in particular, where classical machine learning approaches such as ANN already show a great performance for operation classification tasks. However, FDD models are data intensive, and HVAC systems mostly operate under fairly normal conditions; therefore, the facility manager may need to either (i) wait for a long time to collect enough fault data (even assuming it can be correctly labelled) to train a useful FDD algorithm, or (ii) rely on established industry standards and exhaustively simulate potential scenarios. Neither option takes advantage of the new digitalization paradigm, in which a rich stream of data from well-equipped buildings and different HVAC systems will be available to overcome the problem of needing enough representative data of the system operation.
In the last few years, some strategies have already been proposed to deal with the imbalanced dataset problem arising from the lack of faulty data, with the majority of them falling in the semi-supervised learning field. In [35], Yan et al. developed several classical machine learning methodologies in a semi-supervised manner for AHU FDD, based on the idea of pseudo-labelling those samples with high class confidence and adding them to the training pool, hence improving the models' performance. In addition, Fan et al. [36] followed a similar approach using ANN as their machine learning algorithm. In [37,38], the authors make use of generative adversarial networks (GAN) as a pre-processing stage to address the imbalanced data problem in an AHU and a chiller respectively, by generating a stream of synthetic data with the same distribution as the original dataset. All these proposals, however, are built upon the hypothesis that a scarce number of samples is sufficient to characterize the underlying physical behaviour of the machinery, which is difficult to achieve without falling into an overfitted response in any machine supervision field, especially in HVAC-related applications, which are heavily affected by seasonality.
In this regard, transfer learning aims to provide a more generalizable solution approach [39] to address not only the challenge of having highly imbalanced data, but also the problem of having a fully transferable model applicable to buildings without requiring nearly any historical sample. The idea behind transfer learning is that the knowledge acquired by training a predictor on an existing and fully labelled database can be used in another, similar one with fewer or unlabelled samples.
In the field of HVAC, some studies have been conducted to address the potential of transfer learning, mostly devoted to monitoring, supervision and control of the system by predicting temperature, humidity or energy consumption [20,40,41] to then adjust the control strategy accordingly. In the FDD field, only a few studies have been published using transfer learning techniques applied to chillers. For example, [19] applies domain adversarial neural network (DANN), a common technique used in domain adaptation based on minimizing the classification loss while maximizing the domain confusion, to solve the problem of cross-domain models for two screw chillers. Similarly, the authors of [23] use transfer component analysis (TCA) to adapt the feature space of both the source and target chillers to a more similar sample distribution.
As can be extracted from the extensive review on fault diagnosis cross-domain transfer learning carried out by Zheng et al. [42], while most of the transfer learning algorithms lean on using the similarity between the source and target domains (either indirectly or employing a discrepancy metric that helps to align the source and target dataset distributions), very few study how the dissimilarity of the samples fed to the model can affect its performance, and to what extent it can be improved if these samples are dropped before the actual training phase of the model. In addition, the majority of published studies assume and use datasets with high similarity between the source and target domains and apply transfer learning methodologies without focusing on the particularities of the application, which, in the case of HVAC, is characterized by custom-made systems for each building, so neither the dimensions of the equipment and the ducts, the energy requirements of the machines (fans, compressors, . . . ), the building geometry nor the external variables (i.e., weather, control schedule, occupancy pattern, . . . ) are necessarily similar.
This study provides new insights on the importance of considering dissimilar datasets when dealing with transfer learning in HVAC related applications since it is in these situations where transfer learning methodologies struggle to achieve reliable results [21].
The proposed study considers a transfer learning approach, based on a discrepancy filter, applied to neural network algorithms for fault detection and diagnosis problems in HVAC machinery. It aims to deal with the low and imbalanced number of representative samples by leveraging data from different installations, selected through the discrepancy filter.
The rest of the paper is organized as follows. Section 2 describes the typical AHU system and exposes the common ways of obtaining a model for it. Then the methodology proposed is presented and derived step by step. Section 3 covers the results of the methodology when applied to a real case study consisting of two publicly available datasets, to verify its effectiveness. Section 4 discusses the results presented and finally, Section 5 summarizes this article and presents its conclusions.

Materials and Methods
Given the scarcity of normal and faulty data for a newly installed AHU system, there is a necessity of leveraging the knowledge coming from a different AHU installation to obtain better classification results. However, given that in real HVAC applications it is common to find a large domain shift (i.e., a big dissimilarity between domains that causes classes not to overlap in the classification space), the transfer learning methodology should include a pre-processing stage to align the domains. Figure 1 depicts an overview of the proposed method, which consists of four major parts: (1) construction of the reference model, which is expected to have a high classification accuracy on the source domain; (2) data collection and labelling for the target AHU system; (3) a dissimilarity detection step to discard samples whose corresponding operational conditions differ from the generalization of the source domain; and (4) re-training of the source classifier with the newly collected data from the target domain. The following subsections describe these four parts of the method in detail.
Figure 1. Overview of the proposed methodology. The initial source AHU training dataset consists of labelled samples for all the classes, and the target AHU dataset typically consists of fewer labelled samples that are used to tune the initial model.

Construction of the Reference Model
The first step in the methodology is the construction of the reference model. This model needs to be built from other machines' historical data or simulation models. This data, known as the source domain in transfer learning, is expected to be already clean or to require only minor formatting. In this study, the features are scaled using a simple min-max score, which has been chosen over other methods such as the z-score due to the possible differences between the control signal scales (in the source and target domains for the same variable) or the possible variations in temperature distributions between domains. Min-max scaling is represented by the following equation: x' = (x − x_min)/(x_max − x_min), where x is the feature column in the database and x_max, x_min are the maximum and minimum values of the feature. In addition, another pre-processing step applied in this study is to discard the samples corresponding to periods marked as unoccupied in the datasets. During these times, the AHU machines are set to idle and there are no heating or cooling processes taking place. Using these samples during the training of the model would be counterproductive, as it would mean assigning to the considered classes non-representative patterns that look the same independently of the fault. The assignment of such samples to one of the considered classes would lead to a degraded reference model and a poor transfer of knowledge between both domains, as the classes get over-represented artificially and in a non-significant manner. Considering that the model will be trained using historical databases, the classes contained in these databases will be naturally imbalanced. Therefore, correcting the class representation is a necessary step before undergoing the training process. The dataset balancing approach can vary depending on the class distribution, creating new samples for the under-represented classes or decreasing the samples of the majority class.
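The min-max scaling step described above can be sketched as follows (a minimal NumPy illustration; the helper name and the option of reusing pre-computed statistics are assumptions for demonstration, not the authors' code):

```python
import numpy as np

def min_max_scale(X, x_min=None, x_max=None):
    """Column-wise min-max scaling to [0, 1]: x' = (x - x_min) / (x_max - x_min).

    x_min / x_max default to the statistics of X itself; passing the
    source-domain statistics keeps both domains on the same scale.
    """
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0) if x_min is None else np.asarray(x_min, dtype=float)
    x_max = X.max(axis=0) if x_max is None else np.asarray(x_max, dtype=float)
    return (X - x_min) / (x_max - x_min)
```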
Well-known techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) and NearMiss [43] can be used for the former and the latter, respectively. As exposed in many studies [35,36,44], there is generally a huge class imbalance in favour of the healthy behaviour. Therefore, the strategy followed in this study is to under-sample the healthy class of the source dataset, since it is highly represented, and the source domain AHU is expected to have enough fault data to characterize the faulty conditions, so over-sampling them would only make the training process slower. To perform the resampling, the NearMiss algorithm is employed. After that, the condition classifier is trained following the traditional approach of splitting the data into training and testing datasets.
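The undersampling idea can be illustrated with a minimal NearMiss-1-style selection (a plain-NumPy sketch for clarity, not the imbalanced-learn implementation used in practice): majority-class samples closest, on average, to their nearest minority-class samples are kept until both classes are balanced.

```python
import numpy as np

def near_miss_undersample(X_maj, X_min, k=3):
    """Keep the majority-class samples whose mean distance to their k
    nearest minority-class samples is smallest (NearMiss-1 flavour),
    retaining as many majority samples as there are minority samples."""
    X_maj = np.asarray(X_maj, dtype=float)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances: shape (n_majority, n_minority)
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    # Mean distance to the k nearest minority samples, per majority sample
    mean_knn = np.sort(d, axis=1)[:, :k].mean(axis=1)
    keep = np.argsort(mean_knn)[: len(X_min)]
    return X_maj[keep]
```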

Target Data Collection and Labelling
For the target AHU system, the proposed methodology assumes that, before deployment, the maintenance team will label a small portion of the samples using their knowledge and maintenance records, or will experimentally force the faults during short periods to collect representative examples of them. As in all data collection processes, data cleaning must be performed to get rid of incorrect or inconsistent data that may lead to false conclusions and decrease the models' performance. This stage typically includes missing-value imputation, dropping irrelevant data, removing transient states and outliers, and feature scaling, among others [45]. The details of these procedures are well known in the field and, although they need to be adapted to the available data, the applicable steps and criteria follow quite general rules concerning the integrity and uniformity of the data, such as the ones described in [46].
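As an illustration of such a cleaning pass, a basic sketch might drop rows with missing values and then remove outliers via a z-score rule (the helper and the 3-sigma threshold are assumptions for demonstration; the study only prescribes the general cleaning steps):

```python
import numpy as np

def clean_samples(X, z_thresh=3.0):
    """Drop rows containing missing values, then drop rows where any
    feature deviates more than z_thresh standard deviations from its
    column mean (a simple outlier rule)."""
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]                 # missing-value rows
    mu, sd = X.mean(axis=0), X.std(axis=0)
    z = np.abs((X - mu) / np.where(sd == 0, 1.0, sd))
    return X[(z <= z_thresh).all(axis=1)]           # outlier rows
```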

Dissimilarity Reduction Previous to the Transfer Learning
When carrying out a transfer learning approach, the first and most important assumption made is that both domains share some grade of similarity, supporting the idea of leveraging the knowledge from one to the other. With HVAC equipment, any operating point is defined by a large number of parameters, and this similarity cannot be ensured for real systems, which are affected by different climates or energy requirements. In addition, the seasonality present in their operation affects the distribution, and the same fault can correspond to different operating points. Using samples that are too distinct would harm the knowledge transfer, inferring fault patterns on operating points that are unknown to the reference model, leading to misclassification and therefore decreasing the model's overall accuracy. This fact makes the application of a pure transfer learning methodology inadvisable without first filtering the data on the target domain to some extent. This procedure is somewhat similar to what is called active learning, which groups different techniques behind the idea of prioritizing the data that needs to be labelled/used to have the highest impact on the training of a supervised model. The most popular techniques of active learning are based on calculating the uncertainty of the predictions. It is argued that the more uncertain the prediction is, the more information it can add to the model if its ground-truth value is added to the training set [47]. Different uncertainty measures have been used in the literature. Among the most used is the entropy [48][49][50], which measures the certainty of a classifier by means of the probabilities or scores assigned to each class in the prediction. For a given sample x, the entropy is defined as H(x) = −Σ_{j=1..N} p(y_j|x) log p(y_j|x), where p(y_j|x) represents the probability of class y_j obtained from the classifier for the sample x and N is the number of classes.
When the entropy is high, it means that the classifier evaluates the classes nearly equally for that sample. In other words, it is uncertain about which class to assign to the data. Another common method, called margin sampling, is based on computing the difference between the highest and the second-highest score in the probabilities vector. Thus, the confidence is computed by the following equation: M(x) = p(ŷ_1|x) − p(ŷ_2|x), where ŷ_1 and ŷ_2 are the classes with the highest and second-highest predicted probabilities. In this case, the idea is to select the samples with the lowest value of confidence. In practice, the data evaluated is sorted from highest to lowest entropy, or from lowest to highest confidence, and the use of the samples is then prioritized in that order, selecting the samples where the model is most confused about which class should be assigned. This can be counterproductive if applied to a problem with a large domain shift between the source and target domains, because it will use samples with high uncertainty (i.e., located in the boundaries between previously known classes) to fine-tune the model and will avoid those samples that actually overlap with the previously learned distribution, even if they correspond to a different class.
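Both uncertainty measures can be computed directly from a classifier's probability vector; a minimal sketch follows (the helper names are illustrative, not from the study):

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Prediction entropy H(x) = -sum_j p(y_j|x) log p(y_j|x);
    eps guards against log(0) for zero-probability classes."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def margin_confidence(p):
    """Margin M(x): difference between the highest and second-highest
    class scores; low values indicate an uncertain prediction."""
    top2 = np.sort(np.asarray(p, dtype=float))[-2:]
    return float(top2[1] - top2[0])
```

For instance, a uniform probability vector over four classes has the maximum possible entropy (log 4), while a confident prediction such as [0.7, 0.2, 0.1] has a margin of 0.5.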
In this study, a novel method is proposed based on using the previously constructed reference model as a screener on the labelled target data. Algorithm 1 shows the logic followed to select the training samples. The objective is to obtain a similarity grade between the available data on the target domain and the data belonging to the source domain, so that only the portion of target data that resembles the behaviour learned from the source is used. This technique is a trade-off between simplicity and robustness, providing a similarity estimate for each input vector. Then, by applying a soft threshold, those samples that differ from the source data (i.e., clearly different operating conditions of the HVAC system) can be discarded. This threshold is set empirically by evaluating the ANN probability outputs. In all the cases evaluated, a probability value between 50% and 75% results in good classification. The final decision on the threshold depends on the number of samples available.
Algorithm 1. Selection of target training samples.
1: for each labelled target sample (x_i, y_i) do
2:   Obtain the reference model prediction ŷ_i and its confidence
3:   if the prediction confidence is over a threshold δ then
4:     if ŷ_i = y_i then
5:       Add sample x_i to the target training pool
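A minimal sketch of this selection rule might look as follows (illustrative names; `predict_proba` stands in for the probability output of the reference ANN, and δ is the empirically chosen confidence threshold):

```python
import numpy as np

def dissimilarity_filter(predict_proba, X_tgt, y_tgt, delta=0.6):
    """Keep a labelled target sample only when the reference model
    predicts its true class with confidence above delta; returns the
    indices of the samples admitted to the target training pool."""
    keep = []
    for i, (x, y) in enumerate(zip(X_tgt, y_tgt)):
        p = predict_proba(x)                 # class probability vector
        y_hat = int(np.argmax(p))
        if p[y_hat] >= delta and y_hat == y:  # confident and correct
            keep.append(i)
    return keep
```

Samples rejected by the filter correspond to operating conditions the reference model does not recognize, which would otherwise induce negative transfer during fine-tuning.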

Transfer Learning to the Target Model
Following the same procedure described for the construction of the reference model, in the target model the data needs to be balanced. On this occasion, after the dissimilarity reduction process described above, a new resampling stage is applied, but now aiming at balancing the target dataset, which is assumed to contain a large number of samples from the healthy class and a few samples of the faulty ones. In contrast with the scheme followed for the reference model, where the fault samples are considered sufficient and the majority class is reduced, here the faulty classes are over-sampled using the SMOTE algorithm [4]. In this study, the SMOTE applied makes use of the 5 nearest neighbours of a sample to create a new similar instance. Then, a parameter transfer learning approach is applied, using the reference model weights and biases as the starting point for the target model, and the fine-tuning process is carried out using the filtered target data. The flowchart of the methodology, including all the steps explained in this section, is presented in Figure 2.
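The over-sampling step can be sketched with a minimal SMOTE-style generator (a simplified NumPy illustration of the interpolation idea, not the reference implementation used in the study): each synthetic sample is a random interpolation between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating each
    randomly chosen minority sample towards one of its k nearest
    minority neighbours (SMOTE-style)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]        # k nearest, excluding itself
        j = rng.choice(nbrs)
        lam = rng.random()                   # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because every synthetic point lies on a segment between two existing minority samples, the generated data stays inside the convex hull of the faulty-class observations.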
Regarding the features used to derive the model, the methodology developed only uses variables and measurements directly obtained from the building's energy management system. Typically, these signals consist of both metered sensors and data that is sent to the HVAC system actuators. In an ideal scenario, the sensor network installed in the air-handling units will include temperature, pressure, flow rate and power meters. However, in practice, only temperature sensors are found in the majority of installations, since the rest are relatively expensive and not essential for equipment control. A recent study on feature selection for AHU FDD algorithms [35] showed that among the most important variables to use as features are the air temperatures and humidities at the different stages of the AHU system. Thus, the set of features used in this study to characterize the AHU systems corresponds to information available in the vast majority of HVAC monitoring devices. The features used in this study are listed in Table 1.


Table 1. Features used to characterize the AHU systems.
T_sa — Supply air temperature: temperature at the exit of the AHU, served to the thermal zones after the mixing box, the heating and cooling coils and the supply fan.
T_oa — Outdoor air temperature: temperature of the fresh air fed into the fresh air intake. The volume of air is controlled by the outside air dampers.
T_ma — Mixed air temperature: temperature of the mixing chamber or economizer, where the fresh air is mixed with the exhaust air, allowing the recirculation of the latter.
T_ra — Return air temperature: temperature of the air that leaves the thermal zone, before the return fan.
T_sa,sp — Supply air temperature setpoint: setpoint for T_sa, achieved by modifying the control actions (e.g., opening/closing dampers, heating/cooling the air).

Results and Discussion
To validate the proposed methodology and discuss the results obtained, the real case study is presented in this section, together with the results at each of the aforementioned methodology stages.

Experimental AHU Installations
In order to validate the proposed transfer-learning-based FDD methodology, this study focuses on AHU equipment due to its extended presence in HVAC systems. Indeed, AHUs are the key components for maintaining a comfortable indoor environment and healthy air quality. To facilitate the reproducibility of the methodology and the comparison of its performance, this study uses two publicly available datasets as its source and target domains. On the one hand, the source domain is a real-world AHU faulty operational dataset coming from the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) project RP-1312. This dataset was collected by Li et al. [51] through a series of experiments carried out in the summer of 2007 and the spring and winter of 2008. The research was conducted at the Iowa Energy Center's Energy Resource Station (ERS) located in Ankeny, Iowa. The ERS, depicted in Figure 3a, was established to compare energy-efficiency measures and demonstrate HVAC concepts. Several experiments have been conducted in the ERS, and details of the test facility were provided in [52]. To perform side-by-side testing, the facility is equipped with three AHUs, and the floor plan of the ERS (see Figure 3b) comprises three distinct and separate areas. AHU-1 serves the common areas and the remaining AHUs serve two sets of test systems. AHU-A is identical to AHU-B, and each of them serves 4 test rooms with identical construction and exposures, yielding identical thermal loads. The A rooms are served by AHU-A and the B rooms by AHU-B. In the dataset, one of the AHUs ran under normal conditions while the other ran under several fault circumstances. On the other hand, the target dataset considered in this study comprises experimental data obtained from a single-zone VAV-AHU located at the Lawrence Berkeley National Laboratory (FLEXLAB) in Berkeley, California [53].
The FLEXLAB facility, shown in Figure 4, is one of the most advanced integrated building and grid technologies test beds; since 2014, over a dozen case studies have been developed there to explore energy-efficiency measures. The target dataset was collected during the summer of 2017. The collection time interval for both datasets was 1 min. In general, each fault was tested for one day, meaning that there are at least 1440 samples for each of the faults considered. To ease the understanding of the operation and possible faults of an AHU, a typical single-duct variable-air-volume AHU is shown in Figure 5. The outdoor air is the main input to the AHU system; after being supplied to the AHU, it is mixed with the return air from the multiple zones. It then passes through the cooling/heating coil, where the mixed air reaches the desired temperature. Finally, this air is supplied to the multiple zones. Other major components of the system include the supply and return fans, which modulate the supply and return air volumes; the return/outdoor/exhaust air dampers, which regulate the airflow passing through them; and the heating/cooling coil valves, which adjust the water flow in the coils. Faults occurring in AHUs are related to sensor metering errors, fan speeds, damper positions, and leakage or malfunctioning of the heating and cooling coils.
Typically, an AHU installation can be used for both heating and cooling purposes. In this study, as cooling is the fastest-growing use of energy in buildings [54], the cooling season is used for testing the performance of the methodology. Moreover, to transfer the fault knowledge and verify the methodology, both domains must contain representative samples of the fault conditions present in both. Additionally, some of the samples in the target dataset can be used to verify whether the fault pattern has been successfully transferred from the reference model, thus validating the fault detection and diagnosis. With these constraints, four conditions are evaluated: the normal operation data, denoted F0, and three typical fault types, denoted F1 to F3. To provide some insight into the behaviour and similarities of the considered source and target AHU systems, Figure 6 depicts the difference in some of the variables used as features during two consecutive days corresponding to samples labelled as healthy in the databases. The outdoor temperature comparison is shown in Figure 6a. As can be seen, the difference between the measured temperatures is in the 7-15 °C range, given the different locations of the facilities used to generate the datasets. The temperature profile corresponds to the average behaviour on a summer day, where the ambient temperature is highest between 12:00 h and 16:00 h. From Figure 6b, one can see that the temperature achieved by the AHU to be supplied to the building spaces ranges between 22 °C, when the cooling system is off, and 12.5 °C, when the outdoor air is cooled. As can be seen, the supply temperature is nearly the same for both datasets; therefore, the offset in outdoor temperature is compensated by changes in the systems' components or their operation.
These changes can already be perceived in Figure 6a: although both AHUs share the same rule-based schedule, turning the cooling regime on at 6:00 h and off at 18:00 h, there is a delay in the reduction of the temperature in the supply air duct. The differences in the control and operation regime of the AHU systems are more evident in Figure 6c-e, where some of the actuator variables are presented. There are changes in the magnitude of the variables, which may be caused by differences in the components' dimensions or specifications. The feature scaling applied will help to mitigate these magnitude differences. However, there are also changes in the shape of the curves, especially for the outdoor air damper (Figure 6c) and the supply air fan control signal (Figure 6d). Both signals take continuous values between 0 and 1. For the outdoor air damper, 0 means a fully closed damper and 1 means that the sheets are in the fully open position. In the case of the supply air fan, its speed is variable: a value of 0 corresponds to a stopped fan, while 1 corresponds to the fan working at full speed. In addition to the turn-on delay, the signals show how the control algorithm of the target domain reduces the outdoor air intake during the hours when the outdoor temperature is at its maximum, and how the turn-off is conducted more smoothly in the target domain than in the source domain. Figure 6e shows the signal that controls the opening of the cooling coil valve, which, like the other two, takes values in the interval between 0 and 1. The main differences observed are the delay in the first opening of the valve and the presence of spikes in the source domain measurement.
These kinds of differences illustrate well the challenge that motivates the use of a transfer learning methodology: there are clear similarities in how the systems work and how the faults affect the measured variables, but in real applications there are also differences in the control and operation algorithms, and it is precisely these differences that current transfer learning studies for HVAC do not address.

Methodology Analysis
This section covers the results of the methodology following the steps depicted in Figure 1. To present these results in an orderly way, it has been split into three subsections: the first covers the construction and validation of the reference model and its use as a baseline to test performance over the target dataset; the second presents the results obtained with the methodology, preceded by a comparison of the dissimilarity filter proposed in this study against the other approaches described in Section 2.3; and the third covers the comparison of the presented methodology with current transfer learning alternatives.

Model's Overall Accuracy Validation
First, an ANN-based reference model is built using the source domain dataset for training (i.e., ASHRAE RP-1312). The considered neural-network-based model is a simple structure consisting of 3 hidden layers with 150, 100 and 50 neurons that use a ReLU activation function. This configuration, that is, the number of layers, number of neurons and activation function, has been designed following common empirical procedures found in the corresponding literature, based on the available data size and the resulting performance [36].
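As a hedged illustration of such a structure, the following sketch builds an equivalent network with scikit-learn's `MLPClassifier` (the framework used by the authors is not stated, and the features here are synthetic stand-ins for the scaled AHU variables):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))       # stand-in for scaled AHU features
y = rng.integers(0, 4, size=400)    # four classes: F0 (normal), F1..F3

# 3 hidden layers with 150, 100 and 50 neurons, ReLU activation,
# trained for 100 epochs as described above
model = MLPClassifier(hidden_layer_sizes=(150, 100, 50),
                      activation="relu",
                      max_iter=100,
                      random_state=0)
model.fit(X, y)
print(model.n_layers_)  # 5 layers in total: input + 3 hidden + output
```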
As a result of the data balancing step, the source model is trained using a total of 10,028 samples, 4977 of which are normal-operation data selected using the NearMiss algorithm; the remaining samples are evenly distributed among the fault classes. The ANN model is trained for 100 epochs. The confusion matrix obtained for the fault diagnosis is shown in Figure 7, where the reference model shows a global accuracy of 99%, verifying that the source domain contains a good representation of the considered faults. The model correctly labels the healthy class (F0), raising a low number of false positives. The classification accuracy for the fault classes is also high, with only minor difficulties for the F2 class, some of whose samples are confused with normal operation. This behaviour may be due to seasonality: F2 corresponds to a leak in the heating coil, but the fault has been characterized during the cooling season, when the temperature setpoints require cooling the supply air rather than heating it, hiding the possible effects of a fault in the heating system.
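The NearMiss selection used for the normal class keeps the majority samples closest to the minority classes. The following is a simplified NearMiss-1-style sketch (keep the majority samples whose mean distance to their k nearest minority samples is smallest); it is a stand-in for the full imbalanced-learn implementation, shown here with synthetic data:

```python
import numpy as np

def nearmiss_select(X_maj, X_min, n_keep, k=3):
    # pairwise Euclidean distances from each majority to each minority sample
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    # mean distance to the k nearest minority neighbours
    mean_k = np.sort(d, axis=1)[:, :k].mean(axis=1)
    # keep the n_keep majority samples closest to the minority class
    return X_maj[np.argsort(mean_k)[:n_keep]]

rng = np.random.default_rng(1)
X_normal = rng.normal(size=(100, 4))            # over-represented F0 data
X_faults = rng.normal(loc=2.0, size=(20, 4))    # minority fault data
X_kept = nearmiss_select(X_normal, X_faults, n_keep=20)
print(X_kept.shape)  # (20, 4)
```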
Applying the model directly to the target installation does not lead to acceptable performance, since the model has no information about that machine. Figure 8 shows the accuracy of the reference model when tested against the target domain dataset, following a direct FDD assessment. As can be seen, numerous target samples fall in the classification region of F3, clearly showing the dissimilarities between both domains, since a great part of the target space falls in the region previously defined as F3 faults during training. This behaviour is somewhat expected, since the difference in cooling system performance can fool the reference model into diagnosing a fault when the system is actually working under normal conditions. It should also be noted that directly using the training data collected on the target HVAC system as the only source of data is not enough to correctly model the target HVAC. Figure 9 depicts the performance of a model with the same structure but trained using only normal operation data and 30 samples for each of the fault classes; it shows that, whilst the F2 class seems easily diagnosed with no more than 30 samples, there is severe overlap between samples belonging to the F1 class and normal operation, and partial overlap between samples belonging to the F3 class and normal operation.

Dissimilarity Filtering Comparison
After obtaining the reference model, it is used to filter out target samples that do not show similarity with the already seen conditions. Here, Algorithm 1 with a decision threshold of 50% is compared to the different uncertainty sampling techniques defined in Section 2.3. Table 2 contains a comparison between the proposed methodology, based on using the reference model itself as a filter, and entropy sampling, margin sampling and random sampling. In all cases, the fault classes are balanced before feeding them to the model. The metrics chosen for the comparison are the F-score, precision and recall of the classifier's output on the target test dataset. For random sampling, 200 data instances are selected randomly from the target training pool and, as can be seen, the performance obtained with this sampling filter is rather poor, with precision and recall near 70%. The remaining uncertainty sampling techniques select a greater number of data samples (around 3000) than the proposed method (234 samples) and, despite this difference, their performance is between 7% and 10% lower. This drop in performance can be explained by the common nature of uncertainty sampling methods: approaches based on selecting the most informative samples for the model tend to choose samples near the model's decision boundaries to increase the model's knowledge. To show this behaviour, a t-distributed stochastic neighbour embedding (t-SNE) representation of the whole training dataset (i.e., the source samples plus the selected target samples) is depicted in Figure 10. This t-SNE map shows that the uncertainty sampling approaches select samples corresponding to a highly uncertain cluster of the target dataset, which does not fall over the distribution of the source domain.
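The sample-selection rules compared in Table 2 can be sketched over a matrix of class probabilities produced by the reference model (rows are target samples, columns F0..F3). The 50% confidence rule below is our reading of Algorithm 1 and may differ in detail from the authors' exact implementation:

```python
import numpy as np

def confidence_filter(proba, threshold=0.5):
    # proposed filter: keep only samples the reference model is confident about
    return np.max(proba, axis=1) > threshold

def entropy_scores(proba):
    # entropy sampling: higher entropy = more uncertain = more "informative"
    return -np.sum(proba * np.log(proba + 1e-12), axis=1)

def margin_scores(proba):
    # margin sampling: small gap between top-2 classes = near a decision boundary
    top2 = np.sort(proba, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

proba = np.array([[0.90, 0.05, 0.03, 0.02],   # confident -> kept by the filter
                  [0.30, 0.28, 0.22, 0.20]])  # uncertain -> discarded
print(confidence_filter(proba))  # [ True False]
```

Note the opposite intent of the two families: the uncertainty criteria would prioritize the second row, which is exactly the kind of dissimilar sample the proposed filter discards.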
With these samples, the algorithms select unseen behaviour of the target machine; but since this behaviour does not fall within the already known region, in a transfer learning methodology applied to HVAC systems it is a probable cause of distortion of the already known space, hence contributing to negative transfer and leading to a performance decrease of the model. The reference model is then also used as the starting point for the final model, and a parameter transfer approach is used to fine-tune the model parameters with the target data instances. The samples selected for the fault conditions and the normal operation are balanced using SMOTE and added to the training dataset. The neural network is then trained for 300 epochs to achieve acceptable diagnosis accuracy. Figure 11 shows the confusion matrix after the transfer learning, where the model erroneously predicts roughly 11% of F3 samples as F1. This is not a severe issue since, although the model fails to diagnose which of the faults is occurring, it does not end up as a false negative, and a warning is raised to the building administrator; this will trigger a maintenance action that could probably identify the model's misdiagnosis. In addition, 15% of F1 samples are confused with normal operation. This will raise false alarms that will reduce the administrator's trust in the model and the overall efficiency of the maintenance service. Comparing the results obtained using only the target samples to build a classical machine learning model (Figure 9) with those of the proposed methodology (Figure 11), it can be concluded that the model's ability to differentiate F1 from F0 has critically benefited from the knowledge transferred from the source domain.
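The balancing and fine-tuning step can be sketched as follows. SMOTE is approximated here by per-class linear interpolation between samples, and the parameter transfer is emulated with scikit-learn's `warm_start`, which makes the second `fit` continue from the reference model's weights; all names and data are illustrative, not the authors' implementation:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def smote_like(Xc, n_new, rng):
    # synthetic points on segments between random pairs of same-class samples
    i, j = rng.integers(0, len(Xc), size=(2, n_new))
    lam = rng.random((n_new, 1))
    return Xc[i] + lam * (Xc[j] - Xc[i])

rng = np.random.default_rng(2)
X_src = rng.normal(size=(200, 6)); y_src = rng.integers(0, 4, 200)
X_tgt = rng.normal(loc=0.5, size=(40, 6)); y_tgt = rng.integers(0, 4, 40)

# reference model trained on the source domain; warm_start keeps its
# weights so that the second fit acts as fine-tuning (parameter transfer)
model = MLPClassifier(hidden_layer_sizes=(150, 100, 50), activation="relu",
                      max_iter=50, warm_start=True, random_state=0)
model.fit(X_src, y_src)

X_new, y_new = [], []
for c in np.unique(y_tgt):
    Xc = X_tgt[y_tgt == c]
    X_new.append(smote_like(Xc, 10, rng))   # oversample each target class
    y_new.append(np.full(10, c))

X_ft = np.vstack([X_src, X_tgt] + X_new)
y_ft = np.concatenate([y_src, y_tgt] + y_new)
model.fit(X_ft, y_ft)   # fine-tune starting from the reference weights
print(X_ft.shape)
```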
The overlap between classes has been successfully resolved using the proposed transfer learning methodology, since the percentage of false negatives in the test set has been reduced from 30% of the fault samples to nearly 5%; the case of F1 is especially noteworthy, with its percentage of false negatives reduced by 58%.
The typical classification metrics for the results discussed above are collected in Table 3. Additionally, the evaluation of the classification results without applying the dissimilarity reduction is also included. As can be seen, the performance of the model without dissimilarity reduction and retraining is very poor. This is expected, since new operating points, far from the transferred knowledge, are being added to the training pool. The model thus assumes that source patterns generalize to any operating point, which leads to evaluating it in uncertain regions of the feature space and ends up increasing the diagnosis error.
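The metrics reported in Tables 2 and 3 (precision, recall and F-score, macro-averaged over the four classes) can be reproduced for any prediction vector with scikit-learn; the labels below are a toy example, not the paper's data:

```python
from sklearn.metrics import precision_recall_fscore_support

# toy labels: 0 = F0 (normal), 1..3 = F1..F3
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(round(prec, 3), round(rec, 3), round(f1, 3))  # 0.792 0.75 0.742
```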

Comparison with Other Transfer Learning Methodologies
The main hypothesis of this study is that, in the development of HVAC FDD systems, the dissimilarity caused by the characteristics and particularities of each building and HVAC installation complicates the application of a transfer learning methodology unless one first studies the nature of the data to be transferred and examines the similarity of the domains. To validate this hypothesis, this section includes a comparison with several state-of-the-art transfer learning methodologies applied in the field. The results of evaluating 7 different methods (in addition to the proposed one) are collected in Table 4. The techniques evaluated fall into two categories of transfer learning: (i) feature-based transfer learning, which relies on common features with similar behaviour in both the source and target domains; and (ii) instance-based transfer learning, which re-weights the labelled training samples to correct the difference between the distributions during training. The feature-based methodologies compared are:
• Domain adversarial neural network (DANN), replicating the network architecture and training parameters used in [19]: a fully connected 3-layer feature extractor with 17, 14 and 11 neurons, followed by a 4-layer (11, 9, 7 and 5 neurons) label predictor and a 4-layer domain discriminator with 11, 8, 5 and 2 neurons. The learning rate is set to 0.001 and the network is trained for 400 epochs using the Adam optimizer;
• Subspace alignment, a classical transfer learning technique that linearly aligns the source and target domains in a reduced PCA space. It has been used in [55] for activity recognition in one building using features learnt in another. For this implementation, a 3-component PCA is applied as a pre-processing stage before the ANN architecture described in the proposed methodology;
• Feature augmentation, which increases the number of features by separating the existing ones into three classes: source-specific features, target-specific features, and general features that behave the same in both domains. As in the previous approach, it is a pre-processing stage added before the ANN defined for the proposed methodology;
• Correlation alignment (CORAL), which minimizes the domain shift by aligning the second-order statistics of the distributions. As with the rest of the methodologies in this category, it performs a transformation on the input data before it is fed to the ANN of the proposed methodology. Here, the regularization hyperparameter is set to 1 × 10−5, a common value in the literature; and
• Transfer component analysis (TCA), an efficient dimensionality reduction technique for aligning marginal distributions. It is used in [23] in combination with an SVM for transfer learning in chiller FDD, achieving F-scores between 0.7 and 0.9, and in [20] for the prediction of temperatures in residential buildings. Here, both TCA-SVM and TCA-ANN were tested using a radial basis function as the TCA kernel and a trade-off parameter of 0.1. The ANN variant of TCA obtained better results and is the one reported in Table 4.
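Of the feature-based techniques above, CORAL has a particularly compact form: whiten the source features with the source covariance and re-colour them with the target covariance. A sketch using the same regularization value of 1 × 10−5 (synthetic data; a stand-in, not the authors' implementation):

```python
import numpy as np

def coral_align(Xs, Xt, reg=1e-5):
    # align second-order statistics: whiten the source features with the
    # regularized source covariance, then re-colour with the target covariance
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + reg * np.eye(d)
    Ct = np.cov(Xt, rowvar=False) + reg * np.eye(d)
    def mat_pow(C, p):
        w, V = np.linalg.eigh(C)        # matrix power of a symmetric PSD matrix
        return V @ np.diag(w ** p) @ V.T
    return Xs @ mat_pow(Cs, -0.5) @ mat_pow(Ct, 0.5)

rng = np.random.default_rng(3)
Xs = rng.normal(size=(500, 4)) * np.array([1.0, 3.0, 0.5, 2.0])  # shifted source
Xt = rng.normal(size=(500, 4))                                   # target
Xs_aligned = coral_align(Xs, Xt)
# after alignment the source covariance matches the target covariance
print(np.allclose(np.cov(Xs_aligned, rowvar=False),
                  np.cov(Xt, rowvar=False), atol=1e-3))
```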
The instance-based approaches included in the comparison are:
• Kullback-Leibler importance estimation procedure (KLIEP), which re-weights the source samples by minimizing the Kullback-Leibler divergence between the source and target domains. In [56], the authors use this methodology to estimate occupancy in smart buildings. Here, the hyperparameters of the KLIEP estimator are the same as in [56], and the base estimator is the same as in the proposed methodology; and
• Transfer AdaBoost (TrAdaBoost), which applies the reverse boosting principle to reduce the weights of poorly predicted source samples and increase the weights of the target samples. The authors of [57] use this methodology for medium-term energy prediction of buildings. Here, a total of 10 ANN estimators are trained for 200 epochs each, using a TrAdaBoost learning rate of 0.1.
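The reverse-boosting idea behind TrAdaBoost can be illustrated with a single reweighting step (a simplified sketch of one iteration, not the full algorithm): source samples the current learner misclassifies are down-weighted, while misclassified target samples are up-weighted, so subsequent learners focus on target-relevant patterns.

```python
import numpy as np

def tradaboost_step(w_src, w_tgt, err_src, err_tgt, eps_t, n_rounds):
    # err_src / err_tgt are boolean arrays, True where the learner is wrong;
    # eps_t is the weighted error rate on the target samples
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))
    beta_tgt = eps_t / (1.0 - eps_t)
    w_src = w_src * beta_src ** err_src.astype(float)     # shrink bad source weights
    w_tgt = w_tgt * beta_tgt ** (-err_tgt.astype(float))  # grow bad target weights
    total = w_src.sum() + w_tgt.sum()
    return w_src / total, w_tgt / total                   # renormalize

w_src = np.full(4, 0.125)
w_tgt = np.full(4, 0.125)
err_src = np.array([True, False, False, False])
err_tgt = np.array([True, False, False, False])
ws, wt = tradaboost_step(w_src, w_tgt, err_src, err_tgt, eps_t=0.25, n_rounds=10)
print(ws[0] < ws[1], wt[0] > wt[1])  # True True
```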
In all cases, the datasets have been pre-processed in the same way as in the proposed methodology, except for the application of the designed dissimilarity filter. As shown in Table 4, most of the compared transfer learning methodologies perform poorly in a situation with a high domain shift and may not be suitable for FDD purposes without a previous alignment stage. The most promising approach is feature augmentation, which nearly matches the performance of the proposed methodology. This technique accurately classifies the F2 and F3 faults, but the F1 class is severely confused with the healthy class. A similar issue arises with the TrAdaBoost model, where more than 92% of the F2 and F3 samples are correctly classified, but nearly 50% of F1 samples are confused with the healthy class. The rest of the methodologies fail to align the domains given the sample disparity between the two datasets.

Discussion
The objective of this study was to evaluate the performance of transfer learning methodologies in an FDD problem under unfavourable conditions. Leveraging the information and knowledge from different HVAC installations to generate a model is the way forward to achieve the wide implementation of predictive or preventive maintenance systems for this kind of equipment. Generating the historical data necessary to produce a reliable fault diagnosis system has a high cost, not only monetary but also in terms of time, since the data collection has to cover the different operating conditions, which are affected by the external weather. However, directly transposing the methodologies and algorithms that have achieved great results in other fault diagnosis problems has an important flaw: the particularities of the field should be considered. HVAC installations are singular systems created on a case-by-case basis and, although manufacturers can sell pre-configured units, the control scheme or the thermal load and behaviour of the zones served cannot be defined beforehand. In other words, each HVAC system works in a unique installation, and the similarity between installations cannot be taken for granted.
This study demonstrates that high fault diagnosis accuracy can be achieved using a transfer learning approach, but not without adapting the methodologies to the HVAC FDD field. The results show that, while the neural network architecture trained only on the source domain obtains an F-score of 0.55 in the diagnosis of target faults, when it is retrained with the target data instances the accuracy does not improve; instead, the F-score drops to 0.29, showing a degraded performance of the FDD method. Additionally, the application of popular transfer learning techniques yields F-scores that, in most cases, are below 0.60 and therefore cannot be considered reliable enough for an FDD system. Taken together, these findings suggest that, contrary to what has been reported in current studies, transfer learning cannot be applied while overlooking the particularities of the field, and requires a filtering mechanism to avoid the problem of negative transfer. This work has presented an innovative strategy to filter these samples through a pre-processing stage that discards dissimilar samples before proceeding with the model fine-tuning. This reduces the number of samples needed for model generation and at the same time ensures the creation of a robust model. The results obtained when applying the filter to the target training samples show that the F-score of the final model can be increased from 0.29 to over 0.90, revealing the influence of domain similarity.
The data presented here emphasize the importance of studying the similarity between the domains intended for transfer learning. In this regard, there are three important aspects of the presented methodology to highlight, which define the innovation introduced by the paper:

• This study is not subject to the hypothesis of high similarity between the domains used, which is a common requirement in published articles related to transfer learning in the field. This is the main advantage of the study compared with current studies in the field;
• It provides an easily transferable scheme that requires little effort to be adapted and applied to a new installation without recorded historical data. In comparison to other data-driven approaches, it does not need a huge amount of labelled data, which would imply long data collection periods, or a high number of training epochs, which favours overfitting of the resulting model; and
• It uses a data-instance filter generated from the reference model, which serves both to characterize the patterns for their later recognition and to evaluate the dissimilarity between the domains, detecting uncorrelated samples.
This work can be beneficial for the community by drawing attention to the fact that domain similarity is a problem in current studies and real HVAC applications. The main limitation of the study lies in the fact that, given the nature of the filter, it is not applicable to cases where the diagnosis of different fault severities is needed; in such cases, the samples of different severities could be filtered out. Additionally, even a full fault class could be rejected by the filter if it does not resemble any condition seen in the source dataset. In this kind of situation, current state-of-the-art methodologies showed better performance. In this respect, the proposed study provides the basis for the development of transfer learning methodologies for the fault diagnosis of AHU equipment when dealing with dissimilar domains. Future research in transfer learning for HVAC FDD is required to study the effects of negative transfer during the training process and the possibility of combining the methodology with an online incremental learning approach, in order to observe the effects of the plasticity-stability dilemma in HVAC FDD applications. Incremental learning can help to deal with the biggest limitation of this and other current transfer learning studies in the field, which is accommodating new fault classes while avoiding the forgetting of previously known faults.

Conclusions
This paper presents a methodology to implement a robust transfer learning approach for HVAC-related applications. In particular, this study deals with the supervision and maintenance of AHU equipment by means of a fault detection and diagnosis classification model. The developed framework combines a traditional machine learning technique, the ANN, with a training scheme based on leveraging the knowledge acquired in a similar problem to select the samples used during training and to filter out non-significant samples. The findings of this study have a main implication for transfer learning applied to HVAC systems: the demonstration of the need for a high degree of similarity between the domains employed. In this regard, the method presented in this article can be applied to deal with this issue, since it has demonstrated reliable accuracy when working with high domain shifts.