Integrated Unfold-PCA Monitoring Application for Smart Buildings: An AHU Application Example

This paper presents a complete methodology, together with its implementation as a web application, for monitoring smart buildings. The approach uses unfold-Principal Component Analysis (unfold-PCA) as a batch projection method and two statistics, Hotelling's T-squared (T²) and the squared prediction error (SPE), for alarm generation, resulting in two simple control charts independent of the number of variables involved. The method consists of modelling the normal operating conditions of a building (entire building, room or subsystem) with latent variables expressed by the principal components. Thus, the method allows detecting faults and misbehaviour as deviations of the previously mentioned statistics from their statistical thresholds. Once a fault or misbehaviour is detected, the isolation of the sensors that contribute most to the detection is proposed as a first step for diagnosis. The methodology has been implemented under a SaaS (software as a service) approach to be offered to multiple buildings as an on-line application for facility managers. The application is general enough to be used for monitoring complete buildings, or parts of them, using on-line data. A complete example of use for monitoring the performance of the air handling unit of a lecture theatre is presented as a demonstrative example, and the results are discussed.


Introduction
Current building performance and efficiency demands (e.g., 2018/844/EU Directive) require accurately controlling and supervising technical equipment (e.g., lighting, heating, ventilation and air conditioning/air handling units (HVAC/AHU), etc.) to ensure that they operate adequately for user activities and building uses. The evident interaction among subsystems and the need to operate the building as a whole, instead of as a set of separate technical solutions, mean that enhanced tools to manage, interpret and elaborate data to offer valuable action and decision-support mechanisms are required. Solutions offering an integrated view and actuation capability are usually referred to as building automation solutions, and the buildings implementing them fall into the category of smart buildings.
The advent of the Internet of Things (IoT) and wireless sensor networks (WSN) in the building automation sector has enormously facilitated the integration and accessibility of data through the deployment of middleware, gateways, edge and cloud computing solutions. Within this ecosystem of integrated solutions, and aiming to reduce the energy gap in buildings, projects such as HIT2GAP (http://www.hit2gap.eu/) or CROWDSAVING (TIN2016-79726-C2-2-R) have developed solutions that support both data integration and the interoperability of tools to improve energy management, assessment and monitoring (i.e., the open platform BEMServer, https://www.bemserver.org). In this way, large amounts of data are available to enhance building monitoring. This paper presents the results of integrating a multivariate statistical analysis method as a web service in such building management platforms to offer automated fault detection in highly instrumented buildings. The methodology underlying the module is based on an extension of principal component analysis (PCA) that considers the existence of daily profiles (unfold-PCA). Two operation modes, modelling and monitoring, allow easy adaptation to any building or subsystem. Modelling is performed from historical data gathered by the selected sensor set, whereas monitoring evaluates new data according to the reference model previously obtained. The multivariate approach allows detecting and isolating abnormal behaviours affecting either the correlation structure (squared prediction error (SPE) statistic) or the magnitude and variations of the operating point (T² statistic). The multivariate statistical approach allows merging variables from different subsystems, or external sources (e.g., weather agencies), into the same model to gather correlations, and it has the capability to separate uncorrelated noise in the residual space. The unfold approach allows dealing with non-linear behaviours resulting from daily variations in the operating points.
Based on this principle, detection is reduced to monitoring two statistical indices, known as Hotelling's T² and the squared prediction error (SPE, or Q), independent of the number of variables being monitored. This methodology has been implemented within the HIT2GAP project as a web service that is easily integrable into any building energy monitoring solution.
PCA has been validated as a valuable technology for the building sector in many research works. Thus, PCA has been applied to building monitoring as a simple dimension reduction technique [1] before applying data-mining tools, or to model and monitor several aspects such as human activities for outlier detection [2], the heating evaluation of school buildings [3] or the analysis of seasonal variations in electricity [4]. PCA has been demonstrated to be useful in the building sector to enhance energy monitoring and, in particular, it has been applied for monitoring technical systems such as HVAC (heating, ventilation and air conditioning) and its subsystems, with enhanced capabilities for fault detection and diagnosis. Some initial work reporting on the use of PCA in an air handling unit (AHU) can be found in [5,6]. The authors proposed focusing on detecting sensor faults in AHU systems by analysing the magnitude of the SPE statistic. Once abnormal behaviour is detected, a contribution analysis is then carried out to isolate the original variables involved in the fault. PCA has also been proposed to monitor and detect bias in sensors of VAV (variable air volume) systems by evaluating the consistency of the outdoor air temperature and the AHU supply temperature in the residual space [7,8]. More recently, a similar approach [9] proposed using PCA to detect failures in a heat recovery unit, in an attempt to simplify maintenance activities. Some authors have even proposed wavelet pre-processing to enhance the frequency content of PCA input signals [10] in an HVAC monitoring system. All these research works consist of PCA decomposition, followed by SPE-based detection of faults/misbehaviour and an SPE contribution analysis to identify the variables contributing the most to the out-of-control (in a statistical sense) situation.
Despite the interest in PCA, other techniques that do not implement dimensionality reduction, such as the augmented kernel Mahalanobis distance [11,12] or recursive transformed component statistical analysis (RTCSA) [13], can be found in the literature. However, the continuous rise in the number of sensors deployed in smart buildings makes techniques that provide dimensionality reduction attractive.
In statistical modelling, other modelling techniques can be found, mainly based on neural networks [14]. Neural-network-based techniques perform well in monitoring and are capable of modelling linear and non-linear correlations among the variables, but they have a major drawback: neural networks are black boxes, so their capability of providing an analysis of the fault is limited, and when modelling buildings the capability of providing a fault diagnosis is very important. Neural networks also present high computational costs when dealing with high volumes of data.
A novelty of this paper is the use of unfold-PCA, or multiway PCA, as the underlying technique to manage daily profiles and capture intra-day sensor correlations in the statistical model. Thus, the time-window paradigm is used to define daily records as batch observations (e.g., variables sampled every quarter-hour or hour during a day). Unfold-PCA is a common way of monitoring systems with multiple time repetitiveness. It was introduced by [15] and has been proposed in industry for batch process monitoring (e.g., wastewater plants [16], on-line plant stress monitoring [17]). The authors introduced and adapted the unfold-PCA method for modelling and monitoring energy in buildings [18,19], in order to exploit further forms of repetitiveness present in building operation beyond simple time repetitiveness.
A second contribution of this paper is the use of Hotelling's T² statistic in AHU monitoring and the analysis of contributions in unfold-PCA. Including Hotelling's T² and its contributions in the AHU monitoring scope enables detecting not only sensor faults or misbehaviours affecting the modelled sensor correlations but also variations from normal operation. In comparison to SPE, Hotelling's T² will detect situations where the system variables present abnormal values but the correlations remain the same as the modelled ones. This makes Hotelling's T² a valuable tool for detecting efficiency problems, especially when merging external data. Moreover, folding the contributions improves the localisation of faults and their graphical representation.
A third contribution of this paper is the proposal of implementing the methodology as web services capable of managing multiple models of the same system (e.g., created for each season/working mode) or even managing diverse buildings in a unique server. This allows direct access to the variables available from the building management system, so the method can enrich the models with additional data sources (e.g., occupancy, weather, comfort, CO2, etc.). It also allows monitoring buildings, or parts of them, as a whole (technical equipment and environment variables) instead of monitoring every subsystem as an isolated element. Monitoring the building as a whole allows detection of not only process faults but also behavioural changes.

Materials and Methods
The unfold-PCA technique and the development of the on-line tool for monitoring buildings (including functionalities, architecture and implementation details) are described in the following subsections.

Unfold-PCA Background
The unfold-PCA method essentially consists of three basic steps: modelling, monitoring (including fault detection) and fault isolation. Unlike other methods in the HVAC monitoring literature, this one not only uses the information in the residual space, represented by the SPE indicator, but also considers the information contained in the projection space by using Hotelling's T² as a quality distance to evaluate an observation. For a complete description of the method involving unfold-PCA for monitoring buildings, see [18].
The exploitation of such methodology for building monitoring implies two operation steps: modelling and monitoring. The first requires historic data recorded during the building's normal operating conditions, whereas the second is applied on-line every time a new observation is available for evaluation.
During the modelling step, observations throughout the normal operating conditions (NOC) of the system (building, room, systems, HVAC, etc.) are used. Input observations consist of records of all the variables considered, collected during complete days and uniformly sampled. These are usually quarter-hourly or hourly samples, but no restrictions exist in terms of sampling frequency. The observations are organised into rows containing as many data points, m, as the number of variables, j, times the number of samples in a day, k; in other words, m = j × k. If a set of n observations is used, the result is an n × m matrix named X. The unfolded variables (columns) in the matrix are auto-scaled (zero mean and unit variance) to eliminate trajectories in the windowed observations (i.e., the daily average profile) and thus avoid variables with large magnitudes (i.e., peak hours) dominating. This step overcomes the limitation of only modelling linear correlations of variables around an operating point.
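To make the unfolding concrete, the row construction and auto-scaling described above can be sketched in a few lines of NumPy (a hedged illustration; the function names and toy dimensions are ours, not the paper's implementation):

```python
import numpy as np

# Sketch of the unfold step: daily records of shape
# (n days, k samples per day, j variables) become an (n x m) matrix
# with m = j * k, whose columns are then auto-scaled.
def build_unfolded_matrix(daily_records):
    n, k, j = daily_records.shape
    return daily_records.reshape(n, k * j)   # one day per row, m = j * k

def autoscale(X):
    """Zero-mean, unit-variance scaling of every unfolded column."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0                  # guard against constant columns
    return (X - mu) / sigma, mu, sigma

# Illustrative sizes: 27 days, 96 quarter-hourly samples, 25 variables
rng = np.random.default_rng(0)
data = rng.normal(size=(27, 96, 25))
X = build_unfolded_matrix(data)              # shape (27, 2400)
Xs, mu, sigma = autoscale(X)
```

Auto-scaling every unfolded column removes the daily average profile, as described above.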
Then the traditional PCA method is applied to identify correlations among these unfolded variables at different time instants (unfold-PCA): computation of the covariance matrix and its subsequent decomposition to obtain the eigenvalues and eigenvectors (the latter define the principal components, and the former the variance gathered in each of them). Selecting an appropriate number of principal components, r, results in the loadings matrix P containing the first r eigenvectors, which serves to project the original unfolded variables, x, onto a lower-dimension space (projection space) with a better representation of the system behaviour, while separating uncorrelated information (noise) in the residual subspace:

x = \hat{x} + \tilde{x}   (1)

t = P^T x   (2)

\hat{x} = P t   (3)

where x is the original unfolded observation, \hat{x} contains the coordinates of the projection expressed in the original space, \tilde{x} is the difference between both (the residual), and t represents the scores, or coordinates, of the projection in the principal component space. P (dimensions m × r) is the loadings matrix with the r eigenvectors (columns) that define the principal component space and is used to perform the projection operation (Equation (2)). Every principal component defined by the r column vectors in P gathers an amount of information of the original data, in terms of variance, represented by the associated r eigenvalues (λ_i, i = 1 … r). Once a model is created, P is completely determined. Then, as previously mentioned, two statistics are defined to monitor the quality of observations in these two subspaces: Hotelling's T² evaluates the quality of projections in terms of distance to the centre of the model, and SPE (the square of the prediction error) evaluates the quality of the observation in terms of distance to the projection hyperplane.
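The decomposition, component selection and projection operations can be sketched numerically as follows (a hedged NumPy sketch with stand-in data; the implementation's actual component-selection criterion may differ from the fixed choice r = 3 assumed here):

```python
import numpy as np

# Hedged sketch of the PCA step on an auto-scaled unfolded matrix.
rng = np.random.default_rng(1)
Xs = rng.normal(size=(40, 12))            # stand-in for scaled unfolded data

S = np.cov(Xs, rowvar=False)              # m x m covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]         # reorder by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

r = 3                                     # illustrative number of components
P = eigvecs[:, :r]                        # loadings matrix, m x r
lam = eigvals[:r]                         # variance of each retained component

x = Xs[0]                                 # one unfolded observation
t = P.T @ x                               # scores (projection)
x_hat = P @ t                             # back-projection to original space
x_res = x - x_hat                         # residual
```

By construction the residual is orthogonal to the retained components, which is what separates noise into the residual subspace.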
These statistics can be computed for an observation, x, using the following expressions [20]:

T^2 = \sum_{i=1}^{r} t_i^2 / \lambda_i   (4)

SPE = \sum_{m} (x_m - \hat{x}_m)^2 = \tilde{x}^T \tilde{x}   (5)

Observe that Hotelling's T² corresponds to a Mahalanobis distance (each eigenvalue λ_i represents a variance) of the projected observation to the origin (the centre of the model after auto-scaling), whereas SPE is calculated as the sum of the squared projection errors in every component of the observation, that is, the square of the distance of the observation to the projection subspace.
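Given the scores, eigenvalues and back-projection of an observation, both indices translate directly into code (the numeric values below are made up for illustration):

```python
import numpy as np

def hotelling_t2(t, lam):
    """Mahalanobis distance of the projection to the model centre."""
    return float(np.sum(t**2 / lam))

def spe(x, x_hat):
    """Squared distance of the observation to the projection hyperplane."""
    return float(np.sum((x - x_hat)**2))

t = np.array([1.0, -2.0, 0.5])            # illustrative scores
lam = np.array([4.0, 2.0, 1.0])           # illustrative eigenvalues
x = np.array([1.0, 2.0, 3.0, 4.0])        # illustrative observation
x_hat = np.array([0.5, 2.0, 2.5, 4.5])    # its back-projection
```

With these values, hotelling_t2(t, lam) evaluates to 2.5 and spe(x, x_hat) to 0.75.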
From the distributions of these statistics, it is possible to automatically compute the associated thresholds for a certain confidence level. This allows an automated criterion to be established to discriminate those deviations that statistically do not fit the distributions of T² and SPE obtained from the sample of observations collected during normal operating conditions. Generally speaking, SPE allows abnormal behaviours with a correlation structure different from the model to be detected, whereas Hotelling's T² allows deviations of the system due to operating conditions far from the centre of the model (a change in the operating point, for example) to be detected.
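The paper's exact limit expressions are not reproduced here; as a hedged stand-in, an empirical percentile of the NOC statistics gives one simple way of fixing such a threshold (chi-squared or F-based approximations are common alternatives in the PCA monitoring literature):

```python
import numpy as np

def empirical_threshold(stat_values, confidence=0.99):
    """Empirical percentile of the NOC T2 (or SPE) values."""
    return float(np.quantile(stat_values, confidence))

# Illustrative T2 values collected over NOC observations
noc_t2 = np.array([1.2, 0.8, 2.1, 1.5, 0.9, 1.1, 3.0, 1.7, 0.6, 2.4])
limit = empirical_threshold(noc_t2, confidence=0.95)
```

Values above this limit are then flagged as statistically abnormal for the chosen confidence level.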
When monitoring, each new observation available is projected using Equation (2), and the Hotelling's T² and SPE indices (Equations (4) and (5)) are calculated and compared with the model's thresholds. If one or both thresholds are exceeded, an alarm is generated, alerting of an abnormal observation. This mode can operate with observations affected by missing values and outliers by reconstructing them using the model's information: missing values are replaced using the redundancy provided by the PCA model, whereas outliers are simply detected by the statistical thresholds as unexpected values.
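Putting the pieces together, the on-line evaluation of one new observation can be sketched as a single function (the model matrices and limits below are toy values, not a real building model):

```python
import numpy as np

def monitor(x, P, lam, t2_limit, spe_limit):
    """Project a new observation and flag an alarm on either index."""
    t = P.T @ x
    x_hat = P @ t
    t2 = float(np.sum(t**2 / lam))
    q = float(np.sum((x - x_hat)**2))
    return {"T2": t2, "SPE": q, "alarm": t2 > t2_limit or q > spe_limit}

P = np.array([[1.0, 0.0],                 # toy 3-variable, 2-component model
              [0.0, 1.0],
              [0.0, 0.0]])
lam = np.array([2.0, 1.0])
result = monitor(np.array([0.5, 0.5, 3.0]), P, lam, t2_limit=5.0, spe_limit=1.0)
```

In this toy case the third variable cannot be explained by the model, so the SPE limit is exceeded and an alarm is raised.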
Once an observation is detected as faulty (observations over the threshold in the Hotelling's T² and/or SPE charts), a diagnosis strategy known as contribution analysis can be applied to identify the variables and time instants responsible for this out-of-control, or faulty, situation. The method decomposes the index (T² or SPE) into small portions representing how every variable in the original space contributes to that magnitude.
This decomposition of the T² statistic into contributions of the variables in the original space can be computed according to Equation (6) [21], which is obtained from the analysis of the individual contributions of every component of x when combining Equations (2) and (4):

cont(i, m) = (t_i / \lambda_i) p_{m,i} x_m   (6)

Thus, the contribution of the m-th component, or variable, of an observation in the original space, x_m, to the Hotelling's T² statistic is obtained by adding the contributions of that variable to the a scores that present abnormal values (a < r):

cont_m^{T^2} = \sum_{i \in a} cont(i, m)   (7)

The criterion to consider the value of a score as abnormal is related to the Hotelling's T² threshold.
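A common score-wise form of this decomposition (assumed here for illustration; the paper's exact expression may differ in detail) can be sketched as:

```python
import numpy as np

def t2_contributions(x, P, lam, abnormal_scores):
    """Contribution of each unfolded variable to T2, summed over the
    subset of scores flagged as abnormal (the 'a' scores, a < r)."""
    t = P.T @ x
    cont = np.zeros_like(x)
    for i in abnormal_scores:
        cont += (t[i] / lam[i]) * P[:, i] * x   # per-variable share of score i
    return cont

P = np.array([[0.8, 0.0],                       # toy 3-variable, 2-score model
              [0.6, 0.0],
              [0.0, 1.0]])
lam = np.array([2.0, 1.0])
x = np.array([2.0, 1.0, -1.5])
cont = t2_contributions(x, P, lam, abnormal_scores=[1])
```

Only the third variable feeds the abnormal second score in this toy model, so the whole contribution concentrates there.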
The contribution of each original unfolded variable to the SPE statistic is obtained directly from the difference between the components of the observation in the projection and residual spaces:

cont_m^{SPE} = (x_m - \hat{x}_m)^2   (8)

where \hat{x} is the back-projection of t (Equation (3)) to the original unfolded space for a given observation, and x_m and \hat{x}_m are the respective components. Finally, contribution arrays composed of the contributions of all the components are obtained.
The contribution analysis consists of detecting contributions over a threshold to obtain the components responsible for the abnormal behaviour. The threshold for each contribution (T² and SPE) can be obtained from the contributions of the observations used in the modelling step: the mean and the standard deviation are computed for each component over all the modelled observations, and the contribution limit is set to three standard deviations above the mean.
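The SPE contributions and the mean-plus-three-standard-deviations limits can be sketched together (a NumPy sketch with random stand-in data; the orthonormal toy loadings are an assumption):

```python
import numpy as np

def spe_contributions(x, P):
    """Squared residual of every unfolded component (Equation (8))."""
    t = P.T @ x
    return (x - P @ t)**2

def contribution_limits(noc_contributions):
    """Mean + 3 * standard deviation, per component, over NOC days."""
    mu = noc_contributions.mean(axis=0)
    sd = noc_contributions.std(axis=0, ddof=1)
    return mu + 3.0 * sd

rng = np.random.default_rng(2)
P = np.linalg.qr(rng.normal(size=(6, 2)))[0]     # orthonormal toy loadings
noc = np.vstack([spe_contributions(v, P) for v in rng.normal(size=(50, 6))])
limits = contribution_limits(noc)                # one limit per component
```

During diagnosis, any component whose contribution exceeds its limit is singled out as responsible for the detection.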
For the sake of simplicity, that is, to facilitate the diagnosis step, a signature of the detected abnormal situation can be computed from the contributions. To obtain this signature, the contributions are binarized according to whether or not they exceed the threshold. Moreover, as they represent the contributions of an unfolded observation, x (1 × m), they can be folded back into the original variables-along-time format for a single day (j × k) to help end-user understanding and interpretation.
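Binarizing against the limits and folding back to a variables-by-time matrix can be sketched as follows (the reshape convention assumes rows were unfolded time-block by time-block; adapt if the implementation unfolds variable-wise):

```python
import numpy as np

def fold_signature(contributions, limits, j, k):
    """Binary fault signature folded into a (j variables x k samples) day."""
    signature = (contributions > limits).astype(int)   # 1 x m binary vector
    return signature.reshape(k, j).T                   # fold to j x k

j, k = 3, 4                                 # toy: 3 variables, 4 samples a day
cont = np.array([0, 5, 0, 0, 6, 0, 0, 7, 0, 0, 8, 0], dtype=float)
lims = np.full(j * k, 4.0)
sig = fold_signature(cont, lims, j, k)
```

In this toy case the second variable exceeds its limit at all four time instants, so the second row of the signature is all ones.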

Implementation as a Web Application
The implementation follows a model-view-controller design pattern. All the code was developed to guarantee maximum modularity (see Figure 1), so a service-oriented architecture was used for the implementation. Thus, the unfold-PCA methods were implemented as a set of RESTful services. Furthermore, communication with the data management platform (e.g., HIT2GAP or CROWDSAVING) is carried out with RESTful services, not only to obtain data from the core but also to send the fault detection and diagnosis (FDD) results back to it. This means the module is independent of the data management platform and guarantees its interoperability with any other data source, or integration middleware, used for smart building monitoring.
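As a purely illustrative sketch of the kind of JSON such RESTful services might exchange (the endpoint payload fields and the model identifier below are hypothetical, not the actual HIT2GAP/BEMServer API):

```python
import json

# Hypothetical request/response bodies for an FDD monitoring service.
monitor_request = {
    "model_id": "lecture-theatre-working-days",   # made-up identifier
    "observations": [[21.5, 450.0, 0.55]],        # unfolded sensor values
}
monitor_response = {
    "T2": [3.2],                                  # made-up index values
    "SPE": [0.9],
    "alarms": [False],
}

payload = json.dumps(monitor_request)             # what a client would POST
decoded = json.loads(payload)                     # what the server would parse
```

Keeping the exchange to plain JSON over REST is what makes the module independent of any particular data management platform.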
The user interface was developed as an HTML + CSS + JavaScript + PHP web application. This fault detection and diagnosis (FDD) web application is a client application that calls the PCA and unfold-PCA Java web services hosted on a server. So, for example, in the HIT2GAP project, data access for both modelling (historic data) and monitoring (on-line data) is provided by the BEMServer platform through its services; however, integration with any other Building Management System/Building Energy Management System (BMS/BEMS) server is possible. The web application also allows the end-user (facility managers) to exploit the results under different defined scenarios, with the objective of reducing the gap between the expected energy consumption (according to the design/potential use of the building) and the measured energy consumption (according to the real use of the building).
Thus, the web application is used by the facility manager to detect and identify unusual/abnormal patterns in the building's operation. The web application allows end-users to upload their own files containing data or work with on-line data available in the HIT2GAP Core, create an unfold-PCA model, project data over the available models and show the results. The web application is principally composed of two interfaces: modelling and monitoring. The main functionalities available in these interfaces are described in the following subsections.


Modelling Interface
The modelling interface is shown in Figure 2. Visually, this interface is organized into two vertically separated areas. The left area corresponds to the modelling parameters (used to configure the distinct models), while the right one corresponds to the management of the models (used to inspect, delete, etc., models).
The steps to create a model are enumerated in the interface. The modelling process can last from a few seconds to several hours, depending on the chosen parameters and, mainly, on the amount of data used for modelling. This variation in modelling time is due to the cubic scalability, O(n³), of the unfold-PCA algorithm with respect to the number of observations in the dataset (it is unusual to have more features than observations in a building's scope, but in that case the order would be cubic with respect to the number of features). Using an Intel Core i7-4790 (3.6 GHz) with 16 GB of RAM, modelling using 1 GB of data normally takes around 30 min (depending on matrix shapes, results can change). Transmission time must also be considered when dealing with high data volumes. Fortunately, this amount of computing time is only required for model creation; during exploitation, the time cost is just linear.
The right side of the interface, the management area, shows the models available (previously created) in the "List of models" area. By selecting one of them, the user can view the settings and variables used by that model or delete it. In addition, by using the buttons under the list of models, the user can upload new models created off-line.

Monitoring Interface
The monitoring interface (Figure 3) is the main interface of the web application. It allows new observations (one or sets of them) to be monitored to detect abnormal operating conditions (faults/failures in sensors/equipment or user misbehaviour). Visually this interface is organized into two vertically separated areas. The left corresponds to the control of the interface and the right to visualization.
The steps to start monitoring are enumerated in the control area. The first step can be performed manually, or the data can be retrieved automatically from the HIT2GAP platform, according to the model. At the end of the computing process the server returns the results, which are displayed on the Hotelling's T² and SPE charts found on the right of the user interface (Figure 3 shows values for three observations). Computation time for monitoring is negligible; due to the web service implementation, monitoring is affected only by Internet data transfer time. The amount of data used during monitoring is small (one day, one week, etc.) compared with the data volume used in modelling (one season, one year, etc.).
The web application is completed with additional services to schedule monitoring events that allow periodic execution for continuous monitoring. This functionality is especially useful to automate alarm generation and alert facility managers (alerts can trigger a messaging service) of detected abnormal behaviours for deeper analysis.
Furthermore, in the "Information" area some useful parameters about the selected model are shown.
The Hotelling's T² and SPE control index charts are used to detect abnormal behaviours, represented here as blue dots (observations) over red areas. Dots (observations) falling in the red area represent abnormal behaviours according to the modelled NOC, while points in the white area are compliant with the NOC.
After detection, by clicking on any of these dots in the red area, the application displays the contribution analysis. Figure 4 shows the contribution limit in green and the contributions of the selected abnormal observation in red. Note that variables presenting values over the green area are those that contribute the most (Equations (7) and (8)) to the abnormal behaviour. A more condensed view of the contributions can be generated as a report of the abnormal behaviour or sent automatically by e-mail for scheduled processes.

Results
In this section the results of the methodology implemented in the on-line application are shown for a real case: monitoring the environmental conditions of a lecture theatre heated and cooled by an air handling unit (AHU) and submitted to different operational conditions. The data used is from the Alice Perry Building at NUI Galway, one of the pilot sites in the HIT2GAP project.

Modelled System Description
The lecture theatre is a multi-purpose amphitheatre used for lectures, conferences and presentations. The room has around 200 seats, no windows, thick insulated walls and double entrance doors (external and internal). The room's characteristics mean its interactions with the rest of the building or the exterior are low.
In this room, many variables are monitored at a 1-min frequency and provided by the BMS. The data-set includes AHU control variables (for example, fan speeds) and air quality measurements in the room (i.e., CO2, humidity and temperature). From those variables, 25 are selected for modelling. These selected variables are automatically resampled by the HIT2GAP core at a 15-min frequency. The full list of the selected variables can be found in Table 1. The AHU system in the lecture theatre is automatically controlled by the BMS. The BMS present in the Alice Perry building provides on-line data to the HIT2GAP Core for the studied AHU, among others; Figure 5 shows the representation of the AHU provided by the BMS. The AHU system is managed using a calendar and a timetable that switch it on or off depending on building opening days and hours.
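The 15-min resampling of the 1-min BMS signals is performed automatically by the HIT2GAP core; a simple block-averaging sketch of the idea (our assumption about the aggregation rule, for illustration only) is:

```python
import numpy as np

def resample_to_15min(signal_1min):
    """Average non-overlapping 15-sample blocks of a 1-min signal."""
    s = np.asarray(signal_1min, dtype=float)
    n = (len(s) // 15) * 15                  # drop an incomplete trailing block
    return s[:n].reshape(-1, 15).mean(axis=1)

one_day = np.arange(1440, dtype=float)       # 24 h of 1-min samples
q = resample_to_15min(one_day)               # 96 quarter-hourly values
```

A full day of 1-min samples thus yields the k = 96 quarter-hourly samples per variable used to build the unfolded observations.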
According to the technical description, the room is controlled considering the following rules:
• User-adjustable temperature set points.
• Dehumidification procedure to keep relative humidity levels at 60% or below.
• Air quality control: CO2 should not exceed 1500 ppm; when CO2 reaches 1500 ppm, the return duct and impulsion fan speeds are set to maximum.
• Frost protection: heating is activated if the room temperature is 3 °C or lower.
The period studied, from 1 January 2018 to February 2019, is analysed using two complementary models corresponding to non-working and working days, respectively. The subsequent subsections present the capabilities of the two models.

Non-Working Days
In this case, a set of 27 days of 2018 was selected for modelling. During these days there was no occupation of the room, the AHU system was in a kind of maintenance mode throughout the day, and the data presented no empty registers. We named this situation off mode, even though the system is not completely switched off and still maintains a minimum air quality in the room. This model is intended to detect any usage of the room, since there was not supposeded to be any occupancy and the AHU system should operate in this minimum-energy mode as planned by its schedule. This unfold-PCA model captures 79.19% of the original variance using seven principal components.
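The unfolding and model-fitting step just described can be sketched as follows. This is a minimal illustration on synthetic data, not the application's code; only the dimensions (27 reference days, 25 variables, 96 quarter-hour samples per day, 7 retained components) are taken from the text:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for the reference set: days x samples-per-day x variables.
n_days, n_samples, n_vars = 27, 96, 25      # 96 quarter-hour samples per day
data = rng.normal(size=(n_days, n_samples, n_vars))

# Batch-wise unfolding: each day becomes one observation (row) whose features
# are the complete daily trajectory of every variable.
X = data.reshape(n_days, n_samples * n_vars)

# Auto-scale and fit the PCA model on the normal-operation reference days.
pca = PCA(n_components=7).fit(StandardScaler().fit_transform(X))

print(X.shape, pca.n_components_)  # (27, 2400) 7
```

With real data, the retained variance (79.19% in the paper's model) is read from `pca.explained_variance_ratio_.sum()`.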
For monitoring purposes, all available days are projected; then the daily Hotelling's T2 and SPE indices are obtained and compared with the statistical thresholds. Days over a threshold correspond to abnormal behaviour that should be diagnosed (see Figure 6 for a graphical T2 example). Diagnosis is made by means of contribution analysis, which indicates the variables causing the abnormal behaviour and the period of the day when the abnormal variation occurred. Note that, according to the methodology, all detections are abnormal situations.
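The two monitoring indices can be computed for a projected day as sketched below. This is a schematic sketch on synthetic data: the thresholds here are simple empirical percentiles of the reference indices, whereas the application derives proper statistical limits, and the contribution analysis is omitted:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(27, 96 * 25))          # synthetic unfolded reference days

scaler = StandardScaler().fit(X)
pca = PCA(n_components=7).fit(scaler.transform(X))

def t2_spe(day):
    """Hotelling's T2 and SPE for one unfolded day (1D vector)."""
    xs = scaler.transform(day.reshape(1, -1))
    t = pca.transform(xs)                    # scores in the latent space
    t2 = float((t ** 2 / pca.explained_variance_).sum())
    resid = xs - pca.inverse_transform(t)    # part outside the model space
    spe = float((resid ** 2).sum())
    return t2, spe

# Illustrative thresholds: 95th percentile of the reference indices
# (the real application uses proper statistical limits instead).
ref = np.array([t2_spe(x) for x in X])
t2_lim, spe_lim = np.percentile(ref, 95, axis=0)

# A grossly shifted day clearly breaks the residual space: SPE exceeds its limit.
t2, spe = t2_spe(X[0] + 10.0)
print(spe > spe_lim)  # True
```

Days whose indices exceed the limits are then passed to the contribution analysis for diagnosis.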
Days with occupation and AHU system off: these cases are characterized by abnormal values for the CO2, temperature and humidity variables. Such situations should be investigated and solved because users experience uncomfortable, and perhaps even unhealthy, conditions. Figure 7 shows that CO2 is among the variables identified as causing the misbehaviour for one of these out-of-control days.
Days with AHU system on: all days with the system on are detected as abnormal, because the model is built for monitoring non-working days, when the system should be in off mode. Within this group, two situations can be differentiated depending on the occupation of the room. Days with occupation and the system on are simply misclassifications, caused by the policy of considering all Saturdays and Sundays as non-working days; these days should be re-projected onto the working-days model for correct monitoring. The second situation is more interesting in terms of energy efficiency, as it corresponds to days when the AHU system was on due to poor scheduling or user carelessness. In these cases, the AHU could have been turned off, since the lecture theatre was not in use.
Changes in the system's configuration: in this case, a new model should be created for monitoring the new working conditions.


Modelling Working Days
In this case, a set of 64 days from 2018 without missing data is used for modelling. The most common situation during working days is that there is no occupation and the AHU system is in on mode during a fixed period. This unfold-PCA model captures 87.42% of the original variance using 11 principal components. Next, monitoring of the whole set of working days is carried out, and when an abnormal behaviour is detected a fault diagnosis is performed. Since the most common state of the room (empty, with the AHU system on and in a minimal configuration) is the one captured by the model, the detections correspond to the days the room is occupied. The main problem here is that the system switches on and off based on a schedule rather than on the room's occupation. If the system had been controlled by user-demand parameters, other variables apart from CO2 would have been detected as affected (see Figure 8). In the contribution plot of Figure 8 (for 20 January 2018), the Y axis contains the variables and the X axis the time (sampled every 15 min); green represents variable/time combinations inside the model space and red those outside it. The two CO2 sensors are detected as abnormal, along with some temperature-related variables, during the hours with presence in the room; variables without abnormal contributions are not represented.


Discussion
Current building management and energy management systems (BMS/BEMs) are intended to make control variables accessible and provide monitoring and data management capabilities. They also provide some alarm detection mechanisms associated with the bounds of every variable; however, they lack data modelling and enhanced fault detection capabilities capable of systematically gathering knowledge embedded in historic data. Adding a statistical monitoring tool to provide information for decision support is useful in such situations. In buildings we can find many subsystems with interactions among them (heating/cooling, lighting, water distribution, etc.) but we can also find correlations among building physical parameters and also among other exogenous variables such as weather or building usages.
On one hand, it is known that the use of PCA-based techniques, in comparison with other modelling techniques such as neural networks, brings powerful monitoring and fault-isolation charts. Those charts are easy to understand for any user, expert or non-expert in PCA, and provide valuable information on the variables responsible for a model detection, together with a detailed interpretation of the reason (variables presenting distinct correlations for the SPE index, or abnormal values for Hotelling's T2). Providing easy-to-understand charts for non-PCA experts is very important in the buildings' scope, as the results must be interpreted by the buildings' energy managers in order to take the right actions to solve the detected problems.
On the other hand, classical PCA in particular has the limitation of only capturing linear relationships (correlations). Fortunately, many state-of-the-art PCA variants exist that mitigate this limitation. Kernel-based variants are mainly used when all the variables present non-linear correlations; unfold (or multiway) variants are more useful when not all the variables present non-linear correlations and, especially, when the periodicity of the system's cycles is known. The space reduction these techniques provide is also useful in systems with many variables, thanks to their ability to compress data.
This paper presents an unfold-PCA-based technique for monitoring buildings as a single system instead of as many separate systems. Buildings often operate directly under a schedule (offices, malls, etc.), but they also present repetitive patterns derived from human behaviour or from the climate, among others. These variables and their repetitive patterns can easily be included in the model by applying unfold-PCA with daily or weekly unfolds. Furthermore, other unfolding choices can easily be applied to compare distinct parts of the building. Moreover, the use of this concept of windowed observations overcomes the limitation of only modelling linear correlations of variables around a single operating point. Thus, a whole day can be considered as an observation defined by the trajectory of the multiple sensors being monitored. Then, when scaling the data, it is possible to remove the daily average trajectory of the sensors and monitor variations around it.
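The trajectory-centred scaling just described (removing the average daily trajectory so that the model captures variations around it) can be sketched as follows, with a synthetic sinusoidal profile standing in for a real daily pattern:

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_samples, n_vars = 30, 96, 3

# Synthetic days sharing a common daily profile (e.g., a temperature ramp).
profile = np.sin(np.linspace(0, np.pi, n_samples))[:, None]   # samples x 1
data = profile + 0.05 * rng.normal(size=(n_days, n_samples, n_vars))

# Batch-wise unfold: one row per day.
X = data.reshape(n_days, n_samples * n_vars)

# Centring column-wise removes the average trajectory of every sensor at every
# time of day, so only variations around the daily profile remain and the
# non-linear shape of the trajectory itself no longer limits the linear model.
X_centred = X - X.mean(axis=0)

print(np.allclose(X_centred.mean(axis=0), 0.0))  # True
```

This is why a daily non-linear pattern does not break the linear PCA model: the pattern is absorbed into the mean trajectory during scaling.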
Furthermore, when applying statistical modelling techniques to buildings and technical subsystems, especially AHUs, another drawback arises: buildings can operate under different conditions depending on variations of the heating/cooling load, occupancy, time of day, etc. Due to these multiple configurations, PCA-based techniques in particular cannot correctly model normal operating conditions defined by multiple operating points unless several models are built, one for each operating condition. A valuable technical contribution of this work relies on its implementation as web services, ready to be offered as an off-the-shelf solution. The solution has been adapted to manage multiple users (buildings) and models (day/night, seasons, etc.), and it is robust to data sets with lost data thanks to the implementation of several blank-replacement methods. To the authors' knowledge, there are still no other tools capable of statistically monitoring buildings, apart from the one presented here, that are integrated and ready for use in the real world.
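One simple blank-replacement scheme of the kind mentioned can be sketched as below. The application implements several such methods; this sketch shows only mean-trajectory imputation as one illustrative, assumed option:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 96))               # unfolded days, one trajectory each
X[3, 10:14] = np.nan                        # a gap (lost data) in one day

# Replace blanks with the column mean of the remaining days, i.e. the average
# value of that sensor at that time of day across the reference set.
col_mean = np.nanmean(X, axis=0)
X_filled = np.where(np.isnan(X), col_mean, X)

print(np.isnan(X_filled).sum())  # 0 remaining blanks
```

More refined alternatives (e.g., reconstruction through the PCA model itself) follow the same pattern: the unfolded matrix is completed before projection.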
The proposed methodology uses Hotelling's T2 statistic as a measure of normality for an observation, complementing the traditional SPE index used in state-of-the-art AHU monitoring. Furthermore, the contribution analyses for both indices, Hotelling's T2 and SPE, have been included to offer improved isolation capabilities. The SPE index captures correlation breaks with respect to the reference model, whereas Hotelling's T2 is more sensitive to variations of magnitude within the correlation space. Thus, SPE provides information on possible sensor faults or malfunctions, while Hotelling's T2 is sensitive to larger consumption patterns or leakages in the systems. It is worth mentioning that the SPE contribution presented in this paper can be improved by including refinements of fault detection charts, for example, reconstruction-based contribution (RBC) [22].
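A minimal sketch of per-feature contributions to both indices is given below. These are the simple squared-term decompositions (complete-decomposition style); the application may use refined variants such as RBC [22], and the data and fault here are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 200))              # synthetic unfolded reference data

scaler = StandardScaler().fit(X)
pca = PCA(n_components=5).fit(scaler.transform(X))

def contributions(day):
    """Per-feature contributions to SPE and (a simple form of) Hotelling's T2."""
    xs = scaler.transform(day.reshape(1, -1)).ravel()
    t = pca.transform(xs.reshape(1, -1)).ravel()
    # SPE contribution: squared residual of each feature.
    resid = xs - pca.inverse_transform(t.reshape(1, -1)).ravel()
    spe_contrib = resid ** 2
    # T2 contribution: split sum_k t_k^2 / lambda_k over features via loadings.
    t2_contrib = xs * (pca.components_.T @ (t / pca.explained_variance_))
    return t2_contrib, spe_contrib

# Simulated fault on feature 10: it dominates the SPE contributions,
# pointing the diagnosis at the corresponding sensor and time of day.
day = X[0].copy()
day[10] += 8.0
t2_c, spe_c = contributions(day)
print(int(np.argmax(spe_c)))
```

In the unfolded layout, each feature index maps back to a (sensor, time-of-day) pair, which is what the contribution plots in Figures 7 and 8 display.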

Conclusions
This paper presents and uses monitoring tools and methods specially designed for facility managers, energy managers and those responsible for building maintenance. This work shows the integration and application of a building monitoring module that offers enhanced capabilities based on unfold-PCA. The module, developed within the CROWD-SAVING project, allows abnormal behaviours and misbehaviours to be detected and diagnosed by considering the existence of daily profiles, and has been implemented in an on-line application developed during the HIT2GAP project and integrated into the HIT2GAP platform. This module also simplifies the creation, management, storage and exploitation tasks for all the models.
To illustrate and test the module, a real use case of its application to a pilot site in the HIT2GAP project is presented. A lecture theatre in the Alice Perry Building at NUI Galway, heated and cooled by an AHU and submitted to different operational conditions, was used for this purpose. The module allows different models to be obtained depending on the environmental conditions. These models are then applied on-line to detect abnormal situations and diagnose them. In this way, the on-line application detects misbehaviours associated with energy consumption and links them to the most influential variables being monitored in the building. The tool was also successfully tested with three other pilots in the HIT2GAP project. Overall, the tool proved capable of managing multiple users, buildings and distinctly modelled areas for each of the studied buildings.
This module provides a statistical decision-support tool complementing standard BEM systems. This complementarity is achieved by direct integration as a module for BEMServer, but the module is also especially intended to support continuous monitoring during the implementation of energy management procedures such as ISO 50001 or ISO 50006:2014 (energy performance), and it can also be used to support audits (EN 16247-1, ISO 50002) and measurement and verification protocols (IPMVP, ISO 50015:2014). Furthermore, it can be used as a separate stand-alone tool for BEM systems with no integration capabilities.