Simulation-Based Fault Detection and Diagnosis for AHU Systems Using a Deep Belief Network

Yoo, Mooyoung

doi:10.3390/buildings16020342


Department of Architectural Engineering, Daejin University, Pocheon 11159, Republic of Korea

Buildings 2026, 16(2), 342; https://doi.org/10.3390/buildings16020342

Submission received: 12 November 2025 / Revised: 20 December 2025 / Accepted: 5 January 2026 / Published: 14 January 2026

(This article belongs to the Special Issue Built Environment and Building Energy for Decarbonization)


Abstract

Heating, ventilation, and air conditioning (HVAC) systems account for a significant portion of building energy consumption and play a crucial role in maintaining indoor comfort. However, hidden faults in air-handling units (AHUs) often lead to energy waste and degraded performance, highlighting the importance of reliable fault detection and diagnosis (FDD). This study proposes a simulation-driven FDD framework that integrates a standardized prototype dataset and an independent evaluation dataset generated from a calibrated EnergyPlus model representing a target facility, enabling controlled experimentation and transfer evaluation within simulation environments. Training data were generated from the DOE EnergyPlus Medium Office prototype model, while evaluation data were obtained from a calibrated building-specific EnergyPlus model of a research facility operated by Company H in Korea. Three representative fault scenarios—outdoor air damper stuck closed, cooling coil fouling (65% capacity), and air filter fouling (30% pressure drop)—were systematically implemented. A Deep Belief Network (DBN) classifier was developed and optimized through a two-stage hyperparameter tuning strategy, resulting in a three-layer architecture (256–128–64 nodes) with dropout and regularization for robustness. The optimized DBN achieved diagnostic accuracies of 92.4% for the damper fault, 98.7% for coil fouling, and 95.9% for filter fouling. These results confirm the effectiveness of combining simulation-based dataset generation with advanced deep learning methods for HVAC fault diagnosis. The results indicate that a DBN trained on a standardized EnergyPlus prototype can transfer to a second, independently calibrated EnergyPlus building model when AHU topology, control logic, and monitored variables are aligned. This study should be interpreted as a simulation-based proof-of-concept, motivating future validation with field BMS data and more diverse fault scenarios.

Keywords:

Fault Detection and Diagnosis (FDD); HVAC systems; Air-Handling Unit (AHU); Deep Belief Network (DBN); simulation-based modeling

1. Introduction

Heating, Ventilation, and Air Conditioning (HVAC) systems represent one of the largest energy consumers in buildings, typically accounting for 40–60% of total energy use in commercial facilities [1,2]. Their performance directly affects energy efficiency, indoor environmental quality, and occupant comfort, which are increasingly critical under global climate change and net-zero energy targets. Hidden faults in HVAC equipment—such as outdoor damper failures, coil fouling, or filter clogging—often remain undetected for extended periods, leading to unnecessary energy waste, increased operational costs, and deterioration of indoor comfort [3,4,5]. Reliable fault detection and diagnosis (FDD) strategies are therefore essential to support sustainable building operations.

The existing FDD approaches can be broadly categorized into three groups: rule-based, model-based, and data-driven methods [6,7,8,9]. Rule-based methods employ expert knowledge and threshold rules [10,11], offering simplicity but often lacking scalability. Model-based approaches utilize physical or mathematical representations of HVAC dynamics [12,13,14], enabling accurate diagnosis but requiring extensive modeling and calibration. Data-driven methods, empowered by the growing availability of Building Energy Management System (BEMS) data, have recently become dominant. Techniques such as machine learning and deep learning have demonstrated significant potential for capturing nonlinear and high-dimensional operational patterns [15,16,17,18,19]. In particular, advanced models including Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), and Deep Belief Networks (DBNs) have shown strong diagnostic capabilities [18,19,20,21,22,23].

Despite these advancements, several gaps remain. First, many studies rely solely on simulation data, limiting their generalizability to real-world building operations [24]. Second, training and evaluation are often conducted on the same dataset, raising concerns of overfitting and poor robustness in new environments [25]. Third, while deep learning methods are promising, their transfer performance between standardized simulation datasets and building-specific simulation models has not been sufficiently validated. These limitations underscore the need for integrated frameworks that combine standardized reference models for scalable dataset generation with independent evaluation using calibrated building-specific simulation models (digital twin simulations).

To address these gaps, the present study introduces a dual-dataset simulation framework. The DOE EnergyPlus Medium Office reference model was used to generate large-scale training data under both normal and faulty conditions. Three representative HVAC faults—outdoor air damper stuck closed, cooling coil fouling (65%), and air filter fouling (30%)—were implemented to simulate abnormal operating states and labeled systematically. To evaluate model robustness, independent testing data were generated from a calibrated building-specific EnergyPlus model of a research building operated by Company H in Korea, using the same set of monitored variables. This two-stage data generation strategy combines a standardized reference model for training with an independent evaluation dataset generated from a calibrated building-specific EnergyPlus model, providing an intermediate transfer evaluation under controlled fault injection within simulation environments. We note that both the source and target domains are EnergyPlus-based building models with aligned AHU topology, control logic, and monitored variables; therefore, the transfer challenge addressed in this study is reduced compared to real-world cross-building BMS transfer. For fault classification, a Deep Belief Network (DBN) was adopted due to its ability to extract hierarchical feature representations from high-dimensional sensor data [26,27,28]. By optimizing hyperparameters such as learning rate, hidden layer configuration, and dropout rate, the DBN was fine-tuned for HVAC-specific applications. The framework was evaluated by comparing training and independent test results, demonstrating high diagnostic accuracy under two EnergyPlus models with aligned AHU topology, control logic, and monitored variables.

The key contributions of this study are threefold:

Development of a dual-dataset strategy combining an EnergyPlus reference model with an independently calibrated, building-specific EnergyPlus model (digital twin simulation);
Systematic implementation and labeling of three representative HVAC fault scenarios;
Application and optimization of a DBN-based FDD framework, achieving consistent performance across the prototype-training and digital-twin-simulation testing datasets under a strictly separated evaluation protocol.

Figure 1 illustrates the overall process of simulation-based fault detection and diagnosis using machine learning adopted in this study.

The remainder of this paper is structured as follows. Section 2 details the methodology, including dataset generation, fault implementation, and variable selection. Section 3 introduces the DBN model and its optimization strategy. Section 4 presents the results and discussion, followed by Section 5, which concludes the paper with key findings, practical implications, and directions for future research.

2. Methodology

The methodology consists of three major steps: (i) generation of the training dataset using a standardized EnergyPlus prototype model, (ii) implementation of representative AHU faults within the simulation environment, and (iii) generation of the evaluation dataset using a project-specific EnergyPlus model of the target building. All datasets were constructed from 21 key variables relevant to AHU fault diagnosis, and only occupied periods were considered to reduce the risk of overfitting and misclassification.

2.1. Training Data Generation

The training dataset was generated from the ASHRAE 90.1 Medium Office prototype model (RefBldgMediumOfficeNew2004_Chicago.idf) provided within the EnergyPlus distribution. The main building characteristics of the DOE Medium Office prototype used for training data generation are summarized in Table 1. Figure 2 shows the EnergyPlus simulation model of the DOE Medium Office prototype used for training dataset construction. This prototype represents a three-story, 4982 m² commercial office building designed to reflect the load profiles and system characteristics of typical U.S. mid-size offices as reported in the Commercial Building Energy Consumption Survey (CBECS). The HVAC system is organized as a chilled-water central plant connected to multiple air-handling units (AHUs), each of which supplies conditioned air to both core and perimeter zones. Each AHU comprises an outdoor-air mixing box, a chilled-water cooling coil, a variable-volume supply fan, and a return fan with integrated economizer control. Terminal units are configured as variable air volume (VAV) reheat boxes, allowing zone-level modulation of supply airflow and reheating when necessary. The cooling coil is served by a water-cooled electric chiller operating within a chilled-water loop, with circulation pumps delivering chilled water to each AHU. The nominal chilled-water supply temperature is maintained at 6.67 °C with a design temperature rise of approximately 5.6 °C, ensuring sufficient cooling capacity across a range of load conditions.

To ensure transferability, the present framework does not assume that data generated from an arbitrary simulation model can be directly applied to any real building. The generalization between the prototype and the target facility is valid only under a clear correspondence of system configuration and control logic. Specifically, the air-handling unit topology (mixing box, cooling coil, supply/return fans, and VAV terminals), control sequences (duct static pressure control, minimum outdoor-air regulation, and supply-air temperature setpoints), and the set of 21 monitored variables were aligned between the Medium Office prototype and the target building model. Prior to test-data generation, the building-specific EnergyPlus model of the target building was calibrated to match key operational parameters such as setpoints, schedules, and plant capacities. These steps ensured physical and control-level consistency while preventing over-generalization beyond comparable system domains. A concise summary of variable and control correspondence is provided in Table 2, and validation results of calibrated building-specific EnergyPlus models are reported in Section 2.3.

2.2. Fault Characteristics

Three representative fault scenarios were implemented within the EnergyPlus environment, each chosen for its prevalence in practice and its significant impact on both energy performance and indoor air quality. The first fault corresponded to an outdoor-air damper stuck in the fully closed position. Figure 3 illustrates the fault characteristics of the outdoor air damper stuck condition implemented in the EnergyPlus simulation. Under this condition, no fresh air enters the mixing box, and the mixed-air temperature converges to the return-air temperature. This not only reduces ventilation effectiveness but also alters the thermal load on the cooling coil. The fault was implemented in EnergyPlus by overriding the outdoor-air fraction to zero using the Energy Management System (EMS), thereby ensuring that the system consistently operated under a no-ventilation condition during the designated fault periods.
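The convergence of the mixed-air temperature to the return-air temperature follows from a simple mixing balance. The snippet below is a minimal sketch of that relationship, assuming ideal adiabatic mixing with constant specific heat; the function name and the temperatures are illustrative assumptions, not values or code from the study, and this is not the EMS program used in EnergyPlus.

```python
def mixed_air_temperature(oa_fraction, t_outdoor, t_return):
    """Ideal mixing balance: T_mix = f * T_oa + (1 - f) * T_ra (temperatures in deg C)."""
    if not 0.0 <= oa_fraction <= 1.0:
        raise ValueError("oa_fraction must lie in [0, 1]")
    return oa_fraction * t_outdoor + (1.0 - oa_fraction) * t_return


# Illustrative values only (hypothetical, not taken from the paper).
normal = mixed_air_temperature(0.2, t_outdoor=32.0, t_return=24.0)
stuck = mixed_air_temperature(0.0, t_outdoor=32.0, t_return=24.0)  # damper stuck closed
```

With the outdoor-air fraction forced to zero, the mixed-air temperature equals the return-air temperature regardless of outdoor conditions, which is the signature the classifier can exploit.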

The second fault involved cooling coil fouling, simulated by reducing the effective heat transfer capacity of the chilled-water coil to 65% of its nominal value. Figure 4 illustrates the fault characteristics of cooling coil fouling, with heat-transfer capacity reduced to 65% of nominal, as implemented in the EnergyPlus simulation. This degradation was realized by scaling the coil’s UA parameter within the EnergyPlus input structure, which in turn reduced the coil’s ability to meet the supply-air temperature setpoint under design cooling loads. The expected symptoms included an elevated supply-air temperature, increased valve opening to demand higher chilled-water flow, and a decrease in chilled-water temperature difference across the coil.

The third fault was air filter fouling, implemented as a 30% increase in pressure drop across the fan and duct system. Figure 5 illustrates the fault characteristics of air filter fouling implemented in the EnergyPlus simulation. This was modeled by adjusting the fan pressure rise and/or introducing an equivalent loss coefficient so that, at design airflow, the system experienced a 30% higher resistance. In practical terms, this fault condition required the supply fan to consume more power to maintain the same duct static pressure. If the fan could not fully overcome the added resistance, a decrease in supply airflow would occur, resulting in temperature control deterioration at the zone level.
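The effect of the added filter resistance on fan power can be illustrated with the standard relation between airflow, pressure rise, and efficiency. This is a simplified sketch, assuming constant airflow and constant total fan efficiency; the function name and operating point are hypothetical, and the EnergyPlus fan model also includes part-load behavior that is not represented here.

```python
def fan_power_w(airflow_m3s, pressure_rise_pa, total_efficiency):
    """Fan electric power in W: P = Q * dP / eta_total."""
    return airflow_m3s * pressure_rise_pa / total_efficiency


# Hypothetical operating point, chosen only for illustration.
clean = fan_power_w(5.0, 600.0, 0.6)
fouled = fan_power_w(5.0, 600.0 * 1.30, 0.6)  # 30% higher pressure rise
ratio = fouled / clean
```

Under these assumptions the power ratio equals the pressure-rise ratio, so a 30% higher resistance at the same airflow implies roughly 30% more fan power.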

All fault scenarios were introduced according to a normal–fault–normal schedule, allowing each simulation day to include transitions between healthy and faulty states. Each timestep in the output dataset was labeled accordingly as normal, outdoor-air damper stuck, cooling coil fouling, or air filter fouling. This ensured the availability of supervised training data for the classifier, where the ground truth was directly embedded into the simulation outputs.

In the simulation environment, each day was configured with a normal–fault–normal cycle to ensure that the model could learn both transition and steady-state behaviors. This setup was used purely for data-labeling convenience and does not imply that faults in real systems resolve automatically. In practical building operation, such transitions correspond to the occurrence and subsequent correction of a fault through maintenance or system reset. The intermittent scheduling in simulation therefore serves as a proxy for multiple operating periods—before, during, and after maintenance—allowing the classifier to observe temporal fault progression while maintaining clear ground-truth labeling.
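The normal–fault–normal labeling scheme can be sketched as follows. This is an illustrative example only: the fault start and end times are assumptions (the text does not state them), and the label strings and function name are hypothetical.

```python
LABELS = ("normal", "oa_damper_stuck", "coil_fouling", "filter_fouling")


def label_day(fault, fault_start_min, fault_end_min, minutes_per_day=1440):
    """Return one label per 1-min timestep for a normal-fault-normal day."""
    if fault not in LABELS[1:]:
        raise ValueError(f"unknown fault: {fault}")
    if not 0 <= fault_start_min < fault_end_min <= minutes_per_day:
        raise ValueError("fault window must lie within the day")
    return [
        fault if fault_start_min <= minute < fault_end_min else "normal"
        for minute in range(minutes_per_day)
    ]


# Hypothetical schedule: fault active from 10:00 to 14:00.
labels = label_day("coil_fouling", fault_start_min=600, fault_end_min=840)
```

Because the fault window is set by the simulation schedule, each timestep carries an exact ground-truth label without any manual annotation.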

2.3. Calibrated Building-Specific EnergyPlus Model for Testing Data

In this study, the term ‘digital twin’ is used in a narrow sense to denote a calibrated EnergyPlus model that reproduces key operational trends of the target AHU under normal conditions. It is employed as a simulation environment for controlled fault injection and does not imply real-time bi-directional synchronization with the physical building. The analysis was carried out on a research building operated by Company H in Korea, which serves as the target facility for the simulation-based investigation. The building is a medium-sized office-type research institute, representative of typical research and development environments in the region. A detailed building model was constructed using the EnergyPlus simulation platform to reflect the building envelope, occupancy schedules, construction materials, and HVAC system operations, thereby creating a realistic basis for performance evaluation. The second floor above ground was designated as the primary analysis target, owing to the presence of a dedicated air-handling unit (AHU) that serves multiple thermal zones. Operational trends from BEMS under normal operation were used to calibrate and validate the simulation model, providing a baseline reference for subsequent fault-injection analysis.

The target-building EnergyPlus model was calibrated using measured Building Energy Management System (BEMS) data collected under normal operation. Calibration focused on representative AHU-level variables that directly reflect thermodynamic and control behavior, including return air temperature (RAT), as well as supporting variables such as airflow and fan power where available. The calibration objective was to match both steady-state levels and dominant transient characteristics during occupied operation, while maintaining consistency with the implemented control sequences.

Calibration quality was quantified using the normalized mean bias error (NMBE) and the coefficient of variation of the root mean square error (CVRMSE), following ASHRAE Guideline 14 criteria for hourly HVAC data. Table 3 summarizes the calibration statistics and acceptance criteria. In addition, Figure 6 provides representative time-series comparisons between measured and simulated return air temperature, illustrating both a best-case and worst-case period to transparently show model behavior under different operating conditions.
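For reference, the two calibration metrics can be computed as in the sketch below. It uses the commonly cited Guideline 14 form with n − p degrees of freedom and assumes p = 1; the sign convention for NMBE (measured minus simulated) and the sample values are assumptions for illustration, not data from the study. The acceptance limits are the hourly thresholds stated in this paper (CVRMSE below 30% and |NMBE| below 10%).

```python
import math


def nmbe_percent(measured, simulated, p=1):
    """Normalized mean bias error in percent (sign: measured minus simulated)."""
    n = len(measured)
    mean_m = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * mean_m)


def cvrmse_percent(measured, simulated, p=1):
    """Coefficient of variation of the RMSE in percent."""
    n = len(measured)
    mean_m = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / mean_m


def meets_hourly_criteria(nmbe, cvrmse, nmbe_limit=10.0, cvrmse_limit=30.0):
    """Check the hourly acceptance thresholds quoted in the text."""
    return abs(nmbe) < nmbe_limit and cvrmse < cvrmse_limit


# Hypothetical return-air temperatures in deg C, for illustration only.
measured = [24.0, 25.0, 26.0, 25.0]
simulated = [24.5, 24.5, 26.5, 24.5]
nmbe = nmbe_percent(measured, simulated)
cvrmse = cvrmse_percent(measured, simulated)
accepted = meets_hourly_criteria(nmbe, cvrmse)
```

Note that positive and negative errors cancel in NMBE but not in CVRMSE, which is why both metrics are reported together.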

Beyond validation, the EnergyPlus-based target building model functioned as a data generation environment. By simulating both normal and faulty operation, the model enabled the acquisition of training and testing datasets necessary for machine-learning-based fault detection and diagnosis (FDD). This ensured that robust and diverse datasets could be obtained in a safe and cost-effective manner without inducing faults in the actual HVAC system. Figure 7 shows the location and exterior view of the target research building. The digital-twin model reproduces the geometric and operational characteristics of the real facility, including its rooftop systems and envelope features that influence cooling loads. The left panel illustrates the site layout with the analyzed building highlighted, and the right panel presents the exterior of the actual research institute used for model construction.

To ensure comparability between the prototype and the target building, the air-handling-unit topology, control sequences, and monitored variables were aligned. Both models include the same physical components (outdoor-air mixing box, cooling coil, supply and return fans, and VAV terminals) and follow identical control strategies such as duct static-pressure control, minimum outdoor-air regulation, and fixed supply-air-temperature setpoints. This one-to-one correspondence means that the diagnostic model interprets consistent physical relationships in both training and testing domains. A summary of the matched variables and control parameters is presented in Table 2.

This correspondence confirms that the digital-twin-based dataset is physically consistent with the prototype model and provides a reliable foundation for evaluating prototype-to-target transfer within simulation environments under aligned system configuration and control logic. The digital-twin model was validated against measured Building Energy Management System (BEMS) data collected under normal operating conditions. Calibration was performed by tuning control schedules, setpoints, and equipment capacities until the simulated supply-air temperature, airflow rate, and fan-power profiles closely matched the measured trends. The goodness-of-fit was evaluated using the coefficient of variation of the root-mean-square error (CVRMSE) and the normalized mean bias error (NMBE) following ASHRAE Guideline 14. The results satisfied the standard acceptance thresholds (CVRMSE below 30% and |NMBE| below 10% for hourly HVAC data), confirming that the calibrated building-specific EnergyPlus model provides a reliable representation of the real AHU’s thermodynamic and control behavior. Similar calibration procedures and validation criteria have been adopted in previous studies on EnergyPlus-based digital twins and HVAC system modeling [7,13,26].

2.4. Testing Data Generation Using Calibrated Building-Specific EnergyPlus Model

The evaluation dataset was generated from an EnergyPlus model of a research building operated by Company H, hereafter referred to as the target building. Unlike the standardized prototype, this model was developed to capture site-specific characteristics, including its envelope configuration, occupancy schedule, and HVAC operational strategies. The target building is served by a centralized air-handling unit connected to a chilled-water plant, supplying conditioned air to multiple thermal zones through a variable-air-volume system. The system incorporates an outdoor-air mixing box, a chilled-water cooling coil, a variable-speed supply fan, and a return fan, with terminal units modulating airflow to meet zone-level cooling demands. The supply-air temperature setpoint is maintained at 16 °C during cooling, and chilled water is supplied at 8 °C, consistent with the training model to facilitate comparison.

The AHU in the target building is characterized by control strategies that transition between occupied and unoccupied modes. During occupied periods, the supply fan operates under duct static pressure control, and VAV dampers adjust to track the zone temperature setpoints. Minimum outdoor-air requirements are enforced through the outdoor-air controller, while economizer operation is limited in order to prevent confounding effects when assessing cooling faults. These characteristics, derived from previous studies and design documentation, provide a realistic operational context against which the classifier can be evaluated.

To ensure comparability with the training dataset, the same 21 variables were extracted from the target building model at one-minute intervals, and the same labeling protocol was applied. Faults were implemented identically to those in the prototype model, using EMS to control damper positions, coil performance multipliers, and fan pressure rise. As with the training data, the first five minutes following each transition were excluded, and only occupied hours were considered for the evaluation dataset. This protocol ensured that the evaluation data preserved both temporal consistency and physical interpretability, while also reflecting the distinct dynamics of a real building system.
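The sample-selection protocol (occupied hours only, with the first five minutes after each transition excluded) can be sketched as below. The occupied window of 07:00–22:00 is the one stated for the datasets in this paper; treating it as a half-open interval, the function name, and the example schedule are assumptions made for illustration.

```python
def select_samples(labels, occupied_start=7 * 60, occupied_end=22 * 60, settle_min=5):
    """Return indices (minute of day) kept for training or evaluation.

    labels: one label per 1-min timestep of a single day.
    Drops unoccupied minutes and the first settle_min minutes after a label change.
    """
    kept = []
    last_transition = None
    for minute, label in enumerate(labels):
        if minute > 0 and label != labels[minute - 1]:
            last_transition = minute
        settling = last_transition is not None and minute - last_transition < settle_min
        if occupied_start <= minute < occupied_end and not settling:
            kept.append(minute)
    return kept


# Hypothetical day: fault active from 10:00 to 14:00.
day = ["normal"] * 600 + ["coil_fouling"] * 240 + ["normal"] * 600
kept = select_samples(day)
```

For this hypothetical day, 900 occupied minutes remain after the occupancy filter, and 10 more are removed around the two transitions.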

2.5. Baseline Classifiers: Decision Tree and Artificial Neural Network

To establish a fair comparative framework for the proposed DBN, two widely used classifiers, Decision Tree (DT) and Artificial Neural Network (ANN), were implemented following the same data pipeline described in Sections 2.1–2.4. All models consumed identical inputs: 21 monitored variables, z-score standardization, 30 min windows with 1 min stride (21 × 30 = 630 features), and first-order differences Δx concatenated to yield 1260-dimensional samples. Only occupied hours were considered, and the label space comprised four classes (normal, OA damper stuck, coil fouling, filter fouling).
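The feature construction can be sketched in plain Python as follows. Two points are assumptions rather than details given in the text: differences are computed on the full series before windowing, with the first difference set to zero, so that each window contributes 630 difference features; and the z-score statistics are computed on the rows passed in, whereas in practice the training-set statistics would normally be reused for the test set. Function names are hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Standardize each variable (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, stds)] for row in rows]


def first_differences(rows):
    """Per-variable first-order differences; the first row is set to zero."""
    diffs = [[0.0] * len(rows[0])]
    for prev, cur in zip(rows, rows[1:]):
        diffs.append([c - p for p, c in zip(prev, cur)])
    return diffs


def make_windows(rows, window=30, stride=1):
    """Flatten each window of values and differences into one sample."""
    z = zscore_columns(rows)
    dz = first_differences(z)
    samples = []
    for start in range(0, len(rows) - window + 1, stride):
        flat = [v for row in z[start:start + window] for v in row]
        flat += [v for row in dz[start:start + window] for v in row]
        samples.append(flat)
    return samples


# Synthetic data: 40 one-minute timesteps of 21 variables.
rows = [[float(t * (v + 1)) for v in range(21)] for t in range(40)]
samples = make_windows(rows)
```

With 40 timesteps, a 30-step window, and a stride of one, this yields 11 samples of 1260 features each.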

2.5.1. Decision Tree

Decision Tree (DT) is one of the most established algorithms in fault detection and diagnosis owing to its intuitive structure, transparency, and computational efficiency. A DT classifies input data by recursively partitioning the feature space into a hierarchy of decision nodes, where each split is determined by a criterion such as information gain or Gini impurity. This hierarchical decomposition enables the algorithm to discover dominant variable thresholds that separate different operating modes or fault classes. Each terminal node corresponds to a specific diagnosis category, and the resulting rule paths can be directly interpreted as human-readable if–then relationships. Such interpretability makes DTs particularly appealing for building operators, who often prefer diagnostic models that can be understood and validated without advanced machine-learning expertise. In the context of HVAC systems, DT-based fault detection has been widely applied to air-handling units, chillers, and terminal devices, primarily because it requires minimal preprocessing and provides rapid inference suitable for real-time monitoring [29,30,31].
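To make the split criterion concrete, the sketch below searches for the single threshold on one feature that minimizes the weighted Gini impurity. It is a didactic illustration of one tree node, not the scikit-learn implementation used in the study; the function names and the supply-air temperature values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split unless it reduces impurity
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical supply-air temperatures in deg C with known labels.
sat = [15.8, 16.1, 16.0, 18.2, 18.5, 17.9]
state = ["normal", "normal", "normal", "coil_fouling", "coil_fouling", "coil_fouling"]
threshold, impurity = best_threshold(sat, state)
```

The resulting threshold reads directly as an if–then rule of the kind operators can inspect, which is the interpretability advantage described above.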

Despite these advantages, DTs exhibit certain limitations that restrict their generalization capability. Because the algorithm greedily selects locally optimal splits, the resulting tree can overfit to training data, particularly when the features are correlated or the classes overlap. When applied to sensor-based HVAC data, this sensitivity manifests as false alarms under noisy or drifting measurements. Previous studies have noted that DTs tend to perform well for discrete and strongly separable faults, but struggle with gradual degradations such as coil fouling or damper leakage, where feature boundaries are ambiguous [30,31,32]. Regularization techniques such as pruning and depth control can mitigate overfitting, but often at the cost of reduced diagnostic resolution.

In this study, DT was implemented as a lightweight reference model using the scikit-learn framework, with maximum tree depth and minimum split size optimized via validation. Class weights were also applied to address label imbalance among the four operating states (normal, outdoor-air damper stuck, coil fouling, and filter fouling). The DT results serve as a transparent baseline for comparison with more expressive nonlinear models such as ANN and DBN.

2.5.2. Artificial Neural Network

Artificial Neural Networks (ANNs) extend the representational capacity of shallow classifiers by learning nonlinear mappings between input features and output classes. Unlike Decision Trees, which rely on explicit threshold-based partitioning, ANNs can capture smooth and continuous relationships among correlated variables through multi-layer transformations. Each neuron in a hidden layer computes a weighted sum of its inputs followed by a nonlinear activation function, enabling the network to model complex interactions among HVAC variables such as airflow, temperature, and energy consumption. These characteristics make ANNs particularly suitable for systems where multiple sensor readings collectively indicate a fault condition rather than a single dominant threshold. In the context of HVAC fault detection and diagnosis (FDD), previous studies have demonstrated that shallow feedforward networks can effectively identify control and sensor faults under both steady-state and transient conditions [29,30,33]. Their relatively simple architecture, when combined with standardized feature preprocessing, offers an attractive balance between learning capacity and computational efficiency, making ANNs a practical choice for real-time building applications.

In this study, an ANN was implemented as a multilayer perceptron consisting of two hidden layers with rectified linear unit (ReLU) activations and a Softmax output layer for four-class classification. Regularization techniques were applied to improve robustness and prevent overfitting, including dropout between hidden layers and early-stopping based on validation loss. These practices are consistent with recent deep-learning-based FDD frameworks that have emphasized regularization and temporal validation to enhance generalization [20,22,23]. Although ANNs can learn nonlinear boundaries more effectively than DTs, they lack interpretability—internal weights do not directly correspond to physical rules, making root-cause explanation difficult for operators [30,33]. Furthermore, their performance can degrade when training data are limited or class imbalance is severe, conditions common in building datasets. Nevertheless, the ANN serves as a critical intermediate baseline between interpretable but shallow methods such as DT and deeper hierarchical architectures like the proposed DBN, providing insight into how model complexity affects diagnostic accuracy and generalization capability across building domains.
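Early stopping based on validation loss can be sketched as a simple patience rule. The patience value, the minimum-improvement margin, and the loss sequence below are hypothetical; the text does not report the settings actually used.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation-loss curve.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.605, 0.63]
stop_epoch, best_epoch = early_stopping_epoch(losses)
```

In practice the weights from the best epoch, rather than the stopping epoch, would typically be restored before evaluation.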

2.5.3. Rule-Based Fault Detection Method

To provide a physically interpretable benchmark, a rule-based fault detection and diagnosis (FDD) method was developed using the same set of 21 monitored variables employed by the machine-learning models. Each rule was derived from established guidelines and previous studies on air-handling unit (AHU) diagnostics [31,34,35], and tuned using the prototype dataset to balance sensitivity and specificity. The approach relies on simple threshold-based conditions representing the expected cause–effect relationships among AHU variables under normal and faulty operations.

The rules were defined as follows:

- Outdoor-air damper stuck: The absolute temperature difference between mixed air and return air is less than 1.0 °C, and the outdoor-air flow fraction is below 5 percent of the design value for more than 10 min.
- Cooling-coil fouling: The difference between supply-air temperature and its setpoint exceeds 1.0 °C, and the cooling-valve opening remains above 90 percent for at least 5 min.
- Filter fouling: The duct static pressure exceeds 1.3 times its nominal value, or fan power increases by more than 15 percent while maintaining a constant airflow rate.

Each rule included a persistence window (5–10 min) to suppress transient fluctuations and reduce false alarms. Threshold values were tuned through a validation sweep to maximize the macro F1-score on the prototype dataset. This rule-based baseline was evaluated using the same testing dataset and variable set as the machine-learning models, ensuring a fair comparison based on identical inputs and labeling procedures.
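The three rules and their persistence windows can be sketched as below for 1 min data. Several details are assumptions, not specifications from the text: the record keys, reading "more than 10 min" as 11 consecutive samples, a 5 min persistence for the filter rule, a ±5% tolerance for "constant airflow", and the priority order applied when more than one rule fires.

```python
def persistent(flags, min_samples):
    """True at sample t once the condition has held for min_samples in a row."""
    out, run = [], 0
    for flag in flags:
        run = run + 1 if flag else 0
        out.append(run >= min_samples)
    return out


def damper_rule(r):
    return abs(r["mat"] - r["rat"]) < 1.0 and r["oa_frac"] < 0.05 * r["oa_frac_design"]


def coil_rule(r):
    return r["sat"] - r["sat_sp"] > 1.0 and r["valve_pct"] > 90.0


def filter_rule(r, airflow_tol=0.05):
    pressure_high = r["static_p"] > 1.3 * r["static_p_nom"]
    airflow_steady = abs(r["airflow"] / r["airflow_nom"] - 1.0) <= airflow_tol
    power_high = r["fan_power"] > 1.15 * r["fan_power_nom"] and airflow_steady
    return pressure_high or power_high


def diagnose(records):
    """Return one label per record; priority order is an assumption."""
    damper = persistent([damper_rule(r) for r in records], 11)
    coil = persistent([coil_rule(r) for r in records], 5)
    filt = persistent([filter_rule(r) for r in records], 5)
    labels = []
    for d, c, f in zip(damper, coil, filt):
        if d:
            labels.append("oa_damper_stuck")
        elif c:
            labels.append("coil_fouling")
        elif f:
            labels.append("filter_fouling")
        else:
            labels.append("normal")
    return labels


# Hypothetical healthy operating point used to build test records.
HEALTHY = {
    "mat": 26.0, "rat": 24.0, "oa_frac": 0.2, "oa_frac_design": 0.2,
    "sat": 16.0, "sat_sp": 16.0, "valve_pct": 50.0,
    "static_p": 250.0, "static_p_nom": 250.0,
    "fan_power": 5000.0, "fan_power_nom": 5000.0,
    "airflow": 5.0, "airflow_nom": 5.0,
}
stuck_labels = diagnose([dict(HEALTHY, mat=24.2, oa_frac=0.0)] * 20)
```

The persistence counter resets whenever a condition lapses, so short transients after setpoint changes do not trigger an alarm.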

2.6. Dataset Summary and Selection Rationale

The proposed fault detection and diagnosis (FDD) framework relied on two complementary datasets, one generated from a standardized prototype and the other from a calibrated building-specific EnergyPlus model, to ensure both generalizability and realism. Simulation-based data generation was chosen to capture a wide range of HVAC operating conditions without disrupting real systems, providing a safe and cost-effective basis for developing and validating the diagnostic models.

For the training dataset, the DOE reference Medium Office prototype model included in the EnergyPlus distribution was selected. This model represents a typical three-story commercial office building and has been widely validated in prior studies, making it a reliable benchmark for research on HVAC performance and fault analysis. The prototype provides standardized occupancy, lighting, and equipment schedules consistent with the ASHRAE 90.1 framework, ensuring realistic load and control patterns. Only occupied-hour data (07:00–22:00) were extracted to eliminate idle-system behavior that could bias classification. The simulation generated 1 min interval data for an entire month, encompassing transient and steady-state operation under diverse weather conditions. Twenty-one variables were monitored, covering air and water-side thermal conditions, airflow rates, fan power, coil loads, and pressure measurements. These variables were selected based on their diagnostic relevance and consistent availability across both the prototype and target building models.

Three representative HVAC faults were systematically introduced into the prototype simulation to create labeled data for supervised training:

Outdoor-air damper stuck closed, representing ventilation loss and potential indoor air-quality degradation.
Cooling-coil fouling (65% capacity reduction), representing degraded heat-transfer effectiveness and elevated supply-air temperatures.
Air-filter fouling (30% pressure increase), representing airflow restriction and increased fan power demand.

For the testing dataset, an EnergyPlus-based digital twin of a research building operated by Company H in Korea was constructed. The building is a medium-sized office-type research facility, modeled to reflect its actual geometry, envelope, occupancy schedules, and HVAC system configuration. The second floor—served by a dedicated air-handling unit (AHU)—was chosen as the analysis target. The same 21 variables and 1 min temporal resolution were used to maintain parity with the training data. All three fault scenarios were reproduced using the same implementation logic, allowing the model to be evaluated under comparable yet physically distinct conditions.

By aligning both datasets in terms of monitored variables, temporal resolution, and fault definitions, the framework ensures that any observed differences in performance arise from model generalization across building domains, rather than inconsistencies in data representation.

Table 4 summarizes the key characteristics of the training and testing datasets.

3. Deep Belief Network Model and Hyperparameter Configuration

3.1. Model Selection and Architectural Design

A Deep Belief Network (DBN) was selected as the classification model for fault detection and diagnosis due to its capability of handling high-dimensional, nonlinear, and time-dependent sensor data. Unlike shallow classifiers that rely on fixed feature boundaries, a DBN is constructed upon probabilistic graphical models, allowing hierarchical extraction of features that are well suited to complex HVAC operational patterns. Formally, a DBN is built as a stack of Restricted Boltzmann Machines (RBMs). An RBM is an undirected bipartite graphical model that connects a visible layer v and a hidden layer h. Figure 8 illustrates the basic framework of a Restricted Boltzmann Machine (RBM) consisting of visible and hidden layers. The joint distribution of these nodes is defined by the Boltzmann distribution, parameterized by the following energy function:

E(v, h) = - \sum_{i = 1}^{m} a_{i} v_{i} - \sum_{j = 1}^{n} b_{j} h_{j} - \sum_{i = 1}^{m} \sum_{j = 1}^{n} v_{i} w_{i j} h_{j}

(1)

where W = (w_{ij}) is the weight matrix, and a and b denote the bias vectors of the visible and hidden layers, respectively. Low-energy states are more probable, and the probability of a given configuration is defined as P(v, h) = \frac{1}{Z} e^{-E(v, h)}, with Z representing the partition function. Training of RBMs seeks to maximize data likelihood, which is commonly approximated using Contrastive Divergence (CD).
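As a worked illustration of Equation (1), the sketch below evaluates the energy of a toy binary RBM and obtains the partition function Z by brute-force enumeration. The sizes (3 visible, 2 hidden units) and the random parameters are assumptions made only so that Z is tractable; enumeration is not feasible for networks of realistic size, which is why approximations such as CD are used.

```python
import itertools

import numpy as np


def energy(v, h, a, b, W):
    """E(v, h) = -a.v - b.h - v^T W h for binary vectors v and h."""
    return -(a @ v) - (b @ h) - (v @ W @ h)


rng = np.random.default_rng(0)
m, n = 3, 2                  # toy sizes: 3 visible and 2 hidden units
a = rng.normal(size=m)       # visible biases a_i
b = rng.normal(size=n)       # hidden biases b_j
W = rng.normal(size=(m, n))  # weights w_ij

# Enumerate all 2^(m+n) binary states to compute Z exactly.
states = [(np.array(v), np.array(h))
          for v in itertools.product([0, 1], repeat=m)
          for h in itertools.product([0, 1], repeat=n)]
energies = np.array([energy(v, h, a, b, W) for v, h in states])
Z = np.exp(-energies).sum()
probs = np.exp(-energies) / Z  # P(v, h) = exp(-E(v, h)) / Z

print("number of states:", len(states))
print("probabilities sum to:", probs.sum())
print("lowest-energy state is most probable:",
      int(np.argmin(energies)) == int(np.argmax(probs)))
```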

By stacking RBMs in a layer-wise manner, the DBN performs unsupervised pre-training, progressively transforming raw sensor data into abstract feature representations. Each RBM is trained independently, using the output of the previous layer as its input. After this unsupervised stage, the network undergoes supervised fine-tuning using backpropagation with a cross-entropy loss function. This two-step process addresses the challenge of scarce labeled data in fault detection while improving initialization for deep networks.
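A single Contrastive Divergence (CD-1) update for one RBM can be sketched as below. The sketch assumes a Bernoulli (binary) RBM trained on toy binary patterns; the article does not state which RBM variant was used, and z-scored sensor inputs would typically call for a Gaussian-visible variant instead. The function name `cd1_step`, the learning rate, and the data are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update (in place) for a Bernoulli RBM on a batch v0 of shape (batch, m)."""
    ph0 = sigmoid(v0 @ W + b)                          # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + a)                        # reconstruction P(v = 1 | h0)
    ph1 = sigmoid(pv1 @ W + b)                         # hidden probabilities of the reconstruction
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / batch       # positive minus negative phase
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return float(np.mean((v0 - pv1) ** 2))             # reconstruction error


rng = np.random.default_rng(0)
# Toy binary data: two complementary patterns, repeated.
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 20, dtype=float)
m, n = data.shape[1], 4
W = 0.01 * rng.normal(size=(m, n))
a = np.zeros(m)
b = np.zeros(n)
errors = [cd1_step(data, W, a, b, lr=0.1, rng=rng) for _ in range(500)]
print("first and last reconstruction error:", errors[0], errors[-1])
```

In a stacked setting, the hidden probabilities of a trained RBM would serve as the input data for the next RBM, which is the greedy layer-wise scheme described above.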

Figure 9 depicts the construction of a DBN by stacking multiple RBMs. The greedy layer-wise training strategy updates weights sequentially, enabling efficient learning in deep architectures. Once pre-training is complete, the final network is fine-tuned in a supervised fashion, making it suitable for multi-class fault classification. For this study, the DBN was tailored to building fault detection with both static and dynamic features. Twenty-one monitored variables were standardized using z-scores and segmented into 30 min windows with one-minute stride, yielding 630 features per sample. To emphasize temporal transitions, first-order differences (Δx) were computed and concatenated with the raw features, resulting in 1260-dimensional inputs. The final network architecture consisted of three hidden layers with 256, 128, and 64 nodes, respectively, with ReLU activations and dropout regularization applied between layers. A Softmax output layer mapped the learned features into four categories: normal operation, outdoor air damper stuck closed, cooling coil fouling, and air filter fouling. This design balances capacity and regularization, ensuring that the model can extract discriminative features while generalizing across training and testing datasets.
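The feature construction described above (21 variables, 30 min windows, 1 min stride, raw values plus first-order differences) can be sketched as follows. Two details are assumptions, since the text does not specify them: differences are computed over the full series before windowing, and the first difference row is set to zero so that each window contributes 30 difference rows. The input data are synthetic.

```python
import numpy as np


def make_windows(x, window=30, stride=1):
    """x: (T, 21) standardized series -> (num_windows, 2 * window * 21) features."""
    # First-order differences over the full series; the first row is zero
    # (an assumption) so every window keeps `window` difference rows.
    dx = np.diff(x, axis=0, prepend=x[:1])
    starts = range(0, x.shape[0] - window + 1, stride)
    rows = [np.concatenate([x[s:s + window].ravel(), dx[s:s + window].ravel()])
            for s in starts]
    return np.stack(rows)


rng = np.random.default_rng(0)
raw = rng.normal(loc=20.0, scale=3.0, size=(120, 21))  # hypothetical 2 h of 1 min data
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)         # z-score per variable
features = make_windows(z)
print(features.shape)  # (91, 1260)
```

Each sample holds 30 × 21 = 630 raw values followed by 630 differences, giving the 1260-dimensional input. In a transfer setting, the z-score statistics would normally be estimated on the training data only and reused for the test data.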

3.2. Hyperparameter Optimization Strategy

The optimization of hyperparameters is a critical process in the development of a Deep Belief Network, as these parameters directly determine the network’s representational capacity, convergence stability, and ability to generalize to unseen operational data. In the present study, a systematic two-stage optimization framework was adopted to avoid arbitrary parameter selection. The first stage consisted of a coarse random search, in which 50 candidate models were generated by randomly sampling from predefined parameter ranges. The search space included network depth between two and four layers, hidden units ranging from 64 to 512 per layer, learning rates distributed logarithmically between 1 × 10⁻⁵ and 2 × 10⁻³, dropout rates between 0.1 and 0.5, L2 regularization values from zero to 5 × 10⁻⁴, and batch sizes between 64 and 256. Additionally, both raw input features and augmented features with first-order differences (Δx) were evaluated to assess the contribution of temporal dynamics. Each sampled model was trained for up to 50 epochs with early stopping applied if validation performance stagnated for more than eight iterations. Macro F1-score was employed as the primary evaluation metric because it provides a balanced view of performance across fault classes, regardless of class imbalance.
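The first-stage random search can be sketched as a sampler over the stated ranges. The discrete grids for hidden units and batch size, the independent sampling of each layer width, and the seed are assumptions; the text gives only the ranges. Training and scoring of each candidate are omitted.

```python
import math
import random


def sample_config(rng):
    """Draw one candidate configuration from the search space described above."""
    depth = rng.randint(2, 4)  # two to four hidden layers (inclusive)
    return {
        "hidden_units": [rng.choice([64, 128, 256, 512]) for _ in range(depth)],
        # log-uniform between 1e-5 and 2e-3
        "learning_rate": 10 ** rng.uniform(math.log10(1e-5), math.log10(2e-3)),
        "dropout": rng.uniform(0.1, 0.5),
        "l2": rng.uniform(0.0, 5e-4),
        "batch_size": rng.choice([64, 128, 256]),
        "use_delta_features": rng.choice([False, True]),
    }


rng = random.Random(42)
candidates = [sample_config(rng) for _ in range(50)]
print(candidates[0])
```

Sampling the learning rate in log space spreads candidates evenly across orders of magnitude, which a uniform draw over the same interval would not do.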

The second stage employed Bayesian optimization using a Tree-structured Parzen Estimator to refine the most promising regions of the hyperparameter space. This approach adaptively sampled new configurations based on the posterior probability of high performance, ensuring efficient exploration. To further ensure robustness, each trial was validated using day-block cross-validation, which separates training and validation datasets chronologically. This method mimics real-world deployment scenarios, where predictive models are required to operate on time periods not represented in training data.

Several stabilization techniques were incorporated into the optimization pipeline. Gradient clipping was applied with an upper bound of 5 on the ℓ₂ norm to prevent exploding gradients, while cosine learning rate decay with warm-up ensured smooth convergence even when the initial learning rate was relatively large. Dropout was systematically varied, and models with rates below 0.2 were found to overfit rapidly, while rates above 0.4 compromised feature retention. L2 weight penalties consistently improved validation scores by constraining excessively large parameters. Batch size was observed to influence gradient stability: small batches increased noise, while very large batches reduced generalization. A size of 128 achieved the best balance. The optimization process ultimately converged to a three-layer DBN with 256, 128, and 64 hidden units, a learning rate of 5 × 10⁻⁴, dropout of 0.3, L2 penalty of 1 × 10⁻⁴, and batch size of 128. The inclusion of Δx features improved macro F1-scores by approximately 1–2%, particularly for faults that evolve gradually such as damper or filter malfunctions. The final optimized configuration is summarized in Table 3, which highlights the search ranges, initial settings, and sensitivity analyses for each parameter.
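One way to realize day-block cross-validation is to hold out contiguous blocks of whole days, as sketched below. The exact splitting scheme is not given in the text, so the helper `day_block_folds`, the five folds, and the 30-day layout are assumptions.

```python
import numpy as np


def day_block_folds(day_ids, n_folds):
    """Yield (train_idx, val_idx) with whole days held out in chronological blocks."""
    days = np.unique(day_ids)  # sorted unique day labels
    for block in np.array_split(days, n_folds):
        val_mask = np.isin(day_ids, block)
        yield np.where(~val_mask)[0], np.where(val_mask)[0]


# Hypothetical month: 30 days with 900 occupied minutes (07:00-22:00) per day.
day_ids = np.repeat(np.arange(30), 900)
folds = list(day_block_folds(day_ids, n_folds=5))
for train_idx, val_idx in folds:
    # No day may contribute samples to both sides of a split.
    assert not set(day_ids[train_idx]) & set(day_ids[val_idx])
print([len(val_idx) for _, val_idx in folds])
```

Because 30 min windows overlap heavily at a 1 min stride, splitting by day rather than by sample keeps near-duplicate windows from landing on both sides of a split; windows that would span a day boundary are best discarded before splitting.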

3.3. Optimization Outcomes and Implications

This study adopted a two-stage hyperparameter optimization procedure to obtain a DBN configuration that is stable, regularized, and appropriate for high-dimensional AHU time-series data. In Stage-1, a coarse random search explored network depth (2–4 layers), hidden units (64–512 per layer), learning rate (log-uniform), dropout (0.1–0.5), L2 weight penalty (0–5 × 10⁻⁴), batch size (64–256), and the inclusion of first-order differences (Δx) in addition to raw features. Each candidate was trained with early stopping (validation patience of 8 epochs) using day-block cross-validation to prevent temporal leakage, and the best regions of the space were then refined in Stage-2 via Bayesian optimization with a tree-structured Parzen estimator. To further stabilize training, we employed gradient clipping (ℓ₂ ≤ 5), cosine learning-rate decay with warm-up, and class-weighting to address label imbalance across the four operating states.
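The two stabilization techniques named above can be sketched in a framework-independent way. The base learning rate of 5 × 10⁻⁴ and the ℓ₂ bound of 5 come from the text; the warm-up length, the decay to zero, and the use of a global (joint) norm across all gradient arrays are assumptions.

```python
import math

import numpy as np


def lr_at(step, total_steps, base_lr=5e-4, warmup_steps=100):
    """Linear warm-up to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their joint l2 norm is at most max_norm."""
    total = math.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]


grads = [np.full((3, 3), 4.0), np.full(3, 4.0)]  # joint norm = 4 * sqrt(12)
clipped = clip_by_global_norm(grads)
clipped_norm = math.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))
print(lr_at(0, 1000), lr_at(100, 1000), lr_at(1000, 1000), clipped_norm)
```

Clipping rescales all gradients by the same factor, so the update direction is preserved while its magnitude is bounded.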

The final architecture was selected based on validation-set criteria (lowest validation loss/highest macro-F1 under identical splits) and on robustness considerations observed consistently across folds. The chosen model comprises three hidden layers with 256, 128, and 64 units, respectively, rectified-linear activations, a Softmax output over four classes, dropout of 0.3 between hidden layers, L2 penalty of 1 × 10⁻⁴, a base learning rate of 5 × 10⁻⁴ with cosine decay, and a batch size of 128. Input features include both 30 min windows of standardized sensor values and their first-order differences (Δx), which we retained because they improved temporal discriminability in ventilation-related scenarios during validation. We also observed that widening the first hidden layer beyond 256 units produced diminishing returns relative to the added computation, and that dropout in the range 0.25–0.35 offered a favorable regularization–retention trade-off under the same validation protocol.
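The selected architecture can be written out as a plain NumPy forward pass, which also makes the parameter count explicit. This is a sketch of the fine-tuning network only: the random initialization is an assumption standing in for RBM-pretrained weights, and applying dropout after every hidden layer is one reading of "between hidden layers".

```python
import numpy as np

LAYER_SIZES = [1260, 256, 128, 64, 4]  # input, three hidden layers, four classes


def init_params(rng):
    """Random placeholder weights; a DBN would start from RBM-pretrained weights."""
    return [(rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]


def forward(x, params, dropout=0.0, rng=None):
    """Return class probabilities; pass dropout > 0 and rng only during training."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)  # ReLU
        if dropout > 0.0 and rng is not None:
            h = h * (rng.random(h.shape) >= dropout) / (1.0 - dropout)  # inverted dropout
    W, b = params[-1]
    logits = h @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)      # Softmax


rng = np.random.default_rng(0)
params = init_params(rng)
probs = forward(rng.normal(size=(8, 1260)), params)
n_params = sum(W.size + b.size for W, b in params)
print(probs.shape, n_params)  # (8, 4) 364228
```

About 89% of the 364,228 parameters sit in the first layer (1260 × 256 weights plus 256 biases), which is consistent with the observation that widening that layer adds computation quickly.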

These design selections are consistent with prior deep-learning-based FDD research emphasizing the importance of temporal dynamics and regularization for stable training and generalization [20,21,22,23]. Previous studies have similarly shown that incorporating short-term feature differences enhances fault discriminability in HVAC systems, while carefully tuned dropout and network depth prevent overfitting in high-dimensional sensor datasets [20,22]. These selections therefore reflect methodological best practices established in the recent HVAC FDD literature rather than ad hoc configuration choices. All quantitative outcomes (training/validation vs. test and inter-model comparisons) are reported in Section 4 under a strictly separated evaluation protocol.

4. Results and Discussion

4.1. Performance Comparison of Classifiers

Prior to the comparative evaluation, it is important to note that the DBN was trained exclusively on simulation data generated from the EnergyPlus Medium Office prototype and tested on an independent dataset derived from the digital-twin model of the target research building. This separation eliminates any data overlap between training and testing and ensures that the reported performance reflects prototype-to-target transfer within simulation environments rather than memorization. The model achieved an average diagnostic accuracy of 98.6% on the simulation-based training dataset and 97.9% on the digital-twin test dataset, indicating that the DBN maintained high accuracy with less than a one-percent drop when evaluated on an independently calibrated target-building EnergyPlus model. To comprehensively assess the suitability of the Deep Belief Network (DBN) for HVAC fault detection and diagnosis, its performance was compared against two widely used baseline classifiers: Decision Tree (DT) and Artificial Neural Network (ANN). These models were selected for three reasons. First, DTs represent a simple, interpretable, and computationally lightweight machine learning approach, frequently employed in building fault detection as a benchmark model due to their ease of implementation. Second, ANNs provide a natural progression in complexity, offering nonlinear mapping capabilities that can handle high-dimensional sensor data more effectively than DTs. Third, by comparing DBN against both a shallow interpretable model (DT) and a conventional neural network (ANN), the superiority of the DBN in capturing hierarchical and temporal system behaviors could be clearly demonstrated. All three models were trained on the DOE Medium Office prototype dataset and tested on the EnergyPlus-based Company H research building model. Both datasets consisted of 21 monitored variables at 1 min resolution, covering airflow rates, temperatures, coil energy consumption, and pressure measurements. 
Fault scenarios included outdoor air damper stuck, cooling coil fouling (65%), and air filter fouling (30%), providing a consistent and balanced benchmark across classifiers. By maintaining identical datasets and fault scenarios, the comparison isolates model architecture as the primary variable influencing performance.

The Decision Tree classifier achieved the lowest performance. Figure 10 presents the confusion matrices of the rule-based method, Decision Tree (DT), and Artificial Neural Network (ANN) on the independent test dataset. While DTs are attractive for their transparency and computational efficiency, their simplistic rule-based structure was unable to capture subtle overlaps between normal and faulty operating states. The confusion matrix revealed substantial misclassification of OA damper stuck versus air filter fouling, reflecting the model’s inability to generalize when variable distributions were similar across classes. With an overall accuracy of approximately 90% and F1-scores around 89%, the DT highlighted its role as a baseline benchmark rather than a robust solution for high-stakes HVAC fault detection. The ANN outperformed DT, benefiting from its nonlinear feature mapping capability. Its improved architecture allowed for more reliable classification of cooling coil fouling, where recall values exceeded 95%. However, the relatively shallow structure of the ANN limited its ability to extract deeper temporal and hierarchical patterns within the time-series data. While it reduced misclassification errors compared to DT, the ANN still exhibited weaknesses in distinguishing OA damper faults, where variable fluctuations and transient behaviors played a significant role. The model achieved an overall accuracy of nearly 95%, positioning it as a mid-level classifier that demonstrated clear improvements over DT but remained inferior to DBN.

The rule-based method exhibits notable confusion among fault types due to overlapping threshold-based conditions, particularly between OA damper and air filter faults. Data-driven models reduce misclassification progressively, with ANN showing improved discrimination compared to rule-based and DT approaches. The corresponding DBN results are presented separately in Figure 11.

The DBN consistently achieved superior results across all evaluation metrics. Its deep hierarchical structure facilitated the extraction of both low-level operational signals and high-level abstract patterns, allowing for robust classification of faults that were problematic for DT and ANN. In particular, the DBN demonstrated resilience against sensor noise and fault transition boundaries, which are common challenges in real-world HVAC systems. For OA damper stuck, DBN improved classification accuracy by more than 5% compared to ANN and nearly 8% compared to DT, demonstrating its ability to capture subtle dynamics that shallow models overlooked. With average accuracy approaching 98% and macro F1-scores above 97%, DBN proved to be the most reliable and scalable method for building fault detection.

The DBN hyperparameter optimization results are summarized in Table 5, and the overall performance comparison of the rule-based method, DT, ANN, and DBN is reported in Table 6. The rule-based method provides a physically interpretable baseline but exhibits lower accuracy due to rigid threshold definitions, whereas the DBN consistently achieves the highest performance across all evaluation metrics. As illustrated, DT provided a quick and interpretable but ultimately limited approach, ANN offered moderate performance with improved nonlinear mapping, and DBN decisively outperformed both. The confusion matrices in Figure 10 and Figure 11 further illustrate the progression of performance across models: DT exhibited frequent misclassifications, ANN reduced but did not eliminate them, and DBN nearly eliminated errors across all fault types. These results confirm that DBN is not merely a marginal improvement over traditional classifiers but represents a significant advancement in achieving reliable and robust fault detection in HVAC applications.

Although detailed confusion matrices are not shown, an internal analysis of classification results was conducted to verify possible mislabeling between normal and faulty states. The Decision Tree (DT) displayed the largest overlap between outdoor-air damper stuck and filter fouling conditions, confirming its tendency to merge ventilation-related faults with similar variable patterns. The Artificial Neural Network (ANN) reduced this overlap but occasionally misclassified partial-fault conditions—particularly mild damper degradation—as normal during transient recovery periods. The Deep Belief Network (DBN) exhibited the lowest rate of false negatives, with fewer than 2% of total fault samples incorrectly labeled as normal and no major confusion among fault categories.

These results indicate that the DBN not only achieves high overall accuracy but also maintains diagnostic reliability in distinguishing fault conditions from normal operation, which is essential for real-time FDD implementation.
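The two quantities discussed above, macro F1-score and the share of fault samples labeled as normal, can both be derived from a four-class confusion matrix, as sketched below. The counts are hypothetical and are not results from this study; the class order and function names are likewise assumptions.

```python
import numpy as np

CLASSES = ["normal", "oa_damper_stuck", "coil_fouling", "filter_fouling"]


def macro_f1(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return float(f1.mean())  # unweighted mean over classes


def fault_miss_rate(cm):
    """Share of all fault samples (rows 1 onward) predicted as normal (column 0)."""
    return float(cm[1:, 0].sum() / cm[1:].sum())


# Hypothetical counts for illustration only; NOT results from the study.
cm = np.array([[980,  10,   5,   5],
               [ 15, 940,   5,  40],
               [  5,   5, 985,   5],
               [ 10,  30,   5, 955]])
print(round(macro_f1(cm), 4), round(fault_miss_rate(cm), 4))
```

Because macro F1 averages per-class scores without weighting by support, a weak class such as the damper fault lowers the score even when overall accuracy is high.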

4.2. Practical Implications and Limitations

The results provide implications for simulation-driven development and benchmarking of FDD models under controlled fault injection. Because the training and evaluation pipelines rely on commonly available AHU variables, the workflow is potentially scalable within simulation-driven development. Deployment relevance is discussed, but field validation remains future work; thus, direct implementation in diverse real buildings is not demonstrated in this study. Among the classifiers compared, the superior performance of DBN highlights its enhanced ability to capture nonlinear and high-dimensional fault patterns, making it particularly suitable for complex HVAC systems. If confirmed with field data, this finding could support broader adoption in existing buildings, where retrofit costs and system heterogeneity often present significant barriers. In this study, the target-building EnergyPlus model was calibrated against measured BEMS trends under normal operation, and then used as an independent simulation environment for generating evaluation data under controlled fault injection. Accordingly, the reported test performance reflects transfer across two EnergyPlus models under aligned AHU topology, control sequences, and variable definitions. This should not be interpreted as evidence of generalization to real-world BMS data, which remains a key direction for future work. While DT and ANN showed acceptable accuracy, DBN consistently outperformed both baselines, underscoring its robustness in transferring knowledge across the two modeled building domains. This feature is potentially useful for simulation-based screening and benchmarking across multiple modeled facilities, provided that system topology, control logic, and variable definitions are comparable.

Despite these advantages, certain limitations must be acknowledged. The current framework focused on a limited set of fault scenarios: outdoor air damper stuck, cooling coil fouling, and air filter fouling. Although these represent common and energy-critical faults, the scalability of the method to a broader range of HVAC fault types remains to be validated. Additionally, the training and testing datasets were generated through EnergyPlus simulations, which, although grounded in physical principles, may not fully capture the variability, uncertainties, and noise inherent in field data. This limitation applies equally to all three classifiers but is particularly important for validating DBN’s generalization in real-world contexts.

Finally, while DBN achieved the strongest performance in this study, ANN and DT still provide lightweight alternatives with lower computational cost and easier implementation. Future research should extend the comparative analysis to include advanced deep learning architectures such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or hybrid ensemble methods, as well as online learning mechanisms to adapt to evolving building operation conditions. Such work will further clarify the trade-offs among accuracy, complexity, and scalability, and will support future deployment studies with measured BMS data.

Seasonal representativeness is a further limitation of this study. Due to the characteristics of the experimental setup and the availability of calibrated BEMS data for the target-building model, the simulation-based training and evaluation datasets were generated for a single operating period rather than multiple seasons. As a result, the reported performance reflects fault-diagnosis behavior under the evaluated conditions only. Extending the framework to multi-season or year-round operation and assessing seasonal robustness using longer-term BEMS records remain important directions for future work.

4.3. Discussion

The comparative evaluation highlights that model architecture and feature representation play decisive roles in HVAC fault diagnosis performance. The Decision Tree (DT) offered the advantage of transparency and simplicity but showed limited capability in handling overlapping or gradual fault patterns. This behavior stems from its axis-aligned partitioning of feature space, which makes DT sensitive to noisy or weakly correlated variables. Consequently, DT was effective in identifying discrete faults such as filter clogging but performed inconsistently for progressive faults like damper sticking or coil fouling. The Artificial Neural Network (ANN) improved the overall fault-classification accuracy by modeling nonlinear interactions among variables. However, its shallow structure constrained hierarchical feature learning, and its reliance on static feature snapshots made it less responsive to temporal fault evolution. This explains its occasional misclassification of transient fault states as normal. Such limitations have also been noted in prior HVAC fault-detection studies employing shallow neural architectures [29,30,33].

The Deep Belief Network (DBN) outperformed both baselines primarily due to its hierarchical feature-extraction capability and the inclusion of temporal-difference features (Δx). These enabled the model to capture the dynamic progression of HVAC faults rather than relying solely on instantaneous sensor readings. The unsupervised pretraining of Restricted Boltzmann Machines provided robust feature initialization, reducing sensitivity to data imbalance and improving prototype-to-target transfer performance within simulation environments. The DBN’s low false-negative rate and consistent accuracy between prototype-trained and digital-twin-tested datasets demonstrate that the framework can maintain performance across the two evaluated EnergyPlus models when key variables and control logic are aligned. Another noteworthy aspect is that performance improvements were not strictly tied to network depth or parameter count but rather to the balance between expressiveness and regularization. The optimized three-layer DBN achieved stable convergence and superior validation consistency compared with deeper or unregularized alternatives. This observation supports the findings of recent deep-learning FDD research, which emphasizes that moderate complexity combined with dropout and L2 regularization provides optimal robustness for high-dimensional sensor data [20,22,23].

Overall, these results indicate that future FDD research should focus less on increasing model depth and more on improving temporal representation, feature fusion, and physical interpretability. While sequence-to-sequence architectures such as LSTM, GRU, and Transformer models have become increasingly popular for HVAC fault detection using raw time-series data [20,21,22,23], this study focuses on a simulation-based baseline that combines (i) windowed temporal features and first-order differences (Δx) with (ii) DBN’s unsupervised pretraining for stable learning under high-dimensional inputs and limited fault diversity [26,27,28]. The primary objective is not to claim superiority over state-of-the-art sequence models, but to establish a transparent inter-simulation transfer benchmark using a DBN under controlled EnergyPlus-based fault scenarios [6,7,13]. Comprehensive comparisons with modern sequence-to-sequence architectures are therefore left for future work.

5. Conclusions

This study developed and evaluated a machine learning–based framework for fault detection and diagnosis (FDD) in building HVAC systems using simulation-generated datasets. Training data were obtained from the EnergyPlus Medium Office prototype, while testing data were generated from a detailed EnergyPlus model of a research building operated by Company H in Korea. By extracting 21 representative variables related to thermal conditions, airflow, pressure, and energy consumption, both training and testing datasets were constructed under consistent conditions. Three representative HVAC fault scenarios were considered: outdoor air damper stuck in the fully closed position, cooling coil fouling corresponding to a 65% reduction in heat exchange effectiveness, and air filter fouling representing a 30% increase in resistance. The performance of three machine learning classifiers—Decision Tree (DT), Artificial Neural Network (ANN), and Deep Belief Network (DBN)—was systematically compared. The results demonstrated that DBN consistently outperformed the baseline models, achieving the highest accuracy, precision, recall, and F1-scores across all fault scenarios. ANN achieved moderate performance, while DT showed the lowest overall accuracy, confirming the advantage of deeper architectures in handling nonlinear fault patterns and high-dimensional sensor data. The evaluation also indicated prototype-to-target transfer within simulation environments, as the DBN trained on a standardized EnergyPlus prototype dataset maintained performance when evaluated on an independent dataset generated from a calibrated EnergyPlus model representing the target facility (digital twin simulation). The findings provide two key implications. First, they highlight the potential of DBN-based FDD when applied to commonly available AHU-level variables, without requiring complex physical modeling or expert-defined rules, within a simulation-based evaluation framework. 
Second, the observed prototype-to-target transfer suggests that the workflow is potentially scalable within simulation-driven development, provided that system topology, control logic, and monitored variables are comparable. Nevertheless, certain limitations were identified, including reliance on simulation-generated datasets and a limited number of fault scenarios. Future work should expand the fault library, integrate field data for validation, and investigate advanced architectures such as CNNs, RNNs, or ensemble hybrids, as well as online learning strategies for continuous adaptation to changing building conditions.

This study demonstrates a simulation-based proof-of-concept for AHU fault classification using a Deep Belief Network (DBN). Training was conducted on a standardized EnergyPlus prototype model, and evaluation was performed using an independent, calibrated EnergyPlus model representing a target facility (digital twin simulation). The results indicate prototype-to-target transfer within simulation environments when AHU topology, control logic, and monitored variables are aligned. Deployment relevance is discussed, but field validation remains future work; therefore, claims regarding real-building feasibility or portfolio-scale scalability are beyond the evidence provided in this study. Future work will focus on validation using measured BMS data, expanding fault libraries and severities, and evaluating performance across multiple buildings and control configurations. Future work may also consider integration of fault diagnosis with learning-based control and resource management strategies in building and energy systems [36].

Funding

This research was supported by a Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Korean government (MOLIT) (RS-2023-00250434).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

References

Katipamula, S.; Brambley, M.R. Methods for Fault Detection, Diagnostics, and Prognostics for Building Systems—A Review, Part I. HVACR Res. 2005, 11, 3–25.
Katipamula, S.; Brambley, M.R. Methods for Fault Detection, Diagnostics, and Prognostics for Building Systems—A Review, Part II. HVACR Res. 2005, 11, 169–187.
Du, Z.; Fan, B.; Jin, X.; Chi, J. Fault detection and diagnosis for buildings and HVAC systems using combined neural networks and subtractive clustering analysis. Build. Environ. 2014, 73, 1–11.
Li, Y.; O’Neill, Z. A critical review of fault modeling of HVAC systems in buildings. Build. Simul. 2018, 11, 953–975.
Frank, S.; Jin, X.; Lin, G.; Singla, R.; Farthing, A.; Granderson, J. A Performance Evaluation Framework for Building Fault Detection and Diagnosis Algorithms. Energy Build. 2019, 192, 84–92.
Kim, J.; Frank, S.; Im, P.; Braun, J.E.; Goldwasser, D.; Leach, M. Representing Small Commercial Building Faults in EnergyPlus, Part II: Model Validation. Buildings 2019, 9, 239.
Frank, S.; Heaney, M.; Jin, X.; Robertson, J.; Cheung, H.; Elmore, R.; Henze, G. Hybrid Model-Based and Data-Driven Fault Detection and Diagnostics for Commercial Buildings: Preprint; Conference Paper (NREL/CP-5500-65924); National Renewable Energy Laboratory (NREL): Golden, CO, USA, 2016.
Zhang, L.; Leach, M.; Bae, Y.; Cui, B.; Bhattacharya, S.; Lee, S.; Im, P.; Adetola, V.; Vrabie, D.; Kuruganti, T. Sensor impact evaluation and verification for fault detection and diagnostics in building energy systems: A review. Adv. Appl. Energy 2021, 3, 100055.
Dexter, A.; Pakanen, J. (Eds.) Demonstrating Automated Fault Detection and Diagnosis Methods in Real Buildings; IEA ECBCS Annex 34; VTT Technical Research Centre of Finland: Espoo, Finland, 2001.
Huang, J.; Wen, J.; Yoon, H.; Pradhan, O.; Wu, T.; O’Neill, Z.; Candan, K.S. Real vs. simulated: Questions on the capability of simulated datasets on building fault detection for energy efficiency from a data-driven perspective. Energy Build. 2022, 259, 111872.
Mosteiro-Romero, M.; Wang, Z.; Lu, C.J.; Itard, L.C.M. Whole-Building HVAC Fault Detection and Diagnosis with the 4S3F Method: Towards Integrating Systems and Occupant Feedback. In Proceedings of the ASim2024 (IBPSA Asia Conference), Osaka, Japan, 8–10 December 2024; pp. 1185–1192.
Matetić, I.; Štajduhar, I.; Wolf, I.; Ljubić, S. A Review of Data-Driven Approaches and Techniques for Fault Detection and Diagnosis in HVAC Systems. Sensors 2023, 23, 1.
Xie, X.; Merino, J.; Moretti, N.; Pauwels, P.; Chang, J.Y.; Parlikad, A. Digital Twin enabled fault detection and diagnosis process for building HVAC systems. Autom. Constr. 2023, 146, 104695.
Belikov, J.; Meas, M.; Machlev, R.; Kose, A.; Tepljakov, A.; Loo, L.; Petlenkov, E.; Levron, Y. Explainable AI based Fault Detection and Diagnosis System for Air Handling Units. In Proceedings of the ICINCO 2022, Lisbon, Portugal, 14–16 July 2022.
Nelson, W.; Culp, C. FDD in Building Systems Based on Generalized Machine Learning Approaches. Energies 2023, 16, 1637.
Gao, T.; Marié, S.; Béguery, P.; Thebault, S.; Lecoeuche, S. Integrated building fault detection and diagnosis using data modeling and Bayesian networks. Energy Build. 2024, 306, 113889.
Pradhan, O.; Wen, J.; Chen, Y.; Lu, X.; Chu, M.; Fu, Y.; O’Neill, Z.; Wu, T.; Candan, K.S. Dynamic Bayesian network-based fault diagnosis for ASHRAE Guideline 36: High performance sequence of operation for HVAC systems. In Proceedings of the BuildSys ’21 (ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation), Coimbra, Portugal, 17–18 November 2021.
Li, G.; Yao, Z.; Chen, L.; Li, T.; Xu, C. An interpretable graph convolutional neural network based fault diagnosis method for building energy systems. Build. Simul. 2024, 17, 1113–1136.
Bruton, K.; Raftery, P.; Kennedy, B.; Keane, M.M.; O’Sullivan, D.T.J. Review of automated fault detection and diagnostic tools in air handling units. Energy Effic. 2014, 7, 335–351.
Cheng, F.; Cai, W.; Zhang, X.; Liao, H.; Cui, C. Fault detection and diagnosis for Air Handling Unit based on multiscale convolutional neural networks. Energy Build. 2021, 236, 110795.
Li, B.; Cheng, F.; Cai, H.; Zhang, X.; Cai, W. A semi-supervised approach to fault detection and diagnosis for building HVAC systems based on the modified generative adversarial network. Energy Build. 2021, 246, 111044. [Google Scholar] [CrossRef]
Martinez-Viol, V.; Urbano, E.M.; Torres Rangel, J.E.; Delgado-Prieto, M.; Romeral, L. Semi-Supervised Transfer Learning Methodology for Fault Detection and Diagnosis in Air-Handling Units. Appl. Sci. 2022, 12, 8837. [Google Scholar] [CrossRef]
Zhu, H.; Yang, W.; Li, S.; Pang, A. An Effective Fault Detection Method for HVAC Systems Using the LSTM-SVDD Algorithm. Buildings 2022, 12, 246. [Google Scholar] [CrossRef]
Azuatalam, D.; Lee, W.-L.; de Nijs, F.; Liebman, A. Reinforcement learning for whole-building HVAC control and demand response. Energy AI 2020, 2, 100020. [Google Scholar] [CrossRef]
Li, G.; Zheng, Y.; Liu, J.; Zhou, Z.; Xu, C.; Fang, X.; Yao, Q. An improved stacking ensemble learning-based sensor fault detection method for building energy systems using fault-discrimination information. J. Build. Eng. 2021, 43, 102812. [Google Scholar] [CrossRef]
Adetola, V.; Bengea, S.; Kang, K.; Kelman, A.; Leonardi, F.; Li, P.; Lovett, T.; Sarkar, S.; Vichik, S. Model Predictive Control and Fault Detection and Diagnostics of a Building Heating, Ventilation, and Air Conditioning System. In Proceedings of the International High Performance Buildings Conference (IHPBC), Purdue, IN, USA, 14–17 July 2014. [Google Scholar]
Movahed, P.; Taheri, S.; Razban, A. A bi-level data-driven framework for fault-detection and diagnosis of HVAC systems. Appl. Energy 2023, 339, 120948. [Google Scholar] [CrossRef]
Shi, Z.; O’Brien, W. Building zone fault detection with Kalman filter based methods. In Proceedings of the eSim (IBPSA), Hamilton, ON, Canada, 3–6 May 2016. [Google Scholar]
Du, Z.; Fan, B.; Chi, J.; Jin, X. Sensor fault detection and its efficiency analysis in air handling unit using the combined neural networks. Energy Build. 2014, 72, 157–166. [Google Scholar] [CrossRef]
Zhao, Y.; Li, T.; Zhang, X.; Zhang, C. Artificial intelligence-based fault detection and diagnosis methods for building energy systems: Advantages, challenges and the future. Renew. Sustain. Energy Rev. 2019, 109, 85–101. [Google Scholar] [CrossRef]
Rokach, L.; Maimon, O. Data Mining with Decision Trees: Theory and Applications, 2nd ed.; World Scientific: Singapore, 2014. [Google Scholar]
Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Mateo, CA, USA, 1993. [Google Scholar]
Yu, Y.; Woradechjumroen, D.; Yu, D. A review of fault detection and diagnosis methodologies on air-handling units. Energy Build. 2014, 82, 550–562. [Google Scholar] [CrossRef]
House, J.; Vaezi-Nejad, H. An Expert Rule Set for Fault Detection in Air-Handling Units. ASHRAE Trans. 2001, 107, 1. [Google Scholar]
Schein, J.; Bushby, S.T. A Hierarchical Rule-Based Fault Detection and Diagnostic Method for HVAC Systems. HVACR Res. 2006, 12, 111–125. [Google Scholar] [CrossRef]
Wu, H.-C.; Qiu, D.; Zhang, L.; Sun, M. Adaptive Multi-Agent Reinforcement Learning for Flexible Resource Management in a Virtual Power Plant with Dynamic Participating Multi-Energy Buildings. Appl. Energy 2024, 374, 123998. [Google Scholar] [CrossRef]

Figure 1. Process of fault detection and diagnosis using machine learning.

Figure 2. EnergyPlus simulation model developed for training dataset construction. Different colors indicate representative building layers and zone groupings (roof and top zone, intermediate floors, and ground floor).

Figure 3. Fault characteristics of the outdoor air damper (stuck condition).

Figure 4. Fault characteristics of cooling coil fouling (65% capacity reduction).

Figure 5. Fault characteristics of air filter fouling.

Figure 6. Measured versus simulated return air temperature used for calibration/validation of the target-building EnergyPlus model: (a) best-case period showing minor response delay and overshoot during heating-coil operation, and (b) worst-case period exhibiting underfit behavior during transient operation. Despite transient discrepancies, the overall agreement satisfies ASHRAE Guideline 14 acceptance criteria (Table 2), supporting the calibrated model for simulation-based fault evaluation.

Figure 7. Overview of the Target Building.

Figure 8. Framework of RBMs.

Figure 9. Structure of DBN.

Figure 10. Confusion matrices of fault classification results on the independent test dataset using (a) rule-based diagnosis, (b) decision tree (DT), and (c) artificial neural network (ANN).

Figure 11. Results for DBN classifier.

Table 1. Building characteristics of the DOE Medium Office prototype.

Element		Detail
Floor Area [m²]		4982
Total Window Area [m²]		653
Window to Wall Ratio		0.33
Internal Load	People [m²/person]	18.58
	Light [W/m²]	10.76
	Equip. [W/m²]	10.76
HVAC System	Air-loop	VAV
	Cooling	DX Coil (2 speed)
	Heating	Central + Reheat Coil
Primary System		Electric, Gas (Heating)
HVAC Operation		07:00~22:00
Cooling Set Point [°C]		24
Heating Set Point [°C]		20

Table 2. Calibration performance of the target-building EnergyPlus model based on ASHRAE Guideline 14.

Metric	Target-Building Model	ASHRAE Guideline 14 Limit (Hourly)	Result
NMBE (%)	4.23	±10	Pass
CVRMSE (%)	9.87	≤30	Pass
R²	0.94	≥0.75	Pass
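
The metrics in Table 2 can be computed from paired measured and simulated series. The following is a minimal sketch under stated assumptions, not the author's calibration script: the function names `calibration_metrics` and `passes_guideline14_hourly` are hypothetical, the degrees-of-freedom adjustment p = 1 is an assumption, and R² is taken here as the coefficient of determination, which is one common convention.

```python
import math


def calibration_metrics(measured, simulated, p=1):
    """Return (NMBE %, CVRMSE %, R^2) for paired measured/simulated values.

    p is the degrees-of-freedom adjustment (assumed to be 1 here).
    """
    n = len(measured)
    if n != len(simulated) or n <= p:
        raise ValueError("series must have equal length greater than p")
    mean_m = sum(measured) / n
    residuals = [m - s for m, s in zip(measured, simulated)]
    ss_res = sum(r * r for r in residuals)
    # Normalized mean bias error: signed bias relative to the measured mean.
    nmbe = 100.0 * sum(residuals) / ((n - p) * mean_m)
    # Coefficient of variation of the RMSE: scatter relative to the measured mean.
    cvrmse = 100.0 * math.sqrt(ss_res / (n - p)) / mean_m
    # Coefficient of determination (assumed definition of R^2).
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot
    return nmbe, cvrmse, r2


def passes_guideline14_hourly(nmbe, cvrmse, r2):
    """Apply the hourly limits listed in Table 2."""
    return abs(nmbe) <= 10.0 and cvrmse <= 30.0 and r2 >= 0.75


if __name__ == "__main__":
    # Values reported in Table 2 for the calibrated model.
    print(passes_guideline14_hourly(4.23, 9.87, 0.94))
```

Applied to the values reported in Table 2, the check returns `True`, consistent with the "Pass" entries.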

Table 3. Variable and control correspondence between the prototype and target building models.

Category	Prototype Variable	Target Variable	Unit	Matching/Adjustment
Airflow	Supply fan flow rate	AHU-2 supply fan flow	m³/s	Directly matched
Temperature	Mixed-air temperature	MA temperature sensor	°C	Offset < 0.5 °C after calibration
Temperature	Return-air temperature	RA temperature sensor	°C	Directly matched
Temperature	Supply-air temperature	SAT sensor	°C	Control setpoint matched
Pressure	Duct static pressure	DSP controller feedback	Pa	Control setpoint matched
Coil performance	Cooling-coil valve position	CHW valve opening	%	Same control logic
Coil performance	Chilled-water supply temperature	CHWS sensor	°C	8 °C ± 0.2
Control	Outdoor-air damper signal	OA damper actuator	%	Matched (manual override allowed)
Control	Minimum outdoor-air ratio	OA fraction	-	Setpoint = 0.1
Control	Economizer enable	OA economizer control	-	Disabled in both models
Power	Supply fan power	SF motor power	kW	Within ±5% after calibration
Schedule	Occupancy period	07:00–22:00	h	Identical
Schedule	Cooling setpoint	24 °C	°C	Identical
Schedule	Heating setpoint	20 °C	°C	Identical

Table 4. Overview of training and testing datasets.

Dataset	Source Model	Variables	Conditions	Fault Scenarios
Training	DOE Reference Medium Office	21	Occupied hours (1 min intervals over 1 month)	Damper Stuck; Coil Fouling; Air Filter Fouling
Testing	Digital-twin model of Company H building (Korea)	21	Occupied hours (1 min intervals over 1 month)	Damper Stuck; Coil Fouling; Air Filter Fouling

Table 5. Summary of DBN hyperparameter optimization.

Parameter	Initial	Range	Optimized Setting
Hidden layers	2	2–4	3
Units per layer	128–128	64–512	256–128–64
Learning rate	1.0 × 10⁻³	1.0 × 10⁻⁵–2.0 × 10⁻³ (log)	5.0 × 10⁻⁴
Dropout	0.2	0.1–0.5	0.3
L2 regularization	0	0–5.0 × 10⁻⁴	1.0 × 10⁻⁴
Batch size	64	64–256	128
Δx features	Off	{Off, On}	On
Epochs (max)	50	30–100	50 (early stop)
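
The ranges and optimized settings in Table 5 can be written down as a small configuration with a sampler. This is only an illustrative sketch: it uses plain random sampling rather than the paper's two-stage tuning procedure, and the discrete choices for units per layer (64, 128, 256, 512) and batch size (64, 128, 256), as well as all names (`OPTIMIZED`, `sample_config`, `in_search_space`), are assumptions introduced here.

```python
import math
import random

# Optimized setting as reported in Table 5.
OPTIMIZED = {
    "hidden_units": (256, 128, 64),
    "learning_rate": 5.0e-4,
    "dropout": 0.3,
    "l2": 1.0e-4,
    "batch_size": 128,
    "delta_features": True,
    "max_epochs": 50,  # with early stopping
}

LR_MIN, LR_MAX = 1.0e-5, 2.0e-3


def sample_config(rng):
    """Draw one candidate from the Table 5 ranges (random search, assumed)."""
    n_layers = rng.randint(2, 4)
    # Learning rate is sampled uniformly in log space, then clamped to the bounds.
    lr = math.exp(rng.uniform(math.log(LR_MIN), math.log(LR_MAX)))
    return {
        "hidden_units": tuple(rng.choice([64, 128, 256, 512]) for _ in range(n_layers)),
        "learning_rate": min(max(lr, LR_MIN), LR_MAX),
        "dropout": rng.uniform(0.1, 0.5),
        "l2": rng.uniform(0.0, 5.0e-4),
        "batch_size": rng.choice([64, 128, 256]),
        "delta_features": rng.choice([False, True]),
        "max_epochs": rng.randint(30, 100),
    }


def in_search_space(cfg):
    """Check that a configuration lies inside the ranges of Table 5."""
    return (
        2 <= len(cfg["hidden_units"]) <= 4
        and all(64 <= u <= 512 for u in cfg["hidden_units"])
        and LR_MIN <= cfg["learning_rate"] <= LR_MAX
        and 0.1 <= cfg["dropout"] <= 0.5
        and 0.0 <= cfg["l2"] <= 5.0e-4
        and 64 <= cfg["batch_size"] <= 256
        and isinstance(cfg["delta_features"], bool)
        and 30 <= cfg["max_epochs"] <= 100
    )


if __name__ == "__main__":
    print(in_search_space(OPTIMIZED))
    print(sample_config(random.Random(0)))
```

Sampling the learning rate in log space matches the "(log)" annotation in the table, so that small and large rates are explored evenly across orders of magnitude.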

Table 6. Performance comparison of classifiers.

Model	Accuracy [%]	Precision [%]	Recall [%]	F1-Score [%]
Rule-based	84.7	83.2	85.1	84.1
Decision Tree	90.1	89.5	88.7	89.0
ANN	94.8	94.2	93.7	93.9
DBN	97.9	97.5	97.1	97.3
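
The four scores in Table 6 can all be derived from a confusion matrix such as those in Figures 10 and 11. The sketch below shows one way to do this, assuming macro averaging over classes (the averaging scheme is not stated in this section). The function name `classification_metrics` is hypothetical and the matrix in the usage example is a made-up toy input, not data from the study.

```python
def classification_metrics(cm):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy 3-class matrix for illustration only (not results from the paper).
    toy = [
        [8, 1, 1],
        [0, 9, 1],
        [2, 0, 8],
    ]
    for name, value in classification_metrics(toy).items():
        print(f"{name}: {100 * value:.1f}%")
```

Macro averaging weights each fault class equally, which is a reasonable default when the classes are of similar size; a support-weighted average would be an alternative if they are not.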
