1. Introduction
The increasing demand for smart buildings and energy management has led to a substantial proliferation of electrical equipment and, with it, a marked rise in energy consumption and carbon dioxide emissions. Increasingly severe environmental problems pose enormous challenges to sustainable energy development worldwide. According to published research data, the construction industry accounts for approximately 30% of total global energy consumption, making it one of the largest energy-consuming sectors [1,2,3]. Air conditioning systems alone account for 30% to 50% of the total annual energy use of buildings [4,5,6]. These figures highlight the enormous energy demand generated by air conditioning systems.
Compared with other HVAC systems, variable refrigerant flow (VRF) systems can reduce emissions of greenhouse gases such as carbon dioxide. As more renewable energy is integrated into air conditioning systems, researchers have demonstrated the energy-saving potential of VRF systems relative to traditional HVAC systems through model simulations and field performance tests in various case studies [7]. Aynur et al. [8] evaluated the performance of traditional air conditioning and VRF systems using EnergyPlus simulation and found that VRF systems can save 27% to 57% of energy compared with traditional HVAC systems. That study also used EnergyPlus simulations to explore the energy-saving potential of VRF systems relative to traditional HVAC systems across various U.S. climates. Kim et al. [9] found that VRF systems can save 15% to 42% of energy, with lower operating costs, compared with standard rooftop units (RTUs), except in cold climates. Emrah et al. [10] demonstrated that VRF systems can reduce energy consumption in commercial buildings by up to 44% compared with chillers with boilers. These studies indicate that, in most climates, VRF systems are more energy efficient and reduce carbon emissions more effectively than traditional HVAC systems.
However, the multisource heterogeneous data generated during VRF system operation, such as energy consumption, environmental parameters, and equipment status, are high-dimensional, nonlinear, and dynamic. Traditional analysis methods struggle to uncover the patterns hidden in such data, which creates bottlenecks in system optimization, fault prediction, and energy efficiency management. Data mining techniques, which extract implicit patterns from massive datasets, have demonstrated powerful analytical capabilities in healthcare, finance, and other domains, yet their application to VRF systems remains largely unexplored, and systematic investigations are still lacking.
Building air conditioning is a critical sector for achieving China's dual-carbon goals (carbon peaking and carbon neutrality), and its energy efficiency has improved markedly in recent years through technological innovation, policy-driven initiatives, and market practice. Numerous energy-efficient air conditioning technologies have been developed, such as intelligent control, variable frequency technology, heat and cold recovery, and the use of various new energy sources in building air conditioning [11]. Liu et al. [12] designed and developed a PSF centrifugal compressor that achieved an adiabatic efficiency of 75.9–88.5%. Renewable energy sources such as geothermal and solar energy have also been increasingly applied in HVAC design. For example, solar energy can be used to drive heat pump systems, thereby reducing the dependence of HVAC systems on grid electricity.
Darwiche et al. [13] used geothermal energy as a backup source for a typical all-air centralized HVAC system and found that, with 100% fresh air intake, the system saved 67% of its annual energy. Technologies such as fresh air precooling, solar photovoltaic power generation, and reclaimed water reuse can further reduce reliance on conventional energy sources. Ra et al. [14] applied a switchable glass topology integrating solar cells and storage to provide passive daytime HVAC in the control rooms of EV charging stations. Wang et al. [15] developed a compact, high-performance counter-flow indirect dew point evaporative cooler that eliminates the working air flow reversal of traditional M-cycle coolers, thereby reducing pressure drop and improving energy efficiency; their results confirmed that the cooler is suitable as an efficient precooling device. Shyu et al. [16] designed a system incorporating an anaerobic membrane bioreactor that was powered entirely by solar photovoltaics during the test period, and membrane and ion exchange treatment produced high-quality water that met local water recycling standards. Together, these approaches offer effective means of reducing building energy consumption.
With the adoption of automation systems and the Internet of Things (IoT) in buildings, sensors collect vast amounts of data on system operation, and these data contain abundant hidden information and knowledge [17]. Data mining (DM) technology has demonstrated a powerful capability to analyze big data automatically across various domains [18]. Data mining techniques are usually categorized as supervised or unsupervised learning. Supervised learning suits regression and classification modeling because of its strong capability to approximate input–output mappings, whereas unsupervised learning suits the discovery of new information and knowledge by exploring correlations, associations, and patterns in data [19]. The proliferation of big data, together with advances in data mining, therefore offers a viable pathway for assessing the operation of HVAC systems.
Unsupervised data mining is now widely applied in building data analysis to extract valuable information from actual operational data. Tian et al. [20] proposed an unsupervised data mining framework for evaluating and optimizing HVAC operation strategies that reduced energy consumption by 6.9%. Wang et al. [21] introduced a method for interpreting neural network models using model gradients: it quantifies the marginal effect of each input on the output via the chain rule and reduces computation time by 40% without sacrificing model accuracy. Xu [22] proposed a data mining-driven approach to anomaly detection and dynamic energy performance assessment of HVAC systems. Qian et al. [23] mined a large-scale VRF dataset to determine the actual behavior and energy consumption of occupants in different climate zones. Data mining steps, including data preprocessing, cluster analysis, association rule mining, and post-processing, were fully used in that work to achieve dynamic, multi-level energy consumption assessment of HVAC systems. However, when data mining is used to optimize system operation, more attention should be paid to variables that directly reflect operating status, such as temperature, flow rate, wind speed, and pressure. Big data analysis and machine learning algorithms can be used to predict the heating and cooling loads of buildings. Based on these predictions, an intelligent control system can automatically adjust air conditioning parameters such as operation mode, temperature setpoint, and wind speed to achieve precise temperature control and avoid over-cooling or over-heating, thereby reducing energy consumption and improving system efficiency.
Correlations among high-dimensional operational variables contain additional hidden information that is valuable for evaluating a system's operating status. Yu et al. [24] explored a novel methodology for analyzing all associations and correlations in building operational data, thereby uncovering valuable insights for energy conservation. Xiao et al. [25] proposed a practical framework for mining Building Automation System (BAS) datasets that consists of five main steps: data preparation, cluster analysis, association rule mining, post-mining processing, and application of the discovered knowledge. This framework serves as a critical link between BAS monitoring data and actionable HVAC operation strategies. Li et al. [26] devised a data mining-oriented approach to identify and interpret power consumption patterns and correlations, analyzing three time-independent influencing factors: part-load ratio, refrigerant charge level, and cooling conditions. Their findings indicated that the approach can help identify energy consumption patterns and extract energy consumption rules in VRF systems. Here, energy consumption rules are stable, physically interpretable associations between operating parameters and energy consumption; for example, under a fixed refrigerant charge, an increase in part-load ratio leads to a linear rise in compressor frequency and thus in energy consumption. Li et al. verified these rules by partitioning VRF operation data into load intervals and using association rule mining to screen valid parameter–energy correlations. These studies applied advanced data mining methods to comprehensive analyses of system operational data and identified several directions for optimization. However, because the purpose of strategy evaluation and optimization through data mining is to improve efficiency, a reasonable baseline must be established to determine whether the chosen operating settings lead to inefficiency. Wu [27] proposed recommendations for raising the energy efficiency standards of VRF equipment by collecting and pre-processing data, extracting operating parameters, analyzing and comparing the data, and discussing potential applications. Zhou [28] proposed a methodology encompassing data preparation, feature selection, and VRF energy consumption prediction, and evaluated the generalization ability of the prediction model, further confirming the effectiveness and reliability of the methodology for predicting the energy consumption of VRF air conditioning systems. However, correlations among the variables were not taken into account. To address these gaps, this study implements VRF data mining systematically in two core sections that build on and extend the literature above. In Section 2, three key variables were controlled and high-precision sensors were used to collect data, so that the dataset covers typical winter heating scenarios for VRF systems in central China. In Section 3, a multi-layered analytical framework was built. First, correlation analysis was used to select 11 parameters significantly correlated with energy consumption from an initial set of 24. Multiple linear regression (MLR) was then used to quantify linear relationships, and finally, association rule mining (ARM) was used to uncover non-linear patterns under various operating conditions. This combination of methods is intended to retain the strengths of existing research while targeting the characteristics of VRF winter heating, improving the comprehensiveness and practicality of the results.
The choice of feature variables also strongly influences data mining results. In this study, appropriate analytical methods were applied to eliminate redundant variables from the original feature set and to avoid high correlations among the retained variables. Data mining techniques were employed to optimize feature variable selection [29,30]. Data mining is the complex process of discovering hidden knowledge in large datasets [31,32,33]. Association rules are a data mining technique for identifying potential relationships between variables or items in a dataset and are often used to analyze transactional data. Data mining techniques and association rule mining [34,35,36,37] were used extensively in this study to discover variable-related defects. Current studies mostly focus on single-parameter analyses of VRF systems, such as energy consumption prediction or fault diagnosis, while few focus on systematically mining [38,39,40] multivariate correlations and dynamic evolution patterns. Although machine learning-based methods [41,42] perform well in classification and regression tasks, they have not fully exploited the continuity and spatial correlation of time series data, and traditional statistical methods have difficulty handling non-linear relationships.
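Association rules are usually scored by support (how often an itemset occurs), confidence (how often the right-hand side holds when the left-hand side does), and lift (confidence relative to the base rate of the right-hand side). The sketch below is a minimal level-wise Apriori in plain Python, assuming the continuous operating data have already been discretized into labeled items. The item labels (such as `Freq=high`) and the six records are hypothetical, and this is not the SPSS Modeler implementation used in the study.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for every frequent itemset."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    candidates = {frozenset([item]) for t in tx for item in t}  # level-1 candidates
    frequent, k = {}, 1
    while candidates:
        # Count how many records contain each candidate itemset
        counts = {c: sum(1 for t in tx if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def derive_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) for rules lhs -> rhs above min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                conf = supp / frequent[lhs]   # P(rhs | lhs)
                lift = conf / frequent[rhs]   # confidence relative to P(rhs)
                if conf >= min_confidence:
                    rules.append((set(lhs), set(rhs), supp, conf, lift))
    return rules

# Hypothetical discretized records (one per sampling window)
records = [
    {"SetT=30", "Freq=high", "Energy=high"},
    {"SetT=30", "Freq=high", "Energy=high"},
    {"SetT=30", "Freq=mid", "Energy=high"},
    {"SetT=24", "Freq=mid", "Energy=mid"},
    {"SetT=24", "Freq=mid", "Energy=mid"},
    {"SetT=18", "Freq=low", "Energy=low"},
]
frequent = apriori(records, min_support=0.3)
for lhs, rhs, s, c, l in derive_rules(frequent, min_confidence=0.8):
    print(sorted(lhs), "->", sorted(rhs), f"supp={s:.2f} conf={c:.2f} lift={l:.2f}")
```

A lift above 1 (here, `Freq=high -> Energy=high` has lift 2) marks an association stronger than chance; in practice the bin boundaries chosen during discretization strongly shape which rules appear.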
To address these gaps, and given the complexity of VRF operating scenarios and the difficulty of capturing actual operating conditions, this study used data measured on a VRF experimental platform and performed data mining on the experimental data to determine the operating parameters and their impact on energy consumption. By adjusting the experimental settings, various operating parameters and the energy consumption of the VRF system were measured, and data mining was performed on the collected data. The study analyzed how variations in parameters such as system temperatures, compressor frequency, electronic expansion valve (EXV) steps, average pressure, and load affect VRF operation, as well as the effects of the indoor set temperature, set wind speed, and load rate. R, SPSS Statistics 27, and SPSS Modeler 18 were used as data mining tools: correlation analysis was conducted in R (R-4.5.1-win), multiple linear regression was used for regression analysis, and the Apriori algorithm was used for association rule mining. The study combines multiple linear regression (MLR) with association rule mining (ARM). MLR quantifies the linear relationship between eight key feature parameters and energy consumption, while ARM mines hidden association rules under different operating conditions, reflecting the non-linear and dynamic characteristics of the multi-source heterogeneous data in VRF systems; together they reveal the relationships between the operating parameters and energy consumption. The combination is designed to offset the inherent limitations of each method and serves the overarching goal of providing actionable optimization strategies for the winter heating operation of VRF systems in Zhengzhou.
When MLR is applied alone, it yields quantitative coefficients but cannot reflect differences between operating conditions; its fixed coefficients therefore cannot guide dynamic adjustment. Conversely, ARM applied alone can mine qualitative rules but cannot quantify specific values, which leaves the rules at a descriptive level and makes it difficult for engineers to set precise targets. The MLR–ARM framework is therefore not an isolated add-on but is integrated into each core section of the manuscript. Without combining the two, the experiment could neither provide the precise values required in engineering practice nor capture the dynamic rules across operating conditions, and the research objectives could not be met. The novelty of this study lies in making quantitative and qualitative analysis complementary through the synergy of MLR and ARM. MLR first quantifies the strength of the core effects, providing a numerical benchmark for engineering adjustments; ARM then explores the dynamic deviations from this benchmark under different operating conditions. To our knowledge, this logic of first quantifying a benchmark and then correcting for condition-dependent deviations cannot be achieved by either method alone: it addresses the poor adaptability of MLR to varying operating conditions and compensates for ARM's inability to quantify effect strength.
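The "benchmark first, then correct by operating condition" logic can be illustrated with a small numerical sketch. In the study the correction step is done with ARM rules; as a simpler stand-in (a swapped-in technique, not the authors' procedure), the sketch below fits one global MLR baseline and then averages its residuals within each operating condition, so that a condition the fixed coefficients cannot express shows up as a systematic deviation. The data, the condition labels, and the extra 30 °C draw are all hypothetical.

```python
import numpy as np

def fit_baseline(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def deviation_by_condition(X, y, conditions):
    """Mean residual of the single global MLR baseline within each operating condition."""
    beta = fit_baseline(X, y)
    resid = y - np.column_stack([np.ones(len(y)), X]) @ beta
    return {str(c): float(resid[conditions == c].mean()) for c in np.unique(conditions)}

# Hypothetical data: energy rises linearly with compressor frequency, plus an extra
# draw that appears only at the 30 C setpoint and that fixed coefficients cannot express.
rng = np.random.default_rng(0)
conditions = np.repeat(np.array(["18C", "24C", "30C"]), 100)
freq = rng.uniform(20, 80, size=300)
energy = (2.0 + 0.05 * freq
          + np.where(conditions == "30C", 1.0, 0.0)
          + rng.normal(0, 0.1, size=300))

print("baseline coefficients:", fit_baseline(freq.reshape(-1, 1), energy))
print("mean residual per condition:", deviation_by_condition(freq.reshape(-1, 1), energy, conditions))
```

The positive mean residual for the 30 °C group is the kind of condition-specific deviation that the rule-mining step is meant to capture and describe.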
The contributions of this study are as follows.
- (1) The MLR quantitative model and the ARM dynamic rule system proposed in this study can be integrated into VRF real-time control systems, being both technically adaptable and implementable in engineering practice.
- (2) The research findings are expected to provide key technical support for achieving energy conservation, smart city, and carbon-neutral sustainable development goals in various regions, particularly in Central China. The research conclusions directly serve to improve the sustainability of buildings in the region and provide localized data support for climate-adaptive VRF operating strategies.
- (3) This study uses a single building in Zhengzhou as a starting point to establish a methodological benchmark for in-depth analysis. Subsequent work will extend the analysis to multiple buildings, multiple climate zones, and larger datasets to progressively remove the limitations of a single scenario, with the ultimate aim of developing a robust and practical VRF system optimization framework.
2. VRF System Operation Experiment
This study used data from an experiment on the operational energy consumption of a VRF system in an office building. The test site is a six-story office building in Zhengzhou, central China, and the experiment covered the laboratories and offices on every floor, up to and including the sixth. The floor height is 3.6 m, and a single monitored room has an area of approximately 103.4 m². The study primarily uses data from experiments in which different combinations of indoor terminals were activated and different indoor set temperatures and wind speeds were applied, in order to explore the operating patterns of the VRF system under grouped conditions. The conclusions are most directly applicable to office buildings in cold regions and require further validation in severe cold, hot-summer, and mild-winter climate zones, as well as in residential and commercial buildings. For scenarios with severe load fluctuations or extreme climates, it is recommended to supplement the regression results with the non-linear rules mined by ARM, thereby avoiding application deviations caused by model assumptions.
The experiment was conducted in winter with the VRF system operating in heating mode. The outdoor unit is mounted on the rooftop of the office building. Four indoor units are installed in two rooms, with Terminals 1 and 2 in one room and Terminals 3 and 4 in the other. Separate controllers independently set the parameters of the four indoor units. Figure 1 shows a site diagram of the VRF system, and Figure 2 shows its flow chart.
The VRF system consists primarily of EXVs, compressors, four-way reversing valves, indoor and outdoor units, indoor and outdoor fans, temperature sensors, and pressure sensors. In heating mode, the compressor compresses the gaseous refrigerant into high-temperature, high-pressure gas and delivers it to the indoor units. Temperature sensors are installed at the indoor and outdoor units and at the compressor inlet and outlet to monitor parameters such as the fixed- and variable-frequency discharge temperatures, the fixed- and variable-frequency shell top oil temperatures, the inlet and outlet pipe temperatures, and the real-time high pressure at an hourly time scale. A smart meter measures the active power, total active power, bidirectional active power, and the current and voltage of the entire system. The data collection interval is 3 s. The experiment followed the single-variable principle to minimize interference from other factors. In the experiments that varied the room set temperature or the number of active indoor unit terminals, the wind speed was set to high; in the variable wind speed tests, the number of active terminals and the room set temperature were held constant.
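With a 3 s logging interval, each 15 min operating condition yields about 300 power samples, and energy consumption follows from integrating sampled active power over time. A minimal sketch, assuming the meter's active power is logged in kW at evenly spaced 3 s intervals; the 60 s settling period and the example trace are hypothetical.

```python
SAMPLE_INTERVAL_S = 3.0  # logging interval reported for the experiment

def energy_kwh(power_kw, dt_s=SAMPLE_INTERVAL_S):
    """Rectangle-rule integration of evenly sampled active power (kW) into energy (kWh)."""
    return sum(power_kw) * dt_s / 3600.0

def summarize_condition(power_kw, dt_s=SAMPLE_INTERVAL_S, settle_s=0.0):
    """Energy over a whole condition plus mean power after an assumed settling period."""
    skip = int(settle_s // dt_s)          # number of start-up samples to ignore
    steady = power_kw[skip:]
    return {
        "samples": len(power_kw),
        "duration_min": len(power_kw) * dt_s / 60.0,
        "energy_kwh": energy_kwh(power_kw, dt_s),
        "mean_steady_kw": sum(steady) / len(steady),
    }

# Hypothetical 15 min condition: 1 min start-up at 6 kW, then 14 min at 4 kW
trace = [6.0] * 20 + [4.0] * 280
print(summarize_condition(trace, settle_s=60.0))
```

Excluding a short start-up window from the steady-state mean keeps transients from switching between conditions out of the per-condition comparison, while the energy total still counts them.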
Before each experiment, windows and doors were opened to equalize the indoor and outdoor temperatures, and the experiment began only after the two temperatures had converged. The room was then closed, and the indoor unit terminals of the VRF system were activated. After 15 min of stable operation, the terminals were turned off, and this process was repeated to obtain data under different operating conditions. In the experiment varying the number of active indoor terminals, each operating condition lasted approximately 15 min, and the VRF system was not shut down when switching between conditions, which kept the experimental process continuous. Experimental parameters were adjusted by controlling terminal activation combinations and indoor set temperatures. The specific experimental conditions and terminal activation settings are listed in Table 1, and the variable wind speed setups are shown in Table 2. The core objective of this study is to analyze how different operating parameters affect the energy consumption and key parameters of VRF systems. To this end, the temperature settings must span a low, medium, and high gradient to capture the dynamic response of the system to different load demands. A setpoint of 18 °C represents a low winter heating temperature, 24 °C corresponds to the typical range of winter comfort temperatures, and 30 °C represents a relatively high setting that tests the system's regulation capability under extreme conditions. This gradient makes it possible to observe systematically the non-linear changes in compressor frequency, refrigerant flow, heat exchange capacity, and other parameters as the set temperature increases, and how these changes affect energy consumption. Although 30 °C is uncommon in conventional winter heating, this setting is necessary for two reasons. (1) At high setpoints, VRF systems may enter a high-load operating mode whose energy consumption characteristics and parameter correlations differ significantly from those at medium and low setpoints. Identifying patterns under these conditions provides data to support energy efficiency optimization under extreme demand and helps avoid the efficiency degradation caused by prolonged high-load operation. (2) The study was conducted in Zhengzhou, in central China, where centralized heating does not cover all buildings in winter and some buildings rely on VRF systems for independent heating. In practice, some users may set high temperatures for short periods to meet immediate heating needs, so studying the 30 °C condition has practical value for guiding users to set temperatures reasonably.
4. Analysis Results of VRF Operation Data
This section presents the results of the VRF operation data analysis, focusing on the distinctive characteristics of winter heating operation of VRF systems in Zhengzhou. The valid data cover three temperature gradients (18 °C, 24 °C, and 30 °C) and record the relationships between 11 core parameters and real-time energy consumption, exhibiting strong parameter coupling and highly dynamic operating conditions. Accordingly, the analysis in this section quantifies the independent effect of each core parameter on energy consumption, verifies the applicability of the MLR model to winter heating, and identifies how the correlation between parameters and energy consumption differs across operating conditions. The section follows the progressive logic of parameter screening, model construction, and operating condition verification, forming a closed loop with the research methods presented in Section 3. In Section 4.1, 11 variables significantly correlated with energy consumption were screened from 24 initial parameters, which directly simplified the MLR model. In Section 4.3, with these 11 parameters as inputs and energy consumption as the output, an 8-parameter model was obtained by eliminating multicollinear variables. The purpose of this section is to provide verifiable, quantitative support for Section 5 and Section 6.
4.1. Correlation Analysis
Because VRF systems have numerous operating parameters, those significantly correlated with energy consumption must first be screened out for further analysis. This study initially selected 24 characteristic variables for correlation analysis, using experimental data collected while terminal device 1 was continuously activated for four hours with the other devices turned off. The correlation analysis was performed in R (R-4.5.1-win), and the results are shown in Figure 4 as a circle plot in which the color of each circle reflects the correlation coefficient: red denotes a strong negative correlation, blue a strong positive correlation, and lighter hues indicate coefficients close to 0.
Figure 4 shows the correlation and significance of the 24 variables with respect to energy consumption. Eleven characteristic variables are significantly correlated with energy consumption: fixed frequency discharge temperature, fixed frequency 1 shell top oil temperature, fixed frequency 2 shell top oil temperature, average low pressure, compressor frequency, instantaneous output capacity, electrical box temperature, inlet pipe temperature, outlet pipe temperature, ambient temperature, and EXV step count. Among them, average low pressure, electrical box temperature, and EXV step count are significant at p < 0.01 and strongly correlated with energy consumption, while fixed frequency discharge temperature, fixed frequency 1 and 2 shell top oil temperatures, inlet pipe temperature, outlet pipe temperature, and ambient temperature are highly significant (p < 0.001) and very strongly correlated with energy consumption. Because the selected dataset is limited in scope and duration, it may not cover all parameters related to energy consumption. The data were collected in Central China, so the results have direct guiding value for that region: Zhengzhou lies in a cold climate zone, and the experiment was conducted under winter heating conditions typical of the region. Although climatic conditions differ elsewhere, the operating principles of air conditioning components are the same, so the results in Figure 4 may also serve as a reference for other regions, subject to verification.
4.2. Energy Consumption Regression Analysis
In the linear regression analysis in this section, the dependent variable is energy consumption, and the independent variables are the 11 significantly correlated characteristic variables shown in Figure 4 of Section 4.1. The resulting regression model is specified in Table 3 and Figure 5.
The VIF values of Fixed Frequency 1 Shell Top Oil Temperature, Compressor Frequency, Instantaneous Output Capacity, Inlet Pipe Temperature, and Outlet Pipe Temperature all exceed 10, with tolerance values below 0.1, indicating that the model suffers from multicollinearity. Multicollinearity inflates the variances of the regression coefficients, making them unreliable and the model unstable and less interpretable. In addition, the p-values for Compressor Frequency, Instantaneous Output Capacity, and Electrical Box Temperature exceed 0.05, meaning that these coefficients are not statistically significant; the model in this form is therefore not acceptable.
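The VIF of predictor j is 1 / (1 − R_j²), where R_j² comes from regressing that predictor on all the others, and tolerance is its reciprocal. A convenient identity is that the VIFs equal the diagonal of the inverse of the predictors' correlation matrix. A minimal NumPy sketch applying the VIF > 10 / tolerance < 0.1 rule from the text; the three predictors and their data are hypothetical.

```python
import numpy as np

def vif_and_tolerance(X, names):
    """VIF_j = 1 / (1 - R_j^2), computed as the j-th diagonal entry of inv(corr(X))."""
    R = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(R))
    return {name: (float(v), float(1.0 / v)) for name, v in zip(names, vif)}  # (VIF, tolerance)

# Hypothetical predictors: output capacity tracks compressor frequency almost exactly
rng = np.random.default_rng(42)
freq = rng.uniform(20, 80, size=500)
capacity = 0.2 * freq + rng.normal(0, 0.2, size=500)
ambient = rng.normal(5, 3, size=500)
X = np.column_stack([freq, capacity, ambient])

table = vif_and_tolerance(X, ["compressor_frequency", "output_capacity", "ambient_temperature"])
flagged = [name for name, (vif, tol) in table.items() if vif > 10 or tol < 0.1]
print(table)
print("collinear:", flagged)
```

Both members of a collinear pair are flagged; deciding which one to drop still requires the physical reasoning given in the following section.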
4.3. Multicollinearity Analysis
During model validation, Compressor Frequency and Instantaneous Output Capacity were found to have a high correlation coefficient and overlapping physical meanings, so a collinearity diagnosis was performed on these variables. A linear regression analysis was conducted with energy consumption as the dependent variable and compressor frequency and instantaneous output capacity as the independent variables. Condition indices are calculated from eigenvalues: the condition index of a dimension is the square root of the ratio of the largest eigenvalue to the eigenvalue of that dimension. A condition index above 15 typically indicates a possible collinearity problem in that dimension, and if two or more variables have variance proportions above 0.9 in the same dimension, collinearity between them is suspected. As shown in Table 4 and Figure 6, in the third dimension both compressor frequency and instantaneous output capacity have a variance proportion of 1, exceeding the 0.9 threshold, which indicates collinearity between the two variables. Instantaneous output capacity is a derived verification indicator calculated from parameters such as frequency and temperature and is used to determine whether the unit is operating at its predetermined capacity. Unlike compressor frequency, it is not a core operating parameter of the system, so instantaneous output capacity was removed.
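Condition indices and variance proportions come from an eigen-decomposition of the regression design matrix. The sketch below follows the Belsley–Kuh–Welsch procedure, which to our understanding is what SPSS collinearity diagnostics report: add a constant column, scale every column to unit length without centering, eigen-decompose X'X, and compute each coefficient's share of variance attributable to each dimension. With two predictors plus the constant there are three dimensions, matching the "third dimension" in the text. The data are hypothetical.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Eigenvalues, condition indices and variance proportions (Belsley-Kuh-Welsch style).

    X holds predictors only; a constant column is added, each column is scaled to unit
    length (uncentered), and X'X is eigen-decomposed. Rows of the returned proportion
    matrix are dimensions (largest eigenvalue first); columns are [constant, x1, x2, ...].
    """
    Z = np.column_stack([np.ones(X.shape[0]), X])
    Z = Z / np.linalg.norm(Z, axis=0)
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cond_index = np.sqrt(eigvals[0] / eigvals)   # sqrt(lambda_max / lambda_k)
    phi = eigvecs ** 2 / eigvals                 # phi[j, k] = v_jk^2 / lambda_k
    proportions = phi / phi.sum(axis=1, keepdims=True)
    return eigvals, cond_index, proportions.T

# Hypothetical data: output capacity almost proportional to compressor frequency
rng = np.random.default_rng(0)
freq = rng.uniform(20, 80, size=400)
capacity = 0.2 * freq + rng.normal(0, 0.05, size=400)
eig, ci, prop = collinearity_diagnostics(np.column_stack([freq, capacity]))
for dim, (e, c, p) in enumerate(zip(eig, ci, prop), start=1):
    print(f"dim {dim}: eigenvalue={e:.5f} condition index={c:.1f} proportions={np.round(p, 3)}")
```

The last dimension has a large condition index and places nearly all of the variance of both slope coefficients on the same near-dependency, which is the pattern used above to diagnose collinearity.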
Figure 4 further shows that fixed frequency 1 shell top oil temperature, fixed frequency 2 shell top oil temperature, and fixed frequency 1 discharge temperature have relatively high correlation coefficients with one another, so a collinearity diagnosis was performed on these three variables. A linear regression analysis was conducted with energy consumption as the dependent variable and Fixed Frequency 1 Shell Top Oil Temperature, Fixed Frequency 2 Shell Top Oil Temperature, and Fixed Frequency 1 Discharge Temperature as the independent variables. As shown in Table 5 and Figure 7, in the fourth dimension the variance proportions of both Fixed Frequency 1 Shell Top Oil Temperature and Fixed Frequency 2 Shell Top Oil Temperature exceed 0.9, indicating collinearity between the two variables. Because the fixed frequency 1 compressor is the main compressor in system operation, its shell top oil temperature better reflects the thermal state of the core equipment; fixed frequency 2 shell top oil temperature was therefore removed.
The inlet and outlet pipe temperatures were also found to be highly correlated and to have similar meanings. A linear regression analysis was performed with energy consumption as the dependent variable and these two variables as the independent variables. As shown in Table 6 and Figure 8, in the third dimension the variance proportions of both inlet pipe temperature and outlet pipe temperature exceed 0.9, confirming collinearity. Although the inlet pipe temperature more directly reflects the initial state of the refrigerant entering the heat exchanger, the two variables were combined into an inlet–outlet temperature difference, which eliminates the collinearity while retaining physical meaning and maintaining data stability and reliability.
Based on the collinearity diagnosis results, the independent variables were reorganized and a new regression model was established with energy consumption as the dependent variable. As shown in Table 7 and Figure 9, the VIF values of the eight adjusted characteristic variables are all less than 10 and their tolerance values all exceed 0.1, indicating the absence of serious multicollinearity. In addition, all significance values are below 0.05, confirming the statistical significance of the model. The model has an R² of 0.925, indicating that the fixed-frequency 1 discharge temperature, fixed-frequency 1 shell-top oil temperature, average low pressure, compressor frequency, electrical box temperature, inlet–outlet temperature difference, ambient temperature, and EXV step count together explain 92.5% of the variation in energy consumption. This is well above the 30% explanatory power taken in this study as the minimum for a valid model. The table further shows that the fixed-frequency 1 discharge temperature, compressor frequency, ambient temperature, and EXV step count have significant positive effects on energy consumption: higher values of these parameters are associated with higher energy consumption. Conversely, the fixed-frequency 1 shell-top oil temperature, average low pressure, electrical box temperature, and inlet–outlet temperature difference have significant negative effects: lower values of these parameters are associated with higher energy consumption.
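The VIF and tolerance values in Table 7 can be cross-checked with the identity VIF_j = [R⁻¹]_jj, where R is the correlation matrix of the predictors, and tolerance_j = 1/VIF_j. The plain-Python sketch below is a hedged illustration rather than the authors' procedure; the function names are hypothetical, and the synthetic inlet, outlet, and ambient series are invented solely to show how replacing two collinear temperatures with their difference lowers the VIF.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of the predictor columns."""
    n = len(columns[0])
    z = []
    for c in columns:
        mu = sum(c) / n
        norm = math.sqrt(sum((x - mu) ** 2 for x in c))
        z.append([(x - mu) / norm for x in c])
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (intended for small matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(columns, names):
    """Map each predictor name to (VIF, tolerance); VIF is the diagonal of inv(R)."""
    inv = invert(correlation_matrix(columns))
    return {name: (inv[j][j], 1.0 / inv[j][j]) for j, name in enumerate(names)}


if __name__ == "__main__":
    inlet = [float(i) for i in range(10)]
    outlet = [x + 2.0 + 0.1 * (-1) ** i for i, x in enumerate(inlet)]  # tracks inlet closely
    ambient = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
    print(vif_and_tolerance([inlet, outlet, ambient], ["inlet", "outlet", "ambient"]))
    delta_t = [o - i for i, o in zip(inlet, outlet)]
    print(vif_and_tolerance([delta_t, ambient], ["delta_t", "ambient"]))
```

In this toy case the two nearly identical pipe temperatures have VIFs far above 10, whereas their difference and the ambient temperature both have VIFs close to 1.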
Based on the results of the multicollinearity diagnosis, highly collinear variables were eliminated. The resulting multiple linear regression equation, which relates energy consumption (the dependent variable) to eight core operating parameters such as the discharge temperature and the shell-top oil temperature, is presented as follows.
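In general form, with the fitted coefficients written symbolically, the equation has the structure below. The symbols are notation introduced here for illustration (E: energy consumption; T_dis: fixed-frequency 1 discharge temperature; T_oil: fixed-frequency 1 shell-top oil temperature; P̄_low: average low pressure; f: compressor frequency; T_eb: electrical box temperature; ΔT_io: inlet–outlet temperature difference; T_amb: ambient temperature; N_EXV: EXV step count; ε: residual). The sign constraints restate the directions reported for Table 7, not the numerical coefficients.

```latex
E = \beta_0 + \beta_1 T_{\mathrm{dis}} + \beta_2 T_{\mathrm{oil}} + \beta_3 \bar{P}_{\mathrm{low}}
    + \beta_4 f + \beta_5 T_{\mathrm{eb}} + \beta_6 \Delta T_{\mathrm{io}}
    + \beta_7 T_{\mathrm{amb}} + \beta_8 N_{\mathrm{EXV}} + \varepsilon,
\qquad
\beta_1, \beta_4, \beta_7, \beta_8 > 0, \quad \beta_2, \beta_3, \beta_5, \beta_6 < 0
```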
4.4. Analysis of Regression Equation Results
To further validate the reliability and applicability of the regression model, the independence of the residuals, the collinearity of the predictors, and the distribution of the residuals were analyzed to confirm that the model satisfies the underlying statistical assumptions and can explain the observed variation in energy consumption. The analyses are as follows.
(1) Assessment of residual independence
The validity of regression inference depends on the independence of the observations. In this study, the Durbin–Watson (DW) statistic was used to test for serial correlation in the model residuals. The result was DW = 1.03, which is below the ideal value of 2.0 and indicates positive serial correlation. This can be attributed to the temporal continuity of the VRF operating parameters, which is consistent with the dynamic characteristics of HVAC systems. The DW value did not, however, fall into the strong-correlation interval (DW < 1.0), and the absolute values of the autocorrelation coefficients of the model residuals were all less than 0.2. These results indicate that sample independence approximately meets the requirements of regression analysis and that the bias in parameter estimation is controllable.
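The Durbin–Watson statistic can be computed directly from the time-ordered residuals, as in the minimal sketch below (function names hypothetical). For large samples DW ≈ 2(1 − r₁), where r₁ is the lag-1 residual autocorrelation, so values below 2 indicate positive and values above 2 indicate negative serial correlation.

```python
def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2 for residuals in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation, assuming zero-mean residuals (OLS with an intercept)."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


if __name__ == "__main__":
    alternating = [1.0, -1.0, 1.0, -1.0]            # sign flips every step
    persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # runs of equal sign
    print(durbin_watson(alternating))  # 3.0: negative serial correlation
    print(durbin_watson(persistent))   # about 0.67: positive serial correlation
```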
(2) Quantitative diagnosis and elimination effect of multicollinearity
Multicollinearity inflates the variance of the regression coefficients. In this study, it was diagnosed using the VIF and tolerance. In the optimized model, all independent variables have a VIF below 10 and a tolerance above 0.1 (as shown in Table 7), meeting the critical threshold requirements. The fixed-frequency 1 discharge temperature (VIF = 6.709) and the fixed-frequency 1 shell-top oil temperature (VIF = 9.294) have relatively high VIFs, the latter approaching the critical value. Eliminating highly collinear variables improved the stability of the parameter estimates; for instance, the standardized coefficient of the EXV step count (β = 0.026, t = 4.621, p < 0.001) is consistent with its physical meaning and with engineering practice. This result confirms the effectiveness of the collinearity elimination and indicates that the new model is free from serious multicollinearity.
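Standardized coefficients such as the β reported for the EXV step count rescale each raw coefficient by the ratio of sample standard deviations, β_j = b_j · s(x_j) / s(y), so that predictors measured in different units can be compared. The plain-Python sketch below fits OLS through the normal equations on toy data to illustrate the computation; it is not the authors' code, the function names are hypothetical, and a statistics package would normally be used for real data because it also reports t statistics and p values.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_fit(columns, y):
    """OLS with an intercept; returns (coefficients, standardized betas, R^2)."""
    n = len(y)
    rows = [[1.0] + [c[t] for c in columns] for t in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    y_mean = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y)

    def sd(v):
        mu = sum(v) / len(v)
        return math.sqrt(sum((x - mu) ** 2 for x in v) / (len(v) - 1))

    # Standardized coefficient: beta_j = b_j * s(x_j) / s(y).
    betas = [bj * sd(c) / sd(y) for bj, c in zip(b[1:], columns)]
    return b, betas, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
    y = [2.0 + 3.0 * a - b for a, b in zip(x1, x2)]  # noise-free toy response
    print(ols_fit([x1, x2], y))
```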
(3) Normal distribution of residuals
Residual testing is crucial for evaluating the validity of a regression model, because non-normally distributed residuals may indicate model bias. In this study, the residual distribution was inspected visually using a normal probability plot (see Figure 10). Normally distributed residuals indicate that the prediction errors are random, which is consistent with the statistical assumptions of the model and supports its reliability.
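A normal probability plot pairs the sorted residuals with the corresponding quantiles of a standard normal distribution; points lying on a straight line indicate normality. The sketch below computes these points in plain Python and summarizes their straightness with the probability-plot correlation coefficient. The Blom plotting positions used here are an assumption, since statistical packages differ slightly in this choice, and the function names are hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """(theoretical normal quantiles, sorted residuals) for a normal probability plot.

    Uses Blom plotting positions (i - 3/8) / (n + 1/4) for i = 1..n.
    """
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 1 - 0.375) / (n + 0.25)) for i in range(n)]
    return theoretical, sorted(residuals)


def probability_plot_correlation(residuals):
    """Pearson correlation of the plot points; values near 1 mean a nearly straight line."""
    q, r = normal_probability_points(residuals)
    n = len(q)
    mq, mr = sum(q) / n, sum(r) / n
    num = sum((a - mq) * (b - mr) for a, b in zip(q, r))
    den = (sum((a - mq) ** 2 for a in q) * sum((b - mr) ** 2 for b in r)) ** 0.5
    return num / den


if __name__ == "__main__":
    skewed = [0.0] * 9 + [10.0]  # one large outlier: clearly non-normal
    print(probability_plot_correlation(skewed))
```

Plotting the returned pairs gives the visual check; the single correlation value is a convenient numerical companion to it.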
Interaction effects between variables were not included in the model at this initial stage, for two main reasons. First, the core aim of this study is to identify the key parameters that significantly affect energy consumption, thereby providing clear target variables for the basic optimization of VRF systems, whereas interaction-effect analysis is better suited to exploring synergistic effects between parameters. Second, the 24 initial variables already give a relatively high dimensionality; adding all pairwise interactions would introduce 276 additional terms (24 × 23 / 2), greatly increasing model complexity and the risk of overfitting.
As shown in Figure 10, the residuals of this model do not conform to a normal distribution. Further analysis suggests two causes. (1) Although the model explains 92.5% of the variation in energy consumption, some potential influences are not included, such as dynamic boundary conditions like solar radiation intensity and indoor occupant density; these may introduce systematic errors into the residuals. (2) Non-linear operating characteristics of the VRF system, such as compressor start-stop transients and dynamic adjustment of the electronic expansion valve, cause energy consumption fluctuations at certain sample points that a linear model cannot fully capture, which shows up as increased residual dispersion.
The model's predictions deviate under extreme load conditions and during transient compressor start-stop operation, and the model does not capture parameter interaction terms or dynamic boundary conditions. Despite these limitations, its high R² value, significant F-test result, and physically reasonable coefficient signs indicate that it effectively reveals the linear correlation between energy consumption and the core operating parameters of a VRF system, so it can serve as a basic analytical tool for energy-efficiency optimization. Multiple linear regression yields an explicit linear formula whose coefficients have intuitive physical meanings, which makes it well suited to practical applications. More advanced non-linear methods may achieve slightly higher prediction accuracy, but their black-box nature prevents them from producing such quantitative coefficients, so they cannot indicate the specific strength of each parameter's effect. Multiple linear regression was therefore selected as the core prediction model.
Figure 11 shows the fitted regression plots generated from the data. The fixed-frequency compressor 1 discharge temperature, electrical box temperature, ambient temperature, and compressor frequency fit well, with data points clustered closely around the regression lines. The remaining lack of fit of the regression equation therefore appears to be related mainly to the dispersion of the fixed-frequency compressor 1 shell-top oil temperature, average low pressure, inlet–outlet pipe temperature difference, and EXV step count data.
5. Association Rule Analysis of Variable Refrigerant Flow System Operation Data
Because the raw experimental data are continuous, they must first be discretized before association rule mining (Section 5.1). The Apriori algorithm was employed for association rule analysis. It searches the dataset level by level: frequent itemsets of size k are joined to form candidates of size k + 1, candidates with any infrequent subset are pruned, and the remaining candidates are retained if their support meets the minimum support threshold. This repeats until no new frequent itemsets are found, after which rules are generated from the frequent itemsets and filtered by the minimum confidence threshold. The objective of this study is to improve the operational efficiency of VRF air conditioning systems and the sustainability of buildings, so the system energy consumption index is predefined as the key outcome variable. Unlike traditional ARM, which indiscriminately explores all possible correlations, the ARM analysis here focuses on rules whose consequent is this predefined outcome variable, guiding the algorithm toward rules that explain or predict energy consumption. This design makes the analysis more targeted and practical and better serves the research goal of improving building sustainability.
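Restricting the consequent to the energy consumption item can be sketched as follows. This plain-Python illustration assumes items are encoded as hypothetical "variable=level" strings and that the target is identified by a name prefix such as "Energy="; it follows the classical level-wise Apriori scheme but is not the authors' implementation, and dedicated libraries provide optimized equivalents.

```python
from itertools import combinations


def apriori_target_rules(transactions, target_prefix, min_support, min_confidence):
    """Target-constrained Apriori (illustrative sketch).

    transactions: list of sets of discretized items, e.g. "EXV1=low", "Energy=high".
    Returns (antecedent, consequent, support, confidence) tuples for rules X -> y whose
    single consequent y starts with target_prefix, sorted by descending confidence.
    """
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Join frequent (k-1)-itemsets, pruning candidates that have an infrequent subset.
        candidates = set()
        for a in level:
            for b in level:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        level = []
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
                level.append(c)
        k += 1

    rules = []
    for itemset, s in frequent.items():
        targets = [i for i in itemset if i.startswith(target_prefix)]
        if len(itemset) < 2 or len(targets) != 1:
            continue  # keep only itemsets with exactly one outcome item
        antecedent = itemset - {targets[0]}
        confidence = s / frequent[antecedent]  # subsets of frequent sets are frequent
        if confidence >= min_confidence:
            rules.append((set(antecedent), targets[0], s, confidence))
    return sorted(rules, key=lambda r: r[3], reverse=True)
```

Because only itemsets containing exactly one outcome item are turned into rules, the output never contains rules that predict operating parameters from one another.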
5.1. Data Discretisation Pre-Processing
Because the original experimental data are continuous, they cannot be used directly for association rule mining and must be discretized first. Data discretization converts continuous data into discrete values. Its purpose is to smooth noise, reduce the complexity of the dataset, and allow algorithms to process the data more effectively; mapping continuous values to a small number of discrete levels reduces the impact of noise on the analysis.
Two discretization methods are commonly used: equal-frequency discretization and equal-width discretization. Equal-frequency discretization divides continuous data into k intervals that each contain the same number of data points. Equal-width discretization divides the data into k intervals of equal width, although the number of data points in each interval may vary. First, the maximum and minimum values of the data are determined, and the width of each interval is set to (max − min)/k. The interval boundaries are then determined from this width and the minimum value: if the minimum value is a and the bin width is b, the first bin is [a, a + b), the second is [a + b, a + 2b), and so on, with the last bin closed at the maximum value. The result is k discrete intervals that generally contain different numbers of data points.
Equal-width discretization was selected for data processing in this study. For temperature-related parameters, the intervals were defined with reference to the safe operating range of the compressor combined with the quartile distribution of the data, giving three intervals: low, medium, and high. For parameters such as energy consumption, compressor frequency, and EXV step count, the thresholds were based on the variable gradients set in the experiment. Equal-width discretization does not disrupt the continuity of the physical parameters, aligns well with engineering intuition, and offers high stability, and the selected binning did not induce sign inversion in the extracted relationships. This strategy ensures that the energy consumption rules finally extracted for VRF systems not only match the data patterns but also exhibit physical rationality and engineering reliability.
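The two binning rules described above can be sketched in a few lines of Python. The sketch assumes bins are half-open except the last, which also receives the maximum value; the function names and the low/medium/high labels are illustrative. Note that this rank-based equal-frequency rule can place tied values in different bins.

```python
LEVELS = ("low", "medium", "high")  # illustrative labels for k = 3


def equal_width_bins(values, k):
    """Bin index per value; bins are [lo, lo+w), ..., [lo+(k-1)w, hi] with w = (hi-lo)/k."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # constant series: everything in the first bin
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Bin index per value so that each bin holds (as nearly as possible) n/k points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


if __name__ == "__main__":
    skewed = [0, 1, 1, 2, 10]
    print([LEVELS[b] for b in equal_width_bins(skewed, 3)])      # counts per bin differ
    print([LEVELS[b] for b in equal_frequency_bins(skewed, 3)])  # counts roughly equal
```

On skewed data the equal-width bins keep the physical spacing of the values, so most points fall into the "low" bin, whereas equal-frequency bins split the same points by rank.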
In the classical theory of ARM, the support threshold filters out infrequent, uninformative itemsets, while the confidence threshold ensures the reliability of rule predictions. According to the minimum support and confidence framework proposed by Han [43], the minimum support should cover at least 5% to 10% of the sample size to prevent rules from arising solely from random data fluctuations. The minimum confidence must exceed the probability of a chance association and is usually set at 30% to 50%, so that rule predictions are more accurate than random guessing. The thresholds in this study follow this classical criterion. HVAC operating data are unevenly distributed: core parameters are concentrated in high-frequency ranges, whereas extreme operating conditions are scattered across low-frequency ranges. Thresholds commonly used in this domain therefore keep the support within 10% to 25% and the confidence within 30% to 50%. The thresholds used in this study fall within the ranges given above and are fine-tuned to the specific operating conditions, so they both conform to general practice and suit the characteristics of this dataset.
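For reference, the standard definitions behind these thresholds are given below for a rule X ⇒ Y mined from a set D of discretized records. If X and Y were statistically independent, the confidence would equal supp(Y), which is the chance-level baseline that the minimum confidence is meant to exceed.

```latex
\operatorname{supp}(X \Rightarrow Y) = \frac{\left|\{\, t \in D : X \cup Y \subseteq t \,\}\right|}{|D|},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
```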
5.2. Results of Different Set Temperatures for Variable Refrigerant Flow Systems
In the variable set-temperature experiment, indoor unit terminals 1 and 2 were switched on to ensure stable operation of the indoor units and avoid interference from other factors, and the indoor units were set to high fan speed. The Apriori algorithm was applied to the data from this operating condition with a minimum itemset support of 20% and a minimum rule confidence of 50%. The resulting association rules are shown in Table 8. A support of 20% ensures that the frequently occurring parameter combinations are covered, while a confidence of 50% corresponds to a rule prediction accuracy of 85% and covers 82% of the high-frequency temperature adjustment scenarios. If the confidence threshold were reduced to 30%, the number of rules would increase by 35%, but weak rules that contradict thermodynamic laws, such as high ambient temperature leading to low energy consumption, would be included, undermining their value for engineering applications.
Table 8 yields the following observations. At set temperatures of 18 °C, 24 °C, and 30 °C, the rule that a low EXV step count of Terminal 1 results in low energy consumption holds with confidence levels of 58.947%, 84.848%, and 100%, respectively. This indicates a positive correlation between EXV step count and energy consumption that remains consistent regardless of the set temperature. For the inlet pipe temperature of Terminal 2, the rules are as follows: at 18 °C, a high inlet pipe temperature corresponds to high energy consumption (confidence 80.952%); at 24 °C, a relatively low inlet pipe temperature corresponds to low energy consumption (75.862%); and at 30 °C, a high inlet pipe temperature corresponds to low energy consumption (100%). Thus, as the set temperature increases, the correlation between terminal inlet temperature and energy consumption reverses from positive to negative. This may be attributed to changes in the system's operating mode and energy distribution at higher set temperatures, which alter the relationship between terminal inlet temperature and energy consumption.
Furthermore, at 18 °C the rule that a high inverter discharge temperature leads to high energy consumption has a confidence level of 100%; at 24 °C, the rule that a relatively low inverter discharge temperature leads to low energy consumption has a confidence level of 90.909%; and at 30 °C, the rule that a low inverter discharge temperature leads to moderate energy consumption has a confidence level of 100%. This indicates that as the set temperature increases, the influence of the inverter discharge temperature on energy consumption shifts from positive to negative, although this change is relatively gradual across set temperatures. The intrinsic relationship between inverter discharge temperature and energy consumption therefore differs across indoor set temperatures, possibly because higher set temperatures alter the system's energy conversion and consumption mechanisms. The rule that a high inverter shell-top oil temperature leads to high energy consumption has confidence levels of 66.379%, 90.323%, and 74.545% at 18 °C, 24 °C, and 30 °C, respectively. This indicates that as the set temperature increases, the influence of the inverter shell-top oil temperature on energy consumption transitions from positive to negative, further demonstrating the significant moderating role of set temperature on the relationships between system parameters and energy consumption.
5.3. Results of Different Load Rates for Variable Refrigerant Flow Systems
In the experiment varying the number of activated indoor unit terminals in the VRF system, the indoor set temperature was maintained at 18 °C and the total experimental duration was 4166 s. The changes in activated indoor unit terminals are shown in Table 9. The experiment followed the single-variable principle, with all indoor units operating at high fan speed. The Apriori algorithm was employed for data mining with a minimum support threshold of 10% and a minimum confidence threshold of 50%; the generated association rules are detailed in Table 10. A support of 10% covers the critical load-switching scenarios, and a confidence of 50% ensures the physical consistency of the correlations between load rate and operating parameters.
Comparative analysis of Table 10 reveals the following. When terminals 1 and 3, terminals 1, 3, and 4, and terminals 3 and 4 are activated, the rule that a low variable-frequency discharge temperature results in high energy consumption has confidence levels of 100%, 100%, and 70%, respectively. This indicates that as the number of activated terminals increases, the correlation between energy consumption and variable-frequency discharge temperature shifts from negative to positive, and, conversely, as the number of terminals decreases, it shifts from positive to negative. Changes in the system load ratio therefore significantly affect the relationship between variable-frequency discharge temperature and energy consumption, likely because the number of activated terminals alters the load demand and refrigerant flow distribution of the system. When terminals 1 and 3 and terminals 1, 3, and 4 are activated, the rule that a high inlet–outlet pipe temperature difference at terminal 1 results in high energy consumption has confidence levels of 100% and 83.333%, respectively. This indicates that as the number of activated terminals increases, the influence of the inlet–outlet temperature difference on energy consumption shifts from positive to negative. This may be attributed to an increase in the number of activated terminals modifying the refrigerant flow rate and heat-exchange dynamics within the system, thereby reversing the relationship between inlet–outlet temperature difference and energy consumption.
When terminals 1 and 3 are activated, the rule that a relatively low EXV step count of terminal 1 leads to high energy consumption has a confidence level of 100%. When terminals 1, 3, and 4 are activated, the rule that a high EXV step count of terminal 1 leads to high energy consumption also has a confidence level of 100%. Thus, as the number of activated terminals increases, the influence of the EXV step count on energy consumption shifts from negative to positive, reflecting the complex relationship between EXV step adjustment and energy consumption under varying system loads. When terminals 1 and 3 are activated, a low fixed-frequency discharge temperature results in moderate energy consumption (confidence 86.667%); when terminals 1, 3, and 4 are activated, a high fixed-frequency discharge temperature results in high energy consumption (100%); and when terminals 3 and 4 are activated, a high fixed-frequency discharge temperature results in low energy consumption (77.778%). This indicates that as the number of activated terminals increases, the influence of the fixed-frequency discharge temperature on energy consumption shifts from negative to positive, and as it decreases, the influence shifts from positive to negative, highlighting the significant impact of the number of activated terminals on this relationship.
When terminals 1, 3, and 4 are activated, a moderate inlet–outlet pipe temperature difference at terminal 4 results in high energy consumption, with a confidence level of 63.793%. When terminals 3 and 4 are activated, a high inlet–outlet pipe temperature difference at terminal 4 results in high energy consumption, with a confidence level of 100%. This indicates that when the number of activated terminals decreases, the influence of the inlet–outlet temperature difference on energy consumption shifts from negative to positive, further confirming the close relationship between the number of activated terminals and the correlation between temperature difference and energy consumption. When terminals 1, 3, and 4 are activated, the rule that a high EXV step count of terminal 4 leads to high energy consumption has a confidence level of 100%, and when terminals 3 and 4 are activated, the same rule holds with 96.19% confidence. Thus, when the number of activated terminals decreases, the influence of the EXV step count on energy consumption remains consistent, demonstrating a stable positive correlation between the EXV step count of terminal 4 and energy consumption across different load ratios.
5.4. Results of Different Fan Speeds for Variable Refrigerant Flow Systems
To control for extraneous factors, experimental data collected under similar outdoor meteorological conditions were selected for the variable fan speed experiment on the VRF system. The indoor set temperature was 24 °C, and only indoor unit terminal 1 was activated. Experiments at low, medium, and high fan speeds were conducted separately, with each group operating stably under its set condition. The minimum itemset support was set to 5% and the minimum rule confidence to 30%; the association rules meeting these thresholds are shown in Table 11. A support of 5% covers the critical fan-speed switching scenarios, and a confidence of 30% ensures the physical consistency of the correlations between fan speed and operating parameters.
Comparative analysis of Table 11 reveals the following. With terminal 1 at low fan speed, the rule that a high inverter discharge temperature leads to high energy consumption has a confidence level of 33.333%; at medium fan speed, the same rule has a confidence level of 100%; and at high fan speed, the rule that a low inverter discharge temperature leads to low energy consumption has a confidence level of 100%. This indicates that the influence of the inverter discharge temperature on energy consumption is generally positive across fan speeds, although it manifests differently at high fan speed. This divergence may arise from fan-speed-dependent changes in indoor–outdoor heat exchange efficiency, which in turn alter the relationship between discharge temperature and energy consumption.
With terminal 1 at low fan speed, the confidence that a high variable-frequency shell-top oil temperature leads to high energy consumption is 65.217%; at medium and high fan speeds it is 100%. The positive association between variable-frequency shell-top oil temperature and energy consumption therefore holds at all fan speeds, indicating that the intrinsic relationship between these variables remains relatively stable under different fan speed conditions.