1. Introduction
The rapid advancement of the information society has led to significant expansion in both the number and scale of data centers. According to recent statistics from Synergy Research Group [1], the number of large data centers operated by hyperscale providers reached 1136 at the end of 2024, having doubled over the previous five years. Simultaneously, energy consumption and operational expenditures in data centers continue to rise significantly [2,3], with projections indicating that data centers will account for nearly 8% of global electricity demand by 2030 [4]. In addition to high energy requirements, data centers also suffer from notably low resource utilization: statistical reports show that the average resource utilization rate in typical data centers remains below 25% [5]. The cooling system plays a crucial role in a data center, accounting for approximately 37% of the total energy consumption [6]. While diverse cooling technologies, including advanced rack-level solutions, continue to advance [7], their operational efficiency fundamentally relies on accurate anticipation of the cooling load. Precise cooling load prediction is therefore a critical enabler for energy optimization across the entire spectrum of cooling technologies. Within the cooling load, IT equipment constitutes the primary heat source, accounting for approximately 71–73% of the total, while uninterruptible power supply (UPS) systems contribute about 11–13% [8,9]. Most data centers currently experience significant overcooling, with cooling systems frequently supplying nearly twice the required cooling capacity. This raises the energy consumption of cooling systems by over 50% relative to their design specifications, substantially elevating the operational costs of the data center [10]. Consequently, accurate and reliable prediction of cooling loads is essential for minimizing energy consumption, improving the operational efficiency of data center cooling systems, and establishing a robust foundation for effective energy management.
Cooling load prediction models have been developed across various domains and applications and are typically categorized into two main approaches: physical models and data-driven models [11]. Physical models employ thermodynamics-based simulation tools such as EnergyPlus [12], Dymola [13], TRNSYS [14], and DOE-2 [15]. These models require detailed building parameter inputs, which often involves substantial labor costs [16], and they may still suffer from limited predictive accuracy. This limitation stems from simplifications in modeling complex real-world phenomena, including uncertainties in occupant behavior, equipment degradation over time, and the stochastic nature of weather conditions. While physics-based and hybrid gray-box models offer interpretability and physical consistency, they often require detailed knowledge of building systems and can be computationally intensive for real-time control. In contrast, data-driven models mitigate these limitations by leveraging machine learning techniques. The accuracy of the prediction model plays a critical role in determining the reliability of parameter prediction and the effectiveness of energy management system optimization: greater predictive precision reduces operational uncertainty and thereby makes decision support more robust.
With the rapid advancement of artificial intelligence (AI) technologies [17], data-driven models have gained prominence in energy applications. A variety of machine learning algorithms have been widely adopted for predictive modeling, including backpropagation (BP) neural networks [18], artificial neural networks (ANNs) [19], support vector machines (SVMs) [20], long short-term memory (LSTM) networks [21], extreme gradient boosting (XGBoost) [22], and light gradient boosting machine (LightGBM) [23]. Hu and Wei [24] developed a BP neural network with Bayesian regularization for hourly cooling load prediction of a large commercial building, achieving prediction errors of 1.60% and 1.18% for weekly and daily forecasts, respectively. Under dynamic operational scenarios, specifically varying occupancy flow and weather conditions, the maximum relative errors remained below 9.8257% and 11.675%, respectively, confirming its robustness across diverse boundary conditions. Chen et al. [25] proposed an SVM-based hourly air conditioning load prediction model that employed metaheuristic optimization techniques for parameter tuning and achieved a maximum relative error of 2.52%. In a related study, An et al. [26] developed support vector regression (SVR) models for data center cooling systems, demonstrating superior accuracy under small-sample conditions compared to conventional methods, which highlights SVR’s advantages for this application. Sha et al. [27] demonstrated that gradient tree boosting (GTB) models trained on 1 h resolution data achieved superior accuracy for building cooling load prediction compared to alternative approaches. Ji et al. [28] implemented a LightGBM-based framework for cooling load prediction, incorporating feature selection of key load determinants. The model, developed in Python (PyCharm Community Edition 2023.3.3) and validated with operational data from an office building in Beijing, consistently achieved prediction accuracies exceeding 90%, underscoring its practical applicability for real-world energy management scenarios. Hou et al. [29] evaluated five machine learning algorithms for hourly energy consumption prediction in a university office building. Among the evaluated models, the deep neural network (DNN) demonstrated the best performance, achieving $R^2$ values of 0.971 and 0.959, root mean square errors (RMSEs) of 4.139 kWh and 4.796 kWh, and Mean Absolute Percentage Errors (MAPEs) of 5.095% and 5.738%, respectively. In data center energy prediction, Li et al. [30] developed a hybrid physical–ANN model in which the ANN corrected errors from the physical model, reducing the Mean Relative Error (MRE) from 13.44% to 6.54% and the RMSE from 352.6 to 181.9. Further advancing this field, Dong et al. [31] developed a real-time server energy consumption prediction model using XGBoost, incorporating distance correlation coefficient-based feature selection to identify key parameters. This approach enhanced model accuracy, achieving a 4.698% reduction in MAPE compared to five benchmark regression models.
Current research in this domain is characterized by three notable trends: the prevalence of gradient boosting models (e.g., XGBoost, LightGBM) due to their robust performance and capacity to handle high-dimensional data; the increasing application of deep learning architectures for capturing temporal dependencies; and the integration of advanced hyperparameter optimization techniques (e.g., Bayesian optimization, metaheuristic algorithms) to enhance model accuracy. Looking forward, hybrid models that combine physical insights with data-driven flexibility [30], along with interpretability tools such as SHAP, represent promising directions for future research.
Building on these advances, machine learning prediction models are widely recognized for their structural simplicity and broad applicability. Their powerful nonlinear fitting capabilities and strong generalization performance enable them to accommodate diverse building environments and dynamic operating conditions. However, when applied to data center cooling scenarios, existing approaches face two critical limitations: insufficient specialization for high-density, high-reliability cooling load profiles, and inherent trade-offs between computational speed and predictive accuracy. These challenges highlight the need for customized solutions that effectively balance efficiency, precision, and scalability to meet the stringent performance requirements of data center environments. The LightGBM algorithm presents a compelling solution for this domain, as its decision tree-based distributed gradient boosting framework is specifically optimized for large-scale, high-dimensional data processing. It offers fast training speeds, low memory consumption, and competitive predictive accuracy, characteristics that align well with the requirements of data center cooling load prediction, including the need for high reliability, scalability, and real-time responsiveness.
To enhance the performance of predictive models, Bayesian optimization provides a probabilistic framework that systematically explores the hyperparameter space while accounting for parameter uncertainty during tuning, leading to more robust and accurate models. This property has driven the growing adoption of probabilistic approaches such as Bayesian optimization in recent years [29]. Recent advances in hyperparameter optimization have demonstrated significant improvements in cooling load prediction accuracy. Yan et al. [32] developed an enhanced BiLSTM model incorporating PCANet for sensitivity analysis, retaining only features with correlation coefficients >0.2 to reduce dimensionality. By employing a hybrid strategy improved whale optimization algorithm (HSIWOA) for hyperparameter tuning, their approach achieved a 50% lower MAPE than three benchmark models, with HSIWOA exhibiting superior convergence behavior versus six competing optimization methods. Complementing this work, Mao et al. [33] proposed a nonlinear chaotic Harris hawks algorithm (NCHHO)-optimized full Elman neural network (FENN), where the improved NCHHO outperformed particle swarm optimization (PSO), gray wolf optimizer (GWO), and the standard Harris hawks algorithm (HHO) in convergence speed and solution quality. The NCHHO-FENN hybrid model reduced RMSE by 11.72% and increased $R^2$ by 0.46% compared to the baseline FENN. These studies demonstrate that integrating advanced hyperparameter optimization techniques, particularly metaheuristic algorithms, with neural network architectures can significantly improve both the efficiency and accuracy of data center cooling load prediction models. Notably, reported reductions in MAPE of up to 50%, along with improved convergence behavior, underscore the potential of such algorithmic hybridization for next-generation building energy management systems.
In summary, data-driven models reduce dependence on a priori knowledge of building design and physical systems [34], instead utilizing historical operational data to uncover latent relationships between energy consumption (as the output) and a wide range of input variables, including meteorological conditions, building characteristics, occupancy patterns, and equipment schedules. This methodology offers greater flexibility and broader applicability for cooling load prediction compared to traditional physical models. Although various advanced prediction methods have been developed for commercial buildings [35], their research and application in data center environments remain relatively underdeveloped [36], despite their critical role in energy-efficient operation. Specifically, while LightGBM has been successfully applied in general building energy forecasting, systematic studies that optimize LightGBM with advanced hyperparameter tuning methods tailored to the high-density, dynamically fluctuating cooling load profiles of data centers remain scarce. Recent systematic reviews have identified this as a critical gap, highlighting the need for interpretable models and real-time adaptive solutions specifically designed for data center thermal management [37]. Unlike conventional buildings, data centers require 24 h cooling throughout the year, and their optimization control strategies are highly dependent on the precision of predictive models. This predictive accuracy is essential for ensuring both the safety and energy efficiency of cooling systems, highlighting the need for specialized prediction methods tailored to the operational characteristics of data centers.
The selection of an appropriate model is highly dependent on the specific characteristics of the target system and its operational data. In conventional buildings such as offices, cooling loads typically exhibit regular patterns governed by occupancy and diurnal cycles. For these settings, simpler models (e.g., linear regression, SVR) have been shown to provide adequate predictive efficiency [11]. This is corroborated by case studies where methods like Random Forest achieved competitive accuracy in such settings, benefiting from stable and periodic load profiles [38].
In contrast, data center cooling loads present a distinct challenge characterized by high dimensionality (multiple interacting variables), non-strict periodicity (load patterns that are not perfectly repetitive due to dynamic IT workloads), and transient fluctuations driven by sudden changes in computing demands. These complex, nonlinear dynamics exceed the representational capacity of simpler models, necessitating more advanced approaches. Gradient Boosting Decision Tree (GBDT) family models have proven particularly effective in capturing such patterns within data center energy systems [30]. Among these, LightGBM is particularly well-suited. Its algorithmic efficiency in handling high-dimensional data stems from a histogram-based approach and a leaf-wise growth strategy [23]. This efficiency allows it to effectively prioritize and model critical load variations.
To address this gap, a LightGBM model with Bayesian optimization is proposed, which is characterized by rapid training speed, low memory consumption, strong generalization capability, and precise adaptation to data center environments. The selection of LightGBM is motivated by its proven superiority over other ensemble methods in handling large-scale, high-dimensional datasets with complex feature interactions [39,40], as well as its algorithmic efficiency in capturing nonlinear dynamics through leaf-wise growth and histogram-based splitting [23]. The main contributions of this study are summarized as follows:
(1) A LightGBM model is proposed specifically for data center cooling load prediction, addressing the unique challenges of high dimensionality and non-strict periodicity;
(2) Bayesian optimization is employed to automatically tune hyperparameters, enhancing model accuracy and generalization;
(3) Comprehensive comparisons with naive benchmarks (T-1, T-24, and T-168) and state-of-the-art models (SVR, XGBoost, and LSTM) validate the superiority of the proposed approach in terms of prediction accuracy, computational efficiency, and robustness to noise.
This paper is structured as follows:
Section 2 details the LightGBM model, including the Bayesian hyperparameter optimization strategy and comparative model selection.
Section 3 describes the data acquisition and preprocessing process, supplemented by SHAP (SHapley Additive exPlanations)-based feature importance analysis.
Section 4 presents the experimental validation against baseline benchmarks.
2. Methodology
To develop a highly accurate algorithm for predicting the cooling load in data centers, a LightGBM model with Bayesian optimization is proposed. The cooling load prediction workflow of the proposed methodology is shown in Figure 1 and consists of three main components: (1) data acquisition and preprocessing, (2) cooling load prediction using the LightGBM model, and (3) hyperparameter tuning through Bayesian optimization. In the data acquisition phase, operational parameters such as equipment cooling load, equipment power density, meteorological data, and other relevant variables are acquired from the data center. The raw data subsequently undergo preprocessing and feature selection to form a structured dataset suitable for model training. Owing to the large number of hyperparameters in the LightGBM model and the difficulty of manual tuning, this study employs Bayesian optimization to tune the hyperparameters automatically and enhance model performance. The final prediction model is built using the LightGBM framework with the hyperparameters identified through Bayesian optimization. After training, the model is evaluated and used to produce the final cooling load predictions.
2.1. LightGBM Model
The LightGBM model is a highly efficient gradient boosting framework that extends the conventional gradient boosting decision tree (GBDT) algorithm. Originally introduced by Microsoft in 2017, it is specifically designed to handle large-scale, high-dimensional datasets, making it well-suited for tasks such as cooling load prediction in data centers. The core innovation of the LightGBM model lies in its integration of three key techniques: a histogram-based algorithm, a leaf-wise growth strategy with depth constraints, and parallel computing optimization. These advancements collectively enhance training efficiency and predictive accuracy compared to conventional gradient boosting approaches [41]. The LightGBM model constructs an ensemble through an additive, iterative process, combining M weak regression trees to achieve superior predictive performance [42]. The final model after M iterations is expressed as follows:
$$F_M(x) = \sum_{m=1}^{M} f_m(x)$$
where $x$ denotes the input feature vector, and $f_m$ is the $m$th tree.
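As a brief illustration of the additive, iterative construction, the textbook first-order gradient boosting recursion can be written as follows. This is a sketch of the generic GBDT update rather than a formula taken from this study: the loss $L$, the training pairs $(x_i, y_i)$, the pseudo-residuals $r_{i,m}$, and the intermediate ensemble $F_m$ (the model after $m$ trees) are notation introduced here for illustration, and any shrinkage factor is assumed to be absorbed into each tree.

$$r_{i,m} = -\left.\frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)}\right|_{F = F_{m-1}}, \qquad f_m = \arg\min_{f} \sum_{i=1}^{n} \left(r_{i,m} - f(x_i)\right)^2, \qquad F_m(x) = F_{m-1}(x) + f_m(x)$$

Under these assumptions, unrolling the recursion from a zero initial model gives a final model that is simply the sum of the M fitted trees.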
Compared to conventional GBDT, the LightGBM model performs better on large-scale, high-dimensional datasets, such as those encountered in data center cooling load prediction, by improving both computational efficiency and predictive accuracy. To address the computational inefficiencies of conventional GBDT on large datasets, LightGBM employs a histogram-based optimization strategy that discretizes continuous features into k bins. This method quantizes continuous features into discrete integer values, constructs k-bin histograms in a single data pass, accumulates the statistics needed for gain estimation, and determines the best split point according to the maximum information gain criterion. It effectively reduces computational complexity while maintaining high precision in split-point selection.
To further enhance model performance and mitigate overfitting, the LightGBM model employs a leaf-wise tree growth strategy with depth constraints. This approach iteratively selects the leaf node with the highest gain for splitting, thereby optimizing model expressiveness and predictive accuracy while controlling model complexity to ensure computational tractability. Conventional decision tree algorithms generally adopt a level-wise growth strategy, in which all leaf nodes at the same depth are split simultaneously based on maximum impurity reduction (as shown in Figure 2). However, this method frequently introduces redundant computations, as certain leaf nodes may contribute minimal splitting gain, leading to increased computational overhead. Although large volumes of data are available, the measured operational parameters in data center cooling systems typically exhibit only minor fluctuations around their rated values under steady-state conditions [43]. This characteristic, combined with the high dimensionality of the data, increases the risk of overfitting. LightGBM’s depth-constrained leaf-wise growth strategy mitigates this risk by splitting only the leaf node with the maximum gain at each iteration. As a result, it not only improves computational efficiency but also limits overfitting, making it particularly effective for high-dimensional, low-variability datasets such as those encountered in cooling load prediction tasks.
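The histogram-based split search described above can be sketched in a few lines of plain Python. This is a simplified illustration, not LightGBM's actual implementation: it assumes equal-width bins, a squared-error variance gain without regularization, and a single feature; the function names and the toy data are hypothetical.

```python
def build_histogram(values, gradients, n_bins):
    """Bucket one feature into equal-width bins in a single pass,
    accumulating the gradient sum and sample count of each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count, lo, width


def best_split(grad_sum, count):
    """Scan bin boundaries instead of raw values and return the boundary
    with the largest gain G_L^2/n_L + G_R^2/n_R - G^2/n."""
    total_g, total_n = sum(grad_sum), sum(count)
    best_gain, best_bin = 0.0, None
    left_g, left_n = 0.0, 0
    for b in range(len(count) - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        right_g = total_g - left_g
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain


# Toy data (hypothetical): one feature and the per-sample gradients.
feature = [20.0, 21.0, 22.0, 23.0, 30.0, 31.0, 32.0, 33.0]
grads = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]

grad_sum, count, lo, width = build_histogram(feature, grads, n_bins=4)
split_bin, split_gain = best_split(grad_sum, count)
threshold = lo + (split_bin + 1) * width  # upper edge of the chosen bin
```

The scan visits at most k - 1 bin boundaries per feature rather than every distinct feature value, which is where the reduction in computational cost comes from.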
Furthermore, the LightGBM model integrates advanced parallel computing techniques, including feature parallelism, data parallelism, and histogram parallelism. These techniques enable the concurrent processing of features and the distribution of data blocks across multiple computational units, while utilizing multithreading for efficient histogram construction. By significantly reducing communication overhead, these techniques enhance both computational efficiency and memory utilization, which are advantages particularly critical for large-scale applications such as data center cooling load prediction. LightGBM incorporates Gradient-based One-Side Sampling (GOSS), a novel sampling method that addresses the computational cost of traditional gradient boosting, which requires scanning all data instances for every split. GOSS retains all data instances with large gradients (i.e., those that are under-trained and contribute significantly to information gain) while performing random sampling on instances with small gradients. By focusing on these high-gradient instances, GOSS ensures that the most informative data points, such as those representing sudden cooling load changes in a data center, are prioritized during training. Compared to uniform random sampling, this approach yields more accurate gain estimates, thereby improving learning efficiency and model performance without compromising accuracy.
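The GOSS sampling step can be illustrated with a minimal sketch. It follows the general idea described above (keep all large-gradient instances, randomly sample the rest, and re-weight the sampled instances so that gain estimates stay approximately unbiased). The retention rates `top_rate` and `other_rate`, their default values, and the toy gradients are assumptions made for illustration and are not values reported in this study.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) for one GOSS-style sampling round.

    top_rate:   fraction of instances kept because their |gradient| is largest
    other_rate: fraction of all instances drawn at random from the remainder
    """
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]
    rest = order[n_top:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Small-gradient samples are amplified to compensate for under-sampling.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return top + sampled, weights


# Toy gradients (hypothetical): two instances mimic sudden load changes.
gradients = [0.05, -0.1, 0.02, 2.5, 0.01, -3.0, 0.03, 0.08, -0.04, 0.06]
idx, w = goss_sample(gradients)
```

With these settings, the two large-gradient instances are always retained with weight 1, and the single randomly drawn small-gradient instance carries a weight of (1 - 0.2) / 0.1 = 8.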
2.2. Hyperparameter Tuning via Bayesian Optimization
The selection of optimal hyperparameters for the LightGBM model is crucial because of their direct impact on predictive accuracy, and effective hyperparameter tuning is therefore a key component of the overall optimization process [44]. Conventional hyperparameter tuning of the LightGBM model typically relies on manual trial-and-error methods, where parameters are adjusted empirically based on performance evaluation. However, this method is inefficient and impractical for multi-parameter optimization, and it often yields suboptimal solutions because the search settles on a local optimum rather than the global optimum. To enhance the predictive accuracy of the LightGBM model for data center cooling load prediction, this study adopts Bayesian optimization, which facilitates the efficient and simultaneous tuning of multiple hyperparameters, thereby increasing the likelihood of achieving a globally optimal solution [45]. Bayesian optimization is a global optimization technique that has been successfully applied across various domains, including intelligent robotics [46], information processing, and combinatorial optimization [47]. Notably, Snoek et al. [48] introduced Bayesian optimization into machine learning, demonstrating its effectiveness for joint hyperparameter tuning in complex models. The theoretical foundation of this method is Bayes’ theorem, originally proposed by Reverend Thomas Bayes [49], which provides a probabilistic framework for updating beliefs based on observed data. It can be formally expressed as:
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}$$
where $p(\theta \mid y)$ is the posterior probability of the parameters $\theta$, given the observed data $y$; $p(y \mid \theta)$ is the likelihood function of the data $y$, given the parameters $\theta$; $p(\theta)$ is the prior probability of $\theta$; and $p(y)$ is the marginal likelihood.
The objective of applying Bayesian optimization to LightGBM hyperparameter tuning is to minimize the model’s validation loss function, formally expressed as:
$$\theta^{*} = \arg\min_{\theta} L\left(y_{\mathrm{val}}, \hat{y}_{\mathrm{val}}(\theta)\right)$$
where $\theta$ is the combination of hyperparameters to be optimized, $L$ is the loss function on the validation set, $y_{\mathrm{val}}$ denotes the ground-truth labels of the validation set, and $\hat{y}_{\mathrm{val}}(\theta)$ is the predicted value on the validation set generated by the model trained with hyperparameters $\theta$.
Candidate hyperparameters are selected by maximizing an acquisition function (Equation (5)), which is defined relative to the best observed value of the objective function.
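For illustration, one widely used acquisition function that is defined relative to the best observed value is expected improvement (EI). The text here does not state which acquisition function was used, so the form below is an assumption. It is written for a minimization problem, with $\mu(\theta)$ and $\sigma(\theta)$ denoting the posterior mean and standard deviation of the surrogate model and $f^{+}$ the best (lowest) objective value observed so far; all of these symbols are introduced here for illustration only.

$$\mathrm{EI}(\theta) = \left(f^{+} - \mu(\theta)\right)\Phi(z) + \sigma(\theta)\,\varphi(z), \qquad z = \frac{f^{+} - \mu(\theta)}{\sigma(\theta)}$$

Here $\Phi$ and $\varphi$ are the standard normal cumulative distribution and density functions. The first term favors candidates whose predicted loss is already below the best observed value (exploitation), while the second favors candidates where the surrogate is uncertain (exploration).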
The Bayesian optimization process operates iteratively through three key phases: (a) construction of a Gaussian process (GP) surrogate model from the existing hyperparameter evaluations, followed by selection of the next point via acquisition function maximization (Equation (5)); (b) evaluation of the objective function at the new candidate point and subsequent updating of the GP model; and (c) repetition of this cycle until a termination condition is satisfied, such as reaching the maximum number of iterations or achieving convergence. This closed-loop procedure constitutes the complete Bayesian optimization workflow and ultimately yields an optimized set of hyperparameters that enhances model performance.
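The three-phase loop above can be sketched as a small, dependency-free Python example. It is an illustrative toy and not the implementation used in this study: it assumes a single hyperparameter scaled to [0, 1], a GP with a fixed-length-scale RBF kernel, an expected-improvement acquisition function maximized over a grid, and a fixed iteration budget as the stopping rule. The function `toy_loss` is a hypothetical stand-in for training LightGBM and measuring its validation loss.

```python
import math


def rbf(a, b, length=0.3):
    """Squared-exponential kernel (the length scale is an assumed constant)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Phase (a): GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean_y = sum(ys) / len(ys)
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best):
    """Expected improvement over the best observed loss (minimization)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(objective, lo, hi, n_init=3, n_iter=10, n_grid=101):
    xs = [lo + (hi - lo) * i / (n_init - 1) for i in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    for _ in range(n_iter):  # phase (c): repeat until the budget is spent
        best = min(ys)
        candidates = [g for g in grid if all(abs(g - x) > 1e-9 for x in xs)]
        # Phase (a): pick the candidate that maximizes the acquisition function.
        x_next = max(candidates,
                     key=lambda g: expected_improvement(*gp_posterior(xs, ys, g), best))
        # Phase (b): evaluate the objective and extend the GP's data set.
        xs.append(x_next)
        ys.append(objective(x_next))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], len(ys)


def toy_loss(x):
    """Hypothetical validation loss as a function of one scaled hyperparameter."""
    return (x - 0.3) ** 2


best_x, best_loss, n_evals = bayes_opt(toy_loss, 0.0, 1.0)
```

In practice, a maintained Bayesian optimization library would replace the hand-written GP, and the objective would train the model on each candidate hyperparameter set; the sketch only makes the control flow of the loop explicit.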
2.3. Naive Benchmark Models
To establish a rigorous baseline that accounts for the inherent periodicity in data center cooling loads, the following naive time-lagged models were adopted as performance benchmarks. The T-1, T-24, and T-168 baseline models predict the current data center cooling load based on the cooling load value from the previous hour, the same hour of the previous day (24 h lag), and the same hour of the previous week (168 h lag), respectively. This approach can be mathematically formulated as:
$$\hat{y}_t = y_{t-k}, \qquad k \in \{1, 24, 168\}$$
where $\hat{y}_t$ is the predicted cooling load at hour $t$ and $y_{t-k}$ is the measured cooling load $k$ hours earlier.
These simple baseline models serve as critical benchmarks for evaluating whether the proposed model genuinely surpasses the predictive capability inherent in simple temporal lags. Notably, they rely exclusively on historical load values and require no model training, resulting in minimal computational overhead.
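The lag-based baselines can be expressed in a few lines of Python. The sketch below is illustrative: the function names and the toy series are hypothetical, and a lag of 4 is used only so that the example fits in a short list. With real hourly data the lags would be 1, 24, and 168.

```python
def lag_forecast(load, lag):
    """Predict load[t] as load[t - lag]; the first `lag` hours have no forecast."""
    return [load[t - lag] if t >= lag else None for t in range(len(load))]


def mape(actual, predicted):
    """Mean absolute percentage error over hours that have a forecast."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None and a != 0]
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Toy hourly cooling load (hypothetical values) that repeats every 4 hours.
hourly_load = [100.0, 110.0, 120.0, 110.0, 100.0, 110.0]

t1 = lag_forecast(hourly_load, 1)   # analogue of the T-1 baseline
t4 = lag_forecast(hourly_load, 4)   # stands in for T-24 / T-168 on this toy series
t1_error = mape(hourly_load, t1)
t4_error = mape(hourly_load, t4)
```

Because the toy series is exactly periodic, the period-matched lag reproduces it without error while the one-hour lag does not, which illustrates why a learned model must be compared against the lag that matches the dominant period.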
2.4. Three Comparative Models
To evaluate the performance of the proposed model, this study conducts a comparative analysis using XGBoost, SVR, and LSTM as benchmark models. XGBoost is an ensemble learning algorithm that iteratively trains multiple weak learners (decision trees) and combines their outputs to enhance predictive accuracy through the optimization of an objective function. In building cooling load prediction, XGBoost has exhibited strong capability in processing heterogeneous, multi-source data. SVR extends SVM to regression tasks by identifying optimal hyperplanes in high-dimensional feature spaces. It is particularly effective for building cooling load prediction scenarios characterized by limited sample sizes and high-dimensional input features. LSTM, a specialized recurrent neural network (RNN) variant, mitigates the vanishing gradient problem through its gated architecture (input, forget, and output gates). This structure enables the model to capture long-range temporal dependencies, making it especially suitable for modeling complex sequential patterns in building cooling load prediction.
All three algorithms are widely recognized for their robust predictive performance. In this study, they serve as baseline benchmarks to evaluate and compare the performance of the proposed Bayesian-optimized LightGBM model.