1. Introduction
With the rapid development of the information society and the surge in demand for computing and storage resources, the energy consumption of datacenters has increased significantly in recent years, making it a crucial component of global energy consumption [1]. Nowadays, datacenters consume approximately 1% of the world's total electricity [2], a share that might rise significantly in the coming decades as computing power demands continue to escalate [3]. Thus, it is crucial to improve the operational energy efficiency of datacenters and achieve energy conservation.
Commonly, the energy consumption of datacenters is primarily driven by IT equipment, the cooling system, and auxiliary systems such as the power distribution and lighting systems [4]. Among them, nearly 50% of the energy is used by the cooling system to provide airflow for cooling IT equipment [5]. Therefore, the cooling system has significant potential for energy savings. Developing effective cooling system control strategies (i.e., thermal management) that maintain energy efficiency while ensuring the safety of IT equipment is a key element in reducing the energy consumption and carbon emissions of datacenters [6].
For achieving effective thermal management in datacenters, computational fluid dynamics (CFD) is widely regarded as the most powerful tool [7,8]. CFD methods solve the governing equations of fluid mechanics numerically, thereby enabling accurate prediction of airflow distributions in datacenter rooms and providing adequate and effective information for thermal management [9]. For example, Zhou et al. [10] optimized the thermal management system of a small datacenter by employing a validated CFD model (prediction accuracy within 2 °C of experimental results) to simulate airflow and heat transfer, thereby enhancing cooling efficiency and improving the uniformity of the temperature distribution. Cao et al. [11] used CFD simulations to optimize server inlet areas in a datacenter, reducing the bottom server temperature from 309.2 K to 299.9 K without additional cooling infrastructure, thus enhancing thermal management and enabling potential energy savings. However, the CFD process is inherently complex, as it requires the discretization of differential equations over a large number of mesh nodes [12]. This makes CFD simulations highly time-consuming, particularly in datacenter scenarios where high-resolution modeling or complex geometries must be considered. Consequently, CFD methods are often limited to the design and optimization stage [13] and are generally unsuitable for real-time energy-saving control applications, which demand rapid computation of airflow patterns [14].
With the rapid development of artificial intelligence (AI), machine learning (ML) models, especially neural network (NN)-based surrogate models [15], have demonstrated significant potential and are expected to serve as a promising alternative to time-consuming CFD methods in real-time energy-saving operations [16]. This approach develops predictive models from existing experimental or simulation datasets of airflow. It skips the complex physical solving process, as a well-trained model can provide accurate predictions in a very short time [17], fully meeting the computational speed requirements for real-time control. For instance, Wang et al. [18] used an artificial neural network (ANN) to predict the thermal effects of workloads on datacenter temperatures, achieving a reduction of 6.67 °C in the maximum temperature, which translates to a 13.34% reduction in cooling system power consumption. Athavale et al. [19] employed an ANN to predict rack inlet air temperatures in datacenters, demonstrating good accuracy with an average absolute error of 0.7 °C, corresponding to a relative error of 3.2%. Fang et al. [20] proposed an improved deep neural network (DNN) combined with an attention mechanism to predict server temperatures in a datacenter, reducing the mean absolute error (MAE) from 3.51 °C for the standard DNN model to 0.65 °C, an improvement of 81.48%. Meanwhile, the surrogate model required only 0.47 s for prediction.
However, in real-world scenarios, the operating conditions of datacenter environments are highly variable, resulting in significantly different airflow patterns. For instance, the temperature of the incoming airflow fluctuates as the cooling system operates [21], and the power of different server cabinets varies depending on user demand [22]. Existing NN-based methods are usually proposed for a specific scene. Most of them suffer from poor generalization ability because they are trained and applied under the same working conditions [23], so their precision and reliability become doubtful when they are applied to new working conditions outside the training data. Meanwhile, existing studies have primarily been conducted on smaller datacenters, whereas large datacenters are the main energy consumers in the datacenter industry [24]. These larger facilities exhibit more diverse operating conditions, resulting in more complex and variable airflow fields [25]. Therefore, a question arises: how can ML-based methods, which are trained on CFD databases covering limited working conditions, effectively predict airflow distributions in large datacenter rooms under various working conditions? In essence, the scientific problem is the generalization ability of ML-based models, i.e., the ability to predict airflow distributions in large datacenters under diverse and fluctuating operational conditions, in order to provide high-quality data support for real-time thermal management applications and thereby further enhance the effectiveness of thermal management.
For conventional ANN-based methods, there are two main challenges in directly handling the aforementioned situation. Firstly, the fully connected structure of conventional ANN-based methods assumes that the output units are independent of each other [26], which does not align with the spatial correlations inherent in airflow patterns. In scenarios where the airflow distribution varies across diverse operating conditions, the spatial dependencies between different regions of the flow field become even more crucial, making fully connected structures inadequate for accurately capturing and predicting these variations. Secondly, due to their dense connectivity, conventional ANN-based methods require an enormous number of parameters when handling high-dimensional inputs and outputs [27]. In the context of large datacenters, the dimensions of the input features (i.e., working conditions) are relatively high, while the outputs (i.e., airflow fields) are characterized by intricate spatial structures and high resolution. This not only results in substantial memory consumption, but also leads to prohibitively high computational costs and extended training times. In summary, there is still no appropriate method that can effectively learn the relationship between complex input condition vectors and the spatial distribution of output airflow patterns, while efficiently generating detailed spatial airflow distributions in large datacenter rooms.
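The parameter-count argument can be made concrete with a back-of-the-envelope comparison. The grid size and layer widths below are illustrative assumptions, not the dimensions of the datacenter model in this study:

```python
# Parameter-count comparison: a fully connected output layer versus a
# single convolutional layer, for a hypothetical 64x64x16 temperature
# grid (sizes are illustrative assumptions only).
voxels = 64 * 64 * 16        # 65,536 output temperature values
hidden = 256                 # hidden units feeding the output layer

# Dense layer: one weight per (hidden unit, voxel) pair, plus biases.
fc_params = hidden * voxels + voxels

# One 3x3x3 convolution mapping 16 feature channels to 1 output channel:
# weights are shared across all spatial positions.
conv_params = 16 * 1 * 3**3 + 1

print(fc_params)    # 16842752
print(conv_params)  # 433
```

The dense layer needs tens of millions of parameters for even this modest grid, while the convolutional layer's weight sharing keeps the count in the hundreds, which is the efficiency gap the paragraph above describes.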
In the artificial intelligence community, the convolutional neural network (CNN) is one of the most important concepts. It is inspired by human visual perception, which can focus on key portions of a large object under analysis. Instead of the fixed-length vectors used by conventional ML-based methods, it takes a 2D/3D tensor as input/output and introduces convolutional operators to process spatial features [28]. CNNs have found extensive applications in computer vision, including tasks such as image recognition [29], semantic segmentation [30], and video comprehension [31]. These results indicate their powerful capacity for uncovering hidden knowledge in spatial data and learning the complex relationships between input features and output space. Hence, combined with the complex-feature processing ability of the ANN, the CNN has great potential to overcome the challenge of fast and accurate airflow prediction under different working conditions in large datacenter rooms. However, to the best of the authors' knowledge, the feasibility and effectiveness of CNN-based methods in the domain of large datacenter room environment analysis remain unclear, and a framework merging CNNs with conventional ANN-based methods to analyze airflow in large datacenter rooms is still lacking.
To address this issue, this paper proposes an ANN–CNN hybrid method to rapidly predict the temperature distributions of large datacenter rooms under varying working conditions. This study makes three main contributions. Firstly, an ANN–CNN hybrid framework is proposed: the ANN component transforms the input operating conditions into latent feature vectors, and the CNN component efficiently decodes these vectors to reconstruct the target temperature fields. Secondly, a simulated dataset containing 500 3D temperature fields under different working conditions is established and validated based on a real large datacenter in Hubei, China. The simulation is performed with the 6 Sigma 15 software [32]. Thirdly, comprehensive evaluations on different airflow regions are conducted to assess the performance of the proposed method across different operating conditions.
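The data flow of such an encode-then-decode hybrid can be sketched in a few lines. The sketch below is a minimal, dependency-free illustration of the idea, not the architecture used in this study: the number of input conditions, layer widths, latent volume size, and output resolution are all assumptions, and the learned transposed convolutions of a real decoder are replaced by nearest-neighbour upsampling plus 1x1x1 channel-mixing layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# --- ANN encoder: working-condition vector -> latent feature volume ---
# Illustrative sizes: 20 input conditions (e.g., cooling setpoints and
# cabinet powers), one 128-unit hidden layer, and a latent vector
# reshaped into an 8-channel 4x4x4 coarse feature volume.
n_cond, n_hidden, n_ch, g = 20, 128, 8, 4
W1 = rng.normal(0, 0.1, (n_hidden, n_cond)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_ch * g**3, n_hidden)); b2 = np.zeros(n_ch * g**3)

def encode(cond):
    h = relu(W1 @ cond + b1)
    return (W2 @ h + b2).reshape(n_ch, g, g, g)

# --- CNN-style decoder: progressively upsample to the full field ---
def upsample2(x):
    # Nearest-neighbour doubling along each spatial axis.
    return x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)

M1 = rng.normal(0, 0.1, (n_ch, n_ch))  # 1x1x1 channel-mixing layer
M2 = rng.normal(0, 0.1, (1, n_ch))     # final layer -> 1 temperature channel

def decode(feat):
    x = relu(np.einsum('oc,cxyz->oxyz', M1, upsample2(feat)))
    return np.einsum('oc,cxyz->oxyz', M2, upsample2(x))[0]

cond = rng.normal(size=n_cond)      # one hypothetical working condition
field = decode(encode(cond))        # predicted 3D temperature field
print(field.shape)                  # (16, 16, 16) voxel grid
```

The key structural point is that the dense layers touch only the compact latent vector, while the spatially large output is produced by weight-shared upsampling stages, which is what keeps the parameter count of convolutional decoders manageable.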
6. Conclusions
In this study, an ANN–CNN hybrid surrogate model is developed for the rapid and precise prediction of airflow temperature fields in large datacenter rooms under varying working conditions. A CFD simulation dataset is generated based on a real large datacenter located in Hubei, China. This dataset consists of 500 distinct working conditions, each corresponding to a three-dimensional temperature field within the datacenter room. The reliability of the CFD model is validated through a comparison between the simulation results and actual measurement data. A domain-customized ANN–CNN framework is then developed to process the input working conditions more effectively and to predict the airflow distribution with high-fidelity spatial details. The generalization ability of the proposed method is evaluated on a testing set whose working conditions and temperature fields are completely distinct from those in the training set.
The comparison between the CFD simulation data and the actual measured data demonstrates a high level of consistency. The errors between the measured and simulated temperatures are generally small, with most discrepancies falling within 1 °C and percentage errors typically under 5%. This result demonstrates that the CFD simulation model is reliable and capable of generating datasets for training surrogate models. The validation results indicate that the proposed ANN–CNN hybrid surrogate model and its variant, the ANN–U-Net hybrid surrogate model, are both capable of accurately predicting the temperature field of large datacenter rooms under operating conditions not encountered during training. Compared to the conventional ANN surrogate model, the ANN–CNN hybrid surrogate model achieved a reduction of 87.44% in MAE and 91.57% in MAPE, while its R2 increased by 210.37%. The conventional ANN surrogate model exhibits significant errors in this task, with an MAE greater than 4 °C, making it difficult to provide effective predictions. Compared to the ANN–U-Net hybrid surrogate model, the ANN–CNN hybrid surrogate model achieved a reduction of 9.39% in MAE and 10.09% in MAPE, while its R2 increased by 1.94%. These results show that convolution-based decoders, i.e., CNN or U-Net, allow surrogate models to progressively capture more abstract spatial features, leading to superior accuracy in predicting airflow distributions under varying working conditions.
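For reference, the three metrics quoted above follow their standard definitions. The snippet below uses toy arrays purely to illustrate the computation; the values are not results from this study:

```python
import numpy as np

def mae(y, yhat):
    # Mean absolute error, in the same unit as the field (here °C).
    return np.mean(np.abs(y - yhat))

def mape(y, yhat):
    # Mean absolute percentage error; assumes y is bounded away from zero.
    return np.mean(np.abs((y - yhat) / y)) * 100.0

def r2(y, yhat):
    # Coefficient of determination: 1 minus residual over total variance.
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy flattened temperature fields (°C); illustrative values only.
y_true = np.array([24.0, 26.5, 31.0, 28.2, 25.4])
y_pred = np.array([24.3, 26.1, 30.5, 28.6, 25.2])
print(round(mae(y_true, y_pred), 2))   # 0.36
```

In practice these metrics would be accumulated over every voxel of every test-set temperature field, so a low MAE together with a high R2 indicates both small pointwise errors and a faithful reproduction of the field's overall variation.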
This study addresses the challenge of rapidly predicting the temperature field of large datacenter rooms under varying working conditions, providing a powerful tool for real-time, efficient thermal management of large datacenters. However, there are still areas for improvement. For example, the model occasionally struggles in local regions where airflow patterns are highly complex; incorporating physical constraints into the model might enhance its accuracy and reliability. In addition, the proposed method currently predicts temperature fields under a fixed spatial configuration. Further studies are necessary to extend this work to diverse spatial conditions, achieving a more robust generalization capability that encompasses both generalization across operating conditions and generalization across spatial dimensions.