3.1. Overall System Architecture Design
The data center thermal management system based on distributed fiber optic temperature sensing and model predictive control is designed to achieve high-precision perception of thermal fields, accurate prediction of thermal loads, and optimal regulation of cooling equipment. The overall system architecture follows a closed-loop cyber–physical control paradigm of perception–modeling–decision–execution, as illustrated in
Figure 2. This architecture is vertically organized into four functional layers.
The bottom layer is the physical equipment layer, which consists of server racks, precision air conditioners (CRAC units), cooling towers, chilled water pumps, and distributed fiber optic sensors. These components form the physical infrastructure that generates heat loads, executes cooling actions, and provides continuous temperature measurements. The second layer is the data acquisition layer, which is responsible for temperature signal demodulation, multi-source data fusion and aggregation, and communication protocol conversion to ensure reliable and low-latency data transmission. The third layer is the intelligent computing layer, which hosts the training and inference tasks of the hybrid thermal prediction model as well as the optimization solving tasks of the model predictive controller. The top layer is the human–machine interface layer, providing visualization dashboards, alarm management services, and operation and maintenance decision support for operators.
A central concept in the proposed framework is thermal symmetry, which characterizes the spatial uniformity of temperature distribution across the data center. To quantify thermal symmetry, this study introduces the Thermal Symmetry Index (TSI), mathematically defined as:

$$\mathrm{TSI} = \frac{\sigma_T}{\bar{T}}$$

where $\sigma_T$ denotes the spatial standard deviation of temperature measurements across all sensing points (°C), and $\bar{T}$ represents the spatial mean temperature (°C). A lower TSI value indicates better thermal uniformity, with TSI = 0 representing a perfectly uniform thermal field. Based on industry practice and ASHRAE guidelines [29], TSI values can be interpreted as follows: TSI < 0.03 indicates excellent thermal symmetry with minimal hotspot risk; 0.03 ≤ TSI < 0.05 represents good thermal symmetry acceptable for normal operation; 0.05 ≤ TSI < 0.08 indicates moderate asymmetry requiring attention; and TSI ≥ 0.08 suggests poor thermal symmetry with significant hotspot risk requiring immediate intervention. The proposed MPC controller explicitly incorporates TSI minimization as an optimization objective to achieve spatially balanced thermal regulation.
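For illustration, the index and its interpretation bands can be sketched in a few lines, assuming the TSI is the coefficient of variation of the sensed temperatures (spatial standard deviation divided by spatial mean, matching the definitions above):

```python
import statistics

def thermal_symmetry_index(temps_c):
    """TSI = spatial standard deviation / spatial mean of all sensing points."""
    mean_t = statistics.fmean(temps_c)
    sigma_t = statistics.pstdev(temps_c)  # population std over the sensing points
    return sigma_t / mean_t

def classify_tsi(tsi):
    """Map a TSI value to the qualitative bands described above."""
    if tsi < 0.03:
        return "excellent"
    if tsi < 0.05:
        return "good"
    if tsi < 0.08:
        return "moderate asymmetry"
    return "poor"

# A perfectly uniform thermal field yields TSI = 0.
assert thermal_symmetry_index([24.0, 24.0, 24.0]) == 0.0
```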
The hardware platform adopts an edge–cloud collaborative deployment mode. High-performance embedded computing devices are deployed at the edge side to execute temperature data preprocessing and fast control response tasks with stringent real-time requirements, while GPU server clusters are deployed at the cloud side to support offline training of deep learning models and large-scale historical data storage and analytics. Tanasiev et al. [
30] proposed an IoT-enhanced monitoring and control solution for HVAC systems that integrates heterogeneous devices through MQTT protocols and RESTful APIs, enabling real-time perception and remote management of equipment status via intelligent sensor nodes and edge computing applications. Inspired by this design philosophy, the proposed system constructs a hierarchical data acquisition and communication network for large-scale thermal sensing. Serale et al. [
31] further investigated IoT system architectures for MPC-based control and highlighted that well-designed communication topology and data synchronization mechanisms are essential for guaranteeing the real-time performance and stability of predictive control systems. The main design parameters of the proposed system are summarized in
Table 3.
The temperature upper limit threshold of 27 °C is determined based on ASHRAE TC 9.9 guidelines for data center thermal management [
29], which recommend that server inlet temperatures be maintained within 18–27 °C for Class A1 data centers to ensure reliable IT equipment operation. This threshold provides a safety margin below the critical temperature of 32 °C, above which server throttling or emergency shutdown may occur. The selection balances thermal safety requirements with energy efficiency considerations, as operating closer to the upper limit reduces overcooling and associated energy waste.
3.2. Distributed Fiber Optic Temperature Sensing Subsystem Design
The distributed fiber optic temperature sensing subsystem constitutes the perceptual foundation of the proposed thermal symmetry management system. Its primary function is to acquire high-density spatiotemporal distribution information of the temperature field within the data center machine room, thereby enabling continuous and fine-grained observation of thermal dynamics. In contrast to traditional point-type temperature sensors, whose deployment density is constrained by wiring complexity and installation cost and thus provides only sparse discrete sampling, distributed fiber optic sensing enables continuous temperature profiling along the entire fiber path, making it possible to capture spatial temperature gradients and evolving hotspot structures under complex airflow environments.
Ashry et al. [
32] systematically reviewed the deployment of fiber-optic distributed sensing technologies in the oil and gas industry, covering Rayleigh-based distributed acoustic sensing (DAS), Raman-based distributed temperature sensing (DTS), and Brillouin-based distributed temperature and strain sensing (DTSS). Their survey highlights that these sensing systems provide continuous real-time measurements along the full length of optical fiber cables and are particularly suitable for long-distance, large-scale monitoring applications. Lu et al. [
33] further presented a comprehensive review of distributed optical fiber sensors based on Rayleigh, Brillouin, and Raman scattering mechanisms, emphasizing their extensive applications in energy infrastructure monitoring, power generation systems, and pipeline inspection. Their study demonstrates the long-term stability, robustness, and reliability of distributed sensing technologies under complex operating conditions, together with diverse trade-offs in spatial resolution, sensing range, and temperature accuracy.
By leveraging these technical advantages, the proposed sensing subsystem establishes a high-resolution thermal perception layer for data centers, enabling continuous observation of three-dimensional thermal fields and providing a reliable data foundation for hybrid thermal modeling and symmetry-aware predictive control.
The proposed system adopts a distributed fiber optic temperature sensing scheme based on stimulated Brillouin scattering. When pulsed light propagates along an optical fiber, photons interact inelastically with acoustic phonons in the fiber medium, generating Brillouin backscattered light with a frequency shift. The Brillouin frequency shift exhibits a linear dependence on the local temperature of the fiber, which can be expressed as [18]:

$$\nu_B(T) = \nu_B(T_0) + C_T \,(T - T_0)$$

where $\nu_B(T)$ denotes the Brillouin frequency shift at temperature $T$ (in GHz), $\nu_B(T_0)$ is the Brillouin frequency shift at the reference temperature $T_0$, and $C_T$ represents the temperature sensitivity coefficient, which is typically approximately 1.1 MHz/°C for standard single-mode optical fibers. Here, $T$ denotes the measured temperature (in °C), and $T_0$ is the reference temperature, commonly set to 25 °C.
Through spectral analysis and time-domain localization of the backscattered optical signals, both temperature values and spatial position information along the entire fiber can be obtained simultaneously. The spatial resolution of the sensing system is determined by the width of the probing optical pulses, while the temperature resolution depends primarily on the accuracy of spectral demodulation.
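Inverting the linear shift model above recovers temperature from a demodulated Brillouin frequency; a minimal sketch follows, where the reference shift of 10.85 GHz is an assumed typical value for standard single-mode fiber at 1550 nm, not a figure from this study:

```python
def brillouin_temperature(nu_b_ghz, nu_b0_ghz=10.85, c_t_mhz_per_c=1.1, t0_c=25.0):
    """Recover temperature from the linear model nu_B(T) = nu_B(T0) + C_T*(T - T0).

    nu_b_ghz      : measured Brillouin frequency shift (GHz)
    nu_b0_ghz     : shift at the reference temperature T0 (GHz; assumed value)
    c_t_mhz_per_c : temperature sensitivity coefficient (MHz/°C)
    t0_c          : reference temperature (°C)
    """
    delta_mhz = (nu_b_ghz - nu_b0_ghz) * 1000.0  # GHz -> MHz
    return t0_c + delta_mhz / c_t_mhz_per_c

# A +11 MHz shift above the reference corresponds to +10 °C at 1.1 MHz/°C.
assert abs(brillouin_temperature(10.861) - 35.0) < 1e-6
```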
Barrias et al. [
34] reviewed the application status of distributed optical fiber sensors in civil engineering, demonstrating that although fiber Bragg grating (FBG) sensors offer high measurement accuracy, they essentially belong to quasi-distributed sensing schemes, in which the number of measurement points is limited by the number of gratings deployed along the fiber. In contrast, truly distributed sensing technologies based on Rayleigh, Brillouin, or Raman scattering provide continuous temperature measurements along the entire fiber length, enabling dense spatial sampling of large-scale infrastructures.
Bense et al. [
35] extensively reviewed the application of distributed temperature sensing (DTS) as a downhole monitoring tool in hydrogeology, demonstrating both passive and active DTS modes for a wide range of monitoring scenarios. Their work verifies the long-term stability, robustness, and environmental adaptability of distributed sensing systems in complex operating environments, providing valuable references for the technology selection and deployment strategies of temperature sensing systems in large-scale data center infrastructures.
The detailed specifications of the distributed fiber optic temperature sensing system are summarized in
Table 4. The sensing system employs Brillouin optical time-domain analysis (BOTDA) technology with a spatial resolution of 0.5 m and temperature accuracy of ±0.1 °C. The measurement uncertainty analysis follows the GUM (Guide to the Expression of Uncertainty in Measurement) framework, with Type A uncertainty evaluated from repeated measurements and Type B uncertainty estimated from instrument specifications and calibration certificates.
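The GUM-style combination of Type A and Type B contributions can be sketched as follows, assuming the Type B components are already expressed as standard uncertainties (the readings and component values are illustrative, not the system's calibration data):

```python
import math

def combined_standard_uncertainty(type_a_readings, type_b_components):
    """Combine a Type A uncertainty (std of the mean of repeated readings)
    with Type B standard uncertainties via root-sum-of-squares."""
    n = len(type_a_readings)
    mean = sum(type_a_readings) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in type_a_readings) / (n - 1))
    u_a = s / math.sqrt(n)  # experimental standard deviation of the mean
    return math.sqrt(u_a ** 2 + sum(u ** 2 for u in type_b_components))

# With identical readings, only the Type B components contribute.
assert abs(combined_standard_uncertainty([24.0] * 5, [0.03, 0.04]) - 0.05) < 1e-9
```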
The spatial layout of the sensing fibers directly determines the observability, integrity, and reconstruction accuracy of the three-dimensional thermal field. To achieve uniform coverage while preserving fine-grained resolution in thermally critical regions, the proposed system adopts a hybrid deployment strategy that combines serpentine routing with hotspot-aware densification, as illustrated in
Figure 3. The main trunk cable is arranged in a serpentine pattern along the upper spaces of both cold and hot aisles, ensuring continuous coverage across all rack rows. In key heat-exchange locations, including precision air-conditioner outlets, rack inlet faces, and hot-aisle return regions, local sampling density is increased through fiber coiling and localized routing, enabling high-resolution observation of thermal gradients and transient hotspots. The total fiber length of 1800 m is deployed across six rack rows, with approximately 300 m allocated to each row. The fiber is secured using cable ties and mounting clips at intervals of 1.0 m to prevent displacement and vibration-induced measurement noise. The detailed layout parameters and regional sampling strategies are summarized in
Table 5.
The fiber optic sensing system incorporates several safety features to ensure reliable operation in the data center environment. The sensing fiber is enclosed in a flame-retardant low-smoke zero-halogen (LSZH) jacket that meets IEC 60332-1 fire safety standards, preventing fire propagation and toxic gas emission. The cable routing avoids direct contact with high-temperature surfaces (>60 °C) and maintains a minimum clearance of 50 mm from power cables to minimize electromagnetic interference. Rodent protection is provided through the use of armored fiber cables in accessible areas and protective conduits in raised floor sections. The fiber installation does not obstruct airflow paths or impede equipment maintenance access.
To ensure system reliability under partial fiber failure conditions, the proposed framework incorporates a fault detection and data recovery mechanism. The sensing system continuously monitors the optical power level and Brillouin frequency shift quality along the fiber. When a fiber break or excessive attenuation is detected at a specific location, the system automatically identifies the affected measurement points and activates interpolation-based data recovery using neighboring healthy sensing points. For critical monitoring zones, redundant fiber loops are deployed to provide backup sensing capability. The MPC controller is designed to maintain stable operation with up to 15% of sensing points unavailable, utilizing a robust state estimation algorithm that weights available measurements according to their spatial proximity to the missing points. In the event of extensive fiber failure exceeding this threshold, the system automatically switches to a conservative control mode with increased safety margins until repair is completed.
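A simplified sketch of the interpolation-based recovery step is given below; the actual system uses proximity-weighted robust state estimation, whereas this hypothetical helper only linearly interpolates between the nearest healthy neighbors and enforces the 15% availability threshold from the text:

```python
def recover_missing(profile, healthy_mask):
    """Fill unavailable sensing points from neighboring healthy ones.

    profile      : temperature readings along the fiber (°C); failed entries ignored
    healthy_mask : booleans, True where the reading is trusted
    """
    healthy = [i for i, ok in enumerate(healthy_mask) if ok]
    if len(healthy) / len(profile) < 0.85:
        raise RuntimeError("over 15% of points unavailable: switch to conservative mode")
    out = list(profile)
    for i, ok in enumerate(healthy_mask):
        if ok:
            continue
        left = max((j for j in healthy if j < i), default=None)
        right = min((j for j in healthy if j > i), default=None)
        if left is None:
            out[i] = profile[right]   # no left neighbor: copy nearest right
        elif right is None:
            out[i] = profile[left]    # no right neighbor: copy nearest left
        else:                         # interpolate between nearest healthy neighbors
            w = (i - left) / (right - left)
            out[i] = (1 - w) * profile[left] + w * profile[right]
    return out

# A failed point is recovered as the midpoint of its two healthy neighbors.
assert recover_missing([24.0, 24.0, 99.0] + [26.0] * 7,
                       [True, True, False] + [True] * 7)[2] == 25.0
```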
3.3. Hybrid Thermal Prediction Model Design
Accurate temperature prediction constitutes the fundamental prerequisite for the implementation of model predictive control in data center thermal management. Purely physics-based models exhibit strong interpretability and extrapolation capability; however, their formulation is often complex and computationally intensive, which limits their suitability for real-time control. In contrast, purely data-driven models are easy to train and efficient in inference, but their generalization ability is inherently constrained by the distribution range of training data and may degrade under unseen operating conditions.
To address these limitations, the proposed system develops a hybrid thermal prediction model that integrates thermodynamic physical equations with deep neural networks, achieving a balance between physical consistency, predictive accuracy, and computational efficiency. From the perspective of cloud computing energy efficiency optimization, Buyya et al. [
36] analyzed the application potential of data-driven methods in data center management and emphasized that hybrid modeling strategies can effectively overcome the intrinsic limitations of single-paradigm approaches. Buyya et al. [
37] further provided a comprehensive review of energy-efficiency innovations and next-generation cloud computing technologies, highlighting that the integration of physical models with data-driven learning has become a key methodological trend for intelligent and sustainable data center operation.
Motivated by these insights, the proposed hybrid prediction framework is designed to leverage the structural prior and extrapolation capability of thermodynamic models while exploiting the nonlinear representation power of deep neural networks for complex thermal dynamics, thereby providing a robust and scalable foundation for symmetry-aware predictive control.
The overall architecture of the hybrid thermal prediction model is illustrated in
Figure 4. The model is composed of three tightly coupled components: a physical constraint layer, a feature extraction layer, and a prediction output layer. The physical constraint layer establishes macroscopic thermal balance equations for the data center machine room based on the principle of energy conservation, providing physically interpretable structural priors for the learning model.
Under steady-state operating conditions, the thermal balance of the machine room can be expressed as [38]:

$$Q_{\mathrm{IT}} + Q_{\mathrm{env}} = Q_{\mathrm{cool}} + Q_{\mathrm{loss}}$$

where $Q_{\mathrm{IT}}$ denotes the total heat generation power of IT equipment (kW), $Q_{\mathrm{env}}$ represents the heat gain introduced by envelope heat transfer and infiltration air (kW), $Q_{\mathrm{cool}}$ denotes the effective cooling capacity of the cooling system (kW), and $Q_{\mathrm{loss}}$ represents other heat dissipation losses (kW).

For dynamic operating conditions, considering the thermal storage effect of machine room air and equipment, the thermal balance equation can be extended as [38]:

$$\rho V c_p \frac{\mathrm{d}T}{\mathrm{d}t} = Q_{\mathrm{IT}} + Q_{\mathrm{env}} - Q_{\mathrm{cool}} - Q_{\mathrm{loss}}$$

where $\rho$ denotes the air density (kg/m³), $V$ is the effective machine room volume (m³), $c_p$ is the specific heat capacity of air at constant pressure (kJ/(kg·°C)), $T$ is the average machine room temperature (°C), and $t$ denotes time (s).
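The dynamic balance can be integrated numerically to propagate the average room temperature; a minimal forward-Euler sketch follows, with all default parameter values (air density, room volume, specific heat) chosen for illustration only:

```python
def step_room_temperature(t_c, q_it_kw, q_env_kw, q_cool_kw, q_loss_kw,
                          dt_s=10.0, rho=1.2, volume_m3=1500.0, cp_kj=1.005):
    """One forward-Euler step of the lumped balance
    rho*V*cp * dT/dt = Q_IT + Q_env - Q_cool - Q_loss  (kW == kJ/s)."""
    net_kw = q_it_kw + q_env_kw - q_cool_kw - q_loss_kw
    thermal_capacity_kj_per_c = rho * volume_m3 * cp_kj
    return t_c + net_kw * dt_s / thermal_capacity_kj_per_c

# When cooling exactly matches the total heat gain, the temperature holds steady.
assert step_room_temperature(24.0, 300.0, 20.0, 310.0, 10.0) == 24.0
```

With a positive net heat load the room warms, and with surplus cooling it cools, which is the basic dynamic the MPC prediction model must capture.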
The formulation of the physical constraint layer is inspired by the fast fluid dynamics (FFD) modeling paradigm proposed by Han et al. [
38], who developed a data center thermal simulation model based on open-source fast fluid dynamics solvers. Their improved upwind scheme enables the coupled solution of advection and diffusion equations, achieving a favorable trade-off between computational efficiency and numerical accuracy. Compared with conventional CFD solvers requiring 464.8 h of computation time, the FFD model reduces simulation time to 7.6 h while preserving sufficient accuracy, and can achieve annual energy savings of 53.4–58.8% through optimal thermal design and operation.
In parallel, Athavale et al. [
21] systematically compared multiple data-driven thermal modeling approaches, including artificial neural networks, support vector regression, and Gaussian process regression for data center temperature prediction. Their experimental results indicate that Gaussian process regression achieves the best average prediction error of 0.56 °C, providing a strong benchmark for validating the predictive accuracy of learning-based thermal models.
Motivated by these studies, the physical constraint layer in the proposed hybrid model encodes macroscopic thermodynamic principles into the learning framework, enabling the deep neural network to respect energy conservation laws while learning complex nonlinear thermal dynamics from data. This hybrid modeling strategy improves prediction robustness under dynamic workloads and unseen operating conditions, and establishes a physically consistent foundation for symmetry-aware model predictive control.
From a methodological standpoint, recent progress in deep learning-based signal modeling, time–frequency analysis, and optimization-inspired neural networks has provided powerful tools for constructing physically consistent and interpretable prediction models. A series of studies have demonstrated that combining signal processing theory, deep temporal networks, and optimization-driven learning architectures can significantly enhance prediction accuracy, stability, and interpretability in complex dynamic systems [
39,
40,
41,
42,
43,
44]. These advances offer important methodological support for the proposed hybrid physical–AI thermal prediction framework.
The feature extraction layer adopts a cascaded architecture composed of a Temporal Convolutional Network (TCN) and a Bidirectional Gated Recurrent Unit (BiGRU) to capture multi-scale temporal dependencies and long-range correlations in temperature sequences. The TCN module employs causal convolution and dilated convolution to achieve an exponentially expanding receptive field with limited network depth, enabling efficient modeling of long-term thermal evolution patterns.
The convolutional output of the TCN can be formulated as [28]:

$$h_t^{(l)} = \sigma\!\left(W^{(l)} \ast h^{(l-1)}_{t-(k-1)d:t} + b^{(l)}\right)$$

where $h_t^{(l)}$ denotes the hidden-state output of the $l$-th layer at time $t$, $W^{(l)}$ is the convolution kernel weight matrix of layer $l$, $h^{(l-1)}_{t-(k-1)d:t}$ represents the hidden-state sequence of layer $l-1$ from time $t-(k-1)d$ to $t$, $d$ denotes the dilation factor, $b^{(l)}$ is the bias vector, and $\sigma(\cdot)$ denotes the nonlinear activation function.
On top of the TCN encoder, a BiGRU module is introduced to further enhance sequential representation capability by modeling bidirectional temporal dependencies. The BiGRU propagates information in both forward and backward directions, enabling the network to capture both historical thermal inertia and future trend consistency from the learned latent features. This cascaded TCN–BiGRU architecture effectively alleviates the vanishing gradient problem and achieves faster convergence compared with conventional LSTM-based recurrent networks.
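The exponential receptive-field growth and the causal structure described above can be illustrated with a minimal pure-Python sketch (kernel size, layer count, and weights are illustrative, not the configuration used in the study):

```python
def receptive_field(kernel_size, num_layers):
    """Receptive field of stacked dilated causal convolutions with
    dilation doubling per layer (d = 1, 2, 4, ...)."""
    rf = 1
    for layer in range(num_layers):
        rf += (kernel_size - 1) * (2 ** layer)
    return rf

def causal_dilated_conv(x, weights, dilation, bias=0.0):
    """1-D causal dilated convolution: output at t sees only x[t], x[t-d], ..."""
    out = []
    for t in range(len(x)):
        acc = bias
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:  # positions before the sequence start are zero-padded
                acc += w * x[j]
        out.append(acc)
    return out

# Four layers of kernel-3 convolutions already cover 31 time steps.
assert receptive_field(kernel_size=3, num_layers=4) == 31
```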
The effectiveness of the TCN–BiGRU architecture for data center thermal prediction has been experimentally validated by Lin et al. [
28], who demonstrated its superior accuracy and training efficiency in multi-objective thermal optimization scenarios.
To prevent overfitting and enhance model generalization, several regularization techniques are incorporated into the training process. Dropout regularization with a rate of 0.3 is applied after each TCN residual block and BiGRU layer to prevent co-adaptation of neurons. L2 weight regularization is applied to all trainable parameters to constrain model complexity. The dataset is split into training (70%), validation (15%), and testing (15%) subsets, with the validation set used for hyperparameter tuning and early stopping. Early stopping with a patience of 20 epochs monitors the validation loss and terminates training when no improvement is observed, preventing overfitting to the training data. Additionally, the physical constraint layer serves as an implicit regularizer by enforcing thermodynamic consistency, which restricts the solution space to physically plausible predictions and improves generalization to unseen operating conditions. Data augmentation through Gaussian noise injection (σ = 0.05 °C) is applied during training to improve robustness against sensor noise.
The model’s generalization capability across different operating conditions is ensured through several design choices. The input features include normalized environmental variables (outdoor temperature, humidity) that capture seasonal variations, allowing the model to adapt to different ambient conditions. The physical constraint layer provides structural priors that remain valid across different data center configurations, reducing the need for extensive retraining when deploying to new facilities. For adaptation to significantly different data center layouts or cooling system configurations, transfer learning can be employed by freezing the physical constraint layer and fine-tuning only the deep learning components with limited local data (typically 3–7 days of operation). Cross-validation experiments across different load profiles demonstrated that the hybrid model maintains prediction RMSE below 0.5 °C for load variations within ±30% of the training distribution.
In the output stage, the learned deep features are fused with physical constraint priors through residual connections, enabling the model to generate multi-step temperature forecasts while respecting thermodynamic consistency. For input feature construction, domain knowledge from data center cooling systems is incorporated. Yu et al. [
45] systematically reviewed passive and active cooling strategies for data centers, providing guidance for selecting airflow, heat exchange, and equipment operation variables as thermal drivers. Perez-Lombard et al. [
46] further analyzed global building energy consumption patterns and identified HVAC systems as major energy consumers, accounting for approximately 50% of total building energy usage. These insights motivate the inclusion of HVAC-related operational variables as key explanatory features in the hybrid prediction model.
In a broader perspective of intelligent sensing systems, recent advances in high-throughput perception, deep learning-based recognition, and real-time intelligent decision-making have demonstrated the feasibility of constructing end-to-end closed-loop systems from sensors to insights. Representative studies have shown that modern industrial intelligence platforms increasingly rely on large-scale sensing, deep neural perception, and edge–cloud collaborative computing to support real-time control and optimization [
47,
48,
49,
50,
51,
52]. These developments further validate the technical paradigm adopted in this work, namely high-density sensing, intelligent modeling, and closed-loop optimization for complex industrial infrastructures.
The optimization objective of the MPC controller is formulated to minimize the total energy consumption of the cooling system over the prediction horizon while enforcing smooth control actions to avoid frequent equipment adjustments and mechanical wear. The objective function is defined as [31]:

$$J = \sum_{k=1}^{N_p} \left[ \alpha\, P_{\mathrm{cool}}(k) + \beta \left\| T(k) - T_{\mathrm{ref}} \right\|^2 \right] + \sum_{k=0}^{N_c - 1} \gamma \left\| \Delta u(k) \right\|^2$$

where $J$ denotes the objective function value, $N_p$ is the prediction horizon length, $N_c$ is the control horizon length, $P_{\mathrm{cool}}(k)$ represents the cooling system power at step $k$ (kW), $T(k)$ denotes the predicted temperature vector at step $k$, $T_{\mathrm{ref}}$ is the reference temperature setpoint, and $\Delta u(k)$ denotes the control input increment at step $k$. The weighting coefficients $\alpha$, $\beta$, and $\gamma$ balance the trade-off among energy efficiency, thermal safety, and control smoothness.
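For illustration, the objective can be evaluated for a candidate control sequence as follows; the weight values and the summation shape are a hedged sketch of the energy, tracking, and smoothness terms described above, not the tuned coefficients used in the study:

```python
def mpc_cost(p_cool, t_pred, t_ref, du, alpha=1.0, beta=10.0, gamma=0.1):
    """Evaluate J = sum_k [alpha*P_cool(k) + beta*||T(k)-T_ref||^2]
                  + sum_k gamma*||du(k)||^2 for a candidate sequence.

    p_cool : cooling power per prediction step (kW)
    t_pred : predicted temperature vector per prediction step (°C)
    t_ref  : reference setpoint (°C)
    du     : control increment vector per control step
    """
    tracking = sum(alpha * p + beta * sum((ti - t_ref) ** 2 for ti in t)
                   for p, t in zip(p_cool, t_pred))
    smoothness = sum(gamma * sum(d ** 2 for d in step) for step in du)
    return tracking + smoothness

# Perfect tracking with zero control moves costs only the energy term.
assert mpc_cost([100.0, 100.0], [[24.0], [24.0]], 24.0, [[0.0]]) == 200.0
```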
The total cooling system power is decomposed into three major components: precision air conditioners, chilled water pumps, and cooling tower fans. The corresponding power model is given by [53]:

$$P_{\mathrm{cool}} = P_{\mathrm{CRAC}} + P_{\mathrm{pump}} + P_{\mathrm{fan}}$$

where $P_{\mathrm{CRAC}}$ denotes the compressor power of precision air-conditioning units (kW), $P_{\mathrm{pump}}$ represents the aggregated power of chilled water pumps and cooling water pumps (kW), and $P_{\mathrm{fan}}$ denotes the cooling tower fan power (kW).
The formulation of the energy consumption model and airflow-related control variables is supported by experimental and field studies. Cho et al. [
53] performed measurements and predictive analysis of air distribution systems in high-compute-density data centers, revealing that reasonable airflow organization and cooling system configuration can significantly improve thermal management efficiency and reduce energy consumption. Lazic et al. [
54] from Google further demonstrated the practical effectiveness of model predictive control in large-scale production data centers, achieving substantial energy savings through real-world deployments. Their results provide strong empirical evidence for the effectiveness and engineering feasibility of MPC-based cooling optimization.
The constraint set of the MPC controller consists of two categories: temperature safety constraints and physical constraints of cooling equipment. The temperature safety constraints ensure that the predicted rack inlet temperature remains below a predefined upper bound to prevent server frequency throttling or emergency shutdown caused by overheating. The safety constraint is formulated as:

$$T_{\mathrm{in},i}(k) \le T_{\max}, \quad i = 1, 2, \ldots, N_r$$

where $T_{\mathrm{in},i}(k)$ denotes the predicted inlet temperature of rack $i$ at step $k$, $T_{\max}$ represents the upper safety threshold of inlet temperature, and $N_r$ denotes the total number of racks.
In addition to thermal safety, physical constraints of cooling equipment are imposed to ensure reliable and safe operation. These constraints limit both the admissible range and the rate of change of control variables, which are expressed as:

$$u_{\min} \le u(k) \le u_{\max}$$

and

$$\Delta u_{\min} \le \Delta u(k) \le \Delta u_{\max}$$

where $u(k)$ denotes the control input vector at step $k$, including the supply air temperature setpoint, airflow rate setting, and chilled water valve opening. The vectors $u_{\min}$ and $u_{\max}$ define the lower and upper bounds of the control variables, respectively, while $\Delta u_{\min}$ and $\Delta u_{\max}$ specify the allowable range of control input increments.
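In the actual controller these box and rate constraints are enforced inside the QP itself; as a simplified illustration of how they jointly restrict each control move, a hypothetical element-wise projection can be sketched (all bound values illustrative):

```python
def clamp_control(u_prev, u_desired, u_min, u_max, du_min, du_max):
    """Project a desired control move onto u_min <= u <= u_max and
    du_min <= u - u_prev <= du_max, element-wise."""
    out = []
    for up, ud, lo, hi, dlo, dhi in zip(u_prev, u_desired, u_min, u_max, du_min, du_max):
        u = min(max(ud, up + dlo), up + dhi)  # rate-of-change limit first
        u = min(max(u, lo), hi)               # then the absolute bounds
        out.append(u)
    return out

# A large requested jump in a setpoint is limited to the allowed increment.
assert clamp_control([18.0], [25.0], [16.0], [24.0], [-1.0], [1.0]) == [19.0]
```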
To ensure real-time performance under dynamic workloads, the MPC optimization problem is formulated as a quadratic programming (QP) problem and solved using the OSQP (Operator Splitting Quadratic Program) solver, which is specifically designed for embedded and real-time applications. The computational cost scales cubically with the problem size, on the order of $O((m+n)^3)$, where $m$ is the number of control variables and $n$ is the state dimension. With the configured prediction horizon of 30 min (180 sampling points at 10 s intervals) and control horizon of 5 min (30 control actions), the average solver computation time is 127 ms with a maximum of 312 ms on the edge computing device (Advantech Co., Ltd., Taipei, Taiwan), equipped with an Intel Core i7-10700 processor (Intel Corporation, Santa Clara, CA, USA) and 32 GB DDR4 RAM, which is well within the 5 min control update period.
System stability under model uncertainty and sensor noise is ensured through several mechanisms. First, the rolling-horizon strategy inherently provides feedback correction, as the optimization is re-executed at each control cycle using updated state measurements, compensating for prediction errors. Second, constraint tightening is employed by setting the effective temperature threshold 0.5 °C below the actual safety limit (26.5 °C instead of 27 °C), providing a safety margin against prediction uncertainty. Third, the control increment constraints limit the rate of change of control actions, preventing aggressive responses to noisy measurements and ensuring smooth transitions. Fourth, a Kalman filter-based state estimator processes the raw distributed fiber optic measurements to reduce sensor noise before feeding into the MPC controller, with the filter covariance matrices tuned based on the sensor uncertainty analysis in
Table 4. These combined mechanisms ensure robust and stable control performance under practical operating conditions with measurement noise standard deviation up to 0.3 °C and model prediction error up to 0.5 °C.
By explicitly embedding both thermal safety constraints and actuator physical limitations into the rolling-horizon optimization problem, the proposed MPC controller guarantees feasible and stable control actions under dynamic workloads and varying environmental conditions, thereby ensuring reliable and energy-efficient thermal regulation of the data center.
The solution process of the MPC controller is illustrated in
Figure 5. At each control cycle, the controller acquires real-time temperature measurements from the distributed fiber optic sensing system and collects operating status information of cooling equipment. The hybrid thermal prediction model is then invoked to generate multi-step temperature trajectories over the prediction horizon. Based on the predicted thermal evolution, the control problem is formulated as a constrained quadratic programming (QP) problem and solved to obtain the optimal control sequence. Finally, only the first control action in the sequence is applied to the actuators, and the entire optimization procedure is repeated at the next sampling instant following the rolling-horizon strategy.
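The receding-horizon procedure described above can be sketched as a generic loop; the four function hooks are hypothetical placeholders for the sensing, prediction, QP-solving, and actuation stages, and only the first move of each solved sequence is applied:

```python
def rolling_horizon_loop(measure, predict, solve_qp, apply, n_cycles):
    """Generic receding-horizon loop: re-measure, re-predict, and re-solve
    every cycle, applying only the first action of each optimal sequence."""
    applied = []
    for _ in range(n_cycles):
        state = measure()                    # fiber-optic temperatures + equipment status
        trajectory = predict(state)          # multi-step thermal forecast
        u_sequence = solve_qp(state, trajectory)  # constrained QP solution
        apply(u_sequence[0])                 # first action only; rest discarded
        applied.append(u_sequence[0])
    return applied

# With toy stand-in hooks, exactly one action per cycle is applied.
actions = rolling_horizon_loop(lambda: 24.0,
                               lambda s: [s],
                               lambda s, tr: [s + 1.0, s + 2.0],
                               lambda u: None,
                               3)
assert actions == [25.0, 25.0, 25.0]
```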
Serale et al. [
31] provided a comprehensive review of model predictive control for enhancing the energy efficiency of buildings and HVAC systems, covering problem formulation, practical applications, and future opportunities. Their work highlights the importance of real-time optimization, reliable communication architectures, and human–machine interaction interfaces in large-scale energy systems, which offers valuable guidance for the control system implementation and operational interface design of the proposed data center thermal management platform.
By embedding real-time feedback and rolling optimization mechanisms into the control loop, the proposed MPC framework can effectively compensate for load disturbances, modeling uncertainties, and environmental fluctuations. This closed-loop predictive control architecture ensures safe, stable, and energy-efficient thermal regulation of the data center under dynamically varying operating conditions.