A Privacy-Preserving Multi-Time-Scale Tie-Line Power Smoothing Method for Multiple Data Centers

Luo, Quanyong; Yu, Jiexiao; Feng, Xiangwei

doi:10.3390/en19112708


by Quanyong Luo ¹, Jiexiao Yu ¹,* and Xiangwei Feng ²

¹ School of Electric and Information Engineering, Tianjin University, Nankai District, Tianjin 300072, China

² School of Electrical Engineering, Shenyang University of Technology, Shenyang 110870, China

* Author to whom correspondence should be addressed.

Energies 2026, 19(11), 2708; https://doi.org/10.3390/en19112708

Submission received: 28 April 2026 / Revised: 1 June 2026 / Accepted: 3 June 2026 / Published: 4 June 2026

(This article belongs to the Special Issue Artificial Intelligence and Machine Learning Applications in Smart Energy Systems—2nd Edition)


Abstract

As renewable penetration in data-center power supply increases, stochastic renewable output can cause tie-line power fluctuations between data centers (DCs) and the utility grid. This paper proposes a privacy-preserving multi-time-scale tie-line power smoothing method for multiple DCs. First, a two-stage first-order low-pass filter decomposes tie-line fluctuations into high- and low-frequency regulation targets. Server task shifting tracks the high-frequency target, while uninterruptible power supply (UPS) regulation compensates the low-frequency residual under practical energy and power constraints. Second, a federated adaptive proximal policy optimization (Fed-AdaPPO) framework is developed. Proximal policy optimization (PPO) provides stable policy optimization in the continuous action space, and upper confidence bound (UCB)-guided adaptive exploration improves task-shifting exploration. Critically, only Critic gradients are aggregated across DCs; Actor networks, raw workload data, and user-sensitive information remain local. This design reduces the risk of exposing local state-action mappings. Results show that coordinated server-cluster and UPS regulation reduces the standard deviation of tie-line power by at least 33.4% while maintaining service quality and data privacy.

Keywords:

privacy-preserving; data center; renewable energy; UPS energy storage; tie-line power smoothing; federated reinforcement learning; smart grid; multi-time-scale control

1. Introduction

In recent years, the rapid development of generative artificial intelligence, cloud services, and large-scale data analytics has significantly increased the computing demand and electricity consumption of DCs. As critical infrastructures supporting large-model training and inference, DCs are becoming major electricity consumers [1,2,3]. According to the International Energy Agency (IEA), global electricity demand from DCs grew by 17% in 2025, and electricity consumption from DCs is projected to increase from 485 TWh in 2025 to 950 TWh in 2030, accounting for about 3% of global electricity demand by then [4]. To reduce operating costs and carbon emissions, an increasing number of DCs are being supplied partially or predominantly by renewable energy sources such as wind and photovoltaic generation [5,6]. However, the intermittency of renewable generation, together with the randomness of computational workloads, introduces significant fluctuations in DC operating power. These fluctuations propagate to the tie-line power, degrading power quality on the DC side and threatening the operational stability of the upstream grid. Recent studies on load frequency control have further shown that enhancing the reliability and dynamic performance of multi-area power systems is particularly important under component aging and cyber-attack disturbances [7].

To alleviate tie-line power fluctuations, existing studies have explored multiple categories of regulation resources and coordinated control strategies. One important line of research focuses on storage-based compensation, in which batteries, virtual energy storage, and hierarchical storage regulation are used to absorb renewable-energy-induced power surpluses and compensate for power deficits [8]. These methods are effective because of the fast response of storage devices; however, their regulation capability is inherently constrained by energy capacity and battery lifetime, especially when frequent charge/discharge actions are required. Another important line of research exploits load-side flexibility in DCs. In this category, server workloads are regarded as adjustable resources, and demand response is achieved through workload scheduling, workload redistribution, and geographically distributed coordination [9,10]. Such methods take advantage of the fact that delay-tolerant workloads can be shifted in time or location without violating service-level agreements, thereby enabling server clusters to participate in tie-line power regulation as fast-response flexible loads.

Recent studies have further extended tie-line smoothing from single-resource regulation to multi-resource coordination and interconnected multi-DC scheduling. In particular, Yang et al. [11] proposed a multi-DC tie-line power smoothing method based on demand response, in which intra-DC temporal task migration, inter-DC spatial task migration, and UPS-based low-frequency compensation are coordinated under tightly coupled interconnection conditions. This work shows that cross-DC workload migration can enlarge the scheduling space and improve smoothing performance in strongly coordinated multi-DC systems. Shao et al. [12] proposed a carbon-oriented workload spatial-temporal shifting scheduling method for internet DCs, where temporal deferral and spatial redistribution are jointly guided by nodal carbon intensity signals to reduce carbon emissions. This study further demonstrates that spatiotemporal workload flexibility can be exploited not only for power balancing but also for carbon-aware coordinated scheduling. In addition, Yang and Hu [13] developed a robust decentralized approach for multi-area microgrid clusters by coordinating generators and energy storage, showing that distributed multi-resource coordination can effectively suppress inter-area power fluctuations under uncertainty. These studies indicate that multi-resource and multi-time-scale coordination is an effective direction for tie-line power smoothing. Nevertheless, most existing methods still rely on strong coordination assumptions, centralized scheduling, or extensive information exchange, which may be challenging to implement in privacy-constrained multi-entity DC scenarios.

As source–load interactions become increasingly stochastic and high-dimensional, deep reinforcement learning (DRL) has emerged as a promising alternative to model-dependent optimization. Compared with conventional optimization and heuristic control, DRL is better suited to high-dimensional, nonlinear, and uncertain scheduling problems because it can learn adaptive decision policies directly from environment interaction. For example, Siddesha et al. [14] applied DRL to task scheduling in cloud computing, demonstrating its effectiveness in improving scheduling efficiency under dynamic environments. Lou et al. [15] further proposed a DRL-based joint optimization method for task assignment and migration in DCs, improving resource utilization under uncertain task execution conditions. Chen et al. [16] extended DRL to a multi-DC scenario and developed a two-timescale joint optimization framework based on multi-agent DRL, showing the potential of learning-based methods in handling spatiotemporal uncertainty and coupled decision-making. However, most existing DRL-based multi-agent methods still rely on centralized training or extensive information sharing across agents.

Despite the above progress, privacy-preserving collaborative tie-line power smoothing for multiple DCs remains insufficiently addressed. Alinezhadi et al. [17] proposed an intelligent privacy-preserving demand response method for green DCs, showing that reinforcement learning and federated learning can be combined to protect local data privacy while enabling DCs to participate in grid-side peak shaving. In addition, Lin et al. [18] developed a privacy-preserving federated learning mechanism for power systems and showed that local data can remain decentralized while collaborative model training is still achieved, with an explicit trade-off between privacy-preserving level and model accuracy. These studies confirm the feasibility of privacy-preserving collaborative intelligence in energy systems. However, the application of federated reinforcement learning to tie-line power smoothing in multiple DCs is still limited, especially under strong spatiotemporal heterogeneity and without inter-DC raw workload exchange.

To address these challenges, this paper develops a privacy-preserving multi-time-scale tie-line power smoothing method for multiple DCs. The main contributions are summarized as follows:

(1): A multi-time-scale tie-line power smoothing framework is established, in which a two-stage first-order low-pass filter is used to decompose tie-line power fluctuations into high- and low-frequency components. Server-cluster task shifting is used to suppress high-frequency fluctuations on the fast time scale, while UPS regulation is used to compensate for low-frequency residuals on the slow time scale.
(2): A Fed-AdaPPO scheduling framework is proposed for privacy-preserving collaborative server-cluster regulation. The UCB-guided adaptive exploration mechanism improves task-shifting search efficiency in the continuous action space. Meanwhile, only clipped and perturbed Critic gradients are aggregated, while Actor policies, real-time task queues, and user-related information remain local. Since Actor policies contain local state-action mappings, this Critic-only aggregation design reduces privacy leakage risks and is more suitable for privacy-constrained multi-DC scheduling.
(3): Extensive experiments under multiple operating scenarios show that the proposed method can effectively exploit the local regulation flexibility of server clusters and UPS systems within each DC, achieve effective tie-line power smoothing without requiring the exchange of raw workload data across DCs, maintain service quality, and show robustness under different renewable energy penetration levels and different proportions of delay-tolerant tasks.

2. Materials and Methods

2.1. DC Microgrid Architecture

The DC microgrid consists of renewable generation, a main-grid tie-line, server cluster loads, uncontrollable loads such as cooling and lighting, and a UPS-based energy storage system, forming a typical source-grid-load-storage coordinated structure. As noted earlier, many DCs adopt fully or partially renewable-powered supply schemes, such as photovoltaic and wind generation, to reduce operating costs and carbon emissions. In practical operation, DCs are often geographically and electrically close to local renewable generation facilities. As a result, renewable output fluctuations appear directly in the tie-line power, demanding faster load scheduling and quicker energy storage response.

Figure 1 illustrates the DC microgrid architecture. Under ideal conditions and neglecting internal system losses, the power balance of DC d at time t can be expressed as follows [19]:

P_{TL, t}^{d} + P_{UPS, t}^{d} + (P_{W, t}^{d} + P_{V, t}^{d}) = E_{PUE}^{d} \cdot P_{cls, t}^{d}

(1)

where $P_{TL, t}^{d}$ denotes the power imported from the main grid to DC d at time t; $P_{UPS, t}^{d}$ denotes the real-time output power of the UPS battery bank (discharging positive, charging negative); $P_{W, t}^{d}$ and $P_{V, t}^{d}$ denote the real-time wind and photovoltaic power outputs, respectively; $P_{cls, t}^{d}$ denotes the server cluster load power; and $E_{PUE}^{d}$ is the power usage effectiveness (PUE) [20], defined as the ratio of the total energy consumption of the DC to the energy consumption of computing equipment. It is calculated as follows:

E_{PUE}^{d} = \frac{P_{cls, t}^{d} + P_{UL, t}^{d}}{P_{cls, t}^{d}}

(2)

where $P_{UL, t}^{d}$ denotes the uncontrollable load power in DC d at time t, such as cooling and lighting loads.
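As an illustrative sketch (not part of the original method), Eqs. (1) and (2) can be evaluated numerically by solving the power balance for the tie-line import. The function and argument names below are hypothetical, and all powers are assumed to share one unit.

```python
def pue(p_cls: float, p_ul: float) -> float:
    """Eq. (2): total DC load divided by the server-cluster load."""
    if p_cls <= 0:
        raise ValueError("server-cluster power must be positive")
    return (p_cls + p_ul) / p_cls


def tie_line_power(p_cls: float, p_ul: float, p_wind: float,
                   p_pv: float, p_ups: float = 0.0) -> float:
    """Eq. (1) solved for P_TL; p_ups > 0 means the UPS is discharging."""
    return pue(p_cls, p_ul) * p_cls - p_ups - (p_wind + p_pv)


# Example: 800 kW of servers, 400 kW of cooling/lighting, 500 kW renewables,
# and a UPS discharging 100 kW.
p_tl = tie_line_power(p_cls=800.0, p_ul=400.0, p_wind=300.0, p_pv=200.0, p_ups=100.0)
```

With these assumed numbers the PUE is 1.5 and the grid import is 600 kW; any drop in renewable output shows up one-for-one in the tie-line power unless the load or the UPS reacts.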

2.2. Server Cluster Load Model Based on Task Migration

According to the service-level agreement (SLA), tasks processed in a DC can be classified into delay-sensitive tasks and delay-tolerant tasks [21]. Delay-sensitive tasks, such as online transaction processing and interactive query services, are subject to strict real-time requirements and must occupy CPU resources immediately upon arrival, leaving little room for scheduling. In contrast, delay-tolerant tasks, such as offline data mining and backup model training, can be flexibly scheduled within a predefined deadline window and therefore provide considerable temporal slack and scheduling flexibility. The differences between these two task categories are illustrated in Figure 2.

This study adopts a temporal task-shifting strategy to regulate the server-cluster load level by altering the time-domain distribution of delay-tolerant tasks. Assuming that computing tasks are evenly distributed across the server cluster, the average CPU utilization rate $α_{CPU, t}^{d}$ of the server cluster in DC d at time t is jointly determined by the base-load utilization $α_{base, t}^{d}$ and the utilization deviation caused by task migration:

α_{CPU, t}^{d} = α_{base, t}^{d} + \frac{1}{M_{d}} \sum_{i = 1}^{n_{active, t}^{d}} α_{active}^{i} - \frac{1}{M_{d}} \sum_{j = 1}^{n_{defer, t}^{d}} α_{defer}^{j}

(3)

where $M_{d}$ denotes the number of servers in DC d; $n_{active, t}^{d}$ denotes the number of delay-tolerant tasks migrated into the system at the current time step, and $α_{active}^{i}$ is the CPU occupancy ratio of the i-th incoming task; $n_{defer, t}^{d}$ denotes the number of delay-tolerant tasks migrated out of the system at the current time step, and $α_{defer}^{j}$ is the CPU occupancy ratio of the j-th outgoing task.

Because the servers within the DC are highly homogeneous and are dispatched in a balanced manner, a linear power model is adopted in this paper [22]. Specifically, the total server-cluster power is approximated as the product of the power of a single server and the number of servers. Accordingly, the server cluster load $P_{cls, t}^{d}$ in DC d at time t is expressed as follows:

P_{cls, t}^{d} = [P_{idle}^{d} + (P_{\max}^{d} - P_{idle}^{d}) α_{CPU, t}^{d}] M_{d}

(4)

where $P_{\max}^{d}$ and $P_{idle}^{d}$ denote the full-load power and idle power of a single server in DC d, respectively.
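The load model of Eqs. (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration under the stated homogeneity assumption; the function names and the example figures (100 servers, 0.1 kW idle, 0.3 kW full load) are hypothetical and not taken from the paper.

```python
def cpu_utilization(alpha_base, activated, deferred, n_servers):
    """Eq. (3): base utilization plus activated minus deferred task occupancy.

    `activated` and `deferred` hold the per-task CPU occupancy ratios.
    """
    return alpha_base + sum(activated) / n_servers - sum(deferred) / n_servers


def cluster_power(alpha_cpu, p_idle, p_max, n_servers):
    """Eq. (4): linear single-server power model scaled by the server count."""
    return (p_idle + (p_max - p_idle) * alpha_cpu) * n_servers


# Two tasks are activated and one is deferred in a 100-server cluster.
alpha = cpu_utilization(0.5, activated=[2.0, 3.0], deferred=[1.0], n_servers=100)
p_cls = cluster_power(alpha, p_idle=0.1, p_max=0.3, n_servers=100)
```

Here the utilization rises from 0.50 to 0.54 and the cluster draws about 20.8 kW. The sketch does not clip the utilization to [0, 1]; a practical scheduler would need to enforce that bound.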

2.3. UPS Battery Energy Storage Model

To ensure reliable power supply for critical equipment, UPS battery banks are typically deployed in DC microgrids to provide backup energy in a timely manner when renewable generation drops sharply or when the main grid fails. In this paper, a general first-order energy storage model is adopted to describe the state-of-charge (SOC) dynamics of the UPS battery system [23]. The residual energy of the UPS battery bank $E_{t + 1}^{d}$ in DC d at time t + 1 is determined by the residual energy at the previous time step and the current charging/discharging power. The corresponding state transition equation is given by

E_{t + 1}^{d} = \begin{cases} E_{t}^{d} - Δ t_{u} \cdot P_{UPS, t}^{d} / ψ_{e}^{d}, & P_{UPS, t}^{d} \geq 0 \\ E_{t}^{d} - Δ t_{u} \cdot P_{UPS, t}^{d} \cdot ψ_{c}^{d}, & P_{UPS, t}^{d} < 0 \end{cases}

(5)

where $ψ_{c}^{d}$ and $ψ_{e}^{d}$ denote the charging and discharging efficiency coefficients of the UPS battery bank, respectively, and $Δ t_{u}$ is the control step size.
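A direct transcription of Eq. (5) is given below as a hedged sketch. The function name and the example values (a 15-minute step and 90% efficiencies) are assumptions made only for illustration.

```python
def ups_energy_step(e_now, p_ups, dt, eta_ch, eta_dis):
    """Eq. (5): residual energy after one control step.

    p_ups >= 0 is discharging (the battery loses more than it delivers);
    p_ups < 0 is charging (the battery stores less than it absorbs).
    """
    if p_ups >= 0:
        return e_now - dt * p_ups / eta_dis
    return e_now - dt * p_ups * eta_ch


# 100 kWh stored, 15-minute step (0.25 h), 90 % efficiency each way.
after_discharge = ups_energy_step(100.0, 10.0, 0.25, eta_ch=0.9, eta_dis=0.9)
after_charge = ups_energy_step(100.0, -10.0, 0.25, eta_ch=0.9, eta_dis=0.9)
```

Delivering 2.5 kWh costs roughly 2.78 kWh of stored energy, whereas absorbing 2.5 kWh adds only 2.25 kWh, which is why the two branches of Eq. (5) are asymmetric.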

Based on the above model, the UPS state of charge $S_{t}^{d}$ is defined as the ratio of the current residual energy to the rated capacity, $S_{t}^{d} = \frac{E_{t}^{d}}{E_{UPS}^{d}}$. To prolong battery lifetime and ensure operational safety, the following constraints must be satisfied.

(1): SOC constraints: The battery SOC must remain within a safe range to prevent overcharging and overdischarging:

S_{\min}^{d} \leq S_{t}^{d} \leq S_{\max}^{d}

(6)

where $S_{\max}^{d}$ and $S_{\min}^{d}$ denote the upper SOC limit and the physical lower SOC limit of the UPS battery bank, respectively. Considering the emergency backup function of practical data-center UPS systems, UPS regulation should not consume the reserve capacity required by critical IT loads. Therefore, the actual lower SOC bound for regulation is revised as follows:

{\underline{S}}^{d} = S_{\min}^{d} + S_{res}^{d} + S_{mar}^{d}

(7)

{\underline{S}}^{d} \leq S_{t}^{d} \leq S_{\max}^{d}

(8)

where $S_{res}^{d}$ is the reserve SOC margin for emergency backup and $S_{mar}^{d}$ is an engineering safety margin. This constraint indicates that the adjustable energy used for low-frequency tie-line power compensation is limited to the remaining headroom beyond the reserved backup capacity.

(2): Charge/discharge cycle constraints: To avoid frequent switching between charging and discharging states, the number of UPS charge/discharge transitions within the scheduling horizon is constrained by

\sum_{t = 1}^{T - 1} | δ_{t + 1}^{d} - δ_{t}^{d} | \leq K_{\max}^{d}

(9)

where $K_{\max}^{d}$ denotes the maximum allowable number of charge/discharge actions within the scheduling horizon T, and $δ_{t}^{d}$ is the charging/discharging state indicator at time t, defined as follows:

δ_{t}^{d} = \begin{cases} 1, & P_{UPS, t}^{d} \geq 0 \\ 0, & P_{UPS, t}^{d} < 0 \end{cases}

(10)

Furthermore, because battery cycling causes lifetime degradation, an energy-throughput-based degradation proxy cost is included in the scheduling objective:

C_{\deg, t}^{d} = c_{\deg}^{d} |P_{UPS, t}^{d}| Δ t_{u}

(11)

where $c_{\deg}^{d}$ denotes the degradation-cost coefficient per unit energy throughput. This term is not intended to precisely describe the electrochemical aging mechanism of a specific battery cell; rather, it penalizes unnecessary frequent or large-amplitude charge/discharge actions at the scheduling level.

(3): Charge/discharge power constraints: The UPS output power is also limited by equipment ratings, converter capacity, and engineering operating limits. Therefore, the actual UPS charging/discharging power satisfies

- P_{ch, \max}^{d} \leq P_{UPS, t}^{d} \leq P_{dis, \max}^{d}

(12)

where $P_{ch, \max}^{d}$ and $P_{dis, \max}^{d}$ denote the maximum charging and discharging powers of the UPS, respectively.

(4): Ramp-rate constraints: To avoid abrupt changes in UPS output power between adjacent control intervals, a ramp-rate constraint is further introduced:

|P_{UPS, t + 1}^{d} - P_{UPS, t}^{d}| \leq R_{UPS}^{d} Δ t_{u}

(13)

where $R_{UPS}^{d}$ denotes the maximum allowable UPS power variation rate.

While maintaining backup power reliability for the DC, the charging and discharging power of the UPS can be dynamically regulated by adjusting $P_{UPS, t}^{d}$. In this way, the UPS acts as a flexible resource for tie-line power smoothing without compromising its emergency backup capability for critical loads.
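One possible way to enforce constraints (8), (12), and (13) on a requested UPS set point is sequential clamping, sketched below. This is an assumption-laden illustration rather than the authors' procedure: the names are hypothetical, the priority order (SOC window over converter rating over ramp rate) is a design choice made here, and the switching-count limit of Eq. (9) is not handled.

```python
def regulation_soc_floor(s_min, s_res, s_mar):
    """Eq. (7): lower SOC bound that is available for smoothing."""
    return s_min + s_res + s_mar


def clamp(value, low, high):
    return max(low, min(high, value))


def feasible_ups_power(p_target, p_prev, soc, *, e_rated, dt, soc_floor,
                       soc_max, p_ch_max, p_dis_max, ramp, eta_ch, eta_dis):
    """Clamp a requested UPS power (discharging positive) to Eqs. (8), (12), (13)."""
    # Ramp-rate window around the previous set point, Eq. (13).
    p = clamp(p_target, p_prev - ramp * dt, p_prev + ramp * dt)
    # Converter rating, Eq. (12).
    p = clamp(p, -p_ch_max, p_dis_max)
    # SOC window, Eq. (8), converted into a power window through Eq. (5).
    max_dis = max(0.0, (soc - soc_floor) * e_rated * eta_dis / dt)
    max_ch = max(0.0, (soc_max - soc) * e_rated / (eta_ch * dt))
    # Applied last, so the SOC window wins if the constraints conflict.
    return clamp(p, -max_ch, max_dis)


floor = regulation_soc_floor(s_min=0.1, s_res=0.3, s_mar=0.05)
limits = dict(e_rated=200.0, dt=0.25, soc_floor=floor, soc_max=0.9,
              p_ch_max=100.0, p_dis_max=100.0, ramp=200.0,
              eta_ch=0.9, eta_dis=0.9)
p_dis = feasible_ups_power(80.0, 0.0, 0.5, **limits)
```

In this example an 80 kW discharge request is first cut to 50 kW by the ramp limit and then to about 36 kW, because only 5% of the 200 kWh capacity lies above the regulation floor of 0.45.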

2.4. Multi-Time-Scale Tie-Line Power Smoothing Strategy

In DC load management, server clusters and UPS battery banks exhibit complementary regulation characteristics. In this study, server-cluster task scheduling is executed on a finer time scale to suppress high-frequency tie-line power fluctuations, whereas UPS battery regulation is executed on a coarser time scale to compensate for the remaining low-frequency deviations under energy constraints. To leverage these complementary characteristics, this paper proposes a multi-time-scale tie-line power smoothing strategy based on frequency-domain decomposition. Specifically, a two-stage first-order low-pass filter is constructed to generate two levels of target tie-line power. On the fast time scale, temporal task shifting is used to track the first-level target and suppress high-frequency power fluctuations. On the slow time scale, the charging/discharging behavior of the UPS battery bank is used to track the second-level target and compensate for the remaining low-frequency power deviations.

For DC d, the continuous-time transfer function of the i-th first-order low-pass filter is defined as follows:

H_{i}^{d} (s) = \frac{P_{i, out}^{d} (s)}{P_{i, in}^{d} (s)} = \frac{1}{T_{i}^{d} s + 1}, i = 1, 2

(14)

where $T_{i}^{d}$ denotes the time constant of the i-th filter. The corresponding equivalent cutoff frequency is given by

f_{i}^{d} = \frac{1}{2 π T_{i}^{d}}, i = 1, 2

(15)

A larger time constant results in stronger attenuation of short-term fluctuations and therefore produces a smoother output. In this paper, $T_{1}^{d} < T_{2}^{d}$ is adopted, so that the first-stage filter preserves the power variations that can be rapidly handled by server-cluster scheduling, while the second-stage filter extracts a smoother low-frequency target suitable for UPS-based compensation.

First, the initial tie-line power of DC d, denoted by $P_{TL 0}^{d}$, is used as the input of the first-stage filter. Its continuous-time form can be written as follows:

\frac{d P_{TL 1}^{d} (t)}{d t} = \frac{P_{TL 0}^{d} (t) - P_{TL 1}^{d} (t)}{T_{1}^{d}}

(16)

Using the forward-Euler method, the first-stage target tie-line power $P_{TL 1, t}^{d}$ is discretized as follows:

P_{TL 1, t}^{d} = \frac{Δ t_{s}}{T_{1}^{d}} (P_{TL 0, t}^{d} - P_{TL 1, t - 1}^{d}) + P_{TL 1, t - 1}^{d}

(17)

where $Δ t_{s}$ denotes the control time step of the server cluster.
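The recursion in Eq. (17) and the cutoff frequency in Eq. (15) can be sketched as follows. The helper names are hypothetical; initializing the filter state with the first sample and requiring the step ratio to lie in (0, 1] are assumptions added here so that the output cannot overshoot.

```python
import math


def cutoff_frequency(time_constant):
    """Eq. (15): equivalent cutoff frequency of a first-order filter."""
    return 1.0 / (2.0 * math.pi * time_constant)


def low_pass(series, dt, time_constant, initial=None):
    """Eq. (17): discretized first-order low-pass filter over a power series."""
    k = dt / time_constant
    if not 0.0 < k <= 1.0:
        raise ValueError("dt / time_constant must lie in (0, 1]")
    if not series:
        return []
    prev = series[0] if initial is None else initial
    out = []
    for x in series:
        prev = prev + k * (x - prev)  # move a fraction k toward the new sample
        out.append(prev)
    return out


# A 10 kW step with dt = 1 and T = 2 is only partially followed.
target = low_pass([0.0, 10.0, 10.0], dt=1.0, time_constant=2.0, initial=0.0)
```

The filtered target reaches 5 kW and then 7.5 kW, so the difference between the raw series and the target is the fast component left for the server cluster to absorb. The same function with the larger time constant and the UPS step size would give the second-stage recursion of the same form.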

The first-level tie-line target is compensated by the server cluster load, and the corresponding power adjustment is denoted by $Δ P_{TL 1, t}^{d} = P_{TL 1, t}^{d} - P_{TL 0, t}^{d}$. Substituting $P_{TL 1, t}^{d}$ into Eq. (1) and setting $P_{UPS, t}^{d} = 0$, the target server-cluster power of DC d, $P_{cls - target, t}^{d}$, can be obtained as follows:

P_{cls - target, t}^{d} = \frac{P_{TL 1, t}^{d} + (P_{W, t}^{d} + P_{V, t}^{d})}{E_{PUE}^{d}}

(18)

To achieve tie-line smoothing at the server-cluster level, the deviation between the actual server-cluster power and its target value is minimized. The corresponding optimization objective is formulated as follows:

\min J_{cls}^{d} = \sum_{t = 1}^{T} {(P_{cls, t}^{d} - \frac{P_{TL 1, t}^{d} + (P_{W, t}^{d} + P_{V, t}^{d})}{E_{PUE}^{d}})}^{2}

(19)
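As a small numerical illustration of Eqs. (18) and (19), the target cluster power and the tracking cost over a horizon can be computed as below. The function names and the sample values are hypothetical.

```python
def cluster_target_power(p_tl1, p_wind, p_pv, pue):
    """Eq. (18): cluster power that would put the tie-line on its first-level target."""
    return (p_tl1 + p_wind + p_pv) / pue


def cluster_tracking_cost(p_cls, p_tl1, p_wind, p_pv, pue):
    """Eq. (19): sum of squared deviations from the target over the horizon."""
    return sum(
        (actual - cluster_target_power(tl, w, v, pue)) ** 2
        for actual, tl, w, v in zip(p_cls, p_tl1, p_wind, p_pv)
    )


# Two steps with a constant 800 kW target and a 10 kW tracking error each.
cost = cluster_tracking_cost(
    p_cls=[810.0, 790.0], p_tl1=[700.0, 700.0],
    p_wind=[300.0, 300.0], p_pv=[200.0, 200.0], pue=1.5,
)
```

With a PUE of 1.5, a 700 kW tie-line target plus 500 kW of renewables corresponds to an 800 kW cluster target, and the two 10 kW errors give a cost of 200.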

After the server-cluster response, the actual tie-line power $P_{TL 1 - real, t}^{d}$ is used as the input of the second-stage first-order low-pass filter to generate the low-frequency compensation target for the UPS layer. The continuous-time expression is

\frac{d P_{TL 2}^{d} (t)}{d t} = \frac{P_{TL 1 - real}^{d} (t) - P_{TL 2}^{d} (t)}{T_{2}^{d}}

(20)

After forward-Euler discretization, the second-stage target tie-line power is given by

P_{TL 2, t}^{d} = \frac{Δ t_{u}}{T_{2}^{d}} (P_{TL 1 - real, t}^{d} - P_{TL 2, t - 1}^{d}) + P_{TL 2, t - 1}^{d}

(21)

The second-level target tie-line power is then compensated by the UPS battery bank, and the target UPS power for DC d, denoted by $P_{UPS - target, t}^{d}$, is given as follows:

P_{UPS - target, t}^{d} = Δ P_{TL 2, t}^{d} = P_{TL 2, t}^{d} - P_{TL 1 - real, t}^{d}

(22)

Correspondingly, the objective of the UPS layer is to track the second-level target power while accounting for the battery degradation cost. In the objective, the first term represents the tracking error, and the second term, weighted by the coefficient $λ_{\deg}$, penalizes the degradation cost caused by UPS charging and discharging.

\min J_{UPS}^{d} = \sum_{t = 1}^{T} [P_{UPS, t}^{d} - (P_{TL 2, t}^{d} - P_{TL 1 - real, t}^{d})]^{2} + λ_{\deg} \sum_{t = 1}^{T} C_{\deg, t}^{d}

(23)

By combining the fast response of the server cluster to high-frequency components with the slower UPS-based compensation of low-frequency residuals, the proposed strategy realizes hierarchical decoupling and coordinated smoothing of tie-line power fluctuations in the frequency domain. This approach effectively balances the conflicting requirements of regulation speed and energy capacity.

2.5. Distributed Horizontal Federated Reinforcement Learning Framework

Server-cluster task scheduling and UPS energy management differ substantially in state-space complexity and control flexibility. Task migration is a multi-objective decision problem: it must handle highly stochastic task arrivals, a large queue state space, and the joint balancing of delay and power. It is therefore well suited to modeling and optimization within a Markov decision process (MDP) framework. By contrast, for the UPS battery bank, a computationally efficient greedy algorithm is adopted to directly track the low-frequency power target while satisfying physical constraints, thereby avoiding unnecessary training overhead.

To mitigate sample insufficiency and local overfitting in single-DC training, this paper proposes Fed-AdaPPO, a federated adaptive proximal policy optimization framework for collaborative tie-line power scheduling among multiple DCs. The framework integrates UCB-guided adaptive exploration with Critic-only federated gradient aggregation, enabling improved policy learning efficiency while keeping raw workload data and local scheduling states decentralized.

2.5.1. Markov Decision Process Modeling

To improve user satisfaction and fully exploit the scheduling capability, in DC d, tasks in the processing state at time t are sorted in descending order of allowable delay time, and tasks with the same allowable delay time are further sorted in ascending order of required processing time, thereby forming the task-processing queue SP. Tasks in the delayed state are sorted in ascending order of allowable delay time, and tasks with the same allowable delay time are further sorted in descending order of required processing time, thereby forming the task-delay queue SQ.
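The two orderings described above map naturally onto composite sort keys. The sketch below is illustrative only; the `Task` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    allowable_delay: float   # remaining allowable delay time
    processing_time: float   # required processing time


def build_sp_queue(processing):
    """SP: descending allowable delay, ties broken by ascending processing time."""
    return sorted(processing, key=lambda t: (-t.allowable_delay, t.processing_time))


def build_sq_queue(delayed):
    """SQ: ascending allowable delay, ties broken by descending processing time."""
    return sorted(delayed, key=lambda t: (t.allowable_delay, -t.processing_time))


tasks = [Task("a", 5, 2), Task("b", 8, 3), Task("c", 8, 1)]
sp_order = [t.task_id for t in build_sp_queue(tasks)]
sq_order = [t.task_id for t in build_sq_queue(tasks)]
```

For these three tasks the SP order is c, b, a and the SQ order is a, b, c. The head of SP therefore holds the task with the most slack, which is the safest one to suspend, while the head of SQ holds the most urgent task, which is the first to be resumed.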

To achieve coordinated optimization of server-layer task migration, the tie-line power smoothing problem for multiple DCs is formulated as a unified MDP model.

(1): State space: The state vector $s_{cls, t}^{d}$ includes the operating state of the server cluster and the task-sequence information at time t, and is defined as follows:

s_{cls, t}^{d} = [P_{cls, t}^{d}, P_{cls - target, t}^{d}, L_{SP, t}^{d}, L_{SQ, t}^{d}, T_{SLA, t}^{d}]

(24)

where $L_{SP, t}^{d}$ denotes the length of the SP queue; $L_{SQ, t}^{d}$ denotes the length of the SQ queue; and $T_{SLA, t}^{d}$ denotes the normalized residual delay ratio, which is introduced as a soft constraint to ensure service quality and is defined as follows:

T_{SLA, t}^{d} = \frac{1}{L_{SQ, t}^{d}} \sum_{k = 1}^{L_{SQ, t}^{d}} \frac{d_{k, t}}{D_{k}^{\max}}

(25)

where $d_{k, t}$ denotes the remaining allowable delay time of the k-th task in the SQ queue, and $D_{k}^{\max}$ denotes the maximum allowable delay time of the k-th task.

(2): Action space: To avoid the curse of dimensionality caused by a large discrete action space and to enable the agent to continuously perceive the scale of task migration, a continuous action $a_{cls, t}^{d}$ is constructed and then mapped to a physically executable discrete number of tasks as follows:

a_{cls, t}^{d} \in [- 1, 1]

(26)

n_{mig, t}^{d} = round (a_{cls, t}^{d} \cdot N_{\max}^{d})

(27)

where $n_{mig, t}^{d}$ denotes the number of activated or suspended tasks. A positive value indicates that tasks are sequentially activated from the head of the SQ queue and moved to the SP queue, whereas a negative value indicates that tasks are sequentially suspended from the head of the SP queue and moved to the SQ queue; $N_{\max}^{d}$ denotes the maximum number of tasks allowed to migrate.

(3): Reward function: The objective of tie-line power smoothing in a DC is to minimize the deviation between the actual power and the target power while ensuring service quality. Accordingly, the reward function is defined as follows:

R (s_{cls, t}^{d}, a_{cls, t}^{d}) = ω_{1} {(1 - \frac{P_{cls, t}^{d} - P_{cls - target, t}^{d}}{P_{DC - \max}^{d}})}^{2} + ω_{2} T_{SLA, t}^{d}

(28)

where $ω_{1}$ and $ω_{2}$ denote weighting coefficients used to balance the power-smoothing performance and the task-delay risk, and $P_{DC - \max}^{d}$ denotes the maximum power consumption of DC d, defined as follows:

P_{DC - \max}^{d} = P_{\max}^{d} \cdot M_{d}

(29)

Normalizing the power-deviation term in the reward function by the maximum power consumption effectively eliminates the dimensional effect caused by capacity differences among DCs of different scales. This ensures a consistent reward scale and further improves the training stability and convergence efficiency of the reinforcement learning algorithm.
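The action mapping of Eqs. (26) and (27), the residual delay ratio of Eq. (25), and the reward of Eqs. (28) and (29) can be sketched together as follows. All names are hypothetical. Three choices are assumptions made here and are not stated in the text: out-of-range actions are clipped to [-1, 1], an empty SQ queue yields a ratio of 1, and the power deviation is kept signed exactly as Eq. (28) is printed.

```python
def tasks_to_migrate(action, n_max):
    """Eqs. (26)-(27): map a continuous action to a signed task count.

    Python's round() sends exact .5 ties to the nearest even integer;
    Eq. (27) does not specify how ties are resolved.
    """
    clipped = max(-1.0, min(1.0, action))
    return int(round(clipped * n_max))


def apply_migration(sp, sq, n_mig):
    """Positive: activate from the head of SQ. Negative: suspend from the head of SP."""
    if n_mig > 0:
        k = min(n_mig, len(sq))
        return sp + sq[:k], sq[k:]
    k = min(-n_mig, len(sp))
    return sp[k:], sq + sp[:k]


def residual_delay_ratio(sq_delays):
    """Eq. (25): mean of remaining/maximum allowable delay over the SQ queue."""
    if not sq_delays:
        return 1.0  # assumption: no waiting task means no delay risk
    return sum(d / d_max for d, d_max in sq_delays) / len(sq_delays)


def reward(p_cls, p_target, p_server_max, n_servers, t_sla, w1, w2):
    """Eqs. (28)-(29), with the deviation term coded as printed."""
    p_dc_max = p_server_max * n_servers
    deviation = (p_cls - p_target) / p_dc_max
    return w1 * (1.0 - deviation) ** 2 + w2 * t_sla


n_mig = tasks_to_migrate(0.37, n_max=20)
sp, sq = apply_migration(["p1", "p2"], ["q1", "q2", "q3"], 2)
t_sla = residual_delay_ratio([(2.0, 4.0), (3.0, 4.0)])
r = reward(20.0, 20.0, 0.3, 100, t_sla, w1=0.8, w2=0.2)
```

After a migration the two queues would need to be re-sorted according to the ordering rules of the SP and SQ queues; that step is omitted here for brevity.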

2.5.2. Federated Learning Framework Mechanism

Traditional discrete RL methods suffer from action-space explosion in large-scale task scheduling and cannot capture fine-grained scheduling characteristics. Although conventional PPO with a continuous action space avoids this explosion, its exploration strategy is too simple for complex, high-dimensional environments, limiting its ability to balance stability and exploration.

AdaPPO employs a parameterized stochastic policy network as the Actor network and a state-value function network as the Critic network. The Actor network consists of a new policy network and an old policy network. The new policy network is used to update the policy based on the latest sampled data, whereas the old policy network is responsible for generating actions during interaction with the environment.

The Critic network is used to estimate the state-value function $V_{ϕ} (s_{t})$ and is trained by minimizing the mean squared error between the predicted value and the empirical return:

L^{VF} (ϕ) = E_{t} [{(V_{ϕ} (s_{t}) - {\hat{V}}_{t})}^{2}]

(30)

where $L^{VF} (ϕ)$ denotes the loss function of the Critic network, and ${\hat{V}}_{t}$ denotes the empirical return used as the training target.

The weights of the Critic network are updated by backpropagation according to:

ϕ \leftarrow ϕ - α \nabla L^{VF} (ϕ)

(31)

where $α$ denotes the learning rate of the Critic network, and $\nabla$ denotes the gradient operator.

The weights of the Actor network are optimized through the following loss function:

L^{CLIP} (θ) = E_{t} [\min (r_{t} (θ) {\hat{A}}_{t}, clip (r_{t} (θ), 1 - ε_{clip}, 1 + ε_{clip}) {\hat{A}}_{t})]

(32)

where $E_{t}$ denotes the average over sampled transitions; $clip (\cdot)$ denotes the clipping function, which constrains $r_{t} (θ)$ to the interval $[1 - ε_{clip}, 1 + ε_{clip}]$ to prevent excessively large policy updates in a single step; $ε_{clip}$ denotes the clipping threshold; $r_{t} (θ)$ denotes the probability ratio between the new and old policies; and ${\hat{A}}_{t}$ denotes the estimated advantage function, which measures the relative advantage of taking action $a_{t}$ in state $s_{t}$ under the current policy. In this paper, ${\hat{A}}_{t}$ is computed using generalized advantage estimation (GAE). The corresponding expressions are

r_{t} (θ) = \frac{π_{θ} (a_{t} | s_{t})}{π_{θ^{'}} (a_{t} | s_{t})}

(33)

{\hat{A}}_{t} = \sum_{l = 0}^{\infty} {(γ λ)}^{l} δ_{t + l}

(34)

where $π_{θ} (a_{t} | s_{t})$ denotes the probability of taking action $a_{t}$ in state $s_{t}$ under the current policy parameterized by $θ$; $π_{θ^{'}} (a_{t} | s_{t})$ denotes the corresponding probability under the old policy; $γ$ denotes the discount factor; $λ$ denotes the balancing parameter used to trade off bias and variance; l denotes the step offset in the summation; and $δ_{t + l}$ denotes the temporal-difference residual at step t + l, defined as follows:

δ_{t + l} = r_{t + l} + γ V_{ϕ} (s_{t + l + 1}) - V_{ϕ} (s_{t + l})

(35)
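In practice the infinite sum in Eq. (34) is evaluated over a finite rollout with a backward recursion. The sketch below assumes the sum is truncated at the end of the rollout and that the value list carries one extra bootstrap entry for the state after the last step; the function name is hypothetical.

```python
def gae(rewards, values, gamma, lam):
    """Eqs. (34)-(35): generalized advantage estimation over one rollout.

    `values` must contain len(rewards) + 1 entries; the last entry is the
    value of the state reached after the final reward.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # Eq. (35)
        running = delta + gamma * lam * running                 # Eq. (34), recursive form
        advantages[t] = running
    return advantages


adv = gae(rewards=[1.0, 1.0], values=[0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
```

With zero value estimates both residuals equal 1, so the last advantage is 1 and the first is 1 + 0.5 × 1 = 1.5. A smaller balancing parameter shortens the effective horizon, lowering variance at the price of more bias from the value estimate.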

The weights of the Actor network are updated via backpropagation. Because the clipped surrogate $L^{CLIP} (θ)$ is maximized in PPO, the update is a gradient-ascent step:

θ \leftarrow θ + β \nabla L^{CLIP} (θ)

(36)

where $β$ is the learning rate of the Actor network.
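To make the clipping in Eq. (32) concrete, the surrogate can be evaluated on a small batch in plain Python. This is a sketch with hypothetical names; it works with log-probabilities, as is common in PPO implementations, and it only computes the objective value, not its gradient.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps_clip):
    """Eqs. (32)-(33): batch average of the clipped PPO surrogate."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                        # Eq. (33)
        clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip)
        total += min(ratio * adv, clipped * adv)                 # pessimistic bound
    return total / len(advantages)


# Ratios of 1.5 and 0.5 with advantages +1 and -1, clipping threshold 0.2.
value = clipped_surrogate(
    logp_new=[0.4054651081081644, -0.6931471805599453],
    logp_old=[0.0, 0.0],
    advantages=[1.0, -1.0],
    eps_clip=0.2,
)
```

The first sample contributes 1.2 instead of 1.5 and the second contributes -0.8 instead of -0.5, so the batch value is 0.2. In both cases the objective stops rewarding a move of the ratio beyond the clipping band, which is what limits the step size. An optimizer would maximize this quantity or, equivalently, minimize its negative.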

Because the server-cluster scheduling problem involves a high-dimensional continuous action space, uniform random exploration may lead to slow convergence and inefficient sampling. To improve exploration efficiency, a UCB-based adaptive exploration mechanism is introduced. By assigning higher exploration priority to action regions with greater uncertainty, the proposed strategy encourages the agent to explore potentially valuable actions more effectively than conventional random perturbation. Specifically, the action space is discretized into K intervals

\{a_{1, t}, a_{2, t}, \dots, a_{K, t}\}

, and the UCB value of the i-th interval, denoted

U (a_{i, t})

, is defined as follows:

U (a_{i, t}) = {\bar{R}}_{i, t} + c \sqrt{\frac{\ln N}{n_{i, t}}}

(37)

where

{\bar{R}}_{i, t}

denotes the average reward of interval

a_{i, t}

;

N

denotes the number of training steps;

n_{i, t}

denotes the number of times that interval

a_{i, t}

has been selected;

c

denotes the exploration parameter.

During training, with probability

ρ

, the center of the interval with the highest UCB value

U (a_{i, t})

is selected as the action; with probability 1 −

ρ

, the action is sampled from policy

π_{θ} (a_{t} | s_{t})

. Unlike epsilon-greedy exploration, whose exploratory actions are drawn uniformly at random, this method prioritizes under-explored action regions while continuing to exploit high-return power-regulation actions, thereby balancing exploration and exploitation more effectively.
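
The selection rule described above can be sketched for a scalar action range as follows. The class name, the equal-width binning, the treatment of untried intervals (given infinite priority), and the counting of N as the number of recorded steps are simplifying assumptions; the paper's action space is higher-dimensional and its implementation details are not specified here.

```python
import math
import random
from typing import Callable, List


class UCBExplorer:
    """UCB-guided exploration over K equal-width bins of a scalar action range."""

    def __init__(self, low: float, high: float, k: int, c: float = 1.0, rho: float = 0.1):
        self.low, self.k, self.c, self.rho = low, k, c, rho
        self.width = (high - low) / k
        self.counts: List[int] = [0] * k
        self.mean_reward: List[float] = [0.0] * k
        self.total = 0  # N in Eq. (37), counted here as the number of recorded steps

    def ucb(self, i: int) -> float:
        if self.counts[i] == 0:
            return math.inf  # assumption: untried intervals are explored first
        bonus = self.c * math.sqrt(math.log(self.total) / self.counts[i])
        return self.mean_reward[i] + bonus  # Eq. (37)

    def select(self, policy_sample: Callable[[], float], rng: random.Random) -> float:
        # With probability rho take the center of the best-UCB interval,
        # otherwise fall back to a sample from the current policy.
        if rng.random() < self.rho:
            best = max(range(self.k), key=self.ucb)
            return self.low + (best + 0.5) * self.width
        return policy_sample()

    def update(self, action: float, reward: float) -> None:
        i = min(max(int((action - self.low) / self.width), 0), self.k - 1)
        self.counts[i] += 1
        self.total += 1
        # Incremental update of the interval's average reward
        self.mean_reward[i] += (reward - self.mean_reward[i]) / self.counts[i]
```

In use, `select` would be called once per decision step with a closure that samples from the Actor, and `update` would be called with the executed action and the observed reward.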

To further enhance generalization capability under heterogeneous multi-DC operating conditions, a horizontal federated learning framework is incorporated into PPO training. In the proposed design, each DC acts as an independent client and only shares Critic-side gradient information with the central server, while Actor policies, task queues, and other local operational data remain on-site. Since Actor policies are directly associated with local state-action mappings, not sharing Actor gradients helps reduce the exposure of sensitive scheduling information. With this design, collaborative value-function learning is achieved without exchanging raw workload data.

To bound the sensitivity of the uploaded gradients, each local Critic gradient is clipped before perturbation:

\bar{\nabla} L^{d, VF} (ϕ) = \frac{\nabla L^{d, VF} (ϕ)}{\max (1, \frac{{‖\nabla L^{d, VF} (ϕ)‖}_{2}}{C})}

(38)

where

\bar{\nabla} L^{d, VF} (ϕ)

denotes the clipped Critic gradient, and C denotes the clipping threshold. Then, zero-mean Gaussian noise

N (0, σ^{2} \cdot I)

is added before the gradient is uploaded:

\tilde{\nabla} L^{d, VF} (ϕ) = \bar{\nabla} L^{d, VF} (ϕ) + N (0, σ^{2} \cdot I)

(39)

where

\tilde{\nabla} L^{d, VF} (ϕ)

denotes the Critic gradients after noise injection;

σ

denotes the noise intensity;

I

denotes the identity matrix.
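
A minimal sketch of the client-side processing in Eqs. (38) and (39) is given below, with the Critic gradient represented as a flat list of floats. The function names and the list representation are illustrative assumptions; in a PyTorch implementation the same two steps would be applied to the flattened Critic gradient tensors.

```python
import math
import random
from typing import List


def clip_gradient(grad: List[float], clip_norm: float) -> List[float]:
    """Rescale the gradient so that its l2 norm does not exceed clip_norm, Eq. (38)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = max(1.0, norm / clip_norm)
    return [g / scale for g in grad]


def perturb_gradient(grad: List[float], clip_norm: float, sigma: float,
                     rng: random.Random) -> List[float]:
    """Clip, then add independent N(0, sigma^2) noise to every coordinate, Eq. (39)."""
    clipped = clip_gradient(grad, clip_norm)
    return [g + rng.gauss(0.0, sigma) for g in clipped]
```

Gradients whose norm is already below the threshold C pass through the clipping step unchanged, so only unusually large updates are rescaled.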

During the federated phase, the server aggregates the perturbed Critic gradients as follows:

\nabla L^{global, VF} (ϕ) = \frac{1}{D} \sum_{d = 1}^{D} \tilde{\nabla} L^{d, VF} (ϕ)

(40)

where D denotes the number of DC clients participating in the aggregation.

After aggregation, the global Critic parameter is updated according to the aggregated gradient:

ϕ \leftarrow ϕ - α \nabla L^{global, VF} (ϕ)

(41)

Then, the updated Critic parameters are redistributed to each DC as the initialization for the next stage of training, thereby forming a closed-loop federated optimization process consisting of local training, gradient clipping, noise perturbation, server-side aggregation, and parameter updating. In this way, collaborative Critic optimization and improved policy generalization are achieved without exchanging raw workload data, task queues, user information, or Actor policies.
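
The server-side part of this loop, Eqs. (40) and (41), amounts to an element-wise average followed by a gradient step. The sketch below uses flat lists of floats and hypothetical function names; it illustrates the two equations only and omits communication and client scheduling.

```python
from typing import List


def aggregate_gradients(client_grads: List[List[float]]) -> List[float]:
    """Element-wise mean of the perturbed Critic gradients from D clients, Eq. (40)."""
    num_clients = len(client_grads)
    return [sum(column) / num_clients for column in zip(*client_grads)]


def critic_step(phi: List[float], global_grad: List[float], alpha: float) -> List[float]:
    """Gradient-descent update of the global Critic parameters, Eq. (41)."""
    return [p - alpha * g for p, g in zip(phi, global_grad)]
```

The returned parameter vector is what would be redistributed to every DC before the next round of local training.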

The overall procedure of the tie-line power smoothing method for multiple DCs based on Fed-AdaPPO is illustrated in Figure 3.

2.5.3. Security and Convergence Analysis

Based on the Critic-only federated gradient aggregation mechanism, this subsection analyzes the theoretical properties of Fed-AdaPPO in terms of differential privacy, communication complexity, and convergence stability. Since Actor-Critic training in a continuous action space involves nonlinear function approximation and stochastic policy optimization, the resulting optimization problem is generally non-convex. Therefore, this paper does not claim global optimality, but instead analyzes the first-order convergence behavior of the noisy federated Critic update under standard smoothness and bounded-variance assumptions.

For convergence analysis, the global Critic objective is defined as follows:

L^{VF} (ϕ) = \frac{1}{D} \sum_{d = 1}^{D} L^{d, VF} (ϕ)

(42)

The noisy federated gradient aggregation is treated as a stochastic estimate of the gradient of this objective.

Assumption 1.

For each DC d, the local Critic loss function

L^{d, VF} (ϕ)

is lower bounded and has an L-Lipschitz continuous gradient. The mini-batch stochastic gradient estimator is unbiased and has a bounded second moment. The bias induced by DC heterogeneity and gradient clipping is denoted by

b_{r}

, satisfying

‖ b_{r} ‖_{2}^{2} \leq B_{het}^{2}

(43)

where

B_{het}

is introduced to denote the upper bound of the heterogeneity- and clipping-induced bias. The stochastic gradient noise is denoted by

ζ_{r}

, satisfying

E [ζ_{r} ∣ F_{r}] = 0, E [‖ ζ_{r} ‖_{2}^{2} ∣ F_{r}] \leq ν^{2}

(44)

where

F_{r}

denotes the historical information before communication round r, and

ν^{2}

is the upper bound of the stochastic gradient noise variance.

Proposition 1.

Under the gradient clipping mechanism, the l₂-sensitivity of the uploaded gradient from each DC satisfies

Δ_{2} = \sup_{S \sim S^{'}} {‖clip (\nabla L^{d, VF} (ϕ; S), C) - clip (\nabla L^{d, VF} (ϕ; S^{'}), C)‖}_{2} \leq 2 C

(45)

where

S

and

S^{'}

are neighboring local datasets that differ in only one training sample, and

Δ_{2}

denotes the l₂-sensitivity of the uploaded gradient mechanism.

Proof.

By the definition of gradient clipping, any clipped local Critic gradient satisfies

{‖clip (\nabla L^{d, VF} (ϕ), C)‖}_{2} \leq C

(46)

Therefore, for any neighboring datasets

S

and

S^{'}

, the triangle inequality gives

\begin{array}{l} {‖clip (\nabla L^{d, VF} (ϕ; S), C) - clip (\nabla L^{d, VF} (ϕ; S^{'}), C)‖}_{2} \\ \leq {‖clip (\nabla L^{d, VF} (ϕ; S), C)‖}_{2} + {‖clip (\nabla L^{d, VF} (ϕ; S^{'}), C)‖}_{2} \\ \leq 2 C \end{array}

(47)

This proves Proposition 1. □

Proposition 2.

If the Gaussian noise intensity σ satisfies

σ \geq \frac{Δ_{2} \sqrt{2 \ln (1.25 / δ)}}{ε}, 0 < ε < 1, 0 < δ < 1

(48)

then the perturbed Critic gradient uploaded by each DC in one communication round satisfies (

ε

,

δ

)-differential privacy.

Proof.

According to Proposition 1, the l₂-sensitivity

Δ_{2}

of the clipped uploaded gradient is upper bounded by 2C. By the Gaussian mechanism, adding zero-mean Gaussian noise with covariance

σ^{2} I

ensures (

ε

,

δ

)-differential privacy when

σ

satisfies the above condition. This proves Proposition 2. □

Substituting the sensitivity bound from Proposition 1 into the Gaussian mechanism yields a conservative per-round privacy budget of

ε = \frac{2 C \sqrt{2 \ln (1.25 / δ)}}{σ}

(49)

This result provides an explicit relationship among the privacy budget (

ε

,

δ

), the clipping threshold

C

, and the noise intensity

σ

. A smaller ε corresponds to stronger privacy protection, but usually requires a larger

σ

, which may increase the variance of the aggregated gradient.
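
The relationship in Eqs. (48) and (49) can be evaluated numerically as sketched below. The function names are hypothetical, and the conservative sensitivity bound 2C from Proposition 1 is substituted for Δ₂.

```python
import math


def gaussian_sigma(epsilon: float, delta: float, clip_norm: float) -> float:
    """Smallest noise intensity satisfying Eq. (48) with sensitivity bound 2C."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("Eq. (48) requires 0 < epsilon < 1 and 0 < delta < 1")
    sensitivity = 2.0 * clip_norm
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def per_round_epsilon(sigma: float, delta: float, clip_norm: float) -> float:
    """Per-round privacy budget implied by a given noise intensity, Eq. (49)."""
    return 2.0 * clip_norm * math.sqrt(2.0 * math.log(1.25 / delta)) / sigma
```

If `per_round_epsilon` returns a value of 1 or more, the precondition 0 < ε < 1 of Eq. (48) is not met, and the result should not be read as a guarantee under this form of the Gaussian mechanism.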

Furthermore, let

M

denote the upload mechanism consisting of gradient clipping and Gaussian perturbation, and let

E

denote any event observable by the server. For any neighboring datasets

S \sim S^{'}

, we have

|\Pr (M (S) \in E) - \Pr (M (S^{'}) \in E)| \leq e^{ε} - 1 + δ

(50)

Thus, the server’s ability to distinguish neighboring datasets based on uploaded gradients is bounded by

ε

and

δ

, which limits the advantage of membership inference attacks. Moreover, gradient clipping limits the maximum contribution of local updates, and Gaussian perturbation weakens the deterministic mapping between local data and uploaded gradients, thereby mitigating the risks of gradient inversion and membership inference attacks [24]. In addition, since the uploaded gradient contains independent Gaussian perturbation,

E [{‖\tilde{\nabla} L^{d, VF} (ϕ) - clip (\nabla L^{d, VF} (ϕ), C)‖}_{2}^{2}] = p σ^{2}

(51)

where

p = \dim (ϕ)

denotes the dimension of the Critic parameter vector. This indicates that the unperturbed clipped gradient cannot be exactly recovered from a single noisy upload, thereby reducing the risk of gradient inversion attacks. Moreover, Fed-AdaPPO uploads only Critic gradients, while raw task data, task queues, user information, and Actor policy parameters remain local, further reducing the possibility of sensitive scheduling information leakage.

Next, the influence of server-side noise aggregation on training stability is analyzed. Since the Gaussian noises added by different DCs are independent, the equivalent noise after server-side averaging is

{\bar{ξ}}_{r} = \frac{1}{D} \sum_{d = 1}^{D} ξ_{d}^{r}

(52)

which satisfies

{\bar{ξ}}_{r} \sim N (0, \frac{σ^{2}}{D} I)

(53)

Therefore, as the number of participating DCs D increases, the variance of the aggregated noise decreases in proportion to 1/D. With more participating DCs, the disturbance that privacy noise introduces into the global Critic update direction is partially offset by averaging, which improves training stability. Since each DC only uploads its Critic gradient and receives the updated Critic parameters in each communication round, with

p = \dim (ϕ)

, the per-round communication complexity and server-side aggregation complexity are both

O (D p)

.
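
The 1/D scaling in Eq. (53) can be checked with a small Monte Carlo experiment on a single gradient coordinate. The sketch below is illustrative only; the function names, the trial count, and the seed are arbitrary assumptions.

```python
import random


def averaged_noise_variance(sigma: float, num_clients: int) -> float:
    """Per-coordinate variance of the averaged noise predicted by Eq. (53)."""
    return sigma ** 2 / num_clients


def empirical_averaged_variance(sigma: float, num_clients: int,
                                trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the variance of the mean of D independent noises."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(trials):
        mean_noise = sum(rng.gauss(0.0, sigma) for _ in range(num_clients)) / num_clients
        total_sq += mean_noise ** 2  # the mean is zero, so this estimates the variance
    return total_sq / trials
```

Comparing the two functions for increasing `num_clients` shows the empirical variance tracking σ²/D.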

Proposition 3.

Under Assumption 1 and an appropriately selected Critic learning rate α, the noisy federated Critic update in Fed-AdaPPO satisfies the following average first-order convergence bound:

\frac{1}{R} \sum_{r = 0}^{R - 1} E [{‖\nabla L^{VF} (ϕ_{r})‖}_{2}^{2}] \leq \frac{4 (L^{VF} (ϕ_{0}) - L_{\inf}^{VF})}{R α} + 4 c_{1} B_{het}^{2} + 4 c_{2} α (ν^{2} + \frac{p σ^{2}}{D})

(54)

where

L_{\inf}^{VF}

is the lower bound of the global Critic loss function, and

c_{1}

and

c_{2}

are positive constants independent of the communication round.

Proof.

Using the aggregated gradient, the Critic parameter update at communication round r can be written as follows:

ϕ_{r + 1} = ϕ_{r} - α \tilde{\nabla} L^{Fed, VF} (ϕ_{r})

(55)

Considering client heterogeneity, clipping bias, stochastic sampling error, and differential-privacy noise, the update can be expressed as follows:

ϕ_{r + 1} = ϕ_{r} - α [\nabla L^{VF} (ϕ_{r}) + b_{r} + ζ_{r} + {\bar{ξ}}_{r}]

(56)

By the L-smoothness of

L^{VF} (ϕ)

, we have

L^{VF} (ϕ_{r + 1}) \leq L^{VF} (ϕ_{r}) + 〈\nabla L^{VF} (ϕ_{r}), ϕ_{r + 1} - ϕ_{r}〉 + \frac{L}{2} {‖ϕ_{r + 1} - ϕ_{r}‖}_{2}^{2}

(57)

Substituting the update rule into the above inequality and taking conditional expectation under the bounded-bias and bounded-variance conditions in Assumption 1 yield

E [L^{VF} (ϕ_{r + 1}) ∣ F_{r}] \leq L^{VF} (ϕ_{r}) - \frac{α}{4} {‖\nabla L^{VF} (ϕ_{r})‖}_{2}^{2} + c_{1} α B_{het}^{2} + c_{2} α^{2} (ν^{2} + \frac{p σ^{2}}{D})

(58)

Taking the total expectation, summing the above inequality over r = 0, …, R − 1 so that the loss terms telescope, using

L^{VF} (ϕ) \geq L_{\inf}^{VF}

, and dividing both sides by Rα/4 gives the bound in Proposition 3. This completes the proof. □

Proposition 3 shows that the Critic update of Fed-AdaPPO converges to a first-order stationary neighborhood. The size of this neighborhood is jointly determined by the heterogeneity and clipping bias

B_{het}^{2}

, the stochastic gradient variance

ν^{2}

, and the differential-privacy noise term

p σ^{2} / D

. In particular, as D increases, the privacy-noise term decreases, indicating that multi-DC aggregation can alleviate the negative influence of noisy gradients on training stability.
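
To make the roles of the three terms concrete, the right-hand side of Eq. (54) can be evaluated for given constants. The function below is a sketch with hypothetical argument names; the constants c₁ and c₂ are not specified numerically in the paper, so any values passed in are placeholders.

```python
def stationarity_bound(rounds: int, alpha: float, init_gap: float,
                       c1: float, b_het_sq: float, c2: float, nu_sq: float,
                       dim: int, sigma: float, num_clients: int) -> float:
    """Right-hand side of Eq. (54); init_gap stands for L(phi_0) - L_inf."""
    optimisation_term = 4.0 * init_gap / (rounds * alpha)  # vanishes as R grows
    bias_term = 4.0 * c1 * b_het_sq                        # heterogeneity and clipping
    noise_term = 4.0 * c2 * alpha * (nu_sq + dim * sigma ** 2 / num_clients)
    return optimisation_term + bias_term + noise_term
```

Only the first term depends on the number of rounds R, so the remaining two terms set the size of the stationary neighborhood, and only the privacy-noise part shrinks as D increases.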

The Actor network is updated locally through the clipped PPO objective, and its parameters are not involved in federated aggregation. Since the clipped PPO objective constrains the magnitude of each policy update, the Actor update can be regarded as local policy improvement based on the stabilized Critic estimate. The training process of Fed-AdaPPO can therefore be viewed as a privacy-constrained, two-level stochastic optimization: the Critic layer learns a collaborative value function through noisy federated gradient aggregation across DCs, and the Actor layer performs policy optimization locally.

3. Results and Discussion

3.1. Experimental Parameter Settings

To replicate the load distribution patterns observed in practical environments, the initial resource utilization of each DC is generated by sampling from the statistical distribution of the Alibaba Cluster-trace-v2018 dataset [25]. For renewable generation, the open dataset released by the Belgian transmission system operator Elia is adopted [26,27]. The original sampling interval is 15 min. To match the control time step, cubic spline interpolation is applied to obtain data with a 1 min resolution. To improve reproducibility and avoid information leakage in time-series data, the preprocessed source–load data are segmented into daily scenarios, each consisting of 1440 time steps with a 1 min resolution. These daily scenarios are then divided chronologically into training and testing sets with a ratio of 8:2. After training, three representative DC scenarios are selected from the testing set for validation. Government DC 1 exhibits a tidal load pattern characterized by high daytime demand and low nighttime demand, corresponding to working hours. AI inference DC 2 exhibits strong randomness driven by real-time requests. Internet DC 3 is driven by online entertainment services and shows an evening peak accompanied by random fluctuations. Considering differences in construction cycle and geographical resources among the three DCs, the renewable energy penetration levels are set to 20%, 30%, and 0%, respectively. The renewable generation profiles of the two DCs with renewable generation are shown in Figure 4.
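
The segmentation and chronological split described above can be sketched as follows. The function name is hypothetical, the interpolation step is omitted, and dropping an incomplete trailing day is an assumption not stated in the paper.

```python
from typing import List, Sequence, Tuple


def split_daily_scenarios(series: Sequence[float], steps_per_day: int = 1440,
                          train_ratio: float = 0.8
                          ) -> Tuple[List[Sequence[float]], List[Sequence[float]]]:
    """Cut a 1 min series into daily scenarios and split them in time order."""
    n_days = len(series) // steps_per_day  # an incomplete trailing day is dropped
    days = [series[i * steps_per_day:(i + 1) * steps_per_day] for i in range(n_days)]
    n_train = int(n_days * train_ratio)
    # No shuffling: earlier days train, later days test, so no future data leaks back
    return days[:n_train], days[n_train:]
```

Keeping the split chronological is what prevents test-period information from reaching the training set.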

All simulations are conducted on a computer equipped with an AMD Ryzen 7 9700X processor (Advanced Micro Devices, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 5060 Ti graphics card (NVIDIA Corporation, Santa Clara, CA, USA). The software environment is based on Python 3.10, and the model is implemented using the PyTorch 2.11.0 framework. The task parameters are listed in Table 1. The time constants of the two-stage filter are set to 20 min and 120 min, respectively. The control step size of the server cluster is 1 min, and that of the UPS battery bank is 5 min. The UPS battery parameters are listed in Table 2. The Fed-AdaPPO parameters are listed in Table 3. In the repeated-run experiments, the training and testing sets remain fixed, and only the random seed is changed.

3.2. Comparison of Multiple Algorithms

To evaluate the performance of Fed-AdaPPO on the DC tie-line power smoothing problem, comparative experiments were conducted against federated Soft Actor-Critic (Fed-SAC) [28], Non-federated AdaPPO, Non-federated PPO [29], Multi-Agent PPO (MAPPO) [30], Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [31], and the first-come-first-served (FCFS) algorithm [32].

Figure 5 presents the reward curves of the different algorithms. Taking DC 1 as an example, Non-federated PPO converges only after 4000 training episodes and exhibits the most severe oscillations. Non-federated AdaPPO, which incorporates an adaptive exploration mechanism, dynamically balances exploration and exploitation, effectively avoiding prolonged stagnation and significantly accelerating convergence. Fed-AdaPPO further outperforms both non-federated methods in convergence speed and stability, showing a clear upward trend after approximately 500 episodes and converging by approximately 2000 episodes. This improvement arises because collaborative Critic-gradient aggregation compensates for the limited sample diversity of single-node training while preserving data privacy. By contrast, MAPPO converges more slowly and to a lower final reward than Fed-AdaPPO across all three DCs, with noticeable oscillations in DC 2 and DC 3. MAPPO learns a single global policy shared across all DCs. Under the substantial heterogeneity among the three DCs, which span distinct load patterns and renewable penetration levels, a single global policy cannot adapt simultaneously to all operating conditions, producing compromised and unstable performance in individual DCs. MADDPG performs the worst among all learning-based methods, with rewards remaining low and highly oscillatory throughout training and no clear convergence trend. Its deterministic policy is poorly suited to heterogeneous and stochastic scheduling environments, and its off-policy experience replay amplifies instability under cross-DC variability.

To improve the statistical reliability of the reported results, each learning-based method is evaluated over six independent runs under different random seeds, and the main performance metrics are reported as mean ± standard deviation. The corresponding 95% confidence intervals are calculated using the t-distribution, and statistical significance is further examined using the Wilcoxon signed-rank test.

Table 4 compares the smoothing performance of different scheduling algorithms in the three DC scenarios based on six independent runs. In DC 1, Fed-AdaPPO reduces the standard deviation of tie-line power from 10.988 kW before optimization to 9.674 ± 0.017 kW, corresponding to a reduction of 11.96%. Compared with Fed-SAC, Non-federated AdaPPO, Non-federated PPO, MAPPO, MADDPG, and FCFS, the standard deviation is further reduced by 1.29%, 0.63%, 3.53%, 7.53%, 5.76%, and 8.45%, respectively. Notably, MAPPO and MADDPG exhibit large run-to-run variability in DC 1, with standard deviations of ±0.845 kW and ±0.589 kW across six runs, far exceeding the ±0.017 kW of Fed-AdaPPO. This instability reflects the difficulty that a single global policy faces in maintaining consistent performance across heterogeneous DC environments.

In DC 2, which exhibits the most severe fluctuations, Fed-AdaPPO reduces the standard deviation from 15.980 kW to 13.359 ± 0.145 kW, a reduction of 16.40%. Fed-SAC, Non-federated AdaPPO, Non-federated PPO, MAPPO, and MADDPG yield standard deviations of 13.723, 13.812, 13.886, 14.762, and 14.864 kW, respectively. Fed-AdaPPO thus achieves additional reductions of 2.65%, 3.28%, 3.80%, 9.50%, and 10.12% over these baselines. The gap is particularly pronounced for MAPPO and MADDPG, which reduce the standard deviation by only 7.62% and 6.98% from the pre-optimization baseline, compared to 16.40% for Fed-AdaPPO. Their run-to-run variability is also substantially larger, with standard deviations exceeding ±1.3 kW across six runs. This poor performance stems from the fundamental mismatch between the single global policy design of CTDE methods and the pronounced heterogeneity in load patterns and renewable penetration across DCs.

In DC 3, Fed-AdaPPO reduces the standard deviation from 12.021 kW to 8.609 ± 0.020 kW, a reduction of 28.38%. Compared with Fed-SAC, Non-federated AdaPPO, Non-federated PPO, MAPPO, MADDPG, and FCFS, the standard deviation is further reduced by 2.51%, 1.40%, 6.65%, 3.33%, 4.16%, and 17.66%, respectively. Across all three DCs, Fed-AdaPPO consistently achieves the lowest average delay while delivering the best smoothing performance. The underperformance of MAPPO and MADDPG, together with their high run-to-run variance, confirms that CTDE methods relying on a single global policy are ill-suited to multi-DC scheduling under strong cross-DC heterogeneity. In contrast, Fed-AdaPPO trains DC-specific Actor policies locally while sharing only Critic gradients, allowing each DC to develop a policy adapted to its own operating characteristics.

From a statistical perspective, the corresponding 95% confidence intervals of Fed-AdaPPO remain relatively narrow, namely, [9.656, 9.692] kW, [13.208, 13.511] kW, and [8.587, 8.630] kW for the tie-line power standard deviation, and [5.590, 5.659] min, [5.352, 5.588] min, and [5.453, 5.532] min for the average delay in DC 1, DC 2, and DC 3, respectively. Moreover, the Wilcoxon signed-rank test based on the six repeated runs shows that the improvements of Fed-AdaPPO over Fed-SAC, Non-federated AdaPPO, Non-federated PPO, MAPPO, and MADDPG are statistically significant in both tie-line power standard deviation and average delay for all three DCs, with all p-values equal to 0.03125 (<0.05). These results confirm that the superior performance of Fed-AdaPPO is not an artifact of a particular random initialization but persists across repeated experiments. Since FCFS and the pre-optimization case are deterministic, they are used only for descriptive comparison and are not included in the repeated-run significance test.
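
As a rough check of how such intervals arise, the sketch below recomputes a t-based 95% confidence interval from a reported mean and standard deviation and gives the smallest two-sided p-value an exact Wilcoxon signed-rank test can return for n pairs without ties or zero differences. The function names are hypothetical, and the pairing of runs by random seed is an assumption; the critical value 2.5706 is the standard two-sided 95% t quantile for five degrees of freedom.

```python
import math
from typing import Tuple

T_CRIT_95 = {5: 2.5706}  # two-sided 95% Student-t critical value, keyed by degrees of freedom


def t_confidence_interval(mean: float, sd: float, n: int) -> Tuple[float, float]:
    """95% confidence interval of the mean from n runs with sample standard deviation sd."""
    half_width = T_CRIT_95[n - 1] * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def min_wilcoxon_p(n: int) -> float:
    """Smallest two-sided exact p-value for n pairs: all differences share one sign."""
    return 2.0 / 2 ** n
```

Applied to the DC 1 result (9.674 ± 0.017 kW, six runs), the interval agrees with the reported [9.656, 9.692] kW to three decimals, and `min_wilcoxon_p(6)` equals 0.03125, so the reported p-value is the smallest one attainable with six paired runs.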

3.3. Smoothing Effect Analysis

Figure 6 shows the tie-line power trajectories of the three DCs after optimization using the proposed method. The light-blue solid curve represents the initial tie-line power, which exhibits pronounced fluctuations and high-frequency noise owing to the combined uncertainty of user behavior and the stochasticity of wind and photovoltaic generation. After the initial tie-line power is processed by the first-stage filter, the first-level target tie-line power is obtained, as shown by the red curve, based on which the regulation target of each server cluster is determined. The proposed method optimizes server power by dynamically adjusting the execution timing of delay-tolerant tasks. As a result, the post-response tie-line power, shown by the dark-blue curve, can effectively track the target, and high-frequency fluctuations are significantly suppressed.

The first-level target tie-line power is then passed through the second-stage filter to generate the second-level target tie-line power, shown by the yellow curve, which serves as the regulation reference for the UPS battery bank. Under the SOC and charge/discharge cycle constraints, the battery bank participates in regulation by adjusting its charging and discharging power in real time. After the two-stage coordinated control, the tie-line power, shown by the green curve, becomes significantly smoother. The standard deviation of tie-line power in the three DCs decreases from 10.988 kW, 15.980 kW, and 12.021 kW to 7.320 kW, 9.042 kW, and 6.570 kW, respectively, corresponding to reductions of 33.4%, 43.4%, and 45.3%. These results verify the effectiveness of the proposed method.
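
The cascade that produces the two targets can be sketched with a discrete first-order low-pass filter. The backward-Euler discretization used below, the initialization at the first sample, and the function names are assumptions; the paper's exact filter equations are given in an earlier section and may differ in form.

```python
from typing import List, Sequence, Tuple


def low_pass(signal: Sequence[float], time_constant: float, dt: float = 1.0) -> List[float]:
    """First-order low-pass filter, y_t = y_{t-1} + dt / (T + dt) * (x_t - y_{t-1})."""
    gain = dt / (time_constant + dt)
    y = signal[0]  # assumption: the filter state starts at the first sample
    out = []
    for x in signal:
        y = y + gain * (x - y)
        out.append(y)
    return out


def two_stage_targets(p_tie: Sequence[float], t1: float = 20.0, t2: float = 120.0,
                      dt: float = 1.0) -> Tuple[List[float], List[float]]:
    """First-level target for the server cluster, second-level target for the UPS."""
    first = low_pass(p_tie, t1, dt)
    second = low_pass(first, t2, dt)
    return first, second
```

The defaults of 20 min and 120 min match the time constants stated in the experimental settings, with a 1 min sampling step.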

To quantify the smoothing performance, the tie-line power fluctuation rate is introduced and defined as follows:

Δ P_{TL, t}^{d} = \frac{d P_{TL, t}^{d}}{d t} \approx \frac{P_{TL, t + 1}^{d} - P_{TL, t}^{d}}{Δ t}

(59)
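
In discrete time, Eq. (59) is a forward difference of the tie-line power series. A minimal sketch, with a hypothetical function name and a default step of 1 min:

```python
from typing import List, Sequence


def fluctuation_rate(p_tie: Sequence[float], dt: float = 1.0) -> List[float]:
    """Forward-difference fluctuation rate in kW/min for a power series in kW."""
    return [(nxt - cur) / dt for cur, nxt in zip(p_tie, p_tie[1:])]
```

The output has one element fewer than the input, since the last sample has no successor.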

Taking DC 1 as an example, Figure 7 shows that the initial tie-line power fluctuation rate remains around ±10 kW/min during the control period. After server-cluster response, the distribution contracts toward the zero axis, indicating a substantial reduction in fluctuation rate and an initial suppression of severe fluctuations. After further introducing UPS-based energy storage regulation, the fluctuation rate is concentrated near zero, demonstrating that the proposed method can significantly reduce both the magnitude and uncertainty of tie-line power fluctuations.

3.4. Assessment of Factors Affecting the Smoothing Performance

3.4.1. Impact of Renewable Energy Penetration on Smoothing Performance

To evaluate the applicability of the proposed strategy in future green computing networks, DC 2, which exhibits the largest fluctuations, is selected as an example. With reference to the national target of achieving a green electricity share of more than 80% at national computing hub nodes, two additional scenarios with renewable energy penetration levels of 60% and 80% are considered to emulate high-penetration conditions in the deep decarbonization stage.

The simulation results are shown in Figure 8a. As the renewable energy penetration increases further, tie-line power fluctuations become significantly more pronounced. In particular, under the extreme 80% penetration scenario, represented by the dark red curve, the tie-line power becomes negative during periods of peak renewable output, approximately from 700 min to 900 min. This indicates that renewable generation exceeds the DC load demand, and the system enters a reverse power flow mode. Even under this condition, the proposed control strategy maintains a satisfactory smoothing effect, enabling the tie-line power to transition smoothly from drawing power from the grid to injecting surplus power into it. This verifies the bidirectional regulation capability of the system under high renewable penetration. Such a smooth transition helps mitigate the risks of voltage-limit violation and frequency oscillation caused by reverse power flow, supporting the robustness and applicability of the proposed method in future deep decarbonization scenarios.

As further shown in Figure 8b, after applying the proposed multi-time-scale smoothing strategy, tie-line power fluctuations are effectively suppressed under all penetration scenarios. Even in the 80% high-penetration case, the final fluctuation rate remains concentrated around zero. These results indicate that the proposed method is robust to rising renewable penetration and can provide reliable tie-line power control for green DC development.

3.4.2. Impact of the Proportion of Delay-Tolerant Tasks on Smoothing Performance

DC workloads are diverse, and the delay sensitivity of different task types directly determines the dispatchable potential of the load. To verify the effectiveness of the proposed method in exploiting load-side flexibility, DC 2 is again taken as an example. The proportion of delay-tolerant tasks is reduced from the baseline value of 70% to 30%, while keeping the proportions of the three task categories unchanged, thereby simulating a service scenario with stricter real-time requirements.

Table 5 presents the smoothing performance after server-cluster response under different proportions of delay-tolerant tasks. When the proportion of delay-tolerant tasks decreases from 70% to 30%, the number of schedulable delay-tolerant tasks is reduced, causing the standard deviation of tie-line power to increase from 13.203 kW to 13.798 kW and the average delay to rise to 5.55 min. Compared with the pre-optimization standard deviation of 15.980 kW, the value is still reduced by 13.7%. This demonstrates that Fed-AdaPPO can effectively perceive load characteristics and maintain relatively stable smoothing performance even when the scheduling freedom is restricted.

3.4.3. Impact of Two-Stage Filtering Time Constants on Smoothing Performance

To evaluate the influence of the two-stage filtering parameters on the hierarchical regulation performance, a sensitivity analysis is conducted for the first-stage time constant

T_{1}

and the second-stage time constant

T_{2}

. Since

T_{1}

mainly affects the response of the server cluster, while

T_{2}

determines the UPS-based compensation of the remaining low-frequency power deviations, the two parameters are tested separately. Specifically,

T_{2}

is fixed when analyzing

T_{1}

, and

T_{1}

is fixed when analyzing

T_{2}

.

Table 6 presents the standard deviation of tie-line power and the average delay after the server-cluster response under different

T_{1}

values, with

T_{2}

fixed at 120 min. When

T_{1} = 20

min, the standard deviations of tie-line power for DC 1, DC 2, and DC 3 are 9.652 kW, 13.203 kW, and 8.607 kW, respectively, which are lower than those obtained under the other tested settings. Meanwhile, the corresponding average delays are 5.80 min, 5.43 min, and 5.50 min, respectively. When

T_{1}

is reduced to 10 min or increased to 40 min, the standard deviations increase in all three DCs. This indicates that an excessively small first-stage time constant may retain more short-term fluctuations in the target signal, increasing the regulation burden on the server cluster, whereas an excessively large value may weaken the timely response of the server cluster to adjustable workloads. Therefore,

T_{1}

= 20 min provides a better trade-off between tie-line power smoothing and response delay in the studied scenario.

Table 7 presents the standard deviation of tie-line power after UPS response under different

T_{2}

values, with

T_{1}

fixed at 20 min. The results show that increasing

T_{2}

from 60 min to 120 min significantly reduces the standard deviation of tie-line power in all three DCs, indicating that a moderate increase in the second-stage filtering time constant can enhance the UPS compensation for low-frequency residual power deviations. When

T_{2}

is further increased to 240 min, the standard deviations of DC 2 and DC 3 continue to decrease, whereas that of DC 1 increases from 7.320 kW to 8.241 kW. This is mainly because a larger

T_{2}

transfers more low-frequency power deviations to the UPS layer, leading to larger regulation amplitudes, higher energy throughput, and wider SOC variations. When the UPS regulation capability is limited by its capacity, SOC safety range, and charge/discharge power constraints, part of the regulation demand cannot be fully tracked, and the smoothing effect may no longer improve. Therefore, although an excessively large

T_{2}

may further reduce tie-line power fluctuations in some DCs, it also places greater engineering demands on UPS operation. Considering tie-line power smoothing performance, UPS regulation capability, and battery operating constraints,

T_{1}

= 20 min and

T_{2}

= 120 min are selected as the benchmark parameters for the subsequent simulations.

3.5. Computational Efficiency and Real-Time Feasibility Analysis

To clarify the computational cost and real-time feasibility of the proposed framework, the runtime and communication overhead are further evaluated. After offline training, the online operation only involves Actor-network inference for server-cluster scheduling and a rule-based greedy dispatch for the UPS layer. On the test platform, the average online Actor inference time is 0.086 ms per decision step. For the UPS layer, the greedy dispatch time for one DC over a 288-point daily sequence is 3.89 ms, corresponding to an average of 0.014 ms per 5 min control step. These results indicate that the online computation is much shorter than the 1 min server-cluster control interval and the 5 min UPS control interval.

To further assess practical deployment feasibility, the federated communication transmission time is estimated. Based on the parameter scale of the Critic network, the bidirectional transmission time per DC per aggregation round is approximately 85.36 ms, 8.54 ms, and 0.85 ms under communication bandwidths of 100 Mbps, 1 Gbps, and 10 Gbps, respectively. These values are all much shorter than the adopted 1 min server-cluster control step and the 5 min UPS control step. Therefore, under the current multi-time-scale scheduling framework, federated communication is not expected to become a practical real-time bottleneck.
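
The transmission-time estimate is a payload-over-bandwidth calculation. In the sketch below, the bidirectional payload of 8.536 Mbit is inferred from the reported 85.36 ms at 100 Mbps and is not stated explicitly in the paper; the function name is hypothetical, and protocol overhead and latency are ignored.

```python
def transfer_time_ms(payload_bits: float, bandwidth_bps: float) -> float:
    """Ideal transmission time in milliseconds, ignoring latency and protocol overhead."""
    return payload_bits / bandwidth_bps * 1e3


# Inferred bidirectional payload per DC per round (assumption, see the note above)
PAYLOAD_BITS = 8.536e6

for bandwidth in (100e6, 1e9, 10e9):
    print(f"{bandwidth / 1e6:.0f} Mbps: {transfer_time_ms(PAYLOAD_BITS, bandwidth):.2f} ms")
```

Under this inferred payload, the same function reproduces the three reported values after rounding.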

The above estimates assume ideal network conditions. In practice, stochastic communication delays may cause individual data centers to upload stale Critic gradients computed from earlier parameters, introducing a staleness bias. As the number of participating data centers D grows, two opposing effects emerge. A larger D reduces the privacy-noise term pσ²/D through averaging, and the impact of any single dropout shrinks. At the same time, straggler probability rises with fleet size, and staleness variance across clients may degrade the aggregated gradient. For moderate-scale deployments, where inter-data-center latencies are on the order of milliseconds and the control steps adopted here are 1 min and 5 min, communication delays are unlikely to pose a practical bottleneck.

3.6. Impact of Privacy Protection Levels on Scheduling Performance

To quantify the privacy-utility trade-off, the Gaussian noise standard deviation σ was varied while all other training parameters and system settings were held fixed.

Table 8 reports the scheduling performance of Fed-AdaPPO under varying Gaussian noise intensities σ. Across all three data centers, increasing σ from 1 to 2.5 produced only marginal changes in tie-line power standard deviation and average delay, indicating that the proposed method retains effective scheduling under moderate privacy protection. When σ rose to 5 and 10, both metrics degraded more markedly: stronger noise injection diluted the informativeness of Critic gradient aggregation, weakening policy learning and final scheduling quality. These results confirm a clear privacy-utility trade-off, with σ = 2.5 offering a practical compromise between privacy strength and scheduling performance.

4. Conclusions

This paper proposes a Fed-AdaPPO-based multi-time-scale tie-line power smoothing method for multiple DCs under high renewable energy penetration. A two-stage first-order low-pass filter decomposes tie-line power fluctuations into high- and low-frequency components, which are regulated at different time scales through coordinated server-cluster scheduling and UPS energy storage control. With this coordinated regulation, the standard deviation of tie-line power in the three DCs is reduced by 33.4%, 43.4%, and 45.3%, respectively. Fed-AdaPPO outperforms all baseline methods, including federated, non-federated, and multi-agent reinforcement learning approaches, in both smoothing performance and service quality. In the server-cluster response stage, the standard deviation is reduced by 12.2%, 17.4%, and 28.4%, while the average task delay remains the lowest among all compared methods. The method also remains effective under challenging conditions, including renewable energy penetration up to 80% and a reduced proportion of delay-tolerant tasks.
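
The reduction percentages quoted above can be reproduced from the tabulated standard deviations. The sketch below assumes that the quoted figures correspond to the "Before optimization" row of Table 4 and the T₁/T₂ = 20/120 rows of Tables 6 and 7. That pairing is inferred from the numbers rather than stated explicitly, and the helper name is hypothetical.

```python
# Tie-line power standard deviations (kW) copied from the tables.
before = {"DC 1": 10.988, "DC 2": 15.980, "DC 3": 12.021}       # Table 4, before optimization
after_servers = {"DC 1": 9.652, "DC 2": 13.203, "DC 3": 8.607}  # Table 6, T1/T2 = 20/120
after_ups = {"DC 1": 7.320, "DC 2": 9.042, "DC 3": 6.570}       # Table 7, T1/T2 = 20/120


def reduction_pct(baseline: float, value: float) -> float:
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


server_stage = {dc: round(reduction_pct(before[dc], after_servers[dc]), 1) for dc in before}
full_scheme = {dc: round(reduction_pct(before[dc], after_ups[dc]), 1) for dc in before}

print("Server-cluster stage:", server_stage)
print("Server cluster + UPS:", full_scheme)
```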

A limitation of the present study is that validation remains simulation-based. Although the Alibaba production traces and Elia operational data provide representative test scenarios, and the three data centers examined span distinct load patterns and renewable penetration levels, the sim-to-real transfer gap has not been quantified. Hardware-in-the-loop validation on a scaled-down testbed is a natural next step toward operational deployment. Future work will pursue two directions. First, the modeling fidelity of the proposed scheduling framework will be improved by incorporating asynchronous federated updates, communication delays, server heterogeneity, cooling dynamics, and battery thermal characteristics. Second, the sim-to-real transfer gap will be addressed through heterogeneous hardware adaptation, safety verification of learned scheduling policies, and coordination with existing power system dispatch cycles.

Author Contributions

Conceptualization, Q.L.; methodology, Q.L., J.Y. and X.F.; software, Q.L. and X.F.; validation, J.Y.; formal analysis, Q.L.; visualization, Q.L. and J.Y.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L., J.Y. and X.F.; supervision, X.F.; project administration, J.Y.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Smart Grid-National Science and Technology Major Project (2024ZD0800800).

Data Availability Statement

The Alibaba Cluster-trace-v2018 dataset used for workload modeling is publicly available at the Alibaba Cluster Data repository. The wind and photovoltaic generation datasets used for renewable generation modeling are publicly available from the Elia Open Data Portal. The processed data generated during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, M.; Porter, A.L. Can Nanogenerators Contribute to the Global Greening Data Centres? Nano Energy 2019, 60, 235–246.
Ding, Z.; Cao, Y.; Xie, L.; Lu, Y.; Wang, P. Integrated Stochastic Energy Management for Data Center Microgrid Considering Waste Heat Recovery. IEEE Trans. Ind. Appl. 2019, 55, 2198–2207.
Ding, Z.; Chen, S.; Sun, Y.; Shi, K.; Wang, J.; Chen, S.; Xiao, T.; Wang, Y.; Wei, X. Data Center Job Scheduling and Energy Management Under Uncertain Environments. IEEE Trans. Ind. Appl. 2025, 61, 5489–5500.
International Energy Agency. Key Questions on Energy and AI; International Energy Agency: Paris, France, 2026.
Ren, X.; Wang, J.; Hu, X.; Sun, Z.; Zhao, Q.; Chong, D.; Xue, K.; Yan, J. A Novel Demand Response-Based Distributed Multi-Energy System Optimal Operation Framework for Data Centers. Energy Build. 2024, 305, 113886.
An, M.; Han, X.; Lu, T. A Stochastic Model Predictive Control Method for Tie-Line Power Smoothing under Uncertainty. Energies 2024, 17, 3515.
Wu, D.; Guo, F.; Yao, Z.; Zhu, D.; Zhang, Z.; Li, L.; Du, X.; Zhang, J. Enhancing Reliability and Performance of Load Frequency Control in Aging Multi-Area Power Systems under Cyber-Attacks. Appl. Sci. 2024, 14, 8631.
Wen, G.; Xu, J.-Z.; Liu, Z.-W. Hierarchical Regulation Strategy for Smoothing Tie-Line Power Fluctuations in Grid-Connected Microgrids with Battery Storage Aggregators. IEEE Trans. Ind. Inform. 2024, 20, 12210–12219.
Tran, N.H.; Tran, D.H.; Ren, S.; Han, Z.; Huh, E.-N.; Hong, C.S. How Geo-Distributed Data Centers Do Demand Response: A Game-Theoretic Approach. IEEE Trans. Smart Grid 2015, 7, 937–947.
Cupelli, L.; Schutz, T.; Jahangiri, P.; Fuchs, M.; Monti, A.; Muller, D. Data Center Control Strategy for Participation in Demand Response Programs. IEEE Trans. Ind. Inform. 2018, 14, 5087–5099.
Yang, T.; Hou, Y.; Cai, S.; Yu, J.; Pen, H. Multi-Data Center Tie-Line Power Smoothing Method Based on Demand Response. IEEE Trans. Cloud Comput. 2024, 12, 983–995.
Shao, W.; Li, L.; Zhao, J. Carbon-Oriented Workload Spatial-Temporal Shifting Scheduling for Internet Data Centers. Electr. Power Syst. Res. 2026, 258, 113090.
Yang, L.; Hu, Z. Coordination of Generators and Energy Storage to Smooth Power Fluctuations for Multi-Area Microgrid Clusters: A Robust Decentralized Approach. IEEE Access 2021, 9, 12506–12520.
Siddesha, K.; Jayaramaiah, G.V.; Singh, C. A Novel Deep Reinforcement Learning Scheme for Task Scheduling in Cloud Computing. Clust. Comput. 2022, 25, 4171–4188.
Lou, J.; Tang, Z.; Jia, W. Energy-Efficient Joint Task Assignment and Migration in Data Centers: A Deep Reinforcement Learning Approach. IEEE Trans. Netw. Serv. Manag. 2023, 20, 961–973.
Chen, S.; Li, J.; Yuan, Q.; He, H.; Li, S.; Yang, J. Two-Timescale Joint Optimization of Task Scheduling and Resource Scaling in Multi-Data Center System Based on Multi-Agent Deep Reinforcement Learning. IEEE Trans. Parallel Distrib. Syst. 2024, 35, 2331–2346.
Alinezhadi, A.; Sheikholeslami, S.M.; Atapour, S.K.; Abouei, J.; Plataniotis, K.N. Intelligent Privacy-Preserving Demand Response for Green Data Centers. Electr. Power Syst. Res. 2023, 221, 109394.
Lin, W.-T.; Chen, G.; Zhou, X. Privacy-Preserving Federated Learning for Detecting False Data Injection Attacks on Power System. Electr. Power Syst. Res. 2024, 229, 110150.
Dayarathna, M.; Wen, Y.; Fan, R. Data Center Energy Consumption Modeling: A Survey. IEEE Commun. Surv. Tutor. 2016, 18, 732–794.
Flores-Martin, D.; Mahillo, M.; Lemus-Prieto, F.; Corral-García, J.; Rico-Gallego, J.A. Improving Energy Efficiency in a Data Center: PUE Analyzing and Tuning. In Proceedings of the 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Tromsø, Norway, 19 May 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–10.
Yuan, H.; Bi, J.; Zhou, M. Spatiotemporal Task Scheduling for Heterogeneous Delay-Tolerant Applications in Distributed Green Data Centers. IEEE Trans. Autom. Sci. Eng. 2019, 16, 1686–1697.
Jin, C.; Bai, X.; Yang, C.; Mao, W.; Xu, X. A Review of Power Consumption Models of Servers in Data Centers. Appl. Energy 2020, 265, 114806.
Boulmrharj, S.; Ouladsine, R.; NaitMalek, Y.; Bakhouya, M.; Zine-dine, K.; Khaidar, M.; Siniti, M. Online Battery State-of-Charge Estimation Methods in Micro-Grid Systems. J. Energy Storage 2020, 30, 101518.
Yang, T.; Feng, X.; Cai, S.; Niu, Y.; Pen, H. A Privacy-Preserving Federated Reinforcement Learning Method for Multiple Virtual Power Plants Scheduling. IEEE Trans. Circuits Syst. Regul. Pap. 2025, 72, 1939–1950.
Alibaba Group. Alibaba Cluster Trace Program: Cluster-Trace-V2018. 2018. Available online: https://github.com/alibaba/clusterdata/tree/master/cluster-trace-v2018 (accessed on 28 April 2026).
Elia Transmission Belgium SA. Wind Power Production Estimation and Forecast on Belgian Grid (Historical), Dataset Ods031; Elia Transmission Belgium: Brussels, Belgium.
Elia Transmission Belgium SA. Photovoltaic Power Production Estimation and Forecast on Belgian Grid (Historical), Dataset Ods032; Elia Transmission Belgium: Brussels, Belgium.
Moghaddasi, K.; Jurdak, R. An Energy-Aware Distributed Federated Soft Actor-Critic Framework for Intelligent Task Offloading in Vehicular Mobile Edge Computing Networks. Ad Hoc Netw. 2026, 180, 104043.
Ionuț-Cosmin, D.; Alexandrescu, B.; Constantinescu, R.-C. Energy-Efficient Task Scheduling in Data Centers Using Adaptive Deep Reinforcement Learning. In Proceedings of the 2025 10th International Conference on Energy Efficiency and Agricultural Engineering (EE & AE), Stara Zagora, Bulgaria, 5 November 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–5.
Wang, Z.; Xiao, F.; Ran, Y.; Li, Y.; Xu, Y. Scalable Energy Management Approach of Residential Hybrid Energy System Using Multi-Agent Deep Reinforcement Learning. Appl. Energy 2024, 367, 123414.
Harrold, D.J.B.; Cao, J.; Fan, Z. Renewable Energy Integration and Microgrid Energy Trading Using Multi-Agent Deep Reinforcement Learning. Appl. Energy 2022, 318, 119151.
Akbar, S.; Malik, S.U.R.; Choo, K.-K.R.; Khan, S.U.; Ahmad, N.; Anjum, A. A Game-Based Thermal-Aware Resource Allocation Strategy for Data Centers. IEEE Trans. Cloud Comput. 2021, 9, 845–853.

Figure 1. DC microgrid architecture.

Figure 2. Task type schematic diagram.

Figure 3. Framework of the Fed-AdaPPO algorithm for tie-line power smoothing in multiple DCs.

Figure 4. Renewable generation profiles of the two DCs. (a) DC 1; (b) DC 2.

Figure 5. Comparison of training reward curves. (a) DC 1; (b) DC 2; (c) DC 3.

Figure 6. Tie-line power trajectories of the three DCs. (a) DC 1; (b) DC 2; (c) DC 3.

Figure 7. Distribution of tie-line power fluctuations in the three DCs. (a) DC 1; (b) DC 2; (c) DC 3.

Figure 8. Tie-line power smoothing performance of DC 2 under different renewable energy penetration levels. (a) Tie-line power trajectory; (b) Distribution of tie-line power fluctuations (80% penetration).

Table 1. Task parameters.

Task Type	CPU Utilization per Server	Processing Time (min)	Allowable Delay Time (min)	Task Proportion
1	40%	8	14	10%
2	20%	5	12	20%
3	10%	3	8	40%
4	1%	1	0	30%

Table 2. UPS battery parameters.

Parameter	DC1	DC2	DC3
Number of battery groups	4	5	4
Capacity of each battery group $E_{UPS}^{d}$ /kWh	80	80	100
Charging coefficient $ψ_{c}^{d}$	0.95	0.95	0.95
Discharging coefficient $ψ_{e}^{d}$	1.05	1.05	1.05
Initial SOC $S_{0}^{d}$	0.60	0.60	0.60
Physical SOC range $[S_{\min}^{d}, S_{\max}^{d}]$	[0.20, 0.80]	[0.20, 0.80]	[0.20, 0.80]
Reserve SOC margin $S_{res}^{d}$	0.10	0.10	0.10
Engineering safety margin $S_{mar}^{d}$	0.05	0.05	0.05
Actual regulation SOC range $[{\underline{S}}^{d}, S_{\max}^{d}]$	[0.35, 0.80]	[0.35, 0.80]	[0.35, 0.80]
Maximum discharging power $P_{dis, \max}^{d}$ /kW	120	150	150
Maximum charging power $P_{ch, \max}^{d}$ /kW	100	120	120
Maximum power ramp rate $R_{UPS}^{d}$ /kW·min⁻¹	6.0	7.5	7.5
Maximum number of charge/discharge switching actions $K_{\max}^{d}$	24	24	24
Throughput-based degradation cost coefficient $c_{\deg}^{d}$ /CNY·kWh⁻¹	0.208	0.208	0.208
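
The SOC rows of Table 2 can be cross-checked with a short calculation. The sketch below assumes that the lower regulation bound is the physical minimum SOC plus the reserve margin plus the engineering safety margin (this reproduces the tabulated 0.35), that battery groups add linearly, and that conversion losses are ignored. The helper name is hypothetical, and the resulting energy windows are illustrative rather than values reported in the paper.

```python
# UPS parameters copied from Table 2.
ups = {
    "DC1": {"groups": 4, "kwh_per_group": 80.0},
    "DC2": {"groups": 5, "kwh_per_group": 80.0},
    "DC3": {"groups": 4, "kwh_per_group": 100.0},
}
S_MIN, S_MAX = 0.20, 0.80   # physical SOC range
S_RES, S_MAR = 0.10, 0.05   # reserve margin and engineering safety margin

# ASSUMPTION: lower regulation bound = physical minimum + reserve + safety margin.
s_lower = round(S_MIN + S_RES + S_MAR, 2)


def usable_energy_kwh(groups: int, kwh_per_group: float) -> float:
    """Energy window between the regulation bounds (losses ignored)."""
    return groups * kwh_per_group * (S_MAX - s_lower)


print(f"Lower regulation bound: {s_lower:.2f}")
for name, cfg in ups.items():
    window = usable_energy_kwh(cfg["groups"], cfg["kwh_per_group"])
    print(f"{name}: {window:.0f} kWh usable between SOC {s_lower:.2f} and {S_MAX:.2f}")
```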

Table 3. Fed-AdaPPO parameters.

Parameter	Value
Hidden layer dimension	256
Actor learning rate $β$	0.0001
Critic learning rate $α$	0.0001
Discount factor $γ$	0.7
GAE parameter $λ$	0.95
PPO clipping parameter $ε_{clip}$	0.2
Training batch size	144
UCB confidence parameter $c$	1
Number of participating DC clients $D$	3
Local training episode	5
Total training episodes	10,000
Gradient clipping threshold $C$	1
Differential privacy parameter $δ$	10⁻⁵
Gaussian noise standard deviation $σ$	2.5

Table 4. Comparison of smoothing performance after server-cluster response.

Method	DC 1 Std. of Tie-Line Power (kW)	DC 1 Average Delay (min)	DC 2 Std. of Tie-Line Power (kW)	DC 2 Average Delay (min)	DC 3 Std. of Tie-Line Power (kW)	DC 3 Average Delay (min)
Fed-AdaPPO	9.674 ± 0.017	5.625 ± 0.033	13.359 ± 0.145	5.470 ± 0.113	8.609 ± 0.020	5.493 ± 0.038
Fed-SAC	9.800 ± 0.027	5.755 ± 0.033	13.723 ± 0.046	5.590 ± 0.010	8.831 ± 0.052	5.701 ± 0.022
Non-federated AdaPPO	9.735 ± 0.018	5.696 ± 0.016	13.812 ± 0.015	5.627 ± 0.003	8.731 ± 0.065	5.641 ± 0.028
Non-federated PPO	10.028 ± 0.043	5.877 ± 0.037	13.886 ± 0.018	5.685 ± 0.020	9.222 ± 0.307	5.791 ± 0.049
MAPPO	10.461 ± 0.845	5.902 ± 0.071	14.762 ± 1.424	5.710 ± 0.166	8.905 ± 0.257	5.678 ± 0.090
MADDPG	10.265 ± 0.589	5.844 ± 0.091	14.864 ± 1.362	5.780 ± 0.209	8.982 ± 0.611	5.700 ± 0.190
FCFS	10.567	6.045	14.996	5.757	10.455	5.933
Before optimization	10.988	0	15.980	0	12.021	0

Table 5. Comparison of smoothing performance under different proportions of delay-tolerant tasks.

Proportion of Delay-Tolerant Tasks	Standard Deviation of Tie-Line Power (kW)	Average Delay (min)
30%	13.798	5.55
70%	13.203	5.43

Table 6. Comparison of smoothing performance after server-cluster response under different filtering time constants.

T₁/T₂ (min)	DC 1 Standard Deviation of Tie-Line Power (kW)	DC 1 Average Delay (min)	DC 2 Standard Deviation of Tie-Line Power (kW)	DC 2 Average Delay (min)	DC 3 Standard Deviation of Tie-Line Power (kW)	DC 3 Average Delay (min)
20/120	9.652	5.80	13.203	5.43	8.607	5.50
10/120	9.733	5.85	13.572	5.57	8.910	5.73
40/120	10.019	5.90	14.268	5.64	9.131	5.69

Table 7. Comparison of smoothing performance after UPS response under different filtering time constants.

T₁/T₂ (min)	DC 1 Standard Deviation of Tie-Line Power (kW)	DC 2 Standard Deviation of Tie-Line Power (kW)	DC 3 Standard Deviation of Tie-Line Power (kW)
20/60	7.866	11.079	7.024
20/120	7.320	9.042	6.570
20/240	8.241	6.433	5.898

Table 8. Comparison of smoothing performance under different privacy strengths.

σ	ε	DC 1 Standard Deviation of Tie-Line Power (kW)	DC 1 Average Delay (min)	DC 2 Standard Deviation of Tie-Line Power (kW)	DC 2 Average Delay (min)	DC 3 Standard Deviation of Tie-Line Power (kW)	DC 3 Average Delay (min)
1	7.76	9.673 ± 0.018	5.608 ± 0.036	13.331 ± 0.154	5.408 ± 0.140	8.607 ± 0.012	5.489 ± 0.062
2.5	3.88	9.674 ± 0.017	5.625 ± 0.033	13.359 ± 0.145	5.470 ± 0.113	8.609 ± 0.020	5.493 ± 0.038
5	1.55	9.721 ± 0.032	5.733 ± 0.108	13.848 ± 0.042	5.584 ± 0.045	8.739 ± 0.064	5.716 ± 0.051
10	0.78	10.009 ± 0.116	5.782 ± 0.056	13.859 ± 0.047	5.602 ± 0.028	9.112 ± 0.549	5.719 ± 0.256

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

