Article

A Multi-Timescale Cooperative Scheduling Method for Flexible Load in Power Distribution System Considering Dynamic Transformer Rating

1 Henan XJ Metering Co., Ltd., Xuchang 461000, China
2 School of Automation and Electrical Engineering, Zhongyuan University of Technology, Zhengzhou 450007, China
* Author to whom correspondence should be addressed.
Processes 2026, 14(10), 1584; https://doi.org/10.3390/pr14101584
Submission received: 5 March 2026 / Revised: 21 April 2026 / Accepted: 22 April 2026 / Published: 14 May 2026

Abstract

With the large-scale integration of new energy, electric vehicles, and other new loads, disorderly electricity consumption has led to surging peak loads and heightened overload risks for distribution transformers. Particularly in aging, high-density urban areas constrained by the cost and space limitations of upgrading distribution facilities, there is an urgent need to tap into the flexible load control potential of existing power distribution systems to ensure system safety. This paper proposes a multi-timescale cooperative scheduling framework for flexible loads in distribution systems, deeply integrating the dynamic load capacity of transformers with the dispatchable characteristics of flexible loads. First, a day-ahead scheduling layer based on multi-agent reinforcement learning is constructed to optimize electricity plans and smooth peak–valley loads in the distribution system. Second, a dynamic transformer-rating model for distribution transformers is established to uncover their dynamic load capabilities under varying environmental conditions. Finally, an intraday scheduling layer for flexible loads is developed. It dynamically matches the regulation demands of distribution transformers and flexible loads via real-time optimization of consumption strategies to address electricity price fluctuations and user behavior randomness. Case study results demonstrate that the proposed method effectively reduces power load fluctuations, ensuring the safe and stable operation of distribution and power supply systems.

1. Introduction

In recent years, the large-scale integration of distributed renewable energy and electric vehicles (EVs) has significantly amplified uncertainties on both the generation and demand sides of power systems [1,2,3]. This integration continuously widens the peak-to-valley load difference, posing severe challenges for the safe and stable operation of distribution networks [4,5,6,7,8,9]. The issue is particularly pronounced in aging, high-density urban areas constrained by limited physical space and prohibitive infrastructure-upgrading costs [10,11,12]. Traditional distribution systems struggle to manage the power fluctuations caused by these strong uncertainties, frequently resulting in transformer overloads. When distribution transformers operate under prolonged overload conditions, internal heat gradually accumulates, causing sudden hotspot temperature spikes. This thermal stress accelerates insulation aging, reduces equipment lifespan, and increases operational risks [13,14,15,16]. This challenge stems primarily from disorderly user behavior—such as uncoordinated EV charging overlapping with evening peak loads—and the inherent inability of land-scarce urban grids to rapidly adapt to new load characteristics.
To mitigate these risks without requiring massive capital investments, it is crucial to leverage the regulatory potential of flexible loads to smooth peak–valley fluctuations. Flexible loads, characterized by their responsive agility, serve as vital resources in modern power systems [17,18]. They provide value through active peak-shifting and avoidance mechanisms. As a highly scalable flexible resource, EVs represent a significant regulatory asset [19]. Because EV charging possesses inherent shiftability and transferability, intelligent management of EV clusters can redistribute charging demands across time and multiple adjacent distribution substations, effectively preventing localized transformer overloads [20,21,22,23,24,25,26,27,28].
While static mixed-integer programming and single-agent reinforcement learning have been employed to schedule EV charging, these approaches often struggle in modern grids [29,30,31,32,33]. Static optimization suffers from the curse of dimensionality when scaling to large EV fleets, and single-agent models fail to capture the highly competitive, non-stationary dynamics between independent charging stations vying for grid capacity [34,35]. In contrast, Multi-Agent Reinforcement Learning (MARL) provides a robust framework for handling distributed decision-making under uncertainty. Recent advancements demonstrate that MARL can effectively minimize charging costs while smoothing load profiles through Centralized Training with Decentralized Execution (CTDE) architectures [36,37]. However, a critical limitation in the current MARL literature is the reliance on static transformer capacity constraints. Treating transformer limits as rigid, worst-case seasonal boundaries forces conservative scheduling, leaving substantial operational headroom untapped. Dynamic Transformer Rating (DTR) leverages real-time ambient conditions to safely evaluate the true thermal capacity of transformers; yet, integrating DTR into model-free MARL introduces severe non-differentiable boundaries that can destabilize policy convergence.
To bridge this gap, this paper proposes a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) scheduling framework intrinsically coupled with DTR constraints. The primary advantage of our approach lies in its decentralized execution phase, which eliminates communication bottlenecks and safely unlocks hidden transformer capacity without necessitating hardware upgrades. Conversely, the main trade-off is the high computational and memory overhead required during the initial centralized training phase.
Therefore, we propose a multi-timescale flexible load coordination strategy that actively embeds DTR evaluation into the multi-agent decision-making process. While our previous preliminary work [29] established the static distribution network environment and basic mathematical constraints for EV modeling, it lacked adaptive capacity mechanisms. The core theoretical novelty of this manuscript is its integration of dynamic thermal boundaries into a CTDE-based potential game, ensuring that local policy updates mathematically converge to a Nash Equilibrium that aligns with global grid safety. The main contributions are as follows:
(1)
We fully consider the characteristics of the power distribution system and electric vehicles in adjacent substations. We established physical models for EV charging status and charging station status and constructed a day-ahead scheduling layer based on multi-agent reinforcement learning to optimize power consumption plans and smooth peak–valley loads in the power distribution system.
(2)
We explored the dynamic load capacity of distribution transformers under varying environmental conditions and developed an evaluation model for the dynamic load capacity of distribution transformers.
(3)
We constructed an intraday scheduling layer for flexible loads. By dynamically optimizing electricity consumption strategies in real time to address price fluctuations and user behavior randomness, it is possible to achieve dynamic matching between distribution transformers and flexible load regulation demands.
The remainder of this paper is organized as follows. Section 2 introduces the day-ahead flexible load dispatch model governed by the MADDPG algorithm, outlining the observation, action, and reward structures. Section 3 details the intraday DTR integration, formulating the thermal and physical constraints acting as the real-time clipping boundary. Section 4 presents a comprehensive case study on a modified IEEE 33-bus system, verifying the algorithm against baseline methods through quantitative analysis. Finally, Section 5 concludes the study and discusses future implications.

2. Flexible Load Dispatch Model for a Distribution System in the Day-Ahead Stage

Flexible loads represented by electric vehicles involve multiple entities, each with distinct objectives. Users seek lower charging costs and reduced queue times, while charging stations aim to maintain stable load levels to prevent distribution transformer overloads. Furthermore, EV charging behavior is deeply coupled with electricity prices and charging stations. To effectively address diverse EV charging patterns and coordinate the objectives of EVs and charging stations, this paper constructs a day-ahead flexible load scheduling model for power distribution systems based on multi-agent reinforcement learning. As shown in Figure 1, the reinforcement learning framework primarily includes the agent, reward, action, observation state, and environment.
In the figure above, Agent represents the decision-making entity, primarily referring to the electric vehicle agent and the charging station agent in this paper. Action reflects the Agent's behavior, representing the decision-making actions of electric vehicles and charging stations in different scenarios. Observation state denotes the subset of environmental information that serves as the basis for each entity's decisions. Reward reflects environmental feedback, guiding agents to optimize and accumulate benefits. When scheduling strategies effectively reduce EV charging costs and smooth out peak–valley differences at charging stations, each Agent's reward increases. Environment encompasses factors such as the charging station distribution. Through a distributed architecture and collaborative learning mechanisms, multi-agent reinforcement learning can reflect the behavioral decisions of diverse agents such as flexible loads and characterize the dynamic coupling relationships between electric vehicles and charging stations [16]. Consequently, it effectively enhances EV charging efficiency and reduces charging costs while mitigating load fluctuations in the distribution system. The relevant variables and their meanings are listed in Table 1.

2.1. Observation State Space Model

Flexible loads represented by electric vehicles are deeply coupled with users' daily habits, with peak activity concentrated between 8:00 and 10:00 AM and between 5:00 and 8:00 PM. Therefore, this paper employs a two-component (bimodal) mixed normal distribution to model electric vehicle charging behavior, with the primary model shown below.
f_a(t) = \frac{p_a}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(t - \mu_1)^2}{2\sigma^2} \right] + \frac{1 - p_a}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(t - \mu_2)^2}{2\sigma^2} \right], \qquad t_a \sim f_a(t)
where t_a represents the exact arrival time of the EV at the charging station. The parameters μ_1 and μ_2 denote the statistical temporal peaks of user charging behavior; specifically, μ_1 represents the morning commute arrival peak, while μ_2 captures the evening return peak (centered near 18:00). The variable σ denotes the standard deviation characterizing arrival stochasticity, and p_a is the mixture weighting coefficient. The probability distribution model for the initial State of Charge (SOC) of electric vehicles upon entering a charging station can be formalized as follows:
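The arrival-time model above can be sketched as a simple Monte Carlo sampler. This is a minimal illustration, not the paper's simulation code; all numerical parameters (peak hours, spread, mixture weight) are assumed for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed illustrative parameters (not from the paper): morning peak at 9:00,
# evening peak at 18:00, 1.2 h spread, 40% morning share (p_a).
MU_1, MU_2 = 9.0, 18.0   # peak arrival hours
SIGMA = 1.2              # arrival-time standard deviation (hours)
P_A = 0.4                # mixture weighting coefficient

def sample_arrival_times(n: int) -> np.ndarray:
    """Draw n EV arrival times t_a from the two-component normal mixture."""
    from_morning = rng.random(n) < P_A            # choose a component per EV
    means = np.where(from_morning, MU_1, MU_2)
    return np.clip(rng.normal(means, SIGMA), 0.0, 24.0)  # keep within one day

t_a = sample_arrival_times(10_000)
```

Sampling 10,000 arrivals places roughly 40% of them around the morning peak, matching the assumed mixture weight.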
f(t) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(t - \mu)^2}{2\sigma^2} \right]
If the charging time of an electric vehicle at a charging station is T, then the time the electric vehicle leaves the charging station is t b = t a + T . Therefore, the Observation State Space Model [29] in this paper can be formalized as follows:
S_i^t = \{ o_i^t, O_i^t \}
o_i^t = \{ \mathrm{SOC}_i^t, p_i^t, t_a, t_b, g_i^c, d_i, q_i, f_i^c, \omega_c^t \}
O_i^t = \{ P_{PV,i}^t, \omega_c^t, P_{i,j}^t, Q_{i,j}^t, V_i^t \}
where o i t is the electric vehicle observation space, O i t is the observation space of the charging station, g i c is the charging station number, primarily used for scheduling decisions, d i represents the distance between electric vehicles and charging stations, q i represents the current queue length at the charging station, f i c indicates whether the electric vehicle is currently charging, and ω c t represents the current price of electricity. To ensure safe and stable operation of the charging station, the node voltage constraints are set as follows:
V_{\min} \le V_i^t \le V_{\max}, \quad \forall i \in L
where V_min and V_max represent the minimum and maximum permissible node voltages, respectively.

2.2. Action Space Model

To reflect the complex coupling relationship between electric vehicles and charging stations, this paper employs a Markov game tuple { N, S, A, P, R, γ } to characterize the behavioral decisions of the agents. N is the set of agents. S is the joint state space of agents. A is the coordinated action of intelligent agents. P is the state transition probability. R is the cumulative global reward. γ represents the discount factor. When the i-th electric vehicle initiates a charging request, its mathematical expression and constraints are as follows:
A_i^t = p_i^t, \quad i = 1, 2, \ldots, N
-p_{\max} \le \sum_{i=1}^{N} \eta_i \, p_i^t \le p_{\max}
-p_{2\max} \le p_i^t \le p_{2\max}
\eta_i = \begin{cases} \eta_c, & p_i^t \ge 0 \\ 1/\eta_d, & p_i^t < 0 \end{cases}
where p_max is the maximum power of the charging station, p_2max is the maximum power of a single charging pile, η_i is the power exchange efficiency, η_c is the charging efficiency, η_d is the discharging efficiency, and p_i^t is the charging/discharging power of the i-th electric vehicle at time t. When p_i^t > 0, the electric vehicle is in G2V mode; when p_i^t < 0, it is in V2G mode.
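The action-space constraints above can be enforced by projecting each raw agent action onto the feasible set. The sketch below is illustrative only; the limit values and the uniform scaling rule are assumptions, not the paper's implementation.

```python
import numpy as np

# Assumed illustrative limits (not values from the paper).
P_MAX = 800.0     # station power limit p_max (kW)
P2_MAX = 60.0     # per-pile power limit p_2max (kW)
ETA_C, ETA_D = 0.95, 0.95

def exchange_efficiency(p: np.ndarray) -> np.ndarray:
    """eta_i = eta_c when charging (p >= 0), 1/eta_d when discharging (p < 0)."""
    return np.where(p >= 0.0, ETA_C, 1.0 / ETA_D)

def project_actions(p: np.ndarray) -> np.ndarray:
    """Project raw agent actions onto the feasible set: clip to the per-pile
    bound |p_i| <= p_2max, then scale uniformly if the station bound binds."""
    p = np.clip(p, -P2_MAX, P2_MAX)
    total = float(np.sum(exchange_efficiency(p) * p))
    if abs(total) > P_MAX:
        p = p * (P_MAX / abs(total))   # shrink all piles proportionally
    return p

feasible = project_actions(np.array([55.0, 70.0, -80.0, 40.0]))
```

Here the 70 kW and −80 kW requests are clipped to the ±60 kW pile bound, and the station bound is then checked on the efficiency-weighted sum.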

2.3. State Transition Model

When an electric vehicle begins charging or discharging, the change in battery capacity can be formalized as the following equation:
\mathrm{SOC}_i^{t+1} = \mathrm{SOC}_i^t + \frac{\eta_i \, p_i^t \, \Delta t}{C_i}
where C i represents the battery capacity of an electric vehicle. To ensure that the voltage and current at each node of the power grid comply with physical constraints during the charging and discharging process of electric vehicles, this paper sets the following constraints:
P_{ij}^t = P_{c,j}^t - P_{PV,j}^t + \sum_{jk \in L} P_{jk}^t + r_{ij} (I_{ij}^t)^2
Q_{ij}^t = Q_{c,j}^t + \sum_{jk \in L} Q_{jk}^t + x_{ij} (I_{ij}^t)^2
(I_{ij}^t)^2 (V_i^t)^2 = (P_{ij}^t)^2 + (Q_{ij}^t)^2
(V_j^t)^2 = (V_i^t)^2 - 2 (r_{ij} P_{ij}^t + x_{ij} Q_{ij}^t) + (r_{ij}^2 + x_{ij}^2) (I_{ij}^t)^2
where P i j t is the active power from node i to node j, Q i j t is the reactive power from node i to node j, P c , j t is the active power demand of node j at time t, and Q c , j t is the reactive power demand of node j at time t.
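The SOC transition above can be sketched as a one-step update; the battery capacity, power, and efficiency values below are assumed for illustration.

```python
def soc_step(soc, p, dt_h, capacity_kwh, eta_c=0.95, eta_d=0.95):
    """One-step transition SOC(t+1) = SOC(t) + eta_i * p * dt / C_i.
    p > 0 is G2V (charging), p < 0 is V2G (discharging); result clamped to [0, 1]."""
    eta = eta_c if p >= 0 else 1.0 / eta_d
    return min(1.0, max(0.0, soc + eta * p * dt_h / capacity_kwh))

# Assumed example: a 60 kWh EV charging at 30 kW for a 15-minute step.
soc_next = soc_step(soc=0.50, p=30.0, dt_h=0.25, capacity_kwh=60.0)
```

Because of the efficiency term, charging raises the SOC by slightly less than p·Δt/C, while a V2G step of the same magnitude drains slightly more than p·Δt/C from the battery.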

2.4. Flexible Load Scheduling Model Based on Multi-Agent Reinforcement Learning

To reflect the dynamic game-theoretic process between electric vehicles and charging stations, this paper defines the joint strategy space of the game model as π E V , π C S . Through continuous iterative training, the model gradually reaches a Nash equilibrium state. At this point, the system state can be formalized as the following equation:
R_{EV}(\pi_{EV}^*, \pi_{CS}^*) \ge R_{EV}(\pi_{EV}, \pi_{CS}^*)
R_{CS}(\pi_{EV}^*, \pi_{CS}^*) \ge R_{CS}(\pi_{EV}^*, \pi_{CS})
where π_EV is an electric vehicle charging strategy, π_CS is the charging station strategy, and π_EV^* and π_CS^* denote the optimal strategies. To ensure the convergence of the proposed algorithm, which involves multi-objective games among multiple agents, this paper unifies the optimization objectives of electric vehicles and charging stations. A global potential function is constructed, and individual rewards are adjusted through weight coefficients, as shown in the following equation:
R_{EV}(\pi_{EV}', \pi_{CS}) - R_{EV}(\pi_{EV}, \pi_{CS}) = \Phi(\pi_{EV}', \pi_{CS}) - \Phi(\pi_{EV}, \pi_{CS})
R_{CS}(\pi_{EV}, \pi_{CS}') - R_{CS}(\pi_{EV}, \pi_{CS}) = \Phi(\pi_{EV}, \pi_{CS}') - \Phi(\pi_{EV}, \pi_{CS})
\Phi(\pi_{CS}, \pi_{EV}) = c \, R_{CS}(\pi_{CS}, \pi_{EV}) + (1 - c) \, R_{EV}(\pi_{CS}, \pi_{EV})
where c is the weighting coefficient. At this point, when the entire model reaches the Nash equilibrium, the potential function must also be maximized. The primary formula is as follows:
\Phi(\pi_{EV}^*, \pi_{CS}^*) \ge \Phi(\pi_{EV}, \pi_{CS}), \quad \forall \, (\pi_{EV}, \pi_{CS})
The maximum point of the potential function corresponds to the Pareto optimal solution, and the Nash equilibrium point naturally becomes the Pareto optimal strategy combination. To achieve strategy convergence, this paper employs a multi-agent strategy gradient method. Through repeated iterations, model parameters are updated using gradient ascent. The primary mathematical expressions are as follows:
\pi_{EV}^{k+1} = \pi_{EV}^{k} + \alpha \nabla_{\pi_{EV}} R_{EV}(\pi_{EV}^{k}, \pi_{CS}^{k})
\pi_{CS}^{k+1} = \pi_{CS}^{k} + \alpha \nabla_{\pi_{CS}} R_{CS}(\pi_{EV}^{k}, \pi_{CS}^{k})
During peak grid load periods, the charging power of electric vehicles will be moderately reduced. If the target SOC cannot be achieved within the specified timeframe, user demand cannot be met. Therefore, this paper introduces a secondary penalty for such scenarios, expressed primarily as follows:
r_1 = \begin{cases} -a \, \dfrac{0.8 - \mathrm{SOC}_i^t}{t_i^c + \epsilon^2}, & \mathrm{SOC}_i^t < 0.8 \\ -a \, (0.8 - \mathrm{SOC}_i^t)^2, & \mathrm{SOC}_i^t \ge 0.8 \end{cases}
where a is the weight adjustment coefficient for SOC, ϵ is the smoothing coefficient, and t i c is the remaining driving time for the electric vehicle.
To guide electric vehicles and charging stations toward achieving an optimal equilibrium state, this paper introduces charging cost and unauthorized discharge penalty functions, distance and queueing penalty functions, line loss penalty functions, and load balancing penalty functions. The charging cost and unauthorized discharge penalty function r 2 encourages EVs to charge during off-peak hours, while the distance and queueing penalty function r 3 encourages EVs to select nearby, idle charging stations. The line loss penalty function r 4 reduces grid energy waste, and the load-balancing penalty function r 5 suppresses load fluctuations to smooth grid curves. Detailed expressions [29] for these penalty functions are as follows:
r_2 = -p_i^t \, \omega_c^t - \mu \, \mathbb{I}(p_i^t < 0 \;\&\; \mathrm{SOC}_i^t < 0.8)
r_3 = -\xi \, d_i - \delta \, q_i
r_4 = -b \sum_{ij \in L} r_{ij} (I_{ij}^t)^2
r_5 = -\frac{1}{N} \sum_{i=1}^{N} (P_i - \bar{P})^2
where μ serves as a positive scaling coefficient and 𝕀 acts as the indicator function. ξ is the distance weighting coefficient, δ is the queue weighting coefficient, b is the grid loss weighting coefficient, and P̄ is the average active load. The design of r_2 specifically targets grid economic efficiency: it mathematically disincentivizes charging behaviors during peak periods by explicitly penalizing high charging costs. Furthermore, it imposes a severe negative reward through the indicator function whenever an agent attempts an unauthorized Grid-to-Vehicle (G2V) or Vehicle-to-Grid (V2G) power transition that violates the operational constraints, thus strictly enforcing temporal consistency in the charging schedule.
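The five shaping terms r1 through r5 can be combined into a single scalar reward per agent, as sketched below. All weighting coefficients are placeholder assumptions rather than the paper's tuned values, and the piecewise r1 follows the smoothed form given above.

```python
import numpy as np

# Assumed illustrative weights; the paper's tuned coefficients are not reproduced here.
A_SOC, EPS = 5.0, 0.1          # SOC penalty weight a and smoothing epsilon
MU, XI, DELTA, B = 10.0, 0.05, 0.2, 1.0

def reward(p, soc, t_remain, price, dist, queue, line_loss, loads):
    """Composite shaping reward r1 + r2 + r3 + r4 + r5 for one EV agent."""
    if soc < 0.8:                                   # r1: urgency of unmet target SOC
        r1 = -A_SOC * (0.8 - soc) / (t_remain + EPS**2)
    else:
        r1 = -A_SOC * (0.8 - soc) ** 2
    unauthorized = (p < 0) and (soc < 0.8)          # V2G below the target SOC
    r2 = -p * price - MU * float(unauthorized)      # r2: cost + discharge penalty
    r3 = -XI * dist - DELTA * queue                 # r3: distance + queueing
    r4 = -B * line_loss                             # r4: network loss
    r5 = -float(np.var(np.asarray(loads)))          # r5: load balancing (variance)
    return r1 + r2 + r3 + r4 + r5

r = reward(p=30.0, soc=0.5, t_remain=2.0, price=0.8,
           dist=1.0, queue=2, line_loss=5.0, loads=[100.0, 120.0, 110.0])
```

Note that r5 is exactly the population variance of the station loads, so flattening the load profile directly increases the shared reward.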
In multi-agent reinforcement learning, each agent must interact not only with the environment but also with other agents. Existing research primarily employs the Centralized Training with Decentralized Execution (CTDE) framework: during training, agents optimize based on global information, while during execution they make independent decisions using only local information. The multi-agent reinforcement learning equations involving value function decomposition and policy gradients in CTDE are as follows:
Q_{tot}(s, a) = \sum_{i=1}^{n} \omega_i(s) \, Q_i(s, a_i)
\nabla_{\theta_i} J(\theta_i) = \mathbb{E} \left[ \nabla_{\theta_i} \log \pi_i(a_i \mid o_i) \, \nabla_{a} Q_i(S, a_1, \ldots, a_n) \right]
where Q_tot(s, a) is the global action-value function, Q_i(s, a_i) is the local Q-value, ω_i(s) is the state-dependent weight, ∇_{θ_i} J(θ_i) is the policy gradient with respect to the strategy parameters, and ∇_a Q_i is the action gradient.
To enhance the training efficiency of multi-agent reinforcement learning, this paper groups charging stations, with each station accommodating a certain number of electric vehicles. Electric vehicles within the same group only need to communicate with their corresponding charging station [29]. At each time step, the charging station collects information from all electric vehicles. This relationship can be formalized as follows:
\xi_k^t = \frac{1}{|g_k|} \sum_{i \in g_k} o_i^t
The charging station generates decisions based on aggregated information ξ k t . The policy update process employs policy gradient methods combined with a Gaussian distribution to address continuous action problems, which can be formalized as follows:
\nabla_{\phi_k} J(\phi_k) = \mathbb{E} \left[ \nabla_{\phi_k} \log \pi_k \left( a_{g_k,t} \mid \xi_k^t \right) A_{g_k} \left( \xi_k^t, a_{g_k,t} \right) \right]
where A g k is the advantage function. During training, multiple agents undergo group-based training. Each charging station updates its model parameters based on group-specific information and feedback. This update process can be formalized as follows:
L_{TD}^{g_k} = \left[ R_{g_k,t} + \gamma \max_{a} Q_{\theta_i}^{g_k} \left( \xi_k^{t+1}, a_{g_k,t+1} \right) - Q_{\theta_i}^{g_k} \left( \xi_k^t, a_{g_k,t} \right) \right]^2
To reflect the game-theoretic relationships among different groups and ensure that each group’s local optimization direction aligns with the overall global objective, this paper employs a fusion evaluation method for the behavioral decision outcomes of different groups. This fusion evaluation method can be formalized as the following equation:
y_t = R_t + \gamma \max_{a} \bar{Q}_\phi^{tot} \left( S, a_{g_1,t}, \ldots, a_{g_k,t} \right)
L_{TD}(\phi) = \mathbb{E} \left[ \left( y_t - \bar{Q}_\phi^{tot} \left( S, a_{g_1,t}, \ldots, a_{g_k,t} \right) \right)^2 \right]
To achieve simultaneous global and local optimization of model parameters during multiple iterative cycles, this paper constructs consistency constraints for guidance. The loss function is as follows:
L^{(C1)}(\phi, \omega) = \mathbb{E} \left[ \max \left( \bar{Q}_\phi^{tot} \left( S, a_{g_1,t}, \ldots, a_{g_k,t} \right) - Q_\omega^{tot}, \, 0 \right)^2 \right]
L^{(C2)}(\phi, \omega) = \mathbb{E} \left[ \left( \max_{a} \bar{Q}_\phi^{tot} \left( S, a_{g_1,t}, \ldots, a_{g_k,t} \right) - \max_{a} Q_\omega^{tot} \right)^2 \right]
where Q_ω^tot = T_ω(Q_{θ_i}^{g_1}, …, Q_{θ_i}^{g_k}), Q̄_φ^tot denotes the global evaluation result, and Q_ω^tot denotes the result of the local evaluation.
This paper constructs a day-ahead flexible load scheduling model for power distribution systems based on multi-agent reinforcement learning algorithms. By grouping charging stations and electric vehicles and establishing a potential game model, it coordinates global and local optimization directions to achieve optimal electric vehicle scheduling outcomes during the day-ahead stage, laying the groundwork for subsequent stages.

3. Flexible Load Scheduling Model for Distribution System Considering Dynamic Transformer Rating During Intraday Stages

3.1. Multi-Timescale Coupling and Receding Horizon Optimization

The integration of the multi-agent system with the transformer evaluation model forms a dual-layer architecture. The day-ahead MARL layer generates an economically optimal, global baseline trajectory. However, relying purely on this trajectory is risky due to stochastic forecasting errors in renewable output and user arrivals. To mitigate this, the intraday layer operates as a Receding Horizon Optimization (RHO) mechanism, akin to Model Predictive Control. It continuously monitors the real-time DTR, derived dynamically from actual ambient temperatures, treating it as an absolute hard boundary. If the aggregate power demand from the day-ahead schedule threatens to exceed the real-time DTR limit, the intraday algorithm executes a priority-based clipping function. This strictly truncates the instantaneous EV charging power, sacrificing marginal economic optimality to unconditionally guarantee the physical integrity of the distribution transformer.
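The priority-based clipping step described above can be sketched as follows; the priority metric and the greedy allocation order are illustrative assumptions, not the paper's exact receding-horizon implementation.

```python
def clip_to_dtr(ev_powers, priorities, base_load, dtr_limit):
    """Priority-based clipping: if base load plus aggregate EV charging would
    exceed the real-time DTR limit, curtail the lowest-priority EVs first.
    Minimal sketch of the intraday safeguard, not the full RHO loop."""
    headroom = dtr_limit - base_load
    clipped = dict.fromkeys(ev_powers, 0.0)
    for ev in sorted(ev_powers, key=lambda e: -priorities[e]):  # most urgent first
        grant = min(ev_powers[ev], max(0.0, headroom))
        clipped[ev] = grant
        headroom -= grant
    return clipped

# Assumed example: 500 kW base load against a 640 kW real-time DTR limit.
plan = {"ev1": 60.0, "ev2": 60.0, "ev3": 60.0}     # requested charging power (kW)
urgency = {"ev1": 0.9, "ev2": 0.5, "ev3": 0.1}     # e.g. based on remaining SOC gap
safe_plan = clip_to_dtr(plan, urgency, base_load=500.0, dtr_limit=640.0)
```

The two most urgent EVs keep their full 60 kW while the least urgent one is curtailed to the remaining 20 kW of headroom, so the aggregate load never crosses the DTR boundary.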

3.2. Dynamic Transformer-Rating Evaluation Model

In aging, high-density urban areas, constrained by the cost and space limitations of upgrading distribution facilities, timely upgrades to power distribution and consumption systems are challenging [23,24,25]. There is an urgent need to tap into the potential for flexible load control within existing power distribution and consumption systems. The load capacity of distribution transformers is closely tied to their operating environment. Relying solely on rated capacity as the primary reference standard while ignoring the dynamic variations in a distribution transformer’s load capacity prevents the full utilization of the existing power distribution and consumption system’s equipment potential. Therefore, evaluating the Dynamic Transformer Rating (DTR) under varying conditions and optimizing EV charging station scheduling based on DTR constraints are crucial for reducing peak-to-valley differences in distribution transformers and ensuring the safe and stable operation of power distribution systems.
Numerous factors influence DTR fluctuations, among which the hotspot temperature of distribution transformers is the key limiting factor for DTR. This paper evaluates DTR using the hottest spot temperature (HST) calculation method recommended in IEEE std. C57.91-2011 [30]. The hotspot temperature is derived from factors such as ambient temperature and top-layer oil temperature. The formula for calculating top-layer oil temperature is as follows:
\theta_{TO} = \theta_A + \Delta\theta_{TO}
where θ T O is the top-oil temperature of the distribution transformer, θ A is the ambient temperature, and Δ θ T O is the temperature rise of the top-oil temperature relative to the ambient temperature. The specific expression for Δ θ T O is as follows [30]:
\Delta\theta_{TO} = \left( \Delta\theta_{TO,U} - \Delta\theta_{TO,i} \right) \left( 1 - e^{-24/\tau_{TO}} \right) + \Delta\theta_{TO,i}
where Δθ_TO,i is the initial top-oil temperature rise at the current time, τ_TO is the oil time constant, and Δθ_TO,U is the ultimate (steady-state) top-oil temperature rise. Δθ_TO,U is expressed as follows:
\Delta\theta_{TO,U} = \Delta\theta_{TO,R} \left[ \frac{L_U^2 R + 1}{R + 1} \right]^n
where Δθ_TO,R is the rated top-oil temperature rise, n is the cooling coefficient, which primarily depends on the transformer's cooling method, and L_U is the transformer load factor. The parameter R, the ratio of load loss to no-load loss, is provided by the equipment manufacturer. Therefore, the HST calculation equation for distribution transformers is as follows:
\theta_H = \theta_A + \Delta\theta_{TO} + \Delta\theta_H
\Delta\theta_H = \left( \Delta\theta_{H,U} - \Delta\theta_{H,i} \right) \left( 1 - e^{-24/\tau_W} \right) + \Delta\theta_{H,i}
where Δθ_H,U = Δθ_H,R · K_u^{2m}, θ_H is the HST, Δθ_H is the hotspot temperature rise over the top-oil temperature, and Δθ_H,i is the hotspot temperature rise at the initial moment.
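The exponential top-oil and hotspot response above can be sketched as a discrete-time evaluation step. The rated rises, time constants, loss ratio, and cooling exponents below are assumed illustrative values; real parameters come from the transformer's heat-run test data.

```python
import math

# Assumed illustrative parameters for a small ONAN distribution transformer;
# actual values are taken from the unit's heat-run test report.
D_TO_R = 55.0              # rated top-oil rise over ambient (K)
D_H_R = 25.0               # rated hotspot rise over top-oil (K)
TAU_TO, TAU_W = 3.0, 0.08  # oil / winding time constants (h)
R_LOSS = 5.0               # ratio of load loss to no-load loss
N_EXP, M_EXP = 0.8, 0.8    # cooling exponents n and m

def hotspot_step(theta_a, load_pu, d_to_i, d_h_i, dt_h):
    """One step of the exponential top-oil / hotspot response:
    returns (hotspot temperature, new top-oil rise, new hotspot rise)."""
    d_to_u = D_TO_R * ((load_pu**2 * R_LOSS + 1) / (R_LOSS + 1)) ** N_EXP
    d_to = (d_to_u - d_to_i) * (1 - math.exp(-dt_h / TAU_TO)) + d_to_i
    d_h_u = D_H_R * load_pu ** (2 * M_EXP)
    d_h = (d_h_u - d_h_i) * (1 - math.exp(-dt_h / TAU_W)) + d_h_i
    return theta_a + d_to + d_h, d_to, d_h

theta_h, _, _ = hotspot_step(theta_a=30.0, load_pu=1.0, d_to_i=40.0, d_h_i=20.0, dt_h=1.0)
```

With these constants, the oil rise drifts toward its 55 K steady state over several hours while the winding rise settles within minutes, which is why the hotspot reacts quickly to overloads.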
When HST is excessively high, it accelerates the aging rate of insulating paper in distribution transformers. When distribution transformers operate under prolonged overload, the insulating paper exhibits reduced mechanical properties and deteriorated insulation performance. This paper employs Degree of Polymerization (DP) to assess aging in distribution transformers. Newly manufactured transformers exhibit an initial DP value of D P 0 = 1000 . After prolonged operation, when D P f = 200 , the insulation paper’s performance significantly deteriorates [31]. At this stage, the insulation paper can no longer guarantee the safe and stable operation of the distribution transformer. Therefore, when tapping into the operational potential of distribution transformers, one should avoid incurring additional life loss. Consequently, this paper constructs the following model to reflect the aging condition of transformers:
T_W = -\frac{E_a}{R} \left[ \ln \left( \frac{1}{n} \sum_{i=1}^{n} e^{-E_a/(R T_i)} \right) \right]^{-1}
t_{life} = \frac{ 1/DP_f - 1/DP_0 }{ A \times 24 \times 365 \times e^{-E_a/(R T_i)} }
where R is the molar gas constant, T W is the equivalent aging temperature, E a is the empirically derived activation energy, and A is an empirically derived environmental factor. Elevated HST not only accelerates thermal aging of distribution transformer insulation paper but also increases the volume of oil within the transformer. This causes the oil level to rise, compressing the gas in the upper chamber and elevating internal pressure. To prevent issues such as tank bulging and weld seam cracking, this paper establishes the following model to evaluate and limit the internal pressure of distribution transformers [31].
P_{int} = 0.145 \, P_0 \, \frac{\theta_H}{\theta_A} \cdot \frac{ \frac{V_{l0}}{V_l} SF_0 + \frac{V_{g0}}{V_{l0}} }{ SF_i + \frac{V_g}{V_l} }
where P int is the final pressure inside the distribution transformer, P 0 is the initial environmental pressure, V l 0 is the oil volume at initial oil temperature, V l is the oil volume at the current oil temperature, V g 0 is the initial gas volume, V g is the current gas volume, S F i is the solubility factor at the final temperature, and S F 0 is the solubility factor at ambient temperature. Therefore, the DTR assessment methods under different environmental conditions are as follows:
K_{TD} = K_{TD}(X) + \sum_{n=1}^{N} K_t(n, t) + \sum_{n=1}^{N} K_{EV}(t(n, Y)) + K_B(T)
where K_B(T) is the base load, K_TD is the DTR, the term summing K_t(n, t) is the aggregate load of the electric vehicles currently charging, and the term summing K_EV(t(n, Y)) is the forecast load of electric vehicles expected to charge within the next few hours. During the intraday stage, this paper employs the DTR as the constraint on distribution transformer capacity. At this point, the assessment of whether a distribution transformer is overloaded primarily depends on HST, pressure indicators, and thermal aging constraints. If, after superimposing new electric vehicle charging loads, the distribution transformer exhibits an excessively high HST or pressure exceeding limits, the scheduling model proposed in this paper dynamically adjusts the charging power of that electric vehicle based on the load curve.
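The DP-based insulation aging relation in this section can be sketched numerically; the activation energy and environmental factor below are assumed illustrative constants, not the paper's fitted values.

```python
import math

R_GAS = 8.314               # molar gas constant (J/(mol*K))
E_A = 111_000.0             # assumed activation energy for kraft paper (J/mol)
A_ENV = 2.0e8               # assumed environmental pre-exponential factor (1/h)
DP_0, DP_F = 1000.0, 200.0  # initial and end-of-life degrees of polymerization

def insulation_life_years(hotspot_c: float) -> float:
    """Years for DP to fall from DP_0 to DP_F at a constant hotspot
    temperature, from 1/DP_f - 1/DP_0 = A * t * exp(-Ea / (R*T))."""
    t_kelvin = hotspot_c + 273.15
    rate = A_ENV * math.exp(-E_A / (R_GAS * t_kelvin))  # DP-reciprocal growth rate (1/h)
    hours = (1.0 / DP_F - 1.0 / DP_0) / rate
    return hours / (24 * 365)

life_98 = insulation_life_years(98.0)
life_110 = insulation_life_years(110.0)
```

With these constants, the expected life roughly halves for every ~7 K of sustained hotspot increase, which is why DTR evaluation caps the hotspot temperature rather than the nameplate load.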

4. Example Analysis

4.1. Test System and Simulation Environment

To validate the effectiveness of the proposed multi-timescale scheduling framework and evaluate its impact on distribution systems, this paper constructed an enhanced IEEE 33-bus active distribution network model. The system incorporates three photovoltaic (PV) power stations connected at nodes 11, 18, and 33, alongside six fast-charging stations located at nodes 8, 15, 21, 25, 28, and 31. The flowchart is shown in Figure 2.
The day-ahead scheduling layer executes offline centralized training under the CTDE architecture. During this phase, electric vehicle agents and charging station agents interact with the environment, observing local states including arrival time, initial state of charge, queue length, and real-time electricity price. The MADDPG algorithm trains centralized critics and decentralized actors, yielding a baseline charging trajectory that minimizes peak–valley differentials and charging costs. In the intraday stage, the dynamic transformer rating is evaluated continuously for the monitored transformer using real-time ambient conditions. The receding horizon optimization module then compares the aggregated charging load at the monitored transformer against the real-time DTR limit. If the load threatens to exceed the thermal boundary, a priority-based clipping function is triggered to curtail instantaneous EV charging power. The topology structure of the power system is shown in the following figure.
The CTDE framework was operationalized using the MADDPG algorithm. To ensure stable convergence, the hyperparameters were strictly configured. The actor and critic networks feature three hidden layers using ReLU activation. The primary training hyperparameters, which govern the neural network updates, are summarized in Table 2.
This paper initially focuses on the distribution transformer at Node 28 as the primary research subject due to its high historical load density. Using the assessment methods detailed in Section 3, the daily maximum Dynamic Transformer Rating (DTR) relative to the rated capacity (800 kVA) over a standard week was evaluated, as illustrated in Figure 3.
As shown in Figure 4, the maximum DTR within a week is generally concentrated around 1.2 times the rated capacity. The dynamic load capacity frequently exceeds the conservative rated capacity, particularly during periods of lower ambient temperature, allowing for additional EV integration without incurring equipment damage.
A core claim of integrating DTR is the safe unlocking of operational capacity without exacerbating equipment degradation. Under an uncoordinated charging scenario, severe peak load overlaps drove the distribution transformer's hotspot temperature (HST) to excessively high levels; based on the aging model, this resulted in an equivalent aging acceleration factor greater than 1. Under the proposed DTR-constrained schedule, by contrast, the daily equivalent aging acceleration factor was maintained below 1.0, yielding negligible insulation deterioration. This confirms that the framework successfully unlocked 18% to 23% of hidden transformer capacity while incurring zero additional lifespan loss.

4.2. Baseline Algorithm Comparison

To quantify the advantages of the proposed framework, it was benchmarked against two standard baselines: a centralized rule-based optimization algorithm and a standard single-agent Deep Q-Network (DQN).
Figure 5 illustrates the reward convergence curves recorded during the training phase. The proposed MADDPG model converges to a higher stable reward in fewer episodes than the DQN baseline. The single-agent DQN struggles with the non-stationary environment created by independent EVs competing for charging capacity, leading to significant oscillations.
Regarding physical grid impact, comparing the uncoordinated charging pattern with the optimized schedule highlights the algorithm's peak-shaving capability. Figure 6 displays the maximum and minimum loads of the original system versus the system under MADDPG-coordinated scheduling.
As depicted in Figure 6, the proposed method reduces the peak load by 5.2% compared to the uncoordinated baseline. The original maximum peak–valley difference was 0.67; after optimization with the proposed method, it was reduced to 0.41.
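The peak–valley metric used here appears to be normalized by the peak; a minimal sketch under that assumption (the normalization convention is inferred, not stated explicitly in the source):

```python
def peak_valley_difference(load):
    """Normalized peak-valley difference of a load profile: (max - min) / max."""
    peak, valley = max(load), min(load)
    return (peak - valley) / peak
```

For instance, a profile whose valley is 33% of its peak gives a difference of 0.67, matching the uncoordinated baseline reported above.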

4.3. Sensitivity Analysis of Reward Coefficients

The stability of the MARL algorithm is highly dependent on the weighting coefficients configured in the potential game reward function. A sensitivity analysis was conducted on the distance penalty coefficient and the grid loss weighting coefficient. The impact of varying the grid loss weighting coefficient on system performance is detailed in Table 3.
As shown in Table 3, the system is robust to minor variations but exhibits high sensitivity when the grid loss weighting coefficient is substantially altered. Inflating the coefficient by 15% resulted in overly conservative agent behavior: EVs opted to defer charging entirely rather than incur marginal line losses, dropping the station utilization rate by 12 percentage points and leaving user charging demands unmet.
To rigorously evaluate whether the performance improvements stem from the architectural innovations, we conducted systematic ablation studies. The results are shown in Table 4.
Transformer Utilization is defined as the ratio of the actual carried load to the rated capacity, where values exceeding 100% indicate successful exploitation of dynamic thermal headroom enabled by DTR. The full framework achieves 118.5% utilization with zero safety violations. This demonstrates that the 18.5% capacity augmentation is realized without compromising equipment integrity. Conversely, removing the DTR layer constrains utilization to 98.2%. Without DTR, the system cannot implement preemptive power clipping during rapid load aggregations, causing transient HST spikes that exceed safe thresholds. Ablating the CTDE structure yields 112.4% utilization but 11 violations. Independent learners, lacking centralized coordination during training, exhibit transient non-stationarity where agents simultaneously select high-power actions.
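The two ablation metrics can be stated precisely as follows; the 110 °C hotspot limit used in the violation counter is an assumed value consistent with standard insulation limits, not a figure quoted in the source.

```python
def transformer_utilization(actual_load_kva, rated_kva=800.0):
    """Utilization as a percentage of nameplate; values above 100% indicate
    that dynamic thermal headroom enabled by DTR is being exploited."""
    return 100.0 * actual_load_kva / rated_kva

def count_safety_violations(hst_series_c, hst_limit_c=110.0):
    """Number of intervals in which the hotspot temperature exceeds the limit."""
    return sum(1 for t in hst_series_c if t > hst_limit_c)
```

Under this definition, the full framework's 118.5% utilization corresponds to carrying about 948 kVA on the 800 kVA unit while the violation count stays at zero.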

4.4. Scalability Validation

As EV adoption accelerates, scheduling algorithms must scale efficiently. The computational scalability of the proposed decentralized execution architecture was tested against a standard centralized optimization solver by varying the EV penetration rate from 20% to 50%, as recorded in Table 5.
As demonstrated in Table 5, the centralized method exhibits roughly exponential growth in execution time. In contrast, the CTDE architecture distributes the computational burden locally to the charging stations. The execution time of the MADDPG model scales approximately linearly, maintaining an average local decision latency under 2.0 s and confirming its suitability for large-scale real-world deployment.
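The scaling claim can be checked directly against the Table 5 data: successive centralized solve times grow by a roughly constant multiplicative factor (near-exponential growth), while the MADDPG times grow by a roughly constant increment (near-linear growth).

```python
import numpy as np

penetration = np.array([20, 30, 40, 50])          # EV penetration rate, % (Table 5)
t_central = np.array([12.3, 35.8, 128.3, 351.0])  # centralized solver time, s
t_maddpg = np.array([0.5, 0.8, 1.2, 1.5])         # decentralized MADDPG time, s

# Ratio between successive centralized times: near-constant multiplier (~3x)
central_ratios = t_central[1:] / t_central[:-1]
# Increment between successive MADDPG times: near-constant step (~0.3 s)
maddpg_increments = np.diff(t_maddpg)
```

Every centralized ratio exceeds 2.5, while the MADDPG increments stay within 0.3 to 0.4 s and the worst-case latency remains below the 2.0 s real-time budget.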

5. Conclusions

This paper addresses the severe capacity constraints and overload risks threatening distribution networks in aging urban areas due to uncoordinated EV charging. By shifting away from conservative static ratings and centralized computational bottlenecks, we propose a multi-timescale scheduling framework coupling MARL with Dynamic Transformer Rating. The decentralized execution of the MADDPG algorithm successfully bypasses the curse of dimensionality. As quantified in Section 4, this framework reduces the system peak-to-valley load difference from 0.67 to 0.41 while lowering peak loads by 5.2% compared to uncoordinated baselines. The integration of intraday DTR constraints acts as a rigid physical safeguard.
Future research will focus on transitioning the current simulation environment into a hardware-in-the-loop (HIL) testbed, explicitly incorporating the degradation models of EV lithium-ion batteries into the multi-agent reward structure to facilitate deep Vehicle-to-Grid (V2G) integration.

Author Contributions

Conceptualization, T.Z., P.L., J.W. and Q.Z.; methodology, T.Z., P.L., J.W. and Q.Z.; software, T.Z., P.L., J.W. and Q.Z.; formal analysis, T.Z., P.L., J.W. and Q.Z.; resources, T.Z., P.L., J.W. and Q.Z.; writing—original draft preparation, T.Z., P.L., J.W. and Q.Z.; writing—review and editing, T.Z., P.L., J.W. and Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project "Research on Key Technologies and Equipment Development of Aggregation Control for Adjustable Resources in Active Distribution Transformer District" (Grant No. 2025G168).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to express our sincere gratitude to the reviewers for their professional opinions and valuable suggestions, and our heartfelt thanks to the journal's editorial staff for their efficient support in the rigorous presentation of these results.

Conflicts of Interest

Authors Tiantian Zhang, Peng Li and Jun Wang were employed by the company Henan XJ Metering Co., Ltd. The remaining authors declare that this research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Figure 1. Diagram of the reinforcement-learning block.
Figure 2. Deployment flowchart of the proposed framework on the modified IEEE 33-bus system.
Figure 3. Improved IEEE-33 Node System.
Figure 4. Maximum DTR Variation of Distribution Transformers Within One Week.
Figure 5. Reward Convergence Curves During Training.
Figure 6. Comparison of Peak–valley Differences Before and After Implementing Electric Vehicle Dispatch.
Table 1. Corresponding meanings of model variables in this paper.

N: Total number of accessible charging stations within the local distribution zone.
i, j: Indices for nodes in the distribution network.
t: Time step index.
r, x: Branch resistance and reactance, respectively.
μ1, μ2: Statistical mean times for the morning (08:30) and evening (18:00) charging peaks.
δ: Grid loss weighting coefficient.
V: Node voltage magnitude.
I: Branch current flowing between adjacent nodes.
γ: Discount factor for the MARL algorithm.
P_{i,j}, Q_{i,j}: Active and reactive power flow from node i to node j.
Table 2. MADDPG Training Hyperparameters.

Parameter                  Value       Description
Actor Learning Rate        1 × 10^−4   Controls the step size for updating the actor policy network
Critic Learning Rate       1 × 10^−3   Controls the step size for updating the critic value network
Discount Factor            0.95        Determines the importance of future rewards
Target Soft Update Rate    0.01        Smoothing coefficient for updating target networks
Batch Size                 1024        Number of experiences sampled per training iteration
Replay Buffer Capacity     10^6        Maximum number of stored transitions
Table 3. Impact of Grid Loss Coefficient on System Performance.

Variation of Grid Loss     Station Utilization   Relative Charging
Weighting Coefficient      Rate (%)              Cost Index
−15%                       88.5                  105
−5%                        87.2                  102
Base (0%)                  85.0                  100
+5%                        80.5                  98
+15%                       73.0                  92
Table 4. Performance Comparison of Component Ablation Configurations.

Configuration                      Transformer Utilization (%)   Safety Violations
Full Framework                     118.5                         0
w/o DTR Layer (Static Rating)      98.2                          7
w/o CTDE (Independent Learners)    112.4                         11
w/o Intraday Layer                 115.3                         4
Table 5. Computational Scalability Across EV Penetration Rates.

EV Penetration Rate (%)   Centralized Optimization     Proposed MADDPG
                          Execution Time (s)           Execution Time (s)
20                        12.3                         0.5
30                        35.8                         0.8
40                        128.3                        1.2
50                        351.0                        1.5