Article

An Intelligent Control Framework for High-Power EV Fast Charging via Contrastive Learning and Manifold-Constrained Optimization

1 CATARC Automotive Test Center (Changzhou) Company Limited, Changzhou 213000, China
2 Jiangsu Province Engineering Technology Research Center of Optical Storage, Charging and Testing Integrated R&D and Application, Changzhou 213000, China
3 Key Lab of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
* Author to whom correspondence should be addressed.
World Electr. Veh. J. 2025, 16(10), 562; https://doi.org/10.3390/wevj16100562
Submission received: 18 August 2025 / Revised: 21 September 2025 / Accepted: 24 September 2025 / Published: 1 October 2025
(This article belongs to the Section Charging Infrastructure and Grid Integration)

Abstract

To address the complex trade-offs among charging efficiency, battery lifespan, energy efficiency, and safety in high-power electric vehicle (EV) fast charging, this paper presents an intelligent control framework based on contrastive learning and manifold-constrained multi-objective optimization. A multi-physics coupled electro-thermal-chemical model is formulated as a Mixed-Integer Nonlinear Programming (MINLP) problem, incorporating both continuous and discrete decision variables—such as charging power and cooling modes—into a unified optimization framework. An environment-adaptive optimization strategy is also developed. To enhance learning efficiency and policy safety, a contrastive learning–enhanced policy gradient (CLPG) algorithm is proposed to distinguish between high-quality and unsafe charging trajectories. A manifold-aware action generation network (MAN) is further introduced to enforce dynamic safety constraints under varying environmental and battery conditions. Simulation results demonstrate that the proposed framework reduces charging time to 18.3 min—47.7% faster than the conventional CC–CV method—while achieving 96.2% energy efficiency, 99.7% capacity retention, and zero safety violations. The framework also exhibits strong adaptability across wide temperature (−20 °C to 45 °C) and aging (SOH down to 70%) conditions, with real-time inference speed (6.76 ms) satisfying deployment requirements. This study provides a safe, efficient, and adaptive solution for intelligent high-power EV fast-charging.

1. Introduction

The global energy transition and the pursuit of “dual carbon” goals—carbon peaking and neutrality—have accelerated the growth of the electric vehicle (EV) industry. Among the key enablers for large-scale EV adoption, high-power fast charging (typically 150–350 kW) has emerged as a critical technology to significantly shorten charging time and enhance user convenience. However, the introduction of high-power charging also presents substantial challenges: increased risk of thermal runaway, accelerated battery degradation due to lithium plating, and reduced energy efficiency, especially under dynamic environmental conditions. These issues highlight the need for a comprehensive, system-level control strategy that balances performance, safety, and longevity.
Existing charging strategies can be broadly classified into three categories: rule-based, model-based, and data-driven. Rule-based methods like constant current–constant voltage (CC–CV) are simple to implement but lack adaptability to dynamic conditions. Model-based methods explicitly encode constraints but struggle with parameter drift and computational complexity [1,2,3,4,5]. Data-driven approaches, especially reinforcement learning (RL), offer adaptability but suffer from poor sample efficiency and a lack of safety guarantees [6,7,8,9,10].
The core challenge of high-power EV fast charging lies in complex constrained optimization under strong multi-physics coupling. The charging process involves tightly coupled electrochemical reactions, heat transfer, and electrical dynamics, where high power levels can trigger thermal runaway [11]. High currents also accelerate lithium plating, degrading battery lifespan [12]. The trade-offs among charging time, battery life, and energy efficiency make it difficult for traditional methods to find a globally optimal solution [13]. Thus, improving charging efficiency while guaranteeing safety remains a critical unresolved problem.
To address these challenges, this paper proposes an intelligent control framework for high-power EV fast charging that integrates contrastive learning-enhanced policy optimization and manifold-constrained safe action generation. The main contributions are summarized as follows:
  • A multi-physics coupled MINLP model that incorporates electrochemical, thermal, and electrical dynamics is developed. It encodes lithium plating risk and temperature limits as hard constraints and supports an environment-adaptive optimization strategy.
  • A contrastive learning–enhanced policy gradient (CLPG) algorithm is designed to improve policy learning efficiency by incorporating a novel trajectory-level contrastive learning mechanism that distinguishes between safe/efficient and unsafe/inefficient charging patterns.
  • Unlike existing safe RL methods that operate at the reward or constraint optimization level, we introduce manifold-aware action generation that directly projects actions onto a safe manifold, providing intrinsic safety guarantees at the action generation stage.

2. Related Work

The optimization and control of electric vehicle (EV) charging have evolved from simple rule-based strategies to intelligent decision-making. Early studies primarily adopted fixed charging schemes such as constant current–constant voltage (CC–CV), which lacked adaptability to battery conditions and environmental variations. With the development of optimization theory, model-based approaches gradually became mainstream. Reference [4] proposed a nonlinear Model Predictive Control (MPC) framework that explicitly considered battery temperature constraints; however, the computational burden of online optimization limited its real-time applicability. To reduce computational complexity, linearized MPC has been widely employed, albeit at the cost of model accuracy [5]. Mixed-integer programming methods are capable of handling discrete decisions in the charging process, such as power level selection [14] and charging mode switching [15], but their computational efficiency deteriorates rapidly with increasing problem size.
In recent years, data-driven approaches—particularly Deep Reinforcement Learning (DRL)—have shown great potential in charging control. Reference [16] proposed a DQN (Deep Q-Network)-based large-scale EV charging scheduling method that achieved model-free adaptive control. Actor-Critic algorithms designed for continuous action spaces further improved control precision [6,7]. Nevertheless, traditional reinforcement learning algorithms suffer from low sample efficiency and struggle to handle hard constraints. Constrained reinforcement learning approaches, such as Constrained Policy Optimization (CPO) [17] and Safety Layer mechanisms [18], have incorporated constraints but at the cost of increased algorithmic complexity.
Multi-objective optimization and constraint handling are key challenges in charging control. Traditional weighted-sum methods lack theoretical guidance for weight selection [19], while Pareto optimization can obtain a set of non-dominated solutions. Studies have shown that algorithms like NSGA-II and NSGA-III perform well in the optimal configuration of charging stations [20]. For constraint handling, the penalty function method adds penalty terms to the objective [21]; in reinforcement learning frameworks, CPO ensures safe policy updates within a trust region [17], and the Safety Layer introduces a safety filter at the output layer [18]. However, these methods still exhibit limitations when addressing state-dependent dynamic constraints.
Research on battery aging and safety provides important constraints for charging optimization. The growth of the solid electrolyte interface (SEI) film is a major cause of capacity degradation. Reference [22] developed an aging model incorporating SEI growth. Lithium plating presents significant safety risks during high-current charging. Reference [23] found that the risk significantly increases when the anode potential drops below 0 V versus Li/Li+. In terms of thermal management, Reference [24] established a 3D thermal model for battery packs, while Reference [25] proposed an active thermal management system. However, existing studies often consider thermal management and charging control separately.
In summary, existing research faces several limitations: model-based methods struggle to accurately describe nonlinear and time-varying characteristics; data-driven approaches lack safety guarantees; multi-objective optimization often lacks adaptive weight adjustment mechanisms; and environmental factors are not sufficiently accounted for. To address these issues, this paper formulates the charging optimization problem as an MINLP model and adopts a DRL-based solution. The proposed method introduces a contrastive learning mechanism and manifold awareness to design an environment-adaptive multi-objective optimization strategy. This approach offers a novel technological pathway for safe, efficient, and adaptive high-power fast charging, with significant implications for advancing the EV industry.

3. Materials and Methods

3.1. Multi-Physics Coupled MINLP Model

3.1.1. Charging Requirements and Physical Modeling

The battery charging process involves the multi-physics coupling of electrochemical reactions, thermodynamic processes, and electrical characteristics, as shown in Figure 1. During the charge transfer process—comprising lithium-ion deintercalation, migration, and intercalation—the generation of Joule heat, polarization heat, and heat from side reactions causes dynamic changes in internal resistance and drifts in electrode potential. The strong coupling effects of these processes cause the system to exhibit significant nonlinear and time-varying characteristics. Figure 2 illustrates the charging system circuit diagram, showing the power conversion stages from AC grid to battery pack, with the proposed CL-PG+MAN controller managing both charging power and cooling system operation.
Consequently, balancing charging speed and battery lifespan is not a simple static trade-off. Instead, it requires a dynamic decision-making framework capable of sensing the battery’s state and environmental conditions in real-time. This framework must seek a dynamic optimal solution among conflicting objectives, including minimizing charging time, maximizing battery lifespan, optimizing energy efficiency, and ensuring safety margins. This constitutes a complex optimization problem that continuously evolves with the system’s state and boundary conditions.
This study aims to solve the multi-objective optimization control problem for high-power fast charging of EVs. Our core design objectives are: (1) To establish an environment-adaptive, layered decision mechanism. This mechanism uses integer variables to represent the selection of power levels and cooling modes, combined with continuous variables to describe the charging dynamics, enabling adaptation across a wide temperature range of −20 °C to 45 °C. (2) To construct an optimization model with embedded safety constraints. Protections against lithium plating and temperature limits are integrated as hard constraints within the MINLP framework. (3) To design a multi-objective optimization strategy with dynamic weights. This strategy automatically adjusts the priority of charging speed, battery protection, and energy efficiency based on the ambient temperature.
The dynamic weight adjustment mechanism operates through a temperature-dependent function that prioritizes different objectives based on operational conditions. At nominal temperatures, the system emphasizes charging speed to maximize user convenience. Under extreme conditions, the weights shift to prioritize battery preservation and energy efficiency, reflecting the increased stress on battery chemistry at these temperatures. This adaptive strategy ensures optimal performance across varying environmental conditions while maintaining battery health.

3.1.2. Decision Variable Definition

The charging process is discretized into $N$ time intervals, each with a duration of $\Delta t = 10$ s. The model’s decision variables include two categories: continuous and integer variables. Continuous variables represent physical system states and control inputs, including charging power $P_k \in [0, P_{max}]$, charging current $I_k \in [0, I_{max}]$, charging voltage $V_k \in [V_{min}, V_{max}]$, battery core temperature $T_k^{core} \in \mathbb{R}^+$, surface temperature $T_k^{surf} \in \mathbb{R}^+$, and cooling system power $Q_{cool,k} \in [0, Q_{cool}^{max}]$. Integer variables correspond to discrete system decisions, including the charging state $\alpha_k \in \{0, 1\}$, where 1 indicates active charging and 0 indicates charging is paused; power level selection $\beta_k^j \in \{0, 1\}$ for $j \in \{1, 2, 3, 4, 5\}$, where each $j$ corresponds to a distinct power level; the cooling mode $\gamma_k \in \{0, 1, 2\}$, representing natural convection, fan cooling, and liquid cooling, respectively; and the preheating flag $\delta_k \in \{0, 1\}$, which indicates if the preheating mode is active. Details of the definitions and units for all variables employed in the MINLP model formulation are provided in Appendix A, Table A1.
Practical charging infrastructure exhibits discrete power level configurations, while thermal management systems operate according to predetermined modalities. The incorporation of integer decision variables accurately captures inherent system discretization, precludes infeasible solutions arising from purely continuous optimization formulations, and facilitates the enforcement of logical constraint structures.

3.1.3. Multi-Objective Optimization Function Design

A dynamically weighted objective function is used that considers three key metrics: charging time, battery aging, and energy consumption. The goal is to minimize the total cost $J$:
$$\min J = w_1(T_{env})\, J_{time} + w_2(T_{env})\, J_{aging} + w_3(T_{env})\, J_{energy} \tag{1}$$
Each component of the objective function represents critical operational and economic factors in EV charging. The charging time cost $J_{time}$ directly impacts user satisfaction and charging station throughput, as reduced charging duration significantly improves station utilization rates and customer experience. The aging cost $J_{aging}$ captures long-term battery degradation effects, where aggressive charging strategies without proper thermal management can substantially reduce battery lifespan, leading to premature replacement costs. The energy cost $J_{energy}$ encompasses both electricity consumption and auxiliary system operation, particularly cooling, which represents a non-negligible portion of operational expenses. The adaptive weighting mechanism ensures these competing objectives are balanced according to environmental conditions. The weighting coefficients $w_i(T_{env})$ are adaptively adjusted according to the ambient temperature $T_{env}$:
$$w_i(T_{env}) = \begin{cases} 0.6 & \text{if } T_{env} \in [15, 25]\ ^\circ\text{C} \\ 0.4 & \text{if } T_{env} \in [5, 15) \cup (25, 35]\ ^\circ\text{C} \\ 0.3 & \text{otherwise} \end{cases} \tag{2}$$
The temperature-dependent weight coefficients in Equation (2) are designed based on extensive battery degradation studies and thermal safety research. According to reference [26], lithium-ion batteries exhibit optimal performance and minimal degradation within the 15–25 °C temperature range, where lithium plating risk is negligible and ion mobility is sufficient. This supports our assignment of $w_1(T_{env}) = 0.6$ to prioritize charging speed in this range.
For moderate temperature deviations ($T_{env} \in [5, 15) \cup (25, 35]$ °C), reference [13] demonstrated that capacity fade rate increases by approximately 15% per 10 °C deviation from optimal temperature. Therefore, we increase the battery protection weight to $w_2(T_{env}) = 0.4$ to compensate for accelerated aging. Under extreme conditions ($T_{env} < 5$ °C or $T_{env} > 35$ °C), the risk of lithium plating at low temperatures and thermal runaway at high temperatures necessitates a conservative strategy with $w_2(T_{env}) = 0.5$ and $w_3(T_{env}) = 0.2$ [27,28].
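The piecewise rule in Equation (2) can be sketched as a small lookup. This is only an illustration of the three temperature bands; the function name `objective_weight` is hypothetical, and the sketch returns the single value given by Equation (2) without assuming how the remaining weights are distributed.

```python
def objective_weight(t_env_c: float) -> float:
    """Return the Equation (2) weight for an ambient temperature in deg C."""
    if 15.0 <= t_env_c <= 25.0:
        return 0.6  # nominal band [15, 25]
    if 5.0 <= t_env_c < 15.0 or 25.0 < t_env_c <= 35.0:
        return 0.4  # moderate deviation: [5, 15) or (25, 35]
    return 0.3      # extreme cold or heat


if __name__ == "__main__":
    for t in (-20.0, 10.0, 20.0, 30.0, 45.0):
        print(f"T_env = {t:6.1f} C -> weight {objective_weight(t)}")
```

Note that the band edges are inclusive or exclusive exactly as written in Equation (2), so 15 °C and 25 °C fall in the nominal band.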
The charging time cost accounts not only for the actual charging duration but also incorporates a penalty for overtime to ensure completion within the target time:
$$J_{time} = \sum_{k=1}^{N} \Delta t\, \mathbb{1}_{\alpha_k} + \lambda_{time} \max\left(0,\ t_{total} - t_{target}\right) \tag{3}$$
where $\lambda_{time}$ denotes the overtime penalty coefficient, $t_{total}$ is the total charging time, and $t_{target}$ is the target charging duration.
The battery aging cost $J_{aging}$ is formulated based on the Arrhenius aging model and stress factor theory, comprehensively considering the effects of thermal stress, current stress, and state-of-charge (SOC) stress:
$$J_{aging} = \sum_{k=1}^{N} \alpha_k \left[ A_1 \exp\left(-\frac{E_a}{R_g T_k^{core}}\right) \left(\frac{I_k}{I_{nom}}\right)^{n_1} + A_2 \exp\left(\frac{T_k^{core} - T_{ref}}{T_{stress}}\right) f_{SOC}(SOC_k) \right] \tag{4}$$
The parameters in Equation (4) are defined as follows: $A_1$ and $A_2$ are the pre-exponential factors for aging; $E_a$ is the activation energy (J/mol); $R_g$ is the universal gas constant (8.314 J/(mol·K)); $I_{nom}$ is the nominal charging current; $n_1$ is the current stress exponent; $T_{ref}$ is the reference temperature (298.15 K); and $T_{stress}$ is the temperature stress scale.
In the model, an exponential temperature stress term $\exp\left(\frac{T_k^{core} - T_{ref}}{T_{stress}}\right)$ is introduced to reflect the accelerated aging under elevated temperatures; the current stress term $\left(\frac{I_k}{I_{nom}}\right)^{n_1}$ captures the adverse effects of high charging currents; and the SOC-related stress function $f_{SOC}(SOC_k)$ characterizes the aging acceleration at high state-of-charge levels.
The energy cost $J_{energy}$ accounts not only for energy losses during the charging process but also for the energy consumed by the cooling system, as defined in Equation (5):
$$J_{energy} = \sum_{k=1}^{N} \left[ \eta_{loss}(P_k, T_k^{core})\, P_k + Q_{cool,k}\, \eta_{cool}(\gamma_k) \right] \Delta t \tag{5}$$
Here, $\eta_{loss}(P_k, T_k^{core})$ denotes the energy loss rate as a function of power and temperature, and $\eta_{cool}(\gamma_k)$ denotes the efficiency factor for each cooling mode. Specifically, the energy loss rate and the cooling efficiency factor are expressed as
$$\eta_{loss}(P_k, T_k^{core}) = \frac{I_k^2 R_{int}(SOC_k, T_k^{core})}{P_k} \tag{6}$$
$$\eta_{cool}(\gamma_k) = \begin{cases} 0 & \text{if } \gamma_k = 0 \text{ (natural cooling)} \\ 0.3 & \text{if } \gamma_k = 1 \text{ (air cooling)} \\ 0.6 & \text{if } \gamma_k = 2 \text{ (liquid cooling)} \end{cases} \tag{7}$$
Here, $R_{int}(SOC_k, T_k^{core})$ denotes the internal resistance as a function of SOC and core temperature.
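As a rough illustration of the energy cost, the sketch below evaluates the sum over time steps with NumPy. It relies on the observation that the loss rate multiplied by the power reduces to the Joule term (current squared times internal resistance), so the power itself is not needed as an input. The function name, argument layout, and the sample numbers are assumptions made for the example; only the cooling factors 0, 0.3, and 0.6 and the 10 s step come from the text.

```python
import numpy as np

# Cooling efficiency factors per mode: 0 natural, 1 air, 2 liquid.
ETA_COOL = {0: 0.0, 1: 0.3, 2: 0.6}


def energy_cost(current, r_int, q_cool, gamma, dt=10.0):
    """Sum of (Joule loss + weighted cooling power) * dt over all steps.

    current : charging current per step (A)
    r_int   : internal resistance per step (ohm), already evaluated at SOC/T
    q_cool  : cooling system power per step (W)
    gamma   : cooling mode per step (0, 1, or 2)
    """
    current = np.asarray(current, dtype=float)
    r_int = np.asarray(r_int, dtype=float)
    q_cool = np.asarray(q_cool, dtype=float)
    eta = np.array([ETA_COOL[int(g)] for g in gamma])
    joule_loss = current**2 * r_int  # eta_loss * P_k simplifies to I^2 * R_int
    return float(np.sum((joule_loss + q_cool * eta) * dt))


if __name__ == "__main__":
    # Two illustrative steps: natural cooling, then liquid cooling.
    print(energy_cost([10.0, 10.0], [0.01, 0.01], [100.0, 100.0], [0, 2]))
```

With natural cooling the cooling term vanishes because its factor is 0, so only the Joule loss contributes in that step.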

3.1.4. Constraint Conditions and Safety Mechanisms

Power Balance and Limitation:
$$P_k = V_k I_k \alpha_k \tag{8}$$
$$P_k \le \sum_{j=1}^{5} \beta_k^j\, P_{level}^j(T_{env}) \tag{9}$$
where $P_{level}^j(T_{env})$ denotes the available power at the $j$-th level under the ambient temperature $T_{env}$.
Uniqueness of Power Level Selection:
$$\sum_{j=1}^{5} \beta_k^j = \alpha_k, \quad \forall k \tag{10}$$
The dynamic thermal constraints are key to ensuring charging safety. The proposed model captures these dynamics using a two-state thermal model for the core and surface temperatures:
$$C_{th} \frac{dT_k^{core}}{dt} = P_{loss,k} - h_{conv}\left(T_k^{core} - T_k^{surf}\right) - Q_{cool,k} \tag{11}$$
$$C_{th}^{surf} \frac{dT_k^{surf}}{dt} = h_{conv}\left(T_k^{core} - T_k^{surf}\right) - h_{amb}\left(T_k^{surf} - T_{env}\right) \tag{12}$$
where $C_{th}$ and $C_{th}^{surf}$ denote the core and surface thermal capacitance (J/°C), respectively; $h_{conv}$ and $h_{amb}$ represent the internal and ambient convective heat transfer coefficients (W/°C). The heat loss power $P_{loss,k}$ in time step $k$ considers both Joule heating and charging efficiency loss:
$$P_{loss,k} = I_k^2 R_{int}(SOC_k, T_k^{core}) + P_k \left(1 - \eta_{charge}(I_k)\right) \tag{13}$$
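The two-state thermal model can be stepped numerically as in the sketch below. It uses forward Euler with sub-stepping, which is an assumption of this example rather than the solver used in the paper. The default parameter values (62.7 J/°C, 4.5 J/°C, 1.9 W/°C, 5 W/°C) are the ones reported in the experimental setup; the function name `thermal_step` and the sub-step count are hypothetical.

```python
def thermal_step(t_core, t_surf, p_loss, q_cool, t_env, dt=10.0, n_sub=100,
                 c_core=62.7, c_surf=4.5, h_conv=1.9, h_amb=5.0):
    """Advance core and surface temperatures (deg C) by one interval dt (s).

    The surface node has a time constant of roughly
    c_surf / (h_conv + h_amb) ~ 0.65 s, far shorter than dt = 10 s, so a
    single explicit Euler step would be unstable. The interval is therefore
    split into n_sub smaller steps.
    """
    h = dt / n_sub
    for _ in range(n_sub):
        flow = h_conv * (t_core - t_surf)              # core -> surface heat flow (W)
        d_core = (p_loss - flow - q_cool) / c_core     # core temperature rate
        d_surf = (flow - h_amb * (t_surf - t_env)) / c_surf  # surface temperature rate
        t_core += h * d_core
        t_surf += h * d_surf
    return t_core, t_surf


if __name__ == "__main__":
    # 5 W of internal heating, no active cooling, 25 deg C ambient.
    print(thermal_step(25.0, 25.0, p_loss=5.0, q_cool=0.0, t_env=25.0))
```

An implicit or adaptive integrator would avoid the sub-stepping, at the cost of a slightly longer example.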
Temperature Safety Constraint:
$$T_k^{core} \le T_{max}^{safe} - \epsilon_{temp} \left(1 + 0.5\, \mathbb{I}(T_{env} > 35)\right) \tag{14}$$
where $T_{max}^{safe}$ is the maximum allowable core temperature (55 °C), $\epsilon_{temp}$ is the temperature safety margin (3 °C), and $\mathbb{I}(\cdot)$ is the indicator function.
Lithium Plating Protection Constraint:
$$\eta_{anode}(SOC_k, I_k, T_k^{core}) > V_{Li/Li^+} + \epsilon_{plating} \tag{15}$$
where $\eta_{anode}$ is the anode potential, $V_{Li/Li^+}$ is the lithium metal potential (0 V), and $\epsilon_{plating}$ is the safety margin (0 V) to prevent lithium plating. The lithium plating constraint is particularly critical as it shares mechanistic similarities with dendrite formation in lithium metal batteries. Recent research has shown that effective suppression of lithium dendrites requires careful management of current distribution and interface stability [29]. In this fast charging context, the manifold constraint mechanism ensures that the charging current never creates conditions conducive to metallic lithium deposition, thereby preventing both capacity fade and potential safety hazards associated with dendrite growth.
SOC Evolution Equation:
$$SOC_{k+1} = SOC_k + \frac{\eta_{coulombic}(I_k, T_k^{core})\, I_k\, \Delta t}{Q_{capacity}(SOH_k)} \tag{16}$$
where $\eta_{coulombic}$ is the coulombic efficiency, and $Q_{capacity}$ represents the capacity of the battery.
Power Derating Coefficient:
$$\mu(T_{env}) = \begin{cases} 0.3 + 0.7\, \dfrac{T_{env} + 20}{20} & \text{if } T_{env} < 0\ ^\circ\text{C} \\ 1.0 & \text{if } T_{env} \in [0, 30]\ ^\circ\text{C} \\ 1.0 - 0.5\, \dfrac{T_{env} - 30}{15} & \text{if } T_{env} > 30\ ^\circ\text{C} \end{cases} \tag{17}$$
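The derating coefficient is a simple piecewise-linear function of ambient temperature, sketched below. The function name `power_derating` is hypothetical, and the sketch does not clamp values outside the −20 °C to 45 °C range studied in the paper, since no behavior is specified there.

```python
def power_derating(t_env_c: float) -> float:
    """Piecewise-linear power derating coefficient for ambient temp in deg C."""
    if t_env_c < 0.0:
        # Rises linearly from 0.3 at -20 deg C to 1.0 at 0 deg C.
        return 0.3 + 0.7 * (t_env_c + 20.0) / 20.0
    if t_env_c <= 30.0:
        return 1.0
    # Falls linearly from 1.0 at 30 deg C to 0.5 at 45 deg C.
    return 1.0 - 0.5 * (t_env_c - 30.0) / 15.0


if __name__ == "__main__":
    for t in (-20.0, -10.0, 0.0, 25.0, 45.0):
        print(f"T_env = {t:6.1f} C -> mu = {power_derating(t):.3f}")
```

The two linear branches meet the flat region continuously at 0 °C and 30 °C, so the coefficient has no jumps over the operating range.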
Low-Temperature Preheating Logic:
$$\alpha_k \le M \left(1 - \delta_k\, \mathbb{I}\left(T_k^{core} < T_{preheat}^{min}\right)\right) \tag{18}$$
where $M$ is a sufficiently large constant and $T_{preheat}^{min}$ denotes the minimum preheating temperature.
High-Temperature Cooling Control:
$$\gamma_k \ge \gamma_{k-1} - M \left(1 - \mathbb{I}\left(T_k^{core} > T_{cool}^{trigger}\right)\right) \tag{19}$$
where $T_{cool}^{trigger}$ represents the cooling trigger temperature.

3.2. Solution Framework with Contrastive Learning and Manifold Constraints

3.2.1. DRL Framework and Problem Mapping

The proposed algorithmic framework learns an optimal charging strategy from scratch through the coordination of three core modules: dynamics modeling, constraint learning, and policy optimization, as shown in Figure 3. In each learning cycle, the system employs Neural Ordinary Differential Equations (Neural ODEs) to model the battery’s dynamic behavior.
The objective function for learning the dynamics model is defined as:
$$\mathcal{L}_{dynamics} = \mathbb{E}_{(x_t, u_t, x_{t+1}) \sim \mathcal{D}} \left\| x_{t+1} - \int_t^{t+\Delta t} f_\eta(x_s, u_s, s)\, ds \right\|_2^2 \tag{20}$$
The integral is computed using a numerical ODE solver (e.g., Runge-Kutta method), ensuring that state predictions are continuous and differentiable, which provides reliable gradient information for the subsequent policy optimization.
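A minimal sketch of the one-step prediction loss with a classical fourth-order Runge-Kutta integrator is shown below. Several things here are assumptions for illustration: the learned network is replaced by an arbitrary callable `f`, the control input is held constant over the interval, and the prediction is formed explicitly as the initial state plus the integrated derivative. The function names are hypothetical.

```python
import numpy as np


def rk4_step(f, x, u, t, dt):
    """One classical RK4 step of dx/dt = f(x, u, t) with u held constant."""
    k1 = f(x, u, t)
    k2 = f(x + 0.5 * dt * k1, u, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, u, t + 0.5 * dt)
    k4 = f(x + dt * k3, u, t + dt)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def dynamics_loss(f, xs, us, xs_next, t, dt):
    """Mean squared one-step prediction error over a batch of transitions."""
    preds = np.stack([rk4_step(f, x, u, t, dt) for x, u in zip(xs, us)])
    return float(np.mean(np.sum((np.asarray(xs_next) - preds) ** 2, axis=1)))


if __name__ == "__main__":
    # Stand-in dynamics: first-order decay driven by the input.
    f = lambda x, u, t: -x + u
    xs = np.array([[1.0], [2.0]])
    us = np.array([[0.0], [0.0]])
    exact = xs * np.exp(-0.1)  # analytic solution after dt = 0.1
    print(dynamics_loss(f, xs, us, exact, t=0.0, dt=0.1))
```

In a training setting the callable would be a differentiable network and the solver would come from an autodiff-aware ODE library, so that gradients flow through the integration.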
Meanwhile, the constraint manifold is learned by analyzing safe and unsafe patterns from historical charging trajectories. The system maintains two trajectory buffers: $\mathcal{B}_{safe}$, which stores trajectories that complete successfully without triggering any safety alerts, and $\mathcal{B}_{unsafe}$, which contains trajectories that lead to overheating, voltage violations, or other hazardous signals. The constraint function $h(x, u, t; \phi)$ is learned by minimizing:
$$\mathcal{L}_{constraint} = \mathbb{E}_{(x, u) \sim \mathcal{B}_{safe}} \left[ \mathrm{ReLU}\left( h(x, u, t; \phi) \right) \right] + \mathbb{E}_{(x, u) \sim \mathcal{B}_{unsafe}} \left[ \mathrm{ReLU}\left( -h(x, u, t; \phi) + \epsilon \right) \right] \tag{21}$$
where $\epsilon > 0$ is a safety margin parameter that ensures the learned constraint boundary is conservative.
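The hinge-style constraint loss can be sketched as follows. The sketch assumes the sign convention that non-positive constraint values mark safe state-action pairs, so safe samples are penalized when the value is positive and unsafe samples are penalized when the value is below the margin. That convention, the function name, and the default margin of 0.1 are assumptions of this example.

```python
import numpy as np


def constraint_loss(h_safe, h_unsafe, eps=0.1):
    """Hinge loss on constraint values h for safe and unsafe samples.

    h_safe   : values of h on samples from the safe buffer (want h <= 0)
    h_unsafe : values of h on samples from the unsafe buffer (want h >= eps)
    """
    h_safe = np.asarray(h_safe, dtype=float)
    h_unsafe = np.asarray(h_unsafe, dtype=float)
    safe_term = np.maximum(h_safe, 0.0).mean()            # ReLU(h)
    unsafe_term = np.maximum(eps - h_unsafe, 0.0).mean()  # ReLU(-h + eps)
    return float(safe_term + unsafe_term)


if __name__ == "__main__":
    print(constraint_loss([-1.0, -2.0], [1.0, 2.0]))  # perfectly separated
    print(constraint_loss([0.5, -1.0], [0.0, 1.0]))   # one violation per buffer
```

The margin pushes unsafe samples strictly away from the boundary, which is what makes the learned safe region conservative.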
The policy network update leverages both the predictive capability of the dynamics model and the safety guarantees provided by the learned manifold. During action generation, the manifold-aware module computes a parameterized representation of the feasible action space given the current state. Policy search is then conducted within this constrained region, ensuring the exploration process remains within safe operational bounds.
The algorithm modulates the exploration level by adjusting the output covariance matrix $\Sigma(x)$ of the manifold parameterization network. Early in training or in unexplored state regions, a higher covariance encourages broader exploration. As confidence grows with accumulated experience, the covariance gradually shrinks, enabling fine-grained control. The state-dependent exploration strategy is formulated as:
$$\Sigma(x) = \Sigma_{base}(x) \cdot \exp\left(-\beta N(x)\right) \tag{22}$$
where $N(x)$ is the visitation count near state $x$, and $\beta$ is a decay coefficient.
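The visitation-based decay of the exploration covariance can be illustrated with a few lines of NumPy. The decay coefficient value of 0.01 and the function name are assumptions for the example; how visitation counts are estimated near a continuous state is not specified here and is left outside the sketch.

```python
import numpy as np


def exploration_covariance(sigma_base, visit_count, beta=0.01):
    """Scale a base covariance matrix by exp(-beta * visit_count)."""
    return np.asarray(sigma_base, dtype=float) * np.exp(-beta * visit_count)


if __name__ == "__main__":
    base = np.diag([4.0, 1.0])
    for n in (0, 50, 200):
        print(n, exploration_covariance(base, n).diagonal())
```

With no visits the base covariance is returned unchanged, and each additional visit shrinks every entry by the same factor.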
The learning process uses an asynchronous update schedule: the dynamics model is updated most frequently (every episode), the constraint function at a moderate frequency (every 10 episodes), and the policy network least frequently (every 20 episodes).
The overall algorithm flow is illustrated in Figure 4, which shows the iterative learning process with environmental adaptation.

3.2.2. Contrastive Learning-Enhanced Policy Gradient (CLPG) Algorithm

The CLPG algorithm introduced in this paper incorporates a trajectory-level contrastive learning mechanism, as shown in Figure 5. Let $\theta$ denote the parameters of the policy network, and $\pi_\theta(u|x)$ represent the policy function. A trajectory encoder $E_\psi: \mathcal{T} \to \mathbb{R}^d$ maps trajectories into an embedding space, where $\mathcal{T}$ denotes the trajectory space and $d$ represents the embedding dimension. For trajectories $\tau_i$ and $\tau_j$, the contrastive learning loss function is defined as:
$$\mathcal{L}_{CL} = \mathbb{E}_{\tau_i \sim \mathcal{D}^+,\ \tau_j \sim \mathcal{D}^-} \left[ \log \frac{\exp\left( s\left(E_\psi(\tau_i), E_\psi(\tau_j)\right) \right)}{\sum_{k=1}^{K} \exp\left( s\left(E_\psi(\tau_i), E_\psi(\tau_k)\right) \right)} \right] \tag{23}$$
where $\mathcal{D}^+$ and $\mathcal{D}^-$ denote the distributions of positive (safe and efficient trajectories) and negative (violation or inefficient trajectories) samples, respectively. $s(\cdot, \cdot)$ is the similarity function, and $K$ is the number of negative samples.
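For readers unfamiliar with contrastive objectives, the sketch below computes a standard InfoNCE-style loss on trajectory embeddings. It is not claimed to match the paper's loss term for term: it places a positive pair in the numerator, includes that pair in the denominator together with the negatives, uses cosine similarity, and divides by a temperature of 0.1. All of these choices, and the function names, are assumptions of this example.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE: -log softmax score of the positive pair."""
    pos_logit = cosine_similarity(anchor, positive) / temperature
    neg_logits = [cosine_similarity(anchor, n) / temperature for n in negatives]
    logits = np.array([pos_logit] + neg_logits)
    shift = logits.max()  # subtract the max for a stable log-sum-exp
    log_denominator = shift + np.log(np.sum(np.exp(logits - shift)))
    return float(log_denominator - pos_logit)


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    negatives = [[-1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(anchor, [1.0, 0.0], negatives))  # aligned positive
    print(info_nce_loss(anchor, [0.0, 1.0], negatives))  # poorly aligned positive
```

The loss is small when the anchor embedding is closer to the positive than to any negative, which is the property that lets an encoder separate safe, efficient trajectories from unsafe ones.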
The policy gradient update rule is modified as:
$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} \left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(u_t | x_t)\, A_{aug}(x_t, u_t) \right] \tag{24}$$
where the augmented advantage function $A_{aug}$ is defined as:
$$A_{aug}(x_t, u_t) = A_{base}(x_t, u_t) + \lambda\, S\left(E_\psi(\tau_{0:t})\right) \tag{25}$$
where $A_{base}$ is the baseline advantage function, $S(\cdot)$ is the trajectory quality score function, $\tau_{0:t}$ represents the partial trajectory from time 0 to time $t$, and $\lambda$ is a balancing coefficient.
Distinguished from conventional actor-critic architectures, this approach eliminates the explicit value function approximator by leveraging the trajectory encoder $E_\psi$ to directly assess trajectory quality. This design choice is motivated by the observation that in safety-critical charging control, the relative ranking of trajectories provides more robust learning signals than absolute value estimates. The trajectory quality score $S(\cdot)$ in Equation (25) effectively serves the role of a critic but operates on complete trajectory embeddings rather than state-value pairs, thereby reducing approximation errors and improving stability.

3.2.3. Manifold-Aware Action Generation Network (MAN)

To ensure that generated actions consistently satisfy dynamic constraints, MAN is proposed. The core idea of this network is to learn a state-dependent parameterization of the action manifold.
Given a state $x$, the manifold parameterization network $F_\omega: \mathcal{X} \to \mathbb{R}^{2q}$ outputs the local parameters of the manifold:
$$[\mu(x), \Sigma(x)] = F_\omega(x) \tag{26}$$
where $\mu(x) \in \mathbb{R}^q$ represents the manifold center, and $\Sigma(x) \in \mathbb{R}^{q \times q}$ denotes the covariance matrix.
A latent variable $z \sim \mathcal{N}(\mu(x), \Sigma(x))$ is sampled from the tangent space of the manifold, and the action generation process is carried out through a differentiable projection operator $\Pi_{\mathcal{M}}: \mathbb{R}^p \times \mathcal{X} \to \mathcal{M}$:
$$u = \Pi_{\mathcal{M}}\left(G_\nu(x, z), x\right) \tag{27}$$
where $G_\nu: \mathcal{X} \times \mathbb{R}^q \to \mathbb{R}^p$ is the action generation network.
The projection operator is achieved by solving the following optimization problem:
$$\Pi_{\mathcal{M}}(u, x) = \arg\min_{u' \in \mathcal{M}_t} \left\| u' - u \right\|_2^2 \tag{28}$$
This constrained optimization problem can be transformed into an unconstrained optimization problem by means of the Lagrange multiplier method:
$$\mathcal{L}(u', \lambda) = \frac{1}{2} \left\| u' - u \right\|_2^2 + \lambda^T h(x, u', t; \phi_t) \tag{29}$$
By solving the Karush-Kuhn-Tucker (KKT) conditions, the optimal action $u^*$ that satisfies the constraints can be obtained. This projection mechanism ensures that even during the exploration phase, all generated actions strictly adhere to safety constraints, thereby ensuring the safety of the charging process.
Unlike barrier functions that rely on penalty terms approaching infinity near constrained boundaries, manifold parameterization restricts the action space directly to feasible regions via geometric projection. This approach offers several advantages: it ensures constraint satisfaction even during exploration, thereby preventing safety violations; it eliminates the need for hyperparameter tuning associated with penalty weights, thus simplifying implementation; and it preserves differentiability throughout the projection, maintaining stable gradient flow and learning efficiency. In contrast, barrier methods require careful selection of penalty coefficients and may still allow constraint violations, particularly in the early stages of training.
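To make the projection step concrete, the sketch below handles the simplest possible case: a single linear constraint, for which the KKT conditions have a closed-form solution. The learned constraint in the paper is nonlinear and state-dependent, so this half-space stand-in, the function name, and the example numbers are all assumptions chosen for illustration. Stationarity of the Lagrangian gives the projected action as the raw action minus the multiplier times the constraint normal, and complementary slackness fixes the multiplier.

```python
import numpy as np


def project_halfspace(u, a, b):
    """Project u onto the half-space {v : a . v <= b} (a must be non-zero).

    KKT solution: v = u - lam * a with lam = max(0, (a . u - b) / ||a||^2).
    """
    u = np.asarray(u, dtype=float)
    a = np.asarray(a, dtype=float)
    violation = a @ u - b
    if violation <= 0.0:
        return u.copy()          # already feasible, multiplier is zero
    lam = violation / (a @ a)    # Lagrange multiplier of the active constraint
    return u - lam * a


if __name__ == "__main__":
    # Hypothetical action (power, cooling) with a cap of 2.0 on the first entry.
    print(project_halfspace([3.0, 4.0], [1.0, 0.0], 2.0))
    print(project_halfspace([1.0, 1.0], [1.0, 0.0], 2.0))
```

Feasible actions pass through unchanged, while infeasible ones are moved the minimum Euclidean distance onto the boundary, which is the behavior the projection operator is meant to provide.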

3.3. Theoretical Analysis

3.3.1. Convergence Analysis

Under the assumptions that the augmented advantage function $A_{aug}$ is bounded, $|A_{aug}(x, u)| \le B$, and that the policy gradient is $L$-Lipschitz continuous, $\left\| \nabla_\theta \log \pi_\theta(u|x) - \nabla_{\theta'} \log \pi_{\theta'}(u|x) \right\| \le L \left\| \theta - \theta' \right\|$, the policy gradient algorithm with a learning rate of $\alpha_k = \alpha_0 / \sqrt{k}$ converges to a local optimum at a rate of $O(1/\sqrt{K})$.
The objective function is defined as $J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]$. According to the policy gradient theorem:
$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} \left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(u_t | x_t)\, A_{aug}(x_t, u_t) \right] \tag{30}$$
Given that $A_{aug}$ is bounded and the policy gradient is $L$-Lipschitz, the objective function $J(\theta)$ is $L$-smooth. Applying the standard convergence result for stochastic gradient descent yields:
$$\mathbb{E}\left[ J(\theta^*) - J(\theta_K) \right] \le \frac{L \left\| \theta_0 - \theta^* \right\|^2}{2 \sum_{k=1}^{K} \alpha_k} + \frac{\sigma^2 \sum_{k=1}^{K} \alpha_k^2}{2 \sum_{k=1}^{K} \alpha_k} \tag{31}$$
where $\sigma^2$ is an upper bound on the variance of the gradient estimate.
Substitution of $\alpha_k = \alpha_0 / \sqrt{k}$ into this inequality results in a convergence rate of $O(1/\sqrt{K})$, which confirms the convergence of the algorithm.
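The substitution step can be written out explicitly. The sketch below assumes the step size is read as $\alpha_k = \alpha_0 / \sqrt{k}$ and introduces the shorthand $D = \|\theta_0 - \theta^*\|$, which is not notation from the paper. Both sums are bounded with elementary estimates: every term of the first sum is at least $1/\sqrt{K}$, and the harmonic sum is at most $1 + \ln K$.

```latex
% Assumption: \alpha_k = \alpha_0 / \sqrt{k};  D := \|\theta_0 - \theta^*\| (shorthand introduced here).
\sum_{k=1}^{K} \alpha_k
  = \alpha_0 \sum_{k=1}^{K} \frac{1}{\sqrt{k}}
  \;\ge\; \alpha_0 \, K \cdot \frac{1}{\sqrt{K}}
  = \alpha_0 \sqrt{K},
\qquad
\sum_{k=1}^{K} \alpha_k^2
  = \alpha_0^2 \sum_{k=1}^{K} \frac{1}{k}
  \;\le\; \alpha_0^2 \, (1 + \ln K).

% Inserting both estimates into the two-term bound:
\mathbb{E}\left[ J(\theta^*) - J(\theta_K) \right]
  \;\le\; \frac{L D^2}{2 \alpha_0 \sqrt{K}}
        + \frac{\sigma^2 \alpha_0 \, (1 + \ln K)}{2 \sqrt{K}}
  = O\!\left( \frac{\ln K}{\sqrt{K}} \right).
```

Under this reading, the bound decays as $1/\sqrt{K}$ up to a logarithmic factor that comes from the variance term.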

3.3.2. Computational Complexity Analysis

Let $n$, $p$, $h$, and $T$ denote the dimensions of the state space, action space, neural network hidden layer, and trajectory length, respectively.
The time complexity for a single policy update is $O(T(n + p)h^2 + Kd^2)$, where the first term corresponds to the forward and backward propagation of the policy network, and the second to the contrastive learning computation.
The computational complexity of the manifold projection depends on the specific form of the constraints. For linear constraints, the complexity is $O(p^3)$. For general nonlinear constraints, using an interior-point method results in a complexity of $O(p^3 \log(1/\epsilon))$, where $\epsilon$ denotes the desired solution precision. The space complexity is determined by the storage for the policy network parameters $O(nh + h^2 + hp)$, the trajectory encoder parameters $O(Tnh + h^2 + hd)$, and the experience replay buffer $O(BT(n + p))$, where $B$ is the buffer size. Therefore, the algorithm’s complexity is comparable to standard DRL and exhibits scalability.

4. Results

4.1. Experimental Setup and Benchmarks

4.1.1. Experimental Platform Configuration

The experiments were conducted on a high-performance computing platform, with the hardware configuration listed in Table 1. An Intel Core i9-14900K CPU (Intel Corporation, Santa Clara, CA, USA) was paired with an NVIDIA RTX 5090 GPU. The software environment consisted of PyTorch 2.0.1 and the PyBaMM battery simulation platform (version 23.5), which was used to construct a high-fidelity electrochemical model. All results reported here are simulation results obtained under controlled conditions.
The thermal and electrical parameters in Table 2 were configured for the PyBaMM simulation environment based on validated literature values for NCM622 lithium-ion cells. The core heat capacity (62.7 J/°C) and surface heat capacity (4.5 J/°C) were set according to the thermal characterization studies by reference [25] for similar 5 Ah cylindrical cells. The internal convective coefficient (1.9 W/°C) and ambient convective coefficient (5 W/°C) were configured based on the thermal modeling parameters reported by reference [30]. The internal resistance function $R_{int}(SOC_k, T_k^{core})$ was parameterized using the Arrhenius-based model from Plett’s battery modeling work, with activation energy and pre-exponential factors calibrated to match typical NCM622 behavior [31]. These simulation parameters were validated by comparing the model’s thermal and electrical responses against published experimental data for similar battery chemistries under comparable operating conditions.

4.1.2. Comparative Algorithms and Evaluation Metrics

To comprehensively evaluate algorithmic performance, six representative algorithms spanning three major categories (traditional methods, MPC, and DRL) were selected for comparison, as summarized in Table 3. The traditional methods are the classical constant current–constant voltage (CC-CV) strategy and the multi-stage constant current strategy. For the MPC category, a model predictive controller based on an equivalent circuit model (MPC-ECM) was employed. In the DRL category, standard Proximal Policy Optimization (PPO), Constrained Policy Optimization (CPO), and Soft Actor-Critic with Lagrangian constraints (SAC-Lagrangian) were considered.
The evaluation metrics include charging time, average C-rate, energy efficiency, and capacity retention. Safety metrics cover maximum temperature rise, safety violation rate, cumulative aging factor, and exploration safety. Specifically, the cumulative aging factor $L_{aging} = \sum_k J_{aging,k}$ reflects the overall impact of the charging process on battery degradation. The exploration safety metric $S_{explore} = 1 - (N_{unsafe}/N_{explore})$ quantifies the safety assurance capability of DRL algorithms during training. Details are provided in Table 4.
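The safety metrics defined above are simple to compute from logged data. The sketch below is one possible implementation, assuming per-step aging costs and temperatures are available as plain Python sequences; the function names and the example numbers are illustrative and do not come from the study.

```python
from typing import Sequence


def cumulative_aging(j_aging: Sequence[float]) -> float:
    """L_aging: sum of the per-step aging costs J_aging,k."""
    return float(sum(j_aging))


def exploration_safety(n_unsafe: int, n_explore: int) -> float:
    """S_explore = 1 - N_unsafe / N_explore."""
    if n_explore <= 0:
        raise ValueError("n_explore must be positive")
    return 1.0 - n_unsafe / n_explore


def violation_rate_pct(n_violation: int, n_total: int) -> float:
    """R_violation = (N_violation / N_total) * 100 %."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_violation / n_total


def max_temperature_rise(t_core_c: Sequence[float], t_env_c: float) -> float:
    """Delta T_max = max(T_core) - T_env, in deg C."""
    return max(t_core_c) - t_env_c


if __name__ == "__main__":
    # Hypothetical log values, used only to exercise the functions.
    print(cumulative_aging([0.01, 0.02, 0.03]))
    print(exploration_safety(n_unsafe=60, n_explore=500))
    print(violation_rate_pct(n_violation=3, n_total=1000))
    print(max_temperature_rise([25.0, 30.5, 33.5], t_env_c=25.0))
```

An exploration safety of 1.0 corresponds to training without any unsafe exploratory action.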

4.2. Ablation Study of Key Algorithmic Components

4.2.1. Contribution of the Contrastive Learning Module

To quantitatively assess the contribution of the contrastive learning mechanism, a set of ablation studies was conducted, with results presented in Table 5. Removing the module led to a marked degradation in performance: charging time increased from 18.3 to 20.1 min, energy efficiency decreased by 1.7 percentage points (from 96.2% to 94.5%), and the cumulative aging factor rose by 15.9% (from 0.082 to 0.095). Furthermore, the contrastive learning module enhances learning efficiency by distinguishing among historical charging patterns. Specifically, the number of training episodes required to reach 90% performance was reduced from 1200 to 850, representing a 29.2% improvement in training efficiency.
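The percentages quoted for the ablation can be checked directly against the Table 5 entries and the episode counts given in the text. The short script below is only an arithmetic check; the dictionary keys are illustrative names.

```python
def relative_change_pct(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# Values reported in the text and in Table 5.
full = {"time_min": 18.3, "efficiency_pct": 96.2, "aging": 0.082, "episodes": 850}
no_cl = {"time_min": 20.1, "efficiency_pct": 94.5, "aging": 0.095, "episodes": 1200}

time_increase_pct = relative_change_pct(no_cl["time_min"], full["time_min"])
efficiency_drop_pp = full["efficiency_pct"] - no_cl["efficiency_pct"]
aging_increase_pct = relative_change_pct(no_cl["aging"], full["aging"])
episode_saving_pct = -relative_change_pct(full["episodes"], no_cl["episodes"])

if __name__ == "__main__":
    print(f"charging time increase: {time_increase_pct:.1f} %")
    print(f"efficiency drop:        {efficiency_drop_pp:.1f} percentage points")
    print(f"aging factor increase:  {aging_increase_pct:.1f} %")
    print(f"training episodes saved: {episode_saving_pct:.1f} %")
```

The aging and episode figures round to the 15.9% and 29.2% stated above, while the efficiency change is an absolute difference in percentage points rather than a relative change.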

4.2.2. Safety Assurance via Manifold Constraints

MAN represents a key innovation for ensuring charging safety. Ablation results presented in Table 6 show that although the charging time was slightly reduced after removing the manifold constraint, the safety violation rate surged to 4.2%, the peak temperature rise reached 13.8 °C, and the cumulative aging factor increased by 43.9%. These results strongly confirm the critical role of manifold constraints in maintaining charging safety.
Figure 6 illustrates the safety boundary within the action space. The manifold constraint projects the original high-dimensional action space onto a lower-dimensional manifold that satisfies all physical constraints, thereby ensuring that all generated control actions remain within the safe region—specifically constrained by the temperature-current-SOC safety envelope.
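The idea of projecting a raw action into a state-dependent safe region can be sketched in its simplest form, where the safe set is an interval and the Euclidean projection reduces to clipping. The envelope function below is purely hypothetical: only the 55 °C temperature limit and the 4C maximum rate are taken from Table 2, while the linear derating shapes, the 30 °C derating span, and the 80% SOC knee are assumptions. The MAN described in the paper is a learned network handling more general constraints, so this is an analogy rather than a reproduction.

```python
T_MAX_C = 55.0   # maximum safe temperature (Table 2)
C_MAX = 4.0      # maximum charging rate, 4C (Table 2)


def _clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def safe_c_rate_limit(t_core_c: float, soc: float) -> float:
    """Hypothetical temperature-current-SOC envelope (illustrative only)."""
    # Linear thermal derating over the last 30 deg C before the limit (assumption).
    thermal_factor = _clamp((T_MAX_C - t_core_c) / 30.0, 0.0, 1.0)
    # Linear taper above 80 % SOC (assumption).
    soc_factor = 1.0 if soc < 0.8 else _clamp((1.0 - soc) / 0.2, 0.0, 1.0)
    return C_MAX * thermal_factor * soc_factor


def project_action(raw_c_rate: float, t_core_c: float, soc: float) -> float:
    """Project a scalar action onto [0, limit]; for an interval this is clipping."""
    return _clamp(raw_c_rate, 0.0, safe_c_rate_limit(t_core_c, soc))


if __name__ == "__main__":
    for raw, temp, soc in [(5.0, 25.0, 0.3), (3.0, 40.0, 0.5), (3.0, 60.0, 0.5)]:
        print(raw, temp, soc, "->", project_action(raw, temp, soc))
```

Because every action passes through the projection before being applied, an exploratory policy output can never leave the envelope, which is the property the violation-rate comparison relies on.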
Figure 7 compares the evolution of safety violation rates during exploration under different configurations. The complete CL-PG+MAN algorithm maintained a zero violation rate throughout the entire training process. In contrast, removing the manifold constraint led to an initial violation rate as high as 12%; although this rate decreased as training progressed, it never fully eliminated the safety risks. These findings underscore the necessity of the manifold constraint, particularly in safety-critical charging control applications.

4.3. Performance Evaluation Under Different Environmental Conditions

4.3.1. Performance Comparison Under Ambient Temperature Conditions

Simulation results indicate that at a standard ambient temperature of 25 °C, the evaluated charging algorithms exhibited significant performance differences, as summarized in Table 7. The proposed CL-PG+MAN method achieved a charging time of 18.3 min with an average charging rate of 3.28C, a 47.7% reduction in charging time compared to the conventional CC-CV strategy. This improvement was attained without compromising efficiency or safety: the method maintained an energy efficiency of 96.2%, a capacity retention of 99.7%, and a peak temperature rise of only 8.5 °C, with zero safety violations recorded. These performance disparities reflect inherent algorithmic trade-offs. CL-PG+MAN’s superiority stems from its ability to operate near the safety boundary without violations. In contrast, PPO achieves competitive charging speed but lower efficiency (91.2%) because it is unaware of thermal dynamics and frequently triggers protective throttling. MPC-ECM delivers consistent performance through model prediction but operates conservatively and lacks adaptability to parameter variations.
From the current profiles illustrated in Figure 8, it can be observed that the CL-PG+MAN method adopts a high charging current close to 4C during the initial stage. As the state of charge (SOC) and temperature rise, the algorithm adaptively adjusts the current, applying power derating in the 40–60% SOC range to effectively suppress further temperature increase. In contrast, the standard PPO method attempts to charge with a high current, but the absence of effective constraint mechanisms leads to a rapid temperature escalation into hazardous zones, necessitating a significant reduction in charging power. The MPC-ECM method demonstrates better thermal control due to its model-based design; however, its conservative behavior limits charging speed.
Pareto front analysis in Figure 9 further reveals the trade-off between charging time and energy efficiency across different methods. The CL-PG+MAN approach lies on the upper-right edge of the Pareto frontier (the optimal region), indicating a superior balance between speed and efficiency. This advantage is attributed to contrastive learning’s ability to identify efficient charging patterns and the manifold constraints that ensure safety during policy exploration. Traditional methods, while safe and reliable, fail to fully exploit the battery’s charging potential. On the other hand, purely reinforcement learning-based approaches often compromise efficiency and safety in pursuit of speed.
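A Pareto front of this kind can be extracted with a simple dominance check, where a method is kept only if no other method is at least as good in both objectives and strictly better in one. In the sketch below only the CL-PG+MAN point comes from the text; the entries named `method_A` to `method_C` are hypothetical placeholders, not results from the study.

```python
from typing import Dict, List, Tuple


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of non-dominated methods.

    Each value is (charging_time_min, energy_efficiency_pct); a lower time
    and a higher efficiency are better.
    """
    front = []
    for name, (time_min, eff) in points.items():
        dominated = any(
            (t2 <= time_min and e2 >= eff) and (t2 < time_min or e2 > eff)
            for other, (t2, e2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


methods = {
    "CL-PG+MAN": (18.3, 96.2),  # reported in the text
    "method_A": (20.0, 91.0),   # hypothetical placeholder
    "method_B": (30.0, 97.0),   # hypothetical placeholder
    "method_C": (35.0, 94.0),   # hypothetical placeholder
}

if __name__ == "__main__":
    print(pareto_front(methods))
```

With these placeholder values, `method_A` and `method_C` are dominated, while `method_B` stays on the front because it trades a longer charging time for a slightly higher efficiency.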

4.3.2. Adaptability Under Extreme Temperature Conditions

As shown in Table 8, in the simulated extreme temperature scenario of 5 °C, the charging time of the CL-PG+MAN method lengthens to 28.7 min, while an average charging rate of 2.09C is still maintained. This represents a 45.1% reduction in charging time compared to the conventional CC-CV method. The algorithm autonomously adjusts the multi-objective weights through an environmental adaptation mechanism, reducing the weight assigned to charging time from 0.6 to 0.3 to emphasize battery protection.
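One way to picture the temperature-dependent weight adjustment is a clamped linear interpolation between the two reported weight values. The sketch below assumes that the weight of 0.6 applies at 25 °C and 0.3 at 5 °C, and that the transition between them is linear. Both the anchoring temperatures and the interpolation rule are assumptions; the source states only the two weight values, and the sketch covers only the cold side of the temperature range.

```python
W_TIME_NOMINAL = 0.6   # charging-time weight reported for standard conditions
W_TIME_COLD = 0.3      # charging-time weight reported for low temperature
T_NOMINAL_C = 25.0     # assumed temperature at which 0.6 applies
T_COLD_C = 5.0         # assumed temperature at which 0.3 applies


def time_weight(t_env_c: float) -> float:
    """Hypothetical charging-time weight w_time(T_env) on the cold side."""
    frac = (t_env_c - T_COLD_C) / (T_NOMINAL_C - T_COLD_C)
    frac = min(max(frac, 0.0), 1.0)  # clamp outside the assumed range
    return W_TIME_COLD + frac * (W_TIME_NOMINAL - W_TIME_COLD)


if __name__ == "__main__":
    for temp_c in (-20.0, 5.0, 15.0, 25.0):
        print(f"T_env = {temp_c:5.1f} degC -> w_time = {time_weight(temp_c):.2f}")
```

Whatever weight is removed from charging time would be redistributed to the protection-related objectives so that the weights remain normalized.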
Under extreme temperatures, the algorithmic differences become more pronounced. At 5 °C, PPO’s violation rate increases to 5.1% due to inadequate thermal modeling, while the proposed method maintains zero violations through temperature-aware manifold adaptation. At 40 °C, CC-CV shows 3.2% violations as its fixed thresholds cannot adapt to elevated thermal stress, whereas MPC-ECM’s conservative approach keeps violations at 0.3% but sacrifices charging speed.
Figure 10 illustrates the dynamic adjustment of power levels and cooling modes across temperature conditions. In low-temperature environments, the algorithm first activates a preheating phase by applying a small charging current to raise the battery temperature to an optimal range. During the main charging phase, the selected power levels tend to remain conservative, oscillating primarily between levels 2 and 3, while the cooling system operates in natural convection mode to maintain thermal stability. In contrast, under high-temperature conditions, the algorithm immediately activates the liquid cooling system at the onset of charging and limits the maximum power level below 3, effectively suppressing temperature rise.
The radar plot in Figure 11 presents a comprehensive comparison across six evaluation dimensions: charging time, energy efficiency, capacity retention, thermal control, safety, and environmental adaptability. The CL-PG+MAN approach demonstrates superior performance across all metrics, forming the largest enclosed area. In particular, the method shows strong environmental adaptability, effectively handling a wide temperature range from −20 °C to 45 °C without requiring manual parameter adjustments.

4.4. Validation of Online Adaptive

Strategy Adaptation for Batteries with Varying Health Conditions

Table 9 presents the policy adaptation results of the proposed method when applied to batteries with different SOH. As the SOH degrades from 100% to 90%, the CL-PG+MAN algorithm autonomously reduces the average charging rate from 3.28C to 2.93C, resulting in an extended charging time of 20.5 min. This adjustment is driven by the algorithm’s real-time perception of battery conditions and intelligent decision-making, requiring no manual intervention. In contrast, MPC-ECM’s performance degrades severely due to model mismatch, while CC-CV’s inability to adapt creates increasing safety risks, highlighting the necessity of learning-based approaches for long-term deployment.
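The reported charging times and average C-rates can be cross-checked against each other, since charging a fraction ΔSOC of the nominal capacity at an average rate of C_avg takes 60·ΔSOC/C_avg minutes. The sketch below assumes that each reported charge covers the full nominal capacity (ΔSOC = 1), which the source does not state explicitly.

```python
def charging_time_min(avg_c_rate: float, delta_soc: float = 1.0) -> float:
    """Minutes needed to charge `delta_soc` of nominal capacity at `avg_c_rate`."""
    if avg_c_rate <= 0:
        raise ValueError("avg_c_rate must be positive")
    return 60.0 * delta_soc / avg_c_rate


# (condition, average C-rate, reported charging time in min), as given in the text.
reported = [
    ("25 degC, SOH 100%", 3.28, 18.3),
    ("5 degC", 2.09, 28.7),
    ("SOH 90%", 2.93, 20.5),
]

if __name__ == "__main__":
    for label, c_rate, t_reported in reported:
        t_est = charging_time_min(c_rate)
        print(f"{label:18s} estimated {t_est:5.2f} min, reported {t_reported} min")
```

Under that assumption the three estimates agree with the reported times to within rounding, so the rate and time figures are mutually consistent.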
As illustrated in Figure 12, the algorithm progressively lowers the upper limit of charging power, increases the activation frequency of the cooling system, and adopts more conservative strategies during the high-SOC stage as SOH declines. When the SOH drops to 70%, the algorithm completely abandons the 4C fast-charging mode and restricts the maximum charging rate to below 2.5C, prioritizing battery safety and residual lifespan.
The dynamic adjustment of the temperature-current safety boundary, shown in Figure 13, highlights the adaptability of the proposed method. As the battery ages, the safety envelope gradually shrinks, reducing the allowable charging current range. The algorithm leverages neural ordinary differential equations (Neural ODEs) to continuously update the battery’s dynamic model, capturing aging-induced increases in internal resistance and alterations in thermal behavior. Based on these insights, it adaptively modifies the control strategy. Such online learning and adaptation capabilities are difficult to achieve using conventional methods.

4.5. Computational Efficiency and Real-Time Feasibility

4.5.1. Sample Efficiency and Convergence Speed

The sample efficiency of deep reinforcement learning (DRL) algorithms directly affects their practical applicability. Table 10 compares the learning efficiency of various algorithms. The CL-PG+MAN algorithm demonstrates a significant advantage, achieving 90% performance within only 850 training episodes, representing a 63.0% reduction in sample demand compared to standard PPO. This improvement is primarily attributed to the contrastive learning mechanism, which effectively leverages historical experience to accelerate policy optimization.
As illustrated in Figure 14, the CL-PG+MAN algorithm exhibits rapid performance gains during the early stages of training and approaches near-optimal performance by episode 1000, significantly outperforming standard PPO and other baseline methods in both convergence speed and final performance. This enhanced sample efficiency—reaching convergence in 850 episodes compared to 2300 for standard PPO—stems primarily from its actor-only architecture combined with contrastive learning. By removing the critic network and its value estimation errors, and instead learning through trajectory comparisons, the algorithm attains more stable gradient estimates from the initial stages, resulting in the smooth convergence curve shown in the figure. It is also noteworthy that even without the contrastive learning module, the manifold constraint mechanism alone continues to provide stabilization and mitigates training oscillations.

4.5.2. Inference Time and Resource Utilization Analysis

Table 11 provides a detailed breakdown of computational time across all modules. The total inference time per step is 6.76 ms, which fully satisfies the 10-s control cycle requirement for real-time operation. Among the components, manifold projection accounts for 31.8% of the total computation time, reflecting its critical role in ensuring safety. In batch mode, processing 32 samples requires only 84.5 ms, demonstrating strong parallel computing capabilities.
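The timing figures above can be put in proportion with a few lines of arithmetic. The script uses only the numbers quoted in this paragraph; the variable names are illustrative.

```python
STEP_MS = 6.76            # single-sample inference time per step
CONTROL_CYCLE_S = 10.0    # control cycle
BATCH_MS = 84.5           # time for one batch of 32 samples
BATCH_SIZE = 32
PROJECTION_SHARE = 0.318  # share of time spent in manifold projection

per_sample_batch_ms = BATCH_MS / BATCH_SIZE
cycle_utilisation = STEP_MS / (CONTROL_CYCLE_S * 1e3)
projection_ms = PROJECTION_SHARE * STEP_MS

if __name__ == "__main__":
    print(f"per-sample time in batch mode: {per_sample_batch_ms:.2f} ms")
    print(f"fraction of control cycle used: {cycle_utilisation:.5f}")
    print(f"manifold projection time:       {projection_ms:.2f} ms")
```

A single inference step therefore occupies well under 0.1% of the 10 s control cycle, and the manifold projection contributes roughly 2.15 ms of the 6.76 ms total.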
Memory usage analysis in Table 12 shows that the full model size is 21.52 MB, with a total runtime memory footprint of 2293 MB, primarily attributed to the experience replay buffer. The model contains 5.53 million parameters, which is comparable to typical deep learning models and remains within the capacity of onboard computing platforms. FLOPs (floating-point operations) analysis indicates that the trajectory encoder contributes the highest computational load (44.3 M), yet it is still within the processing capabilities of embedded GPUs.
Figure 15 illustrates the variation of inference time with different batch sizes. As the batch size increases from 1 to 64, the average inference time per sample decreases from 6.76 ms to 3.21 ms, demonstrating good scalability. This characteristic enables efficient parallel control of multiple battery packs, highlighting its significant practical applicability.
The comprehensive experimental results indicate that the proposed high-power fast-charging optimization control method—integrating Mixed-Integer Nonlinear Programming with deep reinforcement learning—exhibits outstanding performance across multiple dimensions, including charging efficiency, safety, environmental adaptability, and computational efficiency, thereby providing an effective solution for the engineering deployment of high-power fast-charging technology in electric vehicles.

5. Discussion

The proposed CLPG+MAN framework demonstrates superior performance in simulation environments, yet practical deployment requires careful consideration of computational and robustness constraints.
While the experiments utilized high-performance computing with a 21.52 MB model and 6.76 ms inference time, deployment on automotive ECUs is achievable through established optimization techniques. Model quantization from full-precision FP32 to INT8 can reduce size by 75% with minimal performance loss, resulting in a compressed model of approximately 5.4 MB [32]. The manifold projection module, which accounts for about 31.8% of the total computation, can be optimized using lookup tables for common operating points. Modern automotive microcontrollers such as the Infineon AURIX TC3xx series, operating at 300–800 MHz with 8–16 MB of flash memory, are capable of running the optimized model. Estimated inference times between 30 and 50 ms would still comfortably meet the 10-s control cycle requirement [33].
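The size reduction quoted for INT8 quantization follows from the bytes stored per parameter. The estimate below counts only the weights of the 5.53 million parameters reported for the model and ignores the scale and zero-point metadata that a real quantized model also stores, so it is a rough upper bound on the saving rather than a measured figure.

```python
N_PARAMS = 5.53e6          # parameter count reported for the full model
REPORTED_FP32_MB = 21.52   # reported full-precision model size


def weights_size_mb(n_params: float, bytes_per_param: int) -> float:
    """Raw weight storage in MB (10^6 bytes), ignoring any metadata."""
    return n_params * bytes_per_param / 1e6


fp32_mb = weights_size_mb(N_PARAMS, 4)   # 4 bytes per FP32 weight
int8_mb = weights_size_mb(N_PARAMS, 1)   # 1 byte per INT8 weight
reduction = 1.0 - int8_mb / fp32_mb
quantized_from_reported_mb = REPORTED_FP32_MB * (1.0 - reduction)

if __name__ == "__main__":
    print(f"FP32 weights: {fp32_mb:.2f} MB, INT8 weights: {int8_mb:.2f} MB")
    print(f"reduction: {reduction:.0%}")
    print(f"reported size after quantization: {quantized_from_reported_mb:.2f} MB")
```

Applying the 75% reduction to the reported 21.52 MB gives about 5.4 MB, which fits within the 8–16 MB flash range mentioned for the target microcontrollers.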
The framework incorporates multiple mechanisms for handling real-world uncertainties. Figure 16 demonstrates the practical feasibility of deploying our framework on automotive embedded systems. Safety margins account for parameter drift and sensor noise. When the 6.76 ms inference time is combined with a BMS communication latency of around 5 ms and a sensor sampling time of about 2 ms, the total system latency remains under 15 ms [34]. The contrastive learning mechanism inherently learns robust policies from noisy historical data, while the manifold constraints guarantee safety even under uncertainty. For pack-level deployment, the framework can adopt conservative strategies based on the weakest cell’s constraints.
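The latency budget can be written out as a sum of inference, communication, and sensing delays. The sketch below treats these contributions as strictly sequential and additive, which is a simplifying assumption; the 30–50 ms figures are the estimated ECU inference times given in the discussion above.

```python
def total_latency_ms(inference_ms: float,
                     bms_comm_ms: float = 5.0,
                     sensor_ms: float = 2.0) -> float:
    """End-to-end latency assuming the three delays simply add up."""
    return inference_ms + bms_comm_ms + sensor_ms


CONTROL_CYCLE_MS = 10_000.0  # 10 s control cycle

workstation_ms = total_latency_ms(6.76)   # measured inference time
ecu_best_ms = total_latency_ms(30.0)      # estimated ECU inference, lower bound
ecu_worst_ms = total_latency_ms(50.0)     # estimated ECU inference, upper bound

if __name__ == "__main__":
    for label, value in [("workstation", workstation_ms),
                         ("ECU (best)", ecu_best_ms),
                         ("ECU (worst)", ecu_worst_ms)]:
        share = value / CONTROL_CYCLE_MS
        print(f"{label:12s} {value:6.2f} ms ({share:.2%} of the control cycle)")
```

Even with the slower estimated ECU inference, the end-to-end latency stays below 1% of the control cycle under this additive model.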
It is important to note that the current validation is based solely on PyBaMM simulations using a single NCM622 5 Ah cell. While the learning-based approach is theoretically chemistry-agnostic, comprehensive validation across different chemistries (LFP, NCA) and hardware-in-the-loop testing are necessary future steps. The absence of real battery validation represents a significant limitation that should be addressed before commercial deployment. However, the algorithmic innovations and safety mechanisms demonstrated provide a solid foundation for future development.
Future work will explore online learning mechanisms to dynamically adjust multi-objective weights based on real-time conditions and user preferences, thereby enhancing system adaptability. Additionally, efforts will be made to validate the framework in real-world settings and through large-scale tests to further assess its generalization capability and reliability, supporting eventual commercial deployment.

6. Conclusions

In addressing the multi-objective control challenge of high-power fast charging for electric vehicles, a coupled electro-thermal-chemical multi-physics model was developed, alongside a CLPG algorithm and a MAN. These components enable a dynamic balance between charging speed, battery lifespan, and safety. The CLPG algorithm improves sample efficiency by approximately 29% by distinguishing safe from unsafe charging trajectories, while the MAN guarantees zero safety violations through geometric projection onto feasible action manifolds.
Simulation results suggest that the proposed method could reduce charging time to 18.3 min—a 47.7% improvement over conventional approaches—while maintaining 96.2% energy efficiency with zero safety violations. Moreover, the algorithm exhibits exceptional adaptability across a wide temperature range (−20 °C to 45 °C) without the need for manual parameter tuning. With an inference time of only 6.76 ms, real-time control requirements are fully met.
The theoretical contribution lies in unifying trajectory-level contrastive learning with manifold constraints for safe reinforcement learning, eliminating the value approximation errors and constraint violations inherent in conventional approaches.

Author Contributions

Conceptualization, H.T., T.Y. and G.D.; methodology, software, G.D. and M.W.; validation, formal analysis, X.Z.; investigation, resources, G.D.; data curation, H.T., T.Y. and G.D.; writing—original draft preparation, M.W.; writing—review and editing, X.Z.; visualization, T.Y.; supervision, G.D.; project administration, funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62303239, and the Changzhou International Cooperation Project, grant number CZ20240007.

Data Availability Statement

The data that support the findings of this study are available upon request from the corresponding author, [Z].

Conflicts of Interest

Hao Tian, Tao Yan, and Guangwu Dai are employees of CATARC Automotive Test Center (Changzhou) Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
| --- | --- |
| EV | Electric Vehicle |
| MINLP | Mixed-Integer Nonlinear Programming |
| DRL | Deep Reinforcement Learning |
| MPC | Model Predictive Control |
| ECM | Equivalent Circuit Model |
| PG | Policy Gradient |
| PPO | Proximal Policy Optimization |
| CPO | Constrained Policy Optimization |
| SOC | State of Charge |
| SOH | State of Health |
| ODE | Ordinary Differential Equation |
| CLPG | Contrastive Learning-Enhanced Policy Gradient |
| MAN | Manifold-Aware Action Generation Network |
| FLOPs | Floating-point Operations |
| CC-CV | Constant Current–Constant Voltage |
| SAC | Soft Actor-Critic |
| NCM | Nickel Cobalt Manganese |

Appendix A

Table A1. This is a notation table.
| Symbol | Definition | Unit/Range |
| --- | --- | --- |
| $P_k$ | Charging power at time step $k$ | $[0, P_{max}]$ kW |
| $I_k$ | Charging current | $[0, I_{max}]$ A |
| $V_k$ | Charging voltage | $[V_{min}, V_{max}]$ V |
| $\alpha_k$ | Charging state (binary) | $\{0, 1\}$ |
| $\beta_k^j$ | Power level selection | $\{0, 1\}$, $j \in \{1, 2, 3, 4, 5\}$ |
| $\gamma_k$ | Cooling mode | $\{0, 1, 2\}$ |
| $\delta_k$ | Preheating flag | $\{0, 1\}$ |
| $T_k^{core}$ | Battery core temperature | °C |
| $T_k^{surf}$ | Battery surface temperature | °C |
| $SOC_k$ | State of charge | $[0, 1]$ |
| $SOH_k$ | State of health | $[0, 1]$ |
| $T_{env}$ | Ambient temperature | °C |
| $Q_{cool}$ | Cooling power | kW |
| $C_{th}$ | Core thermal capacitance | J/°C |
| $h_{conv}$ | Internal convection coefficient | W/°C |
| $E_a$ | Activation energy | 24,500 J/mol |
| $R_g$ | Universal gas constant | 8.314 J/(mol·K) |
| $w_i(T_{env})$ | Temperature-dependent weights | $[0, 1]$ |
| $\eta_{loss}(P_k, T_k^{core})$ | Energy loss rate function | - |
| $R_{int}(SOC_k, T_k^{core})$ | Internal resistance | |
| $\eta_{anode}$ | Anode potential | V vs. Li/Li+ |

References

1. Ma, Y.; Ding, H.; Mou, H.; Gao, J. Battery Thermal Management Strategy for Electric Vehicles Based on Nonlinear Model Predictive Control. Measurement 2021, 186, 110115.
2. Pozzi, A.; Raimondo, D.M. Stochastic Model Predictive Control for Optimal Charging of Electric Vehicles Battery Packs. J. Energy Storage 2022, 55, 105332.
3. Xavier, M.A.; Trimboli, M.S. Lithium-Ion Battery Cell-Level Control Using Constrained Model Predictive Control and Equivalent Circuit Models. J. Power Sources 2015, 285, 374–384.
4. Liu, K.; Li, K.; Peng, Q.; Zhang, C. A Brief Review on Key Technologies in the Battery Management System of Electric Vehicles. Front. Mech. Eng. 2019, 14, 47–64.
5. Hu, X.; Zou, C.; Zhang, C.; Li, Y. Technological Developments in Batteries: A Survey of Principal Roles, Types, and Management Needs. IEEE Power Energy Mag. 2017, 15, 20–31.
6. Dorokhova, M.; Martinson, Y.; Ballif, C.; Wyrsch, N. Deep Reinforcement Learning Control of Electric Vehicle Charging in the Presence of Photovoltaic Generation. Appl. Energy 2021, 301, 117504.
7. Wan, Z.; Li, H.; He, H.; Prokhorov, D. Model-Free Real-Time EV Charging Scheduling Based on Deep Reinforcement Learning. IEEE Trans. Smart Grid 2019, 10, 5246–5257.
8. Zhang, C.; Liu, Y.; Wu, F.; Tang, B.; Fan, W. Effective Charging Planning Based on Deep Reinforcement Learning for Electric Vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 22, 542–554.
9. Li, H.; Wan, Z.; He, H. Constrained EV Charging Scheduling Based on Safe Deep Reinforcement Learning. IEEE Trans. Smart Grid 2020, 11, 2427–2439.
10. Abdullah, H.M.; Gastli, A.; Ben-Brahim, L. Reinforcement Learning Based EV Charging Management Systems—A Review. IEEE Access 2021, 9, 41506–41531.
11. Ren, D.; Feng, X.; Liu, L.; Hsu, H.; Lu, L.; Wang, L.; He, X.; Ouyang, M. Investigating the Relationship between Internal Short Circuit and Thermal Runaway of Lithium-Ion Batteries under Thermal Abuse Condition. Energy Storage Mater. 2021, 34, 563–573.
12. Ren, Y.; Widanage, D.; Marco, J. A Plating-Free Charging Scheme for Battery Module Based on Anode Potential Estimation to Prevent Lithium Plating. Batteries 2023, 9, 294.
13. Ma, S.; Jiang, M.; Tao, P.; Song, C.; Wu, J.; Wang, J.; Deng, T.; Shang, W. Temperature Effect and Thermal Impact in Lithium-Ion Batteries: A Review. Prog. Nat. Sci. Mater. Int. 2018, 28, 653–666.
14. Zhang, Y.; Wang, Y.; Li, F.; Wu, B.; Chiang, Y.-Y.; Zhang, X. Efficient Deployment of Electric Vehicle Charging Infrastructure: Simultaneous Optimization of Charging Station Placement and Charging Pile Assignment. IEEE Trans. Intell. Transp. Syst. 2021, 22, 6654–6659.
15. Huang, Y.; Kockelman, K.M. Electric Vehicle Charging Station Locations: Elastic Demand, Station Congestion, and Network Equilibrium. Transp. Res. Part D Transp. Environ. 2020, 78, 102179.
16. Han, Y.; Li, T.; Wang, Q. A DQN Based Approach for Large-Scale EVs Charging Scheduling. Complex Intell. Syst. 2024, 10, 8319–8339.
17. Achiam, J.; Held, D.; Tamar, A.; Abbeel, P. Constrained Policy Optimization. In Proceedings of the 34th International Conference on Machine Learning PMLR, Sydney, Australia, 6–11 August 2017; pp. 22–31.
18. Dalal, G.; Dvijotham, K.; Vecerik, M.; Hester, T.; Paduraru, C.; Tassa, Y. Safe Exploration in Continuous Action Spaces. arXiv 2018, arXiv:1801.08757.
19. Marler, R.T.; Arora, J.S. The Weighted Sum Method for Multi-Objective Optimization: New Insights. Struct. Multidisc. Optim. 2010, 41, 853–862.
20. Hu, Z.; Liu, S.; Yang, F.; Geng, X.; Huo, X.; Liu, J. Research on Multi-Objective Optimization Model of Power Storage Materials Based on NSGA-II Algorithm. Int. J. Comput. Intell. Syst. 2024, 17, 76.
21. Couture, J.; Lin, X. Image- and Health Indicator-Based Transfer Learning Hybridization for Battery RUL Prediction. Eng. Appl. Artif. Intell. 2022, 114, 105120.
22. Yang, X.-G.; Leng, Y.; Zhang, G.; Ge, S.; Wang, C.-Y. Modeling of Lithium Plating Induced Aging of Lithium-Ion Batteries: Transition from Linear to Nonlinear Aging. J. Power Sources 2017, 360, 28–40.
23. Ren, D.; Hsu, H.; Li, R.; Feng, X.; Guo, D.; Han, X.; Lu, L.; He, X.; Gao, S.; Hou, J.; et al. A Comparative Investigation of Aging Effects on Thermal Runaway Behavior of Lithium-Ion Batteries. eTransportation 2019, 2, 100034.
24. Wang, Q.; Jiang, B.; Li, B.; Yan, Y. A Critical Review of Thermal Management Models and Solutions of Lithium-Ion Batteries for the Development of Pure Electric Vehicles. Renew. Sustain. Energy Rev. 2016, 64, 106–128.
25. Lin, X.; Perez, H.E.; Siegel, J.B.; Stefanopoulou, A.G.; Li, Y.; Anderson, R.D.; Ding, Y.; Castanier, M.P. Online Parameterization of Lumped Thermal Dynamics in Cylindrical Lithium Ion Batteries for Core Temperature Estimation and Health Monitoring. IEEE Trans. Control Syst. Technol. 2013, 21, 1745–1755.
26. Waldmann, T.; Wilka, M.; Kasper, M.; Fleischhammer, M.; Wohlfahrt-Mehrens, M. Temperature Dependent Ageing Mechanisms in Lithium-Ion Batteries—A Post-Mortem Study. J. Power Sources 2014, 262, 129–135.
27. Petzl, M.; Danzer, M.A. Nondestructive Detection, Characterization, and Quantification of Lithium Plating in Commercial Lithium-Ion Batteries. J. Power Sources 2014, 254, 80–87.
28. Feng, X.; Ouyang, M.; Liu, X.; Lu, L.; Xia, Y.; He, X. Thermal Runaway Mechanism of Lithium Ion Battery for Electric Vehicles: A Review. Energy Storage Mater. 2018, 10, 246–267.
29. Jalees, S.; Hussain, A.; Iqbal, R.; Raza, W.; Ahmad, A.; Saleem, A.; Majeed, M.K.; Faheem, M.; Ahmad, N.; Rehman, L.N.U.; et al. Functional PBI Membrane Based on Polyimide Covalent Organic Framework for Durable Lithium Metal Battery. J. Energy Storage 2024, 101, 113985.
30. Forgez, C.; Do, D.V.; Friedrich, G.; Morcrette, M.; Delacourt, C. Thermal Modeling of a Cylindrical LiFePO4/Graphite Lithium-Ion Battery. J. Power Sources 2010, 195, 2961–2968.
31. Battery Management Systems, Volume II: Equivalent-Circuit Methods. Available online: https://ieeexplore.ieee.org/document/9100098 (accessed on 11 September 2025).
32. Han, S.; Mao, H.; Dally, W. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. arXiv 2015, arXiv:1510.00149.
33. Hamann, A.; Dasari, D.; Kramer, S.; Pressler, M.; Wurst, F. Communication Centric Design in Complex Automotive Embedded Systems. In 29th Euromicro Conference on Real-Time Systems (ECRTS 2017), Proceedings of the Euromicro Conference on Real-Time Systems (ECRTS), Dubrovnik, Croatia, 27–30 June 2017; Bertogna, M., Ed.; Schloss Dagstuhl–Leibniz-Zentrum für Informatik: Dagstuhl, Germany, 2017; Volume 76, pp. 10:1–10:20.
34. Xing, Y.; Ma, E.W.M.; Tsui, K.L.; Pecht, M. Battery Management Systems in Electric and Hybrid Vehicles. Energies 2011, 4, 1840–1857.
Figure 1. Schematic diagram of multi-physics coupling in the battery charging process.
Figure 2. Charging system circuit diagram.
Figure 3. Information Flow of the Unified Charging Learning Framework.
Figure 4. Algorithm Flow Diagram. Solid arrows indicate sequential process flow, dashed arrows represent feedback loops for iterative learning and constraint updates.
Figure 5. Architecture diagram of CL-PG algorithm.
Figure 6. Visualization of safety boundaries in action space.
Figure 7. Comparison of safety violation rates during exploration.
Figure 8. Comparison of charging curves for various algorithms at room temperature.
Figure 9. Pareto frontier analysis of charging time and energy efficiency.
Figure 10. Evolution of charging strategies at different temperatures.
Figure 11. Temperature-performance radar chart.
Figure 12. Evolution of charging strategies during SOH degradation.
Figure 13. Dynamic adjustment of temperature-current safety boundaries for aged batteries.
Figure 14. Comparison of learning curves.
Figure 15. Inference time scaling characteristics under different batch sizes.
Figure 16. Deployment Feasibility Analysis.
Table 1. Experimental platform configuration parameters.
| Configuration Item | Specification |
| --- | --- |
| Hardware Environment | |
| CPU | Intel Core i9-12900K (16 cores/24 threads, 3.2–5.2 GHz) |
| GPU | NVIDIA RTX 3090 (24 GB GDDR6X) |
| Memory | 64 GB DDR5-5600 |
| Storage | 2 TB NVMe SSD |
| Software Environment | |
| Operating System | Ubuntu 20.04 LTS |
| Python Version | 3.9.16 |
| Deep Learning Framework | PyTorch 2.0.1 + CUDA 11.8 |
| Reinforcement Learning Library | Stable-Baselines3 1.8.0 |
| Numerical Computing Libraries | NumPy 1.24.3, SciPy 1.10.1 |
| ODE Solver | torchdiffeq 0.2.3 |
| Battery Simulation Environment | |
| Simulation Platform | PyBaMM 23.5 (Python Battery Mathematical Modelling) |
| | Single Particle Model with thermal effects |
| | 10 s |
| | Relative error < $10^{-6}$ |
Table 2. Battery parameter specifications.
| Parameter | Value | Unit |
| --- | --- | --- |
| Battery Type | NCM622 lithium-ion battery | - |
| Nominal Capacity | 5 | Ah |
| Nominal Voltage | 3.7 | V |
| Cut-off Charging Voltage | 4.2 | V |
| Cut-off Discharging Voltage | 2.5 | V |
| Maximum Charging Rate | 4C | - |
| Core Heat Capacity | 62.7 | J/°C |
| Surface Heat Capacity | 4.5 | J/°C |
| Internal Convection Coefficient | 1.9 | W/°C |
| Ambient Convection Coefficient | 5 | W/°C |
| Maximum Safe Temperature | 55 | °C |
| Minimum Operating Temperature | −20 | °C |
| Lithium Plating Safety Margin | 0.05 | V |
| Activation Energy | 24,500 | J/mol |
| Pre-exponential Factor $A_1$ | $7.5 \times 10^4$ | - |
| Pre-exponential Factor $A_2$ | $1.2 \times 10^3$ | - |
| Current Stress Exponent | 0.55 | - |
Table 3. Summary of comparison algorithms and their characteristics.
| Algorithm | Category | Main Features | Parameter Complexity | Safety Mechanism |
| --- | --- | --- | --- | --- |
| Proposed (CL-PG+MAN) | Deep Reinforcement Learning | Contrastive Learning with Manifold-Constrained Action Generation | $O(10^6)$ | Intrinsic Safety (Manifold Projection) |
| CC-CV | Conventional Method | Two-Stage Constant Current-Constant Voltage Charging | $O(10)$ | Preset Current/Voltage Limits |
| Multi-Stage CC | Rule-Based | Multi-Stage Constant Current Charging (5 stages) | $O(10^2)$ | Stage Transition Rules |
| MPC-ECM | Model Predictive Control | Rolling Optimization with Equivalent Circuit Model | $O(10^3)$ | Constrained Optimization |
| Standard PPO | Deep Reinforcement Learning | Proximal Policy Optimization | $O(10^6)$ | Reward-Penalty Mechanism |
| CPO | Safe Reinforcement Learning | Constrained Policy Optimization | $O(10^6)$ | Lagrangian Constraint |
| SAC-Lagrangian | Safe Reinforcement Learning | Soft Actor-Critic + Lagrangian Multiplier | $O(10^6)$ | Cost Constraint |
Table 4. Definition and calculation formulas of evaluation metrics.

| Metric | Symbol | Formula | Unit | Optimal Direction |
|---|---|---|---|---|
| Charging Time | $t_{charge}$ | $\sum \Delta t \cdot \alpha_k$ | min | ↓ |
| Average C-rate | $C_{avg}$ | $(1/t_{charge}) \int (I/Q_{nom})\, dt$ | C | ↑ |
| Energy Efficiency | $\eta_{energy}$ | $(\Delta SOC \cdot Q_{nom} \cdot V_{avg}) / \int P\, dt \times 100\%$ | % | ↑ |
| Capacity Retention | $R_{capacity}$ | $(Q_{end}/Q_{init}) \times 100\%$ | % | ↑ |
| Max Temperature Rise | $\Delta T_{max}$ | $\max(T_{core}) - T_{env}$ | °C | ↓ |
| Violation Rate | $R_{violation}$ | $(N_{violation}/N_{total}) \times 100\%$ | % | ↓ |
| Cumulative Aging | $L_{aging}$ | $\sum J_{aging,k}$ | - | ↓ |
| Exploration Safety | $S_{explore}$ | $1 - (N_{unsafe}/N_{explore})$ | - | ↑ |

↓ indicates that lower values are preferred; ↑ indicates that higher values are preferred.
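The Table 4 definitions can be evaluated directly on a logged charging trajectory. The sketch below is one possible reading, not the authors' implementation: it assumes uniformly labelled samples with time in seconds, current in A, power in W, and capacity in Ah, and it approximates the integrals with the trapezoidal rule. The function names and the synthetic trajectory are hypothetical. Charging time is omitted because the weighting factor $\alpha_k$ is not defined in this section.

```python
def trapz(values, times):
    """Trapezoidal approximation of the integral of values over times."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def average_c_rate(current_a, times_s, q_nom_ah):
    """C_avg = (1 / t_charge) * integral of (I / Q_nom) dt."""
    t_charge = times_s[-1] - times_s[0]
    return trapz([i / q_nom_ah for i in current_a], times_s) / t_charge


def energy_efficiency(delta_soc, q_nom_ah, v_avg, power_w, times_s):
    """eta = (dSOC * Q_nom * V_avg) / integral(P dt) * 100, both in Wh."""
    stored_wh = delta_soc * q_nom_ah * v_avg
    supplied_wh = trapz(power_w, times_s) / 3600.0  # W*s -> Wh
    return 100.0 * stored_wh / supplied_wh


def max_temperature_rise(t_core_c, t_env_c):
    """dT_max = max(T_core) - T_env."""
    return max(t_core_c) - t_env_c


def violation_rate(n_violation, n_total):
    """R_violation = N_violation / N_total * 100."""
    return 100.0 * n_violation / n_total


def exploration_safety(n_unsafe, n_explore):
    """S_explore = 1 - N_unsafe / N_explore."""
    return 1.0 - n_unsafe / n_explore


# Synthetic trajectory, for illustration only (not data from the paper).
times = [0.0, 300.0, 600.0, 900.0]
current = [10.0, 10.0, 10.0, 10.0]
power = [40.0, 40.0, 40.0, 40.0]
t_core = [25.0, 28.0, 31.5, 33.0]

c_avg = average_c_rate(current, times, q_nom_ah=5.0)
eta = energy_efficiency(0.5, 5.0, 3.7, power, times)
dt_max = max_temperature_rise(t_core, t_env_c=25.0)
```

With a constant 10 A on a 5 Ah cell the average C-rate is 2C, and delivering 9.25 Wh of stored energy from 10 Wh supplied gives 92.5% efficiency, which is a quick way to sanity-check the unit handling.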
Table 5. Ablation study results.

| Configuration | Charging Time (min) | Energy Efficiency (%) | Capacity Retention (%) | Max Temp Rise (°C) | Violation Rate (%) | Cumulative Aging |
|---|---|---|---|---|---|---|
| Full (CL-PG+MAN) | 18.3 | 96.2 | 99.7 | 8.5 | 0 | 0.082 |
| No Contrastive Learning (PG+MAN only) | 20.1 | 94.5 | 99.2 | 10.2 | 0.3 | 0.095 |
| Baseline (Standard PG) | 22.5 | 91.3 | 97.8 | 16.5 | 7.5 | 0.142 |
| Only MAN (no RL) | 28.3 | 92.7 | 99.5 | 9.3 | 0 | 0.091 |
| Supervised Contrastive only | 25.2 | 93.2 | 99 | 11.5 | 1.2 | 0.103 |
Table 6. Ablation study results related to manifold constraints.

| Configuration | Charging Time (min) | Energy Efficiency (%) | Capacity Retention (%) | Max Temp Rise (°C) | Violation Rate (%) | Cumulative Aging |
|---|---|---|---|---|---|---|
| Full (CL-PG+MAN) | 18.3 | 96.2 | 99.7 | 8.5 | 0 | 0.082 |
| No Manifold Constraint (CL-PG) | 17.8 | 93.8 | 98.5 | 13.8 | 4.2 | 0.118 |
| Baseline (Standard PG) | 22.5 | 91.3 | 97.8 | 16.5 | 7.5 | 0.142 |
| Only MAN (no RL) | 28.3 | 92.7 | 99.5 | 9.3 | 0 | 0.091 |
| Supervised Contrastive only | 25.2 | 93.2 | 99 | 11.5 | 1.2 | 0.103 |
Table 7. Performance metrics of various algorithms at 25 °C.

| Algorithm | Temperature (°C) | Charging Time (min) | Avg. C-Rate | Energy Efficiency (%) | Capacity Retention (%) | Max Temp Rise (°C) | Violation Rate (%) |
|---|---|---|---|---|---|---|---|
| CL-PG+MAN | 25 | 18.3 | 3.28 | 96.2 | 99.7 | 8.5 | 0 |
| CC-CV | 25 | 35 | 1.71 | 92.1 | 99.2 | 12.3 | 0 |
| Multi-Stage | 25 | 25.5 | 2.35 | 93.8 | 99.4 | 10.2 | 0 |
| MPC-ECM | 25 | 22.1 | 2.71 | 94.5 | 99.5 | 9.8 | 0 |
| PPO | 25 | 19.8 | 3.03 | 91.2 | 98.9 | 15.7 | 2.3 |
| CPO | 25 | 21.5 | 2.79 | 93.1 | 99.3 | 11.2 | 0.2 |
| SAC-Lagrangian | 25 | 20.2 | 2.97 | 92.8 | 99.1 | 12.5 | 0.5 |
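The headline claim in the abstract (charging 47.7% faster than CC-CV) can be reproduced from the 25 °C charging times in Table 7. The sketch below recomputes that reduction for every method and, as a plausibility check, multiplies each average C-rate by the charging time in hours. The dictionary layout and function name are illustrative assumptions; only the numbers come from the table.

```python
# (charging time [min], average C-rate) at 25 °C, copied from Table 7.
TABLE7 = {
    "CL-PG+MAN": (18.3, 3.28),
    "CC-CV": (35.0, 1.71),
    "Multi-Stage": (25.5, 2.35),
    "MPC-ECM": (22.1, 2.71),
    "PPO": (19.8, 3.03),
    "CPO": (21.5, 2.79),
    "SAC-Lagrangian": (20.2, 2.97),
}


def time_reduction_pct(t_new_min, t_ref_min):
    """Percentage reduction of t_new relative to the reference time."""
    return 100.0 * (t_ref_min - t_new_min) / t_ref_min


t_ref = TABLE7["CC-CV"][0]
reduction_vs_cccv = {
    name: time_reduction_pct(t, t_ref)
    for name, (t, _) in TABLE7.items()
    if name != "CC-CV"
}

# Average C-rate times charging time in hours: charge delivered,
# expressed as a multiple of the nominal capacity.
normalised_charge = {name: c * t / 60.0 for name, (t, c) in TABLE7.items()}

for name, pct in reduction_vs_cccv.items():
    print(f"{name:15s} {pct:5.1f}% shorter than CC-CV")
```

The product of average C-rate and charging time stays within about 1% of 1.0 for every row, which is what the Table 4 definition of $C_{avg}$ would give if each run delivers roughly one nominal capacity of charge. That reading is an inference from the numbers rather than a statement made in the paper.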
Table 8. Performance metrics of various algorithms at 5 °C and 40 °C.

| Algorithm | Temperature (°C) | Charging Time (min) | Avg. C-Rate | Energy Efficiency (%) | Capacity Retention (%) | Max Temp Rise (°C) | Violation Rate (%) |
|---|---|---|---|---|---|---|---|
| CL-PG+MAN | 5 | 28.7 | 2.09 | 93.5 | 99.5 | 11.2 | 0 |
| CC-CV | 5 | 52.3 | 1.15 | 88.2 | 98.7 | 9.8 | 0 |
| Multi-Stage | 5 | 38.5 | 1.56 | 90.1 | 99 | 10.5 | 0 |
| MPC-ECM | 5 | 31.2 | 1.92 | 91.8 | 99.2 | 10.8 | 0 |
| PPO | 5 | 35.6 | 1.69 | 85.3 | 97.8 | 18.3 | 5.1 |
| CPO | 5 | 32.8 | 1.83 | 89.7 | 98.9 | 12.1 | 0.8 |
| SAC-Lagrangian | 5 | 33.5 | 1.79 | 88.9 | 98.8 | 13.2 | 1.2 |
| CL-PG+MAN | 40 | 24.5 | 2.45 | 94.8 | 99.3 | 9.8 | 0 |
| CC-CV | 40 | 42 | 1.43 | 90.5 | 98.5 | 14.5 | 3.2 |
| Multi-Stage | 40 | 32.1 | 1.87 | 92.2 | 98.8 | 12.8 | 1.5 |
| MPC-ECM | 40 | 26.8 | 2.24 | 93.1 | 99 | 11.5 | 0.3 |
| PPO | 40 | 23.2 | 2.59 | 89.7 | 97.2 | 19.2 | 8.7 |
| CPO | 40 | 25.3 | 2.37 | 91.8 | 98.7 | 13.5 | 1.8 |
| SAC-Lagrangian | 40 | 24.8 | 2.42 | 91.2 | 98.5 | 14.1 | 2.5 |
Table 9. Adaptive charging strategy results for batteries with different SOH.

| SOH (%) | Algorithm | Charging Time (min) | Avg. C-Rate | Energy Efficiency (%) | Max Temp Rise (°C) | Strategy Adaptation Description |
|---|---|---|---|---|---|---|
| 100 | CL-PG+MAN | 18.3 | 3.28 | 96.2 | 8.5 | Baseline strategy |
| 100 | MPC-ECM | 22.1 | 2.71 | 94.5 | 9.8 | Fixed model parameters |
| 100 | CC-CV | 35.0 | 1.71 | 92.1 | 12.3 | Static settings |
| 90 | CL-PG+MAN | 20.5 | 2.93 | 95.8 | 9.2 | Automatic C-rate reduction |
| 90 | MPC-ECM | 24.8 | 2.42 | 93.5 | 11.5 | Manual parameter tuning required |
| 90 | CC-CV | 35.0 | 1.71 | 90.8 | 13.8 | No adjustment |
| 80 | CL-PG+MAN | 23.1 | 2.60 | 95.2 | 10.1 | Further reduced C-rate, enhanced cooling |
| 80 | MPC-ECM | 28.2 | 2.13 | 92.1 | 13.2 | Significant model mismatch |
| 80 | CC-CV | 35.0 | 1.71 | 88.5 | 15.5 | Noticeable efficiency drop |
| 70 | CL-PG+MAN | 26.8 | 2.24 | 94.5 | 11.2 | Conservative strategy, safety prioritized |
| 70 | MPC-ECM | 32.5 | 1.85 | 90.2 | 15.8 | Severe performance degradation |
| 70 | CC-CV | 35.0 | 1.71 | 85.3 | 18.2 | Potential safety risks |
Table 10. Sample efficiency comparison.

| Algorithm | 90% Performance (ep) | 95% Performance (ep) | Convergence (ep) | Total Training Time (h) | Final Score |
|---|---|---|---|---|---|
| CL-PG+MAN | 850 | 1420 | 2100 | 18.5 | 0.973 |
| Standard PPO | 2300 | 3850 | 5500 | 42.3 | 0.892 |
| CPO | 1750 | 2900 | 4200 | 35.8 | 0.935 |
| SAC-Lagrangian | 1950 | 3200 | 4500 | 38.2 | 0.927 |
| No Contrastive learning | 1200 | 2100 | 3000 | 24.5 | 0.948 |
| No Manifold | 1100 | 1850 | 2800 | 22.1 | 0.915 |
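The sample-efficiency gap in Table 10 is easier to read as ratios against the proposed method. The following sketch normalises convergence episodes and total training time by the CL-PG+MAN row; the data structure, column indices, and function name are illustrative assumptions, while the numbers are copied from the table.

```python
# Columns: episodes to 90 %, episodes to 95 %, convergence episodes,
# total training time [h], final score. Values copied from Table 10.
TABLE10 = {
    "CL-PG+MAN": (850, 1420, 2100, 18.5, 0.973),
    "Standard PPO": (2300, 3850, 5500, 42.3, 0.892),
    "CPO": (1750, 2900, 4200, 35.8, 0.935),
    "SAC-Lagrangian": (1950, 3200, 4500, 38.2, 0.927),
    "No Contrastive learning": (1200, 2100, 3000, 24.5, 0.948),
    "No Manifold": (1100, 1850, 2800, 22.1, 0.915),
}
CONVERGENCE_EP, TRAINING_H = 2, 3  # column indices into each row


def ratio_to_reference(table, column, reference="CL-PG+MAN"):
    """Divide one column by the reference algorithm's value in that column."""
    ref = table[reference][column]
    return {name: row[column] / ref for name, row in table.items()}


episode_ratio = ratio_to_reference(TABLE10, CONVERGENCE_EP)
time_ratio = ratio_to_reference(TABLE10, TRAINING_H)

for name in TABLE10:
    print(f"{name:24s} episodes x{episode_ratio[name]:.2f}  "
          f"time x{time_ratio[name]:.2f}")
```

Read this way, Standard PPO needs roughly 2.6 times as many episodes and 2.3 times as much wall-clock time to converge as CL-PG+MAN, and both ablated variants sit between the full method and the baselines.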
Table 11. Computational time breakdown of each module.

| Module | Single Inference Time (ms) | Ratio (%) | Batch Inference (B = 32) (ms) | Memory Usage (MB) |
|---|---|---|---|---|
| State Encoder | 0.82 | 12.1 | 8.5 | 24 |
| Manifold Network | 1.23 | 18.2 | 15.2 | 156 |
| Policy Network | 0.95 | 14.1 | 10.8 | 82 |
| Manifold Projection | 2.15 | 31.8 | 28.3 | 12 |
| Trajectory Encoder | 1.35 | 20.0 | 18.7 | 238 |
| Pre-processing | 0.15 | 2.2 | 1.8 | 8 |
| Post-processing | 0.11 | 1.6 | 1.2 | 4 |
| Total | 6.76 | 100.0 | 84.5 | 524 |
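The Table 11 breakdown can be cross-checked and extended with a few lines of arithmetic. The sketch below sums the per-module latencies, recomputes the ratio column, and derives the amortised per-sample latency at batch size 32. The dictionary layout and variable names are illustrative assumptions; the per-sample figure is derived here and is not reported in the paper.

```python
# module: (single inference [ms], batch-32 inference [ms], memory [MB]),
# copied from Table 11 (the "Total" row is recomputed below).
MODULES = {
    "State Encoder": (0.82, 8.5, 24),
    "Manifold Network": (1.23, 15.2, 156),
    "Policy Network": (0.95, 10.8, 82),
    "Manifold Projection": (2.15, 28.3, 12),
    "Trajectory Encoder": (1.35, 18.7, 238),
    "Pre-processing": (0.15, 1.8, 8),
    "Post-processing": (0.11, 1.2, 4),
}
BATCH_SIZE = 32

total_single_ms = sum(single for single, _, _ in MODULES.values())
total_batch_ms = sum(batch for _, batch, _ in MODULES.values())
total_memory_mb = sum(mem for _, _, mem in MODULES.values())

# Share of single-sample latency spent in each module.
ratio_pct = {
    name: 100.0 * single / total_single_ms
    for name, (single, _, _) in MODULES.items()
}

# Amortised latency per sample when 32 samples are processed together.
per_sample_batch_ms = total_batch_ms / BATCH_SIZE

print(f"single: {total_single_ms:.2f} ms, batch: {total_batch_ms:.1f} ms")
print(f"per sample at B = 32: {per_sample_batch_ms:.2f} ms")
```

The recomputed totals match the table's last row, and batching lowers the amortised cost from 6.76 ms to about 2.64 ms per sample, with the manifold projection remaining the largest single contributor in both modes.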
Table 12. Memory usage and model complexity analysis.

| Component | Parameters | Model Size (MB) | Inference Memory (MB) | Training Memory (MB) | FLOPs (M) |
|---|---|---|---|---|---|
| State Encoder | 125 K | 0.48 | 24 | 85 | 2.5 |
| Manifold Network | 1.8 M | 6.87 | 156 | 468 | 36.2 |
| Policy Network | 450 K | 1.72 | 82 | 245 | 9.1 |
| Value Network | 380 K | 1.45 | 68 | 198 | 7.6 |
| Trajectory Encoder | 2.2 M | 8.40 | 238 | 652 | 44.3 |
| Dynamics Model (ODE) | 680 K | 2.60 | 95 | 285 | 13.7 |
| Experience Buffer (10K) | - | - | 1250 | 1250 | - |
| Trajectory Buffer (1K) | - | - | 380 | 380 | - |
| Total | 5.53 M | 21.52 | 2293 | 3563 | 113.4 |
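The model sizes in Table 12 are close to what 32-bit weights would occupy. The sketch below estimates each component's size from its rounded parameter count, assuming 4 bytes per parameter and reading "MB" as $2^{20}$ bytes; both are assumptions, since the paper does not state how the sizes were computed. It also re-adds the memory columns. All names are illustrative.

```python
# component: (parameters, reported size [MB], inference [MB], training [MB]),
# copied from Table 12; parameter counts are the rounded values shown there.
COMPONENTS = {
    "State Encoder": (125e3, 0.48, 24, 85),
    "Manifold Network": (1.8e6, 6.87, 156, 468),
    "Policy Network": (450e3, 1.72, 82, 245),
    "Value Network": (380e3, 1.45, 68, 198),
    "Trajectory Encoder": (2.2e6, 8.40, 238, 652),
    "Dynamics Model (ODE)": (680e3, 2.60, 95, 285),
}
# Replay buffers hold data rather than weights: (inference, training) [MB].
BUFFERS = {
    "Experience Buffer (10K)": (1250, 1250),
    "Trajectory Buffer (1K)": (380, 380),
}


def fp32_size_mib(n_params):
    """Size of n_params float32 weights, assuming 4 bytes each and 2**20 B per MB."""
    return n_params * 4 / 2**20


estimated_mb = {name: fp32_size_mib(row[0]) for name, row in COMPONENTS.items()}
size_error_mb = {
    name: abs(estimated_mb[name] - row[1]) for name, row in COMPONENTS.items()
}

total_inference_mb = (sum(row[2] for row in COMPONENTS.values())
                      + sum(b[0] for b in BUFFERS.values()))
total_training_mb = (sum(row[3] for row in COMPONENTS.values())
                     + sum(b[1] for b in BUFFERS.values()))

for name, est in estimated_mb.items():
    print(f"{name:22s} estimated {est:5.2f} MB, reported {COMPONENTS[name][1]:.2f} MB")
```

Under these assumptions every estimate lands within about 0.01 MB of the reported size, and the memory columns add up to the reported 2293 MB and 3563 MB once the two buffers are included. The buffers account for most of the inference-time footprint, so a deployment that does not keep them resident would need considerably less memory than the table total suggests.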
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tian, H.; Yan, T.; Dai, G.; Wang, M.; Zhao, X. An Intelligent Control Framework for High-Power EV Fast Charging via Contrastive Learning and Manifold-Constrained Optimization. World Electr. Veh. J. 2025, 16, 562. https://doi.org/10.3390/wevj16100562