Research on Ecological Driving Following Strategy Based on Deep Reinforcement Learning

Abstract: Traditional car-following models usually prioritize minimizing inter-vehicle distance error when tracking the preceding vehicle, often neglecting crucial factors like driving economy and passenger ride comfort. To address this limitation, this paper integrates the concept of eco-driving and formulates a multi-objective function that encompasses economy, comfort, and safety. A novel eco-driving car-following strategy based on the deep deterministic policy gradient (DDPG) is proposed, employing the vehicle’s state, including data from the preceding vehicle and the ego vehicle, as the state space, and the desired time headway from the intelligent driver model (IDM) as the action space. The DDPG agent is trained to dynamically adjust the following vehicle’s speed in real-time, striking a balance between driving economy, comfort, and safety. The results reveal that the proposed DDPG-based IDM model significantly enhances comfort, safety, and economy when compared to the fixed-time headway IDM model, achieving an economy improvement of 2.66% along with enhanced comfort. Moreover, the proposed approach maintains a relatively stable following distance under medium-speed conditions, ensuring driving safety. Additionally, the comprehensive performance of the proposed method is analyzed under three typical scenarios, confirming its generalization capability. The DDPG-enhanced IDM car-following model aligns with eco-driving principles, offering novel insights for advancing IDM-based car-following models.


Introduction
Transportation plays a substantial role in global energy consumption and greenhouse gas emissions. In the pursuit of energy conservation and emission reduction, the transportation sector is prioritizing the implementation of stricter emission standards and the advancement of new energy vehicles [1]. Electric vehicles have garnered widespread attention from various countries due to their high energy conversion efficiency and lower environmental impact [2][3][4]. Eco-driving, acknowledged and praised by scholars globally, offers remarkable energy-saving and emission-reducing effects. Achieved through appropriate vehicle speed and acceleration, scientifically selected routes, and suitable vehicle maintenance, eco-driving effectively reduces fuel consumption and tailpipe emissions, ultimately mitigating energy waste and environmental pollution [5][6][7]. Currently, research on eco-driving primarily focuses on the following areas: maintaining vehicle speed under different driving conditions [8], optimizing acceleration and deceleration [9,10], car following [5], and route planning [11,12]. Vehicle-following technology is an effective means of alleviating traffic congestion, optimizing urban traffic flow while enhancing vehicle operational efficiency. Vehicle following describes the interactions between vehicles in the same lane [13]. Treiber et al. [14] integrated previous findings and proposed the intelligent driver model (IDM), a unified mathematical model capable of describing vehicle-following behavior from free-flow to jammed-flow conditions. Kesting et al. [15,16] focused their research on parameter calibration for the following model, treating headway and trajectory as measures of performance, and conducted a thorough analysis of various issues during the validation process; however, their study lacked an evaluation and validation of the selected performance indicators. In the context of hybrid electric vehicle fleet eco-driving, Wang et al. [17] established a multi-objective optimization function for hybrid electric vehicle fleet queue speed. As scholars have studied vehicle-following models more deeply, it has been discovered that the human driver and vehicle, as a unified entity, can exhibit distinct vehicle-following characteristics. Investigating human driving styles, Hu et al. [18] developed a car-following driver model that accurately captures human driving characteristics and demonstrated its effectiveness in adapting to various driving styles. Saifuzzaman et al. [19] combined the IDM model and the Gipps model with a driving-difficulty module to establish the TDIDM and TDGipps models, respectively, and convincingly demonstrated that the TDIDM model maintains stable following behavior even in intricate driving scenarios. Furthermore, as intelligent connected technology continues to advance, vehicles are increasingly capable of accessing real-time traffic information while in operation. Several studies have emphasized the significance of incorporating the influence of diverse road conditions when developing car-following models [20][21][22][23]. With the rise of artificial intelligence, deep reinforcement learning, a popular branch of the field, is capable of solving numerous challenging problems [24][25][26] and is also widely applied in the domain of car-following. For instance, researchers have proposed several car-following models based on deep reinforcement learning, including a human-like car-following model utilizing deep reinforcement learning techniques [27], a personalized car-following model incorporating memory-based deep reinforcement learning [28], and a vehicle-following model that combines deep deterministic policy gradients (DDPG) with stacked denoising autoencoders (SDAE) [29].
In vehicle-following models, time headway (THW) is a critical parameter used to describe vehicle-following behavior, defined as the time interval between consecutive vehicles passing the same section of the road [30]. Many scholars have conducted extensive research on time headway in car-following models, mainly categorized into fixed headway and variable headway. In the realm of fixed time headways, for instance, certain scholars have utilized gray correlation analysis to examine the correlation between time headway and vehicle-following behavior [31] and have subsequently devised a car-following model grounded in fixed time headways, while other researchers have leveraged time headway as a metric to understand the behavior exhibited by human drivers [32]. Research on variable time headway is limited; Yuan et al. [33] designed a novel car-following model based on dynamic safety headway, effectively preventing collisions and improving driving performance and traffic flow efficiency in emergency situations. However, current research mainly focuses on minimizing energy consumption or setting a fixed desired time headway in car-following models, with fewer studies on multi-objective optimization of car-following models. Additionally, existing research often sets the desired time headway as a constant value, while vehicles are subject to varying traffic conditions during operation; a fixed desired time headway cannot accurately reflect the actual traffic flow. Moreover, there is limited research applying the intelligent driver model (IDM) to eco-driving. Thus, this study integrates the IDM model with the deep deterministic policy gradient (DDPG) algorithm, presenting a dynamic desired-time-headway IDM car-following model that incorporates eco-driving principles to attain multi-objective optimization of economy, safety, and comfort.
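The THW definition above can be illustrated with a short calculation. The helper below is a minimal sketch (the function name and the constant-speed approximation `THW = gap / speed` are illustrative assumptions, not the paper's formulation):

```python
# Hypothetical illustration of time headway (THW): the time interval between
# successive vehicles passing the same road section. For a following vehicle
# at speed v (m/s) with spacing gap (m) to the leader, THW ~ gap / v.
def time_headway(gap_m: float, speed_mps: float) -> float:
    """Approximate time headway in seconds; undefined at standstill."""
    if speed_mps <= 0:
        raise ValueError("THW is undefined for a stationary vehicle")
    return gap_m / speed_mps

# e.g. a 40 m gap at 20 m/s corresponds to a 2 s headway
```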
The contributions of this study can be summarized as follows: Firstly, an electric vehicle (EV) model is established using MATLAB/Simulink, and the relevant parameters of the intelligent driver model (IDM) and the multi-objective function for eco-driving are introduced. Secondly, an eco-driving IDM car-following strategy based on the deep deterministic policy gradient (DDPG) algorithm is proposed. Lastly, simulation verification is conducted by comparing the improved IDM model, whose dynamic desired time headway is set by the DDPG agent strategy, with the IDM model using a fixed desired time headway. The results confirm that the proposed method exhibits superior economy, comfort, and safety performance. Additionally, the proposed strategy is validated under different driving conditions, demonstrating its generalization capability.

IDM-Based Vehicle-Following Model
Electric vehicles (EVs) are characterized by their simple structure and high energy conversion efficiency. To investigate eco-driving car-following strategies, it is essential to establish a vehicle model for electric vehicles.

Vehicle Model
In accordance with Equations (1) and (2) governing vehicle motion, a vehicle model for electric vehicles (EVs) is constructed. The main parameters of the vehicle are presented in Table 1.

$$F_t = F_f + F_w + F_i + F_j \tag{1}$$

$$F_t = \frac{T_{tq}\, i\, \eta}{r} = mgf\cos\theta + \frac{1}{2}\rho C_D A u^{2} + mg\sin\theta + m\frac{\mathrm{d}u}{\mathrm{d}t} \tag{2}$$

where $F_t$ represents the driving force, $F_f$ the rolling resistance, $F_w$ the aerodynamic drag, $F_i$ the grade resistance, $F_j$ the acceleration resistance, $T_{tq}$ the motor torque, $i$ the transmission ratio, $\eta$ the transmission system efficiency, $r$ the wheel radius, $m$ the vehicle mass, $g$ the gravitational acceleration ($g = 9.8\ \mathrm{m\,s^{-2}}$), $f$ the rolling resistance coefficient, $\theta$ the road grade, $C_D$ the air resistance coefficient, $A$ the frontal area, $\rho$ the air density, $u$ the vehicle speed, and $\mathrm{d}u/\mathrm{d}t$ the longitudinal acceleration during travel.

Intelligent Driver Model
Treiber et al. [14] proposed the intelligent driver model (IDM), which has been developed through the integration of various interdisciplinary theories, such as physics, psychology, automatic control, and vehicle engineering; together these theories form the core of the IDM. Combined with modern artificial intelligence algorithms and extensive data-driven training, the IDM can simulate the decision-making process of human drivers, enabling autonomous vehicle control and rational decision-making in complex traffic environments. The development of the IDM therefore holds significant importance for the future advancement of intelligent transportation. The classic IDM is as follows:

$$a = a_{\max}\left[1-\left(\frac{v}{v_0}\right)^{\delta}-\left(\frac{s^{*}(v,\Delta v)}{s}\right)^{2}\right]$$

$$s^{*}(v,\Delta v) = s_0 + vT + \frac{v\,\Delta v}{2\sqrt{a_{\max} b}}$$

where $v$ is the speed of the following vehicle, $v_0$ the desired speed, $\delta$ the acceleration exponent, $s$ the actual gap to the preceding vehicle, $s^{*}$ the desired gap, $s_0$ the minimum standstill gap, $T$ the desired time headway, $\Delta v$ the speed difference between the following and preceding vehicles, $a_{\max}$ the maximum acceleration, and $b$ the comfortable deceleration.

Eco-Driving Objective Function
Eco-driving has become a prominent area of research in the field of intelligent connected vehicles. Its primary goal is to enhance driving behavior so as to reduce energy consumption and improve traffic conditions while ensuring driving safety. To achieve this objective, this paper builds upon the IDM car-following model and formulates a comprehensive multi-objective eco-driving function that takes into account economy, safety, and comfort.
where F denotes the eco-driving objective function, T represents the travel time, ∆SOC represents the energy consumption, v(t) represents the speed of the following vehicle, v lead (t) represents the speed of the preceding vehicle, a(t) represents the acceleration of the following vehicle. Additionally, α, β, and γ are the weighting coefficients for the three indicators, respectively.
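As a concrete illustration, one plausible form of such an objective, consistent with the symbols just listed (the exact weighting structure is an assumption, not the paper's verbatim formulation), is:

$$F = \min\left[\alpha\,\Delta SOC + \beta \int_{0}^{T} \big(v(t)-v_{lead}(t)\big)^{2}\,\mathrm{d}t + \gamma \int_{0}^{T} a(t)^{2}\,\mathrm{d}t\right]$$

Here the first term penalizes energy consumption (economy), the second penalizes speed mismatch with the preceding vehicle (safety/tracking), and the third penalizes harsh acceleration (comfort), all accumulated over the travel time $T$.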

Car-Following Model Design
Existing car-following models often set the desired time headway to a fixed value, but vehicles in actual operation are influenced by traffic flow, which is time-varying; a static time headway cannot accurately reflect these changes. The DDPG algorithm is a strategy capable of finding optimal objectives in continuous action spaces. It consists of two neural networks: one for estimating the action-value function (the Critic network) and the other for generating actions (the Actor network). The Actor network outputs an action a based on the current state, while the Critic network takes this action and the current state as inputs to estimate the value of the current state. The DDPG algorithm employs a cooperative approach between the Actor and Critic networks, updating their parameters by minimizing the error in the action-value function. The training process involves two stages: sampling and learning. During the sampling stage, the agent interacts with the environment and stores newly acquired experience data. In the learning stage, the algorithm samples data from the experience pool and optimizes the parameters of the Actor and Critic networks.
The optimization framework in this study is as follows: (1) state selection: preceding vehicle speed, inter-vehicle distance, and ego vehicle speed; (2) action selection: desired time headway; and (3) rewards: aimed at vehicle fuel economy, comfort, and safety performance. The reinforcement learning strategy employed in this study is illustrated in Figure 1.
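The sampling stage of this framework can be sketched as a simple interaction loop. Everything below is an illustrative assumption (the toy dynamics, the reward weights, and the initial condition are not the paper's values); it only shows how (state, action, reward, next state) tuples accumulate in the experience pool:

```python
import random

# Sketch of the sampling stage: the agent observes (preceding-vehicle speed,
# inter-vehicle distance, ego speed), emits a desired time headway, and stores
# the transition in an experience pool. Dynamics and reward are illustrative.

def env_step(state, headway, dt=1.0):
    v_lead, gap, v_ego = state
    # crude proportional controller standing in for the full IDM dynamics:
    v_target = gap / headway                           # speed realizing the headway
    v_next = max(v_ego + 0.5 * (v_target - v_ego) * dt, 0.0)
    gap_next = max(gap + (v_lead - v_next) * dt, 0.0)
    accel = (v_next - v_ego) / dt
    reward = -abs(v_lead - v_next) - 0.5 * accel ** 2  # tracking + comfort terms
    return (v_lead, gap_next, v_next), reward

replay_buffer = []                 # experience pool used in the learning stage
state = (15.0, 30.0, 14.0)         # assumed initial condition
for _ in range(100):
    action = random.uniform(1.5, 3.5)        # exploratory desired THW (s)
    next_state, reward = env_step(state, action)
    replay_buffer.append((state, action, reward, next_state))
    state = next_state
```

In the learning stage, mini-batches drawn from `replay_buffer` would drive the Actor and Critic updates described above.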


Environment and Reward Configuration
As indicated by the IDM car-following model mentioned earlier, the driver's car-following behavior is influenced by the distance between the two vehicles, relative velocity, and desired time headway. Consequently, in the scope of this research, the preceding vehicle's velocity, the distance between the two vehicles, and the ego vehicle's velocity are considered as input states, while the desired time headway is regarded as the output of the driver's model.
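The mapping just described — states in, desired time headway out, acceleration from the IDM — can be sketched as follows. This is a minimal illustration of the classic IDM law; the parameter values are assumptions, not the paper's calibrated values:

```python
import math

# Sketch of how the IDM turns the chosen desired time headway into an
# acceleration command. Parameter values are illustrative assumptions.
V0 = 30.0     # desired free-flow speed (m/s)
A_MAX = 1.5   # maximum acceleration (m/s^2)
B = 2.0       # comfortable deceleration (m/s^2)
S0 = 2.0      # minimum standstill gap (m)
DELTA = 4     # acceleration exponent

def idm_acceleration(v_ego, v_lead, gap, T_desired):
    """Classic IDM: ego acceleration for a given desired time headway."""
    dv = v_ego - v_lead                     # closing speed
    s_star = S0 + v_ego * T_desired + v_ego * dv / (2 * math.sqrt(A_MAX * B))
    return A_MAX * (1 - (v_ego / V0) ** DELTA - (s_star / gap) ** 2)
```

With a larger `T_desired` the desired gap s* grows, so the same spacing yields a stronger deceleration — which is exactly the lever the DDPG agent manipulates.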
In reinforcement learning, the task of the agent is to acquire current state information from the environment and select actions from the action space according to a policy. After executing an action in the environment, the agent transitions to the next state and receives a corresponding reward (or penalty). This process continues iteratively until a termination condition is met. The objective of the agent is to maximize the cumulative reward it receives, and the goodness of the agent's actions is typically measured by a reward function. Designing a suitable reward function is crucial in reinforcement learning, as it guides and constrains the agent's behavior, leading to improvements in autonomous learning and adaptability. The design of the reward function must therefore carefully consider the practical problem and be adjusted flexibly for the specific application scenario to enhance the performance and effectiveness of the algorithm. In real-world car-following scenarios, drivers adjust their vehicle's state based on changes in the environment, taking appropriate actions accordingly. Building upon the eco-driving function introduced earlier, this paper designs the reward function accordingly, where $R$ represents the total reward and $v_1$ and $v_2$ denote the speeds of the preceding and following vehicles, respectively. The agent receives the speed-tracking score $R_2$ when the speed difference between the two vehicles is less than 0.5 m·s−1. $\omega_1$, $\omega_2$, and $\omega_3$ are the weight coefficients for the three indicators, respectively.
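One plausible structure for such a reward, consistent with the symbols just listed (the specific terms are assumptions rather than the paper's exact formulation), is:

$$R = \omega_1 R_1 + \omega_2 R_2 + \omega_3 R_3,\qquad
R_2 = \begin{cases} r_s, & |v_1 - v_2| < 0.5\ \mathrm{m\,s^{-1}} \\ 0, & \text{otherwise} \end{cases}$$

where $R_1$ would be an economy term (e.g., proportional to the negative SOC consumption), $R_3$ a comfort term (e.g., $-a^2$), and $r_s$ a fixed speed-tracking bonus; all three are hypothetical placeholders for the paper's actual terms.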

Parameter Updating
The relevant parameters for the DDPG algorithm are as follows: the target smoothing factor is set to 1.0 × 10⁻³, the experience replay buffer length is 1.0 × 10⁶, the mini-batch size is 256, and the discount factor is 0.99.
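For reference, the stated hyperparameters can be collected into a single configuration object (the dictionary and its key names are merely an organizational sketch, not an API from the paper):

```python
# DDPG hyperparameters as stated in the text, gathered for convenience.
ddpg_config = {
    "target_smoothing_factor": 1.0e-3,   # soft update rate (zeta)
    "replay_buffer_length": int(1.0e6),  # experience pool capacity
    "mini_batch_size": 256,
    "discount_factor": 0.99,             # gamma
}
```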
To achieve better convergence speed, the hyperbolic tangent activation function tanh(x) is used to approximate the transformation relationship between input and output signals in the hidden layer, ensuring that the output falls within the range [−1, 1]. The expression for tanh(x) is as follows:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

After receiving experience samples from the experience replay buffer, the Critic network updates its parameters by minimizing the loss function, which can be represented as follows:

$$L(\theta^{Q}) = \frac{1}{N}\sum_{t}\left(y_t - Q(S_t, a_t \mid \theta^{Q})\right)^{2}$$

where $\theta^{Q}$ represents the parameters of the Critic network; $S_t$ and $a_t$ are the state and action at time $t$, respectively; $Q(S_t, a_t \mid \theta^{Q})$ is the output of the Critic network; and $y_t$ is the target Q-value:

$$y_t = r_t + \gamma\, Q'\!\left(S_{t+1}, \mu'(S_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$$
where $r_t$ is the reward value at time $t$; $Q'$ and $\mu'$ are the target Critic network and target Actor network, respectively; and $\gamma$ is the discount factor. The parameters $\theta^{\mu}$ of the Actor network are updated by minimizing the loss function

$$L(\theta^{\mu}) = -\frac{1}{N}\sum_{t} Q\!\left(S_t, \mu(S_t \mid \theta^{\mu}) \mid \theta^{Q}\right)$$

The target network parameters $\theta^{Q'}$ and $\theta^{\mu'}$ are updated by soft updates:

$$\theta^{Q'} \leftarrow \zeta\,\theta^{Q} + (1-\zeta)\,\theta^{Q'}, \qquad
\theta^{\mu'} \leftarrow \zeta\,\theta^{\mu} + (1-\zeta)\,\theta^{\mu'}$$

where $\zeta$ represents the soft update rate.
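The target computation and soft update just described can be sketched in a few lines. This uses plain Python lists as stand-ins for network parameters (an illustrative sketch only; the constants mirror the hyperparameters stated earlier):

```python
# Sketch of the DDPG target value and soft (Polyak) update.
GAMMA = 0.99   # discount factor
ZETA = 1e-3    # soft update rate (target smoothing factor)

def td_target(reward, q_next, done=False):
    """y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})) for non-terminal steps."""
    return reward + (0.0 if done else GAMMA * q_next)

def soft_update(target_params, online_params, zeta=ZETA):
    """theta' <- zeta * theta + (1 - zeta) * theta', element-wise."""
    return [zeta * p + (1.0 - zeta) * tp
            for tp, p in zip(target_params, online_params)]
```

Because zeta is small, the target networks drift slowly toward the online networks, which stabilizes the bootstrapped targets.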

Algorithm Validation
To validate the eco-driving car-following performance of the IDM model integrated with the DDPG strategy, simulations were conducted using MATLAB/Simulink. The velocity profile of the United States Federal Test Procedure FTP72 was selected as the velocity curve for the preceding vehicle. FTP72 is a more complex driving cycle than the New European Driving Cycle (NEDC), with varying speeds and a longer test duration.
Additionally, the FTP72 driving cycle exhibits both high-speed and low-speed operating conditions. The cumulative reward plot of the DDPG agent is shown in Figure 2, indicating that the agent's rewards tend to converge after 512 episodes.



Analytical Results
The optimized deep reinforcement learning action in this study is the desired time headway, which is known to vary within a relatively small range for specific drivers, typically between 1.5 and 3.5 (s·veh−1) [34]. Therefore, in this study, we compare and analyze the results obtained by setting the desired time headway to 2 (s·veh−1) and 3.5 (s·veh−1), respectively. Figures 3–5 depict the speed profiles of the leading and following vehicles using the IDM model with the DDPG intelligent agent strategy, for the desired time headways of 2 (s·veh−1) and 3.5 (s·veh−1), respectively. Additionally, Figure 6 shows the relative distance between the two vehicles for the three different approaches: Method 1, where the desired time headway is determined by the DDPG intelligent agent; Method 2, with a fixed desired time headway of 2 (s·veh−1); and Method 3, with a fixed desired time headway of 3.5 (s·veh−1).

It can be observed from Figures 3–6 that when the speed is in the range of 0–40 (km·h−1), the relative distances between the two vehicles are similar for all three strategies, and all maintain an appropriate following distance. When the speed is in the range of 40–60 (km·h−1), Method 1 exhibits a greater inter-vehicle distance than Methods 2 and 3: as speed increases, Method 1 trades a tighter following distance for improved economy while still ensuring safety. The following distance in Method 2, by contrast, is too small, indicating relatively aggressive driving behavior that may not be conducive to driving safety. When the speed exceeds 60 (km·h−1), Method 1 maintains a relatively stable following distance, ensuring driving safety, whereas the distance for Method 3 becomes too large, degrading the following performance.

Figure 7 shows the acceleration profiles of the following vehicle under the three methods; acceleration is an important indicator of comfort. The acceleration curve of Method 1 is generally lower than those of Methods 2 and 3, while the curve of Method 2 is the highest at every stage. This indicates that the car-following model of Method 1 exhibits higher stability, better passenger comfort, and a relatively gentle driving style.

Figure 8 displays the State of Charge (SOC) curves under the three methods for the same driving conditions. The final SOC value of Method 1 is 0.7012, while the final SOC values of Methods 2 and 3 are 0.6985 and 0.7003, respectively. Method 1 improves economy by 2.66% compared to Method 2 and also shows a slight improvement over Method 3. In summary, Method 1 outperforms Methods 2 and 3 in terms of tracking performance, driving comfort, and economy.

To evaluate the generalization capability of the proposed DDPG intelligent agent car-following strategy, several representative driving scenarios were selected for testing, including FTP72, WLTC CLASS2, and JC08. Figure 9 shows the economy under the three methods for each of the selected driving scenarios. In the FTP72 and JC08 scenarios, Method 1 consistently outperforms Methods 2 and 3 in terms of economy; in the WLTC scenario, however, Method 1 exhibits slightly lower economy than the other two methods. Figures 3–6 above present the vehicle speeds and relative distances between the leading and following vehicles for the three methods under the FTP72 operating condition, and Figure 10 illustrates the corresponding results under the additional driving scenarios. For speeds below 40 (km·h−1), the relative distances are similar for all three methods. For speeds between 40 and 60 (km·h−1), Method 1 maintains a slightly larger relative distance than the other methods, striking a balance between tracking performance and driving safety. For speeds above 60 (km·h−1), Method 1 maintains a stable following distance, while Method 2 results in a smaller following distance, potentially compromising driving safety, and Method 3 exhibits rapidly increasing following distances, leading to inferior tracking performance. Figure 11 presents the average acceleration and deceleration values for the three methods under the selected driving scenarios. The average acceleration and deceleration values of Methods 1 and 3 are lower than those of Method 2, indicating that Method 2 involves more aggressive driving behavior, while Methods 1 and 3 provide better comfort.

In conclusion, Method 1 demonstrates superior overall performance compared to Methods 2 and 3, confirming that the IDM model with the desired time headway determined by the DDPG intelligent agent outperforms the fixed-time-headway IDM model in various driving scenarios and exhibits a certain level of generalization capability.
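The 2.66% economy figure can be reproduced from the reported final SOC values once an initial SOC is assumed. The snippet below uses a hypothetical initial SOC of 0.8 (not stated in the text) purely to illustrate the calculation:

```python
# Energy consumption measured as SOC drop; relative improvement of Method 1
# over Method 2. The initial SOC of 0.8 is an assumption for illustration.
SOC_INITIAL = 0.8
soc_final = {"method1": 0.7012, "method2": 0.6985, "method3": 0.7003}

def economy_improvement(baseline: str, candidate: str) -> float:
    """Relative reduction in SOC consumption of candidate vs. baseline (%)."""
    used_base = SOC_INITIAL - soc_final[baseline]
    used_cand = SOC_INITIAL - soc_final[candidate]
    return (1 - used_cand / used_base) * 100

print(round(economy_improvement("method2", "method1"), 2))  # ~2.66 under this assumption
```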

Discussions
The proposed IDM model based on the DDPG agent policy integrates the concept of eco-driving, in contrast to the fixed-time-headway IDM model. It outperforms the traditional fixed desired-time-headway IDM model in terms of economy, safety, and comfort, as it can adapt to changes in traffic flow. In real traffic environments, fixed-time-headway car-following models often fail to adapt to variations in traffic conditions. In contrast, the IDM model based on the DDPG agent policy uses the information of the ego vehicle and the leading vehicle as the state space, with the desired time headway as the action output, allowing it to respond promptly to changes in the traffic environment.
Based on the simulation analysis outlined above, it becomes evident that all three strategies maintain commendable tracking performance throughout the FTP72 driving cycle, as well as the low-speed phase of the JC08 cycle. Nevertheless, during the transition from low to medium speeds, Method 2, characterized by a desired time headway of 2 (s·veh−1), gives rise to sudden acceleration or deceleration. This leads to a notable deterioration in driver comfort and safety while escalating energy consumption. Furthermore, during medium-speed operation, the model that adopts a desired time headway of 3.5 (s·veh−1) exhibits suboptimal car-following behavior, primarily because it cannot respond promptly to changes in the behavior of the leading vehicle.
Regarding the WLTC CLASS2 driving cycle, the proposed strategy exhibits slightly lower economic efficiency than the other two methods, because this cycle switches between and holds different speed ranges in a more variable manner. The intelligent agent policy sacrifices some economic efficiency in exchange for improved driver comfort and safety, thereby meeting the diversified requirements of eco-driving.

Main Conclusions
Existing IDM car-following models often set the time headway as a fixed value, which does not account for the influence of traffic flow. In response to this issue, this paper integrates the concept of eco-driving and proposes an eco-driving multi-objective function that comprehensively considers economy, comfort, and safety. An electric vehicle model is established based on the vehicle motion equations. An eco-driving car-following strategy based on DDPG is proposed, which adjusts the desired time headway in the IDM model using the deep reinforcement learning algorithm. The optimized desired time headway from the DDPG intelligent agent is compared with the fixed-time headway in the traditional calibrated IDM model through simulations.
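The paper's multi-objective function is not reproduced in this section, but its structure can be sketched as a weighted per-step reward over the three stated objectives. The specific penalty terms, the safety threshold `t_safe`, and the weights below are assumptions for illustration only, not the paper's actual formulation.

```python
def eco_reward(power_kw, jerk, gap, v,
               t_safe=1.0, w_econ=1.0, w_comf=0.5, w_safe=2.0):
    """Illustrative per-step reward combining the three eco-driving
    objectives (economy, comfort, safety). All terms, the threshold
    t_safe, and the weights are hypothetical stand-ins for the paper's
    multi-objective function."""
    r_econ = -power_kw                    # economy: penalize traction power draw
    r_comf = -abs(jerk)                   # comfort: penalize abrupt acceleration changes
    headway = gap / max(v, 0.1)           # current time headway in seconds
    r_safe = -max(0.0, t_safe - headway)  # safety: penalize headway below threshold
    return w_econ * r_econ + w_comf * r_comf + w_safe * r_safe
```

A reward of this shape is what the DDPG agent would maximize while choosing the desired time headway; the relative weights encode the economy/comfort/safety trade-off discussed above.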
The results indicate that the proposed approach outperforms the traditional fixed-time-headway intelligent driver model (IDM) in terms of comfort, safety, and economy under the FTP72 driving condition, achieving a 2.66% improvement in economy compared to the model with a desired time headway of 2 (s·veh⁻¹). Regarding comfort, the acceleration profile of the following vehicle varies more smoothly, implying a more comfortable driving experience than the traditional fixed-time-headway IDM model provides. In terms of safety, the proposed approach effectively maintains a sufficient safety distance and exhibits good tracking performance, ensuring vehicle safety across various low-to-medium-speed driving conditions. Furthermore, the study verifies the generalization ability of the proposed approach under three representative driving conditions, in all of which it adapts and performs well. In conclusion, the IDM car-following model improved using the DDPG algorithm aligns with the principles of eco-driving, providing new insights for the enhancement of IDM car-following models and serving as a reference for the research and promotion of eco-driving technology.

Limitations and Future Research
This model is suitable only for medium- and low-speed driving conditions; because we consider car-following at high speeds to be unsafe, high-speed scenarios are not addressed. Additionally, the intelligent agent was trained on electric vehicles, so the car-following strategy described in this paper is not directly applicable to fuel-powered or hybrid vehicles. Moreover, in improving the car-following model's time-headway action space using DDPG, the integration of data-driven and theory-based models was not explored. In future research, the theory-based model can be combined with real-world driving data sets, and other model parameters could be jointly optimized to meet different driving requirements, further enhancing the integration of the car-following model with eco-driving principles.
Author Contributions: W.Z., N.W., Q.L., C.P. and L.C. contributed to the study's conception and design. Material preparation, data collection, and analysis were performed by W.Z. and N.W. The first draft of the manuscript was written by N.W. and W.Z. W.Z., Q.L., C.P. and L.C. commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: The data presented in this study are available upon request from the corresponding author.