1. Introduction
In recent years, autonomous driving has advanced rapidly, with much research concentrating on self-driving technology. The field of autonomous driving typically includes perception [1,2,3,4,5,6], planning [7,8,9,10,11,12], and control [13,14]. In addition, some works rely on blockchain-based technology to enhance autonomous driving [15,16]. However, interactive driving scenarios still pose significant challenges for motion planning. Among these, ensuring safety is the top priority. In Figure 1, an autonomous vehicle (AV) coexists on the road with multiple human-driven vehicles (HDVs), creating an environment where insufficient caution could lead to collisions. The AV must continually evaluate potential hazards and generate safe trajectories during these interactions.
Model-based methods have been extensively explored. For instance, game-theoretic techniques attempt to predict the actions of other drivers [17,18] by assuming a particular driving style for them. However, real-world driver behavior can deviate from such assumptions, often resulting in overly conservative planning. Another model-based approach is robust control [19], which expands the feasible region to address uncertainties. Yet this enlargement may inadvertently include regions that are actually unsafe. Existing game-aware MPC frameworks (e.g., Stackelberg or mean-field games integrated with MPC) have demonstrated the potential of incorporating interaction-aware predictions into motion planning. However, these approaches often rely on simplified dynamics or linear approximations, which can limit their ability to handle nonlinear vehicle behaviors and complex cost structures in real time. Moreover, their solutions tend to be overly conservative in high-risk scenarios, leading to reduced efficiency.
Learning-based methods have also garnered considerable interest [20,21,22,23], but they typically require large-scale training datasets and consequently demand substantial computational resources. Moreover, such methods often function as black boxes [24], offering limited transparency into their decision-making processes. Other advanced learning-based concepts, such as context-assisted learning [25,26] and Large Language Models [27], can currently achieve only simple driving-related tasks, even though they show strong capabilities in other domains [28].
Beyond the limitations of the above methods, effectively balancing safety, efficiency, and flexibility in interactive driving remains an open problem. Therefore, an approach that can both overcome these limitations and maintain flexible driving is needed. Optimization-based approaches, such as Particle Swarm Optimization (PSO) [29,30,31], Quadratic Programming (QP) [32], Adaptive Searching (AS) [33], and other optimization methods [34,35], have been adopted in certain motion-planning contexts for efficient and flexible driving. Nevertheless, these methods can be limited by local-minima issues or require carefully tuned cost functions to manage safety margins and driver comfort.
To address these limitations, we propose a novel method based on game theory [36,37] and model predictive control (MPC) [38,39], integrated with differential dynamic programming (DDP) [40]. Our approach leverages the strengths of game theory, MPC, and DDP. Game theory provides flexible decision-making based on the states of the AV relative to the HDV, even under high-risk scenarios. MPC provides a receding-horizon framework that continuously updates the control policy based on real-time measurements and predicted traffic evolution, while DDP efficiently handles nonlinear system dynamics and complex cost structures through a backward-forward optimization procedure. By dynamically adjusting the control inputs to account for interactions with surrounding HDVs, our MPC-DDP framework effectively balances safety and efficiency. In high-risk conditions, the method adopts lower speeds and larger inter-vehicle gaps, whereas in less hazardous situations it increases speed and maintains tighter following distances. As a result, the proposed MPC-DDP algorithm offers a flexible, transparent, and computationally feasible solution for interactive driving scenarios. The main contributions of this work are:
We introduce a novel game-based MPC-DDP framework that leverages differential dynamic programming to solve the optimal control problem in interactive driving environments. Our formulation uses advanced matrix representations and a recursive backward-forward pass to efficiently compute real-time control policies.
We demonstrate that our MPC-DDP method adapts to diverse driving scenarios. For example, in lane-change scenarios, it achieves higher average speeds and tighter gaps to enable efficient merging, whereas in intersection scenarios, it maintains lower speeds and larger gaps to maximize safety.
Extensive simulation results confirm that our proposed method outperforms state-of-the-art benchmark algorithms such as PSO, QP, and AS, by producing smoother trajectories for improved ride comfort, optimizing speed for greater efficiency, and ensuring collision-free operation.
Unlike existing hybrid MPC approaches that primarily combine MPC with approximate solvers or learning-based modules, our framework uniquely embeds game-theoretic predictions into a receding-horizon MPC structure solved via DDP. This integration achieves both nonlinear optimization efficiency and adaptive interaction modeling: the AV chooses conservative strategies in high-risk situations while maintaining efficiency and comfort in low-risk conditions.
In summary, existing methods exhibit complementary strengths but also face significant drawbacks in interactive driving. Game-theoretic and robust control approaches often yield overly conservative or suboptimal behaviors; learning-based methods, while flexible, demand extensive computational resources and lack interpretability; and conventional optimization methods such as PSO, QP, and AS suffer from local optima, linearization errors, or sensitivity to parameter tuning. Unlike these approaches, our proposed game-based MPC-DDP framework explicitly integrates game-theoretic reasoning with second-order trajectory optimization via DDP. This integration enables efficient handling of nonlinear vehicle dynamics while adaptively balancing safety, efficiency, and comfort across diverse traffic scenarios.
In the following sections, we introduce the related works, the system model, the risk field formulation, the evolution mechanism, and the MPC optimization problem. Experimental results validate the effectiveness of our proposed approach in interactive driving scenarios.
2. Related Works
This section reviews the challenges and existing approaches for interactive autonomous driving, focusing on optimization-based methods and their limitations, and introduces multi-agent reinforcement learning for driving scenarios.
2.1. Challenges in Interactive Autonomous Driving
Interactive autonomous driving presents numerous challenges, particularly in achieving flexible driving strategies that adapt to diverse traffic scenarios. The AV must first process raw sensor data using sensing technologies such as laser scanning [41] and point clouds [42], or rely on processed images such as RGB [43], to perform semantic segmentation [44,45,46] for interactive driving. Semantic segmentation provides categorized objects whose specific types and positions can be obtained [47,48,49,50], thereby assisting the interactive driving process. When uncertain scenarios arise, such as adverse weather [51], that leave the captured images incomplete, diffusion models are used to restore them [52]. In dynamic interactive environments, vehicles must operate safely while interacting with HDVs [53,54,55]. The risk of collisions increases when steering responses are delayed or judgment errors occur, which necessitates the adoption of distinct strategies for different scenarios. For example, maneuvers such as lane changes and overtaking require rapid, decisive actions [56,57,58,59] to maintain both efficiency and tight car-following, whereas more complex or high-risk environments, such as intersections, demand conservative approaches to maximize safety. In such cases, the vehicle must balance safety, efficiency, and ride comfort by dynamically switching between aggressive and cautious strategies based on the specific driving context. Many existing systems struggle to achieve this flexibility in real time, making it a critical challenge to ensure safe operation in complex and uncertain traffic conditions. Optimization-based approaches offer a promising solution to these challenges, as they enable a systematic formulation of the control problem. In addition, they allow for real-time re-optimization that adapts to evolving traffic dynamics.
2.2. Shortcomings of Current Optimization-Based Approaches
Recent research in autonomous driving has explored various optimization-based methods for trajectory planning and control. PSO has been employed in several studies to address global optimization challenges in driving environments [60,61]. While PSO is advantageous in its ability to explore a large solution space, it often suffers from slow convergence and may become trapped in local optima, leading to suboptimal control performance in fast-changing traffic conditions.
QP techniques have also been widely utilized in autonomous driving applications, particularly for real-time collision avoidance and control under convex constraints [62,63]. However, QP-based approaches typically rely on linear or convex approximations of the underlying nonlinear vehicle dynamics and cost functions, which can compromise solution accuracy in driving scenarios.
AS methods have been proposed to improve the tuning of control parameters and to adaptively search the solution space [33,64]. Despite their flexibility, AS methods are often sensitive to the choice of parameters and can exhibit inconsistent performance when dealing with significant uncertainties in traffic interactions.
While the aforementioned optimization-based approaches have advanced autonomous driving, they exhibit critical limitations in real-time interactive scenarios. As shown in
Table 1, PSO’s slow convergence and susceptibility to local optima hinder its real-time applicability. QP methods often rely on convex approximations that fail to capture the full nonlinearity of vehicle dynamics and interaction costs, compromising both performance and safety in highly dynamic settings. Although flexible, AS techniques are sensitive to parameter tuning and can yield inconsistent performance under significant uncertainty. Beyond these, learning-based methods (such as reinforcement learning) typically function as black boxes, lacking interpretability, and demand substantial computational resources for training and inference, raising concerns about reliability and real-time feasibility. These collective shortcomings in real-time performance, interpretability, and handling of nonlinear dynamics and uncertainties motivate our proposed game-theoretic MPC-DDP framework, which leverages a DDP solver to efficiently and transparently address these challenges within a receding-horizon optimal control structure.
Game theory, with its equilibrium model, provides flexible decision-making capabilities, allowing the identification of optimal actions that satisfy both traffic participants based on their current states. Meanwhile, DDP offers a promising alternative by using a second-order Taylor series expansion to approximate both the system dynamics and the cost function, which enables it to capture the nonlinear characteristics of the system more accurately. Through its iterative backward-forward process, DDP computes optimal control corrections based on the full nonlinear model while efficiently updating the nominal trajectory using insights derived from game theory. This approach enables the MPC-DDP framework to dynamically adjust control inputs with greater precision, ensuring both robust and flexible control performance under various traffic conditions.
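To make this backward-forward structure concrete, the following MATLAB sketch runs one backward (Riccati) sweep and one forward rollout for a one-dimensional double integrator with quadratic costs. In this linear-quadratic special case a single sweep is exact; the full DDP re-expands the nonlinear dynamics and costs around the updated trajectory and iterates. All dynamics, weights, and dimensions below are illustrative assumptions, not the values used in our implementation.

```matlab
% Minimal backward-forward pass of DDP/iLQR for a 1-D double integrator.
dt = 0.1; T = 30;                      % step size, horizon length
A  = [1 dt; 0 1];  B = [0; dt];        % x = [position; velocity]
Q  = diag([1, 0.1]); R = 0.01; Qf = diag([10, 1]);
x0 = [0; 0]; x_goal = [20; 0];         % regulate toward the goal state

% ---- Backward pass: Riccati recursion for the quadratic value function ----
S = Qf; K = cell(T, 1);
for k = T:-1:1
    Quu  = R + B' * S * B;             % control Hessian of the Q-function
    Qux  = B' * S * A;                 % control-state coupling term
    K{k} = Quu \ Qux;                  % feedback gain, u = -K (x - x_goal)
    S    = Q + A' * S * A - Qux' * K{k};   % value-function recursion
end

% ---- Forward pass: roll out the new time-varying feedback policy ----
x = x0; traj = zeros(2, T + 1); traj(:, 1) = x0;
for k = 1:T
    u = -K{k} * (x - x_goal);          % apply the computed feedback law
    x = A * x + B * u;                 % propagate the dynamics
    traj(:, k + 1) = x;
end
fprintf('final position %.2f m, final speed %.2f m/s\n', traj(1, end), traj(2, end));
```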
While hybrid MPC approaches have been proposed by combining MPC with heuristic optimization or machine learning to enhance efficiency, they typically overlook explicit game-theoretic reasoning and second-order optimization. Our method fills this gap by embedding DDP into a game-aware MPC formulation, thereby bridging strategic interaction modeling and efficient nonlinear control optimization.
2.3. Multi-Agent Reinforcement Learning for Driving Scenarios
In the field of Multi-Agent Reinforcement Learning (MARL), numerous studies in recent years have explored its application in autonomous driving decision-making and control, as summarized in Table 2. Hua et al. [65] provide a systematic review of the latest advancements and future challenges of MARL in the control of Connected Autonomous Vehicles (CAVs), highlighting its potential to handle dynamic interactions and enhance traffic flow and fuel efficiency in critical scenarios such as fleet coordination, lane changes, and unsignalized intersections. Addressing safety challenges in practical MARL deployment, Zheng et al. [66] propose a secure MARL approach based on Stackelberg models and two-layer optimization, designing the CSQ and CS-MADDPG algorithms and significantly improving reward and safety performance across diverse driving scenarios including merging, roundabouts, and intersections. Wang et al. [67] model energy management and eco-driving for plug-in hybrid vehicles as a multi-agent cooperative task, proposing a MARL-based cooperative control strategy that significantly reduces fuel consumption while ensuring safe following distances. Chen et al. [68] focus on cooperative decision-making between two vehicles in dynamic, stochastic highway environments, proposing a fair cooperative MARL method that effectively balances maintaining convoy formation with enabling free overtaking, enhancing system adaptability and collaborative efficiency under variable traffic conditions. Yadav et al. [69] provide a comprehensive review of MARL applications in the CAV domain, systematically organizing current research issues, methodologies, and future directions, and offering a clear research framework for subsequent developments in this field.
However, existing MARL methods still face numerous challenges, including high sample complexity, insufficient training stability, weak safety guarantees, and poor policy interpretability. Particularly in safety-critical scenarios, black-box decision mechanisms struggle to provide reliable behavioral verification and constraint-satisfaction assurance. In contrast, the optimization method proposed in this paper, based on game theory and MPC, features explicit safety-constraint handling, high computational efficiency, and good interpretability, despite its stronger model dependency. It is suitable for interactive driving tasks demanding high safety and real-time performance. Both approaches possess distinct advantages, and future research may explore complementary integration by combining the adaptability of MARL with the safety properties of optimization methods.
4. Experimental Evaluation
The simulations were conducted to evaluate the safety, stability, and efficiency of the proposed method. They were run on a computer operating on Ubuntu 18.04.6 LTS, featuring a 12th-generation, 16-thread Intel® Core™ i5-12600KF CPU, an NVIDIA GeForce RTX 3070Ti GPU, and 16 GB of RAM. All simulation results were generated using MATLAB R2024b.
To verify the effectiveness of our proposed game-based MPC-DDP, we designed two distinct simulation scenarios that reflect typical yet challenging driving maneuvers. In the first scenario, illustrated in
Figure 3a, the AV initiates a lane-change maneuver from its current lane to the adjacent lane, while the HDV travels at a steady speed in that lane. As shown, the AV’s path must merge safely behind or in front of the HDV without collisions or abrupt accelerations. In the second scenario, illustrated in
Figure 3b, the AV approaches and traverses a four-way intersection, while HDVs enter from both the top and bottom roads. The AV must navigate across the intersection, maintaining a safe distance from the crossing HDVs and adapting to potential high-risk interactions. In both scenarios, the HDVs drive with stable behavior that does not explicitly respond to the AV, thereby highlighting the AV’s need to anticipate and adapt to possible interactions. Finally, to underscore the superior performance of the proposed game-based MPC-DDP, we compared it against other popular benchmark algorithms under these two scenarios.
4.1. Implementation Details and Parameters
4.1.1. DDP Algorithm Parameters
The DDP optimization algorithm employs several critical parameters that directly impact convergence behavior and computational efficiency.
Table 3 summarizes these parameters with their theoretical justifications.
The convergence criteria employ multiple checks to ensure algorithm reliability:

$$ \left| J^{(k+1)} - J^{(k)} \right| < \epsilon_{J}, \qquad \max_{i}\, g_{i}(\mathbf{x}, \mathbf{u}) < \epsilon_{g}, $$

where $J^{(k)}$ denotes the cost at iteration $k$, and $g_{i}(\cdot)$ represents inequality constraint violations.
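For illustration, these criteria reduce to a few lines of MATLAB; the tolerance values below are assumptions, not the exact thresholds of Table 3.

```matlab
function done = ddp_converged(Jold, Jnew, gViol, epsJ, epsG)
% Stop when both the cost decrease and the worst inequality-constraint
% violation fall below their tolerances (illustrative values might be
% epsJ = 1e-4 and epsG = 1e-3).
done = abs(Jnew - Jold) < epsJ && max([gViol(:); 0]) < epsG;
end
```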
4.1.2. Safety Constraints and Limits
Safety-critical autonomous driving applications require carefully designed constraint sets.
Table 4 details our safety constraint implementation.
The collision avoidance constraint is formulated as a smooth barrier function to maintain differentiability:

$$ c(\mathbf{x}) = d_{\mathrm{safe}} - \sqrt{\Delta x^{2} + \Delta y^{2} + \epsilon^{2}} \leq 0, $$

where $\epsilon$ (in meters) provides a smooth approximation near the constraint boundary.
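A minimal MATLAB sketch of this epsilon-smoothed constraint and its gradient with respect to the AV position follows; the function name and signature are illustrative.

```matlab
function [c, gradC] = smooth_collision_constraint(pAV, pHDV, dSafe, epsSm)
% Smoothed collision-avoidance constraint c <= 0 and its gradient with
% respect to the AV position pAV. The epsilon-regularized norm keeps the
% constraint differentiable even when the two positions coincide.
d     = pAV - pHDV;                 % relative position [dx; dy]
dist  = sqrt(d' * d + epsSm^2);     % smooth approximation of ||d||
c     = dSafe - dist;               % c <= 0 means the gap is safe
gradC = -d / dist;                  % derivative of c w.r.t. pAV
end
```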
4.1.3. Game-Theoretic Parameters
The Stackelberg game formulation requires careful parameter tuning to model realistic human driving behavior.
Table 5 presents our game-theoretic configuration.
The HDV cost function incorporates the Intelligent Driver Model (IDM) structure:

$$ a_{\mathrm{IDM}} = a_{\max}\left[ 1 - \left( \frac{v}{v_{0}} \right)^{\delta} - \left( \frac{s^{*}(v, \Delta v)}{s} \right)^{2} \right], $$

where $s^{*}(v, \Delta v)$ is the desired gap distance given by:

$$ s^{*}(v, \Delta v) = s_{0} + v\,T_{h} + \frac{v\,\Delta v}{2\sqrt{a_{\max}\, b}}. $$
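For reference, a direct MATLAB implementation of the IDM terms above is given below. The headway, desired speed, and maximum acceleration in the usage example match the “Normal” driving style of Section 4.5, while the comfortable deceleration b, minimum gap s0, and exponent delta are assumed values.

```matlab
function a = idm_acceleration(v, dv, s, p)
% IDM acceleration for a following vehicle: v is its speed, dv the speed
% difference to the leader (positive when closing), s the current gap,
% and p a struct of IDM parameters.
sStar = p.s0 + v * p.T + v * dv / (2 * sqrt(p.aMax * p.b));  % desired gap
a = p.aMax * (1 - (v / p.v0)^p.delta - (max(sStar, 0) / s)^2);
end
```

For example, with p = struct('s0', 2, 'T', 1.5, 'v0', 25, 'aMax', 2.0, 'b', 3.0, 'delta', 4), a follower at 15 m/s closing at 1 m/s on a 30 m gap receives a mild positive acceleration of about 0.05 m/s².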
4.1.4. Discretization Scheme
The continuous-time dynamics are discretized using fourth-order Runge-Kutta integration with adaptive step-size control:

$$ \mathbf{x}_{k+1} = \mathbf{x}_{k} + \frac{\Delta t}{6}\left( \mathbf{k}_{1} + 2\mathbf{k}_{2} + 2\mathbf{k}_{3} + \mathbf{k}_{4} \right), $$

where $\mathbf{k}_{1} = f(\mathbf{x}_{k}, \mathbf{u}_{k})$, $\mathbf{k}_{2} = f(\mathbf{x}_{k} + \tfrac{\Delta t}{2}\mathbf{k}_{1}, \mathbf{u}_{k})$, $\mathbf{k}_{3} = f(\mathbf{x}_{k} + \tfrac{\Delta t}{2}\mathbf{k}_{2}, \mathbf{u}_{k})$, and $\mathbf{k}_{4} = f(\mathbf{x}_{k} + \Delta t\,\mathbf{k}_{3}, \mathbf{u}_{k})$.
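A corresponding integration step in MATLAB, with the adaptive step-size logic omitted for brevity, might look as follows.

```matlab
function xNext = rk4_step(f, x, u, dt)
% One fourth-order Runge-Kutta step under a zero-order hold on input u;
% f is a handle to the continuous-time dynamics dx/dt = f(x, u).
k1 = f(x, u);
k2 = f(x + dt/2 * k1, u);
k3 = f(x + dt/2 * k2, u);
k4 = f(x + dt   * k3, u);
xNext = x + dt/6 * (k1 + 2*k2 + 2*k3 + k4);
end
```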
(1) Computational Complexity
The algorithm’s computational complexity scales as $O(n^{3} T)$, where $n$ is the number of vehicles and $T$ is the prediction horizon, since each DDP iteration performs matrix factorizations that are cubic in the joint state dimension at every step of the horizon. Memory requirements scale as $O(n^{2} T)$ for storing the time-varying feedback gains. Empirical timing results on an Intel i7-10700K processor show average computation times in the tens of milliseconds for scenarios with up to 8 vehicles.
(2) Numerical Stability Measures
To ensure reliable operation across diverse scenarios, we implement several critical numerical stability measures that maintain algorithm robustness under varying conditions. The adaptive regularization scheme continuously monitors the Hessian condition number throughout the optimization process, automatically applying additional regularization terms when the condition number exceeds a preset threshold, thereby preventing numerical ill-conditioning. This dynamic approach maintains computational stability without unnecessarily constraining well-conditioned problems.
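As an illustration, the regularization test can be as simple as the following MATLAB fragment; the threshold and the scaling of the added term are assumed values, not those of our implementation.

```matlab
Quu = [1 0; 0 1e-9];                     % example ill-conditioned Hessian
condMax = 1e6;  lambda = 1e-3;           % assumed threshold and weight
if cond(Quu) > condMax                   % monitor the conditioning
    Quu = Quu + lambda * norm(Quu) * eye(size(Quu, 1));  % regularize
end
```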
The warm-starting strategy initializes each optimization cycle with the solution from the previous time step, shifted forward by one step, which significantly reduces the number of required iterations and improves convergence reliability. Constraint scaling normalizes all constraint functions to similar magnitudes, which prevents numerical precision issues that arise when constraints differ by several orders of magnitude. Finally, a fallback mechanism automatically engages emergency braking protocols if the optimization fails to converge within the allocated computational time budget, ensuring system safety under all circumstances. A sketch of the warm-start and fallback logic follows.
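The MATLAB sketch below captures this shift-and-hold warm start together with the braking fallback; the solver handle and all names are hypothetical placeholders, not our actual interfaces.

```matlab
function U = warm_start_solve(solver, x0, Uprev, uBrake)
% Warm start: shift the previous control sequence forward one step and
% repeat the final control as the initial guess. If the (hypothetical)
% solver handle reports failure, fall back to emergency braking.
U0 = [Uprev(:, 2:end), Uprev(:, end)];   % shift-and-hold initial guess
[U, ok] = solver(x0, U0);                % attempt the optimization
if ~ok
    U = repmat(uBrake, 1, size(U0, 2));  % safety fallback: brake
end
end
```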
(3) Reproducibility Guidelines
To facilitate accurate reproduction of our experimental results, we provide comprehensive documentation and resources that enable other researchers to implement and validate our approach. The complete parameter specification documents all numerical values, initialization procedures, and algorithmic choices, with explicit justification for each design decision based on theoretical analysis or empirical validation.
Our reference implementation provides a complete MATLAB codebase that implements the full algorithm, including all optimization routines, constraint handling mechanisms, and safety protocols. This implementation will be made available through a public GitHub repository upon paper acceptance, with comprehensive documentation and usage examples. The codebase includes a comprehensive suite of benchmark scenarios that represent standard test cases with documented expected outputs, enabling researchers to validate their own implementations against our reference results.
Performance baselines provide computational timing benchmarks measured on standardized hardware configurations, allowing researchers to assess the efficiency of their implementations and identify potential optimization opportunities. The modular code architecture allows researchers to easily modify individual components (e.g., cost functions, constraint sets, or vehicle models) while preserving the core algorithmic framework.
4.1.5. Parameter Sensitivity Analysis
Table 6 presents the sensitivity of key performance metrics to parameter variations, providing guidance for parameter tuning in different applications.
This analysis demonstrates the robustness of the algorithm to moderate parameter variations, with safety metrics showing particularly low sensitivity to parameter changes.
4.2. Effective Driving in Lane-Changing Scenario
4.2.1. Lane-Changing Scenario Setup
In this scenario, the AV performs a lane-change maneuver from its current lane to an adjacent lane where an HDV is traveling at a steady speed. The AV must anticipate the HDV’s motion and plan a safe merge behind or in front of the HDV.
Table 7 summarizes the initial conditions for both vehicles.
In our implementation, the AV starts in the lower lane with an initial speed of 15 m/s, heading along the lane direction. Meanwhile, the HDV occupies the adjacent lane with the same initial speed of 15 m/s and the same heading. The lane width is set to 3.5 m, and both vehicles are assumed to follow the bicycle model described in Section 3. The AV’s objective is to change lanes safely by anticipating the HDV’s motion, avoiding collisions, and maintaining comfortable accelerations and steering rates.
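Since Section 3 is not reproduced here, the following MATLAB sketch shows a standard kinematic bicycle model of the kind typically used in such planners; the state ordering and the wheelbase value are assumptions, and Section 3’s exact formulation may differ. Combined with the rk4_step function above, one simulation step is xNext = rk4_step(@bicycle_model, x, u, dt).

```matlab
function dx = bicycle_model(x, u)
% Kinematic bicycle model: state x = [X; Y; psi; v] (position, heading,
% speed), input u = [a; delta] (acceleration, front steering angle).
L  = 2.7;                    % wheelbase [m] (assumed value)
dx = [ x(4) * cos(x(3));     % X position rate
       x(4) * sin(x(3));     % Y position rate
       x(4) * tan(u(2)) / L; % heading rate
       u(1) ];               % speed rate
end
```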
4.2.2. Simulation Results in Lane-Changing Scenario
Figure 4 illustrates the lane-change scenario in which the AV (blue trajectory) merges from its initial lane into the adjacent lane occupied by the HDV (red trajectory). The solid black lines represent lane boundaries, and the dashed line indicates the lane center. Observing the AV’s path (blue circles), one can see a smooth lateral transition from the initial lane center to the target lane center, demonstrating that the proposed game-based MPC-DDP method effectively plans a continuous lane-change maneuver without abrupt steering or acceleration. This smoothness not only maintains passenger comfort, but also helps reduce the risk of collisions with the HDV or other vehicles.
From a safety perspective, there is no point along the trajectory where the AV crosses the HDV’s path with insufficient spacing, indicating that the method anticipates the HDV’s motion and avoids encroaching on its safety envelope. Additionally, the gradual convergence of the AV’s y-position to the target lane center reflects the method’s ability to execute a comfortable lane change rather than a sudden or aggressive swerve. This trajectory analysis highlights how the proposed approach ensures both safety, through collision-free motion and sufficient inter-vehicle spacing, and comfort, through smooth steering and controlled acceleration.
Figure 5 provides additional metrics for the lane-change scenario, highlighting the performance of our game-based MPC-DDP method. In the top subplot, the blue curve indicates the relative speed between the AV and HDV, whereas the red curve shows their relative distance. The left
y-axis corresponds to the relative speed, and the right
y-axis to the relative distance. Notably, the minimum relative distance observed in this simulation is approximately 19.8 m, a margin that is sufficiently large to avoid collisions and ensure safety. This relatively high margin reflects the algorithm’s ability to anticipate the HDV’s motion and adjust the AV’s acceleration to maintain a comfortable gap.
Meanwhile, the relative speed curve transitions smoothly from its initial value to nearly 0 m/s as time progresses, illustrating how the AV gradually converges to the HDV’s speed without abrupt changes. In the bottom subplot, the speed scatter plot shows that the HDV maintains a roughly constant speed of around 15 m/s, while the AV’s speed initially rises above 15 m/s before gently settling to match the HDV. The absence of sharp spikes in the AV’s speed profile underscores both comfort, by limiting excessive accelerations, and stability, by avoiding oscillatory or reactive behavior. Together, these results confirm that our proposed method ensures safety by keeping the vehicles well separated while providing a smooth driving experience through gradual speed adaptation.
Figure 6 compares the speed and acceleration distributions of both the AV and HDV in the lane-change scenario. In the left plot, we show a speed histogram for the AV and HDV. Most of the AV’s speeds are clustered between 14.8 m/s and 15.2 m/s, reflecting the AV’s smooth adaptation toward the HDV’s speed. Meanwhile, the HDV’s speed remains relatively constant at around 15 m/s, with minimal variance. This narrow band of speeds for the AV, centered around the HDV’s speed, indicates that the proposed method avoids large deviations or oscillations, thus contributing to a more comfortable ride.
In the right plot, we display the acceleration distributions for the AV and HDV. The HDV has a single representative acceleration point, corresponding to its nearly constant speed, while the AV’s accelerations spread within a moderate range, ensuring it can merge safely. Notably, the majority of the AV’s accelerations are relatively small in magnitude (less than 1 m/s²), which verifies that the lane-change maneuver is performed without aggressive throttle or braking inputs. The absence of high positive or negative acceleration spikes also attests to passenger comfort and ride stability. Overall, these distribution analyses confirm that our game-based MPC-DDP method yields both a narrow speed band around the HDV’s velocity and moderate acceleration profiles, thus enhancing safety, smoothness, and overall driving quality.
4.3. Simulation Results in Intersection Driving
In this intersection scenario, the AV enters from the left side of the crossroad, while an HDV approaches from the top. Unlike the lane-change scenario, the HDV’s speed here varies slightly to reflect more realistic human driving behavior.
Table 8 summarizes the initial positions, speed ranges, and headings for both vehicles. The AV starts at a moderate speed and must pass safely through the intersection, anticipating any HDV speed fluctuations and avoiding collisions.
As indicated in Table 8, the AV’s heading is 0 rad, meaning it travels in the positive x-direction to cross the intersection, while the HDV’s heading is set so that it moves from top to bottom. Although the HDV’s average speed is around 15 m/s, to make the simulation closer to real-world driving, its speed is allowed to fluctuate within a bounded range around this value.
Figure 7 illustrates the intersection scenario, in which the AV travels from left to right, while the HDV moves from top to bottom. The dashed lines indicate approximate intersection boundaries or lane demarcations. Notably, both vehicles maintain collision-free trajectories, highlighting the effectiveness of our game-based MPC-DDP method in handling potentially high-risk crossing maneuvers.
In particular, the AV’s path shows only minor deviations from a straight line, indicating that it does not need to make abrupt turns or stops when crossing the intersection. Meanwhile, the HDV passes steadily from the top to the bottom portion of the road. Despite the HDV’s slight speed variations, the AV anticipates and accounts for these fluctuations in real time, thus maintaining a safe longitudinal and lateral distance. This result underscores the method’s capabilities, allowing the AV to plan ahead and avoid late-reactive maneuvers. The absence of sudden changes in the AV’s trajectory or the HDV’s speed attests to the stability and robustness of the game-based MPC-DDP framework.
Figure 8 depicts the relative speed and distance as well as the speed scatter plot for the intersection scenario, highlighting how our game-based MPC-DDP method balances safety and efficiency. In the top subplot, the blue curve shows the relative speed between the AV and HDV, while the red curve shows their relative distance over time. Initially, the AV accelerates to a higher speed, allowing it to pass through the risky intersection area more quickly. After crossing, the AV gradually reduces its speed, converging closer to the HDV’s velocity. Throughout this process, the minimum relative distance is approximately 1.9 m, ensuring that no collisions occur despite the high-speed crossing.
The bottom subplot provides a speed scatter plot for both the AV and the HDV. As shown, the AV’s speed peaks around the middle of the simulation, then decreases back to a lower level, reflecting its strategy of briefly speeding up to clear the intersection and then resuming safer, more moderate speeds. Meanwhile, the HDV maintains a relatively steady pace, unaffected by the AV’s maneuver. This pattern underscores the flexibility of our method, which not only avoids collisions but also minimizes the time spent in high-risk zones by taking advantage of higher speeds when needed. Overall, these metrics confirm that the proposed game-based MPC-DDP achieves both efficiency through timely intersection crossing and safety through maintaining a minimum distance of 1.9 m and avoiding collisions.
4.4. Benchmark Comparisons for Performance Evaluation
Figure 9 presents bar charts comparing MPC integrated with four optimization methods, namely our DDP solver and the benchmarks PSO [61], QP [63], and AS [33], in terms of average speed, average relative distance, and acceleration variability under two scenarios: lane-change (left side) and intersection (right side). Each subplot highlights how our proposed game-based MPC-DDP adapts differently depending on the scenario’s requirements.
Lane-Change Scenario: In the left portion of each subplot, our game-based MPC-DDP achieves both the highest average speed (around 15.2 m/s) and the smallest average gap (approximately 3.0 m), compared to the other methods, which average around 14.6–15.0 m/s and maintain gaps of 3.5–4.0 m. This behavior is particularly suitable for lane-changing, where higher efficiency and tighter car-following are desirable. By traveling faster and maintaining a closer distance to the HDV, the AV minimizes travel time and merges smoothly without causing unnecessary slowdowns. Furthermore, the acceleration variance for MPC-DDP in this scenario is relatively large, in the range of 0.35–0.40 (m/s²)², whereas other methods typically remain below 0.25 (m/s²)². This broader range indicates that MPC-DDP is more flexible in selecting acceleration strategies, enabling quick adaptation and optimization of both speed and gap under dynamic traffic conditions.
Intersection Scenario: In the right portion of each subplot, MPC-DDP shows the lowest average speed (around 12.5 m/s) and the largest average gap (approximately 6.0 m) relative to the other methods, which generally maintain speeds of 13.0–14.0 m/s and gaps of 4.0–5.0 m. This conservative, safety-oriented strategy suits the more hazardous intersection environment, where the AV must avoid high-speed conflicts and preserve a larger safety margin. By reducing its speed and increasing the gap, MPC-DDP lowers collision risk and can respond more effectively to unexpected HDV maneuvers. The acceleration variance also remains higher in this scenario, around 0.30–0.35 (m/s²)², again illustrating the AV’s ability to choose from a broader range of accelerations. This flexibility enables the AV to decelerate quickly or accelerate when needed, thereby enhancing overall safety and stability at the intersection.
As shown in
Table 9, in both scenarios, the acceleration variability of MPC-DDP remains moderate, implying smooth speed and steering changes. This balance further underscores our method’s capacity to adapt its driving style: aggressive enough to ensure efficiency in the lane-change case, but cautious enough to ensure safety at the intersection.
4.5. Comprehensive Statistical Evaluation and Benchmark Comparisons
To evaluate our algorithm with statistical rigor across diverse conditions and to assess its generalization capability, we conducted a large-scale comprehensive simulation study. This section details the experimental design and compares the proposed Game-MPC-DDP algorithm against three state-of-the-art benchmarks through comprehensive statistical analysis: Mixed-Integer Quadratic Programming MPC (MIQP-MPC) [62], Nonlinear MPC (NMPC) [38], and Deep Reinforcement Learning (Deep RL) [21].
4.5.1. Experimental Design and Setup
To ensure the claims are statistically sound and generalizable, we designed a full-factorial experiment encompassing a wide spectrum of driving scenarios, environmental conditions, and behavioral uncertainties. The experiment consists of 2880 unique scenario configurations, each repeated 30 times via Monte Carlo simulation to account for stochasticity, resulting in a total of 86,400 experimental runs. The experimental factors are as follows:
Scenarios (4 types): Lane Change, Intersection, Highway Merge, and Roundabout, covering major interactive driving challenges.
HDV Configurations (4 levels): 2, 4, 6, and 8 surrounding HDVs to test scalability and complexity.
HDV Driving Styles (4 types):
- Aggressive: Time headway = 1.0 s, desired speed = 28 m/s, max acceleration = 3.0 m/s².
- Normal: Time headway = 1.5 s, desired speed = 25 m/s, max acceleration = 2.0 m/s².
- Conservative: Time headway = 2.0 s, desired speed = 22 m/s, max acceleration = 1.5 m/s².
- Mixed: A combination of the above styles within a single scenario.
Initial Conditions (5 types): Low density (50 m spacing), Medium density (30 m), High density (15 m), Mixed speeds (15–30 m/s), and an Adversarial setting where HDVs exhibit blocking behaviors.
Weather Conditions (3 types):
- Clear: Visibility 100%, friction coefficient 1.0.
- Rain: Visibility 70%, friction coefficient 0.6, speed factor 0.8.
- Fog: Visibility 30%, friction coefficient 0.9, speed factor 0.6.
This design provides high statistical power (0.95 to detect effects as small as 0.2 standard deviations), ensures comprehensive coverage of real-world variables, and allows for robust validation of our method’s performance.
4.5.2. Statistical Analysis of Safety Performance
The primary metric for evaluation is safety, quantified by the minimum distance between the AV and any HDV during an interaction.
Table 10 presents the aggregated safety results across all 86,400 experiments, demonstrating a decisive advantage of the proposed method.
The results show that our method maintains an average minimum distance 6–8 times larger than the benchmarks. This directly translates into superior safety outcomes: zero collisions were recorded in all 2880 scenarios, compared to collision rates of 9.5% to 17.5% for the other methods. Furthermore, the number of near misses (distance < 2 m) and time-to-collision (TTC) violations for our approach is negligible, approaching zero. The statistical significance of these differences is confirmed with p-values < 0.001 using Welch’s t-test, and the effect size (Cohen’s d) indicates a very large practical significance.
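For transparency, these tests reduce to a few lines of MATLAB (using ttest2 from the Statistics and Machine Learning Toolbox); the samples below are synthetic placeholders, not our experimental data.

```matlab
% Welch's t-test and Cohen's d between two samples of minimum distances.
a = 8.0 + 0.5 * randn(1000, 1);    % synthetic minimum distances, ours [m]
b = 1.2 + 0.4 * randn(1000, 1);    % synthetic minimum distances, baseline
[~, p] = ttest2(a, b, 'Vartype', 'unequal');   % Welch's (unequal-variance) t-test
sPool  = sqrt((var(a) + var(b)) / 2);          % pooled standard deviation
d      = (mean(a) - mean(b)) / sPool;          % Cohen's d effect size
fprintf('p = %.3g, Cohen''s d = %.2f\n', p, d);
```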
4.5.3. Performance Under Diverse Conditions
Scalability with Number of HDVs: As shown in
Figure 10, the computation time of our method exhibits a sub-linear increase, growing only from 53 ms (2 HDVs) to 55 ms (8 HDVs). More importantly, its safety performance (minimum distance) remains consistently high regardless of complexity, while the performance of other methods degrades significantly with more HDVs.
Robustness to Weather Conditions:
Figure 10 illustrates that our method maintains a near-100% success rate (no collisions) across all weather conditions. In contrast, the success rates of benchmark methods, particularly in challenging fog conditions, drop to between 82% and 88%. This demonstrates the superior robustness of our game-theoretic prediction integrated with DDP optimization in adverse perceptual conditions.
Adaptation to Driving Styles and Density: The proposed method successfully adapts its strategy across all HDV driving styles and traffic densities. In aggressive, high-density settings, it prioritizes safety by maintaining larger gaps, while in normal conditions, it efficiently balances safety and traffic flow. The variance in performance metrics across different initial conditions was minimal for our method, confirming its robustness.
4.6. Discussion and Limitations
While the proposed game-based MPC-DDP framework demonstrates effective performance in the tested interactive scenarios, its performance relies on the assumption that human-driven vehicles (HDVs) act as rational agents in a Stackelberg game equilibrium. This model, although useful for structured prediction, represents a simplification of real-world driving behavior. In practice, human drivers may exhibit behaviors that deviate from this assumption due to factors such as unpredictable aggressiveness, hesitation, distraction, or varying levels of risk tolerance.
Such deviations could potentially impact the prediction accuracy of the game-theoretic module, especially in highly adversarial or ambiguous situations. For instance, an overly aggressive HDV might not yield as predicted, while a hesitant one might create unnecessary conservatism in the AV’s plan.
Nevertheless, the receding-horizon nature of the MPC framework provides inherent robustness to moderate prediction errors by frequently re-planning based on updated observations. Furthermore, the safety-oriented cost terms (e.g., the risk potential field and the collision-avoidance penalty) are designed to explicitly penalize any trajectories that encroach on safety margins, thereby mitigating the consequences of imperfect behavioral predictions.
A promising direction for future work involves enhancing the behavioral model by integrating adaptive or probabilistic driver models that can identify different driving styles (e.g., aggressive, conservative) in real-time. This would allow the AV to adjust its interaction strategy dynamically, leading to even more robust and human-like cooperative driving in mixed traffic environments.
5. Conclusions
In this paper, we proposed a game-based MPC-DDP framework to tackle interactive autonomous driving challenges, particularly in lane-change and intersection scenarios. By integrating a game-theoretic prediction module with a differential dynamic programming (DDP) solver under a receding-horizon control scheme, our method balances safety, efficiency, and comfort more effectively than existing approaches. Experimental results demonstrated that the proposed framework achieves higher average speeds and smaller inter-vehicle gaps when appropriate, while adopting conservative maneuvers in high-risk conditions to maintain larger safety margins.

For future work, we intend to expand the proposed approach to handle multiple HDVs with diverse driving behaviors, thereby modeling more complex and realistic traffic interactions. Although the current framework demonstrates strong performance in single-HDV interaction scenarios, extending it to multi-HDV environments presents several critical challenges. First, computational complexity increases significantly with the number of HDVs, as the game strategy space grows exponentially, potentially compromising real-time performance. To address this, we plan to explore hierarchical optimization structures or distributed computing approaches, combined with approximate dynamic programming, to enhance computational efficiency. Second, the stability of equilibrium strategies in multi-agent games remains a critical concern: Nash or Stackelberg equilibria may be non-existent or non-unique under certain conditions. We therefore consider introducing learning-based behavioral prediction mechanisms to enhance adaptability to diverse driving styles and to develop more robust equilibrium-selection strategies. Additionally, common conflict-coupling phenomena in multi-vehicle interactions, such as simultaneous lane changes or intersecting paths at intersections, require more refined coordination mechanisms; we propose employing conflict-graph modeling or dynamic priority allocation, combined with real-time replanning, to ensure system safety. Finally, the diversity of driving behaviors (e.g., aggressive, conservative) significantly impacts interaction strategies, so future work will integrate a driver-behavior recognition module and design adaptive cost functions to dynamically respond to different behavioral patterns.